ABSTRACT
We give the first algorithm that efficiently learns halfspaces (under
distributional assumptions) in the notoriously difficult agnostic framework of
Kearns, Schapire, and Sellie. In this model, a learner is given arbitrarily
labeled examples from a fixed distribution and must output a hypothesis
competitive with the optimal halfspace hypothesis.
Our algorithm constructs a hypothesis whose error rate on future
examples is within an additive \eps of the optimal halfspace in time
poly(n) for any constant \eps>0 under the uniform distribution over
{0,1}^n or the unit sphere in R^n, as well as under any
log-concave distribution over R^n. It also agnostically learns Boolean
disjunctions in time 2^{\tilde{O}(\sqrt{n})} with respect to
*any* distribution. The new algorithm, essentially L_1 polynomial
regression, is a noise-tolerant, arbitrary-distribution generalization
of the "low-degree" Fourier algorithm of Linial, Mansour, and Nisan. Our
Fourier-type algorithm over the unit sphere makes use of approximation
properties of various classes of orthogonal polynomials.
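To make the approach concrete, here is a minimal numpy/scipy sketch of L_1 polynomial regression in the spirit of the algorithm described above: expand each example into all monomials of degree at most d, fit coefficients minimizing the empirical L_1 error via a linear program, and round the resulting real-valued polynomial to a Boolean hypothesis by its sign. The helper names, the degree-2 toy setting, and the simple sign-at-zero rounding are illustrative assumptions, not the paper's exact procedure (the rounding threshold there is chosen more carefully).

    import numpy as np
    from itertools import combinations
    from scipy.optimize import linprog

    def monomial_features(X, degree):
        """All multilinear monomials of degree <= `degree` (constant term included)."""
        m, n = X.shape
        cols = [np.ones(m)]
        for d in range(1, degree + 1):
            for S in combinations(range(n), d):
                cols.append(np.prod(X[:, list(S)], axis=1))
        return np.column_stack(cols)

    def l1_poly_regression(X, y, degree):
        """Fit a polynomial p minimizing sum_i |p(x_i) - y_i| via a linear program."""
        Phi = monomial_features(X, degree)
        m, k = Phi.shape
        # LP variables: [coefficients c (k of them), slacks t (m of them)];
        # minimize sum(t) subject to |Phi @ c - y| <= t, split into two inequalities.
        obj = np.concatenate([np.zeros(k), np.ones(m)])
        A_ub = np.block([[Phi, -np.eye(m)], [-Phi, -np.eye(m)]])
        b_ub = np.concatenate([y, -y])
        bounds = [(None, None)] * k + [(0, None)] * m
        res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        return res.x[:k]

    def predict(coeffs, X, degree):
        """Round the real-valued polynomial to a Boolean label by its sign."""
        return np.sign(monomial_features(X, degree) @ coeffs)

    # Toy run: labels from a majority-style halfspace over {-1,1}^8 with 10% flipped.
    rng = np.random.default_rng(0)
    X = rng.choice([-1.0, 1.0], size=(500, 8))
    y = np.sign(X.sum(axis=1) + 0.5)
    y[rng.random(500) < 0.1] *= -1
    c = l1_poly_regression(X, y, degree=2)
    print("training accuracy:", (predict(c, X, degree=2) == y).mean())

With degree d, the feature expansion has n^{O(d)} monomials, so the whole procedure runs in time polynomial in n for any constant d; the analysis in the paper shows a constant degree (depending on \eps) suffices under the distributions listed above.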
Time permitting, I will discuss negative results for other halfspace-based classification tasks.
Joint work with A. Kalai, Y. Mansour, and R. Servedio.