CMU Randomized Algorithms

Randomized Algorithms, Carnegie Mellon: Spring 2011

Lecture #12: Learning Theory 1

Today we talked about the problem of learning a target function from examples, where examples are drawn from some distribution D, and the goal is to use a labeled sample S (a set of examples drawn from D and labeled by the target f) to produce a function h such that Pr_{x \sim D}[h(x)\neq f(x)] is low. We gave a simple efficient algorithm for learning decision-lists in this setting, a basic “Occam’s razor” bound, and then a more interesting bound using the notion of shatter coefficients and a “ghost sample” argument. See 1st half of these lecture notes.

A few additional comments:

  • One way to interpret the basic Occam bound is that in principle, anything you can represent in a polynomial number of bits you can learn from a polynomial number of examples (if running time is not a concern). Also “data compression implies learning”: if you can take a set of m examples and find a prediction rule that is correct on the sample and requires < m/10 bits to write down, then you can be confident it will have low error on future points.
  • On the other hand, we would really like to learn from as few examples as possible, which is the reason for wanting bounds based on more powerful notions of the “underlying complexity” of the target function, such as shatter coefficients. Other very interesting bounds are based on a notion called “Rademacher complexity” which is even tighter.
  • For more info, see notes for 15-859(B) machine learning theory

Comments are closed.

%d bloggers like this: