PAC-Learning Geometrical Figures

Paul W Goldberg

Abstract: The thesis studies the following problem: Given a set of geometrical figures (such as planar polygons), each one labelled according to whether or not it resembles some ``ideal'' figure, find a good approximation to that ideal figure which can be used to classify other figures in the same way.

We work within the PAC learning model introduced by Valiant in 1984. Informally, the concepts under consideration are sets of polygons which resemble each other visually. A leaning algorithm is given collections of members and non-members of a concept, and its task is to infer a criterion for membership which is consistent with the given examples and which can be used as an accurate classifier of further example polygons.

In order to formalise the notion of a concept, we use metrics which measure the extent to which two polygons differ. A concept is assumed to be the set of polygons which are within some distance of some fixed central polygon. In the thesis we work most extensively with the Hausdorff metric.

Using the Hausdorff metric we obtain NP-completeness results for several variants of the learning problem. In particular we show that it is hard to find a single geometrical figure which is close to the positive examples but not to the negative examples. This result holds under various assumptions about the specific geometrical figures under consideration. It also holds for several metrics other than the Hausdorff metric.

Despite the NP-completeness results mentioned above we have found some encouraging positive results. In particular, we have discovered a general technique for prediction. (Prediction is a less demanding learning model that PAC learning. The goal is to find a polynomial time.) Using our technique we have obtained polynomial-time algorithms for predicting many of the geometrical concept classes studied in the thesis. These algorithms do not classify geometrical figures by measuring their distance from a single ``ideal'' geometrical figure. Instead, they identify a collection of concepts whose intersection may be used to classify examples reliably.

It is natural to consider the case in which only positive examples are available. In the thesis we show that some but not all of the concept classes may be predicted from positive examples alone.

We consider prediction to be a useful goal, since it solves the practical problem of classifying unlabelled examples. However in the final section of the thesis we show a theoretical limitation to the effectiveness of this technique. In particular, assuming the existence of trapdoor functions, not polynomial-time algorithm for prediction exists for polygons in the plane which are assumed to be equivalent under classes of isometrics that include rotations.

PhD Thesis - Price £8.50

LFCS report ECS-LFCS-93-258 (also published as CST-98-93)

Previous | Index | Next