Parsing in the SML Kit

Nick Rothwell

Introduction: This document describes the parsing scheme of the SML Kit. We introduce the grammar datatype, or abstract syntax, and the grammar specification itself, explaining the conventions used. There are various mechanisms employed to resolve the ambiguities and parsing difficulties of SML, and we describe these together with the mechanisms which deal with infix operators, derived forms, and so on.

We use the MLLex and MLYacc packages. Both have been modified to produce portable code running under the SML Library, rather than SML/NJ-specific code, so that the resulting parser can be compiled under other SML systems. In fact, these versions of MLLex and MLYacc are themselves portable, so that all Kit development work can take place using a different SML compiler.

The generated lexing and parsing functors are supported by a number of utility functors for dealing with infixes, derived-form elimination, and so on. There are also utility functors for lexing, providing positional information for error messages, and so on. All these components are gathered together in the functor Parse.

Section 2 describes the abstract syntax and its parameterisation. Section 3 covers lexing issues, and Section 4 overviews the general structure of the parser. We cover the grammar specification in Section 5, and the functor structure and linkage in Section 6. The constituent files of the parsing system are described in Section 7.

At this stage we should probably mention the main thing that the parsing phase does not do. The Definition states a number of syntactic restrictions (Section 2.9), mainly concerned with repeated identifiers of various kinds. In fact, some of these restrictions are not syntactic since they are impossible to enforce at parse time, and require elaboration. Specifically, repeated identifiers in value bindings and patterns cannot be detected at parse time since it is legal to repeat value and exception constructors, and these cannot be determined before elaboration. We have chosen to enforce none of the rules about repeated identifiers, for the sake of simplicity; it is cleaner to deal with these errors during elaboration. The one remaining syntactic restriction involving val rec is truly syntactic and can be dealt with in the grammar.

LFCS report ECS-LFCS-92-236, September 1992.

Previous | Index | Next