Chapter 1: Introduction

Poly/ML is an implementation of Standard ML with a few non-standard extensions, such as arrays, processes, a make system, and the ability of ML values to become 'persistent'.

1.1 The structure PolyML

Most non-standard system-level functions are held in the structure PolyML. This structure contains at least the following functions:

val commit: unit -> bool
val quit: unit -> unit
val exit: int -> unit
val use: string -> unit
val cd: string -> unit

val isLogging: unit -> bool
val startLog: string -> unit
val restartLog: string -> unit
val stopLog: unit -> unit
val logName: unit -> string
val writeLog: string -> unit

val exception_trace: (unit -> 'a) -> 'a

val print_depth: int -> unit

val error_depth: int -> unit

val timing: bool -> unit
val profiling: int -> unit

val make_database: string -> unit

val make: string -> unit
val depends: string -> unit

The purpose of the above functions is explained in the following sections, except for make and depends, which are discussed in chapter 7.

In addition, the Motif Edition of Poly/ML supports the X Window System. The functions which provide the Poly/ML interface to X are encapsulated in the structures XWindows and Motif which, because of their size and importance, are described in separate manuals.

1.2 Starting a Poly/ML session

You start a Poly/ML session by running the driver program poly with your Poly/ML database:

unix% poly ML_dbase

This should produce (something like) the response:

Poly/ML RTS version Windows
Copyright (c) 1999 Cambridge University Technical Services Limited
Poly/ML 3.X
>

(‘>’ is the default Poly/ML prompt.) You can now enter an ML program, for example:

> fun fact 0 = 1
# | fact n = n * fact(n-1);

val fact = fn : int -> int

> fact 20;
val it = 2432902008176640000 : int

(‘#’ is the default Poly/ML secondary prompt, which is used whenever the current expression is incomplete.) The Poly/ML system may take slightly longer than normal to respond to the first thing you type; this is because the ML compiler is itself a collection of persistent objects which have to be paged into memory before they can be used.

1.3 Updating the Database

The database can be updated from the local heap with

PolyML.commit ();

All changes to the local heap since the last commit (or since the start of the session, if there was no previous commit) are written out to the database, making them permanent. See section 1.6, below, for a more detailed explanation. The function commit returns a bool: it is true in the initial Poly/ML session (the one in which commit was called), but false in any subsequent Poly/ML session started using the saved database.
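Because commit returns true in the committing session and false in a session later started from the saved database, a program can use the result to decide what to do next. The following is a minimal sketch, assuming the PolyML.commit : unit -> bool described above and assuming, as the text implies, that a later session resumes at the point where commit returns. The helper commitThen is hypothetical; it takes the commit function as a parameter so that it can be tried out without touching a real database.

```sml
(* Hypothetical helper: run `inOriginal` in the session that performed the
   commit, and `inRestarted` in a session started from the saved database.
   `commit` is expected to be PolyML.commit; it is a parameter so that a
   stand-in can be supplied for experiments. *)
fun commitThen (commit : unit -> bool)
               (inOriginal : unit -> unit)
               (inRestarted : unit -> unit) : unit =
  if commit () then inOriginal () else inRestarted ()
```

Under the assumption above, evaluating

commitThen PolyML.commit (fn () => print "Saved.\n") (fn () => print "Welcome back.\n");

would print Saved. in the current session and Welcome back. in a session later started from the committed database.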

1.4 Quitting a Poly/ML session

You can quit the Poly/ML session without updating the database by:

PolyML.quit ();

As this does not update the database, all work done since the last commit will be lost. Alternatively, typing

<control-D>

(or whatever is the local end-of-file indicator) performs a commit, then quits the current Poly/ML session.

If you are running Poly/ML as part of a Unix batch job, it may be useful to set the Unix return code when quitting Poly/ML. This can be done using

PolyML.exit n;

where n is an integer. This quits the Poly/ML session, without updating the database, and sets the Unix return code to n mod 256.

Note: PolyML.quit() is the same as PolyML.exit 0 but has been retained for backwards compatibility with previous versions of Poly/ML.
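In a batch job the return code is usually chosen from the outcome of the work. A minimal sketch, in which runJob is a hypothetical stand-in for the real work and the exit function is a parameter (normally PolyML.exit). The function shellStatus spells out the n mod 256 rule; it assumes mod in the Standard ML sense, which rounds towards negative infinity, so that ~1 is reported as 255.

```sml
(* Return code to report: 0 for success, 1 for failure. *)
fun exitCodeFor (succeeded : bool) : int = if succeeded then 0 else 1

(* The value the Unix shell sees for PolyML.exit n, per the rule above.
   SML's mod takes the sign of the divisor, so ~1 mod 256 = 255. *)
fun shellStatus (n : int) : int = n mod 256

(* Hypothetical wrapper: run the job, treat any uncaught exception as a
   failure, then quit via `exit` (normally PolyML.exit) with the code. *)
fun finishBatch (exit : int -> unit) (runJob : unit -> bool) : unit =
  exit (exitCodeFor (runJob () handle _ => false))
```

A batch script could then end with finishBatch PolyML.exit myJob; where myJob is a hypothetical function returning true on success.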

1.5 Including files

You can include pieces of ML program from external files using PolyML.use:

PolyML.use "myfile";

This will attempt to read the file myfile. If myfile does not exist, it will read myfile.ML instead (and if that doesn't exist, it will try myfile.sml). Included files may also contain nested calls to use. File names are interpreted with respect to the current working directory. Initially, the current working directory is the same as the Unix directory in which the Poly/ML session was started. The current working directory may be changed using PolyML.cd, for example:

PolyML.cd "../examples";

selects the 'examples' subdirectory of the parent directory of the old working directory as the new working directory.
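The lookup rule for use can be modelled in a few lines. This sketch only mirrors the order described above (the name as given, then with .ML, then with .sml); it is not Poly/ML's own implementation, and it assumes that the Standard ML Basis library function OS.FileSys.access is available. As with use, relative names are resolved against the current working directory.

```sml
(* Names tried by PolyML.use, in the order described above. *)
fun useCandidates (name : string) : string list =
  [name, name ^ ".ML", name ^ ".sml"]

(* First candidate that exists, relative to the current working directory,
   or NONE. OS.FileSys.access (f, []) tests only for existence. *)
fun resolveUse (name : string) : string option =
  List.find (fn f => OS.FileSys.access (f, [])) (useCandidates name)
```

Several files can be included in a fixed order with, for example, app PolyML.use ["utils", "main"]; where utils and main are hypothetical file names.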

1.6 Persistence

Persistence means that values created in one Poly/ML session may be accessed in subsequent Poly/ML sessions without the need to explicitly reload them and without the need for recompilation.

When Poly/ML values are initially created, they are transient (non-persistent). Transient values are stored in the local heap and vanish at the end of the Poly/ML session which created them. By contrast, persistent values are stored in a disk file called a database and are accessible by all future Poly/ML sessions. When a persistent value is used during a Poly/ML session, it is paged into memory using the Unix demand paging mechanism. The function PolyML.commit causes transient values to become persistent, by copying them from the local heap to the database.

If a Poly/ML session attempts to modify a persistent object, the database is not updated directly. Instead, a copy of the modified object is made in the local heap. This means that the modification itself is transient: the current Poly/ML session will use the modified value, but subsequent sessions will see the old value, unless the database is updated using PolyML.commit. Once a modification has been committed, it is permanent; there is no way to revert to the earlier state of the database.

1.7 Debugging

Occasionally, an ML program will produce an unexpected result. When you attempt to discover the reason for this, the first thing you should do is to ensure that values are printed in enough detail to be useful. The amount of detail printed can be altered by PolyML.print_depth. For example

PolyML.print_depth 10;

will print sub-expressions nested up to 10 levels deep. Setting the print depth to 0 disables printing entirely.
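To see the effect of PolyML.print_depth, it helps to have a value nested more deeply than the current setting. The datatype and functions below are illustrative only and are not part of Poly/ML.

```sml
(* A binary tree used only to produce deeply nested values. *)
datatype tree = Leaf | Node of tree * int * tree

(* Complete binary tree of the given height (n >= 0). *)
fun nest 0 = Leaf
  | nest n = Node (nest (n - 1), n, nest (n - 1))

(* How deeply the constructors of a tree are nested. *)
fun height Leaf = 0
  | height (Node (l, _, r)) = 1 + Int.max (height l, height r)
```

Evaluating nest 6; once after PolyML.print_depth 3; and again after PolyML.print_depth 10; shows how much more of the tree is printed at the higher setting.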

The amount of detail printed in ML compiler error messages and when exceptions are raised can be controlled by PolyML.error_depth. For example

PolyML.error_depth 10;

will change the level of detail to 10 (the default is 5).

If an ML program raises an exception which gets propagated to the top level, it may be difficult to discover which function raised the exception. The function PolyML.exception_trace is provided to solve this difficulty. For example, if we define

exception Badfact;
fun badfact 0 = raise Badfact
| badfact n = n * badfact(n-1);

and then try to evaluate

map badfact [1,2,3];

Poly/ML responds with the error message

Exception- Badfact raised
Exception failure raised

which doesn't indicate which function is at fault. To discover where the exception was raised, we should evaluate

PolyML.exception_trace(fn () => map badfact [1,2,3]);

which prints a trace of the state of the stack at the time the exception was raised:

Exception trace for exception - Badfact
ML-badfact
ML-badfact
ML-map()
R
End of trace
Exception- Badfact raised
Exception failure raised
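The transcript shows that exception_trace prints the trace and then lets the exception continue to the top level. If a script should carry on after such a failure, the traced call can be wrapped in a handler. A minimal sketch, repeating the definitions above; tryTraced is a hypothetical helper whose trace parameter would normally be PolyML.exception_trace, and which can be given plain application (fn f => f ()) when no trace is wanted.

```sml
exception Badfact
fun badfact 0 = raise Badfact
  | badfact n = n * badfact (n - 1)

(* Run f under `trace` (normally PolyML.exception_trace). A Badfact that
   escapes from f is turned into NONE; other exceptions still propagate. *)
fun tryTraced (trace : (unit -> 'a) -> 'a) (f : unit -> 'a) : 'a option =
  SOME (trace f) handle Badfact => NONE
```

Assuming exception_trace re-raises the exception after printing the trace, tryTraced PolyML.exception_trace (fn () => map badfact [1,2,3]); would print the trace shown above and then return NONE instead of stopping at the top level.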

If an ML program appears to be in an infinite loop, it can be interrupted by typing

<control-C>

You now have six options, each selected by typing a single character: ‘?’, ‘s’, ‘t’, ‘c’, ‘f’ or ‘q’. Typing:

‘?’ lists the other five options.

‘s’ switches from the current ML shell to the other. The second ML shell is distinguished by having '2>' as its prompt. The second shell inherits all declarations from the first shell, but any declarations made in it are local. When you are in one shell all processing in the other shell is suspended. It is sometimes useful to switch shells to change the value of PolyML.print_depth and then switch shells back again to continue.

‘t’ prints a trace of the state of the execution stack.

‘c’ allows the execution of the program to continue normally.

‘f’ continues the execution of the program, but with the exception Interrupt raised. If, as is normally the case, the program doesn't handle this exception, the exception will be propagated to the top level and so halt the program.

‘q’ is a last resort - it quits the current Poly/ML session without updating the database (i.e. it is equivalent to PolyML.quit()).

1.8 Timing and Profiling

Poly/ML has two functions for measuring run-time efficiency. The first measures the time taken to evaluate each top-level expression.

PolyML.timing true;

activates the timing facility. The time taken to evaluate each top-level ML expression will be printed after the expression's value. Typing

PolyML.timing false;

cancels this.

The second function allows more detailed profiling of individual functions. Typing

PolyML.profiling 1;

profiles the amount of user time used by each function. Note that this figure excludes any system time that may have been used. After each top-level expression has been evaluated, a table is printed, showing how many ticks (of a 50 Hz clock) were spent inside each function. Typing

PolyML.profiling 2;

profiles the amount of space used by each function. The table printed shows how many words (1 word = 4 bytes) of storage were allocated by each function. Profiling for space increases program execution time by a factor of about 10, because every storage request must be recorded. Typing

PolyML.profiling 3;

profiles the number of emulation traps executed to support Poly/ML's arbitrary-precision arithmetic. This is important because each emulation trap requires the execution of an operating system trap-handler which, on some operating systems, may add significantly to the amount of system time used. Finally, typing

PolyML.profiling 0;

disables the profiling facility.
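The numeric argument to PolyML.profiling and the units in the resulting tables are easy to mix up. The sketch below names the four settings described above and converts table entries using the stated 50 Hz clock and 4-byte word. The names are illustrative and are not part of the PolyML structure.

```sml
(* The settings accepted by PolyML.profiling, as described above. *)
datatype profileMode = NoProfiling | UserTime | Space | EmulationTraps

fun profileCode NoProfiling    = 0
  | profileCode UserTime       = 1
  | profileCode Space          = 2
  | profileCode EmulationTraps = 3

(* Table entries for setting 1 are ticks of a 50 Hz clock. *)
fun ticksToSeconds (ticks : int) : real = real ticks / 50.0

(* Table entries for setting 2 are words of 4 bytes each. *)
fun wordsToBytes (words : int) : int = words * 4
```

With these definitions, PolyML.profiling (profileCode UserTime); has the same effect as PolyML.profiling 1;, and a table entry of 150 ticks corresponds to 3 seconds of user time.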

1.9 Session Logging

Poly/ML provides a session logging facility. When this facility is enabled, it records the text of all successful declarations in a user-specified logfile. This allows the user to recreate the current environment in a subsequent Poly/ML session by compiling the logfile with PolyML.use. By selectively turning the logging facility on and off, it is also possible to create a cleaned-up version of the recording.

A declaration (or command) is successful if

1. It successfully compiles.

2. It executes without raising an exception.

When a declaration has been compiled, the text of that declaration is stored in a log buffer. If the execution terminates without raising an exception the log buffer is written to the logfile; if not, the buffer is discarded. Executing PolyML.writeLog (see below) will also cause the log buffer to be discarded.

PolyML.isLogging ();

returns true if session logging is currently enabled.

PolyML.startLog filename;

enables the session logging facility, opening filename as a new logfile. If logging is already enabled, it raises the exception Io "Already logging".

PolyML.restartLog filename;

is like startLog, except that it opens filename in append mode, allowing an existing logfile to be extended.

PolyML.stopLog();

disables the logging facility and closes the currently active logfile. If logging is already disabled, it raises the exception Io "Not logging".

PolyML.logName ();

returns the name of the currently active logfile. If logging is currently disabled, it raises the exception Io "Not logging".

PolyML.writeLog commandString;

writes the string commandString to the logfile. If logging is currently disabled, this function has no effect. The first call of this function within any top-level declaration also has the effect of discarding the temporary log buffer; this means that the current declaration will not be recorded in the logfile.

The reason for this rather unexpected side-effect is to allow user-written interactive programs to write logs which allow their actions to be replayed in batch-mode. If the command which starts the interactive program were logged, then the batch file would require interactive user input. Instead the interactive program should use writeLog to output a sequence of equivalent batch-mode commands.
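A minimal sketch of that pattern, for a hypothetical interactive calculator whose user can change the number base. Each interactive action is logged as the batch-mode command that would reproduce it. The command name Calc.setBase is invented for illustration, and writeLog is a parameter that would normally be PolyML.writeLog.

```sml
(* Batch-mode equivalent of the user choosing a new base interactively.
   Calc.setBase is a hypothetical function of the calculator program. *)
fun setBaseCommand (base : int) : string =
  "Calc.setBase " ^ Int.toString base ^ ";\n"

(* Called by the interactive program after each change of base;
   `writeLog` would normally be PolyML.writeLog. *)
fun recordSetBase (writeLog : string -> unit) (base : int) : unit =
  writeLog (setBaseCommand base)
```

Compiling the resulting logfile with PolyML.use would then call Calc.setBase with each recorded base in turn, without requiring any user input.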

1.10 Using Poly/ML remotely

Programs which use the X Window System may run remotely; this means that the program may be run on one machine but display its output (with full windowing facilities - not just a 'dumb terminal') on a completely different machine.

For example, suppose we have logged into the machine modus, which is running X. Then we could run the baseCalc demo using

modus% poly ML_dbase
> PolyML.use "baseCalc";

which will pop up a desk calculator widget, written in Poly/ML, on the modus display. This runs the demo locally.

Note: if modus is not running X, then running the demo will raise an exception.

Now suppose that we are sitting in front of modus but want to run Poly/ML on ponens. First we have to log into ponens and set the DISPLAY variable to tell X that we want our output to appear on modus.

modus% rlogin ponens
ponens% setenv DISPLAY modus:0.0

If we now run the baseCalc demo on ponens, the calculator widget will appear on modus; we are running the demo remotely.

ponens% poly ML_dbase
> PolyML.use "baseCalc";

Try it!

1.11 Flags

The Poly/ML driver program poly accepts several flags from the Unix command line. Typing

unix% poly

(with no database supplied) will print out a complete list of the available flags. The most important of these are the -r flag and the -h flag. Using the -r flag sets read-only mode. For example:

unix% poly -r ML_dbase

will start Poly/ML using ML_dbase, but will not allow PolyML.commit to update the database. Using the -h flag changes the maximum local heap size allowed in Poly/ML. For example:

unix% poly -h 5000 ML_dbase

will prevent the local heap from growing larger than 5000K bytes. The default (if no -h flag is present) is a maximum local heap size of 6144K bytes. See chapter 9 for a discussion of the heap flags.

Using the -noDisplay flag runs Poly/ML in non-X mode, disabling the X interface. When Poly/ML is run in non-X mode, any X call will raise the exception XWindows.XWindows.

1.12 The disc garbage collector

Each update to the database increases its size, even if fewer objects are accessible than before the update. The solution to this problem is to run poly with the -d option.

unix% poly -d ML_dbase

This program performs a complete garbage collection of the database in two phases. The first phase simply copies all of the objects from the database to local memory. This phase may be safely interrupted, since it does not change the database. When all the objects are in local memory, the database is removed and created again. The objects are then copied into it using a copying garbage collector. This phase must not be interrupted, because doing so would leave the database in an incomplete state. Database garbage collection is very heavy on machine resources. It needs a large amount of virtual memory to contain the database, and will page very heavily unless provided with a lot of local memory.

Performing this garbage collection has two benefits. First, it places objects that refer to each other near each other in the database, improving locality and reducing the paging overhead of the system. Second, only accessible objects are copied, so the database is reduced to its minimum size.

You can also add the -c flag:

unix% poly -d -c ML_dbase

This adds an extra common-expression elimination phase to the garbage collection process. Whenever the garbage collector finds two immutable objects that represent the same value, it eliminates one of them from the database. Running with the -c flag roughly doubles the virtual memory requirements of the disc garbage collector, but makes a useful additional contribution to reducing the size of the database.