Porting Poly/ML

Porting Poly/ML to a new operating system or architecture is unfortunately a lot more complicated than recompiling the sources.  The complication lies in the fact that a Poly/ML database is both operating system dependent and architecture dependent.

Porting Poly/ML to a new operating system

A database is formed from two segments which need to be loaded into memory. On most operating systems this is done by "memory mapping" the segments.  This involves asking the operating system to load the pages of the segments into memory as they are required, and generally gives much better performance than the alternative of allocating memory and reading the segments explicitly.  The contents of the segments need to be loaded at specific virtual addresses.  That is because they contain data structures, such as list cons-cells, which contain pointers to other objects.  For those pointers to be plain machine addresses that work without adjustment, every object needs to be loaded at the same address each time.  It would be possible to relocate all the objects when they were loaded, but that would involve loading and updating every object and would lose the advantages of memory mapping.
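
To make the cost of relocation concrete, here is a minimal sketch in C. It is not the Poly/ML loader. It assumes a much simplified segment that is just an array of machine words, and it treats any word whose value falls inside the old address range as a pointer; the real database format is not described here and will differ. The name relocate_segment and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation pass over a segment that was saved for the
   address range [old_base, old_base + seg_len) but has been loaded at
   new_base.  Simplifying assumption: every word whose value lies in the
   old range is a pointer.  Returns the number of words rewritten. */
size_t relocate_segment(uintptr_t *words, size_t nwords,
                        uintptr_t old_base, uintptr_t new_base,
                        size_t seg_len)
{
    size_t updated = 0;
    for (size_t i = 0; i < nwords; i++) {
        /* Every word has to be read, so every page gets touched. */
        uintptr_t w = words[i];
        if (w >= old_base && w - old_base < seg_len) {
            words[i] = new_base + (w - old_base);
            updated++;
        }
    }
    return updated;
}
```

Because the loop reads every word, every page of the segment has to be brought into memory, and every page containing a pointer is modified. That is the work that mapping the segments at fixed addresses avoids.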

When porting Poly/ML to a new operating system the first requirement is to find a suitable address range in which to map the database.  Operating systems tend to partition up the address space between program space, data space and stack with some area available for memory mapping.  What that area is depends on the operating system and any special requirements of the hardware.  Often the only way to find out what addresses are suitable is by experiment.  Try mapping a file (on Unix this uses the mmap function) without specifying an address and see what address it gets mapped at.  Then see whether it is possible to map the file giving that address explicitly.
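
The experiment described above might be sketched as follows on a POSIX system. This is an illustration only: the function names probe_mapping and report_probe are hypothetical and are not part of the Poly/ML sources. The second mmap call passes the address without MAP_FIXED, so the operating system treats it as a hint and the result has to be compared with what was asked for.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if a second, explicit mapping landed at
   the address the OS originally chose, 0 if it landed elsewhere, and -1 on
   failure.  *chosen receives the address the OS picked. */
int probe_mapping(const char *path, size_t len, void **chosen)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Step 1: no address given, so the OS picks one. */
    void *first = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (first == MAP_FAILED) {
        close(fd);
        return -1;
    }
    *chosen = first;
    munmap(first, len);

    /* Step 2: ask for that address explicitly (as a hint, no MAP_FIXED). */
    void *second = mmap(first, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (second == MAP_FAILED)
        return -1;
    int same = (second == first);
    munmap(second, len);
    return same;
}

/* Prints the outcome of one probe in human-readable form. */
void report_probe(const char *path, size_t len)
{
    void *addr = NULL;
    int r = probe_mapping(path, len, &addr);
    if (r < 0)
        printf("mapping %s failed\n", path);
    else
        printf("OS chose %p; explicit remap %s\n", addr,
               r ? "landed at the same address" : "landed elsewhere");
}
```

Running report_probe on any readable file of at least len bytes gives a first candidate address. Systems that randomise the layout of the address space may report a different address on each run, so it is worth repeating the probe several times before settling on a range.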

A database consists of a header and two segments: the immutable area and the mutable area.  The header is a small section at the start of the database, typically a single page, containing information about the database.  The mutable area typically forms 10% of the database and contains values such as refs and arrays whose values can be modified.  The major part of the database is immutable data, such as tuples, datatypes, vectors and, notably, segments containing machine code.  The machine code is native machine code which is why the database is architecture-dependent.  The operating system call which maps files allows the mutable and immutable segments to be mapped to separate regions of the address space.

As well as the two segments for a database there is also a segment, known as the IO segment, which is initialised by the run-time system before a database is loaded.  It enables code in the database to call functions in the run-time system and, as with the other segments, must be created at a specific address.  Although it does not correspond to a portion of a real file it is normally created using the same mmap function used to map the segments of the database.  Although not strictly necessary, the local (temporary) memory used by a Poly/ML program while it is running is also allocated in a similar way.
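
A region that does not correspond to a file, such as the IO segment, can be requested from mmap as an anonymous mapping. The following is a minimal sketch assuming a Unix-like system that provides MAP_ANONYMOUS (or the older MAP_ANON); the function name reserve_at is hypothetical and the code is not taken from the run-time system. It deliberately avoids MAP_FIXED, which would silently replace anything already mapped at the address, and instead checks whether the operating system honoured the request.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD spelling */
#endif

/* Hypothetical helper.  Tries to create a zero-filled, writable region of
   len bytes at exactly `wanted`.  Returns `wanted` on success, or NULL if
   the region could not be placed at that address. */
void *reserve_at(void *wanted, size_t len)
{
    void *got = mmap(wanted, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED)
        return NULL;
    if (got != wanted) {
        /* The OS treated the address as a hint and chose another one. */
        munmap(got, len);
        return NULL;
    }
    return got;
}
```

A call such as reserve_at((void *)0x3F000000, 0x2000) would attempt to create an area with the i386 IO_BOTTOM and IO_TOP values listed later in this document. Whether that succeeds depends on what the operating system has already placed in the address space.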

Once suitable addresses have been found, the values should be added to the "addresses.h" file using appropriate ifdefs.  This may be a good time to add your operating system to the "configure" shell script so that it sets the defines.  The table below shows the values currently used in the supported configurations.  The values for some other systems can be found in "addresses.h".

Variable        Meaning                                      Linux/FreeBSD/Windows on i386   Solaris on Sparc
-------------   ------------------------------------------   -----------------------------   ----------------
H_BOTTOM        Start of space for database                  0x20000000                      0xC0040000
H_TOP           Limit of space for database                  0x3F000000                      0xD0000000
IO_BOTTOM       Start of IO area                             0x3F000000                      0xC0000000
IO_TOP          Limit of IO area                             0x3F002000                      0xC0002000
LOCAL_IBOTTOM   Start of space for local immutable data      0x01000000                      0x20000000
LOCAL_ITOP      Limit of memory for local immutable data     0x1C000000                      0xA0000000
LOCAL_MBOTTOM   Start of space for local mutable data        0x1C000000                      0xA0000000
LOCAL_MTOP      Limit of memory for local mutable data       0x20000000                      0xC0000000
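
As an illustration of what "appropriate ifdefs" might look like, the two columns of the table above could be written as the following fragment. This is a hypothetical excerpt, not a copy of the real "addresses.h": the values are those in the table, but the macros tested in the #if lines (__i386__ and __sparc__) are compiler-predefined names chosen for the example, and the real file may instead test names set by the "configure" script.

```c
/* Hypothetical excerpt in the style of addresses.h.  The #if conditions
   are an assumption made for this example; only the values come from
   the table. */
#if defined(__i386__)            /* Linux/FreeBSD/Windows on i386 */
#define H_BOTTOM       0x20000000
#define H_TOP          0x3F000000
#define IO_BOTTOM      0x3F000000
#define IO_TOP         0x3F002000
#define LOCAL_IBOTTOM  0x01000000
#define LOCAL_ITOP     0x1C000000
#define LOCAL_MBOTTOM  0x1C000000
#define LOCAL_MTOP     0x20000000
#elif defined(__sparc__)         /* Solaris on Sparc */
#define H_BOTTOM       0xC0040000
#define H_TOP          0xD0000000
#define IO_BOTTOM      0xC0000000
#define IO_TOP         0xC0002000
#define LOCAL_IBOTTOM  0x20000000
#define LOCAL_ITOP     0xA0000000
#define LOCAL_MBOTTOM  0xA0000000
#define LOCAL_MTOP     0xC0000000
#else
#error "No address ranges defined for this platform yet"
#endif
```

In both columns the four regions do not overlap, and each limit is either equal to or below the start of the next region. A new set of values should preserve that property.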

Build the driver program using the newly modified version of "addresses.h".  If you are lucky it will all compile, but the chances are you will have to make a few changes to account for operating system differences, such as differences in include files.  If there is already a code-generator for your architecture, take a copy of a database for that architecture and try compacting it using "poly -d".  It may be able to modify the database and relocate it to the addresses you need.  If that fails, perhaps because the address range for which the database was originally built is too different from what your system expects, you may have to make some temporary modifications to enable the database to be loaded.

If there is no database for your architecture you can build the version of Poly/ML which includes a byte-code interpreter.  In this case start with the portable database PortDB.txt.  This comes as a text version and contains no assumptions about the address range to which it will be loaded.  You first need to build the "readport" program by typing "make readport" in the "driver" directory and then run "readport PortDB.txt NewDB".  This will read the portable database and create a new database, NewDB, using the address range you set up in "addresses.h".  You should now be able to run poly with this new database.   The interpreter is reasonably fast and should allow you to experiment with Poly.   To get the best out of it you need a native code version and this is the subject of the next section.

Porting Poly/ML to a new architecture

TO BE WRITTEN.

Last updated: 15 January 2002 by David Matthews.