Machina

James Amundson
Fermi National Accelerator Laboratory
amundson@fnal.gov

Overview

Machina is a system for describing and working with projects. A Machina project is an acyclic directed graph whose nodes represent objects and whose edges represent the dependency relationship among the objects. The objects themselves represent file-like things such as sources and products as well as actions such as compiling. (Throughout this document I refer to this acyclic directed graph simply as ``the tree.'') The design of Machina explicitly supports abstraction. This abstraction allows Machina projects to be simultaneously simpler and more general than they could ever be otherwise. The dependency structure of Make can also be thought of as an acyclic directed graph. However, the nodes in the Make tree always represent files.1 Make also infers the dependency tree from rules provided by the user. In Machina, the user describes the tree explicitly.

The text-mode user interface is through Python. The user writes a short Python script to describe the tree. All aspects of Machina can be queried. Output is available in either human-readable or machine-readable form. For the tree itself, the machine-readable output form is an XML file. This file is also available as an alternate form of input.

Machina provides a set of objects to deal with common programming languages, documentation formats, packaging systems, etc. The set of objects is called the Machina standard library. The standard library is extensible at the system, user and project level. Machina itself is written in Python, so objects written by users have exactly the same capabilities as objects defined by the standard library.

Machina achieves scalability to large projects through packages. In this context scalability means not only the ability to deal efficiently with project sizes ranging from very small to very large, but also the ability for a developer to deal with a small part of a large project. Packages allow developers to divide their projects into modules which present well-defined interfaces to the external world. Packages can be used actively and passively. Active packages can be built and rebuilt as part of the process. Passive packages can represent pre-built and pre-installed packages.

A simple example

I start with a simple example. It is tempting to start with a program that prints the inevitable ``Hello world.'' For the purposes of demonstrating a build system, ``Hello world'' is too simple - it can be compiled with a single command line. A build system to replace a single command line would have to be contrived. Instead, I have created a package to say ``hello'' in multiple languages.

The hello package is written in C++. The implementation consists of two source files, hello_english.cc, and hello_french.cc, as well as a header file, hello.h. These are to be compiled into a library, libHello. I have also written a usage example, example_hello.cc, which is to be compiled into an executable, example_hello, which uses the library libHello.

The previous paragraph is how I would describe my work to another human. Here is how I describe my project to Machina:

package = Package('hello')

lib_sources = \
    Sources(['hello_english.cc','hello_french.cc'],'C++')

library = Library('Hello')

CompileConnect(lib_sources,library)

exe_sources = Sources(['example_hello.cc'],'C++')

executable = Executable('example_hello')

CompileConnect(exe_sources,executable)

Connect(library,executable)

I can now compile my package:

> machina build

machina performing "build all":

compile package hello

    compile library Hello

        compile hello_english.cc

        compile hello_french.cc

        link library Hello

    compile executable example_hello

        compile example_hello.cc

        link executable example_hello

Now that I have seen that my code compiles successfully, I would like to try compiling it with optimization turned on. Since I haven't specified any qualifiers, Machina has used the defaults. To find out what the defaults are, I can simply ask Machina:

> machina query package=hello,qualifiers

package hello:

    qualifiers = Debug

I can also ask Machina to list which qualifiers are available:

> machina list qualifiers

available qualifiers:

    standard library:

        Debug

        Optimize(n)

default qualifier is Debug

(This is an abbreviated list for the purposes of this example.) To change the defaults for my package, I simply have to add the line

package.qualifiers.set(Optimize(2))
to the Machina file given above. Invoking Machina again will cause the entire project to be compiled again with optimization. Machina knows it needs to recompile the files because the metadata for the existing files indicates that they were compiled with different qualifiers than those currently being requested.

To round out this example, suppose we would like to have both optimized and debugging versions of the library. We add the line

library.qualifiers.set([Optimize(2),Debug])
Machina now knows that we want the Hello library built with two sets of qualifiers. To check, we can do

> machina query library=Hello,qualifiers

library Hello:

    qualifiers = Debug,Optimize(2)

which verifies that Machina understood what we told it. Invoking Machina again will produce two versions of the library: one compiled with the debugging options and one compiled with the optimizing options.

The Machina kernel

The core concepts of Machina are tree objects; the tree itself, which is a directed graph of tree objects; attributes, which can be assigned to tree objects; and non-tree actions, which act on the tree but are not part of it.

Tree objects

Tree objects do the bulk of the work in Machina. A tree object keeps a list of its children and its attributes. It can respond to queries and perform actions. At a minimum, it must be able to answer whether the component it represents is dirty, bring itself up-to-date, and list its children and its attributes. Examples of tree objects are products, such as libraries and executables; sources, such as source files (more generally, translation units; see the section on translation units); and actions, such as ``compilation.'' While a tree object may boil down to a file and a few functions, in general it may contain many files, or no files at all. Tree objects may use files for persistence. The files associated with an object may include source files or build products, but they also include metadata for the objects and the files they store. Through metadata, an object can store not only the fact that a source file has been compiled into an object file, but also the compiler options that were used to compile that file.
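The role of metadata in deciding whether an object is dirty can be illustrated with a small standalone sketch. It does not use Machina; the class CompiledObject and its methods are hypothetical names, and the use of a content hash (rather than, say, timestamps) is an assumption made only for the illustration.

```python
import hashlib


class CompiledObject:
    """Hypothetical tree object that remembers how it was last built."""

    def __init__(self, source_text, qualifiers):
        self.source_text = source_text
        self.qualifiers = tuple(qualifiers)
        self.metadata = None  # nothing has been built yet

    def _fingerprint(self):
        # Record both what was compiled and how it was compiled.
        digest = hashlib.sha256(self.source_text.encode("utf-8")).hexdigest()
        return {"source": digest, "qualifiers": self.qualifiers}

    def is_dirty(self):
        # Dirty if never built, or if source or qualifiers have changed.
        return self.metadata != self._fingerprint()

    def bring_up_to_date(self):
        # A real object would invoke the compiler here (assumption).
        self.metadata = self._fingerprint()


obj = CompiledObject("int main() { return 0; }", ["Debug"])
obj.bring_up_to_date()
obj.qualifiers = ("Optimize(2)",)  # same source, different qualifiers
print(obj.is_dirty())
```

Because the stored metadata includes the qualifiers, changing only the qualifiers is enough to mark the object as needing a rebuild.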

Dependencies and first-order dependencies

Tree objects can be queried as to their dependencies. The dependencies of an object are all the objects subservient to the object in the tree. For many applications, it is important to be able to isolate only the first-order dependencies of an object. For example: Library A uses Library B, which in turn uses Library C. Although A's dependencies include both B and C, it only directly refers to B. This is called a first-order dependency. Machina allows querying first-order dependencies of an object as well as all dependencies of an object.
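The distinction can be made concrete with a standalone sketch of the A, B, C example. The dictionary layout and the function name all_dependencies are hypothetical and are not part of Machina.

```python
# Hypothetical model: each key depends directly on the names in its list.
first_order = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
}


def all_dependencies(name, graph):
    """Return every object reachable from name, in discovery order."""
    seen = []
    stack = list(graph[name])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph[current])
    return seen


print(first_order["A"])                    # first-order dependencies of A
print(all_dependencies("A", first_order))  # all dependencies of A
```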

Attributes and scopes

Attributes are assigned to objects. In general, tree objects pass their attributes from parent to child, unless the attributes are designated as local. Each object understands only a limited set of attributes; which attributes an object accepts is determined by the attributes' scope. Scope is a hierarchical concept. For example, both C and Fortran compilers would accept an attribute with scope Compiler, whereas only the C compiler would understand an attribute with the scope CCompiler and only gcc would understand attributes with scope GCCCompiler.
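One way to picture hierarchical scope is as a class hierarchy. The following is a minimal sketch under that assumption; FortranCompiler and the function accepts are hypothetical names, and Machina's actual scope mechanism may differ.

```python
# Hypothetical scope hierarchy; Compiler, CCompiler and GCCCompiler
# mirror the scope names used in the text.
class Compiler:
    pass


class CCompiler(Compiler):
    pass


class GCCCompiler(CCompiler):
    pass


class FortranCompiler(Compiler):
    pass


def accepts(tool_scope, attribute_scope):
    """A tool understands an attribute whose scope is the tool's own
    scope or one of its ancestors in the hierarchy."""
    return issubclass(tool_scope, attribute_scope)


print(accepts(FortranCompiler, Compiler))   # generic compiler attribute
print(accepts(FortranCompiler, CCompiler))  # C-specific attribute
```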

Child objects can set constraints on the attributes they inherit. Constraints are (local) attributes themselves. Parent objects can specify whether attributes are optional or required. If a child rejects a required attribute, an error condition is raised. For example, an object may be given the attribute ``optimized.'' It then requests that its several children be optimized. The developer may discover that the compiler exhibits a bug when optimizing one particular source file. He then designates that object as ``non-optimizable.'' Because the attribute was passed as a request instead of a requirement, the build proceeds as expected: the original object is constructed of several parts, all but one of which are optimized. Another example involves thread safety. Here an object is given the attribute ``threadsafe'' as a requirement; all of its children must be threadsafe. If the developer were to assign the ``non-threadsafable'' attribute to one of the children, Machina would throw an error at the time the tree is constructed.
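The difference between a requested and a required attribute can be sketched in a few lines of standalone Python. The names Node, AttributeRejected and assign are hypothetical; in particular, skipping the whole subtree of a node that declines an optional attribute is an assumption of this sketch, not a documented Machina behavior.

```python
class AttributeRejected(Exception):
    """Raised when a child rejects an attribute its parent requires."""


class Node:
    def __init__(self, name, rejects=()):
        self.name = name
        self.rejects = set(rejects)  # local constraints
        self.attributes = set()
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def assign(self, attribute, required=False):
        if attribute in self.rejects:
            if required:
                raise AttributeRejected(
                    f"{self.name} rejects required attribute {attribute!r}")
            return  # optional attribute: this node simply goes without it
        self.attributes.add(attribute)
        for child in self.children:
            child.assign(attribute, required)


library = Node("library")
good = library.add_child(Node("good.cc"))
buggy = library.add_child(Node("buggy.cc", rejects={"optimized"}))

library.assign("optimized")  # a request: buggy.cc is left unoptimized
print(sorted(good.attributes), sorted(buggy.attributes))
```

Passing required=True for an attribute that some child rejects raises AttributeRejected instead of quietly skipping that child.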

Non-tree actions

Non-tree actions act on the tree, but are not part of the tree. The simplest non-tree action is a query of the tree itself. Other such actions might be to request a build of an arbitrary set of objects on the tree, to remove intermediate build products, to remove all build products, to install build products, etc. Formally, non-tree actions could be understood as actions on the tree extended by a unique root node; however, I consider this distinction useful because it allows for a simple description of the tree even though the actions on it may be extended arbitrarily.

Packages

Machina achieves flexibility and scalability through the use of packages. Packages allow modular structure for the build system; projects can be divided into a hierarchy of packages and subpackages. Machina allows two-way communication between packages, so they are an active part of the build system. Furthermore, packages can define which of their aspects they export as well as which of other packages' properties they will import, minimizing the interconnectedness of the build system.

Exports
Packages can specify their exports, including both objects and attributes. A typical package might export a library (a tree object) and the attributes needed to build against the library, for example header include paths.
Imports
When a package wants to use the services of another package, it declares that it is importing the second package. By default, all exported properties of the second package are made available to the original package. If desired, the import can be restricted to named attributes of the imported package.
Modifiable and nonmodifiable packages
Packages in Machina are always ``active'' in the sense that they can be queried by the user and the build system. However, under many circumstances, it may be necessary to designate an imported package as nonmodifiable, i.e., it cannot be built in the process of building the primary package. The most common example consists of two developers using each other's packages. Each developer has her package installed in her own filespace. The developers can use each other's packages for compiling their own packages, but they do not have build permission in their colleagues' areas. It is preferable to have the nonmodifiable attribute designated at import time. It is also possible, however, to make nonmodifiable an attribute of the package itself.
Metapackages
Packages do not have to do anything on their own. A package can be nothing but a collection of defaults, a collection of other packages, or a combination thereof. Such packages are metapackages.
``External'' packages
Although Machina is designed to work well with packages defined in its own framework, it is inevitable that products of other build systems will be needed in Machina projects. It is straightforward to wrap an external library with a small segment of Machina code such that the package appears as any other package. External packages will usually be designated as nonmodifiable packages, but there is no reason why the wrapping Machina code could not invoke make when appropriate. Machina wrappers for standard system libraries, etc., can be included in the installation, or provided by local extensions.
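The export and import rules described in this section can be illustrated with a standalone sketch. It is not Machina code: the Package class here is a toy stand-in, and import_package, its names parameter and the example paths are all hypothetical.

```python
class Package:
    """Hypothetical package holding a table of exported properties."""

    def __init__(self, name):
        self.name = name
        self.exports = {}

    def export(self, key, value):
        self.exports[key] = value


def import_package(package, names=None):
    """Return the exported properties visible to an importer.

    With names=None every export is visible; otherwise only the named ones.
    """
    if names is None:
        return dict(package.exports)
    missing = [n for n in names if n not in package.exports]
    if missing:
        raise KeyError(f"{package.name} does not export {missing}")
    return {n: package.exports[n] for n in names}


gamma = Package("Gamma")
gamma.export("libGamma", "lib/libGamma.a")
gamma.export("include_path", "include")

everything = import_package(gamma)
only_headers = import_package(gamma, names=["include_path"])
print(sorted(everything), sorted(only_headers))
```

Anything a package does not export stays invisible to importers, which is what keeps the interconnectedness between packages small.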

Translation units and dependencies

This section is logically a subsection of the standard library description. However, the importance of translation units to practically every build process, along with the historical difficulties in dealing with them, warrants a separate section. A translation unit is the stream of characters seen by the compiler at compile time. Modeling the relationship between the elements of a translation unit and the corresponding compiler output is crucial for designing a build system. Unfortunately, the dependency mechanism in make is not rich enough to adequately model this relationship in any but the trivial case, i.e., where there is a one-to-one mapping between translation units and files. The relation between make and translation units is discussed in more detail in the subappendix ``Make and translation units.''

Take the simplest non-trivial case of a compilation. I have a file foo.c that includes the header file foo.h. I compile foo.c to produce foo.o. (Note: Object files like foo.o are usually intermediate steps in the build process. As a general rule, Machina abstracts away intermediate files. Nonetheless, I will continue to discuss the actual object file in this section for conceptual simplicity.) Conceptually, the relationship between these three files looks like this:



[Figure: compilation-unit-main.eps]



foo.o depends on the entire translation unit, which includes both foo.h and foo.c. Dealing with translation units in a build system is somewhat subtle. The simplest thing to do would be to give the list of files associated with each translation unit. That solution is unacceptable because it creates two points of maintenance: if the developer modifies foo.c to also include bar.h, she would have to also modify the list in the build system.

The model for translation units in Machina is based on the following observation: While a translation unit is made up of the union of all the files in the translation unit, those files are not equally difficult to find. At least one file must be trivially available; after all, the compiler needs to be told where to start. The rest of the files, however, are difficult to discover. In general, the entire translation unit must be parsed before all the files can be discovered, an operation which can take the same order of time as compiling the unit itself. Based on this observation, Machina deals with the above situation by declaring that foo.o depends on the translation unit object represented by foo.c. The list of files other than foo.c in the translation unit represented by foo.c is available to foo.o through foo.c on a need-to-know basis. Machina refers to foo.c as the primary file in a translation unit and foo.h as (one of) the secondary file(s) in the translation unit. Here is a representative list of cases:

The Machina standard library

The Machina standard library contains objects for dealing with common build items. The library is extensible. The goal is always to provide developers with easy access to abstraction. The build process can be made arbitrarily specific to certain platforms, compilers, etc., but the abstract description is always the most obvious one.

Attributes

Generic attributes

Automatically defined generic attributes include

Platform
the hardware/operating system combination.
Language
as in human language.
These attributes have global scope by default.

Qualifiers

Qualifiers are attributes that qualify the actions that occur. These are the generic qualifiers.

Debug
generates debugger symbols.
Optimize(X)
compile with optimization level X.
Relocate
produce relocatable (position independent) code.
Threadsafe
produce threadsafe code.
Profile
produce profiler code.
As an example, compilers for each (computer) language define language-specific qualifiers with the appropriate scope. Specific implementations of compilers define qualifiers to exhaust the remaining command line options, again with the appropriate scope.

Constraints

For each qualifier, there is a corresponding constraint. Constraints can specify whether to reject qualifiers or provide alternatives. Constraints default to local scope with the type appropriate to the qualifier they constrain.

Locations

Locations are lists of possible locations for files and/or objects. All of the locations in the standard library refer to directory paths, but they may be constructed to refer to URLs, database locations, etc. Location objects have append, prepend, and remove methods.
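A location object with these three methods might look like the following standalone sketch. The class body and the example paths are assumptions made for illustration; only the method names append, prepend and remove come from the description above.

```python
class Location:
    """Hypothetical ordered list of places to search; earlier entries win."""

    def __init__(self, paths=()):
        self.paths = list(paths)

    def append(self, path):
        # Searched after everything already in the list.
        self.paths.append(path)

    def prepend(self, path):
        # Searched before everything already in the list.
        self.paths.insert(0, path)

    def remove(self, path):
        self.paths.remove(path)


include_search = Location(["/usr/include"])
include_search.append("/usr/local/include")
include_search.prepend("./include")
include_search.remove("/usr/include")
print(include_search.paths)
```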

Search locations
tell Machina where to look for Machina objects, particularly projects. Other search locations tell compilers where to look for header files, linkers where to look for libraries, etc.
Install locations
tell Machina where to put build products at install time.
Build file locations
All files produced during a build, including metadata files, intermediate files and final products, are placed in locations determined by the tree objects and their location attributes. This allows:

Tree objects

All tree objects have methods for attaching parents and children. They also have methods for updating and determining if they have been modified. All tree objects can be queried. Tree objects can have built-in dependencies, that is, all instances of such objects will contain the same dependency. This is useful for the case where a project has to build a tool that it will later use to build the rest of the project.

Source objects

Source objects represent translation units by default. They can be queried to list their primary and/or secondary files.
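To illustrate what listing the secondary files of a translation unit involves, here is a deliberately simplified standalone sketch. It follows only quoted #include lines in an in-memory table of files and ignores conditional compilation, macros and search paths, so it is far cruder than a real preprocessor. The function name secondary_files and the file contents are hypothetical.

```python
import re

# Matches lines such as:  #include "foo.h"
INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def secondary_files(primary, files):
    """Return the files reachable from primary through quoted includes."""
    found = []
    pending = [primary]
    while pending:
        current = pending.pop()
        for name in INCLUDE.findall(files.get(current, "")):
            if name != primary and name not in found:
                found.append(name)
                pending.append(name)
    return found


files = {
    "foo.c": '#include "foo.h"\nint foo(void) { return 1; }\n',
    "foo.h": '#include "types.h"\nint foo(void);\n',
    "types.h": "typedef int status;\n",
}
print(secondary_files("foo.c", files))
```

Only the primary file has to be named by the user; the secondary files are discovered from it, so there is a single point of maintenance.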

Product objects

Products are the goal of building software. They are the items one wants to keep, install, etc. All product objects have the local attribute product.

Library
General library. Library-specific attributes include shared and static. Libraries keep track of their first-order dependencies on other libraries.
Header
Header files made available outside the project.
Executable
General executable.
Standalone Object
Some projects use object (.o) files without linking them into libraries. More commonly, object files will be hidden inside of compiler objects.
Documentation
support for documentation in a variety of formats.

Intermediate objects

The generic intermediate object is the Compile object. Specializations of Compile are available for each language in the default standard library list. Compile uses the corresponding Compiler tool.

Non-tree Actions

Build

Build classes tell the tree to build a product or set of products. Classes to build each product and all the products are provided. Build itself refers to a default, which normally refers to all the products. Users can easily subclass Build to build desired sets of products.

Test

The test class can build a set of products, then execute them and analyze the results.

Install

Install comes with two subclasses by default. System install installs objects with the installable attribute into the system location as defined by the system install location. Local install installs build products in an easily accessible area in the local tree.

The site-wide settings should include a method for installing the project using a packaging system appropriate to the platform.

Bundle

Bundle creates a packaged distribution of the files in a project.

Clean

Clean removes intermediate files from a project. Metadata allows the user to clean objects that are no longer part of the project. Specializations of clean will also remove all build products, installed products, etc.

Tools

Tools are provided for the use of the tree objects. In most cases, the tree objects themselves should provide needed functionality so that developers do not need to use the tools themselves.

Compiler

Compiler exports an interface for generic compilation. Specializations of Compiler are available for each language in the default standard library list. There is a default compiler implementation defined by the Machina defaults. The generic language Compiler classes take their methods from the default implementation, but preserve the generic interface. The implementation specific classes are available for use only when the details of a compiler are desired or needed.

Linker

Linker provides basic linker functionality. It may use one or more Compiler classes.

Archiver

Archiver provides basic archiver functionality. It may use one or more Compiler and/or Linker classes.

TextProcessor

TextProcessor exports an interface for processing text objects. Subclasses exist for both preprocessing code and processing documentation files with LaTeX, SGML tools, etc.

Methods for building trees

Connect

Connect(foo,bar) connects the output of foo to the input of bar. Connect(foo,baz,bar) connects the output of foo to the intermediate object baz, then connects the output of baz to the input of bar. The first argument to Connect can be a list of objects instead of a single object. CompileConnect inherits from Connect. CompileConnect(source,product) performs a Connect(source, compile-object,product), where compile-object is automatically determined by the type of source.
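The chaining behavior described for Connect can be sketched without Machina as follows. TreeObject and the lowercase function connect are hypothetical stand-ins; in particular, feeding several sources into one shared intermediate object is an assumption of this sketch.

```python
class TreeObject:
    """Hypothetical node; children are the objects whose output it consumes."""

    def __init__(self, name):
        self.name = name
        self.children = []


def connect(sources, *rest):
    """Chain sources -> [intermediate ...] -> target.

    sources may be a single object or a list of objects.
    """
    if not rest:
        raise ValueError("connect needs at least a source and a target")
    if not isinstance(sources, (list, tuple)):
        sources = [sources]
    inputs = list(sources)
    for stage in rest:
        stage.children.extend(inputs)
        inputs = [stage]
    return rest[-1]


english = TreeObject("hello_english.cc")
french = TreeObject("hello_french.cc")
compile_step = TreeObject("compile")
library = TreeObject("libHello")
executable = TreeObject("example_hello")

connect([english, french], compile_step, library)  # three-argument form
connect(library, executable)                       # two-argument form
print([c.name for c in library.children], [c.name for c in executable.children])
```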

Sources

Sources is a shortcut for producing a group of source objects of a particular type from a list of filenames.

The Machina interface

Everything in a Machina project revolves around the tree. The tree is described by the user in a Python file. User-defined objects are described by separate Python files. Machina has two forms of output: human readable and machine readable. The default is human readable. Machine readable output in XML format is available as a command-line option. Every aspect of the current Machina project can be queried from the command line, or programmatically.

Machina default locations

Machina has a default location hierarchy: project, user, site, installation, in that order. The project settings are determined in the Machina project files. Users may also specify defaults and provide objects in their own local areas. Correspondingly, the site Machina installation can contain special objects for the system. The last location searched is the Machina installation area, which contains the standard library.
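The search order can be pictured as a lookup through layered tables, with the first match winning. This is a standalone sketch; the function resolve, the table layout and the example entries are hypothetical and only the order project, user, site, installation comes from the text.

```python
# Order taken from the text: project first, installation last.
SEARCH_ORDER = ("project", "user", "site", "installation")


def resolve(name, layers):
    """Return (level, object) for the first level that defines name."""
    for level in SEARCH_ORDER:
        objects = layers.get(level, {})
        if name in objects:
            return level, objects[name]
    raise LookupError(f"{name} not found in any location")


layers = {
    "installation": {"Library": "standard Library",
                     "Optimize": "standard Optimize"},
    "site": {"Optimize": "site-tuned Optimize"},
    "project": {"DBGenCompile": "project DBGenCompile"},
}

print(resolve("Optimize", layers))  # the site definition shadows the standard one
print(resolve("Library", layers))   # falls through to the installation
```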

Machina command-line forms

Machina list
lists objects known to the system and their origin. With no arguments, it lists the classes of things that are listable. With an argument, it lists objects inheriting from the type given.
Machina help
produces this list of forms and the list of command line options below.
Machina prepare
constructs the tree corresponding to the current Machina input file. It reports any problems encountered in the tree definition.
Machina query
allows the user to query any object in the current project, including the project itself.
Machina import
takes as its input the (machine readable) output of Machina query. The output is a Python file that will create the input. Therefore ``Machina -machine query project | Machina import'' will take a Python file describing a project, turn it into a machine-readable description of the project, then turn that into another Python file with the same functionality as the first. The point is to allow development environments to communicate with Machina through the machine readable form, but still allow the option of exporting a human-readable project file for use without the development environment.
Machina <non-tree action>
performs the given non-tree action.

Machina command-line options

-verbose, -verbose=X
set the level of verbosity of output. 0 corresponds to no output except error messages. 1, the default, gives progress messages plus errors. Higher numbers give more detail.
-continue
tells Machina not to stop at the first error.
-machine
tells Machina to generate machine-readable output.
-help
is the same as ``Machina help'' above.
-parallel=X
tells Machina to build using X parallel processes. Useful on multiprocessor machines.
-parallel-method=<method>
tells Machina to spawn parallel processes using method <method>. The modern trend for large scale computing is away from large multiprocessor machines towards clusters of commodity-based machines. Unfortunately, the trend has not progressed to the point where there is one standard method for communicating between clustered commodity-based machines. Competing methods for clustering and authentication make the number of possibilities large and growing. Machina allows the user to specify arbitrary Python code to launch processes on other machines. Standard methods will be available in the Machina standard library when they exist.

Communication with development environments

Machina is designed to be an integral part of a development environment, but it is not an environment in itself. The programmatic interface is designed to allow real two-way communication with (an) external tool(s). I envision a fully graphical development environment. Completely different tools, for example (X)emacs-based or even text-based environments, are left to the taste and desires of developers. Because a Machina project keeps track of all the objects in a project, it will be straightforward for development environments to implement, for example, searches through all the source files within the project, graphical views of the project tree, etc.

A complex example

I conclude with a more realistic case. Imagine a project where we are actively working on two packages, Alpha and Beta. Alpha and Beta depend on Gamma, which has already been installed on the development machine.

Alpha creates a library, libAlpha, and an executable, AlphaApp, for public use. AlphaApp uses its libAlpha and the libraries exported from Beta. In addition, AlphaApp uses an internal library, libDB, which is created out of code dynamically generated from the AlphaApp source files by a Python script file DBGen.py. Here is Alpha's Machina file:

package = Package('Alpha')

# Add external package Gamma

Gamma = ImportPackage('Gamma')

# Define libAlpha

libAlpha = Library('Alpha')

libAlpha_sources = \
   Sources(['myfile1.c','myfile2.c','myfile3.c',\
   'myfile4.c','myfile5.c','myfile6.c'],'C')

exported_headers = Header('Alpha.h','C')

CompileConnect(libAlpha_sources,libAlpha)

# Define AlphaApp

AlphaApp = Executable('AlphaApp')

AlphaApp_sources = \
   Sources(['appfile1.cc','appfile2.cc','appfile3.cc'],'C++')

AlphaApp_dbsources = \
   Sources(['dbsource1.cc','dbsource2.cc','dbsource3.cc'],'C++')

CompileConnect(AlphaApp_sources,AlphaApp)

CompileConnect(AlphaApp_dbsources,AlphaApp)

Connect(libAlpha,AlphaApp)

Connect(Gamma.libraries(),AlphaApp)

# Define libDB

libDB = Library('DB')

# DBGenCompile is defined separately

libDB_gensources = DBGenCompile(AlphaApp_dbsources)

# The output of DBGen is C++ source

Connect(libDB_gensources,CXXCompile(),libDB)

Connect(libDB,AlphaApp)

# Define the interface we export to other packages

package.export(AlphaApp)

package.export(libAlpha)

package.export(exported_headers)

Where we have defined

import os

class DBGenCompile(Compile):

    def __init__(self):

        self.persistency_style = SimpleFile()

        Compile.__init__(self)

        # DBGenCompile uses the DBGen.py script. Add it as

        # a dependency for all DBGenCompile objects

        self.dependencies.add('DBGen.py')

    def FileName(self):

        return os.path.splitext(self.Child.FileName())[0] + '_db.cc'

    def BringUpToDate(self):

        SystemCommand('python DBGen.py %s > %s' % \
            (self.Child.FilePath(),self.FilePath()))

separately. (The sources for Beta and Gamma do not add anything new to this example, so I have omitted them in the interest of brevity. The point of having them in this example is to demonstrate the ability to actively develop multiple packages.) At this point we can do a machina build of Alpha or Beta. To compile both at once, we define a metapackage, Meta:

package = Package('Meta')

Alpha = ImportPackage('Alpha','./Alpha')

Beta = ImportPackage('Beta','./Beta')

Because Alpha and Beta are not known to the system, we have set default locations for them. Building Meta will build both Alpha and Beta. We can also use Meta as a place to keep common settings for Alpha and Beta. Adding the line

package.qualifiers.set(Optimize(2))
to Meta allows us to build optimized versions of Alpha and Beta by building Meta.

During the course of debugging AlphaApp, we determine that there is a bug in the installed version of Gamma. We download the newer version of Gamma and install it in the current directory. Now, adding the line

system.locations.packages('Gamma','./Gamma')
provides the new default location for Gamma to the packages imported by Meta. Building Meta again causes Alpha and Beta to be built with the new version of Gamma. Gamma itself is compiled as a consequence of being needed by Alpha and Beta.



Footnotes

... files.1
The .PHONY directive in Make does allow creating nodes that are not literally files. The lack of abstraction remains.


James Amundson 2000-03-31