Fermi National Accelerator Laboratory
Machina is a system for describing and working with projects. A Machina project is an acyclic directed graph whose nodes represent objects and whose edges represent the dependency relationship among the objects. The objects themselves represent file-like things such as sources and products as well as actions such as compiling. (Throughout this document I refer to this acyclic directed graph simply as ``the tree.'') The design of Machina explicitly supports abstraction. This abstraction allows Machina projects to be simultaneously simpler and more general than they could ever be otherwise. The dependency structure of Make can also be thought of as an acyclic directed graph. However, the nodes in the Make tree always represent files. Make also infers the dependency tree from rules provided by the user. In Machina, the user describes the tree explicitly.
The text-mode user interface is through Python. The user writes a short Python script to describe the tree. All aspects of Machina can be queried. Output is available in either human-readable or machine-readable form. For the tree itself, the machine-readable output form is an XML file. This file is also available as an alternate form of input.
Machina provides a set of objects to deal with common programming languages, documentation formats, packaging systems, etc. The set of objects is called the Machina standard library. The standard library is extensible at the system, user and project level. Machina itself is written in Python, so objects written by users have exactly the same capabilities as objects defined by the standard library.
Machina achieves scalability to large projects through packages. In this context scalability means not only the ability to deal efficiently with project sizes ranging from very small to very large, but also the ability for a developer to deal with a small part of a large project. Packages allow developers to divide their projects into modules which present well-defined interfaces to the external world. Packages can be used actively and passively. Active packages can be built and rebuilt as part of the process. Passive packages can represent pre-built and pre-installed packages.
I start with a simple example. It is tempting to start with a program that prints the inevitable ``Hello world.'' For the purposes of demonstrating a build system, ``Hello world'' is too simple - it can be compiled with a single command line. A build system to replace a single command line would have to be contrived. Instead, I have created a package to say ``hello'' in multiple languages.
The hello package is written in C++. The implementation consists of two source files, hello_english.cc and hello_french.cc, as well as a header file, hello.h. These are to be compiled into a library, libHello. I have also written a usage example, example_hello.cc, which is to be compiled into an executable, example_hello, which uses the library libHello.
The previous paragraph is how I would describe my work to another human. Here is how I describe my project to Machina:
lib_sources = \
library = Library('Hello')
exe_sources = Sources(['example_hello.cc'],'C++')
executable = Executable('example_hello')
machina performing "build all":
compile package hello
compile library Hello
link library Hello
compile executable example_hello
link executable example_hello
qualifiers = Debug
default qualifier is Debug
To round out this example, let's try this: Suppose we would like to have both optimized and debugging versions of the library. We add the lines
qualifiers = Debug,Optimize(2)
The core concepts of Machina are tree objects; the tree itself, which is a directed graph of tree objects; attributes, which can be assigned to tree objects; and non-tree actions, which act on the tree but are not part of it.
Tree objects do the bulk of the work in Machina. A tree object keeps a list of its children and its attributes. It can respond to queries and perform actions. At a minimum, it must be able to answer whether the component it represents is dirty, bring itself up-to-date, and list its children and its attributes. Examples of tree objects are products, such as libraries and executables; sources, such as source files (more generally, translation units; see the section on translation units); and actions, such as ``compilation.'' While a tree object may boil down to a file and a few functions, in general it may contain many files, or no files at all. Tree objects may use files for persistence. The files associated with an object may include source files or build products, but they also include metadata for the objects and the files they store. Through metadata, an object can store not only the fact that a source file has been compiled into an object file, but also the compiler options that were used to compile that file.
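The minimum interface described above can be illustrated with a short sketch. It is not Machina's implementation: the class name TreeObject, its method names, and the log argument are all hypothetical, chosen only to show a node that reports whether it is dirty, brings itself up-to-date, and lists its children and attributes.

```python
class TreeObject:
    """Hypothetical tree node; the names here are assumptions, not Machina's API."""

    def __init__(self, name, children=(), attributes=None, dirty=False):
        self.name = name
        self._children = list(children)
        self._attributes = dict(attributes or {})
        self._dirty = dirty

    def children(self):
        return list(self._children)

    def attributes(self):
        return dict(self._attributes)

    def is_dirty(self):
        # A node is dirty if it, or anything below it, is out of date.
        return self._dirty or any(c.is_dirty() for c in self._children)

    def update(self, log):
        # Bring the children up-to-date first, then this node.
        if not self.is_dirty():
            return
        for child in self._children:
            child.update(log)
        log.append(self.name)
        self._dirty = False


source = TreeObject("hello_english.cc", dirty=True)
compile_step = TreeObject("compile", [source])
library = TreeObject("libHello", [compile_step], {"product": True})

log = []
library.update(log)  # visits the dirty source, then the nodes that depend on it
```

A second call to library.update(log) does nothing, because no node in the tree is dirty any longer.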
Tree objects can be queried as to their dependencies. The dependencies of an object are all the objects subservient to the object in the tree. For many applications, it is important to be able to isolate only the first-order dependencies of an object. For example: Library A uses Library B, which in turn uses Library C. Although A's dependencies include both B and C, it only directly refers to B. This is called a first-order dependency. Machina allows querying first-order dependencies of an object as well as all dependencies of an object.
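The distinction between first-order and all dependencies can be sketched as follows. This is only an illustration of the idea, assuming the tree is held as a plain dictionary from a node to its direct children; the function names are hypothetical and are not Machina's query interface.

```python
def first_order_dependencies(graph, node):
    """Direct children of `node` only."""
    return list(graph.get(node, []))


def all_dependencies(graph, node):
    """Every node reachable from `node`, in breadth-first order."""
    seen = []
    queue = list(graph.get(node, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


# Library A uses Library B, which in turn uses Library C.
graph = {"A": ["B"], "B": ["C"], "C": []}

direct = first_order_dependencies(graph, "A")
everything = all_dependencies(graph, "A")
```

Here direct contains only B, while everything contains both B and C.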
Attributes are assigned to objects. In general, tree objects pass their attributes from parent to child, unless they are designated as local. Objects understand a limited set of attributes. The connection is made according to the attributes' scope. Scope is a hierarchical concept. For example, both C and Fortran compilers would accept an attribute with scope Compiler, whereas only the C compiler would understand an attribute with the scope CCompiler and only gcc would understand attributes with scope GCCCompiler.
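One way to picture hierarchical scope is to model scopes as a Python class hierarchy. This is a sketch under that assumption, not a description of how Machina stores scopes; the FortranCompiler class and the accepts function are hypothetical names introduced for the illustration.

```python
# Scopes modeled as classes: a more specific scope subclasses a more general one.
class Compiler:
    pass


class CCompiler(Compiler):
    pass


class GCCCompiler(CCompiler):
    pass


class FortranCompiler(Compiler):
    pass


def accepts(tool_scope, attribute_scope):
    """A tool understands an attribute whose scope is its own or a more general one."""
    return issubclass(tool_scope, attribute_scope)


c_takes_generic = accepts(CCompiler, Compiler)
fortran_takes_generic = accepts(FortranCompiler, Compiler)
fortran_takes_c = accepts(FortranCompiler, CCompiler)
c_takes_gcc = accepts(CCompiler, GCCCompiler)
gcc_takes_gcc = accepts(GCCCompiler, GCCCompiler)
```

With this model, an attribute scoped to Compiler reaches both the C and Fortran compilers, while an attribute scoped to GCCCompiler reaches gcc alone.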
Child objects can set constraints on the attributes they inherit. Constraints are (local) attributes themselves. Parent objects can specify whether attributes are optional or required. If a child rejects a required attribute, an error condition is raised. For example, an object may be given the attribute ``optimized.'' It then requests that its several children be optimized. The developer may discover that the compiler exhibits a bug when optimizing one particular source file. He then designates that object as ``non-optimizable.'' Because the attribute was passed as a request instead of a requirement, the build proceeds as expected: the original object is constructed of several parts, all but one of which are optimized. Another example involves thread safety. An object is given the attribute ``threadsafe,'' this time as a requirement; all of its children must be threadsafe. If the developer were to assign the ``non-threadsafable'' attribute to one of the children, Machina would throw an error at the time the tree is constructed.
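The difference between a requested and a required attribute can be sketched as below. The function pass_attribute, the exception AttributeRejected, and the representation of constraints as a set of rejected attribute names are assumptions made for this illustration; they are not Machina's interface.

```python
class AttributeRejected(Exception):
    """Raised when a child rejects an attribute its parent requires (hypothetical)."""


def pass_attribute(attribute, children, required):
    """Return the names of the children that accept `attribute`.

    `children` maps a child name to the set of attributes that child rejects.
    """
    accepted = []
    for name, rejected in children.items():
        if attribute in rejected:
            if required:
                raise AttributeRejected(
                    f"{name} rejects required attribute {attribute!r}"
                )
            continue  # a request is silently dropped for this child
        accepted.append(name)
    return accepted


# One source file is marked non-optimizable; optimization is only a request.
children = {"a.cc": set(), "b.cc": {"optimized"}, "c.cc": set()}
optimized = pass_attribute("optimized", children, required=False)
```

Passing the same attribute with required=True would raise AttributeRejected while the tree is being constructed, which corresponds to the thread-safety example.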
Non-tree actions act on the tree, but are not part of the tree. The simplest non-tree action is a query of the tree itself. Other such actions might be to request a build of an arbitrary set of objects on the tree, to remove intermediate build products, to remove all build products, to install build products, etc. Formally, non-tree actions could be understood as actions on the tree extended by a unique root node; however, I consider this distinction useful because it allows for a simple description of the tree even though the actions on it may be extended arbitrarily.
Machina achieves flexibility and scalability through the use of packages. Packages allow modular structure for the build system; projects can be divided into a hierarchy of packages and subpackages. Machina allows two-way communication between packages, so they are an active part of the build system. Furthermore, packages can define which of their aspects they export as well as which of other packages' properties they will import, minimizing the interconnectedness of the build system.
This section is logically a subsection of the standard library description. However, the importance of translation units to practically every build process, along with the historical difficulties in dealing with them, warrants a separate section. A translation unit is the stream of characters seen by the compiler at compile time. Modeling the relationship between the elements of a translation unit and the corresponding compiler output is crucial for designing a build system. Unfortunately, the dependency mechanism in Make is not rich enough to adequately model this relationship in any but the trivial case, i.e., where there is a one-to-one mapping between translation units and files. The relation between Make and translation units is discussed in more detail in the subappendix ``Make and translation units.''
Take the simplest non-trivial case of a compilation. I have a file foo.c that includes the header file foo.h. I compile foo.c to produce foo.o. (Note: Object files like foo.o are usually intermediate steps in the build process. As a general rule, Machina abstracts away intermediate files. Nonetheless, I will continue to discuss the actual object file in this section for conceptual simplicity.) Conceptually, the relationship between these three files looks like this:
foo.o depends on the entire translation unit, which includes both foo.h and foo.c. Dealing with translation units in a build system is somewhat subtle. The simplest thing to do would be to give the list of files associated with each translation unit. That solution is unacceptable because it creates two points of maintenance: if the developer modifies foo.c to also include bar.h, she would have to also modify the list in the build system.
The model for translation units in Machina is based on the following observation: While a translation unit is made up of the union of all the files in the translation unit, those files are not equally difficult to find. At least one file must be trivially available; after all, the compiler needs to be told where to start. The rest of the files, however, are difficult to discover. In general, the entire translation unit must be parsed before all the files can be discovered, an operation which can take the same order of time as compiling the unit itself. Based on this observation, Machina deals with the above situation by declaring that foo.o depends on the translation unit object represented by foo.c. The list of files other than foo.c in the translation unit represented by foo.c is available to foo.o through foo.c on a need-to-know basis. Machina refers to foo.c as the primary file in a translation unit and foo.h as (one of) the secondary file(s) in the translation unit. Here is a representative list of cases:
The Machina standard library contains objects for dealing with common build items. The library is extensible. The goal is always to provide developers with easy access to abstraction. The build process can be made arbitrarily specific to certain platforms, compilers, etc., but the abstract description is always the most obvious one.
Automatically defined generic attributes include
Qualifiers are attributes that qualify the actions that occur. These are the generic qualifiers.
For each qualifier, there is a corresponding constraint. Constraints can specify whether to reject qualifiers or provide alternatives. Constraints default to local scope with the type appropriate to the qualifier they constrain.
Locations are lists of possible locations for files and/or objects. All of the locations in the standard library refer to directory paths, but they may be constructed to refer to URLs, database locations, etc. Location objects have append, prepend, and remove methods.
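A location with these three methods might look like the following sketch. Only the method names append, prepend, and remove come from the description above; the class body, the paths method, and the choice to ignore duplicates are assumptions made for illustration.

```python
class Location:
    """Ordered list of candidate places to look (a sketch, not Machina's class)."""

    def __init__(self, paths=()):
        self._paths = list(paths)

    def append(self, path):
        # Lowest priority: searched last. Duplicates are ignored (an assumption).
        if path not in self._paths:
            self._paths.append(path)

    def prepend(self, path):
        # Highest priority: searched first.
        self.remove(path)
        self._paths.insert(0, path)

    def remove(self, path):
        if path in self._paths:
            self._paths.remove(path)

    def paths(self):
        return list(self._paths)


include_path = Location(["/usr/include"])
include_path.append("/usr/local/include")
include_path.prepend("./include")
include_path.remove("/usr/include")
```

After these calls the search order is ./include followed by /usr/local/include.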
All tree objects have methods for attaching parents and children. They also have methods for updating and determining if they have been modified. All tree objects can be queried. Tree objects can have built-in dependencies, that is, all instances of such objects will contain the same dependency. This is useful for the case where a project has to build a tool that it will later use to build the rest of the project.
Source objects represent translation units by default. They can be queried to list their primary and/or secondary files.
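The primary and secondary file query can be illustrated with a deliberately naive sketch. It assumes file contents are held in a dictionary and finds only quoted #include directives with a regular expression; it ignores preprocessor conditionals, macros, and system headers, so it is not the full parse that discovering a real translation unit requires. The function name secondary_files is hypothetical.

```python
import re

# Matches lines such as:  #include "foo.h"
INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def secondary_files(primary, contents):
    """List the headers reachable from `primary` through quoted includes."""
    found = []
    pending = [primary]
    while pending:
        current = pending.pop()
        for header in INCLUDE.findall(contents.get(current, "")):
            if header != primary and header not in found:
                found.append(header)
                pending.append(header)
    return found


contents = {
    "foo.c": '#include "foo.h"\nint foo(void) { return 1; }\n',
    "foo.h": '#include "bar.h"\nint foo(void);\n',
    "bar.h": "typedef int bar_t;\n",
}

secondary = secondary_files("foo.c", contents)
```

Here foo.c is the primary file, and the query reports foo.h and bar.h as secondary files without the developer listing them anywhere else.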
Products are the goal of building software. They are the items one wants to keep, install, etc. All product objects have the local attribute product.
The generic intermediate object is the Compile object. Specializations of Compile are available for each language in the default standard library list. Compile uses the corresponding Compiler tool.
Build classes tell the tree to build a product or set of products. Classes to build each product and all the products are provided. Build itself refers to a default, which normally refers to all the products. Users can easily subclass Build to build desired sets of products.
The Test class can build a set of products, then execute them and analyze the results.
Install comes with two subclasses by default. System install installs objects with the installable attribute into the system location as defined by the system install location. Local install installs build products in an easily accessible area in the local tree.
The site-wide settings should include a method for installing the project using a packaging system appropriate to the platform.
Bundle creates a packaged distribution of the files in a project.
Clean removes intermediate files from a project. Metadata allows the user to clean objects that are no longer part of the project. Specializations of clean will also remove all build products, installed products, etc.
Tools are provided for the use of the tree objects. In most cases, the tree objects themselves should provide needed functionality so that developers do not need to use the tools themselves.
Compiler exports an interface for generic compilation. Specializations of Compiler are available for each language in the default standard library list. There is a default compiler implementation defined by the Machina defaults. The generic language Compiler classes take their methods from the default implementation, but preserve the generic interface. The implementation-specific classes are available for use only when the details of a compiler are desired or needed.
Linker provides basic linker functionality. It may use one or more Compiler classes.
Archiver provides basic archiver functionality. It may use one or more Compiler and/or Linker classes.
TextProcessor exports an interface for processing text objects. Subclasses exist for both preprocessing code and processing documentation files with LaTeX, SGML tools, etc.
Connect(foo,bar) connects the output of foo to the input of bar. Connect(foo,baz,bar) connects the output of foo to the intermediate object baz, then connects the output of baz to the input of bar. The first argument to Connect can be a list of objects instead of a single object. CompileConnect inherits from Connect. CompileConnect(source,product) performs a Connect(source, compile-object,product), where compile-object is automatically determined by the type of source.
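The two- and three-argument forms of Connect can be sketched as a plain function. The sketch assumes that connecting the output of foo to the input of bar amounts to making foo a child of bar in the tree; the Node class and the lowercase connect function are hypothetical stand-ins, not Machina's classes.

```python
class Node:
    """Minimal stand-in for a tree object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.children = []


def connect(sources, *rest):
    """connect(foo, bar) or connect(foo, baz, bar); `sources` may be a list."""
    if not isinstance(sources, (list, tuple)):
        sources = [sources]
    if len(rest) == 1:
        (target,) = rest
        target.children.extend(sources)
    elif len(rest) == 2:
        intermediate, target = rest
        intermediate.children.extend(sources)
        target.children.append(intermediate)
    else:
        raise TypeError("connect takes a source, an optional intermediate, and a target")


sources = [Node("hello_english.cc"), Node("hello_french.cc")]
compile_step = Node("compile")
library = Node("libHello")
connect(sources, compile_step, library)

source_names = [child.name for child in compile_step.children]
```

After the call, the library depends on the compile step, which in turn depends on both source files.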
Sources is a shortcut for producing a group of source objects of a particular type from a list of filenames.
Everything in a Machina project revolves around the tree. The tree is described by the user in a Python file. User-defined objects are described by separate Python files. Machina has two forms of output: human readable and machine readable. The default is human readable. Machine-readable output in XML format is available as a command-line option. Every aspect of the current Machina project can be queried from the command line or programmatically.
Machina has a default location hierarchy: project, user, site, installation, in that order. The project settings are determined in the Machina project files. Users may also specify defaults and provide objects in their own local areas. Correspondingly, the site Machina installation can contain special objects for the system. The last location searched is the Machina installation area, which contains the standard library.
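The search order can be sketched as a lookup that stops at the first level providing the requested object. That the first match wins is my reading of the ordering above rather than something stated explicitly, and the find_object function and the dictionary of registries are hypothetical.

```python
SEARCH_ORDER = ("project", "user", "site", "installation")


def find_object(name, registries):
    """Return (level, definition) from the first level that defines `name`."""
    for level in SEARCH_ORDER:
        if name in registries.get(level, {}):
            return level, registries[level][name]
    raise KeyError(name)


# Hypothetical contents: the site overrides Library; the project adds its own object.
registries = {
    "installation": {
        "Library": "standard Library",
        "Executable": "standard Executable",
    },
    "site": {"Library": "site-tuned Library"},
    "project": {"DBGenCompile": "project DBGenCompile"},
}

library_hit = find_object("Library", registries)
executable_hit = find_object("Executable", registries)
```

In this sketch the site definition of Library shadows the one in the standard library, while Executable falls through to the installation area.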
Machina is designed to be an integral part of a development environment, but it is not an environment in itself. The programmatic interface is designed to allow real two-way communication with one or more external tools. I envision a completely GUI development environment. Completely different tools, for example (X)emacs-based or even text-based environments, are left to the taste and desires of developers. Because a Machina project keeps track of all the objects in a project, it will be straightforward for development environments to implement, for example, searches through all the source files in the project, graphical views of the project tree, etc.
I conclude with a more realistic case. Imagine a project where we are actively working on two packages, Alpha and Beta. Alpha and Beta depend on Gamma, which has already been installed on the development machine.
Alpha creates a library, libAlpha, and an executable, AlphaApp, for public use. AlphaApp uses its libAlpha and the libraries exported from Beta. In addition, AlphaApp uses an internal library, libDB, which is created out of code dynamically generated from the AlphaApp source files by a Python script file DBGen.py. Here is Alpha's Machina file:
# Add external package Gamma
Gamma = ImportPackage('Gamma')
# Define libAlpha
libAlpha = Library('Alpha')
libAlpha_sources = \
exported_headers = Header('Alpha.h','C')
# Define AlphaApp
AlphaApp = Executable('AlphaApp')
AlphaApp_sources = \
AlphaApp_dbsources = \
# Define libDB
libDB = Library('DB')
# DBGenCompile is defined separately
libDB_gensources = DBGenCompile(AlphaApp_dbsources)
# The output of DBGen is C++ source
# Define the interface we export to other packages
self.persistency_style = SimpleFile()
# DBGenCompile uses the DBGen.py script. Add it as
# a dependency for all DBGenCompile objects
return os.path.splitext(self.Child.FileName())[0] + '_db.cc'
SystemCommand('python DBGen.py %s > %s' % \
Alpha = ImportPackage('Alpha','./Alpha')
Beta = ImportPackage('Beta','./Beta')
During the course of debugging AlphaApp, we determine that there is a bug in the installed version of Gamma. We download the newer version of Gamma and install it in the current directory. Now, adding the lines