The Mathematica Kernel: Issues in the Design and Implementation
(How to build a software octopus in your basement, given 20
very smart monkeys and 20 years)
This is taken from a talk I gave at Macalester College in October
2006. Just before Halloween, so it gets a bit scary.
The Mathematica kernel is a large body of software that
encompasses hundreds of person-years of in-house code as well
as several large standalone libraries developed elsewhere. This
makes for an endless assortment of possible problems, as well
as offering many possibilities for future cutting-edge development.
I will touch on both aspects.
The octopus of the subtitle is a metaphor for the multi-tentacled
Mathematica kernel. The monkeys come from the so-called
"infinite monkey theorem". As they are very smart, they require
a mere four hundred monkey-years to do the job at hand (paw?).
It is not yet clear what the repercussions are of mixing zoo
and aquarium species in this cavalier manner.
I'll begin with some quotes. They may help to give an overview
of the complexities of a body of work such as this, particularly
in areas that are not very well developed. Or at least lay bare
my cynicism on this topic.
"In any sufficiently complicated mathematical program, there can
be pairs or trios of design decisions that are individually
quite sensible, but, in tandem, play havoc with one another." (Author: me)
"The more choices you have, the more opportunity for the unwary
developer to really mess things up." (Author unimportant. Okay,
it was again me who said it.)
Vastly abbreviated overview of the Mathematica kernel...
The Mathematica kernel is the computational engine of the program.
It parses and evaluates input in an interpreted environment. Like
the fabled fox, it knows many tricks. Here are some of them.
*Hundreds of math functions (can evaluate numerically, simplify
symbolically, etc.)
*Arithmetic (the usual, but in ways that are optimized for efficiency)
*Linear algebra (numeric, exact, symbolic)
*Polynomial and related algebra (gcds, factorization, various
algorithms from algorithmic commutative algebra)
*Symbolic calculus (limits, integrals, ODEs, recurrence
solving, summation, more)
*Number theory functionality
*Numeric algorithms from linear algebra, calculus, optimization,
and elsewhere
*An extensive programming language (procedural constructs, pattern
matching, functional programming constructs, more)
*String functionality
*Graphics (uses math under the hood)
Leveraging your own technology...
As might be surmised from the brief overview, it is sensible to
implement substantial mathematical functionality using Mathematica
itself. When I began at Wolfram Research in 1991, I think perhaps
20% of the kernel code was in Mathematica, with the rest in C.
Now it might be 50% or so, excluding external libraries.
One advantage is that Mathematica is a powerful language, and
often one can do in a few lines what might take a few dozen
in C. A disadvantage is that for many years we did not have traditional
debugging tools. The Wolfram Workbench, an Integrated Development
Environment (IDE) product, goes a long way toward filling this void.
For readers unfamiliar with the technology workplace, I remark
that the idea of leveraging of in-house technology is important
and quite common. For example, when I worked on compilers
and related technology for several years prior to graduate
school, we had programs for representing annotated parse trees
that were routinely ported from one project to another, with
small, project-related tweaks along the way.
The lesson is that good ideas and code can and should be
recycled. Of course, every company must then be on the lookout
for its mistakes, because not everything is recyclable and
ideas don't come with those triangular recycling codes.
The power of libraries (or: Never do something yourself
that someone else will do for you, and better)...
Until version 5 appeared in 2003, the code for the Mathematica
kernel was essentially all written and maintained by WRI.
There was some code from the legacy libraries
LINPACK and EISPACK, for numerical linear algebra, that
had initially been translated from Fortran to C. I believe
there was some polynomial algebra code likewise translated
from existing sources. But all of it was subsequently
modified in-house.
Early in version 5 development we planned to use LAPACK,
the LINPACK/EISPACK successor, as an external library to
which Mathematica would link. Reason: it takes advantage
of processor-tuned efficient Basic Linear Algebra Subroutines
(BLAS) which give as much as an order of magnitude speed
improvement in machine arithmetic linear algebra.
As development progressed it became clear that the
GNU Multiple Precision (GMP) library was similarly more
efficient for arithmetic on numbers with many (thousands
or more) digits. We made accommodations to use GMP in many
places, thus effectively removing some of our arithmetic code.
Later we chose to incorporate specialized libraries, e.g. for
sparse linear algebra, in order to support some development
elsewhere. Example: we needed a sparse linear solver for the
development of interior-point code for linear programming.
When deciding whether to use external libraries for
certain functionality, we typically consider several questions.
*Does it do things Mathematica cannot do, or cannot do as well?
*What is the level of the library? Is it state of the art?
*Are there licensing issues? If so, can they be resolved?
*What is the level of developer support? Do they fix their bugs?
*Do ported versions exist for all platforms on which we
support Mathematica?
The advantage of libraries (or: We should have thought of
this years ago)...
*Better code than we could produce in-house in reasonable time
*We gain the benefit of dedicated expertise from external development
*Some projects such as LAPACK/BLAS are well funded and have
ongoing development
*We don't need to write/maintain it all ourselves
The curse of libraries (or: Remind me why we chose to rely
on these people?)...
A drawback to use of libraries is that the code base is no
longer under our control. Hence Mathematica users inherit
bugs that find their way into such libraries, and we are
responsible for the problems caused thereby.
When we run up against bugs in library code we try to
isolate causes, send smallish examples to the library
authors/vendors, and hope for the best. Among other
things this implies that we actually try to maintain
good working relationships with various external
library purveyors, large and small.
In some cases we can also come up with workarounds in
our own code. This involves adding code to detect
failures in library code, and thereupon going to "plan B".
Of course it also involves inventing and coding plan B.
How to extend technology? (Very carefully)...
In version 4 we unleashed our "packed array" technology. This
is where lists of, say, machine double-precision real numbers
are maintained internally and processed as "raw" data structures
containing tensors (vectors, matrices, etc.) of contiguous
floating-point numbers, instead of the usual Mathematica "lists",
which are arrays of pointers to expressions, each of which might
be a machine double.
The benefit of such a representation is the ability to make
optimal use of fast library code, e.g. BLAS, and to make use
of our own vectorized machine-precision numeric functions,
in order to gain a competitive advantage in the speed
of large-scale numeric processing.
The downside is that this had to be "seamless", insofar as we
were not about to create new data structures and functions
just for this purpose. So such lists need to be
indistinguishable to the user from ordinary lists, except
in speed of processing and memory footprint. That is to say,
all functions from part extraction and assignment to
numeric evaluation must look and act the same regardless
of the mode of internal storage. This is as it ought to be.
The important thing to realize is that it placed a huge
burden on development and also has had repercussions that
are felt in debugging to this very day. Specifically, a
problem reported to me by October 21 of 2006 can
be attributed to an obscure part of the packed array technology
from the mid-to-late 1990's.
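Mathematica's internal representation is not public, but the packed-versus-pointer distinction is easy to illustrate in Python, where `array.array` stores raw machine doubles contiguously while an ordinary list stores pointers to boxed float objects. A toy analogy only, not WRI's implementation:

```python
import array
import sys

# An "unpacked" list: an array of pointers, each to a boxed float object.
unpacked = [float(i) for i in range(1000)]

# A "packed" array: one contiguous buffer of raw machine doubles.
packed = array.array('d', range(1000))

# The packed form costs 8 bytes per element plus a small header; the
# list costs a pointer per element *plus* a boxed float object each.
packed_bytes = packed.itemsize * len(packed)
unpacked_bytes = sum(sys.getsizeof(x) for x in unpacked)

# Seamlessness: both look the same to the user under indexing and
# iteration, differing only in speed and memory footprint.
assert packed[10] == unpacked[10]
assert sum(packed) == sum(unpacked)
```

The contiguous buffer is also what lets library code like BLAS operate on the data directly, without walking a pointer per element.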
An invitation to infinite recursion...
A dark side to a complicated program like Mathematica is the
ability to make trouble in subtle ways. One is to invoke
pairs of transformations that undo one another.
Case in point: symbolic integration. This sits atop a
tremendous amount of other functionality. One well known
tactic for integration involves finding a partial fraction
decomposition of a rational function.
Another tactic is to accept "partial" results, that is,
split integrands into summands that we can integrate and
those we cannot. As it happens, in some cases it is also
helpful to put a sum of terms over a common denominator.
So imagine what happens if your code has transformations
that use Together on things generated by Apart (these
functions sound like opposites for a good reason...)
Then we might well find that the "togethered" thing
gets us right back to our input, if the integrated
pieces happen to exactly cancel. And we go into recursion.
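The Together/Apart cycle can be modeled in miniature. In this Python toy (all names invented; no real algebra happens), two rewrite rules exactly undo one another, and only a seen-set keeps the driver from bouncing between the two forms forever:

```python
# Toy rewriter: 'apart' and 'together' are modeled as inverse moves on a
# tiny state space. A driver that applies whichever rule "looks helpful"
# can cycle forever; a seen-set of visited forms breaks the loop.

def apart(expr):
    # Hypothetical stand-in: split a "combined" form into summands.
    return ('split',) + expr[1:] if expr[0] == 'combined' else expr

def together(expr):
    # The inverse move: put the summands back over a common denominator.
    return ('combined',) + expr[1:] if expr[0] == 'split' else expr

def simplify(expr, max_steps=100):
    seen = set()
    steps = 0
    while expr not in seen and steps < max_steps:
        seen.add(expr)
        # together undoes apart exactly -- the recursion trap.
        expr = together(apart(expr))
        steps += 1
    return expr, steps

result, steps = simplify(('combined', 'x'))
# Without the seen-set the loop would run to max_steps; with it, we
# stop as soon as we revisit the input, after a single round trip.
```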
In some cases it is difficult to draw a line between
infinite recursion and simply "trying real hard" on pieces of
a problem. While still allowing for some amount of the latter,
we attempt to forestall recursion in at least a few distinct ways.
*Caching computations "in progress"
One is to rely on caching intermediate computations, in
"canonicalized" form, so that later stages can attempt to
look up results and see whether we are re-encountering a problem
we have not yet finished processing (an obvious sign of recursion).
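A minimal sketch of such an in-progress cache, in Python (the canonical form and the "solver" are contrived for illustration):

```python
# Before working on a subproblem we record its canonical form; if we
# meet the same form again before finishing it, we are recursing and
# bail out to a fallback instead of looping.

in_progress = set()

def canonicalize(problem):
    # Placeholder canonical form: sort terms so equivalent inputs match.
    return tuple(sorted(problem))

def solve(problem):
    key = canonicalize(problem)
    if key in in_progress:
        return None                   # unfinished problem re-encountered
    in_progress.add(key)
    try:
        if len(problem) > 1:
            # A contrived recursive step that revisits the same problem
            # in a different order -- the pathology the cache catches.
            sub = solve(list(reversed(problem)))
            if sub is None:
                return sum(problem)   # plan B: a direct answer
            return sub
        return problem[0]
    finally:
        in_progress.discard(key)      # done (or gave up): unblock it
```

The `finally` clause matters: the cache marks work *in progress*, not results, so entries must be cleared on every exit path.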
*Depth charges
A coarser approach is to use certain counters in order
to determine approximately how deep we might be in some
computation. If they get incremented too far we give up.
This has some obvious drawbacks (if we stop too soon) but,
in practice, seems to be reasonable.
*Blocking reuse of parent callers
Still another thing is to "block" certain code from being
used in certain places, on the supposition that it is
likely to lead to recursion. We might do this by putting
functions to block on a "stack" using Mathematica's Block
construct. Which, by the way, was not so named for its
"blocking" feature. It is simply a type of procedural
"block" in the classical programming language sense.
Sheer coincidence.
New methods bring new problems...
*Progress
From time to time one learns of a new algorithm for doing a
certain type of computation, say resultants of a pair of polynomials.
One may read that the new method is vastly better than anything that
came before it, and see, in the literature, examples to support
that notion. Sometimes the new algorithms are easy to implement,
and we do so.
*Regress, or, What's the down side?
Quite often it turns out that there are classes of problem for
which the old method simply worked better, e.g. faster or using
far less memory. For example, when we implemented the Sylvester
matrix method for computing resultants, we found many classes
of example for which the polynomial remainder sequence method
worked better. This was not a surprise, but...
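The Sylvester method itself is easy to state: the resultant of two polynomials is the determinant of their Sylvester matrix. A small exact-arithmetic sketch in Python (my own illustration, not WRI's code), with coefficients listed from highest degree down:

```python
from fractions import Fraction

def sylvester(f, g):
    m, n = len(f) - 1, len(g) - 1          # degrees of f and g
    size = m + n
    rows = []
    for i in range(n):                      # n shifted copies of f
        rows.append([0] * i + f + [0] * (size - m - 1 - i))
    for i in range(m):                      # m shifted copies of g
        rows.append([0] * i + g + [0] * (size - n - 1 - i))
    return rows

def det(matrix):
    # Exact Gaussian elimination over the rationals.
    a = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def resultant(f, g):
    return det(sylvester(f, g))

# Resultant of x^2 - 1 and x - 2: the roots of x^2 - 1 are +-1,
# and (1 - 2) * (-1 - 2) = 3. A zero resultant signals a common root.
```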
In this situation we must resort to various heuristics in order
to decide upon the method to use for a given input. This is how
any good polyalgorithm should behave.
So what is the problem?
The added problem is that we now have a "legacy" wherein, if we
get the method choice wrong, we behave worse than in the past,
even though we might rightly claim to use "better" methods.
This is the dilemma of having more choices.
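A polyalgorithm's method choice, reduced to its skeleton in Python (the method names and the size threshold are invented for illustration):

```python
# A cheap heuristic inspects the input and dispatches to whichever
# method is expected to win. When the guess is wrong, we are slower
# than the old code ever was -- the legacy cost of having more choices.

def old_method(data):
    return ('old_method', sorted(data))

def new_method(data):
    return ('new_method', sorted(data))

def poly_solve(data, threshold=100):
    # Heuristic: the "better" new method pays off only on big inputs.
    if len(data) < threshold:
        return old_method(data)
    return new_method(data)
```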
Heuristics...
The all-important quote:
"Heuristics are bug ridden by definition. If they didn't have bugs,
then they'd be algorithms." (Author: unknown)
Lesson 1: Even good heuristics can only take you but so far.
Lesson 2: Bad heuristics can take you further, but in the wrong direction.
A drawback to code "improvement"...
*Fix me here...
In symbolic computation it is quite common to find that fixing one
problem exposes weaknesses elsewhere. For example, say algorithm A
decides to do operation X and, if it succeeds, operation Y, else Z.
Say operation Y has six legs (as in "buggy"), but operation X
returns a failure status so algorithm A goes on to do Z and gets
a respectable result. What happens when operation X is improved
so that it no longer returns a failure flag?
*...break me there...
This was NOT a trick question. What happens is that algorithm A
now uses defective procedure Y and messes up.
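The X/Y/Z story, made concrete in Python (all names are invented; op_y carries the latent bug that the old op_x's failures happened to mask):

```python
def op_y(value):
    return -1                           # six-legged: always wrong

def op_z(value):
    return value + 1                    # the humble, working fallback

def algorithm_a(value, op_x):
    ok, x_result = op_x(value)
    if ok:
        return op_y(x_result)           # "success" path -- hits the bug
    return op_z(value)

old_op_x = lambda v: (False, None)      # always reported failure
new_op_x = lambda v: (True, v)          # "improved": now succeeds

# With the old op_x, A quietly took the Z path and gave a respectable
# answer. With the improved op_x, the latent bug in Y finally surfaces.
```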
Code modularity...
The ways in which various parts of Mathematica depend upon one
another lead to certain undesirable outcomes. One is that the
Mathematica kernel is not terribly modular. That is to say, it
is difficult to extract various components e.g. "arithmetic" or
"numerics" or "polynomial algebra" or "graphics", as there tend
to be interconnections.
As the kernel has been built up over many years, this is even
more difficult than might otherwise be the case, because in
earlier times the need for modularity, clean "application program
interfaces" (APIs), etc., was not so well understood, at least not
by mathematicians. Times, happily, have changed. Some bright spots:
*The Mathematica kernel withstood the transition to the GMP library
for arithmetic. This means the arithmetic, at least, has good,
modular structure.
*Mathematica has extensible numeric differential equation solvers.
These, moreover, use some very nice modular code with carefully
designed data structures.
*Mathematica has recently seen the addition of an extensible random
number generator, complete with documented API.
*Mathematica has a huge test suite. This means that when changes
are made, if they have a bad impact on some part of the kernel
far away, it is often caught quickly in-house.
Speed vs. replicability...
Caution: This next topic scares people. Take a deep breath.
*Constraining an evaluation
In order to prevent code from hanging it is sometimes necessary
to rely on time constraints. The flow of a computation (e.g. a
symbolic integral) is then influenced by whether a particular
subcomputation does or does not finish in under the allotted time limit.
*Obvious consequence
The same computation may follow different paths on different
machines, or even on the same machine under different workloads.
*Slightly less obvious consequence
Results, both good and bad, may be sporadic and difficult to replicate.
*Still less obvious consequence
A faster machine can be much slower for a given computation.
How?! Like so: Algorithm A tries subcomputation X. If it succeeds
in under its time constraint, it goes on to do Y, else Z. If Z
is, for the particular problem at hand, much faster than Y,
then the faster machine that completes X is now heading into
the tarpit of Y while its slower cousin zips instead through Z.
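This effect can be modeled without any actual clocks. In this Python sketch (costs, limits, and "speed" are all invented numbers), the fast machine finishes X within the limit and so pays for the expensive Y, while the slow machine times out on X and takes the cheap Z:

```python
def run_with_limit(cost, speed, limit):
    # 'speed' stands in for the host machine; elapsed time is cost/speed.
    elapsed = cost / speed
    return (True, elapsed) if elapsed <= limit else (False, limit)

def algorithm_a(speed, x_cost=1.0, y_cost=100.0, z_cost=2.0, limit=0.5):
    ok, spent = run_with_limit(x_cost, speed, limit)
    if ok:
        return spent + y_cost / speed    # X succeeded: on to the tarpit Y
    return spent + z_cost / speed        # X timed out: zip through Z

fast_total = algorithm_a(speed=4.0)   # X fits the limit, so Y runs
slow_total = algorithm_a(speed=1.0)   # X times out, so Z runs
```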
*Another, quite unobvious issue with use of such constraints
Computations that must end due to time or similar constraints are
aborted. Any "change of state" (e.g. temporary resetting of an
option to some function) during the computation must then be
restored by the caller. This can be tricky when coding in
Mathematica and there are nested uses of TimeConstrained.
Example:
answer = TimeConstrained[
    computation1;
    result = TimeConstrained[
        computation2[SomethingThatChangesState], 30];
    FixStateIfAborted[result];
    result,
    10]
If the inner TimeConstrained invocation is aborted then the state
altered by computation2 will be restored. If instead the outer
one is interrupted, and if, unlike this simple example, it is
calling from another planet rather than just outside the door,
it may not realize there could be a state change to repair.
There is a function in Mathematica called CheckAbort that,
I believe, is meant to be used for such scenarios. But...I don't
know how it works, or whether it can really handle issues with
nesting of TimeConstrained.
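Mathematica's abort semantics are subtle, as admitted above, but the general remedy in languages with structured unwinding can be sketched in Python: a try/finally (here via a context manager) restores state no matter which nesting level the abort comes from, which is exactly the discipline a raw abort can skip. All names are invented:

```python
import contextlib

state = {'option': 'default'}

@contextlib.contextmanager
def temporary_option(value):
    saved = state['option']
    state['option'] = value
    try:
        yield
    finally:
        state['option'] = saved        # restored even on an abort

class Abort(Exception):
    """Stand-in for a TimeConstrained-style interrupt."""

def aborted_computation():
    with temporary_option('special'):
        # The nested work begins with the changed state...
        assert state['option'] == 'special'
        raise Abort()                  # ...and an outer limit fires here

try:
    aborted_computation()
except Abort:
    pass
# Despite the abort arriving "from another planet", the option is back.
```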
Scaling an octopus...
Over time the functionality, and corresponding code base, of
the Mathematica kernel has grown considerably. This creates
several burdens.
*Design consistency must be maintained. This is vital because
design consistency has been the backbone, I believe, of the
commercial success of Mathematica.
*New functionality must be robust. A small subset of my colleagues
may disagree, but in general I find that neat bells and whistles
do not win friends if they behave badly e.g. cause the program to crash.
*New functionality should not detract too much from our capability
to repair/improve/maintain old functionality.
*There are endless opportunity costs involved in deciding on
what needs to be added, what should be rewritten, what needs
debugging, etc.
*Interactions with existing technology may cause trouble. New methods
invite use elsewhere, and an interesting problem then
arises. Code that "worked" in the past, but using perhaps non-robust
technology, may begin to fail, e.g. because the new methods are
too slow, or have bugs, or are being extended beyond their
appropriate "operating range" (e.g. relying on a polynomial
solver to handle transcendental equations), or ....
*New functionality requires documentation!!!
*Larger memory footprint of the program (what will happen when
Congress revokes Moore's Law?)
*Larger code base implies a larger burden of maintaining legacy code.
So how well does the Mathematica kernel scale as it grows? Thus
far I think remarkably well. But as every stock prospectus will
caution: "Past performance is no guarantee of future returns".
Summary...
I covered several problematic areas in the implementation of
the Mathematica kernel...
...but most areas are not burdened by such issues...
...and even the exceptions tend to work fine for most purposes, so...
...the outlook is far less bleak than one might think.
Here is what I expect to see down the road.
Over time, some of the problematic areas actually get improved.
Other ideas, methods, functions get developed that make some
problem areas less relevant.
We are at the crossroads between voodoo and science. This makes for
interesting times in computational mathematics. It is anyone's
guess how far we will go, or to what extent voodoo can be
replaced, if not by science, then at least by something more
deserving to be called "art".
My motto? "I write the legacy code of the future."