The Mathematica Kernel: Issues in the Design and Implementation
(How to build a software octopus in your basement, given 20
very smart monkeys and 20 years)
This is taken from a talk I gave at Macalester College in October
2006. Just before Halloween, so it gets a bit scary.
The Mathematica kernel is a large body of software that
encompasses hundreds of person-years of in-house code as well
as several large standalone libraries developed elsewhere. This
makes for an endless assortment of possible problems, as well
as offering many possibilities for future cutting-edge development.
I will touch on both aspects.
The octopus of the subtitle is a metaphor for the multi-tentacled
Mathematica kernel. The monkeys come from the so-called
"infinite monkey theorem". As they are very smart, they require
a mere four hundred monkey-years to do the job at hand (paw?).
It is not yet clear what the repercussions are of mixing zoo
and aquarium species in this cavalier manner.
I'll begin with some quotes. They may help to give an overview
of the complexities of a body of work such as this, particularly
in areas that are not very well developed. Or at least lay bare
my cynicism on this topic.
"In any sufficiently complicated mathematical program, there can
be pairs or trios of design decisions that are individually
quite sensible, but, in tandem, play havoc with one another." (Author: me)
"The more choices you have, the more opportunity for the unwary
developer to really mess things up." (Author unimportant. Okay,
it was again me who said it.)
Vastly abbreviated overview of the Mathematica kernel...
The Mathematica kernel is the computational engine of the program.
It parses and evaluates input in an interpreted environment. Like
the fabled fox, it knows many tricks. Here are some of them.
*Hundreds of math functions (can evaluate numerically, simplify
symbolically, etc.)
*Arithmetic (the usual, but in ways that are optimized for efficiency)
*Linear algebra (numeric, exact, symbolic)
*Polynomial and related algebra (gcds, factorization, various
algorithms from algorithmic commutative algebra)
*Symbolic calculus (limits, integrals, ODEs, recurrence
solving, summation, more)
*Number theory functionality
*Numeric algorithms from linear algebra, calculus, optimization,
and elsewhere
*An extensive programming language (procedural constructs, pattern
matching, functional programming constructs, more)
*String functionality
*Graphics (uses math under the hood)
Leveraging your own technology...
As might be surmised from the brief overview, it is sensible to
implement substantial mathematical functionality using Mathematica
itself. When I began at Wolfram Research in 1991, I think perhaps
20% of the kernel code was in Mathematica, with the rest in C.
Now it might be 50% or so, excluding external libraries.
One advantage is that Mathematica is a powerful language, and
often one can do in a few lines what might take a few dozen
in C. A disadvantage is that for many years we did not have traditional
debugging tools. The Wolfram Workbench, an Integrated Development
Environment (IDE) product, goes a long way toward filling this void.
For readers unfamiliar with the technology workplace, I remark
that the idea of leveraging of in-house technology is important
and quite common. For example, when I worked on compilers
and related technology for several years prior to graduate
school, we had programs for representing annotated parse trees
that were routinely ported from one project to another, with
small, project-related tweaks along the way.
The lesson is that good ideas and code can and should be
recycled. Of course, every company must then be on the lookout
for its mistakes, because not everything is recyclable and
ideas don't come with those triangular recycling codes.
The power of libraries (or: Never do something yourself
that someone else will do for you, and better)...
Until version 5 appeared in 2003, the code for the Mathematica
kernel was essentially all written and maintained by WRI.
There was some code from the legacy libraries
LINPACK and EISPACK, for numerical linear algebra, that
had initially been translated from Fortran to C. I believe
there was some polynomial algebra code likewise translated
from existing sources. But all of it was subsequently
modified in-house.
Early in version 5 development we planned to use LAPACK,
the LINPACK/EISPACK successor, as an external library to
which Mathematica would link. Reason: it takes advantage
of processor-tuned efficient Basic Linear Algebra Subroutines
(BLAS) which give as much as an order of magnitude speed
improvement in machine arithmetic linear algebra.
As development progressed it became clear that the
GNU Multiple Precision (GMP) library was similarly more
efficient for arithmetic on numbers with many (thousands
or more) digits. We made accommodations to use GMP in many
places, thus effectively removing some of our arithmetic code.
Later we chose to incorporate specialized libraries, e.g. for
sparse linear algebra, in order to support some development
elsewhere. Example: we needed a sparse linear solver for the
development of interior-point code for linear programming.
When deciding whether to use external libraries for
certain functionality, we typically consider several questions.
*Does it do things Mathematica cannot do, or cannot do as well?
*What is the level of the library? Is it state of the art?
*Are there licensing issues? If so, can they be resolved?
*What is the level of developer support? Do they fix their bugs?
*Do ported versions exist for all platforms on which we
support Mathematica?
The advantage of libraries (or: We should have thought of
this years ago)...
*Better code than we could produce in-house in reasonable time
*We gain the benefit of dedicated expertise from external development
*Some projects such as LAPACK/BLAS are well funded and have
ongoing development
*We don't need to write/maintain it all ourselves
The curse of libraries (or: Remind me why we chose to rely
on these people?)...
A drawback to use of libraries is that the code base is no
longer under our control. Hence Mathematica users inherit
bugs that find their way into such libraries, and we are
responsible for the problems caused thereby.
When we run up against bugs in library code we try to
isolate causes, send smallish examples to the library
authors/vendors, and hope for the best. Among other
things this implies that we actually try to maintain
good working relationships with various external
library purveyors, large and small.
In some cases we can also come up with workarounds in
our own code. This involves adding code to detect
failures in library code, and thereupon going to "plan B".
Of course it also involves inventing and coding plan B.
How to extend technology? (Very carefully)...
In version 4 we unleashed our "packed array" technology. This
is where lists of, say, machine double-precision real numbers
are maintained internally and processed as "raw" data structures
containing tensors (vectors, matrices, etc.) of contiguous
floating-point numbers, instead of the usual Mathematica "lists",
which are arrays of pointers to expressions, each of which might
be a machine double.
The benefit of such a representation is the ability to make
optimal use of fast library code, e.g. BLAS, and to make use
of our own vectorized machine-precision numeric functions,
in order to gain a competitive advantage in the speed
of large-scale numeric processing.
The downside is that this had to be "seamless", insofar as we
were not about to create new data structures and functions
just for this purpose. So such lists need to be
indistinguishable to the user from ordinary lists, except
in speed of processing and memory footprint. That is to say,
all functions from part extraction and assignment to
numeric evaluation must look and act the same regardless
of the mode of internal storage. This is as it ought to be.
The important thing to realize is that it placed a huge
burden on development and also has had repercussions that
are felt in debugging to this very day. Specifically, a
problem reported to me by October 21 of 2006 can
be attributed to an obscure part of the packed array technology
from the mid-to-late 1990's.
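Mathematica's internal representation is not public, but the packed-versus-pointer distinction is easy to illustrate in Python, where `array.array` stores raw machine doubles contiguously while an ordinary list stores pointers to boxed float objects. A toy analogy only, not WRI's implementation:

```python
import array
import sys

# An "unpacked" list: an array of pointers, each to a boxed float object.
unpacked = [float(i) for i in range(1000)]

# A "packed" array: one contiguous buffer of raw machine doubles.
packed = array.array('d', range(1000))

# The packed form costs 8 bytes per element plus a small header; the
# list costs a pointer per element *plus* a boxed float object each.
packed_bytes = packed.itemsize * len(packed)
unpacked_bytes = sum(sys.getsizeof(x) for x in unpacked)

# Seamlessness: both look the same to the user under indexing and
# iteration, differing only in speed and memory footprint.
assert packed[10] == unpacked[10]
assert sum(packed) == sum(unpacked)
```

The contiguous buffer is also what lets library code like BLAS operate on the data directly, without walking a pointer per element.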
An invitation to infinite recursion...
A dark side to a complicated program like Mathematica is the
ability to make trouble in subtle ways. One is to invoke
pairs of transformations that undo one another.
Case in point: symbolic integration. This sits atop a
tremendous amount of other functionality. One well known
tactic for integration involves finding a partial fraction
decomposition of a rational function.
Another tactic is to accept "partial" results, that is,
split integrands into summands that we can integrate and
those we cannot. As it happens, in some cases it is also
helpful to put a sum of terms over a common denominator.
So imagine what happens if your code has transformations
that use Together on things generated by Apart (these
functions sound like opposites for a good reason...)
Then we might well find that the "togethered" thing
gets us right back to our input, if the integrated
pieces happen to exactly cancel. And we go into recursion.
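The Together/Apart cycle can be modeled in miniature. In this Python toy (all names invented; no real algebra happens), two rewrite rules exactly undo one another, and only a seen-set keeps the driver from bouncing between the two forms forever:

```python
# Toy rewriter: 'apart' and 'together' are modeled as inverse moves on a
# tiny state space. A driver that applies whichever rule "looks helpful"
# can cycle forever; a seen-set of visited forms breaks the loop.

def apart(expr):
    # Hypothetical stand-in: split a "combined" form into summands.
    return ('split',) + expr[1:] if expr[0] == 'combined' else expr

def together(expr):
    # The inverse move: put the summands back over a common denominator.
    return ('combined',) + expr[1:] if expr[0] == 'split' else expr

def simplify(expr, max_steps=100):
    seen = set()
    steps = 0
    while expr not in seen and steps < max_steps:
        seen.add(expr)
        # together undoes apart exactly -- the recursion trap.
        expr = together(apart(expr))
        steps += 1
    return expr, steps

result, steps = simplify(('combined', 'x'))
# Without the seen-set the loop would run to max_steps; with it, we
# stop as soon as we revisit the input, after a single round trip.
```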
In some cases it is difficult to draw a line between
infinite recursion and simply "trying real hard" on pieces of
a problem. While still allowing for some amount of the latter,
we attempt to forestall recursion in at least a few distinct ways.
*Caching computations "in progress"
One is to rely on caching intermediate computations, in
"canonicalized" form, so that later stages can attempt to
look up results and see whether we are re-encountering a problem
we have not yet finished processing (an obvious sign of recursion).
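A minimal sketch of such an in-progress cache, in Python (the canonical form and the "solver" are contrived for illustration):

```python
# Before working on a subproblem we record its canonical form; if we
# meet the same form again before finishing it, we are recursing and
# bail out to a fallback instead of looping.

in_progress = set()

def canonicalize(problem):
    # Placeholder canonical form: sort terms so equivalent inputs match.
    return tuple(sorted(problem))

def solve(problem):
    key = canonicalize(problem)
    if key in in_progress:
        return None                   # unfinished problem re-encountered
    in_progress.add(key)
    try:
        if len(problem) > 1:
            # A contrived recursive step that revisits the same problem
            # in a different order -- the pathology the cache catches.
            sub = solve(list(reversed(problem)))
            if sub is None:
                return sum(problem)   # plan B: a direct answer
            return sub
        return problem[0]
    finally:
        in_progress.discard(key)      # done (or gave up): unblock it
```

The `finally` clause matters: the cache marks work *in progress*, not results, so entries must be cleared on every exit path.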
*Depth charges
A coarser approach is to use certain counters in order
to determine approximately how deep we might be in some
computation. If they get incremented too far we give up.
This has some obvious drawbacks (if we stop too soon) but,
in practice, seems to be reasonable.
*Blocking reuse of parent callers
Still another thing is to "block" certain code from being
used in certain places, on the supposition that it is
likely to lead to recursion. We might do this by putting
functions to block on a "stack" using Mathematica's Block
construct. Which, by the way, was not so named for its
"blocking" feature. It is simply a type of procedural
"block" in the classical programming language sense.
Sheer coincidence.
New methods bring new problems...
*Progress
From time to time one learns of a new algorithm for doing a
certain type of computation, say resultants of a pair of polynomials.
One may read that the new method is vastly better than anything that
came before it, and see, in the literature, examples to support
that notion. Sometimes the new algorithms are easy to implement,
and we do so.
*Regress, or, What's the down side?
Quite often it turns out that there are classes of problem for
which the old method simply worked better, e.g. faster or using
far less memory. For example, when we implemented the Sylvester
matrix method for computing resultants, we found many classes
of example for which the polynomial remainder sequence method
worked better. This was not a surprise, but...
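The Sylvester method itself is easy to state: the resultant of two polynomials is the determinant of their Sylvester matrix. A small exact-arithmetic sketch in Python (my own illustration, not WRI's code), with coefficients listed from highest degree down:

```python
from fractions import Fraction

def sylvester(f, g):
    m, n = len(f) - 1, len(g) - 1          # degrees of f and g
    size = m + n
    rows = []
    for i in range(n):                      # n shifted copies of f
        rows.append([0] * i + f + [0] * (size - m - 1 - i))
    for i in range(m):                      # m shifted copies of g
        rows.append([0] * i + g + [0] * (size - n - 1 - i))
    return rows

def det(matrix):
    # Exact Gaussian elimination over the rationals.
    a = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def resultant(f, g):
    return det(sylvester(f, g))

# Resultant of x^2 - 1 and x - 2: the roots of x^2 - 1 are +-1,
# and (1 - 2) * (-1 - 2) = 3. A zero resultant signals a common root.
```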
In this situation we must resort to various heuristics in order
to decide upon the method to use for a given input. This is how
any good polyalgorithm should behave.
So what is the problem?
The added problem is that we now have a "legacy" wherein, if we
get the method choice wrong, we behave worse than in the past,
even though we might rightly claim to use "better" methods.
This is the dilemma of having more choices.
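A polyalgorithm's method choice, reduced to its skeleton in Python (the method names and the size threshold are invented for illustration):

```python
# A cheap heuristic inspects the input and dispatches to whichever
# method is expected to win. When the guess is wrong, we are slower
# than the old code ever was -- the legacy cost of having more choices.

def old_method(data):
    return ('old_method', sorted(data))

def new_method(data):
    return ('new_method', sorted(data))

def poly_solve(data, threshold=100):
    # Heuristic: the "better" new method pays off only on big inputs.
    if len(data) < threshold:
        return old_method(data)
    return new_method(data)
```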
Heuristics...
The all-important quote:
"Heuristics are bug ridden by definition. If they didn't have bugs,
then they'd be algorithms." (Author: unknown)
Lesson 1: Even good heuristics can only take you but so far.
Lesson 2: Bad heuristics can take you further, but in the wrong direction.
A drawback to code "improvement"...
*Fix me here...
In symbolic computation it is quite common to find that fixing one
problem exposes weaknesses elsewhere. For example, say algorithm A
decides to do operation X and, if it succeeds, operation Y, else Z.
Say operation Y has six legs (as in "buggy"), but operation X
returns a failure status so algorithm A goes on to do Z and gets
a respectable result. What happens when operation X is improved
so that it no longer returns a failure flag?
*...break me there...
This was NOT a trick question. What happens is that algorithm A
now uses defective procedure Y and messes up.
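The X/Y/Z story, made concrete in Python (all names are invented; op_y carries the latent bug that the old op_x's failures happened to mask):

```python
def op_y(value):
    return -1                           # six-legged: always wrong

def op_z(value):
    return value + 1                    # the humble, working fallback

def algorithm_a(value, op_x):
    ok, x_result = op_x(value)
    if ok:
        return op_y(x_result)           # "success" path -- hits the bug
    return op_z(value)

old_op_x = lambda v: (False, None)      # always reported failure
new_op_x = lambda v: (True, v)          # "improved": now succeeds

# With the old op_x, A quietly took the Z path and gave a respectable
# answer. With the improved op_x, the latent bug in Y finally surfaces.
```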
Code modularity...
The ways in which various parts of Mathematica depend upon one
another lead to certain undesirable outcomes. One is that the
Mathematica kernel is not terribly modular. That is to say, it
is difficult to extract various components e.g. "arithmetic" or
"numerics" or "polynomial algebra" or "graphics", as there tend
to be interconnections.
As the kernel has been built up over many years, this is even
more difficult than might otherwise be the case, because in
earlier times the need for modularity, clean "application program
interfaces" (APIs), etc., was not so well understood, at least not
by mathematicians. Times, happily, have changed. Some bright spots:
*The Mathematica kernel withstood the transition to the GMP library
for arithmetic. This means the arithmetic, at least, has good,
modular structure.
*Mathematica has extensible numeric differential equation solvers.
These, moreover, use some very nice modular code with carefully
designed data structures.
*Mathematica has recently seen the addition of an extensible random
number generator, complete with documented API.
*Mathematica has a huge test suite. This means that when changes
are made, if they have a bad impact on some part of the kernel
far away, it is often caught quickly in-house.
Speed vs. replicability...
Caution: This next topic scares people. Take a deep breath.
*Constraining an evaluation
In order to prevent code from hanging it is sometimes necessary
to rely on time constraints. The flow of a computation (e.g. a
symbolic integral) is then influenced by whether a particular
subcomputation does or does not finish in under the allotted time limit.
*Obvious consequence
The same computation may follow different paths on different
machines, or even on the same machine under different workloads.
*Slightly less obvious consequence
Results, both good and bad, may be sporadic and difficult to replicate.
*Still less obvious consequence
A faster machine can be much slower for a given computation.
How?! Like so: Algorithm A tries subcomputation X. If it succeeds
in under its time constraint, it goes on to do Y, else Z. If Z
is, for the particular problem at hand, much faster than Y,
then the faster machine that completes X is now heading into
the tarpit of Y while its slower cousin zips instead through Z.
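This effect can be modeled without any actual clocks. In this Python sketch (costs, limits, and "speed" are all invented numbers), the fast machine finishes X within the limit and so pays for the expensive Y, while the slow machine times out on X and takes the cheap Z:

```python
def run_with_limit(cost, speed, limit):
    # 'speed' stands in for the host machine; elapsed time is cost/speed.
    elapsed = cost / speed
    return (True, elapsed) if elapsed <= limit else (False, limit)

def algorithm_a(speed, x_cost=1.0, y_cost=100.0, z_cost=2.0, limit=0.5):
    ok, spent = run_with_limit(x_cost, speed, limit)
    if ok:
        return spent + y_cost / speed    # X succeeded: on to the tarpit Y
    return spent + z_cost / speed        # X timed out: zip through Z

fast_total = algorithm_a(speed=4.0)   # X fits the limit, so Y runs
slow_total = algorithm_a(speed=1.0)   # X times out, so Z runs
```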
*Another, quite unobvious issue with use of such constraints
Computations that must end due to time or similar constraints are
aborted. Any "change of state" (e.g. temporary resetting of an
option to some function) during the computation must then be
restored by the caller. This can be tricky when coding in
Mathematica and there are nested uses of TimeConstrained.
Example:
answer = TimeConstrained[
    computation1;
    result = TimeConstrained[
        computation2[SomethingThatChangesState], 30];
    FixStateIfAborted[result];
    result,
    10]
If the inner TimeConstrained invocation is aborted then the state
altered by computation2 will be restored. If instead the outer
one is interrupted, and if, unlike this simple example, it is
calling from another planet rather than just outside the door,
it may not realize there could be a state change to repair.
There is a function in Mathematica called CheckAbort that,
I believe, is meant to be used for such scenarios. But...I don't
know how it works, or whether it can really handle issues with
nesting of TimeConstrained.
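Mathematica's abort semantics are subtle, as admitted above, but the general remedy in languages with structured unwinding can be sketched in Python: a try/finally (here via a context manager) restores state no matter which nesting level the abort comes from, which is exactly the discipline a raw abort can skip. All names are invented:

```python
import contextlib

state = {'option': 'default'}

@contextlib.contextmanager
def temporary_option(value):
    saved = state['option']
    state['option'] = value
    try:
        yield
    finally:
        state['option'] = saved        # restored even on an abort

class Abort(Exception):
    """Stand-in for a TimeConstrained-style interrupt."""

def aborted_computation():
    with temporary_option('special'):
        # The nested work begins with the changed state...
        assert state['option'] == 'special'
        raise Abort()                  # ...and an outer limit fires here

try:
    aborted_computation()
except Abort:
    pass
# Despite the abort arriving "from another planet", the option is back.
```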
Scaling an octopus...
Over time the functionality, and corresponding code base, of
the Mathematica kernel has grown considerably. This creates
several burdens.
*Design consistency must be maintained. This is vital because
design consistency has been the backbone, I believe, of the
commercial success of Mathematica.
*New functionality must be robust. A small subset of my colleagues
may disagree, but in general I find that neat bells and whistles
do not win friends if they behave badly e.g. cause the program to crash.
*New functionality should not detract too much from our capability
to repair/improve/maintain old functionality.
*There are endless opportunity costs involved in deciding on
what needs to be added, what should be rewritten, what needs
debugging, etc.
*Interactions with existing technology may cause trouble. New methods
invite use elsewhere, and an interesting problem then
arises. Code that "worked" in the past, but using perhaps non-robust
technology, may begin to fail, e.g. because the new methods are
too slow, or have bugs, or are being extended beyond their
appropriate "operating range" (e.g. relying on a polynomial
solver to handle transcendental equations), or ....
*New functionality requires documentation!!!
*Larger memory footprint of the program (what will happen when
Congress revokes Moore's Law?)
*Larger code base implies a larger burden of maintaining legacy code.
So how well does the Mathematica kernel scale as it grows? Thus
far I think remarkably well. But as every stock prospectus will
caution: "Past performance is no guarantee of future returns".
Summary...
I covered several problematic areas in the implementation of
the Mathematica kernel...
...but most areas are not burdened by such issues...
...and even the exceptions tend to work fine for most purposes, so...
...the outlook is far less bleak than one might think.
Here is what I expect to see down the road.
Over time, some of the problematic areas actually get improved.
Other ideas, methods, functions get developed that make some
problem areas less relevant.
We are at the crossroads between voodoo and science. This makes for
interesting times in computational mathematics. It is anyone's
guess how far we will go, or to what extent voodoo can be
replaced, if not by science, then at least by something more
deserving to be called "art".
My motto? "I write the legacy code of the future."