## IntroductionAnalysis of biomedical data usually involves fitting the measured data to a model in order to make some predictions about the system under investigation. Choosing the right approach to experimental curve fitting is important if you are to get good results. Sometimes it may be sufficient to use an interpolation method, but often it will be necessary to use nonlinear fitting and all of its associated diagnostic statistics. For example, kinetic data may be fit to a Michaelis-Menten model in order to gain insight into the efficiency of an enzyme, or data from a baseline control experiment may be fit to a quadratic in order to predict the correction necessary for the full experiment. While these two examples may seem very similar, they are in fact very different, and it is important to understand how the differences affect the analysis. These differences lead to different procedures for performing the fit and to differences in the degree of care that must be taken to ensure good results. Another difference is that in the case of the Michaelis-Menten example each of the parameters being determined has a definite biological significance, while there is no biological significance to the parameters in the quadratic case. Clearly it is critical to check the determined parameters for physical reality, good confidence intervals, and so on if they are expected to have biological significance, but it is unimportant if you just want an approximate curve. The biggest difference though is that the Michaelis-Menten example is a nonlinear model, while the quadratic example is a mathematically linear model.
If you have to do a full curve fit, the critical question becomes whether a model is
linear or nonlinear. Unfortunately, this is not always immediately obvious (although,
as a general rule of thumb, biology is nonlinear). A nonlinear model can be defined as
one in which at least one of the partial derivatives of the model with respect to its
parameters includes at least one of the parameters. Use of the symbolic capabilities
of
Given some data and a linear model, the fitting algorithm solves certain linear equations
to determine the best-fit parameters and their confidence intervals. With a nonlinear model,
however, it is not possible to solve the equations directly and so the fitting algorithm has
to perform a search. With any search it is possible to get different results depending on
where you start and when you decide to stop. Nonlinear fitting is a little like being placed
in the middle of a mountain range and trying to find the lowest point. The point you think
is the lowest is dependent on where you are when you start and how long you are
prepared to look. Unless you search the entire area, no matter where you stop,
you cannot be certain that you have found the lowest point in the range or just the
lowest point that you have seen. In fitting terms, you can never be sure that you have
found the absolute best fit, just that you have found a good fit; worse, you have to make
sure that you even have a good fit. Although this is true with any software,
Because of these fundamental differences between linear and nonlinear fitting,
starting guesses and convergence checking are important for nonlinear models
but are irrelevant to linear models. The fitting algorithm in
Whether you need to perform a complex nonlinear fit or can make do with a simple
interpolation,
If you are simply trying to obtain a standard curve for comparison or for baseline
subtraction, and you have plenty of data to describe the line or curve, it is simplest
to use The returned InterpolatingFunction can be used to calculate the expected values of any
input x, but will be most accurate within the specified range, in this case between 0 and 100.
Compare this result with the expected value of Sqrt[50].
InterpolatingFunction will give the exact value of y at the x values used to
construct the approximation.
Linear fitting is performed in As an example, consider fitting the square root data to a quadratic. If you need diagnostic information to determine the quality of the fit or confidence intervals of the parameters, then it will be necessary to use the function Regress
available
in the standard add-on package Statistics`LinearRegression`.
This output is the default. In short, TStat should be high, PValue
should be below 0.05,
RSquared should be close to 1, and the lower EstimatedVariance,
the better. Many
more diagnostics can be generated, and a more complete example is shown in
Figure 2. A complete list of the diagnostics is as follows:
Good nonlinear curve fitting is an art form. Even with the best algorithms, it is often
necessary to provide accurate starting guesses for parameters, and it is always necessary
to check the results thoroughly. Fortunately, it is easy to provide starting guesses in
This data set (Meyer and Roth, 1972) gives five values for x1, x2, and y,
describing a reaction
involving the catalytic dehydration of n-hexyl alcohol. Here x1 is partial pressure of alcohol,
x2 is olefin, and y is the rate of reaction.
In short, the lower EstimatedVariance is, the better the fit, and the
smaller the range of the confidence intervals (CI), the better. Of particular interest
is FitCurvatureTable, which may be used to gauge the reliability of the linear
approximation confidence intervals. If the linear approximations are reasonable,
the maximum intrinsic curvature will be small compared to the confidence region.
As in the linear case, the default behavior shows only a few of the many diagnostic
statistics that may be generated. A more complete example is shown in
Figure 3.
Getting good results from curve fitting is not as simple as having good data and the right model. It is also important to choose the right algorithm. When performing nonlinear fitting, it is critical to take notice of all of the information that is provided and check that the fit is really a good one that makes physical sense. Remember that the algorithm being used knows nothing about biology, and it is up to you to accept or reject its results based on your knowledge.
Further examples of data analysis with
McCullough, B. D. "The Accuracy of (Ian Brooks is an applications developer at Wolfram Research.) |