AIDS Research with Mathematica: An Introduction to Curve Fitting

"Sedimentation Equilibrium Analysis of the HIV NucleoCapsid Protein p24"


Introduction to the Biology

The detailed structure of the HIV core remains unknown. On the basis of electron microscopy, however, researchers believe it to be a highly organized, tight shell.


Graphic Courtesy of Janice Kuby

This "shell" surrounding the viral RNA is composed of the proteins p24 and p7. Since disrupting this protective shell would expose the fragile RNA, it provides a potential target for antiviral agents. The first step toward a more detailed understanding of the shell is to determine the association properties of the individual components.

Introduction to the Technique

Two main factors act on a protein when it is spun in an analytical ultracentrifuge: the centrifugal force pushing the protein toward the bottom of the centrifuge cell, and the tendency of the sample to diffuse back up the resulting concentration gradient. There is a range of rotor speeds at which these two effects balance and a stable protein distribution can be obtained.


When equilibrium is reached, a single homogeneous protein distributes exponentially, as given by the following equation:

$$c_r = c_0 \exp\left[\frac{M(1 - \bar{v}\rho)\,\omega^2\,(r^2 - r_0^2)}{2RT}\right]$$

where $c_r$ and $c_0$ are the concentrations of the sample at radial position $r$ and at a reference position, taken to be the meniscus, respectively. $M$ is the sample molecular mass, $\bar{v}$ and $\rho$ are the sample partial specific volume and solvent density, $\omega$ is the angular velocity, $r$ is the distance from the center of rotation, $r_0$ is the radial position of the reference concentration, $R$ is the universal gas constant, and $T$ is the absolute temperature.
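To make the roles of the symbols concrete, here is a minimal sketch in plain Python (not the notebook's Mathematica), assuming the standard ideal single-species form c(r) = c0 exp[M (1 - vbar rho) omega^2 (r^2 - r0^2) / (2 R T)] in CGS units. The function name and the run values (20,000 rpm, 293.15 K, vbar = 0.73 mL/g, rho = 1.0 g/mL, meniscus at 6.8 cm) are hypothetical and chosen only for illustration.

```python
import math

R_GAS = 8.314e7  # universal gas constant in erg/(mol K), CGS units


def equilibrium_concentration(r, c0, mass, vbar, rho, omega, r0, temp):
    """Concentration at radius r (cm) for one ideal species at equilibrium."""
    exponent = (
        mass * (1.0 - vbar * rho) * omega**2 * (r**2 - r0**2)
        / (2.0 * R_GAS * temp)
    )
    return c0 * math.exp(exponent)


# Hypothetical run conditions, not taken from the original dataset.
omega = 20000 * 2.0 * math.pi / 60.0  # rpm converted to rad/s
c_meniscus = equilibrium_concentration(6.8, 0.01, 25562, 0.73, 1.0, omega, 6.8, 293.15)
c_bottom = equilibrium_concentration(7.2, 0.01, 25562, 0.73, 1.0, omega, 6.8, 293.15)
```

At the reference radius the exponent is zero, so the function returns the reference concentration itself; farther from the center of rotation the concentration rises exponentially in r squared.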
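For a simple self-associating system, monomer + monomer $\rightleftharpoons$ dimer, the observed distribution is the sum of two exponentials, one for monomer and one for dimer:

$$c_r = c_{0,1} \exp\left[\frac{M(1 - \bar{v}\rho)\,\omega^2\,(r^2 - r_0^2)}{2RT}\right] + c_{0,2} \exp\left[\frac{2M(1 - \bar{v}\rho)\,\omega^2\,(r^2 - r_0^2)}{2RT}\right]$$

where $c_{0,1}$ and $c_{0,2}$ are the monomer and dimer concentrations at the reference position.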
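The two-exponential form can be sketched as follows in plain Python. This is an illustration under stated assumptions, not the notebook's own cell: the run-dependent factors are lumped into a single constant `k` = (1 - vbar rho) omega^2 / (2 R T), the dimer is assumed to have exactly twice the monomer mass, and each species gets its own reference concentration. All names are hypothetical.

```python
import math


def monomer_dimer_concentration(r, r0, c0_monomer, c0_dimer, mass, k):
    """Total signal at radius r as the sum of a monomer and a dimer term.

    k is the lumped constant (1 - vbar*rho) * omega**2 / (2*R*T),
    assumed to be known from the run and sample parameters.
    """
    x = k * (r**2 - r0**2)
    monomer = c0_monomer * math.exp(mass * x)
    dimer = c0_dimer * math.exp(2.0 * mass * x)  # dimer mass is 2 * monomer mass
    return monomer + dimer


# Hypothetical values for illustration only.
total_at_reference = monomer_dimer_concentration(6.8, 6.8, 0.008, 0.002, 25562, 2.43e-5)
total_at_bottom = monomer_dimer_concentration(7.2, 6.8, 0.008, 0.002, 25562, 2.43e-5)
```

Because the dimer exponent is twice the monomer exponent, the dimer term grows faster with radius and dominates near the cell bottom even when it is a minor component at the meniscus.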

By fitting the measured data to a series of aggregation models, it is therefore possible to determine the degree to which the protein self-associates.

Here we use Mathematica to demonstrate one possible way to fit a set of experimental data to this model.

Read in the dataset.


Provide the run and sample parameters and calculate the constant term.
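The notebook cell for this step is not reproduced here. As a rough sketch in plain Python, the constant term can be taken to be everything in the exponent except the mass and the radial term, that is (1 - vbar rho) omega^2 / (2 R T). The rotor speed, temperature, partial specific volume, and density below are hypothetical placeholders, not the values used in the original run.

```python
import math

R_GAS = 8.314e7  # universal gas constant in erg/(mol K), CGS units


def constant_term(rpm, temp_kelvin, vbar, rho):
    """Lumped constant (1 - vbar*rho) * omega^2 / (2*R*T) in mol/(g cm^2)."""
    omega = rpm * 2.0 * math.pi / 60.0  # angular velocity in rad/s
    return (1.0 - vbar * rho) * omega**2 / (2.0 * R_GAS * temp_kelvin)


# Hypothetical run and sample parameters.
k = constant_term(rpm=20000, temp_kelvin=293.15, vbar=0.73, rho=1.0)
```

With the constant precomputed, each model only has to multiply it by the species mass and by (r^2 - r0^2), which keeps the fitted expressions short.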


Display the primary data.



Load the nonlinear regression package.


The first model to try is that for a single species.   

Determine the best-fit values for its mass and $c_0$ using NonlinearRegress, with starting guesses of 20,000 Da and 0.01 OD.
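NonlinearRegress performs an iterative nonlinear least-squares fit. The sketch below does not reproduce that; it swaps in a simpler log-linear least-squares fit, which works for the single-species model because ln(c) is linear in k (r^2 - r0^2) with slope equal to the mass. It is written in plain Python, uses synthetic noise-free data, and all names and numbers are hypothetical. An estimate of this kind can also serve as a starting guess for a full nonlinear fit.

```python
import math


def fit_single_species(radii, conc, r0, k):
    """Estimate (mass, c0) from ln(c) = ln(c0) + mass * k * (r^2 - r0^2)."""
    xs = [k * (r**2 - r0**2) for r in radii]
    ys = [math.log(c) for c in conc]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    mass = sxy / sxx                      # slope of the straight line
    c0 = math.exp(y_mean - mass * x_mean)  # intercept, back-transformed
    return mass, c0


# Synthetic data generated from a known mass, not the experimental dataset.
k = 2.43e-5
r0 = 6.8
radii = [6.8 + 0.02 * i for i in range(21)]
conc = [0.01 * math.exp(25562 * k * (r**2 - r0**2)) for r in radii]
mass, c0 = fit_single_species(radii, conc, r0, k)
```

On real data the log transform reweights the noise, so the result differs somewhat from a direct nonlinear fit; that is one reason to prefer the nonlinear routine for the final values.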



The determined mass of 39,430 ± 155 Da is significantly greater than the monomer mass of 25,562 Da obtained from mass spectrometry.

Display the observed data and the fitted curve. The intermediate plot of the best fit line alone is suppressed.



Check the quality of the fit by displaying the residuals. If the model is correct, there should be no visible pattern.
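Besides inspecting the plot by eye, one simple numerical check for a pattern is to count runs of same-sign residuals and compare the count with what random scatter would give. The plain-Python sketch below uses the expected run count from the Wald-Wolfowitz runs test; the residual values are invented for illustration and the helper names are hypothetical.

```python
def count_sign_runs(residuals):
    """Number of runs of consecutive residuals that share the same sign."""
    signs = [res > 0 for res in residuals if res != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def expected_runs(residuals):
    """Expected number of runs if the signs were randomly ordered."""
    n_pos = sum(1 for res in residuals if res > 0)
    n_neg = sum(1 for res in residuals if res < 0)
    return 1.0 + 2.0 * n_pos * n_neg / (n_pos + n_neg)


# Invented residuals with a systematic bow shape: high, low, then high again.
example = [0.004, 0.002, 0.001, -0.001, -0.003, -0.003, -0.001, 0.001, 0.003, 0.005]
runs = count_sign_runs(example)
runs_if_random = expected_runs(example)
```

Far fewer runs than expected, as in this example, suggests the residuals drift systematically above and below zero, which is the signature of a model that is missing a component.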



The nonrandom residuals indicate that the simple model is incorrect.

We now know from the determined average molecular weight that p24 self-associates. The next step is to determine the degree of self-association.

The next model has components representing the monomer and dimer masses. It is the natural choice because the average molecular weight determined previously, 39.4 kDa, lies between the monomer mass of 25.6 kDa and the dimer mass of 51.1 kDa. For simplicity, objects relating to the monomer-dimer model are identified by the suffix 12.

Fit to a monomer-dimer model.



This model is clearly better because the variance, a numerical value for the quality of the fit, dropped by almost an order of magnitude.   

Show the new fit.




Display the new residuals.



The drop in variance is significant. Equally important, the residuals are now random.

Break the model into its components, thereby showing the amount of monomer and dimer in the sample. Display of the intermediate plots is again suppressed.  





Although the monomer-dimer model fits the data well, it is necessary to examine other likely models.  

To check for higher-order association, fit to a model that includes a tetramer component.



The variance has dropped again after adding the tetramer component. This time, however, the drop is much smaller, on the order of one percent. The question, then, is whether this drop is meaningful.
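One common way to put a number on that question, not necessarily the approach taken in the original notebook, is an extra-sum-of-squares F statistic for nested models. The plain-Python sketch below assumes residual sums of squares are available for both fits; the sums of squares, parameter counts, and number of data points are invented for illustration and are not taken from this dataset.

```python
def f_statistic(ssr_simple, p_simple, ssr_extended, p_extended, n_points):
    """Extra-sum-of-squares F statistic for two nested least-squares models."""
    numerator = (ssr_simple - ssr_extended) / (p_extended - p_simple)
    denominator = ssr_extended / (n_points - p_extended)
    return numerator / denominator


# Invented numbers: a one percent drop in the residual sum of squares
# after adding one parameter, with 200 data points.
f_value = f_statistic(
    ssr_simple=1.00e-3, p_simple=2,
    ssr_extended=0.99e-3, p_extended=3,
    n_points=200,
)
```

With these invented numbers the statistic is close to 2, below the 5% critical value of roughly 3.9 for 1 and 197 degrees of freedom, so an improvement of this size would not by itself justify the extra component.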

Display the fit.




Display the residuals.



There is little visible difference in the residuals between the monomer-dimer and monomer-dimer-tetramer models.

Display the individual components.






By displaying each of the individual components, the insignificant amount of tetramer in the sample becomes clear. The slight drop in the variance is simply due to adding one extra parameter to the model.


From these simple models, it is clear that p24 does not self-associate significantly beyond the dimer. Although this analysis does not take into account the contribution of the RNA and p7 to the formation of the nucleocapsid, the cartoons showing p24 forming a highly associated and organized protective shell by itself cannot be correct.