Abstract
Bivariate regression is used to estimate energy expenditure from doubly labeled water data. Two straight lines are fitted to the logarithms of the enrichments of oxygen18 and deuterium simultaneously as a bivariate regression, so that the correlations between the oxygen and deuterium regression coefficients can be estimated. Maximum likelihood methods are used to extend bivariate regression to unbalanced situations caused by missing observations and to include replicate laboratory determination from the same urine samples, even if one of the replicates is missing. Use of maximum likelihood allows the determination of a confidence interval for the energy expenditure based on the log likelihood surface rather than use of the propagation of variance methods for nonlinear transformations. The model is extended to include the subject's deviations from the two lines as a bivariate continuoustime firstorder autoregression to allow for serial correlation in the observations. The analysis of data from two subjects, one without apparent serial correlation and one with serial correlation, is presented.
 deuterium
 oxygen18
 maximum likelihood
the doubly labeled water technique, pioneered by Lifson (7, 8) for energy expenditure measurement in animals, is now frequently used for precise estimation of free living energy expenditure in humans (2, 13). The technique uses two stable isotopes, oxygen18 (^{18}O) and deuterium (^{2}H), which can exist in water form as H_{2} ^{18}O and^{2}H_{2}O, respectively. The mixed isotopes are given to the subject to drink, and after the isotopes are allowed to equilibrate with total body water, samples of body fluid, e.g., urine or saliva, are taken to determine isotopic enrichment over the measurement period, 14–21 days for adults and 5–8 days for infants (10). The elimination of the isotopes from the body describes a monoexponential function in which the ^{2}H is eliminated only as water, and the ^{18}O is eliminated both as water and carbon dioxide (CO_{2}) (10). The difference in the elimination rates of the two isotopes gives an estimate of the rate of CO_{2} production in the body. The CO_{2} production can be converted to energy expenditure by using either Weir's equation (14) or Elia's equation (5), if the respiratory quotient (RQ) of the subject over the study period is known. In an “ideal” experiment, the isotope elimination curve would consist of as many points (samples) as could be conveniently collected during the course of the experiment. Under more practical conditions, the number of points in each curve is limited by both the sample collection and sample processing constraints, as well as the cost of ^{18}O. Current practice, as determined from the various methods sections of publications, records as few as two (twopoint method) data points to as many as 14–22 (multipoint method) data points (10) used to determine the rate of the isotope elimination.
Cole and Coward (1) have reported a statistics technique to determine confidence intervals for these approaches. Their model, in their notation, follows the recommendation of Prentice (10)
The CO_{2} production rate, with fractionation ignored, is
In this article, we use maximum likelihood estimation based on the multipoint method. Confidence limits for energy expenditure are obtained from the log likelihood surface so that methods of propagation of error variances through nonlinear functions are not necessary. Also, instead of treating the logs of the isotope enrichments as univariate regressions, the pair of observations ^{2}H and^{18}O are treated as a bivariate regression, with correlation between the two observations in the pair. In the analysis of doubly labeled water data, the problem can be somewhat more complicated if one of the measurements at a given time is missing. The other observation can be discarded, but this is a waste of information. If each urine sample is split and analyzed twice, it is possible to estimate the measurement error (analytical variance of the laboratory) as well as the physiological variance of the subject. Prentice (Appendix 4.2, p. 290–293 of Ref. 10) discusses the effect of autocorrelated errors on linear regression and states that “the effect of autocorrelation is to reduce the advantage of the multipoint method.” It is also stated that
Because autocorrelation between errors may arise naturally or may arise from attempting to fit an incorrect model, we recommend that if there are ten or more equallyspaced data points, the autocorrelation coefficient for the residuals is obtained. A high absolute value may help to identify wrongly specified models.
Autocorrelation is also referred to as serial correlation. Serial correlation can be observed in residual plots when the residuals seem to have systematic excursions rather than a random appearance. Serial correlation does not affect the position of the fitted line very much, but it does affect confidence intervals of estimates in a nonconservative way. If serial correlation exists and is ignored in the model, the confidence intervals will be too narrow. In this paper we take the approach that serial correlation in the errors can be modeled, even for unequally spaced data, and tested using likelihood ratio tests to determine whether the serial correlation is statistically significant (6). If serial correlation is present in the data, the error model gives more appropriate statistical tests and confidence intervals.
METHODS
The methods presented here are based on bivariate regression (a special case of multivariate regression) in which there are two correlated response variables. Classically, these methods require balanced data, so there can be no missing observations, and, if there are replicate analytical determinations at each time, there must be the same number at every time, and the means of these replications are used as data. Models of this type can be fitted using SAS (11) PROC GLM by using the MANOVA statement. However, replicates must be averaged, and, if an observation is missing at some time point, that time point is discarded. This is due to the assumption of equal covariance matrices at different times in the bivariate regression. Because there are two components of variance, the analytical variance of the laboratory and the physiological variation of the subject, it is not possible to weight the regression by the number of replications. A proper weighting would be a function of these two components of variance.
Physiological variation causes both log enrichments to vary in a similar way about the fitted lines. This is where serial correlation may come into the model. This variation about the lines may not be independent from observation time to observation time, as is required by standard regression analysis. If an enrichment is above the line at one time point, it is likely to be above the line at the next time point. This variation about the line in a dependent fashion is serial correlation, and the physiological variation is a random process. A physiological model that fits the doubly labeled water data well is
Estimating the variance of the laboratory precision.
The variance of the laboratory precision can be estimated directly from the replication data. If there are m _{i}replications at observation time i, these replications can be averaged and the sum of squares of the deviations from the estimated mean can be calculated. This estimate hasm _{i} − 1 degrees of freedom. The sums of squares and degrees of freedom are summed over observation times, and the variance is estimated as the final sum of squares divided by the total number of degrees of freedom. This must be carried out separately for D and O, because the laboratory precision may be different for deuterium and oxygen18 determination. Because these two variances can be estimated first, it eliminates two parameters from the maximum likelihood estimation. The standard deviations of the laboratory precision are estimated as the square roots of these variances.
Modeling the physiological variation.
The simplest model for physiological variation is that γ(t
_{i}) is statistically independent at different observation times and has a Gaussian distribution with zero mean and unit variance. The variances of the two physiological inputs are then
This model has four parameters in the linear regression, two standard deviation parameters for the laboratory precision, ς_{D} and ς_{O}, and two nonlinear parameters, g_{D} and g_{O}, that are the standard deviations of the physiological variation. Details of the maximum likelihood estimation, the method of determining confidence intervals on the energy expenditure, and the extension to models with physiological serial correlation are shown in the .
EXAMPLES
Table 1 shows the enrichment data for a subject without significant serial correlation (subject 1). Missing values are indicated by periods. The details for the calculation of the scale factors are reported by Prentice (p. 217, Ref.10). The enrichments are represented as a fraction of the initial dose giving a normalized enrichment by use of the formula
The natural logarithms (base e) of the scaled data are calculated, and the standard deviations of the laboratory precision are estimated from the replications of the log data. For this data set, the standard deviations of the laboratory precisions are estimated as δ_{O} = 0.00073 and δ_{D} = 0.01166. It should be noted that the standard deviation of the laboratory precision based on the logs of the data is the coefficient of variation (standard deviation divided by the mean) of the original data.
After the lines are fitted to the data, the predicted values of the lines at each time point are calculated as well as the residuals. The normalized residuals are calculated by dividing each residual by its standard deviation. Plots of the fitted lines and the normalized residuals are shown in Fig. 1(left). A normalized residual that is >2.5 in absolute value should be investigated as a possible error. Various investigators use values of normalized residuals in the range of 2.0–3.0 for detecting errors. The second D normalized residual at 7.06 days is −2.93, whose absolute value is >2.5. Going back to the raw data, we see that the value 521.742 appears to be too low relative to the other data. Setting this value to missing produces normalized residuals, where the largest in absolute value is the second D normalized residual of −2.06 at 12.06 days. This is considered to be acceptable. The estimated coefficient of variation of the measurement precision is decreased slightly toς̂_{D} = 0.00969.
From the regression analysis, 95% confidence limits on the slopes and intercepts can be directly calculated and inverted to get confidence limits on N
_{O} and N
_{D}
The data for a second subject, who had significant serial correlation, are shown in Table 2. The parameters forsubject 2 are δ_{a}(O) = 173.88, δ_{a}(D) = 649.969, δ_{t}(O) = −16.75, δ_{t}(D) = −117.745, a = 0.0121, W = 4.9257, and A = 211.52. The RQ for this subject is estimated to be 0.857.
The results of the regression for subject 2 are
The normalized residuals for this subject are shown in Fig. 1(upper right). These residuals show a more systematic behavior than those from the first subject. This second subject has significant serial correlation based on a likelihood ratio test. The change in −2 ln likelihood when the serial correlation parameter is included is 4.93 and is tested as χ^{2} with one degree of freedom (P = 0.014). When serial correlation is modeled, the physiological deviations from the lines are related at different times. The estimated deviation at one time is used to predict the deviation at the next time. The difference between the predicted deviation and observed deviation can be called a recursive residual. These recursive residuals should be nearly uncorrelated if the error structure is properly modeled. Figure 1 (bottom right) shows the normalized recursive residuals for the second subject. Modeling the serial correlation appears to have removed the systematic behavior. The residual at 5 days looks suspicious, because the data points are all slightly below −2.0. This could be a timing error, possibly caused by an improper urine collection. These data points were not removed from the analysis.
Table 3 shows a comparison of the estimated standard errors of the estimated energy expenditure obtained from the −2 ln likelihood surface and by the method of Cole and Coward (1) for 10 subjects. The standard error of the likelihood method is obtained by dividing the 95% confidence interval by 4 (±2 SE). A computer program in FORTRAN is available from the first author. The data input part of the program is specific for our data format from the spreadsheet that we use. This part of the program would need to be modified by a user with a different data format. The program is fairly large because it includes nonlinear optimization subroutines used to search for maximum likelihood estimates of the nonlinear parameters.
DISCUSSION
We have presented a new multipoint method for estimating energy expenditure using doubly labeled water. The method is based on bivariate regression with two components of variation, physiological variation and the variances of the laboratory precision. Unbalanced data caused by missing observations are handled using a maximum likelihood approach. Confidence intervals for the estimated energy expenditure are obtained directly from the log likelihood surface.
Table 3 shows that the estimated energy expenditures are very close to the estimates obtained by the Cole and Coward (1) method. The standard errors for the Cole and Coward method tend to be somewhat larger than those of the likelihood method. This can be partly explained by the fact that the likelihood method uses all available data when a replicate or a replicate pair are missing. It is possible that the approximations used when propagating variances through nonlinear functions may tend to increase the estimated standard errors.
The model is extended to handle serial correlation in the physiological variation. This model is an improvement over the model in the book by Jones (see p. 181 in Ref. 6). The new error model based on a continuoustime firstorder autoregression fits the data better.
The variances of the logs of the laboratory precision are estimated from the replicate data, where the urine samples are divided into two samples and analyzed separately. Each replicate pair produces one degree of freedom for the variance estimate. If one of the replicates is missing, there is no information about the variance from that pair. Two variances are estimated, one for deuterium and one for oxygen18. These unbiased estimates of the two variances are then held fixed in the maximum likelihood estimates of the coefficients of the two lines and the parameters of the physiological variation.
Normalized residuals from both the estimation of the variances of the laboratory precisions and the fitting of the lines are used for error detection. Based on the standard normal distribution, there is some question about how to choose a cutoff value for detecting an outlier. In our data we usually have ∼14 observations on both deuterium and oxygen18. If there is an outlier in the data, it tends to pull the fitted line toward itself, making the residual smaller. We recommend that an outlier with an absolute value >2.5 be considered as a possible error. A case can be made for a cutoff of anything in the range of 2.5–3, because the number of observations increases the probability of an observation in the range of 2–3 in absolute value. In engineering, the value of 3 is often used, but there may be thousands of data points.
If there is serial correlation in the physiological variation, it is important that it be modeled in the error structure. Modeling the serial correlation error structure affects the estimation of the lines very little, but it does affect the width of the confidence intervals. In general, confidence intervals are shorter if the data have serial correlation and it is not modeled, and these shorter intervals are not correct. It is possible that our model can underestimate the width of the confidence intervals. Some of the parameters, such as RQ, used in the model are assumed to be known, when they are actually estimates.
Acknowledgments
We thank the reviewers for their comments.
Footnotes

Address for reprint requests and other correspondence: R. H. Jones, Dept. of Preventive Medicine and Biometrics, School of Medicine, Box B119, Univ. of Colorado Health Sciences Center, 4200 East 9th Ave., Denver, CO 80262 (Email: rhj{at}times.uchsc.edu).

The first author is partially supported by National Institute of General Medical Studies Grant GM38519. The mass spectrometry core is supported in part by the following National Institutes of Health grants: HD04024, HD20716, DK48520, and DK49181. The Center for Human Nutrition has supported one author (B. J. Sonko) for a pilot study.

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this fact.
 Copyright © 2000 the American Physiological Society
Appendix
Maximum Likelihood Estimation
To handle laboratory replications, physiological variation, and missing observations, it is necessary to use maximum likelihood estimation. These estimates are usually obtained by finding the maximum of the log likelihood or the minimum of −2 log likelihood with respect to the unknown parameters. The reason −2 log likelihood is often used is that χ^{2} likelihood ratio tests for significance of parameters can be calculated from the change in −2 log likelihood when parameters are added or eliminated from the model. In textbook problems, maximum likelihood estimates are usually obtained by differentiating the log likelihood with respect to the unknown parameters, setting the derivatives to zero, and solving a system of equations. In the real world there is often no closedform solution, and numerical searches need to be used.
In matrix notation, the regression model at a single time point,t
_{i}, with a replicate for both the^{2}H and ^{18}O, is
The search for a minimum of −2 log likelihood can be simplified, because for given values of g_{D}, g_{O},
In the unbalanced case, consider the modification when one of the replications is missing. In Eq. 5
, suppose the second oxygen replication is missing. The resulting equation is
Confidence Intervals From the Log Likelihood Surface
Maximum likelihood estimation also allows confidence intervals for the estimated parameters to be estimated directly from the −2 log likelihood surface. This method is sometimes referred to as “likelihood ratiobased confidence intervals” (see p. 289 of Ref.12) or “profile likelihood confidence intervals” (see p. 43 of Ref. 4).
Model Eq. 4
has four linear parameters, α_{D},k
_{D}, α_{O}, and k
_{O}. The main quantity of interest is
Serial Correlation
The physiological variation can be modeled as a continuoustime firstorder autoregression (see p. 92 of Ref. 6). Let δ_{i} = t
_{i} −t_{i}
_{−}
_{ 1} be the time interval between observation time t_{i}
_{−}
_{ 1} and observation time t
_{i}. The model is
Given this model for the physiological variation, the parameters of the model can be estimated using the state space methodology, as in Jones (6). To obtain the likelihood ratio test for whether the inclusion of the parameter for serial correlation is significant, the models with and without serial correlation are fitted to the data, and the change in −2 ln likelihood is tested as χ^{2} with one degree of freedom. The degrees of freedom are the number of additional parameters included in the model, in this case the single parameter “a.”