|
|
||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1Dipartimento di Informatica e Sistemistica, Università degli Studi di Pavia, Pavia; and 2Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Padova, Padua, Italy
Submitted 30 May 2003 ; accepted in final form 31 August 2005
| ABSTRACT |
|---|
|
|
|---|
identification of glucose minimal model; intravenous glucose tolerance test; normal subjects; insulin sensitivity; glucose effectiveness.
The MM is identified from plasma glucose and insulin concentrations measured in the individual during an intravenous glucose tolerance test (IVGTT). The standard sampling schedule is a frequent one (FSS) and consists of
30 points collected over 4 h. To reduce the experiment complexity and economic costs and facilitate the MM use in large clinical/epidemiological studies, reduced sampling schedules (RSS), consisting of 1314 samples, have been proposed for normal subjects (18). Unfortunately, by using RSS, the precision of SG and SI estimates deteriorates and often becomes poor. This difficulty was recently pointed out by Vicini and Cobelli (21), who proposed, as an alternative to the standard nonlinear least squares (NLS) model identification strategy, the use of a "population" approach, namely the iterative two-stage (ITS) approach (19). They showed on 16 normal (or healthy) subjects that ITS allows, in the presence of RSS, a significant improvement (with respect to NLS) of the precision of parameter estimates in the single individual. However, a first intrinsic limitation of ITS is given by the fact that, being a population approach, it requires a sufficiently large (e.g., tens of subjects) and homogenous IVGTT database. In addition, with ITS, any variation in the IVGTT dataset (e.g., the addition of a new subject or the elimination of an outlier) implies to recalculate the parameter estimates for all of the subjects. Other critical points of ITS concern statistical/theoretical issues. For instance, ITS assumes that model parameters are normally distributed within the population and provides confidence intervals that are, by construction, symmetrical with respect to point estimates (e.g., means ± 2 SD) and thus not necessarily realistic (e.g., they can include negative values). The availability of reliable confidence intervals can be crucial in those investigations where the estimated physiological parameters have to be used, possibly together with other clinical, metabolic, and genetic variables, e.g., for characterizing a population or for performing clustering, classification, regression, or risk analysis (6, 10, 11).
In this paper, we approached the problem of identifying MM in the presence of RSS by developing a Bayesian parameter estimation method implemented by Markov chain Monte Carlo (MCMC) (7, 9). The method works on a single individual basis and thus does not impose the requirements of population methods like ITS. By applying the method to the same IVGTT dataset of 16 normal subjects analyzed by Vicini and Cobelli (21), we show that, in passing from FSS to RSS, only a limited deterioration in parameter estimate precision occurs and that, in nearly all of the subjects, point parameter estimates do not significantly change. In addition, by using the Bayesian method, credible confidence intervals (e.g., excluding negative values) are obtained. In nearly all of the subjects, the RSS confidence intervals are very close to the FSS ones, suggesting that the Bayesian methodology can compensate for the loss of information due to sampling reduction. The new method also performs better, in terms of both accuracy and precision of estimates, than the ITS population method (and thus NLS).
| MATERIALS AND METHODS |
|---|
|
|
|---|
The IVGTT data have been already described by Vicini et al. (20), to which we refer the reader for details. Briefly, in each of the 16 normal adults, a glucose bolus (between 300 and 330 mg/kg body wt) was administered at time 0. Plasma glucose and insulin concentrations were measured for 4 h with an FSS of 30 samples, from which an RSS of only 14 samples (bold characters) was extracted (0, 2, 3, 4, 5, 8, 10, 12, 14, 16, 18, 20, 24, 28, 32, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 140, 160, 180, 210, 240 min). The adopted RSS is based on previous studies presented in the literature on optimal sampling schedules for MM identification (4, 17). Measurement errors of glucose concentrations are assumed to be additive, independent, and Gaussian, with zero mean and a 2% coefficient of variation (CV).
MM of Glucose Kinetics
The equations of the MM of glucose kinetics during an IVGTT (2) can be written as follows (15):
![]() | (1) |
![]() | (2) |
MM Identification: NLS and ITS
MM parameter estimation consists of determining the unknown model parameter vector
= [SG, SI, G0, p2]T from plasma glucose concentration measurements by assuming plasma insulin concentration I(t) as a known input (from G0, one obtains V through the above relation, being D and Gb known). Plasma glucose concentration measurement at time ti, i.e., G(ti), can be described as
![]() | (3) |
) is a function of
defined from Eqs. 1 and 2, yi is the measurement, and vi is the measurement error. Note that all model parameters are uniquely identifiable a priori once I(t) in Eq. 2 is assumed to be known. Single Individual Methods
The most widely used approach to estimate
from Eq. 3 is NLS:
![]() | (4) |
provided by Eq. 4 also represents the maximum likelihood estimate.
Precision of NLS estimates, expressed by their standard deviation, is obtained from the square root of the diagonal elements of the inverse of the Fisher information matrix (3). From this matrix, a symmetrical 95% confidence interval can be obtained as
± 2SD.
Population Methods
When the data for the MM identification are available in a population of M homogeneous subjects, the so-called population methods, such as ITS (21), can be applied. Briefly, the basic idea of ITS consists of modeling the M unknown parameter vectors
as M different realizations of a random vector drawn from a Gaussian distribution with mean µ and covariance matrix
. The unknown model parameters
, given as µ and
, are then obtained in each subject by maximum a posteriori estimation. The vector µ and the matrix
are, however, unknown, and they are iteratively estimated by using an empirical procedure which, at each iteration, exploits the current estimates of the M model parameter vectors. Once convergence of the method is achieved, a first-order approximation is used to calculate, in each subject, confidence intervals of the estimated model parameters. These intervals are symmetrical by construction with respect to point estimates.
In principle, other more sophisticated population approaches, such as nonlinear mixed effects models (5) or hierarchical Bayesian models (22), could be used to cope with IVGTT MM identification in the presence of RSS. A detailed discussion comparing ITS and other population approaches is reported by Vicini and Cobelli (21). However, the main drawback of all population approaches is that they appear to be oversized for our specific problem. In fact, 1) they are required to have IVGTT data in a homogeneous and quite large group (i.e., tens of subjects); and 2) any variation in the data (e.g., the addition of a new subject or the elimination of an outlier) implies the recalculation of MM parameter estimates in all of the M subjects.
MM Identification: a Bayesian Approach
The key quantity in Bayesian parameter estimation is the a posteriori probability density function f
|y(
|y) (also called, for brevity, the posterior) of the unknown model parameters ("
|y" reads "
given y") (8). From this function, which provides a complete description of the shape of the estimate uncertainty, the 95% confidence intervals can be derived, e.g., as the interval between the 2.5 and the 97.5 percentile of the distribution. Notably, these intervals are allowed to be asymmetrical with respect to any of the possible point estimates obtainable in a Bayesian framework, such as the median, the mode, or the mean.
The a posteriori probability density function of the parameter vector
can be obtained from the Bayes theorem as:
![]() | (5) |
In the right-hand side of Eq. 5, fy(y) does not depend on
and can be considered as a scale factor, fy|
(y|
) is the likelihood function of the data (obtainable from Eq. 3 by exploiting the knowledge of measurement error statistics), and f
(
) is a given function (also called the a priori density function of
or simply the prior) that formalizes in statistical terms the a priori available knowledge on the unknown model parameters. The problem of choosing f
(
) is discussed below.
The Choice of the Prior
The first a priori knowledge that can be incorporated within f
(
) concerns nonnegativity of all MM parameters. A natural choice is thus the adoption of a log-normal distribution (LN) for SG, SI, G0, and p2. Additional a priori knowledge, which is likely useful to overcome possible numerical identifiability problems (13) in presence of RSS, is obtainable by exploiting the sample distributions of MM parameters estimated by NLS in separate groups of subjects undertaken to IVGTT with FSS. This analysis was performed in an independent study on 50 normal subjects (not including the 16 subjects considered in this paper) and showed that estimates are approximately log-normally distributed. Therefore, we have assumed:
![]() | (6) |
in the 50 normal subjects. These values were also found to be stable with respect to changes of the training set, such as the random extraction of subsets of the 50 subjects. Implementation by MCMC
Obtaining f
|y(
|y) from Eq. 5 is a task that is analytically intractable because of the complex relationships between parameters and data. Therefore, an MCMC simulation strategy was used (9, 12). Details are given in the APPENDIX.
| RESULTS |
|---|
|
|
|---|
(
). For allowing a direct comparison with the results of Vicini and Cobelli (21), we will report the estimates of V in place of those of G0. The focus will be on the two most clinically important MM parameters, i.e., SG and SI.
FSS
Figure 1 reports the estimates of MM parameters obtained with NLS and Bayesian estimators by using Simulation Analysis and Modeling Software II (1) and our MCMC algorithm implemented in MATLAB (The MathWorks, Natick, MA), respectively.
|
However, point estimates are very similar. For the sake of brevity, we simply reported the mean of the percent difference between them, computed as
![]() | (7) |
RSS vs. FSS
Figure 2 shows MM parameter estimates and their confidence intervals obtained in the 16 subjects by using the Bayesian estimator with FSS (already shown in Fig. 1) and RSS. The pictures also include the a posteriori distribution of SG and SI in two representative subjects (subjects 1 and 18).
|
As far as point estimates are concerned, in general they are very similar. In fact, percent differences between RSS and FSS [mean and the 10 and 90 percentile, (in parentheses)] are: 1 (13, 25), 4 (33, 7), 3 (22, 20), and 0.4 (6, 8) for SG, SI, p2, and V, respectively, and correlation is excellent (slope close to 1 and intercept close to 0) for SI (r = 0.99) and V (r = 0.93), and good for SG (r = 0.87) and for p2 (r = 0.79).
Therefore, the comparison of RSS and FSS point estimates and confidence intervals allows us to speculate that the same physiological inferences can be drawn by analyzing MM estimates obtained with the Bayesian estimator.
RSS: Comparison with NLS and ITS
The results obtained in presence of RSS with NLS, ITS, and Bayesian estimators are summarized in Fig. 3 for SG and SI [ITS estimates are taken from Vicini and Cobelli (21)].
|
Notably, point estimates provided by the three methods are, in general, quite similar, and in only a few cases a significant difference can be detected, such as for subject 16. In summary, percent differences between NLS and Bayesian point estimates [mean and the 10 and 90 percentiles, (in parentheses)] are 5 (5, 24), 13 (48, 1), 12 (55, 3), and 0.0 (3, 2) for SG, SI, p2, and V, respectively, whereas percent differences among precision CVs are 351 (18, 824), 348 (4, 709), 493 (34, 1,101), and 175 (7, 385). Percent differences between ITS and Bayesian point estimates are 0.9 (18, 24), 11 (10, 25), 14 (66, 29), and 0.2 (8, 8) for SG, SI, p2, and V respectively, whereas percent differences among precision CVs are 57 (36, 179), 21 (32, 82), 29 (33, 139), and 9 (23, 57).
Sensitivity to the Choice of the Prior Distribution
The results reported so far were obtained using the prior-information model described in MM identification: a Bayesian approach. In particular, f
(
) is a log-normal distribution with parameters fitted into a separate population study. It is important to evaluate the sensitivity of Bayesian estimates to the shape of the prior. Therefore, we have recomputed the RSS Bayesian estimates, assuming, such as for ITS, that f
(
) is a normal distribution with mean (0.025, 0.00059, 282, 0.042) and covariance diag([0.0112, 0.000352, 352, 0.052]). These values make the 95% a priori variability intervals similar to those of the fitted log-normal distribution. Figure 4 shows the new estimates of SG and SI in 16 subjects (middle bars) compared with those previously obtained in RSS vs. FSS (bottom bars).
|
(
) normal vs. log-normal can be summarized [mean and the 10 and 90 percentiles, (in parentheses)] as 2 (5, 12), 6 (23, 1), 2 (14, 8), and 0.08 (2, 2) for SG, SI, p2, and V, respectively. To better visually assess the similarity between the two situations, Fig. 5 shows the posterior distributions of SG and SI in a representative subject (no. 5).
|
Another interesting issue is to assess the role of a priori information, i.e., to understand how important it is to specify an informative distribution. To provide an answer to this question, we have recalculated RSS estimates by MCMC without supplying a priori information, i.e., f
(
) is uniform, so that the a posteriori distribution coincides with the likelihood (which in turn is a function of the residual weighted sum of squares of NLS). In Fig. 4, the new estimates of SG and SI in 16 subjects (top bars) are compared with those previously obtained in RSS vs. FSS (bottom bars). In several cases, point estimates are very close to the one obtained in presence of an informative prior, even if some differences are present. In summary, percent differences between the estimates obtained under the assumption of f
(
) uniform vs. log-normal [mean and the 10 and 90 percentiles, (in parentheses)] are 7 (2, 22), 11 (40, 3), 58 (13, 351), and 2 (5, 0.4) for SG, SI, p2, and V, respectively. Confidence intervals are equal to or larger than those obtained with the log-normal prior. This demonstrates that a priori information plays a role in improving the performance of the Bayesian estimator. Moreover, because even using the noninformative prior brings, in most cases, estimates much more precise than those obtained by NLS (e.g., compare subject 17 in Fig. 4 with Fig. 3), we can conclude that the MCMC simulation strategy avoids the numerical identifiability problems often present with NLS estimates.
| DISCUSSION |
|---|
|
|
|---|
To better grasp the importance of having a credible measure of estimate uncertainty, let us discuss in more detail three situations. First, suppose one wants to characterize a (sub)population of M subjects by determining the variability of SG within it. The most natural and easy-to-implement approach is the determination of the histogram of SG from its M point estimates. When M is quite small, e.g., M = 16, as in our case, the histogram is unavoidably rough (Fig. 6, top) and also does not take into account the different degrees of uncertainty of the M parameter estimates. On the other hand, by exploiting the sampled a posteriori distributions of SG obtained through MCMC (e.g., Fig. 5) in the M subjects, a much more significant distribution (Fig. 6, bottom) can be obtained intrinsically, accounting for the different precision of the estimates in the M subjects. Of interest, the distribution of SG exhibits a smoother tail toward high rather than low values. As a second example, let us consider a classification problem where a subject has to be placed in class A or B on the basis of MM parameter estimates. If, for instance, MM parameter estimates are uncertain, the use of point estimates without taking into account their credibility is not statistically robust unless A and B are well separated. By contrast, exploiting the posterior distribution of parameter estimates allows one to obtain not only a more statistically sound classification of the subject, but also a measure of the credibility of the chosen class. The third example concerns regression (10) and risk analysis (6, 11), where, using a Bayesian framework, the uncertainty of MM parameter estimates can be propagated into the predicted outcome variables.
|
(
) developed in MM indentification: a Bayesian approach is only appropriate for normal subjects. For insulin-resistant subjects, for instance, a starting point could be the prior proposed in Pillonetto et al. (15) to cope with numerical identifiability problems that can sometimes affect MM identification, even in the presence of FSS. | APPENDIX |
|---|
|
|
|---|
30,000 samples were required to have a satisfactory description of the posterior and thus a "robust" assessment of point estimates and confidence intervals. Tens of minutes are required to identify the MM model in a single individual on a Sun Ultra 10. The MCMC algorithm was originally implemented in MATLAB, but to make it easily accessible to researchers the method we proposed, a compiled version for Windows platform, is available upon request. | GRANTS |
|---|
|
|
|---|
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
| REFERENCES |
|---|
|
|
|---|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| Visit Other APS Journals Online |