## Abstract

The minimal model (MM) of glucose kinetics during an intravenous glucose tolerance test (IVGTT) is widely used in clinical studies to measure metabolic indexes such as glucose effectiveness (S_{G}) and insulin sensitivity (S_{I}). The standard (frequent) IVGTT sampling schedule (FSS) for MM identification consists of 30 points over 4 h. To facilitate clinical application of the MM, reduced sampling schedules (RSS) of 13–14 samples have also been derived for normal subjects. These RSS are especially appealing in large-scale studies. However, with RSS, the precision of S_{G} and S_{I} estimates deteriorates and, in certain cases, becomes unacceptably poor. To overcome this difficulty, population approaches such as the iterative two-stage (ITS) approach have been recently proposed, but, besides leaving some theoretical issues open, they appear to be oversized for the problem at hand. Here, we show that a Bayesian methodology operating at the single individual level allows an accurate determination of MM parameter estimates together with a credible measure of their precision. Results of 16 subjects show that, in passing from FSS to RSS, there are no significant changes of point estimates in nearly all of the subjects and that only a limited deterioration of parameter precision occurs. In addition, in contrast with the previously proposed ITS method, credible confidence intervals (e.g., excluding negative values) are obtained. They can be crucial for a subsequent use of the estimated MM parameters, such as in classification, clustering, regression, or risk analysis.

- identification of glucose minimal model
- intravenous glucose tolerance test
- normal subjects
- insulin sensitivity
- glucose effectiveness.

ever since its inception in the 1980s, more than 400 papers have relied on the minimal model (MM) of glucose kinetics (2) to measure metabolic indexes such as glucose effectiveness (S_{G}) and insulin sensitivity (S_{I}) in humans in both normal and physiopathological conditions.

The MM is identified from plasma glucose and insulin concentrations measured in the individual during an intravenous glucose tolerance test (IVGTT). The standard sampling schedule is a frequent one (FSS) and consists of ∼30 points collected over 4 h. To reduce the experiment complexity and economic costs and facilitate the MM use in large clinical/epidemiological studies, reduced sampling schedules (RSS), consisting of 13–14 samples, have been proposed for normal subjects (18). Unfortunately, by using RSS, the precision of S_{G} and S_{I} estimates deteriorates and often becomes poor. This difficulty was recently pointed out by Vicini and Cobelli (21), who proposed, as an alternative to the standard nonlinear least squares (NLS) model identification strategy, the use of a “population” approach, namely the iterative two-stage (ITS) approach (19). They showed on 16 normal (or healthy) subjects that ITS allows, in the presence of RSS, a significant improvement (with respect to NLS) of the precision of parameter estimates in the single individual. However, a first intrinsic limitation of ITS is given by the fact that, being a population approach, it requires a sufficiently large (e.g., tens of subjects) and homogenous IVGTT database. In addition, with ITS, any variation in the IVGTT dataset (e.g., the addition of a new subject or the elimination of an outlier) implies to recalculate the parameter estimates for all of the subjects. Other critical points of ITS concern statistical/theoretical issues. For instance, ITS assumes that model parameters are normally distributed within the population and provides confidence intervals that are, by construction, symmetrical with respect to point estimates (e.g., means ± 2 SD) and thus not necessarily realistic (e.g., they can include negative values). The availability of reliable confidence intervals can be crucial in those investigations where the estimated physiological parameters have to be used, possibly together with other clinical, metabolic, and genetic variables, e.g., for characterizing a population or for performing clustering, classification, regression, or risk analysis (6, 10, 11).

In this paper, we approached the problem of identifying MM in the presence of RSS by developing a Bayesian parameter estimation method implemented by Markov chain Monte Carlo (MCMC) (7, 9). The method works on a single individual basis and thus does not impose the requirements of population methods like ITS. By applying the method to the same IVGTT dataset of 16 normal subjects analyzed by Vicini and Cobelli (21), we show that, in passing from FSS to RSS, only a limited deterioration in parameter estimate precision occurs and that, in nearly all of the subjects, point parameter estimates do not significantly change. In addition, by using the Bayesian method, credible confidence intervals (e.g., excluding negative values) are obtained. In nearly all of the subjects, the RSS confidence intervals are very close to the FSS ones, suggesting that the Bayesian methodology can compensate for the loss of information due to sampling reduction. The new method also performs better, in terms of both accuracy and precision of estimates, than the ITS population method (and thus NLS).

## MATERIALS AND METHODS

### Database

The IVGTT data have been already described by Vicini et al. (20), to which we refer the reader for details. Briefly, in each of the 16 normal adults, a glucose bolus (between 300 and 330 mg/kg body wt) was administered at *time 0*. Plasma glucose and insulin concentrations were measured for 4 h with an FSS of 30 samples, from which an RSS of only 14 samples (bold characters) was extracted (0, **2**, **3**, **4**, **5**, **8**, **10**, 12, 14, 16, 18, **20**, 24, **28**, **32**, 40, 45, 50, **60**, **70**, 80, 90, 100, 110, **120**, 140, 160, **180**, 210, **240** min). The adopted RSS is based on previous studies presented in the literature on optimal sampling schedules for MM identification (4, 17). Measurement errors of glucose concentrations are assumed to be additive, independent, and Gaussian, with zero mean and a 2% coefficient of variation (CV).

### MM of Glucose Kinetics

The equations of the MM of glucose kinetics during an IVGTT (2) can be written as follows (15): (1) and (2) where *G*(*t*) (mg/dl) and I(*t*) (μU/ml) are plasma glucose and insulin concentrations, respectively, with G_{b} and I_{b} representing their baseline values, *X*(*t*) (min^{−1}) represents insulin in a remote compartment, S_{G} (min^{−1}) is glucose effectiveness, S_{I} (min^{−1}·μU^{−1}·ml) is insulin sensitivity, *p*_{2} (min^{−1}) is an insulin action parameter, *G*_{0} is the value of G extrapolated at *time 0*, with G_{0} = D/*V* + G_{b}, where *V* (dl·kg^{−1}) and D (mg·kg^{−1}) are the glucose distribution volume and the dose, respectively.

### MM Identification: NLS and ITS

MM parameter estimation consists of determining the unknown model parameter vector θ = [S_{G}, S_{I}, G_{0}, *p*_{2}]^{T} from plasma glucose concentration measurements by assuming plasma insulin concentration *I*(*t*) as a known input (from G_{0}, one obtains V through the above relation, being *D* and G_{b} known). Plasma glucose concentration measurement at *time t*_{i}, i.e., G(*t*_{i}), can be described as (3) where *h*(*t*_{i},θ) is a function of θ defined from *Eqs. 1* and *2*, *y*_{i} is the measurement, and *v*_{i} is the measurement error. Note that all model parameters are uniquely identifiable a priori once I(*t*) in *Eq. 2* is assumed to be known.

### Single Individual Methods

The most widely used approach to estimate θ from *Eq. 3* is NLS: (4) where *w*_{i} are weights (generally chosen as the inverse of the variance of the measurement error). Note that, if the measurement errors are independent, Gaussian stochastic variables with zero mean and variance equal to 1/*w*_{i}, θ̂ provided by *Eq. 4* also represents the maximum likelihood estimate.

Precision of NLS estimates, expressed by their standard deviation, is obtained from the square root of the diagonal elements of the inverse of the Fisher information matrix (3). From this matrix, a symmetrical 95% confidence interval can be obtained as θ̂ ± 2SD.

### Population Methods

When the data for the MM identification are available in a population of M homogeneous subjects, the so-called population methods, such as ITS (21), can be applied. Briefly, the basic idea of ITS consists of modeling the M unknown parameter vectors θ as M different realizations of a random vector drawn from a Gaussian distribution with mean μ and covariance matrix Ω. The unknown model parameters θ, given as μ and Ω, are then obtained in each subject by maximum a posteriori estimation. The vector μ and the matrix Ω are, however, unknown, and they are iteratively estimated by using an empirical procedure which, at each iteration, exploits the current estimates of the M model parameter vectors. Once convergence of the method is achieved, a first-order approximation is used to calculate, in each subject, confidence intervals of the estimated model parameters. These intervals are symmetrical by construction with respect to point estimates.

In principle, other more sophisticated population approaches, such as nonlinear mixed effects models (5) or hierarchical Bayesian models (22), could be used to cope with IVGTT MM identification in the presence of RSS. A detailed discussion comparing ITS and other population approaches is reported by Vicini and Cobelli (21). However, the main drawback of all population approaches is that they appear to be oversized for our specific problem. In fact, *1*) they are required to have IVGTT data in a homogeneous and quite large group (i.e., tens of subjects); and *2*) any variation in the data (e.g., the addition of a new subject or the elimination of an outlier) implies the recalculation of MM parameter estimates in all of the M subjects.

### MM Identification: a Bayesian Approach

The key quantity in Bayesian parameter estimation is the a posteriori probability density function *f*_{θ|y}(θ|*y*) (also called, for brevity, the posterior) of the unknown model parameters (“θ|*y*” reads “θ given *y*”) (8). From this function, which provides a complete description of the shape of the estimate uncertainty, the 95% confidence intervals can be derived, e.g., as the interval between the 2.5 and the 97.5 percentile of the distribution. Notably, these intervals are allowed to be asymmetrical with respect to any of the possible point estimates obtainable in a Bayesian framework, such as the median, the mode, or the mean.

The a posteriori probability density function of the parameter vector θ can be obtained from the Bayes theorem as: (5)

In the right-hand side of *Eq. 5*, *f*_{y}(*y*) does not depend on θ and can be considered as a scale factor, *f*_{y|θ}(*y*|θ) is the likelihood function of the data (obtainable from *Eq. 3* by exploiting the knowledge of measurement error statistics), and *f*_{θ}(θ) is a given function (also called the a priori density function of θ or simply the prior) that formalizes in statistical terms the a priori available knowledge on the unknown model parameters. The problem of choosing *f*_{θ}(θ) is discussed below.

### The Choice of the Prior

The first a priori knowledge that can be incorporated within *f*_{θ}(θ) concerns nonnegativity of all MM parameters. A natural choice is thus the adoption of a log-normal distribution (LN) for S_{G}, S_{I}, G_{0}, and *p*_{2}. Additional a priori knowledge, which is likely useful to overcome possible numerical identifiability problems (13) in presence of RSS, is obtainable by exploiting the sample distributions of MM parameters estimated by NLS in separate groups of subjects undertaken to IVGTT with FSS. This analysis was performed in an independent study on 50 normal subjects (not including the 16 subjects considered in this paper) and showed that estimates are approximately log-normally distributed. Therefore, we have assumed: (6) where the values of mean and variance were determined by fitting the sample distribution of θ in the 50 normal subjects. These values were also found to be stable with respect to changes of the training set, such as the random extraction of subsets of the 50 subjects.

## RESULTS

Here, we discuss the results obtained on the 16 IVGTT datasets. First, we consider the FSS situation, which represents the reference, and we discuss NLS and Bayesian estimates. Then, we assess the information loss that occurs in a single subject with the Bayesian estimator when RSS instead of FSS is adopted. Subsequently, we compare the Bayesian estimator with both ITS and NLS estimator for RSS. Finally, we discuss the dependence of the estimates on the prior *f*_{θ}(θ).

For allowing a direct comparison with the results of Vicini and Cobelli (21), we will report the estimates of V in place of those of G_{0}. The focus will be on the two most clinically important MM parameters, i.e., S_{G} and S_{I}.

### FSS

Figure 1 reports the estimates of MM parameters obtained with NLS and Bayesian estimators by using Simulation Analysis and Modeling Software II (1) and our MCMC algorithm implemented in MATLAB (The MathWorks, Natick, MA), respectively.

The figure clearly shows that, even in presence of FSS, NLS provides for some subjects (e.g., *subjects 5*, *6*, and *17* for S_{G} or *subjects 5* and *8* for S_{I}) noncredible confidence intervals; i.e., negative values are included. In particular, in 10 of 16 subjects, *p*_{2} estimate exhibits noncredible confidence intervals. On the contrary, confidence intervals of Bayesian estimates are always credible and in general are narrower than those of NLS estimates. In some cases, the difference between the confidence intervals in the two cases is remarkable, e.g., in *subjects 12* and *17*.

However, point estimates are very similar. For the sake of brevity, we simply reported the mean of the percent difference between them, computed as (7) in the 16 subjects together with 10 and 90 percentiles, which can give an idea of the variability of such differences in the various subjects. Average values and 10 and 90 percentiles (in parentheses) are −0.6 (−24, 12), 0.3 (−19, 28), −8 (−31, 7), and 0.3 (−1, 2) for S_{G}, S_{I}, *p*_{2} and *V*, respectively. Similar quantities can be calculated to quantify the percent differences between CV of the estimates. Average values (and 10 and 90 percentiles) are: 321 (6, 811), 219 (9, 554), 304 (2, 672), and 148 (8, 412).

### RSS vs. FSS

Figure 2 shows MM parameter estimates and their confidence intervals obtained in the 16 subjects by using the Bayesian estimator with FSS (already shown in Fig. 1) and RSS. The pictures also include the a posteriori distribution of S_{G} and S_{I} in two representative subjects (*subjects 1* and *18*).

The comparison demonstrates that, in general, the use of the Bayesian estimator allows a reduction of sampling schedule without significantly affecting the accuracy and precision of MM identification. In fact, the a posteriori probability density functions, and thus the confidence intervals, of RSS estimates are generally very close to the FSS ones (see also the posterior of S_{I} in Fig. 2 in *subjects 1* and *18*). Even in those rare cases in which some differences between the two estimates are present (see S_{G} estimates in Fig. 2 in *subject 18*), the confidence intervals derived for FSS and RSS lie in relatively near regions. The only exception concerns *p*_{2} in *subject 16*, but this is not surprising because *p*_{2} is often known to be a poorly identifiable parameter (14).

As far as point estimates are concerned, in general they are very similar. In fact, percent differences between RSS and FSS [mean and the 10 and 90 percentile, (in parentheses)] are: −1 (−13, 25), −4 (−33, 7), 3 (−22, 20), and −0.4 (−6, 8) for S_{G}, S_{I}, *p*_{2}, and *V*, respectively, and correlation is excellent (slope close to 1 and intercept close to 0) for S_{I} (*r* = 0.99) and *V* (*r* = 0.93), and good for S_{G} (*r* = 0.87) and for *p*_{2} (*r* = 0.79).

Therefore, the comparison of RSS and FSS point estimates and confidence intervals allows us to speculate that the same physiological inferences can be drawn by analyzing MM estimates obtained with the Bayesian estimator.

### RSS: Comparison with NLS and ITS

The results obtained in presence of RSS with NLS, ITS, and Bayesian estimators are summarized in Fig. 3 for S_{G} and S_{I} [ITS estimates are taken from Vicini and Cobelli (21)].

As already highlighted in Vicini and Cobelli (21), in the presence of RSS, NLS is not suitable for MM identification because it leads to an unacceptably large CV. For example, in one subject (*subject 11*), a CV higher than 500% is obtained for *S*_{I}; i.e., in this case, MM is a posteriori nonidentifiable. In 10 of 16 subjects, NLS obtains noncredible confidence intervals (with negative portions much larger than in Fig. 1). As shown in Fig. 3, ITS offers a first improvement by drastically reducing the width of the confidence intervals, although negative portions are still possible (e.g., S_{I} of *subject 11*). The Bayesian estimator further reduces the uncertainty of the estimates with respect to ITS and, in all subjects, confidence intervals of all parameters do not include negative values. In addition, they are allowed to be asymmetrical; e.g., see S_{I} of *subject 17*.

Notably, point estimates provided by the three methods are, in general, quite similar, and in only a few cases a significant difference can be detected, such as for *subject 16*. In summary, percent differences between NLS and Bayesian point estimates [mean and the 10 and 90 percentiles, (in parentheses)] are 5 (−5, 24), −13 (−48, 1), −12 (−55, 3), and 0.0 (−3, 2) for S_{G}, S_{I}, *p*_{2}, and *V*, respectively, whereas percent differences among precision CVs are 351 (18, 824), 348 (4, 709), 493 (34, 1,101), and 175 (7, 385). Percent differences between ITS and Bayesian point estimates are 0.9 (−18, 24), 11 (−10, 25), −14 (−66, 29), and 0.2 (−8, 8) for S_{G}, S_{I}, *p*_{2}, and *V* respectively, whereas percent differences among precision CVs are 57 (−36, 179), 21 (−32, 82), 29 (−33, 139), and 9 (−23, 57).

### Sensitivity to the Choice of the Prior Distribution

The results reported so far were obtained using the prior-information model described in *MM identification: a Bayesian approach*. In particular, *f*_{θ}(θ) is a log-normal distribution with parameters fitted into a separate population study. It is important to evaluate the sensitivity of Bayesian estimates to the shape of the prior. Therefore, we have recomputed the RSS Bayesian estimates, assuming, such as for ITS, that *f*_{θ}(θ) is a normal distribution with mean (0.025, 0.00059, 282, 0.042) and covariance *diag*([0.011^{2}, 0.00035^{2}, 35^{2}, 0.05^{2}]). These values make the 95% a priori variability intervals similar to those of the fitted log-normal distribution. Figure 4 shows the new estimates of S_{G} and S_{I} in 16 subjects (middle bars) compared with those previously obtained in *RSS vs. FSS* (bottom bars).

In general, the point estimates and confidence intervals obtained are poorly sensitive to changes in the prior. Percent differences between point estimates obtained under the assumption of *f*_{θ}(θ) normal vs. log-normal can be summarized [mean and the 10 and 90 percentiles, (in parentheses)] as 2 (−5, 12), −6 (−23, 1), −2 (−14, 8), and 0.08 (−2, 2) for S_{G}, S_{I}, *p*_{2}, and V, respectively. To better visually assess the similarity between the two situations, Fig. 5 shows the posterior distributions of S_{G} and S_{I} in a representative subject (no. 5).

Because the difference of the estimates between log-normal and normal cases is small, one concludes that the Bayesian estimator is not significantly sensitive to small changes of the shape of the prior.

Another interesting issue is to assess the role of a priori information, i.e., to understand how important it is to specify an informative distribution. To provide an answer to this question, we have recalculated RSS estimates by MCMC without supplying a priori information, i.e., *f*_{θ}(θ) is uniform, so that the a posteriori distribution coincides with the likelihood (which in turn is a function of the residual weighted sum of squares of NLS). In Fig. 4, the new estimates of S_{G} and S_{I} in 16 subjects (top bars) are compared with those previously obtained in *RSS vs. FSS* (bottom bars). In several cases, point estimates are very close to the one obtained in presence of an informative prior, even if some differences are present. In summary, percent differences between the estimates obtained under the assumption of *f*_{θ}(θ) uniform vs. log-normal [mean and the 10 and 90 percentiles, (in parentheses)] are 7 (−2, 22), −11 (−40, 3), 58 (−13, 351), and −2 (−5, 0.4) for S_{G}, S_{I}, *p*_{2}, and *V*, respectively. Confidence intervals are equal to or larger than those obtained with the log-normal prior. This demonstrates that a priori information plays a role in improving the performance of the Bayesian estimator. Moreover, because even using the noninformative prior brings, in most cases, estimates much more precise than those obtained by NLS (e.g., compare *subject 17* in Fig. 4 with Fig. 3), we can conclude that the MCMC simulation strategy avoids the numerical identifiability problems often present with NLS estimates.

## DISCUSSION

The possibility of reducing the number of samples in MM identification is appealing for obvious economical and ethical reasons and thus can have an important impact in clinical practice, especially in large-scale studies. Unfortunately, standard NLS is usually unsuccessful in dealing with RSS due to unacceptable deterioration of the quality of estimates. Recently, population parameter estimation methods have been proposed to cope with RSS MM identification. However, population methods are not without problems. For instance, ITS [used in Vicini et al.(21)], assumes that the model parameters have a Gaussian probability distribution within the population, with the consequence that confidence intervals (symmetrical by construction) are often not realistic (e.g., they include negative values). In addition, population methods are in general difficult to implement because the database must consist of a sufficiently large number of homogenous subjects, and any change of it (e.g., removal of an outlier) requires us to repeat model identification in all of the subjects. In this paper, we have shown that RSS MM identification can be successfully approached by a new MCMC Bayesian estimation method that works well not only in terms of population values but also at the single individual level. On a database of 16 normal subjects, the new method was shown to provide accurate and precise MM parameter estimates (i.e., no bias with respect to FSS and acceptable CV). Notably, results are not sensitive to small changes of the probability density function chosen to express a priori knowledge. In our paper, we chose a log-normal distribution obtained by fitting an independent database of MM parameters. In addition to working reliably at the single-subject level, an additional advantage of MCMC over ITS is that it provides the a posteriori probability distribution of the parameter estimates from which credible confidence intervals can be determined (e.g., asymmetrical and in the range of nonnegative values). This can be crucial in investigations where MM parameters are used in conjunction with other indexes (physiological, clinical, metabolic, genetic) in statistical analyses, such as characterization of a (sub)population, clustering, classification, regression, risk analysis, and so on (6, 10, 11).

To better grasp the importance of having a credible measure of estimate uncertainty, let us discuss in more detail three situations. First, suppose one wants to characterize a (sub)population of M subjects by determining the variability of S_{G} within it. The most natural and easy-to-implement approach is the determination of the histogram of S_{G} from its M point estimates. When M is quite small, e.g., M = 16, as in our case, the histogram is unavoidably rough (Fig. 6, *top*) and also does not take into account the different degrees of uncertainty of the M parameter estimates. On the other hand, by exploiting the sampled a posteriori distributions of S_{G} obtained through MCMC (e.g., Fig. 5) in the M subjects, a much more significant distribution (Fig. 6, *bottom*) can be obtained intrinsically, accounting for the different precision of the estimates in the M subjects. Of interest, the distribution of S_{G} exhibits a smoother tail toward high rather than low values. As a second example, let us consider a classification problem where a subject has to be placed in *class A* or *B* on the basis of MM parameter estimates. If, for instance, MM parameter estimates are uncertain, the use of point estimates without taking into account their credibility is not statistically robust unless *A* and *B* are well separated. By contrast, exploiting the posterior distribution of parameter estimates allows one to obtain not only a more statistically sound classification of the subject, but also a measure of the credibility of the chosen class. The third example concerns regression (10) and risk analysis (6, 11), where, using a Bayesian framework, the uncertainty of MM parameter estimates can be propagated into the predicted outcome variables.

The proposed methodology to cope with RSS MM identification has been developed and applied in normal subjects undertaken to IVGTT, for which a validated reduced schedule was available (4, 18). Given the encouraging results, a natural extension to pathological subjects, e.g., insulin resistant or diabetic subjects, is of obvious importance. However, this extension is not simple because two prerequisite steps are needed. First, because there is no evidence that RSS developed in Cobelli (4) and Ruggeri and Steil et al. (18) for normal subjects are also appropriate in pathological conditions, there is the need to develop a “pathology”-specific RSS. This requires exploiting a large “pathology” FSS database. Second, available databases of MM parameters need to be exploited to build a “pathology”-specific probabilistic model of prior knowledge. In fact, the prior distribution *f*_{θ}(θ) developed in *MM indentification: a Bayesian approach* is only appropriate for normal subjects. For insulin-resistant subjects, for instance, a starting point could be the prior proposed in Pillonetto et al. (15) to cope with numerical identifiability problems that can sometimes affect MM identification, even in the presence of FSS.

## APPENDIX

In this section, some details about the implementation of MCMC algorithms are given, even if technicalities are avoided. Among the several MCMC methods proposed in the literature, we have chosen a single-component Metropolis-Hastings random walk (13), as described in Pillonetto et al. (15). In particular, the Markov chain was generated by extracting samples (for one MM parameter at a time) from a normal distribution with a mean equal to the current value of the chain and fixed variance (suitably chosen for each parameter to speed up the MCMC algorithm). Convergence was assessed by using the Raftery criterion (16); ∼30,000 samples were required to have a satisfactory description of the posterior and thus a “robust” assessment of point estimates and confidence intervals. Tens of minutes are required to identify the MM model in a single individual on a Sun Ultra 10. The MCMC algorithm was originally implemented in MATLAB, but to make it easily accessible to researchers the method we proposed, a compiled version for Windows platform, is available upon request.

## GRANTS

This work was in part supported by Ministero dell’Istruzione, dell’Università e della Ricerca-MURST Grant “Estimation of nonaccessible parameters in physiological systems”, National Institute for Biomedical Imaging and Bioengineering Grant No. P41 EB-001975.

## ACKNOWLEDGMENTS

We thank the anonymous referees for their stimulating comments, which have helped us to significantly improve the paper.

## Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “

*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2006 by American Physiological Society