Mixed-Effects Models in S and S-PLUS

To Elisa and Laura. To Mary Ellen, Barbara, and Michael.

Preface

Mixed-effects models provide a flexible and powerful tool for the analysis of grouped data.


The nlme library we developed for analyzing mixed-effects models in implementations of the S language, including S-PLUS and R, provides the underlying software. It comes with a number of online manuals (in PDF format). In the current version of S-PLUS, linear and nonlinear mixed-effects models can be fitted.

S-PLUS; Mathematical statistics - Computer programs. This paperback edition is a reprint of the original edition. This book provides an overview of the theory and application of linear and nonlinear mixed-effects models in the analysis of grouped data, such as longitudinal data, repeated measures, and multilevel data. A unified model-building strategy for both linear and nonlinear models is presented and applied to the analysis of over 20 real datasets from a wide variety of areas, including pharmacokinetics, agriculture, and manufacturing. A strong emphasis is placed on the use of graphical displays at the various phases of the model-building process, starting with exploratory plots of the data and concluding with diagnostic plots to assess the adequacy of a fitted model. Numerous figures are included in the book.

The nlme package for analyzing mixed-effects models in R and S-PLUS, developed by the authors, provides the underlying software for implementing the methods presented in the text, and is described and illustrated in detail throughout the book. The balanced mix of real data examples, modeling software, and theory makes this book a useful reference for practitioners using mixed-effects models in their data analyses. It can also be used as a text for a one-semester graduate-level applied course in mixed-effects models. Researchers in statistical computing will also find this book appealing for its presentation of novel and efficient computational methods for fitting linear and nonlinear mixed-effects models.

Pinheiro is a Senior Biometrical Fellow at Novartis Pharmaceuticals, having worked at Bell Labs during the time this book was produced. He has published extensively on mixed-effects models, dose-finding methods in clinical development, and other areas of biostatistics.

In this case the response is on the horizontal axis. A value greater than 1. The gaps are given in units of character heights. It would be better to keep these combinations on the same row so we can more easily compare treatments and grass types. The horizontal axis in the panel is the primary covariate.

This does not always create a good arrangement for comparing patterns across outer factors. In these cases a third component can be added to the layout argument causing the plot to be spread over several pages. Half the plants of each type were chilled overnight before the measurements were taken. Carbon dioxide uptake versus ambient CO2 concentration for Echinochloa crus-galli plants. There were four groves of trees. This plot shows a default layout of the panels.
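The panel arrangement described above can be sketched in R (a sketch assuming the nlme package, whose plot method for groupedData objects accepts outer and layout arguments; the CO2 data ship with R as a groupedData object):

```r
library(nlme)  # provides the plot methods for groupedData objects
data(CO2)      # uptake ~ conc | Plant, with outer factors Type and Treatment

## One panel per Treatment/Type combination, with a curve per plant.
## The third component of `layout`, here c(2, 2, 1), sets the number of
## pages over which the panels are spread.
p <- plot(CO2, outer = ~ Treatment * Type, layout = c(2, 2, 1))
```

Printing the returned trellis object draws the plot; assigning it, as here, lets you modify or combine it first.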

This plot shows an alternative layout of the panels. These data are described in more detail in Appendix A. The axis annotations can be at powers of 10 or at powers of 2. Optical density versus DNase concentration. Describing the Structure of Grouped Data. You can consider skipping this section unless you want to modify the way the data are presented within each panel.

This change is incorporated in Figure 3. If no primary covariate is available. When the primary covariate is numeric. The presentation of the data within each panel is controlled by a panel function. The actual symbol that is drawn is determined by the trellis device that is active when the plot is displayed. Some care needs to be taken when doing this. In the DNase assay data. Optical density versus DNase concentration for eleven runs of an assay.

The concentration is shown on a logarithmic scale. If you do override the default panel function. It is usually an open circle. This panel function can be overridden with an explicit panel argument to the plot call if. The last four lines of the panel function add a line through the data values. Another optional argument. This is a common occurrence. This summary function should take a numeric vector as its argument and return a single numeric value.
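A panel-function override of the kind described above can be sketched as follows (an assumption about the intent: the DNase data ship with R as a groupedData object, and panel.linejoin from the lattice package joins the mean response at each x value):

```r
library(nlme)     # plot methods for groupedData objects
library(lattice)  # panel.* building blocks

data(DNase)  # density ~ conc | Run, a groupedData object

## Override the default panel function: draw the usual symbols, then add
## a line joining the average optical density at each concentration.
p <- plot(DNase, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)    # the default open-circle symbols
  panel.linejoin(x, y, ...)  # line through the mean at each x value
})
```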

Using collapse can help to reduce clutter in a plot. For the Pixel data. The default action is to preserve. It is more informative to examine the mean curve for each wafer and the standard deviation about this mean curve. Displaying at the Wafer level with separate curves for each Site within each Wafer. The default is the mean function. Mean pixel intensity of lymph nodes in the axillary region versus time by Side within Dog. The two curves on each plot correspond to the left and the right sides of the Dog.

[Figure: mean pixel intensity versus time post injection (days).] Mean pixel intensity of lymph nodes in the axillary region versus time by Dog. The panels correspond to wafers.

Current versus voltage for the Wafer data. Mean current versus voltage at the wafer level for the Wafer data. Each point on each curve is the average of the current for that voltage at eight sites on the wafer. Standard deviation of current versus voltage at the wafer level for the Wafer data. Each point on each curve is the standard deviation of the current at that voltage at eight sites on the wafer. These could be experimental factors. The gapply function is part of the nlme library.

Both lapply and sapply can be used to apply a function to the components of a list. Because the groupedData class inherits from the data. Another way of checking on the data. The gsummary function. The table and unique functions are not solely intended for use with grouped data. The Theoph data are another example of a medical study where the grouping is by Subject. They are from a study of the pharmacokinetics of the drug theophylline. Because factor levels are often coded as integers.

We could replace sapply with lapply and get the same information. They all apply a function to subsections of some data structure and gather the results in some way. The result of lapply is always a list. In this section we expand on the usage of these functions and introduce another groupwise summary function, gapply. Checking the data. As this is a short-duration study (about 24 hours) and only one dose of theophylline was given.
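The groupwise summaries described above can be sketched with the Theoph data (a sketch assuming the nlme package; Theoph ships with R as a groupedData object grouped by Subject):

```r
library(nlme)
data(Theoph)  # theophylline pharmacokinetics, grouped by Subject

## One row per subject: numeric variables are summarized by FUN (mean by
## default); variables invariant within a group are carried through.
smry <- gsummary(Theoph)

## gapply applies a function to a variable within each group; here, the
## number of nonmissing concentration measurements per subject.
nConc <- gapply(Theoph, "conc", FUN = function(x) sum(!is.na(x)))
```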

The value returned will be NULL unless there are invariants other than the grouping factor. When multiple modes are present. Any non-numeric variables are represented by their modes within each group.

A variable that is invariant within each group is represented by the single value that it assumes within each group. The FUN argument to gsummary is applied only to numeric variables in the grouped data object. In other words. When there are a large number of groups in the data. The summarized values of conc. Returning to the question of the missing values in conc. A less obvious peculiarity in the data is an apparent inconsistency in the numeric values of height and weight.

The Quinidine data. For these subjects every value of interval was missing. The mean function behaves like this and returns NA for every subject in each of these three variables.

Because any subject in the experiment must have at least one dosage record and at least one concentration record. The default value of FUN in gsummary is mean. The behavior can be overridden in mean and in several other summary functions by giving the optional argument na.rm = TRUE.

The default behavior of most summary functions in S is to return the value NA if the input vector contains any missing values. In particular, we can get an exact count of the number of concentrations by counting the number of nonmissing values in the entire conc variable.

With only concentration measurements for subjects there are fewer than three concentration measurements per subject. Both summaries are useful to us in understanding these data.

Because there are subjects in the study. The function gapply is available to perform calculations such as this.

The rows that are neither dosage records nor concentration measurements are cases where values of other variables changed for the subject. To explore this further. If this argument is missing. For each subject we must determine whether there are any records with both the conc variable and the dose variable missing.
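The per-subject check described above can be sketched as follows (a sketch assuming the nlme package; the Quinidine data ship with nlme, and the names/unlist pattern converts the groupwise results to Subject levels):

```r
library(nlme)
data(Quinidine)

## TRUE for subjects with at least one record that is neither a dosage
## record nor a concentration record (both conc and dose missing).
neither <- gapply(Quinidine,
                  FUN = function(x) any(is.na(x[["conc"]]) & is.na(x[["dose"]])))

## The Subject levels for which such records exist:
changers <- names(neither)[unlist(neither)]
</imports>
```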

This is not uncommon in such routine clinical data, that is, data that are collected in the routine course of treatment of patients.

A total of out of the subjects have fewer than four response measurements. It is a named vector of the counts where the names are the values of the Subject variable.

When only a single name or index is given. If more than one variable is to be passed. The second argument to gapply is the name or index of the variable in the groupedData object to which we will apply the function. To obtain the distribution of the measurements.

To illustrate this. A common consequence of having so few observations for most of the subjects is that the information gained from such a study is imprecise compared to that gained from a controlled experiment. The data for subject 4 provide a better example. It is more informative if we convert the result to a vector of levels of the Subject factor for which there are records that are neither dosage records nor concentration records.

We did not cull such redundant rows from the data set because we wanted to be able to compare results based on these data directly with analyses done by others. We can see there which record is in question. Informative and visually appealing trellis graphics displays of the data can be quickly and easily generated from the information that is stored with the data.

Other variables can be designated as outer or inner factors relative to the grouping factors. Use the outer argument to the plot function to produce a plot of the CO2 data similar to Figure 8. One plot should have the rows determined by Variety and the columns by Treatment.

The regular data summary functions in S can be applied to the data as well as the gsummary and gapply functions that are especially designed for these data. Row 65 records a change in the glyco variable. These objects include the data. Accessor or extractor functions are available to extract either the formula for these variables or the value of these variables.

The other plot should have rows determined by Treatment and columns by Variety. Informative plots and summaries of the data are very useful for the preliminary phase of the statistical analysis. Create a plot of these data arranged as four rows of three columns. Produce two such plots of these data. The Dialyzer data.

Applying gsummary to this object will produce the average optical density for each pair of replicates. When such replicate observations are available.

Note the change in the aspect ratio of the panels relative to the plots in parts c and d. The QB variable. Compare this plot to Figure 5. Verify your result using the isBalanced function. Determine which Subjects are at which QB levels. Hunter and Hunter. Exercises 3. In the DNase data: (b) Determine the standard deviation of the optical density at each set of replicate observations.

You may wish to replace those small values with zero. Use a boxplot of the logarithm of the standard deviations to check for these. Note that some of these standard deviations should be zero because the replicate observations are equal. The lmList function.

We describe the lme function from that library in this chapter. The S modeling function lme is described. These models have gained popularity over the last decade. Its use is illustrated through examples. The covariates can be factors. Models with more complex within-group covariance structures. We concentrate here on lm.

A typical call to lm is of the form lm formula. The expression on the right-hand side describes the covariates used and the ways they will be combined to form the model.

Chapter 4 and also in Venables and Ripley Several other arguments to lm are available and are described in detail in Chambers and Hastie In this chapter we will restrict our attention to models in which the within-group errors are independent and have equal variance.

An interaction between two covariates is denoted by the : operator. Nesting of a covariate within a factor is denoted by the %in% operator (or, equivalently, by /). The constant term 1 is included by default and does not need to be given explicitly in the model.

They are implicit. The two main modeling functions are lm. We make extensive use of examples to introduce and illustrate the available functions and methods. Function calls are also allowed on the left-hand side of the formula. If a model without an intercept term is desired. Chapter 6. Any function of a covariate that returns one of these types of covariate can also be used on the right-hand side of the formula.
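The formula conventions just described can be illustrated with a few lm calls (using the Orthodont data that ship with the nlme package; the exact models here are illustrative, not the book's fits):

```r
data(Orthodont, package = "nlme")

fm1 <- lm(distance ~ age, data = Orthodont)            # intercept is implicit
fm2 <- lm(distance ~ age - 1, data = Orthodont)        # model without an intercept
fm3 <- lm(log(distance) ~ age, data = Orthodont)       # function call on the left-hand side
fm4 <- lm(distance ~ age + age:Sex, data = Orthodont)  # ":" denotes an interaction
```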

This is not surprising. More detailed references on the formula language include Chambers and Hastie, Chapter 2, and Venables and Ripley. To illustrate some of these capabilities.

The lm function operates in a style common to most modeling functions in S. We can use the following model. The summary method displays the results in more detail. The grouped nature of these data. Residual plots corresponding to the fm2Orth.

Boxplots of the fm2Orth. Orthodont If data is a groupedData object see Chapter 3. Because we are using the same grouping. Orthodont Because the lmList function is a generic function Chambers and Hastie. Continuing with the analysis of the orthodontic data. Any linear formula allowed in lm can also be used as a model formula with lmList. A typical call to lmList is lmList formula.

Table 4. The lmList function and the methods associated with it are useful for this. We illustrate the use of some of these methods below. Main lmList methods. More detailed output can be obtained using the summary method.
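A typical lmList fit and the coef and intervals methods can be sketched as follows (assuming the nlme package and its Orthodont data):

```r
library(nlme)
data(Orthodont)

## A separate lm fit of distance on age for each subject; the grouping
## factor follows the "|" in the formula.
fmL <- lmList(distance ~ age | Subject, data = Orthodont)

coef(fmL)       # one (Intercept, age) pair per subject
intervals(fmL)  # confidence intervals, by subject
```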

We can remove this correlation by centering the data. Those with experience analyzing regression models may already have guessed why this pattern occurs. We see that subject M13 has an unusually low intercept. There appears to be a negative correlation between the intercept and slope estimates.

Pairs plot for fm1Orth. It is because all the data were collected between ages 8 and 14. It is clear that the orthodontic distance for subject M13 has grown at an unusually fast rate.

The two quantities being estimated then are the distance at 11 years of age and the slope. I age. M16 Pairs plot for fm2Orth. This can be evaluated with the intervals method.

F04 Both intercept and slope estimates seem to vary with individual. Except for subject M M02 0. To further illustrate the capabilities of lmList. The primary purpose of the IGF-I experiment was to investigate possible trends in control values with tracer age.

This reduces the potential for serial correlation in the responses. They are described in more detail in Appendix A. Examination of Figure 4. There is little indication of lot-to-lot variation in either the intercept or the slope estimates.

It has a low intercept compensated by a high slope. For the orthodontic data. To use this default. If the random formula is omitted. In this section. Several optional arguments can be used with this function. When data inherits from class groupedData. The third argument is typically given as a one-sided linear formula. It prints the estimates of the standard deviations and the correlations of the. Main lme methods. When an lmList object. We illustrate the use of these methods through the examples in the next sections.
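A minimal lme call for the orthodontic data might look like this (a sketch; when data is a groupedData object the random formula and grouping can also be inferred, as described above):

```r
library(nlme)
data(Orthodont)

## Fixed effects: distance ~ age; random effects: intercept and slope
## varying by Subject.
fm1 <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

summary(fm1)  # fixed effects, and StdDev/Corr of the random effects
```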

Some more detailed output is supplied by summary. For the fm1Orth.

In general. Residuals are extracted with the resid method. Predicted values are obtained with the predict method.

Subject gives the within-group predictions. The function compareFits can be used for this. It is instructive. The estimated within-group residual standard deviations are identical.
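The distinction between population-level and within-group predictions can be sketched as follows (assuming the nlme package; level = 0 uses the fixed effects only, level = 1 adds the subject-specific random effects):

```r
library(nlme)
data(Orthodont)
fm <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

## A data frame with one row per observation: the grouping factor, the
## population predictions, and the within-group predictions.
pr <- predict(fm, level = 0:1)
head(pr)
```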

The pooling of subjects in the lme estimation gives a certain amount of robustness to individual outlying behavior. F04 0.

The plots in Figure 4. It is also interesting to compare the fm2Orth fits. In the terminology of split-plot experiments, Intercept and Sex are associated. This is because the model used in lm ignores the group structure of the data and incorrectly combines the between-group and the within-group variation in the residual standard error. The anova method can be used to compare lme and lm objects.
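A model comparison with the anova method can be sketched as follows (a sketch comparing two lme fits; note the assumption that models differing in their fixed effects should be fit by maximum likelihood, since REML likelihoods are not comparable across fixed-effects specifications):

```r
library(nlme)
data(Orthodont)

fmA <- lme(distance ~ age, data = Orthodont,
           random = ~ 1 | Subject, method = "ML")
fmB <- lme(distance ~ age * Sex, data = Orthodont,
           random = ~ 1 | Subject, method = "ML")

anova(fmA, fmB)  # AIC, BIC, log-likelihoods, and a likelihood-ratio test
```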

There is also evidence of large variability in the estimates of the Intercept and age standard deviations.

We can investigate it with the summary method. The primary question of interest in the IGF-I study is whether the tracer decays with age. In many practical applications. Also notice that the outlying observations for lots 4. Standard pdMat classes. TABLE 4. The pdMat constructors have the same name as their corresponding classes. A function that creates an object of a given class is called a constructor for that class. In cases when both the grouping structure and a pdMat class are to be declared in random.

Radioimmunoassays of IGF-I. The pdMat object returned by the constructor is passed through the random argument to the lme function.
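Passing a pdMat constructor through the random argument can be sketched as follows (using the Orthodont data for illustration rather than the IGF data discussed above; pdDiag declares a diagonal covariance, i.e. independent random intercept and slope):

```r
library(nlme)
data(Orthodont)

## The grouping factor (Subject) is taken from the groupedData object,
## so only the pdMat object needs to be given.
fmDiag <- lme(distance ~ age, data = Orthodont, random = pdDiag(~ age))
```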

Because IGF is a groupedData object.

Yet another representation of 4. The pdBlocked class is used to represent block diagonal matrices in the nlme library. It takes as an argument a list of pdMat objects. The response variable is the logarithm of the optical density measured on a well.

Only live cells can make the stain. We illustrate its use with an example of a cell culture plate bioassay conducted at Searle. The cells are treated with a compound that they metabolize to produce the stain. Within each block.

These data are described in detail in Appendix A. Log-optical density measured on the central 60 wells of a cell culture plate. Panels in the plot correspond to the serial dilutions and symbols refer to the samples. There do not appear to be any interactions between sample and dilution. This type of variance-covariance structure is represented in S by a pdBlocked object with pdIdent elements.

The pdDiag constructor and methods can serve as templates for these. New pdMat classes. The plot of the data. In the case of 4. These data are also described in Appendix A. The order of nesting is taken to be the order of the elements in the list.
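One way to express such a block-diagonal structure for the Assay data can be sketched as follows (a sketch assuming the nlme package's Assay data; each pdIdent block is a multiple of an identity matrix, and the blocks are combined with pdBlocked):

```r
library(nlme)
data(Assay)  # cell culture plate bioassay, grouped by Block

## Block-diagonal random-effects covariance: one pdIdent block for the
## overall block effect, one for the sample effects, one for the
## dilution effects.
fmAssay <- lme(logDens ~ sample * dilut, data = Assay,
               random = pdBlocked(list(pdIdent(~ 1),
                                       pdIdent(~ sample - 1),
                                       pdIdent(~ dilut - 1))))
```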

There are two nested grouping levels in this example. Thickness of Oxide Coating on a Semiconductor (Littell et al.). We describe the multilevel model capabilities in lme through the analyses of two examples from integrated circuit (IC) manufacturing. We can test.

We can assess the variability in these estimates with the intervals method. The intensity of current is measured in mA.
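The multilevel fit and its intervals method can be sketched as follows (assuming the nlme package's Oxide data; Wafer is nested within Lot, which the "/" in the grouping formula expresses):

```r
library(nlme)
data(Oxide)  # oxide coating thickness, grouped by Lot/Wafer

## Two nested levels of random effects: Lot, and Wafer within Lot.
fmOxide <- lme(Thickness ~ 1, data = Oxide, random = ~ 1 | Lot / Wafer)

intervals(fmOxide)  # approximate intervals for the variance components
```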

Measurements were made on eight sites in each of ten wafers selected from the same lot. Current versus voltage curves for each site.

Two levels of nesting are present in these data: The corresponding multilevel model for the intensities of current yijk at the kth level of voltage vk in the jth site within the ith wafer is expressed. The main objective of the experiment was to construct an empirical model for simulating the behavior of similar circuits.

As usual. From Figure 4. To make the optimization more stable during this initial model-building phase. As with single-level objects. Before pursuing this any further. The most useful of these methods are based on plots of the residuals. Assumption 2. The validity of the distributional assumptions may also be formally assessed using hypothesis tests. The nlme library provides several methods for assessing the validity of these assumptions.
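A residual plot of the kind described above can be sketched as follows (assuming the nlme package; in the plot formula for a fitted lme object, "." stands for the object itself, and type = "p" requests standardized (Pearson) residuals):

```r
library(nlme)
data(Orthodont)
fm <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)

## Standardized residuals against fitted values, in separate panels
## by Sex, with a horizontal reference line at zero.
p <- plot(fm, resid(., type = "p") ~ fitted(.) | Sex, abline = 0)
```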

We start by considering Assumption 1 on the within-group error term. We illustrate the capabilities of the plot method through the analysis of the examples described in earlier sections of this chapter. First, one of the fundamental assumptions underlying the OLS approach is that the residuals have homogeneous variance.

However, various studies (13-15) have suggested that the variability of VF sensitivities is increased when sensitivity is lower, in both normal and damaged eyes. A second assumption within the OLS framework is that the residuals (the differences between the observed values and those predicted by the trend over time) are uncorrelated.

In longitudinal data, it is common for temporal autocorrelation to be present even if no glaucomatous progression is observed. The same effect can also occur when a linear fit is imposed on data that are actually changing in a nonlinear manner. As an example, the measure of interest in this case, the MD is assumed to change at a constant rate over time.

Therefore, given the logarithmic nature of the decibel dB scale used in perimetry, linear change in sensitivity over time implies that the proportion of remaining RGCs that die each year must remain constant. If we hypothesize an alternative formulation, where the actual number of remaining RGCs that die each year remains constant, resulting in a linear decline in structural measures such as retinal nerve fiber layer thickness, then based on the cross-sectional findings this would result in the loss of sensitivity in decibels appearing to accelerate exponentially.

If a linear fit is placed over data that are accelerating downward exponentially, the residuals will tend to be positive in the center of the period and negative at either end of the sequence, causing potentially significant temporal autocorrelation. In the presence of heteroscedasticity or temporal autocorrelation, OLS estimators are no longer the best linear unbiased estimators.

Thus, statistics and confidence intervals from OLS may not be valid for drawing inference. Third, longitudinally collected data are often grouped by one or more grouping factors. In the case of ophthalmic tests, data are grouped within an eye i. Observations taken within the same eye or from the same individual are likely to be more similar than observations taken from a different eye or person.

This indicates the need for more sophisticated methods that take into account correlation among observations within the same group. In this study, we use multilevel mixed-effects methods that account for group effects both within eyes and between fellow eyes of the same individual , as well as temporal autocorrelation of within-group errors, to generate more valid P values for assessing the significance of change in VF sequences from participants with glaucoma.
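Such a model can be sketched in R with the nlme package (a sketch only: the variable names MD, time, patient, and eye are hypothetical, and the data below are simulated for illustration, not a real visual-field dataset):

```r
library(nlme)
set.seed(1)

## Hypothetical example data: 10 patients, both eyes, 8 visits per eye.
vf <- expand.grid(visit = 1:8, eye = c("OD", "OS"), patient = factor(1:10))
vf$time <- vf$visit - 1                  # years since baseline
vf$MD <- -2 - 0.1 * vf$time +            # population decline in MD (dB)
  rep(rnorm(10, sd = 1.0), each = 16) +  # patient-level effects
  rep(rnorm(20, sd = 0.5), each = 8) +   # eye-within-patient effects
  rnorm(nrow(vf), sd = 0.5)              # visit-to-visit noise

## Random intercepts for patient and for eye within patient, plus a
## continuous-time AR(1) structure for the within-eye errors.
fm <- lme(MD ~ time, data = vf,
          random = ~ 1 | patient / eye,
          correlation = corCAR1(form = ~ time | patient / eye))
```

With real data, the estimated correlation parameter and the interval estimates from this fit would indicate whether temporal autocorrelation remains after accounting for the group effects.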

We then consider the use of a nonlinear model in which VF sensitivity declines exponentially with time based on the alternative formulation outlined above, and we determine whether this removes evidence of autocorrelation. We demonstrate and test these statistical methods with a relatively simple predictive model using only age as a predictor of the rate of glaucomatous field change. However, the overall goal of this work is to aid in the development of better predictive models that will help clinicians manage patients with glaucoma.

Furthermore, the current version of the nlme library for R does not support the same range of graphics presentations as does the S-PLUS version. Errata and updates of the material in the book will be made available on-line at the same sites. The book is divided into parts.

Chapter 1 gives an overview of LME models, introducing some examples of grouped data and the types of analyses that apply to them. The theory and computational methods for LME models are the topics of Chapter 2. Chapter 3 describes the structure of grouped data and the many facilities available in the nlme library to display and summarize such data.

The model-building approach we propose is described and illustrated in detail in the context of LME models in Chapter 4.