Practical case of R language linear mixed effect model

Original link: tecdat.cn/?p=3059

Original source: Tuoduan Data Tribe Official Account

 

Introduction

Analysts who work with grouped data and complex hierarchies, such as measurements nested within participants, counties nested within states, or students nested within classrooms, often need modeling tools that reflect this structure in the data. In R, there are two main ways to fit multi-level models that account for such structure. These tutorials will show users how to use the lme4 package in R to fit linear and nonlinear mixed-effects models, and how to use rstan to fit fully Bayesian multi-level models. The focus here is on how to fit the models in R rather than on the theory behind them. For background on multi-level modeling, see Resources.

This tutorial will show how to set up and run some basic models using lme4, including:

  • Constructing varying-intercept, varying-slope, and varying-slope-and-intercept models in R
  • Generating predictions and interpreting parameters from mixed-effects models
  • Generalized and nonlinear multi-level models
  • Fully Bayesian multi-level model fitting with rstan or other MCMC methods
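As a preview, the model types in the list above map onto lme4's formula syntax as sketched below. The variable and grouping names (extro, open, school, and so on) come from the example data used later in this tutorial; note that the bar syntax is parsed by R's ordinary formula machinery, so these lines run even before lme4 is loaded:

```r
# Varying intercept: each school gets its own intercept
f1 <- extro ~ open + agree + social + (1 | school)

# Varying slope: the coefficient on open varies by school
f2 <- extro ~ open + agree + social + (0 + open | school)

# Varying slope and intercept: both vary by school
f3 <- extro ~ open + agree + social + (1 + open | school)

# Until lmer() interprets the bar terms, these are ordinary R formula objects
class(f1)
```

Only lmer() gives the bar terms their mixed-model meaning; base R merely parses them as formula syntax.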

Set up the environment

Getting started with multi-level modeling in R is straightforward. lme4 is the canonical package for implementing multi-level models in R, though there are many packages that depend on and extend its feature set, including Bayesian extensions. lme4 has recently been rewritten to improve speed and to integrate a C++ code base, so the package's functions are still changing somewhat.

To install lme4, we just need to run:

# Main version
install.packages("lme4")

# Or install the development version
library(devtools)
install_github("lme4", user = "lme4")

Read in data

Multi-level models are suitable for a particular kind of data structure, in which units are nested within groups (usually more than 5 groups) and we want to model the group structure of the data. For our introductory example we will start with a simple example from the lme4 documentation and explain what the model is doing.

library(lme4)  # load the lme4 library
library(arm)   # convenience functions for regression in R
# lmm.data is assumed to have been read in already (e.g., with read.table)
# summary(lmm.data)
head(lmm.data)

##   id extro  open agree social class school
## 1  1 63.69 43.43 38.03  75.06     d     IV
## 2  2 69.48 46.87 31.49  98.13     a     VI
## 3  3 79.74 32.27 40.21 116.34     d     VI
## 4  4 62.97 44.41 30.51  90.47     c     IV
## 5  5 64.25 36.86 37.44  98.52     d     IV
## 6  6 50.97 46.26 38.83  75.22     d      I

Fit a model

Let us first fit a simple OLS regression. 

OLSexamp <- lm(extro ~ open + agree + social, data = lmm.data)
display(OLSexamp)

## lm(formula = extro ~ open + agree + social, data = lmm.data)
##             coef.est coef.se
## (Intercept) 57.84    3.15
## open         0.02    0.05
## agree        0.03    0.05
## social       0.01    0.02
## ---
## n = 1200, k = 4
## residual sd = 9.34, R-Squared = 0.00

The R model interface is quite simple: first specify the dependent variable, then the ~ symbol, then the predictors, each named. Plus signs indicate that these are modeled as additive effects. Finally, we specify the data on which to fit the model. Here we use the lm function to perform OLS regression, but there are many other options in R.

If we want to extract measures such as the AIC, we may prefer to fit the model with glm, which fits an identical model in the default (Gaussian) case:

MLexamp <- glm(extro ~ open + agree + social, data = lmm.data)
display(MLexamp)

## glm(formula = extro ~ open + agree + social, data = lmm.data)
##             coef.est coef.se
## (Intercept) 57.84    3.15
## open         0.02    0.05
## agree        0.03    0.05
## social       0.01    0.02
## ---
## n = 1200, k = 4
## residual deviance = 104378.2, null deviance = 104432.7 (difference = 54.5)
## overdispersion parameter = 87.3
## residual sd is sqrt(overdispersion) = 9.34

AIC(MLexamp)

## [1] 8774

This results in a poor model fit. Let us now try to improve it.

Fit different models

Our next step may be to use grouping variables (such as school or class) to fit different models. 

MLexamp.2 <- glm(extro ~ open + agree + social + class, data = lmm.data)
display(MLexamp.2)

## glm(formula = extro ~ open + agree + social + class, data = lmm.data)
##             coef.est coef.se
## (Intercept) 56.05    3.09
## open         0.03    0.05
## agree       -0.01    0.05
## social       0.01    0.02
## classb       2.06    0.75
## classc       3.70    0.75
## classd       5.67    0.75
## ---
## n = 1200, k = 7
## residual deviance = 99187.7, null deviance = 104432.7 (difference = 5245.0)
## overdispersion parameter = 83.1
## residual sd is sqrt(overdispersion) = 9.12

AIC(MLexamp.2)

## [1] 8719

anova(MLexamp, MLexamp.2, test = "F")

## Analysis of Deviance Table
##
## Model 1: extro ~ open + agree + social
## Model 2: extro ~ open + agree + social + class
##   Resid. Df Resid. Dev Df Deviance    F  Pr(>F)
## 1      1196     104378
## 2      1193      99188  3     5190 20.8 3.8e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The class term improves the fit. Modeling group membership with indicator variables in this way is often called a fixed effect. We can do the same with school:

MLexamp.3 <- glm(extro ~ open + agree + social + school, data = lmm.data)
display(MLexamp.3)

## glm(formula = extro ~ open + agree + social + school, data = lmm.data)
##             coef.est coef.se
## (Intercept) 45.02    0.92
## open         0.01    0.01
## agree        0.03    0.02
## social       0.00    0.00
## schoolII     7.91    0.27
## schoolIII   12.12    0.27
## schoolIV    16.06    0.27
## schoolV     20.43    0.27
## schoolVI    28.05    0.27
## ---
## n = 1200, k = 9
## residual deviance = 8496.2, null deviance = 104432.7 (difference = 95936.5)
## overdispersion parameter = 7.1
## residual sd is sqrt(overdispersion) = 2.67

AIC(MLexamp.3)

## [1] 5774

anova(MLexamp, MLexamp.3, test = "F")

## Analysis of Deviance Table
##
## Model 1: extro ~ open + agree + social
## Model 2: extro ~ open + agree + social + school
##   Resid. Df Resid. Dev Df Deviance    F Pr(>F)
## 1      1196     104378
## 2      1191       8496  5    95882 2688 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The school factor greatly improves the fit of our model. But how do we interpret these effects?

table(lmm.data$school, lmm.data$class)

##
##        a  b  c  d
##   I   50 50 50 50
##   II  50 50 50 50
##   III 50 50 50 50
##   IV  50 50 50 50
##   V   50 50 50 50
##   VI  50 50 50 50

Here, we can see that we have a perfectly balanced design with 50 observations in each combination of classroom and school.

Let's try to model each of these unique school-and-class cells using an interaction term.

MLexamp.4 <- glm(extro ~ open + agree + social + school:class, data = lmm.data)
display(MLexamp.4)

## glm(formula = extro ~ open + agree + social + school:class, data = lmm.data)
##                 coef.est coef.se
## (Intercept)      80.36    0.37
## open              0.01    0.00
## agree            -0.01    0.01
## social            0.00    0.00
## schoolI:classa  -40.39    0.20
## schoolII:classa -28.15    0.20
## schoolIII:classa -23.58   0.20
## schoolIV:classa -19.76    0.20
## schoolV:classa  -15.50    0.20
## schoolVI:classa -10.46    0.20
## schoolI:classb  -34.60    0.20
## schoolII:classb -26.76    0.20
## schoolIII:classb -22.59   0.20
## schoolIV:classb -18.71    0.20
## schoolV:classb  -14.31    0.20
## schoolVI:classb  -8.54    0.20
## schoolI:classc  -31.86    0.20
## schoolII:classc -25.64    0.20
## schoolIII:classc -21.58   0.20
## schoolIV:classc -17.58    0.20
## schoolV:classc  -13.38    0.20
## schoolVI:classc  -5.58    0.20
## schoolI:classd  -30.00    0.20
## schoolII:classd -24.57    0.20
## schoolIII:classd -20.64   0.20
## schoolIV:classd -16.60    0.20
## schoolV:classd  -12.04    0.20
## ---
## n = 1200, k = 27
## residual deviance = 1135.9, null deviance = 104432.7 (difference = 103296.8)
## overdispersion parameter = 1.0
## residual sd is sqrt(overdispersion) = 0.98

AIC(MLexamp.4)

## [1] 3396

This is useful, but what if we want to understand the effect of the school, the effect of the class, and the effect of their combination?

MLexamp.5 <- glm(extro ~ open + agree + social + school * class - 1, data = lmm.data)
display(MLexamp.5)

## glm(formula = extro ~ open + agree + social + school * class -
##     1, data = lmm.data)
##                 coef.est coef.se
## open              0.01    0.00
## agree            -0.01    0.01
## social            0.00    0.00
## schoolI          39.96    0.36
## schoolII         52.21    0.36
## schoolIII        56.78    0.36
## schoolIV         60.60    0.36
## schoolV          64.86    0.36
## schoolVI         69.90    0.36
## classb            5.79    0.20
## classc            8.53    0.20
## classd           10.39    0.20
## schoolII:classb  -4.40    0.28
## schoolIII:classb -4.80    0.28
## schoolIV:classb  -4.74    0.28
## schoolV:classb   -4.60    0.28
## schoolVI:classb  -3.87    0.28
## schoolII:classc  -6.02    0.28
## schoolIII:classc -6.54    0.28
## schoolIV:classc  -6.36    0.28
## schoolV:classc   -6.41    0.28
## schoolVI:classc  -3.65    0.28
## schoolII:classd  -6.81    0.28
## schoolIII:classd -7.45    0.28
## schoolIV:classd  -7.24    0.28
## schoolV:classd   -6.93    0.28
## schoolVI:classd   0.06    0.28
## ---
## n = 1200, k = 27
## residual deviance = 1135.9, null deviance = 4463029.9 (difference = 4461894.0)
## overdispersion parameter = 1.0
## residual sd is sqrt(overdispersion) = 0.98

AIC(MLexamp.5)

## [1] 3396

Explore random slopes

Another option is to build a separate model for each school and class combination. If we think that the relationship between our variables may be highly dependent on the school and class combination, we can simply fit a series of models and explore the parameter changes between them:

require(plyr)

# NOTE: the call that builds modellist was missing from the original post;
# a dlply() call of this form, fitting one glm per school-class cell, is a
# plausible reconstruction consistent with the output below (n = 50 per cell)
modellist <- dlply(lmm.data, .(school, class),
                   function(x) glm(extro ~ open + agree + social, data = x))
display(modellist[[1]])

## glm(formula = extro ~ open + agree + social, data = x)
##             coef.est coef.se
## (Intercept) 35.87    5.90
## open         0.05    0.09
## agree        0.02    0.10
## social       0.01    0.03
## ---
## n = 50, k = 4
## residual deviance = 500.2, null deviance = 506.2 (difference = 5.9)
## overdispersion parameter = 10.9
## residual sd is sqrt(overdispersion) = 3.30

display(modellist[[2]])

## glm(formula = extro ~ open + agree + social, data = x)
##             coef.est coef.se
## (Intercept) 47.96    2.16
## open        -0.01    0.03
## agree       -0.03    0.03
## social      -0.01    0.01
## ---
## n = 50, k = 4
## residual deviance = 47.9, null deviance = 49.1 (difference = 1.2)
## overdispersion parameter = 1.0
## residual sd is sqrt(overdispersion) = 1.02

We will discuss this strategy in more depth in a future tutorial, including how to perform inference across the list of models generated by this command.
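The per-group strategy can also be sketched with base R alone. The following is a self-contained illustration using simulated stand-in data (lmm.data is not bundled with any package), with split() and lapply() playing the role of plyr's dlply():

```r
# Simulated stand-in data: outcome y, predictor x, four groups a-d
set.seed(1)
d <- data.frame(
  y     = rnorm(120),
  x     = rnorm(120),
  group = rep(letters[1:4], each = 30)
)

# Fit one OLS model per group and collect the fits in a named list
modellist <- lapply(split(d, d$group), function(g) lm(y ~ x, data = g))

# Compare how the slope on x varies across the groups
sapply(modellist, function(m) coef(m)["x"])
```

Scanning the per-group slopes this way gives a quick, informal sense of whether a varying-slope model is worth fitting.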

Fit a varying intercept model with lmer

While all of the above techniques are valid ways to approach this problem, they are not necessarily the best approach when we are explicitly interested in variation among and by groups. This is where the mixed-effects modeling framework is useful. Now we use the lmer function with the familiar formula interface, but add a special syntax to specify group-level variables: (1 | school) tells lmer to fit a linear model with a varying intercept by group for school.

MLexamp.6 <- lmer(extro ~ open + agree + social + (1 | school), data = lmm.data)
display(MLexamp.6)

## lmer(formula = extro ~ open + agree + social + (1 | school),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 59.12    4.10
## open         0.01    0.01
## agree        0.03    0.02
## social       0.00    0.00
##
## Error terms:
##  Groups   Name        Std.Dev.
##  school   (Intercept) 9.79
##  Residual             2.67
## ---
## number of obs: 1200, groups: school, 6
## AIC = 5836.1, DIC = 5789
## deviance = 5806.5
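One of the tutorial's stated goals is interpreting parameters from mixed-effects models. For a varying-intercept fit, lme4 exposes the pieces via fixef(), ranef(), and coef(). Below is a self-contained sketch using the sleepstudy data shipped with lme4 as a stand-in, since lmm.data is not distributed with the package:

```r
library(lme4)

# Varying-intercept model: reaction time on days of sleep deprivation,
# with an intercept that varies by subject
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

fixef(fm)           # population-level (fixed) coefficients
ranef(fm)$Subject   # each subject's deviation from the fixed intercept
coef(fm)$Subject    # per-subject coefficients (fixed + random parts)
```

For the varying term, each row of coef(fm)$Subject is the fixed intercept from fixef(fm) plus that subject's deviation in ranef(fm)$Subject, which is a convenient way to read off group-specific intercepts.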

We can fit multiple group effects by including multiple grouping terms:

MLexamp.7 <- lmer(extro ~ open + agree + social + (1 | school) + (1 | class), data = lmm.data)
display(MLexamp.7)

## lmer(formula = extro ~ open + agree + social + (1 | school) +
##     (1 | class), data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.20    4.21
## open         0.01    0.01
## agree       -0.01    0.01
## social       0.00    0.00
##
## Error terms:
##  Groups   Name        Std.Dev.
##  school   (Intercept) 9.79
##  class    (Intercept) 2.41
##  Residual             1.67
## ---
## number of obs: 1200, groups: school, 6; class, 4
## AIC = 4737.9, DIC = 4683
## deviance = 4703.6

Finally, we can fit nested group effect terms with the following syntax:

MLexamp.8 <- lmer(extro ~ open + agree + social + (1 | school/class), data = lmm.data)
display(MLexamp.8)

## lmer(formula = extro ~ open + agree + social + (1 | school/class),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.24    4.01
## open         0.01    0.00
## agree       -0.01    0.01
## social       0.00    0.00
##
## Error terms:
##  Groups       Name        Std.Dev.
##  class:school (Intercept) 2.86
##  school       (Intercept) 9.69
##  Residual                 0.98
## ---
## number of obs: 1200, groups: class:school, 24; school, 6
## AIC = 3568.6, DIC = 3508
## deviance = 3531.1

Here the (1 | school/class) term tells lmer to fit a varying intercept for schools, and for classes nested within schools.

Use lmer to fit a varying slope model

But what if we want to explore the effect of student-level predictors as they vary across classrooms? Instead of fitting a varying-intercept model by school (or school/class), we can fit a varying-slope model. Here we modify our random-effects term to include a variable before the grouping term: (1 + open | school/class) tells R to fit a varying-slope and varying-intercept model for schools and for classes nested within schools, and to allow the slope of the open variable to vary by school.

MLexamp.9 <- lmer(extro ~ open + agree + social + (1 + open | school/class), data = lmm.data)
display(MLexamp.9)

## lmer(formula = extro ~ open + agree + social + (1 + open | school/class),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.26    3.93
## open         0.01    0.01
## agree       -0.01    0.01
## social       0.00    0.00
##
## Error terms:
##  Groups       Name        Std.Dev. Corr
##  class:school (Intercept) 2.62
##               open        0.01     1.00
##  school       (Intercept) 9.51
##               open        0.00     1.00
##  Residual                 0.98
## ---
## number of obs: 1200, groups: class:school, 24; school, 6
## AIC = 3574.7, DIC = 3506
## deviance = 3529.3
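Generating predictions, another goal listed in the introduction, works through predict(); its re.form argument controls whether the group-level (random) effects are included. Again a self-contained sketch on lme4's bundled sleepstudy data rather than lmm.data:

```r
library(lme4)

# Varying-slope and varying-intercept model by subject
fm <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Predictions that include each subject's random intercept and slope
p_group <- predict(fm)

# Population-level predictions: random effects set to zero
p_pop <- predict(fm, re.form = NA)

head(data.frame(group_level = p_group, population_level = p_pop))
```

Comparing the two columns shows how much the group-level terms pull predictions away from the population-level fit.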

Conclusion

Fitting mixed-effects models and exploring group variation is very easy within the R language and ecosystem. In future tutorials we will explore comparing models, making inferences with mixed-effects models, and creating graphical representations of mixed-effects models to understand their effects.

Appendix

## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] plyr_1.8        arm_1.6-10      MASS_7.3-29     lme4_1.0-5
## [5] Matrix_1.1-0    lattice_0.20-24 knitr_1.5
##
## loaded via a namespace (and not attached):
## [1] abind_1.4-0    coda_0.16-1    evaluate_0.5.1 formatR_0.10
## [5] grid_3.0.1     minqa_1.2.1    nlme_3.1-113   splines_3.0.1
## [9] stringr_0.6.2  tools_3.0.1

 

 

Thank you very much for reading this article. If you have any questions, please leave a message below!

