Tuesday, 19 February 2013

Working with R2MLwiN Part 2




Specifying the model

This is the second part of a series of notes demonstrating use of the R package, R2MLwiN, an R command interface to the multilevel modelling software package, MLwiN (see the MLwiN site for getting access to MLwiN). The first set of notes showed how to get started with R2MLwiN. In these notes, I show how to fit predictors (continuous, categorical, and interactions) to the fixed-effects part of a multilevel regression model, and how to fit random slopes to the regression model. The examples use the ALNT.csv data (see Working with R2MLwiN Part 1 for a description of the data). Though the series is concerned with demonstrating Bayesian estimation using MCMC methods, the examples presented here do not depend on MCMC methods of estimations, and so to speed up the running of the examples, they use maximum likelihood estimation. It is easy enough to switch between maximum likelihood and MCMC procedures; set the estM option to 1 for MCMC, and 0 for maximum likelihood.

Some Preliminaries

Load the R2MLwiN and the plyr packages. The ddply() function from the plyr package will be used to add some derived variables to the data file.

library(R2MLwiN)
library(plyr)

The examples in these notes use the ALNT data file available from github. Assuming the data file is located in your working directory, the following command loads the data, and lists the variables. (See the end of this blog for an easy way to get GitHub data into an R session.)

alnt = read.csv("ALNT.csv", header = TRUE, sep = ",")
names(alnt)

The alnt data file needs some modifications:
  • a column of 1s for the 'intercept' variable;
  • grand mean centring of the write3 variable;
  • group mean centring of the write3 variable;
  • group (school) means for the write3 variable; and,
  • two variables, gender and location, should be changed from numeric variables to factors.

alnt$cons = 1                                                            # Intercept

alnt$write3C = alnt$write3 - mean(alnt$write3)                           # Grand mean centring
alnt = ddply(alnt, .(school), mutate, write3GP = write3 - mean(write3))  # Group mean centring

alnt = ddply(alnt, .(school), mutate, GrpMeans = mean(write3))           # Group means

alnt$gender = factor(alnt$gender, labels = c("male", "female"))          # Factor labels
alnt$location = factor(alnt$location, labels = c("metro", "prov", "rural", "remote"))

Next, set up the information to be passed to MLwiN:

# 1. Path to MLwiN executable (mlwin.exe)
mlwin = c("C:/Program Files/MLwiN v2.26/i386")        # Substitute the path on your machine

# 2. The variables containing IDs for all levels - highest level first
levID = c("school", "student")

# 3. Distribution to be modelled
D = "Normal"

# 4. Method of estimation (maximum likelihood)
estoptions = list(EstM = 0)                           # Set EstM to 1 for MCMC estimation

Continuous Predictors

In all the models to follow, write7 is the dependent variable. In Part 1 of these notes, the only predictor in the model was the intercept, and so the model was an “empty model”. In this section, I take into account students earlier (year 3) writing achievements.

The code demonstrates the setting up of models using different versions of the write3 variable as the predictor variable. The models are random-intercepts models. Versions of the write3 variable enter the models in turn as:
  1. uncentred write3;
  2. write3 centred on the grand mean;
  3. write3 centred on the group means + school means; and,
  4. write3 centred on the grand mean + school means.

The code for the formulas to be passed to MLwiN are:

# No centring
f1 = "write7 ~ (0 | cons + write3) + (1 | cons) + (2 | cons)"   

# Grand mean centring           
f2 = "write7 ~ (0 | cons + write3C) + (1 | cons) + (2 | cons)"   

# Group mean centring + School means          
f3 = "write7 ~ (0 | cons + write3Gp + GrpMeans) + (1 | cons) + (2 | cons)" 

# Grand mean centring + School means
f4 = "write7 ~ (0 | cons + write3C + GrpMeans) + (1 | cons) + (2 | cons)"  

Run the models.

mod1 = runMLwiN(Formula = f1, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())

mod2 = runMLwiN(Formula = f2, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())

mod3 = runMLwiN(Formula = f3, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())

mod4 = runMLwiN(Formula = f4, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())

Summaries for the 3rd and 4th models only are shown below:


MODEL 3

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.75s 
Number of obs:  3097 
Deviance statistic:  37810.9 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3Gp+GrpMeans)+(1|cons)+(2|cons) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
               Coef.   Std. Err.       z     Pr(>|z|)         [95% Conf.   Interval] 
cons       306.12204    61.41767    4.98     6.22e-07   ***    185.74563   426.49845 
write3Gp     0.42190     0.01934   21.81   1.882e-105   ***      0.38398     0.45981 
GrpMeans     0.84466     0.11832    7.14    9.426e-13   ***      0.61275     1.07657 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                Coef.   Std. Err.   [95% Conf.    Interval] 
var_cons   1216.74207   269.28083    688.96134   1744.52279 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   11335.23047   291.24032   10764.40994   11906.05100 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 


MODEL 4

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.51s 
Number of obs:  3097 
Deviance statistic:  37810.9 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3C+GrpMeans)+(1|cons)+(2|cons) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
               Coef.   Std. Err.       z     Pr(>|z|)         [95% Conf.   Interval] 
cons       525.47400    62.23589    8.44    3.086e-17   ***    403.49390   647.45410 
write3C      0.42190     0.01934   21.81   1.883e-105   ***      0.38398     0.45981 
GrpMeans     0.42277     0.11989    3.53    0.0004216   ***      0.18778     0.65776 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                Coef.   Std. Err.   [95% Conf.    Interval] 
var_cons   1216.73645   269.27958    688.95817   1744.51473 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   11335.23242   291.24034   10764.41184   11906.05301 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 

Consider the estimates of the coefficients for write3Gp, write3C and GrpMeans in the summaries of the two models. Model 3 contains coefficients for write3 centred on the group means, and the group means. The first coefficient is a within-schools slope (\( \,\beta_w \)); the second coefficient is a between-schools coefficient (\( \,\beta_b \)). Their interpretations are:
  • \( \beta_w \) is the expected difference in write7 scores for two students in the same school whose write3 scores differ by one point;
  • \( \beta_b \) is the expected difference between the mean write7 scores for two schools that differ by one point for mean write3 scores.

Model 4 contains coefficients for write3 centred on the grand mean, and the group means. The first coefficient is again the within-groups slope (\( \,\beta_w \)) (the same within-groups slope that was estimated in Model 3 even though a different version of the predictor was used). The second coefficient is a contextual effect (\( \,\beta_c \)). Its interpretation is:
  • \( \beta_c \) is the expected difference in write7 scores for two students with the same write3 scores but who attend schools that differ in mean write3 scores by one point.

The relationship among the three \( \beta \) values is: \( \beta_b = \beta_w + \beta_c \)

Thus, the two coefficients returned by either model can be used to calculate the third.

Using Model 4, \( \,\beta_b \) can be derived from \( \,\beta_w \) and \( \,\beta_c \):
  • \( \beta_w = \) 0.4219
  • \( \beta_c = \) 0.4228
  • Thus, \( \,\beta_b = \) 0.8447

Or, using Model 3, \( \,\beta_c \) can be derived from \( \,\beta_w \) and \( \,\beta_b \):
  • \( \beta_w = \) 0.4219
  • \( \beta_b = \) 0.8447
  • Thus, \( \,\beta_c = \) 0.4228

Categorical predictors: gender and location

A binomial predictor (gender) is fitted to the model by naming the variable in the fixed part of the model formula, but to get the category names included in the output, a reference category should be nominated within square brackets. A multinomial variable (location) is similarly fitted by naming it and nominating a reference category within square brackets.

f5 = "write7 ~ (0 | cons + write3C + gender[male] + location[remote]) + (1 | cons) + (2 | cons)"

mod5 = runMLwiN(Formula = f5, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())


MODEL 5

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.55s 
Number of obs:  3097 
Deviance statistic:  37711.1 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3C+gender[male]+location[remote])+(1|cons)+(2|cons) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
               Coef.   Std. Err.       z     Pr(>|z|)         [95% Conf.   Interval] 
cons       703.17004    21.45731   32.77   1.542e-235   ***    661.11449   745.22560 
write3C      0.38592     0.01927   20.02    3.526e-89   ***      0.34814     0.42370 
female      41.25539     3.89543   10.59     3.29e-26   ***     33.62049    48.89028 
metro       23.29506    22.51005    1.03       0.3007          -20.82384    67.41395 
prov        24.03177    27.16254    0.88       0.3763          -29.20582    77.26936 
rural       19.27530    23.43201    0.82       0.4107          -26.65060    65.20119 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                Coef.   Std. Err.   [95% Conf.    Interval] 
var_cons   1508.27771   319.07181    882.90845   2133.64697 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   10928.06738   280.81009   10377.68973   11478.44504 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 

Interaction between a continuous predictor and a binomial predictor: write3C:gender

Interactions are fitted to the model by naming the variables in the fixed part of the model formula, separating the two names with a colon.

# No interaction
f6 = "write7 ~ (0 | cons + write3C + gender) + (1 | cons) + (2 | cons)"

# Interaction
f7 = "write7 ~ (0 | cons + write3C + gender + write3C:gender) + (1 | cons) + (2 | cons)"

mod6 = runMLwiN(Formula = f6, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())

mod7 = runMLwiN(Formula = f7, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())


MODEL 6

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.56s 
Number of obs:  3097 
Deviance statistic:  37712.2 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3C+gender)+(1|cons)+(2|cons) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
               Coef.   Std. Err.        z    Pr(>|z|)         [95% Conf.   Interval] 
cons       723.94867     5.59581   129.37           0   ***    712.98108   734.91626 
write3C      0.38583     0.01927    20.02   3.471e-89   ***      0.34806     0.42359 
gender      41.25045     3.89577    10.59   3.369e-26   ***     33.61488    48.88602 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                Coef.   Std. Err.   [95% Conf.    Interval] 
var_cons   1521.67712   321.32080    891.89993   2151.45431 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   10930.38574   280.87301   10379.88476   11480.88673 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 


MODEL 7

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.52s 
Number of obs:  3097 
Deviance statistic:  37709.4 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3C+gender+write3C:gender)+(1|cons)+(2|cons) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
                     Coef.   Std. Err.        z    Pr(>|z|)         [95% Conf.   Interval] 
cons             723.23883     5.60825   128.96           0   ***    712.24686   734.23080 
write3C            0.35672     0.02594    13.75   5.077e-43   ***      0.30587     0.40757 
gender            41.17575     3.89428    10.57   3.959e-26   ***     33.54311    48.80839 
write3C:gender     0.06165     0.03681     1.67     0.09397   .       -0.01050     0.13381 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                Coef.   Std. Err.   [95% Conf.    Interval] 
var_cons   1519.48962   320.89186    890.55314   2148.42611 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   10920.60156   280.62144   10370.59365   11470.60947 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 

Interactions with multinomal predictors: gender:location and write3C:location

It seems that all the categories (except the reference category) of the multinomail variable implicated in the interaction have to be named.

f8 = "write7 ~ (0 | cons + write3C + location[remote] + gender[male] + gender:metro + gender:prov + gender:rural) + (1 | cons) + (2 | cons)"

f9 = "write7 ~ (0 | cons + write3C + location[remote] + write3C:metro + write3C:prov + write3C:rural) + (1 | cons) + (2 | cons)"

mod8 = runMLwiN(Formula = f8, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())

mod9 = runMLwiN(Formula = f9, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())


MODEL 8

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.61s 
Number of obs:  3097 
Deviance statistic:  37706.8 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3C+location[metro]+gender[male]+gender:prov+gender:rural+gender:remote(1|cons)+(2|cons) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
                    Coef.   Std. Err.       z    Pr(>|z|)         [95% Conf.   Interval] 
cons            728.29382     7.43251   97.99           0   ***    713.72637   742.86127 
write3C           0.38605     0.01927   20.03   2.969e-89   ***      0.34828     0.42382 
prov             -0.24233    19.42681   -0.01        0.99          -38.31818    37.83352 
rural           -11.41142    12.93321   -0.88      0.3776          -36.76004    13.93720 
remote          -41.91774    25.48594   -1.64         0.1          -91.86926     8.03377 
female           37.61590     4.50712    8.35   7.068e-17   ***     28.78211    46.44969 
gender:prov       1.77819    14.83192    0.12      0.9046          -27.29185    30.84823 
gender:rural     15.37919    10.37243    1.48      0.1382           -4.95041    35.70878 
gender:remote    36.63394    23.55428    1.56      0.1199           -9.53159    82.79948 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                Coef.   Std. Err.   [95% Conf.    Interval] 
var_cons   1505.50220   318.52834    881.19812   2129.80628 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   10913.01953   280.42303   10363.40048   11462.63858 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 


MODEL 9

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.59s 
Number of obs:  3097 
Deviance statistic:  37812.2 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3C+location[metro]+write3C:prov+write3C:rural+write3C:remote)+(1|cons)+(2|cons) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
                     Coef.   Std. Err.        z    Pr(>|z|)         [95% Conf.   Interval] 
cons             747.09021     7.12631   104.84           0   ***    733.12289   761.05753 
write3C            0.42681     0.02207    19.34   2.435e-83   ***      0.38355     0.47006 
prov              -3.61633    18.43805    -0.20      0.8445          -39.75424    32.52157 
rural             -4.88976    12.03440    -0.41      0.6845          -28.47675    18.69722 
remote           -22.29548    22.68583    -0.98      0.3257          -66.75889    22.16793 
write3C:prov      -0.12652     0.07598    -1.67     0.09589   .       -0.27544     0.02240 
write3C:rural      0.03808     0.05334     0.71      0.4754           -0.06648     0.14263 
write3C:remote     0.27499     0.11943     2.30      0.0213   *        0.04092     0.50906 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                Coef.   Std. Err.   [95% Conf.    Interval] 
var_cons   1520.18079   322.95416    887.20226   2153.15931 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   11295.73242   290.25483   10726.84341   11864.62144 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 

Random intercepts and random slopes

So far, the only variation between the schools allowed by the models is in the schools' intercepts, but the slopes can also be allowed to vary. To fit random slopes to the regression model, name write3C in the level 2 random part of the model formula.

f10 = "write7 ~ (0 | cons + write3C) + (1 | cons) + (2 | cons + write3C)"

mod10 = runMLwiN(Formula = f10, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())

The model can be represented as follows:
\[ \begin{align}
write7_{ij} &=\beta_{0j} + \beta_{1j}write3C_{ij} + e_{0ij}\\\
\beta_{0j} &=\beta_{0} + u_{0j}\\\
\beta_{1j} &=\beta_{1} + u_{1j}\\\
\end{align}
\]\[ \begin{align}
\text{ where }
\begin{bmatrix}
u_{0j}\\ u_{1j}
\end{bmatrix}
& \sim N(0,\; \Omega_u) \qquad\text{:}\qquad\Omega_u =
\begin{bmatrix}
\sigma^2_{u0} & \phantom{1} \\
\sigma_{u01} & \sigma^2_{u1}
\end{bmatrix}\\\
\text{and } e_{ij} & \sim N(0,\; \sigma^2_e)
\end{align}
\]

The analysis returns estimates for the fixed parts of the model: the intercept (\( \,\beta_0 \)) and the slope (\( \,\beta_1 \)). As well, there are student level residuals (\( \,e_{ij} \)) and thus a student level (level 1) variance (\( \,\sigma^2_{ij} \)). At the school level, there are residuals for the intercepts (\( \,u_{0j} \)) and for the slopes (\( \,u_{1j} \)), and thus corresponding to the two sets of residuals, there are two school level (level 2) variances (\( \,\sigma^2_{u0} \) and \( \,\sigma^2_{u1} \)). But the school intercepts and slopes can be correlated, and thus there is also a level 2 covariance (\( \,\sigma_{u01} \)).


MODEL 10

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.97s 
Number of obs:  3097 
Deviance statistic:  37809 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3C)+(1|cons)+(2|cons+write3C) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
               Coef.   Std. Err.        z    Pr(>|z|)         [95% Conf.   Interval] 
cons       744.10114     5.12260   145.26           0   ***    734.06103   754.14124 
write3C      0.44131     0.02517    17.53   7.906e-69   ***      0.39198     0.49064 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                       Coef.   Std. Err.   [95% Conf.   Interval] 
var_cons           1.406e+03   304.63530    809.37110   2.004e+03 
cov_cons_write3C   3.104e+00     1.11109      0.92639   5.282e+00 
var_write3C        1.483e-02     0.00676      0.00157   2.809e-02 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   11227.50391   290.93843   10657.27506   11797.73276 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 

The covariance (3.104) corresponds to a correlation of 0.680. It means that schools with larger mean write3 scores tend to have larger slopes.

To constrain the level 2 covariance to zero, the clre option is used. clre is a matrix with three rows. Here, the matrix contains one column because there is only one coviariance to be contrained to zero. The first row of the matrix indicates the level of the covariance to be contrained to zero; here, level 2. The second and third rows indicate the covariance; here, the covariance between cons and write3C. clre is passed to MLwiN as one of the list elements of estoptions.

clre = matrix(, nrow = 3, ncol = 1)
clre[1, 1] = 2
clre[2, 1] = "cons"
clre[3, 1] = "write3C"

estoptions = list(EstM = 0, clre = clre)

mod10b = runMLwiN(Formula = f10, levID = levID, D = D, indata = alnt, estoptions = estoptions, 
    MLwiNPath = mlwin, workdir = tempdir())


MODEL 10b

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN multilevel model (Normal) 
Estimation algorithm:  IGLS        Elapsed time : 0.56s 
Number of obs:  3097 
Deviance statistic:  37817.2 
--------------------------------------------------------------------------------------------------- 
The model formula:
write7~(0|cons+write3C)+(1|cons)+(2|cons+write3C) 
Level 2: school     Level 1: student      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
               Coef.   Std. Err.        z    Pr(>|z|)         [95% Conf.   Interval] 
cons       744.25812     5.21915   142.60           0   ***    734.02877   754.48747 
write3C      0.44040     0.02370    18.58   4.536e-77   ***      0.39395     0.48686 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the school level: 
                  Coef.   Std. Err.   [95% Conf.   Interval] 
var_cons      1.474e+03   315.63714    855.19228   2.092e+03 
var_write3C   1.001e-02     0.00585     -0.00146   2.147e-02 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the student level: 
                 Coef.   Std. Err.    [95% Conf.     Interval] 
var_cons   11239.34082   291.29545   10668.41223   11810.26941 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 

The only difference between this model and the previous one is in the variance-covariance structure. The variance-covariance structure for this model can be represented as:
\[ \begin{align}
\begin{bmatrix}
u_{0j}\\ u_{1j}
\end{bmatrix}
& \sim N(0,\; \Omega_u) \qquad\text{:}\qquad\Omega_u =
\begin{bmatrix}
\sigma^2_{u0} & \phantom{1} \\
0 & \sigma^2_{u1}
\end{bmatrix}\\\
\text{and } e_{ij} & \sim N(0,\; \sigma^2_e)
\end{align}
\]






Getting GitHub data into R

Christopher Gandrud's Source_GitHubData function makes it easy to get GitHub data into R. The URL for the ALNT data file is https://raw.github.com/SandyMuspratt/Blog-files/master/R2MLwiN%20I/ALNT.csv. Or, the shortened bitly URL is: http://bit.ly/10A5LBY

The following commands should get the ALNT data into your R session.

# Christopher Gandrud's source_GitHubData function
source_GitHubData <-function(url, sep = ",", header = TRUE)
{
   require(httr)
   request <- GET(url)
   stop_for_status(request)
   handle <- textConnection(content(request, as = 'text'))
   on.exit(close(handle))
   read.table(handle, sep = sep, header = header)
}

url = "http://bit.ly/10A5LBY"
alnt = source_GitHubData(url = url)
names(alnt)

[1] "student"  "school"   "gender"   "write3"   "write7"   "location" "dsi"

3 comments:

  1. Great, post I must say – much more accessible than the specifications in the R2Mlwin manual. One question, after model specification and run, how do you then go about to organize and prepare your multilevel outputs (mainly regression-tables)? Basically, I want to (1) organize my outputs for analysis (maybe send them to excel, or you might have some other recommendation), (2) after analysis decide which models to present, and then make them journal-presentable. Also, I am mainly running IGLS (might to MCMC later on), I will run both linear probability models as well as logistic.

    I am new to R, so I have not much experience here.


    All input is appreciated.
    Best,
    Adel

    ReplyDelete
  2. Sorry for the delay. I was trying to get some work finished before the Easter break. With respect to point 1, I assume you're trying to organise output into a table in MS Word. In the past, it was largely a matter of cut-and-paste, but some recent work by others last year and this year have made the job a little easier. You might be able to take something from a recent blog by Tal Galili and some of the comments therein. Your point 2 - are you referring to model selection? There are a number of books giving examples in R on model selection. One I found useful was Gelman and Hill's "Data analysis using regression and multilevel/hierarchical model". Also Faraway's "Extending the linear model with R", and Zuur et al's "Mixed effects models and extensions in ecology with R" (the latter, as the title suggests, takes examples from biology). For an MLwiN flavour, there are tutorials and teaching modules on the MLwiN (multilevel modelling) website. But if you are using maximum likelihood methods of estimation and want to work with R, I would not be recommending R2MLwiN. There are packages in R (e.g., lme4) better suited to the task.

    ReplyDelete
  3. Thanks for the reply and suggestions Sandy. Some follow up:

    (1) Using R2mlwin or some package in R: My main problem that I hope to solve by using r2mlwin is the absence of good syntax within the Mlwin environment itself. I am quite familiar with Mlwin’s point-and-click workflow, but it is not very smooth in the long run as you know. And since I am impressed by what is possible to do with R (just started to learn and use it), I hope it is a better investment to use Runmlwin (the Stata plugin) – I am mainly a SPSS user. My experience thus far is however that it is not very smooth to work with R2mlwin – this might be due to the fact that I am still new to R.

    (2) Estimation procedures: I understand some of the basic principles of MCMC but not enough and therefore prefer IGLS estimations. I understand that R2mlwin is heavily biased to MCMC, after I have been in contact with the developer. Currently, I am running some models that show different results in R2mlwin compared to running them in Mlwin. I am suspecting that missing values are handled slightly differently, even if both use the Mlwin engine.

    (3) A related question to (2), about missing and categorical dependent variable: how does R2mlwin handle missing? It should be exactly the same as Mlwin is handling it. The numbers of observations are different after running and comparing the same models in Mlwin and with r2mlwin. Do you have any idea why this is so?

    ReplyDelete