## Specifying the model

This is the second part of a series of notes demonstrating use of the R package, R2MLwiN, an R command interface to the multilevel modelling software package, MLwiN (see the MLwiN site for getting access to MLwiN). The first set of notes showed how to get started with R2MLwiN. In these notes, I show how to fit predictors (continuous, categorical, and interactions) to the fixed-effects part of a multilevel regression model, and how to fit random slopes to the regression model. The examples use the ALNT.csv data (see Working with R2MLwiN Part 1 for a description of the data). Though the series is concerned with demonstrating Bayesian estimation using MCMC methods, the examples presented here do not depend on MCMC methods of estimations, and so to speed up the running of the examples, they use maximum likelihood estimation. It is easy enough to switch between maximum likelihood and MCMC procedures; set the`estM`

option to 1 for MCMC, and 0 for maximum likelihood.### Some Preliminaries

Load the`R2MLwiN`

and the `plyr`

packages. The `ddply()`

function from the `plyr`

package will be used to add some derived variables to the data file.```
library(R2MLwiN)
library(plyr)
```

The examples in these notes use the ALNT data file available from github. Assuming the data file is located in your working directory, the following command loads the data, and lists the variables. (See the end of this blog for an easy way to get GitHub data into an R session.)

```
alnt = read.csv("ALNT.csv", header = TRUE, sep = ",")
names(alnt)
```

The alnt data file needs some modifications:

- a column of 1s for the 'intercept' variable;
- grand mean centring of the write3 variable;
- group mean centring of the write3 variable;
- group (school) means for the write3 variable; and,
- two variables, gender and location, should be changed from numeric variables to factors.

```
alnt$cons = 1 # Intercept
alnt$write3C = alnt$write3 - mean(alnt$write3) # Grand mean centring
alnt = ddply(alnt, .(school), mutate, write3GP = write3 - mean(write3)) # Group mean centring
alnt = ddply(alnt, .(school), mutate, GrpMeans = mean(write3)) # Group means
alnt$gender = factor(alnt$gender, labels = c("male", "female")) # Factor labels
alnt$location = factor(alnt$location, labels = c("metro", "prov", "rural", "remote"))
```

Next, set up the information to be passed to MLwiN:

```
# 1. Path to MLwiN executable (mlwin.exe)
mlwin = c("C:/Program Files/MLwiN v2.26/i386") # Substitute the path on your machine
# 2. The variables containing IDs for all levels - highest level first
levID = c("school", "student")
# 3. Distribution to be modelled
D = "Normal"
# 4. Method of estimation (maximum likelihood)
estoptions = list(EstM = 0) # Set EstM to 1 for MCMC estimation
```

### Continuous Predictors

In all the models to follow, write7 is the dependent variable. In Part 1 of these notes, the only predictor in the model was the intercept, and so the model was an “empty model”. In this section, I take into account students earlier (year 3) writing achievements.The code demonstrates the setting up of models using different versions of the write3 variable as the predictor variable. The models are random-intercepts models. Versions of the write3 variable enter the models in turn as:

- uncentred write3;
- write3 centred on the grand mean;
- write3 centred on the group means + school means; and,
- write3 centred on the grand mean + school means.

The code for the formulas to be passed to MLwiN are:

```
# No centring
f1 = "write7 ~ (0 | cons + write3) + (1 | cons) + (2 | cons)"
# Grand mean centring
f2 = "write7 ~ (0 | cons + write3C) + (1 | cons) + (2 | cons)"
# Group mean centring + School means
f3 = "write7 ~ (0 | cons + write3Gp + GrpMeans) + (1 | cons) + (2 | cons)"
# Grand mean centring + School means
f4 = "write7 ~ (0 | cons + write3C + GrpMeans) + (1 | cons) + (2 | cons)"
```

Run the models.

```
mod1 = runMLwiN(Formula = f1, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
mod2 = runMLwiN(Formula = f2, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
mod3 = runMLwiN(Formula = f3, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
mod4 = runMLwiN(Formula = f4, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
```

Summaries for the 3rd and 4th models only are shown below:

```
MODEL 3
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.75s
Number of obs: 3097
Deviance statistic: 37810.9
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3Gp+GrpMeans)+(1|cons)+(2|cons)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 306.12204 61.41767 4.98 6.22e-07 *** 185.74563 426.49845
write3Gp 0.42190 0.01934 21.81 1.882e-105 *** 0.38398 0.45981
GrpMeans 0.84466 0.11832 7.14 9.426e-13 *** 0.61275 1.07657
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1216.74207 269.28083 688.96134 1744.52279
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 11335.23047 291.24032 10764.40994 11906.05100
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

```
MODEL 4
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.51s
Number of obs: 3097
Deviance statistic: 37810.9
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3C+GrpMeans)+(1|cons)+(2|cons)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 525.47400 62.23589 8.44 3.086e-17 *** 403.49390 647.45410
write3C 0.42190 0.01934 21.81 1.883e-105 *** 0.38398 0.45981
GrpMeans 0.42277 0.11989 3.53 0.0004216 *** 0.18778 0.65776
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1216.73645 269.27958 688.95817 1744.51473
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 11335.23242 291.24034 10764.41184 11906.05301
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

Consider the estimates of the coefficients for

`write3Gp`

, `write3C`

and `GrpMeans`

in the summaries of the two models. Model 3 contains coefficients for write3 centred on the group means, and the group means. The first coefficient is a within-schools slope (\( \,\beta_w \)); the second coefficient is a between-schools coefficient (\( \,\beta_b \)). Their interpretations are:- \( \beta_w \) is the expected difference in write7 scores for two students in the same school whose write3 scores differ by one point;
- \( \beta_b \) is the expected difference between the mean write7 scores for two schools that differ by one point for mean write3 scores.

Model 4 contains coefficients for write3 centred on the grand mean, and the group means. The first coefficient is again the within-groups slope (\( \,\beta_w \)) (the same within-groups slope that was estimated in Model 3 even though a different version of the predictor was used). The second coefficient is a contextual effect (\( \,\beta_c \)). Its interpretation is:

- \( \beta_c \) is the expected difference in write7 scores for two students with the same write3 scores but who attend schools that differ in mean write3 scores by one point.

The relationship among the three \( \beta \) values is: \( \beta_b = \beta_w + \beta_c \)

Thus, the two coefficients returned by either model can be used to calculate the third.

Using Model 4, \( \,\beta_b \) can be derived from \( \,\beta_w \) and \( \,\beta_c \):

- \( \beta_w = \) 0.4219
- \( \beta_c = \) 0.4228
- Thus, \( \,\beta_b = \) 0.8447

Or, using Model 3, \( \,\beta_c \) can be derived from \( \,\beta_w \) and \( \,\beta_b \):

- \( \beta_w = \) 0.4219
- \( \beta_b = \) 0.8447
- Thus, \( \,\beta_c = \) 0.4228

### Categorical predictors: `gender`

and `location`

A binomial predictor (`gender`

) is fitted to the model by naming the variable in the fixed part of the model formula, but to get the category names included in the output, a reference category should be nominated within square brackets. A multinomial variable (`location`

) is similarly fitted by naming it and nominating a reference category within square brackets.```
f5 = "write7 ~ (0 | cons + write3C + gender[male] + location[remote]) + (1 | cons) + (2 | cons)"
mod5 = runMLwiN(Formula = f5, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
```

```
MODEL 5
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.55s
Number of obs: 3097
Deviance statistic: 37711.1
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3C+gender[male]+location[remote])+(1|cons)+(2|cons)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 703.17004 21.45731 32.77 1.542e-235 *** 661.11449 745.22560
write3C 0.38592 0.01927 20.02 3.526e-89 *** 0.34814 0.42370
female 41.25539 3.89543 10.59 3.29e-26 *** 33.62049 48.89028
metro 23.29506 22.51005 1.03 0.3007 -20.82384 67.41395
prov 24.03177 27.16254 0.88 0.3763 -29.20582 77.26936
rural 19.27530 23.43201 0.82 0.4107 -26.65060 65.20119
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1508.27771 319.07181 882.90845 2133.64697
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 10928.06738 280.81009 10377.68973 11478.44504
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

### Interaction between a continuous predictor and a binomial predictor: `write3C:gender`

Interactions are fitted to the model by naming the variables in the fixed part of the model formula, separating the two names with a colon.```
# No interaction
f6 = "write7 ~ (0 | cons + write3C + gender) + (1 | cons) + (2 | cons)"
# Interaction
f7 = "write7 ~ (0 | cons + write3C + gender + write3C:gender) + (1 | cons) + (2 | cons)"
mod6 = runMLwiN(Formula = f6, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
mod7 = runMLwiN(Formula = f7, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
```

```
MODEL 6
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.56s
Number of obs: 3097
Deviance statistic: 37712.2
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3C+gender)+(1|cons)+(2|cons)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 723.94867 5.59581 129.37 0 *** 712.98108 734.91626
write3C 0.38583 0.01927 20.02 3.471e-89 *** 0.34806 0.42359
gender 41.25045 3.89577 10.59 3.369e-26 *** 33.61488 48.88602
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1521.67712 321.32080 891.89993 2151.45431
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 10930.38574 280.87301 10379.88476 11480.88673
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

```
MODEL 7
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.52s
Number of obs: 3097
Deviance statistic: 37709.4
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3C+gender+write3C:gender)+(1|cons)+(2|cons)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 723.23883 5.60825 128.96 0 *** 712.24686 734.23080
write3C 0.35672 0.02594 13.75 5.077e-43 *** 0.30587 0.40757
gender 41.17575 3.89428 10.57 3.959e-26 *** 33.54311 48.80839
write3C:gender 0.06165 0.03681 1.67 0.09397 . -0.01050 0.13381
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1519.48962 320.89186 890.55314 2148.42611
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 10920.60156 280.62144 10370.59365 11470.60947
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

### Interactions with multinomal predictors: `gender:location`

and `write3C:location`

It seems that all the categories (except the reference category) of the multinomail variable implicated in the interaction have to be named.```
f8 = "write7 ~ (0 | cons + write3C + location[remote] + gender[male] + gender:metro + gender:prov + gender:rural) + (1 | cons) + (2 | cons)"
f9 = "write7 ~ (0 | cons + write3C + location[remote] + write3C:metro + write3C:prov + write3C:rural) + (1 | cons) + (2 | cons)"
mod8 = runMLwiN(Formula = f8, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
mod9 = runMLwiN(Formula = f9, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
```

```
MODEL 8
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.61s
Number of obs: 3097
Deviance statistic: 37706.8
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3C+location[metro]+gender[male]+gender:prov+gender:rural+gender:remote(1|cons)+(2|cons)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 728.29382 7.43251 97.99 0 *** 713.72637 742.86127
write3C 0.38605 0.01927 20.03 2.969e-89 *** 0.34828 0.42382
prov -0.24233 19.42681 -0.01 0.99 -38.31818 37.83352
rural -11.41142 12.93321 -0.88 0.3776 -36.76004 13.93720
remote -41.91774 25.48594 -1.64 0.1 -91.86926 8.03377
female 37.61590 4.50712 8.35 7.068e-17 *** 28.78211 46.44969
gender:prov 1.77819 14.83192 0.12 0.9046 -27.29185 30.84823
gender:rural 15.37919 10.37243 1.48 0.1382 -4.95041 35.70878
gender:remote 36.63394 23.55428 1.56 0.1199 -9.53159 82.79948
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1505.50220 318.52834 881.19812 2129.80628
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 10913.01953 280.42303 10363.40048 11462.63858
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

```
MODEL 9
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.59s
Number of obs: 3097
Deviance statistic: 37812.2
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3C+location[metro]+write3C:prov+write3C:rural+write3C:remote)+(1|cons)+(2|cons)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 747.09021 7.12631 104.84 0 *** 733.12289 761.05753
write3C 0.42681 0.02207 19.34 2.435e-83 *** 0.38355 0.47006
prov -3.61633 18.43805 -0.20 0.8445 -39.75424 32.52157
rural -4.88976 12.03440 -0.41 0.6845 -28.47675 18.69722
remote -22.29548 22.68583 -0.98 0.3257 -66.75889 22.16793
write3C:prov -0.12652 0.07598 -1.67 0.09589 . -0.27544 0.02240
write3C:rural 0.03808 0.05334 0.71 0.4754 -0.06648 0.14263
write3C:remote 0.27499 0.11943 2.30 0.0213 * 0.04092 0.50906
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1520.18079 322.95416 887.20226 2153.15931
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 11295.73242 290.25483 10726.84341 11864.62144
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

### Random intercepts and random slopes

So far, the only variation between the schools allowed by the models is in the schools' intercepts, but the slopes can also be allowed to vary. To fit random slopes to the regression model, name`write3C`

in the level 2 random part of the model formula.```
f10 = "write7 ~ (0 | cons + write3C) + (1 | cons) + (2 | cons + write3C)"
mod10 = runMLwiN(Formula = f10, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
```

The model can be represented as follows:

\[ \begin{align}

write7_{ij} &=\beta_{0j} + \beta_{1j}write3C_{ij} + e_{0ij}\\\

\beta_{0j} &=\beta_{0} + u_{0j}\\\

\beta_{1j} &=\beta_{1} + u_{1j}\\\

\end{align}

\]\[ \begin{align}

\text{ where }

\begin{bmatrix}

u_{0j}\\ u_{1j}

\end{bmatrix}

& \sim N(0,\; \Omega_u) \qquad\text{:}\qquad\Omega_u =

\begin{bmatrix}

\sigma^2_{u0} & \phantom{1} \\

\sigma_{u01} & \sigma^2_{u1}

\end{bmatrix}\\\

\text{and } e_{ij} & \sim N(0,\; \sigma^2_e)

\end{align}

\]

The analysis returns estimates for the fixed parts of the model: the intercept (\( \,\beta_0 \)) and the slope (\( \,\beta_1 \)). As well, there are student level residuals (\( \,e_{ij} \)) and thus a student level (level 1) variance (\( \,\sigma^2_{ij} \)). At the school level, there are residuals for the intercepts (\( \,u_{0j} \)) and for the slopes (\( \,u_{1j} \)), and thus corresponding to the two sets of residuals, there are two school level (level 2) variances (\( \,\sigma^2_{u0} \) and \( \,\sigma^2_{u1} \)). But the school intercepts and slopes can be correlated, and thus there is also a level 2 covariance (\( \,\sigma_{u01} \)).

```
MODEL 10
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.97s
Number of obs: 3097
Deviance statistic: 37809
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3C)+(1|cons)+(2|cons+write3C)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 744.10114 5.12260 145.26 0 *** 734.06103 754.14124
write3C 0.44131 0.02517 17.53 7.906e-69 *** 0.39198 0.49064
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1.406e+03 304.63530 809.37110 2.004e+03
cov_cons_write3C 3.104e+00 1.11109 0.92639 5.282e+00
var_write3C 1.483e-02 0.00676 0.00157 2.809e-02
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 11227.50391 290.93843 10657.27506 11797.73276
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

The covariance (3.104) corresponds to a correlation of 0.680. It means that schools with larger mean write3 scores tend to have larger slopes.

To constrain the level 2 covariance to zero, the

`clre`

option is used. `clre`

is a matrix with three rows. Here, the matrix contains one column because there is only one coviariance to be contrained to zero. The first row of the matrix indicates the level of the covariance to be contrained to zero; here, level 2. The second and third rows indicate the covariance; here, the covariance between `cons`

and `write3C`

. `clre`

is passed to MLwiN as one of the list elements of `estoptions`

.```
clre = matrix(, nrow = 3, ncol = 1)
clre[1, 1] = 2
clre[2, 1] = "cons"
clre[3, 1] = "write3C"
estoptions = list(EstM = 0, clre = clre)
mod10b = runMLwiN(Formula = f10, levID = levID, D = D, indata = alnt, estoptions = estoptions,
MLwiNPath = mlwin, workdir = tempdir())
```

```
MODEL 10b
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN multilevel model (Normal)
Estimation algorithm: IGLS Elapsed time : 0.56s
Number of obs: 3097
Deviance statistic: 37817.2
---------------------------------------------------------------------------------------------------
The model formula:
write7~(0|cons+write3C)+(1|cons)+(2|cons+write3C)
Level 2: school Level 1: student
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Conf. Interval]
cons 744.25812 5.21915 142.60 0 *** 734.02877 754.48747
write3C 0.44040 0.02370 18.58 4.536e-77 *** 0.39395 0.48686
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the school level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 1.474e+03 315.63714 855.19228 2.092e+03
var_write3C 1.001e-02 0.00585 -0.00146 2.147e-02
---------------------------------------------------------------------------------------------------
The random part estimates at the student level:
Coef. Std. Err. [95% Conf. Interval]
var_cons 11239.34082 291.29545 10668.41223 11810.26941
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
```

The only difference between this model and the previous one is in the variance-covariance structure. The variance-covariance structure for this model can be represented as:

\[ \begin{align}

\begin{bmatrix}

u_{0j}\\ u_{1j}

\end{bmatrix}

& \sim N(0,\; \Omega_u) \qquad\text{:}\qquad\Omega_u =

\begin{bmatrix}

\sigma^2_{u0} & \phantom{1} \\

0 & \sigma^2_{u1}

\end{bmatrix}\\\

\text{and } e_{ij} & \sim N(0,\; \sigma^2_e)

\end{align}

\]

#### Getting GitHub data into R

Christopher Gandrud's Source_GitHubData function makes it easy to get GitHub data into R. The URL for the ALNT data file is https://raw.github.com/SandyMuspratt/Blog-files/master/R2MLwiN%20I/ALNT.csv. Or, the shortened bitly URL is: http://bit.ly/10A5LBYThe following commands should get the ALNT data into your R session.

```
# Christopher Gandrud's source_GitHubData function
source_GitHubData <-function(url, sep = ",", header = TRUE)
{
require(httr)
request <- GET(url)
stop_for_status(request)
handle <- textConnection(content(request, as = 'text'))
on.exit(close(handle))
read.table(handle, sep = sep, header = header)
}
url = "http://bit.ly/10A5LBY"
alnt = source_GitHubData(url = url)
names(alnt)
```

`[1] "student" "school" "gender" "write3" "write7" "location" "dsi"`

Great, post I must say – much more accessible than the specifications in the R2Mlwin manual. One question, after model specification and run, how do you then go about to organize and prepare your multilevel outputs (mainly regression-tables)? Basically, I want to (1) organize my outputs for analysis (maybe send them to excel, or you might have some other recommendation), (2) after analysis decide which models to present, and then make them journal-presentable. Also, I am mainly running IGLS (might to MCMC later on), I will run both linear probability models as well as logistic.

ReplyDeleteI am new to R, so I have not much experience here.

All input is appreciated.

Best,

Adel

Sorry for the delay. I was trying to get some work finished before the Easter break. With respect to point 1, I assume you're trying to organise output into a table in MS Word. In the past, it was largely a matter of cut-and-paste, but some recent work by others last year and this year have made the job a little easier. You might be able to take something from a recent blog by Tal Galili and some of the comments therein. Your point 2 - are you referring to model selection? There are a number of books giving examples in R on model selection. One I found useful was Gelman and Hill's "Data analysis using regression and multilevel/hierarchical model". Also Faraway's "Extending the linear model with R", and Zuur et al's "Mixed effects models and extensions in ecology with R" (the latter, as the title suggests, takes examples from biology). For an MLwiN flavour, there are tutorials and teaching modules on the MLwiN (multilevel modelling) website. But if you are using maximum likelihood methods of estimation and want to work with R, I would not be recommending R2MLwiN. There are packages in R (e.g., lme4) better suited to the task.

ReplyDeleteThanks for the reply and suggestions Sandy. Some follow up:

ReplyDelete(1) Using R2mlwin or some package in R: My main problem that I hope to solve by using r2mlwin is the absence of good syntax within the Mlwin environment itself. I am quite familiar with Mlwin’s point-and-click workflow, but it is not very smooth in the long run as you know. And since I am impressed by what is possible to do with R (just started to learn and use it), I hope it is a better investment to use Runmlwin (the Stata plugin) – I am mainly a SPSS user. My experience thus far is however that it is not very smooth to work with R2mlwin – this might be due to the fact that I am still new to R.

(2) Estimation procedures: I understand some of the basic principles of MCMC but not enough and therefore prefer IGLS estimations. I understand that R2mlwin is heavily biased to MCMC, after I have been in contact with the developer. Currently, I am running some models that show different results in R2mlwin compared to running them in Mlwin. I am suspecting that missing values are handled slightly differently, even if both use the Mlwin engine.

(3) A related question to (2), about missing and categorical dependent variable: how does R2mlwin handle missing? It should be exactly the same as Mlwin is handling it. The numbers of observations are different after running and comparing the same models in Mlwin and with r2mlwin. Do you have any idea why this is so?

Thanks For Your valuable posting, it was very informative. Am working inCloud Erp Software In India

ReplyDelete