Institute for Applied Statistics and Data Analysis. Accredited by the Vereniging voor Statistiek

# Course: Multilevel and Longitudinal Data Analysis with R

#### Introduction

Suppose you study the grades of pupils in school classes. Pupils in some classes may have better grades than those in other classes, so class membership influences the pupils’ grades. A statistical model must account for this influence. With only a few classes, the mean grades of the classes could be compared, but if you have, say, 100 classes, another approach is needed. A similar example is data from patients in many clinics, where clinics treat patients in different ways and hence affect the outcome, such as a patient’s wellbeing after a treatment. Finally, consider longitudinal data from observing patients once a month over several years. Patients’ outcomes may change over time, but in different ways for different patients. In all these situations multilevel models can be applied.

Multilevel models have important benefits. In the pupils’ grades example, different types of schools can be compared and the results generalised to the population of all schools, not only to the schools in the sample. For example, you could test whether the influence of gender on the pupils’ grades varies significantly across schools (in the population) and whether the influence of gender depends on the schools’ teaching methods. In the example of patients measured monthly, you could test whether time has a linear (or quadratic, or …) influence on a patient’s wellbeing, and whether this influence differs significantly before and after the patients received some treatment. In addition, a substantial number of patients may not be measured every month. For such patients, you do not have to throw away all data: the data from the observed months can still be used, giving you more power for testing the treatment’s influence.

A particular kind of longitudinal data, very common in, for example, medical and biological research, is survival data: how long does a patient survive after a drug was administered, and how does this compare to patients receiving a placebo? An important analysis method for such data is Cox regression. Models for just one event, like dying, and models for “competing” events, like dying of different causes, are discussed. The course also covers the conditions that must be met for these models to be applied.

#### Set-up

The course consists of seven lessons, in which the focus lies on interpreting the results of the models shown, and not on the mathematical background. The statistical package R was chosen for its popularity in the scientific and statistical community. What is more, R is open-source software that is very well documented on the internet, with numerous examples of all kinds of statistical analyses worked out in detail. In the lessons, explanation of the multilevel models is interspersed with examples in R.

In a nutshell, the course covers:

• short introduction to R and RStudio
• why to use multilevel analysis and for what types of data
• fixed and random influences of predictors
• models in which the influence of a predictor differs across groups (interaction)
• models for longitudinal data
• models for binary 0/1 data (logistic regression)
• GEE models as an alternative to multilevel models
• Cox regression models for survival data

#### Requirements

In order to participate, students should have some understanding of basic statistical techniques, like t-tests, analysis of variance and linear regression. Previous experience with R is not needed.

#### Content of Multilevel Analysis, Longitudinal Data Analysis and Mixed Models

Lesson 1

• Introduction to R and working with RStudio
Reading your data, missing values, continuous data vs. categorical data, graphs: histograms, bar charts, scatterplots, t-tests for two samples, analysis of variance, linear regression, installing R packages
• Introduction to Multilevel Analysis
When to use it, literature, why multilevel data need a special model, examples of multilevel data, intraclass correlation, effective sample size, hypothesis testing when ignoring the multilevel structure of data, the simplest model: the null model, fixed and random means, model equation and notation, between and within variance, running the null model in R on a fictitious example, interpreting model results and calculating the intraclass correlation
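
As an illustration of the null model and the intraclass correlation, here is a minimal sketch on simulated data. It is not taken from the course material: the variable names (`grade`, `class`), the variance values and the choice of the `nlme` package are assumptions made for this example. The intraclass correlation is the between-class variance divided by the sum of the between-class and within-class variances.

```r
# Minimal sketch: null (intercept-only) multilevel model on simulated data.
# Variable names and variance values are illustrative assumptions.
library(nlme)  # recommended package, ships with standard R installations

set.seed(1)
n_class <- 50   # number of classes (level 2)
n_pupil <- 20   # pupils per class (level 1)

class_effect <- rnorm(n_class, mean = 0, sd = 1)  # between-class sd = 1
d <- data.frame(class = factor(rep(seq_len(n_class), each = n_pupil)))
# Grade = overall mean + class effect + pupil-level noise (within-class sd = 2)
d$grade <- 6 + class_effect[as.integer(d$class)] + rnorm(nrow(d), sd = 2)

# Null model: one fixed overall mean plus a random intercept per class
fit <- lme(grade ~ 1, random = ~ 1 | class, data = d)

# VarCorr() returns a character matrix, so convert the variances to numbers
vc <- VarCorr(fit)
var_between <- as.numeric(vc["(Intercept)", "Variance"])
var_within  <- as.numeric(vc["Residual", "Variance"])

# Intraclass correlation: share of the total variance that lies between classes
icc <- var_between / (var_between + var_within)

# Effective sample size under the usual design-effect formula
n_eff <- nrow(d) / (1 + (n_pupil - 1) * icc)

print(c(icc = icc, n_eff = n_eff))
```

With the simulated variances above, the true intraclass correlation is 1 / (1 + 4) = 0.2, so the estimate should land in that neighbourhood, and the effective sample size is clearly smaller than the 1000 pupils in the data.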

Lesson 2

• Multilevel models with predictors
Revisiting the null model, adding a predictor with a fixed or random effect, applying the model to real data: a fixed effect predictor and a random effect predictor, interpreting the results, interpreting the correlation of random slope and intercept, adding and interpreting interactions of continuous/categorical predictors, cross-level interaction
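
For orientation, a two-level model with a random intercept, a random slope and a cross-level interaction can be written as below. The notation follows a common textbook convention and is an assumption here; the course may use different symbols. In this sketch, $x_{ij}$ is a pupil-level predictor, $z_j$ a class-level predictor, and $\gamma_{11}$ the cross-level interaction.

```latex
\begin{aligned}
y_{ij}      &= \beta_{0j} + \beta_{1j}\, x_{ij} + e_{ij},
            \qquad e_{ij} \sim N(0, \sigma^2_e) \\
\beta_{0j}  &= \gamma_{00} + \gamma_{01}\, z_j + u_{0j} \\
\beta_{1j}  &= \gamma_{10} + \gamma_{11}\, z_j + u_{1j} \\[4pt]
y_{ij}      &= \gamma_{00} + \gamma_{01}\, z_j + \gamma_{10}\, x_{ij}
               + \gamma_{11}\, z_j x_{ij}
               + u_{0j} + u_{1j}\, x_{ij} + e_{ij} \\[4pt]
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
            &\sim N\!\left(
               \begin{pmatrix} 0 \\ 0 \end{pmatrix},
               \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}
               \right)
\end{aligned}
```

The correlation of random slope and intercept mentioned above is $\tau_{01} / \sqrt{\tau_{00}\,\tau_{11}}$.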

Lesson 3

• Extensions of the multilevel model
Within vs. between effect of a predictor and creating within and between variables in R, testing the difference between the within and between effects, relevance of the within effect for longitudinal data, testing a fixed vs. random effect of a predictor: deviance, more powerful Snijders & Bosker tests, when to use deviance tests more generally, posterior means vs. ordinary means
• Multilevel models for longitudinal data: introduction
Traditional repeated measures ANOVA, as in SPSS, on wide data, multilevel model on long data, compound symmetry, predicted variance and correlation over time
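
The difference between wide and long data can be sketched with base R's `reshape()`. The data frame, its column names and its values are hypothetical and serve only to show the layout; note how a patient with a missed occasion keeps the remaining observations in long format.

```r
# Hypothetical wide data: one row per patient, one column per occasion
wide <- data.frame(
  id = 1:3,
  y1 = c(5.1, 6.0, 4.2),
  y2 = c(5.4, NA,  4.8),   # patient 2 missed the second occasion
  y3 = c(5.9, 6.5, 5.0)
)

# Long format: one row per patient per occasion
long <- reshape(
  wide,
  direction = "long",
  varying   = c("y1", "y2", "y3"),  # columns holding the repeated measure
  v.names   = "y",                  # name of the stacked outcome column
  timevar   = "occasion",           # name of the new occasion column
  times     = 1:3,
  idvar     = "id"
)
long <- long[order(long$id, long$occasion), ]

# Drop only the missing occasion, not the whole patient
long_obs <- long[!is.na(long$y), ]
print(long_obs)
```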

Lesson 4

• Multilevel models for longitudinal data: continued
Modelling different variances and correlations over time: the multilevel growth model (MGM), advantages of the MGM, fixed vs. varying occasions, elaborate example of varying occasions, within vs. between effect, another way to model correlated data: GLM, different kinds of variance/correlation structures over time, comparing the fit of different variance/correlation structures for your data and how to decide which structure to choose

Lesson 5

• Multilevel model for binary data: logistic regression
Problem with binary 0/1 data and linear model, the logistic regression equation, interpret odds ratios for continuous and categorical predictors, logistic regression for “true” proportions, the multilevel logistic model, underlying latent scale viewpoint, intraclass correlation
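
A minimal sketch of a single-level logistic regression and its odds ratios, using base R's `glm()` on simulated data. The variable names (`pass`, `hours`, `female`) and the true coefficients are assumptions chosen for illustration. The small helper at the end shows the latent-scale intraclass correlation for a multilevel logistic model, where the level-1 variance is fixed at π²/3.

```r
# Simulated data; names and coefficient values are illustrative assumptions
set.seed(2)
n <- 2000
hours  <- rnorm(n)                          # continuous predictor (standardised)
female <- rbinom(n, 1, 0.5)                 # categorical 0/1 predictor
eta    <- -0.5 + 0.8 * hours + 0.4 * female # linear predictor on the log-odds scale
pass   <- rbinom(n, 1, plogis(eta))         # binary 0/1 outcome
d <- data.frame(pass, hours, female)

# Logistic regression: coefficients are log-odds, exp() turns them into odds ratios
fit <- glm(pass ~ hours + female, family = binomial, data = d)
odds_ratios <- exp(coef(fit))
print(odds_ratios)

# Latent-scale intraclass correlation for a random-intercept logistic model:
# level-2 variance divided by (level-2 variance + pi^2 / 3)
icc_latent <- function(var_u) var_u / (var_u + pi^2 / 3)
print(icc_latent(1))
```

An odds ratio above 1 for `hours` means that the odds of passing rise by that factor for each one-unit increase in `hours`, with `female` held constant.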

Lesson 6

• Odds and ends of logistic regression
The problem of omitted variables, interpretation of predictor effects in terms of probabilities instead of odds: average marginal effect and average partial effect, predicted probability for an “average” person vs. population-averaged prediction, generalized estimating equations (GEE) instead of the multilevel model

Lesson 7

• A model for survival data: Cox regression
Why survival data need a special model, the problem of censored data, hazard rate and survival probability, the Cox regression model, interpretation of the regression coefficients, the assumption of proportional hazards and how to check if it holds, using time-varying predictors, model for competing events.
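
A minimal sketch of a Cox regression with censored data, using the `survival` package on simulated data. The variable names, the censoring scheme and the true hazard ratio of 0.5 are assumptions made for this example, not course material.

```r
library(survival)  # recommended package, ships with standard R installations

# Simulated trial; all names and values are illustrative assumptions
set.seed(3)
n    <- 400
drug <- rbinom(n, 1, 0.5)                    # 1 = drug, 0 = placebo
# Exponential event times with a true hazard ratio of 0.5 for drug vs. placebo
event_time  <- rexp(n, rate = 0.10 * ifelse(drug == 1, 0.5, 1))
censor_time <- runif(n, 0, 30)               # end of follow-up differs per patient
d <- data.frame(
  drug   = drug,
  time   = pmin(event_time, censor_time),            # observed follow-up time
  status = as.integer(event_time <= censor_time)     # 1 = event, 0 = censored
)

# Cox proportional hazards model
fit <- coxph(Surv(time, status) ~ drug, data = d)
hazard_ratio <- exp(coef(fit))   # below 1: lower hazard in the drug group
print(hazard_ratio)

# Check of the proportional hazards assumption (based on Schoenfeld residuals)
ph_check <- cox.zph(fit)
print(ph_check)
```

A small p-value in the `cox.zph()` output would suggest that the effect of `drug` changes over time, which is evidence against proportional hazards.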

#### Lecturer

Ben Pelzer was an assistant professor at Radboud University, where he obtained his PhD in 2006 on a statistical model for repeated cross-sectional data. He taught basic and advanced statistics courses for many years. As a statistical consultant, he was involved in research in the fields of statistics, sociology, medicine, history, demography, communication science, education, political science and management, and he (co)authored publications in all these areas. He also contributed to statistical software packages in R.