Dr. Santiago Barreda's New Book on Bayesian Methods

Interviewing Dr. Barreda Along With Dr. Noah Silbert

by Nick Aoki, Cluster on Language Research
December 06, 2022

Introduction

Dr. Santiago Barreda is an Associate Professor of Linguistics at UC Davis and will be publishing a book in Spring 2023, along with Dr. Noah Silbert! The book is entitled “Bayesian Multilevel Models for Repeated Measures Data: A Practical and Conceptual Introduction in R”.

We interviewed Dr. Barreda to learn more about his research and about the content of his new book. The online version of the book can be viewed here: https://santiagobarreda.github.io/bmmrmd/.

Interview

1. What does your research focus on?

My research focuses on some of the most basic questions in linguistics: How do we map physical events (sounds) to internal, phenomenological qualities like vowel height and frontness, which define the linguistic/phonetic signal communicated between speakers? I am also interested in how people determine what kind of person they are listening to: the age, gender, dialect, and size of the speaker, and so on. I also like to develop and investigate the methods used by linguists. I think linguists have a tendency to apply methods as heuristics or to see them as ‘objective’, and that we need to worry more about the consequences of the methods we apply and about the ideologies that underlie many of our conventional approaches to linguistic research problems.

2. Your book “presents an introduction to the statistical analysis of repeated measures data using Bayesian multilevel regression models”. What are Bayesian multilevel regression models and what are repeated measures data? Are these methods only used by linguists or can they be employed by researchers in other fields?

Multilevel regression models (aka hierarchical models, mixed effects models) are regression models that include both ‘fixed’ and ‘random’ effects. A full description is more complicated than this, but we can say that ‘fixed’ effects are selected arbitrarily by the researcher, while random effects are randomly sampled from a population. For example, in an experiment testing which cookie is most delicious, I arbitrarily select some cookies to taste (fixed effect) but randomly sample a small set of humans from the population to taste them. Traditional regression models only include a single random variable (i.e., the error), meaning they can only handle a single random effect and can only estimate a single variance. Repeated measures data is data where you have multiple data points from each different ‘source’ of your data. This kind of data is ubiquitous in linguistics, but is also common in all kinds of fields. Repeated measures data naturally leads to ‘random effects’, which means that it often needs to be analyzed with multilevel models. As a result, knowledge of multilevel models is useful in almost any field. I will leave a discussion of the ‘Bayesian’ aspect for a bit later on.
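
As an editorial illustration (not part of Dr. Barreda's answer), the cookie example can be sketched in Python as a small simulation. Each taster rates the same cookie several times, so the ratings vary for two reasons: tasters differ from one another (the random effect) and each rating carries its own error. The function names, the 0-10 style rating scale, and all numeric values below are assumptions made for the sketch, and the simple method-of-moments estimate stands in for a full multilevel model.

```python
import random
import statistics


def simulate_ratings(n_tasters=200, n_repeats=20, taster_sd=2.0,
                     error_sd=1.0, grand_mean=5.0, seed=1):
    """Simulate repeated ratings of one cookie (all values are hypothetical).

    Each taster gets a personal baseline drawn from a population of tasters
    (the random effect); each rating adds independent error on top of it.
    """
    rng = random.Random(seed)
    data = {}
    for taster in range(n_tasters):
        baseline = rng.gauss(0.0, taster_sd)  # sampled once per taster
        data[taster] = [grand_mean + baseline + rng.gauss(0.0, error_sd)
                        for _ in range(n_repeats)]
    return data


def variance_components(data):
    """Estimate between-taster and within-taster SDs for balanced data."""
    ratings = list(data.values())
    n_repeats = len(ratings[0])
    # Within-taster variance: average spread of each taster's own ratings.
    within_var = statistics.fmean([statistics.variance(r) for r in ratings])
    # Taster means vary because of the baselines plus leftover error:
    # var(taster mean) = taster_var + within_var / n_repeats.
    taster_means = [statistics.fmean(r) for r in ratings]
    between_var = statistics.variance(taster_means) - within_var / n_repeats
    between_var = max(between_var, 0.0)  # guard against negative estimates
    return between_var ** 0.5, within_var ** 0.5


ratings = simulate_ratings()
between_sd, within_sd = variance_components(ratings)
print(f"between-taster SD: {between_sd:.2f}")
print(f"within-taster SD:  {within_sd:.2f}")
```

A regression with only a single error term would have to fold both sources of variation into one variance, which is the limitation described above.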

3. Many books have been published about statistical methods in the social sciences. What differentiates this book from others that have been published before?

Three things make this book unique as far as I can tell. First, the book is about Bayesian methods, and in particular deals with some advancements from the last few years that make these models very accessible to the average researcher. Second, the book is entirely focused on repeated measures data. Although linguists (and researchers in many other fields) deal with this sort of data almost exclusively, often this topic is either not discussed or left until the last 1-2 chapters of a book. I think the idea is that you should spend 1-2 years learning all about models you’ll basically never use before you can learn about the models you will actually need for your work. In this book I try to just skip to the models people need for their work. Third, the book is written at a (basically) introductory level. As far as I can tell, there are few introductory books about multilevel models, fewer for repeated measures data, and even fewer that talk about any of this from a Bayesian perspective.

4. Linguists frequently use mixed-effects models, but fit these using frequentist, instead of Bayesian, approaches. What makes Bayesian models different from frequentist models, and what prompted you to start employing Bayesian models in your own work?

The distinction between ‘Bayesian’ and ‘frequentist’ is a bit overblown, especially since some modern frequentist approaches (such as the lmer function from the lme4 R package) behave in a ‘half-way Bayesian’ manner. In the formal sense a Bayesian model means that all your parameters are treated as variables and given prior probability distributions. However, researchers often don’t care about that aspect of it too much and it often has very little effect on our models. The more useful part is that Bayesian models are usually estimated using very flexible sampling methods that allow researchers to easily do all sorts of things that were once very complicated. I compare it to learning to be a carpenter and building your own furniture, as opposed to going to IKEA and buying some pre-built furniture (as with ‘off-the-shelf’ frequentist models). For example, with Bayesian models you can easily fit complicated models without ‘convergence’ issues, use t-distributed errors to increase model robustness, and easily fit heteroscedastic (mixture) models and more esoteric models such as multinomial and ordinal regression models. All of this is done within a single framework that makes learning to fit a new kind of model very easy.
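
As an editorial illustration (not part of Dr. Barreda's answer), the two ideas above, giving a parameter a prior distribution and estimating it by sampling, can be sketched in a few lines of Python. The sketch uses a random-walk Metropolis sampler, chosen only because it is short; it is not presented as the method used in the book or by any particular package. The function names, the normal prior with mean 0 and standard deviation 10, the known error standard deviation of 1, and the simulated data are all assumptions made for the example.

```python
import math
import random
import statistics


def log_posterior(mu, data, prior_mean=0.0, prior_sd=10.0, error_sd=1.0):
    """Unnormalized log posterior of a mean `mu` with a normal prior."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_likelihood = sum(-0.5 * ((y - mu) / error_sd) ** 2 for y in data)
    return log_prior + log_likelihood


def metropolis(data, n_samples=5000, step_sd=0.3, start=0.0, seed=1):
    """Draw samples of `mu` with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    mu = start
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step_sd)
        proposed = log_posterior(proposal, data)
        # Always accept uphill moves; accept downhill moves with
        # probability equal to the posterior ratio.
        if rng.random() < math.exp(min(0.0, proposed - current)):
            mu, current = proposal, proposed
        samples.append(mu)
    return samples


# Hypothetical data: 50 observations scattered around a true mean of 3.
data_rng = random.Random(0)
observations = [data_rng.gauss(3.0, 1.0) for _ in range(50)]

draws = metropolis(observations)
kept = draws[1000:]  # discard early draws taken before the chain settles
print(f"posterior mean of mu: {statistics.fmean(kept):.2f}")
print(f"posterior SD of mu:   {statistics.stdev(kept):.2f}")
```

Because the result is a set of samples rather than a single estimate, swapping in a different prior or likelihood (for example t-distributed errors) only means changing `log_posterior`; the sampling loop stays the same.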

5. What advice do you have for students who are interested in learning more about Bayesian methods or about statistics in general?

Being good at statistics is not about being “smart” or being “good at math”; it is about practice and a sincere desire to learn and use statistics. I think statistical knowledge is very much like speaking a foreign language or playing the piano. You don’t take a piano class once or read a piano book once and then decide you’re “bad at the piano” because you can’t play a song on your first try, or 6 months after the class without ever practicing it. Despite this, students might take a statistics class, never look at the material again or practice, and then think they are “bad at statistics” because they can’t remember or use what they learned. You need to be willing to play a song badly many times before you can play it well. In the same way, if you want to get good at statistics you need to be prepared to not really understand what you’re doing, to make mistakes, and to practice using statistics over and over. After some time and with enough repetition, things will make sense, but it won’t be clear when or how it happened (much like learning a foreign language!).