Language Variation and Change

Research Article

Models, forests, and trees of York English: Was/were variation as a case study for statistical practice

Sali A. Tagliamonte (a1) and R. Harald Baayen (a2)

a1 University of Toronto

a2 University of Tübingen and University of Alberta

Abstract

What is the explanation for vigorous variation between was and were in plural existential constructions, and what is the optimal tool for analyzing it? Previous studies of this phenomenon have used the variable rule program, a generalized linear model; however, recent developments in statistics have introduced new tools, including mixed-effects models, random forests, and conditional inference trees, that may open additional possibilities for data exploration, analysis, and interpretation. In a step-by-step demonstration, we show how this well-known variable benefits from these complementary techniques. Mixed-effects models provide a principled way of assessing the importance of random-effect factors such as the individuals in the sample. Random forests provide information about the importance of predictors, whether factorial or continuous, and do so even for unbalanced designs with high multicollinearity, cases for which the family of linear models is less appropriate. Conditional inference trees straightforwardly visualize how multiple predictors operate in tandem. Taken together, the results confirm that polarity, distance from verb to plural element, and the nature of the DP are significant predictors. Ongoing linguistic change and social reallocation via morphologization are operational. Furthermore, the results make predictions that can be tested in future research. We conclude that variationist research can be substantially enriched by an expanded tool kit.

Footnotes

This paper grew out of a discussion at a workshop held at New Ways of Analyzing Variation 38 in Ottawa, Canada, in October 2009, titled “Using Statistical Tools to Explain Linguistic Variation” (Tagliamonte, 2009). The workshop brought together leading proponents of a range of different statistical tools and methods in order to exchange views. This final version of the paper benefited from the critical eye of three astute Language Variation and Change reviewers, as well as detailed comments from Alexandra D'Arcy. We thank everyone for his or her input.