Analyzing Linguistic Variation

John C. Paolillo. CSLI Publications, Stanford, CA (2002)


These web pages accompany the above book. They provide additional information about available software, data files for the analyses presented in the book, errata, and other useful information.

Site Contents

Here is a rough outline of what is contained in this site. As materials are added, the structure of this site may change.

  • Software links, including:
    • Versions of VARBRUL
    • R-Varb, an R source file supporting variationist analysis
    • GLMStat, a very easy-to-use program for running Generalized Linear Models.
  • Data Files
    • ds.tok, ds.cel: The Department Store study of (r) in NYC English (Labov 1972)
    • cs.cel: (r) spirantization in Panamanian Spanish (Cedergren and Sankoff 1972)
    • Gender bias in syntax textbooks (Macaulay and Brice 1997)
    • anttila.cel: Finnish genitive plural weakening (Anttila 1997)
  • Example Analyses
    • Logistic regression
    • Stepwise regression modeling
    • Handling Interactions


Analyzing Linguistic Variation: Statistical Models and Methods explains the methodological and statistical bases for the approach to linguistic analysis known as Variationism. This approach is most closely identified with William Labov, whose pioneering work of the 1960s and 1970s energized a generation of socially interested linguistic scholars. This approach has its greatest number of practitioners in the field of sociolinguistics, but it has attracted others from fields such as historical linguistics, language acquisition and dialectology as well.

The variationist approach makes extensive use of logistic regression as a tool for quantitative analysis. This represents the earliest and most thoroughgoing application of Generalized Linear Models (GLMs) to linguistic data. GLMs are coming to enjoy greater use in computational linguistics, so the experience and findings of variationist sociolinguistics are quite relevant to the forefront of linguistic research today.

The book Analyzing Linguistic Variation documents the statistical theory behind the variationist approach, and explains its relation to theory-building in linguistics more generally. The book draws on variationist data from actual research studies to illustrate the principles of constructing models using logistic regression and relating them to theoretical linguistic models. Sections of the book also explain the relation of logistic regression to more basic techniques such as chi-square tests, as well as to alternative GLM-type models.
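To give a rough sense of how logistic regression relates to a simple contingency-table test, here is a minimal sketch in plain Python. It is not taken from the book or from the software listed above. The cell counts are invented for illustration and do not come from any of the data files on this site, and the function names are hypothetical. The sketch fits a logistic regression with a single binary factor by Newton-Raphson and computes a Pearson chi-square statistic on the same counts.

```python
import math

# Hypothetical cell counts (invented for illustration, NOT from the book's data):
# each cell is (factor present? 0/1, rule applications, total tokens).
cells = [(0, 30, 100), (1, 60, 100)]

def fit_logistic(cells, iterations=25):
    """Fit logit(p) = b0 + b1*x to grouped binomial counts by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in cells:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = y - n * p            # observed minus expected applications
            w = n * p * (1.0 - p)    # binomial variance of the cell
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def pearson_chi_square(cells):
    """Pearson chi-square for the cells-by-(application, non-application) table."""
    total_y = sum(y for _, y, _ in cells)
    total_n = sum(n for _, _, n in cells)
    chi2 = 0.0
    for _, y, n in cells:
        for observed, col_total in ((y, total_y), (n - y, total_n - total_y)):
            expected = n * col_total / total_n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

b0, b1 = fit_logistic(cells)
# With one binary factor, the fitted coefficient is the log odds ratio of the table.
log_odds_ratio = math.log((60 / 40) / (30 / 70))
print("intercept (log-odds without the factor):", round(b0, 4))
print("factor effect (log-odds):", round(b1, 4))
print("log odds ratio from the raw table:", round(log_odds_ratio, 4))
print("Pearson chi-square:", round(pearson_chi_square(cells), 4))
```

With only one binary factor, the regression coefficient is just the log odds ratio of the 2x2 table, so the model and the chi-square test are looking at the same association. The regression framework becomes more useful once several factor groups are considered together.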

This website is not meant to substitute for the book, so it will be most useful to people who already have a copy of the book. Others are welcome to browse and otherwise make use of this material.

