Fixed-Effects Regression Modeling

A Practical Handbook of Corpus Linguistics

Abstract

This chapter presents fixed-effects regression modeling as a family of methods that describe a dependent variable in terms of one or more independent variables. The chapter focuses on multiple linear regression and on binomial logistic regression, discussing examples of regression analyses on the basis of corpus-linguistic data. The chapter offers descriptions of published studies that have used these methods. Besides explaining the fundamental notions and assumptions of different types of regression, the chapter also illustrates practical aspects of applying regression analyses through the use of artificially created data sets. In order to give readers a practical introduction to regression modeling, it is shown how manipulations of the underlying data result in different outcomes of the respective analyses. The chapter further discusses how the results of regression analyses can be usefully visualized. A selection of resources for further reading is included to offer readers a starting point for further study of regression modeling.

Notes

  1. See Gries (2013:28) for a general discussion of p-values.

  2. This is but one possible regression model for binary outcomes (Agresti and Kateri 2011). Furthermore, it might appear somewhat arbitrary that the response variable is the log odds ratio rather than a more “intuitive” measure (such as the arithmetic difference of probabilities). The reason is that this particular model can be analyzed (and computationally treated) within the framework of generalized linear models (Gelman and Hill 2006).

  3. If there is a natural ordering of the response variable (e.g., if we assume that excellent > perfect > great in terms of their valency), then ordinal regression might be a more informative tool (Harrell 2015).

  4. We acknowledge that the artificially created data set suffers from overdispersion, i.e. the variation is higher than statistically expected. We do not discuss this problem or its consequences here, as that would go beyond the intended scope of the chapter.
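
As a small illustration of the contrast drawn in note 2, the following sketch compares the log-odds scale with the arithmetic difference of probabilities. It is not taken from the chapter: the coefficient values and the helper names (logit, inv_logit, predicted_probability) are hypothetical and chosen only for illustration. Under these assumptions, a one-unit step in the predictor always shifts the log odds by the same amount, whereas the corresponding change in probability depends on where the step starts.

```python
import math


def logit(p: float) -> float:
    """Return the log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Map a value on the log-odds scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients of a binomial logistic regression with one
# numerical predictor x. These values are assumptions for illustration only.
intercept = -1.0
slope = 0.8


def predicted_probability(x: float) -> float:
    """Predicted probability of the outcome at predictor value x."""
    return inv_logit(intercept + slope * x)


# On the log-odds scale, each one-unit step in x adds the slope.
log_odds_step_low = logit(predicted_probability(1.0)) - logit(predicted_probability(0.0))
log_odds_step_high = logit(predicted_probability(4.0)) - logit(predicted_probability(3.0))

# On the probability scale, the same one-unit steps differ in size.
prob_step_low = predicted_probability(1.0) - predicted_probability(0.0)
prob_step_high = predicted_probability(4.0) - predicted_probability(3.0)

print(f"log-odds steps:    {log_odds_step_low:.3f} vs {log_odds_step_high:.3f}")
print(f"probability steps: {prob_step_low:.3f} vs {prob_step_high:.3f}")
```

The constant step on the log-odds scale is what allows the model to be written with a linear predictor, which is one way of reading the note's remark about generalized linear models.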

References

  • Agresti, A., & Kateri, M. (2011). Categorical data analysis. Berlin: Springer.

  • Bresnan, J., Cueni, A., Nikitina, T., & Baayen, R. H. (2007). Predicting the dative alternation. In G. Bouma, I. Kraemer, & J. Zwarts (Eds.), Cognitive foundations of interpretation (pp. 69–94). Amsterdam: Royal Netherlands Academy of Science.

  • Bybee, J., & Scheibman, J. (1999). The effect of usage on degrees of constituency: The reduction of don’t in English. Linguistics, 37(4), 575–596.

  • Diessel, H. (2009). Iconicity of sequence. A corpus-based analysis of the positioning of temporal adverbial clauses in English. Cognitive Linguistics, 19, 457–482.

  • Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.

  • Gries, S. T. (2013). Statistics for linguistics with R: A practical introduction (2nd ed.). Berlin: De Gruyter.

  • Harrell, F. (2015). Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis. Berlin: Springer.

  • Hilpert, M. (2008). The English comparative – Language structure and language use. English Language and Linguistics, 12(3), 395–417.

  • Hilpert, M. (2013). Constructional change in English: Developments in allomorphy, word formation, and syntax. Cambridge: Cambridge University Press.

  • Szmrecsanyi, B. (2005). Language users as creatures of habit: A corpus-linguistic analysis of persistence in spoken English. Corpus Linguistics and Linguistic Theory, 1(1), 113–150.

  • Tily, H., Gahl, S., Arnon, I., Kothari, A., Snider, N., & Bresnan, J. (2009). Syntactic probabilities affect pronunciation variation in spontaneous speech. Language and Cognition, 1(2), 147–165.

  • Wallis, S. A., Aarts, B., Ozon, G., & Kavalova, Y. (2006). The Diachronic Corpus of Present-Day Spoken English (DCPSE). London: Survey of English Usage.

  • Westin, I. (2002). Language change in English newspaper editorials. Amsterdam: Rodopi.

Author information

Corresponding author

Correspondence to Martin Hilpert.

Further Reading

Baayen, R.H. 2008. Analyzing Linguistic Data: A Practical Introduction to Statistics. Cambridge University Press, Cambridge.

Baayen’s introduction to statistics for linguistics is a cornerstone reference that not only covers the regression techniques that were discussed in this chapter, but also offers a more thorough discussion of nonlinear predictors, model criticism and validation, as well as regression with breakpoints.

Gries, S.T. 2013. Statistics for Linguistics with R: A Practical Introduction. Second edition. De Gruyter, Berlin.

Gries’s book includes several chapters on regression techniques that are illustrated with concrete data sets. The chapters gradually build up in complexity, presenting analyses with binary, categorical, and numerical predictors. The chapters further offer insights into ordinal, multinomial, and Poisson regression.

Levshina, N. 2015. How to do Linguistics with R: Data exploration and statistical analysis. John Benjamins, Amsterdam.

Levshina’s book offers a particularly learner-friendly discussion of regression techniques. The book presents these methods as ways of approaching linguistic questions, and it places emphasis on data visualization. The examples that are covered include an application of logistic regression to the choice between the Dutch causative constructions with doen and laten.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Hilpert, M., Blasi, D.E. (2020). Fixed-Effects Regression Modeling. In: Paquot, M., Gries, S.T. (eds) A Practical Handbook of Corpus Linguistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46216-1_21
