Abstract
This chapter presents statistical methods and issues commonly encountered in the design and analysis of substance abuse research studies. It begins with a general discussion contrasting primary and secondary data analysis, followed by an overview of study design from the perspective of primary research, including specification of hypotheses and planned analyses, sampling schemes, and power analysis. Next, study characteristics are described from the perspective of secondary analysis, with particular attention to those that must be considered when choosing appropriate analytic methods and interpreting results. The statistical methods reviewed include several types of regression (linear regression, logistic regression, and survival analysis); related topics such as moderators and mediators; multilevel models for longitudinal or clustered observations; and latent variable modeling techniques, including structural equation modeling, latent class analysis, latent transition analysis, and growth mixture modeling. Finally, the chapter gives overviews of four special topics that are particularly important when using secondary data: multiplicity of hypotheses, combining data and results from multiple studies, missing data, and propensity scores. Where helpful, concepts and methods are illustrated with practical examples.
Acknowledgements
The writing of this chapter was supported by the National Institute on Drug Abuse, Center for Advancing Longitudinal Drug Abuse Research (CALDAR, P30 DA016383, PI: Hser).
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
King, A., Li, L., Hser, YI. (2017). Common Statistical Methods for Primary and Secondary Analysis in Substance Abuse Research. In: VanGeest, J., Johnson, T., Alemagno, S. (eds) Research Methods in the Study of Substance Abuse. Springer, Cham. https://doi.org/10.1007/978-3-319-55980-3_5
DOI: https://doi.org/10.1007/978-3-319-55980-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55978-0
Online ISBN: 978-3-319-55980-3
eBook Packages: Social Sciences (R0)