Introduction to Social Network Analysis

  • Living reference work entry
Methods in Health Services Research

Part of the book series: Health Services Research (HEALTHSR)

Abstract

This chapter introduces statistical methods used in the analysis of social networks and in the rapidly evolving parallel field of network science. Although several applications of social network analysis in health services research have appeared recently, most involve only the most basic methods and thus only scratch the surface of what might be accomplished. Cutting-edge methods are presented using relevant examples and illustrations from health services research.

References

  • Airoldi EM, Fienberg SE, Xing EP. Mixed membership stochastic blockmodels. J Mach Learn Res. 2008;9:1981–2014.

    PubMed  PubMed Central  Google Scholar 

  • Anselin L. Spatial econometrics: methods and models. Dordrecht: Kluwer; 1988.

    Book  Google Scholar 

  • Barabasi A-L, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–12. http://www.sciencemag.org/content/286/5439/509.abstract

  • Barabasi A-L, Albert R, Jeong H. Mean-field theory for scale-free random networks. Phys A Stat Mech Appl. 1999;272:173–87. http://www.sciencedirect.com/science/article/pii/S0378437199002915 .

    Article  Google Scholar 

  • Barnett ML, Landon BE, O’Malley AJ, Keating NL, Christakis NA. Mapping physician networks with self-reported and administrative data. Health Serv Res. 2011;46:1592–609.

    Article  PubMed  PubMed Central  Google Scholar 

  • Barnett ML, Christakis NA, O’Malley AJ, Onnela J-P, Keating NL, Landon BE. Physician patient-sharing networks and the cost and intensity of care in US hospitals. Med Care. 2012a;50:152–60.

    Article  PubMed  PubMed Central  Google Scholar 

  • Barnett ML, Keating NL, Christakis NA, O’Malley AJ, Landon BE. Reasons for referral among primary care and specialist physicians. J Gen Intern Med. 2012b;27:506–12.

    Article  PubMed  Google Scholar 

  • Berkman L, Glass T. Social integration, social methods, social support, and health. In: Social epidemiology. New York: Oxford University Press; 2000. p. 137–73.

    Google Scholar 

  • Boguñá M, Pastor-Satorras R, Díaz-Guilera A, Arenas A. Models of social networks based on social distance attachment. Phys Rev E. 2004;70:056122. doi:10.1103/PhysRevE.70.056122.

    Article  CAS  Google Scholar 

  • Bonacich P. Power and centrality: a family of measures. Am J Sociol. 1987;92:1170–82.

    Article  Google Scholar 

  • Borgatti S, Everett M. Network analysis of 2-mode data. Soc Networks. 1997;19:243–69.

    Article  Google Scholar 

  • Breiger R. The duality of persons and groups. Soc Forces. 1974;53:181–90.

    Article  Google Scholar 

  • Cartwright D, Harrary F. A generalization of Heider’s theory. Psychol Rev. 1956;63:277–92.

    Article  CAS  PubMed  Google Scholar 

  • Centola D. Failure in complex social networks. Math Sociol. 2009;33:64–8.

    Article  Google Scholar 

  • Choi D, Wolfe P, Airoldi E. Stochastic blockmodels with growing number of classes. Arxiv preprint. 2010;arXiv:1011.4644.

    Google Scholar 

  • Christakis N, Fowler J. The spread of obesity in a large social network over 32 years. N Engl J Med. 2007;357:370–9.

    Article  CAS  PubMed  Google Scholar 

  • Christakis NA, Fowler JH. Social contagion theory: examining dynamic social networks and human behavior. Stat Med. 2013;32:556–77.

    Article  PubMed  Google Scholar 

  • Coleman J, Katz E, Menzel H. The diffusion of innovations among physicians. Sociometry. 1957;20:253–70.

    Article  Google Scholar 

  • Coleman J, Katz E, et al. Medical innovation: a diffusion study. Indianapolis: Bobbs-Merrill; 1966.

    Google Scholar 

  • Davidsen J, Ebel H, Bornholdt S. Emergence of a small world from local interactions: modeling acquaintance networks. Phys Rev Lett. 2002;88:128701. doi:10.1103/PhysRevLett.88.128701.

    Article  PubMed  CAS  Google Scholar 

  • Dorogovtsev SN, Mendes JFF, Samukhin AN. Structure of growing networks with preferential linking. Phys Rev Lett. 2000;85:4633–6. doi:10.1103/PhysRevLett.85.4633.

    Article  CAS  PubMed  Google Scholar 

  • Duijn MV, Snijders TAB, Zijlstra B. P2: a random effects model with covariates for directed graphs. Statistica Neerlandica. 2004;58:234–54.

    Article  Google Scholar 

  • Erdős P, Rényi A. Random graphs. Publ Math. 1959;6:290–7.

    Google Scholar 

  • Faust K. Centrality in affliation networks. Soc Networks. 1997;19:157–91.

    Article  Google Scholar 

  • Feller W. An introduction to probability theory and its applications, vol. 2. New York: Wiley; 1966.

    Google Scholar 

  • Festinger L. The analysis of sociograms using matrix algebra. Hum Relat. 1949;2:153–8.

    Article  Google Scholar 

  • Fineberg S, Wasserman S. Categorical data analysis of single sociometric relations. In: Sociological methodology. New Jersey: Jossey-Bass; 1981. p. 156–92.

    Google Scholar 

  • Fletcher JM. Social interactions and smoking: evidence using multiple student cohorts, instrumental variables, and school fixed effects. Health Econ. 2008;19:466–84.

    Article  Google Scholar 

  • Fletcher JM, Lehrer SF. The effect of adolescent health on educational outcomes: causal evidence using genetic lotteries between siblings. Canadian labor market and skills researcher network, working paper no. 32. 2009.

    Google Scholar 

  • Fortunato S. Community detection in graphs. Phys Reports. 2010;486:75–174.

    Article  Google Scholar 

  • Frank O, Strauss D. Markov graphs. J Am Stat Assoc. 1986;81:832–42.

    Article  Google Scholar 

  • Freeman L. Centrality in social networks, I. Conceptual clarification. Soc Networks. 1979;1:215–39.

    Article  Google Scholar 

  • Freeman L. The development of social network analysis: a study in the sociology of science. Vancouver: Empirical Press; 2004.

    Google Scholar 

  • Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabasi A-L. The human disease network. Proc Natl Acad Sci. 2007;104:8685–90. http://www.pnas.org/content/104/21/8685.abstract

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Goldenberg A, Zheng AX, Fineberg SE, Airoldi EM. A survey of statistical network models. Found Trends Mach Learn. 2009;2:129–233.

    Article  Google Scholar 

  • Goodreau S. Advances in exponential random graph (p*) models applied to a large social network. Soc Networks. 2007;29:231–48.

    Article  PubMed  PubMed Central  Google Scholar 

  • Granovetter MS. The strength of weak ties. Am J Sociol. 1973;78:1360–80.

    Article  Google Scholar 

  • Guimera R, Nunes Amaral LA. Functional cartography of complex metabolic networks. Nature. 2005;433:895–900.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Haines V, Hurlbert J. Network range and health. J Health Soc Behav. 1992;33:254–66.

    Article  CAS  PubMed  Google Scholar 

  • Handcock MS, Robins GL, Snijders TAB, Moody J, Besag J. Assessing degeneracy in statistical models of social networks. J Am Stat Assoc. 2003;76:33–50.

    Google Scholar 

  • Handcock M, Raftery A, Tantrum J. Model-based clustering for social networks. J Roy Stat Soc A. 2007;170:301–54.

    Article  Google Scholar 

  • Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M. ergm: A package to fit, simulate and diagnose exponential-family models for networks, http://CRAN.R-project.org/package=ergm. Version 2.2-6. 2010. Project home page at http://statnetproject.org

  • Hanneke S, Fu W, Xing EP. Discrete temporal models of social networks. Electron J Stat. 2010;4:585–605.

    Article  Google Scholar 

  • Harary F. On the notion of balance of a signed graph. Mich Math J. 1953;2:143–6.

    Article  Google Scholar 

  • Harary F. The number of linear, directed rooted and connected graphs. Trans Am Math Soc. 1955;78:445–63.

    Article  Google Scholar 

  • Heider F. Attitudes and cognitive orientation. J Psychol. 1946;21:107–12.

    Article  CAS  PubMed  Google Scholar 

  • Hidalgo CA, Blumm N, Barabasi A-L, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS Comput Biol. 2009;5:e1000353. doi:10.1371/journal.pcbi.1000353.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  • Hoff PD. Bilinear mixed effects models for dyadic data. J Am Stat Assoc. 2005;100:286–95.

    Article  CAS  Google Scholar 

  • Hoff P. Modeling homophily and stochastic equivalence in symmetric relational data. In: Advances in neural information processing systems, vol. 20. Cambridge, MA: MIT Press; 2008. p. 657–64.

    Google Scholar 

  • Hoff PD, Raftery AE, Handcock MS. Latent space models for social networks analysis. J Am Stat Assoc. 2002;97:1090–8.

    Article  Google Scholar 

  • Holland P, Leinhardt S. An exponential family of probability-distributions for directed-graph. J Am Stat Assoc. 1981;76:33–50.

    Article  Google Scholar 

  • Holland P, Laskey K, Leinhardt S. Stochastic blockmodels: some first steps. Soc Networks. 1983;5:109–37.

    Article  Google Scholar 

  • House J, Kahn R. Measures and concepts of social support. In: Social support and health. Orlando: Academic; 1985. p. 83–108.

    Google Scholar 

  • Huisman M, Van Duijn M. Software for statistical analysis of social networks. In: The Sixth International Conference on Logic and Methodology; Amsterdam: 2004.

    Google Scholar 

  • Huisman M, Van Duijn M. Software for social networks analysis. In: Models and methods in social network analysis. Cambridge: Cambridge University Press; 2005.

    Google Scholar 

  • Hunter D. Curved exponential family models for social networks. Soc Networks. 2007;29:216–30.

    Article  PubMed  PubMed Central  Google Scholar 

  • Hunter DR, Handcock MS. Inference in curved exponential family models for networks. J Comput Graph Stat. 2006;15:565–83.

    Article  Google Scholar 

  • Iwashyna TJ, Chang VW, Zhang JX, Christakis AN. Physician social networks and variation in prostate cancer treatment in three cities. Health Serv Res. 2002;37:1531–51.

    Article  PubMed  PubMed Central  Google Scholar 

  • Karrer B, Newman MEJ. Stochastic blockmodels and community structure in networks. Phys Rev E. 2011;83:016107. doi:10.1103/PhysRevE.83.016107.

    Article  CAS  Google Scholar 

  • Katz L. On the matrix analysis of Sociometric data. Sociometry. 1947;10:233–41.

    Article  Google Scholar 

  • Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;18:39–43.

    Article  Google Scholar 

  • Katz L, Powell JH. Measurement of the tendency toward reciprocation of choice. Sociometry. 1955;18:659–65.

    Article  Google Scholar 

  • Keating NL, Ayanian JZ, Cleary PD, et al. Factors affecting influential discussions among physicians: a social network analysis of a primary care practice. J Gen Intern Med. 2007;22:794–8.

    Article  PubMed  PubMed Central  Google Scholar 

  • Klovdahl A. Social networks and the spread of infectious diseases. Soc Sci Med. 1985;21:1203–16.

    Article  CAS  PubMed  Google Scholar 

  • Kossinets G, Watts DJ. Empirical analysis of an evolving social network. Science. 2006;311:88–90. http://www.sciencemag.org/content/311/5757/88.abstract

    Article  CAS  PubMed  Google Scholar 

  • Krapivsky PL, Redner S, Leyvraz F. Connectivity of growing random networks. Phys Rev Lett. 2000;85:4629–32. doi:10.1103/PhysRevLett.85.4629.

    Article  CAS  PubMed  Google Scholar 

  • Krivitsky PN. Exponential-family random graph models for valued networks. 2012. arXiv preprint, 1101.1359v2 [stat.ME] 19 Jan 2012.

    Google Scholar 

  • Krivitsky PN, Handcock MS. Fitting position latent cluster models for social networks with latentnet. J Stat Softw. 2008;24. http://statnetproject.org

  • Krivitsky PN, Handcock MS. A separable model for dynamic networks. 2010. arXiv preprint, 1011.1937v1[stat.ME].

    Google Scholar 

  • Kumpula JM, Onnela J-P, Saramäki J, Kaski K, Kertész J. Emergence of communities in weighted networks. Phys Rev Lett. 2007;99:228701. doi:10.1103/PhysRevLett.99.228701.

    Article  PubMed  CAS  Google Scholar 

  • Landon BE, Keating NL, Barnett ML, Onnela JP, Paul S, OˆaMalley AJ, Keegan T, Christakis NA. Variation in patient-sharing networks of physicians across the United States. JAMA. 2012;308:265–73.

    CAS  PubMed  PubMed Central  Google Scholar 

  • Laumann E, Marsden P, Prensky D. The boundary specification problem in network analysis. In: Burt R, Minor M, editors. Applied network analysis: a methodological introduction. Beverly Hills: Sage; 1983. p. 18–34.

    Google Scholar 

  • Lorrain F, White H. Structural equivalence of individuals in social networks. J Math Sociol. 1971;1:49–80.

    Article  Google Scholar 

  • Lyons R. The spread of evidence-poor medicine via flawed social-network analyses. Stat Polit Policy. 2011;2:1–26.

    Google Scholar 

  • Manski CA. Identification of endogenous social effects: the reflection problem. Rev Econ Stud. 1993;60:531–42.

    Article  Google Scholar 

  • Marsden P. Network methods in social epidemiology. In: Methods in social epidemiology. New York: Jossey-Bass; 2006. p. 267–86.

    Google Scholar 

  • Marsden PV, Friedkin NE. Network studies of social influence. Sociol Methods Res. 1993;22:127–51.

    Article  Google Scholar 

  • Marsili M, Vega-Redondo F, Slanina F. The rise and fall of a networked society: a formal model. Proc Natl Acad Sci USA. 2004;101:1439–42.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • McPherson ML, Smith-Lovin C, et al. Birds of a feather: homophily in social networks. Annu Rev Sociol. 2001;27:415–44.

    Article  Google Scholar 

  • Moreno JL. Who shall survive? Nervous and mental disease processing. The University of Michigan, Ann Arbor; 1934.

    Google Scholar 

  • Mucha PJ, Richardson T, Macon K, Porter MA, Onnela J-P. Community structure in time-dependent, multiscale, and multiplex networks. Science. 2010;328:876–8. http://www.sciencemag.org/content/328/5980/876.abstract

    Google Scholar 

  • Newcomb TM. An approach to the study of communicative acts. Psychol Rev. 1953;60:393–404.

    Article  CAS  PubMed  Google Scholar 

  • Newman ME. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev. 2001;64:016132.

    CAS  Google Scholar 

  • Newman MEJ. Modularity and community structure in networks. Proc Natl Acad Sci. 2006;103:8577–82.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  • Newman M. Networks: an introduction. New York: Oxford University Press; 2010.

    Book  Google Scholar 

  • Newman MEJ. Communities, modules and large-scale structure in networks. Nat Phys. 2012;8:25–31.

    Article  CAS  Google Scholar 

  • Newman MEJ, Girvan M. Mixing patterns and community structure in networks. In: Pastor-Satorras R, Rubi J, Diaz-Guilera A, editors. Statistical mechanics of complex networks. Berlin: Springer; 2003.

    Google Scholar 

  • Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69:026113. doi:10.1103/PhysRevE.69.026113.

    Article  CAS  Google Scholar 

  • Nowicki K, Snijders TAB. Estimation and prediction for stochastic blockstructures. J Am Stat Assoc. 2001;96:1077–87.

    Article  Google Scholar 

  • O’Malley AJ. The analysis of social network data: an exciting frontier for statisticians. Stat Med. 2013;32:539–55.

    Article  PubMed  Google Scholar 

  • O’Malley AJ, Christakis NA. Longitudinal analysis of large social networks: estimating the effect of health traits on changes in friendship ties. Stat Med. 2011;30:950–64.

    Article  PubMed  PubMed Central  Google Scholar 

  • O’Malley AJ, Marsden PV. The analysis of social networks. Health Serv Outcome Res Methodol. 2008;8:222–69.

    Article  Google Scholar 

  • O’Malley AJ, Arbesman S, Steiger DM, Fowler JH, Christakis NA. Egocentric social network structure, health, and pro-social behaviors in a National Panel Study of Americans. PLoS One. 2012;7:e36250. doi:10.1371/journal.pone.0036250.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  • Opsahl T. Triadic closure in two-mode networks: redefining the global and local clustering coefficients. Soc Networks. 2011; 34. doi:10.1016/j.socnet.2011.07.001.

  • Opsahl T, Agneessens F, Skvoretz J. Node centrality in weighted networks: generalizing degree and shortest paths. Soc Networks. 2010;32:245–51.

    Article  Google Scholar 

  • Palla G, Derenyi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005;435:814–8. doi:10.1038/nature03607.

    Article  CAS  PubMed  Google Scholar 

  • Paul S, O’Malley AJ. Hierarchical longitudinal models of relationships in social networks. J R Stat Soc Ser C Appl Stat. 2013;62:705–22.

    PubMed  PubMed Central  Google Scholar 

  • Pham HH, O’Malley AS, Bach PB, Saiontz-Martinez C, Schrag D. Primary care physicians’ links to other physicians through Medicare patients: the scope of care coordination. Ann Intern Med. 2009;150:236–42.

    Article  PubMed  PubMed Central  Google Scholar 

  • Piraveenan M, Prokopenko M, Zomaya AY. Assortative mixing in directed biological networks. IEEE Trans Comput Biol Bioinform. 2010;9:66–78. To appear.

    Google Scholar 

  • Pollack CE, Weissman G, Bekelman J, Liao K, Armstrong K. Physician social networks and variation in prostate cancer treatment in three cities. Health Serv Res. 2012;47:380–403.

    Article  PubMed  PubMed Central  Google Scholar 

  • Porter MA, Onnela J-P, Mucha PJ. Communities in networks. Not Am Math Soc. 2009;56(1082–1097):1164–6.

    Google Scholar 

  • Price DDS. A general theory of bibliometric and other cumulative advantage processes. J Am Soc Inf Sci. 1976;27:292–306. doi:10.1002/asi.4630270505.

    Article  Google Scholar 

  • Robins G, Pattison P, Woolcock J. Small and other worlds: global network structures from local processes. Am J Sociol. 2005;110:894–936.

    Article  Google Scholar 

  • Robins GL, Snijders TAB, Wang P, Handcock MS, Pattison PE. Recent developments in exponential random graph (p ) models for social networks. Soc Networks. 2007;29:192–215.

    Article  Google Scholar 

  • Robins GL, Pattison PE, Wang P. Closure, connectivity and degree distributions: exponential random graph (p*) models for directed social networks. Soc Networks. 2009;31:105–7.

    Article  Google Scholar 

  • Rubin D. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978;6:34–58.

    Article  Google Scholar 

  • Seidman SB. Network structure and minimum degree. Soc Networks. 1983;5:269–87.

    Article  Google Scholar 

  • Shalizi RR, Rinaldo A. Consistency under sampling of exponential random graph models. 2012. arXiv preprint. arXiv:1111.3054v3

    Google Scholar 

  • Shalizi CR, Thomas AC. Homophily and contagion are generically confounded in observational social network studies. Sociol Methods Res. 2011;40:211–39.

    Article  PubMed  PubMed Central  Google Scholar 

  • Simmel G. The sociology of Georg Simmel. New York: The Free Press; 1908.

    Google Scholar 

  • Snijders T. The degree variance: an index of graph heterogeneity. Soc Networks. 1981;3:163–74.

    Article  Google Scholar 

  • Snijders T. Stochastic actor-oriented models for network change. J Math Sociol. 1996;21:149–72.

    Article  Google Scholar 

  • Snijders TAB. The statistical evaluation of social network dynamics. In: Sociological methodology. Oxford, UK: Basil Blackwell; 2001. p. 361–95.

    Google Scholar 

  • Snijders TAB. Models for longitudinal social network data. In: Models and methods in social network analysis. Cambridge: Cambridge University Press; 2005. p. 215–47.

    Chapter  Google Scholar 

  • Snijders TAB. Statistical methods for network dynamics. In: Luchini SR et al., editors. Proceedings of the XLIII Scientific Meeting, Italian Statistical Society, Basil Blackwell, Ltd; 2006. p. 281–96

    Google Scholar 

  • de Solla Price DJ. Networks of scientific papers. Science. 1965;149:510–5. http://www.sciencemag.org/content/149/3683/510.short .

    Article  Google Scholar 

  • Steglich C, Snijders TAB, Pearson M. Dynamic networks and behavior: separating selection from influence. Sociol Methodol. 2010;40:329–93.

    Article  Google Scholar 

  • Szabo G, Barabasi AL. Network effects in service usage. 2007. Arxiv preprint. http://lanl.arxiv.org/abs/physics/0611177

  • Thompson S. Adaptive web sampling. Biometrics. 2006;62:1224–34.

    Article  PubMed  Google Scholar 

  • Thompson S, Frank O. Mode-based estimation with link-tracing sampling designs. Survey Methodol. 2000;26:87–98.

    Google Scholar 

  • Thompson S, Seber GAF. Adaptive sampling. New York: Wiley; 1996.

    Google Scholar 

  • Toivonen R, Onnela J-P, Saramäki J, Hyvönen J, Kaski K. A model for social networks. Phys A Stat Mech Appl. 2006;371:851–60. http://www.sciencedirect.com/science/article/pii/S0378437106003931

    Article  Google Scholar 

  • Traud AL, Mucha PJ, Porter MA. Social structure of Facebook networks. Phys A Stat Mech Appl. 2012;391:4165–80. http://www.sciencedirect.com/science/article/pii/S0378437111009186

    Article  Google Scholar 

  • VanderWeele TJ. Sensitivity analysis for contagion effects in social networks. Sociol Methods Res. 2011;40:240–55.

    Article  PubMed  PubMed Central  Google Scholar 

  • VanderWeele TJ, Ogburn EL, Tchetgen Tchetgen EJ. Why and when “Flawed” social network analyses still yield valid tests of no contagion. Stat Polit Policy. 2012;3:1050. doi:10.1515/2151-7509.1050.

    Google Scholar 

  • Vázquez A. Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations. Phys Rev E. 2003;67:056104. doi:10.1103/PhysRevE.67.056104.

    Article  CAS  Google Scholar 

  • Wang W, Wong G. Stochastic Blockmodels for directed graphs. J Am Stat Assoc. 1987;82:8–19.

    Article  Google Scholar 

  • Wang P, Sharpe K, Robins GL, Pattison PE. Exponential random graph (p*) models for affiliation networks. Soc Networks. 2009;31:12–25.

    Article  Google Scholar 

  • Wasserman SS, Faust K. Social network analysis: methods and applications. Cambridge: Cambridge University Press; 1994.

    Book  Google Scholar 

  • Wasserman S, Pattison P. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p. Psychometrika. 1996;61:401–25.

    Article  Google Scholar 

  • Westveld AH, Hoff PD. A mixed effect model for longitudinal relational and network data, with applications to international trade and conflict. Ann Appl Stat. 2011;5:843–72.

    Article  Google Scholar 

  • White D, Harary F. The cohesiveness of blocks in social networks: node connectivity and conditional density. Sociol Methodol. 2001;31:305–59.

    Article  Google Scholar 

  • Wong LH, Pattison P, Robins G. A spatial model for social networks. Phys A Stat Mech Appl. 2006;360:99–120. http://www.sciencedirect.com/science/article/pii/S0378437105004334

    Article  Google Scholar 

  • Zijlstra BJH, Duijn MV, Snijders TAB. The multilevel P2 model: a random effects model for the analysis of multiple social networks. Methodology. 2006;2:42–7.

    Article  Google Scholar 

Download references

Acknowledgments

The time and effort of Dr. O’Malley and Dr. Onnela in researching and developing this chapter were supported by NIH/NIA grant P01 AG031093 and Robert Wood Johnson Award #58729. The authors thank Mischa Haider, Brian Neelon, and Bruce E. Landon for reviewing an early draft of the manuscript and providing several useful comments and suggestions.

Author information

Corresponding author

Correspondence to A. James O’Malley.

Glossary of Terms

To help readers familiar with social networks understand the network science component of the chapter, and conversely to help readers familiar with network science understand the social network component, the following glossary provides a comprehensive list of terms and definitions.

Terms Used in Social Networks

  1. Social network: A collection of individuals or other entities (referred to as actors) and the (social) relationships or ties linking them.

  2. Relationship, Tie: A link or connection between two actors.

  3. Dyad: A pair of actors in a network and the relationship(s) between them. There are two relationships per measure in a directed network and one relationship per measure in an undirected network.

  4. Triad: A set of three actors in the network and the relationships between them.

  5. Scale or valued relationship: A nonbinary relationship between two actors (e.g., the level of a trait). The chapter focuses on binary relationships.

  6. Directed network: A network in which the relationship from actor i to actor j need not be the same as that from actor j to actor i.

  7. Nondirected network: A network in which the state of the relationship from actor i to actor j equals the state of the relationship from actor j to actor i.

  8. Sociocentric network data: The complete set of observations on the n(n − 1) relationships in a directed network, or n(n − 1)/2 relationships in an undirected network, with n actors.

  9. Collaboration network: A network whose ties represent the actors’ joint involvement in a task (e.g., work on a paper) or a common experience (e.g., treating a patient during the same episode of health care).

  10. Bipartite: A network with two types of actors in which relationships are permitted only between actors of different types.

  11. Unipartite: A network in which relationships are permitted between all types of actors.

  12. Social contagion, Social influence, Peer effects: Terms used to describe the phenomenon whereby an actor’s trait changes due to their relationship with other actors and the traits of those actors.

  13. Mutable trait: A characteristic of an actor that can change state.

  14. Social selection: The phenomenon whereby the relationship status between two actors depends on their characteristics, as occurs with homophily and heterophily.

  15. Homophily: A preference for relationships with actors who have similar characteristics. Popularly referred to as “birds of a feather flock together.”

  16. Heterophily: A preference for relationships with actors who have different characteristics. Popularly referred to as “opposites attracting.”

  17. In-degree, Popularity: The number of actors who initiated a tie with the given actor.

  18. Out-degree, Expansiveness, Activity: The number of ties the given actor initiates with other actors.

  19. k-star: A subnetwork in which the focal actor has ties to k other actors.

  20. k-cycle: A subnetwork of k actors in which each actor has degree 2 and which can be arranged as a ring (i.e., a k-path through the actors returns to its origin without backtracking). For example, the ties A-B, B-C, and C-A form a three-cycle.

  21. k degrees of separation: Two individuals linked by a k-path (k − 1 intermediary actors) that are not connected by any path of length k − 1 or less.

  22. Density: The overall tendency of ties to form in the network. A descriptive measure is given by the number of ties in the network divided by the total number of possible ties.

  23. Reciprocity: The phenomenon whereby an actor i is more likely to have a tie with actor j if actor j has a tie with actor i. Only defined for directed networks.

  24. Clustering: The tendency of ties to cluster and form densely connected regions of the network.

  25. Closure: The tendency for network configurations to be closed (e.g., for the ties A-B and B-C to be completed by the tie A-C).

  26. Transitivity: The tendency for a tie from individual A to individual B to form if ties from individual A to individual C and from individual C to individual B exist. A form of triadic closure commonly stated as “a friend of a friend is a friend.” Reduces to general triadic closure in an undirected network.

  27. Centrality: A dimensionless measure of an actor’s position in the network. Higher values indicate more central positions. There are numerous measures of centrality; four common ones are degree, closeness, betweenness, and eigenvalue (also known as eigenvector) centrality. Degree and eigenvalue centrality are extremes in that degree centrality is determined solely from an actor’s degree (it is internally focused), whereas eigenvalue centrality is based on the centrality of the actors connected to the focal actor (it is externally focused).

  28. Structural balance: A theory which suggests actors seek balance in their relationships; for example, if A likes B and B likes C then A will endeavor to like C as well to keep the system balanced. Thus, the existence of transitivity is implied by structural balance.

  29. Structural equivalence: The network configuration (arrangement of ties) around one actor is similar to that of another actor. Even though actors may not be connected, they can still be in structurally similar situations.

  30. Structural power: The advantage held by an actor in a dominant position in the network. Such an actor may occupy a strategic position, such as being the only bridge between otherwise distinct components.

  31. Network component: A subset of actors who are connected to one another by paths and have no ties to actors outside the subset.

  32. Graph theory: The mathematical basis under which theoretical results for networks are derived and empirical computations are performed.

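  33. Digraph: A directed graph, that is, a graph whose edges have a direction, so that the edges between two nodes may run in one or both directions. Unlike social networks, digraphs can contain self-ties. Graphs lie in two-dimensional space.

  34. Hypergraph: A generalization of a graph to dimension three or higher, in which a single edge can join three or more nodes.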

  35. Maximal subset: A set of actors among whom all possible ties are present in a binary network (i.e., the set has density 1.0). If the set contains k actors, the maximal subset is referred to as a k-clique.

  36. Scalar, vector, matrix: Terms from linear and abstract algebra. A scalar is a 1 × 1 matrix, a vector is a k × 1 matrix, and a matrix is k × p, where k, p > 1.

  37. Adjacency matrix: A matrix whose off-diagonal elements contain the value of the relationship from one actor to another. For example, element ij contains the relationship from actor i to actor j. The diagonal elements are zero by definition.

  38. Matrix transpose: The operation whereby element ij is exchanged with element ji for all i, j.

  39. Row stochastic matrix: A matrix with nonnegative elements whose rows each sum to 1. Thus, each row represents a probability distribution of a discrete-valued random variable.

  40. Random variable: A variable whose value is not known with certainty. It can relate to an event or time period that is yet to occur, or it can be a quantity whose value is fixed (i.e., has occurred) but is unknown.

  41. Parametric: A term used in statistics to describe a model with a specific functional form (e.g., linear, quadratic, logarithmic, exponential) indexed by unknown parameters or an estimation procedure that relies on specification of the complete distribution of the data.

  42. Nonparametric: A model or estimation procedure that makes no assumption about the specific form of the relationship between key variables (e.g., whether the predictors have linear or additive effects on the outcome) and does not rely upon complete specification of the distribution of the data for estimation.

  43. Outcome, Dependent variable: The variable considered causally dependent on other variables of interest. This will typically be a variable whose value is believed to be caused by other variables.

  44. Independent, Predictor, Explanatory variable, Covariate: A variable believed to be a cause of the outcome.

  45. 45.

    Contextual variable: A variable evaluated on the neighbors of, or other members of a set containing, the focal actor. For example, the proportion of females in a neighboring county, the proportion of friends with college degrees.

  46. 46.

    Interaction effect: The extent to which the effect of one variable on the outcome varies across the levels of another variable.

  47. 47.

    Endogenous variable: A variable (or an effect) that is internal to a system.

    Predictors in a regression model that are correlated with the unobserved error are endogenous; they are determined by an internal as opposed to an external process. By definition, outcome variables are endogenous.

  48. 48.

    Exogenous variable: A variable (or an effect) that is external to the system in that its value is not determined by other variables in the system. Predictors that are independent of the error term in a regression model are exogenous.

  49. 49.

    Instrumental variable (IV): A variable with a non-null effect on the endogenous predictor whose causal effect is of interest (the “treatment”) that has no effect on the outcome other than that through its effect on treatment. Often-used sufficient conditions for the latter are that the IV is (i) marginally independent of any unmeasured confounders and (ii) conditionally independent of the outcome given the treatment and any unmeasured confounders. In an IV analysis, a set of observed predictors may be conditioned on as long as they are not effects of the treatment and the IV assumptions hold conditional on them. While subject to controversy, IV methods are among the few methods of estimating the true (causal) effect of an endogenous predictor on an outcome.

  50. 50.

    Linear regression model: A model in which the outcome (or dependent variable) equals a linear combination of one or more predictors (or explanatory variables), that is, an additive sum of the predictors multiplied by their regression coefficients, plus an unobserved random error. The expected value of the outcome conditional on the predictors is therefore the linear combination itself.

  51. 51.

    Longitudinal model: A model that describes variation in the outcome variable over time as a function of the predictors, which may include prior (i.e., lagged) values of the outcome. Observations are typically only available at specific, but not necessarily equally spaced, times. Longitudinal models make the direction of causality explicit. Therefore, they can distinguish between the association between the predictors and the outcome and the effect of a change in the predictor on the change in the outcome.

  52. 52.

    Cross-sectional model: A model of the relationship between the values of the predictors and outcomes at a given time. Because one cannot discern the direction of causality, cross-sectional models are more difficult to defend as causal.

  53. 53.

    Stochastic block model: A conditional dyadic independence model in which the density and reciprocity effects differ between blocks defined by attributes of the actors comprising the network. For example, blocks for gender accommodate different levels of connectedness and reciprocity for men and women.

  54. 54.

    Logistic regression: A member of the exponential family of models that is specific to binary outcomes. It utilizes a link function that maps expected values of the outcome onto an unrestricted scale to ensure that all predictions from the model are well-defined.

  55. 55.

    Multinomial distribution: A generalization of the binomial distribution to three or more categories. The sum of the probabilities of each category equals 1.

  56. 56.

    Exponential random graph model: A model in which the state of the entire network is the dependent variable. Provides a flexible approach to accounting for various forms of dependence in the network. Not amenable to causal modeling.

  57. 57.

    Degeneracy: An estimation problem encountered with exponential random graph models in which the fitted model might reproduce observed features of the network on average but each actual draw bears no resemblance to the observed network. Often degenerate draws are empty or complete graphs.

  58. 58.

    Latent distance model: A model in which the statuses of the dyads are independent conditional on the positions of the actors, and thus the distance between them, in a latent social space.

  59. 59.

    Latent eigenmodel: A model in which the statuses of the dyads are independent conditional on the product of the (weighted) latent positions of the actors in the dyad.

  60. 60.

    Latent variable: An unobserved random variable. Random effects and pure error terms are latent variables.

  61. 61.

    Latent class: An unobserved categorical random variable. Actors with the same value of the variable are considered to be in the same latent class.

  62. 62.

    Factor analysis: A statistical technique used to decompose the correlation (or covariance) matrix of a set of random variables into groups of related items.

  63. 63.

    Generalized estimating equation (GEE): A statistical method that corrects estimation errors for dependent observations without necessarily modeling the form of the dependence or specifying the full distribution of the data.

  64. 64.

    Random effect: A parameter for the effect of a unit (or cluster) that is drawn from a specified probability distribution. Treating the unit effects as random draws from a common probability distribution allows information to be pooled across units for the estimation of each unit-specific parameter.

  65. 65.

    Fixed effect: A parameter in a model that reflects the effect of an actor belonging to a given unit (or cluster). By virtue of modeling the unit effects as unrelated parameters, no information is shared between units and so estimates are based only on information within the unit.

  66. 66.

    Ordinary least squares: A commonly used method for estimating the parameters of a regression model. The objective is to minimize the sum of squared distances between the fitted values of the model and the observed values of the dependent variable.

  67. 67.

    Maximum likelihood: A method of estimating the parameters of a statistical model that typically embodies parametric assumptions. The procedure is to seek the values of the parameters that maximize the likelihood function of the data.

  68. 68.

    Likelihood function: An expression that quantifies the total information in the data as a function of model parameters.

  69. 69.

    Markov chain Monte Carlo: A numerical procedure used to fit Bayesian statistical models.

  70. 70.

    Steady state: The steady-state distribution of a Markov chain describes the long-run proportion of time the random variable being modeled is in each state. Often Markov chains iterate through a transient phase in which the current state of the chain depends less and less on the initial state of the chain. The steady-state phase occurs when successive samples have the same distribution (i.e., there is no dependence on the initial state).

  71. 71.

    Collinearity: The correlation between two predictors after conditioning on the other observed predictors (if any). When predictors are collinear, distinguishing their effects is difficult, and the statistical properties of the estimated effects are more sensitive to the validity of the model.

  72. 72.

    Normal distribution: Another name for the Gaussian distribution. Has a bell-shaped probability density function.

  73. 73.

    Covariance matrix: A matrix in which the ijth element contains the covariance of items i and j.

  74. 74.

    Absolute or Geodesic distance: The total distance along the edges of the network from one actor to another.

  75. 75.

    Cartesian distance: The distance between two points on a two-dimensional surface or grid. Adheres to the Pythagorean theorem.

  76. 76.

    Count data: Observations made on a variable with the whole numbers (0, 1, 2, …) as its state space.

  77. 77.

    Statistical inference: The process of establishing the level of certainty of knowledge about unknown parameters (or hypotheses) from data subject to random variation, such as when observations are measured imperfectly with no systematic bias or when a sample from a population of interest is used to estimate population parameters.

  78. 78.

    Null model: The null model of a network statistic typically represents what would be expected if the feature of interest were nonexistent (effect equal to 0) or outside the range of interest.

  79. 79.

    Permutation test: A statistical test of a null hypothesis against an alternative implemented by randomly reshuffling the labels (i.e., the subscripts) of the observations. The significance level of the test is evaluated by resampling the observed data 50–100 times and computing the proportion of times that the test is rejected.
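
As a rough illustration of the permutation test defined above, the following Python sketch compares the mean of an outcome between two groups of actors. The data, the function names (`mean_difference`, `permutation_test`), and the choice of test statistic are hypothetical and are not taken from the chapter. The sketch reports the proportion of reshuffled data sets whose statistic is at least as extreme as the observed one, using the common add-one adjustment. This is one conventional way of evaluating significance, and the number of reshuffles is left as a parameter.

```python
import random


def mean_difference(values, labels):
    """Difference in mean outcome between group 1 and group 0."""
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)


def permutation_test(values, labels, n_permutations=100, seed=0):
    """Two-sided permutation test obtained by reshuffling the group labels.

    Returns the observed statistic and an approximate p-value.
    """
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    observed = mean_difference(values, labels)
    shuffled = list(labels)        # copy, so the caller's labels are untouched
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)      # break any link between labels and outcomes
        if abs(mean_difference(values, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one adjustment (an assumption of this sketch): counts the observed
    # labeling as one of the permutations, so the p-value is never exactly 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical outcomes for eight actors, four in each group.
values = [5.1, 4.8, 6.0, 5.5, 3.9, 4.1, 3.5, 4.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p_value = permutation_test(values, labels)
print(f"observed difference = {observed:.3f}, p-value = {p_value:.3f}")
```

For network data the same idea applies, but the reshuffling is usually applied to actor labels so that the rows and columns of the adjacency matrix are permuted together; that variant is not shown here.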

Terms Used in Network Science

  1. 1.

    Network science: The approach developed from 1995 onwards, mostly within statistical physics and applied mathematics, to study networked systems across many domains (e.g., physical, biological, and social). Usually focuses on very large systems; hence, theoretical results derived in the thermodynamic limit are good approximations to real-world systems.

  2. 2.

    Thermodynamic limit: In statistical physics refers to the limit obtained for any quantity of interest as system size N tends to infinity. Many analytical results within network science are derived in this limit due to analytical tractability.

  3. 3.

    Statistical physics: The branch of physics dealing with many-body systems where the particles in the system obey a fixed set of rules, such as Newtonian mechanics, quantum mechanics, or any other rule set. As the number of bodies (particles) in a system grows, it becomes increasingly difficult (and less informative) to write down the equations of motion, a set of differential equations that govern the motion of the particles over time, for the system. However, one can describe these systems probabilistically. The word “statistical” is somewhat misleading as there is no statistics in the sense of statistical inference involved; instead everything proceeds from a set of axioms, suggesting that “probabilistic” might be a better term. Statistical physics, also called statistical mechanics, gives a microscopic explanation of the phenomena that thermodynamics explains phenomenologically.

  4. 4.

    Generative model: Most network models within network science belong to this category. Here one specifies the microscopic rules governing, for example, the attachment of new nodes to the existing network structure in models of network growth.

  5. 5.

    Cumulative advantage: A stylized modeling mechanism introduced by Price in 1976 to capture phenomena where “success breeds success.” Price applied the model to citation patterns, where the number of citations follows a power-law or power-law-like distribution that the model successfully reproduces.

  6. 6.

    Polya urn model: A stylized sampling model in probability theory where the composition of the system, the contents of the urn, changes as a consequence of each draw from the urn.

  7. 7.

    Power law: Refers to the specific functional form \( P(x) \sim x^{-\alpha} \) of the distribution of quantity x. Also called Pareto distribution. See scale-free network.

  8. 8.

    Preferential attachment: A stylized modeling mechanism introduced by Barabasi and Albert in 1999 where the probability that a new node attaches itself to an existing node i of degree \( k_i \) is an increasing function of \( k_i \); in the case of linear preferential attachment, this probability is directly proportional to \( k_i \). In short, the higher the degree of a node, the higher the rate at which it acquires new connections (increases its degree).

  9. 9.

    Weak ties hypothesis: A hypothesis developed by sociologist Mark Granovetter in his extremely influential 1973 paper “The strength of weak ties.” The hypothesis, in short, states the following: The stronger the tie connecting persons A and B, the higher the fraction of friends they have in common.

  10. 10.

    Modularity: Modularity is a quality function used in network community detection, where its value is maximized (in principle) over the set of all possible partitions of the network nodes into communities. Standard modularity reads as \( Q={\left(2 m\right)}^{-1}{\sum}_{i, j}\left({A}_{i j}-\frac{k_i{k}_j}{2 m}\right)\delta \left({c}_i,{c}_j\right) \) where \( c_i \) is the community assignment of node i and δ is the Kronecker delta; other quantities as defined in the text.

  11. 11.

    Rate equations: Rate equations, commonly used to model chemical reactions, are similar to master equations but instead of modeling the count of objects (e.g., number of nodes) in a collection of discrete states (e.g., the number of k-degree nodes \( N_k(t) \) for different values of k), they are used to model the evolution of continuous variables, such as average degree, over time.

  12. 12.

    Master equations: Widely used in statistical physics, these differential equations model how the state of the system changes from one time point to the next. For example, if \( N_k(t) \) denotes the number of nodes of degree k, given the model, one can write down the equation for \( N_k(t+1) \), i.e., the number of k-degree nodes at time t + 1.

  13. 13.

    Fitness or affinity or attractiveness: A node attribute introduced to incorporate heterogeneity in the node population in a growing network model. For example, in a model based on preferential attachment, this could represent the inherent ability of a node to attract new edges, a mechanism that is superimposed on standard preferential attachment.

  14. 14.

    Community: A group of nodes in a network that are, in some sense, densely connected to other nodes in the community but sparsely connected to nodes outside the community.

  15. 15.

    Community detection: The set of methods and techniques developed fairly recently for finding communities in a given network (graph). The number of communities is usually not specified a priori but, instead, needs to be determined from data.

  16. 16.

    Critical point: The value of a control parameter in a statistical mechanical system where the system exhibits critical behavior: previously localized phenomena now become correlated throughout the system which at this point behaves as one single entity.

  17. 17.

    Phase diagram: A diagram displaying the phase (liquid, gas, etc.) of the system as one or more thermodynamic control parameters (temperature, pressure, etc.) are varied.

  18. 18.

    Phase transition: Thermodynamic properties of a system are continuous functions of the thermodynamic parameters within a phase; phase transitions (e.g., liquid to gas) happen between phases where thermodynamic functions are discontinuous.

  19. 19.

    Network diameter: The longest of the shortest pairwise paths in the network, computed for each dyad (node pair).

  20. 20.

    Hysteresis: The behavior of a system depends not only on its current state but also on its previous state or states.

  21. 21.

    Quality function: Typically a real-valued function with a high-dimensional domain that specifies the “goodness” of, say, a given network partitioning. For example, given the community assignments of N nodes, which can be seen as a point in an N-dimensional hypercube, the standard modularity quality function returns a number indicating how good the given partitioning is.

  22. 22.

    Dynamic process: Any process that unfolds on a network over time according to a set of prespecified rules, such as epidemic processes, percolation, diffusion, synchronization, etc.

  23. 23.

    Slice: In the context of multislice community detection, refers to one graph in a collection of many within the same system, where a slice can capture the structure of a network at a given time (time-dependent slice), at a particular resolution level (multiscale slice), or can encode the structure of a network for one tie type when many are present (multiplex slice).

  24. 24.

    Scale-free network: Network with a power-law (Pareto) degree distribution.

  25. 25.

    Erdős-Rényi model: Also known as Poisson random graph (after the fact that the degree distribution in the model follows a Poisson distribution), Bernoulli random graph (after the fact that each edge corresponds to an outcome of a Bernoulli process), or the random graph (as the progenitor of all random graphs). Starting with a fixed set of N nodes, one considers each node pair in turn independently of the other node pairs and connects the nodes with probability p. Erdős and Rényi first published the model in 1959, although Solomonoff and Rapoport published a similar model earlier in 1951.

  26. 26.

    Watts-Strogatz model: A now canonical model by Watts and Strogatz that was introduced in 1998. Starting from a regular lattice structure characterized by high clustering and long paths, the model shows how randomly rewiring only a small fraction of edges (or, alternatively, adding a small number of randomly placed edges) leads to a small world characterized by high clustering and short paths. The model is conceptually appealing and shows how to interpolate, using just one parameter, from a regular lattice structure in one extreme to an Erdős-Rényi graph in the other.

  27. 27.

    Mean-field approximation: Sometimes called the zero-order approximation, this approximation replaces the value of a random variable by its average, thus ignoring any fluctuations (deviations) from the average that may actually occur. This approach is commonly used in statistical physics.

  28. 28.

    Ensemble: A collection of objects, such as networks, that have been generated with the same set of rules, where each object in the ensemble has a certain probability associated with it. For example, one could consider the ensemble of networks that consists of six nodes and two edges, each being equiprobable.
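
The six-node, two-edge ensemble mentioned above is small enough to enumerate exhaustively, which may help make the notion concrete. The Python sketch below assumes labeled nodes and simple undirected graphs (no self-loops or repeated edges); the function names and the example question (how often the two edges share a node) are illustrative choices, not part of the chapter.

```python
from fractions import Fraction
from itertools import combinations


def enumerate_ensemble(n_nodes, n_edges):
    """All simple undirected graphs on n_nodes labeled nodes with n_edges edges.

    Each graph is represented as a frozenset of node pairs (i, j) with i < j.
    """
    possible_edges = list(combinations(range(n_nodes), 2))
    return [frozenset(g) for g in combinations(possible_edges, n_edges)]


def ensemble_probability(graphs, predicate):
    """Probability that predicate holds when every graph is equally likely."""
    hits = sum(1 for g in graphs if predicate(g))
    return Fraction(hits, len(graphs))


def edges_share_node(graph):
    """True if the two edges of a two-edge graph have a node in common."""
    (a, b), (c, d) = sorted(graph)
    return len({a, b, c, d}) < 4


graphs = enumerate_ensemble(6, 2)
prob_each = Fraction(1, len(graphs))   # equiprobable members of the ensemble
p_adjacent = ensemble_probability(graphs, edges_share_node)
print(len(graphs), prob_each, p_adjacent)
```

With 15 possible node pairs there are 105 ways to place two edges, so each graph in this ensemble has probability 1/105. Ensemble averages of any network statistic can be computed in the same way for small systems, whereas larger ensembles are typically sampled rather than enumerated.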

Copyright information

© 2017 Springer Science+Business Media LLC

About this entry

Cite this entry

O’Malley, A.J., Onnela, JP. (2017). Introduction to Social Network Analysis. In: Sobolev, B., Gatsonis, C. (eds) Methods in Health Services Research. Health Services Research. Springer, Boston, MA. https://doi.org/10.1007/978-1-4939-6704-9_15-1

  • DOI: https://doi.org/10.1007/978-1-4939-6704-9_15-1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4939-6704-9

  • Online ISBN: 978-1-4939-6704-9

  • eBook Packages: Springer Reference MedicineReference Module Medicine
