Examples of network data in political science are ubiquitous, and include records of legislative co-sponsorship, alliances between countries, social relationships, and judicial citations.Footnote 1 Numerical estimates of the influence of each node (e.g. legislator, country, citizen, opinion), defined in terms of its propensity to form a relationship with another node, are often of interest to an analyst in each of these examples. In this chapter we present a new approach to solving a common problem in the social sciences—that of estimating the influence of vertices in a network. Our approach assumes that observed levels of influence relate to an underlying latent “quality” of the vertices.Footnote 2 Although common methods for measuring influence in networks assume that each vertex has the potential to influence every other vertex, many networks reflect temporal, spatial, or other practical constraints that make this assumption implausible. We present a scoring method that is appropriate for measuring influence in networks where (1) some vertices cannot form an edge with certain vertices for reasons that are unrelated to their underlying “quality” and (2) each vertex may be influenced by a different number of other vertices, so that some edges reveal different amounts of information about the latent “quality” of the influencing vertices.

As an example, we rate the “quality” of Supreme Court decisions, which we define as the likelihood that the decision will be cited in a future decision. These decisions are readily analyzed by our method due to their connectedness—the Supreme Court’s explicit usage of previous decisions as precedent for current and future decisions generates a network structure. The network data enable us to assess some instances when a given decision “succeeded” (i.e., was cited in a later opinion) or “failed” (i.e., was not cited in a later opinion). However, because later decisions cannot be cited by earlier opinions, the data do not allow us to observe whether a given opinion would have been cited by an earlier opinion. Our network structure is necessarily incomplete.

The method we describe and employ in this chapter is intended to deal explicitly with this problem of incompleteness. The method, developed and explored in more detail by Schnakenberg and Penn (2012), is founded on a simple (axiomatic) theoretical model that identifies each opinion’s latent quality in an (unobserved) world in which every object has the potential to succeed or fail. The theoretical model identifies the relative quality of the objects under consideration by presuming that the observed successes are generated in accordance with the independence of irrelevant alternatives (IIA) choice axiom as described by Luce (1959). In a nutshell, the power of this axiom for our purposes is the ability to generate scores for alternatives that are not directly compared in the data. Substantively, these scores locate all opinions on a common scale.

1 Inferring Quality from Network Data

We conceive of our data as a network in this chapter. Accordingly we first lay out some preliminaries and then discuss how one applies the method to general network data. We represent the observed network data by a graph denoted by G=(V,E), where V={1,2,…,n} is a set of n vertices and E is a set of directed edges, where for any v,w∈V, (v,w)∈E indicates that there is an edge from v to w.Footnote 3 We define a community to be a subset of vertices, C⊆V, with a community structure \(\mathcal{C}=(C_{1},\ldots, C_{n})\) being a set of subsets of V, and \(C_{i}\) being the community of vertex i.

Underlying our model is an assumption that each vertex j in a community \(C_{i}\) has the potential to influence vertex i. To define this formally, let \(\tilde{E}\) be a set of potential interactions, with \(E\subseteq \tilde{E}\). If (i,j)∈E then we know that i and j interacted with j influencing i, and so it is known that they had the potential to interact: it is known that \(j\in C_{i}\). On the other hand, of course, \((i, k)\not\in E\) need not imply that i could not have been connected to k. Rather, it may be the case that i could have been connected to k, but the link was not created for some reason (possibly because k was not of high enough quality to influence i, possibly because k and i never had an opportunity to interact, or for some other independent factor(s)). Our community structure is designed to accommodate this fact, and in particular we assume that \(k\in C_{i}\) implies that \((i, k)\in \tilde{E}\). Thus, k being in community \(C_{i}\) implies that k had the potential to influence i (i.e., i had the opportunity to link to k), regardless of whether k may or may not have succeeded (i.e., regardless of whether an edge between i and k is observed).
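The relationship between observed edges, communities, and potential interactions can be sketched in a few lines of Python. This is only an illustration on hypothetical toy data: the names `E`, `C`, `potential_edges`, and `unrealized` are ours, and the sketch assumes (more strongly than the text requires) that an interaction is potential exactly when k belongs to the community of i.

```python
# Toy directed graph (hypothetical data): an edge (i, j) means "j influenced i".
E = {(3, 1), (4, 1), (4, 3)}

# Hypothetical community structure: C[i] holds the vertices that could have influenced i.
C = {1: set(), 2: {1}, 3: {1, 2}, 4: {1, 2, 3}}


def potential_edges(communities):
    """Return E-tilde, assuming (i, k) is a potential interaction exactly when k is in C_i."""
    return {(i, k) for i, members in communities.items() for k in members}


E_tilde = potential_edges(C)

# Every observed edge must also be a potential edge (E is a subset of E-tilde).
assert E <= E_tilde

# Potential interactions that did not materialize: k could have influenced i but did not.
unrealized = E_tilde - E
print(sorted(unrealized))
```

The pairs in `unrealized` are the ones that carry information about low quality; pairs outside `E_tilde` carry none, because the interaction was never possible.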

The second assumption of our model is that each vertex can be placed on a common scale representing the vertex’s quality. We assume that vertices with higher latent qualities are more likely to have had successful (i.e., influential) interactions with vertices that they had the potential to interact with. Thus, the higher the latent quality of vertex i, the more likely that, for any given vertex j∈V, \((j, i)\in \tilde{E}\) implies that (j,i)∈E.

Our goal is to estimate each vertex’s “latent quality” score subject to a network G and an observed or estimated community structure, \(\mathcal{C}\). We conceive of our network and community structure as generating a collection of “contests” in which some vertices were influential, some had the potential to be influential but were not, and others had no potential to influence. These contests are represented by the set \(\mathcal{S}=\{s\in V: (s, v)\in E\mbox{ for some }v\in V\}\). Thus, every vertex that was influenced represents the outcome of a contest.

Let \(x=(x_{1},\ldots,x_{n})\in \mathbf{R}^{n}\) represent each vertex’s latent quality. Then for each \(i\in \mathcal{S}\) we let the expected influence of vertex k in contest i (i.e., probability of i connecting to k), which we denote by E(i,k), equal 0 if \((i, k)\not\in \tilde{E}\). Thus, k’s expected influence in contest i is zero because in this case we assume that \(k\not\in C_{i}\), and thus k had no potential to influence i (i.e., there is no chance that i will connect to k). Otherwise,

$$ E(i, k)={\frac{{x_k}}{{\sum_{j\in C_i}x_j }}}. $$

In words, the expected share of influence of k in a contest in which k has the potential to influence i is k’s share of latent influence relative to the total latent influence of the vertices that can potentially influence i.
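As a minimal sketch of the expected-influence formula, the function below computes E(i,k) from a dictionary of latent qualities and a dictionary of communities. The function name, argument names, and the toy numbers are illustrative assumptions, not part of the original method's notation.

```python
def expected_influence(i, k, x, communities):
    """E(i, k): k's share of the latent quality available in contest i."""
    members = communities[i]
    if k not in members:
        return 0.0  # k had no potential to influence i
    return x[k] / sum(x[j] for j in members)


# Hypothetical latent qualities and a single contest (vertex 4) with community {1, 2, 3}.
x = {1: 2.0, 2: 1.0, 3: 1.0}
communities = {4: {1, 2, 3}}

shares = {k: expected_influence(4, k, x, communities) for k in (1, 2, 3, 4)}
print(shares)
```

Within a contest the expected shares of the community members sum to one, and any vertex outside the community (here vertex 4 itself) receives zero.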

Similarly, we can calculate the share of actual influence of k in i, or A(i,k), by looking at the total set of vertices that actually influenced i in the network described by G. This set is \(W_{i}=\{w:(i,w)\in E\}\subseteq C_{i}\), and (without any additional information such as edge weights), k’s share is \(\frac{1}{{|W_{i}|}}\) if \(k\in W_{i}\) and 0 otherwise. We can now utilize our network and community structure to estimate x subject to an unbiasedness constraint that is conditional on the community structure. The constraint is that

$$ \sum_{s\in \mathcal{S}}E(s, i)=\sum_{s\in \mathcal{S}}A(s, i)\quad\mbox{ for all }i, $$

or that each vertex’s total actual score equals their total expected score. Satisfaction of this constraint implies, given a correct community structure, that no vertex is estimated to be more or less influential than it actually was. Schnakenberg and Penn (2012) prove that, subject to a minimal connectedness condition, there exists a vector \(x^{*}=(x^{*}_{1},\ldots, x^{*}_{n})\) that solves the above system of equations and that is unique up to scalar multiplication.Footnote 4 Viewed substantively, this vector represents the relative qualities/influences of the different nodes. In particular, as x is uniquely identified up to scalar multiplication, the ratio of any two nodes’ qualities,

$$ \rho^i_j \equiv \frac{x_i}{x_j}, $$

is uniquely identified. This ratio \(\rho^{i}_{j}\) represents the hypothetical relative frequency of selection/influence by node i versus that by node j in a future contest in which both nodes i and j compete (i.e., for any future node that both i and j have the ability to exert influence on).
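One way to find a vector satisfying the unbiasedness constraint numerically is a simple fixed-point iteration: rearranging the constraint gives x_k = (total actual share of k) / (sum over contests containing k of 1 / total quality in that contest). The sketch below applies that update repeatedly. It is an assumption on our part that this iteration is a suitable solver; it is not claimed to be the procedure used by Schnakenberg and Penn (2012). The function names, the toy data, and the choice to normalize scores to an average of 1 are likewise illustrative. Vertices that were never linked to are dropped, since they cannot be scored.

```python
def actual_totals(edges):
    """Sum of A(s, k) over contests s: each contest splits one point among the vertices it links to."""
    winners = {}
    for s, w in edges:
        winners.setdefault(s, set()).add(w)
    totals = {}
    for ws in winners.values():
        for w in ws:
            totals[w] = totals.get(w, 0.0) + 1.0 / len(ws)
    return totals


def solve_scores(edges, communities, iterations=500):
    """Iterate x_k <- actual_k / sum_s (1 / sum_{j in C_s} x_j), rescaling to mean 1 each round."""
    actual = actual_totals(edges)
    scored = sorted(actual)                    # only vertices with at least one incoming link
    contests = sorted({s for s, _ in edges})   # vertices that were influenced by someone
    x = {k: 1.0 for k in scored}
    for _ in range(iterations):
        denom = {s: sum(x[j] for j in communities[s] if j in x) for s in contests}
        updated = {}
        for k in scored:
            exposure = sum(1.0 / denom[s] for s in contests if k in communities[s])
            updated[k] = actual[k] / exposure
        mean = sum(updated.values()) / len(updated)
        x = {k: v / mean for k, v in updated.items()}
    return x


# Hypothetical toy network: (i, j) means "i links to j", i.e. j influenced i.
edges = {(3, 1), (4, 1), (4, 3), (5, 2), (5, 3)}
communities = {3: {1, 2}, 4: {1, 2, 3}, 5: {1, 2, 3}}

x = solve_scores(edges, communities)
rho_1_2 = x[1] / x[2]   # relative quality of vertex 1 versus vertex 2
print(x, rho_1_2)
```

For this toy network the constraint can be solved by hand: the scores stand in the proportion 3 : 1 : 4 for vertices 1, 2, and 3, so the ratio of vertex 1 to vertex 2 is 3. Only such ratios are meaningful, since the solution is unique only up to scalar multiplication.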

2 Measuring the Quality of Precedent

The use of judicial precedent by Supreme Court Justices—and, in particular, a focus on citations as an indication of this usage—has attracted sustained attention from legal and political science scholars for over 60 years.Footnote 5 Unsurprisingly, given the breadth of the topic, scholars have adopted various approaches to the study of precedent, but most have focused on the determinants of citation: in a nutshell, what factor or factors of an opinion augur revisitation of the opinion in future opinions?

Because our model imputes unobserved relationships between objects, it is particularly well-suited to analyzing networks in which certain links are impossible to observe. These types of networks could, for example, arise in situations in which vertices are indexed by time and a later vertex is incapable of influencing a vertex that preceded it.

We utilize a data set consisting of the collection of citations by United States Supreme Court majority opinions to Supreme Court majority opinions from 1791 to 2002. Thus, viewed in the theoretical framework presented above in Sect. 1, the vertices of our network are Supreme Court majority opinions, and if majority opinion i cites majority opinion j, we include the edge (i,j)∈E.

Before moving on, it is important to note what we are explicitly abstracting from in our operationalization of the judicial citation/precedent network. Most importantly, we omit consideration of all opinions other than the majority opinion. Both dissenting and concurring opinions are relevant for understanding both the bargaining processes at work in constructing the majority opinion and inferring the role and quality of precedent (e.g., Carrubba et al. (2011)).Footnote 6 In addition, our approach ignores the citing opinion’s treatment of the cited opinion (e.g., favorable, critical, or distinguishing).Footnote 7 , Footnote 8 We leave each of these for future work.

Differentiating Cases: Community Structure

As discussed earlier, the method we employ allows us to compare/score objects that have not been directly compared. Accordingly, it offers an analyst the freedom to “break up” the data in the sense of estimating (or, perhaps, observing) communities of objects that are less likely to be directly compared with one another. For the purposes of this chapter, we take into account only the temporal bias discussed earlier—later opinions cannot be cited by earlier opinions—and presume that each opinion is eligible (i.e., “in competition”) for citation by every subsequently rendered opinion.Footnote 9

Thus we construct the community \(C_{i}\) for a given opinion i as follows. Letting Year(i) be the year in which opinion i was heard, we assume that for any pair of vertices (i.e., majority opinions), i, j,

$$ \mbox{Year}(i)> \mbox{Year}(j)\quad \Leftrightarrow\quad j\in C_i. $$

In words, an opinion can be influenced by any and only opinions that strictly predate it.
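The year-based rule translates directly into code. The following sketch builds the communities from a mapping of opinions to years; the opinion labels and years are hypothetical, and the quadratic-time construction is chosen for clarity rather than efficiency.

```python
def communities_by_year(year):
    """C_i = all opinions j with Year(j) strictly earlier than Year(i)."""
    return {i: {j for j in year if year[j] < year[i]} for i in year}


# Hypothetical opinions and the years attached to them.
year = {"A": 1950, "B": 1950, "C": 1960, "D": 1970}
C = communities_by_year(year)
print(C)
```

Note that two opinions from the same year (here "A" and "B") are excluded from each other's communities, because the inequality is strict.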

Data

We apply our method to Fowler and Jeon’s Supreme Court majority opinion citation data (Fowler et al. 2007; Fowler and Jeon 2008). There are a number of ways one might approach these data when considering the question of the quality or influence of each opinion. The most straightforward approach would rank all of the opinions that have been cited at least once (any opinion that is not cited by any other opinion in the database cannot be ranked). In this approach, every opinion is a contest, and each opinion that is cited at least once is a contestant.

Practical constraints prohibit us from ranking all of the opinions. Fortunately, our approach implies that we can examine any subset of the data and recover relative rankings that are (in theory) identical to the rankings that would be estimated from the entire data set. Accordingly, we restrict our attention to the 100 most frequently cited opinions between 1946 and 2002. In graph theoretic terms, we examine the smallest subgraph containing all edges beginning or ending (or both) with an opinion whose in-degree (number of times cited) ranks among the top 100 among the opinions rendered between 1946 and 2002. This graph contains many more than 100 opinions (3674, to be exact). After these opinions, and their incident edges, are selected, they are then used for our community detection algorithm, which we now describe.
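The subgraph selection can be sketched as follows. The function name `top_cited_subgraph` and the toy edge list are hypothetical; the sketch ranks by in-degree over whatever edges it is given, so restricting to opinions rendered in a particular period (as in the text) would have to happen before the call. Ties at the cutoff are broken arbitrarily here, which is an assumption the text does not address.

```python
from collections import Counter


def top_cited_subgraph(edges, k):
    """Keep every edge that begins or ends at one of the k vertices with the highest in-degree."""
    in_degree = Counter(cited for _, cited in edges)
    top = {v for v, _ in in_degree.most_common(k)}
    kept_edges = {(i, j) for i, j in edges if i in top or j in top}
    kept_vertices = {v for edge in kept_edges for v in edge}
    return top, kept_vertices, kept_edges


# Hypothetical citation edges: (i, j) means opinion i cites opinion j.
edges = {(4, 1), (5, 1), (6, 1), (5, 2), (6, 2), (6, 3), (7, 6)}
top, kept_vertices, kept_edges = top_cited_subgraph(edges, k=2)
print(top, kept_vertices)
```

As in the chapter's data, the resulting subgraph contains more vertices than the k selected ones, because every opinion that cites (or is cited by) a top opinion comes along with its edge.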

Using the years of the opinions to create the communities as described earlier, we then solve for the influence scores of the opinions (i.e., contestants) as follows. First, we choose the contestants in turn and, for each majority opinion (i.e., contest) that was subsequent to that opinion and cited at least one member of the contestant’s community, we count the contestant as having been a participant (i.e., available for citation) in that majority opinion/contest. If the contestant was cited in (i.e., won) that contest, the contestant is awarded 1/|W| points, where W is the set of opinions (contestants) cited in that majority opinion (contest). Otherwise, the contestant is awarded 0 points in that contest. With this vector of scores for each contestant in each contest, it is then possible to directly apply the method developed by Schnakenberg and Penn (2012) to generate the latent influence scores of each majority opinion, \(\hat{x}=(\hat{x}_{1},\ldots,\hat{x}_{n})\).
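The bookkeeping in this step (who participated in which contest, and how many points each contestant earned) can be sketched as below. This is a simplified reading in which a contestant participates in a contest whenever it belongs to the citing opinion's community; the function name `tally` and the toy data are hypothetical.

```python
def tally(edges, communities):
    """Count contests entered and points won (1/|W| per citation) for each contestant."""
    cited_by = {}
    for citing, cited in edges:
        cited_by.setdefault(citing, set()).add(cited)

    entered, points = {}, {}
    for contest, winners in cited_by.items():
        for k in communities[contest]:
            entered[k] = entered.get(k, 0) + 1          # k was available for citation
            if k in winners:
                points[k] = points.get(k, 0.0) + 1.0 / len(winners)
    return entered, points


# Hypothetical data: (i, j) means opinion i cites opinion j.
edges = {(3, 1), (4, 1), (4, 3), (5, 2), (5, 3)}
communities = {3: {1, 2}, 4: {1, 2, 3}, 5: {1, 2, 3}}

entered, points = tally(edges, communities)
print(entered, points)
```

The total number of points awarded equals the number of contests, since each citing opinion distributes exactly one point among the opinions it cites.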

These latent influence scores represent, in essence, the appeal of each majority opinion as a potential citation in any subsequent majority opinion. What this appeal represents in substantive terms is not unambiguous, of course. It might proxy for the degree to which the opinion is easily understood, the degree to which its conclusions are broadly applicable,Footnote 10 or perhaps the likelihood that the policy implications of the opinion support policies that are supported by a majority of justices in a typical opinion. Obviously, further study is necessary before offering a conclusion on the micro-level foundations of these scores. Such research will require inclusion of observed and estimated covariates distinguishing the various opinions and majority opinions.

3 Results

We now present the results of three related analyses. We first present our results for the 100 most-cited opinions rendered between 1946 and 2002.Footnote 11 Following that, we present the results for the 100 most-cited opinions since 1800.Footnote 12 Finally, we consider the 204 most-cited opinions since 1800 with an eye toward comparing the ranking of the 100 most-cited opinions since 1946 with the ranking of those cases when all opinions that have been cited at least as many times as these 100 are considered.

3.1 Top 100 Opinions Since 1946

Table 2 presents the opinions with the top 36 estimated latent quality scores for this period. This is the set of opinions for which the estimated quality score is greater than 1, which is by construction the average estimated quality score for the 100 cases.

This ranking is interesting in a number of ways. The top two majority opinions score significantly higher than all of the others.Footnote 13 The top-scoring opinion, Chevron, is a well-known case in administrative law with broad implications for the judicial review of bureaucratic decision-making. The second-ranked opinion, Gregg, clarified the constitutionality of the death penalty in the United States. Of course, the third highest scoring opinion is the famous Miranda decision in which the Court clarified the procedural rights of detained individuals.

Space prevents us from a full-throated treatment of the scores, but a few simple correlations are of interest. Table 1 presents three Pearson correlation coefficients relating the opinions’ scores with, respectively, the age of the opinion, the number of subsequent opinions citing the opinion, and the number of subsequent opinions citing the opinion divided by the age of the opinion.
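For readers who wish to reproduce this kind of descriptive check on their own scores, a minimal sketch of the three Pearson correlations follows. The records are hypothetical placeholders, not values from the chapter's tables, and the `pearson` helper is written out by hand so that the example has no dependencies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (sd_x * sd_y)


# Hypothetical records: (score, age in years, number of citing opinions).
records = [(2.1, 18, 90), (1.4, 26, 80), (0.9, 40, 85), (0.6, 55, 70)]
scores = [r[0] for r in records]
ages = [r[1] for r in records]
cites = [r[2] for r in records]
rates = [c / a for c, a in zip(cites, ages)]   # citations per year of age

correlations = {
    "age": pearson(scores, ages),
    "citations": pearson(scores, cites),
    "citations_per_year": pearson(scores, rates),
}
print(correlations)
```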

Table 1 Descriptive correlations with scores. Sample: Top 100 most-cited cases since 1946

The negative correlation between the age of an opinion and its score is broadly in line with previous work on the depreciation of the precedential value (or, at least, usage) of judicial opinions.Footnote 14 It is important to note, however, that this effect is potentially at odds with the IIA axiom on which the scoring algorithm is based. We partially return to this question below when we expand the sample of opinions.

That the correlation between the opinions’ scores and the number of times each opinion has been cited by a subsequent Supreme Court majority opinion is positive is not surprising: the score of an opinion is obviously positively responsive to the number of times that an opinion has been cited, ceteris paribus. Accordingly, the interesting aspect of the correlation is not that it is positive but, rather, that it is not closer to 1. Indeed, inspection of Table 2 indicates, a fortiori, that the rankings of the opinions with respect to the number of citations they have received and with respect to their scores are not identical. Put another way: the scores are measuring something different than the opinions’ citation counts or, as it is commonly known in network analysis, the degree centralities of the opinions in the citation network.

Table 2 The 36 highest scoring opinions. Sample: Top 100 most-cited cases since 1946

Finally, the correlation between the score and the average number of times per year the opinion has been cited since it was handed down is strongly positive. This highlights the fact that the scores control for the fact that an opinion cannot cite an opinion that is rendered subsequently. Again, though, it is important to note that the ranking of the opinions generated by our scores differs from that generated by the number of citations per year. It is useful to consider the origins of this difference. Specifically, the distinction arises because of the fact that the IIA axiom on which the method is based implies that an opinion’s “reward” (or score) for being cited by a subsequent opinion is inversely proportional to the number of other opinions cited by that opinion. At the extreme, for example, a hypothetical opinion that cited every previous opinion would compress the scores of the opinions in the sense that the scores of all opinions that initially had lower than average scores would increase as a result of the citation by the hypothetical opinion, whereas the scores of all of those opinions with above average scores prior to the hypothetical opinion would decrease.Footnote 15
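The compression effect can be seen in a short calculation. Suppose, as an illustrative assumption, that the hypothetical opinion h cites all n scored opinions, that all n belong to its community, and that scores are normalized to average 1 (so that \(\sum_{j}x_{j}=n\)). Evaluated at the scores prevailing before h is added, the actual and expected shares of opinion k in contest h are

$$ A(h,k)=\frac{1}{n},\qquad E(h,k)=\frac{x_k}{\sum_{j}x_j}=\frac{x_k}{n},\qquad A(h,k)-E(h,k)=\frac{1-x_k}{n}. $$

The difference is positive exactly when \(x_{k}<1\) and negative exactly when \(x_{k}>1\). Restoring the unbiasedness constraint therefore requires raising the scores of below-average opinions and lowering those of above-average opinions.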

3.2 Top 100 Opinions Since 1800

We now present our results for the top 100 most-cited opinions rendered between 1800 and 2002. Table 3 presents the opinions with the top 38 estimated latent quality scores for this period. As with the previous analysis for the period between 1946 and 2002, this is the set of opinions for which the estimated quality score is greater than 1.

Table 3 The 38 most influential cases among the top 100 most-cited cases since 1800

Comparing these scores with those in Table 2, it is perhaps surprising how similar the two sets of scores are. In particular, the top three majority opinions are identical and have very similar scores in the two analyses. Things get interesting at the fourth highest-scoring position. First, the majority opinion ranked fourth-highest in the 1946–2002 analysis reported in Table 2, Cannon v. University of Chicago, is not among the top 100 most-cited majority opinions since 1819.Footnote 16 The fourth highest-scoring opinion among the 100 most-cited majority opinions since 1819 is Miller v. California, in which the Court affirmed and clarified the power of state and local governments to place limits on obscenity. This opinion is, of course, among the top 100 most-cited rendered since 1946, yet ranks only 19th in the scores reported in Table 2. This point highlights a feature of the scores in both tables: after the top 3 or 4, there is a relatively large “plateau” of scores.

Beyond visual inspection, it is useful to reconsider the correlations analogous to those reported in Table 1. These are displayed in Table 4 and closely conform to the conclusions drawn in the discussion of the correlations reported in Table 1: older opinions tend to have lower scores, and scores are positively associated with both number of subsequent citations as well as the average annual rate of subsequent citation.

Table 4 Descriptive correlations with scores. Sample: Top 100 most-cited cases since 1800

3.3 Probing IIA: Top 204 Opinions Since 1800

We calculated the scores for the top 204 most-cited majority opinions since 1819. This is the smallest set of most-cited opinions for the entire time period that contains the top 100 most-cited opinions rendered since 1946. Each opinion rendered after 1946 is accompanied by two scores and two ranks: the “Post ’46” values are identical to those reported in Table 2. The “Full” values, presented in Table 6, correspond to the rank of that opinion’s score from the analysis of the 204 most-cited opinions since 1800 relative to the analogous scores for the opinions rendered after 1946. The IIA axiom underpinning the scoring method implies that the relative ranking of the opinions should be invariant to including additional opinions, as the scoring of the 204 most-cited opinions does. Inspection indicates a strong similarity between the two rankings. Most telling are the following two correlations between, respectively, the (relative) ranks of the 100 post-1946 opinions in the two samples and the scores of these cases in the two samples in Table 5.

Table 5 Intersample correlations of scores. Sample: Top 100 most-cited cases since 1946
Table 6 Comparing scores of post 1946 cases (full sample: 204 most-cited opinions since 1800)

Each of these correlations indicates a very strong agreement between the (relative) ranks and scores, respectively, for the top 100 most-cited opinions since 1946. This agreement provides support for the supposition of IIA that identifies the method.
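A comparison of this kind can be sketched as follows: restrict the full-sample scores to the opinions in the smaller sample, rescale them to average 1 so the two sets are on the same footing, and compute a rank correlation. All names and numbers below are hypothetical, and the Spearman formula used assumes there are no tied scores. The chapter does not state which rank statistic was used, so this is one plausible choice rather than a reconstruction.

```python
def rescale_to_subset(scores, subset):
    """Restrict scores to `subset` and rescale so that their average is 1."""
    sub = {k: scores[k] for k in subset}
    mean = sum(sub.values()) / len(sub)
    return {k: v / mean for k, v in sub.items()}


def ranks(scores):
    """Rank 1 is the highest score (assumes no ties)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {k: r for r, k in enumerate(ordered, start=1)}


def spearman(a, b):
    """Spearman rank correlation via the no-ties formula 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in ra)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical scores: a small sample and a larger sample containing one extra opinion "D".
post46 = {"A": 1.6, "B": 1.0, "C": 0.4}
full = {"A": 0.8, "B": 0.5, "C": 0.2, "D": 2.5}

full_rescaled = rescale_to_subset(full, post46)
rho = spearman(post46, full_rescaled)
print(full_rescaled, rho)
```

If IIA held exactly, the rescaled full-sample scores would coincide with the small-sample scores, as they do in this constructed example; in real data the interest lies in how far the correlation falls short of 1.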

4 Conclusion

In this chapter we score all Supreme Court majority opinions since 1800 on the basis of their “quality” (measured as influence or citability), using network citation data. In placing all such opinions on a common scale we are faced with the problem that majority opinions cite heterogeneous numbers of other opinions and that an opinion cannot be cited by a different opinion that predates it—our network is necessarily incomplete. To deal with the incomplete nature of our data we utilize an axiomatic scoring method that is designed to compare objects that have never been directly compared in the data.

The scores calculated by this method are analogous to measures of network influence: specifically, each score is a vertex metric. The measure nonetheless differs fundamentally from other centrality measures for partially connected networks such as eigenvector centrality and degree centrality. One difference is that our measure does not utilize the score of s in computing the contribution of link (s,v) to v’s score (as in eigenvector centrality); instead our score utilizes the scores of the other w that could have potentially influenced s, or \(\{w: (s, w)\in \tilde{E}\}\). In generating estimates of the \(x_{i}\) using observed network and community data we impute “influence relationships” between vertices that did not have the potential to interact. This leads to the following interpretation of our scores: if there were a hypothetical vertex with a community equal to the set of all possible vertices, then our scores represent the expected influence of each vertex on that hypothetical vertex.

The analysis presented in this chapter is preliminary, with an obvious shortcoming being the fact that we assume that the community of a case i, or collection of cases that could potentially influence i, consists of all of the cases that predate it. In future work we intend to allow community structure to be determined not only by the year in which a case was considered but also by the topic of the case. Additionally, we hope to apply our scoring method to other types of incomplete network data as we believe it provides a useful new measure of node centrality that generalizes the concept of in-degree centrality.