Introduction

Interest in the use of computational techniques to analyze the societal dimension of criminal organizations has grown significantly in recent years. The adoption of software for network analysis, data mining and visualization is gradually opening up new prospects for the study of crime, for both scientific and investigative purposes. An important issue in this field is bringing the use (and, to some extent, the development) of computational tools into the everyday work of public prosecutors and criminal investigation departments. This not only increases the quality, variety and amount of the information used for the analyses, but also creates a closer link between a “computationally-inspired” approach to crime and the know-how of investigators.

In this paper we present a “holistic” methodology that combines document-enhancement methods, network analysis principles and visualization techniques to support public prosecutors and criminal investigation departments in delving into the societal dimension of criminal organizations. The research, which lies at the intersection of Legal Informatics, Social Network Analysis (SNA), Visual Analytics (VA) and data mining, follows a strongly interdisciplinary approach and pays special attention to the feedback loop between theory and application: scientific advances enable new applications, and new applications raise new research questions. As crime is at once a socially and a legally defined phenomenon, developing knowledge that is relevant in both scientific and investigative terms requires bringing together insights from disciplines ranging from sociology and criminology to criminal law and criminal procedure law, social network analysis and visual analytics. In line with this vision, the work described here did not take place in an exclusively research setting but is the outcome of extensive discussions with judges and legal practitioners. After a brief introduction to the background and the research issues, we describe the rationale and the basic features of a framework designed to support the proposed approach.

We then present a first case study based on a real-world criminal investigation, and conclude with some final remarks on both the domain-related and the technical issues of the research, sketching an overview of ongoing work and future directions.

Background

In this Section we offer a brief overview of the background of the research. Our work has been inspired by the results achieved in two different research areas: Criminal Network Analysis (CNA) and Visual Analytics. We will discuss each of them in the following.

Delving into the societal dimension of crime: criminal network analysis

Recent years have witnessed a growing interest in the use of Social Network Analysis techniques to study criminal organizations, for both scientific and investigative goals. Sociality has a tremendous influence on crime: a large part of criminal phenomena, from pornography trafficking to hacking and other cybercrimes, is strongly conditioned (inhibited or facilitated) by relational dynamics. Criminals are often highly social actors: they communicate with each other, collaborate and form groups in which it is possible to distinguish leaders, sub-communities and actors with very different roles. SNA techniques therefore seem well suited to the needs of organized crime studies. This task is facilitated by the growing amount of digital information available today, ranging from e-mails to credit-card transactions. The data deluge and the availability of increasingly advanced data mining techniques are offering both researchers and law enforcement agencies new tools and methodologies to unveil and understand the structures and dynamics of criminal networks.

Criminal Network Analysis (CNA) is today a well-established interdisciplinary research area (Morselli 2009) in which network analysis techniques are employed to analyze large volumes of relational data and gain deeper insights into the criminal network under investigation. There are many examples of the application of SNA principles to the analysis of criminal organizations (Décary-Hétu 2012; Didimo et al. 2014; Haynie 2001; Johnson 2010; Lu et al. 2010; Patacchini and Zenou 2008; Schroeder et al. 2003; Thomas 2001; Xu and Chen 2005a). Specifically, Xu and Chen (2005a) state that the effective use of SNA techniques to mine criminal network data can have important implications for crime investigations, mainly because it can help law enforcement agencies fight crime proactively, for example by intervening before a crime takes place and by deploying the police effort required. They also developed CrimeNet Explorer (Xu and Chen 2005b), a system that, in addition to visualization functionality, detects subgroups in a network, discovers interaction patterns between groups, and identifies central members in a network.

As also discussed in Hutchins and Benham-Hutchins (2010), the most common challenge to address when designing and implementing a new software system is the ability to use information and documents from many sources. Data can be structured or unstructured, the latter representing the most important barrier for researchers who want to develop a new system for domain experts: the lack of structure in the information either prevents outright, or makes difficult, the application of any algorithm and any automated processing.

Hutchins and Benham-Hutchins (2010) address this challenge by proposing HIDTASIS, a software system that supports both qualitative (visual) and quantitative (mathematical) analysis of criminal network data. Like the other works analyzed here, the proposed system only applies SNA metrics to analyze the structure of the criminal network in order to derive useful insights about the roles of entities and communication patterns. After 9/11, social network experts began to look explicitly at the use of network methodology to understand and fight terrorism. Important works include the analysis performed by Krebs (2002), who collected publicly available data on the Al-Qaeda hijackers and applied social network metrics to derive useful insights about the connections among the terrorists and their roles in the network (Muhammad Atta was identified as the key leader of the network). A similar analysis was performed using newspaper articles and radio commentary (Richard 2002): the authors analyzed the New York Times archive from September 11, 2000 to October 31, 2001, covering over 500 articles involving about 25-30 persons in different events and different geographical areas. From these articles they derived some general conjectures about the network features of the terrorist organization (e.g., a high degree of connectivity and considerable redundancy, small dynamic units, hierarchical management of the network with a central leadership, etc.).

Other works have studied the potential use of SNA to destabilize terrorist networks (Memon and Larsen 2006). In this work the authors proposed a framework for the automated analysis, visualization and destabilization of terrorist networks, to help law enforcement and intelligence agencies acquire knowledge about terrorist networks efficiently and effectively. Their prototype, iMiner, exploits algorithms and metrics to automatically detect cells in a network, to identify various roles (e.g., central members, gatekeepers and followers) and, finally, to analyze the effect on the network of capturing a terrorist.

An interesting work in the field of criminal network analysis was recently proposed by Ferrara et al. (2014). The authors present LogAnalysis, an expert system specifically designed for statistical network analysis, community detection (to identify groups or clans) and visual exploration of mobile phone network data using different state-of-the-art view layouts. The tool also makes it possible to study the temporal evolution of the criminal network and to highlight crucial information about the dynamics of the connections when criminal events occur. Finally, the authors present the results of a case study inspired by a real criminal investigation.

Making sense of data: visual analytics and data mining

Both social and criminal network analysis are in various ways connected with the need to manage data and make sense of them. We live in a world in which the amount of investigative information to be dealt with is continuously expanding. Police departments and public prosecutors face a scenario in which the availability of raw data is no longer the main problem. The real problem is how to turn the data deluge into reliable and comprehensible knowledge: users often face information overload, getting lost in a flood of data that has no value unless they are able to extract the information it contains. A second problem is how to access information intuitively once the data analysis has taken place.

A solution can be found in visual analytics, a fledgling research field that aims to provide people with innovative ways to turn large datasets into knowledge, while also enabling them to act upon their findings in real time. The term “visual analytics” appeared around 2000 in the computer science research area (Keim et al. 2008; Wong and Thomas 2004) and then gradually moved to other contexts, giving birth to a multidisciplinary field that combines visualization, human-computer interaction, data analysis, data management, geo-spatial and temporal data processing, spatial decision support and statistics (Keim et al. 2010). The approach greatly widened the scope of both the information visualization and the data mining fields, resulting in new techniques relevant from both a scientific and an information retrieval standpoint. According to a well-known definition focusing on the goal of this emerging research area (Kohlhammer et al. 2011), VA is the creation of tools and techniques that enable people to:

  • Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data

  • Detect the expected and discover the unexpected

  • Provide timely, defensible, and understandable assessments

  • Communicate these assessments effectively for action

In VA, human cognitive abilities are combined with the power of computational data processing, and visualization becomes the medium of a cooperative process: users guide the analysis while the system provides the means of interaction to focus on their specific tasks. In many application areas, several people work along the processing path from data to decision: visual representations sketch this path and allow them to collaborate across different tasks and at different levels of detail.

In the last decade, VA techniques have become relevant in a growing number of areas, from physics to business intelligence, where large datasets need to be analyzed. The study of financial markets (Ziegler et al. 2008), to give just one example, requires the analysis of huge amounts of data generated daily; a crucial challenge, therefore, is to analyze these data to monitor the market, understand historical situations, forecast trends or identify patterns of recurring events.

From this perspective, criminal investigation is no exception. Criminal social life is the result of interactions that produce intricate networks and raise different research and investigative challenges: identifying relations between individuals; analyzing the role of an individual or a group within the criminal network; tracking the evolution of a given network or sub-community over time; unveiling and measuring relevant features (e.g., complexity, relevance) of criminal groups.

All these analyses can be crucial in many different fields, from lawyering to scholarly investigation. When combined with SNA, visualization techniques can represent an interesting innovation in this regard: they not only allow easier and more intuitive information retrieval (Zhang 2007), a feature that is particularly useful when handling large quantities of data, but also offer new insights into the criminal world.

Computation and crime: research and investigative issues

The scenario sketched in the previous sections suggests both the opportunities and the issues that may arise from strengthening the use of computational approaches and tools in criminal investigations. On the one hand, Criminal Network Analysis has proved to be a valuable tool for understanding the structural and functional features of criminal organizations (identifying the most important members of a criminal organization; determining the existence of sub-communities; measuring the interactions between individuals and subgroups and the flow of communications among them). On the other hand, bringing VA into the core of investigative activities could significantly enhance the understanding of criminal organizations. The possibility for a public prosecutor to visualize and analyze, in real time, the components and features of the criminal networks under investigation, thanks to SNA techniques, data mining and visualization, could steer the investigation in new and more effective directions.

Several issues arise in this scenario. A relevant problem is the lack of structured information that would enable VA and an effective application of SNA metrics. The information needed for this purpose (e.g., criminal records, dialogue transcripts) is produced throughout the investigative activity by public prosecutors and police officers using tools (mainly traditional text editors) that neither offer scaffolding support nor allow a drafting workflow producing the structured information needed for SNA. This implies the loss of a wealth of knowledge potentially useful for studying social relations within the criminal organization.

A survey conducted in this perspective, interviewing Italian public prosecutors, highlighted a set of critical issues ranging from typos to the lack of metadata. As highlighted in our earlier work (Lettieri et al. 2014), some relevant examples include: (1) inability to create structured, connected and reusable information (flat documents, lacking any kind of contextual or semantic mark-up); (2) data incorrectness (unintentional drafting errors and repetitions); (3) asynchronous collaboration (lack of knowledge sharing among the officers involved in the investigation).

The analysis of the efforts historically made to fight crime shows how information technology may play a key role in the evolution of both criminal investigation and crime control strategies (Byrne and Marx 2011; Chan 2001). As highlighted in Reichert (2001), ICT “increases our ability to store and process large volumes of data, improving intelligence and investigative capabilities”. The implementation of “technology-led policing” (De Pauw et al. 2011) must nonetheless deal with practical and cultural stumbling blocks that inhibit change. This is due to a combination of factors: the absence of user-friendly crime analysis tools and the lack of technical skills among investigators (public prosecutors, police officers), two factors that, combined with the pressure of daily routine, often result in resistance to innovation (Sykes 1992). This is why, besides experimenting with new investigative methodologies, it becomes essential to define approaches and develop tools that integrate seamlessly with existing procedures and workflows.

CrimeMiner: a framework to investigate the societal dimension of crime

In this section we describe the holistic approach we propose to integrate into a single “place” different functionalities, from the early phase of document drafting to the final analysis step, with the aim of inferring insights about criminal investigations. This approach has been implemented in a framework named CrimeMiner (Lettieri et al. 2013), whose main goal is to walk users through the overall investigation workflow.

The idea is to bring computation into the core of the investigative activities carried out by public prosecutors and police departments, allowing them to gain deeper insights into the criminal network under investigation. The identification of social relations and roles within criminal organizations cannot depend exclusively on an uncritical and mechanical combination of data and algorithms conceived outside the investigative domain: every algorithm implicitly embodies a theory of the phenomenon under investigation. In our view, it is essential to offer both investigators and public prosecutors tools they can use on their own, allowing them to experiment with SNA and VA within a guided experience. Starting from these considerations, we designed and developed an integrated environment implementing the following functionalities:

  • Text editing, used to draft the documents produced during the investigation

  • Document-enhancement, to add the meta-information needed to perform network analysis of criminal groups

  • Generation and analysis of graphs of the investigated criminal network

  • Visual analytics and data mining

In the following we describe the architecture of CrimeMiner and its main functionalities.

An integrated workflow

A public prosecutor or police officer wishing to take advantage of network analysis and visual analytics in their investigative activities would have to carry out, one after the other and often with the help of external crime analysts, three different activities: 1) drafting the documents containing most of the information relevant to the investigation; 2) creating a database to store and manage the data needed for further analysis; 3) processing those data by means of network analysis and visualization software.

This organization of activities has some limitations:

  • It is time consuming, in terms of the effort required of users already overwhelmed by heavy caseloads

  • It is difficult for users who normally lack the technical skills and are often reluctant to adopt IT innovations

  • It makes it cumbersome to exploit CNA insights during the investigation (e.g., a public prosecutor could decide to tighten control over a group of criminals in the light of CNA results)

CrimeMiner brings together all three activities mentioned above in a single workflow. Depending on the need, the user moves from document editing to data entry, from visualization to analysis seamlessly, using a single interface, remaining always within the same application. According to this vision, CrimeMiner implements an integrated workflow, shown in Fig. 1, made of three steps containing different activities. We will describe these steps in the following.

Step 1: Creation:

The first step of the workflow consists in the creation of a “project”, i.e., an archive that stores all the documents (criminal proceedings, remands, etc.), the data, the visualizations and the results of the analyses produced during the investigation by both prosecutors and police officers (see Fig. 2).

Step 2: Editing:

The second step includes two different activities carried out through an “MSWord-like” interface. The first, “Document editing”, consists in drafting the documents normally produced by the user during the investigative process. The second, “Data Entry”, consists in tagging the information needed for visualization and analysis. While drafting the document, the user populates a database with information about both the criminals under investigation (e.g., personal details, criminal records) and their interactions (e.g., conversations captured by means of wire/environmental tapping). This creates connections within the project data, as shown in Fig. 2.
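The link that tagging creates between a document and the project database can be pictured with a minimal data model. The sketch below is illustrative only: the class names (`Person`, `Tag`, `Document`, `Project`), the method `tag` and the representation of a tag as character offsets into the text are our assumptions, not CrimeMiner's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    person_id: int
    name: str                      # anonymized label, e.g. "Node 72"
    criminal_records: list = field(default_factory=list)


@dataclass
class Tag:
    start: int                     # character offset where the mention begins
    end: int                       # offset just past the end of the mention
    person_id: int                 # database entity the mention refers to


@dataclass
class Document:
    title: str
    text: str
    tags: list = field(default_factory=list)


@dataclass
class Project:
    """Hypothetical project archive: documents plus the database they populate."""
    persons: dict = field(default_factory=dict)
    documents: list = field(default_factory=list)

    def tag(self, doc, mention, person):
        # Register the person on first use, then link the first occurrence
        # of the mention in the document text to that database entity.
        self.persons.setdefault(person.person_id, person)
        start = doc.text.index(mention)  # raises ValueError if the mention is absent
        new_tag = Tag(start, start + len(mention), person.person_id)
        doc.tags.append(new_tag)
        return new_tag

    def people_mentioned(self, doc):
        return {self.persons[t.person_id].name for t in doc.tags}


project = Project()
doc = Document("Transcript 1", "Node 72 called Node 15 at 10:05.")
project.documents.append(doc)
project.tag(doc, "Node 72", Person(72, "Node 72"))
project.tag(doc, "Node 15", Person(15, "Node 15"))
print(sorted(project.people_mentioned(doc)))  # ['Node 15', 'Node 72']
```

Because every tag points to a database id rather than to a free-text string, the same records can later be reused to build the graph without re-reading the document.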

Step 3: Analysis:

This step includes two activities related to the analysis of the criminal organization. The first, “Visual Analysis”, consists in studying the graph of the criminal network, generated on the fly by CrimeMiner. The user explores the structural and functional features of both individuals and groups by applying network analysis algorithms (centrality measures, community detection algorithms, etc.). The second, which we call “Visual Information Retrieval”, allows the user to query the database containing all the information about the ongoing investigation (from personal details and criminal records of suspects to statistics about trends in the communications between individuals) simply by interacting with the graph.

Fig. 1

CrimeMiner integrated workflow

Fig. 2

Tagging mechanism: relations between documents and databases in a CrimeMiner Project

Functionalities

The main functionalities provided by CrimeMiner can be summarized as follows.

1. Scaffolding:

The system provides scaffolding for domain experts and monitors all the activities needed to proceed from drafting to graph visualization and CNA. People working on investigative activities benefit from this approach by being guided (through tooltips, an online guide, and so on) through the whole process, minimizing errors and maximizing effectiveness.

2. Document-enhancement:

This functionality aims to overcome the limitations deriving from the lack of structured information. Specifically, Document-enhancement gives structure to the content of a document, providing all the metadata needed to implement the criminal network analysis and visualization features (Xu and Chen 2005a). It not only generates information ready to be handled by graph visualization and network analysis algorithms, but also drastically reduces the number of errors in data entry. Adopting solutions loosely inspired by the Facebook tagging mechanism allows us to avoid irregularities such as the following: if the same person is mistakenly reported as Jo, Joe and Joseph, three database entities and, consequently, three different nodes in the graph are produced.

As shown in Fig. 3, CrimeMiner provides an advanced editor that, in addition to traditional word-processing facilities (i.e., the Document Editor), allows the creation of a database (through the forms of what we call the Database Editor) containing all the information relevant for CNA (e.g., police records and reports). The figure gives an example of how the system highlights tagged people in the document draft. The system helps avoid errors in data entry by signaling that the name of the person the user is typing is already available in the database (in the example, the person, anonymized as Node 72, is highlighted in violet). In addition, to allow the user to check the correctness of the system’s suggestion, CrimeMiner shows details about that individual (see Fig. 3) when the name is clicked.
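The suggestion mechanism described above can be sketched as a small registry that offers existing entities while the user types and maps every confirmed mention to a single database id. This is a minimal illustration under our own assumptions: the class `EntityRegistry`, its methods and the case-insensitive prefix matching are hypothetical, and CrimeMiner may use a different matching strategy.

```python
class EntityRegistry:
    """Hypothetical store of canonical names, each mapped to one database id."""

    def __init__(self):
        self._ids = {}       # canonical name -> entity id
        self._next_id = 1

    def suggest(self, typed, limit=5):
        # Offer existing entities whose name starts with what the user typed
        # (case-insensitive), so the user can reuse one instead of creating a duplicate.
        prefix = typed.strip().lower()
        if not prefix:
            return []
        return sorted(n for n in self._ids if n.lower().startswith(prefix))[:limit]

    def resolve(self, canonical_name):
        # Return the id of an existing entity, creating a new entity only if needed.
        if canonical_name not in self._ids:
            self._ids[canonical_name] = self._next_id
            self._next_id += 1
        return self._ids[canonical_name]


registry = EntityRegistry()
registry.resolve("Joseph")                  # entity already in the database
print(registry.suggest("Jo"))               # ['Joseph']

# Free text: three spellings would yield three distinct nodes.
mentions = ["Jo", "Joe", "Joseph"]
print(len(set(mentions)))                   # 3
# Tagged text: every mention is linked to the entity the user confirmed.
confirmed = {"Jo": "Joseph", "Joe": "Joseph", "Joseph": "Joseph"}
print(len({registry.resolve(confirmed[m]) for m in mentions}))  # 1
```

Note that plain prefix matching would not suggest "Joseph" for "Joe"; fuzzier matching is a possible refinement, but the user's confirmation step remains what guarantees a single node.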

3. Criminal Network Analysis:

Graph theory is a powerful mathematical tool for studying relationships between entities. The data currently handled by CrimeMiner consist of records of people and of phone/environmental tappings between two or more of them: people represent the vertices (or nodes) of the graph, while tappings establish a relationship between two or more people and can therefore be represented as edges.
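The mapping from tapping records to graph edges can be sketched as follows. We assume, for illustration only, that a phone tapping is a (caller, callee) pair yielding a directed edge weighted by the number of calls, and that an environmental tapping is a list of participants in which every pair gets an undirected tie; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def call_edges(calls):
    """Phone tappings: one directed edge caller -> callee, weighted by call count."""
    return Counter((caller, callee) for caller, callee in calls if caller != callee)


def environmental_edges(recordings):
    """Environmental tappings: every pair of participants gets an undirected tie."""
    edges = Counter()
    for participants in recordings:
        for a, b in combinations(sorted(set(participants)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical anonymized records
calls = [(72, 15), (72, 15), (15, 72), (15, 3)]
meetings = [[72, 15, 3]]
print(dict(call_edges(calls)))               # {(72, 15): 2, (15, 72): 1, (15, 3): 1}
print(dict(environmental_edges(meetings)))   # {(3, 15): 1, (3, 72): 1, (15, 72): 1}
```

Keeping direction and call counts on the edges is what later allows in-degree, out-degree and edge thickness to be computed from the same structure.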

CrimeMiner offers a set of SNA metrics and methods to study the characteristics of the criminal organization and identify the role of single individuals within it. The current implementation supports the most relevant SNA measures, making it possible to assess features such as the dominance, subordination, influence or prestige of social actors (Freeman 1979). The implemented measures include:

  • Degree centrality. It measures the activity of a particular node (actor) in a network: a node with a high degree (a high number of incoming and outgoing communications) is likely to be a leader or a “hub” within the group. The more ties actors have, the more power they (may) have, given their advantaged positions: because they have many ties, they may have alternative ways to satisfy their needs and are hence less dependent on other individuals. For a directed graph or network, degree centrality takes two forms: in-degree centrality and out-degree centrality. The former measures the number of ties received; an actor who receives many ties can be classified as prominent, which increases his/her importance. The latter measures the number of outgoing ties; actors with many outgoing ties can quickly reach many other actors in the network and are therefore often characterized as influential.

  • Betweenness centrality. Betweenness measures the extent to which a node lies on the paths connecting other nodes that are not directly connected to each other; formally, it sums, over all pairs of other nodes, the fraction of shortest paths between them that pass through the node. Betweenness therefore measures the centrality of a node in terms of his/her favored position: the more people depend on an actor to make connections with other people, the more power that actor has. In other words, it measures the degree to which a node serves as an intermediary in the network.

  • Eigenvector centrality and PageRank. Eigenvector centrality is an extension of degree centrality and measures the influence of a node in a network: a node is important if it is linked to by other important nodes. It assigns relative scores to all nodes in the network based on the idea that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. Google’s PageRank is a variant of eigenvector centrality.

  • Modularity. Modularity measures the quality of a division of a network into communities (dense ties within groups, sparse ties between them), and modularity optimization is a method to extract communities from large networks. A community can be simply defined as a group of people who know each other and share interests or knowledge, or collaborate to reach a common target. It is today one of the most widely used methods for detecting communities in large networks.
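To make the four measures concrete, here is a dependency-free Python sketch of their textbook versions: in/out-degree, Brandes' algorithm for betweenness on an unweighted undirected graph, PageRank by power iteration with damping factor 0.85 (a conventional default, our assumption), and Newman's modularity score for a given partition. It is not CrimeMiner's code (the desktop version relies on the GEPHI Library for graph analysis), all function names are hypothetical, and `modularity` only scores a partition: community detection algorithms search for the partition that maximizes it.

```python
from collections import deque


def degree_centrality(edges):
    """In-degree and out-degree of every node in a directed edge list."""
    nodes = {n for edge in edges for n in edge}
    indeg, outdeg = dict.fromkeys(nodes, 0), dict.fromkeys(nodes, 0)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: set_of_neighbours}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each pair was counted from both ends


def pagerank(edges, d=0.85, iterations=100):
    """Power iteration; repeated edges (several calls) weight the transitions."""
    nodes = list({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = dict.fromkeys(nodes, (1 - d) / n + d * dangling / n)
        for u in nodes:
            for v in out[u]:
                new[v] += d * rank[u] / len(out[u])
        rank = new
    return rank


def modularity(edges, community):
    """Newman's Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected edge list."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (dc / (2 * m)) ** 2
               for c, dc in degree_sum.items())
```

For instance, on the path a-b-c the middle node gets a betweenness of 1.0 (it lies on the only shortest path between a and c), and two triangles joined by a single edge, split into their natural groups, score Q = 5/14.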

CrimeMiner also makes it possible to generate different views of the graph by filtering data according to user-defined parameters. Very simple filters are currently embedded in the tool: for example, it is possible to set a threshold on node degree, or to highlight people in the graph who share the same criminal records or live in a certain area. An example of graph visualization and analysis is shown in Fig. 4.
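The filtered views described above amount to selecting a set of nodes and keeping only the edges among them. The following sketch assumes node degrees are already computed and that personal attributes are stored as simple key/value records; the function names and the "area" attribute are hypothetical.

```python
def above_degree(degree, threshold):
    """Nodes whose degree reaches the user-chosen threshold."""
    return {node for node, k in degree.items() if k >= threshold}


def sharing(attributes, key, value):
    """Nodes whose record has the given attribute value (e.g. the same area)."""
    return {node for node, attrs in attributes.items() if attrs.get(key) == value}


def induced_view(edges, keep):
    """Edges of the sub-graph restricted to the selected nodes."""
    return [(u, v) for u, v in edges if u in keep and v in keep]


# Hypothetical anonymized data
degree = {72: 9, 15: 4, 3: 1}
attributes = {72: {"area": "Area 1"}, 15: {"area": "Area 2"}, 3: {"area": "Area 1"}}
edges = [(72, 15), (15, 3), (72, 3)]

print(sorted(above_degree(degree, 4)))                  # [15, 72]
print(sorted(sharing(attributes, "area", "Area 1")))    # [3, 72]
print(induced_view(edges, above_degree(degree, 4)))     # [(72, 15)]
```

Highlighting, rather than hiding, would simply use the selected set to change node colors while drawing all edges.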

4. Visual Analytics and Information Retrieval:

CrimeMiner generates graphs of the network under investigation that can be used both for network analysis and for information retrieval. The color and size of nodes and edges can be computed according to parameters chosen by the user, e.g., PageRank for the nodes and out-degree centrality for the edges. In the example shown in Fig. 4, the size of the nodes is proportional to their degree centrality, while the thickness of the edges is proportional to the overall number of phone calls between the two nodes. The Visual Information Retrieval functionality allows visual browsing of the CrimeMiner database, giving access to data about both the criminal network and its members simply by interacting with the graph. This is obtained through a window that shows relevant information when the user clicks a tagged entity in the document, or a node or an edge in the graph.
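The visual encoding used in Fig. 4 (node size proportional to degree, edge thickness proportional to the overall number of calls between two people) can be sketched as below. The maximum sizes in pixels and the function names are our assumptions; a real renderer would usually add a minimum size so that isolated nodes remain visible.

```python
def proportional(values, max_size):
    """Scale metric values so that size is proportional to value (largest -> max_size)."""
    top = max(values.values())
    if top == 0:
        return dict.fromkeys(values, 0.0)
    return {key: max_size * v / top for key, v in values.items()}


def pair_totals(directed_calls):
    """Overall number of calls between two people, regardless of direction."""
    totals = {}
    for (u, v), count in directed_calls.items():
        pair = (min(u, v), max(u, v))
        totals[pair] = totals.get(pair, 0) + count
    return totals


degree = {72: 10, 15: 5, 3: 0}
calls = {(72, 15): 2, (15, 72): 1, (15, 3): 1}
print(proportional(degree, max_size=40))              # {72: 40.0, 15: 20.0, 3: 0.0}
print(proportional(pair_totals(calls), max_size=6))   # {(15, 72): 6.0, (3, 15): 2.0}
```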

Fig. 3

CrimeMiner Document Editor module

Fig. 4

Visualization and analysis

Architecture

CrimeMiner is under active development. The current version is developed in Java, but we are switching to a web-based version to reduce development time and to improve user experience and the deployment procedure. This choice was driven mainly by the feedback we recently collected from the domain experts involved in the project. First, users nowadays are very comfortable with web-based applications, their graphical interfaces and their interaction patterns. Moreover, with a web application users just need to reach a website and log in, while the download-install-execute process of a desktop application feels outdated and cumbersome.

Structurally, the current (desktop) and prototype (Web) versions of CrimeMiner are practically the same (see Fig. 5); the differences concern the libraries/components implementing each layer. Technically, CrimeMiner follows a typical layered Model-View-Controller architecture, very common in both desktop and Web applications.

Fig. 5

CrimeMiner architecture overview

The CrimeMiner architecture is made up of three components: (1) the Data Persistence Layer, (2) the Business Layer and (3) the User Interface Layer. The Data Persistence Layer handles the storage of data in MySQL databases. Stored data include the personal details of investigated people, tapping records (wiretapping and environmental tapping) and, finally, the documents created by the user with CrimeMiner.

The Business Layer is responsible for the creation, management, and manipulation of data, by implementing the functionalities provided by the Document Editor, the Database Editor, and finally, the GEPHI Library, that allows the generation, visualization, and analysis of the graph.

The prototype version exhibits slight differences due to the change of technological platform: the standalone Document Editor is replaced by a web-based editor; the Database Editor features are implemented using the AngularJS framework (Footnote 1); and the GEPHI Library (only available on the desktop platform) is replaced by two web-based libraries, an internally developed SNA Library for network analysis and a Graph Library for visualization based on Linkurious.js (Footnote 2). Finally, the topmost layer is the User Interface Layer, which allows the user to interact with CrimeMiner features. In the desktop-based version of CrimeMiner, the UI consists of a Main view (where the user can edit the database and write documents) and a separate GEPHI user interface that embeds GEPHI features in a separate window. The Web-based version of CrimeMiner will offer a single, more coherent user interface. Figure 6 shows the relations between the Data Persistence and the User Interface Layer.

Fig. 6

Graphs as alternate views to data. Data visualization is allowed in a text format (a) and in a graph format (b)

The information provided by the users and stored in the CrimeMiner databases is used by the system to produce different textual (a) and graphical (b) representations.
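The idea of one store feeding alternate textual and graphical views across the three layers can be sketched in a few lines. CrimeMiner itself uses MySQL, GEPHI or Linkurious.js; this sketch swaps in Python's built-in `sqlite3` (in memory) as a stand-in for MySQL and plain functions for the business and UI layers. Table names, columns and function names are hypothetical.

```python
import sqlite3


# Data Persistence Layer: an in-memory SQLite database stands in for MySQL.
def open_store():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person  (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
        CREATE TABLE tapping (caller INTEGER REFERENCES person(id),
                              callee INTEGER REFERENCES person(id));
    """)
    return db


# Business Layer: turn raw tapping rows into weighted, directed edges.
def weighted_edges(db):
    rows = db.execute("""
        SELECT p1.label, p2.label, COUNT(*)
        FROM tapping t
        JOIN person p1 ON p1.id = t.caller
        JOIN person p2 ON p2.id = t.callee
        GROUP BY p1.id, p1.label, p2.id, p2.label
        ORDER BY p1.label, p2.label
    """)
    return list(rows)


# User Interface Layer: two alternate views of the same stored data.
def text_view(edges):
    return "\n".join(f"{a} -> {b}: {n} call(s)" for a, b, n in edges)


def graph_view(edges):
    nodes = sorted({label for a, b, _ in edges for label in (a, b)})
    links = [{"source": a, "target": b, "weight": n} for a, b, n in edges]
    return {"nodes": nodes, "edges": links}


db = open_store()
db.executemany("INSERT INTO person VALUES (?, ?)", [(15, "Node 15"), (72, "Node 72")])
db.executemany("INSERT INTO tapping VALUES (?, ?)", [(72, 15), (72, 15), (15, 72)])
edges = weighted_edges(db)
print(text_view(edges))
# Node 15 -> Node 72: 1 call(s)
# Node 72 -> Node 15: 2 call(s)
print(graph_view(edges)["nodes"])  # ['Node 15', 'Node 72']
```

Keeping aggregation in the business layer means either view can be swapped (e.g. for a web graph library) without touching the stored data.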

A case study: testing CrimeMiner with real data

Taking a cue from the scenario sketched so far, we decided to test CrimeMiner with data from the proceedings of a real trial against organized crime, namely against the Cava clan, a criminal organization from Quindici (Avellino, Italy). The trial took place in Italy in the early 2000s and ended a few months ago with a judgment of the Italian Supreme Court (Footnote 3). The investigation involved more than 70 people, with 300 phones under tapping, for an overall amount of 2791 telephone tappings, reported in a document (legally, a Request for interim measures) of about 3000 pages.

Our goal was threefold: to start testing the functionalities of CrimeMiner; to exploit basic SNA metrics to analyze a real criminal organization; and to validate the results against the findings of the Italian Supreme Court (identification of the leading members of the organization, determination of criminal liabilities, convictions).

In line with recent literature in this field (Ferrara et al. 2014), at this stage we focused on the analysis of phone logs, data that convey important information used by police departments and public prosecutors to verify the existence of interpersonal relationships and communication flows. The analysis of communication networks among criminals not only reveals the structure of the organization but also facilitates the identification of the individuals who play a key role inside the network or connect different subgroups. For the sake of privacy, personal information (name and surname) has been anonymized.

Experiment design

We designed a testing and experimental phase divided into two steps:

Document drafting and enhancement

Using the Editor module, we simulated the activity of a public prosecutor partially redrafting the request for interim measures. We focused on the portions of the text (mainly telephone tapping transcripts) containing relevant individual and relational information. The document enhancement functionalities implemented in CrimeMiner allowed us to mark the information needed to generate the graph and study the relational structure of the criminal organization.

Visualization, analysis and validation

Using the data gathered and structured in the previous step, we generated the graph of the criminal network and analyzed it with the most common SNA measures. Nodes represent people under investigation; edges represent the telephone tappings between them. To validate our approach (essentially, our choices of data and measures), we compared our results with the evidence resulting from the judgment of the Italian Supreme Court (May 2015).
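As a minimal sketch of this graph-building step, assume that each tapped call marked in the document can be exported as a (caller, callee) pair of anonymized node IDs. Repeated calls can then be aggregated into a weighted directed graph whose edge weight counts the calls from one person to another. The record format and the function name `build_call_graph` are illustrative assumptions, not CrimeMiner's internal data model.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def build_call_graph(calls: Iterable[tuple[int, int]]) -> dict[int, dict[int, int]]:
    """Aggregate (caller, callee) records into a weighted directed adjacency map.

    Hypothetical helper: graph[caller][callee] = number of tapped calls.
    """
    weights = Counter(calls)  # count repeated (caller, callee) pairs
    graph: dict[int, dict[int, int]] = {}
    for (caller, callee), n in weights.items():
        graph.setdefault(caller, {})[callee] = n
        graph.setdefault(callee, {})  # people who only receive calls are still nodes
    return graph


# Illustrative records only (not data from the Cava case).
records = [(5, 72), (5, 72), (72, 10), (10, 5)]
print(build_call_graph(records))  # {5: {72: 2}, 72: {10: 1}, 10: {5: 1}}
```

Keeping call counts as weights, rather than collapsing repeated calls into a single edge, preserves the "number and direction of phone calls" that the analysis relies on.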

Preliminary results

The results obtained so far are promising. The analysis showed that relevant features of the network (and therefore interesting investigative insights) can be derived even by considering only the number and direction of phone calls, without taking into account the content of the communication among the people investigated, other forms of relational information (wiretaps, Facebook, emails, etc.) or other factors (e.g., attributes of individuals such as criminal records). We focused in particular on the identification of leaders, hubs and sub-communities, and on the level of social activity of the members of the network.

1. Sub-communities:

The study of groups within criminal organizations is crucial from the investigative standpoint. The analysis of the features of sub-communities (e.g., size, density and frequency of communications) can be extremely relevant to identify essential features of the criminal activity (e.g., trends and patterns) and to steer the development of the investigation (e.g., by tightening checks on specific activities or individuals).

In the field of SNA, this activity can be supported by several community detection algorithms. In our case study we used the modularity-based algorithm of Blondel et al. (2008), also known as the Louvain method, which successfully identified the main sub-communities of the Cava criminal organization. As shown in Fig. 7, we detected five large sub-communities, each comprising between 30 and 60 people. In each of them, a key role is played by individuals whom the Supreme Court found to be influential criminals. We emphasize again that, for privacy reasons, names have been anonymized.
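The Louvain method greedily moves nodes between communities, and then merges communities, so as to increase modularity Q. The sketch below does not implement that search; it only computes the quantity being optimized, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total edge weight, L_c the weight of edges inside community c, and d_c the summed degree of its members. It assumes the call graph is treated as undirected (direction ignored), with each edge listed once and no self-loops; the function name and toy data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def modularity(edges: list[tuple[int, int, float]], community: dict[int, int]) -> float:
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: (u, v, weight) tuples, each undirected edge listed once, no self-loops.
    community: node -> community label.
    """
    m = sum(w for _, _, w in edges)  # total edge weight
    if m == 0:
        return 0.0
    internal: dict[int, float] = defaultdict(float)    # L_c: weight inside community c
    degree_sum: dict[int, float] = defaultdict(float)  # d_c: summed degree of c's members
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += w
        degree_sum[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy example: two triangles joined by a single edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, split), 4))  # 0.3571
```

Putting every node in one community gives Q = 0, so a clearly positive Q signals groups that talk among themselves more than chance would predict. In practice a library implementation (for example NetworkX's `louvain_communities`) would perform the search itself.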

2. Level of social activity:

The number, temporal distribution and direction of the communications between individuals belonging to a criminal network are good warning signs of social and (potentially) criminal activity. Mapping phone calls makes it possible to rebuild the structure and dynamics of the social interaction network in which each individual is embedded. In our case study, we used Degree centrality to determine the overall level of social activity of each individual (see Fig. 8). This measure was computed from the total number of phone calls both made and received (the higher the number, the higher the presumed level of social activity).

We then computed out-degree and in-degree centrality to distinguish prominent from influential actors according to the definition provided in the previous section. The analysis identified a group of individuals (Nodes 10, 5 and 72) with high values for each of these measures. Interestingly, these individuals were eventually sentenced to prison terms ranging from 19 to 30 years. Results are shown in Fig. 9 and Table 1.
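The degree-based measures can be sketched on a weighted directed call graph (graph[caller][callee] = number of calls): out-degree counts calls made, in-degree counts calls received, and their sum is the overall level of social activity. Following the caption of Fig. 9, high in-degree is read as prominence and high out-degree as influence. Weighting by call counts is an assumption based on the "total amount of phone calls" wording; counting distinct contacts instead would give the unweighted variant. The function name and data are illustrative.

```python
from __future__ import annotations


def degree_measures(graph: dict[int, dict[int, int]]) -> dict[int, tuple[int, int, int]]:
    """Return node -> (out_degree, in_degree, total_degree), weighted by call counts.

    graph[caller][callee] = number of calls from caller to callee.
    """
    nodes = set(graph)
    for callees in graph.values():
        nodes.update(callees)
    out_deg = {v: sum(graph.get(v, {}).values()) for v in nodes}  # calls made
    in_deg = dict.fromkeys(nodes, 0)                              # calls received
    for callees in graph.values():
        for callee, n in callees.items():
            in_deg[callee] += n
    return {v: (out_deg[v], in_deg[v], out_deg[v] + in_deg[v]) for v in nodes}


# Illustrative graph only (not the Cava data).
calls = {5: {72: 2, 10: 1}, 72: {10: 1}, 10: {5: 1}}
measures = degree_measures(calls)
ranking = sorted(measures, key=lambda v: measures[v][2], reverse=True)
print(ranking[0], measures[ranking[0]])  # 5 (3, 1, 4)
```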

3. Leadership:

Leadership, i.e., the tendency of individuals to influence, coordinate and control the members of a group, is one of the most important features in the analysis of criminal organizations and, at the same time, a well-known research topic in the SNA and CNA areas. One of the most widely used algorithms to detect leaders in social networks is PageRank (Easley and Kleinberg 2010). PageRank rests on the idea that the relevance of an individual can be inferred from the social relevance of the individuals who relate to him in some way (e.g., by citing or calling him). In our case study, PageRank made it possible to identify the most prominent actors of the Cava organization: Nodes 72, 128 and 5, shown in Table 2, were recognized as leaders of the clan in the real trial (see Fig. 10).
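The idea can be sketched as a power iteration of weighted PageRank over the call graph. The sketch assumes that an edge goes from caller to callee, weighted by call count, so that being called by highly ranked people raises one's own score. The damping factor 0.85 is the conventional default, not a value reported for the case study, and `pagerank` here is a hypothetical helper rather than CrimeMiner's SNA Library.

```python
from __future__ import annotations


def pagerank(graph: dict[int, dict[int, int]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[int, float]:
    """Weighted PageRank by power iteration; rank flows from caller to callee."""
    nodes = set(graph)
    for callees in graph.values():
        nodes.update(callees)
    n = len(nodes)
    if n == 0:
        return {}
    out_w = {v: sum(graph.get(v, {}).values()) for v in nodes}
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        # Rank held by nodes that make no calls is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_w[v] == 0)
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for u in nodes:
            if out_w[u] == 0:
                continue
            for v, w in graph.get(u, {}).items():
                new[v] += damping * rank[u] * w / out_w[u]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)  # L1 change this round
        rank = new
        if delta < tol:
            break
    return rank


# Node 3 is called by both 1 and 2, so it should come out on top.
scores = pagerank({1: {3: 1}, 2: {3: 1}, 3: {1: 1}})
print(max(scores, key=scores.get))  # 3
```

Unlike raw in-degree, the score of a callee depends on who is calling: a call from a well-connected figure counts for more than a call from a peripheral one.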

4. Hubs:

The execution of criminal activities often requires intermediaries who enable information exchange, trafficking and the enforcement of orders. To discover “brokers” in the clan, we used Betweenness centrality. The results showed an interesting overlap between the roles of leader and broker in the organization under scrutiny, as highlighted in Fig. 11 and Table 3 (see, in particular, Nodes 72 and 5).
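Betweenness centrality of a node v sums, over ordered pairs (s, t), the fraction of shortest s-to-t paths that pass through v, so brokers bridging otherwise distant members score highly. A compact sketch of Brandes' algorithm follows. It assumes a directed, unweighted graph (call counts ignored, path length measured in hops) and returns raw, unnormalized scores. Whether the case study used weighted, undirected or normalized betweenness is not stated, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import deque


def betweenness(graph: dict[int, dict[int, int]]) -> dict[int, float]:
    """Brandes' betweenness centrality on a directed graph, unweighted (hop counts).

    Call counts stored in graph[caller][callee] are ignored: only who called whom matters.
    Scores are raw (not normalized) sums over ordered source-target pairs.
    """
    nodes = set(graph)
    for callees in graph.values():
        nodes.update(callees)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order: list[int] = []
        preds: dict[int, list[int]] = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, {}):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Illustrative graph: 1 reaches 4 via two equally short routes, through 2 or 3.
diamond = {1: {2: 1, 3: 1}, 2: {4: 1}, 3: {4: 1}}
print(sorted(betweenness(diamond).items()))  # [(1, 0.0), (2, 0.5), (3, 0.5), (4, 0.0)]
```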

Overall, the application of SNA metrics yielded insights that substantially reflect the evidence resulting from the judgment of the Italian Supreme Court (May 2015).

Fig. 7

Sub-communities of the criminal network identified by applying the Modularity algorithm. Each color identifies a different community

Fig. 8

Degree centrality used to determine the overall level of social activity, measured as the total number of phone calls both made and received

Fig. 9

In-degree and Out-degree centrality measures to identify prominent and influential people, respectively

Fig. 10

Application of the PageRank algorithm to infer leaders of the criminal network

Fig. 11

Application of Betweenness centrality to discover people acting as intermediaries in the criminal network

Table 1 Level of social activity
Table 2 PageRank results: list of the most prominent individuals involved in criminal activities in the Cava clan
Table 3 Betweenness centrality results: Nodes 72 and 5 play a crucial role both as leader and as broker in the network

Conclusion and future work

The work presented is only a first step towards the integration of SNA into investigative activities. The functionalities implemented so far were discussed with prosecutors involved in the fight against organized crime and appear to be a good starting point for exploiting computational techniques in the investigative process. Although still in progress, the experiments have yielded some interesting outcomes. The most tangible results are at the implementation level: the work done so far has allowed us to explore the issues related to parsing unstructured investigative and prosecution documents and to gather useful information on how these kinds of documents should be processed. Results are encouraging in terms of our tool's capability to extract relevant information: the insights derived through our approach are consistent with the trial evidence resulting from the Supreme Court judgment.

Our analysis still suffers from a number of limitations. First of all, we used a single, albeit substantial, document and took into account only the number and direction of phone calls, information that by itself is not an unequivocal indicator of the role played by a person within a group. The level of social activity of a leader (in our case, the number of phone calls) can vary significantly depending on the organization (e.g., Mafia, Camorra or 'Ndrangheta), the region and the culture considered. Consequently, it became clear that building an effective investigative tool requires not only increasing the amount and variety of information processed (e.g., by gathering data from various criminal justice databases) but also refining the criteria used in processing judicial and police information, “embedding” the inquirers’ know-how in the analysis. In this perspective, it would be useful to estimate the significance of other investigative information (e.g., criminal records) with regard to the probability that an individual is part of the criminal organization. The most interesting result, however, is methodological: our experience showed that, as authoritatively highlighted in von Lampe (2006), an interdisciplinary approach to crime analysis is an essential condition for developing effective interdiction and law enforcement strategies.

Future work will cover different aspects. From a theoretical point of view, we will continue to study the contribution that the social sciences (from criminology to sociology) can make to the understanding of the criminal phenomenon. From an application point of view, a first goal is to enhance the document drafting features: we are planning to embed templates for each of the documents to be created during investigations in order to ease the users' work. Another relevant goal is to enhance the analysis. We are planning not only to take into account available but not yet used metadata (e.g., activity area, criminal records, family ties), but also to experiment with NLP and sentiment analysis techniques to extract relevant information from message content (e.g., the level of dangerousness of individuals, or details about the content and evolution of criminal activities). Moreover, we will compare the effectiveness of different algorithms for both community detection and predictive analyses. The implementation of more advanced visual information retrieval functionalities is another important aspect for investigators, while support for online synchronous collaboration will be extremely useful for public prosecutors and police officers.

Finally, we plan to perform an in-depth evaluation study to assess the usability of CrimeMiner as well as user satisfaction and acceptance. Moreover, domain experts will be involved to assess its effectiveness on real use cases.