
1 Introduction

Community security management systems are now widely used in China, and they have collected massive amounts of resident information, such as demographic data, visiting records, social relationships, and criminal records. Many existing works analyze such complex data to detect anomalous events, and this branch of data mining has wide applications in security, finance, and other domains. Depending on the underlying data model, these methods fall into two groups: high-dimensional methods (e.g., [1, 2]) and graph-based methods (e.g., [3, 4, 5]). Since graphs have recently become a widely used model of real-world objects, this paper also addresses anomaly detection on graph models. Unlike existing work concerned with general criminal events, this paper focuses on specific, urgent real-world problems, such as discovering drug abuse behavior and detecting illegal pyramid selling organizations. We develop a community data analysis system with four main components. The components and our main contributions are summarized as follows.

  • We collect large volumes of real community data from several cities in China and apply data cleaning techniques to obtain high-quality data. Sensitive profile information is desensitized before the data are stored in the database.

  • We use evolving social graphs to model the house visiting data and employ a subgraph mining algorithm to solve the criminal event detection problem.

  • We develop a powerful visualization system to display the evolving social graphs, the warning messages, and the detected criminal communities.

2 System Architecture and Demonstration

Our system is implemented on the J2EE platform, and its architecture is shown in Fig. 1. The system consists of four main components: data collection, community social graph construction, data analysis, and result visualization.

Fig. 1. The system architecture

2.1 Data Collection

We gather data from several real communities in China. The raw data contain a large amount of information, such as resident demographic data, house visiting records, social relationships, and historical criminal records. We first clean the data to remove noisy records. Then, to protect residents' privacy, we desensitize the demographic data by removing explicitly identifying attributes and randomly generating synthetic profile values. Finally, we extract the most important features for profiling resident behavior. The extracted profile of a resident is displayed in our system, as shown in Fig. 2.
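The cleaning and desensitization steps above can be sketched as follows. This is a minimal illustration, not the system's actual pipeline: the field names (`resident_id`, `gender`, `birth_year`, `name`, `id_number`, `phone`) are hypothetical, "noisy" is assumed to mean a missing required field or a verbatim duplicate, and `resident_id` is assumed to be an internal, non-identifying surrogate key.

```python
import random
import string
from typing import Dict, List, Optional

# Hypothetical schema: the real attribute names used by the system are not given.
REQUIRED_FIELDS = ("resident_id", "gender", "birth_year")
IDENTIFYING_FIELDS = ("name", "id_number", "phone")

Record = Dict[str, Optional[str]]


def clean(records: List[Record]) -> List[Record]:
    """Drop records with a missing/empty required field and verbatim duplicates."""
    seen = set()
    cleaned: List[Record] = []
    for rec in records:
        if any(not rec.get(f) for f in REQUIRED_FIELDS):
            continue  # noisy record: a required field is missing or empty
        key = tuple(sorted(rec.items()))  # keys are unique, so only keys are compared
        if key in seen:
            continue  # exact duplicate of a record already kept
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def desensitize(rec: Record, rng: random.Random) -> Record:
    """Remove explicitly identifying fields and fill in random synthetic values."""
    out = {k: v for k, v in rec.items() if k not in IDENTIFYING_FIELDS}
    out["name"] = "resident_" + "".join(rng.choices(string.ascii_lowercase, k=6))
    out["phone"] = "".join(rng.choices(string.digits, k=11))
    return out


# Made-up example records (illustration only).
raw = [
    {"resident_id": "r1", "gender": "F", "birth_year": "1980",
     "name": "Alice", "id_number": "X1", "phone": "555"},
    {"resident_id": "r1", "gender": "F", "birth_year": "1980",
     "name": "Alice", "id_number": "X1", "phone": "555"},
    {"resident_id": "r2", "gender": "", "birth_year": "1990",
     "name": "Bob", "id_number": "X2", "phone": "777"},
]
rng = random.Random(0)  # seeded so the synthetic values are reproducible
stored = [desensitize(r, rng) for r in clean(raw)]
```

Cleaning runs before desensitization so that duplicates are detected on the original values; desensitizing first would give each copy different synthetic values and hide the duplication.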

Fig. 2. The user profile page

2.2 Community Social Graph Construction

We use a large static social graph to model the social relations among all residents, as shown in Fig. 3. Each resident is represented by one node, and two nodes are connected by an edge if the corresponding residents have a relationship such as a family relation or friendship. To support efficient criminal community detection, we additionally model the house visiting data of each house owner as a sequence of time-evolving graphs, as shown in Fig. 4. Unlike in the social graph, edges in the time-evolving graphs represent visiting relationships: if the resident represented by node A visited the house owner represented by node B, an edge connects A and B.
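A minimal sketch of both graph models, under stated assumptions: relation records are hypothetical `(resident_a, resident_b, kind)` tuples, visit records are hypothetical `(visitor, host, day)` tuples, and the visiting graph is split into one snapshot per day. The text does not specify the actual time granularity or storage format.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (visitor, house_owner)


def build_social_graph(relations: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Static undirected social graph as an adjacency map.

    Each relation is (resident_a, resident_b, kind), e.g. kind = "family" or
    "friend"; the kind is ignored because any relation yields one edge.
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b, _kind in relations:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def build_visiting_snapshots(
    visits: Iterable[Tuple[str, str, date]], owner: str
) -> List[Tuple[date, Set[Edge]]]:
    """Time-evolving visiting graphs for one house owner, one snapshot per day.

    Each visit is (visitor, host, day); days without any visit produce no snapshot.
    """
    by_day: Dict[date, Set[Edge]] = defaultdict(set)
    for visitor, host, day in visits:
        if host == owner:
            by_day[day].add((visitor, host))
    # Chronological order; day keys are unique, so the edge sets are never compared.
    return sorted(by_day.items())
```

For example, `build_visiting_snapshots(visits, "H")` returns the per-day edge sets around the hypothetical owner `H`, which is the form of input a subgraph mining step can consume.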

Fig. 3. The community social graph example for several selected residents

Fig. 4. The evolving visiting graphs for a selected address in May, with the detected illegal pyramid selling organization highlighted (Color figure online)

2.3 Data Analysis and Visualization

Based on these graph models, we formulate the criminal community detection task as a subgraph mining problem and solve it with a frequent pattern mining algorithm [6]. The algorithm is described in a full research paper that we have submitted to a conference, and we plan to publish a technical report online later; we therefore omit the details here. In Fig. 4, the substructures with red nodes and edges form the detected illegal pyramid selling organization. The system also contains pages showing other kinds of detection results, which we omit due to space constraints.
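The algorithm of [6] is not reproduced here. As a much simpler stand-in that conveys the idea of frequent pattern mining over evolving graphs, the sketch below runs Apriori-style frequent itemset mining where each time snapshot is a transaction whose items are visiting edges; an edge set is reported if it occurs in at least `min_support` snapshots. The names `frequent_edge_sets`, `min_support`, and `max_size` are hypothetical. Because every edge in one owner's snapshots shares the owner node, each reported edge set is a connected star subgraph; with snapshots that mix owners, this shortcut would not guarantee connectivity.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]  # (visitor, house_owner)


def frequent_edge_sets(
    snapshots: List[Set[Edge]], min_support: int, max_size: int = 3
) -> Dict[FrozenSet[Edge], int]:
    """Apriori over edge sets: map every edge set of size <= max_size that is
    contained in at least min_support snapshots to its support count."""
    transactions = [frozenset(s) for s in snapshots]

    # Level 1: count single edges.
    counts: Dict[FrozenSet[Edge], int] = {}
    for t in transactions:
        for e in t:
            key = frozenset([e])
            counts[key] = counts.get(key, 0) + 1
    current = {k: c for k, c in counts.items() if c >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        prev = list(current)
        candidates = set()
        # Join step: merge two frequent (size-1)-sets into a size-set, keeping it
        # only if all of its (size-1)-subsets are frequent (Apriori pruning).
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == size and all(
                    frozenset(sub) in current for sub in combinations(union, size - 1)
                ):
                    candidates.add(union)
        # Count step: support = number of snapshots containing the candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
    return result
```

Under this simplification, a group of visitors who repeatedly appear together at the same address across many snapshots would surface as a frequent multi-edge set, which is one plausible signal for the kind of organized gatherings highlighted in Fig. 4.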