1 Introduction

In recent years, the sustained development of information technologies such as big data analysis and artificial intelligence has promoted the informatization of public security [1]. Massive amounts of heterogeneous data are generated rapidly in all aspects of society. To make better use of these data, the public security bureau has formed a basic public security information network and built big data centers.

With the advent of the big data era, traditional data processing methods and technologies can no longer meet the growing demands placed on data, and explosive data growth has forced people to seek new data processing methods [2,3,4,5]. This demand is particularly prominent in the field of public security [6, 7]. The public security bureau holds a large number of data sources, such as public security business systems (e.g., criminal records, sentence records), government management systems (e.g., tax records, bank records), Internet data (e.g., online shopping records, online chat records), and social data (e.g., travel records). We refer to all the data collected by the public security bureau as police data.

However, the wide range of police data sources and the continuous emergence of new ones hinder a thorough understanding of police data by grassroots police officers. The lack of correlations between data from various sources has formed more and more data islands [8], which severely hampers the use of police data. Currently, there exist several intelligent police data management platforms, such as the Tianhe Big Data Platform, which provides a visual operating environment in which data processing and analysis workflows are configured and constructed by dragging and dropping pre-defined elements. However, because unified data specifications and standards for police data are lacking, these systems are usually developed for specific police data sets and show poor generality and reusability. In addition, the data query methods provided by existing visualization platforms are mainly based on string matching [9], which often leads to inaccurate query results due to semantic inconsistency.

In this paper, we propose and implement a customized police data modeling platform (CPDMP). With CPDMP, police officers can gain a better understanding of multi-source data in a visualized environment and convert their practical experience into customized digital models. To better organize multi-source data and eliminate semantic inconsistency in CPDMP, we construct a public security domain knowledge model (PSDKM) based on the collected police data. We have also added query expansion capabilities to CPDMP to enhance its usability.

The three main contributions of this paper are as follows.

  • We built eight police thematic databases and a police domain knowledge model to help police officers better understand and use multi-source data.

  • We implemented a customized modeling platform to provide a visual method to help police officers convert their own experiences into digital models.

  • The domain knowledge model, which brings query expansion capability to CPDMP, can be further extended as a foundation.

The rest of the paper is organized as follows. In Sect. 2, we present the related concepts and a running example. In Sect. 3, we illustrate the architecture and main functions of CPDMP, and detail PSDKM. Experiment settings and results analysis are given in Sect. 4. In Sect. 5, we present the related work. Finally, we conclude in Sect. 6.

2 Related Concepts and Running Example

In this section, we provide relevant concepts and a running example to introduce the work of this paper, including ontology (Sect. 2.1), query expansion (Sect. 2.2), and a running example (Sect. 2.3, Fig. 1).

Fig. 1. A running example of the modeling platform. A1 and A2 show the database management and model supermarket interfaces. B1 presents data tables that can be dragged and dropped. B2 and B3 show modeling tools and their configurations. C1 shows a set of other operations.

2.1 Ontology

An ontology is a conceptual model that describes the concepts of a domain and the relationships between them [10]. The structure of an ontology is a five-tuple [11], O := {C, R, H, Rel, A}, where C and R are two disjoint sets: the elements of C are called concepts, and R is a set of relations between elements of C. H represents the concept hierarchy, that is, the taxonomic relations between concepts; Rel represents the non-taxonomic relations between concepts; and A stands for the ontological axioms.
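As an illustration only, the five-tuple can be encoded as a small data structure. The sketch below is one possible encoding, not the representation used in PSDKM; the class name `Ontology`, its field names, and the example concepts (`Person`, `Fugitive`, `Hotel`, `stays_at`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Ontology:
    """One possible encoding of the five-tuple O := {C, R, H, Rel, A}."""
    concepts: Set[str] = field(default_factory=set)               # C
    relations: Set[str] = field(default_factory=set)              # R (relation names)
    hierarchy: Set[Tuple[str, str]] = field(default_factory=set)  # H: (sub, super) pairs
    # Rel: (relation name, domain concept, range concept) triples
    non_taxonomic: Set[Tuple[str, str, str]] = field(default_factory=set)
    axioms: List[str] = field(default_factory=list)               # A

    def superconcepts(self, concept: str) -> Set[str]:
        """Return every ancestor of `concept` reachable through H."""
        found: Set[str] = set()
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for sub, sup in self.hierarchy:
                if sub == current and sup not in found:
                    found.add(sup)
                    frontier.append(sup)
        return found


# Hypothetical example concepts, chosen only to echo the running example.
onto = Ontology(
    concepts={"Person", "Fugitive", "Hotel"},
    relations={"stays_at"},
    hierarchy={("Fugitive", "Person")},
    non_taxonomic={("stays_at", "Person", "Hotel")},
)
print(onto.superconcepts("Fugitive"))
```

Keeping H separate from Rel mirrors the definition above: the taxonomy can be traversed on its own when a query needs upper- or lower-level concepts.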

In this paper, we use a bottom-up method to construct the public security domain knowledge model (PSDKM). Through the analysis of police data, the concepts, terms and their relationships in the field of public security are clarified.

2.2 Query Expansion

Relational databases are usually accessed through the structured query language (SQL), which performs deterministic and precise queries: users need to construct accurate SQL statements with accurate query conditions, and the query results are correspondingly exact. However, such queries are based on string matching and take no advantage of the semantic relationships between data (such as synonymy and subordinate relationships).

Query expansion is one of the primary methods to implement semantic queries in relational databases [12]. It replaces the keywords in original query statements with related words, concepts, etc. to construct a new query statement.
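A minimal sketch of this idea follows, assuming the related terms are already available as a simple lookup table. The dictionary `RELATED_TERMS`, the function `expand_query`, and the table and column names are hypothetical and are not taken from CPDMP.

```python
# Hypothetical lookup of related terms; in CPDMP these would come from PSDKM.
RELATED_TERMS = {
    "drugs": ["narcotics", "controlled substances"],
}


def expand_query(table, column, keyword, related=RELATED_TERMS):
    """Build a parameterized query matching the keyword or any related term.

    `table` and `column` are assumed to be trusted identifiers; only the
    search terms are passed as bound parameters.
    """
    terms = [keyword] + related.get(keyword, [])
    placeholders = ", ".join("?" for _ in terms)
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return sql, terms


sql, params = expand_query("criminal_records", "offence_type", "drugs")
print(sql)
print(params)
```

A keyword with no related terms falls back to an exact-match query on the keyword alone.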

2.3 Running Example

Pursuing fugitives is a common police task, and police officers pay special attention to hotels. Figure 1 shows a data analysis model that is built with CPDMP and aims to identify fugitives at hotels. We use this model as a running example to illustrate the main functions of CPDMP. As demonstrated in the running example, the left-hand side lists the public security thematic databases (B1), the modeling interface is in the middle, and the right-hand side shows the related tools (B2) and their configuration windows (B3). The required data sources can be dragged and dropped into the modeling interface, and data models are built with a top-down structure. At the bottom of the modeling interface is a set of operations on data models (or model elements), including save, detail, delete a node, delete a connection, and reset (C1).

3 CPDMP Overview

In this section, we introduce CPDMP, a general, knowledge-enabled platform for customized police data modeling. First, we briefly introduce the implementation architecture of CPDMP in Subsect. 3.1. Then, we introduce PSDKM and CPDMP in detail in Subsect. 3.2 and Subsect. 3.3, respectively. Finally, we illustrate the use of CPDMP in Subsect. 3.4.

3.1 Implementation of CPDMP

Fig. 2. The implementation architecture of CPDMP.

The implementation architecture of CPDMP is shown in Fig. 2. The public security data center is a police data storage center built on the police security intranet (physically isolated from the Internet). The construction of the knowledge model and the output of the data model in CPDMP all depend on the support of the public security data center. The construction of CPDMP adopts the method of separating the front-end and back-end. The front end displays the platform and provides a user interface, and the back end provides business services support.

3.2 PSDKM

CPDMP was established to manage large amounts of police data. In particular, we construct several thematic databases and a knowledge model, PSDKM, to express the connections among police data.

According to the real requirements of a specific local Public Security Bureau, we have established eight thematic databases to support police officers’ daily work. Table 1 shows the detailed information (i.e., thematic name, tables involved, data size, main contents and data sources) of each thematic database. The eight thematic databases contain 211 tables in total.

Table 1. Detailed information of eight thematic databases.

To define and show potential relationships between data, we adopted a semi-automatic approach to build PSDKM [10, 13]. First, we restrict the scope of knowledge mainly to the 211 tables. Then, we identify data and their relationships in the databases. The steps are as follows.

  • Analyze the information of tables to sort out the domain scope and important concepts in PSDKM.

  • Extract important concepts and construct the conceptual framework of PSDKM.

  • Semi-automatically convert the conceptual framework of PSDKM into an OWL file (a knowledge model) using mapping rules.

  • Use Protégé to refine and evaluate the initial constructed knowledge model.

The relevant mapping rules are as follows.

  • Tables in relational databases are mapped to classes.

  • Columns of tables are mapped to properties of classes.

  • Each row of tables is mapped to an entity.

  • The value of each cell in tables is mapped to a property value. When a cell corresponds to a foreign key, we replace it with the entity or property pointed to by the foreign key.
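The four mapping rules can be sketched as follows. This is a simplified illustration over in-memory rows rather than the actual conversion to OWL; the function `map_table`, the identifier scheme `table/key`, and the example table are hypothetical. Treating foreign-key columns as object properties and all other columns as data properties is an assumption made here for illustration.

```python
def map_table(table_name, columns, rows, primary_key, foreign_keys=None):
    """Apply the four mapping rules to one relational table.

    `foreign_keys` maps a column name to the table it references.
    """
    foreign_keys = foreign_keys or {}
    individuals = {}
    for row in rows:
        # Rule 3: each row becomes an entity (individual).
        entity_id = f"{table_name}/{row[primary_key]}"
        values = {}
        for col in columns:
            if col in foreign_keys:
                # Rule 4 (foreign key): point to the referenced entity.
                values[col] = f"{foreign_keys[col]}/{row[col]}"
            else:
                # Rule 4: the cell value becomes a property value.
                values[col] = row[col]
        individuals[entity_id] = values
    return {
        "class": table_name,  # Rule 1: the table becomes a class.
        # Rule 2: columns become properties of the class.
        "data_properties": [c for c in columns if c not in foreign_keys],
        "object_properties": [c for c in columns if c in foreign_keys],
        "individuals": individuals,
    }


# Hypothetical table used only for illustration.
rows = [{"id": 1, "guest_name": "A", "hotel_id": 7}]
result = map_table("check_in", ["id", "guest_name", "hotel_id"], rows,
                   primary_key="id", foreign_keys={"hotel_id": "hotel"})
print(result)
```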

As shown in Fig. 3, we identified 178 classes, 1033 data properties and 791 object properties. Note that building PSDKM is a continuous, iterative process.

Fig. 3. An illustration of PSDKM.

3.3 CPDMP

CPDMP uses graphics to describe actual combat experience [14]. The data modeling process is visualized and can be modified or extended at any time.

Fig. 4. The connections between modules and the PSDKM in CPDMP.

As shown in Fig. 4, CPDMP consists of six modules (five functional modules and one application module) and a knowledge model (PSDKM).

Database Management. This module adapts to the usage requirements of different databases and allows police officers to view and operate on the original databases. To facilitate users’ operations, we provide a visual interface for viewing databases and for viewing, adding, and deleting tables. The data are organized into eight thematic databases. Taking “risk management” as an example, as shown in Table 2, this thematic database includes 13 tables, such as the basic information of face capture and camera positions. These tables come from different original databases, and the relevant data are synchronized to the “risk management” thematic database in real time.

Table 2. Risk management thematic database.

Modeling Canvas. This module is one of the core modules of CPDMP. It provides the interface on which police officers drag and drop elements (e.g., tables and linking tools) to build data models.

A table dragged into the canvas is represented as a node. The node format is specially designed to ensure that the data of every node is consistent. A node includes coordinate information, data information, connection information, etc.

CPDMP presents relationships between tables (nodes in Canvas) as connections (edges in Canvas). The format of a connection is shown in Table 3.

Table 3. The format of a connection.

Linking Tools. To build data models, it is necessary to connect different tables and generate new tables. In this module, we design 14 kinds of configurable linking tools, including 4 connection tools, 1 copy tool, 1 export tool, 1 sort tool, 1 de-duplication tool, 3 Boolean tools, 1 conditional filter tool, and 2 warning tools.

As shown in Table 4, the 4 connection tools are: inner connection, outer connection, left connection, and right connection.

Table 4. Four connection tools.
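A hedged sketch of how the four connection tools could correspond to SQL join clauses is given below. Mapping the “outer connection” to a full outer join is an assumption, and the function `build_join` and the table and key names are hypothetical.

```python
# Assumed correspondence between connection tools and SQL join keywords.
JOIN_KEYWORDS = {
    "inner": "INNER JOIN",
    "outer": "FULL OUTER JOIN",  # assumption: "outer connection" means full outer join
    "left": "LEFT JOIN",
    "right": "RIGHT JOIN",
}


def build_join(tool, left_table, right_table, left_key, right_key):
    """Return the SQL text that joins two tables with the chosen tool.

    Table and key names are assumed to be trusted identifiers.
    """
    keyword = JOIN_KEYWORDS[tool]
    return (f"SELECT * FROM {left_table} {keyword} {right_table} "
            f"ON {left_table}.{left_key} = {right_table}.{right_key}")


print(build_join("left", "hotel_check_in", "criminal_records",
                 "id_number", "id_number"))
```

Whether a given database engine accepts `RIGHT JOIN` or `FULL OUTER JOIN` depends on the engine and its version, so the generated text should be checked against the target database.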

The main function of the “copy tool” is to copy a connection node in order to adjust its output fields. The “export tool” is used to execute output results. The “sort tool” supports ascending and descending sorting, which can be configured according to practical needs. The “de-duplication tool” removes fields that are duplicated across tables and keeps only one of them for display. The “conditional filter tool” sets selection conditions on one or more fields of a table and returns the filtered data rows.

Table 5 shows the functions of “Boolean tools” and “warning tools”. Boolean tools include three types: intersection, union and difference set. Warning tools include conditional warning and timed warning.

Table 5. The functions of Boolean tools and warning tools.
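The three Boolean tools correspond naturally to SQL set operators. The sketch below illustrates this with an in-memory SQLite database; the function `apply_boolean_tool`, the table names, and the sample rows are hypothetical and do not reflect CPDMP's actual implementation or data.

```python
import sqlite3

# Assumed correspondence between Boolean tools and SQL set operators.
SET_OPERATORS = {
    "intersection": "INTERSECT",
    "union": "UNION",
    "difference": "EXCEPT",
}


def apply_boolean_tool(conn, tool, left_table, right_table, column):
    """Combine one column of two tables with the chosen set operator."""
    sql = (f"SELECT {column} FROM {left_table} "
           f"{SET_OPERATORS[tool]} "
           f"SELECT {column} FROM {right_table} ORDER BY 1")
    return [row[0] for row in conn.execute(sql)]


# Invented sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel_guests (person_id INTEGER)")
conn.execute("CREATE TABLE criminal_records (person_id INTEGER)")
conn.executemany("INSERT INTO hotel_guests VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO criminal_records VALUES (?)", [(2,), (3,), (4,)])

for tool in SET_OPERATORS:
    print(tool, apply_boolean_tool(conn, tool, "hotel_guests",
                                   "criminal_records", "person_id"))
```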

SQL Statements. The function of this module is to find the dependencies between nodes in a data model and execute data models.

A complete data model has a visually top-down structure. In order to discover the data model structure during execution, we start from the execution node and look up all dependent nodes in turn. In this way, we not only find all nodes in a data model, but also find the dependencies between nodes.
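The lookup described above can be sketched as a depth-first traversal that starts at the execution node and follows each node's inputs. The function `collect_dependencies` and the node names are hypothetical, and the sketch assumes the data model contains no cycles.

```python
def collect_dependencies(execution_node, inputs):
    """Return all nodes the execution node depends on, in executable order.

    `inputs` maps each node to the list of nodes that feed into it.
    Every node appears after all of the nodes it depends on.
    """
    ordered = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for upstream in inputs.get(node, []):
            visit(upstream)
        ordered.append(node)  # appended only after all of its inputs

    visit(execution_node)
    return ordered


# Hypothetical model: two tables joined, filtered, then exported.
inputs = {
    "export": ["filter"],
    "filter": ["inner_join"],
    "inner_join": ["hotel_check_in", "criminal_records"],
}
print(collect_dependencies("export", inputs))
```

Nodes that the execution node does not depend on are never visited, which matches the idea of executing only part of a data model from a chosen node.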

During the execution of data models, the information contained in data models is converted into SQL statements. This module supports two kinds of querying forms: single-keyword and multi-keyword.

In the single-keyword querying method, PSDKM is used to perform query expansion in order to improve the accuracy of the query results. The query expansion steps with PSDKM are as follows.

  • Extract the generated query statements, identify the keywords, replace them with related terms (synonyms and upper- and lower-level concepts), and generate new query statements.

  • Execute the newly generated query statements and output new querying results.

  • Compare new querying results with original querying results.

  • Classify the results of all queries.
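The steps above can be sketched end to end with an in-memory SQLite table. This is an illustration under stated assumptions: the related terms are passed in directly rather than looked up in PSDKM, the two result classes (`exact_match` and `expansion_only`) are one possible reading of the classification step, and the function, table, and sample rows are invented.

```python
import sqlite3


def run_expanded_query(conn, keyword, related_terms):
    """Run the original and expanded queries, then compare and classify."""
    # Original query: exact match on the keyword only.
    original = {row[0] for row in conn.execute(
        "SELECT person_id FROM criminal_records WHERE offence = ?",
        (keyword,))}

    # Expanded query: keyword plus its related terms.
    terms = [keyword] + list(related_terms)
    placeholders = ", ".join("?" for _ in terms)
    expanded = {row[0] for row in conn.execute(
        f"SELECT person_id FROM criminal_records "
        f"WHERE offence IN ({placeholders})", terms)}

    # Compare the two result sets and classify each hit.
    return {
        "exact_match": sorted(original),
        "expansion_only": sorted(expanded - original),
    }


# Invented sample rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE criminal_records (person_id INTEGER, offence TEXT)")
conn.executemany("INSERT INTO criminal_records VALUES (?, ?)",
                 [(1, "drugs"), (2, "narcotics"), (3, "theft")])

print(run_expanded_query(conn, "drugs", ["narcotics"]))
```

Keeping the expansion-only hits separate lets a user see which results were found only because of the semantic expansion.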

The multi-keyword querying method combines the concepts, individuals and relations in PSDKM. The keywords may include both concepts and individuals in PSDKM, where an individual is an instance of a concept. If the keywords contain an individual, we can directly find the concept corresponding to that individual and then infer the user’s query intention through the semantic relationships defined between concepts.

Model Supermarket. This module is designed to save and reuse the data models built by police officers. As shown in Table 6, a complete data model includes model ID, model title, model nodes, model connections, model creator, model description, and model type.

This information can be used to identify data models. If a data model is modified, the corresponding information of this data model will also be automatically updated.

Table 6. All fields of a data model stored in model supermarket.
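As a rough sketch, a stored data model with the fields listed above could be represented and serialized as follows. The class `DataModel`, the helper functions, the dictionary-based store, and the example values are hypothetical; the actual storage format of the model supermarket is not specified here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataModel:
    """The seven fields of a stored data model."""
    model_id: str
    title: str
    nodes: List[dict] = field(default_factory=list)
    connections: List[dict] = field(default_factory=list)
    creator: str = ""
    description: str = ""
    model_type: str = ""


def save_model(store, model):
    """Serialize the model to JSON; saving again overwrites the old record."""
    store[model.model_id] = json.dumps(asdict(model))


def load_model(store, model_id):
    """Rebuild a DataModel from its stored JSON record."""
    return DataModel(**json.loads(store[model_id]))


# Hypothetical example; a plain dict stands in for the real storage layer.
store = {}
model = DataModel(
    model_id="m-001",
    title="Fugitives at hotels",
    nodes=[{"id": "n1", "table": "hotel_check_in"}],
    connections=[],
    creator="officer_a",
    description="Example model for illustration",
    model_type="query",
)
save_model(store, model)
print(load_model(store, "m-001") == model)
```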

Intelligent Applications. The majority of police applications require real-time police data [15]. Executing the same data model at different times may yield different results. Therefore, CPDMP provides intelligent applications, such as early warning functions that send notifications to police officers.

3.4 Methodology

In this section, we illustrate the use of CPDMP with a UML activity diagram.

As shown in Fig. 5, a user first clarifies their specific needs based on their practical experience. Then, the user chooses whether to create a new data model or reuse an existing one. When creating a new data model, the user selects tables, selects linking tools, configures the tools, and connects tools and tables. To ensure the reliability of the created data model, the user can execute part of it during the modeling process. After connecting tools and tables, the user chooses whether to save the model or execute the query directly.

Fig. 5. The methodology of employing CPDMP.

When choosing to reuse an existing data model, the user can select an existing data model in the model supermarket. Then, the selected data model will be presented in the modeling canvas. The user can modify the data model or use the data model directly. In the process of modifying data models, the user can perform the same operations as creating a new data model.

4 Experiments

In this section, we evaluate CPDMP with a specific use case. Considering the sensitivity of police data, we conducted experiments with our collaboration partner “Sucheng branch of Suqian Public Security Bureau” (S2PSB). Four grassroots police officers were invited to take part in the experiments.

4.1 Experiments Setting

The four participating police officers have rich practical experience. They range in age from 28 to 35 and have at least three years of working experience. Before the experiments started, we gave the four police officers a two-hour training lecture, which included data presentation, a description of the linking tools, a modeling demonstration, etc. We then gave them one hour to become acquainted with CPDMP and to ask questions. After that, we distributed a specific modeling requirement to the four police officers. They were asked to complete the modeling task and output the results within one hour.

Fig. 6. The experimental data model.

The experimental data model describes the process of finding people with criminal records among hotel occupants. We compared the results obtained by CPDMP and by the traditional SQL-based querying method (currently used in S2PSB) to verify the integrity of CPDMP. To verify that CPDMP is easier to use than the traditional SQL-based querying method, we collected the subjective opinions of the four police officers with a questionnaire after the experiment. On completing the experiments, the officers were asked to answer this predefined questionnaire immediately. The questionnaire contains 15 questions (e.g., Is the visualized data easy to understand? How visible is the modeling process?). Answers to these questions are used to measure the degree of satisfaction. For each question, police officers choose a value between 0 and 100 as the answer. Different value ranges indicate different levels of satisfaction (below 60 means dissatisfied, between 60 and 80 means neutral, and above 80 means satisfied). Finally, we collect the responses of each police officer and calculate the average over all questions as the final evaluation criterion.
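The scoring scheme can be sketched as below. The text does not say how scores of exactly 60 or 80 are classified, so the boundary handling here is an assumption, as are the function names; the sample scores are invented and are not experimental data.

```python
def satisfaction_level(score):
    """Map a 0-100 score to a satisfaction band."""
    if score < 60:
        return "dissatisfied"
    if score < 80:   # assumption: exactly 60 counts as neutral
        return "neutral"
    return "satisfied"  # assumption: exactly 80 counts as satisfied


def evaluate(answers):
    """Average one officer's answers and classify the mean."""
    mean = sum(answers) / len(answers)
    return mean, satisfaction_level(mean)


# Invented scores for illustration; the real questionnaire has 15 questions.
print(evaluate([90, 70, 80]))
```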

An example of the data model is shown in Fig. 6. From the investigation and analysis of police officers, we know that the persons with higher crime rates at hotels are those involved in pornography and drugs. We therefore only use persons with these two types of criminal records as the results of the experiment, to meet the actual needs.

4.2 Data Set

In the current study, we use practical data from S2PSB. In the experiment, we use the records of hotel occupants in Suqian City in February 2022 as the dataset. The dataset contains a total of 298,657 records. The data include the hotel information, identity information and check-in information of each hotel occupant.

In addition, we also used some other data tables as the input of the data model, including the basic information of personnel, hotel unit information, Sucheng Police Station Information and hotel check-in form in S2PSB. Considering the confidentiality of police data and to prevent data leakage, we do not display detailed data in experiments.

4.3 Experiment Results and Analysis

Table 7 shows the querying results of CPDMP and the traditional SQL-based querying method. For different types of persons with criminal records, the number of results obtained by CPDMP is higher than that of the traditional SQL-based querying method.

The experiment results show that CPDMP outperforms the traditional SQL-based querying method in the integrity of querying results. In practical applications, the traditional SQL-based querying method requires police officers and technicians to work together to perform a query. With CPDMP, police officers can build data models and obtain querying results by themselves. Table 8 shows the statistics on answers to the questionnaire. Through the analysis of these answers, we found that police officers considered CPDMP superior to the traditional method in terms of convenience. This also indicates that CPDMP outperforms the traditional SQL-based querying method in ease of use.

Table 7. Experiment results.
Table 8. The results of the questionnaire statistics.

5 Related Work

Big data analysis is a hot issue in many application areas. In the context of public security, Yu et al. [1] built a unified police big data analysis platform to provide strong support for the public security bureau in carrying out various police activities. Alic et al. [16] focused on data analysis in the field of public transportation and developed a platform “BIGSEA” that can process data under various constraints. Khorshidi et al. [17] presented how interpretable models can be constructed of police officer risk assessments and discussed issues of fairness that may arise when constructing models for police officer complaints and misconduct. Xi et al. [18] optimized the data visualization design model from the perspective of user experience, and combined human vision and cognitive laws to study the police system data visualization design strategy to guide the design practice. However, these methods are only simple data analyses and do not consider the deep-level characteristics of police data.

Many studies on knowledge models have been conducted with the goal of enhancing the use of domain knowledge. Sitar-Taut et al. [19] proposed a recommender pipeline integrating domain-specific modeling, knowledge graphs, and a multi-criteria decision method that bridges the decision-makers’ priorities with the customer-facing output. Zheng et al. [20] adopted a combination of top-down and bottom-up methods to build a knowledge model, and applied the knowledge model to the field of hazardous chemical management. Luo et al. [21] constructed a large-scale E-commerce Cognitive Concept Net named “AliCoCo”, which is practiced in Alibaba, the largest Chinese e-commerce platform in the world. Donalds et al. [22] presented and illustrated a new cybercrime knowledge model that incorporates multiple perspectives and offers a more holistic viewpoint for cybercrime classification. However, these knowledge models do not provide a solution to the semantic inconsistency problem in police data. Considering the heterogeneity and sensitivity of police data, a specific domain knowledge model (ontology) needs to be built, iteratively improved, and extended in a customized way.

Taking query expansion into consideration, Alfred et al. [23] focused on ontology-based query expansion methods in the field of agriculture. The method expands the original query by adopting new words and helps users to search for satisfactory results of queries. Jain et al. [24] proposed an information query method based on fuzzy ontology. The fuzzy ontology is constructed by using domain-specific knowledge. Based on the constructed fuzzy ontology, the most semantically relevant words to the query are identified and the query is expanded. However, there is still a lack of applications in the field of public security.

6 Conclusion

This paper presents the development of a customized police data modeling platform. First, after systematically analyzing the police data from 211 tables collected by a public security bureau branch, we propose a police domain knowledge model. Then, we detail the architecture and functions of the customized modeling platform, mainly through its six core modules. Next, we use a UML activity diagram to illustrate how the customized modeling platform is used. Finally, we evaluate CPDMP and demonstrate its feasibility and integrity by carrying out experiments with police officers.

In future work, we will focus on improving the scalability of CPDMP, continuously improving the quality of PSDKM, and building more applicable models with CPDMP.