Abstract
With the rapid development of information and communication technologies, massive amounts of data are continuously generated across all aspects of society. As a key government department, the public security bureau holds many kinds of heterogeneous data. In-depth analysis of these data helps to detect and prevent public security cases and to maintain social stability. Grassroots police officers therefore urgently need better ways to manage and use these data. To address this need, we present the design and implementation of a customized data modeling platform. The platform provides a visual interface through which police officers can gain a better overview and understanding of the collected data and build data analysis models by drag and drop. As a core component of the platform, we built a public security domain knowledge model after analyzing 211 tables of practical police data. In cooperation with the Sucheng branch of Suqian Public Security Bureau, we conducted a set of experiments with police officers on real police data. The results show that the modeling platform is more user-friendly than the traditional SQL-based querying method and outperforms it in the integrity of querying results.
1 Introduction
In recent years, the continued development of information technologies such as big data analysis and artificial intelligence has promoted the construction of public security informatization [1]. Massive amounts of heterogeneous data are quickly generated in all aspects of society. To make better use of these data, the public security bureau has formed a basic public security information network and built big data centers.
With the advent of the big data era, traditional data processing methods and technologies can no longer keep up with growing data demands, and explosive data growth has forced people to seek new data processing methods [2,3,4,5]. This demand is particularly prominent in the field of public security [6, 7]. The public security bureau holds a large number of data sources, such as public security business systems (e.g., criminal records, sentence records), government management systems (e.g., tax records, bank records), Internet data (e.g., online shopping records, online chat records), social data (e.g., travel records), etc. We refer to all data collected by the public security bureau as police data.
However, the wide range of police data sources and the continuous emergence of new ones hinder grassroots police officers from thoroughly understanding police data. The lack of correlations between data from different sources has created more and more data islands [8], which severely hampers the use of police data. Several intelligent police data management platforms already exist, such as the Tianhe Big Data Platform, which provides a visual operating environment in which data processing and analysis workflows are configured and built by dragging and dropping pre-defined elements. However, because unified specifications and standards for police data are lacking, these systems are usually developed for specific police data sets and show poor generality and reusability. In addition, the data query methods provided by existing visualization platforms are mainly based on string matching [9], which often leads to inaccurate querying results because of semantic inconsistency.
In this paper, we propose and implement a customized police data modeling platform (CPDMP). With CPDMP, police officers can better understand multi-source data in a visualized environment and convert their practical experience into customized digital models. To better organize multi-source data and eliminate semantic inconsistency in CPDMP, we construct a public security domain knowledge model (PSDKM) from the collected police data. We also add query expansion capabilities to CPDMP to enhance its usability.
The three main contributions of this paper are as follows.
- We built eight police thematic databases and a police domain knowledge model to help police officers better understand and use multi-source data.
- We implemented a customized modeling platform to provide a visual method to help police officers convert their own experiences into digital models.
- The domain knowledge model, which brings query expansion capability to CPDMP, can be further extended as a foundation.
The rest of the paper is organized as follows. In Sect. 2, we present the related concepts and a running example. In Sect. 3, we illustrate the architecture and main functions of CPDMP, and detail PSDKM. Experiment settings and results analysis are given in Sect. 4. In Sect. 5, we present the related work. Finally, we conclude in Sect. 6.
2 Related Concepts and Running Example
This section introduces the concepts behind our work, namely ontology (Sect. 2.1) and query expansion (Sect. 2.2), followed by a running example (Sect. 2.3, Fig. 1).
2.1 Ontology
An ontology is a conceptual model that describes the concepts of a domain and the relationships between them [10]. The structure of an ontology is a five-tuple [11] O := {C, R, H, Rel, A}, where C and R are two disjoint sets: the elements of C are called concepts; R is a set of relationships between elements of C; H represents the concept hierarchy, that is, the taxonomic relations between concepts; Rel represents the non-taxonomic relations between concepts; and A is the set of ontological axioms.
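As a minimal sketch, the five-tuple can be written down as a small Python data structure. The class layout and every concept and relation name below (Person, Fugitive, stays_at, and so on) are hypothetical illustrations and are not taken from PSDKM.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Five-tuple O := {C, R, H, Rel, A}; all contents are illustrative."""
    concepts: set = field(default_factory=set)        # C: concept names
    relations: set = field(default_factory=set)       # R: relation names
    hierarchy: set = field(default_factory=set)       # H: (child, parent) pairs
    non_taxonomic: set = field(default_factory=set)   # Rel: (subject, relation, object)
    axioms: list = field(default_factory=list)        # A: axioms, kept as plain text here

    def ancestors(self, concept):
        """Follow H upwards and return every superconcept of `concept`."""
        found = set()
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for child, parent in self.hierarchy:
                if child == current and parent not in found:
                    found.add(parent)
                    frontier.append(parent)
        return found


# Hypothetical toy ontology, not a fragment of PSDKM.
toy = Ontology(
    concepts={"Person", "Suspect", "Fugitive", "Hotel"},
    relations={"stays_at"},
    hierarchy={("Suspect", "Person"), ("Fugitive", "Suspect")},
    non_taxonomic={("Person", "stays_at", "Hotel")},
    axioms=["C and R are disjoint"],
)
print(sorted(toy.ancestors("Fugitive")))  # ['Person', 'Suspect']
```

Keeping H as explicit (child, parent) pairs makes the upper- and lower-level lookups needed later for query expansion a simple traversal.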
In this paper, we use a bottom-up method to construct the public security domain knowledge model (PSDKM). Through the analysis of police data, the concepts, terms and their relationships in the field of public security are clarified.
2.2 Query Expansion
A relational database is usually accessed through the Structured Query Language (SQL). SQL queries are deterministic and precise: users must construct exact query statements with exact conditions, and the results are exact matches. However, such queries rely on string matching and take no advantage of semantic relationships between data (such as synonymy or subordinate relationships).
Query expansion is one of the primary methods to implement semantic queries in relational databases [12]. It replaces the keywords in original query statements with related words, concepts, etc. to construct a new query statement.
2.3 Running Example
Pursuing fugitives is a common police task, and police officers pay special attention to hotels. Figure 1 shows a data analysis model that was built with CPDMP to identify fugitives at hotels. We use this model as a running example to illustrate the main functions of CPDMP. In the interface shown in the running example, the left-hand side lists the public security thematic databases (B1), the modeling interface is in the middle, and the right-hand side shows the related tools (B2) and their configuration windows (B3). The required data sources can be dragged and dropped into the modeling interface, and data models are built in a top-down structure. At the bottom of the modeling interface is a set of operations on data models and model elements (C1): save, detail, delete a node, delete a connection, and reset.
3 CPDMP Overview
In this section, we introduce CPDMP, which is a general knowledge-enabled police data customized modeling platform. First, we briefly introduce the implementation architecture of CPDMP in Subsect. 3.1. Then, we introduce PSDKM and CPDMP in detail in Subsect. 3.2 and Subsect. 3.3, respectively. Finally, we illustrate the use of CPDMP in Subsect. 3.4.
3.1 Implementation of CPDMP
The implementation architecture of CPDMP is shown in Fig. 2. The public security data center is a police data storage center built on the police security intranet (physically isolated from the Internet). The construction of the knowledge model and the output of the data model in CPDMP all depend on the support of the public security data center. The construction of CPDMP adopts the method of separating the front-end and back-end. The front end displays the platform and provides a user interface, and the back end provides business services support.
3.2 PSDKM
CPDMP was established to manage large amounts of police data. In particular, CPDMP includes several thematic databases and a knowledge model, PSDKM, that expresses the connections among police data.
According to the real requirements of a local public security bureau, we established eight thematic databases to support police officers’ daily work. Table 1 shows the details (i.e., thematic name, tables involved, data size, main contents and data sources) of each thematic database. The eight thematic databases contain 211 tables in total.
To define and expose potential relationships between data, we adopted a semi-automatic approach to build PSDKM [10, 13]. First, we restrict the scope of knowledge mainly to the 211 tables. Then, we identify data and their relationships in the databases. The steps are as follows.
- Analyze the table information to determine the domain scope and the important concepts in PSDKM.
- Extract important concepts and construct the conceptual framework of PSDKM.
- Semi-automatically convert the conceptual framework of PSDKM into an OWL file (a knowledge model) using mapping rules.
- Use Protégé to refine and evaluate the initially constructed knowledge model.
The relevant mapping rules are as follows.
- Tables in relational databases are mapped to classes.
- Columns of tables are mapped to properties of classes.
- Each row of tables is mapped to an entity.
- The value of each cell in tables is mapped to a property value. When a cell corresponds to a foreign key, we replace it with the entity or property pointed to by the foreign key.
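The four mapping rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `map_table`, the `check_in` table and its columns are hypothetical, the output is a plain dictionary rather than an OWL file, and treating foreign-key columns as object properties and all other columns as data properties is our reading of the rules, not a detail confirmed by the text.

```python
def map_table(table_name, columns, rows, primary_key, foreign_keys=None):
    """Apply the four mapping rules to one table (illustrative sketch).

    foreign_keys maps a column name to the table (class) it references.
    """
    foreign_keys = foreign_keys or {}
    ontology_class = {
        "class": table_name,  # rule 1: table -> class
        # rule 2: columns -> properties (assumed split into data/object properties)
        "data_properties": [c for c in columns if c not in foreign_keys],
        "object_properties": [c for c in columns if c in foreign_keys],
    }
    entities = []
    for row in rows:  # rule 3: each row -> one entity
        entity = {"@id": f"{table_name}/{row[primary_key]}", "@type": table_name}
        for column in columns:
            value = row[column]  # rule 4: cell value -> property value
            if column in foreign_keys:
                # rule 4, foreign key: point at the referenced entity instead
                value = f"{foreign_keys[column]}/{value}"
            entity[column] = value
        entities.append(entity)
    return ontology_class, entities


# Hypothetical table; not one of the 211 tables analyzed in the paper.
check_in_class, check_in_entities = map_table(
    table_name="check_in",
    columns=["record_id", "guest_id", "hotel_id", "check_in_date"],
    rows=[{"record_id": 1, "guest_id": 42, "hotel_id": 7,
           "check_in_date": "2022-02-01"}],
    primary_key="record_id",
    foreign_keys={"guest_id": "person", "hotel_id": "hotel"},
)
print(check_in_class)
print(check_in_entities[0])
```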
As shown in Fig. 3, we obtained 178 classes, 1033 data properties and 791 object properties. Note that building PSDKM is a continuous, iterative process.
3.3 CPDMP
CPDMP represents police officers’ practical field experience graphically [14]. The data modeling process is visualized and can be modified or extended at any time.
As shown in Fig. 4, CPDMP consists of six modules (five functional modules and one application module) and a knowledge model (PSDKM).
Database Management. This module adapts the platform to the usage requirements of different databases. It allows police officers to view and operate on the original databases. To simplify these operations, we provide a visual interface for viewing databases and for viewing, adding and deleting tables. We organized the data into eight thematic databases. Taking “risk management” as an example, as shown in Table 2, this thematic database includes 13 tables, such as basic face-capture information and camera positions. These tables come from different original databases, and the relevant data is synchronized to the “risk management” thematic database in real time.
Modeling Canvas. This module is one of the core modules of CPDMP. It provides the interface on which police officers drag and drop elements (e.g., tables and linking tools) to build data models.
A table dragged onto the canvas becomes a node. The node format is specially designed so that the data of every node is structured consistently. A node includes coordinate information, data information, connection information, etc.
CPDMP presents relationships between tables (nodes in Canvas) as connections (edges in Canvas). The format of a connection is shown in Table 3.
Linking Tools. Building data models requires connecting different tables and generating new tables. For this purpose, the module offers 14 kinds of configurable linking tools: 4 connection tools, 1 copy tool, 1 export tool, 1 sort tool, 1 de-duplication tool, 3 Boolean tools, 1 conditional filter tool, and 2 warning tools.
As shown in Table 4, the 4 connection tools are: inner connection, outer connection, left connection, and right connection.
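To make the role of the connection tools concrete, the sketch below maps each tool to a SQL join keyword and builds the corresponding statement. It rests on assumptions: the function `connection_to_sql` and the table and column names are hypothetical, and reading "outer connection" as a full outer join is our interpretation rather than a detail confirmed by the text.

```python
# Assumed correspondence between connection tools and SQL join keywords.
JOIN_KEYWORDS = {
    "inner": "INNER JOIN",
    "outer": "FULL OUTER JOIN",  # assumption: "outer connection" means a full outer join
    "left": "LEFT JOIN",
    "right": "RIGHT JOIN",
}


def connection_to_sql(tool, left_table, right_table, left_key, right_key):
    """Build a SELECT statement for one configured connection tool.

    Identifiers are interpolated directly, so they must come from the
    platform's own table catalogue and never from free-text user input.
    """
    if tool not in JOIN_KEYWORDS:
        raise ValueError(f"unknown connection tool: {tool}")
    return (
        f"SELECT * FROM {left_table} "
        f"{JOIN_KEYWORDS[tool]} {right_table} "
        f"ON {left_table}.{left_key} = {right_table}.{right_key}"
    )


# Hypothetical tables and key columns.
sql = connection_to_sql("left", "hotel_check_in", "person", "id_number", "id_number")
print(sql)
```

A left connection keeps every check-in row even when no matching person row exists, which is often the useful choice when the goal is to spot missing information.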
The “copy tool” copies a connection node so that its output fields can be adjusted. The “export tool” executes the model and outputs the results. The “sort tool” sorts in ascending or descending order, as configured. The “de-duplication tool” removes fields that appear in more than one table and keeps only one of them for display. The “conditional filter tool” sets selection conditions on one or more fields of a table and returns the rows that satisfy them.
Table 5 shows the functions of “Boolean tools” and “warning tools”. Boolean tools include three types: intersection, union and difference set. Warning tools include conditional warning and timed warning.
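The three Boolean tools correspond naturally to SQL set operators. The following sketch, which assumes an in-memory SQLite database and two hypothetical single-column tables, shows how intersection, union and difference could be executed; it is not the platform's actual implementation.

```python
import sqlite3

# Assumed mapping from Boolean tools to SQL set operators.
SET_OPERATORS = {
    "intersection": "INTERSECT",
    "union": "UNION",
    "difference": "EXCEPT",
}


def apply_boolean_tool(conn, tool, left_query, right_query):
    """Combine two single-column queries with the chosen set operator."""
    sql = f"{left_query} {SET_OPERATORS[tool]} {right_query}"
    return sorted(row[0] for row in conn.execute(sql))


# Hypothetical tables and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel_guest (id_number TEXT);
    CREATE TABLE criminal_record (id_number TEXT);
    INSERT INTO hotel_guest VALUES ('A1'), ('A2'), ('A3');
    INSERT INTO criminal_record VALUES ('A2'), ('A3'), ('A4');
""")
guests = "SELECT id_number FROM hotel_guest"
records = "SELECT id_number FROM criminal_record"

print(apply_boolean_tool(conn, "intersection", guests, records))  # ['A2', 'A3']
print(apply_boolean_tool(conn, "union", guests, records))         # ['A1', 'A2', 'A3', 'A4']
print(apply_boolean_tool(conn, "difference", guests, records))    # ['A1']
```

The intersection case mirrors the kind of model used in the experiments: hotel occupants who also appear in a table of criminal records.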
SQL Statements. This module finds the dependencies between nodes in a data model and executes data models.
A complete data model has a visually top-down structure. In order to discover the data model structure during execution, we start from the execution node and look up all dependent nodes in turn. In this way, we not only find all nodes in a data model, but also find the dependencies between nodes.
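The lookup from the execution node can be sketched as a depth-first traversal over the connections. The representation below (a dictionary from each node to the nodes it depends on) and the node names are assumptions made for illustration.

```python
def resolve_dependencies(execution_node, upstream):
    """Return every node the execution node depends on, dependencies first.

    `upstream` maps a node to the list of nodes feeding into it.
    """
    ordered = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in upstream.get(node, []):
            visit(parent)
        ordered.append(node)  # appended only after all its dependencies

    visit(execution_node)
    return ordered


# Hypothetical data model: two tables feed an intersection, which feeds an export.
upstream = {
    "export": ["intersection"],
    "intersection": ["hotel_check_in", "criminal_records"],
    "hotel_check_in": [],
    "criminal_records": [],
    "unused_table": [],  # on the canvas but not connected to the execution node
}
print(resolve_dependencies("export", upstream))
# ['hotel_check_in', 'criminal_records', 'intersection', 'export']
```

Because the result lists dependencies before the nodes that use them, it can serve directly as an execution order, and nodes that are not reachable from the execution node are left out.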
During execution, the information contained in a data model is converted into SQL statements. This module supports two forms of querying: single-keyword and multi-keyword.
In the single-keyword-based querying method, to improve the accuracy of querying results, PSDKM is involved to perform query expansion. The query expansion steps with PSDKM are as follows.
- Extract the generated query statements, identify the keywords, replace them with related terms (synonyms and upper- and lower-level concepts), and generate new query statements.
- Execute the newly generated query statements and output the new querying results.
- Compare the new querying results with the original querying results.
- Classify the results of all queries.
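The steps above can be sketched with an in-memory SQLite database. Everything specific here is an assumption for illustration: the `RELATED_TERMS` dictionary stands in for lookups against PSDKM, the `criminal_record` table and its rows are invented, and classifying results into "original" and "added by expansion" is only one possible reading of the final step.

```python
import sqlite3

# Hypothetical stand-in for knowledge-model lookups (not PSDKM content).
RELATED_TERMS = {
    "drugs": {"synonyms": ["narcotics"], "narrower": ["heroin trafficking"]},
}


def expand_keyword(keyword):
    """Step 1: the keyword plus its synonyms and upper/lower-level terms."""
    terms = [keyword]
    related = RELATED_TERMS.get(keyword, {})
    for group in ("synonyms", "broader", "narrower"):
        for term in related.get(group, []):
            if term not in terms:
                terms.append(term)
    return terms


def run_query(conn, terms):
    """Execute one parameterized query matching any of the given terms."""
    placeholders = ", ".join("?" for _ in terms)
    sql = f"SELECT person_id FROM criminal_record WHERE offence IN ({placeholders})"
    return {row[0] for row in conn.execute(sql, terms)}


def expanded_search(conn, keyword):
    original = run_query(conn, [keyword])                # original statement
    expanded = run_query(conn, expand_keyword(keyword))  # steps 1 and 2
    # Steps 3 and 4: compare both result sets and classify the rows.
    return {
        "original": sorted(original),
        "added_by_expansion": sorted(expanded - original),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE criminal_record (person_id TEXT, offence TEXT)")
conn.executemany(
    "INSERT INTO criminal_record VALUES (?, ?)",
    [("P1", "drugs"), ("P2", "narcotics"),
     ("P3", "heroin trafficking"), ("P4", "theft")],
)
print(expanded_search(conn, "drugs"))
# {'original': ['P1'], 'added_by_expansion': ['P2', 'P3']}
```

In this toy data, plain string matching finds only P1, while the expanded query also returns P2 and P3, which is the kind of gain in result integrity the platform aims for.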
The multi-keyword querying method combines concepts, individuals and relations in PSDKM. The keywords may include concepts and individuals, where an individual is an instance of a concept. If the keywords contain an individual, we can directly find the concept that corresponds to it and then infer the user’s query intention through the semantic relationships defined between concepts.
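A rough sketch of this idea follows. The two dictionaries are hypothetical stand-ins for knowledge-model lookups, and the identifiers `person_042` and `checks_in_at` are invented; the code only illustrates resolving individuals to concepts and then collecting the relations defined between those concepts.

```python
# Hypothetical knowledge-model fragment (not taken from PSDKM).
INSTANCE_OF = {"person_042": "Person", "hotel_007": "Hotel"}
CONCEPT_RELATIONS = {("Person", "Hotel"): "checks_in_at"}


def infer_intent(keywords):
    """Map each keyword to a concept, then list relations linking those concepts."""
    concepts = []
    for keyword in keywords:
        # An individual resolves to its concept; a concept keyword stays as it is.
        concept = INSTANCE_OF.get(keyword, keyword)
        if concept not in concepts:
            concepts.append(concept)
    relations = []
    for subject in concepts:
        for obj in concepts:
            relation = CONCEPT_RELATIONS.get((subject, obj))
            if relation:
                relations.append((subject, relation, obj))
    return concepts, relations


print(infer_intent(["person_042", "Hotel"]))
# (['Person', 'Hotel'], [('Person', 'checks_in_at', 'Hotel')])
```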
Model Supermarket. This module saves data models built by police officers so that they can be reused. As shown in Table 6, a complete data model includes model ID, model title, model nodes, model connections, model creator, model description, and model type.
This information can be used to identify data models. If a data model is modified, the corresponding information of this data model will also be automatically updated.
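A stored data model with the seven fields listed above could be represented as follows. This is a minimal sketch: the class names, the JSON serialization, the in-memory store and the sample values are all assumptions, and the inner layout of nodes and connections is invented because it is not spelled out in the text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DataModel:
    """The seven fields named in the text; nested layouts are assumed."""
    model_id: str
    title: str
    nodes: list
    connections: list
    creator: str
    description: str
    model_type: str


class ModelSupermarket:
    """In-memory stand-in for the model store (illustrative only)."""

    def __init__(self):
        self._models = {}

    def save(self, model):
        # Saving under an existing ID overwrites the stored information.
        self._models[model.model_id] = json.dumps(asdict(model))

    def load(self, model_id):
        return DataModel(**json.loads(self._models[model_id]))

    def __len__(self):
        return len(self._models)


# Hypothetical model; values are placeholders.
market = ModelSupermarket()
model = DataModel(
    model_id="m-001",
    title="Fugitives at hotels",
    nodes=[{"id": "n1", "table": "hotel_check_in", "x": 120, "y": 40}],
    connections=[],
    creator="officer_a",
    description="Match hotel occupants against fugitive records",
    model_type="early warning",
)
market.save(model)
model.title = "Fugitives at hotels (v2)"
market.save(model)  # same ID, so the stored record is updated
print(market.load("m-001").title)  # Fugitives at hotels (v2)
```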
Intelligent Applications. Most police applications require real-time police data [15]. Executing the same data model at different times may produce different results. Therefore, CPDMP provides intelligent applications, such as early warning functions that notify police officers.
3.4 Methodology
In this section, we illustrate the use of CPDMP with a UML activity diagram.
As shown in Fig. 5, a user first clarifies their specific needs based on practical experience. Then, the user chooses whether to create a new data model or reuse an existing one. When creating a new data model, the user selects tables, selects linking tools, configures the tools, and connects tools and tables. To ensure the reliability of the created data model, the user can execute part of it during the modeling process. After connecting tools and tables, the user can either save the model or execute the query directly.
When choosing to reuse an existing data model, the user can select an existing data model in the model supermarket. Then, the selected data model will be presented in the modeling canvas. The user can modify the data model or use the data model directly. In the process of modifying data models, the user can perform the same operations as creating a new data model.
4 Experiments
In this section, we evaluate CPDMP with a specific use case. Considering the sensitivity of police data, we conducted the experiments with our collaboration partner, the Sucheng branch of Suqian Public Security Bureau (S2PSB). Four grassroots police officers were invited to take part.
4.1 Experiments Setting
The four participating police officers have rich practical field experience. They range in age from 28 to 35 and have at least three years of working experience. Before the experiments started, we gave the four police officers a two-hour training lecture, which included data presentation, a description of the linking tools, a modeling demonstration, etc. We then gave them one hour to get acquainted with CPDMP and to ask questions. After that, we gave the four police officers a specific modeling requirement. They were asked to complete the modeling task and output the results within one hour.
The experimental data model describes the process of finding people with criminal records among hotel occupants. We compared the results obtained by CPDMP with those of the traditional SQL-based querying method (currently used in S2PSB) to verify the integrity of the results returned by CPDMP. To verify that CPDMP is easier to use than the traditional SQL-based querying method, we collected subjective opinions from the four police officers with a predefined questionnaire, which they were asked to answer immediately after completing the experiments. The questionnaire contains 15 questions (e.g., Is the visualized data easy to understand? How visible is the modeling process?). The answers are used to measure the degree of satisfaction. For each question, police officers choose a value between 0 and 100. Different value ranges indicate different levels of satisfaction (below 60 means dissatisfied, between 60 and 80 means neutral, and above 80 means satisfied). Finally, we collect the responses of each police officer and calculate the average over all questions as the final evaluation criterion.
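The scoring rule can be sketched as below. The answers used in the example are invented for illustration and are not the study's data. The text does not say how scores of exactly 60 or 80 are classified, so the sketch assumes that 60 counts as neutral and 80 as satisfied.

```python
from statistics import mean


def satisfaction_level(score):
    """Map a 0-100 score to a satisfaction level.

    Assumption: boundary values are not specified in the text; here a score
    of exactly 60 is treated as neutral and exactly 80 as satisfied.
    """
    if score < 60:
        return "dissatisfied"
    if score < 80:
        return "neutral"
    return "satisfied"


def evaluate(answers):
    """Average one officer's answers and classify the result."""
    if not answers or any(not 0 <= a <= 100 for a in answers):
        raise ValueError("answers must be non-empty values between 0 and 100")
    average = mean(answers)
    return average, satisfaction_level(average)


# Invented answers to 15 questions, for illustration only.
example_answers = [90] * 10 + [75] * 5
print(evaluate(example_answers))
```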
An example of the data model is shown in Fig. 6. From the investigation and analysis of police officers, we know that the people with higher crime rates at hotels are those involved in pornography and drugs. Therefore, to meet actual needs, we use only these two types of criminal records as the results of the experiment.
4.2 Data Set
In this study, we use practical data from S2PSB. The dataset consists of the hotel occupants in Suqian City in February 2022 and contains a total of 298,657 records. The data includes hotel information, identity information and check-in information for each hotel occupant.
In addition, we used several other data tables as input to the data model, including the basic information of personnel, hotel unit information, Sucheng police station information and the hotel check-in form in S2PSB. Because police data is confidential and must not be leaked, we do not display detailed data in the experiments.
4.3 Experiment Results and Analysis
Table 7 shows the querying results of CPDMP and the traditional SQL-based querying method. For different types of persons with criminal records, the number of results obtained by CPDMP is higher than that of the traditional SQL-based querying method.
The experiment results show that CPDMP outperforms the traditional SQL-based querying method in the integrity of querying results. In practical applications, the traditional SQL-based querying method requires police officers and technicians to work together to perform a query. With CPDMP, police officers can build data models and obtain querying results by themselves. Table 8 shows the statistics on the answers to the questionnaire. From these answers, we found that police officers considered CPDMP superior to the traditional method in terms of convenience. This indicates that CPDMP is also easier to use than the traditional SQL-based querying method.
5 Related Work
Big data analysis is a hot issue in many application areas. In the context of public security, Yu et al. [1] built a unified police big data analysis platform to provide strong support for the public security bureau in carrying out various police activities. Alic et al. [16] focused on data analysis in the field of public transportation and developed a platform, “BIGSEA”, that can process data under various constraints. Khorshidi et al. [17] showed how interpretable models of police officer risk assessments can be constructed and discussed issues of fairness that may arise when constructing models for police officer complaints and misconduct. Xi et al. [18] optimized the data visualization design model from the perspective of user experience, and combined human vision and cognitive laws to study data visualization design strategies for police systems to guide design practice. However, these methods perform only simple data analyses and do not consider the deep-level characteristics of police data.
Many studies on knowledge models have been conducted with the goal of enhancing the use of domain knowledge. Sitar-Taut et al. [19] proposed a recommender pipeline integrating domain-specific modeling, knowledge graphs, and a multi-criteria decision method that bridges the decision-makers’ priorities with the customer-facing output. Zheng et al. [20] adopted a combination of top-down and bottom-up methods to build a knowledge model, and applied the knowledge model to the field of hazardous chemical management. Luo et al. [21] constructed a large-scale e-commerce cognitive concept net named “AliCoCo”, which is used in practice at Alibaba, the largest Chinese e-commerce platform. Donalds et al. [22] presented and illustrated a new cybercrime knowledge model that incorporates multiple perspectives and offers a more holistic viewpoint for cybercrime classification. However, these knowledge models do not provide a solution to the semantic inconsistency problem in police data. Considering the heterogeneity and sensitivity of police data, a specific domain knowledge model (ontology) needs to be built, iteratively improved and extended in a customized way.
Taking query expansion into consideration, Alfred et al. [23] focused on ontology-based query expansion methods in the field of agriculture. The method expands the original query by adopting new words and helps users to search for satisfactory results of queries. Jain et al. [24] proposed an information query method based on fuzzy ontology. The fuzzy ontology is constructed by using domain-specific knowledge. Based on the constructed fuzzy ontology, the most semantically relevant words to the query are identified and the query is expanded. However, there is still a lack of applications in the field of public security.
6 Conclusion
This paper presents the development of a customized police data modeling platform. First, after systematically analyzing the police data from 211 tables collected by a public security bureau branch, we proposed a police domain knowledge model. Then, we detailed the architecture and functions of the customized modeling platform, mainly through its six core modules. Next, we used a UML activity diagram to illustrate how the platform is used. Finally, we evaluated CPDMP and demonstrated its feasibility and the integrity of its results through experiments with police officers.
In future work, we will focus on improving the scalability of CPDMP, continuously improving the quality of PSDKM, and building more applicable models with CPDMP.
References
1. Yu, H., Hu, C.: A police big data analytics platform: framework and implications. In: 2016 IEEE First International Conference on Data Science in Cyberspace (DSC), pp. 323–328. IEEE (2016)
2. Elgendy, N., Elragal, A.: Big data analytics in support of the decision making process. Procedia Computer Science 100, 1071–1084 (2016)
3. Chong, D., Shi, H.: Big data analytics: a literature review. J. Manage. Analytics 2(3), 175–201 (2015)
4. Silva, B.N., et al.: Urban planning and smart city decision management empowered by real-time data processing using big data analytics. Sensors 18(9), 2994 (2018)
5. Che, D., Safran, M., Peng, Z.: From big data to big data mining: challenges, issues, and opportunities. In: International Conference on Database Systems for Advanced Applications, pp. 1–15. Springer (2013). https://doi.org/10.1007/978-3-642-40270-8_1
6. Lum, C., Koper, C.S., Willis, J.: Understanding the limits of technology’s impact on police effectiveness. Police Q. 20(2), 135–163 (2017)
7. Chan, J.B.: The technological game: how information technology is transforming police practice. Criminal Justice 1(2), 139–159 (2001)
8. Zhang, Y., Tang, X., Du, B., Liu, W., Pu, J., Chen, Y.: Correlation feature of big data in smart cities. In: International Conference on Database Systems for Advanced Applications, pp. 223–237. Springer (2016). https://doi.org/10.1007/978-3-319-32055-7_19
9. Kim, W.: XRel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM Trans. Internet Technol. (TOIT) 1(1), 110–141 (2001)
10. Westerinen, A., Tauber, R.: Ontology development by domain experts (without using the “o” word). Applied Ontology 12(3–4), 299–311 (2017)
11. Maedche, A., Staab, S.: Ontology learning for the semantic web. IEEE Intell. Syst. 16(2), 72–79 (2001)
12. Azad, H.K., Deepak, A.: Query expansion techniques for information retrieval: a survey. Inf. Process. Manage. 56(5), 1698–1735 (2019)
13. Zhao, Y., Dong, J., Peng, T.: Ontology classification for semantic-web-based software engineering. IEEE Trans. Serv. Comput. 2(4), 303–317 (2009)
14. Kamsu-Foguem, B., Chapurlat, V.: Requirements modelling and formal analysis using graph operations. Int. J. Prod. Res. 44(17), 3451–3470 (2006)
15. Carnaz, G., Nogueira, V.B., Antunes, M., Ferreira, N.: An automated system for criminal police reports analysis. In: International Conference on Soft Computing and Pattern Recognition, pp. 360–369. Springer (2018). https://doi.org/10.1007/978-3-030-17065-3_36
16. Alic, A.S., et al.: BIGSEA: a big data analytics platform for public transportation information. Futur. Gener. Comput. Syst. 96, 243–269 (2019)
17. Khorshidi, S., Carter, J.G., Mohler, G.: Repurposing recidivism models for forecasting police officer use of force. In: 2020 IEEE International Conference on Big Data (Big Data), pp. 3199–3203 (2020). https://doi.org/10.1109/BigData50022.2020.937817
18. Xi, Z., Chunyu, W.: Research on data visualization design for police system. In: 2021 IEEE International Conference on Artificial Intelligence and Industrial Design (AIID), pp. 463–468 (2021). https://doi.org/10.1109/AIID51893.2021.9456583
19. Sitar-Taut, D.A., Mican, D., Buchmann, R.A.: A knowledge-driven digital nudging approach to recommender systems built on a modified Onicescu method. Expert Syst. Appl. 181, 115170 (2021)
20. Zheng, X., Wang, B., Zhao, Y., Mao, S., Tang, Y.: A knowledge graph method for hazardous chemical management: ontology design and entity identification. Neurocomputing 430, 104–111 (2021)
21. Luo, X., et al.: AliCoCo: Alibaba e-commerce cognitive concept net. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 313–327. SIGMOD’20, Association for Computing Machinery, New York, NY, USA (2020)
22. Donalds, C., Osei-Bryson, K.M.: Toward a cybercrime classification ontology: a knowledge-based approach. Comput. Hum. Behav. 92, 403–418 (2019)
23. Alfred, R., et al.: Ontology-based query expansion for supporting information retrieval in agriculture. In: The 8th International Conference on Knowledge Management in Organizations, pp. 299–311. Springer (2014). https://doi.org/10.1007/978-94-007-7287-8_24
24. Jain, S., Seeja, K., Jindal, R.: A fuzzy ontology framework in information retrieval using semantic query expansion. Int. J. Information Manage. Data Insights 1(1), 100009 (2021)
Acknowledgement
The authors would like to thank the policemen from Sucheng branch of Suqian Public Security Bureau for their cooperation and assistance. This work was partially supported by the Shandong Provincial Natural Science Foundation (No. ZR2021MF026).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wang, T., Jiang, H., Zhang, H., Yan, X. (2023). A Knowledge-Enabled Customized Data Modeling Platform Towards Intelligent Police Applications. In: Li, B., Yue, L., Tao, C., Han, X., Calvanese, D., Amagasa, T. (eds) Web and Big Data. APWeb-WAIM 2022. Lecture Notes in Computer Science, vol 13421. Springer, Cham. https://doi.org/10.1007/978-3-031-25158-0_11
DOI: https://doi.org/10.1007/978-3-031-25158-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25157-3
Online ISBN: 978-3-031-25158-0
eBook Packages: Computer Science, Computer Science (R0)