1 Introduction

Situational Awareness (SAW) is a concept that originated in the military and aviation domains and is increasingly used in other application domains that require critical decision making. It refers to the level of consciousness that an individual or team has of a situation: an operator's dynamic understanding of what is happening in the environment and the projection of its status in the near future [1].

According to Endsley [1], SAW is divided into three levels: the perception of the elements in the environment, the comprehension of the status of these elements in a situation and the projection of their evolution in the near future. Achieving complete SAW is a process that takes place in the human mind and requires cognitive activity. Poor understanding of information in critical domains may not only cause the loss of its global significance, but can also lead to failures and jeopardize human safety.

Information quality is one of the crucial factors for the effectiveness of decision-making systems. Imperfect information, which does not truly describe real-world situations (e.g., an incomplete set of necessary data or misspelled words), reduces the reliability of such systems, contributes negatively to mental model formation and, consequently, undermines the SAW process [2, 3].

One critical service that can benefit from better SAW-oriented support to decision making is the 911 emergency response system. Lack of proper identification of the objects in a situation and of their relations can impact both the definition of tactics and the allocation of resources to a call (e.g., criminals and victims in a robbery report). Operators can be provided with information quality cues to help them reason under uncertainty and improve their understanding of an ongoing situation. However, such cues are not always informative or fully reliable [4, 5].

The state of the art in approaches to support the situation assessment process and the SAW enrichment of decision makers includes solutions such as cognitive models, ontologies and frameworks based on core ontologies, fuzzy logic and data fusion models [6, 7].

It can be noted, however, that there is a lack of common ground regarding approaches to assess and represent data and information quality in the context of emergency situation assessment for the improvement of the SAW of human operators.

This paper introduces a methodology for assessing the quality of situational information, using data from robbery events, to enrich the SAW of emergency response system operators during real-time analysis. The methodology is meant to be coupled into a situation assessment process and can be employed every time new data is acquired or information is transformed by other assessment functions (e.g., data and information fusion, integration, cleansing and filtering).

The methodology is presented as a three-stage routine that supports the measurement and representation of data and information quality, which is useful to enhance perception and to help operators gain a better understanding under imperfect information, which can otherwise lead to uncertainty. To illustrate the methodology, 911 reports of robbery events are addressed.

The paper is organized as follows: Sect. 2 discusses approaches for data and information quality management. Section 3 presents the proposed methodology for data and information quality assessment. Section 4 applies this methodology in a case study on robbery situation assessment, and Sect. 5 presents the conclusions.

2 Data and Information Quality Management: Foundations and State of the Art

Data and information quality management refers to the establishment and definition of roles, responsibilities, policies and procedures regarding the acquisition, maintenance, representation and dissemination of data and information. Thus, from an information input and a given application context, a rational process is defined to measure, represent and even improve the quality of data and information [8].

The literature records no standard for the specification of processes, functions and dimensions for data and information quality assessment. There is a lack of methodologies built specifically for the emergency response systems domain, as well as of generic approaches that support its requirements. Each application area holds its own data quality requirements and categorizations, called dimensions, whose meanings are defined according to objectives, tasks and associated decisions. Hence, the applicability of such dimensions is also dependent on quality requirements [9, 10].

One of the most relevant methodologies in the area of data quality management is Total Data Quality Management (TDQM) [11]. As a widely cited work in this context, TDQM lays the foundation for research and development of applications in several fields. New methodologies typically use TDQM as a basis and as a reference for comparison.

Considering other methodologies, in the Goal-Question-Metric (GQM) methodology [12], objectives must be established to guide the identification of metrics in a given context. The Data Quality Assessment methodology (DQA) [13] was one of the first to identify general metrics for data and information quality. The Comprehensive Methodology for Data Quality Management (CDQ) [9] aims to be complete, flexible and simple to apply, integrating current techniques and tools in a framework that can be applied to every type of structured, semistructured and unstructured data.

Some methodologies address specific domains, such as the Methodology for the Quality Assessment of Financial Data (QAFD) [14], dedicated to the financial area, and the Methodology to assess Data Quality in Cooperative Information Systems (DaQuinCIS) [15]. The Cost-effect Of Low Data Quality (COLDQ) methodology studies the cost/benefit equation of low data quality. The Methodology for Information Quality Assessment (AIMQ) [16] focuses on performance. Activity-based Measuring and Evaluating of Product Information Quality (AMEQ) [17] targets data quality in the manufacturing domain. Unlike TDQM, which treats data as a product, AMEQ treats information as a product.

Methodologies such as ISTAT [18], created by the Italian National Statistics Agency (Istituto Nazionale di Statistica), focus on the data most common to central, regional and peripheral administrations, as well as on the corresponding standards.

The Canadian Institute for Health Information (CIHI) methodology, centered on health data, gives priority to the size and heterogeneity of databases, considering a number of quality criteria [9].

The Data Warehouse Quality Methodology (DWQ) [11] and Total Information Quality Management (TIQM) [19] methodologies support data warehousing projects.

Regarding data quality assessment in critical systems, approaches such as that of Xu and Bowen [20] deal with data quality in the domain of accounting information systems (AIS). The authors state that data quality is particularly important for AIS and organizations, for both external reporting and internal managerial purposes, and that data quality limitations can jeopardize the analysis, planning and evaluation of the dynamics of financial operations [21].

Tee et al. [22] presented a more recent case study that used interview and survey methods to examine factors influencing the level of data quality within a target organization. Many other approaches have also presented solutions for this particular critical domain [23–29].

Regarding methodologies for emergency management systems, O’Brien [30] aims to redefine the data quality dimensions required for critical information systems into three main categories: content, time and shape. Among the quality attributes are readiness, acceptance, frequency, period, accuracy, relevance, completeness, conciseness, breadth, performance, clarity, detail, order, presentation and media.

Wang et al. [11, 31] categorize quality dimension attributes for critical applications into four main classes: intrinsic, contextual, representational and accessibility data quality. Intrinsic data quality implies guaranteeing the credibility and reputation of data; among its attributes are credibility, reputation, accuracy and objectivity.

Contextual data quality comprises attributes that should be considered and evaluated according to the context of the task to be performed: value-added, relevance, timeliness, completeness and appropriate amount of data.

As for representational quality, the attributes are defined according to format-related aspects (such as conciseness and consistency of representation) and to the meaning conveyed in the understanding and interpretation of the data. Finally, the authors group the accessibility-related attributes into a class of their own.

Laudon [32] deals with the information quality of US criminal records to be used by emergency response systems. The author claims that no specific methodologies were established for the quality analysis of such records and that little effort was dedicated to defining quality levels for them. In this work, two kinds of record were evaluated: digital records of criminal history (i.e., prison records) and prison warrants. The quality dimensions, defined on the basis of research and interviews conducted with more than 100 state and federal criminal justice teams, were completeness, precision and record ambiguity.

The criminal record system of the Bureau of Justice Statistics [33] employs a methodology for assessing the completeness and precision of criminal history data, producing qualified information for both state and federal repositories for further use by emergency response systems. The data requirements were elicited from well-defined patterns and procedures already established for five application scenarios. Based on such scenarios, the auditor must perform a careful review of the laws and state regulations, forms, instruction manuals and criminal record output formats in order to define a list of requirements suitable for completeness and precision assessment.

The FBI’s Uniform Crime Reports (UCR) [34] program developed a methodology to ensure data quality in criminal records by reviewing crime reports with the following routines: interviews with administrative employees, to guarantee that the quality patterns defined by the UCR were followed, and reviews of selected samples of incidents, to check whether the patterns and definitions were correctly applied, based on dimensions of imprecision and on the existence of records and reports regarding the incident. Finally, closing meetings are held to point out flaws in the process.

A common point of existing data quality management methodologies is the need for a subjective analysis step performed by experienced users in the field by means of questionnaires, interviews or surveys.

Besides being too specific to particular purposes, most approaches for emergency management systems are applied only after the event has emerged, at the moment the victims are reporting the crime to the emergency service [19, 35–37]. A preventive approach would better help such systems avoid spreading low-quality data [38, 39].

3 Information Quality Assessment Methodology in the Context of Emergency Situational Awareness (IQESA)

To improve SAW, especially to increase perception (Level 1 of SAW) and to support situational understanding (Level 2 of SAW), the IQESA methodology defines the phases required to evaluate and represent the quality of data and information as part of an information assessment process whose goal is to obtain and maintain SAW in the emergency management context.

The IQESA methodology consists of three basic steps: (1) elicitation of data and information quality requirements, (2) definition of functions and metrics to quantify quality dimensions and (3) representation of situational information.

To illustrate these steps, the 911 emergency response system is used, more precisely a real-time report on an ongoing robbery analyzed by a trained human operator, who is supposed to gather vital information and make decisions on dispatching the appropriate resources.

As part of a situation assessment cycle, before submission to the routines described in this methodology, information is obtained and identified by acquisition processes and eventually combined by data fusion methods, whose outcome is in the form of objects or collections of related objects (also known as situations). Finally, such information is evaluated according to the dimensions and functions discussed in this section.

As mentioned earlier, every time new information is acquired or transformed, the quality assessment must be performed to guarantee that both the system itself and the human operator are fed with qualified information. For further information about the complete assessment process, readers should refer to [40–42].

3.1 Elicitation of data and information quality requirements

The elicitation of data and information quality requirements was carried out with the support of trained 911 dispatchers, subject matter experts (SMEs) of the São Paulo State Police Force (Polícia Militar do Estado de São Paulo—PMESP).

A goal-driven task analysis (GDTA) [2] and a questionnaire were employed to identify the information priorities to be considered.

The GDTA data can be obtained through semistructured interviews with the SMEs. In such interviews, the designer of the assessment system must ask the SMEs about the tasks they perform in their daily activities as dispatchers of emergency management systems, the decisions they have to make to perform such tasks and, finally, what information is necessary to make each decision and which source can provide or infer that information (e.g., sensors, events or functions of the assessment process). The results are sets of information needed to make decisions, which may be classified under each level of situational awareness.

The questionnaire is organized on a Likert scale from 0 (no importance) to 7 (essential information), so that a scale of importance can be developed for every piece of SAW-related information. Rules, protocols and procedures are also used to define information priorities. A “decision tree” currently used by PMESP, representing all types of events and standard procedures, is applied to guide dispatchers during an emergency call, indicating what should be asked of the caller.

The decision tree, besides proposing a decision-making script for responding to an emergency call, also reveals the information that must be obtained and the order in which to obtain it. However, not all dispatchers follow the dependencies imposed by the decision tree, mostly because the absence of some vital information prevents them from proceeding with the inquiry.

With the information gathered in the GDTA (an example of information relevant to Level 1 of SAW is shown in Table 1), it was possible to create an attribute tree (Fig. 1) illustrating the hierarchy and dependences of information in a robbery situation.

The central node is the situation itself (Robbery), the next nodes are the fundamental entities required to consider the situation a robbery (victim, criminal, stolen object and location), and the leaves represent the attributes that describe each entity (also known as descriptors).

The attribute tree plays an important role in the next steps, concerning the quality evaluation and the representation of domain knowledge, which are determined by the information requirements of objects and their attributes.

In the robbery example, it is possible to characterize the following essential objects: victim, criminal, stolen object, and the place and time of the event. The attributes expected for each object are described below (a data-structure sketch follows the list):

  • Criminal and victim, whose attributes are similar: clothes, features, accessories and respective descriptions. The criminal has specific attributes such as current location and escape direction.

  • Object: defines the characteristics of the stolen object, such as color, brand, size and model. There is also a class for vehicles, with characteristics such as plate and year, in case this information is available;

  • Location: this information is given along with some specification (premises, lots, apartments) and address-related information such as street and district.
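A minimal sketch of how such an attribute tree could be encoded is shown below; the entity names follow the text, while the exact descriptor lists and the Python representation itself are illustrative assumptions:

```python
# Hypothetical encoding of the robbery attribute tree (Fig. 1): the
# situation is the root, entities are inner nodes and descriptors are
# leaves. Entity names follow the text; descriptor lists are illustrative.
ATTRIBUTE_TREE = {
    "Robbery": {
        "criminal": ["clothes", "features", "accessories",
                     "current_location", "escape_direction"],
        "victim": ["clothes", "features", "accessories", "condition"],
        "stolen_object": ["color", "brand", "size", "model",
                          "plate", "year"],  # plate/year apply to vehicles
        "location": ["premises", "street", "district"],
    }
}
```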

Fig. 1 Attribute trees illustrating the hierarchy and the dependence of the situational information about robbery

Table 1 Example of information requirements defined by the GDTA in Level 1 of SAW

3.2 Definition of functions and metrics to quantify quality dimensions

This step defines the functions and metrics to quantify the information quality dimensions pertaining to the emergency management domain, such as completeness, temporal completeness, timeliness, relevance and consistency. Each object found by the previous information acquisition processes, both before and after information fusion routines, is evaluated along these dimensions, thus defining local data or information quality indexes.

3.2.1 Syntax accuracy assessment

Syntax accuracy is the first quality dimension assessed. It is applied to the information of objects found in a string (text format), as shown in the attribute tree.

This assessment aims to mitigate spelling errors that could negatively influence the evaluation of completeness described later in this section. Such errors are usually present in data resulting from soft sensors [6, 23].

For this initial assessment, the Metaphone algorithm [13] is used, which generates a key according to the pronunciation of the word. Consequently, even if a word is misspelled, the same phonetic key is generated.

The Levenshtein distance algorithm [24] is then used to compare keys. It measures the edit distance between two strings, returning the number of operations required to transform one string into the other.

This process depends on a dictionary with the keys generated from the other text entries as a benchmark. If the comparison of the keys results in 0, the words are equivalent, indicating the presence of attributes (words that qualify and describe objects) obtained by the data acquisition process.

Thus, even if there are words with syntax problems, the completeness assessment routine, for example, will proceed smoothly.
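A minimal sketch of this phonetic matching step, assuming the `jellyfish` library for the Metaphone and Levenshtein implementations; the dictionary contents are illustrative:

```python
import jellyfish

# Benchmark dictionary: phonetic keys of the known attribute words.
DICTIONARY = ["revolver", "tattoo", "subway", "vehicle"]  # illustrative
KEYS = {word: jellyfish.metaphone(word) for word in DICTIONARY}

def match_attribute(token: str) -> str | None:
    """Return the dictionary word whose Metaphone key matches the
    token's key (Levenshtein distance 0), or None if no match."""
    token_key = jellyfish.metaphone(token)
    for word, key in KEYS.items():
        if jellyfish.levenshtein_distance(token_key, key) == 0:
            return word
    return None

# A misspelled report token still resolves to the intended attribute.
print(match_attribute("revolvar"))  # -> "revolver"
```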

This dimension was adopted as part of a method to avoid inconsistencies that could impact the evaluation processes of other dimensions, leading to inconsistent outcomes and consequently affecting the SAW process carried out by the human operator when evaluating situations.

Syntax accuracy is not represented quantitatively. It is only an internal mechanism of data quality control.

3.2.2 Completeness assessment

The completeness evaluation provides a quantitative measure of how complete a report is with respect to the presence of the attributes that describe it.

The metrics for calculating completeness were defined based on the attribute tree and on the results of the questionnaire, thus defining the essential objects, the attributes that must necessarily be present so as not to compromise the index (the quantitative value of completeness), and the attribute priorities that control their influence on the calculation. For example, in the analysis of a robbery situation, these priority attributes are:

  • Venue;

  • Presence and type of weapons used;

  • Current Location of the criminal;

  • Victim’s condition;

  • Information on the stolen object.

Formula (1) defines the use of these metrics to calculate the completeness dimension of the objects identified in a textual report:

$$\begin{aligned} \frac{ \delta \sum { \varphi } +\left( 10\sum { (\beta \times \varphi) } -\sum { \varphi } \right) }{ 10\sum { \varphi } } \end{aligned}$$
(1)

where:

\(\delta\) is the presence of the object (0 if not present, 1 if present in the JSON object); \(\beta\) is the presence of the attribute (1 if present, 0 if not present); \(\varphi\) is the weight of the attribute (2 if priority, 1 if not priority).

If an object is identified without attributes (present in the string, but with no attributes that describe it), a standard 10 % rate is assigned to that object. Therefore, \(\delta\) indicates the presence of one of the four key objects whose presence is expected: it is equal to 0 if the object is not present in the string and 1 if it is present.

Therefore, if the object is present, its completeness has already reached 10 %.

Finding the indexes for the \(\beta\) terms consists of determining the presence of each descriptive attribute: 1 when the attribute is present and 0 when absent. Each term is multiplied by the importance \(\varphi\) of the attribute: 1 if it is a conventional attribute and 2 if it is a priority attribute. The result is divided by the sum of \(\varphi\) and finally multiplied by 100 to be considered in the calculation of certainty, which is discussed later.
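A minimal sketch of Formula (1) in Python; the attribute counts and the 10 % floor for a present object follow the text, while the victim configuration in the usage example (14 non-priority attributes, total weight 14) is an assumption consistent with the 14.28 % index reported in Sect. 4:

```python
def completeness(object_present: bool, attributes: list[tuple[bool, bool]]) -> float:
    """Formula (1): completeness of one object, in percent.

    attributes: one (present, is_priority) pair per expected attribute.
    Weights (phi): 2 for priority attributes, 1 otherwise.
    """
    if not object_present:
        return 0.0  # an absent object contributes no completeness
    sum_phi = sum(2 if prio else 1 for _, prio in attributes)
    sum_beta_phi = sum(2 if prio else 1 for present, prio in attributes if present)
    # Formula (1) with delta = 1.
    index = (sum_phi + (10 * sum_beta_phi - sum_phi)) / (10 * sum_phi)
    # The text assigns a standard 10 % rate to a present object
    # that has no descriptive attributes.
    index = max(index, 0.10)
    return round(100 * index, 2)

# Victim of Report 1 in Sect. 4: two non-priority attributes present out
# of an assumed total weight of 14 -> 14.29 (the text truncates to 14.28).
victim = [(i < 2, False) for i in range(14)]
print(completeness(True, victim))
```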

3.2.3 Example of completeness assessment

The completeness rate is the index calculated most frequently during the quality assessment process, since it may be computed at several moments of the situation evaluation process, specifically every time new information is inferred. Each time an inference process occurs, such as an information fusion that aggregates more information about an event, the indexes must be recalculated and adjusted to a greater or lesser value.

If little information is provided in a textual report, the completeness indexes of some objects are low; according to the defined metrics, objects with no attributes have a low completeness index. If priority attributes are present in both the location and criminal objects, the completeness rates of these objects do not suffer larger deductions.

In the following, a small sample of the completeness assessment applied to a robbery report is shown, restricted to the attributes evaluated on the criminal object.

Crime reported to 911: “A crime just happened here at Domingos Setti Ave. A driver was threatened and ordered to leave the vehicle without taking anything. The robber fled toward the Klabin subway”.

Criminal object and its attributes identified and assessed by completeness:

  • Gender: male;

  • Status: running;

  • Escape Direction: Klabin subway.

Hence, in this case the completeness index is equal to 23.80 %.

The completeness index is updated when new processing is demanded by the human operator or by the system itself. This routine is necessary in order to add information that operators deem necessary for their understanding of the situation.

In this case, if such a process occurs and the system finds new objects or complementary and consistent attributes from other data sources, the completeness rate can increase.

In the following, the update of the completeness assessment resulting from the combination of the previous report with a new robbery report is shown, again restricted to the attributes evaluated on the criminal object.

New Crime reported to 911: “A guy was robbed in front of me by someone armed. It happened at Domingos Setti near the Don Paladino restaurant. The robber had a gun, was a tall guy and had tattoos on his arms. The victim looks very hurt”.

Criminal object and its attributes identified and assessed by completeness:

  • Gender: male;

  • Status: running;

  • Escape Direction: Klabin subway;

  • Height: tall;

  • Weapon: revolver;

  • Tattoos: arms.

Hence, in this case the new completeness index is equal to 42.85 %.
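These two indexes are consistent with Formula (1) under the assumption (not stated explicitly in the text) that the criminal object’s expected attributes have a total weight of \(\sum\varphi = 21\) and that status, escape direction and weapon are priority attributes (weight 2), giving \(\sum\beta\varphi = 5\) for the first report and \(\sum\beta\varphi = 9\) for the second:

$$\begin{aligned} \frac{ 21+\left( 10\times 5-21 \right) }{ 10\times 21 } =\frac{ 50 }{ 210 } \approx 23.80\,\%,\qquad \frac{ 21+\left( 10\times 9-21 \right) }{ 10\times 21 } =\frac{ 90 }{ 210 } \approx 42.85\,\% \end{aligned}$$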

In this new report, new attributes of the criminal object were found: evidence that he had a weapon and tattoos on his arms, as well as a new reference to the place. Thus, the completeness indexes of these objects increased.

This index update process occurs whenever a new inference about objects or attributes is obtained, whether by the fusion process or by other processes such as the direct interaction of the human expert via the user interface.

3.2.4 Timeliness assessment

Timeliness is the examination of the temporal evolution of an event and its situations. The identification of temporal data is critically important, since in some cases the robbery may be reported in real time during the event, or seconds, minutes or hours after it. The identification of these aspects assists in defining the action plan to be followed, as well as the category of public security service that will be responsible for dealing with the emergency (state police, military force, fire department, etc.).

Four attributes are defined to assess timeliness: (1) the hour of the robbery, (2) the time the robbery report was generated, (3) the time taken to process such information in the search for objects and attributes and (4) the current time of the system.

Note that the evaluation of timeliness according to the variation of these attributes can contribute to SAW positively if it is based on reliable data or negatively if temporal data is incomplete or misinterpreted.

The evaluation of the timeliness dimension results in two types of information: a quantitative index of the existence of the four needed attributes and the elapsed time, in minutes, since the event emerged. Formula (2) performs the quantitative count:

$$\begin{aligned} \sum _{ y=1 }^{ 4 }{ -\theta } \end{aligned}$$
(2)

where \(\theta\) is a composite index given by the current time minus the following time attributes: the time of the event contained in the report, the time at which the report was generated and the time taken to process such information in the search for objects and attributes.

Considering the time data that can be obtained in this domain, a dimension called temporal completeness was defined, aiming to provide a quantitative index of the completeness of the time data present in a report.
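A minimal sketch of this temporal assessment, assuming, as in the case study of Sect. 4, that each of the four time attributes present contributes 25 % to temporal completeness; the field names and timestamps are illustrative:

```python
from datetime import datetime

def temporal_assessment(event_time: datetime | None,
                        report_time: datetime | None,
                        processing_time: datetime | None,
                        now: datetime | None) -> tuple[float, float | None]:
    """Return (temporal completeness in %, minutes elapsed since the event).

    Each of the four time attributes present scores 25 %; the elapsed
    time is computable only when event time and current time both exist.
    """
    attrs = [event_time, report_time, processing_time, now]
    completeness = 25.0 * sum(a is not None for a in attrs)
    elapsed = None
    if event_time is not None and now is not None:
        elapsed = (now - event_time).total_seconds() / 60.0
    return completeness, elapsed

# As in Report 1 of Sect. 4: two of the four attributes present -> 50 %.
print(temporal_assessment(None, datetime(2016, 5, 1, 14, 7), None,
                          datetime(2016, 5, 1, 14, 9)))  # (50.0, None)
```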

3.2.5 Situation certainty

When composing a situation, that is, a relationship between two or more identified objects, a global certainty index is established: the confidence that the system has in the situational information. This is a preliminary generalization of quality, calculated on the basis of each quality attribute available.

Although the certainty of a situation is calculated on the basis of the other local dimensions, the absence of any dimension, or the inability to calculate it, does not prevent the global index from being inferred. Certainty can be calculated from whichever completeness and time dimensions exist.

The calculation is based on the average of the four completeness rates (stolen object, criminal, victim and location) of a situation, as sketched after the following list:

  • Sum of the values of completeness of each criminal in the report;

  • Divide the result by the number of criminals;

  • Add the result of the division to the sum of the completeness of other existing objects;

  • Divide the result by 4.
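A minimal sketch of this certainty average, assuming one completeness index per object as computed by Formula (1); the handling of multiple criminals follows the list above, and the values in the usage example are illustrative:

```python
def situation_certainty(criminal_completeness: list[float],
                        object_completeness: float,
                        victim_completeness: float,
                        location_completeness: float) -> float:
    """Average the four completeness rates of a situation, collapsing
    multiple criminals into their mean first."""
    criminal_mean = sum(criminal_completeness) / len(criminal_completeness)
    total = (criminal_mean + object_completeness
             + victim_completeness + location_completeness)
    return total / 4

# Illustrative values only.
print(situation_certainty([10.52], 10.2, 14.28, 25.0))  # 15.0
```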

3.2.6 Consistency and relevance assessment

As defined in the literature, consistency concerns the violation of semantic rules defined over a given set of data or information. It is noteworthy that, according to relational theory [13], consistency can be measured and statistically evaluated, and much of the literature considers consistency to address quality issues in databases, taking into account predefined constraints to find and solve such problems [17].

Scannapieco et al. [15] exemplify data inconsistency in a response provided to a data set in which a person’s marital status is “married,” and the age is “five years old.”

Thus, the evaluation metrics are set according to a set of semantic rules established for a specific data set. As discussed by Batini et al. [6], two different kinds of metric are commonly used for evaluation: the first is based on data linkage techniques, used to identify consistency rules for foreign keys in the presence of inconsistent data [9], and the second is used to check business rules.

The relevance dimension is the degree to which a given set of information meets user needs. Relevance is also defined as the extent to which data is applicable and useful for the task to be performed [13]. According to the survey of assessment metrics conducted by Batini et al. [25], relevance can be measured by subjective methods, such as assessments applied by experienced users in the area.

3.2.7 Example of consistency and relevance assessment

The consistency evaluation is performed with the help of syntax analysis, and the rules defined for such an assessment depend on the verification of the existing data values in the current context of analysis.

One must consider that the information evaluated for these dimensions is inferred by the same processes used for completeness and timeliness, that is, it originates in reports given by human beings, who are susceptible to a number of flaws, inconsistencies and uncertainties.

It is possible that reports with similar characteristics, although carrying different data or data considered inconsistent with the current context of the situation, are subjected to the situation evaluation process, which would decrease the quality percentage if they were incorporated into the partial result. The same can happen when considering the relevance of the information with respect to the current context of analysis.

An example of inconsistent information in this domain is presented below. When successive fusions are sought to increase the representativeness of information, it is possible that the system considers a third report along with Report 1 and Report 2 that shares, with the current situation, the attributes date, time and place.

However, the description of the reported event can be totally different from the current situation under review. Such events are likely to occur after a fusion process has been performed at least twice, in which similar information was found in different reports.

The following reports are taken as realistic examples for the consistency and relevance assessment:

Report 1: “A crime just happened here at Domingos Setti Ave. A driver was threatened and ordered to leave the vehicle without taking anything. The robber fled toward Klabin subway station”.

Report 2: “I’ve just witnessed two men stealing a black Mercedes here at Domingos Setti Ave. They threatened the victim, got into the car, and fled toward Klabin subway station”.

Report 3: “A black Mercedes was taken from a man right before me”.

Report 4: “There was a robbery on São José Ave. and the offender hit the driver of a silver Porsche Cayenne”.

Report 5: “Two minutes ago a lady was threatened and her car was stolen at Domingos Setti Ave. The offenders fled to the west district taking her car along”.

Reports 1 and 2 consist of two pieces of information obtained by means of two different calls, and Reports 3 and 4 come from posts on a social network. It is important to note that information posted on social networks such as Twitter is geolocated. In this manner, Reports 1 and 2 are related, and by comparing the attributes and objects present in Report 3 one can state that it is related to the previous two, a matching that would be performed by data fusion.

Report 4 is also taken from a social network, and even though its geolocation attribute is equivalent to the address of the previous reports, it is inconsistent with the ongoing event, the robbery of a black Mercedes on Domingos Setti Avenue.

This report may well be correct; however, in the context of this particular event it is inconsistent, because geolocation can be used as a criterion for merging reports. Merging with Report 4 could both reduce the confidence rating and put consistency at risk, undermining information whose purpose is to provide a better perception and understanding of the situation.

In this context, the relevance dimension was not defined to indicate whether some information is relevant to the human expert, but to assist low-level processes in determining relevant inputs for data fusion. As an example, Report 5 illustrates a situation with the same characteristics as Reports 1, 2 and 3, carrying inconsistent though relevant information.

In Report 5, even though the inconsistency of the information about the victim (previously established as a male person) is evident, there is new relevant information on a possible current location of the criminals (the west district), and as a result of the questionnaire applied to the experts, information about the current location of the offender has priority. Thus, a criterion is established to assist data fusion.

The quality assessment phase always provides information on the existence or absence of priority attributes. Thus, even if a low-quality or inconsistent piece of information enters the merging process, it may or may not be discarded, depending on whether the priority attribute had already been obtained in advance and on whether the inconsistency interferes with the relevance of the latest information (a selection made under criteria defined by the data fusion algorithm).

It is important to note that consistency and relevance have no quantitative indexes, because they assist the machine evaluation process in providing reliable information to the human expert. In addition, the system does not have the autonomy to make decisions: if the expert determines that Reports 1, 2 and 3 are imperfect and that Report 4 is the most appropriate, he or she may demand a fusion based on Report 4 and discard the previous ones.
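A minimal sketch of such a rule-based gate for fusion candidates, assuming a simple object model; the rules shown (shared geolocation as a merge criterion, conflicting stolen-object descriptions as an inconsistency, a new priority attribute as relevance) paraphrase the examples above and are not an exhaustive rule set:

```python
from dataclasses import dataclass, field

@dataclass
class ReportFacts:
    location: str                     # geolocation or reported address
    stolen_object: str | None = None  # e.g., "purse", "black Mercedes"
    priority_attrs: dict = field(default_factory=dict)

def fusion_gate(situation: ReportFacts, candidate: ReportFacts) -> tuple[bool, str]:
    """Decide whether a candidate report should enter the fusion process."""
    # Merge criterion: geolocation equivalence makes a report a candidate.
    if candidate.location != situation.location:
        return False, "different location: not a fusion candidate"
    # Consistency rule: conflicting stolen-object descriptions.
    if (candidate.stolen_object and situation.stolen_object
            and candidate.stolen_object != situation.stolen_object):
        # Relevance rule: a new priority attribute may still make it useful.
        new_priority = set(candidate.priority_attrs) - set(situation.priority_attrs)
        if new_priority:
            return True, f"inconsistent but relevant: brings {sorted(new_priority)}"
        return False, "inconsistent with the current context"
    return True, "consistent fusion candidate"

# Report 4 shares the geolocation but contradicts the stolen object.
situation = ReportFacts("Domingos Setti Ave.", "black Mercedes")
report4 = ReportFacts("Domingos Setti Ave.", "silver Porsche Cayenne")
print(fusion_gate(situation, report4))
# (False, 'inconsistent with the current context')
```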

3.3 Representation of situational information

In this work, the semantic model of ontologies was adopted for the representation of situational information, that is, the accumulated knowledge about situations (objects and the relationships among them) that grows over time. The domain ontology was developed following the Noy and McGuinness methodology [26], supported by the requirements obtained in the acquisition of HUMINT data. Its class hierarchy is shown in Fig. 2.

In Fig. 2 it is possible to observe the main classes and their relationships regarding the emergency management domain, more specifically for situations of robbery crime.

Fig. 2 Class hierarchy of the ontology for the emergency management domain

Figure 3 shows a diagram with the object properties of such ontology.

Fig. 3 Object properties of the ontology classes

Every existing relation has an inverse relation. For instance, a situation has a stolen object, and this relation is the inverse of the stolen object being part of a situation. This is especially important inside an ontology because of the capability of such a model to reason over its classes and instances.

The data properties of the proposed ontology are also essential for its purposes. They establish relations between classes, known as the domain, and literal values, known as the range (string, integer, Boolean).

In this context, the attributes referring to data quality are all data properties; that is, each of them, regardless of its nature, is always attached to a class and to a decimal value. This is because quality is an absolute value represented as a decimal, indicating as a percentage the quality according to the dimensions approached in this work. The classes that contain attributes regarding data and information quality are presented in Table 2.

Table 2 Relations among classes and data properties regarding data and information quality

These attributes are inserted into the ontology representation as the information quality calculations are performed. The quality attributes are only used to represent quality, while the other attributes and their respective objects and relations are part of the calculation.

Hence, the attributes regarding the ontology and the representation of information quality are: completeness, temporal completeness, currency, consistency, relevance and uncertainty.

An ontology class may or may not contain some of the quality attributes. Before a class is evaluated, each quality attribute receives a null value; only after the ontology processing is it assigned a quantitative value regarding the quality of the contained information.

Hence, the ontology is used as the representational structure of this work, aiming to represent the situational information enriched by the semantics aggregated to each qualified object and attribute.

The ontology built was used as the basis for the construction of a JSON object model that incorporates all properties, classes and restrictions of the ontology. Thus, such a JSON object becomes an instance of the ontology classes. In other words, the representation model is an OWL ontology, and the instances are JSON objects generated from this model.

Hence, the JSON object works as a key-value structure: each key is an ontology property, and each value is the property’s content for the current instance, either a single value or a list.

Figure 4 presents the structure of the JSON object that supports an ontology instance. It is possible to notice the ontology classes, such as category, report/delation, time, criminal, object, victim and location, besides the data properties inside the classes.

Fig. 4 Structure of the JSON object to support an ontology instance

There are also relations, such as the report, which is a property of the situation object. Hence, this JSON object becomes a complete instance of the ontology, covering all classes and properties.

It is also possible to verify in Fig. 4 the properties regarding information quality. In all four classes (criminal, victim, object and location) there are attributes referring to data and information quality as part of the current instance. Quality data are data properties of the ontology that, inside a JSON object, become attributes of the containing class.
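A hypothetical fragment of such an instance, sketching how quality data properties could sit inside each class; the field names and timestamps are illustrative and not the exact schema of Fig. 4, while the quality values reuse the indexes of Report 1 in Sect. 4:

```python
# Hypothetical instance fragment; keys mirror ontology data properties.
situation_instance = {
    "category": "robbery",
    "report": {"time": {"report_time": "14:07", "current_time": "14:09"}},
    "criminal": {"reference": "mugger", "completeness": 10.52},
    "victim": {"gender": "female", "reference": "lady", "completeness": 14.28},
    "object": {"type": "purse", "completeness": 10.20},
    "location": {"reference": "square of Sé", "completeness": 25.00},
    "temporal_completeness": 50.00,   # two of four time attributes present
    "certainty": 22.00,               # global index over available dimensions
}
```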

Section 4 presents a case study that addresses the use of the IQESA methodology for assessing the information quality of a complete situation in the emergency management domain, covering all dimensions in a single instance.

4 Case study: robbery information quality assessment

This case study presents a complete example in which SAW is a paramount factor for decision making because of its impact on police resource allocation. Given the large number of criminal events reported to the São Paulo State Police (PMESP) in Brazil, and considering the stress emergency operators are subjected to, the main objective of this case study is to illustrate how information quality awareness can support situation assessment systems and consequently help operators enhance their SAW, by reducing their uncertainties and providing high-level abstractions, resulting in a more efficient emergency call response service.

The information quality results illustrated in this section were all obtained through the application of the IQESA methodology. Furthermore, all information (inference, processing and representation) is handled by a complete situation assessment system, developed by the authors’ group and empowered by data classification and fusion [6], quality assessment [30], information visualization [32] and SAW-oriented user interfaces [33].

This case study specifically addresses the situation of robbery. Hence, the aim is to identify and understand the contexts associated with this kind of crime, such as location, criminals, stolen object and victims. The crime is initially reported by phone, and then the assessment steps mentioned above are applied. For this case study, it is considered that the situational information was already acquired and had its objects (and the relations/situations among them) recognized, as part of a situation assessment and fusion cycle. For further information about data fusion models and processes, the reader may refer to [34]. An example of a robbery report is given below:

Report 1: “I just saw a lady being mugged here in the square of Sé. The bastard took her purse”.

After such a report is submitted to the identification and classification of objects and their attributes, the first data quality assessment is performed. This process is performed every time the system receives or infers new information about objects and situations.

Each object has priority attributes, each with a different weight. The classified objects and attributes, with their respective completeness and temporal completeness indexes, are shown in Fig. 5.

Fig. 5 Results of information quality assessment of the objects and attributes from Report 1

Hence, the completeness indexes are calculated as follows.

Only two attributes referring to the victim were found (the victim’s gender and one reference word), neither of which is a priority. Dividing by the quantity of needed attributes, a total of 0.1428 points is obtained, that is, a 14.28 % completeness rate.

The same occurs with the other objects found in the report. Considering that a very small amount of data was reported, such objects received limited completeness indexes (10.52 % for criminal data, 25 % for location data and 10.2 % for stolen object data).

As described above, temporal completeness considers the four defined temporal attributes, the presence of each being scored with 25 %. Report 1 contains two of them; hence a 50 % score was reached.

The certainty calculation sums all available quality indexes (the four completeness indexes and the temporal completeness index) and divides the result by five (the total number of available indexes), resulting in a 22 % certainty score.
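Explicitly, with the indexes above:

$$\begin{aligned} \frac{ 14.28+10.52+25.00+10.20+50.00 }{ 5 } =\frac{ 110.00 }{ 5 } =22\,\% \end{aligned}$$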

The same process occurs when a second report is delivered to the system, for example Report 2, whose JSON object is shown in Fig. 6.

Report 2: “A blue shirt guy just threatened and robbed a lady near the square of Sé. He had a knife”.

Fig. 6 Results of information quality assessment of the objects and attributes from Report 2

Since Report 2 carries the necessary complementary information, it presents improved quality indexes, and consequently their fusion results in a unique and more significant piece of information with better overall quality indexes.

Figure 7 presents the result of the fusion between Reports 1 and 2, which was also submitted to the same processes (first object identification, then information quality assessment) to obtain its completeness and temporal completeness indexes.

Fig. 7 Results of information quality assessment of the fused situation between objects and attributes from Reports 1 and 2

After the fusion process, the operator is free to ask the system for a new fusion, for new data from other HUMINT sources (human intelligence, e.g., social networks) or even for a direct input updating or inserting new information.

If a third report is considered, it is possible to notice that it will be discarded by the consistency and relevance analysis.

Report 3: “A black Fox car was stolen from a guy by the square of Sé. He injured the driver. I got the plate 000–1111”.

In this case, the stolen object has four attributes: description, model, color and plate. Dividing by the total number of attributes, also considering their weights, gives a total of 0.5454; multiplying by 100, a completeness score of 54 % is obtained. Figure 8 presents the results of the assessment of Report 3.
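One consistent reading of this 0.5454 figure under Formula (1), assuming (the text does not state the weights) that the found attributes carry a weight sum of \(\sum\beta\varphi = 6\), e.g., with two of them priorities, out of a total expected weight of \(\sum\varphi = 11\) for the stolen object:

$$\begin{aligned} \frac{ 11+\left( 10\times 6-11 \right) }{ 10\times 11 } =\frac{ 60 }{ 110 } \approx 0.5454 \end{aligned}$$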

Fig. 8 Results of information quality assessment of Report 3

However, such a report will not be considered for the current situation (and fusion process), as it has consistency and, consequently, relevance issues.

By sharing the same location, Report 3 is classified as a candidate for fusion into the situation at hand; however, it is not in accordance with the current context (a purse stolen from a lady near the square of Sé) and consequently is not relevant to it. Hence, the information fusion is not performed with this report, and the situation does not incorporate its information.

5 Conclusions

This work introduced a methodology to qualify information by quantifying it through the dimensions and metrics defined for the emergency management domain. By assessing the quality of information as part of a whole situation assessment routine, the authors expect to enhance the SAW of specialists analyzing emergency reports.

Considering that operators may have to make quick yet sound decisions under heavy stress, the IQESA methodology tackles information quality by measuring and representing it for further usage in a situation assessment cycle, which generally includes, besides the processing phases, a graphical representation process.

Hence, the quality indexes inferred in the context of this methodology also provide subsidies to reduce uncertainty, as a means for a better perception of how reliable the information about what is going on during an emergency is.

For the requirements elicitation, interviews with experts were carried out with the application of the GDTA methodology. As a result, an attribute tree was created with the main robbery objects and their attributes. The dimensions for information quality assessment were defined as: syntactic accuracy, completeness, temporal completeness, consistency, relevance and uncertainty.

The weights of the objects and attributes set for a robbery were established with the support of PMESP police experts through questionnaires and interviews. Metrics and functions to measure each dimension were also built based on such requirements.

Finally, a domain ontology based on a SAW core ontology [4] was built to provide a semantic representation of qualified objects and attributes, together with their meaning and relationships.

The assessment performed on each robbery report offers resources for a full perception of the entities of an event and of the necessary information about an ongoing crime.

With qualified information, operators may make better decisions about which resources to apply, or choose to improve the quality of information by employing the refinement routines available in most human-centered data assessment models.

The knowledge generated may assist the development of systems that require SAW, since the quality assessment tends to improve the representation of both present and absent report information.

Since the main objective of the IQESA methodology is to focus on the perception of elements, which it does by identifying the elements present in reports of robbery events, highlighting them and setting quality scores, the methodology meets its goal.

As future work, the efficiency and the level of situation awareness of operators will be measured (through evaluation by PMESP members). In that respect, information visualization techniques and SAW-oriented user interfaces are under development to accommodate the graphical representation of the quality indexes and their updating by means of operator interaction with the system.