1 Introduction

The financial industry is becoming more complex, in part due to the lack of effective communication between risk experts and decision makers (Huang et al. 2009). For example, a recent study of the life insurance industry in Australia found that managing risk involves more than protecting value (Islam et al. 2021b, 2020a). According to the Australian Prudential Regulation Authority (APRA 2019), net claims expenses increased by 12.6%, from $22.1 billion in the year ended 2018 to $24.9 billion in the year ended 2019. It has also been observed that re-insurers have a higher capital coverage ratio than direct insurers (https://www.apra.gov.au/news-and-publications). Insurance managers (IMs) therefore need to take appropriate action to avoid fraud and reduce losses (Eppler and Aeschimann 2009). Such control can be achieved through efficient communication between all the bodies involved. Visualization is one approach to communicating risk-relevant information efficiently (Ko et al. 2016). Although business diagrams such as tables, charts, and formulas are commonly used for claim and risk management in the insurance industry, it may still be challenging for IMs to identify the most relevant risks and to initiate adequate countermeasures. Effective data visualization can therefore help risk experts and decision makers obtain risk-related information (Wagner et al. 2015; Islam et al. 2020a; Borgo et al. 2018).

Visual analytics solutions (VAS) are widely used for different purposes in a variety of areas, such as finance, biomedicine, education, and forecasting research in academia (Leite et al. 2020, 2017; Rudolph et al. 2009; Thomas et al. 2017; Islam et al. 2020c). They enable researchers to gain better insights and to inform decision makers through the analysis of large-scale datasets (Chang et al. 2007; Trelles Trabucco et al. 2019; Islam et al. 2020b). Moreover, a VAS accelerates knowledge discovery and provides evidence to improve outcomes (Koldijk et al. 2015). For example, over the last few decades, several VAS have been proposed that focus on fraud detection and customer monitoring (Niu et al. 2018; Leite et al. 2015; Huang et al. 2009). These visualizations enable stakeholders to identify suspicious cases where traditional methods fail. However, most existing VAS are not effective in the insurance industry because (1) claim risk is very difficult to describe and extremely hard to visualize due to the heterogeneity of the data; and (2) decision makers are often not familiar with VAS outputs such as diagrams, risk maps, and the impact/likelihood positions of specific business risks. Moreover, existing research on the impact of customer behaviors on visualization processing has concentrated on primary insights obtained from non-interactive systems (e.g., pie and bar charts), while studies of visualization outcomes in interactive visualization systems remain limited. Thus, (1) to the best of our knowledge, no research has used an interactive visualization system to systematically examine the visualization of insurance risk; (2) existing interaction approaches only support the theoretical processes of exploring risk visualization; and (3) previous works examine either low-level tasks (e.g., value retrieval) or high-level tasks (in specific analyses).
Furthermore, no existing research considers both high-level and low-level tasks together for decision making. There is therefore a more urgent need than ever to visualize and monitor policyholders’ claim risk through natural language interaction.

Fig. 1

Summary of the InsCRMVis (Insurance Claim and Risk Management with Visual analytics) interface components: a query searching and b speech, which allow users to freely express queries to obtain visual insights; and c mouse/touch/pen interaction, which is supported by the visualizations

The recent development of natural language interfaces (NLIs) for data visualization has attracted immense attention from the research community, business decision makers, and industry seeking to improve net profit, and considerably better performance has been achieved through their predictive and analytic capabilities (Srinivasan and Stasko 2017). However, it is important to examine why NLIs are essential for data visualization and why they have been growing in popularity. Existing studies on NLIs for data visualization found that NLIs can handle large datasets even with limited human and financial resources (Elmqvist et al. 2008; Srinivasan and Stasko 2020). Existing NLIs have also been used for the effective exploration and communication of ideas in various business domains (Leite et al. 2020; Gao et al. 2015). Additionally, NLIs can help users run queries to gain insights into large databases. However, when people query their data, they may have difficulty generating the desired visual response with existing NLIs. Moreover, existing NLIs are often intended for domain experts and have complex interfaces, so challenges relating to ambiguity remain. Therefore, an appropriate NLI with data visualization (e.g., language-based, speech-based, touch-based, speech+touch, or language+speech+touch) is required to understand and explain large-scale datasets, particularly for insurance claim and risk management decision support.

The main contribution of this paper is the design of a new visual analytic solution (VAS), named InsCRMVis, that uses NLIs to visualize policyholders’ claims and risks in the life insurance industry. The development of a VAS for risk visualization has three goals: (1) to demonstrate the scope of risk visualization, that is, where and when it may be beneficial and should be recognized as a valuable tool for risk managers; (2) to present a checklist of the most important aspects to consider when visualizing risks or risk-related data; and (3) to demonstrate how to visualize hazards for risk management, communication, and risk-related decision making. Figure 1 illustrates how our system enables IMs to express their questions and intents more freely to gain insights from a large database of 26,817 policyholders’ insurance claims and thus better manage risk (Laskar et al. 2020). To this end, the visualizations support speech and touch interaction, which enables IMs to follow up on the current status of policyholders to handle risk (Thompson et al. 2018; Alsaiari et al. 2020; Srinivasan et al. 2020; Lo and Green 2013). We collected 169 questions from the IMs and found that InsCRMVis correctly answered 69% of them. For evaluation, we conducted a user study with 10 participants using three datasets, namely questionnaire, demographic, and claim data. The experts’ evaluation suggests that InsCRMVis can accurately identify claim risks and assist IMs in reducing loss and guiding changes to insurance premium policies for further development planning and management. The key contributions of this paper are as follows:

  • We introduce a new design space and present an end-to-end framework that enables experts to explore a large database of policyholders’ claim behavior to reduce risk.

  • We analyze the claim behaviors of 26,817 policyholders to expose the effects of different visualizations on understanding, distraction, driving performance, expert experience, and risk management.

  • We collected 169 questions from the insurance stakeholders. We find that our system correctly answers 69% of these questions.

  • To evaluate the performance, we conducted a user study with 10 experts. The experts’ evaluations suggest that InsCRMVis provides better insights and assists IMs in reducing loss and guiding changes to insurance premium policies.

The paper is organized as follows: Sect. 2 discusses related work on NLIs with data visualization for insurance claim and risk management. In Sect. 3, we describe the methodology in detail, including the data, data pre-processing, domain characterization and task analysis, and the visual analytics solution. We provide details of the InsCRMVis system in Sect. 4. In Sect. 5, we present a comprehensive analysis of the results. In Sect. 6, we describe a user study of the proposed VAS that assesses its capacity to inform the relevant variables for exploring insurance claims and risk management. Finally, conclusions and future directions are provided in Sect. 7.

2 Related work

Natural language interaction for data visualization has been widely explored both by commercial software developers and by the research community. We limit our discussion of related studies to the following areas: (1) insurance claims and risk management; (2) visualization for claim and risk analysis; and (3) NLIs with data visualization. In the following subsections, we review the state of the art in these areas to explain the motivation for the proposed VAS.

2.1 Insurance claims and risk management

The exploration of insurance claims and risk management has attracted significant attention because a large number of policyholders have inflicted great losses on insurance companies and on society as a whole (Islam et al. 2021b; Vo et al. 2021). Insurance risk management is a branch of financial risk management that includes life insurance and healthcare insurance (Eppler and Aeschimann 2009; Ko et al. 2016). Existing studies show that life insurance claims and risk management have attracted more attention than other financial risk management issues. According to KPMG’s Life Insurance Insights (2020), Australian life insurance companies’ premium revenue decreased by 6.1% to $17.3 billion, compared to approximately $18.4 billion per annum for the 2017 to 2019 period (https://home.kpmg/au/en/home/insights/2020/10/life-insurance-insights.html). Moreover, according to Arych and Darcy (2020), approximately 21%–36% of life insurance claims involve suspected fraud, but only 3% of perpetrators are prosecuted. Although researchers have expended great effort on the problem of insurance claims and risk management using various risk management methods, these methods are often inadequate (Niu et al. 2018; Eppler and Aeschimann 2009; Leite et al. 2015; Islam et al. 2021b). Moreover, existing studies have revealed a need for better quality, consistency, and transparency of insurance claim data (Zerafa et al. 2021); poor data quality also leads to inferior outcomes when extracting new insights to make correct decisions. Therefore, there is an increasing demand to improve risk management through the design and implementation of a cost-effective, practical, business-wide visual analytic solution.

2.2 Visualization for claim and risk management

Claim and risk visualization employs systematic and interactive methods such as charts, maps, and conceptual diagrams to enhance the quality of risk communication along the entire claim and risk management life cycle. It helps experts and decision makers improve their understanding and deal more effectively with risk in the insurance industry. Visualization and visual analytics have been introduced in both academia and industry (1) to provide a clear view of customers’ adverse behavior, transaction monitoring, premium fluctuations, and complex everyday decision making (Han et al. 2013; Chang et al. 2007; Schulz et al. 2015); (2) to characterize data, users, and tasks (Keim et al. 2008; Ceneda et al. 2016; Stolte et al. 2002); and (3) to discover imbalances and monitor risk (Dasgupta et al. 2015). Some contributions are domain-specific; e.g., visual animation has been adopted to investigate vast amounts of time-series data (Walker et al. 2015; Archambault et al. 2010; Su et al. 2016). A 3D treemap has been implemented to monitor the behavior of specific stock market users who have exhibited adverse trading patterns and to identify real-time stock market performance (Huang et al. 2009; Fujiwara et al. 2020). To detect adverse user behavior, a coordinated visualization of specific keywords within wire transactions has been developed (Soriano-Vargas et al. 2020). Additionally, various interactive visualization systems have been developed to help stakeholders make immediate decisions in different business scenarios (Yue et al. 2018; Conati et al. 2014). Clustering-based visualization systems have been used for financial risk monitoring, discovering imbalances in financial networks, and prediction for head and neck cancer patients (Chen et al. 2013; Afzal et al. 2019; Tosado et al. 2020). However, there are very few works on claim and risk management in the insurance domain.
Moreover, existing systems are limited in their ability to investigate a large number of variables and to satisfy the specific requirements of domain experts, e.g., measuring new claim costs, the number of accidental claims, and the number of mental health claims. In this study, we therefore address these gaps in existing visualization systems and meet the demands of domain experts.

2.3 Natural language interfaces with data visualization

Natural language interfaces (NLIs) are emerging as a promising paradigm for data analysis with visualization (Obeid and Hoque 2020; Islam et al. 2021a). They are gaining popularity because they help improve the usability of visualization systems. Typically, these interfaces respond to user queries by creating a new visualization and/or by highlighting answers within an existing VAS. NLIs have been explored both by the research community and in commercial software. Existing studies have provided various NLI-based VAS that use well-structured commands to specify visualizations. For example, NLI-based VAS such as Articulate (Sun et al. 2010) and ConveRSE (Iovine et al. 2020) enable people to explore how NL affects the incorporation of digital assistants and recommendation systems. DataTone (Gao et al. 2015) manages ambiguity in NL queries so that people can specify a visual response, and informs the development of useful NLIs for data visualization. FlowSense (Yu and Silva 2019) allows users to combine written queries with visualization components to specify system functionality. Eviza (Setlur et al. 2016) incorporates a probabilistic grammar-based approach and a finite state machine to provide an NLI for interactive query dialogs. Evizeon (Hoque et al. 2017) supports compound queries and lexical cohesion with visualizations. The ideas in Eviza and Evizeon were also utilized in Tableau’s Ask Data feature, which lets users specify NL queries in a structured form (https://www.tableau.com/products/new-features/ask-data). The aforementioned systems show that NLIs allow users to ask questions in natural language to generate the desired visualizations. However, in the insurance domain, there are no NLI-based VAS for exploring insurance claims and managing risk. Therefore, inspired by the aforementioned visualization systems, we leverage data visualization with natural language interactions to explore insurance claims and manage risk.

In summary, existing research on data visualization for exploring insurance claims and risk management is very limited. Although a few studies have developed interactive visualization systems, no study has combined data visualization with NLIs to meet the practical requirements of risk domain experts in the insurance industry. Thus, to the best of our knowledge, this is the first work to use NLIs with visual analytics approaches to address insurance claims and risk management issues.

3 Methodology

3.1 Data description

This work uses three types of data collected from an Australian insurance company, namely (1) questionnaire data; (2) demographic data; and (3) policyholders’ claims. All the attributes of the questionnaire dataset are binary, whereas the demographic and claim datasets contain binary, categorical, numerical, and other types of attributes. The attribute descriptions are given in Fig. 2. Each dataset is briefly described below.

Fig. 2

The data sources of the policyholders’ claims. County and state-level detailed infrastructure

Questionnaire dataset We acquired the dataset, amassed over 10 years, from a screening questionnaire provided by an insurance company. The questionnaire was large and detailed, comprising the responses of 64,000 policyholders to 834 questions covering personal, medical, family history, occupational, and lifestyle details, among others, with responses labelled 0 for ‘No’ and 1 for ‘Yes’. For example, the questions asked whether participants drink alcohol, whether they have cancer, whether they smoke, whether they have a disease, etc.

Demographic dataset The demographic dataset comprises five attributes, namely insurance ID, gender, age, occupation, and postcode. The ‘gender’ attribute takes the values ‘male’ and ‘female’. The ‘postcode’ attribute reports the Australian postcode of the policyholder’s place of residence. The ‘age’ attribute reports the age of the applicant in whole years, ranging from 3 years (youngest applicant) to 78 years (oldest applicant). The ‘occupation’ attribute contains 18 different categories such as ‘T-Trades’, ‘S-Supervisor of Trades’, ‘R-Special Risk’, etc. As part of the demographic analysis, we also use the Socio-Economic Indexes for Areas (SEIFA) dataset.

Claim dataset The confidential customer claim dataset is provided by the IMs for research purposes only. In total, more than 27,458 claims were recorded from 2010 to April 2019.

3.2 Data pre-processing

As discussed in Sect. 3.1, the datasets record various information across many attributes. The attributes take values of different types and categories, and the data may contain missing values. To simplify the system and ensure that only the most significant data are used, pre-processing removed less important and redundant attributes that offer no benefit to exploration and analysis, and combined the redundant fields that were not eliminated. Finally, the dataset comprised information on 26,817 policyholders with 21 attributes relating to insurance claims and risk management. We used these cleansed datasets to provide a broader and more comprehensive analysis of claims and risk management in the insurance industry.
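
The pre-processing described above can be sketched in plain Python. This is a minimal illustration, not the authors’ pipeline: records are assumed to be dictionaries, the attributes to drop and the mapping of redundant fields to merge are hypothetical inputs supplied by the analyst, and discarding records that still contain missing values is an assumption, since the text does not state how missing values were handled.

```python
from typing import Any, Dict, Iterable, List, Mapping

Record = Dict[str, Any]
MISSING = (None, "")  # values treated as missing (assumption)


def preprocess(
    records: Iterable[Mapping[str, Any]],
    drop: Iterable[str],
    merge: Mapping[str, List[str]],
) -> List[Record]:
    """Drop low-value attributes, merge redundant fields, and discard
    records that still contain missing values (hypothetical policy)."""
    drop_set = set(drop)
    cleaned: List[Record] = []
    for rec in records:
        # 1. Remove attributes judged less important or redundant.
        row: Record = {k: v for k, v in rec.items() if k not in drop_set}
        # 2. Combine redundant fields under one target name, keeping the
        #    first non-missing value among the source fields.
        for target, sources in merge.items():
            value = next(
                (row[s] for s in sources if row.get(s) not in MISSING), None
            )
            for s in sources:
                row.pop(s, None)
            row[target] = value
        # 3. Discard records that still have a missing value (assumption).
        if all(v not in MISSING for v in row.values()):
            cleaned.append(row)
    return cleaned
```

Keeping the drop list and the merge mapping as explicit inputs makes the reduction from the raw fields to the final set of attributes easy to audit.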

3.3 Domain characterization and design consideration

To develop an effective visualization for exploring claim and risk management, we first must understand the common decision-making process within insurance guidelines. We need to know what types of information are available in the decision problem, and how humans process these information entities.

In this work, we collaborated with a team of IMs who have more than five years of working experience. Through a series of interviews and discussions, we aimed to understand their decision-making problem, and we collected several questions that could not be answered by existing VAS, as listed in Table 1. These questions suggest that the analysis should be able to inspect the behavior of both individual policyholders and groups of policyholders, as well as identify the most important information for exploring insurance claims and risk management.

Table 1 Key questions identified in collaboration with domain experts

We note that our collaborators wanted to conduct a comprehensive analysis of policyholders’ behavior and also to find specific values and information. Thus, it was essential to present the insurance claim data without losing detail, e.g., by being able to display specific values. Because the ability to present such responses defined the requirements for designing a VAS, our design efforts focused on providing complementary views of various relationships and on supporting IMs in examining representative variables related to adverse behavior. Our analysis shows that our VAS is able to compare variables in terms of policyholders’ behavior. In Sect. 3.4, we discuss the properties of our system that are useful in obtaining responses to such queries.

Based on our experience and taking into account recently proposed design and functional criteria, our system InsCRMVis must include representations of:

R1 Domain-specific data The key influences need to be emphasized and sorted by their relevance to the policyholders’ claim benefits. The IMs must know the supporting as well as the contradictory facts before making a decision on a claim for benefits and its potential alternatives.

R2 Key factors IMs must be aware of the information provided by the policyholder in relation to a claim for benefits through visualization in risk management.

R3 Monitor and/or control risks A conceptual NLI incorporating the purpose (why?), the content (what?), the target groups (for whom?), the situation (when?) and the format (how?) allows IMs to systematically explore data visualization in risk management and to discuss new insights.

R4 Decision Insurance decision making for claim and risk management aims at finding the right information and identifying adverse outcomes for specific policyholders.

R5 NLIs for investigating insurance claims and risk management NLIs enhanced by visualization require thorough task analysis and domain expertise to explore risk management and claim analysis.

3.4 Visual analytics solution

Following the NL4DV visualization toolkit developed by Narechania et al. (2020), we designed our natural language based framework, InsCRMVis, for risk data visualization. Figure 3 illustrates the components of the proposed methodology for the VAS. We aim to cover the scope of risk visualization, that is, to highlight its various purposes, its contents, and for whom it can provide benefits. We consider InsCRMVis to be a useful tool and provide a checklist of the key factors to consider when visualizing risks or risk-related information. InsCRMVis consists of four components: (1) data collection and processing; (2) designing a visual analytics framework; (3) applying the framework to a specific domain; and (4) evaluation.

Fig. 3

Proposed architecture of NLI based visualization

Step 1 We collected a dataset and then processed, organized, and cleaned it, as described in Sect. 3.2. Organizing and cleansing the data makes it more reliable and free of duplicates.

Step 2 Like many other web applications, our visual analytics framework, InsCRMVis, consists of two components: (1) InsCRMVis-Automatic Query Answering and (2) InsCRMVis-Multimodal System. The first component allows users to issue various queries to gain insights into a large database, and the second component allows interaction between the various plots of the visualization system through touch, mouse, and speech. The interface panel also provides a filter option based on the claim score.

Step 3 Our InsCRMVis framework is applied to the life insurance domain in Australia and allows IMs to explore insurance claims and risk management.

Step 4 The domain-specific application of the framework relies on expert evaluation to obtain feedback on how it assists in reducing loss and guiding changes to insurance premium policies.

As illustrated in Table 1 and Fig. 4, we provide specific questions relating to why, what, for whom, when, and how risk-related information should be visualized. It is important to start with these questions, because their answers provide useful guidance for risk visualization. Our interface thus represents a process view of risk depiction, that is, a solution that emphasizes the act of visualizing rather than just the resulting graphic artifact.

Fig. 4

System overview: key questions of the risk visualization framework

4 System description

In this section, we present InsCRMVis, a new data visualization architecture for exploring insurance claims and risk management, as shown in Fig. 5. The framework combines multiple components such as text, speech, and touch, and conveys the claim behavior of each policyholder in a consistent representation of the data observations. Our approach is similar to the method proposed by Narechania et al. (2020). It integrates multiple natural language processing and visualization techniques into a framework that supports risk experts in investigating the claim behaviors of policyholders. It comprises three key components, namely (1) data interpretation, (2) query analyzer, and (3) visualization generation. In the following subsections, we briefly describe how InsCRMVis uses these key components to explore and minimize claim risk.

Fig. 5

Framework: overview of the interface functionalities such as input data, query analyzer, and visualization generation

4.1 Data interpretation

We use insurance claims along with questionnaire and demographic data to infer various types of attributes. For example, our dataset contains the attribute ‘Monthly Benefit’ with a range of values. If attribute types such as temporal information are not identified, the system may provide misleading information, which can lead to poor decisions. To overcome this issue, InsCRMVis iterates through the underlying data values to derive metadata consisting of the attribute type (quantitative, nominal, ordinal, or temporal) along with the range of values for each attribute. This attribute metadata is used to interpret queries, identify the intended tasks, and generate appropriate visual responses.
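
The metadata derivation described above can be illustrated with a simple heuristic. The rules below are assumptions for illustration, not those of InsCRMVis or NL4DV: an attribute is labelled temporal if every non-empty value parses as a YYYY-MM-DD date, quantitative if every value parses as a number and there are at least a minimum number of distinct values, ordinal if all values belong to a caller-supplied ordered vocabulary, and nominal otherwise. The function name infer_attribute_metadata and the default threshold of five distinct values are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Sequence


def _is_number(v: str) -> bool:
    try:
        float(v)
        return True
    except ValueError:
        return False


def _is_iso_date(v: str) -> bool:
    try:
        datetime.strptime(v, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_attribute_metadata(
    values: Iterable[str],
    ordinal_levels: Optional[Sequence[str]] = None,
    min_distinct_numeric: int = 5,
) -> Dict[str, object]:
    """Return hypothetical metadata: an attribute type and its value range."""
    vals: List[str] = [v.strip() for v in values if v and v.strip()]
    distinct = sorted(set(vals))
    if vals and all(_is_iso_date(v) for v in vals):
        # Zero-padded ISO dates sort chronologically as strings.
        return {"type": "temporal", "range": [distinct[0], distinct[-1]]}
    numeric = vals and all(_is_number(v) for v in vals)
    if numeric and len(distinct) >= min_distinct_numeric:
        nums = [float(v) for v in vals]
        return {"type": "quantitative", "range": [min(nums), max(nums)]}
    if ordinal_levels and set(distinct) <= set(ordinal_levels):
        # Keep the caller-supplied order for the levels that actually occur.
        present = [lvl for lvl in ordinal_levels if lvl in distinct]
        return {"type": "ordinal", "range": present}
    return {"type": "nominal", "range": distinct}
```

Numeric attributes with only a few distinct values fall through to the ordinal or nominal branches, which is one simple way to avoid treating coded categories as continuous quantities.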

4.2 Query analyzer

A natural language interaction-based visualizer should be able to identify the most informative phrases in a query. To generate a visual response from a query, an NLI needs to identify the related information, such as analytic tasks, data attributes, visualization types, and values, as shown in Fig. 6. Consider, for example, the query ‘Create a histogram showing distribution of M Sex in NSW’. In response to this query, InsCRMVis performs three operations: query parsing, attribute inference, and task inference.

Fig. 6

An illustration of a query analyzer while interpreting NL queries

To extract details and identify the most relevant phrases, the query parser first runs a set of NLP steps, including part-of-speech (POS) tagging, dependency parsing, and N-gram extraction. After query parsing, InsCRMVis searches for data attributes that are specified explicitly or implicitly. Finally, InsCRMVis analyzes the remaining N-grams for references to analytic tasks such as correlation, distribution, derived value, and trend, as well as a fifth task, filter, as shown in Table 2.
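
The three query-analysis operations can be approximated with plain string matching. This sketch is a simplified stand-in, not the InsCRMVis or NL4DV parser: it replaces POS tagging and dependency parsing with lowercase tokenization and N-gram lookup, detects only explicit attribute mentions, and uses small hypothetical lexicons (ATTRIBUTE_ALIASES, VALUE_ALIASES, TASK_KEYWORDS, VIS_KEYWORDS). A mentioned value such as a state code is treated as implying the filter task.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical lexicons; a real system would build these from the
# attribute metadata and a richer task vocabulary.
ATTRIBUTE_ALIASES: Dict[str, str] = {
    "sex": "Sex", "gender": "Sex", "age": "Age",
    "monthly benefit": "Monthly Benefit", "state": "State",
}
VALUE_ALIASES: Dict[str, Tuple[str, str]] = {
    "nsw": ("State", "NSW"), "vic": ("State", "VIC"), "qld": ("State", "QLD"),
}
TASK_KEYWORDS: Dict[str, str] = {
    "distribution": "distribution", "histogram": "distribution",
    "correlation": "correlation", "relationship": "correlation",
    "trend": "trend", "over time": "trend",
    "average": "derived value", "total": "derived value",
}
VIS_KEYWORDS: Dict[str, str] = {
    "histogram": "histogram", "bar chart": "bar", "pie chart": "pie",
    "box plot": "boxplot", "scatterplot": "scatterplot", "line chart": "line",
}


def ngrams(tokens: List[str], max_n: int = 3) -> Set[str]:
    """All 1- to max_n-grams of the token list, joined by spaces."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def analyze_query(query: str) -> Dict[str, list]:
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    grams = ngrams(tokens)
    # Value mentions (e.g. a state code) imply a filter on their attribute.
    filters = sorted({VALUE_ALIASES[g] for g in grams if g in VALUE_ALIASES})
    attributes = {ATTRIBUTE_ALIASES[g] for g in grams if g in ATTRIBUTE_ALIASES}
    attributes |= {attr for attr, _ in filters}
    tasks = {TASK_KEYWORDS[g] for g in grams if g in TASK_KEYWORDS}
    if filters:
        tasks.add("filter")
    vis_types = sorted({VIS_KEYWORDS[g] for g in grams if g in VIS_KEYWORDS})
    return {
        "attributes": sorted(attributes),
        "tasks": sorted(tasks),
        "filters": filters,
        "vis_types": vis_types,
    }
```

Applied to the example query ‘Create a histogram showing distribution of M Sex in NSW’, this sketch returns the attributes Sex and State, the tasks distribution and filter, a filter on State = NSW, and an explicit histogram request.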

4.3 Visualization generation

InsCRMVis uses Vega-Lite as its underlying visualization grammar and can visualize up to three attributes at a time. It supports Vega-Lite marks such as tick, bar, point, line, arc, area, boxplot, and text, and encodings such as x, y, size, color, row, and column (Satyanarayan et al. 2016). Similar to NL4DV, the combination of Vega-Lite marks and encodings allows InsCRMVis to support a variety of popular visualization types such as bar charts, histograms, line charts, strip plots, pie charts, box plots, area charts, and scatterplots (Narechania et al. 2020). To provide insights related to the query, InsCRMVis analyzes the query for explicit requests for visualization types (e.g., ‘pie charts’, ‘histogram’, ‘box plots’) or implicitly infers visualizations from the attributes and tasks derived from the query. It then compiles the inferred visualizations into a visList. Each object in visList is composed of a vlSpec containing the Vega-Lite specification for a chart, an inferenceType field indicating whether a visualization was requested explicitly or implicitly derived by NL4DV, and the list of attributes and tasks to which the visualization maps.
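
To make the visList structure concrete, the sketch below assembles one entry for a single-attribute chart. The field names vlSpec and inferenceType follow the text; the helper name build_vis_entry, the named data source claims, and the mapping from visualization type to Vega-Lite marks and encodings are simplifying assumptions. For example, a histogram of a quantitative attribute is expressed as a bar mark with a binned x field and a count on y, and a pie chart as an arc mark with a count on theta.

```python
from typing import Any, Dict, List, Optional, Tuple

SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"


def build_vis_entry(
    attribute: str,
    attr_type: str,
    tasks: List[str],
    requested_vis: Optional[str] = None,
    filt: Optional[Tuple[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one hypothetical visList entry for a single-attribute chart."""
    # Implicit choice: histogram for quantitative data, bar chart otherwise.
    vis = requested_vis or ("histogram" if attr_type == "quantitative" else "bar")
    if vis == "pie":
        mark = "arc"
        encoding: Dict[str, Any] = {
            "theta": {"aggregate": "count", "type": "quantitative"},
            "color": {"field": attribute, "type": attr_type},
        }
    else:
        # Histograms bin quantitative x values; bar charts count categories.
        x: Dict[str, Any] = {"field": attribute, "type": attr_type}
        if vis == "histogram" and attr_type == "quantitative":
            x["bin"] = True
        mark = "bar"
        encoding = {"x": x, "y": {"aggregate": "count", "type": "quantitative"}}
    spec: Dict[str, Any] = {
        "$schema": SCHEMA,
        "data": {"name": "claims"},
        "mark": mark,
        "encoding": encoding,
    }
    if filt is not None:
        # Vega-Lite field-equal predicate, e.g. State == "NSW".
        spec["transform"] = [{"filter": {"field": filt[0], "equal": filt[1]}}]
    return {
        "vlSpec": spec,
        "inferenceType": "explicit" if requested_vis else "implicit",
        "attributes": [attribute] + ([filt[0]] if filt is not None else []),
        "tasks": tasks,
    }
```

A visList is then simply a list of such entries, one per candidate chart, which the front-end can render with any Vega-Lite runtime.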

Table 2 Types of queries and visualization observed in this study

4.4 Implementation

The InsCRMVis system is developed as a web-based application, in which Python and Flask are used to develop the back-end that supports data processing and analysis. The front-end is implemented in JavaScript, with Data-Driven Documents (D3) used to build the visualization views. HTML and CSS provide the interface, and the AngularJS framework is used to structure the web application following a model-view-controller paradigm. The web-based front-end is connected to the back-end through a query engine interface, which retrieves aggregated data from the back-end based on user interactions and selections on the front-end. Figure 1 displays the primary screen of the InsCRMVis front-end, which comprises a full view for visualizing insurance claim datasets.
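
The query engine’s role of returning aggregated data for front-end selections can be sketched as a pure function that a Flask route could wrap and serialize as JSON. The record fields, the claim_score field standing in for the claim-score filter mentioned in Sect. 3.4, and the function name aggregate are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Mapping, Optional


def aggregate(
    records: Iterable[Mapping[str, Any]],
    group_by: str,
    selections: Optional[Mapping[str, Any]] = None,
    min_claim_score: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Count records per `group_by` value after applying front-end selections."""
    selections = selections or {}
    counts: Counter = Counter()
    for rec in records:
        if any(rec.get(k) != v for k, v in selections.items()):
            continue  # excluded by a click or brush selection on the front-end
        if min_claim_score is not None and rec.get("claim_score", 0) < min_claim_score:
            continue  # hypothetical claim-score filter
        counts[rec.get(group_by)] += 1
    # Sort by the string form of the key so the response order is stable.
    ordered = sorted(counts.items(), key=lambda kv: str(kv[0]))
    return [{group_by: key, "count": n} for key, n in ordered]
```

Returning only aggregated rows, rather than raw policyholder records, keeps the payload small and keeps confidential record-level data on the back-end.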

5 Results

InsCRMVis visually guides domain experts to identify claim behavior and reduce risks using the functions described in Sect. 4. We highlight the following outcomes for selected motivating examples based on the questions listed in Table 1. Of the 169 questions used in this study, 69% were answered correctly by our system.
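
As a quick consistency check on the reported accuracy (the exact number of correctly answered questions is not stated), 69% of 169 questions corresponds to roughly 116 to 117 questions, since either count rounds to 69%:

\[
0.69 \times 169 = 116.61, \qquad \frac{116}{169} \approx 68.6\%, \qquad \frac{117}{169} \approx 69.2\%.
\]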

Identifying and understanding relevant risks (Q1 and Q2): Fig. 7 shows a variety of questions and charts generated by our system. We observe that an appropriate VAS can help stakeholders become aware of specific risks and provide a way to deal with these risks adequately. For instance, an insurance company cannot reduce the number of risky policyholders directly. However, by analyzing different factors with a reliable system to establish a fair claims management process, a good overview of many relevant business decisions can be gained. An interactive visualization system is therefore important to avoid complications arising from factors that are not relevant.

Exploring situations for risk visualization (Q3 and Q4): Exploring a large database for risk visualization can provide useful insights for various risk-related purposes. In Fig. 4, we provide target groups and usage situations to determine for whom and when risk information needs to be visualized to make a decision. For example, IMs most likely want to identify risky users in order to allocate adequate resources to mitigation measures and to understand how their risks are interrelated. Fig. 7 shows several question-answer examples for identifying risk profiles. Furthermore, a claim and risk management outcome would look very different if it were intended to be printed and handed to risk committee members during a meeting. This diversity of application situations illustrates that risk visualization should be used systematically in most risk-related activities.

Comparing the behavior of observations (Q5): To investigate how NLIs can provide the desired visual responses, Fig. 7 presents the visual responses to different questions that use different interaction modalities, for example, “Show me the distribution of males in the suburb of Lakemba” and “How many females aged 25 are in the state of NSW?”. Based on the results, we argue that combining different interaction modalities is a promising research direction for achieving the desired visual response when exploring and refining data in an interactive system.

Fig. 7

Sample questions with answers generated by our system. The answers in Q1(a) and Q1(b) are for query searching, Q2 is for speech, and Q3(a) and Q3(b) are for touch/pen/mouse

6 User study

To observe how the visual responses generated by the VAS fare on measures of trust and usefulness, we conducted a user study that helps us to understand the potential utility of the proposed framework. The primary aim of our study is to examine how real users would use the InsCRMVis system and to investigate their reactions to multimodal interaction techniques when exploring various queries with several visual views. The evaluation questions, provided in Table 4, were therefore generated based on the aforementioned key questions (Q1, Q2, Q3, Q4, and Q5). We performed the study in a web-based environment to enhance its validity, since participants could work on their own.

6.1 Participants

We conducted our user study with 10 participants (7 men and 3 women) aged between 20 and 59 years. The participants were recruited through mailing lists. They were mostly university students or teachers, as well as stakeholders with expertise in risk management in the insurance industry. The participants were also familiar with basic data visualizations (e.g., bar charts and line charts) because they frequently encountered these as part of their study or work.

6.2 Study design

To validate the performance of InsCRMVis, a TLI questionnaire was administered. A detailed study was conducted to investigate how to visualize information to gain better insights. To assess the effectiveness of the VAS, we evaluated five visualization observations, namely status position (mouse), text input visualization, speech input visualization, touch input visualization, and pen input visualization, as shown in Table 3. The key observation questions in Table 4 were selected to include satisfaction statements on how our system works and how text, speech, and touch output should be structured to avoid distraction. We then collected free-form responses on what the participants considered relevant to the usefulness of our system. The study took about 10 min.

6.3 Discussion

In this study, the participants rated seven measures on a standard five-point Likert scale (strongly disagree, disagree, neutral, agree, and strongly agree). The results of these questionnaires are presented in Fig. 8. The majority of the responses were positive. In particular, most participants agreed that the tool is useful and that it enabled them to quickly find interesting insights in the data. More importantly, 6 out of 10 participants found the combination of multiple input modalities useful for exploring visualizations.
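As an illustration of how five-point Likert responses such as those in Fig. 8 can be summarized, the Python sketch below uses the 1 to 5 coding given in the Table 4 caption and reports, for one questionnaire item, the per-category counts, the share of positive (agree or strongly agree) ratings, and the median. The function name `summarize` and the example ratings are hypothetical and are not the study's data.

```python
from collections import Counter
from statistics import median

# Five-point Likert coding, as in the Table 4 caption (1 = strongly disagree ... 5 = strongly agree).
LIKERT = {1: "strongly disagree", 2: "disagree", 3: "neutral", 4: "agree", 5: "strongly agree"}


def summarize(ratings: list[int]) -> dict:
    """Summarize the Likert ratings collected for one questionnaire item."""
    if not ratings or any(r not in LIKERT for r in ratings):
        raise ValueError("ratings must be a non-empty list of codes from 1 to 5")
    counts = Counter(ratings)          # missing codes count as 0
    positive = counts[4] + counts[5]   # "agree" + "strongly agree"
    return {
        "counts": {LIKERT[code]: counts[code] for code in sorted(LIKERT)},
        "positive_share": positive / len(ratings),
        "median": median(ratings),
    }


if __name__ == "__main__":
    # Hypothetical ratings from 10 participants for a single item (not the study's data).
    result = summarize([5, 4, 4, 3, 4, 5, 2, 4, 3, 4])
    print(result["positive_share"], result["median"])  # 0.7 4.0
```

With this function, an item on which 6 of 10 participants agreed or strongly agreed would have a positive share of 0.6. Reporting counts and the median rather than a mean is a common choice because Likert codes are ordinal.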

Table 3 User study responses to post-study questionnaires on output visualization
Table 4 Key observations identified in collaboration with domain experts where expert responses 1, 2, 3, 4, and 5 indicate strongly disagree, disagree, neutral, agree, and strongly agree, respectively
Fig. 8

Post-study ranking for output visualization and opinion position

Table 3 shows the performance of the various input visualization components, such as mouse, text, and touch, all of which are built into this system. We found that the visual responses generated by our system from text queries were more significant than traditional visual outcomes. The use of speech and touch was also evaluated as both rational and appealing. During the conversation with the domain experts about the system status, 75% of the participants responded positively to visualizations of the output text; they reported that full text was more convenient to use than keywords. In addition, variations in speech output were mostly accepted, and two-thirds of the 10 suggested multi-modal actions were also found appealing for identifying visual responses. Overall, the participants’ comments mainly concerned the user interface, which they felt should be robust, interactive, and smart.

Based on the study, we provide a series of guidelines that IMs can follow when attempting to visualize risks. The key guidelines are as follows:

  • Simple text queries or conversations offer a more flexible way to make productive use of visualization in risk management.

  • Visualizations that use up to three attributes from the dataset are comparatively more informative to domain experts.

  • Unnecessary elements in a visualization may cause confusion because users bring differing expectations.

  • Different types of risks should be depicted using different queries or symbols.

  • Primary risk information should be distinguished from secondary or less important information.

The experts’ feedback and the user study support the effectiveness of our VAS in insurance claims and risk management. We also noticed several other challenges that should be addressed. Even though the proposed framework produced good outcomes and is valuable for risk visualization, there is room for improvement. For example, if a user phrases a query with a variety of keywords that the proposed framework fails to visualize, handling such queries would require transformer-based word embedding methods (e.g., BERT and RoBERTa). Moreover, the proposed framework does not provide any explanation of how it generates a visual response, although such explanations may provide better insights. Another improvement would be to provide guidelines for risk insight visualization.

In summary, we present a web-based visual analytics tool enhanced with query searching, speech, and touch for insurance claims and risk management. We primarily focus on how our system can support IMs. We conclude that full-text query searching has advantages and provides interesting insights. In addition, domain experts preferred visualization through speech.

7 Conclusions and future work

In this paper, we presented a web-based visual analytics solution (InsCRMVis) that contains a suite of interactive visualizations, designed in consideration of the task requirements of risk management domain experts. To the best of our knowledge, this is the first work to use natural language interactions with data visualization to address policyholders’ claims and manage risk in the insurance industry. In this study, we conducted meetings, interviews, and observational sessions with these experts to understand their analysis workflows. Our system supports the analysis of multiple types of insurance datasets, such as relational, claim, and demographic data. We found that when people ask questions, our system provides useful visual insights. Our automatic question-answering pipeline achieves an overall accuracy of 69%. Finally, we provide a qualitative evaluation of InsCRMVis by domain experts based on several use-cases to demonstrate the usefulness of this system in different application scenarios.

In the future, we plan to extend visual analytics to collaborative data analysis tasks such as underwriting, mental health analysis, and adverse selection. While the N-gram approach can generate visualizations for multiple types of data and conveys how the corpus provides a better response, it offers little variation in style, and visual analytics requires more advanced techniques such as transformer-based word embedding methods (e.g., BERT and RoBERTa). Thus, applying transformer-based word embedding methods may help address such limitations.
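The limitation of surface-level matching can be illustrated with a small sketch of n-gram matching. The Python code below scores a user query against a few template questions using the Jaccard overlap of their word unigrams and bigrams, and returns the best template only if its score reaches a threshold. The `TEMPLATES` list, the `best_match` function, and the threshold of 0.2 are hypothetical assumptions; the paper's actual N-gram approach and corpus may differ.

```python
import re
from typing import Optional

# Hypothetical template questions standing in for a question corpus.
TEMPLATES = [
    "show the distribution of males by suburb",
    "how many females are in each state",
    "show claim amounts by age group",
]


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a lower-cased text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: str, b: str, max_n: int = 2) -> float:
    """Jaccard overlap of all word n-grams (n = 1..max_n) of two texts."""
    grams_a = set().union(*(ngrams(a, n) for n in range(1, max_n + 1)))
    grams_b = set().union(*(ngrams(b, n) for n in range(1, max_n + 1)))
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def best_match(query: str, templates: list[str], threshold: float = 0.2) -> Optional[str]:
    """Return the most similar template, or None if no template is similar enough."""
    if not templates:
        return None
    score, template = max(((similarity(query, t), t) for t in templates), key=lambda p: p[0])
    return template if score >= threshold else None


if __name__ == "__main__":
    print(best_match("show me the distribution of males in Lakemba", TEMPLATES))
    # show the distribution of males by suburb
    print(best_match("display gender breakdown for Lakemba", TEMPLATES))
    # None
```

A paraphrase that shares no words with any template, such as "display gender breakdown for Lakemba", scores zero and is rejected, whereas an embedding-based similarity might still relate "gender breakdown" to "distribution of males".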