Introduction

Offenses that are exceptionally cruel and abominable are frequently classified as heinous crimes. Heinous crimes, like all forms of crime, are the consequence of a complex and diverse set of events, and they usually entail some form of extreme personal injury or the death of the victim. Under the Indian Penal Code (IPC), a heinous crime is regarded as an offence committed in an inhumane manner, such as murder (IPC Sections 299, 300, 302; for example, chopping the dead body into pieces), forcible rape (IPC Sections 375, 376D; for example, kidnapping and committing rape numerous times), or burning someone alive (IPC Sections 304B, 326A, 326B) [20]. Furthermore, in some nations, outrageous financial exploitation of elderly or disabled persons is also considered heinous.

Reconviction (recidivism) research is becoming crucial as victims, prosecutors, and the public grapple with how to respond to atrocious acts in order to protect the public, foster rehabilitation, offer a means of vengeance, and punish the criminals. This research is also important because numerous criminal justice boards across the world regard recidivism as an essential measure for evaluating policy interventions in sentencing, incarceration, and parole for heinous crime convicts. Furthermore, recidivism research is significant for fundamental criminal justice issues and can support a number of intervention programmes for convicts, some of which are listed below.

  • Incapacitation (Analyze the impact of prison sentence)

  • Specific Deterrence (Determine whether a conviction deters offenders from committing additional crimes)

  • Improving in–prison rehabilitation programs

  • Desistance

  • Evaluating Prison Performance

Numerous empirical studies carried out by the justice departments of several countries indicate that released prisoners have a high recidivism rate. Both the judiciary and criminologists are keenly interested in understanding how an intervention or sanction affects the criminal behavior of convicted offenders. They are also concerned with developing methods for the early prediction of heinous crime recidivism using different discriminative risk markers.

There are numerous ways to determine recidivism, and each one relies on a different definition of re-involvement in criminal activity. Significant risk factors for estimating the likelihood of a re-offence can be broadly divided into static and dynamic categories, and they include psychological and behavioral details of offenders as well as their re-arrest, conviction, and reincarceration records. Static risk factors are immutable aspects of a convict, such as their socioeconomic status, personality traits, criminal past, and demographic information. Dynamic risk factors, on the other hand, can be affected or modified during the rehabilitation process; examples are substance abuse, involvement with antisocial peers, mental health issues, low income, and employment challenges. Various criminological studies have demonstrated the importance of both static and dynamic risk factors in determining the likelihood that former offenders of serious crimes may recidivate. To quantify the early recidivism risk among heinous crime convicts, criminologists generally take into account the risk factors listed in Table 1. The present study considers the use of machine learning techniques for the early risk assessment of violent crime recidivism, and to support structured professional judgement and effective rehabilitation.

Table 1 Static and dynamic risk factors for heinous crime convicts

Quantitative assessment of the likelihood of recidivism during the pretrial detention, trial, sentencing, and parole stages is one of the most challenging tasks for criminologists and the judiciary. In the last few years, numerous research groups have been attracted to quantitative criminology and have contributed to the development of automated solutions for behavioral criminal profiling and the judiciary system. Most of the published work is devoted to automated crime prediction and to processes that improve the efficiency of the justice delivery system. Although a great deal of research has identified the various risk factors that contribute to violent crime recidivism, little work has used machine learning to assess these risks from psycho–social and behavioral markers. The currently available literature on computer–aided recidivism detection can be categorized primarily by the type of sub–problem addressed, as shown below.

  i. Evaluating multiple risk factors for predicting general, violent, and sexual recidivism

  ii. Predicting criminal recidivism in offenders with mental illness

  iii. Investigating the efficacy of psychometric tools in automated criminal recidivism prediction

  iv. Reviewing intelligent recidivism prediction models and their relevance to the effectiveness of parole board decisions

  v. Understanding the impact of psycho-social, socioeconomic, and behavioral traits in predicting criminal recidivism

  vi. Appraising the algorithmic fairness and improving the accuracy of various risk assessment techniques for recidivism prediction

Additionally, a few researchers have addressed the issue of predicting criminal recidivism in juveniles and first–time offenders (FTO). In this work, we first summarise the literature on identifying risk factors for predicting general, violent, and sexual recidivism, as depicted in Table 2. A selective overview of studies on machine learning–based approaches for detecting abduction, murder, and sexual assault is shown in Table 3.

Table 2 Findings of the reviewed sources on multiple risk factors for predicting general, violent, and sexual recidivism
Table 3 Summary of the reviewed articles on machine learning based prediction of heinous crime

Additionally, a few studies have shown a close correlation between mental illness and violent reoffending [16]. We therefore collect a few studies on risk factors and their use in predicting recidivism in offenders with mental illness, as shown in Table 4.

Table 4 A systematic review on prediction of recidivism in offenders with mental illness

The preceding literature review on computer–assisted violent recidivism risk assessment highlighted the following research gaps.

  i. Most of the research studies measure recidivism risk using only socio–demographic or psychological factors, resulting in poor accuracy

  ii. Accommodating additional risk markers in existing standard recidivism risk assessment software, viz. COMPAS [5] and LSIR [11], is difficult

  iii. Some schemes measure recidivism risk based on preliminary crime investigation records, ignoring individual risk markers

  iv. Machine learning-based recidivism risk assessment techniques generally do not eliminate bias, which harms prediction ability and fairness and leaves them sensitive to sparse data

A number of critical issues in the field of automated heinous crime detection and risk assessment therefore remain to be investigated. Taking these research directions into account, there is ample scope to develop an enhanced intelligent system for the recidivism risk assessment of violent criminal offenders. Such a risk assessment framework will help the judiciary identify offenders who are likely to commit violent crimes again and recommend rehabilitation and supervision. The major contributions of this study are listed below:

  • Efforts have been made to construct an intelligent criminal recidivism risk assessment framework that encompasses personality, psycho-social, and environmental markers as input.

  • The study uses two violence risk assessment scales viz. \(HCR-20\) and \(V-Risk-10\) for quantifying offender behavior.

  • The significance of the collected features is evaluated using a mutual information-based scoring method in order to classify convicts according to their likelihood of recommitting a heinous crime.

  • Finally, five distinct classifiers and a decision tree ensemble classifier for recidivism behavioral sub-typing of heinous crime convicts are evaluated on the HCC datasets.

Fig. 1 depicts the suggested algorithmic framework for assessing the likelihood of reconviction in heinous crime convicts. The remainder of the article is organised as follows. Section “Data and Methods” discusses the procedure of prison data collection and the quantification of behavioral and non-behavioral risk attributes among violent offenders. It also provides a detailed discussion of feature value normalisation and one-way ANOVA-based feature selection. Section “Ensemble Learning Methods” serves as an introduction to ensemble learning and decision tree ensembles. Machine learning based recidivism risk assessment, along with the proposed decision tree ensemble based method for automated recidivism risk assessment in heinous crime convicts, is presented in Sect. “Bagged Tree Ensemble Framework for Recidivism Risk Assessment of Heinous Crime Convicts”. Section “Simulation Results and Discussion” reports our simulations on the acquired data and the recidivism risk gradation analysis. Finally, Sect. “Conclusion” offers concluding remarks.

Fig. 1 Schematic representation of the proposed algorithmic framework for violence recidivism risk assessment

Data and Methods

Routine evaluations of offenders with prior convictions for horrific crimes are frequently required to determine recidivism risk and execute appropriate rehabilitation strategies. In the subsections that follow, we describe the number of convicted offenders examined, the psychological questionnaire adopted, and the procedures followed during the acquisition of behavioral and non–behavioral data.

Study Subject Selection

The present experimental research was conducted on 132 inmates convicted of heinous crimes and incarcerated in Jharkhand State, India, between 2018 and 2021. The inclusion criteria were an age of 18 to 45 years, a conviction for a heinous crime (viz. murder, rape, or kidnapping/abduction), and serving between two and ten years in prison. The proportion of convicts with different heinous crimes considered for this study is depicted in Table 5.

Table 5 Conviction based distribution of offenders

Data about each individual offender were gathered at the primary and secondary levels and could be qualitative or quantitative. The primary data were gathered through interviews with offenders conducted by a clinical psychologist, while secondary data sources included the offenders' past criminal history and present observed behavior, gathered through prison authorities. The interview questionnaire comprises questions designed to evaluate the personality, psycho-social and socioeconomic background, offence description, and current prison behavior of each participating convict. The survey questionnaire was created with the assistance of a panel of professionals that included a forensic expert, a clinical psychologist, a criminal lawyer, and a prison administrator. The standard violence risk assessment tools \(HCR-20\) and \(V-Risk-10\) [3], with appropriate customization, were also included in the questionnaire to measure deviant conduct in individual violent crime convicts. Qualified prison counsellors who were psychology graduates interviewed all convicts over the course of a six–month period to help ensure data accuracy. In addition, the following measures were used to reduce different types of bias in the survey–based data acquisition:

  i. Each inmate’s psycho-social experiences during imprisonment were reviewed by the authorities and the related vocational instructor

  ii. In accordance with commonly accepted clinical guidelines, extreme and neutral responses to a given category were standardized

  iii. Question order bias and non-response bias were minimized by randomly ordering the inquiries and their corresponding answers

Recidivism Feature Measurement and Representation

The \(V-Risk-10\) and \(HCR-20\) psychological profiles are widely regarded as a gold standard for assessing the recidivism risk of violent offenders and are investigated in this research. In addition to these psychometric scores, the survey questionnaire covers personality, psychological, socio-demographic, and prison information for the behavioral portrayal of each offender and for algorithmic recidivism risk assessment. Since the \(HCR-20\) and \(V-Risk-10\) risk markers alone cannot accurately model the violent recidivism characteristics of offenders, other essential known features have been considered here. All of the reported risk markers are used to develop a decision tree ensemble model for automated violent recidivism risk assessment among heinous crime convicts. The standard and specially designed psychometric measures used in the present investigation, categorised as scale dependent and scale independent risk markers, are summarised in Tables 6 and 7.

Table 6 Scale dependent violence recidivism risk markers
Table 7 Scale independent violence recidivism risk markers

The criminal history dataset encompasses felony convictions of 132 offenders for three distinct categories of violent crime (viz. rape and sexual assault, murder, and kidnapping and abduction). Specifically, the dataset includes 73 analyzable attributes, including personality, psycho-social, demographic, and other known risk markers. These 73 attributes comprise 20 \(HCR-20\) behavioral characteristics, 10 \(V-Risk-10\) evaluation parameters, and 43 other attributes. Prior to machine learning–based analysis, the subjective behavioral responses of convicts were quantified with assistance from the Central Institute of Psychiatry, Ranchi. The psychological score for each of the 20 items in \(HCR-20\) and the 10 items in \(V-Risk-10\) was marked as absent (0), mildly significant (1), or absolutely significant (2). The remaining 43 attributes were graded accordingly in consultation with domain experts.

Feature selection

In pattern classification applications, feature selection is a significant technique for reducing the dimensionality of a dataset by eliminating redundant features [9]. In particular, the classification performance of a model on unseen data often improves when irrelevant or redundant features are removed. In addition, a large number of irrelevant features might impede the training process and yield an unnecessarily complex model, for example a neural network with more connection weights than the task requires. Numerous approaches for selecting features are available, with varying degrees of applicability depending on the domain; a comprehensive review of these methods is provided elsewhere [7]. The use of a mutual information criterion is one such method for identifying an informative subset of attributes to be used as input data for a classification model. Battiti [2] was among the first to address the evaluation of mutual information for selecting discriminative features. Mutual information measures arbitrary dependencies between random variables and assesses the significance (information content) of individual attributes in difficult classification problems. In this study, we therefore adopt the principle of mutual information (MI) to assess the informational value of each individual feature in relation to the output class.
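As an illustration of MI-based scoring, the sketch below estimates \(I(X;Y)=\sum_{x,y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}\) for each discrete feature \(X\) against the class label \(Y\) and ranks the features by that score. The estimator used in the study is not specified, so the plug-in (relative frequency) estimate, the function names, and the toy data are assumptions made for this example; it relies only on the fact that the risk items are graded on a discrete 0/1/2 scale.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(feature)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def rank_features(rows, labels):
    """Return (feature_index, MI) pairs, most informative feature first."""
    n_features = len(rows[0])
    scores = [(j, mutual_information([r[j] for r in rows], labels))
              for j in range(n_features)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy data (hypothetical, not from the study): three items graded 0/1/2
# and a binary risk label.
rows = [[0, 1, 2], [0, 0, 2], [2, 1, 2], [2, 0, 2], [1, 1, 2], [1, 0, 2]]
labels = [0, 0, 1, 1, 1, 0]
ranking = rank_features(rows, labels)
```

In this toy example the first item is the most informative and the third item, which is constant, carries no information about the label; a selection step would keep the top-ranked features and discard the rest.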

Machine learning algorithms for feature space classification

In pattern recognition, the majority of machine learning algorithms model the relationship between measured features and a class label (target) given a set of observations. The established model is then used to predict the class label of subsequent observations based on their characteristics. Each classifier must be configured with a suitable learning algorithm so that similar model inputs yield the desired outputs. The overall set of observed data is split into training and testing data sets; the training data are used to fit the model parameters (for example, the weights of a neural network), a process referred to as learning. The remaining test data are then used to evaluate the trained classifier’s performance. In this research, we suggest using a decision tree ensemble to assess the recidivism risk of heinous crime convicts using individual, socio–demographic and behavioral risk indicators. The effectiveness of the retrieved features for classification was also evaluated using other prominent classifiers, viz. PNN, LDA, KNN, and SVM [13, 18, 26, 31]. The parameters of each classifier were optimised for maximum efficacy, and their performances were evaluated using the same training and testing data sets. Section “Ensemble Learning Methods” offers a short introduction to ensemble learning methods. Decision tree ensembles are investigated in Section “Decision Tree Ensembles”. The decision tree ensemble framework for the assessment of recidivism risk among heinous crime convicts is detailed in Sect. “Bagged Tree Ensemble Framework for Recidivism Risk Assessment of Heinous Crime Convicts”.

Ensemble Learning Methods

Ensemble learning is a machine learning paradigm that blends the predictions of different classification models to improve the overall forecasting accuracy. Bagging, stacking, and boosting are the three classifier merging algorithms that predominate in the field of ensemble learning. The popularity of these algorithms is largely attributable to their simplicity of implementation and effectiveness on a wide variety of predictive modelling challenges. Ensemble processes can also be categorized according to two distinct criteria, as illustrated in Table 8.

Table 8 Different types of ensemble learning methods

Decision tree ensembles have been a popular option in many scenarios for solving complex classification and regression problems. Ensemble techniques integrate multiple decision trees to obtain better prediction performance than a single decision tree: they can reduce, though not eliminate, the high bias associated with very simple trees and the high variance associated with very complex ones. The premise behind a decision tree ensemble model is that a group of weak learners can come together to form a strong learner. The subsequent section details the creation of ensembles of decision trees as well as the various techniques for combining individual trees.

Decision Tree Ensembles

A single decision tree often generalises poorly to unseen data. However, considerably more accurate predictions can be obtained by aggregating the predictions of a large number of decision trees. As a fully grown decision tree has low bias but high variance, the results of multiple trees can be blended to lower the variance while preserving the small bias. The process of integrating multiple decision trees into a single, more robust model (classification or regression) is termed a decision tree ensemble. As was previously indicated, bagging, boosting, and stacking are the standard approaches for aggregating decision trees; Table 9 introduces these techniques and distinguishes their core concepts.

Table 9 Different Types of Ensemble Learning Methods

In the following section, we present a bagging ensemble of decision trees for assessing the recidivism risk among heinous offender populations.

Bagged Tree Ensemble Framework for Recidivism Risk Assessment of Heinous Crime Convicts

The ensemble learning approach combines the strengths of numerous models (often referred to as weak learners or base models) that have been trained to address the same classification or regression task. The basic principle is that weak models, integrated appropriately, can reduce the bias and/or variance of the individual weak learners and help build a strong learner (or ensemble model) that provides more accurate, robust, and reliable results. In this study, we adopt the Bootstrap aggregation (Bagging) strategy as an ensemble approach to reduce the decision tree variance. The use of bootstrap replicates of the training data during bagging is intended to introduce diversity among the ensemble member classifiers. The Bagging process combines, by averaging or voting, the predictions of individual weak learners trained with bootstrap replicates of the training dataset [8]. In addition to reducing variance, bagging is also intended to limit over-fitting. Figure 2 is a simplified illustration of the bagging method for developing a decision tree ensemble for feature classification.
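The variance-reduction argument can be made explicit with a standard result, given here as a sketch under simplifying assumptions that are not stated in the study. Suppose the \(T\) trees produce predictions \(f_1(x),\dots,f_T(x)\) that are identically distributed with variance \(\sigma^2\) and pairwise correlation \(\rho\). Then

\[ \mathrm{Var}\left(\frac{1}{T}\sum_{t=1}^{T} f_t(x)\right) = \frac{1}{T^2}\left(T\sigma^2 + T(T-1)\rho\sigma^2\right) = \rho\sigma^2 + \frac{1-\rho}{T}\,\sigma^2 . \]

The second term shrinks as \(T\) grows, whereas the first term remains, which is why the diversity introduced by bootstrap replicates (a smaller \(\rho\)) matters. The expected value of the average equals that of a single tree, so under these assumptions the bias is left unchanged.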

Fig. 2 A bagged ensemble of decision trees for feature classification

Steps to create an ensemble of decision trees:

  1) Initialize an empty list to store the T decision trees.

  2) For t in range(T):

    • Randomly select a subset of the training examples with replacement. Let this subset be denoted X_t and the corresponding labels be y_t.

    • Create a decision tree with the specified maximum depth (max_depth) using the subset X_t and labels y_t.

    • Add the decision tree to the list of decision trees.

  3) To make a prediction on a new example x:

    • For each decision tree in the ensemble:

      i. Make a prediction using the decision tree on the new example x.

      ii. Store the prediction.

    • Aggregate the predictions of all decision trees in the ensemble. This can be done using methods such as majority voting (for classification problems) or averaging (for regression problems).

    • Return the final prediction.

  4) Evaluate the performance of the ensemble on the testing dataset using a suitable metric such as accuracy, precision, recall, or F1 score.

For the risk assessment of heinous crime convicts using various socio–demographic and behavioral markers, this bagged decision tree ensemble model assigns a class label to each feature instance by a majority vote [1] over all the decision trees.

In addition to using the bagged decision tree ensemble, we also simulate recidivism risk assessment with some other standard classifiers, viz. PNN, LDA, KNN, and SVM.
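The bagging steps listed in this section can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used in the study: the Gini-based tree learner, the function names, the default values of n_trees and max_depth, and the toy data are all assumptions made for the example, and class labels are assumed to be integers.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, max_depth):
    """Grow a tree. A leaf is a class label; an internal node is a tuple
    (feature_index, threshold, left_subtree, right_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for j in range(len(X[0])):
        # Dropping the largest value keeps both sides of the split non-empty.
        for thr in sorted(set(row[j] for row in X))[:-1]:
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr)
    if best is None:  # every feature is constant, so no split exists
        return majority
    _, j, thr = best
    lo = [i for i, row in enumerate(X) if row[j] <= thr]
    hi = [i for i, row in enumerate(X) if row[j] > thr]
    return (j, thr,
            build_tree([X[i] for i in lo], [y[i] for i in lo], max_depth - 1),
            build_tree([X[i] for i in hi], [y[i] for i in hi], max_depth - 1))

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree

def fit_bagged_trees(X, y, n_trees=25, max_depth=3, seed=0):
    """Steps 1 and 2: train n_trees trees, each on a bootstrap replicate."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth))
    return trees

def predict_bagged(trees, x):
    """Step 3: majority vote over the individual tree predictions."""
    votes = Counter(predict_tree(t, x) for t in trees)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two items graded 0/1/2, label 1 = high risk.
X = [[0, 0], [0, 1], [1, 0], [1, 2], [2, 1], [2, 2], [0, 2], [2, 0]]
y = [0, 0, 0, 1, 1, 1, 0, 1]
trees = fit_bagged_trees(X, y)
predictions = [predict_bagged(trees, x) for x in X]
```

Step 4 (evaluation) would compare such predictions with held-out labels rather than with the training labels used here. In practice a library implementation of bagged trees would normally be preferred over a hand-written learner.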

Simulation Results and Discussion

The customized questionnaire based on HCR-20 and V-Risk-10 was used to create the violent recidivism dataset. This dataset was used to test the effectiveness of different standard classification models, viz. PNN, LDA, KNN, SVM, DTC, and the Treebagger ensemble. Each model’s competence can be evaluated by how accurately it places an entity into one of several predetermined categories. Standard metrics, viz. sensitivity, specificity, accuracy, and F-Score [23], are used to evaluate the effectiveness of the different classification models. For a fair comparison, five-fold cross-validation is used to evaluate the performance of the above classifiers. The parameter settings of the decision tree classifier and the tree bagger ensemble are depicted in Table 10.

Table 10 Parameter settings of different classifiers
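The five-fold protocol can be sketched as follows. The study does not state whether the folds were stratified or how they were shuffled, so the simple shuffled split below, the function names, and the fit/predict callable interface are assumptions made for illustration.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices once and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return the mean held-out accuracy over k folds.

    fit(X_train, y_train) -> model and predict(model, x) -> label are
    caller-supplied callables (a hypothetical interface for this sketch)."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# With 132 samples, five folds hold 27, 27, 26, 26, and 26 samples.
folds = k_fold_indices(132, k=5)
fold_sizes = [len(f) for f in folds]
```

Every sample is held out exactly once, so each classifier is scored on data it did not see during training, and using the same seed for all classifiers gives them identical folds.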

The efficacy of each model can be measured by how well it categorises a subject. The classification of a behavioral risk from the subject group has four possible outcomes: true positive (\(T_{p}\)), true negative (\(T_{n}\)), false positive (\(F_{p}\)), and false negative (\(F_{n}\)). The effectiveness of the suggested method is evaluated with respect to four metrics, viz. accuracy, sensitivity, specificity, and F–score [23].
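These metrics follow their usual definitions: accuracy is \((T_{p}+T_{n})/(T_{p}+T_{n}+F_{p}+F_{n})\), sensitivity (recall) is \(T_{p}/(T_{p}+F_{n})\), specificity is \(T_{n}/(T_{n}+F_{p})\), precision is \(T_{p}/(T_{p}+F_{p})\), and the F–score is the harmonic mean of precision and recall. A minimal sketch is given below; the function name and the counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts for one class.

    Assumes every denominator is non-zero; a production version would
    need to handle empty classes explicitly."""
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }

# Hypothetical counts for illustration only.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For a problem with more than two risk classes, the same function can be applied to each class in turn (one class versus the rest) and the results averaged.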

Table 11 Predictive validity based on class labels

Table 11 displays the overall findings in terms of predictive validity for the simulated classifiers, depicting both the actual and predicted outcomes. Table 12 represents classification results in terms of average accuracy for six classifiers. In addition, Table 13 presents classification outcomes in terms of recall, precision, specificity and F–score. Moreover, the performance results of the proposed system and the existing methods are presented in Table 14.

Figure 3 shows the relationship between features and their respective p values, which serve as an indicator of the features’ ability to distinguish between the distinct types. Information on the likelihood of recidivism for those convicted of heinous crimes is derived from a subset of the available risk markers (chosen from a pool of around 73).

Fig. 3 Examining the p value and feature index in a G-scatter plot to extract useful characteristics

Table 12 Average prediction (classification) accuracy over 5-fold
Table 13 Performance evaluation of the classifier
Table 14 Comparison of proposed system with existing results

Table 15 reports the total execution time (in seconds) of the aforementioned classifiers over the whole classification process (training and validation phases).

Table 15 Variation of computation time in seconds

The experimental findings show that the average precision of the base classifiers (PNN, LDA, KNN, SVM) lies in the range \(70.10\%\) to \(78.00\%\), while their recall lies in the range \(72.10\%\) to \(75.60\%\). The Treebagger ensemble model offers the highest overall precision (\(93.30\%\)) and recall (\(92.30\%\)). Its average accuracy was \(88.38\%\) for scale-dependent and scale-independent risk variables. The ensemble model takes more time to compute than the base classifiers.

Conclusion

Criminals convicted of heinous or violent offences are especially susceptible to recidivism. In order to make decisions about parole, bail, punishment, etc., the judiciary often conducts routine screening of incarcerated people guilty of violent crimes. The standard procedure followed by most detention center authorities in developing countries to evaluate inmates is typically based on their own personal prejudices and perceptions. Recently, however, a few prisons across the globe have adopted computer aided methods for recidivism risk assessment and decision making. Such methods include automated behavioral analysis and statistical analysis of the results. The purpose of this research is to develop machine learning based methods for recidivism risk assessment in heinous crime convicts. In this regard, we propose a novel ensemble learning model for risk assessment among heinous crime offenders using customised quantitative recidivism risk markers. A survey questionnaire covering psycho–social factors, personality traits, and cumulative prison conduct, based on customized \(HCR-20\) and \(V-risk-10\) recidivism risk marker scales, was used to create the data set, followed by decision tree ensemble based risk assessment. Using a mutual information based feature selection strategy, we identified 73 discriminative attributes from the original set of 93 risk markers.

In addition to simulating the suggested tree bagger ensemble, we also simulate five other standard classifiers and compare their results on the designed violent risk assessment dataset. The proposed method achieves an accuracy, recall, precision, specificity, and F-Score of \(88.38\%\), \(92.30\%\), \(93.30\%\), \(96.10\%\), and \(92.70\%\), respectively. It is also observed that the proposed scheme outperforms existing systems [14, 27, 28, 30] in terms of accuracy. We expect that such findings will assist criminologists in the early quantitative assessment of violent crime convicts.

In future, more efforts will be made to explore possible approaches: (a) for identifying personality abnormalities in predicting recidivism in violent offenders, (b) explore advanced machine learning methods for maximizing the performance