1 Introduction

Rapid advances in data analytics, machine learning, artificial intelligence, the Internet of Things (IoT), and computing power have reshaped the recent era. Machine learning and data analytics play a major role in healthcare, changing how various diseases are diagnosed and treated. Likewise, artificial intelligence (AI) and machine learning (ML) have proven their worth in medical sciences and pharmacy. Several expert systems and medical diagnostic applications have been developed to improve patients’ health and quality of life and to support the expertise, skills, and practices of doctors and physicians. Accordingly, substantial resources are being spent on developing decision-making applications, clinical tools, and machines. These applications, tools, and machines support doctors, psychologists, medical specialists, and radiologists in the swift, accurate, and timely detection of diseases [151].

Worldwide, a large number of people suffer from diverse disorders such as depression, mental stress, and anxiety, and from diseases of the central nervous system such as Alzheimer’s [185], dysgraphia [154], Parkinson’s disease, traumatic injury, vascular disease, Lewy body disease, and frontotemporal lobar degeneration. Parkinson’s disease (PD), a motor system disorder, is the second most common neurological disorder. The diagnosis and detection of PD are challenging and remain an open and active problem for researchers. Research directed toward the diagnosis of Parkinson’s disease is not new. It can be traced back to Galen, who described various symptoms of the disease, and the number of people affected is estimated at around 7–10 million [171]. Considerable effort has since been devoted to identifying PD in time and at an early stage, but no robust system or test is yet available to detect its signs and symptoms early, because they vary from one person to another. The symptoms are also very similar to those of other diseases, which makes it difficult to distinguish PD from them and can lead to misdiagnosis. Nevertheless, shaking, tremor, and gait symptoms can play an important role in detecting PD at an early stage. In 1817, an English doctor, James Parkinson [130], wrote ‘An Essay on the Shaking Palsy,’ in which different symptoms of the disease were discussed. The disease was subsequently named ‘Parkinson’s disease’ in his honor.

A number of review, state-of-the-art, and survey articles have been published on Parkinson’s disease, but some cover only the medical side, some cover the statistics, and only a few cover the application of machine learning [2, 37, 46, 102, 136, 146]. A detailed systematic analysis of the country-specific, regional, and global statistics on the increasing number of Parkinson’s disease cases from 1990 to 2016 is presented in [46]. The authors reported that the number of PD patients worldwide increased from 2.5 million in 1990 to 6.1 million in 2016. In a recent review, Channa et al. [37] presented wearable technologies for PD detection from 2009 to 2020. Also in 2020, Mendonça et al. [102] covered the works on, and the impact of, depression in individuals with PD. Pereira et al. [136] surveyed automated PD identification systems from 2015 and 2016. In this paper, our primary focus is on the methods, studies, and research that use speech, handwriting, MRI, and gait as input data for detecting Parkinson’s disease with the help of machine learning and deep learning, in particular the methods published after 2016. We also compile the available PD datasets used in machine learning. The framework developed and followed for this survey paper is provided in Fig. 1. The six major parts of the framework are the preliminary concepts, PD datasets, state-of-the-art techniques, challenges and issues, discussion and analysis, and conclusions and future directions.

Fig. 1 Framework for artificial intelligence-based PD detection

The primary motivation of this study is to give readers, beginners, researchers, and medical specialists in machine learning, medicine, and pharmacy a comprehensive account of the machine learning and deep learning techniques and approaches used for PD detection. The ultimate goals of this survey are to:

  1. Provide a comprehensive review of the application of different techniques (machine learning and deep learning) used to automatically detect Parkinson’s disease.

  2. Investigate and compare the accuracy achieved by various research studies and determine the most promising model.

  3. Highlight the categories (speech, handwriting, MRI, and gait) of PD datasets and provide a detailed description of each of the available datasets.

  4. Comprehend the feature extraction methods for PD detection and the various types of features.

  5. Address the open challenges and issues observed during automated PD diagnosis.

  6. Provide a comparative analysis of the different categories of PD datasets.

  7. Provide a comparative analysis of the different approaches (machine learning and deep learning) applied to the various PD datasets.

  8. Find the approach (machine learning or deep learning) most widely used by the research community.

The rest of the paper is organized according to the given framework as follows. Section 2 provides a detailed overview of several preliminary concepts. Next, Sect. 3 presents the various categories (speech, handwriting, radiology, and gait) of the PD datasets, together with the available datasets in each category. Afterward, in Sect. 4, different studies on the detection of Parkinson’s disease are discussed in detail, covering various machine learning and deep learning techniques as well as some other medical tools and methods. Open research challenges and issues are presented and elaborated in Sect. 5. In Sect. 6, a detailed comparative discussion and analysis are provided based on the research articles covered in this study. Finally, Sect. 7 concludes the article.

2 Preliminary concepts

To understand the role of machine learning and artificial intelligence in Parkinson’s disease detection, we need to go through the fundamental concepts of Parkinson’s disease. The following subsections provide a thorough discussion of nervous system disorders and present the various causes and symptoms (motor and non-motor) of Parkinson’s disease. Subsequently, the importance of automated Parkinson’s disease detection in comparison with traditional, non-artificial-intelligence-based methods is discussed.

2.1 Nervous system disorders

The nervous system [108] plays a vital role in controlling different mental processes such as thinking, memory, and learning. It connects us with the environment, internally as well as externally, through its receptors. The nervous system has two components: the central nervous system (CNS) and the peripheral nervous system (PNS). The central nervous system [31] is composed of the brain and spinal cord. It combines all the information that it receives and controls and maintains the activities of the body. The peripheral nervous system [144] consists of the nerves and ganglia located outside the brain and spinal cord. It works as a communicator between the central nervous system and other parts of the body. It also connects the CNS with external and internal stimuli, allowing the body to respond to its environment.

Some diseases affect the central nervous system and disrupt whole-body functionality. Infections such as encephalitis and poliomyelitis [29, 87, 162] affect the central nervous system, as do neurological disorders such as autism and neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease, all of which damage the nervous system. In neurodegenerative disorders, the cells in the CNS cease to operate. Hence, neurodegenerative disorders are progressive in nature. These degenerative disorders may lead to disabilities and affect patients’ everyday activities.

2.2 Parkinson’s disease: overview, causes and symptoms

Parkinson’s disease is a neurological disorder that is mostly observed in elderly people [42] and sometimes in younger adults [65]. Depression and stress can also cause Parkinson’s disease [102]. Trauma, head injury, inheritance, genetics, and environmental factors (toxins, pesticides, chemicals, etc.) may also cause PD [127]. The disease damages the central nervous system and affects the neurons involved in different movements. Additionally, it affects cognitive and mental activities. Parkinson’s disease is a disorder of a brain region called the substantia nigra, whose cells normally produce a vital chemical known as dopamine, as depicted in Fig. 2. Dopamine allows smooth, coordinated functioning of the body’s muscles and movement. When the dopamine level in the substantia nigra decreases, the person experiences the symptoms of Parkinson’s disease. Alteration in the cortisol level also leads to PD [173].

Fig. 2 Overview of Parkinson’s and healthy person neurons [18]

Parkinson’s disease symptoms appear when around 80 percent of the dopamine-producing cells are damaged. The symptoms are divided into motor and non-motor symptoms [152]. Motor symptoms are related to movement, whereas non-motor symptoms are unrelated to movement. Motor symptoms [76] include muscle rigidity (inability to bend), tremor (shaking or trembling), bradykinesia (slowness of movement), postural instability (impaired balance), movement disorders, and shrinking of muscles. The disease also involves posture disruption as well as mood disturbances (anxiety and depression) [44] (see Fig. 3).

Fig. 3 Symptoms of Parkinson’s disease [99]

A person suffering from PD is at a higher risk of falling [210], since dopamine is responsible for the controlled movement of the body. Constant falls result in an overwhelming fear of falling, which limits multiple activities and restricts participation. The disease decreases quality of life [66]: patients suffer from pain, and their psychological behavior is negatively affected. Since it is a chronic and progressive disease [161], the condition of patients grows more critical over time. It has been reported that PD has no cure [141]. Dancing is one of the physical exercises usually recommended for PD patients because it can enhance their movement and balance [165]. In a longitudinal study from the Honolulu Heart Program, coffee intake was observed to act as a protective agent against PD [156].

2.3 Significance of automated PD diagnosis

Parkinson’s disease is diagnosed through neurological tests and brain scanning. These tests and scans are extremely expensive. In addition, the diagnosis of Parkinson’s disease requires the expertise and knowledge of professionals. In medicine, early diagnosis and treatment are highly valued and preferred. However, this may not be possible for a large number of patients when PD is diagnosed manually, since diagnosing many patients with manual methods can be time-consuming and lead to long delays.

Consequently, rather than relying on brain scanning and neurological testing, researchers have developed various machine learning-based methods and techniques to differentiate PD patients from healthy controls by using handwriting, voice, and speech samples [186]. Automated systems aim to provide benefits in various ways. They reduce the burden associated with the manual processes required for clinical trials [131]. They also decrease the time required for testing software and expose the maximum number of errors in minimum time [30]. Hence, automated diagnosis systems can be considered time efficient. Similarly, deep learning models demonstrate accurate results for segmentation and automated detection [86]. Advanced healthcare systems such as sensor-based systems can provide health as well as economic benefits. Such technological interventions enhance healthcare access, improve the results of medical treatments, and achieve better performance while reducing costs [110].

Gait analysis also plays an important role in the detection of Parkinson’s disease. Freezing of gait (FOG) is defined as an episodic inability to generate effective steps in the absence of any known cause other than a high-level gait disorder or parkinsonism [60]. It is commonly experienced during step initiation and turning, but also when the patient is faced with obstacles, doorways, distraction, or stress. Pressure-sensor walkways are generally used in laboratories for gait assessment [51]. Gait deficiencies in Parkinson’s patients arise from a disruption in the motor set function of the basal ganglia, which is specifically involved in the regulation of movement amplitude. Early studies of PD gait disorders used stride analyzers to characterize abnormalities in the spatial–temporal parameters of gait [177].

3 Datasets for PD analysis

Datasets play a vital, foundational role in artificial intelligence-based learning and diagnosis systems. Various types of data inputs for diagnosing Parkinson’s disease are reported in the literature, including speech signals, MRI and PET/CT images, and handwriting images. Over the years, different datasets have been developed and made available to professionals and prospective researchers in AI and machine learning for analysis and experiments. Overall, the datasets for PD detection can be divided into four major categories: speech datasets, handwriting datasets, radiology image datasets, and gait datasets. This section provides a thorough review of the available PD datasets in each of these categories.

3.1 Speech datasets

Speech is the ability to express thoughts and feelings through articulate sounds. Speech has proven helpful for the identification of various diseases, including Parkinson’s disease, and speech impairment is one of the leading symptoms found in PD patients. Several research studies have been directed toward the collection of Parkinson’s speech datasets. The available PD speech datasets are listed and discussed below.

  • Parkinson Multiple Types of Sound Recording Data set: The PD database [159] consists of training and test sets and includes a total of 1040 instances. It comprises data from PD patients as well as healthy individuals, both male and female. The training set consists of data collected from 20 PD patients (6 females and 14 males) and 20 healthy persons (10 females and 10 males). A total of 26 speech samples were recorded for each person, including short sentences, words, numbers, and sustained vowels. Subsequently, 26 linear features were extracted from each speech sample. The testing set, on the other hand, contains recordings of only two sustained vowels, “a” and “o”, collected from 28 PD subjects. With 6 voice samples from each PD subject, it contains 168 voice samples. This dataset can be used for classification and regression purposes.

  • Parkinson Speech Dataset by Max Little: This speech dataset was created by Max Little of the University of Oxford in collaboration with the National Centre for Voice and Speech in Denver, Colorado, which recorded the speech signals [88, 89]. The biomedical voice dataset contains 195 sustained vowel phonations from 31 subjects, 23 of whom were diagnosed with Parkinson’s disease. The time since PD diagnosis ranged from 0 to 28 years, and the age of the participants ranged from 46 to 85 years. The phonations were recorded in an Industrial Acoustics Company (IAC) sound-treated booth using a head-mounted microphone. About six phonations were recorded per subject, and the dataset consists of 23 attributes, i.e., name of the subject, jitter, shimmer, amplitude, average, maximum and minimum vocal frequency, status (healthy or non-healthy), etc.

  • PC-GITA Database by J.R. Orozco-Arroyave: A dataset of 100 subjects, 50 non-healthy and 50 healthy people, was developed by the authors in [118]. Each group contained 25 men and 25 women. The age of the men diagnosed with PD ranged from 33 to 77 years, whereas the age of the women diagnosed with PD ranged from 44 to 75 years. All the participants were native Spanish speakers. The recordings were captured in a noise-controlled room using a 24-bit audio card. The participants were recorded while producing sustained vowels, rapidly repeating syllables like ‘pa-ka-ta’, and uttering words and sentences. Lastly, all the subjects were examined by neurological experts.

  • Parkinson Speech Dataset by Skodda: This dataset includes the voice recordings of 73 PD patients and 43 healthy controls [172]. The age of the patients ranged from 43 to 83 years. The patients had been on dopaminergic medication for at least 4 weeks before the examination. All the subjects involved in the recordings were native German speakers. The recordings were captured while the participants repeated the syllables ‘pa’ and ‘ba’ at least 25 times. Speech samples were recorded digitally using a headset microphone and commercial audio software. The distance between the microphone and the mouth of the subject was 3 cm.

  • Parkinson Speech Dataset by Bayestehtashk: Bayestehtashk [19] captured the voice recordings of 168 patients and 21 healthy subjects. The recordings were captured in three stages. In the first stage, participants sustained the vowel ‘ah’ for about ten seconds. In the second stage, they repeated the syllables ‘pa-ta-ka’. In the final stage, the participants read three passages, i.e., “The north wind and the sun”, “the rainbow”, and “the Grandfather”. Since three samples were recorded per subject, the dataset contains a total of 567 samples.

3.2 Handwriting datasets

Handwriting is the process of writing down material by hand, usually to express feelings and thoughts. Handwriting serves as a biomarker for Parkinson’s disease detection because impaired handwriting is one of the main symptoms of PD. Numerous researchers have evaluated different characteristics of handwriting to diagnose Alzheimer’s and Parkinson’s disease. Likewise, researchers have observed handwriting issues in subjects with dysgraphia. The handwriting of a Parkinson’s disease patient is strongly affected: approximately 63 percent of Parkinson’s disease patients suffer from micrographia. Further, handwriting data can be categorized as online or offline [140]. Online samples are collected by writing on tablets, whereas offline samples are collected by writing on paper. The available PD handwriting datasets are listed and discussed below.

  • PaHaW Dataset: The Parkinson’s disease handwriting (PaHaW) database is a readily available dataset. It has been used by nearly all researchers who have studied the diagnosis of Parkinson’s disease through handwriting. In the PaHaW dataset, handwriting samples were taken from 75 subjects: 37 Parkinson’s patients (19 men and 18 women) and 38 healthy controls (20 men and 18 women). These handwriting samples were in the English language and were collected using a digitizing tablet. The tablet records the coordinates of the hand movement both in the air and on the surface.

  • PDMultiMC dataset: In the PDMultiMC dataset [180], handwriting samples were collected from 32 subjects, consisting of 16 healthy and 16 non-healthy subjects. The 16 patients consisted of 12 males and 4 females, whereas the healthy group contained 5 males and 11 females. The PD patients were examined in the “on-state” and the “off-state”. On-state refers to 1 h after taking the regular dose of dopaminergic medication, whereas off-state means no dopaminergic medication. The subjects were asked to complete seven handwriting tasks. These writing tasks were prepared according to templates such as “Repetitive-cursive letter”, “Triangular wave”, “Repetitive ‘Monday’ word”, “Repetitive ‘Tuesday’ word”, “Repetitive subject’s name” and “Repetitive subject’s last name”.

  • HandPD: The HandPD dataset [135] was gathered at the Faculty of Medicine of Botucatu, Sao Paulo State University, Brazil. It consists of handwritten examination data from two groups, i.e., a healthy group and a patient group. The healthy group consisted of 18 individuals (6 males and 12 females), whereas the patient group consisted of 74 individuals (59 males and 15 females). In the healthy group, two individuals were left-handed and 16 were right-handed. In the patient group, five were left-handed and 69 were right-handed. The HandPD dataset consists of a total of 736 images, 368 spirals and 368 meanders. A few examples extracted from the HandPD dataset are shown in Fig. 4.

    Fig. 4 Few examples of spirals and meanders extracted from HandPD dataset [138]

  • NewHandPD: The NewHandPD dataset [137] is an extension of the HandPD dataset. It is composed of data from two groups, i.e., healthy and patient. The healthy group consisted of 35 individuals (18 males and 17 females), whereas the patient group consisted of 31 people (21 males and 10 females). Hence, a total of 66 individuals were used to create the dataset. A total of 12 examinations were conducted for each individual. Of these 12 examinations, 4 were related to meanders and 2 to circled movements (one circle in the air and one on the paper), and the set also included left- and right-handed diadochokinesis. Overall, this dataset consists of 264 images, 420 signals from healthy controls, and 372 signals from patients.

  • Parkinson’s Disease Spiral Drawings Using Digitized Graphics Tablet: This database [72] consists of spiral drawings from 77 people, of whom 62 are PD patients and 15 are healthy individuals. Three types of handwriting test were recorded from all the individuals using a Wacom Cintiq 12WX graphics tablet: the static spiral test (SST), the dynamic spiral test (DST), and the stability test on a certain point (STCP). In the SST, patients were asked to retrace Archimedean spirals that appeared on the graphics tablet. In the DST, the spirals blinked, which forced the patients to keep the pattern in mind. In the third test, patients were asked to hold the digital pen over a point in the middle of the screen without touching the screen.

3.3 Radiology image datasets

Several imaging studies dedicated to the evaluation and detection of Parkinson’s disease use magnetic resonance imaging (MRI) to examine the brain’s structure. Image datasets for Parkinson’s disease detection consist mainly of MRI images [17]. MRI produces clear images of the human body without affecting the tissues or cells, and it can help detect Parkinson’s disease at an early stage. The available PD MRI datasets are listed and discussed below.

  • Neurocon: The Neurocon dataset [16] was obtained from 43 subjects, consisting of 27 healthy and 16 non-healthy individuals. The subjects were examined at early stages of PD at the Department of Neurology, University Emergency Hospital, Bucharest. To collect the data and assess the changes between PD and healthy subjects, all the subjects underwent an fMRI scan of 8.05 min.

  • Tao Wu: The Tao Wu dataset [16] was created by collecting data from 40 subjects, comprising 20 Parkinson’s disease patients and 20 healthy controls. All patients were in the medium phase of the disease at the time of collection. For the determination of the disease, all the subjects underwent an 8-min fMRI scan.

  • Parkinson’s Progression Markers Initiative (PPMI): The PPMI dataset [95] is an easily available dataset. It is composed of MRI, PET, and DTI (diffusion tensor imaging) images from 600 subjects, of whom 400 suffer from Parkinson’s disease and 200 are healthy individuals. The subjects were evaluated with the MDS-UPDRS and different clinical tests for anxiety, sleep, cognitive health, etc. To examine the changes in DAT density, all the subjects underwent longitudinal DAT imaging.

  • NTUA Parkinson Dataset: Currently, this dataset [179] contains two types of images, i.e., dopamine transporter (DAT) scan images and magnetic resonance imaging (MRI) images, from 78 individuals. Of these 78 individuals, 55 are Parkinson’s patients and 23 are normal controls. There are 920 DAT scan images, including 590 of PD and 330 of NPD, and 43,087 MRI images, including 32,706 of PD and 10,381 of NPD. The authors’ target was to obtain a database consisting of 100 PD patients and 40 healthy controls.

3.4 Gait datasets

Freezing of gait (FOG) is one of the most devastating symptoms among the different motor symptoms in PD. In FOG, freezing mainly occurs in the patient’s legs [15]. The available PD gait datasets are listed and discussed below.

  • Daphnet Freezing of Gait Dataset: The Daphnet Freezing of Gait dataset was created by Bachlin et al. [15]. The study was performed in the Laboratory of Gait and Neurodynamics, Department of Neurology, Tel Aviv Sourasky Medical Center (TASMC), on idiopathic PD patients with a history of FOG. The study was performed in two different sessions, and each session consisted of three kinds of tasks: first, walking back and forth in a straight line; second, random walks in the reception hall; and third, walking that simulated activities of daily living. The data were recorded from 10 patients over a total of 8 h and 20 min. During the study, eight of the patients displayed FOG, while two patients did not show any freezing event.

  • Dataset by PhysioBank: This database was collected by PhysioBank [61]. It contains gait measures from 93 idiopathic PD patients and 73 healthy controls. The vertical ground reaction force of the participants was recorded as they walked at a self-selected pace on level ground for approximately 2 min. Eight sensors beneath each foot measured the force (in newtons) as a function of time. The output of these sensors was digitized and recorded at 100 samples per second.
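As a rough illustration of how recordings of this kind are often handled, the sketch below sums the eight per-foot sensor readings into a total vertical ground reaction force per time step and computes the number of samples expected from a 2-min walk at 100 samples per second. The function names and the toy readings are hypothetical; they are assumptions for illustration and not part of the PhysioBank distribution.

```python
SAMPLING_RATE_HZ = 100   # samples per second, as stated for the dataset
SENSORS_PER_FOOT = 8     # force sensors beneath each foot


def expected_samples(duration_s, rate_hz=SAMPLING_RATE_HZ):
    """Number of samples recorded over duration_s seconds."""
    return int(duration_s * rate_hz)


def total_force(frame):
    """Sum the per-sensor forces (in newtons) of one foot at one time step."""
    if len(frame) != SENSORS_PER_FOOT:
        raise ValueError("expected one reading per sensor")
    return sum(frame)


# Hypothetical readings for one foot at two consecutive time steps.
left_foot = [
    [10.0, 20.0, 30.0, 40.0, 40.0, 30.0, 20.0, 10.0],
    [0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 0.0],
]
force_series = [total_force(frame) for frame in left_foot]
samples_in_walk = expected_samples(2 * 60)
```

Under these assumptions, a 2-min walk yields 12,000 samples per sensor.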

A summary of the statistics of the datasets in each category (speech, handwriting, radiology, and gait) for PD identification is given in Table 1.

Table 1 Statistics of datasets for PD identification

4 Machine learning for PD diagnosis

Over the past few years, different approaches have been presented for PD detection. The related work can be categorized by the type of data used for Parkinson’s disease detection: speech, handwriting, radiology, and gait. This section provides a critical review of the contributions of different research studies in medicine and artificial intelligence to PD detection.

4.1 Speech-based diagnosis

Speech data have been used by different researchers for Parkinson’s disease (PD) detection with machine learning and deep learning. In this section, we first present traditional machine learning techniques and then deep learning and other techniques for PD detection from speech data.

The dataset collected by Max Little of Oxford University, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, was used in a research study by Aich et al. [3]. Principal component analysis (PCA) was applied for feature extraction, and a genetic algorithm (GA) was used for feature selection. A total of 11 features were obtained using PCA and 10 using the GA. The SVM classifier achieved an accuracy of 97.57% using the GA-based features.
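To make the idea of GA-based feature selection concrete, the following is a minimal sketch and not the procedure of Aich et al. [3]. Each individual is a binary mask over the features, and the population evolves through tournament selection, one-point crossover, bit-flip mutation, and elitism. The function `ga_select`, its parameters, and the toy fitness function are all assumptions made for illustration; in practice the fitness would typically be the cross-validated accuracy of a classifier, such as an SVM, trained on the selected features.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Return (best_mask, history); history holds the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_pop = [ranked[0][:]]                  # elitism: keep the best mask
        while len(next_pop) < pop_size:
            # Tournament selection: best of three randomly drawn individuals.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy fitness (an assumption): reward three "informative" feature indices and
# penalize large subsets. A real fitness would score a trained classifier.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, history = ga_select(n_features=22, fitness=toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
```

Because the best mask of each generation is copied unchanged into the next one, the best fitness never decreases from one generation to the next.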

Likewise, Haq et al. [68] used the dataset developed by Max Little. To select appropriate features, the L1-norm support vector machine algorithm was applied. K-fold cross-validation was applied to validate and generalize the results of the proposed scheme. In another study, Mostafa et al. [109] proposed multiple feature evaluation and classification methods to improve the diagnosis of Parkinson’s disease from voice data. A new multiple feature evaluation approach (MFEA) based on a multiagent system was proposed. It was tested using five independent classification schemes: Naïve Bayes, random forest, decision tree, neural network, and support vector machine. The results showed that the MFEA improved the performance of the classifiers and identified the best set of features. The highest diagnosis accuracy of 99.49% was achieved by the random forest classifier.

Mathur et al. [96] applied random forest, MLP, KNN, AdaBoost, SMO, bagging, and decision tree (DT) algorithms for the prediction of Parkinson’s disease using 195 instances and 24 attributes [164]. For preprocessing, the “BestFirstSearch” method and the “cfsSubsetEval” attribute evaluator were applied using WEKA. The study does not name or cite the dataset, but states that it was obtained from the UCI repository. Each algorithm was combined with KNN, and the combinations were compared to find the one with the highest accuracy. The combination of the KNN algorithm with MLP reported promising accuracy in comparison with the other algorithms. Similarly, Ali et al. [7] considered voice recordings for Parkinson’s disease analysis using 168 samples from 28 PD patients. The χ² statistical model was used for feature selection and noisy feature elimination. An accuracy of 97.5% was obtained with leave-one-subject-out cross-validation (LOSO CV) on the training database, whereas an accuracy of 100% was reported on the testing database. Polat et al. [142] proposed a hybrid machine learning method using a two-class dataset in which 192 were healthy individuals and 564 were PD patients. The Synthetic Minority Over-Sampling Technique (SMOTE) was used to overcome the class imbalance problem. The PD dataset was then classified using a random forest. The random forest classifier alone achieved an accuracy of 87.037%, whereas the proposed hybrid method (i.e., the combination of RF and SMOTE) achieved 94.89%.
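Leave-one-subject-out cross-validation matters for voice data because several recordings come from the same person, and mixing them across training and test folds can inflate accuracy. The sketch below shows one way such a split could be generated. It is an illustration only: the function name is hypothetical, and the layout of 28 subjects with 6 recordings each is borrowed from the sample counts quoted above rather than taken from the protocol of Ali et al. [7].

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# Assumed layout: 28 subjects with 6 recordings each, i.e., 168 samples.
subject_ids = [subject for subject in range(28) for _ in range(6)]
folds = list(leave_one_subject_out(subject_ids))
```

Each fold tests on all recordings of one subject and trains on the recordings of the remaining subjects, so no subject appears on both sides of a split.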

The application of a novel hybrid classifier has also been considered to improve diagnostic performance. Parisi et al. [128] applied a hybrid classifier for the diagnosis of PD at an early stage. A multi-layer perceptron (MLP) model was used for feature selection on data obtained from the University of California-Irvine (UCI) Machine Learning database. The selected features were then fed to a Lagrangian support vector machine (LSVM) for classification. The proposed hybrid feature-driven algorithm (MLP-LSVM) showed a classification accuracy of 100%. In another work, Wu et al. [200] used a database containing the voice recordings of 27 PD patients and 446 HCs collected in a soundproof room. These recordings were resampled at a sampling rate of 16 kHz. In this study, a feature learning algorithm was proposed to learn features for the automatic detection of PD. First, the first derivatives of the Mel spectrum were calculated. Next, spherical K-means was used to train two dictionaries, one for the PD group and one for the HC group. Finally, a linear encoding followed by pooling was used to obtain the learned features. Experimental results showed that the learned features achieved better performance in the detection of PD than the features from the two baseline methods. Using the learned features for the detection of PD, the highest accuracy was 85% and the highest specificity was around 90%.
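The dictionary-learning steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of Wu et al. [200]: it learns a single dictionary instead of one per group, uses toy two-dimensional frames in place of Mel-spectrum derivatives, initializes the centroids with the first k frames, and uses max pooling. All function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(frames, k, iterations=10):
    """Learn k unit-norm dictionary atoms by clustering on cosine similarity."""
    data = [normalize(f) for f in frames]
    centroids = [data[i][:] for i in range(k)]     # simple deterministic init
    for _ in range(iterations):
        sums = [[0.0] * len(data[0]) for _ in range(k)]
        for x in data:
            # Assign each frame to the atom with the largest cosine similarity.
            j = max(range(k), key=lambda c: dot(x, centroids[c]))
            sums[j] = [s + xi for s, xi in zip(sums[j], x)]
        # Project each cluster sum back onto the unit sphere; keep the old
        # atom when a cluster received no frames.
        centroids = [normalize(s) if any(s) else centroids[c]
                     for c, s in enumerate(sums)]
    return centroids


def encode_and_pool(frames, dictionary):
    """Linear encoding (dot product with each atom), then max pooling over frames."""
    codes = [[dot(normalize(f), atom) for atom in dictionary] for f in frames]
    return [max(column) for column in zip(*codes)]


# Toy 2-D frames standing in for Mel-spectrum derivative vectors.
frames = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 2.0], [2.0, 0.2], [0.2, 0.8]]
dictionary = spherical_kmeans(frames, k=2)
pooled = encode_and_pool(frames, dictionary)
```

The pooled vector has one entry per dictionary atom and could serve as the fixed-length feature vector passed to a classifier.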

Islam et al. [73] applied three classifiers, FBANN (feed-forward backpropagation-based artificial neural network), SVM (support vector machine) and RT (random tree), to the speech dataset created by Max Little of the University of Oxford [88]. A 100-times repeated tenfold cross-validation analysis was carried out for each classifier to validate the classification with an acceptable error rate. The proposed scheme achieved up to 97.37% recognition accuracy using a selective feature set and optimized statistical parameters. Similarly, Rouzbahani and Daliri [79] used the dataset of Max Little et al., which was collected in 2009. The data were normalized, and 22 features were selected. Next, these features were fed into three classifiers, namely KNN (K-nearest neighbor), SVM (support vector machine) and DFM (discrimination-function-based). Comparing their performance showed that the KNN classifier performed best, with a correct rate of 0.9382.
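The repeated tenfold cross-validation protocol mentioned above can be sketched as an index generator. This is only an illustration of the general scheme, not the authors' code; the function name `repeated_kfold` and the sample count of 195 are assumptions chosen for the example. Each repeat reshuffles the samples and splits them into ten folds, each of which serves once as the test set.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independent shuffles,
    each split into k nearly equal folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most 1
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test

# Illustrative size only: 195 samples, 10 folds, 100 repeats.
splits = list(repeated_kfold(195, k=10, repeats=100))
```

A classifier would be trained and scored on each of the resulting splits, and the scores averaged, which reduces the dependence of the reported accuracy on any single random partition.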

Shirvan and Tahami [169] extracted various features from PD patients’ voice signals. Next, they identified the optimized features for classification using a genetic algorithm. Finally, classification was carried out using the KNN classifier. The dataset consisted of 192 voice recordings from 32 subjects, of whom 23 were Parkinson’s patients; each person recorded six voice signals with a duration of 3 s. A classification accuracy of 93.7% was reported with four optimized features, 94.8% with seven optimized features and 98.2% with nine optimized features. Besides, Naranjo et al. [114] developed a two-stage selection and classification method designed to match the replication-based experimental design. The specified statistical approach allowed the computational problems to be solved using the Gibbs sampling algorithm. The accuracy, sensitivity and specificity of the proposed method were 86.2%, 82.5% and 90.0%, respectively. In addition, the approach improved the interpretability of the results and exhibited better chain mixing, lowering the computation time with respect to classification approaches previously presented in the scientific literature. Similarly, Naranjo et al. [113] developed a clinical system for the detection of PD that extracted features from voice recordings using an advanced statistical approach. To assess the performance of the proposed system, a replication-based voice recording experiment was conducted to distinguish between healthy individuals and Parkinson’s patients. The accuracy of the system for training and test data was 85%, and the obtained validation accuracy was 75.2%. Similarly, Tsoulos et al. [191] collected data from 36 subjects (19 PD patients and 17 HCs) and applied a new neural network construction (NNC) technique. The data were collected using the iMotor application for three tasks, i.e., a two-target finger tapping test, a pronation-supination test and a reaction time test.
The two-target finger tapping test produced the first dataset, called TWO-TARGET. The pronation-supination test produced the second dataset, called PALM. The third dataset, called REACTION, was produced by the reaction time test. Results revealed that the proposed algorithm differentiated PD patients from HCs with 93.11% accuracy. In another work, Montana et al. [105] developed an expert system that uses diadochokinesis tests to discriminate healthy people from people diagnosed with Parkinson’s disease. The database used in this study was composed of voice recordings of 27 HCs and 27 PD patients (54 Spanish native speakers), collected for the development and validation of the system. The proposed system was based on temporal and spectral features extracted from VOT (voice onset time) segments of /ka/ syllables, whose boundaries were delimited using a novel algorithm.

This approach was also applied to the /pa/ and /ta/ syllables for comparison. High accuracy rates of 92.2% (tenfold cross-validation) and 94.4% (LOO cross-validation) were obtained using the proposed approach. Similarly, in 2016, Agarwal et al. [1] proposed an efficient approach using the extreme learning machine (ELM) to predict Parkinson’s disease from speech samples. A reliable dataset from the UCI repository was used to evaluate performance. The method discriminated healthy individuals from PD patients with an accuracy of 90.76% and an MCC of 0.81 on the training dataset, and gave an accuracy of 81.55% when tested on an independent dataset comprising PD patients. Compared with existing methods such as neural networks and support vector machines, the proposed technique was concluded to give much better results. Also in 2016, Behroozi and Sami [20] proposed differentiating PD patients from HCs based on different vocal tests. In this work, a new framework was introduced that used an independent classifier for each vocal test. A Parkinson’s speech dataset with multiple types of sound recordings was used. After extracting different features, 26 classifiers were built for the 26 vocal tests, each evaluated with the leave-one-out cross-validation technique. A majority vote of the classifiers decided whether the subject had PD. With the proposed methodology, classification accuracy improved by up to 15%.
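The majority-vote decision over per-test classifiers described above can be illustrated with a short sketch. The function name `majority_vote`, the labels, and the vote counts are hypothetical and serve only to show the mechanism; the source does not state how ties among an even number of classifiers were resolved, so the tie rule here is an assumption.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted most often.
    Assumption: on a tie, the label that appears first in the list wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of 26 per-test classifiers for one subject.
votes = ["PD"] * 15 + ["HC"] * 11
decision = majority_vote(votes)
```

With an even number of voters such as 26, a 13 to 13 split is possible, so any practical use of this rule needs an explicit tie-breaking policy.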

Similarly, in a machine learning study, Su and Chuang [175] used the dataset created by Sakar et al. [159]. This dataset contained data from 40 persons: 20 PD patients (6 females, 14 males) and 20 HCs (10 females, 10 males). Fuzzy entropy was used to discard irrelevant features. A total of 26 linear and nonlinear features were extracted from each voice signal. A linear discriminant analysis (LDA) classifier was used to evaluate the performance of the feature selection. The results showed that different voice samples require different feature selections, and that dynamic feature selection achieved higher classification accuracy than using all features.

In a 2015 study, Benba et al. [25] used a dataset of 34 people, 17 of whom were PD patients. From each person, 1–20 MFCC coefficients were extracted. Next, the leave-one-subject-out validation scheme was used with an SVM and different kernel types for classification. The best classification accuracy of 91.17% was achieved using the first 12 MFCC coefficients with a linear-kernel SVM. Similarly, in another work (September 2014), Benba et al. [21] used the same dataset of 34 people containing the pronunciation of the sustained vowel /a/. They extracted 1–20 MFCC coefficients from each subject. Vector quantization (VQ) with six codebook sizes was used to compress the frames. An SVM classifier with different kernel types and the LOSO (leave-one-subject-out) validation scheme were used for classification. The best average result of 82% was obtained using a codebook size of 1. In a further 2014 study on the same dataset, Benba et al. [22] achieved a best classification accuracy of 82.35%. The authors extracted 1–20 coefficients of perceptual linear prediction (PLP) from each subject and compressed the frames by calculating their average value to extract a voiceprint for each individual. The LOSO validation scheme was used along with different SVM kernels (RBF, linear and polynomial) for classification. In 2019, Arora et al. [13] collected data by conducting a monthly control study with twenty people, consisting of 10 PD patients and 10 healthy controls. A range of time- and frequency-domain features were extracted from the acceleration time series. Then, a classifier was used to map the features onto a binary diagnostic output variable. Three classifiers were used for classification: random forest, random classifier and conditional random classifier.
An average sensitivity of 98.5% and an average specificity of 97.5% were obtained using the random forest classifier for differentiating PD patients from healthy controls. Likewise, in 2019, Poorjam et al. [143] used a dataset containing 7500 recordings of 20-s sustained vowel /a/ phonations collected through an Android smartphone [207]. The recordings of both healthy controls and PD patients were collected from around the world. An infinite hidden Markov model (iHMM) was used to group the frames of the signal, represented in the MFCC (Mel-frequency cepstral coefficient) domain, into variable-duration segments. Next, a multinomial naive Bayes classifier was used to label the segments. The experimental results revealed that 96% accuracy was attained even with a small amount of training data.
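The leave-one-subject-out (LOSO) scheme used in several of the studies above differs from ordinary leave-one-out in that all recordings of the held-out subject are removed from training together. A minimal sketch follows; the function name and the subject layout are assumptions made for illustration and are not taken from the cited works.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each distinct subject.
    All recordings of the held-out subject go to the test set."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test

# Hypothetical layout: three subjects with two recordings each.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(leave_one_subject_out(subjects))
```

Grouping by subject matters when each person contributes several recordings: splitting at the recording level would let the classifier see the test speaker's voice during training and could inflate the reported accuracy.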

Moreover, in 2013, Sakar et al. [159] collected voice recordings from 20 people suffering from PD, of whom 6 were female and 14 were male. Data were likewise collected from 20 healthy individuals (10 females and 10 males) who presented at the Department of Neurology, Cerrahpasa Faculty of Medicine, Istanbul University. The dataset was composed of multiple types of sound recordings, i.e., vowels, words and sentences. During its collection, 28 patients were asked to say the vowels ‘a’ and ‘o’ three times each, leading to a total of 168 recordings. Praat acoustic analysis software was used to extract features from the voice samples. Normalization was applied as a preprocessing step so that each feature had zero mean and unit standard deviation. Subsequently, the features were fed into SVM and k-NN classifiers for PD diagnosis. The SVM classifier reported an accuracy of 77.50%, higher than that of the k-NN classifier.
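The normalization step can be sketched as per-feature standardization. This assumes the usual z-score form, in which each feature is shifted to zero mean and scaled to unit standard deviation; the unit scaling, the function name and the toy values are assumptions of this example rather than details confirmed by the source.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each feature (column) to zero mean and unit standard
    deviation. Constant features are left unscaled to avoid division by zero."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]

# Hypothetical feature matrix: 3 recordings, 2 acoustic features.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = zscore_columns(rows)
```

Standardization of this kind is particularly relevant for distance-based classifiers such as k-NN and for SVMs, where features with large numeric ranges would otherwise dominate the decision.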

In a study by Braga et al. [32], three different speech databases were used. The first database was collected in 2014 by Proenca et al. [147] and comprised 22 patients suffering from PD (12 females and 10 males). The second database was composed of 30 healthy speakers, of whom 21 were female and 9 were male. The third database was collected by Sakar et al. [159] in 2013; however, only 18 patients were used owing to loss of audio quality. Three learning algorithms, i.e., RF, SVM and NN, were optimized. The leave-one-out cross-validation (LOO CV) technique was used to estimate the accuracy of the learning algorithms. The random forest (RF) algorithm achieved 99.94% accuracy, the support vector machine (SVM) achieved 92.38% and the neural network (NN) achieved 91.10%. Consequently, it was established that the RF algorithm delivered higher accuracy than the other algorithms. In a research study by Almeida et al. [8], the same dataset as Vaiciukynas et al. [194] was used. Eighteen feature extraction techniques and four machine learning methods were used to classify the data. The best feature extractor for the AC channel was YA, and for the SP channel it was KT. Phonation tasks were found to be more efficient than speech tasks for detecting the disease. Four classifiers, namely KNN (k-nearest neighbors), MLP (multilayer perceptron), OPF (optimum path forest) and SVM (support vector machine), were used for the classification. These classifiers obtained more favorable results than those of Vaiciukynas et al.: for the AC channel, the accuracy was 94.55% and the EER was 19.01%, and for the SP channel, an accuracy of 92.94% and an EER of 14.1% were achieved.
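The equal error rate (EER) reported above is the operating point at which the rate of missed PD cases equals the rate of healthy subjects flagged as PD. A rough way to approximate it from classifier scores is sketched below; the function name, the scores and the labels are hypothetical, and sweeping only the observed scores as thresholds is a simplification compared with interpolating the error curves.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by trying every observed score as a threshold.
    labels: 1 = PD, 0 = healthy; a sample is called PD when score >= threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    for t in sorted(set(scores)):
        fnr = sum(s < t for s in pos) / len(pos)   # PD cases missed
        fpr = sum(s >= t for s in neg) / len(neg)  # healthy cases flagged
        gap = abs(fnr - fpr)
        if best is None or gap < best[0]:
            best = (gap, (fnr + fpr) / 2)
    return best[1]

# Hypothetical scores: four PD samples followed by four healthy samples.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer = equal_error_rate(scores, labels)
```

Because the EER summarizes both error types in one threshold-independent number, a system can show a high accuracy and still a comparatively high EER when its classes are imbalanced or its scores overlap.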

In another machine learning study, Vaiciukynas et al. [194] recorded two vocal tasks in a soundproof booth simultaneously through two channels: an acoustic cardioid (AC) microphone and a smart phone (SP) microphone. The tasks were treated as separate modalities, i.e., phonation and speech. The phonation modality contained the sound of the vowel ‘a’, whereas the speech modality contained the pronunciation of native-language sentences. Further, the voiced and unvoiced parts were also treated as separate modalities. A total of 99 subjects of both genders were involved in the collection of the database. A supervised random forest (RF) algorithm was used to detect PD and to fuse information in the form of a soft decision. The fusion of all feature sets and modalities resulted in an EER (equal error rate) of 19.27% for the AC microphone and an EER of 23% for the SP microphone. In another study in 2019, Shukla et al. [170] used the Lee Silverman Voice Treatment (LSVT) dataset, which contained 126 sustained vowel phonations and 310 dysphonia features. The purpose of the study was to support the early diagnosis of PD using multiple preprocessing techniques, together called a multi-preprocessing system (MPS). Seven different classifiers were used, and the experimental results determined that RF achieved the highest performance, with an accuracy of 94.98%, a sensitivity of 93.18%, a precision of 94.96% and an F-measure of 94.7%. Lahmiri and Shmuel [85] used a dataset containing 22 voice patterns, which is a reduced version of the dataset in [190]. This dataset contained vowel phonations of 147 PD patients and 48 HCs. An SVM classifier was trained and tested following the tenfold cross-validation method. Results revealed that, for the first fourteen voice patterns identified by the Wilcoxon-based pattern ranking technique, the SVM classifier achieved the highest classification accuracy of 92.21%.
Next, for the first thirteen voice patterns trained by the ROC-based pattern ranking technique, a specificity of 82.79% was obtained by SVM. The highest sensitivity obtained by the SVM classifier was 99.63% with only one voice pattern under the ROC-based pattern ranking technique. When trained including all the 22 phonation-based features, the SVM classifier achieved an overall accuracy of 91.82%, a sensitivity of 80.72% and specificity of 95.02%.
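Since the studies above report accuracy, sensitivity and specificity side by side, it may help to recall how the three relate to the confusion counts. The sketch below uses hypothetical labels and a hypothetical function name; it treats PD as the positive class.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate) and specificity
    (true negative rate) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical predictions: 6 PD (label 1) and 4 healthy (label 0) samples.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
metrics = confusion_metrics(y_true, y_pred)
```

On imbalanced data, accuracy is pulled toward the performance on the larger class, which is why sensitivity and specificity are usually reported alongside it.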

In the year 2014, Orozco-Arroyave et al. [118] used the first database considering speech recordings of Spanish native speakers for studying the disorders of speech related to the PD. Three tasks, i.e., phonation, articulation and prosody, were designed for the recordings. The main purpose was to analyze several aspects of the voice and speech of people who are suffering from PD. The SVM classifier was used for training using a radial basis Gaussian kernel with bandwidth. It was tested by following tenfold cross-validation. Based on the obtained results, it was suggested that the measures that scale the variability of the pitch and stability of the phonation are good features for detecting the presence of PD. In another study, Orozco-Arroyave et al. [120] used six codecs for compressing speech recordings. The authors also tested the impact of speech compression for the automatic classification of PD and HC speakers. Praat software was used for the segmentation of voiced and unvoiced frames. The normalization in amplitude and mean cepstral subtraction was done as preprocessing to avoid possible bias introduced by the channel. An SVM was used for the classification. Results indicated that the proposed methodology could be used for the telemonitoring of PD patients through the internet or mobile communication network.

In contrast to the previous works, which presented pattern recognition methods only for detecting Parkinson’s disease, Caesarendra et al. [34] presented, in 2015, a pattern recognition method for classifying PD patients into multiple stages using voice features. The University of California-Irvine (UCI) data repository was used, and 22 features were obtained. Features were then extracted using principal component analysis (PCA) and linear discriminant analysis (LDA). PCA was found to perform better than LDA in extracting significant features. Classification was carried out using four classifiers, namely SVM (support vector machine), KNN (K-nearest neighbor), AdaBoost (adaptive boosting) and ART-KNN (adaptive resonance theory-Kohonen neural network), and their results were compared. The comparison showed that SVM had better testing accuracy than the other methods. Sharma et al. [167] used datasets of voice, handwriting and speech for PD detection. The handwriting dataset was collected from 158 individuals, of whom 105 were PD patients. The speech dataset contained recordings of 31 individuals, of whom 23 were PD patients. The voice dataset was collected from recordings of 20 PD patients and 20 HCs. A new model, MGWO (modified Grey Wolf Optimization), was proposed as an altered version of GWO. MGWO produced a reduced set of features, to which decision tree, random forest and k-NN classifiers were applied. The proposed algorithm achieved an accuracy of 94.83%. In another study, Aich et al. [3] used the dataset created by Max Little of Oxford University in collaboration with the National Centre for Voice and Speech. It contained voice recordings of 31 people, of whom 23 were PD patients. Principal component analysis (PCA) was used for feature extraction, yielding 11 features.
Two feature sets were used in this study: original feature set (OFS) and PCA-based feature sets. The performance metrics of different classifiers were also compared. A nonlinear-based classification approach was used for the comparison. It was observed that the random forest classifier using PCA-based feature set obtained an accuracy of 96.83%.

In 2018, Wan et al. [198] used two datasets to assess the severity of PD by analyzing patients’ speech and movement patterns, which were measured using a smartphone accelerometer. One of the datasets was the UCI dataset; the second was collected using a smartphone. Different machine learning algorithms were applied to these datasets, such as logistic regression, K-nearest neighbors, random forests, M5P and DMLP (deep multilayer perceptron). The DMLP model was found to perform best on both datasets. Alqahtani et al. [10] applied a traditional machine learning approach to a dataset containing voice measurements of 31 people, of whom 23 were Parkinson’s patients. This dataset contained 24 columns, the first of which held the individual’s name. The NNge classification algorithm was used to analyze voice recordings for the classification of PD patients and HCs. The parameters of the NNge algorithm were optimized, and the SMOTE algorithm was used to balance the data and enhance accuracy. Lastly, NNge with the AdaBoostM1 ensemble classifier was applied to the balanced data, attaining an accuracy of 96.30%. The primary focus of the 2017 study by Oung et al. [125] was the detection and classification of PD using signals from wearable audio and motion sensors. It was based on both EWT (empirical wavelet transform) and EWPT (empirical wavelet packet transform). EWT/EWPT was applied to decompose the speech and motion signals of 65 subjects (31 men and 34 women) into five levels. Three classifiers, KNN (K-nearest neighbor), PNN (probabilistic neural network) and ELM (extreme learning machine), were used to analyze the performance of the algorithm. Experimental results confirmed that 90% accuracy was obtained using EWT/EWPT-ELM based on signals from audio and motion sensors.
However, more than 95% accuracy was achieved when EWT/EWPT-ELM was applied to signals with the integration of both the signal’s information. In a research study in 2018, Benmalek et al. [26] focused on the problem of diagnosis of PD at an early stage by classification of the essential features of a person’s voice. The PVA (Patient Voice Analysis) dataset of Tsanas et al. [189, 190] was used. The dataset contains 375 voice samples of PD patients and healthy controls. The features from each of the voice signals were extracted by using MFCC and PLP Cepstral techniques. Feature selection algorithms were also used for the analysis and selection of the features to classify the persons into four groups according to Unified Parkinson’s Disease Rating Scale (UPDRS). Accuracy of 87.6% was achieved using the MFCC along with the LLBFS algorithm for differentiating PD patients of three different stages and healthy control.

In another study, Vásquez-Correa et al. [196] performed an automatic classification of PD patients and healthy individuals using speech recordings collected in a non-controlled environment. The recordings included six sentences and a readable text. A speech enhancement technique was used to improve the quality of the voice signals. A support vector machine (SVM) with a soft margin was employed to distinguish between healthy individuals and PD patients. Results showed that it was possible to discriminate between PD patients and healthy individuals using these recordings. Accuracies for voiced features ranged from 64 to 86%, whereas unvoiced features yielded accuracies from 78 to 99%. Similarly, in 2014, Vásquez-Correa et al. [197] developed a new device for real-time evaluation of the speech signals of PD patients. Development was carried out using MATLAB and a digital signal processor (DSP), whereas the device itself was built on a mini-computer. The newly developed platform showed an increase in the difference of the fundamental period of speech (pitch) of the PD patients. The results showed that the device was useful for monitoring and assessing the speech therapy of Parkinson’s patients. In another study, Shahbakhi et al. [164] proposed a new algorithm for the diagnosis of Parkinson’s disease based on voice analysis. After extracting the optimized features, an SVM was applied to classify PD and HC. The dataset was composed of a range of biomedical voice signals from 31 people, of whom 8 were healthy individuals and 23 were Parkinson’s disease patients. The classification accuracy was 94.50% using four optimized features, 93.66% using seven optimized features and 94.22% using nine optimized features. Sztaho et al. [178] performed automatic classification on speech produced by PD patients.
Linear regression models were applied to a set of acoustic features extracted from the middle of the vowel in different documents and continuous speech. The speech samples were partitioned into segments of different lengths. The approach achieved a lower Spearman correlation with the UPDRS scores and provided the best results on the development set. High intra-variation of the extracted features was also observed. In 2011, Rusz et al. [157] showed the potential of the Bayes rule to expose changes in the speech performance of PD patients. The speech data were recorded from 23 speakers with Parkinson’s disease and 23 healthy speakers. A total of 19 different acoustic measurements were found to differentiate PD patients from healthy individuals. In conclusion, 21 PD patients and 21 healthy people were correctly classified using the Bayes rule, indicating that the Bayes theorem is feasible for identifying impaired voice features. In another research study, Rusz and Cmejila [158] aimed to determine the presence of speech disorder in PD patients at an early stage. Additionally, the authors aimed to analyze the specific characteristics of the voice impairments in PD patients and to recognize their voice signature for clinical measurement methods with respect to automatic assessment. The final aim of the study was to design new automatic methods for measuring articulation. The dataset was collected from 46 Czech native speakers and 23 PD patients. Nineteen representative features were pre-selected, and Wald sequential analysis was then applied to them to assess the efficiency and extent of vocal impairment. Based on the applied statistical methods, it was found that 78% of untreated Parkinson’s patients showed some voice disorder.
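The Spearman correlation mentioned above, used to compare predicted and clinical UPDRS scores, is the Pearson correlation computed on ranks. A self-contained sketch follows; the helper names and the example values are hypothetical and are not data from the cited study.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical predicted vs. clinical severity scores (monotone, not linear).
rho = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 100.0])
```

Because only the ordering matters, the measure rewards a model that ranks patients by severity correctly even when its predicted scores are not on the same scale as the clinical ratings.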

In 2016, a study was performed by Meghraoui et al. [100] to demonstrate Parkinson’s disease (PD) recognition based on voice analysis. The experiment was conducted on a dataset from the Department of Computer Engineering at Istanbul University, created by Olcay Kursun et al. [159]. The recordings were made in stereo-channel mode and saved in WAVE format. Two types of classifiers, namely Bernoulli and multinomial Naïve Bayes (NB), were applied to the data to select the most relevant feature parameters for the detection of PD. The accuracy of the multinomial Naïve Bayes classification model was 95%. Likewise, Jain and Shethy [75] worked to develop a predictive model to accurately predict the Unified Parkinson’s Disease Rating Scale (UPDRS). This study used the dataset from the UCI repository created by Athanasios Tsanas and Max Little of the University of Oxford. The proposed method used a classification algorithm that first estimated the stage of the disease; this estimate then acted as a feature for the statistical regression method. The method is useful for clinical estimation of UPDRS and for remote PD monitoring on a weekly basis, tracking UPDRS over six months. Moreover, it was suggested that an efficient method for observing PD, leading to advantageous treatment of patients, should also be proposed. In 2017, Alhussein [6] suggested a Parkinson’s disease monitoring framework that can be used in smart cities. With this framework, doctors can continuously monitor patients’ health and receive feedback on the state of the disease. Initial symptoms of PD can easily be identified, and appropriate prescriptions can be provided. In this study, speech signals from the participants were captured using different sensors. Next, the signals were transferred to the cloud for processing. Using a support vector machine-based classifier, results were generated in the cloud.
These results, along with the signal features, were sent to authorized doctors, who recommended medicines to the patients. Experiments showed that the new system had an accuracy rate of 97.2% in detecting Parkinson’s disease. In 2014, Khan et al. [81] used the cepstral separation difference (CSD) to quantify speech deficiency in Parkinson’s disease (PD); CSD expresses a ratio between source and filter under a source-filter speech model. CSD features were tested on 240 clinically rated running samples collected from 60 PD patients and 20 healthy individuals. The correlation between speech symptom severity and the CSD features was strong, reaching up to 0.78. CSD was compared with some non-CSD features for describing speech symptoms in terms of consistency and responsibility. The results revealed that CSD features are reliable for discriminating between severity levels of speech disorder in Parkinson’s disease. In 2015, Orozco-Arroyave et al. [122] considered speech recordings of read texts and speeches spoken in three different languages. The authors modeled the energy content of the borders between voiced and unvoiced sounds. The results showed that it was possible to achieve accuracies in the range of 91–98%, depending on the language, using the read text. For the speech recordings, the proposed method achieved accuracies above 98% for all three languages. In 2014, Orozco-Arroyave et al. [121] considered three databases containing speech recordings in three different languages: German, Spanish and Czech. An SVM was used for classification. The experiments showed that it was possible to obtain accuracies from 84 to 99% on the Spanish database and from 84 to 96% on the German database. The results obtained with isolated words and with /pa/-/ta/-/ka/ showed accuracies ranging from 97.6 to 99% for the three languages.

In a research study by Ma et al. [94], a novel hybrid methodology for the diagnosis of PD was introduced. The hybrid method included the kernel-based learning machine along with SCFWKELM (Subtractive Clustering Features Weighting). SVM, KNN and ELM (extreme learning machine) classifiers were used to differentiate affected and unaffected people. A tenfold cross-validation scheme was used for the validation of results. The hybrid classifier obtained a 99.49% accuracy. A new automated system for early detection of PD using vowels was proposed by Tuncer et al. [193]. The study contains 756 voice signals belonging to 252 people that were collected using microphones. Feature selection was performed by using a combination of SVD (singular value decomposition) and MAMa (minimum average maximum). A relief-based feature selection method was used to select 50 significant features that were used. These features were passed to different classification models. The KNN classifier achieved the best accuracy rate of 96.83%. Senturk [163] applied machine learning processes for the diagnosis of PD at early stage. Regression trees, SVM and ANN were used for the classification on Max Little Dataset. For the extraction of useful features, recursive feature elimination and feature importance method were applied. With the least number of voice features, SVM with recursive feature elimination showed the highest accuracy rate of 93.84%.

Deep learning methods have gained immense popularity for speech detection and recognition tasks, and several researchers have recently used deep learning approaches to detect PD from speech data. In 2020, Zahid et al. [204] used three different techniques for PD detection using the Spanish dataset PC-GITA. Initially, transfer learning techniques were applied to spectrograms. Next, different deep features were extracted from these spectrograms. Finally, the evaluation was carried out using simple acoustic features and deep learning classifiers. The multilayer perceptron (MLP) gave the highest accuracy of 99.7% for the vowel ‘o’, while the random forest achieved an accuracy of 99.1% for the vowel ‘i’. In 2019, Gil et al. [58] proposed a method based on an artificial neural network and a support vector machine. The SVM was trained with the SMO (Sequential Minimal Optimization) algorithm, which is an efficient training method for SVMs. The dataset used in this work was taken from the UCI machine learning repository and contained voice recording data of 31 people, of whom 23 were PD patients. The proposed method achieved an accuracy of 90%. A feature extraction approach using voice signals for the detection of PD patients was presented in 2013 by Jafari et al. [74]. The feature set consisted of 13 standard Mel-frequency cepstral coefficients (MFCCs) and seven nonlinear phonetic features. The dataset was composed of 200 voice recordings from 10 healthy persons and 25 PD patients with different severity levels. To discriminate the PD patients, a multilayer perceptron (MLP) neural network classifier with one hidden layer was used. The accuracy for discriminating healthy individuals from PD patients was 97.5%, whereas the accuracy for discriminating mild from severe PD patients was 95.5%. Likewise, in 2016, Al-Fatlawi et al. [5] used a deep belief network (DBN) as an efficient technique for the identification of Parkinson’s disease, based on the voice signals of the patients. The data were fed into the DBN through a feature extraction process to create a template for matching the voice of PD patients. To optimize the network, restricted Boltzmann machines (RBMs) were first used to overcome the problem of arbitrary initial weight values. Second, the backpropagation algorithm was used for supervised fine-tuning. The test accuracy of the suggested system was 94%, which was better than that of the other methods compared. Can [36] proposed a boosting committee machine with artificial neural networks for the diagnosis of Parkinson’s disease. The dataset of M. A. Little, containing voice recordings of healthy people and PD patients, was used for analysis. A neural network with backpropagation was applied together with filtering and majority voting techniques. The proposed method predicted that 75.4% of the 195 instances belonged to Parkinson’s patients and the others to healthy people, and it obtained an accuracy of 92.9%. Bielby et al. [28] performed a study to discriminate PD from HC using audio data. For this purpose, they applied an RNN and a feed-forward NN to voice recordings provided by Naranjo et al. [113]. The proposed method gave better results than other novel approaches, with the neural networks discriminating PD from HC at an accuracy rate of 96%.

A few studies on PD detection from speech data do not fall under the category of machine learning or deep learning. These research contributions involve the use of medical treatments or other methods for PD detection from speech data. Tsuboi et al. [192] conducted an investigation to explain the phenotypes and pathophysiology of voice and speech disorders in PD patients treated with subthalamic nucleus deep brain stimulation (STN-DBS). A cross-sectional study was conducted on 76 Parkinson’s patients treated with bilateral STN-DBS (PD-DBS) and 33 medically treated Parkinson’s patients (PD-Med). It was observed that the PD-DBS patients had significantly worse speech and voice disorders than the PD-Med patients. Likewise, in 2013, Tsanas et al. [188] worked with the LSVT to estimate the potential of using sustained vowel phonations to objectively and automatically replicate the speech experts’ assessment of PD patients’ voice as acceptable or unacceptable. The study was conducted on 14 PD patients. The participants had typical voice and speech characteristics of PD upon telephone transmission; this was determined by a qualified speech-language pathologist and verified by two other expert PD pathologists during data collection. A total of 156 sustained vowels were characterized by 309 dysphonia measures, and a feature selection algorithm was used to select a parsimonious subset. These features discriminated between the two groups (acceptable and unacceptable) with almost 90% accuracy.

In another study, Liu et al. [91] examined whether the abnormal vocalization in PD patients was associated with the sensory processing of voice auditory feedback. The dataset consisted of 12 PD patients and 13 age- and sex-matched healthy persons. The participants sustained a vowel sound and received unexpected perturbations in voice loudness or pitch auditory feedback. It was found that, although all participants produced compensatory responses in fundamental frequency and voice amplitude, the PD patients exhibited a larger response magnitude than the control group. It was concluded that the processing of voice auditory feedback was abnormal in PD patients and might be related to a dysfunctional mechanism of error detection and correction in sensory feedback processing. Likewise, in another research study, Rajanikanth et al. [149] took the voice of each person as input. Noise was added to the recording, which was then processed. After processing, the added noise was removed and the signal was compared with the reference signals of healthy controls. A wide range of speech signal processing algorithms (dysphonia measures) was used to quantify the extent of speech disorders. PD patients were differentiated from healthy controls with almost 99% accuracy.

The summary of the research contributions for PD detection from speech data using different approaches is given in Table 2.

Table 2 Summary of related work for PD detection from speech data using different approaches

4.2 Micrographia-based diagnosis

Similar to speech, micrographia is a common symptom among PD patients and can be effectively used as data for PD detection. State-of-the-art approaches such as machine learning and deep learning have been used and reported in the literature for PD detection from micrographia data.

Numerous researchers have opted to use traditional machine learning techniques and micrographia data for PD detection. A low-cost system for Parkinson’s disease detection from offline handwritten Archimedean spirals was proposed in 2018 by Gupta et al. [64]. The PaHaW dataset was used to test the efficiency of the proposed system. Tremor estimation distance-based and Fourier transform-based distance features were extracted to discriminate PD patients from healthy controls. The tremor estimation distance features were classified using an SVM with a radial basis function (RBF) kernel, whereas an SVM with a sigmoid kernel was applied to the Fourier transform-based distance features. In another study, Drotar et al. assessed in-air and on-surface kinematic variables of handwriting on the PaHaW [101] dataset, recorded with a digitizing tablet. Kinematic features such as speed, velocity, acceleration, stroke speed and jerk were extracted. Feature selection algorithms and support vector machine learning methods were proposed for classifying healthy and non-healthy subjects. The approach yielded overall accuracies of 78% and 84% for on-surface and in-air hand movement, respectively. In 2020, Gupta et al. [65] used the PaHaW dataset and developed an age-dependent and sex-specific method for the discrimination of PD patients and control subjects. The dataset was divided into the subgroups of male, female, elders and adults. The support vector machine obtained accuracies of 70.62%, 79.55%, 74% and 83.75% for adults, elders, male and female, respectively. Likewise, Taleb et al. [180] proposed a subset of handwriting features and diagnosed Parkinson’s disease on the basis of handwriting samples. The PDMultiMC dataset was utilized, and an SVM with an RBF kernel was applied to this dataset. An accuracy of 96.875%, a sensitivity of 93.75% and a specificity of 100% were achieved. The authors in [50] extended their work and extracted pressure and spatio-temporal (i.e., stroke height/width) features in addition to kinematic features. The classification was done using an SVM classifier with an RBF kernel. The study reported a classification accuracy of 89%. Similarly, in another study, Drotar et al. [48] presented a novel handwriting marker for the diagnosis of Parkinson’s on the PaHaW dataset. In addition to kinematic and spatio-temporal features, other handwriting measures were used based on signal energy, entropy and empirical mode decomposition of handwriting signals. Automated diagnosis of Parkinson’s was done using an SVM classifier with a radial Gaussian kernel that yielded an 88.133% accuracy. It achieved a sensitivity and specificity of 89.47% and 91.89%, respectively. In 2016, the authors in [49] further proposed different machine learning models, i.e., KNN (K-nearest neighbor), AdaBoost and a support vector machine classifier, for the classification of healthy and non-healthy subjects. It was concluded that the SVM on kinematic and pressure features was the best classification model, with an accuracy of 81.3%. In comparison, the AdaBoost and KNN models reported accuracies of 78.9% and 71.7%, respectively. In 2013, Rosenblum et al. [155] collected handwriting data from 20 non-healthy and 20 healthy subjects. The authors used MANOVA to test the differences among pressure, velocity and spatio-temporal features of handwriting. This analysis showed that 97.5% of participants were correctly classified using the handwriting data. Similarly, in 2014, Nackaerts et al. [33, 111, 112] proposed the SOS test for analyzing the handwriting abnormalities of PD patients. The authors examined the handwriting samples of 26 healthy and 87 non-healthy subjects by using the SOS (‘Systematic Screening of Handwriting Difficulties’) test. In addition, a handwriting task was performed by 18 non-healthy and 11 healthy subjects under a dual-task condition. Lastly, a correlation analysis was performed on the SOS test results. It was observed that the PD patients’ handwriting speed was slower and their stroke duration longer than those of the healthy subjects. It was also observed that the PD patients’ handwriting amplitude decreased under the dual-task condition. Rios-Urrego et al. [153] extracted different features from handwriting drawings and applied different classification models to discriminate PD and HC. By applying RF, KNN and SVM, the system achieved an accuracy of up to 93.1%. When this technique was applied to a different dataset, it achieved an accuracy of up to 83.3%. Bernardo et al. [27] collected drawing samples produced by the participants using specific software. The drawings contained 85 triangle, 80 Archimedean spiral and 76 cube patterns. After extracting 11 optimal attributes from the preprocessed data, they applied different ML models such as SVM, OPF and Naïve Bayes. The newly built system achieved results of about 96%. Impedovo et al. [71] applied ML techniques for the early detection of Parkinson’s disease using handwriting samples. They applied different ML models such as SVM (with RBF and linear kernels), KNN, LDA, AdaBoost and RF on the PaHaW dataset and achieved better results with high specificity and low sensitivity.
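Kinematic handwriting features of the kind listed above (speed, acceleration and jerk) are typically derived from the sampled pen trajectory by repeated differentiation. The following is a minimal sketch under stated assumptions: uniform sampling with interval `dt`, first-order finite differences, and summary by the mean absolute value. The function names and the toy trajectory are hypothetical and do not reproduce the feature definitions of any cited study.

```python
import math

def finite_diff(values, dt):
    """First-order finite difference of a uniformly sampled sequence."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)

def kinematic_features(x, y, dt):
    """Mean speed, mean |tangential acceleration| and mean |jerk| of a pen path.

    x, y are pen coordinates sampled every dt seconds; at least four
    samples are needed so that the jerk sequence is non-empty.
    """
    vx, vy = finite_diff(x, dt), finite_diff(y, dt)
    speed = [math.hypot(a, b) for a, b in zip(vx, vy)]
    accel = finite_diff(speed, dt)   # rate of change of speed
    jerk = finite_diff(accel, dt)    # rate of change of acceleration
    return {
        "mean_speed": mean_abs(speed),
        "mean_abs_accel": mean_abs(accel),
        "mean_abs_jerk": mean_abs(jerk),
    }

# Hypothetical horizontal stroke with constant acceleration (dt = 1 s).
features = kinematic_features(x=[0, 1, 3, 6, 10], y=[0, 0, 0, 0, 0], dt=1.0)
```

Feature vectors of this form would then be passed to a classifier such as an SVM; in practice the signals are usually smoothed first, because finite differences amplify tablet noise.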

In comparison with the machine learning approaches, few researchers have opted to use deep learning approaches for PD detection using handwriting data. A research study conducted in 2021 by Kamran et al. [78] applied a number of CNN architectures to a number of PD handwriting datasets. They conducted several experiments by combining different patterns from different datasets and reported that meander and spiral patterns contribute more to PD detection than circle and simple patterns. In 2020, Razzak et al. [150] investigated different handwriting patterns using deep learning models and found that more curved and complicated patterns help more in the identification of PD. Naseer et al. [115] used the PaHaW (Parkinson’s disease handwriting) dataset containing handwriting samples of 38 HCs and 37 PD patients. The identification of PD patients was carried out using a deep convolutional neural network classifier with transfer learning and data augmentation techniques (rotations, flipping and contours). Transfer learning approaches such as freezing and fine-tuning were investigated by using the ImageNet and MNIST datasets as source tasks independently. A 98.28% accuracy was achieved by a fine-tuned network trained on the ImageNet and PaHaW datasets. The proposed approach provided better detection of PD than other state-of-the-art studies. In 2018, Pereire et al. [134] used a dataset containing images acquired during handwriting exams of 18 HCs (6 males, 12 females) and 74 PD patients (59 males, 15 females). A CNN (convolutional neural network) was used to learn features from images that were produced from handwriting dynamics. The results obtained with the CNN were compared with those of raw-data and texture-based descriptors and were promising. In this work, it was determined that a CNN can learn significant features and could differentiate a PD patient from a healthy control with an accuracy of 95%. In 2019, Moetesum et al. [103] performed a study on the PaHaW dataset by extracting visual features from the graphomotor samples of Parkinson’s patients. A convolutional neural network was applied to extract these features, which were then passed to an SVM for classification. By applying late and early fusion on visual features only, an accuracy of 83% was achieved. Similarly, in another study, Gavrilescu [57] analyzed personality type by observing handwriting features: slant, stroke, baseline, pressure, letter height and speed. A three-layer feed-forward neural network model was proposed for personality detection, i.e., either introvert or extrovert. The base layer provided the handwriting features, whereas the middle layer contained a neural network for each of the personality characteristics that indicated its presence and intensity. The last layer predicted the actual personality with an accuracy of 86.7% using the personality features and intensities from the middle layer. The proposed system identified the personality trait accurately in less than 1 min and was therefore more efficient than a questionnaire. In 2019, Diaz et al. [45] proposed a new system for the diagnosis of PD using dynamically enhanced static images of handwriting from the PaHaW dataset. Transfer learning was applied to obtain appropriate features from the data. Lastly, an ensemble of different classification models was employed. It achieved an accuracy of up to 86.67%, which was adequate in comparison with other newly proposed systems based on dynamic and static handwriting datasets. Loconsole et al. [92] worked on computer-assisted handwriting analysis for the detection of PD. They used sEMG (surface ElectroMyoGraphy) signal processing techniques and AI-based classification models. Four different feature sets were considered to address five research questions regarding the best artificial intelligence-based classification method, comparing SVM approaches with optimal-topology ANNs. In the experiments, the SVM gave better results than the optimal-topology ANN approaches.

A few studies use other methods and medical techniques for PD detection from micrographia that do not fall under the category of machine learning or deep learning approaches. An improved and optimized version of the crow search algorithm (OCSA) was proposed by Gupta et al. [63]. The algorithm was applied to the HandPD dataset. This method was used for predicting Parkinson’s disease, and the reported accuracy was about 100%, which can help patients receive early treatment. The results were then compared with those of the chaotic crow search algorithm. It was observed that the method found an optimal subset of features, and it was suggested that the number of features should be minimized to increase accuracy. Likewise, in another study, Gemmert et al. [195] observed that PD patients did not perform well at larger target sizes. The handwriting samples of 13 healthy controls and 13 PD patients were collected. After analysis, it was concluded that the stroke size and duration of PD patients were modulated independently up to 1.5 cm, whereas for sizes above 1.5 cm the PD patients undershot the target in the handwriting tasks. Further, it was observed that the stroke duration was longer and the stroke size smaller in the samples of PD patients than in those of healthy persons.

The summary of the research contributions for PD detection from handwriting data using different approaches is given in Table 3.

Table 3 Summary of related work for PD detection from handwriting data using different approaches

4.3 Radiology-based diagnosis

MRI biomarkers exhibit enormous potential to characterize the disease process in PD [129]. MRI has proved to be a good source for a better understanding of the neural substrates contributing to postural instability. State-of-the-art approaches such as machine learning and deep learning, as well as some other methods, have been used and reported in the literature for PD detection from radiology data.

Radiology data and machine learning techniques have given favorable results for PD detection. A new system based on FIG (fuzzy information gain) function and K-means clustering algorithm was proposed by Huang et al. [69]. The information about fuzzified pixels was measured using the FIG function, whereas the k-means algorithm was used to cluster the pixels. The changes in the MRI were classified into three groups: minimum, maximum and average change regions. Experimental results were obtained by utilizing seven different image segmentation techniques. This method achieved a Jaccard similarity coefficient of 0.92, a peak signal-to-noise ratio of 30.14 and an average mean squared error of 63.49 among nine MRIs of PD. In comparison with the other image segmentation methods, the performance showed an enhancement of 6.98–64.29%, 3.54–6.20% and 20.73–32.94%, over the Jaccard similarity coefficient, peak signal-to-noise ratio and average mean squared error, respectively.
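The three segmentation quality measures quoted above have standard definitions, which the following minimal sketch illustrates on flattened images. The function names, the use of plain Python lists and the assumed 8-bit intensity range (`max_value=255`) are choices made for this illustration; the exact evaluation protocol of Huang et al. [69] is not described here.

```python
import math

def jaccard(mask_a, mask_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two binary masks."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 1.0  # two empty masks count as identical

def mse(img_a, img_b):
    """Mean squared error between two images of equal size."""
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)

def psnr(img_a, img_b, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(img_a, img_b)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)

# Hypothetical four-pixel examples.
overlap = jaccard([1, 1, 0, 0], [1, 0, 1, 0])   # one shared pixel out of three
error = mse([0, 0, 0, 0], [0, 0, 0, 10])
quality = psnr([0, 0, 0, 0], [0, 0, 0, 10])
```

A higher Jaccard coefficient and PSNR and a lower MSE indicate a segmentation closer to the reference.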

3D-MRI images were used in 2018 by Cigdem et al. [43] for the diagnosis of Parkinson’s disease. The data were taken from the PPMI dataset, an SVM was used for classification and PCA was used for dimensionality reduction. The voxel-based morphometry (VBM) technique was used to compare morphological differences between PD patients and HCs in GM (gray matter) and WM (white matter). The highest accuracies of 73.75%, 72.50% and 93.7% were obtained for GM, WM and their combination, respectively, by using TIV (total intracranial volume) as a covariate and f-contrast for model building. Similarly, in 2018, Amoroso et al. [11] proposed a new system for the detection of PD by investigating the parts of the brain that were affected by PD. This study utilized the PPMI dataset. First, a network of brain regions was defined and the regions were suitably associated. Feature selection was done with the help of random forests. Next, these features were combined by using an SVM. The proposed system was able to detect the damaged brain regions with an accuracy of up to 97%. Likewise, in another research study, Hamdi and Laouini [67] introduced another machine learning method for the diagnosis of Parkinson’s disease, i.e., a CAD (computer-aided diagnosis) system based on SVM and histogram equalization. Extracted features were used as input to the classifier, and the SVM (support vector machine) was trained to identify the subjects affected by Parkinson’s disease. Accuracies of 91.37% and 92.39% were reported for VAF and PCA, respectively.

In 2012, Long et al. [93] applied a support vector machine classifier to structural and resting-state functional magnetic resonance images (rsfMRI) of nineteen right-handed patients and twenty-seven normal people. Preprocessing was performed by using statistical parametric mapping. The proposed method gave an accuracy of 86.96%, with a sensitivity and specificity of 78.95% and 92.59%, respectively. In a study by Salvatore [160], Parkinson’s disease was detected from an MRI image dataset using an SVM for classification. An accuracy greater than 90% was reported. In contrast, Przybyszewski [148] also used an MRI dataset but applied reflexive saccade measurements for evaluating the disease level. This method gave 70% accuracy. Morales et al. [107] used a dataset of MRI images of 45 patients, including 27 males and 18 females. Four classifiers were used for classification purposes, i.e., naïve Bayes, multivariate filter-based naïve Bayes, filter selective naïve Bayes and SVM (Support Vector Machine). Through experimentation, it was concluded that the multivariate filter-based naïve Bayes was the best classifier, achieving the highest cross-validated accuracy, specificity and sensitivity.
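Accuracy, sensitivity and specificity, which are reported throughout this section, are all derived from the same confusion matrix. As a worked example, the figures reported for Long et al. [93] are reproduced below. The counts used (15 of 19 patients and 25 of 27 controls correctly classified) are inferred here from the reported percentages and group sizes; they are an assumption, not values stated in the text. The function name `binary_metrics` is likewise hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: patients classified correctly/incorrectly,
    tn/fp: healthy controls classified correctly/incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Inferred counts (assumption): 19 patients and 27 controls in total.
metrics = binary_metrics(tp=15, fn=4, tn=25, fp=2)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
# percentages == {"accuracy": 86.96, "sensitivity": 78.95, "specificity": 92.59}
```

Because accuracy is a weighted average of sensitivity and specificity, it must always lie between the two, which is a quick consistency check when comparing reported results.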

Transcranial sonography (TCS) images and watershed segmentation were used by Chen et al. [39] in 2012. Three datasets of TCS images were used. The first dataset included 42 TCS images of PD patients and 36 TCS images from HCs. Similarly, dataset 2 contained 15 TCS images of 10 patients and 8 images from HCs. Furthermore, the third dataset consisted of 10 PD TCS images from 5 PD patients and 27 TCS images from 14 controls. Local features were extracted for the proposed local image analysis method. The performance of these features was evaluated by using a feature selection method. Cross-validation results indicated that the local features could be used for the detection of Parkinson’s disease. Furthermore, in 2013, Armananzas et al. [12] selected features using a wrapper selection scheme. The database used in this study, collected by one of the authors (PPMI), included 410 patients. Three categories (mild, moderate and severe) were used for classification with five classifiers. The binary classifiers produced the best diagnosis of non-motor symptoms, with an accuracy of 72–92%. In a study by Prashanth et al. [145], data from the PPMI (Parkinson’s Progression Markers Initiative) database were used. Two classifiers, an SVM and a classification tree, were used for classifying Parkinson’s patients and healthy persons. A total of 89.39% of the data was correctly classified by these classifiers.

Likewise, Jin et al. [77] performed a study on the PPMI dataset using an ML approach. A newly proposed ReliefF-SVM-based dMRI analysis was performed to explore the potential to distinguish between scans without evidence of dopaminergic deficit (SWEDD) and PD. The SVM discriminated between SWEDD and PD with an accuracy of 81.25%. Peng et al. [132] used a multilevel region-of-interest (ROI) feature-based machine learning method for the discrimination of PD and HC. Different high-level correlative attributes and low-level ROI attributes were combined to make multilevel features. A multi-kernel SVM was applied to the features extracted from the PPMI dataset to classify both classes. This newly proposed method achieved an accuracy of 85.78%.

In comparison with the machine learning approaches, there are few studies on PD detection from radiology data using deep learning approaches. EEG signals were used for the detection of Parkinson’s disease by Oh et al. [116]. The EEG signals of 20 normal controls (11 women, 9 men) and 20 PD patients (10 women, 10 men) were taken into consideration. A thirteen-layer CNN model was implemented for the diagnosis of PD using EEG signals. It was noted that the model could overfit the data without its dropout layer. An accuracy of 88.25%, a sensitivity of 84.71% and a specificity of 91.77% were obtained. In contrast, Sharma and Giri [166] diagnosed Parkinson’s disease by using neural networks on MRI brain images. For the earlier diagnosis of Parkinson’s, they applied clustering to MRI brain slices. Next, segmentation was carried out using k-means. Finally, a neural network classifier was used for classifying the healthy and Parkinson’s subjects. The proposed method achieved 85.92% accuracy. Shinde et al. [168] performed a study on computer-based analysis by using a CNN to create diagnostic and predictive biomarkers of PD from neuromelanin-sensitive magnetic resonance imaging (NMS-MRI). The study was conducted on 20 patients with Parkinson’s syndrome, 45 patients with PD and 35 HCs. This method achieved 80% testing accuracy. In 2018, Pahuja et al. [126] performed a study to establish an association between objective biomarkers of PD based on T1-weighted MRI scans and other clinical biomarkers by using the PPMI dataset. The optimal features were extracted by using voxel-based morphometry. For the classification of different subjects, they applied the SRAN (self-adaptive resource allocation network), support vector machines (SVM) and the extreme learning machine (ELM). The results showed that the SRAN classifier gave better results than the SVM and the ELM, achieving an accuracy of up to 97%. Similarly, Tang et al. [184] used the PPMI dataset to train ANNs to classify PD and HC. The authors wanted to predict the UPDRS motor score from six non-imaging features at baseline and in year 4 from 92 imaging parameters from 12 different regions. The different parameters achieved 70% predictive accuracy on this task.

A few studies on PD detection from radiology data use other methods that do not fall under the category of machine learning or deep learning. A network-based approach was introduced by Monajemi et al. [104] to describe relationships between tremors and the brain connectivity of PD patients. To study human brain connectivity and functionality, various methods were used, such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). The results showed relationships between effective brain connectivity measures and tremors, as well as differences among the connectivity values of PD patients with tremors (i.e., mild and severe hand tremors).

The summary of the research contributions for PD detection from radiology data using different approaches is given in Table 4.

Table 4 Summary of related work for PD detection from radiology data using different approaches

4.4 PD detection using gait data

The abnormal gait pattern in Parkinson’s patients is characterized by reduced stride length, reduced gait velocity and an increased proportion of the gait cycle spent in double limb support [124]. State-of-the-art approaches such as machine learning and deep learning have been used and reported in the literature for PD detection from gait data and have achieved promising results.

In comparison with the other categories of data, few researchers have opted to use machine learning techniques for PD detection from extracted gait data. An innovative system using fuzzy logic for the home assessment of PD patients was proposed by Pepa et al. [133]. A smartphone app was developed for the detection of gait in PD patients. The data were collected from patients with idiopathic PD. The proposed system achieved an accuracy of 93%. Likewise, Pham et al. [139] worked with freezing of gait (FoG) data to develop an automated detector. Existing detection algorithms are subject-dependent; the proposed system was therefore designed to be subject-independent. An anomaly score detector (ASD) with adaptive thresholding was developed to identify FoG events. The new multi-channel freezing index attained a sensitivity and specificity of 96% and 78%, respectively. In contrast, the vertical-axis freezing index was best for a single input, attaining a sensitivity (specificity) of 89% (94%) for the back sensor and 94% (84%) for the ankle sensor. Gait analysis was studied by Medeiros et al. [98] for PD detection by observing walking irregularities. Principal component analysis was applied to the gait data to identify irregularities that may indicate the development of Parkinson’s disease. An experimental study was performed on 100 participants, including both healthy people and PD patients. Euclidean distance was used as the classifier with leave-one-out cross-validation. The accuracy achieved using the proposed approach was 81%. Similarly, Cho et al. [41] introduced a vision-based diagnostic system to evaluate the gait patterns of PD patients, applying a combination of PCA (principal component analysis) and LDA (linear discriminant analysis) to gait patterns. The authors reported a 95.49% accuracy. The proposed system was able to differentiate the gait patterns of PD patients and healthy controls with a high classification rate. An optimized method that helped to effectively distinguish between PD patients and healthy controls was proposed by Chen et al. [40] in 2018. In this work, effective gait features were extracted by using a U-shaped gait-sensing platform. Non-dimensionalization and min–max normalization were performed during preprocessing. Subsequently, the features were fed into an SVM classifier that was optimized by using a PSO (particle swarm optimization) algorithm. The proposed method improved the accuracy from 87.12% to 95.66%. Orphanidou et al. [123] performed a study on the detection of freezing of gait by using machine learning approaches. The study was performed on the Daphnet Freezing of Gait dataset. It indicated that machine learning methods are a good tool for detecting PD at an early stage. Among seven different ML methods, the SVM with a polynomial kernel achieved the highest accuracy of 91%.
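Leave-one-out cross-validation with a Euclidean-distance classifier, as mentioned for Medeiros et al. [98], can be sketched as follows. The sketch assumes a nearest-class-mean decision rule, omits the PCA step, and uses made-up two-dimensional feature vectors; the cited study may have used the distance differently. All names are hypothetical.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]

def loocv_nearest_centroid(features, labels):
    """Leave-one-out accuracy of a nearest-class-mean Euclidean classifier."""
    correct = 0
    for i, (sample, truth) in enumerate(zip(features, labels)):
        # Hold out sample i and train on all remaining samples.
        train = [(f, l) for j, (f, l) in enumerate(zip(features, labels)) if j != i]
        centroids = {
            label: centroid([f for f, l in train if l == label])
            for label in {l for _, l in train}
        }
        predicted = min(centroids, key=lambda label: math.dist(sample, centroids[label]))
        correct += predicted == truth
    return correct / len(features)

# Hypothetical gait features (e.g., two principal-component scores per subject).
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [2.0, 2.1], [2.1, 1.9], [1.9, 2.0]]
y = ["HC", "HC", "HC", "PD", "PD", "PD"]
accuracy = loocv_nearest_centroid(X, y)
```

Leave-one-out validation is a common choice for small cohorts, because every subject is used for testing exactly once while the remaining subjects are used for training.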

Deep learning technologies have been used with gait data and reported in the literature for PD detection more often than machine learning approaches. A 3D pose estimation was performed using deep learning by Kondragunta et al. [84]. The data were collected from elderly people older than 80 years and were composed of regular gait data and cognitive dual tasks. Deep learning approaches such as CNNs were used for the estimation of 2D poses from RGB images. Mapping the depth information led to the creation of a 3D pose environment that was used to project 2D poses for the extraction of appropriate features. Different gait features were used, such as step width, step angle and stride length. It was a novel system that used 3D pose estimation for PD detection. In 2017, Baby et al. [14] worked with wavelet transform-based feature extraction and gait characteristics techniques to distinguish between PD patients and healthy individuals. This method helped physicians to identify the disease and start treatment at an early stage. An artificial neural network (ANN)-based classifier and various coefficient back propagation (BP) algorithms were used to evaluate the performance of the procedures. It gave an average efficiency of 86.75%. Likewise, Zeng et al. [205] worked on gait analysis through deterministic learning theory and proposed a method to discriminate between Parkinson’s patients and healthy individuals. In this study, the dataset provided by PhysioBank was used. An RBF neural network was used to detect abnormalities in the gait patterns of PD patients. The obtained accuracy, specificity and sensitivity were 96.39%, 95.89% and 96.77%, respectively. Similarly, in another study, Chen et al. [38] explored the extreme learning machine (ELM) and kernel ELM (KELM) for the early diagnosis of Parkinson’s disease. The efficiency of the suggested technique was thoroughly assessed on the PD dataset. This method achieved promising classification accuracy in a tenfold cross-validation analysis. The highest rate of 96.47% and the average accuracy of 95.97% over 10 runs of tenfold CV were achieved. Likewise, Torvi et al. [187] analyzed the performance of deep learning algorithms for the early prediction of FoG. Additionally, to establish a better prediction model for specific subjects, the performance of domain adaptation procedures was analyzed to address the domain discrepancy between the data from different subjects. The study was performed on the Daphnet Freezing of Gait dataset to determine the potential of algorithms to precisely identify FoG events before their onset by using an LSTM network. Experimental results showed that FoG events could be accurately identified within short periods. In 2016, Eskofier et al. [53] worked on the movement disorders of PD patients and a deep learning approach for their monitoring. The main focus of the study was the detection of bradykinesia. The dataset was collected from 10 patients with idiopathic PD using inertial measurement units. Standard machine learning pipelines were compared with CNN-based deep learning. It was concluded that, in terms of classification rate, deep learning exceeds other state-of-the-art machine learning algorithms by at least 4.6%. Gait data (vertical ground reaction force) recorded by foot sensors were used in 2018 by Zhao et al. [208] for the detection and severity rating of Parkinson’s disease. The PhysioNet database was used, containing three PD gait sub-datasets contributed by three researchers (Ga, Ju and Si). The dataset contains the gait information of 93 patients and 73 HCs. A two-channel model combining LSTM (long short-term memory) and CNN (convolutional neural network) was developed to learn the spatio-temporal patterns behind the gait data. The model was trained and tested on those three datasets. The proposed method achieved higher prediction accuracy than existing methods. This method helped neurosurgeons because the diagnosis procedure became simpler. Xia et al. [201] performed a study on the detection of FoG in PD patients using a CNN. As input, 1D acceleration signals were used. The proposed method was helpful in the automatic detection and discrimination of gait events from a normal walk. The proposed method achieved 99% classification accuracy. In 2017, Camps et al. [35] used signal processing and deep learning methods for the detection of gait in PD patients. The data were recorded from 15 patients with demonstrated gait. Gyroscope, tri-axial accelerometer and magnetometer signals were recorded by using an inertial measurement unit. RNN and LSTM were used to detect FoG in the patients and achieved promising results with 78% specificity and 88.6% sensitivity.
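Networks that operate on 1D sensor signals, such as the CNN and LSTM models described above, are usually fed fixed-length segments rather than whole recordings. The following minimal sketch shows one common way to cut a labelled acceleration signal into overlapping windows. The window length, the step size and the rule that a window counts as FoG when more than half of its samples are annotated as FoG are assumptions made for illustration, not the preprocessing of any cited study.

```python
def sliding_windows(signal, labels, window, step):
    """Cut a 1D signal into overlapping windows with one label per window.

    signal: sample values; labels: per-sample annotations (1 = FoG, 0 = normal).
    Returns a list of (segment, window_label) pairs.
    """
    segments = []
    for start in range(0, len(signal) - window + 1, step):
        segment = signal[start:start + window]
        fog_samples = sum(labels[start:start + window])
        # Assumed rule: label the window FoG if a strict majority of samples are FoG.
        window_label = int(2 * fog_samples > window)
        segments.append((segment, window_label))
    return segments

# Hypothetical acceleration trace with a short annotated FoG episode.
acc = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1, 0.2, 0.1]
fog = [0, 0, 0, 1, 1, 1, 0, 0]
windows = sliding_windows(acc, fog, window=4, step=2)
window_labels = [label for _, label in windows]
```

Each segment can then be passed to the network as one training example; a smaller step gives more (but more strongly correlated) examples.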

Numerous studies have addressed PD detection from gait data using techniques other than deep learning or machine learning. Transcranial direct current stimulation (tDCS), a non-invasive method for inducing prolonged functional changes in the human cerebral cortex, was used by Ferrucci et al. [54]. This technique could extend the treatment options for patients with movement disorders. The study recruited nine idiopathic PD patients, four of whom were women. Bilateral anodal and sham tDCS were delivered in random order, in three separate experimental sessions held at least one month apart, to assess how tDCS affects cognitive and motor function in Parkinson’s patients. The results showed that anodal tDCS applied for five consecutive days over the cerebellum and motor cortical areas improved levodopa-induced dyskinesias in Parkinson’s patients. In 2016, Little et al. [90] worked with adaptive deep brain stimulation (aDBS), which uses feedback from brain signals to control stimulation. The research aimed to test whether the potential benefits were retained with bilateral aDBS in the face of concurrent treatment. Bilateral aDBS was applied to four PD patients undergoing DBS of the subthalamic nucleus; the mean stimulation voltage was 3.0 ± 0.1 V. The UPDRS scores were 43% better with aDBS than without stimulation. This motor improvement occurred despite an average time on stimulation (ToS) of only 45%. Levodopa was well tolerated during aDBS and led to a further decrease in ToS. Similarly, Zago et al. [203] analyzed the gait of Parkinson’s patients using a commercial inertial unit. The gait of 22 PD patients was recorded with both an optoelectronic system and a commercial IMU-based system, and different spatiotemporal features were compared between the two systems. Stride length and step duration did not differ significantly between the systems and showed adequate values of RMSE and MAE.
Outcomes revealed that the algorithm embedded in the current release of the commercial IMU required further improvement before being used with Parkinson’s patients. In general, the system was accurate for the evaluation of gait spatiotemporal parameters. In 2018, Alomari et al. [9] examined the relationship of leg, arm and handgrip neuromuscular performance with cardiovascular function in Parkinson’s patients. The experiments were conducted on 30 healthy controls and 29 Parkinson’s patients. Blood pressure, vascular measures, and handgrip, leg and shoulder neuromuscular performance were measured. Regression analysis showed that alterations in peripheral and central cardiovascular function measures had a moderately strong relationship with reduced handgrip (R2-range = 0.196–0.257), shoulder (R2-range = 0.146–0.289) and leg (R2-range = 0.19–0.35) neuromuscular performance. The results indicated that reduced neuromuscular performance and cardiovascular function are associated with PD. Vertical ground reaction force (VGRF) signals recorded from PD patients as well as from normal subjects were analyzed by Soubra et al. [174]. The study analyzed abnormal gait patterns in order to identify PD patients. Several features were extracted from sensors located at different positions on the left and right feet. Finally, these extracted features were used to classify healthy controls and PD patients. Results indicated that the extracted features may hide the conveyed information. Furthermore, frequency-related parameters were able to discriminate between PD patients at different stages. Summa et al. [176] analyzed the motor symptoms of PD patients using gyroscope signals, recording detailed MDS-UPDRS motor tasks with a magneto-inertial device. The signals were recorded from 7 PD patients and 7 age-matched control subjects to study the characteristics of goal-directed movements.
Using the gyroscope signals, different features were thoroughly analyzed to assess bradykinesia in Parkinson’s patients. Feature changes from the OFF to the ON state were observed in the frequency domain alongside the MDS-UPDRS changes. Results suggested that the pronation-supination task was more reliable for characterizing bradykinesia signs with a gyroscope. It was concluded that monitoring bradykinesia with simple features and a wearable sensor is promising. Likewise, in 2019, McGill et al. [97] studied the effect of ballet on gait variability and balance confidence in Parkinson’s patients. The study involved 19 PD patients who were already attending weekly ballet classes, whereas 13 control subjects with Parkinson’s were asked not to participate in dance during the study period. The results did not establish a significant effect of the weekly ballet class on gait variability or balance confidence. This finding does not support studies suggesting that dancing can improve the balance and gait of PD patients. Maachi et al. [52] proposed a technique for the detection of Parkinson’s disease from gait information. A 1D Convnet was used to construct the CNN classifier, which processed 18 1D signals of VGRF. The experiment was carried out on the PhysioNet database, and the method achieved an accuracy of 98.7%. Moon et al. [106] used a machine learning approach to differentiate Parkinson’s disease and essential tremor (ET) patients with wearable inertial motion sensors. Gait and balance variables were gathered through walk and instrumented stand tests from 45 patients with ET and 524 PD patients. Different ML approaches, including SVM, KNN, gradient boosting, random forest, neural network and decision trees, were compared with some mock data using the F1 score. The highest F1-score, 0.61, was obtained with neural networks.
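
With 45 ET and 524 PD patients the two classes are heavily imbalanced, which is one reason to compare models by F1 score rather than accuracy. The sketch below is illustrative only: it assumes ET is treated as the positive class (the text above does not say which class was used), and the counts for the second classifier are hypothetical rather than results from the cited study.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall; defined as 0 when tp == 0."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


n_et, n_pd = 45, 524  # cohort sizes quoted in the text

# Trivial baseline: label every patient as PD (assumption: ET is the positive class).
baseline_accuracy = n_pd / (n_et + n_pd)
baseline_f1 = f1_score(tp=0, fp=0, fn=n_et)
print(f"baseline accuracy={baseline_accuracy:.3f}, F1={baseline_f1:.2f}")
# prints "baseline accuracy=0.921, F1=0.00"

# Hypothetical classifier: 30 ET patients found, 20 PD patients mislabelled as ET.
example_f1 = f1_score(tp=30, fp=20, fn=15)
print(f"example F1={example_f1:.3f}")
# prints "example F1=0.632"
```

The baseline reaches about 92% accuracy without identifying a single ET patient, whereas its F1 score of zero exposes the failure.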

The summary of the research contributions for PD detection from gait data using different approaches is given in Table 5.

Table 5 Summary of related work for PD detection from gait data using different approaches

5 Challenges and issues

During the last decades, numerous sensor-based technologies and artificial intelligence-based systems have been developed for PD detection, and most state-of-the-art studies have reported encouraging results. Nevertheless, despite these advances, several issues and challenges remain in automated detection and analysis (see Fig. 5).

Fig. 5

Different challenges and issues observed with AI-based PD detection

  • Cardinality of Datasets One of the main challenges in the diagnosis of Parkinson’s disease is the cardinality of the dataset. The datasets used for detecting Parkinson’s disease typically comprise only a few participants, so the amount of data is small, and a model trained on so little data is unlikely to achieve reliable test accuracy. In general, a model trained on more data generalizes better than one trained on less. Recently, some attempts have been made to obtain datasets of acceptable size. During the last decade, there has been a dramatic change in the size and complexity of data; thus, several emerging data analysis techniques have been presented. However, automated discrimination of healthy subjects and late-stage patients remains a challenging task due to the lack of datasets and acquisition tools. Consequently, accurate and effective classification is still an issue.

  • Testing and Training of Large Datasets The automatic pattern recognition and classification tools devised for automatic detection and monitoring of neurodegenerative disorders are coupled with cognitive models that can handle only a small range of datasets. Datasets that contain data such as MRI images and EEG recordings demand substantial hardware resources, and handling such large volumes during training and testing is a challenging task. To overcome this challenge, it is necessary to expand the hardware resources: an efficient processor is needed to handle this large amount of imaging data in less time.

  • Maintenance of Privacy and Confidentiality of Patient’s Data Maintaining the privacy and confidentiality of patients’ private data is also a challenging task, because most patients do not want to disclose certain information due to a lack of trust and the perception that this information might not be kept confidential. To overcome this problem, specific measures are needed to assure patients that their personal information will be kept secret, and an organized system should be in place to keep their records secure and to build their trust.

6 Discussion and analysis

In this article, a detailed review is provided for PD detection using artificial intelligence. This section provides a comparative analysis to find the distribution of research studies directed toward each category of data, i.e., speech data, handwriting data, radiology data and gait data. The contribution distribution for each data category is shown in Fig. 6. Based on the research articles provided in this study, after analysis, it is found that 54% of researchers worked on AI-based PD detection using speech data, whereas 17% of researchers contributed toward working on AI-based PD detection using the handwriting data. For radiology data, a 16% article contribution is observed. The least research article contribution, 13%, is observed for the gait data in this study.

Fig. 6

Research articles contribution distribution for each data category observed in this study

Another comparative analysis is carried out to find the distribution of research studies using the different approaches (machine learning or deep learning) for each category of data, i.e., speech data, handwriting data, radiology data and gait data. The research article contribution distribution using different approaches for each data category is shown in Fig. 7. Based on the research articles provided in this study, after analysis, it is found that 46% of researchers worked on AI-based PD detection using machine learning from gait data, whereas 54% of researchers worked on AI-based PD detection using deep learning from gait data. For radiology data, a 71% article contribution is observed using machine learning and 29% using deep learning approaches. A research article contribution of 67% is reported for handwriting data using machine learning and 33% using deep learning. Lastly, for speech data, a research article contribution of 89% is found using machine learning and only 11% using deep learning.

Fig. 7

Research articles contribution distribution using different techniques for each data category observed in this study

The third and final comparative analysis examines the overall distribution of the reviewed research articles between machine learning and deep learning. Based on the analysis, it is found that 77% of researchers opted to use machine learning approaches, whereas 23% of researchers used deep learning approaches. The research article contribution distribution between machine learning and deep learning approaches is shown in Fig. 8.
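
As a consistency check on the three analyses, the overall split can be recovered by weighting the per-category machine learning shares by each category's share of articles. The sketch below assumes that every reviewed article is counted in exactly one data category and under exactly one approach, and it reads the gait split as 46% machine learning and 54% deep learning; the variable and function names are illustrative.

```python
# Share of reviewed articles per data category (Fig. 6), as fractions.
category_share = {"speech": 0.54, "handwriting": 0.17, "radiology": 0.16, "gait": 0.13}

# Machine learning share within each category (Fig. 7), as fractions.
ml_share_within = {"speech": 0.89, "handwriting": 0.67, "radiology": 0.71, "gait": 0.46}


def overall_ml_share(shares, ml_within):
    """Weighted average of the per-category machine learning shares."""
    return sum(shares[c] * ml_within[c] for c in shares)


ml = overall_ml_share(category_share, ml_share_within)
dl = 1.0 - ml
print(f"ML: {ml:.1%}, DL: {dl:.1%}")
# prints "ML: 76.8%, DL: 23.2%"
```

Under these assumptions the weighted shares round to the 77% and 23% reported above.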

Fig. 8

Research articles contribution distribution using machine learning and deep learning techniques for each data category observed in this study

7 Conclusion and future directions

Artificial intelligence has proved to be a promising technology for the medical diagnosis of various diseases. In this paper, a detailed review of AI-based detection of Parkinson’s disease is provided. PD is a chronic disorder that badly affects the daily life of the person. Preliminary concepts, methodologies, computational models, datasets and challenges are addressed in the article. Four categories of datasets are considered: speech, handwriting, radiology and gait. State-of-the-art methods and studies relating to machine learning, deep learning and other medical research and technologies are also reviewed, together with several challenges and issues that are commonly faced in automated PD detection.

Based on the extensive literature study provided in this survey, several findings are drawn. Some of these findings are: (1) The diagnosis of Parkinson’s disease is a highly challenging task because its symptoms and signs resemble those of other diseases that affect the quality of life. (2) PD is not limited to a certain age and can affect people of different ages, elderly as well as adults. (3) The effects of PD vary from person to person; it affects speech, movement, writing and many other daily life activities. (4) There is still no single robust test available to diagnose this neurological disorder, while the number of cases increases daily. (5) In comparison with the other datasets, PD detection from speech datasets is the most widely adopted by the research community. (6) In comparison with the other datasets, PD detection from gait datasets is the least used by the research community. (7) Machine learning is the most widely used approach by researchers for PD detection from speech, radiology and handwriting datasets. (8) Overall, between machine learning and deep learning, machine learning is the more widely used approach by researchers for PD detection.

In the future, prospective researchers can build on these findings and try to address the various challenges and issues in PD detection. The number of PD samples in the available datasets is very limited, so researchers in this field need to make a large benchmark dataset available to the community. In addition, researchers may focus more on tremor and gait features, symptoms and signs using e-health kits and sensor-based body devices connected through the Internet of Things (IoT), which may help in the recognition and monitoring of PD at home. Overall, it can be concluded that deep learning-based techniques of artificial intelligence have been successfully used for PD detection in the past and have a high potential to support a robust computer-aided PD system in the future.