1 Introduction

Almost all of us have heard the word “autism” at some point in our lives, but few of us fully understand what it is unless someone in our own family is affected by it. The World Health Organization (WHO) estimates that one in 100 children worldwide has ASD [30]. Signs of autism can typically be spotted before the age of five. Diagnosis focuses mainly on the person’s behavior and interpersonal communication. How common autism is in many low- and middle-income nations is unknown. In the past, screening tools were used to catch cases early and begin therapy as soon as possible, which sometimes gave the most successful outcomes. Research has found that approximately 30% of children with autism have less severe signs and symptoms at age 6 than at age 3. It is striking that some children appear to improve dramatically while others do not. Still, this is an encouraging sign: it suggests that autism does not worsen with age and can, in some cases, improve considerably over time.

Deep learning has recently demonstrated impressive performance in pattern recognition. Numerous CNN-based methodologies have been suggested for evaluation, and CNN techniques can be used to interpret neuroimages of ASD patients. Researchers found that 2.53% of eligible individuals had been identified with ASD between 2014 and 2019. Current figures show that the prevalence of ASD around the world is growing. It is therefore crucial to identify ASD in children as early as possible. Because there is no conventional medical test for ASD, diagnosis can be difficult; in practice, it is made by reviewing the child’s developmental records and behavior. Deep learning models have advanced tremendously as a result of years of study by numerous scientists and researchers, and many such models can be used to classify autism. A variety of CNN-based techniques can help diagnose autism efficiently.

2 Review of Literature

Over the years, there has been an upsurge in autism. The condition affects people of all races and socioeconomic backgrounds. Approximately 1.8% of children in the United States have been diagnosed with autism, a figure that has more than doubled over the past 20 years, according to a WHO (World Health Organization) report from 2022. It is estimated that 1 in 160 individuals globally has autism and that 1 in 100 children has ASD [30]. Today, adequate medical care, successful interventions, therapies, and treatments significantly enhance the quality of life for children with autism.

Using three pre-trained models, MobileNet (95% accuracy), Xception (94%), and InceptionV3 (89%), Ahmed, Z.A. et al. [3] created a deep learning-based online application for autism detection. Banire, B. et al. [6] suggested a face-based attention recognition model employing two techniques: an SVM classifier and a CNN method for converting time-domain spatial data into 2D spatial images. Li, G., Liu, et al. [15] suggested a patch-level data-expanding approach to multi-channel CNN and obtained respectable accuracy for early autism diagnosis. To extract features from unlabeled videos, Liang, S. et al. [16] developed an unsupervised feature-learning technique that produces results with an accuracy of 98.3%. Khosla, Y. et al. [13] achieved 87% accuracy using the pre-trained deep learning models MobileNet, InceptionV3, and ResNetV2. Li, B. et al. [14] developed an end-to-end method for autism categorization that uses facial expressions, AUs, arousal, and valence; the use of several facial attribute representations improved autism detection by approximately 7%, with a 76% F1 score. Jahanara, S. et al. [12] suggested a CNN classifier model to identify autism. The ImageNet pre-trained VGG19 model, the ReLU activation function, the Adam optimizer, and the categorical cross-entropy loss function were all employed in the CNN and produced results significantly more accurate than the conventional method of diagnosis. Lu, A. et al. [17] proposed a feasible ASD screening method that uses face pictures and the VGG16 model to identify children with ASD, with an F1 score of 95% and an accuracy of 95%. Beary, M. et al. [8] used LeNet, GoogLeNet, AlexNet, VGGNet, and ResNet to categorize children as either healthy or autistic. Yolcu, G. et al. [31] proposed a deep learning strategy for recognizing facial expressions with four CNN structures and obtained a success rate 5% higher than that of a face recognition system using raw photos. According to Raj, S. et al. [20], who employed a variety of machine learning and deep learning approaches to detect autism, a CNN-based model may be more helpful than traditional machine learning classifiers.

Michelassi, G.C. et al. [19] proposed a method to diagnose ASD based on anthropometric facial traits that may distinguish between ASD and TD; the best result was obtained by an SVM classifier with 86.2% accuracy. Duda, M. et al. [10] employed machine learning to distinguish between autism and ADHD. SVC, LDA, Categorical Lasso, and Logistic Regression performed with an accuracy of 93%, corresponding to a 92% reduction in the number of behaviors recorded with the standard SRS. According to research and analysis by Thabtah, F. [26], the SVM model is thought to be the most frequently used technique for ASD classification. In [32], children with ASD and TD were subjected to eye tracking while talking face-to-face with interviewers. Four ML classifiers (SVM, LDA, DT, and RF) were employed with forward feature selection in order to achieve the highest classification accuracy. In [2], the authors created three AI-based techniques for the classification of ASD: FFNNs, ANNs, and LBP. Baranwal, A. et al. [7] employed Artificial Neural Networks (ANN), Random Forest, Logistic Regression, Decision Tree, and SVM as part of a dataset screening technique to detect ASD in adults, kids, and teenagers. Thabtah, F. et al. [27] suggested an AI technique that maintains sensitivity, specificity, and accuracy. Mazumdar, P. et al. [18] described a method for identifying children with ASD that combines eye-tracking data with machine learning. In [4], the authors pre-processed autism datasets and suggested a machine learning approach to develop classifiers that identify ASD. Satu, M.S. et al. [22] discussed autism in Bangladesh’s divisional regions and used J48, Logistic Model Tree, Random Forest, and Reduced Error Pruned Tree; J48 produced the best results. Thomas, M. et al. [28] introduced an original approach for identifying ASD that makes use of ANN, with structural and functional MRIs performed for the diagnosis. In [11], the authors applied deep learning algorithms to a brain imaging dataset to identify ASD patients and reported achieving 70% accuracy. In [29], the authors used a CNN model on sparse-array raw EEG signals to distinguish between facial emotions seen by people with and without ASD.

Sumi, A.I. et al. [24] developed a system that gathers information from four distinct sensors (GPS, heart rate, accelerometer, and sound) and uses a fuzzy rule-based approach. Afrin, M. et al. [1] suggested an artificial intelligence (AI)-based solution for autistic kids and demonstrated a recognition performance based on the LDPv descriptor. Taj-Eldin, M. et al. [25] discussed the usefulness of wearable physiological and emotion monitoring equipment for patients with ASD. They also compared the sensing capacities of such devices and reviewed the literature on cutting-edge prototypes, clinical validity, and clinical viewpoints. Al Banna, M. et al. [5] evaluated emotional state using an AI-based emotion detection system and discovered sensor data for autistic patients. Rani, P. et al. [21] used image processing and machine learning methods to identify emotions in autistic kids by analyzing their facial gestures. To communicate socially with autistic kids, Silva, V. et al. [23] created an automated system that recognizes emotions from facial features and interacts with the children through a robotic platform. Buffle, P. et al. [9] offered information for understanding the requirements and difficulties that pediatric practitioners face in detecting ASD.

Table 1. Literature review

The literature shows that autism has a significant negative impact on children’s lives and that this impact is growing. Doctors and researchers are working constantly to treat affected children. Although machine learning techniques have seen some use, we believe much remains to be done. We therefore decided to employ transfer learning with recent models in order to identify autistic children quickly and with reasonable accuracy, and to help their guardians take the actions needed to improve the child’s condition.

3 Dataset

The dataset used in this study was collected from Kaggle. It is an open-source dataset available to all. The children’s ages range from 4 to 13. The dataset, entitled ‘Autistic Children Facial Dataset’, contains a total of 2936 facial images of children: 1468 images of autistic children and 1468 images of non-autistic children. It is divided into three directories: train, test, and valid. Each of the three contains two subdirectories: autistic and non_autistic. In the training directory, there are 1268 images of autistic children and 1268 images of non-autistic children. In the test directory, each subdirectory contains 150 images. In the validation directory, each subdirectory contains 50 images.
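As a quick sanity check on the split described above, the following sketch counts the images in each directory. It assumes the dataset has been downloaded and unpacked locally with the train/test/valid and autistic/non_autistic folder names given in the text; the root path, the accepted file extensions, and the helper name `count_images` are illustrative assumptions rather than part of the original study.

```python
from pathlib import Path

# Folder names as described in the text; file extensions are an assumption.
SPLITS = ("train", "test", "valid")
CLASSES = ("autistic", "non_autistic")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

# Per-class image counts reported in the paper for each split.
EXPECTED_PER_CLASS = {"train": 1268, "test": 150, "valid": 50}

# The reported totals are consistent: 2 classes x (1268 + 150 + 50) = 2936.
assert 2 * sum(EXPECTED_PER_CLASS.values()) == 2936


def count_images(root):
    """Return {(split, class): number of image files} under `root`."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        for cls in CLASSES:
            folder = root / split / cls
            if folder.is_dir():
                counts[(split, cls)] = sum(
                    1 for p in folder.iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
                )
            else:
                counts[(split, cls)] = 0  # missing folder counts as empty
    return counts
```

Calling `count_images("path/to/dataset")` on a local copy (the path here is a placeholder) and comparing each entry with `EXPECTED_PER_CLASS` would confirm that the download matches the split reported above.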

We feed the image data to the classifier after pre-processing it so that it can be used for training and testing: each image is converted to an array of pixels and scaled to \(224 \times 224\) pixels so that the machine can analyze it more quickly. To normalize the data, we first convert the arrays to NumPy arrays and then divide them by 255 (the maximum 8-bit pixel intensity) to obtain values between 0.0 and 1.0. We assign the data to two classes by setting the label to 0 for all the ‘autistic data’ and 1 for all the ‘non-autistic data’; these labels are referred to as the predicted outputs. The classifier then uses this data to train the machine. Some sample images are shown below.

Fig. 1. Autistic child 1

Fig. 2. Autistic child 2

Fig. 3. Autistic child 3

Fig. 4. Normal child 1

Fig. 5. Normal child 2

Fig. 6. Normal child 3
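The normalization and labeling steps described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the images have already been loaded and resized to 224 × 224 × 3 arrays of 8-bit values, uses random arrays as stand-ins for real photographs, and the helper names `normalize_images` and `make_labels` are hypothetical.

```python
import numpy as np

IMG_SIZE = 224  # images are assumed to be resized to 224 x 224 beforehand


def normalize_images(images):
    """Convert 8-bit images to float32 and scale pixel values to [0.0, 1.0]."""
    return np.asarray(images, dtype=np.float32) / 255.0


def make_labels(n_autistic, n_non_autistic):
    """Label autistic samples 0 and non-autistic samples 1, as in the text."""
    return np.concatenate([
        np.zeros(n_autistic, dtype=np.int64),
        np.ones(n_non_autistic, dtype=np.int64),
    ])


# Stand-in data: random 8-bit "images" in place of the real facial photographs.
rng = np.random.default_rng(0)
autistic = rng.integers(0, 256, size=(4, IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)
non_autistic = rng.integers(0, 256, size=(4, IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)

x = normalize_images(np.concatenate([autistic, non_autistic]))
y = make_labels(len(autistic), len(non_autistic))
```

Integer labels of this form are what a sparse categorical cross-entropy loss expects, so no one-hot encoding is needed.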

4 Methodology

This section describes the steps taken to carry out this research: the methods used to perform the classification, how we obtained and used them, and which one we prefer.

We classify whether a child is autistic or normal, using facial images as input data. This classification requires processing a large amount of data, for which we use deep learning models. We use three deep learning models: VGG 19, Inception V3, and DenseNet 201. We chose them because, compared with other available models, they are easy to use and offer a good balance of processing speed, accuracy, and reliability. They are also more developed than their previous versions. We measured the performance of the three models individually and compared their accuracy, precision, and recall. The comparisons are given in Table 2 and Table 3. We also measured the AUROC value, which is shown in Table 4. Based on these results, VGG 19 is our preferred model among the three.

At this point, transfer learning is used. We import one deep learning model at a time from Keras Applications. The standard VGG 19 architecture comprises 16 convolutional layers, 5 max-pooling layers, 3 fully connected layers, and 1 softmax layer. We instantiate VGG19 with include_top set to False, weights set to imagenet, and an input shape of \(224 \times 224 \times 3\). Because the layers of the model are already trained on the ImageNet dataset, there is no need to train them again, so we loop over the layers and set the trainable attribute of each layer to False. The output is then flattened and passed to a dense layer that uses softmax as its activation. For the EarlyStopping callback, monitor is loss, mode is min, verbose is 1, and patience is 5. The model is compiled with the adam optimizer, sparse categorical cross-entropy as the loss, and accuracy as the metric. To train the machine, we fit the model on the input data (in the form of arrays) and their ‘predicted outputs’, with epochs set to 11, the EarlyStopping callback, a batch size of 32, and shuffle set to True. Once training is complete, we can test the machine on unknown data, and it predicts whether the new data belongs to the autistic class or the normal class.
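The configuration described above can be sketched with the Keras API as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: it assumes TensorFlow with its bundled Keras is installed, the function names `build_vgg19_classifier` and `train` are hypothetical, and the two-unit output layer is inferred from the two classes and the sparse categorical cross-entropy loss mentioned in the text.

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model


def build_vgg19_classifier(weights="imagenet"):
    """Frozen VGG19 convolutional base followed by a 2-way softmax layer."""
    base = VGG19(include_top=False, weights=weights, input_shape=(224, 224, 3))

    # The base is already trained on ImageNet, so its layers are frozen.
    for layer in base.layers:
        layer.trainable = False

    x = Flatten()(base.output)
    # Two output units (assumption): class 0 = autistic, class 1 = normal.
    outputs = Dense(2, activation="softmax")(x)

    model = Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train(model, x_train, y_train):
    """Fit the model with the hyperparameters stated in the text."""
    early_stop = EarlyStopping(monitor="loss", mode="min", verbose=1, patience=5)
    return model.fit(
        x_train,
        y_train,
        epochs=11,
        batch_size=32,
        shuffle=True,
        callbacks=[early_stop],
    )
```

Swapping `VGG19` for `InceptionV3` or `DenseNet201` from the same module would give the other two models compared in this paper; only the new dense layer is trained in each case.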

Table 2. Comparing the performances of VGG 19, Inception V3 and DenseNet 201 (to classify Autistic Child).
Table 3. Comparing the performances of VGG 19, Inception V3 and DenseNet 201 (to classify Normal Child).
Table 4. AUROC values of VGG 19, Inception V3 and DenseNet 201
Fig. 7. Proposed methodology

4.1 Classification Algorithms

VGG 19, Inception V3, DenseNet 201

In our system, we used three deep learning models; after comparing their performances, we chose VGG 19.

Significance of Accuracy Curve. We show the training accuracy curves of the three models. Training accuracy plays an important role in anticipating testing accuracy: good training accuracy indicates that the machine has learned the training data well, which is a prerequisite for, though not a guarantee of, recognizing unknown data. Because we work with facial images, we trained the machine to learn the images from their pixel intensities. We obtained around 95–99.9% training accuracy for all the models, which suggests that the machine should recognize the testing data well.

Fig. 8. Accuracy of VGG 19

Fig. 9. Accuracy of Inception V3

Fig. 10. Accuracy of DenseNet 201

5 Result and Discussion

We propose the VGG 19 model for autism detection. We evaluated VGG 19, Inception V3, and DenseNet 201, which achieved accuracies of 85%, 78%, and 83%, respectively; VGG 19 is thus the most accurate of the three. The system can be run in both Google Colab and Jupyter Notebook. We found the results on these platforms to be fairly reliable, although they may vary by up to ±2%.

Accuracy alone is not enough to select a model; we also need to calculate the AUROC value. AUROC measures the area under the entire ROC curve and provides an aggregate measure of performance across all possible classification thresholds. It should be noted that machine learning methods offer a simple and suitable way to detect autism in children. Although autism cannot be fully cured, the condition can improve once it has been diagnosed, so it should be detected as early as possible. The earlier it is detected, the more it helps children and their parents to deal with it and eases the situation for everyone. Autism is a lifelong condition, but a person’s life can be greatly improved with the right care and support. It affects not only the child who has it but also, mentally, the people around him or her, and its effects vary across families, relationships, and society. Early detection allows many things to improve before it is too late. Early intervention stops challenging behavior from turning into a habit. Proper early treatment can reduce children’s signs and symptoms and improve their natural behavior.

5.1 Performance Metrics

Performance evaluation metrics are presented in this section. We selected the best of the three models after evaluating their performance. In the performance evaluation, ‘0’ denotes an autistic child and ‘1’ denotes a normal child.

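Recall. What proportion of actual positives was correctly identified?

$$\begin{aligned} Recall = \frac{TP}{TP+FN} \end{aligned}$$
(1)

Precision. What proportion of positive identifications was actually correct?

$$\begin{aligned} Precision = \frac{TP}{TP+FP} \end{aligned}$$
(2)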

F1 Score. The F1 score summarizes the model’s performance on the dataset as the harmonic mean of recall and precision. It is defined as follows:

$$\begin{aligned} F1 score = 2 * \frac{Recall * Precision}{Recall + Precision} \end{aligned}$$
(3)

Accuracy. Accuracy is another factor to consider in classification. It is the proportion of all predictions that the model got right.

$$\begin{aligned} Accuracy = \frac{TP+TN}{TP+FN+FP+TN} \end{aligned}$$
(4)
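The metric definitions in this section can be computed directly from the four confusion-matrix counts. The sketch below is a minimal illustration; the function name `classification_metrics` is hypothetical, and the counts in the usage line are made-up numbers chosen for easy arithmetic, not results from this study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute recall, precision, F1 score, and accuracy from confusion counts."""
    recall = tp / (tp + fn)        # share of actual positives that were found
    precision = tp / (tp + fp)     # share of predicted positives that were right
    f1 = 2 * (recall * precision) / (recall + precision)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Illustrative counts only (not taken from the paper's confusion matrix).
metrics = classification_metrics(tp=45, fn=5, fp=15, tn=35)
```

With these illustrative counts, recall is 0.9 and precision is 0.75, so the F1 score falls between the two, closer to the lower value, as expected of a harmonic mean.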
Fig. 11. Performance of VGG 19

Fig. 12. VGG 19 confusion matrix

Fig. 13. Performance of Inception V3

Fig. 14. Performance of DenseNet 201

ROC Curve. The ROC curve summarizes a model’s overall performance across classification thresholds by plotting the TPR against the FPR.

  • TPR - (True Positive Rate)

  • FPR - (False Positive Rate)

$$\begin{aligned} TPR = \frac{TP}{TP+FN} \qquad FPR = \frac{FP}{FP+TN} \end{aligned}$$
Fig. 15. ROC curve
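To make the link between the ROC curve and the AUROC value concrete, the sketch below sweeps a decision threshold over predicted scores, records one (FPR, TPR) point per threshold, and integrates the curve with the trapezoidal rule. It is a minimal illustration under stated assumptions: label 1 is treated as the positive class, the scores are made-up numbers, and the function names `roc_points` and `auroc` are hypothetical. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: six samples with their predicted scores for the positive class.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auroc = auroc(roc_points(example_labels, example_scores))
```

An AUROC of 1.0 means every positive sample is scored above every negative one, while 0.5 corresponds to random ranking; in the made-up example one negative outranks one positive, so the value lies between the two.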

6 Comparative Study

Every study is distinct, because each author works with their own approach and methodology. The methodology and the combination of models used in this paper are our own. We used a different amount of data for training, testing, and validation, pre-processed it, and applied different methods. Although every study is different, comparing several papers gives an overall view of the topic and encourages new ways of thinking. In this section, we compare our work with several other papers. Table 5 presents some comparisons.

Table 5. Performance metrics of our proposed model and the related studies

Baranwal, A. et al. [7] use ASD screening datasets in questionnaire format, containing binary, integer, and string data, with 292 child records. They classify autism with ANN, RF, LR, DT, and SVM models and obtain 96.77% accuracy. Ahmed, I.A. et al. [2] use 547 child image records. They use ANN, FFNN, and CNN models, and their proposed model, a CNN-based ResNet-18, provides 97.6% accuracy. Li, B. et al. [14] use a video dataset covering 88 children. The classifier is trained on four facial attributes: expressions, valence, AUs, and arousal. The dataset includes about 1.2 million samples from the 88 children. They achieve 76% accuracy using seven binary classifiers based on BottleNeck, MobileNet, and EESP approaches. In contrast, we use a facial-image dataset containing 2936 images. We use transfer learning, which makes the classifiers much faster to run. The deep learning models used are VGG 19, Inception V3, and DenseNet 201. All three are reliable and produce reasonable results. We obtain 85.0% accuracy.

7 Contribution

Autism is a lifelong condition that cannot be fully cured, but proper treatment and services can improve a person’s symptoms and everyday activities. Many treatments are available, because autism is not a matter to be ignored. Children with autism are part of our society, so it is our responsibility to help them live healthy lives. Doctors and researchers have been working on how autism can be detected effectively. Autism can be identified in a number of ways, including eye tracking, brain MRIs, and observation of behavior. One of the most effective ways is to analyze facial expressions with the help of ML techniques. Deep learning has recently achieved outstanding results in a wide range of pattern recognition and image analysis tasks. We decided to work on early detection of autism in children through image classification. We used a dataset of children’s facial images to train, test, and validate the models. We built a model using deep learning and transfer learning techniques. Of the three models we used, VGG 19 provided the most accurate results in our research. It is able to detect the child’s condition fairly accurately and quickly. With this ML model, we can help children by detecting their condition and making their parents aware of it, so that they can consult a specialist as early as possible. We hope that early detection can improve the lives of children and their family members throughout their life journey.

8 Conclusion

In some countries, autism does not receive sufficient attention; in truth, it is overlooked in many countries. With help from specialists and organizations, parents of autistic children can learn early on how to help their child develop mentally, emotionally, and physically during the developmental stages. Diagnosing ASD is vital, because without a diagnosis many things become difficult for the person affected and for those close to them. In this paper, we applied three ML models: VGG 19, DenseNet 201, and Inception V3. VGG 19 is considered one of the best computer vision models to date. Among all the classifiers, VGG 19 gave the highest accuracy.