1 Introduction

Knee injuries can broadly be classified into three types: ACL tears, meniscal tears, and abnormal knees. These injuries are common: around 113,000 ACL tears [1] occur per year in the United States alone, of which almost 75,000 lead to reconstructive surgery. All three injuries are prevalent among athletes and obese people, but they differ from one another. An ACL tear results from stretching, partial tearing, or complete tearing of the anterior cruciate ligament; a complete tear is the most common type of damage. Fig. 1a shows the anatomy of an ACL tear. This injury can only be diagnosed through MRI.

Fig. 1 Anatomy of an Anterior Cruciate Ligament tear [2] and Meniscus tear [3]

Fig. 1b shows the anatomy of a meniscus tear and where the injury occurs. A meniscus tear happens when the knee is forcefully twisted or rotated while bearing the full body weight. If it does not heal properly, it can lead to osteoarthritis. This injury can also be diagnosed through MRI examination.

The third type of injury is an abnormal knee. Abnormal findings usually arise from pain or injury to the knee’s ligaments caused by damage to the kneecap, bone cancer, inflammation, bone infection, arthritis, or age-related degeneration of the knee. Figure 2 shows an MRI of an abnormal knee. MRI is also used to diagnose an abnormal knee.

Fig. 2 MRI scan of an abnormal knee [4]

All of these injuries are therefore diagnosed with MRI, a radiological imaging technique that creates pictures of the anatomy and physiological processes of the body. An MRI scan produces a set of image slices of the organ stacked into a volume, so that the organ can be viewed at different depths along several axes. The 3D representation of MRI images exhibits intrinsic features that help deep neural networks learn effectively [5].

A knee MRI scan has three planes, namely sagittal, axial, and coronal, and to diagnose an injury correctly the doctor needs to examine the scans from all three planes to obtain a good global view. Figure 3 shows the three planes of a knee MRI scan. However, MRI interpretation has significant drawbacks: it is time-intensive and subject to diagnostic error and variability. Considering the number of knee injuries that occur, there is a need for a quicker system to help diagnose them. Moreover, the planes do not all have the same number of slices, which makes the slices difficult to view together, so the global view that a radiologist needs for a proper diagnosis is hard to achieve, especially when a scan contains many slices, each of which must be examined to make a diagnosis.

Fig. 3 The three planes of the MRI scan of a knee [6]

Deep learning techniques are well suited to modelling the intricate relationships between medical images and their interpretations and can give quick preliminary results following MRI exams, because they learn features layer by layer. They can thereby improve the quality of MRI diagnosis when radiologists and specialists are unavailable. Providing clinical experts with predictions from a deep learning model can also improve the quality and consistency of MRI interpretation.

To tackle the issues that doctors and radiologists face while diagnosing a knee MRI scan, this paper proposes a deep learning model based on transfer learning. The proposed model applies five pre-trained CNNs, namely VGG16, VGG19, ResNet152V2, InceptionV3, and DenseNet201, to the MRNet dataset to classify knee injuries into the three types discussed earlier. This would reduce the diagnosis time per patient, allowing a quick switch from one patient to the next. It would also give fewer false positive predictions with better accuracy, so that patients could avoid invasive knee surgery. Overall, the model is intended to work efficiently and at a much faster pace. In addition, the proposed model performs three-class classification on all three planes of a knee MRI scan. Past work in this area does not cover every plane or report accuracy across all considered models; it performs classification on only one of the three planes of an MRI scan. The average accuracy of each model is: VGG16, 75.279%; VGG19, 77.5%; ResNet152V2, 78.33%; DenseNet201, 71.64%; InceptionV3, 71.39%. The performance of the proposed model is also compared with the state-of-the-art deep learning model in [7], against which it shows better accuracy.

The remainder of this paper is organized as follows: Section 2 reviews and compares related work; Section 3 describes the research methodology, including dataset details, data pre-processing, and system architecture; Section 4 presents the results and discussion of the developed model; Section 5 concludes the paper and outlines future work.

2 Related works

Deep learning has not been widely used for disease detection in MRI. The task is usually very challenging because it requires analysis of complex abnormalities on multiple sections of different image datasets [7,8,9]. Liu [8] detected cartilage lesions within the knee joint using a deep learning approach. Using segmentation and CNNs for classification, a fully automated deep learning-based cartilage lesion detection system was developed. At the optimal threshold according to the Youden index, the sensitivity and specificity of the system were 84.1% and 85.2%, respectively, for evaluation 1 and 80.5% and 87.9%, respectively, for evaluation 2. Pedoia et al. [7] proposed deep learning models for detecting and staging the severity of meniscus and patellofemoral cartilage lesions. The authors showed that detecting meniscal and patellar cartilage lesions using a fully automated deep learning pipeline is possible. The work reports accuracies of 80.74%, 78.02%, and 75.00% for normal, small, and complex large lesions, respectively. Fang et al. [9] created three separate CNNs: the first selected only those image sections containing the ACL from the entire MRI dataset; the second isolated the region of the intercondylar notch that included the ACL on the chosen image sections to narrow the range of information; and the third was a classification model that evaluated the presence or absence of an ACL tear on the selected image sections. DenseNet provided the best diagnostic performance, but the approach had several limitations, as training three different models is a burden. Its biggest drawback was that it could detect only ACL tears with full-thickness ligament injuries, not partial tears or intra-substance sprains of the ACL.
The reason is that these are much more challenging to detect than full-thickness tears, because the changes they produce in the contour and signal intensity of the injured ligament are much subtler [10, 11]. The specificity and sensitivity achieved by this model are 96%, and the AUC is 0.98. In [12], a CNN architecture called ELNet was implemented, which, unlike the one discussed above, was lightweight owing to its novel integration of multi-slice normalization and BlurPool operations. The model was robust to a highly unbalanced class distribution, which makes it helpful when the number of cases is large. It also helps locate tears on the most significant slice; its only drawback is that it does not incorporate all three planes of the MRI (sagittal, axial, and coronal). In [13], a customized 14-layer ResNet-14 CNN architecture was used together with class balancing and data augmentation. Performance was evaluated using sensitivity, specificity, accuracy, precision, and F1 score, and the resulting AUCs were 0.980 for healthy ACL, 0.970 for partially torn ACL, and 0.999 for fully torn ACL. This supports the assertion made earlier that deep learning can be used to detect and evaluate ACL injuries automatically. In [14], feature extraction with the histogram of oriented gradients descriptor and the GIST descriptor was applied to the dataset. The area under the receiver operating characteristic curve (AUC) achieved by this model, which combined SVM and random forest (RF), was 0.894 for injury detection and 0.943 for total rupture. The AlexNet CNN architecture was pioneering in work [15] on the use of deep learning for ACL tear detection from MRI. Using transfer learning from ImageNet [16], it extracted features from the MRNet dataset.
Jaskaran and Sandeep [17] used machine learning techniques such as decision trees, SVMs, k-nearest neighbours, and Markov processes to detect ACL tears in sports injuries. Their method achieved an accuracy of only 54%, and they concluded that AI techniques gave better results. Chang et al. [18] developed a convolutional Siamese network to predict unilateral knee pain using MRI scans and achieved an AUC of 0.8. Lim et al. [19] proposed using deep neural networks with scaled PCA to detect osteoarthritis from statistical data and achieved an AUC of 78%. Wahid et al. [20] implemented multi-layered convolutional sparse coding to detect ACL tears in MRI scans, but only for the coronal plane. Although it achieved a good accuracy of 85%, it was of limited use in diagnosis, as it examined only one of the three planes and only one kind of injury. An effective object detection model such as You Only Look Once (YOLO) can be used with CNNs to localize the object (the area where the features reveal disease) [21].
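
The Youden index mentioned above for [8] selects the operating threshold that maximizes sensitivity + specificity − 1. The following is a minimal sketch of that selection, not the procedure of any cited work; the function names are hypothetical and the labels and scores are made-up illustrative values.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity); scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    # Every distinct score is tried as a candidate threshold.
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(labels, scores, t)
        j = sens + spec - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j


# Illustrative values only (not taken from any cited study).
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
threshold, j = youden_threshold(labels, scores)
print(threshold, j)
```

With these illustrative values the selected threshold is 0.40, where sensitivity is 1.0 and specificity is 0.75.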

3 Research methodology

The methodology applied to the problem of knee injury detection using MRI is shown in Fig. 4. This section is divided into three main parts: the dataset considered, data pre-processing, and system architecture.

Fig. 4 Overview of the methodology used

3.1 Dataset

The dataset used is the MRNet dataset [22], consisting of 1,370 knee MRI exams performed at Stanford University Medical Centre. The dataset contains 1,104 (80.6%) abnormal exams, with 319 (23.3%) ACL tears and 508 (37.1%) meniscal tears. The samples are split into a training set (1,130 exams, 1,088 patients) and a test set (120 exams, 113 patients) for each plane (axial, coronal, and sagittal). The resolution of each image is 1500 × 2000 pixels.
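
As a quick arithmetic check, the percentages quoted above can be reproduced from the exam counts. This sketch uses only the numbers stated in this section; the variable names are illustrative.

```python
total_exams = 1370
counts = {"abnormal": 1104, "ACL tear": 319, "meniscal tear": 508}

# Share of each finding among all exams, rounded to one decimal place.
shares = {name: round(100 * n / total_exams, 1) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {counts[name]} of {total_exams} exams ({pct}%)")

# Exams covered by the two splits listed in this section.
train_exams, test_exams = 1130, 120
exams_in_splits = train_exams + test_exams
print(f"exams in listed splits: {exams_in_splits}")
print(f"exams not in listed splits: {total_exams - exams_in_splits}")
```

The shares come out as 80.6%, 23.3%, and 37.1%, matching the text. The two listed splits account for 1250 exams, so 120 of the 1370 exams fall outside the splits described here.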

3.2 Data pre-processing

First, the dataset is loaded and divided into two parts: the training set, which has 1130 MRI scans per plane, and the test set, which has 120 MRI scans per plane. Then, as part of data augmentation, a function is defined that ensures each MRI scan input has dimensions s × 256 × 256 × 3, where s is the number of slices in the MRI scan and 3 is the number of color channels per slice; the function is called for both the training and test data. This is done because the CNN architecture common to all the networks is MRNet, whose input has dimensions s × 256 × 256 × 3.
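
A function of the kind described above might look like the following minimal sketch. It assumes each series arrives as a grayscale NumPy array of shape (s, H, W); the name `to_model_input` and the center-crop/zero-pad strategy are assumptions for illustration, since the text does not state how the spatial size is enforced.

```python
import numpy as np


def to_model_input(volume, size=256):
    """Convert a grayscale series of shape (s, H, W) to (s, size, size, 3)."""
    volume = np.asarray(volume, dtype=np.float32)
    if volume.ndim != 3:
        raise ValueError("expected a series of shape (s, H, W)")
    s, h, w = volume.shape

    # Center-crop dimensions larger than `size`, zero-pad smaller ones
    # (an assumed strategy, not confirmed by the paper).
    out = np.zeros((s, size, size), dtype=np.float32)
    ch, cw = min(h, size), min(w, size)
    src_top, src_left = (h - ch) // 2, (w - cw) // 2
    dst_top, dst_left = (size - ch) // 2, (size - cw) // 2
    out[:, dst_top:dst_top + ch, dst_left:dst_left + cw] = (
        volume[:, src_top:src_top + ch, src_left:src_left + cw]
    )

    # Repeat the single gray channel three times so that networks
    # pre-trained on RGB images accept the slices.
    return np.repeat(out[..., np.newaxis], 3, axis=-1)


# Two synthetic series with different slice counts and in-plane sizes.
series_a = np.random.rand(24, 256, 256)
series_b = np.random.rand(40, 300, 200)
print(to_model_input(series_a).shape)  # (24, 256, 256, 3)
print(to_model_input(series_b).shape)  # (40, 256, 256, 3)
```

The number of slices s is left unchanged, so series with different slice counts still produce inputs whose remaining three dimensions are identical.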

3.3 System architecture

Figure 5 shows the system architecture of the model. Around 1130 scans per plane were used to train the model and 120 to test it. The data was pre-processed with data augmentation to handle the unequal number of slices across MRI scans. Five pre-trained networks (VGG16, VGG19, InceptionV3, ResNet152V2, and DenseNet201) are used to perform transfer learning.

Fig. 5 System architecture
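
The classification head placed on top of each frozen network (global average pooling, a dropout of 0.6, and a dense layer with one sigmoid neuron, as detailed in the paragraph that follows) can be sketched in plain NumPy. This is an illustrative forward pass under stated assumptions, not the authors' implementation: random feature maps stand in for the output of a frozen pre-trained network, the weights are random placeholders, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def classification_head(features, weights, bias, dropout_rate=0.6,
                        training=False, rng=None):
    """Map feature maps of shape (n, h, w, c) to n probabilities."""
    # Global average pooling: average each channel over its spatial positions.
    pooled = features.mean(axis=(1, 2))          # shape (n, c)

    # Inverted dropout is applied only during training; at inference
    # the pooled features pass through unchanged.
    if training:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(pooled.shape) >= dropout_rate
        pooled = pooled * keep / (1.0 - dropout_rate)

    # Dense layer with one sigmoid neuron: one probability per input.
    return sigmoid(pooled @ weights + bias)      # shape (n,)


rng = np.random.default_rng(0)
# Stand-in for frozen backbone output: 4 inputs, 8 x 8 maps, 512 channels
# (an assumed shape chosen for illustration).
features = rng.normal(size=(4, 8, 8, 512))
weights = rng.normal(scale=0.01, size=512)       # placeholder, untrained
probabilities = classification_head(features, weights, bias=0.0)
print(probabilities.shape)  # (4,)
```

Because the backbone is frozen, only `weights` and `bias` in this head would be updated during training.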

For each pre-trained model, the pre-trained layers are frozen to preserve the knowledge that the network contains, and its output is taken. A global average pooling layer and a dropout of 0.6 are added to this output, followed by a dense layer with one neuron and a sigmoid activation function, which yields the probability. The model is then checked for how well it predicts abnormal cases, ACL cases, and meniscus cases on each plane by making individual predictions for each. The three per-plane predictions are then combined, which combines their probabilities. If the combined probability is greater than 1, a 1 is appended to the list of predictions; otherwise, a 0 is appended, where 1 indicates that the exam is positive for the injury and 0 indicates that it is injury-free. The predictions are then converted into a 1D array and compared with the dataset's column of actual values, likewise converted into a 1D array, to compute the metrics for each pre-trained network and see which performed best. After calculating each ensemble's accuracy and F1 score, the accuracy and training loss graphs were plotted, completing the classification ensemble.
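
The combination and scoring steps above can be sketched as follows. The text says the per-plane probabilities are combined and compared against 1 but does not name the operation, so summation is an assumption here; the function names and the example values are likewise illustrative only.

```python
def combine_planes(axial, coronal, sagittal, threshold=1.0):
    """Label an exam 1 if the summed per-plane probability exceeds threshold."""
    # Summation is an assumed reading of "combined"; the threshold of 1
    # follows the text.
    return [1 if (a + c + s) > threshold else 0
            for a, c, s in zip(axial, coronal, sagittal)]


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels given as 1D sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Made-up per-plane probabilities for four exams (illustrative only).
axial = [0.9, 0.2, 0.6, 0.1]
coronal = [0.8, 0.3, 0.5, 0.2]
sagittal = [0.7, 0.1, 0.2, 0.3]
y_true = [1, 1, 1, 0]

y_pred = combine_planes(axial, coronal, sagittal)
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(y_pred, accuracy, f1)
```

For these values the predictions are [1, 0, 1, 0], giving an accuracy of 0.75 and an F1 score of 0.8.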

4 Results and discussion

The model mainly aims to perform multiclass classification on the MRI scans provided by MRNet. The implemented model examines all three planes of an MRI scan, and results are given per plane for the diagnosis of the three kinds of injuries: ACL tear, meniscus tear, and abnormal knee. The results obtained for each model in each plane for all three kinds of injuries are shown in Table 1 and discussed below.

Table 1 Accuracy and F1 score for each model with respect to each injury

VGG19 gave the highest accuracy, 87.5%, among all the pre-trained networks for abnormalities in the knee. VGG19 is known for providing high accuracy in large-scale image recognition settings and for training deeper networks by pre-training on shallower versions. ResNet152V2 gave the highest accuracy, 83.33%, among all the pre-trained networks for detecting ACL tears, and it also shows the highest average accuracy of 78.33%. ResNet152V2 gives good results because it uses skip connections, which make it possible to train much deeper networks. Skip connections make it easier to copy activations from layer to layer, preserving information through every layer, and let the network use features constructed in both shallow and deep layers. DenseNet201 gave the highest accuracy, 70%, among all the pre-trained networks for detecting meniscus tears. DenseNet201 gives good results because it is well suited to smaller datasets: instead of adding the activations produced by one layer to later layers, it concatenates them.
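
The difference between the two connection styles contrasted above can be illustrated with a toy NumPy example. The `layer` function is a hypothetical stand-in for a convolutional block (a random linear map over channels followed by ReLU), and the channel counts are illustrative; this is not the actual ResNet152V2 or DenseNet201 code.

```python
import numpy as np


def layer(x, out_channels, rng):
    """Stand-in for a conv block: random linear map over channels, then ReLU."""
    w = rng.normal(scale=0.1, size=(x.shape[-1], out_channels))
    return np.maximum(x @ w, 0.0)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 64))      # feature map: height, width, channels

# ResNet-style skip connection: the input is added to the block output,
# so the channel count must match and stays at 64.
residual_out = layer(x, 64, rng) + x

# DenseNet-style connection: the input is concatenated with the block
# output, so channels grow (64 + 32 = 96) and the input stays intact.
dense_out = np.concatenate([x, layer(x, 32, rng)], axis=-1)

print(residual_out.shape)  # (8, 8, 64)
print(dense_out.shape)     # (8, 8, 96)
```

In the concatenated output the first 64 channels are exactly the input, which is how earlier features remain directly available to later layers.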

Table 2 shows the comparative average accuracy of all implemented models and compares the obtained results with the state-of-the-art results in [7]. The average accuracy of the ResNet152V2 model considered in this work is the highest when compared with the state-of-the-art model in [7]. The highest accuracy achieved by the VGG19 model in the proposed work is 87.5% (for abnormalities), whereas the highest accuracy achieved by [7] is 80.74% (for abnormalities).

Table 2 Comparative average accuracy

5 Conclusions

The main goal of this paper was to build a deep learning model that helps detect the three kinds of knee injury by examining all three planes of a knee MRI scan. With the proposed ensemble, the desired detection results were achieved using data augmentation, transfer learning, and fine-tuning of five pre-trained networks. All five networks gave good enough results for all three kinds of knee injuries while examining each plane of the knee MRI scan. VGG19 performed best for detecting abnormalities in the knee, with an accuracy of 87.5% and an F1-score of 92.46%, both higher than those of any other pre-trained network. ResNet152V2 performed best for detecting ACL tears, with an accuracy of 83.33% and an F1-score of 80.39%, both higher than those of any other network. DenseNet201 performed best for meniscus tears, with an accuracy of 70% and an F1-score of 80.21%. The novelty of this research is that it not only classifies a knee injury into the three types but also diagnoses each plane of the MRI scan and identifies which of the five pre-trained networks performs best, which makes the model beneficial to radiologists for making a more accurate diagnosis and saves them considerable diagnosis time. At the same time, it tackles the issue of an unequal number of slices in an MRI scan, making diagnosis easier for radiologists, and ultimately helps to prioritize high-risk patients while saving many from unnecessary invasive knee surgery.
The proposed technique is also compared with state-of-the-art work and shows better accuracy. Future work can extend this study by experimenting with other pre-trained networks to develop an efficient model for knee injury detection.