Introduction

Facial palsy (FP) not only leads to a variety of motor deficits in the face, but also affects the patient’s emotional and expressive abilities. The psychosocial effects have received more attention in recent years [1, 2]. To assess psychosocial impairment, internationally accepted general psychological measurement tools and patient-reported outcome measures such as the FDI and FaCE have been established [2–4]. In contrast, little progress has been made toward an internationally accepted facial nerve grading scale to assess the clinically dominant motor disorder related to facial palsy [5, 6]. The main reason is probably that all facial grading systems used in clinical routine rely on subjective assessment. Because of this subjectivity, all grading systems, even frequently used ones such as the House-Brackmann and Sunnybrook scales, have problems with reproducibility, with interobserver and intraobserver variability, and with sensitivity to changes over time and/or following interventions [5]. New electronic scales are practical because they can be used on a variety of electronic devices, including smartphones, tablets, and computers [7]. Nevertheless, such systems are still clinician-based and not objective. Although these deficits are well known, even the very important randomized controlled clinical trials on the treatment of FP relied and still rely on these subjective assessment tools [8–10].

There is an urgent need for photographic standardization to report outcomes for patients with facial palsy [11]. Furthermore, a reliable grading tool offering zonal information of the face, describing static, dynamic, and synkinesis features extracted from these photographic series is needed [12]. At best, the results should be presented to the clinician with immediate graphical outputs [13].

It is said that quantitative technology-based systems are still too complex for clinical routine, are not yet widely available, and are expensive [5]. Recently, we introduced an easy-to-use, fast, automated, and objective facial grading system based on action coding of facial expressions using standardized photographs [14]. Facial action coding is a standard method to assess facial expression in psychology but is not familiar to most physicians dealing with facial palsy. We have now refined the system so that the objective results are presented with graphical outputs in the notation of the Stennert index, the House-Brackmann scale, and the Sunnybrook grading, three facial grading systems that physicians are much more familiar with from the clinical routine treatment of patients with facial palsy.

Material and methods

The study protocol for this retrospective data analysis was approved by the institutional ethics committee.

Patients

Standardized data collection was performed in the Department of [blinded]. All photograph series of patients with unilateral FP were collected from January 2007 to December 2011. The diagnostic procedures and therapy were the same for all patients; for details, see [15].

Standard photographs of facial expression

Patients with acute facial palsy are routinely photographed using static posed facial expressions in a standardized manner. Patients were seated in a brightly lit room and instructed to focus on a digital camera. The patients were instructed to perform a facial expression as demonstrated by the professional photographer. The patients were requested to perform the expression spontaneously. Furthermore, facial expressions were not practiced prior to the photographs. The sequence of the nine static posed expressions was always constant (Supplementary material: Supplement Fig. 1): (1) at rest, (2) closing both eyes, (3) closing both eyes with maximal effort, (4) frowning, (5) wrinkling the nose, (6) lifting the corners of the mouth with closed mouth, (7) showing the teeth, (8) pursing the lips, and (9) pulling down both corners of the mouth. Hence, nine images were normally taken as a set per patient per time of assessment. All images were transferred serially from the digital camera into the electronic medical record of the patient.

Clinical grading of facial palsy

The FP was graded according to the House-Brackmann grading [16], the Sunnybrook grading [17], and the Stennert index [18]. The House-Brackmann scale is a gross six-point facial grading system (I = normal; VI = total paralysis). The Sunnybrook grading is a regionally weighted system that rates three subscores: resting symmetry, the degree of voluntary facial muscle movement, and involuntary muscle contraction (synkinesis). The three subscores are used to calculate a composite score (0 = total paralysis; 100 = normal function). The Stennert index classifies the face at rest (0–4 points; 0 = normal to 4 = complete loss of resting tone) and during motion (0–6 points; 0 = normal to 6 = no motion) separately. Both subscores are summed to give the Stennert total score. Two trained observers independently graded all photograph series. In case of disagreement, the two observers evaluated the photograph series together to reach agreement.
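The arithmetic behind the two composite scores can be illustrated with a minimal Python sketch. The Stennert sum follows directly from the description above. The Sunnybrook weighting (resting items summed and multiplied by 5, voluntary movement items summed and multiplied by 4, synkinesis items summed, composite = voluntary minus resting minus synkinesis) is not stated in this text; it is assumed here from the published Sunnybrook form [17]. The function names are hypothetical.

```python
def stennert_total(rest: int, motion: int) -> int:
    """Sum the resting (0-4) and motion (0-6) subscores into the total (0-10)."""
    if not 0 <= rest <= 4:
        raise ValueError("resting subscore must be between 0 and 4")
    if not 0 <= motion <= 6:
        raise ValueError("motion subscore must be between 0 and 6")
    return rest + motion


def sunnybrook_composite(resting_items, voluntary_items, synkinesis_items) -> int:
    """Composite score under the ASSUMED standard Sunnybrook weighting.

    resting_items:    3 items (eye 0-1, cheek 0-2, mouth 0-1), sum x 5 -> 0-20
    voluntary_items:  5 items, each 1-5, sum x 4 -> 20-100
    synkinesis_items: 5 items, each 0-3, sum -> 0-15
    """
    if len(resting_items) != 3 or len(voluntary_items) != 5 or len(synkinesis_items) != 5:
        raise ValueError("expected 3 resting, 5 voluntary and 5 synkinesis items")
    resting = 5 * sum(resting_items)
    voluntary = 4 * sum(voluntary_items)
    synkinesis = sum(synkinesis_items)
    return voluntary - resting - synkinesis


# Normal face: full movement, symmetric at rest, no synkinesis.
normal = sunnybrook_composite([0, 0, 0], [5] * 5, [0] * 5)
# Total paralysis: no movement, maximal resting asymmetry, no synkinesis.
paralysis = sunnybrook_composite([1, 2, 1], [1] * 5, [0] * 5)
```

Under this assumed weighting, the two extreme cases reproduce the anchors given above (100 = normal function, 0 = total paralysis).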

Automated facial grading using compact discriminative facial image features and a support vector machine

A novel, fast, and marker-free automated method for unilateral facial grading was developed based on support vector machines [19] and certain discriminative facial image features [20–23]. Previously, we published an active appearance model (AAM) approach for automated action coding of facial expressions in patients with FP [14]. The present method is a further development. Instead of using only the features of a trained AAM and the Action Units (AUs) predicted from it, as described in another publication by our group [21], we additionally exploited Euclidean landmark distances, with landmarks predicted by the Supervised Descent Method (SDM) [22], and features in the form of vectorized layer activations of a pre-trained Convolutional Neural Network (CNN) [23]. Briefly, the contribution is twofold. First, compared to landmark detection by AAM fitting, SDM offers numerical stability during localization, improved speed, and less sensitivity to differences between the expressions used to train the model (usually standard emotions from publicly available datasets) and the analyzed expressions under facial paralysis, which differ considerably from normal emotions. In addition to the AAM model parameters, which cover linear relationships between expressions, and the AUs, the distances between landmarks localized by SDM on the left and right sides of the face were computed. Moreover, features were extracted from layer activations of a pre-trained CNN, an approach that has shown advances in many machine learning applications. The second improvement arises from a different set of features used for the facial grading. Because the clinical gradings of facial palsy comprise different sub-indices, further investigation showed that particular feature types (of the first contribution) are better suited to predicting specific sub-indices and achieved better results. Support vector machines (SVM) were trained for each sub-index using the related features generated from the training images and the ground truth grading (i.e., the subjective grading results) for facial grading prediction.

In summary, methods were developed for generating new, powerful features, partially based on a more stable method for localizing facial landmarks that showed a significant improvement in localization accuracy for faces with FP. Moreover, the set of new features was used for facial grading prediction.
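To make the idea of left-right landmark distance features more concrete, the following Python sketch shows one possible way such features could be computed. It is an illustration only and not the implementation used in this study: the pairing of landmarks, the vertical facial midline, the mirroring step, and all function names are assumptions. Each right-side landmark is mirrored across the midline, and the Euclidean distance to its left-side counterpart is taken as an asymmetry feature.

```python
import math


def mirror_across_midline(point, midline_x):
    """Reflect an (x, y) landmark across a vertical facial midline (assumed)."""
    x, y = point
    return (2 * midline_x - x, y)


def asymmetry_features(left_landmarks, right_landmarks, midline_x):
    """Euclidean distance between each left landmark and its mirrored right partner.

    A perfectly symmetric face yields zeros; larger values indicate that the
    paired landmarks (e.g., the two corners of the mouth) are displaced
    relative to each other.
    """
    if len(left_landmarks) != len(right_landmarks):
        raise ValueError("left and right landmark lists must be paired")
    features = []
    for left, right in zip(left_landmarks, right_landmarks):
        mirrored = mirror_across_midline(right, midline_x)
        features.append(math.dist(left, mirrored))
    return features


# Hypothetical pixel coordinates: eye corner pair and mouth corner pair.
symmetric = asymmetry_features([(40, 50), (30, 80)], [(60, 50), (70, 80)], midline_x=50)
drooping = asymmetry_features([(40, 50), (30, 80)], [(60, 53), (70, 84)], midline_x=50)
```

In a pipeline of the kind described above, a feature vector of this type would be computed per photograph and passed, together with the other feature types, to the classifier trained for the respective sub-index.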

Statistical analysis

All statistical analyses were performed using IBM SPSS, version 24.0.0.0, and the SAS 9.4 procedures proc mixed and proc freq. Univariate analysis with Spearman’s correlation was used to analyze the correlation between subjective and objective gradings. Kappa statistics were used for the ordinal House-Brackmann grading and Stennert index. Furthermore, multinomial logistic regression random effects models were used to examine the association between subjective and objective gradings for these scales. The intraclass correlation coefficient (ICC) based on a linear one-way random effects model was applied to the scaled data of the Sunnybrook grading. All tests were two-tailed, and p values < 0.05 were considered significant.
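For readers who wish to reproduce this type of analysis outside SPSS or SAS, the following Python sketch illustrates two of the statistics named above: Spearman’s rank correlation (with average ranks for ties) and a one-way random effects ICC. It is a didactic sketch, not the procedure used in this study. In particular, the single-measurement form of the ICC is an assumption, because only the one-way random effects model is specified here, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def icc_one_way(ratings):
    """Single-measurement ICC from a one-way random effects ANOVA (assumed form).

    ratings: one row per photograph series, one column per grading
    (e.g., [subjective, objective]).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(ratings, means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

A call such as `spearman(subjective_scores, objective_scores)` or `icc_one_way(list(zip(subjective_scores, objective_scores)))` would then give the respective statistic for paired gradings; the variable names are placeholders.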

Results

Patients’ characteristics and standard photograph series

Table 1 gives an overview of the patients’ characteristics. Median age of the patients was 50 years with a wide range from young children (4 years) to elderly age (91 years). The genders were balanced (Females: 52%). There was no side dominance (Left side: 55%). Most patients had an idiopathic FP (59%), followed by traumatic lesions (19%), and FP due to infectious diseases (16%).

Table 1 Characteristics of patients with peripheral facial palsy (N = 233)

Overall, 508 standard photograph series, i.e., 4572 still photographs, from 233 patients were analyzed and facial gradings were performed (Supplementary material: Supplemental Table 1). The median interval between the onset of the palsy and the first photograph series was 6 days. 95% of the patients presented within 90 days after onset. 44.6%, 26.6%, and 14.2% of the patients were photographed one time, two times, and three times, respectively, during the course of FP. The median interval between the first and second photography session was 41 days. The median interval between the second and third photography session was 63 days.

Subjective and objective facial grading

All subjective and objective gradings for the three grading systems are summarized in Table 2. At first examination, the median subjective and objective House-Brackmann gradings were 3 for both assessments (range 1–6 for subjective grading and 2–4 for objective grading). At first examination, the median subjective and objective total scores of the Stennert index were 5 for both assessments (range 0–10 for subjective grading and 0–8 for objective grading). At first examination, the median subjective and objective total scores of the Sunnybrook grading were 41 and 45, respectively (range 1–96 for subjective grading and 11–91 for objective grading).

Table 2 Subjective and objective grading of the patients at first examination (N = 233)

Spearman’s correlation analysis is shown in Table 3 and Fig. 1. The correlation between the three different grading systems was moderate (r ≤ 0.5), both among the subjective gradings and among the objective gradings. There was no correlation between the subjective and objective House-Brackmann grading (r = 0.007). The correlation between the subjective and objective total score of the Stennert index was moderate (r = 0.534). The correlation was best for the Sunnybrook grading: the correlation between the subjective and objective total score of the Sunnybrook grading was good (r = 0.698). An example of serial measurements over the time course of the facial palsy is shown for one patient in Fig. 2.

Table 3 Correlation between the different subjective and objective gradings*
Fig. 1

Comparison of the subjective versus the objective grading by the three different grading systems: House-Brackmann grades (I–VI), Stennert total index (0–10), Stennert index subscores, Sunnybrook total score grading results (0–100), and Sunnybrook subscore gradings. The Sunnybrook grading showed the best prediction and correlation. X-axis: subjective grading. Y-axis: 95% confidence intervals (CI) of the prediction of the objective grading. The results of Spearman’s correlation analysis are also shown

Fig. 2

Example of a 77-year-old woman with idiopathic facial palsy. Standard photographs; each row shows one examination. The patient showed an incomplete recovery: good recovery of the resting tone with full symmetry, but defective healing of the muscle movement coordination. The graph shows the subjective (blue) and objective grading of the photographic series over the time course of the facial nerve regeneration. First examination 3 days after onset of the palsy at the top, last examination 696 days after onset at the bottom

Kappa and ICC statistics and the multinomial logistic regressions confirmed the results of the univariate correlation analysis. There was no agreement between subjective and objective House-Brackmann grading (test for symmetry 80.61, df = 15, p < 0.001; weighted kappa = −0.0105; 95% confidence interval (CI) −0.0542 to 0.0331; p = 0.6541). Based on the multinomial logistic regression analysis, there was a lower probability for higher (more severe FP) subjective compared to objective House-Brackmann gradings [odds ratio (OR) 0.573; CI 0.360–0.912; p = 0.0193]. There was also no agreement between subjective and objective total Stennert grading (test for symmetry 166.37, df = 55, p < 0.001), although the weighted kappa was nonzero (weighted kappa = 0.2670; CI 0.2154–0.3186; p < 0.0001). Here, the probability for higher scores (more severe FP) was higher for subjective compared to objective Stennert gradings (OR 1.608; CI 1.202–2.150; p = 0.0014). The best agreement was seen between subjective and objective Sunnybrook grading (ICC 0.34645; p 0.5596; Fig. 3).
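As an aid to interpreting the weighted kappa values, the following Python sketch shows how such a statistic can be computed for two ordinal gradings. It assumes linear (Cicchetti-Allison type) disagreement weights, which is the default of SAS proc freq to our knowledge; the weighting actually used is not specified in this text, and the function name is hypothetical. A value of 0 corresponds to agreement at chance level and 1 to perfect agreement.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Weighted kappa for two ordinal gradings with linear weights (assumed).

    categories must list all possible grades in ascending order,
    e.g. [1, 2, 3, 4, 5, 6] for the House-Brackmann scale.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed proportions in the k x k agreement table.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    p_observed = 0.0
    p_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1 - abs(i - j) / (k - 1)  # 1 on the diagonal, 0 at maximal distance
            p_observed += weight * observed[i][j]
            p_expected += weight * row[i] * col[j]

    # Undefined if both raters always use the same single category (p_expected == 1).
    return (p_observed - p_expected) / (1 - p_expected)


perfect = weighted_kappa([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
chance = weighted_kappa([1, 1, 2, 2], [1, 2, 1, 2], [1, 2])
```

The two toy inputs illustrate the end points of the scale: identical gradings give a kappa of 1, whereas gradings that agree only as often as expected by chance give a kappa of 0.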

Fig. 3

Bland–Altman plot showing the agreement between subjective and objective Sunnybrook grading. A fair agreement can only be seen between subjective and objective Sunnybrook grading
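The quantities underlying a Bland–Altman plot can be sketched in a few lines of Python. This is a generic illustration of the method and not the analysis script of this study; the 1.96 standard deviation limits of agreement are the conventional choice and are assumed here, and the function name and example values are hypothetical.

```python
import statistics


def bland_altman(subjective, objective):
    """Return (bias, lower limit, upper limit) of agreement for paired scores.

    bias is the mean difference (objective minus subjective); the limits are
    bias +/- 1.96 sample standard deviations of the differences.
    """
    if len(subjective) != len(objective):
        raise ValueError("scores must be paired")
    differences = [o - s for s, o in zip(subjective, objective)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)  # requires at least two pairs
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical Sunnybrook scores with a constant offset of 2 points.
bias, lower, upper = bland_altman([40, 50, 60], [42, 52, 62])
```

For plotting, each pair is drawn at the mean of the two gradings on the x-axis and their difference on the y-axis, with horizontal lines at the bias and at the two limits.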

Discussion

It has been suggested that the ideal facial nerve grading instrument should have the following characteristics [5]: it provides regional scoring of facial function, performs static and dynamic measures, examines secondary sequelae of facial palsy such as synkinesis, yields reproducible results with low interobserver and intraobserver variability, is sensitive enough to track changes over time and following interventions, and is convenient for clinical use. In the absence of a sophisticated technology-based methodology for clinical application in daily routine, members of the Sir Charles Bell Society have recommended widespread adoption of the clinician-based Sunnybrook Facial Grading Scale as the current standard in reporting outcomes of facial nerve disorders [24]. As the objective Sunnybrook grading yielded the best results, we primarily propose using the presented automated algorithm to generate a Sunnybrook grading.

The presented approach seems to fulfill many aspects of an ideal facial grading instrument [11]. It uses nine easy-to-take standardized photographs. Taking standardized photographs is still much easier to realize in clinical routine than taking a standardized video. The approach is marker-free, i.e., this source of error and variability is excluded. It does not matter how often the photographs are presented to the algorithm: an analysis of the same nine pictures always yields the same result, i.e., there is no interobserver or intraobserver variability. Furthermore, the results of the image analysis are expressed in the familiar nomenclature of classical grading systems instead of introducing new and unfamiliar parameters to characterize facial function. This was a major disadvantage of our first approach, which automatically recognized action units defined by the Facial Action Coding System (FACS) [14]. Psychologists are familiar with FACS, but neurologists and otorhinolaryngologists are not trained to use FACS to define a motor dysfunction in patients with FP.

The Facial Assessment by Computer Evaluation software (FACEgram) is a validated free software developed for accurate measurement of facial dimensions, proportions, and angles based on the analysis of video pixel data using artificial neural networks [25]. It also does not need head markers. The result is expressed as an objective House-Brackmann grading and as relative function in different facial regions compared to the healthy side. It was proposed that FACEgram might have the potential to become the universal automated facial grading system [26]. Interestingly, the groups promoting FACEgram currently focus more on spreading an electronic version of a subjective clinician-graded facial function scale [6]. The reasons might be that the FACEgram analysis is limited to patients with normal pupil position and cannot quantify static procedures.

The OSCAR system is also a marker-free system, but needs the recording of a 20-min video to calculate an objective synkinesis index [27]. Although the system was first introduced in 1994, it took until 2017 to present a further 13 patients. A relative synkinesis index is generated that expresses the amount of asymmetry induced by synkinesis on the paretic side of the face in relation to the active movements of the normal side. This is a quite unfamiliar parameter. Furthermore, the functional deficit induced by the palsy is not quantified.

The development of a new facial grading system faces the inherent problem that there is no gold standard, because all subjective systems, including the three systems applied in the present study, have a more or less limited accuracy. Putting this aside, the objective Sunnybrook grading yielded the best results. This simultaneously provides the benefit that an objective regional scoring of facial function is calculated for static and dynamic measures, that synkinesis is scored if present, and that the nomenclature used is the one already suggested for standard reporting of facial motor function in facial nerve disorders.

The use of still photographs has inherent limitations because the dysfunction of facial movement is deduced from stills. Changes over time during a facial task are not taken into account. In particular, synkinesis is ideally analyzed as synchronous movement over time. This explains the limited results when looking only at the analyses of synkinetic movements in our patient cohort. Furthermore, it might be helpful to train the algorithm on a large set of patients with more variable synkinesis. The future certainly lies in three-dimensional (3D) video analysis of the face. A 3D analysis allows a much more detailed analysis of the slightest movements in the face. 3D cameras are becoming cheaper, and first marker-based approaches have been published [28]. Currently, we are trying to adapt the presented algorithm to analyze 3D videos of patients with facial palsy. Due to the large data volumes generated by 3D videos, the analyses will be limited, at least for the next few years, to practices and hospitals with appropriate hardware. To track patients at home, simpler approaches are needed. A solution would be a smartphone-based automatic diagnosis system. A first approach that discriminates patients with facial nerve palsy from normal subjects has been published [29]. These systems have to be expanded to allow regional analyses and validated in larger cohorts of patients.