Introduction

Cone beam computed tomography (CBCT) is used for the evaluation and diagnosis of disease, and its routine use continues to increase (Anderson et al. 2014). However, the benefit of this diagnostic tool comes with a radiation dose risk to the patient. Although low dose protocols have been developed to address this matter, there is a lack of consistency among those protocols and clinical uses in current CBCT models (Nemtoi et al. 2013; Anderson et al. 2014), particularly for dose risk and image quality (Carter et al. 2008).

Visual grading experiments have been frequently used to assess image quality. In these experiments, each image is assessed by multiple observers for optimal diagnostic performance and is assigned a score reflecting the extent of image quality. An example is to evaluate the subjective image quality with visual grading analysis (VGA) (Hidalgo Rivas et al. 2015; Kadesjö et al. 2015; Liang et al. 2010; Pittayapat et al. 2013; Vandenberghe et al. 2007). To optimize the radiation dose used in the clinical setting, it is important to relate the physical image evaluation to the subjective image quality. Often, the scores from VGA are defined on a 3-, 5-, or 7-step Likert-type scale, which is a widely used psychometric scale (Likert 1932). For example, a 5-step scale could range from 1 for “Clearly not visible” to 5 for “Clearly visible”. The scores are therefore defined on an ordinal scale: they have a natural ordering, but the difference between 1 and 2 may not be the same as that between 2 and 3. This ordinal nature poses a challenge to researchers because it requires special techniques to handle.

To analyze data from visual grading experiments, the method of visual grading analysis score (VGAS) is often used; it simply calculates the average score across all criteria and observers (Kadesjö et al. 2015; Månsson 2000). The scores are then plotted against the explanatory variables or compared between different groups using statistical methods such as the t-test and analysis of variance (ANOVA). However, because the data are ordinal, any method involving calculation of means is inappropriate, as it assumes that the data are interval or ratio in nature.

To make the image analysis statistically valid, a visual grading characteristic (VGC) method was developed based on the receiver operating characteristic (ROC) method (Båth and Månsson 2007). However, VGC is limited to comparing two parameters at a time (Zarb et al. 2015; Zheng et al. 2016). When assessing the effects of more than two parameters, researchers can use the visual grading regression (VGR) method to handle ordinal data (Zarb et al. 2015; Zheng et al. 2016; Smedby and Fredrikson 2010; Smedby et al. 2013; Saffari et al. 2015; Agresti 2010). With this approach, the probability of the response variable being at or below a certain score is modeled, and the simultaneous effects (fixed and/or random) of several explanatory variables can be assessed (Hedeker and Gibbons 1994). Further, it can be easily performed with almost all modern statistical software.

When observers are asked to give only one score for each image, VGR can be applied directly, and the differences among observers can be captured by incorporating random effects (Zarb et al. 2015). In some visual grading experiments, observers are asked to give more than one score for each image, such as to assess the visibility of several (n) anatomical landmarks and the overall quality of an image. The scale of the scores for the visibility of individual anatomical landmarks (e.g., a 5-step scale) may be different from that for the overall image quality (e.g., a 3-step response representing the acceptability of the image for clinical use). In this example, there are a total of n + 1 response variables for each image (n anatomical landmark scores plus one overall image quality score), and they are statistically dependent: if a low score is given to one anatomical landmark, other anatomical landmarks are likely to receive low scores as well. In this regard, the data structure is multivariate ordinal. The ordinal regression model, which typically assumes independence among response variables, becomes questionable in this scenario. Although a multivariate ordinal regression model is more appropriate for this kind of data (Liu and Hedeker 2006), practitioners may find its results hard to understand and interpret.

In multivariate statistics, to handle the challenges brought by the dependency structure among the variables, it is common to reduce the dimension of the multivariate data. In terms of visual grading experiments, one can reduce the dimension of the multivariate ordinal data by defining an integrated image quality (IIQ) score for each image based on all scores given, which technically creates a new response variable (Hidalgo Rivas et al. 2015; Al-Humairi et al. 2016a, b). However, in previous IIQ applications, the effects of explanatory variables have not been quantified and inter-observer variabilities have not been considered (Hidalgo Rivas et al. 2015; Al-Humairi et al. 2016a, b). In this paper, a new method is proposed which combines the IIQ and VGR methods to manage the multivariate ordinal data arising from visual grading experiments. This method is named here as the integrated visual grading regression (IVGR) model.

Planning for dental implant surgery was used to investigate this new method. Dental implant surgery is often considered as an elective procedure; and updated radiographs and/or CBCT images are listed as surgical safety requirements of this procedure (Bidra 2017). While CBCT is useful “to evaluate the position of the implant and its surrounding structures, and to determine whether the implant is removed, following dental implant surgery” (Kim et al. 2020), the effective dose of dental CBCT units differs markedly (Rios et al. 2017). As recommended by the European Association of Osseointegration (EAO), the imaging technique chosen should be optimized to provide the relevant diagnostic information with the least radiation dose (Harris et al. 2012). To the best of the authors’ knowledge, there are no consensus guidelines of dose or image quality for dental implantology. Hence, the aims of this study were to (1) provide a general formulation and application of the IVGR method and (2) assess and predict possible radiation dose reduction, using a set of CBCT scan data related to anatomical landmarks for dental implantology.

Materials and methods

Application

Experimental setup

A skull phantom comprising a dry adult human skull embedded in Plexiglas-simulating soft tissue (3 M MRI CT Phantom Real Human Skull), was used. The skull was imaged with a CBCT unit (Planmeca Promax 3D Max) operated at 70 kVp and 8, 10 and 12 mA, 80 kVp and 4, 6, 8, 10, and 12 mA, and 96 kVp and 4, 6, 8, 10 and 12 mA, using a large field of view (FOV) to the maxillofacial area (Fig. 1), with 32 s scan time and a voxel size of 200 μm. The exposure parameters were pre-determined by the manufacturer; no manual adjustment was performed. The radiation dose was recorded as a dose area product (DAP), in mGy·cm2, which was extracted from the data embedded in each image. Ethics and radiation safety approval was granted by the Charles Sturt University Human Research Ethics and Radiation Safety Committees (Reference Number: 414/2013/01).

Fig. 1

Examples of cone beam computed tomography (CBCT) images and the anatomical landmarks used in this study: (A) CBCT image at 80 kVp and 10 mA, showing the maxillofacial landmarks (MC: mandibular canal, MF: mental foramen, LF: lingual foramen, RT: right-side), (B) CBCT image at 70 kVp and 12 mA, (C) CBCT image at 80 kVp and 10 mA, (D) CBCT image at 96 kVp and 10 mA

Volume datasets were stored in the Digital Imaging and Communications in Medicine (DICOM) format. Axial, parasagittal, and three-dimensional reconstructed images through the area of interest and a cross-section of the posterior mandible through the middle of the prospective implant position were selected for review. The posterior mandibular region was selected for investigation because the anatomy of this site presents a higher risk for dental implant surgery and therefore is of diagnostic importance (Froum et al. 2011). All images were independently reviewed in blinded random order by three observers who were experienced dental clinicians capable of providing surgical service for dental implantology and trained to interpret dental CBCT images for the study task. After completing a calibration training session, each observer evaluated the visibility of the anatomical landmarks and the overall image quality as per routine pre-surgical assessment of implant placement. The observers rated seven anatomical landmark questions (ALQ, Table 1) on a five-point rating scale: 1, definitely not clearly visible; 2, probably not clearly visible; 3, undecided whether visible or not; 4, probably clearly visible; and 5, definitely clearly visible. In addition, an overall image quality (OIQ) score for the pre-surgical implant placement image was recorded on a three-point rating scale (poor, acceptable and clear). A VGA method was used because it was believed that the decision on clinical adequacy or unacceptability of image quality for clinical purposes should authentically rest with the observers. The observers were allowed to record their subjective opinion regarding the visibility of the anatomical landmarks or structures relevant to the clinical indication. The absolute VGA (a score given to an image by the observer based on their experience) was adopted in the study.
The absolute VGA is a preferred method for quantification of subjective opinions, since Zarb et al. (2015) have reported that the images were comparable to each other when using the absolute VGA for CT scan optimization on head scanning.

Table 1 Anatomical landmark questions

Image analysis was based on the requirement of the image at the pre-surgical stage of implant placement in the mandible; for the purpose of this study, only the left side of the mandible was assessed. In each view up to four slices were presented so that all ALQs were able to be adequately visualized and assessed. The scores given by all observers were documented for each exposure. Further, the observers made an overall analysis of the images indicating whether the images were acceptable for the diagnostic task on the site of the pre-surgical implant placement.

Images were evaluated under routine clinical viewing conditions in the reporting setting. Observers were instructed to rest their eyes if they felt they were suffering eye strain. In addition, observers were asked to replicate their clinical work environment: to wear glasses if they normally used them, and to change the display window width and level or use magnification if they typically did so. They were also allowed to adjust the brightness and contrast as they thought appropriate. Each observer was required to assess thirteen acquisitions along with five duplicated and randomly presented slices for testing intra-observer reliability. The inter-observer reliability was measured by comparing the scores between observers. The kappa statistic was used to test the intra- and inter-observer agreement in this study.
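As an illustration of the agreement statistic, the following minimal Python sketch computes an unweighted Cohen's kappa for two sets of ratings. That the unweighted form was used is an assumption, since the text does not state whether a weighted variant was applied. The function name `cohen_kappa` and the ratings shown are hypothetical and are not study data.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long rating sequences."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating sequences must be non-empty and equally long")
    n = len(ratings_a)
    # Observed agreement: proportion of items rated identically.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the marginal category frequencies of each rater.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_exp == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical scores from two observers on six images (not study data).
obs1 = [1, 2, 3, 3, 2, 1]
obs2 = [1, 2, 3, 2, 2, 1]
kappa = cohen_kappa(obs1, obs2)
```

For intra-observer reliability, the same function could be applied to one observer's scores on the original and duplicated slices.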

Integrated visual grading regression (IVGR)

An IVGR analysis can be formulated as a three-stage protocol. In Stage 1, an integrated score based on certain criteria is defined, and each image is assigned an integrated score to represent its overall image quality. In Stage 2, an ordinal regression model is fitted with the integrated score from Stage 1 as the response variable and the explanatory variables in the study as predictors. To capture the variability among observers, it is suggested to include the observers as a random effect. In Stage 3, the effects of the explanatory variables are assessed based on the fitted model. This stage varies depending on the field of application and the aim of the experiment. To use IVGR analysis, the integrated scores must be ordinal in nature. If the integrated scores are interval or ratio data, VGAS could be used instead. The idea of IVGR is formulated in generic terms below.

Consider an experiment involving \(K\) observers, each of whom was asked to give \(q\) scores for each of \(J\) images. Denote by \({Y}_{ijk}\) the \(i\)th score for the \(j\)th image given by observer \(k\) for \(i=1,\dots ,q;\,j=1,\dots ,J;\,k=1,\dots , K.\) The scales of \({Y}_{i}\) for different \(i\) may differ. The quality of the image is hypothesized to be affected by \(p\) explanatory variables \({X}_{1},{X}_{2},\dots ,{X}_{p}\).

Stage 1: Defining an integrated score

In this stage, an integrated score \({Z}_{jk}\) is defined for image \(j\) assessed by observer \(k\) based on the \(q\) scores given. In general, \({Z}_{jk}\) is a function of \({Y}_{ijk}\) for all \(i=1,\dots , q\):

$${Z}_{jk} = f({Y}_{1jk},{Y}_{2jk},\dots,{Y}_{qjk})$$
(1)

where \(f\) is a function used to classify the images into \(T\) ordinal categories.

For simplicity, assume the categories are labeled by \(1, 2, \dots , T\). The new variable \(Z\) is univariate ordinal in nature. Thus, the dimension of the data can be reduced from \(q\) to one, bypassing the difficulties of handling multivariate ordinal data.

Stage 2: Ordinal regression

Here an outline of ordinal regression is provided. Readers are referred to statistical texts such as Agresti (2007), Powers and Xie (2008) and Kleinbaum et al. (2007) for more details.

As \({Z}_{jk}\) now is univariate ordinal in nature, usual ordinal regression approaches can be used. Common choices of the link functions include logit and probit (Agresti 2007). In the present work focus is on the logit link below as it is often found to be more intuitive and easier to interpret (Dow and Endersby 2004). In particular, one models the natural logarithm of the odds of obtaining \({Z}_{jk}\) not greater than a particular level \(c\) against \({Z}_{jk}\) greater than \(c\) using a regression equation for \(c=1, 2, \dots ,T-1\), assuming there are \(T\) levels. Putting the \(p\) explanatory variables as fixed effects and the observer random effects into the model, the ordinal regression model takes the form

$${\text{logit}}\left[P\left({Z}_{jk}\le c\right)\right]=\mathrm{ln}\left[\frac{P\left({Z}_{jk}\le c\right)}{P\left({Z}_{jk}>c\right)}\right]={\alpha }_{c}-{\sum }_{m=1}^{p}{\beta }_{m}{X}_{mj}-{\delta }_{k},\quad c=1,2,\dots ,T-1$$
(2)

where \({\alpha }_{c}\) is the threshold parameter, \({\beta }_{m}\) is the coefficient for \({X}_{m}\), often called the effect parameter, and \({\delta }_{k}\) is the random effect for observer \(k\).

If one defines \({Z}_{jk}\) on a binary scale, the ordinal regression model reduces to a logistic regression model. In Eq. (2), all \({X}_{m}\) are assumed to be continuous. If some of them are qualitative factors, indicator variables can be used accordingly. The clmm2 command from the ordinal package of R (R Core Team 2020) is capable of fitting the above model (Christensen 2019, 2015).

Stage 3: Model interpretation

With the fitted parameters and given values of the explanatory variables, one can calculate the odds ratio or the probability that \({Z}_{jk}\) is classified into a particular category. If \({\beta }_{m}>0\), \(Z\) tends to be higher at higher levels of \({X}_{m}\) when all other explanatory variables remain unchanged. In particular, \(\mathrm{exp}({\beta }_{m})\) represents the odds ratio of \(Z\) being rated in a higher category when \({X}_{m}\) increases by one unit while all other explanatory variables remain unchanged. The threshold parameters \({\alpha }_{c}\) represent the log-odds of \({Z}_{jk}\) being classified into category \(c\) or below when the image is assessed by an ‘average’ observer (so that \({\delta }_{k}=0\)) and all explanatory variables \({X}_{m}\) are equal to zero. In practice, these threshold parameters may have no meaningful interpretation when it is not sensible to have all \({X}_{m}\) equal to zero. As described in Agresti (2007), the \(\beta\)’s are usually the parameters of interest. In terms of probabilities, from Eq. (2), one can calculate the probability that \({Z}_{j}\) is greater than \(c\) for any \(c\) between 1 and \(T-1\) (inclusive) when an image \(j\) is assessed by an ‘average’ observer (so that \({\delta }_{k}=0\)) as

$$P\left({Z}_{j}>c\right)=\frac{1}{1+\mathrm{exp}[{\widehat{\alpha }}_{c}-{\sum }_{m=1}^{p}{\widehat{\beta }}_{m}{X}_{mj}]},$$
(3)

where \(\widehat{\alpha }\) and \(\widehat{\beta }\) denote the estimates of the corresponding parameters.
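The calculation in Eq. (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and all parameter values are hypothetical, and the thresholds are assumed to be supplied in increasing order.

```python
import math

def prob_exceeds(alpha_c, betas, xs, delta=0.0):
    """P(Z > c) under the model of Eq. (2); delta = 0 reproduces Eq. (3)."""
    eta = alpha_c - sum(b * x for b, x in zip(betas, xs)) - delta
    return 1.0 / (1.0 + math.exp(eta))

def category_probs(alphas, betas, xs, delta=0.0):
    """Probability of each of the T categories from T-1 ordered thresholds."""
    exceed = [prob_exceeds(a, betas, xs, delta) for a in alphas]
    # Successive differences of the exceedance probabilities give P(Z = category);
    # the padding values 1 and 0 handle the lowest and highest categories.
    bounds = [1.0] + exceed + [0.0]
    return [bounds[i] - bounds[i + 1] for i in range(len(bounds) - 1)]

# Hypothetical parameters: T = 4 categories and one explanatory variable.
alphas = [-1.0, 0.5, 2.0]
betas = [0.8]
probs = category_probs(alphas, betas, xs=[1.5])
```

Passing a non-zero `delta` gives the corresponding probabilities for a specific observer instead of the ‘average’ observer.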

Alternatively, suppose one wishes to minimize \({X}_{1}\) while maintaining a probability of at least \({p}_{0}\) (where \(0<{p}_{0}<1\)) of the integrated score being greater than \({c}_{0}\). Setting Eq. (3) equal to \({p}_{0}\) and solving for \({X}_{1}\), assuming \({\beta }_{1}>0\), the minimum of \({X}_{1}\) can be found by

$$X_1 ^{\text{min}}=\frac{\hat{\alpha}_{c_0}}{\hat{\beta}_1}-\frac{{\sum }_{m=2}^{p}{\hat{\beta }}_{m}{X}_{mj}}{\hat{\beta}_1}-\frac{1}{\hat{\beta}_1}\ln \left(\frac{1}{p_0}-1\right). $$
(4)

Equation (4) was used to find the minimum radiation dose level in the present application.
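As a hedged sketch of Eq. (4), the Python function below solves for the smallest \({X}_{1}\) and then checks the result by substituting it back into Eq. (3). The name `min_x1` and all numerical values are hypothetical; the sketch assumes \({\beta }_{1}>0\), without which Eq. (4) would not describe a minimum.

```python
import math

def min_x1(alpha_c0, betas, other_xs, p0):
    """Smallest X1 with P(Z > c0) >= p0 for an 'average' observer (Eq. 4).

    betas[0] is the coefficient of X1 and must be positive; other_xs holds
    the fixed values of X2..Xp in the same order as betas[1:].
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if betas[0] <= 0:
        raise ValueError("Eq. (4) gives a minimum only when beta_1 > 0")
    if len(other_xs) != len(betas) - 1:
        raise ValueError("other_xs must match betas[1:] in length")
    rest = sum(b * x for b, x in zip(betas[1:], other_xs))
    return (alpha_c0 - rest - math.log(1.0 / p0 - 1.0)) / betas[0]

# Hypothetical estimates for a model with two explanatory variables.
alpha_c0, betas, x2 = 0.5, [0.8, 0.3], 2.0
x1_min = min_x1(alpha_c0, betas, [x2], p0=0.75)

# Substituting x1_min back into Eq. (3) should recover the target probability.
eta = alpha_c0 - betas[0] * x1_min - betas[1] * x2
p_check = 1.0 / (1.0 + math.exp(eta))
```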

Data analysis

As above, denote by \({Y}_{ijk}\) the score of the \(i\)th question of the \(j\)th image assessed by observer \(k\). Here, \({Y}_{1}\) to \({Y}_{7}\) represent the scores for the seven ALQs, and \({Y}_{8}\) the OIQ score. Of the eight scores, the last one, \({Y}_{8}\), is perhaps the most important and has to be treated differently. It is natural to assume that an image should be at least acceptable for clinical use. Therefore, \({Y}_{8}\ge 2\) is required. The integrated image quality for image \(j\) assessed by observer \(k\) on a 4-step scale was defined as follows:

$${Z}_{jk}=\mathrm{max}\left\{0,{\sum }_{i=1}^{7}1\left\{{Y}_{ijk}\ge 4\right\}-4\right\}\times 1\left\{{Y}_{8jk}\ge 2\right\}$$
(5)

where \(1\{A\}\) is the indicator function which takes a value of 1 if the condition \(A\) is satisfied; and a value of 0 otherwise.

It can be easily verified that \({Z}_{jk}\) takes a value from the set \(\{0,1,2,3\}\). If the image is not acceptable for clinical use (\({Y}_{8jk}<2\)) and/or fewer than five of the seven anatomical landmarks scored “4” or above, then \({Z}_{jk}=0\), representing a poor integrated image quality. If the image is acceptable for clinical use (\({Y}_{8jk}\ge 2\)), \({Z}_{jk}\) depends on the number of anatomical landmarks scored “4” or above. Naturally, the more anatomical landmarks scored “4” or above, the better the image quality reflected by \({Z}_{jk}\). Overall, one could interpret the image quality as “poor” if \({Z}_{jk}=0\); “acceptable” if \({Z}_{jk}=1\); “good” if \({Z}_{jk}=2\); and “excellent” if \({Z}_{jk}=3\).
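The scoring rule of Eq. (5) can be written as a short Python function. This is a minimal sketch: the name `integrated_score` is hypothetical, the OIQ scale is assumed to be coded 1 = poor, 2 = acceptable and 3 = clear, and the example ratings are invented rather than taken from the study.

```python
def integrated_score(alq_scores, oiq_score):
    """Integrated image quality Z of Eq. (5), on the 0-3 scale.

    alq_scores: the seven landmark scores Y1..Y7 (each 1-5).
    oiq_score:  the overall image quality score Y8 (assumed coding 1-3).
    """
    if len(alq_scores) != 7:
        raise ValueError("expected seven anatomical landmark scores")
    n_visible = sum(1 for y in alq_scores if y >= 4)  # landmarks scored 4 or 5
    acceptable = 1 if oiq_score >= 2 else 0           # indicator 1{Y8 >= 2}
    return max(0, n_visible - 4) * acceptable

labels = {0: "poor", 1: "acceptable", 2: "good", 3: "excellent"}

# Hypothetical ratings for one image by one observer (not study data):
# five landmarks are scored 4 or above and the OIQ is "acceptable".
z = integrated_score([5, 4, 4, 3, 4, 4, 2], oiq_score=2)
quality = labels[z]
```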

In general, the concept of an integrated score is flexible in the sense that Eq. (5) can be modified easily to cater to different needs in various applications. As given in Eq. (1), any sensible choice of function \(f\) could be used. It is possible to include more or fewer ranks, as well as to make the criteria more or less stringent. However, caution must be taken, especially if one wishes to make the criteria less stringent. In medical studies, it is suggested to define rules which are tighter rather than looser.

As no other explanatory variables were considered, the level of dose is the sole explanatory variable, denoted by \(X\). A natural logarithm transformation was applied to \(X\), as suggested by Smedby et al. (2013). With observers considered as random effects, the ordinal regression model takes the form

$${\text{logit}}\left[P\left({Z}_{jk}\le c\right)\right]={\alpha }_{c}-\beta \mathrm{ln}\left({X}_{j}\right)-{\delta }_{k}, c=0, 1, 2.$$
(6)

A probability of at least 75% is required for an image to be classified as at least acceptable (Jones et al. 2015; Favazza et al. 2015; Schaefferkoetter et al. 2015; Prasad et al. 2002), namely \(P\left({Z}_{jk}>0\right)=P\left({Z}_{jk}\ge 1\right)\ge 0.75\), when the image is assessed by an ‘average’ observer. Following Eq. (4), the minimum level of dose required can then be calculated as

$${X}_{min}=\mathrm{exp}\left[\frac{{\widehat{\alpha }}_{0}}{\widehat{\beta }}-\frac{1}{\widehat{\beta }}\mathrm{ln}\left(\frac{1}{0.75}-1\right)\right]$$
(7)

If another probability threshold (e.g., 50%) is used, one can replace 0.75 in Eq. (7) with the corresponding value.
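To illustrate how the probability threshold enters Eq. (7), the following Python sketch evaluates the minimum dose at two thresholds. The values of `alpha0` and `beta` are hypothetical placeholders chosen only for illustration and are not the fitted estimates; the function names are likewise assumptions.

```python
import math

def min_dose(alpha0, beta, p0=0.75):
    """Minimum dose from Eq. (7) for an 'average' observer; needs beta > 0."""
    if beta <= 0 or not 0.0 < p0 < 1.0:
        raise ValueError("need beta > 0 and 0 < p0 < 1")
    return math.exp((alpha0 - math.log(1.0 / p0 - 1.0)) / beta)

def prob_at_least_acceptable(dose, alpha0, beta):
    """P(Z >= 1) at a given dose, from Eq. (6) with the observer effect at 0."""
    return 1.0 / (1.0 + math.exp(alpha0 - beta * math.log(dose)))

# Hypothetical estimates (not the values fitted in this study).
alpha0, beta = 6.0, 1.0
dose_75 = min_dose(alpha0, beta, p0=0.75)
dose_50 = min_dose(alpha0, beta, p0=0.50)
```

Under Eq. (7) the ratio of the two doses is \(\mathrm{exp}[\mathrm{ln}(3)/\beta ]\), which depends only on \(\beta\) and not on the threshold parameter, so lowering the required probability always lowers the predicted dose when \(\beta >0\).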

Results

As described earlier, the main purpose of the paper is to apply IVGR to CBCT and investigate the potential for dose reduction while maintaining acceptable image quality. The exposure parameters and observers’ scores are presented in Table 2, and the integrated scores are plotted against the logarithm of dose level for each observer in Fig. 2. In general, a higher dose level led to a higher integrated score even though low scores were occasionally given by Observer 1 for high dose levels and high scores were sometimes given by Observer 3 for low dose settings. The kappa value of the intra-observer reliability for each observer is displayed in Table 3. These values indicate moderate to substantial agreement (Landis and Koch 1977). The kappa values of the inter-observer reliability ranged from 0.261 to 0.468 (Table 3). These values indicate fair to moderate agreement (Landis and Koch 1977). As visual grading is a subjective task, it is natural to see a lower inter-observer reliability (Lee et al. 2019) compared to intra-observer reliability. The between-observer variabilities were captured as random effects in the model.

Table 2 Exposure parameters and observer scores (N: the number of anatomical landmarks being scored as 4—probably it is clearly visible, or 5—definitely it is clearly visible; OIQ: the overall image quality score based on the three-point rating scale)
Fig. 2

Plot of integrated scores against logarithm of dose level by observers

Table 3 Kappa values demonstrating the intra-observer (diagonal) and inter-observer (off-diagonal) agreements

Table 4 shows the estimated parameters for the ordinal regression model described by Eq. (6). From the p-values, both \({\alpha }_{2}\) and \(\beta\) are significant at the 5% level. Since \(\mathrm{ln}(X)=0\) corresponds to a dose of 1 mGy·cm2, which is not a realistic setting, the estimated threshold parameters have no meaningful interpretation. However, they were used to derive the probability of an image having a particular integrated score. The coefficient of \(\mathrm{ln}(X)\) is positive, indicating that a higher level of dose increased the image quality; that is, the image is more likely to be classified in higher categories. Specifically, when \(\mathrm{ln}(X)\) is increased by 1 (that is, when the level of dose is multiplied by \(\mathrm{exp}\left(1\right)=2.718\)), the odds ratio of \({Z}_{j}\) being rated in a higher category is \(\mathrm{exp}\left(1.371\right)=3.940\) (95% confidence interval (CI): 1.016–15.280).
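The odds ratio quoted above can be restated for other changes in dose, because multiplying the dose by a factor adds the logarithm of that factor to \(\mathrm{ln}(X)\). The Python sketch below uses the reported coefficient of 1.371; the function name is hypothetical, and the doubling case is an arithmetic consequence of the model rather than a result reported by the authors.

```python
import math

beta = 1.371  # coefficient of ln(dose) as reported in the text

def odds_ratio_for_dose_factor(beta, factor):
    """Odds ratio of a higher category when the dose is multiplied by `factor`."""
    if factor <= 0:
        raise ValueError("dose factor must be positive")
    # Multiplying the dose by `factor` adds ln(factor) to ln(X).
    return math.exp(beta * math.log(factor))

or_e_fold = odds_ratio_for_dose_factor(beta, math.e)  # ln(X) increases by 1
or_doubling = odds_ratio_for_dose_factor(beta, 2.0)   # dose is doubled
print(f"e-fold: {or_e_fold:.3f}, doubling: {or_doubling:.3f}")
```

The e-fold value agrees with the reported odds ratio of about 3.94 up to rounding of the coefficient, while doubling the dose corresponds to an odds ratio of roughly 2.6 under the same model.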

Table 4 Result of ordinal regression analysis

Figure 3 shows the plot of \(P\left({Z}_{jk}\ge 1\right)\) against the dose level. The minimum dose level required to achieve a probability of 75% for an image being classified as at least acceptable, when assessed by an ‘average’ observer (that is, when \({\delta }_{k}=0\)) is 1,346.91 mGy·cm2 DAP, a 31% reduction compared to 1,962 mGy·cm2 DAP, which is the default dosage of the CBCT unit used (Al-Humairi et al. 2016a).

Fig. 3

Plot of probability of \({Z}_{jk}\) being greater than or equal to 1 against the dose level; DAP dose area product

The estimated random effects can also be extracted from the model. Figure 4 shows the estimated modes and the 95% CIs for each of the observers. Among the observers, Observer 1 tends to give the lowest rating while Observer 3 tends to give the highest rating. Again, such a discrepancy is not unexpected as each person perceives visibility differently.

Fig. 4

Observer effects given by 95% confidence intervals based on the estimated variances of random effects

Discussion

Integrated visual grading regression (IVGR)

This study has reported for the first time the use of a statistically feasible IVGR method to analyze the multivariate ordinal data of subjective image quality in the context of CBCT clinical pre-surgical planning for dental implant placement. The relevance of human perception and cognition was highlighted by this work. Even though some researchers have assessed subjective image quality in CBCT dental implantology imaging, none of them has applied IVGR to manage the ordinal data obtained from the observers’ grading scores (Lofthag-Hansen et al. 2011; Dawood et al. 2012; Alawaji et al. 2018; Park et al. 2019; Shelley et al. 2011). Previous studies have had different focuses such as the effects of exposure parameters (Lofthag-Hansen et al. 2011; Dawood et al. 2012; Alawaji et al. 2018; Park et al. 2019), FOVs (Lofthag-Hansen et al. 2011), and imaging systems (Shelley et al. 2011) on subjective image quality. Although higher inter-observer agreements have been reported by Shelley et al. (2011) and Park et al. (2019), their papers and Lofthag-Hansen et al. (2011) also presented inconsistent intra-observer agreements among the observers, similar to the present data. This study also suggested a 31% reduction of the CBCT radiation dose from the manufacturer’s recommendation, and this agreed with Alawaji et al. (2018), who considered the possibility of reducing dose by 30%. Dawood et al. (2012) estimated a dose reduction of up to 87.5% from the default setting, even though the utilization of three-dimensional reconstruction would be compromised. While the authors of the present paper acknowledge the contribution of the earlier papers on optimization of radiation dose in dental implantology imaging, the present study has added the value of the IVGR method to this field.

In the present model, the dose value of 1,346.91 mGy·cm2 DAP was predicted under the assumption of a 75% probability for an image being classified as at least acceptable for diagnosis. In clinical ROC studies, a value of Az = 0.75 is generally accepted as a common value and anything above (Az > 0.75) is considered as superior (Metz 1989). In psychophysics studies, a 50% probability of detection is generally considered as the threshold (Krantz 2012). The predicted dose value would be much lower than the DAP value of 1,346.91 mGy·cm2 if the threshold probability was set at 50% in the present model. The predicted dose value is therefore not a threshold dose but an indicative dose value that is acceptable for clinical practice. Acceptable image quality, which supports clear identification of anatomical structures as well as the morphology, dimension and quality of the bone, is required for development of an acceptable image quality protocol. Development of such a protocol has further potential to promote dose reduction (SEDENTEXCT 2012). This is a reason why the image quality of selected anatomical landmarks as well as trabecular and cortical bone were evaluated in the present study. An earlier paper has also reported the influence of optimization protocols on the associated image quality of cortical and trabecular patterns (Koizumi et al. 2010). In general, image quality is assessed with established criteria for the visibility of key landmarks (Attard et al. 2018). While objective methods are repeatable, defining the image quality is clinical task based (Barrett et al. 2015). Owing to a gap between subjective and objective assessment methods for image quality analysis (Hidalgo-Rivas et al. 2014), visibility of anatomical landmarks alone cannot be considered as an adequate performance indicator (Zanca et al. 2012).
The present study aimed to explore potential for implementation of an IVGR assessment method on maxillofacial CBCT images but not to suggest this as a superior substitute for objective assessment methods.

Statistical considerations

The key to the success of visual grading experiments is defining an integrated score. In general, clinical image quality is criteria based, and there is considerable known inter- and intra-observer variability in VGA for specific criteria of a specific image, which is the main challenge in employing VGA in quantifying clinical image quality. Undoubtedly, ordinal regression should be employed in VGA for ordinal data. It is common to incorporate random effects into the model to capture the heterogeneity between the observers, especially when large variability exists (Drikvandi and Noorian 2019).

Smedby and Fredrikson (2010) stated that it is not statistically acceptable to apply common statistical methods relying on least-squares estimates, such as the t-test and ANOVA, to ordinal data. VGA is an easy and straightforward approach, but the statistical analysis of the scoring data has some limitations and there is a lack of consistency in the choice of methods. The scoring data are usually non-linear numerical values and consequently do not fit parametric statistical methods such as ANOVA. To address the issue, previous researchers have proposed non-parametric visual grading (Båth and Månsson 2007) and VGR (Zarb et al. 2015; Zheng et al. 2016; Smedby and Fredrikson 2010; Smedby et al. 2013; Saffari et al. 2015), which provide easier and more practical applications. The methodology developed in the present study is in agreement with VGR and incorporates the concept of IIQ by adding a clinical question (an OIQ that evaluated whether the overall image quality was acceptable for a pre-surgical assessment of a dental implant). VGR methodology can be used for multivariate ordinal data, such as the various anatomical landmark scores and the overall score in this study. The difficulty of using general VGR in the present study is that the individual landmark scores and the overall score are generally not independent of each other. The IIQ was thus used to select the critical components of the multivariate scores to form a single integrated score for the VGR.

Image quality

Linear, logarithmic, or logistic (non-linear) functions have all been reported in the literature for the relationship between diagnostic image quality and dose. The logistic function can be considered as the unifying function for all of them (Zheng 2017). Psychometric factors affect the evaluation of image quality because image quality involves the interaction between human attitude and image detail. The observer will score the image attributes in relation to their agreement about whether they are clearly visualized. This is called the self-efficacy theory, which was reinforced and linked to image quality as described by Mraity et al. (2014). The present approach of validating the psychometric scale of image quality and developing an image quality method that answers the principal clinical questions agrees with this theory. The psychometric approach also underlies the other application of the proposed method, namely to address the disagreement between observers while answering the clinical questions. Apart from dose levels, the proposed approach is flexible in the sense that other psychometric factors such as decision thresholds can be included in the model as extra explanatory variables, provided that they were recorded during the experiments.

The optimization methods in dental radiology focus on providing an image that fits the clinical purpose adequately while minimizing radiation exposure to the patient. An image quality index was defined using a descriptor in an ordinal scale based on subjective evaluation of the visual data contained within the image. Therefore, it is widely agreed that the term adequate or acceptable image quality indicates a satisfactory answer to the primary clinical question (Månsson 2000; Båth and Månsson 2007).

Limitations and future directions

A limitation of this study is the small number of observers used. Although it would be ideal to determine the minimum number of observers required for dose optimization, practical methods for estimating the minimum sample size for general ordinal regression problems with random effects are yet to be established in the statistics literature. Based on simulations, Ali et al. (2016) recommended using a minimum of 50 groups (observers in the present application) to achieve a power of 80% or above, and Bauer and Sterba (2011) suggested that the maximum likelihood estimators were least biased for data with at least 100 groups. However, recruiting such a large number of observers for visual grading experiments is not practical. Moreover, it was challenging to recruit fully registered specialists to participate in the current study because of the significant time required for observation and scoring. As a recent study used only three observers (Almashraqi et al. 2017), and adequate pre-training was provided to the observers, the small number is considered acceptable for this research purpose. Intra- and inter-observer agreements reported in this study were also consistent with those of previous studies (Hidalgo Rivas et al. 2015; Heetveld et al. 2005). Although the present observers were experienced clinicians specializing in dental implantology and completed a calibration training session prior to participation, more extensive training may improve agreement.

Further limitations of this study are that the experiment was conducted on a single device, with a single phantom and with a large FOV. The use of a large FOV of the maxillofacial area replicates some clinical scenarios in which an evaluation of several edentulous areas for implant placement is indicated, in preference to multiple radiation exposures. As smaller FOVs are more often used in dental implantology, these should also be considered in future studies.

The quality of a radiographic image is essentially determined by the observers’ opinions (Sund et al. 2004), which are based on their individual experience and other technical parameters. A previous study emphasized observer perception and cognition as a relevant factor in image quality assessment (Kundel 2015). Image perception is considered a unified realization of the contents of the image signal (displayed image), and cognition is the ability to explain the connotation of the displayed images in the context of medical scenarios. The psychological (human visualization and perception) and physical (anatomical landmark) elements combine to inform the evaluation of the image (Rossmann and Wiley 1970). As making a clinical judgment is a complex decision-making process, superior resolution of anatomical and physical landmarks on the displayed image can influence observer variability by focusing the observer on certain structures within the image (Thornbury et al. 1978). As shown in Fig. 2, a higher dose level may not necessarily result in a higher integrated score. On the one hand, this may be due to the subjective nature of the evaluation tasks, that is, the observers’ personal preferences. On the other hand, it may indicate that the dose level is not the sole explanatory variable. While the effect of dose was found to be statistically significant, the small number of observers led to a relatively large standard error, causing wide confidence intervals for the effect and threshold parameters. Thus, to validate the dose level proposed in this study, a larger-scale investigation using more observers and additional explanatory variables, such as extra anatomical landmarks and pathologies, is indicated.

As medical imaging is an essential tool used in the diagnosis and treatment planning of various health conditions (Sakata-Goto et al. 2012; Spuur 2019; Tanny et al. 2018), IVGR may be useful in creating optimization protocols to further benefit the safety of patients by establishing minimum acceptable dose levels for diagnostic imaging and reporting in other medical imaging modalities. Future investigations including the fields of orthopedics, mammography, traumatology and orthodontics are indicated.

Conclusion

This study has reported a preliminary and achievable application of IVGR in CBCT dental implantology imaging. Within the limitations of this study, the authors have highlighted the difficulties with the statistical analysis commonly applied to visual grading scores. Part of the conclusion therefore concerns the feasibility of the derived IVGR method. With an estimated dose reduction of 31%, this study has also demonstrated that IVGR can be a valuable method for dose minimization, which may be used in the future to predict optimization methods for specific clinical tasks and to develop low dose protocols. This conclusion is pertinent for clinicians and researchers, as it highlights the need to underpin research methodology with carefully controlled experiments for the potential reduction of radiation dose. Further investigations with more human observers are indicated to validate the IVGR model in other clinical applications, including conventional CT and planar radiographic imaging.