1 Introduction

Recent technological advances, including new imaging modalities as well as storage, sharing and computing resources, have facilitated the collection of very large amounts of three-dimensional medical data [1]. In this scenario, shape and texture analysis of such data has been receiving increasing attention during the last few years. The overall objective is that of extracting quantitative parameters (biomarkers) from the imaging data capable of correlating with clinical features such as disease phenotype and/or survival. The whole process, usually referred to as radiomics, can be regarded as an improvement on the traditional practice wherein medical images were mostly used as pictures for qualitative visual interpretation only [2, 3]. In the management of oncologic disorders, for instance, a number of studies have supported the use of radiomics for a variety of tasks, including prediction of outcome [4, 5] and response to treatment [6, 7]; discrimination between benign, malignant, primary and metastatic lesions [8,9,10]; and classification of histologic subtypes [11].

Radiomics, however, is still a young discipline and definitely far from mature. There are significant obstacles that prevent its application on a large scale – chief among them the lack of large enough datasets for building models and classifiers, and the absence of standards establishing how the biomarkers should be computed [12]. The objective of this paper is to provide an overview of the steps involved, discuss the open issues and indicate directions for future research. A significant part of the paper deals with the description of the most common texture and shape features used in the literature.

2 Methods

The flow-chart of Fig. 1 summarises the overall workflow in radiomics. Image acquisition is always the first step and can optionally be followed by a pre-processing phase. Segmentation is then required to separate the region of interest (ROI) from the background. Feature extraction is the core of the procedure and consists of extracting a set of meaningful parameters (features) from the ROI. The features can undergo some post-processing step, for instance selection and/or reduction. Finally, the resulting data are fed to some classifier or regression model suitable for the required task.

Fig. 1. The overall pipeline. Dashed lines indicate optional steps.

2.1 Image Acquisition

There are three main classes of medical imaging modalities providing three-dimensional data [13]: Computed Tomography (CT), Positron Emission Tomography (PET) and Magnetic Resonance Imaging (MRI).

Computed Tomography is based on the differential absorption of X-rays by different tissue types; the signal is therefore proportional to the tissue density [14]. Positron Emission Tomography estimates the metabolic activity of the tissue by measuring the radioactive decay of specific radio-tracers. Those used in PET contain isotopes (e.g. \(^{11}C\), \(^{15}O\) and \(^{18}F\)) which emit positrons through \(\beta {+}\) decay. The positrons collide and annihilate with the electrons in the tissue, thereby emitting two \(\gamma \) rays 180\(^{\circ }\) apart that are detected by the sensors [15]. Finally, in Magnetic Resonance Imaging the signal comes from protons (hydrogen nuclei) contained in water and lipids. The signal in this case is proportional to the relaxation time – i.e. the time to return to the equilibrium magnetisation state once the external magnetisation field is switched off [16].

In all the imaging modalities the scanning usually proceeds axially (head to feet), producing a variable number of axial cross-sections of fixed size (slices). A three-dimensional voxel model is eventually reconstructed by stacking the slices.

2.2 Pre-processing

Pre-processing usually involves one or more of the following operations: (1) windowing (rescaling), (2) filtering and (3) resampling. Although frequently overlooked, pre-processing is a fundamental step in the pipeline with significant effects on the overall results, as for instance shown in [17, 18].

Windowing consists of applying an upper and a lower threshold to the raw intensity values returned by the scanner, thereby excluding from the analysis those values that fall outside the range. In CT, for instance, windowing is routinely used to exclude from the analysis those anatomic parts (e.g. bones) that are considered not relevant to the disease investigated.
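As an illustration, the sketch below clips a CT volume to a soft-tissue-like window; it assumes the scan is available as a NumPy array of Hounsfield units, and the function name and window bounds are purely illustrative, not a clinical standard.

```python
import numpy as np

def window_ct(volume, lower=-100, upper=400):
    """Clip a CT volume (in Hounsfield units) to a given intensity window.

    Voxels outside [lower, upper] are saturated at the window bounds.
    """
    return np.clip(volume, lower, upper)

# Example with a random volume standing in for a CT scan
ct = np.random.randint(-1024, 3071, size=(64, 64, 64)).astype(np.int16)
ct_windowed = window_ct(ct, lower=-100, upper=400)
```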

Filtering can be carried out to reduce noise and/or to highlight features at different spatial scales. A variety of methods can be used for this purpose, for instance Butterworth smoothing [18], Gaussian [19] and Laplacian of Gaussian [7, 20] filters.
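A minimal sketch of scale-dependent filtering with SciPy follows; the scan is assumed to be a NumPy array and the chosen sigma is illustrative only.

```python
import numpy as np
from scipy import ndimage

# Hypothetical volume standing in for a scan
volume = np.random.rand(64, 64, 64)

# Gaussian smoothing at a chosen spatial scale (sigma in voxels)
smoothed = ndimage.gaussian_filter(volume, sigma=1.5)

# Laplacian of Gaussian to enhance blob-like structures at the same scale
log_response = ndimage.gaussian_laplace(volume, sigma=1.5)
```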

Resampling involves changing the number of bits (bit depth) used for encoding the intensity values. The bit depth of the raw data depends on the scanning device and the settings used (12 and 16 bits are standard values in practice). These are usually reduced to lower values (downsampling) before feature extraction: eight, six and four bits are common choices [6, 18, 21, 22].
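A simple uniform (min–max) requantisation could look like the sketch below; the function name and the choice of 64 levels (6 bit) are illustrative, and other binning schemes (fixed bin width, fixed bounds) are also used in practice.

```python
import numpy as np

def quantise(volume, n_levels=64):
    """Requantise intensity values to n_levels grey levels (e.g. 64 = 6 bit)
    using uniform min-max binning."""
    v_min, v_max = volume.min(), volume.max()
    scaled = (volume - v_min) / (v_max - v_min + np.finfo(float).eps)
    return np.floor(scaled * n_levels).clip(0, n_levels - 1).astype(np.int32)
```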

2.3 Segmentation

The objective of segmentation is to identify the part of the scan (ROI) that is considered relevant to the analysis. A ROI usually represents a clinically relevant region, for instance a potentially cancerous lesion (Fig. 2). Segmentation is a crucial step, for it provides the input to the subsequent phases. Unfortunately, it is also a tedious and time-consuming procedure. Although a number of methods have been investigated for automating the process – these include, among others, thresholding [23, 24], region growing [23,24,25], edge detection [23, 24] and convolutional networks [26, 27] – segmentation remains by and large a manual procedure in which the experience and sensitivity of the physician play a major role. Besides, the decision whether to include or exclude dubious areas such as necrosis, atelectasis, inflammation and/or oedema is essentially the clinician’s responsibility and, as such, hard to automate.
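Purely as an illustration of the automatic end of the spectrum, the sketch below proposes a ROI by thresholding and keeping the largest connected component; the threshold value and function name are hypothetical, and this is not a substitute for the manual or semi-automatic delineation described above.

```python
import numpy as np
from scipy import ndimage

def threshold_roi(volume, threshold):
    """Rough ROI proposal: threshold the volume and keep the largest
    connected component. Real lesion segmentation is far more involved
    and usually requires manual refinement."""
    mask = volume > threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask  # nothing above threshold
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    largest = np.argmax(sizes) + 1
    return labels == largest
```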

Fig. 2. CT (left) and PET (right) scans of a lung lesion with manually delineated ROI.

2.4 Feature Extraction

Feature extraction can be considered the ‘core’ of the whole procedure and consists of computing meaningful parameters from the regions of interest. There are two main strategies for feature extraction: the ‘hand-designed’ or ‘hand-crafted’ paradigm on the one hand, and Deep Learning on the other.

In the hand-designed approach the functions for feature extraction (also referred to as image descriptors) are mostly designed by hand, the design process being based on some prior knowledge about filtering, perceptual models and/or relatively intuitive visual properties (e.g. coarseness, busyness, contrast, etc.). This model-driven, ‘a priori’ paradigm is independent of the data to analyse. By contrast, Deep Learning is a data-driven, ‘a posteriori’ strategy in which the descriptors are essentially shaped by the data. The feature extractors, in this case, are based on sets of combinable blocks (layers) of which only the overall skeleton is defined a priori; their behaviour depends on a large number of free parameters whose values need to be determined by training over huge sets of data. In this paper we are mostly concerned with the hand-designed paradigm; for an overview of Deep Learning and its potential applications in the field we refer the reader to Refs. [28,29,30].

Regardless of the method used, there are some desirable properties that one would always expect from features. First, they should be discriminative, i.e. they should enable good separation among the classes involved in the problem investigated (e.g. benign vs. malignant classification). Second, they should be interpretable on the basis of some physical characteristics (e.g. round/elongated, coarse/fine, etc.). Third, they should be few: this, again, facilitates interpretation, limits the computational overhead and reduces the chances of overfitting. Below we briefly review some of the most common shape and texture features used in radiomics.

Shape Features. Shape features have been investigated as potential biomarkers for a range of diseases. In oncologic disorders, for instance, lesions presenting ill-defined (‘spiculated’) borders are considered suggestive of malignancy, aggressiveness and, in general, worse prognosis; whereas those with regular, well-defined margins are more frequently indicative of benign or less aggressive lesions [31,32,33]. For a quantitative evaluation of shape, different parameters have been proposed – among them compactness, spherical disproportion, sphericity and surface-to-volume ratio (Eqs. 1–4). In formulas, denoting by A the surface area of the ROI, by V its volume and by R the radius of a sphere with volume V, we have:

$$\begin{aligned} \text {compactness} = \frac{V}{\sqrt{\pi }A^{3/2}} \end{aligned}$$
(1)
$$\begin{aligned} \text {spherical dispr.} = \frac{A}{4\pi R^2} \end{aligned}$$
(2)
$$\begin{aligned} \text {sphericity} = \frac{\pi ^{1/3} \left( 6V \right) ^{2/3}}{A} \end{aligned}$$
(3)
$$\begin{aligned} \text {surface-to-vol. ratio} = \frac{A}{V} \end{aligned}$$
(4)

Compactness, spherical disproportion and surface-to-volume ratio from CT, for instance, were found predictive of malignancy in lung lesions [34]; surface-to-volume ratio from MRI showed potential to differentiate between clinically significant and non-significant prostate cancer [31]; and functional sphericity from PET images correlated with clinical outcome in non-small-cell lung cancer [32].
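Given estimates of the surface area A and volume V of a ROI (obtained, for example, from a triangulated mesh of the segmented lesion – a step not shown here), Eqs. 1–4 translate directly into code. The sketch below is a minimal illustration and the function name is ours.

```python
import numpy as np

def shape_features(surface_area, volume):
    """Shape descriptors of Eqs. 1-4 from the ROI surface area A and volume V."""
    A, V = surface_area, volume
    R = (3.0 * V / (4.0 * np.pi)) ** (1.0 / 3.0)  # radius of sphere with volume V
    return {
        'compactness': V / (np.sqrt(np.pi) * A ** 1.5),
        'spherical_disproportion': A / (4.0 * np.pi * R ** 2),
        'sphericity': (np.pi ** (1.0 / 3.0)) * (6.0 * V) ** (2.0 / 3.0) / A,
        'surface_to_volume_ratio': A / V,
    }

# Sanity check: a perfect sphere should give sphericity == 1
r = 10.0
print(shape_features(4 * np.pi * r ** 2, 4.0 / 3.0 * np.pi * r ** 3))
```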

Texture Features

Basic Statistics. These are parameters that can be computed directly from the raw data with no further processing. Resampling is not required. They include: mean, maximum, median, range, standard deviation, skewness and kurtosis (for definitions and formulae see also [35]). All these features are by definition invariant to geometric transformations of the input data such as rotation, mirroring, scaling and/or voxel permutation. Most of these features are also rather intuitive and their implementation straightforward.
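A minimal sketch of these first-order statistics, assuming the ROI intensities are available as a flat NumPy array (the function name is illustrative):

```python
import numpy as np
from scipy import stats

def basic_statistics(roi_values):
    """First-order statistics of the raw intensities inside the ROI."""
    return {
        'mean': float(np.mean(roi_values)),
        'median': float(np.median(roi_values)),
        'maximum': float(np.max(roi_values)),
        'range': float(np.ptp(roi_values)),
        'std': float(np.std(roi_values)),
        'skewness': float(stats.skew(roi_values)),
        'kurtosis': float(stats.kurtosis(roi_values)),
    }
```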

Histogram-Based Features. These features are derived from the probability distribution (histogram) of the intensity levels within the ROI. Features like energy (Eq. 5 – sometimes also referred to as uniformity) and entropy (Eq. 6) are routinely used for assessing the ‘heterogeneity’ of tumour lesions. There is indeed evidence that higher heterogeneity may correlate with worse overall prognosis and response to treatment [36,37,38,39]. Histogram-based statistics are invariant to geometric transformations of the input data – just as basic statistics are – but they heavily depend on the resampling scheme used. In formulas, denoting by N the number of quantisation levels and by p(i) the probability of occurrence of the i-th intensity level, we have:

$$\begin{aligned} \text {energy}_\text {H} = \sum _{i=0}^{N-1} \left[ p \left( i \right) \right] ^2 \end{aligned}$$
(5)
$$\begin{aligned} \text {entropy}_\text {H} = - \sum _{i=0}^{N-1} p \left( i \right) \text {log}_2 \left[ p \left( i \right) \right] \end{aligned}$$
(6)

where entropy is expressed in bits in this case. Subscript ‘H’ is used to indicate that the features are computed from histograms and to differentiate them from those computed from co-occurrence matrices (see below).
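Assuming the ROI has already been requantised to N grey levels (see Sect. 2.2), Eqs. 5 and 6 can be computed as in the following sketch; the function name is illustrative and zero-probability bins are skipped to avoid log(0).

```python
import numpy as np

def histogram_features(quantised_roi, n_levels):
    """Energy (Eq. 5) and entropy (Eq. 6) of the grey-level histogram.

    quantised_roi holds integer levels in [0, n_levels - 1]."""
    counts = np.bincount(quantised_roi.ravel(), minlength=n_levels)
    p = counts / counts.sum()
    nonzero = p[p > 0]                      # avoid log(0)
    energy = np.sum(p ** 2)
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return energy, entropy
```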

Grey-Level Co-occurrence Matrices. Co-occurrence matrices (GLCM) represent the two-dimensional joint distribution of the intensity levels between pairs of voxels separated by a given displacement vector. By changing the orientation and the length of the vector, GLCM can probe the local signal variation at different scales and orientations. Co-occurrence matrices, a classic tool in texture analysis, were originally designed for planar images [40], but their extension to three-dimensional data is straightforward [41]. In this case there are 26 possible orientations for a given scale and as many GLCM, of which, however, only 13 are non-redundant. A GLCM with values mainly clustered around the main diagonal indicates a texture with low variability; a highly dispersed matrix is characteristic of a variable texture. To capture this behaviour one usually extracts some global parameters from the GLCM, for instance contrast, energy, entropy and homogeneity (Eqs. 7–10). Again, these have shown potential as clinical biomarkers in a number of studies [5, 6, 22, 42]. Denoting by i and j the grey levels of the two voxels separated by the given displacement vector, and by p(i, j) their probability of co-occurrence, we have:

$$\begin{aligned} \text {contr.}_\text {CM} = \frac{ \sum _{i=0}^{N-1} \sum _{j=0}^{N-1} \left( i-j \right) ^2 p \left( i,j \right) }{\left( N-1\right) ^2} \end{aligned}$$
(7)
$$\begin{aligned} \text {energy}_\text {CM} = \sum _{i=0}^{N-1} \sum _{j=0}^{N-1} \left[ p \left( i,j \right) \right] ^2 \end{aligned}$$
(8)
$$\begin{aligned} \text {entr.}_\text {CM} = - \sum _{i=0}^{N-1} \sum _{j=0}^{N-1} p \left( i,j \right) \text {log}_2 \left[ p \left( i,j \right) \right] \end{aligned}$$
(9)
$$\begin{aligned} \text {hom.}_\text {CM} = \sum _{i=0}^{N-1} \sum _{j=0}^{N-1} \frac{p \left( i,j \right) }{1 + \left| i-j \right| } \end{aligned}$$
(10)
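The sketch below illustrates, under simplifying assumptions (a rectangular, already quantised ROI and a single displacement vector; all function names are ours), how a three-dimensional GLCM and the features of Eqs. 7–10 could be computed.

```python
import numpy as np

def shifted_pairs(volume, offset):
    """Return the two overlapping views of the volume whose voxels are
    separated by the displacement vector offset = (dz, dy, dx)."""
    src, dst = [], []
    for d, n in zip(offset, volume.shape):
        if d >= 0:
            src.append(slice(0, n - d))
            dst.append(slice(d, n))
        else:
            src.append(slice(-d, n))
            dst.append(slice(0, n + d))
    return volume[tuple(src)], volume[tuple(dst)]

def glcm_features(quantised_roi, offset, n_levels):
    """Symmetric GLCM for one displacement vector and the features of
    Eqs. 7-10. For brevity the whole (rectangular) array is used; a real
    implementation would restrict the pairs to voxels inside the ROI mask."""
    src, dst = shifted_pairs(quantised_roi, offset)
    glcm = np.zeros((n_levels, n_levels))
    np.add.at(glcm, (src.ravel(), dst.ravel()), 1)
    glcm = glcm + glcm.T                # symmetrise (13 unique orientations)
    p = glcm / glcm.sum()

    i, j = np.indices((n_levels, n_levels))
    nonzero = p[p > 0]                  # avoid log(0)
    return {
        'contrast': np.sum((i - j) ** 2 * p) / (n_levels - 1) ** 2,
        'energy': np.sum(p ** 2),
        'entropy': -np.sum(nonzero * np.log2(nonzero)),
        'homogeneity': np.sum(p / (1.0 + np.abs(i - j))),
    }

# Example: one of the 13 unique orientations (nearest neighbour along x)
roi = np.random.randint(0, 16, size=(32, 32, 32))
print(glcm_features(roi, offset=(0, 0, 1), n_levels=16))
```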

Other Texture Features. Texture analysis has been an area of intense research for more than forty years and, as a result, the number of available methods is huge. Among those that have received attention in the field of radiomics are: neighbourhood grey-tone difference matrices (NGTDM [6, 21, 22, 43]), grey-level run-length matrices (GLRLM [21, 22, 44]), Local Binary Patterns (LBP [17, 45]), Laws’ masks [46, 47] and wavelets [48, 49]. For definitions and further details we refer the reader to the given references.

2.5 Post-processing

The features returned by the extraction phase can undergo further processing to (a) reduce their number and (b) increase their discrimination capability. The main strategies to achieve this goal are feature selection and feature generation [50]. The first aims at identifying the most discriminative features so as to reduce their overall number while retaining as much information as possible. This is particularly important in radiomics, where some shape and texture features tend to be highly correlated with each other, as recently shown in [51]. The second consists of generating new features from the original ones via some suitable transformation, for instance Principal Component Analysis (PCA) or Independent Component Analysis (ICA) [50].
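A minimal sketch of the two strategies follows, using hypothetical data; the feature matrix X, the 0.95 correlation cut-off and the number of retained components are purely illustrative, and scikit-learn provides the PCA transform.

```python
import numpy as np
from sklearn.decomposition import PCA

# X: hypothetical feature matrix, one row per lesion, one column per feature
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))

# Feature selection: drop one of each pair of highly correlated features
corr = np.abs(np.corrcoef(X, rowvar=False))
keep = [i for i in range(X.shape[1])
        if not any(corr[i, j] > 0.95 for j in range(i))]
X_selected = X[:, keep]

# Feature generation: a small set of new, decorrelated features via PCA
X_reduced = PCA(n_components=10).fit_transform(X_selected)
```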

2.6 Data Analysis/Classification

The last step consists of feeding the features to a classifier to make predictions about the disease type (computer-assisted diagnosis) and/or the clinical outlook (prognostication). To this end, suitable machine learning models and large enough sets of labelled data (training set) are required. As for the model, one can choose among a number of different solutions (e.g. linear classifiers, Support Vector Machines, classification trees, neural networks and/or a combination of them [50, 52]): the main problem is selecting the right model for the specific task. Getting the right data for training, however, can be rather hard, for it requires finding large enough sets of manually classified/annotated clinical records. For prognostication the data also need to be longitudinal, which implies following up on a cohort of patients for a long period of time.
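A minimal sketch of this last step with scikit-learn, using synthetic stand-ins for the radiomic feature matrix and labels (all names and numbers are illustrative); in practice the choice of model, validation scheme and performance metric should be tailored to the clinical question.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical radiomic feature matrix X (lesions x features) and
# binary labels y (e.g. benign = 0 vs. malignant = 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))
y = rng.integers(0, 2, size=120)

model = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print('cross-validated AUC: %.2f +/- %.2f' % (scores.mean(), scores.std()))
```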

3 Discussion

A number of recent studies have advocated the use of shape and texture descriptors as potential image biomarkers in the diagnosis and treatment of a range of disorders. Still, clear advantages for patient management are yet to be demonstrated above and beyond traditional imaging techniques and basic biomarkers [37]. From a technical standpoint there are at least three major hurdles that still limit the applicability of radiomics on a large scale. The first is the lack of standardisation in all the steps detailed in Sect. 2. Differences in the image acquisition parameters, feature definitions and naming conventions, and pre- and post-processing procedures are all sources of variation that reduce reproducibility, increase artefacts and eventually lead to biased results [36]. To overcome this problem, an important standardisation initiative is in progress [12]. The second is the lack of large enough datasets of imaging data to train the classifiers. To fill this gap, international, open access repositories are being developed [53]. The third is that some steps of the pipeline (e.g. lesion segmentation) still rely too heavily on human intervention, which may potentially lead to biased results and low reproducibility [54].

4 Conclusions

Shape and texture features from three-dimensional biomedical data have been attracting much research interest in recent years. It is believed that such quantitative imaging features may help uncover patterns that would otherwise go unnoticed by the human eye. In this paper we have provided an overview of the methods, issues and perspectives. On the whole, it is clear that radiomics has the potential to improve patient management in a number of diseases, but there are still significant obstacles along the way – chief among them the absence of standardisation and the lack of large datasets of clinical data.