1 Introduction

Microscopic image processing has become an important research area in recent years [12, 28]. Follicular lymphoma (FL) is a group of malignancies of lymphocyte origin that, in most cases, arises from the lymph nodes, spleen, and bone marrow of the lymphatic system. It is the second most common non-Hodgkin's lymphoma [6]. FL is characterized by a follicular or nodular growth pattern of follicle-center B cells consisting of centrocytes and centroblasts. The World Health Organization's (WHO) histological grading of FL depends on the number of centroblasts counted within representative follicles, resulting in three grades of increasing severity [10]:

  1. Grade 1: 0–5 centroblasts (CBs) per high-power field (HPF),
  2. Grade 2: 6–15 centroblasts per HPF, and
  3. Grade 3: more than 15 centroblasts per HPF.

While grades one and two are considered indolent, with long average survival and no need for chemotherapy, grade three is an aggressive disease that is rapidly fatal unless treated immediately with aggressive chemotherapy [21]. Accurate grading of follicular lymphoma images is therefore essential to the optimal choice of treatment. In the conventional FL grading procedure, human experts manually count the centroblasts in an HPF image, which is time consuming; some computerized methods mimic this approach [16, 19, 25]. In this article, instead of counting centroblasts individually, we treat the images as textures and classify the texture formed by the centroblasts. Recently, Suhre proposed a two-level classification tree using a sparsity-smoothed Bayesian classifier and reported very high accuracy [27].

Two datasets are used in this paper: the dataset of [27] and the CERTH-AUTH database [20]. The first dataset contains 90 images for each of the three grades of follicular lymphoma. The CERTH-AUTH database contains nine images of grade two and five images of grade three. Examples of grade one, two, and three images are presented in Fig. 1.

Fig. 1

Example images for grades one, two, and three of follicular lymphoma. a Grade 1, b Grade 2, c Grade 3

In Sect. 2, the proposed multi-scale directional filtering approach is described. In Sect. 3, the proposed feature extraction scheme using the directional filterbank outputs is described. In Sect. 4, experimental results are presented.

2 Directional filtering framework

Directional filtering is a new framework developed in this paper. A one-dimensional (1D) prototype filter with impulse response \(f_h\) of order \(N\) is rotated in 2D to filter images along various directions. In this way, a bank of filters is obtained by rotating \(f_h\) over a set of angles parameterized by \(\theta \).

To obtain a directional filterbank, the high-pass filter \(f_h\) of a wavelet filterbank is rotated along various directions. Instead of rotating \(f_h\) by bilinear (or cubic) interpolation, we use the following method: for a given angle \(\theta \), we draw a line \(l\) through the origin \((l: y=\tan {\theta }\,x)\) and set the coefficients of the rotated filter \(f_\theta (i,j)\) proportional to the length of the line segment within each pixel \((i,j)\), denoted by \(|l_{i,j}|\). For odd \(N\), \(f_0(0)\) is exactly the center of rotation, and therefore the value of \(f_0(0)\) is unchanged in \(f_\theta (0,0)\). We therefore take the line segment in the origin pixel, \(|l_{0,0}|\), as the reference (\(|FG|\) in Fig. 2b). For \(\theta \le 45^\circ \), \(|l_{0,0}| = \frac{1}{\cos {\theta }}\), assuming each pixel has unit side. For each pixel in column \(j\) of the grid, we calculate \(f_\theta (i,j)\) as:

$$\begin{aligned} f_\theta (i,j)=f_h(i) \times \frac{|l_{i,j}|}{|l_{0,0}|} \end{aligned}$$

This approach is also used in computed tomography [8].
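The line-segment weighting above can be sketched in a few lines. The sketch below assumes an odd-length prototype, angles \(0^\circ < \theta \le 45^\circ\), and that the prototype tap is indexed by the column of the grid; the function names are illustrative, not from the original implementation.

```python
import math

def segment_length(m, x0, x1, y0, y1):
    # Length of the line y = m*x inside the axis-aligned box [x0,x1] x [y0,y1].
    # Assumes m > 0, i.e. 0 < theta <= 45 degrees; other angles follow by
    # the symmetry described in the text (rotate 90 - theta, then transpose).
    xa = max(x0, y0 / m)
    xb = min(x1, y1 / m)
    if xb <= xa:
        return 0.0
    return (xb - xa) * math.sqrt(1.0 + m * m)

def rotate_filter(f_h, theta_deg):
    # Rotate a 1D odd-length prototype filter into 2D by weighting each pixel
    # of an N x N grid with the relative line-segment length |l_ij| / |l_00|.
    n = len(f_h)
    c = n // 2                                       # index of the centre tap
    m = math.tan(math.radians(theta_deg))
    l00 = 1.0 / math.cos(math.radians(theta_deg))    # reference length |l_00|
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):                               # columns (x index)
        for i in range(n):                           # rows (y index)
            x0, x1 = j - c - 0.5, j - c + 0.5
            y0, y1 = i - c - 0.5, i - c + 0.5
            out[i][j] = f_h[j] * segment_length(m, x0, x1, y0, y1) / l00
    return out
```

Because the line segments within one column of the grid sum to \(|l_{0,0}|\) for \(\theta \le 45^\circ\), each column of the rotated filter sums to the corresponding prototype tap, which is consistent with the DC-preservation property stated later in this section.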

Fig. 2

Filter rotation process for the Lagrange à trous filter. a \(f_h(i,j)\), b Line \(\theta =\arctan {(1/2)} = 26.565^\circ \), c lengths of line segments in each pixel of the rectangular grid, d resulting directional filter \(f_{26.56^\circ }\)

Calculating the line-segment length \(|l_{i,j}|\) is straightforward. To rotate the filter for \(\theta \le 45^\circ \) (which corresponds to \(N_v \le 1\)), we place \(f_0\) at the vertical center of an \(N \times N\) grid, where \(C_x(i,j)\) and \(C_y(i,j)\) are the coordinates of the center of the cell with horizontal index \(i=0,\ldots ,N-1\) and vertical index \(j=0,\ldots ,N-1\). Then, we construct a line \(l\) along the desired direction whose midpoint coincides with the exact center of the grid (which is also the center of the filter). For every pixel of the grid, we calculate the rotated filter coefficients as:

$$\begin{aligned} f_\theta (i,j)&= f_h(i,0) \times \frac{\sin {\theta }}{2} \nonumber \\&\times \left( \min \left\{ C_x(i,j)+0.5,\frac{C_y(i,j)+0.5}{\tan {\theta }}\right\} ^2 \right. \nonumber \\&\left. -\max \left\{ C_x(i,j)-0.5,\frac{C_y(i,j)-0.5}{\tan {\theta }}\right\} ^2\right) \end{aligned}$$
(1)

To rotate the filter for \(\theta \ge 45^\circ \), we first rotate the filter by \(90^\circ -\theta \) and then transpose \(f_{90^\circ -\theta }\) to obtain \(f_{\theta }\). Note that this method of rotation does not change the DC response of the original filter, because \(\sum _{i,j}f_\theta (i,j) = \sum _{k}f_0(k)\).

The resulting filters at angles \(\theta =\{0^\circ ,\, \pm 26.56^\circ ,\, \pm 45^\circ \), \(\pm 63.43^\circ ,\, 90^\circ \}\) for the prototype \(f_h\) (Fig. 2a) form a directional filter bank and are shown in the first column of Table 1. The number of nonzero filter coefficients is larger under bilinear interpolation, resulting in a higher computational cost compared to the proposed approach. Furthermore, the frequency responses of the proposed filters are smoother than those of the bilinear-based method, as shown in Fig. 3.

These directional filters are used in a multi-resolution framework for feature extraction. For the first scale, directional images are extracted by convolving the input image with this filter bank. The mean and the standard deviation of these directional images are used as the directional feature values of the image (other statistics could also be used). To obtain directional feature values at lower scales, the original image is low-pass-filtered and decimated by a factor of two horizontally and vertically, yielding a low–low subimage. Since downsampling is a shift-variant process, we also introduce a half-sample delay before downsampling. To implement this, we downsample two shifted versions of the input image (corresponding to \((\Delta x, \Delta y) =\{(0,0),(1,1)\}\)), filter the two downsampled images with the directional filter bank, and fuse the outputs to construct one output image per directional filter. The fusion method used in this article squares the two images, sums them, and takes the square root of the sum.
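A minimal sketch of the half-sample-delay downsampling and fusion step, assuming numpy arrays and a hypothetical `dir_filter` callable standing in for one convolution with a directional filter:

```python
import numpy as np

def half_sample_fused_downsample(img, dir_filter):
    # Downsample two half-sample-shifted copies of the image by 2 in each
    # direction (shifts (0,0) and (1,1)), filter both with one directional
    # filter, and fuse the outputs as sqrt(a^2 + b^2), as described above.
    a = dir_filter(img[0::2, 0::2])   # shift (0, 0)
    b = dir_filter(img[1::2, 1::2])   # shift (1, 1)
    return np.sqrt(a * a + b * b)
```

Repeating this per filter in the bank yields one fused output image per direction at the coarser scale.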

Fig. 3

Frequency responses of directional filters at various orientations obtained by proposed method (a, c and e) and bilinear interpolation (b, d, and f). Proposed method produces smoother frequency responses. a Directional filter \((\theta =0^\circ )\), b rotational filter \((\theta =0^\circ )\), c directional filter \((\theta =45^\circ )\), d rotational filter \((\theta =45^\circ )\), e directional filter \((\theta =63^\circ )\), f rotational filter \((\theta =63^\circ )\)

Table 1 Directional filters for \(\theta =\{0^\circ , \pm 26.56^\circ , \pm 45^\circ , \pm 63.43^\circ , 90^\circ \}\) obtained using proposed method (first column) and bilinear interpolation (second column), respectively

A variant of this multi-scale filtering framework uses four shifted versions instead of two (corresponding to \((\Delta x, \Delta y) =\{(0,0),(1,0),(0,1),(1,1)\}\)). Although this increases the accuracy by \(1\,\%\) on average, it also doubles the computational complexity. This speed-versus-accuracy trade-off should be evaluated for each potential application.

The low-pass filter \(f_l\) used in the directional filterbank can be the low-pass filter of a wavelet filter bank; in this case, it can be an ordinary half-band filter. The low–low subimage can then be filtered by the directional filters to obtain the second-level directional subimages and the corresponding feature values. This process can be repeated several times depending on the nature of the input images. The filtering flow diagram is shown in Fig. 4.
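The overall multi-scale feature computation can be sketched as follows. This is a simplified illustration assuming numpy arrays and a list of directional-filter callables (a hypothetical interface); for brevity, the decimation stand-in here omits the low-pass filtering and the half-sample-delay fusion described earlier.

```python
import numpy as np

def directional_features(img, dir_filters, levels=3):
    # Collect the mean and standard deviation of each directional response
    # at each scale: 2 statistics x levels x len(dir_filters) features.
    feats = []
    low = img
    for _ in range(levels):
        for filt in dir_filters:
            resp = filt(low)
            feats += [resp.mean(), resp.std()]
        # Stand-in for low-pass filtering plus decimation by 2 in each
        # direction; the actual scheme also applies the half-sample delay.
        low = low[0::2, 0::2]
    return np.array(feats)
```

With three scales and eight directions, this yields the \(2 \times 3 \times 8 = 48\) features used later for classification.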

Fig. 4

Flowchart of directional filtering framework. In this article, one low-pass \(f_l\) and eight directional high-pass filters with \(\theta =\{0^\circ , \pm 26.56^\circ , \pm 45^\circ , \pm 63.43^\circ , 90^\circ \}\) are used for image analysis

The proposed directional filterbank design differs from Do and Vetterli's filterbank [5], where directional filters are obtained from the filters of a quincunx filterbank using modulations and rotations by resampling matrices. Other directional and quincunx filterbanks include [1, 2, 7, 13], but none of them uses Herman and Kuba's directional interpolation approach. In our experiments, we use directional filters in three scales with \(\theta =\{0^\circ ,\pm 26.56^\circ , \pm 45^\circ , \pm 63.43^\circ ,90^\circ \}\). The low-pass filter is the half-band filter \(f_l=[0.25~0.5~0.25]\), and the high-pass filter is Kingsbury's 8th-order q-shift analysis filter [15]: \(f_h=[-0.0808~0~0.4155~-0.5376~0.1653~0.0624~0~-0.0248]\).

In Fig. 4, \(f_0\) is the 2D version of \(f_l\), and \(f_{\theta _1=0^\circ }, f_{\theta _2=26.56^\circ }\), \(f_{\theta _3=45^\circ }, f_{\theta _4=63.43^\circ }, f_{\theta _5=90^\circ }, f_{\theta _6=-26.56^\circ }, f_{\theta _7=-45^\circ }, f_{\theta _8=-63.43^\circ }\) are the rotated high-pass filters obtained from Kingsbury's filter \(f_h\).

3 Feature extraction and classification

Since the images in this dataset have relatively uniform texture, there is no need to segment them prior to feature extraction. Moreover, an image cannot contain two different grades of FL, so we produce one decision per image. Each input image is fed to the feature extraction algorithm directly after conversion to grayscale. We use the mean and the standard deviation of the filter outputs of a 3-scale, 8-direction filterbank, so the feature vector size is \(2 \times 3 \times 8 = 48\).

Choosing the number of scales and directions larger than necessary may produce redundant data, which in turn increases complexity and reduces classifier accuracy due to the curse of dimensionality. To overcome this problem, we apply several well-known dimension reduction techniques to our features before classification. Each feature set is classified once without any dimension reduction, once after principal component analysis (PCA) [11], once after linear discriminant analysis (LDA) [23], and once after independent component analysis (ICA) [9]. For PCA, the dimension is reduced while keeping 99.9 % of the cumulative energy of the eigenvalues. For LDA, since the maximum number of dimensions is one less than the number of classes, the dimension is reduced to two for each feature set.
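The PCA reduction with a 99.9 % cumulative-energy threshold can be sketched as follows; this is an illustrative numpy implementation, not the code used in the paper.

```python
import numpy as np

def pca_reduce(X, energy=0.999):
    # Project the feature matrix X (samples x features) onto the smallest
    # number of principal components whose eigenvalues retain the given
    # fraction of the cumulative energy (99.9% in the text).
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(vals)[::-1]            # sort descending
    vals, vecs = vals[order], vecs[:, order]
    k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), energy) + 1)
    return Xc @ vecs[:, :k], k
```

For features whose variance is concentrated in a few directions, this keeps far fewer than the original 48 dimensions.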

We classify the extracted features using support vector machines (SVM) with a radial basis function (RBF) kernel. The accuracy of the system is measured by twofold, tenfold, and leave-one-out cross-validation, which are standard methods for measuring classification accuracy in the literature. To find the best possible accuracy, we perform a search over the \(C\) and \(\gamma \) parameters of the SVM using a simple heuristic.
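A simple version of this parameter search can be written with scikit-learn's grid search; the toy 2D data below is a stand-in for the 48-dimensional feature vectors, and the parameter grid is illustrative rather than the heuristic actually used.

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the 48-dimensional feature vectors; real features come
# from the directional filterbank.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

grid = GridSearchCV(
    SVC(kernel='rbf'),
    {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1]},
    cv=2,   # twofold cross-validation, as in the text
)
grid.fit(X, y)
```

`grid.best_params_` then holds the selected \(C\) and \(\gamma\), and `grid.best_score_` the corresponding cross-validation accuracy.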


4 Experimental results

We compare the proposed feature extraction scheme with various multi-scale directional feature extraction algorithms, such as curvelets [4], contourlets [5], steerable pyramids [26], complex wavelets [14], Gabor filters [22], and texton filterbanks [17, 18, 24]. We use a 270-image dataset with 90 images per grade, which is also used in [27]. Experimental results are presented in Tables 2 and 3. The mean accuracy in Tables 2 and 3 is calculated by dividing the trace of the confusion matrix by the number of elements in the dataset. The directional filtering method paired with LDA achieves perfect classification accuracy on the first dataset, even under twofold cross-validation. Table 4 compares the leave-one-out cross-validation accuracy of directional filtering paired with LDA against the method proposed in [27]; the new method outperforms the current state of the art. Similar results are presented in Table 3 for the CERTH-AUTH dataset.

Table 2 Twofold, tenfold, and leave-one-out cross-validation accuracies of each grade, for each feature in first dataset
Table 3 Twofold, tenfold, and leave-one-out cross-validation accuracies of each grade, for each feature in CERTH-AUTH dataset
Table 4 Leave-one-out cross-validation accuracies of each grade, for each feature on first dataset

Figure 5 shows directional-filtering-based features of first dataset reduced to two dimensions by LDA. All grades are compactly clustered and easily separable.

Fig. 5

Directional-filtering-based features of first dataset reduced to two dimensions by LDA

We also performed tests to measure the computational complexity of the algorithms. These tests were run on a computer with an Intel i7-4700MQ CPU and 16 GB of memory. The values presented in Table 5 are average times over 10 runs. Directional filters are clearly the most efficient among the tested algorithms: they extract feature parameters from a \(512 \times 512\) image in eight directions and three scales in 0.032 s in MATLAB.

Table 5 Time required for each feature to be extracted from a \(N \times N\) image, for \(N = [512,\,1,\!024,\,2,\!048]\)

5 Conclusion

A method for grading FL images, based on a novel multi-scale directional feature extraction framework, is proposed. In this framework, we use a directional filterbank filtering the image at \(\theta =\{0^\circ ,\, \pm 26.56^\circ ,\, \pm 45^\circ ,\, \pm 63.43^\circ ,\, 90^\circ \}\) directions. This new multi-scale directional framework is compared with a number of multi-scale directional image representation methods including the complex wavelet transforms, curvelets, contourlets, gray-level co-occurrence matrices, Gabor filters, steerable pyramids, and texton filter banks.

In terms of computational efficiency, directional filter banks are the fastest among all tested methods.

When the features extracted with the proposed method are reduced to 2D using linear discriminant analysis, an SVM classifier with properly selected parameters achieves almost perfect recognition accuracy, surpassing the other multi-scale directional feature extraction algorithms as well as the state-of-the-art method.

Therefore, texture-classification-based grading of FL images can be as accurate as conventional centroblast-counting-based methods.