1 Introduction

The study of computed tomography (CT) applied to agriculture began in the early 1980s, focusing on soil science and investigating water infiltration processes and the properties of density, moisture, and porosity [24]. In Brazil, the first X-ray and \(\gamma \)-ray minitomograph scanners for soil science applications were built in 1987, making it possible to measure samples in the laboratory and constituting an important step in the development and advancement of the tomography technique in the country. Subsequently, other tomographs were developed at the millimeter scale, such as portable \(\gamma \)-ray tomography and Compton scattering tomography [4, 7, 20, 27, 31].

Moreover, other types of agricultural research started using CT to develop their studies and perform analyses [1, 2, 13, 23]. CT enables non-invasive analysis of the interior of a body or object and, therefore, offers an alternative method for evaluating the internal morphology of agricultural samples. This non-invasive analysis is possible because CT produces an image of the interior of a body by reconstructing projections obtained from X-ray beams that pass through the body without damaging it. Reconstruction from projections is therefore a fundamental step, demanding high computational capacity and the management of a large amount of data [9, 11, 22].

In this context, the term big data can be applied to the large volume of tomographic data because it represents a new way of handling the data available nowadays, which is often unstructured. Big data also addresses the increased demand for analyses, as the number of species and varieties of seeds continues to grow. In addition, big data techniques have already been used in various agricultural applications, such as in tomographic reconstruction, in the treatment of information to be reconstructed three-dimensionally, and in the development of new algorithms [3, 6, 12, 17, 19, 26, 34,35,36]. Therefore, integrating these three areas (i.e., CT, agriculture, and big data) is intended to allow the reconstruction of tomographic images in a big data environment and thereby enable a greater number of agricultural analyses. Thus, good quality image reconstruction should first be considered using smaller sets of tomographic projections, to reduce the time involved in the reconstruction and allow a significant increase in the number of analyses in the same time frame. Consequently, new solutions are of interest for the parallelization of the algorithms involved in reconstruction and for the use of architectures that allow hardware processing [8, 28, 32]. However, it should be noted that the use of cloud computing clusters for tomographic reconstruction has not yet been explored.

This study aims to develop a method for two-dimensional (2D) and three-dimensional (3D, volumetric) high-resolution tomographic image reconstruction in a parallel and distributed big data environment, which selects the most relevant projections to reconstruct good quality images and thus allows a greater number of agricultural analyses to be completed in the same time frame. The main contribution of this study is the marked reduction in the time required for high-resolution CT image reconstruction, based on selecting projections by their energy spectra. In addition, both the 2D and 3D image reconstruction methods include parallelization and a framework operating in a distributed environment. The remainder of this paper proceeds as follows. Section 2 presents the fundamentals of CT and the power spectral density (PSD) used for the selection of the projections. Section 3 presents the organization of the method for tomographic reconstruction of agricultural samples in a big data environment. Section 4 presents the results and discussion of this work. Finally, the conclusions are presented in Section 5.

2 Fundamentals of computed tomography and power spectral density

2.1 Computed tomography and methods of reconstruction

The main problem of CT is obtaining an image of the object under study from the reconstruction of projections obtained by transmission. The solution is to reconstruct an image from line integrals taken along straight lines that pass through the object.

The physical model of X-ray attenuation in transmission CT is illustrated in Fig. 1. A narrow beam, represented by a straight line L with intensity I(x), comes from the source and passes through the object, which has a certain attenuation coefficient \(\mu \). The detector registers the remaining intensity of the beam, and this information is used to reconstruct the 2D image of the object [14, 16, 25].

Fig. 1
figure 1

Physical model of X-ray attenuation

From the physical model, (1) can be obtained. Known as the Lambert-Beer law, it expresses the exponential attenuation of X-rays along the straight line L.

$$\begin{aligned} I = I_0 \exp \left( -\int _L \mu (x)dx\right) \end{aligned}$$
(1)

For tomographic reconstruction purposes, the variation of this attenuation should be measured along the straight line L, which can be obtained using (2).

$$\begin{aligned} p(L) = \int _L \mu (x)dx = - \ln \left( \frac{I}{I_0}\right) \end{aligned}$$
(2)

From this equation, a reconstruction method is obtained via the Radon transform, which recovers a function \(f:\mathbb {R}^2 \rightarrow \mathbb {R}\) from all its line integrals in a previously determined domain. In CT, it is used to determine the distribution of attenuation \(\mu (x)\), which corresponds to the density of the object under study. The problem is therefore an inverse problem, because it seeks the attenuation coefficient from the available data, that is, from I and \(I_0\).

One approach to understanding the process of tomographic reconstruction is to consider an X-ray beam as a straight line from the source to the detector. This set (i.e., source and detector) is rotated by an angle \(\theta \in [0, 2\pi )\) so that the entire object is scanned in the plane at one fixed position z. Figure 2 presents a schematic diagram of a parallel projection, highlighting the distance t. Evidently, a projection is a set of line integrals, represented by \(P_\theta (t)\), considering the same cross-section or position on the z-axis, as well as the same angle \(\theta \) of the set (i.e., source and detector) in relation to the fixed coordinates (x, y).

Fig. 2
figure 2

Schematic diagram of a parallel projection

In practice, it is desirable that the linear attenuation coefficient values be given as a function of the fixed coordinates (x, y), which is possible using the relation between polar and Cartesian coordinates. Therefore, the perpendicular distance from the origin to line L (the X-ray beam) can be determined using (3):

$$\begin{aligned} t = x \cos ( \theta ) + y \sin ( \theta ). \end{aligned}$$
(3)

Thus, \(P_\theta (t)\) is defined with respect to an \(\varepsilon \)-axis whose inclination relative to the horizontal is determined by the angle \(\theta \), and the integral of the function is taken along a straight line perpendicular to this axis. Furthermore, scanning the entire interval \(\theta \in [0, 2\pi )\) is unnecessary; only the interval \(\theta \in [0, \pi )\) is needed to avoid data redundancy.

Because infinite line integrals cannot be obtained computationally, the cross-section can be represented at a certain angle using the Dirac delta function, which has the sampling property. Equation (2), in the 2D case, can then be rewritten as

$$\begin{aligned} P_\theta (t) = \int _{-\infty }^{\infty } \int _{-\infty }^{\infty } \mu (x, y) \delta (x \cos \theta + y \sin \theta - t) dxdy. \end{aligned}$$
(4)

Equation (4) is known as the Radon transform, \(\mathcal {R}_\theta \mu (t) = P_\theta (t)\). Therefore, the problem of reconstructing an image consists of determining \(\mu (x,y)\) from \(\mathcal {R}_\theta \mu (t)\). The Radon transform maps the space domain (x, y) into the domain (t, \(\theta \)), where each point in the space (t, \(\theta \)) corresponds to a line in the space (x, y).

The inverse Radon transform, \(\mathcal {R}^{-1}\), is used to reconstruct \(\mu \) and can be obtained through the Fourier slice theorem (or central slice theorem), which relates the projections of the Radon transform to the Fourier transform.
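To make the relationship between (4) and its inversion concrete, the following minimal sketch computes the sinogram of a test phantom and reconstructs the slice by filtered back-projection. It assumes NumPy and scikit-image, which are illustrative choices and not part of the method described later.

```python
# Illustrative sketch only (assumed libraries: NumPy, scikit-image); it is
# not the authors' implementation, but shows the Radon transform / FBP pair.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()                          # 400 x 400 test slice
theta = np.linspace(0.0, 180.0, 976, endpoint=False)   # theta in [0, pi), 976 angles

sinogram = radon(image, theta=theta)                   # P_theta(t), one column per angle
reconstruction = iradon(sinogram, theta=theta,         # FBP: ramp filter + back-projection
                        filter_name='ramp')

rms = np.sqrt(np.mean((reconstruction - image) ** 2))
print(f"RMS reconstruction error: {rms:.4f}")
```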

2.2 Power spectral density

The PSD of a signal is often obtained by estimating the autocorrelation function from the available data and then applying the Fourier transform to obtain the desired spectral description. However, different approaches are available to perform spectral estimation, which can be classified as parametric or non-parametric methods.

The first type is generally simpler to calculate, but it requires a priori knowledge of the signal, whereas the second type assumes no particular structure behind the available data [10, 21].

Given a random signal in the time domain, \(\mathcal {X}(t)\), it is assumed that it is sampled over a finite time interval (\(-T/2\), T/2) and is denoted by \(X_T(t)\). When applying the Fourier transform, we obtain:

$$\begin{aligned} \tilde{X}_T(f) = F\{X_T(t)\} = \int ^\infty _{-\infty }X_T(t)e^{-2\pi jft}dt = \int ^{T/2}_{-T/2}\mathcal {X}(t)e^{-2\pi jft}dt \end{aligned}$$
(5)

From (5), the modulus and argument of \(\tilde{X}_T\) give the amplitude spectrum and phase spectrum, respectively. The spectral energy density is calculated from \(\tilde{X}_T\) using the expected value of the square of the amplitude spectrum, as indicated in (6):

$$\begin{aligned} E(f) = \mathcal {E} \{ |\tilde{X}_T(f) |^2\} \end{aligned}$$
(6)

It is observed that E(f) tends to infinity when T tends to infinity. Therefore, dividing (6) by the interval T limits this growth and provides the power spectral density expressed by (7), which is real and non-negative. This definition is valid for all stationary processes with zero mean and finite variance. In agricultural tomography, the samples to be tested are moved to the tomographic table and remain stationary during the projection acquisition process, so this theory can be used [5]. Additionally, as Poisson noise predominates in the tomographic process, it is also considered to exhibit stationary behavior throughout the tomographic process.

$$\begin{aligned} S(f) = \lim _{T \rightarrow \infty }\mathcal {E} \left\{ \frac{1}{T} \Bigg |\int ^{T/2}_{-T/2}\mathcal {X}(t)e^{-2\pi jft}dt \Bigg |^2 \right\} \end{aligned}$$
(7)

In the discrete case, considering the sequence x[n], we obtain (8), where \(\hat{S}\) is the periodogram estimator. This is equivalent to applying a rectangular window over the interval \(0 \le n \le (T -1)\) of the sequence x[n], squaring the modulus of the Fourier transform of the truncated sequence, and normalizing the result by a factor T to obtain a measure of the PSD.

$$\begin{aligned} \hat{S}(e^{jf}) = \frac{1}{T} \Bigg |\sum ^{T-1}_{n = 0}x[n]e^{-jfn} \Bigg |^2 \end{aligned}$$
(8)

Based on the spectral density of each tomographic projection present in a given sinogram, the energy of each projection was evaluated to identify those that carry a more relevant set of information for the two-dimensional tomographic reconstruction. In this context, information on the spectral density of each tomographic projection can be obtained by considering its power spectrum.

When considering signal \(s = s(t)\), continuous in time, as a function that represents a random signal and \(S = S(\omega )\), a function representing the periodogram of this signal, it is possible to decompose S as follows:

$$\begin{aligned} S = S_r + jS_i, \end{aligned}$$
(9)

where \(S_r\) and \(S_i\) are the real and imaginary parts, respectively, and \(j = \sqrt{-1}\). This equation can also be written in polar form as follows:

$$\begin{aligned} S = |S|e^{j\theta (\omega )}. \end{aligned}$$
(10)

Therefore, the amplitude of the spectrum expressed in (10) is given by:

$$\begin{aligned} |S |= \sqrt{S_r^2 + S_i^2}. \end{aligned}$$
(11)

Hence, (8) can be applied when working with an X-ray tomographic projection, represented as a sequence s[n], since the signal is discrete; that is,

$$\begin{aligned} \hat{S}(e^{jf}) = \frac{1}{M} \Bigg |\sum ^{M-1}_{n = 0}s[n]e^{-jfn} \Bigg |^2, \end{aligned}$$
(12)

where \(M\) represents the number of samples in the sequence s[n] and the index n ranges over the interval \(0 \le n \le (M -1)\).
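A minimal sketch of (12) is shown below, assuming a NumPy implementation with illustrative variable names (the actual ctrecon routines are not reproduced in the paper):

```python
# Sketch of the periodogram energy of one projection s[n], following (12).
# The DFT evaluates S_hat at the M discrete frequencies f = 2*pi*k/M.
import numpy as np

def projection_energy(s: np.ndarray) -> float:
    """Total energy of a projection, summed over its periodogram."""
    M = len(s)
    S_hat = (np.abs(np.fft.fft(s)) ** 2) / M
    return float(np.sum(S_hat))

# Hypothetical projection with 2000 points, as in the experiments; Poisson
# counts mimic the noise regime mentioned in Section 2.2.
rng = np.random.default_rng(0)
s = rng.poisson(lam=100.0, size=2000).astype(float)
print(projection_energy(s))
```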

3 Materials and methods

Figure 3 presents the block diagram that provides an overview of the method developed for the 2D and 3D (volumetric) reconstruction of tomographic images of agricultural samples in a big data environment.

Fig. 3
figure 3

Block diagram of the method of reconstruction of tomographic images in a big data environment for the analysis of agricultural samples

The data obtained from the agricultural tomographs were projections, which were inserted and stored in the big data environment, represented by the dashed line. The process consisted of selecting projections by using the spectral density to evaluate the energy associated with each tomographic projection, so as to select those carrying the most relevant information. Subsequently, the 2D and 3D parallel reconstruction steps were performed, and finally, the images were made available for viewing.

3.1 Big data environment

The organization of the big data environment was considered from two perspectives: infrastructure and application. These two perspectives are built using a technology stack. Figure 4 illustrates the technology stack used in the organization of the big data environment for this work and emphasizes, through the dashed lines, the cluster representing the infrastructure and the method of reconstruction of the tomographic images, developed in the Python language.

Fig. 4
figure 4

Technology stack used for the organization of the big data environment

Table 1 presents the technologies used for the organization of the big data environment and the respective versions installed and configured to compose the environment.

Table 1 Versions of the technologies used in the organization of the big data environment

In the technology stack, the first layer refers to data storage using Amazon's distributed file system technology, S3. The Amazon Elastic MapReduce (EMR) platform made it possible to structure the cluster, whose computers, or cluster nodes, are instances of the Amazon Elastic Compute Cloud (EC2). Table 2 presents the instance types used to evaluate the prepared environment for the correct operation of the developed method.

Table 2 Types of instances used in the composition of the cluster

The use of homogeneous clusters, organized with the same machine configuration on every node, should be highlighted. Thus, for evaluation purposes, the number of instances of each selected model can be varied.

From the perspective of the application, the MRJob library layer was responsible for integrating applications written in Python with various cloud computing services, such as those offered by Amazon.

The reconstruction method for 2D and 3D (volumetric) tomographic images, represented by the last layer in Fig. 4, was written in the Python language as a module called ctrecon, together with several libraries represented in the penultimate layer, such as PySpark and NumPy, as well as several auxiliary libraries represented by the utils block.

3.2 Tomographic projection selection model

The model for the selection of the tomographic projections, developed in this work, is based on (12), where the energies of the tomographic projections contained in a sinogram can be calculated from the PSD.

From the calculated energies of the tomographic projections and the number of projections contained in the sinogram in question, (13) is used to determine the number of classes for this set of energies.

$$\begin{aligned} k = \lfloor \sqrt{N} \rfloor , \end{aligned}$$
(13)

where \(N\) represents the number of tomographic projections contained in a sinogram. The floor function, denoted by \(\lfloor x \rfloor \), maps a real number \(x\) to the greatest integer less than or equal to \(x\), which in this case gives the number of classes defined for the sinogram considered.

In this context, the set of tomographic projections that compose a sinogram is understood to be a set of energies \(E = \{\hat{S}_0, \hat{S}_1, \hat{S}_2, \ldots , \hat{S}_{N-1}\}\), in which each energy \(\hat{S}_i\), where \(i = 0, 1, 2, \ldots , N-1\), represents a projection.

From the set of energies, the classes are defined to form the set of classes \(C_T = \{C_0, C_1, C_2, \ldots , C_{k-1}\}\), where a particular \(C_j\), with \(j = 0, 1, \ldots , k-1\), represents a subset of the energies contained in the set \(E\).

The interval \(\Delta \) of energies associated with each class is expressed by (14), which considers the largest and smallest energies found in the set \(E\), as well as the number of classes defined by (13).

$$\begin{aligned} \Delta = \frac{\max {E} - \min {E}}{k} \end{aligned}$$
(14)

Therefore, each class has an initial energy, \(ei\), and a final energy, \(ef\), so that \(C_j = [ei, ef)\). The initial energy of a class is given by (15).

$$\begin{aligned} ei_j = {\left\{ \begin{array}{ll} \min E &{} \text {if} \; j = 0, \\ ef_{j-1} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(15)

The final energy of a class is given by (16)

$$\begin{aligned} ef_j = {\left\{ \begin{array}{ll} ei_j + \Delta &{} \text {if} \; j < (k - 1), \\ \max E &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(16)

After defining the energy classes and intervals, the tomographic projections were classified according to their energy values. A Gaussian distribution was assumed during the development of the model; in this context, the classes are taken to follow this distribution. Therefore, after the classification of the projections, the mean (\(\mu _0, \mu _1, \ldots , \mu _{k-1}\)) and standard deviation (\(\sigma \)) of each class were calculated.

Figure 5 illustrates a conceptual representation of the energy classes, showing the classes \(C_j\), their initial and final energies, and the Gaussian distribution associated with each class. The hatched region indicates where the most significant projections of each class are found.

Fig. 5
figure 5

Conceptual representation of energy classes considering the Gaussian distribution in each class. The hatched region corresponds to the region of each class where the most significant projections are found

Therefore, the selection criterion consisted of choosing the most significant probabilities within each energy class, which translates into the tomographic projections that carry the largest amount of information. Shannon showed that information can be quantified and that the amount of information is related to probability [29, 30, 33]. Therefore, projections whose energy lay within one standard deviation (\([-\sigma , \sigma ]\)) of the class mean were considered significant, leading to the formation of the sets \(C_j^{sel}\), for \(j = 0, 1, \ldots , k-1\), which contained the selected projections for each energy range associated with the classes.

Next, the tomographic projections identified as more significant in each class, based on the energy information, were grouped to form a new sinogram composed of a smaller number of projections in relation to the original sinogram.

The new sinogram corresponds to \(F = \{C_0^{sel}, C_1^{sel}, \ldots , C_{k-1}^{sel}\}\). In addition, the tomographic projections contained in the new sinogram were ordered according to the angle at which they were acquired, in preparation for the two-dimensional reconstruction stage.
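The selection model can be summarized in the following sketch, an assumed NumPy implementation of (12)-(16) and the one-standard-deviation criterion (variable names are illustrative, not taken from the ctrecon module):

```python
# Sketch of the projection-selection model: energies -> k classes of width
# Delta -> keep projections within one standard deviation of each class mean.
import numpy as np

def select_projections(sinogram: np.ndarray) -> np.ndarray:
    """sinogram: (N, points) array, one projection per row."""
    N, points = sinogram.shape
    energies = np.sum(np.abs(np.fft.fft(sinogram, axis=1)) ** 2, axis=1) / points
    k = int(np.floor(np.sqrt(N)))                     # Eq. (13)
    delta = (energies.max() - energies.min()) / k     # Eq. (14)
    # Class index of each projection; the last class is closed at max(E),
    # mirroring Eqs. (15)-(16).
    classes = np.minimum(((energies - energies.min()) // delta).astype(int), k - 1)
    keep = np.zeros(N, dtype=bool)
    for j in range(k):
        members = classes == j
        if not members.any():
            continue
        mu, sigma = energies[members].mean(), energies[members].std()
        keep |= members & (np.abs(energies - mu) <= sigma)
    # Row order is preserved, so the selected projections remain ordered by
    # acquisition angle, as required for the 2D reconstruction stage.
    return sinogram[keep]

sinogram = np.random.rand(976, 2000)   # hypothetical sinogram: N = 976 projections
print(select_projections(sinogram).shape)
```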

3.3 Reconstruction algorithm from the selected projections

Algorithm 1, implemented in the Python language, was executed in the big data environment prepared in this study to perform tomographic reconstruction from the previously selected projections.

figure a

Tomographic Reconstruction Method in a Big Data environment

The algorithm was structured in three main steps: (i) selection of tomographic projections, (ii) 2D reconstruction, and (iii) 3D (volumetric) reconstruction. After reading the projection matrices (set \(M\)), the distribution of the matrices to the nodes of the cluster was determined. Subsequently, the projections are selected in a distributed manner, with each node responsible for processing a subset of projection matrices. Figure 6 shows the tomographic projection selection process.

Fig. 6
figure 6

Selection of tomographic projections applied to a phantom sinogram

The second stage of the algorithm consists of the 2D tomographic reconstruction of the matrices with the projections selected in the previous stage. The reconstruction was performed using the filtered back-projection (FBP) algorithm, which is based on the Fourier slice theorem. After filtering the projections that compose the matrix \(s_i\), the reconstruction process was initiated and interpolation in the space domain was conducted. Subsequently, the back-projection stage summed the filtered and interpolated projections to compute the contribution of each projection to each pixel of the reconstructed image \(I_i\), which can also be referred to as a slice.
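A hedged sketch of this distribution strategy is shown below; it assumes a PySpark cluster and uses scikit-image's iradon as a stand-in for the paper's FBP routine, with toy data sizes (the real matrices have 2000 points and 976 angles):

```python
# Sketch only: PySpark distribution of per-slice FBP reconstructions.
# Assumptions: scikit-image's iradon stands in for the ctrecon FBP routine,
# and the (z, sinogram) pairs below are toy stand-ins for the real data.
import numpy as np
from pyspark.sql import SparkSession
from skimage.transform import iradon

spark = SparkSession.builder.appName("ctrecon-2d-sketch").getOrCreate()
sc = spark.sparkContext

theta = np.linspace(0.0, 180.0, 60, endpoint=False)   # toy acquisition angles

def reconstruct_slice(indexed_sinogram):
    """FBP of one selected-projection matrix; one task per slice."""
    z, sinogram = indexed_sinogram                     # (slice index, (points, angles))
    return z, iradon(sinogram, theta=theta, filter_name='ramp')

# Hypothetical stand-in for the selected sinograms of one sample.
sinograms = [(z, np.random.rand(128, len(theta))) for z in range(8)]

# Each worker reconstructs a subset of slices; results return ordered by z.
slices = sc.parallelize(sinograms).map(reconstruct_slice).sortByKey().collect()
```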

The volumetric reconstruction stage developed in this study used the reconstructed slices (\(I\)) and added virtual slices, generated using B-spline interpolation, to produce the final volume. The slices were stacked and, for each position, a set of points was formed and interpolated to generate the voxels composing the final volume. The parallelization strategy adopted in this study consisted of dividing the slices into small regions (tiles) and applying the interpolation process to each subset of points. Figure 7 presents a block diagram of the parallel volumetric reconstruction.

Fig. 7
figure 7

Block diagram of parallel volumetric reconstruction considering an example of the processing of a phantom

The first phase consisted of dividing each slice into regions, or tiles. Each region received an identifier necessary for the final volume reconstruction. Each slice was divided into rows and columns, whose indices were used to identify a given region, as illustrated in Fig. 8.

Fig. 8
figure 8

Identification of a region (tile) in a slice. The region identifier is composed of eight characters

The identifier (id-region) was formed by eight characters, where the first four indicate the row and the last four indicate the column. In addition to the identifier, it was necessary to include the section (position on the \(z\)-axis) and the set of pixels associated with the region. Therefore, a region was registered in the big data environment in the following format: (<id-region>, (<section>, <tile>)). The format is a tuple, where the first value is the region identifier and the second value is a nested tuple containing the section and the set of pixels. The next phase of the volumetric reconstruction consisted of an intermediate reduction operation responsible for grouping the regions by identifier, so that each subset contained all the slices of a given region. Subsequently, B-spline interpolation was performed for each subset. Finally, in the last phase, the data were made available for 3D viewing. Considering that, in this study, visualization was planned to be performed outside of the big data environment, the final data of the volumetric reconstruction were saved in files for later use.
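The tiling and grouping phases can be sketched as follows; this is an assumed implementation of the (<id-region>, (<section>, <tile>)) format, and the tile size and interpolation routine are hypothetical:

```python
# Sketch of the tiling phase and of the intermediate reduction that groups
# all sections of a region before B-spline interpolation (assumed code).
import numpy as np

TILE = 500  # hypothetical tile edge, in pixels

def tile_slice(z, image):
    """Split one slice into tiles keyed by the 8-character region id."""
    records = []
    for r in range(0, image.shape[0], TILE):
        for c in range(0, image.shape[1], TILE):
            id_region = f"{r // TILE:04d}{c // TILE:04d}"   # rrrrcccc
            records.append((id_region, (z, image[r:r + TILE, c:c + TILE])))
    return records

# With an RDD of (z, image) pairs, the intermediate reduction would be:
#   tiles  = slices_rdd.flatMap(lambda kv: tile_slice(*kv))
#   stacks = tiles.groupByKey()                        # one subset per region id
#   parts  = stacks.mapValues(interpolate_bspline)     # hypothetical routine
```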

3.4 Experimental evaluation

For the evaluation of the developed method, a heterogeneous plexiglass phantom was prepared, with nine holes, a diameter of 100 mm, and a height of 150 mm.

Additionally, in this study, samples of seven types of seeds were used: peanut (Arachis hypogaea), cowpea (Vigna unguiculata), sunflower (Helianthus annuus), chickpea (Cicer arietinum), wheat (Triticum), pumpkin (Cucurbita) and soybean (Glycine max).

The acquisition of the seed sample and phantom projection matrices was performed using the SkyScan 1172 tomograph. To achieve this, the equipment was adjusted to use the same configuration for both the samples and the phantom. Table 3 presents the values of the parameters adjusted in the tomograph for the projection acquisition.

Table 3 SkyScan 1172 tomograph parameter values adjusted for the acquisition of phantom and seed sample projections

Five samples were prepared for each type of seed, except for cowpea and peanut, for which four samples were prepared, totaling \(33\) samples. The samples were scanned in the range of \(0^{\circ } - 195.2^{\circ }\) with an angular pitch of \(0.2^{\circ }\). Therefore, the projection matrix of a slice contained \(976\) projections with \(2000\) points per projection. Each projection point occupied \(2\) bytes and each sample contained \(1960\) slices, so a sample corresponded to \(1960 \times 2000 \times 976 \times 2 \approx 7.13\) GB. In addition to the \(33\) samples, a phantom was scanned with the same settings as the seeds; therefore, \(34\) samples were used, totaling \(66,640\) projection matrices (sinograms). Each sample occupied \(7.13\) GB, giving a total of approximately \(242\) GB of analyzed tomographic data.
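For clarity, the per-sample size follows directly from these dimensions, assuming binary gigabytes (\(2^{30}\) bytes):

$$\begin{aligned} 1960 \times 2000 \times 976 \times 2 \; \text {bytes} = 7{,}651{,}840{,}000 \; \text {bytes} \approx 7.13 \; \text {GB}, \end{aligned}$$

and \(34 \times 7.13\) GB \(\approx 242\) GB for the complete data set.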

The analysis of the selection of projections consisted of calculating the structural similarity index measure (SSIM) and the peak signal-to-noise ratio (PSNR). For each calculated metric, the maximum, minimum, and median measurements were observed. The reference image (ground truth) for each analyzed slice was prepared by 2D tomographic reconstruction considering all \(976\) projections. In addition, a region of interest (ROI) of \(1200 \times 1200\) pixels, positioned at the center of the reconstructed images, provided the data for the analysis of the measurements.

Each sample contained \(1960\) slices; for the calculation of the metrics, a subset of the total slices was selected, considering a confidence interval of \(99\%\), a margin of error of \(5\%\), and a proportion \(p = 0.50\), as no a priori information on the slices was available. Therefore, \(498\) projection matrices of each seed sample were used to analyze the selection of tomographic projections.

The SSIM index, given by (17), considers image degradation as a perceived change in structural information while incorporating phenomena such as luminance and contrast. Structural information builds on the idea that pixels have a strong interdependence, especially when they are spatially close. These dependencies carry important information about the structures of the objects in the scene.

$$\begin{aligned} SSIM(x, y) = \frac{ (2\mu _x\mu _y+c_1)(2\sigma _{xy}+c_2) }{ (\mu ^2_x+\mu ^2_y+c_1)(\sigma ^2_x+\sigma ^2_y+c_2) }, \end{aligned}$$
(17)

where \(x\) and \(y\) are the original and resulting images, respectively; \(\mu _{x}\) is the average of \(x\); \(\mu _{y}\) is the average of \(y\); \(\sigma ^{2}_{x}\) is the variance of \(x\); \(\sigma ^{2}_{y}\) is the variance of \(y\); \(\sigma _{xy}\) is the covariance of \(x\) and \(y\); \(c_{1}= (k_{1}L)^{2}\) and \(c_{2} = (k_{2}L)^{2}\) are variables that stabilize the division, with \(L\) being the dynamic range of pixel values (often \(2^{bits} - 1\)), \(k_{1} = 0.01\), and \(k_2 = 0.03\).
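The following sketch evaluates (17) on the central ROI; it assumes scikit-image (the paper does not name the library actually used) and stand-in 8-bit slices:

```python
# Sketch: SSIM of a reconstructed ROI against the all-projection reference.
# scikit-image is an assumed choice; the images below are stand-in data.
import numpy as np
from skimage.metrics import structural_similarity

def roi_center(image, size=1200):
    """Central ROI of 1200 x 1200 pixels, as used in the analysis."""
    r = (image.shape[0] - size) // 2
    c = (image.shape[1] - size) // 2
    return image[r:r + size, c:c + size]

reference = np.random.randint(0, 256, (2000, 2000), dtype=np.uint8)  # stand-in
reconstructed = reference.copy()
ssim = structural_similarity(roi_center(reference), roi_center(reconstructed),
                             data_range=255)   # L = 2^8 - 1, as in Eq. (17)
print(f"SSIM = {ssim:.3f}")
```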

The PSNR measure, given by (18), is based on the signal-to-noise ratio and provides an estimate of the quality of the reconstructed image compared to the original image.

$$\begin{aligned} PSNR(x,y) = 10\log _{10}\frac{s^2}{MSE(x,y)}, \end{aligned}$$
(18)

where \(s = 255\) for images with 256 gray levels, and the MSE (mean squared error), given by (19), is computed as the average of the squared intensity differences between the original and resulting images, both of size \(M \times N\).

$$\begin{aligned} MSE(x, y) = \frac{1}{MN}\sum ^{N-1}_{n = 0}\sum ^{M-1}_{m = 0} e(n, m)^2, \end{aligned}$$
(19)

where \(e(n, m)\) is the difference between the original and the resulting image at pixel (n, m).
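Equations (18) and (19) translate directly into the following sketch (an assumed NumPy implementation with stand-in data):

```python
# Sketch: PSNR from Eqs. (18)-(19), implemented directly with NumPy.
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, s: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for 256-gray-level images."""
    mse = np.mean((x.astype(float) - y.astype(float)) ** 2)   # Eq. (19)
    return float(10.0 * np.log10(s ** 2 / mse))               # Eq. (18)

x = np.random.randint(0, 256, (1200, 1200))                   # stand-in ROI
y = np.clip(x + np.random.normal(0.0, 5.0, x.shape), 0, 255)  # noisy version
print(f"PSNR = {psnr(x, y):.2f} dB")
```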

To evaluate the parallel environment, the speedup metric, calculated by (20), was used; it measures the increase in speed obtained by executing a program on \(p\) processors relative to its sequential execution on a single processor. In the equation, \(T_{seq}\) and \(T_{par}\) are the sequential and parallel times, respectively, for executing the same program.

$$\begin{aligned} S_p = \frac{T_{seq}}{T_{par}} \end{aligned}$$
(20)

Another measure used to evaluate the big data environment was efficiency, whose value is obtained using (21).

$$\begin{aligned} E_p = \frac{S_p}{p} \end{aligned}$$
(21)

The efficiency measure evaluates how much parallelism is explored in an algorithm and quantifies processor utilization. Generally, its value lies in the interval \((0,1]\), and the closer \(E_p\) is to 1, the greater the efficiency; however, values greater than 1 may occur when superlinear speedup arises during parallel processing [15, 18].
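Both measures reduce to two one-line computations, sketched below with hypothetical timings (not the paper's measurements):

```python
# Sketch of Eqs. (20)-(21) with illustrative, hypothetical values.
def speedup(t_seq: float, t_par: float) -> float:
    return t_seq / t_par                      # Eq. (20)

def efficiency(t_seq: float, t_par: float, p: int) -> float:
    return speedup(t_seq, t_par) / p          # Eq. (21)

# e.g., hypothetical times of 3600 s (sequential) and 60 s (parallel) on p = 96 vCPUs
print(speedup(3600.0, 60.0), efficiency(3600.0, 60.0, 96))   # 60.0 0.625
```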

4 Results and discussion

This section presents the results obtained in the process of selecting projections based on spectral density, as well as the 3D (volumetric) visualization of agricultural seed samples.

4.1 Infrastructure analysis for the tomographic reconstruction method based on big data

We sought to evaluate the most appropriate cluster configuration for big data analysis, considering not only the 2D but also the volumetric (3D) tomographic reconstruction method. In this sense, the following aspects were considered:

1. Number of cluster nodes: Four types of clusters were organized, each containing 4, 6, 8, or 10 nodes, respectively, with the aim of evaluating the operation of the method as a function of the number of machines used. In every cluster type, one node was configured as the master and the others as workers.

2. Cluster nodes' model: Three machine configurations were selected to compose the cluster, as presented in Table 2. Table 4 presents the cluster capacities as a function of the number of vCPUs and RAM capacity. In addition, for ease of understanding, the individual configurations of each node model are presented.

Table 4 Cluster’s capacity is a function of the number of nodes, and the individual configuration of each node model

Table 4 shows the cluster configurations: 12 different configurations were evaluated, from the smallest capacity, containing 16 vCPUs and 64 GB of RAM, to the largest, with 180 vCPUs and 640 GB of RAM. Note that the strategy adopted for the parallelization of the two-dimensional reconstruction distributes the tomographic projection matrices across the nodes of the cluster, so the granularity can be considered medium, because the algorithm divides only the total volume of the matrices related to the tomographic projections.

The results obtained from the measurements of both speedup and efficiency are presented below. They allowed the definition of the most appropriate cluster configuration for the big data method of high-resolution agricultural tomographic reconstruction.

4.2 Speedup assessment

As discussed previously, the speedup calculation is based on the ratio of the sequential execution time to the parallel execution time. Therefore, for the evaluation and analysis, a set of tomographic projection matrices with dimensions of 976 \(\times \) 2000 (2k) was submitted to a single machine of each of the models indicated in Table 4. The sequential times obtained for both 2D and volumetric (3D) tomographic reconstruction are shown in Table 5.

Table 5 Sequential time evaluation for 2D and volumetric (3D) tomographic reconstruction

Figures 9 and 10 show, respectively, the speedup measurements for a set of 2D and volumetric (3D) reconstructions, both calculated from the results presented in Table 5.

Fig. 9
figure 9

Speedup evaluation for a set of 2D reconstructions

The clusters that used m5.4xlarge machines delivered higher speedup values owing to the larger number of processors (vCPUs) and the amount of available RAM. The speedup for clusters that used m5.xlarge and m5.2xlarge machines also produced good results, mainly for volumetric (3D) reconstruction. The reason is that, after rebuilding the volume, the developed algorithm saves the data to disk directly from the worker nodes, rather than sending it to the master node, thereby reducing the communication time.

Fig. 10
figure 10

Speedup evaluation for a set of volumetric (3D) reconstructions

From the perspective of analyzing the big data infrastructure, the results illustrate fast reconstruction from tomographic sinograms, even when considering a total of 1960 high-resolution sinograms or complete matrices from tomographs. In fact, the described method proved adequate, as it became possible to consider a large number of analyses, meeting a greater demand for tomographic evaluation in less time. Next, the efficiency measurements are discussed to analyze the operation of the method as a function of the cluster configuration.

4.3 Efficiency assessment

As mentioned previously, the efficiency measure consists of the ratio of the speedup to the number of processors; it allows evaluating how much parallelism can be explored in an algorithm, as well as quantifying the utilization of each processor.

Figure 11 shows the efficiency evaluation for the 2D tomographic reconstruction. In this plot, the clusters with m5.4xlarge machines and 10 nodes showed better efficiency, as expected.

Fig. 11
figure 11

Efficiency evaluation for 2D tomographic reconstruction

Figure 12 shows the efficiency evaluation for the volumetric (3D) tomographic reconstruction. In this plot, the cluster with six m5.4xlarge nodes has the highest efficiency compared to the other evaluated configurations, including clusters with machines of the same model but a larger number of nodes. Such a result may also affect the final processing cost.

Fig. 12
figure 12

Efficiency evaluation for 3D tomographic reconstruction (volumetric)

Furthermore, by considering the processing cost of each machine in the cluster, it was possible to observe the ratio between the sequential and parallel processing costs. Table 6 presents the cost in US dollars per hour as a function of the machine model used for processing.

Table 6 Processing cost in US dollars (USD)

It can be observed in Table 7 that the cluster configuration with six nodes based on the m5.4xlarge model presented the best cost ratio between sequential and parallel processing. For the 2D tomographic reconstruction, parallel processing was 11 times more cost-effective than sequential processing. Similarly, for the volumetric (3D) tomographic reconstruction, parallel processing proved superior by a factor of approximately 33 compared to sequential processing. For this reason, the cluster with six nodes based on the m5.4xlarge model (one master node and five worker nodes) was chosen not only to process the experimental samples but also to evaluate the selection process of their tomographic projections.

Table 7 Cost ratio between sequential and parallel processing for both 2D and volumetric (3D) tomographic reconstructions

For a comparative analysis of the developed method against a commercially available algorithm for 2D tomographic reconstruction, projection matrices containing 976 projections with 2000 points per projection, obtained from the heterogeneous phantom presented in Section 3.4, were considered. For this comparison, all tomographic projections were used, and reconstructions were carried out with both the SkyScan 1172 software, which uses the Feldkamp-Davis-Kress (FDK) algorithm, and the developed method. Figure 13 presents the result of the comparative study, in which the developed method took an average of 1.07 s to reconstruct a slice, whereas the SkyScan 1172 software took an average of 4.00 s to reconstruct the same slice, i.e., with the same dimensions and number of tomographic projections.

Fig. 13
figure 13

Time comparison between the 2D tomographic reconstruction of the SkyScan 1172 software and the method proposed in this study

After the analyses, it was observed that the processing time for tomographic reconstruction in the big data environment was approximately 35 min when considering the 1960 sinograms of a sample, which corresponds to a total of 1,912,960 projections, or 7.13 GB of data. This processing time includes the loading of the projections into the environment, the selection of the projections in the sinograms, and the 2D and volumetric (3D) reconstructions.

4.4 Structural similarity assessment

The SSIM was observed, which allowed the structural information of the images reconstructed from a subset of projections to be evaluated. This allowed the quality of the 2D reconstruction to be verified under a reduced amount of data. The closer the SSIM value is to \(1\), the more similar the image reconstructed from fewer projections is to the image reconstructed with all available projections.

Initially, the SSIM analysis for the phantom was performed considering \(498\) slices, chosen according to the strategy described above. Table 8 presents the minimum, median, and maximum values of the SSIM and indicates the corresponding slice and projection selection rate.

Table 8 Minimum, median, and maximum SSIM values calculated for the phantom

The selection rate was \(62.50\%\) for an SSIM of \(0.813\); that is, with a reduction of \(37.50\%\) from the initial data set, it was still possible to obtain an SSIM value of approximately \(0.800\).

Figure 14 presents the reconstructed images of the slices mentioned in Table 8. It can be observed that the main information of the slices was preserved even with the reduction of the number of projections.

Fig. 14
figure 14

Slices of phantom that represent the minimum, median and maximum values of the SSIM measure. All images are represented by the gray scale ranging from 0 to 255

The first analyzed data set refers to cowpea seeds, whose SSIM values are reported in Table 9.

Figure 15 shows reconstructed images of the slices of an analyzed sample of cowpeas.

In the cowpea samples, the highest selection rate (\(82.27\%\)) resulted in the highest SSIM value (\(0.947\)), whereas the lowest selection rate (\(49.59\%\)) did not produce the lowest SSIM value. In this case, the slice with the lowest SSIM value (\(0.687\)) presented a projection selection rate of \(61.27\%\), identical to the selection rates of the slices whose SSIM values correspond to the medians of each set.

Table 9 Evaluation of tomographic images of cowpea samples: minimum, median, maximum SSIM values, and rate of projection selection (Sel.) of the analyzed samples
Fig. 15
figure 15

Tomographic images of a sample of cowpeas corresponding to the reconstructed slices, where the SSIM values observed were minimum, median, and maximum. All images are represented by a grayscale ranging from 0 to 255

Table 10 Results for the evaluation of tomographic images of sunflower samples. Minimum, median, maximum SSIM values, and rate of projection selection (Sel.) of the analyzed samples
Fig. 16
figure 16

Tomographic images of a sample of sunflower corresponding to the reconstructed slices, where the SSIM values observed were minimum, median, and maximum. All images are represented by a grayscale ranging from 0 to 255

Table 10 presents the values obtained from the SSIM, as well as the projection selection rates, for the second sample set, sunflower seeds.

In the sunflower seed samples, the reconstructed image of slice \(1368\) obtained an SSIM value of \(0.910\) with a projection selection rate of \(65.98\%\). It is interesting to note that slice \(1020\) obtained the highest SSIM value in the sample set but selected \(21\%\) more projections than slice \(1368\) to obtain an SSIM value only \(2.5\%\) higher. In contrast, the reconstructed image with the lowest SSIM value (\(0.734\)) selected more projections (\(73.98\%\)). It is also worth noting that slice \(500\) selected less than \(60\%\) of the projections and obtained an SSIM value greater than \(0.800\).

Figure 16 presents the reconstructed images of the slices of an analyzed sample of sunflower.

Table 11 presents the SSIM and the rate of selection of the projections of the set of chickpea seed samples.

Table 11 Evaluation of tomographic images of chickpea samples: minimum, median, maximum SSIM values, and rate of projection selection (Sel.) of the analyzed samples

In the chickpea sample group, the SSIM values were higher than \(0.700\), with selection rates between \(54.30\%\) and \(71.31\%\). For the lowest selection rate (\(54.30\%\)), \(530\) projections were selected, producing an SSIM value of \(0.744\). In contrast, the highest selection rate in the group (\(71.31\%\)) obtained an SSIM of \(0.902\), that is, an SSIM value \(21.23\%\) higher than that of the lowest selection rate, using \(31.32\%\) more projections.

In this group, an increase of a little more than \(30\%\) in the selection rate produced a gain of more than \(20\%\) in SSIM, which can be considered an expressive result. Figure 17 presents reconstructed images of the slices of an analyzed chickpea sample.

Fig. 17
figure 17

Tomographic images of a sample of chickpeas corresponding to the reconstructed slices, where the SSIM values observed were minimum, median, and maximum. All images are represented by a grayscale ranging from 0 to 255

Table 12 presents the values obtained from the SSIM and the selection rate of the projections for the set of wheat seed samples.

Table 12 Results of the evaluation of tomographic images of wheat samples: minimum, median, maximum SSIM values, and rate of projection selection (Sel.) of the analyzed samples

The wheat seed sample set showed higher selection rates than the chickpea samples, with the highest selection rate being \(74.90\%\). In contrast, an SSIM value of \(0.912\) was observed with \(63.11\%\) of the projections selected. In the median column, slice \(264\) obtained an SSIM value of \(0.819\) with a selection rate lower than \(60\%\).

Figure 18 presents reconstructed images of the slices of an analyzed wheat sample.

Fig. 18
figure 18

Tomographic images of a sample of wheat corresponding to the reconstructed slices, where the SSIM values observed were minimum, median, and maximum. All images are represented by a grayscale ranging from 0 to 255

The next set consisted of pumpkin seed samples. Table 13 presents the values obtained from the SSIM and the selection rate of the projections.

Figure 19 presents reconstructed images of the slices of an analyzed pumpkin sample.

Table 13 shows that all slices obtained an SSIM value higher than \(0.750\). Table 14 presents the values obtained from the SSIM and the selection rate of projections for the set of soybean samples.

Table 14 shows that slice \(948\) obtained a higher SSIM value than slice \(268\), although both slices have similar selection rates. Figure 20 presents reconstructed images of the slices of an analyzed soybean sample.

Table 13 Results of the evaluation of tomographic images of pumpkin samples: minimum, median, maximum SSIM values and rate of projection selection (Sel.) of the analyzed samples
Fig. 19
figure 19

Tomographic images of a sample of pumpkin corresponding to the reconstructed slices, where the SSIM values observed were minimum, median, and maximum. All images are represented by a grayscale ranging from 0 to 255

The last set refers to the peanut seed samples. Table 15 presents the values obtained from the SSIM and the selection rate of the projections.

Table 15 shows that slice \(992\) obtained the highest SSIM value (\(0.959\)) with a selection rate of \(74.08\%\), whereas the slices with SSIM values of approximately \(0.80\) had selection rates of around \(63\%\).

Figure 21 presents reconstructed images of the slices of an analyzed peanut sample.

After the analysis of each set of seed samples, Fig. 22 presents, as a boxplot, the SSIM measurements for the slices of the seed samples, totaling 16,434 reconstructed and analyzed slices, and provides a complete view of the SSIM for the image base used in this evaluation.

As mentioned in Section 3.4, for each sample of the different agricultural seeds, the CT image reconstruction resulted in 498 slices, arranged along the z-axis. Figure 23 presents the result of a statistical analysis calculating the coefficient of determination (R\(^2\)) of a linear regression of the SSIM results as a function of the number of selected projections, considering the central slice of each evaluated seed sample. The R\(^2\) obtained was equal to 0.87. In the chart, each point corresponds to the central slice of each sample considered in this analysis.

The next section presents the results obtained from the analysis of the PSNR measure for the set of \(33\) samples and their respective tomographic projection matrices.

4.5 Peak signal to noise ratio assessment

The objective of the PSNR measurement analysis was to observe the signal-to-noise ratio of the reconstructed image in comparison to the reference image to verify whether the selection of projections implied an increase in noise, compromising the 2D slice reconstruction stage.

This analysis was motivated by the fact that, when projections were selected, the amount of data was reduced, and the selection might have degraded the image by generating artifacts during the tomographic reconstruction. Table 16 presents the calculated PSNR values for the phantom; the calculations for the samples are presented later.

Table 14 Results of the evaluation of tomographic images of soybean samples: minimum, median, maximum SSIM values, and rate of projection selection (Sel.) of the analyzed samples
Fig. 20
figure 20

Tomographic images of a sample of soybean corresponding to the reconstructed slices, where the SSIM values observed were minimum, median, and maximum. All images are represented by a grayscale ranging from 0 to 255

Table 15 Evaluation of tomographic images of peanut samples: minimum, median, maximum SSIM values, and rate of projection selection (Sel.) of the analyzed samples
Fig. 21
figure 21

Tomographic images of a peanut sample corresponding to the reconstructed slices, where the SSIM values observed were minimum, median, and maximum. All images are represented by a grayscale ranging from 0 to 255

Fig. 22
figure 22

SSIM analysis of the reconstructed images of the seeds set containing 33 samples

Fig. 23
figure 23

Linear regression considering the results of SSIM as a function of the number of selected projections for each central slice from the 33 tomographic assays carried out for the samples of different agricultural seeds

Table 16 Minimum, median, and maximum PSNR values calculated for the phantom

Figure 24 shows slice \(1066\), which corresponds to the minimum PSNR value; slice \(1928\), which corresponds to the median; and slice \(988\), which corresponds to the maximum.

Figure 25 presents the PSNR calculation for the analysis of the database used in this evaluation, which totaled \(16,434\) slices distributed among the \(33\) samples of agricultural seeds.

As shown in Fig. 25, the median of the PSNR for all samples was above \(25\) dB, which indicates that the process of projection selection did not compromise the quality of the 2D tomographic reconstruction.

4.6 Volumetric visualization of tomographic images

In this section, the results of the 3D (volumetric) visualization of the samples reconstructed in the big data environment are presented. As discussed, the slices of the samples were divided into regions and organized into blocks that were interpolated to generate a portion of the volume, or subvolume, of the sample. The subvolumes were then archived in Amazon's environment (AWS S3). Therefore, the process of viewing a sample consisted of two steps: (i) grouping the subvolumes to generate the complete volume of the sample; and (ii) viewing the complete volume using the itkwidgets tool.

Both steps occurred outside the big data environment structured in this study, and the following resources were used: the Python language and the Jupyter notebook development environment integrated with Kitware's VTK visualization library through the itkwidgets plugin. The visualization was performed on a computer with \(32\) GB of RAM and an Intel Core i9 processor.

In the first stage, a Python script was prepared to recover the subvolumes stored in AWS S3. Each subvolume carried the identification of its position, so it was possible to assemble the complete volume from this information. The second stage was to load the complete volume into memory, using the itkwidgets plugin features, and view it. Figure 26 shows a volumetric view of the phantom.
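The two stages can be sketched as follows, assuming boto3 for the S3 retrieval and hypothetical bucket, key, and index names (the actual script is not reproduced in the paper):

```python
# Sketch of the viewing workflow (assumed names throughout): (i) assemble
# the complete volume from the subvolumes stored in AWS S3; (ii) view it.
import numpy as np
import boto3
from itkwidgets import view

s3 = boto3.client("s3")
bucket = "ct-recon-volumes"                              # hypothetical bucket

# Hypothetical index mapping S3 keys to (z, row, col) voxel offsets.
subvolume_index = {"sample01/00000000.npy": (0, 0, 0)}

volume = np.zeros((996, 2000, 2000), dtype=np.float32)   # 15 GB, as in the text
for key, (z0, r0, c0) in subvolume_index.items():
    s3.download_file(bucket, key, "/tmp/part.npy")
    part = np.load("/tmp/part.npy")
    d, h, w = part.shape
    volume[z0:z0 + d, r0:r0 + h, c0:c0 + w] = part

view(volume)    # interactive rendering in a Jupyter notebook via itkwidgets
```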

Fig. 24
figure 24

Slices of phantom that represent the minimum, median and maximum values of the PSNR measure. All images are represented by the gray scale ranging from 0 to 255

Fig. 25
figure 25

PSNR analysis of the reconstructed images of the 33 agricultural seed samples

Fig. 26
figure 26

Cut performed on phantom volumetric visualization

The phantom was reconstructed with \(996\) slices of \(2000 \times 2000\) pixels. Of all the slices, \(498\) were real and \(498\) were virtual, generated by the interpolation process. Thus, the entire volume comprised \(2000 \times 2000 \times 996\) voxels, totaling \(15\) GB of data. Figure 27 shows the volumetric view of a sample of cowpeas.

Fig. 27
figure 27

Volumetric visualization of a sample of cowpeas with cut

Similar to the phantom, the volume of a sample of cowpeas was reconstructed with a size of \(2000 \times 2000 \times 996\) voxels, resulting in \(15\) GB of data.

5 Conclusion

This work presented a method for the 2D and 3D (volumetric) reconstruction of high-resolution tomographic images of agricultural samples using big data techniques. An important aspect of the developed method was the parallelization strategy adopted for the 2D reconstruction, which consisted of parallelizing over the projection matrices rather than over the individual projections. This strategy allowed the reconstruction of agricultural samples, such as seeds, in a distributed big data environment.

For the execution of this customized method, it was necessary to structure a cluster of computers. The infrastructure used for this purpose was provided through the Amazon AWS service. In this context, 12 different cluster configurations were evaluated, and the configuration that allowed not only the processing of the greatest amount of tomographic data but also the greatest efficiency was selected as the final arrangement for the developed method.

The configuration that prevailed was the one containing six nodes, as it presented greater efficiency than the configuration containing 10 nodes, despite its lower speedup value. Therefore, it was possible to verify that a higher speedup value does not necessarily imply greater efficiency. It should also be noted that the efficiency reached values greater than 1. Such a result occurred because of better communication between processors and memories and the use of distributed processing methods. Moreover, to obtain the parallel processing time, Apache Spark loaded the projection matrices, or a large part of them, into memory, distributing them among the worker nodes. Additionally, the analysis of the sequential/parallel processing cost corroborated the understanding that the efficiency measure conveys more accurate information about the architecture of the environment.

Another relevant aspect presented in this paper is the selection, in each sinogram, of the smallest number of tomographic projections that best represent the object in the reconstructed image. The energy information of each projection was considered to identify those most relevant for obtaining a two-dimensional tomographic reconstruction. Additionally, organizing the tomographic projections into energy classes proved to be an adequate alternative because it considered the entire spectrum of energies contained in a sinogram. A set of \(33\) seed samples and a heterogeneous plexiglass phantom were considered, totaling \(66,640\) projection matrices, or \(242\) GB of tomographic data. From this set, \(16,932\) projection matrices, or \(498\) matrices per sample, were considered for evaluating the selection of tomographic projections and the quality of the 2D reconstruction. For these \(16,932\) matrices, the algorithm selected \(61.47\%\) to \(71.72\%\) of the projections, which implies a reduction of approximately \(28\% - 38\%\) per projection matrix. The SSIM metric was calculated for each projection matrix, and the median SSIM value of each sample was observed. The SSIM analysis showed that the two-dimensional tomographic reconstruction of the samples from the selected projections yielded SSIM values higher than \(0.800\) for all the samples analyzed. It is also worth noting that the PSNR analysis corroborated the results obtained in the SSIM analysis. Therefore, the results showed that the reduction in the number of projections for the seed samples (i.e., peanut, cowpea, sunflower, chickpea, wheat, pumpkin, and soybean) did not compromise the structural information contained in the reconstructed images, as observed in the SSIM and PSNR analyses. The 3D (volumetric) reconstruction established conditions for evaluating seeds of different surfaces and shapes, without restricting the analysis to flat seeds.

Finally, considering that the volume of data in the agricultural area has increased considerably, as in seed analysis, the integration of computational and electronic techniques becomes urgent in the search for new solutions able to handle this new scenario and meet agricultural demands. The method developed in this study therefore contributes to the execution of tomographic reconstruction to improve analysis and decision-making in agriculture. In addition, the main contribution of this study was to prepare a framework for the tomographic reconstruction of high-resolution images for the analysis of agricultural samples, ready for execution in a big data environment.

The evaluation of additional strategies for the parallelization of the projection matrices and of other statistical distribution models for the selection of tomographic projections, such as the chi-square distribution, is proposed for future work.