1 Introduction

The Internet of Things (IoT) is a combination of sensor-based devices and systems that collect and transfer data over a network without manual control [1]. In many cases, power-hungry devices are used to connect a system’s data nodes with data centers [2]. New computing paradigms are applied to reduce unnecessary data transfers and the high power consumption they cause. Edge computing is one such solution: it brings computation and data storage closer to the data sources. In remote sensor systems, where online recharging is unavailable, power consumption is a vital factor. As a result, edge computing should rely on power-efficient devices.

A potential approach to minimize such systems’ power consumption involves analog integrated architectures, especially with transistors that operate in the sub-threshold region [3]. A particular field which can benefit from the advantages of analog computing and hardware parallelization is image processing, since it is usually computationally expensive. A popular image processing method is edge detection, which can be used in various applications, including medical diagnosis [4], real-time object recognition [5], navigation systems [6] and more [7, 8].

Only a few works in the literature address analog integrated image edge detection. In particular, [9] demonstrates a two-stage network in which the first stage detects the edges by applying a series of filters and thresholding circuits, and the second stage reconstructs the original image from the detected edges. [10] presents a morphological edge detector which estimates the image’s erosion and dilation. In addition, [11] and [12] propose an analog implementation of the Sobel operator [13], achieving high-quality results at the cost of chip area and power consumption. Another existing approach is to combine current-mode [14] or voltage-mode [15, 16] mixed-signal circuits that employ convolution filters.

Although the aforementioned works are characterized by lower power consumption than their digital counterparts, the growing need for higher image resolutions requires even tighter power management, especially for battery-dependent IoT devices. In this direction, the work presented in [17] utilizes a compact Gaussian function circuit, reducing the circuit’s power dissipation to 0.9 μW per pixel. Nonetheless, this reduction was achieved with a custom edge-detection algorithm, at the cost of accuracy.

In this work, motivated by the challenges related to the trade-off between power efficiency and performance, we propose a high-performance compact analog integrated edge detector that takes advantage of an ultra-low-power current-mode Gaussian function circuit. Its architecture is directly integrated on a photodiode sensor array and produces a digital output without the need for any data converters. The proposed design can be used in various applications without significant modifications. This article extends the authors’ previous work [18], which presents a low-power, low-area and fast analog Bayesian classifier for thyroid disease detection, based on a low-power current-mode Gaussian function circuit. In particular, we present a hardware-friendly implementation of the Robert’s Cross operator (RCO) to structure an analog edge detector, using the same building blocks as in [18]. The implemented architecture achieves a low power dissipation of 33 nW per pixel, a Peak Signal-to-Noise Ratio (PSNR) of 27.7 dB and a Structural Similarity Index Metric (SSIM) of 0.81.

The remainder of this paper is organized as follows. Section 2 explains the mathematical foundations of the RCO. The proposed implementation and its basic building blocks are analyzed in Sect. 3. Section 4 presents the experimental results of the proposed edge detector and compares them with a software-based model. A comparison and discussion are provided in Sect. 5. Section 6 concludes the article.

2 Mathematical background

The RCO is one of the first and simplest edge detectors in the literature [19, 20]. It is a computationally efficient differential operator that, despite the approximations involved, achieves impressive results. The RCO detects regions of high spatial frequency along the diagonal directions and therefore produces results that resemble those of human perception.

We assume an image with an \(N \times M\) resolution, where \(x_{i,j}\) denotes the light intensity of pixel \((i,j)\), for each \(i < N\), \(j < M\). First, consider \(y_{i,j}\) as the square root of \(x_{i,j}\):

$$\begin{aligned} y_{i,j} = \sqrt{x_{i,j}}, \end{aligned}$$
(1)

then, the approximation of the image’s gradient \(z_{i,j}\) is calculated as:

$$\begin{aligned} z_{i,j} = \sqrt{(y_{i,j} - y_{i+1,j+1})^2 + (y_{i+1,j} - y_{i,j+1})^2}. \end{aligned}$$
(2)

This gradient is in fact a grayscale image in which the detected edges are characterized by a high intensity value, whereas flat areas by a low one.

In practice, the RCO is calculated by convolving the given image with two \(2 \times 2\) kernels \({\mathbf {M}}_x\) and \({\mathbf {M}}_y\), one for each diagonal direction, given by:

$$\begin{aligned} {\mathbf {M}}_x = \begin{bmatrix} +1 &{} 0 \\ 0 &{} -1 \end{bmatrix} {~\mathrm {and}~} {\mathbf {M}}_y = \begin{bmatrix} 0 &{} -1 \\ +1 &{} 0 \end{bmatrix}. \end{aligned}$$
(3)

Through the convolution of \({\mathbf {M}}_x\) and \({\mathbf {M}}_y\) with the image considered, the two components \({\mathbf {G}}_x\) and \({\mathbf {G}}_y\) are calculated as:

$$\begin{aligned} {\mathbf {G}}_x= & {} \mathbf {Image} *{\mathbf {M}}_x, \end{aligned}$$
(4)
$$\begin{aligned} {\mathbf {G}}_y= & {} \mathbf {Image} *{\mathbf {M}}_y. \end{aligned}$$
(5)

where \(*\) denotes the convolution operator. The calculation process for both \({\mathbf {G}}_x\) and \({\mathbf {G}}_y\) can be simplified as:

$$\begin{aligned} G_x(i,j)= & {} y_{i,j} - y_{i+1,j+1}, \end{aligned}$$
(6)
$$\begin{aligned} G_y(i,j)= & {} y_{i+1,j} - y_{i,j+1}. \end{aligned}$$
(7)

In this case, the approximation of the image’s gradient \(\nabla I(i,j)\) is calculated as:

$$\begin{aligned} \nabla I(i,j) = \sqrt{G_x(i,j)^2 + G_y(i,j)^2}, \end{aligned}$$
(8)

for each pixel \((i,j)\) with \(i < N\), \(j < M\). Finally, one can optionally apply a threshold to the gradient \(\nabla I(i,j)\) to produce a binary image \(I_{bin}(i,j)\) that indicates the edges:

$$\begin{aligned} I_{bin}(i,j) = {\left\{ \begin{array}{ll} 0 &{} \text {if } \nabla I(i,j) < I_{th} \\ 1 &{} \text {if } \nabla I(i,j) \ge I_{th} \end{array}\right. } \quad , \end{aligned}$$
(9)

where \(I_{th}\) is the threshold parameter.
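The software form of the operator can be sketched in a few lines of NumPy. This is only an illustrative reading of Eqs. (1) and (6) to (9), not the authors’ reference code; the function name `roberts_cross` and its arguments are assumptions introduced here. Because each output needs the neighbours at \(i+1\) and \(j+1\), the result has \((N-1) \times (M-1)\) entries.

```python
import numpy as np

def roberts_cross(x, i_th=None):
    """Roberts Cross gradient of a 2-D intensity image x (hypothetical helper).

    Returns the grayscale gradient of Eq. (8), or the binary edge map of
    Eq. (9) when a threshold i_th is given.
    """
    y = np.sqrt(np.asarray(x, dtype=float))   # Eq. (1)
    g_x = y[:-1, :-1] - y[1:, 1:]             # Eq. (6): y[i,j]   - y[i+1,j+1]
    g_y = y[1:, :-1] - y[:-1, 1:]             # Eq. (7): y[i+1,j] - y[i,j+1]
    grad = np.sqrt(g_x**2 + g_y**2)           # Eq. (8)
    if i_th is None:
        return grad
    return (grad >= i_th).astype(np.uint8)    # Eq. (9)
```

For a vertical step from 0 to 1, only the column of \(2 \times 2\) windows that straddles the step produces a non-zero gradient.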

3 Proposed analog edge detector

A hardware-friendly modification of the RCO as well as the building blocks and operation of the proposed analog edge detector are explained in this section. All transistors in the following design operate in the sub-threshold domain, with the power supply rails set to \(V_{DD} = -V_{SS} = 0.3~\mathrm {V}\) in order to reduce the power consumption of the entire circuit.

3.1 Hardware-friendly Robert’s cross operator

In the literature, Bump circuits are used to implement bell-like curves that closely resemble the Gaussian curve [21]. Despite being less accurate than other Gaussian function circuit implementations, their low power consumption and compactness make them preferable in applications that require simultaneous operation of multiple Gaussian function circuits [21]. Their wide range of applications includes RBF-based classifiers [22], neuromorphic circuits [23], fuzzy and neuro-fuzzy controllers [24] and smart sensor systems such as anomaly [25] and edge detection [17] circuits. In this work, we propose a hardware-friendly modification of the RCO (see Sect. 2) which facilitates a simple implementation of the analog edge detector by using Bump circuits as basic computational blocks. This is preferable to implementing the original RCO with squaring and square-root circuits, which are typically more challenging to design and more power-hungry than Bump circuits [21].

We assume the aforementioned image with an \(N\times M\) resolution. In our hardware-friendly implementation we use the following mapping \({\hat{z}}_{i,j}\) of the RCO \(z_{i,j}\), based on a Gaussian kernel transformation:

$$\begin{aligned} \begin{aligned} {\hat{z}}_{i,j}&\triangleq e^{-\frac{1}{2\sigma ^2} \left( z_{i,j}\right) ^2} \\&= e^{-\frac{1}{2\sigma ^2}\left( \sqrt{(y_{i,j} - y_{i+1,j+1})^2 + (y_{i+1,j} - y_{i,j+1})^2}\right) ^2} \\&= e^{-\frac{1}{2}\frac{(y_{i,j} - y_{i+1,j+1})^2}{\sigma ^2}} e^{-\frac{1}{2}\frac{(y_{i+1,j} - y_{i,j+1})^2}{\sigma ^2}}, \end{aligned} \end{aligned}$$
(10)

where the standard deviation \(\sigma \) (equivalently, the variance \(\sigma ^2\)) acts as a parameter that controls the sensitivity of the edge detection operator. This equation can be expressed as follows:

$$\begin{aligned} {\hat{z}}_{i,j} = 2\pi \sigma ^2 {\mathcal {N}}(y_{i,j}\Vert y_{i+1,j+1}, \sigma ^2) {\mathcal {N}}(y_{i+1,j}\Vert y_{i,j+1}, \sigma ^2), \end{aligned}$$
(11)

where \({\mathcal {N}}(x\Vert \mu , \sigma ^2)\) is the univariate Gaussian function and is given by:

$$\begin{aligned} {\mathcal {N}}(x\Vert \mu , \sigma ^2) = \frac{1}{\sqrt{(2\pi ) \sigma ^2}} ~ e^{-\frac{1}{2} \frac{(x-\mu )^2}{\sigma ^2}}. \end{aligned}$$
(12)

Here, μ denotes the mean value of the Gaussian function. In practice, for the hardware implementation of (11), \(2\pi \sigma ^2\), for a given \(\sigma \), is a scalar constant and can be ignored. Also, unlike the original RCO, here, the edges are characterized by a low intensity value and the non-edges by a high one.
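The factorization in Eqs. (10) and (11) can be checked numerically. The following NumPy sketch is illustrative only; the helper names `gaussian`, `z_hat` and `z_hat_eq11` are assumptions introduced here, and `y` stands for the array of \(y_{i,j}\) values.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Univariate Gaussian N(x | mu, sigma^2) of Eq. (12)."""
    return np.exp(-0.5 * (x - mu) ** 2 / sigma**2) / np.sqrt(2 * np.pi * sigma**2)

def _neighbours(y):
    # y[i,j], y[i+1,j+1], y[i+1,j], y[i,j+1] for every 2x2 window
    return y[:-1, :-1], y[1:, 1:], y[1:, :-1], y[:-1, 1:]

def z_hat(y, sigma):
    """Product of two exponentials, last line of Eq. (10)."""
    a, d, c, b = _neighbours(y)
    return (np.exp(-0.5 * (a - d) ** 2 / sigma**2)
            * np.exp(-0.5 * (c - b) ** 2 / sigma**2))

def z_hat_eq11(y, sigma):
    """Same quantity written with two Gaussian functions, Eq. (11)."""
    a, d, c, b = _neighbours(y)
    return 2 * np.pi * sigma**2 * gaussian(a, d, sigma) * gaussian(c, b, sigma)
```

Both functions return values in \((0, 1]\): a flat \(2 \times 2\) window maps to 1 and a strong edge maps towards 0, which is the inversion with respect to the original RCO noted above.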

3.2 Edge detector architecture

In this paper, the basic building block is the Bump circuit introduced in our previous work [18], depicted in Fig. 1. It is composed of two neuron cells and a symmetrical current correlator biased by a cascode current mirror. The two neuron cells operate as a differential pair in which the differential voltage input is replaced by two input currents. Unlike a typical Bump circuit [21, 26, 27], where one of the differential pair’s voltage inputs acts as a constant parameter, here, based on (11), both \(I_{in1}\) and \(I_{in2}\) are inputs to the circuit. The two neurons produce two drain currents \(I_1\) and \(I_2\), which follow two complementary sigmoidal curves. Given these sigmoidal currents, the correlator’s output current resembles a Gaussian curve. The voltage parameter \(V_c\) and the bias current \(I_{bias}\) control the variance and the height of the Gaussian curve, respectively. In particular, as shown in Fig. 2, increasing the absolute value of \(V_c\) also increases the width of the Gaussian curve. Furthermore, the output current’s maximum value \(I_{out,max}\) is approximately equal to the bias current \(I_{bias}\). All transistors’ dimensions are summarized in Table 1.

Fig. 1 The proposed analog architecture implementing a Bump circuit. \(I_{in1}\), \(I_{in2}\), \(V_c\) and \(I_{bias}\) are the two input currents, the voltage controlling the variance and the bias current controlling the height of the Gaussian curve, respectively [18]

Fig. 2 Left: Parametric analysis over \(I_{bias}\), for \(I_{in1}\in [0,10]\) nA, \(I_{in2}=5\) nA, \(V_c=0\) V. Right: Parametric analysis over \(V_c\), for \(I_{bias}=12\) nA, \(I_{in1}\in [0,10]\) nA and \(I_{in2}=5\) nA [18]

Table 1 MOS transistors’ dimensions (Fig. 1)

Bump circuits can efficiently perform multiplication without additional components. In particular, consider two Bump circuits. If the second Bump circuit is biased (\(I_{bias2}\)) with the first Bump circuit’s output current (\(I_{out1}\)), the output current of the second Bump circuit equals the product of their respective Gaussian curves [27]. In this configuration, only the first Bump circuit is biased with a specified external bias current (\(I_{bias}\)). This topology constitutes the analog implementation of the RCO, depicted in Fig. 3, and its output current approximates the gradient of a \(2 \times 2\) image window.

Fig. 3 A 2-D Bump circuit that implements the analog Robert’s Cross operator

In the literature, Winner-Take-All (WTA) circuits are used to implement the \(\mathrm {argmax}\) operator [28]. In particular, an N-input, N-output WTA circuit is composed of N neurons, each one associated with one input and one output. If a given input \(I_{inj}\), \(j \le N\), is larger than the rest, then the respective output \(I_{outj}\) has a high value, whereas the rest are zero. A two-neuron Lazzaro WTA circuit, shown in Fig. 4, can also be used as a simple threshold circuit, where its second input is the threshold value \(I_{th}\). In the analog implementation edges have a low intensity value, whereas in the software implementation they have a high one. To compensate for this, the overall output of the circuit is taken from the neuron that has \(I_{th}\) as its input. Therefore, this topology operates as the complement of a threshold circuit. The analog implementation of the RCO with the threshold circuit is depicted in Fig. 5.
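At a purely behavioural level, the cell described in this subsection can be sketched as two cascaded Gaussians followed by a comparator. This is an idealized model under stated assumptions, not a circuit simulation: the Bump response is taken as an exact Gaussian of the input difference, the width is set by a hypothetical parameter `sigma` in nA (in the circuit it is controlled by \(V_c\)), and the WTA is reduced to an ideal comparison. All function names are introduced here for illustration.

```python
import numpy as np

def bump(i_in1, i_in2, i_bias, sigma):
    """Idealized Bump circuit: Gaussian of (i_in1 - i_in2) with peak i_bias."""
    return i_bias * np.exp(-0.5 * (i_in1 - i_in2) ** 2 / sigma**2)

def rco_cell(i_a, i_b, i_c, i_d, i_bias, sigma, i_th):
    """Behavioural model of one RCO cell; all currents in nA.

    i_a, i_d are the pixels (i,j) and (i+1,j+1);
    i_c, i_b are the pixels (i+1,j) and (i,j+1).
    """
    i_out1 = bump(i_a, i_d, i_bias, sigma)
    # The second Bump is biased by the first one's output, so the two
    # Gaussian responses are multiplied.
    i_out2 = bump(i_c, i_b, i_out1, sigma)
    # Complementary threshold: the neuron fed by i_th wins when the RCO
    # current is low, i.e. on an edge.
    is_edge = i_th > i_out2
    return i_out2, is_edge
```

With illustrative values `i_bias = 4`, `sigma = 1` and `i_th = 3`, a flat window such as `rco_cell(4, 4, 4, 4, 4, 1, 3)` keeps the output current at the bias level and is not flagged, whereas a window with a large diagonal difference collapses the output current and is flagged as an edge.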

Fig. 4 A two-neuron NMOS Lazzaro WTA circuit

Fig. 5 The 2-D Bump circuit that implements the analog Robert’s Cross operator paired with a Winner-Take-All circuit that operates as a threshold circuit. \(I_{th}\) is the threshold current

3.3 System-level architecture

The proposed circuitry can be used in multiple high-level architectures in order to regulate the trade-off between area, power consumption and operation speed. The straightforward solution, also proposed in [11] and shown in Fig. 6, is to construct an \(N\times M\) grid which is directly integrated on the photodiodes. This approach is suitable for accelerator circuits since it offers very high operation speeds at the cost of increased chip area and power consumption. On the other hand, a more area- and power-efficient configuration (shown in Fig. 7) consists of a single RCO cell, which is sequentially shifted across the entire image. Compared to the previous one, this approach requires significantly more computation time (proportional to the size of the image), as well as memories and digital circuitry to synchronize the whole procedure.

Unfortunately, both of the aforementioned approaches are inefficient for large images. Therefore, a middle ground, subsequently called the analog hardware-friendly implementation, is adopted in this work. It combines elements from both approaches and can be adjusted with respect to the image’s size. In particular, an image can be segmented into smaller parts, in each of which a single RCO cell is used as explained in Fig. 7. Then the processed segments are joined to re-form the full image. In this topology, the size of each segment acts as a hyper-parameter controlling the trade-off between efficiency and operation speed. However, a drawback of this approach is that additional digital circuitry for reconstructing the full image is also required. An explanatory demonstration of this approach is depicted in Fig. 8.
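The segmented scheme can be mimicked in software to see how the segment size enters. The sketch below is an illustration under assumptions that the text does not state: segments are square with a hypothetical side `seg`, each segment reads one extra row and column so that the \(2 \times 2\) window at its border is complete, and the segments are visited in a loop, whereas in hardware each segment would have its own RCO cell working in parallel.

```python
import numpy as np

def rco(x):
    """Software Roberts Cross gradient (Eqs. 1, 6-8) on a 2-D array."""
    y = np.sqrt(np.asarray(x, dtype=float))
    return np.hypot(y[:-1, :-1] - y[1:, 1:], y[1:, :-1] - y[:-1, 1:])

def segmented_rco(x, seg):
    """Process an image segment by segment and stitch the results together."""
    n, m = x.shape
    out = np.empty((n - 1, m - 1))
    for r in range(0, n - 1, seg):
        for c in range(0, m - 1, seg):
            # One extra row/column of overlap completes the border windows.
            tile = x[r:r + seg + 1, c:c + seg + 1]
            res = rco(tile)
            out[r:r + res.shape[0], c:c + res.shape[1]] = res
    return out
```

In this model, each cell performs at most `seg * seg` sequential evaluations, while the number of cells grows as the image area divided by `seg * seg`, which is the trade-off between speed and resources described above.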

Fig. 6 Conceptual system-level architecture, where multiple analog edge detector cells are directly integrated on the photodiodes

Fig. 7 Conceptual system-level architecture, where a single analog edge detector cell is shifted across the entire image

Fig. 8 Proposed system-level architecture, where multiple analog edge detector cells are shifted along the entire image

4 Simulation results

In this section, the analog hardware-friendly implementation and the software implementation of the RCO approximation are compared on several images. The analog edge detector has been designed using the Cadence IC suite in a TSMC 90 nm CMOS process, whereas the software implementation was evaluated using Python 3.7. All simulations presented are post-layout simulations, conducted on the layout depicted in Fig. 9. To mitigate mismatch and manufacturing variations, the layout follows the common-centroid technique and includes extra dummy transistors [29].

Fig. 9 Layout of the implemented RCO cell along with the WTA circuit

In order to import the images in Cadence, the pixel values were normalized in the range [2, 6] nA. These currents are the inputs of the Bump circuit, and values lower than 2 nA may lead to misleading results. The bias currents for the multivariate Bump circuits and the WTA circuit were set to \(I_{bias,b} = 4\) nA and \(I_{bias,wta} = 6\) nA, respectively. These are the minimum recommended current values for the proper operation of the edge detector. The hyperparameters \(V_c\) and \(I_{th}\) directly affect the results of the edge detector. For the simulations in this section their values are set to \(V_c=0.3\) V and \(I_{th}=3\) nA.
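The normalization step can be illustrated with a simple linear mapping. The text does not specify the exact mapping that was used, so the sketch below assumes 8-bit pixel values with a full scale of 255 and a linear map onto [2, 6] nA; the function name and its parameters are hypothetical.

```python
import numpy as np

def to_input_current_na(img, i_min_na=2.0, i_max_na=6.0, full_scale=255.0):
    """Linearly map pixel values in [0, full_scale] to currents in nA.

    Assumption: a linear map; the source only states the target range.
    """
    img = np.asarray(img, dtype=float)
    return i_min_na + (img / full_scale) * (i_max_na - i_min_na)
```

Under this assumption a black pixel maps to 2 nA, a white pixel to 6 nA and mid-gray to 4 nA, so no input falls below the 2 nA limit mentioned above.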

In order to quantitatively evaluate the edge detector’s performance, three figures of merit that are widely used in image assessment [30] are applied. Let \({\mathbf {X}}\) be a reference image and \({\mathbf {Y}}\) the produced image. The SSIM is a perceptual metric which is calculated according to the following formula [31]:

$$\begin{aligned} SSIM({\mathbf {X}},{\mathbf {Y}})=\frac{(2\mu _{\mathbf {X}}\mu _{\mathbf {Y}}+C_1)(2\sigma _{{\mathbf {X}}{\mathbf {Y}}}+C_2)}{(\mu _{\mathbf {X}}^2+\mu _{\mathbf {Y}}^2+C_1)(\sigma _{{\mathbf {X}}}^2+\sigma _{{\mathbf {Y}}}^2+C_2)}, \end{aligned}$$
(13)

where \(\mu _{{\mathbf {X}}}\) is the average of \({\mathbf {X}}\) (similarly for \({\mathbf {Y}}\)), \(\sigma _{{\mathbf {X}}}^2\) is the variance of \({\mathbf {X}}\) (similarly for \({\mathbf {Y}}\)) and \(\sigma _{{\mathbf {X}}{\mathbf {Y}}}\) is the covariance of \({\mathbf {X}}\) and \({\mathbf {Y}}\). Furthermore, for the coefficients \(C_1\) and \(C_2\) we select \(C_1=(0.01 L)^2\) and \(C_2=(0.03 L)^2\), where L is the specified dynamic range of the images (\(L=1\) in our case).
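Read literally, Eq. (13) uses statistics of the whole image. The NumPy sketch below implements that single-window reading; it is an assumption that this matches the evaluation used here, since many SSIM implementations instead average the index over local sliding windows. The function name `ssim_global` is introduced for illustration.

```python
import numpy as np

def ssim_global(x, y, dynamic_range=1.0):
    """SSIM of Eq. (13) computed from whole-image statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return num / den
```

Identical images give an index of exactly 1, and any mismatch in mean, variance or covariance lowers it.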

The PSNR is given by [32, 33]:

$$\begin{aligned} PSNR({\mathbf {X}},{\mathbf {Y}})=20\log _{10}\left( \frac{\max \{{\mathbf {X}}\}}{\sqrt{MSE({\mathbf {X}},{\mathbf {Y}})}}\right) , \end{aligned}$$
(14)

where MSE is the mean squared error between the images \({\mathbf {X}}\) and \({\mathbf {Y}}\) and is given by [34]:

$$\begin{aligned} MSE({\mathbf {X}},{\mathbf {Y}})=\frac{1}{N M}\sum _{i,j=1}^{N,M}\mid {\mathbf {X}}(i,j)-{\mathbf {Y}}(i,j) \mid ^2, \end{aligned}$$
(15)

for each pixel \((i,j)\) of the denoted images. The Mean Absolute Percentage Error (MAPE) between the denoted images is given as:

$$\begin{aligned} MAPE({\mathbf {X}},{\mathbf {Y}}) = \left( \frac{\sum _{i,j=1}^{N,M} \mid {\mathbf {X}}(i,j) - {\mathbf {Y}}(i,j) \mid }{ \sum _{i,j=1}^{N,M} {\mathbf {X}}(i,j)}\right) 100\%, \end{aligned}$$
(16)

note that in (16) the mean value terms \(\frac{1}{NM}\) cancel each other out.
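Equations (14) to (16) translate directly into NumPy. The sketch below is illustrative and the function names are introduced here; note that the PSNR is undefined for identical images, since the MSE in the denominator is then zero.

```python
import numpy as np

def mse(x, y):
    """Mean squared error, Eq. (15)."""
    return np.mean(np.abs(x - y) ** 2)

def psnr(x, y):
    """Peak signal-to-noise ratio in dB, Eq. (14); x is the reference."""
    return 20.0 * np.log10(np.max(x) / np.sqrt(mse(x, y)))

def mape(x, y):
    """Mean absolute percentage error in percent, Eq. (16); x is the reference."""
    return 100.0 * np.sum(np.abs(x - y)) / np.sum(x)
```

As a small worked case, a reference with peak value 1 and a uniform error of 0.1 gives an MSE of 0.01 and hence a PSNR of 20 dB.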

To test the proposed edge detector, a typical \(512 \times 512\) image (see Fig. 10a) is used and the results are presented in Fig. 10b, c. The quality of the analog generated images can be assessed by visual inspection of areas with a high edge concentration. Two representative examples are the face and the feather: in the former, the analog implementation adequately captures the basic features of a human face, but in the latter it produces a generic depiction with fewer details. This is true for the images both without and with the threshold (Fig. 10b, c, respectively).

Fig. 10 Edge detection on an image of Lenna

These results are also verified by the aforementioned metrics. In particular, the PSNR and the SSIM between the software and the analog generated images are equal to 28.4 dB and 0.85, respectively (SSIM values over 0.9 indicate that the differences are not detectable by the human eye). It should be noted that the PSNR is calculated on the image without applying any threshold (see Fig. 10b), whereas the SSIM is computed between the binary images (see Fig. 10c). To further verify the performance of the proposed edge detector qualitatively, the produced images for two additional reference images are provided. The results are presented in Fig. 11a, b, and the PSNR and SSIM metrics for all images are summarized in Table 2.

Fig. 11 Top: Edge detection on an image of a cameraman. Bottom: Edge detection on an image of a bike

Table 2 Performance summary for analog edge detectors

The proposed edge detector is also tested for its sensitivity to PVT variations. Specifically, a Monte-Carlo analysis with \(N=200\) points is conducted on 4 random pixels that indicate an edge and on 4 random pixels that indicate a flat area. Both Monte-Carlo analysis histograms are shown in Fig. 12. Their mean values are \(\mu _{edges} = 0.55\) nA and \(\mu _{flat} = 2.66\) nA, with standard deviations of \(\sigma _{edges} = 0.28\) nA and \(\sigma _{flat} = 0.20\) nA, respectively.
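One way to read these statistics is to compare the \(\pm 3\sigma \) bands of the two populations. The short calculation below does this with the reported means and standard deviations; treating the histograms as roughly normal is an assumption made here for illustration, not a claim of the analysis, and the helper name is hypothetical.

```python
# Reported Monte-Carlo statistics, in nA.
MU_EDGES, SIGMA_EDGES = 0.55, 0.28
MU_FLAT, SIGMA_FLAT = 2.66, 0.20

def sigma_band(mu, sigma, k=3.0):
    """Interval [mu - k*sigma, mu + k*sigma]."""
    return mu - k * sigma, mu + k * sigma

edge_band = sigma_band(MU_EDGES, SIGMA_EDGES)
flat_band = sigma_band(MU_FLAT, SIGMA_FLAT)

# Distance between the top of the edge band and the bottom of the flat band.
gap_na = flat_band[0] - edge_band[1]
print(f"edge band: {edge_band}, flat band: {flat_band}, gap: {gap_na:.2f} nA")
```

Under this assumption the two bands do not overlap: the edge band ends near 1.39 nA and the flat band starts near 2.06 nA. The lower end of the edge band is negative, which is not physical for this current and shows the limits of the normal approximation.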

Fig. 12 Left: Post-layout Monte-Carlo sensitivity analysis histogram for 4 random pixels that indicate an edge. Right: Post-layout Monte-Carlo sensitivity analysis histogram for 4 random pixels that indicate a flat area

5 Performance summary and discussion

A performance summary of the circuit specifications of existing analog edge detectors is presented in Table 3. The aim of this work was to lower the detector’s power consumption. Table 3 shows that the proposed work significantly outperforms the rest in terms of power consumption per pixel. Additionally, the proposed circuit achieves a high computation speed, measured in frames per second (FPS), compared to the existing works. Nonetheless, this increased performance is area-expensive, since the proposed detector requires \(2392~\mu \mathrm {m}^2\) per pixel, which is among the highest area per pixel values in Table 3.
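To put the per-pixel figures in perspective, the following back-of-the-envelope calculation scales them to a fully parallel grid. It assumes that power and area scale linearly with the pixel count and ignores the photodiodes, routing and any digital circuitry, so the totals are only indicative; the function name is hypothetical.

```python
POWER_PER_PIXEL_W = 33e-9       # reported power dissipation per pixel
AREA_PER_PIXEL_UM2 = 2392.0     # reported area per pixel

def parallel_grid_budget(n, m):
    """Total power (W) and area (mm^2) of an n x m grid of cells,
    assuming simple linear scaling of the per-pixel figures."""
    pixels = n * m
    power_w = pixels * POWER_PER_PIXEL_W
    area_mm2 = pixels * AREA_PER_PIXEL_UM2 * 1e-6   # 1 mm^2 = 1e6 um^2
    return power_w, area_mm2

power_w, area_mm2 = parallel_grid_budget(512, 512)
print(f"{power_w * 1e3:.2f} mW, {area_mm2:.0f} mm^2")
```

For a \(512 \times 512\) image this gives roughly 8.65 mW and about 627 mm\(^2\). Under these assumptions the total power stays modest while the area becomes the limiting factor, which is consistent with the area-related drawback noted above.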

Table 3 Performance summary for analog edge detectors

The proposed analog edge detector targets smart sensor IoT devices that involve photodiodes. Its low power consumption and high-quality results, which approach those of digital and software-based systems, make it suitable for high-performance systems. To incorporate the analog edge detector in a smart sensor system, additional components are required apart from the digital circuitry that synchronizes the shift of the RCO cells along the photodiode array (see Fig. 8). In particular, since the RCO suffers from noise distortion [19], a filtering circuit should be added between the photodiodes and the proposed edge detector. Furthermore, since the output of the circuit is in a binary format, no analog-to-digital converters are needed. Nonetheless, a digital memory is necessary for the reconstruction of the image that indicates the edges after the RCO cells are shifted along the entire photodiode array.

6 Conclusion

An analog edge detector based on a hardware-friendly modification of the RCO was presented with the aim of increasing the performance of the detector while maintaining a low power consumption. This is achieved by utilizing a low-power, current-mode Bump circuit and a Lazzaro WTA circuit. A single RCO cell, composed of these building blocks, can receive 100 K inputs per second while consuming only 33 nW. To evaluate the proposed architecture, 3 medium-resolution images were used. Post-layout simulation results suggest that the produced images achieve an average PSNR of 27.7 dB and an average SSIM of 0.81, indicating close agreement with the software implementation.