Background

The historical advances in stochastic computing (SC) can be grouped into four main phases [1]. The 1950s were the early years, covering mainly the work of John von Neumann. In the 1960s and 1970s, many advances were made by researchers such as B. R. Gaines, W. J. (Ted) Poppelbaum, Chusin Afuso, John Esch, Sergio Ribeiro, and others, who developed the software and hardware with which SC was used to perform arithmetic operations. The first conference on SC took place in 1978; after that, little research was done until 2000. In 2001, Howard Card published two papers on SC in the IEEE [2, 3], after which interest in SC increased. Subsequently, workshops on SC were held at the University of Waterloo in 2016 and in Montreal in 2017. Section 1 of this paper describes the initial work on SC, mainly from the 1950s to the 1970s. Section 2 describes recent developments in SC. Section 3 discusses errors, ways to minimize them, the impact of correlation on stochastic computation, and ways to manage and manipulate correlation in stochastic circuits. Section 4 covers optimization techniques for SC circuits. Sections 5 and 6 cover the advantages and drawbacks, and the conclusion and future scope, respectively.

1 Initial Work

There is a lag between the application of an input and the response at the output of every network, and error is an inherent part of such circuits. As shown in [4], this error can be exploited to synthesize reliable logic elements from unreliable components. Similarly, redundancy is present in biological neural models and can be used to compensate for component error. Hence, the use of bundles of wires instead of a single wire for carrying information was suggested. The probability of energizing a wire in a bundle must be random, because without randomness errors can be amplified instead of cancelled out. Bundles of wires can then be used to carry out arithmetic operations such as addition (using a stochastic multiplexer) and multiplication (based on Sheffer's stroke, as in a stochastic multiplier). The need for random selection of inputs gave rise to a third form of computing, apart from digital and analog, in which random pulse sequences (RPS) are combined with standard logic gates to realize various arithmetic and logical functions. This enabled a huge reduction in the area and cost of a computational element.

As shown in [3, 4], Afuso developed the RPS for SC. Since that system required a local random pulse generator, a better system, the synchronous random pulse generator (SRPS), was developed by Afuso under the guidance of Prof. Poppelbaum at the University of Illinois; the RPS was discrete, whereas the SRPS was digital. Fig. 1 shows the production of the RPS and SRPS. A noise diode generates random noise, the occurrence of noise spikes is detected against an adjustable detection voltage level and amplified, and the output of the difference amplifier is sampled by a clock. A pulse of standard height and length is generated at the output of the sampling circuit only if a pulse was detected in the previous slot. The pulse train generated at the output of the sampling device is completely random, and its frequency can be controlled by the detection level. However, buffering in the SRPS led to de-randomization, so the clocked random pulse sequence (CRPS) was developed. The CRPS mapped numbers in the range [−1, 1] onto the range [0, 1], which permitted all operations to be performed with only combinational circuitry, thus preserving the probability distribution.

Fig. 1

Simplified SRPS [4]

These RPSs were used to design different types of pulse processing systems, as seen in [4]: time stochastic processing, bundle processing, ergodic processing, and burst processing. The software algorithms were established and simulated on traditional computers, but applications like machine learning required specific complex hardware structures that were not available at the time. The main challenge was the design of an element that could store a value and vary it over long periods, and that could also serve as a weight to multiply with other variables. Such a system needed to occupy little area at low cost, because these elements were needed in large numbers in a practical system. The options for designing such a system were analog circuits, electro-chemical devices, or standard digital components built from semiconductor devices. Traditional analog integrate-and-multiply circuits are not stable and are costly in terms of area, while electro-chemical devices (transfluxors) are not reliable and require refined external circuitry to function properly. The high speed, good stability, small size, low cost, and large-scale integration capability of semiconductor devices made them the obvious choice for designing the computing elements. von Neumann and Pierce [4] give examples of stochastic computers developed during the 1960s, such as the portable stochastic computer (POSTCOMP), the adaptive digital element (ADDIE), the regular array of stochastic computing elements (RASCEL), TRANSFORMATRIX, and the autonomous processing element (APE). All these computers used one of the pulse processing systems and performed calculations using probability logic. However, with the advent of large-scale integration (LSI) in the late 1960s, interest in SC decreased until about 2000. The ability of stochastic circuits to effectively perform tasks such as decoding in communications and to implement complex computing tasks at low hardware cost in artificial neural networks has rekindled researchers' interest in this field. The next section describes how SC is used in computation today, the various methods by which an SC circuit can be designed, and the parameters that must be considered while doing so.

2 Recent Developments in Stochastic Computing

For performing stochastic computation, the binary number under consideration is first converted into the stochastic domain. All calculations are performed in the stochastic domain, and the result is then converted back to binary if the required output must be in binary form. Fig. 2 [5] shows a traditional stochastic number generator (SNG), which converts a binary number into a stochastic number and vice versa. A comparator serves as the probability conversion circuit (PCC): it compares the m-bit binary number B with an m-bit random number R produced by the random number source (RNS), emitting a 1 at the output whenever B > R. The resulting number X is called a stochastic number (SN) with probability pX, defined as the ratio of the number of 1s to the total number of bits in the stochastic bit stream. For example, the sequence X = 10101100 has pX = 4/8 = 0.5. All arithmetic operations are carried out in the stochastic domain using the generated SNs. The final output can then be converted back to binary using a counter that counts the 1s in the sequence X. The randomness and the length N of the created SN determine its accuracy; ideally, an SNG should generate a random binary sequence in which each new bit is independent of the previous bits. Various RNSs have been used, such as noise diodes, LFSRs [6], NLFSRs [7], nanodevices like the magnetic tunnel junction (MTJ) [8], and memristors; some are true RNSs, while others are pseudo-RNSs. Besides the one in Fig. 2, other generators have been designed, such as the weighted binary generator proposed by Gupta and Kumaresan [6], the 4-bit binary SNG by Van Daalen et al. [9], and a sampling-based generator [10].
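
As a concrete illustration, below is a minimal Python sketch of the comparator-based SNG of Fig. 2. The 8-bit Galois LFSR (tap mask 0xB8, a maximal-length configuration) stands in for the RNS; all function names are assumptions of this sketch, not of the original designs.

def lfsr8(seed=0xE1):
    """Yield 8-bit pseudo-random numbers; seed must be nonzero."""
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB8  # taps of x^8 + x^6 + x^5 + x^4 + 1
        yield state

def binary_to_stochastic(b, n, rns):
    """Comparator PCC: emit 1 whenever B > R, so P(1) is about b / 2**8."""
    return [1 if b > next(rns) else 0 for _ in range(n)]

def stochastic_to_binary(sn):
    """Counter: estimate the encoded value as the fraction of 1s."""
    return sum(sn) / len(sn)

rns = lfsr8()
x = binary_to_stochastic(128, 256, rns)  # encode 128/256 = 0.5
print(stochastic_to_binary(x))           # close to 0.5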

Fig. 2

Binary to stochastic and stochastic to binary converter [5]

Using the above circuit, all the basic arithmetic operations, memory-based operations, and finite state machine designs can be realized, as shown in Fig. 3. Multiplication is performed using an AND gate, and scaled addition using a multiplexer. Similarly, division is performed using a counter in a specific arrangement with a D flip-flop, and subtraction using an EX-OR gate. The inputs must lie in the range [0, 1] to be representable as probabilistic bit sequences; if not, they must be scaled. To represent polynomial or non-polynomial functions, such as trigonometric functions, whose coefficients do not lie in the interval [0, 1], the functions are converted to Bernstein polynomials. For operations using memory, consider a toggle flip-flop (TFF): a TFF has an output probability of 1/2 for any non-zero input, so it can be used to generate a stochastic number with a fixed probability, eliminating the need for an RNG. The finite state machine (FSM) implementation of SC is shown in Fig. 3f, where an up/down counter realizes the state transitions: the counter increments when X = 1 and decrements when X = 0. F is the output mapping, which can be realized as a combinational function of the state S.
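
To make the mapping from gates to arithmetic concrete, here is a minimal Python sketch of the multiplication and scaled addition of Fig. 3, treating bit streams as Python lists; the function names are assumptions of this sketch.

def sc_multiply(x, y):
    """AND gate: P(out = 1) = pX * pY for independent streams."""
    return [a & b for a, b in zip(x, y)]

def sc_scaled_add(x, y, sel):
    """MUX: P(out = 1) = (pX + pY) / 2 when P(sel = 1) = 0.5."""
    return [a if s else b for a, b, s in zip(x, y, sel)]

With two independent streams of probability 0.5 each, sc_multiply yields a stream whose probability is about 0.25.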

Fig. 3

Arithmetic, sequential, and FSM operations using SC [1]: a multiplication, b scaled addition, c division, d subtraction, e sequential element (flip-flop), f FSM

3 Errors and Correlation in Stochastic Computing

Stochastic circuits can be designed using two different approaches: reconfigurable stochastic circuits, whose inputs are programmable and which can be reused to implement different functions, and fixed stochastic circuits, which implement a single dedicated function. Since SC is an approximate form of computing, there are several known sources of errors [1], as shown in Fig. 4.

Fig. 4

Generic stochastic circuit with known sources of errors [1]

In the above circuit, C is the core of the stochastic circuit and can be either a sequential or a combinational logic circuit. X1, X2, …, Xn are the input stochastic numbers of length N and are assumed to be independent of each other. To convert the input X to a desired output probability, the ancillary inputs R1, R2, …, Rk, each with constant value Ri = 0.5, are used; the R's and X's are assumed to be independent of each other. Physical errors caused by hardware defects or by sources such as radiation are not shown in Fig. 4. The errors and the ways to overcome them are described in detail below:

Rounding Errors: These errors are caused by quantization. A bit stream of Y bits can represent only the (Y + 1) values SN = {0, 1/Y, 2/Y, ..., (Y − 1)/Y, 1}. If a number Z under consideration is not in this set, it is rounded to the closest value in the set. For example, with Y = 8 and Z = 0.130, Z must be rounded to 1/8 = 0.125. The precision can be increased by increasing the length of the SN.
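
A one-line sketch of this rounding step (the function name is assumed):

def quantize(z, Y):
    """Round z to the nearest representable value k / Y."""
    return round(z * Y) / Y

print(quantize(0.130, 8))  # 0.125, a rounding error of 0.005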

Approximation Errors: These errors occur because not all functions can be realized exactly by SC; they must be approximated by something achievable. All stochastic function values are scaled to lie in the interval [0, 1]. Without the Rk's as inputs, the only synthesizable single-variable stochastic functions are the trivial cases X and 1 − X. Functions such as square root, squaring, and sin(X) can therefore be realized using the Bernstein approximation.
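
As a sketch of the underlying mathematics (of the approximation itself, not of an SC circuit), the degree-n Bernstein polynomial of a function f on [0, 1] is B_n(f; x) = Σ_{k=0}^{n} f(k/n) C(n, k) x^k (1 − x)^(n−k):

from math import comb, sin

def bernstein(f, n, x):
    """Degree-n Bernstein approximation of f at x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

print(bernstein(sin, 6, 0.5), sin(0.5))  # about 0.470 vs 0.479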

Random Fluctuation Errors: These errors arise because the N-bit SN formed by an RNG is pseudo-random in nature. They can be quantified by the mean square error; for a Bernoulli bit stream encoding pX, the mean square error of the estimate is pX(1 − pX)/N. Hence, as the length of the stochastic bit stream increases, both random fluctuation errors and rounding errors decrease.

Constant-Induced Errors: These errors are caused by the ancillary inputs. They can be reduced by transferring the ancillary inputs into sequential sub-circuits of C so that their behavior can be traced. The CEASE algorithm was developed to remove this error: it replaces the ancillary inputs with sequential components equivalent to modulo counters with multiple moduli, releasing the weighted input bits whenever a counter overflows.

Correlation Errors: These errors arise from interactions between bit streams during computation, which introduce dependencies and similarities between different bit streams. Correlation is caused mainly by three factors: RNSs that cannot produce good-quality independent SNs, sharing of RNSs to decrease the hardware cost of the circuit, and temporal dependencies introduced by sequential circuits. Consequently, as circuit size increases, correlation also increases. To quantify correlation, the stochastic computing correlation (SCC) of a pair of SNs X and Y is defined in Eq. 1:

$$ \text{SCC}(X, Y) = \begin{cases} \dfrac{p_{X \wedge Y} - p_X\,p_Y}{\min(p_X,\,p_Y) - p_X\,p_Y} & \text{if } p_{X \wedge Y} > p_X\,p_Y \\[2ex] \dfrac{p_{X \wedge Y} - p_X\,p_Y}{p_X\,p_Y - \max(p_X + p_Y - 1,\,0)} & \text{otherwise} \end{cases} $$
(1)

where pX∧Y is P(X(t) = 1, Y(t) = 1) over all t. Maximally positively correlated bit streams, whose 1s overlap as much as possible, have an SCC of 1. Maximally negatively correlated bit streams, whose 1s do not overlap at all, have an SCC of −1. Uncorrelated (independent) bit streams, each obtained from a different Bernoulli sequence as the random number source, have an SCC of 0. A minimal sketch of Eq. 1 is given below, after which several ways to avoid or manipulate correlation are described:
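
A minimal Python sketch of Eq. 1, assuming equal-length bit streams given as lists of 0s and 1s (the helper name scc is illustrative; degenerate streams with p = 0 or 1 are not handled):

def scc(x, y):
    """Stochastic computing correlation (Eq. 1) of two bit streams."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a & b for a, b in zip(x, y)) / n
    delta = pxy - px * py
    if delta > 0:
        return delta / (min(px, py) - px * py)
    return delta / (px * py - max(px + py - 1, 0))

print(scc([1, 0, 1, 0], [1, 0, 1, 0]))  # 1.0, maximally positive
print(scc([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0, maximally negative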

(a) Regeneration: The corrupted SN is converted into binary format and then fed to an SNG to generate a new SN from it. This method adds hardware and latency, and desirable properties like progressive precision can be lost [11]. Figure 5a shows a regeneration-based decorrelator.

(b) Isolation: A D flip-flop clocked at the bit stream frequency is inserted into the line carrying the corrupted SN, delaying it by one clock cycle. If the bits of the corrupted SN are independent, the delayed version and the original SN are naturally independent of each other. Figure 5b shows an isolation-based decorrelator.

(c) Shuffle-based Decorrelation: Shuffling re-randomizes the bits of the corrupted SN, helping reduce cross-correlation and autocorrelation. Complete elimination of correlation is not possible with this method because the newly generated SN remains correlated with the original SN. Figure 5c shows an example of shuffle-based decorrelation.

Fig. 5

a Regeneration-based decorrelator [11], b Isolation-based decorrelator [11], c Shuffle-based decorrelator [1]

(d) Correlation Insensitivity: Some stochastic circuits are not sensitive to correlation; they function as desired even with correlated inputs, so neither auto- nor cross-correlation needs to be managed. Such circuits are called correlation-insensitive circuits.

(e) Correlation Injection by Synchronizer and Desynchronizer: Many stochastic circuits require uncorrelated inputs, but some functions require SNs that are positively or negatively correlated; e.g., to calculate a difference, an EX-OR gate requires maximally correlated inputs (see the sketch below). A type of SNG called a synchronizer, which can control the amount of correlation, is used in such circuits. However, regenerating SNs with desired correlation levels within a stochastic system incurs a high hardware cost and introduces latency. Similarly, a desynchronizer can be used to decrease the correlation between SNs. Errors due to autocorrelation can be managed by CEASE.
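
A minimal sketch of the EX-OR example: correlated streams are built here by unary-style encoding, which yields SCC = 1 (an assumption of this sketch; in hardware a synchronizer would establish the correlation).

def unary_stream(p, n):
    """All 1s first, then 0s: pairs of such streams have SCC = 1."""
    k = int(p * n)
    return [1] * k + [0] * (n - k)

x, y = unary_stream(0.75, 16), unary_stream(0.5, 16)
diff = [a ^ b for a, b in zip(x, y)]
print(sum(diff) / len(diff))  # 0.25 = |0.75 - 0.5|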

4 Optimization of Stochastic Circuits

Many attempts are being made to optimize SC circuits with respect to various factors. The first is the random number generator, one of the main blocks of an SC system, which occupies most of the hardware area and needs to be optimized. The second is correlation, which, if managed properly, can save considerable hardware area, decrease latency, and increase accuracy. The third is the connecting wires between the RNS and the PCC. Various experiments concerning the RNG have been reported in the literature; they are listed below, followed by a sketch of the first scheme:

(i) Circular shifting of the output of an LFSR to allow a shared RNG [12].

(ii) Permutation of the output of a shared LFSR in reverse lexicographic order instead of circular shifting [13].

(iii) Use of an S-box at the LFSR's output, shared among multiple RNGs; this increases hardware complexity since the S-box is a complex combinational circuit [14].

(iv) LFSR with a nonlinear S-box function, showing better autocorrelation and cross-correlation properties than a plain LFSR [15].

(v) Use of spintronic devices instead of LFSRs to decrease the area and power consumed by the stochastic core [16].

(vi) Designing SNGs that share an RNS based on a randomization function [17].

(vii) LUT-based quasi-SNGs, where low-discrepancy (LD) sequences generate the stochastic numbers [18]. Since LD sequences have uniformly spaced 1s and 0s, this SNG offers better accuracy, convergence, and throughput than an LFSR, with fewer random fluctuations and shorter execution time due to parallel implementation; the drawback is the need for optimization with respect to area.

(viii) Bit flipping of the LFSR output, which makes the generated SNs correlation insensitive and allows sharing of the RNS [19].
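
A minimal Python sketch of scheme (i), sharing one random word among several SNGs by circular shifting (all names and the 8-bit width are assumptions of this sketch):

M = 8  # RNS word width in bits

def rotate(word, k):
    """Circularly rotate an M-bit word left by k (mod M) bits."""
    k %= M
    return ((word << k) | (word >> (M - k))) & ((1 << M) - 1)

def shared_sngs(b_values, rns_word):
    """One RNS word drives one comparator per binary input."""
    return [1 if b > rotate(rns_word, k) else 0
            for k, b in enumerate(b_values)]

print(shared_sngs([64, 128, 192], 0xA5))  # three SNGs, one RNS word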

Similarly, correlation can be managed by using correlation-insensitive circuits or correlation-manipulating circuits such as isolators, synchronizers, desynchronizers, decorrelators, and regenerators, or by judiciously selecting the LFSR seed used to generate the RNS.

Considering all the design aspects above, SC has been applied where many arithmetic operations must be performed: simple logic gates implement these operations, so the resulting decrease in area and power can be a hundredfold compared with traditional deterministic computing. Moreover, since the precision required in such applications is low, excessively long SNs can be avoided, thus limiting the latency.

5 Advantages and Drawbacks

SC utilizes randomness as a valuable resource, representing probability values as statistical bit streams. Unlike deterministic computing, an SN has no place value, i.e., no LSB or MSB, which makes the computation robust. Its other advantages are small area, low power, error tolerance, support for parallelism, and the fact that standard digital components can be used to realize the circuits. Its primary disadvantages are correlation-induced inaccuracy, random number fluctuations, high latency, costly randomness sources, and a theory that is not yet fully understood. The design of a stochastic circuit involves complex trade-offs among accuracy, computing time, and hardware area cost.

6 Conclusion and Future Challenges

Of the various methods described above for designing an SNG, the use of LD sequences for generating the RNS gives higher throughput, and the use of a randomization function at the output of the LFSR gives better correlation management. However, both methods incur additional area cost, so designing SNGs with either or both methods without area or latency overhead needs further exploration in future work.

Automatic synthesis of optimally correlated deterministic RNSs for designing SNGs also remains a challenge.

The domains to which SC can be successfully applied include artificial neural networks (ANNs), image processing, control systems, DSP, and the decoding of modern error-correcting codes [5]. The application of SC in implantable medical devices remains a challenge for researchers.