Introduction

The sensory system is responsible for processing sensory information. The commonly recognized sensory systems include the visual, auditory, olfactory, taste and touch systems (Field 1994; Kandel and Schwartz 2013). Visual inputs play a critical role in primates’ responses to the external environment; in a normal individual, over 70% of information is derived visually (Treichler 1967). The visual system is one of the most widely and intensely studied sensory systems, as well as the most complex sensory nervous system; however, the underlying visual mechanisms are yet to be elucidated (Jessell Thomas et al. 2000; Rieke et al. 1997; Yan et al. 2016; Barranca et al. 2014). Visual information from the surroundings is processed in the retina, lateral geniculate nucleus, visual cortex, and other regions of the central nervous system (Schiller 1986; Gross 1994; Qiu et al. 2016). A large number of biological experiments have focused on the lateral geniculate nucleus and visual cortex. The anatomical structure and functional properties of the retina have also been studied comprehensively, and because stimuli can be precisely defined and the corresponding neuronal responses easily recorded, research in recent years has increasingly focused on the retina (Bartsch et al. 2008; Maturana et al. 2014). The retina is the most investigated area of the brain, and retinal mechanisms and responses have been characterized with great accuracy (Khoshbin-e-Khoshnazar 2014; Maturana et al. 2016; Urakawa et al. 2017). Many models of the retina are available in the literature, ranging from detailed biophysical models to computationally efficient, abstract ones (Qureshi et al. 2014; Touryan et al. 2005). The most computationally efficient approach is a set of models, each of which represents a single cell (Reich and Bedell 2000).
A reduced theory would be useful for studying the distribution of retinal responses, the properties that are general across all cells, and the dependencies between the responses of different retinal cells. Such information may allow us to avoid computing redundant responses by numerical integration, and thus improve simulation speed while introducing only minor assumptions. Studying general properties such as response sparseness is therefore of great importance (Bakouie et al. 2017; Gravier et al. 2016).

Most visual neuroscientists have used random stimuli with simple statistical properties and simple stimulation patterns (such as light spots or light bars) to study the characteristics and coding mechanisms of neurons (Atick 1992; Felsen et al. 2005). However, animals and humans live in a natural and rather complex environment. The retina operates in a complex visual environment and is naturally optimized to process natural stimuli (Kandel and Schwartz 2013; Jessell Thomas et al. 2000). Although natural scenes can be viewed as a simple superposition of different types of stimuli, neuronal responses to them are not simple. Therefore, the mechanisms by which neurons process and encode complex natural visual stimuli cannot be determined accurately with simple stimuli alone, and natural stimuli are vital for exploring perception in the human brain (Felsen et al. 2005; Simoncelli and Olshausen 2001; Felsen and Dan 2005).

Since information processing and transfer in the brain are restricted by energy metabolism, the nervous system may use energy-efficient strategies to handle information, especially in visual information processing and coding (Barlow 1961; Mizraji and Lin 2017). Recent advances in neurobiology suggest that the responses of the primary visual cortex (V1) to natural stimuli may reflect a sparse approximation of images, in which each stimulus is encoded by activating a specific subset of neurons (Olshausen and Field 1996; Simoncelli 2003; Olshausen and Field 1997). Sparse coding refers to the activation of only a small subset of neurons at any given time in response to a stimulus. Consequently, the activity of each individual neuron remains low for most of the stimulation period (Olshausen and Field 2004; Zheng et al. 2016; Lewick 2002). Sparse coding is essential for the processing of visual information: it reduces the number of neurons involved, saves energy, improves the efficacy of information transmission and enhances the capacity for information processing (Hubel and Wiesel 1997; Peters et al. 2017). In recent years, visual information processing and coding have been comprehensively investigated, and the results (Lewick 2002; Hubel and Wiesel 1997; Peters et al. 2017; Vinje and Gallant 2002) allow simulation of the visual system in silico. Combining neurophysiological data and results with signal processing and computing theory facilitates computer simulation of the visual system, which in turn can help address challenges in image processing (Huberman et al. 2008; Pillow et al. 2008; Tozzi and Peters 2017).

Retinal ganglion cells (RGCs) have been studied extensively. Kameneva et al. used the Hodgkin–Huxley model to simulate the morphological and physiological characteristics of RGCs (Kameneva et al. 2016). Hadjinicolaou et al. established linear and nonlinear models that predict RGC responses to multi-electrode array stimulation (Hadjinicolaou et al. 2016). Wohrer and Kornprobst developed the Virtual Retina software, which can run large-scale simulations of up to 100,000 neurons with realistic physiological characteristics (Wohrer and Kornprobst 2009). These studies show that the relevant characteristics of RGCs can be simulated well by computer.

Fast Independent Component Analysis (FASTICA) is a sparse coding algorithm developed by Hyvärinen and colleagues (Hyvärinen 1999; Hoyer and Hyvärinen 2000; Hyvärinen et al. 2001). It is a fast iterative algorithm based on an orthogonal rotation of pre-whitened data, using a fixed-point iteration scheme that maximizes the non-Gaussianity of the rotated components. FASTICA converges rapidly and, unlike gradient-based methods, requires no step-size parameter. In the authors’ previously published work, standard sparse coding and sparse coding based on FASTICA were simulated and compared in terms of basis training time, convergence speed of the objective function, and sparsity of the coefficient matrix. The results showed that sparse coding based on FASTICA is more effective than standard sparse coding. In addition, many experiments have shown that FASTICA can simulate sparse coding very well (Wang and Wang 2017; Hyvärinen 1999).

Methods

The basic idea of FASTICA is to find a matrix \(W\) that maximizes the non-Gaussianity of the projections \(W^{T} X\), i.e., of \(X\) projected onto \(W\).

FASTICA involves the following steps:

  1. Center and whiten the input data \(X\) to obtain the processed data \(Z\).

  2. Randomly initialize the projection vector \(W_{p}\) (the \(p\)-th column of \(W\)).

  3. Update \(W_{p} = E[Z\,g(W_{p}^{T} Z)] - E[g^{\prime}(W_{p}^{T} Z)]\,W_{p}\), where \(g(\cdot) = \tanh(\cdot)\), \(g^{\prime}\) is its derivative, and \(E[\cdot]\) denotes averaging over samples.

  4. Decorrelate \(W_{p}\) from the previously estimated vectors: \(W_{p} = W_{p} - \sum\limits_{j = 1}^{p - 1} (W_{p}^{T} W_{j}) W_{j}\).

  5. Normalize: \(W_{p} = W_{p} / \left\| W_{p} \right\|\).

  6. Repeat from step 3 until \(W_{p}\) converges, then proceed to the next component.
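The steps above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors’ code: it assumes the data are arranged as dimensions × samples, uses eigendecomposition-based whitening, and treats the convergence tolerance, iteration cap and random seed as illustrative choices. The names `whiten` and `fastica_deflation` are hypothetical.

```python
import numpy as np


def whiten(X):
    """Step 1: center X (dimensions x samples) and whiten it to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)            # centering
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)                         # cov = E diag(d) E^T
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T            # symmetric whitening matrix cov^(-1/2)
    return V @ Xc


def fastica_deflation(Z, n_components, max_iter=200, tol=1e-6, seed=0):
    """Steps 2-6: estimate the columns W_p of W one at a time, with g = tanh."""
    rng = np.random.default_rng(seed)
    dim = Z.shape[0]
    W = np.zeros((dim, n_components))
    for p in range(n_components):
        w = rng.standard_normal(dim)                   # step 2: random initialization
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ Z)                         # g(W_p^T Z) for every sample
            g_prime = 1.0 - g ** 2                     # derivative of tanh
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w   # step 3: fixed-point update
            W_prev = W[:, :p]
            w_new -= W_prev @ (W_prev.T @ w_new)       # step 4: remove components along W_1..W_{p-1}
            w_new /= np.linalg.norm(w_new)             # step 5: normalization
            converged = abs(abs(w_new @ w) - 1.0) < tol  # step 6: direction unchanged up to sign
            w = w_new
            if converged:
                break
        W[:, p] = w
    return W
```

On a toy mixture of a uniform and a Laplace source, each recovered projection \(W^{T} Z\) should correlate strongly with one of the sources. For practical use, scikit-learn’s `FastICA` (with `algorithm="deflation"` and `fun="logcosh"`, whose derivative is tanh) is a maintained alternative.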

These steps yield the projection matrix \(W\), from which the initial feature matrix \(A\) is obtained as follows:

$$A = (W(W^{T} W)^{ - 1/2} )^{ - 1}$$
(1)
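Eq. (1) symmetrically orthogonalizes \(W\) and then inverts the result. A small NumPy sketch, assuming \(W\) is square and invertible (for example, 100 × 100 for 10 × 10 patches with a full set of components); `inv_sqrtm_sym` and `initial_feature_matrix` are hypothetical helper names:

```python
import numpy as np


def inv_sqrtm_sym(M):
    """Inverse square root of a symmetric positive-definite matrix via eigh."""
    d, E = np.linalg.eigh(M)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T


def initial_feature_matrix(W):
    """Eq. (1): A = (W (W^T W)^(-1/2))^(-1), assuming W is square and invertible."""
    W_orth = W @ inv_sqrtm_sym(W.T @ W)    # symmetric orthogonalization of W
    return np.linalg.inv(W_orth)
```

Because \(W(W^{T}W)^{-1/2}\) is orthogonal, its inverse equals its transpose, so \(A\) could equivalently be computed without an explicit matrix inversion.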

The resulting initial feature matrix \(A\) is then refined by the following procedure:

  7. Initialize the coefficient matrix \(S = A^{T} X\).

  8. With \(A\) fixed, solve for the \(S\) that minimizes the sparse coding objective.

  9. With \(S\) fixed, solve for the \(A\) that minimizes the objective.

  10. Record the coefficient corresponding to each pixel in the 1000 image patches. Stop after 100 iterations; otherwise, return to step 8.
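Steps 7 to 10 do not specify the objective or the solvers. The sketch below assumes a common choice, the \(\ell_1\)-penalized reconstruction error \(\tfrac{1}{2}\|X - AS\|_F^2 + \lambda\|S\|_1\), with ISTA (iterative soft thresholding) for the \(S\)-step and least squares followed by column normalization for the \(A\)-step. The penalty `lam`, the number of inner ISTA steps and the function names are hypothetical; only the initialization \(S = A^{T}X\) and the fixed 100 outer iterations come from the procedure above.

```python
import numpy as np


def soft_threshold(V, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)


def alternate_sparse_coding(X, A, lam=0.1, n_outer=100, n_ista=20):
    """Assumed objective: 0.5 * ||X - A S||_F^2 + lam * ||S||_1.

    A is expected to have unit-norm columns (as the orthogonal A of Eq. (1) does).
    """
    S = A.T @ X                                        # step 7: initialization
    for _ in range(n_outer):                           # step 10: fixed number of iterations
        # Step 8 (assumed solver): A fixed, ISTA on S.
        L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
        for _ in range(n_ista):
            S = soft_threshold(S - A.T @ (A @ S - X) / L, lam / L)
        # Step 9 (assumed solver): S fixed, least-squares update of A.
        A_new = X @ np.linalg.pinv(S)
        norms = np.linalg.norm(A_new, axis=0)
        dead = norms < 1e-12                           # atoms no coefficient currently uses
        A_new[:, dead] = A[:, dead]                    # keep the previous unit-norm column
        norms[dead] = 1.0
        A = A_new / norms                              # unit-norm columns fix the A/S scale ambiguity
    return A, S
```

Soft thresholding yields exact zeros, so under this assumed objective the coefficients can be read directly as active (nonzero) or silent (zero).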

We used \(g(\cdot) = \tanh(\cdot)\) rather than the Gaussian nonlinearity because tanh is more versatile (Hyvärinen et al. 2001; Hyvärinen and Hoyer 2002). To describe the FASTICA-based algorithm, we established the sparse coding model shown in Fig. 1.

Fig. 1

Sparse coding model. The model consists of three layers: input, basis functions and output. Data are fed into the input layer and used to train the basis functions with FASTICA; the coding process then converts the input into output data using these basis functions.

In the experiments, the training samples consist of 10 natural images of 512 × 512 pixels. We randomly selected 10,000 patches of 10 × 10 pixels from these images (1000 per image). To reduce computation time, we then randomly selected 1000 of these 10,000 patches for the experiments. The extracted patches were used as input data and processed by the sparse coding model based on FASTICA. The number of iterations was fixed at 100, yielding the feature matrix and its corresponding coefficient matrix. Note that the purpose of this paper is to simulate the temporal and spatial responses of ganglion cells to different stimuli through sparse coding based on FASTICA. We therefore treat each pixel of the 1000 patches as a ganglion cell, ignore the visual fields of the neurons, and take the coefficient corresponding to each pixel as the response of that ganglion cell. This gives 1000 × 100 = 100,000 pixels and corresponding ganglion cells. Iterations are treated as the equivalent of experimental time in a physiological experiment: each iteration corresponds to the activation of all 100,000 ganglion cells by the image stimuli. Because neuronal action potentials are all-or-none (“fire or not”), we ignore the magnitude of each coefficient and consider only whether it is zero or nonzero: zero means that the ganglion cell is not activated and generates no action potential, and nonzero means that the cell is activated and generates an action potential. To simplify the calculation, every nonzero value is therefore set to 1.
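The patch sampling and binarization described above can be sketched as follows, assuming images are 2-D NumPy arrays and each patch is stored as a column of 100 pixel values. In the check below, random placeholder images stand in for the natural images; the function names and seeds are hypothetical.

```python
import numpy as np


def sample_patches(images, patches_per_image=1000, size=10, seed=0):
    """Randomly cut size x size patches; return them as columns (size*size, n_patches)."""
    rng = np.random.default_rng(seed)
    columns = []
    for img in images:
        h, w = img.shape
        rows = rng.integers(0, h - size + 1, patches_per_image)
        cols = rng.integers(0, w - size + 1, patches_per_image)
        for r, c in zip(rows, cols):
            columns.append(img[r:r + size, c:c + size].ravel())
    return np.stack(columns, axis=1)


def random_subset(X, n=1000, seed=0):
    """Keep n randomly chosen patches (columns), sampled without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[1], size=n, replace=False)
    return X[:, idx]


def binarize(S):
    """'Fire or not': every nonzero coefficient becomes 1, zeros stay 0."""
    return (S != 0).astype(np.uint8)
```

Applied to a 100 × 1000 coefficient matrix, `binarize` yields the 100,000 binary “ganglion cell” responses for one iteration.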

Two types of stimuli were used in this study: natural scenes and, for comparison, random checkerboard stimuli. The natural stimuli consist of the 10 natural images of 512 × 512 pixels described above. Each random checkerboard stimulus, by contrast, consists of 64 × 64 squares of 8 × 8 pixels. The brightness of each square, generated by a random sequence, is graded on a scale of 0–1, with 1 representing black and 0 representing white. RGCs were simulated by training the FASTICA-based model on either natural scenes or random checkerboards, and the outputs of the trained model were then used to simulate the responses to the corresponding stimuli. We chose random checkerboards because they are the artificial stimuli used in the physiological experiments: checkerboard stimuli were used to measure the receptive fields of ganglion cells, and ganglion cell responses to random checkerboard and natural image stimulation were compared. Since we want to compare our simulated data with those physiological data, we also used checkerboard stimuli rather than white noise stimuli.
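A random checkerboard frame as described above (64 × 64 squares of 8 × 8 pixels, i.e., 512 × 512 pixels) can be generated as follows. The text does not state how the brightness values are drawn, so the uniform distribution on [0, 1) is an assumption, as is the function name.

```python
import numpy as np


def random_checkerboard(n_squares=64, square=8, seed=None):
    """One random checkerboard frame of (n_squares * square)^2 pixels.

    Each square gets a single brightness drawn uniformly from [0, 1) (assumed);
    by the convention in the text, 1 is black and 0 is white.
    """
    rng = np.random.default_rng(seed)
    levels = rng.random((n_squares, n_squares))
    return np.kron(levels, np.ones((square, square)))   # expand each value to a square x square block
```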

Statistical analysis

The sparseness of neuronal responses to stimuli was quantified using kurtosis (Willmore and Tolhurst 2001). Kurtosis is the standardized fourth central moment of a distribution and describes how sharply peaked and heavy-tailed it is. With the offset of 3 in Eq. (2), the kurtosis of a Gaussian distribution is zero, and a high positive value indicates a sharply peaked, heavy-tailed distribution, as in sparse neuronal responses.

$$K(X) = \frac{{E\left[ {(X - \mu )^{4} } \right]}}{{\sigma^{4} }} - 3$$
(2)
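Eq. (2) is the excess kurtosis computed with population (not sample-corrected) moments. A direct NumPy transcription, with a hypothetical function name:

```python
import numpy as np


def excess_kurtosis(x):
    """Eq. (2): E[(X - mu)^4] / sigma^4 - 3, using population moments."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()                 # ddof=0, matching the expectation E[.] in Eq. (2)
    return np.mean((x - mu) ** 4) / sigma ** 4 - 3.0


# A binary response that is active in 1 of 10 bins:
k = excess_kurtosis([0] * 9 + [1])  # 46/9, about 5.11
```

`scipy.stats.kurtosis` with its defaults (`fisher=True`, `bias=True`) computes the same quantity.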

Sparseness can be defined as lifetime or population sparseness. The kurtosis of the response distribution of a single neuron across stimuli was used to quantify lifetime sparseness (\(K_{L}\)). If \(M\) stimuli evoke neuronal responses \(r_{1}, \ldots, r_{i}, \ldots, r_{M}\), the lifetime sparseness \(K_{L}\) is given by

$$K_{L} = \left[ {\frac{1}{M}\sum\limits_{i = 1}^{M} {\left( {\frac{{r_{i} - \overline{r} }}{{\sigma_{r} }}} \right)}^{4} } \right] - 3$$
(3)

where \(\overline{r}\) and \(\sigma_{r}\) are the mean and standard deviation of the responses, respectively. A distribution has high positive kurtosis if it contains many small responses (relative to the standard deviation \(\sigma_{r}\)) and only a few large ones. A large \(K_{L}\) value therefore implies that the neuron fires only briefly and, most of the time, does not fire at all.

Similarly, the distribution of the firing rate of the neuronal population in response to stimuli can be used to quantify the population sparseness (\(K_{P}\)), which is calculated as follows:

$$K_{P} = \left[ {\frac{1}{N}\sum\limits_{j = 1}^{N} {\left( {\frac{{r_{j} - \overline{r} }}{{\sigma_{r} }}} \right)^{4} } } \right] - 3$$
(4)

where \(N\) is the number of neurons, \(r_{j}\) is the firing rate of the \(j\)-th neuron in response to a single stimulus, and \(\overline{r}\) and \(\sigma_{r}\) are the mean and standard deviation of the firing rates across the population, respectively. Population sparseness thus describes the distribution of responses of the entire coding population of \(N\) neurons to a single stimulus. A large \(K_{P}\) value indicates that few neurons fire and that most neurons do not respond to a single stimulus.
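Eqs. (3) and (4) apply the same formula along different axes of a response matrix. The sketch below assumes responses are stored in an array `R` of shape (M stimuli or iterations, N cells); it returns NaN for any cell or stimulus whose responses are constant (for instance, a cell that never fires), because \(\sigma_r = 0\) makes the kurtosis undefined. The text does not state how such cells were handled; averaging with `np.nanmean`, which skips them, is one possible convention. Function names are hypothetical.

```python
import numpy as np


def kurtosis_along(R, axis):
    """Excess kurtosis of R along `axis`; NaN where the responses are constant."""
    R = np.asarray(R, dtype=float)
    mu = R.mean(axis=axis, keepdims=True)
    sigma = R.std(axis=axis, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        z = (R - mu) / sigma                       # 0/0 -> NaN for constant responses
        return np.mean(z ** 4, axis=axis) - 3.0


def lifetime_kurtosis(R):
    """Eq. (3): K_L for each cell, over its M responses (down each column)."""
    return kurtosis_along(R, axis=0)


def population_kurtosis(R):
    """Eq. (4): K_P for each stimulus, over the N cells (along each row)."""
    return kurtosis_along(R, axis=1)
```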

Treves and Rolls proposed an alternative measure of lifetime and population sparseness (Treves and Rolls 1991). Like kurtosis, the parameter \(a\) measures the shape of the response distribution. However, it is appropriate only for non-negative responses (such as the firing rates of real neurons), which range from 0 to \(+\infty\). The Treves–Rolls sparseness \(a\) is given by

$$a = \frac{{\left( {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {r_{i} } } \right)^{2} }}{{\frac{1}{n}\sum\nolimits_{i = 1}^{n} {r_{i}^{2} } }}$$
(5)

Because \(a\) equals 1 when all responses are equal and decreases toward \(1/n\) as the responses become sparser, it is subtracted from 1 and normalized, giving a measure \(S\) that increases with sparseness:

$${\text{S}} = \frac{1 - a}{{1 - \frac{1}{n}}}$$
(6)

The parameter \(S\) is used to characterize both the lifetime response of single neurons (\(S_{L}\)) and the response of the entire population (\(S_{P}\)) to stimuli. When the measure is applied to the \(M\) responses of a single neuron to different stimuli (cf. Eq. 3), it is closely related to that neuron’s lifetime kurtosis, since both reflect the proportion of strong responses. The value of \(S\) lies between 0 and 1. A large \(S_{L}\) indicates sparser firing of a single neuron, which emits no action potentials for most of the stimulation period. A large \(S_{P}\) indicates that only a small number of neurons fire and that most neurons do not respond to a single stimulus.
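Eqs. (5) and (6) can be transcribed as follows, again assuming a response array `R` of shape (M stimuli or iterations, N cells). Function names are hypothetical, and all-zero response vectors, for which \(a\) is undefined, return NaN.

```python
import numpy as np


def treves_rolls_S(r):
    """Eqs. (5)-(6): normalized Treves-Rolls sparseness S in [0, 1]."""
    r = np.asarray(r, dtype=float)
    n = r.size
    mean_sq = np.mean(r ** 2)
    if mean_sq == 0:
        return float("nan")                  # undefined for an all-silent response vector
    a = np.mean(r) ** 2 / mean_sq            # Eq. (5)
    return (1.0 - a) / (1.0 - 1.0 / n)       # Eq. (6)


def lifetime_S(R):
    """S_L for each cell (down each column of R)."""
    return np.apply_along_axis(treves_rolls_S, 0, R)


def population_S(R):
    """S_P for each stimulus (along each row of R)."""
    return np.apply_along_axis(treves_rolls_S, 1, R)
```

For binary responses, as used in this study, \(\frac{1}{n}\sum r_i = \frac{1}{n}\sum r_i^2 = p\), the fraction of active bins, so \(a = p\) and \(S = (1-p)/(1 - 1/n)\).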

Student’s t test was employed for the comparison of the statistical results.

Results

Simulation of responses in single RGCs

We first simulated the responses of single RGCs to stimuli and selected an RGC that responded strongly to both natural and random checkerboard stimuli. Its kurtosis \(K_{L}\) was 5.111 for natural stimuli and 1.308 for random checkerboard stimuli. These results suggest that the RGC exhibits significant lifetime sparseness in response to both stimulus types and fires more sparsely in response to natural stimuli. We also calculated the Treves–Rolls sparseness of the RGC: the \(S_{L}\) values were 0.587 and 0.418, respectively, again showing that the RGC fires more sparsely in response to natural stimuli than to random checkerboard stimuli. The responses of this ganglion cell to natural and random checkerboard stimuli are shown in Figs. 2 and 3.

Fig. 2

The distribution of a ganglion cell under natural image stimulation

Fig. 3

The distribution of a ganglion cell under random checkerboard stimulation

The horizontal axis represents the duration of the experiment and the vertical axis the response of the RGC: 1 indicates neuronal firing and 0 the absence of a response. Each vertical line indicates a single spike.

Simulating the population response of RGCs

Next, we simulated the population response of RGCs to stimuli. In response to natural stimuli, approximately 65% of the RGCs did not fire at all and nearly 6% fired more than 15 times (Fig. 4). In response to random checkerboard stimuli, about 72% of the RGCs did not fire, and every RGC fired fewer than 15 times (Fig. 5).
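The fractions reported above can be computed from a binary response matrix. A sketch, assuming `R` has shape (iterations, cells) and using the threshold of 15 spikes from the text; the function name is hypothetical.

```python
import numpy as np


def spike_count_summary(R, high=15):
    """R: binary responses, shape (iterations, cells)."""
    counts = R.sum(axis=0).astype(np.int64)     # spikes fired by each cell; int64 so bincount accepts it
    frac_silent = np.mean(counts == 0)          # fraction of cells that never fire
    frac_high = np.mean(counts > high)          # fraction firing more than `high` times
    histogram = np.bincount(counts)             # histogram[k] = number of cells with exactly k spikes
    return counts, frac_silent, frac_high, histogram
```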

Fig. 4

Population response of RGCs to natural stimuli

Fig. 5

Response of the RGC population to random checkerboard stimuli

The lifetime sparseness of single neurons is an interesting property that is not identical to population sparseness: population sparseness describes the instantaneous response of the population, whereas lifetime sparseness describes the response of an individual cell over time. We compared the population responses of RGCs to the two types of stimuli quantitatively using the population sparseness parameters \(K_{P}\) and \(S_{P}\). For natural stimuli, \(K_{P}\) was 5.019. Interestingly, although the RGC population fired substantially less in response to random checkerboard stimuli than to natural stimuli, \(K_{P}\) for random checkerboard stimuli was only 1.499, suggesting that the “burst” of action potentials of single neurons primarily contributes to the sparse representation of the RGC population. We then evaluated the Treves–Rolls sparseness of the population response to natural and random checkerboard stimuli and found \(S_{P}\) values of 0.578 and 0.319, respectively. These results also suggest that in response to natural stimuli, individual neurons fire at a low rate, while occasional “bursts” of the neuronal population transmit information efficiently.

The statistical results of lifetime sparseness (\(K_{L}\), \(S_{L}\)) and population sparseness (\(K_{P}\), \(S_{P}\)) of the neuronal responses to natural and random checkerboard stimuli are shown in Table 1, together with results recorded from chicken RGCs (Zhang et al. 2010). The simulation results are qualitatively consistent with those of the biophysical experiments in chicken RGCs. The values themselves do not match the experimental data in Table 1, because ours is a simulation whereas the quoted data come from a physiological experiment. Such numerical differences are expected; what matters is that both show the same trend. For example, in both the simulation and the chicken retinal experiment, \(K_{L}\) is larger for natural stimuli than for random checkerboard stimuli. Although the \(K_{L}\) values of the two experiments differ, both show that ganglion cell responses are sparser under natural image stimulation.

Table 1 Results of simulation and chicken retinal experiments

Because the modelling and experimental methods differ, the resulting values also differ; nevertheless, the modelling and experimental results are consistent. In our simulation the output of each neuron is 0 or 1, so a burst of neural activity corresponds to multiple 1 values within a few consecutive iterations, that is, high-frequency firing of a single ganglion cell. We map the 100 iterations of our simulation onto a period of time in a physiological experiment: specifically, 100 iterations correspond to the 192 s of stimulation in Zhang et al. (2010), or about 1.92 s per iteration.
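Under the mapping above (100 iterations corresponding to 192 s, i.e., 1.92 s per iteration), bursts in a binary response sequence can be located as runs of consecutive 1s. The minimum run length `min_len` is a hypothetical threshold, and requiring strictly consecutive 1s is a stricter assumption than “multiple 1 values within a few consecutive iterations”; the function names are also illustrative.

```python
import numpy as np

SECONDS_PER_ITERATION = 192.0 / 100   # 100 iterations correspond to 192 s (Zhang et al. 2010)


def find_bursts(spikes, min_len=3):
    """Return (start, end) iteration indices, end exclusive, of runs of >= min_len consecutive 1s."""
    s = np.concatenate(([0], np.asarray(spikes, dtype=int), [0]))   # pad so every run has both edges
    d = np.diff(s)
    starts = np.flatnonzero(d == 1)     # 0 -> 1 transitions: first spike of a run
    ends = np.flatnonzero(d == -1)      # 1 -> 0 transitions: one past the last spike
    keep = (ends - starts) >= min_len
    return list(zip(starts[keep].tolist(), ends[keep].tolist()))


def iterations_to_seconds(n_iterations):
    """Convert simulation iterations to physiological stimulation time."""
    return n_iterations * SECONDS_PER_ITERATION
```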

The different responses of RGCs to natural and random checkerboard stimuli can be explained by the findings of Theunissen et al. (2001). That study recorded neural responses in V1 of awake macaques to stimuli with natural spatiotemporal statistics and to a dynamic grating sequence, fitted nonlinear receptive field models to each data set, and compared how well each model predicted responses to natural visual stimuli. Although Theunissen et al. (2001) studied cortical neurons, we think the explanation also applies to the retina, because we believe the main cause is the difference between natural and random checkerboard images rather than structural differences between cortical and retinal neurons. On average, the model fit using natural stimuli predicted visual responses more than twice as accurately as the model fit using artificial stimuli. During natural stimulation, the temporal responses often showed a stronger late inhibitory component, indicating nonlinear temporal summation during natural vision. Natural stimuli are also strongly correlated in space and time, in contrast to synthetic stimuli. RGC receptive fields reduce these spatial and temporal correlations, thereby decreasing the number of neurons involved. Sparse coding allows neurons to fire fewer times, or fewer neurons to be activated, while still representing enough information, which improves the efficacy of information transmission. In particular, sparse coding reduces the energy consumed in producing action potentials: because the brain consumes a certain amount of energy per unit time, fewer activated neurons or action potentials means less energy is needed to represent a given stimulus. Therefore, retinal processing of natural stimuli improves the efficiency of information transmission.

Discussion

We simulated the responses of RGCs to natural and random checkerboard stimuli using FASTICA and analyzed kurtosis and lifetime and population sparseness. The results were consistent with biophysical experiments in chicken RGCs.

First, the RGCs exhibited significant lifetime sparseness in response to both natural and random checkerboard stimuli. About 65% and 72% of the RGCs did not fire at all in response to natural and random checkerboard stimuli, respectively. These results suggest that RGCs might respond to stimuli by sparse coding.

Second, we calculated the kurtosis and the Treves–Rolls sparseness of the RGCs in response to natural and random checkerboard stimuli. Both \(K_{L}\) and \(S_{L}\) were larger for natural than for random checkerboard stimuli, showing that the RGCs fire more sparsely in response to natural stimuli.

Third, the RGC population fired much less in response to random checkerboard stimuli than to natural stimuli. However, \(K_{P}\) and \(S_{P}\) were larger for natural than for random checkerboard stimuli. This suggests that in response to natural stimuli, individual neurons fire at a low rate, while occasional “bursts” of the neuronal population transmit information efficiently.

Finally, about 6% of the RGCs fired more than 15 times in response to natural stimuli, whereas all RGCs fired fewer than 15 times in response to random checkerboard stimuli. This indicates that RGCs encode stimulus information through a small number of cells firing at high frequency.

The results of these simulation experiments were consistent with those of biophysical experiments in chicken RGCs, supporting their reliability. In this paper, the natural stimulus inputs were drawn from the training set of natural scenes. Regarding the robustness of the approach, mammals presumably experience a vast set of “training inputs”, so sparse coding should also hold for inputs beyond previously observed scenes. Although checkerboard stimuli were used in the physiological experiments to measure the receptive fields of ganglion cells, we did not model receptive fields, for simplicity.

The results suggest that RGCs may represent stimuli by sparse coding. Sparse coding enables the retina to process natural visual information efficiently and improves the efficacy of information transmission. It also ensures an optimal balance between metabolic energy consumption and the efficiency of information transmission, and was found to be conceptually similar to energy coding; thus, it uses minimal energy (Wang and Zhu 2016; Wang et al. 2017; Hasenstaub et al. 2010). The fewer neurons are active in a neural network, the less energy the network consumes. Studies have shown that sparse neural coding patterns reflect the maximization of energy efficiency, that is, encoding information while consuming little energy (Levy and Baxter 1999; Laughlin 2001). Our study is mainly a simulation, without accompanying physiological experiments, because the biophysical mechanisms are too complicated and many of them are not yet clear (Protopapa et al. 2016; Momtaz and Daliri 2016). We simply aim to simulate and analyze simplified ganglion cell activity using an artificial neural network, so biophysical mechanisms were not investigated in this study.

Some complex mechanisms were not investigated in the simulation experiments. Nevertheless, the simulation results suggest the feasibility of in silico modeling of the visual system. Although the current simulation experiments provide a proof of concept, additional studies are needed to validate these findings.