1 Introduction

Understanding how sensory stimulus information is encoded in sequences of action potentials (spikes) of a sensory neuronal population, and what makes neurons fire, are challenging problems in systems neuroscience. Predicting the neural response to an arbitrary stimulus as accurately as possible would provide insight into the computations carried out by the neural ensemble and further our understanding of the encoding mechanism of the sensory system.

Current studies have mainly used linear or nonlinear regression methods to investigate the relationship between the responses of neural populations and visual stimuli (Ashe and Georgopoulos 1994; Fu et al. 1995; Luczak et al. 2004). Linear least-squares regression is the simplest and most commonly used regression technique. However, this method assumes that neural responses in a time bin are Gaussian distributed (Brillinger 1988), which is clearly inappropriate for describing spiking activity. Generalized linear models (GLMs) emerged as a flexible extension of ordinary least-squares regression, allowing one to describe the neural response as a point process (Chornoboy et al. 1988) and to find a best fit to the data (McCullagh and Nelder 1989; Paninski 2004). The linear-nonlinear Poisson (LNP) cascade model (Simoncelli et al. 2004), the simplest example of a GLM, convolves the stimulus with a linear filter, transforms the resulting one-dimensional signal through a pointwise nonlinearity into a non-negative, time-varying firing rate, and finally generates spikes according to an inhomogeneous Poisson process. Other encoding models extended from the LNP cascade model have also been shown to perform well in predicting spike trains in the hippocampus (Harris et al. 2003), the retina (Pillow et al. 2008, 2005), and the motor cortex (Truccolo et al. 2010), and in decoding motor cortical activity (Lawhern et al. 2010).
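The LNP cascade just described is compact enough to sketch directly. The following Python fragment is a hedged illustration of the three stages (linear filtering, exponential nonlinearity, Poisson-like spike generation); the white-noise stimulus, biphasic kernel, and gain are hypothetical choices, not quantities from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def lnp_spikes(stimulus, kernel, gain, dt=0.001):
    """LNP cascade: linear filter -> exponential nonlinearity ->
    (approximately) inhomogeneous Poisson spike generation."""
    drive = np.convolve(stimulus, kernel, mode="full")[:len(stimulus)]
    rate = gain * np.exp(drive)              # spikes/s, always non-negative
    p_spike = np.clip(rate * dt, 0.0, 1.0)   # per-bin Bernoulli approximation
    return rng.random(len(stimulus)) < p_spike

# hypothetical white-noise stimulus and biphasic temporal kernel
stim = rng.standard_normal(2000)             # 2 s at 1-ms resolution
t = np.arange(60) * 1e-3
kernel = t * np.exp(-t / 0.01) - 0.5 * t * np.exp(-t / 0.02)
spikes = lnp_spikes(stim, kernel, gain=20.0)
```

The Bernoulli draw per bin is a standard discrete-time approximation of the inhomogeneous Poisson process when the bin width is small relative to the rate.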

Besides the GLMs, other types of models have been established to describe the computation process of neuronal populations in V1 with the aim of revealing encoding mechanisms of the visual cortex, such as the recurrent network model (Kriener et al. 2009, 2008), the information integration model (Wang et al. 2011), and the generalized thinning and shift (GTaS) model (Trousdale et al. 2013). However, few models have taken the small-world structure of neural ensembles into consideration. Recently developed methods have provided powerful tools for studying the functional connectivity of brain networks (Sporns and Zwi 2004; Eldawlatly et al. 2009; Partzsch and Schüffny 2012; Leergaard et al. 2012). Most of these studies demonstrated that biological networks exhibit small-world properties, observed not only in large neural networks with each node representing a cortical area (Sporns and Zwi 2004; Bassett and Bullmore 2006; Kaiser 2008), but also in local neural networks recorded with microelectrode arrays (MEAs) (Yu et al. 2008; Gerhard et al. 2011). A small-world network constitutes a compromise between random and nearest-neighbor regimes, yielding a short average path length despite the predominance of local connections (Kaiser 2008), which reflects the high efficiency of the network in transmitting and processing information (Achard and Bullmore 2007). Several simulation models have also been established to demonstrate the important role of topological structure in shaping the encoding dynamics of neural populations (Pernice et al. 2013, 2011; Trousdale et al. 2012). However, most of those models rely on a large number of neurons (even more than 10,000), which exceeds the recording capability of an MEA, so they are difficult to support and validate directly with electrophysiological experiments.

In this paper, we established a population encoding model following the building principle of GLMs to describe the computing process of local neuronal populations recorded from V1. The model is composed of three sets of linear filters, which respectively capture the covariates of the visual stimuli, dependencies on the neuron's own post-spike history (for example, refractoriness, recovery periods, and adaptation), and dependencies on the recent spiking of adjacent coupled cells. The main difference from the ordinary GLM lies in the last part, in which we additionally imposed a small-world network structure when describing the coupling relationships among the neuronal population. Electrophysiological experiments under grating stimuli were designed to fit the parameters of the model and to validate it further. The results show that both encoding and decoding accuracy were improved by taking the small-world network structure into account, suggesting that the small-world structure of local populations in V1 may play an important role in encoding visual information.

2 Material and methods

2.1 Framework of the population encoding model

The population encoding model built here describes the computing process of local neuronal populations recorded from V1. Since the spiking probability of each individual cell is reported to be modulated by the stimulus and by past neural activity, both its own and that of the coupled ensemble (Okatan et al. 2005; Truccolo et al. 2005), the spiking of individual neurons in an \(N\)-neuron population was described with three sets of linear filters: a spatiotemporal stimulus filter (denoted by R); a post-spike-dependent filter (denoted by P), which mostly captures refractoriness, recovery periods, and adaptation of the neuron; and a set of coupling filters C, which capture dependencies on the recent spiking of other coupled cells. The summed filter responses are then exponentiated to obtain an instantaneous spike rate for each individual neuron. Note that the coupling relationships within the network satisfy small-world properties (see Sect. 2.2 for details), which is the main difference from ordinary GLMs. The framework of the small-world-based population encoding model (SW-based model for short) is shown in Fig. 1.

Fig. 1

Model schematic for the neural population with small-world structure: each neuron has a stimulus filter, a post-spike filter, and coupling filters that capture dependencies on spiking in other neurons. The coupled neurons constitute a “small-world” network. The summed filter output passes through an exponential nonlinearity to produce the instantaneous spike rate

2.2 Building each part of the model

As is shown in Fig. 1, each neuron from an \(N\)-neuron population is modeled with three sets of filters with coupling neurons constituting an \(N\)-neuron small-world network. Given an arbitrary stimulus s, and the spiking history of an N-neuron population \(B_{1:i}^{1:N}\), the conditional intensity function of a single cell’s spiking activity in the \(i\)th time bin \(t_{i}\) is

$$\begin{aligned} \lambda \left( t_i | B_{1:i}^{1:N}, s, \theta \right) = \lambda _R \left( t_i | s, \theta _R \right) \lambda _P \left( t_i | B_{1:i}, \theta _P \right) \lambda _C \left( t_i | B_{1:i}^{1:K}, \theta _C \right) \end{aligned}$$
(1)

where \(\theta =\left\{ {\theta _R ,\theta _P ,\theta _C } \right\} \) indicates the parameters of filters to be fitted with the neuronal data recorded from V1 (see Sect. 2.3). \(\lambda _R \left( {t_i |s,\theta _R } \right) \) is the spiking intensity induced by the extrinsic covariate of the stimuli s and is modeled as \(\lambda _R =\exp \left( {R{\cdot }s} \right) \). The corresponding filter R is represented with a spatiotemporal filter, which is approximated with the product of a spatial filter and a temporal filter (Liu and Yao 2014; Sun and Dan 2009):

$$\begin{aligned} R\left( {x,y,\tau } \right) =R_s \left( {x,y} \right) R_t \left( \tau \right) \end{aligned}$$
(2)

with \(R_s \left( {x,y} \right) \) denoting a spatial filter, obtained by approximating the measured receptive field (RF) with a 2D Gabor function in a least-squares sense, and \(R_{t}(\tau )\) the temporal filter, represented using a gamma distribution function (see Sect. 2.3.1 for details). \(\lambda _P \left( {t_i |B_{1:i} ,\theta _P } \right) \) and \(\lambda _C \left( {t_i |B_{1:i}^{1:K} ,\theta _C } \right) \) of Eq. 1 represent the components of the intensity function conditioned on the neuron's own spiking history \(B_{1:i} \) and on that of the other \(K\) coupled neurons. They are, respectively, modeled as

$$\begin{aligned} \lambda _P = \exp \left[ \mu _0 + \sum _{n=1}^{Q} p(t_n) B_{i-n} \right] \end{aligned}$$
(3)
$$\begin{aligned} \lambda _C = \exp \left( \sum _{k=1}^{K} c_k \cdot B_k \right) = \exp \left( \sum _{k=1}^{K} \sum _{n=1}^{Q'} c_k \left( t_n \right) B_{i-n}^{k} \right) \end{aligned}$$
(4)
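Equations 1, 3, and 4 amount to multiplying three exponentiated linear filter outputs for a single time bin. A minimal Python sketch (all filter values below are placeholders; the fitted filters come later, in Sect. 2.3):

```python
import numpy as np

def conditional_intensity(stim_drive, own_history, coupled_histories,
                          mu0, post_filter, coupling_filters):
    """Eq. 1 for one time bin: lambda = lambda_R * lambda_P * lambda_C,
    each factor an exponentiated linear filter output."""
    lam_R = np.exp(stim_drive)                        # stimulus term, R.s
    lam_P = np.exp(mu0 + post_filter @ own_history)   # Eq. 3
    lam_C = np.exp(sum(c @ h for c, h in              # Eq. 4
                       zip(coupling_filters, coupled_histories)))
    return lam_R * lam_P * lam_C

# with zero spiking history the intensity reduces to exp(stim_drive + mu0)
lam0 = conditional_intensity(0.5, np.zeros(3), [np.zeros(3)],
                             -1.0, np.zeros(3), [np.zeros(3)])
```

Working with the product of exponentials is equivalent to summing the filter outputs before a single exponential nonlinearity, which is how Fig. 1 depicts the model.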

In Eq. 3, \(\mu _0 \) is the baseline log-firing rate of the cell, and \(p(t_n )\) denotes the gain coefficient of the post-spike filter P at time \(t_n \). In Eq. 4, \(c_k \) and \(B_k \) represent the modulation filter and the spiking history of the \(k\)th coupled neuron. The post-spike filter P and coupling filters C were represented using a set of raised cosine basis functions of the form

$$\begin{aligned} b_j \left( t \right) = {\left\{ \begin{array}{ll} \dfrac{\cos \left( a\log (t+d) -\phi _j \right) +1}{2} &{} \quad \hbox {if}\; a\log (t+d)\in \left[ \phi _j -\pi ,\phi _j +\pi \right] \\ 0 &{} \quad \hbox {otherwise} \end{array}\right. } \end{aligned}$$
(5)
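A Python sketch of Eq. 5 may make the construction concrete; the parameters a and d and the centres \(\phi _j\) below are illustrative, not the fitted values:

```python
import numpy as np

def raised_cosine_basis(t, a, d, phis):
    """Eq. 5: log-time raised cosine bumps; column j holds b_j(t)."""
    x = a * np.log(t + d)
    B = np.zeros((len(t), len(phis)))
    for j, phi in enumerate(phis):
        inside = np.abs(x - phi) <= np.pi
        B[inside, j] = (np.cos(x[inside] - phi) + 1.0) / 2.0
    return B

t = np.arange(1, 61) * 1e-3            # 1-ms bins out to 60 ms
phis = np.linspace(-8.5, -6.0, 7)      # 7 centres inside the range of a*log(t+d)
B = raised_cosine_basis(t, a=2.0, d=0.01, phis=phis)
```

The logarithmic time axis gives narrow bumps near the spike (capturing refractoriness) and broad bumps at longer latencies (capturing slow adaptation) with only a handful of coefficients.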

All model coefficients \(\theta \) were fitted to neuronal data recorded from V1 using standard maximum-likelihood methods (Paninski 2004), together with the significance value of each coefficient (denoted by \(p\)). For each neuron group, not all coupling filters among neurons were retained, which aligns with the sparse connectivity of the visual cortex (Bassett and Bullmore 2006; Eavani et al. 2015). To eliminate unnecessary couplings, the common approach is to set a significance level \(\alpha \) empirically (e.g. 0.01 or 0.001) or to add a penalty term, without considering the connectivity property of the network. In this paper, we instead set a dynamic threshold \(\alpha _{t}\) so as to generate as strong a small-world structure as possible. First, the threshold had to keep each unit connected with at least one other unit of the network, and the mean degree of the \(N\)-neuron connected network had to be larger than \(\ln (N)\). Second, the small-world property of each thresholded network was checked in the following way: the average path length \(L\) (Eq. 6) and clustering coefficient \(F\) (Eq. 7) of the obtained network were calculated and compared with those of 100 random networks, constructed by randomly repositioning the connections while maintaining the same number of nodes and connections as the original estimated network.

$$\begin{aligned} L = \frac{1}{\frac{1}{2}N\left( N-1 \right) }\sum _{i>j} d_{i,j} \end{aligned}$$
(6)
$$\begin{aligned} F = \frac{1}{N}\sum _{i=1}^{N} \frac{2E_i }{K_i \left( K_i -1 \right) } \end{aligned}$$
(7)

In the above, \(N\) is the number of nodes in the network, \(d_{i,j}\) is the length of the shortest path connecting nodes \(i\) and \(j\), \(K_{i}\) is the number of nodes connected with node \(i\), and \(E_{i}\) is the number of actual connections among the neighbors of node \(i\). Let \(\lambda \) denote the ratio between the average path length of the target network and that of a random network, and \(\gamma \) the ratio between the average clustering coefficients of the two networks; then \(S_{w} = \gamma /\lambda \). A “small-world” property was inferred if \(S_{w}>1\) (Achard et al. 2006; Humphries et al. 2006; He et al. 2007). The structure of the network was finally determined with the threshold that produced the strongest small-world property (i.e. the largest \(S_{w}\)). We also set up a control model, denoted the nSW-based model for short, to demonstrate the advantages of the proposed model by estimating the connectivity in a similar way but without a small-world structure (\(S_{w} \approx 1\)).
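Equations 6 and 7 and the index \(S_w\) can be computed directly from a binary connection matrix. The Python sketch below uses plain breadth-first search; it treats the matrix as an undirected graph, assumes it is connected, and takes the random-network reference values \(L_\mathrm{rand}\) and \(F_\mathrm{rand}\) as caller-supplied inputs rather than regenerating 100 rewired networks:

```python
import numpy as np
from collections import deque

def avg_path_length(A):
    """Eq. 6: mean shortest-path length over node pairs (BFS on a
    0/1 adjacency matrix; assumes a connected undirected graph)."""
    n = len(A)
    total = 0
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in np.flatnonzero(A[u]):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(d for node, d in dist.items() if node > s)
    return total / (n * (n - 1) / 2)

def clustering(A):
    """Eq. 7: mean over nodes of 2*E_i / (K_i * (K_i - 1))."""
    coeffs = []
    for i in range(len(A)):
        nbrs = np.flatnonzero(A[i])
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        e = A[np.ix_(nbrs, nbrs)].sum() / 2   # links among the neighbours
        coeffs.append(2 * e / (k * (k - 1)))
    return float(np.mean(coeffs))

def small_world_index(A, L_rand, F_rand):
    """S_w = gamma / lambda, with gamma = F/F_rand, lambda = L/L_rand."""
    return (clustering(A) / F_rand) / (avg_path_length(A) / L_rand)
```

For a complete graph, both \(L\) and \(F\) equal 1, so \(S_w = 1\) when the random references are also 1; thresholded cortical networks with \(S_w > 1\) then signal small-worldness.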

2.3 Electrophysiological experiments

The experiments were designed both to fit the parameters of the established model and to examine its performance. Each set of parameters, except the spatiotemporal filters (see Sect. 2.3.1 for detail), was fitted with neuronal responses to sinusoidal drifting gratings of the preferred speed and spatial frequency for most neurons, varying in orientation (12 equally spaced orientations, \(0^{\circ }\)–\(330^{\circ }\)) and repeated at least 20 times. We used V1 of Long Evans rats as the animal model. The neuronal data were obtained with a polyimide-insulated platinum/iridium microelectrode array (Clunbury Scientific, USA) arranged in four rows of four wires each (electrode diameter = \(50\,\upmu \hbox {m}\); electrode spacing = \(350\,\upmu \hbox {m}\); row spacing = \(350\,\upmu \hbox {m}\); impedance = \(20{-}50\,\hbox {k}\Omega \)) and recorded with a Cerebus system (Blackrock Company, USA).

Offline analysis was performed in MATLAB. Single-unit activity was obtained in preprocessing by band-pass filtering between 250 Hz and 5 kHz, threshold detection, and spike sorting.

2.3.1 Measuring the spatiotemporal receptive field of each neuron

As mentioned in Sect. 2.2, the spatiotemporal filter was approximated with the product of a spatial filter and a temporal filter (Eq. 2), which were obtained by fitting the spatial and temporal RF, respectively. The RFs of individual neurons were measured by reverse correlation (Jones et al. 1987) with a single bright square (\(6.6^{\circ }\times 6.6^{\circ }\)) flashed on a black background at each of \(11\times 11\) positions in a pseudo-random sequence (20 flashes/position). The spatial RF profile at each temporal delay was approximated with a 2D Gabor function (Eq. 8). The response amplitude as a function of time was fitted with a gamma distribution function (Eq. 9) (Liu and Yao 2014; Sun and Dan 2009).

$$\begin{aligned} R_s \left( x,y \right) = W e^{-\left( (x-x_0 )\cos \theta +(y-y_0 )\sin \theta \right) ^{2}/\sigma _x^2 -\left( (y-y_0 )\cos \theta -(x-x_0 )\sin \theta \right) ^{2}/\sigma _y^2 } \end{aligned}$$
(8)
$$\begin{aligned} R_t \left( \tau \right) = A\left( \tau -\tau _0 \right) ^{\alpha }e^{-\left( \tau -\tau _0 \right) /\sigma +A_0 } \end{aligned}$$
(9)

where \(W\) is the amplitude of the RF response, \(x_{0}\) and \(y_{0}\) refer to the location of the RF centre, \(\sigma _x \) and \(\sigma _y \) determine the width and length of the RF, and \(\theta \) is the orientation. \(A\), \(\alpha \), \(\sigma \), \(\tau _0 \), and \(A_0 \) in Eq. 9 are free parameters.
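Equations 2, 8, and 9 assemble into a separable spatiotemporal filter. A Python sketch with wholly hypothetical parameter values, on an \(11\times 11\) grid with ten 33-ms frames to match the mapping protocol above:

```python
import numpy as np

def gabor_envelope(x, y, W, x0, y0, sx, sy, theta):
    """Eq. 8: oriented 2D Gaussian envelope of the spatial RF."""
    xr = (x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)
    yr = (y - y0) * np.cos(theta) - (x - x0) * np.sin(theta)
    return W * np.exp(-xr**2 / sx**2 - yr**2 / sy**2)

def gamma_temporal(tau, A, alpha, sigma, tau0, A0):
    """Eq. 9: gamma-shaped temporal profile of the RF."""
    return A * (tau - tau0)**alpha * np.exp(-(tau - tau0) / sigma + A0)

# Eq. 2: separable spatiotemporal filter (all parameter values hypothetical)
xs, ys = np.meshgrid(np.arange(11), np.arange(11))
Rs = gabor_envelope(xs, ys, W=1.0, x0=5, y0=5, sx=2.0, sy=3.0, theta=0.3)
tau = np.arange(1, 11) * 0.033               # ten 33-ms frames
Rt = gamma_temporal(tau, A=1.0, alpha=2.0, sigma=0.05, tau0=0.0, A0=0.0)
R = Rs[..., None] * Rt                       # R(x, y, tau), rank-1 in time
```

In practice the parameters of both functions would be fitted to the measured RF by least squares (e.g. with `scipy.optimize.curve_fit`); the values above are only placeholders.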

2.3.2 Verifying the encoding effect of the model

The model was validated using another set of oriented gratings with the preferred speed, spatial frequency, and orientation, presented for 1 s and repeated 20 times. The raster output from each channel of the model was obtained, and the time-varying average response (PSTH) was computed in 1-ms time bins, smoothed with a Gaussian kernel of width 2 ms.

To quantify the prediction accuracy of the model, the residual error [defined as the difference between the true and predicted values (Andersen et al. 1992)] was computed and averaged over nonoverlapping time windows \([t_{T-I}, t_{T}]\) (\(I\ge 1\)) with Eq. 10.

$$\begin{aligned} e\left( t_T \right) =\left( \sum _{i=T-I}^T M_i -\int _{t_{T-I} }^{t_{T}} \lambda \left( t \right) \hbox {d}t \right) \Big / I \end{aligned}$$
(10)

where \(M_{i}\) denotes the mean firing rate of the neuron in the \(i\)th time bin, \(\lambda \) is the conditional intensity of the model indicating the predicted firing rate in each time bin, and \(I\) is the number of bins in the window.
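Eq. 10 reduces to a difference of sums once the integral is approximated numerically. A sketch, assuming \(M_i\) are observed spike counts per bin and using a rectangle rule for the integral:

```python
import numpy as np

def residual_error(M, lam, dt=0.001):
    """Eq. 10: observed counts minus integrated predicted rate,
    averaged over the I bins of one window (rectangle-rule integral)."""
    I = len(M)
    return (np.sum(M) - np.sum(lam) * dt) / I

# toy check: when the predicted rate integrates to the observed counts,
# the residual error is ~0
M = np.array([1.0, 2.0, 3.0])               # observed spike counts per 1-ms bin
lam = np.array([1000.0, 2000.0, 3000.0])    # predicted rate in spikes/s
err = residual_error(M, lam)
```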

Fig. 2

The spatiotemporal RF of the recorded neuron and the estimated spatial and temporal stimulus filters. a Illustration of parameter fitting of the spatiotemporal RF. The spatial RF in each frame (33 ms/frame) was fitted with a 2D Gaussian (upper, white ellipse). Shown are spatial RF profiles at ten temporal delays (circles in lower trace). The fourth frame shown is the RF at the time of peak response. The amplitude of the Gaussian fit is plotted against time (lower trace) to obtain the temporal stimulus filter. b The spatial filters of the 14-neuron group from the example dataset. The red circle indicates the spatial RF profile shown in a (color figure online)

2.4 Checking the model-based decoding performance

To check the decoding performance of the established model, we applied regularized logistic regression (Bishop 2007) to decode the stimulus orientation from a single-trial population response predicted by the model under stimulation by an arbitrarily oriented drifting sinusoidal grating of the preferred speed and spatial frequency for most recorded neurons. Each grating was presented for 1 s and repeated 20 times. The decoder was to decide whether the population response occurred in a trial under the stimulus with orientation \(\theta _{1}\) or \(\theta _{2}\), each ranging from \(0^{\circ }\) to \(330^{\circ }\) in steps of \(30^{\circ }\). Such a classification has proven useful for assessing the quality of different population coding schemes (Berens et al. 2011). We trained the logistic regression model using the glmnet toolbox (Friedman et al. 2010) in MATLAB with \(L_{1/2}\) regularization instead of \(L_{1}\) regularization; \(L_{1/2}\) regularization has been shown to achieve better convergence performance than \(L_{1}\) regularization (Zhao et al. 2012). For each pairwise combination of stimulus orientations, cross-validation was performed using a repeated random subsampling technique (80 % training data, 20 % test data) over the whole regularization path (50 regularization parameters spaced between \(\hbox {e}^{-10}\) and e). The percentage correct over all test data for each orientation was averaged to estimate the decoding performance based on the population response predicted by the model.
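The decoding step can be illustrated without glmnet. In the Python sketch below, a plain gradient-descent logistic regression stands in for the \(L_{1/2}\)-regularized glmnet fit, and the single-trial population responses are simulated Poisson counts (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, y, l2=1e-2, lr=0.1, n_iter=500):
    """Gradient-descent logistic regression (a stand-in for the
    L1/2-regularized glmnet fit used in the paper)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

# hypothetical single-trial population counts for two orientations:
# 5 of the 14 cells respond more strongly to orientation theta_2
n_trials, n_cells = 40, 14
X0 = rng.poisson(3.0, (n_trials, n_cells)).astype(float)
X1 = rng.poisson(3.0, (n_trials, n_cells)).astype(float)
X1[:, :5] += rng.poisson(4.0, (n_trials, 5))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(n_trials), np.ones(n_trials)]

Xz = (X - X.mean(0)) / X.std(0)             # z-score each cell's counts
Xz = np.c_[np.ones(len(Xz)), Xz]            # intercept column

# one 80/20 random subsample, as in the cross-validation scheme above
idx = rng.permutation(len(y))
tr, te = idx[:64], idx[64:]
w = fit_logistic(Xz[tr], y[tr])
acc = np.mean((Xz[te] @ w > 0) == (y[te] == 1))
```

The sketch omits the regularization path and the repeated subsampling loop; in the paper these are swept over 50 penalty values and averaged.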

3 Results

3.1 Electrophysiological results

A total of five datasets (containing 14, 15, 16, 15, and 12 units, respectively) were collected from the V1 of five rats with the same MEA to fit and further examine the model. The detailed results of the dataset with 14 units are presented together with the key results of the other datasets.

The spatiotemporal RF was first mapped for each recorded neuron. The spatial profile and temporal dynamics of the RFs were fitted according to the method in Sect. 2.3.1 to estimate the spatial and temporal stimulus filters. Figure 2a presents the mapped spatial RFs at different time delays (upper) of an example neuron (also used in Figs. 3, 4) and the response amplitude as a function of time (lower, blue empty dots). The former was fitted with a 2D Gabor function (Eq. 8), indicated with an ellipse in Fig. 2a (upper). The latter was fitted with a gamma distribution function (Eq. 9), denoted with a solid line in Fig. 2a (lower). The performance of each fit was assessed with goodness-of-fit statistics, including the adjusted coefficient of determination (denoted by adjR2) and the root mean squared error (denoted by stdError), summarized in Table 1. The estimated spatial filters of the 14-neuron group (Fig. 2b) formed an approximately complete mosaic covering a small region of visual space.

Fig. 3

a Seven-dimensional basis for post-spike filters (upper) and four-dimensional basis for coupling filters (lower). b Estimated post-spike filter for the example mentioned in Fig. 2. c The coupling filters from the rest of the population for the example neuron mentioned in Fig. 2

The post-spike-dependent filters and coupling filters were both represented using a set of raised cosine pulses as basis functions (Eq. 5, Fig. 3a), chosen according to the inter-spike interval (ISI) distribution and the temporal structure of the normalized cross-correlation histogram (CCH), which was obtained from the normalized joint PSTH by subtracting the cross-product of the PSTHs and then dividing by the SD of the PSTH predictors. The temporal range was chosen after the observation that the magnitude of most estimated filters declined back to zero well within the first 40–60 ms. The exponentiated post-spike and coupling filters estimated for the example neuron (mentioned in Fig. 2a) are shown in Fig. 3b, c, respectively.
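The CCH normalization just described (joint PSTH minus the cross-product of the PSTHs, divided by the PSTH SDs) can be sketched as follows; the binning and lag range are illustrative:

```python
import numpy as np

def normalized_cch(spikes_a, spikes_b, max_lag=50):
    """Cross-correlation histogram: joint PSTH minus the cross-product
    of the two PSTHs, divided by the product of the PSTH SDs.
    spikes_*: (n_trials, n_bins) arrays of 0/1 spike counts."""
    psth_a, psth_b = spikes_a.mean(0), spikes_b.mean(0)
    n_bins = spikes_a.shape[1]
    lags = np.arange(-max_lag, max_lag + 1)
    cch = np.zeros(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            joint = (spikes_a[:, :n_bins - lag] * spikes_b[:, lag:]).mean()
            shuf = (psth_a[:n_bins - lag] * psth_b[lag:]).mean()
        else:
            joint = (spikes_a[:, -lag:] * spikes_b[:, :n_bins + lag]).mean()
            shuf = (psth_a[-lag:] * psth_b[:n_bins + lag]).mean()
        cch[i] = joint - shuf
    return lags, cch / (psth_a.std() * psth_b.std() + 1e-12)

# demo: unit b fires 5 ms after unit a, so the CCH should peak at +5
rng = np.random.default_rng(2)
a = (rng.random((20, 500)) < 0.2).astype(float)
b = np.roll(a, 5, axis=1)
lags, cch = normalized_cch(a, b)
```

Subtracting the PSTH cross-product (the shuffle predictor) removes stimulus-locked covariation, so the remaining peak reflects correlation beyond what the trial-averaged responses predict.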

The significance \(p\) value of each coupling filter between each pair of neurons was obtained using the penalized maximum-likelihood method. The dynamic threshold was then determined to transform the \(p\) matrix into a binary graph estimating the connection structure of the network. The threshold was adjusted until the strongest small-world property (corresponding to the largest value of \(S_{w}\)) emerged (see Sect. 2.2). Figure 4 shows the estimated binary connection matrix of the SW-based model (\(S_{w} = 1.52\), Fig. 4a) together with that of the control nSW-based model (\(S_{w} = 1.02\), Fig. 4b) without the small-world structure. The corresponding typical characteristics of both networks across all recorded datasets are shown in Table 2. The main difference between the two kinds of networks was the number of connections among the population as well as their connection structures. The small-world property disappeared as the number of connections grew much larger (Fig. 4c). Conversely, the network was no longer connected if the threshold was set below the first value in Fig. 4c.

Fig. 4

The binary connectivity matrix of the 14-neuron network for the SW-based model (a \(\alpha _{t}=0.0247\)) and the nSW-based model (b \(\alpha _{t}=0.0308\)). A solid circle at row \(i\) and column \(j\) indicates a directed link from neuron \(i\) to neuron \(j\). Neurons are sorted according to the electrode from which they were recorded. c The network index \(S_{w}\) and the number of connections plotted against the threshold level

3.2 Examination results of the model

The model built in this paper was tested for both encoding accuracy and decoding performance.

To test the encoding accuracy, the responses of individual neurons to 20 repeats of a 1-s novel stimulus (see Sect. 2.3.2 for detailed information) were predicted under the SW-based model. The firing rate in each 1-ms time bin was computed and averaged across the 20 trials. The residual error between the predicted PSTH (smoothed with a Gaussian kernel of width 2 ms) and the recorded neuronal data was computed (blue line, Fig. 5a). The residual error predicted by the control model (the nSW-based model) was computed in a similar way and is shown with a red line in Fig. 5a. Each model's accuracy in predicting individual neuron responses within 1 s was computed with Eq. 10 (Sect. 2.3.2). The prediction accuracy was compared between the two models across all five datasets (\(n = 72\), Fig. 5b). From the comparison we can see that the SW-based model predicted individual responses more accurately (\(34.5\pm 3.1\,\%\) less residual error) than the nSW-based model. The results show that the computation described by the SW-based model was much closer to the encoding process of the neurons recorded in V1 than that of the nSW-based model, suggesting that the small-world structure of a local neuronal population might be necessary to predict responses of individual neurons in V1. However, this alone is not enough to evaluate the model's capacity to extract the stimulus information that the responses carry.

Table 1 The goodness-of-fit statistics for fitting spatial and temporal RFs
Table 2 Characteristics and the summary predicted effect of both models
Fig. 5

Encoding performance compared between the two models. a The residual error between the true PSTH of the neuron to 1-s novel stimuli in 1-ms time bins and the predicted response under the SW-based model (blue) and the nSW-based model (red). b The prediction error of the nSW-based model for all 72 neurons plotted against that of the SW-based model. The SW-based model predicts responses with lower error than the control model. The point plotted in red is the predicted result for the target neuron (color figure online)

To further examine the decoding performance of the proposed model, we used \(L_{1/2}\) regularized logistic regression to decode stimulus information (taking the grating's orientation as an example) from the population response predicted by the established model. The grating's orientation (\(\theta _{1}\) or \(\theta _{2}\), see Sect. 2.3.2 for detailed information) was decoded from each short (50-ms) segment of the population response during the presentation of a grating stimulus. The percentage correct for each short segment is plotted against the time latency in Fig. 6a (blue line), together with that of the control model (red line). The percentages correct between 100 and 500 ms after stimulus onset were averaged to represent the decoding accuracy for each combination of grating orientations. Figure 6b shows the decoding accuracy plotted against \(\Delta \theta \) (\(\Delta \theta = {\vert }\theta _{1} -\theta _{2}{\vert }\)), with \(\theta _{1}\) and \(\theta _{2}\) ranging from \(0^{\circ }\) to \(330^{\circ }\) in steps of \(30^{\circ }\). Note that orientations whose difference is \(180^{\circ }\) were considered identical orientations with \(\Delta \theta = 0\), and a difference between orientations larger than \(90^{\circ }\) was replaced by its supplementary angle. Thus, there are in total four different \(\Delta \theta \) values, each with several different combinations; the final decoding accuracy at each \(\Delta \theta \) in Fig. 6b is the mean percentage correct over all combinations. The comparison indicates that the decoding performance is improved across all \(\Delta \theta \) values by taking the small-world structure of neuronal populations into account. However, Fig. 6 only presents the decoding performance differences for two example models.
We further analyzed the decoding performance of the two types of ensembles to see whether the conclusion is stable across different neuronal groups (see Sect. 3.3 for details).

Fig. 6

Decoding performance under both models. a Time course of the decoding performance for both models averaged over each combination of \(\Delta \theta \). b Average decoding performance for both models plotted as a function of \(\Delta \theta \). The color code is as in a (color figure online)

3.3 Effect of small-world degree on a model’s performance

To explore whether the degree of the small-world property influences the encoding and decoding performance of the model, we constructed a set of 14-neuron networks based on SW-based models (in total \(n_\mathrm{net} = 151\)), derived from all five datasets. The encoding and decoding accuracy of each population were evaluated as described above. The distribution of the computed \(S_{w}\) over all 14-neuron networks is shown in Fig. 7a. We graded the set of networks into three groups, i.e. a strong group (\(S_{w} \ge 1.45\), \(n_\mathrm{net}= 34\)), a modest group (\(1.30 \le S_{w} < 1.45\), \(n_\mathrm{net} = 87\)), and a weak group (\(S_{w} < 1.30\), \(n_\mathrm{net} = 30\)). Statistical comparisons were then made between the groups on both encoding and decoding accuracy, as shown in Fig. 7b, c. In addition, we computed the encoding and decoding accuracy of the control models based on all networks (\(n_\mathrm{net} = 151\), \(S_{w} \approx 1.00\)). From the comparison we can see that there is no significant variation (paired \(t\) test, \(p>0.1\)) between groups with different degrees of small-world property, while the difference between each group and the control model is significant (paired \(t\) test, \(p < 10^{-5}\)). These results show that the encoding and decoding performance of a local population is significantly affected by the small-world structure, whereas the model's performance has no distinct relationship with the degree of the small-world property.

Fig. 7

Performance comparison among models based on networks with different levels of small-worldness. a The comparison of encoding accuracy between models based on networks with different levels of \(S_w\) (see Sect. 3.3). b The comparison of decoding accuracy between models based on networks with different levels of \(S_{w}\)

As the results across all 14-neuron groups were stable, we considered it justified to reconstruct networks with different numbers of neurons to investigate whether the conclusions depend on the number of neurons. The results of this analysis were highly consistent across 13-neuron networks (\(n_\mathrm{net}= 534\) out of 784) and 12-neuron networks (\(n_\mathrm{net}=657\) out of 2823). These results suggest that the encoding and decoding performance of the local population in V1 is improved by the small-world structure regardless of its degree.

4 Discussion

An important problem in neurophysiology is to understand how the sensory system encodes visual information and what affects the spiking activity of a single neuron or of neuronal populations. The most popular approach is to relate a neuron's spiking activity to three factors (Truccolo et al. 2005): the stimulus information, the neuron's own spiking history, and the modulation effect from coupled neurons. This is also the basic principle of the GLM, which has been widely applied to describe the computation process of neuronal populations and has yielded valuable conclusions (Pillow et al. 2008). Nevertheless, little attention has been given to the influence of the network structure on encoding and decoding of information by neuronal populations in V1. In this paper, we extended the GLM to describe the spiking activity of a neuronal population in V1 by specifying the network connectivity according to the small-world property, which has been reported by numerous studies (Yu et al. 2008; Gerhard et al. 2011).

The small-world network has also been used in recent models. For example, Zheng et al. (2014) established a small-world neuronal network model and showed that the temporal order of spiking activity is enhanced by introducing small-world connectivity; our study further demonstrates that the small-world structure improves the encoding and decoding performance of local neuronal populations in V1. Many researchers have also been dedicated to exploring the influence of network structure on the population dynamics of neurons. Pernice et al. (2013) and Trousdale et al. (2013) each established numerical models and revealed the strong influence of network structure on population activity dynamics, but those models are based on a large number of neurons (even more than 10,000) and cannot be examined in electrophysiological experiments. Haslinger et al. (2013) established a regression-tree-based neuronal population model and demonstrated that pattern-based encodings were superior to those of independent-neuron models. Similarly, from our simulation results we can conclude that the small-world structure (a type of neuronal population pattern) may enhance encoding and decoding performance.

In conclusion, the simulation results of our model suggest that the small-world structure of the local population may play an important role in representing and carrying information. Furthermore, we hypothesize that neurons in V1 do not encode visual information in a purely individual or pairwise way, but gather as a neuronal network with dynamic functional connectivity, which makes their spiking activity more robust and diverse for carrying and processing visual information.

However, the model's predictions do not completely agree with the recorded data. This discrepancy can be explained by the following factors: (1) the parameters of the SW-based model are mainly determined by the responses of simple cells with clear spatial RFs; however, other kinds of cells (e.g. complex cells) are insensitive to the location of the stimuli (Hubel and Wiesel 1962), which may affect the results. (2) The number of recorded neurons is finite, and the integration from coupling neurons is likewise simulated with this finite number of neurons; it is, in fact, impossible to measure exactly how each neuron interacts with all of its adjacent neurons (Wen and Zhang 2009), which can also cause deviations. In addition, the model was only fitted with small populations (12–16 neurons) owing to our limited experimental conditions. In the future, we plan to fit the model with larger populations (more than 20 neurons) and hope to draw further valuable conclusions.