Introduction

An ion channel is a large protein molecule that regulates cell function by controlling the flow of ions across the cell membrane (Aidley and Stanfield 1996, p. 3). Conduction of ions occurs through an aqueous pore which opens in response to a stimulus specific to the type of channel. Simple channels exhibit only two levels, open (conducting) and closed (Hille 2001, Chapters 4 and 5), but complex channels also have subconducting levels (Sukharev et al. 1999).

Mechanosensitive channels are triggered (gated) by membrane tension. These channels undergo conformational changes when the cell membrane is mechanically stressed (Martinac 2011). The most studied mechanosensitive channels are those of small (MscS) and large (MscL) conductance in the bacterium Escherichia coli (E. coli); see (Hamill and Martinac 2001; Martinac 2011). These two types of channel are multi-level and control intracellular pressure. In other organisms, mechanosensitive channels mediate the senses of touch, hearing, balance and proprioception, the latter including the sense of position of body parts in humans (Gillespie and Walker 2001; Ernstrom and Chalfie 2002; Xiao and Xu 2010).

Ion channel currents are recorded using the patch clamp technique (Hamill et al. 1981), and this provides a key source of information regarding ion channel activity. Currents appear to fluctuate randomly between various conducting levels. The noisy current is low-pass filtered and then digitised (sampled and quantised) to produce a sequence of observed currents in discrete time. Evidence indicates that noise in the recordings is not white (Venkataramanan et al. 1998a; Qin et al. 2000b; Fredkin and Rice 2001; Colquhoun and Sigworth 2009). For example, Schouten (2000, Chapter 4) analysed patch clamp recordings of barley leaf protoplasts and concluded, based on plots of autocorrelation functions of the noise at the closed level, that the noise in their records was correlated.

Hidden Markov models (HMMs) are widely used for describing the gating behaviour of a single ion channel and form a basis for statistical analysis of patch clamp data (Chung et al. 1990; Khan et al. 2005). Incorporating adjustments for filtering and coloured noise complicates the application of hidden Markov methodology, but such adjustments matter: Michalek et al. (1999) found that model parameters and the channel gating scheme for Na\(^+\) channels were incorrectly estimated when correlations in channel noise were ignored.

Other models for ion channels have also been proposed, including fractal models (Liebovitch et al. 1987; Liebovitch 1989) and defect-diffusion models (Läuger 1985, 1988; Millhauser et al. 1988; French and Stockbridge 1988). These models differ fundamentally from the Markov model in that transition rates are no longer constant. The general conclusion from the literature is that while non-Markovian models may have some merit, Markov models have so far provided a better fit to the data (Colquhoun and Sigworth 2009; Korn and Horn 1988; McManus et al. 1988; Sansom et al. 1989; McManus and Magleby 1989; Petracchi et al. 1991; Gibb and Colquhoun 1992). More recently, a non-parametric technique with exact missed event correction has been reported (Epstein et al. 2016). However, the method is based on the rather strong assumption that the filtered and digitised channel record has perfect resolution. No comparison of this method with Markov-based methods has yet been conducted.

Almanjahie et al. (2015) considered an HMM with filtering but uncorrelated noise for ion channel data. They used the EM algorithm for parameter estimation, based on an extended forward–backward algorithm similar to that of Fredkin and Rice (2001). In this paper, we extend that work to an HMM with filtering and correlated noise.

The remainder of the paper is organised as follows. In “Standard hidden Markov models”, we review the standard HMM and present some mathematical preliminaries in “Some preliminaries”. Extensions of the HMM that include filtering and AR models for correlated noise are given in “HMMs with filtering and correlated noise”. We then introduce our filtered HMM with a moving average (MA) model for correlated noise in “Deconvolution approach”, using deconvolution (a signal processing method) to pre-whiten the noise. The pre-processed data can be modelled as a standard HMM, and parameters are estimated using the EM algorithm. Results of simulation studies to evaluate the performance of this approach are discussed in “Simulation study”. In “Application: MscL data”, the method is applied to real data obtained from MscL in E. coli. Finally, in “Discussion”, we discuss our findings and make concluding remarks.

Standard hidden Markov models

We model the gating behaviour of a single ion channel by a continuous time, regular, homogeneous Markov chain with a finite number of states that correspond to the conformational states of the channel (Colquhoun and Hawkes 1997, 1981), which we assume (for the moment) have distinct conductances. Let \(X_t\) denote the state of the channel at time \(t\ge 0\), \(S =\{0 ,1,\ldots ,N-1\}\) the state space and \(\varvec{Q}\) the intensity matrix for the process \(\{X_t:t\ge 0\}\). Further, let \(\varvec{\mu }=(\mu _{0},\mu _{1},\ldots ,\mu _{N-1})\) and \(\varvec{\sigma }=(\sigma _{0},\sigma _{1},\ldots ,\sigma _{N-1})\), where \(\mu _{i}\) and \(\sigma _{i}\) are, respectively, the mean current and the noise standard deviation corresponding to state i.

In practice, the noisy current is low-pass filtered and sampled, but for the moment we ignore the effect of the filtering. Then \(\varvec{X}=(X_{1},X_{2},\ldots ,X_{T})\) is a (segment of a) discrete-time, homogeneous, irreducible Markov chain with a finite state space \(S =\{0 ,1 , \ldots ,N-1\}\), \(N\times N\) transition probability matrix \(\varvec{P}=[p_{ij}]\), and initial state distribution \(\varvec{\pi }=(\pi _{0},\pi _{1},\ldots ,\pi _{N-1})\) where \(\pi _i={\mathbb {P}}(X_1=i)\), \(i\in S\). The sampled current at time t can be represented as

$$\begin{aligned} Y_{t}=\mu _{X_{t}}+\sigma _{X_{t}}\varepsilon _{t},\qquad t=1,2,\ldots ,T, \end{aligned}$$
(2.1)

where \(\varepsilon _{1},\varepsilon _{2},\ldots ,\varepsilon _{T}\) are independent and identically distributed (iid) N(0, 1) random variables, assumed also to be independent of \(\varvec{X}\). Given \(\varvec{X}\), the random variables \(Y_{1},Y_{2},\ldots ,Y_{T}\) are conditionally independent. Moreover, the distribution of \(Y_{t}\) conditional on \(\varvec{X}\) depends only on \(X_{t}\) and, by Eq. (2.1),

$$\begin{aligned}(Y_{t}\mid X_{t}=x_{t})\sim N(\mu _{x_{t}},\sigma ^2_{x_{t}}), \quad t=1,2,\ldots ,T.\end{aligned}$$

Set \(\varvec{Y} = (Y_{1},Y_{2},\ldots ,Y_{T})\). The joint distribution of \((\varvec{X}, \varvec{Y})\), i.e. probability mass function for \(\varvec{X}\) and (conditional) probability density function for \(\varvec{Y}\) given \(\varvec{X}\), is

$$\begin{aligned} {\mathbb {P}}(\varvec{x},\varvec{y}) = \pi _{x_{1}}\prod _{t=2}^{T}p_{x_{t-1},{x_{t}}}\prod _{t=1}^{T}f_{x_t}(y_{t}), \end{aligned}$$
(2.2)

\(\varvec{x}\in S^T, \varvec{y}\in {\mathbb {R}}^T\), where \(f_{x_t}\) is the \(N(\mu _{x_{t}},\sigma ^2_{x_{t}})\) probability density function. We call the representation in (2.2) a standard HMM.

Denote the model parameters by the vector \(\varvec{\phi }=(\varvec{\pi },\varvec{P},\varvec{\mu },\varvec{\sigma })\). As in Khan et al. (2005) and Almanjahie et al. (2015), the EM algorithm (Dempster et al. 1977) can be used for parameter estimation. In the following \(i,j=0,1,\ldots ,N-1\) represent the states of the Markov chain. Let the index \(m=0\) indicate initial parameter values. Then at iteration \(m=1,2,\ldots\) of the EM algorithm, the updating formulae for \(\pi _{i}\), \(p_{ij}\), \(\mu _{i}\) and \(\sigma _{i}\) are

$$\begin{aligned} \pi ^{m+1}_{i}&=\gamma ^m_{1}(i), \end{aligned}$$
(2.3)
$$\begin{aligned} p^{m+1}_{ij}&=\frac{\sum _{t=1}^{T-1}\gamma ^m_{t}(i,j)}{\sum _{t=1}^{T-1}\sum _{j=0}^{N-1}\gamma ^m_{t}(i,j)}, \end{aligned}$$
(2.4)
$$\begin{aligned} \mu ^{m+1}_{i}&=\frac{\sum _{t=1}^{T}\gamma ^m_{t}(i) y_{t}}{\sum _{t=1}^{T}\gamma ^m_{t}(i)}, \end{aligned}$$
(2.5)
$$\begin{aligned} \sigma ^{m+1}_{i}&=\left\{ \frac{\sum _{t=1}^{T}\gamma ^m_{t}(i)(y_{t}-\mu ^{m+1}_{i})^2}{\sum _{t=1}^{T}\gamma ^m_{t}(i)}\right\} ^\frac{1}{2}, \end{aligned}$$
(2.6)

where

$$\begin{aligned} \gamma _{t}^{m}(i)= {\mathbb {P}}(X_{t}=i\mid \varvec{y},\varvec{\phi }^m),\qquad t=1,2,\ldots ,T, \end{aligned}$$
(2.7)

and

$$\begin{aligned} \gamma _{t}^{m}(i,j)= {\mathbb {P}}(X_{t}=i,X_{t+1}=j\mid \varvec{y},\varvec{\phi }^m), \end{aligned}$$
(2.8)

\(t=1,2,\ldots ,T-1\). The \(\gamma _{t}^{m}(i)\) and \(\gamma _{t}^{m}(i,j)\) can be computed recursively (Khan et al. 2005; Almanjahie et al. 2015) using Baum’s forward and backward algorithms (Baum et al. 1970; Devijver 1985). Iterations continue until some stopping criterion, such as a pre-set tolerance, is satisfied.
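For concreteness, the M-step updates (2.3)–(2.6) can be written in a few lines of NumPy. The fragment below is a schematic sketch of ours, not the authors' code; it assumes the smoothed probabilities \(\gamma _{t}(i)\) and \(\gamma _{t}(i,j)\) have already been obtained from the forward–backward recursions.

```python
import numpy as np

def em_m_step(y, gamma, digamma):
    """M-step updates (2.3)-(2.6) for the standard HMM.

    y       : (T,) observed currents
    gamma   : (T, N) smoothed probabilities gamma_t(i)
    digamma : (T-1, N, N) pairwise probabilities gamma_t(i, j)
    """
    pi = gamma[0]                                      # Eq. (2.3)
    P = digamma.sum(axis=0)
    P /= P.sum(axis=1, keepdims=True)                  # Eq. (2.4)
    w = gamma.sum(axis=0)                              # sum_t gamma_t(i)
    mu = gamma.T @ y / w                               # Eq. (2.5)
    resid2 = (y[:, None] - mu[None, :]) ** 2
    sigma = np.sqrt((gamma * resid2).sum(axis=0) / w)  # Eq. (2.6)
    return pi, P, mu, sigma
```

Iterating this update with a recomputation of the \(\gamma\) quantities at the new parameter values gives the EM algorithm described above.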

Some preliminaries

The notation

$$\begin{aligned}\left\{ x_k\right\} _{k=-\infty }^\infty =\left( \ldots , x_{-1}, x_0, x_1, x_2, \ldots \right) \end{aligned}$$

represents a discrete time sequence, where the index ranges over the integers or a subset of them. When the limits of the index are clear from the context, we will omit them.

A discrete time system is a function T that maps an input sequence \(\{x_k: k \in {\mathbb {Z}}\}\) to an output sequence \(\{y_k: k \in {\mathbb {Z}}\}\) given by

$$\begin{aligned} \{y_k\} = T\left( \{x_k\}\right) , \quad k\in {\mathbb {Z}}. \end{aligned}$$
(3.1)

A system is stable if for every bounded input sequence \(\{x_k\},\) the output \(\{y_k\}\) is bounded. The system is linear if for sequences \(\{u_k\}\) and \(\{v_k\},\)

$$\begin{aligned}T\left( \{au_k+bv_k\}\right) = aT\left( \{u_k\}\right) + bT\left( \{v_k\}\right) , \quad a, b \in {\mathbb {R}},\end{aligned}$$

and is time invariant if the input–output relationship does not change over time, that is, for each \(j \in {\mathbb {Z}}\), \(T\left( \{x_{k-j}\}\right) = \{y_{k-j}\}\). Henceforth we consider only linear time-invariant (LTI) systems.

The impulse response \(\{h_k\}\) of a discrete LTI system is the output of the system when the input is an impulse \(\{\delta _k\}\), where \(\{\delta _k\}\) is the Kronecker delta defined as

$$\begin{aligned} \delta _k = {\left\{ \begin{array}{ll} 1, \quad k=0,\\ 0, \quad k \ne 0. \end{array}\right. } \end{aligned}$$

The system is called finite (duration) impulse response (FIR) if its impulse response is a sequence of finite length.

Let \(\{u_k: k \in {\mathbb {Z}}\}\) and \(\{x_k: k \in {\mathbb {Z}}\}\) be discrete time sequences. The convolution of \(\{u_k\}\) and \(\{x_k\}\), denoted \(\{x_k\}*\{u_k\}\), is defined as

$$\begin{aligned} \{x_k\}*\{u_k\}&= \left\{ \sum _{j=-\infty }^\infty x_j\ u_{k-j}\right\} \nonumber \\&= \left\{ \sum _{j=-\infty }^\infty u_j\ x_{k-j}\right\} = \{u_k\}*\{x_k\}; \end{aligned}$$
(3.2)

see Proakis and Manolakis (1996, pp. 76–82). Note that the convolution yields a sequence. It follows that for any sequence \(\{x_k\},\)

$$\begin{aligned} \{x_k\}*\{\delta _k\} = \{x_k\} = \{\delta _k\} * \{x_k\}, \end{aligned}$$
(3.3)

so \(\{\delta _k\}\) is the identity for convolution.

An LTI system can be completely characterised by its impulse response \(\{h_k\}\), as formalised in the following result (Proakis and Manolakis 1996, p.76).

Theorem 1

The output \(\{y_k\}\) of an LTI system is related to the input \(\{x_k\}\) by

$$\begin{aligned} \{y_k\} = \{h_k\}*\{x_k\}, \end{aligned}$$
(3.4)

where \(\{h_k\}\) is the system impulse response.

In particular, since \(\{h_k\}\) is the impulse response,

$$\begin{aligned} \{h_k\} = \{h_k\}*\{\delta _k\}, \end{aligned}$$
(3.5)

a result that also follows from (3.3).
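These identities are easy to check numerically for finite sequences (treated as zero outside their support). A small NumPy sketch, with an arbitrary input and an arbitrary 3-point FIR impulse response:

```python
import numpy as np

x = np.array([2.0, -1.0, 3.0, 0.5])   # input sequence, zero elsewhere
h = np.array([0.25, 0.5, 0.25])       # an arbitrary FIR impulse response

y = np.convolve(h, x)                 # Eq. (3.4): y = h * x
assert np.allclose(y, np.convolve(x, h))       # commutativity, Eq. (3.2)

delta = np.array([1.0])               # Kronecker delta, support {0}
assert np.allclose(np.convolve(x, delta), x)   # Eq. (3.3): delta is the identity
```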

The z-transform

The z-transform (Adsad and Dekate 2015; Proakis and Manolakis 1996, p. 169) of the sequence \(\left\{ x_k\right\}\) is given by

$$\begin{aligned}X(z) = \sum _{k=-\infty }^\infty x_k z^{-k}.\end{aligned}$$

The region of convergence (ROC) of X(z), denoted \(R_X\), is the subset of the complex plane for which the sum converges. In simple cases, the z transform can be written in closed form.

Note that if we substitute \(z = e^{sT}\), then the z-transform is equivalent to the Laplace transform of a continuous time signal sampled at frequency \(f=1/T\) (Adsad and Dekate 2015).

The next theorem relates convolution to the z transform.

Theorem 2

Let \(\{u_k\}\) and \(\{x_k\}\), \(k \in {\mathbb {Z}}\), be discrete time sequences and put

$$\begin{aligned} \{y_k\} = \{u_k\} * \{x_k\}. \end{aligned}$$
(3.6)

Then the z transform Y(z) of \(\{y_k\}\) is

$$\begin{aligned} Y(z) = U(z)\ X(z), \end{aligned}$$
(3.7)

where U(z) and X(z) are the z transforms of \(\{u_k\}\) and \(\{x_k\}\), respectively.

Theorem 2 implies that convolution in the discrete time domain is equivalent to multiplication in the z-domain. Applying the result to (3.4) gives

$$\begin{aligned} Y(z) = H(z)\ X(z). \end{aligned}$$
(3.8)

The function H(z) is the z transform of the system impulse response and is called the transfer function of the system. It can be shown that an LTI system is stable if and only if the ROC of H(z) includes the unit circle (Proakis and Manolakis 1996, p. 209).

Deconvolution

Deconvolution is the process of determining the input sequence given an output sequence and the system impulse response. This requires finding a discrete time sequence \(\{b_k\}\) that when convolved with the known output \(\{y_k\}\) yields the input sequence \(\{x_k\}\). By (3.4), it follows that

$$\begin{aligned} \{b_k\} * \{y_k\} = \{b_k\} * \{h_k\}*\{x_k\} = \{x_k\}. \end{aligned}$$
(3.9)

By (3.3), the sequence \(\{b_k\}\) must satisfy

$$\begin{aligned} \{b_k\} * \{h_k\} = \{\delta _k\}, \end{aligned}$$
(3.10)

that is, \(\{b_k\}\) is the inverse of \(\{h_k\}\) under convolution. Solving (3.10) for \(\{b_k\}\) given \(\{h_k\}\) in the time domain is usually difficult (Proakis and Manolakis 1996, p. 356). However, (3.8) can be rewritten as

$$\begin{aligned} X(z) = \frac{Y(z)}{H(z)}=F(z)\ Y(z), \end{aligned}$$
(3.11)

where \(F(z) = 1/H(z)\) is the reciprocal of the transfer function. Thus, deconvolution in the time domain is equivalent to division in the z domain. It follows from (3.9), (3.11) and Theorem 2 that F(z) is the z transform of \(\{b_k\}\), that is, \(F(z) = \sum _k b_k z^{-k}\).

Note that in general the inverse \(\{b_k\}\) may be of infinite length. In practice, this requires the sequence to be truncated (Mourjopoulos 1994). We will consider a truncation to be adequate if it satisfies the (Euclidean) norm-based criterion

$$\begin{aligned} \mid \mid \{b_k\}*\{h_k\} - \{\delta _k\}\mid \mid < \varepsilon \end{aligned}$$
(3.12)

for some pre-set value of \(\varepsilon\) (Proakis and Manolakis 1996, §8.5.2).
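Later in the paper the truncated inverse is obtained from a series expansion of F(z) (see “Implementing MAD”). Numerically, one can instead choose the \(2m+1\) taps of \(\{b_k\}\) to minimise the left-hand side of (3.12) directly, which is a linear least-squares problem. The sketch below is ours (the function and its interface are assumptions, not from the paper); it takes an odd-length FIR filter centred at \(k=0\).

```python
import numpy as np

def truncated_inverse(h, m, eps=1e-3):
    """Least-squares truncated convolution inverse of an FIR filter.

    h : odd-length impulse response, centred at k = 0
    m : half-length of the inverse; b has 2*m + 1 taps
    Returns b with || b * h - delta || < eps, else raises.
    """
    n_out = len(h) + 2 * m             # length of the full convolution b * h
    # Convolution matrix: column j holds h shifted down by j taps,
    # so A @ b equals np.convolve(b, h).
    A = np.zeros((n_out, 2 * m + 1))
    for j in range(2 * m + 1):
        A[j:j + len(h), j] = h
    delta = np.zeros(n_out)
    delta[n_out // 2] = 1.0            # Kronecker delta at the centre (t = 0)
    b, *_ = np.linalg.lstsq(A, delta, rcond=None)
    err = np.linalg.norm(A @ b - delta)
    if err >= eps:
        raise ValueError(f"truncation too short: residual {err:.2e}")
    return b
```

For a well-behaved filter whose inverse decays quickly, a small half-length m already satisfies the criterion (3.12).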

HMMs with filtering and correlated noise

Correlation in the noise is due to the low-pass filter and perhaps channel gating characteristics. Various models that include correlated noise have been proposed in the literature.

Venkataramanan et al. (1998a, b) developed a model incorporating correlated background noise and state-dependent ‘excess’ noise. They modelled background correlated noise \(\{n_t\}\) as an autoregressive (AR) process of order p. Their model can be written for \(t=1,2,\ldots ,T\) as

$$\begin{aligned} Y_{t}&= \mu _{X_t} + n_t + \sigma _{X_{t}}\varepsilon _{t}, \end{aligned}$$
(4.1a)
$$\begin{aligned} n_t&= \sum _{j=1}^{p}\zeta _{j} n_{t-j} + \delta _t, \qquad p\le t. \end{aligned}$$
(4.1b)

Each of the sequences \(\{\varepsilon _t\}\) and \(\{\delta _t\}\) was assumed to be iid normally distributed, with variances 1 and \(\sigma ^2_{\delta },\) respectively, and \(\zeta _1,\zeta _2,\ldots ,\zeta _{p}\) are the coefficients specifying the AR(p) model for \(\{n_t\}\).

Venkataramanan et al. (1998a, b) estimated the coefficients of the AR process as follows. First, the autocorrelation function of the noise was estimated from long stretches of noise at the closed level. Then the coefficients in Eq. (4.1b) were estimated by using the Levinson–Durbin algorithm to solve the Yule–Walker equations for the autocorrelations of \(\{n_t\}\), a standard time series approach (Brockwell and Davis 2006, Chapter 8). They then preprocessed the data using a pre-whitening filter of length k to remove the correlation in the noise, the parameters of this filter being obtained from the inverse of the AR filter transfer function. Since the signal \(\mu _{X_t}\) also passes through the pre-whitening filter, the observed current \(Y_t\) at time t now depends on the Markov chain states at k time points. This collection (vector) of the k Markov chain states is referred to as a meta-state (Venkataramanan et al. 1998a). To calculate the likelihood of the model for this preprocessed data, Venkataramanan et al. (1998a) considered a vector Markov chain over the \(M=N^k\) meta-states. They developed a modified Baum–Welch algorithm (Baum et al. 1970; Baum and Eagon 1967), involving some approximations, to estimate the parameters in their HMM (Venkataramanan et al. 1998a, b). However, the modified Baum–Welch algorithm does not guarantee that the likelihood is non-decreasing after each iteration, and does not necessarily produce maximum likelihood estimates (Venkataramanan et al. 1998b, p. 1926).
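The Yule–Walker/Levinson–Durbin step can be sketched as follows. This is a textbook implementation of ours (cf. Brockwell and Davis 2006), not the authors' code; it maps sample autocovariances \(r_0,\ldots ,r_p\) of the closed-level noise to the AR coefficients \(\zeta _j\) of Eq. (4.1b) and the innovation variance \(\sigma ^2_\delta\).

```python
import numpy as np

def levinson_durbin(r):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    r : autocovariances (r[0], ..., r[p]) of the noise {n_t}
    Returns (zeta, v): the AR(p) coefficients of Eq. (4.1b) and the
    innovation variance sigma_delta^2.
    """
    p = len(r) - 1
    zeta = np.zeros(p)
    v = r[0]                                     # prediction error variance
    for k in range(p):
        # reflection coefficient for the order-(k+1) model
        acc = r[k + 1] - zeta[:k] @ r[1:k + 1][::-1]
        kappa = acc / v
        zeta_new = zeta.copy()
        zeta_new[k] = kappa
        zeta_new[:k] = zeta[:k] - kappa * zeta[:k][::-1]
        zeta = zeta_new
        v *= (1.0 - kappa ** 2)
    return zeta, v
```

For example, an AR(1) process with \(\zeta _1=0.5\) and unit innovation variance has \(r_0=4/3\) and \(r_1=2/3\), and the recursion recovers \(\zeta _1\) and \(\sigma ^2_\delta\) exactly.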

Schouten (2000, Chapter 5) and De Gunst et al. (2001) used the model in Eq. (4.1), but also incorporated a Gaussian MA filter with length \(2r+1\) as an adjustment for the effect of the low-pass filter. Specifically, for \(\max (r,p)\le t\le T-r\),

$$\begin{aligned} Y_{t}&= \sum _{s=-r}^{r}\eta _s\mu _{X_{t-s}} + n_t + \sigma _{X_{t}}\varepsilon _{t}, \end{aligned}$$
(4.2a)
$$\begin{aligned} n_t&= \sum _{j=1}^{p}\zeta _{j} n_{t-j} + \delta _t , \end{aligned}$$
(4.2b)

where \(\{\varepsilon _t\}\) and \(\{\delta _t\}\) are as in Eq. (4.1) and the filter weights \(\eta _s\), \(s=-r,\ldots ,r\), are determined from filter characteristics as described in Colquhoun and Sigworth (2009, Appendix 3, p. 576); see also Table 4.1 in Schouten (2000). They too estimated the order of the AR process from long stretches of the noise at the closed level, but used Markov Chain Monte Carlo (MCMC) methods based on a meta-state approach to estimate the AR coefficients \(\zeta _{j}\), \(j=1,2,\ldots ,p\), and the HMM parameters.

Fredkin and Rice (2001) modelled the effect of the low-pass filter by a finite (duration) impulse response (FIR) filter and the state-independent correlated noise by an AR model. They pre-whitened the noise and developed an approximation to simplify likelihood computations, but did not report an application of their method to real data.

Qin et al. (2000b) extended the model of Fredkin and Rice (2001) to allow for state-dependent correlated noise. They modelled the effect of the low-pass filter by an FIR filter with coefficients \(h_s\), \(s=0,\dots ,r-1\), and used AR models for the state-dependent correlated noise \(\{n^{(X_t)}_t\}\), i.e. a separate model for each level. To simplify the computations they assumed that the AR models at all channel levels had the same order. Their model can be written as

$$\begin{aligned}&Y_{t} = \sum _{s=0}^{r-1}h_s\mu _{X_{t-s}} + n^{(X_t)}_t, \end{aligned}$$
(4.3a)
$$\begin{aligned}&n^{(X_t)}_t = \sum _{j=1}^{p}\zeta _{j}^{(X_t)} n^{(X_t)}_{t-j} + \sigma _{X_{t}}\varepsilon _{t} \end{aligned}$$
(4.3b)

for \(\max (r,p)\le t\le T\).

Rewriting Eq. (4.3b) gives

$$\begin{aligned} \sum _{j=0}^{p}\kappa _{j}^{(X_t)} n^{(X_t)}_{t-j}= & {} \sigma _{X_{t}}\varepsilon _{t}, \end{aligned}$$

where \(\kappa ^{(X_t)}_0=1\) and \(\kappa ^{(X_t)}_j=-\zeta _{j}^{(X_t)}\), \(j=1,2,\dots ,p\). This equation can be written as

$$\begin{aligned} \{\kappa ^{(X_t)}_t\}* \{n^{(X_t)}_{t}\} =\{\sigma _{X_{t}}\varepsilon _{t}\}, \end{aligned}$$

where \(*\) indicates convolution. Taking z transforms and using Theorem 2 give

$$\begin{aligned}K^{(X)}(z)N^{(X)}(z) = (\sigma _{X}\varepsilon )(z),\end{aligned}$$

whence

$$\begin{aligned} N^{(X)}(z)= (\sigma _{X}\varepsilon )(z)/K^{(X)}(z), \end{aligned}$$
(4.4)

where \(K^{(X)}(z)\) is the transfer function of the AR filter. Then taking the z transform of the model in Eq. (4.3a) and using Eq. (4.4) yield

$$\begin{aligned} Y(z)= H(z)\mu _{X}(z) + (\sigma _{X}\varepsilon )(z)/K^{(X)}(z), \end{aligned}$$
(4.5)

where H(z) is the transfer function of the FIR filter.

Pre-whitening is equivalent to multiplying both sides of (4.5) by \(K^{(X)}(z)\), yielding

$$\begin{aligned} K^{(X)}(z)Y(z)&= K^{(X)}(z)H(z)\mu _{X}(z) \nonumber \\&\quad + (\sigma _{X}\varepsilon )(z). \end{aligned}$$
(4.6)

Rewriting Eq. (4.6) in the time domain gives

$$\begin{aligned} \sum _{j=0}^{p}\kappa ^{(X_t)}_j Y_{t-j}= & {} \sum _{j=0}^{k}c^{(X_t)}_j\mu _{X_{t-j}} + \sigma _{X_t}\varepsilon _t, \end{aligned}$$
(4.7)

where \(k=r+p\) and \(\{c_j^{(X_t)}\}\) is the convolution of \(\{h_s\}\) and \(\{\kappa ^{(X_t)}_\ell \}\). Hence, as in equation 4 of Qin et al. (2000b),

$$\begin{aligned} Y_t= & {} \sum _{j=0}^{k}c^{(X_t)}_j\mu _{X_{t-j}} - \sum _{j=1}^{p}\kappa ^{(X_t)}_j Y_{t-j} + \sigma _{X_t}\varepsilon _t.\ \ \ \ \end{aligned}$$
(4.8)

Finally, based on a meta-state approach, Qin et al. (2000b) obtained parameter estimates by direct optimisation using quasi-Newton methods.

For an HMM with state-dependent Gaussian white noise, Khan et al. (2005) used a (Gaussian) MA adjustment for filtering. They considered the following model. For \(t=r+1,r+2,\ldots ,T-r\),

$$\begin{aligned} Y_{t}= & {} \sum _{s=-r}^{r}\eta _s\mu _{X_{t-s}} + \sigma _{X_{t}}\varepsilon _{t}, \end{aligned}$$
(4.9)

where the filter weights \(\eta _s\), \(s=-r,\ldots ,r\) are determined as for Eq. (4.2). Parameters were estimated by the EM algorithm based on a meta-state process.

Almanjahie et al. (2015) employed the model of Eq. (4.9), which they called a moving average filtered HMM (MAFHMM), and obtained parameter estimates using an EM-based algorithm. A key aspect of that work was the development of a generalised Baum’s forward–backward algorithm, similar to that of Fredkin and Rice (2001), which greatly reduced the computational demand. Nonetheless, estimation of model parameters is still effectively based on a meta-state model; see Almanjahie et al. (2015) for details. As a result, computational requirements were much greater than for the standard HMM.

A common feature of each of the above models is that they are based on meta-state processes. Note also that the vector Markov chain based on the meta-states has a much expanded state space. For example, consider a Markov chain with five states. If a 3-tuple of states is considered, there are \(5^3=125\) possible meta-states. Consequently, maximising the log-likelihood and estimating parameters for these models considerably increases the computational demand.

Deconvolution approach

In Eq. (4.9), only the current is considered to be filtered. However, in practice, during data collection it is the noisy current that is low-pass filtered, so the filter also affects the state-dependent noise. Thus, a more appropriate model for the recorded current is

$$\begin{aligned} Y_{t}= & {} \sum _{s=-r}^{r}\eta _{s}\mu _{X_{t-s}} + \sum _{s=-r}^{r}\eta _{s} \sigma _{X_{t-s}}\varepsilon _{t-s}, \end{aligned}$$
(5.1)

where \(t=1,2,\ldots ,T\), and the value of r and the filter weights \(\eta _{s}\) are determined as for Eq. (4.2). Note that the Markov chain has been relabelled here, so \(\varvec{X}=(X_{-r+1},X_{-r+2},\ldots ,X_1,X_2,\ldots ,X_{T+r})\). The second term in Eq. (5.1) now represents correlated state-dependent noise. Furthermore, in Eq. (5.1) the mean current and the state-dependent noise at time t both depend on the underlying Markov chain states at the present time t as well as the immediate r past and r future time points.

Our justification for this choice of model is as follows. In reported studies for the models in Eq. (4.1) and Eq. (4.2), either \(\sigma _{\delta }^2\) is larger than each \(\sigma _i^2\) (\(i=0,\ldots ,N-1\)) by at least an order of magnitude, or vice versa; see Table 3 in De Gunst et al. (2001). Since only one of these noise terms dominates in the model, we absorb all noise sources into the filtered state-dependent noise.

Each sum on the right hand side of Eq. (5.1) is a convolution, so this equation can be written in the time domain as

$$\begin{aligned} \{Y_{t}\}&= \{\eta _{t}\} * \{\mu _{X_{t}}\} + \{\eta _{t}\} * \{ \sigma _{X_{t}}\varepsilon _{t}\} \nonumber \\ &= \{\eta _{t}\} * \{\mu _{X_{t}} + \sigma _{X_{t}}\varepsilon _{t}\}. \end{aligned}$$
(5.2)

Taking the z transform in Eq. (5.2) and using linearity gives

$$\begin{aligned} Y(z)= & {} H(z)\left[ \mu _X(z) + (\sigma _X\varepsilon )(z)\right] , \end{aligned}$$
(5.3)

where \(H(z)=\sum _{s=-r}^{r}\eta _{s}z^{-s}\) is the transfer function of the MA filter. Let \(F(z)={1}/{H(z)}=\sum _{t} b_{t}z^{-t}\). This inverse exists under certain conditions, for example if the series converges in a region of the complex plane including the unit circle; see the example in “Implementing MAD” and the appendix in Mourjopoulos (1994). Multiplying both sides of Eq. (5.3) by F(z) yields

$$\begin{aligned} F(z)Y(z)= & {} \mu _X(z) + (\sigma _X\varepsilon )(z). \end{aligned}$$
(5.4)

In the time domain, Eq. (5.4) becomes

$$\begin{aligned} \{b_t\}*\{Y_t\}= & {} \{\mu _{X_t}\} + \{\sigma _{X_t}\varepsilon _t\}. \end{aligned}$$
(5.5)

Note that \(\{b_t\}*\{\eta _t\}=\{\delta _t\}\). Thus, Eq. (5.5) is simply the convolution of \(\{b_t\}\) with Eq. (5.2).

Putting \(\{\breve{Y}_{t}\} = \{b_t\}*\{Y_t\}\) in Eq. (5.5) gives

$$\begin{aligned} \breve{Y}_{t}&= \mu _{X_t} + \sigma _{X_t}\varepsilon _t, \qquad t=1,2,\ldots ,T. \end{aligned}$$
(5.6)

This final model is a standard HMM for \(\breve{\varvec{Y}}\), so Baum’s forward and backward algorithms and the EM algorithm can be used for parameter estimation. We call this method Moving average with deconvolution (MAD). Khan’s algorithm (Khan 2003) can be used for computing the standard errors of the parameter estimates.
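Computationally, the pre-processing step of MAD is a single convolution of the record with the truncated inverse filter. A minimal sketch (the function name is ours), assuming \(\{b_t\}\) has already been obtained — e.g. the seven-tap filter derived in “Implementing MAD” — and is stored with its \(t=0\) tap at the centre:

```python
import numpy as np

def mad_prewhiten(y, b):
    """Deconvolve the recorded current: breve_y = b * y, Eqs. (5.5)-(5.6).

    y : recorded (filtered) current, length T
    b : truncated inverse filter, odd length, t = 0 tap at the centre
    The result has the same length as y and can be passed directly to a
    standard-HMM EM routine.
    """
    return np.convolve(y, b, mode="same")
```

Apart from edge effects over a few samples at each end of the record, the deconvolved series recovers the unfiltered signal-plus-noise model of Eq. (5.6).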

Simulation study

The data simulation mimicked the data generation process for ion channels. We began by simulating a continuous time Markov chain, which was then sampled to produce a discrete time Markov chain that represented the sequence of states of the channel. Each state was mapped to a mean current, following which noise was added.

Almanjahie et al. (2015) showed that MscL in E. coli has five subconductance levels in addition to the closed and fully open levels. They also estimated the mean current and noise standard deviation at each level. For our simulations, we chose an intensity matrix \(\varvec{Q}\) which gave mean dwell times reflecting those estimated for MscL, but was otherwise arbitrary. A seven-state continuous time Markov chain with state space \(S=\{0,1,\ldots ,6\}\) was generated, then sampled at 50 kHz to give a discrete time Markov chain.

Currents were set to 0, 15, 30, 45, 65, 85 and 105, and state-dependent Gaussian white noise was added to these currents at each sampling point. The noise standard deviation at level 0 was set to 3, increasing in steps of 0.5 to 6 at the fully open level. The resulting noisy current was digitally filtered at 25 kHz to produce a data set containing 100,000 points.
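The simulation just described can be sketched as follows. This is a minimal NumPy sketch, not the code used in the study: the intensity matrix \(\varvec{Q}\) below is a hypothetical nearest-neighbour scheme (the \(\varvec{Q}\) matched to MscL dwell times is not reproduced here), and the final 25 kHz digital filtering step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Level currents and noise standard deviations from the text
mu = np.array([0.0, 15.0, 30.0, 45.0, 65.0, 85.0, 105.0])
sigma = np.array([3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0])
fs = 50_000.0        # sampling frequency, Hz
T = 100_000          # number of sampled points

# Hypothetical nearest-neighbour intensity matrix (rates in s^-1)
N = 7
Q = np.zeros((N, N))
for i in range(N - 1):
    Q[i, i + 1] = Q[i + 1, i] = 500.0
np.fill_diagonal(Q, -Q.sum(axis=1))

# Simulate the continuous-time chain (exponential holding times) and
# sample its state every 1/fs seconds.
states, t, next_sample, x = [], 0.0, 0.0, 0
while len(states) < T:
    dwell = rng.exponential(-1.0 / Q[x, x])        # holding time in state x
    while next_sample < t + dwell and len(states) < T:
        states.append(x)
        next_sample += 1.0 / fs
    t += dwell
    p = Q[x].clip(min=0.0)                         # jump probabilities
    x = rng.choice(N, p=p / p.sum())
states = np.array(states)

# Map states to currents and add state-dependent Gaussian white noise;
# the result would then be digitally filtered at 25 kHz as in the text.
y = mu[states] + sigma[states] * rng.standard_normal(T)
```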

Implementing MAD

For our simulated data set, the ratio of the sampling frequency \(f_s\) to the filter cutoff frequency \(f_c\) is 2. From Colquhoun and Sigworth (2009, Appendix 3, p. 576) or Schouten (2000, Table 4.1), this corresponds to a filter of length 3 (\(r=1\)) with coefficients \(\eta _0=0.93\) and \(\eta _{-1}=\eta _1=0.035\). The reciprocal of the transfer function of the corresponding MA filter is

$$\begin{aligned} F(z)&= \frac{1}{0.035z^{1}+0.93+0.035z^{-1}} \nonumber \\&= 1.0782\left[ - \frac{1}{1+26.53z^{-1}}+\frac{1}{1+0.0377z^{-1}}\right] , \end{aligned}$$
(6.1)

where \(0.0377=1/26.53\). The function given by Eq. (6.1) is well defined in the region \(0.0377<|z|<26.53\), which includes the unit circle \(|z|=1\). Therefore, the corresponding system is stable (Proakis and Manolakis 1996, p. 209; see also the appendix in Mourjopoulos 1994), so an inverse filter \(\{b_t\}\) can be constructed.

From (6.1), a (Laurent) series expansion yields

$$\begin{aligned} F(z)= & {} -\frac{1.0782}{26.53}z[1-z/26.53+ z^2/26.53^2- \cdots ]\nonumber \\&{}+\, 1.0782[1-0.0377z^{-1}+(0.0377)^2z^{-2}\nonumber \\&{}-\, (0.0377)^3z^{-3}+ \cdots ]\nonumber \\= & {} \cdots +0-0.0001z^3+0.0015z^2-0.0406z \nonumber \\&{ }+\, 1.0782 -0.0406z^{-1} + 0.0015z^{-2}\nonumber \\&{}-\, 0.0001z^{-3}+\cdots . \end{aligned}$$
(6.2)

We obtain \(b_t\) as the coefficient of \(z^{-t}\) in Eq. (6.2), for \(t=0,\pm\, 1, \pm \, 2,\ldots\). Note that, here and in general, the inverse filter \(\{b_t\}\) is of infinite length and needs to be truncated. We consider a truncation \(\{b'_t\}\) of \(\{b_t\}\) that satisfies (Proakis and Manolakis 1996, §8.5.2)

$$\begin{aligned} ||\{b'_t\}*\{\eta _t\}-\{\delta _t\}|| < 0.001. \end{aligned}$$
(6.3)

The corresponding truncated discrete time inverse MA filter is

$$\begin{aligned} \{b'_t\}= & {} \{-\,0.0001, 0.0015, -\,0.0406, \varvec{1.0782}, \nonumber \\&-\,0.0406,0.0015, -\,0.0001\}, \end{aligned}$$
(6.4)

where the number in bold indicates the entry corresponding to \(t = 0\). In this case,

$$\begin{aligned} \{b'_t\}*\{\eta _t\}= & {} \{\delta _t\} \end{aligned}$$
(6.5)

to within four decimal places for each entry.
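The adequacy check in Eq. (6.5) is straightforward to reproduce numerically: convolving \(\{b'_t\}\) with the filter coefficients \(\{\eta _t\}\) gives a sequence matching \(\{\delta _t\}\) to about \(10^{-4}\) per entry, comfortably inside the tolerance of Eq. (6.3).

```python
import numpy as np

eta = np.array([0.035, 0.93, 0.035])              # filter of Eq. (6.1)
b = np.array([-0.0001, 0.0015, -0.0406, 1.0782,   # truncated inverse,
              -0.0406, 0.0015, -0.0001])          # Eq. (6.4)

c = np.convolve(b, eta)          # b' * eta: length 9, t = 0 at index 4
delta = np.zeros(9)
delta[4] = 1.0

err = np.linalg.norm(c - delta)  # approx 1.4e-4, satisfying Eq. (6.3)
```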

Results

The simulated data set was analysed using MAD, MAFHMM and standard HMM. The results are summarised in Table 1, which shows the estimates of mean current and noise standard deviation for each level, together with the corresponding standard errors (determined by Khan’s algorithm, Khan 2003) and (individual) \(95\%\) confidence intervals.

All three methods gave good estimates of the mean current at each level, though MAFHMM gave the largest standard errors. MAD provided much better estimates of the noise standard deviations; again, MAFHMM produced the largest standard errors.

Table 1 Analysis of simulated data based on HMM and MAD

For HMM, MAFHMM and MAD, the data were idealised as described in Almanjahie et al. (2015) and Khan et al. (2005). The idealised and simulated Markov chains were compared to identify discrepancies (idealisation errors) between them. Out of 100,000 points, MAD made 90 such errors, compared with 150 for HMM and 119 for MAFHMM.

The mean dwell times and the number of points at each level as computed from the idealisations are presented in Table 2. For each level, this table shows the following mean dwell times: theoretical or true (\(\tau _i\), computed as \(-1/q_{ii}\), for each i, where \(q_{ii}\) is the \(i\)th diagonal entry of the \(\varvec{Q}\) used in the simulation), simulated (\({\bar{\tau }}_i\), computed as the mean dwell time in each state of the sampled simulated Markov chain) and estimated (\({\hat{\tau }}_i\)). The mean dwell times estimated by the methods were comparable, and similar to the corresponding theoretical and simulated values.

Table 2 Analysis of simulated data based on HMM and MAD

Fifty such simulation studies were conducted and gave results similar to the above.

Algorithm complexity

Computation of the likelihood is dominated by the number of arithmetic operations required for calculating the forward and backward probabilities. For an N-state Markov chain of length T, the forward probabilities require \(O(N^2T)\) calculations; see Rabiner (1989). If a moving average filter of total length \(2r+1\) is included, the number of calculations required when using the meta-state approach is \(O(M^2T)\), where \(M=N^{2r+1}\) (Khan et al. 2005). The generalised forward–backward algorithm used in MAFHMM (Almanjahie et al. 2015) requires O(MNT) calculations.

As an example, when \(N=7\) and \(T=100{,}000\), the number of calculations required by each type of algorithm is summarised in Table 3. Note that MAD has the same complexity as the standard HMM. For large data sets (typical of ion channel records), the run-time savings for MAD can be considerable.

Table 3 Algorithm complexity
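The operation counts above can be reproduced directly from the quoted orders; a minimal sketch (the function name is ours, and constants are ignored, so these are orders of magnitude rather than exact counts):

```python
# Sketch: forward-pass operation counts for the three algorithms,
# using the orders quoted in the text (constant factors ignored).

def op_counts(N, T, r):
    M = N ** (2 * r + 1)  # number of metastates for a filter of length 2r+1
    return {
        "standard HMM / MAD": N ** 2 * T,  # O(N^2 T)
        "metastate HMM": M ** 2 * T,       # O(M^2 T)
        "MAFHMM": M * N * T,               # O(MNT)
    }

# Values from the example in the text: N = 7 states, T = 100,000 points, r = 1
counts = op_counts(N=7, T=100_000, r=1)
for name, c in counts.items():
    print(f"{name}: {c:.3e}")
```

For these values, \(M = 7^3 = 343\), so the metastate approach needs of order \(10^{10}\) operations against about \(5\times 10^6\) for the standard HMM (and hence MAD), which is the source of the run-time savings noted above.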

Application: MscL data

Extensive high bandwidth patch clamp data were obtained in the laboratory of Professor Boris Martinac (Head of the Mechanosensory Biophysics Laboratory, Victor Chang Cardiac Research Institute) from MscL in the bacterium E. coli. The data were recorded by the same researcher in the same laboratory during the same afternoon under identical environmental conditions, with an applied voltage of \(+100\) mV, at bandwidths of 25 kHz and 50 kHz, digitally sampled at 75 kHz and 150 kHz, respectively. Four recordings were obtained at each bandwidth, each containing between 5 and 30 million data values. Further details of the experiment can be found in Almanjahie et al. (2015).

The data were screened and eight data sets (four at each bandwidth), each containing about 200,000 values, were selected for analysis. Preliminary exploration and analysis (see Almanjahie et al. 2015) revealed that the noise standard deviations at the intermediate levels were larger than those at the closed and fully open levels. Following Khan et al. (2005), constraints were imposed so that the noise standard deviations at the intermediate levels were equally spaced between those at the closed and fully open levels. Based on a comprehensive analysis of the data using HMM and MAFHMM (moving average filtered HMM), Almanjahie et al. (2015) concluded that MscL in E. coli has five subconducting levels in addition to the fully open and closed levels. The main purpose of this section is to compare the performance of MAD with that of HMM and MAFHMM.

For the MscL data, the ratio \(f_s/f_c=3\) corresponds to a filter of length 3 (\(r=1\)) with coefficients \(\eta _0=0.840\) and \(\eta _{-1}=\eta _1=0.080\), as in Colquhoun and Sigworth (2009, Appendix 3, p. 576) or Schouten (2000, Table 4.1). For MAD, the norm-based criterion Eq. (3.12) with \(\varepsilon =0.001\) yields the corresponding truncated discrete time inverse MA filter as

$$\begin{aligned} \{b'_t\}&= \{-\,0.0011, 0.0112, -\,0.1164, \varvec{1.2125},\\&\quad -\,0.1164, 0.0112, -\,0.0011\}. \end{aligned}$$

In this case \(\{b'_t\}*\{\eta _t\} =\{\delta _t\}\) to within 3 decimal places for each entry.
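This claim can be checked directly by convolving the two filters; a minimal sketch using the coefficients quoted above (the `convolve` helper is ours):

```python
# Verify that the truncated inverse filter b' convolved with the MA filter
# eta is approximately the unit impulse (delta), as stated in the text.

def convolve(a, b):
    """Full discrete convolution of two sequences."""
    n = len(a) + len(b) - 1
    return [sum(a[i] * b[k - i]
                for i in range(len(a)) if 0 <= k - i < len(b))
            for k in range(n)]

eta = [0.080, 0.840, 0.080]                # MA filter for f_s / f_c = 3
b_inv = [-0.0011, 0.0112, -0.1164, 1.2125,
         -0.1164, 0.0112, -0.0011]         # truncated inverse filter b'

c = convolve(b_inv, eta)
centre = len(c) // 2
# Every entry matches the unit impulse to within 3 decimal places.
assert all(abs(x - (1.0 if k == centre else 0.0)) < 5e-4
           for k, x in enumerate(c))
```

The central entry of the convolution is approximately 0.9999 and all off-centre entries are smaller than 0.0002 in absolute value, confirming the statement to within 3 decimal places.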

Statistical analysis

With the number of levels set at \(N=7\) (Almanjahie et al. 2015) and state-dependent noise constrained as described above, the eight data sets were analysed using HMM, MAFHMM and MAD. Equations (2.3)–(2.6) were used for parameter estimation in HMM, and also MAD (for the preprocessed data). Parameter estimates for MAFHMM, derived in Almanjahie et al. (2015), are given for comparison. Estimated mean currents were offset to give \({\check{\mu }}_i= {\hat{\mu }}_i - {\hat{\mu }}_0\), \(i=0,1,\ldots ,6\), so that the mean current at the closed level was zero.
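The offsetting of mean currents, and the subsequent conversion to conductance as a percentage of the fully open level used in the results below, amount to the following sketch (the \({\hat{\mu }}_i\) values here are hypothetical, for illustration only):

```python
# Sketch: offset estimated mean currents so the closed level is zero,
# then express each level as a percentage of the fully open level.
# The mu_hat values below are hypothetical, for illustration only.

def percent_conductance(mu):
    """mu: estimated mean currents; mu[0] = closed, mu[-1] = fully open."""
    offset = [m - mu[0] for m in mu]           # mu_check_i = mu_hat_i - mu_hat_0
    full = offset[-1]
    return [100.0 * m / full for m in offset]  # % of fully open level

mu_hat = [0.4, 1.4, 2.4, 3.4, 4.4, 5.4, 10.4]  # hypothetical, N = 7 levels
print(percent_conductance(mu_hat))
# closed level -> 0, fully open -> 100, intermediates strictly between
```

By construction the closed level maps to 0 and the fully open level to 100, matching the convention used for the conductance tables.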

Results

Table 4 shows the estimated intermediate conductance values for MscL. For each level, the estimated conductance (% max) is the corresponding mean current expressed as a percentage of that at the fully open level. (The conductance values for the closed and fully open levels are 0 and 100, respectively.) Also given for each bandwidth are the mean (\({\bar{x}}\)) and standard deviation (s) of the four estimates of mean conductance at each level. Figure 1 plots the conductance values for the eight data sets.

Table 4 Estimated MscL intermediate level conductances (% max) for the four data sets at each bandwidth, based on HMM (H), MAFHMM (M) and MAD (D). Also given, for each bandwidth, are the sample means (\({\bar{x}}\)) and sample standard deviations (s) of the four estimates of mean conductance at each level
Fig. 1

Plots of estimated conductances (% max) for MscL based on HMM (open circles), MAFHMM (filled circles) and MAD (open diamonds). Bandwidths 25 kHz or 50 kHz as indicated

For the 25 kHz data, the estimated intermediate conductance at each level is highest for HMM. However, for the 50 kHz data there is little difference in the conductance estimates obtained by the three approaches.

Discussion

We have incorporated correlated noise in an HMM for ion channel data and used deconvolution to pre-whiten the noise, resulting in a standard HMM for the preprocessed data. Parameter estimates were obtained by the EM algorithm. The method performed well in simulation studies.

We applied this methodology to MscL data from E. coli, giving the results in Table 4 and Fig. 1. The estimates of channel conductances are comparable to those of other researchers (Sukharev et al. 1999, 2001; Petrov et al. 2011), as can be seen from Table 3 and Figure 8 in Almanjahie et al. (2015).

We have also computed standard errors and confidence intervals for the parameter estimates for the simulated data sets. These are not routinely produced by channel researchers, but they are an important adjunct as they quantify the precision of the estimates.

An important point to note is that the model in Eq. (5.6) is an HMM for the preprocessed data. Since the preprocessing depends on an approximation to the inverse filter, the parameter estimates and standard errors for Eq. (5.6) do not coincide exactly with those for Eq. (5.1). However, in practice this approximation should have minimal effect when the truncation error is small.

Almanjahie et al. (2015) determined transition schemes for MscL in E. coli based on HMM and MAFHMM analyses of the eight data sets. Their scheme for the 25 kHz data had one more transition than that for the 50 kHz data. Based on MAD analyses of all eight data sets, we produced transition schemes for this channel, and these coincided exactly with the scheme for the 50 kHz data in Almanjahie et al. (2015). They also reported that the channel has two states at the closed level; however, this has no effect on the estimation of mean channel conductance.

Overall, the key contributions of this paper are the development of a filtered HMM incorporating correlated noise together with a metastate-free algorithm for parameter estimation; the statistical analysis of extensive high bandwidth data; and the demonstration of the importance of bandwidth for estimating channel characteristics. The new algorithm is simple and greatly reduces computation time and memory requirements. These advantages matter for processing the very large data sets made possible by high bandwidth recordings, themselves a result of improvements in technology and experimental technique. Such enhancements in technology, coupled with corresponding advances in computational techniques, are instrumental in furthering our understanding of the structure of ion channels.