Introduction

Ion channels are large protein molecules embedded in cell membranes, across which they selectively control the flow of ions, thereby regulating many aspects of cell function (Aidley and Stanfield 1996, p. 3). Conduction of ions occurs through an aqueous pore gated by specific stimuli. Simple channels have only two levels, open and closed, but complex channels may have several conducting levels. E. coli expresses two types of mechanosensitive channel, MscL (large conductance) and MscS (small conductance), both multilevel channels that control intracellular pressure (Hamill and Martinac 2001).

Ion channel currents, recorded using the patch clamp technique (Hamill et al. 1981), include noise from various sources (Benndorf 2009). The current is low-pass filtered and digitised (sampled and quantised), producing a sequence of observations that appear to fluctuate randomly between conductance levels.

Hidden Markov models (HMMs) were first applied to analysis of (two-level) ion channel data by Chung et al. (1990), who assumed state (level)-independent Gaussian white noise with known variance. They estimated the mean current at each level from a histogram of current amplitudes, and state transition probabilities using the Baum–Welch algorithm (Baum et al. 1970).

However, noise has been found to be greater in open states (Colquhoun and Sigworth 2009; Heinemann and Sigworth 1993). Consequently, Klein et al. (1997) assumed state-dependent Gaussian white noise. They were the first to use the expectation-maximisation (EM) algorithm (Dempster et al. 1977) for parameter estimation in the resulting HMM, obtaining estimates for mean currents, transition probabilities and noise variances (for simulated data).

While the low-pass filter removes high-frequency noise, facilitating detection of channel events (Colquhoun and Sigworth 2009, p. 484), it also reduces pulse amplitude and duration. This may bias estimates of the mean currents and, through missed (very) brief sojourns, of the mean time spent at each level. Consequently, it is desirable to incorporate in the model an adjustment for the low-pass filtering. Fredkin and Rice (1992) were the first to use a moving average adjustment, applying their model with state-independent white noise to two-level data.

Later, Khan (2003) used HMMs with state-dependent Gaussian white noise and a moving average adjustment for filtering. An EM algorithm was developed in Khan (2003) and Khan et al. (2005) to obtain parameter estimates indirectly, using a meta-state (vector) Markov chain with a much larger state space than that of the original Markov chain. They analysed 10 kHz patch clamp data from MscL in E. coli, based on their HMMs (with and without adjustment for filtering). Parameter estimates for the actual HMM needed to be recovered from those of the meta-state HMM. They concluded that higher bandwidth data were required to adequately describe channel characteristics.

In the present study, parameter estimates for HMMs with a moving average adjustment for filtering were obtained using an improved EM-based algorithm that depends on a generalisation of the Baum forward–backward algorithm (Fredkin and Rice 2001). This algorithm has three advantages over that of Khan et al. (2005): it maximises the log-likelihood function for the actual HMM with an adjustment for filtering, rather than that for an associated HMM; the algorithm is much simpler; and the computational load is substantially reduced.

This paper is structured as follows: In “Modelling ion channel data” we introduce HMMs for ion channel data and briefly describe the EM-based algorithm for parameter estimation. The generalisation of Baum’s forward–backward algorithm is developed in “MAFHMM parameter estimation”. In “Methods and results” we analyse extensive improved bandwidth patch clamp data from MscL. We estimate the number of levels, the conductances, mean dwell times and proportion of time spent at each level. Further, in “Transition schemes” we infer a transition scheme for MscL based on our results. In “Discussion” we compare our results with those from previous studies, and finish with “Concluding remarks”.

Modelling ion channel data

Standard hidden Markov models

The gating behaviour of a single ion channel is often modelled by a continuous-time homogeneous Markov chain, with a finite number of states representing the conformational states of the channel (Colquhoun and Hawkes 1981; Ball and Sansom 1989; Becker et al. 1994; Colquhoun and Hawkes 1997). The states are assumed to have distinct conductances. The continuous-time (noisy) channel current is modelled as state-dependent Gaussian white noise added to the current corresponding to the unobserved (or hidden) channel state.

Initially we ignore the effect of filtering. Let \(X_{t}\) be the state of the channel at sampling time t, \(t=1,2,\ldots ,T\). Then \(\varvec{X}=(X_{1},X_{2},\ldots ,X_{T})\) is a discrete-time, homogeneous Markov chain, assumed to be irreducible, with finite state space \(S =\{0 ,1 , \ldots ,N-1\}\), \(N\times N\) transition probability matrix \(\varvec{P}=[p_{ij}]\) and initial distribution \(\varvec{\pi }=(\pi _{0},\pi _{1},\ldots ,\pi _{N-1})\). Denote by \(\mu _{i}\) and \(\sigma _{i}\), respectively, the mean current and noise standard deviation corresponding to state i of the channel, \(i=0,1,\ldots ,N-1\). Let \(\epsilon _{1},\epsilon _{2},\ldots ,\epsilon _{T}\) be independent and identically distributed standard normal random variables, also assumed to be independent of \(\varvec{X}\). The observed sample current at time t is given by

$$\begin{aligned} Y_{t}=\mu _{X_{t}}+\sigma _{X_{t}}\epsilon _{t},\quad t=1,2,\ldots ,T. \end{aligned}$$
(1)

Given \(\varvec{X}\), the random variables \(Y_{1},Y_{2},\ldots ,Y_{T}\) are conditionally independent. Moreover, for \(t=1,2,\ldots ,T\) the distribution of \(Y_{t}\) conditional on \(\varvec{X}\) depends only on \(X_{t}\) and, from (1),

$$\begin{aligned} (Y_{t}\mid X_{t}=x_{t})\sim N(\mu _{x_{t}},\sigma ^2_{x_{t}}), \end{aligned}$$
(2)

a normal distribution with mean \(\mu _{x_{t}}\) and variance \(\sigma ^2_{x_{t}}\). Put \(\varvec{Y}=(Y_1,Y_2,\ldots ,Y_T)\). The distribution of \((\varvec{X}, \varvec{Y})\) is called a (standard) HMM and its joint distribution, i.e. probability mass function for \(\varvec{X}\) and probability density function (pdf) for \(\varvec{Y}\), is given by

$$\begin{aligned} {\mathbb {P}}(\varvec{x},\varvec{y}) = \pi _{x_{1}}\prod _{t=2}^{T}p_{x_{t-1},{x_{t}}}\prod _{t=1}^{T}f_{x_t}(y_{t}), \end{aligned}$$
(3)

where \(\varvec{x}=(x_{1},x_{2},\ldots ,x_{T})\in S^T\), \(\varvec{y}\in {\mathbb {R}}^T\) and \(f_{x_t}\) is the \(N(\mu _{x_{t}},\sigma ^2_{x_{t}})\) pdf. Then the likelihood of the observed data is given by the marginal pdf of \(\varvec{Y}\),

$$\begin{aligned} g(\varvec{y}) = \sum _{\text {all}\, \varvec{x}}{\mathbb {P}}(\varvec{x},\varvec{y}),\quad \varvec{y}\in {\mathbb {R}}^T, \end{aligned}$$
(4)

where the sum is over all possible vectors \(\varvec{x}\).
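For a short record, the likelihood (4) can be checked directly: the brute-force sum over all \(N^T\) state paths must agree with the forward recursion described below. A minimal NumPy sketch, with illustrative parameter values of our own choosing:

```python
import numpy as np
from itertools import product

def npdf(v, m, s):
    # N(m, s^2) probability density
    return np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Toy two-state Gaussian HMM; all parameter values are illustrative.
pi = np.array([0.6, 0.4])
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
mu = np.array([0.0, 1.0])        # mean currents
sigma = np.array([0.2, 0.3])     # state-dependent noise s.d.

rng = np.random.default_rng(0)
T = 6                            # short series so the sum in (4) is feasible
x = [rng.choice(2, p=pi)]
for _ in range(T - 1):
    x.append(rng.choice(2, p=P[x[-1]]))
y = rng.normal(mu[x], sigma[x])  # observations generated as in (1)

# likelihood (4): brute-force sum of the joint pmf/pdf (3) over all 2^T paths
def joint(xs):
    p = pi[xs[0]]
    for t in range(1, T):
        p *= P[xs[t - 1], xs[t]]
    return p * np.prod(npdf(y, mu[list(xs)], sigma[list(xs)]))

g_brute = sum(joint(xs) for xs in product(range(2), repeat=T))

# the same likelihood via the (unscaled) forward recursion
alpha = pi * npdf(y[0], mu, sigma)
for t in range(1, T):
    alpha = (alpha @ P) * npdf(y[t], mu, sigma)
g_forward = alpha.sum()
```

The forward recursion computes the same quantity in O(\(N^2T\)) operations instead of O(\(N^T\)).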

Let \(\phi =(\varvec{\pi },\varvec{P},\varvec{\mu },\varvec{\sigma })\) denote the parameters in (3), where \(\varvec{\mu }\) and \(\varvec{\sigma }\) are the vectors of mean currents and noise standard deviations, respectively. Maximum-likelihood parameter estimation is complicated if based on (4) (Rabiner 1989, p. 21). The underlying Markov chain \(\varvec{X}\) is unobserved, so (3) may be described as the complete data likelihood and (4) as the incomplete data likelihood (Dempster et al. 1977). The EM algorithm maximises the incomplete (observed) data likelihood indirectly, by proceeding iteratively using the complete data log-likelihood. For other approaches, see Qin et al. (2000a, b), de Gunst et al. (2001) and Turner (2008).

In the following, \(i,j=0,1,\ldots ,N-1\) represent the states of the Markov chain and m indexes iterations of the EM algorithm, with \(m=0\) indicating initial values. At iteration \(m\ge 1\) of the EM algorithm, the updating formulae for \(\pi _{i}\), \(p_{ij}\), \(\mu _{i}\) and \(\sigma _{i}\) are (Klein et al. 1997; Khan et al. 2005)

$$\begin{aligned} \pi ^{m+1}_{i}&=\gamma ^m_{1}(i), \end{aligned}$$
(5)
$$\begin{aligned} p^{m+1}_{ij}&=\frac{\sum _{t=1}^{T-1}\gamma ^m_{t}(i,j)}{\sum _{t=1}^{T-1}\sum _{j=0}^{N-1}\gamma ^m_{t}(i,j)}, \end{aligned}$$
(6)
$$\begin{aligned} \mu ^{m+1}_{i}&=\frac{\sum _{t=1}^{T}\gamma ^m_{t}(i) y_{t}}{\sum _{t=1}^{T}\gamma ^m_{t}(i)}, \end{aligned}$$
(7)
$$\begin{aligned} \sigma ^{m+1}_{i}&=\left\{ \frac{\sum _{t=1}^{T}\gamma ^m_{t}(i)(y_{t}-\mu ^{m+1}_{i})^2}{\sum _{t=1}^{T}\gamma ^m_{t}(i)}\right\} ^\frac{1}{2}, \end{aligned}$$
(8)

where for \(t=1,2,\ldots ,T\)

$$\begin{aligned} \gamma _{t}^{m}(i)= {\mathbb {P}}(X_{t}=i\mid \varvec{y},\phi ^m), \end{aligned}$$
(9)

and for \(t=1,2,\ldots ,T-1\)

$$\begin{aligned} \gamma _{t}^{m}(i,j)= {\mathbb {P}}(X_{t}=i,X_{t+1}=j\mid \varvec{y},\phi ^m). \end{aligned}$$
(10)

Iterations continue until some stopping criterion is satisfied.

The \(\gamma _{t}^{m}(i)\) and \(\gamma _{t}^{m}(i,j)\) above can be computed using Baum’s forward–backward algorithm (Baum et al. 1970). The forward and backward probabilities are, respectively,

$$\begin{aligned} \alpha _{t}(i)&= {\mathbb {P}}(y_{1},\ldots ,y_{t}, X_{t}=i),\ t=1,2,\ldots ,T, \\ \beta _{t}(i)&= {\mathbb {P}}(y_{t+1},\ldots ,y_{T}\mid X_{t}=i), \end{aligned}$$

\(t= T-1,T-2,\ldots ,1\), and \(\beta _{T}(i)=1\). These probabilities tend to zero geometrically, so scaled versions are needed to avoid numerical underflow. As in Devijver (1985), for \(t= 1,2,\ldots ,T\) and \(i = 0,1,\ldots ,N-1\), define

$$\begin{aligned} \bar{\alpha }_{t}(i)&={\mathbb {P}}(X_{t}=i \mid y_{1},y_{2},\ldots ,y_{t}),\\ \bar{\beta }_{t}(i)&=\beta _{t}(i)/{\mathbb {P}}(y_{t+1},y_{t+2},\ldots ,y_{T}\mid y_{1},\ldots ,y_{t}), \end{aligned}$$

and \(\bar{\beta }_{T}( i )=1\). For \(i = 0,1,\ldots ,N-1\), the recursions for computing \(\bar{\alpha }_{t}(i)\) and \(\bar{\beta }_{t}(i)\) are

$$\begin{aligned} \bar{\alpha }_{t+1}(i)=\frac{1}{c_{t+1}}\sum _{j=0}^{N-1}\bar{\alpha }_{t}(j)p_{ji}f_{i}(y_{t+1}) \end{aligned}$$
(11)

for \(t=1,2,\ldots ,T-1\), and

$$\begin{aligned} \bar{\beta }_{t}(i)=\frac{1}{c_{t+1}}\sum _{j=0}^{N-1}p_{ij}f_{j}(y_{t+1})\bar{\beta }_{t+1}(j) \end{aligned}$$
(12)

for \(t=T-1,T-2,\ldots ,1\), where \(c_1={\mathbb {P}}(y_1)\) and \(c_{t+1}={\mathbb {P}}(y_{t+1}\mid y_{1},y_{2},\ldots ,y_{t})\). From (11), since \(\sum _{i=0}^{N-1}\bar{\alpha }_{t}(i)=1\), the scale factors are

$$\begin{aligned} c_{t+1}=\textstyle {\sum \limits _{i,j=0}^{N-1}}\bar{\alpha }_{t}(j)p_{ji}f_{i}(y_{t+1}) \end{aligned}$$

for \(t=1,2,\ldots ,T-1\). Then, as in Khan et al. (2005), for \(i,j=0,1,\ldots ,N-1\),

$$\begin{aligned} \gamma _{t}^{m}(i,j)=\bar{\alpha }_{t}^{m}(i)p_{ij}f_{j}^{m}(y_{t+1})\bar{\beta }_{t+1}^{m}(j)/ c_{t+1} \end{aligned}$$

for \(t = 1,2,\ldots ,T-1,\) and for \(t = 1,2,\ldots ,T\) \(\gamma _{t}^{m}(i)=\bar{\alpha }_{t}^{m}(i)\bar{\beta }_{t}^{m}(i)\).
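The recursions (11)–(12) and the updating formulae (5)–(8) combine into one EM iteration. The following NumPy sketch is our illustrative code, not taken from the papers cited; it implements a single iteration for a Gaussian HMM and also returns the log-likelihood \(\sum _t \log c_t\) of the input parameters:

```python
import numpy as np

def npdf(y, m, s):
    # N(m, s^2) probability density
    return np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def em_step(y, pi, P, mu, sigma):
    """One EM iteration (5)-(8) for the standard Gaussian HMM, using the
    scaled forward-backward recursions (11)-(12).  Illustrative sketch."""
    T, N = len(y), len(pi)
    f = npdf(y[:, None], mu, sigma)            # f[t, i] = f_i(y_t)
    a = np.zeros((T, N)); c = np.zeros(T)      # scaled forward pass (11)
    a[0] = pi * f[0]; c[0] = a[0].sum(); a[0] /= c[0]
    for t in range(T - 1):
        a[t + 1] = (a[t] @ P) * f[t + 1]
        c[t + 1] = a[t + 1].sum(); a[t + 1] /= c[t + 1]
    b = np.ones((T, N))                        # scaled backward pass (12)
    for t in range(T - 2, -1, -1):
        b[t] = (P @ (f[t + 1] * b[t + 1])) / c[t + 1]
    gamma = a * b                              # (9): P(X_t = i | y)
    # (10): xi[t, i, j] = P(X_t = i, X_{t+1} = j | y)
    xi = a[:-1, :, None] * P * (f[1:] * b[1:])[:, None, :] / c[1:, None, None]
    pi_new = gamma[0]                                        # (5)
    P_new = xi.sum(0) / xi.sum((0, 2))[:, None]              # (6)
    mu_new = (gamma * y[:, None]).sum(0) / gamma.sum(0)      # (7)
    var = (gamma * (y[:, None] - mu_new) ** 2).sum(0) / gamma.sum(0)  # (8)
    return pi_new, P_new, mu_new, np.sqrt(var), np.log(c).sum()
```

Iterating `em_step` increases the returned log-likelihood monotonically, which provides a useful correctness check.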

HMM with adjustment for filtering

We now assume that the mean current is filtered, but not the noise. (Models in which the noise also is filtered are more complicated and will be dealt with elsewhere.) The Bessel filter used in patch clamp experiments is well approximated by a Gaussian digital filter (Colquhoun and Sigworth 2009; Schouten 2000). This is a symmetric moving average (MA) filter of length \(2r+1\) with weights \(\eta _{s}\), \(s=-r,\ldots ,r\), and is easily incorporated into the model.

Let \(X_{t-r}^{t+r}=\left( X_{t-r},\ldots ,X_t,\ldots ,X_{t+r}\right)\), and similarly for \(x_{t-r}^{t+r}\). As in Khan (2003), the filtered mean channel current \(I_t\) at times \(t=r+1,r+2,\ldots ,T-r\) is given by

$$\begin{aligned} I_t=\sum _{s=-r}^{r}\eta _{s}\mu _{X_{t-s}}. \end{aligned}$$
(13)

The values of r and \(\eta _{s}\) are determined by the ratio of the sampling and the filter cut-off frequencies (Table 4.1, Schouten 2000; Colquhoun and Sigworth 2009, p. 577). Similarly to (1), the digitised current at time t is

$$\begin{aligned} Y_{t}=I_{t} + \sigma _{X_{t}}\epsilon _{t},\ t=r+1,r+2,\ldots ,T-r. \end{aligned}$$
(14)

We refer to this model as a moving average filtered hidden Markov model (MAFHMM). The key difference between (1) and (14) is that the mean current now depends also on the underlying Markov chain states at r past and r future time points. Khan et al. (2005) viewed each such (\(2r+1\))-vector of states as a meta-state (Fredkin and Rice 1992; Venkataramanan et al. 1998). For a Markov chain with N states, the total number of meta-states is \(M=N^{2r+1}\).
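For \(r=1\), (13) is a three-point weighted average of the state mean currents along the hidden path, i.e. a convolution. A minimal sketch; the weights and state path below are illustrative values, not those determined from the sampling and cut-off frequencies:

```python
import numpy as np

# Sketch of the filtered mean current (13) for r = 1.  The weights eta
# and the state path x are illustrative only.
eta = np.array([0.25, 0.5, 0.25])      # eta_{-1}, eta_0, eta_1 (symmetric)
mu = np.array([0.0, 10.0])             # mean currents for N = 2 states
x = np.array([0, 0, 1, 1, 1, 0, 0])    # hidden state path X_1, ..., X_T
# I_t = sum_s eta_s mu_{X_{t-s}}, defined for t = r+1, ..., T-r
I = np.convolve(mu[x], eta, mode="valid")
# the filter smears the step changes across neighbouring sample points
```

Note how the transition between levels, instantaneous in the unfiltered path, is spread over several sample points in the filtered mean.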

MAFHMM parameter estimation

Improved algorithm

For simplicity we consider only \(r=1\). The general case is straightforward but notationally complex. In the MAFHMM the mean current depends on the Markov chain states at times \(t-1\), t and \(t+1\), so

$$\begin{aligned} (Y_{t} \mid X_{t-1}^{t+1}=x_{t-1}^{t+1})\sim N(I_t,\sigma ^2_{x_{t}}) \end{aligned}$$
(15)

for \(t=2,3,\ldots ,T-1\). Extending the notation of (3), the corresponding pdf is \(f_{x_{t-1},x_t,x_{t+1}}(y_{t})\). The joint distribution for \(\varvec{X}\) and \(\varvec{Y}\) is

$$\begin{aligned} {\mathbb {P}}(\varvec{x},\varvec{y}) = \pi _{x_{1}}\prod _{t=2}^{T}p_{x_{t-1},x_{t}}\prod _{t=2}^{T-1} f_{x_{t-1},x_t,x_{t+1}}(y_{t}) \end{aligned}$$
(16)

for \(\varvec{x} \in S^{T}\) and \(\varvec{y} \in {\mathbb {R}}^{(T-2)}\).

Key to our present approach to estimating the MAFHMM parameters is an algorithm, due to Fredkin and Rice (2001), for computing the conditional probabilities for \(\varvec{X}\) given \(\varvec{Y}\), based on a generalisation of Baum’s algorithm. They used the algorithm to approximate the likelihood for simulated data, but not for maximum-likelihood estimation. Our contributions lie in a more detailed exposition of the algorithm and in its application to the analysis of real data.

Throughout the rest of this section, indices \(i,j,k,\ell =0,1,\ldots ,N-1\) denote Markov chain states. Since the (filtered) current depends on three consecutive states of the underlying Markov chain, we generalise Baum’s forward and backward probabilities to

$$\begin{aligned} \alpha _{t}(i,j,k)={\mathbb {P}}(y_{2},\ldots ,y_{t}, X_{t-1}^{t+1}=\left( i,j,k\right) ) \end{aligned}$$
(17)

for \(t=2,3,\ldots ,T-1\), and

$$\begin{aligned} \beta _{t}(i,j,k)={\mathbb {P}}(y_{t+1},\ldots ,y_{T-1}\mid X_{t-1}^{t+1}=\left( i,j,k\right) ) \end{aligned}$$
(18)

for \(t=T-2,T-3,\ldots ,2\), and \(\beta _{T-1}(i,j,k){=}1\). Similarly to (11) and (12), the scaled versions of these probabilities are defined as

$$\begin{aligned} \bar{\alpha }_{t}(i,j,k) = {\mathbb {P}}(X_{t-1}^{t+1}=\left( i,j,k\right) \mid y_{2},\ldots ,y_{t}), \end{aligned}$$
(19)
$$\begin{aligned} \bar{\beta }_{t}(i,j,k)= \frac{\beta _{t}(i,j,k)}{{\mathbb {P}}(y_{t+1},\ldots ,y_{T-1}\mid y_{2},\ldots ,y_{t})} \end{aligned}$$
(20)

and \(\bar{\beta }_{T-1}(i,j,k){=}1\). The recursions for computing \(\bar{\alpha }_{t}(i,j,k)\) and \(\bar{\beta }_{t}(i,j,k)\) are

$$\begin{aligned} \bar{\alpha }_{t+1}(i,j,k)= \frac{1}{c_{t+1}}\sum _{\ell =0}^{N-1}\bar{\alpha }_{t}(\ell ,i,j)p_{jk}f_{i,j,k}(y_{t+1}) \end{aligned}$$
(21)

for \(t=2,3,\ldots ,T-2\), and

$$\begin{aligned} \bar{\beta }_{t}(i,j,k)= \frac{1}{c_{t+1}}\sum _{\ell =0}^{N-1}p_{k\ell }f_{j,k,\ell }(y_{t+1})\bar{\beta }_{t+1}(j,k,\ell ) \end{aligned}$$
(22)

for \(t=T-2,T-3,\ldots ,2\), where \(c_2={\mathbb {P}}(y_2)\) and

$$\begin{aligned} c_{t+1}&={\mathbb {P}}(y_{t+1}\mid y_{2},\ldots ,y_{t})\\&=\sum _{i,j,k,\ell =0}^{N-1}\bar{\alpha }_{t}(\ell ,i,j)p_{jk}f_{i,j,k}(y_{t+1}). \end{aligned}$$

Then for \(t=2,3,\ldots ,T-1\), the distribution of \(\{X_{t-1},X_{t},X_{t+1}\}\) given the observed data \(y_{2},\ldots ,y_{T-1}\) can be computed as

$$\begin{aligned} {\mathbb {P}}(X_{t-1}^{t+1}&=\left( i,j,k\right) \mid \varvec{y})\\ &= \bar{\alpha }_{t}(i,j,k)\bar{\beta }_{t}(i,j,k). \end{aligned}$$
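The generalised recursions (21)–(22) and the posterior above can be sketched as follows. This is our illustrative NumPy code for \(r=1\), not the authors' implementation; a built-in consistency check is that the posteriors sum to one at each time point:

```python
import numpy as np

def npdf(y, m, s):
    # N(m, s^2) probability density
    return np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def triple_posteriors(y, pi, P, mu, sigma, eta):
    """Scaled generalised forward-backward recursions (21)-(22) for the
    MAFHMM with r = 1; returns the posteriors P(X_{t-1}, X_t, X_{t+1} | y)
    for t = 2, ..., T-1 (1-based times).  Illustrative sketch only."""
    T, N = len(y), len(pi)
    # filtered means I[i, j, k] = eta_{-1} mu_i + eta_0 mu_j + eta_1 mu_k
    I = (eta[0] * mu[:, None, None] + eta[1] * mu[None, :, None]
         + eta[2] * mu[None, None, :])
    sig = np.broadcast_to(sigma[None, :, None], (N, N, N))  # s.d. of state j
    a = np.zeros((T, N, N, N))   # a[t-1] holds scaled alpha_t
    b = np.ones((T, N, N, N))    # b[t-1] holds scaled beta_t
    c = np.ones(T)
    # initialisation at t = 2: pi_i p_{ij} p_{jk} f_{ijk}(y_2), then scale
    a[1] = (pi[:, None, None] * P[:, :, None] * P[None, :, :]
            * npdf(y[1], I, sig))
    c[1] = a[1].sum(); a[1] /= c[1]
    for t in range(1, T - 2):      # forward recursion (21)
        a[t + 1] = (np.einsum('lij,jk->ijk', a[t], P)
                    * npdf(y[t + 1], I, sig))
        c[t + 1] = a[t + 1].sum(); a[t + 1] /= c[t + 1]
    for t in range(T - 3, 0, -1):  # backward recursion (22)
        tmp = np.einsum('kl,jkl,jkl->jk', P,
                        npdf(y[t + 1], I, sig), b[t + 1])
        b[t] = tmp[None, :, :] / c[t + 1]
    return a * b                   # Gamma_t(i, j, k), valid at indices 1..T-2
```

The arrays have shape \(N\times N\times N\) rather than \(M\times M\) with \(M=N^3\), which is the source of the computational saving discussed below.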

Similarly to the unfiltered case, we use the EM algorithm to derive an iterative scheme for estimating the parameter vector \(\phi\). For \(t=2,3,\ldots ,T-1\) put

$$\begin{aligned} \varGamma ^m_{t}(i,j,k)&= {\mathbb {P}}(X_{t-1}^{t+1}=\left( i,j,k\right) \mid \varvec{y}, \phi ^{m})\nonumber \\&= \bar{\alpha }^{m}_{t}(i,j,k)\bar{\beta }^{m}_{t}(i,j,k), \end{aligned}$$
(23)

and for \(t=1,2,\ldots ,T-2\),

$$\begin{aligned} \varGamma ^m_{t}(i,j)&= {} {\mathbb {P}}(X_{t}=i,X_{t+1}=j \mid \varvec{y},\phi ^{m})\nonumber \\&= {} \sum _{k=0}^{N-1} \varGamma ^m_{t+1}(i,j,k), \end{aligned}$$
(24)

and \(\varGamma ^m_{T-1}(i,j) =\sum _{k=0}^{N-1} \varGamma ^m_{T-1}(k,i,j)\). Finally, for \(t=1,2,\ldots ,T-1\)

$$\begin{aligned} \varGamma ^m_{t}(i) = {\mathbb {P}}(X_{t}=i \mid \varvec{y},\phi ^{m}) =\sum _{j=0}^{N-1} \varGamma ^m_{t}(i,j), \end{aligned}$$
(25)

and \(\varGamma ^m_{T}(i) =\sum _{j=0}^{N-1}\varGamma ^m_{T-1}(j,i)\).

The resulting EM-based updating formulae for the MAFHMM are

$$\begin{aligned} \pi ^{m+1}_{i}&=\varGamma ^m_{1}(i), \end{aligned}$$
(26)
$$\begin{aligned} p^{m+1}_{ij}&=\frac{\sum _{t=1}^{T-1}\varGamma ^m_{t}(i,j)}{\sum _{t=1}^{T-1}\sum _{j=0}^{N-1}\varGamma ^m_{t}(i,j)}, \end{aligned}$$
(27)
$$\begin{aligned} \mu ^{m+1}_{\ell }&=\frac{\sum \nolimits _{t=2}^{T-1}\sum _{i,j,k=0}^{N-1}A^{\ell }(y_{t}-B^{\ell })\varGamma ^m_{t}(i,j,k)/\sigma ^{2}_{j}}{\sum \nolimits _{t=2}^{T-1}\sum _{i,j,k=0}^{N-1}(A^{\ell })^2\varGamma ^m_{t}(i,j,k)/\sigma ^{2}_{j}}, \end{aligned}$$
(28)

where \(A^{\ell }=\varvec{\eta }\varvec{c}_\ell ^\top\) and \(B^{\ell }=\varvec{\eta }\varvec{D}(\varvec{1}^\top -\varvec{c}_\ell ^\top )\). Here, \(\varvec{\eta }=(\eta _{-1},\eta _{0},\eta _{1})\) and \(\varvec{1}=(1,1,1)\) are row vectors, \(\varvec{D}={\mathrm{diag}}(\mu _{i},\mu _{j},\mu _{k})\) is a diagonal matrix, and \(\varvec{c}_\ell =(1_{\{i=\ell \}},1_{\{j=\ell \}},1_{\{k=\ell \}})\) is a row vector of indicator functions. Finally,

$$\begin{aligned} \sigma ^{m+1}_{j}&=\sqrt{\frac{\sum \nolimits _{t=2}^{T-1}\sum \nolimits _{i,k=0}^{N-1}\bigl (y_{t}-I^{m+1}_{i,j,k}\bigr )^2\varGamma ^m_{t}(i,j,k)}{\sum \nolimits _{t=2}^{T-1}\varGamma ^m_{t}(j)}}. \end{aligned}$$
(29)

The updating formulae for the initial distribution and transition probabilities have similar forms to those in the standard model, and those for the means and variances are again weighted averages (Khan et al. 2005).

Algorithm complexity

Computing the likelihood is dominated by the calculations for the forward and backward probabilities. For an HMM of length T with N states, the forward probabilities require O(\(N^2T\)) calculations (Rabiner 1989). For an MAFHMM with MA filter of length \(2r+1\), the number of calculations required when using the associated HMM is O(\(M^2T\)), where \(M=N^{2r+1}\). The improved algorithm requires O(MNT) calculations.

For example, for \(N=5\) and \(T=100{,}000\), the standard HMM requires on the order of 2,500,000 calculations, while the meta-state approach with an MA filter with \(r=1\) requires 1,562,500,000. In contrast, the improved algorithm requires on the order of 62,500,000 calculations.
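These counts follow directly from the stated orders; a quick check, counting one calculation per term in the respective recursions:

```python
# Operation counts quoted above, for N = 5 states, T = 100,000 samples
# and an MA filter with r = 1.
N, T, r = 5, 100_000, 1
M = N ** (2 * r + 1)            # number of meta-states, M = 125
standard = N ** 2 * T           # standard HMM:          2,500,000
meta = M ** 2 * T               # meta-state MAFHMM: 1,562,500,000
improved = M * N * T            # improved algorithm:   62,500,000
```

The improved algorithm is thus a factor of \(N^{2r}=25\) cheaper than the meta-state approach for this example.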

Data idealisation

Data idealisation is the process of obtaining the (estimated) mean current corresponding to the channel state at each sampling point. For each \(t=1,2,\ldots ,T\), the idealised current is \(\hat{y}_t=\mu _{\hat{x}_t}\), where \(\hat{x}_t\) is the idealised state of the underlying Markov chain.

In the standard HMM, for \(t=1,2, \ldots ,T\), \(\hat{x}_{t}\) is obtained from the mode of the conditional distribution of \(X_{t}\) given \(\varvec{y}\), i.e. as the value of i for which \({\mathbb {P}}(X_{t}=i \mid \varvec{y},\phi )\) is maximum (Fredkin and Rice 1992). In the MAFHMM, for \(t=2,3,\ldots ,T-1\), \(\hat{x}_{t}\) is determined from the mode of the distribution of \(\{X_{t-1},X_{t},X_{t+1}\}\) given the observed data \(\varvec{y}\), i.e. as the value of j for which \({\mathbb {P}}(X_{t-1}=i,X_{t}=j,X_{t+1}=k \mid \varvec{y},\phi )\) is maximum.
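A minimal sketch of idealisation in the standard HMM, with illustrative posterior probabilities \(\gamma _t(i)\) of our own choosing:

```python
import numpy as np

# Idealisation sketch: given posterior state probabilities gamma[t, i]
# (from a forward-backward pass) and estimated mean currents mu, take
# the modal state at each sample point.  Values here are illustrative.
gamma = np.array([[0.9, 0.1],      # T = 4 sample points, N = 2 states
                  [0.6, 0.4],
                  [0.2, 0.8],
                  [0.7, 0.3]])
mu = np.array([0.0, 10.0])
x_hat = gamma.argmax(axis=1)       # modal states
y_hat = mu[x_hat]                  # idealised currents
```

In the MAFHMM the same idea applies, with the argmax taken over the middle index j of the triple posteriors.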

Standard errors

Standard errors for parameter estimates can be obtained from the inverse of the observed information matrix (OIM). Louis (1982) discussed a general approach to computing the OIM within the EM iterations, but it requires the calculation of conditional expectations, which is computationally intractable for HMMs. Khan (2003) developed a general recursive algorithm which computes the exact OIM for EM-based maximum-likelihood parameter estimates without requiring the computation of any expectations.

In the present study, standard errors for parameter estimates (under both HMM and MAFHMM) have been computed using Khan’s algorithm.

Methods and results

Experimental techniques

6\(\times\)His-tagged MscL proteins were purified, and the 6\(\times\)His tag was removed by thrombin according to a published procedure (Häse et al. 1995). Purified MscL was reconstituted into liposomes made of 100 % soybean azolectin using a dehydration/rehydration (D/R) reconstitution method (Delcour et al. 1989; Martinac et al. 2010). The lipids were dissolved in chloroform and dried under nitrogen to form a thin lipid film, and D/R buffer [200 mM KCl, 5 mM 4-(2-hydroxyethyl)-1-piperazineethanesulphonic acid (HEPES), adjusted to pH 7.2 with KOH] was added before vortexing and sonication for 10 min. MscL was added at a protein-to-lipid ratio of 1:1000 (w/w) and incubated at 4 °C for 1 h. Detergent was removed with the addition of Biobeads (BioRad, Hercules, CA), followed by incubation at 4 °C for a further 3 h. The proteoliposomes were collected by ultracentrifugation and resuspended in 30 μL D/R buffer. Aliquots of proteoliposomes were spotted onto cover slips and dehydrated overnight under vacuum at 4 °C. The dried proteoliposomes were rehydrated at 4 °C with D/R buffer and subsequently used for electrophysiological experiments.

The MscL channel activity was recorded from proteoliposomes using the patch clamp technique at applied voltage \(+100\) mV. The bath and pipette recording solutions used in the liposome experiments were identical, consisting of 200 mM KCl, 40 mM MgCl\(_2\) and 5 mM HEPES (pH 7.2 adjusted with KOH). Negative pressure (suction) activating MscL was applied to the patch pipette using a syringe, monitored with a pressure gauge (PM 015R, World Precision Instruments, Sarasota, FL). The single-channel current was amplified with an Axopatch 200B amplifier (Molecular Devices, Sunnyvale, CA), filtered at 25 and 50 kHz, digitised at 75 and 150 kHz, respectively, with a Digidata 1440A interface using pCLAMP 10 acquisition software (Molecular Devices, Sunnyvale, CA) and stored on a computer.

Data exploration

Four data sets at each bandwidth (25 and 50 kHz) were selected for analysis, each containing about 200,000 points. Figures 1 and 2 show segments; the higher bandwidth data have higher noise amplitude (about double at the closed level). Besides the closed and fully open levels, identified reasonably well by eye, both records show subconducting levels. Identifying subconducting levels is important, but requires more sophisticated approaches. We base our analysis on HMMs.

Fig. 1
figure 1

25 kHz data sampled at 75 kHz (sampling period 0.0133 ms, 1500 points). Broken horizontal lines show levels estimated using HMM with \(N=7\). (Portion A shown at higher time resolution in Fig. 6)

Fig. 2
figure 2

50 kHz data sampled at 150 kHz (sampling period 0.0066 ms, 3000 points); other details as in Fig. 1

Preliminary assessment of noise

The number of levels N was allowed to vary from 2 to 9, and for each N we fitted an HMM using (5)–(8). We restricted the maximum number of levels to 9, based on previous results (Sukharev et al. 2001; Khan 2003) and exploratory analysis of the present data. The fully open channel currents were 315–320 pA. The estimated noise standard deviations at the closed and fully open levels were about 5 and 11 pA, respectively, for the 25 kHz data, and about 9 and 12.5 pA, respectively, for the 50 kHz data. However, consistent with the findings of Khan et al. (2005), at the subconducting levels the estimated noise standard deviations were larger, 16–27 pA.

Previous studies (Blatz and Magleby 1986; Milne et al. 1989; Colquhoun and Sigworth 2009; Khan et al. 2005) suggested that the low-pass filter slows transitions between levels. Consequently, some data points in the digitised record are displaced in amplitude, causing the estimated noise standard deviations at the subconducting levels to be inflated by the contribution from these points. Further, the idealised record revealed that, as a result of over-estimation of noise variances, some points at subconducting levels were clearly misclassified. Consequently, the estimated noise standard deviations need to be constrained. A simple and computationally efficient approach is to constrain the noise standard deviations at the intermediate levels to be equally spaced between those at the fully open and closed levels, as in Khan et al. (2005). This resolved the misclassification problem.

Statistical analysis

We used HMMs and MAFHMMs to further analyse these records with the main aims of estimating the number of levels and the corresponding conductances. Key results will be summarised for all data sets, but details will be shown for just three selected data sets: (a) at bandwidth 25 kHz and (b) and (c) at 50 kHz.

Estimating the number of levels

Extensive analyses were carried out for all eight data sets, using HMMs with the number of levels N allowed to vary between 2 and 9 and noise standard deviations constrained as described in “Preliminary assessment of noise”. We selected the value of N based on the two goodness-of-fit criteria described in Khan et al. (2005). Figure 3 depicts the root mean square error (RMSE) for the three selected data sets, where

$$\begin{aligned} \hbox {RMSE}= \left\{ \frac{1}{T}\sum _{t=1}^{T}(y_{t}-\mu _{\hat{x}_{t}})^2\right\} ^{1/2}. \end{aligned}$$

As N increases, the RMSE decreases, but only slightly for \(N > 7\). Similar behaviour was observed for the second criterion, the negative maximised log-likelihood. The above analyses were repeated using MAFHMMs, with \(\mu _{\hat{x}_{t}}\) in the expression for RMSE replaced by \(\hat{I}_t = \sum _{s=-1}^{1}\eta _{s}\mu _{\hat{x}_{t-s}}\) and the average taken over a sequence of length \(T-2\).
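A minimal sketch of the RMSE criterion, with toy observed and idealised values of our own choosing:

```python
import numpy as np

# RMSE goodness-of-fit criterion: observed record y against the
# idealised means at each sample point.  Toy values for illustration.
y = np.array([0.1, -0.2, 9.8, 10.3, 0.0])
y_ideal = np.array([0.0, 0.0, 10.0, 10.0, 0.0])
rmse = np.sqrt(np.mean((y - y_ideal) ** 2))
```

For the MAFHMM the idealised means are replaced by the filtered values \(\hat{I}_t\) and the mean is taken over \(T-2\) points, as described above.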

Furthermore, comparing high time resolution plots of observed data with corresponding idealisations revealed evidence of over-fitting for \(N>7\), with additional levels being fitted within what appeared to be baseline noise. Hence we selected \(N=7\).

Fig. 3
figure 3

RMSE for data sets (a), (b) and (c)

Further results

Summarised in Table 1 are the results for each data set based on models with \(N=7\). Currents were offset after analysis so that the estimated mean current at the closed level was zero, that is \(\check{\mu }_i= \hat{\mu }_i - \hat{\mu }_0\), \(i=0,1,\ldots ,6\). At each level, channel conductances (% max) were estimated by the mean current as a percentage of that at the fully open level. Corresponding standard errors were computed by using Khan’s (2003) algorithm.

Idealised records were used to determine the mean dwell times and corresponding standard errors, and the proportion of time at each level. Estimated conductances and mean dwell times are presented graphically in Figs. 4 and 5.

Fig. 4
figure 4

Estimated conductances (% max) based on HMM (open marker) and MAFHMM (solid marker) for data sets (a), (b) and (c)

Fig. 5
figure 5

Estimated mean dwell times based on HMM (open marker) and MAFHMM (solid marker) for data sets (a), (b) and (c) (disc, square and diamond, respectively)

Table 1 Results from HMM and MAFHMM analyses with \(N=7\): data recorded at (a) 25 kHz and sampled at 75 kHz (200,000 points, sampling period 0.0133 ms); (b) and (c) 50 kHz and sampled at 150 kHz (190,000 points and 200,000 points, respectively, sampling period 0.0066 ms)

For the 25 kHz data, estimated subconductances were as much as 4 % lower with filtering than without (see also Fig. 4). Also, as shown in Fig. 6, some points were idealised differently by the MAFHMM, most noticeably at the intermediate levels (for example between times 863.6 and 864.2). For each of the 50 kHz data sets, estimated subconductances were similar for the HMM and the MAFHMM. Adjustment for filtering resulted in minor differences in the estimated occupancy probabilities (\(\hat{p}_i\)).

Fig. 6
figure 6

Idealisation of portion A (Fig. 1) based on (a) standard HMM and (b) MAFHMM

For the 50 kHz data sets, corresponding estimated subconductances were higher than for the 25 kHz data. For the HMM analysis, the differences were 1–8 % of the fully open level (with the largest difference at level 2), and for the MAFHMM they were 3–11 % (with the largest at level 3). For the two 50 kHz data sets, corresponding subconductances differed by 3–7 % for the HMM and 2–6 % for the MAFHMM. To indicate the statistical variability, estimated subconductances for all eight data sets are shown in Table 2, along with corresponding (sample) mean conductance (\(\bar{x}\)) and standard deviation (s) for each level.

Table 2 Estimated subconductances (% max) based on MAFHMM analyses of four data sets at each bandwidth

Figure 5 shows that estimated mean dwell times (\(\hat{\tau }_i\)) at corresponding intermediate levels were lower with filtering than without. This is also true for the closed and fully open levels (not shown in Fig. 5 due to the different scale). Comparison across the data sets indicated that the estimated mean dwell times at corresponding conductance levels were higher without filtering for the 25 kHz data set, but were similar for the two 50 kHz data sets.

Dwell time histograms at each level for the 50 kHz data set (c) showed a preponderance of short dwell times and a long tail. Sukharev et al. (2001) and Perozo et al. (2002) have suggested the existence of two conformational states at level 0. To seek evidence for this, we fitted a mixture of exponential distributions (Colquhoun and Sigworth 2009) to the dwell times at each level. Using the Bayesian information criterion (BIC), for each of the eight data sets the best estimate for the number of components in the mixture at level 0 was two. This is consistent with the gating scheme proposed by Sukharev et al. (2001). Similar analyses for other levels indicated only one component for each.
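The dwell-time mixture analysis can be sketched generically as follows. This is not the code used in the study: a basic EM fit of a K-component exponential mixture, with BIC \(=-2\log L + p\log n\), applied to simulated dwell times with illustrative rates.

```python
import numpy as np

def exp_mixture_bic(x, K, n_iter=200):
    """Fit a K-component exponential mixture to dwell times x by a basic
    EM algorithm and return the BIC.  Generic sketch, not the authors'
    implementation; initial rates are taken from sample quantiles."""
    n = len(x)
    q = np.quantile(x, (np.arange(K) + 0.5) / K)
    lam = 1.0 / q                       # initial component rates
    w = np.full(K, 1.0 / K)             # initial mixing weights
    for _ in range(n_iter):
        dens = w * lam * np.exp(-np.outer(x, lam))     # (n, K) densities
        resp = dens / dens.sum(axis=1, keepdims=True)  # responsibilities
        w = resp.mean(axis=0)
        lam = resp.sum(axis=0) / (resp * x[:, None]).sum(axis=0)
    loglik = np.log((w * lam * np.exp(-np.outer(x, lam))).sum(axis=1)).sum()
    n_par = 2 * K - 1                   # K rates and K - 1 free weights
    return -2 * loglik + n_par * np.log(n)
```

For well-separated components, the two-component fit attains a much higher likelihood, so its BIC is lower despite the extra parameters.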

Transition schemes

For the 50 kHz data sets the transition scheme for MAFHMM based on estimated transition matrices and idealisations is shown in Fig. 7. We have adopted the notation SC\(_1,\ldots,\) SC\(_5\) of Cox et al. (2013) to indicate subconducting levels. Observe that (direct) transitions from the closed level C are only to SC\(_1\) and SC\(_2\), and the fully open level can be reached only from SC\(_4\) and SC\(_5\). Each state communicates directly with up to two states above it or below it. This scheme is consistent with the linear structure for MscL proposed by Sukharev et al. (2001).

A transition scheme (not shown here) for the 25 kHz data was also constructed, based on MAFHMM analyses. That scheme contained a (two-way) transition between SC\(_2\) and SC\(_5\) not present in Fig. 7. There are two possible explanations for this. Firstly, the 25 and 50 kHz data were different, so it is possible that these transitions were not present in the 50 kHz data. Secondly, it may be that the lower cut-off frequency of the 25 kHz data caused substantial delays in the rise time of the signal, as a result of which some points were sampled while in transition. This would cause the analysis to detect transitions that in fact do not exist.

Fig. 7
figure 7

Transition scheme for 50 kHz data sets, based on MAFHMM analysis

Discussion

Conductances from previous studies (Sukharev et al. 1999, 2001; Khan et al. 2005; Petrov et al. 2011), together with those from our present study, are summarised in Table 3 and plotted in Fig. 8. Subconductances show some variation between studies. Sukharev et al. (1999) reported five levels for MscL. In a later investigation, based on higher bandwidth data, Sukharev et al. (2001) claimed seven levels for this channel. Khan et al. (2005) found five levels using 10 kHz data and suggested that additional levels might be found in data showing more activity in the subconducting levels. Recently, Petrov et al. (2011) reported six levels.

Table 3 Reported conductances (% max) for MscL in E. coli, bandwidths as indicated
Fig. 8 Reported conductances (% max) for MscL in E. coli; sources as in Table 3. For Pr\(_{25}\) and Pr\(_{50}\), a dash indicates the sample mean of estimated subconductances (\(\bar{x}\) in Table 2). C and FO refer to the closed and fully open levels, respectively, and SC\(_1,\ldots,\) SC\(_5\) are intermediate conducting levels; N is the number of levels

Sukharev et al. (2001) analysed two data sets recorded at applied voltages of \(-50\) and \(-20\) mV, respectively. They found five levels for the \(-50\) mV data and six for the \(-20\) mV data; by merging the results of the two analyses, they concluded that MscL has seven levels. They also suggested that the 9 and 56 % levels (SC\(_1\) and SC\(_3\) in Fig. 8, respectively) were missed in the earlier investigation because of very low activity at the subconducting levels. Petrov et al. (2011) used high hydrostatic pressure to increase channel activity and determined the levels from amplitude histograms rather than from a model-based analysis. Note the gap between their first and second subconducting levels in Fig. 8.

Using different approaches, Chiang et al. (2004) and Shapovalov and Lester (2004) suggested that MscL in E. coli may have many energetic (conformational) states, corresponding to positions of side chains in the channel protein. However, not all of these need be important for channel opening, and HMM techniques detect conducting states that may correspond to ensembles of these molecular positions.

Based on extensive analysis of eight data sets, our estimate of the number of levels is seven. All our data were recorded under identical experimental conditions, during the same afternoon, with applied voltage \(+100\) mV. (The higher voltage increases signal-to-noise ratio, facilitating detection of subconducting levels.)

Understanding MscL gating behaviour requires knowledge of both the number of subconducting levels and the number of conformations at each level (Sukharev et al. 1999, 2001). Sukharev et al. (2001) proposed that membrane tension stretches the channel from the closed conformation (C) to a closed–expanded (CE) conformation before the first intermediate level (SC\(_1\)) occurs, implying more than one conformational state at the closed level. This notion is further supported by an electron paramagnetic resonance (EPR) spectroscopic study showing that the reduction in bilayer thickness that results from stretching the membrane during MscL activation may stabilise at least one additional closed conformation of the channel (Perozo et al. 2002). Statistical analyses by Khan (2003, p. 113) indicated three conformational states at the closed level. Based on our present data, we found that the closed level has two conformational states.

In Table 1, the mean dwell times at the subconducting levels were about 2.6 sampling periods for the 25 kHz data and about 3.7 sampling periods for the 50 kHz data. However, the minimum dwell time at the intermediate levels for the 50 kHz data sets was one sampling period, a phenomenon also reported by Khan et al. (2005). Hence, even with our higher-bandwidth data, the problem of missed brief events reported by Khan et al. (2005) has not been eliminated.
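Dwell times such as these are obtained by run-length encoding the idealised level sequence, with each run length measured in sampling periods. A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def dwell_times(idealised):
    """Run-length encode an idealised level sequence, returning
    (level, dwell in sampling periods) for each sojourn."""
    x = np.asarray(idealised)
    change = np.flatnonzero(x[1:] != x[:-1]) + 1   # indices where level changes
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(x)]))
    return [(int(x[s]), int(e - s)) for s, e in zip(starts, ends)]
```

A sojourn idealised over a single sample therefore registers a dwell time of exactly one sampling period, which is why the minimum observable dwell time is bounded below by the sampling rate.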

Concluding remarks

The current study presents the most extensive statistical analysis of MscL data reported in the literature to date. Our major finding is that MscL in E. coli has seven conductance levels; this is the first study to report seven levels in a single recording, based on HMM analysis. In addition, our data are consistent with two conformational states at the closed level, providing further empirical evidence for the gating scheme proposed by Sukharev et al. (2001).

We expect our improvements to the HMM-based statistical modelling and EM-based computational algorithms to play an important role in analysing future higher bandwidth experimental data.