
1 Introduction

High-throughput analysis of functional mutations in proteins, peptides, or DNA by deep sequencing is emerging as a powerful technique. Properties such as protein stability, enzymatic activity, and peptide ligand or DNA binding have been studied [1–16]. The general approach involves screening a library of mutants or performing a selection for a desired function. Library sequences in pre- and post-selection pools are then identified by next-generation sequencing, and computational routines are used to extract information about how sequence relates to function.

Many selection or screening processes have been employed for these types of studies, including in vitro assays, phage display, yeast surface display in combination with fluorescence-activated cell sorting (FACS), and in vivo assays. Some studies have used the observed frequencies of mutant variants in selected pools to infer sequence–function relationships [1–5]. As an alternative measure, enrichment scores have been calculated from the ratio of pre- and post-selection frequencies [6–14]. The effects of mutations at particular sequence positions have been investigated either by experimentally screening single-mutant libraries or by assuming positional independence during computational post-processing. Position weight matrices that score binding, stability, and function have been built using this approach, sometimes with correction for nonspecific binding or consideration of enrichment changes over multiple selection rounds [5, 12, 13]. Analyzing single-residue substitutions benefits from enhanced statistical power, because it is easy to saturate a single-position sequence space. However, important context-dependent effects may be neglected in this type of analysis.

In this chapter, we introduce a high-accuracy alternative to enrichment-based methods for probing mutational effects on the affinity of peptide ligands. Our protocol, “SORTCERY,” comprises three steps: selection, deep sequencing, and computational analysis (Fig. 1a). The selection process involves two-color cell sorting of a yeast surface-displayed library based on the expression levels of displayed peptides and their levels of binding to a target (Fig. 1b). Our sorting protocol builds on reports that two-color FACS can accurately distinguish between binders of different affinities [15–19] and agrees with a theoretical model describing the expected signals for clones expressing peptides with a range of binding strengths [20]. This model can guide sorting of a library into pools according to binding affinity, and the pools can then be deep sequenced to obtain information about the affinities of individual library members. SORTCERY extracts this information from the deep-sequenced pools using computational routines that rank the observed mutant sequences according to binding strength.

Fig. 1

(a) SORTCERY combines experimental and computational protocols to rank peptide ligands according to their affinity for a target. Yeast-displayed peptides are sorted by FACS into pools of ligands with similar affinity. Deep sequencing data are generated for each sample, and the distribution of each sequence over the FACS gates is determined. Pairwise comparison of these distributions yields, for each pair of peptides, the probability that one binds more strongly than the other. A global rank order of affinities is computed from these probabilities. (b) SORTCERY’s yeast-display and gate-setting schemes. Peptide expression and target binding are detected via tags that are recognized by pairs of primary and fluorescently labeled secondary antibodies. Two-color cell sorting is based on these two signals. Gates are set to optimally separate binders of different affinities and to exclude non-binders and non-expressing cells.

Applying SORTCERY to study helical peptide affinities for the apoptosis-regulating protein Bcl-xL, we obtained extremely accurate rankings for ~1000 sequences spanning dissociation constants from 0.1 to 60 nM (Fig. 2a). The study is described in Ref. [20], and the reader is referred to that paper for an in-depth exposition of the theory underlying SORTCERY, the results obtained for Bcl-xL, and further discussion of the strengths and limitations of the method. A variant of our approach that can potentially be used to analyze much larger libraries is also described here (Fig. 2b; see Note 10).

Fig. 2

(a) Individually measured dissociation constants vs. SORTCERY ranking indices for 19 sequences from a ranking of ~1000 sequences. Clones have been reindexed from 1 to 19. Error bars for rank indices are 95 % bootstrap confidence intervals; error bars for dissociation constants indicate standard deviations of four individual measurements. (b) Ranking indices for the same 19 clones as determined by convoluted SORTCERY (see Note 10). Panel (a) is adapted with the publisher’s permission from Fig. 4 in Ref. [20].

2 Materials

2.1 Cell Culture Media

  1.

    SD + CAA/SG + CAA: Dissolve 5 g casamino acids, 1.7 g yeast nitrogen base, 5.3 g ammonium sulfate, 10.2 g Na2HPO4·7H2O, and 8.6 g NaH2PO4·H2O in 700 ml water and autoclave for 15 min at 22 psi and 120 °C. For growth medium (SD + CAA), dissolve 50 g glucose in 50 ml water, then sterilize with a 0.2 μm filter. Add 40 ml of this 50 % glucose solution to the autoclaved medium and fill up to 1 l with sterile water. For induction medium (SG + CAA), dissolve 20 g galactose in 100 ml water, then sterilize with a 0.2 μm filter. Add 100 ml of this 20 % galactose solution to the autoclaved medium and fill up to 1 l with sterile water.

2.2 Fluorescence-Activated Cell Sorting

  1.

    Low protein binding 0.45 μm filter plates or bottle-top filters.

  2.

    BSS pH 8.0: 50 mM Tris, 100 mM NaCl, 1 mg/ml BSA.

  3.

    Primary antibody mixture: anti-HA (Roche) 1:100 dilution and anti-Myc (Sigma) 1:100 dilution in BSS.

  4.

    Secondary antibody mixture: APC-labeled anti-mouse (BD Biosciences) 1:40 dilution and PE-labeled anti-rabbit (Sigma) 1:100 dilution in BSS.

2.3 Deep Sequencing Sample Preparation (See Note 1)

  1.

    Zymoprep Yeast Plasmid Miniprep I (Zymo Research).

  2.

    Isopropanol.

  3.

    High-fidelity DNA polymerase (e.g., Phusion).

  4.

    Thermocycler.

  5.

    Gel equipment.

  6.

    PCR purification and gel extraction kits (Qiagen).

  7.

    MmeI (New England Biolabs): MmeI restriction enzyme, NEB CutSmart Buffer, 1 mM SAM.

  8.

    T4 DNA ligase.

  9.

    Primers and oligos.

3 Methods

3.1 Cell Growth and Induction of Yeast Surface Display Library (See Note 2)

  1.

    Dilute cells to an OD600 of 0.05 in SD + CAA and grow for 8 h at 30 °C.

  2.

    Dilute cells to an OD600 of 0.005 in SD + CAA and grow to an OD600 of 0.1–0.4 at 30 °C.

  3.

    Dilute cells to an OD600 of 0.025 in SG + CAA and grow to an OD600 of 0.2–0.5 at 30 °C to induce peptide expression.
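Each dilution above follows OD1 · V1 = OD2 · V2. The short Python sketch below does that arithmetic. It assumes that OD600 is proportional to cell density, which holds only within the photometer's linear range. The function name `culture_volume_ml` is hypothetical.

```python
def culture_volume_ml(od_current: float, od_target: float, final_volume_ml: float):
    """Return (culture volume, fresh medium volume) in ml for one dilution.

    Uses OD1 * V1 = OD2 * V2, which assumes OD600 scales linearly with cell
    density (true only within the linear range of the photometer).
    """
    if od_current <= 0 or od_target <= 0:
        raise ValueError("ODs must be positive")
    if od_target > od_current:
        raise ValueError("cannot dilute to a higher OD than the current culture")
    culture = final_volume_ml * od_target / od_current
    return culture, final_volume_ml - culture


# Step 3: dilute an SD + CAA culture at OD600 = 0.3 to 0.025 in 50 ml total volume
culture, medium = culture_volume_ml(0.3, 0.025, 50.0)
print(f"{culture:.2f} ml culture + {medium:.2f} ml SG + CAA")  # about 4.17 + 45.83 ml
```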

3.2 Gate Setting

  1.

    SORTCERY uses a two-color FACS setup to monitor expression ($F_e$) and binding ($F_b$) signals on a log/log or biexponential scale. On a $\log F_b$ vs. $\log F_e$ plot, points of equal binding strength lie on a line with a slope of 1 [20]. Subdivide the log/log plot accordingly into areas (gates) of different affinity by dissecting it with lines of slope 1 (red lines in Fig. 3). The number, position, and spacing of the lines affect the performance of the procedure. We recommend equal spacing between lines, because this results in optimal resolution between binders of different affinities. The number of lines (and of the resulting gates) depends on the required resolution, which can be determined by measuring the FACS profiles of several yeast-displayed standards (see Note 3). Position the lines so that the gates cover the area from the strongest binders down to the baseline binding signal. FACS profiles of standards can also help determine whether the experimental setup will generate samples of sufficient quality for a SORTCERY sort (see Note 4).

    Fig. 3

    Gate setting for an affinity sort with 12 gates. The red diagonal lines subdivide the affinity axis into intervals and thus ensure that each gate corresponds to a unique range of dissociation constants. The green lower-left borders exclude non-binding cells from higher-affinity gates and exclude non-expressing cells from all gates; the depicted FACS profile of a non-binder illustrates this. The blue upper-right borders exclude cells with the maximum possible expression or binding signal, because affinities cannot be accurately estimated from such signals. This figure is adapted with the publisher’s permission from supplemental Fig. 3 in Ref. [20].

  2.

    Set the gate boundaries so that they exclude cells without a significant expression signal and prevent cells at the binding baseline from being captured in gates for higher affinities. Cutoffs can be established by monitoring the FACS profile of a non-binding yeast clone and noting (1) the position of non-expressing cells (the blob in the lower-left corner of Fig. 3) and (2) the binding baseline (lower-right area in Fig. 3). Determine appropriate cutoffs and set the lower-edge gate boundaries accordingly (see the green edges in Fig. 3 for an example).

  3.

    Cell sorters assign the maximum signal value to any signal intensity above their measurement range, so such signals have not been accurately determined. Exclude the areas of maximum expression and binding signal from the gates by setting the gate boundaries accordingly (see the blue edges in Fig. 3 for an example). A FACS profile of a BH3 peptide ligand binding to Bcl-xL is shown in Fig. 4.

    Fig. 4

    FACS profile for a BH3 peptide ligand binding to Bcl-xL. The red line indicates the orientation of the first principal component of the profile of the expressing cells. This figure is adapted with the publisher’s permission from Fig. 3 in Ref. [20].
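The gate geometry in steps 1–3 can be written as a small classification function. The Python sketch below is a hypothetical illustration, not the sorter's own gating logic. Several things are assumptions: the function names, the use of base-10 logarithms, simple threshold cutoffs in place of the hand-drawn green edges, and the example saturation value. Gate 0 is the lowest-affinity gate, matching the convention in Subheading 3.5.

```python
import bisect
import math


def equal_gate_boundaries(lowest, spacing, n_gates):
    """Intercepts c_k of equally spaced lines log10(Fb) = log10(Fe) + c_k.

    n_gates diagonal bands need n_gates + 1 lines; band 0 has the lowest
    binding-to-expression ratio, i.e. the lowest affinity.
    """
    return [lowest + k * spacing for k in range(n_gates + 1)]


def assign_gate(fe, fb, boundaries, min_log_expr, min_log_bind, max_signal):
    """Return the gate index of one cell, or None if the cell is excluded.

    fe, fb: raw expression and binding signals (both > 0).
    min_log_expr: log10 expression cutoff for non-expressing cells.
    min_log_bind: log10 binding baseline; cells at or below it may only
        enter the lowest-affinity gate 0.
    max_signal: saturation value reported by the sorter.
    """
    if fe >= max_signal or fb >= max_signal:
        return None  # saturated signal: affinity cannot be estimated
    log_e, log_b = math.log10(fe), math.log10(fb)
    if log_e <= min_log_expr:
        return None  # non-expressing cell
    # log10(Fb) - log10(Fe) is constant along each slope-1 line
    k = bisect.bisect_right(boundaries, log_b - log_e) - 1
    if k < 0 or k >= len(boundaries) - 1:
        return None  # outside the outermost diagonal lines
    if k > 0 and log_b <= min_log_bind:
        return None  # baseline binder kept out of higher-affinity gates
    return k


bounds = equal_gate_boundaries(-2.0, 0.5, 5)  # five bands between c = -2 and c = 0.5
# 262143 (2**18 - 1) is used here only as an example saturation value
print(assign_gate(1000.0, 560.0, bounds, 1.0, 1.0, 262143.0))  # 3
```

Because the bands are defined by the difference of the two logarithms, equal spacing of the intercepts gives each gate the same width along the affinity axis.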

3.3 Cell Sorting

  1.

    Filter the grown and induced yeast cells (Subheading 3.1) and wash them twice with BSS.

  2.

    Incubate the cells with the target molecule in BSS for 2 h at 21 °C with gentle shaking (see Notes 5 and 6).

  3.

    Filter the cells and wash twice with BSS.

  4.

    Incubate with the primary antibody mixture (20 μl per 10⁶ cells; see Notes 7 and 8) at 4 °C.

  5.

    Filter the cells and wash twice with BSS.

  6.

    Incubate with the secondary antibody mixture at 4 °C.

  7.

    Filter the cells and wash twice with BSS. Resuspend the cells in BSS for sorting.

  8.

    Sort cells into each individual gate and retain the sorted pools for deep sequencing analysis (see Notes 9 and 10). Record the number of cells sorted into each pool. Also determine the distribution of the library across all gates by recording how many cells hit each gate during a set time interval, e.g., 1 min. This information is needed for the deep sequencing analysis (Subheading 3.5, step 4).

3.4 Deep Sequencing Sample Preparation

3.4.1 DNA Extraction

  1.

    If more than 80,000 cells were sorted, spin the cells down, aspirate the supernatant, and add 150 μl of solution 1 from the Zymoprep kit plus 2 μl Zymolyase. For smaller numbers of cells, add 50 μl of solution 1 directly per 100 μl of cell suspension, plus 2 μl Zymolyase per 150 μl of total volume.

  2.

    Incubate at 37 °C for 1 h on a shaker.

  3.

    Successively add 150 μl each of solutions 2 and 3 per 150 μl of incubation volume, vortexing after each addition. Spin down the precipitate and retain the supernatant.

  4.

    Add 1 volume of isopropanol and 0.1 volume of 3 M NaOAc per volume of DNA extract. Store at −20 °C overnight.

  5.

    Spin at 14,000 × g and 4 °C for 10 min. Carefully remove the supernatant. Resuspend the DNA pellet in 20 μl sterile water (the pellet may not be visible for small numbers of sorted cells).

3.4.2 DNA Amplification and Adapter Attachment

Most of this section is based on the excellent preparation protocol in Ref. [21].

  1.

    For each sorted sample separately, PCR-amplify the DNA sequences encoding the peptide ligands from the plasmids. The 5′ end of the forward primer needs to contain a binding site for the MmeI restriction enzyme: 5′ GGGACCACCACCTCCGAC 3′ (see Note 11). The 5′ end of the reverse primer must consist of part of the Illumina adapter sequence: 5′ CGGTCTCGGCATTCCTGC 3′ (see Notes 12 and 13).

  2.

    Purify the PCR products with the Qiagen PCR purification kit. Elute in 30 μl sterile water.

  3.

    Digest each PCR product with the MmeI restriction enzyme, using the reagents below. Incubate the digestion mixture for 1 h at 37 °C, then heat-inactivate for 20 min at 80 °C (see Note 14).

    Digestion reagents     Amount
    PCR product            12.5 μl
    1 mM SAM               2.5 μl
    NEB CutSmart buffer    5 μl
    MmeI                   5 μl per 8.6 pmol PCR product
    Sterile water          Fill up to 50 μl

  4.

    Prepare double-stranded adapters by annealing single-stranded oligos. The forward strand should contain the standard Illumina sequencing-primer binding site [22], a unique barcode for multiplexing (see Note 15), and a 3′ TC, resulting in the sequence 5′ ACACTCTTTCCCTACACGACGCTCTTCCGATCTbarcodeTC 3′. The complementary (reverse) strand should be 5′-phosphorylated and should lack the 5′ GA that would pair with the TC of the forward strand, so that the annealed adapter carries a 3′ TC overhang.

  5.

    Ligate each digestion product with an adapter containing a unique barcode. Ligate for 30 min at 20 °C, then heat-inactivate for 10 min at 65 °C.

  6.

    Run the ligation products on a gel. Gel-purify the bands of the correct size with the QIAquick gel extraction kit. Elute in 30 μl sterile water.

  7.

    PCR-amplify the ligation products. The primers should contain overhangs that complete the Illumina adapter sequences.

    • Forward Primer: 5′ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCT 3′.

    • Reverse Primer: 5′ CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCATCTT 3′.

    • 15 PCR cycles should be sufficient with Phusion polymerase.

  8.

    Purify the PCR products with the Qiagen PCR purification kit. Elute in 30 μl sterile water.

  9.

    Combine the samples and perform a multiplexed deep sequencing run on an Illumina sequencer with the standard forward Illumina read primer: 5′ ACACTCTTTCCCTACACGACGCTCTTCCGATCT 3′. If a reverse read is also to be carried out, use a custom primer (see Note 16).
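The oligo design rules in steps 1, 4, and 7 are easy to get wrong by hand. The hypothetical Python sketch below builds both adapter strands for a barcode, locates MmeI recognition sites (5′ TCCRAC 3′, Note 11) on one strand, and checks that the final forward primer overlaps the adapter's read-1 site. All function names are illustrative, and the barcode is made up. The sequences are the ones given in this subheading. The 5′ phosphate on the reverse strand is a synthesis modification and is not represented in the string.

```python
import re

# Sequences from Subheading 3.4.2
READ1_SITE = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # standard Illumina read-1 site
FORWARD_PCR_TAIL = "GGGACCACCACCTCCGAC"  # step 1, carries the MmeI site
FINAL_FORWARD_PRIMER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCT"  # step 7

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_strands(barcode: str):
    """Forward and reverse adapter oligos for one barcode (step 4).

    Forward strand: read-1 site + barcode + 3' TC. Reverse strand: reverse
    complement of the forward strand without its 5' GA, which leaves a 3' TC
    overhang on the annealed adapter.
    """
    forward = READ1_SITE + barcode.upper() + "TC"
    reverse = reverse_complement(forward)[2:]  # drop the 5' GA
    return forward, reverse


def mmei_sites(seq: str):
    """0-based start positions of 5'-TCCRAC-3' (R = A or G) on this strand only."""
    return [m.start() for m in re.finditer(r"(?=TCC[AG]AC)", seq.upper())]


def overlap_length(upstream: str, downstream: str) -> int:
    """Length of the longest suffix of `upstream` equal to a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream[-k:] == downstream[:k]:
            return k
    return 0


fwd, rev = adapter_strands("ACGTA")  # "ACGTA" is a made-up example barcode
print(fwd)
print(rev)
print(mmei_sites(FORWARD_PCR_TAIL))  # [12]
print(overlap_length(FINAL_FORWARD_PRIMER, READ1_SITE))  # 23
```

Checking the reverse strand as well would mean running `mmei_sites` on `reverse_complement(seq)`.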

3.5 Computational Analysis

  1.

    Filter the Illumina data by considering only sequences with a high Phred score at the mutated positions and a low number of read errors at unmutated positions (see Note 17). If a reverse read overlapping the forward read has been performed, compare complementary mutant codons and choose the version with the higher Phred score.

  2.

    Assign each Illumina read to its sorted pool/gate by barcode identification.

  3.

    Count the copies of each unique sequence in all pools. Discard sequences whose total copy number, summed over all gates, is low. Estimate the number of sorted cells from which each unique sequence likely originated: dividing the number of cells sorted into a pool by the number of sequence reads for that sample gives a rough estimate of the cells per read. As a rule of thumb, require at least 100 sorted cells for each observed sequence.

  4.

    If a convoluted sort strategy was used, see Note 18. Otherwise, calculate the distribution over the gates for each unique sequence:

    $$ f_{xj} = \frac{z_x \, n_{xj} / \sum_i n_{xi}}{\sum_y z_y \, n_{yj} / \sum_i n_{yi}} $$

    Here, $f_{xj}$ is the normalized frequency of sequence $j$ in gate $x$, $n_{xj}$ is the number of reads of sequence $j$ in deep sequencing data set $x$ (which corresponds to gate $x$), the sums over $i$ run over all sequences in a data set, and $z_x$ is the number of cells that hit gate $x$ when measuring the distribution of cells across all gates.

  5.

    Calculate all pairwise probabilities that a peptide A is a stronger binder than a peptide B, and vice versa:

    $$ p(A > B) = \sum_x f_{xA} \sum_{y < x} f_{yB} $$

    Gate indices x and y are assigned from the lowest- to the highest-affinity gate, i.e., in this equation the sum over y runs over all gates corresponding to lower affinities than that of gate x. Assign these probabilities as weights to the edges of a directed graph. The vertices of the graph represent peptides, and a directed edge running from vertex B to vertex A, weighted by p(A > B), represents the assumption that peptide A is a stronger binder than peptide B (Fig. 5a).

    Fig. 5

    (a) A directed graph representing four peptide ligands and assumptions about their relative binding strengths. Each edge is weighted by the probability that the ligand at its tail is a weaker binder than the ligand at its head. (b) A linear subgraph of (a). Note that it contains no conflicting assumptions about binding strengths.

  6.

    Find the maximum-likelihood linear subgraph, starting with the method described in Ref. [23]. To do this, randomly choose a peptide/vertex A. For each other peptide/vertex B, compare the weights of the two edges that connect it to A: if p(A > B) > p(B > A), B is considered a weaker binder than A; if p(B > A) > p(A > B), B is considered a stronger binder than A. Group all peptides according to whether they are stronger or weaker binders than A. Then, within each group, repeat the procedure of randomly choosing one peptide and evaluating all others with respect to it, continuing to subdivide the groups until an ordering from best to worst binder has been constructed. Determine the likelihood score of this ordering by summing the logarithms of the edge weights of all directed edges that agree with the ordering (Fig. 5b). Repeat the construction of an ordering several times and retain the one with the best score. Further refine this ordering by inserting each individual peptide at every possible position and keeping the new position if it yields a better score; run this routine several times, alternately starting with the best and the worst binding peptide. Finally, run a Monte Carlo search in which moves correspond to exchanging the positions of two peptides in the ordering. The final result is an affinity ranking of all peptides.
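Steps 3–5 translate directly into code. The Python sketch below is one possible implementation, not the authors' software. It assumes that reads have already been filtered and demultiplexed (steps 1 and 2) into one `collections.Counter` per gate, ordered from the lowest- to the highest-affinity gate. The function names are hypothetical, and the read-count threshold is a free parameter.

```python
from collections import Counter


def gate_distributions(reads_per_gate, z, min_total_reads=1):
    """Normalized distribution f_xj of every sequence j over gates x (step 4).

    reads_per_gate: one Counter (sequence -> read count n_xj) per gate,
        ordered from the lowest- to the highest-affinity gate.
    z: number of cells z_x that hit each gate during the timed measurement.
    min_total_reads: sequences with fewer reads summed over all gates are
        discarded (step 3).
    """
    totals = [sum(c.values()) for c in reads_per_gate]  # sum_i n_xi per gate
    dist = {}
    for seq in set().union(*reads_per_gate):
        counts = [c[seq] for c in reads_per_gate]
        if sum(counts) < min_total_reads:
            continue
        # z_x * n_xj / sum_i n_xi: estimated share of gate x's cells carrying j
        w = [zx * n / t if t else 0.0 for zx, n, t in zip(z, counts, totals)]
        s = sum(w)
        if s > 0:
            dist[seq] = [wx / s for wx in w]
    return dist


def p_stronger(f_a, f_b):
    """p(A > B) = sum_x f_xA * sum_{y<x} f_yB, gates ordered low -> high affinity."""
    p, cum_b = 0.0, 0.0
    for fxa, fxb in zip(f_a, f_b):
        p += fxa * cum_b  # cum_b holds the sum of f_yB over gates y < x
        cum_b += fxb
    return p


# Two gates (low, high affinity) with made-up read counts and equal cell flow
reads = [Counter({"AAA": 10, "BBB": 90}), Counter({"AAA": 90, "BBB": 10})]
f = gate_distributions(reads, z=[1000, 1000])
print(p_stronger(f["AAA"], f["BBB"]))  # ~0.81
print(p_stronger(f["BBB"], f["AAA"]))  # ~0.01
```

p(A > B) and p(B > A) need not sum to 1. The remainder, $\sum_x f_{xA} f_{xB}$, is the probability that both sequences fall into the same gate.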
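Step 6 can be sketched as follows. This Python version illustrates the procedure under stated simplifications and is not the authors' implementation:

- The quicksort-style construction follows the description above.
- The insertion refinement simply repeats until no single move improves the score, rather than alternating start points.
- The Monte Carlo stage uses a Metropolis acceptance rule with an arbitrary temperature.

All names and defaults are hypothetical. The input `P[a][b]` holds p(a > b) from step 5. Zero probabilities are floored at a small `eps` before the logarithm is taken, which is also an assumption.

```python
import math
import random


def score(order, P, eps=1e-12):
    """Log-likelihood: sum of log p(a > b) over all pairs with a ranked above b."""
    return sum(math.log(max(P[a][b], eps))
               for i, a in enumerate(order) for b in order[i + 1:])


def pivot_order(items, P, rng):
    """Quicksort-like construction: split around a random pivot, then recurse."""
    if len(items) <= 1:
        return list(items)
    pivot = rng.choice(items)
    better = [b for b in items if b != pivot and P[b][pivot] > P[pivot][b]]
    worse = [b for b in items if b != pivot and P[b][pivot] <= P[pivot][b]]
    return pivot_order(better, P, rng) + [pivot] + pivot_order(worse, P, rng)


def insertion_refine(order, P):
    """Move single peptides to their best position until nothing improves."""
    order, current = list(order), score(order, P)
    improved = True
    while improved:
        improved = False
        for item in list(order):
            rest = [x for x in order if x != item]
            cands = [rest[:k] + [item] + rest[k:] for k in range(len(rest) + 1)]
            best = max(cands, key=lambda c: score(c, P))
            s = score(best, P)
            if s > current + 1e-12:
                order, current, improved = best, s, True
    return order


def rank(P, n_starts=10, n_mc=2000, temperature=0.5, seed=0):
    """Peptide indices ordered from strongest to weakest binder."""
    n = len(P)
    if n < 2:
        return list(range(n))
    rng = random.Random(seed)
    starts = [pivot_order(list(range(n)), P, rng) for _ in range(n_starts)]
    best = insertion_refine(max(starts, key=lambda o: score(o, P)), P)
    best_s = score(best, P)
    cur, cur_s = list(best), best_s
    for _ in range(n_mc):
        i, j = rng.sample(range(n), 2)
        cand = list(cur)
        cand[i], cand[j] = cand[j], cand[i]  # swap move
        s = score(cand, P)
        # Metropolis rule: accept improvements, sometimes accept worse orderings
        if s >= cur_s or rng.random() < math.exp((s - cur_s) / temperature):
            cur, cur_s = cand, s
            if s > best_s:
                best, best_s = list(cand), s
    return best


# Toy example: true order 2 > 0 > 3 > 1 with consistent pairwise probabilities
true_order = [2, 0, 3, 1]
P = [[0.0] * 4 for _ in range(4)]
for i, a in enumerate(true_order):
    for b in true_order[i + 1:]:
        P[a][b], P[b][a] = 0.9, 0.1
print(rank(P))  # [2, 0, 3, 1]
```

`score` is recomputed from scratch for every candidate, so each move costs O(n²). For rankings of ~1000 sequences, updating only the pairs affected by a swap would be much faster.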

4 Notes

  1.

    We fine-tuned the protocols described in Subheading 3.4 using material from the specified suppliers. We have not tested corresponding products from other suppliers; these may also work for deep sequencing sample preparation, but experimenters may need to adjust the protocols to the specific products they use.

  2.

    This growth protocol has been optimized for EBY100 cells transformed with a pCTCON2 plasmid [17]. Other parameters may be needed for a different setup. In the authors’ experience, cell densities can affect the quality of FACS profiles, and low-quality FACS profiles can lead to suboptimal sorts with respect to affinity. Users of the procedure should therefore closely monitor cell densities. The first growth step in this protocol ensures that samples contain mostly live, healthy cells, so that OD measurements are correct. It may be possible to skip this step if cells are not grown up from frozen stocks or plates.

  3.

    The number and position of gates can be chosen based on a set of standards. Record the FACS profiles of several yeast-displayed standards in a same-day experiment at a target concentration chosen based on the anticipated affinities. Construct a set of gates to be tested for adequate resolution. Determine for each FACS profile how many cells would have hit each gate; this provides a distribution over the gates for each standard. Then simulate an experiment by drawing random samples of ten cells for each standard. (Clones should be sampled more often than this during an actual SORTCERY sort. However, real samples may experience additional experimental noise during preparation for deep sequencing, so we find that samples of ten cells in this procedure provide useful information.) Use the random sample for each standard $X$ and gate $i$ to calculate the normalized frequency $f_{iX}$ with which the standard would be observed in the gate. Calculate the probability that standard $X$ is a better binder than standard $Y$ from the random samples, using the formula given in Subheading 3.5, step 5, and compare the result to the actual affinities of the standards. Repeat this many times to determine the range of values the probability can take. Sufficient resolution, i.e., a sufficient number and appropriate placement of gates, is indicated by mostly high probabilities for the correct ordering of the standards.

  4.

    Record several FACS profiles for standards. Consider only data for expressing cells whose binding signals lie mostly above the baseline. Use a cutoff line with a slope of −1 to separate expressing from non-expressing cells; using other cutoffs may bias the analysis. Center the retained data by subtracting the average binding and expression signals from each data point, and calculate the covariance matrix of the data. Determine the first principal component by calculating the matrix’s eigenvectors and eigenvalues: the eigenvector with the largest eigenvalue indicates the orientation of the first principal component. Determine the slope of this vector. High-quality FACS profiles should give a value close to 1 (Fig. 4). Reduced quality can have many experimental origins, such as inappropriate growth protocols (see Notes 1 and 2), excess dissociation of the target molecule during washing steps (see Note 8), or nonspecific binding to tube walls (see Note 5).

  5.

    BSA is used as a blocking agent to prevent nonspecific binding to the cells and, more importantly, to the test tube walls. Adsorption to the tube walls may lead to significant depletion of target molecules and distortion of FACS profiles.

  6.

    The number of target molecules should exceed the number of surface-displayed peptides. For example, our yeast strain displays about 30,000 peptides per cell [24]. If 10⁶ cells are incubated in 700 μl of 1 nM target solution, at most ~10 % of the target molecules are bound. Adjust the incubation volume accordingly. Choose the target concentration so as to probe the desired range of affinities (see Note 3).

  7.

    We have used an HA tag for detection of expression and a Myc tag for detection of binding. Other tags may also work with our protocol and may be preferred by the experimenter; the required antibody concentrations may depend on the exact choice. Always test whether the antibodies provide high-quality FACS profiles (see Note 4).

  8.

    Apply the antibodies swiftly, because washing steps can disturb the equilibrium between free and bound target molecules. We have found that fully prepared samples are relatively stable, possibly because the antibodies cross-link the bound target molecules and thereby dramatically decrease dissociation.

  9.

    Because gate setting requires a significant amount of time, draw the gates before sample preparation. Adjust the PMT voltages so that the library’s FACS profile largely covers the preset gates; a set of standards can guide these adjustments.

  10.

    If the number of chosen gates exceeds the number of sample tubes that the cell sorter can sort into simultaneously, gates have to be sampled successively. This may waste a large number of labeled cells, because cells that hit unselected gates are discarded. The experimenter can instead adopt an alternative, convoluted sorting strategy that permits sorting into all gates simultaneously. In this approach, cells from different gates are sorted into the same sample tube. Successive sorts that combine different sets of gates are carried out, which enables back-calculation of the number of cells in each gate for each clone in the subsequent analysis (see Note 18). For N gates, prepare N unique combinations of gates. No gate may be paired with any other gate more than once in these combinations. Sort sets of mutually disjoint combinations successively. For example, if 12 gates are chosen and the sorter can only sort into four sample tubes at the same time, the following set of combinations would be appropriate: {1,2,3}, {4,5,6}, {7,8,9}, {10,11,12}, {1,4,7}, {2,5,10}, {3,8,11}, {6,9,12}, {1,5,8}, {2,4,12}, {3,9,10}, and {6,7,11}. Note that any pair of gate indices appears together at most once. This set of combinations can be processed in three successive sorts, each collecting four pools of cells (each pool derived from three gates and sorted into its own sample tube): first {1,2,3}, {4,5,6}, {7,8,9}, {10,11,12}; then {1,4,7}, {2,5,10}, {3,8,11}, {6,9,12}; and finally {1,5,8}, {2,4,12}, {3,9,10}, {6,7,11}.

  11.

    MmeI recognizes the sequence 5′ TCCRAC 3′. Additional nucleotides 5′ of the recognition site can improve binding (e.g., 5′ GGGACCACCACC 3′ in step 1, Subheading 3.4.2). MmeI cuts 20 nucleotides 3′ of its recognition sequence.

  12.

    Use a high-fidelity polymerase and as few PCR cycles as possible to reduce errors and amplification bias. With the standard Phusion polymerase protocol, 25 cycles generally suffice.

  13.

    The high salt content of the DNA extract may inhibit amplification. Using 5 μl of DNA extract in a 100 μl reaction mixture generally provides enough dilution to obtain satisfactory results.

  14.

    Excess MmeI may block digestion. MmeI activity is also curbed by high salt concentrations, and excess salt may enter the reaction mixture with the PCR product from the purification step. In addition, MmeI has a very low turnover, so stoichiometric amounts of enzyme are required for sufficient digestion. Experimenters need to take special care to use the exact amounts of PCR product and MmeI indicated in Subheading 3.4.2, step 3.

  15.

    Diverse barcodes at the beginning of a deep sequencing read are required for proper calibration of the base-calling algorithm. Barcodes need to be at least five nucleotides long, and deep sequencing runs should be multiplexed with at least 20 different barcodes. Choose barcode sequences so that all bases appear at each position with roughly equal frequency.

  16.

    Sequencing a library can be difficult for Illumina sequencers, because current base-calling algorithms expect significant sequence diversity at all positions of a sample, whereas library samples generally contain regions of constant sequence. Spiking PhiX genome into the sample may help alleviate problems, as may running a reference lane with PhiX genome on the same flow cell.

  17.

    MmeI sometimes cuts 19 or 21 bases 3′ of its recognition site. Furthermore, the TC 3′ of the barcode may be missing in some reads. A small fraction of undigested but ligated sample may also be observed.

  18.

    Analyze deep sequencing data from convoluted sorts (see Note 10) as follows. For each sequence $j$, calculate its frequency in each pool $x$ as

    $$ g_{xj} = \frac{n_{xj}}{\sum_i n_{xi}} $$

    with $n_{xj}$ being the number of reads of sequence $j$ in pool $x$. Then calculate the corrected number of cells in pool $x$ that contained sequence $j$ as

    $$ m_{xj} = g_{xj} \sum_{y \in x} z_y $$

    where $z_y$ is the number of cells that hit gate $y$ when measuring the distribution of cells across all gates, and the index $y$ runs over all gates that are part of pool $x$. Solve a linear equation system of the form

    $$ \vec{M}_j = D_j \vec{Q}_j $$

    for the elements of the vector $\vec{Q}_j$. The $x$th entry of the vector $\vec{M}_j$ is $m_{xj}$. The entry $d_{xyj}$ in the $x$th row and $y$th column of the matrix $D_j$ is 1 if gate $y$ is part of pool $x$ and 0 otherwise. The entry $q_{yj}$ of the vector $\vec{Q}_j$ is the time-corrected number of cells in gate $y$. Normalize the vector $\vec{Q}_j$ to obtain the frequencies required for Subheading 3.5, step 5.
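The resampling test in Note 3 can be sketched in Python as below. This is a hypothetical illustration. The function names, the number of trials, and the example gate distributions are assumptions. In practice, each standard's distribution would be read off its FACS profile for the candidate gate layout.

```python
import random


def p_stronger(f_x, f_y):
    """p(X > Y) = sum_i f_iX * sum_{k<i} f_kY; gates ordered low -> high affinity."""
    p, cum_y = 0.0, 0.0
    for fx, fy in zip(f_x, f_y):
        p += fx * cum_y
        cum_y += fy
    return p


def simulate_resolution(dist_x, dist_y, n_cells=10, n_trials=1000, seed=0):
    """Simulated p(X > Y) values from random n_cells samples of each standard.

    dist_x, dist_y: fraction of each standard's cells that would hit each gate,
    taken from its FACS profile (gates ordered low -> high affinity).
    """
    rng = random.Random(seed)
    gates = range(len(dist_x))
    results = []
    for _ in range(n_trials):
        # Draw n_cells gate assignments per standard, then normalize to f_iX
        sx = rng.choices(gates, weights=dist_x, k=n_cells)
        sy = rng.choices(gates, weights=dist_y, k=n_cells)
        f_x = [sx.count(g) / n_cells for g in gates]
        f_y = [sy.count(g) / n_cells for g in gates]
        results.append(p_stronger(f_x, f_y))
    return results


# Made-up gate distributions for a stronger standard X and a weaker standard Y
dist_x = [0.0, 0.1, 0.3, 0.4, 0.2]
dist_y = [0.1, 0.3, 0.4, 0.2, 0.0]
probs = sorted(simulate_resolution(dist_x, dist_y))
print("median p(X > Y):", probs[len(probs) // 2])
print("fraction of trials with p(X > Y) > 0.5:", sum(p > 0.5 for p in probs) / len(probs))
```

Running the same comparison for several candidate gate layouts shows which layout most consistently yields high probabilities for the correct ordering.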
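For a two-dimensional point cloud, the principal-component check in Note 4 has a closed form: the leading eigenvector of the 2 × 2 covariance matrix. The minimal Python sketch below assumes base-10 logarithms of the raw signals. It models the slope −1 cutoff by a single intercept `min_log_sum`, and that name, like the function name, is hypothetical.

```python
import math


def first_pc_slope(cells, min_log_sum):
    """Slope of the first principal component of a FACS profile (log/log).

    cells: iterable of (expression, binding) signal pairs, both > 0.
    min_log_sum: intercept of the slope -1 cutoff; cells with
        log10(Fe) + log10(Fb) <= min_log_sum count as non-expressing.
    """
    pts = [(math.log10(fe), math.log10(fb)) for fe, fb in cells]
    pts = [(x, y) for x, y in pts if x + y > min_log_sum]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two expressing cells")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    # Sample covariance matrix [[a, b], [b, c]] of the mean-centered data
    a = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    c = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    if b == 0:
        # Uncorrelated data: the principal axis is horizontal or vertical
        return 0.0 if a >= c else math.inf
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # (b, lam - a) is an eigenvector for lam, so its slope is (lam - a) / b
    return (lam - a) / b


# Made-up profile: four cells on log10(Fb) = log10(Fe) + 0.5, one non-expressing cell
profile = [(10.0, 31.6), (100.0, 316.0), (1000.0, 3160.0), (10000.0, 31600.0), (1.0, 1.0)]
print(round(first_pc_slope(profile, min_log_sum=1.0), 3))
```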
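The example in Note 6 can be checked by comparing the number of displayed peptides with the number of target molecules. The calculation uses Avogadro's number, $N_A \approx 6.022 \times 10^{23}\ \mathrm{mol}^{-1}$:

$$ N_{\mathrm{pep}} = 10^{6}\ \text{cells} \times 3 \times 10^{4}\ \text{peptides/cell} = 3 \times 10^{10} $$

$$ N_{\mathrm{tgt}} = 7 \times 10^{-4}\ \mathrm{l} \times 10^{-9}\ \mathrm{mol/l} \times 6.022 \times 10^{23}\ \mathrm{mol}^{-1} \approx 4.2 \times 10^{11} $$

$$ \frac{N_{\mathrm{pep}}}{N_{\mathrm{tgt}}} \approx \frac{3 \times 10^{10}}{4.2 \times 10^{11}} \approx 0.07 $$

Even if every displayed peptide were occupied, only about 7 % of the target molecules would be bound, which is within the ~10 % quoted in Note 6. Changing the incubation volume changes $N_{\mathrm{tgt}}$ proportionally.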
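Notes 10 and 18 can be sketched with NumPy. The code below is illustrative only: all names are hypothetical, and `numpy.linalg.lstsq` stands in for whatever solver the authors used. It does three things for a 12-gate design of the kind described in Note 10:

- checks the pairing rule;
- builds the matrix $D_j$;
- solves $\vec{M}_j = D_j \vec{Q}_j$ in the least-squares sense.

The check exposes one caveat. When each of the three successive sorts covers every gate exactly once, the rows belonging to each sort add up to the same all-ones vector, so $D_j$ has rank at most 10 rather than 12. `lstsq` then returns the minimum-norm solution consistent with the data, and further constraints (for example, non-negativity) may be needed to pin down the per-gate counts.

```python
from itertools import combinations

import numpy as np

# One 12-gate design (three sorts of four pools); no gate pair shares a pool twice
POOLS = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12},
         {1, 4, 7}, {2, 5, 10}, {3, 8, 11}, {6, 9, 12},
         {1, 5, 8}, {2, 4, 12}, {3, 9, 10}, {6, 7, 11}]


def pairs_unique(pools):
    """True if no two gates occur together in more than one pool."""
    seen = set()
    for pool in pools:
        for pair in combinations(sorted(pool), 2):
            if pair in seen:
                return False
            seen.add(pair)
    return True


def design_matrix(pools, n_gates):
    """D[x, y] = 1 if gate y + 1 belongs to pool x, else 0 (gates are 1-based)."""
    D = np.zeros((len(pools), n_gates))
    for x, pool in enumerate(pools):
        for gate in pool:
            D[x, gate - 1] = 1.0
    return D


def pool_cells(reads_j, reads_total, z, pools):
    """m_xj = (n_xj / sum_i n_xi) * (sum of z_y over the gates y in pool x)."""
    return np.array([n / t * sum(z[g - 1] for g in pool)
                     for n, t, pool in zip(reads_j, reads_total, pools)])


def deconvolve(m_j, D):
    """Least-squares Q_j for M_j = D Q_j (minimum-norm when D is rank-deficient)."""
    q, *_ = np.linalg.lstsq(D, m_j, rcond=None)
    return q


D = design_matrix(POOLS, 12)
print("pairing rule satisfied:", pairs_unique(POOLS))
print("rank of D:", np.linalg.matrix_rank(D), "of", D.shape[1])
```

Dividing the solution vector by its sum gives per-gate frequencies in place of $f_{xj}$. Entries of an unconstrained solution can come out negative, which is one reason to consider a constrained solver.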