Abstract
Protein interactions, and the networks they form, play a key role in many cellular processes, and perturbation of protein interaction interfaces may lead to the development of many diseases. In this chapter, we briefly introduce background knowledge on protein–protein interactions, followed by a detailed explanation of various analyses, from basic to advanced, together with related tools and databases. VisANT (http://visant.bu.edu), a free Web-based software platform for the integrative visualization, mining, analysis, and modeling of biological networks, is used as the main tool for all examples in this chapter.
Notes
1. When you are uncertain about the format of an edge list, you can always export a network as an edge list with the menu File → Export as Tab-Delimited File → All and follow the exported file as an example.
References
Phizicky EM, Fields S (1995) Protein-protein interactions: methods for detection and analysis. Microbiol Rev 59(1):94–123
Berggard T, Linse S, James P (2007) Methods for the detection and analysis of protein-protein interactions. Proteomics 7(16):2833–2842
Sobott F, Robinson CV (2002) Protein complexes gain momentum. Curr Opin Struct Biol 12(6):729–734
McCammon MG et al (2002) Screening transthyretin amyloid fibril inhibitors: characterization of novel multiprotein, multiligand complexes by mass spectrometry. Structure 10(6):851–863
von Mering C et al (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417(6887):399–403
Lu L, Arakaki AK, Lu H, Skolnick J (2003) Multimeric threading-based prediction of protein-protein interactions on a genomic scale: application to the Saccharomyces cerevisiae proteome. Genome Res 13(6A):1146–1154
Aloy P, Russell RB (2004) Ten thousand interactions for the molecular biologist. Nat Biotechnol 22(10):1317–1321
Ito T et al (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 98(8):4569–4574
Uetz P et al (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403(6770):623–627
Hart GT, Ramani AK, Marcotte EM (2006) How complete are current yeast and human protein-interaction networks? Genome Biol 7(11):120
Hu Z, Snitkin ES, DeLisi C (2008) VisANT: an integrative framework for networks in systems biology. Brief Bioinform 9(4):317–325
Hu Z et al (2007) VisANT 3.0: new modules for pathway visualization, editing, prediction and construction. Nucleic Acids Res 35(Web Server issue):W625–W632
Hu Z et al (2005) VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res 33(Web Server issue):W352–W357
Hu Z, Mellor J, Wu J, DeLisi C (2004) VisANT: an online visualization and analysis tool for biological interaction data. BMC Bioinformatics 5:17
Hu Z, Mellor J, DeLisi C (2004) Analyzing networks with VisANT. In: Baxevanis A, Davison D, Page R, Petsko G, Stein L, Stormo G (eds) Current protocols in bioinformatics. Wiley, Hoboken
Hu Z et al (2009) VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Res 37(Web Server issue):W115–W121
Hermjakob H et al (2004) The HUPO PSI’s molecular interaction format—a community standard for the representation of protein interaction data. Nat Biotechnol 22:177–183
Linghu B, Snitkin ES, Hu Z, Xia Y, Delisi C (2009) Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network. Genome Biol 10(9):R91
Linghu B et al (2008) High-precision high-coverage functional inference from integrated data sources. BMC Bioinformatics 9:119
Bader GD, Cary MP, Sander C (2006) Pathguide: a pathway resource list. Nucleic Acids Res 34(Database issue):D504–D506
Breitkreutz BJ et al (2008) The BioGRID Interaction Database: 2008 update. Nucleic Acids Res 36(Database issue):D637–D640
Aranda B et al (2010) The IntAct molecular interaction database in 2010. Nucleic Acids Res 38(Database issue):D525–D531
Zanzoni A et al (2002) MINT: a Molecular INTeraction database. FEBS Lett 513(1):135–140
Mewes HW et al (2008) MIPS: analysis and annotation of genome information in 2007. Nucleic Acids Res 36(Database issue):D196–D201
Cherry JM et al (1998) SGD: Saccharomyces Genome Database. Nucleic Acids Res 26(1):73–79
Wilson RJ, Goodman JL, Strelets VB (2008) FlyBase: integration and improvements to query tools. Nucleic Acids Res 36(Database issue):D588–D593
Keshava Prasad TS et al (2009) Human Protein Reference Database—2009 update. Nucleic Acids Res 37(Database issue):D767–D772
Mellor JC, Yanai I, Clodfelter KH, Mintseris J, DeLisi C (2002) Predictome: a database of putative functional links between proteins. Nucleic Acids Res 30(1):306–309
von Mering C et al (2007) STRING 7—recent developments in the integration and prediction of protein interactions. Nucleic Acids Res 35(Database issue):D358–D362
UniProt Consortium (2008) The universal protein resource (UniProt). Nucleic Acids Res 36(Database issue):D190–D195
Bruford EA et al (2008) The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res 36(Database issue):D445–D448
Schuster-Bockler B, Bateman A (2008) Protein interactions in human genetic diseases. Genome Biol 9(1):R9
Yeger-Lotem E et al (2004) Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci USA 101(16):5934–5939
Zhang LV et al (2005) Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol 4(2):6
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2008) GenBank. Nucleic Acids Res 36(Database issue):D25–D30
Rogers A et al (2008) WormBase 2007. Nucleic Acids Res 36(Database issue):D612–D617
Goh KI et al (2007) The human disease network. Proc Natl Acad Sci USA 104(21):8685–8690
Tong AH et al (2004) Global mapping of the yeast genetic interaction network. Science 303(5659):808–813
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL (2000) The large-scale organization of metabolic networks. Nature 407(6804):651–654
Albert R, Jeong H, Barabasi AL (2000) Error and attack tolerance of complex networks. Nature 406(6794):378–382
Gavin AC et al (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415(6868):141–147
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297(5586):1551–1555
Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA (2004) Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14(3):283–291
del Sol A, O’Meara P (2005) Small-world network approach to identify key residues in protein-protein interaction. Proteins 58(3):672–682
King OD (2004) Comment on “Subgraphs in random networks”. Phys Rev E Stat Nonlin Soft Matter Phys 70(5 Pt 2):058101, author reply 058102
Itzkovitz S, Milo R, Kashtan N, Ziv G, Alon U (2003) Subgraphs in random networks. Phys Rev E Stat Nonlin Soft Matter Phys 68(2 Pt 2):026127
Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31(1):64–68
Ashburner M et al (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29
da Huang W et al (2007) The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol 8(9):R183
Hu Z et al (2007) Towards zoomable multidimensional maps of the cell. Nat Biotechnol 25(5):547–554
Lee TI et al (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298(5594):799–804
Milo R et al (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827
Endy D, Brent R (2001) Modelling cellular behaviour. Nature 409(6818):391–395
Stamm S (2002) Signals and their transduction pathways regulating alternative splicing: a new dimension of the human genome. Hum Mol Genet 11(20):2409–2416
Boulos MN (2003) The use of interactive graphical maps for browsing medical/health Internet information resources. Int J Health Geogr 2(1):1
Green ML, Karp PD (2006) The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Res 34(13):3687–3697
Fraser AG, Marcotte EM (2004) A probabilistic view of gene function. Nat Genet 36(6):559–564
Guimera R, Nunes Amaral LA (2005) Functional cartography of complex metabolic networks. Nature 433(7028):895–900
Ihmels J et al (2002) Revealing modular organization in the yeast transcriptional network. Nat Genet 31(4):370–377
Bar-Joseph Z et al (2003) Computational discovery of gene modules and regulatory networks. Nat Biotechnol 21(11):1337–1342
Wu J, Hu Z, DeLisi C (2006) Gene annotation and network inference by phylogenetic profiling. BMC Bioinformatics 7:80
Oltvai ZN, Barabasi AL (2002) Systems biology. Life’s complexity pyramid. Science 298(5594):763–764
Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the gene ontology annotations. Nat Rev Genet 9(7):509–515
Reimand J, Tooming L, Peterson H, Adler P, Vilo J (2008) GraphWeb: mining heterogeneous biological networks for gene modules with functional significance. Nucleic Acids Res 36(Web Server issue):W452–W459
Zhang M et al (2008) Interactive analysis of systems biology molecular expression data. BMC Syst Biol 2:23
Brohee S et al (2008) NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Res 36(Web Server issue):W444–W451
Alibes A, Canada A, Diaz-Uriarte R (2008) PaLS: filtering common literature, biological terms and pathway information. Nucleic Acids Res 36(Web Server issue):W364–W367
Antonov AV, Schmidt T, Wang Y, Mewes HW (2008) ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data. Nucleic Acids Res 36(Web Server issue):W347–W351
Lee T, Desai VG, Velasco C, Reis RJ, Delongchamp RR (2008) Testing for treatment effects on gene ontology. BMC Bioinformatics 9(Suppl 9):S20
Salomonis N et al (2007) GenMAPP 2: new features and resources for pathway analysis. BMC Bioinformatics 8:217
Zhu J et al (2007) GO-2D: identifying 2-dimensional cellular-localized functional modules in gene ontology. BMC Genomics 8:30
Antonov AV, Tetko IV, Mewes HW (2006) A systematic approach to infer biological relevance and biases of gene network structures. Nucleic Acids Res 34(1):e6
Draghici S et al (2003) Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res 31(13):3775–3781
Khatri P, Bhavsar P, Bawa G, Draghici S (2004) Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments. Nucleic Acids Res 32(Web Server issue):W449–W456
Khatri P et al (2007) Onto-Tools: new additions and improvements in 2006. Nucleic Acids Res 35(Web Server issue):W206–W211
Khatri P, Draghici S (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21(18):3587–3595
Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF (2007) A new method to measure the semantic similarity of GO terms. Bioinformatics 23(10):1274–1281
Maglott D, Ostell J, Pruitt KD, Tatusova T (2007) Entrez gene: gene-centered information at NCBI. Nucleic Acids Res 35(Database issue):D26–D31
Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I (2001) Controlling the false discovery rate in behavior genetics research. Behav Brain Res 125(1–2):279–284
Barry WT, Nobel AB, Wright FA (2005) Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21(9):1943–1949
Mootha VK et al (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34(3):267–273
Subramanian A et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102(43):15545–15550
Volinia S et al (2004) GOAL: automated gene ontology analysis of expression profiles. Nucleic Acids Res 32(Web Server issue):W492–W499
Zhou X, Kao MC, Wong WH (2002) Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci USA 99(20):12783–12788
Appendix
1.1 Mathematical Definition of Metagraph
A metagraph \( G_{\rm m} = \{V, E\} \) consists of a finite set V of nodes and a finite set E of edges. The nodes of a metagraph can be written as \( V = \{V_{\rm s}, V_{\rm m}\} \), where \( V_{\rm s} \) represents simple nodes, as generally defined for a simple graph, and \( V_{\rm m} \) represents metanodes. Throughout, the subscript s denotes a simple node or edge and the subscript m denotes a metanode or metaedge. Each metanode \( v_{\rm m} \in V_{\rm m} \) contains a subgraph consisting of child nodes and the edges connecting them. In addition, each node \( v \in V \) represents the set of its instance nodes, i.e., \( v = \{ v_i \mid i > 0 \} \), where \( v_i \) is an instance node of \( v \). All instances of a node share exactly the same identity but can have instance-specific properties. The statement that two metanodes share a node means that each metanode contains an instance of that node.
A metanode \( v_{\rm m} \) has two states, expanded or contracted. The expanded state displays the internal subgraph (that is, it places all child nodes and their connections into the graph), while the contracted state replaces this subgraph with a single node. Different combinations of metanode states for a given metagraph yield multiple views, each an abstract representation of the same underlying data. The change of views for a given metagraph is defined as the dynamics of the metagraph, as shown in Fig. 1D, E.
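A minimal Python sketch may help make the expanded and contracted states concrete. It assumes a hypothetical layout (metanodes M1 = {C, D}, M2 = {E, F}, and M3 = {E, G}, plus a top-level simple node A); the class and function names are illustrative and are not part of VisANT. Each visible node is reported as a (container, name) pair, so a node shared by two expanded metanodes appears as two instances.

```python
from dataclasses import dataclass


@dataclass
class Metanode:
    """Hypothetical metanode holding the names of its children."""
    name: str
    children: frozenset
    expanded: bool = False  # every metanode starts in the contracted state


def visible_instances(top_level, metanodes):
    """Return the node instances shown in the current view.

    An expanded metanode is replaced by its children (recursively); a contracted
    metanode or a simple node is shown as a single node. Each instance is a
    (container, name) pair, where container is the nearest enclosing expanded
    metanode (None at the top level). Assumes metanode nesting is acyclic.
    """
    shown = set()
    stack = [(None, name) for name in top_level]
    while stack:
        container, name = stack.pop()
        meta = metanodes.get(name)
        if meta is not None and meta.expanded:
            stack.extend((name, child) for child in meta.children)
        else:
            shown.add((container, name))
    return shown


# Hypothetical metagraph used only for illustration.
metanodes = {
    "M1": Metanode("M1", frozenset({"C", "D"})),
    "M2": Metanode("M2", frozenset({"E", "F"})),
    "M3": Metanode("M3", frozenset({"E", "G"})),
}
top_level = ["A", "M1", "M2", "M3"]

contracted_view = visible_instances(top_level, metanodes)

metanodes["M2"].expanded = True
metanodes["M3"].expanded = True
expanded_view = visible_instances(top_level, metanodes)  # E appears once per expanded parent
```

With k independent metanodes there are 2^k combinations of states, and hence up to 2^k views of the same data; nesting reduces the number of distinct views, because a child's state is irrelevant while its parent is contracted.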
Edges in a metagraph can be written as \( E = \{E_{\rm s}, E_{\rm m}\} \), where \( E_{\rm s} \) represents simple edges, as generally defined for a simple graph, and \( E_{\rm m} \) represents metaedges. Each metaedge \( e_{\rm m} = e_{v_{\rm m}, v} \in E_{\rm m} \) is associated with at least one contracted metanode \( v_{\rm m} \) and is transient: it appears when the metanode is contracted and disappears when one or both of the connected metanodes are expanded. In other words, a metaedge is derived from the properties of the two nodes it connects. The most common derivation is connection transfer. For example, when metanodes M1 and M2 are contracted in Fig. 1E, the connection between C and E is transferred to M1 and M2. A metaedge can, however, also be derived from other properties of the metanodes: the metaedge between M2 and M3 in Fig. 1E exists because the two metanodes share the node E. In general, the derivation of a metaedge can be written as \( e_{v_{\rm m}, v} = g(v_{\rm m}, v) \), where g is an aggregation function and \( v \in V \) can be either a metanode or a simple node.
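Using a hypothetical layout (M1 = {C, D}, M2 = {E, F}, M3 = {E, G}, with simple edges C–E and A–D), the Python sketch below implements one possible aggregation function g that combines the two derivations described above: connection transfer and shared membership. The function name and data layout are assumptions for illustration, not VisANT's internal API, and nested metanodes are deliberately left out.

```python
from itertools import combinations


def derive_metaedges(simple_edges, members, contracted):
    """Derive metaedges for the current view (an illustrative aggregation function g).

    simple_edges: iterable of (u, v) pairs between simple nodes.
    members: dict mapping each metanode to the set of simple nodes it contains.
    contracted: set of metanodes that are currently contracted.
    Simplifications: metanodes are not nested, and a node held by a contracted
    metanode is represented only by the contracted metanode(s) holding it.
    """
    holders = {}
    for meta in contracted:
        for child in members[meta]:
            holders.setdefault(child, set()).add(meta)

    def endpoints(node):
        # A hidden node is represented by every contracted metanode holding it.
        return holders.get(node, {node})

    metaedges = set()
    # Rule 1: connection transfer from simple edges to contracted metanodes.
    for u, v in simple_edges:
        for a in endpoints(u):
            for b in endpoints(v):
                # Skip edges internal to one metanode and plain simple edges.
                if a != b and (a in contracted or b in contracted):
                    metaedges.add(frozenset((a, b)))
    # Rule 2: two contracted metanodes that share a node are linked.
    for m1, m2 in combinations(sorted(contracted), 2):
        if members[m1] & members[m2]:
            metaedges.add(frozenset((m1, m2)))
    return metaedges


members = {"M1": {"C", "D"}, "M2": {"E", "F"}, "M3": {"E", "G"}}
edges = [("C", "E"), ("A", "D")]

view_12 = derive_metaedges(edges, members, {"M1", "M2"})  # C-E transferred to M1-M2
view_23 = derive_metaedges(edges, members, {"M2", "M3"})  # M2-M3 via the shared node E
```

Because metaedges are recomputed from the current set of contracted metanodes, they are transient by construction: expanding a metanode and calling the function again simply drops the metaedges that depended on it.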
1.2 Download and Run VisANT as a Local Application
VisANT has four running modes in total, two of which require a local copy of VisANT. Please visit http://visant.bu.edu and click the link “Run VisANT” for detailed instructions on the other modes. Running VisANT as a local application is recommended when handling large-scale networks, such as networks with more than 100,000 nodes and edges, because you can then specify the amount of memory VisANT may use. In addition, a local application allows VisANT to access local resources directly (e.g., to load and save network files); it also allows users to develop VisANT plugins and to run a list of batch commands in the background without any user interface (batch mode).
The only drawback of running VisANT as a local application is that it easily becomes out of date, because VisANT is under active development. Fortunately, VisANT checks for updates automatically and shows an icon near the Help menu when an update is available. Users can click either the icon or the corresponding menu item to upgrade VisANT to the latest version. The steps for downloading and running VisANT locally are as follows:
1. If it is not already installed, download and install the Java 2 Platform, Standard Edition, version 1.4 or higher (http://java.sun.com/javase/downloads/index.jsp).
2. Go to http://visant.bu.edu, click the link “Download,” and then click the link “Latest Version of VisANT.”
3. Select a directory in which to save the file “VisAnt.jar.” VisAnt.jar is only about 400 KB, so the download should take less than 1 min. No installation is needed to run VisANT.
4. To launch VisANT, double-click VisAnt.jar.
5. Alternatively, open a DOS (Command Prompt) window in Windows, or a shell window in other operating systems, change to the directory containing VisAnt.jar, and run the command:
   java -Xmx512M -classpath VisAnt.jar cagt.bu.visant.VisAntApplet
   where 512M is the maximum amount of memory that VisANT can use. Increase this number if you have a large network or encounter an out-of-memory error.
6. The VisANT main window will appear (Fig. 4).
7. To exit VisANT, close the VisANT main window, use the File → Exit menu option, or press ALT + X.
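If VisANT needs to be launched repeatedly with different memory limits (for example, from a script), the command in step 5 can be wrapped as shown below. This is a minimal Python sketch: the helper names are hypothetical, and the code only reproduces the documented command line, with the -Xmx value made configurable. It assumes that `java` is on the PATH.

```python
import subprocess

VISANT_MAIN_CLASS = "cagt.bu.visant.VisAntApplet"  # main class given in step 5


def visant_command(jar_path="VisAnt.jar", max_memory_mb=512):
    """Build the documented launch command with a configurable heap limit (in MB)."""
    if max_memory_mb <= 0:
        raise ValueError("max_memory_mb must be a positive number of megabytes")
    return ["java", f"-Xmx{max_memory_mb}M", "-classpath", jar_path, VISANT_MAIN_CLASS]


def launch_visant(jar_path="VisAnt.jar", max_memory_mb=512):
    """Start VisANT and wait until its main window is closed."""
    # check=True raises CalledProcessError if java exits with a nonzero status.
    return subprocess.run(visant_command(jar_path, max_memory_mb), check=True)
```

For a large network, `launch_visant(max_memory_mb=2048)` would start VisANT with a 2048 MB heap limit; passing the arguments as a list avoids shell-quoting problems with paths that contain spaces.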
1.3 GO Term Enrichment Analysis
The following four steps describe how GO term enrichment analysis (GOTEA) works in VisANT. For illustration, the steps consider only one metanode, G, and calculate the enrichment score for only one target GO term, T.
Step 1: Fully annotate all of the nodes in G with gene names and GO terms.
Step 2: Calculate a density score for each node based on the topology and on the GO term similarity to T. A vector \( D_G \) of density scores for the genes in G is computed, with the element of \( D_G \) for the ith gene denoted \( D_i \). The density score evaluates the impact of the other genes in G on the ith gene, according to both their GO term similarity and their topological distance to the ith gene. \( D_i \) is defined as:
where the step function,
ensures that \( D_i \ge 0 \). \( M_j \) is a measure of GO term similarity calculated from the graph structure of the GO term hierarchy [85]. A significance threshold, α, controls the contribution that gene j makes to \( D_i \): the larger α is, the more of the less significant genes (those with \( M_j < \alpha \)) are filtered out, and filtered genes do not contribute to \( D_i \). The shortest distance between genes i and j in the topology of G is denoted \( d_{ij} \) and is calculated with the Floyd–Warshall algorithm. We assume that shorter distances make an exponentially greater contribution to the density than longer distances, with the steepness of the exponential determined by the parameter β; when a larger β is chosen, more distant genes can contribute to the density. Taken together, the parameters α and β control the sensitivity and selectivity of the density.
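The density computation in step 2 can be sketched in Python as follows. Because the original expression for \( D_i \) is not shown here, the sketch assumes the hypothetical form \( D_i = \sum_{j \ne i} \Theta(M_j - \alpha)\, M_j\, e^{-d_{ij}/\beta} \), which matches the behavior described above: genes with \( M_j < \alpha \) are filtered out, contributions decay exponentially with distance, and a larger β lets more distant genes contribute. Shortest distances come from the Floyd–Warshall algorithm on an unweighted graph; the function names and toy similarity values are illustrative.

```python
import math

INF = math.inf


def floyd_warshall(nodes, edges):
    """All-pairs shortest path lengths for an undirected, unweighted graph."""
    index = {node: k for k, node in enumerate(nodes)}
    n = len(nodes)
    dist = [[0 if a == b else INF for b in range(n)] for a in range(n)]
    for u, v in edges:
        if u != v:
            dist[index[u]][index[v]] = dist[index[v]][index[u]] = 1
    for k in range(n):  # allow paths through intermediate node k
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return index, dist


def density_scores(nodes, edges, similarity, alpha, beta):
    """Hypothetical D_i = sum_{j != i} step(M_j - alpha) * M_j * exp(-d_ij / beta)."""
    index, dist = floyd_warshall(nodes, edges)
    scores = {}
    for i in nodes:
        total = 0.0
        for j in nodes:
            d_ij = dist[index[i]][index[j]]
            if j == i or d_ij == INF:
                continue  # skip the gene itself and genes it cannot reach
            if similarity[j] >= alpha:  # step function filters genes with M_j < alpha
                total += similarity[j] * math.exp(-d_ij / beta)
        scores[i] = total
    return scores


# Toy module G: a path A - B - C with made-up similarities M_j to the target term T.
scores = density_scores(["A", "B", "C"], [("A", "B"), ("B", "C")],
                        {"A": 0.9, "B": 0.2, "C": 0.8}, alpha=0.5, beta=1.0)
```

Floyd–Warshall runs in O(n³) time, which is acceptable for module-sized subgraphs; for large sparse networks with unweighted edges, a breadth-first search from each node gives the same distances faster.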
Step 3: Another vector of density scores, \( D_{NG} \), is computed for a randomly chosen subset of genes representing the background distribution. The background consists of all genes annotated by NCBI.
Step 4: The statistical significance for rejecting the null hypothesis is determined by a permutation test. For statistical robustness, step 3 is repeated n times. After n iterations, the number of times the average density score of the randomly chosen genes exceeds the average density score of the genes in G is counted and used to compute the final p-value (Fig. 23).
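Steps 3 and 4 amount to a Monte Carlo permutation test. The Python sketch below assumes a hypothetical callable `mean_density(genes)` that returns the average density score of a gene set, and, following the text, reports the fraction of the n random draws whose mean density exceeds the observed one. Some implementations use (count + 1)/(n + 1) instead so that the estimate is never exactly zero; that variant is not used here.

```python
import random


def permutation_p_value(module_genes, background, mean_density, n=1000, seed=None):
    """Fraction of n random gene sets whose mean density exceeds that of the module.

    module_genes: the genes in metanode G.
    background: all candidate genes (e.g., every NCBI-annotated gene).
    mean_density: hypothetical callable mapping a list of genes to its average density.
    """
    rng = random.Random(seed)
    pool = list(background)
    observed = mean_density(list(module_genes))
    size = len(module_genes)
    exceed = 0
    for _ in range(n):
        sample = rng.sample(pool, size)  # random gene set of the same size as G
        if mean_density(sample) > observed:
            exceed += 1
    return exceed / n


# Toy check with a made-up score: each gene's "density" is simply its integer id.
def toy_mean(genes):
    return sum(genes) / len(genes)


# No 5-gene sample can beat the five largest ids, so p_top is 0.0.
p_top = permutation_p_value(range(95, 100), range(100), toy_mean, n=200, seed=1)
p_bottom = permutation_p_value(range(5), range(100), toy_mean, n=200, seed=1)
```

Sampling the random sets at the same size as G keeps the comparison fair, since the variance of a mean density depends on how many genes it averages.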
These four steps can be applied to multiple tests by using multiple metanodes and multiple target GO terms. In this case, the p-values are corrected using the false discovery rate (FDR) method [79]. Specifically, \( \text{FDR} = p \times m / k \), where m is the total number of GO terms tested and k is the rank (by ascending p-value) of the GO term under consideration. GOTEA also has an option to identify representative GO terms among all its discoveries, based on approaches that identify the most informative GO term [84].
1.4 Network Module Enrichment Analysis
Network module enrichment analysis (NMEA) is implemented in a manner similar to GOTEA. Where GOTEA uses GO term similarities, NMEA uses p-values from t-tests on the expression values of two phenotypes.
Step 1: Fetch the expression profile of each gene in a given module (i.e., a metanode, denoted M below) from formatted user input. The input should include an adequate number of samples for each of two comparable phenotypes (e.g., normal and disease).
Step 2: A vector \( D_M \) of density scores for the genes in M is computed, with the element of \( D_M \) for the ith gene denoted \( D_i \). \( D_i \) is defined as:
where the step function,
ensures that \( D_i \ge 0 \). \( M_j \) is the p-value from a two-tailed t-test of differential expression between the two phenotypes (for example, normal and disease). The parameters α and β control the sensitivity and selectivity of the density, as described in the previous section.
The density score evaluates the impact of the other genes in M on the ith gene, according to both their t-test p-values (an indicator of differential expression) and their topological distances to the ith gene.
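Steps 1 and 2 of NMEA can be sketched in Python as follows. Two assumptions are made here, and neither should be read as VisANT's implementation. First, for brevity, the two-tailed p-value of Welch's t statistic is approximated with the standard normal tail via `math.erfc`, which is only reasonable with many samples per phenotype; a real analysis would use the t distribution (e.g., `scipy.stats.ttest_ind` with `equal_var=False`). Second, because smaller p-values indicate stronger differential expression, the hypothetical density \( D_i = \sum_{j \ne i} \Theta(\alpha - M_j)\,(1 - M_j)\, e^{-d_{ij}/\beta} \) admits only genes with \( M_j \le \alpha \). Distances are shortest-path lengths found by breadth-first search, which gives the same result as Floyd–Warshall on an unweighted graph. The expression values are made up.

```python
import math
from collections import deque
from statistics import mean, variance


def two_tailed_p(group_a, group_b):
    """Welch t statistic with a normal-tail approximation of the two-tailed p-value."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        return 1.0 if mean(group_a) == mean(group_b) else 0.0
    t = (mean(group_a) - mean(group_b)) / se
    return math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))


def gene_p_values(expression, labels, phenotype_a, phenotype_b):
    """expression: gene -> list of values, one per sample; labels: phenotype per sample."""
    p = {}
    for gene, values in expression.items():
        a = [x for x, lab in zip(values, labels) if lab == phenotype_a]
        b = [x for x, lab in zip(values, labels) if lab == phenotype_b]
        p[gene] = two_tailed_p(a, b)
    return p


def bfs_distances(source, adjacency):
    """Shortest-path lengths from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def module_density(edges, p_values, alpha, beta):
    """Hypothetical D_i = sum_{j != i} step(alpha - p_j) * (1 - p_j) * exp(-d_ij / beta)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    scores = {}
    for i in p_values:
        dist = bfs_distances(i, adjacency)
        scores[i] = sum((1 - p_values[j]) * math.exp(-d / beta)
                        for j, d in dist.items()
                        if j != i and j in p_values and p_values[j] <= alpha)
    return scores


labels = ["normal"] * 4 + ["disease"] * 4
expression = {  # made-up profiles for a three-gene module g1 - g2 - g3
    "g1": [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0],
    "g2": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "g3": [2.0, 2.1, 1.9, 2.0, 6.0, 6.1, 5.9, 6.0],
}
p = gene_p_values(expression, labels, "normal", "disease")
density = module_density([("g1", "g2"), ("g2", "g3")], p, alpha=0.05, beta=1.0)
```

In this toy module, g2 is not differentially expressed (p = 1), so it contributes nothing, yet it still receives a high density because both of its neighbors are strongly differentially expressed.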
Step 3: Another vector of density scores, \( D_{NM} \), is computed after randomly shuffling the phenotype labels, to obtain a representative sample of the background distribution.
Step 4: The statistical significance for rejecting the null hypothesis is determined by a permutation test. For statistical robustness, step 3 is repeated n times. After n iterations, the number of times the average density score obtained with shuffled phenotypes exceeds the average density score of the genes in M is counted and used to compute the final p-value.
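In NMEA, the null distribution comes from shuffling phenotype labels rather than from sampling genes. The Python sketch below assumes a hypothetical callable `module_score(labels)` that recomputes the module's average density score for a given assignment of phenotype labels to samples (i.e., it repeats the t-tests and the density step). As in the text, the p-value is the fraction of the n shuffles whose score is larger than the observed one.

```python
import random


def phenotype_permutation_p(labels, module_score, n=1000, seed=None):
    """Fraction of n label shuffles whose module score exceeds the observed score."""
    rng = random.Random(seed)
    observed = module_score(list(labels))
    shuffled = list(labels)
    exceed = 0
    for _ in range(n):
        rng.shuffle(shuffled)  # permute phenotype labels across samples
        if module_score(shuffled) > observed:
            exceed += 1
    return exceed / n


# Toy score: mean of a made-up per-sample value in "disease" minus "normal" samples.
values = [0, 0, 0, 0, 10, 10, 10, 10]


def toy_score(labels):
    disease = [v for v, lab in zip(values, labels) if lab == "disease"]
    normal = [v for v, lab in zip(values, labels) if lab == "normal"]
    return sum(disease) / len(disease) - sum(normal) / len(normal)


# The observed labeling already gives the maximum score (10), so no shuffle exceeds it.
p_value = phenotype_permutation_p(["normal"] * 4 + ["disease"] * 4, toy_score, n=200, seed=0)
```

Shuffling labels, rather than resampling genes, preserves the correlation structure among the genes of the module, so the null distribution reflects only the loss of the phenotype association.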
When NMEA is applied to multiple metanodes, the p-values must be corrected by FDR in a manner similar to that described above for GOTEA. In this case, \( \text{FDR} = p \times m / k \) as before, but m is the total number of metanodes and k is the rank of the metanode under consideration.
Copyright information
© 2013 Springer Science+Business Media New York
About this protocol
Cite this protocol
Hu, Z. (2013). Analysis Strategy of Protein–Protein Interaction Networks. In: Mamitsuka, H., DeLisi, C., Kanehisa, M. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 939. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-107-3_11
DOI: https://doi.org/10.1007/978-1-62703-107-3_11
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-106-6
Online ISBN: 978-1-62703-107-3
eBook Packages: Springer Protocols