1 Introduction

Information on the function and molecular properties of individual proteins is available in major databases such as UniProt [1]. To be functional, most proteins establish physicochemical dynamic connections with other proteins. Finding these interactions provides opportunities to explore their biological functions [2]. The map of the protein–protein interactions (PPIs) in a particular organism is called the interactome [3]. Aberrant PPIs are detected in multiple aggregation-related diseases, such as polyglutamine diseases, Creutzfeldt–Jakob, Parkinson’s, Alzheimer’s, and cancer [4, 5]. The comparison of PPI networks in patients and controls can elucidate the molecular basis of these diseases and lead to the identification of possible therapeutic targets.

While several computational and experimental methods based on single or high-throughput screens have been implemented for detecting PPIs, all present advantages and disadvantages [6]. Computational methods (e.g., text mining, docking, machine learning, and interolog mapping [7]) can detect thousands of PPIs in much less time and at a lower cost than experimental methods; however, since these methods are based on predictions and not on experimental data, accuracy is always an issue. Nevertheless, computational methods can be very useful for understanding which interactions may be missing from the available experimental datasets. In addition, high-throughput screening methods cannot guarantee the capture of all interactions, and the most widely used experimental high-throughput techniques (i.e., mass spectrometry, two-hybrid assays, and tandem affinity purification) can produce false-positive rates of up to 50% [2]. Finally, methods that are unlikely to generate false positives, such as X-ray crystallography, are not easy to scale up and thus cannot be used to study large numbers of PPIs.

PPI datasets of several species, obtained with different detection methods, are publicly accessible and can be downloaded from databases such as BioGRID [8, 9], CCSB [10,11,12,13,14], DroID [15], FlyBase [16], HIPPIE [2], HitPredict [17], HomoMINT [18], INstruct [19], Interactome3D [20], Mentha [21], MINT [22], or PINA [23]. Although there is some degree of overlap between databases, every database reports an exclusive set of information, and since interactions can be reported in different formats, the comparison between databases can be demanding (e.g., BioGRID, MINT and CCSB report interactions using gene identifiers, UniProt numbers, and gene names, respectively). Databases can be human-curated (e.g., BioGRID, HIPPIE, and MINT) and can report the source of each PPI (e.g., BioGRID and MINT). Furthermore, functionally equivalent proteins can have distinct names in different species, making the comparison across species difficult to achieve. Since each method and database presents advantages and disadvantages, interactions reported in several independent studies, or in distinct species, are expected to be more reliable than those reported in a single study using high-throughput methods. Furthermore, as stated above, the comparison of interactomes obtained under different conditions (e.g., patients and controls) can be informative.

This paper presents EvoPPI (http://evoppi.i3s.up.pt), an open-source web application for easily comparing PPI datasets across databases and species. Since proteins can have different names in the species being compared, a BLAST-based approach is used for cross-species comparisons, allowing users to specify different criteria to select the proteins that are considered functionally equivalent. It should be noted, however, that EvoPPI is not an application for PPI inference using homology. Four parameters can be adjusted by the user: (1) the number of descriptions, which controls the number of sequences reported in the output; (2) the expect value (E-value), which describes the number of hits expected by chance when searching a database of a particular size (a lower E-value represents a more significant match); (3) the minimum percentage of identity that the sequence alignment must have to be considered a positive match; and (4) the minimum length of the aligned block, which specifies the length the sequence alignment must have to be considered a positive match. These features are useful when comparing organisms such as Homo sapiens and Drosophila melanogaster, since two rounds of whole genome duplication occurred in the human lineage [24], implying that the majority of Drosophila genes have multiple paralogs in humans. In short, EvoPPI presents distinctive features such as the use of a BLAST approach for the identification of orthologous/paralogous genes (where the user can define the number of descriptions, the expect value, the minimum length of alignment blocks, and the minimum identity), and the use of colour codes for easy detection of differences between datasets.

To demonstrate the usefulness of EvoPPI, we compare the human interactomes for ATXN1, ATXN2, CACNA1A, ATXN7, TBP, ATXN3, HTT, ATN1, and AR, the nine polyglutamine (polyQ) proteins that are associated with degenerative disorders due to an expansion of the polyQ tract. These proteins are responsible for spinocerebellar ataxia (SCA) types 1, 2, 6, 7, and 17, Machado–Joseph disease (MJD or SCA3), Huntington’s disease (HD), dentatorubral-pallidoluysian atrophy (DRPLA), and spinal and bulbar muscular atrophy X-linked 1 (SBMA), respectively [25]. We begin by demonstrating that no protein is common to all the polyQ disease interactomes. We then show that, among the proteins shared between the interactomes of at least four of the polyQ disease proteins, six belong to the ubiquitin pathway. Comparisons with Mus musculus PPIs are also made for AR and TBP, using the EvoPPI BLAST search approach for distinct species comparisons, to explore why there is a significant excess of common interactors for these proteins in humans.

2 Materials and Methods

2.1 Data

EvoPPI relies on two main types of data to perform the analyses: reference genomes of the species (FASTA files) and interactomes (TSV files with the interactions). The current version of EvoPPI includes the reference genomes of ten animal species: Bos taurus, Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Gallus gallus, Homo sapiens, Mus musculus, Oryctolagus cuniculus, Rattus norvegicus, and Xenopus laevis (see Supplementary Table 1 for more details). For each of these ten species, more than 100 PPIs are available in at least one interactome database. The reference genomes were downloaded from NCBI in GenBank Flat File Format (GBFF) and parsed to extract the coding sequences (CDSs) and create the FASTA files required by EvoPPI. In the same operation, dictionaries of gene synonyms for each species were also created. These dictionaries allow users to search for a gene by any of its names.

The current version of EvoPPI also includes 52 interactomes for the 10 species (see Supplementary Table 1 for additional details). We downloaded all the available interactomes from the following databases: BioGRID, CCSB, DroID, FlyBase, HIPPIE, HitPredict, HomoMINT, INstruct, Interactome3D, Mentha, MINT, and PINA. We then parsed each database file to convert it into a unified format that EvoPPI can handle, that is, a simple TSV file with two columns containing the Gene-ID identifiers of the genes involved in each reported interaction. It is important to note that this process requires converting the gene identifiers from their source formats (UniProtKB-ID, Gene name, or FlyBase; Supplementary Table 1) into Gene-ID, which sometimes requires a two-step conversion: first to UniProtKB-ID and then to Gene-ID. To perform this step, we used the mapping API offered by UniProtKB. However, some interactions were lost because their identifiers could not be converted (see Supplementary Table 1 for additional details).
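
The conversion into the unified two-column format can be illustrated with a minimal Python sketch. It assumes that an identifier-to-Gene-ID dictionary has already been obtained (for example, from an identifier mapping service); the function names, the placeholder identifiers, and the numeric Gene-IDs are hypothetical and are not taken from the EvoPPI source code.

```python
import csv
import io


def convert_interactions(rows, to_gene_id):
    """Map pairs of source identifiers to Gene-ID pairs.

    Pairs in which either identifier has no Gene-ID are dropped and counted.
    """
    converted, lost = [], 0
    for id_a, id_b in rows:
        gene_a = to_gene_id.get(id_a)
        gene_b = to_gene_id.get(id_b)
        if gene_a is None or gene_b is None:
            lost += 1
            continue
        converted.append((gene_a, gene_b))
    return converted, lost


def to_tsv(pairs):
    """Serialise Gene-ID pairs as a two-column, tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


# Hypothetical mapping and interactions (placeholders, not real identifiers).
mapping = {"P1": 101, "P2": 102, "P3": 103}
source_rows = [("P1", "P2"), ("P2", "P3"), ("P3", "P9")]

pairs, lost = convert_interactions(source_rows, mapping)
tsv_text = to_tsv(pairs)
```

In this sketch, unconvertible pairs are counted rather than silently discarded, so that a conversion rate per database can be reported.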

All the information managed by EvoPPI, including interactomes, species, and gene data, is stored in a relational database. This allows fast information retrieval, reducing the time required to process user queries. The current version of EvoPPI also includes support for user registration, allowing users to keep and manage their query results.

2.2 EvoPPI architecture

EvoPPI is composed of two different applications that act as the front-end and the back-end components, respectively. The front-end application is a web application that was implemented using the Angular v6 framework in combination with the Angular Material v6 library and the Material Dashboard Angular 5 template, for a richer user interface. The back-end application was implemented using the Java EE 7 platform. This application provides a RESTful API [26] with resources to access data and to request PPI calculations. Communication between the front-end and back-end applications is done using Asynchronous JavaScript and XML (AJAX), with JavaScript Object Notation (JSON) for data encoding.

EvoPPI relies on BLAST to perform sequence alignment between the gene sequences of distinct species, to identify orthologous/paralogous genes. As explained before, this identification is needed to enable a comparison of interactomes belonging to distinct species. To avoid installation and configuration issues, a Docker container was created with a BLAST v2.6.0 installation. This container is invoked from the back-end application using the docker-java v3.0.13 library.

Figure 1 represents the general architecture and deployment of EvoPPI, including the components described above. EvoPPI is currently running in a WildFly v10.1.0 application server and uses a MySQL v5.7 database management system to store the information.

Fig. 1 Architecture of EvoPPI, showing its main components and their interactions

EvoPPI 1.0 is publicly accessible at http://evoppi.i3s.up.pt. It is open-source software distributed under a GPLv3 license. The source code of the front-end application is publicly available at https://github.com/sing-group/evoppi-frontend, while the source code of the back-end application is available at https://github.com/sing-group/evoppi-backend. Finally, the Docker container with the BLAST installation can be found at https://hub.docker.com/r/singgroup/evoppi-blast/.

2.3 Interactome Comparison Algorithms

EvoPPI allows users to compare the interactions of a gene (i.e., the query gene) in two or more interactomes, which may belong to the same or distinct species. Depending on this aspect, the algorithm used to perform the calculations is different.

2.3.1 Same Species Comparison

To retrieve the interactions for a given query gene in two or more interactomes belonging to the same species, the following algorithm is applied:

  1. Interactions calculation step: for each query interactome, retrieve from the database the interactions where the query gene is present. EvoPPI allows specifying the interaction level, which is the degree of distance (up to a maximum of three) to retrieve transitive interactions. Therefore, if the degree is greater than one, after retrieving the genes that interact with the query gene, the process is repeated and the genes that interact with these degree 1 genes are also retrieved. This process is repeated as many times as the degree specified by the user and it results in a set of interactions, each one containing the interacting genes, the degree and the associated source query interactome.

  2. Interactions completion step: iterate over all the interactions resulting from the previous step in order to check if they are present in the other query interactomes but were not discovered in the previous step. If so, add them with an unknown interaction level (i.e., −1).

For example, as Fig. 2 illustrates, the following situation may occur: using an interaction level of 3 in the first step of the algorithm, the query gene A gives interactions A → B, B → C, and C → D in Interactome 1, but only the interaction A → B is present in Interactome 2. Although the interaction C → D is present in Interactome 2, it cannot be discovered because B → C does not exist. This completion step adds this kind of interaction with an interaction level of − 1, to indicate that the interaction is present in the interactome but the degree is unknown.

Fig. 2 Exemplification of the Interactions completion step. The interactions set is completed by adding C → D (in red), discovered in the Interactome 2 by this second step
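
The two steps can be sketched in Python as follows. This is an illustrative reading of the algorithm and not the EvoPPI implementation: interactions are assumed to be undirected, the function names are hypothetical, and the toy data reproduce the situation described above for Fig. 2.

```python
from collections import defaultdict

UNKNOWN_LEVEL = -1  # level assigned by the completion step


def build_adjacency(interactome):
    """Index an interactome (iterable of gene pairs) by gene."""
    adjacency = defaultdict(set)
    for a, b in interactome:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def calculate(interactome, query_gene, max_level):
    """Step 1: breadth-first retrieval of interactions up to max_level.

    Returns a dict mapping each interaction (a frozenset of genes) to the
    level at which it was first reached.
    """
    adjacency = build_adjacency(interactome)
    found = {}
    frontier, visited = {query_gene}, {query_gene}
    for level in range(1, max_level + 1):
        next_frontier = set()
        for gene in frontier:
            for partner in adjacency[gene]:
                found.setdefault(frozenset((gene, partner)), level)
                if partner not in visited:
                    visited.add(partner)
                    next_frontier.add(partner)
        frontier = next_frontier
    return found


def compare_same_species(interactomes, query_gene, max_level):
    """Steps 1 and 2 for a dict {interactome name: list of gene pairs}."""
    results = {
        name: calculate(pairs, query_gene, max_level)
        for name, pairs in interactomes.items()
    }
    # Step 2: an interaction found in any interactome is looked up in the
    # others and, if present but not reached there, added with level -1.
    all_found = set().union(*results.values())
    for name, pairs in interactomes.items():
        present = {frozenset(pair) for pair in pairs}
        for interaction in all_found:
            if interaction in present and interaction not in results[name]:
                results[name][interaction] = UNKNOWN_LEVEL
    return results


# Toy data for the Fig. 2 scenario.
interactome_1 = [("A", "B"), ("B", "C"), ("C", "D")]
interactome_2 = [("A", "B"), ("C", "D")]
results = compare_same_species(
    {"Interactome 1": interactome_1, "Interactome 2": interactome_2},
    query_gene="A",
    max_level=3,
)
```

In Interactome 2, the interaction between C and D cannot be reached from A, so it only enters the result through the completion step and carries the unknown level.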

2.3.2 Distinct Species Comparison

Queries where interactomes belong to two distinct species follow a more complex process. In this case, the interactomes of the species to which the query gene belongs (i.e., the reference species) are called reference interactomes, and the interactomes of the second, distinct species (i.e., the target species) are called target interactomes. To retrieve the interactions for a given query gene in two or more interactomes belonging to two distinct species, the following algorithm is applied:

  1. Interactions calculation step in reference interactomes: apply the same procedure described in the interactions calculation step for same species comparisons in all reference interactomes.

  2. Interactions completion step in reference interactomes: apply the same procedure described in the interactions completion step for same species comparisons to all the interactions obtained in the previous step.

  3. Query gene BLAST: perform a BLAST query of the query gene and all genes involved in the set of interactions obtained in the previous step against the target genome to find their orthologous/paralogous genes.

  4. Interactions calculation step in target interactomes: if the query gene has any orthologous/paralogous genes in the target species, apply the same procedure described in the interactions calculation step for same species comparisons to all its orthologous/paralogous genes in all target interactomes. As a restriction, interactions that do not have an orthologous/paralogous gene among the genes retrieved in step 1 are discarded.

  5. Interactions completion step in target interactomes: apply the same procedure described in the interactions completion step for same species comparisons to all the interactions obtained in the previous step. In this case, the interactions used as reference are those obtained in step 1 (i.e., reference interactions), instead of those obtained in step 4 (i.e., target interactions). BLAST results obtained in step 3 are used to determine the orthologous/paralogous relationships between reference and target interactions.

It is important to note that this algorithm will only retrieve interactions for those genes (or their corresponding orthologs/paralogs) discovered in the first step.
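
A compact Python sketch of the five steps is given below. It is a simplified illustration under several assumptions that are not part of the original description: the BLAST search of step 3 is replaced by a precomputed dictionary of orthologs/paralogs, a single reference and a single target interactome are used (so step 2 has nothing to add), interactions are treated as undirected, and all gene names are made up. The filter used in step 4 (both genes of a target interaction must map back to genes retrieved in step 1) is one possible interpretation of the restriction stated above.

```python
UNKNOWN_LEVEL = -1


def edge(a, b):
    """Canonical (sorted) representation of an undirected interaction."""
    return tuple(sorted((a, b)))


def retrieve(interactome, start_genes, max_level):
    """Breadth-first retrieval of interactions reachable from start_genes."""
    found = {}
    frontier, visited = set(start_genes), set(start_genes)
    for level in range(1, max_level + 1):
        next_frontier = set()
        for a, b in interactome:
            for gene, partner in ((a, b), (b, a)):
                if gene in frontier:
                    found.setdefault(edge(a, b), level)
                    if partner not in visited:
                        visited.add(partner)
                        next_frontier.add(partner)
        frontier = next_frontier
    return found


def compare_distinct_species(reference, target, query_gene, orthologs, max_level):
    # Steps 1-2: interactions in the (single) reference interactome.
    ref_edges = retrieve(reference, {query_gene}, max_level)
    ref_genes = set().union(*ref_edges) | {query_gene}

    # Step 3: stand-in for BLAST, a lookup in a precomputed ortholog map.
    mapped = {gene: orthologs.get(gene, set()) for gene in ref_genes}
    allowed = set().union(*mapped.values())

    # Step 4: retrieval in the target interactome from the orthologs of the
    # query gene, keeping only interactions between mapped genes.
    target_edges = {
        e: level
        for e, level in retrieve(target, mapped[query_gene], max_level).items()
        if all(gene in allowed for gene in e)
    }

    # Step 5: completion, using the reference interactions as the guide.
    target_set = {edge(a, b) for a, b in target}
    for x, y in ref_edges:
        for tx in mapped[x]:
            for ty in mapped[y]:
                candidate = edge(tx, ty)
                if candidate in target_set and candidate not in target_edges:
                    target_edges[candidate] = UNKNOWN_LEVEL
    return ref_edges, target_edges


# Made-up example: upper-case genes are reference, lower-case are target.
reference = [("A", "B"), ("B", "C")]
target = [("a", "b1"), ("b2", "c"), ("a", "z")]
orthologs = {"A": {"a"}, "B": {"b1", "b2"}, "C": {"c"}}

ref_edges, target_edges = compare_distinct_species(
    reference, target, "A", orthologs, max_level=2
)
```

In this example, the target interaction involving z is discarded in step 4 because z is not an ortholog/paralog of any reference gene, while the interaction between b2 and c is recovered only by the completion step.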

2.4 EvoPPI User Interface

EvoPPI provides an easy-to-use user interface specially designed for users without advanced bioinformatics skills. The landing page of EvoPPI (Fig. 3a) allows users to access the query interface, which supports the two types of analysis: one panel to perform distinct species comparisons (Fig. 3b) and another for same species comparisons (Fig. 3c). In both cases, users start by selecting the species, interactomes, and the query gene to perform the search. Although EvoPPI uses Gene-ID as the main identifier for the genes, it also stores alternative names. When a user starts to type a gene name, EvoPPI looks for that text in the gene identifiers and alternative names, and shows a list of genes from which the user can select the query gene. In addition to these parameters, the interaction level parameter can be used in both query types to select the maximum degree of distance of the retrieved interactions.

Fig. 3 Screenshots of EvoPPI: a the EvoPPI landing page, which gives access to the main functionalities (queries, results management and user login), b the query configuration panel for distinct species comparison, including the BLAST parameters, and c the query configuration panel for same species comparisons

The distinct species query form (Fig. 3b) also includes four parameters to configure the BLAST execution and filter the results. These parameters are: (1) the number of descriptions (BLAST max_target_seqs parameter); (2) the expect value (BLAST evalue parameter); (3) the minimum length of alignment blocks; and (4) the minimum identity, expressed as a percentage.
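
As an illustration of how the last three parameters act as filters, the following Python sketch parses BLAST tabular output (the default 12-column layout of `-outfmt 6`) and keeps only the hits that satisfy the thresholds. Whether EvoPPI applies the identity and length thresholds in exactly this way is an assumption; the `Hit` class, the function names, and the three example hits are hypothetical. The number of descriptions is not modelled here, since it is passed to BLAST itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    identity: float  # percentage of identical positions
    length: int      # alignment length
    evalue: float


def parse_tabular(text):
    """Parse default BLAST tabular output: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(
            Hit(
                query=fields[0],
                subject=fields[1],
                identity=float(fields[2]),
                length=int(fields[3]),
                evalue=float(fields[10]),
            )
        )
    return hits


def filter_hits(hits, max_evalue, min_length, min_identity):
    """Keep hits that pass the expect value, length and identity thresholds."""
    return [
        hit
        for hit in hits
        if hit.evalue <= max_evalue
        and hit.length >= min_length
        and hit.identity >= min_identity
    ]


# Made-up hits for a single query sequence.
example = (
    "geneA\tm1\t85.0\t120\t18\t0\t1\t120\t1\t120\t1e-50\t200\n"
    "geneA\tm2\t35.0\t150\t97\t2\t1\t150\t5\t154\t1e-5\t50\n"
    "geneA\tm3\t90.0\t25\t2\t0\t10\t34\t40\t64\t0.01\t30\n"
)
hits = parse_tabular(example)
kept = filter_hits(hits, max_evalue=0.05, min_length=40, min_identity=40.0)
```

With these thresholds, the second hit is rejected because of its low identity and the third because its aligned block is too short.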

Although queries for the same species are completed in a few seconds, queries across species can take minutes or even hours, due to the BLAST sequence alignment step. Keeping this in mind, EvoPPI was designed to perform the queries asynchronously, so that users can launch a query, leave the application, and return later to check the execution status. To do so, EvoPPI implements three mechanisms: (1) it generates a unique URL for each query that users can use to return at any time to the query result; (2) it stores the results in the users’ browser storage so that they can be reopened later; and (3) it stores the queries in the EvoPPI database when users are logged in to the web application.

The query results are listed in the results management interface (Fig. 4a). Each query result is presented in tabular (Fig. 4b) and graph formats (Fig. 4c). The tabular view lists the interactions, including the gene identifiers and names, and the interaction degree in each interactome, while the graph view represents the results as an undirected graph, where nodes are genes and edges are interactions. Different colours are used to represent the presence or absence of genes and interactions in the interactomes, while node sizes are used to represent the number of interactions for each gene. In both views, genes can be clicked to view detailed information (Fig. 4d), including the gene identifier, alternative names, the protein sequence and, for distinct species queries, the related results of the BLAST alignment.

Fig. 4 Screenshots of the results management interfaces: a the EvoPPI results list, separated in distinct species and same species; b tabular results view of a distinct species query involving the gene ADH1A; c graph view for the same gene; and d additional gene information

Finally, the results can be exported in several formats, including a comma-separated values (CSV) file with the retrieved interactions, FASTA files with the protein sequences of the interactomes, and the interactions graph in different image formats.

3 Results and Discussion

3.1 Databases

Currently, EvoPPI incorporates data from 12 databases for 10 species, for which a minimum of 100 unique interactions have been reported, totalling 52 PPI datasets (Supplementary Table 1). EvoPPI uses Gene-ID as the main feature by which the corresponding proteins are identified in the database. Since not all publicly available databases use Gene-ID to identify the interacting proteins, data from those databases had to be converted using the UniProtKB ID mapping API, as described in the Materials and Methods section. The average rate of conversion is high (87.9%), although it never reaches 100% (Supplementary Table 1).

The relationship between the number of unique interactions and the number of unique proteins for the 52 datasets is presented in Fig. 5. A power function with a coefficient of 0.6485 fits the data well (R² = 0.94). The six largest datasets, namely Caenorhabditis elegans CCSB, and H. sapiens PINA2, Mentha, HitPredict, BioGRID, and HIPPIE, have a much larger number of unique interactions than expected based on the number of unique proteins. This could suggest that: (1) smaller datasets are biased towards proteins showing many interactions; (2) the largest datasets include an important fraction of false positive interactions; or (3) although all data were downloaded from the main PPI databases, the largest datasets may include non-PPI interactions, such as gene interactions. The latter is likely the case for the CCSB C. elegans interactome, which integrated the WI8 interactome with evidence for functional relationships based on mRNA co-expression data available in WormBase, RNAi phenotypes from RNAiDB, genetic interactions curated in WormBase, interolog interactions, and protein–protein interactions from the literature-curated dataset [10].

Fig. 5 The relationship between the number of unique interactions and the number of unique proteins for the 52 datasets used (Supplementary Table 1)
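
For readers who wish to reproduce this kind of fit, a power function y = a·x^b can be estimated by ordinary least squares on log-transformed values, as sketched below in Python. The text does not state which fitting procedure was used, nor whether the reported coefficient corresponds to a or to b, so the procedure is an assumption; the data points are synthetic and are not taken from Supplementary Table 1. Note that the R² returned here refers to the log-transformed data.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    Returns (a, b, r_squared), with r_squared computed in log space.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    syy = sum((y - mean_y) ** 2 for y in log_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Synthetic points lying exactly on y = 2 * x**1.5 (not real dataset sizes).
xs = [10, 100, 1000, 10000]
ys = [2 * x ** 1.5 for x in xs]
a, b, r_squared = fit_power_law(xs, ys)
```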

The PPIs reported in the datasets overlap only partially, as shown in Fig. 6 for the nine polyQ disease proteins and the five largest datasets (PINA2, Mentha, HitPredict, HIPPIE, and BioGRID 3.4). Furthermore, the database with the highest number of PPIs differs from protein to protein. Therefore, the integration of all databases is needed to obtain all the PPIs available for a particular protein. This can be performed easily and quickly using the EvoPPI “Compare same species” operation.

Fig. 6 The number of PPI in the PINA2 (in blue), Mentha (pink), HitPredict (green), HIPPIE (yellow), and BioGRID 3.4 (brown) datasets for: a androgen receptor (AR); b atrophin-1 (ATN1); c ataxin 1 (ATXN1); d ataxin 2 (ATXN2); e ataxin 3 (ATXN3); f ataxin 7 (ATXN7); g calcium voltage-gated channel subunit alpha1 A (CACNA1A); h Huntingtin (HTT); and i TATA-binding protein (TBP)

3.2 Case Study

PolyQ-containing proteins are enriched in protein complexes [27, 29]. Moreover, polyQ regions are usually located close to coiled-coil regions, suggesting that they play a role in the regulation of protein interactions [27,28,29]. Therefore, it is not surprising that polyQ proteins have more PPI partners than non-polyQ proteins, and that they have a higher tendency to interact with other polyQ proteins than non-polyQ proteins do [27]. In humans, 60 polyQ proteins have been described in the complete proteome [30]. Nine of them, when expanded, cause neurodegenerative polyQ diseases [25], namely AR, ATN1, ATXN1, ATXN2, ATXN3, ATXN7, CACNA1A, HTT, and TBP. These nine proteins could be functionally related, despite not having any sequence homology. Therefore, our case study compares the interactors of these nine proteins using the EvoPPI web application.

Four interactions are found among polyQ disease proteins, namely HTT/TBP, AR/TBP, ATXN1/ATXN2, and ATXN7/TBP, which suggests that different polyQ disease proteins participate in the same biological pathways. Nevertheless, none of the proteins reported in the databases interacts with all nine wild-type polyQ disease proteins (Table 1). Indeed, the majority of the interactors bind to a single polyQ disease protein. This is an interesting observation given that some of the datasets considered in this study may be reporting interactions that were detected only with proteins having an expanded pathological polyQ tract, possibly biasing the results towards an enrichment of common interactors. This suggests that: (1) these proteins participate in different biological processes, and/or (2) there are not many proteins that bind non-specifically to polyQ disease proteins with the help of the polyQ region. Only Polyubiquitin-C (UBC), a protease and RNA-binding protein, is reported to bind to eight out of the nine polyQ disease proteins (Supplementary Table 2). Nevertheless, based on the data available in the 14 H. sapiens datasets, UBC likely interacts with more than 50% of all human proteins. There are 14 other proteins that bind to at least 4 of the 9 polyQ disease proteins, namely SUMO1, SUMO2, VCP, PIAS1, CREBBP, EP300, GAPDH, EFEMP2, TP53, TBP, UBE2I, CASP1, CASP3, and NCOR1 (Supplementary Table 2). The list of the 15 interactors that interact with at least 4 of the 9 polyQ disease proteins is enriched, according to PANTHER, in the molecular function “ubiquitin protein ligase binding” (6 proteins: SUMO1, SUMO2, UBC, PIAS1, TP53, and VCP; fold enrichment = 17.60; FDR = 9.07E−04). At least one of the six proteins that belong to the “ubiquitin protein ligase binding” GO term interacts with each of the polyQ disease proteins in this study (6, 2, 5, 3, 4, 4, 1, 5, and 3 of them interact with AR, ATN1, ATXN1, ATXN2, ATXN3, ATXN7, CACNA1A, HTT, and TBP, respectively; Supplementary Table 2). This distribution may reflect the number of large-scale studies dedicated to each of the proteins. Therefore, this observation suggests that ubiquitination is an important factor in the regulation of these nine polyQ disease proteins. Indeed, regulation of the ubiquitination machinery has been indicated as a potential therapeutic target in polyglutamine diseases [31,32,33,34]. The enzymes of this machinery target proteins for degradation both by the proteasome and by autophagy [31,32,33,34].
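
The counts discussed above (how many polyQ disease proteins each interactor binds, and which interactors reach a given threshold) amount to simple set operations on exported interactor lists. The Python sketch below shows one way to compute them; the function names are hypothetical, and the query proteins Q1 to Q3 and interactors X, Y, and Z are placeholders rather than data from this study.

```python
from collections import Counter


def count_sharing(interactors_by_query):
    """For every interactor, count how many query proteins it binds."""
    counts = Counter()
    for partners in interactors_by_query.values():
        counts.update(set(partners))
    return counts


def shared_by_at_least(counts, threshold):
    """Interactors bound by at least `threshold` query proteins."""
    return sorted(name for name, n in counts.items() if n >= threshold)


# Placeholder interactor sets for three hypothetical query proteins.
interactors_by_query = {
    "Q1": {"X", "Y"},
    "Q2": {"X", "Z"},
    "Q3": {"X", "Y"},
}

counts = count_sharing(interactors_by_query)
shared = shared_by_at_least(counts, 2)
# Number of interactors bound by exactly 1, 2, 3, ... query proteins.
distribution = Counter(counts.values())
```

Applied to the nine polyQ disease interactomes, the `distribution` variable would correspond to the kind of summary given in Table 1.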

Table 1 Distribution of the proteins of the nine polyQ disease interactomes according to presence in one or multiple polyQ disease proteins

Of the 261 proteins that interact with at least two polyQ disease proteins, as many as 42.1% have at least 1 paralogous protein, with an average of 2.44 (Supplementary Table 2). Members of the histone H3 family have as many as ten paralogs in this list. It should be noted that there is a clear co-occurrence of paralogous proteins. For instance, for AR, only 20% of proteins with a paralog do not show a presence/absence agreement with all other paralogous proteins. When comparing the interactors of the polyQ disease proteins, in ten cases the number of common interactors is lower than expected by chance, again suggesting that most polyQ disease proteins are involved in different functional networks (Table 2). For the transcription factors AR and TBP, the number of common interactors is larger than expected by chance (Table 2), suggesting either that they are involved in the same biological pathway, or that many proteins bind because of the presence of a polyQ region, which facilitates the interaction with other proteins. To address this issue, we used the EvoPPI BLAST search approach to identify orthologous/paralogous proteins in M. musculus (number of descriptions of 1; expect value of 0.05; minimum length of alignment block of 40; minimum identity of 40%; interaction level 1). We observed that the latter species shows, as expected, shorter polyQ tracts at the N-terminal region of both AR and TBP (Supplementary Fig. 1A, B), and that there are no common interactors when comparing TBP and AR (Fig. 7). Of the proteins that interact with TBP in M. musculus and were identified as orthologs/paralogs of proteins interacting with TBP in humans, 86% interact only with TBP in humans as well. Moreover, 67% of the proteins that interact with AR in M. musculus, and were identified as orthologs/paralogs of proteins interacting with AR in humans, interact only with AR in humans as well (Fig. 7).
These observations suggest that, in humans, the sample size for both TBP and AR is already large enough to identify the majority of the proteins that interact with both of them. When we compare the numbers of common and unique interactions in humans (48 vs. 479) and house mice (0 vs. 30), the difference in proportions is not significant (p = 0.10). This lack of a significant difference between humans (where both genes encode proteins with polyQ) and house mice (where there is either no polyQ or the polyQ is shorter than in humans; Supplementary Fig. 1A, B) suggests that the larger number of common interactors between the two proteins could be attributed mainly to the involvement of these proteins in common biological pathways. Nevertheless, despite the large number of reported interactors for both proteins in humans, common interactors represent only 12.8% and 24.1% of the interactors of AR and TBP, respectively. Moreover, given the much smaller number of interactors reported for both M. musculus AR and TBP than for H. sapiens, this test may lack statistical power. This possibility must be considered since, for both humans and yeast, it has been reported that polyQ proteins have more PPI partners than non-polyQ proteins [27]. This tendency is observed in all species analysed in our case study having more than 10,000 interactions reported in at least 1 interactome database (H. sapiens, M. musculus, R. norvegicus, B. taurus, D. melanogaster, and C. elegans; Supplementary Fig. 2A–F, respectively), although only for H. sapiens, M. musculus, and D. melanogaster do polyQ proteins have a significantly larger number of interactors than non-polyQ proteins (Supplementary Fig. 2). Nevertheless, polyQ proteins tend to be transcription factors [27, 29] and these could have more interactors than the remaining protein categories.
Therefore, we also addressed whether the proteins that function as transcription factors have more interactions than those that (1) are not transcription factors and (2) do or do not have a polyQ region. When this is taken into account, a clear effect of the polyQ on the number of interactors is observed in humans only. Therefore, we cannot exclude the effect of the polyQ region on the large number of common interactors between AR and TBP in humans, despite the surprising result obtained for the other species.
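
The comparison of common versus unique interactions in humans (48 vs. 479) and house mice (0 vs. 30) is a comparison of two proportions in a 2 × 2 table. The text does not state which test was used; one common choice for small counts is the two-sided Fisher’s exact test, sketched below in plain Python as an assumption. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    The p value is the sum of the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the
    observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * tolerance
    )
    return min(1.0, p_value)


# Rows: humans, house mice; columns: common, unique interactions.
p_value = fisher_exact_two_sided(48, 479, 0, 30)
```

Because the mouse row contains only 30 interactions, even a complete absence of common interactors yields limited evidence, which is consistent with the caveat about statistical power made above.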

Table 2 Comparisons of the interactors of the polyQ disease proteins

Fig. 7 H. sapiens (in blue) and M. musculus (in green) AR and TBP interactors

4 Conclusions

This paper has presented EvoPPI, an open-source web application that enables users to compare the interactions of a protein across interactomes from the same or different species. To compare interactomes from different species, EvoPPI uses a versatile BLAST search approach, which, we believe, is a distinctive feature of EvoPPI, since comparable tools only identify orthologs with the same name across species. The current version of EvoPPI also includes support for user registration, allowing users to retain their query results for future management.

We have also shown the use of EvoPPI to compare the interactomes of the 9 human polyQ disease genes (those encoding proteins that, when the polyQ tract is expanded, cause neurodegenerative disorders) in 14 datasets. Although polyQ genes show a large number of protein interactions, we found only a small set (15) of interactors that are common to at least four of these polyQ disease genes. Of these 15 proteins, 40% are involved in ubiquitin protein ligase-binding function. Ubiquitin/proteasome system dysfunction has been suggested in a range of polyglutamine neurodegenerative diseases [31,32,33,34]. Using the unique EvoPPI feature Compare different species, the comparisons of the human and mouse AR and TBP interactomes revealed a significant excess of common proteins. In humans, and for AR and TBP only, we cannot confidently discard the polyQ region as the cause of the observed excess of common interactions. For the other seven polyQ disease proteins, no excess of common interactors was observed.

The current development of EvoPPI includes, but is not limited to: (1) the creation of a management interface to enable users to include new interactomes and species in the EvoPPI database, (2) the addition of new data visualization and analysis options, and (3) an improvement of the distinct species comparison algorithm to make it more complete and efficient.