Computational Prediction of Protein Complexes from Protein Interaction Networks. Sriganesh Srihari

Читать онлайн.
Название Computational Prediction of Protein Complexes from Protein Interaction Networks
Автор произведения Sriganesh Srihari
Жанр Программы
Серия ACM Books
Издательство Программы
Год выпуска 0
isbn 9781970001549



Скачать книгу

determined protein interactions with a “core” subset of interactions that have passed quality assessment (for example, based on literature verification). The Centre for Cancer Systems Biology (CCSB) Interactome Database at Harvard [Rolland et al. 2014, Yu et al. 2008, Yu et al. 2011] contains yeast, plant, virus, and human interactions. STRING [Von Mering et al. 2003, Szklarczyk et al. 2011] catalogs physical and functional interactions inferred from experimental and computational techniques. MIPS Comprehensive Yeast Genome Database (CYGD) [Güldener et al. 2005] and MIPS Mammalian Protein-Protein Interaction Database (MPPI) [Pagel et al. 2005] catalog protein interactions and also expert-curated protein complexes from yeast and mammals. The Human Protein Reference Database (HPRD) [Peri et al. 2004, Prasad et al. 2009] mainly contains experimentally identified human interactions. IRefIndex [Razick et al. 2008, Turner et al. 2010] and Integrated Interaction Database (IID) [Brown and Jurisica 2005] integrate experimental and computationally predicted interactions for human and several other species.

PPI DatabaseSourceReference
BioGridhttp://thebiogrid.org[Stark et al. 2011]
BINDhttp://bind.ca[Bader et al. 2003]
CCSBhttp://interactome.dfci.harvard.edu/[Rolland et al. 2014, Yu et al. 2008, Yu et al. 2011]
CYGDhttp://mips.helmholtz-muenchen.de/genre/proj/yeast/[Güldener et al. 2005]
DIPhttp://dip.doe-mbi.ucla.edu[Xenarios et al. 2002, Salwinski et al. 2004]
EMBL-EBI IntActhttp://www.ebi.ac.uk/intact/[Hermjakob et al. 2004, Kerrien et al. 2012]
HAPPIhttp://discern.uits.iu.edu:8340/HAPPI/[Chen et al. 2009]
HPRDhttp://www.hprd.org/[Peri et al. 2004, Prasad et al. 2009]
OPHID/IIDhttp://ophid.utoronto.ca/[Brown and Jurisica 2005]
InnateDBhttp://www.innatedb.com/[Lynn et al. 2008]
iRefIndexhttp://irefindex.org/wiki/index.php?title=iRefIndex[Razick et al. 2008, Turner et al. 2010]
MINT/HomoMINThttp://mint.bio.uniroma2.it/mint/[Zanzoni et al. 2002, Chatr-Aryamontri et al. 2007, Persico et al. 2005]
MIPShttp://mips.helmholtz-muenchen.de/proj/ppi/[Mewes et al. 2008]
MPPIhttp://mips.helmholtz-muenchen.de/proj/ppi/[Pagel et al. 2005]
STRINGhttp://string-db.org/[Von Mering et al. 2003, Szklarczyk et al. 2011]

      A simple yet effective way to represent interaction data is in the form of an undirected network called a protein-protein interaction network or simply PPI network, given as G = 〈V, E〉, where V is the set of proteins and E is the set of physical interactions between the proteins. Such a network presents a global or “systems” view of the entire set of proteins and their interactions, and provides a topological (mathematical) framework to interrogate the interactions. In the definitions throughout this book, we also use V (G) and E(G) to refer to the set of proteins and interactions of a (sub)network of G. For a protein vV, the set N(v) or Nv includes all immediate neighbors of v, and deg(v) = |N(v)| is the degree of v. These neighbors together with their interactions, Ev = E(v) = {(v, u) : uN(v)} ∪ {(u, w) : uN(v), wN(v) ∩ N(u)}, constitute the local (immediate) neighborhood subnetwork of v.

      PPI networks, like most real-world networks, have characteristic topological properties which are distinct from that of random networks. But, to understand this distinction we need to first understand what are random networks. Traditionally, random networks have been described using the Erdös-Rényi (ER) model, in which G(n, p) is a random network with |V| = n nodes and each possible edge connecting pairs of these nodes has probability p of existing [Erdös 1960, Bollobás 1985]. The expected number of edges in the network is Image, and the expected mean degree is np. Alternatively, a random network is defined as a network chosen uniformly at random from the collection Image of all possible networks with n nodes and m edges. If p is the probability for the existence of an edge, the probability for each network in this collection is,

Image

      where the closer p gets to 1, the more skewed the collection is toward networks with higher number of edges. When p = 1/2, the collection contains all possible Image networks with equal probability, Image.

      A marked feature of PPI networks that makes them non-random is the extremely broad distribution of degrees of proteins: While a majority of proteins have small number of immediate binding partners, there exist some proteins, referred to as “hubs,” with unusually large number of binding partners. Moreover, the degrees of most proteins are larger than the average node degree of the network. This degree distribution P(k) can be approximated by a power law P(k) ∼ k−γ, where k is the node degree and γ is a small constant. Such networks are called scale-free networks [Barabási 1999, Albert and Barabási 2002].

      Proteins within PPI networks exhibit an inherent tendency to cluster, which can be quantified by the local clustering coefficient for these proteins [Watts and Strogatz 1998]. For a selected protein v that is connected to |N(v)| other nodes, if these immediate neighbors were part of a clique (a complete subgraph), there will be Image interactions between them. The ratio of the actual number of interactions that exist between these nodes and the total number of possible interactions gives the local clustering coefficient CC(v) for the protein v:

Image

      The average clustering coefficient of the entire network is therefore given by

Image

      This clustering property gives rise to groups (subnetworks) of proteins in the PPI network that exhibit dense interactions within the groups, but sparse or less-dense interactions between the groups. The interactions between the groups occur via a few “central” proteins through which pass most paths in the network. The average shortest path length Image is given by