Shanghai University
Article Information
- Chunhong WANG, Shuiming CAI, Zengrong LIU, Youwen CHEN. 2015.
- Yeast protein-protein interaction network model based on biological experimental data
- Appl. Math. Mech. -Engl. Ed., 36(6): 827-834
- http://dx.doi.org/10.1007/s10483-015-1940-6
Article History
- Received 2014-5-6;
- in final form 2014-11-24
2. Faculty of Science, Jiangsu University, Zhenjiang 212013, Jiangsu Province, China;
3. Institute of Systems Biology, Shanghai University, Shanghai 200444, China
Since protein-protein interactions (PPIs) are central to most biological processes, the systematic identification of all PPIs is considered an important strategy for uncovering the inner workings of a cell[1]. A number of experimental and computational techniques have been developed to systematically determine both the potential and the actual PPIs in several model organisms. To date, genome-wide PPI network data are available for many organisms, including Saccharomyces cerevisiae (baker's yeast)[2, 3, 4], Caenorhabditis elegans (worm)[5], Drosophila melanogaster (fruit fly)[6], and Homo sapiens (human)[7, 8]. Investigations of the topological structures of these PPI networks have revealed that they share several interesting features, e.g., sparseness, the small-world property, scale-free degree distributions, hierarchical modularity, and disassortativity[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. To recapture these characteristics, various network growth models invoking preferential attachment or gene duplication and divergence have been constructed. For more details, one may refer to Ref. [16] and the references therein.
From the viewpoint of natural selection and biological evolution, duplication and divergence are the two dominant evolutionary forces shaping biological networks[17, 18]. Duplication is the primary source of biological network growth, while divergence is the origin of functional diversity. Over the past decade, many network models based on the principle of duplication and divergence have been proposed to reproduce the topological properties of real PPI networks[16]. Cai[16] proposed a network model in which duplication and divergence were complemented with heterodimerization to recapture the five above-mentioned topological properties of real PPI networks, and obtained some analytical results under certain conditions. However, Ref. [16] only considered the evolutionary process and did not address how to select the model parameters from real biological experimental data. Therefore, the biological meaning of those results is not clear. A question arises naturally: can similar results be obtained when real biological experimental data are taken into account?
To address this problem, using real PPI statistical data, we establish a yeast PPI network model based on duplication and divergence. The simulation results show that our network model reproduces the topological characteristics of real PPI networks, including sparseness, the scale-free and small-world properties, hierarchical modularity, and disassortativity.
2 Duplication-divergence model
In PPI networks, the physical interaction of two proteins is expressed as two nodes connected by an edge. From the network perspective, duplication means that a new node is created with the same neighbors as the original node. Naturally, the new node inherits all properties of the original node that is duplicated.
In the subsequent evolution, divergence describes the loss or gain of protein functions. It can be represented as deleting or adding edges in the PPI network. Deleting edges means that the divergent node loses links to its neighbors, while adding edges means that the divergent node links to non-neighbor nodes. In particular, if an edge is added between the new node and the original one, the process is called heterodimerization[15, 16]. In the present model, owing to the unique role that heterodimerization plays in biology, we treat it as an independent process, separate from the addition of other edges.
Although a newly generated protein has the same properties as the original protein, interactions among the proteins may be lost because of mutations, mistranslations, and omission errors in the process of gene translation[16]. If the newly generated protein contributes to the biological function, it will be preserved during evolution. Otherwise, it will become an isolated protein (isolated node) and will be removed from the network[16].
Based on the results in Refs. [15, 18, 19, 20, 21, 22], the processes and the statistical data of the yeast PPI network can be summarized as follows:
(i) The yeast PPI network has about 4 000 proteins. The average degree ‹k› of the nodes is about 3.74, and the degree distribution is scale-free. The average clustering coefficient ‹c› is about 0.066[15].
(ii) The duplication rate of each protein is on the order of $10^{-2}$ per Ma (million years). Because as many as 90% of gene duplicates are likely to be lost eventually after duplication, the effective duplication rate is about $10^{-3}$ per Ma[19, 20, 21, 22].
(iii) The rate of adding edges for each yeast duplicate protein pair is on the order of $10^{-3}$[20]. Although the edge-deletion rate is not easy to obtain from biological experiments, additions and deletions of edges must balance, because the average degree of the network is relatively stable.
(iv) There are three different preference mechanisms in duplication and divergence. In duplication, the small-preference rule means that the probability that a node is chosen for duplication is inversely proportional to its degree[18, 19]. Because high-degree nodes (hubs) tend to be conserved, their probability of duplication is very small. In divergence, the preference rule for edge addition means that the larger the sum of the degrees of a node pair is, the larger the probability of adding an edge between them is. The preference rule for edge deletion means that the smaller the sum of the reciprocals of the degrees of a node pair is, the larger the probability of deleting the edge between them is.
(v) The duplicate protein is more likely to link to the original one, i.e., the rate of heterodimerization is larger than that of non-heterodimerization[15].
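As an illustration of items (ii) and (iv), the three preference rules can be written as simple weight functions. This is a minimal sketch rather than the authors' code: all function and constant names are hypothetical, and the functional form chosen for the deletion weight (the inverse of the sum of reciprocal degrees) is only one of several forms compatible with the rule stated in (iv).

```python
def duplication_weights(degree):
    """Small-preference duplication: weight 1/degree per node, normalized to sum to 1.

    `degree` maps node -> degree; nodes of degree 0 are skipped (assumption).
    """
    raw = {v: 1.0 / k for v, k in degree.items() if k > 0}
    total = sum(raw.values())
    return {v: w / total for v, w in raw.items()}


def addition_weight(k_u, k_v):
    """Edge addition: the weight grows with the sum of the two degrees."""
    return k_u + k_v


def deletion_weight(k_u, k_v):
    """Edge deletion: the weight grows as the sum of reciprocal degrees shrinks.

    The inverse of that sum is an assumed form; the text only fixes the ordering.
    """
    return 1.0 / (1.0 / k_u + 1.0 / k_v)


# Item (ii): a raw rate of about 1e-2 per Ma, of which about 90% of duplicates are lost.
RAW_DUPLICATION_RATE = 1e-2
LOSS_FRACTION = 0.9
effective_rate = RAW_DUPLICATION_RATE * (1.0 - LOSS_FRACTION)  # about 1e-3 per Ma
```

With these weights, a hub is rarely chosen for duplication, while a pair of hubs is the most likely pair both to gain and to lose an edge.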
3 Yeast PPI network evolution model
According to the above statistical data of yeast PPI networks, we present the following network evolution model:
(I) In view of (i), we start from a randomly connected network with 100 nodes, whose average degree ‹k› is close to 3.88.
(II) According to (ii) and (iv), we set one Ma as the time step. In one Ma, every node is duplicated with a probability proportional to the reciprocal of its degree, and the duplication rate of the whole network is kept at $10^{-3}$ per Ma. If a node is duplicated, we carry out steps (III)-(VI) for it. If not, we move on to the next node, until all nodes in the network have been considered in the time step.
(III) Based on (iii) and (iv), we delete links of the new duplicate node. It is assumed that every edge is deleted with a probability inversely proportional to the reciprocal of the sum of the degrees of the node pair.
(IV) Similarly, according to (iii) and (iv), we add edges to the new duplicate node. Considering all possible additions between the duplicate and its non-neighbor nodes (except the original one), we add edges with a probability proportional to the sum of the degrees of the node pair. We assume that the edge-addition probability is equal to the edge-deletion probability. The average deletion rate is on the order of $10^{-3}$.
(V) The probability that the new duplicate heterodimerizes with the original node is denoted by p.
(VI) After the above steps, if a node is isolated from the network, we delete it.
(VII) Repeat the above six steps until the size of the PPI network increases to 4 000.
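Steps (I)-(VII) can be sketched in plain Python as follows. This is an illustrative reading of the model, not the authors' implementation, and it rests on several assumptions: the network is stored as a dictionary of neighbor sets; isolated nodes of the random seed network are dropped because their reciprocal degree is undefined; each preference probability is scaled so that its mean equals the stated rate; step (III) is read as "proportional to the degree sum"; and p has no default because the text gives no value for it. The names `random_graph`, `biased_picks`, and `evolve` are hypothetical.

```python
import random


def random_graph(n, m, rng):
    """Step (I): n nodes joined by m distinct random edges, as neighbor sets."""
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < m:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    # Assumption: isolated seed nodes are dropped (1/degree is undefined for them).
    return {v: nbrs for v, nbrs in adj.items() if nbrs}


def biased_picks(rng, adj, new, candidates, mean_rate):
    """Select candidates with probability proportional to the degree sum of the
    pair (new, u), scaled so that the mean probability equals mean_rate."""
    if not candidates:
        return []
    weights = {u: len(adj[new]) + len(adj[u]) for u in candidates}
    mean_w = sum(weights.values()) / len(weights)
    return [u for u in candidates
            if rng.random() < min(1.0, mean_rate * weights[u] / mean_w)]


def evolve(p, n0=100, m0=194, target=4000, dup_rate=1e-3, edge_rate=1e-3, seed=0):
    """Grow the network until it has at least `target` nodes (step (VII))."""
    rng = random.Random(seed)
    adj = random_graph(n0, m0, rng)  # 194 edges give an average degree of 3.88
    next_id = n0
    while len(adj) < target:
        nodes = list(adj)  # one pass over this snapshot is one time step (1 Ma)
        inv = {v: 1.0 / len(adj[v]) for v in nodes}
        scale = dup_rate * len(nodes) / sum(inv.values())
        for v in nodes:
            # Step (II): duplication probability proportional to 1/degree.
            if rng.random() >= min(1.0, scale * inv[v]):
                continue
            new, next_id = next_id, next_id + 1
            adj[new] = set(adj[v])  # the duplicate inherits all links of v
            for u in adj[new]:
                adj[u].add(new)
            # Step (III): delete some inherited links of the duplicate.
            for u in biased_picks(rng, adj, new, sorted(adj[new]), edge_rate):
                adj[new].discard(u)
                adj[u].discard(new)
            # Step (IV): add links to non-neighbors, excluding the original v.
            others = [u for u in adj if u not in adj[new] and u not in (new, v)]
            for u in biased_picks(rng, adj, new, others, edge_rate):
                adj[new].add(u)
                adj[u].add(new)
            # Step (V): heterodimerization with probability p.
            if rng.random() < p:
                adj[new].add(v)
                adj[v].add(new)
            # Step (VI): only the duplicate can end up isolated, because
            # deletions touch its links only; remove it in that case.
            if not adj[new]:
                del adj[new]
    return adj
```

For a quick check, `evolve(p=0.3, n0=30, m0=60, target=60, dup_rate=0.05, edge_rate=0.05)` grows a small network in a fraction of a second; the value of p and the inflated rates in this call are arbitrary test settings, not values from the paper.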
4 Simulation and results
Using the network model established above, we perform numerical simulations. All the data are averaged over 100 independent runs. In the simulations, the parameters p and q are chosen according to the principle that the sum of the two parameters is equal to the edge-addition rate.
4.1 Sparse and small world
Table 1 lists the values of ‹k› and ‹c› for different values of p and q, which match the real data well. The results show that the simulated values are comparable with the data of the real yeast network.
Figure 1 presents the degree distributions for different p and q. They are evidently scale-free, with exponents between −2.3 and −2.0, consistent with the real values, which lie between −2 and −3. The result shows that the network is scale-free.
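One simple way to estimate such an exponent is a least-squares fit of log P(k) against log k. The sketch below assumes this fitting method for illustration only; the text does not state how the exponents were obtained, and maximum-likelihood estimators are generally more reliable for heavy-tailed data. The function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k) for k >= 1 from a list of node degrees."""
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


def loglog_slope(pk):
    """Least-squares slope of log P(k) versus log k (the scaling exponent)."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(prob) for prob in pk.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Applied to a synthetic degree list in which the number of nodes of degree k is exactly 3600/k² for k = 1, ..., 6, `loglog_slope(degree_distribution(degrees))` returns a slope of −2 up to rounding error.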
Fig. 1. Degree distribution P(k) (log-log scale) for different p and q
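Many biological networks display the property of hierarchical modularity[12]. Usually, the clustering coefficient $C_i$ is used to measure the local cohesiveness of the network in the neighborhood of node i. It is defined by
$$C_i = \frac{2 n_i}{k_i (k_i - 1)},$$
where $k_i$ is the degree of node i, and $n_i$ is the number of links among its $k_i$ neighbors.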
C(k) denotes the average clustering coefficient of the nodes with the same degree k, viewed as a function of k. It is an essential indicator for quantifying the hierarchical modularity of a network. A scaling $C(k) \sim k^{-\Theta}$ with Θ close to 1 suggests a hierarchical structure of the network[12]. In the present model, C(k) is consistent with the real biological observation, as shown in Fig. 2.
Fig. 2. Cluster degree distribution C(k) (log-log scale) for different p and q
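The curve C(k) can be computed from an adjacency structure as sketched below. The sketch assumes the standard local clustering coefficient, i.e., twice the number of links among the neighbors of a node divided by k(k − 1), and assigns 0 to nodes of degree below 2; both the convention and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def local_clustering(adj, v):
    """Standard local clustering coefficient of node v (0 if degree < 2)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each link between two neighbors is seen from both ends, hence the // 2.
    links = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
    return 2.0 * links / (k * (k - 1))


def clustering_by_degree(adj):
    """C(k): mean local clustering coefficient over all nodes of degree k."""
    buckets = defaultdict(list)
    for v in adj:
        buckets[len(adj[v])].append(local_clustering(adj, v))
    return {k: sum(vals) / len(vals) for k, vals in sorted(buckets.items())}
```

Here `adj` maps each node to the set of its neighbors. Plotting the returned values against k on a log-log scale gives a curve of the kind shown in Fig. 2.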
We analyze the degree-degree correlation (DDC) of the network, which characterizes how the nodes in the network are connected. If high-degree nodes tend to link with other high-degree nodes, the network is assortative; if high-degree nodes tend to link with low-degree nodes, the network is disassortative. In practice, we calculate the Pearson correlation coefficient to describe this property, which is defined as follows[23]:
$$\mathrm{DDC} = \frac{M^{-1}\sum_{i} j_i k_i - \left[ M^{-1}\sum_{i} \tfrac{1}{2}(j_i + k_i) \right]^2}{M^{-1}\sum_{i} \tfrac{1}{2}\left(j_i^2 + k_i^2\right) - \left[ M^{-1}\sum_{i} \tfrac{1}{2}(j_i + k_i) \right]^2},$$
where $j_i$ and $k_i$ are the degrees of the two ends of link i (i = 1, 2, ..., M), and M is the number of links in the network. If DDC > 0, the network is assortative. If DDC < 0, the network is disassortative. If DDC = 0, the network is uncorrelated. The values of DDC for the present PPI network model are listed in Table 2. The results show that the network model is disassortative, which also matches real protein-protein networks.
A PPI network model is constructed based on the principle of duplication and divergence with real yeast PPI network experimental data. This model includes five processes, i.e., duplication, edge deletion, edge addition, heterodimerization, and removal of isolated nodes. Different preference rules are discussed with the parameters obtained from real experimental or statistical data. The numerical simulation results show that the degree distribution, the average degree, the average clustering coefficient, the clustering coefficient C(k), and the degree-degree correlation of the constructed network model are all consistent with those of real PPI networks. It is believed that the constructed PPI network model can provide insights into the mechanism underlying the evolution of PPI networks.
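The degree-degree correlation used in Section 4 can be computed directly from an adjacency structure. The following is a minimal sketch under the assumption that DDC is the Pearson coefficient in its usual edge-based form, with sums running over the M links and $j_i$, $k_i$ the degrees at the two ends of link i. The function name is hypothetical, node labels are assumed to be mutually comparable (e.g., integers), and the coefficient is returned as NaN for networks in which all end degrees are equal.

```python
def degree_degree_correlation(adj):
    """Pearson correlation of the degrees at the two ends of every link."""
    m = 0
    sum_jk = 0.0
    sum_half = 0.0
    sum_half_sq = 0.0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # visit each undirected link exactly once
                j, k = len(adj[u]), len(adj[v])
                m += 1
                sum_jk += j * k
                sum_half += 0.5 * (j + k)
                sum_half_sq += 0.5 * (j * j + k * k)
    mean_sq = (sum_half / m) ** 2
    denom = sum_half_sq / m - mean_sq
    if denom == 0:
        return float("nan")  # undefined when every link joins equal degrees
    return (sum_jk / m - mean_sq) / denom
```

A star network, in which one hub is linked to several leaves, gives DDC = −1, the extreme disassortative case; a negative value for a simulated network indicates the same tendency in weaker form.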
[1] Yook, S. H., Oltvai, Z. N., and Barabasi, A. L. Functional and topological characterization of protein interaction networks. Proteomics, 4, 928-942 (2004)
[2] Uetz, P., Giot, L., Cagney, G., Mansfield, T. A., Judson, R. S., Knight, J. R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., Qureshi-Emili, A., Li, Y., Godwin, B., Conover, D., Kalbfleisch, T., Vijayadamodar, G., Yang, M., Johnston, M., Fields, S., and Rothberg, J. M. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403, 623-627 (2000)
[3] Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences, 98, 4277-4278 (2001)
[4] Güldener, U., Münsterkötter, M., Oesterheld, M., Pagel, P., Ruepp, A., Mewes, H. W., and Stümpflen, V. The MIPS protein interaction resource on yeast. Nucleic Acids Research, 34, 436-441 (2006)
[5] Li, S. M., Armstrong, C. M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P. O., Han, J. D. J., Chesneau, A., Hao, T., Goldberg, D. S., Li, N., Martinez, M., Rual, J. F., Lamesch, P., Xu, L., Tewari, M., Wong, S. L., Zhang, L. V., Berriz, G. F., Jacotot, L., Vaglio, P., Reboul, J., Hirozane-Kishikawa, T., Li, Q. R., Gabel, H. W., Elewa, A., Baumgartner, B., Rose, D. J., Yu, H. Y., Bosak, S., Sequerra, R., Fraser, A., Mango, S. E., Saxton, W. M., Strome, S., van den Heuvel, S., Piano, F., Vandenhaute, J., Sardet, C., Gerstein, M., Doucette-Stamm, L., Gunsalus, K. C., Harper, J. W., Cusick, M. E., Roth, F. P., Hill, D. E., and Vidal, M. A map of the interactome network of the Metazoan C. elegans. Science, 303, 540-543 (2004)
[6] Giot, L., Bader, J. S., Brouwer, C., Chaudhuri, A., Kuang, B., Li, Y., Hao, Y. L., Ooi, C. E., Godwin, B., Vitols, E., Vijayadamodar, G., Pochart, P., Machineni, H., Welsh, M., Kong, Y., Zerhusen, B., Malcolm, R., Varrone, Z., Collis, A., Minto, M., Burgess, S., McDaniel, L., Stimpson, E., Spriggs, F., Williams, J., Neurath, K., Ioime, N., Agee, M., Voss, E., Furtak, K., Renzulli, R., Aanensen, N., Carrolla, S., Bickelhaupt, E., Lazovatsky, Y., DaSilva, A., Zhong, J., Stanyon, C. A., Finley, R. L., Jr, White, K. P., Braverman, M., Jarvie, T., Gold, S., Leach, M., Knight, J., Shimkets, R. A., McKenna, M. P., Chant, J., and Rothberg, J. M. A protein interaction map of Drosophila melanogaster. Science, 302, 1727-1736 (2003)
[7] Stelzl, U., Worm, U., Lalowski, M., Haenig, C., Brembeck, F. H., Goehler, H., Stroedicke, M., Zenkner, M., Schoenherr, A., Koeppen, S., Timm, J., Mintzlaff, S., Abraham, C., Bock, N., Kietzmann, S., Goedde, A., Toksöz, E., Droege, A., Krobitsch, S., Korn, B., Birchmeier, W., Lehrach, H., and Wanker, E. E. A human protein-protein interaction network: a resource for annotating the proteome. Cell, 122, 957-968 (2005)
[8] Rual, J. F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G. F., Gibbons, F. D., Dreze, M., Ayivi-Guedehoussou, N., Klitgord, N., Simon, C., Boxem, M., Milstein, S., Rosenberg, J., Goldberg, D. S., Zhang, L. V., Wong, S. L., Franklin, G., Li, S., Albala, J. S., Lim, J., Fraughton, C., Llamosas, E., Cevik, S., Bex, C., Lamesch, P., Sikorski, R. S., Vandenhaute, J., Zoghbi, H. Y., Smolyar, A., Bosak, S., Sequerra, R., Doucette-Stamm, L., Cusick, M. E., Hill, D. E., Roth, F. P., and Vidal, M. Towards a proteome-scale map of the human protein-protein interaction network. Nature, 437, 1173-1178 (2005)
[9] Barabasi, A. L. and Oltvai, Z. N. Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 5, 101-113 (2004)
[10] Sole, R. V., Pastor-Satorras, R., and Smith, E. A model of large-scale proteome evolution. Advances in Complex Systems, 5, 43-54 (2002)
[11] Jeong, H., Mason, S. P., Barabási, A. L., and Oltvai, Z. N. Lethality and centrality in protein networks. Nature, 411, 41-42 (2001)
[12] Ravasz, E. and Barabasi, A. L. Hierarchical organization in complex networks. Physical Review E, 67, 026112 (2003)
[13] Williams, R. J., Martinez, N. D., and Berlow, E. L. Two degrees of separation in complex food webs. Science, 297, 1551-1555 (2002)
[14] Maslov, S. and Sneppen, K. Specificity and stability in topology of protein networks. Science, 296, 910-913 (2002)
[15] Hase, T., Niimura, Y., Kaminuma, T., and Tanaka, H. Non-uniform survival rate of heterodimerization links in the evolution of the yeast protein-protein interaction network. PLoS ONE, 3, e1667 (2008)
[16] Cai, S. M. Construction and Analysis of Batch Control and Biological Network Chaotic Systems and Complex Dynamical Networks (in Chinese), Ph.D. dissertation, Shanghai University, Shanghai (2012)
[17] Ohno, S. Evolution by Gene Duplication, Springer, Berlin (1970)
[18] Ispolatov, I., Krapivsky, P. L., and Yuryev, A. Duplication-divergence model of protein interaction network. Physical Review E, 71, 061911 (2005)
[19] Lynch, M. and Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science, 290, 1151-1155 (2000)
[20] Wagner, A. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Molecular Biology and Evolution, 18, 1283-1292 (2003)
[21] Li, W. H. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. Journal of Molecular Evolution, 36, 96-99 (1993)
[22] Prachumwat, A. and Li, W. H. Protein function, connectivity, and duplicability in yeast. Molecular Biology and Evolution, 23, 30-39 (2006)
[23] Costa, L. F., Rodrigues, F. A., Travieso, G., and Boas, P. R. V. Characterization of complex networks: a survey of measurements. Advances in Physics, 56, 167-242 (2007)