Finally, we further analyzed in detail the results for AD, showing that some well-ranked top genes were out of any known linkage interval associated with AD and still played a relevant role. shows the top-scoring genes for AD and the subnetwork induced by the interactions between their proteins. The 17 AD seeds (disease-gene associations from OMIM) and the 106 genes prioritized by NetCombo involved several protein complexes and signaling pathways such as the gamma-secretase complex, serine protease inhibitors, the cohesin complex, structural maintenance of chromosome (SMC) family, the short-chain dehydrogenases/reductases (SDR) family, adamalysin (ADAM) family, cytokine receptor family and Notch signaling pathway. Some genes within these families have been demonstrated to be involved in AD pathology –: ADAM10 (ADAM family), HSD17B10 (SDR family), and PSENEN, APH1A, APH1B, and NCSTN (gamma-secretase complex). It is worth mentioning that AD has been central to recent research efforts, but mechanisms underlying the disorder are still far from understood. The accumulation of senile plaques and neurofibrillary tangles is postulated as the main cause of the disease. The gamma-secretase is involved in the cleavage of the amyloid precursor protein. This process produces the amyloid beta peptide, the primary constituent of the senile plaques in AD. Interestingly, the six genes predicted by the method (pointed by arrows in ) were not associated with AD in OMIM. Remarkably, only APH1A (1q21–q22), and PSENEN (19q13.13) lied either under or close to a linkage interval associated with AD (i.e. 1q21, OMIM:611152; and 19q13.32, OMIM:107741) and none of the remaining four genes lied under or close to a known linkage interval associated with AD. Moreover, the subnetwork of top-ranking AD genes covered several genes in the expert curated data set reported by Krauthammer et al. such as APBB1, VLDLR, SERPINA1 and BACE1 (p-value associated with this event1.3e-3).
We investigated the prediction capacity and robustness of the approaches by testing their performance against the quality and number of interactions. Typically, network-based methods consider the paths between nodes equally relevant for a particular disease. The prioritization methods proposed in this study differ from others in the way the information is transferred through the network topology. NetShort considered a path between nodes shorter if it contained more seeds (known-disease gene associations) in comparison to other paths. NetScore accounted for multiple shortest paths between nodes. NetZcore assessed the biological significance of the neighborhood configuration of a node using an ensemble of networks in which nodes were swapped randomly but the topology of the original network was preserved. Our results demonstrated that combining different prioritization methods could exploit better the global topology of the network than existing methods. The prediction performance of the prioritization methods depended on the quality and size of the underlying interaction network. Yet, this dependence affected the performance of the methods similarly. The improvement of the network quality also improved the predictions for all methods. On the other hand, the prediction accuracy of the prioritization methods showed a large variation depending on the phenotype in consideration, but this variation was reduced when a consensus method was used (NetCombo).
Using disease-gene association information in OMIM data set and the proposed consensus prioritization method (NetCombo) on the human interactome, we calculated the disease-association scores of all genes in the network for Alzheimer's Disease (AD), diabetes and AIDS, three phenotypes with relatively high prevalence in the society. In order to check the validity of these scores, we used disease-gene associations from the Comparative Toxicogenomics Database (CTD) , the Genetic Association Database (GAD) and available expert curated data sets (see for details). Moreover, we analyzed the GO functional enrichment of the top-ranking genes.
The AUC values over dozens of different phenotypes that vary in number of initial gene-phenotype associations showed the applicability of the methods independent of the number of genes originally associated with the phenotype. Moreover, having more number of seeds associated with a pathophenotype did not necessarily improve the prediction accuracy. Most prioritization methods achieved better performance for disorders with low number of seeds. This difference in performance was significant for NetCombo, NetShort and Network propagation. In fact, the accuracy of the predictions was rather correlated with the average shortest path length between seeds, which shows the importance of the topology of the network.