Prediction of microbe-drug association based on graph attention stacked autoencoder
Abstract:
Conventionally, new microbe-drug associations are discovered through biological experiments, which are time-consuming and very costly. To address this, GATSAE, a microbe-drug association prediction method based on a graph attention stacked autoencoder, is proposed. First, a heterogeneous network of microbes and drugs is built to enrich the association information. Second, a graph convolutional network (GCN) extracts multi-layer latent features to produce convolutional fusion matrices for microbes and drugs. Third, an improved stacked autoencoder learns unsupervised low-dimensional representations of meaningful high-order similarity features; graph convolution and an attention mechanism are added on top of the stacked autoencoder to further improve the extraction of high-order feature information. Finally, the low-dimensional features are concatenated with the association features, and a multi-layer perceptron (MLP) scores each candidate microbe-drug pair. In the performance evaluation, GATSAE reaches an area under the receiver operating characteristic curve (AUROC) of 0.9619 and an area under the precision-recall curve (AUPR) of 0.9577, outperforming classical machine learning methods and common deep learning methods. Case studies show that GATSAE accurately predicts candidate drugs related to SARS-CoV-2 and Escherichia coli, as well as candidate microbes related to aspirin.
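The first two steps of the abstract (heterogeneous network, then GCN feature extraction) can be illustrated with a minimal sketch. It is not the authors' implementation: the block layout of the adjacency matrix, the symmetric normalization, the toy similarity values, and all function names are assumptions chosen because they are the common textbook form of a GCN layer.

```python
import math

def build_heterogeneous_adjacency(assoc, microbe_sim, drug_sim):
    """Combine similarities and known associations into one block matrix.

    assoc[i][j] is 1 if microbe i is known to be associated with drug j.
    Assumed layout: [[microbe_sim, assoc], [assoc^T, drug_sim]].
    """
    n_m, n_d = len(assoc), len(assoc[0])
    size = n_m + n_d
    adj = [[0.0] * size for _ in range(size)]
    for i in range(n_m):
        for k in range(n_m):
            adj[i][k] = microbe_sim[i][k]
        for j in range(n_d):
            adj[i][n_m + j] = assoc[i][j]
            adj[n_m + j][i] = assoc[i][j]
    for j in range(n_d):
        for k in range(n_d):
            adj[n_m + j][n_m + k] = drug_sim[j][k]
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with added self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product for small nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(norm_adj, features, weights):
    """One propagation step: ReLU(norm_adj @ features @ weights)."""
    hidden = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]

# Toy data (hypothetical): 2 microbes, 3 drugs, zero self-similarity
# because the self-loop is added during normalization.
assoc = [[1, 0, 1], [0, 1, 0]]
microbe_sim = [[0.0, 0.5], [0.5, 0.0]]
drug_sim = [[0.0, 0.2, 0.1], [0.2, 0.0, 0.4], [0.1, 0.4, 0.0]]

adj = build_heterogeneous_adjacency(assoc, microbe_sim, drug_sim)
norm_adj = normalize_adjacency(adj)
features = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
weights = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7, -0.8], [0.9, 1.0]]
embeddings = gcn_layer(norm_adj, features, weights)  # 5 nodes x 2 dimensions
```

In this sketch the first two rows of `embeddings` belong to microbes and the last three to drugs. A trained model would learn `weights` rather than fix them, and would stack several such layers before the autoencoder stage.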
Table 1. Data records of the three datasets
Dataset      Drugs   Microbes   Known associations
MDAD           627        142                 1152
aBiofilm      1720        140                 2884
DrugVirus      175         95                  933
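The counts in Table 1 imply that known associations cover only a small share of all possible drug-microbe pairs. The short calculation below makes that explicit; the counts are copied from Table 1, while the function and variable names are illustrative assumptions.

```python
# Counts copied from Table 1: (drugs, microbes, known associations).
DATASETS = {
    "MDAD": (627, 142, 1152),
    "aBiofilm": (1720, 140, 2884),
    "DrugVirus": (175, 95, 933),
}

def association_density(n_drugs, n_microbes, n_known):
    """Fraction of all drug-microbe pairs that are known associations."""
    return n_known / (n_drugs * n_microbes)

densities = {name: association_density(*counts)
             for name, counts in DATASETS.items()}

for name, density in densities.items():
    print(f"{name}: {density:.2%} of possible pairs are known associations")
```

The densities fall between roughly 1% and 6%, so the association matrix on its own is very sparse, which is the situation the heterogeneous network is meant to enrich.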
Table 2. Group configuration for the comparison experiments
Group      GCN   Graph attention stacked autoencoder   Data concatenation
Group 1    Yes   No                                    No
Group 2    No    Yes                                   No
Group 3    No    No                                    Yes
Group 4    No    No                                    No
GATSAE     Yes   Yes                                   Yes
Table 3. Evaluation metrics of MLP and common classification models
Classifier   AUC      AUPR     Precision   Recall   F1
DT           0.9163   0.9164   0.9126      0.9203   0.9162
RF           0.9366   0.9309   0.9124      0.8915   0.9214
KNN          0.9489   0.9486   0.9041      0.9161   0.9098
SVM          0.8744   0.8469   0.8212      0.8516   0.8357
MLP          0.9619   0.9577   0.9166      0.9500   0.9329
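The F1 column in Table 3 is related to the precision and recall columns by the harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes it for each row. The values are copied from Table 3 and the names are illustrative. Reported F1 values need not match the recomputed ones exactly if they were averaged over folds or runs, which is an assumption, since the evaluation protocol is not described here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from Table 3.
TABLE_3 = {
    "DT": (0.9126, 0.9203, 0.9162),
    "RF": (0.9124, 0.8915, 0.9214),
    "KNN": (0.9041, 0.9161, 0.9098),
    "SVM": (0.8212, 0.8516, 0.8357),
    "MLP": (0.9166, 0.9500, 0.9329),
}

# Gap between the F1 recomputed from one (P, R) pair and the reported F1.
gaps = {}
for name, (precision, recall, reported) in TABLE_3.items():
    recomputed = f1_score(precision, recall)
    gaps[name] = recomputed - reported
    print(f"{name}: recomputed {recomputed:.4f}, reported {reported:.4f}")
```

For the MLP row the formula gives about 0.9330, in line with the reported 0.9329.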
Table 4. Comparison of evaluation metrics across models
Model       AUC      AUPR     Precision   Recall   F1
GCNMDA      0.9240   0.9124   0.8817      0.9257   0.9029
EGATMDA     0.9434   0.9335   0.9015      0.9213   0.9118
HGATDVA     0.9159   0.9008   0.8976      0.8989   0.8971
HNERMDA     0.8977   0.9026   0.8537      0.8430   0.8463
NIRBMMDA    0.8485   0.8327   0.8391      0.8388   0.8268
GATSAE      0.9619   0.9577   0.9166      0.9300   0.9329
Table 5. Top 10 drugs related to Escherichia coli
Rank   Drug                   PMID
1      Ceftizoxime            6299968
2      Aminosalicylic Acid    33468700
3      Citral                 35776056
4      Clozapine              25448498
5      Palmitic acid          29719215
6      Esculin                15137927
7      Azlocillin             7033199
8      Azidocillin            4563142
9      Aspirin                30658983
10     Glipizide              32995125
Table 6. Top 10 microbes related to aspirin
Rank   Microbe                         PMID
1      Candida albicans                33242673
2      Pseudomonas aeruginosa          25088031
3      Staphylococcus epidermidis      12555346
4      Human immunodeficiency virus    28480270
5      Streptococcus mutans            unconfirmed
6      Staphylococcus aureus           34692677
7      Mycobacterium tuberculosis      23997233
8      Escherichia coli                30658983
9      Clostridium perfringens         31865684
10     Human herpesvirus               unconfirmed
Table 7. Top 20 drugs related to SARS-CoV-2
Rank   Drug                  PMID
1      Chloroquine           35859449
2      ABT                   37414987
3      Favipiravir           33108587
4      BCX                   35062212
5      Luteolin              32389723
6      Amodiaquine           32486229
7      Cyclosporine          34081806
8      Emetine               33302852
9      Gemcitabine           32432977
10     Hydroxychloroquine    32373993
11     Amiodarone            36426888
12     Obatoclax             34989664
13     Remdesivir            33436624
14     Chlorpromazine        32773341
15     Nelfinavir            35390430
16     EIPA                  37632140
17     Arbidol               32955901
18     Niclosamide           35348204
19     Dasatinib             36704839
20     Eflornithine          34055746
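The AUROC and AUPR values reported in the abstract and in Tables 3 and 4 summarize how well predicted scores rank true associations above non-associations. The sketch below shows one common way to compute both from labels and scores. It assumes AUROC is computed as the pairwise ranking statistic and AUPR as step-wise average precision. The source does not state which estimator was used, and the labels and scores are made-up toy data.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Averages the precision measured at the rank of each positive.
    Assumes no tied scores.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos

# Toy example (hypothetical): 1 marks a known association, 0 a non-association.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
```

The same ranking view underlies the case-study tables: for a fixed microbe or drug, candidates are sorted by predicted score and the top entries are checked against the literature.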

