Research

We explore the most advanced and captivating intersections of AI and bioinformatics.

Total Citations

Times our research has been cited by other scholars

h-index

Largest number h such that h papers have at least h citations each

i10-index

Papers with at least 10 citations
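Both indices can be computed from a list of per-paper citation counts. The following is a minimal Python sketch of the standard definitions. The counts in the usage lines are made up for illustration and are not the lab's figures.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


# Hypothetical citation counts, for illustration only
counts = [25, 18, 12, 10, 6, 3, 1, 0]
print(h_index(counts), i10_index(counts))  # 5 4
```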

Featured Research

Discover our latest AI breakthroughs and updates from the lab

PTMGPT2

Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model

Post-translational modification prediction

Published: 02/01/2024

SynergyGTN

Unlocking the therapeutic potential of drug combinations through synergy prediction using graph transformer networks

Drug combination prediction for cancer cell lines

Published: 02/01/2024

PUResNet

PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction

Ligand Binding Site Prediction

Published: 02/01/2024

Research Publications

Explore our contributions to AI research and development across various domains

2025

Therapeutic Potential of Curcuminoids in Type 2 Diabetes Mellitus (T2DM): Insights from Network Pharmacology, Molecular Docking, and Dynamics Simulations

03/19/2025

Authors: Ankit Pokhrel, Kil To Chong, Hilal Tayara

Journal: Food Bioscience

View Publication →

TranP-B-site: A Transformer Enhanced Method for prediction of binding sites of Protein-protein interactions

03/15/2025

Authors: Sharzil Haris Khan, Hilal Tayara, Kil To Chong

Journal: Measurement

Abstract: Protein-protein interactions (PPIs) govern essential biological processes and rely on specific binding sites for the molecular machinery of cells. Identifying these binding sites is crucial, and computational methods have emerged as efficient alternatives to labor-intensive experimental approaches. Various techniques leverage sequential and structural information about amino acids. However, protein structural data are of limited availability in databases, so sequence-based models are more practical. The proposed model, named TranP-B-site, applies a convolutional neural network to transformer-model embeddings of the amino acid sequence to predict the binding sites of PPIs. First, two types of features are extracted for each amino acid in a protein sequence: one-hot encoding representing the low-level features and transformer model-based embeddings, which contain information about the entire protein …
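TranP-B-site fuses two per-residue feature types before its CNN: one-hot encodings and transformer embeddings. The sketch below shows only that feature-fusion step. It assumes the embeddings arrive as one row per residue. The embedding values are placeholders, not output from any real protein language model, and the CNN itself is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """One 20-dim one-hot vector per residue (all zeros for non-standard residues)."""
    vectors = []
    for residue in sequence.upper():
        vec = [0.0] * len(AMINO_ACIDS)
        if residue in INDEX:
            vec[INDEX[residue]] = 1.0
        vectors.append(vec)
    return vectors


def fuse_features(sequence, embeddings):
    """Concatenate one-hot features with per-residue embeddings (one row per residue)."""
    if len(embeddings) != len(sequence):
        raise ValueError("need exactly one embedding row per residue")
    return [oh + list(emb) for oh, emb in zip(one_hot(sequence), embeddings)]


# Hypothetical 4-dim embeddings standing in for transformer output
seq = "MKVL"
fake_embeddings = [[0.1, 0.2, 0.3, 0.4]] * len(seq)
features = fuse_features(seq, fake_embeddings)
print(len(features), len(features[0]))  # 4 24
```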

View Publication →

Transforming Highway Safety With Autonomous Drones and AI: A Framework for Incident Detection and Emergency Response

03/11/2025

Authors: Muhammad Farhan, Hassan Eesaar, Afaq Ahmed, Kil To Chong, Hilal Tayara

Journal: IEEE Open Journal of Vehicular Technology

Abstract: Highway accidents pose serious challenges and safety risks and often result in severe injuries and fatalities because of delayed detection and response. Traditional accident management methods rely heavily on manual reporting, which can be inefficient and error-prone and can cost lives. This paper proposes a novel framework that integrates autonomous aerial systems (drones) with advanced deep learning models to enhance real-time accident detection and response. The system dispatches drones, provides live accident footage and accident identification, and helps coordinate the emergency response. In this study, we implemented our system in the Gazebo simulation environment. An autonomous drone navigates to a specified location based on navigation commands that a large language model (LLM) generates from the emergency call or its transcript. Additionally, we created a dedicated accident dataset to train a YOLOv11m model for precise accident detection. At the accident location, the drone provides a live video feed in which our YOLO model detects the incident. Moondream2, a vision-language model (VLM), analyzes the high-resolution images captured after detection and generates detailed textual descriptions of the scene. The LLM GPT-4 Turbo then refines these descriptions into concise incident reports and actionable suggestions. This end-to-end system combines autonomous navigation, incident detection, and incident response, and it shows potential as a scalable and efficient solution for incident response management. The initial implementation, validated through Gazebo simulation, demonstrates promising results and accuracy. Future work will focus on implementing this framework on hardware for real-world deployment in highway incident management.

View Publication →

Enhancing DILI toxicity prediction through integrated graph attention (GATNN) and dense neural networks (DNN)

03/01/2025

Authors: Agung Surya Wibowo, Kil To Chong, Hilal Tayara

Journal: Toxicology

Abstract: Drug-induced liver injury (DILI) is a condition in which drugs damage the liver. Predicting this toxicity is crucial in the drug development process to ensure that drugs are safe. Assessment is usually carried out in conventional laboratories, which is costly in both materials and time. To help solve this problem, computational methods are used to predict DILI toxicity in compounds and drugs. Many researchers have developed such models using various molecular datasets, with drugs or compounds represented as Simplified Molecular Input Line Entry System (SMILES) strings. In this work, we propose a modified model that uses a reliable dataset from previous work. We reproduced the best previous model and combined it with a graph attention neural network. The proposed model outperformed almost all …

View Publication →

CpGFuse: a holistic approach for accurate identification of methylation states of DNA CpG sites

02/19/2025

Authors: Sehi Park, Kil To Chong, Hilal Tayara

Journal: Briefings in Bioinformatics

Abstract: Anomalous DNA methylation has wide-ranging implications, spanning from neurological disorders to cancer and cardiovascular complications. Current methods for single-cell DNA methylation analysis face limitations in coverage, leading to information loss and hampering our understanding of disease associations. The primary goal of this study is the imputation of CpG site methylation states in a given cell by leveraging the CpG states of other cells of the same type. To address this, we introduce CpGFuse, a novel methodology that combines information from diverse genomic features. Leveraging two benchmark datasets, we employed a careful preprocessing approach and conducted a comprehensive ablation study to assess the individual and collective contributions of DNA sequence, intercellular, and intracellular features. Our proposed model, CpGFuse, employs a convolutional neural network with an …

View Publication →

Analysis of Ruddlesden‐Popper and Dion‐Jacobson 2D Lead Halide Perovskites Through Integrated Experimental and Computational Analysis

02/17/2025

Authors: Basir Akbar, Kil To Chong, Hilal Tayara

Journal: Battery Energy

Abstract: Two-dimensional (2D) lead halide perovskites (LHPs) have attracted considerable interest for advancing state-of-the-art optoelectronic devices, highly efficient solar cells, and next-generation energy harvesting technologies. This interest stems from their hydrophobic nature, layered configuration, and remarkable chemical and environmental stability. Based on their layered configuration, 2D LHPs are categorized into Dion-Jacobson (DJ) and Ruddlesden-Popper (RP) systems. Machine learning (ML) techniques are needed to classify RP and DJ phases efficiently and to reduce reliance on trial-and-error synthesis. This work identifies the RP and DJ phases of 2D LHPs by implementing various ML models. The models were trained on 264 experimental data points using 10-fold stratified cross-validation and hyperparameter optimization with Optuna, and Shapley Additive Explanations (SHAP) were also employed. The stacking classifier classified RP and DJ phases efficiently. It showed minimal variation between sensitivity and specificity and achieved a high balanced accuracy (BA) of 0.83 on the independent test dataset. We also tested our best model on 17 hybrid 2D LHPs and three experimentally synthesized 2D LHPs, and its predictions aligned well with experimental outcomes, a significant advance for cutting-edge ML models. Thus, this study opens a new route toward the rational classification of RP and DJ phases of 2D LHPs.
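Balanced accuracy is the mean of sensitivity and specificity, so it stays informative when one phase outnumbers the other. The following minimal sketch computes it from binary labels. The labels are hypothetical (1 = RP, 0 = DJ) and are not taken from the study's data.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """(sensitivity + specificity) / 2 for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical labels: 1 = RP phase, 0 = DJ phase
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
ba = balanced_accuracy(y_true, y_pred)  # (0.75 + 4/6) / 2 = 17/24
```

A small gap between sensitivity and specificity, as the abstract reports, means the classifier is not favoring one phase over the other.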

View Publication →

Exploring Nigella sativa anticancerous properties using network pharmacology, molecular docking and molecular dynamics simulation approach for non-small cell lung cancer

01/01/2025

Authors: Chandra Sourav, Kil To Chong, Hilal Tayara

Journal: Food Bioscience

Abstract: Current treatments for non-small cell lung cancer (NSCLC) often lead to side effects and impose financial burdens on patients, necessitating the exploration of alternative therapies using natural compounds. This study aims to identify specific active compounds within black cumin seeds that could effectively manage NSCLC through network pharmacology and molecular docking techniques. Active compounds and their corresponding NSCLC targets were retrieved and screened from various databases. Protein-protein interaction (PPI) networks were developed, and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted to identify core targets. The research identified 19 active compounds in black cumin seeds that may exert anti-cancer effects by modulating proteins such as MAPK3, STAT3, and ALB. Among these compounds, Catechin, Riboflavin, and …

View Publication →

2024

From Detection to Action: A Multimodal AI Framework for Traffic Incident Response

12/09/2024

Authors: Afaq Ahmed, Muhammad Farhan, Hassan Eesaar, Kil To Chong, Hilal Tayara

Journal: Drones

Abstract: With the rising incidence of traffic accidents and growing environmental concerns, the demand for advanced systems to ensure traffic and environmental safety has become increasingly urgent. This paper introduces an automated highway safety management framework that integrates computer vision and natural language processing for real-time monitoring, analysis, and reporting of traffic incidents. The system not only identifies accidents but also aids in coordinating emergency responses, such as dispatching ambulances, fire services, and police, while simultaneously managing traffic flow. The approach begins with the creation of a diverse highway accident dataset, combining public datasets with drone and CCTV footage. YOLOv11s is retrained on this dataset to enable real-time detection of critical traffic elements and anomalies, such as collisions and fires. A vision–language model (VLM), Moondream2, is employed to generate detailed scene descriptions, which are further refined by a large language model (LLM), GPT-4 Turbo, to produce concise incident reports and actionable suggestions. These reports are automatically sent to relevant authorities, ensuring prompt and effective response. The system’s effectiveness is validated through the analysis of diverse accident videos and zero-shot simulation testing within the Webots environment. The results highlight the potential of combining drone and CCTV imagery with AI-driven methodologies to improve traffic management and enhance public safety. Future work will include refining detection models, expanding dataset diversity, and deploying the framework in real-world scenarios using …

View Publication →

Advanced drone-based weed detection using feature-enriched deep learning approach

12/03/2024

Authors: Mobeen Ur Rehman, Hassan Eesaar, Zeeshan Abbas, Lakmal Seneviratne, Irfan Hussain, Kil To Chong

Journal: Knowledge-Based Systems

Abstract: This research addresses the pressing challenge of weed identification in agriculture, which is crucial for ensuring food security as the global population is expected to exceed 9.7 billion by 2050. We collected a dataset of drone imagery and proposed a customized model to achieve optimal performance. Our proposed model uses strategically modified backbone, neck, and head components, leveraging elements such as Ghost Convolution, BottleNeckCSP, and ECA (Efficient Channel Attention) layers. These modifications enhance the model’s ability to discern intricate patterns in drone imagery and improve precision in weed detection. We introduce a purposefully crafted dataset to complement the model’s training, and our experiments demonstrate superior performance compared to the baseline models. Our model achieves a precision of 72.5%, recall of 68.0%, and mAP@0.5 of 73.9%.
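An ECA layer pools each channel globally and runs a small 1-D convolution across neighboring channels. It then rescales the feature map with the sigmoid of the result. The following NumPy sketch follows the ECA-Net formulation, including its adaptive kernel-size rule with the paper's defaults γ = 2 and b = 1. It is not the authors' implementation. The kernel weights are placeholders; in a trained network they are learned.

```python
import math

import numpy as np


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size: |log2(C)/gamma + b/gamma| rounded to an odd integer."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1


def eca(feature_map, weights):
    """Efficient Channel Attention on a (C, H, W) feature map.

    weights: odd-length 1-D kernel shared by all channels.
    """
    c = feature_map.shape[0]
    pooled = feature_map.mean(axis=(1, 2))           # global average pooling -> (C,)
    k = len(weights)
    padded = np.pad(pooled, k // 2)                  # zero padding keeps output length C
    conv = np.array([padded[i:i + k] @ weights for i in range(c)])  # local cross-channel mixing
    gate = 1.0 / (1.0 + np.exp(-conv))               # sigmoid gate in (0, 1)
    return feature_map * gate[:, None, None]         # rescale each channel


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8, 8))   # hypothetical 64-channel feature map
k = eca_kernel_size(64)           # 3 for 64 channels
w = np.full(k, 1.0 / k)           # placeholder weights
y = eca(x, w)
```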

View Publication →

PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction

12/01/2024

Authors: Jeevan Kandel, Palistha Shrestha, Hilal Tayara, Kil To Chong

Journal: Journal of Cheminformatics

Abstract: Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom (DCA) success rate and a 74.7% F1 Score on the Holo801 dataset, outperforming existing methods. However, its performance in specific cases, such as RNA, DNA, peptide-like ligand, and ion binding site prediction, was limited due to constraints in our training data. Our findings underscore the potential of sparse representation in LBSP, especially for oligomeric structures, suggesting PUResNetV2.0 as a promising tool for computational drug discovery.
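The DCA criterion counts a prediction as a success when the predicted pocket center lies within a distance threshold of the nearest ligand atom. The success rate is the fraction of proteins that meet it. The following minimal sketch assumes the commonly used 4 Å threshold. The coordinates are made up for illustration.

```python
import math


def dca_success(pred_center, ligand_atoms, threshold=4.0):
    """True if the predicted pocket center is within `threshold` Å of any ligand atom."""
    return min(math.dist(pred_center, atom) for atom in ligand_atoms) <= threshold


def dca_success_rate(predictions, ligands, threshold=4.0):
    """Fraction of proteins whose predicted center passes the DCA criterion."""
    hits = sum(dca_success(p, lig, threshold) for p, lig in zip(predictions, ligands))
    return hits / len(predictions)


# Hypothetical example: two proteins, one predicted center per protein
centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand_coords = [[(1.0, 1.0, 1.0), (5.0, 5.0, 5.0)], [(0.0, 0.0, 0.0)]]
rate = dca_success_rate(centers, ligand_coords)  # 0.5: first hit (~1.73 Å), second miss (10 Å)
```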

View Publication →

Transformer-Enhanced Retinal Vessel Segmentation for Diabetic Retinopathy Detection Using Attention Mechanisms and Multi-Scale Fusion

11/15/2024

Authors: Hyung-Joo Kim, Hassan Eesaar, Kil To Chong

Journal: Applied Sciences

View Publication →

GATNM: Graph with Attention Neural Network Model for Mycobacterial Cell Wall Permeability of Drugs and Drug-like Compounds

11/15/2024

Authors: Agung Surya Wibowo, Osphanie Mentari Primadianti, Hilal Tayara, Kil To Chong

Journal: Chemometrics and Intelligent Laboratory Systems

Abstract: The Mycobacterium tuberculosis cell wall is complex and unusually organized. This makes it difficult for nutrients and antibiotics to penetrate the wall, which contributes to the low activity of several antimycobacterial drugs in mycobacterial cells. Predicting the cell wall permeability of compounds is therefore important and would help in developing novel antitubercular drugs. Recently, many computational prediction methods have taken drug compounds as Simplified Molecular Input Line Entry System (SMILES) input. In this study, we applied computational methods to predict the cell wall permeability of compounds and drugs. We evaluated several common machine learning models for their ability to predict cell wall permeability. However, none of these models achieved satisfactory performance. We investigated a Graph with Attention Neural Network …
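In a graph attention layer, each atom (node) computes softmax-normalized attention weights over its bonded neighbors and then takes a weighted sum of their projected features. The following single-head NumPy sketch follows the original GAT formulation, with a LeakyReLU slope of 0.2 and self-loops included. It is not the GATNM code. The toy graph, features, and weights are random placeholders, and the output is shown before any final nonlinearity.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(H, A, W, a):
    """Single-head graph attention layer.

    H: (N, F) node features, A: (N, N) adjacency (1 = bond),
    W: (F, F') projection, a: (2F',) attention vector.
    """
    Z = H @ W                                   # project node features
    fp = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]) for all node pairs at once
    e = leaky_relu((Z @ a[:fp])[:, None] + (Z @ a[fp:])[None, :])
    mask = (A + np.eye(len(A))) > 0             # neighbors plus self-loop
    e = np.where(mask, e, -np.inf)              # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over neighbors
    return alpha @ Z, alpha


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # toy 3-atom chain, e.g. C-C-O
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 8))
a = rng.normal(size=16)
out, alpha = gat_layer(H, A, W, a)
print(out.shape)  # (3, 8)
```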

View Publication →

iAnOxPep: a machine learning model for the identification of anti-oxidative peptides using ensemble learning

11/11/2024

Authors: Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Abstract: Due to their safety, high activity, and plentiful sources, antioxidant peptides, particularly those derived from food, are considered promising alternatives to synthetic antioxidants in the fight against free radical-mediated illnesses. The lengthy and laborious trial-and-error method for identifying antioxidative peptides (AOPs) has raised interest in computational methods. Two state-of-the-art AOP predictors exist, but their restrictions on peptide sequence length make them impractical. A novel predictor that overcomes this limitation could therefore be useful for AOP prediction. The method has been trained, tested, and evaluated on two datasets: a balanced one and an unbalanced one. We used seven different descriptors and five machine learning (ML) classifiers to construct 35 baseline models. Five ML classifiers were further trained to create five meta-models using the

View Publication →

m5C-Seq: Machine learning-enhanced profiling of RNA 5-methylcytosine modifications

11/01/2024

Authors: Zeeshan Abbas, Mobeen Ur Rehman, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: Epigenetic modifications, particularly RNA methylation and histone alterations, play a crucial role in heredity, development, and disease. Among these, RNA 5-methylcytosine (m5C) is the most prevalent RNA modification in mammalian cells, essential for processes such as ribosome synthesis, translational fidelity, mRNA nuclear export, turnover, and translation. The increasing volume of nucleotide sequences has led to the development of machine learning-based predictors for m5C site prediction. However, these predictors often face challenges related to training data limitations and overfitting due to insufficient external validation. This study introduces m5C-Seq, an ensemble learning approach for RNA modification profiling, designed to address these issues. m5C-Seq employs a meta-classifier that integrates 15 probabilities generated from a novel, large dataset using systematic encoding methods to make final predictions. Demonstrating superior performance compared to existing predictors, m5C-Seq represents a significant advancement in accurate RNA modification profiling.

View Publication →

FvFold: A model to predict antibody Fv structure using protein language model with residual network and Rosetta minimization

11/01/2024

Authors: Pasang Sherpa, Kil To Chong, Hilal Tayara

Journal: Computers in Biology and Medicine

Abstract: The immune system depends on antibodies (Abs) to recognize and bind a wide range of antigens, which makes them pivotal to immunity. Precise prediction of the variable fragment (Fv) region of antibodies is vital for therapeutic and commercial applications, particularly in the treatment of diseases such as cancer. Deep learning models for accurate antibody structure prediction exist, but challenges persist, particularly in modeling complementarity-determining regions (CDRs) and overall antibody Fv structures. We introduce FvFold, a deep learning model that harnesses the ProtT5-XL-UniRef50 protein language model to predict accurate antibody Fv structures. We evaluated our model on the RosettaAntibody, Therapeutic, and IgFold benchmarks. On these benchmarks, it outperforms the previous top-performing model, achieving lower Root Mean Square Deviation (RMSD) in almost all loops and lower Orientational Coordinate Distance (OCD) values.
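RMSD is the square root of the mean squared distance between paired atoms of a predicted and a reference structure. The following minimal sketch computes it. It assumes the two structures are already superimposed and that atoms are paired by index. The alignment step, for example the Kabsch algorithm, is not shown. The coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy loop of two atoms: the second predicted atom is 2 Å off
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
value = rmsd(predicted, reference)  # sqrt((0 + 4) / 2) = sqrt(2)
```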

View Publication →

Possum: identification and interpretation of potassium ion inhibitors using probabilistic feature vectors

10/22/2024

Authors: Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Journal: Archives of Toxicology

Abstract: The flow of potassium ions through cell membranes plays a crucial role in facilitating various cell processes such as hormone secretion, epithelial function, maintenance of electrochemical gradients, and electrical impulse formation. Potassium ion inhibitors are considered promising alternatives in treating cancer, muscle weakness, renal dysfunction, endocrine disorders, impaired cellular function, and cardiac arrhythmia. Thus, it becomes essential to identify and understand potassium ion inhibitors in order to regulate the ion flow across ion channels. In this study, we created a meta-model, POSSUM, for the identification of potassium ion inhibitors. Two distinct datasets were used for training, testing, and evaluation of the meta-model. We employed seven feature descriptors and five distinctive classifiers to construct 35 baseline models. We used the mean Gini index score to select the optimal base models and classifiers. The POSSUM method was trained on the optimal probabilistic feature vectors. The proposed optimal model, POSSUM, outperforms the baseline models and the existing methods on both datasets. We anticipate POSSUM will be a very useful tool and will be essential in the process of finding and screening possible potassium ion inhibitors.
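The abstract describes 35 baseline models (seven descriptors × five classifiers). Their outputs feed a probabilistic feature vector for the final model. The following sketch shows how such a vector could be assembled for one sample. The descriptor and classifier names are placeholders. The `predict_proba` callable is a hypothetical stand-in for the trained baselines. The Gini-based model selection and the meta-model are not shown.

```python
from itertools import product

# Placeholder names; the abstract gives only the counts (7 descriptors x 5 classifiers)
DESCRIPTORS = ["D1", "D2", "D3", "D4", "D5", "D6", "D7"]
CLASSIFIERS = ["C1", "C2", "C3", "C4", "C5"]
BASELINES = list(product(DESCRIPTORS, CLASSIFIERS))  # 35 (descriptor, classifier) pairs


def probabilistic_feature_vector(sample, predict_proba):
    """Stack each baseline's positive-class probability into one 35-dim vector.

    predict_proba(descriptor, classifier, sample) is a hypothetical callable
    returning P(inhibitor) from the corresponding trained baseline model.
    """
    return [predict_proba(d, c, sample) for d, c in BASELINES]


# Dummy predictor standing in for trained models
pfv = probabilistic_feature_vector("ACDK", lambda d, c, s: 0.5)
```

Feeding probabilities rather than raw descriptors to the meta-model keeps its input small and fixed-length, whatever the dimensionality of each descriptor.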

View Publication →

GGAS2SN: Gated Graph and SmilesToSeq Network for Solubility Prediction

10/10/2024

Authors: Waqar Ahmad, Kil To Chong, Hilal Tayara

Journal: Journal of Chemical Information and Modeling

Abstract: Aqueous solubility is a critical physicochemical property in drug discovery. Solubility is a key issue in pharmaceutical development because it can limit a drug’s absorption capacity. Accurate solubility prediction is crucial for pharmacological, environmental, and drug development studies. This research introduces a novel method for solubility prediction by combining gated graph neural networks (GGNNs) and graph attention neural networks (GATs) with Smiles2Seq encoding. Our methodology involves converting chemical compounds into graph structures with nodes representing atoms and edges indicating chemical bonds. These graphs are then processed by using a specialized graph neural network (GNN) architecture. Incorporating attention mechanisms into GNN allows for capturing subtle structural dependencies, fostering improved solubility predictions. Furthermore, we utilized the Smiles2Seq encoding technique to bridge the semantic gap between molecular structures and their textual representations. Smiles2Seq seamlessly converts chemical notations into numeric sequences, facilitating the efficient transfer of information into our model. We demonstrate the efficacy of our approach through comprehensive experiments on benchmark solubility data sets, showcasing superior predictive performance compared to traditional methods. Our model outperforms existing solubility prediction models and provides interpretable insights into the molecular features driving solubility behavior. This research signifies an important advancement in solubility prediction, offering potent tools for drug discovery, formulation development, and environmental assessments. The fusion of GGNN and Smiles2Seq encoding establishes a robust framework for accurately forecasting solubility across various chemical compounds, fostering innovation in various domains reliant on solubility data.
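Smiles2Seq is described as converting chemical notation into numeric sequences. The sketch below shows one generic way to do this. It is not the authors' Smiles2Seq: the tokenizer regex (two-letter halogens, bracket atoms, then single characters) and the padding scheme are simplifying assumptions.

```python
import re

# Assumed, simplified tokenizer: Cl/Br, bracket atoms like [NH4+], else one character
TOKEN_RE = re.compile(r"Cl|Br|\[[^\]]+\]|.")


def tokenize(smiles):
    return TOKEN_RE.findall(smiles)


def build_vocab(smiles_list):
    """Map each token to an integer id; 0 is reserved for padding."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Integer sequence truncated/padded to max_len (raises KeyError on unseen tokens)."""
    ids = [vocab[tok] for tok in tokenize(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


vocab = build_vocab(["CCO", "c1ccccc1Cl"])  # ethanol and chlorobenzene
ids = encode("CCO", vocab, max_len=5)        # [2, 2, 4, 0, 0]
```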

View Publication →

A graph neural network approach for predicting drug susceptibility in the human microbiome

09/01/2024

Authors: Maryam, Mobeen Ur Rehman, Irfan Hussain, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: Recent studies have illuminated the critical role of the human microbiome in maintaining health and influencing the pharmacological responses of drugs. Clinical trials, encompassing approximately 150 drugs, have unveiled interactions with the gastrointestinal microbiome, resulting in the conversion of these drugs into inactive metabolites. It is imperative to explore the field of pharmacomicrobiomics during the early stages of drug discovery, prior to clinical trials. To achieve this, the utilization of machine learning and deep learning models is highly desirable. In this study, we have proposed graph-based neural network models, namely GCN, GAT, and GINCOV models, utilizing the SMILES dataset of drug microbiome. Our primary objective was to classify the susceptibility of drugs to depletion by gut microbiota. Our results indicate that the GINCOV model surpassed the other models, achieving an accuracy of 93% on the test dataset. This proposed Graph Neural Network (GNN) model offers a rapid and efficient method for screening drugs susceptible to gut microbiota depletion and also encourages the improvement of patient-specific dosage responses and formulations.
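We assume that GINCOV refers to a graph isomorphism network built from GINConv-style layers; the abstract does not define the name. Under that assumption, each layer adds a node's own features, scaled by (1 + ε), to the sum of its neighbors' features and passes the result through an MLP. The following NumPy sketch shows one such layer with a sum readout for graph-level classification. The toy graph and weights are random placeholders.

```python
import numpy as np


def gin_layer(H, A, W1, W2, eps=0.0):
    """One GIN layer: h_v' = MLP((1 + eps) * h_v + sum of neighbor features).

    H: (N, F) node features, A: (N, N) adjacency without self-loops.
    The MLP is a hypothetical two-layer ReLU network with weights W1, W2.
    """
    aggregated = (1.0 + eps) * H + A @ H        # sum aggregation over neighbors
    hidden = np.maximum(0.0, aggregated @ W1)   # ReLU
    return hidden @ W2


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy 3-atom chain
H = rng.normal(size=(3, 4))
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 8))
node_out = gin_layer(H, A, W1, W2)
graph_embedding = node_out.sum(axis=0)  # sum readout -> one vector per molecule
```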

View Publication →

In Silico Exploration of Novel EGFR Kinase Mutant-Selective Inhibitors Using a Hybrid Computational Approach

08/23/2024

Authors: Md Ali Asif Noor, Md Mazedul Haq, Md Arifur Rahman Chowdhury, Hilal Tayara, HyunJoo Shim, Kil To Chong

Journal: Pharmaceutics

Abstract: Targeting epidermal growth factor receptor (EGFR) mutants is a promising strategy for treating non-small cell lung cancer (NSCLC). This study focused on the computational identification and characterization of potential EGFR mutant-selective inhibitors using pharmacophore design and validation by deep learning, virtual screening, ADMET (absorption, distribution, metabolism, excretion, and toxicity) analysis, and molecular docking-dynamics simulations. A pharmacophore model was generated using Pharmit based on the potent inhibitor JBJ-125, which targets mutant EGFR (PDB 5D41). This model was used to virtually screen the ZINC database. In total, 16 hits were retrieved from 13,127,550 molecules and 122,276,899 conformers. The pharmacophore model was validated with DeepCoy, which generated 100 inactive decoy structures for each active molecule. ADMET tests were conducted using SwissADME and ProTox 3.0. Filtered compounds underwent molecular docking studies using Glide, which revealed promising interactions with the EGFR allosteric site along with better docking scores. Molecular dynamics (MD) simulations confirmed the stability of the docked conformations. These results highlight five novel compounds that can be evaluated as single agents or in combination with existing therapies and that hold promise for treating EGFR-mutant NSCLC.

View Publication →

Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model

08/07/2024

Authors: Palistha Shrestha, Jeevan Kandel, Hilal Tayara, Kil To Chong

Journal: Nature Communications

Abstract: Post-translational modifications (PTMs) are pivotal in modulating protein functions and influencing cellular processes like signaling, localization, and degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that utilizes prompt-based fine-tuning to improve its accuracy in precisely predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts unsupervised learning to identify PTMs. It utilizes a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model’s final decoder layer to elucidate sequence motifs essential for molecular recognition and analyze the effects of mutations at or near PTM sites to offer deeper insights into protein functionality. Comparative assessments reveal that PTMGPT2 outperforms existing methods across 19 PTM types, underscoring its potential in identifying disease associations and drug targets.

View Publication →

Toxicity Prediction for Immune Thrombocytopenia Caused by Drugs Based on Logistic Regression with Feature Importance

08/01/2024

Authors: Osphanie Mentari, Muhammad Shujaat, Hilal Tayara, Kil To Chong

Journal: Current Bioinformatics

Abstract: One of the problems in drug discovery that artificial intelligence can help solve is toxicity prediction. In drug-induced immune thrombocytopenia, toxicity can appear in patients after five to ten days as significant bleeding caused by drug-dependent antibodies. In clinical trials, when this condition occurs, all drugs taken by the patient should be stopped. This is sometimes not possible, especially for older patients who depend on their medication. Therefore, predicting toxicity in drug-induced immune thrombocytopenia is very important. Computational technologies such as machine learning can predict toxicity more efficiently than empirical techniques owing to their lower cost and faster processing. Objective: Previous studies used the KNN method, but the performance of these approaches needs to be improved. This study proposes a logistic regression model to improve accuracy. Methods: In this study, we present a new machine learning model for drug-induced immune thrombocytopenia. Our model extracts several features from Simplified Molecular Input Line Entry System (SMILES) strings. These features were fused and cleaned, and the most important features were selected using the SelectKBest method. The model uses logistic regression optimized and tuned by grid search cross-validation. Results: The highest accuracy, 80%, was obtained with a combination of features from PaDEL, CDK, RDKit, Mordred, and BlueDesc. Conclusion: Our proposed model outperforms previous studies in accuracy.
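The abstract names SelectKBest feature selection, logistic regression, and grid search cross-validation. Scikit-learn provides all three, and they can be chained so that feature selection is refit inside every CV fold. The following sketch uses those real scikit-learn classes. Synthetic data from `make_classification` stands in for the SMILES-derived descriptors. The grids for `k` and `C`, the ANOVA F-test scorer, and the 5-fold split are hypothetical choices, not the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for fused molecular descriptors (not real toxicity data)
X, y = make_classification(n_samples=200, n_features=50, n_informative=8, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),  # keep the k highest-scoring features
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"select__k": [5, 10, 20], "clf__C": [0.1, 1.0, 10.0]}  # hypothetical grid

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Placing SelectKBest inside the pipeline keeps the cross-validation estimate honest: features are chosen from each training fold only, never from the held-out fold.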

View Publication →

NaII-Pred: An ensemble-learning framework for the identification and interpretation of sodium ion inhibitors by fusing multiple feature representation

08/01/2024

Authors: Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: High-affinity ligand peptides for ion channels are essential for controlling the flow of ions across the plasma membrane. These peptides are now being investigated as potential therapeutics for a variety of illnesses, including cancer and cardiovascular disease. Identifying and interpreting ligand peptide inhibitors that control ion flow across cells has therefore become a pivotal area of exploration. In this work, we developed an ensemble-based model, NaII-Pred, for the identification of sodium ion inhibitors. The ensemble model was trained, tested, and evaluated on three different datasets. The NaII-Pred method employs six different descriptors and a hybrid feature set in conjunction with five conventional machine learning classifiers to create 35 baseline models. Through an ensemble approach, the top five baseline models trained on the hybrid feature set were integrated to yield the final predictive model, NaII-Pred. Our proposed model, NaII-Pred, outperforms the baseline models and the current predictors on both datasets. We believe NaII-Pred will play a critical role in screening and identifying potential sodium ion inhibitors and will be an invaluable tool.

View Publication →

Generative AI in the Advancement of Viral Therapeutics for Predicting and Targeting Immune-Evasive SARS-CoV-2 Mutations

07/23/2024

Authors: Prem Singh Bist, Hilal Tayara, Kil To Chong

Journal: IEEE Journal of Biomedical and Health Informatics

Abstract: The emergence of immune-evasive mutations in the SARS-CoV-2 spike protein consistently challenges existing vaccines and therapies, making precise prediction of their escape potential a critical imperative. Artificial intelligence (AI) holds great promise for deciphering the intricate language of proteins. Here, we employed a generative adversarial network to decipher the hidden escape pathways within the spike protein by generating spikes that closely resemble natural ones. Through comprehensive analysis, we demonstrated that the generated sequences capture natural escape characteristics. Moreover, incorporating these sequences into an AI-based escape prediction model significantly enhanced its performance, achieving a 7% increase in detecting natural escape mutations on the experimentally validated Greaney dataset. Similar improvements were observed on other datasets, demonstrating the model's generalizability. Precisely predicting immune-evasive spikes not only enables the design of strategically targeted therapies but also has the potential to expedite future viral therapeutics. This breakthrough carries profound implications for shaping a more resilient future against viral threats.

View Publication →

AntiCPs-CompML: A Comprehensive Fast Track ML method to predict Anti-Corona Peptides

06/27/2024

Authors: Prem Singh Bist, Sadik Bhattarai, Hilal Tayara, Kil To Chong

Journal: Cold Spring Harbor Laboratory

Abstract: This work introduces AntiCPs-CompML, a novel machine learning framework for the rapid identification of anti-coronavirus peptides (ACPs). ACPs, which act as viral shields, offer immense potential for COVID-19 therapeutics. However, traditional laboratory methods for ACP discovery are slow and expensive. AntiCPs-CompML addresses this challenge by using three primary feature types for peptide sequence analysis: amino acid composition (AAC), pseudo amino acid composition (PAAC), and composition-transition-distribution (CTD). The framework leverages 26 different machine learning algorithms to predict potential anti-coronavirus peptides. This capability allows vast datasets to be analyzed and peptides with the hallmarks of effective ACPs to be identified. AntiCPs-CompML offers high speed and cost-effectiveness, accelerating the discovery process and improving research efficiency by filtering out less promising candidates. This method holds promise for developing therapeutic drugs for COVID-19 and potentially other viruses. On the independent test dataset, our model achieves an F1 score of 92.12% and a ROC AUC of 76%. Despite these promising results, we are continuing to refine the model and explore its generalizability to unseen datasets. Future enhancements will include feature-based and oversampling augmentation strategies to address the limited availability of anti-COVID peptide data, along with explicit feature selection algorithms, to further improve the model's predictive power. AntiCPs-CompML aims to expedite anti-COVID peptide discovery and accelerate the development of novel antiviral therapies.
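Amino acid composition (AAC), the first of the three feature types, is conventionally the frequency of each of the 20 standard amino acids in a peptide. A minimal Python sketch of that common definition follows; the exact preprocessing used in AntiCPs-CompML is not stated in the abstract, so dropping non-standard residues is an assumption.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """20-dimensional AAC vector: the fraction of the sequence made up by each residue.

    Assumption: non-standard symbols (e.g. 'X') are dropped before counting.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


vector = amino_acid_composition("ACDA")
print(vector[:3])  # [0.5, 0.25, 0.25] for A, C, D
```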

View Publication →

SB-Net: Synergizing CNN and LSTM Networks for Uncovering Retrosynthetic Pathways in Organic Synthesis

06/15/2024

Authors: Bilal Ahmad Mir, Hilal Tayara, Kil To Chong

Journal: Computational Biology and Chemistry

Abstract: Retrosynthesis is vital for synthesizing target products, guiding the reaction pathway design that is crucial for drug and material discovery. Current models often neglect multi-scale feature extraction, which limits their ability to leverage molecular descriptors. Our proposed SB-Net model, a deep learning architecture tailored for retrosynthesis prediction, addresses this gap. SB-Net combines CNN and Bi-LSTM architectures and excels at capturing multi-scale molecular features. It integrates parallel branches that process one-hot encoded descriptors and extended-connectivity fingerprints (ECFP), which are then merged through dense layers. Experimental results demonstrate SB-Net's superiority, with 73.6% top-1 and 94.6% top-10 accuracy on the USPTO-50k dataset. Its versatility is validated on MetaNetX, with top-1, top-3, top-5, and top-10 accuracies of 52.8%, 74.3%, 79.8%, and 83.5%, respectively. SB-Net's success in bioretrosynthesis prediction tasks further indicates its efficacy. This research advances computational chemistry by offering a robust deep learning model for retrosynthesis prediction. With implications for drug discovery and synthesis planning, SB-Net offers a route to more efficient identification of synthetic pathways.
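Top-k accuracy, the metric behind the USPTO-50k and MetaNetX figures, counts a test reaction as correct when the true reactant set appears among the model's k highest-ranked predictions. The sketch below assumes predictions and ground truth are compared as exact strings; in practice SMILES are usually canonicalized first, and the toy data are hypothetical.

```python
def top_k_accuracy(ranked: list[list[str]], targets: list[str], k: int) -> float:
    """Fraction of test reactions whose true reactant set appears in the top-k predictions.

    Assumption: exact string match; real pipelines usually canonicalize SMILES first.
    """
    if len(ranked) != len(targets) or not targets:
        raise ValueError("need one ranked list per target, and at least one target")
    hits = sum(1 for preds, target in zip(ranked, targets) if target in preds[:k])
    return hits / len(targets)


# Toy data (hypothetical): each inner list is a model's ranked guesses for one product.
ranked = [["CCO.CC(=O)O", "CCO"], ["c1ccccc1", "CC"], ["O", "N"]]
truth = ["CCO.CC(=O)O", "CC", "C"]
print(top_k_accuracy(ranked, truth, 1))  # one hit out of three
print(top_k_accuracy(ranked, truth, 2))  # two hits out of three
```

Because a hit within the top 1 is also a hit within the top 10, top-k accuracy can only stay the same or rise as k grows, which is why the reported figures increase from top-1 to top-10.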

View Publication →

Advancing Peptide-Based Cancer Therapy with AI: In-Depth Analysis of State-of-the-Art AI Models

06/14/2024

Authors: Sadik Bhattarai, Hilal Tayara, Kil To Chong

Journal: Journal of Chemical Information and Modeling

Abstract: Anticancer peptides (ACPs) play a vital role in selectively targeting and eliminating cancer cells. Evaluating and comparing predictions from various machine learning (ML) and deep learning (DL) techniques is challenging but crucial for anticancer drug research. We conducted a comprehensive analysis of 15 ML and 10 DL models, including models released after 2022, and found that support vector machines (SVMs) with feature combination and selection significantly enhance overall performance. DL models, especially convolutional neural networks (CNNs) combined with light gradient boosting machine (LGBM)-based feature selection, demonstrate improved characterization. Assessment on a new test dataset (ACP10) identifies ACPred, MLACP 2.0, AI4ACP, mACPred, and AntiCP2.0_AAC, in that order, as the best-performing predictors, each showing robust performance. Our review underscores the limitations of current prediction tools and advocates for an omnidirectional ACP prediction framework to propel ongoing research.

View Publication →

AMPred-CNN: Ames mutagenicity prediction model based on convolutional neural networks

06/01/2024

Authors: Thi Tuyet Van Tran, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: Mutagenicity assessment plays a pivotal role in the safety evaluation of chemicals, pharmaceuticals, and environmental compounds. In recent years, the development of robust computational models for predicting chemical mutagenicity has gained significant attention, driven by the need for efficient and cost-effective toxicity assessments. In this paper, we propose AMPred-CNN, an innovative Ames mutagenicity prediction model based on convolutional neural networks (CNNs) that uniquely represents molecular structures as images to leverage the powerful feature extraction capabilities of CNNs. The study uses the widely adopted benchmark mutagenicity dataset of Hansen et al. for model development and evaluation. Comparative analyses with traditional ML models on different molecular features reveal substantial performance gains. AMPred-CNN outperforms these models, demonstrating superior accuracy, AUC, F1 score, MCC, sensitivity, and specificity on the test set. AMPred-CNN is further benchmarked against seven recent ML and DL models, consistently showing superior performance with an AUC of 0.954. Our study highlights the effectiveness of CNNs in advancing mutagenicity prediction, paving the way for broader applications in toxicology and drug development.

View Publication →

An Ensemble Classifiers for Improved Prediction of Native–Non-Native Protein–Protein Interaction

05/29/2024

Authors: Nor Kumalasari Caecar Pratiwi, Hilal Tayara, Kil To Chong

Journal: International Journal of Molecular Sciences

Abstract:

View Publication →

Stack-AAgP: Computational prediction and interpretation of anti-angiogenic peptides using a meta-learning framework

05/01/2024

Authors: Saima Gaffar, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: Angiogenesis plays a vital role in the pathogenesis of several human diseases, particularly solid tumors. In cancer treatment, recent investigations into peptides with anti-angiogenic properties have yielded encouraging outcomes, opening a promising therapeutic avenue. Correctly identifying anti-angiogenic peptides is therefore extremely important for understanding their biophysical and biochemical traits, laying the groundwork for discovering novel anticancer drugs. In this work, we present a novel ensemble-learning-based model, Stack-AAgP, designed for the accurate identification and interpretation of anti-angiogenic peptides (AAPs). First, 24 baseline models are generated from six machine learning algorithms (random forest [RF], extra trees classifier [ETC], extreme gradient boosting [XGB], light gradient boosting machine [LGBM], CatBoost, and support vector machine [SVM]) and four feature encodings (pseudo-amino acid composition [PAAC], amphiphilic pseudo-amino acid composition [APAAC], composition of k-spaced amino acid pairs [CKSAAP], and quasi-sequence-order [QSOrder]). Next, the outputs (predicted probabilities) of the 24 baseline models are fed into the same six machine learning classifiers to generate their respective meta-classifiers. Finally, the meta-classifiers are stacked using the ensemble-learning framework to construct the final predictive model. Independent testing shows that Stack-AAgP outperforms state-of-the-art methods by a considerable margin. Systematic experiments were also conducted to assess the influence of hyperparameters on the proposed model. On the independent NT15 dataset, Stack-AAgP outperforms existing predictors, improving accuracy by 5% to 7.5% and the Matthews correlation coefficient (MCC) by 7.2% to 12.2%.
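CKSAAP, one of the four encodings used by Stack-AAgP, counts how often each of the 400 ordered residue pairs occurs with exactly k residues between them. A minimal sketch of the usual formulation follows; the maximum gap (`max_gap`) used by Stack-AAgP is not given in the abstract, so the default of 3 here is a hypothetical choice.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence: str, max_gap: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for every gap k = 0..max_gap.

    For gap k, a pair is (s[i], s[i + k + 1]); its count is divided by the
    number of such positions, len(s) - k - 1. Pairs that involve non-standard
    residues are skipped but still count toward that denominator (an assumption).
    Output length: 400 * (max_gap + 1). The default max_gap is hypothetical.
    """
    seq = sequence.upper()
    features: list[float] = []
    for k in range(max_gap + 1):
        n_positions = len(seq) - k - 1
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(max(n_positions, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        # Normalize per gap; sequences too short for this gap contribute zeros.
        features.extend(counts[p] / n_positions if n_positions > 0 else 0.0 for p in PAIRS)
    return features


vec = cksaap("ACA", max_gap=1)
print(len(vec))  # 800
```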

View Publication →

Harnessing machine learning to predict cytochrome P450 inhibition through molecular properties

04/15/2024

Authors: Hamza Zahid, Hilal Tayara, Kil To Chong

Journal: Archives of Toxicology

Abstract: Cytochrome P450 enzymes are a superfamily of enzymes responsible for metabolizing a variety of medicines and xenobiotics. Within this family, five isozymes, 1A2, 2C9, 2C19, 2D6, and 3A4, are the most important for the metabolism of xenobiotics. Inhibition of any of these five CYP isozymes can cause drug-drug interactions with significant pharmacological and toxicological effects, so predicting whether a compound inhibits these isozymes is of great importance. Many machine learning and deep learning techniques are currently used to predict whether these isozymes will be inhibited. In this study, three types of molecular or substructural fingerprints, Morgan, combined MACCS and Morgan, and RDKit, are used to train a separate SVM model for each isozyme (1A2, 2C9, 2C19, 2D6, and 3A4). On the independent dataset, Morgan fingerprints provided the best results, while the combined MACCS and Morgan fingerprints achieved comparable results in terms of balanced accuracy (BA), sensitivity (Sn), and Matthews correlation coefficient (MCC). With Morgan fingerprints, the BA, MCC, and Sn for the five isozymes on the independent dataset ranged from 0.81 to 0.85, 0.61 to 0.70, and 0.72 to 0.83, respectively. Similarly, with the combined MACCS and Morgan fingerprints, the BA, MCC, and Sn on the independent dataset ranged from 0.79 to 0.85, 0.59 to 0.69, and 0.69 to 0.82, respectively.
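The three reported metrics follow standard confusion-matrix definitions: sensitivity Sn = TP / (TP + FN), specificity Sp = TN / (TN + FP), balanced accuracy BA = (Sn + Sp) / 2, and MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch of these formulas follows; returning 0.0 when a denominator is zero is our convention, not something the abstract specifies.

```python
import math


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sn, Sp, balanced accuracy, and MCC for 0/1 labels (1 = inhibitor).

    Convention assumed here: a metric whose denominator is zero is reported as 0.0.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "BA": (sn + sp) / 2, "MCC": mcc}


# Sn 0.5, Sp 1.0, BA 0.75, MCC about 0.577
print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Balanced accuracy and MCC are preferred over plain accuracy when inhibitors and non-inhibitors are unevenly represented, because neither can be inflated by always predicting the majority class.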

View Publication →

Integrated Computational Approaches for Drug Design Targeting Cruzipain

03/27/2024

Authors: Aiman Parvez, Jeong-Sang Lee, Waleed Alam, Hilal Tayara, Kil To Chong

Journal: International Journal of Molecular Sciences

Abstract: Cruzipain inhibitors are needed as medications for Chagas disease because safer, more effective treatments are required. Cruzipain, a crucial cysteine protease of Trypanosoma cruzi, has driven interest in using computational methods to design more effective inhibitors. We employed a 3D-QSAR model built on a dataset of 36 known inhibitors, together with a pharmacophore model, to identify potential cruzipain inhibitors. We also built a deep learning model with the DeepPurpose library, trained on 204 active compounds and validated on a separate test set. In a comprehensive screening of 8533 molecules from the DrugBank database, the pharmacophore and deep learning models identified 1012 and 340 drug-like molecules, respectively. These molecules were further evaluated by molecular docking followed by induced-fit docking. Finally, molecular dynamics simulations were performed for the most potent inhibitors that exhibited strong binding interactions. These results present four novel candidate inhibitors of the cruzipain protein of T. cruzi.

View Publication →

Unveiling dominant recombination loss in perovskite solar cells with a XGBoost-based machine learning approach

03/15/2024

Authors: Basir Akbar, Hilal Tayara, Kil To Chong

Journal: iScience

Abstract: Perovskite solar cells (PSCs) have attracted substantial attention from researchers and are advancing rapidly within photovoltaic technology. These developments aim to create highly efficient energy devices with fewer dominant recombination losses among third-generation solar cells. We implemented diverse machine learning (ML) algorithms to address the dominant recombination losses in PSCs, focusing on grain boundaries (GBs), interfaces, and band-to-band recombination. The extreme gradient boosting (XGBoost) classifier effectively predicts these recombination losses. The model was trained with 7-fold cross-validation to ensure generalizability and robustness. Using Optuna for hyperparameter optimization and Shapley additive explanations (SHAP) to investigate the influence of features on the target variables, the model achieved 85% accuracy on over 2 million simulated data points. With light intensity and open-circuit voltage as input parameters, the proposed model outperformed state-of-the-art models in predicting the dominant recombination losses.

View Publication →

Unlocking the therapeutic potential of drug combinations through synergy prediction using graph transformer networks

03/01/2024

Authors: Waleed Alam, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: Drug combinations are frequently used to treat cancer to reduce side effects and increase efficacy. Experimental discovery of synergistic drug combinations is time-consuming and expensive for large datasets, so an efficient and reliable computational approach is required to investigate them. Advances in deep learning make it possible to handle large datasets across various biological problems. In this study, we developed SynergyGTN, a model based on the graph transformer network (GTN), to predict synergistic drug combinations against untreated cancer cell line expression profiles. Each drug is represented as a graph in which every node carries nine types of atomic features and every edge carries four bond features. Cell lines are represented by their gene expression profiles. The drug graph is passed through the GTN layers to extract a generalized feature map for each drug pair. The extracted drug-pair features and the cell line gene expression profile are concatenated and then processed through multiple densely connected layers. SynergyGTN outperformed state-of-the-art methods, with a 5% improvement in the area under the receiver operating characteristic curve in 5-fold cross-validation. Its accuracy was further verified with three cross-validation strategies, namely leave-drug-out, leave-combination-out, and leave-tissue-out, yielding accuracy improvements of 8%, 1%, and 2%, respectively. The AstraZeneca DREAM dataset was used as an independent dataset to validate and assess the generalizability of the proposed method, yielding a 13% improvement in balanced accuracy. In conclusion, SynergyGTN is a reliable and efficient computational approach for predicting drug combination synergy in cancer treatment.
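Leave-drug-out cross-validation tests whether a model generalizes to drugs it has never seen. The sketch below shows one common way to build such folds and is not taken from the SynergyGTN code: drugs are shuffled into folds, and any combination containing a held-out drug goes to the test side, so no held-out drug ever appears in training. The record layout and drug names are hypothetical.

```python
import random


def leave_drug_out_folds(
    combos: list[tuple[str, str, str]], n_folds: int = 5, seed: int = 0
) -> list[tuple[list[int], list[int]]]:
    """Return (train_indices, test_indices) per fold for (drug_a, drug_b, cell_line) records.

    Each drug is assigned to exactly one fold. For fold f, a combination is a test
    example if either of its drugs belongs to f; otherwise it is a training example.
    """
    drugs = sorted({d for a, b, _ in combos for d in (a, b)})
    random.Random(seed).shuffle(drugs)
    fold_of = {drug: i % n_folds for i, drug in enumerate(drugs)}
    folds = []
    for f in range(n_folds):
        train_idx, test_idx = [], []
        for idx, (a, b, _cell) in enumerate(combos):
            if fold_of[a] == f or fold_of[b] == f:
                test_idx.append(idx)
            else:
                train_idx.append(idx)
        folds.append((train_idx, test_idx))
    return folds


# Hypothetical records: (drug A, drug B, cell line).
combos = [
    ("drug_a", "drug_b", "cell_1"),
    ("drug_a", "drug_c", "cell_2"),
    ("drug_b", "drug_c", "cell_1"),
    ("drug_d", "drug_e", "cell_3"),
    ("drug_c", "drug_e", "cell_2"),
]
for train_idx, test_idx in leave_drug_out_folds(combos, n_folds=3):
    print(len(train_idx), len(test_idx))
```

Leave-combination-out and leave-tissue-out follow the same pattern, grouping by drug pair or by the cell line's tissue instead of by individual drug.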

View Publication →

An integrative machine learning model for the identification of tumor T-cell antigens

03/01/2024

Authors: Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Journal: BioSystems

Abstract: The escalating global incidence of cancer poses significant health challenges, underscoring the need for innovative and more efficacious treatments. Cancer immunotherapy, a promising approach that harnesses the body's immune system against cancer, emerges as a compelling solution. Consequently, the identification and characterization of tumor T-cell antigens (TTCAs) have become a pivotal research focus. In this manuscript, we introduce TTCA-IF, an integrative machine learning-based framework designed for TTCA identification. TTCA-IF employs ten feature encoding types in conjunction with five conventional machine learning classifiers. To establish a robust foundation, these classifiers are trained, resulting in 150 baseline models. The outputs of these baseline models are then fed into the five classifiers, generating their respective meta-models. Through an ensemble approach, the five meta-models are integrated to yield the final predictive model, TTCA-IF. TTCA-IF surpasses both the baseline models and existing predictors in performance. In a comparative analysis involving nine novel peptide sequences, TTCA-IF correctly identified 8 of the 9 peptides as TTCAs. As a tool for screening and pinpointing potential TTCAs, we anticipate that TTCA-IF will be invaluable in advancing cancer immunotherapy.

View Publication →

An accurate prediction of drug–drug interactions and side effects by using integrated convolutional and BiLSTM networks

02/15/2024

Journal: Chemometrics and Intelligent Laboratory Systems

Abstract: Combinations of multiple drugs have gained attention for the treatment of complex diseases. However, while many drugs offer benefits, they can also cause undesirable side effects. Accurate prediction of drug–drug interactions is crucial in drug discovery and safety research, so an efficient and reliable computational method is needed to predict drug–drug interactions and their associated side effects. In this study, we introduce a computational method that integrates convolutional and BiLSTM networks to predict the types of drug–drug interactions. Morgan fingerprints were used to encode each drug's SMILES, and a structural similarity profile based on the Tanimoto coefficient was used to determine similarities. The encoded drugs were passed through convolutional and BiLSTM layers to extract important feature maps. The ReLU activation function and the dense layer were employed for …
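A Tanimoto-based structural similarity profile represents each drug by its similarity to every drug in a reference set. For binary fingerprints such as Morgan fingerprints, the Tanimoto coefficient is |A ∩ B| / |A ∪ B| over the sets of on bits. The minimal sketch below works on plain Python sets of bit indices; in practice the fingerprints would come from a cheminformatics toolkit such as RDKit, and the reference set shown is hypothetical.

```python
def tanimoto(on_bits_a: set[int], on_bits_b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not on_bits_a and not on_bits_b:
        return 0.0  # convention assumed here for two all-zero fingerprints
    shared = len(on_bits_a & on_bits_b)
    return shared / (len(on_bits_a) + len(on_bits_b) - shared)


def similarity_profile(query: set[int], reference: list[set[int]]) -> list[float]:
    """Describe a drug by its Tanimoto similarity to each drug in a reference set."""
    return [tanimoto(query, other) for other in reference]


# Hypothetical reference fingerprints (sets of on-bit indices).
reference = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
print(similarity_profile({1, 2, 3}, reference))  # [1.0, 0.5, 0.0]
```

The resulting profile has one entry per reference drug, so every drug is mapped to a fixed-length vector regardless of molecule size, which suits the convolutional and BiLSTM layers that follow.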

View Publication →

iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking

02/01/2024

Authors: Muhammad Shujaat, Sunggoo Yoo, Hilal Tayara, Kil To Chong

Journal: Current Bioinformatics

Abstract: Background and Objective: Gene promoters play a crucial role in regulating gene transcription by serving as DNA regulatory elements near transcription start sites. Despite numerous approaches, including alignment-, signal-, and content-based methods for promoter prediction, accurately identifying promoters remains challenging due to the lack of explicit features in their sequences. Consequently, many machine learning and deep learning models for promoter identification have been presented, but these tools still lack precision. Most recent investigations have concentrated on identifying sigma or plant promoters, while the accurate identification of Saccharomyces cerevisiae promoters remains underexplored. In this study, we introduce iProm-Yeast, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we developed a more difficult negative set by employing promoter sequences rather than non-promoter regions of the genome. This negative reconstruction approach improves classification and minimizes false positive predictions. Methods: To overcome the problems associated with promoter prediction, we investigate alternative vector encoding and feature extraction methodologies. These strategies are then coupled with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that pseudo-dinucleotide composition is preferable for feature encoding and that the machine-learning stacking approach is excellent for accurate promoter classification. Furthermore, we provide a negative reconstruction method that uses promoter sequences rather than non-promoter regions, resulting in higher classification performance and fewer false positive predictions.
Results: Based on 5-fold cross-validation, the proposed predictor, iProm-Yeast, shows good potential for detecting Saccharomyces cerevisiae promoters, with an accuracy (Acc) of 86.27%, a sensitivity (Sn) of 82.29%, a specificity (Sp) of 89.47%, a Matthews correlation coefficient (MCC) of 0.72, and an area under the receiver operating characteristic curve (AUROC) of 0.98. We also performed a cross-species analysis to assess the generalizability of iProm-Yeast to other species. Conclusion: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. With advanced vector encoding techniques and a negative reconstruction approach, it improves classification accuracy and reduces false positive predictions. In addition, it offers researchers a reliable and precise web server for studying gene regulation in diverse organisms.

View Publication →

DL-SPhos: Prediction of serine phosphorylation sites using transformer language model

02/01/2024

Authors: Palistha Shrestha, Jeevan Kandel, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: Serine phosphorylation plays a pivotal role in various cellular processes and in the pathogenesis of many diseases. Roughly 81% of human diseases have links to phosphorylation, and 86.4% of protein phosphorylation takes place at serine residues. In eukaryotes, over a quarter of proteins undergo phosphorylation, with more than half implicated in numerous disorders, notably cancer and reproductive system diseases. This study focuses primarily on serine-phosphorylation-driven pathogenesis and the critical role of conserved motif identification. While numerous techniques exist for predicting serine phosphorylation sites, traditional wet-lab experiments are resource-intensive. Our paper introduces a deep learning tool for predicting serine (S) phosphorylation sites that integrates a transformer language model, deep neural network components, and explainable AI for motif identification. We trained our model on protein sequences from UniProt, validated it on the dbPTM benchmark dataset, and used the PTMD dataset to explore motifs related to mammalian disorders. Our results show that our model surpasses other deep learning predictors by 3%. Furthermore, we used the local interpretable model-agnostic explanations (LIME) approach to shed light on the predictions, highlighting the amino acid residues crucial for S phosphorylation. Our model also outperformed competitors in kinase-specific serine phosphorylation prediction on benchmark datasets.

View Publication →

In Silico Computational Method for ACP classification and Peptide Class validation Server in Bioinformatics

01/30/2024

Authors: Sadik Bhattarai, Prem Singh Bist, Hilal Tayara, Kil To Chong

Journal: J. Living Sci. Res

Abstract: Cancer is the second-leading cause of death worldwide, and therapeutic peptides that target and destroy cancer cells have received a great deal of interest in recent years. Traditional wet-lab experiments are expensive and inefficient for identifying novel anticancer peptides; therefore, an effective computational approach is essential for recognizing ACP candidates before experimental methods are used. In this study, we propose ACP-ADA, an AdaBoost algorithm with random forest as the base learner, which integrates binary profile features, amino acid index, and amino acid composition into a 210-dimensional feature vector to represent peptides. Training samples in the feature space were augmented to increase the sample size and further improve model performance when samples are insufficient. We used five-fold cross-validation to select model parameters, and the cross-validation results showed that ACP-ADA, with this feature combination and data augmentation, outperforms existing methods in terms of performance metrics. Specifically, ACP-ADA achieved an average accuracy of 86.4% and a Matthews correlation coefficient of 74.01% on the ACP740 dataset, and 90.83% and 81.65%, respectively, on the ACP240 dataset, making it a potentially useful tool for drug development and biomedical research. Several prediction servers for identifying anticancer peptides, such as AntiCP, AntiCP2.0, mACPpred, and MLACP, can serve as validation platforms for researchers working on peptide-based therapy.

View Publication →

Possum: identification and interpretation of potassium ion inhibitors using probabilistic feature vectors

01/11/2024

Authors: Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Journal: Archives of Toxicology

View Publication →

SolPredictor: predicting solubility with residual gated graph neural network

01/05/2024

Authors: Waqar Ahmad, Hilal Tayara, HyunJoo Shim, Kil To Chong

Journal: International Journal of Molecular Sciences

Abstract: Computational methods play a pivotal role in the pursuit of efficient drug discovery, enabling rapid assessment of compound properties before costly and time-consuming laboratory experiments. With advances in technology and the availability of large datasets, machine and deep learning methods have proven efficient at predicting molecular solubility. High-precision in silico solubility prediction has revolutionized drug development by enhancing formulation design, guiding lead optimization, and predicting pharmacokinetic parameters. These benefits yield considerable cost and time savings, leading to a more efficient and shorter drug development process. SolPredictor is a computational model designed for solubility prediction. The model is based on residual graph neural network convolution (RGNN). RGNNs are designed to capture long-range dependencies in graph-structured data: residual connections allow information to be used across layers, helping the model capture and preserve essential features and patterns scattered throughout the network. The two largest datasets available to date were compiled, and the model takes a simplified molecular-input line-entry system (SMILES) representation as input. In ten-fold cross-validation, SolPredictor achieves a squared Pearson correlation coefficient (R²) of 0.79 ± 0.02 and a root mean square error (RMSE) of 1.03 ± 0.04. The proposed model was evaluated on five independent datasets. Error analysis, hyperparameter optimization analysis, and model explainability were used to determine the molecular features most valuable for prediction.
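The abstract reports both an R² and an RMSE. The sketch below computes RMSE together with two quantities that are both written R² in the literature, the squared Pearson correlation and the coefficient of determination (1 - SS_res / SS_tot), because they can disagree. The abstract's wording suggests the squared Pearson correlation, but that is our interpretation, and the toy numbers are hypothetical.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def squared_pearson(y_true: list[float], y_pred: list[float]) -> float:
    """Square of the Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def coefficient_of_determination(y_true: list[float], y_pred: list[float]) -> float:
    """1 - SS_res / SS_tot; unlike Pearson r, this penalizes systematic offsets."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical predictions shifted by a constant +1: perfectly correlated but biased.
y = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
# RMSE 1.0, squared Pearson 1.0, coefficient of determination about 0.2
print(rmse(y, pred), squared_pearson(y, pred), coefficient_of_determination(y, pred))
```

With predictions offset by a constant, the squared Pearson correlation stays at 1.0 while the coefficient of determination drops to about 0.2, which is why it matters which definition a reported R² uses.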

View Publication →

Predicting the bandgap and efficiency of perovskite solar cells using machine learning methods

01/04/2024

Authors: H Tayara, A Khan, J Kandel, KT Chong

Journal: Molecular Informatics

Abstract: Rapid and accurate prediction of the bandgap and efficiency of perovskite solar cells is a crucial challenge for various solar cell applications. Existing theoretical and experimental methods can measure these parameters accurately, but they are costly and time-consuming. Machine learning-based approaches offer a promising and computationally efficient way to address this problem. In this study, we trained different machine learning (ML) models on previously reported experimental data. Among these models, the CatBoostRegressor performed best for both bandgap and efficiency approximations. We evaluated the proposed model using k-fold cross-validation and investigated the relative importance of input features using Shapley Additive Explanations (SHAP), which provides valuable insight into how each feature contributes to the model's predictions. Furthermore, we validated the proposed model on an independent dataset, demonstrating its robustness and generalizability beyond the training data. Our findings show that machine learning-based approaches, aided by SHAP, can provide a promising and computationally efficient method for accurately and rapidly predicting perovskite solar cell properties. The proposed model is expected to facilitate the discovery of new perovskite materials and is freely available on GitHub (https://github.com/AsadKhanJBNU/perovskite_bandgap_and_efficiency.git) for the perovskite community.

View Publication →

IF-AIP: a machine learning method for the identification of anti-inflammatory peptides using multi-feature fusion strategy

01/01/2024

Authors: Saima Gaffar, Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: Background: Currently, the most commonly used therapies for inflammatory and autoimmune diseases are nonspecific anti-inflammatory drugs, which have various hazardous side effects. Recently, some anti-inflammatory peptides (AIPs) have been found to be a substitute therapy for inflammatory diseases such as rheumatoid arthritis and Alzheimer’s disease. The identification of these AIPs is therefore an emerging and important research topic. Methods: In this work, we propose an identification model for AIPs using a voting classifier. We used eight different feature descriptors and five conventional machine learning classifiers. The eight feature encodings were concatenated into a hybrid feature set. The five baseline models trained on the hybrid feature set were integrated via a voting classifier. Finally, a feature selection algorithm was used to select the optimal feature set for the construction of our final model, named IF-AIP. Results: We tested the proposed model on two independent datasets. On independent dataset 1, IF-AIP improves accuracy by 3%–5.6% and MCC by 6.7%–10.8% compared with existing methods. On independent dataset 2, IF-AIP improves accuracy by 2.9%–5.7% and MCC by 8.3%–8.6% compared with existing methods. A comparative performance analysis was also conducted between the proposed model and existing methods using a set of 24 novel peptide sequences, and IF-AIP correctly identified all 24 peptides as AIPs.
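The abstract does not say whether IF-AIP's voting classifier uses hard voting (majority of predicted labels) or soft voting (averaged predicted probabilities). The minimal sketch below shows both rules as generic techniques, not as the IF-AIP implementation; the model outputs are hypothetical.

```python
def soft_vote(probabilities: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Average each sample's positive-class probability across models, then threshold.

    probabilities[m][i] is model m's probability that sample i is an AIP.
    """
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    if any(len(p) != n_samples for p in probabilities):
        raise ValueError("every model must score every sample")
    mean_prob = [sum(p[i] for p in probabilities) / n_models for i in range(n_samples)]
    return [int(m >= threshold) for m in mean_prob]


def hard_vote(labels: list[list[int]]) -> list[int]:
    """Majority vote over 0/1 labels; with an odd number of models there are no ties."""
    n_models = len(labels)
    return [int(sum(column) * 2 > n_models) for column in zip(*labels)]


# Hypothetical outputs of three models on two peptides (rows = models).
probs = [[0.9, 0.2], [0.6, 0.4], [0.3, 0.1]]
print(soft_vote(probs))  # [1, 0]
print(hard_vote([[1, 0], [1, 1], [0, 0]]))  # [1, 0]
```

With five baseline models, as in IF-AIP, hard voting cannot tie; soft voting additionally lets a confident model outweigh less confident ones.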

View Publication →

2023

Predicting the bandgap and efficiency of perovskite solar cells using machine learning methods

12/05/2023

Authors: Asad Khan, Jeevan Kandel, Hilal Tayara, Kil To Chong

Journal: Molecular Informatics

Abstract: Rapid and accurate prediction of the bandgap and efficiency of perovskite solar cells is a crucial challenge for various solar cell applications. Existing theoretical and experimental methods can measure these parameters accurately, but they are costly and time-consuming. Machine learning-based approaches offer a promising and computationally efficient way to address this problem. In this study, we trained different machine learning (ML) models on previously reported experimental data. Among these models, the CatBoostRegressor performed best for both bandgap and efficiency approximations. We evaluated the proposed model using k-fold cross-validation and investigated the relative importance of input features using Shapley Additive Explanations (SHAP), which provides valuable insight into how each feature contributes to the model's predictions. Furthermore, we validated the proposed model on an independent dataset, demonstrating its robustness and generalizability beyond the training data. Our findings show that machine learning-based approaches, aided by SHAP, can provide a promising and computationally efficient method for the accurate and rapid prediction of perovskite solar cell properties. The proposed model is expected to facilitate the discovery of new perovskite materials and is freely available on GitHub (https://github.com/AsadKhanJBNU/perovskite_bandgap_and_efficiency.git) for the perovskite community.

View Publication →

Improving enhancer identification with a multi-classifier stacked ensemble model

12/01/2023

Authors: Bilal Ahmad Mir, Mobeen Ur Rehman, Hilal Tayara, Kil To Chong

Journal: Journal of Molecular Biology

Abstract: Enhancers are DNA regions that control the expression of genes. They are usually found upstream or downstream of a gene, or even within a gene’s intron, and are normally located far from the genes they control. Enhancers with regulatory properties can be uncovered in DNA sequences by integrating experimental and computational approaches. Experimental techniques such as ChIP-seq and ATAC-seq can identify genomic regions that are associated with transcription factors or accessible to regulatory proteins, whereas computational techniques can predict enhancers from sequence features and epigenetic modifications. In our study, we developed a multi-classifier stacked ensemble (MCSE-enhancer) model that accurately identifies enhancers. We used feature descriptors derived from various physicochemical properties as input to six baseline classifiers and built a stacked classifier that outperformed previous enhancer classification techniques in accuracy, specificity, sensitivity, and Matthews correlation coefficient. Our model achieved an accuracy of 81.5%, a 2–3% improvement over existing models.
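The four metrics reported above follow standard confusion-matrix definitions. The sketch below computes them for binary labels (1 = enhancer). The helper name `binary_metrics` is hypothetical, and reporting MCC as 0 when a marginal count is zero is a common convention assumed here, not something stated in the abstract.

```python
import math
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient
    for 0/1 labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is reported as 0 when any marginal count is zero (a common convention).
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

For that example (one true positive, one false negative, two true negatives), accuracy is 0.75, sensitivity 0.5, specificity 1.0, and MCC is 1/√3 ≈ 0.577.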

View Publication →

ORI-Explorer: a unified cell-specific tool for origin of replication sites prediction by feature fusion

11/01/2023

Authors: Zeeshan Abbas, Mobeen Ur Rehman, Hilal Tayara, Kil To Chong

Journal: Bioinformatics

Abstract: Motivation: Origins of replication (ORIs) are precise regions of a DNA sequence where the replication process begins. These sites are critical for preserving genome integrity during cell division and for ensuring the faithful transfer of genetic information from generation to generation. Experimental techniques have aided the discovery of ORIs in many species, but experimentation is often more time-consuming and expensive than computational approaches and requires specialized equipment and expertise. Recently, ORI sites have been predicted with computational techniques such as motif-based searches and artificial intelligence algorithms based on sequence characteristics and chromatin states. Results: In this article, we developed ORI-Explorer, an artificial intelligence-based technique that combines multiple feature engineering techniques to train a CatBoost classifier for recognizing ORIs in four distinct eukaryotic species. ORI-Explorer uses a unique combination of three traditional feature-encoding techniques and a feature set obtained from a deep neural network model. ORI-Explorer significantly outperformed current predictors on the testing dataset. Furthermore, using the SHapley Additive exPlanation (SHAP) method, we provide insight into the model's success by highlighting the features most relevant for predicting cell-specific ORIs. ORI-Explorer is also intended to support community-wide efforts to discover potential ORIs and to develop novel, verifiable biological hypotheses.
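Feature fusion of the kind described above amounts to concatenating several per-sequence encodings, plus a learned feature vector, into one input row for the classifier. The sketch below is hypothetical: the abstract does not name the three traditional encodings, so `nucleotide_composition` and `gc_skew` are illustrative stand-ins, and `learned` represents a precomputed deep-network embedding.

```python
from typing import Callable, Sequence

NUCLEOTIDES = "ACGT"


def nucleotide_composition(seq: str) -> list[float]:
    """Fraction of A, C, G and T in the sequence."""
    seq = seq.upper()
    return [seq.count(n) / len(seq) for n in NUCLEOTIDES]


def gc_skew(seq: str) -> list[float]:
    """(G - C) / (G + C) as a one-element vector; 0.0 if there is no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return [(g - c) / (g + c) if g + c else 0.0]


def fuse_features(
    seq: str,
    encoders: Sequence[Callable[[str], list[float]]],
    learned: Sequence[float] = (),
) -> list[float]:
    """Concatenate hand-crafted encodings with an optional learned feature vector
    (for example, an embedding taken from a trained neural network)."""
    fused: list[float] = []
    for encode in encoders:
        fused.extend(encode(seq))
    fused.extend(learned)
    return fused


vector = fuse_features("ACGTACGG", [nucleotide_composition, gc_skew], learned=[0.1, 0.2])
print(vector)  # [0.25, 0.25, 0.375, 0.125, 0.2, 0.1, 0.2]
```

Keeping each encoder as a separate function makes it easy to add or drop an encoding and to trace fused columns back to their source when inspecting feature importances.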

View Publication →

Recent Studies of Artificial Intelligence on In Silico Drug Absorption

10/11/2023

Authors: Thi Tuyet Van Tran, Hilal Tayara, Kil To Chong

Journal: Journal of Chemical Information and Modeling

Abstract: Absorption is an important area of research in pharmacochemistry and drug development, because a drug must be absorbed before it can have any effect. Furthermore, modulating the factors that affect absorption can directly and considerably alter a drug’s ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile, and many drugs in development fail because of poor absorption. Continuous research efforts in recent years have brought many successes in drug absorption property prediction, especially in silico, which significantly reduces the time and cost of screening out undesirable drug candidates. In this report, we provide an overview of recent in silico studies that use artificial intelligence to predict absorption properties, particularly from 2019 to the present. We have also collected and examined public databases that support absorption prediction research. On that basis, we outline the challenges and future development directions of absorption prediction. We hope this review provides researchers with valuable guidelines on absorption prediction and facilitates the development of new approaches in drug discovery.

View Publication →

An ensemble of stacking classifiers for improved prediction of miRNA–mRNA interactions

09/01/2023

Authors: Priyash Dhakal, Hilal Tayara, Kil To Chong

Journal: Computers in Biology and Medicine

Abstract: MicroRNAs (miRNAs) are small non-coding RNA molecules that play a crucial role in regulating gene expression at the post-transcriptional level by binding, with the help of the Argonaute family of proteins, to potential target sites on messenger RNAs (mRNAs). Selecting the conservative candidate target sites (CTS) is a challenging step, because most existing computational algorithms focus primarily on canonical site types, an approach that is time-consuming and makes inefficient use of miRNA target site interactions. We developed a stacking classifier algorithm that addresses the CTS selection criteria using feature-encoding techniques that generate feature vectors, including k-mer nucleotide composition, dinucleotide composition, pseudo-nucleotide composition, and sequence-order coupling. This stacking classifier surpassed previous state-of-the-art algorithms in predicting functional miRNA targets. Evaluated on 10 independent test datasets, the proposed model obtained an average accuracy of 79.77%, a significant improvement of 7.26% over previous models. This improvement shows that the proposed method has great potential for distinguishing highly functional miRNA targets and can serve as a valuable tool in biomedical and drug development research.
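The k-mer nucleotide composition encoding listed above can be sketched as follows, under stated assumptions: overlapping windows, an A/C/G/U alphabet with T mapped to U, and normalization by the number of windows. The exact normalization used in the paper is not stated in the abstract. Dinucleotide composition is the k = 2 case. `kmer_composition` is a hypothetical helper name.

```python
from itertools import product


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized counts of all 4**k k-mers over A, C, G, U (lexicographic order).

    Overlapping windows are used; windows containing other symbols (e.g. N)
    are not counted but still contribute to the denominator.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    n_windows = len(seq) - k + 1
    for i in range(max(n_windows, 0)):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    if n_windows <= 0:
        return [0.0] * len(kmers)
    return [counts[w] / n_windows for w in kmers]


print(kmer_composition("ACGU", 1))  # [0.25, 0.25, 0.25, 0.25]
```

For "ACGU" with k = 2 the three windows are AC, CG and GU, so those three of the 16 entries are 1/3 and the rest are 0.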

View Publication →

Meta-IL4: An ensemble learning approach for IL-4-inducing peptide prediction

09/01/2023

Authors: Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Journal: Methods

Abstract: The cytokine interleukin-4 (IL-4) plays an important role in the immune system. IL-4 drives the differentiation of naïve T-helper 0 (Th0) cells into T-helper 2 (Th2) cells, and Th2 responses are characterized by the release of IL-4. CD4+ T cells produce IL-4 in response to exogenous parasites, and IL-4 plays a critical role in the growth of CD8+ cells, inflammation, and T-cell responses. We propose an ensemble model for the prediction of IL-4-inducing peptides. Four feature encodings were extracted to build an efficient predictor: pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, quasi-sequence-order, and Shannon entropy. Our ensemble learning model fuses random forest, extreme gradient boosting, light gradient boosting machine, and extra trees classifiers in the first layer, with a Gaussian process classifier as the meta-classifier in the second layer. On the benchmark testing dataset, the meta-model (Meta-IL4) outperformed the individual classifiers, with a Matthews correlation coefficient of 0.793 and a highest accuracy of 90.70%. These findings suggest that IL-4-inducing peptides can be predicted with reasonable accuracy, and such models could aid in the development of peptides that trigger the appropriate Th2 response.
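Of the four encodings listed above, Shannon entropy is the simplest to illustrate. The sketch below computes a single sequence-level value in bits from the amino-acid frequencies. The abstract does not specify the exact variant used in Meta-IL4 (for example per-residue or windowed), so this form and the `shannon_entropy` name are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(peptide: str) -> float:
    """Shannon entropy (in bits) of the amino-acid distribution of a peptide:
    H = -sum(p_a * log2(p_a)) over residues a present in the sequence."""
    if not peptide:
        return 0.0
    counts = Counter(peptide.upper())
    n = len(peptide)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(shannon_entropy("ACDE"))  # 2.0
```

A homopolymer such as "AAAA" scores 0 bits, while a sequence of four distinct residues scores log2(4) = 2 bits, so the value summarizes how diverse a peptide's composition is.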

View Publication →

XGBoost framework with feature selection for the prediction of RNA N5-methylcytosine sites

08/02/2023

Authors: Zeeshan Abbas, Mobeen ur Rehman, Hilal Tayara, Quan Zou, Kil To Chong

Journal: Molecular Therapy

Abstract: 5-methylcytosine (m5C) is a critical post-transcriptional modification that is widely present in various kinds of RNA and is crucial to fundamental biological processes. Correctly identifying m5C methylation sites on RNA helps clinicians understand more clearly the precise function of these sites in different biological processes. Because of their effectiveness and affordability, computational methods for identifying methylation sites in various species have received growing attention over the last few years. To precisely identify RNA m5C sites in five species (Homo sapiens, Arabidopsis thaliana, Mus musculus, Drosophila melanogaster, and Danio rerio), we propose a more effective and accurate model named m5C-pred. To build m5C-pred, five distinct feature encoding techniques were combined to extract features from the RNA sequence; SHapley Additive exPlanations (SHAP) were then used to select the best features, and XGBoost was used as the classifier. We applied Optuna, a hyperparameter optimization framework, to determine the best hyperparameters quickly and efficiently. Finally, the proposed model was evaluated on independent test datasets and compared with previous methods. m5C-pred is anticipated to be useful for accurately identifying m5C sites, outperforming currently available state-of-the-art techniques.
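SHAP-based feature selection, as described above, is often done by ranking features by mean absolute SHAP value and keeping the top k. The sketch below assumes that ranking rule (the abstract does not state the exact criterion) and takes a precomputed SHAP matrix as input, for example one produced by a SHAP explainer on a fitted tree model. `top_features_by_shap` and the feature names are hypothetical.

```python
from typing import Sequence


def top_features_by_shap(
    shap_values: Sequence[Sequence[float]], names: Sequence[str], k: int
) -> list[str]:
    """Rank features by mean absolute SHAP value and keep the top k.

    shap_values[i][j] is the SHAP value of feature j for sample i,
    computed elsewhere (e.g. by a SHAP explainer on a fitted model).
    """
    n = len(shap_values)
    importance = [sum(abs(row[j]) for row in shap_values) / n for j in range(len(names))]
    order = sorted(range(len(names)), key=lambda j: importance[j], reverse=True)
    return [names[j] for j in order[:k]]


# Hypothetical SHAP values for two samples and three encoded features.
shap_matrix = [[0.1, -0.5, 0.0], [0.3, 0.4, 0.05]]
print(top_features_by_shap(shap_matrix, ["feat_a", "feat_b", "feat_c"], 2))  # ['feat_b', 'feat_a']
```

Using the absolute value matters: a feature that pushes predictions strongly in either direction counts as important, whereas signed SHAP values could cancel out across samples.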

View Publication →

iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data

08/01/2023

Authors: Sehi Park, Mobeen Ur Rehman, Farman Ullah, Hilal Tayara, Kil To Chong

Journal: Bioinformatics

Abstract: The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes single-stranded DNA methylation sequencing technologies challenging to implement, highlighting the need for an efficient prediction model. Such models are needed both to understand the underlying biological systems and to accurately predict single-cell methylation data. Results: In this study, we developed positional features for predicting CpG sites. The positional characteristics of a sequence are derived from CpG-region data and the distances between nearby CpG sites. Multiple optimized classifiers and different ensemble learning approaches were evaluated, with the Optuna framework used to optimize the algorithms. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers. Availability and implementation: The data and methodologies used in this study, including the positional features and algorithms used for predicting CpG site methylation patterns, are openly accessible to the research community. Because iCpG-Pos uses only positional features, it substantially reduces computational complexity compared with other known approaches for detecting CpG site methylation patterns. In conclusion, our study introduces iCpG-Pos, a novel approach for predicting CpG site methylation patterns. By focusing on positional features, our model offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being.
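One ingredient of the positional features described above, the separation between nearby CpG sites, can be sketched as the distance from each CpG to the previous and next CpG in the sequence. This is an illustrative assumption, not the iCpG-Pos feature set: the abstract does not give the exact definitions, and `cpg_positions` and `neighbor_distances` are hypothetical helpers.

```python
from typing import Optional


def cpg_positions(seq: str) -> list[int]:
    """0-based start positions of every CpG dinucleotide (C followed by G)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def neighbor_distances(seq: str) -> list[tuple[Optional[int], Optional[int]]]:
    """For each CpG, the distance to the previous and next CpG (None at the ends)."""
    pos = cpg_positions(seq)
    out = []
    for i, p in enumerate(pos):
        prev_d = p - pos[i - 1] if i > 0 else None
        next_d = pos[i + 1] - p if i + 1 < len(pos) else None
        out.append((prev_d, next_d))
    return out


print(cpg_positions("ACGTTCGAACG"))       # [1, 5, 9]
print(neighbor_distances("ACGTTCGAACG"))  # [(None, 4), (4, 4), (4, None)]
```

These distances depend only on sequence coordinates, which is consistent with the abstract's point that positional features keep computational cost low.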

View Publication →

XGB5hmC: Identifier based on XGB model for RNA 5-hydroxymethylcytosine detection

07/15/2023

Authors: Agung Surya Wibowo, Hilal Tayara, Kil To Chong

Journal: Chemometrics and Intelligent Laboratory Systems

Abstract: RNA 5-hydroxymethylcytosine (5hmC) site detection is one of the bioinformatics problems that artificial intelligence can help solve, and it has become increasingly important because computational detection saves labor, materials, and time. A reliable identifier requires the highest achievable performance. In this study, we developed XGB5hmC, a high-performance identifier of RNA 5hmC that uses extreme gradient boosting (XGB), the best-performing model; we also investigated other models, such as random forest (RF), AdaBoost (AB), and gradient boosting (GB). First, iLearnPlus was used to run 15 different machine-learning models with 35 different descriptors to select the best descriptors, which were found to be the composition of k-spaced nucleic acid pairs (CKSNAP), pseudo-K-tuple nucleotide composition (PseKNC), and position-specific trinucleotide propensity based on single strand (PSTNPss). These features were then combined and reduced in dimension using chi-squared test filtering. Using the filtered features and the XGB model, we obtained better performance than state-of-the-art methods, with increases in accuracy, sensitivity, specificity, and MCC of 11.43, 15.82, 8.94, and 24.58%, respectively. These results indicate that our model is a reliable identifier for detecting 5hmC.
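The CKSNAP descriptor named above counts nucleotide pairs separated by a gap of k positions, for k = 0 up to some maximum, and normalizes each gap's counts by the number of such pairs. The sketch below follows that common definition. The `k_max` default of 3 is an assumption, not the value used in XGB5hmC, `cksnap` is a hypothetical helper, and the chi-squared filtering step is not shown.

```python
from itertools import product

PAIRS = ["".join(p) for p in product("ACGU", repeat=2)]


def cksnap(seq: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced nucleic acid pairs for gaps k = 0..k_max.

    For each gap k, count pairs (seq[i], seq[i + k + 1]) and divide by the
    number of such pairs, giving 16 * (k_max + 1) features.
    """
    seq = seq.upper().replace("T", "U")
    features: list[float] = []
    for k in range(k_max + 1):
        n_pairs = len(seq) - k - 1
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(max(n_pairs, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        for p in PAIRS:
            features.append(counts[p] / n_pairs if n_pairs > 0 else 0.0)
    return features


print(len(cksnap("ACGUACGU")))  # 64
```

With k = 0 the first 16 values reduce to ordinary dinucleotide composition; larger gaps capture longer-range pairwise patterns in the sequence.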

View Publication →

Artificial intelligence in drug toxicity prediction: recent advances, challenges, and future perspectives

04/26/2023

Authors: Thi Tuyet Van Tran, Agung Surya Wibowo, Hilal Tayara, Kil To Chong

Journal: Journal of Chemical Information and Modeling

Abstract: Toxicity prediction is a critical step in the drug discovery process that helps identify and prioritize compounds with the greatest potential for safe and effective use in humans, while also reducing the risk of costly late-stage failures. It is estimated that over 30% of drug candidates are discarded owing to toxicity. Recently, artificial intelligence (AI) has been used to improve drug toxicity prediction, as it provides more accurate and efficient methods for identifying the potentially toxic effects of new compounds before they are tested in human clinical trials, thus saving time and money. In this review, we present an overview of recent advances in AI-based prediction of six major toxicity properties and Tox21 assay end points, including the use of various machine learning algorithms and deep learning architectures. Additionally, we provide a list of public data sources and useful toxicity prediction tools for the research community and highlight the challenges that must be addressed to enhance model performance. Finally, we discuss future perspectives for AI-based drug toxicity prediction. This review can help researchers understand toxicity prediction and pave the way for new methods of drug discovery.

View Publication →

Artificial intelligence in drug metabolism and excretion prediction: recent advances, challenges, and future perspectives

04/17/2023

Authors: Thi Tuyet Van Tran, Hilal Tayara, Kil To Chong

Journal: Pharmaceutics

Abstract: Drug metabolism and excretion play crucial roles in determining the efficacy and safety of drug candidates, and predicting these processes is an essential part of drug discovery and development. In recent years, artificial intelligence (AI) has emerged as a powerful tool for predicting drug metabolism and excretion, offering the potential to speed up drug development and improve clinical success rates. This review highlights recent advances in AI-based drug metabolism and excretion prediction, including deep learning and machine learning algorithms. We provide a list of public data sources and free prediction tools for the research community. We also discuss the challenges associated with the development of AI models for drug metabolism and excretion prediction and explore future perspectives in the field. We hope this will be a helpful resource for anyone who is researching in silico drug metabolism, excretion, and pharmacokinetic properties.

View Publication →

Recent studies of artificial intelligence on in silico drug distribution prediction

01/17/2023

Authors: Thi Tuyet Van Tran, Hilal Tayara, Kil To Chong

Journal: International Journal of Molecular Sciences

Abstract: Drug distribution is an important process in pharmacokinetics because it can influence both the amount of drug reaching the active sites and the drug’s efficacy and safety. Lack of efficacy and uncontrolled toxicity are the main causes of 90% of drug failures in clinical development. In recent years, several promising advances in drug distribution property prediction have been achieved, especially in silico, drastically reducing the time and expense of screening out undesired drug candidates. In this study, we provide comprehensive background on drug distribution, its influencing factors, and artificial intelligence-based distribution property prediction models from 2019 to the present. Additionally, we gathered and analyzed public databases and datasets commonly used by the scientific community for distribution prediction, and we report the distribution property prediction performance of five large ADMET prediction tools as a benchmark for future research. On this basis, we also outline future challenges and research directions in drug distribution prediction. We hope that this review will give researchers helpful insight into distribution prediction, thus facilitating the development of innovative approaches for drug discovery.

View Publication →

Attention-Based Graph Neural Network for Molecular Solubility Prediction

01/12/2023

Authors: Waqar Ahmad, Hilal Tayara, Kil To Chong

Journal: ACS Omega

Abstract: Drug discovery (DD) research aims to discover new medications. Solubility is an important physicochemical property in drug development, and active pharmaceutical ingredients (APIs) are essential substances for high drug efficacy. During DD research, aqueous solubility (AS) is a key physicochemical attribute required for API characterization, and high-precision in silico solubility prediction reduces the experimental cost and time of drug development. Several artificial intelligence tools based on machine learning and deep learning techniques have been employed for solubility prediction. This study aims to create different deep learning models that can predict the solubility of a wide range of molecules using the largest currently available solubility data set. Simplified molecular-input line-entry system (SMILES) strings were used as the molecular representation, and models were developed using simple graph convolution, graph isomorphism network, graph attention network, and AttentiveFP network architectures. Based on the models' performance, the AttentiveFP-based network model was finally selected. The model was trained and tested on 9943 compounds; on 62 anticancer compounds, it achieved a Pearson correlation R2 of 0.52 and a root-mean-square error of 0.61. AS prediction could be further improved by refining the graph algorithms or by adding more molecular properties.
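The two evaluation metrics quoted above can be computed as follows. This is a minimal sketch: "Pearson correlation R2" is taken here to mean the squared Pearson correlation coefficient, which is an interpretation of the abstract's wording and is not the same quantity as the coefficient of determination. `pearson_r2` and `rmse` are hypothetical helper names.

```python
import math
from typing import Sequence


def pearson_r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov * cov / (var_t * var_p)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


print(pearson_r2([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.64
print(rmse([1, 2, 3], [2, 3, 4]))  # 1.0
```

The two metrics capture different things: a prediction shifted by a constant offset keeps a squared Pearson correlation of 1 but has a nonzero RMSE, which is why solubility studies commonly report both.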

View Publication →

2022

ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides

02/01/2022

Authors: S Bhattarai, KS Kim, H Tayara, KT Chong

Journal: International Journal of Molecular Sciences

View Publication →

Research Publications

Therapeutic Potential of Curcuminoids in Type 2 Diabetes Mellitus (T2DM): Insights from Network Pharmacology, Molecular Docking, and Dynamics Simulations

TranP-B-site: A Transformer Enhanced Method for prediction of binding sites of Protein-protein interactions

Transforming Highway Safety With Autonomous Drones and AI: A Framework for Incident Detection and Emergency Response

Enhancing DILI toxicity prediction through integrated graph attention (GATNN) and dense neural networks (DNN)

CpGFuse: a holistic approach for accurate identification of methylation states of DNA CpG sites

Analysis of Ruddlesden‐Popper and Dion‐Jacobson 2D Lead Halide Perovskites Through Integrated Experimental and Computational Analysis

Exploring Nigella sativa anticancerous properties using network pharmacology, molecular docking and molecular dynamics simulation approach for non-small cell lung cancer

From Detection to Action: A Multimodal AI Framework for Traffic Incident Response

Advanced drone-based weed detection using feature-enriched deep learning approach

PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction

Transformer-Enhanced Retinal Vessel Segmentation for Diabetic Retinopathy Detection Using Attention Mechanisms and Multi-Scale Fusion

GATNM: Graph with Attention Neural Network Model for Mycobacterial Cell Wall Permeability of Drugs and Drug-like Compounds

iAnOxPep: a machine learning model for the identification of anti-oxidative peptides using ensemble learning

m5C-Seq: Machine learning-enhanced profiling of RNA 5-methylcytosine modifications

FvFold: A model to predict antibody Fv structure using protein language model with residual network and Rosetta minimization

Possum: identification and interpretation of potassium ion inhibitors using probabilistic feature vectors

GGAS2SN: Gated Graph and SmilesToSeq Network for Solubility Prediction

A graph neural network approach for predicting drug susceptibility in the human microbiome

In Silico Exploration of Novel EGFR Kinase Mutant-Selective Inhibitors Using a Hybrid Computational Approach

Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model

Toxicity Prediction for Immune Thrombocytopenia Caused by Drugs Based on Logistic Regression with Feature Importance

NaII-Pred: An ensemble-learning framework for the identification and interpretation of sodium ion inhibitors by fusing multiple feature representation

Generative AI in the Advancement of Viral Therapeutics for Predicting and Targeting Immune-Evasive SARS-CoV-2 Mutations

AntiCPs-CompML: A Comprehensive Fast Track ML method to predict Anti-Corona Peptides

SB-Net: Synergizing CNN and LSTM Networks for Uncovering Retrosynthetic Pathways in Organic Synthesis

Advancing Peptide-Based Cancer Therapy with AI: In-Depth Analysis of State-of-the-Art AI Models

AMPred-CNN: Ames mutagenicity prediction model based on convolutional neural networks

An Ensemble Classifiers for Improved Prediction of Native–Non-Native Protein–Protein Interaction

Stack-AAgP: Computational prediction and interpretation of anti-angiogenic peptides using a meta-learning framework

Harnessing machine learning to predict cytochrome P450 inhibition through molecular properties

Integrated Computational Approaches for Drug Design Targeting Cruzipain

Unveiling dominant recombination loss in perovskite solar cells with a XGBoost-based machine learning approach

Unlocking the therapeutic potential of drug combinations through synergy prediction using graph transformer networks

An integrative machine learning model for the identification of tumor T-cell antigens

An accurate prediction of drug–drug interactions and side effects by using integrated convolutional and BiLSTM networks

iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking

DL-SPhos: Prediction of serine phosphorylation sites using transformer language model

In Silico Computational Method for ACP classification and Peptide Class validation Server in Bioinformatics

SolPredictor: predicting solubility with residual gated graph neural network

IF-AIP: a machine learning method for the identification of anti-inflammatory peptides using multi-feature fusion strategy

Predicting the bandgap and efficiency of perovskite solar cells using machine learning methods

Improving enhancer identification with a multi-classifier stacked ensemble model

ORI-Explorer: a unified cell-specific tool for origin of replication sites prediction by feature fusion

Recent Studies of Artificial Intelligence on In Silico Drug Absorption

An ensemble of stacking classifiers for improved prediction of miRNA–mRNA interactions

Meta-IL4: An ensemble learning approach for IL-4-inducing peptide prediction

XGBoost framework with feature selection for the prediction of RNA N5-methylcytosine sites

iCpG-Pos: an accurate computational approach for identification of CpG sites using positional features on single-cell whole genome sequence data

XGB5hmC: Identifier based on XGB model for RNA 5-hydroxymethylcytosine detection

Artificial intelligence in drug toxicity prediction: recent advances, challenges, and future perspectives

Artificial intelligence in drug metabolism and excretion prediction: recent advances, challenges, and future perspectives

Sars-escape network for escape prediction of SARS-COV-2

iProm-Sigma54: A CNN Base Prediction Tool for σ54 Promoters

Prediction of organic material band gaps using graph attention network

Recent studies of artificial intelligence on in silico drug distribution prediction

Attention-Based Graph Neural Network for Molecular Solubility Prediction

ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides