Training Deep Learning AI to Predict microRNA-Gene Interactions

Apr 3, 2025 | Life Sciences & Biology, Medical & Health Sciences

Non-coding microRNAs (miRNAs) have important regulatory functions but are also implicated in various diseases. Mr Seoung-Won Yoon, a PhD candidate at Chungnam National University, Republic of Korea, is training deep learning AI models to predict miRNA-gene associations. His research has implications for understanding disease pathogenesis, particularly in cancer, and for repurposing drugs to treat currently untreatable diseases.

The Diverse miRNAome

Think of RNA, and you probably think of messenger RNA (mRNA), transfer RNA (tRNA), or perhaps even ribosomal RNA. However, RNA’s scope extends far beyond protein synthesis. Cells contain a plethora of microRNA (miRNA) molecules – small, non-coding, single-stranded RNAs around 22 nucleotides long. miRNAs regulate gene expression by hybridising to mRNA transcripts, usually at the mRNA’s 3’-UTR. This leads to gene silencing via various mechanisms – repressing translation, promoting mRNA deadenylation, or triggering mRNA cleavage.

The miRNAome is turning out to be more diverse than was ever thought possible. At least 2,000 human miRNAs have been identified as being important to survival. miRNAs have been found to be pivotal to a host of biological processes, including cell division, differentiation and death, nervous system development, immunity, and signal transduction. Conversely, miRNAs have been implicated in cellular dysfunction, leading to diseases such as cancers. Despite the physiological significance of miRNAs, their interactions with mRNA transcripts are not fully known. Hoping to elucidate these interactions is Mr Seoung-Won Yoon, a graduate student at Chungnam National University, Republic of Korea.

Probing the miRNAome with Deep Learning AI

Mr Yoon is a PhD candidate in Professor Kyuchul Lee’s group at Chungnam’s Department of Computer Science and Engineering. The group has previously applied advanced computational approaches to various real-world applications, including databases and the internet-of-things. In recent years, they have turned their computational expertise to bioinformatics applications.

Because interactions between miRNAs and gene transcripts cannot be observed directly, wet lab experiments are poorly suited to mapping them at scale: they are complicated, time-consuming, and expensive. Instead, Mr Yoon and the group are deploying deep learning models to predict human miRNA-gene associations. Given the diversity of the miRNAome and its putative gene targets, data-heavy computational methods are needed. Deep learning lends itself well to this, as it can identify sequence and mechanistic features of miRNAs and genes, and use them to predict relationships.

Machine learning (ML) is a branch of AI with enormous potential in the biological sciences. In ML, algorithms learn patterns from example data and generalise to unseen data. Artificial neural networks (NNs) are a type of ML model consisting of a network of connected nodes. In NNs, a signal is taken in via an input layer, passed through hidden layers, and then out through an output layer. A common analogy is that of the brain, with nodes representing neurons, and links between nodes representing synapses. Deep learning refers to an advanced type of NN with multiple hidden layers, which allow it to learn more complex patterns.
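The layered signal flow described above can be sketched in a few lines of Python. This is a toy illustration only: the layer sizes, the weights, and the helper names (`dense`, `forward`) are assumptions made for the example, not details of the group's model.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs

def forward(inputs, layers):
    """Pass a signal through each layer in turn (input -> hidden -> output)."""
    signal = inputs
    for weights, biases in layers:
        signal = dense(signal, weights, biases)
    return signal

# Hypothetical network: 3 inputs -> 2 hidden nodes -> 2 hidden nodes -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[0.7, -0.3]], [0.2]),
]
prediction = forward([1.0, 0.0, 1.0], layers)
print(prediction)  # a list holding one value between 0 and 1
```

In a real model the weights are not written by hand but adjusted during training so that the output matches known examples.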

Positive Training

Much like our own brains, AI must be trained for optimum ‘cognitive’ performance – no mean feat, as massive datasets are needed! To train their deep learning models, Mr Yoon and the group drew on three data sources. Firstly, they used a dataset of proven positive miRNA-gene associations. These were taken from miRTarBase, a curated database of over half a million miRNA-gene association pairs. After filtering out duplicates, 380,634 pairs remained. They used two additional resources, miRBase and biomaRt. miRBase contains sequence and annotation information for miRNAs. biomaRt is an open dataset of gene sequences from the European Bioinformatics Institute. Of the 380,634 pairs, those involving miRNAs or genes lacking sequence information in miRBase or biomaRt were excluded. This left 2,656 miRNA and 14,319 gene sequences, giving a total of 38,031,264 candidate miRNA-gene pairs (2,656 × 14,319), of which 358,864 were known positive relationships.
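The filtering and counting steps above can be illustrated with a short Python sketch. The function name `filter_pairs` and the toy identifiers and sequences are hypothetical; only the three counts at the bottom are taken from the article.

```python
def filter_pairs(pairs, mirna_seqs, gene_seqs):
    """Drop duplicate pairs and pairs whose miRNA or gene lacks a sequence."""
    unique = dict.fromkeys(pairs)  # keeps first occurrences, preserves order
    return [(m, g) for m, g in unique if m in mirna_seqs and g in gene_seqs]

# Toy records standing in for database entries (hypothetical names).
pairs = [("miR-x", "GENE1"), ("miR-x", "GENE1"), ("miR-y", "GENE2"), ("miR-z", "GENE1")]
mirna_seqs = {"miR-x": "UGAGGUAG", "miR-y": "UAGCAGCA"}  # miR-z has no sequence
gene_seqs = {"GENE1": "ATGGCC"}                          # GENE2 has no sequence
kept = filter_pairs(pairs, mirna_seqs, gene_seqs)
print(kept)  # [('miR-x', 'GENE1')]

# Figures quoted in the article.
n_mirnas = 2656
n_genes = 14319
n_positive = 358864

# Pairing every miRNA with every gene gives the full candidate space.
n_candidates = n_mirnas * n_genes
print(n_candidates)                       # 38031264
print(f"{n_positive / n_candidates:.2%}")  # 0.94%
```

The last line shows why balancing matters: known positives make up under 1% of all possible pairs.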

The next step was ‘data embedding’. This involved extracting and vectorising sequence features of the 2,656 miRNAs and 14,319 genes. Each data element was embedded in 64 dimensions.
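The article does not say which embedding method was used, so the following is only a stand-in that shows what "turning a sequence into a 64-dimensional vector" can look like. It hashes overlapping 3-letter fragments (k-mers) into 64 bins; the function name `kmer_embedding`, the choice of k, and the hashing scheme are all assumptions and should not be read as the group's approach.

```python
import zlib

EMBED_DIM = 64  # dimensionality quoted in the article

def kmer_embedding(sequence, k=3, dim=EMBED_DIM):
    """Hash overlapping k-mers into a fixed-length, normalised count vector."""
    vector = [0.0] * dim
    n_kmers = len(sequence) - k + 1
    if n_kmers <= 0:
        return vector  # sequence shorter than k: nothing to count
    for i in range(n_kmers):
        kmer = sequence[i:i + k]
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(kmer.encode("ascii")) % dim
        vector[index] += 1.0
    return [count / n_kmers for count in vector]

mirna_vec = kmer_embedding("UGAGGUAGUAGGUUGUAUAGUU")  # illustrative 22-nt sequence
print(len(mirna_vec))  # 64
```

Whatever the method, the key property is the same: sequences of different lengths end up as vectors of one fixed length, so they can be compared and fed to a model.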

Balancing the Positive with the Negative

When training deep learning models, it’s beneficial to have a balanced dataset – not only positive data but also negative data (pairs with no interaction). As confirmed non-interacting pairs are not catalogued in the same way as positive ones, negative data must be constructed. Negative data may be generated randomly, but this is not the most robust way. It’s better to generate negative data methodically using well-defined criteria. To generate the negative data, the 358,864 positive relationships were removed from the 38,031,264 candidate pairs. Further pairs were filtered out using ‘distance’ criteria (Euclidean distance, cosine similarity, and Mahalanobis distance) – this works because the embedded data exist in vector space. After filtering, 4,932,554 negative candidates were obtained. From these, 358,864 negative pairs were randomly selected to exactly balance the positive pairs (1:1 ratio). This yielded 717,728 data elements (358,864 + 358,864), each representing 124 dimensions. This is the largest sequence dataset ever constructed for miRNA-gene associations.
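The three steps above (remove positives, filter by distance, sample to a 1:1 ratio) can be sketched as follows. The article does not state what the distances were measured between or which direction the filter ran, so this sketch assumes the distance is taken between a miRNA's and a gene's embedding and that only pairs at or above a threshold are kept. It uses Euclidean distance alone and 2-dimensional toy vectors; `sample_negatives` and `min_distance` are hypothetical names.

```python
import math
import random

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sample_negatives(mirna_vecs, gene_vecs, positives, min_distance, seed=0):
    """Build a balanced negative set from non-positive pairs passing a distance filter."""
    positive_set = set(positives)
    candidates = [
        (m, g)
        for m in mirna_vecs
        for g in gene_vecs
        if (m, g) not in positive_set  # step 1: remove known positives
        and euclidean(mirna_vecs[m], gene_vecs[g]) >= min_distance  # step 2 (assumed rule)
    ]
    if len(candidates) < len(positive_set):
        raise ValueError("not enough negative candidates to balance the positives")
    rng = random.Random(seed)
    return rng.sample(candidates, len(positive_set))  # step 3: 1:1 ratio

# Toy embeddings (hypothetical names, 2 dimensions instead of 64).
mirna_vecs = {"miR-a": [0.0, 0.0], "miR-b": [1.0, 1.0]}
gene_vecs = {"G1": [0.1, 0.0], "G2": [3.0, 3.0], "G3": [0.0, 4.0]}
positives = [("miR-a", "G1"), ("miR-b", "G2")]

negatives = sample_negatives(mirna_vecs, gene_vecs, positives, min_distance=2.0)
print(len(positives), len(negatives))  # 2 2
```

Fixing the random seed makes the selection reproducible, which matters when different models are compared on the same balanced dataset.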

Unidirectional and Bidirectional Deep Learning

Mr Yoon and the group investigated two different types of deep learning model – long short-term memory (LSTM) and bidirectional LSTM (Bi-LSTM). Both are recurrent NNs (RNNs), a type of NN suitable for sequential data. LSTMs have a feature known as cell state that allows them to ‘remember’ earlier parts of a sequence when processing later ones. Traditional LSTMs read a sequence in one direction only, from front to back. In contrast, Bi-LSTMs read it both forwards and backwards.
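To make the difference concrete, here is a deliberately tiny LSTM with a single hidden unit, followed by a bidirectional wrapper that runs it over the sequence in both directions. The weights in `toy_params` are arbitrary and untrained, and the input is a short list of numbers rather than an embedded sequence; this is a sketch of the mechanism, not of the group's models.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_forward(sequence, p):
    """Run a hidden-size-1 LSTM over a list of numbers; return every hidden state."""
    h, c = 0.0, 0.0  # hidden state and cell state start empty
    hidden_states = []
    for x in sequence:
        f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate
        i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate
        o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate
        g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate memory
        c = f * c + i * g     # cell state: keep part of the old, add part of the new
        h = o * math.tanh(c)  # hidden state passed on to the next step
        hidden_states.append(h)
    return hidden_states

def bilstm_forward(sequence, fwd_params, bwd_params):
    """Pair each position's forward-pass state with its backward-pass state."""
    forward = lstm_forward(sequence, fwd_params)
    backward = lstm_forward(sequence[::-1], bwd_params)[::-1]
    return list(zip(forward, backward))

# Arbitrary, untrained weights (hypothetical values).
toy_params = {
    "wf": 0.5, "uf": 0.1, "bf": 0.0,
    "wi": 0.6, "ui": -0.2, "bi": 0.1,
    "wo": 0.4, "uo": 0.3, "bo": 0.0,
    "wg": 0.9, "ug": 0.5, "bg": 0.0,
}
sequence = [0.2, -0.4, 0.9, 0.1]
uni = lstm_forward(sequence, toy_params)
bi = bilstm_forward(sequence, toy_params, toy_params)
print(len(uni), len(bi))  # 4 4
```

The bidirectional version produces two values per position instead of one and needs two passes over the data, which is where its extra computational cost comes from.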

The group tested an LSTM with three layers and a Bi-LSTM with two layers. The 717,728 data elements were divided into training data and test data in an 8:2 ratio, and fed into both models. Which model performed better at predicting miRNA-gene associations? In principle, this should be the Bi-LSTM, as the bidirectional information flow provides a richer representation. However, this comes at a computational cost, and the group found that the Bi-LSTM was slower to train than the LSTM. They therefore considered the simpler yet faster LSTM more appropriate for the miRNA-gene dataset and selected it as their deep learning model.
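An 8:2 split of this kind can be sketched as below. The article gives only the ratio and the total, so the shuffling, the fixed seed, and the rounding down of the cut point are assumptions; `train_test_split` here is a hand-written helper, not a library call.

```python
import random

def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data and cut it at the requested fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # rounds down
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for the 717,728 data elements.
train, test = train_test_split(range(717728))
print(len(train), len(test))  # 574182 143546
```

Shuffling before cutting keeps positive and negative pairs mixed in both parts, so the test data resemble the training data in composition.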

Assessing Model Performance

A metric known as the area under the receiver operating characteristic curve (AUC) may be used to assess the performance of deep learning models. The curve plots the rate of ‘true positives’ against the rate of ‘false positives’, so the AUC rewards a model for finding real associations while avoiding false alarms. Using a statistical method called K-fold cross-validation, the group determined that the LSTM model’s average AUC was 0.98 – close to 1.0 (the maximum), indicating very good generalisation performance. Finally, they validated the model, confirming that it is able to uncover novel miRNA-gene association pairs not present in their positive training dataset.
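One way to read the AUC is as the chance that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. The sketch below computes it that way and also shows how K-fold cross-validation partitions the data. The labels, scores, and the choice of three folds are made up for illustration; the article does not state the value of K that was used.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889 (8 of 9 positive-negative pairs ranked correctly)

folds = list(k_fold_indices(10, 3))
print([len(test) for _, test in folds])  # [4, 3, 3]
```

In cross-validation the model is trained and scored once per fold, and the reported figure is the average across folds, so every data element is used for testing exactly once.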

Predicting BRCA2-associated miRNAs

The influence of miRNAs on cancer is a lively field of research. Variants or mutations of certain genes are implicated in various cancers. How do miRNA-gene interactions contribute to this? This is an important research question for Mr Yoon. BRCA2 is a human gene encoding a protein that helps repair damaged DNA and regulates the cell cycle. BRCA2 mutations can lead to malignant cells with unrepaired DNA damage, and are implicated in breast, ovarian, and prostate cancers. The group used their deep learning model to predict miRNAs that associate with BRCA2. Notably, among the top 10 predicted candidates were miRNAs with known associations with prostate, ovarian, breast, and cervical cancers.
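Picking the top candidates for a single gene amounts to scoring every miRNA against that gene and sorting. A minimal sketch, assuming the model's outputs are already available as a dictionary of scores: the miRNA names and score values below are placeholders, not results from the study, and excluding already-known positives is an assumption about how novel candidates might be isolated.

```python
import heapq

def top_candidates(scores_for_gene, known_positives, n=10):
    """Rank miRNAs by predicted score, skipping pairs already known to be positive."""
    novel = {m: s for m, s in scores_for_gene.items() if m not in known_positives}
    return heapq.nlargest(n, novel.items(), key=lambda item: item[1])

# Placeholder scores for one gene (hypothetical names and values).
scores_for_gene = {"miR-A": 0.97, "miR-B": 0.42, "miR-C": 0.88, "miR-D": 0.91}
known_positives = {"miR-A"}

ranked = top_candidates(scores_for_gene, known_positives, n=2)
print(ranked)  # [('miR-D', 0.91), ('miR-C', 0.88)]
```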

Going forward, Mr Yoon and the group want to further train the deep learning model for better prediction of miRNA-gene associations, and understand how these translate to disease phenotypes. Beyond BRCA2, they will focus on incurable and intractable diseases, deploying deep learning approaches to elucidate the pathogenetic entities involved. They hope to apply their deep learning insights to repurposing drugs to treat currently incurable diseases.

REFERENCE

https://doi.org/10.33548/SCIENTIA1185

MEET THE RESEARCHERS

Mr Seoung-Won Yoon
Chungnam National University, Yuseong-Gu, Daejeon, Republic of Korea

Mr Seoung-Won Yoon is a PhD Candidate at Chungnam National University, Republic of Korea, under the supervision of Professor Kyuchul Lee in the Computer Science & Engineering Department. Mr Yoon’s research involves developing deep learning models, primarily for bioinformatics and drug repurposing. He has so far participated in more than 10 nationally funded projects. His work on deep learning has included model development for pancreatic cancer-related genes, miRNA and mRNA prediction, and predicting the relationships between bio-genetic data. Additionally, he has conducted research projects in risk prediction, evaluating the performance of user movement path predictions, and assessing similarity patterns in user data. Furthermore, he has worked on an internet-of-things-based wearable flu vaccine project aimed at preventing the spread of influenza.

CONTACT

E: yoonenoch11@gmail.com

Professor Kyuchul Lee
Chungnam National University, Yuseong-Gu, Daejeon, Republic of Korea

Professor Kyuchul Lee received his Bachelor’s degree in Computer Science from Seoul National University in 1984 and a PhD in the same field from the same university in 1990. He has been a faculty member at Chungnam National University since 1989 and has also held the positions of visiting researcher at IBM Almaden Center and visiting professor at Syracuse University. At the time of writing, he has led 144 national research projects, published 56 international and 125 domestic journal papers, given 97 international and 206 domestic conference presentations, and holds 46 patents and intellectual property records. Additionally, he has facilitated 12 technology transfers. As an advisor in the Data Artificial Intelligence Research Lab (formerly Database Systems Lab), he has mentored over 110 Master’s and PhD students, developing core technologies in the fields of databases and AI. His research spans multimedia data, XML, semantic web, IoT, AI, and big data. Through courses like File Processing, Database Systems, and Web Programming, he has equipped students with the knowledge to excel in their professional careers.

FUNDING

National Research Foundation of Korea (NRF)

Korean Government (MSIT)

FURTHER READING

S Yoon, I Hwang, J Cho, et al., miGAP: miRNA–Gene Association Prediction Method Based on Deep Learning Model, Applied Sciences, 2023, 13(22), 12349. DOI: https://doi.org/10.3390/app132212349

Creative Commons Licence (CC BY 4.0)

This work is licensed under a Creative Commons Attribution 4.0 International License.
