Training Deep Learning AI to Predict microRNA-Gene Interactions

Apr 3, 2025 | Life Sciences & Biology, Medical & Health Sciences

Non-coding microRNAs (miRNAs) have important regulatory functions but are also implicated in various diseases. Mr Seoung-Won Yoon, a PhD candidate at Chungnam National University, Republic of Korea, is training deep learning AI models to predict miRNA-gene associations. His research has implications for understanding disease pathogenesis, particularly in cancer, and for repurposing existing drugs to treat currently untreatable diseases.

The Diverse miRNAome

Think of RNA, and you probably think of messenger RNA (mRNA), transfer RNA (tRNA), or perhaps ribosomal RNA (rRNA). However, RNA’s roles extend far beyond protein synthesis. Cells contain a plethora of microRNA (miRNA) molecules: small, single-stranded, non-coding RNAs around 22 nucleotides long. miRNAs regulate gene expression by binding to complementary sequences in mRNA transcripts, usually within the mRNA’s 3’ untranslated region (3’-UTR). This silences the gene through several mechanisms, including translational repression, mRNA deadenylation, and mRNA cleavage.
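To make the idea of complementary binding concrete, the sketch below searches a 3’-UTR for a site that pairs perfectly with a miRNA’s ‘seed’ region (nucleotides 2 to 8), the part that typically drives target recognition. This is a simplified illustration, not the group’s method: real targeting can tolerate imperfect pairing and depends on sequence context, and both sequences used here are toy examples.

```python
# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the RNA strand that would pair with `rna`, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_sites(mirna, utr):
    """Return 0-based positions in `utr` that pair perfectly with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                      # nucleotides 2-8 (1-based) of the miRNA
    site = reverse_complement(seed)        # matching sequence expected in the mRNA
    utr = utr.upper().replace("T", "U")    # accept DNA-style input as well
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]

# Toy example: a 22-nt miRNA and a made-up 3'-UTR fragment containing one seed site
mirna = "UAGCUUAUCAGACUGAUGUUGA"
utr = "GGCC" + "AUAAGCU" + "CCGG"
print(seed_sites(mirna, utr))  # [4]
```

A hit only means the seed could pair at that position; whether silencing actually occurs depends on much more than seed complementarity, which is one reason data-driven prediction is attractive.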

The miRNAome is turning out to be more diverse than was ever thought possible. More than 2,000 human miRNAs have now been identified, and miRNAs are known to be pivotal to a host of biological processes, including cell division, differentiation and death, nervous system development, immunity, and signal transduction. Conversely, miRNA dysregulation has been implicated in cellular dysfunction and in diseases such as cancer. Despite the physiological significance of miRNAs, their interactions with gene transcripts are still not fully mapped. Hoping to elucidate these interactions is Mr Seoung-Won Yoon, a graduate student at Chungnam National University, South Korea.

Probing the miRNAome with Deep Learning AI

Mr Yoon is a PhD candidate in Professor Kyuchul Lee’s group at Chungnam’s Department of Computer Science and Engineering. The group has previously applied advanced computational approaches to a range of real-world problems, including databases and the internet of things (IoT). In recent years, they have turned their computational expertise to bioinformatics applications.

Because miRNA-gene interactions cannot be observed directly, identifying them through wet lab experiments is complicated, time-consuming, and expensive. Instead, Mr Yoon and the group are deploying deep learning models to predict human miRNA-gene associations. Given the diversity of the miRNAome and the vast number of putative gene targets, data-heavy computational methods are needed. Deep learning is well suited to this task, as it can learn relevant features from miRNA and gene sequences and use them to predict which pairs are likely to interact.

Machine learning (ML) is a branch of AI with enormous potential in the biological sciences. In ML, algorithms learn patterns from training data and generalise them to new, unseen data. Artificial neural networks (NNs) are a type of ML model consisting of a network of connected nodes. In an NN, a signal enters via an input layer, passes through one or more hidden layers, and leaves via an output layer. A common analogy is the brain, with nodes representing neurons and the links between nodes representing synapses. Deep learning refers to NNs with multiple hidden layers, which allow them to learn increasingly complex patterns from data.
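As a minimal illustration of an input layer, hidden layers and an output layer, the following pure-Python sketch pushes a signal through a tiny fully connected network. The layer sizes, weights and activation functions are arbitrary assumptions chosen for readability; in a real model the weights are learned during training.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each node sums its weighted inputs plus a bias."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Pass a signal from the input layer through each (weights, biases, activation) layer."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Illustrative weights only: 3 inputs -> 2 hidden nodes -> 2 hidden nodes -> 1 output
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [-0.5], sigmoid),
]
print(forward([1.0, 0.5, -1.0], layers))  # a one-element list, roughly [0.574]
```

With two hidden layers this already counts as a (very small) deep network; the models discussed below use many more nodes and a recurrent structure suited to sequences.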

Positive Training

Much like our own brains, AI must be trained to reach optimum ‘cognitive’ performance, which is no mean feat, as massive datasets are needed! To train their deep learning models, Mr Yoon and the group drew on three data sources. First, they took experimentally validated positive miRNA-gene associations from miRTarBase, a curated database of over half a million miRNA-gene association pairs. After duplicates were filtered out, 380,634 pairs remained. Second, they used miRBase, which contains sequence and annotation information for miRNAs. Third, they used biomaRt, a Bioconductor package that retrieves gene sequences from BioMart databases such as Ensembl, maintained at the European Bioinformatics Institute. From the 380,634 pairs, miRNAs and genes lacking sequence information in miRBase or biomaRt were excluded. This left 2,656 miRNA and 14,319 gene sequences, giving 38,031,264 possible miRNA-gene pairs (2,656 × 14,319), of which 358,864 were known positive relationships.
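The filtering steps described above can be sketched as follows. The records are hypothetical stand-ins for miRTarBase pairs and miRBase/biomaRt sequence lookups, and the sketch assumes the candidate space is every combination of the retained miRNAs and genes, as the 2,656 × 14,319 figure implies.

```python
def build_positive_set(raw_pairs, mirna_seqs, gene_seqs):
    """Deduplicate association records and keep pairs whose miRNA and gene both have sequences."""
    unique_pairs = set(raw_pairs)  # drop duplicate records
    positives = {(m, g) for m, g in unique_pairs if m in mirna_seqs and g in gene_seqs}
    mirnas = {m for m, _ in positives}
    genes = {g for _, g in positives}
    return positives, mirnas, genes

# Hypothetical toy records standing in for miRTarBase, miRBase and biomaRt output
raw_pairs = [("miR-A", "GENE1"), ("miR-A", "GENE1"), ("miR-A", "GENE2"),
             ("miR-B", "GENE2"), ("miR-C", "GENE3")]
mirna_seqs = {"miR-A": "UAGCUUAUCAG", "miR-B": "UUAAUGCUAAU"}  # miR-C has no sequence
gene_seqs = {"GENE1": "ATGGCC", "GENE2": "ATGCCA", "GENE3": "ATGTTT"}

positives, mirnas, genes = build_positive_set(raw_pairs, mirna_seqs, gene_seqs)
print(len(positives), len(mirnas) * len(genes))  # 3 4

# The same candidate-space arithmetic at the article's scale
assert 2_656 * 14_319 == 38_031_264
```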

The next step was ‘data embedding’: extracting sequence features from the 2,656 miRNAs and 14,319 genes and converting them into numerical vectors. Each sequence was embedded in 64 dimensions.
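The article does not specify how the 64-dimensional embeddings were produced. One simple way to turn a sequence of any length into a fixed 64-dimensional vector, shown here purely as an assumed stand-in, is to count its overlapping 3-mers: with four bases there are exactly 4 × 4 × 4 = 64 possible 3-mers.

```python
from itertools import product

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 4**3 = 64 possible 3-mers
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}

def embed(sequence):
    """Map a DNA or RNA sequence to a 64-dimensional vector of 3-mer frequencies."""
    seq = sequence.upper().replace("U", "T")  # treat RNA and DNA the same way
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:                   # skip k-mers containing N or other symbols
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

vec = embed("UAGCUUAUCAGACUGAUGUUGA")
print(len(vec))  # 64
```

Because the vectors all live in the same 64-dimensional space, distances between sequences become meaningful, which is what the distance-based filtering in the next step relies on.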

Balancing the Positive with the Negative

When training deep learning models, it’s beneficial to have a balanced dataset, containing not only positive data but also negative data (pairs with no interaction). As confirmed non-interacting pairs are rarely recorded in databases, negative data must be constructed. They can be generated randomly, but this is not the most robust approach, because randomly chosen pairs may include true but undiscovered interactions. It’s better to generate negative data methodically using well-defined criteria. To do so, the group first removed the 358,864 positive relationships from the 38,031,264 possible pairs. Because the embedded sequences exist in a vector space, they then filtered out further pairs using distance criteria (Euclidean distance, cosine similarity, and Mahalanobis distance). After filtering, 4,932,554 negative candidates remained. From these, 358,864 negative pairs were randomly selected to balance the positive pairs exactly (a 1:1 ratio). This yielded 717,728 data elements (358,864 + 358,864), each represented by 124 dimensions. According to the group, this is the largest sequence dataset yet constructed for miRNA-gene associations.
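A hedged sketch of the negative-sampling procedure follows. Exactly how the group applied its distance criteria is not described here, so the filter below is an assumption: a candidate pair is discarded if its gene lies too close (small Euclidean distance or high cosine similarity) to a gene the same miRNA is already known to target. Mahalanobis distance is omitted for brevity, and the thresholds and toy vectors are hypothetical.

```python
import math
import random

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def sample_negatives(mirnas, genes, positives, gene_vecs, min_dist, max_cos, seed=0):
    """Return (candidates, negatives); negatives has one entry per positive where possible."""
    targets = {}  # known targets of each miRNA, used by the assumed similarity filter
    for m, g in positives:
        targets.setdefault(m, []).append(g)

    candidates = []
    for m in sorted(mirnas):
        for g in sorted(genes):
            if (m, g) in positives:
                continue  # step 1: remove known positive pairs
            # step 2 (assumed criterion): drop genes too similar to a known target of m
            too_similar = any(
                math.dist(gene_vecs[g], gene_vecs[t]) < min_dist
                or cosine_similarity(gene_vecs[g], gene_vecs[t]) > max_cos
                for t in targets.get(m, [])
            )
            if not too_similar:
                candidates.append((m, g))

    # step 3: draw as many negatives as there are positives (1:1 balance)
    n = min(len(positives), len(candidates))
    negatives = random.Random(seed).sample(candidates, n)
    return candidates, negatives

# Hypothetical toy data: G3 sits very close to G1, a known target of miR-A
gene_vecs = {"G1": [1.0, 0.0], "G2": [0.0, 1.0], "G3": [0.9, 0.1], "G4": [-1.0, 0.0]}
positives = {("miR-A", "G1"), ("miR-B", "G2")}
candidates, negatives = sample_negatives(
    {"miR-A", "miR-B"}, set(gene_vecs), positives, gene_vecs, min_dist=0.5, max_cos=0.9)
print(len(candidates), negatives)
```

Iterating over sorted miRNAs and genes keeps the candidate order, and therefore the seeded random sample, reproducible between runs.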

Unidirectional and Bidirectional Deep Learning

Mr Yoon and the group investigated two types of deep learning model: long short-term memory (LSTM) and bidirectional LSTM (Bi-LSTM). Both are recurrent NNs (RNNs), a type of NN suited to sequential data. LSTMs maintain an internal ‘cell state’, regulated by gates, that allows them to ‘remember’ relevant information across long sequences. A standard LSTM processes a sequence in one direction, from start to end. In contrast, a Bi-LSTM runs two LSTMs, one reading the sequence forwards and one backwards, and combines their outputs.
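To show what ‘cell state’ and bidirectional processing mean in practice, here is a scalar (hidden size 1) LSTM written with the standard gate equations, plus a Bi-LSTM wrapper that runs one LSTM forwards and another backwards and pairs their outputs position by position. The parameter values and the `make_params` helper are hypothetical; this illustrates the mechanism and is not the group’s trained model.

```python
import math

GATE_KEYS = ["wf", "uf", "bf", "wi", "ui", "bi", "wg", "ug", "bg", "wo", "uo", "bo"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_params(value):
    """Hypothetical helper: set every weight and bias to the same value for the demo."""
    return {key: value for key in GATE_KEYS}

def lstm_step(x, h, c, p):
    """One LSTM time step with scalar input x, hidden state h and cell state c."""
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate: how much old cell state to keep
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate: how much new information to write
    g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate values to write
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate: how much cell state to expose
    c = f * c + i * g                                   # updated cell state (the 'memory')
    h = o * math.tanh(c)                                # updated hidden state (the step's output)
    return h, c

def run_lstm(xs, p):
    """Process a sequence front to back, returning the hidden state at every position."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs

def run_bilstm(xs, p_fwd, p_bwd):
    """Run one LSTM forwards and another backwards, then pair their outputs per position."""
    forward = run_lstm(xs, p_fwd)
    backward = run_lstm(list(reversed(xs)), p_bwd)[::-1]  # re-align with the original order
    return list(zip(forward, backward))

for fwd, bwd in run_bilstm([0.5, -1.0, 2.0], make_params(0.5), make_params(-0.3)):
    print(round(fwd, 3), round(bwd, 3))
```

At each position the forward output has seen only the elements up to that point, while the backward output has seen everything after it. Combining both gives more context, but it also roughly doubles the work per layer, which is consistent with the slower training the group observed.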

The group tested an LSTM with three layers and a Bi-LSTM with two layers. The 717,728 data elements were divided into training and test data in an 8:2 ratio and fed into both models. Which model is better suited to predicting miRNA-gene associations? In principle, the Bi-LSTM should be, as its bidirectional information flow provides a richer representation. However, this comes at a computational cost, and the group found that the Bi-LSTM trained more slowly than the LSTM. They therefore judged the simpler, faster LSTM to be more appropriate for the miRNA-gene dataset and selected it as their deep learning model.
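An 8:2 split like the one described can be sketched as below. The article does not say whether the data were shuffled or stratified before splitting; random shuffling with a fixed seed is an assumption here, and `train_test_split` is an illustrative name rather than the group’s code.

```python
import random

def train_test_split(data, train_parts=8, test_parts=2, seed=42):
    """Shuffle the data and split it into training and test sets in a train_parts:test_parts ratio."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_parts // (train_parts + test_parts)  # integer arithmetic avoids rounding surprises
    return items[:n_train], items[n_train:]

train, test = train_test_split(range(717_728))
print(len(train), len(test))  # 574182 143546
```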

Assessing Model Performance

The area under the receiver operating characteristic curve (AUC) is a common metric for assessing the performance of classification models such as these. The ROC curve plots the true positive rate against the false positive rate across all decision thresholds, so the AUC reflects how well a model ranks true associations above non-associations: 1.0 is perfect, and 0.5 is no better than chance. Using K-fold cross-validation, in which the data are split into K parts and each part takes a turn as the validation set, the group determined that the LSTM model’s average AUC was 0.98. This is close to the maximum of 1.0, indicating very good generalisation performance. Finally, they validated the model, confirming that it can uncover novel miRNA-gene association pairs not present in their positive training dataset.
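The AUC can be computed without plotting the curve, because it equals the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. The sketch below uses that pairwise definition (fine for small examples, though far too slow for hundreds of thousands of pairs) together with a simple K-fold index generator; averaging the per-fold AUCs gives a cross-validated score like the one reported. The function names are illustrative, not from the group’s code.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k):
    """Yield (train, validation) index lists; each of the k folds is used once for validation."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size

print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.85]))  # 0.75
for train, validation in k_fold_indices(10, 3):
    print(len(train), len(validation))  # 6 4, then 7 3, then 7 3
```

In practice the indices would be shuffled before folding, and a model would be retrained on each training portion before scoring its validation fold.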

Predicting BRCA2-associated miRNAs

The influence of miRNAs on cancer is a lively field of research. Variants or mutations of certain genes are implicated in various cancers, and how miRNA-gene interactions contribute to this is an important research question for Mr Yoon. BRCA2 is a human gene encoding a protein that helps repair damaged DNA, in particular double-strand breaks through homologous recombination, and contributes to cell cycle regulation. BRCA2 mutations can lead to malignant cells with unrepaired DNA damage and are implicated in breast, ovarian, and prostate cancers. The group used their deep learning model to predict miRNAs that associate with BRCA2. Notably, the top 10 predicted candidates included miRNAs with known associations with prostate, ovarian, breast, and cervical cancers.
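Producing a ‘top 10’ list amounts to scoring every candidate miRNA against one gene and keeping the highest scores. In this sketch, `score` is a hypothetical placeholder for the trained model’s predicted probability, and the scores shown are invented for illustration.

```python
import heapq

def top_k_mirnas(gene, mirnas, score, k=10):
    """Score every candidate miRNA against `gene` and return the k highest (score, miRNA) pairs."""
    return heapq.nlargest(k, ((score(m, gene), m) for m in mirnas))

# Hypothetical scores standing in for a trained model's predicted probabilities
toy_scores = {("miR-A", "BRCA2"): 0.91, ("miR-B", "BRCA2"): 0.42, ("miR-C", "BRCA2"): 0.77}
ranked = top_k_mirnas("BRCA2", ["miR-A", "miR-B", "miR-C"],
                      lambda m, g: toy_scores[(m, g)], k=2)
print(ranked)  # [(0.91, 'miR-A'), (0.77, 'miR-C')]
```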

Going forward, Mr Yoon and the group plan to train the deep learning model further to improve prediction of miRNA-gene associations, and to understand how these associations translate into disease phenotypes. Beyond BRCA2, they will focus on incurable and intractable diseases, using deep learning to identify the genes and miRNAs involved in their pathogenesis. They hope to apply these insights to repurposing drugs to treat currently incurable diseases.

REFERENCE

https://doi.org/10.33548/SCIENTIA1185

MEET THE RESEARCHERS


Mr Seoung-Won Yoon
Chungnam National University, Yuseong-Gu, Daejeon, Republic of Korea

Mr Seoung-Won Yoon is a PhD Candidate at Chungnam National University, Republic of Korea, under the supervision of Professor Kyuchul Lee in the Computer Science & Engineering Department. Mr Yoon’s research involves developing deep learning models, primarily for bioinformatics and drug repurposing. He has so far participated in more than 10 nationally funded projects. His work on deep learning has included model development for pancreatic cancer-related genes, miRNA and mRNA prediction, and predicting the relationships between bio-genetic data. Additionally, he has conducted research projects in risk prediction, evaluating the performance of user movement path predictions, and assessing similarity patterns in user data. Furthermore, he has worked on an internet-of-things-based wearable flu vaccine project aimed at preventing the spread of influenza.

CONTACT

E: yoonenoch11@gmail.com


Professor Kyuchul Lee
Chungnam National University, Yuseong-Gu, Daejeon, Republic of Korea

Professor Kyuchul Lee received his Bachelor’s degree in Computer Science from Seoul National University in 1984 and a PhD in the same field from the same university in 1990. He has been a faculty member at Chungnam National University since 1989 and has also held positions as a visiting researcher at the IBM Almaden Research Center and a visiting professor at Syracuse University. At the time of writing, he has led 144 national research projects, published 56 international and 125 domestic journal papers, given 97 international and 206 domestic conference presentations, and holds 46 patents and intellectual property records. He has also facilitated 12 technology transfers. As advisor of the Data Artificial Intelligence Research Lab (formerly the Database Systems Lab), he has mentored over 110 Master’s and PhD students, developing core technologies in databases and AI. His research spans multimedia data, XML, the semantic web, IoT, AI, and big data. Through courses such as File Processing, Database Systems, and Web Programming, he has equipped students with the knowledge to excel in their professional careers.

FUNDING

National Research Foundation of Korea (NRF)

Korean Government (MSIT)

FURTHER READING

S Yoon, I Hwang, J Cho, et al., miGAP: miRNA–Gene Association Prediction Method Based on Deep Learning Model, Applied Sciences, 2023, 13(22), 12349. DOI: https://doi.org/10.3390/app132212349

Creative Commons Licence (CC BY 4.0)

This work is licensed under a Creative Commons Attribution 4.0 International License.
