Predicting safe haven regions for helitron transposable elements in centromeric regions of plant genomes using machine learning

Presentation Type

Abstract

Faculty Advisor

Chunguang Du

Access Type

Event

Start Date

25-4-2025 1:30 PM

End Date

25-4-2025 2:29 PM

Description

Helitrons are transposable elements that replicate through rolling-circle amplification. Prior peer-reviewed research has identified a significant number of Helitrons accumulating in the centromeric regions of plant genomes, including that of rice (Oryza sativa). These regions are characterized by high repeat density and low gene density, and they are hypothesized to serve as safe havens for Helitron insertions, minimizing disruption to essential genes. To build upon this work, we developed a machine learning approach to predict additional safe haven regions for Helitrons in the rice genome. We first extracted Helitron insertion sites from the rice genome, focusing our analysis on the centromeric regions of Chromosome 10. These insertion sites were used as positive examples, while randomly selected genomic regions served as negative controls. We then trained a Random Forest classifier to predict new candidate safe haven regions for Helitron insertion. The resulting predictions were visualized to confirm non-random distribution patterns consistent with centromeric accumulation. Our preliminary results demonstrate the feasibility of using machine learning to predict Helitron safe havens. Future work will focus on integrating additional genomic and epigenomic features, such as repeat density, repeat regions, and chromatin state, to enhance prediction accuracy not only in rice but also in other species. This conceptual framework lays the foundation for further understanding of Helitron insertion dynamics and for potential applications in genome engineering.
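
The labeling step described above (known insertion sites as positives, randomly selected regions as negative controls) could be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function name `sample_negative_windows`, the window size, the chromosome length, and every coordinate are hypothetical placeholders, and the rule that negatives must not overlap any positive window is an assumption the abstract does not state.

```python
import random


def sample_negative_windows(positives, chrom_length, window, n, seed=0):
    """Draw up to n random fixed-size windows that overlap no positive window.

    Windows are half-open (start, end) intervals. Excluding overlaps with
    positives is an assumption made for this sketch.
    """
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    negatives = []
    attempts = 0
    max_attempts = 1000 * n  # guard against looping forever on crowded input
    while len(negatives) < n and attempts < max_attempts:
        attempts += 1
        start = rng.randrange(0, chrom_length - window + 1)
        end = start + window
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < p_end and p_start < end for p_start, p_end in positives):
            continue
        negatives.append((start, end))
    return negatives


# Hypothetical insertion windows; these are made-up coordinates, not real data.
positives = [(8_100_000, 8_101_000), (8_250_000, 8_251_000), (8_400_500, 8_401_500)]
chrom_length = 23_000_000  # placeholder length, not taken from a real assembly

negatives = sample_negative_windows(positives, chrom_length, window=1_000, n=5)

# Labeled examples: 1 = observed insertion window, 0 = random control window.
dataset = [(w, 1) for w in positives] + [(w, 0) for w in negatives]
```

A labeled set of this shape, once each window is converted into numeric features, is the kind of input a Random Forest implementation such as scikit-learn's `RandomForestClassifier` expects. Which features were actually used is not specified in the abstract.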

Comments

Poster presentation at the 2025 Student Research Symposium.
