R(A)PTOR - A tool for systematic identification of poly(A) tails and 3' unmapped regions from single-molecule direct RNA-sequencing datasets

Abstract

The 3' cleavage of pre-messenger RNA (pre-mRNA) and subsequent polyadenylation is a fundamental cellular process in eukaryotes. The poly(A) tail is generally described as a long chain of adenine nucleotides added to the 3' terminus of a messenger RNA (mRNA) molecule during RNA processing; however, the 3' terminal region is also known to harbor additional unmappable regions (UMRs) that include uridylation and guanylation [1]. Although short-read sequencing technologies are extensively used to study 3' terminal poly(A) regions, a major drawback of these technologies is their inability to detect full-length homopolymeric sequences [1] [2]. Recent long-read sequencing technologies such as Nanopore sequencing enable sequencing of full-length transcripts at single-molecule resolution; however, there are currently no tools for systematically analyzing 3' terminal unmapped regions from direct RNA-sequencing datasets. We present RAPTOR (https://github.com/aniram118/RAPTOR), a command-line tool for 3' terminal unmapped region analysis of Nanopore direct RNA-sequencing data. RAPTOR provides a comprehensive report of UMR lengths, sequences, conserved poly(A) hexamer regions, nucleotide base composition, and transcript versus UMR length correlation at single-molecule resolution. In our benchmarking studies, we sequenced mRNA samples obtained from the HepG2 (liver hepatocellular carcinoma) and K562 (bone marrow chronic myelogenous leukemia) cell lines, resulting in 243,802 and 598,428 reads, respectively. RAPTOR-identified UMRs exhibited a median length of 50-100 nt, in agreement with previous studies [1]. Our results also support an enrichment of previously known conserved poly(A) hexamers [3].
Nucleotide composition analysis of the identified 3' UMRs showed an enrichment for A and U nucleotides in both HepG2 [A: 29%, U: 28%, G: 20%, C: 23%] and K562 [A: 30%, U: 29%, G: 19%, C: 22%]. Interestingly, guanylation was observed in the upstream and downstream regions of UMRs, while uridylation occurred more often in central regions, suggesting characteristic roles in mRNA stability. In addition, conserved motif analysis of UMRs followed by RNA-binding protein (RBP) binding site analysis identified several RBPs, including HNRPK, PCB2, SART, SRSF9, HNRPR and RBM4, as enriched in the unmapped regions, suggesting an unappreciated role for these RBPs in binding to the 3' tails of mRNAs.
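
The per-read summaries named in the abstract (UMR length, base composition, and poly(A) hexamer occurrence) can be illustrated with a minimal Python sketch. This is not RAPTOR's implementation, and it does not use RAPTOR's command-line interface or output format; the function names, the two-entry hexamer list (AAUAAA and its common variant AUUAAA), and the plain list-of-strings input are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median

# Assumed hexamer list for illustration only: the canonical polyadenylation
# signal and its most common variant. RAPTOR's own list may differ.
POLYA_HEXAMERS = ("AAUAAA", "AUUAAA")


def to_rna(seq):
    """Upper-case a sequence and convert T to U so DNA-style reads also work."""
    return seq.upper().replace("T", "U")


def base_composition(seqs):
    """Return the percentage of A, U, G and C pooled over all sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(to_rna(seq))
    total = sum(counts[base] for base in "AUGC")
    if total == 0:
        return {base: 0.0 for base in "AUGC"}
    return {base: 100.0 * counts[base] / total for base in "AUGC"}


def hexamer_counts(seqs, hexamers=POLYA_HEXAMERS):
    """Count how many sequences contain each hexamer at least once."""
    counts = {hexamer: 0 for hexamer in hexamers}
    for seq in seqs:
        rna = to_rna(seq)
        for hexamer in hexamers:
            if hexamer in rna:
                counts[hexamer] += 1
    return counts


def median_length(seqs):
    """Median sequence length in nucleotides (raises on an empty input)."""
    return median(len(seq) for seq in seqs)
```

Given a list of UMR sequences extracted by any upstream step, `base_composition(umrs)` yields percentages comparable in form to the A/U/G/C figures reported above, while `hexamer_counts(umrs)` and `median_length(umrs)` give the corresponding hexamer and length summaries.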

Description
Digitized for IUPUI ScholarWorks inclusion in 2021.
Type
Poster