Method

Pre-processing of the small RNA-seq

Source and processing of small RNA high throughput sequencing data: The small RNA high throughput sequencing data were downloaded from the NCBI SRA database (http://www.ncbi.nlm.nih.gov/sra). The adaptor sequences from the raw data were removed using Cutadapt. Clean reads with length of 16-40 bases long were kept into subsequent analysis.


Building tRNA index

Information about the tRNA genes for 20 species were downloaded from GtRNAdb (http://gtrnadb.ucsc.edu). Only high confident tRNAs with conserved secondary cloverleaf structure were used. Corresponding reference genomes were download from the same genome assembly which GtRNAdb used. tRNAs sequences were transformed to mature tRNA by removing intron sequences and adding “CCA” tail. 100 bp downstream sequences of tRNAs were extracted from reference genome based on genomic locatin of tsRNAs.


Mapping small RNA

Species-specific tRNA blast databases were built to query the small RNA sequences. Then we use Blastall to find the tRNA-related RNA sequences in each library. We considered only those alignments where the small RNA sequence was mapped to the tRNA sequence along 100% of its length without mismatch. Then the expression value of tsRNA was quantified using our custom script. Next, we filtered tsRNAs with low expression value and/or in limited samples to eliminate random degradation sequences.


Annotation of tsRNAs

The tsRNAs sequences were blast to tRNA index again to obtain the tsRNA-tRNA relationship. Here, we found that some tsRNAs mapped to more than one tRNAs. Sequence analysis of these tRNAs shows that they have high sequence similarity or even different copies of same tRNA. Therefore, we kept all the receivable matching relationship. Then the position type of tsRNA were settled according to the mapping result. Only if the identical sequence mapping at the 5’ extreme end of the mature tRNA was classified as 5’ tsRNA. A tsnaRNA that mapped to the 3’ extreme end of mature tRNA was classified as 3’ tsRNA. The downstream series were assigned to downstream tsRNA and the rest is inter tsRNA.


tsRNA target identification through CLASH/CLEAR data analysis

Crosslinking, ligation, and sequencing of hybrids (CLASH) and Covalent ligation of endogenous Argonaute-bound RNAs (CLEAR)-CLIP are two new techniques that designed to directly observe in vivo miRNA-target interaction through AGO binding. In addition, some research demonstrated that tsRNAs can recognize specific RNA targets by binding AGO protein in a miRNA-like way. So we used a custom pipeline to explore tsRNA-target interaction in CLASH/CLEAR-CLIP data. Three datasets were used: 1) CLASH data of human HEK293 cells (GSE50452); 2) CLEAR-CLIP data of human Huh7.5 cells (GSE73057); 3) CLEAR-CLIP data of mouse cortex (GSE73058). We first used Cutadapt to remove adapters and filter reads less than 16 nt. Then we merged all the fastq file to a fasta file, and collapsed and counted each unique read. The reads were then mapped to tRNA reference. Reads partially match (16-40 nt of the reads mapped to tRNAs perfectly and the unmapped part is more than 8 nt) to tRNAs were kept as candidate tsRNA–target chimeras. Next, we mapped the candidates to genome using bowtie and blat respectively to mark and remove fake chimeras that can mapped to other sites. At the same time, we “blasted” the candidates to NT database to remove pollutants. And then, the candidate tsRNA–target chimeras were split to tsRNAs and target sequences. The target sequences were mapped to genome to get the genome positions and to obtain the 100nt downstream of the ligation sites as target sequences. The duplex structure predictions for tsRNAs and target regions were made using RNAhybrid. Target gene were annotated using bedtools.


Protein binding tsRNAs exploration from CLIP/RIP data

The small RNA-seq data of protein CLIP/RIP sample were first pre-processed as previously mentioned. Only reads with a length between 16-40 nt were kept. Then, each unique read were collapsed and counted. Next, we use BLAST-2.2.16 to map these reads to their corresponding tsRNA reference to get the protein binding tsRNAs and their expression value. At last, we normalized each tsRNA’ expression value to CPM.


Mining tsRNA research literatures

Since the articles about tsRNAs’ specific expression patterns or biological functions are limited, it is feasible to use a text-mining approach to search for the information of tsRNAs in the full-text of open access articles. We firstly collected all articles mentioning the tsRNAs by searching keywords such as “tsRNA”, “tRF” or “tRNA” in NCBI PubMed database. Then we sought tsRNAs with validated specific expression or function by full-text mining. All the information of these tsRNAs was recorded and correspond with tsRBase ID.