tDRnamer standalone software
The standalone version of tDRnamer is available for researchers who have large data sets and the resources and experience to work in a Linux/Unix environment.
Installation
System requirements
tDRnamer must be run on a Linux/Unix system with at least 8 cores and 16 GB of memory. If working with small RNA sequencing data, we do not recommend using tDRnamer on a regular desktop or laptop.
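As a quick way to compare a machine against these requirements, the following Python sketch reports the logical core count and total physical memory. It is only an illustration, not part of tDRnamer: the function name `system_resources` is hypothetical, and the `os.sysconf` keys used here are assumed to be available, as they normally are on Linux.

```python
import os

# Thresholds taken from the system requirements stated above.
MIN_CORES = 8
MIN_MEMORY_GB = 16


def system_resources():
    """Return (logical_cores, total_memory_in_GiB) for the current machine."""
    cores = os.cpu_count() or 0
    # Total physical memory = page size * number of physical pages.
    total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, total_bytes / 1024 ** 3


cores, memory_gib = system_resources()
print(f"Logical cores: {cores} (required: at least {MIN_CORES})")
print(f"Total memory: {memory_gib:.1f} GiB (required: at least {MIN_MEMORY_GB} GB)")
```

The reported memory is usually slightly below the nominal installed amount, so the values are printed for inspection rather than turned into a strict pass/fail check.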
Using Docker Image
To avoid installing dependencies, you can download the Docker image from our DockerHub repository using the command:
docker pull ucsclowelab/tdrnamer
Using Conda Environment
If you prefer to use conda, you can create the environment using the command:
conda env create -f tdrnamer_env.yaml
Getting source code
The source code can be downloaded from GitHub at https://github.com/UCSC-LoweLab/tDRnamer. tDRnamer was developed with Python and Perl, and does not require compilation or installation.
To run tDRnamer from source code, the dependencies listed below must be installed.
Dependencies
- Python 2.7 or higher
- pysam Python library (latest version; older versions have a memory leak)
- Bowtie2
- NCBI BLAST+ 2.3 or higher
- EMBOSS 6.6
- Samtools 1.9 or higher
- Infernal 1.1.2 or higher
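Before running from source, it can help to confirm that the external tools are discoverable on PATH. The sketch below is a minimal illustration under stated assumptions: the executable names (`bowtie2`, `blastn` for BLAST+, `samtools`, `cmsearch` for Infernal) reflect how these packages are commonly installed and are not taken from the tDRnamer documentation, and the helper `find_missing_tools` is a hypothetical name. EMBOSS is left out because the specific EMBOSS program needed is not stated here, and versions are not checked.

```python
import shutil

# Assumed executable names for typical installations (not confirmed by tDRnamer).
ASSUMED_EXECUTABLES = ["bowtie2", "blastn", "samtools", "cmsearch"]


def find_missing_tools(executables):
    """Return the names in `executables` that cannot be found on PATH."""
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools(ASSUMED_EXECUTABLES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All assumed executables were found on PATH.")
```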
Tutorial
Test run
To try out tDRnamer with small data sets, we provide a script, test_run.bash. It downloads sample data, the GRCh38/hg38 reference genome, and GtRNAdb tRNA annotations from our server, builds a tDRnamer reference database, and performs five tDRnamer runs:
1. Search and annotate tDRs from ARM-seq sample data in FASTQ file format
2. Name and annotate tDR sequences provided in a FASTA file with default mode
3. Name and annotate tDR sequences provided in a FASTA file with maximum sensitivity mode
4. Name and annotate tDR sequences provided in a FASTA file, including nucleotide variations if any exist
5. Search and annotate tDR sequences from provided tDR names
The ARM-Seq sample data was described in the following publication.
Cite
Cozen AE, Quartley E, Holmes AD, et al. (2015) ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments. Nature Methods 12:879–884.
The test run may take approximately five minutes to complete and the sample outputs can be downloaded from our server for comparison.
Components
tDRnamer contains two main tools:
create_tDRnamer_db
- Build a reference database for naming tDRs or finding tDR sequences
tDRnamer
- Name/annotate tDRs or find tDR sequences
Data needed as inputs
- For naming tDRs from sequences
  - Preprocessed sequencing data from an Illumina platform in FASTQ file format, or
  - tDR sequences in FASTA file format
- For finding tDR sequences from tDR names, a single-column text file containing tDR names in the defined format
- tRNAscan-SE outputs for the target genome, downloaded from GtRNAdb
- Genome sequence FASTA file (eukaryotic genomes can be downloaded from the UCSC Genome Browser)
Note
The chromosome names must be the same across all the input files. For example, if chr1 is used as chromosome 1 in the tRNA annotations, the same chromosome name must be used in the genome sequence FASTA file. If the genome sequence file is obtained from NCBI, ENSEMBL, or ENA, the chromosome names in the FASTA file have to be updated to match the tRNA annotations.
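A naming mismatch of this kind can be caught before building a database. The following Python sketch lists sequence names that appear in the tRNAscan-SE output but not in the genome FASTA file. It rests on assumptions that should be checked against your files: the tRNAscan-SE tabular output is taken to start with three header lines and to carry the sequence name in the first column, and all function names are hypothetical helpers, not part of tDRnamer.

```python
import gzip


def fasta_sequence_names(path):
    """Collect sequence names (text after '>' up to the first whitespace)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    names = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def trnascan_sequence_names(path, header_lines=3):
    """Collect first-column sequence names from a tRNAscan-SE tabular output.

    Assumption: the file begins with `header_lines` header lines.
    """
    names = set()
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index < header_lines or not line.strip():
                continue
            names.add(line.split()[0])
    return names


def names_missing_from_genome(genome_fasta, trnascan_out):
    """Return sorted sequence names used in annotations but absent from the FASTA."""
    annotated = trnascan_sequence_names(trnascan_out)
    return sorted(annotated - fasta_sequence_names(genome_fasta))
```

Calling `names_missing_from_genome("genome.fa", "trnascan.out")` and getting an empty list suggests the names agree; any returned names (for example `1` where the FASTA uses `chr1`) point to entries that need renaming.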
How to Run
Step 1: Build custom reference database
Before naming tDRs or searching for tDR sequences, a custom reference database has to be built. Pre-built databases for model organisms have been made available for download here.
Reference databases can also be built using the create_tDRnamer_db tool.
create_tDRnamer_db --db dbname --genome genome.fa --trna trnascan.out --ss trnascan.ss --namemap trna_name_map.txt --source source
- dbname is the output directory and name that will be used for the reference database
- genome.fa is a FASTA file of the reference genome
- trnascan.out is the output file generated by tRNAscan-SE and can be downloaded from GtRNAdb
- trnascan.ss is the secondary structure file generated by tRNAscan-SE and can be downloaded from GtRNAdb
- trna_name_map.txt is the map file that converts the tRNAscan-SE IDs to GtRNAdb gene symbols. It is also included in the GtRNAdb downloaded tarball.
- source is the sequence source of the reference and can be euk for eukaryotes (default), bact for bacteria, or arch for archaea.
If create_tDRnamer_db is run within the tDRnamer source directory, a file path where the reference database will be created should be included as part of the dbname, for example, /db_path/hg38.
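When database builds are scripted, assembling the command as an argument list avoids quoting problems and lets the source value be validated first. The Python sketch below is one possible wrapper, not part of tDRnamer: the function name `build_db_command` is hypothetical, the file names are the placeholders used above, and it assumes create_tDRnamer_db is on PATH.

```python
VALID_SOURCES = ("euk", "bact", "arch")


def build_db_command(dbname, genome, trna, ss, namemap, source="euk"):
    """Assemble the create_tDRnamer_db argument list using the options shown above."""
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {VALID_SOURCES}, got {source!r}")
    return [
        "create_tDRnamer_db",
        "--db", dbname,
        "--genome", genome,
        "--trna", trna,
        "--ss", ss,
        "--namemap", namemap,
        "--source", source,
    ]


command = build_db_command(
    "/db_path/hg38", "genome.fa", "trnascan.out", "trnascan.ss", "trna_name_map.txt"
)
print(" ".join(command))
# To actually run it, pass the list to subprocess.run(command, check=True).
```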
Step 2: Naming tDRs or finding tDR sequences
Naming tDRs from sequences
Researchers can provide a FASTA file with possible tDR sequences as input. Alternatively, preprocessed small RNA-seq data in a FASTQ file can be supplied. Raw sequencing data must be preprocessed to remove sequencing adapters and merge paired-end reads into single-end reads; trimadapters.py in the tRAX software package can be used for this purpose. Gzip-compressed files can be used.
Note
Both the forward and reverse strands of input sequences are searched.
To start the tDR naming process, run the following command:
tDRnamer --mode seq --seq tdrs --db dbname --source source --output output_dir/prefix
- tdrs is the input FASTA or FASTQ file
- dbname is the directory and name of the reference database generated by create_tDRnamer_db
- source is the sequence source of the tDRs and can be euk for eukaryotes (default), bact for bacteria, or arch for archaea.
- output_dir/prefix is the directory and prefix for output files
Finding tDR sequences by names
The input is a single-column text file, without a column header, that contains tDR names.
Example
tDR-31:76-Asp-GTC-2-G15C
tDR-38:76-Gln-CTG-1-D15U
tDR-1:41-Lys-CTT-1
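To screen a name list for malformed entries before a run, the names can be parsed with a regular expression. The Python sketch below covers only the shape of the three examples above (position range, isotype, anticodon, a number, and optional trailing tokens such as G15C); it is not the official tDR naming grammar, and other valid name forms may be rejected. The field labels and the names `TdrName` and `parse_tdr_name` are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TdrName(NamedTuple):
    start: int                 # first position in the range
    end: int                   # last position in the range
    isotype: str               # e.g. "Asp"
    anticodon: str             # e.g. "GTC"
    transcript: int            # number following the anticodon
    suffixes: Tuple[str, ...]  # trailing tokens such as "G15C", left uninterpreted


# Pattern derived only from the example names shown above.
PATTERN = re.compile(
    r"^tDR-(\d+):(\d+)-([A-Za-z0-9]+)-([A-Z]{3})-(\d+)((?:-[A-Za-z0-9]+)*)$"
)


def parse_tdr_name(name: str) -> Optional[TdrName]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    match = PATTERN.match(name.strip())
    if match is None:
        return None
    start, end, isotype, anticodon, transcript, tail = match.groups()
    suffixes = tuple(part for part in tail.split("-") if part)
    return TdrName(int(start), int(end), isotype, anticodon, int(transcript), suffixes)


for example in ("tDR-31:76-Asp-GTC-2-G15C", "tDR-38:76-Gln-CTG-1-D15U", "tDR-1:41-Lys-CTT-1"):
    print(example, "->", parse_tdr_name(example))
```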
To find and annotate tDR sequences, run the following command:
tDRnamer --mode name --name tdrs --db dbname --source domain --output output_dir/prefix
- tdrs is the input single-column text file with tDR names
- dbname is the directory and name of the reference database generated by create_tDRnamer_db
- domain is the sequence source of the tDRs and can be euk for eukaryotes (default), bact for bacteria, or arch for archaea.
- output_dir/prefix is the directory and prefix for output files
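The two modes differ only in the mode value and the input option, so a single helper can assemble either command. This Python sketch is a hypothetical wrapper (the function name `build_tdrnamer_command` is not part of tDRnamer) that uses only the options shown in this section and assumes tDRnamer is on PATH.

```python
# Input option that goes with each mode, as shown in the commands above.
INPUT_FLAG_BY_MODE = {"seq": "--seq", "name": "--name"}
VALID_SOURCES = ("euk", "bact", "arch")


def build_tdrnamer_command(mode, input_path, dbname, output_prefix, source="euk"):
    """Assemble the tDRnamer argument list for sequence mode or name mode."""
    if mode not in INPUT_FLAG_BY_MODE:
        raise ValueError(f"mode must be 'seq' or 'name', got {mode!r}")
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {VALID_SOURCES}, got {source!r}")
    return [
        "tDRnamer",
        "--mode", mode,
        INPUT_FLAG_BY_MODE[mode], input_path,
        "--db", dbname,
        "--source", source,
        "--output", output_prefix,
    ]


# Placeholder paths for illustration only.
print(" ".join(build_tdrnamer_command("seq", "tdrs.fa", "/db_path/hg38", "output_dir/prefix")))
print(" ".join(build_tdrnamer_command("name", "tdr_names.txt", "/db_path/hg38", "output_dir/prefix")))
# To execute one of them, pass the list to subprocess.run(command, check=True).
```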