Reference Manual

create_tDRnamer_db

create_tDRnamer_db creates the reference database that tDRnamer uses to name tDRs or to find tDR sequences by name. It uses Bowtie2 to build indexes for sequence similarity search in the default search mode. A nucleotide BLAST database is also built with NCBI BLAST+ for the initial scan of the maximum sensitivity search mode. In addition, the tool uses Infernal to create tRNA sequence alignments for annotating the positions of tDRs. The tRNA annotations required as inputs can be downloaded from GtRNAdb or generated by running tRNAscan-SE. The chromosome/sequence names of the reference genome sequences must match the sequence names in the tRNA annotations. Eukaryotic genomes can be downloaded from the UCSC Genome Browser.
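
Because the sequence names in the genome FASTA must match those in the tRNA annotations, a quick check before building the database may save a failed run. The sketch below is not part of tDRnamer. It assumes that the sequence name is the first whitespace-delimited field of each tRNAscan-SE output row and that header rows begin with Sequence, Name, or dashes. The function names and sample data are hypothetical.

```python
def fasta_names(lines):
    """Collect sequence names (first token after '>') from FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def trnascan_seq_names(lines):
    """Collect sequence names from tRNAscan-SE tabular output lines.

    Assumption: the first whitespace-delimited field is the sequence name,
    and header rows start with 'Sequence', 'Name', or dashes.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("Sequence", "Name") or fields[0].startswith("-"):
            continue
        names.add(fields[0])
    return names


def unmatched_names(fasta_lines, trnascan_lines):
    """Return annotation sequence names that are absent from the genome FASTA."""
    return sorted(trnascan_seq_names(trnascan_lines) - fasta_names(fasta_lines))


# Made-up sample data for illustration only.
genome = [">chr1 example", "ACGTACGT", ">chr2", "GGCCTTAA"]
annotations = [
    "Sequence\ttRNA\tBounds",
    "Name\ttRNA #\tBegin",
    "--------\t------\t-----",
    "chr1\t1\t100",
    "chrM\t1\t200",
]
print(unmatched_names(genome, annotations))
```

An empty list means every annotated sequence name was found in the genome file. For real files, pass open file handles instead of the in-memory lists.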

By default, possible pseudogenes and predicted tRNA genes with undetermined isotype are excluded from the database. For multicellular eukaryotes with high confidence tRNA genes defined, only high confidence/scoring tRNAs and filtered genes that have a score >= 50 bits, an isotype model score >= 80 bits, and a consistent anticodon/isotype model are included. Researchers can skip the filtering step completely by using the --skipfilter option. The cutoffs for tRNA score and isotype model score can also be adjusted using the --score and --isoscore options.
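
The filtering rules can be read as a simple predicate. The following is a simplified sketch of that reading, not the tool's actual implementation. The record fields are hypothetical, the "Undet" label for an undetermined isotype is an assumption, and the real high confidence classification in tRNAscan-SE involves more than these checks.

```python
from dataclasses import dataclass


@dataclass
class TrnaGene:
    # All field names here are hypothetical, chosen for illustration.
    name: str
    score: float            # tRNAscan-SE score in bits
    isotype_score: float    # isotype model score in bits
    anticodon_isotype: str  # isotype implied by the anticodon
    model_isotype: str      # isotype of the best-scoring isotype model
    is_pseudogene: bool = False


def passes_filter(gene, min_score=50.0, min_isoscore=80.0, skip_filter=False):
    """Simplified reading of the documented filter, not the tool's code."""
    if skip_filter:  # mirrors --skipfilter: keep everything
        return True
    if gene.is_pseudogene or gene.anticodon_isotype == "Undet":
        return False
    return (
        gene.score >= min_score                 # mirrors --score
        and gene.isotype_score >= min_isoscore  # mirrors --isoscore
        and gene.anticodon_isotype == gene.model_isotype
    )


# Made-up genes for illustration only.
genes = [
    TrnaGene("g1", 72.5, 95.0, "Gly", "Gly"),
    TrnaGene("g2", 55.0, 60.0, "Ala", "Ala"),
    TrnaGene("g3", 80.0, 90.0, "Ile", "Met"),
    TrnaGene("g4", 30.0, 20.0, "Leu", "Leu", is_pseudogene=True),
]
kept_default = [g.name for g in genes if passes_filter(g)]
kept_relaxed = [g.name for g in genes if passes_filter(g, min_isoscore=60.0)]
kept_all = [g.name for g in genes if passes_filter(g, skip_filter=True)]
print(kept_default, kept_relaxed, kept_all)
```

Lowering the isotype score cutoff admits g2, while g3 stays out because its anticodon and isotype model disagree.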

For more information regarding the classification of high confidence/scoring tRNA genes, please check out our tRNAscan-SE paper:

Chan PP, Lin BY, Mak AJ, and Lowe TM. (2021) tRNAscan-SE 2.0: Improved Detection and Functional Classification of Transfer RNA Genes. Nucleic Acids Res. 49:9077–9096.

Usage

create_tDRnamer_db --db dbname --genome genome.fa --trna trnascan.out --ss trnascan.ss --namemap trna_name_map.txt [--source source] [--force] [--skipfilter] [--score score] [--isoscore score]

Options

--db or -d : dbname (required)
Directory and database name that will be used for the reference database
--genome or -g : genome.fa (required)
FASTA file of reference genome
--trna or -t : trnascan.out (required)
tRNAscan-SE output file (*.out file in GtRNAdb downloaded tarball)
--ss or -s : trnascan.ss (required)
tRNAscan-SE secondary structure output file (*.ss file in GtRNAdb downloaded tarball)
--namemap or -n : trna_name_map.txt (required)
Map file that converts tRNAscan-SE IDs to GtRNAdb gene symbols (*_name_map.txt file in GtRNAdb downloaded tarball)
--source or -r : source (optional)
Sequence source of reference
Default is euk for eukaryotes. Other values include arch for archaea and bact for bacteria.
--force or -q (optional)
Overwrite output files if they already exist.
--skipfilter (optional)
Skip filtering step to include all provided tRNAs in database
--score : score (optional)
tRNAscan-SE score cutoff for filtering multicellular eukaryotic tRNA genes (default = 50)
--isoscore : score (optional)
Isotype model score cutoff for filtering multicellular eukaryotic tRNA genes (default = 80)
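
When database builds are scripted, it can help to assemble the command from the documented options in one place. The helper below is a hypothetical sketch that uses only the options listed above. The helper name and the file names are placeholders, and nothing is executed.

```python
import shlex


def build_create_db_command(db, genome, trna, ss, namemap, source=None,
                            force=False, skipfilter=False,
                            score=None, isoscore=None):
    """Assemble an argument list using only the options documented above."""
    if source is not None and source not in ("euk", "arch", "bact"):
        raise ValueError("source must be euk, arch, or bact")
    cmd = ["create_tDRnamer_db", "--db", db, "--genome", genome,
           "--trna", trna, "--ss", ss, "--namemap", namemap]
    if source is not None:
        cmd += ["--source", source]
    if force:
        cmd.append("--force")
    if skipfilter:
        cmd.append("--skipfilter")
    if score is not None:
        cmd += ["--score", str(score)]
    if isoscore is not None:
        cmd += ["--isoscore", str(isoscore)]
    return cmd


# File names below are placeholders, not files shipped with the tool.
cmd = build_create_db_command(
    db="db/example", genome="genome.fa", trna="trnascan.out",
    ss="trnascan.ss", namemap="trna_name_map.txt", source="euk", score=55,
)
print(shlex.join(cmd))
# To run the command: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting problems when paths contain spaces.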

Outputs

The following files are generated upon completion:

dbname-tRNAgenome.*: FASTA file of tRNA sequences with Bowtie2 indexes and BLAST database
dbname-trnaalign.stk: Alignments of mature tRNA sequences in Stockholm file format
dbname-trnaconvert.stk: Alignments of mature tRNA sequences in Stockholm file format
dbname-trnaloci.stk: Alignment of tRNA gene sequences in Stockholm file format
dbname-trnatable.txt: Tab-delimited file with tRNA transcripts and tRNA genes map
dbname-maturetRNAs.fa: FASTA file of mature tRNA sequences
dbname-maturetRNAs.bed: tRNA transcripts in BED file format
dbname-tRNAloci.fa: FASTA file of tRNA gene sequences
dbname-trnaloci.bed: tRNA genes in BED file format
dbname-filtered-tRNAs.out: Filtered tRNA genes used for database creation, in tRNAscan-SE output file format
dbname-dbinfo.txt: Database creation information
dbname-create_tDRnamer_db.log: Database creation log file
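
A short check can confirm that a database build left the listed files in place. This is a sketch that assumes every file in the list above is written on every run, which the manual does not state explicitly. The dbname-tRNAgenome.* entry is left out because the individual index file names are not listed. The helper name is hypothetical.

```python
import tempfile
from pathlib import Path

# Suffixes copied from the output list above; "-tRNAgenome.*" is omitted
# because it expands to several index files whose names are not listed.
DB_SUFFIXES = [
    "-trnaalign.stk", "-trnaconvert.stk", "-trnaloci.stk", "-trnatable.txt",
    "-maturetRNAs.fa", "-maturetRNAs.bed", "-tRNAloci.fa", "-trnaloci.bed",
    "-filtered-tRNAs.out", "-dbinfo.txt", "-create_tDRnamer_db.log",
]


def missing_db_files(dbname):
    """Return the expected database files that are not present on disk."""
    return [dbname + s for s in DB_SUFFIXES if not Path(dbname + s).exists()]


# Demonstration in a temporary directory: create all files except the log.
with tempfile.TemporaryDirectory() as tmp:
    dbname = str(Path(tmp) / "example")
    for suffix in DB_SUFFIXES[:-1]:
        Path(dbname + suffix).touch()
    missing = [Path(p).name for p in missing_db_files(dbname)]
print(missing)
```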

tDRnamer

tDRnamer is the main tool that annotates tDRs. When naming tDRs by sequence, it aligns input sequences to the reference database using Bowtie2, computes tDR positions relative to source tRNAs, assigns names to tDRs, and groups tDRs with source tRNAs based on alignments. Both the forward and reverse strands of input sequences are searched. When tDR names are provided as inputs, the tool searches for the corresponding tDR sequences in the reference database and annotates the tDRs with the identified sequences. tDRs derived from both mature tRNAs and precursor tRNAs are identified when the sequence source is set to euk (eukaryotes). Only tDRs derived from mature tRNAs are identified when the sequence source is set to bact (bacteria) or arch (archaea).

Usage

tDRnamer --mode mode [--seq filename or --name filename] --db dbname --output output_dir/prefix [--source source] [--force] [--max] [--var] [--minread reads] [--minlen length] [--maxlen length] [--maxmismatch percentage] [--cores cores]

Options

--mode or -m : mode (required)
tDRnamer search mode
Default is seq (search by sequences). The other value is name (search by tDR names).
--seq or -s : filename (required when --mode is seq)
Input sequence file, only applicable with --mode as seq
Can be FASTA file with possible tDR sequences or FASTQ file with preprocessed small RNA-seq reads. Gzip compressed file is supported.
--name or -n : filename (required when --mode is name)
Input tDR name file, only applicable with --mode as name
Single-column text file without column header
--db or -d : dbname (required)
Directory and name of reference database generated by create_tDRnamer_db
--output or -o : output_dir/prefix (required)
Directory and prefix for output files
--source or -r : source (optional)
Sequence source of reference
Default is euk for eukaryotes. Other values include arch for archaea and bact for bacteria.
--force or -q (optional)
Overwrite output files if they already exist.
--max (optional)
Search with maximum sensitivity (slowest speed)
--var (optional)
Include nucleotide variation (if any) as part of a tDR name
Only applicable with --mode as seq
--minread : reads (optional)
Minimum number of identical sequencing reads to be considered as a possible tDR (default = 10)
Only applicable with --mode as seq and --seq as FASTQ file
--minlen : length (optional)
Minimum sequence length (nt) to be considered as a tDR (default = 15)
To skip minimum sequence length constraint, specify value as 0.
--maxlen : length (optional)
Maximum sequence length (nt) to be considered as a tDR (default = 70)
Only applicable with --mode as seq
To skip maximum sequence length constraint, specify value as 0.
--maxmismatch : percentage (optional)
Maximum percentage of mismatches by sequence length (default = 10)
Only applicable with --max option
Maximum acceptable value is 20
--cores : cores (optional)
Number of processing cores to be used for sequence search (default = 4)
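
Several options only apply in combination with others, so a wrapper script may want to check them before launching a run. The helper below is a hypothetical sketch that covers a subset of the options above and encodes the stated constraints (--var with seq mode only, --maxmismatch with --max only and at most 20). Whether tDRnamer itself rejects or ignores such combinations is not stated in this manual. File names are placeholders and nothing is executed.

```python
def build_tdrnamer_command(mode, db, output, seq=None, name=None,
                           source=None, force=False, use_max=False,
                           var=False, maxmismatch=None, cores=None):
    """Assemble a tDRnamer argument list after checking option constraints."""
    if mode not in ("seq", "name"):
        raise ValueError("mode must be seq or name")
    if mode == "seq" and seq is None:
        raise ValueError("--seq is needed when mode is seq")
    if mode == "name" and name is None:
        raise ValueError("--name is needed when mode is name")
    if var and mode != "seq":
        raise ValueError("--var only applies when mode is seq")
    if maxmismatch is not None:
        if not use_max:
            raise ValueError("--maxmismatch only applies with --max")
        if maxmismatch > 20:
            raise ValueError("--maxmismatch cannot exceed 20")

    cmd = ["tDRnamer", "--mode", mode]
    cmd += ["--seq", seq] if mode == "seq" else ["--name", name]
    cmd += ["--db", db, "--output", output]
    if source is not None:
        cmd += ["--source", source]
    if force:
        cmd.append("--force")
    if use_max:
        cmd.append("--max")
    if var:
        cmd.append("--var")
    if maxmismatch is not None:
        cmd += ["--maxmismatch", str(maxmismatch)]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd


# Placeholder paths for illustration only.
cmd = build_tdrnamer_command(
    mode="seq", seq="reads.fastq.gz", db="db/example",
    output="results/sample1", use_max=True, maxmismatch=15, cores=8,
)
print(" ".join(cmd))
```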

Input files

Sequence file

Researchers can provide a FASTA file with possible tDR sequences as input. Alternatively, preprocessed small RNA-seq data in a FASTQ file can be supplied. Raw sequencing data must first be preprocessed to remove sequencing adapters and to merge paired-end reads into single-end reads; trimadapters.py in the tRAX software package can be used for this purpose. Gzip-compressed files are accepted. See test_run.bash, distributed with the source code, for examples.
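
To see how the --minread, --minlen, and --maxlen options interact with FASTQ input, the sketch below collapses identical reads and applies the three thresholds with their documented defaults. It is an illustration only and not tDRnamer's code. It assumes four-line FASTQ records, inclusive length bounds, and that a value of 0 disables a length bound. The function names and reads are made up.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line of each four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def candidate_sequences(lines, minread=10, minlen=15, maxlen=70):
    """Count identical reads and keep those passing all three thresholds."""
    counts = Counter(fastq_sequences(lines))
    kept = {}
    for seq, n in counts.items():
        if n < minread:
            continue
        if minlen and len(seq) < minlen:  # 0 skips the lower bound
            continue
        if maxlen and len(seq) > maxlen:  # 0 skips the upper bound
            continue
        kept[seq] = n
    return kept


def fastq_records(seq, copies):
    """Build made-up FASTQ lines holding `copies` identical reads."""
    lines = []
    for i in range(copies):
        lines += [f"@read{i}", seq, "+", "I" * len(seq)]
    return lines


sample = (
    fastq_records("ACGT" * 5, 12)    # 20 nt, 12 reads: kept
    + fastq_records("ACGTACGT", 15)  # 8 nt: below the default minlen
    + fastq_records("TTGCA" * 4, 3)  # 20 nt, 3 reads: below minread
)
default_kept = candidate_sequences(sample)
no_minlen_kept = candidate_sequences(sample, minlen=0)
print(default_kept)
print(no_minlen_kept)
```

For a gzip-compressed file, the lines could be read with gzip.open(path, "rt") from the Python standard library.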

tDR name file

The input is a single-column text file of tDR names without a column header. An example file, ExampleNames.txt, is available for download.

Outputs

tDR annotations

prefix-tDR.fa: FASTA file with tDR names and sequences
prefix-tDR-info.txt: tab-delimited file with tDR annotations including tDR names and sequences, source tRNAs, Sprinzl positions of tDRs relative to source tRNAs, original tRNA isotype and anticodon, sequence variation counts, and group ID
prefix-tDR-groups.txt: text file containing queried tDRs that are grouped together by sequence alignments. Alignments are arranged in Stockholm format that includes primary sequence and secondary structure information. Details about Stockholm format can be found in the Infernal User Guide.
prefix-found-seq.fa: FASTA file with tDR sequences when searching by tDR names. This file is generated during the initial round of sequence search, before the annotation process.

tDR alignments

prefix-tDRs.stk: Alignments of identified tDRs derived from mature tRNAs with reference tRNA sequences in Stockholm file format
prefix-pre-tDRs.stk: Alignments of identified tDRs derived from precursor tRNAs with reference tRNA sequences in Stockholm file format. This file is only generated when --source is euk.
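
The Stockholm files can be read with a few lines of code for downstream analysis. The parser below is a minimal sketch of the general Stockholm layout (a "# STOCKHOLM 1.0" header, name and aligned-sequence pairs, an optional "#=GC SS_cons" structure line, and "//" as terminator). It reads a single alignment and ignores other annotation lines. The sequence names and alignment shown are made up and are not actual tDRnamer output.

```python
def parse_stockholm(lines):
    """Parse one Stockholm alignment into (sequences, ss_cons)."""
    sequences = {}
    ss_cons = ""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("# STOCKHOLM"):
            continue
        if line.strip() == "//":  # end of the alignment
            break
        if line.startswith("#=GC"):
            parts = line.split()
            if len(parts) == 3 and parts[1] == "SS_cons":
                ss_cons += parts[2]
            continue
        if line.startswith("#"):  # other annotation lines are ignored
            continue
        parts = line.split()
        if len(parts) != 2:
            continue
        name, aligned = parts
        # Concatenate so that interleaved blocks are also handled.
        sequences[name] = sequences.get(name, "") + aligned
    return sequences, ss_cons


# Made-up alignment for illustration only.
example = [
    "# STOCKHOLM 1.0",
    "ref_tRNA     GCGGAUUUAGCUC",
    "query_tDR    GCGGAUUU-----",
    "#=GC SS_cons (((((...)))))",
    "//",
]
sequences, ss_cons = parse_stockholm(example)
for name, aligned in sequences.items():
    print(name, aligned)
print("SS_cons", ss_cons)
```

Gap characters in a tDR row show which part of the reference tRNA the fragment does not cover.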

Other output files

prefix-unique-seq.fa: FASTA file with unique sequences in provided FASTQ file
prefix-filtered-seq.fa: FASTA file with sequences after filtering by length constraint and error checking. This is only generated when --mode is seq.
prefix-reformatted-seq.fa: FASTA file with sequences after converting RNA sequences to DNA sequences if applicable
prefix-filtered-names.txt: List of tDR names after filtering by error checking. This is only generated when --mode is name.
prefix-tDR-list.txt: Intermediate file generated during tDR annotation process
prefix-clusters.txt: Intermediate file generated during tDR group process
prefix-pre-clusters.txt: Intermediate file generated during tDR group process. This file is only generated when --source is euk.
prefix-tDRs.sam: Intermediate alignment file generated during tDR searching/annotation process with --max option.
prefix-find-tdrs.log: Intermediate log file for searching tDR sequences by names. This is only generated when --mode is name.
prefix_tDRnamer.log: Log file of tDRnamer run