tRNAviz

Explore and visualize tRNA sequence features across the tree of life

Introduction

tRNAviz empowers users to perform comparative genomics analyses on tRNA sequence features.

Although a vast number of tRNAs have been sequenced or predicted in genomes, aggregating tRNA features remains limited to those fluent in "big-data" bioinformatics methods.

To address this, tRNAviz provides a point-and-click toolset of customizable and powerful visualizations.

The data source for tRNAviz is the set of tRNA genes predicted by tRNAscan-SE that decode the standard twenty amino acids. Additional filtering criteria were applied to remove tRNA-derived SINEs, pseudogenes, and other sequences that we judged unlikely to be involved in bacterial, archaeal, or eukaryotic cytosolic ribosome translation. The following were removed:

  • tRNAs in phylum Chordata that do not belong to the high confidence gene set
  • tRNAs that are predicted as pseudogenes
  • genes with predicted truncation
  • fungal tRNA genes with scores below 50 bits
  • all other tRNA genes with a score below 25 bits
  • predicted mitochondrial-origin tRNAs in nuclear mitochondrial DNA sequences (NUMTs)
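
The exclusion criteria above can be read as a single predicate over a tRNA gene record. The following is a minimal sketch, not the tRNAviz pipeline itself: the `TrnaGene` record, its field names, and the `keep_gene` function are all hypothetical and exist only to make the rules concrete.

```python
from dataclasses import dataclass


@dataclass
class TrnaGene:
    # Hypothetical record; field names are illustrative, not tRNAviz's schema.
    score: float                    # tRNAscan-SE bit score
    phylum: str = ""
    is_fungal: bool = False
    high_confidence: bool = False   # member of the high confidence gene set
    pseudogene: bool = False
    truncated: bool = False
    numt: bool = False              # mitochondrial-origin tRNA found in a NUMT


def keep_gene(gene: TrnaGene) -> bool:
    """Return True if the gene survives every exclusion criterion listed above."""
    # Chordata genes must belong to the high confidence set.
    if gene.phylum == "Chordata" and not gene.high_confidence:
        return False
    # Pseudogenes, truncated genes, and NUMT-derived genes are always dropped.
    if gene.pseudogene or gene.truncated or gene.numt:
        return False
    # Fungal genes need at least 50 bits; all others need at least 25 bits.
    min_score = 50.0 if gene.is_fungal else 25.0
    return gene.score >= min_score
```

For instance, under these assumptions a fungal gene scoring 40 bits is dropped, while a bacterial gene with the same score is kept.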

Each tRNA is tagged with its source assembly, clade, isotype, anticodon, score, best-scoring isotype-specific model and score, isotype-specific score from the anticodon model, intron length, G/C content, number of indels, and loop sizes.

tRNAviz data snapshot

Data Release 1.0 (November 2018)

Within a group of tRNAs, each position and base pair was classified into a set of nucleotide ambiguity codes. The classification algorithm iterates over ranked feature combinations [1] to determine the consensus feature.

For a feature to be called the consensus for a group of tRNAs, at least 90% of the tRNAs within each major clade and isotype combination in the group must contain it. Each species must also contain at least one tRNA with that feature. In addition, each possible base or base pair was required to exist in at least 5% of the tRNA isodecoders in question [2].

[1] For example, purine consists of A or G and is ranked lower than A, C, G, and U.

[2] This prevents rare features from having a disproportionate impact.
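
The consensus rules described above can be sketched for a single position as follows. This is an illustration under stated assumptions rather than the tRNAviz implementation: the function name, the input tuple layout, and the ordering of codes within each tier are hypothetical, only single bases (not base pairs) are handled, and the 5% rule is applied to the whole input group rather than to isodecoders specifically.

```python
from collections import defaultdict

# IUPAC ambiguity codes, most specific first. The ordering within each
# tier is an assumption; only "single bases outrank purine" is from the text.
RANKED_CODES = [
    ("A", "A"), ("C", "C"), ("G", "G"), ("U", "U"),
    ("R", "AG"), ("Y", "CU"), ("S", "CG"), ("W", "AU"), ("K", "GU"), ("M", "AC"),
    ("B", "CGU"), ("D", "AGU"), ("H", "ACU"), ("V", "ACG"),
    ("N", "ACGU"),
]


def consensus_code(trnas, min_fraction=0.9, min_base_fraction=0.05):
    """Return the first ranked code satisfying all rules, or None.

    trnas: list of (clade, isotype, species, base) tuples for one position.
    """
    if not trnas:
        return None
    total = len(trnas)
    base_counts = defaultdict(int)
    by_combo = defaultdict(list)    # (clade, isotype) -> bases
    by_species = defaultdict(list)  # species -> bases
    for clade, isotype, species, base in trnas:
        base_counts[base] += 1
        by_combo[(clade, isotype)].append(base)
        by_species[species].append(base)

    for code, members in RANKED_CODES:
        allowed = set(members)
        # Rule 3: every base covered by the code must be reasonably common.
        if any(base_counts[b] / total < min_base_fraction for b in allowed):
            continue
        # Rule 1: at least 90% of tRNAs in each clade/isotype combination match.
        if any(sum(b in allowed for b in bases) / len(bases) < min_fraction
               for bases in by_combo.values()):
            continue
        # Rule 2: every species has at least one matching tRNA.
        if any(not any(b in allowed for b in bases)
               for bases in by_species.values()):
            continue
        return code
    return None
```

With six A and four G tRNAs in one clade and isotype, no single base reaches 90%, so this sketch falls through to the purine code R.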

Feature colors

The Compare pages facilitate a deeper look at conservation patterns.

Clade groups are arbitrary combinations of clades: any set of unique clades and species can be combined into a clade group. Clade groups can be used to visualize outgroup distributions [3], but can also be used outside of taxonomic groups [4]. Both Variation pages use clade group queries.

A focus is a selection of tRNAs based on individual tRNA annotations. Currently, selecting a focus by position, isotype, anticodon, and domain-specific score range is supported.
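
One way to picture a focus is as a small filter over annotated tRNA records. The sketch below is hypothetical: the `Focus` class, the dictionary keys, and the `select_trnas` function are invented for illustration and do not reflect how tRNAviz stores or queries its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Focus:
    # Hypothetical container for the four supported selection criteria.
    position: str                      # e.g. "8" or "8:14" for a base pair
    isotype: Optional[str] = None      # None means "any isotype"
    anticodon: Optional[str] = None    # None means "any anticodon"
    min_score: float = float("-inf")   # domain-specific score range
    max_score: float = float("inf")


def select_trnas(trnas, focus):
    """Return the tRNA records (dicts) that match the focus annotations."""
    selected = []
    for trna in trnas:
        if focus.isotype is not None and trna["isotype"] != focus.isotype:
            continue
        if focus.anticodon is not None and trna["anticodon"] != focus.anticodon:
            continue
        if not (focus.min_score <= trna["score"] <= focus.max_score):
            continue
        selected.append(trna)
    return selected
```

In this sketch the position does not filter tRNAs; it only names which feature would be examined in the selected set.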

In general, tRNAs scoring above 50 bits are likely to be real. However, score ranges vary by clade [5]. Consult the Taxonomy page for your clade, or visit GtRNAdb for more detailed individual tRNA gene annotations.

[3] For example, a user can choose to combine Ascomycota with Basidiomycota in one clade group, and compare their sequence feature distribution with Microsporidia in another clade group.

[4] For example, to see if there is a shared tRNA sequence feature signature among pathogenic bacteria, a user may choose to combine all known pathogenic bacteria in one clade group, opportunistically pathogenic bacteria in a second group, and non-pathogenic bacteria in a third group.

[5] For example, Schizosaccharomyces pombe's tRNAMet has an average score of 57 bits, while in human, the average tRNAMet score is 78 bits.

The Compare by Sequence page facilitates a deeper look at conservation patterns. Its primary focus is to examine position-specific deviation from expected features in custom sets of tRNAs [6]. FASTA sequence input is supported [7].

Under the hood, tRNAviz follows a three-step process:

  1. Use reference selection to build a covariance model
  2. Align each group of query selections to the model
  3. Extract and normalize position-specific scores

This process is computationally intensive. With smaller queries, bitcharts are generated in less than a minute, but with extremely large queries, it may take up to ten minutes.

Position-specific scores for each query selection are normalized by subtracting the expected score for the highest-probability feature [8]. The expected scores are derived by aligning the reference model against the tRNAs used to build it. Because of this normalization, the highest score a query tRNA position can receive is zero, which occurs when it matches the reference. When the selected query sequence is highly dissimilar from the selected reference model, the display may show a single nucleotide where a tRNA base pair is expected, a gap nucleotide “-”, or a gap pair “-:-”. This happens when the query and reference do not align at that position, meaning that an insert or gap scores more favorably than forcing an alignment between the query and the reference model.
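
The normalization described above amounts to a per-position subtraction. A minimal sketch, assuming scores are held in dictionaries keyed by position label (the function name and data layout are hypothetical):

```python
def normalize_scores(query_scores, expected_scores):
    """Subtract the expected (reference) score from each query score.

    Both arguments map a position label to a bit score. A position missing
    from expected_scores raises KeyError, since it cannot be normalized.
    """
    return {
        position: score - expected_scores[position]
        for position, score in query_scores.items()
    }
```

A position where the query scores the same as the reference normalizes to zero, and a lower-scoring position becomes negative.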

[6] This approach can easily be used to find closely related tRNAs, summarize distributions, and rank isotypes and clades similar to a given tRNA. However, we recommend that you use the most appropriate tool for the job. For example, use BLAST to find closely related tRNAs.

[7] FASTA input must be accompanied by a choice of domain-specific numbering model. The universal model is used by default.

[8] Each tRNA yields a parsetree. Position-specific scores for query selections with a single tRNA are directly extracted from the single parsetree, while for multiple tRNAs, all scores are extracted, then averaged by position.
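
The per-position averaging across parsetrees can be sketched as below. It assumes each parsetree has already been reduced to a dictionary of position scores, and that a position is averaged only over the tRNAs in which it appears; both the data layout and that handling of missing positions are assumptions, not documented tRNAviz behavior.

```python
from collections import defaultdict


def average_position_scores(parsetree_scores):
    """Average bit scores by position across one or more tRNAs.

    parsetree_scores: list of dicts, one per tRNA, mapping position -> score.
    With a single tRNA the scores are returned unchanged.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for scores in parsetree_scores:
        for position, score in scores.items():
            totals[position] += score
            counts[position] += 1
    return {position: totals[position] / counts[position] for position in totals}
```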

tRNAviz is an open source web application under LGPLv3. The source code of the data processing pipeline and web application can be downloaded at the provided links.

Lin BY, Chan PP and Lowe TM. (2019) tRNAviz: explore and visualize tRNA sequence features. Nucleic Acids Res. gkz438.

The source data are included in tRNAviz_data_1.0.tar.gz for download.

For bug reports, issues, comments, or suggestions, send an email to trna@soe.ucsc.edu.