EL: Writing - original draft, Conceptualization, Software, Writing - review & editing, Visualization
JR: Conceptualization, Writing - review & editing, Writing - original draft
DF: Writing - review & editing, Supervision, Funding acquisition, Writing - original draft
The conversion of adenosine to inosine at the wobble position of select tRNAs is essential for decoding specific codons in bacteria and eukarya. In eukarya, wobble inosine modification is catalyzed by the heterodimeric ADAT complex containing ADAT2 and ADAT3. Human individuals homozygous for loss-of-function variants in ADAT3 exhibit intellectual disability disorders. We created a flexible computational tool, ADATscan, to scan the human, mouse, nematode, fruit fly, and yeast exomes for genes either enriched or depleted in ADAT-dependent codons as compared to background models of codon bias derived from the exomes themselves. We find that many genes are enriched or depleted for ADAT-dependent codons as compared to the genomic background in all five species. Among those genes enriched for ADAT-dependent codons in humans, we find significant Gene Ontology (GO) enrichment for genes involved in diverse neurological processes. This pattern persists in the mouse exome but not in the fruit fly or nematode exome. In the nematode exome, genes enriched in ADAT-dependent codons are GO enriched for translation-associated genes, and in yeast there is GO enrichment for genes involved in metabolic functions. There is also GO-term overlap between yeast and fruit flies. Importantly, in its generalized form, ADATscan can also be used to scan any exome for genes enriched in any subset of codons specified by the user.
A) Summary statistics regarding the genes analyzed using ADATscan. B) The distribution of ADAT-dependent codon frequencies within genes in the human, mouse, nematode, fruit fly, and yeast exomes as determined by ADATscan. C) Average ADAT-dependent codon frequencies in enriched, depleted, and nonsignificant genes across gene bodies for all five species. D) GO-term enrichment using the list of genes that are enriched for ADAT-dependent codons after correcting for multiple comparisons (Benjamini-Hochberg procedure, FDR = 0.01). In cases where more than 500 genes were enriched, the 500 genes with the smallest p-values were used for GO-term analysis. For each of the five species, the top five categories by fold enrichment output by PANTHER (see Reagents) are shown, ordered by their p-values. Neurologically associated processes are noted in bold and red.
Modification of wobble adenosine to inosine in the anticodon of tRNAs is a conserved enzymatic reaction, catalyzed by the TadA homodimeric complex in
To screen for genes that may be the most (or the least) likely to be translationally impacted by decreased ADAT activity, we created a computational tool to scan exome data for enrichment or depletion of ADAT-dependent codons. The tool first establishes a background model for ADAT-dependent codon usage in the exome (see Methods). This null model is then applied to each nonredundant gene entry in an exome FASTA file: an expected number of ADAT-dependent codons is calculated and compared to the observed number via a simple chi-square test (Figure 1A). The resulting p-values are then corrected for multiple comparisons (Benjamini-Hochberg procedure, FDR = 0.01) to yield a conservative list of genes that are enriched or depleted for ADAT-dependent codons. For each gene, the tool also outputs the frequency of ADAT-dependent codons (Figure 1B), the counts of these codons, and the p-value for enrichment or depletion. A separate output file allows for plotting ADAT-dependent codon frequencies across gene bodies (Figure 1C).
Using this tool, we identified 5952 human genes that are significantly depleted for ADAT-dependent codons and 6713 human genes that are significantly enriched for ADAT-dependent codons as compared to the human exomic background. Subsequent GO-term enrichment analysis showed that genes enriched in ADAT-dependent codons are biased toward neurologically associated processes (Figure 1D). The human genes identified represent candidates that may underlie the neurodevelopmental phenotypes seen in patients homozygous for loss-of-function variants in ADAT3. Notably, mouse genes enriched for ADAT-dependent codons also exhibit an overrepresentation of genes involved in neurological processes (Figure 1D). In contrast, no enrichment for genes involved in neurological processes was found within the fruit fly or nematode exome (Figure 1D). In the nematode exome, genes enriched for ADAT-dependent codons showed GO-term enrichment for translation-associated genes. In the yeast exome, genes enriched for ADAT-dependent codons showed GO-term enrichment for metabolic functions, and there is some GO-term overlap between fruit fly and yeast (Figure 1D). There was no striking trend in ADAT-dependent codon frequency across gene bodies among enriched, depleted, and nonsignificant genes (Figure 1C). Altogether, these studies demonstrate that ADATscan can be used to predict which genes are likely most dependent on ADAT for efficient translation, and that it can serve as a starting point for identifying biological processes linked to wobble inosine modification. Importantly, ADATscan is user-friendly and can be applied to any exome for any set of codons based on user input.
An ADAT-dependent codon is defined as a C- or A-ending codon that lacks a cognate tRNA for decoding it and is thus dependent on wobble inosine-modified tRNAs. This information can be found at
The background model for ADAT-dependent codon usage is calculated using the relevant set of codons and the relevant exome data. Every codon in the exome is translated. If a codon encodes an amino acid that can be encoded by an ADAT-dependent codon, it is counted; this tally is kept separately for each relevant amino acid. For each of these amino acids, the number of instances of ADAT-dependent codons is also counted. The latter count is divided by the former count for each relevant amino acid, yielding the background frequency of ADAT-dependent codon usage for each amino acid. This model is output to a file by ADATscan.
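The counting scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ADATscan source: the function names (`split_codons`, `background_model`), the use of the standard genetic code, and the toy sequences and codon set are all hypothetical, and the real tool reads its sequences from an exome FASTA file and its codon set from user input.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split a coding sequence into complete codons, dropping a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def background_model(sequences, target_codons):
    """Return {amino acid: fraction of its codons in the exome that are target codons}."""
    target_codons = {c.upper() for c in target_codons}
    relevant_aas = {CODON_TABLE[c] for c in target_codons}
    aa_counts = Counter()      # all codons encoding a relevant amino acid
    target_counts = Counter()  # the subset of those that are target codons
    for cds in sequences:
        for codon in split_codons(cds):
            aa = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
            if aa in relevant_aas:
                aa_counts[aa] += 1
                if codon in target_codons:
                    target_counts[aa] += 1
    return {aa: target_counts[aa] / aa_counts[aa] for aa in aa_counts}


# Hypothetical toy input: two short coding sequences, with two alanine codons
# standing in for a user-supplied codon set.
toy_exome = ["ATGGCCGCTGCA", "ATGGCGGCC"]
toy_targets = {"GCC", "GCA"}
print(background_model(toy_exome, toy_targets))  # {'A': 0.6}
```

In the toy input, alanine is encoded five times and three of those codons belong to the chosen set, so the background frequency for alanine is 3/5.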
For each sequence in the exome file, amino acids that are encoded by ADAT-dependent codons are counted separately. These counts are then multiplied by their respective background frequency of ADAT-dependent codon usage in the exome as calculated in the background model. These numbers are then summed to yield a null estimate for the number of ADAT-dependent codons expected to appear in the protein based on amino acid composition and the background frequency of ADAT-dependent codon usage. This expectation is then compared to the observed number of ADAT-dependent codons in a 2x2 chi-square table that is filled as shown below:
| | Observed | Expected |
| Protein length | Empirical value | Empirical value |
| ADAT-dependent codons | Empirical value | Expected value based on background model |
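As a rough sketch of the per-gene test, the expected count and the 2x2 comparison could be computed as follows. Several details here are assumptions rather than statements about ADATscan: the function names are hypothetical, the statistic is a Pearson chi-square with one degree of freedom and no continuity correction, and the p-value is obtained from the identity that the upper tail of a 1-df chi-square at x equals erfc(sqrt(x/2)).

```python
import math


def expected_target_codons(aa_counts, background):
    """Sum over relevant amino acids of (count in gene) x (background frequency)."""
    return sum(n * background[aa] for aa, n in aa_counts.items() if aa in background)


def chi_square_2x2(length, observed, expected):
    """Pearson chi-square (1 df, no continuity correction, an assumption here)
    on the table [[length, length], [observed, expected]]."""
    table = [[length, length], [observed, expected]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if min(row_totals + col_totals) == 0:
        return 0.0, 1.0  # degenerate table: nothing to test
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            fitted = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - fitted) ** 2 / fitted
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


def direction(observed, expected):
    """Label whether the observed count lies above or below the null expectation."""
    return "enriched" if observed > expected else "depleted"


# Hypothetical gene: 400 codons, 40 Ala and 20 Pro codons, 60 of them in the target set.
background = {"A": 0.5, "P": 0.5}  # illustrative background frequencies
expected = expected_target_codons({"A": 40, "P": 20}, background)
chi2, p = chi_square_2x2(400, 60, expected)
print(direction(60, expected), chi2, p)
```

The expected count need not be an integer, which is why the statistic is computed directly here instead of relying on a routine that assumes integer counts.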
These tests are then corrected for multiple comparisons (Benjamini-Hochberg procedure, FDR = 0.01) to yield a conservative list of genes that are enriched or depleted for ADAT-dependent codons. In cases where more than 500 genes were enriched, the 500 genes with the smallest p-values were used for GO-term enrichment.
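For readers unfamiliar with the correction step, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure, followed by the top-500 selection by smallest p-value. The function names, gene identifiers, and p-values are hypothetical, and the sketch assumes the caller has already restricted the input to genes in the direction of interest (for example, enriched genes) before selecting the top hits.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return a list of booleans marking which tests are significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:  # everything up to that rank is rejected
        significant[idx] = True
    return significant


def top_hits(gene_ids, p_values, significant, limit=500):
    """Keep significant genes only, then at most `limit` with the smallest p-values."""
    hits = sorted((p, g) for g, p, s in zip(gene_ids, p_values, significant) if s)
    return [g for _, g in hits[:limit]]


# Hypothetical example with five genes.
genes = ["g1", "g2", "g3", "g4", "g5"]
pvals = [0.0001, 0.5, 0.003, 0.03, 0.0055]
flags = benjamini_hochberg(pvals, fdr=0.01)
print(flags)                                    # [True, False, True, False, True]
print(top_hits(genes, pvals, flags, limit=2))   # ['g1', 'g3']
```

With five tests at FDR = 0.01, the rank thresholds are 0.002, 0.004, 0.006, 0.008, and 0.01, so the three smallest p-values pass and the remaining two do not.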
Human and mouse exome data were retrieved from the CCDS database (
Description: This file contains the results of ADATscan when the Bonferroni correction is applied. Resource Type: Dataset. DOI:
Description: This file contains the results of ADATscan when the Benjamini-Hochberg procedure is applied. Resource Type: Dataset. DOI:
Description: Link to github. Resource Type: InteractiveResource. DOI:
The authors would like to acknowledge Justin Fay for comments and encouragement.
J.R. is funded by NIH F32GM146366. This work was supported by NSF CAREER 1552126 to D.F.