Re-engineering the adenine deaminase TadA-8e for efficient and specific CRISPR-based cytosine base editing

Liang Chen1,6, Biyun Zhu1,6, Gaomeng Ru1,6, Haowei Meng2,6, Yongchang Yan2,6, Mengjia Hong1, Dan Zhang1, Changming Luan1, Shun Zhang1, Hao Wu2, Hongyi Gao1, Sijia Bai1, Changqing Li1, Ruoyi Ding1, Niannian Xue1, Zhixin Lei2, Yuting Chen3, Yuting Guan1, Stefan Siwko4, Yiyun Cheng1, Gaojie Song1, Liren Wang1, Chengqi Yi2, Mingyao Liu1,5 and Dali Li1

https://doi.org/10.1038/s41587-022-01532-7

Base editors are composed of a nuclease-impaired Cas9 and a deaminase module and generate site-specific base conversions without inducing DNA double-stranded breaks or requiring donor templates1. There are two major types of base editors, the cytosine base editors (CBEs)2 and the adenine base editors (ABEs)3, which catalyze C·G-to-T·A and A·T-to-G·C transitions, respectively. CBEs are built by fusing Cas9 nickase (Cas9n) with a natural cytosine deaminase of the activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like (AID/APOBEC) protein family. They usually generate considerable C-to-G or C-to-A byproducts because the U:G mismatch is recognized and excised by uracil DNA N-glycosylase (UNG), creating an abasic intermediate that initiates base excision repair and induces unpredictable base conversions. Fusion with or simultaneous expression of a uracil glycosylase inhibitor (UGI) dramatically increases the efficiency and purity of CBE-induced C·G-to-T·A transitions4. Conversely, replacing UGI in the CBE backbone with UNG5,6 or a DNA repair protein7,8 yields C-to-G base editors (CGBEs) that mainly induce cytosine transversions. CBE/CGBEs are promising tools for a broad range of applications, but they generate considerable indels, bystander edits, and Cas9-independent DNA and RNA off-target edits5,9–13, which raise safety concerns, especially for clinical applications. Several studies have developed more accurate CBE variants, such as BE4max-YE1/YEE and eA3A-BE4max, by engineering APOBEC enzymes13,14.
To improve the performance of CBE/CGBE technology, we sought to develop base editors using a cytosine deaminase whose features might be superior to those of the AID/APOBEC family enzymes. Unlike CBE/CGBEs, ABEs use an unnatural adenine deaminase evolved from TadA, a transfer RNA (tRNA) adenine deaminase in Escherichia coli, to induce A-to-G conversions in DNA with very high product purity (over 99.9%), minimal indels and a relatively condensed editing window3. Recent studies have shown that ABEs also induce cytosine substitutions in a defined TC*N motif (where the asterisk marks the target cytosine and ‘T’ and ‘N’ denote the adjacent bases) independently of adenine conversions, suggesting that the evolved TadA has cytosine deamination capability15,16. In our recent study, we also noticed that ABE8e, a superactive ABE variant containing the TadA-8e deaminase, displayed increased cytosine deaminase activity17. Inspired by this unexpected cytosine deaminase activity, we attempted to repurpose the adenine deaminase TadA-8e, the most efficient TadA variant, into a purely unnatural cytosine deaminase (Fig. 1a). We reasoned that a TadA-8e-derived CBE might have advantages over AID/APOBEC-based editors because the original ABEs exhibited a minimal indel frequency and undetectable Cas9-independent DNA off-target editing3,9,10.
Here we report the engineering of the adenine deaminase TadA-8e to switch its catalytic substrate, which not only eliminates its intrinsic adenine deaminase activity but also enables efficient cytosine editing. Because the resulting Td-CGBE, built on this unnatural cytosine deaminase, induces highly efficient and precise C·G-to-G·C transversions in human cells and rodent embryos, we developed a series of CBEs (Td-CBEs) with distinct features through further molecular evolution and UGI fusions.
Td-CGBE/Td-CBE-treated cells showed low levels of indels and only background levels of DNA and RNA off-target events. Moreover, we applied Td-CGBE/Td-CBEs to install desired mutations at pathogenic sites within homopolymeric cytosine sequences to generate or correct disease models.

Results

Structure-guided molecular engineering of TadA-8e

To evolve TadA-8e into a cytosine deaminase, we reasoned that because the purine ring of adenine is larger than the pyrimidine ring of cytosine, adenine deamination might be more fragile and less tolerant of mutation(s) around the substrate pocket of TadA-8e. Based on recently published structures of ABE8e in complex with its substrates18, 14 residues were selected and substituted with distinct amino acids to change side-chain size, polarity or hydrophilic–hydrophobic properties (Fig. 1b). When testing on an endogenous target site in HEK293T cells, we found that several variants, such as V28G, N46A, N46G, N46L and N108G, markedly reduced A-to-G activity but retained high C-to-D (where D represents G, T or A) editing activity (Fig. 1c). The ABE8e-N46 mutants exhibited high efficiency (up to 57.1%) and selectivity for cytosine deamination, suggesting that this residue was a key determinant of substrate base selectivity. Thus, we individually tested substitutions with all other amino acids at position N46 (Supplementary Fig. 1a). Although the N46P and N46L variants showed similar activity and substrate selectivity at three targets, ABE8e-N46L was chosen for further investigation because it exhibited a slightly more condensed editing window and relatively high C-to-G activity (Supplementary Fig. 1a,b). Introducing additional V28G or N108G variants into ABE8e-N46L did not further improve its performance (Supplementary Fig. 1c). While we were completing this project, Bae and colleagues reported that ABE7.10 containing a P48R variant displayed increased cytosine editing with reduced but not eliminated adenosine deaminase activity19. However, in contrast to ABE7.10, we found that introducing the A48R variant (A48 in TadA-8e) into ABE8e yielded a protein that still retained high A-to-G efficiency and a much lower rate of cytosine edits than ABE8e-N46L, suggesting that N46 in TadA-8e was more critical than A48 for substrate selectivity (Supplementary Fig. 1d).
Given the essential role of the N46L variant in discriminating adenine from cytosine, we asked whether this substitution had a similar effect in previous ABE versions. Thus, ABEmax-N46L/N46L (with an N46L variant in both the TadA and TadA* domains) and miniABEmax-N46L variants were constructed. Neither variant exhibited adenine edits, but both showed considerable cytosine edits at the target with a TC*N motif, with ABE8e-N46L displaying much higher activity (up to 2.2-fold; Supplementary Fig. 2a–c). At FANCF site 1, the N46L variant also increased the cytosine editing efficiency of ABEmax (34.3% versus 16.4%; Supplementary Fig. 2d), indicating that this variant might also improve cytosine catalytic ability in previous ABE versions. Importantly, ABE8e-N46L, but not the other two variants (ABEmax-N46L/N46L and miniABEmax-N46L), was able to edit cytosines in the other three sequence contexts (CCN, GCN and ACN), indicating an expanded targeting scope (Supplementary Fig. 2). These data suggested that N46L was critical for discriminating adenines from cytosines both in ABE8e and in previous ABE variants.

TadA-derived editor induces efficient editing

Because ABE8e-N46L mainly induced C-to-G transversions (Fig. 1c and Supplementary Fig. 2), we named it TadA-derived C-to-G base editor (Td-CGBE) and compared it at 23 target sites with ABE8e and two previously reported representative CGBEs5,8, CGBE1 and rAPOBEC-Cas9n-rXRCC1 (hereafter termed CGBE-XRCC1). Td-CGBE showed very efficient C-to-G editing (up to 72.8%), similar to CGBE1, and much higher activity than CGBE-XRCC1 at most targets. Importantly, Td-CGBE exhibited a very steep and narrow editing window (C5–C6) that was restricted to a single cytosine at 17 of the 21 sites containing cytosines at positions 5–6 (Fig. 1d and Supplementary Fig. 3a,b), while the other two editors showed broader windows (Fig. 1e). Td-CGBE was very efficient on TC*N motifs, as previously observed for ABE7.10 (ref. 15), but unexpectedly, it edited CC*N (average 1.9-fold increase compared to CGBE1 at the C5 position) and GC*N (average 5.6-fold increase compared to CGBE1 at the C5 position) motifs much more efficiently than the other two editors (Supplementary Fig. 3c). Td-CGBE exhibited up to 94.4% C-to-G product purity, comparable to CGBE1 at all tested sites and superior to CGBE-XRCC1 (up to 3.1-fold improvement) at CC*N-motif sites (Supplementary Fig. 3d). Importantly, Td-CGBE induced far fewer indels at almost all tested targets (mean indel frequencies of 18.6%, 28.6% and 9.5% for CGBE1, CGBE-XRCC1 and Td-CGBE, respectively), corresponding to 51.1% and 33.2% of the indel rates induced by CGBE1 and CGBE-XRCC1, respectively (Supplementary Fig. 3e). To investigate whether the decreased indels were attributable to lower protein levels, we performed western blotting. Surprisingly, the protein level of Td-CGBE was much higher than those of ABE8e and the other two CGBEs, suggesting that the N46L variant somehow enhanced protein synthesis (Supplementary Fig. 3f).
However, the elevated expression did not lead to A-to-G conversion by Td-CGBE at any of the tested targets or at four additional widely used, efficient ABE targets, suggesting that adenine deaminase activity had been eliminated (Fig. 1f,g). These data suggest that Td-CGBE is a pure cytosine editor that does not rely on natural AID/APOBEC family enzymes and induces highly efficient C-to-G conversions in a very narrow window with the fewest byproduct indels.
Because previous studies showed that YE1/YEE variants in APOBECs or shorter linkers between Cas9n and the deaminase reduced the editing window14,20, we next compared Td-CGBE with CGBE variants generated through these strategies (potentially with condensed editing windows). However, CGBE-YEE and CGBE-nl (no linker) showed very limited C-to-G editing across six cytosine-rich targets. CGBE-YE1 showed comparable activity but a wider editing window compared to Td-CGBE. Consistent with a previous study showing that BE4max was inefficient at targets within GC*N motifs21, we found that the three APOBEC1-derived CGBE variants showed poor C-to-G editing frequencies (0.4–4.5%) when the target cytosines were in a GC*N motif (ABE site 23), whereas Td-CGBE edited this target with a frequency of 29.6% (up to 74-fold higher; Supplementary Fig. 4). These data further highlighted the advantages of a CGBE derived from an unnatural cytosine deaminase. In addition, we fused a UNG element to the Td-CGBE backbone, but no marked improvement was observed (Supplementary Fig. 5). Moreover, when TadA-8e-N46L was fused with the protospacer adjacent motif (PAM)-relaxed SpCas9-NG variant22, the resulting Td-CGBE-NG induced precise C5-to-G editing with frequencies of up to 48.9% at non-NGG PAM targets, suggesting that this TadA-8e-derived cytosine deaminase is compatible with engineered Cas9 variants (Supplementary Fig. 6).
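The relative indel rates and the GC*N fold change quoted in this section follow directly from the reported values. The short Python sketch below reproduces that arithmetic as an illustrative check; it is not the authors' analysis code, the names `MEAN_INDELS`, `relative_percent` and `fold_change` are hypothetical, and it assumes the "up to 74-fold" figure compares Td-CGBE (29.6%) with the least efficient APOBEC1-derived CGBE at ABE site 23 (0.4%).

```python
"""Arithmetic check of the Td-CGBE indel and GC*N-motif comparisons (illustrative only)."""

# Mean indel frequencies (%) across the tested sites, as reported in the text.
MEAN_INDELS = {"CGBE1": 18.6, "CGBE-XRCC1": 28.6, "Td-CGBE": 9.5}


def relative_percent(value: float, reference: float) -> float:
    """Express `value` as a percentage of `reference`."""
    return 100.0 * value / reference


def fold_change(value: float, reference: float) -> float:
    """Ratio of `value` to `reference`."""
    return value / reference


for editor in ("CGBE1", "CGBE-XRCC1"):
    pct = relative_percent(MEAN_INDELS["Td-CGBE"], MEAN_INDELS[editor])
    print(f"Td-CGBE indels = {pct:.1f}% of {editor}")  # 51.1% for CGBE1, 33.2% for CGBE-XRCC1

# ABE site 23 (GC*N motif): Td-CGBE at 29.6% vs. the lowest APOBEC1-derived CGBE value, 0.4%
# (assumed to be the basis of the "up to 74-fold" comparison).
print(f"GC*N fold change: {fold_change(29.6, 0.4):.0f}-fold")  # 74-fold
```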

Fig. 1 | Engineering of TadA-8e-derived CGBEs. a, A schematic illustration of the conceptual design to evolve an ABE into pure CBEs. b, Overview of the interaction of TadA-8e (light purple) with the single-stranded DNA substrate (green sticks) (Protein Data Bank (PDB): 6VPC). Cas9n is in gray, sgRNA is in cyan, complementary strand DNA is in orange and noncomplementary strand DNA is in green. Amino acids spatially contacting or adjacent to the substrate DNA are labeled on the enlarged image. c, Base editing efficiencies of ABE7.10, ABEmax, ABE8e and ABE8e variants at FANCF site 1 in HEK293T cells. The N46L variant (red arrowhead) was chosen for further evaluation. d, Heatmaps showing the on-target C-to-G editing efficiencies of ABE8e, Td-CGBE, CGBE1 and CGBE-XRCC1 at 12 endogenous target sites in HEK293T cells. e, Average C-to-G editing efficiencies of each editor at 23 endogenous target sites in d and Supplementary Fig. 3b. f, Dot graph showing the A-to-G editing frequencies of ABE8e and Td-CGBE at 23 endogenous sites shown in d and Supplementary Fig. 3b. Each data point represents a biological replicate at each target site. g, Heatmaps showing the on-target A-to-G editing efficiencies of ABE8e and Td-CGBE at four endogenous targets containing multiple adenines in HEK293T cells. d,e,g, Data represent the mean of three independent experiments.

Characterization of TadA-derived CBE variants

The distinct precision of Td-CGBE encouraged us to investigate whether Td-CGBE could be converted into a Td-CBE. Thus, two copies of UGI were linked to ABE8e-N46L via a P2A peptide to generate a Td-CBE construct (Supplementary Fig. 7). Evaluating editing efficiency at a poly(C) target, we found that Td-CBE efficiently generated C-to-T conversions at C6–C8 with frequencies of up to 84.6% (Fig. 2a). However, its editing window was not as narrow as expected. We therefore introduced further variants in or near the active pocket of TadA-8e to reduce the editing window. Most of the additional variants reduced both the editing window and efficiency, but unexpectedly, an additional E27R variant increased both the activity and window of Td-CBE (Fig. 2a). We then sought to shorten the linker between the deaminase and Cas9n to narrow the window, because our previous study showed that elongating the linker with a single-stranded DNA binding domain dramatically broadened the editing window of CBEs23. Interestingly, shortening the 32-residue XTEN linker to 3–7 residues, which was reported to reduce the editing window20, resulted in highly efficient editing within a 2-nucleotide (nt) window (Td-CBE-linker13, Td-CBE-linker15 and Td-CBE-linker18 in Fig. 2a; Supplementary Table 3). When all linker residues were removed, the variant preferentially edited C6 among the adjacent cytosines at this target (Fig. 2a); we named it enhanced Td-CBE (eTd-CBE). We also noticed that although the efficiency of Td-CBE-P29A and Td-CBE-A48M was reduced, these variants showed very narrow editing windows. Thus, the linkers of these two constructs were removed to generate eTd-CBE-P29A (eTd-CBEa) and eTd-CBE-A48M (eTd-CBEm), which produced single-cytosine edits at this target as well as at an additional target site (Fig. 2a and Supplementary Fig. 8), suggesting that these two variants were efficient and precise.
To further investigate the performance of Td-CBE with an E27R variant (named Td-CBEmax), we tested multiple endogenous targets. As Td-CBEmax was very efficient, we compared it side by side with BE4max at 13 additional gRNA sites with scattered cytosines. The activity of Td-CBEmax (defined as the editing frequency at the most efficiently edited position in each target) ranged from 57.7% to 94.9% at these targets, comparable to that of BE4max (44.9–93.2%; Fig. 2b and Supplementary Fig. 9a). The major editing window of Td-CBEmax was also reduced to 3 nt (positions 5–7) compared to the 5-nt window (positions 5–9) of BE4max (Fig. 2c). Consistent with Td-CGBE, Td-CBEmax induced a steady, low rate of indels ranging from 1.6% to 7.5% (4.6% on average), whereas BE4max had a 1.8-fold higher indel rate (8.5% on average; up to 24.9%), suggesting that Td-CBEmax caused a lower level of severe DNA damage (Supplementary Fig. 9b). These data demonstrate that Td-CBEmax is a highly efficient CBE that induces fewer bystander edits and indels at the evaluated target sites. We also noticed that the protein level of Td-CBEmax was higher than that of BE4max (Supplementary Fig. 9c).
Because eTd-CBE variants displayed high efficiency and a narrow window, we next compared them with other accurate CBEs that have either a narrow window (BE4max-YE1 and BE4max-YEE) or preferential editing in a defined sequence context, such as eA3A-BE4max and A3G-BE5.13, which prefer TC*N and CC*N motifs, respectively13,14,24. Across 12 endogenous targets, including cytosine-rich sites, eTd-CBE showed activity comparable to that of BE4max-YE1 but a more condensed window (C5–C6 versus C4–C8).
Although eTd-CBEm and eTd-CBEa were slightly less efficient than BE4max-YEE, these two variants edited a single cytosine at C5 or C6 in 9 of 12 target sites with up to 48.3% efficiency, whereas BE4max-YEE induced single-cytosine conversion at only two sites (Fig. 2d,e and Supplementary Fig. 10a). eTd-CBE variants showed 12.5-fold (ranging from 1.8- to 131.4-fold) higher precision than BE4max-YEE (precision was calculated by dividing the editing efficiency at the position with the highest activity by that at the position with the second-highest activity), suggesting that eTd-CBE variants were more accurate and induced less bystander editing than BE4max-YE1 or BE4max-YEE (Supplementary Fig. 10b). The eTd-CBE variants also showed comparable activity but a more condensed editing window and higher precision (up to 90-fold higher) in comparison to other accurate CBEs, such as eA3A-BE4max and A3G-BE5.13 (Fig. 2e and Supplementary Fig. 10b). Consistent with Td-CGBE, eTd-CBEs made no adenine edits and induced less than 2% indels on average, a much lower rate than BE4max-YE1, eA3A-BE4max and A3G-BE5.13 (Supplementary Fig. 10c,d). The reduction of indels by Td-CBE variants was not due to lower expression levels (Supplementary Fig. 9c).
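The two per-target metrics used in this section, activity (editing frequency at the most efficiently edited position) and precision (highest positional editing frequency divided by the second highest), can be written as a short Python sketch. The function names, the toy profiles and the `floor` that guards against division by zero when only one cytosine is edited are hypothetical; the source does not state how single-edit targets were handled.

```python
"""Per-target activity and precision metrics as defined in the text (illustrative sketch)."""
from typing import Dict

# A positional editing profile maps protospacer position -> C-to-T frequency (%).
Profile = Dict[int, float]


def peak_activity(profile: Profile) -> float:
    """Activity of a target: editing frequency at its most efficiently edited position."""
    return max(profile.values())


def precision(profile: Profile, floor: float = 0.01) -> float:
    """Highest positional editing frequency divided by the second highest.

    `floor` is a hypothetical lower bound on the denominator so that targets with a
    single edited cytosine do not cause a division by zero.
    """
    ranked = sorted(profile.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return ranked[0] / max(second, floor)


def precision_fold(test: Profile, reference: Profile) -> float:
    """How many times more precise `test` is than `reference` at the same target."""
    return precision(test) / precision(reference)


# Hypothetical profiles for one target (not data from the study).
narrow = {5: 40.0, 6: 4.0, 8: 1.0}            # near single-cytosine editing
broad = {4: 30.0, 5: 35.0, 6: 28.0, 7: 20.0}  # bystander-rich editing

print(peak_activity(narrow), peak_activity(broad))              # 40.0 35.0
print(round(precision(narrow), 2), round(precision(broad), 2))  # 10.0 1.17
print(round(precision_fold(narrow, broad), 1))                  # 8.6
```

Because precision is a ratio of the top two positions, it rewards editors that concentrate activity on one cytosine regardless of absolute efficiency, which is why it is reported alongside peak activity rather than instead of it.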

Off-target evaluation of Td-CGBE and Td-CBEs

Similar to CGBE1, Td-CGBE induced only background levels of cytosine mutations at all 36 in silico-predicted Cas9-dependent off-target sites analyzed25, while CGBE-XRCC1 induced low-level off-target editing at 2 sites (Supplementary Fig. 11a). Moreover, in an enhanced orthogonal R-loop assay26,27, CGBE1 generated mild Cas9-independent cytosine off-target editing (~3% on average and up to 5.7%), whereas CGBE-XRCC1 generated much higher rates (~8.4% on average and up to 17.5%) (Fig. 3a). Although ABE8e did not induce cytosine editing, it generated more severe Cas9-independent adenine editing. In contrast, Td-CGBE induced both cytosine and adenine off-target conversions at only background levels (0.1–0.9%), suggesting that it did not generate deaminase-induced random editing (Fig. 3a,b and Supplementary Fig. 11b). Taken together, compared with AID/APOBEC-based CGBEs, Td-CGBE is very efficient and induces fewer bystander edits and minimal indels. As Td-CGBE also displays diminished Cas9-independent DNA off-target effects, these results demonstrate that high-quality cytosine base conversions can be achieved by non-APOBEC family enzymes.
To comprehensively investigate the off-target effects of Td-CBEs, we used several strategies. When evaluating 29 predicted off-target sites from three loci, we found that Td-CBE variants exhibited much lower Cas9-dependent off-target editing than APOBEC family CBEs. No obvious increase in off-target editing was found with Td-CBEmax, and at 10 of these 29 sites Td-CBEmax showed a dramatic decrease compared to BE4max. Among the more accurate variants, eTd-CBEm and eTd-CBEa had much lower off-target editing efficiency (average <1%) than BE4max-YE1 (average 6.8%) and similar efficiency to BE4max-YEE (average 1.3%) at most of the off-target sites (Supplementary Figs. 12 and 13). Using the enhanced orthogonal R-loop assay, we found that Td-CBEmax induced far fewer edits than BE4max at all six sites.
Compared to the accurate CBEs, eTd-CBEm and eTd-CBEa induced only background editing, which was slightly lower than that of BE4max-YEE at three of the six tested sites and much lower than that of BE4max-YE1, eA3A-BE4max and A3G-BE5.13 (Fig. 3c).
Recently, we developed an unbiased Detect-seq method for genome-wide assessment of CBE off-target effects and revealed that previous CBEs also generated unexpected edits outside the protospacer region and on the target strand28,29. We then performed Detect-seq experiments to test BE4 family and Td-CBE variants with an sgRNA targeting the promiscuous VEGFA site 2. BE4max (946 sites) induced 2.1-fold more off-target edits than Td-CBEmax (446 sites), although the two editors had similar on-target efficiencies. Additionally, eTd-CBEm and eTd-CBEa induced numbers of off-target edits (37 and 32 sites, respectively) similar to that of BE4max-YEE (33 sites), far fewer than BE4max-YE1 (387 sites), BE4max and Td-CBEmax (Fig. 3d). Analysis of off-target editing events further showed that eTd-CBEm and eTd-CBEa had a narrower editing window and more stringent sequence context requirements than BE4max-YE1 and BE4max-YEE (Supplementary Fig. 14a,b). Although Td-CBEs were not derived from APOBEC family enzymes, Td-CBEmax, eTd-CBEm and eTd-CBEa did not cause any de novo off-target sites compared to BE4max (Supplementary Fig. 14c). Notably, in contrast to previous results in which BE4max caused unexpected out-of-protospacer and target-strand edits, no such off-target events were observed in Td-CBE variant-treated cells (Fig. 3e). This suggests that Td-CBEs have markedly lower genome-wide off-target effects than most BE4max-series editors and are comparable to BE4max-YEE.
To evaluate Cas9-independent RNA off-target effects, we used transcriptome profiling.
Consistent with previous reports11,12, BE4max induced numerous C-to-U edits and BE4max-YE1 produced fewer off-target events. Td-CBEmax induced only 0.25% of the RNA off-target edits of BE4max, although the two editors had similar DNA editing activity (Fig. 3f). Moreover, Td-CGBE and eTd-CBEs induced only background levels of C-to-U off-target edits (Fig. 3f and Supplementary Fig. 15a,b). Additionally, only background levels of A-to-I RNA edits were observed for BE4max and TadA-derived editors, further confirming that the variants introduced into TadA-8e fully abolished its adenine deaminase activity (Supplementary Fig. 15c). These results demonstrate that Td-CGBE and Td-CBE variants have almost eliminated Cas9-independent RNA off-target effects.
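The Detect-seq comparison for VEGFA site 2 reduces to ratios of off-target site counts. The Python sketch below tabulates the counts reported in this section and expresses each editor relative to a chosen reference; the dictionary and function names and the use of BE4max as the default reference are illustrative assumptions, and raw site counts ignore per-site editing frequencies.

```python
"""Ratios of Detect-seq off-target site counts for VEGFA site 2 (illustrative sketch)."""
from typing import Mapping

# Off-target sites identified by Detect-seq, as reported in the text.
DETECT_SEQ_SITES = {
    "BE4max": 946,
    "Td-CBEmax": 446,
    "BE4max-YE1": 387,
    "eTd-CBEm": 37,
    "BE4max-YEE": 33,
    "eTd-CBEa": 32,
}


def count_ratio(editor: str, reference: str,
                counts: Mapping[str, int] = DETECT_SEQ_SITES) -> float:
    """Number of off-target sites for `editor` divided by that for `reference`."""
    return counts[editor] / counts[reference]


# Rank editors from most to fewest off-target sites and compare each with BE4max.
for name, n_sites in sorted(DETECT_SEQ_SITES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11} {n_sites:>4} sites  {count_ratio(name, 'BE4max'):.3f} x BE4max")

print(f"BE4max vs Td-CBEmax: {count_ratio('BE4max', 'Td-CBEmax'):.1f}-fold")  # 2.1-fold
```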

Fig. 2 | Evolution and characterization of Td-CBEs in mammalian cells. a, Comparison of C-to-T editing efficiencies of diverse Td-CBE variants at the FGF6-sg4 site in HEK293T cells. Data represent the mean of three independent experiments. b, Evaluation of the C-to-T editing efficiencies of BE4max and Td-CBEmax at six representative endogenous genomic loci in HEK293T cells. c, Average C-to-T editing efficiencies of BE4max and Td-CBEmax at 13 target sites in b and Supplementary Fig. 8a. d, The C-to-T editing efficiencies of the indicated CBEs at six representative endogenous genomic loci in HEK293T cells. e, Average C-to-T editing efficiencies of the indicated CBEs at 12 target sites in d and Supplementary Fig. 9a. b–e, Data represent the mean of three independent experiments except for BE4max in FANCF-sg17 and BE4max-YE1 in EMX1-sg7 (n = 2).

Fig. 3 | Off-target assessment of Td-CGBE and Td-CBEs. a, Cas9-independent DNA off-target analysis of the cumulative cytosine edits induced by ABE8e, Td-CGBE, CGBE1 and CGBE-XRCC1 using the modified orthogonal R-loop assay. b, Cas9-independent DNA off-target analysis of A-to-G edits induced by ABE8e and Td-CGBE using the modified orthogonal R-loop assay. c, Cas9-independent DNA off-target analysis of cumulative C-to-T edits induced by the indicated CBEs using the modified orthogonal R-loop assay. Data are mean ± s.d. (n = 3 independent experiments except for Td-CBEmax in R-loop 1 and eTd-CBEa in R-loop 6 with two biological replicate experiments). d, Genome-wide distribution of off-target effects determined by Detect-seq on each chromosome for the indicated CBEs with an sgRNA targeting VEGFA site 2. On- and off-target edits are indicated by red squares and blue circles, respectively. The number of off-target sites is in parentheses. e, Counts of out-of-protospacer editing, target-strand editing and all identified off-target events for the indicated CBEs. f, Jitter plots showing the ratio of RNA C-to-U editing (y axis) from the RNA-seq experiments. The total number of modified bases is listed on the top. Each dot represents an edited cytosine position in RNA. Each biological replicate is listed on the bottom. In a and b, data are mean ± s.d. (n = 3 independent experiments).

Efficient and accurate editing in mouse embryos by Td-CGBE

To evaluate the application potential of TadA-derived base editors, we tested their performance in mouse embryos. When Td-CGBE mRNA and an sgRNA targeting Tyr exon 1 were injected into mouse embryos to create a premature stop codon (Fig. 4a), 20 of the 21 F0 pups obtained carried cytosine conversions, with an average efficiency of 55.6% (Fig. 4b and Supplementary Fig. 16a). An albino phenotype was observed in F0 founders, suggesting ablation of tyrosinase function (Fig. 4c,d). In total, 67% of the pups carried the desired C-to-G edits with 36% average efficiency (up to 84.9%), much higher than in a previous report that used an optimized CGBE1 to edit the same target without observing an albino phenotype in founders30 (Fig. 4e,f and Supplementary Fig. 16b,c). Similar to the data obtained in cell lines, Td-CGBE induced few indels in mouse embryos (Supplementary Fig. 16d). These results demonstrate that the TadA-derived cytosine deaminase is efficient not only in cell lines but also in mouse embryos and is likely even more efficient than APOBEC1-derived CGBE1 in vivo.
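The founder frequencies above are simple proportions of the 21 F0 pups. The sketch below assumes, as an interpretation rather than a statement from the text, that the 67% figure corresponds to the 14 C-to-G founders shown in Fig. 4f out of all 21 pups; the helper and constant names are hypothetical.

```python
"""Founder frequencies for Td-CGBE-injected Tyr embryos (illustrative sketch)."""


def percent(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


TOTAL_PUPS = 21       # F0 pups obtained
EDITED_PUPS = 20      # pups carrying any cytosine conversion (Fig. 4b, n = 20)
C_TO_G_FOUNDERS = 14  # assumed numerator for the 67% figure (Fig. 4f, n = 14)

print(f"Any cytosine conversion: {percent(EDITED_PUPS, TOTAL_PUPS):.1f}%")      # 95.2%
print(f"Desired C-to-G edit:     {percent(C_TO_G_FOUNDERS, TOTAL_PUPS):.1f}%")  # 66.7%
```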

Precise editing of pathogenic single-nucleotide variants by Td-CGBE and Td-CBEs

As Td-CGBE and Td-CBEs showed higher precision than other typical editors, we compared them with representative base editors for editing pathogenic single-nucleotide variants (SNVs) at homopolymeric cytosine sites. To create pathogenic C-to-G SNVs, Td-CGBE was delivered into HEK293T cells with individual sgRNAs targeting cytosine-rich sites in the MPZ gene (causing Charcot-Marie-Tooth disease type 2I31) and the PTEN gene (causing macrocephaly32). Compared with CGBE1 and CGBE-XRCC1, Td-CGBE was very efficient and predominantly edited the desired single cytosine, achieving precise editing at 1.5- and 8-fold higher rates than CGBE1 at the two sites (Fig. 5a). Additionally, eTd-CBEa and eTd-CBEm introduced a pathogenic C-to-T mutation in the KCNA2 gene (causing epileptic encephalopathies33) with 61.6% and 78.7% efficiency, respectively, much higher than BE4max-YE1 (10.9%) and BE4max-YEE (8.4%) (Fig. 5b). To test the potential for correcting pathogenic SNVs, we generated stable cell lines containing pathogenic variants. Td-CGBE generated much higher C-to-G correction rates and fewer indels than CGBE1 and CGBE-XRCC1 in two cell lines containing G-to-C variants (CELA2A c.639+1G>C, causing early-onset atherosclerosis34, or HBB c.328G>C, causing Hemoglobin Johnstown35; Fig. 5c and Supplementary Fig. 17a,b). In cell lines bearing T·A-to-C·G variants (TUBB6 c.1181T>C, causing congenital nonprogressive bilateral facial palsy36, or PFN1 c.350A>G, causing amyotrophic lateral sclerosis37), BE4max-YE1 and BE4max-YEE mainly induced simultaneous double- or triple-cytosine transitions, whereas eTd-CBEm (2.9- and 1.9-fold higher correction rates than BE4max-YEE) and eTd-CBEa (4.1- and 1.9-fold higher correction rates than BE4max-YEE) induced much higher rates of precise correction and generated fewer indels (<1% on average; Fig. 5d and Supplementary Fig. 17c,d).
These data suggested that Td-CGBE and eTd-CBEs were efficient for precise generation or correction of pathogenic SNVs, especially for precise editing of single nucleotides in polycytosine sites.

Evaluation of eTd-CBEs by target library analysis

To evaluate the precision of eTd-CBEs in an unbiased manner, we adapted the gRNA–target pair strategy38 to generate a library of 9,120 oligonucleotides covering all possible 6-mers at positions 4–9 of the protospacer that contain at least one cytosine (Methods). The Tol2 transposon was used to stably integrate the library into the genome of HEK293T cells before stable transfection of candidate CBEs. An average of 96% coverage at greater than 300× per guide–target pair was maintained throughout the culturing process (Supplementary Table 6). For each target, the editing efficiency at the most active position was defined as 100%, and the activity at every other position was expressed relative to it. Analysis of the three CBEs showed that BE4max-YE1 (evaluated for 8,949 sgRNAs) had a major editing window (>40% relative activity) spanning positions 4–8, whereas eTd-CBE (evaluated for 8,737 sgRNAs) narrowed the window to positions 4–6 (Fig. 5e). As expected, an extremely condensed 1-nt window was observed for eTd-CBEm (evaluated for 8,522 sgRNAs), with the highest efficiency at position 5. The motif preferences of the eTd-CBEs were further characterized using ~2,700 targets containing C5. Similar to BE4max-YE1, these editors performed accurate C-to-T editing across a wide range of sequence contexts without a strict sequence context requirement (Fig. 5f). As eTd-CBEm preferentially edits the cytosine at position 5 of the protospacer without motif restrictions, it could potentially correct the majority of A-to-G pathogenic SNVs upon fusion to PAM-relaxed Cas9 variants, such as SpNG and SpRY22,39, an approach whose feasibility we demonstrated with the Td-CGBE-NG variant.
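The heatmap normalization described above (the most active position in each target set to 100% and the other positions scaled relative to it) and the >40% cutoff that defines the major editing window can be sketched in Python. Averaging relative activity per position across the targets that contain a cytosine at that position, the function names and the toy data are all assumptions for illustration, not a description of the authors' pipeline.

```python
"""Relative positional activity and major editing window (illustrative sketch)."""
from collections import defaultdict
from typing import Dict, Iterable, List

Profile = Dict[int, float]  # protospacer position -> C-to-T editing frequency (%)


def relative_activity(profile: Profile) -> Profile:
    """Scale each position so that the most efficiently edited position equals 100%."""
    top = max(profile.values())
    if top == 0:
        return {pos: 0.0 for pos in profile}
    return {pos: 100.0 * freq / top for pos, freq in profile.items()}


def mean_relative_by_position(profiles: Iterable[Profile]) -> Profile:
    """Average relative activity per position over targets with a C at that position."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for profile in profiles:
        for pos, value in relative_activity(profile).items():
            sums[pos] += value
            counts[pos] += 1
    return {pos: sums[pos] / counts[pos] for pos in sorted(sums)}


def major_window(profiles: Iterable[Profile], threshold: float = 40.0) -> List[int]:
    """Positions whose mean relative activity exceeds `threshold` percent."""
    means = mean_relative_by_position(profiles)
    return [pos for pos, value in means.items() if value > threshold]


# Two hypothetical targets (not library data).
targets = [{4: 10.0, 5: 50.0, 6: 25.0}, {5: 30.0, 6: 30.0, 9: 3.0}]
print(mean_relative_by_position(targets))  # {4: 20.0, 5: 100.0, 6: 75.0, 9: 10.0}
print(major_window(targets))               # [5, 6]
```

Normalizing each target to its own maximum before averaging keeps highly efficient guides from dominating the positional profile, so the resulting window reflects where editing occurs rather than how much.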

Fig. 4 | Examination of mouse embryos with Td-CGBE. a, Schematic of the target sequence in the exon 1 locus of the mouse Tyr gene. The sgRNA target sequence is in black, and the PAM is in bold. The desired C6-to-G transversion causing the premature stop codon TGA is in red. b, Cytosine conversion frequencies in mutant F0 mice (n = 20). c, Phenotype of F0 mice generated by Td-CGBE injection. The picture on the left was taken when the mice were 7 days old, and the one on the right when they were 21 days old. WT, wild type. d, Sanger sequencing chromatograms of DNA from representative F0 mice (T02) and WT mice injected with Td-CGBE mRNA. e, Genotyping of representative F0 pups treated with Td-CGBE mRNA. The frequencies of mutant alleles were determined by high-throughput sequencing. f, C6-to-G editing frequencies in C-to-G founders generated by Td-CGBE (n = 14). In b and f, data are mean ± s.d. and each data point represents an individual mouse.

Fig. 5 | Precise editing of pathogenic SNVs by TadA-derived base editors and target library analysis for eTd-CBEs. a, Generation of C-to-G conversion of disease-relevant cell models (MPZ c.178G>C and PTEN c.106G>C) by CGBE1, CGBE-XRCC1 and Td-CGBE. b, Generation of C-to-T conversion of a disease-relevant cell model (KCNA2 c.890G>A) by BE4max-YE1, BE4max-YEE, eTd-CBEm and eTd-CBEa. c, Correction of pathogenic mutations by the indicated CGBEs in stable cell lines (CELA2A c.639+1G>C and HBB c.328G>C). d, Correction of pathogenic mutations by the indicated CBEs in stable cell lines (TUBB6 c.1181T>C and PFN1 c.350A>G). e, Unbiased sgRNA target pair library analysis of the indicated CBEs. The heatmap represents the relative editing efficiency computed using the highest C-to-T base editing efficiency as 100%. Positions of the protospacer are shown at the bottom of the heatmap. f, Motif visualization of the indicated CBEs based on the data in e using sgRNAs with cytosine at position 5 of the protospacer. In a–d, protospacers of models or corrections are in black with the PAM in gray. Desired cytosines to be edited are indicated with yellow squares. The first line marked with an asterisk indicates desired alleles. The allele frequencies were determined by high-throughput sequencing. Data represent the mean of three independent experiments. NS, not significant.

Discussion

Traditional CBEs are all based on AID/APOBEC family natural cytosine deaminases, which endow CBE variants with distinct features. In this study, we developed an unnatural cytosine deaminase evolved from the adenine deaminase TadA-8e and generated a series of TadA-derived base editors that enable highly efficient and accurate C-to-G or C-to-T conversions. Because TadA-8e is a DNA adenine deaminase variant evolved from the E. coli tRNA adenine deaminase TadA, this study further demonstrates the potential of molecular engineering to advance genome editing tools.
A recent report demonstrated that a P48R variant in ABE7.10 could increase its cytosine editing efficiency in a restricted TC*N sequence context and decrease, but not eliminate, its adenine conversion activity19. Moreover, when this variant is introduced into ABE8e, substrate selectivity is severely impaired (Supplementary Fig. 1c). In this study, we successfully converted the TadA enzyme into a pure cytosine deaminase by introducing an N46L variant into TadA-8e, suggesting that N46 is a highly conserved residue critical for adenine deaminase activity. This is consistent with a previous study reporting that the N46A variant in miniABEmax fully abolished adenine editing activity16, whereas introducing N46A into the wild-type TadA of ABEmax only compromised its activity40. In the TadA structure, N46 forms a weak hydrophilic interaction with the purine base (or base analogs) of its tRNA or DNA substrates18, whereas in cytosine deaminase structures the corresponding asparagine residue of APOBEC3A or APOBEC3B mainly forms polar contacts with the ribose of its single-stranded DNA substrate41. Thus, from a structural point of view, the N46 residue contributes differently to the recognition of different substrates.
Moreover, our extensive mutagenesis of N46 showed that adenine editing is highly sensitive to substitutions at this position: only substitution with a similar residue (N46D) preserved activity. This suggests that the hydrophilic interaction contributed by the N46 side chain is essential for the deaminase activity underlying A-to-G conversion, whereas cytosine editing is insensitive to most N46 substitutions. Substitution with several amino acids with small side chains (for example, cysteine, glycine, serine, threonine, valine or proline) led to even higher activity, whereas substitution with amino acids with bulky side chains (for example, histidine, phenylalanine, arginine or tryptophan) mostly abrogated cytosine deaminase activity, indicating that bulky residues may push the substrate cytosine away from the activation pocket (Supplementary Fig. 1a). Compared with other small residues, the N46L variant may provide an optimal conformation that permits deamination of cytosines but not adenines, because it disrupts the original adenosine deaminase structure and renders the enzyme inactive toward adenine.
We also found that the N46L variant could be applied, with partial success, to earlier versions (ABEmax or miniABEmax) to increase cytosine editing efficiency within a TC*N motif, and that N46L also abolished the A-to-G editing activity of ABEmax and miniABEmax (Supplementary Fig. 2). In TadA-8e, the eight newly introduced substitutions are located far from the N46 residue, so they may not affect the substrate selectivity determined by N46. Moreover, we found that combining N46L with the ABE8e substitutions further enhanced cytosine editing efficacy and expanded the editing scope (Supplementary Fig. 2).
Further investigation of the substrate-bound structures of ABEmax and of our TadA-derived base editors may help reveal exactly how these mutations act synergistically to turn ABE8e into highly efficient TadA-derived base editors. The work described here extends our understanding of the versatility of the tRNA deaminase TadA, which was previously evolved to efficiently catalyze adenine deamination in single-stranded DNA and has here been further evolved into a cytosine deaminase.
In this study, we developed a distinct series of cytosine editors, Td-CGBE and Td-CBEs, derived from the TadA-8e adenine deaminase. They are not only cytosine editors that do not rely on the AID/APOBEC family of deaminases but also have distinct advantages: the lowest indel rates among the editors compared here, greatly reduced bystander mutations and background levels of Cas9-independent DNA and RNA off-target effects. Because they efficiently generate single-cytosine conversions at homopolymeric cytosine sites without sequence-context requirements, both in vitro and in vivo, Td-CGBE and Td-CBEs are promising accurate editors for a wide range of applications. Fusing them with PAM-relaxed Cas9 variants could further expand the targeting scope of precise base conversion.