  • Technical Report

C-to-G editing generates double-strand breaks causing deletion, transversion and translocation

Abstract

Base editors (BEs) introduce base substitutions without double-strand DNA cleavage. Besides precise substitutions, BEs generate low-frequency ‘stochastic’ byproducts through unclear mechanisms. Here, we performed in-depth outcome profiling and genetic dissection, revealing that C-to-G BEs (CGBEs) generate substantial amounts of intermediate double-strand breaks (DSBs), which are at the centre of several byproducts. Imperfect DSB end-joining leads to small deletions via end-resection, templated insertions or aberrant transversions during end fill-in. Chromosomal translocations were detected between the editing target and off-targets of Cas9/deaminase origin. Genetic screenings of DNA repair factors disclosed a central role of abasic site processing in DSB formation. Shielding of abasic sites by the suicide enzyme HMCES reduced CGBE-initiated DSBs, providing an effective way to minimize DSB-triggered events without affecting substitutions. This work demonstrates that CGBEs can initiate deleterious intermediate DSBs and therefore require careful consideration for therapeutic applications, and that HMCES-aided CGBEs hold promise as safer tools.

Fig. 1: In-depth profiles of cytidine base-editing outcomes.
Fig. 2: CGBE generates intermediate DSB at editing sites.
Fig. 3: Detection of CGBE-initiated chromosomal translocation.
Fig. 4: Genetic dissection of on-target indel formation.
Fig. 5: Molecular pathways under C-to-G transversion.
Fig. 6: Minimizing on-target indels with HMCES-aided tools.

Data availability

BE outcomes in 513 DNA repair gene knockouts are supplied as Supplementary Table 3. Raw high-throughput sequencing data, including BE-outcome screening, base-editing amplicon, Tn5-HTGTS, ATAC-seq and END-seq data, were deposited in the SRA database (accession numbers PRJNA839540 and PRJNA841413). Other datasets were downloaded from the SRA database (accession numbers: RNA-seq, SRR9019712, SRR9019714, SRR10251291, SRR10251292, SRR7988530, SRR7988534, SRR10590683 and SRR10590685; H3K27ac ChIP–seq, SRR14879752 and SRR14879751; H3K4me3 ChIP–seq, SRR14879757 and SRR14879751; H3K27me3 ChIP–seq, SRR11040439 and SRR11040434; H3K9me3 ChIP–seq, SRR11040453 and SRR11040444; PRO-seq, SRR7988488). Source data are provided with this paper.

References

  1. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).

  2. Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, aaf8729 (2016).

  3. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).

  4. Porto, E. M., Komor, A. C., Slaymaker, I. M. & Yeo, G. W. Base editing: advances and therapeutic opportunities. Nat. Rev. Drug Discov. 19, 839–859 (2020).

  5. Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR-Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).

  6. Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480.e30 (2020).

  7. Huang, T. P., Newby, G. A. & Liu, D. R. Precision genome editing using cytosine and adenine base editors in mammalian cells. Nat. Protoc. 16, 1089–1128 (2021).

  8. Kim, D. et al. Genome-wide target specificities of CRISPR RNA-guided programmable deaminases. Nat. Biotechnol. 35, 475–480 (2017).

  9. Zuo, E. et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289–292 (2019).

  10. Jin, S. et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292–295 (2019).

  11. Zhao, D. et al. Glycosylase base editors enable C-to-A and C-to-G base changes. Nat. Biotechnol. 39, 35–40 (2021).

  12. Kurt, I. C. et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat. Biotechnol. 39, 41–46 (2021).

  13. Koblan, L. W. et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat. Biotechnol. 39, 1414–1425 (2021).

  14. Chen, L. et al. Programmable C:G to G:C genome editing with CRISPR-Cas9-directed base excision repair proteins. Nat. Commun. 12, 1384 (2021).

  15. Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017).

  16. Hussmann, J. A. et al. Mapping the genetic landscape of DNA double-strand break repair. Cell 184, 5653–5669.e25 (2021).

  17. Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635–5652.e29 (2021).

  18. Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).

  19. Liu, L. D. et al. Intrinsic nucleotide preference of diversifying base editors guides antibody ex vivo affinity maturation. Cell Rep. 25, 884–892 e883 (2018).

  20. Lemos, B. R. et al. CRISPR/Cas9 cleavages in budding yeast reveal templated insertions and strand-specific insertion/deletion profiles. Proc. Natl Acad. Sci. USA 115, E2040–E2047 (2018).

  21. Shou, J., Li, J., Liu, Y. & Wu, Q. Precise and predictable CRISPR chromosomal rearrangements reveal principles of Cas9-mediated nucleotide insertion. Mol. Cell 71, 498–509.e494 (2018).

  22. Canela, A. et al. DNA breaks and end resection measured genome-wide by end sequencing. Mol. Cell 63, 898–911 (2016).

  23. Frock, R. L. et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat. Biotechnol. 33, 179–186 (2015).

  24. Meng, F. L. et al. Convergent transcription at intragenic super-enhancers targets AID-initiated genomic instability. Cell 159, 1538–1548 (2014).

  25. Qian, J. et al. B cell super-enhancers and regulatory clusters recruit AID tumorigenic activity. Cell 159, 1524–1537 (2014).

  26. Yu, K. AID function in somatic hypermutation and class switch recombination. Acta Biochim. Biophys. Sin. 54, 759–766 (2022).

  27. Cao, Q. et al. CRISPR-FOCUS: a web server for designing focused CRISPR screening experiments. PLoS ONE 12, e0184281 (2017).

  28. Tang, S., Stokasimov, E., Cui, Y. & Pellman, D. Breakage of cytoplasmic chromosomes by pathological DNA base excision repair. Nature 606, 930–936 (2022).

  29. Yan, C. T. et al. IgH class switching and translocations use a robust non-classical end-joining pathway. Nature 449, 478–482 (2007).

  30. Hao, Q. et al. DNA repair mechanisms that promote insertion-deletion events during immunoglobulin gene diversification. Sci. Immunol. 8, eade1167 (2023).

  31. Lieber, M. R. The mechanism of double-strand DNA break repair by the nonhomologous DNA end-joining pathway. Annu. Rev. Biochem. 79, 181–211 (2010).

  32. Yang, W. & Gao, Y. Translesion and repair DNA polymerases: diverse structure and mechanism. Annu. Rev. Biochem. 87, 239–261 (2018).

  33. Yang, D. et al. REV7 is required for processing AID initiated DNA lesions in activated B cells. Nat. Commun. 11, 2812 (2020).

  34. Mohni, K. N. et al. HMCES maintains genome integrity by shielding abasic sites in single-strand DNA. Cell 176, 144–153.e13 (2019).

  35. Mehta, K. P. M., Lovejoy, C. A., Zhao, R., Heintzman, D. R. & Cortez, D. HMCES maintains replication fork progression and prevents double-strand breaks in response to APOBEC deamination and abasic site formation. Cell Rep. 31, 107705 (2020).

  36. Wu, L. et al. HMCES protects immunoglobulin genes specifically from deletions during somatic hypermutation. Genes Dev. 36, 433–450 (2022).

  37. Karamitros, C. S. et al. Leveraging intrinsic flexibility to engineer enhanced enzyme catalytic activity. Proc. Natl Acad. Sci. USA 119, e2118979119 (2022).

  38. Stadtmauer, E. A. et al. CRISPR-engineered T cells in patients with refractory cancer. Science 367, eaba7365 (2020).

  39. Yin, J. et al. Safeguarding genome integrity during gene-editing therapy in a mouse model of age-related macular degeneration. Nat. Commun. 13, 7867 (2022).

  40. Nussenzweig, A. & Nussenzweig, M. C. Origin of chromosomal translocations in lymphoid cancer. Cell 141, 27–38 (2010).

  41. Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat. Biotechnol. 38, 620–628 (2020).

  42. Nakamura, M. et al. High frequency class switching of an IgM+ B lymphoma clone CH12F3 to IgA+ cells. Int. Immunol. 8, 193–201 (1996).

  43. Gunn, A. & Stark, J. M. I-SceI-based assays to examine distinct repair outcomes of mammalian chromosomal double strand breaks. Methods Mol. Biol. 920, 379–391 (2012).

  44. Yeap, L. S. et al. Sequence-intrinsic mechanisms that target AID mutational outcomes on antibody genes. Cell 163, 1124–1137 (2015).

  45. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  46. Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).

  47. Olivieri, M. et al. A genetic map of the response to DNA damage in human cells. Cell 182, 481–496.e21 (2020).

  48. Liu, X. et al. ERCC6L2 promotes DNA orientation-specific recombination in mammalian cells. Cell Res. 30, 732–744 (2020).

  49. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).

  50. Li, S. et al. Screening for functional circular RNAs using the CRISPR-Cas13 system. Nat. Methods 18, 51–59 (2021).

  51. Hu, J. et al. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing. Nat. Protoc. 11, 853–871 (2016).

  52. Picelli, S. et al. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 24, 2033–2040 (2014).

  53. Shang, Y., Huang, M. E., Qin, Y. & Meng, F.-L. A Facile Tn5-HTGTS Protocol to Clone Chromosomal Structural Variants. PROTOCOL v.1 (Protocol Exchange, 2023); https://doi.org/10.21203/rs.3.pex-2494/v1

  54. Weber, R. A. et al. Maintaining iron homeostasis is the key role of lysosomal acidity for cell proliferation. Mol. Cell 77, 645–655.e7 (2020).

  55. Wangen, J. R. & Green, R. Stop codon context influences genome-wide stimulation of termination codon readthrough by aminoglycosides. eLife 9, e52611 (2020).

  56. Takahashi, H. et al. The role of mediator and little elongation complex in transcription termination. Nat. Commun. 11, 1063 (2020).

  57. Lyabin, D. N. et al. YB-3 substitutes YB-1 in global mRNA binding. RNA Biol. 17, 487–499 (2020).

  58. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  59. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).

  60. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).

  61. Broche, J., Kungulovski, G., Bashtrykov, P., Rathert, P. & Jeltsch, A. Genome-wide investigation of the dynamic changes of epigenome modifications after global DNA methylation editing. Nucleic Acids Res. 49, 158–176 (2021).

  62. Li, C. et al. Ligand-induced native G-quadruplex stabilization impairs transcription initiation. Genome Res. 31, 1546–1560 (2021).

  63. Patel, D., Patel, M., Datta, S. & Singh, U. CGGBP1 regulates CTCF occupancy at repeats. Epigenetics Chromatin 12, 57 (2019).

  64. Qin, Q. et al. ChiLin: a comprehensive ChIP-seq and DNase-seq quality control and analysis pipeline. BMC Bioinform. 17, 404 (2016).

  65. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  66. Halabelian, L. et al. Structural basis of HMCES interactions with abasic DNA and multivalent substrate recognition. Nat. Struct. Mol. Biol. 26, 607–612 (2019).

Acknowledgements

We thank L.-L. Chen, G. Xu, J. Li, L. Shen, T. Honjo, K. Yu and D. Gao for providing reagents; A. Nussenzweig for critical reading and help with END-seq; and X. Xie, D. Yang, Ti. Liu, Y. Zhou, L. Shan, Z. Zhuo and S. Li for technical support. This work was supported by the National Key R&D Program of China 2023YFA0913700 (F.-L.M.), National Natural Science Foundation of China grant 32090040 (F.-L.M.), Strategic Priority Research Program of the Chinese Academy of Sciences XDB0570000 (F.-L.M.), National Natural Science Foundation of China grant 32325019 (F.-L.M.), National Natural Science Foundation of China grant 31970880 (F.-L.M.), National Natural Science Foundation of China grant 31722020 (L.-S.Y.), National Natural Science Foundation of China grant 81861138014 (L.-S.Y.), National Natural Science Foundation of China grant 32170884 (X.L.), National Key R&D Program of China 2021YFA1301400 (L.-S.Y.), Shanghai Municipal Science and Technology Major Project HS2021SHZX001 (F.-L.M.), Natural Science Foundation of Shanghai grant 23XD1424000 (F.-L.M.), Chinese Academy of Sciences grant JCTD-2020-17 (F.-L.M. and X.Z.) and Chinese Academy of Sciences grant 318GJHZ2022010MI (F.-L.M.).

Author information

Contributions

F.-L.M. and L.-S.Y. developed the concept for the study. F.-L.M., L.-S.Y., W.W., Y. Sun, X.Z. and Y.X. developed the methodology. M.E.H., Y.Q., Y. Shang, Q.H., C.Z., C.L., S.L., L.D.L., S.Z., Y.Z., Y.W., N.L., S.W., T.G., B.W., Y.L., Y.C., X.L., Z.X., S.L. and P.D. performed the investigations. N.L., J.W., Y.W., Y. Sun, L.Z. and J.D. provided resources. M.E.H., Y.Q., Y. Shang, F.-L.M. and L.-S.Y. worked on visualization. F.-L.M., L.-S.Y., X.L. and X.Z. sourced funding. F.-L.M., L.-S.Y., W.W., Y. Sun, Y.X., X.Z. and J.W. F.-L.M. and L.-S.Y. wrote the draft paper. F.-L.M., L.-S.Y., W.W., Y. Sun, Y.X., X.Z., J.W., M.E.H., Y.Q. and Y. Shang reviewed and edited the final paper.

Corresponding authors

Correspondence to Leng-Siew Yeap or Fei-Long Meng.

Ethics declarations

Competing interests

Shanghai Institute of Biochemistry and Cell Biology has filed a patent application based on the findings in this article. Application number, 2022109688505; Inventor initials, F.-L.M., M.H., Y.C., Y.Q., Y. Shang, L.-D.L., X.L.

Peer review

Peer review information

Nature Cell Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cytidine editing outcome in-depth analyses.

a, Schematic illustration of the cytidine editing tools tested in the in-depth profiling assay. b, Substitution frequency (top) and C > G purity (bottom) of the indicated tools at 4 sgRNA-targeted loci. Mean ± SEM of three biological replicates is shown. c, On-target deletion frequency at the 4 loci. Mean ± SEM of three biological replicates is shown. d, Representative western blot of BE expression from three replicates. e, Size distribution of on-target deletions. Mean ± SEM of three biologically independent replicates is shown. f, Substitution and deletion profiles at the indicated loci after editing. Panel is labeled as in Fig. 1f. Source numerical data and unprocessed blots are available in source data.

Extended Data Fig. 2 Cytidine editing initiates on-target deletion in different cell lines and sgRNA targets.

a, Substitution frequency and deletion profile are shown at the indicated loci upon cytidine editing in human dermal fibroblasts and HeLa cells. b, Deletion frequency is plotted against substitution frequency at the indicated locus across different cell types. Linear regression analysis was performed and Pearson’s r is shown. The significance of the regression model was calculated by the F-test. Two-tailed p-value is shown. c, On-target deletion frequencies at 18 tested sgRNA target loci. Data from one replicate are shown. For the boxplot, the centre line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers represent minimum to maximum values. Source numerical data are available in source data.
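
The regression statistics named in panel b can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and the example frequencies are hypothetical, and it assumes a simple one-predictor linear regression, for which the F statistic follows directly from Pearson's r as F = r²(n − 2)/(1 − r²) with (1, n − 2) degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def regression_f_statistic(xs, ys):
    """F statistic of a one-predictor linear regression, df = (1, n - 2)."""
    r2 = pearson_r(xs, ys) ** 2
    if r2 >= 1.0:
        return math.inf  # perfect fit: no residual variance
    return r2 * (len(xs) - 2) / (1.0 - r2)


# Hypothetical values: substitution frequency (x) and deletion frequency (y).
substitution = [1.0, 2.0, 3.0, 4.0, 5.0]
deletion = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(substitution, deletion)
f_stat = regression_f_statistic(substitution, deletion)
print(f"r = {r:.3f}, F(1, {len(substitution) - 2}) = {f_stat:.3f}")
```

The p-value is the upper tail of the F distribution at this statistic. The sketch stops at the statistic to stay within the standard library; a statistics package would supply the tail probability.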

Extended Data Fig. 3 Insertion profiles generated by BEs.

a, Insertion profiles are shown at the indicated loci after cytidine editing. For each locus, 1-bp insertions are shown at the top and >1-bp insertions at the bottom. In the profile, the inserted position was counted and the frequency is plotted along the nucleotides. b, Insertion subgroups. 1-bp insertions at a mononucleotide tract (“Mono”), editing sites (“Edit”), or the Cas9-nicking site (“Nick”), and >1-bp insertions with duplication sequences are shown at the top. Example insertions are labelled along the DNA sequences. The count of each insertion among a total of 1.9 million sequencing reads is shown in different colors. c, Substitution, deletion and >1-bp insertion frequencies are summarized for the indicated tools at 4 tested loci from three biological replicates. For the Cas9 module, dCas9 (d) or nCas9 (n) was tested; for the AID/APOBEC module, a catalytically active (+) or dead (-) module was tested. Two-tailed Student’s paired t-test was applied. For the boxplot, the centre line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers represent minimum to maximum values. Source numerical data are available in source data.

Extended Data Fig. 4 Intermediate DSBs revealed by different approaches.

a, Deletions and insertions at 11 loci after CGBE editing. Mean ± SD of three replicates is shown. Two-tailed Student’s paired t-test was applied. b, Knockout of XRCC4 was validated by Western blot (left). Expression of CGBE1 was examined in the indicated cell lines (right). Representative blots from two replicates are shown. c, Illustration of the END-seq experimental procedure. d, Expression levels of CGBE1, nCas9, and Cas9 are shown by a representative Western blot from two replicates. e, Two additional replicates of END-seq are shown. Panel is labeled as in Fig. 2d. f, Immunofluorescence for γH2AX in WT and XRCC4-deficient HEK293T cells with or without AID-CGBE editing. An sgRNA targeting the sgEMX1 site was used. One representative replicate is shown. Source numerical data and unprocessed blots are available in source data.

Extended Data Fig. 5 Chromosomal translocation profiles.

a, Circos plots of genome-wide translocation junctions in cells treated with nCas9 or the indicated BEs. Translocation junctions were binned into 5-Mb regions (black bars) and plotted on a log scale; the orange line connects the on-target bait site with a Cas-OT prey hotspot, and the green line connects the on-target bait site with a deaminase-OT prey hotspot. The middle number indicates unique translocation junctions from 3.6 million edited cells. b, Distribution of translocation junctions at Cas-OT hotspots of an sgEMX1 on-target bait in cells edited with the indicated BE. c, Translocation, H3K27Ac and transcription profiles at an AID deaminase-OT hotspot. Top: translocation junctions from cells transfected with AID-CGBE or CGBE1 are indicated by black bars. Middle (H3K27Ac and SE): the H3K27Ac ChIP-seq profile is shown in orange, and identified SEs are depicted below with orange bars. Bottom (PRO-seq): sense and antisense transcription detected by PRO-seq are shown in blue and red, respectively. d, Enrichment of scattered translocation junctions at genomic regions associated with the indicated histone markers, open chromatin and transcription in cells transfected with AID-CGBE from the 4 baits. Each dot indicates the average of two biological replicates at the indicated locus. Mean ± SD of 4 sgRNA loci is shown. e, Relative motif enrichment in AID-CGBE-edited samples compared to that of random sampling of the human genome. Each dot indicates the average of two biological replicates at the indicated locus. Mean ± SD of 4 sgRNA loci is shown. f, Substitution frequency (left) and C > G purity (right) of the indicated pairs of synthetic sequences with and without a palindromic deamination motif. The editing assay was performed in a 48-well plate format. Four independent replicates are shown. For the boxplot, the centre line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers represent minimum to maximum values. g, Substitution (upper) and deletion (lower) profiles of CGBE1 on 5 pairs of synthetic sequences as shown in Fig. 1f. “TCAA” was applied as a control for “TCGA”. h, Percentage of sgRNAs containing palindromic motifs in the editing window. A total of 3040 potential CGBE-correcting loci were analysed. For Panels d and e, two-tailed one-sample t-test was performed. For Panel f, two-tailed Student’s paired t-test was applied. Source numerical data are available in source data.
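
The 5-Mb binning described for panel a can be sketched as follows. This is an illustrative outline only, not the authors' pipeline: the function names and the example junctions are hypothetical, positions are assumed to be 0-based, and the +1 pseudocount before the log transform is an assumption made here so that single-junction bins remain visible.

```python
import math
from collections import Counter

BIN_SIZE = 5_000_000  # 5-Mb bins, as stated in the legend


def bin_junctions(junctions, bin_size=BIN_SIZE):
    """Count junctions per (chromosome, bin index); positions are 0-based."""
    counts = Counter()
    for chrom, pos in junctions:
        counts[(chrom, pos // bin_size)] += 1
    return counts


def log_scaled(counts):
    """log10(count + 1) per bin; the pseudocount is an assumption of this sketch."""
    return {key: math.log10(n + 1) for key, n in counts.items()}


# Hypothetical junction coordinates for illustration.
example = [("chr2", 100), ("chr2", 4_999_999), ("chr2", 5_000_000), ("chr11", 12_345_678)]
binned = bin_junctions(example)
for (chrom, index), value in sorted(log_scaled(binned).items()):
    start = index * BIN_SIZE
    print(f"{chrom}:{start}-{start + BIN_SIZE}\t{binned[(chrom, index)]}\t{value:.3f}")
```

Fixed-width bins keep the per-bin counts comparable across chromosomes, which is what a genome-wide circular plot needs.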

Extended Data Fig. 6 BE outcome-screening.

a, Genes in the DDR-focused library, the Repair-seq library of Koblan et al., 2021, and the core DNA repair gene list defined by Olivieri et al., 2020, are shown in a Venn diagram with the numbers of genes labeled. b, Schematic illustration of the SaBE tools used in the CRISPR screening assay, including SaCGBE1, AID-SaBE3, AID-SaCGBE, and AID-SaBE1. c, Substitution profiles of the SaBE-Test locus for SaCGBE1 from 6 million reads. Deletion boundary profiles are shown on the right. d, Substitution profiles of the SaBE-Test locus for AID-SaBE3 or AID-SaBE1 from 6 million reads. Deletion boundary profiles are shown on the right. e, Schematic illustration of the BE outcome groups, with example sequences of each group shown on the right. f, Radar plots showing base editing outcomes of gene knockouts. The wild-type result was used as a reference (blue line) in each panel. Source numerical data are available in source data.

Extended Data Fig. 7 CGBE editing outcomes in DNA repair deficient cell lines.

a, Deletion and substitution frequencies and C > G purity are plotted for the indicated loci. Mean ± SD of 4 replicates is shown in the bar graph. The expression levels of the indicated tools are shown by a representative Western blot on the right. b, Deletion (left) and substitution (right) frequencies were plotted for the 3 indicated loci in the indicated cell lines. Mean ± SD of three replicates is shown in the bar graph. Expression levels of CGBE1 in the indicated cells are shown by a representative Western blot from two biological replicates. A 3-fold serial dilution of protein samples was applied in Western blot. c, MH usage in the deletion junctions at 2 loci in the indicated mouse B cell lines. Mean ± SD of 3 replicates is shown. Two-tailed Student’s paired t-test was applied. d, Fractions of insertions are plotted for the indicated tools. e, Proportions of 1-bp and >1-bp insertions, and proportions of duplications and other insertions among >1-bp insertions, are shown. f, Insertions at the SaBE test locus. Four insertion subtypes are shown at the top and example duplications are illustrated at the bottom. g, Negatively enriched genes in >1-bp insertions. The heatmap is shown as in Fig. 4c. The p-value was also calculated by permutation test and FDR was calculated by the Benjamini-Hochberg procedure from MAGeCK. h, 1-bp and >1-bp insertion profiles of HEK293T XRCC4-/- and its parental cell line at the sgFANCF locus. i, Expression levels of CGBE1 in the indicated cells are shown by a representative Western blot from two biological replicates. A 3-fold serial dilution of protein samples was applied in Western blot. j, Substitution, C > G purity, >1-bp insertion, and deletion are plotted. Mean ± SD of three replicates is shown. Source numerical data and unprocessed blots are available in source data.
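
For readers unfamiliar with the Benjamini-Hochberg procedure mentioned in panel g, a generic sketch of the adjustment is given here. It is a textbook version written for illustration, not MAGeCK's implementation, and the function name and input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values for illustration.
for p, q in zip([0.01, 0.04, 0.03, 0.005], benjamini_hochberg([0.01, 0.04, 0.03, 0.005])):
    print(f"p = {p:.3f} -> adjusted = {q:.3f}")
```

Each p-value is scaled by m/rank, and the running minimum keeps the adjusted values non-decreasing with rank, so a gene never receives a smaller adjusted value than one with a lower raw p-value.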

Extended Data Fig. 8 DSB end-joining induces aberrant C > G transversion.

a, C > G purity at three loci in CasRx-knockdown HEK293T cells. Each dot indicates the average C > G purity at each position from three biological replicates. The RFWD3 mRNA levels are shown on the left (mean ± SD of 3 biological replicates). b, Genotoxicity of REV1-deficient HEK293T cells after UV treatment. Mean ± SD of 3 replicates is shown. Knockout of REV1 was validated by Western blot. A representative blot from 2 replicates is shown. c, C > G purity after CGBE editing is plotted for each locus. Mean ± SD of 3 replicates is shown. Expression levels of CGBE1 in the indicated cells are shown by a representative Western blot from 2 replicates. A 3-fold serial dilution of protein samples was applied in Western blot. d, C > G purity is plotted for 3 loci in the Rev1-/-, Rev7-/-, Rev3l-/- and parental CH12F3 cell lines. Mean ± SD of 3 independent replicates is shown. e, Expression levels of CGBE1 in the indicated cells are shown by a representative Western blot. A 3-fold serial dilution of protein samples was applied in Western blot. f, C > G purity after CGBE editing is plotted for each locus. Mean ± SD of 3 replicates is shown. g, Fold change of C > G purity in XRCC4 deficiency vs. wild-type is plotted against that of deletion frequency at 11 sgRNA loci. Linear regression analysis was performed and Pearson’s r is shown. The significance of the regression model was calculated by the F-test. h, A working model of C > G transversion. For Panels a, c and f, two-tailed Student’s paired t-test was applied. For Panel b, two-tailed Student’s unpaired t-test was applied. For boxplots in a and c, the centre line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers represent minimum to maximum values. Source numerical data and unprocessed blots are available in source data.

Extended Data Fig. 9 Optimization and characterization of CGBEfH/pH tools.

a, A schematic of the AID-CGBEfH1 construct is shown on top. Deletion frequencies at three sgRNA loci from three biological replicates are plotted at the bottom. The same sgRNA loci are connected by a line. Mean ± SD is shown. b, Relative deletion frequency is shown as a heatmap for AID-CGBEfH2 at the sgFANCF locus. Deletion frequency was compared to that generated by AID-CGBE. c, Schematic illustration of the C > G screening and protein evolution strategy. Validation of the mutations is shown on the right. Deletion and substitution frequencies are shown for each tool. Two biological replicates are shown for K83R/E165A/P185E/Q235K/Q235R. For other mutants, mean ± SD of 3 biological replicates is shown. For boxplots, the centre line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers represent minimum to maximum values. d, Deletion and >1-bp insertion frequencies are plotted at the 7 sgRNA loci tested for the indicated tools. Mean ± SD of 3 biological replicates is shown. Expression of CGBE tools is shown by Western blot. A representative blot from 2 replicates is shown. e, >1-bp insertion frequency is plotted at the 7 sgRNA loci tested for the indicated tools. Mean ± SD of 3 replicates is shown. Expression of CGBE and HMCES is shown by Western blot. A representative blot from 2 replicates is shown. f, Deletion and C > G frequencies are shown as mean ± SD of 3 biological replicates. g, C > G frequency is shown for synthetic ‘TCGA’ sgRNA sequences. Mean ± SD of 4 biological replicates is shown. h, Deletion frequency is shown as mean ± SD of 3 replicates for the indicated tools. i, Numbers of normalized END-seq DSB ends are plotted along a 50-bp region of the sgEMX1 or sgFANCF locus. The orientation of DSB ends is indicated in blue (+) or red (-). j, Translocation junctions upon CGBE1 or CGBE1pH editing from the sgHBB bait. For Panels a, c, d, e, f, g and h, two-tailed Student’s paired t-test was applied. Source numerical data and unprocessed blots are available in source data.

Extended Data Fig. 10 High-fidelity Cas or optimized deaminase module cannot reduce CGBE-initiated deletion and GFP gating strategy.

a, CGBE1-derived tools with modified Cas/deaminase modules are shown on the top. Substitution frequency, C > G purity, deletion frequency, and insertion frequency are shown side by side for CGBE1 and the optimized tool at 4 tested loci. Data from two independent replicates are shown. Source numerical data are available in source data. b, GFP gating strategy. In the transfection efficiency control assay, cells co-transfected with a GFP-expressing plasmid were gated on FSC and SSC for live cells. Live cells were further gated on FITC intensity. Non-transfected cells were assayed as a negative control.

Supplementary information

Reporting Summary

Peer Review File

Supplementary Table 1

Detailed data from Tn5-HTGTS and END-seq.

Supplementary Table 2

BE outcome-screening results.

Supplementary Table 3

Editing efficiency, substitution spectrum and indel frequency in 513 DNA repair deficiencies.

Supplementary Table 4

Primers, plasmids and antibodies used in this study.

Source data

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, M.E., Qin, Y., Shang, Y. et al. C-to-G editing generates double-strand breaks causing deletion, transversion and translocation. Nat Cell Biol 26, 294–304 (2024). https://doi.org/10.1038/s41556-023-01342-2

  • DOI: https://doi.org/10.1038/s41556-023-01342-2
