
  • Resource

Integrated multiomic profiling of breast cancer in the Chinese population reveals patient stratification and therapeutic vulnerabilities

Abstract

Molecular profiling guides precision treatment of breast cancer; however, Asian patients are underrepresented in publicly available large-scale studies. We established a comprehensive multiomics cohort of 773 Chinese patients with breast cancer and systematically analyzed their genomic, transcriptomic, proteomic, metabolomic, radiomic and digital pathology characteristics. Here we show that, compared with breast cancers in white individuals, breast cancers in Asian individuals harbored more targetable AKT1 mutations. Integrated analysis revealed a higher proportion of the HER2-enriched subtype, with correspondingly more frequent ERBB2 amplification and higher HER2 protein abundance, in the Chinese HR+HER2+ cohort, underscoring the importance of anti-HER2 therapy for these individuals. Furthermore, comprehensive metabolomic and proteomic analyses identified ferroptosis as a potential therapeutic target for basal-like tumors. Integrating clinical, transcriptomic, metabolomic, radiomic and pathological features enabled efficient stratification of patients into groups with different recurrence risks. Our study provides a public resource and new insights into the biology and ancestry specificity of breast cancer in the Asian population, offering potential for further precision treatment approaches.


Fig. 1: Multiomics landscape of the CBCGA cohort.
Fig. 2: Ancestry-specific molecular features of breast cancers in Chinese patients.
Fig. 3: Proteogenomic profiling yields new insights into breast cancer subtypes.
Fig. 4: Systematic evaluation of metabolic dysregulation with polar metabolomics and lipidomics.
Fig. 5: Immunogenomic analysis deciphered the heterogeneity of the TME in breast cancer.
Fig. 6: Multimodal data integration using machine learning for risk stratification of breast cancer.


Data availability

WES, CNA, RNA-seq and metabolome data supporting the findings of this study have been deposited in the Genome Sequence Archive database under accession code PRJCA017539. MS data have been deposited in iProX under accession code IPX0006535000. Human breast cancer genomic, transcriptomic and proteomic data were derived from the FUSCC targeted sequencing cohort, the TCGA Research Network, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). The TCGA, METABRIC and CPTAC datasets are available at the cBioPortal website (www.cbioportal.org/). FUSCC targeted sequencing data are available in the Fudan Data Portal (https://data.3steps.cn/cdataportal/). All other data supporting the findings of this study are available from the corresponding author on reasonable request. Source data are provided with this paper.

Code availability

All data analysis and processing were conducted using published software packages whose details have been previously described and referenced within the Methods. No new code or mathematical algorithms were generated from this manuscript.

References

  1. Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).

  2. Waks, A. G. & Winer, E. P. Breast cancer treatment: a review. JAMA 321, 288–300 (2019).

  3. Gennari, A. et al. ESMO clinical practice guideline for the diagnosis, staging and treatment of patients with metastatic breast cancer. Ann. Oncol. https://doi.org/10.1016/j.annonc.2021.09.019 (2021).

  4. Razavi, P. et al. The genomic landscape of endocrine-resistant advanced breast cancers. Cancer Cell 34, 427–438 (2018).

  5. Jiang, Y. Z. et al. Genomic and transcriptomic landscape of triple-negative breast cancers: subtypes and treatment strategies. Cancer Cell 35, 428–440 (2019).

  6. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

  7. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).

  8. Ciriello, G. et al. Comprehensive molecular portraits of invasive lobular breast cancer. Cell 163, 506–519 (2015).

  9. Sammut, S. J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).

  10. Boehm, K. M. et al. Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer. Nat. Cancer 3, 723–733 (2022).

  11. Mertins, P. et al. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534, 55–62 (2016).

  12. Krug, K. et al. Proteogenomic landscape of breast cancer tumourigenesis and targeted therapy. Cell 183, 1436–1456 (2020).

  13. Pan, J. W. et al. The molecular landscape of Asian breast cancers reveals clinically relevant population-specific differences. Nat. Commun. 11, 6433 (2020).

  14. Kan, Z. et al. Multi-omics profiling of younger Asian breast cancers reveals distinctive molecular signatures. Nat. Commun. 9, 1725 (2018).

  15. Shimoi, T. et al. Hotspot mutation profiles of AKT1 in Asian women with breast and endometrial cancers. BMC Cancer 21, 1131 (2021).

  16. Lang, G. T. et al. Characterization of the genomic landscape and actionable mutations in Chinese breast cancers by clinical sequencing. Nat. Commun. 11, 5679 (2020).

  17. Lee, Y. R. et al. WWP1 gain-of-function inactivation of PTEN in cancer predisposition. N. Engl. J. Med. 382, 2103–2116 (2020).

  18. Lee, Y. R. et al. Reactivation of PTEN tumour suppressor for cancer treatment through inhibition of a MYC-WWP1 inhibitory pathway. Science https://doi.org/10.1126/science.aau0159 (2019).

  19. Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333–337 (2014).

  20. Wolf, D. M. et al. Redefining breast cancer subtypes to guide treatment prioritization and maximize response: predictive biomarkers across 10 cancer therapies. Cancer Cell https://doi.org/10.1016/j.ccell.2022.05.005 (2022).

  21. Hakimi, A. A. et al. An integrated metabolic atlas of clear cell renal cell carcinoma. Cancer Cell 29, 104–116 (2016).

  22. Xiao, Y. et al. Comprehensive metabolomics expands precision medicine for triple-negative breast cancer. Cell Res. 32, 477–490 (2022).

  23. Xiao, Y. et al. Multi-omics profiling reveals distinct microenvironment characterization and suggests immune escape mechanisms of triple-negative breast cancer. Clin. Cancer Res. 25, 5002–5014 (2019).

  24. Pusztai, L. et al. Durvalumab with olaparib and paclitaxel for high-risk HER2-negative stage II/III breast cancer: results from the adaptively randomized I-SPY2 trial. Cancer Cell https://doi.org/10.1016/j.ccell.2021.05.009 (2021).

  25. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).

  26. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).

  27. Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2021).

  28. Haas, B. J. et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 20, 213 (2019).

  29. Uhrig, S. et al. Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res. 31, 448–460 (2021).

  30. Ding, R. et al. Breast cancer screening and early diagnosis in Chinese women. Cancer Biol. Med. https://doi.org/10.20892/j.issn.2095-3941.2021.0676 (2022).

  31. Lee, S. K. et al. Is the high proportion of young age at breast cancer onset a unique feature of Asian breast cancer? Breast Cancer Res. Treat. 173, 189–199 (2019).

  32. Zhu, B. et al. Comparison of somatic mutation landscapes in Chinese versus European breast cancer patients. HGG Adv. 3, 100076 (2022).

  33. Wander, S. A. et al. The genomic landscape of intrinsic and acquired resistance to cyclin-dependent kinase 4/6 inhibitors in patients with hormone receptor-positive metastatic breast cancer. Cancer Discov. 10, 1174–1193 (2020).

  34. Kalinsky, K. et al. Effect of capivasertib in patients with an AKT1 E17K-mutated tumour: NCI-MATCH subprotocol EAY131-Y nonrandomized trial. JAMA Oncol. 7, 271–278 (2021).

  35. Smyth, L. M. et al. Capivasertib, an AKT kinase inhibitor, as monotherapy or in combination with fulvestrant in patients with AKT1 (E17K)-mutant, ER-positive metastatic breast cancer. Clin. Cancer Res. 26, 3947–3957 (2020).

  36. Jones, R. H. et al. Fulvestrant plus capivasertib versus placebo after relapse or progression on an aromatase inhibitor in metastatic, oestrogen receptor-positive breast cancer (FAKTION): a multicentre, randomised, controlled, phase 2 trial. Lancet Oncol. 21, 345–357 (2020).

  37. Gianni, L. et al. Efficacy and safety of neoadjuvant pertuzumab and trastuzumab in women with locally advanced, inflammatory, or early HER2-positive breast cancer (NeoSphere): a randomised multicentre, open-label, phase 2 trial. Lancet Oncol. 13, 25–32 (2012).

  38. Robidoux, A. et al. Lapatinib as a component of neoadjuvant therapy for HER2-positive operable breast cancer (NSABP protocol B-41): an open-label, randomised phase 3 trial. Lancet Oncol. 14, 1183–1192 (2013).

  39. de Azambuja, E. et al. Lapatinib with trastuzumab for HER2-positive early breast cancer (NeoALTTO): survival outcomes of a randomised, open-label, multicentre, phase 3 trial and their association with pathological complete response. Lancet Oncol. 15, 1137–1146 (2014).

  40. Gianni, L. et al. Neoadjuvant chemotherapy with trastuzumab followed by adjuvant trastuzumab versus neoadjuvant chemotherapy alone, in patients with HER2-positive locally advanced breast cancer (the NOAH trial): a randomised controlled superiority trial with a parallel HER2-negative cohort. Lancet 375, 377–384 (2010).

  41. Shao, Z. et al. Efficacy, safety, and tolerability of pertuzumab, trastuzumab, and docetaxel for patients with early or locally advanced ERBB2-positive breast cancer in Asia: the PEONY Phase 3 randomized clinical trial. JAMA Oncol. 6, e193692 (2020).

  42. Llombart-Cussac, A. et al. HER2-enriched subtype as a predictor of pathological complete response following trastuzumab and lapatinib without chemotherapy in early-stage HER2-positive breast cancer (PAMELA): an open-label, single-group, multicentre, phase 2 trial. Lancet Oncol. 18, 545–554 (2017).

  43. Prat, A. et al. HER2-enriched subtype and ERBB2 expression in HER2-positive breast cancer treated with dual HER2 blockade. J. Natl Cancer Inst. 112, 46–54 (2020).

  44. Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 45, 1127–1133 (2013).

  45. Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).

  46. Tang, X. et al. A joint analysis of metabolomics and genetics of breast cancer. Breast Cancer Res. 16, 415 (2014).

  47. Terunuma, A. et al. MYC-driven accumulation of 2-hydroxyglutarate is associated with breast cancer prognosis. J. Clin. Invest. 124, 398–412 (2014).

  48. Nguyen, T. et al. Uncovering the role of N-acetyl-aspartyl-glutamate as a glutamate reservoir in cancer. Cell Rep. 27, 491–501 (2019).

  49. Muthusamy, T. et al. Serine restriction alters sphingolipid diversity to constrain tumour growth. Nature 586, 790–795 (2020).

  50. Ogretmen, B. Sphingolipid metabolism in cancer signalling and therapy. Nat. Rev. Cancer 18, 33–50 (2018).

  51. Zheng, J. & Conrad, M. The metabolic underpinnings of ferroptosis. Cell Metab. 32, 920–937 (2020).

  52. Chen, X., Kang, R., Kroemer, G. & Tang, D. Broadening horizons: the role of ferroptosis in cancer. Nat. Rev. Clin. Oncol. 18, 280–296 (2021).

  53. Jiang, L. et al. Radiogenomic analysis reveals tumour heterogeneity of triple-negative breast cancer. Cell Rep. Med. 3, 100694 (2022).

  54. Zhao, S. et al. Deep learning framework for comprehensive molecular and prognostic stratifications of triple-negative breast cancer. Fundam. Res. https://doi.org/10.1016/j.fmre.2022.06.008 (2022).

  55. Jiang, Y.-Z. et al. Integrated molecular portraits of breast cancer. Nat. Protoc. https://doi.org/10.21203/rs.3.pex-2435/v1 (2023).

  56. Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).

  57. Paquet, E. R. & Hallett, M. T. Absolute assignment of breast cancer intrinsic molecular subtype. J. Natl Cancer Inst. https://doi.org/10.1093/jnci/dju357 (2015).

  58. Chen, B., Khodadoust, M. S., Liu, C. L., Newman, A. M. & Alizadeh, A. A. Profiling tumour infiltrating immune cells with CIBERSORT. Methods Mol. Biol. 1711, 243–259 (2018).

  59. Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218 (2016).

  60. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  61. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).

  62. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  63. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7 (2013).

  64. Telli, M. L. et al. Homologous recombination deficiency (HRD) score predicts response to platinum-containing neoadjuvant chemotherapy in patients with triple-negative breast cancer. Clin. Cancer Res. 22, 3764–3773 (2016).

  65. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  66. Benard, B. A. et al. Clonal architecture predicts clinical outcomes and drug sensitivity in acute myeloid leukemia. Nat. Commun. 12, 7244 (2021).

  67. Amin, S. B. et al. Comparative molecular life history of spontaneous canine and human gliomas. Cancer Cell 37, 243–257 (2020).

  68. Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods 11, 396–398 (2014).

  69. McGranahan, N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351, 1463–1469 (2016).

  70. Chen, D. et al. Identification and characterization of robust hepatocellular carcinoma prognostic subtypes based on an integrative metabolite-protein interaction network. Adv. Sci. 8, e2100311 (2021).

  71. Johansson, H. J. et al. Breast cancer quantitative proteome and proteogenomic landscape. Nat. Commun. 10, 1600 (2019).

  72. Chen, Y. J. et al. Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression. Cell 182, 226–244 (2020).

  73. Avants, B. B., Epstein, C. L., Grossman, M. & Gee, J. C. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12, 26–41 (2008).

  74. Neal, J. T. et al. Organoid modeling of the tumour immune microenvironment. Cell 175, 1972–1988 (2018).

  75. Sachs, N. et al. A living biobank of breast cancer organoids captures disease heterogeneity. Cell 172, 373–386 (2018).


Acknowledgements

This work was supported by grants from the National Key Research and Development Project of China (grant no. 2020YFA0112304 to Z.-M.S. and Y.-Z.J., and 2021YFF1201300 to Y.-Z.J., W.Huang and J.S.), the National Natural Science Foundation of China (grant nos. 92159301, 82341003 and 91959207 to Z.-M.S., 82272822 to Y.-Z.J., 82272704 to D.M. and 32370701 to L.S.), the Shanghai Key Laboratory of Breast Cancer (grant no. 12DZ2260100 to Z.-M.S.), the Shanghai Hospital Development Center Municipal Project for Developing Emerging and Frontier Technology in Shanghai Hospitals (grant no. SHDC12021103 to Z.-M.S.), the Program of Shanghai Academic/Technology Research Leader (grant no. 20XD1421100 to Y.-Z.J.), the Natural Science Foundation of Shanghai (grant no. 22ZR1479200 to Y.-Z.J. and 23ZR1411800 to X.J.), the Shanghai Rising-Star Program (grant no. 23QA1401400 to D.M.), the Youth Talent Program of Shanghai Health Commission (grant no. 2022YQ012 to X.J.) and the Shanghai Municipal Science and Technology Major Project (grant no. 2023SHZDZX02 to L.S.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We are grateful to Computing for the Future at Fudan and the Human Phenome Data Center of Fudan University for computing support. We also thank J. Xu from Nanjing University of Information Science and Technology for editing the manuscript.

Author information


Contributions

Z.-M.S., W.Huang, Y.Z. and Y.-Z.J. outlined the manuscript content. J.S. and W.Huang performed the genomic sequencing. Y.Y., W.Hou, Y.L., Q.C., J.Y., N.Z., L.S. and Y.Z. performed RNA sequencing and contributed to data processing and analysis. W.L., W.G. and T.G. performed proteomics. S.Z., G.-H.S., W.-T.Y., C.Y. and Y.G. contributed to multimodal data integration. Y.-Z.J., D.M., X.J., Y.-F.Z., T.F., C.-J.L., L.-J.D., C.-L.L. and W.-J.Z. contributed to the literature survey, data collection and data analysis. Y.-Z.J., D.M., X.J. and Y.X. prepared the figures and drafted the manuscript, with contributions from all authors. V.K., F.B., C.V., A.D., N.M.S., T.W. and C.M.P. helped with data interpretation and manuscript editing. All authors approved the final manuscript.

Corresponding authors

Correspondence to Yi-Zhou Jiang, Yuanting Zheng, Wei Huang or Zhi-Ming Shao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Cancer thanks Xiaohong Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Clinical and molecular characteristics of the Chinese Breast Cancer Genome Atlas (CBCGA) cohort.

a, Cohort and omics information. b, Agreement between immunohistochemistry (IHC) subtypes and PAM50 subtypes, displayed as a confusion matrix in which the numbers on the diagonal represent subtype agreement between the two subtyping methods (n = 752 tumors). Abbreviations for PAM50 subtypes: LumA, luminal A; LumB, luminal B; HER2, HER2-enriched; Basal, basal-like; Normal, normal-like. c, Agreement between AIMS subtypes and PAM50 subtypes, displayed as a confusion matrix (n = 752 tumors). d, Differentially expressed proteins across PAM50 subtypes. From left to right, differential expression analysis was conducted between luminal A (n = 56 tumors), luminal B (n = 77 tumors), HER2-enriched (n = 59 tumors) or basal-like (n = 59 tumors) tumors and all other subtypes (n = 215, 194, 212 and 212 tumors, respectively). e, f, Differentially abundant polar metabolites (e) and lipids (f) across PAM50 subtypes. From left to right, differential abundance analysis was conducted between luminal A (n = 119 tumors), luminal B (n = 144 tumors), HER2-enriched (n = 98 tumors) or basal-like (n = 52 tumors) tumors and all other subtypes (n = 324, 299, 345 and 391 tumors, respectively). For d–f, two-sided P values were determined by the Mann–Whitney U test and adjusted by the Benjamini–Hochberg procedure. Proteins, polar metabolites and lipids are colored gray if they did not meet the criteria of an absolute log2 fold change (log2FC) greater than 1 or FDR < 0.05.

Source data
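The one-vs-rest comparison described for Extended Data Fig. 1d–f (one subtype against all others, Mann–Whitney U test, Benjamini–Hochberg adjustment, log2FC and FDR thresholds) can be illustrated with the minimal sketch below. It is not the authors' code; the Code availability statement says only published packages were used. The sketch makes several assumptions: log2FC is the log2 ratio of group means on linear-scale abundances, the Mann–Whitney test uses the tie-corrected normal approximation without continuity correction, and a feature is treated as highlighted only when it passes both thresholds. The function names and toy data are hypothetical.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i..j
        i = j + 1
    return ranks


def mann_whitney_two_sided(x, y):
    """Two-sided Mann-Whitney U test (tie-corrected normal approximation,
    no continuity correction)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u1 = sum(rank_with_ties(pooled)[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return 1.0  # all values identical: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def one_vs_rest(features, labels, target, fc_cut=1.0, fdr_cut=0.05):
    """Compare one subtype (target) with all other samples, feature by feature.

    features: dict of feature name -> positive abundances aligned with labels.
    Returns dict of name -> (log2FC, FDR, highlighted).
    """
    names = list(features)
    log2fc, pvals = [], []
    for name in names:
        vals = features[name]
        a = [v for v, lab in zip(vals, labels) if lab == target]
        b = [v for v, lab in zip(vals, labels) if lab != target]
        # assumption: log2FC = log2(mean of target group / mean of the rest)
        log2fc.append(math.log2((sum(a) / len(a)) / (sum(b) / len(b))))
        pvals.append(mann_whitney_two_sided(a, b))
    fdr = benjamini_hochberg(pvals)
    # assumption: highlighted only if BOTH thresholds pass; all else is gray
    return {name: (fc, q, abs(fc) > fc_cut and q < fdr_cut)
            for name, fc, q in zip(names, log2fc, fdr)}


toy_labels = ["Basal"] * 4 + ["LumA"] * 4
toy_features = {"X": [8, 9, 10, 11, 1, 2, 1.5, 2.5], "Y": [1, 2, 3, 4, 2, 3, 1, 4]}
print(one_vs_rest(toy_features, toy_labels, "Basal"))
```

In practice, library routines such as `scipy.stats.mannwhitneyu` and a Benjamini–Hochberg implementation from a statistics package would normally replace the hand-written functions. They are written out here only to keep the sketch dependency-free.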

Extended Data Fig. 2 Comparisons between breast cancers arising in Chinese individuals in CBCGA and in white individuals in The Cancer Genome Atlas (TCGA).

a–e, Gene-level somatic mutation frequencies of invasive ductal carcinoma (IDC) cases in the luminal A (CBCGA: n = 182 tumors; TCGA: n = 229 tumors) (a), luminal B (CBCGA: n = 180 tumors; TCGA: n = 183 tumors) (b), HER2-enriched (CBCGA: n = 121 tumors; TCGA: n = 35 tumors) (c), basal-like (CBCGA: n = 83 tumors; TCGA: n = 86 tumors) (d) and normal-like (CBCGA: n = 41 tumors; TCGA: n = 9 tumors) (e) cohorts. f, AKT1 mutation frequency in IDC cases from breast cancer cohorts of East Asian individuals (CBCGA: n = 624 tumors; targeted sequencing cohort: n = 3,208 tumors; NCCH: n = 311 tumors) and white individuals (TCGA: n = 474 tumors; METABRIC: n = 1,866 tumors). ‘*’ denotes cohorts for which PAM50 subtypes are not available; for these cohorts, AKT1 mutation frequency across all cases is shown. g, AKT1 mutation sites found in patients with luminal A IDC in the CBCGA (upper) and TCGA white individual (lower) cohorts.

Source data

Extended Data Fig. 3 Comparisons of molecular subtype and ERBB2 amplification between breast cancers arising in Chinese individuals in CBCGA and in white individuals in TCGA.

a, b, Proportion of luminal A (a) and HER2-enriched (b) breast cancers among IDC cases in CBCGA Chinese (n = 716 tumors) and TCGA Asian (n = 47 tumors) individuals, compared with TCGA white individuals (n = 490 tumors) and the METABRIC cohort (n = 1,974 tumors). c–e, Gene-level somatic copy number alterations of IDC cases in the CBCGA and TCGA white individual cohorts, grouped by IHC-based subtype: amplifications (upper) and deletions (lower) in HR+HER2- (c), HR-HER2+ (d) and triple-negative breast cancer (e). For a, b, P values were obtained from two-sided Fisher’s exact tests and adjusted by the Benjamini–Hochberg procedure.

Source data
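The subtype-proportion comparisons in Extended Data Fig. 3a,b use two-sided Fisher's exact tests. As a hedged illustration, the sketch below computes the two-sided P value for a 2 × 2 table, for example HER2-enriched versus other IDC cases in two cohorts. It sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed table, which mirrors the convention used by `scipy.stats.fisher_exact`. The function name is hypothetical, and the counts in the usage line are the classic tea-tasting example, not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables sharing the observed
    margins whose probability is <= that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # probability that the top-left cell equals x given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # small relative tolerance so equally likely tables are not lost to rounding
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.4857
```

When several subtypes or cohorts are compared, as in panels a and b, the resulting P values would then be passed through a Benjamini–Hochberg adjustment.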

Extended Data Fig. 4 Quality control of proteomics and impact of copy number alteration on mRNA and protein expression.

a, Bar plot showing the number of genes detected in each batch. A total of 10,864 genes were detected. b, Principal component analysis (PCA) evaluating the batch effect using all genes detected in more than 70% of included samples after normalization and batch-effect removal. c, Dot plots showing Pearson’s correlation between technical replicates (samples within batches 33 and 34), using all genes detected in more than 70% of included samples after normalization and batch-effect removal. d, Venn diagrams depicting the cis-effects of CNA (FDR < 0.05) along the central dogma in this study and in the studies published by Mertins and colleagues (ref. 11; n = 74 tumors) and by Krug and colleagues (ref. 12; n = 122 tumors). e, f, Boxplots showing the mRNA and protein levels of WWP1 (e) and CCND1 (f) across GISTIC scores in each PAM50 subtype. For the WWP1 analysis, the numbers of samples were as follows: LumA, n = 188 tumors in the RNA analysis and n = 52 tumors in the protein analysis; LumB, n = 198 tumors in the RNA analysis and n = 73 tumors in the protein analysis; HER2, n = 121 tumors in the RNA analysis and n = 49 tumors in the protein analysis; Basal, n = 88 tumors in the RNA analysis and n = 44 tumors in the protein analysis; Normal, n = 47 tumors in the RNA analysis and n = 20 tumors in the protein analysis. For the CCND1 analysis, the numbers of samples were as follows: LumA, n = 147 tumors in the RNA analysis and n = 41 tumors in the protein analysis; LumB, n = 163 tumors in the RNA analysis and n = 58 tumors in the protein analysis; HER2, n = 105 tumors in the RNA analysis and n = 41 tumors in the protein analysis; Basal, n = 76 tumors in the RNA analysis and n = 31 tumors in the protein analysis; Normal, n = 37 tumors in the RNA analysis and n = 12 tumors in the protein analysis.
In boxplots, the center line represents the median, the box limits represent the upper and lower quartiles, the whiskers extend to 1.5× the interquartile range, and the points represent individual samples. g, h, Forest plots of multivariate Cox regression analysis for relapse-free survival, adjusting for PAM50 clusters, tumor size and lymph node status, in the overall population (n = 271 tumors) (g) and the HR+HER2- subgroup (n = 148 tumors) (h). Error bars represent the 95% confidence intervals (CI) of the hazard ratio (HR), and the center of each error bar indicates the HR. i, Gene set enrichment analysis (GSEA) comparing the molecular characteristics of each integrated cluster with the others. Pathways significantly enriched in a given cluster (FDR < 0.25) are shown. j, Heat map showing the abundance of immune cells in Cluster 3 (n = 75 tumors) and non-Cluster 3 (n = 196 tumors) breast cancers. Cell types significantly elevated in the Cluster 3 subgroup are marked with asterisks. k, Enrichment of immunotherapy predictive signatures in integrated clusters and PAM50 subtypes, as indicated by a logistic model, in the overall population (n = 271 tumors) and the HR+HER2- subgroup (n = 148 tumors). For d, P values were obtained from Spearman’s rank correlation test with false discovery rate correction. For e, f, two-sided Wilcoxon rank-sum tests were used to compare mRNA or protein levels between samples with GISTIC scores of ‘0’ and ‘2’ in each PAM50 subtype. *, P < 0.05; N.S., not significant (P > 0.05). For g, h, P values were obtained from two-sided multivariate Cox regression analysis; bold font indicates P < 0.05. For j, P values were obtained from unpaired two-sided t-tests.

Source data
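The cis-effect analysis in Extended Data Fig. 4d relates each gene's copy number to its own mRNA or protein level across tumors using Spearman's rank correlation. A minimal sketch of the per-gene coefficient follows. It assumes that copy-number and expression values are already aligned by tumor and non-constant, and it returns only ρ. P values (for example from `scipy.stats.spearmanr`) and the false discovery rate correction stated in the caption would be layered on top. All names here are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined if either input is constant


def cis_correlations(copy_number, expression):
    """copy_number, expression: dict gene -> list of values aligned by tumor.

    Returns dict gene -> rho for genes present in both inputs.
    """
    return {gene: spearman_rho(copy_number[gene], expression[gene])
            for gene in copy_number if gene in expression}


# hypothetical toy data: copy number and mRNA for one gene across five tumors
print(cis_correlations({"GENE_A": [0, 1, 1, 2, 2]}, {"GENE_A": [1.0, 2.1, 1.9, 3.5, 4.0]}))
```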

Extended Data Fig. 5 Quality control and overview of polar metabolomic and lipidomic data in CBCGA.

a, Distribution of quality control (QC) samples in principal component analysis (PCA) of polar metabolomic data in positive (left panel) and negative (right panel) ion modes. b, Distribution of QC samples in PCA of lipidomic data in positive (left panel) and negative (right panel) ion modes. c, Numbers and proportions of annotated polar metabolites (top panel) and lipids (bottom panel) in our study. FA, fatty acid; GL, glycerolipid; GP, glycerophospholipid; SP, sphingolipid; ST, sterol lipid. d, Volcano plots of the 669 annotated polar metabolites (top panel) and 1,312 lipids (bottom panel) profiled. Differentially abundant metabolites of different categories are color coded individually. e, Log2 fold change (FC) of different categories of polar metabolites (top panel) and lipids (bottom panel) between tumor and normal tissues. The dashed red line indicates equal metabolite abundance in tumor and normal tissues (log2FC = 0). Tumor, n = 501 biologically independent samples; normal, n = 76 biologically independent samples. The center line indicates the median, the box bounds indicate the 25th and 75th percentiles, and the whiskers extend to 1.5× the interquartile range. f, Pathway-based analysis of metabolomic changes between tumor and normal tissues. The differential abundance (DA) score captures the average, gross change of all metabolites in a pathway: a score of 1 indicates that all measured metabolites in the pathway increase in the tumor compared with normal tissues, and a score of −1 indicates that all measured metabolites in the pathway decrease. Only pathways with at least three measured metabolites were used for DA score calculation. Tumor, n = 501 biologically independent samples; normal, n = 76 biologically independent samples. For d, P values were calculated using the two-sided Kruskal–Wallis test and adjusted by the Benjamini–Hochberg procedure.

Source data
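The DA score in Extended Data Fig. 5f can be sketched as DA = (number of significantly increased metabolites − number of significantly decreased metabolites) / number of measured metabolites in the pathway. This is the definition introduced by Hakimi et al. (ref. 21), and it matches the endpoints of +1 and −1 described in the caption. That this study used exactly the same formula is an assumption here. The function takes per-metabolite directions (+1, −1 or 0) that would come from the tumor-versus-normal test, and it skips pathways with fewer than three measured metabolites, as the caption states. The pathway and metabolite names are placeholders.

```python
def da_score(pathway_members, direction, min_measured=3):
    """Differential abundance score per pathway (assumed Hakimi-style definition).

    pathway_members: dict pathway -> list of metabolite names in the pathway.
    direction: dict metabolite -> +1 (significantly up in tumor),
               -1 (significantly down) or 0 (not significant).
               Metabolites missing from this dict count as not measured.
    """
    scores = {}
    for pathway, members in pathway_members.items():
        measured = [m for m in members if m in direction]
        if len(measured) < min_measured:
            continue  # too few measured metabolites to score this pathway
        up = sum(1 for m in measured if direction[m] > 0)
        down = sum(1 for m in measured if direction[m] < 0)
        scores[pathway] = (up - down) / len(measured)
    return scores


members = {"pathway_A": ["m1", "m2", "m3", "m4"],
           "pathway_B": ["m5", "m6", "unmeasured"],
           "pathway_C": ["m7", "m8", "m9"]}
direction = {"m1": 1, "m2": 1, "m3": 1, "m4": 0, "m5": 1, "m6": -1,
             "m7": -1, "m8": -1, "m9": -1}
print(da_score(members, direction))
```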

Extended Data Fig. 6 Integrated analysis of immunogenomic characteristics of breast cancer.

a, CIBERSORT-estimated proportions of 22 immune cell types across TME phenotypes (Cold: n = 296 tumors; Moderate: n = 191 tumors; Hot: n = 265 tumors). Cell abundance was normalized across samples. b, Immune and stromal scores estimated by ESTIMATE across TME phenotypes in each PAM50 subtype (LumA: n = 222 tumors; LumB: n = 221 tumors; HER2: n = 148 tumors; Basal: n = 112 tumors; Normal: n = 49 tumors). For the boxplot, the center line indicates the median, the lower and upper hinges represent the 25th and 75th percentiles, respectively, and the whiskers extend to 1.5× the interquartile range. c, K-means clustering of the TCGA cohort based on the estimated abundance of 24 microenvironment cell types (Cold: n = 419 tumors; Moderate: n = 458 tumors; Hot: n = 202 tumors). d, Distribution of TME phenotypes across PAM50 subtypes in the TCGA cohort. e, Proportions of tumor microenvironment cells deconvoluted from scRNA-seq data (n = 752 tumors). f, g, Comparison of MHC (f) and innate immune (g) molecule expression across TME phenotypes in each indicated PAM50 subtype (n = 752 tumors). h, Comparison of the viral mimicry signature across TME phenotypes in each indicated intrinsic subtype (LumA: n = 222 tumors; LumB: n = 221 tumors; HER2: n = 148 tumors; Basal: n = 112 tumors; Normal: n = 49 tumors). The center line indicates the median, the lower and upper hinges represent the 25th and 75th percentiles, respectively, and the whiskers extend to 1.5× the interquartile range. For b, h, P values were calculated using the two-sided Kruskal–Wallis test and adjusted by the Benjamini–Hochberg (BH) procedure.


Extended Data Fig. 7 Recurrent ERBB2 fusion transcripts in HER2-positive tumors.

a, Distribution of fusion genes across chromosomes. b, Circular plot of the fusion gene landscape. Recurrent fusions (detected in more than two samples) are displayed as connected gene pairs, with the width of each connecting arc representing the number of samples harboring the fusion. Red indicates novel gene fusions not present in public databases (FusionGDB and ChimerDB). c, Bar chart showing the top 11 recurrent fusion genes. d, e, Distribution of fusion genes across IHC subtypes (d) (HR+HER2−, n = 468 tumors; HR+HER2+, n = 100 tumors; HR−HER2+, n = 81 tumors; TNBC, n = 103 tumors; paratumor, n = 60 samples) and PAM50 subtypes (e) (Luminal A, n = 222 tumors; Luminal B, n = 221 tumors; HER2-enriched, n = 148 tumors; Basal-like, n = 112 tumors; Normal-like, n = 49 tumors; paratumor, n = 60 samples). For the boxplot, the center line indicates the median value, lower and upper hinges represent the 25th and 75th percentiles, respectively, and whiskers denote 1.5× the interquartile range. f, Proportions of fusion types proximal to ERBB2 on chromosome 17q. g, Circos plot displaying ERBB2 fusions. h, Propensity score-matched survival analysis of HER2-positive patients with or without ERBB2 fusions. For d and e, statistical analysis was performed using the Kruskal–Wallis test. For h, survival distributions were compared using the log-rank test.
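
The recurrence filter in panel b (fusions detected in more than two samples) amounts to counting distinct samples per gene pair. A minimal sketch, assuming fusion calls arrive as (sample, 5' gene, 3' gene) tuples and that partner order matters; the function name is hypothetical and the gene symbols below are placeholders, not fusions reported in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

FusionCall = Tuple[str, str, str]  # (sample_id, five_prime_gene, three_prime_gene)


def recurrent_fusions(
    calls: Iterable[FusionCall], min_samples: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Gene pairs seen in at least `min_samples` distinct samples, most frequent first.

    min_samples=3 encodes "more than two samples". Duplicate calls of the
    same pair within one sample are counted once.
    """
    samples_per_pair: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for sample_id, gene5, gene3 in calls:
        samples_per_pair[(gene5, gene3)].add(sample_id)
    kept = [(pair, len(s)) for pair, s in samples_per_pair.items() if len(s) >= min_samples]
    # Sort by sample count (descending), then by gene pair for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


calls = [
    ("S1", "GENE_A", "GENE_B"), ("S2", "GENE_A", "GENE_B"),
    ("S3", "GENE_A", "GENE_B"), ("S3", "GENE_A", "GENE_B"),  # duplicate call in S3
    ("S1", "GENE_C", "GENE_D"), ("S2", "GENE_C", "GENE_D"),  # only two samples
]
print(recurrent_fusions(calls))  # [(('GENE_A', 'GENE_B'), 3)]
```

Slicing the sorted result (for example `[:11]`) yields a top-N ranking of the kind summarized in a bar chart.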


Extended Data Fig. 8 Data dimensions, overall performance of multimodal prognosis prediction models and feature importance of the TMPIC model.

a, UpSet plot showing the number of patients with each combination of data modalities. Vertical bars in the upper panel show the number of patients with the data modality combination indicated by the black circles in the panel below. C, clinical stage; I, IHC subtype; T, transcriptomic data; P, digital pathology data; M, metabolomic data; R, radiologic data. b, Comparison of C-indices of models using single modalities (n = 6 models), 2 to 3 modalities (n = 15 models) and 4 to 6 modalities (n = 16 models). For the boxplot, the center line indicates the median value, lower and upper hinges represent the 25th and 75th percentiles, respectively, and whiskers denote 1.5× the interquartile range. FDR, false discovery rate. c, Feature importance scores of the TMPIC model. New C-indices were calculated after dropping each individual feature from the TMPIC model, and the feature importance score was calculated as the difference between the original C-index and the new C-index in the testing cohort (n = 80 patients). For b, P values were obtained from the Kruskal–Wallis test with false discovery rate correction.
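
Panel c describes a drop-one-feature importance score: the C-index with all features minus the C-index after removing one feature, evaluated in the testing cohort. The sketch below pairs that loop with Harrell's C-index. `score_fn` is a hypothetical callable that returns testing-cohort risk scores from a given feature subset (a real pipeline would presumably refit the model inside it); the toy linear model, feature names and data are made up.

```python
from typing import Callable, Dict, List, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs in which the earlier event has higher risk.

    A pair (i, j) is comparable when patient i has an observed event and
    times[i] < times[j]; tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def drop_one_importance(
    features: Sequence[str],
    score_fn: Callable[[List[str]], List[float]],
    times: Sequence[float],
    events: Sequence[int],
) -> Dict[str, float]:
    """Importance = C-index(all features) - C-index(all features except one)."""
    base = harrell_c_index(times, events, score_fn(list(features)))
    return {
        f: base - harrell_c_index(times, events, score_fn([g for g in features if g != f]))
        for f in features
    }


# Toy testing cohort and a fixed linear risk model (hypothetical values).
times = [2, 4, 6, 8]
events = [1, 1, 1, 0]  # the last patient is censored
X = {"x1": [3, 2, 1, 0], "x2": [0, 1, 0, 1]}
coefs = {"x1": 1.0, "x2": 0.5}


def score_fn(subset: List[str]) -> List[float]:
    # Hypothetical: reuses fixed coefficients instead of refitting on the reduced set.
    return [sum(coefs[f] * X[f][k] for f in subset) for k in range(len(times))]


print({f: round(v, 3) for f, v in drop_one_importance(["x1", "x2"], score_fn, times, events).items()})
# {'x1': 0.667, 'x2': 0.0}
```

Dropping the informative feature `x1` lowers the C-index from 1.0 to 1/3, so it receives a large positive score, while dropping the noise feature `x2` leaves the ranking unchanged.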


Supplementary information

Reporting Summary

Supplementary Table 1

a, Clinical and molecular characteristics of the included patients. b, Contribution of mutational signatures per intrinsic subtype. c, Frequent somatic mutations and germline variants shown in Fig. 1. d, Frequent cancer-related copy number gains/amplifications across intrinsic subtypes. e, Frequent cancer-related copy number losses/deletions across intrinsic subtypes. f, Transcriptome data shown in Fig. 1. g, Differentially expressed proteins across intrinsic subtypes. h, Differentially abundant polar metabolites across intrinsic subtypes. i, Differentially abundant lipids across intrinsic subtypes.

Supplementary Table 2

a, Clinical features and molecular subtypes between CBCGA and TCGA white individuals. b, Frequent mutations between CBCGA and TCGA white individuals (IDC). c, Intrinsic subtypes between CBCGA and TCGA white individuals (IDC). d, Enriched copy number amplifications between CBCGA and TCGA white individuals (IDC). e, Enriched copy number deletions between CBCGA and TCGA white individuals (IDC).

Supplementary Table 3

Effects of CNAs on mRNA and protein abundance (P values were calculated using the two-sided Spearman’s rank correlation test and adjusted for multiple testing using the FDR method).
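
Spearman’s rho is the Pearson correlation of average ranks, which is what makes it robust to the discrete, tied values typical of copy-number calls. A minimal sketch computing it for one gene’s per-sample copy-number and expression vectors; the function names and input values are hypothetical, and the P values and FDR adjustment reported in the table are not computed here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least two samples")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-sample values for one gene.
copy_number = [-1, 0, 0, 1, 2]  # discrete CNA calls (made up)
mrna = [5.1, 6.0, 5.8, 7.2, 9.4]  # expression values (made up)
print(round(spearman_rho(copy_number, mrna), 3))  # 0.975
```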

Supplementary Table 4

a, Additional samples. Information on the additional 58 TNBC samples used for metabolomic profiling. b, Polar metabolites. Log2-transformed abundance of MS2-annotated polar metabolites in tumor and normal tissues of the CBCGA cohort. c, Lipids. Log2-transformed abundance of MS2-annotated lipids in tumor and normal tissues of the CBCGA cohort. d, Protein network. Protein annotations of the metabolic protein network. e, Metabolite network. Polar metabolite annotations of the metabolite network. f, Correlations. Correlations between subtype-specific metabolic proteins and subtype-specific polar metabolites.

Supplementary Table 5

a, Single-sample GSEA-estimated abundance of tumor microenvironment cells. b, CIBERSORT-estimated proportions of tumor microenvironment cells. c, scRNA deconvolution. Deconvoluted proportions of tumor microenvironment cells based on scRNA-seq data. d, Immunogenomic indicators of the cohort. e, Somatic mutations in each TME phenotype. f, Copy-number alterations in each TME phenotype.

Supplementary Table 6

a, List of fusion events in CBCGA cohort. b, The reading frame of fusion transcripts in CBCGA cohort.

Supplementary Table 7

a, Features for multimodal integration. b, C-indices of models combining multimodal features to stratify patient prognosis in the testing cohort. c, Risk scores for each patient and values of multimodal features used in the TMPIC model.

Source data

Source Data Fig. 1

Statistical Source Data.

Source Data Fig. 2

Statistical Source Data.

Source Data Fig. 3

Statistical Source Data.

Source Data Fig. 4

Statistical Source Data.

Source Data Fig. 5

Statistical Source Data.

Source Data Fig. 6

Statistical Source Data.

Source Data Extended Data Fig. 1

Statistical Source Data.

Source Data Extended Data Fig. 2

Statistical Source Data.

Source Data Extended Data Fig. 3

Statistical Source Data.

Source Data Extended Data Fig. 4

Statistical Source Data.

Source Data Extended Data Fig. 5

Statistical Source Data.

Source Data Extended Data Fig. 6

Statistical Source Data.

Source Data Extended Data Fig. 7

Statistical Source Data.

Source Data Extended Data Fig. 8

Statistical Source Data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jiang, YZ., Ma, D., Jin, X. et al. Integrated multiomic profiling of breast cancer in the Chinese population reveals patient stratification and therapeutic vulnerabilities. Nat Cancer 5, 673–690 (2024). https://doi.org/10.1038/s43018-024-00725-0


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s43018-024-00725-0
