Main

DOM is one of the most complex, dynamic and abundant sources of organic carbon on Earth, yet its chemical reactivity remains poorly understood. The metabolism of autotrophic organisms is well understood and produces a limited number of organic molecules, often rather small biomolecules or polymers built from small repetitive units. Compared with biomolecules, most DOM accumulated in natural waters and soils seems to be extremely complex and rather refractory. The insufficient understanding of the diagenesis of DOM has given rise to many inconclusive hypotheses lacking firm links between biomolecules and the observed DOM molecular complexity.

Aquatic DOM represents a mix of terrestrial and aquatic sources at various stages of biotic and abiotic processing across contrasting conditions of temperature, photochemistry and seasonality2. Large contrasts in these regimes are observed between tropical and boreal biomes. The Amazon basin is an exemplary tropical catchment and the largest drainage system in the world, responsible for 20% of the global freshwater discharge and for about 10% of the global riverine DOM export to the oceans11,12. It comprises heterogeneous landscapes including the Andean Cordillera, minor mountain areas and expansive forested flatlands with stagnant and flowing waters affected by seasonal flooding13. The Amazon biome comprises extraordinary biodiversity of plants, animals and microorganisms14,15,16, constituting the source of Amazon DOM (AZ-DOM); high temperature combined with high humidity leads to rapid and extensive biological and chemical processing, affecting production and degradation of organic compounds, as well as carbon fluxes11,17. The equatorial position of the Amazon ecosystem also promotes photo-oxidation and mineralization of AZ-DOM. Processing of terrigenous and aquatic organic matter in Amazon rivers produces a quarter of global CO2 emissions from inland waters, nearly the same amount of carbon as sequestered by its forest11,18.

The Amazon basin comprises three main water types. Whitewater rivers (such as the Amazon main course and the Juruá, Japurá, Purus, Solimões and Madeira rivers) are turbid and originate in the Andes, from which they transport large amounts of nutrient-rich sediments12,19. Blackwater rivers (such as the Negro River) drain the Precambrian Guiana Shield, carrying small quantities of suspended matter but large amounts of humic substances20,21. Clearwater rivers (such as the Tapajós and Xingu rivers) feature high transparency, low sediment load, low nutrients and considerable bacterial abundance22.

The boreal forest biome is the second largest water-rich landscape apart from the humid tropics, covering about 14% of Earth’s land area from 50° N to 70° N, and is associated with forests and wetlands such as bogs, fens and peatlands that store and process vast amounts of carbon. The boreal biome has the largest number of lakes on Earth23. The molecular composition of boreal lake DOM is considered to be shaped by microbial synthesis and degradation, precipitation, temperature, land cover and water residence time24,25,26,27.

13C NMR spectra of DOM

Previous mass-spectrometry studies have identified thousands of ions in tropical and boreal DOM and showed distinction of DOM from different waters in the Amazon basin and boreal lakes17,28,29,30. Although high-resolution mass spectrometry offers exceptional capacity to identify elemental compositions and molecular formulae in complex mixtures, such analyses provide very limited specific structural information31. NMR spectroscopy offers isotope-specific determination of close-range atomic order (such as for 1H and 13C nuclei) within molecules and standalone capability to explain molecular structures in complex mixtures of unknown molecules such as DOM32,33,34,35. Here we used complementary multiplicity-edited 13C NMR spectra to quantify key substructures assembling the carbon skeletons of DOM in four main Amazon rivers and two mid-size Swedish boreal lakes (Fig. 1, Table 1, Extended Data Figs. 1 and 2 and Extended Data Tables 1 and 2). We have assessed the attendant aspects of DOM formation and reactivity enabled by this in-depth structural analysis.

Fig. 1: 13C NMR spectra of five DOM define contributions of core carbon substructures CH0123.

a, Overlay of single-pulse (Call, black) and QUAT (Cq, brown) 13C NMR spectra; numbers indicate relative proportions of Cq to Call (%); Cq-related substructures OCqC3 and CqC4, as well as CHn-related substructures OCHn and CCHn, are shaded in colour. b, Overlay of multiplicity-edited 13C DEPT NMR spectra, indicating CH123 (purple), CH (blue), CH2 (green) and CH3 (red); numbers indicate their relative proportions to Call (%). c, Proportions of OCqC3 and other ODA-relevant oxygenated carbon units to Call (%). d, 13C NMR-derived relative proportions of quaternary carbon (Cq), methine (CH), methylene (CH2) and methyl (CH3) carbons denote progressive compaction of DOM molecules in the order B-DOM < T-DOM < A-DOM < S-DOM < N-DOM (see text). e, Overlay of area-normalized 13C NMR spectra of five DOM (δC: 0–235 ppm = 100% area). f, Overlay of area-normalized 13C NMR spectra of five DOM; section of carbonyl derivatives (δC: 165–185 ppm = 100% area). g, Overlay of area-normalized 13C NMR spectra of five DOM; section of polyphenols (δC: 60–165 ppm = 100% area).

Table 1 Percentages of 11 13C NMR-derived key carbon chemical environments (CH0123) in DOM samples

13C NMR spectra detect all carbon atoms in DOM molecules. Combined analysis of multiplicity-edited DEPT (distortionless enhancement by polarization transfer), QUAT (quaternary carbon only) and single-pulse 13C NMR spectra (Call) provided quantification of all four fundamental chemical environments of quaternary carbon (Cq), methine (CH), methylene (CH2) and methyl (CH3) carbon in the five DOM (Fig. 1a,b,d and Extended Data Table 3). These CH0123 subspectra showed prominent broad 13C NMR resonances representing core-carbon-based structural units of the carbon skeleton of DOM molecules. The 13C NMR-derived average O/C (oxygen to carbon) atomic ratios32 followed the order of N-DOM (DOM in the Negro River) > S-DOM (DOM in the Solimões River) > T-DOM (DOM in the Tapajós River) > B-DOM (DOM in the boreal lakes) > A-DOM (DOM in the Amazonas River). The average H/C (hydrogen to carbon) atomic ratios followed roughly the reverse order N-DOM < T-DOM < S-DOM < A-DOM < B-DOM (Extended Data Table 4), suggesting that the main oxygen-containing functional groups in DOM were associated with unsaturated carbon units, such as Csp2-based carbonyls (C2C = O), carbonyl derivatives (CONH, COOH and COOR), oxygenated aromatic carbons (Car–O; polyphenols) and olefins.

B-DOM showed a higher ratio of aliphatic protons to aliphatic carbons compared with the other four DOM, indicating higher H/C ratios within its aliphatic units. The abundance of singly oxygenated aliphatic groups (OCH units) followed the order B-DOM > N-DOM > A-DOM ≈ S-DOM > T-DOM (Table 1); analogous trends applied to the sum of O2CH and OCH units, but highly oxygenated polyphenols in N-DOM and A-DOM contributed to δC ≈ 90–108 ppm as well33. N-DOM showed the highest proportions and the largest molecular diversity of polyphenolic molecules among the five DOM, covering the maximum 13C NMR chemical shift range (δC ≈ 95–165 ppm) attainable for this class of molecules33 (Fig. 1c,e,g). Carboxylic acids in DOM (δC ≈ 165–185 ppm) showed remarkable variance in abundance (S-DOM > N-DOM > B-DOM ≈ T-DOM ≈ A-DOM) (Table 1 and Extended Data Table 4) and structural diversity (Fig. 1f and Extended Data Fig. 2e), with N-DOM and A-DOM being most distinct. The relative abundance of aliphatic carboxylic acids (δC > 175 ppm; Fig. 1f) was lowest in N-DOM and A-DOM. The abundance of ketones in DOM was higher in T-DOM, N-DOM and S-DOM and lower in B-DOM and A-DOM (Table 1, Extended Data Fig. 2 and Extended Data Tables 4 and 5).

Quaternary carbon is abundant in DOM

The proportion of Cq in total carbons was remarkably high in all five DOM (N-DOM: 66% > A-DOM: 62% > S-DOM: 60% > T-DOM: 58% > B-DOM: 56%), in contrast to the comparatively minor fraction of Cq (about 15%) in common, hydrogen-rich primary/central metabolites (Fig. 1d and Extended Data Table 3). Cq in 13C NMR spectra of all five DOM comprised nine main structural environments (Extended Data Figs. 3 and 4 and Extended Data Table 5). Also, the sum of Cq and CH exceeded 80% of total carbons in all five DOM (Fig. 1d and Extended Data Table 3), indicative of a high degree of compaction and unsaturation of DOM molecules, which is not attainable by any combination of common, hydrogen-rich biomolecules. Csp2-based Cq units comprise familiar unsaturated functional groups (that is, C2C = O, COOH, CONH, COOR, CarO and CarC, and C2C = C), resonating at δC ≈ 95–235 ppm. The carboxyl group COOH (about 16%) was the most abundant Cq-containing functional group in DOM molecules. Moreover, we observed Csp3-based Cq units, in particular, OCqC3 (about 6%) and CqC4 (about 7%), resonating at δC ≈ 40–110 ppm (Table 1); O2CqC2 units were present but rare.
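
The contrast with hydrogen-rich biomolecules can be made concrete by counting carbon multiplicities in a few textbook structures. The sketch below is only an illustration: the four molecules are a hand-picked assumption, not data from this study, and the function names are hypothetical. The counts are read directly from the standard structures of each molecule.

```python
# Carbon multiplicity counts (Cq, CH, CH2, CH3) read off textbook structures.
# The choice of molecules is an illustrative assumption, not data from the study.
METABOLITES = {
    "glucose (pyranose)": (0, 5, 1, 0),
    "palmitic acid": (1, 0, 14, 1),
    "alanine": (1, 1, 0, 1),
    "phenylalanine": (2, 6, 1, 0),
}


def multiplicity_fractions(counts):
    """Return (Cq, CH, CH2, CH3) as percentages of all carbon atoms."""
    total = sum(counts)
    return tuple(100.0 * c / total for c in counts)


def pooled_counts(table):
    """Sum the per-molecule counts, i.e. treat the set as an equimolar mixture."""
    return tuple(sum(column) for column in zip(*table.values()))


if __name__ == "__main__":
    for name, counts in METABOLITES.items():
        cq, ch, _, _ = multiplicity_fractions(counts)
        print(f"{name:20s} Cq {cq:5.1f}%   Cq+CH {cq + ch:5.1f}%")
    cq, ch, _, _ = multiplicity_fractions(pooled_counts(METABOLITES))
    print(f"{'equimolar mixture':20s} Cq {cq:5.1f}%   Cq+CH {cq + ch:5.1f}%")
```

For this small set, both the Cq fraction and the sum of Cq and CH stay far below the values reported for the five DOM, which is the qualitative point of the comparison in the text.
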

Mass spectra of tropical riverine and boreal lake DOM showed low average H/C ratios of DOM17,28,29,30, and this considerable unsaturation is commonly attributed to the presence of Csp2-based hydrogen-deficient structures, such as ketones, carboxylic acids, olefins and polyphenols. Many diverse oxidation processes lead to ketones and carboxylic acids, and riverine DOM typically contain high abundance of polyphenols (Fig. 1, Extended Data Fig. 3 and Extended Data Table 5). However, compared with the presence of trigonal planar sp2-hybridized Cq, the presence of Csp3-based tetrahedral CqC4 and OCqC3 units in DOM molecules implies a more stringent and entirely independent structural constraint (Fig. 2, Extended Data Figs. 3 and 4 and Extended Data Table 5). In comparison with the Csp2-based structural flatland of single and fused benzene rings36, CqC4 and OCqC3 units are the ultimate carriers of aliphatic branching and deeply embedded in molecules with complex three-dimensional shapes by necessity. The high abundance of CqC4 and OCqC3 units conveys the characteristics of DOM molecules rich in aliphatic unsaturated structures, such as several fused and bridged alicyclic rings containing several tetrahedral carbon stereocentres. It is worth noting that CqC4 units may originate from many distinct chemical precursors and processes37, whereas the OCqC3 units have rather limited diversity of sources.

Fig. 2: Main synthons and chemical reactions for ODA of DOM.

a,b, DOM-related phenolic molecules (red shade) readily convert into five main first-generation (more reactive) ortho-cyclohexadienone and (more stable) para-cyclohexadienone derivatives (green shade), which possess atom-specific reactivity depending on substitution. c, ODA initiates complementary consecutive and parallel reactions that produce a prolific diversity of molecules with elaborate three-dimensional shapes and large-scale obliteration of original binding motifs. d, Early-generation products will continually experience ecosystem-specific transformative conditions and endure exposure to reactive oxygen species, photochemistry and redox chemistry, by which structural recalcitrance of DOM molecules increases with growing counts of intramolecular carbon–carbon bonds. e, ODA converts simple cyclohexadienones into complex, oxygen-rich molecules with fused and bridged alicyclic motifs that carry many carbon-based stereocentres, denoted here by blue asterisks; small circles on atoms denote Csp3-based units CqC4, OCqC3 and O2CqC2; large circles denote relative proportions of key carbon units in given molecules.

The OCqC3 substructure is very rare in common metabolites; it does not occur in typical carbohydrates, lignins, lipids, nucleotides, peptides and tannins. It was, however, very abundant in all five DOM of this study, comprising up to roughly 6% of all carbon (Call), equivalent to about 30% of oxygenated aliphatic (OCH) units (Table 1, Fig. 1c, Extended Data Figs. 3 and 4 and Extended Data Table 5). This mandates mechanistic relevance and straightforward synthesis of OCqC3 units in freshwaters across biomes. In the comparison of all five DOM, OCqC3 units were most abundant in B-DOM and least abundant in T-DOM and N-DOM (Table 1, Fig. 1c, Extended Data Fig. 4 and Extended Data Table 5). Furthermore, we found the abundance of benzene derivatives with electron-donating substituents (-OH and -OCH3; Table 1, Fig. 1c, Extended Data Fig. 4 and Extended Data Table 5) to be highest in N-DOM and lowest in B-DOM.

ODA creates complexity in freshwater DOM

The high abundance of OCqC3 units in boreal lake and tropical riverine DOM most likely results from ODA of abundant hydroxylated and methoxylated benzene derivatives, which ultimately originate from prevalent and molecularly heterogeneous lignin and tannin degradation products that are common constituents of terrestrial DOM. Phenol, (para) 2,5-cyclohexadienone and (ortho) 2,4-cyclohexadienone are interconvertible tautomers of C6H6O (Fig. 2a,b), with increasing energy content and reactivity, respectively38,39,40. Resonant electron donation by oxygenated substituents destabilizes benzene rings39, making them susceptible to transformation into first-generation synthons, comprising masked ortho-benzoquinone ketals, o-quinols, masked para-benzoquinone ketals, p-quinols and quinone methides41,42,43. All of those cyclohexadienones are accessible by straightforward reactions from the common aromatic substructures abundant in freshwater DOM (Fig. 2a,b). Cyclohexadienone-based dearomatization is a key biochemical reaction to generate structural complexity and it is also one of the most widely used complexity-generating reactions in organic synthetic chemistry at present to create elaborate natural product scaffolds6,7,8,9,10,41,42,43,44,45,46,47,48; here we propose that it is a key environmental mechanism in DOM processing as well. Cyclohexadienones show substituent-dependent atom-specific reactivity at each position of the six-membered rings (that is, substituent-dependent electrophilic and nucleophilic character), setting the stage for a huge variety of follow-up reactions6,43,49 (Fig. 2c–e). Cyclohexadienones readily engage in, for example, standard and inverse electron-demand Diels–Alder reactions ([4 + 2] cycloadditions), [m, n] cycloadditions, cyclizations, additions, reductions and so on, and the initial products often undergo well-documented complementary and parallel cascade reactions8,50,51. 
Already, the basic succession of ODA and [4 + 2] cycloaddition transforms five sp2-hybridized carbon atoms into five sp3-hybridized carbon atoms (Extended Data Table 6).

ODA operates through both biotic and abiotic mechanisms9,41,50. Molecular diversification is further amplified through dearomatization by complementary selectivity of its photochemical52,53, redox-initiated radical54,55, ionic56, as well as enzymatic variants; the last of these accommodates a remarkable promiscuity of substrates10,57,58, fostering opportunities for large-scale DOM processing. All of these dearomatization reactions probably occur in parallel, facilitating rapid complexification within DOM from simple aromatic precursor molecules8 (Fig. 2 and Extended Data Fig. 5). Fundamentally, ODA transforms flat aromatic rings into elaborately shaped oxygenated aliphatic molecules rich in tetrahedral Csp3–carbon atoms with fused and bridged alicyclic rings (Fig. 2e). Aromatic precursor molecules in DOM are often of appreciable size (m/z ≈ 450 by mass spectrometry)28,30, polysubstituted, polyoxygenated, molecularly diverse and have inherent low symmetry. ODA chemistry of these molecules will inevitably produce highly complex mixtures of oxygen-rich alicyclic DOM molecules59,60,61 (Fig. 2).

We propose ODA chemistry of oxygenated aromatic DOM molecules as an indispensable initiator for the synthesis of OCqC3-units-containing, highly complex, oxygen-rich alicyclic DOM molecules in tropical and boreal freshwater ecosystems61. The molecules generated early by ODA already contain fused and bridged alicyclic rings with several tetrahedral stereogenic centres59,60, in which many carbons are bonded to several carbons, thereby decreasing the number of chemical bonds between carbon and oxygen atoms on average. This diffuse embedding of oxygen atoms into aliphatic carbon networks is a specific structural feature of freshwater DOM molecules. By contrast, carbon atoms in common metabolites are regularly clustered together, whereas oxygen atoms are either diluted (as in lipids and peptides) or concentrated (as in carbohydrates).

Two other environmental synthesis pathways to produce OCqC3 carbon units are known but seem to be of minor relevance compared with ODA. One is selective preservation of OCqC3 units in precursor (bio)molecules, such as oxygenated terpenoids62,63. The other is the unselective attack of energy-rich hydroxyl radicals on DOM molecules64. Hydroxylation may also create OCqC3 carbon units from suitable aliphatic precursors65. However, both pathways cause incremental, additive molecular transformations (Extended Data Fig. 5a and Extended Data Table 6) and are not capable of generating topological complexity from structurally simple precursors as realized by ODA6,8 (Fig. 2e and Extended Data Fig. 5b). The rather diffuse input of OCqC3 units from highly diverse molecules into the ecosphere caused by these reactions is very likely not competitive with ODA in the molecular transformation of boreal and tropical DOM, in which up to 50% of carbon can be related to structural features susceptible to ODA either as educts (polyphenols) or products (OCqC3 units) according to 13C NMR spectra.

COOH-based rearrangement and ODA synergy

ODA and carboxylic-acid chemistry carry complementary roles in the processing of DOM. Carboxylic groups are the defining feature of carboxyl-rich alicyclic molecules (CRAM) that are ubiquitous in DOM across water systems34,66,67,68. The near universal presence of large quantities of highly aliphatic CRAM in DOM is difficult to explain by incremental pathways of common microbial or photochemical reactions. ODA fundamentally generates structural complexity of DOM molecules in a few-step cascade reaction (Fig. 2e, Extended Data Fig. 5b and Extended Data Table 6) and we propose carboxylation of ODA products as a straightforward pathway leading to CRAM.

COOH is a highly reactive attachment Cq unit, whereas all other Cq atoms in DOM molecules are connected to two or more carbon atoms. The observation of CRAM in 13C NMR spectra of DOM implies the co-occurrence of (aliphatic and aromatic) carboxylic acids and alicyclic rings in DOM ‘on average’. However, the high abundance of both structural units in DOM, and the considerable size of DOM molecules17,28,30, implies the presence of both substructures in most DOM molecules. The positioning of COOH towards the surface of DOM molecules conveys independent reactivity, including decarboxylative functionalization and carboxylation through complementary neutral, ionic and radical pathways54,55,56. Microbial and abiotic oxidation of DOM uses molecular oxygen and/or reactive oxygen species (Fig. 2d) to generate carboxyl groups69,70,71, an efficient processing step of DOM in oxic surface waters.

We propose COOH chemistry as a critical modifier in the structural evolution of DOM towards more compact molecules during environmental processing, which increases the average number of chemical bonds between constituent atoms in DOM molecules and the proportions of quaternary and methine carbon units, at the expense of methylene and methyl units (Fig. 2d). For instance, intermediates produced by decarboxylation carry intrinsic energy fostering structural rearrangements72. In particular, free radicals have distinct reactivity, with skeletal rearrangements towards higher compaction supported by the higher stability of sterically crowded radical positions, opposite to common chemistry, in which increasing steric demands (for example, entry of new substituents to pre-existing atomic environments in molecules) are difficult to attain37,61. Intramolecular reactions with participation of abundant carboxyl and hydroxyl groups contribute to further compaction of DOM molecules by, for example, forming anhydrides (two COOH groups), lactones (COOH and OH units) and ethers (two OH units).

Common aquatic DOM contains fewer N-containing or S-containing functional groups than O-containing functional groups28,29,30,31,32, and their effects on overall 13C NMR properties remain limited. However, the dearomatization of precursors such as pyrrole, pyridine, indole and aniline derivatives readily generates alkaloid-like, structurally elaborate CHNO molecules under boreal and tropical catchment conditions73, which could be a main constituent of freshwater CHNO compounds in DOM. Such reactions agree with a recently described prevalence of heterocyclic nitrogen in aged ocean dissolved organic nitrogen74.

Fundamental structural rearrangement, many carbon–carbon connectivities in hydrogen-deficient molecules and large-scale obliteration of standard biomolecular structural motifs favour intrinsic structural recalcitrance of DOM against expedient degradation. Therefore, small units such as CO2 and CH4 are more likely to be lost than large substructures during the process of DOM molecular evolution. ODA readily explains the observed ultimate structural diversity of DOM molecules and the difficulty in regenerating sizeable amounts of standard biomolecular binding motifs such as simple carbohydrates or amino acids already from early stages of DOM diagenesis because they tend to be lost early75.

DOM molecules generated from low-mass and high-mass and low-symmetry oxygenated aromatic educts through ODA show elaborate shapes with a large proportion of sp3-hybridized carbon, fused and bridged alicyclic rings, presence of chiral carbon atoms and oxygen-based and nitrogen-based functionalization—features that correlate with success in medical drug design8,9,36 (Fig. 2e). Architecturally multiform molecules explore larger regions of the chemical space and, when featuring low counts of freely rotating bonds, convey more specific ligand–receptor interactions than flat (aromatic) molecules76,77. DOM, a globally relevant layer of ultimate organic molecular complexity, comprises hundreds of gigatonnes of organic carbon, several orders of magnitude more abundant than known biologically active natural products. It is conceivable that some of these polyfunctional, elaborately shaped, compact molecules carry relevant but as yet unrecognized biological activity.

Conclusions

Polyphenol chemistry in DOM processing comprises a remarkable dichotomy of traditional ring-opening and substitution chemistry on one hand and dearomatization on the other hand. ODA initiates an inflationary increase of molecular structural diversity from early stages of DOM processing, fundamentally distinct from the rather incremental variance in molecular structures associated with the addition and release of small units such as ±H2, CH2, O, CO and CO2.

The NMR-based structural differences of boreal lake and tropical river DOM molecules were not larger than the distinction among the four investigated AZ-DOM despite experiencing contrasting regimes of microbial communities, photochemistry, temperature and seasonality during their synthesis and degradation. The proposed ODA pathway applies to both biomes and offers a new mechanism to better reveal, understand and predict DOM structural complexity. It seems that ODA is an important mechanism to produce structurally altered DOM molecules that resist degradation and persist in the environment for centuries to millennia. We suggest that ODA might be a key process in the formation of CRAM that are abundant in freshwater and the global ocean32,34,68. It has been shown that CRAM in the deep ocean is old and very resistant to microbial and photochemical degradation78,79, and sequestration of carbon in structurally recalcitrant CRAM would reduce the release of CO2 to the atmosphere, thereby affecting global warming and climate change. This research opens doors towards more comprehensive understanding of the roles of DOM in ecosystems and as a potential chemical resource to society.

Methods

Sampling and site locations

39 Amazon basin water samples from 34 sampling sites were collected between 2 April 2014 and 25 May 2014 eastward from Solimões River (whitewater), Negro River (blackwater), Amazonas River (turbid water) to the Tapajós River (clearwater). Water samples were collected during a high water period with unusually high levels of flooding. Ar1–Ar4 were sampled six weeks later than the other Amazonas River samples (Extended Data Fig. 1 (map) and Extended Data Table 1). We obtained water samples by boat just below the surface. Solid-phase extraction (SPE) of the water samples was performed within 2 h in the field. The water column DOM was extracted by a previously described SPE method using PPL resin29,80,81. The eluates were stored in the freezer (−20 °C) until further analysis. To obtain meaningful S/N ratios in NMR spectra, we have used four consolidated Amazon basin river samples (SNAT) according to water types and selected samples with a very high similarity of their 1H NMR spectra (data not shown). About 75% of individual samples were used for pooling, after full NMR and mass spectrometry and chemistry characterization (data not used here), leaving backup samples in case of need. The pooling conforms to the aim of this contribution, which attempts the depiction of average structural features of DOM molecules in the four main selected Amazon basin rivers. Swedish boreal lake water samples were collected in August 2012 in the Malingsbo region and two representative lakes were included in this study, namely, Lilla Sångaren (M5) and Övre Skärsjön (M10); isolation of SPE-DOM in Swedish lakes was performed analogously to Amazon river basin waters. M5 and M10 are mid-size boreal Swedish lakes with the following key parameters: dissolved organic carbon: 6.8 and 11.2 mg l−1; lake area: 24 and 165 ha; maximum depth: 20 and 32 m; computed water residence time: 1.18 and 1.63 years (ref. 82); averaged values for very similar 13C NMR spectra of boreal lakes M5 and M10 produced values of B-DOM as shown.

NMR spectroscopy

A Bruker Avance III spectrometer and TopSpin 3.6/PL6 software were used to acquire 13C NMR spectra of re-dissolved AZ-DOM (10–40 mg solid SPE-DOM in typically 75–135 µl CD3OD (99.95% 2H; 13C-depleted 12CD3OD; Aldrich, Steinheim, Germany)) at 283 K. Briefly, the re-dissolved DOM were transferred to 2.5–3.0-mm Bruker Match tubes and sealed. A cryogenic classical geometry 5 mm z-gradient 13C, 1H probe (B0 = 11.7 T) was used for acquisition of 13C NMR spectra. Transmitter pulses were at approximately 10 µs for 1H and 13C and calibrated 90°/180° pulses were used for each sample. In independent experiments, one-dimensional 800 MHz 1H NMR spectra were acquired from all 39 AZ-DOM samples (data not shown) and the samples showing the most congruent curvature of their 1H NMR spectra across the entire region of chemical shift (δH ≈ 0–10 ppm) were pooled before the acquisition of spectra for the four AZ-DOM samples (that is, S-DOM, N-DOM, A-DOM and T-DOM; see Extended Data Table 1); about 75% of samples were used for pooling (Extended Data Table 1) and the remainder was kept for possible subsequent analysis (data not shown). Pooling was necessary to obtain high-quality 13C NMR spectra (13C receptivity ≈ 1.7 × 10−4 of 1H) with sufficient S/N ratio to faithfully resolve low-abundance Csp2-based chemical environments. Swedish lake water samples M5 and M10 were used as isolated for acquisition of NMR spectra because of the higher disposable amount of sample; 13C NMR spectra shown represent M5 (Fig. 1 and Extended Data Fig. 3), but all NMR section integrals and intensity computations of B-DOM represent averaged values of M5 and M10; 13C NMR spectra of M5 and M10 in essence coincided, but that of M5 showed considerably better S/N ratio than that of M10.
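
The quoted 13C receptivity can be reproduced from tabulated nuclear constants, which also shows why pooling pays off. The sketch below is an illustration, not part of the authors' workflow: the gyromagnetic ratios and natural abundances are standard tabulated values, the function names are hypothetical, and the time-saving estimate assumes that signal grows linearly with sample mass and that S/N grows with the square root of the number of scans.

```python
# Gyromagnetic ratios in 10^6 rad s^-1 T^-1 and natural abundances
# (standard tabulated values, not taken from the study).
GAMMA_1H = 267.522
GAMMA_13C = 67.283
ABUNDANCE_1H = 0.9999
ABUNDANCE_13C = 0.0107


def relative_receptivity(gamma_x, abundance_x,
                         gamma_ref=GAMMA_1H, abundance_ref=ABUNDANCE_1H):
    """Receptivity of a spin-1/2 nucleus relative to a spin-1/2 reference:
    cube of the gyromagnetic-ratio quotient times the abundance quotient."""
    return (gamma_x / gamma_ref) ** 3 * (abundance_x / abundance_ref)


def time_saving_from_pooling(mass_factor):
    """Factor by which acquisition time drops at equal S/N when the sample
    mass grows by mass_factor (assumes signal ~ mass and S/N ~ sqrt(scans))."""
    return mass_factor ** 2


if __name__ == "__main__":
    print(f"13C vs 1H receptivity: {relative_receptivity(GAMMA_13C, ABUNDANCE_13C):.2e}")
    print(f"Time saving for 3x more material: {time_saving_from_pooling(3)}x")
```

Under these assumptions, tripling the amount of DOM in the tube shortens the acquisition time needed for a given S/N by roughly a factor of nine.
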
We used inverse-gated 1H decoupling for 13C NMR spectra to eliminate nuclear Overhauser effects and (acquisition-time-adjusted) linear combinations of the 13C DEPT-45, DEPT-135 and DEPT-90 NMR spectra (1JCH: 150 Hz) to compute the individual traces of CH (13C DEPT-90 NMR spectrum), CH2 (13C DEPT-45 minus 13C DEPT-135) and CH3 ((13C DEPT-45 plus 13C DEPT-135) minus 13C DEPT-90). We corrected the 13C DEPT-90 NMR spectrum by subtracting an appropriate amount (commonly about 2–3%) of the 13C DEPT-45 NMR spectrum to attenuate leakage of CH3 and CH2 into the 13C DEPT-90 NMR spectrum (methine carbon (CH) in DOM does not show appreciable 13C NMR resonances at δC < 20 ppm) that arises from the unavoidable variance in 1JCH of DOM. Then we determined the relative contributions of the individual spectra (CH3, CH2, CH1) to the sum CH123 as observed in 13C DEPT-45 NMR spectra with recognition of the individual transfer amplitudes, which were as follows (CH3 = 1.06; CH2 = 1.0; CH = 0.707)83,84. The proportions of quaternary carbon atoms Cq in DOM were computed from comparison of 13C DEPT-45, 13C QUAT and single-pulse 13C NMR spectra.
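
The spectral editing described above can be sketched with idealized DEPT responses. This is a minimal illustration under stated assumptions: the scaling coefficients (1/2 and √2) follow from ideal DEPT theory for a uniform 1JCH, whereas the text specifies only acquisition-time-adjusted linear combinations; the function names and the synthetic three-point "spectra" are hypothetical. The `leak` parameter mirrors the DEPT-45 fraction subtracted from DEPT-90 and is set to zero for ideal data.

```python
import math


def dept_response(theta_deg):
    """Ideal DEPT transfer amplitudes for a final 1H pulse angle theta."""
    t = math.radians(theta_deg)
    return {
        "CH": math.sin(t),
        "CH2": 2 * math.sin(t) * math.cos(t),
        "CH3": 3 * math.sin(t) * math.cos(t) ** 2,
    }


def simulate_dept(true, theta_deg):
    """Build a DEPT spectrum (list of intensities) from true CH/CH2/CH3 subspectra."""
    r = dept_response(theta_deg)
    n = len(true["CH"])
    return [sum(r[k] * true[k][i] for k in r) for i in range(n)]


def edit_dept(d45, d90, d135, leak=0.0):
    """Recover CH, CH2 and CH3 subspectra by linear combination.
    leak: fraction of DEPT-45 subtracted from DEPT-90 (about 0.02-0.03 is
    quoted in the text for real DOM data; 0 for ideal data)."""
    ch = [b - leak * a for a, b in zip(d45, d90)]
    ch2 = [(a - c) / 2 for a, c in zip(d45, d135)]
    scale = 2 * dept_response(45)["CH3"]  # 2 x 1.06
    ch3 = [(a + c - math.sqrt(2) * b) / scale for a, b, c in zip(d45, ch, d135)]
    return ch, ch2, ch3


def carbon_proportions(ch, ch2, ch3):
    """Percent contribution of each multiplicity to the protonated carbon CH123."""
    areas = {"CH": sum(ch), "CH2": sum(ch2), "CH3": sum(ch3)}
    total = sum(areas.values())
    return {k: 100.0 * v / total for k, v in areas.items()}


if __name__ == "__main__":
    true = {"CH": [1.0, 0.0, 2.0], "CH2": [0.0, 3.0, 0.0], "CH3": [0.5, 0.0, 0.0]}
    d45, d90, d135 = (simulate_dept(true, a) for a in (45, 90, 135))
    print(carbon_proportions(*edit_dept(d45, d90, d135)))
```

At θ = 45° the ideal amplitudes evaluate to 0.707 (CH), 1.0 (CH2) and 1.06 (CH3), matching the transfer amplitudes listed in the text.
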

13C NMR section integrals and overlay figures were computed using the Bruker AMIX software (version 3.9.4) from area-normalized spectra with 0.1-ppm buckets and 100% total NMR integral area from δC = 0–235 ppm, with exclusion of 13CD3OD, δ13C = 47–51 ppm. We used bucketed 13C NMR section integral values with 1-ppm bandwidth from δC = 0–235 ppm for Call and Cq, a bandwidth from δC = 0–200 ppm for CH, a bandwidth from δC = 0–100 ppm for CH2 and a bandwidth from δC = 0–70 ppm for CH3 carbon units, and we set all negative values to zero. By these means, we avoided that baseline drift would influence CH123 values at values of δC for which no actual 13C NMR resonance integral was expected.
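
The bucketing and area normalization can be sketched as follows. This is not the AMIX implementation: the function names are hypothetical, the input is assumed to be paired lists of chemical shifts and intensities, and clipping negative buckets before normalization is an assumption about the order of operations.

```python
def bucket_integrals(shifts, intensities, lo=0.0, hi=235.0, width=1.0,
                     exclude=(47.0, 51.0), clip_negative=True):
    """Sum intensities into fixed-width chemical-shift buckets, skip the
    solvent window, optionally zero negative buckets, normalize to 100%."""
    n = int(round((hi - lo) / width))
    buckets = [0.0] * n
    for x, y in zip(shifts, intensities):
        if not (lo <= x < hi):
            continue  # outside the integration window
        if exclude[0] <= x <= exclude[1]:
            continue  # solvent region (13CD3OD)
        index = min(int((x - lo) / width), n - 1)
        buckets[index] += y
    if clip_negative:
        buckets = [max(b, 0.0) for b in buckets]
    total = sum(buckets)
    if total == 0:
        raise ValueError("no positive signal inside the integration window")
    return [100.0 * b / total for b in buckets]


def section_integral(buckets, start, stop, lo=0.0, width=1.0):
    """Percent area between two chemical shifts (bucket edges, start < stop)."""
    i0 = int(round((start - lo) / width))
    i1 = int(round((stop - lo) / width))
    return sum(buckets[i0:i1])


if __name__ == "__main__":
    shifts = [10.2, 10.7, 49.0, 120.5, 170.3, 240.0]
    values = [1.0, 1.0, 50.0, 2.0, -0.5, 9.0]
    b = bucket_integrals(shifts, values)
    print(section_integral(b, 0, 60), section_integral(b, 60, 165))
```

In this toy input the solvent point at 49 ppm and the point beyond 235 ppm are ignored, and the negative bucket near 170 ppm is set to zero before normalization.
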

The content of polyphenols in 13C NMR spectra33 (Fig. 1c) was computed as the sum of CarO (80%), Car,q (60%), CarH (30%) and ipso-Car,q (80% of integral; see Table 1). See Extended Data Table 2 for further acquisition parameters.
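
The weighted sum for the polyphenol content can be written out explicitly. In this sketch the weights are the percentages quoted above, read here as the fraction of each section integral attributed to polyphenols; the dictionary keys, the function name and the example integrals are hypothetical placeholders, not values from Table 1.

```python
# Fraction of each 13C NMR section integral attributed to polyphenols,
# as quoted in the text; the key names are hypothetical labels.
POLYPHENOL_WEIGHTS = {
    "CarO": 0.80,
    "Car_q": 0.60,
    "CarH": 0.30,
    "ipso_Car_q": 0.80,
}


def polyphenol_content(section_integrals, weights=POLYPHENOL_WEIGHTS):
    """Weighted sum of section integrals (in % of total carbon)."""
    missing = set(weights) - set(section_integrals)
    if missing:
        raise KeyError(f"missing section integrals: {sorted(missing)}")
    return sum(weights[k] * section_integrals[k] for k in weights)


if __name__ == "__main__":
    # Placeholder integrals in % of Call, chosen only to show the arithmetic.
    example = {"CarO": 10.0, "Car_q": 10.0, "CarH": 10.0, "ipso_Car_q": 10.0}
    print(polyphenol_content(example))
```
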

H/C and O/C elemental ratios were computed according to Hertkorn et al.32 and Fig. 19 in ref. 27.