Introduction

Why study maps?

Maps have played and continue to play a predominant role in modern Western society. The development of cartographic technology spans thousands of years, and a considerable evolution separates Ptolemy’s Cosmography from da Vinci’s Map of Imola, and both from contemporary geographic information systems. The map is a strategic administrative and military device, and a crucial asset for exploration and trade (Hofmann and Tesson, 2012). It is a historical marker that has evolved at the heart of technological innovations in strategic fields such as astronomy, geometry, printing, geodetic measurement systems, and optics, but also chemistry (e.g. synthetic inks), aerospace technology, and computer and information science. Many of these innovations directly impact the physical cartographic object or the representation. The art of cartography has also reflected aesthetic considerations throughout the ages.

The map and the landscape are intertwined through various apprehension mechanisms. In the very first degree, maps usually aim to represent a geographical space, in a regulated manner. In this perspective, historical cartography is often used as a source for spatial and environmental history (Liu et al., 2022; Mazier et al., 2015). However, maps are not mere projections of the territory, although they have a similar structure (Korzybski, 1994: 58). In the 1980s, the perspective that the map is also a socially constructed discourse emerged (Harley, 1989). Influenced by the thought of Foucault, Harley sees the map as a power-knowledge practice. This idea is still very popular today (Crampton, 2001; Crampton and Elden, 2007), to the point that some scholars consider that sovereign states as we know them could not have emerged without cartography (Branch, 2014; Schulten, 2012).

The map has also been used as a planning device. As such, it is not only a record of planned interventions, but also entails reflection and theory (Winther, 2020). Seeing a city on a map has contributed to the formation of aesthetic and functional schools of thought, such as hygienism (Geels, 2006). The map also directly influences readers’ use of their environment, through the identification of landmarks (Keil et al., 2020) and orientation. Finally, as an intentional representation rather than a literal projection, the map reflects the cultural apprehension of the environment through a series of figurative choices (Bertin, 1967), such as highlighting certain monuments and political boundaries, or graphically differentiating particular agricultural surfaces, such as vineyards. Maps can have a profound imaginary or evocative power; some make use of allegories, e.g., to depict the perceived impetuosity of the sea or the rustic charm of the wilderness.

These multiple and complementary perspectives make maps a unique and transversal object of study. Archives and libraries have digitized millions of maps, many of which are available in open access online. Yet most works on the history of cartography continue to focus on a few eminent maps or great cartographic series. The spirit of the Annales school and the Nouvelle Histoire is struggling to penetrate the field. Through the development of new computational methods, we hope to make it possible to excavate and study the myriad maps that make up the cultural fabric between the famous instances. The aim of this research is to propose a methodology to identify characteristic formal features in a corpus of maps. The digitization of cartographic collections paves the way for quantitative and corpus approaches. Machine reading techniques have indeed emerged that vectorize historical maps automatically and use them for the diachronic study of the territory (Ares Oliveira et al., 2019; Heitzler and Hurni, 2020; Uhl et al., 2022). The possibility of studying large map corpora, or of developing a ‘distant reading’ (Moretti, 2013) or rather a ‘distant viewing’ (Arnold and Tilton, 2019) of cartography, however, remains largely unexplored. This article aims to address this gap by developing a methodology for cartographic stylometry. In this context, stylometry refers to the study and analysis of the distinctive visual appearance (or style) of maps. It includes aspects, such as line work or texture, that differentiate one map or series of maps from another.

Maps as visual language systems

Corpus linguistics has led to important advances in understanding how languages change over time (Davies, 2012; Hardie and McEnery, 2011; Hilpert and Gries, 2016). Expanding this approach to visual language systems could significantly contribute to understanding human expression in a larger sense. In this respect, we believe that cartography is a particularly promising case study. “The map is a written language” (Goode, 1927: 1). More precisely, mapmaking practices could be considered transforms of natural languages (Jakobson, 1973: 29). In a distributionalist view, maps could then be decomposed into practical analysis units, or etic units. In this context, visual computing could help to operationalize pertinent graphical features and capture the systemic organization of these map units. In fact, decomposing the map into smaller tiles is already effectively used by engineers and geographers, for instance to extract and visualize semantic classes (Hosseini et al., 2021, 2022; Petitpierre et al., 2021; Uhl et al., 2018, 2020). Hiippala and Bateman (2022) use comparable fragments, or visual tokens, to deploy a distant viewing approach and investigate semiotic modes in diagrams.

More qualitatively, some researchers have attempted to catalog the icons and symbols used in certain maps, such as François de Dainville in his Langage des géographes (Dainville et al., 2018). Such elements would be comparable to emic units, or cultural elementary units. However, the resulting listing remains rather limited and subjective, and the approach is hardly scalable. If we are to develop a scholarship of cartographic stylometry, we deem it preferable to already ground it in core theoretical concepts of semiotics, such as the notion of pertinence. Even if a digital methodology on map semiosis is not yet proposed in the scope of this article, we consider it a worthy and natural next step (Casti, 2005).

The system emerging from contrasts

In this context, “pertinence refers to the feature of distinctiveness of structures within a system” (Nöth, 1995: 300). In this perspective, a difference in form, at the lower etic level, can translate into a systemic distinction if a difference is also observed in a higher cultural plane. This is also the intuitive idea underlying contrastive representation learning. Contrastive representation learning is a machine learning strategy used to establish a feature representation on the basis of a set of contrasted sample pairs. Concretely, an unannotated set of images could be used to train a neural network, based on simple logical pairs, such as \(\mathrm{Image}_{A}\cong \mathrm{Image}_{B}\) (e.g. they are replicas), and \(\mathrm{Image}_{A}\ncong \mathrm{Image}_{C}\). The representation is then optimized to maximize the distance between negative pairs and minimize it between positive ones.
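As a minimal illustration of this general principle (and not of the method developed in this article), the following sketch computes a classic margin-based contrastive loss on toy embeddings. The function name, the margin value, and the toy vectors are assumptions introduced for the example only.

```python
import numpy as np

def contrastive_loss(z_a, z_b, is_positive, margin=1.0):
    """Margin-based contrastive loss for a single pair of embeddings.

    Positive pairs are penalized by their squared distance; negative
    pairs are penalized only while they are closer than `margin`.
    """
    dist = np.linalg.norm(np.asarray(z_a, float) - np.asarray(z_b, float))
    if is_positive:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

# Toy embeddings (illustrative values only).
image_a = [0.0, 0.0]
image_b = [0.1, 0.0]  # treated as a replica of A: positive pair
image_c = [0.3, 0.0]  # treated as unrelated to A: negative pair

loss_pos = contrastive_loss(image_a, image_b, is_positive=True)
loss_neg = contrastive_loss(image_a, image_c, is_positive=False)
```

Minimizing such a loss over many pairs pulls positive pairs together and pushes negative pairs apart until they are at least `margin` away, which is the behavior described in the paragraph above.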

Contrastive representation learning is, therefore, an engaging solution for learning feature representation when one can define a set of opposed samples (e.g. cat vs dog). When it comes to representation, however, it is often uncomfortable to prescribe opposition, in part because of the non-trivial issue of defining kinship or replication (Eco, 1992). This has also been acknowledged as one of the main limitations of structuralist approaches. To overcome this obstacle, we propose a mathematical strategy that makes it possible to learn the feature representation of low-level graphical cartographic units, or ‘map elements’, which we will define in a few paragraphs, by contrasting higher-level structures, such as map ensembles or series.

One characteristic aspect of cartography is indeed the existence of series. Cartographic series form coherent editorial ensembles and can account for cultural differences in terms of style and production practices (tools, techniques, materials). In this regard, cartographic series constitute relevant ensembles, as they exhibit both internal coherence and external contrast. Moreover, contrasting higher-level structures (map series) limits the impact of the subject matter (e.g., geographic space) that could prevail if we were to directly contrast maps or parts of maps. With the joint contrastive representation learning approach we propose, however, the risk of occluding cultural factors is hopefully more limited, and the assumption that the presence of similar elementary units in two maps bespeaks stylistic similarity becomes tenable, as will be discussed in the next section.

Organization of the article

To summarize, in this article, we develop a methodology for cartographic stylometry. In particular, we seek to investigate the relationship between figurative proximity, on the one hand, and time and scale, on the other hand. We tackle several methodological challenges, such as: how to characterize cartographic figuration with computational methods? How to measure the figurative distance between a pair of maps? The results are then discussed in the framework of the historical writing on cartography. In the context of this paper, we use the term ‘figuration’ to refer to the graphical configuration of features on the map image rather than in a sense of representation or meaning-making.

The structure of the article is as follows. In the next section, we present the corpus, and the proposed approach, step by step. For reasons of clarity of the narration, we favor the explanation of the methodological reasoning over the exposition of the technical implementation details. The technical details on the algorithms and method are duly presented in a separate section, at the very end of the paper, after the conclusion. In the ''Results and discussion'', we present and discuss the results of the stylometric approach, on a case study focusing on French and Swiss maps. In the ''General Discussion and Conclusion'', we open the discussion in a more general way and offer some concluding remarks.

Approach

Constitution of the dataset

The corpus we study consists of 10,046 digitized French and Swiss maps available in open access, from the collections of 32 different institutions, the main ones being the Bibliothèque nationale de France (4,817) and the University Library of Zurich (1,452). The selection criteria are focused on the study period (1600–1950), the target scale (1:200 to 1:53,500), and the geographical coverage. The corpus is restricted to European France (75%) and Switzerland (25%). To avoid confusion on the cultural context, maps of France present in Swiss collections are excluded, and vice versa. Apart from these, all digitized cartographic documents made available online by any of the 32 institutions at the time of data collection are considered. When the scale is not indicated in the record, but can be inferred from the map itself, an estimate is made by a human expert. Moreover, for large map series (e.g. cadasters, national topographic maps, atlases), a sampling of 2% to 5% of the corpus (from smaller to larger scale maps) is performed, with a minimum sample of 5 maps per series. According to the metadata, the resulting corpus comprises more than 1,800 distinct publishers and 2,700 creators. The complete table of maps and associated metadata is published along with this article; descriptive statistics are shown in Fig. 1.

Fig. 1: Descriptive statistics on the final corpus by coverage, publication year, and map scale.

The maps are geocoded using coordinates harvested from the metadata, when available, or by recognizing the geographic named entities present in the title or description of the map, using natural language processing tools (spaCy: Industrial-strength NLP, 2022). The Nominatim API (Nominatim, 2022) is used to geocode the entities (i.e., to obtain geographic coordinates from placenames); the results are manually checked by a human expert.

Within this corpus, we identify 11 coherent map series, i.e., sets of maps that constitute an editorial ensemble and stem from the same production context. These 11 map series should exhibit a variety of scale, time, topic, and geographic subject, as they will be foundational to the operationalization of cartographic figuration. For this purpose, we select three series of cadastral maps of Lausanne: the Ancien Régime cadaster by Melotte (1721–27), the ‘Napoleonic’ cadaster by Berney (1827–31), and the renovated cadaster by Deluz (1880–86), as well as three series of Napoleonic cadastral maps of the Rhône (1808–68), Côtes d’Armor (1800–66), and Haute-Marne (1800–50) departments. We also include two series of Parisian city atlases, by Jacoubet (1825–40) and Alphand (1860–89). Additionally, we select three topographic series: the 1:40,000 maps of the Etat Major (1831–65), the 1:50,000 national maps of the Institut Géographique National (IGN) (here 1926–63), and the Swiss Federal Topographical Surveys (Swisstopo), starting with the Siegfried maps (1870–1949). These 11 series represent 1,063 maps, i.e. around 10% of the corpus (details in Supplementary Material).

Fragmentation

Recent advances in natural language processing have profoundly challenged previous approaches in linguistics to modeling grammar and syntax. Pragmatically, the problem of operationalizing the syntax itself seems to be solved. One of the methodological paradigms that made this possible is the fragmentation of texts into tokens. The second is the statistical association of tokens in texts. In this context, tokens are not even words. They are units consisting of about 4 characters or circa 3/4 of a word. The process of fragmentation of the text into tokens makes it possible to create convenient units, that can be easily represented in a digital space, and manipulated with computational approaches. Interestingly –and one may say counter-intuitively– this fragmented representation does not alter the machine’s ability to learn the syntax and grammar of a language, quite the contrary.

Drawing inspiration from ‘tokenization’, our methodology is based on the ‘fragmentation’ of the digitized map image into tiles, or elementary map units (mapels). In our perspective, a ‘mapel’ is an etic unit, analogous to a ‘phone’ for instance (Footnote 1). Mapels are considered physical events, specific instances, cartographic utterances, and the realization of a mapping process. In terms of computer vision, when studying mapels, we study the graphical configuration of map extracts.

Concretely, 800 mapels are cut out automatically from each map in the corpus (Footnote 2). Each unit has a size of 50×50 pixels, which covers a minute space of the map (Footnote 3). Together, they constitute a summary of the map’s figuration. Non-geographic content of the map, as well as areas of the map that are devoid of graphical content, or exhibit a particularly low graphical load, are excluded (Footnote 4). Outside these areas, the initial location of each mapel is first drawn randomly, and the mapel is then recentered on the most salient graphical feature (e.g. line, point, icon) close to the initial location. The orientation of the local graphical features is computed, and a rotation is applied to the mapel so that the graphical content of each unit has a neutral orientation. The computed orientation is stored separately, as an explicit variable attached to the mapel (Footnote 5).
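The random sampling and the exclusion of low-load areas can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for the article: the function name `extract_mapels`, the darkness threshold, and the minimum load `min_load` are hypothetical, and the recentering and rotation steps are deliberately left out. The image is assumed to be a grayscale array at least 50 pixels wide and high.

```python
import numpy as np

TILE = 50  # tile side in pixels, as stated in the text

def extract_mapels(image, n_tiles=800, min_load=0.02, max_tries=100_000, seed=0):
    """Cut square tiles from a grayscale image (values in [0, 1], 0 = ink).

    Tiles whose share of dark pixels is below `min_load` are rejected,
    as a crude stand-in for the exclusion of low graphical load areas.
    """
    rng = np.random.default_rng(seed)
    height, width = image.shape
    tiles = []
    for _ in range(max_tries):
        if len(tiles) == n_tiles:
            break
        # Draw the top-left corner of a candidate tile at random.
        top = rng.integers(0, height - TILE + 1)
        left = rng.integers(0, width - TILE + 1)
        tile = image[top:top + TILE, left:left + TILE]
        load = np.mean(tile < 0.5)  # fraction of dark (inked) pixels
        if load >= min_load:
            tiles.append(tile.copy())
    return np.stack(tiles) if tiles else np.empty((0, TILE, TILE))

# Synthetic 'map': a white page with dark grid lines every 40 pixels.
page = np.ones((400, 600))
page[::40, :] = 0.0
page[:, ::40] = 0.0
mapels = extract_mapels(page, n_tiles=20)
```

The `max_tries` cap keeps the loop finite on pages that contain too little graphical content to yield the requested number of tiles.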

The stylometric space of cartography

The prevalent methods for image embedding (i.e., the creation of a digital representation of graphical objects) are based on neural networks. However, these methods have inherent limitations: they produce opaque, non-interpretable features and operate like black boxes, which makes them incompatible with hermeneutic requirements. Instead, we employ a distinct method, starting with a set of 13 candidate features, based on the literature in digital cartography analysis and computer vision (Barvir and Vozenilek, 2020; Dalal and Triggs, 2005; Herold, 2015: 64–67; Ojala et al., 2002; Otsu, 1979; Scharr, 2000; Uhl et al., 2018). These features cover explicit and complementary aspects of cartographic representation: color (5 candidate features), morphology (1), texture (1), graphical load (3), line thickness (1), and orientation (2). In this study, morphology refers to the form of the visual elements in maps, i.e., the shape and the arrangement of graphical features. Color, as a perceptual variable, is a difficult concept to operationalize when it comes to digital images. It is commonly encoded as red-green-blue pixel values. As we will discuss, this digital encoding is reductive and impedes the use of color as a pertinent visual feature. Texture refers to the arrangement of pixels in patterns, such as hatchings, grids, or plain patterns. Graphical load measures the density of graphical content in the image. Finally, orientation is computed based on the local dominant orientation of edges.

Methodologically, it is indeed not desirable to impose the features of the problem, since studying these features might precisely be of interest for the study of cartography. This is why we set up an optimization process to select the most suitable candidates to build a meaningful feature space. To do so, we rely on the principle of distinction between map series, which are coherent sets of maps considered homogeneous cultural, figurative, and editorial ensembles. To construct the digital space of cartographic figuration, we rely on the idea that a conceptual space is constructed by the contrasts and oppositions between the concepts themselves. This is a foundational notion in semiotics. “A semiotic system functions adequately as long as its elements and structures remain sufficiently distinct or differentiated. The semiotic principle which guarantees this functional differentiation is called the principle of pertinence […], pertinence is the property by which a speech sound (a phone) is structurally opposed and thus distinct from another speech sound of the same system.” (Nöth, 1995: 183). In this research, we base the optimization of the feature space on the conceptually simple idea that the figuration of two maps that belong to the same series is more similar than when the maps belong to distinct map series.

Our research hypotheses postulate that, after optimization, the feature space will enable distinction between map series that represent the same geographical location (1), and those of similar scale (2). For instance, different series representing Paris at a similar scale (e.g. city atlases) should be distinguishable, as they exhibit stylistic differences. However, we expect the Napoleonic cadastral map series to be indistinguishable from one another due to their identical production process and representation conventions (Hennet, 1811). If these hypotheses hold true, the selected features will be demonstrated as independent from subject, and scale, thereby maximizing their potential to describe cultural and stylistic variables.

Each map is represented by a set of 800 elementary map units (mapels), with each unit tentatively represented by 13 candidate features. In an optimal feature space, mapels of a particular map are close to similar mapels found in maps from the same series. To measure this proximity, we introduce a free parameter k and assume that if the distance d between two mapels \({m}_{i}\) and \({m}_{j}\) is less than k, the two units can be considered close, or analogous (Eq. 1).

$$(d({m}_{i},{m}_{j}) \,< \,k)\Rightarrow ({m}_{i}\sim {m}_{j})$$
(1)

In this context, k can be defined as the radius of free variation. The variations in the features within this radius are so small that they do not disrupt the relation of analogy between two mapels.

Next, the stylometric distance D between a map A and a map B can be defined as the proportion of mapels of A that are analogous to at least one mapel of B, plus the proportion of mapels of B that are analogous to at least one mapel of A (Eq. 2; Footnote 6).

$$D(A,B)=\mathop{\sum}\limits_{{m}_{i}\in A}\left[\mathop{\min}\limits_{{m}_{j}\in B}d\left({m}_{i},{m}_{j}\right)\, <\, k\right]+\mathop{\sum}\limits_{{m}_{i}\in B}\left[\mathop{\min }\limits_{{m}_{j}\in A}d\left({m}_{i},{m}_{j}\right)\, < \,k\right]$$
(2)

Hence, when maps A and B belong to the same series, many mapels should be shared by both maps. The stylometric distance D represents the proportion of mapels common to both maps.
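Eqs. 1 and 2 can be illustrated with a short sketch. It assumes a plain Euclidean distance between feature vectors and normalizes each term by the number of mapels, which is one reading of the word ‘proportion’ in the text; the function names and the toy coordinates are hypothetical.

```python
import numpy as np

def shared_fraction(map_a, map_b, k):
    """Fraction of mapels of `map_a` within distance k of some mapel of `map_b`."""
    # Pairwise Euclidean distances, shape (len(map_a), len(map_b)).
    diff = map_a[:, None, :] - map_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A mapel is 'analogous' (Eq. 1) if its nearest neighbor is closer than k.
    return float(np.mean(dist.min(axis=1) < k))

def stylometric_d(map_a, map_b, k):
    """Symmetric sum of the two shared fractions, following the wording of Eq. 2."""
    return shared_fraction(map_a, map_b, k) + shared_fraction(map_b, map_a, k)

# Toy maps described by 2 features per mapel (illustrative values only).
map_a = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
map_b = np.array([[0.0, 0.1], [1.0, 0.1], [9.0, 9.0], [10.0, 10.0]])
d_ab = stylometric_d(map_a, map_b, k=0.5)
```

Under this reading, the quantity ranges from 0 (no mapel of either map has an analogue in the other) to 2 (every mapel has one), and it is symmetric in A and B.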

Furthermore, we make use of information on the map series to which individual maps belong to identify an optimal feature space. This problem is the inverse of a clustering problem: the clusters (map series) are known, and we are trying to find the feature space that best matches this data and allows us to distinguish the clusters. Translated into mathematical terms, the optimization problem requires maximizing the median distances \({\mu }_{D}{({S}_{a},{S}_{b})}_{a\ne b}\) between maps of different series \(S\), and minimizing the median distances \({\mu }_{D}{({S}_{a},{S}_{b})}_{a=b}\) between maps of the same series. To keep the space structured and homogeneous, the optimization is constrained by minimizing the deviations of distances within the same series, \({\sigma }_{D}{({S}_{a},{S}_{b})}_{a=b}\), as well as the deviations of distances between two different series, \({\sigma }_{D}{({S}_{a},{S}_{b})}_{a\ne b}\). The aim is to avoid the creation of a sparse, inhomogeneous space, or the undue influence of outlier maps. With the mean denoted \(\bar{\cdot }\), the objective reads (Eq. 3):

$${\max }\frac{\overline{{\mu }_{D}{({S}_{a},{S}_{b})}_{a\ne b}}-\overline{{\mu }_{D}{({S}_{a},{S}_{b})}_{a=b}}}{\overline{{\sigma }_{D}{({S}_{a},{S}_{b})}_{a\ne b}}+\overline{{\sigma }_{D}{({S}_{a},{S}_{b})}_{a=b}}}$$
(3)
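A possible way to evaluate the objective of Eq. 3 from a precomputed matrix of pairwise map distances is sketched below. The function name `separation_score`, the use of the standard deviation for the deviation term, and the toy matrix are assumptions made for illustration; every series is assumed to contain at least two maps.

```python
import numpy as np

def separation_score(D, labels):
    """Ratio of Eq. 3 for a pairwise distance matrix D and series labels."""
    labels = np.asarray(labels)
    series = np.unique(labels)
    med = {"within": [], "between": []}
    dev = {"within": [], "between": []}
    for i, a in enumerate(series):
        for b in series[i:]:
            block = D[np.ix_(labels == a, labels == b)]
            if a == b:
                # Keep each unordered pair once and drop self-distances.
                values = block[np.triu_indices_from(block, k=1)]
                key = "within"
            else:
                values = block.ravel()
                key = "between"
            med[key].append(np.median(values))
            dev[key].append(np.std(values))
    numerator = np.mean(med["between"]) - np.mean(med["within"])
    denominator = np.mean(dev["between"]) + np.mean(dev["within"])
    return numerator / denominator

# Toy example: four maps, two series of two maps each.
D_toy = np.array([[0.0, 1.0, 5.0, 6.0],
                  [1.0, 0.0, 7.0, 6.0],
                  [5.0, 7.0, 0.0, 3.0],
                  [6.0, 6.0, 3.0, 0.0]])
score = separation_score(D_toy, labels=[0, 0, 1, 1])
```

A larger score indicates that between-series distances exceed within-series distances by a wide margin relative to their spread.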

To solve this optimization problem, the feature space can be deformed, either by adding and removing some of the 13 candidate features, by modifying their relative weight, or their internal parameters. The exploration of these different combinations is performed using a genetic algorithm. Genetic algorithms are a powerful family of algorithms used to find the optimal combination of parameters in a problem. They mimic an evolutionary mechanism, with several iterations called ‘generations’, where the best performing parameters are kept from one generation to the next. At the end of the evolutionary optimization process, the best combination of parameters is retained. The radius of free variation k is dynamically optimized for each solution, using a line search.
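The evolutionary mechanism described above can be sketched with a deliberately small genetic algorithm. This sketch only evolves the 13 feature weights (a weight of 0 effectively removes a feature); it does not model the internal feature parameters or the line search on k. The population size, elite size, mutation scale, and the toy fitness function are assumptions; in the setting of the article, the fitness would presumably be the score of Eq. 3.

```python
import numpy as np

def evolve_weights(fitness, n_features=13, pop_size=40, n_generations=60,
                   elite=8, mutation_scale=0.1, seed=0):
    """Maximize `fitness` over weight vectors in [0, 1]^n_features."""
    rng = np.random.default_rng(seed)
    population = rng.random((pop_size, n_features))
    for _ in range(n_generations):
        scores = np.array([fitness(w) for w in population])
        order = np.argsort(scores)[::-1]           # best first
        parents = population[order[:elite]]        # elite survives unchanged
        children = []
        while len(children) < pop_size - elite:
            mother, father = parents[rng.integers(elite, size=2)]
            mask = rng.random(n_features) < 0.5    # uniform crossover
            child = np.where(mask, mother, father)
            child = child + rng.normal(0.0, mutation_scale, n_features)
            children.append(np.clip(child, 0.0, 1.0))
        population = np.vstack([parents, children])
    scores = np.array([fitness(w) for w in population])
    return population[np.argmax(scores)], scores.max()

# Toy fitness: closeness to an arbitrary target weight vector.
target = np.linspace(0.0, 1.0, 13)

def toy_fitness(weights):
    return -np.sum((weights - target) ** 2)

best_weights, best_score = evolve_weights(toy_fitness)
```

Because the elite is carried over unchanged, the best score can never decrease from one generation to the next.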

At this stage, the feature space enables us to efficiently contrast the cartographic representation of the various series. We will discuss this aspect in more depth in the section ''Exploration of the stylometric map space'' of the results, but we can already indicate that the mathematical optimization allows us to select 6 features among the 13 candidates, and to assign them a weight between 0.05 and 1 (details in Table 1). These features thus compose a 6-dimensional space, where some dimensions have a greater impact on the distance between two mapels. In this space, the distance d between two mapels is indicative of the figurative proximity between them. Moreover, the distance is no longer derived from an arbitrary set of graphical metrics, but directly from the notion of distinction between map series. Mapels that are separated by a distance smaller than k, the radius of free variation, are analogous (Eq. 1).

Table 1 Candidate features, and their resulting weight after optimization.

Interpretation of the feature space

In this respect, mapels lying in the same hypersphere of radius k can be considered variants of the same canonical form. However, this canonical form does not constitute an emic unit (a “mapeme”) in the sense of linguistics. An “emic unit, such as a phoneme or morpheme, is an invariant form obtained from the reduction of a class of variant forms to a limited number of abstract units” (Nöth, 1995: 183). The distinction between emic units should also reflect the possibility to alter meaning. In the present case, this second necessary condition is not demonstrated. In this regard, the nature of the relationship between mapels and their canonical form is rather that of a token to its type. Therefore, we prefer to call these canonical units ‘mapotypes’. A mapotype, in this context, is the archetype obtained from the reduction of a class of variant cartographic utterances. Although we can intuit that mapotypes are cultural units, because of the contrastive process they stem from, what distinguishes them from emic units seems to be the scale at which differentiation operates. While emic units are determined by their direct functional role, such as their capacity to alter meaning, mapotypes are determined by their joint ability to reflect cultural contrasts. In this context, for instance, mapotypes make it possible to operationalize contrasts between cartographic series.

We use the concept of mapotype to construct meaningful and practical observation units for the study of the evolution of cartographic figuration. To do so, we iteratively subdivide the feature space until we obtain cells of radius r < k. These cells represent our functional units of analysis, or mapotypes.
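One simple way to obtain such cells is sketched below. The text does not specify the subdivision scheme here, so the median split along the widest dimension, the definition of the radius as the largest distance to the cell centroid, and the function name `subdivide` are all assumptions chosen for illustration.

```python
import numpy as np

def subdivide(points, k):
    """Split `points` recursively until every cell has radius < k.

    Returns a list of index arrays, one per cell.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    cells, stack = [], [np.arange(len(points))]
    while stack:
        idx = stack.pop()
        cell = points[idx]
        # Radius: largest distance from a member to the cell centroid.
        radius = np.linalg.norm(cell - cell.mean(axis=0), axis=1).max()
        if radius < k:
            cells.append(idx)
            continue
        axis = np.ptp(cell, axis=0).argmax()       # widest dimension
        order = idx[np.argsort(cell[:, axis], kind="stable")]
        half = len(order) // 2                     # median split
        stack.extend([order[:half], order[half:]])
    return cells

# Toy feature space: two tight groups of points far apart.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.05, (30, 2)),
                    rng.normal(10.0, 0.05, (30, 2))])
cells = subdivide(points, k=1.0)
```

A cell that still needs splitting always holds at least two points, so both halves are non-empty and the procedure terminates.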

Analysis

For visualization purposes, the mapel closest to the geometric center of the mapotype is used to represent the latter in a geometric space. To facilitate visualization in 2D, we reproject the mapotypes using t-distributed stochastic neighbor embedding (t-SNE; Maaten and Hinton, 2008), a nonlinear dimensionality reduction method that preserves neighborhood relations when possible. The projected mapotypes are then constrained on a grid (Klingemann, 2020). This visualization, which we call the ‘mapotypic mosaic’, constitutes the basis for most of the visual-analytical results presented in Section ''Results and discussion''. Our approach to the analysis of the results is inspired by a distributionalist-variationist perspective. We operate a uniform stratification (temporal, scalar) of the corpus and observe the relative distribution of mapotypes by stratum. Fig. 2 summarizes the main methods deployed for the analysis of the results, as well as the steps of the methodology described in this section.

Fig. 2: Flow-chart summarizing the method.

Results and discussion

Proof of concept

As discussed in Section ''The stylometric space of cartography'', the optimization process has the objective of maximizing contrasts, and thus distance, between cartographic series. This process yields an optimum of 3.59 for Eq. 3, and the resulting distances are presented in Fig. 3. The stylometric distance clearly differentiates between maps belonging to the same series, on the diagonal axis, and those belonging to distinct series. The only exception to this trend is observed for the Napoleonic cadasters, and this result is in line with our research hypotheses (see Sections ''The stylometric space of cartography'' and ''Interpretation of the feature space''). Moreover, maps representing the same geographical subject (Paris, Lausanne) but belonging to different series are correctly distinguished. The metric is also robust across scales and map types, from cadastral plans to national topographic maps and city atlases.

Fig. 3: Matrix of average stylometric proximity between the maps in the selected cartographic series.

Until now, the notions of elementary map unit (mapel) and mapotype may have seemed rather theoretical and abstract. We have stated that replication is difficult to circumscribe when it comes to representation, as each instance is somehow unique. Fig. 4 shows what mapels look like, and how much variation is found within a mapotype. As will be discussed in the next section, color itself is not a feature of the stylometric map space. Apart from that, we note that mapotypes generally capture resemblance, although a certain amount of free variation remains. This seems particularly true for the most complex and cluttered mapels. Overall, as a complement to the statistical validation offered by Fig. 3, this outcome is sufficiently consistent to validate a distributional approach.

Fig. 4: Qualitative evaluation of an exemplar sample of mapotypes.

Each row corresponds to a mapotype, and each tile corresponds to a mapel within that mapotype. For each mapotype, a sample of mapels is drawn at random from the corresponding year range subset. When no instance of the mapotype is present for a given year range, the space is left blank.

Exploration of the stylometric map space

The features selected to construct the stylometric map space, after optimization, are listed, with their relative weight and possible interpretation, in Table 1. In general, morphology seems to be the most pertinent feature (weight = 1.0), far ahead of features related to graphical load (0.25, 0.15), line thickness (0.25), or texture (0.1). The importance of the orientation seems minimal (0.05), as expected. No feature related to the distribution of colors was retained.

In total, our methodology yields 55,599 mapotypes. Only 0.1% have fewer than 5 occurrences in the corpus, and more than half of them (53%) have more than 100 occurrences and account for 85% of the mapels in the corpus (see Supplementary material). An overview of all the mapotypes observed is presented in the mapotypic mosaic (Fig. 5). Note that this space is not globally continuous. The t-SNE algorithm used for visualization tends to preserve neighborhood relationships at the expense of the linearity of the dimensional reduction. Overall, we can notice from Fig. 5 that the method is highly sensitive to different kinds of lines (contrasted, dark, thick, colored, interrupted, or continuous, e.g. A3, A4, B4), textures (hatchings, dotted patterns, grids, e.g. A1, B3), as well as other graphical elements (texts, colored gradients, e.g. C4).

Fig. 5: Mapotypic mosaic.

Constrained t-SNE visualization of all the mapotypes observed in the corpus. The graticule (A1–D4) is used in the text as a coordinate system to refer to specific zones of the image. The image enlargements in full resolution are available online (https://github.com/RPetitpierre/cartographic-stylometry/tree/main/Figure_5). An enlargement of each image zone (A1–D4) is also provided in the supplementary material.

The fact that no descriptive feature of color distribution is retained suggests that, contrary to our intuition, color might not be a very strong explanatory cue to operationalize stylometric proximity. On the one hand, this intriguing result may indicate that color is not specific enough. Indeed, similar shades can be found in various unrelated maps and contexts, which is also true for some texture types. On the other hand, the identity of color is only a mental concept derived from the Gestalt principle of similarity. Recent psychophysical and cognitive research suggests that color saliency, contrasts, as well as high-level knowledge might be more defining factors of perceived identity or constancy of color (Maule et al., 2023; Tajima and Komine, 2015). Especially for hand-colored maps, the physical color as measured by the camera can be quite different from one area of the map to another, or from one map to another, within the same series. This is true even when color is perceived as identical, hence the difficulty of applying color-based segmentation methods to cartographic documents, as experienced in previous work (Leyk and Boesch, 2009; Liu et al., 2016). However, despite the absence of features specifically encoding color, it appears that the feature space differentiates between colored and uncolored features (e.g. Fig. 5, C4). This is interesting because it could indicate that, more than the shade itself, it is the fact of being colored or not that is decisive in the characterization of the cartographic representation.

Due to the optimization methodology used to select features, the occupation of the space itself can be interpreted as a marker for determining which graphical elements are most important for the analysis of figuration. Thus, the fact that a large proportion of the space is simply occupied by different kinds of lines indicates that these minute differences in linework may play an important role in characterizing cartographic representation, which is consistent with the literature. For instance, line drawing is one of the main cues that allow experts to distinguish different techniques (e.g. engraving, etching, lithography) or tools used to create the map (Gascoigne, 1986).

Cartography in the Conjuncture

The previous results demonstrated the potential of our approach to distinguish between map series and provided a first insight into mapotypes. We now seek to use them as tools to dive into the evolution of cartography over periods of a few decades each, in a medium-term timeframe close to what Fernand Braudel referred to as the conjuncture (Braudel, 1996; Santamaria and Bailey, 1984). Fig. 6 is based on the projection used for creating the mapotypic mosaic (Fig. 5) and makes it possible to visualize the relative distribution of each mapotype within time- and scale-based strata. Table 2 presents the W statistic (n = 55,599) of the Shapiro-Wilk normality test (henceforth called SW test), as adapted by Royston (1995), applied to the relative distribution of mapotypes per time and scale stratum. The SW test is considered the most powerful normality test (Razali and Wah, 2011). Lower W statistics indicate the occurrence of mapotypes with extremely low or high frequencies (i.e., departure from normality in the relative frequency distribution), thus a more characteristic figuration in a particular stratum. Put simply, a lower value indicates that the distribution for this stratum is not uniform and thus some types are clearly over- or under-represented.
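To make the reading of the W statistic more concrete, the following minimal sketch computes a normality statistic on the relative frequencies of mapotypes within a stratum. It is an illustration under stated assumptions, not the procedure used in this study: for brevity, it implements the simpler Shapiro-Francia approximation W′ (the squared correlation between the ordered values and approximate normal scores) instead of Royston's adaptation of the SW test, and the function name and the two toy frequency vectors are hypothetical.

```python
from statistics import NormalDist, fmean


def shapiro_francia_w(values):
    """Shapiro-Francia W' statistic, used here as a simplified stand-in
    (assumption) for the Shapiro-Wilk W discussed in the text.

    W' is the squared correlation between the sorted sample and Blom's
    approximation of the expected normal order statistics.
    Values close to 1 are compatible with normality.
    """
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least 3 values are required")
    normal = NormalDist()
    # Blom scores: approximate expected order statistics of a standard normal.
    m = [normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_x = fmean(x)
    numerator = sum(mi * xi for mi, xi in zip(m, x)) ** 2
    denominator = sum(mi * mi for mi in m) * sum((xi - mean_x) ** 2 for xi in x)
    return numerator / denominator


# Hypothetical relative frequencies of 20 mapotypes in two strata.
# "diffuse": frequencies spread in a bell shape around 1/20.
diffuse = [0.018, 0.026, 0.031, 0.035, 0.038, 0.041, 0.043, 0.045, 0.047, 0.049,
           0.051, 0.053, 0.055, 0.057, 0.059, 0.062, 0.065, 0.069, 0.074, 0.082]
# "peaked": a single mapotype dominates the stratum.
peaked = [0.81] + [0.01] * 19

w_diffuse = shapiro_francia_w(diffuse)
w_peaked = shapiro_francia_w(peaked)
print(f"diffuse stratum: W' = {w_diffuse:.3f}")
print(f"peaked stratum:  W' = {w_peaked:.3f}")
```

In this toy case, the stratum dominated by a single mapotype yields a markedly lower statistic than the diffuse one, which is the behavior the text relies on when interpreting low W values as a sign of characteristic figuration.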

Fig. 6: Evolution of the relative distribution of the mapotypes through time, and by scale.

The upper part of the figure represents the relative distribution of types, by time strata. The lower part of the figure represents the same distribution, by scale strata. Each block depicts the density map for a specific year or scale stratum, defined by uniform stratification (1/9th, 1/6th). The corresponding stratum is indicated on the left of each block row. The brighter areas correspond to a relatively high representation of the corresponding figurative subspace. The graticule in the lower-right corner can be used to make the connection between each block and Fig. 5.
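As an illustration of what a uniform stratification could look like in practice, the sketch below splits a list of map years into strata holding approximately the same number of items. This reading of 'uniform' as equal-count strata is an assumption on our part, and the helper names `equal_count_edges` and `assign_stratum` as well as the toy data are hypothetical.

```python
from bisect import bisect_right


def equal_count_edges(values, k):
    """Return k - 1 cut points so that each of the k strata holds
    roughly len(values) / k items (equal-count stratification)."""
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values and k >= 1")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def assign_stratum(value, edges):
    """Index (0-based) of the stratum a value falls into."""
    return bisect_right(edges, value)


# Toy example: 90 hypothetical map years split into 9 time strata.
years = list(range(1600, 1690))
edges = equal_count_edges(years, 9)
counts = [0] * 9
for year in years:
    counts[assign_stratum(year, edges)] += 1
print(edges)
print(counts)
```

With real data, several maps share the same year, so strata can only be approximately equal in size; the cut points are then expressed as years, which would explain the irregular widths of the time strata.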

1600–1817: From geographic iconography to the symbolic map

We can note in Fig. 6 that cartography occupies a relatively stable stylometric subspace from 1600 to 1817. In the first time stratum (1605–1730), mapotypes in the lower right (B3–4, C4, D1–4) dominate. These brighter areas correspond to figurations strongly characteristic of this period. Gradually, from 1730 onwards, the occupation of the feature space becomes more diffuse, with fewer bright areas. This dilution of the characteristic idiosyncrasy is also observed statistically (Table 2). Indeed, the W statistic of the SW test increases for the time slices 1731–1770 and 1771–1817, implying a more normal distribution of the relative frequencies per stratum. The 17th century and the beginning of the 18th century are characterized by an over-representation of handcrafted shading. This corresponds for instance to irregular hatching and diamond-shaped grid textures observed in Fig. 7a, b. The lines are often relatively thick and dark. There were also hand-painted icons, representing trees for example.

Table 2 Statistic W of the Shapiro-Wilk normality test [0–1] applied to the relative distribution of mapotypes per time and scale stratum.
Fig. 7: Top 5% most characteristic mapotypes by time strata.

a 1605–1730; b 1731–1770; c 1771–1817; d 1818–1834; e 1835–1849; f 1850–1866; g 1867–1884; h 1885–1900; i 1901–1950.

From 1731 onwards, more abstract textures, such as colored hatching, became comparatively more frequent. They adopted light pastel tones. The turn of the century (1771–1817, Fig. 7c) was a continuation of this trend of abstraction. Some characteristic motifs of this period included the abstraction of wooded areas or orchards by small circles (°) and vineyards by crossed-out S ($). Hatching became more regular and was also used to represent the approximate relief.

The studied period starts after the popularization, during the 16th century, of copperplate engraving by Italian mapmakers (Woodward, 1996). Prior to this, the predominant technique was xylogravure (woodcut). A certain legacy of this period remained, such as the aesthetic of the thick, dark line. Deeply marked lines could extend the longevity of the copper plate. The pursuit of deep black pigments, obtained from wine-black or from white wine and beer, is also well documented in secondary sources (Woodward, 1996). In the 17th century, textures seemed to be used primarily for shading figurative elements (e.g. buildings). At this time, cartography still blended strongly with iconographic engraving and the adoption of a less abstract, oblique perspective was still very popular, as in the works of the Swiss cartographer Matthaeus Merian, for instance, in which buildings are stylized but clearly recognizable (Merian, 1638). This early ambiguity between iconography and cartography is one of the reasons why the 16th century was not included in the corpus. The period 1731–1817 truly materialized the transition to a more abstract cartographic representation. For example, the rather iconic representation of wooded areas and trees was replaced by a more abstract, symbolic one. The foliage of the trees became circles, the buildings or the colored furrows of the crops became hatching. The use of shading hatching was seemingly altered to represent topography and natural relief. This result can also be intuited from previous work, such as that by Dainville, although in that case the isolated examples and counter-examples made it difficult to ascertain the presence of a general trend (Dainville et al., 2018).

1818–1834: The boom of cadastral survey

The early 19th century transition is the most marked in the corpus. The characteristic areas are highlighted in Fig. 6, A2–3, which correspond to thin lines. The line, simple and uncluttered, became in fact the central element of the cartographic representation. We also notice in Fig. 7d the presence of colored borders, notably characteristic carmine pinks, but also shades of orange or blue. The distribution of mapotypes in this period also strongly coincides with their distribution among large-scale maps (1:1801–1:2500). The stratum 1818–1834 is also the period with the most characteristic figuration, according to the SW test (W = 0.81).

The period 1818–1834 was marked by the advent of the so-called ‘Napoleonic’ cadaster, of which a number of plates are found in the corpus. The production of the Napoleonic cadaster, which also extended to Switzerland, was a monumental undertaking that involved the surveying and recording of more than 100 million parcels in France. Established primarily for fiscal reasons, it also materialized the increasing influence of the administration (Clergeot, 2007). The cadastre was not only a powerful dispositif, reinforcing the control of authority over the population and the territory: in the context of post-revolutionary France and the First French Empire, it was also presented as a way to introduce an equal and fair taxation system. The first attempts to establish a cadastre for the new empire began in 1807, but they led to very uneven results. Therefore, in 1811, the ‘Recueil méthodique des lois, décrets, règlemens, instructions et décisions sur le cadastre de la France; approuvé par le Ministre des Finances’ was published (Hennet, 1811). The document was supposed to organize and regulate the cadastration of the territory by the surveyors, and establish the mechanisms for the control of the surveys by a verifying engineer. The document also defines the scale of the cadastral maps, as well as the rules of surveying, the methods for the calculation of triangulation, and for the recognition of owners. It also lays down the principles for the division of the commune into sections and the ‘tableau d’assemblage’. The Recueil is also the document that founds the concept of urban cadastre (Bourillon, 2007). The instructions related to the figuration occupy a modest part of the document. As an administrative document with a very limited edition, the cadaster was not engraved, but hand drawn. The characteristic lines of this period, traced with Indian ink, were used to delineate the parcels of land, as well as the administrative boundaries, effectively and precisely.
Furthermore, it was stipulated that “the limits of each section are marked by a border of a different color for each” (Hennet, 1811: 97), which is indeed observed in the results. Carmine pink was the official color for representing the built environment, but the law also regulated the use of green, red, blue, and ‘water green’ (vert d’eau). What some cartographers call by extension ‘Napoleonic cadastre’ is in reality a set of documents produced in various administrative contexts, whose realization began during the Napoleonic period but continued until the middle or the end of the 19th century, depending on the definition. It is not geographically circumscribed to France or the First French Empire, but also includes Belgium, the Netherlands, Switzerland, Spain, Northern Italy, and sometimes Austria-Hungary. These documents are characterized by a relative adherence to the rules described in the ‘Recueil méthodique’. In fact, the first instances of the figurative principles laid down in the Recueil precede the publication of the latter. For example, the Austrian cadastre of Venice, drawn up in 1808, is often considered to be one of the first Napoleonic cadastres (di Lenardo et al., 2021). It is also important to mention that other legal documents were sometimes established at a local level to regulate the surveying of the Napoleonic cadastre. For example, in the Swiss canton of Vaud, which includes the city of Lausanne, it is the ‘Instructions données par le Département des Finances pour la levée des plans et l'établissement du cadastre, ensuite de la décision du Conseil d’Etat du 6 décembre 1826’ (Soulier and Berdez, 1827). This document repeats almost point by point the ideas of the Recueil. However, there are some interesting variations. For example, it states that water should be represented in blue, not green.

The cadastration campaign in France was interrupted shortly after the publication of the Recueil, first by the economic crisis and instability caused by the Napoleonic Wars, and then by the fall of the Empire (Vivier, 2007). Under the Bourbon Restoration, the idea that the cadastre allowed for an egalitarian taxation throughout the territory was highly contested. The administration of the cadastre was reduced to a trickle, before being re-established by a regulation in 1827. It was not until this date, and even more so under the July Monarchy, that most of the territory was surveyed, which explains the chronology observed in our results. The highly regulated production context of the Napoleonic cadastre, in terms of drawing conventions, also explains why these are the least well differentiated series, according to the results presented in Fig. 3, and why the figuration of the maps in this time stratum is particularly characteristic.

Based on the quantitative methodology we describe in this article, Petitpierre (2023) explores how these legal instructions and the related graphical utterances were replicated in various local contexts, in France, but also in Switzerland, Belgium, and the Netherlands. According to their results, the visual consistency of the Napoleonic cadasters is at its maximum when observing the first generation of cadasters, established until the 1820s, and progressively decreases for the later ones (‘renovated’ cadastres, established in the second half of the 19th century). Urban cadasters also seem to be more consistent with each other, compared to rural ones, which agrees with research on the diffusion of innovations (Lengyel et al., 2020). Petitpierre’s (2023) case study on the Napoleonic cadasters also shows how mapels can be used to compare the representation of various semantic classes, such as buildings, water, and the road network, in different cadastral cases.

1835–1884: Cartography in the Industrial Revolution

The years 1835–1849 and 1850–1866 enabled the exploration of new figurative spaces (Fig. 6, A1–2, B1–2), and were characterized by a marked return of hatched textures (Fig. 7e, f). This time, however, the patterns were more regular, the lines were generally finer and the spacing between them could be shorter. We also observe in the bottom-left part of Fig. 7e shading lines of varied identical length. From 1850 onwards, color prints also became visibly more frequent.

These developments, which took place in the mid-19th century, were likely due to the propagation of technological innovations that occurred several years earlier, in the late 18th and early 19th centuries. A first innovation was the development, thanks to the advances in mechanics that preceded the French Industrial Revolution, of rocker mechanisms and mechanical rulings (Robinson, 1975). These tools hastened the engraving of straight lines and hatching. Even shading was somewhat rationalized. In the map of the Etat Major, for instance, shading lines are in fact based on surveyed contour curves (Arnaud, 2022: 138–149); the length of the shading lines is inversely proportional to the slope steepness. Another innovation was the invention of lithography, which, by exploiting the chemical antagonism between hydrophilic and hydrophobic substances, made it possible to draw the maps more naturally, directly on the plate. This innovation, detailed in 1818 by the German Alois Senefelder (Senefelder, 1818) and translated into French in 1819, considerably accelerated plate drawing compared to copperplate engraving, while offering a more precise and finer line than etching. However, although this method was popularized in Germany as early as 1809, with the Bavarian cadastral campaign, its diffusion in the French-speaking world was slower (Ristow, 1975). It was not until 1826 that a commission of the Société de Géographie emphasized the advantages of this new method, although it still considered copperplate engraving to be “more perfect” (Jomard et al., 1826). In 1837, the Frenchman Godefroy Engelmann also presented a method of chromolithography that became very popular and made a major contribution to the development of color printing (Ristow, 1975). Our results (Fig. 7g) indicate that the quantitative impact of these technological innovations was not immediate.
Indeed, it is reflected especially in the period 1835–1866, and even rather 1850–1866 for color printing, during which the production really took off. This time delay between the technical innovation and its dissemination is not surprising. Indeed, our definition of the characteristic idiosyncrasy of the figuration in a certain period is above all quantitative and thus tends to reflect substantial and momentous developments, rather than avant-garde movements. The Carte géologique générale de la France, for instance, was first printed in color in 1842 (Rossi, 2005). By 1855, when this map was showcased at the Universal Exposition, 3,000 copies had been printed. Printing large-format maps in color remained a challenge until the mid-19th century. The first sheets printed in color by the Dépôt de la Guerre, for instance, had to wait until the 1860s (Arnaud, 2022: 76). Color printing, then six times cheaper than manual colorization, finally enabled the spread of color in mapmaking (Cook, 1995).

The period 1867–1884 continued and strengthened the mid-century trends. We also observe more texts, such as street names, as well as the strong representation of regular dashed lines, which make this period very characteristic, in terms of figuration (Table 2). This period was marked, among other things, by the regular publication of the Parisian atlases, in particular the Alphand plans, from 1867, towards the end of the Haussmann works. Some mapels of these maps are represented among the most characteristic mapotypes of the period.

1885–1950: Democratization

During the later part of the 19th century and the first half of the 20th century, color became completely prevalent. Almost all the characteristic mapotypes of the period 1885–1900 are colored (Fig. 7h), such as the fine blue hatching used in the Siegfried series of national maps in Switzerland to represent lakes, or the brown contour lines, whose presence also reflects the technical progress of the 19th century in terms of relief depiction (Hurni, 2008), but also in terms of geodetic survey. In the first half of the 20th century, we note the presence of mapotypes from urban maps, containing textual information on street names. A graphical and typographical evolution can also be observed in this respect with the development of a more modern aesthetic. Finally, we also notice the presence of halftone patterns consisting of a grid of small dots in grayscale.

The production of the national topographic maps in the 19th century, such as the maps of the Etat Major in France, and the Siegfried maps in Switzerland, represented a masterpiece that embodies the rise of nation states. In the 19th century indeed, cartography is thought to have played an essential role in the construction of modern nation states (Branch, 2014; Schulten, 2012). The production of the Siegfried national maps, for instance, was the subject of intense debates among the scientific community and the military institutions. Its production was transferred from lithography to more costly but precise copperplate engraving beginning in 1910, resulting in particularly fine textures and sharp contours (Feldman, 2015). The creation of national topographic series was also an opportunity to undertake new geodetic surveys. These increasingly accurate elevation measurements were plotted as contour lines. Although this technique for representing land relief dates back to the end of the 18th century (Arnaud, 2022: 63), its widespread use was delayed until sufficient elevation data was available. The beginning of the 20th century also reflects another paradigmatic shift in cartography. The map was no longer limited to a military or administrative tool. Indeed, following the technical advances of the 19th century which enabled a vertiginous rise in production, and the increasing mobility, the touristic or orientation map became popular, creating new economic opportunities (Akerman and Nekola, 2023; Olson, 2010). The cartography of this period also shows the arrival of photography, and the development of aviation. These technologies led to the appearance of photomechanical printing processes, such as rotogravure or halftone screens. These techniques produce very convincing halftone patterns that are reflected in the image by the presence of regular screens composed of small dots, as observed in Fig. 7i (Cook, 2015).
These photographic techniques were used, for example, to map mountain relief in the Alps, either by aerial photography (Imhof, 1982: 48–52), or by photoengraving from a relief model (Pearson, 2015: 1260–1262).

Macroscopic trends

In this final result section, we look at some of the macroscopic trends that can be observed in the longer term. These trends can be derived from the observation of the interpretable features we have calculated, or simply from the relative distributions of mapotypes, the functional units we defined in section ‘Interpretation of the feature space’ as “the archetype obtained from the reduction of a class of variant cartographic utterances”.

Disruptive evolution regarding scale

The issue of scale also plays an important role in the characterization of map figuration (Fig. 6). We notice that mapotypes corresponding to 1:200–1:2,500 maps occupy very distinct figurative spaces compared to 1:25,001–1:53,500 maps. As evidenced by the high value of the SW test statistic (Table 2), the stratum of 1:2,501–1:25,000 maps partly bridges the gap between the characteristic large-scale maps and the equally characteristic very small-scale maps. Fig. 8 shows that the correlation (as measured by Kendall’s tau coefficient; Kendall, 1938) of this stratum with other scale strata is weak, but above average, which supports a bridging role hypothesis.

Fig. 8: Evolution of mean Kendall’s correlation tau coefficient between the relative distribution of mapotypes for the 3 scale strata, through time.

Kendall’s tau measures the degree of similarity between the distributions, i.e., the stylistic correlation between different map scales. All p-values < 8 × 10⁻⁴ (< 0.05). n = 55,599.

For the 1:25,001–1:53,500 maps, the distribution is concentrated in very bright clusters in Fig. 6. A visual investigation of the mapotypes reveals the strong predominance of lines at large scale (1:200–1:2,500), with a lesser graphical load, while the small-scale maps (1:25,001–1:53,500) show a higher graphical load, which seems not only due to a higher concentration of information, but also to the presence of many textures depicting relief and topography with shading (see also Supplementary Material). Fig. 8 suggests that these figurative differences develop during the period studied. At the beginning of the period indeed, the results show a positive, albeit weak association between the cartographic figuration at various scales. However, from the end of the 18th century, the association becomes more marked and negative. This indicates that the corresponding stylometric map spaces became increasingly distinct.
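The sign of the association discussed above can be illustrated with a short, self-contained computation of Kendall's tau between the mapotype frequencies of two scale strata. This is a minimal sketch in its tau-b form (which corrects for ties), written with a simple quadratic loop; the function name and the toy frequency vectors are hypothetical and do not come from the corpus.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long sequences.

    Counts concordant and discordant pairs and corrects the
    denominator for pairs tied in x or in y.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx != 0 and dy != 0:
                if (dx > 0) == (dy > 0):
                    concordant += 1
                else:
                    discordant += 1
    n_pairs = n * (n - 1) // 2
    denominator = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denominator == 0:
        return float("nan")  # one sequence is constant
    return (concordant - discordant) / denominator


# Hypothetical relative frequencies of five mapotypes in two scale strata.
large_scale = [0.40, 0.30, 0.15, 0.10, 0.05]
small_scale = [0.05, 0.10, 0.15, 0.30, 0.40]
tau = kendall_tau_b(large_scale, small_scale)
print(tau)
```

A value near +1 means that the same mapotypes are frequent in both strata, a value near 0 indicates no monotonic association, and a negative value, as in this toy example where the rankings are reversed, means that mapotypes common at one scale tend to be rare at the other.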

This disruptive evolution can intuitively be understood as a phenomenon of differentiation, in which one genre, the map, separated into two distinct sub-genres, specializing differently for small- and large-scale maps. This result is interesting, firstly because of its temporality, since this phenomenon, beginning between 1771 and 1817, was concomitant with the appearance of the Napoleonic cadaster, and secondly and more importantly with the invention of the concepts of ‘small-scale’ and ‘large-scale’ themselves, as Edney demonstrates (Edney, 2019: 171). While he postulates a generalization of the principles of large-scale mapmaking, and foremost the idea of scale, to small-scale maps, we observe a figurative divergence. This is not incompatible, as figuration could have become a novel way of distinguishing these newly conceived categories, that became otherwise united by making practices. This scale-related differentiation seems persistent. Indeed, even if it weakens slightly at the end of the century, it is far from returning to its previous values. These results suggest that cartographic figuration, initially independent of scale (hence the low initial correlation), diverged durably at the turn of the 19th century (negative correlation). This dichotomy is not surprising for those studying historical and contemporary cartography. Until the end of the 20th century and the arrival of web mapping, which has somewhat blurred the boundaries, the style and semiology of large-scale maps (e.g. cadasters) and smaller-scale topographic maps differed evidently. However, our results suggest that such a dichotomy may not be justified for earlier maps.

Fashions and macroscopic trends on representation

As the selected features are interpretable (Table 1), they provide informative metrics on the macroscopic evolution of the corpus (Fig. 9). The Regional Kendall statistical test (RK, Helsel and Frans, 2006) assesses the significance of a regional trend. In the present case, ‘regions’ are not geographical, but rather scale-based strata. Indeed, the measured variables (graphical load, line width, number of components) are not expected to be independent from scale. Therefore, we constitute 3 balanced ‘regions’ for large-, medium- and small-scale maps (1:200–1:2500, 1:2501–1:10,000, 1:10,001–1:53,500), to verify that the trends observed are not due to a variation in map scale, but rather to a general trend in cartography. The results of the statistical analysis are summarized in Table 3.
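The logic of a regional trend test can be sketched as follows: a Mann-Kendall statistic S is computed separately within each scale stratum, and the per-stratum statistics and variances are then summed before deriving a single z-score. The sketch below is a simplified illustration, not the implementation used for Table 3: it assumes that there are no tied values and that the strata are independent, and all names and toy series are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_kendall_s(series):
    """Mann-Kendall S: number of increasing pairs minus decreasing pairs
    in a time-ordered series."""
    s = 0
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s


def regional_kendall(regions):
    """Combine per-region Mann-Kendall statistics into one trend test.

    regions: list of time-ordered series, one per stratum.
    Returns (S, z, two-sided p-value).
    Simplifying assumptions: no ties, independent regions.
    """
    s_total = 0
    var_total = 0.0
    for series in regions:
        n = len(series)
        if n < 2:
            raise ValueError("each region needs at least 2 observations")
        s_total += mann_kendall_s(series)
        var_total += n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction of one unit towards zero.
    if s_total > 0:
        z = (s_total - 1) / sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / sqrt(var_total)
    else:
        z = 0.0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s_total, z, p_value


# Hypothetical median graphical load per period, for three scale strata.
large = [0.10, 0.11, 0.13, 0.12, 0.15, 0.17, 0.18, 0.21]
medium = [0.20, 0.22, 0.21, 0.25, 0.27, 0.26, 0.30, 0.33]
small = [0.35, 0.34, 0.38, 0.40, 0.43, 0.45, 0.44, 0.50]
s, z, p = regional_kendall([large, medium, small])
print(s, round(z, 2), p)
```

Because the statistic is accumulated within strata and never across them, a difference in level between large- and small-scale maps does not by itself produce a trend; only a consistent direction of change inside each stratum does.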

Fig. 9: Evolution of graphical load, line width, and number of components through time, overall and per scale-based strata.

For the overall evolution, the median line, and the 25–75th percentiles interval are represented.

Table 3 Statistical evaluation of the trends observed in the macroscopic evolution of the variables (Fig. 9), using a Regional Kendall test.

These macroscopic evolutions (Fig. 9) are consistent with the effects observed in the previous section. They highlight technological and cultural processes that first favored the production of finer lines, with a peak in the first half of the 19th century, before the line was reinforced again. The process of abstraction of geographical elements, or shift from a rather iconic to a more symbolic representation, which took place from the 17th to the beginning of the 19th century (Fig. 7a–c), seems accompanied by a stability of the perceived graphical load, despite the increase in the number of components. This suggests an improved readability of the map. However, from 1807 onwards, the graphical load increased again, accompanied from 1834 by the thickening of the line, as well as the continuous increase in the density of graphical components. This phenomenon also coincides with an unprecedented increase in cartographic production, at the same period.

One possible hypothesis, based on the analysis of the macroscopic trends (Fig. 9), as well as the most characteristic mapotypes for the period (Fig. 7), would be that the map could have reached, at the end of the 18th century, a certain state of sophistication, which might in turn have favored its use as a device to spatialize other kinds of data, for example administrative or military. To fulfill these new functional requirements, specific and additional informational layers, such as topography, infrastructure, or property ownership, could be superimposed on geographic data. These additional geometric objects may have increased the graphical load and the density of components. Furthermore, at this time, the advances in cartographic printing techniques led to an unprecedented increase in production. We suspect that the sudden abundance of maps in certain social circles could also have fueled its complexification since the cartographic literacy of readers might have increased. At the same time, the improvement of printing methods also provided the technical requirements for creating more regular and finer patterns. These innovations seem to translate into a fashion of the fine line.

This trend seems to fade at the end of the 19th century, when the lines again became thicker and more visible. This is evident in Fig. 7, where, at the beginning of the 20th century, mapotypes showing strong and contrasted lines are clearly visible.

General Discussion and Conclusion

To conclude, in this article, we present a new stylometric methodology applied to cartographic information. Our approach draws inspiration from structural linguistics, as well as distributionalism. We take as a case study a large and diverse corpus of more than 10,000 French and Swiss maps. The maps are decomposed into elementary map units (mapels). A set of interpretable features is selected, first based on the literature, and later by optimizing the representation to keep only those features that best contrast the cartographic series, and thus capture their cultural differences. This optimized feature representation makes it possible to group mapels according to their type. We call the resulting class ‘mapotype’.

The feature representation that results from this first approach effectively differentiates the various cartographic series, which confirms its effectiveness. Morphology seems to be the most relevant feature in the construction of this representation, followed by graphical load, line thickness, and texture. Counter-intuitively, color does not seem to stand out. The diachronic analysis of mapotypic distributions makes it possible to identify key shifts or stabilities corresponding to important cultural and historical changes. The investigation of our corpus highlights a slow process of abstraction of the representation of the environment in the 17th and 18th centuries, which appears to favor symbolic representations over iconic ones. From the 19th century onwards, the boom of the cadaster reflects the rise of the administration, and the technological developments in mechanics and chemistry, in the context of the industrial revolution, especially the invention of mechanical rulings and lithography. During the 19th century, a peak in the production of finer lines is observed, supported by these technical developments and by cultural processes. We also evidence a macroscopic trend towards increased map load, which can be interpreted as a sophistication of the map, used as a support for the representation of various data (e.g. topographic, ownership, etc.). Later, the rise of nation-states is visible, with the production of color-printed national topographic maps, as well as the arrival of photography, aviation, and the growing tourism.

We see the approach we present in this paper not as a definitive tool, but rather as an initial proof of concept for cartographic stylometry. Pragmatically, we propose a computational fragment-based methodology for the visual study of cartography and demonstrate that it permits an effective dialogue with the history of cartography. This paper has therefore, above all, a confirmatory role, with respect to the methodology, by demonstrating that the results it produces integrate effectively with the existing literature on cartography. However, it has also made it possible, incidentally, to underline phenomena that are little or not theorized. This is the case of the disruptive evolution of maps according to scale, which seems to take place at the turn of the 19th century and cause a lasting split in the figuration of small- and large-scale maps, with the appearance of two sub-genres. The possibility of detecting such visual linguistic evolutions on large map corpora is original.

Another notable observation derived from our analysis pertains to the evident shift in cartographic representation at the same time. This ‘epistemological rupture’, to borrow the term introduced by Gaston Bachelard and Louis Althusser (Althusser, 2005; Bachelard, 2001), is characterized by a substantial discontinuity in the distribution of mapotypes, which indicates a momentous change in the prevailing paradigm. The scale of this rupture also resonates with Michel Foucault’s work (Foucault, 1976), as he identified a profound shift in the way knowledge is constructed at that time, which he interprets as the transition from a classical to a modern episteme. Our findings similarly suggest an evolution at an epistemological level. Between 1600 and 1730, maps in our corpus are characterized by handcrafted shading, figurative elements, and a focus on visual aesthetics. This approach echoes a worldview that values direct resemblance and interpretation of the world. As time progresses, we note a transition to more abstract forms of representation. Instead of iconographic representation, cartographers began using simpler symbols to represent geographical features. This shift seems to reflect a changing understanding of knowledge as more abstract, categorized, and systematized. By the early 19th century, information overlay and data mapping, with a growing number of components, and soon graphical load, reflect an increasingly abstract and systemic understanding of geographical space.

In this regard, the scholarship of cartographic stylometry for which we advocate is not only about retracing aesthetic ideals or technological progress. Notwithstanding its interweaving with technical innovations, the narrative that we attempt to construct allows us to read, in the multitude and beyond the eminent instances, the way in which the visual language of mapmakers evolves. Further, the changes we observe when studying maps concern not only the way in which cartographers represent the world, but also how they perceive and idealize it.

A computational approach to cartography is not an end in itself. To propose a mise en abyme, we can draw an analogy between the computation of a formal feature space and a mapping process, with several implications: it is a subjective, reductive, incomplete, partial, idealized, constructed discourse on cartography (Crampton, 2001; Edney, 2019; Harley, 1989). However, like maps themselves, it offers the advantage of enabling discussion on territories that are beyond the purview of a single observer. In this regard, maps are astonishing devices that connect us to large and complex abstract ideas that could not be conceived without them. Our study focused on a varied corpus of more than 10,000 maps. By comparison, this is already more than all the maps present in the five volumes of the History of Cartography series, whose analysis and discussion required the work of hundreds of the best experts in the field. The comparison is not about the nature of the knowledge produced, which is necessarily richer and more detailed in the second case, but merely about scalability. As we write, not tens of thousands but closer to a million maps (Footnote 12) have already been digitized by national libraries and map collections throughout the world. We can content ourselves with studying one percent of all maps and ignoring the majority, or we can try to invent new methods of approaching this vastly unexplored mass. We advocate for the second option. Computational methods can be used to complement qualitative close interpretation of the documents, in particular for the confirmation of formulated research hypotheses on larger corpora, or the exploration of macroscopic trends. For instance, we could cross-validate the theoretical taxonomies of cartographic modes, such as the one proposed by Edney (2019: 33).

Finally, the methodology for cartographic stylometry we propose is based on particular details (mapels) embedded in individual maps, which may themselves be part of larger ensembles. This multiscalar, trackable approach means that we can move back and forth between these different scales, and thus bridge qualitative and quantitative cartographic studies.

To conclude, as intriguing as the initial results proposed in this article might be, the design space for this approach remains largely unexplored. The implementation of this method raised many questions. For example, at what scale (or scales) does the style of the map reside, and how do we define the size of the mapel based on this? What is the optimal number of mapels to represent a map? Should mapotypes be able to overlap? Is it desirable to extend the concept of map series to improve the representativeness of the model? What about authorship? On a purely technical level, one could also integrate other candidate features, for example by considering alternative color spaces (Banda et al., 2011; Ebi et al., 1994; Herrault et al., 2013; Salvatore and Guitton, 2004). In addition, various mathematical strategies for optimizing the feature representation could be explored, including differentiable ones. In future work, we expect to address some of these questions. We also seek to integrate semantics into the model, to foster a discussion of map semiosis. Furthermore, while maintaining our focus on cartography as a priority, we seek to build an approach that is as generic as possible, as it seems to us that the methodology could also be transposed to other domains. Future research should replicate the methodology on other corpora, to prove its genericity, and improve the specificity of the formal representation. Moreover, as the method demonstrated sound results in relation to the historical writing on cartography, we believe that it will now be possible to delve deeper into more complex cultural mechanisms, such as the propagation of cultural conventions over the long term and across large areas, a measure of evolutionary complexity in cartography, and the relationship between the signs and the perception of the environment itself.

Complement on techniques and methods

Segmentation of the map background

The geographic content of the map is separated from the background (map frame, title block, legend, scale, scanner background, etc.). For this step, a randomly drawn sample of 1061 maps, each reduced to an area of 10^6 pixels (approx. 1000 × 1000 pixels), is manually annotated, i.e., the actual geographic area within the scanned map image is delineated. This annotated sample is divided into train, validation, and test subsets (70/20/10%) and is then used to train an OCRNet neural network (Yuan et al., 2020), implemented in mmsegmentation (MMSegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark, 2022), in order to segment the maps into geographic vs. background map content. OCRNet uses an HRNet-V2p-W48 encoder, and is pre-trained on the Cityscapes-160k dataset (Cordts et al., 2016), containing urban scenes in first-person view. The network is trained by stochastic gradient descent (lr=0.01), using cross-entropy as the loss, until convergence. The trained model achieves an intersection over union (IoU) of 87.7% for geographic content segmentation (pixel accuracy 90.5%).
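As an illustration of the two evaluation metrics reported above, the following minimal sketch computes the IoU of the geographic class and the pixel accuracy for a pair of binary masks. The function name `iou_and_pixel_accuracy` and the toy masks are assumptions made for this example; this is not the evaluation code of mmsegmentation.

```python
def iou_and_pixel_accuracy(pred, truth):
    """Return (IoU of the positive class, pixel accuracy) for two binary masks.

    Both masks are non-empty, equal-sized lists of rows,
    with 1 = geographic content and 0 = background.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # geographic pixel correctly predicted
            elif p:
                fp += 1      # background predicted as geographic
            elif t:
                fn += 1      # geographic pixel missed
            else:
                tn += 1      # background correctly predicted
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return iou, accuracy


# Toy 4 x 4 example (hypothetical data): 3 pixels agree, 1 is spurious, 1 is missed.
truth = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
iou, accuracy = iou_and_pixel_accuracy(pred, truth)  # IoU = 3/5, accuracy = 14/16
```

The IoU penalizes both spurious and missed geographic pixels, which is why it is typically lower than the pixel accuracy when the background dominates the image.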

Exclusion of the image areas with a low graphical load

To limit the implicit impact of low-information areas, the graphical load of each tile is computed using ED2, an approach based on edge density. This metric was proposed by Barvir and Vozenilek (2020), who evaluated and compared several metrics for computing the graphical load of maps against a human panel. According to their study, ED2 is the metric that appears to be the best proxy for the perceived graphical load. ED2 is computed by detecting edges with a Scharr filter (Scharr, 2000) and applying a square root operator to the result. Then, areas with very low graphical load (ED2 < 2.75) are excluded. This criterion is applied once again after the normalization of the mapels.
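The edge-density principle can be sketched as follows. This is only an approximation under stated assumptions: the exact normalization of ED2 is defined by Barvir and Vozenilek (2020) and is not reproduced here, so the function `edge_density` (the mean of the square-rooted Scharr gradient magnitude over interior pixels) is a hypothetical stand-in, and its values are not guaranteed to be on the same scale as the 2.75 threshold used in the paper.

```python
import math

# Standard 3 x 3 Scharr kernels for horizontal and vertical gradients.
SCHARR_X = ((-3, 0, 3), (-10, 0, 10), (-3, 0, 3))
SCHARR_Y = ((-3, -10, -3), (0, 0, 0), (3, 10, 3))


def edge_density(gray):
    """Mean square-rooted Scharr gradient magnitude over interior pixels.

    `gray` is a list of rows of grayscale values. The normalization is an
    assumption of this sketch, not the published definition of ED2.
    """
    height, width = len(gray), len(gray[0])
    total, count = 0.0, 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    value = gray[y + dy - 1][x + dx - 1]
                    gx += SCHARR_X[dy][dx] * value
                    gy += SCHARR_Y[dy][dx] * value
            total += math.sqrt(math.hypot(gx, gy))
            count += 1
    return total / count if count else 0.0


def keep_tile(gray, threshold=2.75):
    """Keep a tile unless its graphical load falls below the threshold."""
    return edge_density(gray) >= threshold
```

A uniformly blank tile has a density of zero and is discarded, whereas any tile crossed by a sharp line yields a clearly positive value.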

Normalization of the mapels

The mapels are recentered on the most salient graphical feature close to their initial randomly drawn location. This is done by finding the pixel with the lowest value in the 50×50 area centered on the initial location and setting it as the new center. To neutralize the orientation, the Histogram of Oriented Gradients (HOG; Dalal and Triggs, 2005) is computed on each mapel (Eq. 4a). As its name indicates, the output of HOG is a histogram of the local orientation of graphical features, with 180 bins, i.e., one per degree. The maximal peak in the histogram, corresponding to the dominant edge orientation, is stored as the mapel’s orientation (Eq. 4b).

$$\mathrm{HOG}(m_i)=\{h_0,h_1,\ldots,h_{179}\} \tag{4a}$$

$$\mathrm{orientation}(m_i)=\underset{h_j}{\arg\max}\;\mathrm{HOG}(m_i) \tag{4b}$$

Then, the mapel’s orientation is neutralized by simply locally rotating the image by its orientation.
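The recentering and the orientation estimate can be sketched as below. This is a simplified illustration, not the authors' implementation: the names `recenter`, `orientation_histogram`, and `dominant_orientation` are hypothetical, and the histogram is a single global, magnitude-weighted orientation histogram built from central differences rather than a full HOG descriptor with cells and blocks. The final rotation step is omitted, since it requires image resampling.

```python
import math


def recenter(gray, cx, cy, half=25):
    """Return (x, y) of the darkest pixel in the window around (cx, cy).

    With half=25 the window spans 50 x 50 pixels, clipped to the image.
    """
    height, width = len(gray), len(gray[0])
    best = (cx, cy)
    for y in range(max(0, cy - half), min(height, cy + half)):
        for x in range(max(0, cx - half), min(width, cx + half)):
            if gray[y][x] < gray[best[1]][best[0]]:
                best = (x, y)
    return best


def orientation_histogram(gray, bins=180):
    """Magnitude-weighted histogram of gradient orientations in [0, 180) degrees."""
    hist = [0.0] * bins
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude > 0:
                angle = math.degrees(math.atan2(gy, gx)) % 180.0
                hist[int(angle * bins / 180.0) % bins] += magnitude
    return hist


def dominant_orientation(gray):
    """Index of the maximal histogram bin (Eq. 4b); 0 for a flat image."""
    hist = orientation_histogram(gray)
    return max(range(len(hist)), key=hist.__getitem__)
```

With 180 bins, the returned index can be read directly as an angle in degrees, which is the value by which the mapel would then be rotated.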

Computation of candidate stylometric features

The candidate stylometric features are computed on each mapel. First, the RGB color histogram is computed on each channel separately. Two different values are tested for the number of bins per channel, 6 and 9, resulting in an 18-dimensional and a 27-dimensional vector for this feature (Footnote 13). The other features relative to color distribution (peak color, standard deviation, skew, kurtosis) are computed on an extensive 256-bin histogram, for each channel separately (resulting in a 3-dimensional vector for each feature). The peak color is calculated by finding the maximal peak of the histogram (argmax). The use of these 4 features for the computational analysis of map images was first proposed by Uhl et al. (2018). The HOG (Dalal and Triggs, 2005) is computed on the whole mapel, in grayscale, setting the number of bins to 12 for the orientation histogram, and testing three different parameters for the cell size (5, 10, and 25). The output consists of three 12-dimensional vectors for this feature.
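The color features can be illustrated with the following sketch. Several points are assumptions of this example rather than details given in the text: the function names are hypothetical, the histogram is normalized to frequencies, the moments are taken over the distribution of pixel values in each channel, and the kurtosis is reported as excess kurtosis.

```python
import math


def channel_features(values, bins=6):
    """Return (coarse histogram, peak value, std, skew, kurtosis) for one channel.

    `values` holds integers in 0..255. Moments are those of the pixel value
    distribution (an assumption of this sketch).
    """
    values = list(values)
    n = len(values)
    coarse = [0] * bins
    fine = [0] * 256
    for v in values:
        coarse[v * bins // 256] += 1
        fine[v] += 1
    coarse = [c / n for c in coarse]                  # relative frequencies
    peak = max(range(256), key=fine.__getitem__)      # argmax of the 256-bin histogram
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        skew = kurt = 0.0
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3.0
    return coarse, peak, std, skew, kurt


def mapel_color_features(pixels, bins=6):
    """Concatenate per-channel features for a list of (r, g, b) pixels."""
    features = {"histogram": [], "peak": [], "std": [], "skew": [], "kurtosis": []}
    for channel in zip(*pixels):
        coarse, peak, std, skew, kurt = channel_features(channel, bins)
        features["histogram"] += coarse
        features["peak"].append(peak)
        features["std"].append(std)
        features["skew"].append(skew)
        features["kurtosis"].append(kurt)
    return features
```

Calling `mapel_color_features` with `bins=6` or `bins=9` yields the 18- and 27-dimensional histogram vectors described above, while each of the four other features has one value per channel.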

The local binary pattern (LBP; Ojala et al., 2002) is computed on the black and white image, which is obtained by applying Otsu’s thresholding algorithm (Otsu, 1979) for binarization. The number of bins is set to 12, as for the HOG, and three different values are tested for the radius parameter (2, 3, and 4), accounting for various texture scales, and resulting in three 12-dimensional vectors.
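The binarization step that precedes the LBP can be sketched as follows. This is a plain implementation of Otsu's criterion (maximizing the between-class variance); the function names and the convention that dark pixels are encoded as 1 are assumptions of this example, and the LBP computation itself is not shown.

```python
def otsu_threshold(values):
    """Return the threshold t in 0..255 that maximizes the between-class variance.

    Pixels with value <= t form the dark class. When several thresholds are
    equally good, the lowest one is returned.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no pixel in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break               # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(gray):
    """Return a binary image with 1 for dark (ink) pixels and 0 otherwise."""
    threshold = otsu_threshold(v for row in gray for v in row)
    return [[1 if v <= threshold else 0 for v in row] for row in gray]
```

The same binary image serves several of the features described in this section, including the graphical load and the line width.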

For the graphical load, three approaches are assessed. The first involves a simple computation of the proportion of dark pixels in the binary image, after Otsu’s thresholding. The second, ED2, is based on edge density, as explained above. The last one involves counting the number of dark connected components after Otsu’s thresholding. For instance, a single line in a mapel will result in a value of 1, and a double line in a value of 2.
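The first and third approaches can be sketched on a binary image in which dark pixels are encoded as 1. The function names are hypothetical, and the use of 8-connectivity is an assumption of this example, since the text does not specify the connectivity.

```python
from collections import deque


def dark_proportion(binary):
    """Proportion of dark (1) pixels in a binary image."""
    total = sum(len(row) for row in binary)
    return sum(map(sum, binary)) / total


def count_components(binary):
    """Count the 8-connected components of dark (1) pixels with a flood fill."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            count += 1                      # new component found
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# A single line yields one component, a double line yields two.
single_line = [[1 if y == 2 else 0 for _ in range(5)] for y in range(5)]
double_line = [[1 if y in (1, 3) else 0 for _ in range(5)] for y in range(5)]
```

The two measures are complementary: the proportion of dark pixels grows with line thickness, whereas the component count only reflects how many distinct graphical objects are present.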

To compute the line width, the image is first binarized, then thinned (i.e., skeletonized) using the algorithm proposed by Zhang and Suen (1984). The width is obtained by calculating the ratio of dark pixels before and after thinning. Finally, two methods are tested to operationalize the orientation. The first uses the value computed in the previous section, i.e., the maximum peak of the HOG prior to the neutralization of orientation. The second corresponds to the binned HOG pattern method, proposed by Petitpierre (2020) for the analysis of historical maps. This method also relies on computing the HOG, with 24 bins. The orientations are then regrouped into discrete angle bins: vertical (±π), horizontal (±π/2, ±3π/2), diagonal (±π/4, ±3π/4), regular oblique (±π/6, ±2π/6, ±4π/6, ±5π/6), and irregular oblique (all other orientations).
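The line width estimate can be illustrated with a compact version of the Zhang and Suen thinning procedure. This sketch assumes a binary image with dark pixels encoded as 1 and a one-pixel background margin (border pixels are never modified); the function names and the reading of the ratio as a ratio of dark pixel counts are assumptions of this example.

```python
def zhang_suen(binary):
    """Thin a binary image (1 = dark) with the two sub-iterations of Zhang and Suen."""
    img = [row[:] for row in binary]
    height, width = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, height - 1):
                for x in range(1, width - 1):
                    if not img[y][x]:
                        continue
                    # Neighbors P2..P9, clockwise, starting from the pixel above.
                    p = [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                         img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                         img[y][x - 1], img[y - 1][x - 1]]
                    b = sum(p)  # number of dark neighbors
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)  # 0 -> 1 transitions
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            for y, x in to_clear:   # pixels are removed in parallel
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img


def line_width(binary):
    """Ratio of dark pixel counts before and after thinning."""
    before = sum(map(sum, binary))
    after = sum(map(sum, zhang_suen(binary)))
    return before / after if after else 0.0


# A horizontal bar, 3 pixels thick and 12 pixels long, with a background margin.
bar = [[1 if 2 <= y <= 4 and 2 <= x <= 13 else 0 for x in range(16)]
       for y in range(7)]
```

For short strokes the ratio slightly overestimates the thickness, because thinning also erodes the ends of the stroke; the estimate becomes closer to the true width as lines get longer.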

Feature selection with a genetic algorithm

The optimization process (see Eq. 3) relies on the genetic algorithm implemented by Solgi (2020). The candidate features are normalized before optimization. The population size at each generation is 20. The ratio of parents retained from one generation to the next is 4/20. These ‘parent’ combinations are sometimes (with a 50% probability) recombined uniformly with one another using a single crossover point. The genetic algorithm is run until reaching a plateau, which occurs after 80 generations.
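The general shape of such a selection loop can be sketched as follows. This is not the library by Solgi (2020), nor the objective of Eq. 3: the function `evolve`, the binary feature mask encoding, the mutation probability, and the toy fitness function are all assumptions made for illustration. The population size (20), the number of retained parents (4), the crossover probability (50%), and the 80 generations follow the values given above, with a fixed number of generations standing in for plateau detection.

```python
import random


def evolve(fitness, n_features, pop_size=20, n_parents=4, crossover_prob=0.5,
           mutation_prob=0.1, generations=80, seed=0):
    """Evolve binary feature masks; return (best mask, best fitness per generation).

    Requires n_features >= 2. `mutation_prob` is an assumption of this sketch.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:n_parents]          # the 4 best masks are retained
        history.append(fitness(parents[0]))
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.sample(parents, 2)
            child = a[:]
            if rng.random() < crossover_prob:     # single-point crossover
                point = rng.randrange(1, n_features)
                child = a[:point] + b[point:]
            child = [1 - g if rng.random() < mutation_prob else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness), history


# Toy objective (hypothetical): agreement with a hidden mask of useful features.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def toy_fitness(mask):
    return sum(int(m == t) for m, t in zip(mask, target))


best_mask, history = evolve(toy_fitness, n_features=len(target))
```

Because the retained parents are carried over unchanged, the best fitness never decreases from one generation to the next, which is what makes a plateau a natural stopping criterion.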

Iterative subdivision of the feature space into mapotypes

To obtain the mapotypes, the feature space is iteratively subdivided, until obtaining cells of radius r < k, where k is the radius of free variation. The iterative division of the space is performed using a KMeans algorithm. Each cell is iteratively divided until the radius r between the center of the cell and the outermost mapel is less than k. The center is defined, in the sense of KMeans, as the geometric center of the samples in the cell. KMeans’ K parameter, corresponding to the number of clusters or subdivisions at each split, is set to 4.
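The iterative subdivision can be sketched with a small, dependency-free k-means. This is an illustrative approximation rather than the authors' implementation: the function names, the random initialization from distinct points, the fixed number of k-means iterations, and the fallback used when a split degenerates are assumptions of this example. The radius criterion and the value K = 4 follow the description above.

```python
import math
import random


def centroid(points):
    """Geometric center of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def radius(points):
    """Distance between the center of the cell and its outermost point."""
    center = centroid(points)
    return max(math.dist(center, p) for p in points)


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd iterations; returns the non-empty clusters."""
    unique = sorted(set(points))
    centers = rng.sample(unique, min(k, len(unique)))
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(centers[i], p))
            clusters[nearest].append(p)
        clusters = [c for c in clusters if c]
        centers = [centroid(c) for c in clusters]
    return clusters


def subdivide(points, k_radius, n_splits=4, seed=0):
    """Split the feature space until every cell has a radius below k_radius (> 0)."""
    rng = random.Random(seed)
    pending, cells = [list(points)], []
    while pending:
        cell = pending.pop()
        if radius(cell) < k_radius:
            cells.append(cell)
            continue
        parts = kmeans(cell, n_splits, rng)
        if len(parts) < 2:                 # degenerate split: fall back to halving
            cell.sort()
            parts = [cell[:len(cell) // 2], cell[len(cell) // 2:]]
        pending.extend(parts)
    return cells
```

Each returned cell plays the role of a mapotype in this sketch: dense regions of the feature space are split more often than sparse ones, so the number of cells adapts to the data rather than being fixed in advance.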