1 Introduction

Throughout their history, universities have collected (and still collect) a lot of items related to their teaching and research activity. For example, faculties offering natural science courses needed plant and biological specimens so that students could learn about the reality of remote environments at a time when travelling was not as easy and fast as it is today, and the Internet was still merely science fiction. Meanwhile, libraries were acquiring books to be used for study and research, and archives were collecting the university’s institutional documents. Unfortunately, as science and technology developed, many items ended up in the backrooms or in reserve libraries, giving rise to the so-called Cinderella Collections [47].

At the same time, often for reasons of prestige [17, 25, 28, 30], many universities set up art collections with works that were sometimes purchased by the university itself, sometimes as a result of donations, and other times self-generated in the fine arts faculties. These collections were also often housed in buildings that were small architectural jewels of their time, as they had to demonstrate the university’s status as an institution of knowledge and research.

All this has created a wealth of heritage that is very important for any country. For example, 30% of Britain's heritage recognised as culturally significant by the Department for Culture, Media and Sport is found in universities, yet these institutions house only 4% of the country’s museums [24, 42]. However, this heritage is often poorly documented, usually due to a lack of resources and staff. The studies carried out by Salse et al. [35] show that much of this heritage is managed on a part-time basis by enthusiastic individuals who are experts in their field of knowledge but who are not experts in documenting heritage and do not have the time to give this task the level of attention it requires.

On the other hand, there is no shortage of documentation tools available for the task. Archives, libraries, and museums have generated their own often very complete documentation tools and have recognised the points that all these institutions have in common, using unifying terms such as GLAM or "memory institutions" [14, 50]. They have also generated tools that facilitate record sharing, such as metadata exchange schemas (Dublin Core, LIDO) and crosswalks. Finally, they have also founded sites such as Europeana or the DPLA (Digital Public Library of America), which make GLAM records accessible from a single location. However, these tools have only occasionally been combined for integrated use in university heritage collections.

2 Objectives

This article proposes a methodology to bridge the gap between the existing documentation tools and the day-to-day reality of collections, especially small university collections with few resources and/or staff, which in this article are referred to as “non-museum collections”.

The proposal is based on the experience acquired in the creation of the Virtual Museum of the University of Barcelona, one of Spain’s so-called historical universities, founded in 1401 as the Estudi General de Medicina i Arts. The University of Barcelona (UB) is a good example of an institution that houses non-museum collections, since its collections fall into the following categories among those defined by UNESCO [39]: movable cultural heritage, immovable cultural heritage, and intangible cultural heritage. Furthermore, over the years the UB has upheld the importance of harvesting, for the benefit of the university environment, the human heritage that has left its mark on the institution [40]. This last concept is not listed by UNESCO, but it is particularly important in the tourism sector.

With this starting point in mind, this article has two specific objectives:

  1. To present a sustainable methodology for the documentation of non-museum collections in a digital environment (footnote 1). The aim is to establish a descriptive procedure that is sustainable and that ensures that items are described for the future with standardised schemas, are interoperable at all levels (footnote 2), and are adapted to the semantic web and the LOD movement. This methodology should make it possible to achieve a unified vision of university heritage that respects the diversity of the different collections but at the same time reveals the relationships that often exist between them (for example, different collections have items associated with medicine).

  2. To propose a documentation model that facilitates the description and organisation of items or groups of items in these non-museum structures. This involves identifying the metadata system deemed the most appropriate and recommending the use of different value and content standards for the appropriate composition of the metadata.

The following is not explored in this article:

  • Specific proposals for documentation in archives, libraries, and museums. These organisations already have their own working systems and strong institutions providing guidelines (ICA, IFLA, ICOM; footnote 3). Their work focuses on the processing of records for central system input and on project management at the metadata level.

  • Specific proposals for human and intangible heritage. We believe that this should be the subject of a specific study to be published in the future.

  • Specific proposals dealing with processes associated with the life of the heritage item. For example, we would exclude the management of documentation resulting from the processes of selection, acquisition, loan, or management of events such as exhibitions.

3 Methodology

There are various methodologies that attempt to explain how to create a documentation project, but they are often tied to their discipline of origin. The guiding principle for archivists and document managers is the ISO 15489 document management standard [22] and all the ISO standards and recommendations derived from it. In museums there have been various proposals, such as the joint work of UNESCO and ICOM [33]. Zeng [49], in turn, proposes a template with a series of steps to be completed in any metadata project. In the field of computer science, meanwhile, the SDLC (System Development Life Cycle) is used to generate computer systems [31], while the DBLC (Database Life Cycle) [13] is used to generate databases. The phases of the DBLC can be seen in Fig. 1.

Fig. 1 Database life cycle (DBLC) [13]. Source: Authors, based on Coronel & Morris.

In our study, we use the DBLC as a global framework, since the ultimate objective of our system is the generation of a heritage database; its phases are those shown in Fig. 1.

Within this generic structure, we will focus mainly on the Database Design phase since the aim of our study is to propose a sustainable documentation methodology for non-museum university collections. Although it is advisable to bear in mind the indications of institutions such as the aforementioned ISO, ICOM and UNESCO, in this study, we will adapt the recommendations of Valle, Fernández, and Arenillas [29] as they refer more specifically to heritage documentation. These authors structure this design phase as follows:

  • Definition of work team

  • Definition of objectives of documentation

  • Definition of subject of documentation

    o Classification of heritage

    o Selection of elements to be documented

  • Definition of documentation tools and methodologies

    o Documentation methodology

    o Systematisation and standardisation of information

    o Information management (not discussed in this article)

The choice of metadata schemas and associated standards is based on the studies by Salse et al. [34, 35] identifying the different metadata schemas used in the field of university heritage, and also on the experience acquired by some of the authors of this study in teaching in the field of metadata and databases, as well as the experience gained throughout the implementation process of the new Virtual Museum of the University of Barcelona, as a case study. This process began at the end of 2018 and its main milestone was the publication in September 2021 of the first version of the new UB Virtual Museum (still under development).

4 Results (methodological proposal)

4.1 Definition of work team

Although this work team is not specifically part of the database design and implementation, it is essential for carrying out the entire DBLC and, as will be discussed below, it is also a key element in the creation of a sustainable university infrastructure for long-term heritage management.

In the initial selection of members of this team, it is very important to consider that the universities already have staff with the scientific and technical knowledge for effective heritage documentation. However, this staff is often dispersed across different departments and only librarians and archivists have documentation as one of their main tasks. Therefore, a working team needs to be created that will bring together the different areas of expertise. The structure proposed in Fig. 2 covers the tasks that should be carried out at the heritage level according to ICOM [12], although it is only a guideline and could include other actors, such as teachers in the marketing area to design dissemination programmes, or specialists in cultural management who could come from the economics area. The key factor to bear in mind is that it is necessary to involve personnel from different areas, faculties, or services within the university to reinforce the idea of a shared heritage that needs to be preserved.

Fig. 2 Work structure proposal. Source: Authors.

Specifically, the following guidelines are proposed:

  • The work team should represent a cross section of the institution, bringing together professionals from different areas of expertise to achieve a common goal.

  • The management of activities (acquisitions, transfers, loans, exhibitions, events in general) should remain in the hands of the central heritage unit, although certain aspects may be delegated to the heads of the collections. It is important, for example, that the initial handling falls to the heads of the collections, who are the ones who know whether a particular donation or offer can be accepted. These management procedures would have to be specified in an internal regulation of the institution that would establish the specific processes.

  • The cataloguing/inventory guidelines must remain in the hands of the library and/or archive and/or the corresponding faculty of information science if one exists. These three actors are chosen because both the library and the archive are specialists in the management of their respective metadata, while the corresponding faculty can provide a broader vision of the world of metadata that responds to the needs of the different collections. If the university has museums with staff dedicated to documentation, these would also be a good option.

  • The cataloguing itself should be carried out by the people designated within each unit/collection/museum, as they are the ones who know the context and the importance of the catalogued item, supported where appropriate by occasional external partnerships, such as collaboration grants, crowdsourcing [3] or collaborations with final projects in undergraduate programmes or doctoral theses.

  • The conservation of museum objects should be the responsibility of the university’s fine arts centres, while libraries and archives, which often have restoration workshops, should be responsible for written historical documents. In the case of museum work, it should be remembered that most centres that teach studies related to the arts have conservation and preservation courses, and university collections offer excellent opportunities for practical coursework.

At the top of the structure shown in Fig. 2, there should be a steering committee chaired by the officer responsible for managing, directing, and deciding on the university's cultural heritage (in the case of the University of Barcelona, the Vice-Rector for Arts, Culture, and Heritage) or by that officer's delegate. However, it is essential that this structure should also include at least one cultural heritage expert on the staff of the university itself, acting as a centralised management figure.

One possible composition of this committee is depicted in Fig. 3.

Fig. 3 Steering committee (proposal). Source: Authors.

The functions of this committee would essentially be to establish basic periodic lines of action, which would be set out in successive master plans, and specific working procedures. Elements of analysis for the development of these tools are presented below.

4.2 Definition of objectives of documentation (why we document)

This issue will have to be clearly set out in the master plan, if necessary, at a global level, but it will probably be necessary to define objectives at the level of the university, as well as for each collection, as collections are too diverse to be covered by a single, homogeneous objective.

At the global level, Valle, Fernández, and Arenillas [29] define five main types of documentation objectives in heritage management: as a knowledge strategy, to protect cultural assets, to conduct research, to engage in preservation actions, or to disseminate information. In the case of university heritage, all these objectives are valid, although their importance may vary in specific collections. For example, the objectives of zoological collections have a strong research component, while protection and dissemination are probably key aims for fine arts heritage.

4.3 Definition of subject of documentation (what we document)

Once the objectives of documentation have been established, it is necessary to examine what it is that we want to document. While this can be defined specifically for each collection, it is also necessary to establish a common working framework.

Establishing this framework should involve:

  1. Defining what it is that we must document. This means defining what we consider to be heritage within the university system, which is part of a broader discussion aimed specifically at defining what heritage is. Once we have established this definition (or definitions), some collections may be removed from the list or merged with others.

  2. Defining how the different collections are structured to provide a unitary structure to the university system; in other words, generating a classification/taxonomy framework.

  3. Prioritising the tasks that should be included in the master plan, both at a general level and for each collection.

  4. Defining the levels of description/selection criteria appropriate for the different collections and whether there are relationships between them.

4.3.1 Types of heritage to be documented

The four main types of university heritage elements defined in 2019 by the Vice-Rectorate for Arts, Culture, and Heritage of the University of Barcelona [39] were adopted for this study, on the understanding that these types of heritage are commonly found in most European universities. The first three were previously identified by UNESCO [38]. The four categories are:

  • Movable cultural heritage. Includes objects of various kinds, books, and archive documents, as well as scientific collections (mineralogy, zoology, herbaria, etc.).

  • Immovable tangible cultural heritage. Basically includes buildings and botanical complexes such as the Ferran Soldevila Garden at the UB.

  • Intangible cultural heritage. Elements that reflect the spirit of the institution, such as the university hymn, ceremonials, customs, and traditions [38], as well as associated tangible elements.

  • Human heritage. The contribution of those people who have left their mark on the institution. Not identified by UNESCO, but widely used in the tourism sector (footnote 4), human heritage is fundamental to a university environment, where people are important.

In this article, we focus on movable cultural heritage, leaving specific proposals for the management of other types of heritage for future studies. However, some of the parameters set forth here could also be applied to intangible heritage and immovable cultural heritage. Human heritage clearly requires a separate study.

4.3.2 Classification systems

Once we have broadly defined what we must document, it is essential to link it all together by means of a structure that allows the various collections to be interrelated and offers a united vision, normally of a hierarchical nature. Unfortunately, LAM institutions use different classification systems. For example, in museums and libraries, classification structures are usually based on subjects (UDC (footnote 5) or Dewey in libraries, Nomenclature or AAT (footnote 6) in museums), while in archives there are three different types of classifications: organic classifications (based on the principle of provenance of the documentation), functional classifications (based on the competences, functions, and activities of the generating institution) and organic-functional classifications (combining the first two concepts) [5, 6, 7].

We propose a double classification for university collections, which has been adopted for the UB's virtual museum:

  1. An organic approach by collection. This approach is necessary in order to keep sight of the context of creation, although, if necessary, groupings by type of heritage could be made at higher levels. For this purpose, the property dcterms:isPartOf was reserved.

  2. An approach by subject and by type of object. Complementary to the organic approach, this approach aims to overcome the fragmentary vision offered by the first. For example, it would retrieve all items related to veterinary medicine regardless of the collection that hosts them. Of the classification systems analysed, those best suited to the needs of the University of Barcelona were Nomenclature 4.0 for classification by subject and the objects facet of the AAT for types of objects (in accordance with CDWA guidelines; footnote 7). For object type the field dcterms:type was reserved, and for classification by subject a specific field, metadadesUB:classification, was created.
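The double classification can be illustrated with a minimal Python sketch. Only the three property names (dcterms:isPartOf, dcterms:type, metadadesUB:classification) come from the text above. The sample records, the collection names, the helper functions, and the classification values are hypothetical placeholders, not actual Nomenclature 4.0 or AAT terms.

```python
from collections import defaultdict

# Hypothetical records: only the three property names are taken from the text.
RECORDS = [
    {
        "title": "Horse anatomy model",
        "dcterms:isPartOf": "Zoology collection",            # organic approach
        "dcterms:type": "anatomical model",                    # type of object
        "metadadesUB:classification": "veterinary medicine",  # subject
    },
    {
        "title": "Veterinary syringe",
        "dcterms:isPartOf": "Pharmacy collection",
        "dcterms:type": "syringe",
        "metadadesUB:classification": "veterinary medicine",
    },
    {
        "title": "Portrait of a rector",
        "dcterms:isPartOf": "Fine arts collection",
        "dcterms:type": "painting",
        "metadadesUB:classification": "portraiture",
    },
]


def group_by_collection(records):
    """Organic view: keep each record within its collection of origin."""
    groups = defaultdict(list)
    for record in records:
        groups[record["dcterms:isPartOf"]].append(record["title"])
    return dict(groups)


def find_by_subject(records, subject):
    """Subject view: retrieve records whatever collection hosts them."""
    return [
        record
        for record in records
        if record.get("metadadesUB:classification") == subject
    ]
```

Calling `find_by_subject(RECORDS, "veterinary medicine")` returns items from two different collections, which is the cross-collection view that the second approach is meant to provide, while `group_by_collection(RECORDS)` preserves the context of creation.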

4.3.3 Levels of description / cataloguing

It should be borne in mind that the items to be described in a university environment are sometimes isolated items, while others are part of a whole or have parts that are worth cataloguing; in every case they belong to a specific collection. This makes it necessary to consider different levels of description/cataloguing. This is an issue that has been addressed by various GLAM institutions that have designed specific conceptual models and/or descriptive standards to provide a solution. For example, the International Federation of Library Associations (IFLA) has established the LRM (Library Reference Model), which outlines four broad classes of objects to be described: work, expression, manifestation, and item. In this classification, “item” refers to the concrete object held in the library, while “work” is the more abstract idea in the mind of the creator.

In historical archives, the levels of description have traditionally been delimited by ISAD(G), which recommends different levels of description created from documentary groups. The most generic is the fonds and the most specific is the simple documentary unit [20], although there may be numerous intermediate levels. This standard is the basis for the creation of the new conceptual model, Records in Contexts (ICA RiC-O), which restructures conventional groupings by integrating them into three large groups: RecordSet, Record, and RecordPart. This conceptual model will serve as the basis for the new standard for archival description.

In museums, the work is normally considered a material element rather than a work in the mind of the author as it is in the bibliographic world [19]. Furthermore, we often cannot define as many document groups as can be defined in the case of an archival environment. Moreover, the conceptual models associated with museums, such as the CIDOC CRM model (footnote 8) and the CCO model (footnote 9), do not define specific levels; instead, they define objects that contain other objects. In fact, museum metadata schemas do not use the same criteria to establish levels of description. For example, in its Catalog Level element, CDWA recognises 19 levels of description, while VRA Core (footnote 10) recognises only three (work, collection, and image), and LIDO (footnote 11), the interchange schema for museums promoted by ICOM (International Council of Museums), distinguishes between individual item, part of an item, and group of items.

Having evaluated the different conceptual models, we propose the adoption of the solution provided by LIDO, as its fewer levels make it more feasible for infrastructures with the limited resources typical of non-museum collections. To its three levels we would add two more: the collection (to reflect the usual structure of university heritage) and the instantiation, which would be used exclusively to control the physical manifestations of intangible heritage (photographs, videos, recordings) [8, 44].

Therefore, in the case of university collections, we would define four main levels of cataloguing or description plus a level of instantiation for intangible heritage. The cataloguing approach we propose goes from the general to the specific, in accordance with the principles of archival science, to ensure that the higher levels are catalogued or at least inventoried. The five levels are:

  • Collection: the first level to be catalogued, even if nothing else can be; a collection may also have sub-collections.

  • Group of items: sets of items that for various reasons may be linked and that it is not sustainable to work on separately (e.g. an eighteenth-century doctor's bag that includes all the doctor's working tools inside it). This would constitute a second level in cataloguing, and it would be interesting to consider it as a descriptive unit in very large collections for which it would not be sustainable to catalogue individual items.

  • Item: a separate item that is considered necessary to catalogue because of its value, although in large collections it is only possible to do this for single items.

  • Part of the item: only in the case of architecture or very specific items of exceptional value.

  • Instantiation (intangible heritage only): the registration and cataloguing of intangible heritage have been promoted especially since the 2003 Convention for the Safeguarding of the Intangible Cultural Heritage (footnote 12), although the ethnographic aspects of disappearing civilisations were already being collected long before that. However, these intangible elements need to have a physical medium that would make it possible to record their existence, and this medium would have to be catalogued.

This structure of 4 + 1 levels should not undermine the work done by museums, libraries, and archives, as it is possible to map out correspondences between the different levels.
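As a minimal sketch of how the 4 + 1 levels and their correspondences might be represented, the Python fragment below models the levels as an enumeration and maps them onto the three LIDO levels named above. The class and function names are hypothetical, and the LIDO labels are the wording used in this article rather than identifiers taken from the LIDO terminology.

```python
from enum import Enum


class Level(Enum):
    COLLECTION = "collection"
    GROUP_OF_ITEMS = "group of items"
    ITEM = "item"
    PART_OF_ITEM = "part of the item"
    INSTANTIATION = "instantiation"  # intangible heritage only


# General-to-specific order used when deciding what to catalogue first.
CATALOGUING_ORDER = [
    Level.COLLECTION,
    Level.GROUP_OF_ITEMS,
    Level.ITEM,
    Level.PART_OF_ITEM,
]

# Correspondence with the three LIDO levels cited in the text.
# The two levels added by the proposal have no counterpart in this sketch.
LIDO_EQUIVALENT = {
    Level.GROUP_OF_ITEMS: "group of items",
    Level.ITEM: "individual item",
    Level.PART_OF_ITEM: "part of an item",
}


def to_lido(level):
    """Return the LIDO label for a level, or None if the proposal added it."""
    return LIDO_EQUIVALENT.get(level)


def next_level_to_catalogue(already_done):
    """Return the most general level not yet catalogued, or None."""
    for level in CATALOGUING_ORDER:
        if level not in already_done:
            return level
    return None
```

A similar dictionary could hold the correspondences with archival or bibliographic levels; those mappings are not spelled out in the text, so they are left out of the sketch.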

The choice of the level of work for each collection would have to be determined in accordance with:

  1. The proposal of the head of the collections, in consultation with experts;

  2. Pre-existing tools (inventories or catalogues);

  3. The internal structure of the collection;

  4. The sustainability of the proposal (are there human, material, and economic resources available to carry out the project?).

The final proposal for the level of description must be approved by the work team (see Sect. 4.1).

4.3.4 Selection criteria

It is often not sustainable to consider cataloguing at item level due to the size of some collections. For example, at the University of Barcelona, as shown in Table 1, several collections have more than 5,000 items [41].

Table 1 Collections of the University of Barcelona that are featured in the virtual museum (only notable pieces). There are others that have not been incorporated (Criminology, Mathematics, Intangible and Human Heritage, etc.) which will be added later

In these cases, it is essential to prioritise. It will be the experts on each collection who will establish the criteria for selecting the items or levels to be documented, based on the existing resources, the age and uniqueness of the items, the demands of users, the state of conservation, and the previous state of the catalogue/inventory.

4.4 Definition of documentation tools and methodologies (how we document)

This section defines HOW we document in accordance with the following basic sections:

  • Documentation methodology

  • Systematisation and standardisation of information

4.4.1 Documentation methodology

This section is a continuation of the section on selection criteria (see Sect. 4.3.4). It involves an in-depth analysis of the needs of each collection and how we can balance scientific rigour and sustainability.

In general, there are three main types of university heritage collections:

  • New/non-inventoried collections.

  • Collections with total or partial inventory.

  • Collections fully catalogued or inventoried.

Depending on the descriptive stage of the collections, the methodology will vary, incorporating more or fewer of the following steps:

  1. Establishment of the starting point for each collection.

  2. Decision on how the work will continue from now on for each collection, which will involve the following:

  3. Drawing up a work schedule that prioritises the tasks to be carried out and integrates them into the general schedule.

  4. Deciding on the tools to be used for documentation (metadata schemas, controlled vocabularies, software) and trying to find a common meeting point. In the case of the UB, this meeting point was the institution's new virtual museum. See Sect. 4.4.2.

  5. Establishing the depth of the description (for systems that are not inventoried/catalogued or are only partially inventoried/catalogued). Traditionally, both archival science and museology distinguish between the concepts of inventory (a more superficial description and/or with a recording function) and catalogue (more in-depth, limited to collections or single items, and with a strong influence on the historical or contextual aspect), although the definitions tend to vary slightly depending on the field of knowledge [4]. However, in a heritage environment as diverse as the university, these distinctions can become very ambiguous, which is why we advocate Santana’s proposal [35] to collapse these terms into the concept of General Cultural Heritage Registers. Based on this single model of register, we would need to decide whether to do an in-depth, research-oriented cataloguing (a "catalogue" in the conventional sense), or a quick cataloguing that captures the fundamental characteristics of the item (or group of items) to be documented, registers it, and prepares it for a more in-depth cataloguing. It would even be possible to simply consider entering a minimal record, which is little more than a code and a title, simply to provide evidence of the material existence of a heritage element.

However, an important question to consider is that in many centres the university heritage (HERITAGE) is linked to a more general heritage unit (ESTATE), which inventories all university properties. In these cases, it would be advisable to analyse the inventory needs of this unit and ensure that our records have the necessary fields, to ensure compatibility between all databases.

  5. Taking the following actions in the case of systems that are already fully inventoried or previously catalogued in computer format:

    i. apply clean-up techniques and reconcile metadata with OpenRefine [48] to avoid inconsistencies between the controlled languages that would have to serve all collections and the languages that have been used so far, if any; and

    ii. design/adapt crosswalks to facilitate the entry of records and their collection within the virtual museum.

  6. Considering a cataloguing approach that integrates the semantic web and Linked Data as basic operating principles. This entails, among other things, treating each document grouping as a resource and therefore generating a permanent IRI (footnote 13) for each one, searching for documentary languages that are adapted to LOD and applying them, and taking advantage of the data they contain for the semantic enrichment of our records [2].
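Two of the steps above, reconciling legacy values against a shared controlled vocabulary and generating a permanent IRI for each resource, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact-match normalisation shown here is a simple stand-in and not the clustering or reconciliation services that OpenRefine offers, and the base namespace, the vocabulary terms, and the function names are all hypothetical.

```python
import re
import unicodedata

# Hypothetical namespace; a real project would use a domain it controls.
BASE_IRI = "https://example.org/heritage/"


def normalise(value):
    """Lower-case, strip accents and collapse whitespace so variants compare equal."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", text).strip().lower()


def reconcile(value, vocabulary):
    """Return the preferred vocabulary term matching a legacy value, or None."""
    lookup = {normalise(term): term for term in vocabulary}
    return lookup.get(normalise(value))


def mint_iri(collection_code, local_id):
    """Build an IRI from identifiers that are not expected to change."""
    return f"{BASE_IRI}{collection_code}/{local_id}"
```

For example, `reconcile("  MICROSCOPE ", ["Microscope", "Herbarium sheet"])` resolves the legacy spelling to the preferred term, while a value with no match returns `None` and can be queued for manual review. Building the IRI only from a collection code and a local identifier keeps it independent of titles or locations, which may change over time.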

4.4.2 Systematisation and standardisation of documentation

As noted above, one of the basic objectives of our project is to define the tools for documentation. The selection of tools should be subject to the following guidelines:

  • Adopt the most widely accepted standards for documentation. These tools will be evaluated by the project management team to assess their suitability. For this task, we will use Boughida’s terminology, as quoted by Gilliland [18], to characterise the different types of standards to be defined: data structure standards are metadata schemas used in a community, comprising the structures of relationships, records, and fields that a database will have; data value standards are the controlled languages used within a community to populate certain fields of a metadata schema; data content standards facilitate the definition of the format of the values; and data format/technical interchange standards are the formats that allow us to save the records and share them with other institutions (XML, JSON, CSV, etc.).

Table 2 contains some examples of standards adopted by GLAM institutions in each category, bearing in mind that the list is not exhaustive, and the examples shown do not all fall entirely into a single category.

Table 2 Metadata standards

If each institution has its own standards, which ones should we use in a university environment? We will try to answer this question in the following sections.

  • Produce quality metadata. Good metadata are essential to be able to pursue quality research. If the metadata are poor, the research results will be poor as well [10, 11, 16, 18, 26, 27, 37].

  • Produce metadata that adapt to different types of users.

  • Produce metadata that can be easily located, accessed, reused, and shared. In other words, they must comply with the FAIR principles (footnote 14), which, although they were created in a scientific environment, are fully applicable here [46]. A final objective should be the aggregation of data for submission to large institutional repositories such as DPLA, Europeana or GBIF (footnote 15).

  • Create/adapt a metadata schema or application profile that is modular and extensible, adapted to the diversity of the collections and developed from a simple conceptual model that provides the information with a basic structure.

  • Ensure that the metadata can be developed by people outside libraries and archives who are not information specialists, but who already do an excellent job in their fields on the advice of information professionals. This means that the system chosen must allow the records corresponding to the different collections to be edited and added in a simple way.

  • Facilitate metadata ingestion from other systems, such as libraries, archives, legally established museums, and collections that have developed their own systems. To this end, it is essential that crosswalks be developed to act as bridges between old systems and the new one.
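The role of a crosswalk as a bridge between an old system and the new one can be sketched as a simple field mapping. In the Python fragment below, the legacy field names, the sample values, and the function name are hypothetical; the target properties are existing Dublin Core terms, chosen here only for illustration and not as the mapping actually used in the UB Virtual Museum.

```python
import json

# Hypothetical crosswalk: legacy field name -> target property.
CROSSWALK = {
    "object_name": "dcterms:title",
    "maker": "dcterms:creator",
    "made": "dcterms:created",
    "collection": "dcterms:isPartOf",
}


def apply_crosswalk(legacy_record, crosswalk):
    """Map a legacy record to the target schema.

    Returns the mapped record and the list of legacy fields that have no
    mapping, so that they can be reviewed instead of being silently lost.
    Empty values are skipped.
    """
    mapped, unmapped = {}, []
    for field, value in legacy_record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)
        elif value not in ("", None):
            mapped[target] = value
    return mapped, unmapped


def to_json(record):
    """Serialise a mapped record in an interchange format (JSON)."""
    return json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)
```

Reporting the unmapped fields is a deliberate choice in this sketch: it lets the people responsible for each collection decide whether the crosswalk or the target schema needs to be extended before records are ingested.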

This section can be summarised as follows (Fig. 4):

Fig. 4 Requirements for metadata in university collections. Source: Authors.

The next section outlines our proposals for non-museum university collections, in terms of both conceptual models and standards.

Conceptual data models for non-museum university collections

The conceptual models existing in the GLAM environment are characterised by a high level of complexity that can only be sustained by institutions with sufficient staff and resources. Our proposal would be based on a simplified version of the CIDOC CRM model, which we consider to be the most appropriate for cultural heritage as a whole because of its treatment of the day-to-day management of items. The difference is that while CIDOC CRM defines a series of basic classes, subclasses, and properties, our proposal includes most of the basic entities of the CIDOC CRM model but transforms them into properties to be reported (Fig. 5).

Fig. 5 Our proposal: E-R Non-Museum conceptual model

Data structure standards for non-museum university collections

Our proposal is to use interchange metadata schemas for the daily management of university collections, while maintaining the original metadata schemas in those environments with consolidated standards. We favour interchange schemas for the following reasons:

  1. Their learning curve is low. With a minimum of training and good documentation, it is possible to achieve the documentation objectives set out for each collection.

  2. They are supported by most software that is based on more complex conceptual systems. This would allow the library, archives, and museums (if any) to work with the standards specific to their environment and periodically feed the common system with new datasets.

  3. They allow scaling up to national and international cooperation environments, if the appropriate systems and protocols are in place to do so (collective catalogues, international repositories).

  4. They are usually conceptually prepared for an LOD environment. The schemas we propose in this article are mostly application ontologies [49].

  5. They usually include properties or attributes that are well adapted to the needs of the proposed conceptual model.

  6. They facilitate the creation of specific application profiles for certain environments. This would allow us to include those properties required by our conceptual model, but which do not appear by default.

  7. They usually already have crosswalks/maps that facilitate migration between schemas.

Data structure standards for cultural heritage

There are two main data structure standards that we have considered using for collections of tangible and intangible cultural heritage. The first is Dublin Core, which emerged from the library environment for the cataloguing of electronic resources on the Internet and is now consolidated at the level of data interchange [26], since it is the basis for migrating data to a multitude of digital repositories, such as Europeana or DPLA. The second is LIDO, which developed out of the museum environment (ICOM) and is closely linked to CDWA and the CIDOC CRM ontology, of which it is an expression.

We ruled out other specific systems for museums/heritage materials, such as VRA Core and CDWA Lite, because they are superseded by LIDO, which is more of an institutional initiative through ICOM.

However, LIDO has a narrower range of application than Dublin Core, as reflected in Waldron & Webster’s report on VRA cataloguing and metadata [43], because it is used exclusively in a museum context and is a more complex system, although numerous initiatives have been established to promote its use [1].

With the above in mind, our proposal focuses on creating an application profile based on Dublin Core for the following reasons:

  1. It is easy to use and easily adapted by collections managers, which should be our basic objective.

  2. It is very well established and is the basis for large repositories (Europeana, DPLA).

  3. It allows data to be made ready for scaling and migration to other systems.

  4. It allows data to be prepared for LOD environments, as reflected in the various guides prepared by the Dublin Core Metadata Initiative [15] and in the fact that it is currently constituted as a formal ontology.

  5. It allows the establishment of different relationships between the items using the properties relation:isPartOf and relation:hasPart. This will allow the implementation of our levels of description (see 4.3.3.).

  6. It encourages the creation of application profiles, which allow us to add the management aspects recognised by ICOM/CIDOC that we have included in our conceptual model.

The proposed application profile (see “Appendix 1”) offers a flat structure for the user that connects with the different parts of our adaptation to the CIDOC CRM conceptual model.
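To make the role of the part/whole properties more concrete, the following sketch shows one way levels of description could be represented in a flat structure. It is an illustration under our own assumptions, not the application profile of Appendix 1: the identifiers, titles, and the three-level hierarchy are invented, and the dictionary keys simply mirror the isPartOf and hasPart properties mentioned above.

```python
# Flat records: each one points to its parent through isPartOf.
# Identifiers and titles are invented for this example.
records = {
    "coll-physics": {"title": "Physics instruments collection", "isPartOf": None},
    "set-optics": {"title": "Optics teaching set", "isPartOf": "coll-physics"},
    "item-0042": {"title": "Brass microscope", "isPartOf": "set-optics"},
}


def derive_has_part(records):
    """Build the inverse hasPart links from the isPartOf links."""
    has_part = {identifier: [] for identifier in records}
    for identifier, record in records.items():
        parent = record["isPartOf"]
        if parent is not None:
            has_part[parent].append(identifier)
    return has_part


def description_path(records, identifier):
    """Follow isPartOf upwards and return the path ordered from top level to item."""
    path = []
    seen = set()
    current = identifier
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in isPartOf chain at {current}")
        seen.add(current)
        path.append(current)
        current = records[current]["isPartOf"]
    return list(reversed(path))


print(derive_has_part(records))
print(description_path(records, "item-0042"))
```

Storing only isPartOf and deriving hasPart avoids the two directions drifting out of sync when cataloguers edit records by hand.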

Data structure standards for scientific heritage

Despite its advantages, Dublin Core does not adapt well to natural science collections, which are very common in the university environment, as they have very specific characteristics, such as the existence of taxonomies (botany, zoology) and/or of multiple descriptive fields (mineralogy).

Our analysis of existing metadata schemas has led to the identification of three main types of approaches:

  • Museum approach. Particularly noteworthy is the case of Italy, where the ICCD (Istituto Centrale per il Catalogo e la Documentazione) has developed specific cataloguing standards for botany, mineralogy, petrology, planetology, palaeontology, and zoology [32]. These are the basis for the Catalogo Generale dei Beni Culturali, which compiles Italy's impressive cultural and scientific heritage. Other initiatives can be found in Canada (CHIN Natural Sciences Data Dictionary) and ICOM itself, which has developed CRMsci, an extension of the CIDOC CRM model for science that is still in the draft phase.

  • Natural sciences approach. There are numerous metadata schemas that have been designed to deal with very specific areas (e.g. agriculture, mineral resources, biodiversity, and ecology). Many of these are collected at the DCC (Digital Curation Centre). Because of their presence in a university environment and in collections management, the structural standards developed by the Biodiversity Information Standards (TDWG) are worth highlighting here, particularly the following:

    ◦ Darwin Core, a Dublin Core application profile, ratified as a standard in 2009, which has been used for cataloguing zoological collections to record taxonomies. Its purpose, however, is the cataloguing of biodiversity, not the cataloguing of specific items in a museum environment. In other words, what is important in this schema is the event (the sighting of a bear in the Pyrenees, for example), rather than the item itself. Darwin Core is currently used to feed international biodiversity repositories, such as GBIF, in which many universities participate.

    ◦ ABCD (Access to Biological Collection Data), initiated as a project in 2000 and consolidated as a standard in 2005, which allows the collection of information on natural science collections and specific specimens. It feeds the BioCase and GBIF portals and has developed crosswalks to Darwin Core that allow interoperability between the two schemas. It also allows data to be exported to LIDO.

    ◦ ABCD_EFG (extension for geosciences), which was developed in 2005 out of ABCD, since there was a lack of standards in the field of earth sciences. This metadata schema is the one used in the GeoCASE portal, the equivalent of GBIF in the field of geoscience.

None of the above schemas allow museum management; at most there are elements linked to conservation and preservation within schemas linked to palaeontology.

  • Other approaches. There are approaches that do not fall into either of the two categories above, but which can be very interesting for the construction of databases. For example, Wikipedia’s infoboxes [45] have very elaborate templates that could be useful as metadata schemas in the natural sciences, as shown in Fig. 6.

Fig. 6 Mineral infobox template and example. Source: https://en.wikipedia.org/wiki/Template:Infobox_mineral

It should be noted that although at the time of writing Schema.org had not yet developed metadata for the natural sciences, its unstoppable growth suggests that at some point this will become an option.

In addition, there is a whole series of approaches that could be described as “hand-made”, something very common in university collections. Many of these began being automated when there were no metadata standards, using simple resources such as Microsoft Excel or Microsoft Access. Some attempts have also been made to generate application profiles of widely used metadata schemas, such as the case of the Colorado School of Mines [9], which adapted Dublin Core to the cataloguing of minerals.

From our point of view, based on the principles outlined in Sect. 4.4.2., collections in the natural sciences need a metadata schema that is simple but that can also be migrated to established standards. Two options are worth considering:

  1. Adapt the environment's own systems (ABCD, ABCD_EFG and Darwin Core) in accordance with the managers’ needs and with the pre-existing metadata schemas of the various collections.

  2. Adapt Dublin Core to the needs of scientific collections.

In both cases, it would be necessary to add the fields that have been determined for the museum management of the items (it is important to bear in mind that these types of items have a great deal of movement due to exchanges, donations, loans, and transfers) and the necessary mapping to be able to enter the records into major international repositories.

From our point of view, the first is the better option, as the metadata schemas of the natural sciences environment are more appropriate for these collections. However, the use of all of their fields may exceed the capacities of cataloguers, and may also often be unnecessary. This question requires in-depth analysis in future research.
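To illustrate the first option, the sketch below combines a small subset of Darwin Core terms with management fields in a single specimen record, and then separates what could be sent to an aggregator from what stays internal. The Darwin Core term names used are part of that standard, but the `mgmt:` prefix, the management field names, and all sample values are assumptions made for this example only.

```python
# A small subset of Darwin Core terms; the full standard defines many more.
DWC_TERMS = {
    "institutionCode",
    "catalogNumber",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "country",
}

# Sample record with invented values. Keys starting with "mgmt:" are
# hypothetical management fields and are not part of Darwin Core.
specimen = {
    "institutionCode": "UNIV",
    "catalogNumber": "ZOO-1187",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Ursus arctos",
    "eventDate": "1923-06-14",
    "country": "Spain",
    "mgmt:currentLocation": "Store 2, cabinet 14",
    "mgmt:loanStatus": "on loan",
}


def split_for_export(record):
    """Separate exportable Darwin Core fields from internal management fields."""
    export, internal, unknown = {}, {}, {}
    for key, value in record.items():
        if key in DWC_TERMS:
            export[key] = value
        elif key.startswith("mgmt:"):
            internal[key] = value
        else:
            unknown[key] = value  # flag anything the profile does not define
    return export, internal, unknown


export, internal, unknown = split_for_export(specimen)
print(sorted(export))
print(sorted(internal))
```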

Data content standards

If Dublin Core is chosen as the standard, we have a problem of content, as it avoids "forcing" and only gives very generic recommendations that do not have to be followed [15]. The original purpose of such flexibility was to facilitate the implementation of the standard, but it can lead to major inconsistencies in the data. It is therefore essential to limit the values that users can write, either through Data Value Standards or through Data Content Standards.

For content standards, there are three main options:

  1. Adapt the content standards specific to one of the GLAM institutions, which are listed in Table 2.

  2. Use the content recommendations of specialised structural standards, such as CDWA, which are so complete that in many cases these same standards can serve as a content standard.

  3. Use specific international standards for specific aspects in certain fields, such as ISO standards for formatting dates or geographical coordinates.

Our proposal is to adopt the content standards recommended by Dublin Core. However, if they are not suitable, or if no standard is recommended, we propose the use of the content recommendations of the museum standards (i.e. CDWA and CCO), filtering them to adapt them to the characteristics of our collection and our software. In other words, the proposal is to:

  1. Develop a data dictionary/metadata record in which the characteristics stated for each of the properties are briefly adapted to CDWA/CCO (see 4.6).

  2. Link the data dictionary to the specific point of CDWA or CCO where this property is treated in case the cataloguer wants to obtain more information.
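Content rules of this kind can often be enforced at data entry. As a hedged sketch of the ISO-based option mentioned above, the functions below accept only ISO 8601 calendar dates (YYYY-MM-DD) and decimal-degree coordinates within the valid ranges. The function names and the choice of these two checks are our own assumptions for illustration, not part of any of the standards cited.

```python
from datetime import date


def normalise_date(value):
    """Return the date as YYYY-MM-DD, or raise ValueError if it is not ISO 8601."""
    return date.fromisoformat(value.strip()).isoformat()


def validate_coordinates(latitude, longitude):
    """Return (lat, lon) as floats, or raise ValueError if out of range."""
    lat = float(latitude)
    lon = float(longitude)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


print(normalise_date(" 1923-06-14 "))
print(validate_coordinates("41.3866", "2.1640"))

# A day/month/year string is rejected instead of being silently stored.
try:
    normalise_date("14/06/1923")
except ValueError as error:
    print("rejected:", error)
```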

Data value standards

Value standards, also known more recently as KOS (Knowledge Organisation Systems), indicate the accepted values for a given property. In their simplest version, they constitute what we call lists of terms, but they can include complex language systems such as thesauri. They work in conjunction with content standards because when a term does not appear in the list of values, it is then necessary to construct the term anew using the recommendations given by the content standard.

It is important to emphasise that we must avoid creating our own lists as much as possible. There are various vocabularies available today that are adapted to the LOD environment, which can be perfectly useful for our purposes. In the field of cultural heritage, international tools such as VIAF (people and institutions), Nomenclature 4.0 (classification), Geonames (geographical locations) or AAT (materials or types of objects) are highly recommended. The use of these tools involves something of a learning curve, but it is better for collections managers to learn how to use these tools rather than spending time and effort creating new ones. However, there may be exceptions to this rule, such as:

  1. In some cases, where there are well-developed local tools, their use may be preferable to global standards. For instance, at our centre, rather than using AAT directly, we chose to use the UB Thesaurus, developed by the university library for subjects, as it was already adapted to linked data, had links to international standards and was fully adapted to our needs in terms of concepts. Moreover, if it was necessary to add new terms, the library could add them without any problem. On the other hand, for creators we opted for VIAF, which was also used by the library itself.

  2. In some cases, where the list may be shorter than it is for subjects, it may be useful to make a pre-selection of terms to give more guidance to cataloguers. For example, in our case, in terms of materials and techniques, the AAT was adequate, but a selection was made of the terms that might appear more frequently as a way of facilitating the work and ensuring consistency. The same was done with the levels of Nomenclature 4.0, which are too detailed for our needs.
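A pre-selection of this kind can be applied as a simple check at data entry. The sketch below is an assumption-laden illustration: the list of materials is invented and much shorter than any real selection, and in practice each label would be stored together with the URI of the corresponding concept in the source vocabulary.

```python
# Invented pre-selection of material terms, for illustration only.
PRESELECTED_MATERIALS = {"brass", "glass", "wood", "paper"}


def check_term(value, allowed=PRESELECTED_MATERIALS):
    """Normalise a cataloguer's entry and report whether it is pre-selected.

    Returns (normalised_value, is_allowed). Entries that are not in the list
    are not rejected outright: they can be queued so that an information
    professional looks them up in the full vocabulary.
    """
    normalised = " ".join(value.split()).lower()
    return normalised, normalised in allowed


entries = ["  Brass ", "glass", "ivory"]
to_review = [term for term, ok in map(check_term, entries) if not ok]
print(to_review)
```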

4.5 Before starting to catalogue: the choice of computer system

There are numerous systems, including open-source systems, that may be suitable for our purposes and allow us to implement the designed application profile and the selected standards. Alcaraz offers a good comparison of some of these systems [6]. Although it is not the aim of this article to indicate which system should be used, we can point out some of the features it should have:

  • Support for different metadata schemas, especially Dublin Core.

  • Integrated mappings to other metadata standards.

  • Capacity to create application profiles from certain metadata schemas.

  • Capacity to define your own controlled values and to ingest externally controlled values.

  • Support for OAI-PMH, the protocol that will allow aggregators to collect your records and send them to Europeana.

  • Support for image management using IIIF.

  • Use of semantic web standards, which should allow for features such as: introduction and application of ontologies, IRI generation for resources, connection to LOD datasets, SPARQL endpoint, semantic enrichment options, etc.

  • User interfaces that are easy to understand, visual, and customisable.

  • Importing systems that allow you to work with the main serialisations of the work environment. Normally, CSV, XML (and its variants), JSON, or Turtle.
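As an illustration of what OAI-PMH support means for an aggregator, the sketch below builds the URL of a `ListRecords` request for Dublin Core records. The verb and the `metadataPrefix`, `set`, and `from` arguments are defined by the protocol, and `oai_dc` is its unqualified Dublin Core format; the base URL and the set name are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (nothing is fetched)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict harvesting to one collection
    if from_date:
        params["from"] = from_date  # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name.
url = list_records_url("https://heritage.example.edu/oai", set_spec="physics")
print(url)
```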

4.6 Before starting to catalogue: documenting the process

The end point of the whole system should be a document containing all the decisions relating to the final database. Depending on the environment, this document will have different names. Computer scientists call them data dictionaries [13], while information specialists, especially in a semantic web environment, refer to them as metadata registries [49]. Other names include systems catalogues, glossaries, and database manuals. But whatever the name, this document must be a tool that helps collections managers to enter the data, property by property, keeping doubts to a minimum and in the simplest possible way, without the need for a potentially overwhelming level of technical knowledge. For example, in a field in which users must enter URIs that facilitate subsequent semantic enrichment, we do not need to explain in detail what this enrichment is, but we do need to explain how we want them to enter the data.

We propose a data registry/dictionary model based on ISO 11179-2013 [23], a standard that supports the development of metadata registries, supplemented by some elements of Coronel and Morris’ data dictionary [13]. Table 3 presents an example of this.

Table 3 Data dictionary
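A data dictionary of this kind can also be kept in machine-readable form so that the cataloguing system can check records against it. The sketch below is only a rough illustration: the attribute names are our own simplification and do not reproduce the columns of Table 3 or the terminology of ISO 11179, and the two sample entries are invented.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    """One property of the application profile, described for cataloguers."""
    name: str
    definition: str
    obligation: str  # "mandatory" or "optional"
    repeatable: bool
    value_standard: str = ""  # e.g. the vocabulary to pick values from
    content_guidance: str = ""  # short instruction on how to enter the data
    example: str = ""


# Invented sample entries.
ENTRIES = [
    DictionaryEntry(
        name="title",
        definition="Name given to the item.",
        obligation="mandatory",
        repeatable=False,
        example="Brass microscope",
    ),
    DictionaryEntry(
        name="creator",
        definition="Person or body responsible for making the item.",
        obligation="optional",
        repeatable=True,
        value_standard="VIAF",
        content_guidance="Paste the URI of the matching authority record.",
    ),
]


def missing_mandatory(record, entries=ENTRIES):
    """Return the names of mandatory properties that are absent or empty."""
    return [
        entry.name
        for entry in entries
        if entry.obligation == "mandatory" and not record.get(entry.name)
    ]


print(missing_mandatory({"creator": ["Unknown"]}))
```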

4.7 Summary of the work methodology

See Fig. 7.

Fig. 7 Summary of procedures for non-museum university collections

5 Conclusion

The proposal for the documentation of university heritage outlined in this article is designed to respond to the following unquestionable facts:

  • The metadata must be of high quality.

  • The metadata must comply with the FAIR principles.

  • There are numerous tools and standards for quality metadata, but they are often sectorial in nature.

  • University heritage is highly interdisciplinary.

  • The people in charge of most university collections work part-time and need simple and very targeted systems.

Consequently, we propose the creation of an application profile based on Dublin Core, the most widely used of the structural interchange standards, while preserving elements of the CIDOC CRM conceptual model, which will become properties of this profile. We also propose the use of value and content standards specific to the sector to populate the properties that make up the profile.

We believe that our proposal allows these professionals to work part-time but does not prevent full-time library and archive professionals from submitting their records to the central heritage system.

To manage this system, we propose the creation of a steering committee made up of various university professionals under the direction of the relevant vice-rectorate. Decisions on metadata should be left in the hands of information professionals, be they librarians, archivists, or professors in the university's information science faculties. The composition of the steering committee and its functions should probably be set out in the regulations of the university's heritage unit as a way of consolidating its existence.

We believe that our proposal, summarised in point 4.7, offers a sustainable approach to the documentation of university collections, although it only focuses on one specific type of heritage, a limitation that will need to be addressed in subsequent studies.

6 Proposals for the future

Although this paper focuses mainly on tangible cultural heritage, it acknowledges that there are other types of heritage that need to be looked at in greater depth. For example, as far as natural science collections are concerned, Darwin Core could be a suitable system for zoological and botanical collections (many of which already collaborate with GBIF), but further study is necessary to identify the right system for geosciences and other natural sciences. A more in-depth study would therefore be needed to determine whether a common application profile could be used for all natural science fields.

Human heritage is a more complex question. This type of heritage is not defined by UNESCO and has been discussed mainly in the field of tourism. Research on how to deal with this type of heritage is still in its infancy, as it is difficult to adapt it to standard scientific criteria. It is worth considering, however, that the analysis would probably have to start with metadata standards that collect biographical data on people, such as Schema.org's Person (or its predecessor, FOAF), ISAAR CPF, or CDWA's Person/Corporate Body Authority.