1 Introduction

In recent decades, there has been increasing demand to study climate change and to mitigate its impact on ecology, social vulnerability and economic activities. Historical meteorological observations play a crucial role in understanding climate change. However, meteorological and climatological measurements can only be reliable if their measurement methods are traceable to the International System of Units (SI). In fact, the Manual on the Global Observing System [1] suggests that meteorological stations should be equipped with properly calibrated instruments. Lopardo et al. [2] state that the evolution of weather parameters can only be compared if the uncertainty of their measurements is known. Kowal et al. [3] highlight the importance of instrument stability for climatological studies and encourage the planning of calibration programs for measurement devices. In this context, a calibration program with a thorough analysis of the uncertainty sources improves the reliability of the data and contributes significantly to the study of climate.

The Guide to the Expression of Uncertainty in Measurement (GUM) [4] describes measurement uncertainty as the doubt about the validity of a measurement result. To evaluate the uncertainty and its causes, a calibration must be carried out. A calibration consists of comparing an instrument against a reference, which reveals the errors in the instrument readings. To get the most out of the calibration of meteorological instruments, the calibration points must be selected according to the real environmental conditions in which the thermometer operates. From a climatological point of view, calibration improves climate data collection: once the errors are known and the respective corrections applied, the data support more accurate climate models.

Recently, metrologists have been involved in improving data quality for meteorological and climatological studies. In 2012, the Istituto Nazionale di Ricerca Metrologica (INRIM) manufactured a transportable calibration chamber for on-site calibration campaigns [5, 6]. This chamber is a prototype travelling standard, aligned with SI standards. The chamber was used to calibrate air temperature sensors in the Himalayas [7], and a new version was used for the calibration at the Osservatorio meteorologico di Moncalieri [8], where INRIM periodically calibrates the air thermometers. A further prototype is operated at the Arctic Station of Ny-Ålesund [9] (https://nyalesundresearch.no/infrastructures/the-metrology-laboratory-at-vaskerilab/).

The Osservatorio meteorologico di Moncalieri was founded in 1859, and since 1865 the air temperature has been recorded without interruption, even in wartime, making it a centennial station recognized by the World Meteorological Organization (WMO). With a calibration program specifically developed for such centennial stations, INRIM guarantees data traceability through periodic verification and re-calibration of the instrumentation involved. An on-site calibration of two air thermometers of the Osservatorio di Moncalieri was first performed in 2012. In 2016, further thermometers were added, after being calibrated in a private laboratory in collaboration with INRIM. In 2022, the on-site calibration was repeated. It is well known that instrument properties can change over time, especially under exposure to environmental conditions. Therefore, to evaluate instrument stability as drift over time, successive calibrations should be carried out, with the associated corrections and continuous traceability to national standards.

A previous analysis considered the air thermometers belonging to the Osservatorio Meteorologico di Moncalieri. The study by Bertiglia et al. [8] describes a fully traceable procedure for the on-site calibration of air temperature sensors in the transportable thermal chamber. The climatic trend was also evaluated by comparing the raw data with the data corrected according to the calibration report. As expected, the analysis shows that applying the calibration function has an impact on the recorded data. In line with these past activities, the present paper describes an on-site calibration procedure. The calibration was carried out at the Osservatorio meteorologico di Moncalieri in October 2022. The evaluation of the uncertainty contributions and the analysis of the results are presented. Finally, the results of the different calibrations are compared, also with the aim of correcting instrumental drift, one of the key factors in data homogenisation in climatology.

2 Data and Methodology

From 17-10-2022 to 19-10-2022, three thermometers (Siap+Micros) from the Osservatorio Meteorologico di Moncalieri were calibrated. Figures 1 and 2 show the thermometers under calibration (TUCs), called Capannina (model t001), Vent Capannina (model t001) and Torretta (model SM3840). These thermometers were calibrated by comparison against INRIM travelling standards in a portable thermal chamber (Fig. 3) at \(-15 ^{\circ }\hbox {C}\), \(0 ^{\circ }\hbox {C}\), \(15 ^{\circ }\hbox {C}\), \(26 ^{\circ }\hbox {C}\) and \(39 ^{\circ }\hbox {C}\), chosen according to the temperature range of the area. The TUCs were calibrated simultaneously, and the readings of each sensor were compared with the readings of one of three Pt100 reference thermometers (RTs), called RT1, RT2 and RT3. The RTs were calibrated by INRIM against reference standards and are traceable to the fixed points of the International Temperature Scale of 1990 (ITS-90). The expanded uncertainties of the RTs were no higher than \(0.02 ^{\circ }\hbox {C}\). The readings of the reference sensors were recorded by a high-precision multimeter, a Fluke Super-DAQ 1586, while the readings of each TUC were taken from its respective datalogger.

Each TUC was placed in the thermal chamber near its associated RT. Given their relative positions in the calibration volume, Capannina was paired with RT2, Vent Capannina with RT3 and Torretta with RT1. To reduce the effect of the temperature gradient, the thermometers were set at the same height.

Fig. 1 Picture of Capannina (left) and Vent. Capannina (right), in the Stevenson screen, sited on the balcony of the Osservatorio meteorologico di Moncalieri

Fig. 2 Picture of Torretta, sited on the roof of the Osservatorio meteorologico di Moncalieri

Fig. 3 The thermal chamber and the datalogger used in the calibration

To limit the contribution of chamber instability to the overall calibration uncertainty, only the readings taken after stabilisation were considered. For this study, stabilisation required the differences between successive readings of the same RT to be within \(\pm 0.01^{\circ }\hbox {C}\). Once thermal stabilisation was reached, at least 20 readings, taken every 30 s, were considered for each calibration point. The stabilisation time was not the same for all calibration points. For example, temperatures lower than \(0 ^{\circ }\hbox {C}\) and higher than \(25 ^{\circ }\hbox {C}\) required more than 5 h.
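The stabilisation criterion can be illustrated with a minimal Python sketch. It assumes that the readings of one RT are available as a list sampled every 30 s; the function name `first_stable_index`, the default window of 20 readings and the sample values are hypothetical and are not taken from the campaign data.

```python
def first_stable_index(readings, tol=0.01, window=20):
    """Return the index of the first reading that starts a run of `window`
    consecutive readings whose successive differences all stay within
    +/- tol, or None if no such run exists."""
    run_start = 0
    for i in range(1, len(readings)):
        if abs(readings[i] - readings[i - 1]) > tol:
            run_start = i  # step too large: a new candidate run starts here
        if i - run_start + 1 >= window:
            return run_start
    return None


# Hypothetical RT readings in degrees Celsius, one every 30 s:
# a cooling ramp followed by a stable plateau.
readings = [0.50, 0.40, 0.30, 0.20, 0.10] + [0.002 * (k % 2) for k in range(25)]
print(first_stable_index(readings))
```

Only the readings from the returned index onwards would then enter the evaluation of the calibration point.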

The aim of a calibration is to determine the correction to be applied to the evaluated instrument and the associated measurement uncertainty. The correction is defined as the difference between the readings of the reference and of the calibrated instrument, and the results can be expressed as a calibration curve, which gives the relationship between an indication, in this case the air temperature, and the corresponding measured value [10]. For this study, the calibration function was calculated from the 5 temperature calibration points and defined as:

$$\begin{aligned} T = T_{TUC}+ a T_{TUC}^2+ b T_{TUC}+ c \end{aligned}$$
(1)

Equation 1 is a second-order polynomial model, where T represents the air temperature and \(T_{TUC}\) is the temperature measured by the thermometer under calibration; a, b and c are the coefficients of the model. Each calibrated thermometer has its own calibration function, determined by its a, b and c values. In this way, Eq. 1 can be used to predict the correction value for any temperature \(T_{TUC}\) [4].
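As an illustration of how the coefficients of Eq. 1 could be obtained, the following Python sketch fits a second-order polynomial to five correction values by ordinary least squares and then applies Eq. 1 to a reading. The fitting method actually used in the campaign is not stated in the text, so least squares is an assumption here; the function names and the numerical values are hypothetical.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = a*x**2 + b*x + c via the normal equations."""
    # Power sums needed for the 3x3 normal-equation system
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented matrix, unknowns ordered (a, b, c)
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def corrected_temperature(t_tuc, a, b, c):
    """Eq. 1: TUC reading plus the fitted correction."""
    return t_tuc + a * t_tuc ** 2 + b * t_tuc + c


# Hypothetical TUC readings and corrections (T_RT - T_TUC), in degrees Celsius
t_tuc = [-15.0, 0.0, 15.0, 26.0, 39.0]
corr = [-0.40, -0.10, 0.02, -0.05, -0.30]
a, b, c = fit_quadratic(t_tuc, corr)
print(a, b, c, corrected_temperature(20.0, a, b, c))
```

With only five points and three coefficients, the fit leaves two degrees of freedom for the residuals, which is one reason the interpolation term can weigh heavily in the uncertainty budget.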

Table 1 shows the contributions to the overall uncertainty budget. \(\sigma _{res,TUC}\) is the contribution due to the TUC resolution, given by the smallest division of the temperature reading. \(\sigma _{rep,TUC}\) and \(\sigma _{rep,RT}\) represent the repeatability uncertainties, derived from the standard deviations of the TUC and RT records, respectively. The uncertainty due to the RT calibration, \(\sigma _{cal,RT}\), is obtained from the calibration certificate issued by INRIM. \(\sigma _{DL}\) is the uncertainty contribution of the datalogger readings, and its value is provided by the datalogger manual. The datalogger used for this campaign was purchased immediately beforehand and came with a fresh traceable calibration certificate from the manufacturer, so its overall uncertainty is derived from its calibration. The calibration curve carries an uncertainty associated with the interpolation process, which is obtained from the residuals of the calibration curve (Eq. 2):

$$\begin{aligned} \sigma _{interp} = a T_{TUC}^2+ b T_{TUC}+ c - (T_{RT}-T_{TUC}) \end{aligned}$$
(2)

As mentioned before, the difference between the RT and TUC temperature readings is known as the correction. Whenever possible, if the instrument has any known error, it should be corrected with the information given by the calibration certificates.
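Equation 2 can be evaluated point by point once the coefficients are known. The sketch below is a minimal Python illustration with hypothetical coefficients and readings; how the individual residuals are condensed into a single \(\sigma _{interp}\) value is not specified in the text, so the sketch stops at the residuals themselves.

```python
def interpolation_residuals(t_tuc, t_rt, a, b, c):
    """Eq. 2 at each calibration point: fitted correction minus the
    observed correction (T_RT - T_TUC)."""
    return [
        a * tt ** 2 + b * tt + c - (tr - tt)
        for tt, tr in zip(t_tuc, t_rt)
    ]


# Hypothetical coefficients and mean readings in degrees Celsius
a, b, c = -4.0e-4, 1.0e-2, -0.10
t_tuc = [-14.60, 0.10, 15.00, 26.05, 39.30]
t_rt = [-15.00, 0.00, 15.02, 26.00, 39.00]
for tt, r in zip(t_tuc, interpolation_residuals(t_tuc, t_rt, a, b, c)):
    print(f"T_TUC = {tt:7.2f}  residual = {r:+.3f}")
```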

Regarding the chamber, two uncertainty sources are identified: instability and inhomogeneity. The uncertainty due to the chamber instability, \(\sigma _{inst}\), is determined from the temporal variation of the temperature. The uncertainty due to the chamber inhomogeneity, \(\sigma _{inh}\), reflects the temperature variation among different points of the chamber. Both were calculated from the RT records, using more than 190 records for each calibration point. \(\sigma _{inst}\) is calculated as the standard deviation of the mean temperature values recorded by the RTs (Eq. 3).

$$\begin{aligned} \sigma _{inst} = \sqrt{\frac{\sum _{i=1}^{n} \left( {T}_{i} - {\bar{T}} \right) ^2 }{n-1}} \end{aligned},$$
(3)

where n is the number of records in the sample and i indexes each record. Since the RTs are connected to the same datalogger, the acquisition time is the same for all the RTs. \(T_{i}\) represents the mean temperature of the 3 RTs at each acquisition time, and \({\bar{T}}\) is the mean temperature of the sample.
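Equation 3 translates directly into a few lines of Python. The sketch below assumes three equally long lists of simultaneous RT readings; the function name and the sample values are hypothetical.

```python
from statistics import mean, stdev


def chamber_instability(rt1, rt2, rt3):
    """Eq. 3: sample standard deviation of the per-timestamp mean
    temperature of the three reference thermometers."""
    t_i = [mean(triple) for triple in zip(rt1, rt2, rt3)]
    return stdev(t_i)  # statistics.stdev divides by n - 1, as in Eq. 3


# Hypothetical simultaneous readings in degrees Celsius
rt1 = [15.01, 15.02, 15.00, 15.03]
rt2 = [15.05, 15.06, 15.04, 15.06]
rt3 = [14.98, 14.99, 14.98, 15.00]
print(chamber_instability(rt1, rt2, rt3))
```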

\(\sigma _{inh}\) is determined from the maximum temperature difference recorded within the chamber. For this, the maximum and minimum temperature values reached by any sensor are considered. Then, the inhomogeneity uncertainty is defined as the difference between them (Eq. 4).

$$\begin{aligned} \sigma _{inh} = max \left( T_{RTs} \right) - min \left( T_{RTs} \right) . \end{aligned}$$
(4)
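A literal reading of Eq. 4 takes the extreme values over all RT records of a calibration point. The Python sketch below follows that reading, which is an interpretation of the text, and also shows the division by \(\sqrt{12}\) that Eq. 7 applies to this term; names and values are hypothetical.

```python
from math import sqrt


def chamber_inhomogeneity(*rt_series):
    """Eq. 4: maximum minus minimum temperature over all RT records."""
    all_readings = [t for series in rt_series for t in series]
    return max(all_readings) - min(all_readings)


def rectangular_standard_uncertainty(full_width):
    """Standard uncertainty of a rectangular distribution of given full width."""
    return full_width / sqrt(12)


# Hypothetical RT records in degrees Celsius at one calibration point
sigma_inh = chamber_inhomogeneity(
    [15.01, 15.02, 15.00],
    [15.05, 15.06, 15.04],
    [14.98, 14.99, 14.98],
)
print(sigma_inh, rectangular_standard_uncertainty(sigma_inh))
```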
Table 1 Uncertainty contributions in the on-site calibration

As can be observed, the measurement uncertainties come from different sources. To combine them, the contributions must be expressed in the same terms. Hence, the standard uncertainties, \(\mu\), are calculated, taking into account the probability distribution of each contribution. Assuming that the standard uncertainties are independent, they can be combined by summation in quadrature to give the combined standard uncertainty (\(\mu _{c}\)):

$$\begin{aligned} \mu _{c}^2= \sum _{i=1}^{n} c_{i}^2\mu _{i}^2 \end{aligned}$$
(5)

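where i indexes each source of uncertainty and \(c_{i}\) is its corresponding sensitivity coefficient. Taking all sensitivity coefficients equal to 1, Eq. 5 becomes Eq. 6, and dividing each contribution by the factor associated with its probability distribution gives Eq. 7: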

$$\begin{aligned} \mu _{c}= & {} \sqrt{\mu _{res,TUC}^2 + \mu _{rep,TUC}^2 + \mu _{cal,RT}^2 + \mu _{rep,RT}^2 + \mu _{DL}^2+ \mu _{interp}^2+\mu _{inst}^2 + \mu _{inh}^2 } \end{aligned}$$
(6)
$$\begin{aligned} \mu _{c}= & {} \sqrt{ \left( \frac{\sigma _{res,TUC}}{\sqrt{12}} \right) ^{2} +\sigma _{rep,TUC}^{2} + \left( \frac{\sigma _{DL}}{2}\right) ^{2} + \left( \frac{\sigma _{cal,RT}}{2}\right) ^{2} + \sigma _{rep,RT}^{2} +\sigma _{interp}^{2} + \sigma _{inst}^{2} + \left( \frac{\sigma _{inh}}{\sqrt{12}}\right) ^{2} } \end{aligned}$$
(7)
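Equation 7 can be written as a short Python function. In this sketch the divisors are read as follows, which is an interpretation rather than something stated explicitly in the text: \(\sqrt{12}\) converts a full-width rectangular contribution into a standard uncertainty, and 2 converts a value quoted as an expanded uncertainty with k = 2. The argument names and the sample values are hypothetical.

```python
from math import sqrt


def combined_standard_uncertainty(
    res_tuc, rep_tuc, dl, cal_rt, rep_rt, interp, inst, inh
):
    """Eq. 7: quadrature sum of the eight standard uncertainties."""
    terms = [
        res_tuc / sqrt(12),  # TUC resolution, rectangular
        rep_tuc,             # TUC repeatability
        dl / 2,              # datalogger, divided by 2 as in Eq. 7
        cal_rt / 2,          # RT calibration, divided by 2 as in Eq. 7
        rep_rt,              # RT repeatability
        interp,              # interpolation residual
        inst,                # chamber instability
        inh / sqrt(12),      # chamber inhomogeneity, rectangular
    ]
    return sqrt(sum(u ** 2 for u in terms))


# Hypothetical contributions in degrees Celsius
u_c = combined_standard_uncertainty(
    res_tuc=0.1, rep_tuc=0.02, dl=0.01, cal_rt=0.02,
    rep_rt=0.005, interp=0.10, inst=0.01, inh=0.30,
)
print(u_c)
```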

Finally, the coverage factor k is obtained from the effective degrees of freedom given by the Welch–Satterthwaite equation, and the expanded uncertainty is calculated as:

$$\begin{aligned} U_{T} = k\mu _{c} \end{aligned}$$
(8)
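The Welch–Satterthwaite equation gives the effective degrees of freedom \(\nu _{eff} = \mu _{c}^4 / \sum _{i} (c_{i}^4 \mu _{i}^4 / \nu _{i})\), from which k follows as the corresponding Student's t quantile. The Python sketch below computes \(\nu _{eff}\) and applies Eq. 8; the function names, the degrees of freedom assigned to each contribution and the numerical values are hypothetical, and the coverage factor is passed in rather than looked up in a t table.

```python
def effective_degrees_of_freedom(contributions):
    """Welch-Satterthwaite formula.

    `contributions` is a list of (u_i, nu_i) pairs, with sensitivity
    coefficients already applied to u_i. Use float("inf") for nu_i when
    a contribution is treated as exactly known (typical for Type B terms).
    """
    u_c_squared = sum(u ** 2 for u, _ in contributions)
    denominator = sum(u ** 4 / nu for u, nu in contributions)
    if denominator == 0:
        return float("inf")
    return u_c_squared ** 2 / denominator


def expanded_uncertainty(u_c, k=2.0):
    """Eq. 8: expanded uncertainty for coverage factor k."""
    return k * u_c


# Hypothetical budget: two Type A terms from 20 readings, one Type B term
budget = [(0.02, 19), (0.005, 19), (0.03, float("inf"))]
print(effective_degrees_of_freedom(budget), expanded_uncertainty(0.15))
```

For 19 degrees of freedom the two-sided 95 % Student's t value is about 2.09, which is consistent with rounding the coverage factor to 2.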

The procedure described above follows the GUM [4] and takes into account the uncertainty guide of the National Physical Laboratory (NPL) [11].

3 Results

Table 2 shows the corrections of each thermometer and the associated expanded uncertainties, for a confidence level of 95 \(\%\), 19 degrees of freedom and a coverage factor rounded to 2. Torretta presents the highest measurement uncertainties, probably influenced more by the characteristics of the sensor (resolution and repeatability) than by the chamber uncertainties or the RT properties. For negative temperatures, the 3 thermometers present large corrections. The correction is also large for temperatures near \(38 ^{\circ }\hbox {C}\). For these values, the associated expanded uncertainty is lower than the correction, so the margin of doubt is acceptable.

Table 2 Calibration results for the three thermometers of Moncalieri. T is the TUC average temperature, C is the TUC correction temperature and \(U_{T}\) is the expanded uncertainty

Table 3 shows the coefficients of the calibration curve for each thermometer and the declared expanded uncertainty, which corresponds to the highest uncertainty reached in the calibration procedure (see Table 2). The temperature corrections, with the uncertainties declared in Table 3, are plotted in Fig. 4. The curves converge for temperatures between \(10 ^{\circ }\hbox {C}\) and \(20 ^{\circ }\hbox {C}\). At \(15 ^{\circ }\hbox {C}\), one of the calibration points, the corrections are close to \(0^{\circ }\hbox {C}\), meaning that the TUC and RT readings were very similar. Figure 4 also shows that the behaviour of Torretta differs markedly from that of the other instruments, especially for air temperatures below \(0 ^{\circ }\hbox {C}\), as expected from the corrections and uncertainties reported in Table 2.

Table 3 Values of the coefficients a, b and c of the calibration curve for each thermometer under calibration and its expanded uncertainty \(U_{T}\)
Fig. 4 Calibration curve of Capannina, Torretta and Vent Capannina thermometers from the calibration in 2022

4 Discussion

4.1 Uncertainty Analysis

Tables 4, 5 and 6 summarise the influence of each standard uncertainty on the overall expanded uncertainty. For temperatures below \(0 ^{\circ }\hbox {C}\), the measurements are influenced by condensation inside the chamber, which directly affects their stability. Since the TUC dataloggers remained outside the chamber, it was not possible to close it completely. The temperature difference between the inside of the chamber and the ambient air therefore generated condensation, degrading the performance of the chamber and increasing the inhomogeneity contribution to the overall expanded uncertainty. In addition, because of the time required to take the measurements, all the thermometers had to be inserted together with the three reference ones. This added extra heat and air flows from and to the outside of the chamber, limiting the control capability of the overall system. The inhomogeneity has the highest impact for values below \(0^{\circ }\hbox {C}\) and near \(39 ^{\circ }\hbox {C}\), indicating that the farther the chamber temperature is from ambient, the worse the inhomogeneity.

Table 4 Percent contribution of relative standard uncertainty and combined uncertainty \(\mu _{c}\), in \(^{\circ }\hbox {C}\), in the Capannina calibration
Table 5 Percent contribution of relative standard uncertainty and combined uncertainty \(\mu _{c}\), in \(^{\circ }\hbox {C}\), in the Vent Capannina calibration
Table 6 Percent contribution of relative standard uncertainty and combined uncertainty \(\mu _{c}\), in \(^{\circ }\hbox {C}\), in the Torretta calibration

Table 2 reveals that Capannina and Torretta present their highest expanded uncertainty at the calibration point near \(0 ^{\circ }\hbox {C}\), namely \(0.31 ^{\circ }\hbox {C}\) and \(0.62 ^{\circ }\hbox {C}\), respectively. In addition to the inhomogeneity, the interpolation process is mainly responsible for the large overall uncertainty (Tables 4 and 6). For Vent Capannina, the highest expanded uncertainty, \(0.30 ^{\circ }\hbox {C}\), was registered close to \(15 ^{\circ }\hbox {C}\). As for the other sensors, the main contribution is the interpolation, which accounts for more than 83 \(\%\).
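One common way to express the weight of each source in an uncertainty budget is its share of the combined variance. The Python sketch below follows that convention as an assumption, since the formula behind the percentages in Tables 4, 5 and 6 is not stated in the text; the function name, source labels and values are hypothetical.

```python
def percent_contributions(standard_uncertainties):
    """Map each source name to its share of the combined variance, in percent.

    Assumes independent contributions with unit sensitivity coefficients
    and at least one non-zero value.
    """
    total = sum(u ** 2 for u in standard_uncertainties.values())
    return {
        name: 100.0 * u ** 2 / total
        for name, u in standard_uncertainties.items()
    }


# Hypothetical standard uncertainties in degrees Celsius
shares = percent_contributions(
    {"interp": 0.14, "inh": 0.05, "rep_TUC": 0.03, "cal_RT": 0.01}
)
for name, share in shares.items():
    print(f"{name:8s} {share:5.1f} %")
```

Because the shares are computed on variances, a source whose standard uncertainty is only about twice that of the others can already dominate the budget.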

In general, the main sources of uncertainty are the interpolation method and the thermal homogeneity, followed by the TUC repeatability. At \(-15 ^{\circ }\hbox {C}\) and \(39 ^{\circ }\hbox {C}\), the standard uncertainties due to the chamber inhomogeneity are \(0.12 ^{\circ }\hbox {C}\) and \(0.09^{\circ }\hbox {C}\), respectively. Past calibrations have been performed in climatic chambers with characteristics similar to those of the chamber used in this work. Bertiglia et al. [8] noted that the chamber presents problems in the vertical temperature gradient and that the inhomogeneity is one of the largest sources of uncertainty; the value of this standard uncertainty was \(0.11 ^{\circ }\hbox {C}\). In the calibration of meteorological sensors in the Himalayas [7], the worst thermal homogeneity value, \(0.12 ^{\circ }\hbox {C}\), was registered at \(-25 ^{\circ }\hbox {C}\). Hence, different versions of the climatic chamber give similar characterisation results: the inhomogeneity values are similar, and the ability to stabilise and homogenise the temperature decreases at negative and extreme temperatures.

On the other hand, the smallest contributions to the overall uncertainty came from the repeatability and calibration of the reference thermometers. This shows that working with high-quality references reduces their negative impact on the calibration procedure. This result underlines the importance of implementing a calibration program traceable to SI standards.

As already mentioned, Fig. 4 illustrates the notable difference between the behaviour of Torretta and that of the other instruments. Torretta presents the highest expanded uncertainty, and its air temperature corrections below \(0 ^{\circ }\hbox {C}\) are opposite to those obtained for the other thermometers. The differences in the calibration results can be explained by ageing, since Torretta is the oldest thermometer, and also by the different exposures, and therefore the different influences, to which the sensors are subjected. Capannina and Vent Capannina are located inside the Stevenson screen, protected from wind and solar radiation. In contrast, Torretta is installed on the roof of the Osservatorio. Although Torretta has a solar shield that protects it from direct solar radiation, it is more exposed to environmental ageing, which degrades the instrument.

4.2 Calibration Curves Analysis

To compare the results of the current and previous calibrations, Fig. 5 (Vent Capannina), Fig. 6 (Capannina) and Fig. 7 (Torretta) are plotted. Vent Capannina and Capannina were bought in 2016 and, prior to installation, they were removed from their screen and placed in a climatic chamber for calibration under controlled conditions. In the private laboratory, the measurements were compared with the readings of three Pt100 reference thermometers. The Torretta calibration in 2012 was performed in the same portable thermal chamber as in 2022, and the measurements were compared with the readings of one Pt100 reference thermometer. This reference thermometer was also calibrated by INRIM and traceable to the International Temperature Scale of 1990.

Figures 5 and 6 show that, for the calibration in 2016, the corrections are close to \(0 ^{\circ }\hbox {C}\), largely because the sensors were new. In the calibration curves of 2022, the corrections for both sensors have changed considerably, especially for negative temperatures. In 2022, Capannina also presents a significant correction for values near \(40 ^{\circ }\hbox {C}\). Thus, the plots show how the ageing and degradation of the instruments generate a drift. The expanded uncertainties declared for the laboratory calibration in 2016 are equal to \(0.07 ^{\circ }\hbox {C}\). Knowing the measurement uncertainties allows the results to be compared. The contributions to the overall uncertainty in 2016 were the RT and TUC repeatability and resolution, the RT calibration, the instability and inhomogeneity of the climatic chamber, and the residuals of the calibration curve. However, unlike in the on-site calibration, the overall system used in the laboratory was not the same as the system operating at the Osservatorio, and therefore more uncertainty contributions should be considered.

Fig. 5 Vent. Capannina calibration results of the laboratory calibration in 2016 and the on-site calibration in 2022. The expanded uncertainty in 2016 is \(0.07 ^{\circ }\hbox {C}\)

Fig. 6 Capannina calibration results of the laboratory calibration in 2016 and the on-site calibration in 2022. The expanded uncertainty in 2016 is \(0.07 ^{\circ }\hbox {C}\)

Fig. 7 Torretta calibration results of the on-site calibration in 2012 and 2022. The expanded uncertainty in 2012 is \(0.42 ^{\circ }\hbox {C}\)

Figure 7 shows that the performance of Torretta has also changed over the last ten years. The calibration curve of 2012 presents a positive slope (\(\hbox {a} = 4.00 \times 10^{-4} ^{\circ }\hbox {C}^{-1}\), \(\hbox {b} = 1.44\times 10^{-2}\), \(\hbox {c} = -6.85 \times 0.75 ^{\circ }\hbox {C}\)) and the thermometer, in general, tends to overestimate the air temperature. In 2022, the slope is negative (Table 3) and air temperatures lower than \(0 ^{\circ }\hbox {C}\) also tend to be overestimated. The performance of the sensor has worsened with time: for example, in 2012 the correction at \(-18.30^{\circ }\hbox {C}\) is \(-0.84^{\circ }\hbox {C}\), whereas in 2022 the correction at \(-13.88 ^{\circ }\hbox {C}\) is \(-2.03 ^{\circ }\hbox {C}\). With a coverage factor equal to 2 and a confidence level of 95 %, the expanded uncertainty declared in 2012 is \(0.42 ^{\circ }\hbox {C}\) and in 2022 it is \(0.62 ^{\circ }\hbox {C}\) (Table 3). As already mentioned, during the last calibration Torretta had to be calibrated together with other sensors in the same thermal chamber. This was taken into account by identifying two more uncertainty contributions, not considered in the previous calibration: the TUC repeatability and the chamber instability. Waiting for the chamber to stabilise (differences between consecutive measurements within \(\pm 0.01 ^{\circ }\hbox {C}\)) was one of the keys to the success of the procedure. Nevertheless, the highest contribution to the overall uncertainty in 2022 is the interpolation process. This could have been mitigated by increasing the number of calibration points, generating a more representative calibration curve. However, due to the limited time, this was not possible.
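A simple way to judge whether two calibration results differ by more than their uncertainties allow is the normalised error \(E_{n} = |C_{1} - C_{2}| / \sqrt{U_{1}^2 + U_{2}^2}\), where values above 1 indicate a difference not covered by the expanded uncertainties. This check is not part of the analysis described in this paper; the Python sketch below applies it to the Torretta values quoted above purely as an illustration. Because the two corrections were obtained at different temperatures (\(-18.30 ^{\circ }\hbox {C}\) and \(-13.88 ^{\circ }\hbox {C}\)), the result is only a rough indication.

```python
from math import sqrt


def normalized_error(c1, u1, c2, u2):
    """E_n = |c1 - c2| / sqrt(U1^2 + U2^2), with expanded uncertainties U1, U2."""
    return abs(c1 - c2) / sqrt(u1 ** 2 + u2 ** 2)


# Torretta corrections and expanded uncertainties quoted in the text,
# in degrees Celsius: 2012 (-0.84, U = 0.42) and 2022 (-2.03, U = 0.62)
e_n = normalized_error(-0.84, 0.42, -2.03, 0.62)
print(e_n > 1)
```

Under these assumptions the change between 2012 and 2022 exceeds what the two declared uncertainties would explain, which is in line with the interpretation of the change as instrumental drift.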

5 Conclusion

This study describes the methodology of an on-site calibration of air temperature sensors, with a complete analysis and evaluation of the uncertainty contributions. Since this is a periodic re-calibration of the same instruments used to record historical temperature series, the drift effect is also evaluated, improving the homogeneity of the data series.

The results show that the temperature corrections are larger for values below \(0 ^{\circ }\hbox {C}\) and close to \(40 ^{\circ }\hbox {C}\), while the thermometers have the smallest corrections for readings near \(15 ^{\circ }\hbox {C}\). With respect to the uncertainty, the highest contributions to the overall uncertainty come from the interpolation method, the thermal homogeneity, and the TUC repeatability. Regarding the interpolation, if more points are considered in the calibration procedure, the residuals of the curve will be reduced and, with them, the uncertainty of the model. This requires more time on site, with the associated staff costs, so a compromise should be sought when including this requirement in the routine procedure.

The maximum overall expanded uncertainty is \(0.3 ^{\circ }\hbox {C}\) for two thermometers, and \(0.6 ^{\circ }\hbox {C}\) for the one thermometer more exposed to environmental conditions. The two \(0.3 ^{\circ }\hbox {C}\) values confirm the validity of the on-site calibration procedure and system, being close to the overall target uncertainty in prescribed meteorological and climate data requirements [12, 13]. Comparing the on-site calibrations of the same instrument, the corrections have increased with time, which is understandable for aged instruments.

Successive re-calibrations allow an accurate evaluation of the sensor drift; changes in the sensor properties over time are a key, and sometimes hidden, component to include when assessing data quality for historical series. The results presented in this paper reflect the importance of repeating the calibration at given intervals, as prescribed by the WMO Expert Teams and more recently by the Global Climate Observing System (GCOS) in its recommendations for climate reference stations [14]. If the calibration corrections are not considered, the accuracy of the measurements is significantly affected.

Finally, one of the most important advantages of on-site calibration is the use of the same dataloggers and the same cabling, in the same environmental conditions, for reading the records. In this way, the calibration curve, together with the evaluated uncertainty, is more representative of the measurement conditions. Moreover, because the whole measurement chain is tested in working conditions, the calibration uncertainty contributions are reduced [6].