Introduction

The latest reports published by the Congressional Research Service [1] highlight that, since 2000, the area of the United States monitored by the National Interagency Coordination Center (NICC) has been impacted by an average of 70,025 wildfires a year that have damaged about 7 million acres. The report notes that although there were more fires per year on average in the 1990s, these fires were generally smaller, and the amount of land burned was half the current annual average. Beyond fire events, recent history shows a significant increase in environmental catastrophes due to illicit activities, climate change, or fatalities. Regardless of their causes, catastrophic events pose a huge threat to the environment, people, and infrastructure [2], so detecting them as early as possible can limit damage [3].

In this context, social media like Twitter,Footnote 1 which are continuously updated, can paint a detailed picture of past and current happenings in a location of interest; therefore, they are considered a powerful source of information [4]. These and other characteristics of Web 2.0 and open sources are the key pillars behind the second generation of Open-Source Intelligence (OSINT) [5] and the reasons behind the diffusion of OSINT in both public and private sectors such as defense, marketing, due diligence, and so on.

Existing literature shows how the Open-Source Intelligence (OSINT) cycle can be applied to collect and analyze text and multimedia content from the web and social media [6] to manage dangerous situations [7, 8]. In this sense, monitoring open sources and extracting actionable, valuable knowledge from them reshapes classical, vertical processes into new ones that can support the expert in a critical context. In the environmental domain, the use of open sources such as social media and micro-blogging could constitute valuable input to the fire monitoring process [9, 10]; in particular, identifying and reporting posts related to ongoing but as-yet-unknown fire events could increase the monitoring coverage, which is usually human-based or, depending on both financial availability and current weather conditions, satellite-based.

Regarding satellite-based monitoring, both private and publicly available platforms allow the domain expert to locate active emergencies all over the world. However, these platforms have the inherent weakness of being satellite-based services: they provide information only when and where the satellite passes overhead and only under suitable weather conditions, leaving many emergencies without the information set required to make high-impact and high-risk decisions. Satellite monitoring does not ensure comprehensive detection of all fires or real-time access to their details. Fires that start and are extinguished between satellite observations are a potential blind spot. Factors such as cloud cover, dense smoke, or the presence of a tree canopy can obscure a fire entirely, making it undetectable. Furthermore, fires that are too small or insufficiently hot may elude satellite-based monitoring. Therefore, it is important to use a multi-domain approach that leverages the strengths of both OSINT and GEOINT (Geospatial Intelligence), respectively, for the monitoring of open sources and satellite imagery, and to cross-relate information from different sources to realize a reliable detection of fire events [11, 12].

This paper proposes a fire indicator, the Perceived Risk Index (PRI), which leverages the cognitive process behind people’s risk perception [13] on social media [14] to detect, locate, and monitor active fires. In particular, PRI considers the number of alerts in the same geographical zone and the intensity of the expressed emotion to give monitoring experts a measure of the perceived intensity of an ongoing fire. The index is assessed through a system that collects, classifies, annotates, and correlates posts with satellite data. The system gathers information from social media posts, classifies it to filter out irrelevant posts, and applies Natural Language Processing (NLP) techniques to extract knowledge in terms of content, time, and geographical localization. Additionally, other types of information are collected by cross-relating open-source data, particularly satellite observations. The obtained fire reports are stored in a full-text index implemented with Apache Solr and visually exposed through a dashboard to help experts detect and monitor ongoing fire events.

The main contributions of the work can be summarized as follows:

  • Definition of a Perceived Risk Index (PRI) measuring risk related to ongoing fires leveraging microblogging contents and expressed emotions.

  • Assessing PRI reliability by correlating its distribution during real fires with satellite fire intensity distributions on different datasets.

  • A monitoring framework able to collect, classify, annotate, correlate posts with satellite data, and summarize detected alerts in an interactive dashboard.

The rest of the paper is organized as follows: the “Related Work” section analyzes the state-of-the-art on fire detection and techniques involved in the process; the “Overall Process” section describes the proposed approach. Experimentation is described in the “Experimentation” section; and some work limitations are described in the “Limitations” section. The “Conclusions” section concludes the work.

Related Work

Existing scientific literature has amply explored the relationship between open sources and key domains like economy and finance [15, 16], public safety [17, 18], and environmental monitoring [19]. In the domain of interest, in particular, proposed methodologies cover all relevant categories, such as air pollution, floods, and fires. For example, in 2019, Gurajala et al. [20] collected two years of tweets from Paris, London, and New Delhi to analyze the societal response to air quality. In particular, leveraging Natural Language Processing, topic modeling, and three different text classifiers, they demonstrated that the number of tweets related to concerns about air quality degradation is highly correlated with PM values.

The literature also highlights that locating environmental events via social activity alone, without other information, is feasible, although it requires further refinement to achieve optimal results [21]. Therefore, an assessment phase is required to validate social data with classical insights and vice versa. The main assessment processes leverage data from multiple sources and remote sensing, fusing them to compute a final index useful to the expert. This information can subsequently be utilized to direct remote sensing data collection (e.g., via satellites) for a more comprehensive analysis while managing the crisis. Studies have demonstrated that utilizing multiple modalities or cross-modal learning can significantly improve the results compared to using only one data type [22, 23]. In particular, Pramanik et al. [24] state that, given its ubiquity, online social media can be considered a “sensor” and can be used to extend coverage where there are no Continuous Ambient Air Quality Monitoring Stations (CAAQMSs). To this end, they fuse influential users’ tweets with CAAQMS data to create a crowd-sensed air quality measurement framework needed to raise awareness and support the activities of boards. Through these sensors, the authors identify the “influential users” who express their reactions, sentiments, and opinions towards pollution levels and, by tracking their tweets, estimate air quality in urban areas of developing regions. Kumbalaparambi et al. [25] relate tweets and their enrichment to the PM2.5 concentration. In particular, starting from a word cloud of expressed emotions for each season, they identify the main tokens used when talking about air pollution issues. Then, the authors use a self-attention mechanism to categorize tweets discussing air quality issues into three air pollution classes (poor, good, and noise-neutral); finally, they define a BiLSTM to estimate the PM2.5 concentration. The BiLSTM is trained on collected tweets and the related signals produced by CAAQMSs. Sadiq et al. [26] highlight how, during flood events, damaged infrastructure cannot always be detected using remote sensing, and they leverage social sensing, such as microblogging activity, as a source of useful information. They also fuse remote and social sensing data to derive informed flood extent maps. Khan et al. [27] studied the quality of social media data to understand whether it constitutes a reliable alternative in the absence of authoritative and official data in flooding scenarios, focusing on media content like images and videos. Social media data are fused with official rainfall data to assess the validity of tweet statements and identify the following three types of signals: (i) confirmatory signals, which imply a high level of confidence that a region is flooded; (ii) complementary signals, which provide contextual information such as needs and requests, disaster impact, or damage; and (iii) novel signals, when the two data sources do not overlap and provide a unique set of data points. Liu et al. [28] designed and tested six different computational and spatiotemporal analytical approaches to assess the relevance of risk information extracted from tweets and applied them to the 2013 Colorado flood event.

In a fire monitoring scenario, the goal is to detect a fire event as early as possible in order to activate the disaster management process able to save lives, protect the environment, and assess damages. In this sense, social media are adopted to estimate a level of fire risk from the point of view of population and ecological system vulnerability. Loureiro et al. [9], leveraging Natural Language Processing (NLP) and sentiment analysis, related social media posts about wildfires to political, economic, and welfare perceptions. The approach defines a hedonometer estimating how sentiments about wildfires vary with exposure, measured via the Euclidean distance from the event of interest and via air quality. Yue et al. [10] propose a proof of concept that uses geo-tagged social media-derived data to evaluate wildfire hazard and social-ecological vulnerability, with the final goal of identifying the most vulnerable areas. The researchers conclude that (i) geo-tagged social media data are useful for disaster risk studies and (ii) massive and vulnerable populations might result in a significant increase in wildfire risk perception. The CASPER (Category and Sentiment-based Problem Finder) system [29] detects wildfires by tracking the sentiment expressed in social media posts.

Some studies expand text comprehension through transformer models [30]. This is the case for the work in [31], where a BERT-based classifier recognizes fire-related tweets obtained via a query-based crawler and raises an alarm only in the case of a true positive. Ningsih and Hadiana [32] note that it is not always apparent whether a person’s words announce a catastrophe, and that detecting disasters in tweets is often difficult due to the uncertainty of tweet language structure; nevertheless, a vast number of recent methodologies leverage a classifier for disaster tweet classification to support disaster management, rescue, and emergency responders in spreading information during disasters and situations of need [33,34,35].

This paper proposes a Perceived Risk Index based on tweet information that intends to give an overall idea of the seriousness of an ongoing fire event when more official data are unavailable. The proposed index leverages recent developments in Natural Language Processing (i.e., Transformer-based classifiers) to identify relevant tweets and analyze them in terms of geo-location and expressed emotions.

Overall Process

The proposed solution constructs a monitoring framework that gives experts awareness of ongoing fires by cross-relating open-source information coming from satellite and social media. The system acquires and extracts information and processes data to provide experts, through an interactive dashboard, with a summary of potentially dangerous situations regarding fire events. Finally, an assessment of a Perceived Risk Index, derived from the posted tweets, offers experts an indicator of event severity.

The process, outlined in Fig. 1, consists of the following steps:

  1. Twitter Crawler: Employing the Twitter API, a crawler retrieves tweets based on a user-specified search query (e.g., “fire”);

  2. Fire Tweet Classifier: In this phase, incoming tweets are classified to filter relevant ones (i.e., tweets actually reporting a fire) through an ad-hoc model fine-tuned during the experimentation of the proposal;

  3. Tweet Annotation: During this phase, relevant tweets are automatically processed to extract additional information regarding the place and date of the fire and the emotions expressed in the text. In particular, state-of-the-art Neural Networks are exploited for Named Entity Recognition, a fine-tuned RoBERTa model extracts emojis and their scores from the tweet content, and GeoNames is adopted to geolocalize the warning in the tweet;

  4. Fire Detection: The system constantly retrieves additional information from distributors of active fire data, exploiting satellite sensors;

  5. Fire Reports Database: All collected tweets and their metadata are stored within the Fire Reports database;

  6. Perceived Risk Index: This stage, involving geographic data and specific time intervals, exploits matching tweets to evaluate a Perceived Risk Index. It measures people’s perceptions about active fires, giving experts an idea of the seriousness of the fire from the users’ point of view;

  7. Interactive Fire Dashboard: This component allows experts to monitor specific areas or at-risk situations in a user-friendly, interactive interface.

Fig. 1

Overall Process. Tweets collected by the crawler are classified and, if relevant, automatically annotated. Then, they are cross-related with data from satellites. Finally, a Perceived Risk Index is evaluated, and all information is provided on an interactive dashboard

Twitter Crawler

To realize the crawler, the Twitter API,Footnote 2 which enables programmatic access to Twitter, is accessed through the Python library Tweepy.Footnote 3 It is used to request tweets in real time. The API also returns additional data, such as retweets, replies, likes, and special contents (e.g., images), for any tweet the query finds.
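As an illustrative sketch (not the authors’ implementation), the crawler could be wired up with the Tweepy streaming interface for the Twitter API v2; the bearer token and filtering rule below are placeholders:

```python
import tweepy

BEARER_TOKEN = "<your-bearer-token>"  # placeholder credential from the Twitter developer portal


class FireTweetCrawler(tweepy.StreamingClient):
    """Streams tweets matching the user-specified search query (e.g., "fire")."""

    def on_tweet(self, tweet):
        # Each incoming tweet would be forwarded to the Fire Tweet Classifier (next phase).
        print(tweet.id, tweet.created_at, tweet.text)


crawler = FireTweetCrawler(BEARER_TOKEN)
crawler.add_rules(tweepy.StreamRule("fire lang:en"))  # rule mirroring the user query
crawler.filter(tweet_fields=["created_at", "geo", "public_metrics"])
```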

Fire Tweet Classifier

For the construction of the Fire Tweet Classifier, a language model is fine-tuned on a fire dataset. The objective is to build a classifier that distinguishes between generic tweets and tweets reporting fires. A bert-base-uncased modelFootnote 4 [36] is trained and tested on three wildfire datasets from the “Disaster Tweet Corpus 2020Footnote 5”. In particular, training and test sets (70% and 30% of the data, respectively) were constructed by randomly selecting tweets contained in the following datasets:

  • Wildfire-australia-2013

  • Wildfire-california-2014

  • Wildfire-colorado-2012

The Disaster Tweet Corpus 2020 consists of tweets collected during 48 disasters spanning 10 disaster types, with human annotations denoting whether a tweet is related to the disaster or not [37]. In particular, the three datasets contain 6440 tweets, of which one half refers to fires and the other half does not.

Training was performed with the following hyperparameters: batch size of 16; learning rate of \(5e^{-5}\); AdamW optimizer; 2 epochs.
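A minimal fine-tuning sketch with the Hugging Face transformers library, using the hyperparameters listed above; the tiny train/test lists are placeholders standing in for the 70/30 split described in this section:

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Placeholder samples standing in for the 70/30 split of the wildfire datasets.
train_texts, train_labels = ["Wildfire spreading fast near the ridge", "Great concert tonight"], [1, 0]
test_texts, test_labels = ["Smoke column visible from the highway"], [1]


class TweetDataset(torch.utils.data.Dataset):
    """Wraps tokenized tweets and their fire/non-fire labels."""

    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Batch size 16, learning rate 5e-5, 2 epochs; Trainer uses the AdamW optimizer by default.
args = TrainingArguments(output_dir="fire-tweet-classifier",
                         per_device_train_batch_size=16,
                         learning_rate=5e-5,
                         num_train_epochs=2)
trainer = Trainer(model=model, args=args,
                  train_dataset=TweetDataset(train_texts, train_labels),
                  eval_dataset=TweetDataset(test_texts, test_labels))
trainer.train()
```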

Tweet Annotation

Tweet Annotation aims to extract useful information from collected tweets, such as geo-localization, time, emotions, and so on. Specifically, the date is taken from the creation timestamp of the tweet; the location can be obtained in two ways: (i) via the geo-tag (attached coordinates or place identifier), when available, and (ii) by searching for places mentioned in the text through a NER (Named Entity Recognition) process. The locations mentioned in the text undergo geocoding to obtain the corresponding coordinates. If no place is found in a tweet, it is discarded.

As mentioned, the Tweet Annotation subphase passes through a Natural Language Processing pipeline that extracts the locations mentioned within the tweet content and the associated expressed emotions. For the first goal, the adopted Python library is StanzaFootnote 6: a collection of tools for the linguistic analysis of many human languages [38]. Named entities of interest for the proposed system are locations (e.g., addresses, cities, counties) and GeoPolitical Entities (GPE), such as states. Whenever the NER extracts more than one geographical entity, the corresponding tweet is discarded. Moreover, a call to the GeoNames WebserviceFootnote 7 (through the GeoPy libraryFootnote 8) retrieves the coordinates of each extracted entity. Given a location name, GeoNames searches for it and returns a detailed data structure containing geographical information (e.g., latitude and longitude). If multiple locations are found via this process, only the first result (the most likely one) is considered for the given query.
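A possible sketch of this location extraction and geocoding step, assuming a registered GeoNames account (the username below is a placeholder):

```python
import stanza
from geopy.geocoders import GeoNames

# stanza.download("en") must be run once to fetch the English models.
nlp = stanza.Pipeline("en", processors="tokenize,ner")
geocoder = GeoNames(username="<geonames-username>")  # placeholder GeoNames account


def locate(tweet_text):
    """Returns (place_name, latitude, longitude), or None when the tweet is discarded."""
    doc = nlp(tweet_text)
    places = [ent.text for ent in doc.ents if ent.type in ("LOC", "GPE")]
    if len(places) != 1:  # zero or more than one geographical entity: discard the tweet
        return None
    location = geocoder.geocode(places[0])  # first GeoNames result is the most likely
    if location is None:
        return None
    return places[0], location.latitude, location.longitude
```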

The extraction of emotions from tweets exploits the emoji prediction implemented by the twitter-roberta-base-emojiFootnote 9 transformer model. The model predicts 20 emojis and their scores (in the range \([0,1]\)) [39]. Predicted emojis and their scores, particularly the fire one, are added to the stored reports. The framework exploits the fire emoji score to measure the user’s perceived emotion associated with the fire event.
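A minimal sketch of the emotion scoring step with the twitter-roberta-base-emoji checkpoint; whether the model’s label names are emoji characters or generic ids depends on the checkpoint configuration, so the label handling below is an assumption:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "cardiffnlp/twitter-roberta-base-emoji"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)


def emoji_scores(tweet_text):
    """Returns a dict mapping each of the 20 predicted emoji labels to its score in [0, 1]."""
    inputs = tokenizer(tweet_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Depending on the checkpoint, id2label may hold emoji characters or generic
    # LABEL_i names to be resolved via the TweetEval emoji mapping file.
    return {model.config.id2label[i]: p.item() for i, p in enumerate(probs)}


scores = emoji_scores("Huge flames and smoke near the highway")
fire_score = scores.get("🔥", 0.0)  # the fire emoji score feeds the Perceived Risk Index
```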

Fire Detection

The Fire Information for Resource Management System (FIRMS)Footnote 10 collects active fire data. FIRMS distributes Near Real-Time (NRT) active fire data from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Aqua and Terra satellites and from the Visible Infrared Imaging Radiometer Suite (VIIRS) aboard S-NPP and NOAA-20 (formerly known as JPSS-1). Globally, these data are available within 3 h of satellite observation, while active fire detection is available in real time for the US and Canada. Data collected by FIRMS, available for download in a structured format, contain information such as latitude, longitude, brightness, satellite, instrument, and acquisition date.

Requests to FIRMS are made through the official API.Footnote 11 Uncoupled from the tweet retrieval, the system restarts the satellite collection every 30 min. Then, new satellite observations are matched with stored fire reports (not yet validated) through places and times. The geographical match exploits Eq. 1, while the time matching considers a match between days. When a match is found, the corresponding fire report (i.e., the set of tweets referencing the same fire event) is considered “validated”, and the level of intensity of fire (i.e., the reported brightnessFootnote 12) is associated with it. In particular, items corresponding to tweets just validated are updated, and the corresponding brightness is stored.
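For illustration only, a request to the FIRMS area API might look like the following; the endpoint layout, source name, and MAP_KEY are assumptions based on the public FIRMS documentation and are not taken from the original system description:

```python
import io

import pandas as pd
import requests

MAP_KEY = "<firms-map-key>"  # placeholder key obtained from the FIRMS portal
# Assumed endpoint layout: /api/area/csv/{key}/{source}/{area}/{day_range}/{date}
URL = (f"https://firms.modaps.eosdis.nasa.gov/api/area/csv/"
       f"{MAP_KEY}/VIIRS_SNPP_NRT/world/1/2022-06-14")

response = requests.get(URL, timeout=60)
detections = pd.read_csv(io.StringIO(response.text))
# Typical columns include latitude, longitude, brightness (bright_ti4 for VIIRS),
# acq_date, satellite, and instrument.
print(detections.head())
```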

The matching between places is made by evaluating the distance between their coordinates leveraging the Haversine formula [40]. This is a mathematical formula used to calculate the distance between two points on the surface of a sphere, given their latitudinal and longitudinal coordinates. It is commonly used in navigation and geolocation applications, especially for calculating distances on the Earth. Its formal definition follows:

$$\begin{aligned} hav(\Theta )&= hav(\phi _2 - \phi _1) + \cos (\phi _1)\cos (\phi _2)\,hav(\lambda _2 - \lambda _1)\\ hav(\Theta )&= \sin ^2\left( \frac{\Theta }{2}\right) = \frac{1-\cos (\Theta )}{2} \end{aligned}$$
(1)

where

  • \(\phi _1\) and \(\phi _2\) are latitudes of the first and second point, respectively;

  • \(\lambda _1\) and \(\lambda _2\) are longitudes of the first and second point, respectively;

  • \(\Theta\) is the central angle formed by the two points and the center of the Earth.

Regarding time, a tweet and a satellite observation match when their reference dates are the same, regardless of the time of day.
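A sketch of the combined geographic and temporal matching, following Eq. 1 and the day-level time match (the 150 km threshold is the distance limit used later in the case study):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, per Eq. 1."""
    phi1, phi2 = radians(lat1), radians(lat2)
    hav = (sin(radians(lat2 - lat1) / 2) ** 2
           + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(hav))


def report_matches(report_lat, report_lon, report_date,
                   obs_lat, obs_lon, obs_date, max_km=150.0):
    """A fire report is validated when a satellite observation falls on the same
    calendar day and within the distance threshold of the report's coordinates."""
    return (report_date == obs_date
            and haversine_km(report_lat, report_lon, obs_lat, obs_lon) <= max_km)


# Example from the case study: a tweet geocoded to Arizona vs. a nearby satellite detection.
print(report_matches(34.5, -111.5, "2022-06-14", 33.3, -111.6, "2022-06-14"))
```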

Fire Reports Database

Every tweet and all extracted fire information are stored in an Apache SolrFootnote 13 index. Apache Solr is an open-source enterprise-search platform with a REST-like API. Solr stores documents in structures called cores. Every core has its own schema, which defines the data types for every field and the indexing and querying functionalities. The core used for the proposed system stores documents consisting of the following fields:

  • id: Solr identifier for the specific document (representing a tweet); it is unique and automatically generated at document creation;

  • \(id\_tweet\): identifier of the tweet associated with the document;

  • text: text content of the tweet;

  • user: author of tweet;

  • \(retweet\_count\): number of retweets for the given tweet;

  • \(favorite\_count\): number of likes expressed for the given tweet;

  • \(retweeted\_tweet\): identifier of the retweeted tweet (if the given tweet is a retweet);

  • entities: entities extracted by the NLP from the tweet content;

  • emotions: type of emotions and their scores extracted from the tweet text and expressed as follows: \(emotionType_1: score_1\), \(emotionType_2: score_2\), \(\dots\);

  • coordinates: geolocalization of the tweet (extracted from the tweet metadata or its text), expressed in the “Latitude, Longitude” format;

  • date: tweet creation timestamp;

  • bright: level of brightness (i.e., intensity) of fire detected by the satellite on the same date and place;

  • firms: boolean value stating if a satellite detection has validated the report (i.e., the satellite has also detected the fire event).

The store component containing fire reports has been implemented through the Python library pysolr.Footnote 14
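As a minimal sketch, an annotated fire report could be written to (and queried from) the Solr core with pysolr as follows; the core URL and field values are placeholders consistent with the schema above:

```python
import pysolr

# Placeholder core URL; the core schema follows the fields listed above.
solr = pysolr.Solr("http://localhost:8983/solr/fire_reports", always_commit=True)

solr.add([{
    "id_tweet": "1536712345678901234",            # placeholder tweet identifier
    "text": "Huge wildfire burning near Flagstaff, Arizona",
    "user": "example_user",
    "retweet_count": 12,
    "favorite_count": 30,
    "entities": ["Arizona"],
    "emotions": "🔥: 0.37",
    "coordinates": "34.5, -111.5",
    "date": "2022-06-14T10:21:00Z",
    "bright": 367.0,
    "firms": True,
}])

# Example query: all satellite-validated reports mentioning Arizona.
for doc in solr.search("entities:Arizona AND firms:true"):
    print(doc["id_tweet"], doc.get("bright"))
```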

Perceived Risk Index Evaluation

The definition of the Perceived Risk Index (PRI) derives from a preliminary analysis of the influence of tweet aspects. In particular, a Multiple Linear Regression (MLR) was performed to determine a dependency between the brightness value (i.e., the fire strength or intensity) detected by the satellite and information from Twitter. The emotional fire score, the numbers of retweets and likes, and the total number of tweets on the same day are associated with each detected tweet. The level of brightness is the dependent variable; the objective is to understand which tweet features (as independent variables) affect the fire intensity. The adopted dataset is the same as for the Fire Tweet Classifier training (see the “Fire Tweet Classifier” section). The results of the linear regression, shown in Table 1, register an \(R^2\) of 0.72. They affirm that, with a level of significance \(\alpha =0.05\), the emotional fire score and the total number of tweets are relevant features for predicting the intensity level of a fire event. The same cannot be said for the numbers of retweets and likes. It follows that the Perceived Risk Index should exploit the number of tweets and the emotional fire score. So, a decay function \(\delta _t\) is first defined as follows:

$$\begin{aligned} \delta _t = 2^{-\lambda (t-t_{last})} \end{aligned}$$
(2)

where:

  • \(\lambda\) is a decay factor in \([0,1]\);

  • t is the current instant;

  • \(t_{last}\) is the instant of the last valid tweet.

The PRI for geographical area g, at time t, is assessed through the following equation:

$$\begin{aligned} {Perceived\_Risk\_Index_{gt} = {\sum _{tw \in T} s_{tw}} * |T| * \delta _t} \end{aligned}$$
(3)

where:

  • T is the tweet set for a given geographic area and date, posted by different users;

  • \(s_{tw}\) is the emotional fire score of tweet tw;

  • |T| is the cardinality of T.

The decay function is needed to align the intensity of the PRI with the real evolution of the fire. In particular, it guarantees a gradual PRI decrease when the tweet stream slows down.

After an empirical analysis of results on the evaluated datasets, the Perceived Risk Index value can be interpreted as follows:

  • \(Perceived\_Risk\_Index < 50\) corresponds to low risk;

  • \(50< Perceived\_Risk\_Index < 500\) corresponds to moderate risk;

  • \(Perceived\_Risk\_Index \ge 500\) corresponds to high risk.
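Putting Eqs. 2 and 3 together with the interpretation thresholds, a compact sketch of the PRI computation for one geographical area and date could look as follows (the decay factor 0.1 is the value that performs best in the experimentation):

```python
def perceived_risk_index(fire_scores, t, t_last, decay=0.1):
    """PRI for one geographical area and date (Eq. 3): the sum of the emotional fire
    scores, scaled by the number of tweets from distinct users and by the decay term
    of Eq. 2.

    fire_scores: emotional fire scores s_tw, one per matching tweet
    t, t_last:   current instant and instant of the last valid tweet (same time unit)
    """
    delta = 2 ** (-decay * (t - t_last))
    return sum(fire_scores) * len(fire_scores) * delta


def risk_level(pri):
    """Interpretation thresholds derived empirically on the evaluated datasets."""
    if pri < 50:
        return "low"
    if pri < 500:
        return "moderate"
    return "high"


# Example: ten tweets with an average fire score of 0.4, last tweet two hours ago.
pri = perceived_risk_index([0.4] * 10, t=12, t_last=10)
print(round(pri, 1), risk_level(pri))
```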

Table 1 Regression results

Interactive Fire Dashboard

Data contained in Solr are available through a dashboard realized with Banana,Footnote 15 a Solr plugin. From the dashboard, it is possible to filter reports and get all the information about them, such as the tweets generating the report and the intensity score. The dashboard also includes a map that shows reports geographically based on their coordinates. In particular, the user can specify a query and a time period to search for reports (Fig. 2). Results are exposed in tabular and graphical form and through a map, which highlights reports with a marker corresponding to their position. Markers change based on the number of detected fires and their intensity, as depicted in the example in Fig. 3.

Fig. 2

Detail of query and filter panels. The dashboard filters fire messages based on query input

Fig. 3

Detail of map panel. Fire messages are made more visible on the map, highlighting the fire intensity

Experimentation

The experimentation of the proposed approach consists of analyzing whether a significant correlation exists between the intensity of the fire (detected by the satellite) and the proposed Perceived Risk Index. The objective is to demonstrate its validity as an index for fire detection and monitoring.

Datasets

The implemented system has been tested by collecting tweets from 1 to 20 June 2022, leveraging the Twitter API. A total of 6245 English tweets were retrieved through the following query:

figure a

Among all collected tweets, 4923 have been classified as relevant (i.e., fire-related) and undergo the process described in the “Overall Process” section and exemplified in Fig. 4. At the end of the annotation process, 3804 tweets are considered. Information about the size of the datasets is reported in Table 2.

Table 2 Size of adopted datasets

The example in Fig. 4 shows that, in classifying a set of three tweets, one is considered fire-related and undergoes the annotation process. The annotation process, particularly NER, recognizes “Arizona” as a geographical entity from the text content and “Jun 14, 2022” as a time reference from the tweet metadata. The Transformer model predicts the fire emoji for the tweet content with an intensity (i.e., probability) of about 0.37. Then, through the GeoNames Webservice, the coordinates of “Arizona” are extracted (i.e., latitude: 34.5, longitude: \(-\)111.5). Given the information from the annotation process, a match with satellite data considering date and geographical area can be made. In particular, let us assume the satellite has detected a fire at latitude 33.3 and longitude \(-\)111.6. The match can be made if we assume a maximum distance limit of 150 km from the extracted geographical entity (i.e., Arizona). So, the fire report can be validated, and the brightness (severity/intensity) level of the detected fire can be determined. In this instance, the corresponding satellite detection indicates a brightness of 367 K.

Fig. 4

Case Study. The example shows the application of the pipeline to three different tweets, highlighting the match between tweets and satellite data

In addition, three public datasets of wildfire (described in the “Fire Tweet Classifier” section) are adopted for the correlation evaluation.

Fire Tweet Classifier Performance Evaluation

The fine-tuned model for the classification of relevant tweets achieves the following performance: Accuracy: \(99\%\), Average F1-score: \(98\%\). Performance has also been compared with recent approaches, as shown in Table 3.

Table 3 Fire Tweet Classifier: performance comparison

Results

The validity of the proposed index in measuring the seriousness of an ongoing fire is evaluated through Pearson’s correlation coefficient. In particular, for each considered dataset, two distributions are compared:

  • PRIs for each considered time interval;

  • The corresponding brightness values for each considered time interval.

Distribution pairs are joined by shared time interval and geographical area. Moreover, for each dataset, we assess the decay function \(\delta _t\) by setting three values for the decay factor \(\lambda\): 0.01, 0.1, and 0.9.

After checking the normality of the distributions, we extracted Pearson’s correlation coefficients and their significance (represented by the p-value).
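A sketch of the per-dataset correlation test, assuming the two aligned distributions (PRI and brightness per time interval) are available as lists; the numeric values below are placeholders, not experimental results:

```python
from scipy.stats import pearsonr

# Placeholder distributions aligned by time interval and geographical area:
# one PRI value and one satellite brightness value per interval.
pri_series = [12.4, 85.1, 630.0, 410.7, 95.3]
brightness_series = [305.0, 331.2, 398.5, 367.0, 329.8]

r, p_value = pearsonr(pri_series, brightness_series)
print(f"Pearson r = {r:.2f}, p-value = {p_value:.4f}")
```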

Results, summarized in Table 4, show an average best correlation of 0.77 among all adopted datasets with a valid significance. In particular, for the adopted datasets, the value 0.1 is the most suitable decay factor. Such correlation demonstrates the validity of the proposed Perceived Risk Index as a measure for early alerting in detecting and monitoring fire events.

Table 4 Correlation results

Limitations

The evaluation of the proposed Perceived Risk Index (PRI) showcases the potential of utilizing social media information to derive a risk assessment associated with fires. Nevertheless, despite the potential of this research, it does have some limitations, which will be the focus of future developments and improvements:

  • The current analysis is limited to posts in English. Future research should explore methods to extend the analysis to a multilingual context because analyzing only English content may lead to the omission of crucial data for preventing and managing emergency situations, as relevant information could be expressed in various languages.

  • The analysis does not incorporate images and videos associated with fire scenes, a factor that could enhance the credibility of user-contributed posts. Future developments should focus on integrating multimedia content to provide a more comprehensive understanding of incident reports.

  • The methodology faces challenges in mitigating false alerts generated by disinformation campaigns. Future extensions should improve filters to identify and exclude misleading information effectively. This includes addressing the dissemination of inaccurate and deceptive information by malicious actors.

  • The current implementation relies on Twitter as the social media source. This introduces a dependency on the availability of the Twitter API service and related updates. Future enhancements should explore the inclusion of multiple social media platforms to diversify data sources and improve reliability.

  • The methodology relies on the acquisition of satellite data through the FIRMS API, which may introduce a delay between when data are collected and when they actually become accessible. In this sense, the hypothesis of introducing additional acquisition techniques could be evaluated in the future.

Conclusions

This paper adopts an information fusion approach to propose a Perceived Risk Index for fire events. First, a Twitter crawler collects tweets, and a classifier filters them based on their relevance; then, tweets are processed in terms of content (NLP, geo-location, and emotion extraction). Finally, fire information is cross-related with satellite information to construct the Perceived Risk Index. This indicator leverages the number of relevant daily tweets for a specific geographic area and the emotional fire score extracted from them. Collected reports and additional information populate a Solr core whose content is available to experts through an interactive dashboard. Despite some technical limitations due, for example, to the availability of APIs or the accessibility of social media posts, the analysis of the correlation (also on real datasets) between fire brightness and the proposed indicator reveals the validity of the index for assisting experts in detecting and monitoring fire events.

In the future, the proposed indicator could be extended by evaluating the contribution of the following information:

  • Extension of the analysis to a multilingual context;

  • Images (and their contents) attached to tweets;

  • The existence of links in the tweets and, eventually, the reliability of corresponding sites;

  • Posts from additional social media, like Instagram and Facebook, to enrich incoming data;

  • The introduction of a corroboration methodology to weaken disinformation attempts.