Unifying Faceted Search and Analytics over RDF Knowledge Graphs

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Formulating analytical queries over RDF Knowledge Graphs is a challenging task that presupposes familiarity with the syntax of the corresponding query languages and with the contents of the graph. To alleviate this problem, we introduce a model that helps users formulate analytic queries over complex, i.e., not necessarily star schema-based, RDF Knowledge Graphs. To arrive at an intuitive interface, we leverage the familiarity of users with Faceted Search systems. In particular, we start from a general model for Faceted Search over RDF data and extend it with actions that also enable users to formulate analytic queries. Thus, the proposed model can be used not only for formulating analytic queries but also for exploratory purposes, i.e., for locating the desired resources in a Faceted Search manner. We describe the model from four perspectives: (1) we propose a generic user interface for intuitively analyzing RDF Knowledge Graphs, (2) we formally define the state space of the interaction model and the algorithms required for producing the user interface actions, (3) we present an implementation of the model that demonstrates its feasibility, and (4) we discuss the results of a user evaluation that provides evidence for the acceptance of the method. Apart from being intuitive for end users, another distinctive characteristic of the proposed model is that it allows the gradual formulation of complex analytic queries (including nested ones).
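
To make the idea of gradual formulation more concrete, the following minimal sketch shows how an interaction state could be turned into a SPARQL query, first for exploration and then, after grouping and aggregation choices are added, for analytics. It is an illustration only and not the state space or the algorithms defined in the paper: the class `AnalyticState`, the function `to_sparql`, and the `ex:` names are hypothetical, prefix declarations are omitted, and nested queries are not covered.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AnalyticState:
    """Hypothetical interaction state (not the paper's formal definition)."""
    focus_class: str                                    # class whose instances are explored
    restrictions: List[Tuple[str, str]] = field(default_factory=list)  # (property, value)
    group_by: List[str] = field(default_factory=list)   # properties to group on
    aggregate: Optional[Tuple[str, str]] = None         # (function, measured property)


def to_sparql(state: AnalyticState) -> str:
    """Translate the state into a SPARQL query string (prefixes omitted)."""
    patterns = [f"?x a {state.focus_class} ."]
    # Each facet restriction becomes one triple pattern on the focus variable.
    patterns += [f"?x {prop} {value} ." for prop, value in state.restrictions]
    group_vars = []
    for i, prop in enumerate(state.group_by):
        var = f"?g{i}"
        group_vars.append(var)
        patterns.append(f"?x {prop} {var} .")
    where = "\n  ".join(patterns)
    if state.aggregate is None:
        # Plain faceted search: return the resources that satisfy the restrictions.
        return f"SELECT DISTINCT ?x WHERE {{\n  {where}\n}}"
    func, prop = state.aggregate
    where += f"\n  ?x {prop} ?m ."
    select = " ".join(group_vars + [f"({func}(?m) AS ?result)"])
    query = f"SELECT {select} WHERE {{\n  {where}\n}}"
    if group_vars:
        query += "\nGROUP BY " + " ".join(group_vars)
    return query


# Step 1: exploratory use, restricting the focus by a facet value.
state = AnalyticState("ex:Laptop")
state.restrictions.append(("ex:manufacturer", "ex:Acme"))
exploratory = to_sparql(state)

# Step 2: the same state extended with analytic actions.
state.group_by.append("ex:releaseYear")
state.aggregate = ("AVG", "ex:price")
analytic = to_sparql(state)
print(analytic)
```

The point of the sketch is that the analytic query reuses the restrictions accumulated during exploration, so the user never has to start over when moving from locating resources to aggregating over them.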

Data availability

The dataset used in the running example as well as the running system is publicly accessible.

Notes

  1. https://en.wikipedia.org/wiki/RDF_Schema.

  2. https://www.w3.org/standards/semanticweb/inference.

  3. http://data.persee.fr/explore/sparklis/?lang=en.

  4. https://team.inria.fr/oak/projects/warg/.

  5. https://www.w3.org/TR/vocab-data-cube/.

  6. i.e., the results of the SPARQL query “select ?x where { ?x rdf:type owl:NamedIndividual. }.”

  7. The reflexive and transitive reduction of a binary relation R is the smallest relation R’ such that R and R’ have the same reflexive and transitive closure.

  8. http://docs.openlinksw.com/virtuoso/.

  9. https://angular.io/.

  10. The deployment of the system that was used is accessible at https://demos.isl.ics.forth.gr/rdf-analytics.
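
As an illustration of the definition in note 7, the sketch below computes the reflexive and transitive reduction of a small relation. It assumes the relation is acyclic apart from reflexive pairs, as in a typical subclass hierarchy, since otherwise the reduction is not unique. The function name and the class names in the example are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def reflexive_transitive_reduction(relation: Set[Pair]) -> Set[Pair]:
    """Reduce a relation that is acyclic apart from reflexive pairs."""
    # Reflexive pairs are dropped: the reflexive closure restores them.
    edges = {(a, b) for a, b in relation if a != b}
    succ: Dict[str, Set[str]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    def reachable(start: str, skip: Pair) -> Set[str]:
        """Nodes reachable from start without using the pair `skip`."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) != skip and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Keep (a, b) only if b cannot be reached from a through other pairs.
    return {(a, b) for a, b in edges if b not in reachable(a, (a, b))}


# Hypothetical subclass relation with one reflexive and one implied pair.
subclass_of = {
    ("Laptop", "Computer"),
    ("Computer", "Product"),
    ("Laptop", "Product"),   # implied by transitivity
    ("Product", "Product"),  # implied by reflexivity
}
reduced = reflexive_transitive_reduction(subclass_of)
print(sorted(reduced))
```

Both the original relation and the reduced one have the same reflexive and transitive closure, which is what the definition requires.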

Acknowledgements

Many thanks to Alexandros Perrakis for proofreading the entire paper and for developing the second implementation of the model.

Funding

FORTH-ICS.

Author information

Contributions

All authors contributed to the study conception, design, and writing of this work. The implementation of the system was done by Maria-Evangelia Papadaki.

Corresponding author

Correspondence to Maria-Evangelia Papadaki.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Consent for publication

Yes.

Code availability

Upon request to the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Papadaki, ME., Tzitzikas, Y. Unifying Faceted Search and Analytics over RDF Knowledge Graphs. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02076-9

  • DOI: https://doi.org/10.1007/s10115-024-02076-9
