Top-k approximate selection for typicality query results over spatio-textual data

  • Regular Paper
Knowledge and Information Systems

Abstract

The spatial keyword query is a classical query processing mode for spatio-textual data. It aims to return the spatio-textual objects with the highest spatial proximity and textual similarity to a given query. However, the top-k results returned under this mode are often similar to one another. Users instead want the system to pick the k most typical results from the candidate set, so that they can understand the representative features of the candidate results. To address typicality analysis and typical object selection for spatio-textual query results, this paper proposes a typicality evaluation and top-k approximate selection approach. First, the approach computes synthetic distances between all spatio-textual objects over three dimensions: geographic location, textual semantics, and numeric attributes. Then, a hybrid index structure that simultaneously supports location, text, and numeric multi-dimensional matching is presented to retrieve the candidate query results efficiently. Based on the synthetic distances between spatio-textual objects, a method based on Gaussian kernel probability density estimation is proposed for measuring the typicality of query results. To facilitate query result analysis and top-k typical object selection, two approximate top-k typical object selection algorithms are presented: one based on a Tournament strategy and one based on local neighborhoods. The experimental results demonstrate that the text semantic relevancy measure for spatio-textual objects is accurate and reasonable. They also show that the local neighborhood-based approximate top-k typicality selection algorithm achieves both a low error rate and high execution efficiency. The source code and datasets used in this paper are available at https://github.com/JiaShengS/Typicality_analysis/.
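
The abstract combines a multi-dimensional synthetic distance with a Gaussian kernel density estimate. The following minimal Python sketch shows how those two pieces could fit together. It rests on stated assumptions and is not the authors' implementation. The class `SpatioTextualObject` and the functions `synthetic_distance`, `typicality_scores`, and `top_k_typical` are hypothetical names. The equal weights, the normalisation by the largest pairwise spatial distance, the cosine distance over an assumed text embedding vector, and the bandwidth `h` are all illustrative choices. Typicality is computed as a leave-one-out kernel density estimate, T(o) = 1/(n-1) · Σ_{x≠o} G_h(d(o, x)), over the candidate result set.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class SpatioTextualObject:
    oid: str
    x: float                     # planar coordinates (assumed already projected)
    y: float
    text_vec: Tuple[float, ...]  # e.g. an averaged word-embedding vector (assumption)
    numeric: Tuple[float, ...]   # numeric attributes pre-normalised to [0, 1] (assumption)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0               # treat an empty text vector as unrelated
    return 1.0 - dot / (nu * nv)


def synthetic_distance(a, b, max_spatial, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical weighted sum of normalised spatial, textual and numeric distances."""
    w_s, w_t, w_n = weights
    d_s = math.hypot(a.x - b.x, a.y - b.y) / max_spatial if max_spatial > 0 else 0.0
    d_t = cosine_distance(a.text_vec, b.text_vec)
    m = len(a.numeric)
    # Euclidean distance scaled by sqrt(m) so it stays in [0, 1] for [0, 1] attributes.
    d_n = math.dist(a.numeric, b.numeric) / math.sqrt(m) if m else 0.0
    return w_s * d_s + w_t * d_t + w_n * d_n


def gaussian_kernel(d: float, h: float) -> float:
    """Gaussian kernel G_h(d) with bandwidth h."""
    return math.exp(-d * d / (2 * h * h)) / (h * math.sqrt(2 * math.pi))


def typicality_scores(objs, h=0.1, weights=(1 / 3, 1 / 3, 1 / 3)) -> Dict[str, float]:
    """Leave-one-out KDE: T(o) = 1/(n-1) * sum over x != o of G_h(d(o, x))."""
    n = len(objs)
    max_sp = max((math.hypot(a.x - b.x, a.y - b.y) for a in objs for b in objs), default=0.0)
    scores = {}
    for i, o in enumerate(objs):
        total = sum(gaussian_kernel(synthetic_distance(o, x, max_sp, weights), h)
                    for j, x in enumerate(objs) if j != i)
        scores[o.oid] = total / (n - 1) if n > 1 else 0.0
    return scores


def top_k_typical(objs, k, h=0.1, weights=(1 / 3, 1 / 3, 1 / 3)) -> List[str]:
    """Exact top-k: ids of the k objects with the highest typicality."""
    scores = typicality_scores(objs, h, weights)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

This exact version evaluates the kernel for every pair of candidates, so its cost grows quadratically with the size of the candidate set. That cost is the motivation for the approximate selection algorithms named in the abstract. In a small example with three nearby, textually identical objects and one distant outlier, the middle object of the cluster ranks first and the outlier ranks last.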

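The two approximate selection strategies named in the abstract can be sketched in the same spirit. This is an illustrative sketch, not the paper's algorithms, and every function and parameter name is hypothetical. `local_neighborhood_top_k` assumes that only candidates within a radius (default 3h, an assumed cut-off) contribute meaningfully to the kernel sum. It uses a brute-force scan where a real system would query an index such as the paper's hybrid index. `tournament_top_k` assumes that players are compared by typicality estimated against a random reference sample. It obtains k results by rerunning the tournament with previous winners removed. The distance function is passed in, so it can be a synthetic distance or any other metric.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def gaussian_kernel(d: float, h: float) -> float:
    """Gaussian kernel G_h(d) with bandwidth h."""
    return math.exp(-d * d / (2 * h * h)) / (h * math.sqrt(2 * math.pi))


def local_neighborhood_top_k(objs: Sequence[T], k: int, dist: Callable[[T, T], float],
                             h: float, radius: Optional[float] = None) -> List[T]:
    """Approximate top-k typicality: only neighbours within `radius`
    (default 3h) contribute to an object's kernel density estimate."""
    n = len(objs)
    r = 3 * h if radius is None else radius
    scores = []
    for i, o in enumerate(objs):
        s = 0.0
        for j, x in enumerate(objs):  # brute-force scan standing in for an index lookup
            if i != j:
                d = dist(o, x)
                if d <= r:
                    s += gaussian_kernel(d, h)
        scores.append(s / (n - 1) if n > 1 else 0.0)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [objs[i] for i in order[:k]]


def tournament_top_k(objs: Sequence[T], k: int, dist: Callable[[T, T], float], h: float,
                     group_size: int = 3, sample_size: int = 32, seed: int = 0) -> List[T]:
    """Approximate top-k typicality via repeated randomized tournaments.
    Each player's typicality is estimated against a random reference sample."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    rng = random.Random(seed)
    n = len(objs)
    ref = rng.sample(range(n), min(sample_size, n))  # indices of the reference sample

    def score(i: int) -> float:
        # Kernel sum against the sample, excluding the object itself.
        return sum(gaussian_kernel(dist(objs[i], objs[j]), h) for j in ref if j != i)

    remaining = list(range(n))
    result: List[T] = []
    while remaining and len(result) < k:
        players = remaining[:]
        while len(players) > 1:
            rng.shuffle(players)
            # The most typical member of each group advances to the next round.
            players = [max(players[g:g + group_size], key=score)
                       for g in range(0, len(players), group_size)]
        winner = players[0]
        result.append(objs[winner])
        remaining.remove(winner)
    return result
```

With the default radius, a candidate more than 3h from every other candidate gets a density of zero and sorts last. In the tournament, a small `sample_size` makes each score cost O(`sample_size`) kernel evaluations instead of O(n), at the price of noisier comparisons between players. A larger `group_size` reduces the number of rounds.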

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61772249), and partly by the General Research Project of Education Department of Liaoning Province, China (LJKZ0355).

Funding

This work was supported by the National Natural Science Foundation of China (No. 61772249).

Author information

Contributions

XM contributed conceptualization, methodology, software, data curation, and writing (original draft preparation); XZ contributed conceptualization, validation, supervision, and writing (review and editing); HH and QL contributed supervision and writing (review and editing). All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Xiangfu Meng.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical and informed consent for data used

Written informed consent was obtained from all the participants prior to the enrollment (or for the publication) of this study (or case report).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Meng, X., Zhang, X., Huo, H. et al. Top-k approximate selection for typicality query results over spatio-textual data. Knowl Inf Syst 66, 1425–1468 (2024). https://doi.org/10.1007/s10115-023-02013-2

  • DOI: https://doi.org/10.1007/s10115-023-02013-2
