Abstract
Spatial keyword query is a classical query processing mode for spatio-textual data. It aims to provide users with the spatio-textual objects that have the highest spatial proximity and textual similarity to a given query. However, the top-k result objects returned by spatial keyword queries are often similar to one another, whereas users want the system to select the top-k typical results from the candidate query results so that they can understand the representative features of the candidate result set. To address typicality analysis and typical object selection for spatio-textual query results, a typicality evaluation and top-k approximate selection approach is proposed. First, the approach calculates synthetic distances between all spatio-textual objects over the dimensions of geographic location, textual semantics, and numeric attributes. Then, a hybrid index structure that simultaneously supports location, text, and numeric multi-dimensional matching is presented to obtain the candidate query results efficiently. Based on the synthetic distances between spatio-textual objects, a Gaussian kernel probability density estimation-based method for measuring the typicality of query results is proposed. To facilitate query result analysis and top-k typical object selection, a Tournament strategy-based and a local neighborhood-based top-k typical object approximate selection algorithm are presented. The experimental results demonstrate that the text semantic relevance measurement method for spatio-textual objects is accurate and reasonable, and that the local neighborhood-based top-k typical result approximate selection algorithm achieves both a low error rate and high execution efficiency. The source code and datasets used in this paper are available at https://github.com/JiaShengS/Typicality_analysis/.
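The abstract describes the typicality pipeline only at a high level. The following Python sketch illustrates the general idea under stated assumptions and is not the authors' implementation. The synthetic distance is taken to be a weighted sum of normalized spatial, textual, and numeric distances, with Jaccard distance on keyword sets standing in for the paper's semantic text measure. Typicality is the average Gaussian kernel value of an object against the other candidates. The local neighborhood variant simply ignores candidates farther than a radius. The record type `SpatioTextualObject`, its `rating` field, the weights, the normalizers, the bandwidth `h`, and the `radius` are all hypothetical. The hybrid index and the Tournament strategy are not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class SpatioTextualObject:
    """Hypothetical record: a 2-D location, a keyword set, one numeric attribute."""
    oid: str
    x: float
    y: float
    keywords: frozenset
    rating: float


def synthetic_distance(a, b, w_loc=1/3, w_txt=1/3, w_num=1/3,
                       max_loc=math.sqrt(2), max_num=5.0):
    """Weighted sum of normalized location, text, and numeric distances.

    Assumptions: coordinates lie in the unit square (max_loc is its diagonal),
    ratings lie on a 0-5 scale, and Jaccard distance replaces the paper's
    semantic text distance. The equal weights are illustrative only.
    """
    d_loc = min(math.hypot(a.x - b.x, a.y - b.y) / max_loc, 1.0)
    union = a.keywords | b.keywords
    d_txt = 1.0 - len(a.keywords & b.keywords) / len(union) if union else 0.0
    d_num = min(abs(a.rating - b.rating) / max_num, 1.0)
    return w_loc * d_loc + w_txt * d_txt + w_num * d_num


def gaussian_kernel(d, h):
    """One-dimensional Gaussian kernel with bandwidth h evaluated at distance d."""
    return math.exp(-d * d / (2 * h * h)) / (math.sqrt(2 * math.pi) * h)


def typicality(o, objects, h, radius=None):
    """Kernel density estimate at o, averaged over the other objects.

    If radius is given, kernel contributions from objects farther than the
    radius are treated as zero (a local neighborhood approximation).
    """
    others = [p for p in objects if p.oid != o.oid]
    if not others:
        return 0.0
    total = 0.0
    for p in others:
        d = synthetic_distance(o, p)
        if radius is None or d <= radius:
            total += gaussian_kernel(d, h)
    return total / len(others)


def top_k_typical(objects, k, h, radius=None):
    """Return the ids of the k most typical objects (ties broken by id)."""
    scored = [(typicality(o, objects, h, radius), o.oid) for o in objects]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [oid for _, oid in scored[:k]]


# Usage: three similar cafés and one distant, dissimilar museum.
objs = [
    SpatioTextualObject("a", 0.10, 0.10, frozenset({"coffee", "wifi"}), 4.0),
    SpatioTextualObject("b", 0.12, 0.11, frozenset({"coffee", "wifi"}), 4.1),
    SpatioTextualObject("c", 0.11, 0.13, frozenset({"coffee", "cake"}), 3.9),
    SpatioTextualObject("d", 0.90, 0.95, frozenset({"museum"}), 2.0),
]
print(top_k_typical(objs, 2, h=0.2))  # -> ['a', 'b']
```

The outlier "d" receives a near-zero score because its synthetic distance to every other candidate is large. Restricting the sum to a radius leaves the ranking unchanged here, since distant objects contribute almost nothing to a Gaussian kernel. That observation is what makes a neighborhood-limited approximation plausible in general.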
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61772249) and partly by the General Research Project of the Education Department of Liaoning Province, China (LJKZ0355).
Funding
This work was supported by the National Natural Science Foundation of China (No. 61772249).
Author information
Contributions
XM contributed to conceptualization, methodology, software, data curation, and writing (original draft preparation); XZ contributed to conceptualization, validation, supervision, and writing (review and editing); HH and QL contributed to supervision and writing (review and editing). All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical and informed consent for data used
Written informed consent was obtained from all the participants prior to the enrollment (or for the publication) of this study (or case report).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Meng, X., Zhang, X., Huo, H. et al. Top-k approximate selection for typicality query results over spatio-textual data. Knowl Inf Syst 66, 1425–1468 (2024). https://doi.org/10.1007/s10115-023-02013-2
DOI: https://doi.org/10.1007/s10115-023-02013-2