Abstract
In recent years, mining knowledge from large-scale human mobility data has gained significant attention. Named Entity Recognition (NER) is a crucial technology for achieving a semantic understanding of human behavior and uncovering patterns in large-scale human mobility. Rapid advances in IoT and CPS technologies have led to the collection of massive human mobility data from various sources. This creates a need for cross-domain NER, which transfers entity information from a source domain to automatically identify and classify entities in texts from different target domains. When data are scarce, how to transfer human mobility knowledge across time and space becomes particularly significant. This paper therefore proposes an Adaptive Text Sequence Enhancement Module (at-SAM) to help the model strengthen the association between entities in sentences in data-poor target domains. It also proposes a Predicted Label-Guided Dual Sequence Aware Information Module (Dual-SAM) to improve the transferability of label information. Experiments were conducted in domains that contain hidden knowledge about human mobility. The results show that the method can transfer task knowledge across multiple domains in data-poor scenarios and achieve state-of-the-art (SOTA) performance.
Availability of data and materials
The data that support the findings of this study are openly available at https://github.com/jinpeng01/LANER/tree/main/ner_data.
Funding
This study was funded by the National Natural Science Foundation of China (No. 62272045).
Author information
Contributions
Yutong Jiang: Conceptualization, Investigation, Methodology, Visualization, Writing, Resources. Fusheng Jin: Funding acquisition, Supervision, Conceptualization, Data curation, Methodology, Writing (review & editing). Mengnan Chen: Data curation, Methodology, Validation, Visualization, Writing. Guoming Liu: Writing, Validation. He Pang: Writing, Supervision, Review & editing. Ye Yuan: Writing (review & editing).
Ethics declarations
Competing interests
The authors declare that they have no conflict of interest regarding the publication of this manuscript.
About this article
Cite this article
Jiang, Y., Jin, F., Chen, M. et al. Cross-domain NER in the data-poor scenarios for human mobility knowledge. Geoinformatica (2024). https://doi.org/10.1007/s10707-024-00513-z
DOI: https://doi.org/10.1007/s10707-024-00513-z