Cross-Lingual Transfer Learning in Drug-Related Information Extraction from User-Generated Texts

Sakhovskiy, A. S.; Tutubalina, E. V.

doi:10.1134/S036176882307006X

Published: 07 December 2023

Programming and Computer Software, volume 49, pages 590–595 (2023)


Abstract

Aggregating knowledge about drug, disease, and drug reaction entities across a broader range of domains and languages is critical for information extraction applications. In this work, we present a fine-grained evaluation of the effectiveness of multilingual BERT-based models for biomedical named entity recognition (NER) and multi-label sentence classification. We investigate the role of transfer learning strategies between two English corpora and a novel annotated corpus of Russian reviews about drug therapy. In these corpora, sentence-level labels indicate the presence or absence of health-related issues. Sentences that belong to a certain class are additionally labeled at the entity level to identify fine-grained subtypes such as drug names, drug indications, and drug reactions. The evaluation results demonstrate that training BERT on raw Russian and English reviews (5M in total) provides the best transfer capabilities for the adverse drug reaction detection task on the Russian data. Our RuDR-BERT model achieved a macro F1 score of 74.85% in the NER task. For the classification task, our EnRuDR-BERT model achieved a macro F1 score of 70%, gaining 8.64% over the score of a general-domain BERT model.
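Both headline results are macro F1 scores, that is, the unweighted mean of per-class F1 values. The sketch below shows one way such a score could be computed for multi-label sentence classification. It is an illustration only: the function name `macro_f1`, the label names `ADR` and `DI`, and the toy data are assumptions, and the abstract does not specify the exact evaluation script or averaging details the authors used.

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 over multi-label sentence predictions.

    gold and pred hold one set of labels per sentence; an empty set means
    the sentence carries no health-related label.
    """
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels and toy data, not taken from the paper's corpora.
gold = [{"ADR"}, {"ADR", "DI"}, set(), {"DI"}]
pred = [{"ADR"}, {"DI"}, {"ADR"}, {"DI"}]
score = macro_f1(gold, pred, ["ADR", "DI"])
print(f"macro F1 = {score:.2f}")
```

Because every label contributes equally to the mean, a rare class such as adverse drug reactions weighs as much as a frequent one, which is the usual reason for reporting macro rather than micro averages on imbalanced review data.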

Funding

This work was supported by the Russian Science Foundation, project no. 23-11-00358.

Author information

Authors and Affiliations

Sber AI, Kutuzovskii pr. 32, 121170, Moscow, Russia
A. S. Sakhovskiy & E. V. Tutubalina
Kazan Federal University, ul. Kremlevskaya 18, 420008, Kazan, Russia
A. S. Sakhovskiy & E. V. Tutubalina
National Research University Higher School of Economics, ul. Myasnitskaya 20, 101000, Moscow, Russia
E. V. Tutubalina

Authors

A. S. Sakhovskiy
E. V. Tutubalina

Corresponding authors

Correspondence to A. S. Sakhovskiy or E. V. Tutubalina.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by Yu. Kornienko

About this article

Cite this article

Sakhovskiy, A.S., Tutubalina, E.V. Cross-Lingual Transfer Learning in Drug-Related Information Extraction from User-Generated Texts. Program Comput Soft 49, 590–595 (2023). https://doi.org/10.1134/S036176882307006X

Received: 06 June 2023
Revised: 12 June 2023
Accepted: 15 June 2023
Published: 07 December 2023
Issue Date: December 2023
DOI: https://doi.org/10.1134/S036176882307006X
