-
Anisotropic span embeddings and the negative impact of higher-order inference for coreference resolution: An empirical analysis Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-25 Feng Hou, Ruili Wang, See-Kiong Ng, Fangyi Zhu, Michael Witbrock, Steven F. Cahan, Lily Chen, Xiaoyun Jia
Coreference resolution is the task of identifying and clustering mentions that refer to the same entity in a document. Based on state-of-the-art deep learning approaches, end-to-end coreference resolution considers all spans as candidate mentions and tackles mention detection and coreference resolution simultaneously. Recently, researchers have attempted to incorporate document-level context using
-
Automated annotation of parallel bible corpora with cross-lingual semantic concordance Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-25 Jens Dörpinghaus
Here we present an improved approach for automated annotation of New Testament corpora with cross-lingual semantic concordance based on Strong’s numbers. Based on already annotated texts, they provide references to the original Greek words. Since scientific editions and translations of biblical texts are often not available for scientific purposes and are rarely freely available, there is a lack of
-
How do control tokens affect natural language generation tasks like text simplification Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-23 Zihao Li, Matthew Shardlow
Recent work on text simplification has focused on the use of control tokens to further the state-of-the-art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenization strategy, which we also explore. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the
-
Emerging trends: When can users trust GPT, and when should they intervene? Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-16 Kenneth Church
Usage of large language models and chat bots will almost surely continue to grow, since they are so easy to use, and so (incredibly) credible. I would be more comfortable with this reality if we encouraged more evaluations with humans-in-the-loop to come up with a better characterization of when the machine can be trusted and when humans should intervene. This article will describe a homework assignment
-
Lightweight transformers for clinical natural language processing Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-12 Omid Rohanian, Mohammadmahdi Nouriborji, Hannah Jauncey, Samaneh Kouchaki, Farhad Nooralahzadeh, ISARIC Clinical Characterisation Group, Lei Clifton, Laura Merson, David A. Clifton
Specialised pre-trained language models are becoming more frequent in Natural Language Processing (NLP) since they can potentially outperform models trained on generic texts. BioBERT (Sanh et al., DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019) and BioClinicalBERT (Alsentzer et al., Publicly available clinical BERT embeddings. In
-
Actionable conversational quality indicators for improving task-oriented dialog systems Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-09 Michael Higgins, Dominic Widdows, Beth Ann Hockey, Akshay Hazare, Kristen Howell, Gwen Christian, Sujit Mathi, Chris Brew, Andrew Maurer, George Bonev, Matthew Dunn, Joseph Bradley
Automatic dialog systems have become a mainstream part of online customer service. Many such systems are built, maintained, and improved by customer service specialists, rather than dialog systems engineers and computer programmers. As conversations between people and machines become commonplace, it is critical to understand what is working, what is not, and what actions can be taken to reduce the
-
A year’s a long time in generative AI Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-08 Robert Dale
A lot has happened since OpenAI released ChatGPT to the public in November 2022. We review how things unfolded over the course of the year, tracking significant events and announcements from the tech giants leading the generative AI race and from other players of note; along the way we note the wider impacts of the technology’s progress.
-
Preface: Special issue on NLP approaches to offensive content online Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-12-06 Marcos Zampieri, Isabelle Augenstein, Siddharth Krishnan, Joshua Melton, Preslav Nakov
We are delighted to present the Special Issue on NLP Approaches to Offensive Content Online, published in the Journal of Natural Language Engineering, issue 29.6. We are happy to have received a total of 26 submissions to the special issue, evidencing the interest of the NLP community in this topic. Our guest editorial board, composed of international experts in the field, has worked hard to review all
-
OffensEval 2023: Offensive language identification in the age of Large Language Models Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-12-06 Marcos Zampieri, Sara Rosenthal, Preslav Nakov, Alphaeus Dmonte, Tharindu Ranasinghe
The OffensEval shared tasks organized as part of SemEval-2019–2020 were very popular, attracting over 1300 participating teams. The two editions of the shared task helped advance the state of the art in offensive language identification by providing the community with benchmark datasets in Arabic, Danish, English, Greek, and Turkish. The datasets were annotated using the OLID hierarchical taxonomy
-
Data-to-text generation using conditional generative adversarial with enhanced transformer Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-11-28 Elham Seifossadat, Hossein Sameti
In this paper, we propose an enhanced version of the vanilla transformer for data-to-text generation and then use it as the generator of a conditional generative adversarial model to improve the semantic quality and diversity of output sentences. Specifically, by adding a diagonal mask matrix to the attention scores of the encoder and using the history of the attention weights in the decoder, this
-
Abstractive summarization with deep reinforcement learning using semantic similarity rewards Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-10-31 Figen Beken Fikri, Kemal Oflazer, Berrin Yanıkoğlu
Abstractive summarization is an approach to document summarization that is not limited to selecting sentences from the document but can generate new sentences as well. We address the two main challenges in abstractive summarization: how to evaluate the performance of a summarization model and what is a good training objective. We first introduce new evaluation measures based on the semantic similarity of the
-
Neural Arabic singular-to-plural conversion using a pretrained Character-BERT and a fused transformer Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-10-11 Azzam Radman, Mohammed Atros, Rehab Duwairi
Morphological re-inflection generation is one of the most challenging tasks in the natural language processing (NLP) domain, especially with morphologically rich, low-resource languages like Arabic. In this research, we investigate the ability of transformer-based models in the singular-to-plural Arabic noun conversion task. We start with pretraining a Character-BERT model on a masked language modeling
-
Perceptional and actional enrichment for metaphor detection with sensorimotor norms Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-09-20 Mingyu Wan, Qi Su, Kathleen Ahrens, Chu-Ren Huang
Understanding the nature of meaning and its extensions (with metaphor as one typical kind) has been one core issue in figurative language study since Aristotle’s time. This research takes a computational cognitive perspective to model metaphor based on the assumption that meaning is perceptual, embodied, and encyclopedic. We model word meaning representation for metaphor detection with embodiment information
-
Emerging trends: Smooth-talking machines Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-09-11 Kenneth Ward Church, Richard Yue
Large language models (LLMs) have achieved amazing successes. They have done well on standardized tests in medicine and the law. That said, the bar has been raised so high that it could take decades to make good on expectations. To buy time for this long-term research program, the field needs to identify some good short-term applications for smooth-talking machines that are more fluent than trustworthy
-
Improved conversational recommender system based on dialog context Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-09-08 Xiaoyi Wang, Jie Liu, Jianyong Duan
A conversational recommender system (CRS) needs to integrate its two modules, recommendation and dialog, seamlessly, with the aim of recommending high-quality items to users through multiple rounds of interactive dialog. Items can typically refer to goods, movies, news, etc. Through this form of interactive dialog, users can express their preferences in real time, and the system can fully understand
-
A study towards contextual understanding of toxicity in online conversations Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-30 Pranava Madhyastha, Antigoni Founta, Lucia Specia
Identifying and annotating toxic online content on social media platforms is an extremely challenging problem. Work that studies toxicity in online content has predominantly focused on comments as independent entities. However, comments on social media are inherently conversational, and therefore, understanding and judging the comments fundamentally requires access to the context in which they are
-
Improving short text classification with augmented data using GPT-3 Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-25 Salvador V. Balkus, Donghui Yan
GPT-3 is a large-scale natural language model developed by OpenAI that can perform many different tasks, including topic classification. Although researchers claim that it requires only a small number of in-context examples to learn a task, in practice GPT-3 requires these training examples to be either of exceptional quality or a higher quantity than easily created by hand. To address this issue,
-
Creating a large-scale diachronic corpus resource: Automated parsing in the Greek papyri (and beyond) Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-15 Alek Keersmaekers, Toon Van Hal
This paper explores how to syntactically parse Ancient Greek texts automatically and maps ways of fruitfully employing the results of such an automated analysis. Special attention is given to documentary papyrus texts, a large diachronic corpus of non-literary Greek, which presents a unique set of challenges to tackle. By making use of the Stanford Graph-Based Neural Dependency Parser, we show that
-
PGST: A Persian gender style transfer method Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-15 Reza Khanmohammadi, Seyed Abolghasem Mirroshandel
Recent developments in text style transfer have led this field to be more highlighted than ever. There are many challenges associated with transferring the style of input text such as fluency and content preservation that need to be addressed. In this research, we present PGST, a novel Persian text style transfer approach in the gender domain, composed of different constituent elements. Established
-
Toward a shallow discourse parser for Turkish Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-11 Ferhat Kutlu, Deniz Zeyrek, Murathan Kurfalı
One of the most interesting aspects of natural language is how texts cohere, which involves the pragmatic or semantic relations that hold between clauses (addition, cause-effect, conditional, similarity), referred to as discourse relations. A focus on the identification and classification of discourse relations appears as an imperative challenge to be resolved to support tasks such as text summarization
-
Emojis as anchors to detect Arabic offensive language and hate speech Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-10 Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury
We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets—analyzing key cultural differences. We observed a constant
-
How you describe procurement calls matters: Predicting outcome of public procurement using call descriptions Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-10 Utku Umur Acikalin, Mustafa Kaan Gorgun, Mucahid Kutlu, Bedri Kamil Onur Tas
A competitive and cost-effective public procurement (PP) process is essential for the effective use of public resources. In this work, we explore whether descriptions of procurement calls can be used to predict their outcomes. In particular, we focus on predicting four well-known economic metrics: (i) the number of offers, (ii) whether only a single offer is received, (iii) whether a foreign firm is
-
SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-03 Xuanyu Su, Yansong Li, Paula Branco, Diana Inkpen
Anti-Asian speech during the COVID-19 pandemic has been a serious problem with severe consequences. A hate speech wave swept social media platforms. The timely detection of Anti-Asian COVID-19-related hate speech is of utmost importance, not only to allow the application of preventive mechanisms but also to anticipate and possibly prevent other similar discriminatory situations. In this paper, we address
-
Masked transformer through knowledge distillation for unsupervised text style transfer Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-07-25 Arthur Scalercio, Aline Paes
Text style transfer (TST) aims at automatically changing a text’s stylistic features, such as formality, sentiment, authorial style, humor, and complexity, while still trying to preserve its content. Although the scientific community has investigated TST since the 1980s, it has recently regained attention by adopting deep unsupervised strategies to address the challenge of training without parallel
-
Navigating the text generation revolution: Traditional data-to-text NLG companies and the rise of ChatGPT Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-07-19 Robert Dale
Since the release of ChatGPT at the end of November 2022, generative AI has been talked about endlessly in both the technical press and the mainstream media. Large language model technology has been heralded as many things: the disruption of the search engine, the end of the student essay, the bringer of disinformation … but what does it mean for commercial providers of earlier iterations of natural
-
Assessment of the E3C corpus for the recognition of disorders in clinical texts Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-07-18 Roberto Zanoli, Alberto Lavelli, Daniel Verdi do Amarante, Daniele Toti
Disorder named entity recognition (DNER) is a fundamental task of biomedical natural language processing, which has attracted plenty of attention. This task consists in extracting named entities of disorders such as diseases, symptoms, and pathological functions from unstructured text. The European Clinical Case Corpus (E3C) is a freely available multilingual corpus (English, French, Italian, Spanish
-
Describe the house and I will tell you the price: House price prediction with textual description data Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-07-18 Hanxiang Zhang, Yansong Li, Paula Branco
House price prediction is an important problem that could benefit home buyers and sellers. Traditional models for house price prediction use numerical attributes such as the number of rooms but disregard the house description text. The recent developments in text processing suggest these can be valuable attributes, which motivated us to use house descriptions. This paper focuses on the house asking/advertising
-
Korean named entity recognition based on language-specific features Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-29 Yige Chen, KyungTae Lim, Jungyeul Park
In this paper, we propose a novel way of improving named entity recognition (NER) in the Korean language using its language-specific features. While the field of NER has been studied extensively in recent years, the mechanism of efficiently recognizing named entities (NEs) in Korean has hardly been explored. This is because the Korean language has distinct linguistic properties that present challenges
-
Linguistically aware evaluation of coreference resolution from the perspective of higher-level applications Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-19 Voldemaras Žitkus, Rita Butkienė, Rimantas Butleris
Coreference resolution is an important part of natural language processing used in machine translation, semantic search, and various other information retrieval and understanding systems. One of the challenges in this field is an evaluation of resolution approaches. There are many different metrics proposed, but most of them rely on certain assumptions, like equivalence between different mentions of
-
A resampling-based method to evaluate NLI models Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-09 Felipe de Souza Salvatore, Marcelo Finger, Roberto Hirata, Alexandre G. Patriota
The recent progress of deep learning techniques has produced models capable of achieving high scores on traditional Natural Language Inference (NLI) datasets. To understand the generalization limits of these powerful models, an increasing number of adversarial evaluation schemes have appeared. These works use a similar evaluation method: they construct a new NLI test set based on sentences with known
-
Focusing on potential named entities during active label acquisition Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-06 Ali Osman Berk Şapcı, Hasan Kemik, Reyyan Yeniterzi, Oznur Tastan
Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performances in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the
-
Cluster-based ensemble learning model for improving sentiment classification of Arabic documents Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-01 Rana Husni Al Mahmoud, Bassam H. Hammo, Hossam Faris
This article reports on designing and implementing a multiclass sentiment classification approach to handle the imbalanced class distribution of Arabic documents. The proposed approach, sentiment classification of Arabic documents (SCArD), combines the advantages of a clustering-based undersampling (CBUS) method and an ensemble learning model to aid machine learning (ML) classifiers in building accurate
-
Polish natural language inference and factivity: An expert-based dataset and benchmarks Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-01 Daniel Ziembicki, Karolina Seweryn, Anna Wróblewska
Despite recent breakthroughs in Machine Learning for Natural Language Processing, the Natural Language Inference (NLI) problems still constitute a challenge. To this purpose, we contribute a new dataset that focuses exclusively on the factivity phenomenon; however, our task remains the same as other NLI tasks, that is prediction of entailment, contradiction, or neutral (ECN). In this paper, we describe
-
Improving semantic coverage of data-to-text generation model using dynamic memory networks Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-31 Elham Seifossadat, Hossein Sameti
This paper proposes a sequence-to-sequence model for data-to-text generation, called DM-NLG, to generate a natural language text from structured nonlinguistic input. Specifically, by adding a dynamic memory module to the attention-based sequence-to-sequence model, it can store the information that leads to generate previous output words and use it to generate the next word. In this way, the decoder
-
Urdu paraphrase detection: A novel DNN-based implementation using a semi-automatically generated corpus Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-29 Hafiz Rizwan Iqbal, Rashad Maqsood, Agha Ali Raza, Saeed-Ul Hassan
Automatic paraphrase detection is the task of measuring the semantic overlap between two given texts. A major hurdle in the development and evaluation of paraphrase detection approaches, particularly for South Asian languages like Urdu, is the inadequacy of standard evaluation resources. The very few available paraphrased corpora for these languages are manually created. As a result, they are constrained
-
Morphosyntactic probing of multilingual BERT models Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-25 Judit Acs, Endre Hamerlik, Roy Schwartz, Noah A. Smith, Andras Kornai
We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong
-
Plot extraction and the visualization of narrative flow Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-23 Michael A. DeBuse, Sean Warnick
This article discusses the development of an automated plot extraction system for narrative texts. Acknowledging the distinction between plot, as an object of study with its own rich history and literature, and features of a text that may be automatically extractable, we begin by characterizing a text’s scatter plot of entities. This visualization of a text reveals entity density patterns characterizing
-
Emerging trends: Risks 3.0 and proliferation of spyware to 50,000 cell phones Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-19 Kenneth Ward Church, Raman Chandrasekar
Our last emerging trend article introduced Risks 1.0 (fairness and bias) and Risks 2.0 (addictive, dangerous, deadly, and insanely profitable). This article introduces Risks 3.0 (spyware and cyber weapons). Risks 3.0 are less profitable, but more destructive. We will summarize two recent books, Pegasus: How a Spy in Your Pocket Threatens the End of Privacy, Dignity, and Democracy and This is How They
-
A comparison of latent semantic analysis and correspondence analysis of document-term matrices Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-18 Qianqian Qi, David J. Hessen, Tejaswini Deoskar, Peter G. M. van der Heijden
Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-term matrices
-
Recommending tasks based on search queries and missions Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-17 Darío Garigliotti, Krisztian Balog, Katja Hose, Johannes Bjerva
Web search is an experience that naturally lends itself to recommendations, including query suggestions and related entities. In this article, we propose to recommend specific tasks to users, based on their search queries, such as planning a holiday trip or organizing a party. Specifically, we introduce the problem of query-based task recommendation and develop methods that combine well-established
-
Determining sentiment views of verbal multiword expressions using linguistic features Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-15 Michael Wiegand, Marc Schulder, Josef Ruppenhofer
We examine the binary classification of sentiment views for verbal multiword expressions (MWEs). Sentiment views denote the perspective of the holder of some opinion. We distinguish between MWEs conveying the view of the speaker of the utterance (e.g., in “The company reinvented the wheel” the holder is the implicit speaker who criticizes the company for creating something already existing) and MWEs
-
Joint learning of text alignment and abstractive summarization for long documents via unbalanced optimal transport Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-15 Xin Shen, Wai Lam, Shumin Ma, Huadong Wang
Recently, neural abstractive text summarization (NATS) models based on sequence-to-sequence architecture have drawn a lot of attention. Real-world texts that need to be summarized range from short news with dozens of words to long reports with thousands of words. However, most existing NATS models are not good at summarizing long documents, due to the inherent limitations of their underlying neural
-
What should be encoded by position embedding for neural network language models? Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-10 Shuiyuan Yu, Zihao Zhang, Haitao Liu
Word order is one of the most important grammatical devices and the basis for language understanding. However, as one of the most popular NLP architectures, Transformer does not explicitly encode word order. A solution to this problem is to incorporate position information by means of position encoding/embedding (PE). Although a variety of methods of incorporating position information have been proposed
-
SAN-T2T: An automated table-to-text generator based on selective attention network Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-05 Haijie Ding, Xiaolong Xu
Table-to-text generation aims to generate descriptions for structured data (i.e., tables) and has been applied in many fields like question-answering systems and search engines. Current approaches mostly use neural language models to learn alignment between output and input based on the attention mechanisms, which are still flawed by the gradual weakening of attention when processing long texts and
-
A transformer-based multi-task framework for joint detection of aggression and hate on social media data Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-04-11 Soumitra Ghosh, Amit Priyankar, Asif Ekbal, Pushpak Bhattacharyya
Moderators often face a double challenge in reducing offensive and harmful content on social media. Despite the need to prevent the free circulation of such content, strict censorship on social media cannot be implemented because of a tricky dilemma: how to preserve free speech on the Internet while limiting such content, and how not to overreact. Existing systems do not essentially exploit the correlatedness
-
On generalization of the sense retrofitting model Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-03-31 Yang-Yin Lee, Ting-Yu Yen, Hen-Hsen Huang, Yow-Ting Shiue, Hsin-Hsi Chen
With the aid of recently proposed word embedding algorithms, the study of semantic relatedness has progressed rapidly. However, word-level representations are still lacking for many natural language processing tasks. Various sense-level embedding learning algorithms have been proposed to address this issue. In this paper, we present a generalized model derived from existing sense retrofitting models
-
The problem of varying annotations to identify abusive language in social media content Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-03-29 Nina Seemann, Yeong Su Lee, Julian Höllig, Michaela Geierhos
With the increase of user-generated content on social media, the detection of abusive language has become crucial and is therefore reflected in several shared tasks that have been performed in recent years. The development of automatic detection systems is desirable, and the classification of abusive social media content can be solved with the help of machine learning. The basis for successful development
-
Towards diverse and contextually anchored paraphrase modeling: A dataset and baselines for Finnish Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-03-16 Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Aurora Piirto, Jenna Saarni, Maija Sevón, Otto Tarkka
In this paper, we study natural language paraphrasing from both corpus creation and modeling points of view. We focus in particular on the methodology that allows the extraction of challenging examples of paraphrase pairs in their natural textual context, leading to a dataset potentially more suitable for evaluating the models’ ability to represent meaning, especially in document context, when compared
-
Argumentation models and their use in corpus annotation: Practice, prospects, and challenges Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-28 Henrique Lopes Cardoso, Rui Sousa-Silva, Paula Carvalho, Bruno Martins
The study of argumentation is transversal to several research domains, from philosophy to linguistics, from the law to computer science and artificial intelligence. In discourse analysis, several distinct models have been proposed to harness argumentation, each with a different focus or aim. To analyze the use of argumentation in natural language, several corpora annotation efforts have been carried
-
Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-27 Dmytro Kalpakchi, Johan Boye
We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, deterministic and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines
-
An end-to-end neural framework using coarse-to-fine-grained attention for overlapping relational triple extraction Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-21 Huizhe Su, Hao Wang, Xiangfeng Luo, Shaorong Xie
In recent years, the extraction of overlapping relations has received great attention in the field of natural language processing (NLP). However, most existing approaches treat relational triples in sentences as isolated, without considering the rich semantic correlations implied in the relational hierarchy. Extracting these overlapping relational triples is challenging, given the overlapping types
-
An unsupervised perplexity-based method for boilerplate removal Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-21 Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel, Pablo Gamallo
The availability of large web-based corpora has led to significant advances in a wide range of technologies, including massive retrieval systems or deep neural networks. However, leveraging this data is challenging, since web content is plagued by the so-called boilerplate: ads, incomplete or noisy text, and remnants of the navigation structure, such as menus or navigation bars. In this work, we present
-
How to do human evaluation: A brief introduction to user studies in NLP Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-06 Hendrik Schuff, Lindsey Vanderlyn, Heike Adel, Ngoc Thang Vu
Many research topics in natural language processing (NLP), such as explanation generation, dialog modeling, or machine translation, require evaluation that goes beyond standard metrics like accuracy or F1 score toward a more human-centered approach. Therefore, understanding how to design user studies becomes increasingly important. However, few comprehensive resources exist on planning, conducting
-
NLP startup funding in 2022 Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-01-31 Robert Dale
It’s no secret that the commercial application of NLP technologies has exploded in recent years. From chatbots and virtual assistants to machine translation and sentiment analysis, NLP technologies are now being used in a wide variety of applications across a range of industries. With the increasing demand for technologies that can process human language, investors have been eager to get a piece of
-
KLAUS-Tr: Knowledge & learning-based unit focused arithmetic word problem solver for transfer cases Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-12-22 Suresh Kumar, P. Sreenivasa Kumar
Solving the Arithmetic Word Problems (AWPs) using AI techniques has attracted much attention in recent years. We feel that the current AWP solvers are under-utilizing the relevant domain knowledge. We present a knowledge- and learning-based system that effectively solves AWPs of a specific type—those that involve transfer of objects from one agent to another (Transfer Cases (TC)). We represent the
-
SEN: A subword-based ensemble network for Chinese historical entity extraction Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-12-22 Chengxi Yan, Ruojia Wang, Xiaoke Fang
Understanding various historical entity information (e.g., persons, locations, and time) plays a very important role in reasoning about the developments of historical events. With the increasing concern about the fields of digital humanities and natural language processing, named entity recognition (NER) provides a feasible solution for automatically extracting these entities from historical texts
-
Emerging trends: Unfair, biased, addictive, dangerous, deadly, and insanely profitable Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-12-19 Kenneth Church, Annika Schoene, John E. Ortega, Raman Chandrasekar, Valia Kordoni
There has been considerable work recently in the natural language community and elsewhere on Responsible AI. Much of this work focuses on fairness and biases (henceforth Risks 1.0), following the 2016 best seller: Weapons of Math Destruction. Two books published in 2022, The Chaos Machine and Like, Comment, Subscribe, raise additional risks to public health/safety/security such as genocide, insurrection
-
Parameter-efficient feature-based transfer for paraphrase identification Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-12-19 Xiaodong Liu, Rafal Rzepka, Kenji Araki
There are many types of approaches for Paraphrase Identification (PI), an NLP task of determining whether a sentence pair has equivalent semantics. Traditional approaches mainly consist of unsupervised learning and feature engineering, which are computationally inexpensive. However, their task performance is moderate nowadays. To seek a method that can preserve the low computational costs of traditional
-
Towards universal methods for fake news detection Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-10-26 Maria Pszona, Maria Janicka, Grzegorz Wojdyga, Aleksander Wawer
Fake news detection is an emerging topic that has attracted a lot of attention among researchers and in the industry. This paper focuses on fake news detection as a text classification problem: on the basis of five publicly available corpora with documents labeled as true or fake, the task was to automatically distinguish both classes without relying on fact-checking. The aim of our research was to
-
A benchmark for evaluating Arabic word embedding models Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-10-17 Sane Yagi, Ashraf Elnagar, Shehdeh Fareh
Modelling the distributional semantics of such a morphologically rich language as Arabic needs to take into account its introflexive, fusional, and inflectional nature, attributes that make up its combinatorial sequences and substitutional paradigms. To evaluate such word distributional models, the benchmarks that have been used thus far in Arabic have mimicked those in English. This paper reports on