Natural Language Engineering: latest papers (Computer Science, Artificial Intelligence)

Anisotropic span embeddings and the negative impact of higher-order inference for coreference resolution: An empirical analysis

Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-25
Feng Hou, Ruili Wang, See-Kiong Ng, Fangyi Zhu, Michael Witbrock, Steven F. Cahan, Lily Chen, Xiaoyun Jia

Coreference resolution is the task of identifying and clustering mentions that refer to the same entity in a document. Based on state-of-the-art deep learning approaches, end-to-end coreference resolution considers all spans as candidate mentions and tackles mention detection and coreference resolution simultaneously. Recently, researchers have attempted to incorporate document-level context using

Updated: 2024-01-25

Automated annotation of parallel bible corpora with cross-lingual semantic concordance

Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-25
Jens Dörpinghaus

Here we present an improved approach for automated annotation of New Testament corpora with cross-lingual semantic concordance based on Strong’s numbers. Based on already annotated texts, these numbers provide references to the original Greek words. Since scientific editions and translations of biblical texts are often not available for scientific purposes and are rarely freely available, there is a lack of

Updated: 2024-01-25

How do control tokens affect natural language generation tasks like text simplification

Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-23
Zihao Li, Matthew Shardlow

Recent work on text simplification has focused on the use of control tokens to further the state-of-the-art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenization strategy, which we also explore. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the

Updated: 2024-01-23

Emerging trends: When can users trust GPT, and when should they intervene?

Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-16
Kenneth Church

Usage of large language models and chat bots will almost surely continue to grow, since they are so easy to use, and so (incredibly) credible. I would be more comfortable with this reality if we encouraged more evaluations with humans-in-the-loop to come up with a better characterization of when the machine can be trusted and when humans should intervene. This article will describe a homework assignment

Updated: 2024-01-16

Lightweight transformers for clinical natural language processing

Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-12
Omid Rohanian, Mohammadmahdi Nouriborji, Hannah Jauncey, Samaneh Kouchaki, Farhad Nooralahzadeh, ISARIC Clinical Characterisation Group, Lei Clifton, Laura Merson, David A. Clifton

Specialised pre-trained language models are becoming more frequent in Natural Language Processing (NLP) since they can potentially outperform models trained on generic texts. BioBERT (Sanh et al., Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv: 1910.01108, 2019) and BioClinicalBERT (Alsentzer et al., Publicly available clinical bert embeddings. In

Updated: 2024-01-12

Actionable conversational quality indicators for improving task-oriented dialog systems

Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-09
Michael Higgins, Dominic Widdows, Beth Ann Hockey, Akshay Hazare, Kristen Howell, Gwen Christian, Sujit Mathi, Chris Brew, Andrew Maurer, George Bonev, Matthew Dunn, Joseph Bradley

Automatic dialog systems have become a mainstream part of online customer service. Many such systems are built, maintained, and improved by customer service specialists, rather than dialog systems engineers and computer programmers. As conversations between people and machines become commonplace, it is critical to understand what is working, what is not, and what actions can be taken to reduce the

Updated: 2024-01-09

A year’s a long time in generative AI

Nat. Lang. Eng. (IF 2.5) Pub Date : 2024-01-08
Robert Dale

A lot has happened since OpenAI released ChatGPT to the public in November 2022. We review how things unfolded over the course of the year, tracking significant events and announcements from the tech giants leading the generative AI race and from other players of note; along the way we note the wider impacts of the technology’s progress.

Updated: 2024-01-08

Preface: Special issue on NLP approaches to offensive content online

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-12-06
Marcos Zampieri, Isabelle Augenstein, Siddharth Krishnan, Joshua Melton, Preslav Nakov

We are delighted to present the Special Issue on NLP Approaches to Offensive Content Online published in the Journal of Natural Language Engineering issue 29.6. We are happy to have received a total of 26 submissions to the special issue, evidencing the interest of the NLP community in this topic. Our guest editorial board, composed of international experts in the field, has worked hard to review all

Updated: 2023-12-06

OffensEval 2023: Offensive language identification in the age of Large Language Models

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-12-06
Marcos Zampieri, Sara Rosenthal, Preslav Nakov, Alphaeus Dmonte, Tharindu Ranasinghe

The OffensEval shared tasks organized as part of SemEval-2019–2020 were very popular, attracting over 1300 participating teams. The two editions of the shared task helped advance the state of the art in offensive language identification by providing the community with benchmark datasets in Arabic, Danish, English, Greek, and Turkish. The datasets were annotated using the OLID hierarchical taxonomy

Updated: 2023-12-06

Data-to-text generation using conditional generative adversarial with enhanced transformer

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-11-28
Elham Seifossadat, Hossein Sameti

In this paper, we propose an enhanced version of the vanilla transformer for data-to-text generation and then use it as the generator of a conditional generative adversarial model to improve the semantic quality and diversity of output sentences. Specifically, by adding a diagonal mask matrix to the attention scores of the encoder and using the history of the attention weights in the decoder, this

Updated: 2023-11-28

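The abstract says only that a diagonal mask matrix is added to the encoder's attention scores. As a minimal sketch, assuming the mask holds -inf on the diagonal so that each position attends only to the other positions (one plausible reading, not necessarily the authors' design), the mechanism looks like this in NumPy; the function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def diagonal_masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Scaled dot-product self-attention with an additive diagonal mask.

    ASSUMPTION (hypothetical): the mask holds -inf on the diagonal, so a
    position cannot attend to itself. Requires at least two positions.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)              # (n, n) raw attention scores
    mask = np.zeros((n, n))
    np.fill_diagonal(mask, -np.inf)            # block each token's self-score
    weights = softmax(scores + mask, axis=-1)  # rows sum to 1, diagonal is 0
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # 4 toy token vectors, width 8
out, w = diagonal_masked_attention(x, x, x)
print(np.round(np.diag(w), 3))                 # self-attention weights are all 0
```

Because exp(-inf) is exactly 0, the masked entries receive zero weight and the remaining weights in each row still sum to 1.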
Abstractive summarization with deep reinforcement learning using semantic similarity rewards

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-10-31
Figen Beken Fikri, Kemal Oflazer, Berrin Yanıkoğlu

Abstractive summarization is an approach to document summarization that is not limited to selecting sentences from the document but can generate new sentences as well. We address the two main challenges in abstractive summarization: how to evaluate the performance of a summarization model and what is a good training objective. We first introduce new evaluation measures based on the semantic similarity of the

Updated: 2023-10-31

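The title refers to rewards based on semantic similarity. A minimal sketch, assuming cosine similarity as the measure, a hypothetical bag-of-words vector standing in for whatever sentence encoder the authors use, and a generic REINFORCE loss (not necessarily the paper's training objective):

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # HYPOTHETICAL stand-in embedding: a lowercase bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_reward(generated: str, reference: str) -> float:
    # Reward = semantic similarity between the sampled summary and the reference.
    return cosine(embed(generated), embed(reference))


def reinforce_loss(sum_log_prob: float, reward: float, baseline: float) -> float:
    # Generic REINFORCE objective: minimizing -(r - b) * log p(y) raises the
    # probability of summaries whose reward exceeds the baseline.
    return -(reward - baseline) * sum_log_prob


r = semantic_reward("a cat sat", "a dog ran")
print(round(r, 3))  # 0.333: only "a" is shared
```

Replacing `embed` with a learned sentence encoder would let paraphrases with little word overlap still earn a high reward, which is the usual motivation for semantic rather than n-gram rewards.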
Neural Arabic singular-to-plural conversion using a pretrained Character-BERT and a fused transformer

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-10-11
Azzam Radman, Mohammed Atros, Rehab Duwairi

Morphological re-inflection generation is one of the most challenging tasks in the natural language processing (NLP) domain, especially with morphologically rich, low-resource languages like Arabic. In this research, we investigate the ability of transformer-based models in the singular-to-plural Arabic noun conversion task. We start with pretraining a Character-BERT model on a masked language modeling

Updated: 2023-10-11

Perceptional and actional enrichment for metaphor detection with sensorimotor norms

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-09-20
Mingyu Wan, Qi Su, Kathleen Ahrens, Chu-Ren Huang

Understanding the nature of meaning and its extensions (with metaphor as one typical kind) has been one core issue in figurative language study since Aristotle’s time. This research takes a computational cognitive perspective to model metaphor based on the assumption that meaning is perceptual, embodied, and encyclopedic. We model word meaning representation for metaphor detection with embodiment information

Updated: 2023-09-20

Emerging trends: Smooth-talking machines

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-09-11
Kenneth Ward Church, Richard Yue

Large language models (LLMs) have achieved amazing successes. They have done well on standardized tests in medicine and the law. That said, the bar has been raised so high that it could take decades to make good on expectations. To buy time for this long-term research program, the field needs to identify some good short-term applications for smooth-talking machines that are more fluent than trustworthy

Updated: 2023-09-11

Improved conversational recommender system based on dialog context

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-09-08
Xiaoyi Wang, Jie Liu, Jianyong Duan

A conversational recommender system (CRS) needs to seamlessly integrate the two modules of recommendation and dialog, aiming to recommend high-quality items to users through multiple rounds of interactive dialog. Items can typically refer to goods, movies, news, etc. Through this form of interactive dialog, users can express their preferences in real time, and the system can fully understand

Updated: 2023-09-08

A study towards contextual understanding of toxicity in online conversations

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-30
Pranava Madhyastha, Antigoni Founta, Lucia Specia

Identifying and annotating toxic online content on social media platforms is an extremely challenging problem. Work that studies toxicity in online content has predominantly focused on comments as independent entities. However, comments on social media are inherently conversational, and therefore, understanding and judging the comments fundamentally requires access to the context in which they are

Updated: 2023-08-30

Improving short text classification with augmented data using GPT-3

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-25
Salvador V. Balkus, Donghui Yan

GPT-3 is a large-scale natural language model developed by OpenAI that can perform many different tasks, including topic classification. Although researchers claim that it requires only a small number of in-context examples to learn a task, in practice GPT-3 requires these training examples to be either of exceptional quality or a higher quantity than easily created by hand. To address this issue,

Updated: 2023-08-25

Creating a large-scale diachronic corpus resource: Automated parsing in the Greek papyri (and beyond)

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-15
Alek Keersmaekers, Toon Van Hal

This paper explores how to syntactically parse Ancient Greek texts automatically and maps ways of fruitfully employing the results of such an automated analysis. Special attention is given to documentary papyrus texts, a large diachronic corpus of non-literary Greek, which presents a unique set of challenges to tackle. By making use of the Stanford Graph-Based Neural Dependency Parser, we show that

Updated: 2023-08-15

PGST: A Persian gender style transfer method

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-15
Reza Khanmohammadi, Seyed Abolghasem Mirroshandel

Recent developments in text style transfer have brought this field more attention than ever. There are many challenges associated with transferring the style of input text, such as fluency and content preservation, that need to be addressed. In this research, we present PGST, a novel Persian text style transfer approach in the gender domain, composed of different constituent elements. Established

Updated: 2023-08-15

Toward a shallow discourse parser for Turkish

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-11
Ferhat Kutlu, Deniz Zeyrek, Murathan Kurfalı

One of the most interesting aspects of natural language is how texts cohere, which involves the pragmatic or semantic relations that hold between clauses (addition, cause-effect, conditional, similarity), referred to as discourse relations. A focus on the identification and classification of discourse relations appears as an imperative challenge to be resolved to support tasks such as text summarization

Updated: 2023-08-11

Emojis as anchors to detect Arabic offensive language and hate speech

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-10
Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury

We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets—analyzing key cultural differences. We observed a constant

Updated: 2023-08-10

How you describe procurement calls matters: Predicting outcome of public procurement using call descriptions

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-10
Utku Umur Acikalin, Mustafa Kaan Gorgun, Mucahid Kutlu, Bedri Kamil Onur Tas

A competitive and cost-effective public procurement (PP) process is essential for the effective use of public resources. In this work, we explore whether descriptions of procurement calls can be used to predict their outcomes. In particular, we focus on predicting four well-known economic metrics: (i) the number of offers, (ii) whether only a single offer is received, (iii) whether a foreign firm is

Updated: 2023-08-10

SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-08-03
Xuanyu Su, Yansong Li, Paula Branco, Diana Inkpen

Anti-Asian speech during the COVID-19 pandemic has been a serious problem with severe consequences. A hate speech wave swept social media platforms. The timely detection of Anti-Asian COVID-19-related hate speech is of utmost importance, not only to allow the application of preventive mechanisms but also to anticipate and possibly prevent other similar discriminatory situations. In this paper, we address

Updated: 2023-08-03

Masked transformer through knowledge distillation for unsupervised text style transfer

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-07-25
Arthur Scalercio, Aline Paes

Text style transfer (TST) aims at automatically changing a text’s stylistic features, such as formality, sentiment, authorial style, humor, and complexity, while still trying to preserve its content. Although the scientific community has investigated TST since the 1980s, it has recently regained attention by adopting deep unsupervised strategies to address the challenge of training without parallel

Updated: 2023-07-25

Navigating the text generation revolution: Traditional data-to-text NLG companies and the rise of ChatGPT

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-07-19
Robert Dale

Since the release of ChatGPT at the end of November 2022, generative AI has been talked about endlessly in both the technical press and the mainstream media. Large language model technology has been heralded as many things: the disruption of the search engine, the end of the student essay, the bringer of disinformation … but what does it mean for commercial providers of earlier iterations of natural

Updated: 2023-07-19

Assessment of the E3C corpus for the recognition of disorders in clinical texts

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-07-18
Roberto Zanoli, Alberto Lavelli, Daniel Verdi do Amarante, Daniele Toti

Disorder named entity recognition (DNER) is a fundamental task of biomedical natural language processing, which has attracted plenty of attention. This task consists of extracting named entities of disorders such as diseases, symptoms, and pathological functions from unstructured text. The European Clinical Case Corpus (E3C) is a freely available multilingual corpus (English, French, Italian, Spanish

Updated: 2023-07-18

Describe the house and I will tell you the price: House price prediction with textual description data

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-07-18
Hanxiang Zhang, Yansong Li, Paula Branco

House price prediction is an important problem that could benefit home buyers and sellers. Traditional models for house price prediction use numerical attributes such as the number of rooms but disregard the house description text. The recent developments in text processing suggest these can be valuable attributes, which motivated us to use house descriptions. This paper focuses on the house asking/advertising

Updated: 2023-07-18

Korean named entity recognition based on language-specific features

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-29
Yige Chen, KyungTae Lim, Jungyeul Park

In this paper, we propose a novel way of improving named entity recognition (NER) in the Korean language using its language-specific features. While the field of NER has been studied extensively in recent years, the mechanism of efficiently recognizing named entities (NEs) in Korean has hardly been explored. This is because the Korean language has distinct linguistic properties that present challenges

Updated: 2023-06-29

Linguistically aware evaluation of coreference resolution from the perspective of higher-level applications

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-19
Voldemaras Žitkus, Rita Butkienė, Rimantas Butleris

Coreference resolution is an important part of natural language processing used in machine translation, semantic search, and various other information retrieval and understanding systems. One of the challenges in this field is the evaluation of resolution approaches. Many different metrics have been proposed, but most of them rely on certain assumptions, like equivalence between different mentions of

Updated: 2023-06-19

A resampling-based method to evaluate NLI models

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-09
Felipe de Souza Salvatore, Marcelo Finger, Roberto Hirata, Alexandre G. Patriota

The recent progress of deep learning techniques has produced models capable of achieving high scores on traditional Natural Language Inference (NLI) datasets. To understand the generalization limits of these powerful models, an increasing number of adversarial evaluation schemes have appeared. These works use a similar evaluation method: they construct a new NLI test set based on sentences with known

Updated: 2023-06-09

Focusing on potential named entities during active label acquisition

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-06
Ali Osman Berk Şapcı, Hasan Kemik, Reyyan Yeniterzi, Oznur Tastan

Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performances in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the

Updated: 2023-06-06

Cluster-based ensemble learning model for improving sentiment classification of Arabic documents

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-01
Rana Husni Al Mahmoud, Bassam H. Hammo, Hossam Faris

This article reports on designing and implementing a multiclass sentiment classification approach to handle the imbalanced class distribution of Arabic documents. The proposed approach, sentiment classification of Arabic documents (SCArD), combines the advantages of a clustering-based undersampling (CBUS) method and an ensemble learning model to aid machine learning (ML) classifiers in building accurate

Updated: 2023-06-01

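SCArD's undersampling step is described only as clustering-based. A minimal sketch of one common CBUS variant, assuming k-means with k equal to the minority-class size and keeping the real sample nearest each centroid (the function names and these choices are hypothetical, and the ensemble stage is omitted):

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    # Plain Lloyd's k-means; returns the k centroids.
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def cluster_undersample(x: np.ndarray, y: np.ndarray, seed: int = 0):
    """Shrink every class to the minority-class size.

    HYPOTHETICAL variant: each larger class is clustered into `target`
    groups and the real sample nearest to each centroid is kept.
    """
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    keep: list[int] = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if len(idx) == target:
            keep.extend(int(i) for i in idx)
            continue
        chosen: set[int] = set()
        for cen in kmeans(x[idx], target, seed=seed):
            order = np.argsort(((x[idx] - cen) ** 2).sum(axis=1))
            # Nearest sample not already picked for another centroid.
            chosen.add(next(int(i) for i in idx[order] if int(i) not in chosen))
        keep.extend(chosen)
    keep_idx = np.array(sorted(keep))
    return x[keep_idx], y[keep_idx]


rng = np.random.default_rng(1)
x = rng.normal(size=(60, 2))
y = np.array([0] * 40 + [1] * 12 + [2] * 8)   # imbalanced toy labels
xs, ys = cluster_undersample(x, y)
print(np.bincount(ys))                        # [8 8 8]
```

Keeping real samples near centroids, rather than discarding majority examples at random, is meant to preserve the spread of each majority class while balancing the training set.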
Polish natural language inference and factivity: An expert-based dataset and benchmarks

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-06-01
Daniel Ziembicki, Karolina Seweryn, Anna Wróblewska

Despite recent breakthroughs in machine learning for natural language processing, natural language inference (NLI) problems still constitute a challenge. To this end, we contribute a new dataset that focuses exclusively on the factivity phenomenon; however, our task remains the same as other NLI tasks, that is, prediction of entailment, contradiction, or neutral (ECN). In this paper, we describe

Updated: 2023-06-01

Improving semantic coverage of data-to-text generation model using dynamic memory networks

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-31
Elham Seifossadat, Hossein Sameti

This paper proposes a sequence-to-sequence model for data-to-text generation, called DM-NLG, to generate natural language text from structured nonlinguistic input. Specifically, by adding a dynamic memory module to the attention-based sequence-to-sequence model, it can store the information that led to the generation of previous output words and use it to generate the next word. In this way, the decoder

Updated: 2023-05-31

Urdu paraphrase detection: A novel DNN-based implementation using a semi-automatically generated corpus

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-29
Hafiz Rizwan Iqbal, Rashad Maqsood, Agha Ali Raza, Saeed-Ul Hassan

Automatic paraphrase detection is the task of measuring the semantic overlap between two given texts. A major hurdle in the development and evaluation of paraphrase detection approaches, particularly for South Asian languages like Urdu, is the inadequacy of standard evaluation resources. The very few available paraphrased corpora for these languages are manually created. As a result, they are constrained

Updated: 2023-05-29

Morphosyntactic probing of multilingual BERT models

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-25
Judit Acs, Endre Hamerlik, Roy Schwartz, Noah A. Smith, Andras Kornai

We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong

Updated: 2023-05-25

Plot extraction and the visualization of narrative flow

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-23
Michael A. DeBuse, Sean Warnick

This article discusses the development of an automated plot extraction system for narrative texts. Acknowledging the distinction between plot, as an object of study with its own rich history and literature, and features of a text that may be automatically extractable, we begin by characterizing a text’s scatter plot of entities. This visualization of a text reveals entity density patterns characterizing

Updated: 2023-05-23

Emerging trends: Risks 3.0 and proliferation of spyware to 50,000 cell phones

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-19
Kenneth Ward Church, Raman Chandrasekar

Our last emerging trend article introduced Risks 1.0 (fairness and bias) and Risks 2.0 (addictive, dangerous, deadly, and insanely profitable). This article introduces Risks 3.0 (spyware and cyber weapons). Risks 3.0 are less profitable, but more destructive. We will summarize two recent books, Pegasus: How a Spy in Your Pocket Threatens the End of Privacy, Dignity, and Democracy and This is How They

Updated: 2023-05-19

A comparison of latent semantic analysis and correspondence analysis of document-term matrices

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-18
Qianqian Qi, David J. Hessen, Tejaswini Deoskar, Peter G. M. van der Heijden

Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-term matrices

Updated: 2023-05-18

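Both techniques factor a document-term matrix with an SVD; they differ in what is decomposed. A minimal NumPy sketch under textbook definitions (LSA on raw counts, CA on standardized residuals from the independence model), with hypothetical function names and no weighting scheme such as TF-IDF:

```python
import numpy as np


def lsa(X: np.ndarray, k: int):
    # LSA: truncated SVD of the raw document-term matrix X (documents x terms).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    docs = U[:, :k] * s[:k]          # document coordinates U_k S_k
    terms = Vt[:k].T * s[:k]         # term coordinates V_k S_k
    return docs, terms


def ca(X: np.ndarray, k: int):
    # CA: SVD of the matrix of standardized residuals from independence.
    P = X / X.sum()                   # correspondence matrix
    r = P.sum(axis=1)                 # row masses
    c = P.sum(axis=0)                 # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :k] * s[:k] / np.sqrt(r)[:, None]    # principal row coordinates
    cols = Vt[:k].T * s[:k] / np.sqrt(c)[:, None]    # principal column coordinates
    return rows, cols, s


# Toy counts: 4 documents x 5 terms (no empty rows or columns).
X = np.array([[2, 1, 0, 0, 1],
              [1, 2, 0, 1, 0],
              [0, 0, 3, 1, 1],
              [0, 1, 2, 2, 0]], dtype=float)
docs, terms = lsa(X, k=2)
rows, cols, s = ca(X, k=2)
print(docs.shape, rows.shape)        # (4, 2) (4, 2)
```

In CA, subtracting the independence model removes the trivial dimension, and the squared singular values sum to the total inertia (the chi-square statistic divided by the grand total). LSA applies no such centering, so its first dimension tends to reflect overall term frequencies rather than associations between documents and terms.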
Recommending tasks based on search queries and missions

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-17
Darío Garigliotti, Krisztian Balog, Katja Hose, Johannes Bjerva

Web search is an experience that naturally lends itself to recommendations, including query suggestions and related entities. In this article, we propose to recommend specific tasks to users, based on their search queries, such as planning a holiday trip or organizing a party. Specifically, we introduce the problem of query-based task recommendation and develop methods that combine well-established

Updated: 2023-05-17

Determining sentiment views of verbal multiword expressions using linguistic features

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-15
Michael Wiegand, Marc Schulder, Josef Ruppenhofer

We examine the binary classification of sentiment views for verbal multiword expressions (MWEs). Sentiment views denote the perspective of the holder of some opinion. We distinguish between MWEs conveying the view of the speaker of the utterance (e.g., in “The company reinvented the wheel” the holder is the implicit speaker who criticizes the company for creating something already existing) and MWEs

Updated: 2023-05-15

Joint learning of text alignment and abstractive summarization for long documents via unbalanced optimal transport

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-15
Xin Shen, Wai Lam, Shumin Ma, Huadong Wang

Recently, neural abstractive text summarization (NATS) models based on sequence-to-sequence architecture have drawn a lot of attention. Real-world texts that need to be summarized range from short news with dozens of words to long reports with thousands of words. However, most existing NATS models are not good at summarizing long documents, due to the inherent limitations of their underlying neural

Updated: 2023-05-15

What should be encoded by position embedding for neural network language models?

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-10
Shuiyuan Yu, Zihao Zhang, Haitao Liu

Word order is one of the most important grammatical devices and the basis for language understanding. However, as one of the most popular NLP architectures, Transformer does not explicitly encode word order. A solution to this problem is to incorporate position information by means of position encoding/embedding (PE). Although a variety of methods of incorporating position information have been proposed

Updated: 2023-05-10

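For reference, the fixed sinusoidal encoding of the original Transformer (Vaswani et al., 2017) is one of the position-embedding methods the abstract alludes to. The sketch below shows it in NumPy; it is background only, not the encoding the authors propose, and the function name is hypothetical.

```python
import numpy as np


def sinusoidal_position_encoding(max_len: int, d_model: int) -> np.ndarray:
    # PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d)).
    if d_model % 2:
        raise ValueError("d_model must be even")
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # one frequency per pair
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


token_embeddings = np.random.default_rng(0).normal(size=(6, 16))
x = token_embeddings + sinusoidal_position_encoding(6, 16)  # order-aware input
print(x.shape)                                                 # (6, 16)
```

The encoding is simply added to the token embeddings, which is how word order reaches a Transformer that otherwise treats its input as a set.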
SAN-T2T: An automated table-to-text generator based on selective attention network

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-05-05
Haijie Ding, Xiaolong Xu

Table-to-text generation aims to generate descriptions for structured data (i.e., tables) and has been applied in many fields like question-answering systems and search engines. Current approaches mostly use neural language models to learn alignment between output and input based on the attention mechanisms, which are still flawed by the gradual weakening of attention when processing long texts and

Updated: 2023-05-05

A transformer-based multi-task framework for joint detection of aggression and hate on social media data

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-04-11
Soumitra Ghosh, Amit Priyankar, Asif Ekbal, Pushpak Bhattacharyya

Moderators often face a double challenge regarding reducing offensive and harmful content in social media. Despite the need to prevent the free circulation of such content, strict censorship on social media cannot be implemented because of a tricky dilemma: how to preserve free speech on the Internet while limiting such content, without overreacting. Existing systems do not essentially exploit the correlatedness

Updated: 2023-04-11

On generalization of the sense retrofitting model

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-03-31
Yang-Yin Lee, Ting-Yu Yen, Hen-Hsen Huang, Yow-Ting Shiue, Hsin-Hsi Chen

With the aid of recently proposed word embedding algorithms, the study of semantic relatedness has progressed rapidly. However, word-level representations are still lacking for many natural language processing tasks. Various sense-level embedding learning algorithms have been proposed to address this issue. In this paper, we present a generalized model derived from existing sense retrofitting models

Updated: 2023-03-31

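As background only, here is a minimal sketch of the original word-level retrofitting update of Faruqui et al. (2015), the idea that sense retrofitting extends. It is not the generalized model the authors present; the toy vectors, lexicon, and function name are hypothetical, and beta is set to 1/degree as in that paper's default.

```python
import numpy as np


def retrofit(vectors: dict, lexicon: dict, iters: int = 10, alpha: float = 1.0) -> dict:
    """Word-level retrofitting in the style of Faruqui et al. (2015).

    Each vector is pulled toward its lexicon neighbours while staying close
    to its original embedding; beta is set to 1/degree.
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for w, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if w not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)
            # q_w = (alpha * q_hat_w + sum_j beta * q_j) / (alpha + sum_j beta)
            numer = alpha * vectors[w] + beta * sum(new[n] for n in nbrs)
            new[w] = numer / (alpha + beta * len(nbrs))
    return new


vectors = {"car": np.array([1.0, 0.0]),
           "automobile": np.array([0.0, 1.0]),
           "banana": np.array([-1.0, -1.0])}
lexicon = {"car": ["automobile"], "automobile": ["car"]}   # toy synonym graph
fitted = retrofit(vectors, lexicon)
print(np.linalg.norm(fitted["car"] - fitted["automobile"]))  # much smaller than sqrt(2)
```

Words absent from the lexicon, such as `banana` here, keep their original vectors; sense-level variants apply the same kind of update to one vector per word sense instead of one per word.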
The problem of varying annotations to identify abusive language in social media content

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-03-29
Nina Seemann, Yeong Su Lee, Julian Höllig, Michaela Geierhos

With the increase of user-generated content on social media, the detection of abusive language has become crucial and is therefore reflected in several shared tasks that have been performed in recent years. The development of automatic detection systems is desirable, and the classification of abusive social media content can be solved with the help of machine learning. The basis for successful development

Updated: 2023-03-29

Towards diverse and contextually anchored paraphrase modeling: A dataset and baselines for Finnish

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-03-16
Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Aurora Piirto, Jenna Saarni, Maija Sevón, Otto Tarkka

In this paper, we study natural language paraphrasing from both corpus creation and modeling points of view. We focus in particular on the methodology that allows the extraction of challenging examples of paraphrase pairs in their natural textual context, leading to a dataset potentially more suitable for evaluating the models’ ability to represent meaning, especially in document context, when compared

Updated: 2023-03-16

Argumentation models and their use in corpus annotation: Practice, prospects, and challenges

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-28
Henrique Lopes Cardoso, Rui Sousa-Silva, Paula Carvalho, Bruno Martins

The study of argumentation is transversal to several research domains, from philosophy to linguistics, from the law to computer science and artificial intelligence. In discourse analysis, several distinct models have been proposed to harness argumentation, each with a different focus or aim. To analyze the use of argumentation in natural language, several corpus annotation efforts have been carried

Updated: 2023-02-28

Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-27
Dmytro Kalpakchi, Johan Boye

We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, deterministic and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near those required by modern neural question generation (QG) architectures. Our method surpasses QG baselines

Updated: 2023-02-27

An end-to-end neural framework using coarse-to-fine-grained attention for overlapping relational triple extraction

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-21
Huizhe Su, Hao Wang, Xiangfeng Luo, Shaorong Xie

In recent years, the extraction of overlapping relations has received great attention in the field of natural language processing (NLP). However, most existing approaches treat relational triples in sentences as isolated, without considering the rich semantic correlations implied in the relational hierarchy. Extracting these overlapping relational triples is challenging, given the overlapping types

Updated: 2023-02-21

An unsupervised perplexity-based method for boilerplate removal

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-21
Marcos Fernández-Pichel, Manuel Prada-Corral, David E. Losada, Juan C. Pichel, Pablo Gamallo

The availability of large web-based corpora has led to significant advances in a wide range of technologies, including massive retrieval systems or deep neural networks. However, leveraging this data is challenging, since web content is plagued by so-called boilerplate: ads, incomplete or noisy text, and remnants of the navigation structure, such as menus or navigation bars. In this work, we present

Updated: 2023-02-21

How to do human evaluation: A brief introduction to user studies in NLP

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-02-06
Hendrik Schuff, Lindsey Vanderlyn, Heike Adel, Ngoc Thang Vu

Many research topics in natural language processing (NLP), such as explanation generation, dialog modeling, or machine translation, require evaluation that goes beyond standard metrics like accuracy or F1 score toward a more human-centered approach. Therefore, understanding how to design user studies becomes increasingly important. However, few comprehensive resources exist on planning, conducting

Updated: 2023-02-06

NLP startup funding in 2022

Nat. Lang. Eng. (IF 2.5) Pub Date : 2023-01-31
Robert Dale

It’s no secret that the commercial application of NLP technologies has exploded in recent years. From chatbots and virtual assistants to machine translation and sentiment analysis, NLP technologies are now being used in a wide variety of applications across a range of industries. With the increasing demand for technologies that can process human language, investors have been eager to get a piece of

Updated: 2023-01-31

KLAUS-Tr: Knowledge & learning-based unit focused arithmetic word problem solver for transfer cases

Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-12-22
Suresh Kumar, P. Sreenivasa Kumar

Solving Arithmetic Word Problems (AWPs) using AI techniques has attracted much attention in recent years. We argue that current AWP solvers under-utilize the relevant domain knowledge. We present a knowledge- and learning-based system that effectively solves AWPs of a specific type: those that involve the transfer of objects from one agent to another (Transfer Cases (TC)). We represent the

Updated: 2022-12-22

SEN: A subword-based ensemble network for Chinese historical entity extraction

Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-12-22
Chengxi Yan, Ruojia Wang, Xiaoke Fang

Understanding information about historical entities (e.g., persons, locations, and times) plays an important role in reasoning about the development of historical events. With growing interest in digital humanities and natural language processing, named entity recognition (NER) provides a feasible solution for automatically extracting these entities from historical texts

Updated: 2022-12-22

Emerging trends: Unfair, biased, addictive, dangerous, deadly, and insanely profitable

Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-12-19
Kenneth Church, Annika Schoene, John E. Ortega, Raman Chandrasekar, Valia Kordoni

There has been considerable work recently in the natural language community and elsewhere on Responsible AI. Much of this work focuses on fairness and biases (henceforth Risks 1.0), following the 2016 best seller: Weapons of Math Destruction. Two books published in 2022, The Chaos Machine and Like, Comment, Subscribe, raise additional risks to public health/safety/security such as genocide, insurrection

Updated: 2022-12-19

Parameter-efficient feature-based transfer for paraphrase identification

Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-12-19
Xiaodong Liu, Rafal Rzepka, Kenji Araki

There are many types of approaches for Paraphrase Identification (PI), an NLP task of determining whether a sentence pair has equivalent semantics. Traditional approaches mainly consist of unsupervised learning and feature engineering, which are computationally inexpensive. However, their performance is only moderate by current standards. To seek a method that can preserve the low computational costs of traditional

Updated: 2022-12-19

Towards universal methods for fake news detection

Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-10-26
Maria Pszona, Maria Janicka, Grzegorz Wojdyga, Aleksander Wawer

Fake news detection is an emerging topic that has attracted a lot of attention among researchers and in the industry. This paper focuses on fake news detection as a text classification problem: on the basis of five publicly available corpora with documents labeled as true or fake, the task was to automatically distinguish both classes without relying on fact-checking. The aim of our research was to

Updated: 2022-10-26

A benchmark for evaluating Arabic word embedding models

Nat. Lang. Eng. (IF 2.5) Pub Date : 2022-10-17
Sane Yagi, Ashraf Elnagar, Shehdeh Fareh

Modelling the distributional semantics of a morphologically rich language such as Arabic needs to take into account its introflexive, fusional, and inflectional nature, attributes that make up its combinatorial sequences and substitutional paradigms. To evaluate such distributional word models, the benchmarks used thus far for Arabic have mimicked those in English. This paper reports on

Updated: 2022-10-17
