-
Unsupervised question-retrieval approach based on topic keywords filtering and multi-task learning Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-21 Aiguo Shang, Xinjuan Zhu, Michael Danner, Matthias Rätsch
Currently, the majority of retrieval-based question-answering systems depend on supervised training using question pairs. However, there is still a significant need for further exploration of how to employ unsupervised methods to improve the accuracy of retrieval-based question-answering systems. From the perspective of question topic keywords, this paper presents TFCSG, an unsupervised question-retrieval
-
A novel joint extraction model based on cross-attention mechanism and global pointer using context shield window Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-18 Zhengwei Zhai, Rongli Fan, Jie Huang, Neal Xiong, Lijuan Zhang, Jian Wan, Lei Zhang
Relational triple extraction is a critical step in knowledge graph construction. Compared to pipeline-based extraction, joint extraction is gaining more attention because it can better utilize entity and relation information without causing error propagation issues. Yet, the challenge with joint extraction lies in handling overlapping triples. Existing approaches adopt sequential steps or multiple
-
Multipath-guided heterogeneous graph neural networks for sequential recommendation Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-16 Fulian Yin, Tongtong Xing, Meiqi Ji, Zebin Yao, Ruiling Fu, Yuewei Wu
With the explosion of information and users’ changing interest, program sequential recommendation becomes increasingly important for TV program platforms to help their users find interesting programs. Existing sequential recommendation methods mainly focus on modeling user preferences from users’ historical interaction behaviors directly, with insufficient learning about the dynamics of programs and
-
A closer look at reinforcement learning-based automatic speech recognition Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-16 Fan Yang, Muqiao Yang, Xiang Li, Yuxuan Wu, Zhiyuan Zhao, Bhiksha Raj, Rita Singh
Reinforcement learning (RL) has demonstrated effectiveness in improving model performance and robustness for automatic speech recognition (ASR) tasks. Researchers have employed RL-based training strategies to enhance performance beyond conventional supervised or semi-supervised learning. However, existing approaches treat RL as a supplementary tool, leaving the untapped potential of RL training largely
-
Improving linear orthogonal mapping based cross-lingual representation using ridge regression and graph centrality Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-16 Deepen Naorem, Sanasam Ranbir Singh, Priyankoo Sarmah
-
Higher order statistics-driven magnitude and phase spectrum estimation for speech enhancement Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-16 T. Lavanya, P. Vijayalakshmi, K. Mrinalini, T. Nagarajan
Higher order statistics (HOS) can be effectively employed for noise suppression, provided the noise follows a Gaussian distribution. Since most noises are normally distributed, HOS can be effectively used for speech enhancement in noisy environments. In the current work, HOS-based parametric modelling for magnitude spectrum estimation is proposed to improve the SNR under noisy conditions. To
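The statistical premise here, that higher-order cumulants of a Gaussian signal vanish, can be checked numerically in a few lines. The sketch below is our illustration of that premise only, not the authors' method; it compares the excess kurtosis (a fourth-order statistic) of Gaussian noise against heavier-tailed Laplacian noise.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; approximately 0 for Gaussian data."""
    x = x - x.mean()
    return (x ** 4).mean() / ((x ** 2).mean() ** 2) - 3.0

rng = np.random.default_rng(0)
gaussian = rng.normal(size=200_000)    # Gaussian noise: higher-order stats vanish
laplacian = rng.laplace(size=200_000)  # heavier-tailed noise: they do not

print(excess_kurtosis(gaussian))   # close to 0
print(excess_kurtosis(laplacian))  # close to 3
```

The near-zero value for the Gaussian sample is what lets HOS-based methods suppress Gaussian noise while retaining non-Gaussian signal structure.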
-
Incorporating external knowledge for text matching model Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-12 Kexin Jiang, Guozhe Jin, Zhenguo Zhang, Rongyi Cui, Yahui Zhao
Text matching is a computational task that involves comparing and establishing the semantic relationship between two textual inputs. The prevailing approach in text matching entails the computation of textual representations or employing attention mechanisms to facilitate interaction with the text. These techniques have demonstrated notable efficacy in various text-matching scenarios. However, these
-
KddRES: A Multi-level Knowledge-driven Dialogue Dataset for Restaurant Towards Customized Dialogue System Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-07 Hongru Wang, Wai-Chung Kwan, Min Li, Zimo Zhou, Kam-Fai Wong
To alleviate the shortage of dialogue datasets for Cantonese, one of the low-resource languages, and facilitate the development of customized task-oriented dialogue systems, we propose KddRES, the first Cantonese Knowledge-driven dialogue dataset for REStaurants. It contains 834 multi-turn dialogues, 8000 utterances, and 26 distinct slots. The slots are hierarchical, and beneath the 26 coarse-grained slots are
-
Model discrepancy policy optimization for task-oriented dialogue Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-06 Zhenyou Zhou, Zhibin Liu, Zhaoan Dong, Yuhan Liu
Task-oriented dialogue systems use deep reinforcement learning (DRL) to learn policies, and agent interaction with user models can help the agent enhance its generalization capacity. But user models frequently lack the language complexity of human interlocutors and contain generative errors, and their design biases can impair the agent’s ability to function well in certain situations. In this paper
-
Next word prediction for Urdu language using deep learning models Comput. Speech Lang (IF 4.3) Pub Date : 2024-03-02 Ramish Shahid, Aamir Wali, Maryam Bashir
Deep learning models are being used for natural language processing. Despite their success, these models have been employed for only a few languages. Pretrained models also exist but they are mostly available for the English language. Low resource languages like Urdu are not able to benefit from these pre-trained deep learning models and their effectiveness in Urdu language processing remains a question
-
MECOS: A Bilingual Manipuri-English Spontaneous Code-Switching Speech Corpus for Automatic Speech Recognition Comput. Speech Lang (IF 4.3) Pub Date : 2024-02-20 Naorem Karline Singh, Yambem Jina Chanu, Hoomexsun Pangsatabam
-
Translating and predicting document structure for medical domain scientific abstracts Comput. Speech Lang (IF 4.3) Pub Date : 2024-02-09 Sadaf Abdul Rauf, François Yvon
Machine Translation (MT) technologies have improved in many ways and generate usable outputs for a growing number of domains and language pairs. Yet, most sentence based MT systems struggle with contextual dependencies, processing small chunks of texts, typically sentences, in isolation from their textual context. This is likely to cause systematic errors or inconsistencies when processing long documents
-
Single-channel speech enhancement using colored spectrograms Comput. Speech Lang (IF 4.3) Pub Date : 2024-02-07 Sania Gul, Muhammad Salman Khan, Muhammad Fazeel
Speech enhancement concerns the processes required to remove unwanted background sounds from the target speech to improve its quality and intelligibility. In this paper, a novel approach for single-channel speech enhancement is presented, using colored spectrograms. We propose the use of a deep neural network (DNN) architecture adapted from the pix2pix generative adversarial network (GAN) and train
-
A novel Chinese–Tibetan mixed-language rumor detector with multi-extractor representations Comput. Speech Lang (IF 4.3) Pub Date : 2024-02-07 Lisu Yu, Fei Li, Lixin Yu, Wei Li, Zhicheng Dong, Donghong Cai, Zhen Wang
Rumors can easily propagate through social media, posing potential threats to both individual and public health. Most existing approaches focus on single-language rumor detection, which leads to unsatisfying performance when these are applied to mixed-language rumor detection. Meanwhile, the type of mixed-language (mixture of word-level or sentence-level) is a great challenge for mixed-language rumor
-
A method of phonemic annotation for Chinese dialects based on a deep learning model with adaptive temporal attention and a feature disentangling structure Comput. Speech Lang (IF 4.3) Pub Date : 2024-02-05 Bowen Jiang, Qianhui Dong, Guojin Liu
Phonemic annotation is aimed at annotating a speech fragment with phonemic symbols. As the phonetic features of a speech fragment vary greatly among different languages, including their dialects, annotating with phonemic symbols is a significant way to describe and write down the phonetic system of a language. It is meaningful to develop an automatic and effective method for the phonemic annotation task
-
LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech Comput. Speech Lang (IF 4.3) Pub Date : 2024-02-03 Titouan Parcollet, Ha Nguyen, Solène Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Estève, Mickael Rouvier, Jerôme Goulian, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces an open-source framework for assessing and building SSL-equipped French speech
-
Spectral–temporal saliency masks and modulation tensorgrams for generalizable COVID-19 detection Comput. Speech Lang (IF 4.3) Pub Date : 2024-02-02 Yi Zhu, Tiago H. Falk
Speech COVID-19 detection systems have gained popularity as they represent an easy-to-use and low-cost solution that is well suited for at-home long-term monitoring of patients with persistent symptoms. Recently, however, the limited generalization capability of existing deep neural network based systems to unseen datasets has been raised as a serious concern, as has their limited interpretability
-
Effective Infant Cry Signal Analysis and Reasoning using IARO based Leaky Bi-LSTM Model Comput. Speech Lang (IF 4.3) Pub Date : 2024-01-24 B.M. Mala, Smita Sandeep Darandale
In the present scenario, the recognition of particular emotions or needs from an infant's cry is a difficult process in the field of pattern recognition as it does not have any verbal information. In this article, an automated model is introduced for an effective recognition of infant cries. At first, the infant cry signals are collected from the Baby Chillanto (BC) dataset and the Donate a Cry Corpus
-
Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances Comput. Speech Lang (IF 4.3) Pub Date : 2024-01-18 Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi
Conventional automatic speaker verification systems can usually be decomposed into a front-end model such as time delay neural network (TDNN) for extracting speaker embeddings and a back-end model such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA) for similarity scoring. However, the sequential optimization of the front-end and back-end
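The front-end/back-end split described above can be illustrated with the simplest non-trainable back-end: average the enrollment embeddings into a centroid and cosine-score the test utterance against it. This is a sketch of the baseline setup with made-up 192-dimensional embeddings, not the paper's jointly trained model.

```python
import numpy as np

def cosine_score(enroll_embs, test_emb):
    """Average multiple enrollment embeddings, then cosine-score the test one."""
    centroid = np.mean(enroll_embs, axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    test_emb = test_emb / np.linalg.norm(test_emb)
    return float(centroid @ test_emb)

# Hypothetical embeddings: a "speaker direction" plus per-utterance noise.
rng = np.random.default_rng(1)
speaker = rng.normal(size=192)
enrolls = speaker + 0.3 * rng.normal(size=(3, 192))  # three enrollment utterances
target = speaker + 0.3 * rng.normal(size=192)        # same speaker
impostor = rng.normal(size=192)                      # different speaker

print(cosine_score(enrolls, target) > cosine_score(enrolls, impostor))
```

A jointly optimized neural back-end replaces exactly this fixed aggregate-and-score step with learned components.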
-
Scale-aware dual-branch complex convolutional recurrent network for monaural speech enhancement Comput. Speech Lang (IF 4.3) Pub Date : 2024-01-13 Yihao Li, Meng Sun, Xiongwei Zhang, Hugo Van hamme
A key step to single channel speech enhancement is the orthogonal separation of speech and noise. In this paper, a dual branch complex convolutional recurrent network (DBCCRN) is proposed to separate the complex spectrograms of speech and noises simultaneously. To model both local and global information, we incorporate conformer modules into our network. The orthogonality of the outputs of the two
-
A tag-based methodology for the detection of user repair strategies in task-oriented conversational agents Comput. Speech Lang (IF 4.3) Pub Date : 2024-01-08 Francesca Alloatti, Francesca Grasso, Roger Ferrod, Giovanni Siragusa, Luigi Di Caro, Federica Cena
Mutual comprehension is a crucial component that makes a conversation succeed. While it can be easily reached through the cooperation of the parties in human–human dialogues, such cooperation is often lacking in human–computer interaction due to technical problems, leading to broken conversations. Our goal is to work towards an effective detection of breakdowns in a conversation between humans and
-
TTK: A toolkit for Tunisian linguistic analysis Comput. Speech Lang (IF 4.3) Pub Date : 2024-01-03 Asma Mekki, Inès Zribi, Mariem Ellouze, Lamia Hadrich Belguith
Over the last two decades, many efforts have been made to provide resources to support Arabic Natural Language Processing (NLP). Some of these resources target specific NLP tasks such as word tokenization, parsing, or sentiment analysis, while others attempt to tackle numerous tasks at once. In this paper, we present TTK, a toolkit for Tunisian linguistic analysis. It consists of a collection
-
Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-26 Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
Speech signals are valuable biomarkers for assessing an individual’s mental health, including identifying Major Depressive Disorder (MDD) automatically. A frequently used approach in this regard is to employ features related to speaker identity, such as speaker-embeddings. However, over-reliance on speaker identity features in mental health screening systems can compromise patient privacy. Moreover
-
Enhanced local knowledge with proximity values and syntax-clusters for aspect-level sentiment analysis Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-28 Pengfei Chen, Biqing Zeng, Yuwu Lu, Yun Xue, Fei Fan, Mayi Xu, Lingcong Feng
Aspect-level sentiment analysis (ALSA) aims to extract the polarity of different aspect terms in a sentence. Previous works leveraging traditional dependency syntax parsing trees (DSPT) to encode contextual syntactic information had obtained state-of-the-art results. However, these works may not be able to learn fine-grained syntactic knowledge efficiently, which makes them difficult to take advantage
-
Contextual emotion detection using ensemble deep learning Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-20 Asalah Thiab, Luay Alawneh, Mohammad AL-Smadi
Emotion detection from online textual information is gaining more attention due to its usefulness in understanding users’ behaviors and their desires. This is driven by the large amounts of texts from different sources such as social media and shopping websites. Recent studies investigated the benefits of deep learning in the detection of emotions from textual conversations. In this paper, we study
-
SecNLP: An NLP classification model watermarking framework based on multi-task learning Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-23 Long Dai, Jiarong Mao, Liaoran Xu, Xuefeng Fan, Xiaoyi Zhou
The popularity of ChatGPT demonstrates the immense commercial value of natural language processing (NLP) technology. However, NLP models like ChatGPT are vulnerable to piracy and redistribution, which can harm the economic interests of model owners. Existing NLP model watermarking schemes struggle to balance robustness and covertness. Typically, robust watermarks require embedding more information
-
Improving self-supervised learning model for audio spoofing detection with layer-conditioned embedding fusion Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-18 Souvik Sinha, Spandan Dey, Goutam Saha
The application of voice recognition systems has increased greatly with advances in technology. This has allowed adversaries to falsely claim access to these systems by spoofing the identity of a target speaker. The existing supervised learning (SL)-based countermeasures are yet to provide a complete solution against the newly evolving spoofing attacks. To tackle this problem, we explore self-supervised
-
Complementary regional energy features for spoofed speech detection Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-16 Gökay Dişken
Automatic speaker verification systems are found to be vulnerable to spoof attacks such as voice conversion, text-to-speech, and replayed speech. As the security of biometric systems is vital, many countermeasures have been developed for spoofed speech detection. To satisfy the recent developments on speech synthesis, publicly available datasets became more and more challenging (e.g., ASVspoof 2019
-
A novel channel estimate for noise robust speech recognition Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-16 Geoffroy Vanderreydt, Kris Demuynck
We propose a novel technique to estimate the channel characteristics for robust speech recognition. The method focuses on reliable time–frequency speech patches which are highly independent of the noise condition. Combined with a root-based approximation of the logarithm in the MFCC computation, this reduces the variance caused by the noise on the spectral features, and therefore also the constrain
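The idea of a root-based approximation of the logarithm in MFCC computation can be shown in a few lines; the exponent used here is illustrative, not the paper's value. Both mappings compress the dynamic range of mel-band energies, but the root stays finite and positive as energies approach zero, where the log diverges.

```python
import numpy as np

# Hypothetical mel-band energies spanning several orders of magnitude.
energies = np.array([1e-4, 1e-2, 1.0, 1e2])

log_feats = np.log(energies)       # standard MFCC compression
root_feats = energies ** 0.1       # root compression (exponent is illustrative)

print(log_feats)   # large negative values for tiny energies
print(root_feats)  # bounded, strictly positive
```

This bounded behaviour is why a root compression can reduce the variance that noise induces on low-energy spectral bins.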
-
Rep-MCA-former: An efficient multi-scale convolution attention encoder for text-independent speaker verification Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-10 Xiaohu Liu, Defu Chen, Xianbao Wang, Sheng Xiang, Xuwen Zhou
In many speaker verification tasks, the quality of speaker embedding is an important factor in affecting speaker verification systems. Advanced speaker embedding extraction networks aim to capture richer speaker features through the multi-branch network architecture. Recently, speaker verification systems based on transformer encoders have received much attention, and many satisfactory results have
-
Integrating frame-level boundary detection and deepfake detection for locating manipulated regions in partially spoofed audio forgery attacks Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-05 Zexin Cai, Ming Li
Partially fake audio, a variant of deep fake that involves manipulating audio utterances through the incorporation of fake or externally-sourced bona fide audio clips, constitutes a growing threat as an audio forgery attack impacting both human and artificial intelligence applications. Researchers have recently developed valuable databases to aid in the development of effective countermeasures against
-
New research on monaural speech segregation based on quality assessment Comput. Speech Lang (IF 4.3) Pub Date : 2023-12-05 Xiaoping Xie, Can Li, Dan Tian, Rufeng Shen, Fei Ding
Speech enhancement (SE) is a pivotal technology in enhancing the quality and intelligibility of speech signals. Nevertheless, when processing speech signals under conditions of high signal-to-noise ratio (SNR), conventional SE techniques may inadvertently lead to a diminution in the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). This article introduces
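The SNR referred to above is the standard power ratio in decibels; a minimal sketch with a synthetic tone standing in for speech:

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

t = np.linspace(0.0, 1.0, 8000)        # 1 s at a hypothetical 8 kHz rate
clean = np.sin(2 * np.pi * 440 * t)    # a 440 Hz tone standing in for speech
noise = 0.1 * clean                    # interference at one tenth the amplitude

print(snr_db(clean, noise))            # 20 dB: amplitude ratio 10, power ratio 100
```

At high SNR like this, an enhancement front-end has little noise left to remove, which is the regime where the abstract notes PESQ and STOI can actually degrade.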
-
A knowledge-augmented heterogeneous graph convolutional network for aspect-level multimodal sentiment analysis Comput. Speech Lang (IF 4.3) Pub Date : 2023-11-23 Yujie Wan, Yuzhong Chen, Jiali Lin, Jiayuan Zhong, Chen Dong
Aspect-level multimodal sentiment analysis has also become a new challenge in the field of sentiment analysis. Although there has been significant progress in the task based on image–text data, existing works do not fully deal with the implicit sentiment expression in data. In addition, they do not fully exploit the important information from external knowledge and image tags. To address these problems
-
A semi-supervised high-quality pseudo labels algorithm based on multi-constraint optimization for speech deception detection Comput. Speech Lang (IF 4.3) Pub Date : 2023-11-22 Huawei Tao, Hang Yu, Man Liu, Hongliang Fu, Chunhua Zhu, Yue Xie
Deep learning-based speech deception detection research relies on a large amount of labeled data. However, in the process of collecting speech deception detection data, the identification of truth and lies requires researchers to have a professional knowledge reserve, which greatly limits the number of annotated samples. Improving the accuracy of lie detection with insufficient annotation data is the
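Confidence-thresholded pseudo-labeling, the generic mechanism that multi-constraint schemes like this one refine, can be sketched as follows. The two-class truth/lie probabilities are toy numbers, not the paper's data.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep only unlabeled samples whose top class probability clears the
    threshold; discard ambiguous ones to protect pseudo-label quality."""
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = confidence >= threshold
    return np.where(keep)[0], labels[keep]

# Toy model outputs over three unlabeled utterances: P(truth), P(lie).
probs = np.array([[0.95, 0.05],   # confident "truth" -> kept
                  [0.55, 0.45],   # ambiguous       -> discarded
                  [0.08, 0.92]])  # confident "lie"  -> kept
idx, labels = select_pseudo_labels(probs)
print(idx, labels)
```

Multi-constraint optimization replaces this single fixed threshold with several jointly enforced quality criteria.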
-
Representation learning strategies to model pathological speech: Effect of multiple spectral resolutions Comput. Speech Lang (IF 4.3) Pub Date : 2023-11-15 Gabriel Figueiredo Miller, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth
This paper considers a representation learning strategy to model speech signals from patients with Parkinson’s disease, with the goal of predicting the presence of the disease, and evaluating the level of degradation of a patient’s speech. In particular, we propose a novel fusion strategy that combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders
-
Though this be hesitant, yet there is method in ’t: Effects of disfluency patterns in neural speech synthesis for cultural heritage presentations Comput. Speech Lang (IF 4.3) Pub Date : 2023-11-11 Loredana Schettino, Antonio Origlia, Francesco Cutugno
This study presents the results of two perception experiments aimed at evaluating the effect that specific patterns of disfluencies have on people listening to synthetic speech. We consider the particular case of Cultural Heritage presentations and propose a linguistic model to support the positioning of disfluencies throughout the utterances in the Italian language. A state-of-the-art speech synthesizer
-
Dual Knowledge Distillation for neural machine translation Comput. Speech Lang (IF 4.3) Pub Date : 2023-11-09 Yuxian Wan, Wenlin Zhang, Zhen Li, Hao Zhang, Yanxia Li
Existing knowledge distillation methods use large amount of bilingual data and focus on mining the corresponding knowledge distribution between the source language and the target language. However, for some languages, bilingual data is not abundant. In this paper, to make better use of both monolingual and limited bilingual data, we propose a new knowledge distillation method called Dual Knowledge
-
Speaking to remember: Model-based adaptive vocabulary learning using automatic speech recognition Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-31 Thomas Wilschut, Florian Sense, Hedderik van Rijn
Memorizing vocabulary is a crucial aspect of learning a new language. While personalized learning- or intelligent tutoring systems can assist learners in memorizing vocabulary, the majority of such systems are limited to typing-based learning and do not allow for speech practice. Here, we aim to compare the efficiency of typing- and speech based vocabulary learning. Furthermore, we explore the possibilities
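Model-based adaptive vocabulary tutors of this kind typically rest on an activation-based memory model, where each past encounter with an item contributes a decaying trace. A minimal sketch under that assumption (the decay value and the scheduling policy here are illustrative, not the paper's parameters):

```python
import math

def activation(ages_s, decay=0.5):
    """ACT-R style memory activation: each past encounter, t seconds ago,
    contributes t**-decay. Higher activation means easier recall; a
    model-based tutor rehearses items whose activation is lowest."""
    return math.log(sum(t ** -decay for t in ages_s))

# An item practiced 10 s, 60 s and 300 s ago versus one seen only once.
recent = activation([10, 60, 300])
stale = activation([300])
print(recent > stale)  # the tutor would rehearse the stale item first
```

Coupling this with automatic speech recognition simply changes how the tutor observes recall attempts, not the scheduling model itself.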
-
Adopting machine translation in the healthcare sector: A methodological multi-criteria review Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-29 Marco Zappatore, Gilda Ruggieri
-
An effective approach for identifying keywords as high-quality filters to get emergency-implicated Twitter Spanish data Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-26 Joel Garcia-Arteaga, Jesús Zambrano-Zambrano, Jorge Parraga-Alava, Jorge Rodas-Silva
Twitter has become a powerful knowledge source for data extraction in data mining projects due to the amount of data generated by its users, which allows researchers to find content on almost any topic in real time. This, however, depends on the quality of the keywords used; otherwise the extracted data will have a high percentage of irrelevant content. In this paper, we introduce a time-aware machine-learning-based
-
DAE-NER: Dual-channel attention enhancement for Chinese named entity recognition Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-26 Jingxin Liu, Mengzhe Sun, Wenhao Zhang, Gengquan Xie, Yongxia Jing, Xiulai Li, Zhaoxin Shi
Named Entity Recognition (NER) is an important component of Natural Language Processing (NLP) and is a fundamental yet challenging task in text analysis. Recently, NER models for Chinese-language characters have received considerable attention. Owing to the complexity and ambiguity of the Chinese language, the same semantic features have different levels of importance in different contexts. However
-
A lightweight approach based on prompt for few-shot relation extraction Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-25 Ying Zhang, Wencheng Huang, Depeng Dang
Few-shot relation extraction (FSRE) aims to predict the relation between two entities in a sentence using a few annotated samples. Many works solve the FSRE problem by training complex models with a huge number of parameters, which results in longer processing times to obtain results. Some recent works focus on introducing relation information into Prototype Networks in various ways. However, most
-
Meta adversarial learning improves low-resource speech recognition Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-19 Yaqi Chen, Xukui Yang, Hao Zhang, Wenlin Zhang, Dan Qu, Cong Chen
Low-resource automatic speech recognition is a challenging task. To resolve this issue, multilingual meta-learning learns a better model initialization from many source languages, allowing for rapid adaption to target languages. However, differences in data scales and learning difficulties vary greatly from one language to another. As a result, the model favors large-scale and simple source languages
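The idea of meta-learning a shared initialization can be illustrated on a toy problem with a Reptile-style update. This is our simplification: each "language" is a one-parameter regression task, whereas the paper works with ASR models and specifically addresses the task imbalance this toy version ignores.

```python
# Each "source language" is a task y = a * x with its own slope a.
# The meta-loop learns an initialization close to every task, so a few
# inner SGD steps suffice to adapt to any one of them.
task_slopes = [1.0, 2.0, 3.0]   # three hypothetical source languages
w = 10.0                        # initial parameter, far from all tasks
meta_lr, inner_lr = 0.5, 0.1

for _ in range(50):
    for a in task_slopes:
        w_task = w
        for _ in range(5):                # a few inner SGD steps on task data
            grad = 2.0 * (w_task - a)     # d/dw of (w*x - a*x)**2 at x = 1
            w_task -= inner_lr * grad
        w += meta_lr * (w_task - w)       # pull the init toward the adapted weights

print(w)  # settles near the centre of the task slopes
```

The failure mode the abstract raises shows up even here: if one task dominated the loop, the initialization would drift toward it at the expense of the others.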
-
The limits of the Mean Opinion Score for speech synthesis evaluation Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-21 Sébastien Le Maguer, Simon King, Naomi Harte
The release of WaveNet and Tacotron has forever transformed the speech synthesis landscape. Thanks to these game-changing innovations, the quality of synthetic speech has reached unprecedented levels. However, to measure this leap in quality, an overwhelming majority of studies still rely on the Absolute Category Rating (ACR) protocol and compare systems using its output; the Mean Opinion Score (MOS)
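MOS itself is simply the mean of 1-to-5 ACR ratings; a minimal sketch, including the confidence interval that reported MOS values often omit:

```python
import numpy as np

def mos(ratings):
    """Mean Opinion Score over 1-5 ACR ratings, with a 95% normal-
    approximation confidence half-width (often left out of MOS reports)."""
    ratings = np.asarray(ratings, dtype=float)
    mean = ratings.mean()
    half_width = 1.96 * ratings.std(ddof=1) / np.sqrt(len(ratings))
    return mean, half_width

m, hw = mos([5, 4, 4, 3, 5, 4, 2, 4])   # toy listening-test ratings
print(m, hw)
```

Collapsing a full rating distribution to this single mean is precisely what makes MOS-based system comparisons fragile once synthetic speech approaches natural quality.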
-
Document-level relation extraction with entity mentions deep attention Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-20 Yangsheng Xu, Jiaxin Tian, Mingwei Tang, Linping Tao, Liuxuan Wang
Document-level Relation Extraction (DocRE) aims to extract relations between entities from documents. In contrast to sentence-level relation extraction, it requires extracting semantic relations from multiple sentences, and the performance of existing algorithms must be further improved to do so. Therefore, DocRE algorithms have to deal with more complex
-
M-Sim: Multi-level Semantic Inference Model for Chinese short answer scoring in low-resource scenarios Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-20 Peichao Lai, Feiyang Ye, Yanggeng Fu, Zhiwei Chen, Yingjie Wu, Yilei Wang
Short answer scoring is a significant task in natural language processing. On datasets comprising numerous explicit or implicit symbols and quantization entities, the existing approaches continue to perform poorly. Additionally, the majority of relevant datasets contain few-shot samples, reducing model efficacy in low-resource scenarios. To solve the above issues, we propose a Multi-level Semantic
-
Pronoun use in preclinical and early stages of Alzheimer's dementia Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-12 Dagmar Bittner, Claudia Frankenberg, Johannes Schröder
The present study aims at improving the predictive power of the use of pronouns in computational modeling of the risk of Alzheimer's dementia (AD) by (i) further determining the onset of increased pronoun use in AD and (ii) providing insights into the linguistic contexts affected by the increase early on. Pronoun use was compared longitudinally between subjects who either stayed cognitively intact
-
A ChannelWise weighting technique of slice-based Temporal Convolutional Network for noisy speech enhancement Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-11 Wei-Tyng Hong, Kuldeep Singh Rana
In recent years, Temporal Convolutional Networks (TCNs) have driven significant progress in single-channel noisy speech enhancement. However, TCN-based systems still face certain challenges, such as limited utilization of network channel depth for handling long-range dependencies and issues with weight sharing. To address these challenges, this paper proposes a novel channel-wise weighting scheme,
-
M2A: A model-agnostic and metadata-free adversarial framework for unsupervised opinion summarization Comput. Speech Lang (IF 4.3) Pub Date : 2023-10-04 Yanyue Zhang, Deyu Zhou, Zhenglin Wang, Yilong Lai
-
Two evaluations on Ontology-style relation annotations Comput. Speech Lang (IF 4.3) Pub Date : 2023-09-30 Savong Bou, Makoto Miwa, Yutaka Sasaki
In this paper, we propose an Ontology-Style Relation (OSR) annotation approach. In conventional Relation Extraction (RE) datasets, relations are annotated as a link between two entity mentions. In contrast, in our OSR annotation, a relation is annotated as a relation mention (i.e., not a link but a node) and rdfs:domain and rdfs:range links are annotated from the relation mention to its argument entity
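The contrast between a conventional link-style annotation and an OSR node-style annotation can be shown with a toy example. This is our sketch of the idea only; the field names are hypothetical, not the dataset's actual schema.

```python
# Conventional RE: the relation is a labeled link between two entity mentions.
link_style = {"head": "aspirin", "tail": "headache", "label": "treats"}

# OSR-style: the relation itself is a mention (a node in the annotation),
# connected to its argument entities by rdfs:domain and rdfs:range links,
# mirroring how properties are declared in an ontology.
osr_style = {
    "relation_mention": {"text": "is indicated for", "type": "treats"},
    "rdfs:domain": "aspirin",    # subject argument
    "rdfs:range": "headache",    # object argument
}

print(osr_style["relation_mention"]["text"])
```

Making the relation a first-class node lets annotators ground it in an actual text span instead of an implicit link.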
-
Morse wavelet transform-based features for voice liveness detection Comput. Speech Lang (IF 4.3) Pub Date : 2023-09-29 Priyanka Gupta, Hemant A. Patil
The need for Voice Liveness Detection (VLD) has emerged particularly for the security of Automatic Speaker Verification (ASV) systems. Existing Spoofed Speech Detection (SSD) systems rely on attack-specific approaches to detect spoofed speech. However, to safeguard ASV systems against all the kinds of spoofing attacks (known as well as unknown attacks), determining whether a speech is uttered live
-
Towards inclusive automatic speech recognition Comput. Speech Lang (IF 4.3) Pub Date : 2023-09-20 Siyuan Feng, Bence Mark Halpern, Olya Kudina, Odette Scharenborg
Practice and recent evidence show that state-of-the-art (SotA) automatic speech recognition (ASR) systems do not perform equally well for all speaker groups. Many factors can cause this bias against different speaker groups. This paper, for the first time, systematically quantifies and finds speech recognition bias against gender, age, regional accents and non-native accents, and investigates the origin
-
Towards better Chinese-centric neural machine translation for low-resource languages Comput. Speech Lang (IF 4.3) Pub Date : 2023-09-11 Bin Li, Yixuan Weng, Fei Xia, Hanjun Deng
The last decade has witnessed enormous improvements in science and technology, stimulating the growing demand for economic and cultural exchanges in various countries. Building a neural machine translation (NMT) system has become an urgent trend, especially in the low-resource setting. However, recent work tends to study NMT systems for low-resource languages centered on English, while a few studies
-
GTSO: Gradient tangent search optimization enabled voice transformer with speech intelligibility for aphasia Comput. Speech Lang (IF 4.3) Pub Date : 2023-09-09 Ranjith R, Chandrasekar A
A frequent neurological condition known as aphasia is brought on by injury to language-related brain regions as well as possibly other regions of the brain involved in executive, memory, and attention functions. Due to a lack of speech-language pathologists and the vast expense of treatment, traditional therapy is difficult for aphasia-affected people to access. In this research work, speech intelligibility
-
Predicting children’s perceived reading proficiency with prosody modeling Comput. Speech Lang (IF 4.3) Pub Date : 2023-09-09 Kamini Sabu, Preeti Rao
Reading is a foundational skill and the focus of school-level education efforts across countries. The assessment of linguistic competence from oral reading has long been the subject of scientific studies linking the reader’s comprehension of the text to various measures of oral reading fluency. Given the time and resource intensive nature of such assessment, it is of interest to automate the prediction
-
A physical exertion inspired multi-task learning framework for detecting out-of-breath speech Comput. Speech Lang (IF 4.3) Pub Date : 2023-08-31 Sibasis Sahoo, Samarendra Dandapat
Physical exertion is a stress condition that affects how we normally produce speech. It alters both the temporal and spectral pattern of speech characteristics. Therefore, speech utterances can be used as a cost-effective telehealth solution to detect whether a person is under physical exertion. This paper deals with the detection of shortness of breath (or out-of-breath) condition from speech under
-
Optimized Binaural Enhancement via Attention Masking Network-based Speech Separation Framework in Digital Hearing Aids Comput. Speech Lang (IF 4.3) Pub Date : 2023-08-23 A Joseph Sathiadhas Esra, Dr. Y. Sukhi
Earlier studies have utilized microphone signal processing for performing the target speech evaluation and separation-like feature recognition by involving a huge amount of training data along with supervised machine learning. These approaches are most appropriate for stationary noise suppression; however, it is challenging for non-stationary noise, and it does not satisfy the practical processing
-
Training RNN language models on uncertain ASR hypotheses in limited data scenarios Comput. Speech Lang (IF 4.3) Pub Date : 2023-08-20 Imran Sheikh, Emmanuel Vincent, Irina Illina
Training domain-specific automatic speech recognition (ASR) systems requires a suitable amount of data comprising the target domain. In several scenarios, such as early development stages, privacy-critical applications, or under-resourced languages, only a limited amount of in-domain speech data and an even smaller amount of manual text transcriptions, if any, are available. This motivates the study
-
Bayesian active summarization Comput. Speech Lang (IF 4.3) Pub Date : 2023-08-15 Alexios Gidiotis, Grigorios Tsoumakas
Bayesian Active Learning has had significant impact to various NLP problems, but nevertheless its application to text summarization has been explored very little. We introduce Bayesian Active Summarization (BAS), as a method of combining active learning methods with state-of-the-art summarization models. Our findings suggest that BAS achieves better and more robust performance, compared to random selection
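Active selection by model disagreement, the generic criterion that a Bayesian active learner builds on, can be sketched with toy scores. The numbers below are hypothetical summary-quality scores from stochastic forward passes, not the paper's setup.

```python
import numpy as np

def select_most_uncertain(scores, k=2):
    """Pick the k documents whose stochastic summary scores disagree most
    across Monte Carlo passes (highest std = most uncertain), so human
    effort goes where the model is least reliable."""
    uncertainty = scores.std(axis=1)
    return np.argsort(uncertainty)[::-1][:k]

# Hypothetical ROUGE-like scores: 4 documents x 5 stochastic passes.
scores = np.array([[0.40, 0.41, 0.39, 0.40, 0.40],
                   [0.10, 0.60, 0.30, 0.50, 0.20],
                   [0.70, 0.69, 0.71, 0.70, 0.70],
                   [0.20, 0.55, 0.35, 0.25, 0.45]])
picked = select_most_uncertain(scores)
print(picked)  # the two high-disagreement documents
```

Random selection, the baseline BAS is compared against, would ignore the per-document disagreement entirely.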