A Multi-Channel Convolutional Neural Network approach to automate the citation screening process

doi:10.1016/j.asoc.2021.107765

Applied Soft Computing

Volume 112, November 2021, 107765

https://doi.org/10.1016/j.asoc.2021.107765

Under a Creative Commons license

open access

Highlights

• A Multi-Channel CNN model was developed to support the citation screening process.
• Our model uses GloVe embeddings to gain insight from each word’s context.
• 20 systematic review datasets from the medical domain were used for evaluation.
• Significant workload savings of at least 10% in 18 out of 20 review datasets.
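
The highlights do not define how workload savings are measured. A minimal sketch follows, assuming the Work Saved over Sampling (WSS) measure that is common in citation-screening studies: citations are ranked by classifier score, the reviewer screens from the top until a target recall is reached, and the saving is the unscreened fraction minus the recall that was given up. The function name, the 95% default, and the toy data are illustrative assumptions, not values taken from the article.

```python
import math


def work_saved_over_sampling(labels, scores, target_recall=0.95):
    """WSS at a target recall (assumed definition, not quoted from the article).

    labels: 1 for a relevant citation, 0 for an irrelevant one.
    scores: classifier scores; higher means "more likely relevant".
    """
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and equally long")
    n_relevant = sum(labels)
    if n_relevant == 0:
        raise ValueError("at least one relevant citation is required")

    # Rank citations from most to least likely relevant.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Screen from the top until enough relevant citations have been found.
    needed = math.ceil(target_recall * n_relevant)
    found = 0
    screened = 0
    for _, label in ranked:
        screened += 1
        found += label
        if found >= needed:
            break

    total = len(labels)
    # Unscreened fraction, corrected for the recall that was sacrificed.
    return (total - screened) / total - (1.0 - target_recall)


# Toy example: 10 citations, 2 relevant, ranked 1st and 4th by the classifier.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(f"WSS@95 = {work_saved_over_sampling(toy_labels, toy_scores):.2f}")  # prints WSS@95 = 0.55
```

Under this definition, a saving of at least 10% corresponds to a return value of 0.10 or more.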

Abstract

The systematic literature review (SLR) process is separated into several steps to increase rigor and reproducibility. The selection of primary studies (i.e., citation screening) is an important step in the SLR process. Citation screening aims to identify the relevant primary studies fairly and rigorously by applying selection criteria, through which reviewers determine whether an article should be included in or excluded from the SLR. However, the screening process is highly time-consuming and error-prone, as the researchers must read each title and possibly hundreds to thousands of abstracts and full-text documents. This study aims to automate the citation screening process using deep learning algorithms. The goal is to reduce the time and cost of citation screening and to increase the precision and recall with which relevant primary studies are identified. A Multi-Channel Convolutional Neural Network (CNN) is proposed, which can automatically classify a given set of citations. As the architecture uses only the title and abstract as features, our end-to-end pipeline is domain-independent. We performed six experiments to assess the performance of Multi-Channel CNNs across 20 publicly available systematic literature review datasets. For 18 out of 20 review datasets, the proposed method achieved significant workload savings of at least 10%, while in several cases, our model yielded a statistically significantly better performance over two benchmark review datasets. We conclude that Multi-Channel CNNs are effective for the citation screening process in SLRs. Multi-Channel CNNs perform best on large datasets of over 2500 samples with few abstracts missing.
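
The abstract does not spell out the architecture. As a rough illustration only, the sketch below shows one common reading of "multi-channel" for text CNNs: several parallel convolutions with different kernel widths over the same embedded token sequence, each followed by global max pooling, with the pooled features concatenated and passed to a sigmoid output. The kernel widths, filter count, random weights standing in for pretrained embeddings, and the concatenation of title and abstract tokens are all assumptions made for this example, not details confirmed by the article.

```python
import math
import random


def conv_channel(embedded, filters, bias):
    """One channel: 1D convolution over token positions, ReLU, global max pooling.

    filters has shape [n_filters][kernel_width][embedding_dim].
    """
    width = len(filters[0])
    pooled = []
    for kernel, b in zip(filters, bias):
        best = 0.0  # ReLU floor: negative activations never win the max
        for start in range(len(embedded) - width + 1):
            act = b
            for offset in range(width):
                vec = embedded[start + offset]
                act += sum(w * x for w, x in zip(kernel[offset], vec))
            best = max(best, act)
        pooled.append(best)
    return pooled


def multi_channel_cnn(tokens, embeddings, channels, out_weights, out_bias):
    """Forward pass: embed tokens, run every channel, concatenate, sigmoid."""
    dim = len(next(iter(embeddings.values())))
    unknown = [0.0] * dim  # out-of-vocabulary tokens map to a zero vector
    embedded = [embeddings.get(tok, unknown) for tok in tokens]
    features = []
    for filters, bias in channels:
        features.extend(conv_channel(embedded, filters, bias))
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


def init_model(vocab, emb_dim=8, kernel_widths=(2, 3, 4), n_filters=4, seed=0):
    """Random, untrained parameters (hypothetical stand-ins for learned weights)."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    embeddings = {w: [rand() for _ in range(emb_dim)] for w in sorted(vocab)}
    channels = []
    for width in kernel_widths:
        filters = [[[rand() for _ in range(emb_dim)] for _ in range(width)]
                   for _ in range(n_filters)]
        channels.append((filters, [0.0] * n_filters))
    out_weights = [rand() for _ in range(n_filters * len(kernel_widths))]
    return embeddings, channels, out_weights, 0.0


# Title and abstract tokens are simply concatenated here (an assumption).
title = "deep learning for citation screening".split()
abstract = "we classify citations as included or excluded".split()
tokens = title + abstract
model = init_model(set(tokens))
probability = multi_channel_cnn(tokens, *model)
print(f"P(include) = {probability:.3f}")  # untrained weights: value is not meaningful
```

In practice the embedding table would be loaded from pretrained GloVe vectors and the weights would be trained with a deep learning framework; the pure-Python loops here only make the data flow explicit.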

Keywords

Systematic literature review (SLR)

Citation screening

Automation

Neural networks

Natural language processing