Guest Editorial: Machine learning applied to quality and security in software systems
IET Software (IF 1.6), Pub Date: 2023-07-25, DOI: 10.1049/sfw2.12141
Honghao Gao 1 , Walayat Hussain 2 , Ramón J. Durán Barroso 3 , Junaid Arshad 4 , Yuyu Yin 5

During the development of software systems, quality and security problems occur even with advance planning. These defects can threaten program development and maintenance. Machine learning can be used to control and minimise such defects and thereby improve the quality and security of software systems. This special issue focuses on recent advances in architecture, algorithms, optimisation, and models for machine learning applied to quality and security in software systems. After a rigorous review based on relevance, originality, technical novelty, and presentation quality, we selected four manuscripts. A summary of the accepted papers is given below.

In the first paper, entitled “Robust Malware Identification via Deep Temporal Convolutional Network with Symmetric Cross Entropy Learning” by Sun et al., the authors propose a robust malware identification method based on the temporal convolutional network (TCN). Word embedding techniques are commonly used to capture the contextual relationship between the input operation code (opcode) and application programming interface (API) function names. Because practical intelligent environments contain numerous unlabelled samples, the authors pre-train the TCN model on an unlabelled set using a word embedding method, namely word2vec. In the experiments, the proposed method is compared with several traditional statistical methods and more recent neural networks on a synthetic malware dataset and a real-world dataset. The comparisons demonstrate the better performance and noise robustness of the proposed method, which yields the best identification accuracy of 98.75% in real-world scenarios.
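
The paper title refers to symmetric cross entropy learning, which this editorial does not spell out. As a rough illustration, the sketch below follows the commonly published formulation of symmetric cross entropy for noisy labels: a weighted sum of the standard cross entropy and a reverse cross entropy in which log 0 is replaced by a negative constant. The function name, the weights `alpha` and `beta`, and the constant `log_zero` are assumptions chosen for illustration; Sun et al. may use a different variant or different settings.

```python
import math

def symmetric_cross_entropy(probs, label, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Symmetric cross entropy for one sample (illustrative sketch).

    probs    -- predicted class probabilities, summing to 1
    label    -- index of the (possibly noisy) class label
    alpha    -- weight of the standard cross entropy term (assumed value)
    beta     -- weight of the reverse cross entropy term (assumed value)
    log_zero -- finite stand-in for log(0) in the reverse term (assumed value)
    """
    eps = 1e-12  # guards against log(0) in the forward term
    ce = -math.log(max(probs[label], eps))
    # Reverse CE swaps the roles of prediction and one-hot label:
    # -sum_k p_k * log(q_k), with log(1) = 0 and log(0) replaced by log_zero.
    rce = -sum(p * log_zero for k, p in enumerate(probs) if k != label)
    return alpha * ce + beta * rce

loss = symmetric_cross_entropy([0.7, 0.2, 0.1], label=0)
print(f"{loss:.4f}")
```

The reverse term is bounded because `log_zero` is finite, so a confidently mispredicted (and possibly mislabelled) sample contributes less than it would under plain cross entropy alone.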

In the second paper, entitled “Just-In-Time Defect Prediction Enhanced by the Joint Method of Line Label Fusion and File Filtering” by Zhang et al., the authors propose a Just-in-Time defect prediction model enhanced by the joint method of line label Fusion and file Filtering (JIT-FF). First, to distinguish added and removed lines while preserving the original software change information, the authors represent the code changes as original, added, and removed code according to line labels. Second, to obtain a semantics-enhanced code representation, the authors propose a cross-attention-based line label fusion method that performs complementary feature enhancement. Third, to generate code changes containing fewer defect-irrelevant files, the authors formalise file filtering as a sequential decision problem and propose a reinforcement learning-based file filtering method. Finally, based on the generated code changes, CodeBERT-based commit representation and multi-layer perceptron-based defect prediction are performed to identify defective software changes. The experiments demonstrate that JIT-FF predicts defective software changes more effectively.
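
The first step, splitting a change by line label, can be sketched as follows. This is a minimal illustration assuming unified-diff style prefixes (`+` for added, `-` for removed, a space for unchanged context). The function name `split_by_line_label` and the choice to treat the label-stripped full change as the "original" view are assumptions, not details taken from Zhang et al.

```python
def split_by_line_label(diff_lines):
    """Split a code change into original, added, and removed views.

    Assumes each line starts with a one-character label:
    '+' (added), '-' (removed), or ' ' (unchanged context).
    """
    original, added, removed = [], [], []
    for line in diff_lines:
        label, code = line[:1], line[1:]
        original.append(code)  # full change with labels stripped (assumed view)
        if label == "+":
            added.append(code)
        elif label == "-":
            removed.append(code)
    return original, added, removed

change = [
    " def area(w, h):",
    "-    return w + h",
    "+    return w * h",
]
original, added, removed = split_by_line_label(change)
print(added)
print(removed)
```

In a pipeline like the one described, each of the three views would then be encoded separately before any fusion step; that part is not shown here.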

In the third paper, entitled “Android Malware Detection via Efficient API Call Sequences Extraction and Machine Learning Classifiers” by Wang et al., the authors propose a novel Android malware detection framework, contributing an efficient API call sequence extraction algorithm and an investigation of different types of classifiers. For API call sequence extraction, the authors propose an algorithm that transforms the function call graph from a multigraph into a directed simple graph, which avoids unnecessary repetitive path searching. The authors also propose a pruning search, which further reduces the number of paths to be searched. Together, these steps greatly reduce the time complexity of extraction. The authors generate the transition matrix as classification features and investigate three types of machine learning classifiers to complete the malware detection task. The experiments are performed on real-world APKs, and the results demonstrate that the proposed method reduces the running time and achieves high detection accuracy.
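
Two of the steps above can be sketched in a few lines: collapsing a call multigraph into a directed simple graph, and turning API call sequences into a transition matrix. This is a simplified illustration, not the algorithm of Wang et al. The function names, the decision to drop self-loops, and the use of row-normalised first-order transition counts are assumptions, and the pruning search is not shown.

```python
from collections import defaultdict

def to_simple_digraph(edges):
    """Collapse a call multigraph into a directed simple graph.

    Parallel edges (the same caller -> callee pair listed repeatedly)
    are merged, and self-loops are dropped (an assumption made here).
    """
    graph = defaultdict(set)
    for caller, callee in edges:
        if caller != callee:
            graph[caller].add(callee)
    return {node: sorted(succ) for node, succ in graph.items()}

def transition_matrix(sequences, vocabulary):
    """Row-normalised counts of consecutive API call pairs."""
    index = {api: i for i, api in enumerate(vocabulary)}
    n = len(vocabulary)
    counts = [[0] * n for _ in range(n)]
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[index[current]][index[following]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

edges = [("main", "a"), ("main", "a"), ("a", "b"), ("a", "a")]
graph = to_simple_digraph(edges)
matrix = transition_matrix([["open", "read", "read", "close"]],
                           ["open", "read", "close"])
print(graph)
print(matrix)
```

Merging parallel edges means each caller-callee pair is explored once during path search, which is the intuition behind avoiding repetitive searching; the flattened matrix rows could then serve as a fixed-length feature vector for a classifier.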

In the fourth paper, entitled “Selecting Reliable Blockchain Peers via Hybrid Blockchain Reliability Prediction” by Zheng et al., the authors propose H-BRP, a Hybrid Blockchain Reliability Prediction model, which extracts blockchain reliability factors and then makes a personalised prediction for each user. Connecting to unreliable blockchain peers can waste resources and even cause loss of cryptocurrency through repeated transactions. The proposed model primarily aims to select reliable blockchain peers by evaluating and predicting their reliability. Comprehensive experiments conducted on 100 blockchain requesters and 200 blockchain peers demonstrate the effectiveness of the proposed H-BRP model. Furthermore, the implementation and a dataset of 2,000,000 test cases are released.

The Guest Editors would like to express their deep gratitude to all the authors who have submitted their valuable contributions, and to the numerous and highly qualified anonymous reviewers. We think that the selected contributions, which represent the current state of the art in the field, will be of great interest to the community. We also would like to thank the IET Software publication staff members for their continuous support and dedication. We particularly appreciate the relentless support and encouragement granted to us by Prof. Hana Chockler, the Editor-in-Chief of IET Software.




Updated: 2023-07-29