-
CYCLE: Learning to Self-Refine the Code Generation arXiv.cs.SE Pub Date : 2024-03-27 Yangruibo Ding, Marcus J. Min, Gail Kaiser, Baishakhi Ray
Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by the existing evaluations of code LMs, which focus only on the accuracy of the one-time prediction. For the cases when code LMs fail to implement the correct program, developers actually
-
An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long Project arXiv.cs.SE Pub Date : 2024-03-27 Ben Arie Tanay, Lexy Arinze, Siddhant S. Joshi, Kirsten A. Davis, James C. Davis
Background: Large Language Models (LLMs) such as ChatGPT and CoPilot are influencing software engineering practice. Software engineering educators must teach future software engineers how to use such tools well. As of yet, there have been few studies that report on the use of LLMs in the classroom. It is, therefore, important to evaluate students' perception of LLMs and possible ways of adapting the
-
Vulnerability Detection with Code Language Models: How Far Are We? arXiv.cs.SE Pub Date : 2024-03-27 Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen
In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing vulnerability datasets, including poor data quality, low label accuracy, and high duplication rates, leading to unreliable model performance in realistic vulnerability detection
-
Algorithmic Details behind the Predator Shape Analyser arXiv.cs.SE Pub Date : 2024-03-27 Kamil Dudka, Petr Muller, Petr Peringer, Veronika Šoková, Tomáš Vojnar
This chapter, which is an extended and revised version of the conference paper 'Predator: Byte-Precise Verification of Low-Level List Manipulation', concentrates on a detailed description of the algorithms behind the Predator shape analyser based on abstract interpretation and symbolic memory graphs. Predator is particularly suited for formal analysis and verification of sequential non-recursive C
-
UVL Sentinel: a tool for parsing and syntactic correction of UVL datasets arXiv.cs.SE Pub Date : 2024-03-27 David Romero-Organvidez, Jose A. Galindo, David Benavides
Feature models have become a de facto standard for representing variability in software product lines. UVL (Universal Variability Language) is a language which expresses the features, dependencies, and constraints between them. This language is written in plain text and follows a syntactic structure that needs to be processed by a parser. This parser is software with specific syntactic rules that the
-
How is Testing Related to Single Statement Bugs? arXiv.cs.SE Pub Date : 2024-03-27 Habibur Rahman, Saqib Ameen
In this study, we analyzed the correlation between unit test coverage and the occurrence of Single Statement Bugs (SSBs) in open-source Java projects. We analyzed data from the top 100 Maven-based projects on GitHub, which includes 7824 SSBs. Our preliminary findings suggest a weak to moderate correlation, indicating that increased test coverage somewhat reduces the occurrence of SSBs. However, this
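The correlation analysis this abstract describes can be sketched in a few lines; the project data and the plain Pearson coefficient below are illustrative stand-ins, not the study's actual data or statistic:

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# (test coverage %, SSB count) per project -- invented data for illustration
projects = [(92, 3), (85, 5), (70, 9), (60, 11), (55, 10)]
coverage, ssbs = zip(*projects)
r = pearson(coverage, ssbs)
print(f"correlation: {r:.2f}")  # negative sign => more coverage, fewer SSBs
```

A weak-to-moderate result, as reported, would correspond to |r| roughly between 0.3 and 0.6 on real data.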
-
TGMM: Combining Parse Tree with GPU for Scalable Multilingual and Multi-Granularity Code Clone Detection arXiv.cs.SE Pub Date : 2024-03-27 Yuhang Ye, Yuekun Wang, Yinxing Xue, Yueming Wu, Yang Liu
The rapid evolution of programming languages and software systems has necessitated multilingual and scalable clone detection tools. However, it is difficult to meet both requirements at the same time, and most existing tools focus on only one of them. In this work, we propose TGMM, a tree- and GPU-based tool for multilingual and multi-granularity code clone detection. By
-
Testing Resource Isolation for System-on-Chip Architectures arXiv.cs.SE Pub Date : 2024-03-27 Philippe Ledent, Radu Mateescu, Wendelin Serwe
Ensuring resource isolation at the hardware level is a crucial step towards more security inside the Internet of Things. Even though there is still no generally accepted technique to generate appropriate tests, it became clear that tests should be generated at the system level. In this paper, we illustrate the modeling aspects in test generation for resource isolation, namely modeling the behavior
-
Natural Language Requirements Testability Measurement Based on Requirement Smells arXiv.cs.SE Pub Date : 2024-03-26 Morteza Zakeri-Nasrabadi, Saeed Parsa
Requirements form the basis for defining software systems' obligations and tasks. Testable requirements help prevent failures, reduce maintenance costs, and make it easier to perform acceptance tests. However, despite the importance of measuring and quantifying requirements testability, no automatic approach has been proposed to measure requirements testability based on requirement smells,
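A toy illustration of smell-based testability scoring in the spirit of this abstract; the smell-word list and the counting rule are assumptions for the sketch, not the authors' actual approach:

```python
# Hypothetical ambiguity-smell vocabulary; real smell catalogs are richer.
SMELL_WORDS = {"appropriate", "fast", "user-friendly", "etc", "some"}

def smell_score(requirement: str) -> int:
    """Count ambiguity-smell words; a higher score suggests lower testability."""
    tokens = requirement.lower().replace(".", "").split()
    return sum(1 for t in tokens if t in SMELL_WORDS)

print(smell_score("The system shall respond fast with appropriate messages."))
# 2 smell words: 'fast', 'appropriate'
print(smell_score("The system shall respond within 200 ms."))
# 0 -- the quantified requirement is directly testable
```

The contrast between the two requirements shows why vague modifiers make acceptance tests hard to write.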
-
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution arXiv.cs.SE Pub Date : 2024-03-26 Wei Tao, Yucheng Zhou, Wenqiang Zhang, Yu Cheng
In software evolution, resolving the emergent issues within GitHub repositories is a complex challenge that involves not only the incorporation of new code but also the maintenance of existing functionalities. Large Language Models (LLMs) have shown promise in code generation and understanding but face difficulties in code change, particularly at the repository level. To overcome these challenges,
-
SPES: Towards Optimizing Performance-Resource Trade-Off for Serverless Functions arXiv.cs.SE Pub Date : 2024-03-26 Cheryl Lee, Zhouruixin Zhu, Tianyi Yang, Yintong Huo, Yuxin Su, Pinjia He, Michael R. Lyu
As an emerging cloud computing deployment paradigm, serverless computing is gaining traction due to its efficiency and ability to harness on-demand cloud resources. However, a significant hurdle remains in the form of the cold start problem, causing latency when launching new function instances from scratch. Existing solutions tend to use over-simplistic strategies for function pre-loading/unloading
-
An Empirical Study of ChatGPT-related projects on GitHub arXiv.cs.SE Pub Date : 2024-03-26 Zheng Lin, Neng Zhang
As ChatGPT possesses powerful capabilities in natural language processing and code analysis, it has received widespread attention since its launch. Developers have applied its powerful capabilities to various domains through software projects hosted on GitHub, the world's largest open-source platform. Simultaneously, these projects have triggered extensive discussions. In order to comprehend
-
Characterizing Dependency Update Practice of NPM, PyPI and Cargo Packages arXiv.cs.SE Pub Date : 2024-03-26 Imranur Rahman, Nusrat Zahan, Stephen Magill, William Enck, Laurie Williams
Keeping dependencies up-to-date prevents software supply chain attacks through outdated and vulnerable dependencies. Developers may use packages' dependency update practice as one of the selection criteria for choosing a package as a dependency. However, the lack of metrics characterizing packages' dependency update practice makes this assessment difficult. To measure the up-to-date characteristics
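One simple "time-to-update" style metric that could characterize a package's dependency update practice; the metric, names, and dates below are invented for illustration, and the paper's actual metrics may differ:

```python
from datetime import date

def mean_update_lag(releases, adoptions):
    """Mean days between a dependency version's release and its adoption
    by the downstream package (a toy 'time-to-update' style metric)."""
    lags = [(adoptions[v] - d).days for v, d in releases.items() if v in adoptions]
    return sum(lags) / len(lags)

# Hypothetical dependency release dates and downstream adoption dates.
releases = {"1.1": date(2024, 1, 1), "1.2": date(2024, 2, 1)}
adoptions = {"1.1": date(2024, 1, 11), "1.2": date(2024, 3, 2)}
print(mean_update_lag(releases, adoptions))  # (10 + 30) / 2 = 20.0
```

A lower mean lag would indicate a more up-to-date dependency practice, which is the kind of signal the abstract suggests developers could use when selecting packages.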
-
A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection arXiv.cs.SE Pub Date : 2024-03-25 Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Earl T. Barr, Wei Le
Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks. Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems. Precise vulnerability detection requires reasoning about the code, making it a good case study for exploring the limits of LLMs' reasoning capabilities
-
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance arXiv.cs.SE Pub Date : 2024-03-25 Jaskirat Singh, Bram Adams, Ahmed E. Hassan
Deciding what combination of operators to use across the Edge AI tiers to achieve specific latency and model performance requirements is an open question for MLOps engineers. This study aims to empirically assess the accuracy vs inference time trade-off of different black-box Edge AI deployment strategies, i.e., combinations of deployment operators and deployment tiers. In this paper, we conduct inference
-
Proceedings Sixth Workshop on Models for Formal Analysis of Real Systems arXiv.cs.SE Pub Date : 2024-03-26 Frédéric Lang (INRIA Grenoble Rhône-Alpes, France), Matthias Volk (Eindhoven University of Technology, The Netherlands)
This volume contains the proceedings of MARS 2024, the sixth workshop on Models for Formal Analysis of Real Systems, held as part of ETAPS 2024, the European Joint Conferences on Theory and Practice of Software. The MARS workshops bring together researchers from different communities who are developing formal models of real systems in areas where complex models occur, such as networks, cyber-physical
-
Generation of Asset Administration Shell with Large Language Model Agents: Interoperability in Digital Twins with Semantic Node arXiv.cs.SE Pub Date : 2024-03-25 Yuchen Xia, Zhewen Xiao, Nasser Jazdi, Michael Weyrich
This research introduces a novel approach for assisting the creation of Asset Administration Shell (AAS) instances for digital twin modeling within the context of Industry 4.0, aiming to enhance interoperability in smart manufacturing and reduce manual effort. We construct a "semantic node" data structure to capture the semantic essence of textual data. Then, a system powered by large language models
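A minimal guess at what a "semantic node" record capturing the semantic essence of textual data might look like; the field names and structure here are assumptions for illustration, not the paper's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SemanticNode:
    name: str            # concept the text fragment is about
    definition: str      # semantic essence extracted from the text
    source_text: str     # original textual data the node was built from
    children: list = field(default_factory=list)  # nested sub-concepts

# Toy example: a node for an asset with one nested property node.
root = SemanticNode("Motor", "An electric drive unit", "Motor M-100 ...")
root.children.append(SemanticNode("RatedVoltage", "Nominal supply voltage", "400 V"))
print(len(root.children))  # 1
```

An LLM-powered system like the one described could populate such nodes from documents and then map them onto AAS submodel elements.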
-
LLMs as Compiler for Arabic Programming Language arXiv.cs.SE Pub Date : 2024-03-24 Serry Sibaee, Omar Najar, Lahouri Ghouti, Anis Koubaa
In this paper we introduce APL (Arabic Programming Language), which uses Large Language Models (LLMs) as a semi-compiler to convert Arabic text code to Python code and then run it. We design a full pipeline from the structure of the APL text, to a prompt (using prompt engineering), to running the produced Python code using PyRunner. This project has three parts: first, a Python library; a playground with
-
Seeking Enlightenment: Incorporating Evidence-Based Practice Techniques in a Research Software Engineering Team arXiv.cs.SE Pub Date : 2024-03-25 Reed Milewicz, Jon Bisila, Miranda Mundt, Joshua Teves
Evidence-based practice (EBP) in software engineering aims to improve decision-making in software development by complementing practitioners' professional judgment with high-quality evidence from research. We believe the use of EBP techniques may be helpful for research software engineers (RSEs) in their work to bring software engineering best practices to scientific software development. In this study
-
Enhancing Software Effort Estimation through Reinforcement Learning-based Project Management-Oriented Feature Selection arXiv.cs.SE Pub Date : 2024-03-25 Haoyang Chen, Botong Xu, Kaiyang Zhong
Purpose: The study aims to investigate the application of the data element market in software project management, focusing on improving effort estimation by addressing challenges faced by traditional methods. Design/methodology/approach: This study proposes a solution based on feature selection, utilizing the data element market and reinforcement learning-based algorithms to enhance the accuracy of
-
Design Patterns for Multilevel Modeling and Simulation arXiv.cs.SE Pub Date : 2024-03-25 Luca Serena, Moreno Marzolla, Gabriele D'Angelo, Stefano Ferretti
Multilevel modeling and simulation (M&S) is becoming increasingly relevant due to the benefits that this methodology offers. Multilevel models allow users to describe a system at multiple levels of detail. On one side, this can make better use of computational resources, since the more detailed and time-consuming models can be executed only when/where required. On the other side, multilevel models
-
Investigating the Readability of Test Code: Combining Scientific and Practical Views arXiv.cs.SE Pub Date : 2024-03-25 Dietmar Winkler, Pirmin Urbanke, Rudolf Ramler
The readability of source code is key for understanding and maintaining software systems and tests. Several studies investigate the readability of source code, but there is limited research on the readability of test code and related influence factors. We investigate the factors that influence the readability of test code from an academic perspective complemented by practical views. First, we perform
-
Exposing the hidden layers and interplay in the quantum software stack arXiv.cs.SE Pub Date : 2024-03-25 Vlad Stirbu, Arianne Meijer-van de Griend, Jake Muff
Current and near-future quantum computers face resource limitations due to noise and low qubit counts. Despite this, effective quantum advantage can still be achieved due to the exponential nature of bit-to-qubit conversion. However, optimizing the software architecture of these systems is essential to utilize available resources efficiently. Unfortunately, the focus on user-friendly quantum computers
-
Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models arXiv.cs.SE Pub Date : 2024-03-25 Mingyi Zhou, Xiang Gao, Pei Liu, John Grundy, Chunyang Chen, Xiao Chen, Li Li
Recent studies show that deployed deep learning (DL) models, such as those of TensorFlow Lite (TFLite), can be easily extracted from real-world applications and devices by attackers to mount many kinds of attacks, such as adversarial attacks. Although securing deployed on-device DL models has gained increasing attention, no existing methods can fully prevent the aforementioned threats. Traditional software
-
A Mixed Method Study of DevOps Challenges arXiv.cs.SE Pub Date : 2024-03-25 Minaoar Hossain Tanzil, Masud Sarker, Gias Uddin, Anindya Iqbal
Context: DevOps practices combine software development and IT operations. There is a growing number of DevOps-related posts in the popular online developer forum Stack Overflow (SO). While previous research analyzed SO posts related to build/release engineering, we are aware of no research that specifically focused on DevOps-related discussions. Objective: To learn the challenges developers face while
-
AgentFL: Scaling LLM-based Fault Localization to Project-Level Context arXiv.cs.SE Pub Date : 2024-03-25 Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, Xiaoguang Mao
Fault Localization (FL) is an essential step during the debugging process. With the strong capabilities of code comprehension, the recent Large Language Models (LLMs) have demonstrated promising performance in diagnosing bugs in the code. Nevertheless, due to LLMs' limited performance in handling long contexts, existing LLM-based fault localization remains on localizing bugs within a small code scope
-
Coupled Requirements-driven Testing of CPS: From Simulation To Reality arXiv.cs.SE Pub Date : 2024-03-24 Ankit Agrawal, Philipp Zech, Michael Vierhauser
Failures in safety-critical Cyber-Physical Systems (CPS), both software and hardware-related, can lead to severe incidents impacting physical infrastructure or even harming humans. As a result, extensive simulations and field tests need to be conducted, as part of the verification and validation of system requirements, to ensure system safety. However, current simulation and field testing practices
-
"How do people decide?": A Model for Software Library Selection arXiv.cs.SE Pub Date : 2024-03-24 Minaoar Hossain Tanzil, Gias Uddin, Ann Barcomb
Modern-day software development is often facilitated by the reuse of third-party software libraries. Despite the significant effort to understand the factors contributing to library selection, it is relatively unknown how the libraries are selected and what tools are still needed to support the selection process. Using Straussian grounded theory, we conducted and analyzed the interviews of 24 professionals
-
SoK: Comprehensive Analysis of Rug Pull Causes, Datasets, and Detection Tools in DeFi arXiv.cs.SE Pub Date : 2024-03-24 Dianxiang Sun, Wei Ma, Liming Nie, Yang Liu
Rug pulls pose a grave threat to the cryptocurrency ecosystem, leading to substantial financial loss and undermining trust in decentralized finance (DeFi) projects. With the emergence of new rug pull patterns, research on rug pulls is out of date. To fill this gap, we first conducted an extensive literature review, encompassing both scholarly and industry sources. By examining existing
-
Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications arXiv.cs.SE Pub Date : 2024-03-24 Wei Ma, Daoyuan Wu, Yuqiang Sun, Tianwen Wang, Shangqing Liu, Jian Zhang, Yue Xue, Yang Liu
Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent research has shown that large language models (LLMs) have potential in auditing smart contracts, but the state-of-the-art indicates that even GPT-4 can achieve only 30% precision (when both decision and justification are correct). This is likely because off-the-shelf LLMs were primarily pre-trained on a general
-
FineWAVE: Fine-Grained Warning Verification of Bugs for Automated Static Analysis Tools arXiv.cs.SE Pub Date : 2024-03-24 Han Liu, Jian Zhang, Cen Zhang, Xiaohan Zhang, Kaixuan Li, Sen Chen, Shang-Wei Lin, Yixiang Chen, Xinhua Li, Yang Liu
The continual expansion of software size and complexity has led to an increased focus on reducing defects and bugs during development. Although Automated Static Analysis Tools (ASATs) offer help, in practice, the significant number of false positives can impede developers' productivity and confidence in the tools. Therefore, previous research efforts have explored learning-based methods to validate
-
Fine-Grained Assertion-Based Test Selection arXiv.cs.SE Pub Date : 2024-03-24 Sijia Gu, Ali Mesbah
For large software applications, running the whole test suite after each code change is time- and resource-intensive. Regression test selection techniques aim at reducing test execution time by selecting only the tests that are affected by code changes. However, existing techniques select test entities at coarse granularity levels such as test class, which causes imprecise test selection and executing
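Assertion-level selection can be sketched as a map from individual assertions to the production methods they exercise, so that a change selects only the tests containing an affected assertion; all names and the granularity below are invented for illustration, not the tool's actual analysis:

```python
# (test class, assertion id) -> production methods that assertion exercises
assertion_deps = {
    ("TestCart", "assert_total"): {"Cart.total"},
    ("TestCart", "assert_empty"): {"Cart.clear"},
    ("TestUser", "assert_login"): {"User.login"},
}

def select_tests(changed_methods):
    """Select only tests with at least one assertion touching a changed method."""
    return {test for (test, _assertion), deps in assertion_deps.items()
            if deps & changed_methods}

print(select_tests({"Cart.total"}))  # {'TestCart'}
```

Compared with class-level selection, tracking dependencies per assertion avoids re-running a whole test class when only an unrelated assertion's dependencies changed.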
-
Who Uses Personas in Requirements Engineering: The Practitioners' Perspective arXiv.cs.SE Pub Date : 2024-03-23 Yi Wang, Chetan Arora, Xiao Liu, Thuong Hoang, Vasudha Malhotra, Ben Cheng, John Grundy
Personas are commonly used in software projects to gain a better understanding of end-users' needs. However, there is a limited understanding of their usage and effectiveness in practice. This paper presents the results of a two-step investigation, comprising interviews with 26 software developers, UI/UX designers, business analysts and product managers and a survey of 203 practitioners, aimed at shedding
-
Automated System-level Testing of Unmanned Aerial Systems arXiv.cs.SE Pub Date : 2024-03-23 Hassan Sartaj, Asmar Muqeet, Muhammad Zohaib Iqbal, Muhammad Uzair Khan
Unmanned aerial systems (UAS) rely on various avionics systems that are safety-critical and mission-critical. A major requirement of international safety standards is to perform rigorous system-level testing of avionics software systems. The current industrial practice is to manually create test scenarios, manually/automatically execute these scenarios using simulators, and manually evaluate outcomes
-
When LLM-based Code Generation Meets the Software Development Process arXiv.cs.SE Pub Date : 2024-03-23 Feng Lin, Dong Jae Kim, Tse-Hsun (Peter) Chen
Software process models play a pivotal role in fostering collaboration and communication within software teams, enabling them to tackle intricate development tasks effectively. This paper introduces LCG, a code generation framework inspired by established software engineering practices. LCG leverages multiple Large Language Model (LLM) agents to emulate various software process models, namely LCGWaterfall
-
Local Features: Enhancing Variability Modeling in Software Product Lines arXiv.cs.SE Pub Date : 2024-03-23 David de Castro, Alejandro Cortiñas, Miguel R. Luaces, Oscar Pedreira, Ángeles Saavedra Places
Context and motivation: Software Product Lines (SPL) enable the creation of software product families with shared core components, using feature models to model variability. Choosing features from a feature model to generate a product may not be sufficient in certain situations, because the application engineer may need to be able to decide at configuration time the system's elements to which a certain
-
CodeShell Technical Report arXiv.cs.SE Pub Date : 2024-03-23 Rui Xie, Zhengran Zeng, Zhuohao Yu, Chang Gao, Shikun Zhang, Wei Ye
Code large language models mark a pivotal breakthrough in artificial intelligence. They are specifically crafted to understand and generate programming languages, significantly boosting the efficiency of coding development workflows. In this technical report, we present CodeShell-Base, a seven billion-parameter foundation model with 8K context length, showcasing exceptional proficiency in code comprehension
-
Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers arXiv.cs.SE Pub Date : 2024-03-22 Sivana Hamer, Marcelo d'Amorim, Laurie Williams
Sonatype's 2023 report found that 97% of developers and security leads integrate generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), into their development process. Concerns about the security implications of this trend have been raised. Developers are now weighing the benefits and risks of LLMs against other relied-upon information sources, such as StackOverflow (SO)
-
Concerned with Data Contamination? Assessing Countermeasures in Code Language Model arXiv.cs.SE Pub Date : 2024-03-25 Jialun Cao, Wuqi Zhang, Shing-Chi Cheung
Various techniques have been proposed to leverage the capabilities of code language models (CLMs) for SE tasks. While these techniques typically evaluate their effectiveness using publicly available datasets, the evaluation can be subject to data contamination threats where the evaluation datasets have already been used to train the concerned CLMs. This can significantly affect the reliability of the
-
Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback arXiv.cs.SE Pub Date : 2024-03-25 Zhangqian Bi, Yao Wan, Zheng Wang, Hongyu Zhang, Batu Guan, Fangxin Lu, Zili Zhang, Yulei Sui, Xuanhua Shi, Hai Jin
Large language models (LLMs) have shown remarkable progress in automated code generation. Yet, incorporating LLM-based code generation into real-life software projects poses challenges, as the generated code may contain errors in the use of APIs, classes, or data structures, or may be missing project-specific information. As much of this project-specific context cannot fit into the prompts of LLMs, we must find ways
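The generate-compile-refine loop implied by the title can be sketched as follows, with stand-in functions (generate_code and compile_check are hypothetical placeholders, not the paper's API):

```python
def refine_until_compiles(prompt, generate_code, compile_check, max_rounds=3):
    """Feed compiler diagnostics back into the model until the code compiles
    or the round budget runs out."""
    code = generate_code(prompt)
    for _ in range(max_rounds):
        ok, errors = compile_check(code)
        if ok:
            return code
        code = generate_code(f"{prompt}\n# fix these errors:\n{errors}")
    return code

# Toy stand-ins: the fake "model" fixes code once it has seen the error message.
def fake_generate(p):
    return "fixed" if "fix these errors" in p else "broken"

def fake_compile(c):
    return (c == "fixed", "" if c == "fixed" else "syntax error")

print(refine_until_compiles("task", fake_generate, fake_compile))  # fixed
```

The paper's contribution lies in what context is retrieved and refined between rounds; the loop above only shows the feedback skeleton.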
-
DeepKnowledge: Generalisation-Driven Deep Learning Testing arXiv.cs.SE Pub Date : 2024-03-25 Sondess Missaoui, Simos Gerasimou, Nikolaos Matragkas
Despite their unprecedented success, DNNs are notoriously fragile to small shifts in data distribution, demanding effective testing techniques that can assess their dependability. Despite recent advances in DNN testing, there is a lack of systematic testing approaches that assess the DNN's capability to generalise and operate comparably beyond data in their training distribution. We address this gap
-
CodeS: Natural Language to Code Repository via Multi-Layer Sketch arXiv.cs.SE Pub Date : 2024-03-25 Daoguang Zan, Ailun Yu, Wei Liu, Dong Chen, Bo Shen, Wei Li, Yafen Yao, Yongshun Gong, Xiaolin Chen, Bei Guan, Zhiguang Yang, Yongji Wang, Qianxiang Wang, Lizhen Cui
The impressive performance of large language models (LLMs) on code-related tasks has shown the potential of fully automated software development. In light of this, we introduce a new software engineering task, namely Natural Language to code Repository (NL2Repo). This task aims to generate an entire code repository from its natural language requirements. To address this task, we propose a simple yet
-
Can Language Models Pretend Solvers? Logic Code Simulation with LLMs arXiv.cs.SE Pub Date : 2024-03-24 Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu, Yuxin Su, Xi Chang, Jianxin Xue
Transformer-based large language models (LLMs) have demonstrated significant potential in addressing logic problems. Capitalizing on the great capabilities of LLMs for code-related activities, several frameworks leveraging logical solvers for logic reasoning have been proposed recently. While existing research predominantly focuses on viewing LLMs as natural language logic solvers or translators, their
-
A hybrid LLM workflow can help identify user privilege related variables in programs of any size arXiv.cs.SE Pub Date : 2024-03-23 Haizhou Wang, Zhilong Wang, Peng Liu
Many programs involve operations and logic manipulating user privileges, which is essential for the security of an organization. Therefore, one common malicious goal of attackers is to obtain or escalate privileges, causing privilege leakage. To protect the program and the organization against privilege leakage attacks, it is important to eliminate the vulnerabilities which can be exploited to
-
Navigating Fairness: Practitioners' Understanding, Challenges, and Strategies in AI/ML Development arXiv.cs.SE Pub Date : 2024-03-21 Aastha Pant, Rashina Hoda, Chakkrit Tantithamthavorn, Burak Turhan
The rise in the use of AI/ML applications across industries has sparked more discussions about the fairness of AI/ML in recent times. While prior research on the fairness of AI/ML exists, there is a lack of empirical studies focused on understanding the views and experiences of AI practitioners in developing a fair AI/ML. Understanding AI practitioners' views and experiences on the fairness of AI/ML
-
Enhancing Testing at Meta with Rich-State Simulated Populations arXiv.cs.SE Pub Date : 2024-03-22 Nadia Alshahwan, Arianna Blasi, Kinga Bojarczuk, Andrea Ciancone, Natalija Gucevska, Mark Harman, Simon Schellaert, Inna Harper, Yue Jia, Michał Królikowski, Will Lewis, Dragos Martac, Rubmary Rojas, Kate Ustiuzhanina
This paper reports the results of the deployment of Rich-State Simulated Populations at Meta for both automated and manual testing. We use simulated users (aka test users) to mimic user interactions and acquire state in much the same way that real user accounts acquire state. For automated testing, we present empirical results from deployment on the Facebook, Messenger, and Instagram apps for iOS and
-
ACCESS: Assurance Case Centric Engineering of Safety-critical Systems arXiv.cs.SE Pub Date : 2024-03-22 Ran Wei, Simon Foster, Haitao Mei, Fang Yan, Ruizhe Yang, Ibrahim Habli, Colin O'Halloran, Nick Tudor, Tim Kelly
Assurance cases are used to communicate and assess confidence in critical system properties such as safety and security. Historically, assurance cases have been manually created documents, which are evaluated by system stakeholders through lengthy and complicated processes. In recent years, model-based system assurance approaches have gained popularity to improve the efficiency and quality of system
-
Towards Deep Learning Enabled Cybersecurity Risk Assessment for Microservice Architectures arXiv.cs.SE Pub Date : 2024-03-22 Majid Abdulsatar, Hussain Ahmad, Diksha Goel, Faheem Ullah
The widespread adoption of microservice architectures has given rise to a new set of software security challenges. These challenges stem from the unique features inherent in microservices. It is important to systematically assess and address software security challenges such as software security risk assessment. However, existing approaches prove inefficient in accurately evaluating the security risks
-
AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models arXiv.cs.SE Pub Date : 2024-03-22 Chaoyun Zhang, Zicheng Ma, Yuhao Wu, Shilin He, Si Qin, Minghua Ma, Xiaoting Qin, Yu Kang, Yuyi Liang, Xiaoyu Gou, Yajie Xue, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
Verbatim feedback constitutes a valuable repository of user experiences, opinions, and requirements essential for software development. Effectively and efficiently extracting valuable insights from such data poses a challenging task. This paper introduces Allhands, an innovative analytic framework designed for large-scale feedback analysis through a natural language interface, leveraging large language
-
On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions arXiv.cs.SE Pub Date : 2024-03-22 Matteo Ciniselli, Alberto Martin-Lopez, Gabriele Bavota
Code completion is a key feature of Integrated Development Environments (IDEs), aimed at predicting the next tokens a developer is likely to write, helping them write code faster and with less effort. Modern code completion approaches are often powered by deep learning (DL) models. However, the swift evolution of programming languages poses a critical challenge to the performance of DL-based code completion
-
Testing for Fault Diversity in Reinforcement Learning arXiv.cs.SE Pub Date : 2024-03-22 Quentin Mazouni, Helge Spieker, Arnaud Gotlieb, Mathieu Acher
Reinforcement Learning is the premier technique to approach sequential decision problems, including complex tasks such as driving cars and landing spacecraft. Among the software validation and verification practices, testing for functional fault detection is a convenient way to build trustworthiness in the learned decision model. While recent works seek to maximise the number of detected faults, none
-
Programmers Prefer Individually Assigned Tasks vs. Shared Responsibility arXiv.cs.SE Pub Date : 2024-03-22 Adela Krylova, Roman Makarov, Sergei Pasynkov, Yegor Bugayenko
In traditional management, tasks are typically assigned to individuals, with each worker taking full responsibility for the success or failure of a task. In contrast, modern Agile, Lean, and eXtreme Programming practices advocate for shared responsibility, where an entire group is accountable for the outcome of a project or task. Despite numerous studies in other domains, the preferences of programmers
-
Comprehensive Evaluation and Insights into the Use of Large Language Models in the Automation of Behavior-Driven Development Acceptance Test Formulation arXiv.cs.SE Pub Date : 2024-03-22 Shanthi Karpurapu, Sravanthy Myneni, Unnati Nettur, Likhit Sagar Gajja, Dave Burke, Tom Stiehm, Jeffery Payne
Behavior-driven development (BDD) is an Agile testing methodology fostering collaboration among developers, QA analysts, and stakeholders. In this manuscript, we propose a novel approach to enhance BDD practices using large language models (LLMs) to automate acceptance test generation. Our study uses zero and few-shot prompts to evaluate LLMs such as GPT-3.5, GPT-4, Llama-2-13B, and PaLM-2. The paper
-
"The Law Doesn't Work Like a Computer": Exploring Software Licensing Issues Faced by Legal Practitioners arXiv.cs.SE Pub Date : 2024-03-22 Nathan Wintersgill, Trevor Stalnaker, Laura A. Heymann, Oscar Chaparro, Denys Poshyvanyk
Most modern software products incorporate open source components, which requires compliance with each component's licenses. As noncompliance can lead to significant repercussions, organizations often seek advice from legal practitioners to maintain license compliance, address licensing issues, and manage the risks of noncompliance. While legal practitioners play a critical role in the process, little
-
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond arXiv.cs.SE Pub Date : 2024-03-21 Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu
Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impact on society as a whole. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronological
-
Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific Literature arXiv.cs.SE Pub Date : 2024-03-20 Jeremy R. Harper
In the evolving landscape of clinical informatics, the integration and utilization of software tools developed through governmental funding represent a pivotal advancement in research and application. However, the dispersion of these tools across various repositories, with no centralized knowledge base, poses significant challenges to leveraging their full potential. This study introduces an automated
-
Envisioning the Next-Generation AI Coding Assistants: Insights & Proposals arXiv.cs.SE Pub Date : 2024-03-21 Khanh Nghiem, Anh Minh Nguyen, Nghi D. Q. Bui
As a research-product hybrid group in AI for Software Engineering (AI4SE), we present four key takeaways from our experience developing in-IDE AI coding assistants. AI coding assistants should set clear expectations for usage, integrate with advanced IDE capabilities and existing extensions, use extendable backend designs, and collect app data responsibly for downstream analyses. We propose open questions
-
Towards Single-System Illusion in Software-Defined Vehicles -- Automated, AI-Powered Workflow arXiv.cs.SE Pub Date : 2024-03-21 Krzysztof Lebioda, Viktor Vorobev, Nenad Petrovic, Fengjunjie Pan, Vahid Zolfaghari, Alois Knoll
We propose a novel model- and feature-based approach to development of vehicle software systems, where the end architecture is not explicitly defined. Instead, it emerges from an iterative process of search and optimization given certain constraints, requirements and hardware architecture, while retaining the property of single-system illusion, where applications run in a logically uniform environment
-
Multi-role Consensus through LLMs Discussions for Vulnerability Detection arXiv.cs.SE Pub Date : 2024-03-21 Zhenyu Mao, Jialong Li, Munan Li, Kenji Tei
Recent advancements in large language models (LLMs) have highlighted the potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies have been limited to the perspective of a single role, usually testers, lacking diverse viewpoints from different roles in a typical software development life-cycle, including both developers and testers
-
Pricing-driven Development and Operation of SaaS : Challenges and Opportunities arXiv.cs.SE Pub Date : 2024-03-20 Alejandro García-Fernández, José Antonio Parejo, Antonio Ruiz-Cortés
As the Software as a Service (SaaS) paradigm continues to reshape the software industry, a nuanced understanding of its operational dynamics becomes increasingly crucial. This paper delves into the intricate relationship between pricing strategies and software development within the SaaS model. Using PetClinic as a case study, we explore the implications of a Pricing-driven Development and Operation