Dos and Don’ts of Machine Learning in Computer Security (2022), USENIX Security, D Arp, et al. [pdf]
Machine/deep learning for software engineering: A systematic literature review (2022), TSE, Simin Wang, et al. [pdf]
Trustworthy AI: From principles to practices (2023), arxiv, BO Li, et al. [pdf]
Data Collection and Labeling
Unbalanced Distribution
Deep Learning Based Vulnerability Detection (2021), arxiv, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Does data sampling improve deep learning-based vulnerability detection? Yeas! and Nays! (2023), ICSE, X Yang, et al. [pdf]
On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), arxiv, B Steenhoek, et al. [pdf]
Label Errors
Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, et al. [pdf]
Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper) (2023), ISSTA, X Nie, et al. [pdf]
Data Noise
Slice-Based Code Change Representation Learning (2023), SANER, F Zhang, et al. [pdf]
Are we building on the rock? on the importance of data preprocessing for code summarization (2022), FSE, L Shi, et al. [pdf]
Neural-Machine-Translation-Based Commit Message Generation: How Far Are We? (2018), ASE, Z Liu, et al. [pdf]
System Design and Learning
Data Snooping
AutoTransform: automated code transformation to support modern code review process (2022), ICSE, Thongtanunam, Patanamon, Chanathip Pornprasit, and Chakkrit Tantithamthavorn. [pdf]
Can Neural Clone Detection Generalize to Unseen Functionalitiesƒ (2021), ASE, C Liu, et al. [pdf]
CD-VulD: Cross-Domain Vulnerability Discovery Based on Deep Domain Adaptation (2020), TDSC, S Liu, et al. [pdf]
Deep just-in-time defect prediction: how far are we? (2021), ISSTA, Z Zeng, et al. [pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, et al. [pdf]
Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models (2302), ICSE, S Gao, et al. [pdf]
Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
Syntax and Domain Aware Model for Unsupervised Program Translation (2302), ICSE, F Liu, J Li, L Zhang. [pdf]
How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
Towards More Realistic Evaluation for Neural Test Oracle Generation (2305), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
On the Evaluation of Neural Code Summarization (2022), ICSE, E Shi, Y Wang, L Du, et al. [pdf]
Spurious Correlations
Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al. [pdf]
Explaining mispredictions of machine learning models using rule induction (2021), FSE, J Cito, I Dillig, S Kim, et al. [pdf]
Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching (2021), TOSEM, D Zou, Y Zhu, S Xu, et al. [pdf]
Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code (2021), ASE, M Paltenghi, M Pradel. [pdf]
Vulnerability detection with fine-grained interpretations (2021), FSE, Y Li, S Wang, TN Nguyen. [pdf]
What do they capture? a structural analysis of pre-trained language models for source code (2022), ICSE, Y Wan, W Zhao, H Zhang, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond (2023), ISSTA, E Shi, Y Wang, H Zhang, et al. [pdf]
Inappropriate Model Design
Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking (2022), TSE, H Wang, P Ma, Y Yuan, et al. [pdf]
Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al.[pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, B Ray, P Devanbu, et al.[pdf]
Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention (2020), TSE, W Wang, Y Zhang, Y Sui, et al. [pdf]
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, G Li, J Zhang, et al. [pdf]
RepresentThemAll: A Universal Learning Representation of Bug Reports (2023), ICSE, S Fang, T Zhang, Y Tan, et al. [pdf]
Template-based Neural Program Repair (2023), ICSE, X Meng, X Wang, H Zhang, et al. [pdf]
Performance Evaluation
Inappropriate Baseline
Towards More Realistic Evaluation for Neural Test Oracle Generationr (2023), ARXIV, Z Liu, K Liu, X Xia, et al. [pdf]
Inappropriate Evaluation Dataset
Deep Learning Based Program Generation From Requirements Text: Are We There Yet? (2020), TSE, H Liu, M Shen, J Zhu, et al. [pdf]
Generating realistic vulnerabilities via neural code editing: an empirical study (2022), FSE, Y Nong, Y Ou, M Pradel, et al. [pdf]
Low Reproducibility
An extensive study on pre-trained models for program understanding and generation (2022), ISSTA, Z Zeng, H Tan, H Zhang, et al. [pdf]
Inappropriate Performance Measures
Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al. [pdf]
Multi-task learning based pre-trained language model for code completion (2020), ASE, F Liu, G Li, Y Zhao, et al. [pdf]
On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, B Ray, P Devanbu, et al. [pdf]
Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention (2020), TSE, W Wang, Y Zhang, Y Sui, et al. [pdf]
SynShine: Improved Fixing of Syntax Errors (2022), TSE, Ahmed T, Ledesma N R, Devanbu P. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
Tare: Type-Aware Neural Program Repair (2023), ICSE, Q Zhu, Z Sun, W Zhang, et al. [pdf]
How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
Towards More Realistic Evaluation for Neural Test Oracle Generation (2305), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
GitHub Copilot AI pair programmer: Asset or Liability? (2023), JSS, AM Dakhel, V Majdinasab, A Nikanjam, et al. [pdf]
Deployment and Maintainance
Real-World Constraints
Examining Zero-Shot Vulnerability Repair with Large Language Models (2023), S&P, H Pearce, B Tan, B Ahmad, et al. [pdf]
A Performance-Sensitive Malware Detection System Using Deep Learning on Mobile Devices (2020), TIFS, R Feng, S Chen, X Xie, et al. [pdf]
Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al.[pdf]
When Code Completion Fails: A Case Study on Real-World Completions (2019), ICSE, VJ Hellendoorn, S Proksch, HC Gall, et al. [pdf]
Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants (2023), arxiv, G Sandoval, H Pearce, T Nys, et al. [pdf]
Grounded Copilot: How Programmers Interact with Code-Generating Models (2023), OOPSLA1, S Barke, MB James, N Polikarpova. [pdf]
LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning (2308), arxiv, J Lu, L Yu, X Li, et al.[pdf]
Compressing Pre-trained Models of Code into 3 MB (2022), ASE, J Shi, Z Yang, B Xu, et al.[pdf]
Attack Threats
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion (2021), USENIX Security, R Schuster, C Song, E Tromer, et al. [pdf]
Adversarial Robustness of Deep Code Comment Generation (2022), TOSEM, Y Zhou, X Zhang, J Shen, et al. [pdf]
An extensive study on pre-trained models for program understanding and generation (2022), ISSTA, Z Zeng, H Tan, H Zhang, et al. [pdf]
Generating Adversarial Examples for Holding Robustness of Source Code Processing Models (2020), AAAI, H Zhang, Z Li, G Li, et al. [pdf]
Semantic Robustness of Models of Source Code (2020), SANER, G Ramakrishnan, J Henkel, Z Wang, et al. [pdf]
You see what I want you to see: poisoning vulnerabilities in neural code search (2022), FSE, Y Wan, S Zhang, H Zhang, et al. [pdf]
Contrabert: Enhancing code pre-trained models via contrastive learning (2023), ICSE, S Liu, B Wu, X Xie, et al. [pdf]
On the robustness of code generation techniques: An empirical study on github copilot (2023), ICSE, A Mastropaolo, L Pascarella, E Guglielmi, et al. [pdf]
Two sides of the same coin: Exploiting the impact of identifiers in neural code comprehension (2023), ICSE, S Gao, C Gao, C Wang, et al. [pdf]
Multi-target Backdoor Attacks for Code Pre-trained Models (2023), ACL, Y Li, S Liu, K Chen, et al. [pdf]
Backdooring Neural Code Search (2023), ACL, W Sun, Y Chen, G Tao, et al. [pdf]
ReCode: Robustness Evaluation of Code Generation Models (2022), ACL, S Wang, Z Li, H Qian, et al. [pdf]
Natural Attack for Pre-trained Models of Code (2022), ICSE, Z Yang, J Shi, J He, et al. [pdf]
Coprotector: Protect open-source code against unauthorized training usage with data poisoning (2022), WWW, Z Sun, X Du, F Song, et al. [pdf]
On the Security Vulnerabilities of Text-to-SQL Models (2211), ISSRE, X Peng, Y Zhang, J Yang, et al. [pdf]
Security Concerns in Generated Code
Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions (2022), S&P, H Pearce, B Ahmad, B Tan, et al. [pdf]
Automated repair of programs from large language models (2023), ICSE, Z Fan, X Gao, M Mirchev, et al. [pdf]
Cctest: Testing and repairing code completion systems (2023), ICSE, Z Li, C Wang, Z Liu, et al. [pdf]
Analyzing Leakage of Personally Identifiable Information in Language Models (2023), S&P, N Lukas, A Salem, R Sim, et al. [pdf]
CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot (2023), USENIX Security, L Niu, S Mirza, Z Maradni, et al. [pdf]