Papers

Type Inference
Data Collection and Labeling
System Design and Learning
Performance Evaluation
Deployment and Maintenance

Type Inference

Dos and Don’ts of Machine Learning in Computer Security (2022), USENIX Security, D Arp, et al. [pdf]
Machine/deep learning for software engineering: A systematic literature review (2022), TSE, Simin Wang, et al. [pdf]
Trustworthy AI: From principles to practices (2023), arxiv, B Li, et al. [pdf]

Data Collection and Labeling

Unbalanced Distribution

Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), arxiv, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Does data sampling improve deep learning-based vulnerability detection? Yeas! and Nays! (2023), ICSE, X Yang, et al. [pdf]
On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), arxiv, B Steenhoek, et al. [pdf]

Label Errors

Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, et al. [pdf]
Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper) (2023), ISSTA, X Nie, et al. [pdf]

Data Noise

Slice-Based Code Change Representation Learning (2023), SANER, F Zhang, et al. [pdf]
Are we building on the rock? on the importance of data preprocessing for code summarization (2022), FSE, L Shi, et al. [pdf]
Neural-Machine-Translation-Based Commit Message Generation: How Far Are We? (2018), ASE, Z Liu, et al. [pdf]

System Design and Learning

Data Snooping

AutoTransform: automated code transformation to support modern code review process (2022), ICSE, P Thongtanunam, C Pornprasit, C Tantithamthavorn. [pdf]
Can Neural Clone Detection Generalize to Unseen Functionalities? (2021), ASE, C Liu, et al. [pdf]
CD-VulD: Cross-Domain Vulnerability Discovery Based on Deep Domain Adaptation (2020), TDSC, S Liu, et al. [pdf]
Deep just-in-time defect prediction: how far are we? (2021), ISSTA, Z Zeng, et al. [pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, et al. [pdf]
Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models (2023), ICSE, S Gao, et al. [pdf]
Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
Syntax and Domain Aware Model for Unsupervised Program Translation (2023), ICSE, F Liu, J Li, L Zhang. [pdf]
How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
On the Evaluation of Neural Code Summarization (2022), ICSE, E Shi, Y Wang, L Du, et al. [pdf]

Spurious Correlations

Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al. [pdf]
Explaining mispredictions of machine learning models using rule induction (2021), FSE, J Cito, I Dillig, S Kim, et al. [pdf]
Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching (2021), TOSEM, D Zou, Y Zhu, S Xu, et al. [pdf]
Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code (2021), ASE, M Paltenghi, M Pradel. [pdf]
Vulnerability detection with fine-grained interpretations (2021), FSE, Y Li, S Wang, TN Nguyen. [pdf]
What do they capture? a structural analysis of pre-trained language models for source code (2022), ICSE, Y Wan, W Zhao, H Zhang, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond (2023), ISSTA, E Shi, Y Wang, H Zhang, et al. [pdf]

Inappropriate Model Design

Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking (2022), TSE, H Wang, P Ma, Y Yuan, et al. [pdf]
Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al. [pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, B Ray, P Devanbu, et al. [pdf]
Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention (2020), TSE, W Wang, Y Zhang, Y Sui, et al. [pdf]
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, G Li, J Zhang, et al. [pdf]
RepresentThemAll: A Universal Learning Representation of Bug Reports (2023), ICSE, S Fang, T Zhang, Y Tan, et al. [pdf]
Template-based Neural Program Repair (2023), ICSE, X Meng, X Wang, H Zhang, et al. [pdf]

Performance Evaluation

Inappropriate Baseline

Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), arxiv, Z Liu, K Liu, X Xia, et al. [pdf]

Inappropriate Evaluation Dataset

Deep Learning Based Program Generation From Requirements Text: Are We There Yet? (2020), TSE, H Liu, M Shen, J Zhu, et al. [pdf]
Generating realistic vulnerabilities via neural code editing: an empirical study (2022), FSE, Y Nong, Y Ou, M Pradel, et al. [pdf]

Low Reproducibility

An extensive study on pre-trained models for program understanding and generation (2022), ISSTA, Z Zeng, H Tan, H Zhang, et al. [pdf]

Inappropriate Performance Measures

Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al. [pdf]
Multi-task learning based pre-trained language model for code completion (2020), ASE, F Liu, G Li, Y Zhao, et al. [pdf]
On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, B Ray, P Devanbu, et al. [pdf]
Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention (2020), TSE, W Wang, Y Zhang, Y Sui, et al. [pdf]
SynShine: Improved Fixing of Syntax Errors (2022), TSE, T Ahmed, NR Ledesma, P Devanbu. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
Tare: Type-Aware Neural Program Repair (2023), ICSE, Q Zhu, Z Sun, W Zhang, et al. [pdf]
How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
GitHub Copilot AI pair programmer: Asset or Liability? (2023), JSS, AM Dakhel, V Majdinasab, A Nikanjam, et al. [pdf]

Deployment and Maintenance

Real-World Constraints

Examining Zero-Shot Vulnerability Repair with Large Language Models (2023), S&P, H Pearce, B Tan, B Ahmad, et al. [pdf]
A Performance-Sensitive Malware Detection System Using Deep Learning on Mobile Devices (2020), TIFS, R Feng, S Chen, X Xie, et al. [pdf]
Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al. [pdf]
When Code Completion Fails: A Case Study on Real-World Completions (2019), ICSE, VJ Hellendoorn, S Proksch, HC Gall, et al. [pdf]
Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants (2023), arxiv, G Sandoval, H Pearce, T Nys, et al. [pdf]
Grounded Copilot: How Programmers Interact with Code-Generating Models (2023), OOPSLA1, S Barke, MB James, N Polikarpova. [pdf]
LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning (2023), arxiv, J Lu, L Yu, X Li, et al. [pdf]
Compressing Pre-trained Models of Code into 3 MB (2022), ASE, J Shi, Z Yang, B Xu, et al. [pdf]

Attack Threats

You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion (2021), USENIX Security, R Schuster, C Song, E Tromer, et al. [pdf]
Adversarial Robustness of Deep Code Comment Generation (2022), TOSEM, Y Zhou, X Zhang, J Shen, et al. [pdf]
An extensive study on pre-trained models for program understanding and generation (2022), ISSTA, Z Zeng, H Tan, H Zhang, et al. [pdf]
Generating Adversarial Examples for Holding Robustness of Source Code Processing Models (2020), AAAI, H Zhang, Z Li, G Li, et al. [pdf]
Semantic Robustness of Models of Source Code (2020), SANER, G Ramakrishnan, J Henkel, Z Wang, et al. [pdf]
You see what I want you to see: poisoning vulnerabilities in neural code search (2022), FSE, Y Wan, S Zhang, H Zhang, et al. [pdf]
Contrabert: Enhancing code pre-trained models via contrastive learning (2023), ICSE, S Liu, B Wu, X Xie, et al. [pdf]
On the robustness of code generation techniques: An empirical study on github copilot (2023), ICSE, A Mastropaolo, L Pascarella, E Guglielmi, et al. [pdf]
Two sides of the same coin: Exploiting the impact of identifiers in neural code comprehension (2023), ICSE, S Gao, C Gao, C Wang, et al. [pdf]
Multi-target Backdoor Attacks for Code Pre-trained Models (2023), ACL, Y Li, S Liu, K Chen, et al. [pdf]
Backdooring Neural Code Search (2023), ACL, W Sun, Y Chen, G Tao, et al. [pdf]
ReCode: Robustness Evaluation of Code Generation Models (2022), ACL, S Wang, Z Li, H Qian, et al. [pdf]
Natural Attack for Pre-trained Models of Code (2022), ICSE, Z Yang, J Shi, J He, et al. [pdf]
Coprotector: Protect open-source code against unauthorized training usage with data poisoning (2022), WWW, Z Sun, X Du, F Song, et al. [pdf]
On the Security Vulnerabilities of Text-to-SQL Models (2023), ISSRE, X Peng, Y Zhang, J Yang, et al. [pdf]

Security Concerns in Generated Code

Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions (2022), S&P, H Pearce, B Ahmad, B Tan, et al. [pdf]
Automated repair of programs from large language models (2023), ICSE, Z Fan, X Gao, M Mirchev, et al. [pdf]
Cctest: Testing and repairing code completion systems (2023), ICSE, Z Li, C Wang, Z Liu, et al. [pdf]
Analyzing Leakage of Personally Identifiable Information in Language Models (2023), S&P, N Lukas, A Salem, R Sim, et al. [pdf]
CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot (2023), USENIX Security, L Niu, S Mirza, Z Maradni, et al. [pdf]
