
Benchmarks for LM4Code/LM4SE

Table of contents

  1. Relevant papers
  2. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
  3. On the Evaluation of Neural Code Translation: Taxonomy and Benchmark
  4. Evaluating large language models trained on code
  5. A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends
  6. Bug Repair
    1. Defects4J
    2. ManyBugs/IntroClass
    3. BugAID
    4. CoCoNut
    5. QuixBugs
    6. Bugs.jar
    7. BugsInPy
    8. DeepFix
  7. Code Generation/Synthesis
    1. CONCODE
    2. HumanEval
    3. MBPP/MathQA-Python
  8. Code Summarization
    1. CODE-NN
    2. TL-CodeSum
    3. CodeSearchNet

This page lists popular benchmarks for evaluating language models for code (LM4Code) and language models for software engineering (LM4SE).

Relevant papers

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

  • Release year: 2021-02
  • Paper
  • Repository
  • Description: Proposes a benchmark of 14 datasets covering 10 code understanding and generation tasks, with baseline models such as CodeBERT, CodeGPT, and GraphCodeBERT; a minimal loading sketch follows this entry.
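
A minimal sketch of pulling one CodeXGLUE task with the Hugging Face `datasets` library. The dataset name, config, and field names are assumptions about the public Hub mirror; if they differ, use the download scripts in the CodeXGLUE repository instead.

```python
# Sketch: load the CodeXGLUE code-to-text (code summarization) task via Hugging Face `datasets`.
# "code_x_glue_ct_code_to_text", the "python" config, and the field names below are
# assumptions about the Hub mirror; the CodeXGLUE repository ships canonical download scripts.
from datasets import load_dataset

ds = load_dataset("code_x_glue_ct_code_to_text", "python", split="test")

sample = ds[0]
print(sample["code"][:200])   # the function body to summarize
print(sample["docstring"])    # the reference summary
```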

On the Evaluation of Neural Code Translation: Taxonomy and Benchmark

  • Release year: 2023
  • Paper
  • Description: Proposes a taxonomy of code translation tasks and introduces G-TransEval, a benchmark for evaluating neural code translation models across translation tasks of varying difficulty and complexity.

Evaluating large language models trained on code

  • Release year: 2021
  • Paper
  • Description: Introduces Codex, a GPT language model fine-tuned on publicly available GitHub code, together with the HumanEval benchmark of hand-written programming problems, and measures the functional correctness of generated Python code with the pass@k metric (see the sketch below).
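
pass@k is estimated by generating n samples per problem, counting the c samples that pass the unit tests, and computing the probability that at least one of k drawn samples is correct. A minimal sketch of the unbiased, numerically stable estimator described in the paper; the function name and example numbers are illustrative.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for a problem
    c: number of those samples that passed the unit tests
    k: sample budget being evaluated
    """
    if n - c < k:
        return 1.0  # every k-subset must contain at least one correct sample
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable running product
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples per problem, 30 passing -> estimated pass@10
print(pass_at_k(200, 30, 10))
```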

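A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends

  • Release year: 2023
  • Paper
  • Description: Surveys large language models for code, tracing their evolution, reviewing the benchmarks used to evaluate them, and outlining future research directions.
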
Bug Repair

Defects4J

ManyBugs/IntroClass

BugAID

CoCoNut

QuixBugs

Bugs.jar

BugsInPy

DeepFix

Code Generation/Synthesis

CONCODE

HumanEval

MBPP/MathQA-Python

Code Summarization

CODE-NN

TL-CodeSum

CodeSearchNet