Code Defect Detection Using Pre-trained Language Models with Encoder-Decoder via Line-Level Defect Localization

Jimin An, YunSeok Choi, Jee-Hyong Lee


Abstract
Recently, code Pre-trained Language Models (PLMs), trained on large amounts of code and comments, have shown great success in code defect detection tasks. However, most PLMs simply treat the code as a single sequence and use only the encoder of the PLM to determine whether defects exist anywhere in the code. For a more analyzable and explainable approach, it is crucial to identify which lines contain defects. In this paper, we propose a novel method for code defect detection that integrates line-level defect localization into a unified training process. To identify code defects at the line level, we convert the code into a sequence of lines separated by a special token. Then, exploiting the fact that the encoder and decoder of a PLM process information differently, we leverage both the encoder and the decoder for line-level defect localization. By learning the code defect detection and line-level defect localization tasks in a unified manner, our proposed method promotes knowledge sharing between the two tasks. We demonstrate that our proposed method significantly improves performance on four benchmark datasets for code defect detection. Additionally, we show that our method can be easily integrated with ChatGPT.
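The abstract's preprocessing step, converting code into a line-separated sequence, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the separator string `<line>` and the helper name `to_line_separated` are assumptions for the example, and the actual special token would be one registered with the model's tokenizer.

```python
def to_line_separated(code: str, sep: str = "<line>") -> str:
    """Join the non-empty lines of a code snippet with a special
    separator token, so a PLM can attend to line boundaries.
    The token "<line>" is a placeholder, not the paper's token."""
    lines = [ln.strip() for ln in code.split("\n") if ln.strip()]
    return f" {sep} ".join(lines)

# Example: a three-line snippet becomes one separator-delimited sequence.
snippet = "int x = 0;\nx = x / 0;\nreturn x;"
print(to_line_separated(snippet))
# → int x = 0; <line> x = x / 0; <line> return x;
```

With line boundaries marked this way, each segment between separators can receive its own defect label, which is what allows the localization task to be trained jointly with whole-sequence defect detection.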
Anthology ID:
2024.lrec-main.306
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
3446–3456
URL:
https://aclanthology.org/2024.lrec-main.306
Cite (ACL):
Jimin An, YunSeok Choi, and Jee-Hyong Lee. 2024. Code Defect Detection Using Pre-trained Language Models with Encoder-Decoder via Line-Level Defect Localization. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3446–3456, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Code Defect Detection Using Pre-trained Language Models with Encoder-Decoder via Line-Level Defect Localization (An et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.306.pdf