Generating Patch Ingredients for Search-based Program Repair using Code Language Models
Lijzenga, Oebele (2024)
As software systems grow in size, more bugs occur, which are usually resolved manually. Manual bug
localization and fixing is a costly, time-consuming process that hinders the development of new software. Search-based automated program repair (APR) techniques attempt to fix bugs in programs by
searching a space of candidate patches using an evolutionary algorithm. Patches are constructed from
code elements found elsewhere in the program, an approach that rests on the redundancy assumption. As a result, however, search-based APR techniques cannot fix a bug if the required patch ingredients are
not present elsewhere in the code. Previous work has attempted to address this problem, but has failed
to produce additional patch ingredients in a cost-effective manner. This study proposes ARJACLM,
a novel search-based APR technique based on ARJA, which uses pre-trained code language models
(CLMs) to generate patch ingredients on the fly. In addition, the code generation capabilities of 20 CLMs are evaluated extensively to determine which are the most cost-effective and
suitable for use in APR techniques. Results show that the performance of ARJACLM is improved by
59% when CLMs are used. Furthermore, CLM-based patch ingredients are of higher quality than their
redundancy-based counterparts; as a result, ARJACLM performs best when redundancy-based patch ingredients are omitted. Finally, the results expose several challenges involved in incorporating
CLMs into a search-based technique and provide directions for future research.
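
The limitation described above can be illustrated with a minimal Python sketch. It is not ARJACLM or ARJA: it uses a single-statement replacement and a plain keep-the-best-half evolutionary loop, and every name in it (PROGRAM, fake_clm, build_pool, search_patch) is hypothetical. In particular, fake_clm is a hard-coded stand-in for a code language model and calls no real model. The toy bug needs the statement `return hi`, which appears nowhere else in the program, so a pool built only under the redundancy assumption cannot contain the fix.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Toy buggy program: the second branch should return `hi`, and the statement
# `return hi` does not occur anywhere else in the code.
PROGRAM = [
    "def clamp(x, lo, hi):",
    "    if x < lo:",
    "        return lo",
    "    if x > hi:",
    "        return lo",
    "    return x",
]
BUG_IDX = 4  # index of the suspicious line (fault localization is assumed done)
TESTS = [((5, 0, 10), 5), ((-1, 0, 10), 0), ((11, 0, 10), 10)]


def redundancy_ingredients(program: Sequence[str]) -> List[str]:
    """Ingredients under the redundancy assumption: statements already present."""
    return sorted({line.strip() for line in program if line.strip()})


def fake_clm(program: Sequence[str], bug_idx: int) -> List[str]:
    """Hypothetical stand-in for a code language model; returns fixed candidates."""
    return ["return hi", "return lo"]


def build_pool(program: Sequence[str], bug_idx: int,
               generate: Optional[Callable[[Sequence[str], int], List[str]]] = None,
               use_redundancy: bool = True) -> List[str]:
    """Combine redundancy-based and (optionally) generated ingredients."""
    pool = redundancy_ingredients(program) if use_redundancy else []
    if generate is not None:
        pool += [c for c in generate(program, bug_idx) if c not in pool]
    return pool


def apply_patch(program: Sequence[str], bug_idx: int, ingredient: str) -> List[str]:
    """Replace the suspicious line with the ingredient, keeping its indentation."""
    old = program[bug_idx]
    indent = old[: len(old) - len(old.lstrip())]
    patched = list(program)
    patched[bug_idx] = indent + ingredient
    return patched


def fitness(source: Sequence[str], tests: Sequence[Tuple[tuple, object]]) -> int:
    """Number of passing tests; programs that fail to compile score 0."""
    namespace: dict = {}
    try:
        exec("\n".join(source), namespace)
        func = namespace["clamp"]
    except Exception:
        return 0
    passed = 0
    for args, expected in tests:
        try:
            passed += func(*args) == expected
        except Exception:
            pass
    return passed


def search_patch(program: Sequence[str], bug_idx: int, pool: Sequence[str],
                 tests: Sequence[Tuple[tuple, object]], rng: random.Random,
                 pop_size: int = 8, generations: int = 20) -> Optional[str]:
    """Tiny evolutionary loop: keep the fitter half, refill from the pool."""
    if not pool:
        return None
    population = rng.sample(list(pool), min(pop_size, len(pool)))
    for _ in range(generations):
        scored = sorted(
            ((fitness(apply_patch(program, bug_idx, ing), tests), ing)
             for ing in population),
            key=lambda pair: pair[0], reverse=True)
        if scored[0][0] == len(tests):
            return scored[0][1]  # test-adequate patch found
        survivors = [ing for _, ing in scored[: max(1, len(scored) // 2)]]
        population = survivors + [rng.choice(list(pool))
                                  for _ in range(pop_size - len(survivors))]
    return None


rng = random.Random(0)
redundancy_only = build_pool(PROGRAM, BUG_IDX)
with_generator = build_pool(PROGRAM, BUG_IDX, generate=fake_clm)
print(search_patch(PROGRAM, BUG_IDX, redundancy_only, TESTS, rng))
print(search_patch(PROGRAM, BUG_IDX, with_generator, TESTS, rng))
```

In this sketch the redundancy-only pool never yields a patch that passes all three tests, because no existing statement returns `hi`. Once the stand-in generator contributes that statement, the same search finds it. Passing use_redundancy=False to build_pool gives a pool of generated ingredients only, which is the toy analogue of omitting redundancy-based ingredients.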
Lijzenga_MA_EEMCS.pdf