Zhang, Yifan, Huang, Chen, Karas, Zachary, Nguyen, Thuy Dung, Leach, Kevin, & Huang, Yu. (2025). Enhancing Code LLM Training with Programmer Attention. Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering. https://doi.org/10.1145/3696630.3728510
Human attention, such as where programmers look while reading or writing code, provides valuable signals that remain largely untapped in training large language models (LLMs) for code. These signals offer insights beyond what machine-derived attention captures. However, eye-tracking data is complex and costly to collect, and there has been little progress in systematically applying such signals to training code LLMs.
To address this, we propose a full pipeline that combines data augmentation and reward-based fine-tuning. Specifically, we introduce: (1) an eye-tracking path augmentation method to expand programmer attention datasets, (2) a pattern abstraction step that transforms raw fixations into learnable attention motifs, and (3) a reward-guided strategy that integrates these insights into a CodeT5 supervised fine-tuning process.
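To make the third step concrete, the following is a minimal sketch (not the authors' released code) of how a human-attention reward could be folded into CodeT5 supervised fine-tuning: a cross-entropy summarization loss is combined with a penalty on the divergence between the model's encoder attention and per-token human attention weights. The `human_weights` tensor (e.g., normalized fixation durations mapped onto code tokens) and the `reward_weight` hyperparameter are hypothetical placeholders for illustration.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

code = "def add(a, b):\n    return a + b"
summary = "add two numbers"

enc = tokenizer(code, return_tensors="pt")
labels = tokenizer(summary, return_tensors="pt").input_ids

# Hypothetical per-token human attention weights (e.g., normalized fixation
# durations mapped onto the code tokens); uniform here for demonstration.
seq_len = enc.input_ids.size(1)
human_weights = torch.full((1, seq_len), 1.0 / seq_len)

out = model(**enc, labels=labels, output_attentions=True)

# Model attention over code tokens: average encoder self-attention across
# layers, heads, and query positions -> one distribution per example.
enc_attn = torch.stack(out.encoder_attentions)   # (layers, batch, heads, q, k)
model_attn = enc_attn.mean(dim=(0, 2, 3))        # (batch, k)
model_attn = model_attn / model_attn.sum(dim=-1, keepdim=True)

# Reward term: agreement with human attention, expressed as a KL penalty.
# Adding it to the cross-entropy loss nudges attention toward human patterns.
kl = F.kl_div(model_attn.log(), human_weights, reduction="batchmean")
reward_weight = 0.1  # hypothetical hyperparameter
loss = out.loss + reward_weight * kl
loss.backward()
print(f"ce={out.loss.item():.4f}  kl={kl.item():.4f}  total={loss.item():.4f}")
```

In practice, the human weights would come from the augmented eye-tracking paths and abstracted attention motifs described above rather than a uniform placeholder.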
Our experiments show a +7.16 CodeBLEU improvement on the CodeXGLUE code summarization benchmark, demonstrating that combining human and machine attention can significantly enhance code intelligence. We hope this work encourages further exploration of human-centered approaches in next-generation AI for Software Engineering (AI4SE).
