Gemini-Driven Automated Prompt Engineering for Mining High-Volume Sustainability Reports at ING Bank

Author(s): Fishchuk, V.V. (2025)

Abstract:

LLMs are increasingly used to mine long, heterogeneous reports, which raises the demand for prompt engineering; yet manual prompt engineering is costly to scale. This study evaluates automated prompt optimization for information extraction at ING Bank. The author adapts the ProTeGi automated prompt optimization framework to the information extraction use case and introduces Gradient Verification (GradV), a decision gate that filters LLM feedback and thereby facilitates faster convergence of LLM-feedback-based prompt optimization methods. The work also introduces a transparent prompt enrichment (PE) framework, which converts verified LLM feedback into modular instructions. Using Gemini Flash 2.0, the author extracts three target variables from 141 anonymized annual sustainability reports: absolute Scope 1 emissions (S1), assurance of Scope 3 (S3a), and reporting period (Rp). On the test set (n=49), the optimized prompts improved on the initial prompts by 27 percentage points (pp) for S1, 4 pp for S3a, and 10 pp for Rp; after the Holm-Bonferroni correction, only the S1 improvement remains statistically significant (one-tailed exact binomial test on discordant pairs, with stepwise significance levels of 0.0167, 0.025, and 0.05). Optimization also improved stability, reducing run-to-run response variability on some variables under identical conditions. Furthermore, optimized prompts reached accuracies similar to those of analyst-designed prompts on some targets, highlighting the potential for reducing manual effort. Overall, automated prompt optimization shows potential as a feasible and scalable alternative to manual prompt engineering for long-context and long-input information extraction in enterprise scenarios, at least for certain target variables.
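
For readers unfamiliar with the significance procedure named in the abstract, the following is a minimal Python sketch of a one-tailed exact binomial test on discordant pairs followed by a Holm-Bonferroni step-down, assuming a family-wise alpha of 0.05 (which yields the stepwise levels 0.05/3, 0.05/2, 0.05/1 for three tests). The function names and the win/loss counts are hypothetical and chosen only for illustration; they are not taken from the thesis and do not reproduce its results.

```python
from math import comb


def exact_binomial_one_tailed(wins, losses):
    """P(X >= wins) for X ~ Binomial(wins + losses, 0.5).

    `wins` counts discordant pairs where only the optimized prompt was
    correct; `losses` counts pairs where only the initial prompt was.
    """
    n = wins + losses
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence either way
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def holm_bonferroni(p_values, alpha=0.05):
    """Return {name: rejected?} using the Holm step-down procedure."""
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    decisions = {}
    still_rejecting = True
    for rank, (name, p) in enumerate(ordered):
        threshold = alpha / (m - rank)  # alpha/3, alpha/2, alpha/1 for m=3
        # Once one hypothesis is retained, all later ones are retained too.
        still_rejecting = still_rejecting and p <= threshold
        decisions[name] = still_rejecting
    return decisions


# Hypothetical (wins, losses) counts, NOT the thesis data.
counts = {"target_a": (14, 1), "target_b": (3, 1), "target_c": (6, 1)}
p_values = {name: exact_binomial_one_tailed(w, l) for name, (w, l) in counts.items()}
significant = holm_bonferroni(p_values)
```

With these made-up counts, only `target_a` would be flagged as significant; the second-smallest p-value fails its threshold, so the procedure stops rejecting from that point on.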

Document(s):

thesis_final.pdf