Poster
TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs
Felipe Nuti · Tim Franzmeyer · Joao Henriques
East Exhibition Hall A-B #E-3005
AI writing tools like ChatGPT first learn from vast collections of internet text, and are then trained to follow instructions and safety rules. It is hard to know how much each stage (learning from the internet versus learning to follow instructions) contributes to any single reply, which makes it difficult to quantitatively analyse how the AI works and behaves.

We introduce a way to inspect the AI's internal signals as it answers each question, and to split each reply into two contributions: one from internet data and one from instruction data. From that split, we compute the Tuning Contribution (TuCo), a simple percentage that shows how much the instruction data shaped the response relative to the internet data (for example, "30% tuning contribution").

TuCo can help researchers spot when the AI's instruction-learning phase has less effect than intended, letting the AI drift into "unfamiliar" territory for which it has no instructions. It can reveal hidden blind spots, such as trick prompts that quietly undermine safeguards, and can guide teams in strengthening defences. It can also point out questions where tuning barely helps, so developers can refine their training data and make AI systems more reliable.
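To make the idea of a per-response percentage concrete, here is a minimal sketch of a TuCo-style score. It is a hypothetical illustration, not the paper's exact formulation: we assume access to the internal activations of the pre-trained model and of the fine-tuned model on the same prompt, and we treat the activation difference as the fine-tuning contribution, reporting its magnitude relative to the pre-training contribution.

```python
import numpy as np

def tuning_contribution(h_pretrained: np.ndarray, h_finetuned: np.ndarray) -> float:
    """Hypothetical TuCo-style score: the fraction of the fine-tuned model's
    internal activity attributable to fine-tuning, measured as the relative
    magnitude of the activation difference (not the authors' exact method)."""
    delta = h_finetuned - h_pretrained  # what fine-tuning changed
    base = h_pretrained                 # what pre-training already provided
    return float(np.linalg.norm(delta) / (np.linalg.norm(delta) + np.linalg.norm(base)))

# Toy example with made-up activations (shapes are assumptions: 12 layers x 768 units)
rng = np.random.default_rng(0)
h_pre = rng.normal(size=(12, 768))
h_ft = h_pre + 0.3 * rng.normal(size=(12, 768))
print(f"TuCo-style score: {tuning_contribution(h_pre, h_ft):.0%}")
```

A score near 0% would mean the reply is driven almost entirely by pre-training, while a larger score means instruction tuning shaped it more strongly.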