

Poster in Workshop: Actionable Interpretability

Simplicity isn't as simple as you think

Aman Sinha · Timothee Mickus · Marianne Clausel · Mathieu Constant · Xavier Coubez

[ Project Page ]
Sat 19 Jul 10:40 a.m. PDT — 11:40 a.m. PDT

Abstract:

Interpretability as a field of study has explored several evaluation criteria, ranging from correctness, i.e., how closely the explanations match the model's behavior, to usefulness, i.e., how easy the explanations are for end users to understand. Prior literature consistently emphasizes correctness over usefulness, and usually dismisses the latter as a property that can only be gained by sacrificing the former. While there are practical arguments in favor of such an approach, this state of affairs has also resulted in a knowledge gap: what are the consequences of favoring explanations that are simpler, beyond their tendency not to align with more accurate depictions of a model's inner workings? In this work, we conduct several experiments using the simplicity metric of Bhatt et al. (2020), and show that simpler explanations often come with multiple caveats: (i) there is a tradeoff between model performance and simplicity of explanations; (ii) simple explanations can be generated for datapoints that are complex for humans to label; and (iii) simpler explanations need not align with human rationales.
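As a point of reference for the metric the abstract mentions, below is a minimal sketch (not the authors' code) assuming the Bhatt et al. (2020) notion of explanation complexity: the Shannon entropy of the normalized absolute feature attributions, so that attributions concentrated on a few features count as "simple" and diffuse attributions count as "complex". The function name and example values are illustrative only.

import numpy as np

def explanation_complexity(attributions, eps=1e-12):
    """Entropy of the fractional contribution of each feature's attribution."""
    a = np.abs(np.asarray(attributions, dtype=float))
    p = a / (a.sum() + eps)      # fractional contribution per feature
    p = np.clip(p, eps, 1.0)     # guard against log(0)
    return float(-(p * np.log(p)).sum())

# Concentrated attribution -> low complexity (a "simpler" explanation)
print(explanation_complexity([0.9, 0.05, 0.03, 0.02]))
# Uniform attribution over 4 features -> maximal complexity (log 4 ~ 1.386)
print(explanation_complexity([0.25, 0.25, 0.25, 0.25]))

Under this reading, "simplicity" is just low complexity, which is why it can be driven down without any guarantee that the resulting attribution matches human rationales or the model's actual computation.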
