Poster in Workshop: Actionable Interpretability
Simplicity isn't as simple as you think
Aman Sinha · Timothee Mickus · Marianne Clausel · Mathieu Constant · Xavier Coubez
Interpretability as a field of study has explored several evaluation criteria, ranging from correctness, i.e., how closely the explanations match the model's behavior, to usefulness, i.e., how easy the explanations are for end users to understand. Prior literature consistently emphasizes correctness over usefulness, and usually dismisses the latter as a property that can only be gained by sacrificing the former. While there are practical arguments in favor of this approach, it has also left a knowledge gap: what are the consequences of favoring simpler explanations, beyond their tendency not to align with more accurate depictions of a model's inner workings? In this work, we conduct several experiments using the simplicity metric of Bhatt et al. (2020) and show that simpler explanations often come with multiple caveats: (i) there is a tradeoff between model performance and the simplicity of explanations; (ii) simple explanations can be generated for datapoints that are hard for humans to label; and (iii) simpler explanations need not align with human rationales.
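For concreteness, here is a minimal sketch of the kind of simplicity measure referenced above, assuming it follows the complexity metric of Bhatt et al. (2020): the Shannon entropy of each feature's fractional contribution to the explanation, where lower entropy corresponds to a simpler, more concentrated explanation. The function name and toy attribution vectors below are illustrative, not taken from the paper.

```python
import numpy as np

def explanation_complexity(attributions: np.ndarray) -> float:
    """Entropy of the fractional-contribution distribution over features.

    Each feature's fractional contribution is |g_i| / sum_j |g_j|;
    complexity is the Shannon entropy of that distribution. Lower
    entropy means a simpler (more concentrated) explanation.
    """
    abs_attr = np.abs(attributions)
    total = abs_attr.sum()
    if total == 0:
        raise ValueError("All-zero attributions have no defined complexity.")
    p = abs_attr / total
    # Treat 0 * log(0) as 0 by dropping zero-probability features.
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# A concentrated explanation scores as "simpler" than a diffuse one.
sparse = np.array([0.9, 0.05, 0.03, 0.02])    # low entropy -> simple
diffuse = np.array([0.25, 0.25, 0.25, 0.25])  # max entropy -> complex
print(explanation_complexity(sparse))   # ~0.43
print(explanation_complexity(diffuse))  # log(4) ~ 1.39
```

Under this reading, "simplicity" can be taken as the inverse or negation of the entropy score, so that explanations concentrating attribution mass on few features rank as simpler.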