Overcoming fundamental barriers in feature attribution methods with grouped attributions
LLMs, jailbreaking, and generative AI’s ‘biggest security flaw’
Achieving stability guarantees for feature attribution methods
Understanding and improving model predictions using rules learned from data
A novel prompting method that provides faithful explanations and improves performance on complex reasoning tasks
An influence framework for studying in-context learning examples
Finding phrases that bias image and text generation towards unexpected outputs