Achieving stability guarantees for feature attribution methods
Understanding and improving model predictions using rules learned from data.
A novel prompting method that provides faithful explanations and improves performance on complex reasoning tasks
An influence framework for studying in-context learning examples
Finding phrases that bias image and text generation towards unexpected outputs