Recent posts
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
We study jailbreak attacks through the lens of propositional Horn inference.
Towards Compositionality in Concept Learning
A method for learning compositional concepts from pre-trained foundation models.
Data-Efficient Learning with Neural Programs
Combining neural perception with symbolic or GPT-based reasoning
Sum-of-Parts Models: Faithful Attributions for Groups of Features
Overcoming fundamental barriers in feature attribution methods with grouped attributions
SmoothLLM: Defending LLMs Against Jailbreaking Attacks
LLMs, jailbreaking, and generative AI’s ‘biggest security flaw’
Stable Explanations with Multiplicative Smoothing
Achieving stability guarantees for feature attribution methods
Do Machine Learning Models Learn Statistical Rules Inferred from Data?
Understanding and improving model predictions using rules learned from data.
Faithful Chain-of-Thought Reasoning
A novel prompting method that provides faithful explanations and improves performance on complex reasoning tasks