DebugML

Recent posts

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong July 9, 2024 12 minute read

We study jailbreak attacks through propositional Horn inference.

Towards Compositionality in Concept Learning

Adam Stein, Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong July 5, 2024 9 minute read

A method for learning compositional concepts from pre-trained foundation models.

Data-Efficient Learning with Neural Programs

Alaia Solko-Breslin, Seewon Choi, Ziyang Li, Neelay Velingker, Rajeev Alur, Mayur Naik, Eric Wong June 11, 2024 13 minute read

Combining neural perception with symbolic or GPT-based reasoning.

Sum-of-Parts Models: Faithful Attributions for Groups of Features

Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong October 26, 2023 12 minute read

Overcoming fundamental barriers in feature attribution methods with grouped attributions.

SmoothLLM: Defending LLMs Against Jailbreaking Attacks

Alex Robey, Eric Wong, Hamed Hassani, George J. Pappas October 17, 2023 17 minute read

LLMs, jailbreaking, and generative AI’s ‘biggest security flaw’

Stable Explanations with Multiplicative Smoothing

Anton Xue, Rajeev Alur, Eric Wong July 26, 2023 13 minute read

Achieving stability guarantees for feature attribution methods.

Do Machine Learning Models Learn Statistical Rules Inferred from Data?

Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong July 25, 2023 9 minute read

Understanding and improving model predictions using rules learned from data.

Faithful Chain-of-Thought Reasoning

Qing Lyu*, Shreya Havaldar*, Adam Stein*, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch May 3, 2023 9 minute read

A novel prompting method that provides faithful explanations and improves performance on complex reasoning tasks.