Publications

For the most up-to-date publication list, please see my Google Scholar profile.

ASTPrompter: Preference-Aligned Automated Language Model Red-Teaming to Generate Low-Perplexity Unsafe Prompts

Amelia Hardy*, Houjun Liu*, Allie Griffith, Bernard Lange, Duncan Eddy, Mykel J Kochenderfer
Paper

We apply the Adaptive Stress Testing (AST) framework to language modeling to identify prompts that are both effective for red-teaming and likely to occur under natural autoregression.

More than Marketing? On the Information Value of AI Benchmarks for Practitioners

Amelia Hardy*, Anka Reuel*, Kiana Jafari Meimandi, Lisa Soder, Allie Griffith, Dylan M Asmar, Sanmi Koyejo, Michael S Bernstein, Mykel J Kochenderfer
Paper

We present a qualitative, interview-based study of how decision-makers in research, product, and policy roles use AI benchmarks in practice.

BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices

Anka Reuel*, Amelia Hardy*, Chandler Smith, Max Lamparth, Malcolm Hardy, Mykel J Kochenderfer
Paper

Spotlighted at NeurIPS! In this work, we propose best practices for informative, reproducible AI benchmarks and evaluate a set of benchmarks against these criteria.

Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures

Soyeon Jung, Amelia Hardy, Mykel J Kochenderfer

We present a simple and interpretable approach to modeling flight trajectories that leverages Gaussian Mixture Models specific to each flight segment.

Evaluating Human-Language Model Interaction

Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
Paper

Published in TMLR! We evaluate human-LM interaction on the tasks of social dialogue, question answering, crossword puzzles, summarization, and metaphor generation and highlight cases where the results from non-interactive and interactive metrics diverge.

Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent

Ethan A Chi, Ashwin Paranjape, Abigail See, Caleb Chiam, Trenton Chang, Kathleen Kenealy, Swee Kiat Lim, Amelia Hardy, Chetanya Rastogi, Haojun Li, Alexander Iyabor, Yutong He, Hari Sowrirajan, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Jillian Tang, Avanika Narayan, Giovanni Campagna, Christopher D Manning
Paper

Published at SIGDIAL! We present V2 of Chirpy Cardinal, an open-domain dialogue agent that won 2nd place in the 2020 Alexa Prize Competition.

Effective Social Chatbot Strategies for Increasing User Initiative

Amelia Hardy, Ashwin Paranjape, Christopher Manning
Paper

Published at SIGDIAL! We study strategies for increasing user initiative in human-bot conversations and show that simple automated metrics correlate with human judgments of initiative.

Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations

Ashwin Paranjape, Abigail See, Kathleen Kenealy, Haojun Li, Amelia Hardy, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Christopher D Manning
Paper

Published in Alexa Prize Proceedings! We present Chirpy Cardinal, an open-domain dialogue agent that won 2nd place in the 2019 Alexa Prize Competition. Our open-source code is here!