Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
Pages
Posts
Future Blog Post
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Blog Post number 4
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Publications
Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations
Ashwin Paranjape, Abigail See, Kathleen Kenealy, Haojun Li, Amelia Hardy, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Christopher D Manning
Paper
Published in Alexa Prize Proceedings! We present Chirpy Cardinal, an open-domain dialogue agent that won 2nd place in the 2019 Alexa Prize Competition. Our open-source code is here!
Effective Social Chatbot Strategies for Increasing User Initiative
Amelia Hardy, Ashwin Paranjape, Christopher Manning
Paper
Published at SIGDIAL! We study strategies for increasing user initiative in human-bot conversations and show that simple automated metrics correlate with human judgment of initiative.
Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent
Ethan A Chi, Ashwin Paranjape, Abigail See, Caleb Chiam, Trenton Chang, Kathleen Kenealy, Swee Kiat Lim, Amelia Hardy, Chetanya Rastogi, Haojun Li, Alexander Iyabor, Yutong He, Hari Sowrirajan, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Jillian Tang, Avanika Narayan, Giovanni Campagna, Christopher D Manning
Paper
Published in SIGDIAL! We present V2 of Chirpy Cardinal, an open-domain dialogue agent that won 2nd place in the 2020 Alexa Prize Competition.
Evaluating Human-Language Model Interaction
Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
Paper
Published in TMLR! We evaluate human-LM interaction on the tasks of social dialogue, question answering, crossword puzzles, summarization, and metaphor generation and highlight cases where the results from non-interactive and interactive metrics diverge.
Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures
Soyeon Jung, Amelia Hardy, Mykel J Kochenderfer
We present a simple and interpretable approach to modeling flight trajectories that leverages Gaussian Mixture Models specific to each flight segment.
ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts
Amelia Hardy*, Houjun Liu*, Bernard Lange, Mykel J Kochenderfer
Paper
We apply the Adaptive Stress Testing (AST) framework to language modeling to identify prompts that are both effective for red-teaming and likely to occur under natural autoregression.
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel*, Amelia Hardy*, Chandler Smith, Max Lamparth, Malcolm Hardy, Mykel J. Kochenderfer
Paper
Spotlighted in NeurIPS! In this work, we propose best practices for informative, reproducible AI benchmarks and evaluate a set of benchmarks according to these criteria.
More than Marketing? On the Information Value of AI Benchmarks for Practitioners
Amelia Hardy*, Anka Reuel*, Kiana Jafari Meimandi, Lisa Soder, Allie Griffith, Dylan M Asmar, Sanmi Koyejo, Michael S Bernstein, Mykel J Kochenderfer
Paper
We present a qualitative, interview-based study of how decision-makers in research, product, and policy roles use AI benchmarks in practice.
Talks
Effective Social Chatbot Strategies for Increasing User Initiative
I presented my work on strategies for increasing user initiative in human-bot dialogues. You can watch my talk here.
Developing an LLM-Based Cockpit Assistant
I was invited to give a talk on my work with SISL + Airbus on building an LLM-based cockpit assistant and a framework to evaluate it.
ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts
I was invited to give a talk on ASTPrompter at the Leibniz Center for European Economic Research workshop on Designing Procedures for Red Teaming Generative AI Models.
AA228/CS238 Decision Making Under Uncertainty Lecture on Policy Gradient Estimation
I gave a lecture on policy gradient estimation! You can watch my talk here.