ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts

Published in arXiv, 2024

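Recommended citation: Amelia Hardy*, Houjun Liu*, Bernard Lange, Mykel J. Kochenderfer. (2024). "ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts." arXiv.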