ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts
Published in arXiv, 2024
Recommended citation: Amelia Hardy*, Houjun Liu*, Bernard Lange, Mykel J Kochenderfer
Download Paper