Efficient Detection of Toxic Prompts in Large Language Models • Paper • 2408.11727 • Published 29 days ago • 11
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique • Paper • 2408.10701 • Published about 1 month ago • 10