Rohit909-creator/Reward-Augmented-Decoding-RAD-

Reward Augmented Decoding (RAD) for Qwen 2.5 3B Uncensored Model to Reduce Toxicity

diagram: RAD Diagram

This repository contains code that steers the uncensored Qwen 2.5 3B model toward non-harmful output using the Reward Augmented Decoding (RAD) technique. As in the paper, candidate next tokens are used to generate multiple continuations; each continuation is then scored for toxicity, and the least toxic one is chosen.
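
The selection loop described above can be sketched in a few lines. This is a minimal, illustrative sketch and not the repository's actual code: `next_token_probs` and `toxicity` are hypothetical toy stand-ins for the Qwen model and a toxicity scorer, and the values of `k` and `lookahead` are assumptions chosen for the example.

```python
from typing import Dict, List

# Hypothetical toy data standing in for a real language model and reward model.
TOY_NEXT: Dict[str, Dict[str, float]] = {
    "you": {"are": 0.6, "seem": 0.4},
    "are": {"awful": 0.7, "kind": 0.3},
    "seem": {"kind": 0.8, "awful": 0.2},
}
TOXIC_WORDS = {"awful"}


def next_token_probs(tokens: List[str]) -> Dict[str, float]:
    """Toy language model: the distribution depends only on the last token."""
    return TOY_NEXT.get(tokens[-1], {})


def toxicity(tokens: List[str]) -> float:
    """Toy scorer: fraction of tokens that appear in a block list."""
    return sum(t in TOXIC_WORDS for t in tokens) / max(len(tokens), 1)


def greedy_rollout(tokens: List[str], steps: int) -> List[str]:
    """Extend the sequence greedily for up to `steps` tokens."""
    out = list(tokens)
    for _ in range(steps):
        probs = next_token_probs(out)
        if not probs:
            break
        out.append(max(probs, key=probs.get))
    return out


def pick_next_token(tokens: List[str], k: int = 2, lookahead: int = 1) -> str:
    """Try the top-k candidates, roll each out, keep the least toxic one."""
    probs = next_token_probs(tokens)
    candidates = sorted(probs, key=probs.get, reverse=True)[:k]
    scored = []
    for cand in candidates:
        continuation = greedy_rollout(tokens + [cand], lookahead)
        # Ties on toxicity are broken in favour of the more probable token.
        scored.append((toxicity(continuation), -probs[cand], cand))
    return min(scored)[2]


def generate(prompt: List[str], max_new_tokens: int = 5,
             k: int = 2, lookahead: int = 1) -> List[str]:
    """Decode token by token, filtering candidates by toxicity at each step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        if not next_token_probs(tokens):
            break
        tokens.append(pick_next_token(tokens, k, lookahead))
    return tokens
```

With this toy data, plain greedy decoding from `["you"]` walks into the block-listed word, while `generate(["you"])` takes the slightly less probable branch and avoids it. In a real setup the two stand-ins would be replaced by calls to the language model and a toxicity classifier.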

Citation: If you find this repository useful, please consider citing the following paper:

@article{deng2023rad,
  title={Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model},
  author={Haikang Deng and Colin Raffel},
  journal={arXiv preprint arXiv:2310.09520},
  year={2023}
}

The original paper is available at https://arxiv.org/abs/2310.09520.

About

Using Reward Signals to prevent Toxic Responses in LLMs
