Reinforcement Learning from Human Feedback

Notes and commented code for RLHF (PPO)

The commented code is based on version 0.7.10 of the trl library from Hugging Face: https://github.com/huggingface/trl/

This repository contains both the original ppo_trainer.py file and the commented version. You can use any diff tool to compare the two and see my comments.
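
If you prefer to extract the comments programmatically rather than with a graphical diff tool, the following is a minimal sketch using Python's standard difflib module. The helper name `added_lines` and the two toy strings are assumptions made for illustration; they are not taken from the repository. In practice you would read the original and commented ppo_trainer.py files from disk and pass their contents in.

```python
import difflib


def added_lines(original: str, commented: str) -> list[str]:
    """Return the lines that appear in `commented` but not in `original`."""
    diff = difflib.ndiff(original.splitlines(), commented.splitlines())
    # ndiff prefixes lines unique to the second input with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


# Toy stand-ins for the original and commented files (hypothetical content).
original = "def step(self):\n    return self.loss()\n"
commented = (
    "def step(self):\n"
    "    # Compute the PPO loss for this batch\n"
    "    return self.loss()\n"
)

for line in added_lines(original, commented):
    print(line)
```

Lines that were edited rather than purely added also show up in the result, since they differ from the original.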

hugging-face-code-commented/trl
.gitignore
README.md
Slides.pdf
gpt_sentiment.py
requirements.txt
