File:RLHF diagram.svg

This is a file from the Wikimedia Commons
From Wikipedia, the free encyclopedia

Original file (SVG file, nominally 512 × 366 pixels, file size: 177 KB)

Summary

Description
English: This is a high-level overview of reinforcement learning from human feedback, including training an initial supervised model, collecting human feedback, training a reward model, and using it to align the initial model.
Date 14 March 2024
Source Own work
Author PopoDameron

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

Captions

High-level overview of reinforcement learning from human feedback

14 March 2024

image/svg+xml

File history

Date/Time | Dimensions | User | Comment
current: 20:20, 1 April 2024 | 512 × 366 (177 KB) | PopoDameron | Clarified relationship between RM and aligned model & added description to the aligned model
04:13, 14 March 2024 | 512 × 366 (160 KB) | PopoDameron | Uploaded own work with UploadWizard