Original reward space

#15
by anjaa - opened

Hi, I am a bit confused: what is the original reward space?
It seems the rewards are transformed to the range of -0.5 to 4.5.

The model card says:

> The actual rewards of this example from the HelpSteer dataset are [3, 3, 4, 2, 2] for the five HelpSteer objectives: helpfulness, correctness, coherence, complexity, verbosity. We can linearly transform our predicted rewards to the original reward space to compare with the ground truth.

```python
helpsteer_rewards_pred = multi_obj_rewards[0, :5] * 5 - 0.5
print(helpsteer_rewards_pred)
# [2.78125   2.859375  3.484375  1.3847656 1.296875 ]
```

RLHFlow org

I linearly transformed the HelpSteer rewards from [0, 4] to [0.1, 0.9] for training the model, using x -> (x + 0.5) / 5. So, to invert the transform back to the original scale [0, 4], I applied x -> x * 5 - 0.5.
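A minimal sketch of the two transforms described above, assuming plain Python floats (the actual training code may operate on tensors); the function names are illustrative, not from the repo:

```python
def to_training_scale(x):
    # Map a HelpSteer reward from [0, 4] to [0.1, 0.9]: x -> (x + 0.5) / 5
    return (x + 0.5) / 5

def to_original_scale(y):
    # Inverse: map a model-scale reward back to [0, 4]: y -> y * 5 - 0.5
    return y * 5 - 0.5

# Ground-truth rewards from the example in the thread
helpsteer_rewards = [3, 3, 4, 2, 2]

scaled = [to_training_scale(r) for r in helpsteer_rewards]
recovered = [to_original_scale(s) for s in scaled]

# Endpoints check: 0 -> 0.1 and 4 -> 0.9, and the round trip is the identity
print(scaled)
print(recovered)
```

Applying `to_original_scale` to the model's predictions (as in the snippet above with `multi_obj_rewards[0, :5] * 5 - 0.5`) puts them on the same 0–4 scale as the HelpSteer labels.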

Haoxiang-Wang changed discussion status to closed
