Haoxiang Wang

Haoxiang-Wang

https://haoxiang-wang.github.io/

AI & ML interests

Machine Learning (Transfer Learning, OOD Generalization, Domain Adaptation, Meta-Learning)

Haoxiang-Wang's activity

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 16 days ago

Why is the code-complexity coefficient so high in the demo example?

#16 opened 17 days ago by icdt

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 about 1 month ago

Special tokens in the vocabulary?

#13 opened 2 months ago by nshen7

Original reward space

#15 opened about 1 month ago by anjaa

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 about 2 months ago

[AUTOMATED] Model Memory Requirements

#5 opened 3 months ago by model-sizer-bot

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 2 months ago

What is the range of the output score from the model?

#12 opened 2 months ago by nshen7

Why is `multi_obj_rewards` multiplied by 5, but then 0.5 is subtracted from it?

#11 opened 2 months ago by xzuyn

Update README.md

#3 opened 3 months ago by philschmid

Issue when finetuning the reward model on custom dataset

#2 opened 4 months ago by yguooo

Longer context

#10 opened 2 months ago by salazaaar

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 3 months ago

Batched predictions with padding through the model don't seem to work correctly

#7 opened 3 months ago by karthikramen

ModuleNotFoundError: No module named 'transformers_modules.RLHFlow.ArmoRM-Llama3-8B-v0'

#6 opened 3 months ago by fchaubard

Why Not Utilize a Sigmoid Function in the Regression Layer?

#8 opened 3 months ago by xwz-xmu

New activity in allenai/reward-bench 3 months ago

Separate Scores: With & Without Prior Sets

#6 opened 3 months ago by Haoxiang-Wang

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 4 months ago

Problem running the model

#1 opened 4 months ago by Asaf-Yehudai

New activity in RLHFlow/LLaMA3-iterative-DPO-final 4 months ago

exl2 quants

#2 opened 4 months ago by Apel-sin

New activity in RLHFlow/pair-preference-model-LLaMA3-8B 4 months ago

Can you specify the license for this model, please?

#1 opened 4 months ago by sparsh35

Commented on a paper 4 months ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13

New activity in prometheus-eval/Feedback-Bench 5 months ago

Data Description

#2 opened 5 months ago by Haoxiang-Wang

New activity in prometheus-eval/Preference-Bench 5 months ago

Data Description

#2 opened 5 months ago by Haoxiang-Wang

New activity in prometheus-eval/BiGGen-Bench-Results 5 months ago

Data Description

#1 opened 5 months ago by Haoxiang-Wang

New activity in argilla/distilabel-intel-orca-kto 5 months ago

Data Preparation

#1 opened 5 months ago by Haoxiang-Wang

New activity in chargoddard/llama-2-34b-uncode 12 months ago

Is this Llama-2 or CodeLlama-2?

#1 opened 12 months ago by Haoxiang-Wang