
A 3B-parameter T5 model trained on the P3 (T0 split) dataset for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001. The model was initialised from the T5 v1.1 lm-adapt checkpoint and fully finetuned.
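To give a feel for the scale of this training run, the sketch below multiplies out the hyperparameters stated above. It assumes the sequence lengths are measured in tokens and that every example fills its maximum length, so the token figures are upper bounds rather than measured values; the variable names are illustrative and not taken from any training script.

```python
# Hyperparameters as stated in the model card.
TRAIN_STEPS = 20_000
BATCH_SIZE = 2048
MAX_INPUT_LEN = 1024   # assumed to be in tokens
MAX_OUTPUT_LEN = 256   # assumed to be in tokens

# Number of training examples processed (with repetition, if any).
examples_seen = TRAIN_STEPS * BATCH_SIZE

# Upper bounds: real counts depend on padding or packing,
# which the model card does not specify.
max_input_tokens = examples_seen * MAX_INPUT_LEN
max_output_tokens = examples_seen * MAX_OUTPUT_LEN

print(f"examples seen:       {examples_seen:,}")
print(f"input tokens (max):  {max_input_tokens:,}")
print(f"output tokens (max): {max_output_tokens:,}")
```

Under these assumptions the run processes roughly 41 million examples.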

For more details, see HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation.

Performance on T0 held-out tasks (average accuracy across prompts using rank classification):

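| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T0-3B | 33.4 | 27.2 | 84.0 | 45.4 | 75.9 | 64.6 | 50.7 | 65.1 | 51.0 | 55.2 |
| hypertask_T0_3B (this model) | 41.7 | 30.1 | 96.9 | 72.7 | 89.1 | 81.2 | 51.7 | 57.2 | 59.2 | 64.4 |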
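As a rough illustration of the metric, rank classification scores each answer option with the model and predicts the highest-scoring one; accuracy is then computed per prompt and averaged across prompts. The sketch below assumes the per-option scores (for example, log-likelihoods) have already been computed. The function names and the toy scores are hypothetical, and this is not the evaluation code used to produce the table.

```python
def rank_classify(option_scores):
    """Return the index of the highest-scoring answer option."""
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def accuracy(all_scores, labels):
    """Fraction of examples whose top-ranked option matches the label."""
    preds = [rank_classify(scores) for scores in all_scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def average_accuracy_across_prompts(per_prompt_scores, labels):
    """Mean of per-prompt accuracies, one accuracy per prompt template."""
    accs = [accuracy(scores, labels) for scores in per_prompt_scores.values()]
    return sum(accs) / len(accs)


# Hypothetical scores: 3 examples, 2 answer options, 2 prompt templates.
labels = [0, 1, 1]
per_prompt_scores = {
    "prompt_a": [[-1.2, -3.4], [-2.0, -0.5], [-0.9, -1.1]],
    "prompt_b": [[-0.7, -2.2], [-1.5, -1.0], [-2.3, -0.4]],
}

avg_acc = average_accuracy_across_prompts(per_prompt_scores, labels)
print(f"average accuracy across prompts: {avg_acc:.4f}")
```

Averaging per prompt first gives every prompt template equal weight, regardless of how the individual examples are distributed.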