Why is the intermediate_size of Qwen1.5-MoE-A2.7B different from that of Qwen-1.8B?

#5 opened by ShiKeNLP

Hello,

The intermediate_size of Qwen1.5-MoE-A2.7B is 5632, while Qwen-1.8B's intermediate_size is 11008. May I ask what the relationship between these two intermediate_size values is, and how an MLP layer of size 11008 is upcycled into a fine-grained expert of size 5632?
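For reference, here is a minimal sketch for reading these values straight from the Hub configs (assuming the repo ids `Qwen/Qwen-1_8B` and `Qwen/Qwen1.5-MoE-A2.7B`; the Qwen1 repo ships custom modelling code, so `trust_remote_code=True` is needed):

```python
from transformers import AutoConfig

# Dense Qwen-1.8B config (custom Qwen1 architecture, hence trust_remote_code).
dense_cfg = AutoConfig.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)
# Qwen1.5-MoE-A2.7B config (Qwen2MoE architecture in recent transformers versions).
moe_cfg = AutoConfig.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")

print(dense_cfg.intermediate_size)    # 11008 for Qwen-1.8B
print(moe_cfg.intermediate_size)      # 5632
print(moe_cfg.moe_intermediate_size)  # 1408, the size of one fine-grained expert
```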

In the report, each MLP layer is copied 8 times and each expert is split into 8 fine-grained experts, so it seems the moe_intermediate_size should be 1/8 of Qwen-1.8B's intermediate_size. But the moe_intermediate_size is 1408, Qwen-1.8B's intermediate_size is 11008, and 11008/1408 = 7.8182, so why is the factor not exactly 8?
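Spelling the arithmetic out with just the numbers quoted above (a plain-Python sanity check, nothing model-specific):

```python
dense_intermediate = 11008     # Qwen-1.8B intermediate_size
moe_intermediate = 1408        # Qwen1.5-MoE-A2.7B moe_intermediate_size (one fine-grained expert)
moe_dense_intermediate = 5632  # Qwen1.5-MoE-A2.7B intermediate_size

print(dense_intermediate / moe_intermediate)      # 7.8181..., not an exact factor of 8
print(moe_intermediate * 8)                       # 11264, slightly larger than 11008
print(moe_dense_intermediate / moe_intermediate)  # 4.0, so 5632 is exactly 4 fine-grained slices
```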

Thank you!

same question

Qwen org

Hi, all! Please see our technical report on that matter: https://arxiv.org/html/2407.10671

jklj077 changed discussion status to closed
