Some weights of the model checkpoint at ./model_dir were not used when initializing BertModel
Some weights of the model checkpoint at ./model_dir were not used when initializing BertModel: ['encoder.layer.2.mlp.wo.bias', 'encoder.layer.11.mlp.wo.weight', 'encoder.layer.0.mlp.layernorm.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.7.mlp.wo.weight', 'encoder.layer.8.mlp.wo.weight', 'encoder.layer.3.mlp.wo.weight', 'encoder.layer.1.mlp.layernorm.bias', 'encoder.layer.8.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.7.mlp.wo.bias', 'encoder.layer.9.mlp.layernorm.bias', 'encoder.layer.10.mlp.wo.weight', 'encoder.layer.11.mlp.layernorm.weight', 'encoder.layer.0.mlp.wo.weight', 'encoder.layer.8.mlp.wo.bias', 'encoder.layer.7.mlp.gated_layers.weight', 'encoder.layer.0.mlp.layernorm.bias', 'encoder.layer.11.mlp.gated_layers.weight', 'encoder.layer.3.mlp.wo.bias', 'encoder.layer.4.mlp.gated_layers.weight', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.9.mlp.wo.bias', 'encoder.layer.5.mlp.layernorm.weight', 'encoder.layer.10.mlp.layernorm.weight', 'encoder.layer.6.mlp.layernorm.bias', 'encoder.layer.2.mlp.gated_layers.weight', 'encoder.layer.4.mlp.layernorm.weight', 'encoder.layer.6.mlp.wo.bias', 'encoder.layer.7.mlp.layernorm.bias', 'encoder.layer.10.mlp.layernorm.bias', 'encoder.layer.0.mlp.gated_layers.weight', 'encoder.layer.4.mlp.wo.bias', 'encoder.layer.6.mlp.layernorm.weight', 'encoder.layer.2.mlp.wo.weight', 'encoder.layer.3.mlp.gated_layers.weight', 'encoder.layer.9.mlp.wo.weight', 'encoder.layer.7.mlp.layernorm.weight', 'encoder.layer.0.mlp.wo.bias', 'encoder.layer.10.mlp.gated_layers.weight', 'encoder.layer.4.mlp.layernorm.bias', 'encoder.layer.11.mlp.wo.bias', 'encoder.layer.8.mlp.layernorm.bias', 'encoder.layer.3.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.layernorm.bias', 'encoder.layer.4.mlp.wo.weight', 'encoder.layer.1.mlp.wo.bias', 'encoder.layer.1.mlp.layernorm.weight', 'encoder.layer.2.mlp.layernorm.weight', 
'encoder.layer.8.mlp.gated_layers.weight', 'encoder.layer.5.mlp.wo.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.9.mlp.layernorm.weight', 'encoder.layer.11.mlp.layernorm.bias', 'encoder.layer.1.mlp.wo.weight', 'encoder.layer.6.mlp.wo.weight', 'encoder.layer.9.mlp.gated_layers.weight', 'encoder.layer.6.mlp.gated_layers.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertModel were not initialized from the model checkpoint at ./model_dir and are newly initialized: ['encoder.layer.7.intermediate.dense.weight', 'encoder.layer.10.intermediate.dense.weight', 'encoder.layer.11.output.dense.weight', 'encoder.layer.8.output.dense.weight', 'encoder.layer.9.output.LayerNorm.weight', 'encoder.layer.9.intermediate.dense.bias', 'encoder.layer.3.intermediate.dense.weight', 'encoder.layer.8.intermediate.dense.weight', 'encoder.layer.6.output.dense.bias', 'encoder.layer.1.output.dense.bias', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.8.intermediate.dense.bias', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.5.intermediate.dense.bias', 'encoder.layer.5.output.dense.bias', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.4.intermediate.dense.bias', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.7.output.LayerNorm.weight', 'encoder.layer.11.output.dense.bias', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.7.output.dense.weight', 'encoder.layer.8.output.LayerNorm.weight', 'encoder.layer.8.output.LayerNorm.bias', 'encoder.layer.8.output.dense.bias', 'encoder.layer.11.output.LayerNorm.bias', 'encoder.layer.3.output.dense.bias', 'encoder.layer.9.output.LayerNorm.bias', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.11.output.LayerNorm.weight', 'encoder.layer.4.output.dense.bias', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.9.output.dense.weight', 'encoder.layer.6.intermediate.dense.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.3.output.LayerNorm.bias', 'encoder.layer.2.output.dense.bias', 'encoder.layer.4.intermediate.dense.weight', 'encoder.layer.0.output.dense.bias', 'encoder.layer.4.output.dense.weight', 'encoder.layer.5.output.dense.weight', 'embeddings.position_embeddings.weight', 'encoder.layer.5.output.LayerNorm.weight', 'encoder.layer.2.intermediate.dense.bias', 
'encoder.layer.3.output.LayerNorm.weight', 'encoder.layer.6.output.LayerNorm.weight', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.11.intermediate.dense.weight', 'encoder.layer.10.output.dense.weight', 'encoder.layer.4.output.LayerNorm.weight', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.0.output.dense.weight', 'encoder.layer.5.output.LayerNorm.bias', 'encoder.layer.9.intermediate.dense.weight', 'encoder.layer.3.intermediate.dense.bias', 'encoder.layer.5.intermediate.dense.weight', 'encoder.layer.4.output.LayerNorm.bias', 'encoder.layer.10.intermediate.dense.bias', 'encoder.layer.7.output.dense.bias', 'encoder.layer.9.output.dense.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.2.output.dense.weight', 'encoder.layer.6.output.dense.weight', 'encoder.layer.10.output.LayerNorm.weight', 'encoder.layer.6.intermediate.dense.weight', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.10.output.dense.bias', 'encoder.layer.11.intermediate.dense.bias', 'encoder.layer.10.output.LayerNorm.bias', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.7.output.LayerNorm.bias', 'encoder.layer.3.output.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Were you able to solve it?
This usually happens if trust_remote_code=True is missing when calling AutoModel.from_pretrained. If this does not solve your problem, can you share the code and the version of the transformers package that you were using to load the model?
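For illustration, here is a minimal sketch of that suggestion. In the warning, the unused keys (mlp.gated_layers, mlp.wo, mlp.layernorm) and the newly initialized ones (intermediate.dense, output.dense) use different layer names, which suggests the checkpoint was saved from a custom architecture rather than the stock BertModel. The sketch assumes the transformers package is installed and that ./model_dir ships its own modeling code; `suffixes` and `load_custom_model` are hypothetical helper names, not part of the library.

```python
import re


def suffixes(keys):
    """Collapse parameter names to their per-layer suffix, e.g.
    'encoder.layer.2.mlp.wo.bias' -> 'mlp.wo.bias'."""
    return sorted({re.sub(r"^encoder\.layer\.\d+\.", "", k) for k in keys})


def load_custom_model(model_dir="./model_dir"):
    """Load a checkpoint whose modeling code lives alongside the weights.

    Assumes `transformers` is installed; it is imported lazily so that
    `suffixes` can be used without it.
    """
    from transformers import AutoModel

    # trust_remote_code=True allows AutoModel to use the model class defined
    # by the checkpoint's own code rather than the built-in BertModel.
    # Only enable it for code you have reviewed and trust.
    return AutoModel.from_pretrained(model_dir, trust_remote_code=True)


# A few keys copied from the warning above.
unused = [
    "encoder.layer.2.mlp.wo.bias",
    "encoder.layer.5.mlp.gated_layers.weight",
    "encoder.layer.0.mlp.layernorm.weight",
]
newly_initialized = [
    "encoder.layer.7.intermediate.dense.weight",
    "encoder.layer.11.output.dense.weight",
    "embeddings.position_embeddings.weight",
]

# Different suffix sets on the two sides point to an architecture mismatch.
print(suffixes(unused))
print(suffixes(newly_initialized))
```

If the two suffix lists barely overlap, as they do here, the weights were most likely saved by a different model class, and calling `load_custom_model()` (or passing trust_remote_code=True in your own loading code) is the first thing to try.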