python main.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
model.safetensors.index.json: 100%|██████████| 13.5k/13.5k [00:00<?, ?B/s]
model-00001-of-00002.safetensors: 100%|██████████| 4.95G/4.95G [07:27<00:00, 11.1MB/s]
model-00002-of-00002.safetensors: 100%|██████████| 67.1M/67.1M [00:05<00:00, 11.5MB/s]
Downloading shards: 100%|██████████| 2/2 [07:35<00:00, 227.61s/it]
Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu` instead of `hidden_act`. See for more details.
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.87s/it]
generation_config.json: 100%|██████████| 137/137 [00:00<?, ?B/s]