---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: anakin87/gemma-2b-orpo
tags:
- orpo
datasets:
- alvarobartt/dpo-mix-7k-simplified
language:
- en
---

# gemma-2b-orpo-GGUF

This is a GGUF quantized version of the [`gemma-2b-orpo` model](https://huggingface.co/anakin87/gemma-2b-orpo/): an ORPO fine-tune of google/gemma-2b.

You can find more information, including the evaluation and a training/usage notebook, in the [`gemma-2b-orpo` model card](https://huggingface.co/anakin87/gemma-2b-orpo/).

## 🎮 Model in action

The model can run with all the libraries that are part of the Llama.cpp ecosystem.
If you need to apply the prompt template manually, take a look at the [tokenizer_config.json of the original model](https://huggingface.co/anakin87/gemma-2b-orpo/blob/main/tokenizer_config.json).

📱 **Run the model on a budget smartphone** -> [see my recent post](https://www.linkedin.com/posts/stefano-fiorucci_llm-genai-edgecomputing-activity-7183365537618411520-PU2s)

Here is a simple example using **llama-cpp-python**:

```python
! pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="anakin87/gemma-2b-orpo-GGUF",
    filename="gemma-2b-orpo.Q5_K_M.gguf",
    verbose=True  # due to a known bug, verbose must be True
)

# text generation - prompt template applied manually
llm("<|im_start|>user\nName the planets in the solar system<|im_end|>\n<|im_start|>assistant\n", max_tokens=75)

# chat completion - prompt template automatically applied
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Please list some places to visit in Italy"}
    ]
)
```
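
If you need to build the prompt yourself for another runtime, a small helper like the one below reproduces the ChatML-style format used in the text generation example above. This is a minimal sketch based on that example; the authoritative template is the `chat_template` field in the tokenizer_config.json linked earlier, and `build_prompt` is just an illustrative name.

```python
# Minimal ChatML-style prompt builder, matching the format used in the
# text generation example above. Check the tokenizer_config.json of the
# original model for the authoritative template.
def build_prompt(messages):
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # leave the assistant turn open so the model completes it
    prompt += "<|im_start|>assistant\n"
    return prompt

print(build_prompt([{"role": "user", "content": "Name the planets in the solar system"}]))
```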
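
To use the GGUF file with other tools from the Llama.cpp ecosystem, you can also download it locally with `huggingface_hub`. A minimal sketch, assuming the `Q5_K_M` filename used above; if the repo contains other quantizations, adjust the filename accordingly.

```python
# ! pip install huggingface_hub
from huggingface_hub import hf_hub_download

# download the GGUF file from the Hub (cached locally) and get its path
model_path = hf_hub_download(
    repo_id="anakin87/gemma-2b-orpo-GGUF",
    filename="gemma-2b-orpo.Q5_K_M.gguf",
)

# this path can then be passed, for example, to Llama(model_path=...) in
# llama-cpp-python, or to the -m flag of the llama.cpp CLI
print(model_path)
```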