[MODELS] Discussion

#372
by victor HF staff - opened
Hugging Chat org
edited Apr 10

Here we can discuss the models available on HuggingChat.


victor pinned discussion

What are the limits of using these? How many API calls can I send per month?

How can I know which model I am using?

How can I know which model I am using?

At the bottom of your screen: [screenshot showing the active model name]

Out of all these models, Gemma, which was recently released, has the newest information about .NET. However, I don't know which one gives the most accurate answers for coding.

Gemma seems really biased. With web search on, it says it doesn't have access to recent information when I ask it about almost any recent event. But when I look up the same recent events on Google, I get results about them.

Apparently Gemma cannot code?

Gemma is just like Google's Gemini series models: it has very strong moral limits built in. Any request that might involve file operations, or access that goes too deep into the system, gets censored and it refuses to reply.
So even if there are solutions for such things in its training data, they just get filtered out and ignored.
That said, I haven't tested its coding accuracy on tasks that don't involve these kinds of "dangerous" operations.

Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct):

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ have are\\\\\\\ is\\\\\\\\n\\\\n\\\\\\\\\\\\\\\\\\\\\\\\``assistant\\```````assistant\\\\````assistantassistant\\\\\\\\\`````\\assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistanta

Hugging Chat org

@typo777 can you share the conversation?

Hi, I have a single machine with 10 H100 GPUs (0-9), each with 80 GB of GPU RAM. When I load the model onto 2 GPUs it works well, but when I switch to 3 GPUs (~45 GB per GPU) or more (tested 3-9), the model loads, yet at inference time it either gives trash output like "…////" or throws an error saying the probability contains nan or inf values.

I have tried device_map="auto", loading with empty weights and dispatching the model with the Llama decoder layer pinned to a single GPU, and custom device maps. I also tried many different models and they all had the same issue. With Ollama I was able to load the model and run inference on all 10 GPUs, so I don't think the problem is with the GPUs themselves.

I have also tried different generation arguments and noticed one thing: if you set do_sample=False you get the probability error, otherwise you get the "…////"-style output. With a small model you get some random Russian, Spanish, etc. words. I have also tried different dtypes (float16, bfloat16, float32, which gave no result even after waiting a long time). I am sharing my code below; can you point me in the right direction? Thanks a lot.

import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Point the Hugging Face cache at the local model store.
os.environ["TRANSFORMERS_CACHE"] = "/data/HF_models"

checkpoint = "/data/HF_models/hub/models--meta-llama--Meta-Llama-3.1-70B/snapshots/7740ff69081bd553f4879f71eebcc2d6df2fbcb3"

# Shard the model across all visible GPUs via accelerate.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, device_map="auto", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

print(model)

message = "Tell me a joke"

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 20,
    # "return_full_text": False,
    # "temperature": 0.4,
    # "do_sample": True,  # False worked
    # "top_p": 0.5,
}

print(pipe(message, **generation_args))
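
One thing that might be worth trying (just a sketch, not a confirmed fix, and the 70GiB per-GPU cap below is my own assumption to leave headroom for activations and the KV cache) is giving the auto device map an explicit memory budget per H100 instead of letting accelerate fill each card:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "/data/HF_models/hub/models--meta-llama--Meta-Llama-3.1-70B/snapshots/7740ff69081bd553f4879f71eebcc2d6df2fbcb3"

# Cap how much of each 80 GB H100 the sharded weights may occupy;
# 70GiB is an assumed value, not something tested on this exact setup.
max_memory = {i: "70GiB" for i in range(10)}

model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    max_memory=max_memory,  # forwarded to accelerate's device-map planner
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

If the garbage output persists even with an explicit budget, that would at least point away from memory pressure and toward the multi-GPU dispatch itself.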

Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct): [same garbled output, repeating "assistant", as quoted above]

Temperature is probably too high.
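
If the sampling temperature really is the cause, here is a minimal way to check locally, in the same transformers pipeline style used in the post above; the 8B Instruct model ID is just an illustrative stand-in:

import torch
from transformers import pipeline

# Any instruct checkpoint you have access to will do for this comparison.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Tell me a joke"

# Greedy decoding: deterministic and usually coherent.
print(pipe(prompt, max_new_tokens=40, do_sample=False)[0]["generated_text"])

# Very high temperature: output often degenerates into repeated or
# garbled tokens like the ones quoted above.
print(pipe(prompt, max_new_tokens=40, do_sample=True, temperature=2.0)[0]["generated_text"])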

Can you please share the conversation, if possible?

Hi, can we have the DeepSeek V2.5 model?

I need community model features

Unable to download the "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8" model; it gets stuck at 81%, and there are no disk space issues on my side.
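
A possible workaround for the stalled download (a sketch, assuming the huggingface_hub client is installed and your account has access to the repo) is to restart the transfer with snapshot_download, which skips files that are already complete in the local cache and only fetches what is missing:

from huggingface_hub import snapshot_download

# Re-running this after an interrupted download picks up from the local
# cache instead of starting over from zero.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    # token="hf_...",  # needed if the repo is gated for your account
)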
