I have no GPU, but I can run openbuddy-llama3-8b-v21.1-8k from Ollama. It runs at a speed of ~1 t/s.
But I can't get comparable performance when I try the equivalent transformers code:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    GenerationConfig,
)
import torch

new_model = "openbuddy/openbuddy-llama3-8b-v21.1-8k"
model = AutoModelForCausalLM.from_pretrained(
    new_model,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    new_model,
    max_length=2048,
    trust_remote_code=True,
    use_fast=True,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# the user turn asks, in Russian: "How do I open a brokerage account?"
prompt = """<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Как открыть брокерский счет?<|im_end|>
<|im_start|>assistant
"""
inputs = tokenizer.encode(
    prompt, return_tensors="pt", add_special_tokens=False
).cpu()

generation_config = GenerationConfig(
    max_new_tokens=700,
    temperature=0.5,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
outputs = model.generate(
    generation_config=generation_config,
    input_ids=inputs,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))

model.generate seems to run much more slowly than the same model under Ollama.
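For reference, this is roughly how I compared the speeds (the timing wrapper below is my own sketch, not anything from transformers):

import time

start = time.time()
outputs = model.generate(
    generation_config=generation_config,
    input_ids=inputs,
)
elapsed = time.time() - start
# count only the newly generated tokens, not the prompt tokens
new_tokens = outputs.shape[-1] - inputs.shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.2f} t/s")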
I can also see that the process uses only 25% of the CPU.
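In case it's relevant, this is the kind of check I can add (torch.get_num_threads / torch.set_num_threads are the standard PyTorch controls for the CPU thread pool; whether pinning the thread count would help here is exactly what I'm unsure about):

import os
import torch

# intra-op thread pool PyTorch uses for CPU matmuls
print("torch threads:", torch.get_num_threads())
print("logical cores:", os.cpu_count())

# one thing I could try: pin PyTorch to all available cores explicitly
torch.set_num_threads(os.cpu_count() or 1)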
What am I doing wrong?