Channel: Active questions tagged python - Stack Overflow

Run Llama 2 on GPU


I want to run Llama 2 on a GPU, since generating answers on the CPU takes forever. I have access to an NVIDIA A6000 through a Jupyter notebook. I have installed everything, and the responses are fine, but generation is far too slow for my research purposes.

import torch
import transformers
from transformers import LlamaForCausalLM, LlamaTokenizer
import setGPU

model_dir = "llama/llama-2-7b-chat-hf"
model = LlamaForCausalLM.from_pretrained(model_dir)
tokenizer = LlamaTokenizer.from_pretrained(model_dir)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
)

sequences = pipeline(
    'I wanna hear some news. What is up today',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=400,
)

for seq in sequences:
    print(f"{seq['generated_text']}")

This is my current code. When I run nvidia-smi in a terminal, GPU utilization stays at 0% while CPU RAM usage climbs sharply, so the model is clearly running entirely on the CPU. How can I make it run on the GPU?

I followed this tutorial directly from Meta: https://ai.meta.com/blog/5-steps-to-getting-started-with-llama-2/
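For context, the usual fix (my assumption about the cause, not something from the tutorial) is that the model and inputs are never explicitly placed on the GPU: with transformers you can pass device_map="auto" to from_pretrained (requires the accelerate package) or device=0 to transformers.pipeline. The underlying PyTorch pattern can be sketched with a toy layer standing in for the model:

```python
import torch

# Select the GPU when one is visible to PyTorch, else fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in for the model: move the weights to the device once with
# .to(device), and create (or move) input tensors on the same device.
layer = torch.nn.Linear(4, 2).to(device)
x = torch.randn(3, 4, device=device)
y = layer(x)  # runs on whichever device the weights and inputs share
```

The same placement has to happen for the Llama model and its tokenized inputs; if torch.cuda.is_available() returns False, the installed PyTorch build has no CUDA support and will silently stay on the CPU.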
