I am using an LLM, CohereForAI/c4ai-command-r-plus-4bit, for inference. I have a GPU, but it is not powerful enough, so I want to run the model on CPU instead. Below are the example code and the error it produces.
Code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

PRETRAIN_MODEL = 'CohereForAI/c4ai-command-r-plus-4bit'

tokenizer = AutoTokenizer.from_pretrained(PRETRAIN_MODEL)
model = AutoModelForCausalLM.from_pretrained(PRETRAIN_MODEL, device_map='cpu')

text = "this is an example"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    embedding = outputs.last_hidden_state.mean(dim=1).squeeze().numpy()
print(embedding.shape)
Error:
ValueError: Expected a cuda device, but got: CPU
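For reference, here is a minimal CPU-only sketch of the embedding extraction I'm after, using a small unquantized placeholder model (gpt2 here just stands in for the real checkpoint). Note that, as far as I understand, causal-LM outputs do not have a last_hidden_state attribute; hidden states are only returned when output_hidden_states=True is passed:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = 'gpt2'  # placeholder: any small unquantized checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map='cpu')

inputs = tokenizer("this is an example", return_tensors="pt")
with torch.no_grad():
    # Causal-LM outputs have no last_hidden_state attribute;
    # request hidden_states and take the final layer instead.
    outputs = model(**inputs, output_hidden_states=True)

embedding = outputs.hidden_states[-1].mean(dim=1).squeeze().numpy()
print(embedding.shape)  # (hidden_size,)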
Does this mean the c4ai-command-r-plus-4bit model can only run on a GPU? Is there anything I missed that would let it run on CPU? Thanks!