
LLaMA 2 fine-tuning on M1 Mac with llama.cpp

Device: M1 Mac, OS: Sonoma, language: Python, editor: VS Code (ipynb)

I have a question about attaching the LoRA module to the LLaMA model (for QLoRA).

Before attaching the module, I applied quantization to the model and executed the command below.

! ./main -m /to/model/path/ggml-model-f32_q4_0.gguf --lora /to/model/path/ggml-adapter-model.bin

Here's the problem: it never stops, and performance drops sharply. I get this warning: warning: using a lora adapter with a quantized model may result in poor quality
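For reference, if I'm reading main's options correctly, there is also a --lora-base flag for pointing at an unquantized (f16/f32) copy of the model while still running the quantized one. I'm not sure whether that is the intended fix, but the call would look roughly like this (the f16 path is just a placeholder):

! ./main -m /to/model/path/ggml-model-f32_q4_0.gguf --lora /to/model/path/ggml-adapter-model.bin --lora-base /to/model/path/ggml-model-f16.gguf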

Getting into LLMs and doing some fine-tuning on the M1 Mac is my primary goal. Unfortunately, the quantization library is not supported on this hardware, and these issues have arisen. Even if it isn't directly related to the question, please advise on anything you think is a problem.

1st: Is the --lora option the correct way to attach the adapter module to the model?

2nd: Am I going about this the right way?

What I tried:

  1. Tried to use a quantization library (not supported on the M1) and just created a PEFT model instead (see the sketch after this list).

  2. Tried llama.cpp with the --lora option: execution never stops, and nothing is saved.
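
Roughly, the "get peft model" step in 1. looked like this (a simplified sketch: the model id, target modules, and hyperparameters are placeholders, and there is no 4-bit quantization because bitsandbytes does not run on the M1):

    # Simplified sketch of the "get peft model" step (hyperparameters are placeholders).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    # No 4-bit quantization here: bitsandbytes is not supported on the M1,
    # so this is plain LoRA rather than true QLoRA.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        bias="none",
        task_type="CAUSAL_LM",
    )
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()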

What I want is to get back the model with the LoRA module attached to it.
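
In peft terms, what I'm after is roughly this (assuming the peft_model from the sketch above, after training):

    # Merge the LoRA weights back into the base model and get a plain model object.
    merged_model = peft_model.merge_and_unload()
    merged_model.save_pretrained("/to/model/path/merged")  # placeholder output path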

