
How to increase the width of hidden linear layers in Mistral 7B model?


After installing

!pip install -U bitsandbytes
!pip install -U transformers
!pip install -U peft
!pip install -U accelerate
!pip install -U trl

And then some boilerplate to load the Mistral model:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments, pipeline, logging
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from datasets import load_dataset
from trl import SFTTrainer
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

base_model = "mistralai/Mistral-7B-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False  # silence the warnings
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token

We can see the layers/architecture:

>>> model

[out]:

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
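For reference, the widths above come from the model config (a quick check, assuming the standard transformers config attributes for Mistral; head_dim = hidden_size / num_attention_heads = 128, so the k/v projections are 8 * 128 = 1024 wide):

print(model.config.hidden_size)          # 4096
print(model.config.intermediate_size)    # 14336
print(model.config.num_key_value_heads * model.config.hidden_size // model.config.num_attention_heads)  # 1024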

Is there any way to increase the width of the Linear4bit layers?

E.g., if we want to widen each hidden layer by another 800 nodes (hidden size 4096 → 4896), to get:

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4896)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4896, out_features=4896, bias=False)
          (k_proj): Linear4bit(in_features=4896, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4896, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4896, out_features=4896, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4896, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4896, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4896, out_features=32000, bias=False)
)

Note: It's okay if the weights for the additional hidden nodes in the Linear4bit layers are randomly initialized.
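For context, this is roughly what I mean for a plain (non-quantized) nn.Linear: copy the old weights into a wider layer and leave the new rows/columns randomly initialized. This is only a minimal sketch (widen_linear is a hypothetical helper, not a library function); the question is how to do the equivalent consistently across the Linear4bit layers, the embedding, the lm_head, and the RMSNorm shapes of the quantized model.

import torch
import torch.nn as nn

def widen_linear(old: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
    """Copy old weights into a wider Linear; the extra rows/columns keep
    their default random initialization."""
    new = nn.Linear(new_in, new_out, bias=old.bias is not None)
    with torch.no_grad():
        new.weight[: old.out_features, : old.in_features] = old.weight
        if old.bias is not None:
            new.bias[: old.out_features] = old.bias
    return new

# e.g. widen a 4096 -> 4096 projection to 4896 -> 4896
old_proj = nn.Linear(4096, 4096, bias=False)
new_proj = widen_linear(old_proj, 4896, 4896)
print(new_proj)  # Linear(in_features=4896, out_features=4896, bias=False)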

