
Trainer acts as if it's training from scratch


I'm training a model with a Hugging Face Trainer and I specified the checkpoint folder via the resume_from_checkpoint parameter. However, when training continues, it still saves checkpoints under names corresponding to the first save steps (e.g. checkpoint-4, even though resume_from_checkpoint should resume from checkpoint-4096). The progress bar also counts up through all of max_steps, even though I don't want training to start from the beginning.

Is this a common problem? How do I fix this?

I save my training arguments in a YAML file:

training_args:
  learning_rate: !!float 1e-4
  do_train: true
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  logging_steps: 1024
  output_dir: /path/to/training_output/
  overwrite_output_dir: False
  remove_unused_columns: False
  save_strategy: steps
  evaluation_strategy: steps
  save_steps: 1024
  load_best_model_at_end: True
  warmup_steps: 100
  max_steps: 65536
  seed: 22
  resume_from_checkpoint: /path/to/checkpoint-4096

and then train the model by initialising a TrainingArguments object with these as **kwargs.
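Concretely, the loading step looks roughly like this (a minimal sketch, not my exact script; the YAML is abridged and parsed with PyYAML's safe_load, which I'm assuming here). One detail worth noting from the TrainingArguments docstring: the resume_from_checkpoint field is not consumed by Trainer automatically; it is intended to be passed explicitly to trainer.train() by your own script.

```python
import yaml  # PyYAML, assumed for parsing the config file

# Abridged copy of the YAML config above, inlined for illustration.
config_text = """
training_args:
  learning_rate: !!float 1e-4
  do_train: true
  per_device_train_batch_size: 8
  output_dir: /path/to/training_output/
  save_steps: 1024
  max_steps: 65536
  resume_from_checkpoint: /path/to/checkpoint-4096
"""

# Parse the YAML into a plain dict of keyword arguments.
cfg = yaml.safe_load(config_text)["training_args"]

# In the real script these kwargs feed TrainingArguments:
#   args = TrainingArguments(**cfg)
#   trainer = Trainer(model=model, args=args, ...)
#   trainer.train()
#
# Note: resume_from_checkpoint ends up as a field on the
# TrainingArguments object, but trainer.train() only resumes
# when the path is passed to it explicitly:
#   trainer.train(resume_from_checkpoint=cfg["resume_from_checkpoint"])

print(cfg["resume_from_checkpoint"])
```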

But the terminal displays:

Saving model checkpoint to /path/to/checkpoint-4

And the progress bar shows all the steps, even though I need it to start from step 4096.

