Channel: Active questions tagged python - Stack Overflow

Trial Cloning and Resuming in Ray Tune

Consider the following code (condensed for brevity):

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining
from torch.optim import Adam
from torch.utils.data import DataLoader

def modelTrainer(config):
    model = myModel(dim1=config['dim1'], dim2=config['dim2'], dropout=config['dropout'])
    optimizer = Adam(model.parameters(), lr=config['lr'])
    trainDataset, valDataset, testDataset = loadData()
    trainData = DataLoader(trainDataset, shuffle=True)
    valData = DataLoader(valDataset, shuffle=True)
    if checkpoint:  # pseudo-code: restore state if a checkpoint exists
        model.load_state_dict(...)
    model.train()  # train model on trainDataset
    ...
    model.eval()  # evaluate model on valDataset
    ...

if __name__ == "__main__":
    paramSpace = {'dim1': tune.grid_search([10]), 'dim2': tune.grid_search([32, 64])}
    pbtParamSpace = {'dropout': tune.uniform(0.1, 0.3), 'lr': tune.loguniform(1e-4, 1e-1)}
    pbtScheduler = PopulationBasedTraining(hyperparam_mutations=pbtParamSpace)
    tuner = tune.Tuner(modelTrainer, param_space=paramSpace, tune_config=tune.TuneConfig(scheduler=pbtScheduler))
    results = tuner.fit()
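For context on what I mean by "execute every line": my understanding is that a function trainable is called again from the top whenever a trial is started, cloned, or restored, so any state that must survive has to come from the checkpoint. Here is a pure-Python sketch of that contract (no Ray APIs; a plain dict stands in for model.state_dict(), and the checkpoint handling is a simplification of Ray's actual mechanism):

```python
# Simplified simulation of the trainable contract: the whole function body
# runs again on start/clone/restore, and persistent state is recovered
# only through the checkpoint argument. This is a sketch, not Ray's API.

def model_trainer(config, checkpoint=None, num_epochs=3):
    # Every invocation executes all of these lines again...
    weights = {'w': 0.0}           # stand-in for model.state_dict()
    start_epoch = 0
    if checkpoint is not None:     # ...but learned state is restored here.
        weights = dict(checkpoint['weights'])
        start_epoch = checkpoint['epoch'] + 1
    history = []
    for epoch in range(start_epoch, num_epochs):
        weights['w'] += config['lr']          # fake "training step"
        history.append((epoch, weights['w']))
    final_checkpoint = {'weights': weights, 'epoch': num_epochs - 1}
    return history, final_checkpoint

# First run: trains epochs 0-2 from scratch.
hist1, ckpt = model_trainer({'lr': 0.1})
# "Cloned" run with a perturbed lr: the function body runs again from the
# top, but picks up the cloned weights and continues from epoch 3.
hist2, _ = model_trainer({'lr': 0.2}, checkpoint=ckpt, num_epochs=5)
```

If this model of the contract is right, then loadData() also re-runs on every restore, which is what motivates my second question below.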

I am tuning dropout and lr for two possible model configurations of [dim1, dim2]: [10, 32] and [10, 64]. I have the following questions:

  • Does every trial in Ray execute every line of code in the Trainable (modelTrainer) when a trial is started, cloned, or restored?
  • Should the line trainDataset, valDataset, testDataset = loadData() live outside the Trainable? Otherwise, when a trial is resumed or cloned, a fresh (possibly re-shuffled) split could mix the three sets across epochs.
  • Suppose that after a perturbation interval the [10, 64] model has a better metric than the [10, 32] model. With a quantile fraction of 0.5, will Tune then run two versions of the [10, 64] model from the next epoch onward: the original, and a clone of it with perturbed dropout and lr values?
  • Do I need extra code for the perturbed parameters to be applied to the cloned trial? In other words, what prevents the original and the cloned trial from running with the same set of parameters?
  • With the PBT scheduler, is the implicit assumption that all the parameters in hyperparam_mutations apply to every possible combination of parameters in param_space?
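On the fourth question, my reading of the Ray docs is that the scheduler itself perturbs the cloned trial's config during the explore step (by default: resample from the distribution with probability 0.25, otherwise multiply a continuous value by 1.2 or 0.8), so no user code should be needed. A simplified sketch of that behavior, using a hypothetical explore helper rather than Ray's internal code:

```python
import random

# Simplified sketch of PBT's explore step (mirrors the documented default:
# with probability resample_prob, resample from the distribution; otherwise
# multiply the current continuous value by 1.2 or 0.8). In Ray this logic
# lives inside the PopulationBasedTraining scheduler, not in user code.

def explore(config, mutations, resample_prob=0.25, rng=random):
    new_config = dict(config)
    for key, sampler in mutations.items():
        if rng.random() < resample_prob:
            new_config[key] = sampler()                 # fresh sample
        else:
            factor = 1.2 if rng.random() > 0.5 else 0.8
            new_config[key] = config[key] * factor      # perturb in place
    return new_config

mutations = {
    'dropout': lambda: random.uniform(0.1, 0.3),
    'lr': lambda: 10 ** random.uniform(-4, -1),  # ~ loguniform(1e-4, 1e-1)
}
parent = {'dim1': 10, 'dim2': 64, 'dropout': 0.2, 'lr': 1e-3}
clone = explore(parent, mutations)
# dim1/dim2 are untouched: only keys listed in hyperparam_mutations change,
# so the clone differs from the parent in dropout and lr but not in shape.
```

If that is accurate, it would also answer the fifth question: param_space keys outside hyperparam_mutations (like dim1 and dim2) are simply copied into the clone unchanged.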



