How do I solve the problem that PyTorch still allocates memory on a GPU that should be invisible, and the allocation fails, even though I have set the visible GPUs?


Even though I set the visible GPUs to 0,2,5,7, I still get an insufficient-memory error during allocation, and the GPU named in the error is GPU 3. Here is my error traceback:

Traceback (most recent call last):
  File "main_moco_files_dataset_strong_aug.py", line 500, in <module>
    main()
  File "main_moco_files_dataset_strong_aug.py", line 187, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/disco/chenwei/lizijing/papercode/PatchSearch-main/main_moco_files_dataset_strong_aug.py", line 357, in main_worker
    train(train_loader, model, optimizer, scaler, summary_writer, epoch, args)
  File "/disco/chenwei/lizijing/papercode/PatchSearch-main/main_moco_files_dataset_strong_aug.py", line 406, in train
    loss = model(images[0], images[1], moco_m)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disco/chenwei/lizijing/papercode/PatchSearch-main/moco/builder.py", line 144, in forward
    q2 = self.predictor(self.base_encoder(x2))
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 307, in forward
    x = self.forward_features(x)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 299, in forward_features
    x = self.blocks(x)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 177, in forward
    x = x + self.drop_path(self.attn(self.norm1(x)))
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/disco/anaconda3/envs/patch_search/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 153, in forward
    attn = attn.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 456.00 MiB (GPU 3; 44.53 GiB total capacity; 42.48 GiB already allocated; 244.44 MiB free; 42.76 GiB reserved in total by PyTorch)

Here is my run configuration:

#!/usr/bin/env bash
set -x
set -e

OUTPUT_DIR='/disco/chenwei/lizijing/code_output_dir'
CODE_DIR='/disco/chenwei/lizijing/papercode/SSL-Backdoor-main/poison-generation/data'
EXPERIMENT_ID='HTBA_trigger_10_targeted_n02106550'

export CUDA_VISIBLE_DEVICES=0,2,5,7

EXP_DIR=$OUTPUT_DIR/$EXPERIMENT_ID/moco
EVAL_DIR=$EXP_DIR/linear
DEFENSE_DIR=$EXP_DIR/patch_search_iterative_search_test_images_size_1000_window_w_60_repeat_patch_1_prune_clusters_True_num_clusters_1000_per_iteration_samples_2_remove_0x25
FILTERED_DIR=$DEFENSE_DIR/patch_search_poison_classifier_topk_20_ensemble_5_max_iterations_2000_seed_4789
RATE='1.00'
SEED=4789

### STEP 1.1: pretrain the model
python main_moco_files_dataset_strong_aug.py \
    --seed $SEED \
    -a vit_base --epochs 200 -b 1024 \
    --stop-grad-conv1 --moco-m-cos \
    --multiprocessing-distributed --world-size 1 --rank 0 \
    --dist-url "tcp://localhost:$(( $RANDOM % 50 + 10000 ))" \
    --save_folder $EXP_DIR \
    $CODE_DIR/$EXPERIMENT_ID/train/loc_random_loc*_rate_${RATE}_targeted_True_*.txt
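For reference, the launch pattern the traceback points to is mp.spawn with one worker process per visible GPU. The following is only a simplified stand-in for my main_worker (the real function in main_moco_files_dataset_strong_aug.py does much more); it just shows how each worker's local index is supposed to map onto the devices that CUDA_VISIBLE_DEVICES leaves exposed:

import torch
import torch.multiprocessing as mp

def main_worker(gpu, ngpus_per_node):
    # `gpu` is the worker's local index (0..ngpus_per_node-1). PyTorch
    # renumbers the devices exposed by CUDA_VISIBLE_DEVICES, so with
    # CUDA_VISIBLE_DEVICES=0,2,5,7 logical index 3 is physical GPU 7.
    torch.cuda.set_device(gpu)
    x = torch.randn(8, 8, device=f"cuda:{gpu}")  # tiny allocation to confirm placement
    print(f"worker {gpu}: tensor on {x.device} ({torch.cuda.get_device_name(gpu)})")

if __name__ == "__main__":
    ngpus_per_node = torch.cuda.device_count()  # should be 4 with 0,2,5,7 visible
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node,))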

I am sure the CUDA_VISIBLE_DEVICES environment variable has been set successfully: I added a print statement to the Python program being executed and confirmed that the variable holds the expected value. What I expect now is for PyTorch to assign the model only to the specified GPUs.
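The check I added looks roughly like this (approximate; the exact print statement in my script may differ slightly):

import os
import torch

# Sanity check near the top of the training script.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))  # expect "0,2,5,7"
print("torch.cuda.device_count() =", torch.cuda.device_count())          # expect 4
for i in range(torch.cuda.device_count()):
    # Indices here are logical: cuda:0..cuda:3 are renumbered over the visible set.
    print(f"cuda:{i} ->", torch.cuda.get_device_name(i))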

