
Call to cuDevicePrimaryCtxRetain results in CUDA_ERROR_OUT_OF_MEMORY


I'm trying to do some backward/forward selection with cuML and the SFS library. SFS isn't available in CUDA, so I imported it from the Scikit-learn library. However, when I use n_jobs=-1 I get some errors. My code does work when I use n_jobs=13, but unfortunately that causes the GPU to be used only 50%, and when I use n_jobs=1 it uses 100% of the CPU and very little GPU. I would like to find a way to use n_jobs=-1 that would lead to using the GPU instead of the CPU.
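A likely reason for the difference between the n_jobs settings: with joblib's default loky backend, n_jobs=-1 expands to one worker process per CPU core, and each worker re-imports cuml/cudf and retains its own CUDA primary context on the single GPU, multiplying device-memory use by the worker count. A minimal sketch (pure joblib, no GPU needed) of how many workers -1 actually means on a given machine:

```python
import os

from joblib import effective_n_jobs

# effective_n_jobs resolves the special value -1 to the number of CPUs
# joblib can see; with the loky backend, each of these workers would try
# to hold its own CUDA context on the one GPU.
workers = effective_n_jobs(-1)
print(f"n_jobs=-1 resolves to {workers} workers (os.cpu_count()={os.cpu_count()})")
```

If each worker's CUDA context plus model state needs more than (GPU memory / worker count), the later workers fail exactly as in the traceback below.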

My code:

combined_df=cpd.concat([train_df,evaluate_df])
cupy.asarray(combined_df.iloc[:,2:].values)
combined_df=combined_df.astype('float32')
test_fold = [0] * len(train_df) + [1] * len(evaluate_df)
model = cuRF(n_estimators=100, max_depth=10, n_streams=1, verbose=6, random_state=42)
sfs=SFS(model, n_features_to_select=2, direction='forward', scoring='r2', n_jobs=-1)#,cv=1PredefinedSplit(test_fold=test_fold)
#result=sfs.fit(combined_df.iloc[:,2:].to_cupy(dtype='float32'),combined_df['Mcap_w'].to_cupy(dtype='float32'))#.to_numpy(dtype='float32').to_numpy(dtype='float32')
result=sfs.fit(cupy.asarray(combined_df.iloc[:,2:].values).get(), cupy.asarray(combined_df['Mcap_w'].values).get())
sfs_df = pd.DataFrame({'Feature': combined_df.iloc[:,2:].columns, 'SFS_filter': sfs.get_support().tolist()})
sfs_df[sfs_df['SFS_filter']==True]['Feature'].tolist()
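For reference, here is a CPU-only sketch of the same selection pattern, with scikit-learn's RandomForestRegressor standing in for cuRF and synthetic data standing in for the real DataFrames (the feature names and data here are made up, purely to show the SFS/get_support flow end to end):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector as SFS

# Synthetic stand-in data: only 'f0' actually drives the target.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(100, 4)), columns=['f0', 'f1', 'f2', 'f3'])
y = 2.0 * X['f0'] + rng.normal(scale=0.1, size=100)

# Same SFS call shape as the question, but with a CPU estimator and n_jobs=1.
model = RandomForestRegressor(n_estimators=10, max_depth=5, random_state=42)
sfs = SFS(model, n_features_to_select=2, direction='forward', scoring='r2', n_jobs=1)
sfs.fit(X.values, y.values)

# get_support() returns a boolean mask over the input columns.
selected = X.columns[sfs.get_support()].tolist()
print(selected)
```

This runs entirely on the CPU, which also isolates whether the crash is specific to spawning extra processes around a GPU estimator.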

The error that I got:

_RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 426, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/__init__.py", line 17, in <module>
    from cuml.internals.base import Base, UniversalBase
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/__init__.py", line 18, in <module>
    from cuml.internals.base_helpers import BaseMetaClass, _tags_class_and_instance
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/base_helpers.py", line 20, in <module>
    from cuml.internals.api_decorators import (
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 24, in <module>
    from cuml.internals import input_utils as iu
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/input_utils.py", line 19, in <module>
    from cuml.internals.array import CumlArray
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/array.py", line 21, in <module>
    from cuml.internals.global_settings import GlobalSettings
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/global_settings.py", line 20, in <module>
    from cuml.internals.device_type import DeviceType
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/device_type.py", line 19, in <module>
    from cuml.internals.mem_type import MemoryType
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/mem_type.py", line 22, in <module>
    cudf = gpu_only_import("cudf")
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cuml/internals/safe_imports.py", line 356, in gpu_only_import
    return importlib.import_module(module)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/__init__.py", line 27, in <module>
    from cudf.core.algorithms import factorize
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/core/algorithms.py", line 10, in <module>
    from cudf.core.indexed_frame import IndexedFrame
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 59, in <module>
    from cudf.core.groupby.groupby import GroupBy
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/core/groupby/__init__.py", line 3, in <module>
    from cudf.core.groupby.groupby import GroupBy, Grouper
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/core/groupby/groupby.py", line 31, in <module>
    from cudf.core.udf.groupby_utils import _can_be_jitted, jit_groupby_apply
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/core/udf/groupby_utils.py", line 11, in <module>
    import cudf.core.udf.utils
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/core/udf/utils.py", line 66, in <module>
    _PTX_FILE = _get_ptx_file(os.path.dirname(__file__), "shim")
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/utils/_numba.py", line 47, in _get_ptx_file
    dev = cuda.get_current_device()
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/numba/cuda/api.py", line 443, in get_current_device
    return current_context().device
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 220, in get_context
    return _runtime.get_or_create_context(devnum)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 138, in get_or_create_context
    return self._get_or_create_context_uncached(devnum)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 155, in _get_or_create_context_uncached
    return self._activate_context_for(0)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 177, in _activate_context_for
    newctx = gpu.get_primary_context()
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 675, in get_primary_context
    driver.cuDevicePrimaryCtxRetain(byref(hctx), self.id)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 331, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "/home/user/miniconda3/envs/rapid/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 399, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuDevicePrimaryCtxRetain results in CUDA_ERROR_OUT_OF_MEMORY
"""

The above exception was the direct cause of the following exception:

BrokenProcessPool                         Traceback (most recent call last)
Cell In[11], line 12
      9 sfs=SFS(model,n_features_to_select=2,direction='forward',scoring='r2',n_jobs=-1)#,cv=1PredefinedSplit(test_fold=test_fold)
     10 #result=sfs.fit(combined_df.iloc[:,2:].to_cupy(dtype='float32'),combined_df['Mcap_w'].to_cupy(dtype='float32'))#.to_numpy(dtype='float32').to_numpy(dtype='float32')
---> 12 result=sfs.fit(cupy.asarray(combined_df.iloc[:,2:].values).get(),cupy.asarray(combined_df['Mcap_w'].values).get())
     14 #result=Parallel(prefer="threads")(delayed(sfs.fit)(cupy.asarray(combined_df.iloc[:,2:].values).get(),cupy.asarray(combined_df['Mcap_w'].values).get()))
     15 sfs_df = pd.DataFrame({'Feature': combined_df.iloc[:,2:].columns, 'SFS_filter': sfs.get_support().tolist()})

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/base.py:1351, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1344 estimator._validate_params()
   1346 with config_context(
   1347     skip_parameter_validation=(
   1348         prefer_skip_nested_validation or global_skip_validation
   1349     )
   1350 ):
-> 1351     return fit_method(estimator, *args, **kwargs)

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/feature_selection/_sequential.py:251, in SequentialFeatureSelector.fit(self, X, y)
    249 is_auto_select = self.tol is not None and self.n_features_to_select == "auto"
    250 for _ in range(n_iterations):
--> 251     new_feature_idx, new_score = self._get_best_new_feature_score(
    252         cloned_estimator, X, y, cv, current_mask
    253     )
    254     if is_auto_select and ((new_score - old_score) < self.tol):
    255         break

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/feature_selection/_sequential.py:282, in SequentialFeatureSelector._get_best_new_feature_score(self, estimator, X, y, cv, current_mask)
    280 candidate_mask = ~candidate_mask
    281 X_new = X[:, candidate_mask]
--> 282 scores[feature_idx] = cross_val_score(
    283     estimator,
    284     X_new,
    285     y,
    286     cv=cv,
    287     scoring=self.scoring,
    288     n_jobs=self.n_jobs,
    289 ).mean()
    290 new_feature_idx = max(scores, key=lambda feature_idx: scores[feature_idx])
    291 return new_feature_idx, scores[new_feature_idx]

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/utils/_param_validation.py:213, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    207 try:
    208     with config_context(
    209         skip_parameter_validation=(
    210             prefer_skip_nested_validation or global_skip_validation
    211         )
    212     ):
--> 213         return func(*args, **kwargs)
    214 except InvalidParameterError as e:
    215     # When the function is just a wrapper around an estimator, we allow
    216     # the function to delegate validation to the estimator, but we replace
    217     # the name of the estimator by the name of the function in the error
    218     # message to avoid confusion.
    219     msg = re.sub(
    220         r"parameter of \w+ must be",
    221         f"parameter of {func.__qualname__} must be",
    222         str(e),
    223     )

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:714, in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, params, pre_dispatch, error_score)
    711 # To ensure multimetric format is not supported
    712 scorer = check_scoring(estimator, scoring=scoring)
--> 714 cv_results = cross_validate(
    715     estimator=estimator,
    716     X=X,
    717     y=y,
    718     groups=groups,
    719     scoring={"score": scorer},
    720     cv=cv,
    721     n_jobs=n_jobs,
    722     verbose=verbose,
    723     fit_params=fit_params,
    724     params=params,
    725     pre_dispatch=pre_dispatch,
    726     error_score=error_score,
    727 )
    728 return cv_results["test_score"]

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/utils/_param_validation.py:213, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    207 try:
    208     with config_context(
    209         skip_parameter_validation=(
    210             prefer_skip_nested_validation or global_skip_validation
    211         )
    212     ):
--> 213         return func(*args, **kwargs)
    214 except InvalidParameterError as e:
    215     # When the function is just a wrapper around an estimator, we allow
    216     # the function to delegate validation to the estimator, but we replace
    217     # the name of the estimator by the name of the function in the error
    218     # message to avoid confusion.
    219     msg = re.sub(
    220         r"parameter of \w+ must be",
    221         f"parameter of {func.__qualname__} must be",
    222         str(e),
    223     )

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:425, in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, params, pre_dispatch, return_train_score, return_estimator, return_indices, error_score)
    422 # We clone the estimator to make sure that all the folds are
    423 # independent, and that it is pickle-able.
    424 parallel = Parallel(n_jobs=n_jobs, verbose=verbose, pre_dispatch=pre_dispatch)
--> 425 results = parallel(
    426     delayed(_fit_and_score)(
    427         clone(estimator),
    428         X,
    429         y,
    430         scorer=scorers,
    431         train=train,
    432         test=test,
    433         verbose=verbose,
    434         parameters=None,
    435         fit_params=routed_params.estimator.fit,
    436         score_params=routed_params.scorer.score,
    437         return_train_score=return_train_score,
    438         return_times=True,
    439         return_estimator=return_estimator,
    440         error_score=error_score,
    441     )
    442     for train, test in indices
    443 )
    445 _warn_or_raise_about_fit_failures(results, error_score)
    447 # For callable scoring, the return type is only know after calling. If the
    448 # return type is a dictionary, the error scores can now be inserted with
    449 # the correct key.

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/utils/parallel.py:67, in Parallel.__call__(self, iterable)
     62 config = get_config()
     63 iterable_with_config = (
     64     (_with_config(delayed_func, config), args, kwargs)
     65     for delayed_func, args, kwargs in iterable
     66 )
---> 67 return super().__call__(iterable_with_config)

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/joblib/parallel.py:1952, in Parallel.__call__(self, iterable)
   1946 # The first item from the output is blank, but it makes the interpreter
   1947 # progress until it enters the Try/Except block of the generator and
   1948 # reach the first yield statement. This starts the aynchronous
   1949 # dispatch of the tasks to the workers.
   1950 next(output)
-> 1952 return output if self.return_generator else list(output)

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/joblib/parallel.py:1595, in Parallel._get_outputs(self, iterator, pre_dispatch)
   1592     yield
   1594     with self._backend.retrieval_context():
-> 1595         yield from self._retrieve()
   1597 except GeneratorExit:
   1598     # The generator has been garbage collected before being fully
   1599     # consumed. This aborts the remaining tasks if possible and warn
   1600     # the user if necessary.
   1601     self._exception = True

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/joblib/parallel.py:1699, in Parallel._retrieve(self)
   1692 while self._wait_retrieval():
   1694     # If the callback thread of a worker has signaled that its task
   1695     # triggered an exception, or if the retrieval loop has raised an
   1696     # exception (e.g. GeneratorExit), exit the loop and surface the
   1697     # worker traceback.
   1698     if self._aborting:
-> 1699         self._raise_error_fast()
   1700         break
   1702     # If the next job is not ready for retrieval yet, we just wait for
   1703     # async callbacks to progress.

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/joblib/parallel.py:1734, in Parallel._raise_error_fast(self)
   1730 # If this error job exists, immediatly raise the error by
   1731 # calling get_result. This job might not exists if abort has been
   1732 # called directly or if the generator is gc'ed.
   1733 if error_job is not None:
-> 1734     error_job.get_result(self.timeout)

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/joblib/parallel.py:736, in BatchCompletionCallBack.get_result(self, timeout)
    730 backend = self.parallel._backend
    732 if backend.supports_retrieve_callback:
    733     # We assume that the result has already been retrieved by the
    734     # callback thread, and is stored internally. It's just waiting to
    735     # be returned.
--> 736     return self._return_or_raise()
    738 # For other backends, the main thread needs to run the retrieval step.
    739 try:

File ~/miniconda3/envs/rapid/lib/python3.10/site-packages/joblib/parallel.py:754, in BatchCompletionCallBack._return_or_raise(self)
    752 try:
    753     if self.status == TASK_ERROR:
--> 754         raise self._result
    755     return self._result
    756 finally:

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
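The BrokenProcessPool here is a downstream symptom: loky spawns fresh worker processes, and each one dies while importing cudf/cuml because cuDevicePrimaryCtxRetain fails. One possible workaround (a sketch only, and not guaranteed to be faster, since Python threads share the GIL) is to force joblib's threading backend so all jobs share the parent process's single CUDA context instead of each worker creating its own. The example below demonstrates the mechanism on CPU with a scikit-learn estimator as a stand-in:

```python
from joblib import parallel_backend
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# With the threading backend no worker processes are spawned, so nothing
# has to be pickled and no extra CUDA contexts would be created: the
# n_jobs=-1 inside cross_val_score runs as threads in this process.
with parallel_backend('threading', n_jobs=-1):
    scores = cross_val_score(
        RandomForestRegressor(n_estimators=10, random_state=0),
        X, y, cv=3, n_jobs=-1,
    )
print(scores.shape)  # one r2 score per fold
```

Whether this actually raises GPU utilization for cuML depends on how much of each fit releases the GIL, which would need to be measured.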

While the code was running, I ran nvidia-smi:

(screenshot of nvidia-smi output)

In this image you can clearly see the development of the RAM and GPU usage: (screenshot of RAM and GPU usage over time)

As is clear in the picture, I have 8 GB of RAM, but it uses only 1/8 of the capacity. Is it possible for me to use n_jobs=-1, or should I stick to n_jobs=13?
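A side note on the commented-out PredefinedSplit in the code above: with test_fold = [0] * len(train_df) + [1] * len(evaluate_df), scikit-learn generates two splits, because every non-negative fold label is used as a test set once. To hold out only evaluate_df and never test on the training rows, the training rows should be labelled -1. A small self-contained check (with made-up lengths of 3 train rows and 2 evaluation rows):

```python
from sklearn.model_selection import PredefinedSplit

# -1 means "never in a test set"; rows labelled 0 form the single test fold.
test_fold = [-1, -1, -1, 0, 0]
ps = PredefinedSplit(test_fold=test_fold)

print(ps.get_n_splits())  # 1
train_idx, test_idx = next(iter(ps.split()))
print(train_idx, test_idx)  # rows 0-2 train, rows 3-4 test
```

By contrast, PredefinedSplit(test_fold=[0, 0, 0, 1, 1]) reports two splits, which doubles the number of fits SFS performs per candidate feature.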

