I have a wheel package called my_sdk.whl
that I have developed and built locally.
I also tested this package in a virtual environment using pip install my_sdk.whl and exercised its modules from a local PySpark application; everything works perfectly.
Now I have uploaded it to the Databricks file system at the path dbfs:/libraries/my_sdk.whl
and installed it on my interactive cluster via the Libraries tab on the Compute page. After restarting the cluster and a successful installation, I tried using it in a Databricks Repos notebook, i.e.
import my_sdk
Executing the above code sits in the Running command... status for 10-20 minutes,
and then I get the following error:
OSError: [Errno 5] Input/output error: '/Workspace/Repos/my-userxxx/path/to/notebooks'
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
File <command-3809752991307962>, line 1
----> 1 import my_sdk
File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1002, in _find_and_load_unlocked(name, import_)
File <frozen importlib._bootstrap>:945, in _find_spec(name, path, target)
File <frozen importlib._bootstrap_external>:1439, in find_spec(cls, fullname, path, target)
File <frozen importlib._bootstrap_external>:1411, in _get_spec(cls, fullname, path, target)
File <frozen importlib._bootstrap_external>:1548, in find_spec(self, fullname, target)
File <frozen importlib._bootstrap_external>:1591, in _fill_cache(self)
OSError: [Errno 5] Input/output error: '/Workspace/Repos/my-userxxx/path/to/notebooks'
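For context, the failing frame (_fill_cache) is where Python lists the contents of each directory on the module search path during an import. A sketch of the same call that fails on the cluster (the path is copied from the traceback and will not exist locally, so the except branch fires off-cluster):

```python
import os

# Path taken from the traceback above; on the cluster this listdir is what
# raises OSError [Errno 5], suggesting a filesystem-level problem with the
# Repos mount rather than the wheel itself.
repo_path = "/Workspace/Repos/my-userxxx/path/to/notebooks"
try:
    entries = os.listdir(repo_path)
    print(len(entries), "entries")
except OSError as e:
    print("listdir failed:", e)
```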
Any idea why I am getting this?
Additional Info:
The import takes too long to execute; sometimes the package eventually imports successfully, and sometimes it does not.
One thing I noticed is that when I run a notebook outside the Repos folder with a single cell containing import my_sdk, the package imports without any issue. I believe this has to do with library precedence as described in the Microsoft documentation; the second precedence level is "Libraries in the Repo (Git folder) root directory (Repos only)". Since the root folder of my Repos workspace contains both ADF resources and Databricks resources, Databricks may be spending a long time searching it for a matching Python package.
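To sanity-check this theory, the module search path can be dumped from a notebook cell; a minimal sketch (on Databricks I would expect a /Workspace/Repos/... entry near the front of the list when running from a Repos notebook):

```python
import sys

# Entries are searched in order on import; a Repos path early in the list
# means the repo root is scanned before site-packages, which would explain
# the slow import when the repo root is large.
for i, p in enumerate(sys.path):
    print(i, p)
```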
After fixing this, I will run the notebook on a Job Cluster and orchestrate it using Azure Databricks.
I am using Windows 10 and Python 3.10.11 to build the wheel package.
The command I used to build the wheel package is
python -m build --wheel
My interactive cluster is on Databricks Runtime 13.3 and auto-terminates after 20 minutes of inactivity.
The setup.py file contains the following:
"""Setup.py script for packaging project."""from setuptools import setup, find_packagesimport osdef read_pip_requirements(filename: str): filepath = os.path.join(os.path.dirname(__file__), filename) with open(filepath) as f: return f.readlines()if __name__ == '__main__': sdk_version = os.environ.get("BUILD_NUMBER") if sdk_version is None: raise ValueError("SDK Version Cannot be Null. Did you initialized the BUILD_NUMBER variable?") setup( name="my_sdk", version=sdk_version, package_dir={"": "src"}, packages=find_packages(where="src", include=["my_sdk*"]), description="Software Development Kit for My Project", install_requires=["pyspark==3.4.1"] )
I tried the following, but I still face the long-running import cell and, after waiting, randomly get either an OSError or a successful import:
- Running the %pip freeze command, which shows that the package was installed @file:///local_disk0/tmp/addedFile375359a4e2e749fba4206df7c97999b07096403526362698460/my_sdk-10003-py3-none-any.whl
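A related check (a sketch, not something the docs prescribe): importlib.util.find_spec walks sys.path the same way import does, so printing the resolved origin shows which copy of the package would actually win, without triggering a full import in the notebook:

```python
import importlib.util

# find_spec returns None when the package is absent rather than raising,
# so this cell is safe to run even on a machine without the wheel installed.
spec = importlib.util.find_spec("my_sdk")
print(spec.origin if spec else "my_sdk not found on sys.path")
```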
- Opening the web terminal and running python to see if I can import my_sdk; there it imports instantly without any issue
- Restarting the interactive cluster and running the notebook
- Spinning up a job cluster from an ADF Databricks Notebook activity with the wheel package configured under Append libraries
- Using an older Databricks Runtime version
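One workaround I am considering (a hedged sketch, not yet verified on my cluster): strip the /Workspace/Repos entries from sys.path at the top of the notebook, so the import resolves straight from site-packages instead of scanning the repo root first. The path prefix is taken from the traceback above:

```python
import sys

# Drop Repos entries from the module search path so the import skips the
# repo-root directory scan that appears to be raising the I/O error.
sys.path = [p for p in sys.path if not p.startswith("/Workspace/Repos")]

# import my_sdk  # would now resolve from site-packages only
print([p for p in sys.path if p.startswith("/Workspace/Repos")])
```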