Channel: Active questions tagged python - Stack Overflow

OS Error When Trying to Import an Installed Python Wheel Package on Azure Databricks


I have a wheel package called my_sdk.whl that I developed and built locally.
I also tested this package in a virtual environment using pip install my_sdk.whl and exercised its modules in a local PySpark application; everything works perfectly.

Then I uploaded it to the Databricks file system at dbfs:/libraries/my_sdk.whl and installed it on my interactive cluster using the Libraries tab on the Compute page. After restarting the cluster and seeing a successful installation, I tried using it in a Databricks Repos notebook, i.e.

import my_sdk

Executing the above code takes 10-20 minutes stuck in the Running command... status.

After that, I get the following error:

OSError: [Errno 5] Input/output error: '/Workspace/Repos/my-userxxx/path/to/notebooks'
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
File <command-3809752991307962>, line 1
----> 1 import my_sdk
File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1002, in _find_and_load_unlocked(name, import_)
File <frozen importlib._bootstrap>:945, in _find_spec(name, path, target)
File <frozen importlib._bootstrap_external>:1439, in find_spec(cls, fullname, path, target)
File <frozen importlib._bootstrap_external>:1411, in _get_spec(cls, fullname, path, target)
File <frozen importlib._bootstrap_external>:1548, in find_spec(self, fullname, target)
File <frozen importlib._bootstrap_external>:1591, in _fill_cache(self)
OSError: [Errno 5] Input/output error: '/Workspace/Repos/my-userxxx/path/to/notebooks'
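The bottom frame of the traceback is FileFinder._fill_cache, which is the import machinery calling os.listdir on a sys.path entry (here, the notebook's working directory). As a diagnostic sketch using only the standard library, one could time that same listing for every sys.path entry to localize which directory is slow or failing:

```python
import os
import sys
import time

def probe_paths(paths):
    """Return {path: (seconds, entry_count_or_status)} for each entry,
    mimicking the os.listdir call that FileFinder._fill_cache performs."""
    results = {}
    for p in paths:
        start = time.perf_counter()
        try:
            outcome = len(os.listdir(p)) if os.path.isdir(p) else "not a dir"
        except OSError as exc:
            outcome = f"OSError: {exc}"
        results[p] = (time.perf_counter() - start, outcome)
    return results

if __name__ == "__main__":
    # Run in the notebook: a multi-second or erroring entry is the culprit.
    for path, (seconds, outcome) in probe_paths(sys.path).items():
        print(f"{seconds:8.4f}s  {outcome!s:>12}  {path!r}")
```

An entry that takes minutes to list, or raises Errno 5, would explain both the long Running command... phase and the intermittent OSError.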

Any idea why I am getting this?

Additional Info:

  • The import takes too long to execute; sometimes it eventually succeeds, and sometimes it does not.

  • One thing I noticed: when I run a notebook outside the Repos folder with a single cell containing import my_sdk, the package imports without any issues. I believe this has to do with the library precedence described in the Microsoft documentation, specifically the second precedence, "Libraries in the Repo (Git folder) root directory (Repos only)". The root folder of my Repos workspace contains both ADF resources and Databricks resources, which may be why Databricks takes so long to search for a matching Python package.

  • After fixing this, I will run the notebook on a Job Cluster and orchestrate it using Azure Databricks.

  • I am using Windows 10 and Python 3.10.11 to compile the wheel package.

  • The command I used to compile the wheel package is python -m build --wheel

  • My interactive cluster runs Databricks Runtime 13.3 and auto-terminates after 20 minutes.

  • The setup.py file contains the following:

"""Setup.py script for packaging project."""from setuptools import setup, find_packagesimport osdef read_pip_requirements(filename: str):    filepath = os.path.join(os.path.dirname(__file__), filename)    with open(filepath) as f:        return f.readlines()if __name__ == '__main__':    sdk_version = os.environ.get("BUILD_NUMBER")    if sdk_version is None:        raise ValueError("SDK Version Cannot be Null. Did you initialized the BUILD_NUMBER variable?")    setup(        name="my_sdk",        version=sdk_version,        package_dir={"": "src"},        packages=find_packages(where="src", include=["my_sdk*"]),        description="Software Development Kit for My Project",        install_requires=["pyspark==3.4.1"]    )

I tried the following, but I still face the long-running cell during the import, and after waiting I randomly get either an OSError or a successful import:

  • Running %pip freeze, which shows that the package was installed @ file:///local_disk0/tmp/addedFile375359a4e2e749fba4206df7c97999b07096403526362698460/my_sdk-10003-py3-none-any.whl
  • Opening the web terminal and running python to check whether I can import my_sdk; there it imports immediately without any issue
  • Restarting the interactive cluster and re-running the notebook
  • Spinning up a job cluster from an ADF Databricks Notebook activity with the wheel package configured under Append libraries
  • Using an older Databricks Runtime version
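None of the attempts above change which directories the import system scans first. One workaround worth trying (an assumption on my part, not verified on this cluster) is to drop the Repos working directory from sys.path before importing, so the wheel in site-packages is found without listing the large Git folder:

```python
import sys

def drop_repos_entries(paths, prefix="/Workspace/Repos"):
    """Return a copy of paths with entries under the Repos tree removed."""
    return [p for p in paths if not p.startswith(prefix)]

# Illustrative sys.path as it might look inside a Repos notebook
# (these concrete entries are assumptions, not copied from the cluster):
example = [
    "/Workspace/Repos/my-userxxx/path/to/notebooks",
    "/databricks/python/lib/python3.10/site-packages",
    "/usr/lib/python3.10",
]
print(drop_repos_entries(example))
# In the notebook itself one would run:
#   sys.path = drop_repos_entries(sys.path)
#   import my_sdk
```

The trade-off is that relative imports of modules living next to the notebook stop working for the rest of the session.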
