Channel: Active questions tagged python - Stack Overflow

Pyenv - Switching between Python and PySpark versions without hardcoding environment variable paths for python


I have trouble getting different versions of PySpark to work correctly on my Windows machine in combination with different versions of Python installed via pyenv.

The setup:

  1. I installed pyenv and let it set the environment variables (PYENV, PYENV_HOME, PYENV_ROOT and the entry in PATH)
  2. I installed the Amazon Corretto Java JDK (jdk1.8.0_412) and set the JAVA_HOME environment variable.
  3. I downloaded the winutils.exe & hadoop.dll from here and set the HADOOP_HOME environment variable.
  4. Via pyenv I installed Python 3.10.10 and then pyspark 3.4.1
  5. Via pyenv I installed Python 3.8.10 and then pyspark 3.2.1

Python works as expected:

  • I can switch between different versions with pyenv global <version>
  • When I use python --version in PowerShell it always shows the version that I set before with pyenv.
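To double-check that pyenv really switched interpreters (beyond what `python --version` reports), the following sketch prints the path of the interpreter that is actually running; with pyenv-win active it should point into the `.pyenv\pyenv-win\versions\<version>` directory:

```python
import sys

# Show which interpreter is actually executing this script. Under
# pyenv-win, the path should live beneath the versions directory of
# whichever version was last set with `pyenv global <version>`.
print(sys.executable)
print(".".join(map(str, sys.version_info[:3])))
```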

But I'm having trouble with PySpark.

For one, I cannot start PySpark from the PowerShell console by running pyspark; it fails with: The term 'pyspark' is not recognized as the name of a cmdlet, function, script file.....

More annoyingly, my repo-scripts (with a .venv created via pyenv & poetry) also fail:

  • Caused by: java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified [...] Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
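The "Cannot run program "python3"" part of that error suggests PySpark is falling back to the executable name `python3`, which does not exist on a default Windows install. PySpark resolves the worker interpreter from the PYSPARK_PYTHON environment variable (and the driver from PYSPARK_DRIVER_PYTHON). A quick diagnostic sketch to see what is currently in effect:

```python
import os
import sys

# PySpark uses PYSPARK_PYTHON for worker processes and
# PYSPARK_DRIVER_PYTHON for the driver; if neither is set it falls
# back to a default executable name, which on Windows may not resolve.
print("PYSPARK_PYTHON        =", os.environ.get("PYSPARK_PYTHON", "<not set>"))
print("PYSPARK_DRIVER_PYTHON =", os.environ.get("PYSPARK_DRIVER_PYTHON", "<not set>"))
print("current interpreter   =", sys.executable)
```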

However, both work after I add the following two entries to the PATH environment variable:

  • C:\Users\myuser\.pyenv\pyenv-win\versions\3.10.10
  • C:\Users\myuser\.pyenv\pyenv-win\versions\3.10.10\Scripts

But this means I have to "hardcode" the Python version, which is exactly what I want to avoid by using pyenv.

If I hardcode the path, then even after I switch to another Python version (pyenv global 3.8.10), running pyspark in PowerShell still starts PySpark 3.4.1 from the PATH entry for Python 3.10.10. Likewise, python on the command line always points to the hardcoded version, no matter what I do with pyenv.

I was hoping to be able to start PySpark 3.2.1 from Python 3.8.10 which I just "activated" with pyenv globally.

What do I have to do to be able to switch between the Python installations (and thus also between PySparks) with pyenv without "hardcoding" the Python paths?
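One workaround I am considering (a sketch, not a confirmed fix): instead of hardcoding a versioned path, point PySpark at whatever interpreter is currently running, i.e. the one pyenv resolved. PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the documented variables PySpark consults, and they must be set before the SparkSession is created:

```python
import os
import sys

# Point both the driver and the workers at the interpreter that is
# currently active (the one pyenv resolved), rather than a hardcoded
# versions\3.10.10 path. Must run before SparkSession.builder...getOrCreate().
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```

With this at the top of each script, switching versions with pyenv global should automatically switch which Python (and hence which installed PySpark) is used, without touching PATH.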

Example PySpark script:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession
    .builder
    .master("local[*]")
    .appName("myapp")
    .getOrCreate()
)

data = [("Finance", 10),
        ("Marketing", 20),
        ]

df = spark.createDataFrame(data=data)
df.show(10, False)
```
