from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Datacamp Pyspark Tutorial").config("spark.memory.offHeap.enabled","true").config("spark.memory.offHeap.size","10g").getOrCreate()
df = spark.read.csv('datacamp_ecommerce.csv',header=True,escape="\"")
df.show(5,0)
When I run it with the Run Python File option (shown in the image above), it successfully displays the output in the TERMINAL tab of VSCode.
But when I use the first option, Run Code (shown in the image above), it throws the following error in the OUTPUT tab of VSCode.
[Running] python -u "c:\VSCode_PyProjects\VSCode_PysparkProj\venv\test.py"
Traceback (most recent call last):
  File "c:\VSCode_PyProjects\VSCode_PysparkProj\venv\test.py", line 2, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'
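To narrow this down, a small diagnostic could be run with both options and the results compared. This is only a sketch: it assumes (not confirmed here) that the two options may be launching different Python interpreters, and `describe_environment` is a hypothetical helper name made up for illustration.

```python
import importlib.util
import sys


def describe_environment(module_name="pyspark"):
    """Report the running interpreter and whether module_name is importable from it.

    Hypothetical helper for illustration; the default module name matches
    the one in the error message.
    """
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,  # full path of the python that is running this file
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),  # where the module would be loaded from
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the `interpreter` path differs between the TERMINAL and OUTPUT tabs, the two options are not using the same Python installation, which would be consistent with `pyspark` being visible to one and not the other.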
Remarks:
- As shown in the image below, the env folder contains pyspark.
- I have python, spark, pyspark, and java installed on Windows 10 using this post.
- I've also installed the popular Code Runner extension in VSCode.
Question: Why does the code work fine with Run Python File but not with the Run Code option? What am I missing here, and how can I fix the issue?
