from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("Datacamp Pyspark Tutorial")
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "10g")
    .getOrCreate()
)
df = spark.read.csv('datacamp_ecommerce.csv', header=True, escape="\"")
df.show(5, 0)

When I run the file using the Run Python File option (shown in the image above), it successfully displays the output in the TERMINAL tab of VSCode.

But when I use the first option, Run Code (shown in the image above), it throws the following error in the OUTPUT tab of VSCode.

[Running] python -u "c:\VSCode_PyProjects\VSCode_PysparkProj\venv\test.py"
Traceback (most recent call last):
  File "c:\VSCode_PyProjects\VSCode_PysparkProj\venv\test.py", line 2, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'

Remarks:
- As shown in the image below, the env folder contains pyspark.
- I have python, spark, pyspark, and java installed on Windows 10 using this post.
- I've also installed the popular Code Runner extension in VSCode.

Question: Why does the code work fine with Run Python File, but not with the Run Code option? What might I be missing here, and how can I fix the issue?
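
One way to narrow this down is to check whether the two options launch the same Python interpreter. That they might not is only an assumption here, suggested by the bare `python -u` in the [Running] line above. The sketch below could be saved as a separate file and run once with each option. The helper name `describe_interpreter` is made up for illustration; it only uses the standard library.

```python
import importlib.util
import sys


def describe_interpreter(module_name="pyspark"):
    """Report which interpreter is running and whether module_name is importable.

    describe_interpreter is a hypothetical helper for this check, not part of
    pyspark, VSCode, or Code Runner.
    """
    # find_spec returns None when a top-level module cannot be located
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,   # full path of the running python
        "prefix": sys.prefix,           # environment root (the venv folder, if one is active)
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If `executable` differs between the two runs and `module_found` is False only under Run Code, that would suggest Code Runner is calling a Python outside the environment where pyspark is installed. In that case the extension's `code-runner.executorMap` setting is probably the place to look.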