I am learning Python and using PyCharm. I installed the pyspark package with pip install pyspark. However, when I call the map() method and then collect(), I get an error, and I'm not sure how to resolve it. Can anyone help me with this issue? I am using pyspark 3.5.0 and Python 3.12.1.

If I execute rdd1.collect() directly, it outputs fine. But when I use the map function to perform some operation (lambda x: x + 1), it throws an error.

My code:
from pyspark import SparkConf, SparkContext
import os

os.environ["PYSPARK_PYTHON"] = "D:/python-workSpace/Testpython1/.venv/Scripts/python.exe"

conf = SparkConf().setMaster("local").setAppName("sparkRDD")
sc = SparkContext(conf=conf)

rdd1 = sc.parallelize([6, 7])
rdd2 = rdd1.map(lambda x: x + 1)
print(rdd2.collect())

sc.stop()
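For comparison, here is a minimal sketch of the case that does work for me: collecting the RDD directly, without any map() transformation. (My guess is that collect() on a parallelized list can be served straight from the driver, while map() with a Python lambda needs to start a Python worker process, which is what crashes, but I am not sure.)

from pyspark import SparkConf, SparkContext
import os

# Same interpreter setting as in the failing script
os.environ["PYSPARK_PYTHON"] = "D:/python-workSpace/Testpython1/.venv/Scripts/python.exe"

conf = SparkConf().setMaster("local").setAppName("sparkRDD")
sc = SparkContext(conf=conf)

rdd1 = sc.parallelize([6, 7])
print(rdd1.collect())  # prints [6, 7] with no error

sc.stop()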
Error message:
23/12/26 16:30:19 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
23/12/26 16:30:19 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "D:\python-workSpace\Testpython1\spark\teste3.py", line 8, in <module>
    print(rdd4.collect())
          ^^^^^^^^^^^^^^
  File "D:\python-workSpace\Testpython1\.venv\Lib\site-packages\pyspark\rdd.py", line 1833, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python-workSpace\Testpython1\.venv\Lib\site-packages\py4j\java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "D:\python-workSpace\Testpython1\.venv\Lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (DESKTOP-2JGNEKL executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
I expected to be able to use the map() function, but it throws an error instead.