
spark-nlp: DocumentAssembler initialization failing with java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class (ESG model)


I'm trying to use the John Snow Labs ESG model.

And I keep getting the following error:

The failing line:

document_assembler = DocumentAssembler().setInputCol('text').setOutputCol('document')

The error:

Py4JJavaError: An error occurred while calling None.com.johnsnowlabs.nlp.DocumentAssembler.
: java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class
    at com.johnsnowlabs.nlp.DocumentAssembler.<init>(DocumentAssembler.scala:16)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
    at py4j.Gateway.invoke(Gateway.java:257)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.ml.util.MLWritable$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    ... 13 more

I'm working on Databricks with the following clusters:

  • Cluster 1:
    • Runtime: 13.2 ML (includes Apache Spark 3.4.0, GPU, Scala 2.12)
    • Worker & Driver type: Standard_NC21s_v3 224 GB Memory, 2 GPUs
  • Cluster 2:
    • Runtime: 12.2 LTS ML (includes Apache Spark 3.3.2, Scala 2.12)
    • Node type: Standard_DS5_v2 56 GB Memory, 16 Cores

Libraries added to the cluster (following the installation instructions linked here):

  • PyPi: spark-nlp (tried with and without version)
  • PyPi: pyspark (tried with and without version)
  • Maven: com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.0

Spark NLP Version: 5.2.2

Spark Version: 3.4.0 (also tried a 14.1 cluster with Spark 3.5.0)
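For comparison, this is a minimal sketch of what a version-aligned setup looks like outside Databricks, with the jar pinned explicitly through spark.jars.packages; the Scala 2.12 coordinate and the 5.2.2 version below are my assumptions for illustration, not what is currently attached to the cluster:

# Sketch only (not Databricks): pin the Spark NLP jar so the Scala build matches
# the cluster's Scala 2.12 and the jar version matches the installed PyPI wheel.
# The exact coordinate is an assumption.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sparknlp-version-check")
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.2.2")
    .getOrCreate()
)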

Code:

import sparknlp

spark = sparknlp.start()
sparknlp.version(), spark.version

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
import pandas as pd

document_assembler = DocumentAssembler().setInputCol('text').setOutputCol('document')
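To show what the environment reports right before the failing line, this is the kind of check I run; note that spark.jars.packages may well be empty on Databricks, where libraries are attached through the cluster UI, so it is only a hint:

# Quick environment check before building the pipeline (sketch).
import sparknlp

print("spark-nlp (Python wheel):", sparknlp.version())
print("Spark:", spark.version)
# May be empty on Databricks, where jars are attached via the cluster UI.
print("Explicit packages:", spark.conf.get("spark.jars.packages", ""))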

I found the following questions and resources online but haven't managed to find a solution:

  1. spark-nlp : DocumentAssembler initializing failing
  2. Maven dependency for java.lang.NoClassDefFoundError
  3. java.lang.NoClassDefFoundError
  4. java.lang.NoClassDefFoundError
  5. NoClassDefFoundError: org/apache/spark/ml/util/MLWritable
  6. java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class
  7. NoClassDefFoundError: org/apache/spark/ml/util/MLWritable
  8. TypeError: 'JavaPackage' object is not callable - DocumentAssembler() - Spark NLP
  9. Natural language processing
  10. apache-spark-support
