Channel: Active questions tagged python - Stack Overflow

Spark doesn't see Hive tables depending on how you run it


The problem is that, depending on how you run Spark, you may or may not see the Hive databases. I perform the following three actions:

1. Using Hive:

```
hive> show databases;
OK
default
mydb
sparkdb
Time taken: 0.041 seconds, Fetched: 3 row(s)
```
2. Using the pyspark3 shell:

```
Using Python version 3.6.9 (default, Feb 28 2023 09:55:20)
Spark context Web UI available at http://training.us-west4-b.c.ace-sight-379210.internal:4040
Spark context available as 'sc' (master = yarn, app id = application_1678451025755_0002).
SparkSession available as 'spark'.
>>> spark.sql(""" show databases; """)
23/03/10 12:24:55 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
23/03/10 12:24:55 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
+---------+
|namespace|
+---------+
|  default|
|     mydb|
|  sparkdb|
+---------+
```
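When two environments behave differently like this, a quick diagnostic (my own suggestion, not part of the original question) is to ask each session which catalog implementation it is using: `spark.sql.catalogImplementation` is `hive` when the session is wired to the Hive metastore and `in-memory` when it only sees databases created within that session. A sketch, assuming it is run where a live `spark` session already exists (so it is not runnable outside a cluster):

```python
# Diagnostic sketch (assumption: run inside the pyspark3 shell or a submitted
# script, where `spark` is already defined as an active SparkSession).

# "hive" -> the session talks to the Hive metastore;
# "in-memory" -> the session uses Spark's ephemeral catalog.
print(spark.conf.get("spark.sql.catalogImplementation"))

# Listing databases through the catalog API should match `show databases`.
print([db.name for db in spark.catalog.listDatabases()])
```

Running this in both environments shows directly whether the submitted script is falling back to the in-memory catalog.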
3. Using a PySpark script, submitted with `spark3-submit --master=yarn script.py`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("yarn").appName("Test").getOrCreate()
spark.sql("show databases").show()
```

The output is:

```
+---------+
|namespace|
+---------+
|  default|
+---------+
```

Can anyone explain what's wrong with the 3rd method? It doesn't show all the databases. The hive-site.xml file is copied to the spark/conf folder, and the 2nd method works fine.
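For reference, a common cause of exactly this symptom is a SparkSession built without Hive support: such a session falls back to Spark's in-memory catalog and shows only `default`. The following is a sketch of the same script with Hive support enabled explicitly; this is an assumption about the fix, not something confirmed in the question, and it needs a YARN cluster to actually run:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() makes the session use the Hive metastore catalog
# (effectively spark.sql.catalogImplementation=hive), so a spark3-submit
# job should see the same databases as the pyspark3 shell does.
spark = (
    SparkSession.builder
    .master("yarn")
    .appName("Test")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("show databases").show()
```

The pyspark3 shell typically pre-builds its `spark` session with Hive support when hive-site.xml is present, which would explain why method 2 works while a hand-built session in a submitted script does not.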

