The problem is that, depending on how I run Spark, the Hive databases are either visible or not. I tried the following three approaches:
- Using the Hive shell
```
hive> show databases;
OK
default
mydb
sparkdb
Time taken: 0.041 seconds, Fetched: 3 row(s)
```
- Using the pyspark3 shell
```
Using Python version 3.6.9 (default, Feb 28 2023 09:55:20)
Spark context Web UI available at http://training.us-west4-b.c.ace-sight-379210.internal:4040
Spark context available as 'sc' (master = yarn, app id = application_1678451025755_0002).
SparkSession available as 'spark'.
>>> spark.sql(""" show databases; """).show()
23/03/10 12:24:55 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
23/03/10 12:24:55 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
+---------+
|namespace|
+---------+
|  default|
|     mydb|
|  sparkdb|
+---------+
```
- Using a pyspark script, submitted with `spark3-submit --master=yarn script.py`
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("yarn").appName("Test").getOrCreate()
spark.sql(""" show databases """).show()
```
The output is:
```
+---------+
|namespace|
+---------+
|  default|
+---------+
```
Can anyone explain what's wrong with the 3rd method? It doesn't show all the databases. The file hive-site.xml has been copied to the spark/conf folder, and the 2nd method works fine.
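For reference, here is a diagnostic variant of the script I intend to try next. It explicitly requests Hive support and prints which catalog implementation the session actually ends up with; `enableHiveSupport()` and the `spark.sql.catalogImplementation` property are standard Spark APIs, but whether this changes anything on this particular cluster is only an assumption on my part:

```python
from pyspark.sql import SparkSession

# Explicitly ask for the Hive metastore catalog instead of relying on defaults.
spark = (
    SparkSession.builder
    .master("yarn")
    .appName("Test")
    .enableHiveSupport()  # assumption: the pyspark3 shell enables this implicitly
    .getOrCreate()
)

# "hive" means the session talks to the Hive metastore,
# "in-memory" means it only sees its own default catalog.
print(spark.conf.get("spark.sql.catalogImplementation"))

spark.sql("show databases").show()
```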