Commit 1b50e0e0 authored by Dongjoon Hyun, committed by gatorsmile

[SPARK-20256][SQL] SessionState should be created more lazily

## What changes were proposed in this pull request?

`SessionState` is designed to be created lazily. In practice, however, it is created immediately in `SparkSession.Builder.getOrCreate` ([here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L943)).

This PR restores the lazy behavior by storing the builder options in `initialSessionOptions` and applying them only when `sessionState` is created. The benefit is shown below: users can start `spark-shell` and use RDD operations without any problems.

**BEFORE**
```scala
$ bin/spark-shell
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'
...
Caused by: org.apache.spark.sql.AnalysisException:
    org.apache.hadoop.hive.ql.metadata.HiveException:
       MetaException(message:java.security.AccessControlException:
          Permission denied: user=spark, access=READ,
             inode="/apps/hive/warehouse":hive:hdfs:drwx------
```
As reported in SPARK-20256, this happens when the user is not allowed to access the warehouse directory.

**AFTER**
```scala
$ bin/spark-shell
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.range(0, 10, 1).count()
res0: Long = 10
```
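The deferral pattern behind this change can be sketched in isolation. The following is a minimal, Spark-free illustration, not the actual `SparkSession` code: `State`, `Session`, `initialOptions`, `stateBuilds`, and `demo` are hypothetical names standing in for `SessionState`, `SparkSession`, and `initialSessionOptions`. It assumes the expensive (or failing) step is the construction of the state object, and shows that buffering options in a plain map does not trigger that construction.

```scala
import scala.collection.mutable

object LazyOptionsSketch {
  // Hypothetical stand-in for SessionState: building it is the costly step.
  final class State {
    val conf: mutable.Map[String, String] = mutable.HashMap.empty[String, String]
  }

  // Hypothetical stand-in for SparkSession.
  final class Session {
    // Options recorded by the builder; writing here never touches `state`.
    val initialOptions: mutable.Map[String, String] =
      mutable.HashMap.empty[String, String]

    // Counts how many times the state has been built, to make laziness observable.
    var stateBuilds = 0

    // Built on first access only; the buffered options are applied at that point.
    lazy val state: State = {
      stateBuilds += 1
      val s = new State
      initialOptions.foreach { case (k, v) => s.conf.put(k, v) }
      s
    }
  }

  // Returns (builds before first access, option value read from state, builds after).
  def demo(): (Int, String, Int) = {
    val session = new Session
    session.initialOptions.put("spark.sql.shuffle.partitions", "8")
    val before = session.stateBuilds
    val value = session.state.conf("spark.sql.shuffle.partitions")
    (before, value, session.stateBuilds)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

In this sketch, code that never reads `state` (the analogue of pure RDD usage in `spark-shell`) never pays for, or fails on, its construction, while the options still take effect once the state is eventually built.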

## How was this patch tested?

Manual.

This closes #18512.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #18501 from dongjoon-hyun/SPARK-20256.
parent a3c29fcb
@@ -117,6 +117,12 @@ class SparkSession private(
     existingSharedState.getOrElse(new SharedState(sparkContext))
   }
 
+  /**
+   * Initial options for session. This options are applied once when sessionState is created.
+   */
+  @transient
+  private[sql] val initialSessionOptions = new scala.collection.mutable.HashMap[String, String]
+
   /**
    * State isolated across sessions, including SQL configurations, temporary tables, registered
    * functions, and everything else that accepts a [[org.apache.spark.sql.internal.SQLConf]].
@@ -132,9 +138,11 @@ class SparkSession private(
     parentSessionState
       .map(_.clone(this))
       .getOrElse {
-        SparkSession.instantiateSessionState(
+        val state = SparkSession.instantiateSessionState(
           SparkSession.sessionStateClassName(sparkContext.conf),
           self)
+        initialSessionOptions.foreach { case (k, v) => state.conf.setConfString(k, v) }
+        state
       }
   }
 
@@ -940,7 +948,7 @@ object SparkSession {
         }
         session = new SparkSession(sparkContext, None, None, extensions)
-        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
+        options.foreach { case (k, v) => session.initialSessionOptions.put(k, v) }
         defaultSession.set(session)
         // Register a successfully instantiated context to the singleton. This should be at the
...