-
- Downloads
[Spark-5068][SQL]Fix bug query data when path doesn't exist for HiveContext
This PR follow up PR #3907 & #3891 & #4356. According to marmbrus liancheng 's comments, I try to use fs.globStatus to retrieve all FileStatus objects under path(s), and then do the filtering locally. [1]. get pathPattern by path, and put it into pathPatternSet. (hdfs://cluster/user/demo/2016/08/12 -> hdfs://cluster/user/demo/*/*/*) [2]. retrieve all FileStatus objects ,and cache them by undating existPathSet. [3]. do the filtering locally [4]. if we have new pathPattern,do 1,2 step again. (external table maybe have more than one partition pathPattern) chenghao-intel jeanlyn Author: lazymam500 <lazyman500@gmail.com> Author: lazyman <lazyman500@gmail.com> Closes #5059 from lazyman500/SPARK-5068 and squashes the following commits: 5bfcbfd [lazyman] move spark.sql.hive.verifyPartitionPath to SQLConf,fix scala style e1d6386 [lazymam500] fix scala style f23133f [lazymam500] bug fix 47e0023 [lazymam500] fix scala style,add config flag,break the chaining 04c443c [lazyman] SPARK-5068: fix bug when partition path doesn't exists #2 41f60ce [lazymam500] Merge pull request #1 from apache/master
Showing
- sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala 6 additions, 0 deletionssql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
- sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala 40 additions, 1 deletion...rc/main/scala/org/apache/spark/sql/hive/TableReader.scala
- sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala 64 additions, 0 deletions...scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala
Please register or sign in to comment