-
- Downloads
[SPARK-15056][SQL] Parse Unsupported Sampling Syntax and Issue Better Exceptions
#### What changes were proposed in this pull request? Compared with the current Spark parser, there are two extra syntax are supported in Hive for sampling - In `On` clauses, `rand()` is used for indicating sampling on the entire row instead of an individual column. For example, ```SQL SELECT * FROM source TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s; ``` - Users can specify the total length to be read. For example, ```SQL SELECT * FROM source TABLESAMPLE(100M) s; ``` Below is the link for references: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling This PR is to parse and capture these two extra syntax, and issue a better error message. #### How was this patch tested? Added test cases to verify the thrown exceptions Author: gatorsmile <gatorsmile@gmail.com> Closes #12838 from gatorsmile/bucketOnRand.
Showing
- sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 6 additions, 1 deletion...in/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala 11 additions, 1 deletion...ala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala 5 additions, 1 deletion...rg/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Loading
Please register or sign in to comment