Skip to content
Snippets Groups Projects
Commit 17bdc36e authored by aokolnychyi's avatar aokolnychyi Committed by gatorsmile
Browse files

[SPARK-21102][SQL] Refresh command is too aggressive in parsing

### Idea

This PR adds validation to REFRESH sql statements. Currently, users can specify whatever they want as resource path. For example, spark.sql("REFRESH ! $ !") will be executed without any exceptions.

### Implementation

I am not sure that my current implementation is the most optimal, so any feedback is appreciated. My first idea was to make the grammar as strict as possible. Unfortunately, there were some problems. I tried the approach below:

SqlBase.g4
```
...
    | REFRESH TABLE tableIdentifier                                    #refreshTable
    | REFRESH resourcePath                                             #refreshResource
...

resourcePath
    : STRING
    | (IDENTIFIER | number | nonReserved | '/' | '-')+ // other symbols can be added if needed
    ;
```
It is not flexible enough and requires to explicitly mention all possible symbols. Therefore, I came up with the current approach that is implemented in the code.

Let me know your opinion on which one is better.

Author: aokolnychyi <anton.okolnychyi@sap.com>

Closes #18368 from aokolnychyi/spark-21102.
parent eb7a5a66
No related branches found
No related tags found
No related merge requests found
......@@ -149,7 +149,7 @@ statement
| (DESC | DESCRIBE) TABLE? option=(EXTENDED | FORMATTED)?
tableIdentifier partitionSpec? describeColName? #describeTable
| REFRESH TABLE tableIdentifier #refreshTable
| REFRESH .*? #refreshResource
| REFRESH (STRING | .*?) #refreshResource
| CACHE LAZY? TABLE tableIdentifier (AS? query)? #cacheTable
| UNCACHE TABLE (IF EXISTS)? tableIdentifier #uncacheTable
| CLEAR CACHE #clearCache
......
......@@ -230,11 +230,25 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
}
/**
* Create a [[RefreshTable]] logical plan.
* Create a [[RefreshResource]] logical plan.
*/
override def visitRefreshResource(ctx: RefreshResourceContext): LogicalPlan = withOrigin(ctx) {
val resourcePath = remainder(ctx.REFRESH.getSymbol).trim
RefreshResource(resourcePath)
val path = if (ctx.STRING != null) string(ctx.STRING) else extractUnquotedResourcePath(ctx)
RefreshResource(path)
}
private def extractUnquotedResourcePath(ctx: RefreshResourceContext): String = withOrigin(ctx) {
val unquotedPath = remainder(ctx.REFRESH.getSymbol).trim
validate(
unquotedPath != null && !unquotedPath.isEmpty,
"Resource paths cannot be empty in REFRESH statements. Use / to match everything",
ctx)
val forbiddenSymbols = Seq(" ", "\n", "\r", "\t")
validate(
!forbiddenSymbols.exists(unquotedPath.contains(_)),
"REFRESH statements cannot contain ' ', '\\n', '\\r', '\\t' inside unquoted resource paths",
ctx)
unquotedPath
}
/**
......
......@@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.expressions.{Ascending, Concat, SortOrder}
import org.apache.spark.sql.catalyst.parser.ParseException
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, RepartitionByExpression, Sort}
import org.apache.spark.sql.execution.command._
import org.apache.spark.sql.execution.datasources.CreateTable
import org.apache.spark.sql.execution.datasources.{CreateTable, RefreshResource}
import org.apache.spark.sql.internal.{HiveSerDe, SQLConf}
import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructType}
......@@ -66,6 +66,25 @@ class SparkSqlParserSuite extends AnalysisTest {
}
}
test("refresh resource") {
assertEqual("REFRESH prefix_path", RefreshResource("prefix_path"))
assertEqual("REFRESH /", RefreshResource("/"))
assertEqual("REFRESH /path///a", RefreshResource("/path///a"))
assertEqual("REFRESH pat1h/112/_1a", RefreshResource("pat1h/112/_1a"))
assertEqual("REFRESH pat1h/112/_1a/a-1", RefreshResource("pat1h/112/_1a/a-1"))
assertEqual("REFRESH path-with-dash", RefreshResource("path-with-dash"))
assertEqual("REFRESH \'path with space\'", RefreshResource("path with space"))
assertEqual("REFRESH \"path with space 2\"", RefreshResource("path with space 2"))
intercept("REFRESH a b", "REFRESH statements cannot contain")
intercept("REFRESH a\tb", "REFRESH statements cannot contain")
intercept("REFRESH a\nb", "REFRESH statements cannot contain")
intercept("REFRESH a\rb", "REFRESH statements cannot contain")
intercept("REFRESH a\r\nb", "REFRESH statements cannot contain")
intercept("REFRESH @ $a$", "REFRESH statements cannot contain")
intercept("REFRESH ", "Resource paths cannot be empty in REFRESH statements")
intercept("REFRESH", "Resource paths cannot be empty in REFRESH statements")
}
test("show functions") {
assertEqual("show functions", ShowFunctionsCommand(None, None, true, true))
assertEqual("show all functions", ShowFunctionsCommand(None, None, true, true))
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment