[SPARK-16044][SQL] input_file_name() returns empty strings in data sources based on NewHadoopRDD
## What changes were proposed in this pull request?

This PR makes the `input_file_name()` function return file paths rather than empty strings for external data sources based on `NewHadoopRDD`, such as [spark-redshift](https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149) and [spark-xml](https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47).

With these external data sources, the code below:

```scala
df.select(input_file_name).show()
```

will produce

- **Before**

```
+-----------------+
|input_file_name()|
+-----------------+
|                 |
+-----------------+
```

- **After**

```
+--------------------+
|   input_file_name()|
+--------------------+
|file:/private/var...|
+--------------------+
```

## How was this patch tested?

Unit tests in `ColumnExpressionSuite`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13759 from HyukjinKwon/SPARK-16044.
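As a concrete illustration, here is a minimal sketch of observing this behavior from the DataFrame API. The format name, the `rowTag` option, and the input path are assumptions chosen for illustration and are not taken from the patch itself:

```scala
// Minimal sketch, assuming spark-xml is on the classpath and that *.xml files
// exist under the (hypothetical) /path/to/books directory.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

object InputFileNameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("input-file-name-example")
      .master("local[*]")
      .getOrCreate()

    // spark-xml reads through NewHadoopRDD, the case this patch addresses.
    val df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load("/path/to/books/*.xml")

    // After this patch, each row reports the file it was read from
    // instead of an empty string.
    df.select(input_file_name()).show(truncate = false)

    spark.stop()
  }
}
```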
Showing 3 changed files with 40 additions and 3 deletions
- core/src/main/scala/org/apache/spark/rdd/InputFileNameHolder.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala (7 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala (32 additions, 2 deletions)
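The lines added to `NewHadoopRDD.scala` presumably publish the current split's file path into a thread-local holder that `input_file_name()` reads, mirroring the behavior `input_file_name()` already had for built-in sources. Below is a self-contained sketch of that mechanism; `FileNameHolder`, `SplitTracking`, and `onSplitStart` are stand-ins defined here for illustration and are not the actual Spark internals (the real holder is `org.apache.spark.rdd.InputFileNameHolder`, whose exact API is not reproduced here):

```scala
import org.apache.hadoop.mapreduce.InputSplit
import org.apache.hadoop.mapreduce.lib.input.FileSplit

// Stand-in for Spark's thread-local file-name holder (illustrative only).
object FileNameHolder {
  private val current = new InheritableThreadLocal[String] {
    override def initialValue(): String = ""
  }
  def set(path: String): Unit = current.set(path)
  def unset(): Unit = current.remove()
  def get(): String = current.get()
}

object SplitTracking {
  // Before iterating over a split's records, a NewHadoopRDD-style reader can
  // publish the backing file's path (when the split is file-based) so that an
  // expression like input_file_name() can later read it from the same thread;
  // for non-file splits, any stale value from a previous task is cleared.
  def onSplitStart(split: InputSplit): Unit = split match {
    case fs: FileSplit => FileNameHolder.set(fs.getPath.toString)
    case _             => FileNameHolder.unset()
  }
}
```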