[SPARK-9397] DataFrame should provide an API to find source data files if applicable
Certain applications would benefit from being able to inspect DataFrames that are straightforwardly produced by data sources that stem from files, and find out their source data. For example, one might want to display to a user the size of the data underlying a table, or to copy or mutate it.

This PR exposes an `inputFiles` method on DataFrame which attempts to discover the source data in a best-effort manner, by inspecting HadoopFsRelations and JSONRelations.

Author: Aaron Davidson <aaron@databricks.com>

Closes #7717 from aarondav/paths and squashes the following commits:

ff67430 [Aaron Davidson] inputFiles
0acd3ad [Aaron Davidson] [SPARK-9397] DataFrame should provide an API to find source data files if applicable
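As an illustration of the "size of the data underlying a table" use case mentioned in the commit message, the following is a minimal sketch and not code from this PR. It assumes a later Spark release where `SparkSession` is the entry point (at the time of this commit the equivalent entry point was `SQLContext`); the object name `InputFilesSketch` and the helper `totalInputBytes` are hypothetical names chosen for this example. Because `inputFiles` is best-effort, the returned paths may be files or directories depending on the Spark version, so the sketch uses Hadoop's `getContentSummary`, which handles both.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object for illustration; not part of Spark.
object InputFilesSketch {

  // Sums the on-disk size of whatever paths DataFrame.inputFiles reports.
  // Returns 0 when inputFiles finds nothing (e.g. a non-file-based source).
  def totalInputBytes(spark: SparkSession, df: DataFrame): Long = {
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    df.inputFiles.map { file =>
      val path = new Path(file)
      // getContentSummary works for both a single file and a directory.
      path.getFileSystem(hadoopConf).getContentSummary(path).getLength
    }.sum
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("input-files-sketch")
      .master("local[*]")
      .getOrCreate()

    // Write a small Parquet dataset to a temporary location so the
    // example has file-backed data to inspect.
    val dir = java.nio.file.Files
      .createTempDirectory("input-files-sketch")
      .resolve("data")
      .toString
    spark.range(100).write.parquet(dir)

    val df = spark.read.parquet(dir)
    df.inputFiles.foreach(println)
    println(s"Total bytes underlying the DataFrame: ${totalInputBytes(spark, df)}")

    spark.stop()
  }
}
```

Since discovery is best-effort, callers should probably treat an empty result as "unknown" rather than "no data".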
Showing 3 changed files:
- sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala (18 additions, 2 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (20 additions, 0 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (3 additions, 3 deletions)