[SPARK-17583][SQL] Remove useless rowSeparator variable and set auto-expanding buffer as default for maxCharsPerColumn option in CSV

## What changes were proposed in this pull request?

This PR includes the changes below:

1. Upgrade the Univocity library from 2.1.1 to 2.2.1

   This brings some performance improvements and also enables an auto-expanding buffer for the `maxCharsPerColumn` option in CSV. Please refer to the [release notes](https://github.com/uniVocity/univocity-parsers/releases).

2. Remove the useless `rowSeparator` variable in `CSVOptions`

   We have this unused variable in [CSVOptions.scala#L127](https://github.com/apache/spark/blob/29952ed096fd2a0a19079933ff691671d6f00835/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L127), and it can cause confusion by suggesting that `\r\n` is not handled. For example, there is an open issue, [SPARK-17227](https://issues.apache.org/jira/browse/SPARK-17227), that describes this variable. The variable is effectively unused because we rely on Hadoop's `LineRecordReader`, which deals only with `\n` and `\r\n`.

3. Set the default value of `maxCharsPerColumn` to auto-expanding

   We currently set 1000000 as the maximum length of each column. It would be more sensible to allow auto-expanding by default rather than a fixed length. For reference, using `-1` for this is described in the [2.2.0](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.2.0) release notes.

## How was this patch tested?

N/A

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #15138 from HyukjinKwon/SPARK-17583.
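
To illustrate the difference between a fixed per-column limit and the `-1` setting described in change 3, here is a minimal Python sketch. It is not Univocity's or Spark's code: the function name `read_column` and its error message are hypothetical, and it only models the idea that a positive `maxCharsPerColumn` caps the length of a column value while `-1` lets the buffer grow as needed.

```python
def read_column(chars, max_chars_per_column=-1):
    """Collect the characters of one column value.

    Hypothetical illustration only, not the Univocity parser's logic.
    A positive max_chars_per_column is a hard cap on the value length;
    -1 means no fixed cap, so the buffer simply keeps growing.
    """
    buffer = []
    for ch in chars:
        # Enforce the cap only when a fixed limit is configured.
        if max_chars_per_column != -1 and len(buffer) >= max_chars_per_column:
            raise ValueError(
                "column value exceeds %d characters" % max_chars_per_column
            )
        buffer.append(ch)
    return "".join(buffer)


# With a fixed limit, a value at the limit is accepted.
value = read_column("abcde", max_chars_per_column=5)

# With -1, a long value is accepted regardless of its length.
long_value = read_column("x" * 10000, max_chars_per_column=-1)
```

In Spark itself the setting is passed as a CSV reader option named `maxCharsPerColumn`. The sketch above only models the semantics of the limit, not how the parser allocates memory.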
Changed files:

- dev/deps/spark-deps-hadoop-2.2: 1 addition, 1 deletion
- dev/deps/spark-deps-hadoop-2.3: 1 addition, 1 deletion
- dev/deps/spark-deps-hadoop-2.4: 1 addition, 1 deletion
- dev/deps/spark-deps-hadoop-2.6: 1 addition, 1 deletion
- dev/deps/spark-deps-hadoop-2.7: 1 addition, 1 deletion
- python/pyspark/sql/readwriter.py: 1 addition, 1 deletion
- python/pyspark/sql/streaming.py: 1 addition, 1 deletion
- sql/core/pom.xml: 1 addition, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: 2 additions, 2 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala: 1 addition, 3 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala: 0 additions, 2 deletions
- sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala: 2 additions, 2 deletions