Commits · cf95d728c64f76e8b1065d7cacf1c3ad7769e935 · cs525-sp18-g07 / spark

Mar 03, 2016

[SPARK-13543][SQL] Support for specifying compression codec for Parquet/ORC via option() · cf95d728

hyukjinkwon authored 9 years ago

## What changes were proposed in this pull request?

This PR adds support for specifying compression codecs for both ORC and Parquet.

## How was this patch tested?

Unit tests within the IDE and code style tests with `dev/run_tests`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #11464 from HyukjinKwon/SPARK-13543.

cf95d728

Feb 29, 2016

[SPARK-13509][SPARK-13507][SQL] Support for writing CSV with a single function call · 02aa499d

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-13507
https://issues.apache.org/jira/browse/SPARK-13509

## What changes were proposed in this pull request?

This PR adds support for writing CSV data to a given path with a single function call.

Several unit tests were added for each piece of functionality.

## How was this patch tested?

This was tested with unit tests and with `dev/run_tests` for coding style.

Author: hyukjinkwon <gurwls223@gmail.com>
Author: Hyukjin Kwon <gurwls223@gmail.com>

Closes #11389 from HyukjinKwon/SPARK-13507-13509.

02aa499d

Jan 28, 2016

[SPARK-12749][SQL] add json option to parse floating-point types as DecimalType · 3a40c0e5

Brandon Bradley authored 9 years ago

I tried to add this via `USE_BIG_DECIMAL_FOR_FLOATS` option from Jackson with no success.

Added test for non-complex types. Should I add a test for complex types?

Author: Brandon Bradley <bradleytastic@gmail.com>

Closes #10936 from blbradley/spark-12749.

3a40c0e5
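
The idea of reading floating-point JSON values as decimals can be illustrated outside Spark. The following is a minimal sketch using only Python's standard `json` module; it is an analogue of the behaviour the commit title describes, not Spark's implementation, and the sample record is made up for illustration.

```python
import json
from decimal import Decimal

# Hypothetical sample record, used only for this illustration.
record = '{"price": 19.99, "qty": 3}'

# Default parsing maps JSON floating-point literals to binary floats.
as_float = json.loads(record)

# parse_float receives the literal's text, so Decimal preserves it exactly.
as_decimal = json.loads(record, parse_float=Decimal)

print(type(as_float["price"]).__name__)
print(repr(as_decimal["price"]))
print(repr(as_decimal["qty"]))  # integer literals are not affected
```

Parsing from the literal's text rather than from an already-converted float is what avoids binary rounding in this sketch.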

Jan 04, 2016

[SPARK-12600][SQL] Remove deprecated methods in Spark SQL · 77ab49b8

Reynold Xin authored 9 years ago

Author: Reynold Xin <rxin@databricks.com>

Closes #10559 from rxin/remove-deprecated-sql.

77ab49b8

Jan 03, 2016

[SPARK-12537][SQL] Add option to accept quoting of all character backslash quoting mechanism · b8410ff9

Cazen authored 9 years ago

We can provide an option that controls whether the JSON parser accepts backslash quoting of any character.

Author: Cazen <Cazen@korea.com>
Author: Cazen Lee <cazen.lee@samsung.com>
Author: Cazen Lee <Cazen@korea.com>
Author: cazen.lee <cazen.lee@samsung.com>

Closes #10497 from Cazen/master.

b8410ff9

Dec 17, 2015

[SQL] Update SQLContext.read.text doc · 6e077166

Yanbo Liang authored 9 years ago

Since we renamed the column from `text` to `value` for DataFrames loaded by `SQLContext.read.text`, the doc needs to be updated.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10349 from yanboliang/text-value.

6e077166

Nov 24, 2015

[SPARK-11967][SQL] Consistent use of varargs for multiple paths in DataFrameReader · 25bbd3c1

Reynold Xin authored 9 years ago

This patch makes it consistent to use varargs in all DataFrameReader methods, including Parquet, JSON, text, and the generic load function.

Also added a few more API tests for the Java API.

Author: Reynold Xin <rxin@databricks.com>

Closes #9945 from rxin/SPARK-11967.

25bbd3c1
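
A rough Python analogue of the varargs style this patch describes is sketched below. `load_sketch` is a hypothetical helper introduced for illustration, not a Spark method; it only shows how one signature can serve both single-path and multi-path callers.

```python
from typing import List


def load_sketch(*paths: str) -> List[str]:
    """Hypothetical reader entry point that takes paths as varargs."""
    # Each positional argument is one path, so callers do not need to
    # wrap a single path in a list.
    return list(paths)


single = load_sketch("data/part-0.json")
several = load_sketch("data/part-0.json", "data/part-1.json")

# A caller that already holds a list can unpack it with *.
existing = ["a.parquet", "b.parquet"]
unpacked = load_sketch(*existing)
```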

Nov 18, 2015

[SPARK-11804] [PYSPARK] Exception raise when using Jdbc predicates option in PySpark · 3a6807fd

Jeff Zhang authored 9 years ago

Author: Jeff Zhang <zjffdu@apache.org>

Closes #9791 from zjffdu/SPARK-11804.

3a6807fd

Nov 16, 2015

[SPARK-11745][SQL] Enable more JSON parsing options · 42de5253

Reynold Xin authored 9 years ago

This patch adds the following options to the JSON data source, for dealing with non-standard JSON files:
* `allowComments` (default `false`): ignores Java/C++ style comments in JSON records
* `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names
* `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
* `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers (e.g. 00012)

To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options.

Also updated documentation to explain these options.

Scala

![screen shot 2015-11-15 at 6 12 12 pm](https://cloud.githubusercontent.com/assets/323388/11172965/e3ace6ec-8bc4-11e5-805e-2d78f80d0ed6.png)

Python

![screen shot 2015-11-15 at 6 11 28 pm](https://cloud.githubusercontent.com/assets/323388/11172964/e23ed6ee-8bc4-11e5-8216-312f5983acd5.png)

Author: Reynold Xin <rxin@databricks.com>

Closes #9724 from rxin/SPARK-11745.

42de5253
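
The pattern of collecting all JSON options in one class, instead of threading individual flags through the package, might look like the sketch below. This is a Python illustration only: the commit's `JSONOptions` is a Scala case class whose code is not shown here, so `JsonOptionsSketch`, `from_parameters`, and `_flag` are hypothetical names. The option keys and defaults are the ones listed in the commit message.

```python
from dataclasses import dataclass
from typing import Mapping


def _flag(parameters: Mapping[str, str], key: str, default: bool) -> bool:
    """Read a boolean option passed as a string, else use the default."""
    raw = parameters.get(key)
    return default if raw is None else raw.strip().lower() == "true"


@dataclass(frozen=True)
class JsonOptionsSketch:
    # Defaults copied from the option list in the commit message.
    allow_comments: bool = False
    allow_unquoted_field_names: bool = False
    allow_single_quotes: bool = True
    allow_numeric_leading_zeros: bool = False

    @classmethod
    def from_parameters(cls, parameters: Mapping[str, str]) -> "JsonOptionsSketch":
        """Build the options object once from user-supplied string options."""
        return cls(
            allow_comments=_flag(parameters, "allowComments", False),
            allow_unquoted_field_names=_flag(parameters, "allowUnquotedFieldNames", False),
            allow_single_quotes=_flag(parameters, "allowSingleQuotes", True),
            allow_numeric_leading_zeros=_flag(parameters, "allowNumericLeadingZeros", False),
        )


defaults = JsonOptionsSketch.from_parameters({})
custom = JsonOptionsSketch.from_parameters(
    {"allowComments": "true", "allowSingleQuotes": "false"}
)
```

Downstream code then receives a single immutable object rather than four separate arguments.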

Nov 06, 2015

[HOTFIX] Fix python tests after #9527 · 105732dc

Michael Armbrust authored 9 years ago

#9527 missed updating the python tests.

Author: Michael Armbrust <michael@databricks.com>

Closes #9533 from marmbrus/hotfixTextValue.

105732dc

Oct 28, 2015

[SPARK-11292] [SQL] Python API for text data source · 5aa05219

Reynold Xin authored 9 years ago

Adds DataFrameReader.text and DataFrameWriter.text.

Author: Reynold Xin <rxin@databricks.com>

Closes #9259 from rxin/SPARK-11292.

5aa05219

Oct 17, 2015

[SPARK-10185] [SQL] Feat sql comma separated paths · 57f83e36

Koert Kuipers authored 9 years ago

Make sure comma-separated paths get processed correctly in ResolvedDataSource for a HadoopFsRelationProvider.

Author: Koert Kuipers <koert@tresata.com>

Closes #8416 from koertkuipers/feat-sql-comma-separated-paths.

57f83e36
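
As a minimal sketch of what handling a comma-separated path argument involves, the hypothetical helper below splits one string into individual paths. It is not the logic in `ResolvedDataSource`; it assumes whitespace around a path is insignificant and does not handle commas inside glob patterns.

```python
from typing import List


def split_comma_separated(path_arg: str) -> List[str]:
    """Split "a,b,c" into individual paths, dropping empty segments."""
    # Assumption: surrounding whitespace is not part of a path.
    return [part.strip() for part in path_arg.split(",") if part.strip()]


paths = split_comma_separated("/data/2015-10-01, /data/2015-10-02,")
one_path = split_comma_separated("/data/single")
```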

Sep 08, 2015

[SPARK-10373] [PYSPARK] move @since into pyspark from sql · 3a11e50e

Davies Liu authored 9 years ago

cc mengxr

Author: Davies Liu <davies@databricks.com>

Closes #8657 from davies/move_since.

3a11e50e

Aug 27, 2015

[SPARK-9964] [PYSPARK] [SQL] PySpark DataFrameReader accept RDD of String for JSON · ce97834d

Yanbo Liang authored 9 years ago

PySpark DataFrameReader should accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path.
If this PR is merged, it should be duplicated to cover the other input types (not just JSON).

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8444 from yanboliang/spark-9964.

ce97834d

Aug 14, 2015

[SPARK-9828] [PYSPARK] Mutable values should not be default arguments · ffa05c84

MechCoder authored 9 years ago

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #8110 from MechCoder/spark-9828.

ffa05c84

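
The general Python pitfall named in this commit title can be reproduced in a few lines. The functions below are hypothetical and are not taken from the PySpark code the commit touched; they only show why a mutable default is shared across calls and how a `None` default avoids that.

```python
def add_option_buggy(key, value, options={}):
    # The default dict is created once, when the function is defined,
    # so every call that omits `options` mutates the same object.
    options[key] = value
    return options


def add_option_fixed(key, value, options=None):
    # A None default lets each call create its own fresh dict.
    if options is None:
        options = {}
    options[key] = value
    return options


buggy_first = add_option_buggy("header", "true")
buggy_second = add_option_buggy("sep", ",")

fixed_first = add_option_fixed("header", "true")
fixed_second = add_option_fixed("sep", ",")

print(buggy_first is buggy_second)  # the two results are the same dict
print(fixed_first, fixed_second)
```
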
Aug 05, 2015

[SPARK-6591] [SQL] Python data source load options should auto convert common types into strings · 8c320e45

Yijie Shen authored 9 years ago

JIRA: https://issues.apache.org/jira/browse/SPARK-6591

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #7926 from yjshen/py_dsload_opt and squashes the following commits:

b207832 [Yijie Shen] fix style
efdf834 [Yijie Shen] resolve comment
7a8f6a2 [Yijie Shen] lowercase
822e769 [Yijie Shen] convert load opts to string

8c320e45
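
A minimal sketch of converting common Python values to strings before handing options to a string-only API is shown below. The helper names are hypothetical, and rendering booleans in lowercase is an assumption suggested by the "lowercase" squashed commit above rather than something the commit message states.

```python
def to_str(value):
    """Convert a common Python value into the string form of an option."""
    if isinstance(value, bool):
        # Assumption: booleans are rendered as "true"/"false".
        return str(value).lower()
    if value is None:
        # Leave None untouched so callers can tell "unset" apart.
        return None
    return str(value)


def stringify_options(**options):
    """Apply to_str to every keyword option."""
    return {key: to_str(value) for key, value in options.items()}


converted = stringify_options(header=True, samplingRatio=0.5, numPartitions=4)
```

The `bool` check comes first because `bool` is a subclass of `int` in Python, and `str(True)` would otherwise yield `"True"`.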

Jul 21, 2015

[SPARK-9100] [SQL] Adds DataFrame reader/writer shortcut methods for ORC · d38c5029

Cheng Lian authored 9 years ago

This PR adds DataFrame reader/writer shortcut methods for ORC in both Scala and Python.

Author: Cheng Lian <lian@databricks.com>

Closes #7444 from liancheng/spark-9100 and squashes the following commits:

284d043 [Cheng Lian] Fixes PySpark test cases and addresses PR comments
e0b09fb [Cheng Lian] Adds DataFrame reader/writer shortcut methods for ORC

d38c5029

Jun 29, 2015

[SPARK-8698] partitionBy in Python DataFrame reader/writer interface should... · 660c6cec

Reynold Xin authored 10 years ago

[SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.

Author: Reynold Xin <rxin@databricks.com>

Closes #7079 from rxin/SPARK-8698 and squashes the following commits:

8513e1c [Reynold Xin] [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.

660c6cec

[SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala · ac2e17b0

Cheolsoo Park authored 10 years ago

I compared PySpark DataFrameReader/Writer against Scala ones. `Option` function is missing in both reader and writer, but the rest seems to all match.

I added `Option` to reader and writer and updated the `pyspark-sql` test.

Author: Cheolsoo Park <cheolsoop@netflix.com>

Closes #7078 from piaozhexiu/SPARK-8355 and squashes the following commits:

c63d419 [Cheolsoo Park] Fix version
524e0aa [Cheolsoo Park] Add option function to df reader and writer

ac2e17b0

Jun 22, 2015

[SPARK-8532] [SQL] In Python's DataFrameWriter,... · 5ab9fcfb

Yin Huai authored 10 years ago

[SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode

https://issues.apache.org/jira/browse/SPARK-8532

This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`.

Author: Yin Huai <yhuai@databricks.com>

Closes #6937 from yhuai/SPARK-8532 and squashes the following commits:

f972d5d [Yin Huai] davies's comment.
d37abd2 [Yin Huai] style.
d21290a [Yin Huai] Python doc.
889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet.
7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value.
d696dff [Yin Huai] Python style.
88eb6c4 [Yin Huai] If mode is "error", do not call mode method.
c40c461 [Yin Huai] Regression test.

5ab9fcfb
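
The override bug and its fix can be sketched with a small stand-in class. `WriterSketch` is hypothetical and keeps the mode in a plain attribute instead of calling into the JVM; it only illustrates why a `None` default, as described in the squashed commits above, stops a save action from clobbering a mode set earlier.

```python
class WriterSketch:
    """Hypothetical stand-in for a writer with a chained mode() setter."""

    def __init__(self):
        # Stand-in for the JVM-side default save mode.
        self._mode = "error"

    def mode(self, save_mode):
        self._mode = save_mode
        return self

    def save_buggy(self, path, mode="error"):
        # Always calls mode(), so the default overrides an earlier mode().
        self.mode(mode)
        return (path, self._mode)

    def save_fixed(self, path, mode=None):
        # Only touch the mode when the caller passed one explicitly.
        if mode is not None:
            self.mode(mode)
        return (path, self._mode)


buggy = WriterSketch().mode("overwrite").save_buggy("/tmp/out")
fixed = WriterSketch().mode("overwrite").save_fixed("/tmp/out")
explicit = WriterSketch().mode("overwrite").save_fixed("/tmp/out", mode="append")
```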

Jun 03, 2015

[SPARK-8060] Improve DataFrame Python test coverage and documentation. · ce320cb2

Reynold Xin authored 10 years ago

Author: Reynold Xin <rxin@databricks.com>

Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits:

baa8ad5 [Reynold Xin] Code review feedback.
f081d47 [Reynold Xin] More documentation updates.
c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing.

ce320cb2

Jun 02, 2015

[SPARK-8021] [SQL] [PYSPARK] make Python read/write API consistent with Scala · 445647a1

Davies Liu authored 10 years ago

Add schema()/format()/options() for the reader, and mode()/format()/options()/partitionBy() for the writer.

cc rxin yhuai pwendell

Author: Davies Liu <davies@databricks.com>

Closes #6578 from davies/readwrite and squashes the following commits:

720d293 [Davies Liu] address comments
b65dfa2 [Davies Liu] Update readwriter.py
1299ab6 [Davies Liu] make Python API consistent with Scala

445647a1

May 23, 2015

[SPARK-7840] add insertInto() to Writer · be47af1b

Davies Liu authored 10 years ago

Add tests later.

Author: Davies Liu <davies@databricks.com>

Closes #6375 from davies/insertInto and squashes the following commits:

826423e [Davies Liu] add insertInto() to Writer

be47af1b

May 21, 2015

[SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs · 8ddcb25b

Davies Liu authored 10 years ago

Add version info for public Python SQL API.

cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #6295 from davies/versions and squashes the following commits:

cfd91e6 [Davies Liu] add more version for DataFrame API
600834d [Davies Liu] add version to SQL API docs

8ddcb25b

May 19, 2015

[SPARK-7738] [SQL] [PySpark] add reader and writer API in Python · 4de74d26

Davies Liu authored 10 years ago

cc rxin, please take a quick look, I'm working on tests.

Author: Davies Liu <davies@databricks.com>

Closes #6238 from davies/readwrite and squashes the following commits:

c7200eb [Davies Liu] update tests
9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
f0c5a04 [Davies Liu] use sqlContext.read.load
5f68bc8 [Davies Liu] update tests
6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
bcc6668 [Davies Liu] add reader and writer API in Python

4de74d26