  1. Jun 23, 2015
    • [SPARK-7157][SQL] add sampleBy to DataFrame · 0401cbaa
      Xiangrui Meng authored
      Add `sampleBy` to DataFrame. rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6769 from mengxr/SPARK-7157 and squashes the following commits:
      
      991f26f [Xiangrui Meng] fix seed
      4a14834 [Xiangrui Meng] move sampleBy to stat
      832f7cc [Xiangrui Meng] add sampleBy to DataFrame
      0401cbaa
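The idea behind `sampleBy` (stratified sampling with a per-stratum fraction and a seed) can be sketched in plain Python. This is an illustration only, not Spark's implementation: the function name `sample_by`, its parameters, and the choice to treat unlisted strata as fraction 0 are assumptions made for the example.

```python
import random

def sample_by(rows, key, fractions, seed=None):
    """Keep each row with the probability assigned to its stratum.

    rows: iterable of dicts; key: name of the stratum column;
    fractions: stratum value -> sampling probability in [0, 1].
    Strata missing from `fractions` are treated as fraction 0 (assumption).
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    sampled = []
    for row in rows:
        fraction = fractions.get(row[key], 0.0)
        # One Bernoulli trial per row, so the sample size is random, not exact.
        if rng.random() < fraction:
            sampled.append(row)
    return sampled

rows = [{"key": i % 3, "value": i} for i in range(100)]
sample = sample_by(rows, "key", {0: 1.0, 1: 0.0}, seed=0)
```

With fraction 1.0 every row of stratum 0 is kept, and with fraction 0.0 (or no entry) the stratum is dropped entirely.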
  2. Jun 17, 2015
    • [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to PySpark · 6765ef98
      MechCoder authored
      MatrixUDT was recently implemented in Scala; this commit ports it to PySpark.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6354 from MechCoder/spark-6390 and squashes the following commits:
      
      fc4dc1e [MechCoder] Better error message
      c940a44 [MechCoder] Added test
      aa9c391 [MechCoder] Add pyUDT to MatrixUDT
      62a2a7d [MechCoder] [SPARK-6390] Port MatrixUDT to PySpark
      6765ef98
  3. Jun 09, 2015
    • [SPARK-7886] Add built-in expressions to FunctionRegistry. · 1b499993
      Reynold Xin authored
      This patch switches to using FunctionRegistry for built-in expressions. It is based on #6463, but with some work to simplify it along with unit tests.
      
      TODOs for future pull requests:
      - Use static registration so we don't need to register all functions every time we start a new SQLContext
      - Switch to using this in HiveContext
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Santiago M. Mola <santi@mola.io>
      
      Closes #6710 from rxin/udf-registry and squashes the following commits:
      
      6930822 [Reynold Xin] Fixed Python test.
      b802c9a [Reynold Xin] Made UDF case insensitive.
      e60d815 [Reynold Xin] Made UDF case insensitive.
      852f9c0 [Reynold Xin] Fixed style violation.
      e76a3c1 [Reynold Xin] Fixed parser.
      52ddaba [Reynold Xin] Fixed compilation.
      ee7854f [Reynold Xin] Improved error reporting.
      ff906f2 [Reynold Xin] More robust constructor calling.
      77b46f1 [Reynold Xin] Simplified the code.
      2a2a149 [Reynold Xin] Merge pull request #6463 from smola/SPARK-7886
      8616924 [Santiago M. Mola] [SPARK-7886] Add built-in expressions to FunctionRegistry.
      1b499993
    • [SPARK-7990][SQL] Add methods to facilitate equi-join on multiple joining keys · 7658eb28
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7990
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6616 from viirya/multi_keys_equi_join and squashes the following commits:
      
      cd5c888 [Liang-Chi Hsieh] Import reduce in python3.
      c43722c [Liang-Chi Hsieh] For comments.
      0400e89 [Liang-Chi Hsieh] Fix scala style.
      cc90015 [Liang-Chi Hsieh] Add methods to facilitate equi-join on multiple joining keys.
      7658eb28
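An equi-join on multiple keys matches rows only when every key column is equal. The following is a minimal plain-Python sketch of that semantics using a hash index; `equi_join` and the sample data are hypothetical and do not reflect how Spark plans or executes joins.

```python
from collections import defaultdict

def equi_join(left, right, keys):
    """Inner-join two lists of dicts on equality of every column in `keys`."""
    # Index the right side by the tuple of its key values.
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in keys)].append(r)

    joined = []
    for l in left:
        for r in index.get(tuple(l[k] for k in keys), []):
            # Key columns appear once in the output; other columns are merged.
            joined.append({**l, **r})
    return joined

people = [
    {"name": "Alice", "age": 2, "height": 80},
    {"name": "Bob", "age": 5, "height": 85},
]
records = [
    {"name": "Alice", "age": 2, "city": "Paris"},
    {"name": "Bob", "age": 6, "city": "Rome"},
]
result = equi_join(people, records, ["name", "age"])
```

Only Alice matches here, because Bob's `age` differs between the two inputs even though his `name` is the same.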
  4. Jun 07, 2015
  5. Jun 05, 2015
  6. Jun 04, 2015
    • [SPARK-7969] [SQL] Added a DataFrame.drop function that accepts a Column reference. · df7da07a
      Mike Dusenberry authored
      Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests. It iterates through the columns of the `DataFrame` to find one whose expression is equivalent to that of the `Column` argument supplied to the function.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits:
      
      514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0.
      2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over.
      6bf7c0e [Mike Dusenberry] Minor code formatting change.
      e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name.
      5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names.
      4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference.
      986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests.  Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function.
      df7da07a
  7. Jun 03, 2015
    • [SPARK-8060] Improve DataFrame Python test coverage and documentation. · ce320cb2
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits:
      
      baa8ad5 [Reynold Xin] Code review feedback.
      f081d47 [Reynold Xin] More documentation updates.
      c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing.
      ce320cb2
  8. Jun 01, 2015
  9. May 23, 2015
    • [SPARK-7840] add insertInto() to Writer · be47af1b
      Davies Liu authored
      Add tests later.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6375 from davies/insertInto and squashes the following commits:
      
      826423e [Davies Liu] add insertInto() to Writer
      be47af1b
    • [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates · efe3bfdf
      Davies Liu authored
      1. ntile should take an integer as parameter.
      2. Added Python API (based on #6364)
      3. Update documentation of various DataFrame Python functions.
      
      Author: Davies Liu <davies@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6374 from rxin/window-final and squashes the following commits:
      
      69004c7 [Reynold Xin] Style fix.
      288cea9 [Reynold Xin] Update documentation.
      7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
      66092b4 [Davies Liu] update docs
      ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
      ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
      8936ade [Davies Liu] fix maxint in python 3
      2649358 [Davies Liu] update docs
      778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
      efe3bfdf
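To see why `ntile` takes an integer, it helps to look at what the function computes: it splits an ordered partition into `n` buckets of nearly equal size. The sketch below follows the usual SQL definition; the standalone function and its `row_count` parameter are hypothetical and are not the PySpark API.

```python
def ntile(n, row_count):
    """Return the 1-based bucket number for each of `row_count` ordered rows."""
    if not isinstance(n, int) or n <= 0:
        raise ValueError("ntile requires a positive integer")
    base, extra = divmod(row_count, n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets

assignment = ntile(2, 5)
```

A non-integer bucket count has no sensible meaning in this scheme, which is why the sketch rejects it up front.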
  10. May 21, 2015
    • [SPARK-7783] [SQL] [PySpark] add DataFrame.rollup/cube in Python · 17791a58
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6311 from davies/rollup and squashes the following commits:
      
      0261db1 [Davies Liu] use @since
      a51ca6b [Davies Liu] Merge branch 'master' of github.com:apache/spark into rollup
      8ad5af4 [Davies Liu] Update dataframe.py
      ade3841 [Davies Liu] add DataFrame.rollup/cube in Python
      17791a58
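The difference between `rollup` and `cube` lies in which combinations of the grouping columns get aggregated. A small plain-Python sketch of the grouping sets each one produces is shown below; the helper names are hypothetical and the code does not perform any aggregation itself.

```python
from itertools import combinations

def rollup_grouping_sets(cols):
    """Hierarchical prefixes: (a, b), (a,), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_grouping_sets(cols):
    """Every subset of the columns, largest first."""
    sets = []
    for size in range(len(cols), -1, -1):
        sets.extend(combinations(cols, size))
    return sets

rollup_sets = rollup_grouping_sets(["name", "age"])
cube_sets = cube_grouping_sets(["name", "age"])
```

For two columns, `rollup` yields three grouping sets while `cube` yields four; the extra one groups by `age` alone. The empty tuple stands for the grand total.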
    • [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs · 8ddcb25b
      Davies Liu authored
      Add version info for public Python SQL API.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6295 from davies/versions and squashes the following commits:
      
      cfd91e6 [Davies Liu] add more version for DataFrame API
      600834d [Davies Liu] add version to SQL API docs
      8ddcb25b
  11. May 19, 2015
    • [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python · 4de74d26
      Davies Liu authored
      cc rxin, please take a quick look, I'm working on tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6238 from davies/readwrite and squashes the following commits:
      
      c7200eb [Davies Liu] update tests
      9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
      f0c5a04 [Davies Liu] use sqlContext.read.load
      5f68bc8 [Davies Liu] update tests
      6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
      bcc6668 [Davies Liu] add reader and writer API in Python
      4de74d26
  12. May 18, 2015
    • [SPARK-6657] [PYSPARK] Fix doc warnings · 1ecfac6e
      Xiangrui Meng authored
      Fixed the following warnings in `make clean html` under `python/docs`:
      
      ~~~
      /Users/meng/src/spark/python/pyspark/mllib/evaluation.py:docstring of pyspark.mllib.evaluation.RankingMetrics.ndcgAt:3: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/mllib/evaluation.py:docstring of pyspark.mllib.evaluation.RankingMetrics.ndcgAt:4: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/mllib/fpm.py:docstring of pyspark.mllib.fpm.FPGrowth.train:3: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/mllib/fpm.py:docstring of pyspark.mllib.fpm.FPGrowth.train:4: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.replace:16: WARNING: Field list ends without a blank line; unexpected unindent.
      /Users/meng/src/spark/python/pyspark/streaming/kafka.py:docstring of pyspark.streaming.kafka.KafkaUtils.createRDD:8: ERROR: Unexpected indentation.
      /Users/meng/src/spark/python/pyspark/streaming/kafka.py:docstring of pyspark.streaming.kafka.KafkaUtils.createRDD:9: WARNING: Block quote ends without a blank line; unexpected unindent.
      ~~~
      
      davies
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6221 from mengxr/SPARK-6657 and squashes the following commits:
      
      e3f83fe [Xiangrui Meng] fix sql and streaming doc warnings
      2b4371e [Xiangrui Meng] fix mllib python doc warnings
      1ecfac6e
  13. May 15, 2015
    • [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files · d7b69946
      Davies Liu authored
      dataframe.py is split into column.py, group.py, and dataframe.py, with the following line counts:
      ```
         360 column.py
        1223 dataframe.py
         183 group.py
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6201 from davies/split_df and squashes the following commits:
      
      fc8f5ab [Davies Liu] split dataframe.py into multiple files
      d7b69946
  14. May 14, 2015
    • [SPARK-7548] [SQL] Add explode function for DataFrames · 6d0633e3
      Michael Armbrust authored
      Add an `explode` function for DataFrames and modify the analyzer so that a single table generating function (TGF) can be present in a select clause along with other expressions. The following restrictions currently apply:
       - only top level TGFs are allowed (i.e. no `select(explode('list) + 1)`)
       - only one may be present in a single select to avoid potentially confusing implicit Cartesian products.
      
      TODO:
       - [ ] Python
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6107 from marmbrus/explodeFunction and squashes the following commits:
      
      7ee2c87 [Michael Armbrust] whitespace
      6f80ba3 [Michael Armbrust] Update dataframe.py
      c176c89 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      81b5da3 [Michael Armbrust] style
      d3faa05 [Michael Armbrust] fix self join case
      f9e1e3e [Michael Armbrust] fix python, add since
      4f0d0a9 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      e710fe4 [Michael Armbrust] add java and python
      52ca0dc [Michael Armbrust] [SPARK-7548][SQL] Add explode function for dataframes.
      6d0633e3
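What `explode` does to a row can be illustrated in plain Python: each element of a list-valued column becomes its own output row, with the other columns repeated. This is a sketch of the semantics only; the function signature and the `out_col` parameter are hypothetical.

```python
def explode(rows, list_col, out_col):
    """Emit one output row per element of `list_col`, copying the other columns."""
    exploded = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k != list_col}
        # A row whose list is empty contributes no output rows.
        for element in row[list_col]:
            exploded.append({**rest, out_col: element})
    return exploded

rows = [
    {"id": 1, "items": [10, 20]},
    {"id": 2, "items": []},
    {"id": 3, "items": [30]},
]
flat = explode(rows, "items", "item")
```

Because one input row can turn into several output rows, combining two such generators in a single select would multiply rows against each other, which is the implicit Cartesian product the restriction above avoids.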
  15. May 12, 2015
    • [SPARK-7321][SQL] Add Column expression for conditional statements (when/otherwise) · 97dee313
      Reynold Xin authored
      This builds on https://github.com/apache/spark/pull/5932 and should close https://github.com/apache/spark/pull/5932 as well.
      
      As an example:
      ```python
      df.select(when(df['age'] == 2, 3).otherwise(4).alias("age")).collect()
      ```
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: kaka1992 <kaka_1992@163.com>
      
      Closes #6072 from rxin/when-expr and squashes the following commits:
      
      8f49201 [Reynold Xin] Throw exception if otherwise is applied twice.
      0455eda [Reynold Xin] Reset run-tests.
      bfb9d9f [Reynold Xin] Updated documentation and test cases.
      762f6a5 [Reynold Xin] Merge pull request #5932 from kaka1992/IFCASE
      95724c6 [kaka1992] Update
      8218d0a [kaka1992] Update
      801009e [kaka1992] Update
      76d6346 [kaka1992] [SPARK-7321][SQL] Add Column expression for conditional statements (if, case)
      97dee313
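The evaluation rules for `when`/`otherwise` can be modelled with a small plain-Python class: branches are tried in order, the first matching condition wins, and `otherwise` supplies a default that may be set only once. The `When` class and its `evaluate` method are hypothetical stand-ins for illustration, not the Column API.

```python
class When:
    """Chain of (condition, value) branches with an optional default."""

    _UNSET = object()  # sentinel so that None can still be a valid default

    def __init__(self, condition, value):
        self._branches = [(condition, value)]
        self._default = When._UNSET

    def when(self, condition, value):
        self._branches.append((condition, value))
        return self

    def otherwise(self, value):
        # Mirrors the commit note: applying otherwise twice is an error.
        if self._default is not When._UNSET:
            raise ValueError("otherwise() can only be applied once")
        self._default = value
        return self

    def evaluate(self, row):
        for condition, value in self._branches:
            if condition(row):
                return value
        # With no default, unmatched rows evaluate to None.
        return None if self._default is When._UNSET else self._default

expr = When(lambda row: row["age"] == 2, 3).otherwise(4)
ages = [expr.evaluate(row) for row in [{"age": 2}, {"age": 5}]]
```

Here the row with `age == 2` maps to 3 and every other row falls through to the default 4.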
    • [SPARK-6876] [PySpark] [SQL] add DataFrame na.replace in pyspark · d86ce845
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #6003 from adrian-wang/pynareplace and squashes the following commits:
      
      672efba [Daoyuan Wang] remove py2.7 feature
      4a148f7 [Daoyuan Wang] to_replace support dict, value support single value, and add full tests
      9e232e7 [Daoyuan Wang] rename scala map
      af0268a [Daoyuan Wang] remove na
      63ac579 [Daoyuan Wang] add na.replace in pyspark
      d86ce845
  16. May 11, 2015
  17. May 08, 2015
    • [SPARK-7133] [SQL] Implement struct, array, and map field accessor · 2d05f325
      Wenchen Fan authored
      This is the first step: generalize UnresolvedGetField to support map, struct, and array types.
      TODO: add `apply` in Scala and `__getitem__` in Python, and unify the `getItem` and `getField` methods into a single API (or should we keep them for compatibility?).
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #5744 from cloud-fan/generalize and squashes the following commits:
      
      715c589 [Wenchen Fan] address comments
      7ea5b31 [Wenchen Fan] fix python test
      4f0833a [Wenchen Fan] add python test
      f515d69 [Wenchen Fan] add apply method and test cases
      8df6199 [Wenchen Fan] fix python test
      239730c [Wenchen Fan] fix test compile
      2a70526 [Wenchen Fan] use _bin_op in dataframe.py
      6bf72bc [Wenchen Fan] address comments
      3f880c3 [Wenchen Fan] add java doc
      ab35ab5 [Wenchen Fan] fix python test
      b5961a9 [Wenchen Fan] fix style
      c9d85f5 [Wenchen Fan] generalize UnresolvedGetField to support all map, struct, and array
      2d05f325
  18. May 07, 2015
    • [SPARK-7295][SQL] bitwise operations for DataFrame DSL · fa8fddff
      Shiti authored
      Author: Shiti <ssaxena.ece@gmail.com>
      
      Closes #5867 from Shiti/spark-7295 and squashes the following commits:
      
      71a9913 [Shiti] implementation for bitwise and,or, not and xor on Column with tests and docs
      fa8fddff
  19. May 05, 2015
    • [SPARK-7294][SQL] ADD BETWEEN · 735bc3d0
      云峤 authored
      Author: 云峤 <chensong.cs@alibaba-inc.com>
      Author: kaka1992 <kaka_1992@163.com>
      
      Closes #5839 from kaka1992/master and squashes the following commits:
      
      b15360d [kaka1992] Fix python unit test in sql/test. =_= I forget to commit this file last time.
      f928816 [kaka1992] Fix python style in sql/test.
      d2e7f72 [kaka1992] Fix python style in sql/test.
      c54d904 [kaka1992] Fix empty map bug.
      7e64d1e [云峤] Update
      7b9b858 [云峤] undo
      f080f8d [云峤] update pep8
      76f0c51 [云峤] Merge remote-tracking branch 'remotes/upstream/master'
      7d62368 [云峤] [SPARK-7294] ADD BETWEEN
      baf839b [云峤] [SPARK-7294] ADD BETWEEN
      d11d5b9 [云峤] [SPARK-7294] ADD BETWEEN
      735bc3d0
    • [SPARK-7243][SQL] Reduce size for Contingency Tables in DataFrames · 18340d7b
      Burak Yavuz authored
      Reduced take size from 1e8 to 1e6.
      
      cc rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5900 from brkyvz/df-cont-followup and squashes the following commits:
      
      c11e762 [Burak Yavuz] fix grammar
      b30ace2 [Burak Yavuz] address comments
      a417ba5 [Burak Yavuz] [SPARK-7243][SQL] Reduce  size for Contingency Tables in DataFrames
      18340d7b
  20. May 04, 2015
    • [SPARK-7243][SQL] Contingency Tables for DataFrames · 80554111
      Burak Yavuz authored
      Computes a pair-wise frequency table of the given columns. Also known as cross-tabulation.
      cc mengxr rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5842 from brkyvz/df-cont and squashes the following commits:
      
      a07c01e [Burak Yavuz] addressed comments v4.1
      ae9e01d [Burak Yavuz] fix test
      9106585 [Burak Yavuz] addressed comments v4.0
      bced829 [Burak Yavuz] fix merge conflicts
      a63ad00 [Burak Yavuz] addressed comments v3.0
      a0cad97 [Burak Yavuz] addressed comments v3.0
      6805df8 [Burak Yavuz] addressed comments and fixed test
      939b7c4 [Burak Yavuz] lint python
      7f098bc [Burak Yavuz] add crosstab pyTest
      fd53b00 [Burak Yavuz] added python support for crosstab
      27a5a81 [Burak Yavuz] implemented crosstab
      80554111
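A pair-wise frequency table (cross-tabulation) counts how often each combination of values from two columns occurs. The plain-Python sketch below shows the computation on a list of dicts; the function, the nested-dict result layout, and the sample data are assumptions for illustration and differ from the DataFrame that Spark returns.

```python
from collections import Counter

def crosstab(rows, col1, col2):
    """Count co-occurrences of (col1, col2) values and lay them out as a table."""
    counts = Counter((row[col1], row[col2]) for row in rows)
    row_labels = sorted({a for a, _ in counts})
    col_labels = sorted({b for _, b in counts})
    # Combinations that never occur are filled with 0.
    return {
        a: {b: counts.get((a, b), 0) for b in col_labels}
        for a in row_labels
    }

rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "M"},
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
]
table = crosstab(rows, "color", "size")
```

The table has one cell per pair of distinct values, so its size grows with the product of the two columns' cardinalities.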
    • [SPARK-7319][SQL] Improve the output from DataFrame.show() · f32e69ec
      云峤 authored
      Author: 云峤 <chensong.cs@alibaba-inc.com>
      
      Closes #5865 from kaka1992/df.show and squashes the following commits:
      
      c79204b [云峤] Update
      a1338f6 [云峤] Update python dataFrame show test and add empty df unit test.
      734369c [云峤] Update python dataFrame show test and add empty df unit test.
      84aec3e [云峤] Update python dataFrame show test and add empty df unit test.
      159b3d5 [云峤] update
      03ef434 [云峤] update
      7394fd5 [云峤] update test show
      ced487a [云峤] update pep8
      b6e690b [云峤] Merge remote-tracking branch 'upstream/master' into df.show
      30ac311 [云峤] [SPARK-7294] ADD BETWEEN
      7d62368 [云峤] [SPARK-7294] ADD BETWEEN
      baf839b [云峤] [SPARK-7294] ADD BETWEEN
      d11d5b9 [云峤] [SPARK-7294] ADD BETWEEN
      f32e69ec
  21. May 03, 2015
    • [SPARK-7241] Pearson correlation for DataFrames · 9646018b
      Burak Yavuz authored
      Submitting this PR from a phone, so excuse the brevity.
      Adds Pearson correlation to DataFrames, reusing the covariance calculation code.
      
      cc mengxr rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5858 from brkyvz/df-corr and squashes the following commits:
      
      285b838 [Burak Yavuz] addressed comments v2.0
      d10babb [Burak Yavuz] addressed comments v0.2
      4b74b24 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into df-corr
      4fe693b [Burak Yavuz] addressed comments v0.1
      a682d06 [Burak Yavuz] ready for PR
      9646018b
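Pearson correlation can reuse a covariance routine because the variance of a column is its covariance with itself. The sketch below shows that relationship in plain Python; the two-pass `covariance` helper is a simplification chosen for readability and is not the code this commit reuses.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long sequences (two-pass, for clarity)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """corr(x, y) = cov(x, y) / sqrt(cov(x, x) * cov(y, y))."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

r_up = pearson([1, 2, 3], [2, 4, 6])
r_down = pearson([1, 2, 3], [3, 2, 1])
```

A perfectly increasing linear relationship gives a coefficient of 1 and a perfectly decreasing one gives -1.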
  22. May 02, 2015
    • [SPARK-3444] Fix typo in Dataframes.py introduced in [] · 856a571e
      Dean Chen authored
      Author: Dean Chen <deanchen5@gmail.com>
      
      Closes #5866 from deanchen/patch-1 and squashes the following commits:
      
      0028bc4 [Dean Chen] Fix typo in Dataframes.py introduced in [SPARK-3444]
      856a571e
    • [SPARK-7242] added python api for freqItems in DataFrames · 2e0f3579
      Burak Yavuz authored
      Adds the Python API for `freqItems` in DataFrames and addresses your comments from the previous PR.
      rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5859 from brkyvz/df-freq-py2 and squashes the following commits:
      
      f9aa9ce [Burak Yavuz] addressed comments v0.1
      4b25056 [Burak Yavuz] added python api for freqItems
      2e0f3579
  23. May 01, 2015
    • [SPARK-3444] Provide an easy way to change log level · ae98eec7
      Holden Karau authored
      Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433; includes CR feedback from pwendel & davies.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #5791 from holdenk/SPARK-3444-provide-an-easy-way-to-change-log-level-r2 and squashes the following commits:
      
      3bf3be9 [Holden Karau] fix exception
      42ba873 [Holden Karau] fix exception
      9117244 [Holden Karau] Only allow valid log levels, throw exception if invalid log level.
      338d7bf [Holden Karau] rename setLoggingLevel to setLogLevel
      fac14a0 [Holden Karau] Fix style errors
      d9d03f3 [Holden Karau] Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433 includes CR feedback from @pwendel & @davies
      ae98eec7
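The "only allow valid log levels" rule amounts to a membership check before the level is applied. A minimal sketch follows; `check_log_level` is a hypothetical helper, the exact-match (case-sensitive) comparison is an assumption, and the set of names is the one listed in the Spark documentation for `setLogLevel`.

```python
VALID_LOG_LEVELS = frozenset(
    ["ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"]
)

def check_log_level(level):
    """Return `level` unchanged if it is a known level name, else raise."""
    if level not in VALID_LOG_LEVELS:
        raise ValueError(
            "Invalid log level %r; expected one of: %s"
            % (level, ", ".join(sorted(VALID_LOG_LEVELS)))
        )
    return level

level = check_log_level("WARN")
```

Rejecting unknown names early gives the caller a clear error instead of silently falling back to some other level.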
    • [SPARK-7240][SQL] Single pass covariance calculation for dataframes · 4dc8d744
      Burak Yavuz authored
      Added the calculation of covariance between two columns to DataFrames.
      
      cc mengxr rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5825 from brkyvz/df-cov and squashes the following commits:
      
      cb18046 [Burak Yavuz] changed to sample covariance
      f2e862b [Burak Yavuz] fixed failed test
      51e39b8 [Burak Yavuz] moved implementation
      0c6a759 [Burak Yavuz] addressed math comments
      8456eca [Burak Yavuz] fix pyStyle3
      aa2ad29 [Burak Yavuz] fix pyStyle2
      4e97a50 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into df-cov
      e3b0b85 [Burak Yavuz] addressed comments v0.1
      a7115f1 [Burak Yavuz] fix python style
      7dc6dbc [Burak Yavuz] reorder imports
      408cb77 [Burak Yavuz] initial commit
      4dc8d744
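A single-pass sample covariance can be computed by updating running means and a co-moment as each pair arrives, so the data never needs to be scanned twice. The sketch below uses the standard online update; it is an illustration under that assumption and is not taken from the Spark source.

```python
def single_pass_covariance(pairs):
    """Sample covariance of (x, y) pairs using one pass over the data."""
    n = 0
    mean_x = 0.0
    mean_y = 0.0
    co_moment = 0.0  # running sum of (x - mean_x) * (y - mean_y)
    for x, y in pairs:
        n += 1
        dx = x - mean_x              # deviation from the old mean of x
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        co_moment += dx * (y - mean_y)  # uses the updated mean of y
    if n < 2:
        raise ValueError("sample covariance needs at least two pairs")
    # Divide by n - 1 for the sample (not population) covariance.
    return co_moment / (n - 1)

cov = single_pass_covariance([(1, 2), (2, 4), (3, 6)])
```

Only three running values and a counter are kept in memory, regardless of how many pairs are processed.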
  24. Apr 29, 2015
  25. Apr 23, 2015
  26. Apr 22, 2015
  27. Apr 21, 2015
    • [SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression · ab9128fb
      Davies Liu authored
      This PR enables auto_convert in JavaGateway, so that we can register a converter for a given type, for example date and datetime.
      
      There are two bugs related to auto_convert, see [1] and [2]; we work around them in this PR.
      
      [1] https://github.com/bartdag/py4j/issues/160
      [2] https://github.com/bartdag/py4j/issues/161
      
      cc rxin JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5570 from davies/py4j_date and squashes the following commits:
      
      eb4fa53 [Davies Liu] fix tests in python 3
      d17d634 [Davies Liu] rollback changes in mllib
      2e7566d [Davies Liu] convert tuple into ArrayList
      ceb3779 [Davies Liu] Update rdd.py
      3c373f3 [Davies Liu] support date and datetime by auto_convert
      cb094ff [Davies Liu] enable auto convert
      ab9128fb