  1. Jul 23, 2016
    • [SPARK-16662][PYSPARK][SQL] fix HiveContext warning bug · ab6e4aea
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Move the `HiveContext` deprecation warning into the `HiveContext` constructor,
      so that the warning appears only when `HiveContext` is actually used;
      otherwise it is printed whenever the pyspark.sql.context module is merely imported.
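
      A minimal sketch of the pattern (simplified from python/pyspark/sql/context.py; the rest of the constructor is elided):

      ```python
      import warnings

      class HiveContext(SQLContext):
          def __init__(self, sparkContext, jhiveContext=None):
              # Warn here, inside the constructor, rather than at module level,
              # so the message fires only when HiveContext is instantiated,
              # not every time pyspark.sql.context is imported.
              warnings.warn(
                  "HiveContext is deprecated in Spark 2.0.0. Please use "
                  "SparkSession.builder.enableHiveSupport().getOrCreate() instead.",
                  DeprecationWarning)
              ...
      ```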
      
      ## How was this patch tested?
      
      Manual.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14301 from WeichenXu123/hiveContext_python_warning_update.
  2. Jun 30, 2016
    • [SPARK-15954][SQL] Disable loading test tables in Python tests · 38f4d6f4
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch introduces a flag to disable loading test tables in TestHiveSparkSession and disables it for the Python tests. This fixes an issue in which python/run-tests would fail due to a failure to load test tables.
      
      Note that these test tables are not used outside of HiveCompatibilitySuite. In the long run we should probably decouple the loading of test tables from the test Hive setup.
      
      ## How was this patch tested?
      This is a test-only change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14005 from rxin/SPARK-15954.
    • [SPARK-16313][SQL] Spark should not silently drop exceptions in file listing · 3d75a5b2
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      Spark silently drops exceptions during file listing. This is very bad behavior because it can mask legitimate errors and leave the resulting plan to silently return 0 rows. This patch changes file listing so that errors are no longer silently dropped.
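
      An illustrative Python sketch of the anti-pattern being removed (the real listing code is on the Scala side; names here are hypothetical):

      ```python
      def list_leaf_files(fs, path):
          try:
              return fs.list_status(path)
          except IOError:
              # Swallowing the error masks legitimate failures and lets the
              # query continue with an empty file list, i.e. a silent 0-row plan.
              return []
      # After this patch, the exception propagates to the caller instead.
      ```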
      
      ## How was this patch tested?
      Manually verified.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13987 from rxin/SPARK-16313.
  3. Jun 29, 2016
    • [SPARK-16266][SQL][STREAMING] Moved DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming · f454a7f9
      Tathagata Das authored
      
      ## What changes were proposed in this pull request?
      
      - Moved DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming to make them consistent with the Scala packaging (see the import sketch after this list)
      - Exposed the necessary classes in the sql.streaming package so that they appear in the docs
      - Added the pyspark.sql.streaming module to the docs
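
      A sketch of what the move means for imports (assuming user code previously pulled these classes from pyspark.sql's readwriter module):

      ```python
      # New location, mirroring org.apache.spark.sql.streaming on the Scala side:
      from pyspark.sql.streaming import DataStreamReader, DataStreamWriter
      ```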
      
      ## How was this patch tested?
      - updated unit tests.
      - generated docs for testing visibility of pyspark.sql.streaming classes.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #13955 from tdas/SPARK-16266.
  4. Jun 28, 2016
    • [SPARK-16268][PYSPARK] SQLContext should import DataStreamReader · 5bf8881b
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Fixed the following error:
      ```
      >>> sqlContext.readStream
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "...", line 442, in readStream
          return DataStreamReader(self._wrapped)
      NameError: global name 'DataStreamReader' is not defined
      ```
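
      The fix is essentially a missing import in pyspark/sql/context.py; a minimal sketch (at the time, the class lived alongside the batch reader in pyspark.sql.readwriter):

      ```python
      from pyspark.sql.readwriter import DataStreamReader  # the missing import

      @property
      def readStream(self):
          return DataStreamReader(self._wrapped)
      ```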
      
      ## How was this patch tested?
      
      The added test.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13958 from zsxwing/fix-import.
  5. Jun 14, 2016
    • [SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream and writeStream for streaming DFs · 214adb14
      Tathagata Das authored
      
      ## What changes were proposed in this pull request?
      Currently, the DataFrameReader/Writer has methods that are needed only for streaming or only for non-streaming DataFrames. This is quite awkward because each such method throws a runtime exception for one case or the other. Rather than having half the methods throw runtime exceptions, it is better to have a separate reader/writer API for streams (see the sketch after the checklist).
      
      - [x] Python API!!
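
      A sketch of the resulting split, with hypothetical paths (note that the streaming file sink needs a checkpoint location):

      ```python
      # Batch I/O keeps DataFrameReader/Writer:
      df = spark.read.json("/data/batch-in")
      df.write.parquet("/data/batch-out")

      # Streaming DataFrames get their own DataStreamReader/Writer:
      sdf = spark.readStream.schema(df.schema).json("/data/stream-in")
      query = (sdf.writeStream
                  .format("parquet")
                  .option("checkpointLocation", "/data/chk")
                  .start("/data/stream-out"))
      ```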
      
      ## How was this patch tested?
      Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #13653 from tdas/SPARK-15933.
    • [SPARK-15935][PYSPARK] Enable test for sql/streaming.py and fix these tests · 96c3500c
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR just enables tests for sql/streaming.py and also fixes the failures.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13655 from zsxwing/python-streaming-test.
  6. May 19, 2016
    • [SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified · f2ee0ed4
      Reynold Xin authored
      
      ## What changes were proposed in this pull request?
      Currently SparkSession.Builder uses SQLContext.getOrCreate. It should probably be the other way around, i.e. all the core logic goes in SparkSession, and SQLContext just calls that. This patch does that.
      
      This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession.
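
      A sketch of the intended behavior from Python (the option value is arbitrary):

      ```python
      from pyspark.sql import SparkSession

      spark1 = SparkSession.builder.getOrCreate()
      # Config given to a later builder call reaches the existing session too:
      spark2 = (SparkSession.builder
                .config("spark.sql.shuffle.partitions", "4")
                .getOrCreate())   # returns the existing session...
      spark1.conf.get("spark.sql.shuffle.partitions")   # ...now "4"
      ```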
      
      ## How was this patch tested?
      Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13200 from rxin/SPARK-15075.
  7. May 11, 2016
    • [SPARK-15270] [SQL] Use SparkSession Builder to build a session with HiveSupport · de9c85cc
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      Before:
      Creating a `HiveContext` was failing
      ```python
      from pyspark.sql import HiveContext
      hc = HiveContext(sc)
      ```
      with
      ```
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "spark-2.0/python/pyspark/sql/context.py", line 458, in __init__
          sparkSession = SparkSession.withHiveSupport(sparkContext)
        File "spark-2.0/python/pyspark/sql/session.py", line 192, in withHiveSupport
          jsparkSession = sparkContext._jvm.SparkSession.withHiveSupport(sparkContext._jsc.sc())
        File "spark-2.0/python/lib/py4j-0.9.2-src.zip/py4j/java_gateway.py", line 1048, in __getattr__
      py4j.protocol.Py4JError: org.apache.spark.sql.SparkSession.withHiveSupport does not exist in the JVM
      ```
      
      Now:
      ```python
      >>> from pyspark.sql import HiveContext
      >>> hc = HiveContext(sc)
      >>> hc.range(0, 100)
      DataFrame[id: bigint]
      >>> hc.range(0, 100).count()
      100
      ```
      ## How was this patch tested?
      Existing tests; also tested manually in the Python shell
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #13056 from techaddict/SPARK-15270.
  8. May 04, 2016
    • [SPARK-14896][SQL] Deprecate HiveContext in python · fa79d346
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      See title.
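
      For reference, a sketch of the migration this deprecation nudges users toward:

      ```python
      # Deprecated:
      from pyspark.sql import HiveContext
      hc = HiveContext(sc)

      # Preferred in 2.0:
      from pyspark.sql import SparkSession
      spark = SparkSession.builder.enableHiveSupport().getOrCreate()
      ```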
      
      ## How was this patch tested?
      
      PySpark tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12917 from andrewor14/deprecate-hive-context-python.
  9. Apr 29, 2016
    • [SPARK-15012][SQL] Simplify configuration API further · 66773eb8
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      1. Remove all the `spark.setConf` etc. Just expose `spark.conf`
      2. Make `spark.conf` take in things set in the core `SparkConf` as well; otherwise users may get confused
      
      This was done for both the Python and Scala APIs.
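
      A sketch of the resulting surface:

      ```python
      # Everything goes through the conf accessor now:
      spark.conf.set("spark.sql.shuffle.partitions", "8")
      spark.conf.get("spark.sql.shuffle.partitions")   # "8"
      # Values set in the core SparkConf are visible here too:
      spark.conf.get("spark.app.name")
      ```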
      
      ## How was this patch tested?
      `SQLConfSuite`, python tests.
      
      This one fixes the failed tests in #12787
      
      Closes #12787
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12798 from yhuai/conf-api.
    • [SPARK-14988][PYTHON] SparkSession API follow-ups · d33e3d57
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      Addresses comments in #12765.
      
      ## How was this patch tested?
      
      Python tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12784 from andrewor14/python-followup.
    • [SPARK-14988][PYTHON] SparkSession catalog and conf API · a7d0fedc
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      The `catalog` and `conf` APIs were exposed in `SparkSession` in #12713 and #12669. This patch adds those to the python API.
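
      A sketch of the newly exposed Python surface (method names mirror the Scala Catalog API; the "src" table is hypothetical):

      ```python
      spark.catalog.listDatabases()
      spark.catalog.listTables("default")
      spark.catalog.cacheTable("src")
      spark.conf.get("spark.sql.shuffle.partitions")
      ```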
      
      ## How was this patch tested?
      
      Python tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12765 from andrewor14/python-spark-session-more.
  10. Apr 28, 2016
    • [SPARK-14555] Second cut of Python API for Structured Streaming · 78c8aaf8
      Burak Yavuz authored
      ## What changes were proposed in this pull request?
      
      This PR adds Python APIs for:
       - `ContinuousQueryManager`
       - `ContinuousQueryException`
      
      The `ContinuousQueryException` is a very basic wrapper; it doesn't provide the functionality that the Scala side provides, but it follows the same pattern as `AnalysisException`.
      
      For `ContinuousQueryManager`, all APIs are provided except for registering listeners.
      
      This PR also attempts to fix test flakiness by stopping all active streams just before tests.
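
      A hedged sketch of the era's usage (the ContinuousQuery* names and the `streams` accessor shown here were later renamed to the StreamingQuery* family; treat the details as approximate):

      ```python
      query = df.write.format("memory").queryName("cqm_demo").startStream()
      sqlContext.streams.active                  # all active continuous queries
      sqlContext.streams.awaitAnyTermination(5)  # wait, with a timeout
      query.stop()
      ```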
      
      ## How was this patch tested?
      
      Python Doc tests and unit tests
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #12673 from brkyvz/pyspark-cqm.
    • [SPARK-14945][PYTHON] SparkSession Python API · 89addd40
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      ```
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
            /_/
      
      Using Python version 2.7.5 (default, Mar  9 2014 22:15:05)
      SparkSession available as 'spark'.
      >>> spark
      <pyspark.sql.session.SparkSession object at 0x101f3bfd0>
      >>> spark.sql("SHOW TABLES").show()
      ...
      +---------+-----------+
      |tableName|isTemporary|
      +---------+-----------+
      |      src|      false|
      +---------+-----------+
      
      >>> spark.range(1, 10, 2).show()
      +---+
      | id|
      +---+
      |  1|
      |  3|
      |  5|
      |  7|
      |  9|
      +---+
      ```
      **Note**: This API is NOT complete in its current state. In particular, I have left out the `conf` and `catalog` APIs, which were added later in Scala; these will be added to Python before 2.0.
      
      ## How was this patch tested?
      
      Python tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12746 from andrewor14/python-spark-session.
  11. Apr 25, 2016
    • [SPARK-14721][SQL] Remove HiveContext (part 2) · 3c5e65c3
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.
      
      Note: A couple of things will break after this patch. These will be fixed separately.
      - the python HiveContext
      - all the documentation / comments referencing HiveContext
      - there will be no more HiveContext in the REPL (fixed by #12589)
      
      ## How was this patch tested?
      
      No change in functionality.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12585 from andrewor14/delete-hive-context.
  12. Apr 24, 2016
    • Support single argument version of sqlContext.getConf · 902c15c5
      mathieu longtin authored
      ## What changes were proposed in this pull request?
      
      In Python, sqlContext.getConf didn't allow getting the system default (getConf with one parameter).
      
      Now the following are supported:
      ```
      sqlContext.getConf(confName)  # System default if not locally set, this is new
      sqlContext.getConf(confName, myDefault)  # myDefault if not locally set, old behavior
      ```
      
      I also added doctests to this function. The original behavior does not change.
      
      ## How was this patch tested?
      
      Manually, but doctests were added.
      
      Author: mathieu longtin <mathieu.longtin@nuance.com>
      
      Closes #12488 from mathieulongtin/pyfixgetconf3.
  13. Apr 14, 2016
    • [SPARK-14573][PYSPARK][BUILD] Fix PyDoc Makefile & highlighting issues · 478af2f4
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      The PyDoc Makefile used "=" rather than "?=" when setting environment variables, so it overwrote user-supplied values. As a result, the environment variables we set for linting were ignored, letting warnings through. This PR also fixes the warnings that had been introduced.
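
      The distinction, sketched in Make on a hypothetical variable (the real Makefile sets several such variables):

      ```
      SPHINXOPTS = -W     # "=" always assigns, clobbering the user's exported value
      SPHINXOPTS ?= -W    # "?=" assigns only if the variable isn't already set
      ```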
      
      ## How was this patch tested?
      
      manual local export & make
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12336 from holdenk/SPARK-14573-fix-pydoc-makefile.
  14. Mar 25, 2016
    • [SPARK-14014][SQL] Integrate session catalog (attempt #2) · 20ddf5fd
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This reopens #11836, which was merged but promptly reverted because it introduced flaky Hive tests.
      
      ## How was this patch tested?
      
      See `CatalogTestCases`, `SessionCatalogSuite` and `HiveContextSuite`.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #11938 from andrewor14/session-catalog-again.
  15. Mar 23, 2016
    • [SPARK-14014][SQL] Replace existing catalog with SessionCatalog · 5dfc0197
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      `SessionCatalog`, introduced in #11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This functionality overlaps a lot with the existing `analysis.Catalog`.
      
      As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be dead code. There are still things that need to be done after this patch, namely:
      - SPARK-14013: Properly implement temporary functions in `SessionCatalog`
      - SPARK-13879: Decide which DDL/DML commands to support natively in Spark
      - SPARK-?????: Implement the ones we do want to support through `SessionCatalog`.
      - SPARK-?????: Merge SQL/HiveContext
      
      ## How was this patch tested?
      
      This is largely a refactoring task so there are no new tests introduced. The particularly relevant tests are `SessionCatalogSuite` and `ExternalCatalogSuite`.
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #11836 from andrewor14/use-session-catalog.
  16. Mar 08, 2016
    • [SPARK-13593] [SQL] improve the `createDataFrame` to accept data type string and verify the data · d57daf1f
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      This PR improves the `createDataFrame` method to make it also accept a data type string, so users can easily convert a Python RDD to a DataFrame, for example `df = rdd.toDF("a: int, b: string")`.
      It also supports flat schemas, so users can convert an RDD of ints to a DataFrame directly; we automatically wrap the ints in rows for them.
      If a schema is given, we now check whether the real data matches it, and throw an error if it doesn't.
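
      A sketch of both behaviors (assumes a live SparkContext `sc` and SQLContext `sqlContext`):

      ```python
      rdd = sc.parallelize([(1, "a"), (2, "b")])
      df = rdd.toDF("a: int, b: string")     # schema given as a data type string

      # Flat schema: an RDD of plain ints is wrapped into rows automatically:
      sqlContext.createDataFrame(sc.parallelize([1, 2, 3]), "int").collect()

      # Data that contradicts a given schema now raises instead of passing silently:
      sqlContext.createDataFrame(rdd, "a: string, b: int")  # error
      ```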
      
      ## How was this patch tested?
      
      new tests in `test.py` and doc test in `types.py`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #11444 from cloud-fan/pyrdd.
  17. Feb 24, 2016
    • [SPARK-13467] [PYSPARK] abstract python function to simplify pyspark code · a60f9128
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      When we pass a Python function to the JVM side, we also need to send its context, e.g. `envVars`, `pythonIncludes`, `pythonExec`, etc. However, it's annoying to pass so many parameters around in so many places. This PR abstracts the Python function together with its context, to simplify some PySpark code and make the logic clearer.
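
      An illustrative sketch of the idea (the real holder lives on the Scala side; the field names here are hypothetical):

      ```python
      class PythonFunction(object):
          """One object bundling a pickled command with the context it needs,
          instead of threading four-plus parameters through every call site."""
          def __init__(self, command, env_vars, python_includes, python_exec):
              self.command = command
              self.env_vars = env_vars
              self.python_includes = python_includes
              self.python_exec = python_exec
      ```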
      
      ## How was this patch tested?
      
      by existing unit tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #11342 from cloud-fan/python-clean.
  18. Feb 21, 2016
    • [SPARK-12799] Simplify various string output for expressions · d9efe63e
      Cheng Lian authored
      This PR introduces several major changes:
      
      1. Replacing `Expression.prettyString` with `Expression.sql`
      
         The `prettyString` method is mostly an internal, developer-facing facility for debugging purposes, and shouldn't be exposed to users.
      
      1. Using a SQL-like representation as column names for selected fields that are not named expressions (back-ticks and double quotes should be removed)
      
         Before, we were using `prettyString` for column names when possible, and sometimes the resulting column names could be weird.  Here are several examples:
      
         Expression         | `prettyString` | `sql`      | Note
         ------------------ | -------------- | ---------- | ---------------
         `a && b`           | `a && b`       | `a AND b`  |
         `a.getField("f")`  | `a[f]`         | `a.f`      | `a` is a struct
      
      1. Adding trait `NonSQLExpression` extending from `Expression` for expressions that don't have a SQL representation (e.g. Scala UDF/UDAF and Java/Scala object expressions used for encoders)
      
         `NonSQLExpression.sql` may return an arbitrary user facing string representation of the expression.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10757 from liancheng/spark-12799.simplify-expression-string-methods.
  19. Jan 24, 2016
    • [SPARK-12120][PYSPARK] Improve exception message when failing to initialize HiveContext in PySpark · e789b1d2
      Jeff Zhang authored
      
      davies, mind reviewing?
      
      This is the error message after this PR:
      
      ```
      15/12/03 16:59:53 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
      /Users/jzhang/github/spark/python/pyspark/sql/context.py:689: UserWarning: You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
        warnings.warn("You must build Spark with Hive. "
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 663, in read
          return DataFrameReader(self)
        File "/Users/jzhang/github/spark/python/pyspark/sql/readwriter.py", line 56, in __init__
          self._jreader = sqlContext._ssql_ctx.read()
        File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 692, in _ssql_ctx
          raise e
      py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
      : java.lang.RuntimeException: java.net.ConnectException: Call From jzhangMBPr.local/127.0.0.1 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
      	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
      	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
      	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
      	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
      	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
      	at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
      	at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
      	at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
      	at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
      	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
      	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
      	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
      	at py4j.Gateway.invoke(Gateway.java:214)
      	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
      	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
      	at py4j.GatewayConnection.run(GatewayConnection.java:209)
      	at java.lang.Thread.run(Thread.java:745)
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10126 from zjffdu/SPARK-12120.
  20. Dec 30, 2015
    • [SPARK-12300] [SQL] [PYSPARK] fix schema inference on local collections · d1ca634d
      Holden Karau authored
      Current schema inference for local Python collections halts as soon as no NullTypes remain. This differs from specifying a sampling ratio of 1.0 on a distributed collection, and it can result in incomplete schema information.
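
      A hedged illustration of why the whole local collection must be scanned (field `b` is only typed by the second row):

      ```python
      from pyspark.sql import Row

      data = [Row(a=1, b=None), Row(a=2, b="x")]
      df = sqlContext.createDataFrame(data)
      df.schema   # "b" should come out as StringType, not NullType
      ```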
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #10275 from holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections.
  21. Nov 02, 2015
    • [SPARK-11437] [PYSPARK] Don't .take when converting RDD to DataFrame with provided schema · f92f334c
      Jason White authored
      When creating a DataFrame from an RDD in PySpark, `createDataFrame` calls `.take(10)` to verify the first 10 rows of the RDD match the provided schema. Similar to https://issues.apache.org/jira/browse/SPARK-8070, but that issue affected cases where a schema was not provided.
      
      Verifying the first 10 rows is of limited utility and causes the DAG to be executed non-lazily. If necessary, I believe this verification should be done lazily on all rows. However, since the caller is providing a schema to follow, I think it's acceptable to simply fail if the schema is incorrect.
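
      A sketch of the now-lazy path (schema and data are hypothetical):

      ```python
      from pyspark.sql.types import StructType, StructField, IntegerType

      schema = StructType([StructField("x", IntegerType(), True)])
      rdd = sc.parallelize([(1,), (2,)])
      # No eager .take(10) verification job runs here any more; rows that
      # violate the schema fail later, when the DataFrame is evaluated:
      df = sqlContext.createDataFrame(rdd, schema)
      ```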
      
      marmbrus We chatted about this at SparkSummitEU. davies you made a similar change for the infer-schema path in https://github.com/apache/spark/pull/6606
      
      Author: Jason White <jason.white@shopify.com>
      
      Closes #9392 from JasonMWhite/createDataFrame_without_take.
  22. Jul 30, 2015
    • [SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__ · e044705b
      Davies Liu authored
      Also, we can now create a Python UDT without having a Scala counterpart; this is important for Python users.
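
      A hedged sketch of a Python-only UDT defined in `__main__` (the hooks follow pyspark.sql.types.UserDefinedType; the Point class is hypothetical):

      ```python
      from pyspark.sql.types import UserDefinedType, ArrayType, DoubleType

      class PointUDT(UserDefinedType):
          @classmethod
          def sqlType(cls):
              return ArrayType(DoubleType(), False)
          @classmethod
          def module(cls):
              return "__main__"          # no Scala counterpart required
          def serialize(self, obj):
              return [obj.x, obj.y]
          def deserialize(self, datum):
              return Point(datum[0], datum[1])

      class Point(object):
          __UDT__ = PointUDT()           # attach the UDT to the Python class
          def __init__(self, x, y):
              self.x, self.y = x, y
      ```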
      
      cc mengxr JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7453 from davies/class_in_main and squashes the following commits:
      
      4dfd5e1 [Davies Liu] add tests for Python and Scala UDT
      793d9b2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into class_in_main
      dc65f19 [Davies Liu] address comment
      a9a3c40 [Davies Liu] Merge branch 'master' of github.com:apache/spark into class_in_main
      a86e1fc [Davies Liu] fix serialization
      ad528ba [Davies Liu] Merge branch 'master' of github.com:apache/spark into class_in_main
      63f52ef [Davies Liu] fix pylint check
      655b8a9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into class_in_main
      316a394 [Davies Liu] support Python UDT with UTF
      0bcb3ef [Davies Liu] fix bug in mllib
      de986d6 [Davies Liu] fix test
      83d65ac [Davies Liu] fix bug in StructType
      55bb86e [Davies Liu] support Python UDT in __main__ (without Scala one)
  23. Jul 20, 2015
    • [SPARK-9114] [SQL] [PySpark] convert returned object from UDF into internal type · 9f913c4f
      Davies Liu authored
      This PR also removes the code duplicated between registerFunction and UserDefinedFunction.
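
      An illustrative case of the conversion (a hedged sketch: a Python `datetime.date` returned by a UDF must be converted to Spark's internal representation before crossing to the JVM):

      ```python
      import datetime
      from pyspark.sql.functions import udf
      from pyspark.sql.types import DateType

      today = udf(lambda: datetime.date.today(), DateType())
      df = sqlContext.range(3)              # any DataFrame will do
      df.select(today().alias("d")).show()  # returned objects are converted internally
      ```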
      
      cc JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7450 from davies/fix_return_type and squashes the following commits:
      
      e80bf9f [Davies Liu] remove debugging code
      f94b1f6 [Davies Liu] fix mima
      8f9c58b [Davies Liu] convert returned object from UDF into internal type