  1. Sep 02, 2016
• [SPARK-17261] [PYSPARK] Using HiveContext after re-creating SparkContext in... · ea662286
      Jeff Zhang authored
[SPARK-17261] [PYSPARK] Using HiveContext after re-creating SparkContext in Spark 2.0 throws "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext"
      
      ## What changes were proposed in this pull request?
      
Set `SparkSession._instantiatedContext` to None so that the SparkSession can be recreated.
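
A minimal sketch of the idea (the attribute name comes from the description above; the surrounding details of `session.py` are assumed):
```python
# Sketch only: when the session is stopped, clear the cached class-level
# reference so a later builder.getOrCreate() builds a fresh SparkSession
# instead of returning one bound to a stopped SparkContext.
class SparkSession(object):
    _instantiatedContext = None  # cached active session (name from this PR)

    def __init__(self, sc):
        self._sc = sc

    def stop(self):
        """Stop the underlying SparkContext and forget the cached session."""
        self._sc.stop()
        SparkSession._instantiatedContext = None
```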
      
      ## How was this patch tested?
      
Tested manually using the following commands in the pyspark shell
      ```
      spark.stop()
      spark = SparkSession.builder.enableHiveSupport().getOrCreate()
      spark.sql("show databases").show()
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14857 from zjffdu/SPARK-17261.
      ea662286
  2. Aug 15, 2016
• [SPARK-16700][PYSPARK][SQL] create DataFrame from dict/Row with schema · fffb0c0d
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
In 2.0, we verify the data type against the schema for every row for safety, but at a performance cost; this PR makes the verification optional.

When we verify the data type for StructType, it does not support all the types we support in schema inference (for example, dict); this PR fixes that to make them consistent.

For a Row object created using named arguments, the fields are sorted by name, so their order may differ from the order in the provided schema; this PR fixes that by ignoring the field order in this case.
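
For illustration, here is a hedged sketch of both behaviours, assuming the optional verification is exposed as a `verifySchema` flag on `createDataFrame` (the flag name is inferred from the description above):
```python
from pyspark.sql import Row, SparkSession
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()
schema = StructType([StructField("a", LongType()), StructField("b", StringType())])

# dict input and Row(named args) both target the explicit schema; the Row's
# fields are sorted by name, so matching should ignore the field order.
df1 = spark.createDataFrame([{"a": 1, "b": "x"}], schema)

# Skipping the per-row type verification to avoid the performance cost.
df2 = spark.createDataFrame([Row(b="y", a=2)], schema, verifySchema=False)
df2.show()
```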
      
      ## How was this patch tested?
      
      Created regression tests for them.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #14469 from davies/py_dict.
      fffb0c0d
  3. Jul 29, 2016
  4. Jul 28, 2016
  5. Jul 14, 2016
• [SPARK-16503] SparkSession should provide Spark version · 39c836e9
      Liwei Lin authored
      ## What changes were proposed in this pull request?
      
This patch enables SparkSession to provide the Spark version.
      
      ## How was this patch tested?
      
      Manual test:
      
      ```
      scala> sc.version
      res0: String = 2.1.0-SNAPSHOT
      
      scala> spark.version
      res1: String = 2.1.0-SNAPSHOT
      ```
      
      ```
      >>> sc.version
      u'2.1.0-SNAPSHOT'
      >>> spark.version
      u'2.1.0-SNAPSHOT'
      ```
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #14165 from lw-lin/add-version.
      39c836e9
  6. Jul 06, 2016
  7. Jun 29, 2016
• [SPARK-16266][SQL][STREAMING] Moved DataStreamReader/Writer from pyspark.sql to... · f454a7f9
      Tathagata Das authored
[SPARK-16266][SQL][STREAMING] Moved DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming
      
      ## What changes were proposed in this pull request?
      
- Moved DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming to make them consistent with the Scala packaging (see the import sketch after this list)
      - Exposed the necessary classes in sql.streaming package so that they appear in the docs
      - Added pyspark.sql.streaming module to the docs
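
A quick sketch of the relocated imports after this change (typical user code reaches these classes through `spark.readStream` / `df.writeStream` rather than importing them directly):
```python
# New home of the streaming reader/writer classes after this change.
from pyspark.sql.streaming import DataStreamReader, DataStreamWriter
```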
      
      ## How was this patch tested?
      - updated unit tests.
      - generated docs for testing visibility of pyspark.sql.streaming classes.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #13955 from tdas/SPARK-16266.
      f454a7f9
  8. Jun 28, 2016
• [SPARK-16224] [SQL] [PYSPARK] SparkSession builder's configs need to be set to... · 0923c4f5
      Yin Huai authored
      [SPARK-16224] [SQL] [PYSPARK] SparkSession builder's configs need to be set to the existing Scala SparkContext's SparkConf
      
      ## What changes were proposed in this pull request?
When we create a SparkSession on the Python side, it is possible that a SparkContext has already been created. For this case, we need to set the configs of the SparkSession builder on the Scala SparkContext's SparkConf (we need to do so because conf changes on an active Python SparkContext will not be propagated to the JVM side). Otherwise, we may create a wrong SparkSession (e.g. Hive support is not enabled even if enableHiveSupport is called).
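
A hedged illustration of the scenario being fixed (it assumes a SparkContext already exists on the Python side before the builder runs):
```python
from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext()  # a SparkContext is created first

# With this fix, the builder's options (including the one set by
# enableHiveSupport) are applied to the existing Scala SparkContext's
# SparkConf, so the resulting session actually gets Hive support.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
print(spark.conf.get("spark.sql.catalogImplementation"))  # expected: 'hive'
```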
      
      ## How was this patch tested?
      New tests and manual tests.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #13931 from yhuai/SPARK-16224.
      0923c4f5
  9. Jun 18, 2016
• [SPARK-15803] [PYSPARK] Support with statement syntax for SparkSession · 898cb652
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
Support with-statement syntax for SparkSession in pyspark.
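
A short usage sketch (the application name is just an example):
```python
from pyspark.sql import SparkSession

# The session now works as a context manager: the underlying SparkContext
# is stopped automatically when the block exits.
with SparkSession.builder.appName("with-demo").getOrCreate() as spark:
    spark.range(3).show()
```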
      
      ## How was this patch tested?
      
Verified manually. Although a unit test could be added, it would affect other unit tests because the SparkContext is stopped after the with statement.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #13541 from zjffdu/SPARK-15803.
      898cb652
  10. Jun 15, 2016
  11. Jun 14, 2016
• [SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream... · 214adb14
      Tathagata Das authored
      [SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream and writeStream for streaming DFs
      
      ## What changes were proposed in this pull request?
Currently, DataFrameReader/Writer have methods that are needed for both streaming and non-streaming DFs. This is quite awkward because each such method throws a runtime exception for one case or the other. So rather than having half the methods throw runtime exceptions, it is better to have a separate reader/writer API for streams.
      
      - [x] Python API!!
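
A hedged usage sketch of the resulting split in the Python API (the path, formats and options below are only illustrative):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Batch I/O stays on spark.read / df.write; streaming I/O now goes through
# spark.readStream / df.writeStream, which return DataStreamReader/Writer.
stream_df = spark.readStream.format("text").load("/tmp/stream-input")
query = (stream_df.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.stop()
```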
      
      ## How was this patch tested?
      Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #13653 from tdas/SPARK-15933.
      214adb14
• [SPARK-15935][PYSPARK] Enable test for sql/streaming.py and fix these tests · 96c3500c
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR just enables tests for sql/streaming.py and also fixes the failures.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13655 from zsxwing/python-streaming-test.
      96c3500c
  12. Jun 06, 2016
• [MINOR] Fix Typos 'an -> a' · fd8af397
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `an -> a`
      
      Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one.
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13515 from zhengruifeng/an_a.
      fd8af397
  13. May 26, 2016
  14. May 25, 2016
  15. May 20, 2016
• [SPARK-15417][SQL][PYTHON] PySpark shell always uses in-memory catalog · c32b1b16
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      There is no way to use the Hive catalog in `pyspark-shell`. This is because we used to create a `SparkContext` before calling `SparkSession.enableHiveSupport().getOrCreate()`, which just gets the existing `SparkContext` instead of creating a new one. As a result, `spark.sql.catalogImplementation` was never propagated.
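
A quick way to observe the behaviour from the shell itself (a sketch; it assumes a Hive-enabled build and that `spark` is the session predefined by pyspark-shell):
```python
# Before the fix this reported the in-memory catalog even though Hive support
# was requested; afterwards it should report 'hive'.
print(spark.conf.get("spark.sql.catalogImplementation"))
```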
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #13203 from andrewor14/fix-pyspark-shell.
      c32b1b16
  16. May 19, 2016
• [SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate... · f2ee0ed4
      Reynold Xin authored
      [SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified
      
      ## What changes were proposed in this pull request?
Currently SparkSession.Builder uses SQLContext.getOrCreate. It should probably be the other way around, i.e. all the core logic goes into SparkSession, and SQLContext just calls that. This patch does that.
      
      This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession.
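
In Python terms, the propagation can be pictured roughly like this (the config key is only an example):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Re-invoking the builder with a config option should now update the existing
# session instead of silently dropping the option.
same = (SparkSession.builder
        .config("spark.sql.shuffle.partitions", "4")
        .getOrCreate())
assert same is spark
print(spark.conf.get("spark.sql.shuffle.partitions"))  # expected: '4'
```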
      
      ## How was this patch tested?
      Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13200 from rxin/SPARK-15075.
      f2ee0ed4
  17. May 17, 2016
• [SPARK-15171][SQL] Remove the references to deprecated method dataset.registerTempTable · 25b315e6
      Sean Zhong authored
      ## What changes were proposed in this pull request?
      
      Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`.
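
For reference, the replacement pattern is roughly:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(5)

# df.registerTempTable("t")        # deprecated call being removed
df.createOrReplaceTempView("t")    # its replacement
spark.sql("SELECT count(*) AS n FROM t").show()
```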
      
      ## How was this patch tested?
      
      This PR only changes the unit test code, examples, and comments. It should be safe.
      This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged.
      
      Author: Sean Zhong <seanzhong@databricks.com>
      
      Closes #13098 from clockfly/spark-15171-remove-deprecation.
      25b315e6
• [SPARK-15244] [PYTHON] Type of column name created with createDataFrame is not consistent. · 0f576a57
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      **createDataFrame** returns inconsistent types for column names.
      ```python
      >>> from pyspark.sql.types import StructType, StructField, StringType
      >>> schema = StructType([StructField(u"col", StringType())])
      >>> df1 = spark.createDataFrame([("a",)], schema)
      >>> df1.columns # "col" is str
      ['col']
      >>> df2 = spark.createDataFrame([("a",)], [u"col"])
      >>> df2.columns # "col" is unicode
      [u'col']
      ```
      
The reason is that only **StructField** has the following code.
      ```
      if not isinstance(name, str):
          name = name.encode('utf-8')
      ```
      This PR adds the same logic into **createDataFrame** for consistency.
      ```
      if isinstance(schema, list):
          schema = [x.encode('utf-8') if not isinstance(x, str) else x for x in schema]
      ```
      
      ## How was this patch tested?
      
      Pass the Jenkins test (with new python doctest)
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #13097 from dongjoon-hyun/SPARK-15244.
      0f576a57
  18. May 12, 2016
  19. May 11, 2016
  20. May 04, 2016
• [SPARK-15126][SQL] RuntimeConfig.set should return Unit · 6ae9fc00
      Reynold Xin authored
      ## What changes were proposed in this pull request?
Currently we return RuntimeConfig itself to facilitate chaining. However, it makes the output in interactive environments (e.g. notebooks, the Scala REPL) weird, because it shows the result of calling set as a RuntimeConfig object itself.
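
The practical effect on user code is that `set` calls are no longer chained; a small Python-side sketch of the intended usage (the keys are only examples):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# conf.set(...) now evaluates to Unit/None, so options are set one call at a
# time rather than by chaining the return value.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.conf.set("spark.sql.sources.default", "parquet")
```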
      
      ## How was this patch tested?
      Updated unit tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #12902 from rxin/SPARK-15126.
      6ae9fc00
  21. May 03, 2016
  22. Apr 29, 2016
• [SPARK-15012][SQL] Simplify configuration API further · 66773eb8
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      1. Remove all the `spark.setConf` etc. Just expose `spark.conf`
      2. Make `spark.conf` take in things set in the core `SparkConf` as well, otherwise users may get confused
      
      This was done for both the Python and Scala APIs.
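
A brief sketch of the simplified surface; `spark.app.name` is shown only as an example of a value that originates in the core SparkConf:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("conf-demo").getOrCreate()

# One entry point for configuration: SQL options set at runtime and values
# coming from the core SparkConf are both read through spark.conf.
spark.conf.set("spark.sql.shuffle.partitions", "8")
print(spark.conf.get("spark.sql.shuffle.partitions"))
print(spark.conf.get("spark.app.name"))  # expected: 'conf-demo'
```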
      
      ## How was this patch tested?
      `SQLConfSuite`, python tests.
      
      This one fixes the failed tests in #12787
      
      Closes #12787
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12798 from yhuai/conf-api.
      66773eb8
• [SPARK-14988][PYTHON] SparkSession API follow-ups · d33e3d57
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      Addresses comments in #12765.
      
      ## How was this patch tested?
      
      Python tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12784 from andrewor14/python-followup.
      d33e3d57
• [SPARK-14988][PYTHON] SparkSession catalog and conf API · a7d0fedc
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      The `catalog` and `conf` APIs were exposed in `SparkSession` in #12713 and #12669. This patch adds those to the python API.
      
      ## How was this patch tested?
      
      Python tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12765 from andrewor14/python-spark-session-more.
      a7d0fedc
  23. Apr 28, 2016
• [SPARK-14945][PYTHON] SparkSession Python API · 89addd40
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      ```
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
            /_/
      
      Using Python version 2.7.5 (default, Mar  9 2014 22:15:05)
      SparkSession available as 'spark'.
      >>> spark
      <pyspark.sql.session.SparkSession object at 0x101f3bfd0>
      >>> spark.sql("SHOW TABLES").show()
      ...
      +---------+-----------+
      |tableName|isTemporary|
      +---------+-----------+
      |      src|      false|
      +---------+-----------+
      
      >>> spark.range(1, 10, 2).show()
      +---+
      | id|
      +---+
      |  1|
      |  3|
      |  5|
      |  7|
      |  9|
      +---+
      ```
      **Note**: This API is NOT complete in its current state. In particular, for now I left out the `conf` and `catalog` APIs, which were added later in Scala. These will be added later before 2.0.
      
      ## How was this patch tested?
      
      Python tests.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12746 from andrewor14/python-spark-session.
      89addd40
  24. Apr 25, 2016
• [SPARK-14721][SQL] Remove HiveContext (part 2) · 3c5e65c3
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.
      
      Note: A couple of things will break after this patch. These will be fixed separately.
      - the python HiveContext
      - all the documentation / comments referencing HiveContext
      - there will be no more HiveContext in the REPL (fixed by #12589)
      
      ## How was this patch tested?
      
      No change in functionality.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12585 from andrewor14/delete-hive-context.
      3c5e65c3
  25. Apr 24, 2016
• Support single argument version of sqlContext.getConf · 902c15c5
      mathieu longtin authored
      ## What changes were proposed in this pull request?
      
      In Python, sqlContext.getConf didn't allow getting the system default (getConf with one parameter).
      
      Now the following are supported:
      ```
      sqlContext.getConf(confName)  # System default if not locally set, this is new
      sqlContext.getConf(confName, myDefault)  # myDefault if not locally set, old behavior
      ```
      
      I also added doctests to this function. The original behavior does not change.
      
      ## How was this patch tested?
      
      Manually, but doctests were added.
      
      Author: mathieu longtin <mathieu.longtin@nuance.com>
      
      Closes #12488 from mathieulongtin/pyfixgetconf3.
      902c15c5
  26. Apr 14, 2016
• [SPARK-14573][PYSPARK][BUILD] Fix PyDoc Makefile & highlighting issues · 478af2f4
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
The PyDoc Makefile used "=" rather than "?=" for setting environment variables, so it overwrote the user-supplied values. This ignored the environment variables we set for linting, allowing warnings through. This PR also fixes the warnings that had been introduced.
      
      ## How was this patch tested?
      
      manual local export & make
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12336 from holdenk/SPARK-14573-fix-pydoc-makefile.
      478af2f4
  27. Mar 25, 2016
• [SPARK-14014][SQL] Integrate session catalog (attempt #2) · 20ddf5fd
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This reopens #11836, which was merged but promptly reverted because it introduced flaky Hive tests.
      
      ## How was this patch tested?
      
      See `CatalogTestCases`, `SessionCatalogSuite` and `HiveContextSuite`.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #11938 from andrewor14/session-catalog-again.
      20ddf5fd
  28. Mar 24, 2016
  29. Mar 23, 2016
• [SPARK-14014][SQL] Replace existing catalog with SessionCatalog · 5dfc0197
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      `SessionCatalog`, introduced in #11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This functionality overlaps a lot with the existing `analysis.Catalog`.
      
      As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be dead code. There are still things that need to be done after this patch, namely:
      - SPARK-14013: Properly implement temporary functions in `SessionCatalog`
      - SPARK-13879: Decide which DDL/DML commands to support natively in Spark
      - SPARK-?????: Implement the ones we do want to support through `SessionCatalog`.
      - SPARK-?????: Merge SQL/HiveContext
      
      ## How was this patch tested?
      
      This is largely a refactoring task so there are no new tests introduced. The particularly relevant tests are `SessionCatalogSuite` and `ExternalCatalogSuite`.
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #11836 from andrewor14/use-session-catalog.
      5dfc0197
  30. Mar 08, 2016
• [SPARK-13593] [SQL] improve the `createDataFrame` to accept data type string and verify the data · d57daf1f
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
This PR improves the `createDataFrame` method to make it also accept a datatype string, so users can convert a Python RDD to a DataFrame easily, for example `df = rdd.toDF("a: int, b: string")`.
It also supports flat schemas, so users can convert an RDD of int to a DataFrame directly; we automatically wrap the ints in rows for them.
If a schema is given, we now check whether the real data matches the given schema, and throw an error if it doesn't.
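
The new call patterns would look roughly like this (the RDD contents are illustrative):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([(1, "a"), (2, "b")])
df = rdd.toDF("a: int, b: string")        # datatype string instead of a StructType
df.printSchema()

ints = sc.parallelize([1, 2, 3])
df2 = spark.createDataFrame(ints, "int")  # flat schema: each int is wrapped in a row
df2.show()
```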
      
      ## How was this patch tested?
      
      new tests in `test.py` and doc test in `types.py`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #11444 from cloud-fan/pyrdd.
      d57daf1f
  31. Mar 02, 2016
  32. Feb 24, 2016
• [SPARK-13467] [PYSPARK] abstract python function to simplify pyspark code · a60f9128
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
When we pass a Python function to the JVM side, we also need to send its context, e.g. `envVars`, `pythonIncludes`, `pythonExec`, etc. However, it's annoying to pass around so many parameters in so many places. This PR abstracts the Python function along with its context, to simplify some pyspark code and make the logic clearer.
      
## How was this patch tested?

By existing unit tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #11342 from cloud-fan/python-clean.
      a60f9128
  33. Feb 21, 2016
• [SPARK-12799] Simplify various string output for expressions · d9efe63e
      Cheng Lian authored
      This PR introduces several major changes:
      
      1. Replacing `Expression.prettyString` with `Expression.sql`
      
The `prettyString` method is mostly an internal, developer-facing facility for debugging purposes, and shouldn't be exposed to users.
      
1. Using a SQL-like representation as column names for selected fields that are not named expressions (back-ticks and double quotes should be removed)
      
         Before, we were using `prettyString` as column names when possible, and sometimes the result column names can be weird.  Here are several examples:
      
         Expression         | `prettyString` | `sql`      | Note
         ------------------ | -------------- | ---------- | ---------------
         `a && b`           | `a && b`       | `a AND b`  |
         `a.getField("f")`  | `a[f]`         | `a.f`      | `a` is a struct
      
      1. Adding trait `NonSQLExpression` extending from `Expression` for expressions that don't have a SQL representation (e.g. Scala UDF/UDAF and Java/Scala object expressions used for encoders)
      
         `NonSQLExpression.sql` may return an arbitrary user facing string representation of the expression.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10757 from liancheng/spark-12799.simplify-expression-string-methods.
      d9efe63e
  34. Jan 24, 2016
• [SPARK-12120][PYSPARK] Improve exception message when failing to initialize HiveContext in PySpark · e789b1d2
Jeff Zhang authored
      
      davies Mind to review ?
      
This is the error message after this PR:
      
      ```
      15/12/03 16:59:53 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
      /Users/jzhang/github/spark/python/pyspark/sql/context.py:689: UserWarning: You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
        warnings.warn("You must build Spark with Hive. "
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 663, in read
          return DataFrameReader(self)
        File "/Users/jzhang/github/spark/python/pyspark/sql/readwriter.py", line 56, in __init__
          self._jreader = sqlContext._ssql_ctx.read()
        File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 692, in _ssql_ctx
          raise e
      py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
      : java.lang.RuntimeException: java.net.ConnectException: Call From jzhangMBPr.local/127.0.0.1 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
      	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
      	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
      	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
      	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
      	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
      	at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
      	at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
      	at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
      	at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
      	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
      	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
      	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
      	at py4j.Gateway.invoke(Gateway.java:214)
      	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
      	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
      	at py4j.GatewayConnection.run(GatewayConnection.java:209)
      	at java.lang.Thread.run(Thread.java:745)
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10126 from zjffdu/SPARK-12120.
      e789b1d2
  35. Jan 04, 2016
  36. Dec 30, 2015
• [SPARK-12300] [SQL] [PYSPARK] fix schema inference on local collections · d1ca634d
      Holden Karau authored
Current schema inference for local Python collections halts as soon as there are no NullTypes. This is different from what happens when we specify a sampling ratio of 1.0 on a distributed collection, and it can result in incomplete schema information.
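
A hedged example of the kind of case this addresses (the field names are hypothetical):
```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# The first row leaves 'b' as None; with complete inference over the local
# collection, 'b' should be resolved from a later row instead of being left
# as an unresolved NullType.
rows = [Row(a=1, b=None), Row(a=2, b="x")]
df = spark.createDataFrame(rows)
df.printSchema()  # expected: b inferred as string
```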
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #10275 from holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections.
      d1ca634d