  1. Dec 04, 2015
• [SPARK-12058][STREAMING][KINESIS][TESTS] fix Kinesis python tests · 302d68de
      Burak Yavuz authored
The Python tests require access to the `KinesisTestUtils` file. When this file lives under src/test, Python can't access it, since it is not included in the assembly jar.

However, if we move KinesisTestUtils to src/main, we need to add the KinesisProducerLibrary (KPL) as a dependency. To avoid this, I moved KinesisTestUtils to src/main and extended it with ExtendedKinesisTestUtils, which lives under src/test and adds support for the KPL.
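A minimal sketch of the layout this describes, assuming illustrative package paths and method bodies (only KinesisTestUtils and ExtendedKinesisTestUtils are named above):

```scala
// src/main/.../KinesisTestUtils.scala -- ships in the assembly jar, so the
// Python tests can reach it; depends only on the plain AWS SDK, not the KPL.
class KinesisTestUtils {
  def pushData(data: Seq[Int]): Unit = {
    // hypothetical body: write records with the basic AWS SDK client
  }
}

// src/test/.../ExtendedKinesisTestUtils.scala -- stays under src/test, so the
// KinesisProducerLibrary (KPL) remains a test-only dependency.
class ExtendedKinesisTestUtils extends KinesisTestUtils {
  override def pushData(data: Seq[Int]): Unit = {
    // hypothetical body: write the same records through the KPL instead
  }
}
```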
      
      cc zsxwing tdas
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #10050 from brkyvz/kinesis-py.
2. Nov 24, 2015
• [SPARK-10621][SQL] Consistent naming for functions in SQL, Python, Scala · 151d7c2b
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9948 from rxin/SPARK-10621.
• [SPARK-11967][SQL] Consistent use of varargs for multiple paths in DataFrameReader · 25bbd3c1
      Reynold Xin authored
This patch makes varargs usage consistent across all DataFrameReader methods, including Parquet, JSON, text, and the generic load function.
      
      Also added a few more API tests for the Java API.
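A usage sketch of what the consistent varargs signatures allow (paths and context setup are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("varargs-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

// Each reader method accepts multiple paths directly as varargs.
val parquetDF = sqlContext.read.parquet("data/day1.parquet", "data/day2.parquet")
val jsonDF    = sqlContext.read.json("logs/a.json", "logs/b.json")
val textDF    = sqlContext.read.text("notes/a.txt", "notes/b.txt")
val loadedDF  = sqlContext.read.format("json").load("logs/a.json", "logs/b.json")
```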
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9945 from rxin/SPARK-11967.
• [SPARK-11946][SQL] Audit pivot API for 1.6. · f3152722
      Reynold Xin authored
Currently pivot's signatures look like:

```scala
@scala.annotation.varargs
def pivot(pivotColumn: Column, values: Column*): GroupedData

@scala.annotation.varargs
def pivot(pivotColumn: String, values: Any*): GroupedData
```
      
I think we can remove the one that takes `Column` types, since callers should always be passing in literals. It would also be clearer if the values were not varargs but an explicit Seq or java.util.List (see the sketch below).
      
      I also made similar changes for Python.
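A sketch of the proposed shape, mirroring the block above (this is the proposal as described, not necessarily the exact merged API):

```scala
def pivot(pivotColumn: String): GroupedData
def pivot(pivotColumn: String, values: Seq[Any]): GroupedData
def pivot(pivotColumn: String, values: java.util.List[Any]): GroupedData
```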
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9929 from rxin/SPARK-11946.
3. Nov 23, 2015
• [SPARK-10560][PYSPARK][MLLIB][DOCS] Make StreamingLogisticRegressionWithSGD Python API equal to Scala one · 10574564
Bryan Cutler authored
      
This brings the API documentation of StreamingLogisticRegressionWithSGD and StreamingLinearRegressionWithSGD in line with the Scala versions.

- Fixed the algorithm descriptions
- Added default values to parameter descriptions
- Changed StreamingLogisticRegressionWithSGD regParam to default to 0, as in the Scala version
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #9141 from BryanCutler/StreamingLogisticRegressionWithSGD-python-api-sync.
• [SPARK-11836][SQL] udf/cast should not create new SQLContext · 1d912020
      Davies Liu authored
      They should use the existing SQLContext.
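A sketch of the reuse pattern this implies (`SQLContext.getOrCreate` has been available since Spark 1.5; the helper name here is illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Reuse the active SQLContext for the given SparkContext instead of
// constructing a fresh one; a new instance would have empty, divergent
// state (temp tables, UDF registry, configuration).
def existingSqlContext(sc: SparkContext): SQLContext =
  SQLContext.getOrCreate(sc)
```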
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9914 from davies/create_udf.
4. Nov 20, 2015
• [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer · be7a2cfd
Shixiong Zhu authored
      
TransformFunction and TransformFunctionSerializer don't rethrow exceptions, so when one occurs they just return None. This causes weird NPEs and confuses people.
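The actual change is in PySpark's internals; as a language-neutral sketch of the capture-and-rethrow idea (all names here are illustrative, not PySpark's):

```scala
// Wraps a user function so a failure is recorded and rethrown later, instead
// of being swallowed and surfacing only as a mysterious null/None result.
class RethrowingTransform[A, B](f: A => B) {
  @volatile private var failure: Option[Throwable] = None

  def apply(a: A): Option[B] =
    try Some(f(a))
    catch { case t: Throwable => failure = Some(t); None }

  // Called on the path that previously saw only the missing result.
  def rethrowIfFailed(): Unit = failure.foreach(t => throw t)
}
```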
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #9847 from zsxwing/pyspark-streaming-exception.
• [SPARK-11875][ML][PYSPARK] Update doc for PySpark HasCheckpointInterval · 7216f405
      Yanbo Liang authored
* Update the doc for PySpark ```HasCheckpointInterval``` so that users can understand how to disable checkpointing.
* Update the doc for PySpark ```cacheNodeIds``` of ```DecisionTreeParams``` to note the relationship between ```cacheNodeIds``` and ```checkpointInterval``` (see the sketch after this list).
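A sketch of the documented relationship, using the Scala side's equivalent params (the commit's actual wording changes are in the Python docstrings):

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier

// checkpointInterval only matters when cacheNodeIds is true and a checkpoint
// directory has been set on the SparkContext; setting it to -1 disables
// checkpointing of the cached node IDs entirely.
val dt = new DecisionTreeClassifier()
  .setCacheNodeIds(true)
  .setCheckpointInterval(-1)
```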
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9856 from yanboliang/spark-11875.
5. Nov 19, 2015
• [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow · 599a8c6e
      David Tolpin authored
invFunc is optional and can be None. Instead of invFunc (the parameter), invReduceFunc (a local function) was checked for truthiness (that is, for not being None, in this context). A local function is never None, so the case of invFunc=None (a common one when inverse reduction is not defined) was handled incorrectly, resulting in loss of data.

In addition, the docstring used wrong parameter names; those are also fixed.
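The fix itself is in Python; as a Scala sketch of the bug class, with `Option` standing in for an optional argument (all names illustrative):

```scala
// The None-check must apply to the optional parameter itself, not to a
// locally defined wrapper function, which always exists.
def reduceByKeyAndWindowSketch(
    reduceFunc: (Int, Int) => Int,
    invFunc: Option[(Int, Int) => Int]): Unit = {
  val invReduceFunc: (Int, Int) => Int = (a, b) => invFunc.get(a, b)
  // Buggy shape: testing the local wrapper is always true; test the parameter.
  if (invFunc.isDefined) {
    // incremental path: subtract values leaving the window via invReduceFunc
  } else {
    // recompute path: re-reduce the whole window from scratch
  }
}
```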
      
      Author: David Tolpin <david.tolpin@gmail.com>
      
      Closes #9775 from dtolpin/master.
6. Nov 17, 2015
• [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API · 75a29229
      jerryshao authored
      Fixed the merge conflicts in #7410
      
      Closes #7410
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      Author: jerryshao <saisai.shao@intel.com>
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #9742 from zsxwing/pr7410.
• [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch · 928d6316
      Shixiong Zhu authored
We checkpoint both when generating a batch and when completing a batch. When the processing time of a batch is greater than the batch interval, checkpointing for completing an old batch may run after checkpointing for generating a new batch. If this happens, the checkpoint of the old batch actually holds the latest information, so we want to recover from it. This PR uses the latest checkpoint time as the file name, so that we can always recover from the latest checkpoint file.
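A sketch of the recovery side of that idea (the real logic lives in Spark Streaming's Checkpoint handling; names and file layout here are illustrative):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// With checkpoint files named by their checkpoint time, recovery can simply
// take the newest file, so a slow old batch can no longer shadow newer state.
def latestCheckpointFile(fs: FileSystem, checkpointDir: Path): Option[Path] = {
  val files = fs.listStatus(checkpointDir).map(_.getPath)
  if (files.isEmpty) None
  else Some(files.maxBy(_.getName))  // time-stamped names sort temporally
}
```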
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #9707 from zsxwing/fix-checkpoint.
7. Nov 10, 2015
• [SPARK-11566] [MLLIB] [PYTHON] Refactoring GaussianMixtureModel.gaussians in Python · c0e48dfa
      Yu ISHIKAWA authored
      cc jkbradley
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #9534 from yu-iskw/SPARK-11566.
• [SPARK-11567] [PYTHON] Add Python API for corr Aggregate function · 32790fe7
      felixcheung authored
like `df.agg(corr("col1", "col2"))`
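For reference, the Scala-side equivalent of the new Python call (`df` is any DataFrame with numeric columns named col1 and col2):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.corr

// Pearson correlation between two columns, computed as an aggregate.
def correlate(df: DataFrame): DataFrame =
  df.agg(corr("col1", "col2"))
```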
      
      davies
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #9536 from felixcheung/pyfunc.
• [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s · e0701c75
Yin Huai authored
      
      https://issues.apache.org/jira/browse/SPARK-9830
      
      This PR contains the following main changes.
      * Removing `AggregateExpression1`.
      * Removing `Aggregate` operator, which is used to evaluate `AggregateExpression1`.
      * Removing planner rule used to plan `Aggregate`.
* Linking `MultipleDistinctRewriter` to the analyzer.
      * Renaming `AggregateExpression2` to `AggregateExpression` and `AggregateFunction2` to `AggregateFunction`.
* Updating places where we create aggregate expressions. The way to create an aggregate expression is `AggregateExpression(aggregateFunction, mode, isDistinct)` (see the sketch after this list).
* Changing `val`s in `DeclarativeAggregate`s that touch the children of the function to `lazy val`s (when we create an aggregate expression through the DataFrame API, the children of an aggregate function can be unresolved).
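A sketch of that creation pattern (imports reflect where these classes live after the rename; constructor details are abbreviated):

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Sum}

// The mode (Partial, PartialMerge, Final, Complete) says which phase of
// aggregation this expression evaluates.
def sumOf(child: Expression): AggregateExpression =
  AggregateExpression(Sum(child), Complete, isDistinct = false)
```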
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #9556 from yhuai/removeAgg1.
8. Nov 07, 2015
• [SPARK-8467] [MLLIB] [PYSPARK] Add LDAModel.describeTopics() in Python · 2ff0e79a
      Yu ISHIKAWA authored
      Could jkbradley and davies review it?
      
- Create a wrapper class, `LDAModelWrapper`, for `LDAModel`, because we can't handle the return value of `describeTopics` in Scala from pyspark directly: `Array[(Array[Int], Array[Double])]` is too complicated to convert (see the sketch after this list).
- Add `loadLDAModel` in `PythonMLlibAPI`, since `LDAModel` in Scala is an abstract class and we need to call `load` on `DistributedLDAModel`.
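A minimal sketch of the wrapper idea (only `LDAModel` and `describeTopics` are named above; the flattening scheme here is an assumption, and the real wrapper may serialize differently):

```scala
import org.apache.spark.mllib.clustering.LDAModel

// Flattens each (termIndices, termWeights) pair into a two-element array so
// Py4J can hand it to Python without tuple conversion.
class LDAModelWrapperSketch(model: LDAModel) {
  def describeTopics(maxTermsPerTopic: Int): Array[Array[AnyRef]] =
    model.describeTopics(maxTermsPerTopic).map { case (terms, weights) =>
      Array[AnyRef](terms, weights)
    }
}
```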
      
      [[SPARK-8467] Add LDAModel.describeTopics() in Python - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8467)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8643 from yu-iskw/SPARK-8467-2.