  1. Dec 09, 2015
  2. Dec 08, 2015
  3. Dec 07, 2015
    • [SPARK-12160][MLLIB] Use SQLContext.getOrCreate in MLlib · 3e7e05f5
      Joseph K. Bradley authored
      Switched from using SQLContext constructor to using getOrCreate, mainly in model save/load methods.
      
      This covers all instances in spark.mllib.  There were no uses of the constructor in spark.ml.
      
      CC: mengxr yhuai
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #10161 from jkbradley/mllib-sqlcontext-fix.
      3e7e05f5
    • [SPARK-12184][PYTHON] Make python api doc for pivot consistent with scala doc · 36282f78
      Andrew Ray authored
      SPARK-11946 changed the pivot API slightly and updated its doc, but the corresponding doc changes were not made for the Python API. This PR updates the Python doc to be consistent.
      
      Author: Andrew Ray <ray.andrew@gmail.com>
      
      Closes #10176 from aray/sql-pivot-python-doc.
      36282f78
    • [SPARK-11884] Drop multiple columns in the DataFrame API · 84b80944
      tedyu authored
      See the thread Ben started:
      http://search-hadoop.com/m/q3RTtveEuhjsr7g/
      
      This PR adds a drop() method to DataFrame that accepts multiple column names.
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #9862 from ted-yu/master.
      84b80944
    • [SPARK-11963][DOC] Add docs for QuantileDiscretizer · 871e85d9
      Xusen Yin authored
      https://issues.apache.org/jira/browse/SPARK-11963
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #9962 from yinxusen/SPARK-11963.
      871e85d9
    • [SPARK-12060][CORE] Avoid memory copy in JavaSerializerInstance.serialize · 3f4efb5c
      Shixiong Zhu authored
      Merged #10051 again since #10083 is resolved.
      
      This reverts commit 328b757d.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10167 from zsxwing/merge-SPARK-12060.
      3f4efb5c
    • [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present · 5d80d8c6
      Tathagata Das authored
      The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to them. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004).
      
      While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there is still a non-zero chance that saving or recovery fails. To be resilient, this PR repartitions the previous state RDD if no partitioner is detected.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #9988 from tdas/SPARK-11932.
      5d80d8c6
    • [SPARK-12132] [PYSPARK] raise KeyboardInterrupt inside SIGINT handler · ef3f047c
      Davies Liu authored
      Currently, Ctrl-C does not clear the current line.

      After this patch:
      ```
      >>> asdfasdf^C
      Traceback (most recent call last):
        File "~/spark/python/pyspark/context.py", line 225, in signal_handler
          raise KeyboardInterrupt()
      KeyboardInterrupt
      ```
      
      It's still worse than 1.5 (and before).
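
The idea in the title can be sketched in plain Python as follows. This is a minimal illustration, not the code from the patch: `install_sigint_handler` and its `on_interrupt` callback are hypothetical names, and the callback stands in for whatever cleanup the real handler performs before raising.

```python
import signal
import time


def install_sigint_handler(on_interrupt):
    """Install a SIGINT handler that runs a callback, then raises KeyboardInterrupt.

    `on_interrupt` is a hypothetical hook for this sketch (for example,
    cancelling running work). Returns the previously installed handler.
    """
    def handler(signum, frame):
        on_interrupt()
        # Raising here lets the interactive shell abandon the current input line.
        raise KeyboardInterrupt()

    return signal.signal(signal.SIGINT, handler)


cancelled = []
previous = install_sigint_handler(lambda: cancelled.append(True))

interrupted = False
try:
    signal.raise_signal(signal.SIGINT)  # simulate Ctrl-C (Python 3.8+)
    time.sleep(0.1)                     # give the handler a chance to run
except KeyboardInterrupt:
    interrupted = True
finally:
    # Restore whatever handler was installed before the sketch ran.
    if previous is not None:
        signal.signal(signal.SIGINT, previous)
```

The handler must be installed from the main thread, since `signal.signal` is only allowed there.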
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10134 from davies/fix_cltrc.
      ef3f047c
    • [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. · 39d677c8
      Sun Rui authored
      This PR:
      1. Suppress all known warnings.
      2. Clean up test cases and fix some errors in them.
      3. Fix errors in HiveContext-related test cases. These test cases were not actually run previously due to a bug in creating TestHiveContext.
      4. Support 'testthat' package version 0.11.0, which prefers that test cases be under 'tests/testthat'.
      5. Make sure the default Hadoop file system is local when running test cases.
      6. Turn warnings into errors.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10030 from sun-rui/SPARK-12034.
      39d677c8
    • [SPARK-12032] [SQL] Re-order inner joins to do join with conditions first · 9cde7d5f
      Davies Liu authored
      Currently, joins are executed in exactly the order they appear in the SQL query. Some conditions may not be pushed down to the correct join, so those joins become cross products and are extremely slow.
      
      This patch tries to re-order the inner joins (which are common in SQL queries): it picks the joins that have self-contained conditions first and delays those that do not have conditions.
      
      After this patch, TPCDS queries Q64 and Q65 can run hundreds of times faster.
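
The re-ordering idea can be illustrated with a small greedy sketch in plain Python. It is not the optimizer rule from the patch: `reorder_inner_joins`, the relation names, and the representation of a condition as a pair (set of referenced relations, text) are all assumptions made for this example.

```python
def reorder_inner_joins(relations, conditions):
    """Greedily order inner joins so that joins with usable conditions come first.

    relations:  relation names in the order they appear in the query.
    conditions: list of (frozenset of referenced relation names, condition text).
    Returns (first relation, [(next relation, [conditions applied]), ...]).
    """
    remaining = list(relations)
    joined = [remaining.pop(0)]
    pending = list(conditions)
    plan = []
    while remaining:
        choice = None
        for candidate in remaining:
            covered = set(joined) | {candidate}
            # A condition is usable once every relation it references is joined.
            usable = [c for c in pending if candidate in c[0] and c[0] <= covered]
            if usable:
                choice = (candidate, usable)
                break
        if choice is None:
            # No condition connects any remaining relation: fall back to a
            # cross product with the next relation in query order.
            choice = (remaining[0], [])
        candidate, usable = choice
        remaining.remove(candidate)
        joined.append(candidate)
        pending = [c for c in pending if c not in usable]
        plan.append((candidate, [text for _, text in usable]))
    return joined[0], plan


# In query order, "a JOIN b" has no condition and would be a cross product.
first, plan = reorder_inner_joins(
    ["a", "b", "c"],
    [(frozenset({"a", "c"}), "a.id = c.id"),
     (frozenset({"b", "c"}), "b.id = c.id")],
)
```

In this example the sketch joins `a` with `c` first, because that join has a condition, and only then joins `b`, so no cross product is needed.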
      
      cc marmbrus nongli
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10073 from davies/reorder_joins.
      9cde7d5f
    • [SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when Jenkins load is high · 6fd9e70e
      Burak Yavuz authored
      We need to make sure that the last entry is indeed the last entry in the queue.
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #10110 from brkyvz/batch-wal-test-fix.
      6fd9e70e
  4. Dec 06, 2015