  1. Sep 14, 2015
  2. Sep 13, 2015
  3. Sep 12, 2015
    • [SPARK-10330] Add Scalastyle rule to require use of SparkHadoopUtil JobContext methods · b3a7480a
      Josh Rosen authored
      This is a follow-up to #8499 that adds a Scalastyle rule to mandate the use of SparkHadoopUtil's JobContext accessor methods and fixes the existing violations (a sketch of the enforced accessor pattern follows this entry).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8521 from JoshRosen/SPARK-10330-part2.
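      As a rough reference, this is the shape of the accessor pattern the rule enforces inside Spark's own code, assuming the Spark 1.5-era `SparkHadoopUtil.get.getConfigurationFromJobContext` helper (treat the exact method name as illustrative):
      ```
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.mapreduce.JobContext
      import org.apache.spark.deploy.SparkHadoopUtil

      def hadoopConfOf(context: JobContext): Configuration = {
        // A direct context.getConfiguration call is what the new Scalastyle rule
        // flags; JobContext differs across Hadoop versions, so the call is routed
        // through the SparkHadoopUtil accessor instead.
        SparkHadoopUtil.get.getConfigurationFromJobContext(context)
      }
      ```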
    • [SPARK-6548] Adding stddev to DataFrame functions · f4a22808
      JihongMa authored
      Adding STDDEV support for DataFrame using a one-pass online/parallel algorithm to compute variance (a sketch of that algorithm follows this entry). Please review the code change.
      
      Author: JihongMa <linlin200605@gmail.com>
      Author: Jihong MA <linlin200605@gmail.com>
      Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com>
      Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local>
      
      Closes #6297 from JihongMA/SPARK-SQL.
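      For context, here is a self-contained sketch of the kind of one-pass online/parallel variance computation described above (Welford-style per-element updates plus a merge step for combining partitions); it is illustrative only, not the actual Spark code:
      ```
      case class MomentAgg(n: Long, mean: Double, m2: Double) {
        // Online (Welford) update for a single value.
        def add(x: Double): MomentAgg = {
          val n1 = n + 1
          val delta = x - mean
          val mean1 = mean + delta / n1
          MomentAgg(n1, mean1, m2 + delta * (x - mean1))
        }
        // Merge two partial aggregates (e.g. from two partitions).
        def merge(o: MomentAgg): MomentAgg = {
          val nTot = n + o.n
          if (nTot == 0) this
          else {
            val delta = o.mean - mean
            MomentAgg(nTot, mean + delta * o.n / nTot,
              m2 + o.m2 + delta * delta * n * o.n / nTot)
          }
        }
        def sampleStddev: Double = if (n > 1) math.sqrt(m2 / (n - 1)) else Double.NaN
      }

      val zero = MomentAgg(0L, 0.0, 0.0)
      val part1 = Seq(1.0, 2.0, 3.0).foldLeft(zero)((agg, x) => agg.add(x))
      val part2 = Seq(4.0, 5.0).foldLeft(zero)((agg, x) => agg.add(x))
      println(part1.merge(part2).sampleStddev)  // sample stddev of 1..5 ≈ 1.581
      ```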
    • [SPARK-10547] [TEST] Streamline / improve style of Java API tests · 22730ad5
      Sean Owen authored
      Fix a few Java API test style issues: unused generic types, exceptions, wrong assert argument order
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #8706 from srowen/SPARK-10547.
    • [SPARK-10554] [CORE] Fix NPE with ShutdownHook · 8285e3b0
      Nithin Asokan authored
      https://issues.apache.org/jira/browse/SPARK-10554
      
      Fixes an NPE when the ShutdownHook tries to clean up temporary folders.
      
      Author: Nithin Asokan <Nithin.Asokan@Cerner.com>
      
      Closes #8720 from nasokan/SPARK-10554.
    • [SPARK-10566] [CORE] SnappyCompressionCodec init exception handling masks important error information · 6d836780
      Daniel Imfeld authored
      
      When throwing an IllegalArgumentException in SnappyCompressionCodec.init, chain the existing exception. This allows potentially important debugging info to be passed to the user (a sketch of the chaining pattern follows this entry).
      
      Manual testing shows the exception is chained properly, and the test suite still looks fine as well.
      
      This contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Daniel Imfeld <daniel@danielimfeld.com>
      
      Closes #8725 from dimfeld/dimfeld-patch-1.
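      A minimal sketch of the chaining pattern described above; `tryLoadNativeLib` is a hypothetical stand-in for the codec's initialization, not the actual SnappyCompressionCodec source:
      ```
      def tryLoadNativeLib(load: () => Unit): Unit = {
        try {
          load()
        } catch {
          case e: Throwable =>
            // Before the fix: a bare IllegalArgumentException(msg) dropped `e`.
            // Passing `e` as the cause keeps its message and stack trace, so the
            // user can see why the native library failed to load.
            throw new IllegalArgumentException("native snappy library is not available", e)
        }
      }
      ```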
  4. Sep 11, 2015
  5. Sep 10, 2015
    • [SPARK-10027] [ML] [PySpark] Add Python API missing methods for ml.feature · a140dd77
      Yanbo Liang authored
      Missing methods of ml.feature are listed here (a sketch of the corresponding Scala-side API follows this entry):
      ```StringIndexer``` lacks the parameter ```handleInvalid```.
      ```StringIndexerModel``` lacks the method ```labels```.
      ```VectorIndexerModel``` lacks the methods ```numFeatures``` and ```categoryMaps```.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8313 from yanboliang/spark-10027.
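      For reference, the Scala-side counterparts that these Python additions mirror look roughly like this (a sketch against the Spark 1.5-era ml.feature API; the `df` argument and its "category"/"features" columns are assumptions for illustration):
      ```
      import org.apache.spark.ml.feature.{StringIndexer, StringIndexerModel, VectorIndexer, VectorIndexerModel}
      import org.apache.spark.sql.DataFrame

      // Assumes `df` has a string "category" column and a vector "features" column.
      def scalaSide(df: DataFrame): Unit = {
        val model: StringIndexerModel = new StringIndexer()
          .setInputCol("category")
          .setOutputCol("categoryIndex")
          .setHandleInvalid("skip")          // the parameter missing from the Python StringIndexer
          .fit(df)
        println(model.labels.mkString(", ")) // labels, now also exposed in Python

        val vecModel: VectorIndexerModel = new VectorIndexer()
          .setInputCol("features")
          .setOutputCol("indexedFeatures")
          .setMaxCategories(10)
          .fit(df)
        println(vecModel.numFeatures)        // numFeatures
        println(vecModel.categoryMaps)       // categoryMaps: feature index -> (raw value -> category index)
      }
      ```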
    • [SPARK-10023] [ML] [PySpark] Unified DecisionTreeParams checkpointInterval between Scala and Python API · 339a5271
      Yanbo Liang authored
      
      "checkpointInterval" is member of DecisionTreeParams in Scala API which is inconsistency with Python API, we should unified them.
      ```
      member of DecisionTreeParams <-> Scala API
      shared param for all ML Transformer/Estimator <-> Python API
      ```
      Proposal:
      "checkpointInterval" is also used by ALS, so we make it shared params at Scala.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8528 from yanboliang/spark-10023.
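      A purely illustrative, self-contained sketch of the proposed "shared param" shape (all names below are hypothetical stand-ins, not Spark's ml.param.shared code):
      ```
      // One shared trait, mixed into every estimator that checkpoints (trees, ALS, ...),
      // so the Scala and Python APIs can expose checkpointInterval uniformly.
      trait HasCheckpointInterval {
        private var interval: Int = 10
        def setCheckpointInterval(value: Int): this.type = {
          require(value == -1 || value >= 1, "checkpointInterval must be -1 (disabled) or >= 1")
          interval = value
          this
        }
        def getCheckpointInterval: Int = interval
      }

      class DecisionTreeSketch extends HasCheckpointInterval
      class ALSSketch extends HasCheckpointInterval

      println(new ALSSketch().setCheckpointInterval(5).getCheckpointInterval)  // 5
      ```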
    • [SPARK-9043] Serialize key, value and combiner classes in ShuffleDependency · 0eabea8a
      Matt Massie authored
      ShuffleManager implementations are currently not given type information for
      the key, value and combiner classes. Serialization of shuffle objects relies
      on objects being JavaSerializable, with methods defined for reading/writing
      the object or, alternatively, serialization via Kryo which uses reflection.
      
      Serialization systems like Avro, Thrift and Protobuf generate classes with
      zero argument constructors and explicit schema information
      (e.g. IndexedRecords in Avro have get, put and getSchema methods).
      
      By serializing the key, value and combiner class names in ShuffleDependency,
      shuffle implementations will have access to schema information when
      registerShuffle() is called (an illustrative sketch follows this entry).
      
      Author: Matt Massie <massie@cs.berkeley.edu>
      
      Closes #7403 from massie/shuffle-classtags.
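      An illustrative-only sketch of the idea; the `ShuffleDep` trait and its fields are hypothetical stand-ins for the extra information, not Spark's actual ShuffleDependency API:
      ```
      // With the key/value/combiner class *names* available on the dependency, a
      // shuffle implementation can recover the classes at registration time and,
      // for schema-aware formats such as Avro, obtain schemas from the generated
      // classes (zero-arg constructor plus getSchema()).
      trait ShuffleDep {
        def keyClassName: String
        def valueClassName: String
        def combinerClassName: Option[String]
      }

      def registerShuffle(dep: ShuffleDep): Unit = {
        val keyClass = Class.forName(dep.keyClassName)
        val valueClass = Class.forName(dep.valueClassName)
        val combinerClass = dep.combinerClassName.map(name => Class.forName(name))
        println(s"registered shuffle: key=$keyClass, value=$valueClass, " +
          s"combiner=${combinerClass.getOrElse("none")}")
      }
      ```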
    • [SPARK-7544] [SQL] [PySpark] pyspark.sql.types.Row implements __getitem__ · 89562a17
      Yanbo Liang authored
      pyspark.sql.types.Row implements ```__getitem__```
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8333 from yanboliang/spark-7544.
    • Add 1.5 to master branch EC2 scripts · 42047577
      Shivaram Venkataraman authored
      This change brings the scripts up to par with `branch-1.5` (and the 1.5.0 release).
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8704 from shivaram/ec2-1.5-update.
    • [SPARK-10443] [SQL] Refactor SortMergeOuterJoin to reduce duplication · 3db72554
      Andrew Or authored
      `LeftOutputIterator` and `RightOutputIterator` are symmetrically identical and can share a lot of code. If someone makes a change in one but forgets to do the same thing in the other, we'll end up with inconsistent behavior. This patch also adds inline comments to clarify the intention of the code.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8596 from andrewor14/smoj-cleanup.
    • [SPARK-10049] [SPARKR] Support collecting data of ArrayType in DataFrame. · 45e3be5c
      Sun Rui authored
      This PR:
      1.  Enhances reflection in RBackend to automatically match a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they would conflict with the SerDe logic that transfers a Scala Seq to the R side.
      
      2.  Enhances the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame is observed to be of Scala Seq type after collection.
      
      3.  Supports ArrayType in createDataFrame().
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8458 from sun-rui/SPARK-10049.