  1. Sep 14, 2015
    • [SPARK-6981] [SQL] Factor out SparkPlanner and QueryExecution from SQLContext · 64f04154
      Edoardo Vacchi authored
      Alternative to PR #6122; in this case the refactored out classes are replaced by inner classes with the same name for backwards binary compatibility
      
         * process in a lighter-weight, backwards-compatible way
      
      Author: Edoardo Vacchi <uncommonnonsense@gmail.com>
      
      Closes #6356 from evacchi/sqlctx-refactoring-lite.
    • [SPARK-10522] [SQL] Nanoseconds of Timestamp in Parquet should be positive · 7e32387a
      Davies Liu authored
      Otherwise Hive can't read the timestamps back correctly.

      Thanks vanzin for reporting this.
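As a rough sketch of the idea (an assumed layout, not the actual Spark code): Parquet's INT96 timestamps carry a Julian day plus nanoseconds within that day, and a negative nanosecond component can be normalized into range by borrowing whole days:

```python
NANOS_PER_DAY = 24 * 60 * 60 * 1000 * 1000 * 1000

def normalize(julian_day, nanos):
    """Shift a negative nanoseconds-of-day component into [0, NANOS_PER_DAY)."""
    if nanos < 0:
        borrowed = (-nanos + NANOS_PER_DAY - 1) // NANOS_PER_DAY  # days to borrow
        julian_day -= borrowed
        nanos += borrowed * NANOS_PER_DAY
    return julian_day, nanos

# One nanosecond before midnight lands on the previous day with positive nanos.
day, ns = normalize(2440588, -1)
```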
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8674 from davies/positive_nano.
    • [SPARK-10573] [ML] IndexToString output schema should be StringType · 8a634e9b
      Nick Pritchard authored
      Fixes a bug where the IndexToString output schema was DoubleType. Correct me if I'm wrong, but the output doesn't seem to need any "ML Attribute" metadata.
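Conceptually (a hypothetical pure-Python sketch, not the ML API), IndexToString maps double-valued indices back to their original string labels, which is why the output column must be StringType:

```python
def index_to_string(indices, labels):
    """Map double-valued indices back to their original string labels."""
    return [labels[int(i)] for i in indices]

# The output values are strings, not doubles.
decoded = index_to_string([0.0, 2.0, 1.0], labels=["a", "b", "c"])
```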
      
      Author: Nick Pritchard <nicholas.pritchard@falkonry.com>
      
      Closes #8751 from pnpritchard/SPARK-10573.
    • [SPARK-10194] [MLLIB] [PYSPARK] SGD algorithms need convergenceTol parameter in Python · ce6f3f16
      Yanbo Liang authored
      [SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382) added a ```convergenceTol``` parameter for GradientDescent-based methods in Scala. We need that parameter in Python; otherwise, Python users will not be able to adjust that behavior (or even reproduce behavior from previous releases since the default changed).
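The effect of a `convergenceTol` can be sketched in plain Python (illustrative only; names like `minimize` are hypothetical, not the MLlib API):

```python
def minimize(grad, x0, step=0.1, convergence_tol=1e-6, max_iter=1000):
    """Gradient descent that stops early once updates fall below the tolerance."""
    x = x0
    for i in range(max_iter):
        update = step * grad(x)
        x -= update
        if abs(update) < convergence_tol:  # the behavior convergenceTol controls
            return x, i + 1
    return x, max_iter

# Minimize (x - 3)^2, whose gradient is 2 * (x - 3).
x_min, iters = minimize(lambda x: 2 * (x - 3), x0=0.0)
```

Without the tolerance exposed, Python users cannot make the loop stop early (or reproduce the old default behavior), which is the gap the patch closes.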
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8457 from yanboliang/spark-10194.
    • [SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong. · cf2821ef
      Kousuke Saruta authored
      The default Hive metastore version is 1.2.1, but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1.
      Also, we cannot get the default value via `sqlContext.getConf("spark.sql.hive.metastore.version")`.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #8739 from sarutak/SPARK-10584.
    • [SPARK-9899] [SQL] log warning for direct output committer with speculation enabled · 32407bfd
      Wenchen Fan authored
      This is a follow-up of https://github.com/apache/spark/pull/8317.
      
      When speculation is enabled, there may be multiple tasks writing to the same path. Generally this is OK, as we write to a temporary directory first and only one task can commit the temporary directory to the target path.
      
      However, when we use direct output committer, tasks will write data to target path directly without temporary directory. This causes problems like corrupted data. Please see [PR comment](https://github.com/apache/spark/pull/8191#issuecomment-131598385) for more details.
      
      Unfortunately, we don't have a simple flag to tell whether an output committer will write to a temporary directory, so for safety we have to disable any customized output committer when `speculation` is true.
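The decision the commit describes can be sketched like this (hypothetical names, not Spark's internals):

```python
def choose_committer(is_direct, speculation, warnings):
    """Refuse a direct committer when speculation could duplicate writers."""
    if is_direct and speculation:
        warnings.append("WARN: direct output committer with speculation enabled "
                        "may corrupt data; using the default committer instead")
        return "default"  # safe path: each task writes to a temp dir, one commits
    return "direct" if is_direct else "default"

warnings = []
chosen = choose_committer(is_direct=True, speculation=True, warnings=warnings)
```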
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8687 from cloud-fan/direct-committer.
    • [SPARK-9720] [ML] Identifiable types need UID in toString methods · d8156546
      Bertrand Dechoux authored
      A few Identifiable types overrode their toString method without using the parent implementation. As a consequence, the uid was no longer present in the toString result, even though including it is the default behaviour.

      This patch is a quick fix. The question of enforcement is still open.

      No tests have been written to verify the toString behaviour; that would take a while, because all types would need to be tested, not only those with a regression now.
      
      It is possible to enforce the condition at compile time by making the toString method final, but that would introduce potentially unwanted API-breaking changes (see jira).
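In Python terms (the fix itself is Scala), the pattern is for an override to build on the parent's representation so the uid survives; the class names here are hypothetical:

```python
class Identifiable:
    def __init__(self, uid):
        self.uid = uid

    def __repr__(self):
        return self.uid  # default behaviour: the uid is always visible

class MyModel(Identifiable):
    def __repr__(self):
        # The fixed style: extend the parent implementation instead of replacing it.
        return f"{super().__repr__()}: numFeatures=3"

model = MyModel("myModel_4a2b")
```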
      
      Author: Bertrand Dechoux <BertrandDechoux@users.noreply.github.com>
      
      Closes #8062 from BertrandDechoux/SPARK-9720.
  2. Sep 13, 2015
  3. Sep 12, 2015
    • [SPARK-10330] Add Scalastyle rule to require use of SparkHadoopUtil JobContext methods · b3a7480a
      Josh Rosen authored
      This is a followup to #8499; it adds a Scalastyle rule to mandate the use of SparkHadoopUtil's JobContext accessor methods and fixes the existing violations.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8521 from JoshRosen/SPARK-10330-part2.
    • [SPARK-6548] Adding stddev to DataFrame functions · f4a22808
      JihongMa authored
      Adds STDDEV support for DataFrame using a one-pass online/parallel algorithm to compute variance. Please review the code change.
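The kind of one-pass online/parallel algorithm the message refers to can be sketched as a Welford-style running state with a merge step for combining partitions (illustrative, not the committed Scala code):

```python
import math

def update(state, x):
    """Fold one value into a (count, mean, M2) running state."""
    count, mean, m2 = state
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)
    return count, mean, m2

def merge(a, b):
    """Combine the states of two partitions (the parallel step)."""
    (ca, ma, m2a), (cb, mb, m2b) = a, b
    n = ca + cb
    delta = mb - ma
    return n, ma + delta * cb / n, m2a + m2b + delta * delta * ca * cb / n

def stddev(state):
    count, _, m2 = state
    return math.sqrt(m2 / (count - 1))  # sample standard deviation

# Two "partitions" processed independently, then merged.
left = right = (0, 0.0, 0.0)
for x in [1.0, 2.0, 3.0]:
    left = update(left, x)
for x in [4.0, 5.0]:
    right = update(right, x)
sd = stddev(merge(left, right))  # stddev of [1, 2, 3, 4, 5]
```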
      
      Author: JihongMa <linlin200605@gmail.com>
      Author: Jihong MA <linlin200605@gmail.com>
      Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com>
      Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local>
      
      Closes #6297 from JihongMA/SPARK-SQL.
    • [SPARK-10547] [TEST] Streamline / improve style of Java API tests · 22730ad5
      Sean Owen authored
      Fix a few Java API test style issues: unused generic types, exceptions, wrong assert argument order
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #8706 from srowen/SPARK-10547.
    • [SPARK-10554] [CORE] Fix NPE with ShutdownHook · 8285e3b0
      Nithin Asokan authored
      https://issues.apache.org/jira/browse/SPARK-10554
      
      Fixes an NPE when ShutdownHook tries to clean up temporary folders.
      
      Author: Nithin Asokan <Nithin.Asokan@Cerner.com>
      
      Closes #8720 from nasokan/SPARK-10554.
    • [SPARK-10566] [CORE] SnappyCompressionCodec init exception handling masks important error information · 6d836780
      Daniel Imfeld authored
      
      When throwing an IllegalArgumentException in SnappyCompressionCodec.init, chain the existing exception. This allows potentially important debugging info to be passed to the user.
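The same pattern in Python terms (the fix itself is Scala): chain the original exception when re-raising, so the root cause reaches the user instead of being masked:

```python
def init_codec(loader):
    try:
        return loader()
    except Exception as exc:
        # Chaining (here via "raise ... from") preserves the underlying cause,
        # like passing it to the IllegalArgumentException in the fix.
        raise ValueError("Snappy codec failed to initialize") from exc

def broken_loader():
    raise OSError("native snappy library not found")

cause = None
try:
    init_codec(broken_loader)
except ValueError as err:
    cause = err.__cause__  # the original OSError, no longer masked
```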
      
      Manual testing shows the exception is chained properly, and the test suite still looks fine as well.
      
      This contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Daniel Imfeld <daniel@danielimfeld.com>
      
      Closes #8725 from dimfeld/dimfeld-patch-1.
  4. Sep 11, 2015
  5. Sep 10, 2015
    • [SPARK-10027] [ML] [PySpark] Add Python API missing methods for ml.feature · a140dd77
      Yanbo Liang authored
      Missing methods of ml.feature are listed here:
      ```StringIndexer``` lacks the parameter ```handleInvalid```.
      ```StringIndexerModel``` lacks the method ```labels```.
      ```VectorIndexerModel``` lacks the methods ```numFeatures``` and ```categoryMaps```.
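What those methods expose can be sketched in plain Python (hypothetical helpers, not the PySpark API): labels ordered by frequency, and `handleInvalid` deciding what happens to unseen values:

```python
from collections import Counter

def fit_labels(values):
    """The most frequent label gets index 0, as in StringIndexer."""
    return [label for label, _ in Counter(values).most_common()]

def transform(values, labels, handle_invalid="error"):
    index = {label: float(i) for i, label in enumerate(labels)}
    out = []
    for v in values:
        if v in index:
            out.append(index[v])
        elif handle_invalid == "skip":
            continue  # silently drop rows with unseen labels
        else:
            raise ValueError(f"Unseen label: {v}")
    return out

labels = fit_labels(["a", "b", "a", "c"])
encoded = transform(["a", "c", "x"], labels, handle_invalid="skip")
```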
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8313 from yanboliang/spark-10027.
    • [SPARK-10023] [ML] [PySpark] Unified DecisionTreeParams checkpointInterval between Scala and Python API · 339a5271
      Yanbo Liang authored
      
      "checkpointInterval" is a member of DecisionTreeParams in the Scala API, which is inconsistent with the Python API; we should unify them.
      ```
      member of DecisionTreeParams <-> Scala API
      shared param for all ML Transformer/Estimator <-> Python API
      ```
      Proposal:
      "checkpointInterval" is also used by ALS, so we make it a shared param on the Scala side.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8528 from yanboliang/spark-10023.
    • [SPARK-9043] Serialize key, value and combiner classes in ShuffleDependency · 0eabea8a
      Matt Massie authored
      ShuffleManager implementations are currently not given type information for
      the key, value and combiner classes. Serialization of shuffle objects relies
      on objects being JavaSerializable, with methods defined for reading/writing
      the object or, alternatively, serialization via Kryo which uses reflection.
      
      Serialization systems like Avro, Thrift and Protobuf generate classes with
      zero argument constructors and explicit schema information
      (e.g. IndexedRecords in Avro have get, put and getSchema methods).
      
      By serializing the key, value and combiner class names in ShuffleDependency,
      shuffle implementations will have access to schema information when
      registerShuffle() is called.
      
      Author: Matt Massie <massie@cs.berkeley.edu>
      
      Closes #7403 from massie/shuffle-classtags.
    • [SPARK-7544] [SQL] [PySpark] pyspark.sql.types.Row implements __getitem__ · 89562a17
      Yanbo Liang authored
      pyspark.sql.types.Row implements ```__getitem__```
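A minimal sketch of what implementing `__getitem__` enables — indexing a Row by position or by field name (this toy class is far simpler than the real `pyspark.sql.types.Row`):

```python
class Row(tuple):
    def __new__(cls, **fields):
        row = super().__new__(cls, fields.values())
        row._fields = list(fields)
        return row

    def __getitem__(self, item):
        if isinstance(item, str):  # row["name"] in addition to row[0]
            return tuple.__getitem__(self, self._fields.index(item))
        return tuple.__getitem__(self, item)

row = Row(name="Alice", age=11)
```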
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8333 from yanboliang/spark-7544.
    • Add 1.5 to master branch EC2 scripts · 42047577
      Shivaram Venkataraman authored
      This change brings it to par with `branch-1.5` (and 1.5.0 release)
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8704 from shivaram/ec2-1.5-update.
    • [SPARK-10443] [SQL] Refactor SortMergeOuterJoin to reduce duplication · 3db72554
      Andrew Or authored
      `LeftOutputIterator` and `RightOutputIterator` are symmetrically identical and can share a lot of code. If someone makes a change in one but forgets to do the same thing in the other we'll end up with inconsistent behavior. This patch also adds inline comments to clarify the intention of the code.
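The deduplication idea — one routine parameterized by side instead of two mirror-image iterators — can be sketched with a simplified (hash-based, not sort-merge) outer join; all names here are hypothetical:

```python
def outer_join(stream_side, build_side, key, side="left"):
    """Yield outer-join rows, padding the missing side with None."""
    build = {}
    for row in build_side:
        build.setdefault(key(row), []).append(row)
    for row in stream_side:
        for match in build.get(key(row), [None]):
            # The only left/right difference is output order -- sharing the
            # code keeps the two behaviors from drifting apart.
            yield (row, match) if side == "left" else (match, row)

left = [("a", 1), ("b", 2)]
right = [("a", 10)]
rows = list(outer_join(left, right, key=lambda r: r[0], side="left"))
```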
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8596 from andrewor14/smoj-cleanup.
    • [SPARK-10049] [SPARKR] Support collecting data of ArrayType in DataFrame. · 45e3be5c
      Sun Rui authored
      This PR:
      1.  Enhances reflection in RBackend, automatically matching a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they would conflict with the SerDe logic that transfers a Scala Seq to the R side.

      2.  Enhances the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame is observed to be of Scala Seq type after collection.

      3.  Supports ArrayType in createDataFrame().
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8458 from sun-rui/SPARK-10049.
    • [SPARK-9990] [SQL] Create local hash join operator · d88abb7e
      zsxwing authored
      This PR includes the following changes:
      - Add SQLConf to LocalNode
      - Add HashJoinNode
      - Add ConvertToUnsafeNode and ConvertToSafeNode.scala to test unsafe hash join.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8535 from zsxwing/SPARK-9990.
    • [SPARK-10514] [MESOS] Waiting for min number of total cores acquired by Spark by implementing the sufficientResourcesRegistered method · a5ef2d06
      Akash Mishra authored
      
      The spark.scheduler.minRegisteredResourcesRatio configuration parameter works for YARN mode but not for Mesos coarse-grained mode.

      If the parameter is not specified, the default value of 0 is used for spark.scheduler.minRegisteredResourcesRatio in the base class, and this method always returns true.

      There are no existing tests for YARN mode either, hence no test was added here.
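The check being implemented can be sketched as follows (hypothetical signature, not the Scala code):

```python
def sufficient_resources_registered(total_cores, max_cores, min_ratio):
    """Start scheduling only once enough of the requested cores have registered."""
    return total_cores >= max_cores * min_ratio

# With ratio 0.8 and 10 cores requested, 8 registered cores is enough; 7 is not.
ready = sufficient_resources_registered(total_cores=8, max_cores=10, min_ratio=0.8)
waiting = sufficient_resources_registered(total_cores=7, max_cores=10, min_ratio=0.8)
```

With the old default ratio of 0, the check trivially passes, which is why the parameter previously had no effect in Mesos coarse-grained mode.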
      
      Author: Akash Mishra <akash.mishra20@gmail.com>
      
      Closes #8672 from SleepyThread/master.
    • [SPARK-6350] [MESOS] Fine-grained mode scheduler respects mesosExecutor.cores · f0562e8c
      Iulian Dragos authored
      This is a regression introduced in #4960; this commit fixes it and adds a test.
      
      tnachen andrewor14 please review, this should be an easy one.
      
      Author: Iulian Dragos <jaguarul@gmail.com>
      
      Closes #8653 from dragos/issue/mesos/fine-grained-maxExecutorCores.