  1. Mar 11, 2016
    • [SPARK-13577][YARN] Allow Spark jar to be multiple jars, archive. · 07f1c544
      Marcelo Vanzin authored
      In preparation for the demise of assemblies, this change allows the
      YARN backend to use multiple jars and globs as the "Spark jar". The
      config option has been renamed to "spark.yarn.jars" to reflect that.
      
      A second option "spark.yarn.archive" was also added; if set, this
      takes precedence and uploads an archive expected to contain the jar
      files with the Spark code and its dependencies.
      
      Existing deployments should keep working, mostly. This change drops
      support for the "SPARK_JAR" environment variable, and also does not
      fall back to using "jarOfClass" if no configuration is set, falling
      back to finding files under SPARK_HOME instead. This should be fine
      since "jarOfClass" probably wouldn't work unless you were using
      spark-submit anyway.
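The lookup order described above (archive first, then the jar list, then SPARK_HOME) can be modeled as a small resolver. This is a minimal sketch, not Spark's YARN client code: the class and method names are hypothetical, globs are passed through rather than expanded, and the `/jars/*` fallback suffix is an assumption, since the commit message only says "files under SPARK_HOME".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the SPARK-13577 lookup order; not Spark's actual implementation.
final class YarnSparkJarsResolver {
    private YarnSparkJarsResolver() {}

    static List<String> resolve(Map<String, String> conf, String sparkHome) {
        // 1. An archive, if configured, takes precedence over everything else.
        String archive = conf.get("spark.yarn.archive");
        if (archive != null && !archive.trim().isEmpty()) {
            return Collections.singletonList(archive.trim());
        }
        // 2. Otherwise use the comma-separated jar list. Glob entries are kept
        //    as-is; this sketch does not expand them.
        String jars = conf.get("spark.yarn.jars");
        if (jars != null && !jars.trim().isEmpty()) {
            List<String> result = new ArrayList<>();
            for (String entry : jars.split(",")) {
                String trimmed = entry.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
            return result;
        }
        // 3. Neither option is set: fall back to SPARK_HOME.
        //    The "/jars/*" suffix is an assumption made for illustration.
        return Collections.singletonList(sparkHome + "/jars/*");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.yarn.jars", "hdfs:///spark/core.jar, hdfs:///spark/deps/*.jar");
        System.out.println(resolve(conf, "/opt/spark"));
    }
}
```

Adding `spark.yarn.archive` to the same map would make the resolver ignore `spark.yarn.jars` entirely, which mirrors the precedence rule stated in the commit message.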
      
      Tested with the unit tests, and trying the different config options
      on a YARN cluster.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11500 from vanzin/SPARK-13577.
    • [SPARK-13512][ML] add example and doc for MaxAbsScaler · 0b713e04
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      
      jira: https://issues.apache.org/jira/browse/SPARK-13512
      Add example and doc for ml.feature.MaxAbsScaler.
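MaxAbsScaler rescales each feature individually to the range [-1, 1] by dividing it by that feature's maximum absolute value; it does not shift or center the data. The plain-Java sketch below illustrates that arithmetic on dense arrays only. It is not Spark's implementation, the class name is hypothetical, and leaving all-zero columns unchanged is a choice made for this sketch.

```java
import java.util.Arrays;

// Plain-Java illustration of max-abs scaling; not Spark's MaxAbsScaler.
final class MaxAbsScaling {
    private MaxAbsScaling() {}

    /** Returns a scaled copy in which each column is divided by its maximum absolute value. */
    static double[][] scale(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][];
        }
        int numFeatures = rows[0].length;
        // First pass: maximum absolute value of each feature (column).
        double[] maxAbs = new double[numFeatures];
        for (double[] row : rows) {
            for (int j = 0; j < numFeatures; j++) {
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
            }
        }
        // Second pass: divide by that maximum; no centering, so zeros stay zero.
        double[][] scaled = new double[rows.length][numFeatures];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < numFeatures; j++) {
                // An all-zero column has maxAbs 0; keep its (zero) values unchanged.
                scaled[i][j] = maxAbs[j] == 0.0 ? rows[i][j] : rows[i][j] / maxAbs[j];
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, -8.0, 0.0}, {2.0, 4.0, 0.0}, {-4.0, 2.0, 0.0}};
        for (double[] row : scale(data)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because nothing is subtracted, zero entries remain zero, which is why this kind of scaling is often preferred for sparse features.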
      
      ## How was this patch tested?
      Unit tests.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #11392 from hhbyyh/maxabsdoc.
    • [SPARK-13672][ML] Add python examples of BisectingKMeans in ML and MLLIB · d18276cb
      Zheng RuiFeng authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13672
      
      ## What changes were proposed in this pull request?
      
      Add two Python examples of BisectingKMeans, one for `ml` and one for `mllib`.
      
      ## How was this patch tested?
      
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #11515 from zhengruifeng/mllib_bkm_pe.
  2. Mar 10, 2016
    • [MINOR][DOC] Fix supported hive version in doc · 88fa8666
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Today, Spark 1.6.1 and the updated docs were released. Unfortunately, the docs contain obsolete Hive version information in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines.
      ```
      -By default Spark will build with Hive 0.13.1 bindings.
      +By default Spark will build with Hive 1.2.1 bindings.
      -# Apache Hadoop 2.4.X with Hive 13 support
      +# Apache Hadoop 2.4.X with Hive 1.2.1 support
      ```
      `sql/README.md` file also describe
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11639 from dongjoon-hyun/fix_doc_hive_version.
    • [SPARK-13706][ML] Add Python Example for Train Validation Split · 3e3c3d58
      JeremyNixon authored
      ## What changes were proposed in this pull request?
      
      This pull request adds a python example for train validation split.
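Train-validation split evaluates each candidate parameter setting once: the data is split into a training part and a validation part (Spark's `TrainValidationSplit` controls this with a `trainRatio` parameter, 0.75 by default), each candidate is fit on the training part and scored on the validation part, and the best-scoring candidate wins. The sketch below shows only that selection loop in plain Java; the `Scorer` interface, method names, and toy model are hypothetical and unrelated to Spark's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Conceptual sketch of train-validation split; names are hypothetical, not Spark's API.
final class TrainValidationSketch {
    private TrainValidationSketch() {}

    /** Fits a model with the given params on train and returns its score on validation. */
    interface Scorer<T, P> {
        double score(P params, List<T> train, List<T> validation);
    }

    /** Shuffles with a fixed seed, then puts the first trainRatio fraction into the training set. */
    static <T> List<List<T>> split(List<T> data, double trainRatio, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int trainSize = (int) Math.round(shuffled.size() * trainRatio);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    /** Unlike k-fold cross-validation, each candidate is evaluated exactly once; higher is better. */
    static <T, P> P selectBest(List<T> data, double trainRatio, long seed,
                               List<P> candidates, Scorer<T, P> scorer) {
        List<List<T>> parts = split(data, trainRatio, seed);
        P best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (P candidate : candidates) {
            double s = scorer.score(candidate, parts.get(0), parts.get(1));
            if (best == null || s > bestScore) {
                best = candidate;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
        // Toy "model": predict factor * (training mean); score = negative MSE on validation.
        Scorer<Double, Double> scorer = (factor, train, validation) -> {
            double mean = train.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            double mse = validation.stream()
                    .mapToDouble(v -> (v - factor * mean) * (v - factor * mean))
                    .average().orElse(0.0);
            return -mse;
        };
        System.out.println(selectBest(data, 0.75, 42L, Arrays.asList(0.5, 1.0, 1.5), scorer));
    }
}
```

A single split is cheaper than k-fold cross-validation but gives a noisier estimate, especially on small datasets.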
      
      ## How was this patch tested?
      
      This was style tested through lint-python, generally tested with ./dev/run-tests, and run in notebook and shell environments. It was viewed in docs locally with jekyll serve.
      
      This contribution is my original work and I license it to Spark under its open source license.
      
      Author: JeremyNixon <jnixon2@gmail.com>
      
      Closes #11547 from JeremyNixon/tvs_example.
  3. Mar 09, 2016
    • [SPARK-13492][MESOS] Configurable Mesos framework webui URL. · a4a0addc
      Sergiusz Urbaniak authored
      ## What changes were proposed in this pull request?
      
      Previously, the Mesos framework webui URL was derived only from the Spark UI address, leaving no way to configure it. This commit makes it configurable; if unset, it falls back to the previous behavior.
      
      Motivation:
      This change is necessary to install Spark on DCOS and give it a custom service link. In the DCOS environment, `webui_url` is configured to point to a reverse proxy.
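The fallback rule (a configured URL wins; otherwise the URL is derived from the Spark UI address as before) is small enough to sketch. This is a hypothetical illustration, not Spark's scheduler code. The property name `spark.mesos.driver.webui.url` is the one listed in the Spark-on-Mesos configuration docs, but treat it as an assumption to verify against your Spark version; the host names are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the webui_url fallback; not Spark's scheduler code.
final class MesosWebUiUrl {
    private MesosWebUiUrl() {}

    // Property name as listed in the Spark-on-Mesos docs; verify for your version.
    static final String WEBUI_URL_KEY = "spark.mesos.driver.webui.url";

    static Optional<String> resolve(Map<String, String> conf, Optional<String> sparkUiAddress) {
        String configured = conf.get(WEBUI_URL_KEY);
        if (configured != null && !configured.isEmpty()) {
            return Optional.of(configured); // e.g. a DCOS reverse-proxy URL
        }
        // Previous behavior: derived from the Spark UI address, which may be absent.
        return sparkUiAddress;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolve(conf, Optional.of("http://driver-host:4040")));
        conf.put(WEBUI_URL_KEY, "https://proxy.example.com/service/spark");
        System.out.println(resolve(conf, Optional.of("http://driver-host:4040")));
    }
}
```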
      
      ## How was this patch tested?
      
      Locally with unit tests, and on DCOS (testing and stable revisions).
      
      Author: Sergiusz Urbaniak <sur@mesosphere.io>
      
      Closes #11369 from s-urbaniak/sur-webui-url.
    • [SPARK-13595][BUILD] Move docker, extras modules into external · 256704c7
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/`
      
      ## How was this patch tested?
      
      This is tested with Jenkins tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11523 from srowen/SPARK-13595.
    • [SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. · c3689bc2
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      To make `docs/examples` (and other related code) simpler, more readable, and more user-friendly, this PR replaces existing code like the following with the `diamond` operator.
      
      ```
      -    final ArrayList<Product2<Object, Object>> dataToWrite =
      -      new ArrayList<Product2<Object, Object>>();
      +    final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
      ```
      
      Java 7 and later support the **diamond** operator, which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark's Java code uses it inconsistently.
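A minimal, self-contained comparison of the two styles: both methods build identical collections, and the diamond version lets the compiler infer the constructor's type arguments from the declared type. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DiamondOperatorDemo {
    private DiamondOperatorDemo() {}

    // Pre-Java 7 style: type arguments repeated on every constructor call.
    static List<Map<String, Integer>> buildWithExplicitTypes() {
        List<Map<String, Integer>> rows = new ArrayList<Map<String, Integer>>();
        Map<String, Integer> row = new HashMap<String, Integer>();
        row.put("count", 1);
        rows.add(row);
        return rows;
    }

    // Java 7+ style: the compiler infers the type arguments from the declared type.
    static List<Map<String, Integer>> buildWithDiamond() {
        List<Map<String, Integer>> rows = new ArrayList<>();
        Map<String, Integer> row = new HashMap<>();
        row.put("count", 1);
        rows.add(row);
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(buildWithExplicitTypes().equals(buildWithDiamond())); // true
    }
}
```

One caveat: in Java 7 and 8 the diamond cannot be used with anonymous inner classes (for example `new Comparator<>() { ... }`); that restriction was only lifted in Java 9, so such call sites keep their explicit type arguments.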
      
      ## How was this patch tested?
      
      Manual.
      Pass the existing tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11541 from dongjoon-hyun/SPARK-13702.
  4. Mar 08, 2016
    • [SPARK-13715][MLLIB] Remove last usages of jblas in tests · 54040f8d
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Remove last usage of jblas, in tests
      
      ## How was this patch tested?
      
      Jenkins tests -- the same ones that are being modified.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11560 from srowen/SPARK-13715.
  5. Mar 07, 2016
    • [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs · 0eea12a3
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Move many top-level files into `dev/` or another appropriate directory. In particular, put `make-distribution.sh` in `dev/` and update the docs accordingly. Remove the deprecated `sbt/sbt`.
      
      I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope.
      
      ## How was this patch tested?
      
      `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11522 from srowen/SPARK-13596.
    • [MINOR][DOC] improve the doc for "spark.memory.offHeap.size" · a3ec50a4
      CodingCat authored
      The description of "spark.memory.offHeap.size" in the current documentation does not clearly state that the value is specified in bytes.
      
      This PR contains a small fix for this tiny issue
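Because the value is a raw byte count, sizes expressed in GiB have to be converted first (1 GiB = 2^30 bytes). The helper below is purely illustrative arithmetic; the class name is hypothetical, and it only prints properties rather than configuring Spark. Off-heap allocation must also be switched on with `spark.memory.offHeap.enabled`.

```java
// Illustrative arithmetic for expressing spark.memory.offHeap.size in bytes.
final class OffHeapSize {
    private OffHeapSize() {}

    static long gibToBytes(long gib) {
        return gib * 1024L * 1024L * 1024L; // 1 GiB = 2^30 bytes
    }

    public static void main(String[] args) {
        System.out.println("spark.memory.offHeap.enabled=true");
        System.out.println("spark.memory.offHeap.size=" + gibToBytes(2)); // 2147483648
    }
}
```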
      
      document fix
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #11561 from CodingCat/master.
    • [SPARK-13705][DOCS] UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount · 4b13896e
      rmishra authored
      ## What changes were proposed in this pull request?
      The reference to StatefulNetworkWordCount.scala in the updateStateByKey documentation should be removed until there is an example for updateStateByKey.
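Since the documentation will temporarily lack an example, here is a plain-Java simulation of the updateStateByKey semantics: in every batch, the update function receives the batch's new values for a key plus the key's previous state, and it is applied to all existing keys even when they have no new data; returning an empty state drops the key. This sketch uses `java.util.Optional` and a hypothetical `step` helper to mimic the shape of the Java API's update function; it does not use Spark Streaming.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiFunction;

// Plain-Java simulation of updateStateByKey semantics; not Spark Streaming code.
final class UpdateStateSketch {
    private UpdateStateSketch() {}

    // Running word count: add this batch's values to the previous count (0 if none).
    static final BiFunction<List<Integer>, Optional<Integer>, Optional<Integer>> RUNNING_COUNT =
            (newValues, state) -> {
                int sum = state.orElse(0);
                for (int v : newValues) {
                    sum += v;
                }
                return Optional.of(sum);
            };

    // Simulates one batch: the update function runs for every key that already has
    // state or appears in the batch; an empty result removes the key.
    static <K, V, S> Map<K, S> step(Map<K, S> state, Map<K, List<V>> batch,
                                    BiFunction<List<V>, Optional<S>, Optional<S>> update) {
        Set<K> keys = new HashSet<>(state.keySet());
        keys.addAll(batch.keySet());
        Map<K, S> next = new HashMap<>();
        for (K key : keys) {
            List<V> values = batch.getOrDefault(key, Collections.emptyList());
            Optional<S> updated = update.apply(values, Optional.ofNullable(state.get(key)));
            updated.ifPresent(s -> next.put(key, s));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        Map<String, List<Integer>> batch1 = new HashMap<>();
        batch1.put("spark", Arrays.asList(1, 1));
        batch1.put("yarn", Collections.singletonList(1));
        counts = step(counts, batch1, RUNNING_COUNT);
        Map<String, List<Integer>> batch2 = new HashMap<>();
        batch2.put("spark", Collections.singletonList(1));
        counts = step(counts, batch2, RUNNING_COUNT);
        System.out.println(counts); // "yarn" keeps its count although batch2 has no new value for it
    }
}
```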
      
      ## How was this patch tested?
      Have tested the new documentation with jekyll build.
      
      Author: rmishra <rmishra@pivotal.io>
      
      Closes #11545 from rishitesh/SPARK-13705.
  6. Mar 03, 2016
  7. Feb 28, 2016
    • [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
  8. Feb 27, 2016
    • [SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts · 59e3e10b
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      We provide a very limited set of cluster management scripts in Spark for Tachyon, although Tachyon itself provides a much better version of them. Given that Spark users can now simply use Tachyon as a normal file system that does not require extensive configuration, we can remove these management capabilities to simplify Spark's bash scripts.
      
      Note that this also reduces coupling between a third-party external system and Spark's release scripts, and eliminates the possibility of failures such as Tachyon being renamed or its tarballs being relocated.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11400 from rxin/release-script.
  9. Feb 26, 2016
  10. Feb 25, 2016
  11. Feb 23, 2016
  12. Feb 22, 2016
  13. Feb 21, 2016
    • [MINOR][DOCS] Fix typos in `configuration.md` and `hardware-provisioning.md` · 03e62aa3
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes some typos in the following documentation files.
       * `NOTICE`, `configuration.md`, and `hardware-provisioning.md`.
      
      ## How was this patch tested?
      
      manual tests
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11289 from dongjoon-hyun/minor_fix_typos_notice_and_confdoc.
  14. Feb 19, 2016
  15. Feb 17, 2016
  16. Feb 16, 2016
  17. Feb 15, 2016
  18. Feb 14, 2016
    • [SPARK-13300][DOCUMENTATION] Added pygments.rb dependency · 331293c3
      Amit Dev authored
      It looks like the pygments.rb gem is also required for the Jekyll build to work. At least on Ubuntu/RHEL, I could not build without this dependency, so I added it to the steps.
      
      Author: Amit Dev <amitdev@gmail.com>
      
      Closes #11180 from amitdev/master.
  19. Feb 12, 2016
  20. Feb 11, 2016
  21. Feb 10, 2016
    • [SPARK-12414][CORE] Remove closure serializer · 29c54730
      Sean Owen authored
      Remove spark.closure.serializer option and use JavaSerializer always
      
      CC andrewor14 rxin: I see there's a discussion in the JIRA, but I thought I'd offer this to show what the change would look like.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11150 from srowen/SPARK-12414.
    • [SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode. · 80cb963a
      Michael Gummelt authored
      This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027
      
      In that PR, we agreed with andrewor14 and pwendell that the Mesos scheduler should support `spark.executor.cores`, consistent with YARN and Standalone. This PR implements that resolution.
      
      This PR implements two high-level features. They are co-dependent, so both are implemented here:
      - Mesos support for spark.executor.cores
      - Multiple executors per slave
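To see how the two features interact, consider one resource offer: with executors sized by `spark.executor.cores`, a scheduler can carve several fixed-size executors out of a single offer, as long as CPU and memory remain and the application-wide cap (`spark.cores.max`) is not exceeded. The sketch below is a hypothetical, simplified model of that loop, not the logic of Spark's coarse-grained Mesos backend; it ignores memory overhead, constraints, and failure handling.

```java
// Hypothetical, simplified model of sizing executors within one Mesos offer.
final class MesosExecutorSizing {
    private MesosExecutorSizing() {}

    static int executorsToLaunch(double offerCpus, double offerMemMb,
                                 int executorCores, double executorMemMb,
                                 int coresAcquired, int maxCores) {
        if (executorCores <= 0 || executorMemMb <= 0) {
            throw new IllegalArgumentException("executor size must be positive");
        }
        int launched = 0;
        double cpusLeft = offerCpus;
        double memLeft = offerMemMb;
        int acquired = coresAcquired;
        // Keep carving fixed-size executors out of the offer while one still fits
        // and the total-cores cap would not be exceeded.
        while (cpusLeft >= executorCores && memLeft >= executorMemMb
                && acquired + executorCores <= maxCores) {
            cpusLeft -= executorCores;
            memLeft -= executorMemMb;
            acquired += executorCores;
            launched++;
        }
        return launched;
    }

    public static void main(String[] args) {
        // An 8-core, 16384 MB offer with 2-core, 4096 MB executors fits four executors.
        System.out.println(executorsToLaunch(8, 16384, 2, 4096, 0, 100)); // 4
    }
}
```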
      
      We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR.
      
      The contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #10993 from mgummelt/executor_sizing.