  1. Oct 28, 2014
• [SPARK-4089][Doc][Minor] The version number of Spark in _config.yaml is wrong. · 4d52cec2
      Kousuke Saruta authored
      The version number of Spark in docs/_config.yaml for master branch should be 1.2.0 for now.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2943 from sarutak/SPARK-4089 and squashes the following commits:
      
      aba7fb4 [Kousuke Saruta] Fixed the version number of Spark in _config.yaml
      4d52cec2
• [SPARK-3961] [MLlib] [PySpark] Python API for mllib.feature · fae095bc
      Davies Liu authored
Added a complete Python API for MLlib.feature; a usage sketch follows the class list below.
      
      Normalizer
      StandardScalerModel
      StandardScaler
HashingTF
      IDFModel
      IDF
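
A minimal PySpark sketch of how these wrappers fit together for TF-IDF (the toy corpus, app name, and feature size are illustrative assumptions, not part of this patch):

```python
from pyspark import SparkContext
from pyspark.mllib.feature import HashingTF, IDF, StandardScaler

sc = SparkContext(appName="mllib-feature-demo")

# An RDD of tokenized documents (lists of terms) -- illustrative input.
docs = sc.parallelize([["spark", "mllib", "feature"], ["spark", "python", "api"]])

# Term frequencies via the hashing trick, then inverse document frequencies.
tf = HashingTF(numFeatures=1 << 10).transform(docs)
tf.cache()  # IDF.fit() makes a separate pass over the data
tfidf = IDF().fit(tf).transform(tf)

# StandardScaler operates on an RDD of Vectors, e.g. the TF vectors above.
scaled = StandardScaler(withMean=False, withStd=True).fit(tf).transform(tf)
```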
      
      cc mengxr
      
      Author: Davies Liu <davies@databricks.com>
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2819 from davies/feature and squashes the following commits:
      
      4f48f48 [Davies Liu] add a note for HashingTF
      67f6d21 [Davies Liu] address comments
      b628693 [Davies Liu] rollback changes in Word2Vec
      efb4f4f [Davies Liu] Merge branch 'master' into feature
      806c7c2 [Davies Liu] address comments
      3abb8c2 [Davies Liu] address comments
      59781b9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into feature
      a405ae7 [Davies Liu] fix tests
      7a1891a [Davies Liu] fix tests
      486795f [Davies Liu] update programming guide, HashTF -> HashingTF
      8a50584 [Davies Liu] Python API for mllib.feature
      fae095bc
  2. Oct 27, 2014
• [SPARK-4032] Deprecate YARN alpha support in Spark 1.2 · c9e05ca2
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2878 from ScrapCodes/SPARK-4032/deprecate-yarn-alpha and squashes the following commits:
      
17e9857 [Prashant Sharma] added deprecated comment to Client and ExecutorRunnable.
      3a34b1e [Prashant Sharma] Updated docs...
      4608dea [Prashant Sharma] [SPARK-4032] Deprecate YARN alpha support in Spark 1.2
      c9e05ca2
  3. Oct 26, 2014
  4. Oct 25, 2014
• [SPARK-2321] Stable pull-based progress / status API · 95303168
      Josh Rosen authored
This pull request is a first step towards the implementation of a stable, pull-based progress / status API for Spark (see [SPARK-2321](https://issues.apache.org/jira/browse/SPARK-2321)). For now, I'd like to discuss the basic implementation, API names, and overall interface design. Once we arrive at a good design, I'll go back and add additional methods to expose more information via these APIs.
      
      #### Design goals:
      
      - Pull-based API
      - Usable from Java / Scala / Python (eventually, likely with a wrapper)
      - Can be extended to expose more information without introducing binary incompatibilities.
      - Returns immutable objects.
      - Don't leak any implementation details, preserving our freedom to change the implementation.
      
      #### Implementation:
      
      - Add public methods (`getJobInfo`, `getStageInfo`) to SparkContext to allow status / progress information to be retrieved.
      - Add public interfaces (`SparkJobInfo`, `SparkStageInfo`) for our API return values.  These interfaces consist entirely of Java-style getter methods.  The interfaces are currently implemented in Java.  I decided to explicitly separate the interface from its implementation (`SparkJobInfoImpl`, `SparkStageInfoImpl`) in order to prevent users from constructing these responses themselves.
- Allow an existing JobProgressListener to be used when constructing a live SparkUI. This allows us to re-use these listeners in the implementation of this status API. There are a few reasons why this listener re-use makes sense:
         - The status API and web UI are guaranteed to show consistent information.
         - These listeners are already well-tested.
         - The same garbage-collection / information retention configurations can apply to both this API and the web UI.
      - Extend JobProgressListener to maintain `jobId -> Job` and `stageId -> Stage` mappings.
      
The progress API methods are implemented in a separate trait that's mixed into SparkContext. This helps keep SparkContext.scala from becoming larger and more difficult to read.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #2696 from JoshRosen/progress-reporting-api and squashes the following commits:
      
      e6aa78d [Josh Rosen] Add tests.
      b585c16 [Josh Rosen] Accept SparkListenerBus instead of more specific subclasses.
      c96402d [Josh Rosen] Address review comments.
      2707f98 [Josh Rosen] Expose current stage attempt id
      c28ba76 [Josh Rosen] Update demo code:
      646ff1d [Josh Rosen] Document spark.ui.retainedJobs.
      7f47d6d [Josh Rosen] Clean up SparkUI constructors, per Andrew's feedback.
      b77b3d8 [Josh Rosen] Merge remote-tracking branch 'origin/master' into progress-reporting-api
      787444c [Josh Rosen] Move status API methods into trait that can be mixed into SparkContext.
      f9a9a00 [Josh Rosen] More review comments:
      3dc79af [Josh Rosen] Remove creation of unused listeners in SparkContext.
      249ca16 [Josh Rosen] Address several review comments:
      da5648e [Josh Rosen] Add example of basic progress reporting in Java.
      7319ffd [Josh Rosen] Add getJobIdsForGroup() and num*Tasks() methods.
      cc568e5 [Josh Rosen] Add note explaining that interfaces should not be implemented outside of Spark.
      6e840d4 [Josh Rosen] Remove getter-style names and "consistent snapshot" semantics:
      08cbec9 [Josh Rosen] Begin to sketch the interfaces for a stable, public status API.
      ac2d13a [Josh Rosen] Add jobId->stage, stageId->stage mappings in JobProgressListener
      24de263 [Josh Rosen] Create UI listeners in SparkContext instead of in Tabs:
      95303168
  5. Oct 24, 2014
• [SPARK-2706][SQL] Enable Spark to support Hive 0.13 · 7c89a8f0
      Zhan Zhang authored
Given that a lot of users are trying to use Hive 0.13 in Spark, and given the API-level incompatibility between hive-0.12 and hive-0.13, I want to propose the following approach, which has no or minimal impact on existing hive-0.12 support but makes it possible to jumpstart the development of hive-0.13 and future version support.
      
Approach: Introduce a "hive-version" property, and manipulate the pom.xml files to support different Hive versions at compile time through a shim layer, e.g., hive-0.12.0 and hive-0.13.1. More specifically,
      
      1. For each different hive version, there is a very light layer of shim code to handle API differences, sitting in sql/hive/hive-version, e.g., sql/hive/v0.12.0 or sql/hive/v0.13.1
      
2. Add a new profile, hive-default, active by default, which picks up all existing configuration and the hive-0.12.0 shim (v0.12.0) if no hive.version is specified.
      
3. If the user specifies a different version (currently only 0.13.1, via -Dhive.version=0.13.1), the hive-versions profile will be activated, which picks up the hive-version-specific shim layer and configuration, mainly the Hive jars and version shim, e.g., v0.13.1.
      
      4. With this approach, nothing is changed with current hive-0.12 support.
      
      No change by default: sbt/sbt -Phive
      For example: sbt/sbt -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
      
      To enable hive-0.13: sbt/sbt -Dhive.version=0.13.1
      For example: sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
      
Note that in hive-0.13, hive-thriftserver is not enabled; that should be fixed by another JIRA. We also don't need -Phive together with -Dhive.version when building (we should probably switch to -Phive -Dhive.version=xxx once the Thrift server is also supported on hive-0.13.1).
      
      Author: Zhan Zhang <zhazhan@gmail.com>
      Author: zhzhan <zhazhan@gmail.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2241 from zhzhan/spark-2706 and squashes the following commits:
      
      3ece905 [Zhan Zhang] minor fix
      410b668 [Zhan Zhang] solve review comments
      cbb4691 [Zhan Zhang] change run-test for new options
      0d4d2ed [Zhan Zhang] rebase
      497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      8fad1cf [Zhan Zhang] change the pom file and make hive-0.13.1 as the default
      ab028d1 [Zhan Zhang] rebase
      4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      4cb1b93 [zhzhan] Merge pull request #1 from pwendell/pr-2241
      b0478c0 [Patrick Wendell] Changes to simplify the build of SPARK-2706
      2b50502 [Zhan Zhang] rebase
      a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      cb22863 [Zhan Zhang] correct the typo
20f6cf7 [Zhan Zhang] solve compatibility issue
      f7912a9 [Zhan Zhang] rebase and solve review feedback
      301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      10c3565 [Zhan Zhang] address review comments
6bc9204 [Zhan Zhang] rebase and remove temporary repo
      d3aa3f2 [Zhan Zhang] Merge branch 'master' into spark-2706
      cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      3ced0d7 [Zhan Zhang] rebase
      d9b981d [Zhan Zhang] rebase and fix error due to rollback
      adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      3dd50e8 [Zhan Zhang] solve conflicts and remove unnecessary implicts
      d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      dc7bdb3 [Zhan Zhang] solve conflicts
      7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      d7c3e1e [Zhan Zhang] Merge branch 'master' into spark-2706
      68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      d48bd18 [Zhan Zhang] address review comments
      3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      57ea52e [Zhan Zhang] Merge branch 'master' into spark-2706
      2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      9412d24 [Zhan Zhang] address review comments
      f4af934 [Zhan Zhang] rebase
      1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      128b60b [Zhan Zhang] ignore 0.12.0 test cases for the time being
      af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      5f5619f [Zhan Zhang] restructure the directory and different hive version support
      05d3683 [Zhan Zhang] solve conflicts
      e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      94b4fdc [Zhan Zhang] Spark-2706: hive-0.13.1 support on spark
      87ebf3b [Zhan Zhang] Merge branch 'master' into spark-2706
      921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      f896b2a [Zhan Zhang] Merge branch 'master' into spark-2706
      789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
      f6a8a40 [Zhan Zhang] revert
      ba14f28 [Zhan Zhang] test
      dbedff3 [Zhan Zhang] Merge remote-tracking branch 'upstream/master'
      70964fe [Zhan Zhang] revert
      fe0f379 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
      70ffd93 [Zhan Zhang] revert
      42585ec [Zhan Zhang] test
      7d5fce2 [Zhan Zhang] test
      7c89a8f0
  6. Oct 23, 2014
• [SPARK-4055][MLlib] Inconsistent spelling 'MLlib' and 'MLLib' · f799700e
      Kousuke Saruta authored
There are some inconsistent spellings of 'MLlib' and 'MLLib' in some documents and source code.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2903 from sarutak/SPARK-4055 and squashes the following commits:
      
      b031640 [Kousuke Saruta] Fixed inconsistent spelling "MLlib and MLLib"
      f799700e
  7. Oct 21, 2014
• SPARK-1813. Add a utility to SparkConf that makes using Kryo really easy · 6bb56fae
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #789 from sryza/sandy-spark-1813 and squashes the following commits:
      
      48b05e9 [Sandy Ryza] Simplify
      b824932 [Sandy Ryza] Allow both spark.kryo.classesToRegister and spark.kryo.registrator at the same time
      6a15bb7 [Sandy Ryza] Small fix
      a2278c0 [Sandy Ryza] Respond to review comments
      6ef592e [Sandy Ryza] SPARK-1813. Add a utility to SparkConf that makes using Kryo really easy
      6bb56fae
  8. Oct 19, 2014
• [SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport) · 7e63bb49
      Josh Rosen authored
      
      This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`.  The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops.  The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks.  Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket).
      
      This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues.
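
A small sketch of opting in from PySpark (this is a plain configuration property, so any client language works the same way; the app name is just a placeholder):

```python
from pyspark import SparkConf, SparkContext

# spark.hadoop.cloneConf is disabled by default; turn it on only if tasks hang
# inside Hadoop Configuration code (see the SPARK-2546 discussion).
conf = SparkConf().setAppName("cloneconf-demo").set("spark.hadoop.cloneConf", "true")
sc = SparkContext(conf=conf)
```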
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits:
      
      f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf.
      b562451 [Josh Rosen] Remove unused jobConfCacheKey field.
      dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task.
      
      (cherry picked from commit 2cd40db2)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
      
      Conflicts:
      	core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
      7e63bb49
  9. Oct 18, 2014
• [SPARK-3952] [Streaming] [PySpark] add Python examples in Streaming Programming Guide · 05db2da7
      Davies Liu authored
Add Python examples to the Streaming Programming Guide.

Also add a RecoverableNetworkWordCount example.
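
For reference, a network word count in the spirit of the Python examples added to the guide (host, port, and app name are placeholders):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="PythonNetworkWordCount")
ssc = StreamingContext(sc, 1)  # 1-second batches

lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```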
      
      Author: Davies Liu <davies.liu@gmail.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #2808 from davies/pyguide and squashes the following commits:
      
      8d4bec4 [Davies Liu] update readme
      26a7e37 [Davies Liu] fix format
      3821c4d [Davies Liu] address comments, add missing file
      7e4bb8a [Davies Liu] add Python examples in Streaming Programming Guide
      05db2da7
  10. Oct 16, 2014
• [SPARK-3890][Docs] Remove redundant spark.executor.memory in doc · e7f4ea8a
      WangTaoTheTonic authored
This was introduced in https://github.com/pwendell/spark/commit/f7e79bc42c1635686c3af01eef147dae92de2529, and I'm not sure why we need spark.executor.memory listed twice here.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #2745 from WangTaoTheTonic/redundantconfig and squashes the following commits:
      
      e7564dc [WangTao] too long line
      fdbdb1f [WangTaoTheTonic] trivial workaround
      d06b6e5 [WangTaoTheTonic] remove redundant spark.executor.memory in doc
      e7f4ea8a
• [SPARK-3923] Increase Akka heartbeat pause above heartbeat interval · 7f7b50ed
      Aaron Davidson authored
Something about the Akka 2.3.4 upgrade seems to have caused the issue to manifest, where all the services disconnect from each other after exactly 1000 seconds (which is the heartbeat interval). [This post](https://groups.google.com/forum/#!topic/akka-user/X3xzpTCbEFs) suggests that the heartbeat pause should be greater than the heartbeat interval, and increasing the pause from 600s to 6000s seems to have rectified the issue. My current cluster has now exceeded 1400s of uptime without failure!
      
      I do not know why this fixed it, because the threshold we have set for the failure detector is the exponent of a timeout, and 300 is extremely large. Perhaps the default failure detector changed in 2.3.4 and now ignores threshold.
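
For clusters still hitting these disconnects, the relevant knobs can also be raised explicitly; a sketch using the Spark 1.x Akka properties this patch adjusts (values mirror the new defaults and are only a starting point):

```python
from pyspark import SparkConf, SparkContext

# Keep the acceptable heartbeat pause well above the heartbeat interval.
conf = (SparkConf()
        .setAppName("akka-heartbeat-demo")
        .set("spark.akka.heartbeat.pauses", "6000")
        .set("spark.akka.heartbeat.interval", "1000"))
sc = SparkContext(conf=conf)
```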
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #2784 from aarondav/fix-timeout and squashes the following commits:
      
      bd1151a [Aaron Davidson] Increase pause, don't decrease interval
      9cb0372 [Aaron Davidson] [SPARK-3923] Decrease Akka heartbeat interval below heartbeat pause
      7f7b50ed
  11. Oct 15, 2014
• [SPARK-2098] All Spark processes should support spark-defaults.conf, config file · 293a0b5d
      GuoQiang Li authored
This is another implementation of #1256.
      cc andrewor14 vanzin
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2379 from witgo/SPARK-2098-new and squashes the following commits:
      
      4ef1cbd [GuoQiang Li] review commit
      49ef70e [GuoQiang Li] Refactor getDefaultPropertiesFile
      c45d20c [GuoQiang Li] All Spark processes should support spark-defaults.conf, config file
      293a0b5d
  12. Oct 14, 2014
• SPARK-1307 [DOCS] Don't use term 'standalone' to refer to a Spark Application · 18ab6bd7
      Sean Owen authored
      HT to Diana, just proposing an implementation of her suggestion, which I rather agreed with. Is there a second/third for the motion?
      
      Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2787 from srowen/SPARK-1307 and squashes the following commits:
      
      b5b82e2 [Sean Owen] Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
      18ab6bd7
  13. Oct 13, 2014
• [SPARK-3899][Doc] Fix wrong links in streaming doc · 92e017fb
      w00228970 authored
There are three [Custom Receiver Guide] links in the streaming doc; the first is wrong.
      
      Author: w00228970 <wangfei1@huawei.com>
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #2749 from scwf/streaming-doc and squashes the following commits:
      
0cd76b7 [wangfei] update link to jump to the Akka-specific section
      45b0646 [w00228970] wrong link in streaming doc
      92e017fb
  14. Oct 09, 2014
• [SPARK-3772] Allow `ipython` to be used by Pyspark workers; IPython support improvements · 4e9b551a
      Josh Rosen authored
      This pull request addresses a few issues related to PySpark's IPython support:
      
      - Fix the remaining uses of the '-u' flag, which IPython doesn't support (see SPARK-3772).
      - Change PYSPARK_PYTHON_OPTS to PYSPARK_DRIVER_PYTHON_OPTS, so that the old name is reserved in case we ever want to allow the worker Python options to be customized (this variable was introduced in #2554 and hasn't landed in a release yet, so this doesn't break any compatibility).
      - Introduce a PYSPARK_DRIVER_PYTHON option that allows the driver to use `ipython` while the workers use a different Python version.
      - Attempt to use Python 2.7 by default if PYSPARK_PYTHON is not specified.
      - Retain the old semantics for IPYTHON=1 and IPYTHON_OPTS (to avoid breaking existing example programs).
      
      There are more details in a block comment in `bin/pyspark`.
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #2651 from JoshRosen/SPARK-3772 and squashes the following commits:
      
      7b8eb86 [Josh Rosen] More changes to PySpark python executable configuration:
      c4f5778 [Josh Rosen] [SPARK-3772] Allow ipython to be used by Pyspark workers; IPython fixes:
      4e9b551a
• add spark.driver.memory to config docs · 13cab5ba
      nartz authored
      It took me a minute to track this down, so I thought it could be useful to have it in the docs.
      
I'm unsure whether 512mb is the default for spark.driver.memory. Also, there could be a better value for the 'description' to differentiate it from spark.executor.memory.
      
      Author: nartz <nartzpod@gmail.com>
      Author: Nathan Artz <nathanartz@Nathans-MacBook-Pro.local>
      
      Closes #2410 from nartz/docs/add-spark-driver-memory-to-config-docs and squashes the following commits:
      
      a2f6c62 [nartz] Update configuration.md
      74521b8 [Nathan Artz] add spark.driver.memory to config docs
      13cab5ba
  15. Oct 07, 2014
  16. Oct 05, 2014
  17. Oct 03, 2014
• [SPARK-3763] The example of building with sbt should be "sbt assembly" instead of "sbt compile" · 1eb8389c
      Kousuke Saruta authored
In building-spark.md, there are examples of making an assembled package with Maven, but the example for building with sbt only covers compiling.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2627 from sarutak/SPARK-3763 and squashes the following commits:
      
      fadb990 [Kousuke Saruta] Modified the example to build with sbt in building-spark.md
      1eb8389c
• [SPARK-3535][Mesos] Fix resource handling. · a8c52d53
      Brenden Matthews authored
      Author: Brenden Matthews <brenden@diddyinc.com>
      
      Closes #2401 from brndnmtthws/master and squashes the following commits:
      
      4abaa5d [Brenden Matthews] [SPARK-3535][Mesos] Fix resource handling.
      a8c52d53
• SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR · f0811f92
      EugenCepoi authored
      Update of PR #997.
      
      With this PR, setting SPARK_CONF_DIR overrides SPARK_HOME/conf (not only spark-defaults.conf and spark-env).
      
      Author: EugenCepoi <cepoi.eugen@gmail.com>
      
      Closes #2481 from EugenCepoi/SPARK-2058 and squashes the following commits:
      
      0bb32c2 [EugenCepoi] use orElse orNull and fixing trailing percent in compute-classpath.cmd
      77f35d7 [EugenCepoi] SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR
      f0811f92
  18. Oct 02, 2014
• [SPARK-3766][Doc] Snappy is also the default compress codec for broadcast variables · c6469a02
      scwf authored
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2632 from scwf/compress-doc and squashes the following commits:
      
      7983a1a [scwf] snappy is the default compression codec for broadcast
      c6469a02
• Modify default YARN memory_overhead-- from an additive constant to a multiplier · b4fb7b80
      Nishkam Ravi authored
      Redone against the recent master branch (https://github.com/apache/spark/pull/1391)
      
      Author: Nishkam Ravi <nravi@cloudera.com>
      Author: nravi <nravi@c1704.halxg.cloudera.com>
      Author: nishkamravi2 <nishkamravi@gmail.com>
      
      Closes #2485 from nishkamravi2/master_nravi and squashes the following commits:
      
      636a9ff [nishkamravi2] Update YarnAllocator.scala
      8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
      35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
      5ac2ec1 [Nishkam Ravi] Remove out
      dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
      42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
      362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
      c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
      1cf2d1e [nishkamravi2] Update YarnAllocator.scala
      ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
      2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
      2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
      3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
      5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
      eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
      df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
      6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
      5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
      681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
      b4fb7b80
• [SQL][Docs] Update the output of printSchema and fix a typo in SQL programming guide. · 82a6a083
      Yin Huai authored
      We have changed the output format of `printSchema`. This PR will update our SQL programming guide to show the updated format. Also, it fixes a typo (the value type of `StructType` in Java API).
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2630 from yhuai/sqlDoc and squashes the following commits:
      
      267d63e [Yin Huai] Update the output of printSchema and fix a typo.
      82a6a083
• [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset · 5b4a5b1a
      cocoatomo authored
      ### Problem
      
      The section "Using the shell" in Spark Programming Guide (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) says that we can run pyspark REPL through IPython.
      But a folloing command does not run IPython but a default Python executable.
      
      ```
      $ IPYTHON=1 ./bin/pyspark
      Python 2.7.8 (default, Jul  2 2014, 10:14:46)
      ...
      ```
      
The spark/bin/pyspark script, as of commit b235e013, decides which executable and options to use in the following way.
      
1. if PYSPARK_PYTHON is unset
   * → default to "python"
2. if IPYTHON_OPTS is set
   * → set IPYTHON to "1"
3. if a Python script is passed to ./bin/pyspark → run it with ./bin/spark-submit
   * out of this issue's scope
4. if IPYTHON is set to "1"
   * → execute $PYSPARK_PYTHON (default: ipython) with the arguments $IPYTHON_OPTS
   * otherwise execute $PYSPARK_PYTHON
      
Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is "1".
In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no effect on which command is used.
      
      PYSPARK_PYTHON | IPYTHON_OPTS | IPYTHON | resulting command | expected command
      ---- | ---- | ----- | ----- | -----
      (unset → defaults to python) | (unset) | (unset) | python | (same)
      (unset → defaults to python) | (unset) | 1 | python | ipython
      (unset → defaults to python) | an_option | (unset → set to 1) | python an_option | ipython an_option
      (unset → defaults to python) | an_option | 1 | python an_option | ipython an_option
      ipython | (unset) | (unset) | ipython | (same)
      ipython | (unset) | 1 | ipython | (same)
      ipython | an_option | (unset → set to 1) | ipython an_option | (same)
      ipython | an_option | 1 | ipython an_option | (same)
      
      ### Suggestion
      
The pyspark script should first determine whether the user wants to run IPython or another executable.
      
1. if IPYTHON_OPTS is set
   * set IPYTHON to "1"
2. if IPYTHON has the value "1"
   * PYSPARK_PYTHON defaults to "ipython" if not set
3. otherwise, PYSPARK_PYTHON defaults to "python" if not set
      
See the pull request for the detailed modifications.
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2554 from cocoatomo/issues/cannot-run-ipython-without-options and squashes the following commits:
      
      d2a9b06 [cocoatomo] [SPARK-3706][PySpark] Use PYTHONUNBUFFERED environment variable instead of -u option
      264114c [cocoatomo] [SPARK-3706][PySpark] Remove the sentence about deprecated environment variables
      42e02d5 [cocoatomo] [SPARK-3706][PySpark] Replace environment variables used to customize execution of PySpark REPL
      10d56fb [cocoatomo] [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset
      5b4a5b1a
  19. Sep 30, 2014
• [SPARK-3478] [PySpark] Profile the Python tasks · c5414b68
      Davies Liu authored
This patch adds profiling support for PySpark. It will show the profiling results
before the driver exits; here is one example:
      
      ```
      ============================================================
      Profile of RDD<id=3>
      ============================================================
               5146507 function calls (5146487 primitive calls) in 71.094 seconds
      
         Ordered by: internal time, cumulative time
      
         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5144576   68.331    0.000   68.331    0.000 statcounter.py:44(merge)
             20    2.735    0.137   71.071    3.554 statcounter.py:33(__init__)
             20    0.017    0.001    0.017    0.001 {cPickle.dumps}
           1024    0.003    0.000    0.003    0.000 t.py:16(<lambda>)
             20    0.001    0.000    0.001    0.000 {reduce}
             21    0.001    0.000    0.001    0.000 {cPickle.loads}
             20    0.001    0.000    0.001    0.000 copy_reg.py:95(_slotnames)
             41    0.001    0.000    0.001    0.000 serializers.py:461(read_int)
             40    0.001    0.000    0.002    0.000 serializers.py:179(_batched)
             62    0.000    0.000    0.000    0.000 {method 'read' of 'file' objects}
             20    0.000    0.000   71.072    3.554 rdd.py:863(<lambda>)
             20    0.000    0.000    0.001    0.000 serializers.py:198(load_stream)
          40/20    0.000    0.000   71.072    3.554 rdd.py:2093(pipeline_func)
             41    0.000    0.000    0.002    0.000 serializers.py:130(load_stream)
             40    0.000    0.000   71.072    1.777 rdd.py:304(func)
             20    0.000    0.000   71.094    3.555 worker.py:82(process)
      ```
      
Also, users can show the profile results manually with `sc.show_profiles()` or dump them to disk
with `sc.dump_profiles(path)`, for example:
      
      ```python
      >>> sc._conf.set("spark.python.profile", "true")
      >>> rdd = sc.parallelize(range(100)).map(str)
      >>> rdd.count()
      100
      >>> sc.show_profiles()
      ============================================================
      Profile of RDD<id=1>
      ============================================================
               284 function calls (276 primitive calls) in 0.001 seconds
      
         Ordered by: internal time, cumulative time
      
         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              4    0.000    0.000    0.000    0.000 serializers.py:198(load_stream)
              4    0.000    0.000    0.000    0.000 {reduce}
           12/4    0.000    0.000    0.001    0.000 rdd.py:2092(pipeline_func)
              4    0.000    0.000    0.000    0.000 {cPickle.loads}
              4    0.000    0.000    0.000    0.000 {cPickle.dumps}
            104    0.000    0.000    0.000    0.000 rdd.py:852(<genexpr>)
              8    0.000    0.000    0.000    0.000 serializers.py:461(read_int)
             12    0.000    0.000    0.000    0.000 rdd.py:303(func)
      ```
Profiling is disabled by default; it can be enabled by setting "spark.python.profile=true".

Also, users can have the results dumped to disk automatically for future analysis by setting "spark.python.profile.dump=path_to_dump".

This is a bugfix of #2351. cc JoshRosen
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2556 from davies/profiler and squashes the following commits:
      
      e68df5a [Davies Liu] Merge branch 'master' of github.com:apache/spark into profiler
858e74c [Davies Liu] compatible with python 2.6
      7ef2aa0 [Davies Liu] bugfix, add tests for show_profiles and dump_profiles()
      2b0daf2 [Davies Liu] fix docs
      7a56c24 [Davies Liu] bugfix
      cba9463 [Davies Liu] move show_profiles and dump_profiles to SparkContext
      fb9565b [Davies Liu] Merge branch 'master' of github.com:apache/spark into profiler
      116d52a [Davies Liu] Merge branch 'master' of github.com:apache/spark into profiler
      09d02c3 [Davies Liu] Merge branch 'master' into profiler
      c23865c [Davies Liu] Merge branch 'master' into profiler
      15d6f18 [Davies Liu] add docs for two configs
      dadee1a [Davies Liu] add docs string and clear profiles after show or dump
      4f8309d [Davies Liu] address comment, add tests
      0a5b6eb [Davies Liu] fix Python UDF
      4b20494 [Davies Liu] add profile for python
      c5414b68
• [SPARK-3356] [DOCS] Document when RDD elements' ordering within partitions is nondeterministic · ab6dd80b
      Sean Owen authored
As suggested by mateiz, and because it came up on the mailing list again last week, this attempts to document that the ordering of elements is not guaranteed across RDD evaluations in groupBy, zip, and partition-wise RDD methods. Suggestions are welcome about the wording, or about other methods that need a note.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2508 from srowen/SPARK-3356 and squashes the following commits:
      
      b7c96fd [Sean Owen] Undo change to programming guide
      ad4aeec [Sean Owen] Don't mention ordering in partition-wise methods, reword description of ordering for zip methods per review, and add similar note to programming guide, which mentions groupByKey (but not zip methods)
      fce943b [Sean Owen] Note that ordering of elements is not guaranteed across RDD evaluations in groupBy, zip, and partition-wise RDD methods
      ab6dd80b
  20. Sep 28, 2014
  21. Sep 27, 2014
• Docs: use "--total-executor-cores" rather than "--cores" after spark-shell · 66107f46
      CrazyJvm authored
      Author: CrazyJvm <crazyjvm@gmail.com>
      
      Closes #2540 from CrazyJvm/standalone-core and squashes the following commits:
      
      66d9fc6 [CrazyJvm] use "--total-executor-cores" rather than "--cores" after spark-shell
      66107f46
• stop, start and destroy require the EC2_REGION · 9e8ced78
      Jeff Steinmetz authored
i.e.,
      ./spark-ec2 --region=us-west-1 stop yourclustername
      
      Author: Jeff Steinmetz <jeffrey.steinmetz@gmail.com>
      
      Closes #2473 from jeffsteinmetz/master and squashes the following commits:
      
      7491f2c [Jeff Steinmetz] fix case in EC2 cluster setup documentation
bd3d777 [Jeff Steinmetz] standardized ec2 documentation to use <lower-case> sample args
2bf4a57 [Jeff Steinmetz] standardized ec2 documentation to use <lower-case> sample args
68d8372 [Jeff Steinmetz] standardized ec2 documentation to use <lower-case> sample args
d2ab6e2 [Jeff Steinmetz] standardized ec2 documentation to use <lower-case> sample args
520e6dc [Jeff Steinmetz] standardized ec2 documentation to use <lower-case> sample args
      37fc876 [Jeff Steinmetz] stop, start and destroy require the EC2_REGION
      9e8ced78
• [SQL][DOCS] Clarify that the server is for JDBC and ODBC · f0eea76d
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2527 from marmbrus/patch-1 and squashes the following commits:
      
      a0f9f1c [Michael Armbrust] [SQL][DOCS] Clarify that the server is for JDBC and ODBC
      f0eea76d
  22. Sep 26, 2014
• Revert "[SPARK-3478] [PySpark] Profile the Python tasks" · f872e4fb
      Josh Rosen authored
      This reverts commit 1aa549ba.
      f872e4fb
• [SPARK-3614][MLLIB] Add minimumOccurence filtering to IDF · ec9df6a7
      RJ Nowling authored
      This PR for [SPARK-3614](https://issues.apache.org/jira/browse/SPARK-3614) adds functionality for filtering out terms which do not appear in at least a minimum number of documents.
      
      This is implemented using a minimumOccurence parameter (default 0).  When terms' document frequencies are less than minimumOccurence, their IDFs are set to 0, just like when the DF is 0.  As a result, the TF-IDFs for the terms are found to be 0, as if the terms were not present in the documents.
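
A short sketch of the effect, using the Python wrapper's `minDocFreq` parameter (the renamed form of `minimumOccurence`; the toy corpus and app name are illustrative):

```python
from pyspark import SparkContext
from pyspark.mllib.feature import HashingTF, IDF

sc = SparkContext(appName="idf-mindocfreq-demo")
docs = sc.parallelize([["a", "b"], ["a", "c"], ["a", "d"]])

tf = HashingTF(numFeatures=16).transform(docs).cache()

# Terms appearing in fewer than 2 documents get an IDF of 0, so their
# TF-IDF contribution is zeroed out, as if they were absent.
tfidf = IDF(minDocFreq=2).fit(tf).transform(tf)
```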
      
      This PR makes the following changes:
      * Add a minimumOccurence parameter to the IDF and DocumentFrequencyAggregator classes.
* Create a parameter-less constructor for IDF with a default minimumOccurence value of 0 to remain backwards-compatible with the original IDF API.
* Set the IDFs to 0 for terms whose DFs are less than minimumOccurence.
      * Add tests to the Spark IDFSuite and Java JavaTfIdfSuite test suites
      * Updated the MLLib Feature Extraction programming guide to describe the new feature
      
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #2494 from rnowling/spark-3614-idf-filter and squashes the following commits:
      
0aa3c63 [RJ Nowling] Fix indentation
      e6523a8 [RJ Nowling] Remove unnecessary toDouble's from IDFSuite
      bfa82ec [RJ Nowling] Add space after if
      30d20b3 [RJ Nowling] Add spaces around equals signs
      9013447 [RJ Nowling] Add space before division operator
      79978fc [RJ Nowling] Remove unnecessary semi-colon
      40fd70c [RJ Nowling] Change minimumOccurence to minDocFreq in code and docs
      47850ab [RJ Nowling] Changed minimumOccurence to Int from Long
      9fb4093 [RJ Nowling] Remove unnecessary lines from IDF class docs
      1fc09d8 [RJ Nowling] Add backwards-compatible constructor to DocumentFrequencyAggregator
      1801fd2 [RJ Nowling] Fix style errors in IDF.scala
      6897252 [RJ Nowling] Preface minimumOccurence members with val to make them final and immutable
      a200bab [RJ Nowling] Remove unnecessary else statement
      4b974f5 [RJ Nowling] Remove accidentally-added import from testing
      c0cc643 [RJ Nowling] Add minimumOccurence filtering to IDF
      ec9df6a7
• [SPARK-3478] [PySpark] Profile the Python tasks · 1aa549ba
      Davies Liu authored
This patch adds profiling support for PySpark. It will show the profiling results
before the driver exits; here is one example:
      
      ```
      ============================================================
      Profile of RDD<id=3>
      ============================================================
               5146507 function calls (5146487 primitive calls) in 71.094 seconds
      
         Ordered by: internal time, cumulative time
      
         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5144576   68.331    0.000   68.331    0.000 statcounter.py:44(merge)
             20    2.735    0.137   71.071    3.554 statcounter.py:33(__init__)
             20    0.017    0.001    0.017    0.001 {cPickle.dumps}
           1024    0.003    0.000    0.003    0.000 t.py:16(<lambda>)
             20    0.001    0.000    0.001    0.000 {reduce}
             21    0.001    0.000    0.001    0.000 {cPickle.loads}
             20    0.001    0.000    0.001    0.000 copy_reg.py:95(_slotnames)
             41    0.001    0.000    0.001    0.000 serializers.py:461(read_int)
             40    0.001    0.000    0.002    0.000 serializers.py:179(_batched)
             62    0.000    0.000    0.000    0.000 {method 'read' of 'file' objects}
             20    0.000    0.000   71.072    3.554 rdd.py:863(<lambda>)
             20    0.000    0.000    0.001    0.000 serializers.py:198(load_stream)
          40/20    0.000    0.000   71.072    3.554 rdd.py:2093(pipeline_func)
             41    0.000    0.000    0.002    0.000 serializers.py:130(load_stream)
             40    0.000    0.000   71.072    1.777 rdd.py:304(func)
             20    0.000    0.000   71.094    3.555 worker.py:82(process)
      ```
      
Also, users can show the profile results manually with `sc.show_profiles()` or dump them to disk
with `sc.dump_profiles(path)`, for example:
      
      ```python
      >>> sc._conf.set("spark.python.profile", "true")
      >>> rdd = sc.parallelize(range(100)).map(str)
      >>> rdd.count()
      100
      >>> sc.show_profiles()
      ============================================================
      Profile of RDD<id=1>
      ============================================================
               284 function calls (276 primitive calls) in 0.001 seconds
      
         Ordered by: internal time, cumulative time
      
         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              4    0.000    0.000    0.000    0.000 serializers.py:198(load_stream)
              4    0.000    0.000    0.000    0.000 {reduce}
           12/4    0.000    0.000    0.001    0.000 rdd.py:2092(pipeline_func)
              4    0.000    0.000    0.000    0.000 {cPickle.loads}
              4    0.000    0.000    0.000    0.000 {cPickle.dumps}
            104    0.000    0.000    0.000    0.000 rdd.py:852(<genexpr>)
              8    0.000    0.000    0.000    0.000 serializers.py:461(read_int)
             12    0.000    0.000    0.000    0.000 rdd.py:303(func)
      ```
Profiling is disabled by default; it can be enabled by setting "spark.python.profile=true".

Also, users can have the results dumped to disk automatically for future analysis by setting "spark.python.profile.dump=path_to_dump".
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2351 from davies/profiler and squashes the following commits:
      
      7ef2aa0 [Davies Liu] bugfix, add tests for show_profiles and dump_profiles()
      2b0daf2 [Davies Liu] fix docs
      7a56c24 [Davies Liu] bugfix
      cba9463 [Davies Liu] move show_profiles and dump_profiles to SparkContext
      fb9565b [Davies Liu] Merge branch 'master' of github.com:apache/spark into profiler
      116d52a [Davies Liu] Merge branch 'master' of github.com:apache/spark into profiler
      09d02c3 [Davies Liu] Merge branch 'master' into profiler
      c23865c [Davies Liu] Merge branch 'master' into profiler
      15d6f18 [Davies Liu] add docs for two configs
      dadee1a [Davies Liu] add docs string and clear profiles after show or dump
      4f8309d [Davies Liu] address comment, add tests
      0a5b6eb [Davies Liu] fix Python UDF
      4b20494 [Davies Liu] add profile for python
      1aa549ba
  23. Sep 25, 2014
• [SPARK-3584] sbin/slaves doesn't work when we use password authentication for SSH · 0dc868e7
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2444 from sarutak/slaves-scripts-modification and squashes the following commits:
      
      eff7394 [Kousuke Saruta] Improve the description about Cluster Launch Script in docs/spark-standalone.md
      7858225 [Kousuke Saruta] Modified sbin/slaves to use the environment variable "SPARK_SSH_FOREGROUND" as a flag
      53d7121 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into slaves-scripts-modification
      e570431 [Kousuke Saruta] Added a description for SPARK_SSH_FOREGROUND variable
      7120a0c [Kousuke Saruta] Added a description about default host for sbin/slaves
      1bba8a9 [Kousuke Saruta] Added SPARK_SSH_FOREGROUND flag to sbin/slaves
      88e2f17 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into slaves-scripts-modification
      297e75d [Kousuke Saruta] Modified sbin/slaves not to export HOSTLIST
      0dc868e7
• [SPARK-1484][MLLIB] Warn when running an iterative algorithm on uncached data. · ff637c93
      Aaron Staple authored
      Add warnings to KMeans, GeneralizedLinearAlgorithm, and computeSVD when called with input data that is not cached. KMeans is implemented iteratively, and I believe that GeneralizedLinearAlgorithm’s current optimizers are iterative and its future optimizers are also likely to be iterative. RowMatrix’s computeSVD is iterative against an RDD when run in DistARPACK mode. ALS and DecisionTree are iterative as well, but they implement RDD caching internally so do not require a warning.
      
I added a warning to GeneralizedLinearAlgorithm rather than inside its optimizers, where the iteration actually occurs, because internally GeneralizedLinearAlgorithm maps its input data to an uncached RDD before passing it to an optimizer. (In other words, the warning would be printed for every GeneralizedLinearAlgorithm run, regardless of whether its input is cached, if the warning were in GradientDescent or another optimizer.) I assume that the use of an uncached RDD by GeneralizedLinearAlgorithm is intentional, and that the mapping there (adding label, intercepts and scaling) is a lightweight operation. Arguably, a user calling an optimizer such as GradientDescent will be knowledgeable enough to cache their data without needing a log warning, so the lack of a warning in the optimizers may be ok.
      
      Some of the documentation examples making use of these iterative algorithms did not cache their training RDDs (while others did). I updated the examples to always cache. I also fixed some (unrelated) minor errors in the documentation examples.
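
A sketch of the practice the new warning nudges users toward, with an illustrative PySpark KMeans run on cached input (the file path, app name, and parameters are placeholders):

```python
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName="kmeans-cache-demo")

# Cache the training RDD: KMeans iterates over it many times, and uncached
# input would be recomputed on every pass (which is what triggers the warning).
points = (sc.textFile("data/kmeans_points.txt")
            .map(lambda line: [float(x) for x in line.split()])
            .cache())

model = KMeans.train(points, k=3, maxIterations=10)
```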
      
      Author: Aaron Staple <aaron.staple@gmail.com>
      
      Closes #2347 from staple/SPARK-1484 and squashes the following commits:
      
      bd49701 [Aaron Staple] Address review comments.
      ab2d4a4 [Aaron Staple] Disable warnings on python code path.
      a7a0f99 [Aaron Staple] Change code comments per review comments.
      7cca1dc [Aaron Staple] Change warning message text.
      c77e939 [Aaron Staple] [SPARK-1484][MLLIB] Warn when running an iterative algorithm on uncached data.
      3b6c511 [Aaron Staple] Minor doc example fixes.
      ff637c93
  24. Sep 24, 2014
• [SPARK-546] Add full outer join to RDD and DStream. · 8ca4ecb6
      Aaron Staple authored
      leftOuterJoin and rightOuterJoin are already implemented.  This patch adds fullOuterJoin.
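
A quick PySpark illustration of the new operation on toy data:

```python
from pyspark import SparkContext

sc = SparkContext(appName="full-outer-join-demo")
left = sc.parallelize([("k1", 1), ("k2", 2)])
right = sc.parallelize([("k2", 20), ("k3", 30)])

# Keys missing on one side show up with None on that side.
print(sorted(left.fullOuterJoin(right).collect()))
# [('k1', (1, None)), ('k2', (2, 20)), ('k3', (None, 30))]
```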
      
      Author: Aaron Staple <aaron.staple@gmail.com>
      
      Closes #1395 from staple/SPARK-546 and squashes the following commits:
      
      1f5595c [Aaron Staple] Fix python style
      7ac0aa9 [Aaron Staple] [SPARK-546] Add full outer join to RDD and DStream.
      3b5d137 [Aaron Staple] In JavaPairDStream, make class tag specification in rightOuterJoin consistent with other functions.
      31f2956 [Aaron Staple] Fix left outer join documentation comments.
      8ca4ecb6
  25. Sep 23, 2014
• [YARN] SPARK-2668: Add variable of yarn log directory for reference from the log4j configuration · 14f8c340
      peng.zhang authored
Assign the value of the YARN container log directory to the Java opt "spark.yarn.app.container.log.dir", so a user-defined log4j.properties can reference this value and write logs to the YARN container's log directory.
Otherwise, a user-defined file appender will only write to the container's CWD, and log files in the CWD will not be displayed on the YARN UI, nor can they be aggregated to the HDFS log directory after the job finishes.
      
      User defined log4j.properties reference example:
      log4j.appender.rolling_file.File = ${spark.yarn.app.container.log.dir}/spark.log
      
      Author: peng.zhang <peng.zhang@xiaomi.com>
      
      Closes #1573 from renozhang/yarn-log-dir and squashes the following commits:
      
      16c5cb8 [peng.zhang] Update doc
      f2b5e2a [peng.zhang] Change variable's name, and update running-on-yarn.md
      503ea2d [peng.zhang] Support log4j log to yarn container dir
      14f8c340
  26. Sep 22, 2014