  1. Aug 09, 2015
    • Yadong Qi's avatar
      [SPARK-9737] [YARN] Add the suggested configuration when required executor... · 86fa4ba6
      Yadong Qi authored
      [SPARK-9737] [YARN] Add the suggested configuration when required executor memory is above the max threshold of this cluster on YARN mode
      
      Author: Yadong Qi <qiyadong2010@gmail.com>
      
      Closes #8028 from watermen/SPARK-9737 and squashes the following commits:
      
      48bdf3d [Yadong Qi] Add suggested configuration.
      86fa4ba6
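      The fix described above amounts to failing fast with actionable advice. A minimal sketch of such a check (illustrative, not the exact Spark code; `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb` are the standard YARN settings that bound container size):

      ```scala
      // Sketch: reject an impossible request up front and tell the user which
      // YARN settings to raise, instead of letting the app hang waiting for
      // containers that can never be allocated.
      def verifyExecutorMemory(executorMem: Int, overhead: Int, maxMem: Int): Unit = {
        val required = executorMem + overhead
        if (required > maxMem) {
          throw new IllegalArgumentException(
            s"Required executor memory ($executorMem MB) plus overhead ($overhead MB) " +
            s"is above the max threshold ($maxMem MB) of this cluster! Please increase " +
            "'yarn.scheduler.maximum-allocation-mb' and/or " +
            "'yarn.nodemanager.resource.memory-mb'.")
        }
      }
      ```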
  2. Aug 05, 2015
    • linweizhong's avatar
      [SPARK-9519] [YARN] Confirm stop sc successfully when application was killed · 7a969a69
      linweizhong authored
      Currently, when we kill an application on YARN, sc.stop() is called from the YARN application state monitor thread; YarnClientSchedulerBackend.stop() then interrupts that thread, which prevents SparkContext from stopping fully, since we still need to wait for the executors to exit.
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #7846 from Sephiroth-Lin/SPARK-9519 and squashes the following commits:
      
      1ae736d [linweizhong] Update comments
      2e8e365 [linweizhong] Add comment explaining the code
      ad0e23b [linweizhong] Update
      243d2c7 [linweizhong] Confirm stop sc successfully when application was killed
      7a969a69
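      The race described above can be sketched as follows (class shape assumed, not Spark's actual code). The key idea is that the monitor thread must not be interrupted while it is itself running sc.stop():

      ```scala
      // Sketch: the monitor thread marks itself as "stopping" before calling
      // sc.stop(), so the backend's stop() knows not to interrupt it mid-stop.
      class MonitorThread(stopSc: () => Unit) extends Thread {
        @volatile var stopping = false

        override def run(): Unit = {
          // ... detect that YARN killed the application ...
          stopping = true // mark before stopping so stopMonitor() won't interrupt us
          stopSc()        // blocks until executors exit; must not be interrupted
        }

        def stopMonitor(): Unit = if (!stopping) this.interrupt()
      }
      ```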
  3. Aug 03, 2015
    • Steve Loughran's avatar
      [SPARK-8064] [SQL] Build against Hive 1.2.1 · a2409d1c
      Steve Loughran authored
      Cherry-picked the parts of the initial SPARK-8064 WIP branch needed to get sql/hive to compile against Hive 1.2.1. That's the ASF release packaged under org.apache.hive, not any fork.
      
      Tests not run yet: that's what the machines are for
      
      Author: Steve Loughran <stevel@hortonworks.com>
      Author: Cheng Lian <lian@databricks.com>
      Author: Michael Armbrust <michael@databricks.com>
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #7191 from steveloughran/stevel/feature/SPARK-8064-hive-1.2-002 and squashes the following commits:
      
      7556d85 [Cheng Lian] Updates .q files and corresponding golden files
      ef4af62 [Steve Loughran] Merge commit '6a92bb09f46a04d6cd8c41bdba3ecb727ebb9030' into stevel/feature/SPARK-8064-hive-1.2-002
      6a92bb0 [Cheng Lian] Overrides HiveConf time vars
      dcbb391 [Cheng Lian] Adds com.twitter:parquet-hadoop-bundle:1.6.0 for Hive Parquet SerDe
      0bbe475 [Steve Loughran] SPARK-8064 scalastyle rejects the standard Hadoop ASF license header...
      fdf759b [Steve Loughran] SPARK-8064 classpath dependency suite to be in sync with shading in final (?) hive-exec spark
      7a6c727 [Steve Loughran] SPARK-8064 switch to second staging repo of the spark-hive artifacts. This one has the protobuf-shaded hive-exec jar
      376c003 [Steve Loughran] SPARK-8064 purge duplicate protobuf declaration
      2c74697 [Steve Loughran] SPARK-8064 switch to the protobuf shaded hive-exec jar with tests to chase it down
      cc44020 [Steve Loughran] SPARK-8064 remove hadoop.version from runtest.py, as profile will fix that automatically.
      6901fa9 [Steve Loughran] SPARK-8064 explicit protobuf import
      da310dc [Michael Armbrust] Fixes for Hive tests.
      a775a75 [Steve Loughran] SPARK-8064 cherry-pick-incomplete
      7404f34 [Patrick Wendell] Add spark-hive staging repo
      832c164 [Steve Loughran] SPARK-8064 try to supress compiler warnings on Complex.java pasted-thrift-code
      312c0d4 [Steve Loughran] SPARK-8064  maven/ivy dependency purge; calcite declaration needed
      fa5ae7b [Steve Loughran] HIVE-8064 fix up hive-thriftserver dependencies and cut back on evicted references in the hive- packages; this keeps mvn and ivy resolution compatible, as the reconciliation policy is "by hand"
      c188048 [Steve Loughran] SPARK-8064 manage the Hive depencencies to that -things that aren't needed are excluded -sql/hive built with ivy is in sync with the maven reconciliation policy, rather than latest-first
      4c8be8d [Cheng Lian] WIP: Partial fix for Thrift server and CLI tests
      314eb3c [Steve Loughran] SPARK-8064 deprecation warning  noise in one of the tests
      17b0341 [Steve Loughran] SPARK-8064 IDE-hinted cleanups of Complex.java to reduce compiler warnings. It's all autogenerated code, so still ugly.
      d029b92 [Steve Loughran] SPARK-8064 rely on unescaping to have already taken place, so go straight to map of serde options
      23eca7e [Steve Loughran] HIVE-8064 handle raw and escaped property tokens
      54d9b06 [Steve Loughran] SPARK-8064 fix compilation regression surfacing from rebase
      0b12d5f [Steve Loughran] HIVE-8064 use subset of hive complex type whose types deserialize
      fce73b6 [Steve Loughran] SPARK-8064 poms rely implicitly on the version of kryo chill provides
      fd3aa5d [Steve Loughran] SPARK-8064 version of hive to d/l from ivy is 1.2.1
      dc73ece [Steve Loughran] SPARK-8064 revert to master's determinstic pushdown strategy
      d3c1e4a [Steve Loughran] SPARK-8064 purge UnionType
      051cc21 [Steve Loughran] SPARK-8064 switch to an unshaded version of hive-exec-core, which must have been built with Kryo 2.21. This currently looks for a (locally built) version 1.2.1.spark
      6684c60 [Steve Loughran] SPARK-8064 ignore RTE raised in blocking process.exitValue() call
      e6121e5 [Steve Loughran] SPARK-8064 address review comments
      aa43dc6 [Steve Loughran] SPARK-8064  more robust teardown on JavaMetastoreDatasourcesSuite
      f2bff01 [Steve Loughran] SPARK-8064 better takeup of asynchronously caught error text
      8b1ef38 [Steve Loughran] SPARK-8064: on failures executing spark-submit in HiveSparkSubmitSuite, print command line and all logged output.
      5a9ce6b [Steve Loughran] SPARK-8064 add explicit reason for kv split failure, rather than array OOB. *does not address the issue*
      642b63a [Steve Loughran] SPARK-8064 reinstate something cut briefly during rebasing
      97194dc [Steve Loughran] SPARK-8064 add extra logging to the YarnClusterSuite classpath test. There should be no reason why this is failing on jenkins, but as it is (and presumably its CP-related), improve the logging including any exception raised.
      335357f [Steve Loughran] SPARK-8064 fail fast on thrive process spawning tests on exit codes and/or error string patterns seen in log.
      3ed872f [Steve Loughran] SPARK-8064 rename field double to  dbl
      bca55e5 [Steve Loughran] SPARK-8064 missed one of the `date` escapes
      41d6479 [Steve Loughran] SPARK-8064 wrap tests with withTable() calls to avoid table-exists exceptions
      2bc29a4 [Steve Loughran] SPARK-8064 ParquetSuites to escape `date` field name
      1ab9bc4 [Steve Loughran] SPARK-8064 TestHive to use sered2.thrift.test.Complex
      bf3a249 [Steve Loughran] SPARK-8064: more resubmit than fix; tighten startup timeout to 60s. Still no obvious reason why jersey server code in spark-assembly isn't being picked up -it hasn't been shaded
      c829b8f [Steve Loughran] SPARK-8064: reinstate yarn-rm-server dependencies to hive-exec to ensure that jersey server is on classpath on hadoop versions < 2.6
      0b0f738 [Steve Loughran] SPARK-8064: thrift server startup to fail fast on any exception in the main thread
      13abaf1 [Steve Loughran] SPARK-8064 Hive compatibilty tests sin sync with explain/show output from Hive 1.2.1
      d14d5ea [Steve Loughran] SPARK-8064: DATE is now a predicate; you can't use it as a field in select ops
      26eef1c [Steve Loughran] SPARK-8064: HIVE-9039 renamed TOK_UNION => TOK_UNIONALL while adding TOK_UNIONDISTINCT
      3d64523 [Steve Loughran] SPARK-8064 improve diagns on uknown token; fix scalastyle failure
      d0360f6 [Steve Loughran] SPARK-8064: delicate merge in of the branch vanzin/hive-1.1
      1126e5a [Steve Loughran] SPARK-8064: name of unrecognized file format wasn't appearing in error text
      8cb09c4 [Steve Loughran] SPARK-8064: test resilience/assertion improvements. Independent of the rest of the work; can be backported to earlier versions
      dec12cb [Steve Loughran] SPARK-8064: when a CLI suite test fails include the full output text in the raised exception; this ensures that the stdout/stderr is included in jenkins reports, so it becomes possible to diagnose the cause.
      463a670 [Steve Loughran] SPARK-8064 run-tests.py adds a hadoop-2.6 profile, and changes info messages to say "w/Hive 1.2.1" in console output
      2531099 [Steve Loughran] SPARK-8064 successful attempt to get rid of pentaho as a transitive dependency of hive-exec
      1d59100 [Steve Loughran] SPARK-8064 (unsuccessful) attempt to get rid of pentaho as a transitive dependency of hive-exec
      75733fc [Steve Loughran] SPARK-8064 change thrift binary startup message to "Starting ThriftBinaryCLIService on port"
      3ebc279 [Steve Loughran] SPARK-8064 move strings used to check for http/bin thrift services up into constants
      c80979d [Steve Loughran] SPARK-8064: SparkSQLCLIDriver drops remote mode support. CLISuite Tests pass instead of timing out: undetected regression?
      27e8370 [Steve Loughran] SPARK-8064 fix some style & IDE warnings
      00e50d6 [Steve Loughran] SPARK-8064 stop excluding hive shims from dependency (commented out , for now)
      cb4f142 [Steve Loughran] SPARK-8054 cut pentaho dependency from calcite
      f7aa9cb [Steve Loughran] SPARK-8064 everything compiles with some commenting and moving of classes into a hive package
      6c310b4 [Steve Loughran] SPARK-8064 subclass  Hive ServerOptionsProcessor to make it public again
      f61a675 [Steve Loughran] SPARK-8064 thrift server switched to Hive 1.2.1, though it doesn't compile everywhere
      4890b9d [Steve Loughran] SPARK-8064, build against Hive 1.2.1
      a2409d1c
  4. Aug 01, 2015
  5. Jul 30, 2015
    • Marcelo Vanzin's avatar
      [SPARK-9388] [YARN] Make executor info log messages easier to read. · ab78b1d2
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7706 from vanzin/SPARK-9388 and squashes the following commits:
      
      028b990 [Marcelo Vanzin] Single log statement.
      3c5fb6a [Marcelo Vanzin] YARN not Yarn.
      5bcd7a0 [Marcelo Vanzin] [SPARK-9388] [yarn] Make executor info log messages easier to read.
      ab78b1d2
    • Mridul Muralidharan's avatar
      [SPARK-8297] [YARN] Scheduler backend is not notified in case node fails in YARN · e5353465
      Mridul Muralidharan authored
      This change adds code to notify the scheduler backend when a container dies in YARN.
      
      Author: Mridul Muralidharan <mridulm@yahoo-inc.com>
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7431 from vanzin/SPARK-8297 and squashes the following commits:
      
      471e4a0 [Marcelo Vanzin] Fix unit test after merge.
      d4adf4e [Marcelo Vanzin] Merge branch 'master' into SPARK-8297
      3b262e8 [Marcelo Vanzin] Merge branch 'master' into SPARK-8297
      537da6f [Marcelo Vanzin] Make an expected log less scary.
      04dc112 [Marcelo Vanzin] Use driver <-> AM communication to send "remove executor" request.
      8855b97 [Marcelo Vanzin] Merge remote-tracking branch 'mridul/fix_yarn_scheduler_bug' into SPARK-8297
      687790f [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug
      e1b0067 [Mridul Muralidharan] Fix failing testcase, fix merge issue from our 1.3 -> master
      9218fcc [Mridul Muralidharan] Fix failing testcase
      362d64a [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug
      62ad0cc [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug
      bbf8811 [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug
      9ee1307 [Mridul Muralidharan] Fix SPARK-8297
      a3a0f01 [Mridul Muralidharan] Fix SPARK-8297
      e5353465
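      A hedged sketch of the driver <-> AM "remove executor" idea from this change (names and shapes are illustrative, not Spark's actual classes): when the AM sees a container complete that it did not ask to stop, it notifies the driver so the scheduler can resubmit that executor's tasks.

      ```scala
      // Illustrative message the AM would send to the driver when a container dies.
      case class RemoveExecutor(executorId: String, reason: String)

      // Sketch of the AM-side handler for completed containers: only failures
      // trigger a notification; a clean exit (status 0) needs no action.
      def onContainerCompleted(executorId: String, exitStatus: Int,
                               notifyDriver: RemoveExecutor => Unit): Unit = {
        if (exitStatus != 0) {
          notifyDriver(RemoveExecutor(executorId,
            s"Container exited with status $exitStatus"))
        }
      }
      ```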
  6. Jul 27, 2015
    • jerryshao's avatar
      [SPARK-4352] [YARN] [WIP] Incorporate locality preferences in dynamic allocation requests · ab625956
      jerryshao authored
      Currently there is no locality preference for container requests in YARN mode, which hurts performance when data has to be fetched remotely, so this proposes adding locality awareness to YARN dynamic allocation.
      
      Ping sryza, please help to review, thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #6394 from jerryshao/SPARK-4352 and squashes the following commits:
      
      d45fecb [jerryshao] Add documents
      6c3fe5c [jerryshao] Fix bug
      8db6c0e [jerryshao] Further address the comments
      2e2b2cb [jerryshao] Fix rebase compiling problem
      ce5f096 [jerryshao] Fix style issue
      7f7df95 [jerryshao] Fix rebase issue
      9ca9e07 [jerryshao] Code refactor according to comments
      d3e4236 [jerryshao] Further address the comments
      5e7a593 [jerryshao] Fix bug introduced code rebase
      9ca7783 [jerryshao] Style changes
      08317f9 [jerryshao] code and comment refines
      65b2423 [jerryshao] Further address the comments
      a27c587 [jerryshao] address the comment
      27faabc [jerryshao] redundant code remove
      9ce06a1 [jerryshao] refactor the code
      f5ba27b [jerryshao] Style fix
      2c6cc8a [jerryshao] Fix bug and add unit tests
      0757335 [jerryshao] Consider the distribution of existed containers to recalculate the new container requests
      0ad66ff [jerryshao] Fix compile bugs
      1c20381 [jerryshao] Minor fix
      5ef2dc8 [jerryshao] Add docs and improve the code
      3359814 [jerryshao] Fix rebase and test bugs
      0398539 [jerryshao] reinitialize the new implementation
      67596d6 [jerryshao] Still fix the code
      654e1d2 [jerryshao] Fix some bugs
      45b1c89 [jerryshao] Further polish the algorithm
      dea0152 [jerryshao] Enable node locality information in YarnAllocator
      74bbcc6 [jerryshao] Support node locality for dynamic allocation initial commit
      ab625956
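      One building block of locality-aware allocation is turning the host preferences of pending tasks into per-host container counts to request. A minimal sketch under assumed input shapes (not the algorithm the patch actually implements, which also weighs existing containers):

      ```scala
      // Illustrative only: given the preferred host of each pending task and how
      // many tasks one container can run, compute how many containers to prefer
      // on each host (rounding up).
      def preferredHosts(pendingTaskHosts: Seq[String],
                         tasksPerContainer: Int): Map[String, Int] =
        pendingTaskHosts.groupBy(identity).map { case (host, tasks) =>
          host -> math.ceil(tasks.size.toDouble / tasksPerContainer).toInt
        }
      ```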
    • Hari Shreedharan's avatar
      [SPARK-8988] [YARN] Make sure driver log links appear in secure cluster mode. · c1be9f30
      Hari Shreedharan authored
      
      The NodeReports API currently used does not work in secure mode since we do not get RM tokens. Instead, this patch just uses environment variables exported by YARN to create the log links.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #7624 from harishreedharan/driver-logs-env and squashes the following commits:
      
      7368c7e [Hari Shreedharan] [SPARK-8988][YARN] Make sure driver log links appear in secure cluster mode.
      c1be9f30
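      YARN exports the container's identity and NodeManager address into the container's environment (`CONTAINER_ID`, `NM_HOST`, `NM_HTTP_PORT`), which is enough to build a log link without the NodeReports API. A sketch, assuming the common NodeManager web UI URL layout:

      ```scala
      // Sketch: assemble a NodeManager container-log URL from YARN-exported
      // environment variables; returns None if any of them is missing.
      def driverLogUrl(env: Map[String, String], user: String): Option[String] =
        for {
          containerId <- env.get("CONTAINER_ID")
          host        <- env.get("NM_HOST")
          port        <- env.get("NM_HTTP_PORT")
        } yield s"http://$host:$port/node/containerlogs/$containerId/$user"
      ```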
  7. Jul 17, 2015
    • Hari Shreedharan's avatar
      [SPARK-8851] [YARN] In Client mode, make sure the client logs in and updates tokens · c043a3e9
      Hari Shreedharan authored
      On the client side, the flow is SparkSubmit -> SparkContext -> yarn/Client. Since the YARN client only gets a cloned config and the staging dir is set there, it is not really possible to do re-logins in the SparkContext. So, do the initial login in SparkSubmit and do re-logins as we do now in the AM; the Client behaves like an executor in this specific context and reads the credentials file to update the tokens. This way, even if the streaming context is started up from a checkpoint it is fine, since we have logged in from SparkSubmit itself.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #7394 from harishreedharan/yarn-client-login and squashes the following commits:
      
      9a2166f [Hari Shreedharan] make it possible to use command line args and config parameters together.
      de08f57 [Hari Shreedharan] Fix import order.
      5c4fa63 [Hari Shreedharan] Add a comment explaining what is being done in YarnClientSchedulerBackend.
      c872caa [Hari Shreedharan] Fix typo in log message.
      2c80540 [Hari Shreedharan] Move token renewal to YarnClientSchedulerBackend.
      0c48ac2 [Hari Shreedharan] Remove direct use of ExecutorDelegationTokenUpdater in Client.
      26f8bfa [Hari Shreedharan] [SPARK-8851][YARN] In Client mode, make sure the client logs in and updates tokens.
      58b1969 [Hari Shreedharan] Simple attempt 1.
      c043a3e9
  8. Jul 16, 2015
  9. Jul 14, 2015
    • Josh Rosen's avatar
      [SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix existing uses · 11e5c372
      Josh Rosen authored
      This pull request adds a Scalastyle regex rule which fails the style check if `Class.forName` is used directly.  `Class.forName` always loads classes from the default / system classloader, but in a majority of cases, we should be using Spark's own `Utils.classForName` instead, which tries to load classes from the current thread's context classloader and falls back to the classloader which loaded Spark when the context classloader is not defined.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7350 from JoshRosen/ban-Class.forName and squashes the following commits:
      
      e3e96f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      c0b7885 [Josh Rosen] Hopefully fix the last two cases
      d707ba7 [Josh Rosen] Fix uses of Class.forName that I missed in my first cleanup pass
      046470d [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      62882ee [Josh Rosen] Fix uses of Class.forName or add exclusion.
      d9abade [Josh Rosen] Add stylechecker rule to ban uses of Class.forName
      11e5c372
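      The replacement helper can be sketched as follows, modeled on the behavior described above (try the thread context classloader first, fall back to the loader that loaded this class):

      ```scala
      // Sketch in the spirit of Spark's Utils.classForName: prefer the current
      // thread's context classloader; Class.forName alone would always use the
      // default/system classloader.
      def classForName(className: String): Class[_] = {
        val loader = Option(Thread.currentThread().getContextClassLoader)
          .getOrElse(getClass.getClassLoader)
        Class.forName(className, true, loader)
      }
      ```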
  10. Jul 10, 2015
    • Jonathan Alter's avatar
      [SPARK-7977] [BUILD] Disallowing println · e14b545d
      Jonathan Alter authored
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7093 from jonalter/SPARK-7977 and squashes the following commits:
      
      ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite
      7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite
      10724b6 [Jonathan Alter] Changing some printlns to logs in tests
      eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0b1dcb4 [Jonathan Alter] More println cleanup
      aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0c16fa3 [Jonathan Alter] Replacing some printlns with logs
      45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      5c8e283 [Jonathan Alter] Allowing println in audit-release examples
      5b50da1 [Jonathan Alter] Allowing printlns in example files
      ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      83ab635 [Jonathan Alter] Fixing new printlns
      54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns
      b837c3a [Jonathan Alter] Disallowing println
      e14b545d
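      Such a ban is typically enforced in `scalastyle-config.xml`; an illustrative rule using scalastyle's `TokenChecker` (the exact rule Spark added may differ):

      ```xml
      <check customId="println" level="error"
             class="org.scalastyle.scalariform.TokenChecker" enabled="true">
        <parameters><parameter name="regex">^println$</parameter></parameters>
        <customMessage>Avoid println; use logging instead.</customMessage>
      </check>
      ```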
  11. Jul 08, 2015
  12. Jul 02, 2015
    • huangzhaowei's avatar
      [SPARK-8687] [YARN] Fix bug: Executor can't fetch the new set configuration in yarn-client · 1b0c8e61
      huangzhaowei authored
      Spark snapshots the properties in CoarseGrainedSchedulerBackend.start:
      ```scala
          // TODO (prashant) send conf instead of properties
          driverEndpoint = rpcEnv.setupEndpoint(
            CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties))
      ```
      Then the YARN logic sets some configuration, but the updates are not reflected in this `properties` snapshot, so `Executor` never receives them.
      
      [Jira](https://issues.apache.org/jira/browse/SPARK-8687)
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #7066 from SaintBacchus/SPARK-8687 and squashes the following commits:
      
      1de4f48 [huangzhaowei] Ensure all necessary properties have already been set before startup ExecutorLaucher
      1b0c8e61
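      The bug's shape can be illustrated in a few lines of plain Scala (not Spark code): a snapshot taken at start() cannot see keys set afterwards.

      ```scala
      // Tiny illustration: configuration set after the snapshot is taken never
      // reaches consumers of the snapshot.
      val conf = scala.collection.mutable.Map("spark.app.name" -> "demo")
      val snapshot = conf.toMap                          // what start() captured
      conf("spark.yarn.extra.setting") = "set-too-late"  // later YARN-side update
      // snapshot still lacks "spark.yarn.extra.setting", so executors never see it
      ```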
    • Ilya Ganelin's avatar
      [SPARK-3071] Increase default driver memory · 3697232b
      Ilya Ganelin authored
      I've updated default values in comments, documentation, and in the command line builder to be 1g based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts etc.) but usage in most code is now pointing at the same place.
      
      Please let me know if I've missed anything.
      
      Will the spark-shell use the value within the command line builder during instantiation?
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #7132 from ilganeli/SPARK-3071 and squashes the following commits:
      
      4074164 [Ilya Ganelin] String fix
      271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071
      273b6e9 [Ilya Ganelin] Test fix
      fd67721 [Ilya Ganelin] Update JavaUtils.java
      26cc177 [Ilya Ganelin] test fix
      e5db35d [Ilya Ganelin] Fixed test failure
      39732a1 [Ilya Ganelin] merge fix
      a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each
      09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala
      19b6f25 [Ilya Ganelin] Missed one doc update
      2698a3d [Ilya Ganelin] Updated default value for driver memory
      3697232b
    • huangzhaowei's avatar
      [SPARK-8688] [YARN] Bug fix: disable the cache fs to gain the HDFS connection. · 646366b5
      huangzhaowei authored
      If `fs.hdfs.impl.disable.cache` is `false` (the default), `FileSystem` will return the cached `DFSClient`, which uses the old token.
      [AMDelegationTokenRenewer](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala#L196)
      ```scala
          val credentials = UserGroupInformation.getCurrentUser.getCredentials
          credentials.writeTokenStorageFile(tempTokenPath, discachedConfiguration)
      ```
      Although `credentials` holds the new token, the cached client still uses the old one. So it's better to set `fs.hdfs.impl.disable.cache` to `true` to avoid expired tokens.
      
      [Jira](https://issues.apache.org/jira/browse/SPARK-8688)
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #7069 from SaintBacchus/SPARK-8688 and squashes the following commits:
      
      f94cd0b [huangzhaowei] modify function parameter
      8fb9eb9 [huangzhaowei] explicit  the comment
      0cd55c9 [huangzhaowei] Rename function name to be an accurate one
      cf776a1 [huangzhaowei] [SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection.
      646366b5
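      The workaround corresponds to a Hadoop configuration entry like the following (standard `core-site.xml`/`hdfs-site.xml` property syntax):

      ```xml
      <property>
        <name>fs.hdfs.impl.disable.cache</name>
        <value>true</value>
        <!-- FileSystem.get now returns a fresh instance instead of a cached one,
             so newly renewed delegation tokens are picked up -->
      </property>
      ```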
    • Devaraj K's avatar
      [SPARK-8754] [YARN] YarnClientSchedulerBackend doesn't stop gracefully in failure conditions · 792fcd80
      Devaraj K authored
      In YarnClientSchedulerBackend.stop(), added a check for monitorThread.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #7153 from devaraj-kavali/master and squashes the following commits:
      
      66be9ad [Devaraj K] https://issues.apache.org/jira/browse/SPARK-8754 YarnClientSchedulerBackend doesn't stop gracefully in failure conditions
      792fcd80
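      The guard can be sketched as follows (class structure assumed; only the null check matters): if start-up failed before the monitor thread was created, stop() must not dereference it.

      ```scala
      // Sketch: tolerate a monitor thread that was never created because
      // start-up failed early; shutdown proceeds either way.
      class Backend(var monitorThread: Thread) {
        def stop(): Unit = {
          if (monitorThread != null) {
            monitorThread.interrupt()
          }
          // ... rest of shutdown proceeds either way ...
        }
      }
      ```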
  13. Jun 28, 2015
    • Josh Rosen's avatar
      [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all · f5100451
      Josh Rosen authored
      Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
      
      See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
      
      70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
      f5100451
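      In Maven terms the change is swapping the bundled artifact for the one that declares Hamcrest and Objenesis as ordinary dependencies; a sketch of the dependency entry (the version shown is illustrative):

      ```xml
      <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-core</artifactId>
        <version>1.10.19</version>
        <scope>test</scope>
      </dependency>
      ```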
  14. Jun 26, 2015
    • Marcelo Vanzin's avatar
      [SPARK-8302] Support heterogeneous cluster install paths on YARN. · 37bf76a2
      Marcelo Vanzin authored
      Some users have Hadoop installations on different paths across
      their cluster. Currently, that makes it hard to set up some
      configuration in Spark since that requires hardcoding paths to
      jar files or native libraries, which wouldn't work on such a cluster.
      
      This change introduces a couple of YARN-specific configurations
      that instruct the backend to replace certain paths when launching
      remote processes. That way, if the configuration says the Spark
      jar is in "/spark/spark.jar", and also says that "/spark" should be
      replaced with "{{SPARK_INSTALL_DIR}}", YARN will start containers
      in the NMs with "{{SPARK_INSTALL_DIR}}/spark.jar" as the location
      of the jar.
      
      Coupled with YARN's environment whitelist (which allows certain
      env variables to be exposed to containers), this allows users to
      support such heterogeneous environments, as long as a single
      replacement is enough. (Otherwise, this feature would need to be
      extended to support multiple path replacements.)
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6752 from vanzin/SPARK-8302 and squashes the following commits:
      
      4bff8d4 [Marcelo Vanzin] Add docs, rename configs.
      0aa2a02 [Marcelo Vanzin] Only do replacement for paths that need it.
      2e9cc9d [Marcelo Vanzin] Style.
      a5e1f68 [Marcelo Vanzin] [SPARK-8302] Support heterogeneous cluster install paths on YARN.
      37bf76a2
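      Configured, the example above would look something like this in spark-defaults.conf (the property names here are illustrative; the commit notes the configs were renamed during review, so check the shipped docs for the final names):

      ```
      spark.yarn.config.gatewayPath      /spark
      spark.yarn.config.replacementPath  {{SPARK_INSTALL_DIR}}
      ```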
  15. Jun 19, 2015
    • Carson Wang's avatar
      [SPARK-8387] [FOLLOWUP ] [WEBUI] Update driver log URL to show only 4096 bytes · 54557f35
      Carson Wang authored
      This is a follow-up to #6834, updating the driver log URL as well for consistency.
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #6878 from carsonwang/logUrl and squashes the following commits:
      
      13be948 [Carson Wang] update log URL in YarnClusterSuite
      a0004f4 [Carson Wang] Update driver log URL to show only 4096 bytes
      54557f35
  16. Jun 16, 2015
  17. Jun 10, 2015
    • WangTaoTheTonic's avatar
      [SPARK-8273] Driver hangs up when yarn shutdown in client mode · 5014d0ed
      WangTaoTheTonic authored
      In client mode, if YARN is shut down while a Spark application is running, the application will hang after several retries (default: 30) because the exception thrown by YarnClientImpl cannot be caught at the upper level; we should exit in that case so the user can be aware of it.
      
      The exception we want to catch is [here](https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java#L122), and the fix follows the approach used in [MR](https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java#L320).
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #6717 from WangTaoTheTonic/SPARK-8273 and squashes the following commits:
      
      28752d6 [WangTaoTheTonic] catch the throwed exception
      5014d0ed
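      The shape of the fix can be sketched as wrapping the report-polling loop so a fatal client exception terminates the application instead of leaving it hanging (names are illustrative, not Spark's actual code):

      ```scala
      // Sketch: any exception escaping the polling loop is routed to a fatal
      // handler (which would log and exit) rather than silently swallowed.
      def monitorLoop(poll: () => Unit, onFatal: Throwable => Unit): Unit =
        try {
          while (true) poll()
        } catch {
          case e: Throwable => onFatal(e)
        }
      ```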
    • Marcelo Vanzin's avatar
      [SPARK-5479] [YARN] Handle --py-files correctly in YARN. · 38112905
      Marcelo Vanzin authored
      The bug description is a little misleading: the actual issue is that
      .py files are not handled correctly when distributed by YARN. They're
      added to "spark.submit.pyFiles", which, when processed by context.py,
      explicitly whitelists certain extensions (see PACKAGE_EXTENSIONS),
      and that does not include .py files.
      
      On top of that, archives were not handled at all! They made it to the
      driver's python path, but never made it to executors, since the mechanism
      used to propagate their location (spark.submit.pyFiles) only works on
      the driver side.
      
      So, instead, ignore "spark.submit.pyFiles" and just build PYTHONPATH
      correctly for both driver and executors. Individual .py files are
      placed in a subdirectory of the container's local dir in the cluster,
      which is then added to the python path. Archives are added directly.
      
      The change, as a side effect, ends up solving the symptom described
      in the bug. The issue was not that the files were not being distributed,
      but that they were never made visible to the python application
      running under Spark.
      
      Also included is a proper unit test for running python on YARN, which
      broke in several different ways with the previous code.
      
      A short walkthrough of the changes:
      - SparkSubmit does not try to be smart about how YARN handles python
        files anymore. It just passes down the configs to the YARN client
        code.
      - The YARN client distributes python files and archives differently,
        placing the files in a subdirectory.
      - The YARN client now sets PYTHONPATH for the processes it launches;
        to properly handle different locations, it uses YARN's support for
        embedding env variables, so to avoid YARN expanding those at the
        wrong time, SparkConf is now propagated to the AM using a conf file
        instead of command line options.
      - Because the Client initialization code is a maze of implicit
        dependencies, some code needed to be moved around to make sure
        all needed state was available when the code ran.
      - The pyspark tests in YarnClusterSuite now actually distribute and try
        to use both a python file and an archive containing a different python
        module. Also added a yarn-client tests for completeness.
      - I cleaned up some of the code around distributing files to YARN, to
        avoid adding more copied & pasted code to handle the new files being
        distributed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6360 from vanzin/SPARK-5479 and squashes the following commits:
      
      bcaf7e6 [Marcelo Vanzin] Feedback.
      c47501f [Marcelo Vanzin] Fix yarn-client mode.
      46b1d0c [Marcelo Vanzin] Merge branch 'master' into SPARK-5479
      c743778 [Marcelo Vanzin] Only pyspark cares about python archives.
      c8e5a82 [Marcelo Vanzin] Actually run pyspark in client mode.
      705571d [Marcelo Vanzin] Move some code to the YARN module.
      1dd4d0c [Marcelo Vanzin] Review feedback.
      71ee736 [Marcelo Vanzin] Merge branch 'master' into SPARK-5479
      220358b [Marcelo Vanzin] Scalastyle.
      cdbb990 [Marcelo Vanzin] Merge branch 'master' into SPARK-5479
      7fe3cd4 [Marcelo Vanzin] No need to distribute primary file to executors.
      09045f1 [Marcelo Vanzin] Style.
      943cbf4 [Marcelo Vanzin] [SPARK-5479] [yarn] Handle --py-files correctly in YARN.
      38112905
  18. Jun 08, 2015
    • linweizhong's avatar
      [SPARK-7705] [YARN] Cleanup of .sparkStaging directory fails if application is killed · eacd4a92
      linweizhong authored
      As I have tested, if we cancel or kill the app then the final status may be undefined, killed, or succeeded, so clean up the staging directory when the ApplicationMaster exits, regardless of the final application status.
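The fix amounts to moving the cleanup into a path that runs for every final status. A minimal sketch of the idea (not Spark's actual code, which removes the HDFS `.sparkStaging` directory from the ApplicationMaster's shutdown path):

```python
import shutil

def run_and_cleanup(staging_dir, app, delete=shutil.rmtree):
    """Run the application body, then delete the staging directory in a
    finally block, so cleanup happens whether the final status ends up
    SUCCEEDED, FAILED, KILLED, or UNDEFINED."""
    try:
        return app()
    finally:
        delete(staging_dir, ignore_errors=True)
```

The `delete` parameter is only there so the behavior is easy to observe; the point is the `finally` block.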
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #6409 from Sephiroth-Lin/SPARK-7705 and squashes the following commits:
      
      3a5a0a5 [linweizhong] Update
      83dc274 [linweizhong] Update
      923d44d [linweizhong] Update
      0dd7c2d [linweizhong] Update
      b76a102 [linweizhong] Update code style
      7846b69 [linweizhong] Update
      bd6cf0d [linweizhong] Refactor
      aed9f18 [linweizhong] Clean up stagingDir when launch app on yarn
      95595c3 [linweizhong] Cleanup of .sparkStaging directory when AppMaster exit at any final application status
      eacd4a92
  19. Jun 06, 2015
    • Hari Shreedharan's avatar
      [SPARK-8136] [YARN] Fix flakiness in YarnClusterSuite. · ed2cc3ee
      Hari Shreedharan authored
      Instead of actually downloading the logs, just verify that the log link is a valid
      URL in the expected format.
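The simplified check boils down to matching the link against the expected NodeManager container-log URL shape. The pattern below is a hypothetical approximation for illustration, not the suite's actual regex:

```python
import re

# Rough shape of a YARN NodeManager container-log URL (assumed, for illustration).
LOG_URL_RE = re.compile(r"https?://[^/\s]+/node/containerlogs/container_\w+/\S+")

def looks_like_container_log_url(url):
    """True when the string parses as a container-log URL of the expected
    form, without fetching it."""
    return LOG_URL_RE.fullmatch(url) is not None
```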
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6680 from harishreedharan/simplify-am-log-tests and squashes the following commits:
      
      3183aeb [Hari Shreedharan] Remove check for hostname which can fail on machines with several hostnames. Removed some unused imports.
      50d69a7 [Hari Shreedharan] [SPARK-8136][YARN] Fix flakiness in YarnClusterSuite.
      ed2cc3ee
  20. Jun 03, 2015
    • zsxwing's avatar
      [SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout · 1d8669f1
      zsxwing authored
      Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.
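The shape of the change can be sketched like this; a hypothetical stand-in for the real Scala method, which raises instead of returning a boolean the caller might forget to assert on:

```python
import time

def wait_until_empty(is_empty, timeout_seconds):
    """Block until the listener queue drains; raise TimeoutError on timeout
    rather than returning a boolean that callers can silently ignore."""
    deadline = time.monotonic() + timeout_seconds
    while not is_empty():
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"The event queue is not empty after {timeout_seconds} seconds")
        time.sleep(0.01)
```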
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:
      
      607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
      1d8669f1
    • Marcelo Vanzin's avatar
      [SPARK-8059] [YARN] Wake up allocation thread when new requests arrive. · aa40c442
      Marcelo Vanzin authored
      This should help reduce latency for new executor allocations.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6600 from vanzin/SPARK-8059 and squashes the following commits:
      
      8387a3a [Marcelo Vanzin] [SPARK-8059] [yarn] Wake up allocation thread when new requests arrive.
      aa40c442
    • Patrick Wendell's avatar
      [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
      2c4d550e
  21. May 31, 2015
    • Reynold Xin's avatar
      [SPARK-3850] Trim trailing spaces for examples/streaming/yarn. · 564bc11e
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits:
      
      7b7b3a0 [Reynold Xin] Reset again.
      dc14597 [Reynold Xin] Reset scalastyle.
      cd556c4 [Reynold Xin] YARN, Kinesis, Flume.
      4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.
      564bc11e
  22. May 29, 2015
    • Andrew Or's avatar
      [SPARK-7558] Demarcate tests in unit-tests.log · 9eb222c1
      Andrew Or authored
      Right now `unit-tests.log` is not of much value because we can't easily tell where the test boundaries are. This patch adds log statements before and after each test to outline the test boundaries, e.g.:
      
      ```
      ===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' =====
      
      15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents
      
      ...
      
      15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s
      15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite:
      
      ***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' *****
      
      ...
      ```
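A minimal sketch of the demarcation idea; the actual patch hooks this into a base test-suite class in Scala, and the names here are illustrative:

```python
def start_banner(suite, name):
    return f"===== TEST OUTPUT FOR {suite}: '{name}' ====="

def end_banner(suite, name):
    return f"***** FINISHED {suite}: '{name}' *****"

def run_demarcated(suite, name, body):
    """Print banners before and after the test body; the finally block
    makes the closing banner appear even when the test fails."""
    print("\n" + start_banner(suite, name) + "\n")
    try:
        body()
    finally:
        print("\n" + end_banner(suite, name) + "\n")
```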
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6441 from andrewor14/demarcate-tests and squashes the following commits:
      
      879b060 [Andrew Or] Fix compile after rebase
      d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      7790b6c [Andrew Or] Fix tests after logical merge conflict
      c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      c43ffc4 [Andrew Or] Fix tests?
      8882581 [Andrew Or] Fix tests
      ee22cda [Andrew Or] Fix log message
      fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      12d1e1b [Andrew Or] Various whitespace changes (minor)
      69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
      bbce12e [Andrew Or] Fix manual things that cannot be covered through automation
      da0b12f [Andrew Or] Add core tests as dependencies in all modules
      f7d29ce [Andrew Or] Introduce base abstract class for all test suites
      9eb222c1
    • WangTaoTheTonic's avatar
      [SPARK-7524] [SPARK-7846] add configs for keytab and principal, pass these two... · a51b133d
      WangTaoTheTonic authored
      [SPARK-7524] [SPARK-7846] add configs for keytab and principal, pass these two configs with different way in different modes
      
      * Spark now supports long-running services by updating tokens for the NameNode, but it only accepts parameters passed in "--k=v" format, which is not very convenient. This patch adds spark.* configs read from the properties file and system properties.
      
      *  --principal and --keytab options are passed to the client, but when we start the thrift server or spark-shell these two are also passed into the Main class (org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 and org.apache.spark.repl.Main).
      In these two main classes, the arguments passed in are processed by third-party libraries, which leads to errors such as "Invalid option: --principal" or "Unrecognised option: --principal".
      We should pass these command args in a different form, say system properties.
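The resolution order the commit describes — an explicit CLI flag wins, otherwise fall back to a spark.* property — can be sketched as follows (the helper is hypothetical; the config keys mirror the commit's intent):

```python
def resolve_credential(cli_value, conf, key):
    """Prefer the value passed via --principal/--keytab on the command
    line; otherwise fall back to the spark.* config entry, which may come
    from the properties file or a system property."""
    return cli_value if cli_value is not None else conf.get(key)
```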
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #6051 from WangTaoTheTonic/SPARK-7524 and squashes the following commits:
      
      e65699a [WangTaoTheTonic] change logic to loadEnvironments
      ebd9ea0 [WangTaoTheTonic] merge master
      ecfe43a [WangTaoTheTonic] pass keytab and principal seperately in different mode
      33a7f40 [WangTaoTheTonic] expand the use of the current configs
      08bb4e8 [WangTaoTheTonic] fix wrong cite
      73afa64 [WangTaoTheTonic] add configs for keytab and principal, move originals to internal
      a51b133d
    • Reynold Xin's avatar
      [SPARK-7929] Turn whitespace checker on for more token types. · 97a60cf7
      Reynold Xin authored
      This is the last batch of changes to complete SPARK-7929.
      
      Previous related PRs:
      https://github.com/apache/spark/pull/6480
      https://github.com/apache/spark/pull/6478
      https://github.com/apache/spark/pull/6477
      https://github.com/apache/spark/pull/6476
      https://github.com/apache/spark/pull/6475
      https://github.com/apache/spark/pull/6474
      https://github.com/apache/spark/pull/6473
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6487 from rxin/whitespace-lint and squashes the following commits:
      
      b33d43d [Reynold Xin] [SPARK-7929] Turn whitespace checker on for more token types.
      97a60cf7
  23. May 26, 2015
    • zsxwing's avatar
      [SPARK-6602] [CORE] Remove some places in core that calling SparkEnv.actorSystem · 9f742241
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6333 from zsxwing/remove-actor-system-usage and squashes the following commits:
      
      f125aa6 [zsxwing] Fix YarnAllocatorSuite
      ceadcf6 [zsxwing] Change the "port" parameter type of "AkkaUtils.address" to "int"; update ApplicationMaster and YarnAllocator to get the driverUrl from RpcEnv
      3239380 [zsxwing] Remove some places in core that calling SparkEnv.actorSystem
      9f742241
  24. May 21, 2015
    • Hari Shreedharan's avatar
      [SPARK-7657] [YARN] Add driver logs links in application UI, in cluster mode. · 956c4c91
      Hari Shreedharan authored
      This PR adds the URLs to the driver logs to `SparkListenerApplicationStarted` event, which is later used by the `ExecutorsListener` to populate the URLs to the driver logs in its own state. This info is then used when the UI is rendered to display links to the logs.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6166 from harishreedharan/am-log-link and squashes the following commits:
      
      943fc4f [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into am-log-link
      9e5c04b [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into am-log-link
      b3f9b9d [Hari Shreedharan] Updated comment based on feedback.
      0840a95 [Hari Shreedharan] Move the result and sc.stop back to original location, minor import changes.
      537a2f7 [Hari Shreedharan] Add test to ensure the log urls are populated and valid.
      4033725 [Hari Shreedharan] Adding comments explaining how node reports are used to get the log urls.
      6c5c285 [Hari Shreedharan] Import order.
      346f4ea [Hari Shreedharan] Review feedback fixes.
      629c1dc [Hari Shreedharan] Cleanup.
      99fb1a3 [Hari Shreedharan] Send the log urls in App start event, to ensure that other listeners are not affected.
      c0de336 [Hari Shreedharan] Ensure new unit test cleans up after itself.
      50cdae3 [Hari Shreedharan] Added unit test, made the approach generic.
      402e8e4 [Hari Shreedharan] Use `NodeReport` to get the URL for the logs. Also, make the environment variables generic so other cluster managers can use them as well.
      1cf338f [Hari Shreedharan] [SPARK-7657][YARN] Add driver link in application UI, in cluster mode.
      956c4c91
    • Andrew Or's avatar
      [SPARK-7775] YARN AM negative sleep exception · 15680aee
      Andrew Or authored
      ```
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      Exception in thread "Reporter" java.lang.IllegalArgumentException: timeout value is negative
        at java.lang.Thread.sleep(Native Method)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:356)
      ```
      This kills the reporter thread. It is caused by #6082 (merged into the master branch only).
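The fix clamps the computed sleep so it can never go negative, reusing the existing cap. A sketch of the arithmetic:

```python
def reporter_sleep_interval(next_allocation_time, now, cap):
    """Time the reporter thread should sleep before its next heartbeat.

    If the next allocation time is already in the past, sleep 0 instead of
    passing a negative value to sleep(); cap bounds the wait.
    """
    return min(cap, max(0, next_allocation_time - now))
```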
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6305 from andrewor14/yarn-negative-sleep and squashes the following commits:
      
      b970770 [Andrew Or] Use existing cap
      56d6e5e [Andrew Or] Avoid negative sleep
      15680aee
  25. May 20, 2015
    • ehnalis's avatar
      [SPARK-7533] [YARN] Decrease spacing between AM-RM heartbeats. · 3ddf051e
      ehnalis authored
      Added faster RM-heartbeats on pending container allocations with multiplicative back-off.
      Also updated the related documentation.
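The multiplicative back-off can be sketched as a pure function; the interval values and the function name are illustrative, not the patch's actual configuration:

```python
def next_heartbeat_interval(current, has_pending, regular_interval):
    """While container requests are pending, heartbeat quickly and double
    the interval each round (multiplicative back-off), never exceeding the
    regular interval; with nothing pending, use the regular interval."""
    if not has_pending:
        return regular_interval
    return min(regular_interval, current * 2)
```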
      
      Author: ehnalis <zoltan.zvara@gmail.com>
      
      Closes #6082 from ehnalis/yarn and squashes the following commits:
      
      a1d2101 [ehnalis] MIss-spell fixed.
      90f8ba4 [ehnalis] Changed default HB values.
      6120295 [ehnalis] Removed the bug, when allocation heartbeat would not start from initial value.
      08bac63 [ehnalis] Refined style, grammar, removed duplicated code.
      073d283 [ehnalis] [SPARK-7533] [YARN] Decrease spacing between AM-RM heartbeats.
      d4408c9 [ehnalis] [SPARK-7533] [YARN] Decrease spacing between AM-RM heartbeats.
      3ddf051e
  26. May 15, 2015
    • Kousuke Saruta's avatar
      [SPARK-7503] [YARN] Resources in .sparkStaging directory can't be cleaned up on error · c64ff803
      Kousuke Saruta authored
      When we run applications on YARN in cluster mode, resources uploaded to the .sparkStaging directory can't be cleaned up if uploading the local resources fails.
      
      You can see this issue by running the following command.
      ```
      bin/spark-submit --master yarn --deploy-mode cluster --class <someClassName> <non-existing-jar>
      ```
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #6026 from sarutak/delete-uploaded-resources-on-error and squashes the following commits:
      
      caef9f4 [Kousuke Saruta] Fixed style
      882f921 [Kousuke Saruta] Wrapped Client#submitApplication with try/catch blocks in order to delete resources on error
      1786ca4 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into delete-uploaded-resources-on-error
      f61071b [Kousuke Saruta] Fixed cleanup problem
      c64ff803
  27. May 14, 2015
    • FavioVazquez's avatar
      [SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions · 7fb715de
      FavioVazquez authored
      Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons.
      
      Changes proposed by vanzin resulting from the previous pull request https://github.com/apache/spark/pull/5783, which did not fix the problem correctly.
      
      Please let me know if this is the correct way of doing this, the comments of vanzin are in the pull-request mentioned.
      
      Author: FavioVazquez <favio.vazquezp@gmail.com>
      
      Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:
      
      11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh
      379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior
      3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
      31bdafa [FavioVazquez] - Added missing instances in -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation
      cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about  hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies
      83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml
      93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM
      668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM
      fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh  due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml
      0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in the create-release.sh no that the default hadoop version is the 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in the hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
      a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
      199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default .Added comment in the yarn/pom.xml to specify that.
      88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the make-distribution.sh file
      70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
      287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
      1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
      6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
      7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      ec91ce3 [FavioVazquez] - Updated protobuf-java version of com.google.protobuf dependancy to fix blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for 2.5.0-cdh5.3.3 version)
      7fb715de
  28. May 11, 2015
    • Sandy Ryza's avatar
      [SPARK-6470] [YARN] Add support for YARN node labels. · 82fee9d9
      Sandy Ryza authored
      This is difficult to write a test for because it relies on the latest version of YARN, but I verified manually that the patch does pass along the label expression on this version and containers are successfully launched.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #5242 from sryza/sandy-spark-6470 and squashes the following commits:
      
      6af87b9 [Sandy Ryza] Change info to warning
      6e22d99 [Sandy Ryza] [YARN] SPARK-6470.  Add support for YARN node labels.
      82fee9d9
  29. May 08, 2015
    • Ashwin Shankar's avatar
      [SPARK-7451] [YARN] Preemption of executors is counted as failure causing Spark job to fail · b6c797b0
      Ashwin Shankar authored
      Added a check to handle the container exit status in the preemption scenario: log an INFO message in such cases and move on.
      andrewor14
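The check amounts to treating one more exit status as non-fatal. A sketch, assuming YARN's `ContainerExitStatus.PREEMPTED` constant is -102 (an assumption here, taken from the YARN API):

```python
# Assumed values; in YARN these come from
# org.apache.hadoop.yarn.api.records.ContainerExitStatus.
SUCCESS = 0
PREEMPTED = -102

def counts_as_failure(exit_status):
    """Preempted executors are logged at INFO and skipped, instead of being
    counted toward the executor-failure limit that can fail the job."""
    return exit_status not in (SUCCESS, PREEMPTED)
```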
      
      Author: Ashwin Shankar <ashankar@netflix.com>
      
      Closes #5993 from ashwinshankar77/SPARK-7451 and squashes the following commits:
      
      90900cf [Ashwin Shankar] Fix log info message
      cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
      b6c797b0