  1. Apr 25, 2017
  2. Apr 14, 2017
  3. Apr 04, 2017
    • [SPARK-20191][YARN] Create wrapper for RackResolver so tests can override it. · 00c12488
      Marcelo Vanzin authored
      
      Current test code tries to override the RackResolver in use by setting
      configuration params, but because the YARN libs statically initialize the
      resolver the first time it's used, those configs never really take effect
      during Spark tests.
      
      This change adds a wrapper class that easily allows tests to override the
      behavior of the resolver for the Spark code that uses it.
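
      A minimal sketch of the wrapper idea, assuming Hadoop's `org.apache.hadoop.yarn.util.RackResolver` API; the class and method names below are illustrative, not the exact ones added by this patch:

      ```scala
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.yarn.util.RackResolver

      // Production code goes through this wrapper instead of calling RackResolver directly,
      // so a test can subclass it and stub out resolve() without fighting YARN's static init.
      class SparkRackResolverSketch {
        def resolve(conf: Configuration, hostName: String): String =
          RackResolver.resolve(conf, hostName).getNetworkLocation
      }
      ```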
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #17508 from vanzin/SPARK-20191.
      
      (cherry picked from commit 0736980f)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
  4. Mar 29, 2017
    • [SPARK-20059][YARN] Use the correct classloader for HBaseCredentialProvider · 103ff54d
      jerryshao authored
      
      ## What changes were proposed in this pull request?
      
      Currently we use the system classloader to find HBase jars; if they are specified by `--jars`, the lookup fails with a ClassNotFound issue. So here we change to use the child classloader.
      
      This also puts the added jars and the main jar into the classpath of the submitted application in yarn cluster mode; otherwise HBase jars specified with `--jars` would never be honored in cluster mode, and fetching tokens on the client side would always fail.
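
      A hedged illustration of the classloader change (not the exact patch code): resolve HBase classes through the thread's context classloader, which can see jars added via `--jars`, falling back to this class's own loader.

      ```scala
      object HBaseClassLoading {
        // Look up an HBase class via the context (child) classloader rather than the
        // system classloader, so jars added with --jars are visible to the lookup.
        def loadHBaseConfiguration(): Class[_] = {
          val loader = Option(Thread.currentThread().getContextClassLoader)
            .getOrElse(getClass.getClassLoader)
          Class.forName("org.apache.hadoop.hbase.HBaseConfiguration", true, loader)
        }
      }
      ```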
      
      ## How was this patch tested?
      
      Unit test and local verification.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17388 from jerryshao/SPARK-20059.
      
      (cherry picked from commit c622a87c)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
  5. Mar 28, 2017
    • 4964dbed
      Patrick Wendell authored
    • Preparing Spark release v2.1.1-rc2 · 02b165dc
      Patrick Wendell authored
    • [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode · 4bcb7d67
      jerryshao authored
      
      ## What changes were proposed in this pull request?
      
      In the current Spark on YARN code, we obtain tokens from the configured services but do not add these tokens to the current user's credentials. This makes all subsequent operations against these services still require a TGT rather than delegation tokens. That is unnecessary since we already have the tokens, and it also leads to failures in the user impersonation scenario, because the TGT is granted to the real user, not the proxy user.
      
      So here we change to add all the tokens to the current UGI, so that subsequent operations against these services honor tokens rather than the TGT; this also handles the proxy user issue mentioned above.
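
      A minimal sketch of the pattern, assuming Hadoop's `UserGroupInformation` API (this shows the general idea, not the exact code in the patch):

      ```scala
      import org.apache.hadoop.security.{Credentials, UserGroupInformation}

      // Merge freshly obtained delegation tokens into the current UGI so that subsequent
      // calls to the secured services pick up the tokens instead of requiring a TGT.
      def registerTokens(obtained: Credentials): Unit = {
        UserGroupInformation.getCurrentUser.addCredentials(obtained)
      }
      ```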
      
      ## How was this patch tested?
      
      Verified locally in a secure cluster.
      
      vanzin tgravescs mridulm dongjoon-hyun please help to review, thanks a lot.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17335 from jerryshao/SPARK-19995.
      
      (cherry picked from commit 17eddb35)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
  6. Mar 21, 2017
  7. Mar 07, 2017
  8. Feb 24, 2017
    • [SPARK-19038][YARN] Avoid overwriting keytab configuration in yarn-client · ed9aaa31
      jerryshao authored
      
      ## What changes were proposed in this pull request?
      
      Because yarn#client resets the `spark.yarn.keytab` configuration to point to the location in the distributed cache, if the user still uses the old `SparkConf` to create a `SparkSession` with Hive enabled, it will read the keytab from the distributed cache path. This is OK for yarn cluster mode, but in yarn client mode, where the driver runs outside the container, fetching the keytab will fail.
      
      So here we avoid resetting this configuration in `yarn#client` and only overwrite it for the AM, so that `spark.yarn.keytab` yields the correct keytab path whether running in client mode (keytab on the local fs) or cluster mode (keytab in the distributed cache).
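
      A conceptual sketch of that behavior; `spark.yarn.keytab` is the real config key, but the AM-side key name used below is hypothetical:

      ```scala
      import java.io.File
      import org.apache.spark.SparkConf

      val conf = new SparkConf().set("spark.yarn.keytab", "/home/user/app.keytab")

      // The driver keeps the user-supplied local path; only the copy of the config shipped
      // to the AM is rewritten to the file name staged in the distributed cache.
      val distCacheName = new File(conf.get("spark.yarn.keytab")).getName
      val amConf = conf.clone.set("spark.yarn.am.keytabFileName", distCacheName) // hypothetical key
      ```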
      
      ## How was this patch tested?
      
      Verified in a secure cluster.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #16923 from jerryshao/SPARK-19038.
      
      (cherry picked from commit a920a436)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
  9. Feb 21, 2017
    • [SPARK-19626][YARN] Using the correct config to set credentials update time · 6edf02a8
      Kent Yao authored
      ## What changes were proposed in this pull request?
      
      In https://github.com/apache/spark/pull/14065, we introduced a configurable credential manager for Spark running on YARN. Two configs, `spark.yarn.credentials.renewalTime` and `spark.yarn.credentials.updateTime`, were also added: one is for the credential renewer and the other for the updater. But currently we query `spark.yarn.credentials.renewalTime` by mistake during credentials updating, where it should actually be `spark.yarn.credentials.updateTime`.
      
      This PR fixes this mistake.
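
      A small illustration of the distinction, using the two config keys named above (the values are arbitrary):

      ```scala
      import org.apache.spark.SparkConf

      val conf = new SparkConf()
        .set("spark.yarn.credentials.renewalTime", "86400000") // read by the credential renewer
        .set("spark.yarn.credentials.updateTime", "3600000")   // read by the credential updater

      // The credential updater must schedule itself from updateTime, not renewalTime.
      val updateIntervalMs = conf.getLong("spark.yarn.credentials.updateTime", Long.MaxValue)
      ```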
      
      ## How was this patch tested?
      
      existing test
      
      cc jerryshao vanzin
      
      Author: Kent Yao <yaooqinn@hotmail.com>
      
      Closes #16955 from yaooqinn/cred_update.
      
      (cherry picked from commit 7363dde6)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
  10. Feb 14, 2017
  11. Jan 25, 2017
  12. Jan 02, 2017
  13. Dec 22, 2016
  14. Dec 15, 2016
  15. Dec 13, 2016
    • [SPARK-18840][YARN] Avoid throwing an exception when getting the token renewal interval in a non-HDFS security environment · d5c4a5d0
      jerryshao authored
      
      ## What changes were proposed in this pull request?
      
      Fix a `java.util.NoSuchElementException` thrown when running Spark in a non-HDFS security environment.
      
      In the current code, we assume an `HDFS_DELEGATION_KIND` token will be found in the Credentials. But in some cloud environments HDFS is not required, so we should avoid this exception.
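
      A hedged sketch of the defensive lookup, assuming Hadoop's `Credentials` and HDFS `DelegationTokenIdentifier` APIs (the real patch differs in detail):

      ```scala
      import scala.collection.JavaConverters._
      import org.apache.hadoop.security.Credentials
      import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier

      // Return the HDFS delegation token only if one is present, instead of assuming it
      // exists and hitting NoSuchElementException in HDFS-less deployments.
      def hdfsToken(creds: Credentials) =
        creds.getAllTokens.asScala
          .find(_.getKind == DelegationTokenIdentifier.HDFS_DELEGATION_KIND)
      ```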
      
      ## How was this patch tested?
      
      Manually verified in local environment.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #16265 from jerryshao/SPARK-18840.
      
      (cherry picked from commit 43298d15)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
  16. Dec 08, 2016
  17. Nov 28, 2016
    • [SPARK-18547][CORE] Propagate I/O encryption key when executors register. · c4cbdc86
      Marcelo Vanzin authored
      
      This change modifies the method used to propagate encryption keys used during
      shuffle. Instead of relying on YARN's UserGroupInformation credential propagation,
      this change explicitly distributes the key using the messages exchanged between
      driver and executor during registration. When RPC encryption is enabled, this means
      key propagation is also secure.
      
      This allows shuffle encryption to work in non-YARN mode, which means that it's
      easier to write unit tests for areas of the code that are affected by the feature.
      
      The key is stored in the SecurityManager; because there are many instances of
      that class used in the code, the key is only guaranteed to exist in the instance
      managed by the SparkEnv. This path was chosen to avoid storing the key in the
      SparkConf, which would risk having the key being written to disk as part of the
      configuration (as, for example, is done when starting YARN applications).
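
      A minimal sketch of the idea with illustrative names (not Spark's actual internals): the driver generates the key once and hands it to each executor in the registration reply.

      ```scala
      import javax.crypto.KeyGenerator

      // Hypothetical registration reply carrying the key; with RPC encryption enabled,
      // this message is itself sent over an encrypted channel.
      case class RegisteredExecutorSketch(ioEncryptionKey: Option[Array[Byte]])

      def createIoEncryptionKey(keySizeBits: Int = 128): Array[Byte] = {
        val keyGen = KeyGenerator.getInstance("AES")
        keyGen.init(keySizeBits)
        keyGen.generateKey().getEncoded
      }
      ```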
      
      Tested by new and existing unit tests (which were moved from the YARN module to
      core), and by running apps with shuffle encryption enabled.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #15981 from vanzin/SPARK-18547.
      
      (cherry picked from commit 8b325b17)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    • 75d73d13
      Patrick Wendell authored
    • Preparing Spark release v2.1.0-rc1 · 80aabc0b
      Patrick Wendell authored
  18. Nov 25, 2016
  19. Nov 08, 2016
    • [SPARK-18357] Fix yarn files/archive broken issue and unit tests · 876eee2b
      Kishor Patil authored
      
      ## What changes were proposed in this pull request?
      
      #15627 broke functionality: with yarn, --files and --archives did not accept any files.
      This patch ensures that --files and --archives accept unique files.
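
      A rough sketch of the uniqueness check (the helper name is hypothetical): duplicates raise an error, unique entries pass through.

      ```scala
      // Reject --files/--archives lists that name the same file more than once.
      def requireUnique(paths: Seq[String]): Seq[String] = {
        val duplicates = paths.groupBy(identity).collect { case (p, occ) if occ.size > 1 => p }
        require(duplicates.isEmpty, s"Duplicate file(s) specified: ${duplicates.mkString(", ")}")
        paths
      }
      ```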
      
      ## How was this patch tested?
      
      A. I added unit tests.
      B. Also manually tested --files with --archives to confirm that an exception is thrown if duplicate files are specified and execution continues if unique files are specified.
      
      Author: Kishor Patil <kpatil@yahoo-inc.com>
      
      Closes #15810 from kishorvpatil/SPARK18357.
      
      (cherry picked from commit 245e5a2f)
      Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
  20. Nov 03, 2016
  21. Nov 02, 2016
  22. Oct 26, 2016
  23. Oct 21, 2016
    • [SPARK-17960][PYSPARK][UPGRADE TO PY4J 0.10.4] · 595893d3
      Jagadeesan authored
      ## What changes were proposed in this pull request?
      
      1) Upgrade the Py4J version on the Java side
      2) Update the py4j src zip file we bundle with Spark
      
      ## How was this patch tested?
      
      Existing doctests & unit tests pass
      
      Author: Jagadeesan <as2@us.ibm.com>
      
      Closes #15514 from jagadeesanas2/SPARK-17960.
  24. Sep 27, 2016
    • [SPARK-16757] Set up Spark caller context to HDFS and YARN · 6a68c5d7
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      
      1. Pass `jobId` to Task.
      2. Invoke Hadoop APIs.
          * A new function `setCallerContext` is added in `Utils`. The `setCallerContext` function invokes APIs of `org.apache.hadoop.ipc.CallerContext` to set up Spark caller contexts, which will be written into `hdfs-audit.log` and the Yarn RM audit log (see the sketch after this list).
          * For HDFS: Spark sets up its caller context by invoking `org.apache.hadoop.ipc.CallerContext` in `Task` and in the Yarn `Client` and `ApplicationMaster`.
          * For Yarn: Spark sets up its caller context by invoking `org.apache.hadoop.ipc.CallerContext` in Yarn `Client`.
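
      A hedged sketch of such a helper, assuming Hadoop 2.8+ where `org.apache.hadoop.ipc.CallerContext` is available (the real `Utils` helper may use reflection so Spark still builds against older Hadoop versions):

      ```scala
      import org.apache.hadoop.ipc.CallerContext

      // Attach a caller context to the current thread; HDFS and YARN record it in
      // hdfs-audit.log and the RM audit log for subsequent RPC calls from this thread.
      def setCallerContext(context: String): Unit = {
        CallerContext.setCurrent(new CallerContext.Builder(context).build())
      }

      // e.g. setCallerContext("SPARK_CLIENT_AppId_application_1474394339641_0005")
      ```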
      
      ## How was this patch tested?
      Manual tests against some Spark applications in Yarn client mode and Yarn cluster mode, checking that Spark caller contexts are written into the HDFS `hdfs-audit.log` and the Yarn RM audit log successfully.
      
      For example, run SparkKmeans in Yarn client mode:
      ```
      ./bin/spark-submit --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar hdfs://localhost:9000/lr_big.txt 2 5
      ```
      
      **Before**:
      There will be no Spark caller context in records of `hdfs-audit.log` and Yarn RM audit log.
      
      **After**:
      Spark caller contexts will be written in records of `hdfs-audit.log` and Yarn RM audit log.
      
      These are records in `hdfs-audit.log`:
      ```
      2016-09-20 11:54:24,116 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_CLIENT_AppId_application_1474394339641_0005
      2016-09-20 11:54:28,164 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0005_JobId_0_StageId_0_AttemptId_0_TaskId_2_AttemptNum_0
      2016-09-20 11:54:28,164 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0005_JobId_0_StageId_0_AttemptId_0_TaskId_1_AttemptNum_0
      2016-09-20 11:54:28,164 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0005_JobId_0_StageId_0_AttemptId_0_TaskId_0_AttemptNum_0
      ```
      ```
      2016-09-20 11:59:33,868 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=mkdirs	src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1474394339641_0006/container_1474394339641_0006_01_000001/spark-warehouse	dst=null	perm=wyang:supergroup:rwxr-xr-x	proto=rpc	callerContext=SPARK_APPLICATION_MASTER_AppId_application_1474394339641_0006_AttemptId_1
      2016-09-20 11:59:37,214 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_1_AttemptNum_0
      2016-09-20 11:59:37,215 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_2_AttemptNum_0
      2016-09-20 11:59:37,215 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_0_AttemptNum_0
      2016-09-20 11:59:42,391 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_3_AttemptNum_0
      ```
      This is a record in Yarn RM log:
      ```
      2016-09-20 11:59:24,050 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wyang	IP=127.0.0.1	OPERATION=Submit Application Request	TARGET=ClientRMService	RESULT=SUCCESS	APPID=application_1474394339641_0006	CALLERCONTEXT=SPARK_CLIENT_AppId_application_1474394339641_0006
      ```
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14659 from Sherry302/callercontextSubmit.
  25. Sep 20, 2016
    • [SPARK-17611][YARN][TEST] Make shuffle service test really test auth. · 7e418e99
      Marcelo Vanzin authored
      Currently, the code is just swallowing exceptions, and not really checking
      whether the auth information was being recorded properly. Fix both problems,
      and also avoid tests inadvertently affecting other tests by modifying the
      shared config variable (by making it not shared).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #15161 from vanzin/SPARK-17611.
  26. Sep 14, 2016
    • [SPARK-17511] Yarn Dynamic Allocation: Avoid marking released container as Failed · ff6e4cbd
      Kishor Patil authored
      ## What changes were proposed in this pull request?
      
      Due to race conditions, the `assert(numExecutorsRunning <= targetNumExecutors)` can fail, causing an `AssertionError`. So the assertion is removed and the conditional check is instead moved to before launching a new container:
      ```
      java.lang.AssertionError: assertion failed
              at scala.Predef$.assert(Predef.scala:156)
              at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1.org$apache$spark$deploy$yarn$YarnAllocator$$anonfun$$updateInternalState$1(YarnAllocator.scala:489)
              at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$1.run(YarnAllocator.scala:519)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      ```
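
      A minimal sketch of the reordering, with illustrative names: check the target before launching rather than asserting after updating internal state.

      ```scala
      // Launch an executor on the allocated container only if we are still below the
      // target; otherwise skip it instead of tripping an assertion after the fact.
      def maybeLaunchExecutor(numExecutorsRunning: Int, targetNumExecutors: Int)(launch: => Unit): Unit = {
        if (numExecutorsRunning < targetNumExecutors) {
          launch
        }
      }
      ```
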
      ## How was this patch tested?
      This was manually tested using a large ForkAndJoin job with Dynamic Allocation enabled, to validate that the previously failing job succeeds without any such exception.
      
      Author: Kishor Patil <kpatil@yahoo-inc.com>
      
      Closes #15069 from kishorvpatil/SPARK-17511.
  27. Sep 09, 2016
    • [SPARK-17433] YarnShuffleService doesn't handle moving credentials levelDb · a3981c28
      Thomas Graves authored
      The secrets leveldb isn't being moved if you run the Spark shuffle service without YARN NM recovery on and then turn it on. This fixes that. I unfortunately missed this when I ported the patch from our internal branch-2 to the master branch, due to the changes for the recovery path. Note this only applies to master since it is the only place the YARN NM recovery dir is used.
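
      An illustrative sketch only (the DB name and directory handling here are assumptions, not the patch's exact code): relocate a pre-existing secrets DB into the NM recovery directory the first time recovery is enabled, so previously registered secrets survive the switch.

      ```scala
      import java.io.File
      import java.nio.file.Files

      def moveSecretsDbIfNeeded(oldDir: File, recoveryDir: File, dbName: String): Unit = {
        val oldDb = new File(oldDir, dbName)
        val newDb = new File(recoveryDir, dbName)
        // Only relocate if a DB was created before recovery was turned on and nothing
        // exists yet at the recovery location; otherwise leave things alone.
        if (oldDb.exists() && !newDb.exists()) {
          Files.move(oldDb.toPath, newDb.toPath)
        }
      }
      ```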
      
      Unit tests ran, and tested on an 8-node cluster: fresh startup with NM recovery, fresh startup without NM recovery, and switching between no NM recovery and recovery. Also tested running applications to make sure they weren't affected by a rolling upgrade.
      
      Author: Thomas Graves <tgraves@prevailsail.corp.gq1.yahoo.com>
      Author: Tom Graves <tgraves@apache.org>
      
      Closes #14999 from tgravescs/SPARK-17433.