  1. Nov 28, 2016
    • [SPARK-18547][CORE] Propagate I/O encryption key when executors register. · 8b325b17
      Marcelo Vanzin authored
      This change modifies the method used to propagate encryption keys used during
      shuffle. Instead of relying on YARN's UserGroupInformation credential propagation,
      this change explicitly distributes the key using the messages exchanged between
      driver and executor during registration. When RPC encryption is enabled, this means
      key propagation is also secure.
      
      This allows shuffle encryption to work in non-YARN mode, which means that it's
      easier to write unit tests for areas of the code that are affected by the feature.
      
      The key is stored in the SecurityManager; because there are many instances of
      that class used in the code, the key is only guaranteed to exist in the instance
      managed by the SparkEnv. This path was chosen to avoid storing the key in the
      SparkConf, which would risk having the key being written to disk as part of the
      configuration (as, for example, is done when starting YARN applications).
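
      For illustration, a hedged sketch of the registration-time propagation (the message and field names here are assumptions, not the actual `CoarseGrainedClusterMessages` definitions): the driver generates the key once, keeps it only in memory, and hands it to each executor in the registration reply.

      ```
      import javax.crypto.KeyGenerator

      // Hypothetical registration reply carrying the key to a newly registered executor;
      // when RPC encryption is enabled, this exchange is itself protected.
      case class RegisteredExecutor(ioEncryptionKey: Option[Array[Byte]])

      // Driver side: create the shuffle encryption key once and keep it in memory
      // (e.g. inside the SecurityManager owned by the SparkEnv), never in SparkConf.
      val keyGen = KeyGenerator.getInstance("AES")
      keyGen.init(128)
      val ioEncryptionKey: Array[Byte] = keyGen.generateKey().getEncoded

      val reply = RegisteredExecutor(Some(ioEncryptionKey))
      ```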
      
      Tested by new and existing unit tests (which were moved from the YARN module to
      core), and by running apps with shuffle encryption enabled.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #15981 from vanzin/SPARK-18547.
      8b325b17
    • [SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI · 237c3b96
      Mark Grover authored
      ## What changes were proposed in this pull request?
      
      This patch adds a new property called `spark.secret.redactionPattern` that
      allows users to specify a scala regex to decide which Spark configuration
      properties and environment variables in driver and executor environments
      contain sensitive information. When this regex matches the property or
      environment variable name, its value is redacted from the environment UI and
      various logs like YARN and event logs.
      
      This change uses this property to redact information from event logs and YARN
      logs. It also updates the UI code to adhere to this property instead of
      hardcoding the logic to decipher which properties are sensitive.
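
      As a rough sketch of the mechanism (an illustrative helper, not the actual redaction utility that landed; the property name is taken from the description above):

      ```
      import scala.util.matching.Regex

      val REDACTION_REPLACEMENT = "*********(redacted)"

      // Redact the value of any config entry or env var whose *name* matches the
      // pattern configured via spark.secret.redactionPattern.
      def redact(pattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] =
        kvs.map { case (key, value) =>
          if (pattern.findFirstIn(key).isDefined) (key, REDACTION_REPLACEMENT) else (key, value)
        }

      // redact("(?i)secret|password".r, Seq("HADOOP_CREDSTORE_PASSWORD" -> "hunter2"))
      //   => Seq(("HADOOP_CREDSTORE_PASSWORD", "*********(redacted)"))
      ```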
      
      Here's an image of the UI post-redaction:
      ![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png)
      
      Here's the text in the YARN logs, post-redaction:
      ``HADOOP_CREDSTORE_PASSWORD -> *********(redacted)``
      
      Here's the text in the event logs, post-redaction:
      ``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...``
      
      ## How was this patch tested?
      1. Unit tests are added to ensure that redaction works.
      2. A YARN job reading data off of S3 was run with confidential information
      (the Hadoop credential provider password) provided in the environment
      variables of the driver and executor. Afterwards, the logs were grepped to
      make sure that no mention of the secret password was present. It was also
      verified that the job read the data off of S3 correctly, confirming that the
      sensitive information still reached the places that needed it to read the
      data.
      3. The event logs were checked to make sure no mention of secret password was
      present.
      4. UI environment tab was checked to make sure there was no secret information
      being displayed.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #15971 from markgrover/master_redaction.
      237c3b96
  2. Nov 25, 2016
  3. Nov 15, 2016
    • [SPARK-18417][YARN] Define 'spark.yarn.am.port' in yarn config object · 5bcb9a7f
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      This PR is to define 'spark.yarn.am.port' in yarn config.scala just like other Yarn configurations. That makes code easier to maintain.
      
      ## How was this patch tested?
      Build passed & tested some Yarn unit tests.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #15858 from weiqingy/yarn.
      5bcb9a7f
  4. Nov 11, 2016
    • [SPARK-16759][CORE] Add a configuration property to pass caller contexts of... · 3af89451
      Weiqing Yang authored
      [SPARK-16759][CORE] Add a configuration property to pass caller contexts of upstream applications into Spark
      
      ## What changes were proposed in this pull request?
      
      Many applications take Spark as a computing engine and run on it. This PR adds a configuration property `spark.log.callerContext` that can be used by Spark's upstream applications (e.g. Oozie) to set up their caller contexts into Spark. In the end, Spark will combine its own caller context with the caller contexts of its upstream applications, and write them into Yarn RM log and HDFS audit log.
      
      The audit log has a config to truncate the caller contexts passed in (default 128 characters). The caller context is sent over RPC, so it should be concise. The caller context written into the HDFS log and the YARN log consists of two parts: the information `A` specified by Spark itself and the value `B` of the `spark.log.callerContext` property. Currently `A` typically takes 64 to 74 characters, so `B` can have up to 50 characters (as mentioned in the doc `running-on-yarn.md`).
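
      For illustration only, a hypothetical sketch of combining part `A` with part `B` and truncating to the audit log's limit (the helper name and the separator are assumptions, not Spark's actual code; the truncation itself is enforced by the audit log's own config):

      ```
      def buildCallerContext(sparkPart: String, upstreamPart: Option[String], maxLen: Int = 128): String = {
        val combined = upstreamPart.filter(_.nonEmpty) match {
          case Some(b) => s"${sparkPart}_$b"
          case None    => sparkPart
        }
        combined.take(maxLen)  // mirrors the audit log's truncation of overly long contexts
      }
      ```
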
      ## How was this patch tested?
      
      Manual tests. I have run some Spark applications with `spark.log.callerContext` configuration in Yarn client/cluster mode, and verified that the caller contexts were written into Yarn RM log and HDFS audit log correctly.
      
      The ways to configure `spark.log.callerContext` property:
      - In spark-defaults.conf:
      
      ```
      spark.log.callerContext  infoSpecifiedByUpstreamApp
      ```
      - In app's source code:
      
      ```
      val spark = SparkSession
            .builder
            .appName("SparkKMeans")
            .config("spark.log.callerContext", "infoSpecifiedByUpstreamApp")
            .getOrCreate()
      ```
      
      When running in YARN cluster mode, the driver cannot pass `spark.log.callerContext` to the YARN client and AM, since both have already started before the driver executes `.config("spark.log.callerContext", "infoSpecifiedByUpstreamApp")`.
      
      The following  example shows the command line used to submit a SparkKMeans application and the corresponding records in Yarn RM log and HDFS audit log.
      
      Command:
      
      ```
      ./bin/spark-submit --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar hdfs://localhost:9000/lr_big.txt 2 5
      ```
      
      Yarn RM log:
      
      <img width="1440" alt="screen shot 2016-10-19 at 9 12 03 pm" src="https://cloud.githubusercontent.com/assets/8546874/19547050/7d2f278c-9649-11e6-9df8-8d5ff12609f0.png">
      
      HDFS audit log:
      
      <img width="1400" alt="screen shot 2016-10-19 at 10 18 14 pm" src="https://cloud.githubusercontent.com/assets/8546874/19547102/096060ae-964a-11e6-981a-cb28efd5a058.png">
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #15563 from weiqingy/SPARK-16759.
      3af89451
  5. Nov 08, 2016
    • [SPARK-18357] Fix yarn files/archive broken issue and unit tests · 245e5a2f
      Kishor Patil authored
      ## What changes were proposed in this pull request?
      
      PR #15627 broke functionality: with YARN, --files and --archives would not accept any files.
      This patch ensures that --files and --archives accept unique files.
      
      ## How was this patch tested?
      
      A. I added unit tests.
      B. Also manually tested that --files together with --archives throws an exception if duplicate files are specified, and continues if unique files are specified.
      
      Author: Kishor Patil <kpatil@yahoo-inc.com>
      
      Closes #15810 from kishorvpatil/SPARK18357.
      245e5a2f
  6. Nov 03, 2016
    • [SPARK-18099][YARN] Fail if same files added to distributed cache for --files and --archives · 098e4ca9
      Kishor Patil authored
      ## What changes were proposed in this pull request?
      
      During spark-submit, if the YARN distributed cache is instructed to add the same file under both --files and --archives, this change retains the Spark YARN distributed cache behaviour, i.e. it warns and fails if the same file is mentioned in both --files and --archives (see the sketch below).
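
      A hypothetical sketch of the duplicate check described above (the names are illustrative, not the actual `Client.scala` code): fail fast if the same resource appears in both --files and --archives.

      ```
      def checkForDuplicates(files: Seq[String], archives: Seq[String]): Unit = {
        val duplicates = files.toSet.intersect(archives.toSet)
        if (duplicates.nonEmpty) {
          throw new IllegalArgumentException(
            s"Resources added under both --files and --archives: ${duplicates.mkString(", ")}")
        }
      }
      ```
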
      ## How was this patch tested?
      
      Manually tested:
      1. If the same jar is mentioned in --jars and --files, the job is still submitted; the behaviour from [SPARK-14423] #12203 is unchanged.
      2. If the same file is mentioned in --files and --archives, the submission fails.
      
      Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
      
      … under archives and files
      
      Author: Kishor Patil <kpatil@yahoo-inc.com>
      
      Closes #15627 from kishorvpatil/spark18099.
      098e4ca9
  7. Nov 02, 2016
    • [SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode · 3c24299b
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
      spark.files is still passed to the driver in yarn mode, so SparkContext still handles it, which causes the error described in the JIRA.
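
      A minimal sketch of the idea (assuming a `SparkConf` in hand; illustrative, not the actual `Client.scala` change): in yarn mode the files and jars are already shipped via the YARN distributed cache, so the driver should not see these properties and try to distribute them again.

      ```
      import org.apache.spark.SparkConf

      def stripDistributedCacheProps(conf: SparkConf): SparkConf = {
        Seq("spark.jars", "spark.files").foreach(conf.remove)
        conf
      }
      ```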
      
      ## How was this patch tested?
      
      Tested manually in a 5-node cluster. Since this issue only happens in a multi-node cluster, I didn't write a test for it.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #15669 from zjffdu/SPARK-18160.
      3c24299b
    • [SPARK-18204][WEBUI] Remove SparkUI.appUIAddress · 70a5db7b
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Removing `appUIAddress` attribute since it is no longer in use.
      ## How was this patch tested?
      
      Local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #15603 from jaceklaskowski/sparkui-fixes.
      70a5db7b
  8. Oct 26, 2016
  9. Oct 21, 2016
    • [SPARK-17960][PYSPARK][UPGRADE TO PY4J 0.10.4] · 595893d3
      Jagadeesan authored
      ## What changes were proposed in this pull request?
      
      1) Upgrade the Py4J version on the Java side
      2) Update the py4j src zip file we bundle with Spark
      
      ## How was this patch tested?
      
      Existing doctests & unit tests pass
      
      Author: Jagadeesan <as2@us.ibm.com>
      
      Closes #15514 from jagadeesanas2/SPARK-17960.
      595893d3
  10. Sep 27, 2016
    • [SPARK-16757] Set up Spark caller context to HDFS and YARN · 6a68c5d7
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      
      1. Pass `jobId` to Task.
      2. Invoke Hadoop APIs.
          * A new function `setCallerContext` is added in `Utils`. It invokes the APIs of `org.apache.hadoop.ipc.CallerContext` to set up Spark caller contexts, which are written into `hdfs-audit.log` and the YARN RM audit log (see the sketch after this list).
          * For HDFS: Spark sets up its caller context by invoking `org.apache.hadoop.ipc.CallerContext` in `Task` and in the YARN `Client` and `ApplicationMaster`.
          * For YARN: Spark sets up its caller context by invoking `org.apache.hadoop.ipc.CallerContext` in the YARN `Client`.
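
      As a rough illustration (assuming Hadoop 2.8+ on the classpath; Spark guards the call so it degrades gracefully on older Hadoop versions), setting the caller context for the current thread might look like:

      ```
      import org.apache.hadoop.ipc.CallerContext

      // Build a context string like the ones shown in the audit logs below and
      // register it for the current thread; HDFS and YARN pick it up from here.
      val context = "SPARK_TASK_AppId_application_1474394339641_0005_JobId_0_StageId_0_AttemptId_0_TaskId_2_AttemptNum_0"
      CallerContext.setCurrent(new CallerContext.Builder(context).build())
      ```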
      
      ## How was this patch tested?
      Manual tests against some Spark applications in YARN client mode and YARN cluster mode, checking that the Spark caller contexts are written into HDFS `hdfs-audit.log` and the YARN RM audit log successfully.
      
      For example, run SparkKmeans in Yarn client mode:
      ```
      ./bin/spark-submit --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar hdfs://localhost:9000/lr_big.txt 2 5
      ```
      
      **Before**:
      There will be no Spark caller context in records of `hdfs-audit.log` and Yarn RM audit log.
      
      **After**:
      Spark caller contexts will be written in records of `hdfs-audit.log` and Yarn RM audit log.
      
      These are records in `hdfs-audit.log`:
      ```
      2016-09-20 11:54:24,116 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_CLIENT_AppId_application_1474394339641_0005
      2016-09-20 11:54:28,164 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0005_JobId_0_StageId_0_AttemptId_0_TaskId_2_AttemptNum_0
      2016-09-20 11:54:28,164 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0005_JobId_0_StageId_0_AttemptId_0_TaskId_1_AttemptNum_0
      2016-09-20 11:54:28,164 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0005_JobId_0_StageId_0_AttemptId_0_TaskId_0_AttemptNum_0
      ```
      ```
      2016-09-20 11:59:33,868 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=mkdirs	src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1474394339641_0006/container_1474394339641_0006_01_000001/spark-warehouse	dst=null	perm=wyang:supergroup:rwxr-xr-x	proto=rpc	callerContext=SPARK_APPLICATION_MASTER_AppId_application_1474394339641_0006_AttemptId_1
      2016-09-20 11:59:37,214 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_1_AttemptNum_0
      2016-09-20 11:59:37,215 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_2_AttemptNum_0
      2016-09-20 11:59:37,215 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_0_AttemptNum_0
      2016-09-20 11:59:42,391 INFO FSNamesystem.audit: allowed=true	ugi=wyang (auth:SIMPLE)	ip=/127.0.0.1	cmd=open	src=/lr_big.txt	dst=null	perm=null	proto=rpc	callerContext=SPARK_TASK_AppId_application_1474394339641_0006_AttemptId_1_JobId_0_StageId_0_AttemptId_0_TaskId_3_AttemptNum_0
      ```
      This is a record in Yarn RM log:
      ```
      2016-09-20 11:59:24,050 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=wyang	IP=127.0.0.1	OPERATION=Submit Application Request	TARGET=ClientRMService	RESULT=SUCCESS	APPID=application_1474394339641_0006	CALLERCONTEXT=SPARK_CLIENT_AppId_application_1474394339641_0006
      ```
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14659 from Sherry302/callercontextSubmit.
      6a68c5d7
  11. Sep 20, 2016
    • [SPARK-17611][YARN][TEST] Make shuffle service test really test auth. · 7e418e99
      Marcelo Vanzin authored
      Currently, the code is just swallowing exceptions, and not really checking
      whether the auth information was being recorded properly. Fix both problems,
      and also avoid tests inadvertently affecting other tests by modifying the
      shared config variable (by making it not shared).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #15161 from vanzin/SPARK-17611.
      7e418e99
  12. Sep 14, 2016
    • [SPARK-17511] Yarn Dynamic Allocation: Avoid marking released container as Failed · ff6e4cbd
      Kishor Patil authored
      ## What changes were proposed in this pull request?
      
      Due to race conditions, the `assert(numExecutorsRunning <= targetNumExecutors)` can fail, causing an `AssertionError`. This change removes the assertion and instead moves the conditional check before launching a new container:
      ```
      java.lang.AssertionError: assertion failed
              at scala.Predef$.assert(Predef.scala:156)
              at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1.org$apache$spark$deploy$yarn$YarnAllocator$$anonfun$$updateInternalState$1(YarnAllocator.scala:489)
              at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$1.run(YarnAllocator.scala:519)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      ```
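
      A minimal sketch of the replacement logic (illustrative only; the real change lives in `YarnAllocator.runAllocatedContainers`): check the running count against the target before launching, and hand the container back to YARN instead of failing when the target has already been reached.

      ```
      object AllocationSketch {
        sealed trait Action
        case object LaunchExecutor extends Action
        case object ReleaseContainer extends Action

        // Decide, for one freshly allocated container, whether to launch an executor
        // in it or release it because the target was already met (e.g. it was lowered
        // concurrently). Releasing must not be counted as a failed executor.
        def decide(numExecutorsRunning: Int, targetNumExecutors: Int): Action =
          if (numExecutorsRunning < targetNumExecutors) LaunchExecutor
          else ReleaseContainer
      }
      ```
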
      ## How was this patch tested?
      This was manually tested using a large ForkAndJoin job with Dynamic Allocation enabled to validate the failing job succeeds, without any such exception.
      
      Author: Kishor Patil <kpatil@yahoo-inc.com>
      
      Closes #15069 from kishorvpatil/SPARK-17511.
      ff6e4cbd
  13. Sep 09, 2016
    • [SPARK-17433] YarnShuffleService doesn't handle moving credentials levelDb · a3981c28
      Thomas Graves authored
      The secrets leveldb isn't moved if you run the Spark shuffle service without YARN NM recovery on and then turn it on. This fixes that. I unfortunately missed this when I ported the patch from our internal branch-2 to the master branch, due to the changes for the recovery path. Note this only applies to master since it is the only place the YARN NM recovery dir is used.
      
      Unit tests ran, and tested on an 8-node cluster: fresh startup with NM recovery, fresh startup without NM recovery, and switching between no NM recovery and recovery. Also tested running applications to make sure they weren't affected by a rolling upgrade.
      
      Author: Thomas Graves <tgraves@prevailsail.corp.gq1.yahoo.com>
      Author: Tom Graves <tgraves@apache.org>
      
      Closes #14999 from tgravescs/SPARK-17433.
      a3981c28
  14. Sep 07, 2016
    • [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of... · 3ce3a282
      Liwei Lin authored
      [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of ArrayBuffer.append(A) in performance critical paths
      
      ## What changes were proposed in this pull request?
      
      We should generally use `ArrayBuffer.+=(A)` rather than `ArrayBuffer.append(A)`, because `append(A)` would involve extra boxing / unboxing.
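
      For example (a trivial illustration of the preferred call):

      ```
      import scala.collection.mutable.ArrayBuffer

      val buf = ArrayBuffer.empty[Int]
      buf += 1          // preferred in hot paths: appends a single element directly
      buf.append(2)     // same result, but append(elems: A*) goes through a varargs wrapper
      assert(buf == ArrayBuffer(1, 2))
      ```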
      
      ## How was this patch tested?
      
      N/A
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #14914 from lw-lin/append_to_plus_eq_v2.
      3ce3a282
  15. Sep 06, 2016
    • [SPARK-15891][YARN] Clean up some logging in the YARN AM. · 0bd00ff2
      Marcelo Vanzin authored
      To make the log file more readable, rework some of the logging done
      by the AM:
      
      - log executor command / env just once, since they're all almost the same;
        the information that changes, such as executor ID, is already available
        in other log messages.
      - avoid printing logs when nothing happens, especially when updating the
        container requests in the allocator.
      - print fewer log messages when requesting many unlocalized executors,
        instead of repeating the same message multiple times.
      - removed some logs that seemed unnecessary.
      
      In the process, I slightly fixed up the wording in a few log messages, and
      did some minor clean up of method arguments that were redundant.
      
      Tested by running existing unit tests, and analyzing the logs of an
      application that exercises dynamic allocation by forcing executors
      to be allocated and be killed in waves.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14943 from vanzin/SPARK-15891.
      0bd00ff2
  16. Sep 02, 2016
    • [SPARK-16711] YarnShuffleService doesn't re-init properly on YARN rolling upgrade · e79962f2
      Thomas Graves authored
      The Spark YARN shuffle service doesn't re-initialize the application credentials early enough, which causes any other Spark executors trying to fetch from that node during a rolling upgrade to fail with "java.lang.NullPointerException: Password cannot be null if SASL is enabled". Right now the Spark shuffle service relies on the YARN NodeManager to re-register the applications; unfortunately this happens after we open the port for other executors to connect. If other executors connect before the re-register, they get a NullPointerException, which isn't a retryable exception, and they fail pretty quickly. To solve this I added another leveldb file so that the service can save and re-initialize all the applications before opening the port for other executors to connect to it. Adding another leveldb was simpler from the code-structure point of view.
      
      Most of the code changes are moving things to common util class.
      
      Patch was tested manually on a YARN cluster with a rolling upgrade happening while a Spark job was running. Without the patch I consistently get the NullPointerException; with the patch the job gets a few Connection refused exceptions, but the retries kick in and it succeeds.
      
      Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
      
      Closes #14718 from tgravescs/SPARK-16711.
      e79962f2
  17. Sep 01, 2016
    • [SPARK-16533][CORE] resolve deadlocking in driver when executors die · a0aac4b7
      Angus Gerry authored
      ## What changes were proposed in this pull request?
      This pull request reverts the changes made as a part of #14605, which simply side-steps the deadlock issue. Instead, I propose the following approach:
      * Use `scheduleWithFixedDelay` when calling `ExecutorAllocationManager.schedule` for scheduling executor requests. The intent of this is that if invocations are delayed beyond the default schedule interval on account of lock contention, then we avoid a situation where calls to `schedule` are made back-to-back, potentially releasing and then immediately reacquiring these locks - further exacerbating contention.
      * Replace a number of calls to `askWithRetry` with `ask` inside of message handling code in `CoarseGrainedSchedulerBackend` and its ilk. This allows us to queue messages with the relevant endpoints, release whatever locks we might be holding, and then block whilst awaiting the response. This change is made at the cost of being able to retry should sending the message fail, as retrying outside of the lock could easily cause race conditions if other conflicting messages have been sent whilst awaiting a response. I believe this to be the lesser of two evils, as in many cases these RPC calls are to process-local components, and so failures are more likely to be deterministic, and timeouts are more likely to be caused by lock contention.
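
      A minimal sketch of the fixed-delay scheduling from the first bullet (assuming a plain `ScheduledExecutorService`; Spark's own thread-pool utilities differ in detail): with `scheduleWithFixedDelay`, the next run is measured from the end of the previous one, so a slow, lock-contended `schedule()` call cannot lead to back-to-back invocations.

      ```
      import java.util.concurrent.{Executors, TimeUnit}

      // Placeholder for the real work, e.g. ExecutorAllocationManager.schedule(),
      // which may block on contended locks.
      def allocationManagerSchedule(): Unit = ()

      val intervalMs = 100L  // illustrative interval
      val scheduler = Executors.newSingleThreadScheduledExecutor()

      val task = new Runnable {
        override def run(): Unit = allocationManagerSchedule()
      }

      // The next invocation starts intervalMs after the previous one *returns*.
      scheduler.scheduleWithFixedDelay(task, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
      ```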
      
      ## How was this patch tested?
      Existing tests, and manual tests under yarn-client mode.
      
      Author: Angus Gerry <angolon@gmail.com>
      
      Closes #14710 from angolon/SPARK-16533.
      a0aac4b7
  18. Aug 30, 2016
    • [SPARK-5682][CORE] Add encrypted shuffle in spark · 4b4e329e
      Ferdinand Xu authored
      This patch uses the Apache Commons Crypto library to enable shuffle encryption support.
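
      As a rough sketch of what Commons Crypto provides (illustrative; the key, IV and property handling here are assumptions, not Spark's actual wiring), shuffle bytes can be wrapped in an AES/CTR stream:

      ```
      import java.io.ByteArrayOutputStream
      import java.util.Properties
      import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}
      import org.apache.commons.crypto.stream.CryptoOutputStream

      // 16-byte key and IV for AES/CTR; in Spark these come from the security layer,
      // not from constants like these.
      val key = new SecretKeySpec(Array.fill(16)(1.toByte), "AES")
      val iv  = new IvParameterSpec(Array.fill(16)(2.toByte))

      val sink = new ByteArrayOutputStream()
      val encrypted = new CryptoOutputStream("AES/CTR/NoPadding", new Properties(), sink, key, iv)
      encrypted.write("shuffle block bytes".getBytes("UTF-8"))
      encrypted.close()   // sink now holds the encrypted bytes
      ```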
      
      Author: Ferdinand Xu <cheng.a.xu@intel.com>
      Author: kellyzly <kellyzly@126.com>
      
      Closes #8880 from winningsix/SPARK-10771.
      4b4e329e
  19. Aug 24, 2016
    • [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same... · 0b3a4be9
      Sean Owen authored
      [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
      
      ## What changes were proposed in this pull request?
      
      Update to py4j 0.10.3 to enable JAVA_HOME support
      
      ## How was this patch tested?
      
      Pyspark tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14748 from srowen/SPARK-16781.
      0b3a4be9
  20. Aug 17, 2016
    • [SPARK-16736][CORE][SQL] purge superfluous fs calls · cc97ea18
      Steve Loughran authored
      A review of the code, working back from Hadoop's `FileSystem.exists()` and `FileSystem.isDirectory()` code, removing uses of the calls where superfluous.
      
      1. delete is harmless if called on a nonexistent path, so don't do any checks before deletes
      1. any `FileSystem.exists()` check before `getFileStatus()` or `open()` is superfluous, as the operation itself does the check. Instead the `FileNotFoundException` is caught and triggers the downgraded path. Where a `FileNotFoundException` was thrown before, the code still creates a new FNFE with the error messages, though now the inner exceptions are nested for easier diagnostics.
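
      A small sketch of the pattern from the second point (assuming the Hadoop `FileSystem` API; the helper name is illustrative): call the operation directly and treat `FileNotFoundException` as the "does not exist" branch, saving one round trip.

      ```
      import java.io.FileNotFoundException
      import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

      // Instead of: if (fs.exists(path)) Some(fs.getFileStatus(path)) else None  (two calls)
      def statusIfExists(fs: FileSystem, path: Path): Option[FileStatus] =
        try {
          Some(fs.getFileStatus(path))          // one call; throws if the path is missing
        } catch {
          case _: FileNotFoundException => None // the "downgraded path"
        }
      ```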
      
      Initially, relying on Jenkins test runs.
      
      One troublespot here is that some of the codepaths are clearly error situations; it's not clear that they have coverage anyway. Trying to create the failure conditions in tests would be ideal, but it will also be hard.
      
      Author: Steve Loughran <stevel@apache.org>
      
      Closes #14371 from steveloughran/cloud/SPARK-16736-superfluous-fs-calls.
      cc97ea18
    • [SPARK-16930][YARN] Fix a couple of races in cluster app initialization. · e3fec51f
      Marcelo Vanzin authored
      There are two narrow races that could cause the ApplicationMaster to miss
      when the user application instantiates the SparkContext, which could cause
      app failures when nothing was wrong with the app. It was also possible for
      a failing application to get stuck in the loop that waits for the context
      for a long time, instead of failing quickly.
      
      The change uses a promise to track the SparkContext instance, which gets
      rid of the races and allows for some simplification of the code.
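
      Roughly (a sketch of the promise idea, not the actual `ApplicationMaster` code; the method names are illustrative): the thread running the user class completes the promise exactly once, and the AM waits on the future with a timeout instead of polling shared state.

      ```
      import scala.concurrent.{Await, Promise}
      import scala.concurrent.duration._
      import org.apache.spark.SparkContext

      val sparkContextPromise = Promise[SparkContext]()

      // Called from the thread running the user application once the context exists.
      def sparkContextInitialized(sc: SparkContext): Unit =
        sparkContextPromise.trySuccess(sc)

      // Called by the AM: either the context arrives, or we fail fast after maxWait.
      def waitForSparkContext(maxWait: FiniteDuration): SparkContext =
        Await.result(sparkContextPromise.future, maxWait)
      ```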
      
      Tested with existing unit tests, and a new one being added to test the
      timeout code.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14542 from vanzin/SPARK-16930.
      e3fec51f
  21. Aug 11, 2016
  22. Aug 10, 2016
    • [SPARK-14743][YARN] Add a configurable credential manager for Spark running on YARN · ab648c00
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Add a configurable token manager for Spark running on YARN.
      
      ### Current Problems ###
      
      1. The supported token providers are hard-coded; currently only HDFS, HBase and Hive are supported, and it is impossible for a user to add a new token provider without code changes.
      2. The same problem exists in the periodic token renewer and updater.
      
      ### Changes In This Proposal ###
      
      To address the problems mentioned above and make the current code cleaner and easier to understand, this proposal mainly has 3 changes:
      
      1. Abstract a `ServiceTokenProvider` interface, as well as `ServiceTokenRenewable`, for token providers. Each service that wants to communicate with Spark through tokens needs to implement this interface.
      2. Provide a `ConfigurableTokenManager` to manage all the registered token providers, as well as the token renewer and updater. This class also offers the API for other modules to obtain tokens, get the renewal interval, and so on.
      3. Implement 3 built-in token providers, `HDFSTokenProvider`, `HiveTokenProvider` and `HBaseTokenProvider`, to keep the same semantics as supported today. Whether to load these built-in token providers is controlled by the configuration `spark.yarn.security.tokens.${service}.enabled`; by default all the built-in token providers are loaded.
      
      ### Behavior Changes ###
      
      For the end user there's no behavior change; we still use the same configuration `spark.yarn.security.tokens.${service}.enabled` to decide which token provider is enabled (hbase or hive).
      
      A user-implemented token provider (assume the name of the token provider is "test") that needs to be added to this manager should have two configurations:
      
      1. `spark.yarn.security.tokens.test.enabled` set to true
      2. `spark.yarn.security.tokens.test.class` set to the fully qualified class name.
      
      So we still keep the same semantics as the current code while adding one new configuration. A hypothetical example of the plug-in shape follows below.
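
      A hypothetical illustration of the plug-in shape described above (the trait and class names mirror this proposal's wording, but they are assumptions here; the interface that eventually landed in Spark may differ):

      ```
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.security.Credentials
      import org.apache.spark.SparkConf

      // Hypothetical provider interface, per the proposal: one implementation per service.
      trait ServiceTokenProvider {
        def serviceName: String
        def obtainTokens(hadoopConf: Configuration, sparkConf: SparkConf, creds: Credentials): Unit
      }

      // A user-supplied provider named "test", enabled via configuration:
      //   spark.yarn.security.tokens.test.enabled = true
      //   spark.yarn.security.tokens.test.class   = com.example.TestTokenProvider   (hypothetical)
      class TestTokenProvider extends ServiceTokenProvider {
        override def serviceName: String = "test"
        override def obtainTokens(hadoopConf: Configuration, sparkConf: SparkConf, creds: Credentials): Unit = {
          // fetch a delegation token for the external service and add it to `creds`
        }
      }
      ```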
      
      ### Current Status ###
      
      - [x] token provider interface and management framework.
      - [x] implement built-in token providers (hdfs, hbase, hive).
      - [x] Coverage of unit test.
      - [x] Integrated test with security cluster.
      
      ## How was this patch tested?
      
      Unit test and integrated test.
      
      Please suggest and review, any comment is greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #14065 from jerryshao/SPARK-16342.
      ab648c00
  23. Aug 08, 2016
    • [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add... · 9216901d
      Holden Karau authored
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting
      
      ## What changes were proposed in this pull request?
      
      Avoid using a postfix operation for command execution in SQLQuerySuite, where it wasn't whitelisted, and audit the existing whitelistings, removing postfix operators from most places. Some notable places where postfix operation remains are in the XML parsing & time units (seconds, millis, etc.), where it arguably can improve readability.
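
      For context, a tiny example of the difference using the duration syntax mentioned above: the postfix form triggers a feature warning unless `scala.language.postfixOps` is imported (hence the whitelisting), while the dotted form does not.

      ```
      import scala.concurrent.duration._

      // Preferred: no language feature import required.
      val a = 10.seconds

      // Postfix form: warns unless the postfixOps feature is enabled.
      import scala.language.postfixOps
      val b = 10 seconds

      assert(a == b)
      ```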
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
      9216901d
  24. Jul 27, 2016
    • [SPARK-16110][YARN][PYSPARK] Fix allowing python version to be specified per... · b14d7b5c
      KevinGrealish authored
      [SPARK-16110][YARN][PYSPARK] Fix allowing python version to be specified per submit for cluster mode.
      
      ## What changes were proposed in this pull request?
      
      This fix allows submission of pyspark jobs to specify Python 2 or 3.
      
      Change the ordering in the setup of the application master environment so the env vars PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON can be overridden by spark.yarn.appMasterEnv.* conf settings. This applies to YARN in cluster mode. This allows them to be set per submission without needing to unset the env vars (which is not always possible - e.g. batch submit with Livy only exposes the arguments to spark-submit).
      
      ## How was this patch tested?
      Manual and existing unit tests.
      
      Author: KevinGrealish <KevinGre@microsoft.com>
      
      Closes #13824 from KevinGrealish/SPARK-16110.
      b14d7b5c
  25. Jul 19, 2016
  26. Jul 14, 2016
  27. Jul 13, 2016
    • [MINOR][YARN] Fix code error in yarn-cluster unit test · 3d6f679c
      sharkd authored
      ## What changes were proposed in this pull request?
      
      Fix code error in yarn-cluster unit test.
      
      ## How was this patch tested?
      
      Used existing tests.
      
      Author: sharkd <sharkd.tu@gmail.com>
      
      Closes #14166 from sharkdtu/master.
      3d6f679c
  28. Jul 12, 2016
    • [SPARK-16414][YARN] Fix bugs for "Can not get user config when calling... · d513c99c
      sharkd authored
      [SPARK-16414][YARN] Fix bugs for "Can not get user config when calling SparkHadoopUtil.get.conf on yarn cluser mode"
      
      ## What changes were proposed in this pull request?
      
      The `SparkHadoopUtil` singleton was instantiated before `ApplicationMaster` in `ApplicationMaster.main` when deploying Spark in yarn cluster mode, so the `conf` in the `SparkHadoopUtil` singleton didn't include the user's configuration.
      
      So, we should load the properties file with the Spark configuration and set its entries as system properties before `SparkHadoopUtil` is first instantiated.
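
      A minimal sketch of that ordering (the file path and argument are simplified assumptions; the point is only that the properties are published before anything touches `SparkHadoopUtil.get`):

      ```
      import java.io.FileInputStream
      import java.util.Properties
      import scala.collection.JavaConverters._

      // Read the Spark properties file shipped with the AM and publish every entry as a
      // system property *before* the SparkHadoopUtil singleton is first created, so the
      // singleton sees the user's configuration.
      def loadSparkProperties(path: String): Unit = {
        val props = new Properties()
        val in = new FileInputStream(path)
        try props.load(in) finally in.close()
        props.stringPropertyNames().asScala.foreach { key =>
          sys.props.getOrElseUpdate(key, props.getProperty(key))
        }
      }

      // loadSparkProperties(propertiesFilePath)       // hypothetical argument
      // val hadoopConf = SparkHadoopUtil.get.conf     // only after the properties are set
      ```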
      
      ## How was this patch tested?
      
      Add a test case
      
      Author: sharkd <sharkd.tu@gmail.com>
      Author: sharkdtu <sharkdtu@tencent.com>
      
      Closes #14088 from sharkdtu/master.
      d513c99c
  29. Jul 11, 2016
    • [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  30. Jul 01, 2016
  31. Jun 29, 2016
    • [SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn · 272a2f78
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      YARN has supported rolling log aggregation since 2.6. Previously, logs were only aggregated to HDFS after the application finished, which is quite painful for long-running applications like Spark Streaming or the thriftserver; an out-of-disk problem can also occur when the log file grows too large. So this proposes adding support for rolling log aggregation for Spark on YARN.
      
      One limitation is that log4j should be switched to a file appender; Spark itself uses the console appender by default, in which case the log file will not be created again once it is removed after aggregation. But I think lots of production users have already changed their log4j configuration away from the default, so this is not a big problem.
      
      ## How was this patch tested?
      
      Manually verified with Hadoop 2.7.1.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #13712 from jerryshao/SPARK-15990.
      272a2f78
  32. Jun 24, 2016
    • [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite · f4fd7432
      peng.zhang authored
      ## What changes were proposed in this pull request?
      
      Since SPARK-13220 (Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite doesn't test "yarn cluster" mode correctly.
      This pull request fixes it.
      
      ## How was this patch tested?
      Unit test
      
      
      Author: peng.zhang <peng.zhang@xiaomi.com>
      
      Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode.
      f4fd7432
  33. Jun 23, 2016
    • [SPARK-13723][YARN] Change behavior of --num-executors with dynamic allocation. · 738f134b
      Ryan Blue authored
      ## What changes were proposed in this pull request?
      
      This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors.
      
      This change was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend merging it while we can still change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected behavior for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.
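
      A sketch of the initial-executor resolution this implies (in the spirit of `Utils.getDynamicAllocationInitialExecutors`; treat the exact precedence as an assumption rather than a quote of the final code):

      ```
      import org.apache.spark.SparkConf

      // With dynamic allocation on, --num-executors / spark.executor.instances now feeds
      // the initial target instead of disabling dynamic allocation altogether.
      def initialExecutors(conf: SparkConf): Int = Seq(
        conf.getInt("spark.dynamicAllocation.minExecutors", 0),
        conf.getInt("spark.dynamicAllocation.initialExecutors", 0),
        conf.getInt("spark.executor.instances", 0)
      ).max
      ```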
      
      ## How was this patch tested?
      
      This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.
      
      Author: Ryan Blue <blue@apache.org>
      
      Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.
      738f134b
    • [SPARK-15725][YARN] Ensure ApplicationMaster sleeps for the min interval. · a410814c
      Ryan Blue authored
      ## What changes were proposed in this pull request?
      
      Update `ApplicationMaster` to sleep for at least the minimum allocation interval before calling `allocateResources`. This prevents the overloading of the `YarnAllocator` that happens because the thread is triggered when an executor is killed and its connections die. In YARN, this prevents the app from overloading the allocator and becoming unstable.
      
      ## How was this patch tested?
      
      Tested that this allows an app to recover instead of hanging. It is still possible for the YarnAllocator to be overwhelmed by requests, but this prevents the issue for the most common cause.
      
      Author: Ryan Blue <blue@apache.org>
      
      Closes #13482 from rdblue/SPARK-15725-am-sleep-work-around.
      a410814c
    • [SPARK-16138] Try to cancel executor requests only if we have at least 1 · 5bf2889b
      Peter Ableda authored
      ## What changes were proposed in this pull request?
      Adding an additional check to the if statement.
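
      Roughly (an illustrative condition, not the exact `YarnAllocator` diff): only log and cancel container requests when there is actually at least one pending request to cancel.

      ```
      // missing = target - (running + pending); negative means we have over-requested.
      def maybeCancelRequests(missing: Int, numPendingAllocate: Int, target: Int): Unit = {
        if (missing < 0 && numPendingAllocate > 0) {
          val toCancel = math.min(numPendingAllocate, -missing)
          println(s"Canceling requests for $toCancel executor container(s) " +
            s"to have a new desired total $target executors.")
          // ... remove `toCancel` outstanding container requests from the AMRMClient here
        }
      }
      ```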
      
      ## How was this patch tested?
      I built and deployed to an internal cluster to observe the behaviour. After the change the invalid logging is gone:
      
      ```
      16/06/22 08:46:36 INFO yarn.YarnAllocator: Driver requested a total number of 1 executor(s).
      16/06/22 08:46:36 INFO yarn.YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 1 executors.
      16/06/22 08:46:36 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
      16/06/22 08:47:36 INFO yarn.ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 1.
      ```
      
      Author: Peter Ableda <abledapeter@gmail.com>
      
      Closes #13850 from peterableda/patch-2.
      5bf2889b
  34. Jun 21, 2016
  35. Jun 15, 2016