  1. Jun 19, 2017
  2. Jun 15, 2017
    • Michael Gummelt's avatar
      [SPARK-20434][YARN][CORE] Move Hadoop delegation token code from yarn to core · a18d6371
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Move Hadoop delegation token code from `spark-yarn` to `spark-core` so that other schedulers (such as Mesos) may use it.  To avoid exposing Hadoop interfaces in spark-core, the new Hadoop delegation token classes are kept private.  To provide backward compatibility, and to allow YARN users to continue to load their own delegation token providers via Java service loading, the old YARN interfaces, as well as the client code that uses them, have been retained.
      
      Summary:
      - Move the registered `yarn.security.ServiceCredentialProvider` classes from `spark-yarn` to `spark-core`, into a new, private hierarchy under `HadoopDelegationTokenProvider`.  Client code in `HadoopDelegationTokenManager` now loads credentials from a whitelist of three providers (`HadoopFSDelegationTokenProvider`, `HiveDelegationTokenProvider`, `HBaseDelegationTokenProvider`) instead of service loading, which means that users cannot implement their own delegation token providers there, as they can in the `spark-yarn` module.
      
      - The `yarn.security.ServiceCredentialProvider` interface has been kept for backwards compatibility, and to continue to allow YARN users to implement their own delegation token providers.  Client code in YARN now fetches tokens via the new `YARNHadoopDelegationTokenManager` class, which fetches tokens from the core providers through `HadoopDelegationTokenManager` and also service-loads providers from `yarn.security.ServiceCredentialProvider`.
      
      Old Hierarchy:
      
      ```
      yarn.security.ServiceCredentialProvider (service loaded)
        HadoopFSCredentialProvider
        HiveCredentialProvider
        HBaseCredentialProvider
      yarn.security.ConfigurableCredentialManager
      ```
      
      New Hierarchy:
      
      ```
      HadoopDelegationTokenManager
      HadoopDelegationTokenProvider (not service loaded)
        HadoopFSDelegationTokenProvider
        HiveDelegationTokenProvider
        HBaseDelegationTokenProvider
      
      yarn.security.ServiceCredentialProvider (service loaded)
      yarn.security.YARNHadoopDelegationTokenManager
      ```
      ## How was this patch tested?
      
      unit tests
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      Author: Dr. Stefan Schimanski <sttts@mesosphere.io>
      
      Closes #17723 from mgummelt/SPARK-20434-refactor-kerberos.
      a18d6371
  3. Jun 11, 2017
  4. Jun 01, 2017
    • Li Yichao's avatar
      [SPARK-20365][YARN] Remove local scheme when add path to ClassPath. · 640afa49
      Li Yichao authored
      In Spark on YARN, when "spark.yarn.jars" is configured with local jars (jars whose URIs start with the "local" scheme), the classpath for the AM and containers is inaccurate, because the "local" scheme is not removed when the classpath is concatenated. Applications still run, because the classpath is separated by ":" and Java treats "local" as a separate jar entry, but it is cleaner to remove the scheme.
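
      The effect can be sketched in Python. This is a minimal illustration, not Spark's actual code: the helper names `strip_local_scheme` and `build_classpath` and the jar paths are hypothetical, and it assumes a `local:` URI can be reduced to its path component.

```python
from urllib.parse import urlparse


def strip_local_scheme(jar_uri):
    """Return the plain path for a `local:` URI; leave other inputs unchanged."""
    parsed = urlparse(jar_uri)
    if parsed.scheme == "local":
        return parsed.path
    return jar_uri


def build_classpath(jar_uris):
    """Join entries with ':' after removing any `local` scheme prefix."""
    return ":".join(strip_local_scheme(uri) for uri in jar_uris)


if __name__ == "__main__":
    print(build_classpath(["local:/opt/spark/jars/a.jar", "/opt/spark/jars/b.jar"]))
```

      Without the stripping step, an entry such as `local:/opt/spark/jars/a.jar` would be split on ":" into two classpath entries, `local` and `/opt/spark/jars/a.jar`.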
      
      Updated `ClientSuite` to check "local" is not in the classpath.
      
      cc jerryshao
      
      Author: Li Yichao <lyc@zhihu.com>
      Author: Li Yichao <liyichao.good@gmail.com>
      
      Closes #18129 from liyichao/SPARK-20365.
      640afa49
  5. May 25, 2017
  6. May 22, 2017
    • Marcelo Vanzin's avatar
      [SPARK-20814][MESOS] Restore support for spark.executor.extraClassPath. · df64fa79
      Marcelo Vanzin authored
      Restore code that was removed as part of SPARK-17979, but instead of
      using the deprecated env variable name to propagate the class path, use
      a new one.
      
      Verified by running "./bin/spark-class o.a.s.executor.CoarseGrainedExecutorBackend"
      manually.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18037 from vanzin/SPARK-20814.
      df64fa79
  7. May 10, 2017
    • NICHOLAS T. MARION's avatar
      [SPARK-20393][WEB UI] Strengthen Spark to prevent XSS vulnerabilities · b512233a
      NICHOLAS T. MARION authored
      ## What changes were proposed in this pull request?
      
      Add `stripXSS` and `stripXSSMap` to Spark Core's `UIUtils`, and call these functions wherever `getParameter` is called on an `HttpServletRequest`.
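
      The idea can be sketched in Python. The real helpers are Scala code in `UIUtils`; the sanitization rules below (dropping newlines and single quotes, then HTML-escaping) and the snake_case names are assumptions made for illustration, not a description of the actual implementation.

```python
import html
import re

# Assumed rule: drop newline characters and single quotes before escaping.
_NEWLINE_AND_SINGLE_QUOTE = re.compile(r"[\r\n']")


def strip_xss(param):
    """Sanitize one request parameter; None passes through unchanged."""
    if param is None:
        return None
    return html.escape(_NEWLINE_AND_SINGLE_QUOTE.sub("", param))


def strip_xss_map(params):
    """Sanitize every key and value of a multi-valued parameter map."""
    return {
        strip_xss(key): [strip_xss(value) for value in values]
        for key, values in params.items()
    }


if __name__ == "__main__":
    print(strip_xss("<script>alert('x')</script>"))
```

      Applying the helper at every point where a request parameter is read means that no page has to remember to escape the value later when rendering it.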
      
      ## How was this patch tested?
      
      Unit tests, IBM Security AppScan Standard no longer showing vulnerabilities, manual verification of WebUI pages.
      
      Author: NICHOLAS T. MARION <nmarion@us.ibm.com>
      
      Closes #17686 from n-marion/xss-fix.
      b512233a
  8. May 08, 2017
    • jerryshao's avatar
      [SPARK-20605][CORE][YARN][MESOS] Deprecate not used AM and executor port configuration · 829cd7b8
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      After SPARK-10997, the client-mode Netty RpcEnv no longer needs to start a server, so these port configurations are no longer used. This PR proposes to remove the two configurations "spark.executor.port" and "spark.am.port".
      
      ## How was this patch tested?
      
      Existing UTs.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17866 from jerryshao/SPARK-20605.
      829cd7b8
    • Xianyang Liu's avatar
      [SPARK-20621][DEPLOY] Delete deprecated config parameter in 'spark-env.sh' · aeb2ecc0
      Xianyang Liu authored
      ## What changes were proposed in this pull request?
      
      Currently, `spark.executor.instances` is deprecated in `spark-env.sh`, because we suggest configuring it in `spark-defaults.conf` or another config file. The parameter also has no effect even if it is set in `spark-env.sh`, so this patch removes it.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Xianyang Liu <xianyang.liu@intel.com>
      
      Closes #17881 from ConeyLiu/deprecatedParam.
      aeb2ecc0
    • liuxian's avatar
      [SPARK-20519][SQL][CORE] Modify to prevent some possible runtime exceptions · 0f820e2b
      liuxian authored
      Signed-off-by: liuxian <liu.xian3@zte.com.cn>
      
      ## What changes were proposed in this pull request?
      
      When the input parameter is null, a runtime exception may occur.
      
      ## How was this patch tested?
      Existing unit tests
      
      Author: liuxian <liu.xian3@zte.com.cn>
      
      Closes #17796 from 10110346/wip_lx_0428.
      0f820e2b
  9. May 03, 2017
    • Sean Owen's avatar
      [SPARK-20523][BUILD] Clean up build warnings for 2.2.0 release · 16fab6b0
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Fix build warnings primarily related to Breeze 0.13 operator changes, Java style problems
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17803 from srowen/SPARK-20523.
      16fab6b0
  10. Apr 27, 2017
    • Davis Shepherd's avatar
      [SPARK-20483][MINOR] Test for Mesos Coarse mode may starve other Mesos frameworks · 039e32ca
      Davis Shepherd authored
      ## What changes were proposed in this pull request?
      
      Add test case for scenarios where executor.cores is set as a
      (non)divisor of spark.cores.max
      This tests the change in
      #17786
      
      ## How was this patch tested?
      
      Ran the existing test suite with the new tests
      
      dbtsai
      
      Author: Davis Shepherd <dshepherd@netflix.com>
      
      Closes #17788 from dgshep/add_mesos_test.
      039e32ca
    • Davis Shepherd's avatar
      [SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks · 7633933e
      Davis Shepherd authored
      ## What changes were proposed in this pull request?
      
      Set maxCores to be a multiple of the smallest executor that can be launched. This ensures that we correctly detect the condition where no more executors will be launched when spark.cores.max is not a multiple of spark.executor.cores
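
      The rounding can be sketched as follows. This is a minimal illustration of the arithmetic only; the function name `effective_max_cores` is hypothetical, and it assumes the smallest launchable executor uses exactly `spark.executor.cores` cores.

```python
def effective_max_cores(cores_max, executor_cores):
    """Largest multiple of executor_cores that does not exceed cores_max."""
    if executor_cores <= 0:
        raise ValueError("executor_cores must be positive")
    return cores_max - (cores_max % executor_cores)


if __name__ == "__main__":
    # spark.cores.max = 10, spark.executor.cores = 4: only two executors fit.
    print(effective_max_cores(10, 4))
```

      With a limit of 10 cores and 4-core executors, two executors use 8 cores and the remaining 2 can never host a third. Comparing acquired cores against the rounded value (8) rather than the raw limit (10) lets the scheduler recognize that it has nothing more to launch.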
      
      ## How was this patch tested?
      
      This was manually tested with other sample frameworks measuring their incoming offers to determine if starvation would occur.
      
      dbtsai mgummelt
      
      Author: Davis Shepherd <dshepherd@netflix.com>
      
      Closes #17786 from dgshep/fix_mesos_max_cores.
      7633933e
  11. Apr 26, 2017
    • Mark Grover's avatar
      [SPARK-20435][CORE] More thorough redaction of sensitive information · 66636ef0
      Mark Grover authored
      This change does a more thorough redaction of sensitive information from logs and UI
      Add unit tests that ensure that no regressions happen that leak sensitive information to the logs.
      
      The motivation for this change was the appearance of a password, like so, in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations:
      `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..."`
      Previously, the redaction logic only checked whether the key matched the secret regex pattern and, if so, redacted its value. That worked for most cases. However, in the above case, the key (sun.java.command) doesn't tell much, so the value needs to be searched. This PR expands the check to cover values as well.
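
      A minimal Python sketch of the expanded check follows. The pattern, the placeholder text, and the function name `redact` are assumptions for illustration; the actual logic lives in Spark's Scala code and takes its pattern from configuration.

```python
import re

# Assumed secret pattern and replacement text, for illustration only.
SECRET_PATTERN = re.compile(r"(?i)secret|password")
REDACTED = "*********(redacted)"


def redact(pairs):
    """Redact a value when either its key or the value itself matches the pattern."""
    result = []
    for key, value in pairs:
        if SECRET_PATTERN.search(key) or SECRET_PATTERN.search(value):
            result.append((key, REDACTED))
        else:
            result.append((key, value))
    return result


if __name__ == "__main__":
    env = [
        ("sun.java.command",
         "org.apache.spark.deploy.SparkSubmit --conf "
         "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password"),
        ("spark.app.name", "demo"),
    ]
    for key, value in redact(env):
        print(key, "=", value)
```

      A key-only check would leave `sun.java.command` untouched, because the key itself contains nothing that looks sensitive.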
      
      ## How was this patch tested?
      
      New unit tests added that ensure that no sensitive information is present in the event logs or the yarn logs. Old unit test in UtilsSuite was modified because the test was asserting that a non-sensitive property's value won't be redacted. However, the non-sensitive value had the literal "secret" in it which was causing it to redact. Simply updating the non-sensitive property's value to another arbitrary value (that didn't have "secret" in it) fixed it.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #17725 from markgrover/spark-20435.
      66636ef0
  12. Apr 24, 2017
  13. Apr 23, 2017
    • 郭小龙 10207633's avatar
      [SPARK-20385][WEB-UI] 'Submitted Time' field, the date format needs to be... · 2eaf4f3f
      郭小龙 10207633 authored
      [SPARK-20385][WEB-UI] 'Submitted Time' field, the date format needs to be formatted, in running Drivers table or Completed Drivers table in master web ui.
      
      ## What changes were proposed in this pull request?
      The 'Submitted Time' field **needs to be formatted** in the Running Drivers and Completed Drivers tables of the master web UI.

      Before the fix, e.g.:

      Completed Drivers

      | Submission ID | **Submitted Time** | Worker | State | Cores | Memory | Main Class |
      |---|---|---|---|---|---|---|
      | driver-20170419145755-0005 | **Wed Apr 19 14:57:55 CST 2017** | worker-20170419145250-zdh120-40412 | FAILED | 1 | 1024.0 MB | cn.zte.HdfsTest |

      Please see the attachment: https://issues.apache.org/jira/secure/attachment/12863977/before_fix.png

      After the fix, e.g.:

      Completed Drivers

      | Submission ID | **Submitted Time** | Worker | State | Cores | Memory | Main Class |
      |---|---|---|---|---|---|---|
      | driver-20170419145755-0006 | **2017/04/19 16:01:25** | worker-20170419145250-zdh120-40412 | FAILED | 1 | 1024.0 MB | cn.zte.HdfsTest |

      Please see the attachment: https://issues.apache.org/jira/secure/attachment/12863976/after_fix.png

      The 'Submitted Time' field **is already formatted** in the Running Applications and Completed Applications tables of the master web UI, **which is correct**, e.g.:

      Running Applications

      | Application ID | Name | Cores | Memory per Executor | **Submitted Time** | User | State | Duration |
      |---|---|---|---|---|---|---|---|
      | app-20170419160910-0000 (kill) | SparkSQL::10.43.183.120 | 1 | 5.0 GB | **2017/04/19 16:09:10** | root | RUNNING | 53 s |

      **The formatted time is easier to read and is consistent with the applications tables, so I think it's worth fixing.**
      
      ## How was this patch tested?
      
      (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
      (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
      
      Author: 郭小龙 10207633 <guo.xiaolong1@zte.com.cn>
      Author: guoxiaolong <guo.xiaolong1@zte.com.cn>
      Author: guoxiaolongzte <guo.xiaolong1@zte.com.cn>
      
      Closes #17682 from guoxiaolongzte/SPARK-20385.
      2eaf4f3f
  14. Apr 17, 2017
    • Andrew Ash's avatar
      Typo fix: distitrbuted -> distributed · 0075562d
      Andrew Ash authored
      ## What changes were proposed in this pull request?
      
      Typo fix: distitrbuted -> distributed
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #17664 from ash211/patch-1.
      0075562d
  15. Apr 16, 2017
    • Ji Yan's avatar
      [SPARK-19740][MESOS] Add support in Spark to pass arbitrary parameters into... · a888fed3
      Ji Yan authored
      [SPARK-19740][MESOS] Add support in Spark to pass arbitrary parameters into docker when running on mesos with docker containerizer
      
      ## What changes were proposed in this pull request?
      
      Allow passing in arbitrary parameters into docker when launching spark executors on mesos with docker containerizer tnachen
      
      ## How was this patch tested?
      
      Manually built and tested with passed in parameter
      
      Author: Ji Yan <jiyan@Jis-MacBook-Air.local>
      
      Closes #17109 from yanji84/ji/allow_set_docker_user.
      a888fed3
  16. Apr 12, 2017
    • hyukjinkwon's avatar
      [SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins · ceaf77ae
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to run Spark unidoc to test Javadoc 8 build as Javadoc 8 is easily re-breakable.
      
      There are several problems with it:
      
      - It adds a little extra time to the test run. In my case, it took about 1.5 minutes more (`Elapsed :[94.8746569157]`). How this was measured is described in "How was this patch tested?".
      
      - > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.
      
        (see  joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627))
      
      To complete this automated build, this PR also proposes fixing existing Javadoc breaks, including ones introduced by test code as described above.
      
      These fixes are similar to ones made previously. Please refer to https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013
      
      Note that this only fixes **errors** not **warnings**. Please see my observation https://github.com/apache/spark/pull/17389#issuecomment-288438704 for spurious errors by warnings.
      
      ## How was this patch tested?
      
      Manually via `jekyll build` for building tests. Also, tested via running `./dev/run-tests`.
      
      This was tested via manually adding `time.time()` as below:
      
      ```diff
           profiles_and_goals = build_profiles + sbt_goals
      
           print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
                 " ".join(profiles_and_goals))
      
      +    import time
      +    st = time.time()
           exec_sbt(profiles_and_goals)
      +    print("Elapsed :[%s]" % str(time.time() - st))
      ```
      
      produces
      
      ```
      ...
      ========================================================================
      Building Unidoc API Documentation
      ========================================================================
      ...
      [info] Main Java API documentation successful.
      ...
      Elapsed :[94.8746569157]
      ...
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17477 from HyukjinKwon/SPARK-18692.
      ceaf77ae
  17. Apr 10, 2017
    • Sean Owen's avatar
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish... · a26e3ed5
      Sean Owen authored
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish locale bug" causes Spark problems
      
      ## What changes were proposed in this pull request?
      
      Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
      
      The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17527 from srowen/SPARK-20156.
      a26e3ed5
  18. Apr 06, 2017
    • Kalvin Chau's avatar
      [SPARK-20085][MESOS] Configurable mesos labels for executors · c8fc1f3b
      Kalvin Chau authored
      ## What changes were proposed in this pull request?
      
      Add spark.mesos.task.labels configuration option to add mesos key:value labels to the executor.
      
      The format is "k1:v1,k2:v2": a colon separates each key from its value, and commas separate multiple labels.
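
      A minimal Python sketch of parsing this format follows. The function name `parse_labels` is hypothetical, and the rule for what counts as a malformed entry (anything that is not exactly one non-empty key and one non-empty value) is an assumption; the real parsing is done in Spark's Mesos scheduler code.

```python
def parse_labels(spec):
    """Parse 'k1:v1,k2:v2' into a dict, skipping entries that are not key:value."""
    labels = {}
    for entry in spec.split(","):
        parts = entry.split(":")
        # Assumed rule: keep only entries with exactly one key and one value.
        if len(parts) == 2 and parts[0] and parts[1]:
            labels[parts[0]] = parts[1]
    return labels


if __name__ == "__main__":
    print(parse_labels("k1:v1,k2:v2"))
    print(parse_labels("k1:v1,malformed,k2:v2"))
```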
      
      Discussion of labels with mgummelt at #17404
      
      ## How was this patch tested?
      
      Added unit tests to verify labels were added correctly, with incorrect labels being ignored and added a test to test the name of the executor.
      
      Tested with: `./build/sbt -Pmesos mesos/test`
      
      Author: Kalvin Chau <kalvin.chau@viasat.com>
      
      Closes #17413 from kalvinnchau/mesos-labels.
      c8fc1f3b
  19. Apr 04, 2017
    • Marcelo Vanzin's avatar
      [SPARK-20191][YARN] Create wrapper for RackResolver so tests can override it. · 0736980f
      Marcelo Vanzin authored
      Current test code tries to override the RackResolver used by setting
      configuration params, but because YARN libs statically initialize the
      resolver the first time it's used, that means that those configs don't
      really take effect during Spark tests.
      
      This change adds a wrapper class that easily allows tests to override the
      behavior of the resolver for the Spark code that uses it.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #17508 from vanzin/SPARK-20191.
      0736980f
  20. Mar 29, 2017
    • jerryshao's avatar
      [SPARK-20059][YARN] Use the correct classloader for HBaseCredentialProvider · c622a87c
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Currently we use the system classloader to find HBase jars. If they are specified via `--jars`, this fails with a ClassNotFound error, so this change uses the child classloader instead.
      
      Also put the added jars and the main jar on the classpath of the submitted application in YARN cluster mode; otherwise HBase jars specified with `--jars` are never honored in cluster mode, and fetching tokens on the client side always fails.
      
      ## How was this patch tested?
      
      Unit test and local verification.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17388 from jerryshao/SPARK-20059.
      c622a87c
  21. Mar 28, 2017
    • jerryshao's avatar
      [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of... · 17eddb35
      jerryshao authored
      [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode
      
      ## What changes were proposed in this pull request?
      
      In the current Spark on YARN code, we obtain tokens from the provided services but do not add them to the current user's credentials. As a result, all subsequent operations against these services still require a TGT rather than delegation tokens. This is unnecessary, since we already have the tokens, and it also causes failures in user impersonation scenarios, because the TGT is granted by the real user, not the proxy user.
      
      This change therefore adds all the tokens to the current UGI, so that subsequent operations against these services use the tokens rather than the TGT, which also addresses the proxy user issue mentioned above.
      
      ## How was this patch tested?
      
      Local verified in secure cluster.
      
      vanzin tgravescs mridulm  dongjoon-hyun please help to review, thanks a lot.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17335 from jerryshao/SPARK-19995.
      17eddb35
  22. Mar 26, 2017
    • Juan Rodriguez Hortala's avatar
      logging improvements · 362ee932
      Juan Rodriguez Hortala authored
      ## What changes were proposed in this pull request?
      Adding additional information to existing logging messages:
        - YarnAllocator: log the executor ID together with the container id when a container for an executor is launched.
        - NettyRpcEnv: log the receiver address when there is a timeout waiting for an answer to a remote call.
        - ExecutorAllocationManager: fix a typo in the logging message for the list of executors to be removed.
      
      ## How was this patch tested?
      Build spark and submit the word count example to a YARN cluster using cluster mode
      
      Author: Juan Rodriguez Hortala <hortala@amazon.com>
      
      Closes #17411 from juanrh/logging-improvements.
      362ee932
  23. Mar 25, 2017
    • Kalvin Chau's avatar
      [SPARK-20078][MESOS] Mesos executor configurability for task name and labels · e8ddb91c
      Kalvin Chau authored
      ## What changes were proposed in this pull request?
      
      Adding configurable mesos executor names and labels using `spark.mesos.task.name` and `spark.mesos.task.labels`.
      
      Labels were defined as `k1:v1,k2:v2`.
      
      mgummelt
      
      ## How was this patch tested?
      
      Added unit tests to verify labels were added correctly, with incorrect labels being ignored and added a test to test the name of the executor.
      
      Tested with: `./build/sbt -Pmesos mesos/test`
      
      Author: Kalvin Chau <kalvin.chau@viasat.com>
      
      Closes #17404 from kalvinnchau/mesos-config.
      e8ddb91c
  24. Mar 24, 2017
  25. Mar 23, 2017
    • Ye Yin's avatar
      Typo fixup in comment · b0ae6a38
      Ye Yin authored
      ## What changes were proposed in this pull request?
      
      Fixup typo in comment.
      
      ## How was this patch tested?
      
      Don't need.
      
      Author: Ye Yin <eyniy@qq.com>
      
      Closes #17396 from hustcat/fix.
      b0ae6a38
  26. Mar 10, 2017
  27. Mar 07, 2017
    • Marcelo Vanzin's avatar
      [SPARK-19857][YARN] Correctly calculate next credential update time. · 8e41c2ee
      Marcelo Vanzin authored
      Add parentheses so that both lines form a single statement; also add
      a log message so that the issue becomes more explicit if it shows up
      again.
      
      Tested manually with integration test that exercises the feature.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #17198 from vanzin/SPARK-19857.
      8e41c2ee
    • Michael Gummelt's avatar
      [SPARK-19702][MESOS] Increase default refuse_seconds timeout in the Mesos Spark Dispatcher · 2e30c0b9
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Increase default refuse_seconds timeout, and make it configurable.  See JIRA for details on how this reduces the risk of starvation.
      
      ## How was this patch tested?
      
      Unit tests, Manual testing, and Mesos/Spark integration test suite
      
      cc susanxhuynh skonto jmlvanre
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #17031 from mgummelt/SPARK-19702-suppress-revive.
      2e30c0b9
  28. Feb 28, 2017
    • Michael Gummelt's avatar
      [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on... · ca3864d6
      Michael Gummelt authored
      [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on registered cores rather than accepted cores
      
      ## What changes were proposed in this pull request?
      
      See JIRA
      
      ## How was this patch tested?
      
      Unit tests, Mesos/Spark integration tests
      
      cc skonto susanxhuynh
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #17045 from mgummelt/SPARK-19373-registered-resources.
      ca3864d6
  29. Feb 25, 2017
    • Devaraj K's avatar
      [SPARK-15288][MESOS] Mesos dispatcher should handle gracefully when any thread... · 410392ed
      Devaraj K authored
      [SPARK-15288][MESOS] Mesos dispatcher should handle gracefully when any thread gets UncaughtException
      
      ## What changes were proposed in this pull request?
      
      Adding the default UncaughtExceptionHandler to the MesosClusterDispatcher.
      ## How was this patch tested?
      
      I verified it manually: when any of the dispatcher threads gets an uncaught exception, the default UncaughtExceptionHandler handles it.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #13072 from devaraj-kavali/SPARK-15288.
      410392ed
  30. Feb 24, 2017
    • Jeff Zhang's avatar
      [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propagated to python worker · 330c3e33
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      `self.environment` will be propagated to the executors. PYTHONHASHSEED should be set as long as the Python version is greater than 3.3.
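
      Why the seed matters can be sketched with plain Python, independent of Spark. Python 3.3+ randomizes string hashes per process by default, so separate worker processes only agree on `hash()` values when they share a fixed `PYTHONHASHSEED`. The helper name `hash_in_fresh_interpreter` is hypothetical; each child interpreter stands in for a Python worker.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed=None):
    """Return hash(text) as computed by a newly started Python process."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from the default (randomized) state
    if seed is not None:
        env["PYTHONHASHSEED"] = seed
    completed = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout)


if __name__ == "__main__":
    # With a shared seed, every process computes the same hash for the same key.
    print(hash_in_fresh_interpreter("spark", "0") == hash_in_fresh_interpreter("spark", "0"))
    # Without a seed, the two processes will almost always disagree.
    print(hash_in_fresh_interpreter("spark") == hash_in_fresh_interpreter("spark"))
```

      Operations that partition data by key hash rely on every worker computing the same hash for the same key, which is why the variable has to reach the executors' environment.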
      
      ## How was this patch tested?
      Manually tested it.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #11211 from zjffdu/SPARK-13330.
      330c3e33
    • jerryshao's avatar
      [SPARK-19038][YARN] Avoid overwriting keytab configuration in yarn-client · a920a436
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Because `yarn#client` resets the `spark.yarn.keytab` configuration to point to the keytab's location in the distributed cache, a user who reuses the old `SparkConf` to create a `SparkSession` with Hive enabled will read the keytab from the distributed-cache path. This is fine in yarn cluster mode, but in yarn client mode, where the driver runs outside a container, fetching the keytab fails.
      
      So we should avoid resetting this configuration in `yarn#client` and only overwrite it for the AM, so that `spark.yarn.keytab` yields the correct keytab path whether running in client mode (keytab in the local fs) or cluster mode (keytab in the distributed cache).
      
      ## How was this patch tested?
      
      Verified in security cluster.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #16923 from jerryshao/SPARK-19038.
      a920a436
  31. Feb 22, 2017
    • Marcelo Vanzin's avatar
      [SPARK-19554][UI,YARN] Allow SHS URL to be used for tracking in YARN RM. · 4661d30b
      Marcelo Vanzin authored
      Allow an application to use the History Server URL as the tracking
      URL in the YARN RM, so there's still a link to the web UI somewhere
      in YARN even if the driver's UI is disabled. This is useful, for
      example, if an admin wants to disable the driver UI by default for
      applications, since it's harder to secure it (since it involves non
      trivial ssl certificate and auth management that admins may not want
      to expose to user apps).
      
      This needs to be opt-in, because of the way the YARN proxy works, so
      a new configuration was added to enable the option.
      
      The YARN RM will proxy requests to live AMs instead of redirecting
      the client, so pages in the SHS UI will not render correctly since
      they'll reference invalid paths in the RM UI. The proxy base support
      in the SHS cannot be used since that would prevent direct access to
      the SHS.
      
      So, to solve this problem, for the feature to work end-to-end, a new
      YARN-specific filter was added that detects whether the requests come
      from the proxy and redirects the client appropriately. The SHS admin has
      to add this filter manually if they want the feature to work.
      
      Tested with new unit test, and by running with the documented configuration
      set in a test cluster. Also verified the driver UI is used when it's
      enabled.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16946 from vanzin/SPARK-19554.
      4661d30b
  32. Feb 21, 2017
    • Kent Yao's avatar
      [SPARK-19626][YARN] Using the correct config to set credentials update time · 7363dde6
      Kent Yao authored
      ## What changes were proposed in this pull request?
      
      In https://github.com/apache/spark/pull/14065, we introduced a configurable credential manager for Spark running on YARN. Two configs, `spark.yarn.credentials.renewalTime` and `spark.yarn.credentials.updateTime`, were also added: one for the credential renewer and the other for the updater. However, we currently query `spark.yarn.credentials.renewalTime` by mistake during credentials updating, where `spark.yarn.credentials.updateTime` should be used instead.
      
      This PR fixes this mistake.
      
      ## How was this patch tested?
      
      existing test
      
      cc jerryshao vanzin
      
      Author: Kent Yao <yaooqinn@hotmail.com>
      
      Closes #16955 from yaooqinn/cred_update.
      7363dde6
  33. Feb 19, 2017
    • jinxing's avatar
      [SPARK-19450] Replace askWithRetry with askSync. · ba8912e5
      jinxing authored
      ## What changes were proposed in this pull request?
      
      `askSync` is already added in `RpcEndpointRef` (see SPARK-19347 and https://github.com/apache/spark/pull/16690#issuecomment-276850068) and `askWithRetry` is marked as deprecated.
      As mentioned in SPARK-18113 (https://github.com/apache/spark/pull/16503#event-927953218):
      
      >askWithRetry is basically an unneeded API, and a leftover from the akka days that doesn't make sense anymore. It's prone to cause deadlocks (exactly because it's blocking), it imposes restrictions on the caller (e.g. idempotency) and other things that people generally don't pay that much attention to when using it.
      
      Since `askWithRetry` is only used inside Spark and not in user logic, it makes sense to replace all uses of it with `askSync`.
      
      ## How was this patch tested?
      This PR doesn't change code logic, existing unit test can cover.
      
      Author: jinxing <jinxing@meituan.com>
      
      Closes #16790 from jinxing64/SPARK-19450.
      ba8912e5
  34. Feb 16, 2017
    • Sean Owen's avatar
      [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, sql and remove
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
      0e240549
  35. Feb 14, 2017