  1. Apr 27, 2017
    • [SPARK-20483][MINOR] Test for Mesos Coarse mode may starve other Mesos frameworks · 039e32ca
      Davis Shepherd authored
      ## What changes were proposed in this pull request?
      
      Add test cases for scenarios where executor.cores is set as a
      divisor or non-divisor of spark.cores.max.
      This tests the change in #17786.
      
      ## How was this patch tested?
      
      Ran the existing test suite with the new tests
      
      dbtsai
      
      Author: Davis Shepherd <dshepherd@netflix.com>
      
      Closes #17788 from dgshep/add_mesos_test.
    • [SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks · 7633933e
      Davis Shepherd authored
      ## What changes were proposed in this pull request?
      
      Set maxCores to be a multiple of the smallest executor that can be launched. This ensures that we correctly detect the condition where no more executors will be launched when spark.cores.max is not a multiple of spark.executor.cores.
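
      The arithmetic can be sketched as follows (a minimal sketch with hypothetical helper names, not Spark's actual code): with spark.cores.max = 10 and spark.executor.cores = 3, after three executors only one core of quota remains, so no further executor can ever launch, yet the leftover quota previously kept the framework holding offers.

```python
# Sketch of the divisibility condition; names are illustrative.
def effective_max_cores(cores_max, executor_cores):
    # Round the quota down to a multiple of the smallest launchable executor.
    return (cores_max // executor_cores) * executor_cores

def no_more_executors(acquired_cores, cores_max, executor_cores):
    # True once the remaining quota cannot fit even one more executor.
    return cores_max - acquired_cores < executor_cores

print(effective_max_cores(10, 3))   # 9: one core of quota is unusable
print(no_more_executors(9, 10, 3))  # True: stop holding offers
```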
      
      ## How was this patch tested?
      
      This was manually tested with other sample frameworks measuring their incoming offers to determine if starvation would occur.
      
      dbtsai mgummelt
      
      Author: Davis Shepherd <dshepherd@netflix.com>
      
      Closes #17786 from dgshep/fix_mesos_max_cores.
  2. Apr 26, 2017
    • [SPARK-20435][CORE] More thorough redaction of sensitive information · 66636ef0
      Mark Grover authored
      This change does a more thorough redaction of sensitive information from logs and the UI, and adds unit tests to ensure that no regressions leak sensitive information into the logs.
      
      The motivation for this change was appearance of password like so in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations:
      `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..."`
      Previously, the redaction logic checked only whether the key matched the secret regex pattern; if it did, its value was redacted. That worked for most cases. However, in the above case the key (sun.java.command) doesn't tell much, so the value needs to be searched as well. This PR expands the check to cover values too.
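
      A minimal sketch of the idea (Python, hypothetical names; not Spark's actual implementation): a key matching the secret pattern has its whole value redacted, and a `key=value` fragment embedded inside any other value is redacted too.

```python
import re

SECRET_RE = re.compile(r"(?i)secret|password|token")
# Redact "<something-sensitive>=<value>" fragments embedded in a value.
VALUE_RE = re.compile(r"(?i)(\S*(?:secret|password|token)\S*=)\S+")
REDACTED = "*********(redacted)"

def redact(conf):
    out = {}
    for key, value in conf.items():
        if SECRET_RE.search(key):
            out[key] = REDACTED  # sensitive key: hide the whole value
        else:
            # key looks harmless: search the value itself
            out[key] = VALUE_RE.sub(r"\g<1>" + REDACTED, value)
    return out

cmd = ("org.apache.spark.deploy.SparkSubmit --conf "
       "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password --class Foo")
print(redact({"sun.java.command": cmd})["sun.java.command"])
```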
      
      ## How was this patch tested?
      
      New unit tests were added to ensure that no sensitive information is present in the event logs or the YARN logs. An old unit test in UtilsSuite was modified because it asserted that a non-sensitive property's value would not be redacted; however, that value contained the literal "secret", which caused it to be redacted. Simply updating the non-sensitive property's value to another arbitrary value (without "secret" in it) fixed the test.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #17725 from markgrover/spark-20435.
  3. Apr 24, 2017
  4. Apr 23, 2017
    • [SPARK-20385][WEB-UI] Submitted Time' field, the date format needs to be... · 2eaf4f3f
      郭小龙 10207633 authored
      [SPARK-20385][WEB-UI] The 'Submitted Time' field needs a formatted date in the Running Drivers and Completed Drivers tables of the master web UI.
      
      ## What changes were proposed in this pull request?
      The 'Submitted Time' field needs a **formatted date** in the Running Drivers and Completed Drivers tables of the master web UI.
      Before the fix, e.g.:
      
      Completed Drivers

      | Submission ID | **Submitted Time** | Worker | State | Cores | Memory | Main Class |
      | --- | --- | --- | --- | --- | --- | --- |
      | driver-20170419145755-0005 | **Wed Apr 19 14:57:55 CST 2017** | worker-20170419145250-zdh120-40412 | FAILED | 1 | 1024.0 MB | cn.zte.HdfsTest |
      
      See the attachment: https://issues.apache.org/jira/secure/attachment/12863977/before_fix.png
      
      After the fix, e.g.:
      
      Completed Drivers

      | Submission ID | **Submitted Time** | Worker | State | Cores | Memory | Main Class |
      | --- | --- | --- | --- | --- | --- | --- |
      | driver-20170419145755-0006 | **2017/04/19 16:01:25** | worker-20170419145250-zdh120-40412 | FAILED | 1 | 1024.0 MB | cn.zte.HdfsTest |
      
      See the attachment: https://issues.apache.org/jira/secure/attachment/12863976/after_fix.png
      
      The 'Submitted Time' field already has a **formatted date** in the Running Applications and Completed Applications tables of the master web UI, **and it is correct.**
      e.g.
      Running Applications

      | Application ID | Name | Cores | Memory per Executor | **Submitted Time** | User | State | Duration |
      | --- | --- | --- | --- | --- | --- | --- | --- |
      | app-20170419160910-0000 (kill) | SparkSQL::10.43.183.120 | 1 | 5.0 GB | **2017/04/19 16:09:10** | root | RUNNING | 53 s |
      
      **Formatting the time makes it easier to read and consistent with the applications tables, so I think it is worth fixing.**
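
      The formatting change itself amounts to rendering the timestamp with an explicit pattern instead of the default `Date.toString` output; a Python sketch of the target format:

```python
from datetime import datetime

def format_submitted_time(dt):
    # "Wed Apr 19 14:57:55 CST 2017" -> "2017/04/19 14:57:55"
    return dt.strftime("%Y/%m/%d %H:%M:%S")

print(format_submitted_time(datetime(2017, 4, 19, 16, 1, 25)))  # 2017/04/19 16:01:25
```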
      
      ## How was this patch tested?
      
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: 郭小龙 10207633 <guo.xiaolong1@zte.com.cn>
      Author: guoxiaolong <guo.xiaolong1@zte.com.cn>
      Author: guoxiaolongzte <guo.xiaolong1@zte.com.cn>
      
      Closes #17682 from guoxiaolongzte/SPARK-20385.
  5. Apr 17, 2017
    • Typo fix: distitrbuted -> distributed · 0075562d
      Andrew Ash authored
      ## What changes were proposed in this pull request?
      
      Typo fix: distitrbuted -> distributed
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #17664 from ash211/patch-1.
  6. Apr 16, 2017
    • [SPARK-19740][MESOS] Add support in Spark to pass arbitrary parameters into... · a888fed3
      Ji Yan authored
      [SPARK-19740][MESOS] Add support in Spark to pass arbitrary parameters into docker when running on mesos with docker containerizer
      
      ## What changes were proposed in this pull request?
      
      Allow passing arbitrary parameters into docker when launching Spark executors on Mesos with the docker containerizer. tnachen
      
      ## How was this patch tested?
      
      Manually built and tested with passed-in parameters.
      
      Author: Ji Yan <jiyan@Jis-MacBook-Air.local>
      
      Closes #17109 from yanji84/ji/allow_set_docker_user.
  7. Apr 12, 2017
    • [SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins · ceaf77ae
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes running Spark unidoc to test the Javadoc 8 build, as Javadoc 8 is easily re-broken.
      
      There are several problems with it:
      
      - It adds a little extra time to the test run. In my case, it took 1.5 mins more (`Elapsed :[94.8746569157]`). How this was measured is described in "How was this patch tested?".
      
      - > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.
      
        (see  joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627))
      
      To keep this automated build passing, this PR also fixes existing Javadoc breaks, including ones introduced by test code as described above.
      
      These fixes are similar to instances previously fixed. Please refer to https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013.
      
      Note that this only fixes **errors** not **warnings**. Please see my observation https://github.com/apache/spark/pull/17389#issuecomment-288438704 for spurious errors by warnings.
      
      ## How was this patch tested?
      
      Manually via `jekyll build` for building tests. Also, tested via running `./dev/run-tests`.
      
      This was tested via manually adding `time.time()` as below:
      
      ```diff
           profiles_and_goals = build_profiles + sbt_goals
      
           print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
                 " ".join(profiles_and_goals))
      
      +    import time
      +    st = time.time()
           exec_sbt(profiles_and_goals)
      +    print("Elapsed :[%s]" % str(time.time() - st))
      ```
      
      produces
      
      ```
      ...
      ========================================================================
      Building Unidoc API Documentation
      ========================================================================
      ...
      [info] Main Java API documentation successful.
      ...
      Elapsed :[94.8746569157]
      ...
      ```

      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17477 from HyukjinKwon/SPARK-18692.
  8. Apr 10, 2017
    • [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish... · a26e3ed5
      Sean Owen authored
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish locale bug" causes Spark problems
      
      ## What changes were proposed in this pull request?
      
      Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
      
      The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
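
      To illustrate the bug being guarded against, here is a Python model of Java's locale-sensitive behavior (`java_lower` is a hypothetical stand-in; Python's own `str.lower` is locale-independent): under a Turkish locale, "I" lowercases to dotless "ı" (U+0131), so internal comparisons like `"INFO".toLowerCase()` stop matching.

```python
def java_lower(s, locale="ROOT"):
    # Hypothetical model of java.lang.String.toLowerCase(Locale).
    if locale == "tr":
        # Turkish case mapping: I -> dotless ı, İ -> i
        s = s.replace("I", "\u0131").replace("\u0130", "i")
    return s.lower()

print(java_lower("INFO"))        # info
print(java_lower("INFO", "tr"))  # ınfo -- no longer equals "info"
```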
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17527 from srowen/SPARK-20156.
  9. Apr 06, 2017
    • [SPARK-20085][MESOS] Configurable mesos labels for executors · c8fc1f3b
      Kalvin Chau authored
      ## What changes were proposed in this pull request?
      
      Add a spark.mesos.task.labels configuration option to attach Mesos key:value labels to the executor.

      The format is "k1:v1,k2:v2": a colon separates each key from its value, and commas separate multiple labels.
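
      A sketch of parsing that format (a hypothetical helper, not the patch's actual Scala code): commas split the list, a colon splits key from value, and malformed entries are dropped, matching the ignore-incorrect-labels behavior the unit tests exercise.

```python
def parse_labels(spec):
    # "k1:v1,k2:v2" -> {"k1": "v1", "k2": "v2"}
    labels = {}
    for pair in spec.split(","):
        key, sep, value = pair.partition(":")
        if sep and key:
            labels[key] = value  # well-formed "key:value"
        # entries without a colon (or with an empty key) are ignored
    return labels

print(parse_labels("k1:v1,k2:v2"))  # {'k1': 'v1', 'k2': 'v2'}
```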
      
      Discussion of labels with mgummelt at #17404
      
      ## How was this patch tested?
      
      Added unit tests to verify labels were added correctly, with incorrect labels being ignored, and added a test for the executor name.
      
      Tested with: `./build/sbt -Pmesos mesos/test`
      
      
      Author: Kalvin Chau <kalvin.chau@viasat.com>
      
      Closes #17413 from kalvinnchau/mesos-labels.
  10. Apr 04, 2017
    • [SPARK-20191][YARN] Create wrapper for RackResolver so tests can override it. · 0736980f
      Marcelo Vanzin authored
      The current test code tries to override the RackResolver by setting
      configuration params, but because the YARN libraries statically
      initialize the resolver the first time it is used, those configs
      don't really take effect during Spark tests.
      
      This change adds a wrapper class that easily allows tests to override the
      behavior of the resolver for the Spark code that uses it.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #17508 from vanzin/SPARK-20191.
  11. Mar 29, 2017
    • [SPARK-20059][YARN] Use the correct classloader for HBaseCredentialProvider · c622a87c
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Currently we use the system classloader to find HBase jars, so if they are specified with `--jars` the lookup fails with a ClassNotFound issue. This change switches to the child classloader.

      It also puts the added jars and the main jar on the classpath of the submitted application in yarn-cluster mode; otherwise HBase jars specified with `--jars` are never honored in cluster mode, and fetching tokens on the client side always fails.
      
      ## How was this patch tested?
      
      Unit test and local verification.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17388 from jerryshao/SPARK-20059.
  12. Mar 28, 2017
    • [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of... · 17eddb35
      jerryshao authored
      [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode
      
      ## What changes were proposed in this pull request?
      
      In the current Spark on YARN code, we obtain tokens from the provided services but never add them to the current user's credentials. As a result, all subsequent operations against those services still require a TGT rather than delegation tokens. This is unnecessary, since we already have the tokens, and it also breaks user impersonation, because the TGT is granted to the real user, not the proxy user.

      So this change adds all the tokens to the current UGI, so that subsequent operations against these services honor the tokens rather than the TGT, which also handles the proxy-user issue mentioned above.
      
      ## How was this patch tested?
      
      Verified locally in a secure cluster.
      
      vanzin tgravescs mridulm  dongjoon-hyun please help to review, thanks a lot.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17335 from jerryshao/SPARK-19995.
  13. Mar 26, 2017
    • logging improvements · 362ee932
      Juan Rodriguez Hortala authored
      ## What changes were proposed in this pull request?
      Adding additional information to existing logging messages:
        - YarnAllocator: log the executor ID together with the container id when a container for an executor is launched.
        - NettyRpcEnv: log the receiver address when there is a timeout waiting for an answer to a remote call.
        - ExecutorAllocationManager: fix a typo in the logging message for the list of executors to be removed.
      
      ## How was this patch tested?
      Built Spark and submitted the word-count example to a YARN cluster in cluster mode.
      
      Author: Juan Rodriguez Hortala <hortala@amazon.com>
      
      Closes #17411 from juanrh/logging-improvements.
  14. Mar 25, 2017
    • [SPARK-20078][MESOS] Mesos executor configurability for task name and labels · e8ddb91c
      Kalvin Chau authored
      ## What changes were proposed in this pull request?
      
      Adding configurable mesos executor names and labels using `spark.mesos.task.name` and `spark.mesos.task.labels`.
      
      Labels were defined as `k1:v1,k2:v2`.
      
      mgummelt
      
      ## How was this patch tested?
      
      Added unit tests to verify labels were added correctly, with incorrect labels being ignored, and added a test for the executor name.
      
      Tested with: `./build/sbt -Pmesos mesos/test`
      
      
      Author: Kalvin Chau <kalvin.chau@viasat.com>
      
      Closes #17404 from kalvinnchau/mesos-config.
  15. Mar 24, 2017
  16. Mar 23, 2017
    • Typo fixup in comment · b0ae6a38
      Ye Yin authored
      ## What changes were proposed in this pull request?
      
      Fix a typo in a comment.
      
      ## How was this patch tested?
      
      Not needed.
      
      Author: Ye Yin <eyniy@qq.com>
      
      Closes #17396 from hustcat/fix.
  17. Mar 10, 2017
  18. Mar 07, 2017
    • [SPARK-19857][YARN] Correctly calculate next credential update time. · 8e41c2ee
      Marcelo Vanzin authored
      Add parentheses so that both lines form a single statement; also add
      a log message so that the issue becomes more explicit if it shows up
      again.
      
      Tested manually with integration test that exercises the feature.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #17198 from vanzin/SPARK-19857.
    • [SPARK-19702][MESOS] Increase default refuse_seconds timeout in the Mesos Spark Dispatcher · 2e30c0b9
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Increase default refuse_seconds timeout, and make it configurable.  See JIRA for details on how this reduces the risk of starvation.
      
      ## How was this patch tested?
      
      Unit tests, Manual testing, and Mesos/Spark integration test suite
      
      cc susanxhuynh skonto jmlvanre
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #17031 from mgummelt/SPARK-19702-suppress-revive.
  19. Feb 28, 2017
    • [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on... · ca3864d6
      Michael Gummelt authored
      [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on registered cores rather than accepted cores
      
      ## What changes were proposed in this pull request?
      
      See JIRA
      
      ## How was this patch tested?
      
      Unit tests, Mesos/Spark integration tests
      
      cc skonto susanxhuynh
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #17045 from mgummelt/SPARK-19373-registered-resources.
  20. Feb 25, 2017
    • [SPARK-15288][MESOS] Mesos dispatcher should handle gracefully when any thread... · 410392ed
      Devaraj K authored
      [SPARK-15288][MESOS] Mesos dispatcher should handle gracefully when any thread gets UncaughtException
      
      ## What changes were proposed in this pull request?
      
      Adding the default UncaughtExceptionHandler to the MesosClusterDispatcher.
      ## How was this patch tested?
      
      I verified it manually: when any dispatcher thread hits an uncaught exception, the default UncaughtExceptionHandler now handles it.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #13072 from devaraj-kavali/SPARK-15288.
  21. Feb 24, 2017
    • [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propagated to python worker · 330c3e33
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      self.environment will be propagated to the executor. PYTHONHASHSEED should be set as long as the Python version is greater than 3.3.
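
      Why this matters, as a small self-contained demonstration (plain Python, not PySpark code): Python 3 randomizes string hashing per process, so two workers disagree on `hash("key")` unless PYTHONHASHSEED is pinned in the environment propagated to them.

```python
import os
import subprocess
import sys

code = 'print(hash("spark"))'
env = dict(os.environ, PYTHONHASHSEED="0")  # pinned seed, as if set in self.environment

# Two separate interpreter processes, same seed -> same hash.
h1 = subprocess.run([sys.executable, "-c", code], env=env,
                    capture_output=True, text=True).stdout.strip()
h2 = subprocess.run([sys.executable, "-c", code], env=env,
                    capture_output=True, text=True).stdout.strip()
print(h1 == h2)  # True; without the pinned seed, the two runs would typically differ
```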
      
      ## How was this patch tested?
      Manually tested it.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #11211 from zjffdu/SPARK-13330.
    • [SPARK-19038][YARN] Avoid overwriting keytab configuration in yarn-client · a920a436
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Because yarn#client resets the `spark.yarn.keytab` configuration to point to the keytab's location in the distributed cache, a user who reuses the old `SparkConf` to create a `SparkSession` with Hive enabled will read the keytab from that distributed-cache path. This is fine for yarn-cluster mode, but in yarn-client mode, where the driver runs outside a container, fetching the keytab fails.

      So we should avoid resetting this configuration in yarn#client and only overwrite it for the AM, so that `spark.yarn.keytab` yields the correct keytab path whether running in client mode (keytab on the local FS) or cluster mode (keytab in the distributed cache).
      
      ## How was this patch tested?
      
      Verified in a secure cluster.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #16923 from jerryshao/SPARK-19038.
  22. Feb 22, 2017
    • [SPARK-19554][UI,YARN] Allow SHS URL to be used for tracking in YARN RM. · 4661d30b
      Marcelo Vanzin authored
      Allow an application to use the History Server URL as the tracking
      URL in the YARN RM, so there's still a link to the web UI somewhere
      in YARN even if the driver's UI is disabled. This is useful, for
      example, if an admin wants to disable the driver UI by default for
      applications, since it is harder to secure (it involves non-trivial
      SSL certificate and auth management that admins may not want to
      expose to user apps).
      
      This needs to be opt-in, because of the way the YARN proxy works, so
      a new configuration was added to enable the option.
      
      The YARN RM will proxy requests to live AMs instead of redirecting
      the client, so pages in the SHS UI will not render correctly since
      they'll reference invalid paths in the RM UI. The proxy base support
      in the SHS cannot be used since that would prevent direct access to
      the SHS.
      
      So, to solve this problem and make the feature work end-to-end, a new
      YARN-specific filter was added that detects whether requests come
      from the proxy and redirects the client appropriately. The SHS admin has
      to add this filter manually if they want the feature to work.
      
      Tested with new unit test, and by running with the documented configuration
      set in a test cluster. Also verified the driver UI is used when it's
      enabled.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16946 from vanzin/SPARK-19554.
  23. Feb 21, 2017
    • [SPARK-19626][YARN] Using the correct config to set credentials update time · 7363dde6
      Kent Yao authored
      ## What changes were proposed in this pull request?
      
      In https://github.com/apache/spark/pull/14065, we introduced a configurable credential manager for Spark running on YARN, along with two configs: `spark.yarn.credentials.renewalTime` for the credential renewer and `spark.yarn.credentials.updateTime` for the updater. But we mistakenly query `spark.yarn.credentials.renewalTime` during credential updating, where `spark.yarn.credentials.updateTime` should actually be used.
      
      This PR fixes this mistake.
      
      ## How was this patch tested?
      
      existing test
      
      cc jerryshao vanzin
      
      Author: Kent Yao <yaooqinn@hotmail.com>
      
      Closes #16955 from yaooqinn/cred_update.
  24. Feb 19, 2017
    • [SPARK-19450] Replace askWithRetry with askSync. · ba8912e5
      jinxing authored
      ## What changes were proposed in this pull request?
      
      `askSync` is already added in `RpcEndpointRef` (see SPARK-19347 and https://github.com/apache/spark/pull/16690#issuecomment-276850068) and `askWithRetry` is marked as deprecated.
      As mentioned SPARK-18113(https://github.com/apache/spark/pull/16503#event-927953218):
      
      >askWithRetry is basically an unneeded API, and a leftover from the akka days that doesn't make sense anymore. It's prone to cause deadlocks (exactly because it's blocking), it imposes restrictions on the caller (e.g. idempotency) and other things that people generally don't pay that much attention to when using it.
      
      Since `askWithRetry` is only used inside Spark and not in user logic, it makes sense to replace all of its uses with `askSync`.
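
      An illustrative model of the semantic difference (Python, hypothetical names; Spark's RPC layer is Scala): `askWithRetry` re-sends the message after a timeout, so the remote endpoint must tolerate duplicates, which is exactly the idempotency restriction quoted above, while `askSync` sends once and blocks for the single reply.

```python
def ask_with_retry(send, retries=3):
    # Re-sends on timeout: the message may be delivered more than once.
    for _ in range(retries):
        try:
            return send()
        except TimeoutError:
            continue
    raise TimeoutError("all retries timed out")

deliveries = []
def flaky_endpoint():
    deliveries.append(1)       # endpoint sees the message
    if len(deliveries) < 2:
        raise TimeoutError     # first reply is lost
    return "reply"

print(ask_with_retry(flaky_endpoint))  # reply
print(len(deliveries))                 # 2: the endpoint handled the message twice
```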
      
      ## How was this patch tested?
      This PR doesn't change code logic; existing unit tests cover it.
      
      Author: jinxing <jinxing@meituan.com>
      
      Closes #16790 from jinxing64/SPARK-19450.
  25. Feb 16, 2017
    • [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, sql and remove
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
  26. Feb 14, 2017
  27. Feb 10, 2017
    • [SPARK-10748][MESOS] Log error instead of crashing Spark Mesos dispatcher when... · 8640dc08
      Devaraj K authored
      [SPARK-10748][MESOS] Log error instead of crashing Spark Mesos dispatcher when a job is misconfigured
      
      ## What changes were proposed in this pull request?
      
      We now handle the Spark exception thrown for an invalid job configuration, mark that job as failed, and continue launching the other drivers instead of letting the exception propagate.
      ## How was this patch tested?
      
      I verified manually; misconfigured jobs now move to the Finished Drivers section in the UI while the other jobs continue to launch.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #13077 from devaraj-kavali/SPARK-10748.
    • [SPARK-19545][YARN] Fix compile issue for Spark on Yarn when building against Hadoop 2.6.0~2.6.3 · 8e8afb3a
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Due to an API newly added in Hadoop 2.6.4+, Spark builds against Hadoop 2.6.0~2.6.3 hit a compile error. So here we revert back to using reflection to handle this issue.
      
      ## How was this patch tested?
      
      Manual verification.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #16884 from jerryshao/SPARK-19545.
  28. Feb 08, 2017
    • [SPARK-19464][BUILD][HOTFIX][TEST-HADOOP2.6] Add back mockito test dep in YARN... · 15627ac7
      Sean Owen authored
      [SPARK-19464][BUILD][HOTFIX][TEST-HADOOP2.6] Add back mockito test dep in YARN module, as it ends up being required in a Maven build
      
      Add back mockito test dep in YARN module, as it ends up being required in a Maven build
      
      ## How was this patch tested?
      
      PR builder again, but also a local `mvn` run using the command that the broken Jenkins job uses
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16853 from srowen/SPARK-19464.2.
    • [SPARK-19409][BUILD][TEST-MAVEN] Fix ParquetAvroCompatibilitySuite failure due... · 0077bfcb
      Dongjoon Hyun authored
      [SPARK-19409][BUILD][TEST-MAVEN] Fix ParquetAvroCompatibilitySuite failure due to test dependency on avro
      
      ## What changes were proposed in this pull request?
      
      After using Apache Parquet 1.8.2, `ParquetAvroCompatibilitySuite` fails on **Maven** test. It is because `org.apache.parquet.avro.AvroParquetWriter` in the test code used new `avro 1.8.0` specific class, `LogicalType`. This PR aims to fix the test dependency of `sql/core` module to use avro 1.8.0.
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/2530/consoleFull
      
      ```
      ParquetAvroCompatibilitySuite:
      *** RUN ABORTED ***
        java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
        at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:144)
      ```
      
      ## How was this patch tested?
      
      Pass the existing test with **Maven**.
      
      ```
      $ build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver test
      ...
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD SUCCESS
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time: 02:07 h
      [INFO] Finished at: 2017-02-04T05:41:43+00:00
      [INFO] Final Memory: 77M/987M
      [INFO] ------------------------------------------------------------------------
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #16795 from dongjoon-hyun/SPARK-19409-2.
    • [SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier · e8d3fca4
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove support for Hadoop 2.5 and earlier
      - Remove reflection and code constructs only needed to support multiple versions at once
      - Update docs to reflect newer versions
      - Remove older versions' builds and profiles.
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16810 from srowen/SPARK-19464.
  29. Jan 25, 2017
  30. Jan 24, 2017
    • [SPARK-19139][CORE] New auth mechanism for transport library. · 8f3f73ab
      Marcelo Vanzin authored
      This change introduces a new auth mechanism to the transport library,
      to be used when users enable strong encryption. This auth mechanism
      has better security than the currently used DIGEST-MD5.
      
      The new protocol uses symmetric key encryption to mutually authenticate
      the endpoints, and is very loosely based on ISO/IEC 9798.
      
      The new protocol falls back to SASL when it thinks the remote end is old.
      Because SASL does not support asking the server for multiple auth
      protocols (which would have let us re-use the existing SASL code by
      just adding a new SASL provider), the protocol is implemented outside
      of the SASL API to avoid the boilerplate of adding a new provider.
      
      Details of the auth protocol are discussed in the included README.md
      file.
      
      This change partly undoes the changes added in SPARK-13331; AES
      encryption is now decoupled from SASL authentication. The encryption
      code itself, though, has been re-used as part of this change.
      
      ## How was this patch tested?
      
      - Unit tests
      - Tested Spark 2.2 against Spark 1.6 shuffle service with SASL enabled
      - Tested Spark 2.2 against Spark 2.2 shuffle service with SASL fallback disabled
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16521 from vanzin/SPARK-19139.
  31. Jan 18, 2017
  32. Jan 17, 2017
    • [SPARK-19179][YARN] Change spark.yarn.access.namenodes config and update docs · b79cc7ce
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      The name of the `spark.yarn.access.namenodes` configuration does not actually reflect its usage: in the code it identifies the Hadoop filesystems we obtain tokens for, not NameNodes. So this proposes updating the configuration's name and changing the related code and docs.
      
      ## How was this patch tested?
      
      Local verification.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #16560 from jerryshao/SPARK-19179.
    • [MINOR][YARN] Move YarnSchedulerBackendSuite to resource-managers/yarn directory. · 84f0b645
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      #16092 moved YARN resource-manager-related code to the resource-managers/yarn directory. The test case ```YarnSchedulerBackendSuite``` was added after that, but in the wrong place. This PR moves it to the correct directory.
      
      ## How was this patch tested?
      Existing test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16595 from yanboliang/yarn.
  33. Jan 11, 2017
    • [SPARK-19021][YARN] Generalize HDFSCredentialProvider to support non-HDFS secure filesystems · 4239a108
      jerryshao authored
      Currently Spark can only get the token renewal interval from secure HDFS (hdfs://); if Spark runs with other secure filesystems such as webHDFS (webhdfs://), WASB (wasb://), or ADLS, it ignores those tokens and does not get renewal intervals from them. This makes Spark unable to work with these secure clusters. So instead of checking only the HDFS token, we should generalize to support different DelegationTokenIdentifiers.
      
      ## How was this patch tested?
      
      Manually verified in security cluster.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #16432 from jerryshao/SPARK-19021.