  1. Jun 07, 2015
  2. Jun 06, 2015
    • Hari Shreedharan's avatar
      [SPARK-7955] [CORE] Ensure executors with cached RDD blocks are not re… · 3285a511
      Hari Shreedharan authored
      …moved if dynamic allocation is enabled.
      
      This is a work in progress. This patch ensures that an executor that has cached RDD blocks is not removed,
      but makes no attempt to find another executor to remove. This is meant to get some feedback on the current
      approach, and if it makes sense then I will look at choosing another executor to remove. No testing has been done either.
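
The rule described above can be sketched roughly as follows. This is a minimal Python model, not Spark's actual code: the function name, its parameters, and the default timeout value are all assumptions made for illustration. The squashed commits below only indicate that executors with cached blocks get their own timeout and that broadcast blocks are not counted as cached blocks.

```python
from typing import Optional


def should_remove_executor(idle_seconds: float,
                           has_cached_rdd_blocks: bool,
                           idle_timeout: float = 60.0,
                           cached_idle_timeout: Optional[float] = None) -> bool:
    """Decide whether an idle executor may be removed (hypothetical model).

    has_cached_rdd_blocks is assumed to ignore broadcast blocks already.
    cached_idle_timeout=None means an executor holding cached RDD blocks
    is never removed just for being idle.
    """
    if has_cached_rdd_blocks:
        if cached_idle_timeout is None:
            return False
        return idle_seconds >= cached_idle_timeout
    return idle_seconds >= idle_timeout
```

With these assumed defaults, an executor that has been idle for 120 seconds is removable only if it holds no cached RDD blocks.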
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6508 from harishreedharan/dymanic-caching and squashes the following commits:
      
      dddf1eb [Hari Shreedharan] Minor configuration description update.
      10130e2 [Hari Shreedharan] Fix compile issue.
      5417b53 [Hari Shreedharan] Add documentation for new config. Remove block from cachedBlocks when it is dropped.
      875916a [Hari Shreedharan] Make some code more readable.
      39940ca [Hari Shreedharan] Handle the case where the executor has not yet registered.
      90ad711 [Hari Shreedharan] Remove unused imports and unused methods.
      063985c [Hari Shreedharan] Send correct message instead of recursively calling same method.
      ec2fd7e [Hari Shreedharan] Add file missed in last commit
      5d10fad [Hari Shreedharan] Update cached blocks status using local info, rather than doing an RPC.
      193af4c [Hari Shreedharan] WIP. Use local state rather than via RPC.
      ae932ff [Hari Shreedharan] Fix config param name.
      272969d [Hari Shreedharan] Fix seconds to millis bug.
      5a1993f [Hari Shreedharan] Add timeout for cache executors. Ignore broadcast blocks while checking if there are cached blocks.
      57fefc2 [Hari Shreedharan] [SPARK-7955][Core] Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.
      3285a511
    • Marcelo Vanzin's avatar
      [SPARK-7169] [CORE] Allow metrics system to be configured through SparkConf. · 18c4fceb
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #6560 from vanzin/SPARK-7169 and squashes the following commits:
      
      737266f [Marcelo Vanzin] Feedback.
      702d5a3 [Marcelo Vanzin] Scalastyle.
      ce66e7e [Marcelo Vanzin] Remove metrics config handling from SparkConf.
      439938a [Jacek Lewandowski] SPARK-7169: Metrics can be additionally configured from Spark configuration
      18c4fceb
    • Xu Tingjun's avatar
      [SPARK-6973] remove skipped stage ID from completed set on the allJobsPage · a8077e5c
      Xu Tingjun authored
      Although totalStages = allStages - skippedStages is understandable, given the problem in [SPARK-6973] I think totalStages = allStages is more reasonable. An entry such as "2/1 (2 failed) (1 skipped)" also shows the number of skipped stages, so it remains understandable.
      
      Author: Xu Tingjun <xutingjun@huawei.com>
      Author: Xutingjun <xutingjun@huawei.com>
      Author: meiyoula <1039320815@qq.com>
      
      Closes #5550 from XuTingjun/allJobsPage and squashes the following commits:
      
      a742541 [Xu Tingjun] delete the loop
      40ce94b [Xutingjun] remove stage id from completed set if it retries again
      6459238 [meiyoula] delete space
      9e23c71 [Xu Tingjun] recover numSkippedStages
      b987ea7 [Xutingjun] delete skipped stages from completed set
      47525c6 [Xu Tingjun] modify total stages/tasks on the allJobsPage
      a8077e5c
  3. Jun 05, 2015
    • jerryshao's avatar
      [SPARK-7699] [CORE] Lazy start the scheduler for dynamic allocation · 3f80bc84
      jerryshao authored
      This patch proposes lazily starting the scheduler for dynamic allocation, to avoid ramping down the number of executors too quickly when the load is low.
      
      This implementation will:
      1. Start the scheduler immediately if `numExecutorsTarget` is 0; this is the expected behavior.
      2. If `numExecutorsTarget` is not zero, start the scheduler once that number is satisfied. If the load is low, the initially started executors will last for at least 60 seconds, so the user has a window to submit a job without having to ramp the executors up again.
      3. If `numExecutorsTarget` is not satisfied before the timeout, resources are insufficient; the scheduler starts at the timeout rather than waiting indefinitely.
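
The three cases above can be condensed into one predicate. This is only a sketch under assumptions: the function and parameter names are hypothetical, and the timeout is passed in explicitly because the description does not state its value.

```python
def should_start_scheduler(num_executors_target: int,
                           registered_executors: int,
                           waited_seconds: float,
                           timeout_seconds: float) -> bool:
    """Return True once the allocation scheduler should start (toy model)."""
    # Case 1: nothing to wait for, start right away.
    if num_executors_target == 0:
        return True
    # Case 2: the requested executors have all registered.
    if registered_executors >= num_executors_target:
        return True
    # Case 3: give up waiting once the timeout has elapsed.
    return waited_seconds >= timeout_seconds
```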
      
      Please help to review, thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #6430 from jerryshao/SPARK-7699 and squashes the following commits:
      
      02cac8e [jerryshao] Address the comments
      7242450 [jerryshao] Remove the useless import
      ecc0b00 [jerryshao] Address the comments
      6f75f00 [jerryshao] Style changes
      8b8decc [jerryshao] change the test name
      fb822ca [jerryshao] Change the solution according to comments
      1cc74e5 [jerryshao] Lazy start the scheduler for dynamic allocation
      3f80bc84
    • Xutingjun's avatar
      [SPARK-8099] set executor cores into system in yarn-cluster mode · 0992a0a7
      Xutingjun authored
      Author: Xutingjun <xutingjun@huawei.com>
      Author: xutingjun <xutingjun@huawei.com>
      
      Closes #6643 from XuTingjun/SPARK-8099 and squashes the following commits:
      
      80b18cd [Xutingjun] change to STANDALONE | YARN
      ce33148 [Xutingjun] set executor cores into system
      e51cc9e [Xutingjun] set executor cores into system
      0600861 [xutingjun] set executor cores into system
      0992a0a7
    • Andrew Or's avatar
      Revert "[MINOR] [BUILD] Use custom temp directory during build." · 4036d05c
      Andrew Or authored
      This reverts commit b16b5434.
      4036d05c
    • Marcelo Vanzin's avatar
      [SPARK-6324] [CORE] Centralize handling of script usage messages. · 700312e1
      Marcelo Vanzin authored
      Reorganize code so that the launcher library handles most of the work
      of printing usage messages, instead of having an awkward protocol between
      the library and the scripts for that.
      
      This mostly applies to SparkSubmit, since the launcher lib does not do
      command line parsing for classes invoked in other ways, and thus cannot
      handle failures for those. Most scripts end up going through SparkSubmit,
      though, so it all works.
      
      The change adds a new, internal command line switch, "--usage-error",
      which prints the usage message and exits with a non-zero status. Scripts
      can override the command printed in the usage message by setting an
      environment variable - this avoids having to grep the output of
      SparkSubmit to remove references to the "spark-submit" script.
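
The idea of the switch can be illustrated with a small Python sketch. It is not the launcher library's code: the environment variable name `EXAMPLE_USAGE_COMMAND`, the usage text, and both function names are made up for this example. Only the `--usage-error` switch and the non-zero exit status come from the description above.

```python
import os
import sys
from typing import Sequence


def usage_message(default_command: str = "spark-submit",
                  env_var: str = "EXAMPLE_USAGE_COMMAND") -> str:
    """Build the usage line; a script may override the command name
    through an environment variable (the variable name is hypothetical)."""
    command = os.environ.get(env_var, default_command)
    return f"Usage: {command} [options]"


def handle_args(argv: Sequence[str]) -> int:
    """Return the process exit status for the given arguments."""
    if "--usage-error" in argv:
        # Print the usage message and signal failure with a non-zero status.
        print(usage_message(), file=sys.stderr)
        return 1
    return 0
```

A wrapper script would set the variable before invoking the main entry point, so the usage text names the wrapper instead of `spark-submit` and nothing has to be filtered out of the output afterwards.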
      
      The only sub-optimal part of the change is the special handling for the
      spark-sql usage, which is now done in SparkSubmitArguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5841 from vanzin/SPARK-6324 and squashes the following commits:
      
      2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324
      bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help.
      c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages.
      6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
      700312e1
    • Marcelo Vanzin's avatar
      [MINOR] [BUILD] Use custom temp directory during build. · b16b5434
      Marcelo Vanzin authored
      Even with all the efforts to cleanup the temp directories created by
      unit tests, Spark leaves a lot of garbage in /tmp after a test run.
      This change overrides java.io.tmpdir to place those files under the
      build directory instead.
      
      After an sbt full unit test run, I was left with > 400 MB of temp
      files. Since they're now under the build dir, it's much easier to
      clean them up.
      
      Also make a slight change to a unit test to make it not pollute the
      source directory with test data.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:
      
      31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
      aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.
      b16b5434
    • Sean Owen's avatar
      [MINOR] remove unused interpolation var in log message · 3a5c4da4
      Sean Owen authored
      Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6650 from srowen/Interpolation and squashes the following commits:
      
      518687a [Sean Owen] Actually interpolate log string
      7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message
      3a5c4da4
  4. Jun 04, 2015
    • Carson Wang's avatar
      [SPARK-8098] [WEBUI] Show correct length of bytes on log page · 63bc0c44
      Carson Wang authored
      The log page should show only the desired number of bytes. Currently it shows bytes from the startIndex to the end of the file, and the "Next" button on the page is always disabled.
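
A rough sketch of the intended range computation, in Python. The helper names and the exact clamping rules are assumptions, not the web UI's code; the point is that the end of the range is bounded by the requested length rather than by the end of the file.

```python
from typing import Tuple


def log_byte_range(offset: int, byte_length: int, file_length: int) -> Tuple[int, int]:
    """Return (start, end) covering at most byte_length bytes of the file."""
    start = max(0, min(offset, file_length))
    end = min(start + byte_length, file_length)
    return start, end


def has_next(end: int, file_length: int) -> bool:
    """A 'Next' button only makes sense while bytes remain after end."""
    return end < file_length
```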
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #6640 from carsonwang/logpage and squashes the following commits:
      
      58cb3fd [Carson Wang] Show correct length of bytes on log page
      63bc0c44
    • Shivaram Venkataraman's avatar
      [SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh · 3dc00528
      Shivaram Venkataraman authored
      This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
      
      Related to discussion in #6567
      
      cc pwendell srowen -- Let me know if this looks better
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:
      
      b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
      8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
      3dc00528
    • Davies Liu's avatar
      [SPARK-7956] [SQL] Use Janino to compile SQL expressions into bytecode · c8709dcf
      Davies Liu authored
      To reduce the overhead of codegen, this PR switches to Janino for compiling SQL expressions into bytecode.
      
      With this change, the time to compile a SQL expression drops from 100ms to 5ms, which is necessary for turning on codegen for general workloads and for tests.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6479 from davies/janino and squashes the following commits:
      
      cc689f5 [Davies Liu] remove globalLock
      262d848 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      eec3a33 [Davies Liu] address comments from Josh
      f37c8c3 [Davies Liu] fix DecimalType and cast to String
      202298b [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      a21e968 [Davies Liu] fix style
      0ed3dc6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      551a851 [Davies Liu] fix tests
      c3bdffa [Davies Liu] remove print
      6089ce5 [Davies Liu] change logging level
      7e46ac3 [Davies Liu] fix style
      d8f0f6c [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      da4926a [Davies Liu] fix tests
      03660f3 [Davies Liu] WIP: use Janino to compile Java source
      f2629cd [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      f7d66cf [Davies Liu] use template based string for codegen
      c8709dcf
    • Daniel Darabos's avatar
      Fix maxTaskFailures comment · 10ba1880
      Daniel Darabos authored
      If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code support this reading; I think it's just this comment that was off. It's easy to make this mistake, so can you please double-check that I'm correct? Thanks!
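
The reading argued for here can be stated as a one-line check. This is an illustrative Python sketch with hypothetical names, not the scheduler's code.

```python
def should_abort_task_set(failure_count: int, max_task_failures: int) -> bool:
    """Abort as soon as a task has failed max_task_failures times.

    With max_task_failures == 1 the very first failure aborts the task set,
    so the number of retries allowed is max_task_failures - 1.
    """
    return failure_count >= max_task_failures
```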
      
      Author: Daniel Darabos <darabos.daniel@gmail.com>
      
      Closes #6621 from darabos/patch-2 and squashes the following commits:
      
      dfebdec [Daniel Darabos] Fix comment.
      10ba1880
  5. Jun 03, 2015
    • Ryan Williams's avatar
      [SPARK-8088] don't attempt to lower number of executors by 0 · 51898b51
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #6624 from ryan-williams/execs and squashes the following commits:
      
      b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0
      51898b51
    • Hari Shreedharan's avatar
      [HOTFIX] History Server API docs error fix. · 566cb594
      Hari Shreedharan authored
      Minor error in the monitoring docs. Also made indentation changes in `ApiRootResource`
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6628 from harishreedharan/eventlog-formatting and squashes the following commits:
      
      a12553d [Hari Shreedharan] Javadoc updates.
      ca399b6 [Hari Shreedharan] [HOTFIX] History Server API docs error fix.
      566cb594
    • Andrew Or's avatar
      [HOTFIX] [TYPO] Fix typo in #6546 · bfbdab12
      Andrew Or authored
      bfbdab12
    • Hari Shreedharan's avatar
      [HOTFIX] Fix Hadoop-1 build caused by #5792. · a8f1f154
      Hari Shreedharan authored
      Replaced `fs.listFiles` with Hadoop-1 friendly `fs.listStatus` method.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6619 from harishreedharan/evetlog-hadoop-1-fix and squashes the following commits:
      
      6192078 [Hari Shreedharan] [HOTFIX] Fix Hadoop-1 build caused by #5792.
      a8f1f154
    • zsxwing's avatar
      [SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and... · f2713478
      zsxwing authored
      [SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite
      
      The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs.
      
      This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up.
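
The waiting pattern can be sketched in Python as below. It is a generic polling loop under assumptions: the function signature, the `count_executors` callback, and the poll interval are hypothetical and do not mirror the Scala method.

```python
import time
from typing import Callable


def wait_until_executors_up(count_executors: Callable[[], int],
                            expected: int,
                            timeout_seconds: float,
                            poll_interval: float = 0.01) -> None:
    """Poll until at least `expected` executors are up, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        current = count_executors()
        if current >= expected:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {current} of {expected} executors were up after {timeout_seconds}s")
        time.sleep(poll_interval)
```

A test would call this once before submitting jobs, so that a slow cluster start shows up as an explicit timeout rather than as a flaky assertion later on.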
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits:
      
      5560e09 [zsxwing] Fix a typo
      3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite
      f2713478
    • zsxwing's avatar
      [SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout · 1d8669f1
      zsxwing authored
      Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.
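
The design choice can be shown with a toy Python bus. `ToyListenerBus` is a made-up stand-in, not Spark's class, and in a real bus a separate thread would drain the queue. A method that raises on timeout cannot be silently ignored the way a forgotten boolean return value can.

```python
import time
from collections import deque


class ToyListenerBus:
    """Toy stand-in for an asynchronous listener bus (hypothetical)."""

    def __init__(self) -> None:
        self._events = deque()

    def post(self, event: str) -> None:
        self._events.append(event)

    def process_one(self) -> None:
        # A real bus would do this continuously on its own thread.
        if self._events:
            self._events.popleft()

    def wait_until_empty(self, timeout_seconds: float) -> None:
        """Block until the queue is empty; raise TimeoutError otherwise."""
        deadline = time.monotonic() + timeout_seconds
        while self._events:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{len(self._events)} event(s) still queued after {timeout_seconds}s")
            time.sleep(0.001)
```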
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:
      
      607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
      1d8669f1
    • Timothy Chen's avatar
      [SPARK-8083] [MESOS] Use the correct base path in mesos driver page. · bfbf12b3
      Timothy Chen authored
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:
      
      4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.
      bfbf12b3
    • Andrew Or's avatar
      [MINOR] [UI] Improve confusing message on log page · c6a6dd0d
      Andrew Or authored
      It's good practice to check if the input path is in the directory
      we expect to avoid potentially confusing error messages.
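
One way to express such a check, sketched in Python with `pathlib`. The function names and the error text are assumptions for illustration and are not taken from the log page code.

```python
from pathlib import Path
from typing import Optional


def is_inside(directory: str, candidate: str) -> bool:
    """True if candidate resolves to directory itself or a path beneath it."""
    base = Path(directory).resolve()
    target = Path(candidate).resolve()
    return target == base or base in target.parents


def check_log_path(log_directory: str, requested: str) -> Optional[str]:
    """Return an error message for paths outside log_directory, else None."""
    if not is_inside(log_directory, requested):
        return f"Error: invalid log directory {requested}"
    return None
```

Resolving both paths first means that `..` segments and look-alike sibling directories are rejected up front with a clear message.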
      c6a6dd0d
    • Hari Shreedharan's avatar
      [SPARK-7161] [HISTORY SERVER] Provide REST api to download event logs fro... · d2a86eb8
      Hari Shreedharan authored
      ...m History Server
      
      This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.
      
      This also adds a method to the ApplicationHistoryProvider to get the raw files, zipped.
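
The zipping step can be sketched with Python's standard `zipfile` module. This is an illustration only: the function name, the in-memory dictionary of logs keyed by attempt ID, and the `<attempt>/eventlog` entry naming are assumptions, not the History Server's layout.

```python
import io
import zipfile
from typing import Dict, Optional


def zip_event_logs(logs_by_attempt: Dict[str, bytes],
                   attempt_id: Optional[str] = None) -> bytes:
    """Zip the logs of every attempt, or of one attempt if attempt_id is given."""
    if attempt_id is not None:
        if attempt_id not in logs_by_attempt:
            raise KeyError(f"unknown attempt: {attempt_id}")
        selected = {attempt_id: logs_by_attempt[attempt_id]}
    else:
        selected = logs_by_attempt
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for attempt, content in selected.items():
            archive.writestr(f"{attempt}/eventlog", content)
    return buffer.getvalue()
```

Making the attempt ID optional lets one function serve both the whole-application download and the single-attempt download.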
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #5792 from harishreedharan/eventlog-download and squashes the following commits:
      
      221cc26 [Hari Shreedharan] Update docs with new API information.
      a131be6 [Hari Shreedharan] Fix style issues.
      5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
      6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
      d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
      ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
      1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by refactoring interfaces.
      5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
      0b66948 [Hari Shreedharan] Minor formatting/import fixes.
      4fc518c [Hari Shreedharan] Fix rat failures.
      a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
      0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
      350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
      fd6ab00 [Hari Shreedharan] Fix style issues
      32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
      7b362b2 [Hari Shreedharan] Almost working.
      3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
      d2a86eb8
    • Patrick Wendell's avatar
      [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
      2c4d550e
    • Wenchen Fan's avatar
      [SPARK-7562][SPARK-6444][SQL] Improve error reporting for expression data type mismatch · d38cf217
      Wenchen Fan authored
      It seems hard to find a common pattern for checking types in `Expression`. Sometimes we know what input types we need (for `And`, we know we need two booleans); sometimes we just have some rules (for `Add`, we need two numeric types that are equal). So I defined a general interface, `checkInputDataTypes`, in `Expression`, which returns a `TypeCheckResult`. A `TypeCheckResult` tells whether the expression passes type checking or, if not, what the type mismatch is.
      
      This PR mainly applies input type checking to arithmetic and predicate expressions.
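
The shape of such an interface might look like the Python sketch below. It is a simplified model, not the Catalyst code: types are plain strings, the set of numeric types is an arbitrary sample, and the error messages are invented. Only the two rules (two booleans for `And`, two equal numeric types for `Add`) come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Arbitrary sample of numeric type names, chosen for this example only.
NUMERIC_TYPES = {"int", "long", "double"}


@dataclass(frozen=True)
class TypeCheckResult:
    """Either a success (error is None) or a description of the mismatch."""
    error: Optional[str] = None

    @property
    def is_success(self) -> bool:
        return self.error is None


def check_and(left: str, right: str) -> TypeCheckResult:
    """`And` has fixed input types: both sides must be boolean."""
    if left == "boolean" and right == "boolean":
        return TypeCheckResult()
    return TypeCheckResult(f"And requires two boolean inputs, got {left} and {right}")


def check_add(left: str, right: str) -> TypeCheckResult:
    """`Add` follows a rule: both sides must be the same numeric type."""
    if left != right:
        return TypeCheckResult(f"Add requires equal input types, got {left} and {right}")
    if left not in NUMERIC_TYPES:
        return TypeCheckResult(f"Add requires numeric inputs, got {left}")
    return TypeCheckResult()
```

Returning a result object instead of a bare boolean lets the analyzer report exactly which mismatch occurred.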
      
      TODO: apply type checking interface to more expressions.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6405 from cloud-fan/6444 and squashes the following commits:
      
      b5ff31b [Wenchen Fan] address comments
      b917275 [Wenchen Fan] rebase
      39929d9 [Wenchen Fan] add todo
      0808fd2 [Wenchen Fan] make constructor of TypeCheckResult private
      3bee157 [Wenchen Fan] and decimal type coercion rule for binary comparison
      8883025 [Wenchen Fan] apply type check interface to CaseWhen
      cffb67c [Wenchen Fan] to have resolved call the data type check function
      6eaadff [Wenchen Fan] add equal type constraint to EqualTo
      3affbd8 [Wenchen Fan] more fixes
      654d46a [Wenchen Fan] improve tests
      e0a3628 [Wenchen Fan] improve error message
      1524ff6 [Wenchen Fan] fix style
      69ca3fe [Wenchen Fan] add error message and tests
      c71d02c [Wenchen Fan] fix hive tests
      6491721 [Wenchen Fan] use value class TypeCheckResult
      7ae76b9 [Wenchen Fan] address comments
      cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch
      d38cf217
  6. Jun 01, 2015
    • Shivaram Venkataraman's avatar
      [SPARK-8027] [SPARKR] Add maven profile to build R package docs · cae9306c
      Shivaram Venkataraman authored
      Also use that profile in create-release.sh
      
      cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:
      
      8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh
      cae9306c
    • Shivaram Venkataraman's avatar
      [SPARK-8028] [SPARKR] Use addJar instead of setJars in SparkR · 6b44278e
      Shivaram Venkataraman authored
      This prevents the spark.jars from being cleared while using `--packages` or `--jars`
      
      cc pwendell davies brkyvz
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6568 from shivaram/SPARK-8028 and squashes the following commits:
      
      3a9cf1f [Shivaram Venkataraman] Use addJar instead of setJars in SparkR This prevents the spark.jars from being cleared
      6b44278e
    • Andrew Or's avatar
      [MINOR] [UI] Improve error message on log page · 15d7c90a
      Andrew Or authored
      Currently, if a bad log type is specified, we get a blank page.
      We should provide a more informative error message.
      15d7c90a
  7. May 31, 2015
    • Sun Rui's avatar
      [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame. · 46576ab3
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:
      
      dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna().
      41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.
      46576ab3
    • Reynold Xin's avatar
      [SPARK-7979] Enforce structural type checker. · 4b5f12ba
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6536 from rxin/structural-type-checker and squashes the following commits:
      
      f833151 [Reynold Xin] Fixed compilation.
      633f9a1 [Reynold Xin] Fixed typo.
      d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.
      4b5f12ba
    • Reynold Xin's avatar
      [SPARK-3850] Trim trailing spaces for core. · 74fdc97c
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6533 from rxin/whitespace-2 and squashes the following commits:
      
      038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.
      74fdc97c
    • Reynold Xin's avatar
      [SPARK-7976] Add style checker to disallow overriding finalize. · 084fef76
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6528 from rxin/style-finalizer and squashes the following commits:
      
      a2211ca [Reynold Xin] [SPARK-7976] Enable NoFinalizeChecker.
      084fef76
  8. May 30, 2015
    • Josh Rosen's avatar
      [HOTFIX] Replace FunSuite with SparkFunSuite. · 66a53a69
      Josh Rosen authored
      This fixes a build break introduced by merging a6430028,
      which fails the new style checks that ensure that we use SparkFunSuite instead
      of FunSuite.
      66a53a69
    • Josh Rosen's avatar
      [SPARK-7855] Move bypassMergeSort-handling from ExternalSorter to own component · a6430028
      Josh Rosen authored
      Spark's `ExternalSorter` writes shuffle output files during sort-based shuffle. Sort-shuffle contains a configuration, `spark.shuffle.sort.bypassMergeThreshold`, which causes ExternalSorter to skip sorting and merging and simply write separate files per partition, which are then concatenated together to form the final map output file.
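
The bypass path described above can be modelled in a few lines of Python. This is a toy sketch, not the shuffle writer: in-memory buffers stand in for the per-partition files, and the function name and the `partition_of` callback are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def bypass_merge_write(records: Iterable[Tuple[str, bytes]],
                       num_partitions: int,
                       partition_of: Callable[[str], int]) -> Tuple[bytes, List[int]]:
    """Append each record to its partition's buffer without sorting, then
    concatenate the buffers in partition order into one output.

    Returns the concatenated output and the length of each partition's
    segment, which is what a reader needs to locate a partition.
    """
    # One buffer per partition plays the role of one file per partition.
    buffers = [bytearray() for _ in range(num_partitions)]
    for key, value in records:
        buffers[partition_of(key)] += value
    lengths = [len(buffer) for buffer in buffers]
    output = b"".join(bytes(buffer) for buffer in buffers)
    return output, lengths
```

Records are never compared or merged here, which is why this path shares so little with the sorting and spilling code.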
      
      The code paths used during this bypass are almost completely separate from ExternalSorter's other code paths, so refactoring them into a separate file can significantly simplify the code.
      
      In addition to re-arranging code, this patch deletes a bunch of dead code.  The main entry point into ExternalSorter is `insertAll()` and in SPARK-4479 / #3422 this method was modified to completely bypass in-memory buffering of records when `bypassMergeSort` takes effect. As a result, some of the spilling and merging code paths will no longer be called when `bypassMergeSort` is used, so we should be able to safely remove that code.
      
      There's an open JIRA ([SPARK-6026](https://issues.apache.org/jira/browse/SPARK-6026)) for removing the `bypassMergeThreshold` parameter and code paths; I have not done that here, but the changes in this patch will make removing that parameter significantly easier if we ever decide to do that.
      
      This patch also makes several improvements to shuffle-related tests and adds more defensive checks to certain shuffle classes:
      
      - DiskBlockObjectWriter now throws an exception if `fileSegment()` is called before `commitAndClose()` has been called.
      - DiskBlockObjectWriter's close methods are now idempotent, so calling any of the close methods twice in a row will no longer result in incorrect shuffle write metrics changes.  Calling `revertPartialWritesAndClose()` on a closed DiskBlockObjectWriter now has no effect (before, it might mess up the metrics).
      - The end-to-end shuffle record count metrics tests have been moved from InputOutputMetricsSuite to ShuffleSuite.  This means that these tests will now be run against all shuffle implementations rather than just the default shuffle configuration.
      - The end-to-end metrics tests now include a test of a job which performs aggregation in the shuffle.
      - Our tests now check that `shuffleBytesWritten == totalShuffleBytesRead`.
      - FileSegment now throws IllegalArgumentException if it is constructed with a negative length or offset.
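
A few of the defensive checks in the list above, sketched in Python. `ToyBlockWriter` and its `metric_updates` counter are invented for this example, and `ValueError` and `RuntimeError` stand in for the JVM exceptions; none of this is the actual DiskBlockObjectWriter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSegment:
    path: str
    offset: int
    length: int

    def __post_init__(self) -> None:
        # ValueError plays the role of IllegalArgumentException here.
        if self.offset < 0 or self.length < 0:
            raise ValueError(
                f"offset and length must be non-negative, got {self.offset} and {self.length}")


class ToyBlockWriter:
    """Hypothetical writer that only counts bytes instead of writing them."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.bytes_written = 0
        self.metric_updates = 0  # how often closing touched the metrics
        self._closed = False

    def write(self, data: bytes) -> None:
        if self._closed:
            raise RuntimeError("writer is already closed")
        self.bytes_written += len(data)

    def commit_and_close(self) -> None:
        if self._closed:
            return  # idempotent: a second close changes nothing
        self._closed = True
        self.metric_updates += 1

    def file_segment(self) -> FileSegment:
        if not self._closed:
            raise RuntimeError("file_segment() called before commit_and_close()")
        return FileSegment(self.path, 0, self.bytes_written)
```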
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6397 from JoshRosen/external-sorter-bypass-cleanup and squashes the following commits:
      
      bf3f3f6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into external-sorter-bypass-cleanup
      8b216c4 [Josh Rosen] Guard against negative offsets and lengths in FileSegment
      03f35a4 [Josh Rosen] Minor fix to cleanup logic.
      b5cc35b [Josh Rosen] Move shuffle metrics tests to ShuffleSuite.
      8b8fb9e [Josh Rosen] Add more tests + defensive programming to DiskBlockObjectWriter.
      16564eb [Josh Rosen] Guard against calling fileSegment() before commitAndClose() has been called.
      96811b4 [Josh Rosen] Remove confusing taskMetrics.shuffleWriteMetrics() optional call
      8522b6a [Josh Rosen] Do not perform a map-side sort unless we're also doing map-side aggregation
      08e40f3 [Josh Rosen] Remove excessively clever (and wrong) implementation of newBuffer()
      d7f9938 [Josh Rosen] Add missing overrides; fix compilation
      71d76ff [Josh Rosen] Update Javadoc
      bf0d98f [Josh Rosen] Add comment to clarify confusing factory code
      5197f73 [Josh Rosen] Add missing private[this]
      30ef2c8 [Josh Rosen] Convert BypassMergeSortShuffleWriter to Java
      bc1a820 [Josh Rosen] Fix bug when aggregator is used but map-side combine is disabled
      0d3dcc0 [Josh Rosen] Remove unnecessary overloaded methods
      25b964f [Josh Rosen] Rename SortShuffleSorter to SortShuffleFileWriter
      0d9848c [Josh Rosen] Make it more clear that curWriteMetrics is now only used for spill metrics
      7af7aea [Josh Rosen] Combine spill() and spillToMergeableFile()
      6320112 [Josh Rosen] Add missing negation in deletion success check.
      d267e0d [Josh Rosen] Fix style issue
      7f15f7b [Josh Rosen] Back out extra cleanup-handling code, since this is already covered in stop()
      25aa3bd [Josh Rosen] Make sure to delete outputFile after errors.
      931ca68 [Josh Rosen] Refactor tests.
      6a35716 [Josh Rosen] Refactor logic for deciding when to bypass
      4b03539 [Josh Rosen] Move conf prior to first use
      1265b25 [Josh Rosen] Fix some style errors and comments.
      02355ef [Josh Rosen] More simplification
      d4cb536 [Josh Rosen] Delete more unused code
      bb96678 [Josh Rosen] Add missing interface file
      b6cc1eb [Josh Rosen] Realize that bypass never buffers; proceed to delete tons of code
      6185ee2 [Josh Rosen] WIP towards moving bypass code into own file.
      8d0678c [Josh Rosen] Move diskBytesSpilled getter next to variable
      19bccd6 [Josh Rosen] Remove duplicated buffer creation code.
      18959bb [Josh Rosen] Move comparator methods closer together.
      a6430028
    • zhichao.li's avatar
      [SPARK-7717] [WEBUI] Only showing total memory and cores for alive workers · 2b35c99c
      zhichao.li authored
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #6317 from zhichao-li/workers and squashes the following commits:
      
      d68bf11 [zhichao.li] change prefix
      99b6768 [zhichao.li] remove extra space and add 'Alive' prefix
      1e8eb06 [zhichao.li] only showing alive workers
      2b35c99c
    • Timothy Chen's avatar
      [SPARK-7962] [MESOS] Fix master url parsing in rest submission client. · 78657d53
      Timothy Chen authored
      Only parse standalone master url when master url starts with spark://
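
A small Python sketch of the guard. The function name is hypothetical, and passing any other URL through unchanged is an assumption about the intended behavior; the comma-separated form is how standalone mode lists several masters in one URL.

```python
from typing import List


def standalone_masters(master_url: str) -> List[str]:
    """Split a standalone URL such as spark://h1:p1,h2:p2 into one URL per
    master; leave any other kind of master URL untouched."""
    prefix = "spark://"
    if not master_url.startswith(prefix):
        return [master_url]
    hosts = master_url[len(prefix):].split(",")
    return [prefix + host for host in hosts]
```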
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #6517 from tnachen/fix_mesos_client and squashes the following commits:
      
      61a1198 [Timothy Chen] Fix master url parsing in rest submission client.
      78657d53
    • Andrew Or's avatar
      [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike · 609c4923
      Andrew Or authored
      This is a follow-up patch to #6441.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6510 from andrewor14/extends-funsuite-check and squashes the following commits:
      
      6618b46 [Andrew Or] Exempt SparkSinkSuite from the FunSuite check
      99d02ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into extends-funsuite-check
      48874dd [Andrew Or] Guard against direct uses of FunSuite / FunSuiteLike
      609c4923
    • Burak Yavuz's avatar
      [SPARK-7957] Preserve partitioning when using randomSplit · 7ed06c39
      Burak Yavuz authored
      cc JoshRosen
      Thanks for noticing this!
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:
      
      497465d [Burak Yavuz] addressed code review
      293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit
      7ed06c39
  9. May 29, 2015
    • Holden Karau's avatar
      [SPARK-7910] [TINY] [JAVAAPI] expose partitioner information in javardd · 82a396c2
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6464 from holdenk/SPARK-7910-expose-partitioner-information-in-javardd and squashes the following commits:
      
      de1e644 [Holden Karau] Fix the test to get the partitioner
      bdb31cc [Holden Karau] Add Mima exclude for the new method
      347ef4c [Holden Karau] Add a quick little test for the partitioner JavaAPI
      f49dca9 [Holden Karau] Add partitioner information to JavaRDDLike and fix some whitespace
      82a396c2