  1. Jun 05, 2015
    • [MINOR] [BUILD] Use custom temp directory during build. · b16b5434
      Marcelo Vanzin authored
      Even with all the efforts to clean up the temp directories created by
      unit tests, Spark leaves a lot of garbage in /tmp after a test run.
      This change overrides java.io.tmpdir to place those files under the
      build directory instead.
      
      After an sbt full unit test run, I was left with > 400 MB of temp
      files. Since they're now under the build dir, it's much easier to
      clean them up.
      
      Also make a slight change to a unit test to make it not pollute the
      source directory with test data.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:
      
      31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
      aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.
    • [MINOR] remove unused interpolation var in log message · 3a5c4da4
      Sean Owen authored
      Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6650 from srowen/Interpolation and squashes the following commits:
      
      518687a [Sean Owen] Actually interpolate log string
      7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message
  2. Jun 04, 2015
    • [SPARK-8098] [WEBUI] Show correct length of bytes on log page · 63bc0c44
      Carson Wang authored
      The log page should show only the desired length of bytes. Currently it shows bytes from the startIndex to the end of the file. The "Next" button on the page is always disabled.
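
The intended windowing can be sketched in Python as follows. This only illustrates the arithmetic and is not the Spark web UI code: `log_window`, `can_go_next`, and their parameter names are hypothetical, and tying the "Next" button to whether bytes remain after the window is an assumption.

```python
def log_window(total_length, offset, byte_length):
    """Return the [start, end) byte range to show, clamped to the file size."""
    start = min(max(offset, 0), total_length)
    end = min(start + byte_length, total_length)
    return start, end


def can_go_next(end, total_length):
    """Assumed rule: a next page exists only if bytes remain after the window."""
    return end < total_length


# A 1000-byte log, viewing 200 bytes starting at byte 100.
start, end = log_window(1000, 100, 200)
```

If the window always ran to the end of the file, `end` would equal `total_length` and `can_go_next` could never be true, which is consistent with the symptom described.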
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #6640 from carsonwang/logpage and squashes the following commits:
      
      58cb3fd [Carson Wang] Show correct length of bytes on log page
    • [SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh · 3dc00528
      Shivaram Venkataraman authored
      This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
      
      Related to discussion in #6567
      
      cc pwendell srowen -- Let me know if this looks better
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:
      
      b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
      8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
    • [SPARK-7956] [SQL] Use Janino to compile SQL expressions into bytecode · c8709dcf
      Davies Liu authored
      In order to reduce the overhead of codegen, this PR switches to using Janino to compile SQL expressions into bytecode.
      
      After this, the time used to compile a SQL expression decreases from 100ms to 5ms, which is necessary to turn on codegen for general workloads and for tests.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6479 from davies/janino and squashes the following commits:
      
      cc689f5 [Davies Liu] remove globalLock
      262d848 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      eec3a33 [Davies Liu] address comments from Josh
      f37c8c3 [Davies Liu] fix DecimalType and cast to String
      202298b [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      a21e968 [Davies Liu] fix style
      0ed3dc6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      551a851 [Davies Liu] fix tests
      c3bdffa [Davies Liu] remove print
      6089ce5 [Davies Liu] change logging level
      7e46ac3 [Davies Liu] fix style
      d8f0f6c [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      da4926a [Davies Liu] fix tests
      03660f3 [Davies Liu] WIP: use Janino to compile Java source
      f2629cd [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      f7d66cf [Davies Liu] use template based string for codegen
    • Fix maxTaskFailures comment · 10ba1880
      Daniel Darabos authored
      If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code support this reading; I think it's just this comment that was off. It's easy to make this mistake, so can you please double-check that I'm correct? Thanks!
      
      Author: Daniel Darabos <darabos.daniel@gmail.com>
      
      Closes #6621 from darabos/patch-2 and squashes the following commits:
      
      dfebdec [Daniel Darabos] Fix comment.
  3. Jun 03, 2015
    • [SPARK-8088] don't attempt to lower number of executors by 0 · 51898b51
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #6624 from ryan-williams/execs and squashes the following commits:
      
      b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0
    • [HOTFIX] History Server API docs error fix. · 566cb594
      Hari Shreedharan authored
      Minor error in the monitoring docs. Also made indentation changes in `ApiRootResource`
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6628 from harishreedharan/eventlog-formatting and squashes the following commits:
      
      a12553d [Hari Shreedharan] Javadoc updates.
      ca399b6 [Hari Shreedharan] [HOTFIX] History Server API docs error fix.
    • [HOTFIX] [TYPO] Fix typo in #6546 · bfbdab12
      Andrew Or authored
    • [HOTFIX] Fix Hadoop-1 build caused by #5792. · a8f1f154
      Hari Shreedharan authored
      Replaced `fs.listFiles` with Hadoop-1 friendly `fs.listStatus` method.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6619 from harishreedharan/evetlog-hadoop-1-fix and squashes the following commits:
      
      6192078 [Hari Shreedharan] [HOTFIX] Fix Hadoop-1 build caused by #5972.
    • [SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite · f2713478
      zsxwing authored
      
      The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs.
      
      This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits:
      
      5560e09 [zsxwing] Fix a typo
      3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite
    • [SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout · 1d8669f1
      zsxwing authored
      Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.
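
The difference between the two styles can be sketched in Python. This is an analogy and not the Scala implementation: `wait_until_empty_bool`, `wait_until_empty`, and the `is_empty` callback are hypothetical names, and Python's built-in `TimeoutError` stands in for `TimeoutException`.

```python
import time


def wait_until_empty_bool(is_empty, timeout_s):
    """Old style: returns False on timeout, which a caller can silently ignore."""
    deadline = time.monotonic() + timeout_s
    while not is_empty():
        if time.monotonic() > deadline:
            return False
        time.sleep(0.001)
    return True


def wait_until_empty(is_empty, timeout_s):
    """New style: a timeout raises, so a forgotten check cannot hide it."""
    if not wait_until_empty_bool(is_empty, timeout_s):
        raise TimeoutError(f"queue did not drain within {timeout_s} s")
```

With the raising variant, a test that never inspects the result still fails loudly when the queue does not drain in time.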
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:
      
      607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
    • [SPARK-8083] [MESOS] Use the correct base path in mesos driver page. · bfbf12b3
      Timothy Chen authored
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:
      
      4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.
    • [MINOR] [UI] Improve confusing message on log page · c6a6dd0d
      Andrew Or authored
      It's good practice to check if the input path is in the directory
      we expect to avoid potentially confusing error messages.
    • [SPARK-7161] [HISTORY SERVER] Provide REST api to download event logs from History Server · d2a86eb8
      Hari Shreedharan authored
      
      This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.
      
      This also adds an additional method to the ApplicationHistoryProvider to get the raw files, zipped.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #5792 from harishreedharan/eventlog-download and squashes the following commits:
      
      221cc26 [Hari Shreedharan] Update docs with new API information.
      a131be6 [Hari Shreedharan] Fix style issues.
      5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
      6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
      d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
      ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
      1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by rafactoring interfaces.
      5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
      0b66948 [Hari Shreedharan] Minor formatting/import fixes.
      4fc518c [Hari Shreedharan] Fix rat failures.
      a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
      0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
      350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
      fd6ab00 [Hari Shreedharan] Fix style issues
      32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
      7b362b2 [Hari Shreedharan] Almost working.
      3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
    • [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
    • [SPARK-7562][SPARK-6444][SQL] Improve error reporting for expression data type mismatch · d38cf217
      Wenchen Fan authored
      It seems hard to find a common pattern for checking types in `Expression`. Sometimes we know exactly which input types we need (for `And`, we know we need two booleans); sometimes we just have some rules (for `Add`, we need two numeric types that are equal). So I defined a general interface, `checkInputDataTypes`, in `Expression`, which returns a `TypeCheckResult`. A `TypeCheckResult` tells whether the expression passes type checking or, if not, what the type mismatch is.
      
      This PR mainly applies input type checking to arithmetic and predicate expressions.
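
The shape of such an interface can be sketched in Python. This is a loose analogy and not Catalyst's Scala code: the string type names, the `NUMERIC_TYPES` set, the snake_case method name, and the error wording are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

NUMERIC_TYPES = {"int", "long", "double"}  # illustrative subset


@dataclass(frozen=True)
class TypeCheckResult:
    """Success when error is None; otherwise error describes the mismatch."""
    error: Optional[str] = None

    @property
    def is_success(self):
        return self.error is None


class And:
    """Knows its exact input types: two booleans."""

    def __init__(self, left_type, right_type):
        self.left_type, self.right_type = left_type, right_type

    def check_input_data_types(self):
        if (self.left_type, self.right_type) != ("boolean", "boolean"):
            return TypeCheckResult(
                f"And requires boolean inputs, got {self.left_type} and {self.right_type}")
        return TypeCheckResult()


class Add:
    """Follows a rule instead: both inputs numeric and of the same type."""

    def __init__(self, left_type, right_type):
        self.left_type, self.right_type = left_type, right_type

    def check_input_data_types(self):
        if self.left_type != self.right_type:
            return TypeCheckResult(
                f"Add inputs differ: {self.left_type} vs {self.right_type}")
        if self.left_type not in NUMERIC_TYPES:
            return TypeCheckResult(f"Add requires numeric inputs, got {self.left_type}")
        return TypeCheckResult()
```

Returning a result object rather than a bare boolean lets the caller report what went wrong, for example `Add("int", "long").check_input_data_types().error`.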
      
      TODO: apply type checking interface to more expressions.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6405 from cloud-fan/6444 and squashes the following commits:
      
      b5ff31b [Wenchen Fan] address comments
      b917275 [Wenchen Fan] rebase
      39929d9 [Wenchen Fan] add todo
      0808fd2 [Wenchen Fan] make constrcutor of TypeCheckResult private
      3bee157 [Wenchen Fan] and decimal type coercion rule for binary comparison
      8883025 [Wenchen Fan] apply type check interface to CaseWhen
      cffb67c [Wenchen Fan] to have resolved call the data type check function
      6eaadff [Wenchen Fan] add equal type constraint to EqualTo
      3affbd8 [Wenchen Fan] more fixes
      654d46a [Wenchen Fan] improve tests
      e0a3628 [Wenchen Fan] improve error message
      1524ff6 [Wenchen Fan] fix style
      69ca3fe [Wenchen Fan] add error message and tests
      c71d02c [Wenchen Fan] fix hive tests
      6491721 [Wenchen Fan] use value class TypeCheckResult
      7ae76b9 [Wenchen Fan] address comments
      cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch
  4. Jun 01, 2015
    • [SPARK-8027] [SPARKR] Add maven profile to build R package docs · cae9306c
      Shivaram Venkataraman authored
      Also use that profile in create-release.sh
      
      cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:
      
      8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh
    • [SPARK-8028] [SPARKR] Use addJar instead of setJars in SparkR · 6b44278e
      Shivaram Venkataraman authored
      This prevents the spark.jars from being cleared while using `--packages` or `--jars`
      
      cc pwendell davies brkyvz
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6568 from shivaram/SPARK-8028 and squashes the following commits:
      
      3a9cf1f [Shivaram Venkataraman] Use addJar instead of setJars in SparkR This prevents the spark.jars from being cleared
    • [MINOR] [UI] Improve error message on log page · 15d7c90a
      Andrew Or authored
      Currently, if a bad log type is specified, the page comes up blank.
      We should provide a more informative error message.
  5. May 31, 2015
    • [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame. · 46576ab3
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:
      
      dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna().
      41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.
    • [SPARK-7979] Enforce structural type checker. · 4b5f12ba
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6536 from rxin/structural-type-checker and squashes the following commits:
      
      f833151 [Reynold Xin] Fixed compilation.
      633f9a1 [Reynold Xin] Fixed typo.
      d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.
    • [SPARK-3850] Trim trailing spaces for core. · 74fdc97c
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6533 from rxin/whitespace-2 and squashes the following commits:
      
      038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.
    • [SPARK-7976] Add style checker to disallow overriding finalize. · 084fef76
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6528 from rxin/style-finalizer and squashes the following commits:
      
      a2211ca [Reynold Xin] [SPARK-7976] Enable NoFinalizeChecker.
  6. May 30, 2015
    • [HOTFIX] Replace FunSuite with SparkFunSuite. · 66a53a69
      Josh Rosen authored
      This fixes a build break introduced by merging a6430028,
      which fails the new style checks that ensure that we use SparkFunSuite instead
      of FunSuite.
    • [SPARK-7855] Move bypassMergeSort-handling from ExternalSorter to own component · a6430028
      Josh Rosen authored
      Spark's `ExternalSorter` writes shuffle output files during sort-based shuffle. Sort-shuffle contains a configuration, `spark.shuffle.sort.bypassMergeThreshold`, which causes ExternalSorter to skip sorting and merging and simply write separate files per partition, which are then concatenated together to form the final map output file.
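
The bypass path described above (one file per partition, then concatenation) can be sketched in Python. This is a toy model and not Spark's writer: `bypass_merge_write`, the `partition_of` callback, and the file naming are hypothetical.

```python
import os
import tempfile


def bypass_merge_write(records, num_partitions, partition_of, out_path):
    """Write one file per partition, then concatenate them in partition order.

    Returns the byte length of each partition's segment in the output file.
    """
    tmp_dir = tempfile.mkdtemp()
    paths = [os.path.join(tmp_dir, f"part-{i}") for i in range(num_partitions)]
    files = [open(p, "wb") for p in paths]
    try:
        for rec in records:  # records go straight to disk: no sort, no merge
            files[partition_of(rec)].write(rec)
    finally:
        for f in files:
            f.close()

    lengths = []
    with open(out_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as part:
                data = part.read()
            out.write(data)
            lengths.append(len(data))
            os.remove(p)
    os.rmdir(tmp_dir)
    return lengths


# Two partitions: records starting with b"a" go to 0, everything else to 1.
out_path = os.path.join(tempfile.mkdtemp(), "map-output")
lengths = bypass_merge_write(
    [b"b1", b"a1", b"b2", b"a2"], 2,
    lambda rec: 0 if rec.startswith(b"a") else 1, out_path)
with open(out_path, "rb") as f:
    content = f.read()
```

The per-partition lengths are what a reader would need to locate each partition's segment inside the single concatenated file.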
      
      The code paths used during this bypass are almost completely separate from ExternalSorter's other code paths, so refactoring them into a separate file can significantly simplify the code.
      
      In addition to re-arranging code, this patch deletes a bunch of dead code.  The main entry point into ExternalSorter is `insertAll()` and in SPARK-4479 / #3422 this method was modified to completely bypass in-memory buffering of records when `bypassMergeSort` takes effect. As a result, some of the spilling and merging code paths will no longer be called when `bypassMergeSort` is used, so we should be able to safely remove that code.
      
      There's an open JIRA ([SPARK-6026](https://issues.apache.org/jira/browse/SPARK-6026)) for removing the `bypassMergeThreshold` parameter and code paths; I have not done that here, but the changes in this patch will make removing that parameter significantly easier if we ever decide to do that.
      
      This patch also makes several improvements to shuffle-related tests and adds more defensive checks to certain shuffle classes:
      
      - DiskBlockObjectWriter now throws an exception if `fileSegment()` is called before `commitAndClose()` has been called.
      - DiskBlockObjectWriter's close methods are now idempotent, so calling any of the close methods twice in a row will no longer result in incorrect shuffle write metrics changes.  Calling `revertPartialWritesAndClose()` on a closed DiskBlockObjectWriter now has no effect (before, it might mess up the metrics).
      - The end-to-end shuffle record count metrics tests have been moved from InputOutputMetricsSuite to ShuffleSuite.  This means that these tests will now be run against all shuffle implementations rather than just the default shuffle configuration.
      - The end-to-end metrics tests now include a test of a job which performs aggregation in the shuffle.
      - Our tests now check that `shuffleBytesWritten == totalShuffleBytesRead`.
      - FileSegment now throws IllegalArgumentException if it is constructed with a negative length or offset.
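
A few of the defensive checks listed above can be sketched in Python. The classes below are hypothetical stand-ins and not Spark's `DiskBlockObjectWriter` or `FileSegment`; `ValueError` and `RuntimeError` are used in place of the Java exceptions, and the metric bookkeeping is simplified.

```python
class FileSegment:
    def __init__(self, offset, length):
        # ValueError stands in for Java's IllegalArgumentException.
        if offset < 0 or length < 0:
            raise ValueError(
                f"offset ({offset}) and length ({length}) must be non-negative")
        self.offset, self.length = offset, length


class BlockWriter:
    """Hypothetical stand-in for a disk block writer with idempotent closes."""

    def __init__(self):
        self.pending = 0          # bytes written but not yet committed
        self.committed = 0        # bytes kept by commit_and_close()
        self.closed = False
        self.has_committed = False
        self.metric_bytes = 0     # write metric, updated on each write

    def write(self, data):
        if self.closed:
            raise RuntimeError("writer is closed")
        self.pending += len(data)
        self.metric_bytes += len(data)

    def commit_and_close(self):
        if self.closed:
            return                # second close: nothing to do
        self.committed, self.pending = self.pending, 0
        self.closed = self.has_committed = True

    def revert_partial_writes_and_close(self):
        if self.closed:
            return                # no effect on an already-closed writer
        self.metric_bytes -= self.pending   # undo metrics for discarded bytes
        self.pending = 0
        self.closed = True

    def file_segment(self):
        if not self.has_committed:
            raise RuntimeError("file_segment() called before commit_and_close()")
        return FileSegment(0, self.committed)
```

Because both close methods return early once `closed` is set, calling `revert_partial_writes_and_close()` after `commit_and_close()` leaves `metric_bytes` untouched.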
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6397 from JoshRosen/external-sorter-bypass-cleanup and squashes the following commits:
      
      bf3f3f6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into external-sorter-bypass-cleanup
      8b216c4 [Josh Rosen] Guard against negative offsets and lengths in FileSegment
      03f35a4 [Josh Rosen] Minor fix to cleanup logic.
      b5cc35b [Josh Rosen] Move shuffle metrics tests to ShuffleSuite.
      8b8fb9e [Josh Rosen] Add more tests + defensive programming to DiskBlockObjectWriter.
      16564eb [Josh Rosen] Guard against calling fileSegment() before commitAndClose() has been called.
      96811b4 [Josh Rosen] Remove confusing taskMetrics.shuffleWriteMetrics() optional call
      8522b6a [Josh Rosen] Do not perform a map-side sort unless we're also doing map-side aggregation
      08e40f3 [Josh Rosen] Remove excessively clever (and wrong) implementation of newBuffer()
      d7f9938 [Josh Rosen] Add missing overrides; fix compilation
      71d76ff [Josh Rosen] Update Javadoc
      bf0d98f [Josh Rosen] Add comment to clarify confusing factory code
      5197f73 [Josh Rosen] Add missing private[this]
      30ef2c8 [Josh Rosen] Convert BypassMergeSortShuffleWriter to Java
      bc1a820 [Josh Rosen] Fix bug when aggregator is used but map-side combine is disabled
      0d3dcc0 [Josh Rosen] Remove unnecessary overloaded methods
      25b964f [Josh Rosen] Rename SortShuffleSorter to SortShuffleFileWriter
      0d9848c [Josh Rosen] Make it more clear that curWriteMetrics is now only used for spill metrics
      7af7aea [Josh Rosen] Combine spill() and spillToMergeableFile()
      6320112 [Josh Rosen] Add missing negation in deletion success check.
      d267e0d [Josh Rosen] Fix style issue
      7f15f7b [Josh Rosen] Back out extra cleanup-handling code, since this is already covered in stop()
      25aa3bd [Josh Rosen] Make sure to delete outputFile after errors.
      931ca68 [Josh Rosen] Refactor tests.
      6a35716 [Josh Rosen] Refactor logic for deciding when to bypass
      4b03539 [Josh Rosen] Move conf prior to first use
      1265b25 [Josh Rosen] Fix some style errors and comments.
      02355ef [Josh Rosen] More simplification
      d4cb536 [Josh Rosen] Delete more unused code
      bb96678 [Josh Rosen] Add missing interface file
      b6cc1eb [Josh Rosen] Realize that bypass never buffers; proceed to delete tons of code
      6185ee2 [Josh Rosen] WIP towards moving bypass code into own file.
      8d0678c [Josh Rosen] Move diskBytesSpilled getter next to variable
      19bccd6 [Josh Rosen] Remove duplicated buffer creation code.
      18959bb [Josh Rosen] Move comparator methods closer together.
    • [SPARK-7717] [WEBUI] Only showing total memory and cores for alive workers · 2b35c99c
      zhichao.li authored
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #6317 from zhichao-li/workers and squashes the following commits:
      
      d68bf11 [zhichao.li] change prefix
      99b6768 [zhichao.li] remove extra space and add 'Alive' prefix
      1e8eb06 [zhichao.li] only showing alive workers
    • [SPARK-7962] [MESOS] Fix master url parsing in rest submission client. · 78657d53
      Timothy Chen authored
      Only parse standalone master url when master url starts with spark://
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #6517 from tnachen/fix_mesos_client and squashes the following commits:
      
      61a1198 [Timothy Chen] Fix master url parsing in rest submission client.
    • [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike · 609c4923
      Andrew Or authored
      This is a follow-up patch to #6441.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6510 from andrewor14/extends-funsuite-check and squashes the following commits:
      
      6618b46 [Andrew Or] Exempt SparkSinkSuite from the FunSuite check
      99d02ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into extends-funsuite-check
      48874dd [Andrew Or] Guard against direct uses of FunSuite / FunSuiteLike
    • [SPARK-7957] Preserve partitioning when using randomSplit · 7ed06c39
      Burak Yavuz authored
      cc JoshRosen
      Thanks for noticing this!
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:
      
      497465d [Burak Yavuz] addressed code review
      293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit
  7. May 29, 2015
    • [SPARK-7910] [TINY] [JAVAAPI] expose partitioner information in javardd · 82a396c2
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6464 from holdenk/SPARK-7910-expose-partitioner-information-in-javardd and squashes the following commits:
      
      de1e644 [Holden Karau] Fix the test to get the partitioner
      bdb31cc [Holden Karau] Add Mima exclude for the new method
      347ef4c [Holden Karau] Add a quick little test for the partitioner JavaAPI
      f49dca9 [Holden Karau] Add partitoner information to JavaRDDLike and fix some whitespace
    • [SPARK-7558] Demarcate tests in unit-tests.log · 9eb222c1
      Andrew Or authored
      Right now `unit-tests.log` is not of much value because we can't easily tell where the test boundaries are. This patch adds log statements before and after each test to outline the test boundaries, e.g.:
      
      ```
      ===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' =====
      
      15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents
      
      ...
      
      15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s
      15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite:
      
      ***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' *****
      
      ...
      ```
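
The wrapping idea behind the log excerpt above can be sketched in Python. This is not the Scala base class from the patch: `run_with_boundaries` and the `emit` callback are hypothetical, and using `try`/`finally` so the closing marker appears even when a test fails is an assumption.

```python
def run_with_boundaries(suite, test_name, body, emit):
    """Run body() with a marker line before and after, even if body() raises."""
    emit(f"===== TEST OUTPUT FOR {suite}: '{test_name}' =====")
    try:
        body()
    finally:
        emit(f"***** FINISHED {suite}: '{test_name}' *****")


lines = []
run_with_boundaries(
    "o.a.s.serializer.KryoSerializerSuite",
    "kryo with parallelize for primitive arrays",
    lambda: lines.append("...test logging..."),
    lines.append,
)
```

Putting the wrapper in one shared base class is what makes it apply to every suite without touching individual tests.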
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6441 from andrewor14/demarcate-tests and squashes the following commits:
      
      879b060 [Andrew Or] Fix compile after rebase
      d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      7790b6c [Andrew Or] Fix tests after logical merge conflict
      c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      c43ffc4 [Andrew Or] Fix tests?
      8882581 [Andrew Or] Fix tests
      ee22cda [Andrew Or] Fix log message
      fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      12d1e1b [Andrew Or] Various whitespace changes (minor)
      69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
      bbce12e [Andrew Or] Fix manual things that cannot be covered through automation
      da0b12f [Andrew Or] Add core tests as dependencies in all modules
      f7d29ce [Andrew Or] Introduce base abstract class for all test suites
    • [SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker. · 94f62a49
      Reynold Xin authored
      
      …
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6491 from rxin/more-whitespace and squashes the following commits:
      
      f6e63dc [Reynold Xin] [SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker.
    • [SPARK-7524] [SPARK-7846] add configs for keytab and principal, pass these two configs with different way in different modes · a51b133d
      WangTaoTheTonic authored
      
      * Spark now supports long-running services by updating tokens for the namenode, but it only accepts these parameters in the "--k=v" format, which is not very convenient. This patch adds spark.* configs that can be set in the properties file or as system properties.
      
      * The --principal and --keytab options are passed to the client, but when we start the thrift server or spark-shell these two are also passed into the Main class (org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 and org.apache.spark.repl.Main).
      In these two main classes, the arguments passed in are processed by third-party libraries, which leads to errors such as "Invalid option: --principal" or "Unrecgnised option: --principal".
      We should pass these command args in a different form, say as system properties.
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #6051 from WangTaoTheTonic/SPARK-7524 and squashes the following commits:
      
      e65699a [WangTaoTheTonic] change logic to loadEnvironments
      ebd9ea0 [WangTaoTheTonic] merge master
      ecfe43a [WangTaoTheTonic] pass keytab and principal seperately in different mode
      33a7f40 [WangTaoTheTonic] expand the use of the current configs
      08bb4e8 [WangTaoTheTonic] fix wrong cite
      73afa64 [WangTaoTheTonic] add configs for keytab and principal, move originals to internal
    • [SPARK-7863] [CORE] Create SimpleDateFormat for every SimpleDateParam instance because it's not thread-safe · 8db40f67
      zsxwing authored
      
      SimpleDateFormat is not thread-safe. This PR creates a new `SimpleDateFormat` for each `SimpleDateParam` instance.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6406 from zsxwing/SPARK-7863 and squashes the following commits:
      
      aeed4c1 [zsxwing] Rewrite SimpleDateParam
      8cdd986 [zsxwing] Inline formats
      9680a15 [zsxwing] Create SimpleDateFormat for each SimpleDateParam instance because it's not thread-safe
    • [SPARK-7756] [CORE] Use testing cipher suites common to Oracle and IBM security providers · bf465807
      Tim Ellison authored
      Add alias names for supported cipher suites to the sample SSL configuration.
      
      The IBM JSSE provider reports its cipher suites with an SSL_ prefix, but accepts TLS_ prefixed suite names as aliases.  However, Jetty filters the requested ciphers based on the provider's reported supported suites, so the TLS_ versions are never passed through to JSSE, causing an SSL handshake failure.
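
The filtering effect can be sketched in Python. This models only the name matching and is neither Jetty's nor JSSE's code: `filter_requested_suites` is a hypothetical helper and the suite names are illustrative, not the ones in the sample configuration.

```python
def filter_requested_suites(requested, provider_reported):
    """Keep only requested suites whose exact name the provider reports."""
    return [suite for suite in requested if suite in provider_reported]


# Illustrative names: a provider that reports only the SSL_-prefixed form.
reported = {"SSL_RSA_WITH_AES_128_CBC_SHA"}

tls_only = filter_requested_suites(["TLS_RSA_WITH_AES_128_CBC_SHA"], reported)
with_alias = filter_requested_suites(
    ["TLS_RSA_WITH_AES_128_CBC_SHA", "SSL_RSA_WITH_AES_128_CBC_SHA"], reported)
```

With only the TLS_ name requested, nothing survives the filter; listing the SSL_ name alongside it leaves one usable suite.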
      
      Author: Tim Ellison <t.p.ellison@gmail.com>
      
      Closes #6282 from tellison/SSLFailure and squashes the following commits:
      
      8de8a3e [Tim Ellison] Update SecurityManagerSuite with new expected suite names
      96158b2 [Tim Ellison] Update the sample configs to use ciphers that are common to both the Oracle and IBM security providers.
      705421b [Tim Ellison] Merge branch 'master' of github.com:tellison/spark into SSLFailure
      68b9425 [Tim Ellison] Merge branch 'master' of https://github.com/apache/spark into SSLFailure
      b0c35f6 [Tim Ellison] [CORE] Add aliases used for cipher suites in IBM provider
    • [SPARK-7930] [CORE] [STREAMING] Fixed shutdown hook priorities · cd3d9a5c
      Tathagata Das authored
      The shutdown hook for temp directories had priority 100 while SparkContext's was 50, so the local root directory was deleted before SparkContext was shut down. This leads to scary errors in running jobs at the time of shutdown. This is especially a problem when running streaming examples, where Ctrl-C is the only way to shut down.
      
      The fix in this PR is to make the temp directory shutdown priority lower than SparkContext's, so that the temp dirs are the last thing to get deleted, after the SparkContext has been shut down. Also, the DiskBlockManager shutdown priority is changed from the default 100 to temp_dir_prio + 1, so that it gets invoked just before all temp dirs are cleared.
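
The resulting ordering can be sketched in Python. This is a toy hook manager and not Spark's: `ShutdownHooks` is hypothetical, and the temp-dir priority of 25 is an assumed value chosen only because it is lower than the 50 quoted for SparkContext.

```python
class ShutdownHooks:
    """Runs registered hooks in descending priority order (highest first)."""

    def __init__(self):
        self._hooks = []

    def add(self, priority, name, fn):
        self._hooks.append((priority, name, fn))

    def run_all(self):
        order = []
        # sorted() is stable, so hooks with equal priority keep registration order.
        for _priority, name, fn in sorted(self._hooks, key=lambda h: -h[0]):
            fn()
            order.append(name)
        return order


SPARK_CONTEXT_PRIORITY = 50   # value quoted in the commit message
TEMP_DIR_PRIORITY = 25        # assumed: any value below SparkContext's

hooks = ShutdownHooks()
hooks.add(TEMP_DIR_PRIORITY, "delete temp dirs", lambda: None)
hooks.add(TEMP_DIR_PRIORITY + 1, "stop DiskBlockManager", lambda: None)
hooks.add(SPARK_CONTEXT_PRIORITY, "stop SparkContext", lambda: None)
order = hooks.run_all()
```

Here `order` lists SparkContext first, then the DiskBlockManager, and the temp-dir cleanup last.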
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6482 from tdas/SPARK-7930 and squashes the following commits:
      
      d7cbeb5 [Tathagata Das] Removed unnecessary line
      1514d0b [Tathagata Das] Fixed shutdown hook priorities
    • [SPARK-7932] Fix misleading scheduler delay visualization · 04ddcd4d
      Kay Ousterhout authored
      The existing code rounds down to the nearest percent when computing the proportion
      of a task's time that was spent on each phase of execution, and then computes
      the scheduler delay proportion as 100 - sum(all other proportions).  As a result,
      a few extra percent can end up in the scheduler delay. This commit eliminates
      the rounding so that the time visualizations correspond properly to the real times.
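
A small numeric illustration of the rounding effect, written as a Python sketch and not taken from the UI code; the phase names and millisecond values are invented.

```python
import math


def proportions_floor(phase_ms, total_ms):
    """Round each phase down to a whole percent; scheduler delay gets the rest."""
    shown = {name: math.floor(100 * ms / total_ms) for name, ms in phase_ms.items()}
    shown["scheduler_delay"] = 100 - sum(shown.values())
    return shown


def proportions_exact(phase_ms, total_ms):
    """Keep fractional percentages so each share matches the measured time."""
    shown = {name: 100 * ms / total_ms for name, ms in phase_ms.items()}
    shown["scheduler_delay"] = 100 - sum(shown.values())
    return shown


# Hypothetical task: 1000 ms total, of which only 4 ms is real scheduler delay.
phases = {"deserialize": 199, "compute": 599, "serialize": 99, "result_fetch": 99}
floored = proportions_floor(phases, 1000)
exact = proportions_exact(phases, 1000)
```

In this made-up task, flooring drops 0.9 points from each of the four phases, so the scheduler delay is shown as 4% when its true share is 0.4%.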
      
      sarutak If you could take a look at this, that would be great! Not sure if there's a good
      reason to round here that I missed.
      
      cc shivaram
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6484 from kayousterhout/SPARK-7932 and squashes the following commits:
      
      1723cc4 [Kay Ousterhout] [SPARK-7932] Fix misleading scheduler delay visualization
  8. May 28, 2015
    • [SPARK-7926] [PYSPARK] use the official Pyrolite release · c45d58c1
      Xiangrui Meng authored
      Switch to the official Pyrolite release from the one published under `org.spark-project`. Thanks irmen for making the releases on Maven Central. We didn't upgrade to 4.6 because we don't have enough time for QA. I excluded `serpent` from its dependencies because we don't use it in Spark.
      ~~~
      [info]   +-net.jpountz.lz4:lz4:1.3.0
      [info]   +-net.razorvine:pyrolite:4.4
      [info]   +-net.sf.py4j:py4j:0.8.2.1
      ~~~
      
      davies
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6472 from mengxr/SPARK-7926 and squashes the following commits:
      
      7b3c6bf [Xiangrui Meng] use the official Pyrolite release
    • [SPARK-7927] whitespace fixes for core. · 7f7505d8
      Reynold Xin authored
      So we can enable a whitespace enforcement rule in the style checker to save code review time.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6473 from rxin/whitespace-core and squashes the following commits:
      
      058195d [Reynold Xin] Fixed tests.
      fce11e9 [Reynold Xin] [SPARK-7927] whitespace fixes for core.