  1. May 08, 2015
    • Imran Rashid's avatar
      [SPARK-3454] separate json endpoints for data in the UI · c796be70
      Imran Rashid authored
      Exposes data available in the UI as json over http.  Key points:
      
      * new endpoints, handled independently of existing XyzPage classes.  Root entrypoint is `JsonRootResource`
      * Uses jersey + jackson for routing & converting POJOs into json
      * tests against known results in `HistoryServerSuite`
      * also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages.
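A minimal sketch of how these endpoints might be queried, assuming the JSON root is mounted at `/api/v1` on the standard UI port (the `ui_endpoint` helper is hypothetical, not part of the patch):

```python
import json
from urllib.request import urlopen

def ui_endpoint(base, *segments):
    """Join a UI base URL with path segments, e.g. applications/<app-id>/stages."""
    return "/".join([base.rstrip("/")] + list(segments))

url = ui_endpoint("http://localhost:4040/api/v1", "applications")
# With a live Spark UI running, json.load(urlopen(url)) would return the
# application list that the HTML pages render.
```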
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #5940 from squito/SPARK-3454_better_test_files and squashes the following commits:
      
      1a72ed6 [Imran Rashid] rats
      85fdb3e [Imran Rashid] Merge branch 'no_php' into SPARK-3454
      1fc65b0 [Imran Rashid] Revert "Revert "[SPARK-3454] separate json endpoints for data in the UI""
      1276900 [Imran Rashid] get rid of giant event file, replace w/ smaller one; check both shuffle read & shuffle write
      4e12013 [Imran Rashid] just use test case name for expectation file name
      863ef64 [Imran Rashid] rename json files to avoid strange file names and not look like php
      c796be70
    • Lianhui Wang's avatar
      [SPARK-6869] [PYSPARK] Add pyspark archives path to PYTHONPATH · ebff7327
      Lianhui Wang authored
      Based on https://github.com/apache/spark/pull/5478, which provides a PYSPARK_ARCHIVES_PATH env variable. With this PR, we just need to export PYSPARK_ARCHIVES_PATH=/user/spark/pyspark.zip,/user/spark/python/lib/py4j-0.8.2.1-src.zip in conf/spark-env.sh when PySpark is not installed on each node of YARN. I ran a Python application successfully in both yarn-client and yarn-cluster mode with this PR.
      andrewor14 sryza Sephiroth-Lin Can you take a look at this? Thanks.
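The mechanism works because Python can import directly from zip archives placed on `PYTHONPATH`. A hedged sketch of how such a path might be assembled (`pythonpath_with_archives` is an illustrative helper, not code from the PR):

```python
import os

def pythonpath_with_archives(archives, existing=""):
    """Prepend archive zips to an existing PYTHONPATH.

    Zip files are importable roots in Python, so listing pyspark.zip and the
    py4j zip here makes PySpark importable without a per-node installation.
    """
    parts = list(archives) + ([existing] if existing else [])
    return os.pathsep.join(parts)
```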
      
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      Closes #5580 from lianhuiwang/SPARK-6869 and squashes the following commits:
      
      66ffa43 [Lianhui Wang] Update Client.scala
      c2ad0f9 [Lianhui Wang] Update Client.scala
      1c8f664 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
      008850a [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
      f0b4ed8 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
      150907b [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869
      20402cd [Lianhui Wang] use ZipEntry
      9d87c3f [Lianhui Wang] update scala style
      e7bd971 [Lianhui Wang] address vanzin's comments
      4b8a3ed [Lianhui Wang] use pyArchivesEnvOpt
      e6b573b [Lianhui Wang] address vanzin's comments
      f11f84a [Lianhui Wang] zip pyspark archives
      5192cca [Lianhui Wang] update import path
      3b1e4c8 [Lianhui Wang] address tgravescs's comments
      9396346 [Lianhui Wang] put zip to make-distribution.sh
      0d2baf7 [Lianhui Wang] update import paths
      e0179be [Lianhui Wang] add zip pyspark archives in build or sparksubmit
      31e8e06 [Lianhui Wang] update code style
      9f31dac [Lianhui Wang] update code and add comments
      f72987c [Lianhui Wang] add archives path to PYTHONPATH
      ebff7327
    • Zhang, Liye's avatar
      [SPARK-7392] [CORE] bugfix: Kryo buffer size cannot be larger than 2M · c2f0821a
      Zhang, Liye authored
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #5934 from liyezhang556520/kryoBufSize and squashes the following commits:
      
      5707e04 [Zhang, Liye] fix import order
      8693288 [Zhang, Liye] replace multiplier with ByteUnit methods
      9bf93e9 [Zhang, Liye] add tests
      d91e5ed [Zhang, Liye] change kb to mb
      c2f0821a
  2. May 07, 2015
    • Andrew Or's avatar
      [SPARK-7347] DAG visualization: add tooltips to RDDs · 88717ee4
      Andrew Or authored
      This is an addition to #5729.
      
      Here's an example with ALS.
      <img src="https://issues.apache.org/jira/secure/attachment/12731039/tooltip.png" width="400px"></img>
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5957 from andrewor14/viz-hover2 and squashes the following commits:
      
      60e3758 [Andrew Or] Add tooltips for RDDs on job page
      88717ee4
    • Andrew Or's avatar
      [SPARK-7391] DAG visualization: auto expand if linked from another viz · f1216514
      Andrew Or authored
      This is an addition to #5729.
      
      If you click into a stage from the DAG viz on the job page, you might expect the stage's DAG to already be expanded when you arrive. However, once you get to the stage page, you actually have to expand the DAG viz there yourself.
      
      This patch makes this happen automatically. It's a small UX improvement.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5958 from andrewor14/viz-auto-expand and squashes the following commits:
      
      03cd157 [Andrew Or] Automatically expand DAG viz if from job page
      f1216514
    • Timothy Chen's avatar
      [SPARK-7373] [MESOS] Add docker support for launching drivers in mesos cluster mode. · 4eecf550
      Timothy Chen authored
      Building on the existing Docker support for Mesos, this also enables the Mesos cluster mode scheduler to launch Spark drivers in Docker images.
      
      This also allows the executors launched by those drivers to run in the same Docker image, by passing along the Docker settings.
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #5917 from tnachen/spark_cluster_docker and squashes the following commits:
      
      1e842f5 [Timothy Chen] Add docker support for launching drivers in mesos cluster mode.
      4eecf550
    • Tijo Thomas's avatar
      [SPARK-7399] [SPARK CORE] Fixed compilation error in scala 2.11 · 0c33bf81
      Tijo Thomas authored
      Scala has a deterministic naming scheme for the generated methods that return default arguments. Here, the default argument of one of the overloaded methods has to be removed.
      
      Author: Tijo Thomas <tijoparacka@gmail.com>
      
      Closes #5966 from tijoparacka/fix_compilation_error_in_scala2.11 and squashes the following commits:
      
      c90bba8 [Tijo Thomas] Fixed compilation error in scala 2.11
      0c33bf81
  3. May 06, 2015
    • Andrew Or's avatar
      [HOT FIX] For DAG visualization #5954 · 71a452b6
      Andrew Or authored
      71a452b6
    • Andrew Or's avatar
      [SPARK-7371] [SPARK-7377] [SPARK-7408] DAG visualization addendum (#5729) · 8fa6829f
      Andrew Or authored
      This is a follow-up patch for #5729.
      
      **[SPARK-7408]** Move as much style code from JS to CSS as possible
      **[SPARK-7377]** Fix JS error if a job / stage contains only one RDD
      **[SPARK-7371]** Decrease emphasis on RDD on stage page as requested by mateiz pwendell
      
      This patch also includes general code clean up.
      
      <img src="https://issues.apache.org/jira/secure/attachment/12730992/before-after.png" width="500px"></img>
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5954 from andrewor14/viz-emphasize-rdd and squashes the following commits:
      
      3c0d4f0 [Andrew Or] Guard against JS error by rendering arrows only if needed
      f23e15b [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz-emphasize-rdd
      565801f [Andrew Or] Clean up code
      9dab5f0 [Andrew Or] Move styling from JS to CSS + clean up code
      107c0b6 [Andrew Or] Tweak background color, stroke width, font size etc.
      1610c62 [Andrew Or] Implement cluster padding for stage page
      8fa6829f
    • Josh Rosen's avatar
      Add `Private` annotation. · 845d1d4d
      Josh Rosen authored
      This was originally added as part of #4435, which was reverted.
      845d1d4d
    • Josh Rosen's avatar
      [SPARK-7311] Introduce internal Serializer API for determining if serializers... · 002c1238
      Josh Rosen authored
      [SPARK-7311] Introduce internal Serializer API for determining if serializers support object relocation
      
      This patch extends the `Serializer` interface with a new `Private` API which allows serializers to indicate whether they support relocation of serialized objects in serializer stream output.
      
      This relocatability property is described in more detail in `Serializer.scala`, but in a nutshell a serializer supports relocation if reordering the bytes of serialized objects in serialization stream output is equivalent to having re-ordered those elements prior to serializing them.  The optimized shuffle paths introduced in #4450 and #5868 both rely on serializers having this property; this patch just centralizes the logic for determining whether a serializer has this property.  I also added tests and comments clarifying when this works for KryoSerializer.
      
      This change allows the optimizations in #4450 to be applied for shuffles that use `SqlSerializer2`.
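The relocation property can be demonstrated with any serializer that writes each object as an independent frame. A sketch using Python's `pickle` as a stand-in (this is an analogy, not Spark's `Serializer` API): concatenating independently serialized objects in a new order must equal serializing the reordered objects directly.

```python
import pickle

def serialize_stream(objs):
    # Write each object as an independent frame and concatenate the frames.
    return b"".join(pickle.dumps(obj) for obj in objs)

objs = [1, "a", (2, 3)]
frames = [pickle.dumps(obj) for obj in objs]

# Relocation property: reordering the serialized frames is equivalent to
# serializing the objects in that order to begin with.
order = (2, 0, 1)
assert b"".join(frames[i] for i in order) == serialize_stream([objs[i] for i in order])
```

A shuffle path can therefore sort serialized records by rearranging raw bytes, without ever deserializing them.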
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5924 from JoshRosen/SPARK-7311 and squashes the following commits:
      
      50a68ca [Josh Rosen] Address minor nits
      0a7ebd7 [Josh Rosen] Clarify reason why SqlSerializer2 supports this serializer
      123b992 [Josh Rosen] Cleanup for submitting as standalone patch.
      4aa61b2 [Josh Rosen] Add missing newline
      2c1233a [Josh Rosen] Small refactoring of SerializerPropertiesSuite to enable test re-use:
      0ba75e6 [Josh Rosen] Add tests for serializer relocation property.
      450fa21 [Josh Rosen] Back out accidental log4j.properties change
      86d4dcd [Josh Rosen] Flag that SparkSqlSerializer2 supports relocation
      b9624ee [Josh Rosen] Expand serializer API and use new function to help control when new UnsafeShuffle path is used.
      002c1238
    • zsxwing's avatar
      [SPARK-7384][Core][Tests] Fix flaky tests for distributed mode in BroadcastSuite · 9f019c72
      zsxwing authored
      Fixed the following failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/452/testReport/junit/org.apache.spark.broadcast/BroadcastSuite/Unpersisting_HttpBroadcast_on_executors_and_driver_in_distributed_mode/
      
      The tests should wait until all slaves are up. Otherwise, only some of the `BlockManager`s may be registered, which fails the tests.
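The fix amounts to a standard poll-until-ready barrier before asserting anything. A generic sketch (not the suite's actual helper):

```python
import time

def wait_until(condition, timeout_s=10.0, interval_s=0.1,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or the timeout expires.

    Used here in the spirit of "wait until all slaves are up" before
    checking how many BlockManagers have registered.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```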
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5925 from zsxwing/SPARK-7384 and squashes the following commits:
      
      783cb7b [zsxwing] Add comments for _jobProgressListener and remove postfixOps
      1009ef1 [zsxwing] [SPARK-7384][Core][Tests] Fix flaky tests for distributed mode in BroadcastSuite
      9f019c72
  4. May 05, 2015
    • Reynold Xin's avatar
      Revert "[SPARK-3454] separate json endpoints for data in the UI" · 51b3d41e
      Reynold Xin authored
      This reverts commit d4973580.
      
      The commit broke Spark on Windows.
      51b3d41e
    • Sandy Ryza's avatar
      Some minor cleanup after SPARK-4550. · 0092abb4
      Sandy Ryza authored
      JoshRosen this PR addresses the comments you left on #4450 after it got merged.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #5916 from sryza/sandy-spark-4550-cleanup and squashes the following commits:
      
      dee3d85 [Sandy Ryza] Some minor cleanup after SPARK-4550.
      0092abb4
    • zsxwing's avatar
      [SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs for streaming statistics · 489700c8
      zsxwing authored
      This is the initial work of SPARK-6939. Not yet ready for code review. Here are the screenshots:
      
      ![graph1](https://cloud.githubusercontent.com/assets/1000778/7165766/465942e0-e3dc-11e4-9b05-c184b09d75dc.png)
      
      ![graph2](https://cloud.githubusercontent.com/assets/1000778/7165779/53f13f34-e3dc-11e4-8714-a4a75b7e09ff.png)
      
      TODOs:
      - [x] Display more information on mouse hover
      - [x] Align the timeline and distribution graphs
      - [x] Clean up the codes
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5533 from zsxwing/SPARK-6939 and squashes the following commits:
      
      9f7cd19 [zsxwing] Merge branch 'master' into SPARK-6939
      deacc3f [zsxwing] Remove unused import
      cd03424 [zsxwing] Fix .rat-excludes
      70cc87d [zsxwing] Streaming Scheduling Delay => Scheduling Delay
      d457277 [zsxwing] Fix UIUtils in BatchPage
      b3f303e [zsxwing] Add comments for unclear classes and methods
      ff0bff8 [zsxwing] Make InputDStream.name private[streaming]
      cc392c5 [zsxwing] Merge branch 'master' into SPARK-6939
      e275e23 [zsxwing] Move time related methods to Streaming's UIUtils
      d5d86f6 [zsxwing] Fix incorrect lastErrorTime
      3be4b7a [zsxwing] Use InputInfo
      b50fa32 [zsxwing] Jump to the batch page when clicking a point in the timeline graphs
      203605d [zsxwing] Merge branch 'master' into SPARK-6939
      74307cf [zsxwing] Reuse the data for histogram graphs to reduce the page size
      2586916 [zsxwing] Merge branch 'master' into SPARK-6939
      70d8533 [zsxwing] Remove BatchInfo.numRecords and a few renames
      7bbdc0a [zsxwing] Hide the receiver sub table if no receiver
      a2972e9 [zsxwing] Add some ui tests for StreamingPage
      fd03ad0 [zsxwing] Add a test to verify no memory leak
      4a8f886 [zsxwing] Merge branch 'master' into SPARK-6939
      18607a1 [zsxwing] Merge branch 'master' into SPARK-6939
      d0b0aec [zsxwing] Clean up the codes
      a459f49 [zsxwing] Add a dash line to processing time graphs
      8e4363c [zsxwing] Prepare for the demo
      c81a1ee [zsxwing] Change time unit in the graphs automatically
      4c0b43f [zsxwing] Update Streaming UI
      04c7500 [zsxwing] Make the server and client use the same timezone
      fed8219 [zsxwing] Move the x axis at the top and show a better tooltip
      c23ce10 [zsxwing] Make two graphs close
      d78672a [zsxwing] Make the X axis use the same range
      881c907 [zsxwing] Use histogram for distribution
      5688702 [zsxwing] Fix the unit test
      ddf741a [zsxwing] Fix the unit test
      ad93295 [zsxwing] Remove unnecessary codes
      a0458f9 [zsxwing] Clean the codes
      b82ed1e [zsxwing] Update the graphs as per comments
      dd653a1 [zsxwing] Add timeline and histogram graphs for streaming statistics
      489700c8
    • jerryshao's avatar
      [SPARK-7007] [CORE] Add a metric source for ExecutorAllocationManager · 9f1f9b10
      jerryshao authored
      Add a metric source to expose the internal status of ExecutorAllocationManager, to better monitor the resource usage of executors when dynamic allocation is enabled. Please help to review, thanks a lot.
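A metric source is essentially a set of named gauges that read live state when polled. A sketch of the idea (the gauge names and manager fields here are illustrative, not the ones the patch registers):

```python
class ExecutorAllocationSource:
    """Sketch of a metric source: named gauges that read live manager state.

    A metrics system would poll each gauge on its reporting interval rather
    than having the manager push values.
    """
    def __init__(self, manager):
        self.source_name = "ExecutorAllocationManager"
        self.gauges = {
            "numberTargetExecutors": lambda: manager.target_executors,
            "numberExecutorsPendingToRemove": lambda: len(manager.pending_removals),
        }

    def snapshot(self):
        # Evaluate every gauge once, as a reporter would.
        return {name: gauge() for name, gauge in self.gauges.items()}
```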
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #5589 from jerryshao/dynamic-allocation-source and squashes the following commits:
      
      104d155 [jerryshao] rebase and address the comments
      c501a2c [jerryshao] Address the comments
      d237ba5 [jerryshao] Address the comments
      2c3540f [jerryshao] Add a metric source for ExecutorAllocationManager
      9f1f9b10
    • Andrew Or's avatar
      [SPARK-7318] [STREAMING] DStream cleans objects that are not closures · 57e9f29e
      Andrew Or authored
      I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5860 from andrewor14/streaming-closure-cleaner and squashes the following commits:
      
      8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
      5ee4e25 [Andrew Or] Fix tests
      eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner
      67eeff4 [Andrew Or] Add tests
      a4fa768 [Andrew Or] Clean the closure, not the RDD
      57e9f29e
    • Andrew Or's avatar
      [SPARK-7237] Many user provided closures are not actually cleaned · 1fdabf8d
      Andrew Or authored
      Note: ~140 lines are tests.
      
      In a nutshell, we never cleaned closures the user provided through the following operations:
      - sortBy
      - keyBy
      - mapPartitions
      - mapPartitionsWithIndex
      - aggregateByKey
      - foldByKey
      - foreachAsync
      - one of the aliases for runJob
      - runApproximateJob
      
      For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237).
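The underlying hazard is easiest to see with closure capture itself. A Python sketch (Spark's `ClosureCleaner` works on Scala bytecode, so this is only an analogy): a closure that references `self` drags the whole enclosing object into its environment, while copying out the needed field first keeps the captured environment minimal.

```python
class Driver:
    def __init__(self):
        self.big_unserializable_handle = object()  # stands in for e.g. a context object
        self.factor = 10

    def dirty_closure(self):
        # References self, so the whole Driver (handle and all) is captured.
        return lambda x: x * self.factor

    def clean_closure(self):
        # Copy out just the field we need; only the int is captured.
        factor = self.factor
        return lambda x: x * factor

d = Driver()
assert d.dirty_closure().__closure__[0].cell_contents is d   # captured the object
assert d.clean_closure().__closure__[0].cell_contents == 10  # captured only the value
```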
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5787 from andrewor14/clean-more and squashes the following commits:
      
      2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
      7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
      df3caa3 [Andrew Or] Address comments
      7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
      6498f44 [Andrew Or] Add missing test for groupBy
      e83699e [Andrew Or] Clean one more
      8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app
      9ac5f9b [Andrew Or] Clean closures that are not currently cleaned
      19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
      1fdabf8d
    • zsxwing's avatar
      [SPARK-5074] [CORE] [TESTS] Fix the flakey test 'run shuffle with map stage... · 5ffc73e6
      zsxwing authored
      [SPARK-5074] [CORE] [TESTS] Fix the flakey test 'run shuffle with map stage failure' in DAGSchedulerSuite
      
      Test failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/2240/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/run_shuffle_with_map_stage_failure/
      
      This is because many tests share the same `JobListener`, and `scheduler` isn't stopped after each test, so it is actually still running. When running the test `run shuffle with map stage failure`, some previous test may trigger [ResubmitFailedStages](https://github.com/apache/spark/blob/ebc25a4ddfe07a67668217cec59893bc3b8cf730/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1120) logic, report `jobFailed`, and override the global `failure` variable.
      
      This PR uses `after` to call `scheduler.stop()` for each test.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5903 from zsxwing/SPARK-5074 and squashes the following commits:
      
      1e6f13e [zsxwing] Fix the flakey test 'run shuffle with map stage failure' in DAGSchedulerSuite
      5ffc73e6
    • Imran Rashid's avatar
      [SPARK-3454] separate json endpoints for data in the UI · d4973580
      Imran Rashid authored
      Exposes data available in the UI as json over http.  Key points:
      
      * new endpoints, handled independently of existing XyzPage classes.  Root entrypoint is `JsonRootResource`
      * Uses jersey + jackson for routing & converting POJOs into json
      * tests against known results in `HistoryServerSuite`
      * also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages.
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #4435 from squito/SPARK-3454 and squashes the following commits:
      
      da1e35f [Imran Rashid] typos etc.
      5e78b4f [Imran Rashid] fix rendering problems
      5ae02ad [Imran Rashid] Merge branch 'master' into SPARK-3454
      f016182 [Imran Rashid] change all constructors json-pojo class constructors to be private[spark] to protect us from mima-false-positives if we add fields
      3347b72 [Imran Rashid] mark EnumUtil as @Private
      ec140a2 [Imran Rashid] create @Private
      cc1febf [Imran Rashid] add docs on the metrics-as-json api
      cbaf287 [Imran Rashid] Merge branch 'master' into SPARK-3454
      56db31e [Imran Rashid] update tests for mulit-attempt
      7f3bc4e [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt"
      67008b4 [Imran Rashid] rats
      9e51400 [Imran Rashid] style
      c9bae1c [Imran Rashid] handle multiple attempts per app
      b87cd63 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
      188762c [Imran Rashid] multi-attempt
      2af11e5 [Imran Rashid] Merge branch 'master' into SPARK-3454
      befff0c [Imran Rashid] review feedback
      14ac3ed [Imran Rashid] jersey-core needs to be explicit; move version & scope to parent pom.xml
      f90680e [Imran Rashid] Merge branch 'master' into SPARK-3454
      dc8a7fe [Imran Rashid] style, fix errant comments
      acb7ef6 [Imran Rashid] fix indentation
      7bf1811 [Imran Rashid] move MetricHelper so mima doesnt think its exposed; comments
      9d889d6 [Imran Rashid] undo some unnecessary changes
      f48a7b0 [Imran Rashid] docs
      52bbae8 [Imran Rashid] StorageListener & StorageStatusListener needs to synchronize internally to be thread-safe
      31c79ce [Imran Rashid] asm no longer needed for SPARK_PREPEND_CLASSES
      b2f8b91 [Imran Rashid] @DeveloperApi
      2e19be2 [Imran Rashid] lazily convert ApplicationInfo to avoid memory overhead
      ba3d9d2 [Imran Rashid] upper case enums
      39ac29c [Imran Rashid] move EnumUtil
      d2bde77 [Imran Rashid] update error handling & scoping
      4a234d3 [Imran Rashid] avoid jersey-media-json-jackson b/c of potential version conflicts
      a157a2f [Imran Rashid] style
      7bd4d15 [Imran Rashid] delete security test, since it doesnt do anything
      a325563 [Imran Rashid] style
      a9c5cf1 [Imran Rashid] undo changes superceeded by master
      0c6f968 [Imran Rashid] update deps
      1ed0d07 [Imran Rashid] Merge branch 'master' into SPARK-3454
      4c92af6 [Imran Rashid] style
      f2e63ad [Imran Rashid] Merge branch 'master' into SPARK-3454
      c22b11f [Imran Rashid] fix compile error
      9ea682c [Imran Rashid] go back to good ol' java enums
      cf86175 [Imran Rashid] style
      d493b38 [Imran Rashid] Merge branch 'master' into SPARK-3454
      f05ae89 [Imran Rashid] add in ExecutorSummaryInfo for MiMa :(
      101a698 [Imran Rashid] style
      d2ef58d [Imran Rashid] revert changes that had HistoryServer refresh the application listing more often
      b136e39b [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt"
      e031719 [Imran Rashid] fixes from review
      1f53a66 [Imran Rashid] style
      b4a7863 [Imran Rashid] fix compile error
      2c8b7ee [Imran Rashid] rats
      1578a4a [Imran Rashid] doc
      674f8dc [Imran Rashid] more explicit about total numbers of jobs & stages vs. number retained
      9922be0 [Imran Rashid] Merge branch 'master' into stage_distributions
      f5a5196 [Imran Rashid] undo removal of renderJson from MasterPage, since there is no substitute yet
      db61211 [Imran Rashid] get JobProgressListener directly from UI
      fdfc181 [Imran Rashid] stage/taskList
      63eb4a6 [Imran Rashid] tests for taskSummary
      ad27de8 [Imran Rashid] error handling on quantile values
      b2efcaf [Imran Rashid] cleanup, combine stage-related paths into one resource
      aaba896 [Imran Rashid] wire up task summary
      a4b1397 [Imran Rashid] stage metric distributions
      e48ba32 [Imran Rashid] rename
      eaf3bbb [Imran Rashid] style
      25cd894 [Imran Rashid] if only given day, assume GMT
      51eaedb [Imran Rashid] more visibility fixes
      9f28b7e [Imran Rashid] ack, more cleanup
      99764e1 [Imran Rashid] Merge branch 'SPARK-3454_w_jersey' into SPARK-3454
      a61a43c [Imran Rashid] oops, remove accidental checkin
      a066055 [Imran Rashid] set visibility on a lot of classes
      1f361c8 [Imran Rashid] update rat-excludes
      0be5120 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
      2382bef [Imran Rashid] switch to using new "enum"
      fef6605 [Imran Rashid] some utils for working w/ new "enum" format
      dbfc7bf [Imran Rashid] style
      b86bcb0 [Imran Rashid] update test to look at one stage attempt
      5f9df24 [Imran Rashid] style
      7fd156a [Imran Rashid] refactor jsonDiff to avoid code duplication
      73f1378 [Imran Rashid] test json; also add test cases for cleaned stages & jobs
      97d411f [Imran Rashid] json endpoint for one job
      0c96147 [Imran Rashid] better error msgs for bad stageId vs bad attemptId
      dddbd29 [Imran Rashid] stages have attempt; jobs are sorted; resource for all attempts for one stage
      190c17a [Imran Rashid] StagePage should distinguish no task data, from unknown stage
      84cd497 [Imran Rashid] AllJobsPage should still report correct completed & failed job count, even if some have been cleaned, to make it consistent w/ AllStagesPage
      36e4062 [Imran Rashid] SparkUI needs to know about startTime, so it can list its own applicationInfo
      b4c75ed [Imran Rashid] fix merge conflicts; need to widen visibility in a few cases
      e91750a [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
      56d2fc7 [Imran Rashid] jersey needs asm for SPARK_PREPEND_CLASSES to work
      f7df095 [Imran Rashid] add test for accumulables, and discover that I need update after all
      9c0c125 [Imran Rashid] add accumulableInfo
      00e9cc5 [Imran Rashid] more style
      3377e61 [Imran Rashid] scaladoc
      d05f7a9 [Imran Rashid] dont use case classes for status api POJOs, since they have binary compatibility issues
      654cecf [Imran Rashid] move all the status api POJOs to one file
      b86e2b0 [Imran Rashid] style
      18a8c45 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
      5598f19 [Imran Rashid] delete some unnecessary code, more to go
      56edce0 [Imran Rashid] style
      017c755 [Imran Rashid] add in metrics now available
      1b78cb7 [Imran Rashid] fix some import ordering
      0dc3ea7 [Imran Rashid] if app isnt found, reload apps from FS before giving up
      c7d884f [Imran Rashid] fix merge conflicts
      0c12b50 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
      b6a96a8 [Imran Rashid] compare json by AST, not string
      cd37845 [Imran Rashid] switch to using java.util.Dates for times
      a4ab5aa [Imran Rashid] add in explicit dependency on jersey 1.9 -- maven wasn't happy before this
      4fdc39f [Imran Rashid] refactor case insensitive enum parsing
      cba1ef6 [Imran Rashid] add security (maybe?) for metrics json
      f0264a7 [Imran Rashid] switch to using jersey for metrics json
      bceb3a9 [Imran Rashid] set http response code on error, some testing
      e0356b6 [Imran Rashid] put new test expectation files in rat excludes (is this OK?)
      b252e7a [Imran Rashid] small cleanup of accidental changes
      d1a8c92 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
      4b398d0 [Imran Rashid] expose UI data as json in new endpoints
      d4973580
    • Sandy Ryza's avatar
      [SPARK-5112] Expose SizeEstimator as a developer api · 4222da68
      Sandy Ryza authored
      "The best way to size the amount of memory consumption your dataset will require is to create an RDD, put it into cache, and look at the SparkContext logs on your driver program. The logs will tell you how much memory each partition is consuming, which you can aggregate to get the total size of the RDD."
      -the Tuning Spark page
      
      This is a pain. It would be much nicer to simply expose functionality for understanding the memory footprint of a Java object.
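A rough sketch of what such functionality looks like, using Python's `sys.getsizeof` as a stand-in for JVM object-graph walking (an analogy only, not SizeEstimator's actual algorithm):

```python
import sys

def estimate_size(obj, _seen=None):
    """Estimate the deep in-memory footprint of an object graph, in bytes."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:          # don't double-count shared references
        return 0
    _seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(estimate_size(k, _seen) + estimate_size(v, _seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(estimate_size(item, _seen) for item in obj)
    return size
```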
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3913 from sryza/sandy-spark-5112 and squashes the following commits:
      
      8d9e082 [Sandy Ryza] Add SizeEstimator in org.apache.spark
      2e1a906 [Sandy Ryza] Revert "Move SizeEstimator out of util"
      93f4cd0 [Sandy Ryza] Move SizeEstimator out of util
      e21c1f4 [Sandy Ryza] Remove unused import
      798ab88 [Sandy Ryza] Update documentation and add to SparkContext
      34c523c [Sandy Ryza] SPARK-5112. Expose SizeEstimator as a developer api
      4222da68
    • Tathagata Das's avatar
      [HOTFIX] [TEST] Ignoring flaky tests · 8776fe0b
      Tathagata Das authored
      org.apache.spark.DriverSuite.driver should exit after finishing without cleanup (SPARK-530)
      https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2267/
      
      org.apache.spark.deploy.SparkSubmitSuite.includes jars passed in through --jars
      https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2271/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/testReport/
      
      org.apache.spark.streaming.flume.FlumePollingStreamSuite.flume polling test
      https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2269/
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5901 from tdas/ignore-flaky-tests and squashes the following commits:
      
      9cd8667 [Tathagata Das] Ignoring tests.
      8776fe0b
    • Tathagata Das's avatar
      [SPARK-7139] [STREAMING] Allow received block metadata to be saved to WAL and... · 1854ac32
      Tathagata Das authored
      [SPARK-7139] [STREAMING] Allow received block metadata to be saved to WAL and recovered on driver failure
      
      - Enabled ReceivedBlockTracker WAL by default
      - Stored block metadata in the WAL
      - Optimized WALBackedBlockRDD by skipping block fetch when the block is known to not exist in Spark
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5732 from tdas/SPARK-7139 and squashes the following commits:
      
      575476e [Tathagata Das] Added more tests to get 100% coverage of the WALBackedBlockRDD
      19668ba [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7139
      685fab3 [Tathagata Das] Addressed comments in PR
      637bc9c [Tathagata Das] Changed segment to handle
      466212c [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7139
      5f67a59 [Tathagata Das] Fixed HdfsUtils to handle append in local file system
      1bc5bc3 [Tathagata Das] Fixed bug on unexpected recovery
      d06fa21 [Tathagata Das] Enabled ReceivedBlockTracker by default, stored block metadata and optimized block fetching in WALBackedBlockRDD
      1854ac32
    • Marcelo Vanzin's avatar
      [MINOR] [BUILD] Declare ivy dependency in root pom. · c5790a2f
      Marcelo Vanzin authored
      Without this, any dependency that pulls ivy transitively may override
      the version and potentially cause issue. In my machine, the hive tests
      were pulling an old version of ivy, and subsequently failing with a
      "NoSuchMethodError".
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5893 from vanzin/ivy-dep-fix and squashes the following commits:
      
      ea2112d [Marcelo Vanzin] [minor] [build] Declare ivy dependency in root pom.
      c5790a2f
    • Xiangrui Meng's avatar
      [SPARK-7314] [SPARK-3524] [PYSPARK] upgrade Pyrolite to 4.4 · e9b16e67
      Xiangrui Meng authored
      This PR upgrades Pyrolite to 4.4, which contains the bug fix for SPARK-3524 and some other performance improvements (e.g., SPARK-6288). The artifact is still under `org.spark-project` on Maven Central since there is no official release published there.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5850 from mengxr/SPARK-7314 and squashes the following commits:
      
      2ed4a95 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7314
      da3c2dd [Xiangrui Meng] remove my repo
      fe7e29b [Xiangrui Meng] switch to maven central
      6ddac0e [Xiangrui Meng] reverse the machine code for float/double
      d2d5b5b [Xiangrui Meng] change back to 4.4
      7824a9c [Xiangrui Meng] use Pyrolite 3.1
      cc3903a [Xiangrui Meng] upgrade Pyrolite to 4.4-0 for testing
      e9b16e67
  5. May 04, 2015
    • Bryan Cutler's avatar
      [SPARK-7236] [CORE] Fix to prevent AkkaUtils askWithReply from sleeping on final attempt · 8aa5aea7
      Bryan Cutler authored
      Added a check so that if `AkkaUtils.askWithReply` is on the final attempt, it will not sleep for the `retryInterval`.  This should also prevent the thread from sleeping for `Int.Max` when using `askWithReply` with default values for `maxAttempts` and `retryInterval`.
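      The described fix can be sketched as a generic retry loop that sleeps only when another attempt will follow. This is an illustration only; `ask_with_reply`, `max_attempts`, and `retry_interval` are stand-ins for the parameters of Spark's Scala-side `AkkaUtils.askWithReply`, not the actual API:

      ```python
      import time

      def ask_with_reply(ask, max_attempts, retry_interval):
          # names are stand-ins for AkkaUtils.askWithReply's parameters
          last_error = None
          for attempt in range(1, max_attempts + 1):
              try:
                  return ask()
              except Exception as e:
                  last_error = e
              # the fix: sleep only when another attempt will follow,
              # so the final failed attempt returns immediately
              if attempt < max_attempts:
                  time.sleep(retry_interval)
          raise last_error
      ```

      With the pre-fix behavior (sleeping unconditionally), default values such as a very large `retry_interval` could leave the thread sleeping even after the last attempt had already failed.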
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #5896 from BryanCutler/askWithReply-sleep-7236 and squashes the following commits:
      
      653a07b [Bryan Cutler] [SPARK-7236] Fix to prevent AkkaUtils askWithReply from sleeping on final attempt
      8aa5aea7
    • Andrew Or's avatar
      [SPARK-6943] [SPARK-6944] DAG visualization on SparkUI · fc8b5819
      Andrew Or authored
      This patch adds the functionality to display the RDD DAG on the SparkUI.
      
      This DAG describes the relationships between
      - an RDD and its dependencies,
      - an RDD and its operation scopes, and
      - an RDD's operation scopes and the stage / job hierarchy
      
      An operation scope here refers to the existing public APIs that created the RDDs (e.g. `textFile`, `treeAggregate`). In the future, we can expand this to include higher level operations like SQL queries.
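      The operation-scope idea can be illustrated with a toy model (all names here are invented; Spark's real implementation wraps RDD-creating methods in Scala closures). Each public API runs its body inside a named scope, and nested calls naturally record the parent/child hierarchy the UI can render:

      ```python
      import contextlib

      _scope_stack = []
      recorded = []  # (rdd_name, enclosing scopes, outermost first)

      @contextlib.contextmanager
      def with_scope(name):
          # push the scope for the duration of the API call's body
          _scope_stack.append(name)
          try:
              yield
          finally:
              _scope_stack.pop()

      def make_rdd(name):
          # an RDD remembers the full scope stack it was created under
          recorded.append((name, list(_scope_stack)))
          return name

      def text_file(path):
          # each public RDD-creating API wraps its body in its own scope
          with with_scope("textFile"):
              return make_rdd(path)

      def my_cool_method(path):
          # user-level method nesting produces a scope hierarchy
          with with_scope("myCoolMethod"):
              return text_file(path)
      ```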
      
      *Note: This blatantly stole a few lines of HTML and JavaScript from #5547 (thanks shroffpradyumn!)*
      
      Here's what the job page looks like:
      <img src="https://issues.apache.org/jira/secure/attachment/12730286/job-page.png" width="700px"/>
      and the stage page:
      <img src="https://issues.apache.org/jira/secure/attachment/12730287/stage-page.png" width="300px"/>
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5729 from andrewor14/viz2 and squashes the following commits:
      
      666c03b [Andrew Or] Round corners of RDD boxes on stage page (minor)
      01ba336 [Andrew Or] Change RDD cache color to red (minor)
      6f9574a [Andrew Or] Add tests for RDDOperationScope
      1c310e4 [Andrew Or] Wrap a few more RDD functions in an operation scope
      3ffe566 [Andrew Or] Restore "null" as default for RDD name
      5fdd89d [Andrew Or] children -> child (minor)
      0d07a84 [Andrew Or] Fix python style
      afb98e2 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
      0d7aa32 [Andrew Or] Fix python tests
      3459ab2 [Andrew Or] Fix tests
      832443c [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
      429e9e1 [Andrew Or] Display cached RDDs on the viz
      b1f0fd1 [Andrew Or] Rename OperatorScope -> RDDOperationScope
      31aae06 [Andrew Or] Extract visualization logic from listener
      83f9c58 [Andrew Or] Implement a programmatic representation of operator scopes
      5a7faf4 [Andrew Or] Rename references to viz scopes to viz clusters
      ee33d52 [Andrew Or] Separate HTML generating code from listener
      f9830a2 [Andrew Or] Refactor + clean up + document JS visualization code
      b80cc52 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
      0706992 [Andrew Or] Add link from jobs to stages
      deb48a0 [Andrew Or] Translate stage boxes taking into account the width
      5c7ce16 [Andrew Or] Connect RDDs across stages + update style
      ab91416 [Andrew Or] Introduce visualization to the Job Page
      5f07e9c [Andrew Or] Remove more return statements from scopes
      5e388ea [Andrew Or] Fix line too long
      43de96e [Andrew Or] Add parent IDs to StageInfo
      6e2cfea [Andrew Or] Remove all return statements in `withScope`
      d19c4da [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
      7ef957c [Andrew Or] Fix scala style
      4310271 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz2
      aa868a9 [Andrew Or] Ensure that HadoopRDD is actually serializable
      c3bfcae [Andrew Or] Re-implement scopes using closures instead of annotations
      52187fc [Andrew Or] Rat excludes
      09d361e [Andrew Or] Add ID to node label (minor)
      71281fa [Andrew Or] Embed the viz in the UI in a toggleable manner
      8dd5af2 [Andrew Or] Fill in documentation + miscellaneous minor changes
      fe7816f [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz
      205f838 [Andrew Or] Reimplement rendering with dagre-d3 instead of viz.js
      5e22946 [Andrew Or] Merge branch 'master' of github.com:apache/spark into viz
      6a7cdca [Andrew Or] Move RDD scope util methods and logic to its own file
      494d5c2 [Andrew Or] Revert a few unintended style changes
      9fac6f3 [Andrew Or] Re-implement scopes through annotations instead
      f22f337 [Andrew Or] First working implementation of visualization with vis.js
      2184348 [Andrew Or] Translate RDD information to dot file
      5143523 [Andrew Or] Expose the necessary information in RDDInfo
      a9ed4f9 [Andrew Or] Add a few missing scopes to certain RDD methods
      6b3403b [Andrew Or] Scope all RDD methods
      fc8b5819
  6. May 03, 2015
    • Michael Armbrust's avatar
      [SPARK-6907] [SQL] Isolated client for HiveMetastore · daa70bf1
      Michael Armbrust authored
      This PR adds initial support for loading multiple versions of Hive in a single JVM and provides a common interface for extracting metadata from the `HiveMetastoreClient` for a given version.  This is accomplished by creating an isolated `ClassLoader` that operates according to the following rules:
      
       - __Shared Classes__: Java, Scala, logging, and Spark classes are delegated to `baseClassLoader`
        allowing the results of calls to the `ClientInterface` to be visible externally.
       - __Hive Classes__: new instances are loaded from `execJars`.  These classes are not
        accessible externally due to their custom loading.
       - __Barrier Classes__: Classes such as `ClientWrapper` are defined in Spark but must link to a specific version of Hive.  As a result, the bytecode is acquired from the Spark `ClassLoader` but a new copy is created for each instance of `IsolatedClientLoader`.
        This new instance is able to see a specific version of Hive without using reflection, wherever Hive is consistent across versions. Since
        this is a unique instance, it is not visible externally other than as a generic
        `ClientInterface`, unless `isolationOn` is set to `false`.
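      A toy model of these three rules (illustrative only; the real `IsolatedClientLoader` is a JVM `ClassLoader`, and the prefixes and barrier-class names below are assumptions for the sketch). Note that barrier classes must be checked before the shared-prefix rule, since they live under Spark's own packages:

      ```python
      def classify(name):
          # barrier classes: bytecode comes from Spark, but a fresh copy is
          # made per loader instance so each can link to its own Hive
          barrier_suffixes = ("ClientWrapper",)
          # shared classes: delegated to baseClassLoader, visible externally
          shared_prefixes = ("java.", "scala.", "org.apache.log4j.", "org.apache.spark.")
          if any(name.endswith(s) for s in barrier_suffixes):
              return "barrier"
          if any(name.startswith(p) for p in shared_prefixes):
              return "shared"
          # everything else: loaded from the Hive execJars, isolated
          return "isolated"
      ```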
      
      In addition to the unit tests, I have also tested this locally against mysql instances of the Hive Metastore.  I've also successfully ported Spark SQL to run with this client, but due to the size of the changes, that will come in a follow-up PR.
      
      By default, Hive jars are currently downloaded from Maven automatically for a given version to ease packaging and testing.  However, there is also support for specifying their location manually for deployments without internet.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #5851 from marmbrus/isolatedClient and squashes the following commits:
      
      c72f6ac [Michael Armbrust] rxins comments
      1e271fa [Michael Armbrust] [SPARK-6907][SQL] Isolated client for HiveMetastore
      daa70bf1
  7. May 02, 2015
    • Ye Xianjin's avatar
      [SPARK-6030] [CORE] Using simulated field layout method to compute class shellSize · bfcd528d
      Ye Xianjin authored
      SizeEstimator gives the wrong result for Integer on a 64-bit JVM with UseCompressedOops on; this PR fixes that. For more details, please refer to [SPARK-6030](https://issues.apache.org/jira/browse/SPARK-6030)
      sryza, I noticed there is a PR to expose SizeEstimator; maybe that should wait until this PR is merged, if we confirm this problem.
      And shivaram, would you mind reviewing this PR, since you contributed the related code? Also cc to srowen and mateiz
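      The fix hinges on rounding a shell size up to the JVM's object alignment. A minimal sketch of the `alignSizeUp` helper mentioned in the commit list (assuming the usual bitwise rounding trick, with `align_size` a power of two such as the JVM's 8-byte alignment):

      ```python
      def align_size_up(size, align_size=8):
          # round size up to the next multiple of align_size, which must be
          # a power of two; the mask clears the low bits after the bump
          return (size + align_size - 1) & ~(align_size - 1)
      ```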
      
      Author: Ye Xianjin <advancedxy@gmail.com>
      
      Closes #4783 from advancedxy/SPARK-6030 and squashes the following commits:
      
      c4dcb41 [Ye Xianjin] Add super.beforeEach in the beforeEach method to make the trait stackable. Remove useless leading whitespace.
      3f80640 [Ye Xianjin] The size of the Integer class changes from 24 to 16 on a 64-bit JVM with the -UseCompressedOops flag on after the fix. I don't know how 100000 was originally calculated; it looks like it is the magic number that guarantees spilling. Because of the size change, the test fails since there is no spilling at all. Changing it to a slightly larger number fixes that.
      e849d2d [Ye Xianjin] Merge two shellSize assignments into one. Add some explanation to alignSizeUp method.
      85a0b51 [Ye Xianjin] Fix typos and update wording in comments. Using alignSizeUp to compute alignSize.
      d27eb77 [Ye Xianjin] Add some detailed comments in the code. Add some test cases. It's very difficult to design test cases, as the final object alignment will hide a lot of field layout details if we just consider the whole size.
      842aed1 [Ye Xianjin] primitiveSize(cls) can just return Int. Use a simplified class field layout method to calculate class instance size. Will add more documents and test cases. Add a new alignSizeUp function which uses bitwise operators to speed up.
      62e8ab4 [Ye Xianjin] Don't alignSize for objects' shellSize, alignSize when added to state.size. Add some primitive wrapper objects size tests.
      bfcd528d
    • Mridul Muralidharan's avatar
      [SPARK-7323] [SPARK CORE] Use insertAll instead of insert while merging combiners in reducer · da303526
      Mridul Muralidharan authored
      Author: Mridul Muralidharan <mridulm@yahoo-inc.com>
      
      Closes #5862 from mridulm/optimize_aggregator and squashes the following commits:
      
      61cf43a [Mridul Muralidharan] Use insertAll instead of insert - much more expensive to do it per tuple
      da303526
    • Andrew Or's avatar
      [SPARK-7120] [SPARK-7121] Closure cleaner nesting + documentation + tests · 7394e7ad
      Andrew Or authored
      Note: ~600 lines of this are test code, and ~100 lines are documentation.
      
      **[SPARK-7121]** ClosureCleaner does not handle nested closures properly. For instance, in SparkContext, I tried to do the following:
      ```
      def scope[T](body: => T): T = body // no-op
      def myCoolMethod(path: String): RDD[String] = scope {
        parallelize(1 to 10).map { _ => path }
      }
      ```
      and I got an exception complaining that SparkContext is not serializable. The issue here is that the inner closure is getting its path from the outer closure (the scope), but the outer closure references the SparkContext object itself to get the `parallelize` method.
      
      Note, however, that the inner closure doesn't actually need the SparkContext; it just needs a field from the outer closure. If we modify ClosureCleaner to clean the outer closure recursively using only the fields accessed by the inner closure, then we can serialize the inner closure.
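      The transitive-cleaning idea can be sketched abstractly: keep only the outer fields the inner closure actually accesses. This is a hypothetical illustration of the principle, not ClosureCleaner's bytecode-level implementation:

      ```python
      def clean_outer(outer_fields, fields_accessed_by_inner):
          # keep only what the inner closure actually reads; dropping the
          # rest (e.g. a SparkContext reference) is what makes the cleaned
          # closure serializable
          return {name: value for name, value in outer_fields.items()
                  if name in fields_accessed_by_inner}
      ```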
      
      **[SPARK-7120]** Also, the other thing is that this file is one of the least understood, partly because it is very low level and is written a long time ago. This patch attempts to change that by adding the missing documentation.
      
      This is blocking my effort on a separate task #5729.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5685 from andrewor14/closure-cleaner and squashes the following commits:
      
      cd46230 [Andrew Or] Revert a small change that affected streaming
      0bbe77f [Andrew Or] Fix style
      ea874bc [Andrew Or] Fix tests
      26c5072 [Andrew Or] Address comments
      16fbcfd [Andrew Or] Merge branch 'master' of github.com:apache/spark into closure-cleaner
      26c7aba [Andrew Or] Revert "In sc.runJob, actually clean the inner closure"
      6f75784 [Andrew Or] Revert "Guard against NPE if CC is used outside of an application"
      e909a42 [Andrew Or] Guard against NPE if CC is used outside of an application
      3998168 [Andrew Or] In sc.runJob, actually clean the inner closure
      9187066 [Andrew Or] Merge branch 'master' of github.com:apache/spark into closure-cleaner
      d889950 [Andrew Or] Revert "Bypass SerializationDebugger for now (SPARK-7180)"
      9419efe [Andrew Or] Bypass SerializationDebugger for now (SPARK-7180)
      6d4d3f1 [Andrew Or] Fix scala style?
      4aab379 [Andrew Or] Merge branch 'master' of github.com:apache/spark into closure-cleaner
      e45e904 [Andrew Or] More minor updates (wording, renaming etc.)
      8b71cdb [Andrew Or] Update a few comments
      eb127e5 [Andrew Or] Use private method tester for a few things
      a3aa465 [Andrew Or] Add more tests for individual closure cleaner operations
      e672170 [Andrew Or] Guard against potential infinite cycles in method visitor
      6d36f38 [Andrew Or] Fix closure cleaner visibility
      2106f12 [Andrew Or] Merge branch 'master' of github.com:apache/spark into closure-cleaner
      263593d [Andrew Or] Finalize tests
      06fd668 [Andrew Or] Make closure cleaning idempotent
      a4866e3 [Andrew Or] Add tests (still WIP)
      438c68f [Andrew Or] Minor changes
      2390a60 [Andrew Or] Feature flag this new behavior
      86f7823 [Andrew Or] Implement transitive cleaning + add missing documentation
      7394e7ad
  8. May 01, 2015
    • Mridul Muralidharan's avatar
      [SPARK-7317] [Shuffle] Expose shuffle handle · b79aeb95
      Mridul Muralidharan authored
      Details are in the JIRA; in a nutshell, all the machinery for custom RDDs to leverage Spark shuffle directly (without exposing impl details of shuffle) exists, except for this small piece.
      
      Exposing this will allow for custom dependencies to get a handle to ShuffleHandle - which they can then leverage on reduce side.
      
      Author: Mridul Muralidharan <mridulm@yahoo-inc.com>
      
      Closes #5857 from mridulm/expose_shuffle_handle and squashes the following commits:
      
      d8b6bd4 [Mridul Muralidharan] Expose ShuffleHandle
      b79aeb95
    • Marcelo Vanzin's avatar
      [SPARK-6229] Add SASL encryption to network library. · 38d4e9e4
      Marcelo Vanzin authored
      There are two main parts of this change:
      
      - Extending the bootstrap mechanism in the network library to add a server-side
        bootstrap (which works a little bit differently than the client-side bootstrap), and
        to allow the  bootstraps to modify the underlying channel.
      
      - Use SASL to encrypt data going through the RPC channel.
      
      The second item requires some non-optimal code to work around the
      fact that the outbound path in netty is not thread-safe, and ordering is very important
      when encryption is in the picture.
      
      A lot of the changes outside the network/common library are just to adjust to the
      changed API for initializing the RPC server.
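      The bootstrap mechanism can be modeled abstractly (names invented for this sketch; the real code is Java in Spark's network/common library): each bootstrap gets a chance to modify the channel before the RPC layer uses it, which is how the SASL encryption layer is slotted in on both sides:

      ```python
      class Channel:
          # toy stand-in for a network channel
          def __init__(self, encrypted=False):
              self.encrypted = encrypted

      def sasl_encryption_bootstrap(channel):
          # a bootstrap may wrap or modify the underlying channel
          channel.encrypted = True
          return channel

      def apply_bootstraps(channel, bootstraps):
          # run every configured bootstrap in order before handing the
          # channel to the RPC layer
          for bootstrap in bootstraps:
              channel = bootstrap(channel)
          return channel
      ```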
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5377 from vanzin/SPARK-6229 and squashes the following commits:
      
      ff01966 [Marcelo Vanzin] Use fancy new size config style.
      be53f32 [Marcelo Vanzin] Merge branch 'master' into SPARK-6229
      47d4aff [Marcelo Vanzin] Merge branch 'master' into SPARK-6229
      7a2a805 [Marcelo Vanzin] Clean up some unneeded changes.
      2f92237 [Marcelo Vanzin] Add comment.
      67bb0c6 [Marcelo Vanzin] Revert "Avoid exposing ByteArrayWritableChannel outside of test code."
      065f684 [Marcelo Vanzin] Add test to verify chunking.
      3d1695d [Marcelo Vanzin] Minor cleanups.
      73cff0e [Marcelo Vanzin] Skip bytes in decode path too.
      318ad23 [Marcelo Vanzin] Avoid exposing ByteArrayWritableChannel outside of test code.
      346f829 [Marcelo Vanzin] Avoid trip through channel selector by not reporting 0 bytes written.
      a4a5938 [Marcelo Vanzin] Review feedback.
      4797519 [Marcelo Vanzin] Remove unused import.
      9908ada [Marcelo Vanzin] Fix test, SASL backend disposal.
      7fe1489 [Marcelo Vanzin] Add a test that makes sure encryption is actually enabled.
      adb6f9d [Marcelo Vanzin] Review feedback.
      cf2a605 [Marcelo Vanzin] Clean up some code.
      8584323 [Marcelo Vanzin] Fix a comment.
      e98bc55 [Marcelo Vanzin] Add option to only allow encrypted connections to the server.
      dad42fc [Marcelo Vanzin] Make encryption thread-safe, less memory-intensive.
      b00999a [Marcelo Vanzin] Consolidate ByteArrayWritableChannel, fix SASL code to match master changes.
      b923cae [Marcelo Vanzin] Make SASL encryption handler thread-safe, handle FileRegion messages.
      39539a7 [Marcelo Vanzin] Add config option to enable SASL encryption.
      351a86f [Marcelo Vanzin] Add SASL encryption to network library.
      fbe6ccb [Marcelo Vanzin] Add TransportServerBootstrap, make SASL code use it.
      38d4e9e4
    • Chris Heller's avatar
      [SPARK-2691] [MESOS] Support for Mesos DockerInfo · 8f50a07d
      Chris Heller authored
      This patch adds partial support for running Spark on Mesos inside of a Docker container. Only fine-grained mode is presently supported, and there is no checking done to ensure that the version of libmesos is recent enough to have a DockerInfo structure in the protobuf (other than pinning a Mesos version in the pom.xml).
      
      Author: Chris Heller <hellertime@gmail.com>
      
      Closes #3074 from hellertime/SPARK-2691 and squashes the following commits:
      
      d504af6 [Chris Heller] Assist type inference
      f64885d [Chris Heller] Fix errant line length
      17c41c0 [Chris Heller] Base Dockerfile on mesosphere/mesos image
      8aebda4 [Chris Heller] Simplify Docker image docs
      1ae7f4f [Chris Heller] Style points
      974bd56 [Chris Heller] Convert map to flatMap
      5d8bdf7 [Chris Heller] Factor out the DockerInfo construction.
      7b75a3d [Chris Heller] Align to styleguide
      80108e7 [Chris Heller] Bend to the will of RAT
      ba77056 [Chris Heller] Explicit RAT exclude
      abda5e5 [Chris Heller] Wildcard .rat-excludes
      2f2873c [Chris Heller] Exclude spark-mesos from RAT
      a589a5b [Chris Heller] Add example Dockerfile
      b6825ce [Chris Heller] Remove use of EasyMock
      eae1b86 [Chris Heller] Move properties under 'spark.mesos.'
      c184d00 [Chris Heller] Use map on Option to be consistent with non-coarse code
      fb9501a [Chris Heller] Bumped mesos version to current release
      fa11879 [Chris Heller] Add listenerBus to EasyMock
      882151e [Chris Heller] Changes to scala style
      b22d42d [Chris Heller] Exclude template from RAT
      db536cf [Chris Heller] Remove unneeded mocks
      dea1bd5 [Chris Heller] Force default protocol
      7dac042 [Chris Heller] Add test for DockerInfo
      5456c0c [Chris Heller] Adjust syntax style
      521c194 [Chris Heller] Adjust version info
      6e38f70 [Chris Heller] Document Mesos Docker properties
      29572ab [Chris Heller] Support all DockerInfo fields
      b8c0dea [Chris Heller] Support for mesos DockerInfo in coarse-mode.
      482a9fd [Chris Heller] Support for mesos DockerInfo in fine-grained mode.
      8f50a07d
    • WangTaoTheTonic's avatar
      [SPARK-6443] [SPARK SUBMIT] Could not submit app in standalone cluster mode when HA is enabled · b4b43df8
      WangTaoTheTonic authored
      **3/26 update:**
      * Akka-based:
        Use an array of `ActorSelection` to represent the multiple masters. Add an `activeMasterActor` for querying the status of the driver. Lost masters (including the standby one) are added to `lostMasters`.
        When the size of `lostMasters` equals or exceeds the number of masters, we should give an error that all masters are unavailable.
      
      * Rest-based:
        When all masters are unavailable (an exception is thrown), we use the akka gateway to submit apps.
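      The rule described above can be sketched in a few lines (names here are invented for illustration, not the actual submit-client code):

      ```python
      def can_submit(masters, lost_masters):
          # submission remains possible until every master, including the
          # standby, has been added to lost_masters
          return len(lost_masters) < len(masters)
      ```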
      
      I have run simple tests on a standalone HA cluster (with both masters alive, and with one alive/one dead), and it worked.
      
      There might remain some issues with style or message printing, but we can check the solution and then fix them together.
      
      /cc srowen andrewor14
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #5116 from WangTaoTheTonic/SPARK-6443 and squashes the following commits:
      
      2a28aab [WangTaoTheTonic] based the newest change https://github.com/apache/spark/pull/5144
      76fd411 [WangTaoTheTonic] rebase
      f4f972b [WangTaoTheTonic] rebase...again
      a41de0b [WangTaoTheTonic] rebase
      220cb3c [WangTaoTheTonic] move connect exception inside
      35119a0 [WangTaoTheTonic] style and compile issues
      9d636be [WangTaoTheTonic] per Andrew's comments
      979760c [WangTaoTheTonic] rebase
      e4f4ece [WangTaoTheTonic] fix failed test
      5d23958 [WangTaoTheTonic] refact some duplicated code, style and comments
      7a881b3 [WangTaoTheTonic] when one of masters is gone, we still can submit
      2b011c9 [WangTaoTheTonic] fix broken tests
      60d97a4 [WangTaoTheTonic] rebase
      fa1fa80 [WangTaoTheTonic] submit app to HA cluster in standalone cluster mode
      b4b43df8
    • Timothy Chen's avatar
      [SPARK-7216] [MESOS] Add driver details page to Mesos cluster UI. · 20221934
      Timothy Chen authored
      Add a details page that displays each Mesos driver in the Mesos cluster UI
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #5763 from tnachen/mesos_cluster_page and squashes the following commits:
      
      55f36eb [Timothy Chen] Add driver details page to Mesos cluster UI.
      20221934
    • Sandy Ryza's avatar
      [SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a negative number of executors · 099327d5
      Sandy Ryza authored
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #5704 from sryza/sandy-spark-6954 and squashes the following commits:
      
      b7890fb [Sandy Ryza] Avoid ramping up to an existing number of executors
      6eb516a [Sandy Ryza] SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors
      099327d5
    • Holden Karau's avatar
      [SPARK-3444] Provide an easy way to change log level · ae98eec7
      Holden Karau authored
      Add support for changing the log level at run time through the SparkContext. Based on an earlier PR (#2433); includes CR feedback from pwendel & davies
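      The validate-then-set behavior can be sketched as follows (a hedged illustration: the real method is `SparkContext.setLogLevel` in Scala, delegating to log4j; the helper below and its return value are inventions of this sketch):

      ```python
      # standard log4j level names, used here for validation
      VALID_LOG_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL",
                          "INFO", "OFF", "TRACE", "WARN"}

      def set_log_level(level):
          upper = level.upper()
          if upper not in VALID_LOG_LEVELS:
              # only allow valid log levels, per commit 9117244
              raise ValueError("Supplied level %s did not match one of: %s"
                               % (level, sorted(VALID_LOG_LEVELS)))
          # the real implementation would now call into the logging
          # backend, e.g. log4j's LogManager.getRootLogger().setLevel(...)
          return upper
      ```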
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #5791 from holdenk/SPARK-3444-provide-an-easy-way-to-change-log-level-r2 and squashes the following commits:
      
      3bf3be9 [Holden Karau] fix exception
      42ba873 [Holden Karau] fix exception
      9117244 [Holden Karau] Only allow valid log levels, throw exception if invalid log level.
      338d7bf [Holden Karau] rename setLoggingLevel to setLogLevel
      fac14a0 [Holden Karau] Fix style errors
      d9d03f3 [Holden Karau] Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433 includes CR feedback from @pwendel & @davies
      ae98eec7
    • zsxwing's avatar
      [SPARK-7309] [CORE] [STREAMING] Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler · ebc25a4d
      zsxwing authored
      Shut down the thread pools in ReceivedBlockHandler and DAGScheduler when stopping them.
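      The fix pattern is to give the owning component an explicit stop that releases its pool, so worker threads do not outlive it. A minimal sketch (illustrative only; the real change is in Spark's Scala code, and `BlockHandler` here is an invented stand-in):

      ```python
      from concurrent.futures import ThreadPoolExecutor

      class BlockHandler:
          def __init__(self):
              # internal pool for asynchronous work
              self._pool = ThreadPoolExecutor(max_workers=2)

          def store(self, fn):
              return self._pool.submit(fn)

          def stop(self):
              # the missing call this patch adds: shut the pool down when
              # the component stops, releasing its threads
              self._pool.shutdown(wait=True)
      ```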
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5845 from zsxwing/SPARK-7309 and squashes the following commits:
      
      6c004fd [zsxwing] Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler
      ebc25a4d