  1. May 06, 2015
  2. May 05, 2015
    • [SPARK-6841] [SPARKR] add support for mean, median, stdev etc. · a4669443
      qhuang authored
      Moving here from https://github.com/amplab-extras/SparkR-pkg/pull/241
      sum() has been implemented. (https://github.com/amplab-extras/SparkR-pkg/pull/242)
      
      Phase 1 (mean, sd, var) has now been implemented, but some things still need to be improved per the suggestions in https://issues.apache.org/jira/browse/SPARK-6841
      
      Author: qhuang <qian.huang@intel.com>
      
      Closes #5446 from hqzizania/R and squashes the following commits:
      
      f283572 [qhuang] add test unit for describe()
      2e74d5a [qhuang] add describe() DataFrame API
      a4669443
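      For orientation, the SparkR describe() added here mirrors the Scala DataFrame API of the same era. A minimal Scala sketch of the analogous call (the data and names are illustrative, and this shows the Scala counterpart rather than the SparkR code from this PR):

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.SQLContext

      object DescribeSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("describe-sketch").setMaster("local[*]"))
          val sqlContext = new SQLContext(sc)
          import sqlContext.implicits._

          val df = Seq((1, 23.0), (2, 31.5), (3, 28.2)).toDF("id", "age")

          // describe() returns a small DataFrame of count/mean/stddev/min/max per numeric column.
          df.describe("age").show()

          sc.stop()
        }
      }
      ```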
    • Revert "[SPARK-3454] separate json endpoints for data in the UI" · 51b3d41e
      Reynold Xin authored
      This reverts commit d4973580.
      
      The commit broke Spark on Windows.
      51b3d41e
    • [SPARK-6231][SQL/DF] Automatically resolve join condition ambiguity for self-joins. · 1fd31ba0
      Reynold Xin authored
      See the comment in join function for more information.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5919 from rxin/self-join-resolve and squashes the following commits:
      
      e2fb0da [Reynold Xin] Updated SQLConf comment.
      7233a86 [Reynold Xin] Updated comment.
      6be2b4d [Reynold Xin] Removed println
      9f6b72f [Reynold Xin] [SPARK-6231][SQL/DF] Automatically resolve ambiguity in join condition for self-joins.
      1fd31ba0
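      A minimal Scala sketch of the kind of self-join this targets, based on the PR description (data and names are illustrative): both sides of the join condition come from the same DataFrame, and the patch resolves the two column references to the left and right plans instead of treating the condition as ambiguous.

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.SQLContext

      object SelfJoinSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("self-join-sketch").setMaster("local[*]"))
          val sqlContext = new SQLContext(sc)
          import sqlContext.implicits._

          val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

          // Both column references point at the same logical plan; the patch disambiguates
          // them as left.id === right.id rather than rejecting or mis-resolving the condition.
          df.join(df, df("id") === df("id")).show()

          sc.stop()
        }
      }
      ```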
    • Some minor cleanup after SPARK-4550. · 0092abb4
      Sandy Ryza authored
      JoshRosen this PR addresses the comments you left on #4450 after it got merged.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #5916 from sryza/sandy-spark-4550-cleanup and squashes the following commits:
      
      dee3d85 [Sandy Ryza] Some minor cleanup after SPARK-4550.
      0092abb4
    • [SPARK-7230] [SPARKR] Make RDD private in SparkR. · c688e3c5
      Shivaram Venkataraman authored
      This change makes the RDD API private in SparkR; all internal uses of the SparkR API now use SparkR::: to access private functions.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #5895 from shivaram/rrdd-private and squashes the following commits:
      
      bdb2f07 [Shivaram Venkataraman] Make RDD private in SparkR. This change also makes all internal uses of the SparkR API use SparkR::: to access private functions
      c688e3c5
    • [SQL][Minor] make StringComparison extends ExpectsInputTypes · 3059291e
      wangfei authored
      Make StringComparison extend ExpectsInputTypes and add expectedChildTypes there, so that each subclass no longer needs to override expectedChildTypes.
      
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #5905 from scwf/ExpectsInputTypes and squashes the following commits:
      
      b374ddf [wangfei] make stringcomparison extends ExpectsInputTypes
      3059291e
    • [SPARK-7351] [STREAMING] [DOCS] Add spark.streaming.ui.retainedBatches to docs · fec7b29f
      zsxwing authored
      The default value will be changed to `1000` in #5533, so `1000` is used here.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5899 from zsxwing/SPARK-7351 and squashes the following commits:
      
      e1ec515 [zsxwing] [SPARK-7351][Streaming][Docs] Add spark.streaming.ui.retainedBatches to docs
      fec7b29f
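      A minimal sketch of setting this property, assuming the documented default of 1000 and an illustrative local app:

      ```scala
      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      object RetainedBatchesSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .setAppName("retained-batches-sketch")
            .setMaster("local[2]")
            // How many completed batches the streaming UI keeps; 1000 is the default cited above.
            .set("spark.streaming.ui.retainedBatches", "1000")

          val ssc = new StreamingContext(conf, Seconds(1))
          // ... define input streams and output operations, then ssc.start() ...
          ssc.stop()
        }
      }
      ```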
    • [SPARK-7294][SQL] ADD BETWEEN · 735bc3d0
      云峤 authored
      Author: 云峤 <chensong.cs@alibaba-inc.com>
      Author: kaka1992 <kaka_1992@163.com>
      
      Closes #5839 from kaka1992/master and squashes the following commits:
      
      b15360d [kaka1992] Fix python unit test in sql/test. =_= I forget to commit this file last time.
      f928816 [kaka1992] Fix python style in sql/test.
      d2e7f72 [kaka1992] Fix python style in sql/test.
      c54d904 [kaka1992] Fix empty map bug.
      7e64d1e [云峤] Update
      7b9b858 [云峤] undo
      f080f8d [云峤] update pep8
      76f0c51 [云峤] Merge remote-tracking branch 'remotes/upstream/master'
      7d62368 [云峤] [SPARK-7294] ADD BETWEEN
      baf839b [云峤] [SPARK-7294] ADD BETWEEN
      d11d5b9 [云峤] [SPARK-7294] ADD BETWEEN
      735bc3d0
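      For reference, the Scala Column API exposes a matching `between`; a small sketch with toy data, assuming inclusive bounds as in SQL BETWEEN (this is the Scala counterpart, not the Python test code touched here):

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.SQLContext

      object BetweenSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("between-sketch").setMaster("local[*]"))
          val sqlContext = new SQLContext(sc)
          import sqlContext.implicits._

          val df = Seq(("alice", 17), ("bob", 25), ("carol", 35)).toDF("name", "age")

          // Keep rows whose age falls in [18, 30], equivalent to age >= 18 AND age <= 30.
          df.filter($"age".between(18, 30)).show()

          sc.stop()
        }
      }
      ```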
    • [SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs for streaming statistics · 489700c8
      zsxwing authored
      This is the initial work of SPARK-6939. Not yet ready for code review. Here are the screenshots:
      
      ![graph1](https://cloud.githubusercontent.com/assets/1000778/7165766/465942e0-e3dc-11e4-9b05-c184b09d75dc.png)
      
      ![graph2](https://cloud.githubusercontent.com/assets/1000778/7165779/53f13f34-e3dc-11e4-8714-a4a75b7e09ff.png)
      
      TODOs:
      - [x] Display more information on mouse hover
      - [x] Align the timeline and distribution graphs
      - [x] Clean up the codes
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5533 from zsxwing/SPARK-6939 and squashes the following commits:
      
      9f7cd19 [zsxwing] Merge branch 'master' into SPARK-6939
      deacc3f [zsxwing] Remove unused import
      cd03424 [zsxwing] Fix .rat-excludes
      70cc87d [zsxwing] Streaming Scheduling Delay => Scheduling Delay
      d457277 [zsxwing] Fix UIUtils in BatchPage
      b3f303e [zsxwing] Add comments for unclear classes and methods
      ff0bff8 [zsxwing] Make InputDStream.name private[streaming]
      cc392c5 [zsxwing] Merge branch 'master' into SPARK-6939
      e275e23 [zsxwing] Move time related methods to Streaming's UIUtils
      d5d86f6 [zsxwing] Fix incorrect lastErrorTime
      3be4b7a [zsxwing] Use InputInfo
      b50fa32 [zsxwing] Jump to the batch page when clicking a point in the timeline graphs
      203605d [zsxwing] Merge branch 'master' into SPARK-6939
      74307cf [zsxwing] Reuse the data for histogram graphs to reduce the page size
      2586916 [zsxwing] Merge branch 'master' into SPARK-6939
      70d8533 [zsxwing] Remove BatchInfo.numRecords and a few renames
      7bbdc0a [zsxwing] Hide the receiver sub table if no receiver
      a2972e9 [zsxwing] Add some ui tests for StreamingPage
      fd03ad0 [zsxwing] Add a test to verify no memory leak
      4a8f886 [zsxwing] Merge branch 'master' into SPARK-6939
      18607a1 [zsxwing] Merge branch 'master' into SPARK-6939
      d0b0aec [zsxwing] Clean up the codes
      a459f49 [zsxwing] Add a dash line to processing time graphs
      8e4363c [zsxwing] Prepare for the demo
      c81a1ee [zsxwing] Change time unit in the graphs automatically
      4c0b43f [zsxwing] Update Streaming UI
      04c7500 [zsxwing] Make the server and client use the same timezone
      fed8219 [zsxwing] Move the x axis at the top and show a better tooltip
      c23ce10 [zsxwing] Make two graphs close
      d78672a [zsxwing] Make the X axis use the same range
      881c907 [zsxwing] Use histogram for distribution
      5688702 [zsxwing] Fix the unit test
      ddf741a [zsxwing] Fix the unit test
      ad93295 [zsxwing] Remove unnecessary codes
      a0458f9 [zsxwing] Clean the codes
      b82ed1e [zsxwing] Update the graphs as per comments
      dd653a1 [zsxwing] Add timeline and histogram graphs for streaming statistics
      489700c8
    • [SPARK-5888] [MLLIB] Add OneHotEncoder as a Transformer · 47728db7
      Sandy Ryza authored
      This patch adds a one-hot encoder for categorical features. Planning to add documentation and another test after getting feedback on the approach.
      
      A couple choices made here:
      * There's an `includeFirst` option which, if false, creates numCategories - 1 columns and, if true, creates numCategories columns.  The default is true, which is the behavior in scikit-learn.
      * The user is expected to pass a `Seq` of category names when instantiating a `OneHotEncoder`. These can easily be obtained from a `StringIndexer`. The names are used for the output column names, which take the form colName_categoryName.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #5500 from sryza/sandy-spark-5888 and squashes the following commits:
      
      f383250 [Sandy Ryza] Infer label names automatically
      6e257b9 [Sandy Ryza] Review comments
      7c539cf [Sandy Ryza] Vector transformers
      1c182dd [Sandy Ryza] SPARK-5888. [MLLIB]. Add OneHotEncoder as a Transformer
      47728db7
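      A hedged Scala sketch of the StringIndexer-then-OneHotEncoder flow the description refers to. Note the option surface evolved after this PR (e.g. `includeFirst` and the category-name `Seq` from the description), so the sketch sticks to input/output columns only and is illustrative rather than the exact API introduced here:

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
      import org.apache.spark.sql.SQLContext

      object OneHotSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("one-hot-sketch").setMaster("local[*]"))
          val sqlContext = new SQLContext(sc)
          import sqlContext.implicits._

          val df = Seq((0, "a"), (1, "b"), (2, "a"), (3, "c")).toDF("id", "category")

          // StringIndexer assigns each category an index; OneHotEncoder expands that index
          // into a sparse indicator vector in the column named via setOutputCol.
          val indexed = new StringIndexer()
            .setInputCol("category")
            .setOutputCol("categoryIndex")
            .fit(df)
            .transform(df)

          new OneHotEncoder()
            .setInputCol("categoryIndex")
            .setOutputCol("categoryVec")
            .transform(indexed)
            .show()

          sc.stop()
        }
      }
      ```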
    • [SPARK-7333] [MLLIB] Add BinaryClassificationEvaluator to PySpark · ee374e89
      Xiangrui Meng authored
      This PR adds `BinaryClassificationEvaluator` to the Python ML Pipelines API as a simple wrapper around the Scala implementation. oefirouz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5885 from mengxr/SPARK-7333 and squashes the following commits:
      
      25d7451 [Xiangrui Meng] fix tests in python 3
      babdde7 [Xiangrui Meng] fix doc
      cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark
      ee374e89
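      The Python class wraps the Scala `BinaryClassificationEvaluator`; a small Scala sketch of that underlying evaluator (column names and data are illustrative, and a real pipeline would get the raw predictions from a classifier):

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
      import org.apache.spark.mllib.linalg.Vectors
      import org.apache.spark.sql.SQLContext

      object EvaluatorSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("evaluator-sketch").setMaster("local[*]"))
          val sqlContext = new SQLContext(sc)
          import sqlContext.implicits._

          // Toy (rawPrediction, label) pairs.
          val scores = Seq(
            (Vectors.dense(0.1, 0.9), 1.0),
            (Vectors.dense(0.8, 0.2), 0.0),
            (Vectors.dense(0.4, 0.6), 1.0)
          ).toDF("rawPrediction", "label")

          val auc = new BinaryClassificationEvaluator()
            .setRawPredictionCol("rawPrediction")
            .setLabelCol("label")
            .setMetricName("areaUnderROC")
            .evaluate(scores)

          println(s"areaUnderROC = $auc")
          sc.stop()
        }
      }
      ```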
    • [SPARK-7243][SQL] Reduce size for Contingency Tables in DataFrames · 18340d7b
      Burak Yavuz authored
      Reduced take size from 1e8 to 1e6.
      
      cc rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5900 from brkyvz/df-cont-followup and squashes the following commits:
      
      c11e762 [Burak Yavuz] fix grammar
      b30ace2 [Burak Yavuz] address comments
      a417ba5 [Burak Yavuz] [SPARK-7243][SQL] Reduce  size for Contingency Tables in DataFrames
      18340d7b
    • [SPARK-7007] [CORE] Add a metric source for ExecutorAllocationManager · 9f1f9b10
      jerryshao authored
      Add a metric source to expose the internal status of ExecutorAllocationManager, to better monitor the resource usage of executors when dynamic allocation is enabled. Please help to review, thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #5589 from jerryshao/dynamic-allocation-source and squashes the following commits:
      
      104d155 [jerryshao] rebase and address the comments
      c501a2c [jerryshao] Address the comments
      d237ba5 [jerryshao] Address the comments
      2c3540f [jerryshao] Add a metric source for ExecutorAllocationManager
      9f1f9b10
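      The new source reports on executor counts under dynamic allocation; a minimal sketch of turning dynamic allocation on so there is something to report. The property keys below are the standard dynamic-allocation settings, not ones introduced by this PR, and the conf is meant for a real cluster submission (building it alone is safe anywhere):

      ```scala
      import org.apache.spark.SparkConf

      object DynamicAllocationSketch {
        // Conf intended for a YARN submission; dynamic allocation also needs the external shuffle service.
        def dynamicAllocationConf(): SparkConf = new SparkConf()
          .setAppName("dynamic-allocation-sketch")
          .set("spark.dynamicAllocation.enabled", "true")
          .set("spark.shuffle.service.enabled", "true")
          .set("spark.dynamicAllocation.minExecutors", "1")
          .set("spark.dynamicAllocation.maxExecutors", "10")

        def main(args: Array[String]): Unit =
          // The ExecutorAllocationManager metrics then flow to whatever sinks metrics.properties configures.
          dynamicAllocationConf().getAll.foreach { case (k, v) => println(s"$k=$v") }
      }
      ```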
    • [SPARK-7318] [STREAMING] DStream cleans objects that are not closures · 57e9f29e
      Andrew Or authored
      I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5860 from andrewor14/streaming-closure-cleaner and squashes the following commits:
      
      8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
      5ee4e25 [Andrew Or] Fix tests
      eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner
      67eeff4 [Andrew Or] Add tests
      a4fa768 [Andrew Or] Clean the closure, not the RDD
      57e9f29e
    • [SPARK-7237] Many user provided closures are not actually cleaned · 1fdabf8d
      Andrew Or authored
      Note: ~140 lines are tests.
      
      In a nutshell, we never cleaned closures the user provided through the following operations:
      - sortBy
      - keyBy
      - mapPartitions
      - mapPartitionsWithIndex
      - aggregateByKey
      - foldByKey
      - foreachAsync
      - one of the aliases for runJob
      - runApproximateJob
      
      For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237).
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5787 from andrewor14/clean-more and squashes the following commits:
      
      2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
      7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
      df3caa3 [Andrew Or] Address comments
      7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more
      6498f44 [Andrew Or] Add missing test for groupBy
      e83699e [Andrew Or] Clean one more
      8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app
      9ac5f9b [Andrew Or] Clean closures that are not currently cleaned
      19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
      1fdabf8d
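      For context on why cleaning these closures matters, a small Scala sketch: the cleaner nulls out enclosing-scope references a closure never actually uses, and the local-copy idiom below is the manual complement for fields the closure does need (the class and names are illustrative, not from the PR):

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}

      class NotSerializableThing { val label = "driver-only resource" }

      class Job(sc: SparkContext) {
        val heavy = new NotSerializableThing  // would fail task serialization if captured
        val factor = 3

        def run(): Long = {
          // Copy just the needed value into a local; the closure then captures `f`,
          // not `this` (and therefore not `heavy`).
          val f = factor
          sc.parallelize(1 to 100, 4).mapPartitions(it => it.map(_ * f)).count()
        }
      }

      object CleanClosureSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("closure-sketch").setMaster("local[*]"))
          println(new Job(sc).run())
          sc.stop()
        }
      }
      ```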
    • [MLLIB] [TREE] Verify size of input rdd > 0 when building meta data · d4cb38ae
      Alain authored
      Require a non-empty input RDD so that we can take the first LabeledPoint and get the feature size.
      
      Author: Alain <aihe@usc.edu>
      Author: aihe@usc.edu <aihe@usc.edu>
      
      Closes #5810 from AiHe/decisiontree-issue and squashes the following commits:
      
      3b1d08a [aihe@usc.edu] [MLLIB][tree] merge the assertion into the evaluation of numFeatures
      cf2e567 [Alain] [MLLIB][tree] Use a rdd api to verify size of input rdd > 0 when building meta data
      b448f47 [Alain] [MLLIB][tree] Verify size of input rdd > 0 when building meta data
      d4cb38ae
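      A minimal Scala sketch of the guard described (the wording of the check is illustrative; the PR's exact implementation may differ):

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.mllib.linalg.Vectors
      import org.apache.spark.mllib.regression.LabeledPoint

      object NonEmptyInputSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("non-empty-input").setMaster("local[*]"))
          val input = sc.parallelize(Seq(LabeledPoint(1.0, Vectors.dense(0.5, 1.5))))

          // Guard against an empty RDD before peeking at the first element for the feature count.
          require(!input.isEmpty(), "DecisionTree metadata requires a non-empty input RDD")
          val numFeatures = input.first().features.size
          println(s"numFeatures = $numFeatures")

          sc.stop()
        }
      }
      ```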
    • Closes #5591 · 9d250e64
      Xiangrui Meng authored
      Closes #5878
      9d250e64
    • [SPARK-6612] [MLLIB] [PYSPARK] Python KMeans parity · 5995ada9
      Hrishikesh Subramonian authored
      The following items are added to Python KMeans:
      
      kmeans - setEpsilon, setInitializationSteps
      KMeansModel - computeCost, k
      
      Author: Hrishikesh Subramonian <hrishikesh.subramonian@flytxt.com>
      
      Closes #5647 from FlytxtRnD/newPyKmeansAPI and squashes the following commits:
      
      b9e451b [Hrishikesh Subramonian] set seed to fixed value in doc test
      5fd3ced [Hrishikesh Subramonian] doc test corrections
      20b3c68 [Hrishikesh Subramonian] python 3 fixes
      4d4e695 [Hrishikesh Subramonian] added arguments in python tests
      21eb84c [Hrishikesh Subramonian] Python Kmeans - setEpsilon, setInitializationSteps, k and computeCost added.
      5995ada9
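      These Python additions mirror setters and members that already exist on the Scala MLlib side; a small Scala sketch of those counterparts with toy data:

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.mllib.clustering.KMeans
      import org.apache.spark.mllib.linalg.Vectors

      object KMeansParitySketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("kmeans-parity-sketch").setMaster("local[*]"))
          val data = sc.parallelize(Seq(
            Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
            Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
          ))

          val model = new KMeans()
            .setK(2)
            .setMaxIterations(20)
            .setEpsilon(1e-4)              // convergence threshold, now also settable from Python
            .setInitializationSteps(5)     // k-means|| init steps, now also settable from Python
            .run(data)

          println(s"k = ${model.k}, cost = ${model.computeCost(data)}")
          sc.stop()
        }
      }
      ```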
    • [SPARK-7202] [MLLIB] [PYSPARK] Add SparseMatrixPickler to SerDe · 5ab652cd
      MechCoder authored
      Utilities for pickling and unpickling SparseMatrices using SerDe
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #5775 from MechCoder/spark-7202 and squashes the following commits:
      
      7e689dc [MechCoder] [SPARK-7202] Add SparseMatrixPickler to SerDe
      5ab652cd
    • [SPARK-7350] [STREAMING] [WEBUI] Attach the Streaming tab when calling ssc.start() · c6d1efba
      zsxwing authored
      There is no point in displaying the Streaming tab before `ssc.start()`, so we attach it in the `ssc.start` method instead.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5898 from zsxwing/SPARK-7350 and squashes the following commits:
      
      e676487 [zsxwing] Attach the Streaming tab when calling ssc.start()
      c6d1efba
    • [SPARK-5074] [CORE] [TESTS] Fix the flakey test 'run shuffle with map stage failure' in DAGSchedulerSuite · 5ffc73e6
      zsxwing authored
      
      Test failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/2240/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/run_shuffle_with_map_stage_failure/
      
      This happens because many tests share the same `JobListener` and `scheduler` is not stopped after each test, so it keeps running. While `run shuffle with map stage failure` is running, a previous test may trigger the [ResubmitFailedStages](https://github.com/apache/spark/blob/ebc25a4ddfe07a67668217cec59893bc3b8cf730/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1120) logic, report `jobFailed`, and override the global `failure` variable.
      
      This PR uses `after` to call `scheduler.stop()` for each test.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5903 from zsxwing/SPARK-5074 and squashes the following commits:
      
      1e6f13e [zsxwing] Fix the flakey test 'run shuffle with map stage failure' in DAGSchedulerSuite
      5ffc73e6
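      The fix is the standard ScalaTest `BeforeAndAfter` pattern; a generic sketch of that pattern (the suite and scheduler here are placeholders, not the real DAGSchedulerSuite):

      ```scala
      import org.scalatest.{BeforeAndAfter, FunSuite}

      // Placeholder for whatever stateful component a suite shares across tests.
      class FakeScheduler {
        @volatile var running = true
        def stop(): Unit = { running = false }
      }

      class SchedulerLifecycleSuite extends FunSuite with BeforeAndAfter {
        private var scheduler: FakeScheduler = _

        before { scheduler = new FakeScheduler }

        // Stopping after every test keeps one test's scheduler (and its listeners)
        // from leaking state into the next, which is what made the original test flaky.
        after { scheduler.stop() }

        test("scheduler starts fresh for each test") {
          assert(scheduler.running)
        }
      }
      ```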
    • [MINOR] Minor update for document · b83091ae
      Liang-Chi Hsieh authored
      Fixes two minor doc errors in `BytesToBytesMap` and `UnsafeRow`.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5906 from viirya/minor_doc and squashes the following commits:
      
      27f9089 [Liang-Chi Hsieh] Minor update for doc.
      b83091ae
    • [SPARK-3454] separate json endpoints for data in the UI · d4973580
      Imran Rashid authored
      Exposes data available in the UI as json over http.  Key points:
      
      * new endpoints, handled independently of existing XyzPage classes.  Root entrypoint is `JsonRootResource`
      * Uses jersey + jackson for routing & converting POJOs into json
      * tests against known results in `HistoryServerSuite`
      * also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages.
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #4435 from squito/SPARK-3454 and squashes the following commits:
      
      da1e35f [Imran Rashid] typos etc.
      5e78b4f [Imran Rashid] fix rendering problems
      5ae02ad [Imran Rashid] Merge branch 'master' into SPARK-3454
      f016182 [Imran Rashid] change all constructors json-pojo class constructors to be private[spark] to protect us from mima-false-positives if we add fields
      3347b72 [Imran Rashid] mark EnumUtil as @Private
      ec140a2 [Imran Rashid] create @Private
      cc1febf [Imran Rashid] add docs on the metrics-as-json api
      cbaf287 [Imran Rashid] Merge branch 'master' into SPARK-3454
      56db31e [Imran Rashid] update tests for mulit-attempt
      7f3bc4e [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt"
      67008b4 [Imran Rashid] rats
      9e51400 [Imran Rashid] style
      c9bae1c [Imran Rashid] handle multiple attempts per app
      b87cd63 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
      188762c [Imran Rashid] multi-attempt
      2af11e5 [Imran Rashid] Merge branch 'master' into SPARK-3454
      befff0c [Imran Rashid] review feedback
      14ac3ed [Imran Rashid] jersey-core needs to be explicit; move version & scope to parent pom.xml
      f90680e [Imran Rashid] Merge branch 'master' into SPARK-3454
      dc8a7fe [Imran Rashid] style, fix errant comments
      acb7ef6 [Imran Rashid] fix indentation
      7bf1811 [Imran Rashid] move MetricHelper so mima doesnt think its exposed; comments
      9d889d6 [Imran Rashid] undo some unnecessary changes
      f48a7b0 [Imran Rashid] docs
      52bbae8 [Imran Rashid] StorageListener & StorageStatusListener needs to synchronize internally to be thread-safe
      31c79ce [Imran Rashid] asm no longer needed for SPARK_PREPEND_CLASSES
      b2f8b91 [Imran Rashid] @DeveloperApi
      2e19be2 [Imran Rashid] lazily convert ApplicationInfo to avoid memory overhead
      ba3d9d2 [Imran Rashid] upper case enums
      39ac29c [Imran Rashid] move EnumUtil
      d2bde77 [Imran Rashid] update error handling & scoping
      4a234d3 [Imran Rashid] avoid jersey-media-json-jackson b/c of potential version conflicts
      a157a2f [Imran Rashid] style
      7bd4d15 [Imran Rashid] delete security test, since it doesnt do anything
      a325563 [Imran Rashid] style
      a9c5cf1 [Imran Rashid] undo changes superceeded by master
      0c6f968 [Imran Rashid] update deps
      1ed0d07 [Imran Rashid] Merge branch 'master' into SPARK-3454
      4c92af6 [Imran Rashid] style
      f2e63ad [Imran Rashid] Merge branch 'master' into SPARK-3454
      c22b11f [Imran Rashid] fix compile error
      9ea682c [Imran Rashid] go back to good ol' java enums
      cf86175 [Imran Rashid] style
      d493b38 [Imran Rashid] Merge branch 'master' into SPARK-3454
      f05ae89 [Imran Rashid] add in ExecutorSummaryInfo for MiMa :(
      101a698 [Imran Rashid] style
      d2ef58d [Imran Rashid] revert changes that had HistoryServer refresh the application listing more often
      b136e39b [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt"
      e031719 [Imran Rashid] fixes from review
      1f53a66 [Imran Rashid] style
      b4a7863 [Imran Rashid] fix compile error
      2c8b7ee [Imran Rashid] rats
      1578a4a [Imran Rashid] doc
      674f8dc [Imran Rashid] more explicit about total numbers of jobs & stages vs. number retained
      9922be0 [Imran Rashid] Merge branch 'master' into stage_distributions
      f5a5196 [Imran Rashid] undo removal of renderJson from MasterPage, since there is no substitute yet
      db61211 [Imran Rashid] get JobProgressListener directly from UI
      fdfc181 [Imran Rashid] stage/taskList
      63eb4a6 [Imran Rashid] tests for taskSummary
      ad27de8 [Imran Rashid] error handling on quantile values
      b2efcaf [Imran Rashid] cleanup, combine stage-related paths into one resource
      aaba896 [Imran Rashid] wire up task summary
      a4b1397 [Imran Rashid] stage metric distributions
      e48ba32 [Imran Rashid] rename
      eaf3bbb [Imran Rashid] style
      25cd894 [Imran Rashid] if only given day, assume GMT
      51eaedb [Imran Rashid] more visibility fixes
      9f28b7e [Imran Rashid] ack, more cleanup
      99764e1 [Imran Rashid] Merge branch 'SPARK-3454_w_jersey' into SPARK-3454
      a61a43c [Imran Rashid] oops, remove accidental checkin
      a066055 [Imran Rashid] set visibility on a lot of classes
      1f361c8 [Imran Rashid] update rat-excludes
      0be5120 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
      2382bef [Imran Rashid] switch to using new "enum"
      fef6605 [Imran Rashid] some utils for working w/ new "enum" format
      dbfc7bf [Imran Rashid] style
      b86bcb0 [Imran Rashid] update test to look at one stage attempt
      5f9df24 [Imran Rashid] style
      7fd156a [Imran Rashid] refactor jsonDiff to avoid code duplication
      73f1378 [Imran Rashid] test json; also add test cases for cleaned stages & jobs
      97d411f [Imran Rashid] json endpoint for one job
      0c96147 [Imran Rashid] better error msgs for bad stageId vs bad attemptId
      dddbd29 [Imran Rashid] stages have attempt; jobs are sorted; resource for all attempts for one stage
      190c17a [Imran Rashid] StagePage should distinguish no task data, from unknown stage
      84cd497 [Imran Rashid] AllJobsPage should still report correct completed & failed job count, even if some have been cleaned, to make it consistent w/ AllStagesPage
      36e4062 [Imran Rashid] SparkUI needs to know about startTime, so it can list its own applicationInfo
      b4c75ed [Imran Rashid] fix merge conflicts; need to widen visibility in a few cases
      e91750a [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
      56d2fc7 [Imran Rashid] jersey needs asm for SPARK_PREPEND_CLASSES to work
      f7df095 [Imran Rashid] add test for accumulables, and discover that I need update after all
      9c0c125 [Imran Rashid] add accumulableInfo
      00e9cc5 [Imran Rashid] more style
      3377e61 [Imran Rashid] scaladoc
      d05f7a9 [Imran Rashid] dont use case classes for status api POJOs, since they have binary compatibility issues
      654cecf [Imran Rashid] move all the status api POJOs to one file
      b86e2b0 [Imran Rashid] style
      18a8c45 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
      5598f19 [Imran Rashid] delete some unnecessary code, more to go
      56edce0 [Imran Rashid] style
      017c755 [Imran Rashid] add in metrics now available
      1b78cb7 [Imran Rashid] fix some import ordering
      0dc3ea7 [Imran Rashid] if app isnt found, reload apps from FS before giving up
      c7d884f [Imran Rashid] fix merge conflicts
      0c12b50 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
      b6a96a8 [Imran Rashid] compare json by AST, not string
      cd37845 [Imran Rashid] switch to using java.util.Dates for times
      a4ab5aa [Imran Rashid] add in explicit dependency on jersey 1.9 -- maven wasn't happy before this
      4fdc39f [Imran Rashid] refactor case insensitive enum parsing
      cba1ef6 [Imran Rashid] add security (maybe?) for metrics json
      f0264a7 [Imran Rashid] switch to using jersey for metrics json
      bceb3a9 [Imran Rashid] set http response code on error, some testing
      e0356b6 [Imran Rashid] put new test expectation files in rat excludes (is this OK?)
      b252e7a [Imran Rashid] small cleanup of accidental changes
      d1a8c92 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
      4b398d0 [Imran Rashid] expose UI data as json in new endpoints
      d4973580
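      To illustrate the "jersey + jackson for routing & converting POJOs into json" point, a hypothetical JAX-RS resource in the same spirit (class names, paths, and fields are invented for the sketch; it is not the code added by this PR):

      ```scala
      import javax.ws.rs.{GET, Path, PathParam, Produces}
      import javax.ws.rs.core.MediaType

      // Plain class rather than a case class, echoing the note in the squashed commits that
      // case classes were avoided for the status-API POJOs for binary-compatibility reasons.
      class AppSummaryPojo(val id: String, val name: String) {
        def getId: String = id
        def getName: String = name
      }

      @Path("/applications")
      @Produces(Array(MediaType.APPLICATION_JSON))
      class ApplicationListResource {
        @GET
        def list(): Array[AppSummaryPojo] =
          Array(new AppSummaryPojo("app-20150505-0001", "example"))

        @GET
        @Path("/{appId}")
        def one(@PathParam("appId") appId: String): AppSummaryPojo =
          new AppSummaryPojo(appId, "example")
      }
      ```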
    • [SPARK-7357] Improving HBaseTest example · 51f46200
      Jihong MA authored
      Author: Jihong MA <linlin200605@gmail.com>
      
      Closes #5904 from JihongMA/SPARK-7357 and squashes the following commits:
      
      7d6153a [Jihong MA] SPARK-7357 Improving HBaseTest example
      51f46200
    • [SPARK-5112] Expose SizeEstimator as a developer api · 4222da68
      Sandy Ryza authored
      "The best way to size the amount of memory consumption your dataset will require is to create an RDD, put it into cache, and look at the SparkContext logs on your driver program. The logs will tell you how much memory each partition is consuming, which you can aggregate to get the total size of the RDD."
      -the Tuning Spark page
      
      This is a pain. It would be much nicer to simply expose functionality for understanding the memory footprint of a Java object.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3913 from sryza/sandy-spark-5112 and squashes the following commits:
      
      8d9e082 [Sandy Ryza] Add SizeEstimator in org.apache.spark
      2e1a906 [Sandy Ryza] Revert "Move SizeEstimator out of util"
      93f4cd0 [Sandy Ryza] Move SizeEstimator out of util
      e21c1f4 [Sandy Ryza] Remove unused import
      798ab88 [Sandy Ryza] Update documentation and add to SparkContext
      34c523c [Sandy Ryza] SPARK-5112. Expose SizeEstimator as a developer api
      4222da68
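      A minimal sketch of what exposing it enables, assuming the estimator ends up callable as `org.apache.spark.util.SizeEstimator.estimate` (the squashed commits show the package moved around during review):

      ```scala
      import org.apache.spark.util.SizeEstimator

      object SizeEstimatorSketch {
        def main(args: Array[String]): Unit = {
          val smallArray = Array.fill(1000)(0.0)
          val mapOfStrings = Map("a" -> "x" * 100, "b" -> "y" * 100)

          // Estimates of in-memory footprint in bytes, without caching an RDD first.
          println(s"Array[Double](1000) ~ ${SizeEstimator.estimate(smallArray)} bytes")
          println(s"small Map           ~ ${SizeEstimator.estimate(mapOfStrings)} bytes")
        }
      }
      ```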
    • [SPARK-6653] [YARN] New config to specify port for sparkYarnAM actor system · fc8feaa8
      shekhar.bansal authored
      Author: shekhar.bansal <shekhar.bansal@guavus.com>
      
      Closes #5719 from zuxqoj/master and squashes the following commits:
      
      5574ff7 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for sparkYarnAM actor system
      5117258 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for sparkYarnAM actor system
      9de5330 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for sparkYarnAM actor system
      456a592 [shekhar.bansal] [SPARK-6653][yarn] New configuration property to specify port for sparkYarnAM actor system
      803e93e [shekhar.bansal] [SPARK-6653][yarn] New configuration property to specify port for sparkYarnAM actor system
      fc8feaa8
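      The commit message does not spell out the property name; the key documented for this feature in later release docs is `spark.yarn.am.port`, so treat the exact name below as an assumption. A sketch of setting it ahead of a yarn-client submission (building the conf is harmless anywhere):

      ```scala
      import org.apache.spark.SparkConf

      object YarnAmPortSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .setAppName("yarn-am-port-sketch")
            .set("spark.yarn.am.port", "44000") // assumed key; pins the port the AM actor system listens on
          conf.getAll.foreach { case (k, v) => println(s"$k=$v") }
        }
      }
      ```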
    • [SPARK-7341] [STREAMING] [TESTS] Fix the flaky test: org.apache.spark.streaming.InputStreamsSuite.socket input stream · 4d29867e
      zsxwing authored
      
      Remove non-deterministic "Thread.sleep" and use deterministic strategies to fix the flaky failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2127/testReport/junit/org.apache.spark.streaming/InputStreamsSuite/socket_input_stream/
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5891 from zsxwing/SPARK-7341 and squashes the following commits:
      
      611157a [zsxwing] Add wait methods to BatchCounter and use BatchCounter in InputStreamsSuite
      014b58f [zsxwing] Use withXXX to clean up the resources
      c9bf746 [zsxwing] Move 'waitForStart' into the 'start' method and fix the code style
      9d0de6d [zsxwing] [SPARK-7341][Streaming][Tests] Fix the flaky test: org.apache.spark.streaming.InputStreamsSuite.socket input stream
      4d29867e
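      The general pattern behind "remove non-deterministic Thread.sleep" is to poll for the condition with a timeout instead of sleeping for a guessed duration. A generic ScalaTest sketch of that idea (this uses `Eventually`, not the `BatchCounter` wait helpers the PR actually adds):

      ```scala
      import java.util.concurrent.atomic.AtomicInteger

      import org.scalatest.FunSuite
      import org.scalatest.concurrent.Eventually
      import org.scalatest.time.{Seconds, Span}

      class DeterministicWaitSuite extends FunSuite with Eventually {
        test("wait for an asynchronous counter without Thread.sleep") {
          val processedBatches = new AtomicInteger(0)

          // Stand-in for work that completes asynchronously in the system under test.
          new Thread(new Runnable {
            override def run(): Unit = processedBatches.set(3)
          }).start()

          // Poll until the condition holds (or time out), instead of sleeping a fixed time.
          eventually(timeout(Span(10, Seconds))) {
            assert(processedBatches.get() >= 3)
          }
        }
      }
      ```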
    • [SPARK-7113] [STREAMING] Support input information reporting for Direct Kafka stream · 8436f7e9
      jerryshao authored
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #5879 from jerryshao/SPARK-7113 and squashes the following commits:
      
      b0b506c [jerryshao] Address the comments
      0babe66 [jerryshao] Support input information reporting for Direct Kafka stream
      8436f7e9
    • [HOTFIX] [TEST] Ignoring flaky tests · 8776fe0b
      Tathagata Das authored
      org.apache.spark.DriverSuite.driver should exit after finishing without cleanup (SPARK-530)
      https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2267/
      
      org.apache.spark.deploy.SparkSubmitSuite.includes jars passed in through --jars
      https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2271/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/testReport/
      
      org.apache.spark.streaming.flume.FlumePollingStreamSuite.flume polling test
      https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2269/
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5901 from tdas/ignore-flaky-tests and squashes the following commits:
      
      9cd8667 [Tathagata Das] Ignoring tests.
      8776fe0b
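      For reference, ignoring a flaky ScalaTest case is a one-word change; a tiny generic sketch (not the actual suites listed above):

      ```scala
      import org.scalatest.FunSuite

      class FlakyExampleSuite extends FunSuite {
        // Swapping test(...) for ignore(...) keeps the body compiling but skips it at run time,
        // which is how the suites above were temporarily disabled.
        ignore("example of a temporarily disabled flaky test") {
          assert(math.random >= 0.0)
        }

        test("stable tests keep running") {
          assert(1 + 1 === 2)
        }
      }
      ```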
    • [SPARK-7139] [STREAMING] Allow received block metadata to be saved to WAL and recovered on driver failure · 1854ac32
      Tathagata Das authored
      
      - Enabled ReceivedBlockTracker WAL by default
      - Stored block metadata in the WAL
      - Optimized WALBackedBlockRDD by skipping block fetch when the block is known to not exist in Spark
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5732 from tdas/SPARK-7139 and squashes the following commits:
      
      575476e [Tathagata Das] Added more tests to get 100% coverage of the WALBackedBlockRDD
      19668ba [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7139
      685fab3 [Tathagata Das] Addressed comments in PR
      637bc9c [Tathagata Das] Changed segment to handle
      466212c [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7139
      5f67a59 [Tathagata Das] Fixed HdfsUtils to handle append in local file system
      1bc5bc3 [Tathagata Das] Fixed bug on unexpected recovery
      d06fa21 [Tathagata Das] Enabled ReceivedBlockTracker by default, stored block metadata and optimized block fetching in WALBackedBlockRDD
      1854ac32
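      A minimal sketch of the driver-recovery setup this feature targets: checkpointing plus the receiver write-ahead log flag (the path, host, and port are illustrative):

      ```scala
      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      object WalRecoverySketch {
        def createContext(): StreamingContext = {
          val conf = new SparkConf()
            .setAppName("wal-recovery-sketch")
            .setMaster("local[2]")
            // Receiver-side write-ahead log; block metadata is tracked so it can be
            // recovered along with the checkpoint after a driver failure.
            .set("spark.streaming.receiver.writeAheadLog.enable", "true")

          val ssc = new StreamingContext(conf, Seconds(1))
          ssc.checkpoint("/tmp/wal-recovery-checkpoint") // illustrative path
          ssc.socketTextStream("localhost", 9999).count().print()
          ssc
        }

        def main(args: Array[String]): Unit = {
          // getOrCreate rebuilds the context from the checkpoint after a driver restart.
          val ssc = StreamingContext.getOrCreate("/tmp/wal-recovery-checkpoint", createContext _)
          ssc.start()
          ssc.awaitTerminationOrTimeout(10000)
          ssc.stop()
        }
      }
      ```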
    • [MINOR] [BUILD] Declare ivy dependency in root pom. · c5790a2f
      Marcelo Vanzin authored
      Without this, any dependency that pulls in ivy transitively may override
      the version and potentially cause issues. On my machine, the hive tests
      were pulling an old version of ivy, and subsequently failing with a
      "NoSuchMethodError".
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5893 from vanzin/ivy-dep-fix and squashes the following commits:
      
      ea2112d [Marcelo Vanzin] [minor] [build] Declare ivy dependency in root pom.
      c5790a2f
    • [MINOR] Renamed variables in SparkKMeans.scala, LocalKMeans.scala and kmeans.py to simplify readability · da738cff
      Niccolo Becchi authored
      
      With the previous names it could look as if the reduceByKey sums the abscissas and ordinates of some 2D points separately. The renaming should make the example easier to understand, especially for readers who, like me, are just starting out with functional programming.
      
      Author: Niccolo Becchi <niccolo.becchi@gmail.com>
      Author: pippobaudos <niccolo.becchi@gmail.com>
      
      Closes #5875 from pippobaudos/patch-1 and squashes the following commits:
      
      3bb3a47 [pippobaudos] renamed variables in LocalKMeans.scala and kmeans.py to simplify readability
      2c2a7a2 [Niccolo Becchi] Update SparkKMeans.scala
      da738cff
    • [SPARK-7314] [SPARK-3524] [PYSPARK] upgrade Pyrolite to 4.4 · e9b16e67
      Xiangrui Meng authored
      This PR upgrades Pyrolite to 4.4, which contains the bug fix for SPARK-3524 and some other performance improvements (e.g., SPARK-6288). The artifact is still under `org.spark-project` on Maven Central since there is no official release published there.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5850 from mengxr/SPARK-7314 and squashes the following commits:
      
      2ed4a95 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7314
      da3c2dd [Xiangrui Meng] remove my repo
      fe7e29b [Xiangrui Meng] switch to maven central
      6ddac0e [Xiangrui Meng] reverse the machine code for float/double
      d2d5b5b [Xiangrui Meng] change back to 4.4
      7824a9c [Xiangrui Meng] use Pyrolite 3.1
      cc3903a [Xiangrui Meng] upgrade Pyrolite to 4.4-0 for testing
      e9b16e67