- May 06, 2015
-
-
Daoyuan Wang authored
huangjs Acutally spark sql will first go through analysis period, in which we do widen types and promote strings, and then optimization, where constant IN will be converted into INSET. So it turn out that we only need to fix this for IN. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #4945 from adrian-wang/inset and squashes the following commits: 71e05cc [Daoyuan Wang] minor fix 581fa1c [Daoyuan Wang] mysql way f3f7baf [Daoyuan Wang] address comments 5eed4bc [Daoyuan Wang] promote string and do widen types for IN
-
Daoyuan Wang authored
Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #5803 from adrian-wang/decimalcompare and squashes the following commits: aef0e96 [Daoyuan Wang] add null handle ec455b9 [Daoyuan Wang] fix decimal compare for jdbc rdd
-
Reynold Xin authored
Author: Reynold Xin <rxin@databricks.com> Closes #5935 from rxin/df-doc1 and squashes the following commits: aaeaadb [Reynold Xin] [SQL] JavaDoc update for various DataFrame functions.
-
Xiangrui Meng authored
Since CrossValidator is a meta algorithm, we copy the implementation in Python. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #5926 from mengxr/SPARK-6940 and squashes the following commits: 6af181f [Xiangrui Meng] add TODOs 8285134 [Xiangrui Meng] update doc 060f7c3 [Xiangrui Meng] update doctest acac727 [Xiangrui Meng] add keyword args cdddecd [Xiangrui Meng] add CrossValidator in Python
-
zsxwing authored
Fixed the following failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/452/testReport/junit/org.apache.spark.broadcast/BroadcastSuite/Unpersisting_HttpBroadcast_on_executors_and_driver_in_distributed_mode/ The tests should wait until all slaves are up. Otherwise, there may be only a part of `BlockManager`s registered, and fail the tests. Author: zsxwing <zsxwing@gmail.com> Closes #5925 from zsxwing/SPARK-7384 and squashes the following commits: 783cb7b [zsxwing] Add comments for _jobProgressListener and remove postfixOps 1009ef1 [zsxwing] [SPARK-7384][Core][Tests] Fix flaky tests for distributed mode in BroadcastSuite
-
Yanbo Liang authored
https://issues.apache.org/jira/browse/SPARK-6267 Author: Yanbo Liang <ybliang8@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #5890 from yanboliang/spark-6267 and squashes the following commits: f20541d [Yanbo Liang] Merge pull request #3 from mengxr/SPARK-6267 7f202f9 [Xiangrui Meng] use Vector to have the best Python 2&3 compatibility 4bccfee [Yanbo Liang] fix doctest ec09412 [Yanbo Liang] fix typos 8214bbb [Yanbo Liang] fix code style 5c8ebe5 [Yanbo Liang] Python API for IsotonicRegression
-
Burak Yavuz authored
After a discussion on the user mailing list, it was decided to put all UDF's under `o.a.s.sql.functions` cc rxin Author: Burak Yavuz <brkyvz@gmail.com> Closes #5923 from brkyvz/move-math-funcs and squashes the following commits: a8dc3f7 [Burak Yavuz] address comments cf7a7bb [Burak Yavuz] [SPARK-7358] Move DataFrame mathfunctions into functions
-
- May 05, 2015
-
-
qhuang authored
Moving here from https://github.com/amplab-extras/SparkR-pkg/pull/241 sum() has been implemented. (https://github.com/amplab-extras/SparkR-pkg/pull/242) Now Phase 1: mean, sd, var have been implemented, but some things still need to be improved with the suggestions in https://issues.apache.org/jira/browse/SPARK-6841 Author: qhuang <qian.huang@intel.com> Closes #5446 from hqzizania/R and squashes the following commits: f283572 [qhuang] add test unit for describe() 2e74d5a [qhuang] add describe() DataFrame API
-
Reynold Xin authored
This reverts commit d4973580. The commit broke Spark on Windows.
-
Reynold Xin authored
See the comment in join function for more information. Author: Reynold Xin <rxin@databricks.com> Closes #5919 from rxin/self-join-resolve and squashes the following commits: e2fb0da [Reynold Xin] Updated SQLConf comment. 7233a86 [Reynold Xin] Updated comment. 6be2b4d [Reynold Xin] Removed println 9f6b72f [Reynold Xin] [SPARK-6231][SQL/DF] Automatically resolve ambiguity in join condition for self-joins.
-
Sandy Ryza authored
JoshRosen this PR addresses the comments you left on #4450 after it got merged. Author: Sandy Ryza <sandy@cloudera.com> Closes #5916 from sryza/sandy-spark-4550-cleanup and squashes the following commits: dee3d85 [Sandy Ryza] Some minor cleanup after SPARK-4550.
-
Shivaram Venkataraman authored
This change makes the RDD API private in SparkR and all internal uses of the SparkR API use SparkR::: to access private functions. Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #5895 from shivaram/rrdd-private and squashes the following commits: bdb2f07 [Shivaram Venkataraman] Make RDD private in SparkR. This change also makes all internal uses of the SparkR API use SparkR::: to access private functions
-
wangfei authored
make StringComparison extends ExpectsInputTypes and added expectedChildTypes, so do not need override expectedChildTypes in each subclass Author: wangfei <wangfei1@huawei.com> Closes #5905 from scwf/ExpectsInputTypes and squashes the following commits: b374ddf [wangfei] make stringcomparison extends ExpectsInputTypes
-
zsxwing authored
The default value will be changed to `1000` in #5533. So here I just used `1000`. Author: zsxwing <zsxwing@gmail.com> Closes #5899 from zsxwing/SPARK-7351 and squashes the following commits: e1ec515 [zsxwing] [SPARK-7351][Streaming][Docs] Add spark.streaming.ui.retainedBatches to docs
-
云峤 authored
Author: 云峤 <chensong.cs@alibaba-inc.com> Author: kaka1992 <kaka_1992@163.com> Closes #5839 from kaka1992/master and squashes the following commits: b15360d [kaka1992] Fix python unit test in sql/test. =_= I forget to commit this file last time. f928816 [kaka1992] Fix python style in sql/test. d2e7f72 [kaka1992] Fix python style in sql/test. c54d904 [kaka1992] Fix empty map bug. 7e64d1e [云峤] Update 7b9b858 [云峤] undo f080f8d [云峤] update pep8 76f0c51 [云峤] Merge remote-tracking branch 'remotes/upstream/master' 7d62368 [云峤] [SPARK-7294] ADD BETWEEN baf839b [云峤] [SPARK-7294] ADD BETWEEN d11d5b9 [云峤] [SPARK-7294] ADD BETWEEN
-
zsxwing authored
This is the initial work of SPARK-6939. Not yet ready for code review. Here are the screenshots:   TODOs: - [x] Display more information on mouse hover - [x] Align the timeline and distribution graphs - [x] Clean up the codes Author: zsxwing <zsxwing@gmail.com> Closes #5533 from zsxwing/SPARK-6939 and squashes the following commits: 9f7cd19 [zsxwing] Merge branch 'master' into SPARK-6939 deacc3f [zsxwing] Remove unused import cd03424 [zsxwing] Fix .rat-excludes 70cc87d [zsxwing] Streaming Scheduling Delay => Scheduling Delay d457277 [zsxwing] Fix UIUtils in BatchPage b3f303e [zsxwing] Add comments for unclear classes and methods ff0bff8 [zsxwing] Make InputDStream.name private[streaming] cc392c5 [zsxwing] Merge branch 'master' into SPARK-6939 e275e23 [zsxwing] Move time related methods to Streaming's UIUtils d5d86f6 [zsxwing] Fix incorrect lastErrorTime 3be4b7a [zsxwing] Use InputInfo b50fa32 [zsxwing] Jump to the batch page when clicking a point in the timeline graphs 203605d [zsxwing] Merge branch 'master' into SPARK-6939 74307cf [zsxwing] Reuse the data for histogram graphs to reduce the page size 2586916 [zsxwing] Merge branch 'master' into SPARK-6939 70d8533 [zsxwing] Remove BatchInfo.numRecords and a few renames 7bbdc0a [zsxwing] Hide the receiver sub table if no receiver a2972e9 [zsxwing] Add some ui tests for StreamingPage fd03ad0 [zsxwing] Add a test to verify no memory leak 4a8f886 [zsxwing] Merge branch 'master' into SPARK-6939 18607a1 [zsxwing] Merge branch 'master' into SPARK-6939 d0b0aec [zsxwing] Clean up the codes a459f49 [zsxwing] Add a dash line to processing time graphs 8e4363c [zsxwing] Prepare for the demo c81a1ee [zsxwing] Change time unit in the graphs automatically 4c0b43f [zsxwing] Update Streaming UI 04c7500 [zsxwing] Make the server and client use the same timezone fed8219 [zsxwing] Move the x axis at the top and show a better tooltip c23ce10 [zsxwing] Make two graphs close d78672a [zsxwing] Make the X axis use the same range 881c907 [zsxwing] Use histogram for distribution 5688702 [zsxwing] Fix the unit test ddf741a [zsxwing] Fix the unit test ad93295 [zsxwing] Remove unnecessary codes a0458f9 [zsxwing] Clean the codes b82ed1e [zsxwing] Update the graphs as per comments dd653a1 [zsxwing] Add timeline and histogram graphs for streaming statistics
-
Sandy Ryza authored
This patch adds a one hot encoder for categorical features. Planning to add documentation and another test after getting feedback on the approach. A couple choices made here: * There's an `includeFirst` option which, if false, creates numCategories - 1 columns and, if true, creates numCategories columns. The default is true, which is the behavior in scikit-learn. * The user is expected to pass a `Seq` of category names when instantiating a `OneHotEncoder`. These can be easily gotten from a `StringIndexer`. The names are used for the output column names, which take the form colName_categoryName. Author: Sandy Ryza <sandy@cloudera.com> Closes #5500 from sryza/sandy-spark-5888 and squashes the following commits: f383250 [Sandy Ryza] Infer label names automatically 6e257b9 [Sandy Ryza] Review comments 7c539cf [Sandy Ryza] Vector transformers 1c182dd [Sandy Ryza] SPARK-5888. [MLLIB]. Add OneHotEncoder as a Transformer
-
Xiangrui Meng authored
This PR adds `BinaryClassificationEvaluator` to Python ML Pipelines API, which is a simple wrapper of the Scala implementation. oefirouz Author: Xiangrui Meng <meng@databricks.com> Closes #5885 from mengxr/SPARK-7333 and squashes the following commits: 25d7451 [Xiangrui Meng] fix tests in python 3 babdde7 [Xiangrui Meng] fix doc cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark
-
Burak Yavuz authored
Reduced take size from 1e8 to 1e6. cc rxin Author: Burak Yavuz <brkyvz@gmail.com> Closes #5900 from brkyvz/df-cont-followup and squashes the following commits: c11e762 [Burak Yavuz] fix grammar b30ace2 [Burak Yavuz] address comments a417ba5 [Burak Yavuz] [SPARK-7243][SQL] Reduce size for Contingency Tables in DataFrames
-
jerryshao authored
Add a metric source to expose the internal status of ExecutorAllocationManager to better monitoring the resource usage of executors when dynamic allocation is enable. Please help to review, thanks a lot. Author: jerryshao <saisai.shao@intel.com> Closes #5589 from jerryshao/dynamic-allocation-source and squashes the following commits: 104d155 [jerryshao] rebase and address the comments c501a2c [jerryshao] Address the comments d237ba5 [jerryshao] Address the comments 2c3540f [jerryshao] Add a metric source for ExecutorAllocationManager
-
Andrew Or authored
I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas Author: Andrew Or <andrew@databricks.com> Closes #5860 from andrewor14/streaming-closure-cleaner and squashes the following commits: 8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure 5ee4e25 [Andrew Or] Fix tests eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner 67eeff4 [Andrew Or] Add tests a4fa768 [Andrew Or] Clean the closure, not the RDD
-
Andrew Or authored
Note: ~140 lines are tests. In a nutshell, we never cleaned closures the user provided through the following operations: - sortBy - keyBy - mapPartitions - mapPartitionsWithIndex - aggregateByKey - foldByKey - foreachAsync - one of the aliases for runJob - runApproximateJob For more details on a reproduction and why they were not cleaned, please see [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237). Author: Andrew Or <andrew@databricks.com> Closes #5787 from andrewor14/clean-more and squashes the following commits: 2f1f476 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 7265865 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more df3caa3 [Andrew Or] Address comments 7a3cc80 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-more 6498f44 [Andrew Or] Add missing test for groupBy e83699e [Andrew Or] Clean one more 8ac3074 [Andrew Or] Prevent NPE in tests when CC is used outside of an app 9ac5f9b [Andrew Or] Clean closures that are not currently cleaned 19e33b4 [Andrew Or] Add tests for all public RDD APIs that take in closures
-
Alain authored
Require non empty input rdd such that we can take the first labeledpoint and get the feature size Author: Alain <aihe@usc.edu> Author: aihe@usc.edu <aihe@usc.edu> Closes #5810 from AiHe/decisiontree-issue and squashes the following commits: 3b1d08a [aihe@usc.edu] [MLLIB][tree] merge the assertion into the evaluation of numFeatures cf2e567 [Alain] [MLLIB][tree] Use a rdd api to verify size of input rdd > 0 when building meta data b448f47 [Alain] [MLLIB][tree] Verify size of input rdd > 0 when building meta data
-
Xiangrui Meng authored
Closes #5878
-
Hrishikesh Subramonian authored
The following items are added to Python kmeans: kmeans - setEpsilon, setInitializationSteps KMeansModel - computeCost, k Author: Hrishikesh Subramonian <hrishikesh.subramonian@flytxt.com> Closes #5647 from FlytxtRnD/newPyKmeansAPI and squashes the following commits: b9e451b [Hrishikesh Subramonian] set seed to fixed value in doc test 5fd3ced [Hrishikesh Subramonian] doc test corrections 20b3c68 [Hrishikesh Subramonian] python 3 fixes 4d4e695 [Hrishikesh Subramonian] added arguments in python tests 21eb84c [Hrishikesh Subramonian] Python Kmeans - setEpsilon, setInitializationSteps, k and computeCost added.
-
MechCoder authored
Utilities for pickling and unpickling SparseMatrices using SerDe Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #5775 from MechCoder/spark-7202 and squashes the following commits: 7e689dc [MechCoder] [SPARK-7202] Add SparseMatrixPickler to SerDe
-
zsxwing authored
It's meaningless to display the Streaming tab before `ssc.start()`. So we should attach it in the `ssc.start` method. Author: zsxwing <zsxwing@gmail.com> Closes #5898 from zsxwing/SPARK-7350 and squashes the following commits: e676487 [zsxwing] Attach the Streaming tab when calling ssc.start()
-
zsxwing authored
[SPARK-5074] [CORE] [TESTS] Fix the flakey test 'run shuffle with map stage failure' in DAGSchedulerSuite Test failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/2240/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/run_shuffle_with_map_stage_failure/ This is because many tests share the same `JobListener`. Because after each test, `scheduler` isn't stopped. So actually it's still running. When running the test `run shuffle with map stage failure`, some previous test may trigger [ResubmitFailedStages](https://github.com/apache/spark/blob/ebc25a4ddfe07a67668217cec59893bc3b8cf730/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1120) logic, and report `jobFailed` and override the global `failure` variable. This PR uses `after` to call `scheduler.stop()` for each test. Author: zsxwing <zsxwing@gmail.com> Closes #5903 from zsxwing/SPARK-5074 and squashes the following commits: 1e6f13e [zsxwing] Fix the flakey test 'run shuffle with map stage failure' in DAGSchedulerSuite
-
Liang-Chi Hsieh authored
Two minor doc errors in `BytesToBytesMap` and `UnsafeRow`. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #5906 from viirya/minor_doc and squashes the following commits: 27f9089 [Liang-Chi Hsieh] Minor update for doc.
-
Imran Rashid authored
Exposes data available in the UI as json over http. Key points: * new endpoints, handled independently of existing XyzPage classes. Root entrypoint is `JsonRootResource` * Uses jersey + jackson for routing & converting POJOs into json * tests against known results in `HistoryServerSuite` * also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages. Author: Imran Rashid <irashid@cloudera.com> Closes #4435 from squito/SPARK-3454 and squashes the following commits: da1e35f [Imran Rashid] typos etc. 5e78b4f [Imran Rashid] fix rendering problems 5ae02ad [Imran Rashid] Merge branch 'master' into SPARK-3454 f016182 [Imran Rashid] change all constructors json-pojo class constructors to be private[spark] to protect us from mima-false-positives if we add fields 3347b72 [Imran Rashid] mark EnumUtil as @Private ec140a2 [Imran Rashid] create @Private cc1febf [Imran Rashid] add docs on the metrics-as-json api cbaf287 [Imran Rashid] Merge branch 'master' into SPARK-3454 56db31e [Imran Rashid] update tests for mulit-attempt 7f3bc4e [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt" 67008b4 [Imran Rashid] rats 9e51400 [Imran Rashid] style c9bae1c [Imran Rashid] handle multiple attempts per app b87cd63 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt 188762c [Imran Rashid] multi-attempt 2af11e5 [Imran Rashid] Merge branch 'master' into SPARK-3454 befff0c [Imran Rashid] review feedback 14ac3ed [Imran Rashid] jersey-core needs to be explicit; move version & scope to parent pom.xml f90680e [Imran Rashid] Merge branch 'master' into SPARK-3454 dc8a7fe [Imran Rashid] style, fix errant comments acb7ef6 [Imran Rashid] fix indentation 7bf1811 [Imran Rashid] move MetricHelper so mima doesnt think its exposed; comments 9d889d6 [Imran Rashid] undo some unnecessary changes f48a7b0 [Imran Rashid] docs 52bbae8 [Imran Rashid] StorageListener & StorageStatusListener needs to synchronize internally to be thread-safe 31c79ce [Imran Rashid] asm no longer needed for SPARK_PREPEND_CLASSES b2f8b91 [Imran Rashid] @DeveloperApi 2e19be2 [Imran Rashid] lazily convert ApplicationInfo to avoid memory overhead ba3d9d2 [Imran Rashid] upper case enums 39ac29c [Imran Rashid] move EnumUtil d2bde77 [Imran Rashid] update error handling & scoping 4a234d3 [Imran Rashid] avoid jersey-media-json-jackson b/c of potential version conflicts a157a2f [Imran Rashid] style 7bd4d15 [Imran Rashid] delete security test, since it doesnt do anything a325563 [Imran Rashid] style a9c5cf1 [Imran Rashid] undo changes superceeded by master 0c6f968 [Imran Rashid] update deps 1ed0d07 [Imran Rashid] Merge branch 'master' into SPARK-3454 4c92af6 [Imran Rashid] style f2e63ad [Imran Rashid] Merge branch 'master' into SPARK-3454 c22b11f [Imran Rashid] fix compile error 9ea682c [Imran Rashid] go back to good ol' java enums cf86175 [Imran Rashid] style d493b38 [Imran Rashid] Merge branch 'master' into SPARK-3454 f05ae89 [Imran Rashid] add in ExecutorSummaryInfo for MiMa :( 101a698 [Imran Rashid] style d2ef58d [Imran Rashid] revert changes that had HistoryServer refresh the application listing more often b136e39b [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt" e031719 [Imran Rashid] fixes from review 1f53a66 [Imran Rashid] style b4a7863 [Imran Rashid] fix compile error 2c8b7ee [Imran Rashid] rats 1578a4a [Imran Rashid] doc 674f8dc [Imran Rashid] more explicit about total numbers of jobs & stages vs. number retained 9922be0 [Imran Rashid] Merge branch 'master' into stage_distributions f5a5196 [Imran Rashid] undo removal of renderJson from MasterPage, since there is no substitute yet db61211 [Imran Rashid] get JobProgressListener directly from UI fdfc181 [Imran Rashid] stage/taskList 63eb4a6 [Imran Rashid] tests for taskSummary ad27de8 [Imran Rashid] error handling on quantile values b2efcaf [Imran Rashid] cleanup, combine stage-related paths into one resource aaba896 [Imran Rashid] wire up task summary a4b1397 [Imran Rashid] stage metric distributions e48ba32 [Imran Rashid] rename eaf3bbb [Imran Rashid] style 25cd894 [Imran Rashid] if only given day, assume GMT 51eaedb [Imran Rashid] more visibility fixes 9f28b7e [Imran Rashid] ack, more cleanup 99764e1 [Imran Rashid] Merge branch 'SPARK-3454_w_jersey' into SPARK-3454 a61a43c [Imran Rashid] oops, remove accidental checkin a066055 [Imran Rashid] set visibility on a lot of classes 1f361c8 [Imran Rashid] update rat-excludes 0be5120 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey 2382bef [Imran Rashid] switch to using new "enum" fef6605 [Imran Rashid] some utils for working w/ new "enum" format dbfc7bf [Imran Rashid] style b86bcb0 [Imran Rashid] update test to look at one stage attempt 5f9df24 [Imran Rashid] style 7fd156a [Imran Rashid] refactor jsonDiff to avoid code duplication 73f1378 [Imran Rashid] test json; also add test cases for cleaned stages & jobs 97d411f [Imran Rashid] json endpoint for one job 0c96147 [Imran Rashid] better error msgs for bad stageId vs bad attemptId dddbd29 [Imran Rashid] stages have attempt; jobs are sorted; resource for all attempts for one stage 190c17a [Imran Rashid] StagePage should distinguish no task data, from unknown stage 84cd497 [Imran Rashid] AllJobsPage should still report correct completed & failed job count, even if some have been cleaned, to make it consistent w/ AllStagesPage 36e4062 [Imran Rashid] SparkUI needs to know about startTime, so it can list its own applicationInfo b4c75ed [Imran Rashid] fix merge conflicts; need to widen visibility in a few cases e91750a [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey 56d2fc7 [Imran Rashid] jersey needs asm for SPARK_PREPEND_CLASSES to work f7df095 [Imran Rashid] add test for accumulables, and discover that I need update after all 9c0c125 [Imran Rashid] add accumulableInfo 00e9cc5 [Imran Rashid] more style 3377e61 [Imran Rashid] scaladoc d05f7a9 [Imran Rashid] dont use case classes for status api POJOs, since they have binary compatibility issues 654cecf [Imran Rashid] move all the status api POJOs to one file b86e2b0 [Imran Rashid] style 18a8c45 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey 5598f19 [Imran Rashid] delete some unnecessary code, more to go 56edce0 [Imran Rashid] style 017c755 [Imran Rashid] add in metrics now available 1b78cb7 [Imran Rashid] fix some import ordering 0dc3ea7 [Imran Rashid] if app isnt found, reload apps from FS before giving up c7d884f [Imran Rashid] fix merge conflicts 0c12b50 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey b6a96a8 [Imran Rashid] compare json by AST, not string cd37845 [Imran Rashid] switch to using java.util.Dates for times a4ab5aa [Imran Rashid] add in explicit dependency on jersey 1.9 -- maven wasn't happy before this 4fdc39f [Imran Rashid] refactor case insensitive enum parsing cba1ef6 [Imran Rashid] add security (maybe?) for metrics json f0264a7 [Imran Rashid] switch to using jersey for metrics json bceb3a9 [Imran Rashid] set http response code on error, some testing e0356b6 [Imran Rashid] put new test expectation files in rat excludes (is this OK?) b252e7a [Imran Rashid] small cleanup of accidental changes d1a8c92 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt 4b398d0 [Imran Rashid] expose UI data as json in new endpoints
-
Jihong MA authored
Author: Jihong MA <linlin200605@gmail.com> Closes #5904 from JihongMA/SPARK-7357 and squashes the following commits: 7d6153a [Jihong MA] SPARK-7357 Improving HBaseTest example
-
Sandy Ryza authored
"The best way to size the amount of memory consumption your dataset will require is to create an RDD, put it into cache, and look at the SparkContext logs on your driver program. The logs will tell you how much memory each partition is consuming, which you can aggregate to get the total size of the RDD." -the Tuning Spark page This is a pain. It would be much nicer to expose simply functionality for understanding the memory footprint of a Java object. Author: Sandy Ryza <sandy@cloudera.com> Closes #3913 from sryza/sandy-spark-5112 and squashes the following commits: 8d9e082 [Sandy Ryza] Add SizeEstimator in org.apache.spark 2e1a906 [Sandy Ryza] Revert "Move SizeEstimator out of util" 93f4cd0 [Sandy Ryza] Move SizeEstimator out of util e21c1f4 [Sandy Ryza] Remove unused import 798ab88 [Sandy Ryza] Update documentation and add to SparkContext 34c523c [Sandy Ryza] SPARK-5112. Expose SizeEstimator as a developer api
-
shekhar.bansal authored
Author: shekhar.bansal <shekhar.bansal@guavus.com> Closes #5719 from zuxqoj/master and squashes the following commits: 5574ff7 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for sparkYarnAM actor system 5117258 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for sparkYarnAM actor system 9de5330 [shekhar.bansal] [SPARK-6653][yarn] New config to specify port for sparkYarnAM actor system 456a592 [shekhar.bansal] [SPARK-6653][yarn] New configuration property to specify port for sparkYarnAM actor system 803e93e [shekhar.bansal] [SPARK-6653][yarn] New configuration property to specify port for sparkYarnAM actor system
-
zsxwing authored
...aming.InputStreamsSuite.socket input stream Remove non-deterministic "Thread.sleep" and use deterministic strategies to fix the flaky failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2127/testReport/junit/org.apache.spark.streaming/InputStreamsSuite/socket_input_stream/ Author: zsxwing <zsxwing@gmail.com> Closes #5891 from zsxwing/SPARK-7341 and squashes the following commits: 611157a [zsxwing] Add wait methods to BatchCounter and use BatchCounter in InputStreamsSuite 014b58f [zsxwing] Use withXXX to clean up the resources c9bf746 [zsxwing] Move 'waitForStart' into the 'start' method and fix the code style 9d0de6d [zsxwing] [SPARK-7341][Streaming][Tests] Fix the flaky test: org.apache.spark.streaming.InputStreamsSuite.socket input stream
-
jerryshao authored
Author: jerryshao <saisai.shao@intel.com> Closes #5879 from jerryshao/SPARK-7113 and squashes the following commits: b0b506c [jerryshao] Address the comments 0babe66 [jerryshao] Support input information reporting for Direct Kafka stream
-
Tathagata Das authored
org.apache.spark.DriverSuite.driver should exit after finishing without cleanup (SPARK-530) https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2267/ org.apache.spark.deploy.SparkSubmitSuite.includes jars passed in through --jars https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2271/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/testReport/ org.apache.spark.streaming.flume.FlumePollingStreamSuite.flume polling test https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2269/ Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #5901 from tdas/ignore-flaky-tests and squashes the following commits: 9cd8667 [Tathagata Das] Ignoring tests.
-
Tathagata Das authored
[SPARK-7139] [STREAMING] Allow received block metadata to be saved to WAL and recovered on driver failure - Enabled ReceivedBlockTracker WAL by default - Stored block metadata in the WAL - Optimized WALBackedBlockRDD by skipping block fetch when the block is known to not exist in Spark Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #5732 from tdas/SPARK-7139 and squashes the following commits: 575476e [Tathagata Das] Added more tests to get 100% coverage of the WALBackedBlockRDD 19668ba [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7139 685fab3 [Tathagata Das] Addressed comments in PR 637bc9c [Tathagata Das] Changed segment to handle 466212c [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7139 5f67a59 [Tathagata Das] Fixed HdfsUtils to handle append in local file system 1bc5bc3 [Tathagata Das] Fixed bug on unexpected recovery d06fa21 [Tathagata Das] Enabled ReceivedBlockTracker by default, stored block metadata and optimized block fetching in WALBackedBlockRDD
-
Marcelo Vanzin authored
Without this, any dependency that pulls ivy transitively may override the version and potentially cause issue. In my machine, the hive tests were pulling an old version of ivy, and subsequently failing with a "NoSuchMethodError". Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #5893 from vanzin/ivy-dep-fix and squashes the following commits: ea2112d [Marcelo Vanzin] [minor] [build] Declare ivy dependency in root pom.
-
Niccolo Becchi authored
[MINOR] Renamed variables in SparkKMeans.scala, LocalKMeans.scala and kmeans.py to simplify readability With the previous syntax it could look like that the reduceByKey sums separately abscissas and ordinates of some 2D points. Perhaps in this way should be easier to understand the example, especially for who is starting the functional programming like me now. Author: Niccolo Becchi <niccolo.becchi@gmail.com> Author: pippobaudos <niccolo.becchi@gmail.com> Closes #5875 from pippobaudos/patch-1 and squashes the following commits: 3bb3a47 [pippobaudos] renamed variables in LocalKMeans.scala and kmeans.py to simplify readability 2c2a7a2 [Niccolo Becchi] Update SparkKMeans.scala
-
Xiangrui Meng authored
This PR upgrades Pyrolite to 4.4, which contains the bug fix for SPARK-3524 and some other performance improvements (e.g., SPARK-6288). The artifact is still under `org.spark-project` on Maven Central since there is no official release published there. Author: Xiangrui Meng <meng@databricks.com> Closes #5850 from mengxr/SPARK-7314 and squashes the following commits: 2ed4a95 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7314 da3c2dd [Xiangrui Meng] remove my repo fe7e29b [Xiangrui Meng] switch to maven central 6ddac0e [Xiangrui Meng] reverse the machine code for float/double d2d5b5b [Xiangrui Meng] change back to 4.4 7824a9c [Xiangrui Meng] use Pyrolite 3.1 cc3903a [Xiangrui Meng] upgrade Pyrolite to 4.4-0 for testing
-