  1. Jan 22, 2015
    • SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAllocator · 820ce035
      Sandy Ryza authored
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #4164 from sryza/sandy-spark-5370 and squashes the following commits:
      
      0c8d736 [Sandy Ryza] SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAllocator
      820ce035
    • [SPARK-5365][MLlib] Refactor KMeans to reduce redundant data · 246111d1
      Liang-Chi Hsieh authored
      If a point is selected as a new center for many runs, it collects a lot of redundant data. This PR refactors that code.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4159 from viirya/small_refactor_kmeans and squashes the following commits:
      
      25487e6 [Liang-Chi Hsieh] Refactor codes to reduce redundant data.
      246111d1
    • [SPARK-5147][Streaming] Delete the received data WAL log periodically · 3027f06b
      Tathagata Das authored
      This is a refactored fix based on jerryshao's PR #4037.
      It enables deletion of old WAL files containing the received block data.
      Improvements over #4037:
      - Respects the rememberDuration of all receiver streams. In #4037, if there were two receiver streams with different remember durations, the deletion would have been based on the shortest remember duration, thus deleting data prematurely for the receiver stream with the longer remember duration (see the sketch after this entry).
      - Added a unit test covering creation of the receiver WAL, automatic deletion, and respecting of the remember duration.
      
      jerryshao I am going to merge this ASAP to make it into 1.2.1. Thanks for the initial draft of this PR; it made my job much easier.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #4149 from tdas/SPARK-5147 and squashes the following commits:
      
      730798b [Tathagata Das] Added comments.
      c4cf067 [Tathagata Das] Minor fixes
      2579b27 [Tathagata Das] Refactored the fix to make sure that the cleanup respects the remember duration of all the receiver streams
      2736fd1 [jerryshao] Delete the old WAL log periodically
      3027f06b
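      A hedged sketch of the cleanup rule described in the entry above; the function name and millisecond units are illustrative, not Spark's API:

      ```scala
      // Clean WAL data only up to the point allowed by the longest remember duration,
      // so no receiver stream loses block data it may still need.
      def walCleanupThreshold(currentTimeMs: Long, rememberDurationsMs: Seq[Long]): Long = {
        require(rememberDurationsMs.nonEmpty, "expected at least one receiver stream")
        currentTimeMs - rememberDurationsMs.max
      }

      // Example: streams remembering 60 s and 300 s of data -> only data older
      // than 300 s (relative to now) may be deleted.
      // walCleanupThreshold(now, Seq(60000L, 300000L)) == now - 300000L
      ```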
    • [SPARK-5317] Set BoostingStrategy.defaultParams with Enumeration Algo.Classification or Algo.Regression · fcb3e186
      Basin authored
      
      JIRA Issue: https://issues.apache.org/jira/browse/SPARK-5317
      When setting BoostingStrategy.defaultParams("Classification"), it is more straightforward to pass the enumeration value, as in BoostingStrategy.defaultParams(Algo.Classification).
      This PR overloads the method BoostingStrategy.defaultParams() accordingly (see the sketch after this entry).
      
      Author: Basin <jpsachilles@gmail.com>
      
      Closes #4103 from Peishen-Jia/stragetyAlgo and squashes the following commits:
      
      87bab1c [Basin] Docs and Code documentations updated.
      3b72875 [Basin] defaultParams(algoStr: String) call defaultParams(algo: Algo).
      7c1e6ee [Basin] Doc of Java updated. algo -> algoStr instead.
      d5c8a2e [Basin] Merge branch 'stragetyAlgo' of github.com:Peishen-Jia/spark into stragetyAlgo
      65f96ce [Basin] mllib-ensembles doc modified.
      e04a5aa [Basin] boostingstrategy.defaultParam string algo to enumeration.
      68cf544 [Basin] mllib-ensembles doc modified.
      a4aea51 [Basin] boostingstrategy.defaultParam string algo to enumeration.
      fcb3e186
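      A brief usage sketch of the change described above; it assumes the `Algo` enumeration and the new `defaultParams(Algo)` overload behave as the commit message states:

      ```scala
      import org.apache.spark.mllib.tree.configuration.{Algo, BoostingStrategy}

      object DefaultParamsExample {
        // String form, as before.
        val byString = BoostingStrategy.defaultParams("Classification")

        // Enumeration form added by this PR (hedged): avoids typos in the algorithm name.
        val byEnum = BoostingStrategy.defaultParams(Algo.Classification)
      }
      ```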
  2. Jan 21, 2015
    • [SPARK-3424][MLLIB] cache point distances during k-means|| init · ca7910d6
      Xiangrui Meng authored
      This PR ports the following feature implemented in #2634 by derrickburns:
      
      * During k-means|| initialization, we should cache costs (squared distances) previously computed.
      
      It also contains the following optimizations (a simplified sketch of the cost caching follows this entry):
      
      * aggregates sumCosts directly
      * runs multiple (#runs) k-means++ instances in parallel
      
      I compared the performance locally on mnist-digit. Before this patch:
      
      ![before](https://cloud.githubusercontent.com/assets/829644/5845647/93080862-a172-11e4-9a35-044ec711afc4.png)
      
      with this patch:
      
      ![after](https://cloud.githubusercontent.com/assets/829644/5845653/a47c29e8-a172-11e4-8e9f-08db57fe3502.png)
      
      It is clear that each k-means|| iteration takes about the same amount of time with this patch.
      
      Authors:
        Derrick Burns <derrickburns@gmail.com>
        Xiangrui Meng <meng@databricks.com>
      
      Closes #4144 from mengxr/SPARK-3424-kmeans-parallel and squashes the following commits:
      
      0a875ec [Xiangrui Meng] address comments
      4341bb8 [Xiangrui Meng] do not re-compute point distances during k-means||
      ca7910d6
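      A simplified, hedged sketch of the cost-caching idea from this entry (plain arrays instead of MLlib vectors; not the PR's actual code):

      ```scala
      import org.apache.spark.rdd.RDD

      object CostCacheSketch {
        /** Tighten cached per-point costs with the centers chosen in the current round,
          * instead of recomputing distances to every center from scratch.
          * Assumes `points` and `costs` are zippable (same partitioning) and
          * `newCenters` is non-empty. */
        def updateCosts(points: RDD[Array[Double]],
                        costs: RDD[Double],
                        newCenters: Seq[Array[Double]]): RDD[Double] = {
          points.zip(costs).map { case (p, oldCost) =>
            val nearestNew = newCenters.map { c =>
              var s = 0.0
              var i = 0
              while (i < p.length) { val d = p(i) - c(i); s += d * d; i += 1 }
              s
            }.min
            math.min(oldCost, nearestNew)
          }
        }
      }
      ```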
    • [SPARK-5202] [SQL] Add hql variable substitution support · 27bccc5e
      Cheng Hao authored
      https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
      
      This is a blocking issue for CLI users; it impacts existing HQL scripts written for Hive (an illustrative example follows this entry).
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4003 from chenghao-intel/substitution and squashes the following commits:
      
      bb41fd6 [Cheng Hao] revert the removed the implicit conversion
      af7c31a [Cheng Hao] add hql variable substitution support
      27bccc5e
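      An illustrative (not authoritative) example of the feature named above; the table and variable names are made up, and it assumes a `HiveContext` is available:

      ```scala
      import org.apache.spark.sql.hive.HiveContext

      object SubstitutionExample {
        // With variable substitution enabled, a variable set in one statement can be
        // referenced as ${hivevar:...} in later HQL, as in Hive's CLI.
        def queryWithSubstitution(hive: HiveContext) = {
          hive.sql("SET hivevar:target_year=2015")
          hive.sql("SELECT * FROM logs WHERE year = ${hivevar:target_year}")
        }
      }
      ```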
    • [SPARK-5355] make SparkConf thread-safe · 9bad0622
      Davies Liu authored
      SparkConf is not thread-safe but is accessed by many threads; getAll() could return only part of the configs if another thread is modifying them at the same time (a minimal sketch follows this entry).
      
      This PR changes SparkConf.settings to a thread-safe TrieMap.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4143 from davies/safe-conf and squashes the following commits:
      
      f8fa1cf [Davies Liu] change to TrieMap
      a1d769a [Davies Liu] make SparkConf thread-safe
      9bad0622
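      A minimal sketch of the approach described above (not Spark's actual class): back the settings store with `scala.collection.concurrent.TrieMap` so concurrent reads see a consistent view:

      ```scala
      import scala.collection.concurrent.TrieMap

      class SimpleConf {
        // Lock-free concurrent map; getAll no longer races with concurrent set() calls.
        private val settings = new TrieMap[String, String]()

        def set(key: String, value: String): this.type = { settings.put(key, value); this }
        def get(key: String): Option[String] = settings.get(key)
        def getAll: Array[(String, String)] = settings.toArray
      }
      ```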
    • [SPARK-4984][CORE][WEBUI] Adding a pop-up containing the full job description when it is very long · 3be2a887
      wangfei authored
      In some cases the job description can be very long, such as a long SQL statement; refer to #3718.
      This PR adds a pop-up for the job description when it is long.
      
      ![image](https://cloud.githubusercontent.com/assets/7018048/5847400/c757cbbc-a207-11e4-891f-528821c2e68d.png)
      
      ![image](https://cloud.githubusercontent.com/assets/7018048/5847409/d434b2b4-a207-11e4-8813-03a74b43d766.png)
      
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #3819 from scwf/popup-descrip-ui and squashes the following commits:
      
      ba02b83 [wangfei] address comments
      a7c5e7b [wangfei] spot that it's been truncated
      fbf6162 [wangfei] Merge branch 'master' into popup-descrip-ui
      0bca96d [wangfei] remove no use val
      4b55c3b [wangfei] fix style issue
      353c6f4 [wangfei] pop up the description of job with a styled read-only text form field
      3be2a887
    • [SQL] [Minor] Remove deprecated parquet tests · ba19689f
      Cheng Lian authored
      This PR removes the deprecated `ParquetQuerySuite`, renames `ParquetQuerySuite2` to `ParquetQuerySuite`, and refactors the changes introduced in #4115 to `ParquetFilterSuite`. It is a follow-up of #3644.
      
      Notice that test cases in the old `ParquetQuerySuite` have already been well covered by other test suites introduced in #3644.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4116 from liancheng/remove-deprecated-parquet-tests and squashes the following commits:
      
      f73b8f9 [Cheng Lian] Removes deprecated Parquet test suite
      ba19689f
    • Revert "[SPARK-5244] [SQL] add coalesce() in sql parser" · b328ac6c
      Josh Rosen authored
      This reverts commit 812d3679.
      b328ac6c
    • [SPARK-5009] [SQL] Long keyword support in SQL Parsers · 8361078e
      Cheng Hao authored
      * `SqlLexical.allCaseVersions` will cause a `StackOverflowException` if the keyword is too long; the patch fixes that by normalizing all of the keywords in `SqlLexical` (an illustrative sketch follows this entry).
      * It also makes a unified SparkSQLParser for sharing the common code.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3926 from chenghao-intel/long_keyword and squashes the following commits:
      
      686660f [Cheng Hao] Support Long Keyword and Refactor the SQLParsers
      8361078e
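      An illustrative sketch (not the original `SqlLexical` code) of why generating every case variant of a keyword is problematic: the recursion depth grows with keyword length and the number of variants grows as 2^length, so normalizing keywords once is the safer approach:

      ```scala
      object CaseVersionsSketch {
        // All upper/lower-case variants of a keyword, built recursively.
        def allCaseVersions(s: String): Stream[String] =
          if (s.isEmpty) Stream("")
          else allCaseVersions(s.tail).flatMap { rest =>
            Stream(s"${s.head.toLower}$rest", s"${s.head.toUpper}$rest")
          }

        // allCaseVersions("top").size == 8   (2^3 variants)
      }
      ```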
    • [SPARK-5244] [SQL] add coalesce() in sql parser · 812d3679
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #4040 from adrian-wang/coalesce and squashes the following commits:
      
      0ac8e8f [Daoyuan Wang] add coalesce() in sql parser
      812d3679
    • [SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop · 3ee3ab59
      Kenji Kikushima authored
      
      I looked into GraphGenerators#chooseCell, and found that chooseCell can't generate more edges than pow(2, (2 * (log2(numVertices)-1))) to make a Power-law graph. (Ex. numVertices:4 upperbound:4, numVertices:8 upperbound:16, numVertices:16 upperbound:64)
      If we request more edges than the upper bound, rmatGraph falls into an infinite loop. So, how about adding argument validation? (A sketch follows this entry.)
      
      Author: Kenji Kikushima <kikushima.kenji@lab.ntt.co.jp>
      
      Closes #3950 from kj-ki/SPARK-5064 and squashes the following commits:
      
      4ee18c7 [Ankur Dave] Reword error message and add unit test
      d760bc7 [Kenji Kikushima] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop.
      3ee3ab59
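      A hedged sketch of the validation idea described above; names are illustrative and `numVertices` is assumed to be a power of two:

      ```scala
      object RmatBoundSketch {
        /** Maximum number of edges chooseCell can produce: 2^(2 * (log2(numVertices) - 1)).
          * E.g. 4 vertices -> 4, 8 -> 16, 16 -> 64, matching the examples above. */
        def maxEdges(numVertices: Int): Long = {
          val log2v = 31 - Integer.numberOfLeadingZeros(numVertices)
          1L << (2 * (log2v - 1))
        }

        def validate(numVertices: Int, numEdges: Long): Unit =
          require(numEdges <= maxEdges(numVertices),
            s"numEdges ($numEdges) exceeds the R-MAT upper bound (${maxEdges(numVertices)}) for $numVertices vertices")
      }
      ```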
    • [SPARK-4749] [mllib]: Allow initializing KMeans clusters using a seed · 7450a992
      nate.crosswhite authored
      This implements the functionality for SPARK-4749 and provides unit tests in Scala and PySpark.
      
      Author: nate.crosswhite <nate.crosswhite@stresearch.com>
      Author: nxwhite-str <nxwhite-str@users.noreply.github.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #3610 from nxwhite-str/master and squashes the following commits:
      
      a2ebbd3 [nxwhite-str] Merge pull request #1 from mengxr/SPARK-4749-kmeans-seed
      7668124 [Xiangrui Meng] minor updates
      f8d5928 [nate.crosswhite] Addressing PR issues
      277d367 [nate.crosswhite] Merge remote-tracking branch 'upstream/master'
      9156a57 [nate.crosswhite] Merge remote-tracking branch 'upstream/master'
      5d087b4 [nate.crosswhite] Adding KMeans train with seed and Scala unit test
      616d111 [nate.crosswhite] Merge remote-tracking branch 'upstream/master'
      35c1884 [nate.crosswhite] Add kmeans initial seed to pyspark API
      7450a992
    • [MLlib] [SPARK-5301] Missing conversions and operations on IndexedRowMatrix and CoordinateMatrix · aa1e22b1
      Reza Zadeh authored
      * Transpose is missing from CoordinateMatrix (this is cheap to compute, so it should be there).
      * IndexedRowMatrix should be convertible to CoordinateMatrix (conversion added; see the sketch after this entry).
      
      Tests for both added.
      
      Author: Reza Zadeh <reza@databricks.com>
      
      Closes #4089 from rezazadeh/matutils and squashes the following commits:
      
      ec5238b [Reza Zadeh] Array -> Iterator to avoid temp array
      3ce0b5d [Reza Zadeh] Array -> Iterator
      bbc907a [Reza Zadeh] Use 'i' for index, and zipWithIndex
      cb10ae5 [Reza Zadeh] remove unnecessary import
      a7ae048 [Reza Zadeh] Missing linear algebra utilities
      aa1e22b1
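      A short usage sketch of the additions described above, assuming the `toCoordinateMatrix()` and `transpose()` methods behave as the commit message states:

      ```scala
      import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRowMatrix}

      object MatrixConversionSketch {
        // Convert an IndexedRowMatrix to a CoordinateMatrix, then take the cheap transpose.
        def transposeViaCoordinates(m: IndexedRowMatrix): CoordinateMatrix =
          m.toCoordinateMatrix().transpose()
      }
      ```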
    • SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator · 2eeada37
      Sandy Ryza authored
      
      The goal of this PR is to simplify YarnAllocator as much as possible and get it up to the level of code quality we see in the rest of Spark.
      
      In service of this, it does a few things:
      * Uses AMRMClient APIs for matching containers to requests.
      * Adds calls to AMRMClient.removeContainerRequest so that, when we use a container, we don't end up requesting it again.
      * Removes YarnAllocator's host->rack cache. YARN's RackResolver already does this caching, so this is redundant.
      * Adds tests for basic YarnAllocator functionality.
      * Breaks up the allocateResources method, which was previously nearly 300 lines.
      * A little bit of stylistic cleanup.
      * Fixes a bug that causes three times the requests to be filed when preferred host locations are given.
      
      The patch is lossy. In particular, it loses the logic for trying to avoid containers bunching up on nodes. As I understand it, the logic that's gone is:
      
      * If, in a single response from the RM, we receive a set of containers on a node, and prefer some number of containers on that node greater than 0 but less than the number we received, give back the delta between what we preferred and what we received.
      
      This seems like a weird way to avoid bunching; e.g., it does nothing to avoid bunching when we don't request containers on particular nodes.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3765 from sryza/sandy-spark-1714 and squashes the following commits:
      
      32a5942 [Sandy Ryza] Muffle RackResolver logs
      74f56dd [Sandy Ryza] Fix a couple comments and simplify requestTotalExecutors
      60ea4bd [Sandy Ryza] Fix scalastyle
      ca35b53 [Sandy Ryza] Simplify further
      e9cf8a6 [Sandy Ryza] Fix YarnClusterSuite
      257acf3 [Sandy Ryza] Remove locality stuff and more cleanup
      59a3c5e [Sandy Ryza] Take out rack stuff
      5f72fd5 [Sandy Ryza] Further documentation and cleanup
      89edd68 [Sandy Ryza] SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator
      2eeada37
    • [SPARK-5336][YARN] spark.executor.cores must not be less than spark.task.cpus · 8c06a5fa
      WangTao authored
      https://issues.apache.org/jira/browse/SPARK-5336
      
      Author: WangTao <barneystinson@aliyun.com>
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #4123 from WangTaoTheTonic/SPARK-5336 and squashes the following commits:
      
      6c9676a [WangTao] Update ClientArguments.scala
      9632d3a [WangTaoTheTonic] minor comment fix
      d03d6fa [WangTaoTheTonic] import ordering should be alphabetical'
      3112af9 [WangTao] spark.executor.cores must not be less than spark.task.cpus
      8c06a5fa
    • [SPARK-5297][Streaming] Fix Java file stream type erasure problem · 424d8c6f
      jerryshao authored
      The current Java file stream doesn't support custom key/value types because of the loss of type information; details can be seen in [SPARK-5297](https://issues.apache.org/jira/browse/SPARK-5297). Fix this problem by getting the correct `ClassTag` from the `Class[_]` (see the sketch after this entry).
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #4101 from jerryshao/SPARK-5297 and squashes the following commits:
      
      e022ca3 [jerryshao] Add Mima exclusion
      ecd61b8 [jerryshao] Fix Java fileInputStream type erasure problem
      424d8c6f
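      A minimal sketch of the general technique named above (recovering a `ClassTag` from a `java.lang.Class` so key/value types survive erasure); it is not the Streaming API itself:

      ```scala
      import scala.reflect.ClassTag

      object ClassTagSketch {
        def classTagOf[T](clazz: Class[T]): ClassTag[T] = ClassTag[T](clazz)

        // Example: the tag carries the runtime class that plain generics would erase.
        val tag: ClassTag[String] = classTagOf(classOf[String])  // tag.runtimeClass == classOf[String]
      }
      ```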
    • [HOTFIX] Update pom.xml to pull MapR's Hadoop version 2.4.1. · ec5b0f2c
      Kannan Rajah authored
      Author: Kannan Rajah <rkannan82@gmail.com>
      
      Closes #4108 from rkannan82/master and squashes the following commits:
      
      eca095b [Kannan Rajah] Update pom.xml to pull MapR's Hadoop version 2.4.1.
      ec5b0f2c
    • [SPARK-5275] [Streaming] include python source code · bad6c572
      Davies Liu authored
      Include the python source code into assembly jar.
      
      cc mengxr pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4128 from davies/build_streaming2 and squashes the following commits:
      
      546af4c [Davies Liu] fix indent
      48859b2 [Davies Liu] include python source code
      bad6c572
  3. Jan 20, 2015
    • [SPARK-5294][WebUI] Hide tables in AllStagePages for "Active Stages, Completed Stages and Failed Stages" when they are empty · 9a151ce5
      Kousuke Saruta authored
      
      Related to SPARK-5228 and #4028, `AllStagesPage` should also hide the tables for `ActiveStages`, `CompleteStages` and `FailedStages` when they are empty.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4083 from sarutak/SPARK-5294 and squashes the following commits:
      
      a7625c1 [Kousuke Saruta] Fixed conflicts
      9a151ce5
    • [SPARK-5186] [MLLIB] Vector.equals and Vector.hashCode are very inefficient · 2f82c841
      Yuhao Yang authored
      JIRA Issue: https://issues.apache.org/jira/browse/SPARK-5186
      
      Currently SparseVector uses the equals inherited from Vector, which creates a full-size array even for a sparse vector. The pull request contains a specialized equals optimization that improves both time and space (a simplified sketch follows this entry).
      
      1. The implementation is consistent with the original; in particular, it keeps equality comparison between SparseVector and DenseVector working.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      Author: Yuhao Yang <yuhao@yuhaodevbox.sh.intel.com>
      
      Closes #3997 from hhbyyh/master and squashes the following commits:
      
      0d9d130 [Yuhao Yang] function name change and ut update
      93f0d46 [Yuhao Yang] unify sparse vs dense vectors
      985e160 [Yuhao Yang] improve locality for equals
      bdf8789 [Yuhao Yang] improve equals and rewrite hashCode for Vector
      a6952c3 [Yuhao Yang] fix scala style for comments
      50abef3 [Yuhao Yang] fix ut for sparse vector with explicit 0
      f41b135 [Yuhao Yang] iterative equals for sparse vector
      5741144 [Yuhao Yang] Specialized equals for SparseVector
      2f82c841
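      A simplified, hedged sketch of the idea (not the PR's implementation): compare a sparse vector to another vector without materializing a dense copy, assuming `indices` is sorted ascending:

      ```scala
      object SparseEqualsSketch {
        def sparseEquals(indices: Array[Int], values: Array[Double], size: Int,
                         other: Int => Double, otherSize: Int): Boolean = {
          if (size != otherSize) return false
          var i = 0  // cursor into the stored (non-zero) entries
          var j = 0  // position in the logical vector
          while (j < size) {
            val v = if (i < indices.length && indices(i) == j) { val x = values(i); i += 1; x } else 0.0
            if (v != other(j)) return false
            j += 1
          }
          true
        }
      }
      ```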
    • [SPARK-5323][SQL] Remove Row's Seq inheritance. · d181c2a1
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4115 from rxin/row-seq and squashes the following commits:
      
      e33abd8 [Reynold Xin] Fixed compilation error.
      cceb650 [Reynold Xin] Python test fixes, and removal of WrapDynamic.
      0334a52 [Reynold Xin] mkString.
      9cdeb7d [Reynold Xin] Hive tests.
      15681c2 [Reynold Xin] Fix more test cases.
      ea9023a [Reynold Xin] Fixed a catalyst test.
      c5e2cb5 [Reynold Xin] Minor patch up.
      b9cab7c [Reynold Xin] [SPARK-5323][SQL] Remove Row's Seq inheritance.
      d181c2a1
    • [SPARK-5287][SQL] Add defaultSizeOf to every data type. · bc20a52b
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5287
      
      This PR only adds `defaultSizeOf` to data types and makes those internal type classes `protected[sql]`. I will use another PR to clean up the type hierarchy of data types. (A standalone sketch of the pattern follows this entry.)
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4081 from yhuai/SPARK-5287 and squashes the following commits:
      
      90cec75 [Yin Huai] Update unit test.
      e1c600c [Yin Huai] Make internal classes protected[sql].
      7eaba68 [Yin Huai] Add `defaultSize` method to data types.
      fd425e0 [Yin Huai] Add all native types to NativeType.defaultSizeOf.
      bc20a52b
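      Not Spark's code: a standalone sketch of the pattern described above, where every data type reports a default size estimate that the planner can use:

      ```scala
      sealed trait SimpleDataType { def defaultSize: Int }

      case object IntType    extends SimpleDataType { val defaultSize = 4 }
      case object DoubleType extends SimpleDataType { val defaultSize = 8 }

      case class SimpleArrayType(elementType: SimpleDataType) extends SimpleDataType {
        // Assume a nominal 100 elements when nothing better is known (illustrative choice).
        def defaultSize: Int = 100 * elementType.defaultSize
      }
      ```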
    • SPARK-5019 [MLlib] - GaussianMixtureModel exposes instances of MultivariateGauss... · 23e25543
      Travis Galoppo authored
      This PR modifies GaussianMixtureModel to expose instances of MultivariateGaussian rather than separate mean and covariance arrays.
      
      Author: Travis Galoppo <tjg2107@columbia.edu>
      
      Closes #4088 from tgaloppo/spark-5019 and squashes the following commits:
      
      3ef6c7f [Travis Galoppo] In GaussianMixtureModel: Changed name of weight, gaussian to weights, gaussians.  Other sources modified accordingly.
      091e8da [Travis Galoppo] SPARK-5019 - GaussianMixtureModel exposes instances of MultivariateGaussian rather than mean/covariance matrices
      23e25543
    • [SPARK-5329][WebUI] UIWorkloadGenerator should stop SparkContext. · 769aced9
      Kousuke Saruta authored
      UIWorkloadGenerator doesn't stop SparkContext. I ran UIWorkloadGenerator and tried to watch the result in the WebUI, but the jobs are marked as finished.
      That's because SparkContext is not stopped.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4112 from sarutak/SPARK-5329 and squashes the following commits:
      
      bcc0fa9 [Kousuke Saruta] Disabled scalastyle for a bock comment
      86a3b95 [Kousuke Saruta] Fixed UIWorkloadGenerator to stop SparkContext in it
      769aced9
    • SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840 by Piotr Kolaczkowski) · c93a57f0
      Jacek Lewandowski authored
      
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #4113 from jacek-lewandowski/SPARK-4660-master and squashes the following commits:
      
      a5e84ca [Jacek Lewandowski] SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840 by Piotr Kolaczkowski)
      c93a57f0
    • [SQL][Minor] Refactors deeply nested FP style code in BooleanSimplification · 81408027
      Cheng Lian authored
      This is a follow-up of #4090. The original deeply nested `reduceOption` code is hard to grasp.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4091 from liancheng/refactor-boolean-simplification and squashes the following commits:
      
      cd8860b [Cheng Lian] Improves `compareConditions` to handle more subtle cases
      1bf3258 [Cheng Lian] Avoids converting predicate sets to lists
      e833ca4 [Cheng Lian] Refactors deeply nested FP style code
      81408027
    • [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException · 9d9294ae
      Jongyoul Lee authored
      - Rewind ByteBuffer before making ByteString
      
      (This fixes a bug introduced in #3849 / SPARK-4014)
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #4119 from jongyoul/SPARK-5333 and squashes the following commits:
      
      c6693a8 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - changed logDebug location
      4141f58 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Added license information
      2190606 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Adjusted imported libraries
      b7f5517 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Rewind ByteBuffer before making ByteString
      9d9294ae
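      A hedged sketch of the underlying fix described in the entry above (rewinding a `ByteBuffer` before reading it), using plain NIO rather than the actual ByteString conversion:

      ```scala
      import java.nio.ByteBuffer

      object RewindSketch {
        // Without rewind(), reading starts at the buffer's current position (often the
        // end after writes), which is how a BufferUnderflowException can arise.
        def toBytes(buffer: ByteBuffer): Array[Byte] = {
          buffer.rewind()
          val bytes = new Array[Byte](buffer.remaining())
          buffer.get(bytes)
          bytes
        }
      }
      ```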
    • [SPARK-4803] [streaming] Remove duplicate RegisterReceiver message · 4afad9c7
      Ilayaperumal Gopinathan authored
        - The ReceiverTracker receives `RegisterReceiver` messages twice:
           1) when the actor in `ReceiverSupervisorImpl`'s preStart is invoked
           2) after the receiver is started at the executor, in `onReceiverStart()` at `ReceiverSupervisorImpl`
      
      Though the RegisterReceiver message uses the same streamId and the receiverInfo gets updated every time
      the message is processed at the `ReceiverTracker`, it makes sense to register the receiver only after the
      receiver is started.
      
      Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io>
      
      Closes #3648 from ilayaperumalg/RTActor-remove-prestart and squashes the following commits:
      
      868efab [Ilayaperumal Gopinathan] Increase receiverInfo collector timeout to 2 secs
      3118e5e [Ilayaperumal Gopinathan] Fix StreamingListenerSuite's startedReceiverStreamIds size
      634abde [Ilayaperumal Gopinathan] Remove duplicate RegisterReceiver message
      4afad9c7
    • [SQL][minor] Add a log4j file for catalyst test. · debc0319
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4117 from rxin/catalyst-test-log4j and squashes the following commits:
      
      8ad610b [Reynold Xin] [SQL][minor] Add a log4j file for catalyst test.
      debc0319
    • SPARK-5270 [CORE] Provide isEmpty() function in RDD API · 306ff187
      Sean Owen authored
      Pretty minor, but submitted for consideration -- this would at least help people make this check in the most efficient way I know.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4074 from srowen/SPARK-5270 and squashes the following commits:
      
      66885b8 [Sean Owen] Add note that JavaRDDLike should not be implemented by user code
      2e9b490 [Sean Owen] More tests, and Mima-exclude the new isEmpty method in JavaRDDLike
      28395ff [Sean Owen] Add isEmpty to Java, Python
      7dd04b7 [Sean Owen] Add efficient RDD.isEmpty()
      306ff187
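      A hedged sketch of an efficient emptiness check along the lines described in the entry above: look at the partition count and at most one element instead of counting everything:

      ```scala
      import org.apache.spark.rdd.RDD

      object IsEmptySketch {
        def isEmpty[T](rdd: RDD[T]): Boolean =
          rdd.partitions.length == 0 || rdd.take(1).isEmpty
      }
      ```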
  4. Jan 19, 2015
    • [SPARK-5214][Core] Add EventLoop and change DAGScheduler to an EventLoop · e69fb8c7
      zsxwing authored
      This PR adds a simple `EventLoop` and uses it to replace the Actor in DAGScheduler. `EventLoop` is a general class that supports posting events from multiple threads while handling them in a single event thread (a minimal sketch follows this entry).
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #4016 from zsxwing/event-loop and squashes the following commits:
      
      aefa1ce [zsxwing] Add protected to on*** methods
      5cfac83 [zsxwing] Remove null check of eventProcessLoop
      dba35b2 [zsxwing] Add a test that onReceive swallows InterruptException
      460f7b3 [zsxwing] Use volatile instead of Atomic things in unit tests
      227bf33 [zsxwing] Add a stop flag and some tests
      37f79c6 [zsxwing] Fix docs
      55fb6f6 [zsxwing] Add private[spark] to EventLoop
      1f73eac [zsxwing] Fix the import order
      3b2e59c [zsxwing] Add EventLoop and change DAGScheduler to an EventLoop
      e69fb8c7
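      A minimal sketch (not Spark's `EventLoop`) of the pattern described above: producers post events from any thread, and a single dedicated thread handles them in order:

      ```scala
      import java.util.concurrent.LinkedBlockingQueue

      abstract class SimpleEventLoop[E](name: String) {
        private val queue = new LinkedBlockingQueue[E]()
        @volatile private var stopped = false

        private val eventThread = new Thread(name) {
          setDaemon(true)
          override def run(): Unit =
            try {
              while (!stopped) onReceive(queue.take())
            } catch {
              case _: InterruptedException => // exit when stop() interrupts the blocking take()
            }
        }

        def start(): Unit = eventThread.start()
        def stop(): Unit = { stopped = true; eventThread.interrupt() }
        def post(event: E): Unit = queue.put(event)

        /** Handles one event; runs only on the single event thread. */
        protected def onReceive(event: E): Unit
      }
      ```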
    • [SPARK-4504][Examples] fix run-example failure if multiple assembly jars exist · 74de94ea
      Venkata Ramana Gollamudi authored
      Fix run-example script to fail fast with useful error message if multiple
      example assembly JARs are present.
      
      Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
      
      Closes #3377 from gvramana/run-example_fails and squashes the following commits:
      
      fa7f481 [Venkata Ramana Gollamudi] Fixed review comments, avoiding ls output scanning.
      6aa1ab7 [Venkata Ramana Gollamudi] Fix run-examples script error during multiple jars
      74de94ea
    • [SPARK-5286][SQL] Fail to drop an invalid table when using the data source API · 2604bc35
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5286
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4076 from yhuai/SPARK-5286 and squashes the following commits:
      
      6b69ed1 [Yin Huai] Catch all exception when we try to uncache a query.
      2604bc35
    • [SPARK-5284][SQL] Insert into Hive throws NPE when an inner complex type field has a null value · cd5da428
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5284
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4077 from yhuai/SPARK-5284 and squashes the following commits:
      
      fceacd6 [Yin Huai] Check if a value is null when the field has a complex type.
      cd5da428
    • [SPARK-5282][mllib]: RowMatrix easily gets int overflow in the memory size warning · 4432568a
      Yuhao Yang authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5282
      
      Fix the possible int overflow in the memory computation warning (a small arithmetic illustration follows this entry).
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #4069 from hhbyyh/addscStop and squashes the following commits:
      
      e54e5c8 [Yuhao Yang] change to MB based number
      7afac23 [Yuhao Yang] 5282: fix int overflow in the warning
      4432568a
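      A small arithmetic illustration of the kind of overflow fixed above (the sizes are made up; 8 bytes per Double entry):

      ```scala
      object OverflowSketch {
        val rows = 100000
        val cols = 30000

        val wrong = rows * cols * 8          // Int arithmetic overflows before widening
        val right = rows.toLong * cols * 8L  // promote to Long first, then multiply
      }
      ```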
    • MAINTENANCE: Automated closing of pull requests. · 1ac1c1dc
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #3584 (close requested by 'pwendell')
      Closes #2433 (close requested by 'pwendell')
      Closes #1697 (close requested by 'pwendell')
      Closes #4042 (close requested by 'pwendell')
      Closes #3723 (close requested by 'pwendell')
      Closes #1560 (close requested by 'pwendell')
      Closes #3515 (close requested by 'pwendell')
      Closes #1386 (close requested by 'pwendell')
      1ac1c1dc
    • [SPARK-5088] Use spark-class for running executors directly · 4a4f9ccb
      Jongyoul Lee authored
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3897 from jongyoul/SPARK-5088 and squashes the following commits:
      
      8232aa8 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Added a listenerBus for fixing test cases
      932289f [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Rebased from master
      613cb47 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Fixed code if spark.executor.uri doesn't have any value - Added test cases
      ff57bda [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Adjusted orders of import
      97e4bd4 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Changed command for using spark-class directly - Delete sbin/spark-executor and moved some codes into spark-class' case statement
      4a4f9ccb
    • [SPARK-3288] All fields in TaskMetrics should be private and use getters/setters · 3453d578
      Ilya Ganelin authored
      I've updated the fields and all usages of these fields in the Spark code. I've verified that this did not break anything on my local repo. (A standalone sketch of the pattern follows this entry.)
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #4020 from ilganeli/SPARK-3288 and squashes the following commits:
      
      39f3810 [Ilya Ganelin] resolved merge issues
      e446287 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
      b8c05cb [Ilya Ganelin] Missed making a variable private
      6444391 [Ilya Ganelin] Made inc/dec functions private[spark]
      1149e78 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
      26b312b [Ilya Ganelin] Debugging tests
      17146c2 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
      5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private
      c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private
      3453d578
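      Not Spark's class: a minimal sketch of the encapsulation pattern named in this entry, with a private field, a public getter, and package-private mutators (the package is hypothetical, used only so `private[spark]` resolves):

      ```scala
      package org.apache.spark.sketch

      class SimpleTaskMetrics {
        private var _bytesRead: Long = 0L

        def bytesRead: Long = _bytesRead

        private[spark] def incBytesRead(delta: Long): Unit = { _bytesRead += delta }
        private[spark] def setBytesRead(value: Long): Unit = { _bytesRead = value }
      }
      ```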