  1. Dec 30, 2014
    • Josh Rosen's avatar
      [SPARK-1010] Clean up uses of System.setProperty in unit tests · 352ed6bb
      Josh Rosen authored
      Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures.
      This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself).
      For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class, which snapshots the system properties before each test and automatically restores them when the test completes or fails. See the block comment at the top of the ResetSystemProperties class for more details.
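The snapshot-and-restore idea can be sketched in plain Scala without a ScalaTest dependency. `SysPropsSketch`, `withRestoredSystemProperties` and `withSystemProperty` are hypothetical names for this sketch, not the actual signatures of Spark's `ResetSystemProperties` trait or `TestUtils` helper; a ScalaTest mixin would run the same snapshot and restore logic from its before/after hooks.

```scala
import java.util.Properties

object SysPropsSketch {
  /** Runs `body`, then restores all system properties to their prior values. */
  def withRestoredSystemProperties[T](body: => T): T = {
    // Copy the entries: holding a reference to System.getProperties would
    // also see the mutations made by `body`.
    val snapshot = new Properties()
    snapshot.putAll(System.getProperties)
    try body
    finally System.setProperties(snapshot)
  }

  /** Sets one property while `body` runs, then restores or clears it. */
  def withSystemProperty[T](key: String, value: String)(body: => T): T = {
    val previous = Option(System.getProperty(key))
    System.setProperty(key, value)
    try body
    finally {
      previous match {
        case Some(v) => System.setProperty(key, v)
        case None    => System.clearProperty(key)
      }
    }
  }
}
```

Either helper makes cleanup happen even when the test body throws, which is what removes the ordering dependency between tests.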
      Author: Josh Rosen <>
      Closes #3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits:
      0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools
      3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext
      4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties
      4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering.
      0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite.
      7a3d224 [Josh Rosen] Fix trait ordering
      3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite
      bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite
      655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite
      3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite
      cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite
      8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait.
      633a84a [Josh Rosen] Remove use of system properties in FileServerSuite
      25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite
      1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite
      dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite
      b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite
      e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite
      5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite
      0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite
      c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite
      51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite
      60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite
      14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite
      628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite
      9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite.
      4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class.
    • Liu Jiongzhou's avatar
      [SPARK-4998][MLlib]delete the "train" function · 035bac88
      Liu Jiongzhou authored
      This makes the functions with the same name in `object DecisionTree` effective, especially when they are called through Java reflection.
      The `train` function defined in `class DecisionTree` hides the functions with the same name in `object DecisionTree`.
      Author: Liu Jiongzhou <>
      Closes #3836 from ljzzju/master and squashes the following commits:
      4e13133 [Liu Jiongzhou] [MLlib]delete the "train" function
    • zsxwing's avatar
      [SPARK-4813][Streaming] Fix the issue that ContextWaiter didn't handle 'spurious wakeup' · 6a897829
      zsxwing authored
      Used `Condition` to rewrite `ContextWaiter` because it provides a convenient API `awaitNanos` for timeout.
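A minimal sketch of the pattern, assuming a stop flag and an error slot guarded by one lock. `StopWaiterSketch` and its methods are hypothetical names, not the actual `ContextWaiter` code. The key points are that the wait condition is re-checked in a loop, so a spurious wakeup just waits again, and that `awaitNanos` returns the time still remaining, so the loop does not restart the full timeout.

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

class StopWaiterSketch {
  private val lock = new ReentrantLock()
  private val condition = lock.newCondition()
  // Both fields are only read or written while holding `lock`.
  private var stopped = false
  private var error: Throwable = null

  def notifyStop(): Unit = {
    lock.lock()
    try { stopped = true; condition.signalAll() } finally lock.unlock()
  }

  def notifyError(e: Throwable): Unit = {
    lock.lock()
    try { error = e; condition.signalAll() } finally lock.unlock()
  }

  /** Returns true if stopped, false on timeout, and rethrows a reported error.
    * A negative timeout waits indefinitely. */
  def waitForStopOrError(timeoutMs: Long = -1): Boolean = {
    lock.lock()
    try {
      if (timeoutMs < 0) {
        while (!stopped && error == null) condition.await()
      } else {
        var nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs)
        while (!stopped && error == null && nanos > 0) nanos = condition.awaitNanos(nanos)
      }
      if (error != null) throw error
      stopped
    } finally lock.unlock()
  }
}
```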
      Author: zsxwing <>
      Closes #3661 from zsxwing/SPARK-4813 and squashes the following commits:
      52247f5 [zsxwing] Add explicit unit type
      be42bcf [zsxwing] Update as per review suggestion
      e06bd4f [zsxwing] Fix the issue that ContextWaiter didn't handle 'spurious wakeup'
    • Jakub Dubovsky's avatar
      [Spark-4995] Replace Vector.toBreeze.activeIterator with foreachActive · 0f31992c
      Jakub Dubovsky authored
      The `foreachActive` method on `Vector` was introduced by SPARK-4431 as a more efficient alternative to `vector.toBreeze.activeIterator`. This patch replaces the remaining uses of `activeIterator` in the codebase.
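The difference is that `foreachActive` pushes each stored (index, value) pair into a callback instead of allocating an iterator and Breeze wrapper. The sketch below uses a hypothetical `SparseVec` class, not Spark's `Vector`, and formats one LibSVM line (`label index:value ...` with 1-based indices, the format `MLUtils.saveAsLibSVMFile` writes) in that callback style.

```scala
/** Hypothetical sparse vector; not Spark's mllib.linalg.Vector. */
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  /** Applies `f` to every stored (index, value) pair without creating an iterator. */
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < indices.length) {
      f(indices(k), values(k))
      k += 1
    }
  }
}

object LibSVMSketch {
  /** Formats a labeled point as a LibSVM line; feature indices become 1-based. */
  def toLibSVMLine(label: Double, features: SparseVec): String = {
    val sb = new StringBuilder(label.toString)
    features.foreachActive { (i, v) =>
      sb.append(' ').append(i + 1).append(':').append(v)
    }
    sb.toString
  }
}
```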
      Author: Jakub Dubovsky <>
      Closes #3846 from james64/SPARK-4995-foreachActive and squashes the following commits:
      3eb7e37 [Jakub Dubovsky] Scalastyle fix
      32fe6c6 [Jakub Dubovsky] activeIterator removed - IndexedRowMatrix.toBreeze
      47a4777 [Jakub Dubovsky] activeIterator removed in RowMatrix.toBreeze
      90a7d98 [Jakub Dubovsky] activeIterator removed in MLUtils.saveAsLibSVMFile
    • Sean Owen's avatar
      SPARK-3955 part 2 [CORE] [HOTFIX] Different versions between... · b239ea1c
      Sean Owen authored
      SPARK-3955 part 2 [CORE] [HOTFIX] Different versions between jackson-mapper-asl and jackson-core-asl
      pwendell didn't actually add a reference to `jackson-core-asl` as intended, but a second redundant reference to `jackson-mapper-asl`, as markhamstra picked up on (  This just rectifies the typo. I missed it as well; the original PR had it correct and I also didn't see the problem.
      Author: Sean Owen <>
      Closes #3829 from srowen/SPARK-3955 and squashes the following commits:
      6cfdc4e [Sean Owen] Actually refer to jackson-core-asl
    • wangxiaojing's avatar
      [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash · 07fa1910
      wangxiaojing authored
      JIRA issue: [SPARK-4570](
      This PR creates a `BroadcastLeftSemiJoinHash` operator to implement the broadcast join for `left semijoin`.
      In a left semijoin:
      If the size of the data on the right side is smaller than the user-settable threshold `AUTO_BROADCASTJOIN_THRESHOLD`,
      the planner marks it as the `broadcast` relation and marks the other relation as the stream side. The broadcast table is sent to all of the executors involved in the join as an `org.apache.spark.broadcast.Broadcast` object, and the join uses `joins.BroadcastLeftSemiJoinHash`; otherwise it uses `joins.LeftSemiJoinHash`.
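The core of a broadcast hash left semi join can be sketched with plain Scala collections: build a hash set from the small side once, then stream the large side and keep rows whose key is present. `SemiJoinSketch` and `chooseJoin` are hypothetical names; in Spark the threshold is the `spark.sql.autoBroadcastJoinThreshold` setting and the size comes from plan statistics, which this sketch simply takes as a parameter.

```scala
object SemiJoinSketch {
  /** Keeps each streamed left row whose key appears on the (small) right side.
    * A left row is emitted at most once, however many right rows share its key. */
  def broadcastLeftSemiJoin[K, L](left: Iterator[(K, L)], rightKeys: Iterable[K]): Iterator[(K, L)] = {
    val buildSide: Set[K] = rightKeys.toSet // stands in for the broadcast hash set
    left.filter { case (k, _) => buildSide.contains(k) }
  }

  /** Planner choice as described above: broadcast only when the right side is smaller than the threshold. */
  def chooseJoin(rightSizeInBytes: Long, thresholdBytes: Long): String =
    if (rightSizeInBytes < thresholdBytes) "BroadcastLeftSemiJoinHash" else "LeftSemiJoinHash"
}
```

Because only a membership test is needed, the build side stores keys alone, which is why semi joins are cheap to broadcast.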
      The benchmark suggests this change makes `left semi join` more than 4x faster:
      left semi join (before): 9288 ms
      left semi join (after): 1963 ms
      The micro-benchmark loads `data1/kv3.txt` into a normal Hive table.
      Benchmark code:
       import org.apache.spark.{SparkConf, SparkContext}
       import org.apache.spark.sql.hive.HiveContext
       // Runs f once and returns the elapsed wall-clock time in milliseconds.
       def benchmark(f: => Unit): Long = {
         val begin = System.currentTimeMillis()
         f
         val end = System.currentTimeMillis()
         end - begin
       }
       // Master and app name are assumed to come from spark-submit;
       // the original SparkConf settings were not preserved.
       val sc = new SparkContext(
         new SparkConf())
       val hiveContext = new HiveContext(sc)
       import hiveContext._
       sql("drop table if exists left_table")
       sql("drop table if exists right_table")
       sql("""create table left_table (key int, value string)""")
       sql(s"""load data local inpath "/data1/kv3.txt" into table left_table""")
       sql("""create table right_table (key int, value string)""")
       sql(
         """
           |from left_table
           |insert overwrite table right_table
           |select left_table.key, left_table.value
         """.stripMargin)
       val leftSemiJoin = sql(
         """select a.key from left_table a
           |left semi join right_table b on a.key = b.key""".stripMargin)
       val leftSemiJoinDuration = benchmark(leftSemiJoin.count())
       println(s"left semi join : $leftSemiJoinDuration ms ")
      Author: wangxiaojing <>
      Closes #3442 from wangxiaojing/SPARK-4570 and squashes the following commits:
      a4a43c9 [wangxiaojing] rebase
      f103983 [wangxiaojing] change style
      fbe4887 [wangxiaojing] change style
      ff2e618 [wangxiaojing] add testsuite
      1a8da2a [wangxiaojing] add BroadcastLeftSemiJoinHash
    • wangfei's avatar
      [SPARK-4935][SQL] When hive.cli.print.header configured, spark-sql aborted if... · 8f29b7ca
      wangfei authored
      [SPARK-4935][SQL] When hive.cli.print.header configured, spark-sql aborted if passed in a invalid sql
      If an invalid SQL statement such as `abdcdfsfs` was passed in, the spark-sql script aborted.
      Author: wangfei <>
      Author: Fei Wang <>
      Closes #3761 from scwf/patch-10 and squashes the following commits:
      46dc344 [Fei Wang] revert console.printError(rc.getErrorMessage())
      0330e07 [wangfei] avoid to print error message repeatedly
      1614a11 [wangfei] spark-sql abort when passed in a wrong sql
    • Michael Davies's avatar
      [SPARK-4386] Improve performance when writing Parquet files · 7425bec3
      Michael Davies authored
      Convert type of RowWriteSupport.attributes to Array.
      Analysis of write performance for very wide tables shows that time is spent predominantly in the `apply` method on the `attributes` var. The type of `attributes` was previously `LinearSeqOptimized`, whose `apply` is O(N), which made writing each row O(N^2).
      Measurements on a 575-column table showed a 6x improvement in write times with this change.
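On a linked sequence, `apply(i)` walks i cells, so indexing every attribute of a row costs 0 + 1 + ... + (N-1) = N(N-1)/2 steps, which is the O(N^2) above. Converting to an `Array` once makes each lookup O(1). A minimal sketch, where `RowWriteSketch.writeRow` is a hypothetical stand-in for `RowWriteSupport`:

```scala
object RowWriteSketch {
  /** Writes each field of `row` under its attribute name. `attributes` may be a
    * List; it is converted to an Array once so each lookup in the loop is O(1). */
  def writeRow(attributes: Seq[String], row: IndexedSeq[Any])(write: (String, Any) => Unit): Unit = {
    val attrs: Array[String] = attributes.toArray
    var i = 0
    while (i < attrs.length) {
      write(attrs(i), row(i))
      i += 1
    }
  }
}
```

In the real code the conversion happens once per write support rather than once per row; the sketch only shows the access pattern.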
      Author: Michael Davies <>
      Closes #3843 from MickDavies/SPARK-4386 and squashes the following commits:
      892519d [Michael Davies] [SPARK-4386] Improve performance when writing Parquet files
    • Cheng Lian's avatar
      [SPARK-4937][SQL] Normalizes conjunctions and disjunctions to eliminate common predicates · 61a99f6a
      Cheng Lian authored
      This PR is a simplified version of several filter optimization rules introduced in #3778 authored by scwf. Newly introduced optimizations include:
      1. `a && a` => `a`
      2. `a || a` => `a`
      3. `(a && b && c && ...) || (a && b && d && ...)` => `a && b && (c || d || ...)`
      The 3rd rule is particularly useful for optimizing the following query, which is planned into a cartesian product:
      SELECT *
        FROM t1, t2
       WHERE (t1.key = t2.key AND t1.value > 10)
          OR (t1.key = t2.key AND t2.value < 20)
      to the following one, which is planned into an equi-join:
      SELECT *
        FROM t1, t2
       WHERE t1.key = t2.key
         AND (t1.value > 10 OR t2.value < 20)
      The example above is quite artificial, but common predicates are likely to appear in complex real-life queries (like the one mentioned in #3778).
      A difference between this PR and #3778 is that these optimizations are not limited to `Filter`, but are generalized to all logical plan nodes. Thanks to scwf for bringing up these optimizations, and chenghao-intel for the generalization suggestion.
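A toy sketch of the three rules on a hypothetical boolean-expression ADT (`Pred`, `And`, `Or`), not Catalyst's expression classes or the actual `BooleanSimplification` rule. Rule 3 is implemented in the form the example query needs: conjuncts shared by both sides of a disjunction are factored out.

```scala
sealed trait Expr
final case class Pred(sql: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object BoolNormalizeSketch {
  private def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  private def andAll(es: Seq[Expr]): Expr = es.reduceLeft[Expr]((x, y) => And(x, y))

  /** (a && b && c) || (a && b && d)  =>  a && b && (c || d) */
  private def factorCommonConjuncts(l: Expr, r: Expr): Expr = {
    val (lc, rc) = (conjuncts(l), conjuncts(r))
    val common = lc.filter(rc.contains)
    if (common.isEmpty) Or(l, r)
    else {
      val (lRest, rRest) = (lc.filterNot(common.contains), rc.filterNot(common.contains))
      // Absorption: if one side is exactly the common part, the OR reduces to it.
      if (lRest.isEmpty || rRest.isEmpty) andAll(common)
      else And(andAll(common), Or(andAll(lRest), andAll(rRest)))
    }
  }

  def simplify(e: Expr): Expr = e match {
    case And(l, r) =>
      val (sl, sr) = (simplify(l), simplify(r))
      if (sl == sr) sl else And(sl, sr) // rule 1: a && a => a
    case Or(l, r) =>
      val (sl, sr) = (simplify(l), simplify(r))
      if (sl == sr) sl else factorCommonConjuncts(sl, sr) // rule 2, then rule 3
    case p => p
  }
}
```

Applied to the example, `(k && v1) || (k && v2)` becomes `k && (v1 || v2)`, exposing the equi-join key `k` as a top-level conjunct.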
      Author: Cheng Lian <>
      Closes #3784 from liancheng/normalize-filters and squashes the following commits:
      caca560 [Cheng Lian] Moves filter normalization into BooleanSimplification rule
      4ab3a58 [Cheng Lian] Fixes test failure, adds more tests
      5d54349 [Cheng Lian] Fixes typo in comment
      2abbf8e [Cheng Lian] Forgot our sacred Apache licence header...
      cf95639 [Cheng Lian] Adds an optimization rule for filter normalization
    • guowei2's avatar
      [SPARK-4928][SQL] Fix: Operator '>,<,>=,<=' with decimal between different precision report error · a75dd83b
      guowei2 authored
      When a comparison operator is applied to decimals of different precisions, both operands need to be converted to unlimited-precision decimals before comparing.
      Author: guowei2 <>
      Closes #3767 from guowei2/SPARK-4928 and squashes the following commits:
      c6a6e3e [guowei2] fix code style
      3214e0a [guowei2] add test case
      b4985a2 [guowei2] fix code style
      27adf42 [guowei2] Fix: Operation '>,<,>=,<=' with Decimal report error
    • luogankun's avatar
      [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE is eager · 2deac748
      luogankun authored
      `CACHE TABLE tbl` is now __eager__ by default, not __lazy__.
      Author: luogankun <>
      Closes #3773 from luogankun/SPARK-4930 and squashes the following commits:
      cc17b7d [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, add CACHE [LAZY] TABLE [AS SELECT] ...
      bffe0e8 [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE tbl is eager
    • luogankun's avatar
      [SPARK-4916][SQL][DOCS]Update SQL programming guide about cache section · f7a41a0e
      luogankun authored
      `SchemaRDD.cache()` now uses in-memory columnar storage.
      Author: luogankun <>
      Closes #3759 from luogankun/SPARK-4916 and squashes the following commits:
      7b39864 [luogankun] [SPARK-4916]Update SQL programming guide
      6018122 [luogankun] Merge branch 'master' of into SPARK-4916
      0b93785 [luogankun] [SPARK-4916]Update SQL programming guide
      99b2336 [luogankun] [SPARK-4916]Update SQL programming guide
    • Cheng Lian's avatar
      [SPARK-4493][SQL] Tests for IsNull / IsNotNull in the ParquetFilterSuite · 19a8802e
      Cheng Lian authored
      This is a follow-up of #3367 and #3644.
      At the time #3644 was written, #3367 hadn't been merged yet, so `IsNull` and `IsNotNull` filters were not covered in the first version of `ParquetFilterSuite`. This PR adds the corresponding test cases.
      Author: Cheng Lian <>
      Closes #3748 from liancheng/test-null-filters and squashes the following commits:
      1ab943f [Cheng Lian] IsNull and IsNotNull Parquet filter test case for boolean type
      bcd616b [Cheng Lian] Adds Parquet filter pushedown tests for IsNull and IsNotNull
    • Cheng Hao's avatar
      [Spark-4512] [SQL] Unresolved Attribute Exception in Sort By · 53f0a00b
      Cheng Hao authored
      An exception is thrown when running a query like:
      SELECT key+key FROM src sort by value;
      Author: Cheng Hao <>
      Closes #3386 from chenghao-intel/sort and squashes the following commits:
      38c78cc [Cheng Hao] revert the SortPartition in SparkStrategies
      7e9dd15 [Cheng Hao] update the typo
      fcd1d64 [Cheng Hao] rebase the latest master and update the SortBy unit test
    • wangfei's avatar
      [SPARK-5002][SQL] Using ascending by default when not specify order in order by · daac2213
      wangfei authored
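      Spark SQL did not support `SELECT a, b FROM testData2 ORDER BY a desc, b`, where no sort direction is given for `b`. This patch sorts in ascending order by default when no direction is specified.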
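The defaulting rule can be sketched outside Spark's SQL parser: each sort item is a column with an optional direction, and a missing direction means ascending. `OrderBySketch`, `Ascending` and `Descending` are hypothetical names for this illustration only.

```scala
sealed trait Direction
case object Ascending extends Direction
case object Descending extends Direction

object OrderBySketch {
  /** Parses "a desc, b" into (column, direction) pairs; no direction means ascending. */
  def parse(clause: String): Seq[(String, Direction)] =
    clause.split(",").toSeq.map(_.trim.split("\\s+")).map {
      case Array(col)                                   => (col, Ascending: Direction)
      case Array(col, d) if d.equalsIgnoreCase("asc")  => (col, Ascending: Direction)
      case Array(col, d) if d.equalsIgnoreCase("desc") => (col, Descending: Direction)
      case other => throw new IllegalArgumentException(s"bad sort item: ${other.mkString(" ")}")
    }

  /** Compares rows column by column, flipping the sign for descending columns. */
  def ordering(order: Seq[(String, Direction)]): Ordering[Map[String, Int]] =
    new Ordering[Map[String, Int]] {
      def compare(x: Map[String, Int], y: Map[String, Int]): Int =
        order.iterator.map { case (col, dir) =>
          val c = Integer.compare(x(col), y(col))
          if (dir == Descending) -c else c
        }.find(_ != 0).getOrElse(0)
    }
}
```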
      Author: wangfei <>
      Closes #3838 from scwf/orderby and squashes the following commits:
      114b64a [wangfei] remove nouse methods
      48145d3 [wangfei] fix order, using asc by default
    • Cheng Hao's avatar
      [SPARK-4904] [SQL] Remove the unnecessary code change in Generic UDF · 63b84b7d
      Cheng Hao authored
      Since #3429 has been merged, the bug of wrapping to Writable for HiveGenericUDF is resolved, and we can safely remove the foldable check in `HiveGenericUdf.eval`, which was discussed in #2802.
      Author: Cheng Hao <>
      Closes #3745 from chenghao-intel/generic_udf and squashes the following commits:
      622ad03 [Cheng Hao] Remove the unnecessary code change in Generic UDF
    • Cheng Hao's avatar
      [SPARK-4959] [SQL] Attributes are case sensitive when using a select query from a projection · 5595eaa7
      Cheng Hao authored
      Author: Cheng Hao <>
      Closes #3796 from chenghao-intel/spark_4959 and squashes the following commits:
      3ec08f8 [Cheng Hao] Replace the attribute in comparing its exprId other than itself
    • scwf's avatar
      [SPARK-4975][SQL] Fix HiveInspectorSuite test failure · 65357f11
      scwf authored
      HiveInspectorSuite test failure:
      [info] - wrap / unwrap null, constant null and writables *** FAILED *** (21 milliseconds)
      [info] 1 did not equal 0 (HiveInspectorSuite.scala:136)
      This is because the original date (3914-10-23) does not equal the date returned by `unwrap` (3914-10-22).
      Setting the TimeZone and Locale fixes this.
      Another minor change here renames `def checkValues(v1: Any, v2: Any): Unit` to `def checkValue(v1: Any, v2: Any): Unit` to make the code clearer.
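Why pinning the defaults helps: the same instant falls on different calendar days in different default time zones, so a conversion path that depends on the JVM default zone can be off by one day. The commit does not say which zone or locale the suite pins; `TimeZoneSketch` and its helpers are hypothetical, and this only shows how a test can fix and then restore both defaults.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

object TimeZoneSketch {
  /** Runs `body` with a fixed default time zone and locale, restoring both afterwards. */
  def withTimeZoneAndLocale[T](tz: TimeZone, locale: Locale)(body: => T): T = {
    val (oldTz, oldLocale) = (TimeZone.getDefault, Locale.getDefault)
    TimeZone.setDefault(tz)
    Locale.setDefault(locale)
    try body
    finally {
      TimeZone.setDefault(oldTz)
      Locale.setDefault(oldLocale)
    }
  }

  /** Formats an instant as a calendar day in the JVM's current default time zone. */
  def day(epochMillis: Long): String =
    new SimpleDateFormat("yyyy-MM-dd").format(new Date(epochMillis))
}
```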
      Author: scwf <>
      Author: Fei Wang <>
      Closes #3814 from scwf/fix-inspectorsuite and squashes the following commits:
      d8531ef [Fei Wang] Delete test.log
      72b19a9 [scwf] fix HiveInspectorSuite test error
    • Daoyuan Wang's avatar
      [SQL] enable view test · 94d60b70
      Daoyuan Wang authored
      This is a follow-up to #3396; it just adds a test to the whitelist.
      Author: Daoyuan Wang <>
      Closes #3826 from adrian-wang/viewtest and squashes the following commits:
      f105f68 [Daoyuan Wang] enable view test
    • Michael Armbrust's avatar
      [SPARK-4908][SQL] Prevent multiple concurrent hive native commands · 480bd1d2
      Michael Armbrust authored
      This is just a quick fix that locks when calling `runHive`.  If we can find a way to avoid the error without a global lock that would be better.
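A global lock simply serializes every caller on one monitor, so at most one native command runs at a time. A minimal sketch with a hypothetical `runNativeCommand` standing in for `runHive`; the counter only exists to make the serialization observable.

```scala
import java.util.concurrent.atomic.AtomicInteger

object HiveLockSketch {
  private val hiveLock = new Object
  private val active = new AtomicInteger(0)
  // Only updated while holding hiveLock.
  @volatile var maxObservedConcurrency = 0

  /** Hypothetical stand-in for runHive: all callers serialize on one global lock. */
  def runNativeCommand(cmd: String): String = hiveLock.synchronized {
    val now = active.incrementAndGet()
    maxObservedConcurrency = math.max(maxObservedConcurrency, now)
    Thread.sleep(5) // pretend to do work
    active.decrementAndGet()
    s"ran: $cmd"
  }
}
```

The cost is that unrelated commands also wait for each other, which is why the commit calls a lock-free fix preferable.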
      Author: Michael Armbrust <>
      Closes #3834 from marmbrus/hiveConcurrency and squashes the following commits:
      bf25300 [Michael Armbrust] prevent multiple concurrent hive native commands
    • Josh Rosen's avatar
      [SPARK-4882] Register PythonBroadcast with Kryo so that PySpark works with KryoSerializer · efa80a53
      Josh Rosen authored
      This PR fixes an issue where PySpark broadcast variables caused NullPointerExceptions if KryoSerializer was used.  The fix is to register PythonBroadcast with Kryo so that it's deserialized with a KryoJavaSerializer.
      Author: Josh Rosen <>
      Closes #3831 from JoshRosen/SPARK-4882 and squashes the following commits:
      0466c7a [Josh Rosen] Register PythonBroadcast with Kryo.
      d5b409f [Josh Rosen] Enable registrationRequired, which would have caught this bug.
      069d8a7 [Josh Rosen] Add failing test for SPARK-4882
    • Zhang, Liye's avatar
      [SPARK-4920][UI] add version on master and worker page for standalone mode · 9077e721
      Zhang, Liye authored
      Author: Zhang, Liye <>
      Closes #3769 from liyezhang556520/spark-4920_WebVersion and squashes the following commits:
      3bb7e0d [Zhang, Liye] add version on master and worker page
  2. Dec 29, 2014
    • DB Tsai's avatar
      [SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regression for... · 040d6f2d
      DB Tsai authored
      [SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regression for the change of LeastSquaresGradient
      In SPARK-4907, we added a factor of 2 into the LeastSquaresGradient. This PR updates the Scala doc for lasso and ridge regression accordingly.
      Author: DB Tsai <>
      Closes #3808 from dbtsai/doc and squashes the following commits:
      ec3c989 [DB Tsai] first commit
    • ganonp's avatar
      Added setMinCount to Word2Vec.scala · 343db392
      ganonp authored
      Wanted to customize the private `minCount` variable in the `Word2Vec` class, so added a `setMinCount` method to do so.
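`minCount` controls which words enter the vocabulary: words seen fewer than `minCount` times are dropped. The sketch below shows that filter behind a chainable setter; `VocabBuilderSketch` is hypothetical, not Spark's `Word2Vec`, and its default of 5 is an assumption for the sketch.

```scala
/** Hypothetical vocabulary builder with a configurable minimum word count. */
class VocabBuilderSketch {
  private var minCount: Int = 5 // assumed default for this sketch

  /** Sets the minimum number of occurrences needed for a word to be kept. */
  def setMinCount(minCount: Int): this.type = {
    require(minCount >= 0, s"minCount must be non-negative, got $minCount")
    this.minCount = minCount
    this
  }

  /** Counts words and keeps those seen at least minCount times, most frequent first. */
  def build(words: Seq[String]): Seq[(String, Int)] =
    words.groupBy(identity)
      .map { case (w, ws) => w -> ws.size }
      .filter(_._2 >= minCount)
      .toSeq
      .sortBy { case (w, c) => (-c, w) }
}
```

Returning `this.type` keeps the setter chainable, in the same builder style as the other MLlib setters.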
      Author: ganonp <>
      Closes #3693 from ganonp/my-custom-spark and squashes the following commits:
      ad534f2 [ganonp] made norm method public
      5110a6f [ganonp] Reorganized
      854958b [ganonp] Fixed Indentation for setMinCount
      12ed8f9 [ganonp] Update Word2Vec.scala
      76bdf5a [ganonp] Update Word2Vec.scala
      ffb88bb [ganonp] Update Word2Vec.scala
      5eb9100 [ganonp] Added setMinCount to Word2Vec.scala
    • Travis Galoppo's avatar
      SPARK-4156 [MLLIB] EM algorithm for GMMs · 6cf6fdf3
      Travis Galoppo authored
      Implementation of Expectation-Maximization for Gaussian Mixture Models.
      This is my maiden contribution to Apache Spark, so I apologize now if I have done anything incorrectly; having said that, this work is my own, and I offer it to the project under the project's open source license.
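EM for a Gaussian mixture alternates an E-step, which computes each point's responsibility r_ik = w_k N(x_i | mu_k, var_k) / sum_j w_j N(x_i | mu_j, var_j), and an M-step, which re-estimates each component's weight, mean and variance from those responsibilities. The sketch below is a one-dimensional, single-machine version written for illustration; the contributed implementation is multivariate and distributed (the squashed commits mention `RDD.aggregate`), and `Gmm1D` is a hypothetical name.

```scala
object Gmm1D {
  final case class Component(weight: Double, mean: Double, variance: Double)

  private def logDensity(x: Double, c: Component): Double =
    -0.5 * math.log(2 * math.Pi * c.variance) - (x - c.mean) * (x - c.mean) / (2 * c.variance)

  /** Runs `iters` EM iterations from the given starting components. */
  def fit(xs: Array[Double], init: Seq[Component], iters: Int): Seq[Component] = {
    var comps = init.toVector
    for (_ <- 0 until iters) {
      // E-step: responsibilities, normalized in log space to avoid underflow.
      val resp = xs.map { x =>
        val logs = comps.map(c => math.log(c.weight) + logDensity(x, c))
        val m = logs.max
        val norm = m + math.log(logs.map(l => math.exp(l - m)).sum)
        logs.map(l => math.exp(l - norm))
      }
      // M-step: weighted re-estimation of each component.
      comps = comps.indices.map { k =>
        val nk = resp.map(_(k)).sum
        val mean = xs.indices.map(i => resp(i)(k) * xs(i)).sum / nk
        val variance = xs.indices.map(i => resp(i)(k) * (xs(i) - mean) * (xs(i) - mean)).sum / nk
        Component(nk / xs.length, mean, math.max(variance, 1e-6)) // floor avoids collapse
      }.toVector
    }
    comps
  }
}
```

The sketch does not handle a component that receives no responsibility at all (`nk` of 0); a production version would need a convergence check and a policy for empty components.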
      Author: Travis Galoppo <>
      Author: Travis Galoppo <travis@localhost.localdomain>
      Author: tgaloppo <>
      Author: FlytxtRnD <>
      Closes #3022 from tgaloppo/master and squashes the following commits:
      aaa8f25 [Travis Galoppo] MLUtils: changed privacy of EPSILON from [util] to [mllib]
      709e4bf [Travis Galoppo] fixed usage line to include optional maxIterations parameter
      acf1fba [Travis Galoppo] Fixed parameter comment in GaussianMixtureModel Made maximum iterations an optional parameter to DenseGmmEM
      9b2fc2a [Travis Galoppo] Style improvements Changed ExpectationSum to a private class
      b97fe00 [Travis Galoppo] Minor fixes and tweaks.
      1de73f3 [Travis Galoppo] Removed redundant array from array creation
      578c2d1 [Travis Galoppo] Removed unused import
      227ad66 [Travis Galoppo] Moved prediction methods into model class.
      308c8ad [Travis Galoppo] Numerous changes to improve code
      cff73e0 [Travis Galoppo] Replaced accumulators with RDD.aggregate
      20ebca1 [Travis Galoppo] Removed unusued code
      42b2142 [Travis Galoppo] Added functionality to allow setting of GMM starting point. Added two cluster test to testing suite.
      8b633f3 [Travis Galoppo] Style issue
      9be2534 [Travis Galoppo] Style issue
      d695034 [Travis Galoppo] Fixed style issues
      c3b8ce0 [Travis Galoppo] Merge branch 'master' of   Adds predict() method
      2df336b [Travis Galoppo] Fixed style issue
      b99ecc4 [tgaloppo] Merge pull request #1 from FlytxtRnD/predictBranch
      f407b4c [FlytxtRnD] Added predict() to return the cluster labels and membership values
      97044cf [Travis Galoppo] Fixed style issues
      dc9c742 [Travis Galoppo] Moved MultivariateGaussian utility class
      e7d413b [Travis Galoppo] Moved multivariate Gaussian utility class to mllib/stat/impl Improved comments
      9770261 [Travis Galoppo] Corrected a variety of style and naming issues.
      8aaa17d [Travis Galoppo] Added additional train() method to companion object for cluster count and tolerance parameters.
      676e523 [Travis Galoppo] Fixed to no longer ignore delta value provided on command line
      e6ea805 [Travis Galoppo] Merged with master branch; update test suite with latest context changes. Improved cluster initialization strategy.
      86fb382 [Travis Galoppo] Merge remote-tracking branch 'upstream/master'
      719d8cc [Travis Galoppo] Added scala test suite with basic test
      c1a8e16 [Travis Galoppo] Made GaussianMixtureModel class serializable Modified sum function for better performance
      5c96c57 [Travis Galoppo] Merge remote-tracking branch 'upstream/master'
      c15405c [Travis Galoppo] SPARK-4156
    • Yash Datta's avatar
      SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions · 9bc0df68
      Yash Datta authored
      takeOrdered should skip the reduce step when the mapped RDD has no partitions. This prevents the following exception:
      4. run query
      SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
      Error trace
      java.lang.UnsupportedOperationException: empty collection
      at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
      at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
      at scala.Option.getOrElse(Option.scala:120)
      at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
      at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
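The failure comes from calling `reduce` on an empty collection, which has no identity element to return. A minimal sketch of the guard over plain Scala sequences standing in for partitions; `TakeOrderedSketch` is hypothetical, and it sorts each partition where Spark keeps a bounded priority queue per partition.

```scala
object TakeOrderedSketch {
  /** The `num` smallest elements across partitions: truncate each partition,
    * then merge. Returns empty instead of reducing an empty collection. */
  def takeOrdered[T](partitions: Seq[Seq[T]], num: Int)(implicit ord: Ordering[T]): Seq[T] = {
    val perPartition = partitions.map(_.sorted.take(num))
    if (perPartition.isEmpty) Seq.empty
    else perPartition.reduce((a, b) => (a ++ b).sorted.take(num))
  }
}
```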
      Author: Yash Datta <>
      Closes #3830 from saucam/fix_takeorder and squashes the following commits:
      5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
    • Burak Yavuz's avatar
      [SPARK-4409][MLlib] Additional Linear Algebra Utils · 02b55de3
      Burak Yavuz authored
      Addition of a very limited number of local matrix manipulation and generation methods that would be helpful in the further development of algorithms on top of BlockMatrix (SPARK-3974), such as Randomized SVD and Multi Model Training (SPARK-1486).
      The proposed methods for addition are:
      For `Matrix`
       - map: maps the values in the matrix with a given function. Produces a new matrix.
       - update: the values in the matrix are updated with a given function. Occurs in place.
      Factory methods for `DenseMatrix`:
       - *zeros: Generate a matrix consisting of zeros
       - *ones: Generate a matrix consisting of ones
       - *eye: Generate an identity matrix
       - *rand: Generate a matrix consisting of i.i.d. uniform random numbers
       - *randn: Generate a matrix consisting of i.i.d. gaussian random numbers
       - *diag: Generate a diagonal matrix from a supplied vector
      *These methods already exist in the factory methods for `Matrices`; however, for cases where we require a `DenseMatrix`, you constantly have to add `.asInstanceOf[DenseMatrix]` everywhere, which makes the code "dirtier". I propose moving these functions to factory methods for `DenseMatrix`, where the output will be a `DenseMatrix`, and having the factory methods for `Matrices` call these functions directly and output a generic `Matrix`.
      Factory methods for `SparseMatrix`:
       - speye: Identity matrix in sparse format. Saves a ton of memory when dimensions are large, especially in Multi Model Training, where each row requires being multiplied by a scalar.
       - sprand: Generate a sparse matrix with a given density consisting of i.i.d. uniform random numbers.
       - sprandn: Generate a sparse matrix with a given density consisting of i.i.d. gaussian random numbers.
       - diag: Generate a diagonal matrix from a supplied vector, but is memory efficient, because it just stores the diagonal. Again, very helpful in Multi Model Training.
      Factory methods for `Matrices`:
       - Include all the factory methods given above, but return a generic `Matrix` rather than `SparseMatrix` or `DenseMatrix`.
       - horzCat: Horizontally concatenate matrices to form one larger matrix. Very useful in both Multi Model Training, and for the repartitioning of BlockMatrix.
       - vertCat: Vertically concatenate matrices to form one larger matrix. Very useful for the repartitioning of BlockMatrix.
      The names for these methods were selected from MATLAB.
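With column-major storage, which Spark's local dense matrices use, `horzCat` is just concatenation of the value arrays, while `vertCat` must interleave rows column by column. The sketch uses a hypothetical `ColMajor` class, not Spark's `DenseMatrix` or `Matrices` API, to show `zeros`, `eye` and both concatenations.

```scala
/** Hypothetical column-major dense matrix: entry (i, j) is values(i + j * numRows). */
final case class ColMajor(numRows: Int, numCols: Int, values: Array[Double]) {
  require(values.length == numRows * numCols, "values must have numRows * numCols entries")
  def apply(i: Int, j: Int): Double = values(i + j * numRows)
}

object ColMajor {
  def zeros(m: Int, n: Int): ColMajor = ColMajor(m, n, new Array[Double](m * n))

  def eye(n: Int): ColMajor = {
    val z = zeros(n, n)
    for (i <- 0 until n) z.values(i + i * n) = 1.0
    z
  }

  /** Side by side: in column-major order the columns are simply appended. */
  def horzcat(ms: Seq[ColMajor]): ColMajor = {
    require(ms.nonEmpty && ms.map(_.numRows).distinct.size == 1, "row counts must match")
    ColMajor(ms.head.numRows, ms.map(_.numCols).sum, ms.flatMap(_.values.toSeq).toArray)
  }

  /** Stacked: each input's rows are copied into every column at a row offset. */
  def vertcat(ms: Seq[ColMajor]): ColMajor = {
    require(ms.nonEmpty && ms.map(_.numCols).distinct.size == 1, "column counts must match")
    val n = ms.head.numCols
    val m = ms.map(_.numRows).sum
    val out = new Array[Double](m * n)
    var rowOffset = 0
    for (mat <- ms) {
      for (j <- 0 until n; i <- 0 until mat.numRows) out((rowOffset + i) + j * m) = mat(i, j)
      rowOffset += mat.numRows
    }
    ColMajor(m, n, out)
  }
}
```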
      Author: Burak Yavuz <>
      Author: Xiangrui Meng <>
      Closes #3319 from brkyvz/SPARK-4409 and squashes the following commits:
      b0354f6 [Burak Yavuz] [SPARK-4409] Incorporated mengxr's code
      04c4829 [Burak Yavuz] Merge pull request #1 from mengxr/SPARK-4409
      80cfa29 [Xiangrui Meng] minor changes
      ecc937a [Xiangrui Meng] update sprand
      4e95e24 [Xiangrui Meng] simplify fromCOO implementation
      10a63a6 [Burak Yavuz] [SPARK-4409] Fourth pass of code review
      f62d6c7 [Burak Yavuz] [SPARK-4409] Modified genRandMatrix
      3971c93 [Burak Yavuz] [SPARK-4409] Third pass of code review
      75239f8 [Burak Yavuz] [SPARK-4409] Second pass of code review
      e4bd0c0 [Burak Yavuz] [SPARK-4409] Modified horzcat and vertcat
      65c562e [Burak Yavuz] [SPARK-4409] Hopefully fixed Java Test
      d8be7bc [Burak Yavuz] [SPARK-4409] Organized imports
      065b531 [Burak Yavuz] [SPARK-4409] First pass after code review
      a8120d2 [Burak Yavuz] [SPARK-4409] Finished updates to API according to SPARK-4614
      f798c82 [Burak Yavuz] [SPARK-4409] Updated API according to SPARK-4614
      c75f3cd [Burak Yavuz] [SPARK-4409] Added JavaAPI Tests, and fixed a couple of bugs
      d662f9d [Burak Yavuz] [SPARK-4409] Modified according to remote repo
      83dfe37 [Burak Yavuz] [SPARK-4409] Scalastyle error fixed
      a14c0da [Burak Yavuz] [SPARK-4409] Initial commit to add methods
    • Kousuke Saruta's avatar
      [Minor] Fix a typo of type parameter in JavaUtils.scala · 8d72341a
      Kousuke Saruta authored
      In JavaUtils.scala, there is a typo in a type parameter. In addition, the type information is removed at compile time by erasure.
      This issue is really minor, so I didn't file it in JIRA.
      Author: Kousuke Saruta <>
      Closes #3789 from sarutak/fix-typo-in-javautils and squashes the following commits:
      e20193d [Kousuke Saruta] Fixed a typo of type parameter
      82bc5d9 [Kousuke Saruta] Merge branch 'master' of git:// into fix-typo-in-javautils
      99f6f63 [Kousuke Saruta] Fixed a typo of type parameter in JavaUtils.scala
    • YanTangZhai's avatar
      [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in... · 815de540
      YanTangZhai authored
      [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
      Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
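As we understand it, `AkkaUtils.askWithReply` retries the ask a bounded number of times, waiting between attempts, instead of failing on the first error, which is what reduces the chance of a transient communication failure. The generic retry loop below is a sketch of that idea, not the actual method; `RetrySketch.withRetries` and its parameters are hypothetical.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  /** Calls `op` up to `maxAttempts` times, sleeping `retryIntervalMs` between
    * failed attempts, and rethrows the last failure if every attempt fails. */
  def withRetries[T](maxAttempts: Int, retryIntervalMs: Long)(op: => T): T = {
    require(maxAttempts >= 1, "need at least one attempt")
    var attempt = 1
    var result: Option[T] = None
    var lastError: Throwable = null
    while (result.isEmpty && attempt <= maxAttempts) {
      Try(op) match {
        case Success(v) => result = Some(v)
        case Failure(e) =>
          lastError = e
          if (attempt < maxAttempts) Thread.sleep(retryIntervalMs)
      }
      attempt += 1
    }
    result.getOrElse(throw lastError)
  }
}
```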
      Author: YanTangZhai <>
      Author: yantangzhai <>
      Closes #3785 from YanTangZhai/SPARK-4946 and squashes the following commits:
      9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
      e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master
      718afeb [YanTangZhai] Merge pull request #12 from apache/master
      6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
      e249846 [YanTangZhai] Merge pull request #10 from apache/master
      d26d982 [YanTangZhai] Merge pull request #9 from apache/master
      76d4027 [YanTangZhai] Merge pull request #8 from apache/master
      03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
      8a00106 [YanTangZhai] Merge pull request #6 from apache/master
      cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
      cdef539 [YanTangZhai] Merge pull request #1 from apache/master
    • Kousuke Saruta's avatar
      Adde LICENSE Header to build/mvn, build/sbt and sbt/sbt · 4cef05e1
      Kousuke Saruta authored
      build/mvn and build/sbt were recently added and sbt/sbt was changed, but none of them has a license header. Should we add license headers to these scripts?
      If this is not right, please let me know and I will correct it.
      Since this PR doesn't affect the behavior of Spark, I didn't file it in JIRA.
      Author: Kousuke Saruta <>
      Closes #3817 from sarutak/add-license-header and squashes the following commits:
      1abc972 [Kousuke Saruta] Added LICENSE Header
    • wangxiaojing's avatar
      [SPARK-4982][DOC] `spark.ui.retainedJobs` description is wrong in Spark UI configuration guide · 6645e525
      wangxiaojing authored
      Author: wangxiaojing <>
      Closes #3818 from wangxiaojing/SPARK-4982 and squashes the following commits:
      fe2ad5f [wangxiaojing] change stages to jobs
    • meiyoula's avatar
      [SPARK-4966][YARN]The MemoryOverhead value is setted not correctly · 14fa87bd
      meiyoula authored
      Author: meiyoula <>
      Closes #3797 from XuTingjun/MemoryOverhead and squashes the following commits:
      5a780fc [meiyoula] Update ClientArguments.scala
  3. Dec 27, 2014
    • Brennon York's avatar
      [SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac · a3e51cc9
      Brennon York authored
      Creates a top-level directory script (`build/mvn`) that automatically downloads zinc and the specific version of Scala used, making it easy to build Spark. It will also download and install Maven if the user doesn't already have it, and all packages are hosted under the `build/` directory. Tested on both Linux and OS X, and it works on both. All commands are passed through to the Maven binary, so it acts exactly as a traditional Maven call would.
      Author: Brennon York <>
      Closes #3707 from brennonyork/SPARK-4501 and squashes the following commits:
      0e5a0e4 [Brennon York] minor incorrect doc verbage (with -> this)
      9b79e38 [Brennon York] fixed merge conflicts with dev/run-tests, properly quoted args in sbt/sbt, fixed bug where relative paths would fail if passed in from build/mvn
      d2d41b6 [Brennon York] added blurb about leveraging zinc with build/mvn
      b979c58 [Brennon York] updated the merge conflict
      c5634de [Brennon York] updated documentation to overview build/mvn, updated all points where sbt/sbt was referenced with build/sbt
      b8437ba [Brennon York] set progress bars for curl and wget when not run on jenkins, no progress bar when run on jenkins, moved sbt script to build/sbt, wrote stub and warning under sbt/sbt which calls build/sbt, modified build/sbt to use the correct directory, fixed bug in build/sbt-launch-lib.bash to correctly pull the sbt version
      be11317 [Brennon York] added switch to silence download progress only if AMPLAB_JENKINS is set
      28d0a99 [Brennon York] updated to remove the python dependency, uses grep instead
      7e785a6 [Brennon York] added silent and quiet flags to curl and wget respectively, added single echo output to denote start of a download if download is needed
      14a5da0 [Brennon York] removed unnecessary zinc output on startup
      1af4a94 [Brennon York] fixed bug with uppercase vs lowercase variable
      3e8b9b3 [Brennon York] updated to properly only restart zinc if it was freshly installed
      a680d12 [Brennon York] Added comments to functions and tested various mvn calls
      bb8cc9d [Brennon York] removed package files
      ef017e6 [Brennon York] removed OS complexities, setup generic install_app call, removed extra file complexities, removed help, removed forced install (defaults now), removed double-dash from cli
      07bf018 [Brennon York] Updated to specifically handle pulling down the correct scala version
      f914dea [Brennon York] Beginning final portions of localized scala home
      69c4e44 [Brennon York] working linux and osx installers for purely local mvn build
      4a1609c [Brennon York] finalizing working linux install for maven to local ./build/apache-maven folder
      cbfcc68 [Brennon York] Changed the default sbt/sbt to build/sbt and added a build/mvn which will automatically download, install, and execute maven with zinc for easier build capability
    • GuoQiang Li's avatar
      [SPARK-4952][Core]Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails · 080ceb77
      GuoQiang Li authored
      Author: GuoQiang Li <>
      Closes #3788 from witgo/SPARK-4952 and squashes the following commits:
      d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
    • Zhang, Liye's avatar
      [SPARK-4954][Core] add spark version information in log for standalone mode · 786808ab
      Zhang, Liye authored
      The Spark version of the Master and Worker may differ from the Driver's Spark version, because the Spark JAR file may be replaced for a new application without restarting the cluster. The Spark version should therefore be logged in both the Master and Worker logs.
      Author: Zhang, Liye <>
      Closes #3790 from liyezhang556520/version4Standalone and squashes the following commits:
      e05e1e3 [Zhang, Liye] add spark version information in log for standalone mode
    • Jongyoul Lee's avatar
      [SPARK-3955] Different versions between jackson-mapper-asl and jackson-c... · 2483c1ef
      Jongyoul Lee authored
      - Set the same version for jackson-mapper-asl and jackson-core-asl
      - Related to #2818
      - Applied the same patch on top of the latest master
      Author: Jongyoul Lee <>
      Closes #3716 from jongyoul/SPARK-3955 and squashes the following commits:
      efa29aa [Jongyoul Lee] [SPARK-3955] Different versions between jackson-mapper-asl and jackson-core-asl - set the same version to jackson-mapper-asl and jackson-core-asl
    • Patrick Wendell's avatar
      HOTFIX: Slight tweak on previous commit. · 82bf4bee
      Patrick Wendell authored
      Meant to merge this in when committing SPARK-3787.
    • Kousuke Saruta's avatar
      [SPARK-3787][BUILD] Assembly jar name is wrong when we build with sbt omitting -Dhadoop.version · de95c57a
      Kousuke Saruta authored
      This PR is another solution to the following problem. When we build with sbt using a Hadoop profile but without the Hadoop version property, like:
          sbt/sbt -Phadoop-2.2 assembly
      the jar name always uses the default version (1.0.4).
      When we build with Maven under the same conditions, the default version for each profile is used.
      For instance, if we build like:
          mvn -Phadoop-2.2 package
      the jar name uses hadoop2.2.0, the default version for hadoop-2.2.
      Author: Kousuke Saruta <>
      Closes #3046 from sarutak/fix-assembly-jarname-2 and squashes the following commits:
      41ef90e [Kousuke Saruta] Merge branch 'master' of git:// into fix-assembly-jarname-2
      50c8676 [Kousuke Saruta] Merge branch 'fix-assembly-jarname-2' of into fix-assembly-jarname-2
      52a1cd2 [Kousuke Saruta] Fixed conflicts
      dd30768 [Kousuke Saruta] Merge branch 'master' of git:// into fix-assembly-jarname2
      f1c90bb [Kousuke Saruta] Fixed SparkBuild.scala in order to read `hadoop.version` property from pom.xml
      af6b100 [Kousuke Saruta] Merge branch 'master' of git:// into fix-assembly-jarname
      c81806b [Kousuke Saruta] Merge branch 'master' of git:// into fix-assembly-jarname
      ad1f96e [Kousuke Saruta] Merge branch 'master' of git:// into fix-assembly-jarname
      b2318eb [Kousuke Saruta] Merge branch 'master' of git:// into fix-assembly-jarname
      5fc1259 [Kousuke Saruta] Fixed typo.
      eebbb7d [Kousuke Saruta] Fixed wrong jar name
    • Patrick Wendell's avatar
      MAINTENANCE: Automated closing of pull requests. · 534f24b2
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      Closes #3456 (close requested by 'pwendell')
      Closes #1602 (close requested by 'tdas')
      Closes #2633 (close requested by 'tdas')
      Closes #2059 (close requested by 'JoshRosen')
      Closes #2348 (close requested by 'tdas')
      Closes #3662 (close requested by 'tdas')
      Closes #2031 (close requested by 'andrewor14')
      Closes #265 (close requested by 'JoshRosen')
  4. Dec 26, 2014