  1. Jan 28, 2015
    • [SPARK-5188][BUILD] make-distribution.sh should support curl, not only wget to get Tachyon · e902dc44
      Kousuke Saruta authored
      When we use `make-distribution.sh` with the `--with-tachyon` option, Tachyon is downloaded with the `wget` command, but some systems don't have `wget` by default (Mac OS X, for example).
      Other scripts such as build/mvn and build/sbt support not only `wget` but also `curl`, so `make-distribution.sh` should support `curl` too.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3988 from sarutak/SPARK-5188 and squashes the following commits:
      
      0f546e0 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5188
      010e884 [Kousuke Saruta] Merge branch 'SPARK-5188' of github.com:sarutak/spark into SPARK-5188
      163687e [Kousuke Saruta] Fixed a merge conflict
      e24e01b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5188
      3daf1f1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5188
      3caa4cb [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5188
      7cc8255 [Kousuke Saruta] Fix to use \$MVN instead of mvn
      a3e908b [Kousuke Saruta] Fixed style
      2db9fbf [Kousuke Saruta] Removed redirection from the logic which checks the existence of commands
      1e4c7e0 [Kousuke Saruta] Used "command" command instead of "type" command
      83b49b5 [Kousuke Saruta] Modified make-distribution.sh so that we use curl, not only wget to get tachyon
    • SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs · 406f6d30
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #4251 from sryza/sandy-spark-5458 and squashes the following commits:
      
      460827a [Sandy Ryza] Python too
      d2dc160 [Sandy Ryza] SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs
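
      For context, a minimal sketch of the API the docs now point to (assumes an existing SparkContext `sc`; the data is made up for illustration):

      ```
      // aggregateByKey takes a zero value plus seq/comb functions, covering the
      // common per-key aggregation case previously shown with combineByKey.
      val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
      val sums = pairs.aggregateByKey(0)(_ + _, _ + _)
      sums.collect()  // Array(("a", 3), ("b", 3))
      ```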
    • [SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame. · c8e934ef
      Reynold Xin authored
      and
      
      [SPARK-5448][SQL] Make CacheManager a concrete class and field in SQLContext
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4242 from rxin/sqlCleanup and squashes the following commits:
      
      e351cb2 [Reynold Xin] Fixed toDataFrame.
      6545c42 [Reynold Xin] More changes.
      728c017 [Reynold Xin] [SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame.
    • [SPARK-5361] Multiple Java RDD <-> Python RDD conversions not working correctly · 453d7999
      Winston Chen authored
      This was found by reading an RDD from `sc.newAPIHadoopRDD` and writing it back using `rdd.saveAsNewAPIHadoopFile` in pyspark.

      It turns out that whenever there are multiple RDD conversions from JavaRDD to PythonRDD and then back to JavaRDD, the exception below occurs:
      
      ```
      15/01/16 10:28:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 7)
      java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.util.ArrayList
      	at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:157)
      	at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:153)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
      ```
      
      The test case code below reproduces it:
      
      ```
      from pyspark.rdd import RDD
      
      dl = [
          (u'2', {u'director': u'David Lean'}),
          (u'7', {u'director': u'Andrew Dominik'})
      ]
      
      dl_rdd = sc.parallelize(dl)
      tmp = dl_rdd._to_java_object_rdd()
      tmp2 = sc._jvm.SerDe.javaToPython(tmp)
      t = RDD(tmp2, sc)
      t.count()
      
      tmp = t._to_java_object_rdd()
      tmp2 = sc._jvm.SerDe.javaToPython(tmp)
      t = RDD(tmp2, sc)
      t.count() # it blows up here on the second conversion
      ```
      
      Author: Winston Chen <wchen@quid.com>
      
      Closes #4146 from wingchen/master and squashes the following commits:
      
      903df7d [Winston Chen] SPARK-5361, update to toSeq based on the PR
      5d90a83 [Winston Chen] SPARK-5361, make python pretty, so to pass PEP 8 checks
      126be6b [Winston Chen] SPARK-5361, add in test case
      4cf1187 [Winston Chen] SPARK-5361, add in test case
      9f1a097 [Winston Chen] add in tuple handling while converting from python RDD back to JavaRDD
    • [SPARK-5291][CORE] Add timestamp and reason why an executor is removed to... · 0b35fcd7
      Kousuke Saruta authored
      [SPARK-5291][CORE] Add timestamp and reason why an executor is removed to SparkListenerExecutorAdded and SparkListenerExecutorRemoved
      
      `SparkListenerExecutorAdded` and `SparkListenerExecutorRemoved` were added recently. I think it would be useful if they carried a timestamp and the reason why an executor was removed.
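
      A rough sketch of the proposed shape; field names, types, and surrounding traits here are assumptions, not the merged API:

      ```
      // Sketch only; ExecutorInfo/SparkListenerEvent stand in for the real
      // scheduler types (import paths assumed).
      case class SparkListenerExecutorAdded(
          time: Long,                 // when the executor was added
          executorId: String,
          executorInfo: ExecutorInfo) extends SparkListenerEvent

      case class SparkListenerExecutorRemoved(
          time: Long,                 // when the executor was removed
          executorId: String,
          reason: String)             // why the executor was removed
        extends SparkListenerEvent
      ```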
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4082 from sarutak/SPARK-5291 and squashes the following commits:
      
      a026ff2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291
      979dfe1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291
      cf9f9080 [Kousuke Saruta] Fixed test case
      1f2a89b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291
      243f2a60 [Kousuke Saruta] Modified MesosSchedulerBackendSuite
      a527c35 [Kousuke Saruta] Added timestamp to SparkListenerExecutorAdded
    • [SPARK-3974][MLlib] Distributed Block Matrix Abstractions · eeb53bf9
      Burak Yavuz authored
      This pull request includes the abstractions for the distributed BlockMatrix representation.
      `BlockMatrix` will allow users to store very large matrices in small blocks of local matrices. Specific partitioners, such as `RowBasedPartitioner` and `ColumnBasedPartitioner`, are implemented in order to optimize addition and multiplication operations that will be added in a following PR.
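
      As a rough usage sketch based on this description (the constructor shape, parameter names, and import path are assumptions, not necessarily the merged code; assumes an existing SparkContext `sc`):

      ```
      import org.apache.spark.mllib.linalg.{Matrices, Matrix}
      import org.apache.spark.rdd.RDD

      // Each entry pairs a block coordinate with a small local matrix.
      val blocks: RDD[((Int, Int), Matrix)] = sc.parallelize(Seq(
        ((0, 0), Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0))),
        ((1, 0), Matrices.dense(2, 2, Array(2.0, 0.0, 0.0, 2.0)))))

      // Hypothetical construction: a 4 x 2 matrix stored as 2 x 2 blocks.
      val matrix = new BlockMatrix(blocks, rowsPerBlock = 2, colsPerBlock = 2)
      ```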
      
      This work is based on the ml-matrix repo developed at the AMPLab at UC Berkeley, CA.
      https://github.com/amplab/ml-matrix
      
      Additional thanks to rezazadeh, shivaram, and mengxr for guidance on the design.
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Burak Yavuz <brkyvz@dn51t42l.sunet>
      Author: Burak Yavuz <brkyvz@dn51t4rd.sunet>
      Author: Burak Yavuz <brkyvz@dn0a221430.sunet>
      
      Closes #3200 from brkyvz/SPARK-3974 and squashes the following commits:
      
      a8eace2 [Burak Yavuz] Merge pull request #2 from mengxr/brkyvz-SPARK-3974
      feb32a7 [Xiangrui Meng] update tests
      e1d3ee8 [Xiangrui Meng] minor updates
      24ec7b8 [Xiangrui Meng] update grid partitioner
      5eecd48 [Burak Yavuz] fixed gridPartitioner and added tests
      140f20e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-3974
      1694c9e [Burak Yavuz] almost finished addressing comments
      f9d664b [Burak Yavuz] updated API and modified partitioning scheme
      eebbdf7 [Burak Yavuz] preliminary changes addressing code review
      1a63b20 [Burak Yavuz] [SPARK-3974] Remove setPartition method. Isn't required
      1e8bb2a [Burak Yavuz] [SPARK-3974] Change return type of cache and persist
      239ab4b [Burak Yavuz] [SPARK-3974] Addressed @jkbradley's comments
      ba414d2 [Burak Yavuz] [SPARK-3974] fixed frobenius norm
      ab6cde0 [Burak Yavuz] [SPARK-3974] Modifications cleaning code up, making size calculation more robust
      9ae85aa [Burak Yavuz] [SPARK-3974] Made partitioner a variable inside BlockMatrix instead of a constructor variable
      d033861 [Burak Yavuz] [SPARK-3974] Removed SubMatrixInfo and added constructor without partitioner
      49b9586 [Burak Yavuz] [SPARK-3974] Updated testing utils from master
      645afbe [Burak Yavuz] [SPARK-3974] Pull latest master
      b05aabb [Burak Yavuz] [SPARK-3974] Updated tests to reflect changes
      19c17e8 [Burak Yavuz] [SPARK-3974] Changed blockIdRow and blockIdCol
      589fbb6 [Burak Yavuz] [SPARK-3974] Code review feedback addressed
      aa8f086 [Burak Yavuz] [SPARK-3974] Additional comments added
      f378e16 [Burak Yavuz] [SPARK-3974] Block Matrix Abstractions ready
      b693209 [Burak Yavuz] Ready for Pull request
    • MAINTENANCE: Automated closing of pull requests. · 622ff09d
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #1480 (close requested by 'pwendell')
      Closes #4205 (close requested by 'kdatta')
      Closes #4114 (close requested by 'pwendell')
      Closes #3382 (close requested by 'mengxr')
      Closes #3933 (close requested by 'mengxr')
      Closes #3870 (close requested by 'yhuai')
    • [SPARK-5415] bump sbt to version 0.13.7 · 661d3f9f
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #4211 from ryan-williams/sbt0.13.7 and squashes the following commits:
      
      e28476d [Ryan Williams] bump sbt to version 0.13.7
    • [SPARK-4809] Rework Guava library shading. · 37a5e272
      Marcelo Vanzin authored
      The current way of shading Guava is a little problematic. Code that
      depends on "spark-core" does not see the transitive dependency, yet
      classes in "spark-core" actually depend on Guava. So running unit
      tests that use spark-core classes is tricky, since you need a
      compatible version of Guava in your dependencies when running the
      tests. This makes for a bad user experience.
      
      This change modifies the way Guava is shaded so that it's applied
      uniformly across the Spark build. This means Guava is shaded inside
      spark-core itself, so that the dependency issues above are solved.
      Aside from that, all Spark sub-modules have their Guava references
      relocated, so that they refer to the relocated classes now packaged
      inside spark-core. Before, this was only done by the time the assembly
      was built, so projects that did not end up inside the assembly (such
      as streaming backends) could still reference the original location
      of Guava classes.
      
      The Guava classes are added to the "first" artifact Spark generates
      (network-common), so that all downstream modules have the needed
      classes available. Since "network-common" is a dependency of spark-core,
      all Spark apps should get the relocated classes automatically.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3658 from vanzin/SPARK-4809 and squashes the following commits:
      
      3c93e42 [Marcelo Vanzin] Shade Guava in the network-common artifact.
      5d69ec9 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      b3104fc [Marcelo Vanzin] Add comment.
      941848f [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      f78c48a [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      8053dd4 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      107d7da [Marcelo Vanzin] Add fix for SPARK-5052 (PR #3874).
      40b8723 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
      4a4ed42 [Marcelo Vanzin] [SPARK-4809] Rework Guava library shading.
  2. Jan 27, 2015
    • [SPARK-5097][SQL] Test cases for DataFrame expressions. · d7437322
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4235 from rxin/df-tests1 and squashes the following commits:
      
      f341db6 [Reynold Xin] [SPARK-5097][SQL] Test cases for DataFrame expressions.
    • [SPARK-5097][SQL] DataFrame · 119f45d6
      Reynold Xin authored
      This pull request redesigns the existing Spark SQL DSL, which already provides data-frame-like functionality (see the usage sketch after the TODO list).
      
      TODOs:
      With the exception of Python support, other tasks can be done in separate, follow-up PRs.
      - [ ] Audit of the API
      - [ ] Documentation
      - [ ] More test cases to cover the new API
      - [x] Python support
      - [ ] Type alias SchemaRDD
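
      A minimal usage sketch of the DSL direction described above; method names are illustrative and may not match the merged API exactly (assumes an existing SQLContext and a registered "employees" table):

      ```
      val df = sqlContext.table("employees")

      df.filter(df("age") > 21)
        .groupBy("department")
        .agg("salary" -> "avg")
        .show()
      ```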
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4173 from rxin/df1 and squashes the following commits:
      
      0a1a73b [Reynold Xin] Merge branch 'df1' of github.com:rxin/spark into df1
      23b4427 [Reynold Xin] Mima.
      828f70d [Reynold Xin] Merge pull request #7 from davies/df
      257b9e6 [Davies Liu] add repartition
      6bf2b73 [Davies Liu] fix collect with UDT and tests
      e971078 [Reynold Xin] Missing quotes.
      b9306b4 [Reynold Xin] Remove removeColumn/updateColumn for now.
      a728bf2 [Reynold Xin] Example rename.
      e8aa3d3 [Reynold Xin] groupby -> groupBy.
      9662c9e [Davies Liu] improve DataFrame Python API
      4ae51ea [Davies Liu] python API for dataframe
      1e5e454 [Reynold Xin] Fixed a bug with symbol conversion.
      2ca74db [Reynold Xin] Couple minor fixes.
      ea98ea1 [Reynold Xin] Documentation & literal expressions.
      2b22684 [Reynold Xin] Got rid of IntelliJ problems.
      02bbfbc [Reynold Xin] Tightening imports.
      ffbce66 [Reynold Xin] Fixed compilation error.
      59b6d8b [Reynold Xin] Style violation.
      b85edfb [Reynold Xin] ALS.
      8c37f0a [Reynold Xin] Made MLlib and examples compile
      6d53134 [Reynold Xin] Hive module.
      d35efd5 [Reynold Xin] Fixed compilation error.
      ce4a5d2 [Reynold Xin] Fixed test cases in SQL except ParquetIOSuite.
      66d5ef1 [Reynold Xin] SQLContext minor patch.
      c9bcdc0 [Reynold Xin] Checkpoint: SQL module compiles!
    • SPARK-5199. FS read metrics should support CombineFileSplits and track bytes from all FSs · b1b35ca2
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #4050 from sryza/sandy-spark-5199 and squashes the following commits:
      
      864514b [Sandy Ryza] Add tests and fix bug
      0d504f1 [Sandy Ryza] Prettify
      915c7e6 [Sandy Ryza] Get metrics from all filesystems
      cdbc3e8 [Sandy Ryza] SPARK-5199. Input metrics should show up for InputFormats that return CombineFileSplits
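
      A sketch of the "get metrics from all filesystems" idea from the shortlog above, assuming Hadoop's `FileSystem.getAllStatistics`; illustrative only, not the merged code:

      ```
      import org.apache.hadoop.fs.FileSystem
      import scala.collection.JavaConverters._

      // Sum bytes read across every FileSystem with recorded statistics,
      // instead of consulting a single scheme's statistics only.
      def bytesReadFromAllFileSystems(): Long =
        FileSystem.getAllStatistics.asScala.map(_.getBytesRead).sum
      ```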
    • [MLlib] fix python example of ALS in guide · fdaad4eb
      Davies Liu authored
      Fix the Python example of ALS in the guide: use Rating instead of np.array.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4226 from davies/fix_als_guide and squashes the following commits:
      
      1433d76 [Davies Liu] fix python example of als in guide
    • SPARK-5308 [BUILD] MD5 / SHA1 hash format doesn't match standard Maven output · ff356e2a
      Sean Owen authored
      Here's one way to make the hashes match what Maven's plugins would create. It takes a little extra footwork since OS X doesn't have the same command line tools. An alternative is just to make Maven output these of course - would that be better? I ask in case there is a reason I'm missing, like, we need to hash files that Maven doesn't build.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4161 from srowen/SPARK-5308 and squashes the following commits:
      
      70d09d0 [Sean Owen] Use $(...) syntax
      e25eff8 [Sean Owen] Generate MD5, SHA1 hashes in a format like Maven's plugin
    • [SPARK-5321] Support for transposing local matrices · 91426748
      Burak Yavuz authored
      Support for transposing local matrices added. The `.transpose` function creates a new object re-using the backing array(s) but switches `numRows` and `numCols`. Operations check the flag `.isTransposed` to see whether the indexing in `values` should be modified.
      
      This PR will pave the way for transposing `BlockMatrix`.
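
      A sketch of the indexing trick described above (class and field names assumed for illustration): the backing array is shared, and only the index computation changes when `isTransposed` is set.

      ```
      // Column-major local matrix sketch; not the actual MLlib class.
      class LocalMatrixSketch(
          val numRows: Int,
          val numCols: Int,
          val values: Array[Double],
          val isTransposed: Boolean = false) {

        // Element (i, j): flip the index math instead of copying the array.
        def apply(i: Int, j: Int): Double =
          if (isTransposed) values(j + numCols * i)
          else values(i + numRows * j)

        // O(1) transpose: reuse `values`, swap dimensions, flip the flag.
        def transpose: LocalMatrixSketch =
          new LocalMatrixSketch(numCols, numRows, values, !isTransposed)
      }
      ```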
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #4109 from brkyvz/SPARK-5321 and squashes the following commits:
      
      87ab83c [Burak Yavuz] fixed scalastyle
      caf4438 [Burak Yavuz] addressed code review v3
      c524770 [Burak Yavuz] address code review comments 2
      77481e8 [Burak Yavuz] fixed MiMa
      f1c1742 [Burak Yavuz] small refactoring
      ccccdec [Burak Yavuz] fixed failed test
      dd45c88 [Burak Yavuz] addressed code review
      a01bd5f [Burak Yavuz] [SPARK-5321] Fixed MiMa issues
      2a63593 [Burak Yavuz] [SPARK-5321] fixed bug causing failed gemm test
      c55f29a [Burak Yavuz] [SPARK-5321] Support for transposing local matrices cleaned up
      c408c05 [Burak Yavuz] [SPARK-5321] Support for transposing local matrices added
    • [SPARK-5419][Mllib] Fix the logic in Vectors.sqdist · 7b0ed797
      Liang-Chi Hsieh authored
      The current implementation of Vectors.sqdist is not efficient because it allocates temporary arrays. There is also a bug in the code `v1.indices.length / v1.size < 0.5`: integer division truncates the ratio to 0. This PR fixes the bug and refactors sqdist so that it does not allocate new arrays.
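
      The bug and a minimal fix, sketched:

      ```
      import org.apache.spark.mllib.linalg.SparseVector

      def sparseEnough(v1: SparseVector): Boolean = {
        // Bug: Int / Int truncates to 0 whenever indices.length < size,
        // so the "less than half full" check was always true:
        //   v1.indices.length / v1.size < 0.5
        // Fix (sketch): force floating-point division.
        v1.indices.length.toDouble / v1.size < 0.5
      }
      ```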
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4217 from viirya/fix_sqdist and squashes the following commits:
      
      e8b0b3d [Liang-Chi Hsieh] For review comments.
      314c424 [Liang-Chi Hsieh] Fix sqdist bug.
  3. Jan 26, 2015
    • [SPARK-3726] [MLlib] Allow sampling_rate not equal to 1.0 in RandomForests · d6894b1c
      MechCoder authored
      I've added support for sampling_rate not equal to 1.0. I have two major questions.

      1. A Scala style test is failing, since the number of parameters now exceeds 10.
      2. I would like suggestions on how to test this.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4073 from MechCoder/spark-3726 and squashes the following commits:
      
      8012fb2 [MechCoder] Add test in Strategy
      e0e0d9c [MechCoder] TST: Add better test
      d1df1b2 [MechCoder] Add test to verify subsampling behavior
      a7bfc70 [MechCoder] [SPARK-3726] Allow sampling_rate not equal to 1.0
    • [SPARK-5119] java.lang.ArrayIndexOutOfBoundsException on trying to train... · f2ba5c6f
      lewuathe authored
      [SPARK-5119] java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model

      Labels loaded from libsvm files are mapped to 0.0 if they are negative, because labels should be nonnegative values.
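
      A sketch of the validation idea (helper name and message assumed, not the merged code): reject negative labels up front instead of letting them index past array bounds later.

      ```
      // Hypothetical helper; the merged checks live in the impurity code.
      def requireNonnegativeLabel(label: Double): Unit =
        require(label >= 0.0,
          s"Classification labels should be nonnegative, but found $label")
      ```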
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #3975 from Lewuathe/map-negative-label-to-positive and squashes the following commits:
      
      12d1d59 [lewuathe] [SPARK-5119] Fix code styles
      6d9a18a [lewuathe] [SPARK-5119] Organize test codes
      62a150c [lewuathe] [SPARK-5119] Modify Impurities to throw exceptions with negative labels
      3336c21 [lewuathe] [SPARK-5119] java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model
    • [SPARK-5052] Add common/base classes to fix guava methods signatures. · 661e0fca
      Elmer Garduno authored
      Fixes problems with incorrect method signatures related to shaded classes. For discussion, see the JIRA issue.
      
      Author: Elmer Garduno <elmerg@google.com>
      
      Closes #3874 from elmer-garduno/fix_guava_signatures and squashes the following commits:
      
      aa5d8e0 [Elmer Garduno] Unshade common/base[Function|Supplier] classes to fix guava methods signatures.
    • SPARK-960 [CORE] [TEST] JobCancellationSuite "two jobs sharing the same stage" is broken · 0497ea51
      Sean Owen authored
      This reenables and fixes this test, after addressing two issues:
      
      - The Semaphore that was intended to be shared locally was being serialized and copied; it's now a static member in the companion object, as in other tests (see the sketch after this list)
      - Later changes to Spark means that cancelling the first task will not cancel the shared stage and therefore the second task should succeed
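
      A sketch of the companion-object pattern (the field name is an assumption): an object field exists once per JVM, so task closures capture a reference to the shared instance rather than a serialized copy.

      ```
      import java.util.concurrent.Semaphore

      object JobCancellationSuite {
        // Lives once per JVM; closures reference it instead of serializing it.
        val taskStartedSemaphore = new Semaphore(0)
      }
      ```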
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4180 from srowen/SPARK-960 and squashes the following commits:
      
      43da66f [Sean Owen] Fix 'two jobs sharing the same stage' test and reenable it: truly share a Semaphore locally as intended, and update expectation of failure in non-cancelled task
    • Fix command spaces issue in make-distribution.sh · b38034e8
      David Y. Ross authored
      Storing a command in a variable is tricky in Bash; use an array
      to handle all the issues with spaces, quoting, etc.
      See: http://mywiki.wooledge.org/BashFAQ/050
      
      Author: David Y. Ross <dyross@gmail.com>
      
      Closes #4126 from dyross/dyr-fix-make-distribution and squashes the following commits:
      
      4ce522b [David Y. Ross] Fix command spaces issue in make-distribution.sh
    • SPARK-4147 [CORE] Reduce log4j dependency · 54e7b456
      Sean Owen authored
      Defer use of the log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to another logging framework. The only change is to move one half of the check from the original `if` condition inside it. This is a trivial change and may or may not actually solve a problem, but I think it's all that makes sense to do for SPARK-4147.
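
      A sketch of the deferred check; this mirrors the shape of Spark's logging initialization rather than the exact diff:

      ```
      object LoggingInitSketch {
        def tryInitLog4j12(): Unit = {
          // Identify the slf4j binding by name first; only touch log4j
          // classes once the log4j 1.2 binding is known to be in use.
          val binderClass =
            org.slf4j.LoggerFactory.getILoggerFactory.getClass.getName
          if ("org.slf4j.impl.Log4jLoggerFactory".equals(binderClass)) {
            val log4j12Initialized = org.apache.log4j.LogManager
              .getRootLogger.getAllAppenders.hasMoreElements
            if (!log4j12Initialized) {
              // ... install a default log4j.properties here ...
            }
          }
        }
      }
      ```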
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4190 from srowen/SPARK-4147 and squashes the following commits:
      
      4e99942 [Sean Owen] Defer use of log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to another logging framework.
    • [SPARK-5339][BUILD] build/mvn doesn't work because of invalid URL for maven's tgz. · c094c732
      Kousuke Saruta authored
      build/mvn automatically downloads a tarball of Maven, but currently the URL is invalid.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4124 from sarutak/SPARK-5339 and squashes the following commits:
      
      6e96121 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5339
      0e012d1 [Kousuke Saruta] Updated Maven version to 3.2.5
      ca26499 [Kousuke Saruta] Fixed URL of the tarball of Maven
    • [SPARK-5355] use j.u.c.ConcurrentHashMap instead of TrieMap · 14209317
      Davies Liu authored
      j.u.c.ConcurrentHashMap is more battle-tested.
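
      A minimal sketch of the swap (class and field names assumed): back the conf with a `java.util.concurrent.ConcurrentHashMap` instead of Scala's `TrieMap`.

      ```
      import java.util.concurrent.ConcurrentHashMap

      class ThreadSafeConfSketch {
        private val settings = new ConcurrentHashMap[String, String]()

        def set(key: String, value: String): Unit = settings.put(key, value)

        // ConcurrentHashMap returns null for missing keys; wrap in Option.
        def get(key: String): Option[String] = Option(settings.get(key))
      }
      ```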
      
      cc rxin JoshRosen pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4208 from davies/safe-conf and squashes the following commits:
      
      c2182dc [Davies Liu] address comments, fix tests
      3a1d821 [Davies Liu] fix test
      da14ced [Davies Liu] Merge branch 'master' of github.com:apache/spark into safe-conf
      ae4d305 [Davies Liu] change to j.u.c.ConcurrentMap
      f8fa1cf [Davies Liu] change to TrieMap
      a1d769a [Davies Liu] make SparkConf thread-safe
    • [SPARK-5384][mllib] Vectors.sqdist returns inconsistent results for... · 81251682
      Yuhao Yang authored
      [SPARK-5384][mllib] Vectors.sqdist returns inconsistent results for sparse/dense vectors when the vectors have different lengths
      
      JIRA issue: https://issues.apache.org/jira/browse/SPARK-5384
      Currently `Vectors.sqdist` returns inconsistent results for sparse/dense vectors when the vectors have different lengths; please refer to the JIRA for a sample.
      
      PR scope:
      Unify the sqdist logic for dense/sparse vectors, fix the inconsistency, and remove the possible sparse-to-dense conversion in the original code.
      
      For reviewers:
      Maybe we should first discuss what the correct behavior is.
      1. Must vectors passed to sqdist have the same length, as in breeze?
      2. If they can have different lengths, what's the correct result for sqdist? (Should the extra part be included in the calculation?)

      I'll update the PR with further optimization and additional unit tests afterwards. Thanks.
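
      Given the final scope ("size constraints only" in the shortlog below), a sketch of the constraint; the message text is an assumption:

      ```
      import org.apache.spark.mllib.linalg.Vector

      def sqdist(v1: Vector, v2: Vector): Double = {
        // Enforce matching dimensions up front, as breeze does.
        require(v1.size == v2.size,
          s"Vector dimensions do not match: ${v1.size} != ${v2.size}")
        // ... distance computation unchanged ...
        ???
      }
      ```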
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #4183 from hhbyyh/fixDouble and squashes the following commits:
      
      1f17328 [Yuhao Yang] limit PR scope to size constraints only
      54cbf97 [Yuhao Yang] fix Vectors.sqdist inconsistence
  4. Jan 25, 2015
    • [SPARK-5268] don't stop CoarseGrainedExecutorBackend for irrelevant DisassociatedEvent · 8df94355
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-5268
      
      In CoarseGrainedExecutorBackend, we subscribe to DisassociatedEvent in the executor backend actor and exit the program upon receiving such an event.

      Consider the following case:

      The user may develop an Akka-based program which starts an actor with Spark's actor system and communicates with an external actor system (e.g. an Akka-based receiver in Spark Streaming which communicates with an external system). If the external actor system fails or deliberately disassociates from the actor within Spark's system, we receive a DisassociatedEvent and the executor is restarted.

      This is not the expected behavior.

      ----

      This is a simple fix: check the event before making the quit decision (see the sketch below).
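
      A sketch of the relevance check, assuming Akka's `DisassociatedEvent` and a known driver address (class and parameter names are illustrative):

      ```
      import akka.actor.{Actor, Address}
      import akka.remote.DisassociatedEvent

      class ExecutorBackendSketch(driverAddress: Address) extends Actor {
        def receive = {
          case e: DisassociatedEvent if e.remoteAddress == driverAddress =>
            // The driver is really gone: shut the executor down.
            context.system.shutdown()
          case e: DisassociatedEvent =>
            // Unrelated actor system disassociated: log and keep running.
            println(s"Ignoring DisassociatedEvent from ${e.remoteAddress}")
        }
      }
      ```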
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #4063 from CodingCat/SPARK-5268 and squashes the following commits:
      
      4d7d48e [CodingCat] simplify the log
      18c36f4 [CodingCat] more descriptive log
      f299e0b [CodingCat] clean log
      1632e79 [CodingCat] check whether DisassociatedEvent is relevant before quit
    • SPARK-4430 [STREAMING] [TEST] Apache RAT Checks fail spuriously on test files · 0528b85c
      Sean Owen authored
      Another trivial one. The RAT failure was due to temp files from `FailureSuite` not being cleaned up. This just makes the cleanup more reliable by using the standard temp dir mechanism.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4189 from srowen/SPARK-4430 and squashes the following commits:
      
      9ea63ff [Sean Owen] Properly acquire a temp directory to ensure it is cleaned up at shutdown, which helps avoid a RAT check failure
    • [SPARK-5326] Show fetch wait time as optional metric in the UI · fc2168f0
      Kay Ousterhout authored
      With this change, here's what the UI looks like:
      
      ![image](https://cloud.githubusercontent.com/assets/1108612/5809994/1ec8a904-9ff4-11e4-8f24-6a59a1a858f7.png)
      
      If you want to locally test this, you need to spin up multiple executors, because the shuffle read metrics are only shown for data read remotely.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #4110 from kayousterhout/SPARK-5326 and squashes the following commits:
      
      610051e [Kay Ousterhout] Josh style comments
      5feaa28 [Kay Ousterhout] What is the difference here??
      aa129cb [Kay Ousterhout] Removed inadvertent change
      721c742 [Kay Ousterhout] Improved tooltip
      f3a7111 [Kay Ousterhout] Style fix
      679b4e9 [Kay Ousterhout] [SPARK-5326] Show fetch wait time as optional metric in the UI
    • [SPARK-5344][WebUI] HistoryServer cannot recognize that inprogress file was... · 8f5c827b
      Kousuke Saruta authored
      [SPARK-5344][WebUI] HistoryServer cannot recognize that inprogress file was renamed to completed file
      
      `FsHistoryProvider` tries to update the application status, but if `checkForLogs` is called before the `.inprogress` file is renamed to the completed file, the file is not recognized as completed.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4132 from sarutak/SPARK-5344 and squashes the following commits:
      
      9658008 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5344
      d2c72b6 [Kousuke Saruta] Fixed update issue of FsHistoryProvider
    • SPARK-4506 [DOCS] Addendum: Update more docs to reflect that standalone works in cluster mode · 9f643576
      Sean Owen authored
      This is a trivial addendum to SPARK-4506, which was already resolved; it addresses a point noted by Asim Jalis on SPARK-4506.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4160 from srowen/SPARK-4506 and squashes the following commits:
      
      5f5f7df [Sean Owen] Update more docs to reflect that standalone works in cluster mode
    • SPARK-5382: Use SPARK_CONF_DIR in spark-class if it is defined · 1c30afdf
      Jacek Lewandowski authored
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #4179 from jacek-lewandowski/SPARK-5382-1.3 and squashes the following commits:
      
      55d7791 [Jacek Lewandowski] SPARK-5382: Use SPARK_CONF_DIR in spark-class if it is defined
    • SPARK-3782 [CORE] Direct use of log4j in AkkaUtils interferes with certain logging configurations · 383425ab
      Sean Owen authored
      Although the underlying issue can, I think, be solved by having user code use slf4j 1.7.6+, it might be helpful and consistent to update Spark's slf4j too. I see no reason to believe it would be incompatible with other 1.7.x releases: http://www.slf4j.org/news.html  Lots of different versions of slf4j are in use in the wild, and anecdotally I have never seen an issue mixing them.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4184 from srowen/SPARK-3782 and squashes the following commits:
      
      5608d28 [Sean Owen] Update slf4j to 1.7.10
    • SPARK-3852 [DOCS] Document spark.driver.extra* configs · c586b45d
      Sean Owen authored
      As per the JIRA. I copied the `spark.executor.extra*` text but removed info that appears to be specific to the `executor` configs and not the `driver` ones.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4185 from srowen/SPARK-3852 and squashes the following commits:
      
      f60a8a1 [Sean Owen] Document spark.driver.extra* configs
    • [SPARK-5402] log executor ID at executor-construction time · aea25482
      Ryan Williams authored
      Also renames "slaveHostname" to "executorHostname".
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #4195 from ryan-williams/exec and squashes the following commits:
      
      e60a7bb [Ryan Williams] log executor ID at executor-construction time
    • [SPARK-5401] set executor ID before creating MetricsSystem · 2d9887ba
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #4194 from ryan-williams/metrics and squashes the following commits:
      
      7c5a33f [Ryan Williams] set executor ID before creating MetricsSystem
    • Add comment about defaultMinPartitions · 412a58e1
      Idan Zalzberg authored
      Added a comment about using math.min when choosing the default partition count.
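
      The behavior being documented, as defined in SparkContext at the time (comment paraphrased):

      ```
      // Default min number of partitions for Hadoop RDDs when not given
      // by the user; math.min caps it at 2 so small inputs aren't over-split.
      def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
      ```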
      
      Author: Idan Zalzberg <idanzalz@gmail.com>
      
      Closes #4102 from idanz/patch-2 and squashes the following commits:
      
      50e9d58 [Idan Zalzberg] Update SparkContext.scala
    • Closes #4157 · d22ca1e9
      Reynold Xin authored
  5. Jan 23, 2015
    • [SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a partitioner of EdgeRDDImp... · e224dbb0
      Takeshi Yamamuro authored
      If the value of 'spark.default.parallelism' does not match the number of partitions in EdgePartition (EdgeRDDImpl),
      the following error occurs in ReplicatedVertexView.scala:72:
      
      ```
      object GraphTest extends Logging {
        def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): VertexRDD[Int] = {
          graph.aggregateMessages(
            ctx => {
              ctx.sendToSrc(1)
              ctx.sendToDst(2)
            },
            _ + _)
        }
      }

      val g = GraphLoader.edgeListFile(sc, "graph.txt")
      val rdd = GraphTest.run(g)
      ```

      ```
      java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
      	at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:57)
      	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:206)
      	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
      	at scala.Option.getOrElse(Option.scala:120)
      	at org.apache.spark.rdd.RDD.partitions(RDD.scala:204)
      	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
      	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:206)
      	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
      	at scala.Option.getOrElse(Option.scala:120)
      	at org.apache.spark.rdd.RDD.partitions(RDD.scala:204)
      	at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
      	at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
      	at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:193)
      	at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:191)
          ...
      ```
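
      A sketch of the fix's shape, consistent with the shortlog's "more concise getOrElse" (exact code and placement assumed): fall back to the EdgeRDD's own partitioning rather than Partitioner.defaultPartitioner, so the shuffled vertex data always has the same number of partitions as the edges.

      ```
      import org.apache.spark.HashPartitioner

      // Inside the EdgeRDDImpl-related code (placement assumed):
      val partitioner = edges.partitioner.getOrElse(
        new HashPartitioner(edges.partitions.length))
      ```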
      
      Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
      
      Closes #4136 from maropu/EdgePartitionBugFix and squashes the following commits:
      
      0cd8942 [Ankur Dave] Use more concise getOrElse
      aad4a2c [Ankur Dave] Add unit test for non-default number of edge partitions
      0a2f32b [Takeshi Yamamuro] Do not use Partitioner.defaultPartitioner as a partitioner of EdgeRDDImpl