  1. Apr 06, 2016
    • Victor Chima's avatar
      Added omitted word in error message · 24015199
      Victor Chima authored
      ## What changes were proposed in this pull request?
      
      Added an omitted word in the error message displayed by the GraphX Pregel API when `maxIterations <= 0`.
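
      A minimal sketch of this kind of argument check, in plain Scala with no GraphX dependency. The object name `IterationGuard`, the method `runIterations`, and the message text are illustrative assumptions, not the actual Pregel source:

      ```scala
      object IterationGuard {
        // Hypothetical stand-in for an API that takes an iteration cap.
        def runIterations(maxIterations: Int): Int = {
          // `require` throws IllegalArgumentException when the condition is false,
          // with the message prefixed by "requirement failed: ".
          require(maxIterations > 0,
            s"Maximum number of iterations must be greater than 0, but got $maxIterations")
          maxIterations
        }
      }
      ```

      Interpolating the offending value into the message makes a bad call easier to diagnose from the stack trace alone.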
      
      ## How was this patch tested?
      
      Manual test
      
      Author: Victor Chima <blazy2k9@gmail.com>
      
      Closes #12205 from blazy2k9/hotfix/pregel-error-message.
      24015199
  2. Apr 02, 2016
    • Dongjoon Hyun's avatar
      [MINOR][DOCS] Use multi-line JavaDoc comments in Scala code. · 4a6e78ab
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR converts all Scala-style multiline comments into Java-style multiline comments in Scala code.
      (All comment-only changes over 77 files: +786 lines, −747 lines)
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.
      4a6e78ab
  3. Mar 28, 2016
    • Dongjoon Hyun's avatar
      [SPARK-14219][GRAPHX] Fix `pickRandomVertex` not to fall into infinite loops... · 289257c4
      Dongjoon Hyun authored
      [SPARK-14219][GRAPHX] Fix `pickRandomVertex` not to fall into infinite loops for graphs with one vertex
      
      ## What changes were proposed in this pull request?
      
      Currently, `GraphOps.pickRandomVertex()` falls into infinite loops for graphs having only one vertex. This PR fixes it by modifying the following termination-checking condition.
      ```scala
      -      if (selectedVertices.count > 1) {
      +      if (selectedVertices.count > 0) {
      ```
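
      The effect of the termination condition can be sketched on a plain Scala collection. This is an analogue under stated assumptions, not the GraphX implementation: `PickRandomSketch`, `pickRandom`, and the sampling probability are hypothetical, and a `Seq[Long]` stands in for the vertex RDD.

      ```scala
      import scala.util.Random

      object PickRandomSketch {
        // Sample candidates at random and retry until at least one is selected.
        def pickRandom(ids: Seq[Long], rng: Random): Long = {
          require(ids.nonEmpty, "cannot pick from an empty collection")
          // Assumed sampling rate, capped at 1.0 for small inputs.
          val probability = math.min(1.0, 50.0 / ids.size)
          var result: Option[Long] = None
          while (result.isEmpty) {
            val selected = ids.filter(_ => rng.nextDouble() < probability)
            // With `selected.size > 1` a single-element input could never
            // satisfy the check, so the loop would spin forever.
            if (selected.nonEmpty) {
              result = Some(selected(rng.nextInt(selected.size)))
            }
          }
          result.get
        }
      }
      ```

      With one element the sample can contain at most one candidate, which is why the check has to accept a sample of size one.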
      
      ## How was this patch tested?
      
      Pass the Jenkins tests (including new test case).
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12018 from dongjoon-hyun/SPARK-14219.
      289257c4
  4. Mar 26, 2016
    • Dongjoon Hyun's avatar
      [MINOR] Fix newly added java-lint errors · 18084658
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes some newly added java-lint errors (unused-imports, line-length).
      
      ## How was this patch tested?
      
      Pass the Jenkins tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11968 from dongjoon-hyun/SPARK-14167.
      18084658
  5. Mar 17, 2016
  6. Mar 16, 2016
  7. Mar 14, 2016
    • Dongjoon Hyun's avatar
      [MINOR][DOCS] Fix more typos in comments/strings. · acdf2197
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes 135 typos over 107 files:
      * 121 typos in comments
      * 11 typos in test case names
      * 3 typos in log messages
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11689 from dongjoon-hyun/fix_more_typos.
      acdf2197
  8. Mar 13, 2016
    • Sean Owen's avatar
      [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <->... · 18408528
      Sean Owen authored
      [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
      
      ## What changes were proposed in this pull request?
      
      - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8
      - Same for `InputStreamReader` and `OutputStreamWriter` constructors
      - Standardizes on UTF-8 everywhere
      - Standardizes specifying the encoding with `StandardCharsets.UTF_8`, not the Guava constant or "UTF-8" (which means handling `UnsupportedEncodingException`)
      - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c )
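
      The pattern described in the list above can be sketched as follows. The object and method names are hypothetical; the calls themselves are standard JDK APIs.

      ```scala
      import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
      import java.nio.charset.StandardCharsets

      object CharsetSketch {
        // Encode and decode with an explicit charset instead of the platform default.
        def roundTrip(text: String): String = {
          val bytes: Array[Byte] = text.getBytes(StandardCharsets.UTF_8)
          new String(bytes, StandardCharsets.UTF_8)
        }

        // Readers take the same constant, so no checked exception has to be
        // handled, unlike the overloads that take the charset name as a String.
        def firstLine(bytes: Array[Byte]): String = {
          val reader = new BufferedReader(
            new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))
          try reader.readLine() finally reader.close()
        }
      }
      ```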
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11657 from srowen/SPARK-13823.
      18408528
  9. Mar 03, 2016
    • Dongjoon Hyun's avatar
      [MINOR] Fix typos in comments and testcase name of code · 941b270b
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes typos in comments and test case names.
      
      ## How was this patch tested?
      
      manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.
      941b270b
    • Dongjoon Hyun's avatar
      [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule · b5f02d67
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time.
      This issue aims to remove unused imports from Java/Scala code and add the `UnusedImports` checkstyle rule to help developers.
      
      ## How was this patch tested?
      ```
      ./dev/lint-java
      ./build/sbt compile
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11438 from dongjoon-hyun/SPARK-13583.
      b5f02d67
    • Sean Owen's avatar
      [SPARK-13423][WIP][CORE][SQL][STREAMING] Static analysis fixes for 2.x · e97fc7f1
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Make some cross-cutting code improvements according to static analysis. These are individually up for discussion since they exist in separate commits that can be reverted. The changes are broadly:
      
      - Inner class should be static
      - Mismatched hashCode/equals
      - Overflow in compareTo
      - Unchecked warnings
      - Misuse of assert, vs junit.assert
      - get(a) + getOrElse(b) -> getOrElse(a,b)
      - Array/String .size -> .length (occasionally, -> .isEmpty / .nonEmpty) to avoid implicit conversions
      - Dead code
      - tailrec
      - exists(_ == ) -> contains
      - find + nonEmpty -> exists
      - filter + size -> count
      - reduce(_+_) -> sum
      - map + flatten -> flatMap
      
      The most controversial may be .size -> .length simply because of its size. It is intended to avoid implicits that might be expensive in some places.
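
      A few of the collection rewrites listed above, sketched on small sample values (the object and value names are hypothetical). The last one is written with `flatMap`, the standard replacement for `map` followed by `flatten`.

      ```scala
      object CollectionIdioms {
        val xs: Seq[Int] = Seq(1, 2, 3, 4)
        val nested: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3))
        val lookup: Map[String, Int] = Map("a" -> 1)

        // instead of lookup.get("b").getOrElse(0)
        val missing: Int = lookup.getOrElse("b", 0)
        // instead of xs.exists(_ == 3)
        val hasThree: Boolean = xs.contains(3)
        // instead of xs.find(_ % 2 == 0).nonEmpty
        val anyEven: Boolean = xs.exists(_ % 2 == 0)
        // instead of xs.filter(_ % 2 == 0).size
        val evenCount: Int = xs.count(_ % 2 == 0)
        // instead of xs.reduce(_ + _)
        val total: Int = xs.sum
        // instead of nested.map(_.map(_ * 2)).flatten
        val doubled: Seq[Int] = nested.flatMap(_.map(_ * 2))
        // Array#length is a direct field read; Array#size goes through an implicit wrapper.
        val arrayLength: Int = Array(1, 2, 3).length
      }
      ```

      One behavioral difference to keep in mind: `reduce` throws on an empty collection, while `sum` returns zero.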
      
      ## How was this patch tested?
      
      Existing Jenkins unit tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11292 from srowen/SPARK-13423.
      e97fc7f1
  10. Feb 22, 2016
  11. Feb 21, 2016
  12. Feb 20, 2016
  13. Feb 15, 2016
  14. Jan 30, 2016
    • Josh Rosen's avatar
      [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
      289373b2
  15. Jan 15, 2016
    • Jason Lee's avatar
      [SPARK-12655][GRAPHX] GraphX does not unpersist RDDs · d0a5c32b
      Jason Lee authored
      Some VertexRDDs and EdgeRDDs are created during the intermediate steps of g.connectedComponents() but are unnecessarily left cached after the method returns. The fix is to unpersist these RDDs once they are no longer in use.
      
      A test case is added to confirm the fix for the reported bug.
      
      Author: Jason Lee <cjlee@us.ibm.com>
      
      Closes #10713 from jasoncl/SPARK-12655.
      d0a5c32b
  16. Jan 10, 2016
  17. Jan 06, 2016
    • Kousuke Saruta's avatar
      [SPARK-12665][CORE][GRAPHX] Remove Vector, VectorSuite and... · 94c202c7
      Kousuke Saruta authored
      [SPARK-12665][CORE][GRAPHX] Remove Vector, VectorSuite and GraphKryoRegistrator which are deprecated and no longer used
      
      The code in Vector.scala, VectorSuite.scala and GraphKryoRegistrator.scala is no longer used, so it's time to remove these files in Spark 2.0.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10613 from sarutak/SPARK-12665.
      94c202c7
  18. Jan 05, 2016
  19. Dec 30, 2015
  20. Dec 21, 2015
  21. Dec 19, 2015
  22. Dec 04, 2015
    • Josh Rosen's avatar
      [SPARK-12112][BUILD] Upgrade to SBT 0.13.9 · b7204e1d
      Josh Rosen authored
      We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin).
      
      I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
      b7204e1d
  23. Nov 12, 2015
    • Gaurav Kumar's avatar
      Fixed error in scaladoc of convertToCanonicalEdges · df0e3181
      Gaurav Kumar authored
      The code of convertToCanonicalEdges produces edges whose srcIds are smaller than their dstIds, but the scaladoc suggested otherwise. This patch fixes the scaladoc.
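
      The canonical orientation can be sketched on plain pairs. This is an illustration only, not the GraphX method: `CanonicalEdges.canonicalize` is a hypothetical helper that works on `(src, dst)` tuples and simply drops duplicates rather than merging edge attributes.

      ```scala
      object CanonicalEdges {
        // Reorient each (src, dst) pair so that src <= dst, then drop duplicates.
        // Self-loops (src == dst) are left unchanged.
        def canonicalize(edges: Seq[(Long, Long)]): Seq[(Long, Long)] =
          edges.map { case (src, dst) =>
            if (src < dst) (src, dst) else (dst, src)
          }.distinct
      }
      ```

      After canonicalization, `(2, 1)` and `(1, 2)` collapse into the single pair `(1, 2)`.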
      
      Author: Gaurav Kumar <gauravkumar37@gmail.com>
      
      Closes #9666 from gauravkumar37/patch-1.
      df0e3181
  24. Nov 11, 2015
  25. Nov 02, 2015
  26. Oct 07, 2015
  27. Sep 15, 2015
    • Reynold Xin's avatar
      Update version to 1.6.0-SNAPSHOT. · 09b7e7c1
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8350 from rxin/1.6.
      09b7e7c1
    • Robin East's avatar
      [SPARK-10598] [DOCS] · 6503c4b5
      Robin East authored
      The comments preceding the toMessage method state: "The edge partition is encoded in the lower 30 bytes of the Int, and the position is encoded in the upper 2 bytes of the Int." References to bytes should be changed to bits.
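
      A sketch of this layout, assuming a 32-bit `Int` with the partition id in the lower 30 bits and the position in the upper 2 bits. The names `RoutingBits`, `pack`, `pid`, and `position` are hypothetical and are not taken from the GraphX source.

      ```scala
      object RoutingBits {
        private val PidMask = 0x3FFFFFFF // lower 30 bits set

        // Pack a partition id (30 bits) and a position (2 bits) into one Int.
        def pack(pid: Int, position: Int): Int = {
          require(pid >= 0 && pid <= PidMask, "pid must fit in 30 bits")
          require(position >= 0 && position <= 3, "position must fit in 2 bits")
          (position << 30) | pid
        }

        // Mask off the upper 2 bits to recover the partition id.
        def pid(packed: Int): Int = packed & PidMask

        // Unsigned shift, so positions 2 and 3 (sign bit set) come back as 2 and 3.
        def position(packed: Int): Int = packed >>> 30
      }
      ```

      An `Int` has only 4 bytes, so "30 bytes" could not have been literal; the split is 30 + 2 = 32 bits.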
      
      This contribution is my original work and I license the work to the Spark project under its open source license.
      
      Author: Robin East <robin.east@xense.co.uk>
      
      Closes #8756 from insidedctm/master.
      6503c4b5
  28. Sep 14, 2015
  29. Sep 09, 2015
    • Luc Bourlier's avatar
      [SPARK-10227] fatal warnings with sbt on Scala 2.11 · c1bc4f43
      Luc Bourlier authored
      The bulk of the changes concern the `transient` annotation on class parameters. Often the compiler doesn't generate a field for these parameters, so the transient annotation would be unnecessary.
      But if the class parameters are used in methods, then fields are created. So it is safer to keep the annotations.
      
      The remainder are some potential bugs, and deprecated syntax.
      
      Author: Luc Bourlier <luc.bourlier@typesafe.com>
      
      Closes #8433 from skyluc/issue/sbt-2.11.
      c1bc4f43
  30. Aug 14, 2015
  31. Aug 04, 2015
  32. Jul 29, 2015
  33. Jul 17, 2015
    • tien-dungle's avatar
      [SPARK-9109] [GRAPHX] Keep the cached edge in the graph · 587c315b
      tien-dungle authored
      The change here is to keep the cached RDDs in the graph object so that when the graph.unpersist() is called these RDDs are correctly unpersisted.
      
      ```scala
      import org.apache.spark.graphx._
      import org.apache.spark.rdd.RDD
      import org.slf4j.LoggerFactory
      import org.apache.spark.graphx.util.GraphGenerators
      
      // Create an RDD for the vertices
      val users: RDD[(VertexId, (String, String))] =
        sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
                             (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
      // Create an RDD for edges
      val relationships: RDD[Edge[String]] =
        sc.parallelize(Array(Edge(3L, 7L, "collab"),    Edge(5L, 3L, "advisor"),
                             Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
      // Define a default user in case there are relationships with a missing user
      val defaultUser = ("John Doe", "Missing")
      // Build the initial Graph
      val graph = Graph(users, relationships, defaultUser)
      graph.cache().numEdges
      
      graph.unpersist()
      
      sc.getPersistentRDDs.foreach( r => println( r._2.toString))
      ```
      
      Author: tien-dungle <tien-dung.le@realimpactanalytics.com>
      
      Closes #7469 from tien-dungle/SPARK-9109_Graphx-unpersist and squashes the following commits:
      
      8d87997 [tien-dungle] Keep the cached edge in the graph
      587c315b
  34. Jul 14, 2015
    • Josh Rosen's avatar
      [SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix existing uses · 11e5c372
      Josh Rosen authored
      This pull request adds a Scalastyle regex rule which fails the style check if `Class.forName` is used directly.  `Class.forName` always loads classes from the default / system classloader, but in a majority of cases, we should be using Spark's own `Utils.classForName` instead, which tries to load classes from the current thread's context classloader and falls back to the classloader which loaded Spark when the context classloader is not defined.
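
      The lookup order described above can be sketched as follows. This is an approximation written from that description, not a copy of Spark's `Utils.classForName`; the enclosing object name is hypothetical.

      ```scala
      object ClassLoading {
        // Prefer the current thread's context classloader; fall back to the
        // loader that loaded this object when no context loader is set.
        def classForName(name: String): Class[_] = {
          val loader = Option(Thread.currentThread().getContextClassLoader)
            .getOrElse(getClass.getClassLoader)
          // Three-argument overload: class name, initialize flag, explicit loader.
          Class.forName(name, true, loader)
        }
      }
      ```

      Passing the loader explicitly is what separates this from the one-argument `Class.forName(name)` call that the style rule bans.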
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7350 from JoshRosen/ban-Class.forName and squashes the following commits:
      
      e3e96f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      c0b7885 [Josh Rosen] Hopefully fix the last two cases
      d707ba7 [Josh Rosen] Fix uses of Class.forName that I missed in my first cleanup pass
      046470d [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      62882ee [Josh Rosen] Fix uses of Class.forName or add exclusion.
      d9abade [Josh Rosen] Add stylechecker rule to ban uses of Class.forName
      11e5c372
    • Andrew Ray's avatar
      [SPARK-8718] [GRAPHX] Improve EdgePartition2D for non perfect square number of partitions · 0a4071ea
      Andrew Ray authored
      See https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb
      
      Author: Andrew Ray <ray.andrew@gmail.com>
      
      Closes #7104 from aray/edge-partition-2d-improvement and squashes the following commits:
      
      3729f84 [Andrew Ray] correct bounds and remove unneeded comments
      97f8464 [Andrew Ray] change less
      5141ab4 [Andrew Ray] Merge branch 'master' into edge-partition-2d-improvement
      925fd2c [Andrew Ray] use new interface for partitioning
      001bfd0 [Andrew Ray] Refactor PartitionStrategy so that we can return a partition function for a given number of parts. To keep compatibility we define default methods that translate between the two implementation options. Made EdgePartition2D use old strategy when we have a perfect square and implement new interface.
      5d42105 [Andrew Ray] % -> /
      3560084 [Andrew Ray] Merge branch 'master' into edge-partition-2d-improvement
      f006364 [Andrew Ray] remove unneeded comments
      cfa2c5e [Andrew Ray] Modifications to EdgePartition2D so that it works for non perfect squares.
      0a4071ea
  35. Jul 10, 2015
    • Jonathan Alter's avatar
      [SPARK-7977] [BUILD] Disallowing println · e14b545d
      Jonathan Alter authored
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7093 from jonalter/SPARK-7977 and squashes the following commits:
      
      ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite
      7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite
      10724b6 [Jonathan Alter] Changing some printlns to logs in tests
      eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0b1dcb4 [Jonathan Alter] More println cleanup
      aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0c16fa3 [Jonathan Alter] Replacing some printlns with logs
      45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      5c8e283 [Jonathan Alter] Allowing println in audit-release examples
      5b50da1 [Jonathan Alter] Allowing printlns in example files
      ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      83ab635 [Jonathan Alter] Fixing new printlns
      54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns
      b837c3a [Jonathan Alter] Disallowing println
      e14b545d