  1. Apr 24, 2017
  2. Dec 21, 2016
    • Ryan Williams's avatar
      [SPARK-17807][CORE] split test-tags into test-JAR · afd9bc1d
      Ryan Williams authored
      Remove spark-tags' compile-scope dependency (and, indirectly, spark-core's compile-scope transitive dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.
      
      Alternative to #16303.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #16311 from ryan-williams/tt.
      afd9bc1d
  3. Dec 02, 2016
  4. Nov 29, 2016
    • hyukjinkwon's avatar
      [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genjavadoc compatibility... · f830bb91
      hyukjinkwon authored
      [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genjavadoc compatibility in Java API documentation
      
      ## What changes were proposed in this pull request?
      
      This PR makes `sbt unidoc` complete with Java 8.
      
      This PR roughly includes several fixes as below:
      
      - Fix unrecognisable class and method links in javadoc by changing it from `[[..]]` to `` `...` ``
      
        ```diff
        - * A column that will be computed based on the data in a [[DataFrame]].
        + * A column that will be computed based on the data in a `DataFrame`.
        ```
      
      - Fix throws annotations so that they are recognisable in javadoc
      
      - Fix URL links to `<a href="http..."></a>`.
      
        ```diff
        - * [[http://en.wikipedia.org/wiki/Decision_tree_learning Decision tree]] model for regression.
        + * <a href="http://en.wikipedia.org/wiki/Decision_tree_learning">
        + * Decision tree (Wikipedia)</a> model for regression.
        ```
      
        ```diff
        -   * see http://en.wikipedia.org/wiki/Receiver_operating_characteristic
        +   * see <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic">
        +   * Receiver operating characteristic (Wikipedia)</a>
        ```
      
      - Replace `<` and `>` by either of the following:

        - Write `greater than`/`greater than or equal to` or `less than`/`less than or equal to` where applicable.

        - Wrap the expression in `{{{...}}}` so that it prints in javadoc, or use `{@code ...}` or `{@literal ...}`. Please refer to https://github.com/apache/spark/pull/16013#discussion_r89665558
      
      - Fix `</p>` complaint
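
      As a rough illustration of the fixes listed above, a Javadoc comment that Java 8's stricter checks accept might look like the sketch below. The class `ThresholdFilter` and its members are hypothetical and are not part of Spark; only the comment style is the point.

      ```java
      /**
       * Hypothetical example class (not part of Spark) whose comments follow
       * the conventions described above.
       *
       * Keeps values greater than or equal to a threshold. See
       * <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic">
       * Receiver operating characteristic (Wikipedia)</a> for background.
       */
      final class ThresholdFilter {
        private final double threshold;

        ThresholdFilter(double threshold) {
          this.threshold = threshold;
        }

        /**
         * Returns true when {@code value >= threshold}. Comparison operators are
         * written inside an inline code tag, or as {@literal <} and {@literal >},
         * so that javadoc does not read them as malformed HTML.
         *
         * @param value the value to test
         * @return whether the value passes the filter
         * @throws IllegalArgumentException if {@code value} is NaN
         */
        boolean accepts(double value) {
          if (Double.isNaN(value)) {
            throw new IllegalArgumentException("value must not be NaN");
          }
          return value >= threshold;
        }
      }
      ```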
      
      ## How was this patch tested?
      
      Manually tested by `jekyll build` with Java 7 and 8
      
      ```
      java version "1.7.0_80"
      Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
      Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
      ```
      
      ```
      java version "1.8.0_45"
      Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16013 from HyukjinKwon/SPARK-3359-errors-more.
      f830bb91
  5. Jul 19, 2016
  6. Jul 11, 2016
    • Reynold Xin's avatar
      [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  7. Jun 06, 2016
    • Zheng RuiFeng's avatar
      [MINOR] Fix Typos 'an -> a' · fd8af397
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `an -> a`
      
      Use commands like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, then review them one by one.
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13515 from zhengruifeng/an_a.
      fd8af397
  8. May 17, 2016
  9. May 10, 2016
    • Xin Ren's avatar
      [SPARK-14936][BUILD][TESTS] FlumePollingStreamSuite is slow · 86475520
      Xin Ren authored
      https://issues.apache.org/jira/browse/SPARK-14936
      
      ## What changes were proposed in this pull request?
      
      FlumePollingStreamSuite contains two tests which run for a minute each. This seems excessively slow and we should speed it up if possible.
      
      In this PR, instead of creating each `StreamingContext` directly from `conf`, an underlying `SparkContext` is created once up front and reused to create each `StreamingContext`.

      Running time is reduced by avoiding repeated creation and teardown of `SparkContext`.
      
      ## How was this patch tested?
      
      Tested on my local machine running `testOnly *.FlumePollingStreamSuite`
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #12845 from keypointt/SPARK-14936.
      86475520
  10. Apr 28, 2016
  11. Apr 24, 2016
    • Dongjoon Hyun's avatar
      [SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix lint-java errors · d34d6503
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Spark uses the `NewLineAtEofChecker` rule for Scala via ScalaStyle, and most Java code also complies with the rule. This PR aims to enforce the equivalent rule, `NewlineAtEndOfFile`, explicitly via CheckStyle. It also fixes the lint-java errors introduced since SPARK-14465. The changes are as follows.
      
      - Adds a new line at the end of the files (19 files)
      - Fixes 25 lint-java errors (12 RedundantModifier, 6 **ArrayTypeStyle**, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder)
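
      For readers unfamiliar with the check names, here is a small sketch of code that satisfies three of them (RedundantModifier, ArrayTypeStyle, ModifierOrder). All type and member names are hypothetical and are not taken from the Spark sources.

      ```java
      // Hypothetical names throughout; only the style points matter.
      interface Summable {
        // RedundantModifier: interface methods are implicitly public and abstract,
        // so "public abstract int sum(...)" would be flagged.
        int sum(int[] values);
      }

      final class LintExamples implements Summable {
        // ModifierOrder: "private static final" follows the conventional order,
        // whereas e.g. "final static private" would be flagged.
        private static final int[] EMPTY = new int[0];

        // ArrayTypeStyle: Java-style "int[] values" rather than C-style "int values[]".
        @Override
        public int sum(int[] values) {
          int total = 0;
          for (int v : values) {
            total += v;
          }
          return total;
        }

        static int[] empty() {
          return EMPTY;
        }
      }
      ```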
      
      ## How was this patch tested?
      
      After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins does not run lint-java.)
      ```bash
      $ dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12632 from dongjoon-hyun/SPARK-14868.
      d34d6503
  12. Apr 14, 2016
    • hyukjinkwon's avatar
      [MINOR][SQL] Remove extra anonymous closure within functional transformations · 6fc3dc88
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR removes extra anonymous closure within functional transformations.
      
      For example,
      
      ```scala
      .map(item => {
        ...
      })
      ```
      
      which can be written more simply as:
      
      ```scala
      .map { item =>
        ...
      }
      ```
      
      ## How was this patch tested?
      
      Related unit tests and `sbt scalastyle`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12382 from HyukjinKwon/minor-extra-closers.
      6fc3dc88
  13. Apr 06, 2016
  14. Apr 03, 2016
    • Dongjoon Hyun's avatar
      [SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static analysis results · 3f749f7e
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR contains the following 5 types of maintenance fixes over 59 files (+94 lines, -93 lines).
      - Fix typos (exception/log strings, testcase names, comments) in 44 lines.
      - Fix lint-java errors (MaxLineLength) in 6 lines. (New code after SPARK-14011)
      - Use diamond operators in 40 lines. (New code after SPARK-13702)
      - Remove redundant semicolons in 5 lines.
      - Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala.
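
      As a minimal sketch of the diamond-operator change mentioned above (the class and method names are hypothetical, not code from this PR), the explicit type arguments on the right-hand side are dropped and inferred from the declared type:

      ```java
      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      final class DiamondExample {
        static Map<String, List<Integer>> index() {
          // Before: new HashMap<String, List<Integer>>()
          Map<String, List<Integer>> byKey = new HashMap<>();
          // Before: new ArrayList<Integer>()
          List<Integer> values = new ArrayList<>();
          values.add(1);
          values.add(2);
          byKey.put("a", values);
          return byKey;
        }
      }
      ```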
      
      ## How was this patch tested?
      
      Manual and pass the Jenkins tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12139 from dongjoon-hyun/SPARK-14355.
      3f749f7e
  15. Apr 02, 2016
    • Dongjoon Hyun's avatar
      [MINOR][DOCS] Use multi-line JavaDoc comments in Scala code. · 4a6e78ab
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR aims to convert all Scala-style multiline comments into Java-style multiline comments in Scala code.
      (All comment-only changes over 77 files: +786 lines, −747 lines)
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.
      4a6e78ab
  16. Mar 25, 2016
    • Shixiong Zhu's avatar
      [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark · 24587ce4
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR moves flume back to Spark as per the discussion on the dev mailing list.
      
      ## How was this patch tested?
      
      Existing Jenkins tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11895 from zsxwing/move-flume-back.
      24587ce4
  17. Mar 14, 2016
    • Shixiong Zhu's avatar
      [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt,... · 06dec374
      Shixiong Zhu authored
      [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages
      
      ## What changes were proposed in this pull request?
      
      Currently there are a few sub-projects, each for integrating with different external sources for Streaming.  Now that we have better ability to include external libraries (spark packages) and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages
      
      - streaming-flume
      - streaming-akka
      - streaming-mqtt
      - streaming-zeromq
      - streaming-twitter
      
      They are ancillary packages and, considering the overhead of maintenance, test runs and PR failures, it's better to maintain them outside of Spark. In addition, these projects can have their own release cycles, so we can release them faster.
      
      I have already copied these projects to https://github.com/spark-packages
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11672 from zsxwing/remove-external-pkg.
      06dec374
    • Dongjoon Hyun's avatar
      [MINOR][DOCS] Fix more typos in comments/strings. · acdf2197
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes 135 typos over 107 files:
      * 121 typos in comments
      * 11 typos in testcase names
      * 3 typos in log messages
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11689 from dongjoon-hyun/fix_more_typos.
      acdf2197
  18. Mar 13, 2016
    • Sean Owen's avatar
      [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <->... · 18408528
      Sean Owen authored
      [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
      
      ## What changes were proposed in this pull request?
      
      - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8
      - Same for `InputStreamReader` and `OutputStreamWriter` constructors
      - Standardizes on UTF-8 everywhere
      - Standardizes specifying the encoding with `StandardCharsets.UTF_8`, not the Guava constant or "UTF-8" (which means handling `UnsupportedEncodingException`)
      - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c )
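
      The pattern described in the list above can be sketched as follows. This is an illustrative example with hypothetical class and method names, not code from the patch; it relies only on standard JDK calls.

      ```java
      import java.io.BufferedReader;
      import java.io.ByteArrayInputStream;
      import java.io.IOException;
      import java.io.InputStreamReader;
      import java.nio.charset.StandardCharsets;

      final class Utf8Example {
        static byte[] encode(String s) {
          // Instead of s.getBytes(), which uses the platform default charset.
          return s.getBytes(StandardCharsets.UTF_8);
        }

        static String decode(byte[] bytes) {
          // Instead of new String(bytes), which also uses the platform default.
          return new String(bytes, StandardCharsets.UTF_8);
        }

        static String firstLine(byte[] bytes) throws IOException {
          // Instead of new InputStreamReader(in) without an explicit charset.
          try (BufferedReader reader = new BufferedReader(
              new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
          }
        }
      }
      ```

      Passing a `Charset` object rather than the string "UTF-8" also avoids the checked `UnsupportedEncodingException` that the string-based overloads declare.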
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11657 from srowen/SPARK-13823.
      18408528
  19. Mar 03, 2016
    • Dongjoon Hyun's avatar
      [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule · b5f02d67
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time.
      This issue aims to remove unused imports from Java/Scala code and add the `UnusedImports` checkstyle rule to help developers.
      
      ## How was this patch tested?
      ```
      ./dev/lint-java
      ./build/sbt compile
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11438 from dongjoon-hyun/SPARK-13583.
      b5f02d67
  20. Feb 09, 2016
    • Holden Karau's avatar
      [SPARK-13165][STREAMING] Replace deprecated synchronizedBuffer in streaming · 159198ef
      Holden Karau authored
      Building with Scala 2.11 results in the warning `trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative`. We already use ConcurrentLinkedQueue elsewhere, so let's replace it.
      
      Some notes about how behaviour is different for reviewers:
      The Seq from a SynchronizedBuffer that was implicitly converted would continue to receive updates; however, when we do the same conversion explicitly on the ConcurrentLinkedQueue, this isn't the case. Hence some of the (internal & test) APIs are changed to pass an Iterable. toSeq is safe to use if there are no more updates.
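
      The difference between a copied snapshot and a live view can be sketched in plain Java as below. This is only an analogy to the Scala `toSeq` behaviour described above, and the class and method names are hypothetical.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.ConcurrentLinkedQueue;

      final class QueueViewExample {
        static int count(Iterable<?> items) {
          int n = 0;
          for (Object ignored : items) {
            n++;
          }
          return n;
        }

        /** Returns {snapshot size, live view size} after a later addition. */
        static int[] demo() {
          ConcurrentLinkedQueue<String> received = new ConcurrentLinkedQueue<>();
          received.add("a");

          // Explicit copy: a snapshot that does not see later additions.
          List<String> snapshot = new ArrayList<>(received);
          // Passing the queue itself as an Iterable keeps a live view.
          Iterable<String> live = received;

          received.add("b");
          return new int[] {count(snapshot), count(live)};
        }
      }
      ```

      Here the snapshot still holds one element while the live view sees two, which is why code that expects later updates needs the `Iterable` rather than a copy.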
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #11067 from holdenk/SPARK-13165-replace-deprecated-synchronizedBuffer-in-streaming.
      159198ef
  21. Jan 30, 2016
    • Josh Rosen's avatar
      [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
      289373b2
  22. Jan 11, 2016
  23. Jan 05, 2016
  24. Dec 31, 2015
  25. Dec 19, 2015
  26. Dec 04, 2015
    • Shixiong Zhu's avatar
      [SPARK-12084][CORE] Fix codes that uses ByteBuffer.array incorrectly · 3af53e61
      Shixiong Zhu authored
      `ByteBuffer` doesn't guarantee that all contents of `ByteBuffer.array` are valid, e.g., for a ByteBuffer returned by `ByteBuffer.slice`. We should not use the whole content of the backing array unless we know that's correct.
      
      This patch fixed all places that use `ByteBuffer.array` incorrectly.
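
      To make the pitfall concrete, here is a minimal sketch with hypothetical helper names (not necessarily how the patch itself fixes each call site). A sliced buffer shares its parent's backing array, so a safe copy has to honor `arrayOffset()`, `position()` and `remaining()`.

      ```java
      import java.nio.ByteBuffer;
      import java.util.Arrays;

      final class ByteBufferCopy {
        /** Copies only the readable bytes of the buffer, leaving its position untouched. */
        static byte[] toBytes(ByteBuffer buf) {
          if (buf.hasArray()) {
            int start = buf.arrayOffset() + buf.position();
            return Arrays.copyOfRange(buf.array(), start, start + buf.remaining());
          }
          byte[] out = new byte[buf.remaining()];
          buf.duplicate().get(out); // read via a duplicate so the caller's position is unchanged
          return out;
        }

        /** Builds a 3-byte slice over a 6-byte array. */
        static ByteBuffer sliceOfThree() {
          ByteBuffer whole = ByteBuffer.wrap(new byte[] {0, 1, 2, 3, 4, 5});
          whole.position(2);
          whole.limit(5);
          // The slice exposes bytes 2..4 only, but array() still returns all 6 bytes.
          return whole.slice();
        }
      }
      ```

      Calling `array()` on the slice returns the full six-byte array, whereas `toBytes` returns only the three bytes the slice actually covers.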
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10083 from zsxwing/bytebuffer-array.
      3af53e61
  27. Nov 13, 2015
  28. Nov 10, 2015
    • Tathagata Das's avatar
      [SPARK-11361][STREAMING] Show scopes of RDD operations inside... · 6600786d
      Tathagata Das authored
      [SPARK-11361][STREAMING] Show scopes of RDD operations inside DStream.foreachRDD and DStream.transform in DAG viz
      
      Currently, when a DStream sets the scope for the RDDs generated by it, that scope is not allowed to be overridden by the RDD operations. So in the case of `DStream.foreachRDD`, all the RDDs generated inside the foreachRDD get the same scope - `foreachRDD  <time>`, as set by the `ForeachDStream`. This makes it hard to debug generated RDDs in the RDD DAG viz in the Spark UI.
      
      This patch allows the RDD operations inside `DStream.transform` and `DStream.foreachRDD` to append their own scopes to the earlier DStream scope.
      
      I have also slightly tweaked how callsites are set such that the short callsite reflects the RDD operation name and line number. This tweak is necessary as callsites are not managed through scopes (which support nesting and overriding) and I didn't want to add another local property to control nesting and overriding of callsites.
      
      ## Before:
      ![image](https://cloud.githubusercontent.com/assets/663212/10808548/fa71c0c4-7da9-11e5-9af0-5737793a146f.png)
      
      ## After:
      ![image](https://cloud.githubusercontent.com/assets/663212/10808659/37bc45b6-7dab-11e5-8041-c20be6a9bc26.png)
      
      The code that was used to generate this is:
      ```
          val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
          val words = lines.flatMap(_.split(" "))
          val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
          wordCounts.foreachRDD { rdd =>
            val temp = rdd.map { _ -> 1 }.reduceByKey( _ + _)
            val temp2 = temp.map { _ -> 1}.reduceByKey(_ + _)
            val count = temp2.count
            println(count)
          }
      ```
      
      Note
      - The inner scopes of the RDD operations map/reduceByKey inside foreachRDD are visible
      - The short callsites of stages refer to the line numbers of the RDD ops rather than the same line number of foreachRDD in all three cases.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #9315 from tdas/SPARK-11361.
      6600786d
  29. Oct 08, 2015
  30. Oct 07, 2015
  31. Sep 15, 2015
  32. Sep 09, 2015
    • Luc Bourlier's avatar
      [SPARK-10227] fatal warnings with sbt on Scala 2.11 · c1bc4f43
      Luc Bourlier authored
      The bulk of the changes concern the `transient` annotation on class parameters. Often the compiler doesn't generate a field for these parameters, so the transient annotation would be unnecessary.
      But if the class parameters are used in methods, then fields are created. So it is safer to keep the annotations.
      
      The remainder are some potential bugs, and deprecated syntax.
      
      Author: Luc Bourlier <luc.bourlier@typesafe.com>
      
      Closes #8433 from skyluc/issue/sbt-2.11.
      c1bc4f43
  33. Aug 25, 2015
  34. Aug 24, 2015
    • Tathagata Das's avatar
      [SPARK-9791] [PACKAGE] Change private class to private[package] class to prevent... · 7478c8b6
      Tathagata Das authored
      [SPARK-9791] [PACKAGE] Change private class to private[package] class to prevent unnecessary classes from showing up in the docs
      
      In addition, some random cleanup of import ordering
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #8387 from tdas/SPARK-9791 and squashes the following commits:
      
      67f3ee9 [Tathagata Das] Change private class to private[package] class to prevent them from showing up in the docs
      7478c8b6
  35. Jul 01, 2015
    • zsxwing's avatar
      [SPARK-8378] [STREAMING] Add the Python API for Flume · 75b9fe4c
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6830 from zsxwing/flume-python and squashes the following commits:
      
      78dfdac [zsxwing] Fix the compile error in the test code
      f1bf3c0 [zsxwing] Address TD's comments
      0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
      e93736b [zsxwing] Fix the test case for determine_modules_to_test
      9d5821e [zsxwing] Fix pyspark_core dependencies
      f9ee681 [zsxwing] Merge branch 'master' into flume-python
      7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
      b96b0de [zsxwing] Merge branch 'master' into flume-python
      ce85e83 [zsxwing] Fix incompatible issues for Python 3
      01cbb3d [zsxwing] Add import sys
      152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
      14ba0ff [zsxwing] Add flume-assembly for sbt building
      b8d5551 [zsxwing] Merge branch 'master' into flume-python
      4762c34 [zsxwing] Fix the doc
      0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
      9f33873 [zsxwing] Add the Python API for Flume
      75b9fe4c