  1. Sep 05, 2017
    • hyukjinkwon's avatar
      [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0. · 7f3c6ff4
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      1.0.0 fixes issues with the import order, explicit types for public methods, line length limit, and comment validation checks:
      
      ```
      [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala:50:16: Are you sure you want to println? If yes, wrap the code block with
      [error]       // scalastyle:off println
      [error]       println(...)
      [error]       // scalastyle:on println
      [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala:49: File line length exceeds 100 characters
      [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala:22:21: Are you sure you want to println? If yes, wrap the code block with
      [error]       // scalastyle:off println
      [error]       println(...)
      [error]       // scalastyle:on println
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:35:6: Public method must have explicit type
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:51:6: Public method must have explicit type
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:93:15: Public method must have explicit type
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:98:15: Public method must have explicit type
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:47:2: Insert a space after the start of the comment
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:26:43: JavaDStream should come before JavaDStreamLike.
      ```
      
      This PR also replaces the workaround added in SPARK-16877 with Scalastyle's `org.scalastyle.scalariform.OverrideJavaChecker`, which has been available since 0.9.0.
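      
      For illustration, here is a minimal sketch of code that satisfies two of the rules reported above: a public method with an explicit result type, and an intentional `println` wrapped in the `// scalastyle:off println` / `// scalastyle:on println` markers that the error message suggests. The object and method names are hypothetical.
      
      ```scala
      object StyleCompliantExample {
        // Public method with an explicit result type
        // (avoids "Public method must have explicit type").
        def greeting(name: String): String = s"Hello, $name"
      
        def main(args: Array[String]): Unit = {
          // Intentional console output, wrapped in the suppression markers
          // suggested by "Are you sure you want to println?".
          // scalastyle:off println
          println(greeting("Spark"))
          // scalastyle:on println
        }
      }
      ```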
      
      ## How was this patch tested?
      
      Manually tested.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #19116 from HyukjinKwon/scalastyle-1.0.0.
      7f3c6ff4
  2. May 17, 2017
  3. Dec 13, 2016
    • Shixiong Zhu's avatar
      [SPARK-13747][CORE] Fix potential ThreadLocal leaks in RPC when using ForkJoinPool · fb3081d3
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Some code paths in SQL may call `RpcEndpointRef.askWithRetry` (e.g., ParquetFileFormat.buildReader -> SparkContext.broadcast -> ... -> BlockManagerMaster.updateBlockInfo -> RpcEndpointRef.askWithRetry), which eventually calls `Await.result`. When this runs in a Scala ForkJoinPool, it may cause `java.lang.IllegalArgumentException: spark.sql.execution.id is already set`.
      
      This PR includes the following changes to fix this issue:
      
      - Remove `ThreadUtils.awaitResult`
      - Rename `ThreadUtils.awaitResultInForkJoinSafely` to `ThreadUtils.awaitResult`
      - Replace `Await.result` in RpcTimeout with `ThreadUtils.awaitResult`.
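      
      The idea behind the ForkJoin-safe helper can be sketched as follows, assuming the standard `Future` implementation: calling `Awaitable.result` directly skips the `blocking { ... }` wrapper that `Await.result` may add around the wait. The object name is hypothetical, and passing a `null` permit is a workaround that relies on the standard `Future` ignoring it, not a documented API contract.
      
      ```scala
      import scala.concurrent.{Awaitable, CanAwait}
      import scala.concurrent.duration.Duration
      
      object ForkJoinSafeAwait {
        def awaitResult[T](awaitable: Awaitable[T], atMost: Duration): T = {
          // CanAwait is only a compile-time token; the standard Future implementation
          // does not read it, so a null permit is assumed to be safe here.
          val permit = null.asInstanceOf[CanAwait]
          // Calling Awaitable.result directly avoids the `blocking { ... }` wrapper
          // that Await.result may put around the wait.
          awaitable.result(atMost)(permit)
        }
      }
      ```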
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16230 from zsxwing/fix-SPARK-13747.
      fb3081d3
  4. Nov 04, 2016
    • Josh Rosen's avatar
      [SPARK-18256] Improve the performance of event log replay in HistoryServer · 0e3312ee
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      This patch significantly improves the performance of event log replay in the HistoryServer via two simple changes:
      
      - **Don't use `extractOpt`**: it turns out that `json4s`'s `extractOpt` method uses exceptions for control flow, causing huge performance bottlenecks due to the overhead of initializing exceptions. To avoid this overhead, we can simply use our own `Utils.jsonOption` method. This patch replaces all uses of `extractOpt` with `Utils.jsonOption` and adds a style checker rule to ban the use of the slow `extractOpt` method.
      - **Don't call `Utils.getFormattedClassName` for every event**: the old code called `Utils.getFormattedClassName` dozens of times per replayed event in order to match up class names in events with SparkListener event names. By simply storing the results of these calls in constants rather than recomputing them, we're able to eliminate a huge performance hotspot by removing thousands of expensive `Class.getSimpleName` calls.
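      
      The second change amounts to moving per-class work out of the per-event path. Here is a minimal sketch under stated assumptions: the event classes, `formattedClassName` (a hypothetical stand-in for `Utils.getFormattedClassName`, derived here from `Class.getName`), and the handler labels are all illustrative, not Spark's actual code.
      
      ```scala
      object EventReplay {
        sealed trait Event
        final case class JobStart(jobId: Int) extends Event
        final case class JobEnd(jobId: Int) extends Event
      
        // Hypothetical helper: strips package and enclosing-object prefixes
        // from the runtime class name (e.g. "EventReplay$JobStart" -> "JobStart").
        def formattedClassName(cls: Class[_]): String = {
          val name = cls.getName
          name.substring(math.max(name.lastIndexOf('.'), name.lastIndexOf('$')) + 1)
        }
      
        // Computed once when the object is initialized, not once per replayed event.
        private val JobStartName = formattedClassName(classOf[JobStart])
        private val JobEndName = formattedClassName(classOf[JobEnd])
      
        // Maps an event type name read from the log to a handler label.
        // Upper-case vals act as stable identifiers in patterns.
        def handlerFor(eventType: String): Option[String] = eventType match {
          case JobStartName => Some("onJobStart")
          case JobEndName => Some("onJobEnd")
          case _ => None
        }
      }
      ```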
      
      ## How was this patch tested?
      
      Tested by profiling the replay of a long event log using YourKit. For an event log containing 1000+ jobs, each of which had thousands of tasks, the changes in this patch cut the replay time in half:
      
      ![image](https://cloud.githubusercontent.com/assets/50748/19980953/31154622-a1bd-11e6-9be4-21fbb9b3f9a7.png)
      
      Prior to this patch's changes, the two slowest methods in log replay were internal exceptions thrown by `json4s` and calls to `Class.getSimpleName()`:
      
      ![image](https://cloud.githubusercontent.com/assets/50748/19981052/87416cce-a1bd-11e6-9f25-06a7cd391822.png)
      
      After this patch, these hotspots are completely eliminated.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #15756 from JoshRosen/speed-up-jsonprotocol.
      0e3312ee
  5. Oct 26, 2016
    • Shixiong Zhu's avatar
      [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL · 7ac70e7b
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Calling `Await.result` will allow other tasks to be run on the same thread when using ForkJoinPool. However, SQL uses a `ThreadLocal` execution id to trace Spark jobs launched by a query, which doesn't work perfectly in ForkJoinPool.
      
      This PR uses `Awaitable.result` instead, to prevent ForkJoinPool from running other tasks on the current waiting thread.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #15520 from zsxwing/SPARK-13747.
      7ac70e7b
  6. Aug 04, 2016
    • hyukjinkwon's avatar
      [SPARK-16877][BUILD] Add rules for preventing to use Java annotations (Deprecated and Override) · 1d781572
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR adds two rules that prevent the use of Java's `Deprecated` and `Override` annotations.
      
      - Java's `Override`
        It seems the Scala compiler just ignores this. The `override` modifier is only mandatory for members "that override some other **concrete member definition** in a parent class", but not for **incomplete member definitions** (such as abstract members of a trait or abstract class); see http://www.scala-lang.org/files/archive/spec/2.11/05-classes-and-objects.html#override
      
        For a simple example:
      
        - Normal class - needs `override` modifier
      
        ```scala
        scala> class A { def say = {}}
        defined class A
      
        scala> class B extends A { def say = {}}
        <console>:8: error: overriding method say in class A of type => Unit;
         method say needs `override' modifier
               class B extends A { def say = {}}
                                       ^
        ```
      
        - Trait - does not need `override` modifier
      
        ```scala
        scala> trait A { def say }
        defined trait A
      
        scala> class B extends A { def say = {}}
        defined class B
        ```
      
        In short, the following is accepted:
      
        ```scala
        scala> class B extends A {
             |    @Override
             |    def say = {}
             | }
        defined class B
        ```
        Here we can write the `@Override` annotation, which means nothing to the Scala compiler and might mislead engineers into thinking that Java's annotation is working. It would be good to prevent this potential confusion.
      
      - Java's `Deprecated`
        When `@Deprecated` is used, the Scala compiler seems to recognise it correctly, but the codebase consistently uses Scala's own `@deprecated` annotation instead.
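      
      A minimal sketch of the Scala-native alternatives these rules steer towards: the `override` modifier instead of `@Override`, and Scala's `@deprecated(message, since)` instead of `@Deprecated`. The trait, classes, and the version string are hypothetical.
      
      ```scala
      trait Greeter {
        def say(): String
      }
      
      // `override` is optional when implementing an abstract member, but when present
      // the compiler rejects the definition if it overrides nothing (e.g. after `say`
      // is renamed in the trait). Java's @Override gets no such check from scalac.
      class LoudGreeter extends Greeter {
        override def say(): String = "HELLO"
      }
      
      object LegacyGreeter {
        // Scala's own annotation takes a message and a "since" version string.
        @deprecated("Use LoudGreeter instead", "2.0.0")
        def oldSay(): String = "hello"
      }
      ```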
      
      ## How was this patch tested?
      
      Manually tested, by inserting both `@Override` and `@Deprecated`. This shows the following error messages:
      
      ```bash
      Scalastyle checks failed at following occurrences:
      [error] ... : deprecated should be used instead of java.lang.Deprecated.
      ```
      
      ```bash
      Scalastyle checks failed at following occurrences:
      [error] ... : override modifier should be used instead of java.lang.Override.
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #14490 from HyukjinKwon/SPARK-16877.
      1d781572
  7. Jun 24, 2016
  8. Apr 22, 2016
    • Joan's avatar
      [SPARK-6429] Implement hashCode and equals together · bf95b8da
      Joan authored
      ## What changes were proposed in this pull request?
      
      Implement `hashCode` and `equals` together in several classes in order to enable the corresponding Scalastyle check.
      This is a first batch; I will continue implementing them, but I wanted to hear your thoughts first.
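      
      A minimal sketch of the pattern, using a hypothetical `BlockKey` class: both methods are overridden together and derived from the same fields, so equal instances always have equal hash codes, which is the contract a check requiring the pair is meant to protect.
      
      ```scala
      // Hypothetical class; equals and hashCode are overridden as a pair.
      final class BlockKey(val rddId: Int, val splitIndex: Int) {
        override def equals(other: Any): Boolean = other match {
          case that: BlockKey => rddId == that.rddId && splitIndex == that.splitIndex
          case _ => false
        }
      
        // Uses exactly the fields that equals compares.
        override def hashCode(): Int = 31 * rddId + splitIndex
      }
      ```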
      
      Author: Joan <joan@goyeau.com>
      
      Closes #12157 from joan38/SPARK-6429-HashCode-Equals.
      bf95b8da
  9. Apr 19, 2016
    • Josh Rosen's avatar
      [SPARK-14676] Wrap and re-throw Await.result exceptions in order to capture full stacktrace · 947b9020
      Josh Rosen authored
      When `Await.result` throws an exception which originated from a different thread, the resulting stacktrace doesn't include the path leading to the `Await.result` call itself, making it difficult to identify the impact of these exceptions. For example, I've seen cases where broadcast cleaning errors propagate to the main thread and crash it but the resulting stacktrace doesn't include any of the main thread's code, making it difficult to pinpoint which exception crashed that thread.
      
      This patch addresses this issue by explicitly catching, wrapping, and re-throwing exceptions that are thrown by `Await.result`.
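      
      A minimal sketch of the wrap-and-rethrow approach, assuming a hypothetical `AwaitResultException` (Spark's own helper may use a different exception type and message). Passing timeouts through unchanged is a design choice of this sketch.
      
      ```scala
      import java.util.concurrent.TimeoutException
      import scala.concurrent.{Await, Awaitable}
      import scala.concurrent.duration.Duration
      import scala.util.control.NonFatal
      
      // Hypothetical exception type used only for this illustration.
      class AwaitResultException(message: String, cause: Throwable)
        extends Exception(message, cause)
      
      object AwaitWithStack {
        def awaitResult[T](awaitable: Awaitable[T], atMost: Duration): T = {
          try {
            Await.result(awaitable, atMost)
          } catch {
            // The new exception is created on the waiting thread, so its stack trace
            // records this call site; the original failure is kept as the cause.
            case NonFatal(t) if !t.isInstanceOf[TimeoutException] =>
              throw new AwaitResultException("Exception thrown in awaitResult", t)
          }
        }
      }
      ```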
      
      I tested this manually using https://github.com/JoshRosen/spark/commit/16b31c825197ee31a50214c6ba3c1df08148f403, a patch which reproduces an issue where an RPC exception which occurs while unpersisting RDDs manages to crash the main thread without any useful stacktrace, and verified that informative, full stacktraces were generated after applying the fix in this PR.
      
      /cc rxin nongli yhuai anabranch
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #12433 from JoshRosen/wrap-and-rethrow-await-exceptions.
      947b9020
  10. Apr 12, 2016
  11. Apr 06, 2016
  12. Mar 10, 2016
    • Dongjoon Hyun's avatar
      [SPARK-3854][BUILD] Scala style: require spaces before `{`. · 91fed8e9
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Since the opening curly brace, '{', has many usages as discussed in [SPARK-3854](https://issues.apache.org/jira/browse/SPARK-3854), this PR adds a Scalastyle rule that forbids the '){' pattern in favor of the majority pattern shown below, and fixes the code accordingly. Enforcing this in Scalastyle from now on should improve Scala code quality and reduce review time.
      ```scala
      // Correct:
      if (true) {
        println("Wow!")
      }
      
      // Incorrect:
      if (true){
         println("Wow!")
      }
      ```
      IntelliJ also shows new warnings based on this.
      
      ## How was this patch tested?
      
      Pass the Jenkins ScalaStyle test.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11637 from dongjoon-hyun/SPARK-3854.
      91fed8e9
  13. Feb 10, 2016
  14. Jan 13, 2016
  15. Jan 12, 2016
  16. Jan 10, 2016
    • Marcelo Vanzin's avatar
      [SPARK-3873][BUILD] Enable import ordering error checking. · 6439a825
      Marcelo Vanzin authored
      Turn import ordering violations into build errors, plus a few adjustments
      to account for how the checker behaves. I'm a little on the fence about
      whether the existing code is right, but it's easier to appease the checker
      than to discuss what's the more correct order here.
      
      Plus a few fixes to imports that crept in since my recent cleanups.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10612 from vanzin/SPARK-3873-enable.
      6439a825
  17. Jan 08, 2016
  18. Jan 02, 2016
  19. Jan 01, 2016
    • Marcelo Vanzin's avatar
      [SPARK-3873][MLLIB] Import order fixes. · a59a357c
      Marcelo Vanzin authored
      A slight adjustment to the checker configuration was needed; there is
      a handful of warnings still left, but those are because of a bug in
      the checker that I'll fix separately (before enabling errors for the
      checker, of course).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10535 from vanzin/SPARK-3873-mllib.
      a59a357c
  20. Dec 16, 2015
  21. Dec 08, 2015
    • Marcelo Vanzin's avatar
      [SPARK-3873][BUILD] Add style checker to enforce import ordering. · 2ff17bcf
      Marcelo Vanzin authored
      The checker tries to follow as closely as possible the guidelines of
      the code style document, and makes some decisions where the guide is
      not clear. In particular:
      
      - wildcard imports come first when there are other imports in the
        same package
      - multi-import blocks come before single imports
      - lower-case names inside multi-import blocks come before others
      
      In some projects, such as graphx, there seems to be a convention to
      separate o.a.s imports from the project's own; to simplify the
      checker, I chose not to allow that, which is a strict interpretation
      of the code style guide, even though I think it makes sense.
      
      Since the checks are based on syntax only, some edge cases may
      generate spurious warnings; for example, when class names start
      with a lower case letter (and are thus treated as a package name
      by the checker).
      
      The checker is currently only generating warnings, and since there
      are many of those, the build output does get a little noisy. The
      idea is to fix the code (and the checker, as needed) little by little
      instead of having a huge change that touches everywhere.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6502 from vanzin/SPARK-3873.
      2ff17bcf
  22. Nov 10, 2015
  23. Sep 12, 2015
  24. Aug 25, 2015
  25. Jul 14, 2015
    • Josh Rosen's avatar
      [SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix existing uses · 11e5c372
      Josh Rosen authored
      This pull request adds a Scalastyle regex rule that fails the style check if `Class.forName` is used directly. `Class.forName` loads classes with the classloader that defined the calling class (in practice, typically the default / system classloader), but in the majority of cases we should be using Spark's own `Utils.classForName` instead, which tries to load classes from the current thread's context classloader and falls back to the classloader that loaded Spark when the context classloader is not defined.
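      
      A minimal sketch of the lookup order described above. `ClassLoading` and its methods are hypothetical stand-ins for `Utils.classForName`, and the `classforname` suppression id in the comments is an assumption about the rule's id. The explicit-loader overload is still a `Class.forName` call, so a regex rule would flag it without the markers.
      
      ```scala
      object ClassLoading {
        // Prefer the current thread's context classloader; fall back to the
        // loader that loaded this code when no context loader is set.
        def contextOrDefaultClassLoader: ClassLoader =
          Option(Thread.currentThread().getContextClassLoader).getOrElse(getClass.getClassLoader)
      
        def classForName(className: String): Class[_] = {
          // scalastyle:off classforname
          Class.forName(className, true, contextOrDefaultClassLoader)
          // scalastyle:on classforname
        }
      }
      ```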
      
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7350 from JoshRosen/ban-Class.forName and squashes the following commits:
      
      e3e96f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      c0b7885 [Josh Rosen] Hopefully fix the last two cases
      d707ba7 [Josh Rosen] Fix uses of Class.forName that I missed in my first cleanup pass
      046470d [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      62882ee [Josh Rosen] Fix uses of Class.forName or add exclusion.
      d9abade [Josh Rosen] Add stylechecker rule to ban uses of Class.forName
      11e5c372
  26. Jul 10, 2015
    • Jonathan Alter's avatar
      [SPARK-7977] [BUILD] Disallowing println · e14b545d
      Jonathan Alter authored
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7093 from jonalter/SPARK-7977 and squashes the following commits:
      
      ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite
      7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite
      10724b6 [Jonathan Alter] Changing some printlns to logs in tests
      eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0b1dcb4 [Jonathan Alter] More println cleanup
      aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0c16fa3 [Jonathan Alter] Replacing some printlns with logs
      45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      5c8e283 [Jonathan Alter] Allowing println in audit-release examples
      5b50da1 [Jonathan Alter] Allowing printlns in example files
      ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      83ab635 [Jonathan Alter] Fixing new printlns
      54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns
      b837c3a [Jonathan Alter] Disallowing println
      e14b545d
  27. May 31, 2015
    • Reynold Xin's avatar
      [SPARK-7986] Split scalastyle config into 3 sections. · 6f006b5f
      Reynold Xin authored
      (1) rules that we enforce.
      (2) rules that we would like to enforce, but haven't cleaned up the codebase to
          turn on yet (or we need to make the scalastyle rule more configurable).
      (3) rules that we don't want to enforce.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6543 from rxin/scalastyle and squashes the following commits:
      
      beefaab [Reynold Xin] [SPARK-7986] Split scalastyle config into 3 sections.
      6f006b5f
    • Reynold Xin's avatar
      [SPARK-3850] Turn style checker on for trailing whitespaces. · 866652c9
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6541 from rxin/trailing-whitespace-on and squashes the following commits:
      
      f72ebe4 [Reynold Xin] [SPARK-3850] Turn style checker on for trailing whitespaces.
      866652c9
    • Reynold Xin's avatar
      [SPARK-7979] Enforce structural type checker. · 4b5f12ba
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6536 from rxin/structural-type-checker and squashes the following commits:
      
      f833151 [Reynold Xin] Fixed compilation.
      633f9a1 [Reynold Xin] Fixed typo.
      d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.
      4b5f12ba
    • Reynold Xin's avatar
      [SPARK-7975] Add style checker to disallow overriding equals covariantly. · 7896e99b
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Reynold Xin <rxin@databricks.com>
      
      Closes #6527 from rxin/covariant-equals and squashes the following commits:
      
      e7d7784 [Reynold Xin] [SPARK-7975] Enforce CovariantEqualsChecker
      7896e99b
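      
      To illustrate what this checker guards against, here is a minimal sketch with a hypothetical `CovariantPoint` class: declaring `equals` with a narrower parameter type overloads `equals(Any)` instead of overriding it, so `==` and collection lookups silently fall back to reference equality.
      
      ```scala
      // Anti-pattern: this overloads equals with a narrower parameter type
      // instead of overriding equals(Any).
      final class CovariantPoint(val x: Int, val y: Int) {
        def equals(that: CovariantPoint): Boolean = x == that.x && y == that.y
      }
      
      object CovariantEqualsDemo {
        val a = new CovariantPoint(1, 2)
        val b = new CovariantPoint(1, 2)
      
        // The overload is chosen only when the static argument type is CovariantPoint.
        val directCall: Boolean = a.equals(b)            // true
        // `==` and collections call equals(Any), which still compares references.
        val viaEqualsEquals: Boolean = a == b            // false
        val viaCollection: Boolean = List(a).contains(b) // false
      }
      ```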
    • Reynold Xin's avatar
      [SPARK-7976] Add style checker to disallow overriding finalize. · 084fef76
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6528 from rxin/style-finalizer and squashes the following commits:
      
      a2211ca [Reynold Xin] [SPARK-7976] Enable NoFinalizeChecker.
      084fef76
  28. May 30, 2015
    • Andrew Or's avatar
      [TRIVIAL] Typo fix for last commit · 193dba01
      Andrew Or authored
      193dba01
    • Andrew Or's avatar
      [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike · 609c4923
      Andrew Or authored
      This is a follow-up patch to #6441.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6510 from andrewor14/extends-funsuite-check and squashes the following commits:
      
      6618b46 [Andrew Or] Exempt SparkSinkSuite from the FunSuite check
      99d02ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into extends-funsuite-check
      48874dd [Andrew Or] Guard against direct uses of FunSuite / FunSuiteLike
      609c4923
  29. May 29, 2015
  30. Apr 03, 2015
    • Reynold Xin's avatar
      [SPARK-6428] Turn on explicit type checking for public methods. · 82701ee2
      Reynold Xin authored
      This builds on my earlier pull requests and turns on the explicit type checking in scalastyle.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5342 from rxin/SPARK-6428 and squashes the following commits:
      
      7b531ab [Reynold Xin] import ordering
      2d9a8a5 [Reynold Xin] jl
      e668b1c [Reynold Xin] override
      9b9e119 [Reynold Xin] Parenthesis.
      82e0cf5 [Reynold Xin] [SPARK-6428] Turn on explicit type checking for public methods.
      82701ee2
  31. Mar 24, 2015
    • Reynold Xin's avatar
      [SPARK-6428] Added explicit types for all public methods in core. · 4ce2782a
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5125 from rxin/core-explicit-type and squashes the following commits:
      
      f471415 [Reynold Xin] Revert style checker changes.
      81b66e4 [Reynold Xin] Code review feedback.
      a7533e3 [Reynold Xin] Mima excludes.
      1d795f5 [Reynold Xin] [SPARK-6428] Added explicit types for all public methods in core.
      4ce2782a