  1. Jul 18, 2017
    • Sean Owen's avatar
      [SPARK-21415] Triage scapegoat warnings, part 1 · e26dac5f
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Address scapegoat warnings for:
      - BigDecimal double constructor
      - Catching NPE
      - Finalizer without super
      - List.size is O(n)
      - Prefer Seq.empty
      - Prefer Set.empty
      - reverse.map instead of reverseMap
      - Type shadowing
      - Unnecessary if condition.
      - Use .log1p
      - Var could be val
      
      In some instances, like Seq.empty, I avoided making the change in test code even where valid, to keep the scope of the change smaller. Those issues concern performance, which won't matter for tests.
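
      A minimal sketch of what a few of these warnings look like before and after fixing (illustrative examples only, not code taken from the patch):

      ```scala
      // Hypothetical illustrations of some of the scapegoat fixes; not actual Spark code.
      object ScapegoatExamples {
        // BigDecimal double constructor: the Double constructor is lossy.
        val beforeDecimal = new java.math.BigDecimal(0.1)      // flagged
        val afterDecimal = java.math.BigDecimal.valueOf(0.1)   // preferred

        // Prefer Seq.empty / Set.empty over Seq() / Set().
        val beforeSeq: Seq[Int] = Seq()
        val afterSeq: Seq[Int] = Seq.empty

        // reverse.map vs reverseMap: reverseMap avoids building the intermediate reversed list.
        val xs = List(1, 2, 3)
        val beforeRev = xs.reverse.map(_ * 2)
        val afterRev = xs.reverseMap(_ * 2)

        // Use math.log1p(x) instead of math.log(1 + x) for better precision near zero.
        def beforeLog(x: Double): Double = math.log(1 + x)
        def afterLog(x: Double): Double = math.log1p(x)
      }
      ```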
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #18635 from srowen/Scapegoat1.
      e26dac5f
  2. Jun 19, 2017
  3. Jun 11, 2017
    • hyukjinkwon's avatar
      [SPARK-20935][STREAMING] Always close WriteAheadLog and make it idempotent · eb3ea3a0
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to have `ReceiverTracker` always close the `WriteAheadLog` when it stops, and to make `WriteAheadLog` and its implementations idempotent.
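
      As a rough illustration of the idempotency part (a sketch only, not the actual `FileBasedWriteAheadLog`/`BatchedWriteAheadLog` code), a `close()` guarded by a flag is safe to call twice:

      ```scala
      // Hypothetical sketch of an idempotent close(); names and structure are illustrative only.
      class SimpleWriteAheadLog {
        private var closed = false

        def close(): Unit = synchronized {
          if (!closed) {
            // release file handles, flush buffers, etc.
            closed = true
          }
          // second and later calls are no-ops
        }
      }
      ```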
      
      ## How was this patch tested?
      
      Added a test in `WriteAheadLogSuite`. Note that the added test passes even when the log is closed twice without the changes in `FileBasedWriteAheadLog` and `BatchedWriteAheadLog`; it looks like both are already idempotent, so this is rather a sanity check.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18224 from HyukjinKwon/streaming-closing.
      eb3ea3a0
  4. Jun 08, 2017
    • Josh Rosen's avatar
      [SPARK-20863] Add metrics/instrumentation to LiveListenerBus · 2a23cdd0
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      This patch adds Coda Hale metrics for instrumenting the `LiveListenerBus` in order to track the number of events received, dropped, and processed. In addition, it adds per-SparkListener-subclass timers to track message processing time. This is useful for identifying when slow third-party SparkListeners cause performance bottlenecks.
      
      See the new `LiveListenerBusMetrics` for a complete description of the new metrics.
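
      Roughly, this kind of Coda Hale instrumentation looks like the following (a hedged sketch against the dropwizard-metrics API, not the actual `LiveListenerBusMetrics` code; the metric names here are made up):

      ```scala
      import com.codahale.metrics.{MetricRegistry, Timer}

      // Illustrative only: counters for received/dropped events and a per-listener timer.
      class ListenerBusMetricsSketch(registry: MetricRegistry) {
        val eventsReceived = registry.counter("queue.numEventsReceived")
        val eventsDropped = registry.counter("queue.numDroppedEvents")

        def timerFor(listenerClass: Class[_]): Timer =
          registry.timer(s"listenerProcessingTime.${listenerClass.getName}")
      }

      // Usage sketch: time how long one listener takes to process an event.
      //   val ctx = metrics.timerFor(listener.getClass).time()
      //   try listener.onEvent(event) finally ctx.stop()
      ```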
      
      ## How was this patch tested?
      
      New tests in SparkListenerSuite, including a test to ensure proper counting of dropped listener events.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #18083 from JoshRosen/listener-bus-metrics.
      2a23cdd0
  5. May 26, 2017
  6. May 17, 2017
  7. May 12, 2017
    • Sean Owen's avatar
      [SPARK-20554][BUILD] Remove usage of scala.language.reflectiveCalls · fc8a2b6e
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Remove uses of scala.language.reflectiveCalls that are either unnecessary or probably result in more complex code. This turned out to be less significant than I thought, but still worth a touch-up.
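
      For context, `scala.language.reflectiveCalls` is only needed when calling members through a structural type; a small illustration of the pattern that triggers the warning, and the usual way to avoid it:

      ```scala
      import scala.language.reflectiveCalls

      object ReflectiveCallsExample {
        // Illustrative only: calling a member through a structural type uses reflection,
        // which is why the compiler asks for scala.language.reflectiveCalls.
        def closeIt(resource: { def close(): Unit }): Unit = resource.close()

        // Giving the type a name avoids the reflective call entirely.
        trait Closable { def close(): Unit }
        def closeIt2(resource: Closable): Unit = resource.close()
      }
      ```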
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17949 from srowen/SPARK-20554.
      fc8a2b6e
  8. May 10, 2017
    • NICHOLAS T. MARION's avatar
      [SPARK-20393][WEB UI] Strengthen Spark to prevent XSS vulnerabilities · b512233a
      NICHOLAS T. MARION authored
      ## What changes were proposed in this pull request?
      
      Add stripXSS and stripXSSMap to Spark Core's UIUtils, and call these functions wherever getParameter is called against an HttpServletRequest.
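
      A hedged sketch of the general idea (the real helpers live in `UIUtils` and may sanitize differently; this is illustrative only): strip characters commonly used in XSS payloads from request parameters before they are echoed back into HTML.

      ```scala
      import javax.servlet.http.HttpServletRequest

      // Illustrative only: remove characters commonly used in XSS payloads from a parameter value.
      object XssSketch {
        def stripXSS(value: String): String =
          if (value == null) null
          else value.replaceAll("[<>\"'%;()&+]", "")

        def safeParameter(request: HttpServletRequest, name: String): String =
          stripXSS(request.getParameter(name))
      }
      ```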
      
      ## How was this patch tested?
      
      Unit tests, IBM Security AppScan Standard no longer showing vulnerabilities, manual verification of WebUI pages.
      
      Author: NICHOLAS T. MARION <nmarion@us.ibm.com>
      
      Closes #17686 from n-marion/xss-fix.
      b512233a
  9. Apr 24, 2017
  10. Apr 12, 2017
    • hyukjinkwon's avatar
      [SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins · ceaf77ae
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to run Spark unidoc in the test build to verify the Javadoc 8 build, as Javadoc 8 compatibility is easily broken again.
      
      There are several problems with it:
      
      - It adds a little extra time to run the tests. In my case, it took about 1.5 minutes more (`Elapsed :[94.8746569157]`). How this was measured is described in "How was this patch tested?".
      
      - > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.
      
        (see  joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627))
      
      To make this automated build pass, this PR also fixes existing Javadoc breaks, including ones introduced by test code as described above.
      
      These fixes are similar to instances fixed previously. Please refer to https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013.
      
      Note that this only fixes **errors**, not **warnings**. Please see my observation at https://github.com/apache/spark/pull/17389#issuecomment-288438704 for spurious errors caused by warnings.
      
      ## How was this patch tested?
      
      Manually via `jekyll build` to test the documentation build. Also tested via running `./dev/run-tests`.
      
      This was tested via manually adding `time.time()` as below:
      
      ```diff
           profiles_and_goals = build_profiles + sbt_goals
      
           print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
                 " ".join(profiles_and_goals))
      
      +    import time
      +    st = time.time()
           exec_sbt(profiles_and_goals)
      +    print("Elapsed :[%s]" % str(time.time() - st))
      ```
      
      produces
      
      ```
      ...
      ========================================================================
      Building Unidoc API Documentation
      ========================================================================
      ...
      [info] Main Java API documentation successful.
      ...
      Elapsed :[94.8746569157]
      ...
      ```

      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17477 from HyukjinKwon/SPARK-18692.
      ceaf77ae
  11. Apr 10, 2017
    • Sean Owen's avatar
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish... · a26e3ed5
      Sean Owen authored
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish locale bug" causes Spark problems
      
      ## What changes were proposed in this pull request?
      
      Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
      
      The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
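
      For example, a small illustration of the locale problem itself (not code from the patch): in the Turkish locale, lowercasing "I" yields a dotless "ı", so locale-sensitive handling of identifiers and config keys can differ from what the code expects.

      ```scala
      import java.util.Locale

      object TurkishLocaleDemo extends App {
        val turkish = new Locale("tr", "TR")
        // In the Turkish locale, "I".toLowerCase is "ı" (dotless i), not "i".
        println("SERIAL".toLowerCase(turkish))      // "serıal"
        println("SERIAL".toLowerCase(Locale.ROOT))  // "serial" - stable across locales
      }
      ```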
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17527 from srowen/SPARK-20156.
      a26e3ed5
  12. Mar 30, 2017
    • Jacek Laskowski's avatar
      [DOCS] Docs-only improvements · 0197262a
      Jacek Laskowski authored
      
      ## What changes were proposed in this pull request?
      
      Use recommended values for row boundaries in Window's scaladoc, i.e. `Window.unboundedPreceding`, `Window.unboundedFollowing`, and `Window.currentRow` (that were introduced in 2.1.0).
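
      For instance, a window spec using the recommended constants might look like this (an illustrative sketch assuming the DataFrame-based `Window` API; `df`, `key`, `ts`, and `value` are hypothetical):

      ```scala
      import org.apache.spark.sql.expressions.Window
      import org.apache.spark.sql.functions.sum

      // Illustrative only: running total per key, from the start of the partition to the current row.
      val runningTotal = Window
        .partitionBy("key")
        .orderBy("ts")
        .rowsBetween(Window.unboundedPreceding, Window.currentRow)

      // df.withColumn("total", sum("value").over(runningTotal))
      ```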
      
      ## How was this patch tested?
      
      Local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #17417 from jaceklaskowski/window-expression-scaladoc.
      0197262a
  13. Mar 29, 2017
    • Marcelo Vanzin's avatar
      [SPARK-19556][CORE] Do not encrypt block manager data in memory. · b56ad2b1
      Marcelo Vanzin authored
      This change modifies the way block data is encrypted to make the more
      common cases faster, while penalizing an edge case. As a side effect
      of the change, all data that goes through the block manager is now
      encrypted only when needed, including the previous path (broadcast
      variables) where that did not happen.
      
      The way the change works is by not encrypting data that is stored in
      memory; so if a serialized block is in memory, it will only be encrypted
      once it is evicted to disk.
      
      The penalty comes when transferring that encrypted data from disk. If the
      data ends up in memory again, it is as efficient as before; but if the
      evicted block needs to be transferred directly to a remote executor, then
      there's now a performance penalty, since the code now uses a custom
      FileRegion implementation to decrypt the data before transferring.
      
      This also means that block data transferred between executors now is
      not encrypted (and thus relies on the network library encryption support
      for secrecy). Shuffle blocks are still transferred in encrypted form,
      since they're handled in a slightly different way by the code. This also
      keeps compatibility with existing external shuffle services, which transfer
      encrypted shuffle blocks, and avoids having to make the external service
      aware of encryption at all.
      
      The serialization and deserialization APIs in the SerializerManager now
      do not do encryption automatically; callers need to explicitly wrap their
      streams with an appropriate crypto stream before using those.
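
      A hedged sketch of what that means for a caller (illustrative only; the wrapping helper is passed in as a function here because the exact SerializerManager method is not shown in this message):

      ```scala
      import java.io.{FileOutputStream, OutputStream}

      // Illustrative only: the caller decides whether to wrap the raw stream with a crypto
      // stream before handing it to the serializer; encryption is no longer applied implicitly.
      def openBlockStream(
          path: String,
          encrypt: Boolean,
          wrapForEncryption: OutputStream => OutputStream): OutputStream = {
        val raw = new FileOutputStream(path)
        if (encrypt) wrapForEncryption(raw) else raw
      }
      ```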
      
      As a result of these changes, some of the workarounds added in SPARK-19520
      are removed here.
      
      Testing: a new trait ("EncryptionFunSuite") was added that provides an easy
      way to run a test twice, with encryption on and off; broadcast, block manager
      and caching tests were modified to use this new trait so that the existing
      tests exercise both encrypted and non-encrypted paths. I also ran some
      applications with encryption turned on to verify that they still work,
      including streaming tests that failed without the fix for SPARK-19520.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #17295 from vanzin/SPARK-19556.
      b56ad2b1
  14. Mar 24, 2017
  15. Mar 12, 2017
    • uncleGen's avatar
      [DOCS][SS] fix structured streaming python example · e29a74d5
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      - SS python example: `TypeError: 'xxx' object is not callable`
      - some other doc issues.
      
      ## How was this patch tested?
      
      Jenkins.
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #17257 from uncleGen/docs-ss-python.
      e29a74d5
  16. Mar 05, 2017
    • uncleGen's avatar
      [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not... · 207067ea
      uncleGen authored
      [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string.
      
      ## What changes were proposed in this pull request?
      
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73800/testReport/
      
      ```
      sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code
      passed to eventually never returned normally. Attempted 617 times over 10.003740484 seconds.
      Last failure message: 8 did not equal 2.
      	at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
      	at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
      	at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
      	at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:336)
      	at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
      	at org.apache.spark.streaming.DStreamCheckpointTester$class.generateOutput(CheckpointSuite
      .scala:172)
      	at org.apache.spark.streaming.CheckpointSuite.generateOutput(CheckpointSuite.scala:211)
      ```
      
      the check condition is:
      
      ```
      val checkpointFilesOfLatestTime = Checkpoint.getCheckpointFiles(checkpointDir).filter {
           _.toString.contains(clock.getTimeMillis.toString)
      }
      // Checkpoint files are written twice for every batch interval. So assert that both
      // are written to make sure that both of them have been written.
      assert(checkpointFilesOfLatestTime.size === 2)
      ```
      
      The path string may contain `clock.getTimeMillis.toString`, like `3500`:
      
      ```
      file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-500
      file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-1000
      file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-1500
      file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-2000
      file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-2500
      file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3000
      file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3500.bk
      file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3500
                                                             ▲▲▲▲
      ```
      
      So we should check only the file name, not the whole path.
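
      Roughly, the corrected condition matches on the file name instead of the full path (a sketch of the idea, not the exact patch; `Checkpoint`, `checkpointDir`, and `clock` are from the test above):

      ```scala
      // Illustrative only: match the file name so directory components like
      // "spark-20035007-..." cannot accidentally contain the clock time.
      val checkpointFilesOfLatestTime = Checkpoint.getCheckpointFiles(checkpointDir).filter { path =>
        path.getName.contains(clock.getTimeMillis.toString)
      }
      assert(checkpointFilesOfLatestTime.size === 2)
      ```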
      
      ## How was this patch tested?
      
      Jenkins.
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #17167 from uncleGen/flaky-CheckpointSuite.
      207067ea
  17. Feb 27, 2017
    • hyukjinkwon's avatar
      [MINOR][BUILD] Fix lint-java breaks in Java · 4ba9c6c4
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to fix the lint-breaks as below:
      
      ```
      [ERROR] src/test/java/org/apache/spark/network/TransportResponseHandlerSuite.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.network.buffer.ManagedBuffer.
      [ERROR] src/main/java/org/apache/spark/unsafe/types/UTF8String.java:[156,10] (modifier) ModifierOrder: 'Nonnull' annotation modifier does not precede non-annotation modifiers.
      [ERROR] src/main/java/org/apache/spark/SparkFirehoseListener.java:[122] (sizes) LineLength: Line is longer than 100 characters (found 105).
      [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[164,78] (coding) OneStatementPerLine: Only one statement per line allowed.
      [ERROR] src/test/java/test/org/apache/spark/JavaAPISuite.java:[1157] (sizes) LineLength: Line is longer than 100 characters (found 121).
      [ERROR] src/test/java/org/apache/spark/streaming/JavaMapWithStateSuite.java:[149] (sizes) LineLength: Line is longer than 100 characters (found 113).
      [ERROR] src/test/java/test/org/apache/spark/streaming/Java8APISuite.java:[146] (sizes) LineLength: Line is longer than 100 characters (found 122).
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[32,8] (imports) UnusedImports: Unused import - org.apache.spark.streaming.Time.
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[611] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[1317] (sizes) LineLength: Line is longer than 100 characters (found 102).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java:[91] (sizes) LineLength: Line is longer than 100 characters (found 102).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[113] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[164] (sizes) LineLength: Line is longer than 100 characters (found 110).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[212] (sizes) LineLength: Line is longer than 100 characters (found 114).
      [ERROR] src/test/java/org/apache/spark/mllib/tree/JavaDecisionTreeSuite.java:[36] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java:[26,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils.
      [ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[20,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils.
      [ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[94] (sizes) LineLength: Line is longer than 100 characters (found 103).
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[30,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.api.java.UDF1.
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[72] (sizes) LineLength: Line is longer than 100 characters (found 104).
      [ERROR] src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java:[121] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[28,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaRDD.
      [ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaSparkContext.
      ```
      
      ## How was this patch tested?
      
      Manually via
      
      ```bash
      ./dev/lint-java
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17072 from HyukjinKwon/java-lint.
      4ba9c6c4
  18. Feb 21, 2017
    • Marcelo Vanzin's avatar
      [SPARK-19652][UI] Do auth checks for REST API access. · 17d83e1e
      Marcelo Vanzin authored
      The REST API has a security filter that performs auth checks
      based on the UI root's security manager. That works fine when
      the UI root is the app's UI, but not when it's the history server.
      
      In the SHS case, all users would be allowed to see all applications
      through the REST API, even if the UI itself wouldn't be available
      to them.
      
      This change adds auth checks for each app access through the API
      too, so that only authorized users can see the app's data.
      
      The change also modifies the existing security filter to use
      `HttpServletRequest.getRemoteUser()`, which is used in other
      places. That is not necessarily the same as the principal's
      name; for example, when using Hadoop's SPNEGO auth filter,
      the remote user strips the realm information, which then matches
      the user name registered as the owner of the application.
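
      A hedged sketch of the kind of per-app check this adds (illustrative only; the view-permission check is passed in as a function because the actual SecurityManager wiring is not shown here):

      ```scala
      import javax.servlet.http.HttpServletRequest

      // Illustrative only: allow access to an app's data if the remote user may view that app's UI.
      def checkAppAccess(
          request: HttpServletRequest,
          canViewApp: (String, String) => Boolean,  // (user, appId) => allowed
          appId: String): Boolean = {
        val user = request.getRemoteUser()
        user == null || canViewApp(user, appId)
      }
      ```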
      
      I also renamed the UIRootFromServletContext trait to a more generic
      name since I'm using it to store more context information now.
      
      Tested manually with an authentication filter enabled.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16978 from vanzin/SPARK-19652.
      17d83e1e
  19. Feb 20, 2017
    • Sean Owen's avatar
      [SPARK-19646][CORE][STREAMING] binaryRecords replicates records in scala API · d0ecca60
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Use `BytesWritable.copyBytes`, not `getBytes`, because `getBytes` returns the underlying array, which may be reused when repeated reads don't need a different size, as is the case with binaryRecords APIs
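
      The distinction, roughly (an illustrative snippet: `getBytes` exposes the reused backing array, which may be larger than the record, while `copyBytes` returns a correctly sized copy):

      ```scala
      import org.apache.hadoop.io.BytesWritable

      // Illustrative only: getBytes exposes the backing array, which Hadoop reuses across reads
      // and which may be larger than the record; copyBytes returns a fresh, correctly sized copy.
      val writable = new BytesWritable(Array[Byte](1, 2, 3))
      val unsafeView: Array[Byte] = writable.getBytes    // backing array; only the first getLength bytes are valid
      val safeCopy: Array[Byte] = writable.copyBytes()   // always a new array of exactly getLength bytes
      ```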
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16974 from srowen/SPARK-19646.
      d0ecca60
  20. Feb 19, 2017
    • Sean Owen's avatar
      [SPARK-19534][TESTS] Convert Java tests to use lambdas, Java 8 features · 1487c9af
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Convert tests to use Java 8 lambdas, and modest related fixes to surrounding code.
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16964 from srowen/SPARK-19534.
      1487c9af
    • jinxing's avatar
      [SPARK-19450] Replace askWithRetry with askSync. · ba8912e5
      jinxing authored
      ## What changes were proposed in this pull request?
      
      `askSync` has already been added to `RpcEndpointRef` (see SPARK-19347 and https://github.com/apache/spark/pull/16690#issuecomment-276850068), and `askWithRetry` is marked as deprecated.
      As mentioned in SPARK-18113 (https://github.com/apache/spark/pull/16503#event-927953218):
      
      >askWithRetry is basically an unneeded API, and a leftover from the akka days that doesn't make sense anymore. It's prone to cause deadlocks (exactly because it's blocking), it imposes restrictions on the caller (e.g. idempotency) and other things that people generally don't pay that much attention to when using it.
      
      Since `askWithRetry` is used only inside Spark and not in user logic, it makes sense to replace all of its uses with `askSync`.
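
      The shape of the change, roughly (a hedged sketch with a stand-in trait; the real methods live on `RpcEndpointRef`, and `StopReceiver` here is just a placeholder message):

      ```scala
      import scala.reflect.ClassTag

      // Illustrative stand-in for RpcEndpointRef, showing only the shape of the change.
      trait EndpointRefSketch {
        def askWithRetry[T: ClassTag](message: Any): T  // deprecated: blocks and may resend on timeout
        def askSync[T: ClassTag](message: Any): T       // blocks once for the reply
      }

      // Before: val ok = ref.askWithRetry[Boolean](StopReceiver)
      // After:  val ok = ref.askSync[Boolean](StopReceiver)
      ```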
      
      ## How was this patch tested?
      This PR doesn't change code logic; existing unit tests cover it.
      
      Author: jinxing <jinxing@meituan.com>
      
      Closes #16790 from jinxing64/SPARK-19450.
      ba8912e5
  21. Feb 16, 2017
    • Sean Owen's avatar
      [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, sql and remove
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
      0e240549
  22. Feb 13, 2017
    • Marcelo Vanzin's avatar
      [SPARK-19520][STREAMING] Do not encrypt data written to the WAL. · 0169360e
      Marcelo Vanzin authored
      Spark's I/O encryption uses an ephemeral key for each driver instance.
      So driver B cannot decrypt data written by driver A since it doesn't
      have the correct key.
      
      The write ahead log is used for recovery, thus needs to be readable by
      a different driver. So it cannot be encrypted by Spark's I/O encryption
      code.
      
      The BlockManager APIs used by the WAL code to write the data automatically
      encrypt data, so changes are needed so that callers can opt out of
      encryption.
      
      Aside from that, the "putBytes" API in the BlockManager does not do
      encryption, so a separate situation arose where the WAL would write
      unencrypted data to the BM and, when those blocks were read, decryption
      would fail. So the WAL code needs to ask the BM to encrypt that data
      when encryption is enabled; this code is not optimal since it results
      in a (temporary) second copy of the data block in memory, but should be
      OK for now until a more performant solution is added. The non-encryption
      case should not be affected.
      
      Tested with new unit tests, and by running streaming apps that do
      recovery using the WAL data with I/O encryption turned on.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16862 from vanzin/SPARK-19520.
      0169360e
  23. Feb 01, 2017
    • jinxing's avatar
      [SPARK-19347] ReceiverSupervisorImpl can add block to ReceiverTracker multiple... · c5fcb7f6
      jinxing authored
      [SPARK-19347] ReceiverSupervisorImpl can add block to ReceiverTracker multiple times because of askWithRetry.
      
      ## What changes were proposed in this pull request?
      
      `ReceiverSupervisorImpl` on the executor side reports a block's metadata back to `ReceiverTracker` on the driver side. In the current code, `askWithRetry` is used. However, for `AddBlock`, `ReceiverTracker` is not idempotent, which may result in messages being processed multiple times.
      
      **To reproduce**:
      
      1. Check if it is the first time `ReceiverTracker` receives `AddBlock`; if so, sleep long enough (say 200 seconds) so that the first RPC call times out in `askWithRetry` and `AddBlock` is resent.
      2. Rebuild Spark and run following job:
      ```
        def streamProcessing(): Unit = {
          val conf = new SparkConf()
            .setAppName("StreamingTest")
            .setMaster(masterUrl)
          val ssc = new StreamingContext(conf, Seconds(200))
          val stream = ssc.socketTextStream("localhost", 1234)
          stream.print()
          ssc.start()
          ssc.awaitTermination()
        }
      ```
      **To fix**:
      
      It makes sense to provide a blocking `ask` in RpcEndpointRef, as mentioned in SPARK-18113 (https://github.com/apache/spark/pull/16503#event-927953218), because the Netty RPC layer will not drop messages. `askWithRetry` is a leftover from the Akka days; it imposes restrictions on the caller (e.g. idempotency) and other things that people generally don't pay much attention to when using it.
      
      ## How was this patch tested?
      Tested manually. The scenario described above doesn't happen with this patch.
      
      Author: jinxing <jinxing@meituan.com>
      
      Closes #16690 from jinxing64/SPARK-19347.
      c5fcb7f6
    • hyukjinkwon's avatar
      [SPARK-19402][DOCS] Support LaTex inline formula correctly and fix warnings in... · f1a1f260
      hyukjinkwon authored
      [SPARK-19402][DOCS] Support LaTex inline formula correctly and fix warnings in Scala/Java APIs generation
      
      ## What changes were proposed in this pull request?
      
      This PR proposes three things as below:
      
      - Support LaTex inline-formula, `\( ... \)` in Scala API documentation
        It seems currently,
      
        ```
        \( ... \)
        ```
      
        are rendered as they are, for example,
      
        <img width="345" alt="2017-01-30 10 01 13" src="https://cloud.githubusercontent.com/assets/6477701/22423960/ab37d54a-e737-11e6-9196-4f6229c0189c.png">
      
        It seems extra backslashes were mistakenly added.
      
      - Fix warnings Scaladoc/Javadoc generation
        This PR fixes two types of warnings as below:
      
        ```
        [warn] .../spark/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala:335: Could not find any member to link for "UnsupportedOperationException".
        [warn]   /**
        [warn]   ^
        ```
      
        ```
        [warn] .../spark/sql/core/src/main/scala/org/apache/spark/sql/internal/VariableSubstitution.scala:24: Variable var undefined in comment for class VariableSubstitution in class VariableSubstitution
        [warn]  * `${var}`, `${system:var}` and `${env:var}`.
        [warn]      ^
        ```
      
      - Fix Javadoc8 break
        ```
        [error] .../spark/mllib/target/java/org/apache/spark/ml/PredictionModel.java:7: error: reference not found
        [error]  *                       E.g., {link VectorUDT} for vector features.
        [error]                                       ^
        [error] .../spark/mllib/target/java/org/apache/spark/ml/PredictorParams.java:12: error: reference not found
        [error]    *                          E.g., {link VectorUDT} for vector features.
        [error]                                            ^
        [error] .../spark/mllib/target/java/org/apache/spark/ml/Predictor.java:10: error: reference not found
        [error]  *                       E.g., {link VectorUDT} for vector features.
        [error]                                       ^
        [error] .../spark/sql/hive/target/java/org/apache/spark/sql/hive/HiveAnalysis.java:5: error: reference not found
        [error]  * Note that, this rule must be run after {link PreprocessTableInsertion}.
        [error]                                                  ^
        ```
      
      ## How was this patch tested?
      
      Manually via `sbt unidoc` and `jekyll build`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16741 from HyukjinKwon/warn-and-break.
      f1a1f260
  24. Jan 24, 2017
  25. Jan 18, 2017
    • uncleGen's avatar
      [SPARK-19182][DSTREAM] Optimize the lock in StreamingJobProgressListener to... · a81e336f
      uncleGen authored
      [SPARK-19182][DSTREAM] Optimize the lock in StreamingJobProgressListener to not block UI when generating Streaming jobs
      
      ## What changes were proposed in this pull request?
      
      When DStreamGraph is generating a job, it holds a lock and blocks other APIs. Because StreamingJobProgressListener (numInactiveReceivers, streamName(streamId: Int), streamIds) needs to call DStreamGraph's methods to access some information, the UI may hang if generating a job is very slow (e.g., talking to a slow Kafka cluster to fetch metadata).
      It's better to optimize the locks in DStreamGraph and StreamingJobProgressListener so that the UI does not block on job generation.
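
      A hedged sketch of the general pattern (illustrative only, not the actual listener code): let UI reads go against a cheaply updated snapshot instead of calling into DStreamGraph while it holds its job-generation lock.

      ```scala
      // Illustrative only: keep a volatile snapshot the UI can read without taking the
      // lock that job generation holds.
      class ProgressSnapshotSketch {
        @volatile private var streamIdToName: Map[Int, String] = Map.empty

        // Called from the listener thread, outside the job-generation critical section.
        def update(names: Map[Int, String]): Unit = { streamIdToName = names }

        // Called from UI rendering; never blocks on job generation.
        def streamName(streamId: Int): Option[String] = streamIdToName.get(streamId)
      }
      ```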
      
      ## How was this patch tested?
      Existing unit tests.
      
      cc zsxwing
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #16601 from uncleGen/SPARK-19182.
      a81e336f
  26. Jan 16, 2017
  27. Jan 10, 2017
    • hyukjinkwon's avatar
      [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all identified tests failed due... · 4e27578f
      hyukjinkwon authored
      [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all identified tests failed due to path and resource-not-closed problems on Windows
      
      ## What changes were proposed in this pull request?
      
      This PR proposes to fix all the test failures identified by testing with AppVeyor.
      
      **Scala - aborted tests**
      
      ```
      WindowQuerySuite:
        Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.execution.WindowQuerySuite *** ABORTED *** (156 milliseconds)
         org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: C:projectssparksqlhive   argetscala-2.11   est-classesdatafilespart_tiny.txt;
      
      OrcSourceSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.orc.OrcSourceSuite *** ABORTED *** (62 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
      ParquetMetastoreSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.ParquetMetastoreSuite *** ABORTED *** (4 seconds, 703 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
      ParquetSourceSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.ParquetSourceSuite *** ABORTED *** (3 seconds, 907 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark  arget mpspark-581a6575-454f-4f21-a516-a07f95266143;
      
      KafkaRDDSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaRDDSuite *** ABORTED *** (5 seconds, 212 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-4722304d-213e-4296-b556-951df1a46807
      
      DirectKafkaStreamSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite *** ABORTED *** (7 seconds, 127 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-d0d3eba7-4215-4e10-b40e-bb797e89338e
         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
      
      ReliableKafkaStreamSuite
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite *** ABORTED *** (5 seconds, 498 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-d33e45a0-287e-4bed-acae-ca809a89d888
      
      KafkaStreamSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaStreamSuite *** ABORTED *** (2 seconds, 892 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-59c9d169-5a56-4519-9ef0-cefdbd3f2e6c
      
      KafkaClusterSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaClusterSuite *** ABORTED *** (1 second, 690 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-3ef402b0-8689-4a60-85ae-e41e274f179d
      
      DirectKafkaStreamSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite *** ABORTED *** (59 seconds, 626 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-426107da-68cf-4d94-b0d6-1f428f1c53f6
      
      KafkaRDDSuite:
      Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka010.KafkaRDDSuite *** ABORTED *** (2 minutes, 6 seconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-b9ce7929-5dae-46ab-a0c4-9ef6f58fbc2
      ```
      
      **Java - failed tests**
      
      ```
      Test org.apache.spark.streaming.kafka.JavaKafkaRDDSuite.testKafkaRDD failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-1cee32f4-4390-4321-82c9-e8616b3f0fb0, took 9.61 sec
      
      Test org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-f42695dd-242e-4b07-847c-f299b8e4676e, took 11.797 sec
      
      Test org.apache.spark.streaming.kafka.JavaDirectKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-85c0d062-78cf-459c-a2dd-7973572101ce, took 1.581 sec
      
      Test org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite.testKafkaRDD failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-49eb6b5c-8366-47a6-83f2-80c443c48280, took 17.895 sec
      
      org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-898cf826-d636-4b1c-a61a-c12a364c02e7, took 8.858 sec
      ```
      
      **Scala - failed tests**
      
      ```
      PartitionProviderCompatibilitySuite:
       - insert overwrite partition of new datasource table overwrites just partition *** FAILED *** (828 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-bb6337b9-4f99-45ab-ad2c-a787ab965c09
      
       - SPARK-18635 special chars in partition values - partition management true *** FAILED *** (5 seconds, 360 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - SPARK-18635 special chars in partition values - partition management false *** FAILED *** (141 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      ```
      
      ```
      UtilsSuite:
       - reading offset bytes of a file (compressed) *** FAILED *** (0 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-ecb2b7d5-db8b-43a7-b268-1bf242b5a491
      
       - reading offset bytes across multiple files (compressed) *** FAILED *** (0 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-25cc47a8-1faa-4da5-8862-cf174df63ce0
      ```
      
      ```
      StatisticsSuite:
       - MetastoreRelations fallback to HDFS for size estimation *** FAILED *** (110 milliseconds)
         org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'csv_table' not found in database 'default';
      ```
      
      ```
      SQLQuerySuite:
       - permanent UDTF *** FAILED *** (125 milliseconds)
         org.apache.spark.sql.AnalysisException: Undefined function: 'udtf_count_temp'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 24
      
       - describe functions - user defined functions *** FAILED *** (125 milliseconds)
         org.apache.spark.sql.AnalysisException: Undefined function: 'udtf_count'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
      
       - CTAS without serde with location *** FAILED *** (16 milliseconds)
         java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-ed673d73-edfc-404e-829e-2e2b9725d94e/c1
      
       - derived from Hive query file: drop_database_removes_partition_dirs.q *** FAILED *** (47 milliseconds)
         java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-d2ddf08e-699e-45be-9ebd-3dfe619680fe/drop_database_removes_partition_dirs_table
      
       - derived from Hive query file: drop_table_removes_partition_dirs.q *** FAILED *** (0 milliseconds)
         java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-d2ddf08e-699e-45be-9ebd-3dfe619680fe/drop_table_removes_partition_dirs_table2
      
       - SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH *** FAILED *** (109 milliseconds)
         java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/projects/spark/sql/hive/projectsspark	arget	mpspark-1a122f8c-dfb3-46c4-bab1-f30764baee0e/*part-r*
      ```
      
      ```
      HiveDDLSuite:
       - drop external tables in default database *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - add/drop partitions - external table *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - create/drop database - location without pre-created directory *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - create/drop database - location with pre-created directory *** FAILED *** (32 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - drop database containing tables - CASCADE *** FAILED *** (94 milliseconds)
         CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
      
       - drop an empty database - CASCADE *** FAILED *** (63 milliseconds)
         CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
      
       - drop database containing tables - RESTRICT *** FAILED *** (47 milliseconds)
         CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
      
       - drop an empty database - RESTRICT *** FAILED *** (47 milliseconds)
         CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
      
       - CREATE TABLE LIKE an external data source table *** FAILED *** (140 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-c5eba16d-07ae-4186-95bb-21c5811cf888;
      
       - CREATE TABLE LIKE an external Hive serde table *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - desc table for data source table - no user-defined schema *** FAILED *** (125 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-e8bf5bf5-721a-4cbe-9d6	at scala.collection.immutable.List.foreach(List.scala:381)d-5543a8301c1d;
      ```
      
      ```
      MetastoreDataSourcesSuite
       - CTAS: persisted bucketed data source table *** FAILED *** (16 milliseconds)
         java.lang.IllegalArgumentException: Can not create a Path from an empty string
      ```
      
      ```
      ShowCreateTableSuite:
       - simple external hive table *** FAILED *** (0 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      ```
      
      ```
      PartitionedTablePerfStatsSuite:
       - hive table: partitioned pruned table reports only selected files *** FAILED *** (313 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: partitioned pruned table reports only selected files *** FAILED *** (219 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-311f45f8-d064-4023-a4bb-e28235bff64d;
      
       - hive table: lazy partition pruning reads only necessary partition data *** FAILED *** (203 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: lazy partition pruning reads only necessary partition data *** FAILED *** (187 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-fde874ca-66bd-4d0b-a40f-a043b65bf957;
      
       - hive table: lazy partition pruning with file status caching enabled *** FAILED *** (188 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: lazy partition pruning with file status caching enabled *** FAILED *** (187 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-e6d20183-dd68-4145-acbe-4a509849accd;
      
       - hive table: file status caching respects refresh table and refreshByPath *** FAILED *** (172 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: file status caching respects refresh table and refreshByPath *** FAILED *** (203 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-8b2c9651-2adf-4d58-874f-659007e21463;
      
       - hive table: file status cache respects size limit *** FAILED *** (219 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: file status cache respects size limit *** FAILED *** (171 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-7835ab57-cb48-4d2c-bb1d-b46d5a4c47e4;
      
       - datasource table: table setup does not scan filesystem *** FAILED *** (266 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-20598d76-c004-42a7-8061-6c56f0eda5e2;
      
       - hive table: table setup does not scan filesystem *** FAILED *** (266 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - hive table: num hive client calls does not scale with partition count *** FAILED *** (2 seconds, 281 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: num hive client calls does not scale with partition count *** FAILED *** (2 seconds, 422 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-4cfed321-4d1d-4b48-8d34-5c169afff383;
      
       - hive table: files read and cached when filesource partition management is off *** FAILED *** (234 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: all partition data cached in memory when partition management is off *** FAILED *** (203 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-4bcc0398-15c9-4f6a-811e-12d40f3eec12;
      
       - SPARK-18700: table loaded only once even when resolved concurrently *** FAILED *** (1 second, 266 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      ```
      
      ```
      HiveSparkSubmitSuite:
       - temporary Hive UDF: define a UDF and use it *** FAILED *** (2 seconds, 94 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - permanent Hive UDF: define a UDF and use it *** FAILED *** (281 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - permanent Hive UDF: use a already defined permanent function *** FAILED *** (718 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-8368: includes jars passed in through --jars *** FAILED *** (3 seconds, 521 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-8020: set sql conf in spark conf *** FAILED *** (0 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-8489: MissingRequirementError during reflection *** FAILED *** (94 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-9757 Persist Parquet relation with decimal column *** FAILED *** (16 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-11009 fix wrong result of Window function in cluster mode *** FAILED *** (16 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-14244 fix window partition size attribute binding failure *** FAILED *** (78 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - set spark.sql.warehouse.dir *** FAILED *** (16 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - set hive.metastore.warehouse.dir *** FAILED *** (15 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-16901: set javax.jdo.option.ConnectionURL *** FAILED *** (16 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-18360: default table path of tables in default database should depend on the location of default database *** FAILED *** (15 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      ```
      
      ```
      UtilsSuite:
       - resolveURIs with multiple paths *** FAILED *** (0 milliseconds)
         ".../jar3,file:/C:/pi.py[%23]py.pi,file:/C:/path%..." did not equal ".../jar3,file:/C:/pi.py[#]py.pi,file:/C:/path%..." (UtilsSuite.scala:468)
      ```
      
      ```
      CheckpointSuite:
       - recovery with file input stream *** FAILED *** (10 seconds, 205 milliseconds)
         The code passed to eventually never returned normally. Attempted 660 times over 10.014272499999999 seconds. Last failure message: Unexpected internal error near index 1
         \
          ^. (CheckpointSuite.scala:680)
      ```
      
      ## How was this patch tested?
      
      Manually via AppVeyor as below:
      
      **Scala - aborted tests**
      
      ```
      WindowQuerySuite - all passed
      OrcSourceSuite:
      - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (4 seconds, 417 milliseconds)
        org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
      ParquetMetastoreSuite - all passed
      ParquetSourceSuite - all passed
      KafkaRDDSuite - all passed
      DirectKafkaStreamSuite - all passed
      ReliableKafkaStreamSuite - all passed
      KafkaStreamSuite - all passed
      KafkaClusterSuite - all passed
      DirectKafkaStreamSuite - all passed
      KafkaRDDSuite - all passed
      ```
      
      **Java - failed tests**
      
      ```
      org.apache.spark.streaming.kafka.JavaKafkaRDDSuite - all passed
      org.apache.spark.streaming.kafka.JavaDirectKafkaStreamSuite - all passed
      org.apache.spark.streaming.kafka.JavaKafkaStreamSuite - all passed
      org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite - all passed
      org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite - all passed
      ```
      
      **Scala - failed tests**
      
      ```
      PartitionProviderCompatibilitySuite:
      - insert overwrite partition of new datasource table overwrites just partition (1 second, 953 milliseconds)
      - SPARK-18635 special chars in partition values - partition management true (6 seconds, 31 milliseconds)
      - SPARK-18635 special chars in partition values - partition management false (4 seconds, 578 milliseconds)
      ```
      
      ```
      UtilsSuite:
      - reading offset bytes of a file (compressed) (203 milliseconds)
      - reading offset bytes across multiple files (compressed) (0 milliseconds)
      ```
      
      ```
      StatisticsSuite:
      - MetastoreRelations fallback to HDFS for size estimation (94 milliseconds)
      ```
      
      ```
      SQLQuerySuite:
       - permanent UDTF (407 milliseconds)
       - describe functions - user defined functions (441 milliseconds)
       - CTAS without serde with location (2 seconds, 831 milliseconds)
       - derived from Hive query file: drop_database_removes_partition_dirs.q (734 milliseconds)
       - derived from Hive query file: drop_table_removes_partition_dirs.q (563 milliseconds)
       - SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH (453 milliseconds)
      ```
      
      ```
      HiveDDLSuite:
       - drop external tables in default database (3 seconds, 5 milliseconds)
       - add/drop partitions - external table (2 seconds, 750 milliseconds)
       - create/drop database - location without pre-created directory (500 milliseconds)
       - create/drop database - location with pre-created directory (407 milliseconds)
       - drop database containing tables - CASCADE (453 milliseconds)
       - drop an empty database - CASCADE (375 milliseconds)
       - drop database containing tables - RESTRICT (328 milliseconds)
       - drop an empty database - RESTRICT (391 milliseconds)
       - CREATE TABLE LIKE an external data source table (953 milliseconds)
       - CREATE TABLE LIKE an external Hive serde table (3 seconds, 782 milliseconds)
       - desc table for data source table - no user-defined schema (1 second, 150 milliseconds)
      ```
      
      ```
      MetastoreDataSourcesSuite:
       - CTAS: persisted bucketed data source table (875 milliseconds)
      ```
      
      ```
      ShowCreateTableSuite:
       - simple external hive table (78 milliseconds)
      ```
      
      ```
      PartitionedTablePerfStatsSuite:
       - hive table: partitioned pruned table reports only selected files (1 second, 109 milliseconds)
       - datasource table: partitioned pruned table reports only selected files (860 milliseconds)
       - hive table: lazy partition pruning reads only necessary partition data (859 milliseconds)
       - datasource table: lazy partition pruning reads only necessary partition data (1 second, 219 milliseconds)
       - hive table: lazy partition pruning with file status caching enabled (875 milliseconds)
       - datasource table: lazy partition pruning with file status caching enabled (890 milliseconds)
       - hive table: file status caching respects refresh table and refreshByPath (922 milliseconds)
       - datasource table: file status caching respects refresh table and refreshByPath (640 milliseconds)
       - hive table: file status cache respects size limit (469 milliseconds)
       - datasource table: file status cache respects size limit (453 milliseconds)
       - datasource table: table setup does not scan filesystem (328 milliseconds)
       - hive table: table setup does not scan filesystem (313 milliseconds)
       - hive table: num hive client calls does not scale with partition count (5 seconds, 431 milliseconds)
       - datasource table: num hive client calls does not scale with partition count (4 seconds, 79 milliseconds)
       - hive table: files read and cached when filesource partition management is off (656 milliseconds)
       - datasource table: all partition data cached in memory when partition management is off (484 milliseconds)
       - SPARK-18700: table loaded only once even when resolved concurrently (2 seconds, 578 milliseconds)
      ```
      
      ```
      HiveSparkSubmitSuite:
       - temporary Hive UDF: define a UDF and use it (1 second, 745 milliseconds)
       - permanent Hive UDF: define a UDF and use it (406 milliseconds)
       - permanent Hive UDF: use a already defined permanent function (375 milliseconds)
       - SPARK-8368: includes jars passed in through --jars (391 milliseconds)
       - SPARK-8020: set sql conf in spark conf (156 milliseconds)
       - SPARK-8489: MissingRequirementError during reflection (187 milliseconds)
       - SPARK-9757 Persist Parquet relation with decimal column (157 milliseconds)
       - SPARK-11009 fix wrong result of Window function in cluster mode (156 milliseconds)
       - SPARK-14244 fix window partition size attribute binding failure (156 milliseconds)
       - set spark.sql.warehouse.dir (172 milliseconds)
       - set hive.metastore.warehouse.dir (156 milliseconds)
       - SPARK-16901: set javax.jdo.option.ConnectionURL (157 milliseconds)
       - SPARK-18360: default table path of tables in default database should depend on the location of default database (172 milliseconds)
      ```
      
      ```
      UtilsSuite:
       - resolveURIs with multiple paths (0 milliseconds)
      ```
      
      ```
      CheckpointSuite:
       - recovery with file input stream (4 seconds, 452 milliseconds)
      ```
      
      Note: after resolving the aborted tests, one test failure remains, as identified below:
      
      ```
      OrcSourceSuite:
      - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (4 seconds, 417 milliseconds)
        org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
      ```
      
      This failure does not appear to be caused by the problem addressed here, so this PR does not fix it.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16451 from HyukjinKwon/all-path-resource-fixes.
      4e27578f
  28. Jan 04, 2017
    • Niranjan Padmanabhan's avatar
      [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo · a1e40b1f
      Niranjan Padmanabhan authored
      ## What changes were proposed in this pull request?
      There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words.
      
      ## How was this patch tested?
      N/A since only docs or comments were updated.
      
      Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com>
      
      Closes #16455 from neurons/np.structure_streaming_doc.
      a1e40b1f
  29. Dec 22, 2016
    • saturday_s's avatar
      [SPARK-18537][WEB UI] Add a REST api to serve spark streaming information · ce99f51d
      saturday_s authored
      ## What changes were proposed in this pull request?
      
      This PR carries over the work from #16000 and completes #15904.
      
      **Description**
      
      - Augment the `org.apache.spark.status.api.v1` package for serving streaming information.
      - Retrieve the streaming information through `StreamingJobProgressListener` (a usage sketch follows below).
      
      > this API should cover exactly the same amount of information as you can get from the web interface
      > the implementation is based on the current REST implementation of spark-core
      > and will be available for running applications only
      >
      > https://issues.apache.org/jira/browse/SPARK-18537
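      
      For illustration only, here is a minimal, hedged sketch (not part of this PR's code) of how the new streaming endpoints can be queried from a running application; the host, port, and application id are hypothetical placeholders:
      
      ```
      import scala.io.Source
      
      // Hypothetical application id; running applications can be listed via /api/v1/applications.
      val appId = "app-20161222120000-0000"
      // Streaming statistics endpoint served from the org.apache.spark.status.api.v1 package.
      val url = s"http://localhost:4040/api/v1/applications/$appId/streaming/statistics"
      // Fetch and print the JSON response from the running application's UI server.
      println(Source.fromURL(url).mkString)
      ```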
      
      ## How was this patch tested?
      
      Local test.
      
      Author: saturday_s <shi.indetail@gmail.com>
      Author: Chan Chor Pang <ChorPang.Chan@access-company.com>
      Author: peterCPChan <universknight@gmail.com>
      
      Closes #16253 from saturday-shi/SPARK-18537.
      ce99f51d
  30. Dec 21, 2016
  31. Dec 02, 2016
  32. Dec 01, 2016
    • Shixiong Zhu's avatar
      [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite.... · 086b0c8f
      Shixiong Zhu authored
      [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly
      
      ## What changes were proposed in this pull request?
      
      Avoid creating multiple threads to stop StreamingContext. Otherwise, the latch added in #16091 can be released too early.
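      
      As an illustrative sketch only (this is not the suite code, and the names are made up), the hazard with a single-count latch is that the first thread to reach `countDown()` releases every waiter, even while another thread is still doing the real shutdown work; stopping from a single thread avoids that race:
      
      ```
      import java.util.concurrent.CountDownLatch
      
      val latch = new CountDownLatch(1)
      
      // Thread A signals immediately...
      val a = new Thread(new Runnable {
        override def run(): Unit = latch.countDown()
      })
      // ...while thread B is still in the middle of the actual work.
      val b = new Thread(new Runnable {
        override def run(): Unit = {
          Thread.sleep(1000) // simulated shutdown work
          latch.countDown()
        }
      })
      
      a.start(); b.start()
      latch.await() // returns as soon as thread A counts down, before B finishes
      ```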
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16105 from zsxwing/SPARK-18617-2.
      086b0c8f
  33. Nov 30, 2016
    • Shixiong Zhu's avatar
      [SPARK-18617][SPARK-18560][TEST] Fix flaky test: StreamingContextSuite.... · 0a811210
      Shixiong Zhu authored
      [SPARK-18617][SPARK-18560][TEST] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly
      
      ## What changes were proposed in this pull request?
      
      Fixed the potential SparkContext leak in `StreamingContextSuite.SPARK-18560 Receiver data should be deserialized properly`, which was added in #16052. I also removed `FakeByteArrayReceiver` and used `TestReceiver` directly.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16091 from zsxwing/SPARK-18617-follow-up.
      0a811210
    • uncleGen's avatar
      [SPARK-18617][CORE][STREAMING] Close "kryo auto pick" feature for Spark Streaming · 56c82eda
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      #15992 provided a solution to fix the bug, i.e. **receiver data cannot be deserialized properly**. As zsxwing said, it is a critical bug, but we should not break APIs between maintenance releases. It may be a reasonable first step to disable the automatic selection of the Kryo serializer for Spark Streaming. I will continue #15992 to refine the solution.
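      
      For context, a minimal sketch (illustrative only, not the change made in this PR) of how an application pins its serializer explicitly through the standard `spark.serializer` setting instead of relying on any automatic selection:
      
      ```
      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}
      
      // Explicitly choose a serializer via the standard configuration key.
      val conf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("serializer-config-example")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      
      // Streaming context with a 1-second batch interval.
      val ssc = new StreamingContext(conf, Seconds(1))
      ```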
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #16052 from uncleGen/SPARK-18617.
      56c82eda
  34. Nov 29, 2016