Skip to content
Snippets Groups Projects
  1. Mar 23, 2017
    • erenavsarogullari's avatar
      [SPARK-19567][CORE][SCHEDULER] Support some Schedulable variables immutability and access · b7be05a2
      erenavsarogullari authored
      ## What changes were proposed in this pull request?
      Some `Schedulable` Entities(`Pool` and `TaskSetManager`) variables need refactoring for _immutability_ and _access modifiers_ levels as follows:
      - From `var` to `val` (if there is no requirement): This is important to support immutability as much as possible.
        - Sample => `Pool`: `weight`, `minShare`, `priority`, `name` and `taskSetSchedulingAlgorithm`.
      - Access modifiers: Specially, `var`s access needs to be restricted from other parts of codebase to prevent potential side effects.
        - `TaskSetManager`: `tasksSuccessful`, `totalResultSize`, `calculatedTasks` etc...
      
      This PR is related with #15604 and has been created seperatedly to keep patch content as isolated and to help the reviewers.
      
      ## How was this patch tested?
      Added new UTs and existing UT coverage.
      
      Author: erenavsarogullari <erenavsarogullari@gmail.com>
      
      Closes #16905 from erenavsarogullari/SPARK-19567.
      b7be05a2
    • hyukjinkwon's avatar
      [MINOR][BUILD] Fix javadoc8 break · aefe7989
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      Several javadoc8 breaks have been introduced. This PR proposes fix those instances so that we can build Scala/Java API docs.
      
      ```
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:6: error: reference not found
      [error]  * <code>flatMapGroupsWithState</code> operations on {link KeyValueGroupedDataset}.
      [error]                                                             ^
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:10: error: reference not found
      [error]  * Both, <code>mapGroupsWithState</code> and <code>flatMapGroupsWithState</code> in {link KeyValueGroupedDataset}
      [error]                                                                                            ^
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:51: error: reference not found
      [error]  *    {link GroupStateTimeout.ProcessingTimeTimeout}) or event time (i.e.
      [error]              ^
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:52: error: reference not found
      [error]  *    {link GroupStateTimeout.EventTimeTimeout}).
      [error]              ^
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:158: error: reference not found
      [error]  *           Spark SQL types (see {link Encoder} for more details).
      [error]                                          ^
      [error] .../spark/mllib/target/java/org/apache/spark/ml/fpm/FPGrowthParams.java:26: error: bad use of '>'
      [error]    * Number of partitions (>=1) used by parallel FP-growth. By default the param is not set, and
      [error]                            ^
      [error] .../spark/sql/core/src/main/java/org/apache/spark/api/java/function/FlatMapGroupsWithStateFunction.java:30: error: reference not found
      [error]  * {link org.apache.spark.sql.KeyValueGroupedDataset#flatMapGroupsWithState(
      [error]           ^
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyValueGroupedDataset.java:211: error: reference not found
      [error]    * See {link GroupState} for more details.
      [error]                 ^
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyValueGroupedDataset.java:232: error: reference not found
      [error]    * See {link GroupState} for more details.
      [error]                 ^
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyValueGroupedDataset.java:254: error: reference not found
      [error]    * See {link GroupState} for more details.
      [error]                 ^
      [error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyValueGroupedDataset.java:277: error: reference not found
      [error]    * See {link GroupState} for more details.
      [error]                 ^
      [error] .../spark/core/target/java/org/apache/spark/TaskContextImpl.java:10: error: reference not found
      [error]  * {link TaskMetrics} &amp; {link MetricsSystem} objects are not thread safe.
      [error]           ^
      [error] .../spark/core/target/java/org/apache/spark/TaskContextImpl.java:10: error: reference not found
      [error]  * {link TaskMetrics} &amp; {link MetricsSystem} objects are not thread safe.
      [error]                                     ^
      [info] 13 errors
      ```
      
      ```
      jekyll 3.3.1 | Error:  Unidoc generation failed
      ```
      
      ## How was this patch tested?
      
      Manually via `jekyll build`
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17389 from HyukjinKwon/minor-javadoc8-fix.
      aefe7989
  2. Mar 21, 2017
  3. Mar 20, 2017
    • Michael Allman's avatar
      [SPARK-17204][CORE] Fix replicated off heap storage · 7fa116f8
      Michael Allman authored
      (Jira: https://issues.apache.org/jira/browse/SPARK-17204)
      
      ## What changes were proposed in this pull request?
      
      There are a couple of bugs in the `BlockManager` with respect to support for replicated off-heap storage. First, the locally-stored off-heap byte buffer is disposed of when it is replicated. It should not be. Second, the replica byte buffers are stored as heap byte buffers instead of direct byte buffers even when the storage level memory mode is off-heap. This PR addresses both of these problems.
      
      ## How was this patch tested?
      
      `BlockManagerReplicationSuite` was enhanced to fill in the coverage gaps. It now fails if either of the bugs in this PR exist.
      
      Author: Michael Allman <michael@videoamp.com>
      
      Closes #16499 from mallman/spark-17204-replicated_off_heap_storage.
      7fa116f8
  4. Mar 19, 2017
    • Felix Cheung's avatar
      [SPARK-18817][SPARKR][SQL] change derby log output to temp dir · 422aa67d
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Passes R `tempdir()` (this is the R session temp dir, shared with other temp files/dirs) to JVM, set System.Property for derby home dir to move derby.log
      
      ## How was this patch tested?
      
      Manually, unit tests
      
      With this, these are relocated to under /tmp
      ```
      # ls /tmp/RtmpG2M0cB/
      derby.log
      ```
      And they are removed automatically when the R session is ended.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16330 from felixcheung/rderby.
      422aa67d
  5. Mar 18, 2017
    • Sean Owen's avatar
      [SPARK-16599][CORE] java.util.NoSuchElementException: None.get at at... · 54e61df2
      Sean Owen authored
      [SPARK-16599][CORE] java.util.NoSuchElementException: None.get at at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask
      
      ## What changes were proposed in this pull request?
      
      Avoid None.get exception in (rare?) case that no readLocks exist
      Note that while this would resolve the immediate cause of the exception, it's not clear it is the root problem.
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17290 from srowen/SPARK-16599.
      54e61df2
  6. Mar 17, 2017
    • Sital Kedia's avatar
      [SPARK-13369] Add config for number of consecutive fetch failures · 7b5d873a
      Sital Kedia authored
      The previously hardcoded max 4 retries per stage is not suitable for all cluster configurations. Since spark retries a stage at the sign of the first fetch failure, you can easily end up with many stage retries to discover all the failures. In particular, two scenarios this value should change are (1) if there are more than 4 executors per node; in that case, it may take 4 retries to discover the problem with each executor on the node and (2) during cluster maintenance on large clusters, where multiple machines are serviced at once, but you also cannot afford total cluster downtime. By making this value configurable, cluster managers can tune this value to something more appropriate to their cluster configuration.
      
      Unit tests
      
      Author: Sital Kedia <skedia@fb.com>
      
      Closes #17307 from sitalkedia/SPARK-13369.
      7b5d873a
  7. Mar 16, 2017
    • Bogdan Raducanu's avatar
      [SPARK-19946][TESTING] DebugFilesystem.assertNoOpenStreams should report the... · ee91a0de
      Bogdan Raducanu authored
      [SPARK-19946][TESTING] DebugFilesystem.assertNoOpenStreams should report the open streams to help debugging
      
      ## What changes were proposed in this pull request?
      
      DebugFilesystem.assertNoOpenStreams throws an exception with a cause exception that actually shows the code line which leaked the stream.
      
      ## How was this patch tested?
      New test in SparkContextSuite to check there is a cause exception.
      
      Author: Bogdan Raducanu <bogdan@databricks.com>
      
      Closes #17292 from bogdanrdc/SPARK-19946.
      ee91a0de
  8. Mar 15, 2017
    • erenavsarogullari's avatar
      [SPARK-18066][CORE][TESTS] Add Pool usage policies test coverage for FIFO & FAIR Schedulers · 046b8d4a
      erenavsarogullari authored
      ## What changes were proposed in this pull request?
      
      The following FIFO & FAIR Schedulers Pool usage cases need to have unit test coverage :
      - FIFO Scheduler just uses **root pool** so even if `spark.scheduler.pool` property is set, related pool is not created and `TaskSetManagers` are added to **root pool**.
      - FAIR Scheduler uses `default pool` when `spark.scheduler.pool` property is not set. This can be happened when
        - `Properties` object is **null**,
        - `Properties` object is **empty**(`new Properties()`),
        - **default pool** is set(`spark.scheduler.pool=default`).
      - FAIR Scheduler creates a **new pool** with **default values** when `spark.scheduler.pool` property points a **non-existent** pool. This can be happened when **scheduler allocation file** is not set or it does not contain related pool.
      ## How was this patch tested?
      
      New Unit tests are added.
      
      Author: erenavsarogullari <erenavsarogullari@gmail.com>
      
      Closes #15604 from erenavsarogullari/SPARK-18066.
      046b8d4a
    • jiangxingbo's avatar
      [SPARK-19960][CORE] Move `SparkHadoopWriter` to `internal/io/` · 97cc5e5a
      jiangxingbo authored
      ## What changes were proposed in this pull request?
      
      This PR introduces the following changes:
      1. Move `SparkHadoopWriter` to `core/internal/io/`, so that it's in the same directory with `SparkHadoopMapReduceWriter`;
      2. Move `SparkHadoopWriterUtils` to a separated file.
      
      After this PR is merged, we may consolidate `SparkHadoopWriter` and `SparkHadoopMapReduceWriter`, and make the new commit protocol support the old `mapred` package's committer;
      
      ## How was this patch tested?
      
      Tested by existing test cases.
      
      Author: jiangxingbo <jiangxb1987@gmail.com>
      
      Closes #17304 from jiangxb1987/writer.
      97cc5e5a
    • Herman van Hovell's avatar
      [SPARK-19889][SQL] Make TaskContext callbacks thread safe · 9ff85be3
      Herman van Hovell authored
      ## What changes were proposed in this pull request?
      It is sometimes useful to use multiple threads in a task to parallelize tasks. These threads might register some completion/failure listeners to clean up when the task completes or fails. We currently cannot register such a callback and be sure that it will get called, because the context might be in the process of invoking its callbacks, when the the callback gets registered.
      
      This PR improves this by making sure that you cannot add a completion/failure listener from a different thread when the context is being marked as completed/failed in another thread. This is done by synchronizing these methods on the task context itself.
      
      Failure listeners were called only once. Completion listeners now follow the same pattern; this lifts the idempotency requirement for completion listeners and makes it easier to implement them. In some cases we can (accidentally) add a completion/failure listener after the fact, these listeners will be called immediately in order make sure we can safely clean-up after a task.
      
      As a result of this change we could make the `failure` and `completed` flags non-volatile. The `isCompleted()` method now uses synchronization to ensure that updates are visible across threads.
      
      ## How was this patch tested?
      Adding tests to `TaskContestSuite` to test adding listeners to a completed/failed context.
      
      Author: Herman van Hovell <hvanhovell@databricks.com>
      
      Closes #17244 from hvanhovell/SPARK-19889.
      9ff85be3
  9. Mar 12, 2017
    • xiaojian.fxj's avatar
      [SPARK-19831][CORE] Reuse the existing cleanupThreadExecutor to clean up the... · 2f5187bd
      xiaojian.fxj authored
      [SPARK-19831][CORE] Reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid the block
      
      Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend ThreadSafeRpcEndpoint. If the heartbeat from a worker is blocked by the message ApplicationFinished, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again.
      It had better reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid the block.
      
      Author: xiaojian.fxj <xiaojian.fxj@alibaba-inc.com>
      
      Closes #17189 from hustfxj/worker-hearbeat.
      2f5187bd
  10. Mar 10, 2017
  11. Mar 09, 2017
    • jinxing's avatar
      [SPARK-19793] Use clock.getTimeMillis when mark task as finished in TaskSetManager. · 3232e54f
      jinxing authored
      ## What changes were proposed in this pull request?
      
      TaskSetManager is now using `System.getCurrentTimeMillis` when mark task as finished in `handleSuccessfulTask` and `handleFailedTask`. Thus developer cannot set the tasks finishing time in unit test. When `handleSuccessfulTask`, task's duration = `System.getCurrentTimeMillis` - launchTime(which can be set by `clock`), the result is not correct.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: jinxing <jinxing6042@126.com>
      
      Closes #17133 from jinxing64/SPARK-19793.
      3232e54f
    • Jimmy Xiang's avatar
      [SPARK-19757][CORE] DriverEndpoint#makeOffers race against... · b60b9fc1
      Jimmy Xiang authored
      [SPARK-19757][CORE] DriverEndpoint#makeOffers race against CoarseGrainedSchedulerBackend#killExecutors
      
      ## What changes were proposed in this pull request?
      While some executors are being killed due to idleness, if some new tasks come in, driver could assign them to some executors are being killed. These tasks will fail later when the executors are lost. This patch is to make sure CoarseGrainedSchedulerBackend#killExecutors and DriverEndpoint#makeOffers are properly synchronized.
      
      ## How was this patch tested?
      manual tests
      
      Author: Jimmy Xiang <jxiang@apache.org>
      
      Closes #17091 from jxiang/spark-19757.
      b60b9fc1
  12. Mar 07, 2017
    • uncleGen's avatar
      [SPARK-19803][TEST] flaky BlockManagerReplicationSuite test failure · 49570ed0
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      200ms may be too short. Give more time for replication to happen and new block be reported to master
      
      ## How was this patch tested?
      
      test manully
      
      Author: uncleGen <hustyugm@gmail.com>
      Author: dylon <hustyugm@gmail.com>
      
      Closes #17144 from uncleGen/SPARK-19803.
      49570ed0
  13. Mar 06, 2017
    • Imran Rashid's avatar
      [SPARK-19796][CORE] Fix serialization of long property values in TaskDescription · 12bf8324
      Imran Rashid authored
      ## What changes were proposed in this pull request?
      
      The properties that are serialized with a TaskDescription can have very long values (eg. "spark.job.description" which is set to the full sql statement with the thrift-server).  DataOutputStream.writeUTF() does not work well for long strings, so this changes the way those values are serialized to handle longer strings.
      
      ## How was this patch tested?
      
      Updated existing unit test to reproduce the issue.  All unit tests via jenkins.
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #17140 from squito/SPARK-19796.
      12bf8324
  14. Mar 05, 2017
    • liuxian's avatar
      [SPARK-19792][WEBUI] In the Master Page,the column named “Memory per Node” ,I... · 42c4cd9e
      liuxian authored
      [SPARK-19792][WEBUI] In the Master Page,the column named “Memory per Node” ,I think it is not all right
      
      Signed-off-by: liuxian <liu.xian3zte.com.cn>
      
      ## What changes were proposed in this pull request?
      
      Open the spark web page,in the Master Page ,have two tables:Running Applications table and Completed Applications table, to the column named “Memory per Node” ,I think it is not all right ,because a node may be not have only one executor.So I think that should be named as “Memory per Executor”.Otherwise easy to let the user misunderstanding
      
      ## How was this patch tested?
      
      N/A
      
      Author: liuxian <liu.xian3@zte.com.cn>
      
      Closes #17132 from 10110346/wid-lx-0302.
      42c4cd9e
  15. Mar 03, 2017
  16. Mar 02, 2017
    • Imran Rashid's avatar
      [SPARK-19276][CORE] Fetch Failure handling robust to user error handling · 8417a7ae
      Imran Rashid authored
      ## What changes were proposed in this pull request?
      
      Fault-tolerance in spark requires special handling of shuffle fetch
      failures.  The Executor would catch FetchFailedException and send a
      special msg back to the driver.
      
      However, intervening user code could intercept that exception, and wrap
      it with something else.  This even happens in SparkSQL.  So rather than
      checking the thrown exception only, we'll store the fetch failure directly
      in the TaskContext, where users can't touch it.
      
      ## How was this patch tested?
      
      Added a test case which failed before the fix.  Full test suite via jenkins.
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #16639 from squito/SPARK-19276.
      8417a7ae
    • Patrick Woody's avatar
      [SPARK-19631][CORE] OutputCommitCoordinator should not allow commits for already failed tasks · 433d9eb6
      Patrick Woody authored
      ## What changes were proposed in this pull request?
      
      Previously it was possible for there to be a race between a task failure and committing the output of a task. For example, the driver may mark a task attempt as failed due to an executor heartbeat timeout (possibly due to GC), but the task attempt actually ends up coordinating with the OutputCommitCoordinator once the executor recovers and committing its result. This will lead to any retry attempt failing because the task result has already been committed despite the original attempt failing.
      
      This ensures that any previously failed task attempts cannot enter the commit protocol.
      
      ## How was this patch tested?
      
      Added a unit test
      
      Author: Patrick Woody <pwoody@palantir.com>
      
      Closes #16959 from pwoody/pw/recordFailuresForCommitter.
      433d9eb6
    • Mark Grover's avatar
      [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console · 5ae3516b
      Mark Grover authored
      ## What changes were proposed in this pull request?
      This change redacts senstive information (based on `spark.redaction.regex` property)
      from the Spark Submit console logs. Such sensitive information is already being
      redacted from event logs and yarn logs, etc.
      
      ## How was this patch tested?
      Testing was done manually to make sure that the console logs were not printing any
      sensitive information.
      
      Here's some output from the console:
      
      ```
      Spark properties used, including those specified through
       --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
        (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
        (spark.authenticate,false)
        (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
      ```
      
      ```
      System properties:
      (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
      (spark.authenticate,false)
      (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
      ```
      There is a risk if new print statements were added to the console down the road, sensitive information may still get leaked, since there is no test that asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future.
      
      Running unit tests to make sure nothing else is broken by this change.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #17047 from markgrover/master_redaction.
      5ae3516b
  17. Mar 01, 2017
    • GavinGavinNo1's avatar
      [SPARK-13931] Stage can hang if an executor fails while speculated tasks are running · 89990a01
      GavinGavinNo1 authored
      ## What changes were proposed in this pull request?
      When function 'executorLost' is invoked in class 'TaskSetManager', it's significant to judge whether variable 'isZombie' is set to true.
      
      This pull request fixes the following hang:
      
      1.Open speculation switch in the application.
      2.Run this app and suppose last task of shuffleMapStage 1 finishes. Let's get the record straight, from the eyes of DAG, this stage really finishes, and from the eyes of TaskSetManager, variable 'isZombie' is set to true, but variable runningTasksSet isn't empty because of speculation.
      3.Suddenly, executor 3 is lost. TaskScheduler receiving this signal, invokes all executorLost functions of rootPool's taskSetManagers. DAG receiving this signal, removes all this executor's outputLocs.
      4.TaskSetManager adds all this executor's tasks to pendingTasks and tells DAG they will be resubmitted (Attention: possibly not on time).
      5.DAG starts to submit a new waitingStage, let's say shuffleMapStage 2, and going to find that shuffleMapStage 1 is its missing parent because some outputLocs are removed due to executor lost. Then DAG submits shuffleMapStage 1 again.
      6.DAG still receives Task 'Resubmitted' signal from old taskSetManager, and increases the number of pendingTasks of shuffleMapStage 1 each time. However, old taskSetManager won't resolve new task to submit because its variable 'isZombie' is set to true.
      7.Finally shuffleMapStage 1 never finishes in DAG together with all stages depending on it.
      
      ## How was this patch tested?
      
      It's quite difficult to construct test cases.
      
      Author: GavinGavinNo1 <gavingavinno1@gmail.com>
      Author: 16092929 <16092929@cnsuning.com>
      
      Closes #16855 from GavinGavinNo1/resolve-stage-blocked2.
      89990a01
    • jinxing's avatar
      [SPARK-19777] Scan runningTasksSet when check speculatable tasks in TaskSetManager. · 51be6336
      jinxing authored
      ## What changes were proposed in this pull request?
      
      When check speculatable tasks in `TaskSetManager`, only scan `runningTasksSet` instead of scanning all `taskInfos`.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: jinxing <jinxing6042@126.com>
      
      Closes #17111 from jinxing64/SPARK-19777.
      51be6336
  18. Feb 28, 2017
  19. Feb 27, 2017
    • hyukjinkwon's avatar
      [MINOR][BUILD] Fix lint-java breaks in Java · 4ba9c6c4
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to fix the lint-breaks as below:
      
      ```
      [ERROR] src/test/java/org/apache/spark/network/TransportResponseHandlerSuite.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.network.buffer.ManagedBuffer.
      [ERROR] src/main/java/org/apache/spark/unsafe/types/UTF8String.java:[156,10] (modifier) ModifierOrder: 'Nonnull' annotation modifier does not precede non-annotation modifiers.
      [ERROR] src/main/java/org/apache/spark/SparkFirehoseListener.java:[122] (sizes) LineLength: Line is longer than 100 characters (found 105).
      [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[164,78] (coding) OneStatementPerLine: Only one statement per line allowed.
      [ERROR] src/test/java/test/org/apache/spark/JavaAPISuite.java:[1157] (sizes) LineLength: Line is longer than 100 characters (found 121).
      [ERROR] src/test/java/org/apache/spark/streaming/JavaMapWithStateSuite.java:[149] (sizes) LineLength: Line is longer than 100 characters (found 113).
      [ERROR] src/test/java/test/org/apache/spark/streaming/Java8APISuite.java:[146] (sizes) LineLength: Line is longer than 100 characters (found 122).
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[32,8] (imports) UnusedImports: Unused import - org.apache.spark.streaming.Time.
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[611] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[1317] (sizes) LineLength: Line is longer than 100 characters (found 102).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java:[91] (sizes) LineLength: Line is longer than 100 characters (found 102).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[113] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[164] (sizes) LineLength: Line is longer than 100 characters (found 110).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[212] (sizes) LineLength: Line is longer than 100 characters (found 114).
      [ERROR] src/test/java/org/apache/spark/mllib/tree/JavaDecisionTreeSuite.java:[36] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java:[26,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils.
      [ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[20,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils.
      [ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[94] (sizes) LineLength: Line is longer than 100 characters (found 103).
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[30,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.api.java.UDF1.
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[72] (sizes) LineLength: Line is longer than 100 characters (found 104).
      [ERROR] src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java:[121] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[28,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaRDD.
      [ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaSparkContext.
      ```
      
      ## How was this patch tested?
      
      Manually via
      
      ```bash
      ./dev/lint-java
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17072 from HyukjinKwon/java-lint.
      4ba9c6c4
  20. Feb 26, 2017
    • Eyal Zituny's avatar
      [SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to handle... · 9f8e3921
      Eyal Zituny authored
      [SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to handle QueryTerminatedEvent if more then one listeners exists
      
      ## What changes were proposed in this pull request?
      
      currently if multiple streaming queries listeners exists, when a QueryTerminatedEvent is triggered, only one of the listeners will be invoked while the rest of the listeners will ignore the event.
      this is caused since the the streaming queries listeners bus holds a set of running queries ids and when a termination event is triggered, after the first listeners is handling the event, the terminated query id is being removed from the set.
      in this PR, the query id will be removed from the set only after all the listeners handles the event
      
      ## How was this patch tested?
      
      a test with multiple listeners has been added to StreamingQueryListenerSuite
      
      Author: Eyal Zituny <eyal.zituny@equalum.io>
      
      Closes #16991 from eyalzit/master.
      9f8e3921
  21. Feb 24, 2017
    • Shubham Chopra's avatar
      [SPARK-15355][CORE] Proactive block replication · fa7c582e
      Shubham Chopra authored
      ## What changes were proposed in this pull request?
      
      We are proposing addition of pro-active block replication in case of executor failures. BlockManagerMasterEndpoint does all the book-keeping to keep a track of all the executors and the blocks they hold. It also keeps a track of which executors are alive through heartbeats. When an executor is removed, all this book-keeping state is updated to reflect the lost executor. This step can be used to identify executors that are still in possession of a copy of the cached data and a message could be sent to them to use the existing "replicate" function to find and place new replicas on other suitable hosts. Blocks replicated this way will let the master know of their existence.
      
      This can happen when an executor is lost, and would that way be pro-active as opposed be being done at query time.
      ## How was this patch tested?
      
      This patch was tested with existing unit tests along with new unit tests added to test the functionality.
      
      Author: Shubham Chopra <schopra31@bloomberg.net>
      
      Closes #14412 from shubhamchopra/ProactiveBlockReplication.
      fa7c582e
    • Jeff Zhang's avatar
      [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated to python worker · 330c3e33
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      self.environment will be propagated to executor. Should set PYTHONHASHSEED as long as the python version is greater than 3.3
      
      ## How was this patch tested?
      Manually tested it.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #11211 from zjffdu/SPARK-13330.
      330c3e33
    • Imran Rashid's avatar
      [SPARK-19597][CORE] test case for task deserialization errors · 5f74148b
      Imran Rashid authored
      Adds a test case that ensures that Executors gracefully handle a task that fails to deserialize, by sending back a reasonable failure message.  This does not change any behavior (the prior behavior was already correct), it just adds a test case to prevent regression.
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #16930 from squito/executor_task_deserialization.
      5f74148b
    • Kay Ousterhout's avatar
      [SPARK-19560] Improve DAGScheduler tests. · 5cbd3b59
      Kay Ousterhout authored
      This commit improves the tests that check the case when a
      ShuffleMapTask completes successfully on an executor that has
      failed.  This commit improves the commenting around the existing
      test for this, and adds some additional checks to make it more
      clear what went wrong if the tests fail (the fact that these
      tests are hard to understand came up in the context of markhamstra's
      proposed fix for #16620).
      
      This commit also removes a test that I realized tested exactly
      the same functionality.
      
      markhamstra, I verified that the new version of the test still fails (and
      in a more helpful way) for your proposed change for #16620.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #16892 from kayousterhout/SPARK-19560.
      5cbd3b59
    • wangzhenhua's avatar
      [SPARK-17078][SQL] Show stats when explain · 69d0da63
      wangzhenhua authored
      ## What changes were proposed in this pull request?
      
      Currently we can only check the estimated stats in logical plans by debugging. We need to provide an easier and more efficient way for developers/users.
      
      In this pr, we add EXPLAIN COST command to show stats in the optimized logical plan.
      E.g.
      ```
      spark-sql> EXPLAIN COST select count(1) from store_returns;
      
      ...
      == Optimized Logical Plan ==
      Aggregate [count(1) AS count(1)#24L], Statistics(sizeInBytes=16.0 B, rowCount=1, isBroadcastable=false)
      +- Project, Statistics(sizeInBytes=4.3 GB, rowCount=5.76E+8, isBroadcastable=false)
         +- Relation[sr_returned_date_sk#3,sr_return_time_sk#4,sr_item_sk#5,sr_customer_sk#6,sr_cdemo_sk#7,sr_hdemo_sk#8,sr_addr_sk#9,sr_store_sk#10,sr_reason_sk#11,sr_ticket_number#12,sr_return_quantity#13,sr_return_amt#14,sr_return_tax#15,sr_return_amt_inc_tax#16,sr_fee#17,sr_return_ship_cost#18,sr_refunded_cash#19,sr_reversed_charge#20,sr_store_credit#21,sr_net_loss#22] parquet, Statistics(sizeInBytes=28.6 GB, rowCount=5.76E+8, isBroadcastable=false)
      ...
      ```
      
      ## How was this patch tested?
      
      Add test cases.
      
      Author: wangzhenhua <wangzhenhua@huawei.com>
      Author: Zhenhua Wang <wzh_zju@163.com>
      
      Closes #16594 from wzhfy/showStats.
      69d0da63
    • jerryshao's avatar
      [SPARK-19707][CORE] Improve the invalid path check for sc.addJar · b0a8c16f
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Currently in Spark there're two issues when we add jars with invalid path:
      
      * If the jar path is a empty string {--jar ",dummy.jar"}, then Spark will resolve it to the current directory path and add to classpath / file server, which is unwanted. This is happened in our programatic way to submit Spark application. From my understanding Spark should defensively filter out such empty path.
      * If the jar path is a invalid path (file doesn't exist), `addJar` doesn't check it and will still add to file server, the exception will be delayed until job running. Actually this local path could be checked beforehand, no need to wait until task running. We have similar check in `addFile`, but lacks similar similar mechanism in `addJar`.
      
      ## How was this patch tested?
      
      Add unit test and local manual verification.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17038 from jerryshao/SPARK-19707.
      b0a8c16f
  22. Feb 22, 2017
    • uncleGen's avatar
      [SPARK-16122][CORE] Add rest api for job environment · 66c4b79a
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      add rest api for job environment.
      
      ## How was this patch tested?
      
      existing ut.
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #16949 from uncleGen/SPARK-16122.
      66c4b79a
  23. Feb 21, 2017
    • Marcelo Vanzin's avatar
      [SPARK-19652][UI] Do auth checks for REST API access. · 17d83e1e
      Marcelo Vanzin authored
      The REST API has a security filter that performs auth checks
      based on the UI root's security manager. That works fine when
      the UI root is the app's UI, but not when it's the history server.
      
      In the SHS case, all users would be allowed to see all applications
      through the REST API, even if the UI itself wouldn't be available
      to them.
      
      This change adds auth checks for each app access through the API
      too, so that only authorized users can see the app's data.
      
      The change also modifies the existing security filter to use
      `HttpServletRequest.getRemoteUser()`, which is used in other
      places. That is not necessarily the same as the principal's
      name; for example, when using Hadoop's SPNEGO auth filter,
      the remote user strips the realm information, which then matches
      the user name registered as the owner of the application.
      
      I also renamed the UIRootFromServletContext trait to a more generic
      name since I'm using it to store more context information now.
      
      Tested manually with an authentication filter enabled.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16978 from vanzin/SPARK-19652.
      17d83e1e
  24. Feb 20, 2017
    • hyukjinkwon's avatar
      [SPARK-18922][TESTS] Fix new test failures on Windows due to path and resource not closed · 17b93b5f
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to fix new test failures on WIndows as below:
      
      **Before**
      
      ```
      KafkaRelationSuite:
       - test late binding start offsets *** FAILED *** (7 seconds, 679 milliseconds)
         Cause: java.nio.file.FileSystemException: C:\projects\spark\target\tmp\spark-4c4b0cd1-4cb7-4908-949d-1b0cc8addb50\topic-4-0\00000000000000000000.log -> C:\projects\spark\target\tmp\spark-4c4b0cd1-4cb7-4908-949d-1b0cc8addb50\topic-4-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
      
      KafkaSourceSuite:
       - deserialization of initial offset with Spark 2.1.0 *** FAILED *** (3 seconds, 542 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-97ef64fc-ae61-4ce3-ac59-287fd38bd824
      
       - deserialization of initial offset written by Spark 2.1.0 *** FAILED *** (60 milliseconds)
         java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/projects/spark/external/kafka-0-10-sql/target/scala-2.11/test-classes/kafka-source-initial-offset-version-2.1.0.b
      
      HiveDDLSuite:
       - partitioned table should always put partition columns at the end of table schema *** FAILED *** (657 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-f1b83d09-850a-4bba-8e43-a2a28dfaa757;
      
      DDLSuite:
       - create a data source table without schema *** FAILED *** (94 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-a3f3c161-afae-4d6f-9182-e8642f77062b;
      
       - SET LOCATION for managed table *** FAILED *** (219 milliseconds)
         org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
       Exchange SinglePartit
       +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#99367L])
          +- *FileScan parquet default.tbl[] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/C:projectsspark	arget	mpspark-15be2f2f-4ea9-4c47-bfee-1b7b49363033], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
      
       - insert data to a data source table which has a not existed location should succeed *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-34987671-e8d1-4624-ba5b-db1012e1246b;
      
       - insert into a data source table with no existed partition location should succeed *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-4c6ccfbf-4091-4032-9fbc-3d40c58267d5;
      
       - read data from a data source table which has a not existed location should succeed *** FAILED *** (0 milliseconds)
      
       - read data from a data source table with no existed partition location should succeed *** FAILED *** (0 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-6af39e37-abd1-44e8-ac68-e2dfcf67a2f3;
      
      InputOutputMetricsSuite:
       - output metrics on records written *** FAILED *** (0 milliseconds)
         java.lang.IllegalArgumentException: Wrong FS: file://C:\projects\spark\target\tmp\spark-cd69ee77-88f2-4202-bed6-19c0ee05ef55\InputOutputMetricsSuite, expected: file:///
      
       - output metrics on records written - new Hadoop API *** FAILED *** (16 milliseconds)
         java.lang.IllegalArgumentException: Wrong FS: file://C:\projects\spark\target\tmp\spark-b69e8fcb-047b-4de8-9cdf-5f026efb6762\InputOutputMetricsSuite, expected: file:///
      ```
      
      **After**
      
      ```
      KafkaRelationSuite:
       - test late binding start offsets !!! CANCELED !!! (62 milliseconds)
      
      KafkaSourceSuite:
       - deserialization of initial offset with Spark 2.1.0 (5 seconds, 341 milliseconds)
       - deserialization of initial offset written by Spark 2.1.0 (910 milliseconds)
      
      HiveDDLSuite:
       - partitioned table should always put partition columns at the end of table schema (2 seconds)
      
      DDLSuite:
       - create a data source table without schema (828 milliseconds)
       - SET LOCATION for managed table (406 milliseconds)
       - insert data to a data source table which has a not existed location should succeed (406 milliseconds)
       - insert into a data source table with no existed partition location should succeed (453 milliseconds)
       - read data from a data source table which has a not existed location should succeed (94 milliseconds)
       - read data from a data source table with no existed partition location should succeed (265 milliseconds)
      
      InputOutputMetricsSuite:
       - output metrics on records written (172 milliseconds)
       - output metrics on records written - new Hadoop API (297 milliseconds)
      ```
      
      ## How was this patch tested?
      
      Fixed tests in `InputOutputMetricsSuite`, `KafkaRelationSuite`,  `KafkaSourceSuite`, `DDLSuite.scala` and `HiveDDLSuite`.
      
      Manually tested via AppVeyor as below:
      
      `InputOutputMetricsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/633-20170219-windows-test/job/ex8nvwa6tsh7rmto
      `KafkaRelationSuite`: https://ci.appveyor.com/project/spark-test/spark/build/633-20170219-windows-test/job/h8dlcowew52y8ncw
      `KafkaSourceSuite`: https://ci.appveyor.com/project/spark-test/spark/build/634-20170219-windows-test/job/9ybgjl7yeubxcre4
      `DDLSuite`: https://ci.appveyor.com/project/spark-test/spark/build/635-20170219-windows-test
      `HiveDDLSuite`: https://ci.appveyor.com/project/spark-test/spark/build/633-20170219-windows-test/job/up6o9n47er087ltb
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16999 from HyukjinKwon/windows-fix.
      Unverified
      17b93b5f
    • Liang-Chi Hsieh's avatar
      [SPARK-19508][CORE] Improve error message when binding service fails · 33941914
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Utils provides a helper function to bind service on port. This function can bind the service to a random free port. However, if the binding fails on a random free port, the retrying and final exception messages look confusing.
      
          17/02/06 16:25:43 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
          17/02/06 16:25:43 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
          17/02/06 16:25:43 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
          17/02/06 16:25:43 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
          17/02/06 16:25:43 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
          ...
          17/02/06 16:25:43 ERROR SparkContext: Error initializing SparkContext.
          java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 0)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
      
      ## How was this patch tested?
      
      Jenkins tests.
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #16851 from viirya/better-log-message.
      Unverified
      33941914
    • Reynold Xin's avatar
      [SPARK-19669][SQL] Open up visibility for sharedState, sessionState, and a few other functions · 0733a54a
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      To ease debugging, most of Spark SQL internals have public level visibility. Two of the most important internal states, sharedState and sessionState, however, are package private. It would make more sense to open these up as well with clear documentation that they are internal.
      
      In addition, users currently have way to set active/default SparkSession, but no way to actually get them back. We should open those up as well.
      
      ## How was this patch tested?
      N/A - only visibility change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #17002 from rxin/SPARK-19669.
      0733a54a
    • Sean Owen's avatar
      [SPARK-19646][CORE][STREAMING] binaryRecords replicates records in scala API · d0ecca60
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Use `BytesWritable.copyBytes`, not `getBytes`, because `getBytes` returns the underlying array, which may be reused when repeated reads don't need a different size, as is the case with binaryRecords APIs
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16974 from srowen/SPARK-19646.
      Unverified
      d0ecca60
Loading