  1. Sep 05, 2017
• [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0. · 7f3c6ff4
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      1.0.0 fixes an issue with import order, explicit type for public methods, line length limitation and comment validation:
      
      ```
      [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala:50:16: Are you sure you want to println? If yes, wrap the code block with
      [error]       // scalastyle:off println
      [error]       println(...)
      [error]       // scalastyle:on println
      [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala:49: File line length exceeds 100 characters
      [error] .../spark/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala:22:21: Are you sure you want to println? If yes, wrap the code block with
      [error]       // scalastyle:off println
      [error]       println(...)
      [error]       // scalastyle:on println
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:35:6: Public method must have explicit type
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:51:6: Public method must have explicit type
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:93:15: Public method must have explicit type
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:98:15: Public method must have explicit type
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:47:2: Insert a space after the start of the comment
      [error] .../spark/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala:26:43: JavaDStream should come before JavaDStreamLike.
      ```
      
This PR also fixes the workaround added in SPARK-16877 for the `org.scalastyle.scalariform.OverrideJavaChecker` feature, which was added in 0.9.0.
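For reference, a minimal sketch of the kinds of fixes these checks require (illustrative only, not code from this PR):

```
// Illustrative only: the style the upgraded checks enforce.
object StyleExamples {
  // "Public method must have explicit type": annotate the return type explicitly.
  def parseCount(s: String): Int = s.trim.toInt

  // "Are you sure you want to println?": wrap intentional printlns in on/off markers.
  // scalastyle:off println
  def banner(): Unit = println("Spark REPL")
  // scalastyle:on println
}
```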
      
      ## How was this patch tested?
      
      Manually tested.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #19116 from HyukjinKwon/scalastyle-1.0.0.
      7f3c6ff4
  2. Sep 01, 2017
• [SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala... · 12ab7f7e
      Sean Owen authored
      [SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation
      
      …build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure
      
      ## What changes were proposed in this pull request?
      
      This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.
      
      In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.
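As an illustration of the kind of 2.12-only error involved (this example is an assumption, not taken from the PR), Scala 2.12's SAM conversion can make previously unambiguous overloads ambiguous; an explicit type ascription resolves it in a way that also compiles on 2.11:

```
import java.util.function.IntUnaryOperator

class Registry {
  def register(f: Int => Int): Unit = ()        // Scala function overload
  def register(f: IntUnaryOperator): Unit = ()  // Java functional-interface overload
}

object Demo {
  val r = new Registry
  // r.register(x => x + 1)                     // ambiguous under 2.12: the lambda matches both overloads
  r.register(((x: Int) => x + 1): Int => Int)   // explicit ascription compiles on both 2.11 and 2.12
}
```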
      
      It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release.
      
      - Scalatest 2.x -> 3.0.3
      - Chill 0.8.0 -> 0.8.4
      - Clapper 1.0.x -> 1.1.2
      - json4s 3.2.x -> 3.4.2
      - Jackson 2.6.x -> 2.7.9 (required by json4s)
      
      This change does _not_ fully enable a Scala 2.12 build:
      
      - It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here
      - It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too.
      
      What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.
      
      ## How was this patch tested?
      
      Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #18645 from srowen/SPARK-14280.
      12ab7f7e
  3. Aug 25, 2017
• [SPARK-21714][CORE][YARN] Avoiding re-uploading remote resources in yarn client mode · 1813c4a8
      jerryshao authored
      ## What changes were proposed in this pull request?
      
With SPARK-10643, Spark supports downloading resources from remote storage in client deploy mode. But the implementation overrides the variables that represent the added resources (like `args.jars`, `args.pyFiles`) with the local paths, and the YARN client then uses those local paths to re-upload the resources to the distributed cache. This unnecessarily breaks the semantics of putting resources on a shared FS, so this PR fixes it.
      
      ## How was this patch tested?
      
      This is manually verified with jars, pyFiles in local and remote storage, both in client and cluster mode.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #18962 from jerryshao/SPARK-21714.
      1813c4a8
  4. Aug 01, 2017
• [SPARK-21339][CORE] spark-shell --packages option does not add jars to classpath on windows · 58da1a24
      Devaraj K authored
The jars from the --packages option are added to the classpath with a "file:///" scheme. On Unix this is not a problem, since the scheme ends with the Unix path separator, which separates the jar name from its location in the classpath entry. On Windows, however, the jar file cannot be resolved from the classpath because of the scheme.
      
      Windows : file:///C:/Users/<user>/.ivy2/jars/<jar-name>.jar
      Unix : file:///home/<user>/.ivy2/jars/<jar-name>.jar
      
With this PR, the 'file://' scheme is no longer added to the --packages jar files.

I have verified manually in Windows and Unix environments; with the change, the jar is added to the classpath as below:
      
      Windows : C:\Users\<user>\.ivy2\jars\<jar-name>.jar
      Unix : /home/<user>/.ivy2/jars/<jar-name>.jar
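A minimal sketch of the idea (not the PR's exact code; the jar path is hypothetical): converting the `file:` URI into a platform-native path keeps the scheme out of the classpath entry.

```
import java.net.URI
import java.nio.file.Paths

val jarUri = new URI("file:///C:/Users/someuser/.ivy2/jars/example.jar")  // hypothetical --packages jar
val classpathEntry = Paths.get(jarUri).toString
// On Windows this yields C:\Users\someuser\.ivy2\jars\example.jar; a Unix file URI yields the /home/... form.
```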
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #18708 from devaraj-kavali/SPARK-21339.
      58da1a24
  5. Jul 13, 2017
• [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 · 425c4ada
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove Scala 2.10 build profiles and support
      - Replace some 2.10 support in scripts with commented placeholders for 2.12 later
      - Remove deprecated API calls from 2.10 support
      - Remove usages of deprecated context bounds where possible
      - Remove Scala 2.10 workarounds like ScalaReflectionLock
      - Other minor Scala warning fixes
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17150 from srowen/SPARK-19810.
      425c4ada
  6. Jul 12, 2017
• [SPARK-18646][REPL] Set parent classloader as null for ExecutorClassLoader · e08d06b3
      Kohki Nishio authored
      ## What changes were proposed in this pull request?
      
`ClassLoader` preferentially loads classes from its `parent`. Only when `parent` is null, or the parent's load fails, does it call the overridden `findClass` method. To avoid potential issues caused by loading classes with an inappropriate class loader, we should set the `parent` of this `ClassLoader` to null, so that we fully control which class loader is used.
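A minimal sketch of that delegation behaviour (not Spark's actual `ExecutorClassLoader`): with a null parent, `loadClass` consults only the bootstrap loader before falling through to `findClass`, so the subclass fully controls where application classes come from.

```
class RemoteFirstClassLoader(fallback: ClassLoader) extends ClassLoader(null) {
  override def findClass(name: String): Class[_] = {
    // The real ExecutorClassLoader would first try classes shipped from the driver;
    // here we simply delegate to an explicit fallback loader for illustration.
    fallback.loadClass(name)
  }
}
```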
      
This is a take-over of #17074; the primary author of this PR is taroplus.

#17074 should be closed after this PR is merged.
      
      ## How was this patch tested?
      
      Add test case in `ExecutorClassLoaderSuite`.
      
      Author: Kohki Nishio <taroplus@me.com>
      Author: Xingbo Jiang <xingbo.jiang@databricks.com>
      
      Closes #18614 from jiangxb1987/executor_classloader.
      e08d06b3
  7. May 09, 2017
• [SPARK-20548][FLAKY-TEST] share one REPL instance among REPL test cases · f561a76b
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
`ReplSuite.newProductSeqEncoder with REPL defined class` was flaky and frequently threw OOM exceptions. By analyzing the heap dump, we found the reason: in each test case of `ReplSuite`, we create a REPL instance, which creates a classloader and loads a lot of classes related to `SparkContext`. For more details, please see https://github.com/apache/spark/pull/17833#issuecomment-298711435.
      
In this PR, we create a new test suite, `SingletonReplSuite`, which shares one REPL instance among all the test cases. We then move most of the tests from `ReplSuite` to `SingletonReplSuite`, to avoid creating a lot of REPL instances and reduce the memory footprint.
      
      ## How was this patch tested?
      
      test only change
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #17844 from cloud-fan/flaky-test.
      f561a76b
  8. May 01, 2017
  9. Apr 24, 2017
  10. Apr 10, 2017
• [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish... · a26e3ed5
      Sean Owen authored
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish locale bug" causes Spark problems
      
      ## What changes were proposed in this pull request?
      
      Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
      
      The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
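A small example of why this matters (illustrative, not from the patch):

```
import java.util.Locale

val level = "INFO"
val localeSensitive = level.toLowerCase               // under a Turkish default locale: "ınfo" (dotless i)
val localeStable    = level.toLowerCase(Locale.ROOT)  // always "info", regardless of the JVM's locale
```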
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17527 from srowen/SPARK-20156.
      a26e3ed5
  11. Mar 08, 2017
• [SPARK-18055][SQL] Use correct mirror in ExpressionEncoder · 314e48a3
      Michael Armbrust authored
Previously, we were using the mirror of the passed-in `TypeTag` when reflecting to build an encoder. This fails when the outer class is built-in (e.g. `Seq`, whose default mirror is based on the root classloader) but the inner classes (e.g. `A` in `Seq[A]`) are defined in the REPL or a library.
      
      This patch changes us to always reflect based on a mirror created using the context classloader.
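A minimal sketch of the approach (not the `ExpressionEncoder` code itself): build the mirror from the context classloader so that REPL-defined classes are visible to reflection.

```
import scala.reflect.runtime.{universe => ru}

val mirror = ru.runtimeMirror(Thread.currentThread().getContextClassLoader)
val listSymbol = mirror.staticClass("scala.collection.immutable.List")  // resolved via the context loader
```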
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #17201 from marmbrus/replSeqEncoder.
      314e48a3
  12. Feb 09, 2017
• [SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.cancelOnInterrupt · 303f00a4
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      `Signaling.cancelOnInterrupt` leaks a SparkContext per call and it makes ReplSuite unstable.
      
      This PR adds `SparkContext.getActive` to allow `Signaling.cancelOnInterrupt` to get the active `SparkContext` to avoid the leak.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16825 from zsxwing/SPARK-19481.
      303f00a4
  13. Jan 18, 2017
  14. Dec 27, 2016
• [SPARK-18842][TESTS] De-duplicate paths in classpaths in processes for... · d8e14db8
      hyukjinkwon authored
      [SPARK-18842][TESTS] De-duplicate paths in classpaths in processes for local-cluster mode in ReplSuite to work around the length limitation on Windows
      
      ## What changes were proposed in this pull request?
      
      `ReplSuite`s hang due to the length limitation on Windows with the exception as below:
      
      ```
      Spark context available as 'sc' (master = local-cluster[1,1,1024], app id = app-20161223114000-0000).
      Spark session available as 'spark'.
      Exception in thread "ExecutorRunner for app-20161223114000-0000/26995" java.lang.OutOfMemoryError: GC overhead limit exceeded
      	at java.util.Arrays.copyOf(Arrays.java:3332)
      	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
      	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
      	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
      	at java.lang.StringBuilder.append(StringBuilder.java:202)
      	at java.lang.ProcessImpl.createCommandLine(ProcessImpl.java:194)
      	at java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
      	at java.lang.ProcessImpl.start(ProcessImpl.java:137)
      	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
      	at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:167)
      	at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
      ```
      
The reason is that it keeps failing and goes into an infinite loop. It fails because the tests use paths obtained from URLs (via `getFile`), whereas some paths added afterward are plain local paths.
(`url.getFile` gives `/C:/a/b/c`, while some paths are added later in the `C:\a\b\c` form.)

So many classpath entries are duplicated because plain local paths and paths from URLs are mixed. The resulting classpath is up to 40K characters, which hits the length limitation (32K) on Windows.
      
      The full command line built here is - https://gist.github.com/HyukjinKwon/46af7946c9a5fd4c6fc70a8a0aba1beb
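A hedged sketch of the de-duplication idea (not the PR's exact code): normalise every entry to a canonical local path and drop duplicates before building the child command line.

```
import java.io.File

def dedupeClasspath(entries: Seq[String]): String =
  entries.map(e => new File(e).getAbsolutePath).distinct.mkString(File.pathSeparator)
```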
      
      ## How was this patch tested?
      
      Manually via AppVeyor.
      
      **Before**
      https://ci.appveyor.com/project/spark-test/spark/build/395-find-path-issues
      
      **After**
      https://ci.appveyor.com/project/spark-test/spark/build/398-find-path-issues
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16398 from HyukjinKwon/SPARK-18842-more.
      d8e14db8
  15. Dec 21, 2016
• [SPARK-17807][CORE] split test-tags into test-JAR · afd9bc1d
      Ryan Williams authored
      Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.
      
      Alternative to #16303.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #16311 from ryan-williams/tt.
      afd9bc1d
  16. Dec 03, 2016
• [SPARK-18685][TESTS] Fix URI and release resources after opening in tests at... · d1312fb7
      hyukjinkwon authored
      [SPARK-18685][TESTS] Fix URI and release resources after opening in tests at ExecutorClassLoaderSuite
      
      ## What changes were proposed in this pull request?
      
      This PR fixes two problems as below:
      
- Close the `BufferedSource` after `Source.fromInputStream(...)` to release the resource and make the tests in `ExecutorClassLoaderSuite` pass on Windows
      
        ```
        [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.repl.ExecutorClassLoaderSuite *** ABORTED *** (7 seconds, 333 milliseconds)
        [info]   java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-77b2f37b-6405-47c4-af1c-4a6a206511f2
        [info]   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
        [info]   at org.apache.spark.repl.ExecutorClassLoaderSuite.afterAll(ExecutorClassLoaderSuite.scala:76)
        [info]   at org.scalatest.BeforeAndAfterAll$class.afterAll(BeforeAndAfterAll.scala:213)
        ...
        ```
      
- Build the URI correctly so that the related tests pass on Windows (see the sketch after these logs).
      
        ```
        [info] - child first *** FAILED *** (78 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - parent first *** FAILED *** (15 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - child first can fall back *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - child first can fail *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - resource from parent *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - resources from parent *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ```
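A minimal sketch of both fixes, using a temp file for illustration (this is not the suite's code):

```
import java.nio.file.Files
import scala.io.Source

val file = Files.createTempFile("spark-test", ".txt").toFile

// 1) Release the resource: close the BufferedSource once it has been read.
val source = Source.fromInputStream(Files.newInputStream(file.toPath))
try source.mkString finally source.close()

// 2) Build the URI via File.toURI instead of concatenating "file://" + path,
//    which on Windows produces the illegal "file://C:\..." authority seen above.
val goodUri = file.toURI
```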
      
      ## How was this patch tested?
      
      Manually tested via AppVeyor.
      
      **Before**
      https://ci.appveyor.com/project/spark-test/spark/build/102-rpel-ExecutorClassLoaderSuite
      
      **After**
      https://ci.appveyor.com/project/spark-test/spark/build/108-rpel-ExecutorClassLoaderSuite
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16116 from HyukjinKwon/close-after-open.
      d1312fb7
  17. Dec 02, 2016
  18. Nov 05, 2016
  19. Nov 01, 2016
• [SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset · 8a538c97
      Ergin Seyfe authored
      ## What changes were proposed in this pull request?
Like [Dataset.scala](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L156), KeyValueGroupedDataset should mark its queryExecution as transient.
      
      As mentioned in the Jira ticket, without transient we saw serialization issues like
      
      ```
      Caused by: java.io.NotSerializableException: org.apache.spark.sql.execution.QueryExecution
      Serialization stack:
              - object not serializable (class: org.apache.spark.sql.execution.QueryExecution, value: ==
      ```
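A hedged sketch of the fix (types simplified and hypothetical): marking the non-serializable member `@transient` keeps it out of the serialized object graph.

```
class NotSerializablePlan                          // stands in for QueryExecution
class GroupedDatasetLike(@transient val queryExecution: NotSerializablePlan) extends Serializable
```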
      
      ## How was this patch tested?
      
      Run the query which is specified in the Jira ticket before and after:
      ```
val a = spark.createDataFrame(sc.parallelize(Seq((1, 2), (3, 4)))).as[(Int, Int)]
val grouped = a.groupByKey { x: (Int, Int) => x._1 }
val mappedGroups = grouped.mapGroups((k, x) => (k, 1))
val yyy = sc.broadcast(1)
val last = mappedGroups.rdd.map { xx =>
  val simpley = yyy.value
  1
}
      ```
      
      Author: Ergin Seyfe <eseyfe@fb.com>
      
      Closes #15706 from seyfe/keyvaluegrouped_serialization.
      8a538c97
  20. Oct 11, 2016
• [SPARK-17720][SQL] introduce static SQL conf · b9a14718
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      SQLConf is session-scoped and mutable. However, we do have the requirement for a static SQL conf, which is global and immutable, e.g. the `schemaStringThreshold` in `HiveExternalCatalog`, the flag to enable/disable hive support, the global temp view database in https://github.com/apache/spark/pull/14897.
      
Actually, we have already implemented static SQL confs implicitly via `SparkConf`. This PR just makes them explicit and exposes them to users, so that they can see the config values via SQL commands or `SparkSession.conf`, and it forbids users from setting/unsetting static SQL confs.
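A usage sketch of how a static conf behaves (illustrative; `spark.sql.warehouse.dir` is assumed here to be one of the static confs):

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")  // static confs are set at build time
  .getOrCreate()

spark.conf.get("spark.sql.warehouse.dir")          // visible through SparkSession.conf
// spark.conf.set("spark.sql.warehouse.dir", "x")  // would be rejected for a static conf
```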
      
      ## How was this patch tested?
      
      new tests in SQLConfSuite
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #15295 from cloud-fan/global-conf.
      b9a14718
  21. Sep 08, 2016
• [SPARK-15487][WEB UI] Spark Master UI to reverse proxy Application and Workers UI · 92ce8d48
      Gurvinder Singh authored
      ## What changes were proposed in this pull request?
      
This pull request adds the functionality to access the worker and application UIs through the master UI itself. This helps in accessing the Spark UI when running a Spark cluster on a closed network, e.g. Kubernetes. The cluster admin needs to expose only the Spark master UI; the rest of the UIs can stay on the private network, and the master UI reverse-proxies connection requests to the corresponding resource. It adds the paths for the worker/application UIs as:
      
      WorkerUI: <http/https>://master-publicIP:<port>/target/workerID/
      ApplicationUI: <http/https>://master-publicIP:<port>/target/appID/
      
This makes it easy for users to protect access to the Spark master cluster by putting a reverse proxy in front of it, e.g. https://github.com/bitly/oauth2_proxy
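A hedged sketch of enabling this (the exact config keys below are assumptions based on the description, not verified against the PR):

```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.ui.reverseProxy", "true")                          // master proxies worker/application UIs
  .set("spark.ui.reverseProxyUrl", "https://master.example.com") // public address placed in front of the master
```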
      
      ## How was this patch tested?
      
      The functionality has been tested manually and there is a unit test too for testing access to worker UI with reverse proxy address.
      
      pwendell bomeng BryanCutler can you please review it, thanks.
      
      Author: Gurvinder Singh <gurvinder.singh@uninett.no>
      
      Closes #13950 from gurvindersingh/rproxy.
      92ce8d48
  22. Sep 01, 2016
• [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl again · 21c0a4fe
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
After digging into the logs, I noticed the failure happens because this test starts a local cluster with 2 executors, but when the SparkContext is created the executors may still not be up. When one of the executors is not up while the job runs, the blocks won't be replicated.
      
      This PR just adds a wait loop before running the job to fix the flaky test.
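A hedged sketch of such a wait loop (not the test's exact code), using the public `getExecutorMemoryStatus` API:

```
import org.apache.spark.SparkContext

def waitForExecutors(sc: SparkContext, expected: Int, timeoutMs: Long = 30000): Unit = {
  val deadline = System.currentTimeMillis() + timeoutMs
  // getExecutorMemoryStatus also includes the driver, hence expected + 1 entries.
  while (sc.getExecutorMemoryStatus.size < expected + 1 && System.currentTimeMillis() < deadline) {
    Thread.sleep(100)
  }
}
```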
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #14905 from zsxwing/SPARK-17318-2.
      21c0a4fe
  23. Aug 30, 2016
  24. Aug 22, 2016
• [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication · 8e223ea6
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
This is a straightforward clone of JoshRosen's original patch. I have follow-up changes to fix block replication for repl-defined classes as well, but those appear to cause flaky tests, so I'm going to leave that for SPARK-17042.
      
      ## How was this patch tested?
      
      End-to-end test in ReplSuite (also more tests in DistributedSuite from the original patch).
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #14311 from ericl/spark-16550.
      8e223ea6
  25. Aug 17, 2016
• [SPARK-16736][CORE][SQL] purge superfluous fs calls · cc97ea18
      Steve Loughran authored
A review of the code, working back from Hadoop's `FileSystem.exists()` and `FileSystem.isDirectory()` implementations, removing uses of these calls where they are superfluous.

1. `delete` is harmless if called on a nonexistent path, so don't do any checks before deletes.
2. Any `FileSystem.exists()` check before `getFileStatus()` or `open()` is superfluous, as the operation itself does the check; instead the `FileNotFoundException` is caught and triggers the downgraded path (see the sketch below). Where a `FileNotFoundException` was thrown before, the code still creates a new FNFE with the error messages, but the inner exceptions are now nested for easier diagnostics.
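A minimal sketch of the pattern in item 2, under assumed names (not code from this PR):

```
import java.io.FileNotFoundException
import org.apache.hadoop.fs.{FileSystem, Path}

def sizeIfPresent(fs: FileSystem, path: Path): Option[Long] =
  try {
    Some(fs.getFileStatus(path).getLen)    // one call instead of exists() + getFileStatus()
  } catch {
    case _: FileNotFoundException => None  // the downgraded path; wrap and rethrow with context if needed
  }
```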
      
      Initially, relying on Jenkins test runs.
      
One trouble spot here is that some of the code paths are clearly error situations; it's not clear that they have coverage anyway. Trying to create the failure conditions in tests would be ideal, but it will also be hard.
      
      Author: Steve Loughran <stevel@apache.org>
      
      Closes #14371 from steveloughran/cloud/SPARK-16736-superfluous-fs-calls.
      cc97ea18
  26. Aug 08, 2016
• [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add... · 9216901d
      Holden Karau authored
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting
      
      ## What changes were proposed in this pull request?
      
Avoid using postfix operators for command execution in SQLQuerySuite, where they weren't whitelisted, and audit the existing whitelistings, removing postfix operators from most places. Some notable places where postfix operators remain are the XML parsing and time-unit code (seconds, millis, etc.), where they arguably improve readability.
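For illustration, the postfix form versus the equivalent dot call that the audit prefers in most places:

```
import scala.concurrent.duration._
import scala.language.postfixOps

val timeoutPostfix = 10 seconds   // postfix operator; needs the language import
val timeoutPlain   = 10.seconds   // same value, without the postfix feature
```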
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
      9216901d
  27. Aug 03, 2016
• [SPARK-16770][BUILD] Fix JLine dependency management and version (Sca… · 4775eb41
      Stefan Schulze authored
      ## What changes were proposed in this pull request?
As of Scala 2.11.x there is no longer an org.scala-lang:jline version aligned with the Scala version itself; the Scala console now uses the plain jline:jline module. Spark's dependency management did not reflect this change properly, causing Maven to pull in JLine via a transitive dependency. Unfortunately JLine 2.12 contained a minor but very annoying bug that rendered the shell almost useless for developers with a German keyboard layout. This request contains the following changes:
      - Exclude transitive dependency 'jline:jline' from hive-exec module
      - Remove global properties 'jline.version' and 'jline.groupId'
      - Add both properties and dependency to 'scala-2.11' profile
      - Add explicit dependency on 'jline:jline' to  module 'spark-repl'
      
      ## How was this patch tested?
      - Running mvn dependency:tree and checking for correct Jline version 2.12.1
      - Running full builds with assembly and checking for jline-2.12.1.jar in 'lib' folder of generated tarball
      
      Author: Stefan Schulze <stefan.schulze@pentasys.de>
      
      Closes #14429 from stsc-pentasys/SPARK-16770.
      4775eb41
  28. Jul 31, 2016
• [SPARK-16812] Open up SparkILoop.getAddedJars · 7c27d075
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch makes SparkILoop.getAddedJars a public developer API. It is a useful function to get the list of jars added.
      
      ## How was this patch tested?
      N/A - this is a simple visibility change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14417 from rxin/SPARK-16812.
      7c27d075
  29. Jul 19, 2016
  30. Jul 14, 2016
• [SPARK-16540][YARN][CORE] Avoid adding jars twice for Spark running on yarn · 91575cac
      jerryshao authored
      ## What changes were proposed in this pull request?
      
Currently, when running Spark on YARN, jars specified with --jars or --packages are added twice: once to Spark's own file server and once to YARN's distributed cache. This can be seen from the log; for example:
      
      ```
      ./bin/spark-shell --master yarn-client --jars examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
      ```
      
If the jar to be added is the scopt jar, it is added twice:
      
      ```
      ...
      16/07/14 15:06:48 INFO Server: Started 5603ms
      16/07/14 15:06:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
      16/07/14 15:06:48 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
      16/07/14 15:06:48 INFO SparkContext: Added JAR file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar at spark://192.168.0.102:63996/jars/scopt_2.11-3.3.0.jar with timestamp 1468480008637
      16/07/14 15:06:49 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
      16/07/14 15:06:49 INFO Client: Requesting a new application from cluster with 1 NodeManagers
      16/07/14 15:06:49 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
      16/07/14 15:06:49 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
      16/07/14 15:06:49 INFO Client: Setting up container launch context for our AM
      16/07/14 15:06:49 INFO Client: Setting up the launch environment for our AM container
      16/07/14 15:06:49 INFO Client: Preparing resources for our AM container
      16/07/14 15:06:49 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
      16/07/14 15:06:50 INFO Client: Uploading resource file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_libs__6486179704064718817.zip -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_libs__6486179704064718817.zip
      16/07/14 15:06:51 INFO Client: Uploading resource file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/scopt_2.11-3.3.0.jar
      16/07/14 15:06:51 INFO Client: Uploading resource file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_conf__326416236462420861.zip -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_conf__.zip
      ...
      ```
      
So this change avoids adding jars to Spark's file server unnecessarily.
      
      ## How was this patch tested?
      
Manually verified in both YARN client and cluster mode, and also in standalone mode.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #14196 from jerryshao/SPARK-16540.
      91575cac
  31. Jul 11, 2016
• [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  32. Jun 24, 2016
• [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite · f4fd7432
      peng.zhang authored
      ## What changes were proposed in this pull request?
      
Since SPARK-13220 (Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite doesn't test "yarn cluster" mode correctly.
      This pull request fixes it.
      
      ## How was this patch tested?
      Unit test
      
      
      Author: peng.zhang <peng.zhang@xiaomi.com>
      
      Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode.
      f4fd7432
  33. Jun 19, 2016
• [SPARK-15942][REPL] Unblock `:reset` command in REPL. · 1b3a9b96
      Prashant Sharma authored
## What changes were proposed in this pull request?
(Pasted from the JIRA issue.)
As a follow-up to SPARK-15697, I propose the following semantics for the `:reset` command.
On `:reset` we forget everything the user has done, but not the initialization of Spark. To avoid confusion and make this clear, we show a message that `spark` and `sc` are not erased; in fact they remain in the same state that the user's previous operations left them in.
While doing the above, I did feel that this is not what reset usually means. But an accidental shutdown of a cluster can be very costly, so perhaps in that sense this is less surprising and still useful.
      
      ## How was this patch tested?
      
      Manually, by calling `:reset` command, by both altering the state of SparkContext and creating some local variables.
      
      Author: Prashant Sharma <prashant@apache.org>
      Author: Prashant Sharma <prashsh1@in.ibm.com>
      
      Closes #13661 from ScrapCodes/repl-reset-command.
      1b3a9b96
  34. Jun 16, 2016
• [SPARK-15782][YARN] Fix spark.jars and spark.yarn.dist.jars handling · 63470afc
      Nezih Yigitbasi authored
      When `--packages` is specified with spark-shell the classes from those packages cannot be found, which I think is due to some of the changes in SPARK-12343.
      
      Tested manually with both scala 2.10 and 2.11 repls.
      
      vanzin davies can you guys please review?
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      Author: Nezih Yigitbasi <nyigitbasi@netflix.com>
      
      Closes #13709 from nezihyigitbasi/SPARK-15782.
      63470afc
  35. Jun 15, 2016
• Davies Liu · a153e41c
• [SPARK-15782][YARN] Set spark.jars system property in client mode · 4df8df5c
      Nezih Yigitbasi authored
      ## What changes were proposed in this pull request?
      
      When `--packages` is specified with `spark-shell` the classes from those packages cannot be found, which I think is due to some of the changes in `SPARK-12343`. In particular `SPARK-12343` removes a line that sets the `spark.jars` system property in client mode, which is used by the repl main class to set the classpath.
      
      ## How was this patch tested?
      
      Tested manually.
      
      This system property is used by the repl to populate its classpath. If
      this is not set properly the classes for external packages cannot be
      found.
      
      tgravescs vanzin as you may be familiar with this part of the code.
      
      Author: Nezih Yigitbasi <nyigitbasi@netflix.com>
      
      Closes #13527 from nezihyigitbasi/repl-fix.
      4df8df5c
  36. Jun 13, 2016
• [SPARK-15697][REPL] Unblock some of the useful repl commands. · 4134653e
      Prashant Sharma authored
      ## What changes were proposed in this pull request?
      
Unblock some of the useful REPL commands, like "implicits", "javap", "power", "type", and "kind". As they are useful, fully functional, and part of the scala/scala project, I see no harm in having them.

Verbatim paste from the JIRA description.
The "implicits", "javap", "power", "type", and "kind" commands in the REPL are blocked. However, they work fine in all cases I have tried. It is clear we don't support them, as they are part of the scala/scala REPL project. What is the harm in unblocking them, given that they are useful?
In previous versions of Spark we disabled these commands because it was difficult to support them without customization and the associated maintenance, since the code base of the Scala REPL was at that time ported and maintained under the Spark source. That is no longer the situation, and one can benefit from these commands in the Spark REPL as much as in the Scala REPL.
      
      ## How was this patch tested?
      Existing tests and manual, by trying out all of the above commands.
      
P.S. The semantics of reset are to be discussed in a separate issue.
      
      Author: Prashant Sharma <prashsh1@in.ibm.com>
      
      Closes #13437 from ScrapCodes/SPARK-15697/repl-unblock-commands.
      4134653e
  37. Jun 09, 2016
• [SPARK-15841][Tests] REPLSuite has incorrect env set for a couple of tests. · 83070cd1
      Prashant Sharma authored
      Description from JIRA.
In ReplSuite, a test that can be verified well in local mode should not have to start a local-cluster. Similarly, a test is insufficient if it is meant to cover a fix for a distributed-run problem but only exercises a local run.
      
      Existing tests.
      
      Author: Prashant Sharma <prashsh1@in.ibm.com>
      
      Closes #13574 from ScrapCodes/SPARK-15841/repl-suite-fix.
      83070cd1
  38. Jun 02, 2016
• [SPARK-15322][SQL][FOLLOWUP] Use the new long accumulator for old int accumulators. · 252417fa
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR corrects the remaining cases for using old accumulators.
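For reference, the old int accumulator versus the new long accumulator API (illustrative spark-shell usage assuming an `sc` in scope; these are not this PR's diffs):

```
val oldCounter = sc.accumulator(0)             // deprecated old-style accumulator
val newCounter = sc.longAccumulator("counter") // new LongAccumulator

sc.parallelize(1 to 100).foreach(_ => newCounter.add(1))
newCounter.value                               // 100
```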
      
      This does not change some old accumulator usages below:
      
      - `ImplicitSuite.scala` - Tests dedicated to old accumulator, for implicits with `AccumulatorParam`
      
      - `AccumulatorSuite.scala` -  Tests dedicated to old accumulator
      
      - `JavaSparkContext.scala` - For supporting old accumulators for Java API.
      
- `debug.package.scala` - Usage with `HashSet[String]`. Currently, there seems to be no equivalent implementation for this. I might be able to write an anonymous class for it, but I didn't because I don't think it is worth writing a lot of code only for this.
      
- `SQLMetricsSuite.scala` - This uses the old accumulator for checking type boxing. It seems the new accumulator does not require type boxing for this case, whereas the old one does (due to the use of generics).
      
      ## How was this patch tested?
      
      Existing tests cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #13434 from HyukjinKwon/accum.
      252417fa
  39. May 31, 2016
• [SPARK-15236][SQL][SPARK SHELL] Add spark-defaults property to switch to use InMemoryCatalog · 04f925ed
      xin Wu authored
      ## What changes were proposed in this pull request?
This PR changes REPL/Main to check the property `spark.sql.catalogImplementation` to decide whether `enableHiveSupport` should be called.

If `spark.sql.catalogImplementation` is set to `hive` and the Hive classes are built, Spark will use Hive support. Otherwise, Spark will create a SparkSession with in-memory catalog support.
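A hedged sketch of that decision (not the actual REPL `Main` code; the real change also checks that the Hive classes are present before enabling Hive support):

```
import org.apache.spark.sql.SparkSession

val useHive = sys.props.getOrElse("spark.sql.catalogImplementation", "in-memory") == "hive"
val builder = SparkSession.builder().master("local[*]")
val spark = (if (useHive) builder.enableHiveSupport() else builder).getOrCreate()
```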
      
      ## How was this patch tested?
      Run the REPL component test.
      
      Author: xin Wu <xinwu@us.ibm.com>
      Author: Xin Wu <xinwu@us.ibm.com>
      
      Closes #13088 from xwu0226/SPARK-15236.
      04f925ed