  1. Apr 25, 2017
  2. Apr 14, 2017
  3. Mar 28, 2017
  4. Mar 21, 2017
  5. Mar 08, 2017
    • [SPARK-18055][SQL] Use correct mirror in ExpressionEncoder · 320eff14
      Michael Armbrust authored
      
      Previously, we used the mirror of the passed-in `TypeTag` when reflecting to build an encoder. This fails when the outer class is built in (e.g. `Seq`, whose default mirror is based on the root classloader) but the inner classes (e.g. `A` in `Seq[A]`) are defined in the REPL or a library.
      
      This patch changes the encoder to always reflect using a mirror created from the context classloader.
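
      The visibility gap between the two class loaders can be seen with plain JVM calls. This is a minimal sketch rather than the patch itself: `LoaderSketch` and its helpers are hypothetical names, and the JDK class `String` stands in for a built-in outer type.

      ```scala
      object LoaderSketch {
        // Hypothetical helper: can `loader` resolve the class called `name`?
        // A null loader means the bootstrap (root) class loader.
        def canLoad(loader: ClassLoader, name: String): Boolean =
          try { Class.forName(name, false, loader); true }
          catch { case _: ClassNotFoundException => false }

        // Name of a user-defined class (this object), standing in for `A` in `Seq[A]`.
        val userClass: String = getClass.getName

        // Loader of a built-in JDK class: the bootstrap loader, reported as null.
        val builtInLoader: ClassLoader = classOf[String].getClassLoader

        // The loader that a context-based mirror would reflect through.
        def contextLoader: ClassLoader = Thread.currentThread().getContextClassLoader
      }
      ```

      A lookup through `builtInLoader` cannot see `userClass`, which mirrors the failure described above; `contextLoader` is the loader that normally can.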
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #17201 from marmbrus/replSeqEncoder.
      
      (cherry picked from commit 314e48a3)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
      320eff14
  6. Feb 09, 2017
  7. Dec 22, 2016
  8. Dec 15, 2016
  9. Dec 08, 2016
  10. Dec 03, 2016
    • [SPARK-18685][TESTS] Fix URI and release resources after opening in tests at ExecutorClassLoaderSuite · 28ea432a
      hyukjinkwon authored
      
      ## What changes were proposed in this pull request?
      
      This PR fixes two problems as below:
      
      - Close `BufferedSource` after `Source.fromInputStream(...)` to release the resource and make the tests in `ExecutorClassLoaderSuite` pass on Windows.
      
        ```
        [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.repl.ExecutorClassLoaderSuite *** ABORTED *** (7 seconds, 333 milliseconds)
        [info]   java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-77b2f37b-6405-47c4-af1c-4a6a206511f2
        [info]   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
        [info]   at org.apache.spark.repl.ExecutorClassLoaderSuite.afterAll(ExecutorClassLoaderSuite.scala:76)
        [info]   at org.scalatest.BeforeAndAfterAll$class.afterAll(BeforeAndAfterAll.scala:213)
        ...
        ```
      
      - Fix the URI so that the related tests pass on Windows.
      
        ```
        [info] - child first *** FAILED *** (78 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - parent first *** FAILED *** (15 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - child first can fall back *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - child first can fail *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - resource from parent *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - resources from parent *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ```
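
      Both fixes can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the code from the patch: `CloseAfterOpenSketch`, `readAll`, and `dirUri` are hypothetical names, and using `File.toURI` is one possible way to build a valid URI.

      ```scala
      import java.io.{File, InputStream}
      import java.net.URI
      import scala.io.Source

      object CloseAfterOpenSketch {
        // Read the whole stream, then close the source so the underlying
        // handle is released; Windows refuses to delete files that are open.
        def readAll(in: InputStream): String = {
          val source = Source.fromInputStream(in, "UTF-8")
          try source.mkString finally source.close()
        }

        // A string of the form file://C:\... is rejected by java.net.URI,
        // as in the log above. File.toURI builds a well-formed file: URI
        // on every platform.
        def dirUri(dir: File): URI = dir.toURI
      }
      ```

      Closing in a `finally` block releases the handle even when reading fails, so the temporary directory can still be deleted in `afterAll`.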
      
      ## How was this patch tested?
      
      Manually tested via AppVeyor.
      
      **Before**
      https://ci.appveyor.com/project/spark-test/spark/build/102-rpel-ExecutorClassLoaderSuite
      
      **After**
      https://ci.appveyor.com/project/spark-test/spark/build/108-rpel-ExecutorClassLoaderSuite
      
      
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16116 from HyukjinKwon/close-after-open.
      
      (cherry picked from commit d1312fb7)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
      28ea432a
  11. Nov 28, 2016
  12. Nov 05, 2016
  13. Nov 01, 2016
    • [SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset · 8a538c97
      Ergin Seyfe authored
      ## What changes were proposed in this pull request?
      Like [Dataset.scala](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L156), `KeyValueGroupedDataset` should mark `queryExecution` as transient.
      
      As mentioned in the Jira ticket, without `@transient` we saw serialization errors such as:
      
      ```
      Caused by: java.io.NotSerializableException: org.apache.spark.sql.execution.QueryExecution
      Serialization stack:
              - object not serializable (class: org.apache.spark.sql.execution.QueryExecution, value: ==
      ```
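
      The effect of `@transient` can be reproduced with plain Java serialization. This is a minimal sketch with hypothetical stand-in classes (`Plan` plays the role of the non-serializable `QueryExecution`); it does not use Spark.

      ```scala
      import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

      object TransientSketch {
        // Stand-in for a class that is not Serializable.
        class Plan

        // Holds the plan in an ordinary field: serialization must write it.
        class WithoutTransient(val plan: Plan) extends Serializable

        // Holds the plan in a transient field: serialization skips it.
        class WithTransient(@transient val plan: Plan) extends Serializable

        // True if Java serialization accepts the object.
        def serializes(obj: AnyRef): Boolean = {
          val out = new ObjectOutputStream(new ByteArrayOutputStream())
          try { out.writeObject(obj); true }
          catch { case _: NotSerializableException => false }
          finally out.close()
        }
      }
      ```

      After deserialization a transient field is `null`, so this approach only fits fields that are not needed on the receiving side.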
      
      ## How was this patch tested?
      
      Run the query specified in the Jira ticket before and after the change:
      ```
      // Run in spark-shell, where `spark` and `sc` are predefined.
      val a = spark.createDataFrame(sc.parallelize(Seq((1, 2), (3, 4)))).as[(Int, Int)]
      val grouped = a.groupByKey((x: (Int, Int)) => x._1)
      val mappedGroups = grouped.mapGroups((k, x) => (k, 1))
      val yyy = sc.broadcast(1)
      val last = mappedGroups.rdd.map { xx =>
        val simpley = yyy.value
        1
      }
      ```
      
      Author: Ergin Seyfe <eseyfe@fb.com>
      
      Closes #15706 from seyfe/keyvaluegrouped_serialization.
      8a538c97
  14. Oct 11, 2016
    • [SPARK-17720][SQL] introduce static SQL conf · b9a14718
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      SQLConf is session-scoped and mutable. However, we do have a requirement for static SQL confs, which are global and immutable, e.g. the `schemaStringThreshold` in `HiveExternalCatalog`, the flag to enable/disable Hive support, and the global temp view database in https://github.com/apache/spark/pull/14897.
      
      Actually, we've already implemented static SQL conf implicitly via `SparkConf`. This PR just makes it explicit and exposes it to users, so that they can see the config value via a SQL command or `SparkSession.conf`, and forbids users from setting or unsetting a static SQL conf.
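
      The intended behaviour (readable by everyone, writable only for session-scoped keys) can be sketched as follows. This is a hypothetical, self-contained model: `StaticConfSketch.Conf`, its methods, and the error messages are assumptions for illustration, not Spark's `SQLConf` API.

      ```scala
      object StaticConfSketch {
        // Hypothetical conf holder: `staticKeys` are fixed at construction
        // time; every other key can be changed per session.
        final class Conf(staticKeys: Set[String], initial: Map[String, String]) {
          private var settings: Map[String, String] = initial

          // Static and session values are both readable.
          def get(key: String): Option[String] = settings.get(key)

          // Rejects writes to static keys with an IllegalArgumentException.
          def set(key: String, value: String): Unit = {
            require(!staticKeys.contains(key), s"cannot modify static conf: $key")
            settings += (key -> value)
          }

          // Rejects removal of static keys in the same way.
          def unset(key: String): Unit = {
            require(!staticKeys.contains(key), s"cannot unset static conf: $key")
            settings -= key
          }
        }
      }
      ```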
      
      ## How was this patch tested?
      
      new tests in SQLConfSuite
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #15295 from cloud-fan/global-conf.
      b9a14718
  15. Sep 08, 2016
    • [SPARK-15487][WEB UI] Spark Master UI to reverse proxy Application and Workers UI · 92ce8d48
      Gurvinder Singh authored
      ## What changes were proposed in this pull request?
      
      This pull request adds the ability to access the worker and application UIs through the master UI itself. This helps when accessing the Spark UI of a cluster running in a closed network, e.g. on Kubernetes. The cluster admin needs to expose only the Spark master UI; the other UIs can stay in the private network, and the master UI reverse proxies each request to the corresponding resource. The worker and application UIs are reachable under the following paths:
      
      WorkerUI: <http/https>://master-publicIP:<port>/target/workerID/
      ApplicationUI: <http/https>://master-publicIP:<port>/target/appID/
      
      This makes it easy for users to protect access to the Spark master cluster by putting a reverse proxy in front of it, e.g. https://github.com/bitly/oauth2_proxy
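
      The path scheme above amounts to a lookup from the ID in the request path to the private address of the matching UI. The following is a minimal sketch of that mapping only, with hypothetical names (`ProxyPathSketch`, `rewrite`) and made-up addresses; it is not the servlet added by this pull request.

      ```scala
      import java.net.URI

      object ProxyPathSketch {
        private val Prefix = "/target/"

        // Map "/target/<id>/<rest>" to "<private UI address>/<rest>".
        // Returns None for paths outside the prefix or unknown IDs.
        def rewrite(path: String, targets: Map[String, URI]): Option[URI] =
          if (!path.startsWith(Prefix)) None
          else {
            // Split "<id>/<rest>" at the first slash after the prefix.
            val (id, rest) = path.substring(Prefix.length).span(_ != '/')
            targets.get(id).map { base =>
              val suffix = if (rest.isEmpty) "/" else rest
              URI.create(base.toString.stripSuffix("/") + suffix)
            }
          }
      }
      ```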
      
      ## How was this patch tested?
      
      The functionality has been tested manually and there is a unit test too for testing access to worker UI with reverse proxy address.
      
      pwendell bomeng BryanCutler can you please review it, thanks.
      
      Author: Gurvinder Singh <gurvinder.singh@uninett.no>
      
      Closes #13950 from gurvindersingh/rproxy.
      92ce8d48
  16. Sep 01, 2016
    • [SPARK-17318][TESTS] Fix ReplSuite replicating blocks of object with class defined in repl again · 21c0a4fe
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      After digging into the logs, I noticed that the failure occurs because this test starts a local cluster with 2 executors, but the executors may not be up yet when the SparkContext is created. If one of the executors is not up while the job runs, the blocks won't be replicated.
      
      This PR adds a wait loop before running the job to fix the flaky test.
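
      A wait loop of this kind can be sketched generically. The helper below is hypothetical (`WaitLoopSketch.waitUntil` is not the code added to `ReplSuite`): it polls a condition until it holds or a timeout expires.

      ```scala
      object WaitLoopSketch {
        // Poll `condition` every `pollMs` milliseconds until it returns true
        // or `timeoutMs` has elapsed. Returns the last observed value.
        def waitUntil(timeoutMs: Long, pollMs: Long = 10)(condition: => Boolean): Boolean = {
          val deadline = System.nanoTime() + timeoutMs * 1000000L
          var ok = condition
          while (!ok && System.nanoTime() < deadline) {
            Thread.sleep(pollMs)
            ok = condition
          }
          ok
        }
      }
      ```

      In a test, the condition would be something like "the expected number of executors has registered", checked before the job that relies on replication is submitted.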
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #14905 from zsxwing/SPARK-17318-2.
      21c0a4fe
  17. Aug 30, 2016
  18. Aug 22, 2016
    • [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication · 8e223ea6
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
      This is a straightforward clone of JoshRosen's original patch. I have follow-up changes to fix block replication for repl-defined classes as well, but those appear to make tests flaky, so I'm going to leave that for SPARK-17042.
      
      ## How was this patch tested?
      
      End-to-end test in ReplSuite (also more tests in DistributedSuite from the original patch).
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #14311 from ericl/spark-16550.
      8e223ea6
  19. Aug 17, 2016
    • [SPARK-16736][CORE][SQL] purge superfluous fs calls · cc97ea18
      Steve Loughran authored
      A review of the code, working back from Hadoop's `FileSystem.exists()` and `FileSystem.isDirectory()` code, then removing uses of the calls where they are superfluous.
      
      1. `delete` is harmless if called on a nonexistent path, so don't do any checks before deletes.
      2. Any `FileSystem.exists()` check before `getFileStatus()` or `open()` is superfluous, as the operation itself does the check. Instead, the `FileNotFoundException` is caught and triggers the downgraded path. Where a `FileNotFoundException` was thrown before, the code still creates a new FNFE with the error message, but the inner exception is now nested for easier diagnostics.
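
      The pattern in the second point can be sketched with `java.io` in place of Hadoop's `FileSystem` (a deliberate substitution to keep the example self-contained). `NoExistsCheckSketch` and its method names are hypothetical.

      ```scala
      import java.io.{FileInputStream, FileNotFoundException}

      object NoExistsCheckSketch {
        // Open directly instead of calling exists() first; a missing file
        // is reported by the open itself and handled on the fallback path.
        def firstByte(path: String): Option[Int] =
          try {
            val in = new FileInputStream(path)
            try Some(in.read()) finally in.close()
          } catch {
            case _: FileNotFoundException => None
          }

        // Rethrow with a more descriptive message while keeping the
        // original exception attached as the cause.
        def firstByteOrExplain(path: String): Int =
          try {
            val in = new FileInputStream(path)
            try in.read() finally in.close()
          } catch {
            case e: FileNotFoundException =>
              val wrapped = new FileNotFoundException(s"Input not found: $path")
              wrapped.initCause(e)
              throw wrapped
          }
      }
      ```

      Skipping the existence check saves one round trip per call, which matters most on remote or object-store file systems.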
      
      Initially, relying on Jenkins test runs.
      
      One trouble spot here is that some of the code paths are clearly error situations, and it's not clear that they have test coverage anyway. Creating the failure conditions in tests would be ideal, but it will also be hard.
      
      Author: Steve Loughran <stevel@apache.org>
      
      Closes #14371 from steveloughran/cloud/SPARK-16736-superfluous-fs-calls.
      cc97ea18
  20. Aug 08, 2016
    • [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting · 9216901d
      Holden Karau authored
      
      ## What changes were proposed in this pull request?
      
      Avoid using postfix operators for command execution in SQLQuerySuite, where they weren't whitelisted, and audit the existing whitelistings, removing postfix operators from most places. Some notable places where postfix operators remain are the XML parsing and time units (seconds, millis, etc.), where they arguably improve readability.
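
      For readers unfamiliar with the distinction, here is a minimal sketch of the two forms using time units from the standard library. `PostfixSketch` is a hypothetical name, and the example is not taken from the patch.

      ```scala
      import scala.concurrent.duration._
      import scala.language.postfixOps

      object PostfixSketch {
        // Postfix form: needs the scala.language.postfixOps import to
        // compile cleanly. Parentheses keep the parse unambiguous.
        val postfixForm: FiniteDuration = (10 seconds)

        // Method-call form: needs no language import.
        val dottedForm: FiniteDuration = 10.seconds
      }
      ```

      Both values are equal; the difference is purely syntactic.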
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
      9216901d
  21. Aug 03, 2016
    • [SPARK-16770][BUILD] Fix JLine dependency management and version (Sca… · 4775eb41
      Stefan Schulze authored
      ## What changes were proposed in this pull request?
      As of Scala 2.11.x there is no longer an org.scala-lang:jline version aligned with the Scala version itself. The Scala console now uses the plain jline:jline module. Spark's dependency management did not reflect this change properly, causing Maven to pull in JLine via a transitive dependency. Unfortunately, JLine 2.12 contained a minor but very annoying bug that rendered the shell almost useless for developers with a German keyboard layout. This request contains the following changes:
      - Exclude transitive dependency 'jline:jline' from hive-exec module
      - Remove global properties 'jline.version' and 'jline.groupId'
      - Add both properties and dependency to 'scala-2.11' profile
      - Add explicit dependency on 'jline:jline' to module 'spark-repl'
      
      ## How was this patch tested?
      - Running mvn dependency:tree and checking for correct Jline version 2.12.1
      - Running full builds with assembly and checking for jline-2.12.1.jar in 'lib' folder of generated tarball
      
      Author: Stefan Schulze <stefan.schulze@pentasys.de>
      
      Closes #14429 from stsc-pentasys/SPARK-16770.
      4775eb41
  22. Jul 31, 2016
    • [SPARK-16812] Open up SparkILoop.getAddedJars · 7c27d075
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch makes SparkILoop.getAddedJars a public developer API. It is a useful function to get the list of jars added.
      
      ## How was this patch tested?
      N/A - this is a simple visibility change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14417 from rxin/SPARK-16812.
      7c27d075
  23. Jul 19, 2016
  24. Jul 14, 2016
    • [SPARK-16540][YARN][CORE] Avoid adding jars twice for Spark running on yarn · 91575cac
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Currently, when running Spark on YARN, jars specified with `--jars` or `--packages` are added twice: once to Spark's own file server and once to YARN's distributed cache. This can be seen in the log, for example:
      
      ```
      ./bin/spark-shell --master yarn-client --jars examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
      ```
      
      If the jar to be added is the scopt jar, it is added twice:
      
      ```
      ...
      16/07/14 15:06:48 INFO Server: Started 5603ms
      16/07/14 15:06:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
      16/07/14 15:06:48 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
      16/07/14 15:06:48 INFO SparkContext: Added JAR file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar at spark://192.168.0.102:63996/jars/scopt_2.11-3.3.0.jar with timestamp 1468480008637
      16/07/14 15:06:49 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
      16/07/14 15:06:49 INFO Client: Requesting a new application from cluster with 1 NodeManagers
      16/07/14 15:06:49 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
      16/07/14 15:06:49 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
      16/07/14 15:06:49 INFO Client: Setting up container launch context for our AM
      16/07/14 15:06:49 INFO Client: Setting up the launch environment for our AM container
      16/07/14 15:06:49 INFO Client: Preparing resources for our AM container
      16/07/14 15:06:49 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
      16/07/14 15:06:50 INFO Client: Uploading resource file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_libs__6486179704064718817.zip -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_libs__6486179704064718817.zip
      16/07/14 15:06:51 INFO Client: Uploading resource file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/scopt_2.11-3.3.0.jar
      16/07/14 15:06:51 INFO Client: Uploading resource file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_conf__326416236462420861.zip -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_conf__.zip
      ...
      ```
      
      So this patch tries to avoid adding jars to Spark's file server unnecessarily.
      
      ## How was this patch tested?
      
      Manually verified both in yarn client and cluster mode, also in standalone mode.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #14196 from jerryshao/SPARK-16540.
      91575cac
  25. Jul 11, 2016
    • [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  26. Jun 24, 2016
    • [SPARK-16125][YARN] Fix not test yarn cluster mode correctly in YarnClusterSuite · f4fd7432
      peng.zhang authored
      ## What changes were proposed in this pull request?
      
      Since SPARK-13220 (Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite has not tested "yarn cluster" mode correctly.
      This pull request fixes it.
      
      ## How was this patch tested?
      Unit test
      
      Author: peng.zhang <peng.zhang@xiaomi.com>
      
      Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode.
      f4fd7432
  27. Jun 19, 2016
    • [SPARK-15942][REPL] Unblock `:reset` command in REPL. · 1b3a9b96
      Prashant Sharma authored
      ## What changes were proposed in this pull request?
      (Paste from JIRA issue.)
      As a follow-up to SPARK-15697, I propose the following semantics for the `:reset` command.
      On `:reset` we forget everything the user has done, but not the initialization of Spark. To avoid confusion, we show a message that `spark` and `sc` are not erased; in fact, they are in the same state as they were left in by the user's previous operations.
      While doing the above, I felt that this is not what reset usually means. But an accidental shutdown of a cluster can be very costly, so in that sense this may be less surprising and still useful.
      
      ## How was this patch tested?
      
      Manually, by calling `:reset` command, by both altering the state of SparkContext and creating some local variables.
      
      Author: Prashant Sharma <prashant@apache.org>
      Author: Prashant Sharma <prashsh1@in.ibm.com>
      
      Closes #13661 from ScrapCodes/repl-reset-command.
      1b3a9b96
  28. Jun 16, 2016
    • [SPARK-15782][YARN] Fix spark.jars and spark.yarn.dist.jars handling · 63470afc
      Nezih Yigitbasi authored
      When `--packages` is specified with spark-shell the classes from those packages cannot be found, which I think is due to some of the changes in SPARK-12343.
      
      Tested manually with both scala 2.10 and 2.11 repls.
      
      vanzin davies can you guys please review?
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      Author: Nezih Yigitbasi <nyigitbasi@netflix.com>
      
      Closes #13709 from nezihyigitbasi/SPARK-15782.
      63470afc
  29. Jun 15, 2016