  1. Jun 03, 2015
• [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
      2c4d550e
  2. May 29, 2015
• [SPARK-7558] Demarcate tests in unit-tests.log · 9eb222c1
      Andrew Or authored
Right now `unit-tests.log` is not of much value because we can't easily tell where the test boundaries are. This patch adds log statements before and after each test to outline the test boundaries, e.g.:
      
      ```
      ===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' =====
      
      15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents
      
      ...
      
      15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s
      15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite:
      
      ***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' *****
      
      ...
      ```
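The demarcation is implemented in a base suite that all tests extend (SparkFunSuite, per the commits below). A minimal sketch of the idea, with an illustrative class name and `println` standing in for the real logger:

```
import org.scalatest.{FunSuite, Outcome}

// Minimal sketch of the demarcation idea; the real implementation is
// SparkFunSuite, which logs through the Spark logger rather than println.
abstract class DemarcatedFunSuite extends FunSuite {
  override protected def withFixture(test: NoArgTest): Outcome = {
    val suiteName = this.getClass.getName
    try {
      println(s"\n===== TEST OUTPUT FOR $suiteName: '${test.name}' =====\n")
      test()  // run the actual test body
    } finally {
      println(s"\n***** FINISHED $suiteName: '${test.name}' *****\n")
    }
  }
}
```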
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6441 from andrewor14/demarcate-tests and squashes the following commits:
      
      879b060 [Andrew Or] Fix compile after rebase
      d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      7790b6c [Andrew Or] Fix tests after logical merge conflict
      c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      c43ffc4 [Andrew Or] Fix tests?
      8882581 [Andrew Or] Fix tests
      ee22cda [Andrew Or] Fix log message
      fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      12d1e1b [Andrew Or] Various whitespace changes (minor)
      69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
      bbce12e [Andrew Or] Fix manual things that cannot be covered through automation
      da0b12f [Andrew Or] Add core tests as dependencies in all modules
      f7d29ce [Andrew Or] Introduce base abstract class for all test suites
      9eb222c1
  3. May 19, 2015
  4. May 13, 2015
• [SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path · 50c72708
      Masayoshi TSUZUKI authored
Escape spaces in the arguments so that jar paths containing spaces are handled correctly.
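One of the underlying fixes (016128d in the list below) is to build URIs via `File.toURI()` rather than hand-assembled strings, since `toURI` percent-encodes spaces. A minimal sketch with an illustrative path:

```
import java.io.File

// toURI() percent-encodes the space, yielding a well-formed URI
// (illustrative path; the output shown is for a Windows JVM).
val jar = new File("""C:\my jars\app.jar""")
println(jar.toURI)  // file:/C:/my%20jars/app.jar
```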
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #5447 from tsudukim/feature/SPARK-6568-2 and squashes the following commits:
      
      3f9a188 [Masayoshi TSUZUKI] modified some errors.
      ed46047 [Masayoshi TSUZUKI] avoid scalastyle errors.
      1784239 [Masayoshi TSUZUKI] removed Utils.formatPath.
      e03f289 [Masayoshi TSUZUKI] removed testWindows from Utils.resolveURI and Utils.resolveURIs. replaced SystemUtils.IS_OS_WINDOWS to Utils.isWindows. removed Utils.formatPath from PythonRunner.scala.
      84c33d0 [Masayoshi TSUZUKI] - use resolveURI in nonLocalPaths - run tests for Windows path only on Windows
      016128d [Masayoshi TSUZUKI] fixed to use File.toURI()
      2c62e3b [Masayoshi TSUZUKI] Merge pull request #1 from sarutak/SPARK-6568-2
      7019a8a [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6568-2
      45946ee [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6568-2
      10f1c73 [Kousuke Saruta] Added a comment
      93c3c40 [Kousuke Saruta] Merge branch 'classpath-handling-fix' of github.com:sarutak/spark into SPARK-6568-2
      649da82 [Kousuke Saruta] Fix classpath handling
      c7ba6a7 [Masayoshi TSUZUKI] [SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path
      50c72708
  5. May 08, 2015
• [SPARK-7489] [SPARK SHELL] Spark shell crashes when compiled with scala 2.11 · 4e7360e1
      vinodkc authored
Spark shell crashes when compiled with Scala 2.11 and SPARK_PREPEND_CLASSES=true.

There is a similar resolved JIRA issue, SPARK-7470, and a PR (https://github.com/apache/spark/pull/5997) which handled the same issue, but only for Scala 2.10.
      
      Author: vinodkc <vinod.kc.in@gmail.com>
      
      Closes #6013 from vinodkc/fix_sqlcontext_exception_scala_2.11 and squashes the following commits:
      
      119061c [vinodkc] Spark shell crashes when compiled with scala 2.11
      4e7360e1
• [SPARK-7470] [SQL] Spark shell SQLContext crashes without hive · 714db2ef
      Andrew Or authored
      This only happens if you have `SPARK_PREPEND_CLASSES` set. Then I built it with `build/sbt clean assembly compile` and just ran it with `bin/spark-shell`.
      ```
      ...
      15/05/07 17:07:30 INFO EventLoggingListener: Logging events to file:/tmp/spark-events/local-1431043649919
      15/05/07 17:07:30 INFO SparkILoop: Created spark context..
      Spark context available as sc.
      java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
      	at java.lang.Class.getDeclaredConstructors0(Native Method)
      	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
      	at java.lang.Class.getConstructor0(Class.java:2803)
      ...
      Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      	... 52 more
      
      <console>:10: error: not found: value sqlContext
             import sqlContext.implicits._
                    ^
      <console>:10: error: not found: value sqlContext
             import sqlContext.sql
                    ^
      ```
      yhuai marmbrus
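The fix is to also catch `NoClassDefFoundError` (not just `ClassNotFoundException`) when the shell reflectively instantiates the Hive-backed context, falling back to a plain `SQLContext`. A rough sketch of the pattern, assuming an existing `SparkContext` named `sc` (not the exact SparkILoop code):

```
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Rough sketch: try to create a HiveContext reflectively; fall back to
// SQLContext when the Hive classes are missing from the classpath.
def createSQLContext(sc: SparkContext): SQLContext =
  try {
    Class.forName("org.apache.spark.sql.hive.HiveContext")
      .getConstructor(classOf[SparkContext])
      .newInstance(sc)
      .asInstanceOf[SQLContext]
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError =>
      new SQLContext(sc)
  }
```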
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5997 from andrewor14/sql-shell-crash and squashes the following commits:
      
      61147e6 [Andrew Or] Also expect NoClassDefFoundError
      714db2ef
  6. Apr 25, 2015
• [SPARK-7092] Update spark scala version to 2.11.6 · a11c8683
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #5662 from ScrapCodes/SPARK-7092/scala-update-2.11.6 and squashes the following commits:
      
      58cf4f9 [Prashant Sharma] [SPARK-7092] Update spark scala version to 2.11.6
      a11c8683
  7. Apr 09, 2015
  8. Mar 24, 2015
• [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load... · 7215aa74
      Josh Rosen authored
      [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (master branch PR)
      
      ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang.  See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.
      
      This patch fixes this issue by ensuring proper cleanup of these resources.  It also adds logging for unexpected error cases.
      
      This PR is an extended version of #4935 and adds a regression test.
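A condensed sketch of the cleanup pattern (names simplified; the real code is in `ExecutorClassLoader`): always close the input stream, and on an HTTP error close the error stream so the keep-alive connection can be reused.

```
import java.io.InputStream
import java.net.{HttpURLConnection, URL}

// Simplified sketch: close streams on every path so failed class lookups
// don't leak connections to the REPL's HTTP class server.
def fetchClassBytes(url: URL): Array[Byte] = {
  val conn = url.openConnection().asInstanceOf[HttpURLConnection]
  var in: InputStream = null
  try {
    if (conn.getResponseCode != HttpURLConnection.HTTP_OK) {
      // Closing the error stream lets the connection return to the
      // keep-alive pool instead of tying up a server thread.
      Option(conn.getErrorStream).foreach(_.close())
      throw new ClassNotFoundException(s"Class server returned an error for $url")
    }
    in = conn.getInputStream
    Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
  } finally {
    if (in != null) in.close()
  }
}
```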
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4944 from JoshRosen/executorclassloader-leak-master-branch and squashes the following commits:
      
      e0e3c25 [Josh Rosen] Wrap try block around getReponseCode; re-enable keep-alive by closing error stream
      961c284 [Josh Rosen] Roll back changes that were added to get the regression test to fail
      7ee2261 [Josh Rosen] Add a failing regression test
      e2d70a3 [Josh Rosen] Properly clean up after errors in ExecutorClassLoader
      7215aa74
  9. Mar 20, 2015
• [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT. · a7456459
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5056 from vanzin/SPARK-6371 and squashes the following commits:
      
      63220df [Marcelo Vanzin] Merge branch 'master' into SPARK-6371
      6506f75 [Marcelo Vanzin] Use more fine-grained exclusion.
      178ba71 [Marcelo Vanzin] Oops.
      75b2375 [Marcelo Vanzin] Exclude VertexRDD in MiMA.
      a45a62c [Marcelo Vanzin] Work around MIMA warning.
      1d8a670 [Marcelo Vanzin] Re-group jetty exclusion.
      0e8e909 [Marcelo Vanzin] Ignore ml, don't ignore graphx.
      cef4603 [Marcelo Vanzin] Indentation.
      296cf82 [Marcelo Vanzin] [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.
      a7456459
• SPARK-6338 [CORE] Use standard temp dir mechanisms in tests to avoid orphaned temp files · 6f80c3e8
      Sean Owen authored
Use `Utils.createTempDir()` to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and to simplify the code.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5029 from srowen/SPARK-6338 and squashes the following commits:
      
      27b740a [Sean Owen] Fix hive-thriftserver tests that don't expect an existing dir
      4a212fa [Sean Owen] Standardize a bit more temp dir management
      9004081 [Sean Owen] Revert some added recursive-delete calls
      57609e4 [Sean Owen] Use Utils.createTempDir() to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
      6f80c3e8
  10. Mar 17, 2015
• [SPARK-6299][CORE] ClassNotFoundException in standalone mode when running... · f0edeae7
      Kevin (Sangwoo) Kim authored
      [SPARK-6299][CORE] ClassNotFoundException in standalone mode when running groupByKey with class defined in REPL
      
      ```
      case class ClassA(value: String)
      val rdd = sc.parallelize(List(("k1", ClassA("v1")), ("k1", ClassA("v2")) ))
      rdd.groupByKey.collect
      ```
This code used to throw an exception in spark-shell because, while shuffling, ```JavaSerializer``` uses ```defaultClassLoader```, which was defined as ```env.serializer.setDefaultClassLoader(urlClassLoader)```.

It should be ```env.serializer.setDefaultClassLoader(replClassLoader)```, as in
```
    override def run() {
      val deserializeStartTime = System.currentTimeMillis()
      Thread.currentThread.setContextClassLoader(replClassLoader)
```
in TaskRunner.

When ```replClassLoader``` cannot be defined, it is identical to ```urlClassLoader```, so non-REPL jobs are unaffected.
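A small sketch of the corresponding serializer setup (the serializer API call is real; the stand-in class loader below is illustrative, the real one comes from the executor):

```
import org.apache.spark.SparkConf
import org.apache.spark.serializer.JavaSerializer

// The serializer resolves classes through its default class loader, so it
// must be the REPL class loader for shell-defined classes to deserialize.
val replClassLoader = Thread.currentThread().getContextClassLoader  // stand-in
val serializer = new JavaSerializer(new SparkConf())
serializer.setDefaultClassLoader(replClassLoader)
```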
      
      Author: Kevin (Sangwoo) Kim <sangwookim.me@gmail.com>
      
      Closes #5046 from swkimme/master and squashes the following commits:
      
      fa2b9ee [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
      6e9620b [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
      d23e4e2 [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
      a4a3c8a [Kevin (Sangwoo) Kim] add 'class defined in repl - shuffle' test to ReplSuite
      bd00da5 [Kevin (Sangwoo) Kim] add 'class defined in repl - shuffle' test to ReplSuite
      c1b1fc7 [Kevin (Sangwoo) Kim] use REPL class loader for executor's serializer
      f0edeae7
  11. Mar 15, 2015
• [SPARK-3619] Part 2. Upgrade to Mesos 0.21 to work around MESOS-1688 · aa6536fa
      Jongyoul Lee authored
- MESOS_NATIVE_LIBRARY became deprecated
- Changed MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY
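A hedged sketch of a backward-compatible lookup (illustrative only; not necessarily how the launch scripts handle it):

```
// Prefer the new variable; fall back to the deprecated one with a warning.
val mesosNativeLib = sys.env.get("MESOS_NATIVE_JAVA_LIBRARY").orElse {
  sys.env.get("MESOS_NATIVE_LIBRARY").map { value =>
    Console.err.println("MESOS_NATIVE_LIBRARY is deprecated; " +
      "use MESOS_NATIVE_JAVA_LIBRARY instead")
    value
  }
}
```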
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #4361 from jongyoul/SPARK-3619-1 and squashes the following commits:
      
      f1ea91f [Jongyoul Lee] Merge branch 'SPARK-3619-1' of https://github.com/jongyoul/spark into SPARK-3619-1
      a6a00c2 [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 - Removed 'Known issues' section
      2e15a21 [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 - MESOS_NATIVE_LIBRARY become deprecated - Chagned MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY
      0dace7b [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 - MESOS_NATIVE_LIBRARY become deprecated - Chagned MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY
      aa6536fa
  12. Mar 09, 2015
  13. Mar 05, 2015
  14. Feb 16, 2015
• [SPARK-3340] Deprecate ADD_JARS and ADD_FILES · 16687651
      azagrebin authored
I created a patch that disables the environment variables.
The Scala and Python shells thereby log a warning message to notify the user about the deprecation,
with the following messages:
scala: "ADD_JARS environment variable is deprecated, use --jar spark submit argument instead"
python: "Warning: ADD_FILES environment variable is deprecated, use --py-files argument instead"

Is this what is expected, or should the code associated with the variables be removed completely?
Should it be documented somewhere?
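A minimal sketch of the Scala-shell side (illustrative; the real check lives in the REPL startup code, using the warning text quoted above):

```
// Warn when the deprecated variable is still set.
sys.env.get("ADD_JARS").foreach { _ =>
  Console.err.println("ADD_JARS environment variable is deprecated, " +
    "use --jar spark submit argument instead")
}
```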
      
      Author: azagrebin <azagrebin@gmail.com>
      
      Closes #4616 from azagrebin/master and squashes the following commits:
      
      bab1aa9 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES: minor readability issue
      0643895 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES: add warning messages
      42f0107 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES
      16687651
  15. Feb 14, 2015
• [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames · e98dfe62
      Reynold Xin authored
      - The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
      - toDataFrame -> toDF
      - Dsl -> functions
      - implicits moved into SQLContext.implicits
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      
      Python changes:
      - toDataFrame -> toDF
      - Dsl -> functions package
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      - add toDF functions to RDD on SQLContext init
      - add flatMap to DataFrame
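A small sketch of the renamed Scala API (assuming a running shell with `sc` and a `SQLContext` named `sqlContext`; the case class is illustrative):

```
case class Person(name: String, age: Int)

// Implicits now live on the SQLContext instance, and the RDD-to-DataFrame
// conversion is an explicit .toDF() call rather than an implicit.
import sqlContext.implicits._
val df = sc.parallelize(Seq(Person("alice", 30))).toDF()

// addColumn -> withColumn, renameColumn -> withColumnRenamed:
val df2 = df.withColumn("ageNextYear", df("age") + 1)
  .withColumnRenamed("name", "firstName")
```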
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
      
      5ef9910 [Reynold Xin] More fix
      61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
      ff5832c [Reynold Xin] Fix python
      749c675 [Reynold Xin] count(*) fixes.
      5806df0 [Reynold Xin] Fix build break again.
      d941f3d [Reynold Xin] Fixed explode compilation break.
      fe1267a [Davies Liu] flatMap
      c4afb8e [Reynold Xin] style
      d9de47f [Davies Liu] add comment
      b783994 [Davies Liu] add comment for toDF
      e2154e5 [Davies Liu] schema() -> schema
      3a1004f [Davies Liu] Dsl -> functions, toDF()
      fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      97dd47c [Davies Liu] fix mistake
      6168f74 [Davies Liu] fix test
      1fc0199 [Davies Liu] fix test
      a075cd5 [Davies Liu] clean up, toPandas
      663d314 [Davies Liu] add test for agg('*')
      9e214d5 [Reynold Xin] count(*) fixes.
      1ed7136 [Reynold Xin] Fix build break again.
      921b2e3 [Reynold Xin] Fixed explode compilation break.
      14698d4 [Davies Liu] flatMap
      ba3e12d [Reynold Xin] style
      d08c92d [Davies Liu] add comment
      5c8b524 [Davies Liu] add comment for toDF
      a4e5e66 [Davies Liu] schema() -> schema
      d377fc9 [Davies Liu] Dsl -> functions, toDF()
      6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      e98dfe62
  16. Feb 12, 2015
• SPARK-5727 [BUILD] Remove Debian packaging · 9a3ea49f
      Sean Owen authored
      (for master / 1.4 only)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4526 from srowen/SPARK-5727.2 and squashes the following commits:
      
      83ba49c [Sean Owen] Remove Debian packaging
      9a3ea49f
  17. Feb 06, 2015
• [HOTFIX] Fix the maven build after adding sqlContext to spark-shell · 57961567
      Michael Armbrust authored
      Follow up to #4387 to fix the build break.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4443 from marmbrus/fixMaven and squashes the following commits:
      
      1eeba7d [Michael Armbrust] try again
      7f5fb15 [Michael Armbrust] [HOTFIX] Fix the maven build after adding sqlContext to spark-shell
      57961567
• [SPARK-5586][Spark Shell][SQL] Make `sqlContext` available in spark shell · 3d3ecd77
      OopsOutOfMemory authored
The result looks like this:
      ```
      15/02/05 13:41:22 INFO SparkILoop: Created spark context..
      Spark context available as sc.
      15/02/05 13:41:22 INFO SparkILoop: Created sql context..
      SQLContext available as sqlContext.
      
      scala> sq
      sql          sqlContext   sqlParser    sqrt
      ```
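With the context auto-created, SQL is usable immediately; an illustrative follow-on:

```
scala> sqlContext.sql("SELECT 1 AS one").collect()
res0: Array[org.apache.spark.sql.Row] = Array([1])
```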
      
      Author: OopsOutOfMemory <victorshengli@126.com>
      
      Closes #4387 from OopsOutOfMemory/sqlContextInShell and squashes the following commits:
      
      c7f5203 [OopsOutOfMemory] auto-import sql() function
      e160697 [OopsOutOfMemory] Merge branch 'sqlContextInShell' of https://github.com/OopsOutOfMemory/spark into sqlContextInShell
      37c0a16 [OopsOutOfMemory] auto detect hive support
      a9c59d9 [OopsOutOfMemory] rename and reduce range of imports
      6b9e309 [OopsOutOfMemory] Merge branch 'master' into sqlContextInShell
      cae652f [OopsOutOfMemory] make sqlContext available in spark shell
      3d3ecd77
  18. Feb 05, 2015
  19. Feb 02, 2015
• Spark 3883: SSL support for HttpServer and Akka · cfea3003
      Jacek Lewandowski authored
      SPARK-3883: SSL support for Akka connections and Jetty based file servers.
      
This story introduces the following changes:
- Introduced an SSLOptions object which holds the SSL configuration and can build the appropriate configuration for Akka or Jetty. SSLOptions can be created by parsing SparkConf entries at a specified namespace.
- SSLOptions is created and kept by SecurityManager
- All Akka actor address creation snippets based on interpolated strings were replaced by dedicated methods from AkkaUtils. Those methods select the proper Akka protocol - whether akka.tcp or akka.ssl.tcp
- Added test cases for AkkaUtils, FileServer, SSLOptions and SecurityManager
- Added a way for executors and the driver to use node-local SSL configuration in standalone mode, by specifying spark.ssl.useNodeLocalConf in SparkConf.
- Made CoarseGrainedExecutorBackend not overwrite settings which are part of the executor startup configuration - they are passed from the Worker anyway
      
      Refer to https://github.com/apache/spark/pull/3571 for discussion and details
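A sketch of the namespaced configuration in `SparkConf` (key names from this feature; values illustrative):

```
import org.apache.spark.SparkConf

// SSLOptions is parsed from keys under a namespace such as "spark.ssl".
val conf = new SparkConf()
  .set("spark.ssl.enabled", "true")
  .set("spark.ssl.keyStore", "/path/to/keystore.jks")
  .set("spark.ssl.keyStorePassword", "secret")
  .set("spark.ssl.trustStore", "/path/to/truststore.jks")
  .set("spark.ssl.useNodeLocalConf", "true")
```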
      
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      Author: Jacek Lewandowski <jacek.lewandowski@datastax.com>
      
      Closes #3571 from jacek-lewandowski/SPARK-3883-master and squashes the following commits:
      
      9ef4ed1 [Jacek Lewandowski] Merge pull request #2 from jacek-lewandowski/SPARK-3883-docs2
      fb31b49 [Jacek Lewandowski] SPARK-3883: Added SSL setup documentation
      2532668 [Jacek Lewandowski] SPARK-3883: Refactored AkkaUtils.protocol method to not use Try
      90a8762 [Jacek Lewandowski] SPARK-3883: Refactored methods to resolve Akka address and made it possible to easily configure multiple communication layers for SSL
      72b2541 [Jacek Lewandowski] SPARK-3883: A reference to the fallback SSLOptions can be provided when constructing SSLOptions
      93050f4 [Jacek Lewandowski] SPARK-3883: SSL support for HttpServer and Akka
      cfea3003
  20. Feb 01, 2015
• [SPARK-5353] Log failures in REPL class loading · 9f0a6e18
      Tobias Schlatter authored
      Author: Tobias Schlatter <tobias@meisch.ch>
      
      Closes #4130 from gzm0/log-repl-loading and squashes the following commits:
      
      4fa0582 [Tobias Schlatter] Log failures in REPL class loading
      9f0a6e18
• [SPARK-3996]: Shade Jetty in Spark deliverables · a15f6e31
      Patrick Wendell authored
      (v2 of this patch with a fix that was only relevant for the maven build).
      
This patch piggybacks on vanzin's work to simplify the Guava shading,
and adds Jetty as a shaded library in Spark. Other than adding Jetty,
it consolidates the <artifactSet>s into the root pom. I found it a
bit easier to follow that way, since you don't need to look into
child poms to find out which specific artifact sets are included in shading.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4285 from pwendell/jetty and squashes the following commits:
      
      d3e7f4e [Patrick Wendell] Fix for shaded deps causing compile errors
      19f0710 [Patrick Wendell] More code review feedback
      961452d [Patrick Wendell] Responding to feedback from Marcello
      6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
      a15f6e31
  21. Jan 28, 2015
• [SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame. · c8e934ef
      Reynold Xin authored
      and
      
      [SPARK-5448][SQL] Make CacheManager a concrete class and field in SQLContext
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4242 from rxin/sqlCleanup and squashes the following commits:
      
      e351cb2 [Reynold Xin] Fixed toDataFrame.
      6545c42 [Reynold Xin] More changes.
      728c017 [Reynold Xin] [SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame.
      c8e934ef
  22. Jan 18, 2015
• [HOTFIX]: Minor clean up regarding skipped artifacts in build files. · ad16da1b
      Patrick Wendell authored
There are two relevant 'skip' configurations in the build: the first
is for "mvn install" and the second is for "mvn deploy". As of 1.2,
we actually use "mvn install" to generate our deployed artifacts,
because we have some customization of the nexus upload due to having
to cross compile for Scala 2.10 and 2.11.

There is no reason to have different settings for these values,
so this patch simply cleans this up for the repl/ and yarn/
projects.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4080 from pwendell/master and squashes the following commits:
      
      e21b78b [Patrick Wendell] [HOTFIX]: Minor clean up regarding skipped artifacts in build files.
      ad16da1b
  23. Jan 16, 2015
• [SPARK-4923][REPL] Add Developer API to REPL to allow re-publishing the REPL jar · d05c9ee6
      Chip Senkbeil authored
      As requested in [SPARK-4923](https://issues.apache.org/jira/browse/SPARK-4923), I've provided a rough DeveloperApi for the repl. I've only done this for Scala 2.10 because it does not appear that Scala 2.11 is implemented. The Scala 2.11 repl still has the old `scala.tools.nsc` package and the SparkIMain does not appear to have the class server needed for shipping code over (unless this functionality has been moved elsewhere?). I also left alone the `ExecutorClassLoader` and `ConstructorCleaner` as I have no experience working with those classes.
      
      This marks the majority of methods in `SparkIMain` as _private_ with a few special cases being _private[repl]_ as other classes within the same package access them. Any public method has been marked with `DeveloperApi` as suggested by pwendell and I took the liberty of writing up a Scaladoc for each one to further elaborate their usage.
      
As the Scala 2.11 REPL [conforms](https://github.com/scala/scala/pull/2206) to [JSR-223](http://docs.oracle.com/javase/8/docs/technotes/guides/scripting/), the [Spark Kernel](https://github.com/ibm-et/spark-kernel) uses the SparkIMain of Scala 2.10 in the same manner. So, I've taken care to expose the methods predominantly related to the functionality needed by a JSR-223 scripting engine implementation.
      
      1. The ability to _get_ variables from the interpreter (and other information like class/symbol/type)
      2. The ability to _put_ variables into the interpreter
      3. The ability to _compile_ code
      4. The ability to _execute_ code
      5. The ability to get contextual information regarding the scripting environment
      
      Additional functionality that I marked as exposed included the following:
      
      1. The blocking initialization method (needed to actually start SparkIMain instance)
      2. The class server uri (needed to set the _spark.repl.class.uri_ property after initialization), reduced from the entire class server
      3. The class output directory (beneficial for tools like ours that need to inspect and use the directory where class files are served)
      4. Suppression (quiet/silence) mechanics for output
      5. Ability to add a jar to the compile/runtime classpath
      6. The reset/close functionality
      7. Metric information (last variable assignment, "needed" for extracting results from last execution, real variable name for better debugging)
      8. Execution wrapper (useful to have, but debatable)
      
      Aside from `SparkIMain`, I updated other classes/traits and their methods in the _repl_ package to be private/package protected where possible. A few odd cases (like the SparkHelper being in the scala.tools.nsc package to expose a private variable) still exist, but I did my best at labelling them.
      
      `SparkCommandLine` has proven useful to extract settings and `SparkJLineCompletion` has proven to be useful in implementing auto-completion in the [Spark Kernel](https://github.com/ibm-et/spark-kernel) project. Other than those - and `SparkIMain` - my experience has yielded that other classes/methods are not necessary for interactive applications taking advantage of the REPL API.
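A hedged sketch of driving the exposed interpreter surface (method names follow Scala 2.10's IMain, which SparkIMain extends; the factory below is hypothetical and exact signatures may differ):

```
// Hypothetical setup; a real embedder must construct SparkIMain with
// appropriate Settings and output streams.
val intp: SparkIMain = createInterpreter()

intp.initializeSynchronous()          // blocking initialization
intp.bind("x", "Int", 21)             // 2. put a variable in
intp.interpret("val y = x * 2")       // 3-4. compile and execute code
val y = intp.valueOfTerm("y")         // 1. get a variable out (Option[AnyRef])
```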
      
      Tested via the following:
      
          $ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
          $ mvn -Phadoop-2.3 -DskipTests clean package && mvn -Phadoop-2.3 test
      
      Also did a quick verification that I could start the shell and execute some code:
      
          $ ./bin/spark-shell
          ...
      
          scala> val x = 3
          x: Int = 3
      
          scala> sc.parallelize(1 to 10).reduce(_+_)
          ...
          res1: Int = 55
      
      Author: Chip Senkbeil <rcsenkbe@us.ibm.com>
      Author: Chip Senkbeil <chip.senkbeil@gmail.com>
      
      Closes #4034 from rcsenkbeil/AddDeveloperApiToRepl and squashes the following commits:
      
      053ca75 [Chip Senkbeil] Fixed failed build by adding missing DeveloperApi import
      c1b88aa [Chip Senkbeil] Added DeveloperApi to public classes in repl
      6dc1ee2 [Chip Senkbeil] Added missing method to expose error reporting flag
      26fd286 [Chip Senkbeil] Refactored other Scala 2.10 classes and methods to be private/package protected where possible
      925c112 [Chip Senkbeil] Added DeveloperApi and Scaladocs to SparkIMain for Scala 2.10
      d05c9ee6
  24. Jan 13, 2015
• [SPARK-5006][Deploy]spark.port.maxRetries doesn't work · f7741a9a
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-5006
      
I think the issue was introduced in https://github.com/apache/spark/pull/1777.

I have not dug into the Mesos backend yet; perhaps the same logic should be added there as well.
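A simplified sketch of the retry loop after the fix, with the max-retry count read from the `SparkConf` argument (per the "startServiceOnPort will use a SparkConf arg" commit below; this is not the exact `Utils` code, and the default shown is an assumption):

```
import java.net.BindException
import org.apache.spark.SparkConf

// Simplified: try startPort, startPort+1, ... up to maxRetries extra times.
def startServiceOnPort[T](startPort: Int,
                          startService: Int => (T, Int),
                          conf: SparkConf): (T, Int) = {
  val maxRetries = conf.getInt("spark.port.maxRetries", 16)
  for (offset <- 0 to maxRetries) {
    try {
      return startService(startPort + offset)
    } catch {
      case _: BindException if offset < maxRetries => // port taken; try next
    }
  }
  throw new BindException(s"Failed to bind after ${maxRetries + 1} attempts")
}
```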
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #3841 from WangTaoTheTonic/SPARK-5006 and squashes the following commits:
      
      8cdf96d [WangTao] indent thing
      2d86d65 [WangTaoTheTonic] fix line length
      7cdfd98 [WangTaoTheTonic] fit for new HttpServer constructor
      61a370d [WangTaoTheTonic] some minor fixes
      bc6e1ec [WangTaoTheTonic] rebase
      67bcb46 [WangTaoTheTonic] put conf at 3rd position, modify suite class, add comments
      f450cd1 [WangTaoTheTonic] startServiceOnPort will use a SparkConf arg
      29b751b [WangTaoTheTonic] rebase as ExecutorRunnableUtil changed to ExecutorRunnable
      396c226 [WangTaoTheTonic] make the grammar more like scala
      191face [WangTaoTheTonic] invalid value name
      62ec336 [WangTaoTheTonic] spark.port.maxRetries doesn't work
      f7741a9a
  25. Jan 08, 2015
• [SPARK-4048] Enhance and extend hadoop-provided profile. · 48cecf67
      Marcelo Vanzin authored
      This change does a few things to make the hadoop-provided profile more useful:
      
      - Create new profiles for other libraries / services that might be provided by the infrastructure
      - Simplify and fix the poms so that the profiles are only activated while building assemblies.
      - Fix tests so that they're able to run when the profiles are activated
      - Add a new env variable to be used by distributions that use these profiles to provide the runtime
        classpath for Spark jobs and daemons.
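A sketch of how a launcher might consume the new SPARK_DIST_CLASSPATH variable (illustrative; the real handling is in the launch scripts):

```
import java.io.File

// Append the distribution-provided classpath, when set, to Spark's own.
val sparkClasspath = Seq("conf", "lib/*")
val classpath = (sparkClasspath ++ sys.env.get("SPARK_DIST_CLASSPATH"))
  .mkString(File.pathSeparator)
```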
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
      
      82eb688 [Marcelo Vanzin] Add a comment.
      eb228c0 [Marcelo Vanzin] Fix borked merge.
      4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
      371ebee [Marcelo Vanzin] Review feedback.
      52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      322f882 [Marcelo Vanzin] Fix merge fail.
      f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9640503 [Marcelo Vanzin] Cleanup child process log message.
      115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
      e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
      7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
      1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
      d1399ed [Marcelo Vanzin] Restore jetty dependency.
      82a54b9 [Marcelo Vanzin] Remove unused profile.
      5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
      1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
      f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
      9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
      d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
      4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
      417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
      2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
      1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
      284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
      48cecf67
  26. Jan 06, 2015
• SPARK-4159 [CORE] Maven build doesn't run JUnit test suites · 4cba6eb4
      Sean Owen authored
      This PR:
      
      - Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
      - Tells `surefire` to test only Java tests
      - Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.
      
For me this causes the Scala and Java tests to each be run once, as desired. It doesn't affect the SBT build, but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
      
      2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
      12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
      e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
      4cba6eb4
  27. Nov 21, 2014
  28. Nov 18, 2014
• Bumping version to 1.3.0-SNAPSHOT. · 397d3aae
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3277 from vanzin/version-1.3 and squashes the following commits:
      
      7c3c396 [Marcelo Vanzin] Added temp repo to sbt build.
      5f404ff [Marcelo Vanzin] Add another exclusion.
      19457e7 [Marcelo Vanzin] Update old version to 1.2, add temporary 1.2 repo.
      3c8d705 [Marcelo Vanzin] Workaround for MIMA checks.
      e940810 [Marcelo Vanzin] Bumping version to 1.3.0-SNAPSHOT.
      397d3aae
  29. Nov 14, 2014
• SPARK-4375. no longer require -Pscala-2.10 · f5f757e4
      Sandy Ryza authored
It seems like the winds might have moved away from this approach, but I wanted to post the PR anyway because I got it working, and to show what it would look like.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3239 from sryza/sandy-spark-4375 and squashes the following commits:
      
      0ffbe95 [Sandy Ryza] Enable -Dscala-2.11 in sbt
      cd42d94 [Sandy Ryza] Update doc
      f6644c3 [Sandy Ryza] SPARK-4375 take 2
      f5f757e4
  30. Nov 11, 2014
• Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
      daaca14c
  31. Oct 09, 2014
• SPARK-3811 [CORE] More robust / standard Utils.deleteRecursively, Utils.createTempDir · 363baaca
      Sean Owen authored
      I noticed a few issues with how temp directories are created and deleted:
      
      *Minor*
      
      * Guava's `Files.createTempDir()` plus `File.deleteOnExit()` is used in many tests to make a temp dir, but `Utils.createTempDir()` seems to be the standard Spark mechanism
      * Call to `File.deleteOnExit()` could be pushed into `Utils.createTempDir()` as well, along with this replacement
      * _I messed up the message in an exception in `Utils` in SPARK-3794; fixed here_
      
      *Bit Less Minor*
      
      * `Utils.deleteRecursively()` fails immediately if any `IOException` occurs, instead of trying to delete any remaining files and subdirectories. I've observed this leave temp dirs around. I suggest changing it to continue in the face of an exception and throw one of the possibly several exceptions that occur at the end.
* `Utils.createTempDir()` will add a JVM shutdown hook every time the method is called, even if the new dir is inside a dir that is already registered for deletion, since this check happens inside the hook. However, `Utils` already manages a set of all dirs to delete on shutdown, called `shutdownDeletePaths`. A single hook can be registered to delete all of these on exit. This is how Tachyon temp paths are cleaned up in `TachyonBlockManager`.
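A sketch of the single-hook approach described in the second bullet (simplified; the real bookkeeping lives in `Utils`):

```
import java.io.File
import scala.collection.mutable

// Simplified sketch: one shared set of paths, one JVM shutdown hook.
object TempDirManager {
  private val shutdownDeletePaths = mutable.HashSet[String]()

  Runtime.getRuntime.addShutdownHook(new Thread("delete Spark temp dirs") {
    override def run(): Unit = shutdownDeletePaths.synchronized {
      shutdownDeletePaths.foreach(path => deleteRecursively(new File(path)))
    }
  })

  def registerShutdownDeleteDir(dir: File): Unit =
    shutdownDeletePaths.synchronized {
      shutdownDeletePaths += dir.getAbsolutePath
    }

  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()  // delete the dir itself after its contents (best-effort)
  }
}
```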
      
      I noticed a few other things that might be changed but wanted to ask first:
      
      * Shouldn't the set of dirs to delete be `File`, not just `String` paths?
* `Utils` manages the set of `TachyonFile` that have been registered for deletion, but the shutdown hook is managed in `TachyonBlockManager`. Should this logic not live together, and outside of `Utils`? It's more specific to Tachyon, and looks slightly odd to import in such a generic place.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2670 from srowen/SPARK-3811 and squashes the following commits:
      
      071ae60 [Sean Owen] Update per @vanzin's review
      da0146d [Sean Owen] Make Utils.deleteRecursively try to delete all paths even when an exception occurs; use one shutdown hook instead of one per method call to delete temp dirs
      3a0faa4 [Sean Owen] Standardize on Utils.createTempDir instead of Files.createTempDir
      363baaca
  32. Oct 08, 2014
• [SPARK-3836] [REPL] Spark REPL optionally propagate internal exceptions · c7818434
      Ahir Reddy authored
Optionally have the repl throw exceptions generated by interpreted code, instead of swallowing the exception and returning it as text output. This is useful when embedding the repl; otherwise it's not possible to know when user code threw an exception.
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      
      Closes #2695 from ahirreddy/repl-throw-exceptions and squashes the following commits:
      
      bad25ee [Ahir Reddy] Style Fixes
      f0e5b44 [Ahir Reddy] Fixed style
      0d4413d [Ahir Reddy] propogate excetions from repl
      c7818434
  33. Oct 01, 2014
• [SPARK-3748] Log thread name in unit test logs · 3888ee2f
      Reynold Xin authored
      Thread names are useful for correlating failures.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2600 from rxin/log4j and squashes the following commits:
      
      83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs
      3888ee2f
  34. Sep 14, 2014
• [SPARK-3452] Maven build should skip publishing artifacts people shouldn... · f493f798
      Prashant Sharma authored
...'t depend on

In Maven terms, publishing locally is `install` and publishing otherwise is `deploy`, so both are disabled for the following projects.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2329 from ScrapCodes/SPARK-3452/maven-skip-install and squashes the following commits:
      
      257b79a [Prashant Sharma] [SPARK-3452] Maven build should skip publishing artifacts people shouldn't depend on
      f493f798
  35. Sep 11, 2014
• SPARK-2482: Resolve sbt warnings during build · 33c7a738
      witgo authored
At the same time, importing both `scala.language.postfixOps` and `org.scalatest.time.SpanSugar._` causes `scala.language.postfixOps` to have no effect.
      
      Author: witgo <witgo@qq.com>
      
      Closes #1330 from witgo/sbt_warnings3 and squashes the following commits:
      
      179ba61 [witgo] Resolve sbt warnings during build
      33c7a738