  1. Jun 25, 2014
• [SPARK-2270] Kryo cannot serialize results returned by asJavaIterable · 7ff2c754
      Reynold Xin authored
(and thus groupBy/cogroup are broken in Java APIs when Kryo is used).
      
      @pwendell this should be merged into 1.0.1.
      
      Thanks @sorenmacbeth for reporting this & helping out with the fix.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1206 from rxin/kryo-iterable-2270 and squashes the following commits:
      
      09da0aa [Reynold Xin] Updated the comment.
      009bf64 [Reynold Xin] [SPARK-2270] Kryo cannot serialize results returned by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo is used).
      7ff2c754
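A minimal sketch of the failure mode, with Kryo enabled (a repro sketch, not the fix itself): the Iterable values a Java-API groupBy produces are `asJavaIterable` wrappers, which Kryo previously could not round-trip.

```
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext

// With Kryo enabled, groupByKey in the Java API yields Iterable values backed
// by asJavaIterable wrappers, which Kryo could not serialize before this fix.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("kryo-asJavaIterable-repro")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val jsc = new JavaSparkContext(conf)

val pairs = jsc.parallelizePairs(java.util.Arrays.asList(("a", 1), ("a", 2), ("b", 3)))
pairs.groupByKey().collect() // previously failed inside Kryo; works after the fix
jsc.stop()
```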
• [SPARK-2258 / 2266] Fix a few worker UI bugs · 9aa60329
      Andrew Or authored
      **SPARK-2258.** Worker UI displays zombie processes if the executor throws an exception before a process is launched. This is because we only inform the Worker of the change if the process is already launched, which in this case it isn't.
      
      **SPARK-2266.** We expose "Some(app-id)" on the log page. This is fairly minor.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1213 from andrewor14/fix-worker-ui and squashes the following commits:
      
      c1223fe [Andrew Or] Fix worker UI bugs
      9aa60329
• [SPARK-2242] HOTFIX: pyspark shell hangs on simple job · 5603e4c4
      Andrew Or authored
      This reverts a change introduced in 38702487, which redirected all stderr to the OS pipe instead of directly to the `bin/pyspark` shell output. This causes a simple job to hang in two ways:
      
      1. If the cluster is not configured correctly or does not have enough resources, the job hangs without producing any output, because the relevant warning messages are masked.
      2. If the stderr volume is large, this could lead to a deadlock if we redirect everything to the OS pipe. From the [python docs](https://docs.python.org/2/library/subprocess.html):
      
      ```
      Note Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock
      based on the child process output volume. Use Popen with the communicate() method
      when you need pipes.
      ```
      
      Note that we cannot remove `stdout=PIPE` in a similar way, because we currently use it to communicate the py4j port. However, it should be fine (as it has been for a long time) because we do not produce a ton of traffic through `stdout`.
      
      That commit was not merged in branch-1.0, so this fix is for master only.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1178 from andrewor14/fix-python and squashes the following commits:
      
      e68e870 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-python
      20849a8 [Andrew Or] Tone down stdout interference message
      a09805b [Andrew Or] Return more than 1 line of error message to user
      6dfbd1e [Andrew Or] Don't swallow original exception
      0d1861f [Andrew Or] Provide more helpful output if stdout is garbled
      21c9d7c [Andrew Or] Do not mask stderr from output
      5603e4c4
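The same hazard applies to any subprocess API, not just Python's. A hedged sketch of the principle using the JDK's ProcessBuilder (an analog, not the actual launcher code): pipe only the stream you must read, and let the chatty one flow through to the parent.

```
// Pipe stdout because one line must be read from it (like the py4j port), but
// inherit stderr so a chatty child cannot fill the pipe buffer and deadlock,
// and so warnings reach the user instead of being masked.
val pb = new ProcessBuilder("python", "-c",
  "import sys; print(12345); sys.stderr.write('warning\\n')")
pb.redirectError(ProcessBuilder.Redirect.INHERIT)
val proc = pb.start()
val port = scala.io.Source.fromInputStream(proc.getInputStream).getLines().next()
println(s"child reported port $port")
proc.waitFor()
```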
• Reynold Xin · ac06a85d
• SPARK-2038: rename "conf" parameters in the saveAsHadoop functions with source-compatibility · acc01ab3
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-2038
      
to differentiate these parameters from the SparkConf object while keeping source-level compatibility
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #1137 from CodingCat/SPARK-2038 and squashes the following commits:
      
      11abeba [CodingCat] revise the comments
      7ee5712 [CodingCat] to keep the source-compatibility
      763975f [CodingCat] style fix
      d91288d [CodingCat] rename "conf" parameters in the saveAsHadoop functions
      acc01ab3
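The source-compatibility concern exists because Scala named arguments make parameter names part of the public API. A small illustration of the hazard (hypothetical methods, not the actual Spark signatures):

```
import org.apache.hadoop.mapred.JobConf

object NamedArgHazard {
  // Once callers use named arguments, renaming "conf" breaks their source.
  def save(conf: JobConf): Unit = ()
  def saveRenamed(hadoopConf: JobConf): Unit = ()

  def main(args: Array[String]): Unit = {
    save(conf = new JobConf())              // compiles against the old name
    saveRenamed(hadoopConf = new JobConf()) // the renamed form
    // saveRenamed(conf = new JobConf())    // would not compile: unknown parameter
  }
}
```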
• [BUGFIX][SQL] Should match java.math.BigDecimal when unwrapping Hive output · 22036aeb
      Cheng Lian authored
      The `BigDecimal` branch in `unwrap` matches to `scala.math.BigDecimal` rather than `java.math.BigDecimal`.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1199 from liancheng/javaBigDecimal and squashes the following commits:
      
e9bb481 [Cheng Lian] Should match java.math.BigDecimal when unwrapping Hive output
      22036aeb
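The bug class in a sketch: `scala.math.BigDecimal` and `java.math.BigDecimal` are unrelated types, and Hive hands back the Java one, so a match on the Scala type silently falls through.

```
// Hive returns java.math.BigDecimal; matching on scala.math.BigDecimal misses it.
def unwrapSketch(value: Any): Any = value match {
  case bd: java.math.BigDecimal => BigDecimal(bd) // convert to the Scala wrapper
  case other                    => other          // the buggy match fell through to here
}
```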
• [SPARK-2263][SQL] Support inserting MAP&lt;K, V&gt; to Hive tables · 8fade897
      Cheng Lian authored
      JIRA issue: [SPARK-2263](https://issues.apache.org/jira/browse/SPARK-2263)
      
Map objects were not converted to Hive types before being inserted into Hive tables.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1205 from liancheng/spark-2263 and squashes the following commits:
      
      c7a4373 [Cheng Lian] Addressed @concretevitamin's comment
784940b [Cheng Lian] SPARK-2263: support inserting MAP&lt;K, V&gt; to Hive tables
      8fade897
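The general shape of such a conversion, as a hedged sketch (the real code goes through Hive's ObjectInspector machinery): Scala map values must become `java.util.Map` instances before Hive can write them.

```
// Sketch only: Hive SerDes consume java.util.Map, not scala.collection.Map.
def wrapMapSketch[K, V](m: Map[K, V]): java.util.Map[K, V] = {
  val jmap = new java.util.HashMap[K, V]()
  m.foreach { case (k, v) => jmap.put(k, v) }
  jmap
}
```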
  2. Jun 24, 2014
• SPARK-2248: spark.default.parallelism does not apply in local mode · b6b44853
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #1194 from witgo/SPARK-2248 and squashes the following commits:
      
      6ac950b [witgo] spark.default.parallelism does not apply in local mode
      b6b44853
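The observable behavior, as a small check (sketch): after the fix, local mode honors the setting instead of defaulting to the core count.

```
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("default-parallelism-check")
  .set("spark.default.parallelism", "8")
val sc = new SparkContext(conf)

// Before the fix, local mode ignored the setting (4 partitions here, one per
// core); after it, this prints 8.
println(sc.parallelize(1 to 100).partitions.length)
sc.stop()
```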
• Fix possible null pointer in accumulator toString · 2714968e
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1204 from marmbrus/nullPointerToString and squashes the following commits:
      
35b5fce [Michael Armbrust] Fix possible null pointer in accumulator toString
      2714968e
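The fix pattern, sketched (not the actual Accumulable code): guard the dereference so printing an uninitialized accumulator cannot throw a NullPointerException.

```
// Sketch of the guard: toString must tolerate a value that is still null.
class AccumulatorSketch[T](private var value: T) {
  override def toString: String =
    if (value == null) "null" else value.toString
}
```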
• Autodetect JAVA_HOME on RPM-based systems · 54055fb2
      Matthew Farrellee authored
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #1185 from mattf/master-1 and squashes the following commits:
      
      42150fc [Matthew Farrellee] Autodetect JAVA_HOME on RPM-based systems
      54055fb2
• [SQL] Add base row updating methods for JoinedRow · 133495d8
      Cheng Hao authored
      This will be helpful in join operators.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #1187 from chenghao-intel/joinedRow and squashes the following commits:
      
      87c19e3 [Cheng Hao] Add base row set methods for JoinedRow
      133495d8
• [SPARK-1112, 2156] Bootstrap to fetch the driver's Spark properties. · 8ca41769
      Xiangrui Meng authored
This is an alternative solution to #1124. Before launching the executor backend, we first fetch the driver's Spark properties and use them to overwrite the executor's Spark properties. This should be better than #1124.
      
      @pwendell Are there spark properties that might be different on the driver and on the executors?
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1132 from mengxr/akka-bootstrap and squashes the following commits:
      
      77ff32d [Xiangrui Meng] organize imports
      68e1dfb [Xiangrui Meng] use timeout from AkkaUtils; remove props from RegisteredExecutor
      46d332d [Xiangrui Meng] fix a test
      7947c18 [Xiangrui Meng] increase slack size for akka
      4ab696a [Xiangrui Meng] bootstrap to retrieve driver spark conf
      8ca41769
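The overwrite step itself is simple; what the PR adds is fetching the properties before the executor backend starts. A sketch of the merge semantics (transport elided):

```
// Driver-provided properties win over whatever the executor was launched with.
def mergeSparkProps(
    launchedWith: Map[String, String],
    fromDriver: Seq[(String, String)]): Map[String, String] =
  launchedWith ++ fromDriver
```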
• [SPARK-2264][SQL] Fix failing CachedTableSuite · a162c9b3
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1201 from marmbrus/fixCacheTests and squashes the following commits:
      
      9d87ed1 [Michael Armbrust] Use analyzer (which runs to fixed point) instead of manually removing analysis operators.
      a162c9b3
• Fix broken Json tests. · 1978a903
      Kay Ousterhout authored
      The assertJsonStringEquals method was missing an "assert" so
      did not actually check that the strings were equal. This commit
      adds the missing assert and fixes subsequently revealed problems
      with the JsonProtocolSuite.
      
      @andrewor14 I changed some of the test functionality to match what it
      looks like you intended based on the expected strings -- let me know if
      anything here looks wrong.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #1198 from kayousterhout/json_test_fix and squashes the following commits:
      
      77f858f [Kay Ousterhout] Fix broken Json tests.
      1978a903
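The bug pattern in miniature (helper name from the commit, body illustrative): the comparison was computed but its result discarded, so the helper could never fail.

```
// Before: the boolean was computed and dropped, so every comparison "passed".
// After: the assert makes a mismatch actually fail the test.
def assertJsonStringEquals(json1: String, json2: String): Unit = {
  val strip = (s: String) => s.replaceAll("\\s", "")
  assert(strip(json1) == strip(json2))
}
```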
• HOTFIX: Disabling tests per SPARK-2264 · 221909e6
      Patrick Wendell authored
      221909e6
• SPARK-1937: fix issue with task locality · 924b7082
      Rui Li authored
Don't check executor/host availability when creating a TaskSetManager, because the executors may not have registered yet when the TaskSetManager is created; in that case all tasks would be considered to have no preferred locations, and data locality would be lost in later scheduling.
      
      Author: Rui Li <rui.li@intel.com>
      Author: lirui-intel <rui.li@intel.com>
      
      Closes #892 from lirui-intel/delaySchedule and squashes the following commits:
      
      8444d7c [Rui Li] fix code style
      fafd57f [Rui Li] keep locality constraints within the valid levels
      18f9e05 [Rui Li] restrict allowed locality
      5b3fb2f [Rui Li] refine UT
      99f843e [Rui Li] add unit test and fix bug
      fff4123 [Rui Li] fix computing valid locality levels
      685ed3d [Rui Li] remove delay shedule for pendingTasksWithNoPrefs
      7b0177a [Rui Li] remove redundant code
      c7b93b5 [Rui Li] revise patch
      3d7da02 [lirui-intel] Update TaskSchedulerImpl.scala
      cab4c71 [Rui Li] revised patch
      539a578 [Rui Li] fix code style
      cf0d6ac [Rui Li] fix code style
      3dfae86 [Rui Li] re-compute pending tasks when new host is added
      a225ac2 [Rui Li] SPARK-1937: fix issue with task locality
      924b7082
• [SPARK-2252] Fix MathJax for HTTPS. · 420c1c3e
      Reynold Xin authored
Found out about this from the Hacker News link to GraphX, which was served over HTTPS.
      
      @mengxr
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1189 from rxin/mllib-doc and squashes the following commits:
      
5328be0 [Reynold Xin] [SPARK-2252] Fix MathJax for HTTPS.
      420c1c3e
  3. Jun 23, 2014
• [SPARK-2124] Move aggregation into shuffle implementations · 56eb8af1
      jerryshao authored
      This PR is a sub-task of SPARK-2044 to move the execution of aggregation into shuffle implementations.
      
I leave `CoGroupedRDD` and `SubtractedRDD` unchanged because they have their own implementations of aggregation; I'm not sure whether it is suitable to change these two RDDs.
      
Also, I do not move the sort-related code of `OrderedRDDFunctions` into shuffle; this will be solved in another sub-task.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #1064 from jerryshao/SPARK-2124 and squashes the following commits:
      
      4a05a40 [jerryshao] Modify according to comments
      1f7dcc8 [jerryshao] Style changes
      50a2fd6 [jerryshao] Fix test suite issue after moving aggregator to Shuffle reader and writer
      1a96190 [jerryshao] Code modification related to the ShuffledRDD
      308f635 [jerryshao] initial works of move combiner to ShuffleManager's reader and writer
      56eb8af1
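A sketch of what "moving aggregation into the shuffle" means (interfaces heavily simplified): the combine step runs inside the shuffle reader/writer rather than in `ShuffledRDD` after the fetch.

```
// Simplified stand-in for Spark's Aggregator: fetched (K, V) records are
// combined into (K, C) inside the shuffle layer itself.
case class AggregatorSketch[K, V, C](createCombiner: V => C, mergeValue: (C, V) => C)

def combineInsideReader[K, V, C](
    records: Iterator[(K, V)],
    agg: AggregatorSketch[K, V, C]): Iterator[(K, C)] = {
  val combined = scala.collection.mutable.Map.empty[K, C]
  records.foreach { case (k, v) =>
    combined(k) = combined.get(k) match {
      case Some(c) => agg.mergeValue(c, v)
      case None    => agg.createCombiner(v)
    }
  }
  combined.iterator
}
```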
• [SPARK-2227] Support dfs command in SQL. · 51c81683
      Reynold Xin authored
      Note that nothing gets printed to the console because we don't properly maintain session state right now.
      
      I will have a followup PR that fixes it.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1167 from rxin/commands and squashes the following commits:
      
      56f04f8 [Reynold Xin] [SPARK-2227] Support dfs command in SQL.
      51c81683
• Cleanup on Connection, ConnectionManagerId, ConnectionManager classes part 2 · 383bf72c
      Henry Saputra authored
Cleanup on the Connection, ConnectionManagerId, and ConnectionManager classes, part 2, done while I was working on that code to help the IDE:
1. Remove unused imports.
2. Remove parentheses from method calls that have no side effects.
3. Add parentheses to method calls that do have side effects or are not simple accessors of object properties.
4. Replace the if-else check (via isInstanceOf) on the Connection class type with a Scala match expression, for consistency and cleanliness.
5. Remove semicolons.
6. Remove extra spaces.
7. Remove redundant returns, for consistency.
      
      Author: Henry Saputra <henry.saputra@gmail.com>
      
      Closes #1157 from hsaputra/cleanup_connection_classes_part2 and squashes the following commits:
      
      4be6906 [Henry Saputra] Fix Spark Scala style for line over 100 chars.
85b24f7 [Henry Saputra] Cleanup on Connection and ConnectionManager classes part 2 while I was working at the code there to help IDE: 1. Remove unused imports 2. Remove parentheses in method calls that do not have side effects. 3. Add parentheses in method calls that do have side effects. 4. Change if-else check (via isInstanceOf) for Connection class type with Scala expression for consistency and cleanliness. 5. Remove semicolon 6. Remove extra spaces.
      383bf72c
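Items 2 and 3 follow a standard Scala convention, illustrated briefly: parentheses advertise a side effect; their absence advertises a pure accessor.

```
class ConnectionSketch {
  // Pure accessor: no parentheses at definition or call site.
  def remoteAddress: String = "host:port"

  // Side-effecting: keeps parentheses so call sites read as actions.
  def close(): Unit = { /* release the underlying channel */ }
}

val c = new ConnectionSketch
val addr = c.remoteAddress // read: no parens
c.close()                  // action: parens
```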
• [SPARK-1768] History server enhancements. · 21ddd7d1
      Marcelo Vanzin authored
      Two improvements to the history server:
      
      - Separate the HTTP handling from history fetching, so that it's easy to add
        new backends later (thinking about SPARK-1537 in the long run)
      
      - Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
        memory for faster access. This allows the app limit to go away, since holding
        just the listing in memory shouldn't be too expensive unless the user has millions
        of completed apps in the history (at which point I'd expect other issues to arise
        aside from history server memory usage, such as FileSystem.listStatus()
        starting to become ridiculously expensive).
      
      I also fixed a few minor things along the way which aren't really worth mentioning.
      I also removed the app's log path from the UI since that information may not even
      exist depending on which backend is used (even though there is only one now).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #718 from vanzin/hist-server and squashes the following commits:
      
      53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording.
      c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs.
      dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration.
      4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo.
      2a7f68d [Marcelo Vanzin] Address review feedback.
      4e72c77 [Marcelo Vanzin] Remove comment about ordering.
      249bcea [Marcelo Vanzin] Remove offset / count from provider interface.
      ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps.
      6e2432f [Marcelo Vanzin] Second round of feedback.
      b2c570a [Marcelo Vanzin] Make class package-private.
      4406f61 [Marcelo Vanzin] Cosmetic change to listing header.
      e852149 [Marcelo Vanzin] Initialize new app array to expected size.
      e8026f4 [Marcelo Vanzin] Review feedback.
      49d2fd3 [Marcelo Vanzin] Fix a comment.
      91e96ca [Marcelo Vanzin] Fix scalastyle issues.
      6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info.
      eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down.
      bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI.
      b284478 [Marcelo Vanzin] Separate history server from history backend.
      21ddd7d1
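"Keeping a few in memory for faster access" is, in spirit, a small LRU cache; a self-contained sketch (names illustrative, not the history server's actual classes):

```
// An access-ordered LinkedHashMap that evicts the least recently used app UI
// once more than maxEntries are loaded; the rest stay on disk until viewed.
class UiCacheSketch[K, V](maxEntries: Int) {
  private val cache = new java.util.LinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(e: java.util.Map.Entry[K, V]): Boolean =
      size() > maxEntries
  }
  def getOrLoad(key: K)(load: K => V): V = cache.synchronized {
    Option(cache.get(key)).getOrElse {
      val v = load(key)
      cache.put(key, v)
      v
    }
  }
}
```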
• [SPARK-2118] spark class should complain if tools jar is missing. · 6dc6722a
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1068 from ScrapCodes/SPARK-2118/tools-jar-check and squashes the following commits:
      
      29e768b [Prashant Sharma] Code Review
5cb6f7d [Prashant Sharma] [SPARK-2118] spark class should complain if tools jar is missing.
      6dc6722a
• [SPARK-1669][SQL] Made cacheTable idempotent · a4bc442c
      Cheng Lian authored
      JIRA issue: [SPARK-1669](https://issues.apache.org/jira/browse/SPARK-1669)
      
Caching the same table multiple times should end up with only one in-memory columnar representation of the table.
      
      Before:
      
      ```
      scala> loadTestTable("src")
      ...
      scala> cacheTable("src")
      ...
      scala> cacheTable("src")
      ...
      scala> table("src")
      ...
      == Query Plan ==
      InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (HiveTableScan [key#2,value#3], (MetastoreRelation default, src, None), None))))
      ```
      
      After:
      
      ```
      scala> loadTestTable("src")
      ...
      scala> cacheTable("src")
      ...
      scala> cacheTable("src")
      ...
      scala> table("src")
      ...
      == Query Plan ==
      InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (HiveTableScan [key#2,value#3], (MetastoreRelation default, src, None), None))
      ```
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1183 from liancheng/spark-1669 and squashes the following commits:
      
      68f8a20 [Cheng Lian] Removed an unused import
      51bae90 [Cheng Lian] Made cacheTable idempotent
      a4bc442c
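The idempotency idea in miniature (toy plan types, not Catalyst's): wrap a plan in an in-memory relation only if it is not already wrapped, so a second `cacheTable` is a no-op instead of nesting.

```
sealed trait PlanSketch
case class InMemorySketch(child: PlanSketch) extends PlanSketch
case class TableScanSketch(name: String) extends PlanSketch

def cacheOnce(plan: PlanSketch): PlanSketch = plan match {
  case cached: InMemorySketch => cached            // already cached: return as-is
  case other                  => InMemorySketch(other)
}

// cacheOnce(cacheOnce(TableScanSketch("src"))) == cacheOnce(TableScanSketch("src"))
```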
• Fix mvn detection · 853a2b95
      Matthew Farrellee authored
When mvn is not detected (not on the executing shell's path), 'set -e' causes
the detection to terminate the script before the helpful error message can
be displayed.
      
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #1181 from mattf/master-0 and squashes the following commits:
      
      506549f [Matthew Farrellee] Fix mvn detection
      853a2b95
• Fixed small running on YARN docs typo · b88238fa
      Vlad authored
The backslash is needed for the multi-line command.
      
      Author: Vlad <frolvlad@gmail.com>
      
      Closes #1158 from frol/patch-1 and squashes the following commits:
      
      e258044 [Vlad] Fixed small running on YARN docs typo
      b88238fa
• [SPARK-1395] Fix "local:" URI support in Yarn mode (again). · e380767d
      Marcelo Vanzin authored
Recent changes ignored the fact that paths may be defined with "local:"
      URIs, which means they need to be explicitly added to the classpath
      everywhere a remote process is started. This change fixes that by:
      
      - Using the correct methods to add paths to the classpath
      - Creating SparkConf settings for the Spark jar itself and for the
        user's jar
      - Propagating those two settings to the remote processes where needed
      
      This ensures that both in client and in cluster mode, the driver has
      the necessary info to build the executor's classpath and have things
      still work when they contain "local:" references.
      
      The change also fixes some confusion in ClientBase about whether
      to use SparkConf or system properties to propagate config options to
      the driver and executors, by standardizing on using data held by
      SparkConf.
      
      On the cleanup front, I removed the hacky way that log4j configuration
      was being propagated to handle the "local:" case. It's much more cleanly
      (and generically) handled by using spark-submit arguments (--files to
      upload a config file, or setting spark.executor.extraJavaOptions to pass
      JVM arguments and use a local file).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #560 from vanzin/yarn-local-2 and squashes the following commits:
      
      4e7f066 [Marcelo Vanzin] Correctly propagate SPARK_JAVA_OPTS to driver/executor.
      6a454ea [Marcelo Vanzin] Use constants for PWD in test.
      6dd5943 [Marcelo Vanzin] Fix propagation of config options to driver / executor.
      b2e377f [Marcelo Vanzin] Review feedback.
      93c3f85 [Marcelo Vanzin] Fix ClassCastException in test.
      e5c682d [Marcelo Vanzin] Fix cluster mode, restore SPARK_LOG4J_CONF.
      1dfbb40 [Marcelo Vanzin] Add documentation for spark.yarn.jar.
      bbdce05 [Marcelo Vanzin] [SPARK-1395] Fix "local:" URI support in Yarn mode (again).
      e380767d
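The crux of `local:` handling, sketched (helper shape illustrative): such paths already exist on every node, so they go straight onto the classpath instead of being uploaded through the distributed cache.

```
import java.net.URI

// Left: a bare path to append to the remote process classpath.
// Right: a resource that must be distributed before it can be used.
def classpathEntrySketch(path: String): Either[String, URI] = {
  val uri = new URI(path)
  if (uri.getScheme == "local") Left(uri.getPath)
  else Right(uri)
}
```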
  4. Jun 22, 2014
• SPARK-2166 - Listing of instances to be terminated before the prompt · 9cb64b2c
      Jean-Martin Archer authored
Will list the EC2 instances before destroying the cluster.
This was added because it can be scary to destroy EC2
instances without knowing which ones will be impacted.
      
      Author: Jean-Martin Archer <jeanmartin.archer@pulseenergy.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Patrick Wendell <pwendell@gmail.com>
      
      Closes #270 from j-martin/master and squashes the following commits:
      
      826455f [Jean-Martin Archer] [SPARK-2611] Implementing recommendations
27b0a36 [Jean-Martin Archer] Listing of instances to be terminated before the prompt Will list the EC2 instances before destroying the cluster. This was added because it can be scary to destroy EC2 instances without knowing which ones will be impacted.
      9cb64b2c
• SPARK-2241: quote command line args in ec2 script · 9fc373e3
      Ori Kremer authored
To preserve quoted command line args (in case options have spaces in them).
      
      Author: Ori Kremer <ori.kremer@gmail.com>
      
      Closes #1169 from orikremer/quote_cmd_line_args and squashes the following commits:
      
      67e2aa1 [Ori Kremer] quote command line args
      9fc373e3
• SPARK-2229: FileAppender throws an IllegalArgumentException in JDK6 · 409d24e2
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #1174 from witgo/SPARK-2229 and squashes the following commits:
      
f85f321 [witgo] FileAppender throw an IllegalArgumentException in JDK6
      e1a8da8 [witgo] SizeBasedRollingPolicy throw an java.lang.IllegalArgumentException in JDK6
      409d24e2
• SPARK-1316. Remove use of Commons IO · 9fe28c35
      Sean Owen authored
Commons IO is actually barely used, and is not a declared dependency. This just replaces it with equivalents from the JDK and Guava.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #1173 from srowen/SPARK-1316 and squashes the following commits:
      
      2eb53db [Sean Owen] Reorder Guava import
      8fde404 [Sean Owen] Remove use of Commons IO, which is not actually a dependency
      9fe28c35
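The flavor of the replacement, as a sketch: Commons IO one-liners have direct Guava equivalents that Spark already depends on.

```
import java.io.File
import com.google.common.base.Charsets
import com.google.common.io.Files

// Before: org.apache.commons.io.FileUtils.readFileToString(file)
// After, using Guava, which is already on the classpath:
def readFully(file: File): String = Files.toString(file, Charsets.UTF_8)
```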
• SPARK-2034. KafkaInputDStream doesn't close resources and may prevent JVM shutdown · 476581e8
      Sean Owen authored
      Tobias noted today on the mailing list:
      
      ========
      
      I am trying to use Spark Streaming with Kafka, which works like a
      charm – except for shutdown. When I run my program with "sbt
      run-main", sbt will never exit, because there are two non-daemon
      threads left that don't die.
      I created a minimal example at
      <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-kafkadoesntshutdown-scala>.
      It starts a StreamingContext and does nothing more than connecting to
a Kafka server and printing what it receives. Using the `future { ... }`
construct, I shut down the StreamingContext after some seconds and
      then print the difference between the threads at start time and at end
      time. The output can be found at
      <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-output1>.
      There are a number of threads remaining that will prevent sbt from
      exiting.
      When I replace `KafkaUtils.createStream(...)` with a call that does
      exactly the same, except that it calls `consumerConnector.shutdown()`
      in `KafkaReceiver.onStop()` (which it should, IMO), the output is as
      shown at <https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-output2>.
      Does anyone have any idea what is going on here and why the program
      doesn't shut down properly? The behavior is the same with both kafka
      0.8.0 and 0.8.1.1, by the way.
      
      ========
      
      Something similar was noted last year:
      
      http://mail-archives.apache.org/mod_mbox/spark-dev/201309.mbox/%3C1380220041.2428.YahooMailNeo@web160804.mail.bf1.yahoo.com%3E
      
      KafkaInputDStream doesn't close `ConsumerConnector` in `onStop()`, and does not close the `Executor` it creates. The latter leaves non-daemon threads and can prevent the JVM from shutting down even if streaming is closed properly.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #980 from srowen/SPARK-2034 and squashes the following commits:
      
      9f31a8d [Sean Owen] Restore ClassTag to private class because MIMA flags it; is the shadowing intended?
      2d579a8 [Sean Owen] Close ConsumerConnector in onStop; shutdown() the local Executor that is created so that its threads stop when done; close the Zookeeper client even on exception; fix a few typos; log exceptions that otherwise vanish
      476581e8
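The shape of the fix, sketched from the description above (field names illustrative): shut down both the Kafka connector and the locally created thread pool so no non-daemon threads outlive the receiver.

```
import java.util.concurrent.ExecutorService
import kafka.consumer.ConsumerConnector

class KafkaReceiverShutdownSketch {
  @volatile private var consumerConnector: ConsumerConnector = _
  @volatile private var executorPool: ExecutorService = _

  def onStop(): Unit = {
    if (consumerConnector != null) {
      consumerConnector.shutdown() // closes the consumer's fetcher threads
      consumerConnector = null
    }
    if (executorPool != null) {
      executorPool.shutdown() // lets message-handler threads finish and exit
      executorPool = null
    }
  }
}
```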
• SPARK-2231: dev/run-tests should include YARN and use a recent Hadoop version · 58b32f34
      Patrick Wendell authored
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #1175 from pwendell/test-hadoop-version and squashes the following commits:
      
      9210ef4 [Patrick Wendell] SPARK-2231: dev/run-tests should include YARN and use a recent Hadoop version
      58b32f34
• SPARK-1996. Remove use of special Maven repo for Akka · 1db9cbc3
      Sean Owen authored
Just following up on Matei's suggestion to remove the Akka repo references. Builds and the audit-release script appear OK.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #1170 from srowen/SPARK-1996 and squashes the following commits:
      
      5ca2930 [Sean Owen] Remove outdated Akka repository references
      1db9cbc3
  5. Jun 21, 2014
  6. Jun 20, 2014
• Fix some tests. · 648553d4
      Marcelo Vanzin authored
- JavaAPISuite was trying to compare a bare path with a URI. Fix by
  extracting the path from the URI, since we know it should be a
  local path anyway.
      
      - b9be1609 excluded the ASM dependency everywhere, but easymock needs
        it (because cglib needs it). So re-add the dependency, with test
        scope this time.
      
      The second one above actually uncovered a weird situation: the maven
      test target works, even though I can't find the class sbt complains
      about in its classpath. sbt complains with:
      
        [error] Uncaught exception when running org.apache.spark.util
        .random.RandomSamplerSuite: java.lang.NoClassDefFoundError:
        org/objectweb/asm/Type
      
      To avoid more weirdness caused by that, I explicitly added the asm
      dependency to both maven and sbt (for tests only), and verified
      the classes don't end up in the final assembly.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #917 from vanzin/flaky-tests and squashes the following commits:
      
      d022320 [Marcelo Vanzin] Fix some tests.
      648553d4
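The first fix in a nutshell: normalize both sides to a path before comparing, rather than comparing a bare path string against a URI.

```
import java.net.URI

// "file:/tmp/out" != "/tmp/out" as strings, but their paths are equal.
val expected = "/tmp/out"
val actual = new URI("file:/tmp/out")
assert(expected == actual.getPath)
```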
• [SPARK-2061] Made splits deprecated in JavaRDDLike · 010c460d
      Anant authored
The JIRA for the issue can be found at: https://issues.apache.org/jira/browse/SPARK-2061
Most of Spark has moved over to consistently using `partitions` instead of `splits`. We should do likewise and add a `partitions` method to JavaRDDLike and have `splits` just call that. We should also go through all cases where other APIs (e.g. Python) call `splits` and change those to use the newer API.
      
      Author: Anant <anant.asty@gmail.com>
      
      Closes #1062 from anantasty/SPARK-2061 and squashes the following commits:
      
      b83ce6b [Anant] Fixed syntax issue
      21f9210 [Anant] Fixed version number in deprecation string
      9315b76 [Anant] made related changes to use partitions in python api
      8c62dd1 [Anant] Made splits deprecated in JavaRDDLike
      010c460d
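The deprecation pattern applied, sketched (version string illustrative): the old name survives as a deprecated forwarder to the new one.

```
import java.util.{List => JList}
import org.apache.spark.Partition

trait JavaRDDLikeSketch {
  def partitions: JList[Partition] // the preferred name going forward

  @deprecated("Use partitions() instead.", "1.1.0")
  def splits: JList[Partition] = partitions // old name forwards to the new one
}
```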
• Patrick Wendell · a6786424