  1. Apr 25, 2014
• SPARK-1607. Replace octal literals, removed in Scala 2.11, with hex literals · 6e101f11
      Sean Owen authored
      Octal literals like "0700" are deprecated in Scala 2.10, generating a warning. They have been removed entirely in 2.11. See https://issues.scala-lang.org/browse/SI-7618
      
This change simply replaces two uses of octal literals with hex literals, which seemed the next-best representation since they express a bit mask (a file permission, in particular).
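
For illustration, these are equivalent ways to express the 0700 permission mask once octal literals are gone (a minimal sketch, not the actual diff):

``` scala
val maskHex = 0x1C0                         // hex equivalent of octal 0700 (448 decimal)
val maskParsed = Integer.parseInt("700", 8) // the form a later commit switches to
assert(maskHex == maskParsed)
```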
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #529 from srowen/SPARK-1607 and squashes the following commits:
      
      1ee0e67 [Sean Owen] Use Integer.parseInt(...,8) for octal literal instead of hex equivalent
      0102f3d [Sean Owen] Replace octal literals, removed in Scala 2.11, with hex literals
      6e101f11
• Call correct stop(). · 45ad7f0c
      Aaron Davidson authored
      Oopsie in #504.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #527 from aarondav/stop and squashes the following commits:
      
      8d1446a [Aaron Davidson] Call correct stop().
      45ad7f0c
• SPARK-1242 Add aggregate to python rdd · e03bc379
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #139 from holdenk/add_aggregate_to_python_api and squashes the following commits:
      
      0f39ae3 [Holden Karau] Merge in master
      4879c75 [Holden Karau] CR feedback, fix issue with empty RDDs in aggregate
      70b4724 [Holden Karau] Style fixes from code review
      96b047b [Holden Karau] Add aggregate to python rdd
      e03bc379
  2. Apr 24, 2014
• Fix [SPARK-1078]: Remove the Unnecessary lift-json dependency · 095b5182
      Sandeep authored
Remove the unnecessary lift-json dependency from pom.xml.
      
      Author: Sandeep <sandeep@techaddict.me>
      
      Closes #536 from techaddict/FIX-SPARK-1078 and squashes the following commits:
      
      bd0fd1d [Sandeep] Fix [SPARK-1078]: Replace lift-json with json4s-jackson. Remove the Unnecessary lift-json dependency from pom.xml
      095b5182
• [Typo] In the maven docs: chd -> cdh · 06e82d94
      Andrew Or authored
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #548 from andrewor14/doc-typo and squashes the following commits:
      
      3eaf4c4 [Andrew Or] chd -> cdh
      06e82d94
• Generalize pattern for planning hash joins. · 86ff8b10
      Michael Armbrust authored
      This will be helpful for [SPARK-1495](https://issues.apache.org/jira/browse/SPARK-1495) and other cases where we want to have custom hash join implementations but don't want to repeat the logic for finding the join keys.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #418 from marmbrus/hashFilter and squashes the following commits:
      
      d5cc79b [Michael Armbrust] Address @rxin 's comments.
      366b6d9 [Michael Armbrust] style fixes
      14560eb [Michael Armbrust] Generalize pattern for planning hash joins.
      f4809c1 [Michael Armbrust] Move common functions to PredicateHelper.
      86ff8b10
• [SPARK-1617] and [SPARK-1618] Improvements to streaming ui and bug fix to socket receiver · cd12dd9b
      Tathagata Das authored
1617: These changes expose the receiver state (active or inactive) and the last error in the UI.
1618: If the socket receiver cannot connect on the first attempt, it should try to restart after a delay. That was broken, as the thread that restarts (and hence stops) the receiver waited on Thread.join on itself!
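
To illustrate the 1618 bug pattern, here is a minimal sketch (not the actual receiver code) of a thread that effectively joins on itself and therefore blocks forever:

``` scala
object SelfJoinDeadlock extends App {
  val restarter = new Thread {
    override def run(): Unit = {
      // Stopping the receiver ends up joining on the very thread doing the stopping:
      Thread.currentThread().join() // blocks forever: a thread cannot outlive itself
      println("never reached")
    }
  }
  restarter.start()
}
```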
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #540 from tdas/streaming-ui-fix and squashes the following commits:
      
      e469434 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-ui-fix
      dbddf75 [Tathagata Das] Style fix.
      66df1a5 [Tathagata Das] Merge remote-tracking branch 'apache/master' into streaming-ui-fix
      ad98bc9 [Tathagata Das] Refactored streaming listener to use ReceiverInfo.
      d7f849c [Tathagata Das] Revert "Moved BatchInfo from streaming.scheduler to streaming.ui"
      5c80919 [Tathagata Das] Moved BatchInfo from streaming.scheduler to streaming.ui
da244f6 [Tathagata Das] Fixed socket receiver as well as made receiver state and error visible in the streaming UI.
      cd12dd9b
• SPARK-1586 Windows build fixes · 968c0187
      Mridul Muralidharan authored
Unfortunately, this is not exhaustive; in particular, Hive tests still fail due to path issues.
      
      Author: Mridul Muralidharan <mridulm80@apache.org>
      
      This patch had conflicts when merged, resolved by
      Committer: Matei Zaharia <matei@databricks.com>
      
      Closes #505 from mridulm/windows_fixes and squashes the following commits:
      
ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy apparently
      cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
      3267f4b [Mridul Muralidharan] Fix build failures
      35b277a [Mridul Muralidharan] Fix Scalastyle failures
      bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
      10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
      1337abd [Mridul Muralidharan] fix classpath while running in windows
      968c0187
• SPARK-1584: Upgrade Flume dependency to 1.4.0 · d5c6ae6c
      tmalaska authored
Updated the Flume dependency in the Maven POM file and the Scala build file.
      
      Author: tmalaska <ted.malaska@cloudera.com>
      
      Closes #507 from tmalaska/master and squashes the following commits:
      
      79492c8 [tmalaska] excluded all thrift
      159c3f1 [tmalaska] fixed the flume pom file issues
      5bf56a7 [tmalaska] Upgrade flume version
      d5c6ae6c
• [SPARK-986]: Job cancelation for PySpark · e53eb4f0
      Ahir Reddy authored
      * Additions to the PySpark API to cancel jobs
      * Monitor Thread in PythonRDD to kill Python workers if a task is interrupted
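
A minimal sketch of the monitor-thread idea from the second bullet (names are illustrative, not the actual PythonRDD internals): poll the task's interruption flag and destroy the external worker process once it is set.

``` scala
class WorkerMonitor(isInterrupted: () => Boolean, worker: Process) extends Thread {
  setDaemon(true)
  override def run(): Unit = {
    while (!isInterrupted()) {
      Thread.sleep(100) // busy-wait on the interruption flag
    }
    worker.destroy() // kill the Python worker once the task is interrupted
  }
}
```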
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      
      Closes #541 from ahirreddy/python-cancel and squashes the following commits:
      
      dfdf447 [Ahir Reddy] Changed success -> completed and made logging message clearer
      6c860ab [Ahir Reddy] PR Comments
      4b4100a [Ahir Reddy] Success flag
      adba6ed [Ahir Reddy] Destroy python workers
      27a2f8f [Ahir Reddy] Start the writer thread...
d422f7b [Ahir Reddy] Remove unnecessary vals
adda337 [Ahir Reddy] Busy wait on the context.interrupted flag, and then kill the python worker
      d9e472f [Ahir Reddy] Revert "removed unnecessary vals"
      5b9cae5 [Ahir Reddy] removed unnecessary vals
      07b54d9 [Ahir Reddy] Fix canceling unit test
      8ae9681 [Ahir Reddy] Don't interrupt worker
      7722342 [Ahir Reddy] Monitor Thread for python workers
      db04e16 [Ahir Reddy] Added canceling api to PySpark
      e53eb4f0
• [SPARK-1615] Synchronize accesses to the LiveListenerBus' event queue · ee6f7e22
      Andrew Or authored
      Original poster is @zsxwing, who reported this bug in #516.
      
      Much of SparkListenerSuite relies on LiveListenerBus's `waitUntilEmpty()` method. As the name suggests, this waits until the event queue is empty. However, the following race condition could happen:
      
      (1) We dequeue an event
      (2) The queue is empty, we return true (even though the event has not been processed)
      (3) The test asserts something assuming that all listeners have finished executing (and fails)
      (4) The listeners receive and process the event
      
      This PR makes (1) and (4) atomic by synchronizing around it. To do that, however, we must avoid using `eventQueue.take`, which is blocking and will cause a deadlock if we synchronize around it. As a workaround, we use the non-blocking `eventQueue.poll` + a semaphore to provide the same semantics.
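
A simplified sketch of that workaround (illustrative, not the actual LiveListenerBus code): a semaphore stands in for the blocking `take`, and dequeue-plus-process happens inside a single synchronized block.

``` scala
import java.util.concurrent.{ConcurrentLinkedQueue, Semaphore}

class ListenerBusSketch[E](process: E => Unit) {
  private val queue = new ConcurrentLinkedQueue[E]()
  private val eventLock = new Semaphore(0)

  def post(event: E): Unit = {
    queue.offer(event)
    eventLock.release() // signal the consumer without blocking
  }

  def consumeForever(): Unit = {
    while (true) {
      eventLock.acquire() // wait until at least one event has been posted
      synchronized {
        Option(queue.poll()).foreach(process) // steps (1) and (4) are now atomic
      }
    }
  }

  // "Empty" can now only be observed between fully processed events.
  def isTrulyEmpty: Boolean = synchronized { queue.isEmpty }
}
```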
      
      This has been a possible race condition for a long time, but for some reason we've never run into it.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #544 from andrewor14/stage-info-test-fix and squashes the following commits:
      
      3cbe40c [Andrew Or] Merge github.com:apache/spark into stage-info-test-fix
      56dbbcb [Andrew Or] Check if event is actually added before releasing semaphore
      eb486ae [Andrew Or] Synchronize accesses to the LiveListenerBus' event queue
      ee6f7e22
• [SPARK-1510] Spark Streaming metrics source for metrics system · 80429f3e
      jerryshao authored
This pulls in changes made by @jerryshao in https://github.com/apache/spark/pull/424 and merges them with master.
      
      Author: jerryshao <saisai.shao@intel.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #545 from tdas/streaming-metrics and squashes the following commits:
      
      034b443 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-metrics
      fb3b0a5 [jerryshao] Modify according master update
      21939f5 [jerryshao] Style changes according to style check error
      976116b [jerryshao] Add StreamSource in StreamingContext for better monitoring through metrics system
      80429f3e
• Spark 1489 Fix the HistoryServer view acls · 44da5ab2
      Thomas Graves authored
This allows the view ACLs set by the user to be enforced by the history server. It also fixes the filters so that they are applied properly.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #509 from tgravescs/SPARK-1489 and squashes the following commits:
      
      869c186 [Thomas Graves] change to either acls enabled or disabled
      0d8333c [Thomas Graves] Add history ui policy to allow acls to either use application set, history server force acls on, or off
      65148b5 [Thomas Graves] SPARK-1489 Fix the HistoryServer view acls
      44da5ab2
• [SQL] Add support for parsing indexing into arrays in SQL. · 4660991e
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #518 from marmbrus/parseArrayIndex and squashes the following commits:
      
      afd2d6b [Michael Armbrust] 100 chars
      c3d6026 [Michael Armbrust] Add support for parsing indexing into arrays in SQL.
      4660991e
• [SPARK-1592][streaming] Automatically remove streaming input blocks · 526a518b
      Tathagata Das authored
The raw input data is stored as blocks in BlockManagers. Earlier they were cleared by the cleaner TTL. Now, since streaming does not require the cleaner TTL to be set, the blocks would not get cleared. This increases Spark's memory usage, which is not even accounted for or shown in the Spark storage UI. It may cause the data blocks to spill over to disk, which eventually slows down the receiving of data (persisting to memory becomes bottlenecked by writing to disk).
      
The solution in this PR is to automatically remove those blocks. The mechanism for tracking which BlockRDDs (which present the raw data blocks as RDDs) can be safely cleared already exists; this change simply uses it to explicitly remove the blocks from BlockRDDs.
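
For illustration, the removal step might look roughly like this (a sketch; BlockManager.removeBlock is a real Spark method, but the wiring here is illustrative):

``` scala
import org.apache.spark.storage.{BlockId, BlockManager}

// Explicitly drop the raw input blocks behind an old BlockRDD instead of
// waiting for a cleaner TTL to reclaim them.
def removeInputBlocks(blockManager: BlockManager, blockIds: Seq[BlockId]): Unit = {
  blockIds.foreach(id => blockManager.removeBlock(id))
}
```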
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #512 from tdas/block-rdd-unpersist and squashes the following commits:
      
      d25e610 [Tathagata Das] Merge remote-tracking branch 'apache/master' into block-rdd-unpersist
      5f46d69 [Tathagata Das] Merge remote-tracking branch 'apache/master' into block-rdd-unpersist
      2c320cd [Tathagata Das] Updated configuration with spark.streaming.unpersist setting.
      2d4b2fd [Tathagata Das] Automatically removed input blocks
      526a518b
• SPARK-1438 RDD.sample() make seed param optional · 35e3d199
      Arun Ramakrishnan authored
Copied from the previous pull request: https://github.com/apache/spark/pull/462

It's probably better to let the underlying language implementation take care of the default. This was easier to do in Python, as the default value for seed in random and numpy.random is None.

On the Scala/Java side, it might mean propagating an Option or null (oh no!) down the chain to where the Random is constructed. But the convention in some other methods was to use System.nanoTime, so this follows that convention.

This conflicts with the overloaded method sql.SchemaRDD.sample, which also defines default params:
sample(fraction, withReplacement=false, seed=math.random)
Scala does not allow more than one overloaded method to have default params. I believe the author intended to override the RDD.sample method, not overload it, so this changes it.

If backward compatibility is important, three new methods can be introduced (without default params) like this:
sample(fraction)
sample(fraction, withReplacement)
sample(fraction, withReplacement, seed)
      
Added some tests for the Scala RDD takeSample method.
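
A sketch of the resulting API shape (signatures simplified from the real RDD class): the seed becomes a defaulted parameter rather than a required one.

``` scala
import scala.util.Random

abstract class RDDSketch[T] {
  // Callers may omit the seed; System.nanoTime follows the existing convention
  // (a later commit in this PR swaps in a randomly generated long instead).
  def sample(
      withReplacement: Boolean,
      fraction: Double,
      seed: Long = System.nanoTime): RDDSketch[T]

  def takeSample(
      withReplacement: Boolean,
      num: Int,
      seed: Long = new Random().nextLong()): Array[T]
}
```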
      
      Author: Arun Ramakrishnan <smartnut007@gmail.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Matei Zaharia <matei@databricks.com>
      
      Closes #477 from smartnut007/master and squashes the following commits:
      
      07bb06e [Arun Ramakrishnan] SPARK-1438 fixing more space formatting issues
      b9ebfe2 [Arun Ramakrishnan] SPARK-1438 removing redundant import of random in python rddsampler
      8d05b1a [Arun Ramakrishnan] SPARK-1438 RDD . Replace System.nanoTime with a Random generated number. python: use a separate instance of Random instead of seeding language api global Random instance.
      69619c6 [Arun Ramakrishnan] SPARK-1438 fix spacing issue
      0c247db [Arun Ramakrishnan] SPARK-1438 RDD language apis to support optional seed in RDD methods sample/takeSample
      35e3d199
• SPARK-1104: kill Process in workerThread of ExecutorRunner · f99af852
      CodingCat authored
      As reported in https://spark-project.atlassian.net/browse/SPARK-1104
      
      By @pwendell: "Sometimes due to large shuffles executors will take a long time shutting down. In particular this can happen if large numbers of shuffle files are around (this will be alleviated by SPARK-1103, but nonetheless...).
      
      The symptom is you have DEAD workers sitting around in the UI and the existing workers keep trying to re-register but can't because they've been assumed dead."
      
In this patch, I add lines to the InterruptedException handler in ExecutorRunner's workerThread, so that process.destroy() and process.waitFor() only block the workerThread instead of blocking the Worker actor...
      
      ---------
      
Analysis: process.destroy() is a blocking method, i.e. it only returns when all shutdown-hook threads return, so calling it in the Worker thread would make the Worker block for a long while...
      
      about what will happen on the shutdown hooks when the JVM process is killed: http://www.tutorialspoint.com/java/lang/runtime_addshutdownhook.htm
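
A minimal sketch of the pattern (illustrative, not the actual ExecutorRunner code): the blocking destroy()/waitFor() calls run inside the worker thread's own interrupt handler, so only that thread blocks.

``` scala
val workerThread = new Thread("ExecutorRunner-sketch") {
  override def run(): Unit = {
    val process = new ProcessBuilder("sleep", "1000").start() // stand-in for the executor
    try {
      process.waitFor() // normal path: wait for the process to exit
    } catch {
      case _: InterruptedException =>
        process.destroy() // may block on shutdown hooks, but only blocks this thread
        process.waitFor()
    }
  }
}
workerThread.start()
// The Worker actor can now cancel without blocking itself:
workerThread.interrupt()
```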
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #35 from CodingCat/SPARK-1104 and squashes the following commits:
      
      85767da [CodingCat] add null checking and remove unnecessary killProce
      3107aeb [CodingCat] address Aaron's comments
      eb615ba [CodingCat] kill the process when the error happens
      0accf2f [CodingCat] set process to null after killed it
      1d511c8 [CodingCat] kill Process in workerThread
      f99af852
• Fix Scala Style · a03ac222
      Sandeep authored
      Any comments are welcome
      
      Author: Sandeep <sandeep@techaddict.me>
      
      Closes #531 from techaddict/stylefix-1 and squashes the following commits:
      
      7492730 [Sandeep] Pass 4
      98b2428 [Sandeep] fix rxin suggestions
      b5e2e6f [Sandeep] Pass 3
      05932d7 [Sandeep] fix if else styling 2
      08690e5 [Sandeep] fix if else styling
      a03ac222
• SPARK-1494 Don't initialize classes loaded by MIMA excludes, attempt 2 · c5c1916d
      Michael Armbrust authored
      [WIP]
      
Looks like Scala reflection was invoking the static initializer:
      ```
      ...
      	at org.apache.spark.sql.test.TestSQLContext$.<init>(TestSQLContext.scala:25)
      	at org.apache.spark.sql.test.TestSQLContext$.<clinit>(TestSQLContext.scala)
      	at java.lang.Class.forName0(Native Method)
      	at java.lang.Class.forName(Class.java:270)
      	at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:500)
      	at scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:505)
      	at scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:109)
      ...
      ```
      
      Need to make sure that this doesn't change the exclusion semantics before merging.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #526 from marmbrus/mima and squashes the following commits:
      
      8168dea [Michael Armbrust] Spurious change
      afba262 [Michael Armbrust] Prevent Scala reflection from running static class initializer.
      c5c1916d
• Spark 1490 Add kerberos support to the HistoryServer · bd375094
      Thomas Graves authored
Here I've added the ability for the HistoryServer to log in from a Kerberos keytab file, so that the history server can be run as a super user and stay up for a long period of time while reading the history files from HDFS.
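
A hedged sketch of the login step (UserGroupInformation.loginUserFromKeytab is the standard Hadoop API; the helper and its parameters are illustrative):

``` scala
import org.apache.hadoop.security.UserGroupInformation

// Log in once at daemon startup so a long-running history server can keep
// reading history files from a kerberized HDFS.
def loginFromKeytab(principal: String, keytabFile: String): Unit = {
  UserGroupInformation.loginUserFromKeytab(principal, keytabFile)
}
```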
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #513 from tgravescs/SPARK-1490 and squashes the following commits:
      
      e204a99 [Thomas Graves] remove extra logging
      5418daa [Thomas Graves] fix typo in config
      0076b99 [Thomas Graves] Update docs
      4d76545 [Thomas Graves] SPARK-1490 Add kerberos support to the HistoryServer
      bd375094
• SPARK-1611: Fix incorrect initialization order in AppendOnlyMap · 78a49b25
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1611
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #534 from zsxwing/SPARK-1611 and squashes the following commits:
      
      96af089 [zsxwing] SPARK-1611: Fix incorrect initialization order in AppendOnlyMap
      78a49b25
• SPARK-1488. Squash more language feature warnings in new commits by importing implicitConversion · 6338a93f
      Sean Owen authored
      A recent commit reintroduced some of the same warnings that SPARK-1488 resolved. These are just a few more of the same changes to remove these warnings.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #528 from srowen/SPARK-1488.2 and squashes the following commits:
      
      62d592c [Sean Owen] More feature warnings in tests
      4e2e94b [Sean Owen] Squash more language feature warnings in new commits by importing implicitConversion
      6338a93f
• Small changes to release script · faeb761c
      Patrick Wendell authored
      faeb761c
• [SPARK-1610] [SQL] Fix Cast to use exact type value when cast from BooleanType to NumericType. · 27b2821c
Takuya UESHIN authored
      
`Cast` from `BooleanType` to `NumericType` always uses an `Int` value, but this causes a `ClassCastException` when the cast value is consumed by a subsequent evaluation, like the code below:
      
      ``` scala
      scala> import org.apache.spark.sql.catalyst._
      import org.apache.spark.sql.catalyst._
      
      scala> import types._
      import types._
      
      scala> import expressions._
      import expressions._
      
      scala> Add(Cast(Literal(true), ShortType), Literal(1.toShort)).eval()
      java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Short
      	at scala.runtime.BoxesRunTime.unboxToShort(BoxesRunTime.java:102)
      	at scala.math.Numeric$ShortIsIntegral$.plus(Numeric.scala:72)
      	at org.apache.spark.sql.catalyst.expressions.Add$$anonfun$eval$2.apply(arithmetic.scala:58)
      	at org.apache.spark.sql.catalyst.expressions.Add$$anonfun$eval$2.apply(arithmetic.scala:58)
      	at org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:114)
      	at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:58)
      	at .<init>(<console>:17)
      	at .<clinit>(<console>)
      	at .<init>(<console>:7)
      	at .<clinit>(<console>)
      	at $print(<console>)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:483)
      	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
      	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
      	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
      	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
      	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
      	at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
      	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
      	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
      	at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
      	at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
      	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
      	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
      	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
      	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
      	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
      	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
      	at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
      	at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
      	at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
      	at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
      ```
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #533 from ueshin/issues/SPARK-1610 and squashes the following commits:
      
      70f36e8 [Takuya UESHIN] Fix Cast to use exact type value when cast from BooleanType to NumericType.
      27b2821c
• SPARK-1601 & SPARK-1602: two bug fixes related to cancellation · 1fdf659d
      Reynold Xin authored
      This should go into 1.0 since it would return wrong data when the bug happens (which is pretty likely if cancellation is used). Test case attached.
      
      1. Do not put partially executed partitions into cache (in task killing).
      
2. The iterator returned by CacheManager#getOrCompute was not an InterruptibleIterator, which led to uninterruptible jobs (see the sketch below).
      
      Thanks @aarondav and @ahirreddy for reporting and helping debug.
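
A simplified sketch of fix (2) (InterruptibleIterator is a real Spark class, but this wrapper is illustrative): every hasNext call checks the kill flag, so a cancelled task stops consuming the iterator promptly.

``` scala
class InterruptibleIteratorSketch[T](isInterrupted: () => Boolean, delegate: Iterator[T])
    extends Iterator[T] {
  override def hasNext: Boolean = {
    if (isInterrupted()) {
      throw new RuntimeException("task killed") // Spark throws its own kill exception here
    }
    delegate.hasNext
  }
  override def next(): T = delegate.next()
}
```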
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #521 from rxin/kill and squashes the following commits:
      
      401033f [Reynold Xin] Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/spark into kill
      7a7bdd2 [Reynold Xin] Add a new line in the end of JobCancellationSuite.scala.
      35cd9f7 [Reynold Xin] Fixed a bug that partially executed partitions can be put into cache (in task killing).
      1fdf659d
• SPARK-1587 Fix thread leak · dd681f50
      Mridul Muralidharan authored
mvn test fails (intermittently) due to a thread leak, since ScalaTest runs all tests in the same JVM.
      
      Author: Mridul Muralidharan <mridulm80@apache.org>
      
      Closes #504 from mridulm/resource_leak_fixes and squashes the following commits:
      
      a5d10d0 [Mridul Muralidharan] Prevent thread leaks while running tests : cleanup all threads when SparkContext.stop is invoked. Causes tests to fail
      7b5e19c [Mridul Muralidharan] Prevent NPE while running tests
      dd681f50
• [Fix #79] Replace Breakable For Loops By While Loops · bb68f477
      Sandeep authored
      Author: Sandeep <sandeep@techaddict.me>
      
      Closes #503 from techaddict/fix-79 and squashes the following commits:
      
      e3f6746 [Sandeep] Style changes
      07a4f6b [Sandeep] for loop to While loop
      0a6d8e9 [Sandeep] Breakable for loop to While loop
      bb68f477
• SPARK-1589: Fix the incorrect compare · 6ab75780
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1589
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #508 from zsxwing/SPARK-1589 and squashes the following commits:
      
      570c67a [zsxwing] SPARK-1589: Fix the incorrect compare
      6ab75780
• Mark all fields of EdgePartition, Graph, and GraphOps transient · 1d6abe3a
      Ankur Dave authored
      These classes are only serializable to work around closure capture, so their fields should all be marked `@transient` to avoid wasteful serialization.
      
      This PR supersedes apache/spark#519 and fixes the same bug.
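
For illustration, the pattern looks roughly like this (a sketch, not the actual GraphX classes): the class stays Serializable only so closures can capture it, and @transient keeps its heavyweight fields out of the serialized payload.

``` scala
class GraphOpsSketch[V](@transient val vertices: Seq[V],
                        @transient val edges: Seq[(V, V)]) extends Serializable {
  // Computed on demand and likewise excluded from serialization.
  @transient lazy val numEdges: Long = edges.size.toLong
}
```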
      
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #520 from ankurdave/graphx-transient and squashes the following commits:
      
      6431760 [Ankur Dave] Mark all fields of EdgePartition, Graph, and GraphOps `@transient`
      1d6abe3a
• Update Java api for setJobGroup with interruptOnCancel · d485eecb
      Aaron Davidson authored
      Also adds a unit test.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #522 from aarondav/cancel2 and squashes the following commits:
      
      565c253 [Aaron Davidson] Update Java api for setJobGroup with interruptOnCancel
      65b33d8 [Aaron Davidson] Add unit test for Thread interruption on cancellation
      d485eecb
  3. Apr 23, 2014
• [Hot Fix #469] Fix flaky test in SparkListenerSuite · 4b2bab1d
      Andrew Or authored
The two modified tests may fail if the race condition does not go in our favor...
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #516 from andrewor14/stage-info-test-fix and squashes the following commits:
      
      b4b6100 [Andrew Or] Add/replace missing waitUntilEmpty() calls to listener bus
      4b2bab1d
• [SPARK-1540] Add an optional Ordering parameter to PairRDDFunctions. · 640f9a0e
      Matei Zaharia authored
In https://issues.apache.org/jira/browse/SPARK-1540 we'd like to look at Spark's API to see if we can take advantage of Comparable keys in more places, which will make external spilling more efficient. This PR is a first step towards that; it shows how to pass an Ordering when one is available and still continue functioning otherwise. It does this using a new implicit parameter with a default value of null.
      
      The API is currently only in Scala -- in Java we'd have to add new versions of mapToPair and such that take a Comparator, or a new method to add a "type hint" to an RDD. We can address those later though.
      
      Unfortunately requiring all keys to be Comparable would not work without requiring RDDs in general to contain only Comparable types. The reason is that methods such as distinct() and intersection() do a shuffle, but should be usable on RDDs of any type. So ordering will have to remain an optimization for the types that can be ordered. I think this isn't a horrible outcome though because one of the nice things about Spark's API is that it works on objects of *any* type, without requiring you to specify a schema or implement Writable or stuff like that.
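
A sketch of the resulting signature (simplified from PairRDDFunctions): the implicit Ordering is picked up when the key type has one and silently defaults to null otherwise.

``` scala
class PairFunctionsSketch[K, V](data: Seq[(K, V)])(implicit ord: Ordering[K] = null) {
  // Some(ordering) when the key type has an implicit Ordering, None otherwise.
  def keyOrdering: Option[Ordering[K]] = Option(ord)
}

object OrderingDemo extends App {
  class Opaque // no Ordering[Opaque] in scope
  println(new PairFunctionsSketch(Seq(1 -> "a")).keyOrdering)          // Some(...)
  println(new PairFunctionsSketch(Seq(new Opaque -> "a")).keyOrdering) // None
}
```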
      
      Author: Matei Zaharia <matei@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Reynold Xin <rxin@apache.org>
      
      Closes #487 from mateiz/ordered-keys and squashes the following commits:
      
      bd565f6 [Matei Zaharia] Pass an Ordering to only one version of groupBy because the Scala language spec doesn't allow having an optional parameter on all of them (this was only compiling in Scala 2.10 due to a bug).
      4629965 [Matei Zaharia] Add tests for other versions of groupBy
      3beae85 [Matei Zaharia] Added a test for implicit orderings
      80b7a3b [Matei Zaharia] Add an optional Ordering parameter to PairRDDFunctions.
      640f9a0e
• SPARK-1582 Invoke Thread.interrupt() when cancelling jobs · 432201c7
      Aaron Davidson authored
      Sometimes executor threads are blocked waiting for IO or monitors, and the current implementation of job cancellation may never recover these threads. By simply invoking Thread.interrupt() during cancellation, we can often safely unblock the threads and use them for subsequent work.
      
      Note that this feature must remain optional for now because of a bug in HDFS where Thread.interrupt() may cause nodes to be marked as permanently dead (as the InterruptedException is reinterpreted as an IOException during communication with some node).
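
A minimal sketch of why the behavior stays opt-in (names are illustrative): interruption is applied only when the job group explicitly asked for it.

``` scala
def killTask(taskThread: Thread, interruptOnCancel: Boolean): Unit = {
  if (interruptOnCancel) {
    taskThread.interrupt() // can unblock IO/monitor waits, but see the HDFS caveat above
  }
}
```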
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #498 from aarondav/cancel and squashes the following commits:
      
      e52b829 [Aaron Davidson] Don't use job.properties when null
      82f78bb [Aaron Davidson] Update DAGSchedulerSuite
      b67f472 [Aaron Davidson] Add comment on why interruptOnCancel is in setJobGroup
      4cb9fd6 [Aaron Davidson] SPARK-1582 Invoke Thread.interrupt() when cancelling jobs
      432201c7
• Honor default fs name when initializing event logger. · dd1b7a61
      Marcelo Vanzin authored
      This is related to SPARK-1459 / PR #375. Without this fix,
      FileLogger.createLogDir() may try to create the log dir on
      HDFS, while createWriter() will try to open the log file on
      the local file system, leading to interesting errors and
      confusion.
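
A hedged sketch of the idea (Hadoop's Path/FileSystem API is real; the helper is illustrative): resolve the log directory against its own URI, falling back to the configured default file system, and use that single FileSystem for both creating the directory and opening the writer.

``` scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def fileSystemFor(logDir: String, hadoopConf: Configuration): FileSystem =
  new Path(logDir).getFileSystem(hadoopConf) // honors the default fs name
```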
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #450 from vanzin/event-file-2 and squashes the following commits:
      
      592cdb3 [Marcelo Vanzin] Honor default fs name when initializing event logger.
      dd1b7a61
• SPARK-1572 Don't kill Executor if PythonRDD fails while computing parent · a967b005
      Aaron Davidson authored
Previously, the behavior was that if the parent RDD threw any exception other than IOException or FileNotFoundException (which is quite possible for Hadoop input sources), the entire Executor would crash, because the thread's default uncaught-exception handler calls System.exit().
      
      This patch avoids two related issues:
      
        1. Always catch exceptions in this reader thread.
        2. Don't mask readerException when Python throws an EOFError
           after worker.shutdownOutput() is called.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #486 from aarondav/pyspark and squashes the following commits:
      
      fbb11e9 [Aaron Davidson] Make sure FileNotFoundExceptions are handled same as before
      b9acb3e [Aaron Davidson] SPARK-1572 Don't kill Executor if PythonRDD fails while computing parent
      a967b005
• SPARK-1583: Fix a bug that using java.util.HashMap by mistake · a6646066
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1583
      
Does anyone know why `java.util.HashMap` is used here rather than `mutable.HashMap`? Some methods of `java.util.HashMap` are not generic, so the compiler cannot help us find similar problems.
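
For illustration, this is the kind of bug the non-generic methods allow (a contrived example, not the actual Spark code):

``` scala
val jmap = new java.util.HashMap[String, Int]()
jmap.put("a", 1)
jmap.remove(1) // compiles: remove(Object) accepts any type, and silently removes nothing

val smap = scala.collection.mutable.HashMap("a" -> 1)
// smap.remove(1) // does not compile: remove(key: String) rejects the wrong key type
```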
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #500 from zsxwing/SPARK-1583 and squashes the following commits:
      
      7bfd74d [zsxwing] SPARK-1583: Fix a bug that using java.util.HashMap by mistake
      a6646066
• SPARK-1119 and other build improvements · cd4ed293
      Patrick Wendell authored
      1. Makes assembly and examples jar naming consistent in maven/sbt.
      2. Updates make-distribution.sh to use Maven and fixes some bugs.
      3. Updates the create-release script to call make-distribution script.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #502 from pwendell/make-distribution and squashes the following commits:
      
      1a97f0d [Patrick Wendell] SPARK-1119 and other build improvements
      cd4ed293
• [SQL] SPARK-1571 Mistake in java example code · 39f85e03
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #496 from marmbrus/javaBeanBug and squashes the following commits:
      
      644fedd [Michael Armbrust] Bean methods must be public.
      39f85e03
• SPARK-1494 Don't initialize classes loaded by MIMA excludes. · 8e950813
      Michael Armbrust authored
      [WIP]  Just seeing how Jenkins likes this...
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #494 from marmbrus/mima and squashes the following commits:
      
      6eec616 [Michael Armbrust] Force hive tests to run.
      acaf682 [Michael Armbrust] Don't initialize loaded classes.
      8e950813