- Dec 10, 2013
Prashant Sharma authored
- Dec 07, 2013
Prashant Sharma authored
Incorporated Patrick's feedback comment on #211 and made the Maven build/dependency resolution at least a bit faster.
- Dec 03, 2013
Raymond Liu authored
- Nov 19, 2013
Henry Saputra authored
Passed sbt/sbt compile and test.
Henry Saputra authored
Also removed unused imports as I found them along the way, and removed return statements when returning a value in Scala code. Passes compile and tests.
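As an illustration of the return-statement cleanup this commit describes, a minimal sketch (a hypothetical function, not taken from the patch): in Scala the last expression of a block is its value, so an explicit `return` is redundant.

```scala
// Hypothetical example, not from the patch.

// Before: explicit returns.
def clampWithReturn(x: Int, max: Int): Int = {
  if (x > max) {
    return max
  }
  x
}

// After: the if/else is itself an expression whose value is returned.
def clamp(x: Int, max: Int): Int =
  if (x > max) max else x
```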
- Nov 15, 2013
Aaron Davidson authored
I've diff'd this patch against my own -- since they were both created independently, this means that two sets of eyes have gone over all the merge conflicts that were created, so I'm feeling significantly more confident in the resulting PR. @rxin has looked at the changes to the repl and is resoundingly confident that they are correct.
- Nov 12, 2013
Tathagata Das authored
Prashant Sharma authored
- Oct 25, 2013
Patrick Wendell authored
Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.
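As a hedged sketch of the kind of fix described here (the exact artifact coordinates and build file are assumptions, not copied from the patch), excluding a transitive dependency in an sbt build looks like this:

```scala
// Hypothetical sbt snippet: exclude Kafka's transitive jopt dependency
// so it never enters the Spark assembly. Versions are illustrative only.
libraryDependencies += ("org.apache.kafka" % "kafka_2.9.2" % "0.8.0-beta1")
  .exclude("net.sf.jopt-simple", "jopt-simple")
```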
Patrick Wendell authored
Patrick Wendell authored
- Oct 24, 2013
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
Tathagata Das authored
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful:

1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but wants to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call `rdd.coalesce(numSplits, shuffle = true)` - that's super confusing.
2. If a user has input data where the number of partitions is not known. E.g. `sc.textFile("some file").coalesce(50)...` This is semantically vague (am I growing or shrinking this RDD?) and may not work correctly if the base RDD has fewer than 50 partitions.

The new operator forces a shuffle every time, so it will always produce exactly the requested number of partitions. It also throws an exception rather than silently not working if a bad input is passed. I am currently adding streaming tests (this requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.
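A hedged sketch of the difference in practice (the local context and partition counts are placeholders; `repartition` is used with the semantics described above, always shuffling to exactly the requested number of partitions):

```scala
import org.apache.spark.SparkContext

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext("local", "repartition-sketch")
    val rdd = sc.parallelize(1 to 1000, 4) // starts with 4 partitions

    // Old idiom: growing the partition count requires remembering
    // shuffle = true, which the message above calls out as confusing.
    val grownOld = rdd.coalesce(8, shuffle = true)

    // New operator: always shuffles, always yields exactly 8 partitions.
    val grownNew = rdd.repartition(8)

    println(grownOld.partitions.length) // 8
    println(grownNew.partitions.length) // 8
    sc.stop()
  }
}
```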
Tathagata Das authored
- Oct 23, 2013
Tathagata Das authored
Fixed a bug in Java transformWith, added more Java test cases for transform and transformWith, added missing variations of Java join and cogroup, and updated various Scala and Java API docs.
- Oct 21, 2013
Tathagata Das authored
Updated TransformDStream to allow n-ary DStream transform. Added transformWith, leftOuterJoin and rightOuterJoin operations to DStream for Scala and Java APIs. Also added n-ary union and n-ary transform operations to StreamingContext for Scala and Java APIs.
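A hedged Scala sketch of the new operations (the stream sources are placeholders and the exact signatures may differ from the patch; transformWith runs an arbitrary RDD-to-RDD function over the corresponding batches of two streams):

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on older Spark)
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits

object TransformWithSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext("local[2]", "transform-with-sketch", Seconds(1))

    // Placeholder sources: two streams of (key, count) pairs.
    val clicks = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))
    val views  = ssc.socketTextStream("localhost", 9998).map(line => (line, 1))

    // transformWith: join the current batch of `clicks` with the current
    // batch of `views` using an arbitrary RDD-level function.
    val manualJoin = clicks.transformWith(views,
      (c: RDD[(String, Int)], v: RDD[(String, Int)]) => c.join(v))

    // The outer-join sugar described in this commit.
    val left  = clicks.leftOuterJoin(views)
    val right = clicks.rightOuterJoin(views)

    left.print()
    ssc.start()
  }
}
```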
- Oct 19, 2013
Reynold Xin authored
- Oct 17, 2013
Prabeesh K authored
- Oct 13, 2013
Aaron Davidson authored
This is an unfortunately invasive change which converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now:

+ Type safety
+ Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types.
+ Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported. (I'm looking at you, shuffle file consolidation.)
+ It will only get harder to make this change as time goes on.

Since this touches a lot of files, it'd be best to either get this patch in quickly or throw it on the ground to avoid too many secondary merge conflicts.
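A hedged sketch of the pattern described (names modeled on Spark's block IDs; the details are assumptions rather than the patch's exact code):

```scala
// Illustrative sketch: typed block IDs instead of raw strings.
sealed abstract class BlockId {
  def name: String // the old string form, still usable as a storage key
}

case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
  def name = s"rdd_${rddId}_$splitIndex"
}

case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId {
  def name = s"shuffle_${shuffleId}_${mapId}_$reduceId"
}

// Matching on the type replaces fragile string parsing, and a
// Seq[(BlockId, BlockStatus)] is self-describing in signatures.
def isShuffle(id: BlockId): Boolean = id match {
  case _: ShuffleBlockId => true
  case _                 => false
}
```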
- Oct 12, 2013
jerryshao authored
- Oct 06, 2013
Patrick Wendell authored
- Oct 05, 2013
Martin Weindel authored
- Sep 26, 2013
Prashant Sharma authored
- Sep 24, 2013
Patrick Wendell authored
- Sep 21, 2013
Prashant Sharma authored
- Sep 20, 2013
Vadim Chekan authored
- Sep 10, 2013
Prashant Sharma authored
- Sep 01, 2013
Matei Zaharia authored
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
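For illustration, the corresponding import changes in user code would look roughly like this (a hedged sketch; some of these classes are Spark-internal and are shown only to make the mapping concrete):

```scala
// Before this commit, the classes lived at the top level of org.apache.spark:
//   import org.apache.spark.RDD
//   import org.apache.spark.Utils
//   import org.apache.spark.KryoSerializer

// After this commit:
import org.apache.spark.rdd.RDD
import org.apache.spark.util.{ClosureCleaner, SizeEstimator, Utils}
import org.apache.spark.serializer.{JavaSerializer, KryoSerializer}
```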
Matei Zaharia authored
Matei Zaharia authored
- Aug 22, 2013
Prashant Sharma authored