- Dec 10, 2013
Prashant Sharma authored
- Dec 07, 2013
Prashant Sharma authored
Incorporated Patrick's feedback comment on #211 and made the Maven build/dependency resolution at least a bit faster.
- Dec 03, 2013
Raymond Liu authored
- Nov 19, 2013
Henry Saputra authored
Passed sbt/sbt compile and test.
Henry Saputra authored
Also removed unused imports as I found them along the way, and removed return statements when returning a value in Scala code. Passes compile and tests.
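As an illustration of the return-statement cleanup this commit describes, a minimal sketch (a hypothetical function, not taken from the patch): in Scala the last expression of a block is its value, so an explicit `return` is redundant.

```scala
// Hypothetical example, not from the patch.

// Before: explicit returns.
def clampWithReturn(x: Int, max: Int): Int = {
  if (x > max) {
    return max
  }
  x
}

// After: the if/else is itself an expression whose value is returned.
def clamp(x: Int, max: Int): Int =
  if (x > max) max else x
```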
- Nov 15, 2013
Aaron Davidson authored
I've diff'd this patch against my own -- since they were both created independently, this means that two sets of eyes have gone over all the merge conflicts that were created, so I'm feeling significantly more confident in the resulting PR. @rxin has looked at the changes to the repl and is resoundingly confident that they are correct.
- Nov 12, 2013
Tathagata Das authored
Prashant Sharma authored
- Oct 25, 2013
Patrick Wendell authored
Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.
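As a hedged sketch of the kind of fix described here (the exact artifact coordinates and build file are assumptions, not copied from the patch), excluding a transitive dependency in an sbt build looks like this:

```scala
// Hypothetical sbt snippet: exclude Kafka's transitive jopt dependency
// so it never enters the Spark assembly. Versions are illustrative only.
libraryDependencies += ("org.apache.kafka" % "kafka_2.9.2" % "0.8.0-beta1")
  .exclude("net.sf.jopt-simple", "jopt-simple")
```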
Patrick Wendell authored
Patrick Wendell authored
- Oct 24, 2013
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
Tathagata Das authored
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful:

1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but wants to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call `rdd.coalesce(numSplits, shuffle = true)` - that's super confusing.
2. If a user has input data where the number of partitions is not known. E.g. `sc.textFile("some file").coalesce(50)...` This is semantically vague (am I growing or shrinking this RDD?) and may not work correctly if the base RDD has fewer than 50 partitions.

The new operator forces a shuffle every time, so it will always produce exactly the requested number of partitions. It also throws an exception rather than silently not working if a bad input is passed. I am currently adding streaming tests (this requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.
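A hedged sketch of the difference in practice (the local context and partition counts are placeholders; `repartition` is used with the semantics described above, always shuffling to exactly the requested number of partitions):

```scala
import org.apache.spark.SparkContext

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext("local", "repartition-sketch")
    val rdd = sc.parallelize(1 to 1000, 4) // starts with 4 partitions

    // Old idiom: growing the partition count requires remembering
    // shuffle = true, which the message above calls out as confusing.
    val grownOld = rdd.coalesce(8, shuffle = true)

    // New operator: always shuffles, always yields exactly 8 partitions.
    val grownNew = rdd.repartition(8)

    println(grownOld.partitions.length) // 8
    println(grownNew.partitions.length) // 8
    sc.stop()
  }
}
```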
Tathagata Das authored
- Oct 23, 2013
Tathagata Das authored
Fixed a bug in Java transformWith, added more Java test cases for transform and transformWith, added missing variations of Java join and cogroup, and updated various Scala and Java API docs.
- Oct 21, 2013
Tathagata Das authored
Updated TransformDStream to allow n-ary DStream transform. Added transformWith, leftOuterJoin and rightOuterJoin operations to DStream for Scala and Java APIs. Also added n-ary union and n-ary transform operations to StreamingContext for Scala and Java APIs.
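A hedged Scala sketch of the new operations (the stream sources are placeholders and the exact signatures may differ from the patch; transformWith runs an arbitrary RDD-to-RDD function over the corresponding batches of two streams):

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on older Spark)
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits

object TransformWithSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext("local[2]", "transform-with-sketch", Seconds(1))

    // Placeholder sources: two streams of (key, count) pairs.
    val clicks = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))
    val views  = ssc.socketTextStream("localhost", 9998).map(line => (line, 1))

    // transformWith: join the current batch of `clicks` with the current
    // batch of `views` using an arbitrary RDD-level function.
    val manualJoin = clicks.transformWith(views,
      (c: RDD[(String, Int)], v: RDD[(String, Int)]) => c.join(v))

    // The outer-join sugar described in this commit.
    val left  = clicks.leftOuterJoin(views)
    val right = clicks.rightOuterJoin(views)

    left.print()
    ssc.start()
  }
}
```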
- Oct 19, 2013
Reynold Xin authored
- Oct 17, 2013
Prabeesh K authored
- Oct 13, 2013
Aaron Davidson authored
This is an unfortunately invasive change which converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now:

+ Type safety
+ Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types.
+ Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported. (I'm looking at you, shuffle file consolidation.)
+ It will only get harder to make this change as time goes on.

Since this touches a lot of files, it'd be best to either get this patch in quickly or throw it on the ground to avoid too many secondary merge conflicts.
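A hedged sketch of the pattern described (names modeled on Spark's block IDs; the details are assumptions rather than the patch's exact code):

```scala
// Illustrative sketch: typed block IDs instead of raw strings.
sealed abstract class BlockId {
  def name: String // the old string form, still usable as a storage key
}

case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
  def name = s"rdd_${rddId}_$splitIndex"
}

case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId {
  def name = s"shuffle_${shuffleId}_${mapId}_$reduceId"
}

// Matching on the type replaces fragile string parsing, and a
// Seq[(BlockId, BlockStatus)] is self-describing in signatures.
def isShuffle(id: BlockId): Boolean = id match {
  case _: ShuffleBlockId => true
  case _                 => false
}
```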
- Oct 12, 2013
jerryshao authored
- Oct 06, 2013
Patrick Wendell authored
- Oct 05, 2013
Martin Weindel authored
- Sep 26, 2013
Prashant Sharma authored
- Sep 24, 2013
Patrick Wendell authored
- Sep 21, 2013
Prashant Sharma authored
- Sep 20, 2013
Vadim Chekan authored
- Sep 10, 2013
Prashant Sharma authored
- Sep 01, 2013
Matei Zaharia authored
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
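For illustration, the corresponding import changes in user code would look roughly like this (a hedged sketch; some of these classes are Spark-internal and are shown only to make the mapping concrete):

```scala
// Before this commit, the classes lived at the top level of org.apache.spark:
//   import org.apache.spark.RDD
//   import org.apache.spark.Utils
//   import org.apache.spark.KryoSerializer

// After this commit:
import org.apache.spark.rdd.RDD
import org.apache.spark.util.{ClosureCleaner, SizeEstimator, Utils}
import org.apache.spark.serializer.{JavaSerializer, KryoSerializer}
```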
Matei Zaharia authored
Matei Zaharia authored
- Aug 22, 2013
Prashant Sharma authored