  1. Dec 10, 2013
  2. Dec 07, 2013
  3. Dec 03, 2013
  4. Nov 19, 2013
  5. Nov 15, 2013
    • Various merge corrections · f629ba95
      Aaron Davidson authored
      I've diff'd this patch against my own -- since they were both created
      independently, this means that two sets of eyes have gone over all the
      merge conflicts that were created, so I'm feeling significantly more
      confident in the resulting PR.
      
      @rxin has looked at the changes to the REPL and is resoundingly
      confident that they are correct.
      f629ba95
  6. Nov 12, 2013
  7. Oct 25, 2013
    • Exclude jopt from kafka dependency. · af4a529f
      Patrick Wendell authored
      Kafka uses an older version of jopt that causes bad conflicts with the version
      used by spark-perf. It's not easy to remove this downstream because of the way
      that spark-perf uses Spark (by including a spark assembly as an unmanaged jar).
      This fixes the problem at its source by just never including it.
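An exclusion of this kind might look like the following sbt fragment. This is a sketch only: the Kafka coordinates and the version string are placeholders, not taken from the patch. The one detail that is not an assumption is the published module name of jopt, `net.sf.jopt-simple:jopt-simple`.

```scala
// build.sbt fragment (sketch). The Kafka organization/artifact and the
// version below are hypothetical placeholders, not the values used in the patch.
val kafkaVersion = "0.8.0" // hypothetical

// Pull in Kafka but never bring its transitive jopt-simple along.
libraryDependencies += ("org.apache.kafka" %% "kafka" % kafkaVersion)
  .exclude("net.sf.jopt-simple", "jopt-simple")
```

Excluding the module where the dependency is declared keeps it out of any assembly built from the project, so downstream users of the assembly jar do not have to filter it out themselves.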
      af4a529f
    • Style fixes · ad5f579c
      Patrick Wendell authored
      ad5f579c
    • Spacing fix · e5f6d569
      Patrick Wendell authored
      e5f6d569
  8. Oct 24, 2013
    • Small spacing fix · a351fd4a
      Patrick Wendell authored
      a351fd4a
    • Patrick Wendell's avatar
      31e92b72
    • Some clean-up of tests · 39f6f755
      Patrick Wendell authored
      39f6f755
    • Fixed accidental bug. · e962a6e6
      Tathagata Das authored
      e962a6e6
    • Removing Java for now · 9423532f
      Patrick Wendell authored
      9423532f
    • Adding tests · 05ac9940
      Patrick Wendell authored
      05ac9940
    • Add a `repartition` operator. · 08c1a42d
      Patrick Wendell authored
      This patch adds an operator called repartition with more straightforward
      semantics than the current `coalesce` operator. There are a few use cases
      where this operator is useful:
      
      1. If a user wants to increase the number of partitions in the RDD. This
      is more common now with streaming. E.g. a user is ingesting data on one
      node but they want to add more partitions to ensure parallelism of
      subsequent operations across threads or the cluster.
      
      Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's
      super confusing.
      
      2. If a user has input data where the number of partitions is not known. E.g.
      
      > sc.textFile("some file").coalesce(50)....
      
      This is both semantically vague (am I growing or shrinking this RDD?) and
      may not work correctly if the base RDD has fewer than 50 partitions.
      
      The new operator forces a shuffle every time, so it always produces exactly
      the requested number of partitions. It also throws an exception, rather than
      silently doing nothing, if a bad input is passed.
      
      I am currently adding streaming tests (requires refactoring some of the test
      suite to allow testing at partition granularity), so this is not ready for
      merge yet. But feedback is welcome.
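
The difference between the three calls discussed above can be sketched as follows. This is a minimal illustration, not code from the patch: it assumes spark-core is on the classpath and uses a local master, and the object name `RepartitionSketch` and its helper are hypothetical. Only `coalesce`, `repartition`, `parallelize`, and `partitions` are Spark API calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, for illustration only.
object RepartitionSketch {
  // Builds an RDD with `initial` partitions and asks for `target` partitions
  // in three ways, returning the partition count each approach produces.
  def partitionCounts(sc: SparkContext, initial: Int, target: Int): (Int, Int, Int) = {
    val rdd = sc.parallelize(1 to 1000, initial)
    // Without a shuffle, coalesce can only merge existing partitions.
    val plainCoalesce = rdd.coalesce(target).partitions.length
    // With shuffle = true, coalesce can also grow the partition count.
    val shuffledCoalesce = rdd.coalesce(target, shuffle = true).partitions.length
    // repartition always shuffles, so the intent is explicit.
    val repartitioned = rdd.repartition(target).partitions.length
    (plainCoalesce, shuffledCoalesce, repartitioned)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch.
    val conf = new SparkConf().setMaster("local[2]").setAppName("repartition-sketch")
    val sc = new SparkContext(conf)
    try {
      println(partitionCounts(sc, initial = 4, target = 50))
    } finally {
      sc.stop()
    }
  }
}
```

Starting from 4 partitions and asking for 50, the plain `coalesce` call is expected to stay at 4, while the shuffling variants both yield 50. That silent mismatch is the behavior the commit message describes.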
      08c1a42d
    • Added JavaStreamingContext.transform · bacfe5eb
      Tathagata Das authored
      bacfe5eb
  9. Oct 23, 2013
  10. Oct 21, 2013
  11. Oct 19, 2013
  12. Oct 17, 2013
  13. Oct 16, 2013
  14. Oct 13, 2013
    • Refactor BlockId into an actual type · a3959111
      Aaron Davidson authored
      This is an unfortunately invasive change which converts all of our BlockId
      strings into actual BlockId types. Here are some advantages of doing this now:
      
      + Type safety
      
      + Code clarity - it's now obvious what the key of a shuffle or rdd block is,
        for instance. Additionally, having BlockId appear in tuple/map type signatures
        is a big readability bonus: a Seq[(String, BlockStatus)] is not very clear.
        Further, we can now use more Scala features, like matching on BlockId types.
      
      + Explicit usage - we can now formally tell where various BlockIds are being used
        (without doing string searches); this makes updating current BlockIds a much
        clearer process, and compiler-supported.
        (I'm looking at you, shuffle file consolidation.)
      
      + It will only get harder to make this change as time goes on.
      
      Since this touches a lot of files, it'd be best to either get this patch
      in quickly or throw it on the ground to avoid too many secondary merge conflicts.
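
The advantages listed above can be illustrated with a small, dependency-free sketch. The class names, fields, and string formats here are assumptions made for illustration and are not claimed to match the definitions in the patch.

```scala
// Hypothetical model of typed block ids; not the patch's actual definitions.
object BlockIdSketch {
  // A sealed hierarchy lets the compiler warn about non-exhaustive matches.
  sealed trait BlockId {
    def name: String // the string form that used to be passed around directly
  }

  final case class RddBlockId(rddId: Int, splitIndex: Int) extends BlockId {
    def name: String = s"rdd_${rddId}_${splitIndex}"
  }

  final case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId {
    def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}"
  }

  // Matching on the type replaces ad hoc parsing of id strings.
  def describe(id: BlockId): String = id match {
    case RddBlockId(rdd, split) =>
      s"partition $split of RDD $rdd"
    case ShuffleBlockId(shuffle, map, reduce) =>
      s"shuffle $shuffle output of map task $map for reducer $reduce"
  }

  def main(args: Array[String]): Unit = {
    val ids: Seq[BlockId] = Seq(RddBlockId(3, 0), ShuffleBlockId(1, 4, 2))
    ids.foreach(id => println(s"${id.name}: ${describe(id)}"))
  }
}
```

With this shape, a signature such as `Seq[(BlockId, BlockStatus)]` says what the key is, and adding a new kind of block id produces a compiler warning at each `match` that has not been updated.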
      a3959111
  15. Oct 12, 2013
  16. Oct 06, 2013
  17. Oct 05, 2013
  18. Sep 26, 2013
  19. Sep 24, 2013
  20. Sep 21, 2013
  21. Sep 20, 2013
  22. Sep 10, 2013
  23. Sep 01, 2013
  24. Aug 22, 2013