  1. Aug 02, 2014
      [SPARK-1981] Add AWS Kinesis streaming support · 91f9504e
      Chris Fregly authored
      Author: Chris Fregly <chris@fregly.com>
      
      Closes #1434 from cfregly/master and squashes the following commits:
      
      4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
      0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
      691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
      0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated docs
      e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into the examples/ dir
      d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the KinesisUtils api
      912640c [Chris Fregly] changed the foundKinesis class to be a publicly available class
      db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
      338997e [Chris Fregly] improve build docs for kinesis
      828f8ae [Chris Fregly] more cleanup
      e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      cd68c0d [Chris Fregly] fixed typos and backward compatibility
      d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
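Commit 4774581 renames `retry` to `retryRandom`. That helper lives in the Scala `kinesis-asl` module and is not shown on this page, so the following is only a minimal Java sketch, assuming the name means "retry an action, waiting a random interval between attempts". The class `RetryRandomSketch`, its parameters, and the use of `Callable` are hypothetical.

```java
import java.util.Random;
import java.util.concurrent.Callable;

// Hypothetical sketch of a "retry with random backoff" helper; not Spark's retryRandom.
final class RetryRandomSketch {
    private static final Random RANDOM = new Random();

    /**
     * Runs action; on failure, sleeps a random 1..maxBackoffMillis ms and tries again,
     * up to retriesLeft more times. maxBackoffMillis must be positive.
     */
    static <T> T retryRandom(Callable<T> action, int retriesLeft, int maxBackoffMillis)
            throws Exception {
        try {
            return action.call();
        } catch (Exception e) {
            if (retriesLeft <= 0) {
                throw e; // no retries left: surface the last failure to the caller
            }
            // A random delay keeps many workers from retrying in lockstep.
            Thread.sleep(1 + RANDOM.nextInt(maxBackoffMillis));
            return retryRandom(action, retriesLeft - 1, maxBackoffMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String result = retryRandom(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "checkpointed";
        }, 5, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

Randomizing the delay, rather than using a fixed one, is one common way to avoid many clients hammering the same service at the same instant after a shared failure.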
  2. Aug 01, 2014
  3. Jul 30, 2014
      SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven... · 6ab96a6f
      Sean Owen authored
      SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep
      
      The Maven-based builds in the build matrix have been failing for a few days:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/
      
      On inspection, it looks like the Spark SQL Java tests don't compile:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/244/consoleFull
      
      I confirmed it by repeating the command vs master:
      
      `mvn -Dhadoop.version=1.0.4 -Dlabel=centos -DskipTests clean package`
      
      The problem is that this module doesn't depend on JUnit. In fact, none of the modules do, but `com.novocode:junit-interface` (the SBT-JUnit bridge) pulls it in transitively in most places. However, this module doesn't depend on `com.novocode:junit-interface`.
      
      Adding the `junit:junit` dependency fixes the compile problem. In fact, the other modules with Java tests should probably depend on it explicitly instead of happening to get it via `com.novocode:junit-interface`, since that is a bit SBT/Scala-specific (and I am not even sure it's needed).
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1660 from srowen/SPARK-2749 and squashes the following commits:
      
      858ff7c [Sean Owen] Add explicit junit dep to other modules with Java tests for robustness
      9636794 [Sean Owen] Add junit dep so that Spark SQL Java tests compile
  4. Jul 10, 2014
      [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
      This patch introduces the new way of working while retaining the existing ways of doing things.
      
      For example, the build instruction for YARN in Maven is
      `mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
      In SBT it can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      SBT also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
      72651ca [Prashant Sharma] Addresses code review comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
      65cf06c [Prashant Sharma] Servlet API jars conflict with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
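The SBT invocations above name the same Maven profiles in two syntaxes: a comma-separated `MAVEN_PROFILES` value and Maven-style `-P` flags. As a hypothetical illustration only (this is not the parsing code in Spark's SBT build), both forms can be normalized to plain profile IDs; `ProfileListSketch` and `parseProfiles` are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Spark's build code: normalizes a profile list to profile IDs.
final class ProfileListSketch {
    static List<String> parseProfiles(String raw) {
        List<String> profiles = new ArrayList<>();
        if (raw == null) {
            return profiles; // no profiles requested
        }
        // Accept commas and/or whitespace as separators, e.g. "yarn, hadoop-2.2".
        for (String token : raw.trim().split("[,\\s]+")) {
            // Accept Maven-style "-Pyarn" as well as a bare "yarn".
            String id = token.startsWith("-P") ? token.substring(2) : token;
            if (!id.isEmpty()) {
                profiles.add(id);
            }
        }
        return profiles;
    }

    public static void main(String[] args) {
        System.out.println(parseProfiles("yarn, hadoop-2.2"));    // [yarn, hadoop-2.2]
        System.out.println(parseProfiles("-Pyarn -Phadoop-2.2")); // [yarn, hadoop-2.2]
    }
}
```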
  5. Jun 05, 2014
  6. Jun 04, 2014
      SPARK-1973. Add randomSplit to JavaRDD (with tests, and tidy Java tests) · d341b17c
      Sean Owen authored
      I'd like to use randomSplit through the Java API, and would like to add a convenience wrapper for this method to JavaRDD. This is fairly trivial. (In fact, is the intent that JavaRDD not wrap every RDD method, and that users should sometimes just use JavaRDD.wrapRDD()?)
      
      Along the way, I added tests for it, and also touched up the Java API test style and behavior. This is maybe the more useful part of this small change.
      
      Author: Sean Owen <sowen@cloudera.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Xiangrui Meng <meng@databricks.com>
      
      Closes #919 from srowen/SPARK-1973 and squashes the following commits:
      
      148cb7b [Sean Owen] Some final Java test polish, while we are at it
      1fc3f3e [Xiangrui Meng] more cleaning on Java 8 tests
      9ebc57f [Sean Owen] Use accumulator instead of temp files to test foreach
      5efb0be [Sean Owen] Add Java randomSplit, and unit tests (including for sample)
      5dcc158 [Sean Owen] Simplified Java 8 test with new language features, and fixed the name of MLB's greatest team
      91a1769 [Sean Owen] Touch up minor style issues in existing Java API suite test
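`randomSplit` divides a dataset into pieces whose sizes are roughly proportional to the given weights. The sketch below shows the idea on a plain Java `List`, assuming the common approach of normalizing the weights into cumulative bounds in [0, 1) and assigning each element by one seeded uniform draw. It is not Spark's RDD implementation, and `RandomSplitSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of weighted random splitting over a List; not Spark's RDD code.
final class RandomSplitSketch {
    static <T> List<List<T>> randomSplit(List<T> items, double[] weights, long seed) {
        double total = 0.0;
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        // Normalize the weights into cumulative upper bounds, e.g. {3, 1} -> {0.75, 1.0}.
        double[] upper = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i] / total;
            upper[i] = running;
        }
        List<List<T>> splits = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            splits.add(new ArrayList<>());
        }
        // One seeded uniform draw per element decides its split, so results are repeatable.
        Random rng = new Random(seed);
        for (T item : items) {
            double x = rng.nextDouble();
            int k = 0;
            // The last split also absorbs any floating-point shortfall in its upper bound.
            while (k < upper.length - 1 && x >= upper[k]) {
                k++;
            }
            splits.get(k).add(item);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        List<List<Integer>> parts = randomSplit(data, new double[] {0.7, 0.3}, 42L);
        System.out.println(parts.get(0).size() + " / " + parts.get(1).size());
    }
}
```

Every element lands in exactly one split and a fixed seed gives the same partition each time; the split sizes are only approximately 70/30, which is also what one would expect from a sampling-based split.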
  7. Jun 03, 2014
      [SPARK-1942] Stop clearing spark.driver.port in unit tests · 7782a304
      Syed Hashmi authored
      Stop resetting spark.driver.port in unit tests (Scala, Java, and Python).
      
      Author: Syed Hashmi <shashmi@cloudera.com>
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #943 from syedhashmi/master and squashes the following commits:
      
      885f210 [Syed Hashmi] Removing unnecessary file (created by mergetool)
      b8bd4b5 [Syed Hashmi] Merge remote-tracking branch 'upstream/master'
      b895e59 [Syed Hashmi] Revert "[SPARK-1784] Add a new partitioner"
      57b6587 [Syed Hashmi] Revert "[SPARK-1784] Add a balanced partitioner"
      1574769 [Syed Hashmi] [SPARK-1942] Stop clearing spark.driver.port in unit tests
      4354836 [Syed Hashmi] Revert "SPARK-1686: keep schedule() calling in the main thread"
      fd36542 [Syed Hashmi] [SPARK-1784] Add a balanced partitioner
      6668015 [CodingCat] SPARK-1686: keep schedule() calling in the main thread
      4ca94cc [Syed Hashmi] [SPARK-1784] Add a new partitioner
  8. May 12, 2014
      SPARK-1798. Tests should clean up temp files · 7120a297
      Sean Owen authored
      Three issues related to temp files that tests generate – these should be touched up for hygiene but are not urgent.
      
      Modules have a log4j.properties which directs the unit-test.log output file to a path like `[module]/target/unit-test.log`. But this ends up creating `[module]/[module]/target/unit-test.log` instead of the former.
      
      `mvn clean` does not delete the `work/` directory, in either the parent or the modules. Nor does it delete the `checkpoint/` directory created under the various external modules.
      
      Many tests create a temp directory, which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and trying to call `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method.
      
      _If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #732 from srowen/SPARK-1798 and squashes the following commits:
      
      5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
      b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
      bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
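The cleanup pattern described above (create a temp directory, mark it `deleteOnExit()`, and delete it recursively when the test finishes) can be sketched in plain Java. `TempDirSketch` and `withTempDir` are hypothetical names, not Spark's test utilities; `deleteRecursively` here only mimics the intent of Spark's `Utils.deleteRecursively`.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.function.Consumer;

// Hypothetical test helper: the names are illustrative, not Spark's test utilities.
final class TempDirSketch {
    // Deletes a file or a whole directory tree, similar in intent to Utils.deleteRecursively.
    static void deleteRecursively(File file) throws IOException {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }
        }
        if (file.exists() && !file.delete()) {
            throw new IOException("Failed to delete " + file);
        }
    }

    // Creates a fresh temp dir, hands it to body, and always removes it afterwards.
    static void withTempDir(Consumer<File> body) throws IOException {
        File dir = Files.createTempDirectory("spark-test").toFile();
        // Fallback only: at JVM exit, File.delete() succeeds only on an empty directory.
        dir.deleteOnExit();
        try {
            body.accept(dir);
        } finally {
            deleteRecursively(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        File[] seen = new File[1];
        withTempDir(dir -> {
            seen[0] = dir;
            try {
                Files.write(new File(dir, "part-00000").toPath(),
                        "hello".getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        System.out.println("temp dir still exists: " + seen[0].exists()); // false
    }
}
```

`deleteOnExit()` alone is not enough for a directory that tests write into, because the JVM cannot delete a non-empty directory at exit; the explicit recursive delete in `finally` does the real work. This sketch follows symbolic links, so it should only be pointed at directories the test itself created.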
  9. May 02, 2014
  10. Apr 08, 2014
      Spark 1271: Co-Group and Group-By should pass Iterable[X] · ce8ec545
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #242 from holdenk/spark-1320-cogroupandgroupshouldpassiterator and squashes the following commits:
      
      f289536 [Holden Karau] Fix bad merge, should have been Iterable rather than Iterator
      77048f8 [Holden Karau] Fix merge up to master
      d3fe909 [Holden Karau] use toSeq instead
      7a092a3 [Holden Karau] switch resultitr to resultiterable
      eb06216 [Holden Karau] maybe I should have had a coffee first. use correct import for guava iterables
      c5075aa [Holden Karau] If guava 14 had iterables
      2d06e10 [Holden Karau] Fix Java 8 cogroup tests for the new API
      11e730c [Holden Karau] Fix streaming tests
      66b583d [Holden Karau] Fix the core test suite to compile
      4ed579b [Holden Karau] Refactor from iterator to iterable
      d052c07 [Holden Karau] Python tests now pass with iterator pandas
      3bcd81d [Holden Karau] Revert "Try and make pickling list iterators work"
      cd1e81c [Holden Karau] Try and make pickling list iterators work
      c60233a [Holden Karau] Start investigating moving to iterators for python API like the Java/Scala one. tl;dr: We will have to write our own iterator since the default one doesn't pickle well
      88a5cef [Holden Karau] Fix cogroup test in JavaAPISuite for streaming
      a5ee714 [Holden Karau] oops, was checking wrong iterator
      e687f21 [Holden Karau] Fix groupbykey test in JavaAPISuite of streaming
      ec8cc3e [Holden Karau] Fix test issues!
      4b0eeb9 [Holden Karau] Switch cast in PairDStreamFunctions
      fa395c9 [Holden Karau] Revert "Add a join based on the problem in SVD"
      ec99e32 [Holden Karau] Revert "Revert this but for now put things in list pandas"
      b692868 [Holden Karau] Revert
      7e533f7 [Holden Karau] Fix the bug
      8a5153a [Holden Karau] Revert me, but we have some stuff to debug
      b4e86a9 [Holden Karau] Add a join based on the problem in SVD
      c4510e2 [Holden Karau] Revert this but for now put things in list pandas
      b4e0b1d [Holden Karau] Fix style issues
      71e8b9f [Holden Karau] I really need to stop calling size on iterators, it is the path of sadness.
      b1ae51a [Holden Karau] Fix some of the types in the streaming JavaAPI suite. Probably still needs more work
      37888ec [Holden Karau] core/tests now pass
      249abde [Holden Karau] org.apache.spark.rdd.PairRDDFunctionsSuite passes
      6698186 [Holden Karau] Revert "I think this might be a bad rabbit hole. Started work to make CoGroupedRDD use iterator and then went crazy"
      fe992fe [Holden Karau] hmmm try and fix up basic operation suite
      172705c [Holden Karau] Fix Java API suite
      caafa63 [Holden Karau] I think this might be a bad rabbit hole. Started work to make CoGroupedRDD use iterator and then went crazy
      88b3329 [Holden Karau] Fix groupbykey to actually give back an iterator
      4991af6 [Holden Karau] Fix some tests
      be50246 [Holden Karau] Calling size on an iterator is not so good if we want to use it after
      687ffbc [Holden Karau] This is the it compiles point of replacing Seq with Iterator and JList with JIterator in the groupby and cogroup signatures
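The switch from `Seq`/`Iterator` to `Iterable` in the group-by and cogroup signatures turns on one property, also noted in commit be50246 ("Calling size on an iterator is not so good if we want to use it after"): an `Iterator` can be traversed only once, while an `Iterable` hands out a fresh iterator for each traversal. A small plain-Java illustration, with hypothetical names:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of why "counting" an Iterator consumes it.
final class IterableVsIteratorSketch {
    // Counts the elements left in the iterator, exhausting it in the process.
    static int countRemaining(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    // Counts elements via a fresh iterator, leaving the Iterable reusable.
    static int countAll(Iterable<?> values) {
        int n = 0;
        for (Object ignored : values) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Integer> groupedValues = Arrays.asList(1, 2, 3);
        Iterator<Integer> once = groupedValues.iterator();
        System.out.println(countRemaining(once)); // 3
        System.out.println(countRemaining(once)); // 0: the iterator is exhausted
        System.out.println(countAll(groupedValues)); // 3
        System.out.println(countAll(groupedValues)); // 3: each pass gets a new iterator
    }
}
```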
  11. Mar 26, 2014
      Spark 1095 : Adding explicit return types to all public methods · 3e63d98f
      NirmalReddy authored
      Excluded those that are self-evident and the cases that were discussed on the mailing list.
      
      Author: NirmalReddy <nirmal_reddy2000@yahoo.com>
      Author: NirmalReddy <nirmal.reddy@imaginea.com>
      
      Closes #168 from NirmalReddy/Spark-1095 and squashes the following commits:
      
      ac54b29 [NirmalReddy] import misplaced
      8c5ff3e [NirmalReddy] Changed syntax of unit returning methods
      02d0778 [NirmalReddy] fixed explicit types in all the other packages
      1c17773 [NirmalReddy] fixed explicit types in core package
  12. Mar 11, 2014
      SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues... · 16788a65
      Patrick Wendell authored
      This patch removes Ganglia integration from the default build. It
      allows users willing to link against LGPL code to use Ganglia
      by adding build flags or linking against a new Spark artifact called
      spark-ganglia-lgpl.
      
      This brings Spark in line with the Apache policy on LGPL code
      enumerated here:
      
      https://www.apache.org/legal/3party.html#options-optional
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #108 from pwendell/ganglia and squashes the following commits:
      
      326712a [Patrick Wendell] Responding to review feedback
      5f28ee4 [Patrick Wendell] SPARK-1167: Remove metrics-ganglia from default build due to LGPL issues.
  13. Mar 04, 2014
      [java8API] SPARK-964 Investigate the potential for using JDK 8 lambda... · 181ec503
      Prashant Sharma authored
      [java8API] SPARK-964 Investigate the potential for using JDK 8 lambda expressions for the Java/Scala APIs
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #17 from ScrapCodes/java8-lambdas and squashes the following commits:
      
      95850e6 [Patrick Wendell] Some doc improvements and build changes to the Java 8 patch.
      85a954e [Prashant Sharma] Nit. import orderings.
      673f7ac [Prashant Sharma] Added support for -java-home as well
      80a13e8 [Prashant Sharma] Used fake class tag syntax
      26eb3f6 [Prashant Sharma] Patrick's comments on PR.
      35d8d79 [Prashant Sharma] Specified java 8 building in the docs
      31d4cd6 [Prashant Sharma] Maven build to support -Pjava8-tests flag.
      4ab87d3 [Prashant Sharma] Review feedback on the pr
      c33dc2c [Prashant Sharma] SPARK-964, Java 8 API Support.
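In Spark's Java API, functions such as `org.apache.spark.api.java.function.Function` are single-method interfaces with a `call` method, which is what lets a Java 8 lambda stand in for the anonymous inner classes that were needed before. The sketch below is self-contained and does not depend on Spark: `SparkStyleFunction` and `LambdaSketch.map` are hypothetical stand-ins used only to compare the two styles.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Spark-style Java function interface: a single abstract
// method, so a Java 8 lambda or method reference can implement it.
interface SparkStyleFunction<T, R> extends Serializable {
    R call(T value) throws Exception;
}

final class LambdaSketch {
    // Hypothetical map over a local List, standing in for an RDD transformation.
    static <T, R> List<R> map(List<T> input, SparkStyleFunction<T, R> f) throws Exception {
        List<R> out = new ArrayList<>();
        for (T t : input) {
            out.add(f.call(t));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("a b", "c");
        // Pre-Java 8 style: an anonymous inner class.
        List<Integer> viaAnonymous = map(lines, new SparkStyleFunction<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });
        // Java 8 style: a lambda and a method reference target the same interface.
        List<Integer> viaLambda = map(lines, s -> s.length());
        List<Integer> viaMethodRef = map(lines, String::length);
        System.out.println(viaAnonymous + " " + viaLambda + " " + viaMethodRef); // [3, 1] [3, 1] [3, 1]
    }
}
```

The interface extends `Serializable` because, in a cluster setting, function objects have to be shipped to executors; a lambda whose target type is serializable is itself serializable.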