  1. Jan 08, 2014
    • Patrick Wendell's avatar
      Merge pull request #358 from pwendell/add-cdh · bdeaeafb
      Patrick Wendell authored
      Add CDH Repository to Maven Build
      
      At some point this was removed from the Maven build... so I'm adding it back. It's needed for the Hadoop2 tests we run on Jenkins and it's also included in the SBT build.
      bdeaeafb
    • Reynold Xin's avatar
      Merge pull request #356 from hsaputra/remove_deprecated_cleanup_method · 5cae05f5
      Reynold Xin authored
      Remove calls to deprecated mapred's OutputCommitter.cleanupJob
      
      Since Hadoop 1.0.4, the mapred OutputCommitter.commitJob does the job cleanup itself via a call to OutputCommitter.cleanupJob.
      
      Remove SparkHadoopWriter.cleanup since it is used only by PairRDDFunctions.
      
      In fact the implementation of mapred OutputCommitter.commitJob looks like this:
      
        public void commitJob(JobContext jobContext) throws IOException {
          cleanupJob(jobContext);
        }
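
      A minimal sketch of why a separate cleanup call becomes redundant. The classes below are hypothetical stand-ins, not the real Hadoop types; only the delegation from commitJob to cleanupJob is taken from the snippet above, and the recorded call list exists purely for illustration (the real method also declares `throws IOException`, omitted here).

```java
import java.util.ArrayList;
import java.util.List;

final class CommitJobSketch {
    // Hypothetical stand-in for org.apache.hadoop.mapred.JobContext.
    static class JobContext {}

    // Hypothetical stand-in for the mapred OutputCommitter; it only records calls.
    static class OutputCommitter {
        final List<String> calls = new ArrayList<>();

        @Deprecated
        public void cleanupJob(JobContext jobContext) {
            calls.add("cleanupJob");
        }

        // Mirrors the default shown above: committing delegates to cleanupJob.
        public void commitJob(JobContext jobContext) {
            calls.add("commitJob");
            cleanupJob(jobContext);
        }
    }

    // Caller that only commits: cleanup still runs exactly once.
    static List<String> commitOnly() {
        OutputCommitter committer = new OutputCommitter();
        committer.commitJob(new JobContext());
        return committer.calls;
    }

    // Caller that commits and then cleans up explicitly: cleanup runs twice.
    static List<String> commitThenCleanup() {
        OutputCommitter committer = new OutputCommitter();
        JobContext context = new JobContext();
        committer.commitJob(context);
        committer.cleanupJob(context);
        return committer.calls;
    }

    public static void main(String[] args) {
        System.out.println(commitOnly());        // [commitJob, cleanupJob]
        System.out.println(commitThenCleanup()); // [commitJob, cleanupJob, cleanupJob]
    }
}
```

      With a committer that keeps this default behaviour, the explicit cleanup call only repeats work commitJob has already done. A committer that overrides commitJob without delegating would behave differently; that case is not covered by this sketch.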
      5cae05f5
    • Thomas Graves's avatar
      Merge pull request #345 from colorant/yarn · 6eef78d7
      Thomas Graves authored
      support distributing extra files to worker for yarn client mode
      
      So that users don't need to package all dependencies into one assembly jar as the Spark app jar.
      6eef78d7
    • Patrick Wendell's avatar
      Add CDH Repository to Maven Build · 3209a86f
      Patrick Wendell authored
      3209a86f
    • Henry Saputra's avatar
      Remove calls to deprecated mapred's OutputCommitter.cleanupJob because since Hadoop 1.0.4 · 4517326e
      Henry Saputra authored
      the mapred OutputCommitter.commitJob should do the job cleanup.
      
      In fact the implementation of mapred OutputCommitter.commitJob looks like this:
      
        public void commitJob(JobContext jobContext) throws IOException {
          cleanupJob(jobContext);
        }
      
      (The jobContext input argument is of type org.apache.hadoop.mapred.JobContext.)
      4517326e
    • Patrick Wendell's avatar
      Merge pull request #322 from falaki/MLLibDocumentationImprovement · bb6a39a6
      Patrick Wendell authored
      SPARK-1009 Updated MLlib docs to show how to use it in Python
      
      In addition, added detailed examples for regression, clustering and recommendation algorithms in a separate Scala section. Fixed a few minor issues with the existing documentation.
      bb6a39a6
    • Patrick Wendell's avatar
      Merge pull request #355 from ScrapCodes/patch-1 · cb1b9273
      Patrick Wendell authored
      Update README.md
      
      The link does not work otherwise.
      cb1b9273
    • Patrick Wendell's avatar
      Merge pull request #313 from tdas/project-refactor · c0f0155e
      Patrick Wendell authored
      Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc.
      
      At a high level, the changes are as follows.
      
      1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules.
      
      2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`.
      
      3. Each external project has its own Scala and Java unit tests. Note that the unit tests of each external library use classes from the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see the comment inside the pom.xml for more information).
      
      4. Jars of the external projects have been added to examples project but not to the assembly project.
      
      5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
      c0f0155e
    • Prashant Sharma's avatar
      Update README.md · d1f28057
      Prashant Sharma authored
      The link does not work otherwise.
      d1f28057
  2. Jan 07, 2014