- Jan 10, 2014
-
-
Patrick Wendell authored
Make DEBUG-level logs consumable. Removes two things that caused issues with the debug logs: (a) Internal polling in the DAGScheduler was polluting the logs. (b) The Scala REPL logs were really noisy.
-
Patrick Wendell authored
Removes two things that caused issues with the debug logs: (a) Internal polling in the DAGScheduler was polluting the logs. (b) The Scala REPL logs were really noisy.
-
Matei Zaharia authored
Fix bug added when we changed AppDescription.maxCores to an Option. The Scala compiler warned about this -- we were comparing an Option against an integer now.
-
Patrick Wendell authored
Enable shuffle consolidation by default. Bump this to being enabled for 0.9.0.
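As a hedged illustration of what this toggle means for users: assuming the flag involved is `spark.shuffle.consolidateFiles` (the commit message does not name the property), opting back out of the new default would look roughly like this:
```
import org.apache.spark.SparkConf

// Sketch only: the property name spark.shuffle.consolidateFiles is an assumption,
// not quoted from the commit. Setting it to "false" would opt out of consolidation.
val conf = new SparkConf()
  .setAppName("shuffle-consolidation-example")
  .set("spark.shuffle.consolidateFiles", "false")
```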
-
Patrick Wendell authored
Bump this to being enabled for 0.9.0.
-
Patrick Wendell authored
Set default logging to WARN for Spark streaming examples. This programmatically sets the log level to WARN by default for streaming tests. If the user has already specified a log4j.properties file, the user's file will take precedence over this default.
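A minimal sketch of the pattern described here, assuming log4j 1.x; the helper object below is illustrative, not the actual class touched by this commit:
```
import org.apache.log4j.{Level, Logger}

object ExampleLogging {
  // Default to WARN only if the user has not already configured log4j
  // (i.e. the root logger has no appenders from a user-supplied log4j.properties).
  def setDefaultLogLevelToWarn(): Unit = {
    val log4jConfigured = Logger.getRootLogger.getAllAppenders.hasMoreElements
    if (!log4jConfigured) {
      Logger.getRootLogger.setLevel(Level.WARN)
    }
  }
}
```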
-
- Jan 09, 2014
-
-
Patrick Wendell authored
-
Patrick Wendell authored
Simplify and fix pyspark script. This patch removes compatibility for IPython < 1.0 but fixes the launch script and makes it much simpler. I tested this using the three commands in the PySpark documentation page:
1. IPYTHON=1 ./pyspark
2. IPYTHON_OPTS="notebook" ./pyspark
3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark
There are two changes:
- We rely on the PYTHONSTARTUP env var to start PySpark.
- Removed the quotes around $IPYTHON_OPTS... having quotes gloms them together as a single argument passed to `exec`, which seemed to cause IPython to fail (it instead expects them as multiple arguments).
-
Reynold Xin authored
Add some missing Java API methods. These are primarily for setting job groups, canceling jobs, and setting names on RDDs. Seemed like useful stuff to expose in Java.
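For context, a hedged sketch of what these operations look like from the Scala side (this commit adds the Java-facing equivalents; the exact Java signatures are not quoted in the message):
```
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("api-example"))

// Group subsequent jobs so they can be identified and cancelled together.
sc.setJobGroup("etl-jobs", "nightly ETL run")

// Name an RDD so it is easier to spot in the web UI.
val data = sc.parallelize(1 to 100).setName("input-numbers")
data.count()

// Cancel everything submitted under the group.
sc.cancelJobGroup("etl-jobs")
```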
-
Reynold Xin authored
Bug fixes for updating the RDD block's memory and disk usage information. From the code context, we can see that memSize and diskSize here are both always equal to the size of the block; they are actually never zero. Thus, the logic here is wrong for recording the block usage in BlockStatus, especially for blocks that are dropped from memory to make space for new input RDD blocks. I have tested that this causes the storage metrics shown on the Storage web page to be wrong and misleading. With this patch, the metrics are correct. Finally, Merry Christmas, guys :)
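A hedged sketch of the accounting issue being fixed, using an illustrative case class rather than Spark's actual BlockStatus internals:
```
// Illustrative only: a block dropped from memory to disk should report zero bytes
// in memory rather than the full block size in both fields.
case class ExampleBlockStatus(memSize: Long, diskSize: Long)

def afterDropToDisk(blockSizeOnDisk: Long): ExampleBlockStatus =
  ExampleBlockStatus(memSize = 0L, diskSize = blockSizeOnDisk)
```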
-
Patrick Wendell authored
-
Patrick Wendell authored
SPARK-998: Support Launching Driver Inside of Standalone Mode
[NOTE: I need to bring the tests up to date with new changes, so for now they will fail]
This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs, which is useful for long-running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, failure of a master node (provided a multi-master setup), and even failures of the application itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI.
There are a few small TODOs here, but the code is generally feature-complete. They are:
- Bring tests up to date and add test coverage
- Restarting on failure should be optional and maybe off by default.
- See if we can re-use Akka connections to facilitate clients behind a firewall
A sensible place to start for review would be the `DriverClient` class, which gives users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manager, exposing it in the UI, and dealing correctly with various types of failures.
Instructions to test locally:
- `sbt/sbt assembly/assembly examples/assembly`
- start a local version of the standalone cluster manager
```
./spark-class org.apache.spark.deploy.client.DriverClient \
  -j -Dspark.test.property=something \
  -e SPARK_TEST_KEY=SOMEVALUE \
  launch spark://10.99.1.14:7077 \
  ../path-to-examples-assembly-jar \
  org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
```
- Go in the UI and make sure it started correctly, look at the output, etc.
- Kill workers, the driver program, masters, etc.
-
Matei Zaharia authored
The Scala compiler warned about this -- we were comparing an Option against an integer now.
-
Matei Zaharia authored
-
Patrick Wendell authored
Send logs to stderr by default (instead of stdout).
-
Patrick Wendell authored
-
Matei Zaharia authored
Use typed getters for configuration settings. This improves some of the code style after SPARK-544.
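A brief sketch of the difference, assuming the SparkConf typed getters (`getInt`, `getBoolean`, and friends) that this cleanup moves to:
```
import org.apache.spark.SparkConf

val conf = new SparkConf()

// Before: stringly-typed reads with manual parsing.
val coresOld = conf.get("spark.cores.max", "4").toInt

// After: typed getters with a default value.
val cores = conf.getInt("spark.cores.max", 4)
val speculation = conf.getBoolean("spark.speculation", false)
```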
-
Patrick Wendell authored
-
Patrick Wendell authored
This programmatically sets the log level to WARN by default for streaming tests. If the user has already specified a log4j.properties file, the user's file will take precedence over this default.
-
Reynold Xin authored
Minor style cleanup. Mostly indentation and line-width changes. Focused on the few important files, since they are the ones that new contributors usually read first.
-
Reynold Xin authored
Don't delegate to the user's `sbt`. This changes our `sbt/sbt` script to not delegate to the user's `sbt` even if it is present. If users already have sbt installed and they want to use their own sbt, we'd expect them to just call sbt directly from within Spark. We no longer set any environment variables or anything from this script, so they should just launch sbt directly on their own. There are a number of hard-to-debug issues which can come from the current approach. One is if the user is unaware of an existing sbt installation and now, without explanation, their build breaks because they haven't configured options correctly (such as permgen size) within their sbt (reported by @patmcdonough). Another is if the user has a much older version of sbt hanging around, in which case some of the older versions don't actually work well when newer versions of sbt are specified in the build file (reported by @marmbrus). A third is if the user has done some other modification to their sbt script, such as setting it to delegate to sbt/sbt in Spark, and this causes that to break (also reported by @marmbrus). So to keep things simple, let's just avoid this path and remove it. Any user who already has sbt and wants to build Spark with it should be able to understand easily how to do it.
-
Reynold Xin authored
-
Patrick Wendell authored
-
Matei Zaharia authored
-
Patrick Wendell authored
This changes our `sbt/sbt` script to not delegate to the user's `sbt` even if it is present. If users already have sbt installed and they want to use their own sbt, we'd expect them to just call sbt directly from within Spark. We no longer set any environment variables or anything from this script, so they should just launch sbt directly on their own. There are a number of hard-to-debug issues which can come from the current approach. One is if the user is unaware of an existing sbt installation and now, without explanation, their build breaks because they haven't configured options correctly (such as permgen size) within their sbt. Another is if the user has a much older version of sbt hanging around, in which case some of the older versions don't actually work well when newer versions of sbt are specified in the build file (reported by @marmbrus). A third is if the user has done some other modification to their sbt script, such as setting it to delegate to sbt/sbt in Spark, and this causes that to break (also reported by @marmbrus). So to keep things simple, let's just avoid this path and remove it. Any user who already has sbt and wants to build Spark with it should be able to understand easily how to do it.
-
Patrick Wendell authored
Fixing config option "retained_stages" => "retainedStages". This is a very esoteric option and it's out of sync with the style we use. So it seems fitting to fix it for 0.9.0.
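Assuming the option lives under the `spark.ui` namespace (the full key is not given in the message), usage with the corrected camelCase name would look roughly like:
```
import org.apache.spark.SparkConf

// Assumed key: spark.ui.retainedStages after this rename.
val conf = new SparkConf().set("spark.ui.retainedStages", "500")
```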
-
- Jan 08, 2014
-
-
Patrick Wendell authored
This is a very esoteric option and it's out of sync with the style we use. So it seems fitting to fix it for 0.9.0.
-
Patrick Wendell authored
-
Reynold Xin authored
-
Reynold Xin authored
Fix make-distribution.sh "show version: command not found" error.
-
Reynold Xin authored
Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID, to make it clear which param is being set.
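As a hedged illustration of the style change, with an assumed signature (not copied from the source):
```
// Assumed shape of the helper; the real signature may differ.
def newTaskAttemptID(jtIdentifier: String, jobId: Int, isMap: Boolean,
                     taskId: Int, attemptId: Int): String =
  s"attempt_${jtIdentifier}_${jobId}_${if (isMap) "m" else "r"}_${taskId}_$attemptId"

// Positional boolean: unclear at the call site what `false` means.
newTaskAttemptID("201401081200", 0, false, 3, 0)

// Named boolean: self-documenting, which is what this change does.
newTaskAttemptID("201401081200", 0, isMap = false, taskId = 3, attemptId = 0)
```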
-
Patrick Wendell authored
Add CDH Repository to Maven Build. At some point this was removed from the Maven build... so I'm adding it back. It's needed for the Hadoop2 tests we run on Jenkins, and it's also included in the SBT build.
-
Reynold Xin authored
Remove calls to deprecated mapred OutputCommitter.cleanupJob. Since Hadoop 1.0.4, the mapred OutputCommitter.commitJob does the cleanup itself via a call to OutputCommitter.cleanupJob. Also remove SparkHadoopWriter.cleanup, since it is used only by PairRDDFunctions. In fact, the implementation of mapred OutputCommitter.commitJob looks like this: `public void commitJob(JobContext jobContext) throws IOException { cleanupJob(jobContext); }`
-
walker authored
-
liguoqiang authored
-
Thomas Graves authored
Support distributing extra files to workers in YARN client mode, so that the user doesn't need to package every dependency into one assembly jar as the Spark app jar.
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
Conflicts:
  core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
  pom.xml
-
Henry Saputra authored
-