- Jan 21, 2014
-
-
Tathagata Das authored
-
- Jan 20, 2014
-
-
Tathagata Das authored
-
- Jan 19, 2014
-
-
Patrick Wendell authored
Only log error on missing jar to allow spark examples to jar. Right now, to run the Spark examples on YARN you have to use the --addJars option and put the jar in HDFS. So that the user doesn't have to specify the --addJars option, change it to simply log an error instead of throwing.
-
Patrick Wendell authored
Updated Java API docs for streaming, along with very minor changes in the code examples. Docs updated for: Scala: StreamingContext, DStream, PairDStreamFunctions; Java: JavaStreamingContext, JavaDStream, JavaPairDStream. Examples updated: JavaQueueStream: do not use deprecated method; ActorWordCount: use the public interface the right way.
-
Thomas Graves authored
-
Thomas Graves authored
-
- Jan 18, 2014
-
-
Patrick Wendell authored
Correct L2 regularized weight update with canonical form. Per thread on the user@ mailing list, and comments from Ameet, I believe the weight update for L2 regularization needs to be corrected. See http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3CCAH3_EVMetuQuhj3__NdUniDLc4P-FMmmrmxw9TS14or8nT4BNQ%40mail.gmail.com%3E
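A minimal sketch of the usual textbook form of a gradient step with L2 regularization, in plain Python rather than MLlib code. It assumes the objective is loss(w) + (reg_param / 2) * ||w||^2, so the penalty contributes reg_param * w to the gradient. The function and parameter names are hypothetical and are not taken from the commit.

```python
def l2_regularized_step(weights, gradient, step_size, reg_param):
    """One gradient step on loss(w) + (reg_param / 2) * ||w||^2.

    Assumed form (not copied from the commit):
        w' = w - step_size * (gradient + reg_param * w)
           = (1 - step_size * reg_param) * w - step_size * gradient
    """
    # The L2 penalty shrinks every weight by the same factor.
    shrink = 1.0 - step_size * reg_param
    return [shrink * w - step_size * g for w, g in zip(weights, gradient)]
```

With `reg_param` set to 0 this reduces to a plain gradient step, which is a quick sanity check on the form.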
-
Patrick Wendell authored
Minor API usability changes: expose checkpoint directory (since it is autogenerated now); null check for jars; expose SparkHadoopUtil, so that configuration creation is abstracted even from user code to avoid duplication of functionality already in Spark.
-
Patrick Wendell authored
Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+). We disabled these earlier because Jenkins didn't have these versions.
-
Patrick Wendell authored
Remove Typesafe Config usage and conf files to fix nested property names. With Typesafe Config we had the subtle problem of no longer allowing nested property names, which are used for a few of our properties: http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html This PR is for branch 0.9 but should be added into master too. (cherry picked from commit 34e911ce) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
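To illustrate why nested property names clash with a tree-structured config, here is a small plain-Python sketch. It is not Typesafe Config itself. The helpers `set_nested` and `nested_conflict` are hypothetical, and the two keys in the usage note are only an example of one property name being a prefix of another.

```python
def set_nested(tree, dotted_key, value):
    """Store value in a nested dict, treating each dot as a tree level."""
    parts = dotted_key.split(".")
    node = tree
    for part in parts[:-1]:
        child = node.setdefault(part, {})
        if not isinstance(child, dict):
            # The prefix is already a leaf value, so it cannot also be a parent.
            raise ValueError(f"{part!r} already holds a value")
        node = child
    leaf = parts[-1]
    if isinstance(node.get(leaf), dict):
        # The key is already a parent of other keys, so it cannot hold a value.
        raise ValueError(f"{leaf!r} already has nested keys")
    node[leaf] = value


def nested_conflict(keys):
    """Return True if the keys cannot all live in one nested tree."""
    tree = {}
    try:
        for key in keys:
            set_nested(tree, key, "x")
    except ValueError:
        return True
    return False
```

For example, `nested_conflict(["spark.speculation", "spark.speculation.interval"])` reports a conflict, while a flat mapping keyed by the full strings holds both names without trouble.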
-
Patrick Wendell authored
Use renamed shuffle spill config in CoGroupedRDD.scala. This one got missed when it was renamed.
-
Patrick Wendell authored
-
Sean Owen authored
-
- Jan 17, 2014
-
-
Mridul Muralidharan authored
-
Patrick Wendell authored
Fixed Windows spark shell launch script error. JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
-
Patrick Wendell authored
Clone records java api
-
- Jan 16, 2014
-
-
Prashant Sharma authored
-
Tathagata Das authored
-
Mridul Muralidharan authored
-
Mridul Muralidharan authored
-
Qiuzhuang Lian authored
JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
-
Reynold Xin authored
Fail rather than hanging if a task crashes the JVM. Prior to this commit, if a task crashes the JVM, the task (and all other tasks running on that executor) is marked as KILLED rather than FAILED. As a result, the TaskSetManager will retry the task indefinitely rather than failing the job after maxFailures. Eventually, this makes the job hang, because the Standalone Scheduler removes the application after 10 workers have failed, and then the app is left in a state where it's disconnected from the master and waiting to reconnect. This commit fixes that problem by marking tasks as FAILED rather than KILLED when an executor is lost. The downside of this commit is that if task A fails because another task running on the same executor caused the JVM to crash, the failure will incorrectly be counted as a failure of task A. This should not be an issue because we typically set maxFailures to 3, and it is unlikely that a task will be co-located with a JVM-crashing task multiple times.
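The retry behaviour described above can be sketched as a toy model in plain Python. This is not the TaskSetManager code. The function name, the outcome strings, and the return values are hypothetical; the only thing taken from the message is that FAILED attempts count toward maxFailures while KILLED attempts do not.

```python
def run_with_retries(outcomes, max_failures=3):
    """Replay a sequence of attempt outcomes for a single task.

    Only FAILED attempts count toward max_failures; KILLED attempts
    are retried without being counted (toy model, see lead-in).
    """
    failures = 0
    attempts = 0
    for outcome in outcomes:
        attempts += 1
        if outcome == "SUCCESS":
            return ("succeeded", attempts)
        if outcome == "FAILED":
            failures += 1
            if failures >= max_failures:
                return ("job failed", attempts)
        # Any other outcome (e.g. KILLED) falls through to another retry.
    return ("still retrying", attempts)
```

In this model, ten KILLED attempts in a row leave the task still retrying, whereas the same ten attempts marked FAILED abort the job on the third one.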
-
Kay Ousterhout authored
-
- Jan 15, 2014
-
-
Reynold Xin authored
Code clean up for mllib: removed unnecessary parentheses; removed unused imports; simplified `filter...size()` to `count ...`; removed obsolete parameters' comments.
-
Reynold Xin authored
SPARK-1024 Remove "-XX:+UseCompressedStrings" option from tuning guide. Remove the "-XX:+UseCompressedStrings" option from the tuning guide since JDK 7 no longer supports it.
-
Kay Ousterhout authored
Prior to this commit, if a task crashes the JVM, the task (and all other tasks running on that executor) is marked as KILLED rather than FAILED. As a result, the TaskSetManager will retry the task indefinitely rather than failing the job after maxFailures. This commit fixes that problem by marking tasks as FAILED rather than KILLED when an executor is lost. The downside of this commit is that if task A fails because another task running on the same executor caused the JVM to crash, the failure will incorrectly be counted as a failure of task A. This should not be an issue because we typically set maxFailures to 3, and it is unlikely that a task will be co-located with a JVM-crashing task multiple times.
-
Patrick Wendell authored
Clarify that Python 2.7 is only needed for MLlib
-
Matei Zaharia authored
-
Patrick Wendell authored
Workers should use working directory as spark home if it's not specified. If users don't set SPARK_HOME in their environment file when launching an application, the standalone cluster should default to the spark home of the worker.
-
Patrick Wendell authored
Made some classes private[streaming] and deprecated a method in JavaStreamingContext. Classes `RawTextHelper`, `RawTextSender` and `RateLimitedOutputStream` are not useful in the streaming API. They are not used by the core functionality and were only there as support classes for an obscure example. One of the classes, RawTextSender, has a main function which can be executed using bin/spark-class even if it is made private[streaming]. In future, I will probably completely remove these classes. For the time being, I am just converting them to private[streaming]. Accessing the underlying JavaSparkContext in JavaStreamingContext was through `JavaStreamingContext.sc`. This is deprecated, and the preferred method is `JavaStreamingContext.sparkContext`, to keep it consistent with `StreamingContext.sparkContext`.
-
Tathagata Das authored
-
Patrick Wendell authored
GraphX shouldn't list Spark as provided. I noticed this when building an application against GraphX to audit the released artifacts.
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
Updated Debian packaging
-
Thomas Graves authored
More yarn code refactor. Try to retrieve common code in yarn alpha/stable for client and workerRunnable to reduce duplicated code, by putting it into a trait in the common dir and extending that trait. The same could be done for the remaining files in alpha/stable, but the remaining files have much more overlapping code with different API calls here and there within functions and will need a much closer review; also, it might divide functions into pieces that are too small and trivial, and thus might not be worth doing this way. So just do it for these two files first.
-
CrazyJvm authored
remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
-
Reynold Xin authored
Rename VertexID -> VertexId in GraphX
-
Mridul Muralidharan authored
Expose method and class, so that we can use it from user code (particularly since checkpoint directory is autogenerated now).
-
Patrick Wendell authored
Fixed the flaky tests by making SparkConf not serializable. SparkConf was being serialized with CoGroupedRDD and Aggregator, which somehow caused OptionalJavaException while being deserialized as part of a ShuffleMapTask. SparkConf should not even be serializable (according to conversation with Matei). This change fixes that. @mateiz @pwendell
-