  1. Jan 23, 2014
  2. Jan 22, 2014
    • Merge pull request #496 from pwendell/master · a1cd1851
      Patrick Wendell authored
      Fix bug in worker clean-up in UI
      
      Introduced in d5a96fec (/cc @aarondav).
      
      This should be picked into 0.8 and 0.9 as well. The bug causes old (zombie) workers on a node to not disappear immediately from the UI when a new one registers.
      a1cd1851
    • Merge pull request #447 from CodingCat/SPARK-1027 · 034dce2a
      Patrick Wendell authored
      fix for SPARK-1027
      
      Fix for SPARK-1027 (https://spark-project.atlassian.net/browse/SPARK-1027)
      
      FIXES
      
      1. Change sparkHome from String to Option[String] in ApplicationDesc (see the sketch after this list).
      
      2. Remove the sparkHome parameter from the LaunchExecutor message.
      
      3. Adjust the affected files accordingly.
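      A minimal sketch of the shape of this change (hypothetical, simplified names; the real class, abbreviated ApplicationDesc above, lives in Spark's deploy module):
      
      ```scala
      // Hypothetical, simplified sketch of SPARK-1027: sparkHome becomes optional
      // in the application description, and the worker falls back to its own
      // installation when the application does not supply one, so the
      // LaunchExecutor message no longer needs to carry a sparkHome parameter.
      case class ApplicationDescription(
          name: String,
          sparkHome: Option[String])  // was: sparkHome: String
      
      class Worker(localSparkHome: String) {
        def launchExecutor(app: ApplicationDescription): Unit = {
          val effectiveHome = app.sparkHome.getOrElse(localSparkHome)
          println(s"Launching executor for ${app.name} from $effectiveHome")
        }
      }
      ```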
      034dce2a
    • Fix bug in worker clean-up in UI · 62855131
      Patrick Wendell authored
      Introduced in d5a96fec. This should be picked into 0.8 and 0.9 as well.
      62855131
    • refactor sparkHome to val · 2b3c4614
      CodingCat authored
      clean code
      2b3c4614
    • Merge pull request #495 from srowen/GraphXCommonsMathDependency · 3184facd
      Patrick Wendell authored
      Fix graphx Commons Math dependency
      
      `graphx` depends on Commons Math (2.x) in `SVDPlusPlus.scala`, but the module doesn't declare the dependency. It happens to work because the artifact is pulled in transitively by the Hadoop artifacts; as of a month or so ago that is no longer true, and building against recent Hadoop fails. (That's how we noticed.)
      
      The simple fix is to declare the dependency, as it should be. It's also worth noting that `commons-math` is the older 2.x line, while newer 3.x releases live in `commons-math3`: a drop-in replacement, but with a different artifact and package name. Switching this sole usage to `commons-math3` works, tests pass, and that isn't surprising, so it's probably also worth changing. (A comment in some test code also references `commons-math3`, FWIW.)
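      For illustration, the sbt side of such a change might look like the following sketch (the module wiring and version number are illustrative, not the exact build change):
      
      ```scala
      // Hypothetical excerpt from an sbt build definition (e.g. project/SparkBuild.scala):
      // declare Commons Math explicitly for the graphx module instead of relying
      // on Hadoop to provide it transitively.
      import sbt._
      import sbt.Keys._
      
      lazy val graphxSettings: Seq[Setting[_]] = Seq(
        libraryDependencies += "org.apache.commons" % "commons-math3" % "3.2"
      )
      ```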
      
      It does raise another question, though: `mllib` appears to use the `jblas` `DoubleMatrix` for general-purpose vector/matrix work. Should `graphx` really use Commons Math for this? That's beyond the tiny scope here, but worth asking.
      3184facd
    • Sean Owen authored · 4476398f
    • Merge pull request #492 from skicavs/master · a1238bb5
      Patrick Wendell authored
      fixed job name and usage information for the JavaSparkPi example
      a1238bb5
    • Depend on Commons Math explicitly instead of accidentally getting it from... · fd0c5b8c
      Sean Owen authored
      Depend on Commons Math explicitly instead of getting it accidentally from Hadoop (which stops working in 2.2.x), and also use the newer commons-math3
      fd0c5b8c
    • Merge pull request #478 from sryza/sandy-spark-1033 · 576c4a4c
      Patrick Wendell authored
      SPARK-1033. Ask for cores in Yarn container requests
      
      Tested on a pseudo-distributed cluster against the Fair Scheduler and observed a worker taking more than a single core.
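      For context, a hedged sketch of what asking for cores looks like against the Hadoop 2.2-era YARN client API (names and values here are illustrative, not the exact patch):
      
      ```scala
      // Hypothetical sketch: request both memory and virtual cores in the
      // container capability, instead of memory alone.
      import org.apache.hadoop.yarn.api.records.{Priority, Resource}
      import org.apache.hadoop.yarn.client.api.AMRMClient
      
      def requestExecutor(amClient: AMRMClient[AMRMClient.ContainerRequest],
                          memoryMb: Int, cores: Int): Unit = {
        val capability = Resource.newInstance(memoryMb, cores)  // memory + vCores
        val priority = Priority.newInstance(1)
        amClient.addContainerRequest(
          new AMRMClient.ContainerRequest(capability, null, null, priority))
      }
      ```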
      576c4a4c
    • Merge pull request #493 from kayousterhout/double_add · 5bcfd798
      Matei Zaharia authored
      Fixed bug where task set managers are added to queue twice
      
      @mateiz can you verify that this is a bug and wasn't intentional? (https://github.com/apache/incubator-spark/commit/90a04dab8d9a2a9a372cea7cdf46cc0fd0f2f76c#diff-7fa4f84a961750c374f2120ca70e96edR551)
      
      This bug leads to a small performance hit because task set managers will get offered each rejected resource offer twice, but doesn't lead to any incorrect functionality.
      
      Thanks to @hdc1112 for pointing this out.
      5bcfd798
    • Merge pull request #315 from rezazadeh/sparsesvd · d009b17d
      Matei Zaharia authored
      Sparse SVD
      
      # Singular Value Decomposition
      Given an *m x n* matrix *A*, compute matrices *U, S, V* such that
      
      *A = U * S * V^T*
      
      There is no restriction on m, but we require n^2 doubles to fit in memory.
      Further, n should be less than m.
      
      The decomposition is computed by first forming *A^T A = V S^2 V^T* and computing its SVD locally (since the *n x n* matrix is small), from which we recover *S* and *V*. Then we compute *U* via a simple matrix multiplication: *U = A * V * S^-1*.
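      For intuition, here is a hedged, local, dense sketch of that procedure using jblas (`localSVD` is an illustrative name; it assumes *A* has full column rank, and the real implementation distributes the Gramian computation over RDDs):
      
      ```scala
      // Hypothetical dense illustration of the Gramian trick described above.
      import org.jblas.{DoubleMatrix, Eigen, MatrixFunctions}
      
      def localSVD(a: DoubleMatrix): (DoubleMatrix, DoubleMatrix, DoubleMatrix) = {
        val gram = a.transpose().mmul(a)              // A^T A = V S^2 V^T  (n x n)
        val Array(v, lambdas) = Eigen.symmetricEigenvectors(gram)
        val s = MatrixFunctions.sqrt(lambdas.diag())  // singular values = sqrt(eigenvalues)
        val u = a.mmul(v).mmul(DoubleMatrix.diag(s.rdiv(1.0)))  // U = A V S^-1
        (u, DoubleMatrix.diag(s), v)
      }
      ```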
      
      Only the singular vectors associated with the largest k singular values are recovered. If there are k such values, then the dimensions of the return will be:
      
      * *S* is *k x k* and diagonal, holding the singular values on its diagonal.
      * *U* is *m x k* and satisfies *U^T * U = eye(k)*.
      * *V* is *n x k* and satisfies *V^T * V = eye(k)*.
      
      All input and output is expected in sparse matrix format, 0-indexed, as RDDs of tuples of the form ((i, j), value).
      
      # Testing
      Tests included. They test:
      - Decomposition promise (A = USV^T)
      - For small matrices, output is compared to that of jblas
      - Rank 1 matrix test included
      - Full Rank matrix test included
      - Middle-rank matrix forced via k included
      
      # Example Usage
      
      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.mllib.linalg.SVD
      import org.apache.spark.mllib.linalg.SparseMatrix
      import org.apache.spark.mllib.linalg.MatrixEntry
      
      val sc = new SparkContext("local", "SVD example")
      
      // Load and parse the data file into 0-indexed ((i, j), value) entries
      val data = sc.textFile("mllib/data/als/test.data").map { line =>
        val parts = line.split(',')
        MatrixEntry(parts(0).toInt, parts(1).toInt, parts(2).toDouble)
      }
      val m = 4
      val n = 4
      
      // recover the top 1 singular vector
      val decomposed = SVD.sparseSVD(SparseMatrix(data, m, n), 1)
      
      println("singular values = " + decomposed.S.data.toArray.mkString)
      ```
      
      # Documentation
      Added to docs/mllib-guide.md
      d009b17d
    • Fixed bug where task set managers are added to queue twice · 19da82c5
      Kay Ousterhout authored
      This bug leads to a small performance hit because task set managers will get offered each rejected resource offer twice, but doesn't lead to any incorrect functionality.
      19da82c5
    • Kevin Mader authored · fixed job name and usage information for the JavaSparkPi example
  3. Jan 21, 2014
    • Merge pull request #489 from ash211/patch-6 · 749f8428
      Reynold Xin authored
      Clarify spark.default.parallelism
      
      It's the task count across the cluster, not per worker, per machine, per core, or anything else.
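      For example (a hedged sketch against the 0.9-era API; the master URL, app name, input path, and value are illustrative):
      
      ```scala
      // spark.default.parallelism is the default number of tasks (partitions)
      // used by shuffle operations across the whole cluster.
      import org.apache.spark.{SparkConf, SparkContext}
      
      val conf = new SparkConf()
        .setMaster("spark://master:7077")
        .setAppName("parallelism-demo")
        .set("spark.default.parallelism", "16")
      val sc = new SparkContext(conf)
      
      // With no explicit partition count, this reduceByKey runs 16 tasks in total.
      val counts = sc.textFile("input.txt")
        .flatMap(_.split(" "))
        .map((_, 1))
        .reduceByKey(_ + _)
      ```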
      749f8428
    • Clarify spark.default.parallelism · 069bb942
      Andrew Ash authored
      It's the task count across the cluster, not per worker, per machine, per core, or anything else.
      069bb942
    • Merge pull request #469 from ajtulloch/use-local-spark-context-in-tests-for-mllib · f8544981
      Reynold Xin authored
      [MLlib] Use a LocalSparkContext trait in test suites
      
      Replaces the 9 instances of
      
      ```scala
      class XXXSuite extends FunSuite with BeforeAndAfterAll {
        @transient private var sc: SparkContext = _
      
        override def beforeAll() {
          sc = new SparkContext("local", "test")
        }
      
        override def afterAll() {
          sc.stop()
          System.clearProperty("spark.driver.port")
        }
      ```
      
      with
      
      ```scala
      class XXXSuite extends FunSuite with LocalSparkContext {
      ```
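      A plausible shape for such a trait, as a hedged sketch (not necessarily the exact code in the PR):
      
      ```scala
      // Hypothetical LocalSparkContext: creates a local SparkContext before the
      // suite runs and tears it down (including the driver-port property the old
      // boilerplate cleared) afterwards.
      import org.apache.spark.SparkContext
      import org.scalatest.{BeforeAndAfterAll, Suite}
      
      trait LocalSparkContext extends BeforeAndAfterAll { self: Suite =>
        @transient var sc: SparkContext = _
      
        override def beforeAll() {
          sc = new SparkContext("local", "test")
          super.beforeAll()
        }
      
        override def afterAll() {
          if (sc != null) sc.stop()
          System.clearProperty("spark.driver.port")
          super.afterAll()
        }
      }
      ```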
      f8544981
    • Fixed import order · 3a067b4a
      Andrew Tulloch authored
      3a067b4a
    • Incorporate Tom's comments - update doc and code to reflect that core requests... · adf42611
      Sandy Ryza authored
      Incorporate Tom's comments - update doc and code to reflect that core requests may not always be honored
      adf42611
    • Merge pull request #480 from pwendell/0.9-fixes · 77b986f6
      Patrick Wendell authored
      Handful of 0.9 fixes
      
      This patch addresses a few fixes for Spark 0.9.0 based on the last release candidate.
      
      @mridulm gets credit for reporting most of the issues here. Many of the fixes here are based on his work in #477 and follow up discussion with him.
      77b986f6
    • Style clean-up · a9bcc980
      Patrick Wendell authored
      a9bcc980
    • Merge pull request #484 from tdas/run-example-fix · c67d3d8b
      Patrick Wendell authored
      Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.
      
      The bin/run-example script was not passing Java properties set through SPARK_JAVA_OPTS to the example. This is important for examples like Twitter*, as the Twitter authentication information must be set through Java properties. Hence, the same JAVA_OPTS handling that is in the bin/spark-class script was added to run-example.
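      For instance, the Twitter examples authenticate through twitter4j, which reads its OAuth credentials from standard Java system properties, so those -D options have to reach the example's JVM. Equivalently (a hedged sketch; values are placeholders), they could be set programmatically:
      
      ```scala
      // twitter4j reads these standard property names; values are placeholders.
      System.setProperty("twitter4j.oauth.consumerKey", "<consumerKey>")
      System.setProperty("twitter4j.oauth.consumerSecret", "<consumerSecret>")
      System.setProperty("twitter4j.oauth.accessToken", "<accessToken>")
      System.setProperty("twitter4j.oauth.accessTokenSecret", "<accessTokenSecret>")
      ```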
      
      Also added SPARK_MEM, in case someone wants to run the example with a different amount of memory. This can be removed if it is not in tune with the intended semantics of the run-example scripts.
      
      @matei Please check this soon; I want this to go into 0.9-rc4.
      c67d3d8b
    • Removed SPARK_MEM from run-examples. · 65869f84
      Tathagata Das authored
      65869f84
    • Adding small code comment · a917a87e
      Patrick Wendell authored
      a917a87e
    • Merge pull request #449 from CrazyJvm/master · 6b4eed77
      Reynold Xin authored
      SPARK-1028: fix the "set MASTER automatically fails" bug
      
      spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell, but there's a problem. The condition is `if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]]`. We will certainly set SPARK_MASTER_IP explicitly, but we probably do not set SPARK_MASTER_PORT, relying instead on Spark's default port 7077. So if we do not set SPARK_MASTER_PORT, the condition will never be true. I think we should just use the default port if users do not set one explicitly.
      6b4eed77
    • Merge pull request #482 from tdas/streaming-example-fix · 0367981d
      Patrick Wendell authored
      Added StreamingContext.awaitTermination to streaming examples
      
      StreamingContext.start() currently starts a non-daemon thread which prevents termination of a Spark Streaming program even if the main function has exited. Since the expected behavior of a streaming program is to run until explicitly killed, this was more or less fine when Spark Streaming applications were launched from the command line. However, in yarn-standalone mode it did not work, as the driver was effectively terminated when the main function exited, so the Spark Streaming examples did not work on Yarn.
      
      This addition to the examples ensures that they work on Yarn, and also makes clear to everyone that StreamingContext.awaitTermination() is necessary for a Spark Streaming program to keep running.
      
      The true bug fix of making sure all threads started by Spark Streaming are daemon threads is left for post-0.9.
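      A minimal sketch of the pattern the examples now follow (hedged; the master URL, app name, and source are illustrative, written against the 0.9-era API):
      
      ```scala
      // Hypothetical minimal streaming program: without the final
      // awaitTermination(), main would return immediately after start(),
      // and in yarn-standalone mode the driver would then be torn down.
      import org.apache.spark.streaming.{Seconds, StreamingContext}
      import org.apache.spark.streaming.StreamingContext._
      
      object NetworkWordCount {
        def main(args: Array[String]) {
          val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
          val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
          words.map((_, 1)).reduceByKey(_ + _).print()
          ssc.start()
          ssc.awaitTermination()  // keep the driver alive until explicitly stopped
        }
      }
      ```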
      0367981d