  1. Dec 22, 2015
  2. Dec 21, 2015
    • Davies Liu's avatar
      [SPARK-12388] change default compression to lz4 · 29cecd4a
      Davies Liu authored
According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy.
      
      After changing the compressor to LZ4, I saw 20% improvement on end-to-end time for a TPCDS query (Q4).
      
      [1] https://github.com/ning/jvm-compressor-benchmark/wiki
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10342 from davies/lz4.
      29cecd4a
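For reference, the setting this commit changes the default of is `spark.io.compression.codec`; a minimal sketch of pinning it explicitly (useful if you want to stay on snappy after upgrading):

```
# conf/spark-defaults.conf — illustrative fragment; spark.io.compression.codec
# is the property whose default this commit flips from snappy to lz4.
spark.io.compression.codec  lz4
```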
    • Andrew Or's avatar
      [SPARK-12466] Fix harmless NPE in tests · d655d37d
      Andrew Or authored
      ```
      [info] ReplayListenerSuite:
      [info] - Simple replay (58 milliseconds)
      java.lang.NullPointerException
      	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
      	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
      ```
      https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
      
This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (but doesn't actually fail the tests).
      
      Tested locally to verify that the NPE is gone.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10417 from andrewor14/fix-harmless-npe.
      d655d37d
    • Reynold Xin's avatar
      [SPARK-2331] SparkContext.emptyRDD should return RDD[T] not EmptyRDD[T] · a820ca19
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10394 from rxin/SPARK-2331.
      a820ca19
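The API-design principle behind this change can be sketched outside Spark: a factory should declare the general type and keep the concrete subclass an implementation detail. A minimal Python sketch with hypothetical stand-in classes (not Spark's actual API):

```python
class RDD:
    """Hypothetical stand-in for Spark's RDD[T]."""
    def count(self) -> int:
        raise NotImplementedError

class EmptyRDD(RDD):
    """Concrete empty implementation; callers should not depend on it."""
    def count(self) -> int:
        return 0

def empty_rdd() -> RDD:
    # Declare the general return type, mirroring SparkContext.emptyRDD
    # returning RDD[T] rather than EmptyRDD[T].
    return EmptyRDD()

rdd = empty_rdd()
```

Returning the abstract type leaves the implementation free to change (or specialize) the concrete class later without breaking caller signatures.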
    • Alex Bozarth's avatar
      [SPARK-12339][SPARK-11206][WEBUI] Added a null check that was removed in · b0849b8a
      Alex Bozarth authored
Updates made in SPARK-11206 missed an edge case which causes a NullPointerException when a task is killed. In some cases, when a task ends in failure, taskMetrics is initialized as null (see JobProgressListener.onTaskEnd()). To address this, a null check was added. Before the changes in SPARK-11206, this null check was made at the start of the updateTaskAccumulatorValues() function.
      
      Author: Alex Bozarth <ajbozart@us.ibm.com>
      
      Closes #10405 from ajbozarth/spark12339.
      b0849b8a
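The restored guard can be sketched like this (hypothetical names; the point is that the None check must run before any field access, since metrics may never be initialized for a killed task):

```python
def update_task_accumulator_values(task_metrics):
    """Return accumulator updates, or an empty list if metrics are absent.

    When a task ends in failure, its metrics may never have been
    initialized, so check for None before touching any field (sketch).
    """
    if task_metrics is None:
        return []
    return list(task_metrics.get("accumulators", []))
```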
    • pshearer's avatar
      Doc typo: ltrim = trim from left end, not right · fc6dbcc7
      pshearer authored
      Author: pshearer <pshearer@massmutual.com>
      
      Closes #10414 from pshearer/patch-1.
      fc6dbcc7
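The corrected behavior, illustrated with Python's analogous string methods (like SQL's `ltrim`, `lstrip` removes leading characters only):

```python
s = "  padded  "
assert s.lstrip() == "padded  "   # left trim: leading spaces removed
assert s.rstrip() == "  padded"   # right trim, for contrast
```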
    • Takeshi YAMAMURO's avatar
      [SPARK-5882][GRAPHX] Add a test for GraphLoader.edgeListFile · 1eb90bc9
      Takeshi YAMAMURO authored
      Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
      
      Closes #4674 from maropu/AddGraphLoaderSuite.
      1eb90bc9
    • Takeshi YAMAMURO's avatar
      [SPARK-12392][CORE] Optimize a location order of broadcast blocks by... · 935f4663
      Takeshi YAMAMURO authored
      [SPARK-12392][CORE] Optimize a location order of broadcast blocks by considering preferred local hosts
      
When multiple workers exist on a host, we can bypass unnecessary remote access for broadcasts: block managers fetch broadcast blocks from the same host instead of from remote hosts.
      
      Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
      
      Closes #10346 from maropu/OptimizeBlockLocationOrder.
      935f4663
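The reordering described above can be sketched as a stable sort that moves same-host block managers to the front of the fetch list (hypothetical data model, not Spark's BlockManagerId):

```python
def order_locations(locations, local_host):
    """Put block managers on the local host first, preserving relative
    order otherwise, so broadcast fetches prefer same-host peers.

    Relies on sorted() being stable and on False < True for the key.
    """
    return sorted(locations, key=lambda loc: loc["host"] != local_host)
```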
    • gatorsmile's avatar
      [SPARK-12374][SPARK-12150][SQL] Adding logical/physical operators for Range · 4883a508
      gatorsmile authored
Based on suggestions from marmbrus, added logical/physical operators for Range to improve performance.

Also added another API for resolving the JIRA SPARK-12150.

Could you take a look at my implementation, marmbrus? If not good, I can rework it. : )
      
      Thank you very much!
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10335 from gatorsmile/rangeOperators.
      4883a508
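The core of a physical Range operator is computing each partition's slice of the sequence without materializing the whole range up front. A sketch of that partitioning arithmetic (hypothetical helper; assumes a positive step):

```python
def range_partition(start, end, step, num_slices, index):
    """Elements of partition `index` when range(start, end, step) is
    split into num_slices contiguous slices (sketch; assumes step > 0)."""
    total = max(0, -(-(end - start) // step))  # ceil division: element count
    lo = index * total // num_slices           # first element index (inclusive)
    hi = (index + 1) * total // num_slices     # last element index (exclusive)
    return [start + i * step for i in range(lo, hi)]
```

Concatenating the partitions in index order reproduces the full range, which is the invariant the real operator must maintain.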
    • Wenchen Fan's avatar
      [SPARK-12321][SQL] JSON format for TreeNode (use reflection) · 7634fe95
      Wenchen Fan authored
An alternative solution to https://github.com/apache/spark/pull/10295 : instead of implementing a JSON format for all logical/physical plans and expressions, use reflection to implement it in `TreeNode`.

Here I use pre-order traversal to flatten a plan tree into a plan list, and add an extra field `num-children` to each plan node so that we can reconstruct the tree from the list.
      
      example json:
      
      logical plan tree:
      ```
      [ {
        "class" : "org.apache.spark.sql.catalyst.plans.logical.Sort",
        "num-children" : 1,
        "order" : [ [ {
          "class" : "org.apache.spark.sql.catalyst.expressions.SortOrder",
          "num-children" : 1,
          "child" : 0,
          "direction" : "Ascending"
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
          "num-children" : 0,
          "name" : "i",
          "dataType" : "integer",
          "nullable" : true,
          "metadata" : { },
          "exprId" : {
            "id" : 10,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        } ] ],
        "global" : false,
        "child" : 0
      }, {
        "class" : "org.apache.spark.sql.catalyst.plans.logical.Project",
        "num-children" : 1,
        "projectList" : [ [ {
          "class" : "org.apache.spark.sql.catalyst.expressions.Alias",
          "num-children" : 1,
          "child" : 0,
          "name" : "i",
          "exprId" : {
            "id" : 10,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.Add",
          "num-children" : 2,
          "left" : 0,
          "right" : 1
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
          "num-children" : 0,
          "name" : "a",
          "dataType" : "integer",
          "nullable" : true,
          "metadata" : { },
          "exprId" : {
            "id" : 0,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.Literal",
          "num-children" : 0,
          "value" : "1",
          "dataType" : "integer"
        } ], [ {
          "class" : "org.apache.spark.sql.catalyst.expressions.Alias",
          "num-children" : 1,
          "child" : 0,
          "name" : "j",
          "exprId" : {
            "id" : 11,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.Multiply",
          "num-children" : 2,
          "left" : 0,
          "right" : 1
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
          "num-children" : 0,
          "name" : "a",
          "dataType" : "integer",
          "nullable" : true,
          "metadata" : { },
          "exprId" : {
            "id" : 0,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.Literal",
          "num-children" : 0,
          "value" : "2",
          "dataType" : "integer"
        } ] ],
        "child" : 0
      }, {
        "class" : "org.apache.spark.sql.catalyst.plans.logical.LocalRelation",
        "num-children" : 0,
        "output" : [ [ {
          "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
          "num-children" : 0,
          "name" : "a",
          "dataType" : "integer",
          "nullable" : true,
          "metadata" : { },
          "exprId" : {
            "id" : 0,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        } ] ],
        "data" : [ ]
      } ]
      ```
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10311 from cloud-fan/toJson-reflection.
      7634fe95
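The flatten/rebuild scheme can be sketched independently of Spark: pre-order traversal emits each node with a `num-children` field, and reconstruction consumes the list recursively (hypothetical node model, not the actual TreeNode code):

```python
def flatten(node):
    """Pre-order flatten a tree into a list of dicts carrying
    num-children, mirroring how TreeNode serializes a plan to JSON."""
    out = [{"name": node["name"],
            "num-children": len(node["children"])}]
    for child in node["children"]:
        out.extend(flatten(child))
    return out

def rebuild(nodes):
    """Reconstruct the tree from the flattened list (consumes `nodes`):
    pop the head, then recursively rebuild num-children subtrees."""
    head = nodes.pop(0)
    children = [rebuild(nodes) for _ in range(head["num-children"])]
    return {"name": head["name"], "children": children}
```

Because pre-order traversal plus per-node child counts uniquely determine the tree shape, no explicit child pointers are needed in the serialized form.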
    • Dilip Biswal's avatar
      [SPARK-12398] Smart truncation of DataFrame / Dataset toString · 474eb21a
      Dilip Biswal authored
When a DataFrame or Dataset has a long schema, we should intelligently truncate to avoid flooding the screen with unreadable information.
```
// Standard output
[a: int, b: int]

// Truncate many top level fields
[a: int, b: string ... 10 more fields]

// Truncate long inner structs
[a: struct<a: Int ... 10 more fields>]
```
      
      Author: Dilip Biswal <dbiswal@us.ibm.com>
      
      Closes #10373 from dilipbiswal/spark-12398.
      474eb21a
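The top-level truncation rule amounts to: print at most N fields and summarize the rest. A sketch with a hypothetical helper (the cutoff of 2 is arbitrary here):

```python
def truncated_string(fields, max_fields=2):
    """Render '[a: int, b: string ... k more fields]'-style output,
    truncating after max_fields entries (illustrative sketch)."""
    shown = ", ".join(f"{name}: {dtype}" for name, dtype in fields[:max_fields])
    more = len(fields) - max_fields
    suffix = f" ... {more} more fields" if more > 0 else ""
    return f"[{shown}{suffix}]"
```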
    • Jeff Zhang's avatar
      [PYSPARK] Pyspark typo & Add missing abstractmethod annotation · 1920d72a
      Jeff Zhang authored
      No jira is created since this is a trivial change.
      
      davies  Please help review it
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10143 from zjffdu/pyspark_typo.
      1920d72a
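The effect of an added `@abstractmethod` annotation, sketched with a hypothetical base class (not the actual PySpark class this commit touches): without it, a subclass that forgets the method only fails later at call time; with it, instantiating an incomplete class fails immediately.

```python
from abc import ABC, abstractmethod

class Transformer(ABC):  # hypothetical base class for illustration
    @abstractmethod
    def transform(self, data):
        """Subclasses must implement this."""

class Doubler(Transformer):
    def transform(self, data):
        return [x * 2 for x in data]

# Transformer() itself can no longer be instantiated; Python raises
# TypeError at construction time rather than failing on a later call.
```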
    • Sean Owen's avatar
      [SPARK-12349][ML] Make spark.ml PCAModel load backwards compatible · d0f69508
      Sean Owen authored
      Only load explainedVariance in PCAModel if it was written with Spark > 1.6.x
      jkbradley is this kind of what you had in mind?
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10327 from srowen/SPARK-12349.
      d0f69508
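Version-gated loading of an optional field can be sketched like this (hypothetical metadata layout; the real code inspects the Spark version recorded in the model's saved metadata):

```python
def load_pca_model(metadata):
    """Load optional explainedVariance only when the writer's version
    supports it; older models get a None placeholder (sketch)."""
    major, minor = (int(x) for x in metadata["sparkVersion"].split(".")[:2])
    if (major, minor) >= (1, 6):
        return {"pc": metadata["pc"],
                "explainedVariance": metadata["explainedVariance"]}
    # Field did not exist in older formats: degrade gracefully.
    return {"pc": metadata["pc"], "explainedVariance": None}
```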
  3. Dec 20, 2015
    • Bryan Cutler's avatar
      [SPARK-10158][PYSPARK][MLLIB] ALS better error message when using Long IDs · ce1798b3
      Bryan Cutler authored
      Added catch for casting Long to Int exception when PySpark ALS Ratings are serialized.  It is easy to accidentally use Long IDs for user/product and before, it would fail with a somewhat cryptic "ClassCastException: java.lang.Long cannot be cast to java.lang.Integer."  Now if this is done, a more descriptive error is shown, e.g. "PickleException: Ratings id 1205640308657491975 exceeds max integer value of 2147483647."
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #9361 from BryanCutler/als-pyspark-long-id-error-SPARK-10158.
      ce1798b3
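The added check can be sketched as an explicit range test before conversion (hypothetical function name; the error text paraphrases the message quoted above):

```python
MAX_INT = 2147483647  # java.lang.Integer.MAX_VALUE

def check_rating_id(rating_id):
    """Raise a descriptive error when an ALS id does not fit in a
    32-bit int, instead of a cryptic ClassCastException downstream."""
    if rating_id > MAX_INT:
        raise ValueError(
            f"Ratings id {rating_id} exceeds max integer value of {MAX_INT}")
    return int(rating_id)
```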
    • Reynold Xin's avatar
      [SPARK-11808] Remove Bagel. · 284e29a8
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10395 from rxin/SPARK-11808.
      284e29a8
  4. Dec 19, 2015
  5. Dec 18, 2015
  6. Dec 17, 2015
    • Shixiong Zhu's avatar
      [MINOR] Hide the error logs for 'SQLListenerMemoryLeakSuite' · 0370abdf
      Shixiong Zhu authored
Hide the error logs for 'SQLListenerMemoryLeakSuite' to avoid noise. Most of the changes are whitespace changes.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10363 from zsxwing/hide-log.
      0370abdf
    • jhu-chang's avatar
      [SPARK-11749][STREAMING] Duplicate creating the RDD in file stream when... · f4346f61
      jhu-chang authored
      [SPARK-11749][STREAMING] Duplicate creating the RDD in file stream when recovering from checkpoint data
      
Add a transient flag `DStream.restoredFromCheckpointData` to control restore processing in DStream and avoid duplicate work: `DStream.restoreCheckpointData` checks this flag first, and the restore process runs only when it is `false`.
      
      Author: jhu-chang <gt.hu.chang@gmail.com>
      
      Closes #9765 from jhu-chang/SPARK-11749.
      f4346f61
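The guard-flag pattern described above, sketched with a hypothetical class (the real field is `DStream.restoredFromCheckpointData`):

```python
class DStream:
    def __init__(self):
        self.restored_from_checkpoint_data = False
        self.restore_count = 0

    def restore_checkpoint_data(self):
        # Run the restore at most once: recovery may invoke this method
        # again for the same stream, which previously duplicated the work.
        if self.restored_from_checkpoint_data:
            return
        self.restore_count += 1  # stands in for the real restore work
        self.restored_from_checkpoint_data = True
```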
    • Herman van Hovell's avatar
      [SPARK-8641][SQL] Native Spark Window functions · 658f66e6
      Herman van Hovell authored
      This PR removes Hive windows functions from Spark and replaces them with (native) Spark ones. The PR is on par with Hive in terms of features.
      
      This has the following advantages:
      * Better memory management.
* The ability to use Spark UDAFs in Window functions.
      
      cc rxin / yhuai
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #9819 from hvanhovell/SPARK-8641-2.
      658f66e6