  1. Dec 23, 2015
    • [SPARK-12499][BUILD] don't force MAVEN_OPTS · ead6abf7
      Adrian Bridgett authored
      Allow the user to override MAVEN_OPTS (2GB wasn't sufficient for me).
      
      Author: Adrian Bridgett <adrian@smop.co.uk>
      
      Closes #10448 from abridgett/feature/do_not_force_maven_opts.
    • [SPARK-12500][CORE] Fix Tachyon deprecations; pull Tachyon dependency into one class · ae1f54aa
      Sean Owen authored
      Fix Tachyon deprecations; pull Tachyon dependency into `TachyonBlockManager` only
      
      CC calvinjia, as I could use a double-check that the usage of the new API is correct.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10449 from srowen/SPARK-12500.
    • [SPARK-12477][SQL] Tungsten projection fails for null values in array fields · 43b2a639
      pierre-borckmans authored
      Accessing null elements in an array field fails when Tungsten is enabled; it works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.
      
      This PR fixes the issue by checking, in the generated code, whether the accessed array element is null.
      
      Example:
      ```
      // Array of String
      case class AS(as: Seq[String])
      val dfAS = sc.parallelize(Seq(AS(Seq("a", null, "b")))).toDF
      dfAS.registerTempTable("T_AS")
      for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(",")) }
      ```
      
      With Tungsten disabled:
      ```
      0 = [a]
      1 = [null]
      2 = [b]
      ```
      
      With Tungsten enabled:
      ```
      0 = [a]
      15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
      java.lang.NullPointerException
      	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
      	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
      	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
      	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      ```
      
      Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
      
      Closes #10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.
    • [SPARK-11164][SQL] Add InSet pushdown filter back for Parquet · 50301c0a
      Liang-Chi Hsieh authored
      When the filter is `"b in ('1', '2')"`, it is not pushed down to Parquet. This change adds the InSet pushdown filter back. Thanks!
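      
      A hedged repro sketch (in spark-shell, where `sqlContext` is predefined; the path and column are hypothetical):
      
      ```scala
      // Hypothetical Parquet table with a string column `b`:
      val df = sqlContext.read.parquet("/tmp/t")
      // With this patch, the `In` data source filter produced by the IN clause
      // is pushed down to the Parquet reader again:
      df.filter("b in ('1', '2')").explain()
      ```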
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      Closes #10278 from gatorsmile/parquetFilterNot.
  2. Dec 21, 2015
    • [SPARK-12388] change default compression to lz4 · 29cecd4a
      Davies Liu authored
      According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy.
      
      After changing the compressor to LZ4, I saw a 20% improvement in end-to-end time for a TPCDS query (Q4).
      
      [1] https://github.com/ning/jvm-compressor-benchmark/wiki
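      
      For anyone who wants to keep the previous codec, a minimal sketch of pinning it explicitly via `spark.io.compression.codec`:
      
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      
      // "lz4" is now the default; "snappy" and "lzf" remain available.
      val conf = new SparkConf()
        .setAppName("codec-demo")
        .set("spark.io.compression.codec", "snappy")
      val sc = new SparkContext(conf)
      ```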
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10342 from davies/lz4.
    • [SPARK-12466] Fix harmless NPE in tests · d655d37d
      Andrew Or authored
      ```
      [info] ReplayListenerSuite:
      [info] - Simple replay (58 milliseconds)
      java.lang.NullPointerException
      	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
      	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
      ```
      https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
      
      This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (but doesn't actually fail the tests).
      
      Tested locally to verify that the NPE is gone.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10417 from andrewor14/fix-harmless-npe.
    • [SPARK-2331] SparkContext.emptyRDD should return RDD[T] not EmptyRDD[T] · a820ca19
      Reynold Xin authored
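      
      A minimal sketch of why the declared return type matters (in spark-shell, where `sc` is predefined):
      
      ```scala
      // Before this change the inferred type was EmptyRDD[Int], so the
      // reassignment below failed to compile; with RDD[T] it works:
      var rdd = sc.emptyRDD[Int]
      rdd = rdd.union(sc.parallelize(Seq(1, 2, 3)))
      ```
      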
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10394 from rxin/SPARK-2331.
    • [SPARK-12339][SPARK-11206][WEBUI] Added a null check that was removed in SPARK-11206 · b0849b8a
      Alex Bozarth authored
      Updates made in SPARK-11206 missed an edge case which causes a NullPointerException when a task is killed. In some cases, when a task ends in failure, taskMetrics is initialized as null (see JobProgressListener.onTaskEnd()). To address this, a null check was added; before the changes in SPARK-11206, this check was made at the start of the updateTaskAccumulatorValues() function.
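      
      A minimal sketch of the guard, using names from the description rather than the exact patch:
      
      ```scala
      import org.apache.spark.executor.TaskMetrics
      
      // Killed or failed tasks may report null metrics, so bail out early:
      def updateTaskAccumulatorValues(taskMetrics: TaskMetrics): Unit = {
        if (taskMetrics == null) return
        // ... update accumulator values from the metrics ...
      }
      ```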
      
      Author: Alex Bozarth <ajbozart@us.ibm.com>
      
      Closes #10405 from ajbozarth/spark12339.
    • Doc typo: ltrim = trim from left end, not right · fc6dbcc7
      pshearer authored
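      
      A quick demonstration of the corrected semantics (in spark-shell; the literal is arbitrary):
      
      ```scala
      import org.apache.spark.sql.functions.{lit, ltrim}
      
      // ltrim removes whitespace from the left end only:
      sqlContext.range(1).select(ltrim(lit("  spark  "))).show()
      // prints "spark  " (trailing spaces preserved)
      ```
      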
      Author: pshearer <pshearer@massmutual.com>
      
      Closes #10414 from pshearer/patch-1.
    • [SPARK-5882][GRAPHX] Add a test for GraphLoader.edgeListFile · 1eb90bc9
      Takeshi YAMAMURO authored
      Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
      
      Closes #4674 from maropu/AddGraphLoaderSuite.
    • [SPARK-12392][CORE] Optimize a location order of broadcast blocks by... · 935f4663
      Takeshi YAMAMURO authored
      [SPARK-12392][CORE] Optimize a location order of broadcast blocks by considering preferred local hosts
      
      When multiple workers exist on a host, we can bypass unnecessary remote access for broadcasts: block managers fetch broadcast blocks from the same host instead of from remote hosts.
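      
      A minimal sketch of the idea with illustrative types (the real code orders block manager locations):
      
      ```scala
      import scala.util.Random
      
      // Put locations on the local host first, then randomize the remote ones:
      def sortLocations(locations: Seq[String], localHost: String): Seq[String] = {
        val (local, remote) = locations.partition(_ == localHost)
        local ++ Random.shuffle(remote)
      }
      
      sortLocations(Seq("host2", "host1", "host1"), "host1")
      // => Seq("host1", "host1", "host2")
      ```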
      
      Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
      
      Closes #10346 from maropu/OptimizeBlockLocationOrder.
    • [SPARK-12374][SPARK-12150][SQL] Adding logical/physical operators for Range · 4883a508
      gatorsmile authored
      Based on the suggestions from marmbrus, this adds logical/physical operators for Range to improve performance (a usage sketch follows below).
      
      Also added another API to resolve SPARK-12150.
      
      Could you take a look at my implementation, marmbrus? If it's not good, I can rework it. :)
      
      Thank you very much!
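      
      A hedged usage sketch (in spark-shell, where `sqlContext` is predefined): queries built with `range` can now be planned with the new operators instead of going through a parallelized collection.
      
      ```scala
      // start, end, step, numPartitions
      val df = sqlContext.range(0, 1000000, 1, 4)
      df.explain()  // the physical plan should show the new Range operator
      ```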
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10335 from gatorsmile/rangeOperators.
    • [SPARK-12321][SQL] JSON format for TreeNode (use reflection) · 7634fe95
      Wenchen Fan authored
      An alternative solution to https://github.com/apache/spark/pull/10295: instead of implementing a JSON format for every logical/physical plan and expression, use reflection to implement it once in `TreeNode`.
      
      Here I use a pre-order traversal to flatten the plan tree into a list of plan nodes, and add an extra field `num-children` to each node so that we can reconstruct the tree from the list.
      
      Example JSON for a logical plan tree:
      ```
      [ {
        "class" : "org.apache.spark.sql.catalyst.plans.logical.Sort",
        "num-children" : 1,
        "order" : [ [ {
          "class" : "org.apache.spark.sql.catalyst.expressions.SortOrder",
          "num-children" : 1,
          "child" : 0,
          "direction" : "Ascending"
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
          "num-children" : 0,
          "name" : "i",
          "dataType" : "integer",
          "nullable" : true,
          "metadata" : { },
          "exprId" : {
            "id" : 10,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        } ] ],
        "global" : false,
        "child" : 0
      }, {
        "class" : "org.apache.spark.sql.catalyst.plans.logical.Project",
        "num-children" : 1,
        "projectList" : [ [ {
          "class" : "org.apache.spark.sql.catalyst.expressions.Alias",
          "num-children" : 1,
          "child" : 0,
          "name" : "i",
          "exprId" : {
            "id" : 10,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.Add",
          "num-children" : 2,
          "left" : 0,
          "right" : 1
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
          "num-children" : 0,
          "name" : "a",
          "dataType" : "integer",
          "nullable" : true,
          "metadata" : { },
          "exprId" : {
            "id" : 0,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.Literal",
          "num-children" : 0,
          "value" : "1",
          "dataType" : "integer"
        } ], [ {
          "class" : "org.apache.spark.sql.catalyst.expressions.Alias",
          "num-children" : 1,
          "child" : 0,
          "name" : "j",
          "exprId" : {
            "id" : 11,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.Multiply",
          "num-children" : 2,
          "left" : 0,
          "right" : 1
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
          "num-children" : 0,
          "name" : "a",
          "dataType" : "integer",
          "nullable" : true,
          "metadata" : { },
          "exprId" : {
            "id" : 0,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        }, {
          "class" : "org.apache.spark.sql.catalyst.expressions.Literal",
          "num-children" : 0,
          "value" : "2",
          "dataType" : "integer"
        } ] ],
        "child" : 0
      }, {
        "class" : "org.apache.spark.sql.catalyst.plans.logical.LocalRelation",
        "num-children" : 0,
        "output" : [ [ {
          "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
          "num-children" : 0,
          "name" : "a",
          "dataType" : "integer",
          "nullable" : true,
          "metadata" : { },
          "exprId" : {
            "id" : 0,
            "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6"
          },
          "qualifiers" : [ ]
        } ] ],
        "data" : [ ]
      } ]
      ```
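      
      A hedged usage sketch, assuming the new serialization is exposed as `toJSON` on plan tree nodes (in spark-shell):
      
      ```scala
      import sqlContext.implicits._
      
      val df = Seq((1, 2)).toDF("a", "b").select(($"a" + 1).as("i"))
      // Flattened pre-order list of plan nodes, as in the example above:
      val planJson = df.queryExecution.logical.toJSON
      ```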
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10311 from cloud-fan/toJson-reflection.
    • [SPARK-12398] Smart truncation of DataFrame / Dataset toString · 474eb21a
      Dilip Biswal authored
      When a DataFrame or Dataset has a long schema, we should intelligently truncate it to avoid flooding the screen with unreadable information:
      
      ```
      // Standard output
      [a: int, b: int]
      
      // Truncate many top-level fields
      [a: int, b: string ... 10 more fields]
      
      // Truncate long inner structs
      [a: struct<a: int ... 10 more fields>]
      ```
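      
      A minimal sketch of the truncation rule the examples suggest (the threshold and helper name are illustrative):
      
      ```scala
      // Keep the first `maxFields` fields and summarize the rest:
      def truncatedString(fields: Seq[String], maxFields: Int): String =
        if (fields.length > maxFields) {
          fields.take(maxFields).mkString("[", ", ", s" ... ${fields.length - maxFields} more fields]")
        } else {
          fields.mkString("[", ", ", "]")
        }
      
      truncatedString(Seq("a: int", "b: string") ++ (1 to 10).map(i => s"c$i: int"), 2)
      // => "[a: int, b: string ... 10 more fields]"
      ```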
      
      Author: Dilip Biswal <dbiswal@us.ibm.com>
      
      Closes #10373 from dilipbiswal/spark-12398.
    • [PYSPARK] Pyspark typo & Add missing abstractmethod annotation · 1920d72a
      Jeff Zhang authored
      No JIRA was created since this is a trivial change.
      
      davies, please help review it.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10143 from zjffdu/pyspark_typo.
    • [SPARK-12349][ML] Make spark.ml PCAModel load backwards compatible · d0f69508
      Sean Owen authored
      Only load explainedVariance in PCAModel if it was written with Spark > 1.6.x.
      jkbradley, is this the kind of thing you had in mind?
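      
      A minimal sketch of the version gate, assuming the saved metadata records the writer's Spark version as a string:
      
      ```scala
      // Load explainedVariance only for models written by Spark 1.6 or later
      // (assumption based on the description above):
      def hasExplainedVariance(writerVersion: String): Boolean = {
        val version = """(\d+)\.(\d+).*""".r
        writerVersion match {
          case version(major, minor) => major.toInt > 1 || (major.toInt == 1 && minor.toInt >= 6)
          case _ => false
        }
      }
      
      hasExplainedVariance("1.5.2")  // => false
      hasExplainedVariance("1.6.0")  // => true
      ```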
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10327 from srowen/SPARK-12349.
  3. Dec 20, 2015
    • [SPARK-10158][PYSPARK][MLLIB] ALS better error message when using Long IDs · ce1798b3
      Bryan Cutler authored
      Added a catch for the Long-to-Int casting exception thrown when PySpark ALS Ratings are serialized. It is easy to accidentally use Long IDs for user/product; before, this failed with the somewhat cryptic "ClassCastException: java.lang.Long cannot be cast to java.lang.Integer." Now a more descriptive error is shown, e.g. "PickleException: Ratings id 1205640308657491975 exceeds max integer value of 2147483647."
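      
      A minimal sketch of the bounds check behind the new message (the helper name is illustrative; the real check runs during serialization):
      
      ```scala
      // Ratings ids must fit in an Int:
      def toIntId(id: Long): Int = {
        require(id <= Int.MaxValue,
          s"Ratings id $id exceeds max integer value of ${Int.MaxValue}")
        id.toInt
      }
      
      toIntId(1205640308657491975L)  // throws: "... exceeds max integer value of 2147483647"
      ```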
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #9361 from BryanCutler/als-pyspark-long-id-error-SPARK-10158.
    • [SPARK-11808] Remove Bagel. · 284e29a8
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10395 from rxin/SPARK-11808.