  1. Sep 18, 2015
    • [SPARK-10623] [SQL] Fixes ORC predicate push-down · 22be2ae1
      Cheng Lian authored
      When pushing down a leaf predicate, the ORC `SearchArgument` builder requires an extra "parent" predicate (any one of `AND`/`OR`/`NOT`) to wrap the leaf predicate. E.g., to push down `a < 1`, we must build `AND(a < 1)` instead. Fortunately, when actually constructing the `SearchArgument`, the builder eliminates all those unnecessary wrappers.
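
The wrapping rule can be sketched with a toy builder. This is a hypothetical Python model, not the ORC or Hive API: the class `ToyBuilder`, its method names, and the tuple representation of predicates are all assumptions made for illustration.

```python
class ToyBuilder:
    """Toy stand-in for a search-argument builder; NOT the ORC/Hive API."""

    def __init__(self):
        self._stack = []   # parent nodes that are still open
        self._root = None  # finished top-level node

    def start_and(self):
        # Open an AND parent; children are collected in a list.
        self._stack.append(("AND", []))
        return self

    def less_than(self, column, value):
        # A leaf is only accepted inside an open parent.
        if not self._stack:
            raise ValueError("leaf predicate needs a parent (AND/OR/NOT)")
        self._stack[-1][1].append(("<", column, value))
        return self

    def end(self):
        # Close the innermost parent and attach it to its own parent, if any.
        node = self._stack.pop()
        if self._stack:
            self._stack[-1][1].append(node)
        else:
            self._root = node
        return self

    def build(self):
        return _simplify(self._root)


def _simplify(node):
    # Drop an AND wrapper that has exactly one child.
    if node[0] == "AND":
        children = [_simplify(child) for child in node[1]]
        return children[0] if len(children) == 1 else ("AND", children)
    return node
```

In this model, `ToyBuilder().start_and().less_than("a", 1).end().build()` yields the bare leaf, while calling `less_than` without a surrounding `start_and` is rejected.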
      
      This PR is based on #8783 authored by zhzhan. I also took the chance to simplify `OrcFilters` a bit to improve readability.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8799 from liancheng/spark-10623/fix-orc-ppd.
    • [MINOR] [ML] override toString of AttributeGroup · c8149ef2
      Eric Liang authored
      This makes equality test failures much more readable.
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      Author: Eric Liang <ekhliang@gmail.com>
      
      Closes #8826 from ericl/attrgroupstr.
    • [SPARK-10611] Clone Configuration for each task for NewHadoopRDD · 8074208f
      Mingyu Kim authored
      This patch attempts to fix the Hadoop Configuration thread safety issue for NewHadoopRDD in the same way SPARK-2546 fixed the issue for HadoopRDD.
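
The general idea of giving each task its own copy of a shared configuration can be sketched as follows. This is a minimal Python illustration, not the Spark patch: the dict `shared_conf`, its keys, and the functions `run_task` and `run_all` are hypothetical stand-ins.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a shared configuration object (a plain dict, not Hadoop's Configuration).
shared_conf = {"input.path": "/data", "split.index": None}


def run_task(split_index):
    # Clone first, so per-task mutations never touch the shared instance.
    conf = copy.deepcopy(shared_conf)
    conf["split.index"] = split_index
    return conf["split.index"]


def run_all(n_tasks):
    # Run the tasks concurrently; each one works on its private copy.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_task, range(n_tasks)))
```

Because every task mutates only its own copy, the shared object stays unchanged no matter how the threads interleave.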
      
      Author: Mingyu Kim <mkim@palantir.com>
      
      Closes #8763 from mingyukim/mkim/SPARK-10611.
    • [SPARK-9808] Remove hash shuffle file consolidation. · 348d7c9a
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8812 from rxin/SPARK-9808-1.
    • [SPARK-10449] [SQL] Don't merge decimal types with incompatible precision or scales · 3a22b100
      Holden Karau authored
      From JIRA: Schema merging should only handle struct fields. But currently we also reconcile decimal precision and scale information.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8634 from holdenk/SPARK-10449-dont-merge-different-precision.
    • [SPARK-10539] [SQL] Project should not be pushed down through Intersect or Except #8742 · c6f8135e
      Yijie Shen authored
      Intersect and Except are both set operators, and they use all the columns to compare rows for equality. Pushing their Project parent down changes the relations they operate on, so it is not an equivalent transformation.
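
A small worked example, using plain Python sets rather than Spark, shows why the rewrite is not equivalent. The table contents and the helper names `project_a` and `intersect` are made up for illustration, and projection is given set semantics to keep the sketch short.

```python
# Toy model: a relation is a set of (a, b) rows. This is not Spark code.
left = {(1, "x"), (2, "y")}
right = {(1, "z"), (2, "y")}


def project_a(rows):
    """Keep only column a."""
    return {(a,) for (a, _b) in rows}


def intersect(r1, r2):
    """Set intersection compares entire rows."""
    return r1 & r2


# Original plan: Project(a) on top of Intersect(left, right).
original = project_a(intersect(left, right))

# Invalid rewrite: Intersect on top of Project(a) applied to each side.
pushed_down = intersect(project_a(left), project_a(right))
```

Here `original` contains only the row with `a = 2`, because `(1, "x")` and `(1, "z")` differ in column `b`. After the push-down, column `b` is gone before the comparison, so `a = 1` wrongly survives as well.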
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-10539
      
      I added some comments based on the fix of https://github.com/apache/spark/pull/8742.
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8823 from yhuai/fix_set_optimization.
    • [SPARK-10540] Fixes flaky all-data-type test · 00a2911c
      Cheng Lian authored
      This PR breaks the original test case into multiple ones (one test case for each data type). In this way, test failure output can be much more readable.
      
      Within each test case, we build a table with two columns: one holds the data type under test, and the other is an "index" column, which is used to sort the DataFrame and work around [SPARK-10591] [1].
      
      [1]: https://issues.apache.org/jira/browse/SPARK-10591
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8768 from liancheng/spark-10540/test-all-data-types.
    • [SPARK-10615] [PYSPARK] change assertEquals to assertEqual · 35e8ab93
      Yanbo Liang authored
      Since ```assertEquals``` is deprecated, we need to change ```assertEquals``` to ```assertEqual``` in the existing Python unit tests.
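
For reference, a minimal `unittest` case using the supported spelling might look like this. The test class `WordCountTest` and the helper `run_suite` are illustrative and not taken from the PySpark test suite.

```python
import unittest


class WordCountTest(unittest.TestCase):
    def test_count(self):
        counts = {}
        for word in "a b a".split():
            counts[word] = counts.get(word, 0) + 1
        # assertEqual is the supported name; assertEquals was a deprecated alias.
        self.assertEqual(counts, {"a": 2, "b": 1})


def run_suite():
    # Load and run the single test case, returning the TestResult.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The deprecated alias was removed entirely in Python 3.12, so test code that still calls `assertEquals` fails there with an `AttributeError`.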
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8814 from yanboliang/spark-10615.
    • [SPARK-10451] [SQL] Prevent unnecessary serializations in InMemoryColumnarTableScan · 20fd35df
      Yash Datta authored
      Many of the fields in InMemoryColumnarTableScan and InMemoryRelation can be made transient.
      
      This reduces my 1000 ms job to about 700 ms. The task size drops from 2.8 MB to ~1300 KB.
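
Python has no `transient` keyword, but a loose analogue of the idea is to leave rebuildable fields out of an object's serialized state. The sketch below is only that analogue, not the Scala change: the class `ToyRelation`, its fields, and the helper `wire_size` are hypothetical.

```python
import pickle


class ToyRelation:
    def __init__(self, column_names):
        self.column_names = column_names
        # Derived data that the receiving side could rebuild if needed.
        self.stats_cache = list(range(100_000))

    def __getstate__(self):
        # Serialize everything except the rebuildable cache.
        state = dict(self.__dict__)
        state.pop("stats_cache", None)
        return state

    def __setstate__(self, state):
        # Restore the serialized fields; the cache starts out empty.
        self.__dict__.update(state)
        self.stats_cache = None


def wire_size(obj):
    """Number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj))
```

With the cache excluded, the pickled form carries only `column_names`, which is the same kind of saving the commit reports for task size.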
      
      Author: Yash Datta <Yash.Datta@guavus.com>
      
      Closes #8604 from saucam/serde.
    • [SPARK-10684] [SQL] StructType.interpretedOrdering need not to be serialized · e3b5d6cb
      navis.ryu authored
      Kryo fails with a buffer overflow even when `spark.kryoserializer.buffer.max` is set to its maximum value (2G).
      
      {noformat}
      org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1
      Serialization trace:
      containsChild (org.apache.spark.sql.catalyst.expressions.BoundReference)
      child (org.apache.spark.sql.catalyst.expressions.SortOrder)
      array (scala.collection.mutable.ArraySeq)
      ordering (org.apache.spark.sql.catalyst.expressions.InterpretedOrdering)
      interpretedOrdering (org.apache.spark.sql.types.StructType)
      schema (org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema). To avoid this, increase spark.kryoserializer.buffer.max value.
              at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:263)
              at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      {noformat}
      
      Author: navis.ryu <navis@apache.org>
      
      Closes #8808 from navis/SPARK-10684.
    • Added <code> tag to documentation. · 74d8f7dd
      Reynold Xin authored
    • docs/running-on-mesos.md: state default values in default column · 9a56dcdf
      Felix Bechstein authored
      This PR simply uses the default value column for defaults.
      
      Author: Felix Bechstein <felix.bechstein@otto.de>
      
      Closes #8810 from felixb/fix_mesos_doc.
    • [SPARK-9522] [SQL] SparkSubmit process can not exit if kill application when... · 93c7650a
      linweizhong authored
      [SPARK-9522] [SQL] SparkSubmit process can not exit if kill application when HiveThriftServer was starting
      
      When we start HiveThriftServer, we start the SparkContext first and then HiveServer2. If the application is killed while HiveServer2 is starting, the SparkContext stops successfully, but the SparkSubmit process cannot exit.
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #7853 from Sephiroth-Lin/SPARK-9522.
    • [SPARK-10682] [GRAPHX] Remove Bagel test suites. · d009da2f
      Reynold Xin authored
      Bagel has been deprecated and we haven't made any changes to it. There is no need to run those tests.
      
      This should speed up tests by 1 min.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8807 from rxin/SPARK-10682.
  2. Sep 17, 2015
  3. Sep 16, 2015