  1. Sep 21, 2015
  2. Sep 20, 2015
  3. Sep 19, 2015
    • [SPARK-10710] Remove ability to disable spilling in core and SQL · 2117eea7
      Josh Rosen authored
      It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`.
      
      This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
    • [SPARK-10155] [SQL] Change SqlParser to object to avoid memory leak · e789000b
      zsxwing authored
Since `scala.util.parsing.combinator.Parsers` has been thread-safe as of Scala 2.10 (see [SI-4929](https://issues.scala-lang.org/browse/SI-4929)), we can change `SqlParser` to an object to avoid the memory leak.
      
      I didn't change other subclasses of `scala.util.parsing.combinator.Parsers` because there is only one instance in one SQLContext, which should not be an issue.
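As a toy sketch of the class-vs-object distinction (illustrative Python, not Spark's actual Scala code): a parser instantiated per query re-allocates its internal state on every call, while a shared singleton — the analogue of a Scala `object` — builds it once, which is safe once the parser itself is thread-safe.

```python
# Toy sketch (not Spark's Scala code): a parser instantiated per query
# re-allocates its internal state every time, while a module-level
# singleton -- the Python analogue of a Scala `object` -- builds it once.
class PerCallParser:
    def __init__(self):
        # Heavy state rebuilt for every instance; this is what accumulated.
        self.table = [0] * 1024

# Built exactly once for the lifetime of the process.
SHARED_PARSER = PerCallParser()

def tables_allocated(n, shared):
    """Simulate n queries; count distinct state objects allocated."""
    instances = [SHARED_PARSER if shared else PerCallParser() for _ in range(n)]
    return len({id(p.table) for p in instances})

print(tables_allocated(1000, shared=False))  # 1000 distinct tables
print(tables_allocated(1000, shared=True))   # 1
```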
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8357 from zsxwing/sql-memory-leak.
    • Fixed links to the API · d83b6aae
      Alexis Seigneurin authored
      Submitting this change on the master branch as requested in https://github.com/apache/spark/pull/8819#issuecomment-141505941
      
      Author: Alexis Seigneurin <alexis.seigneurin@gmail.com>
      
      Closes #8838 from aseigneurin/patch-2.
    • [SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is wrong. · d507f9c0
      Kousuke Saruta authored
      In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong.
      
      /CC yhuai
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #8776 from sarutak/SPARK-10584-2.
    • [SPARK-10474] [SQL] Aggregation fails to allocate memory for pointer array · 7ff8d68c
      Andrew Or authored
When `TungstenAggregation` hits memory pressure, it switches from hash-based to sort-based aggregation in place. However, in the process we try to allocate the pointer array for writing to the new `UnsafeExternalSorter` *before* actually freeing the memory used by the hash map. This led to the following exception:
      ```
       java.io.IOException: Could not acquire 65536 bytes of memory
              at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
              at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
              at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
              at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
              at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
      ```
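The ordering issue can be sketched with a toy memory manager (hypothetical class, not Spark's actual `TaskMemoryManager`): under a fixed budget, the sorter's pointer array can only be acquired after the hash map's memory is released.

```python
# Toy memory manager (hypothetical API, not Spark's TaskMemoryManager)
# showing why the pointer array must be allocated only *after* the hash
# map's memory is freed when switching to sort-based aggregation.
class ToyMemoryManager:
    def __init__(self, budget):
        self.budget = budget
        self.used = 0

    def acquire(self, nbytes):
        if self.used + nbytes > self.budget:
            return False  # "Could not acquire ... bytes of memory"
        self.used += nbytes
        return True

    def release(self, nbytes):
        self.used -= nbytes

mm = ToyMemoryManager(budget=100)
assert mm.acquire(80)   # the aggregation hash map fills memory

# Buggy order: allocate the sorter's pointer array before freeing -> fails.
print(mm.acquire(64))   # False

# Fixed order: free the hash map first, then allocate.
mm.release(80)
print(mm.acquire(64))   # True
```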
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8827 from andrewor14/allocate-pointer-array.
  4. Sep 18, 2015
    • [SPARK-10623] [SQL] Fixes ORC predicate push-down · 22be2ae1
      Cheng Lian authored
      When pushing down a leaf predicate, ORC `SearchArgument` builder requires an extra "parent" predicate (any one among `AND`/`OR`/`NOT`) to wrap the leaf predicate. E.g., to push down `a < 1`, we must build `AND(a < 1)` instead. Fortunately, when actually constructing the `SearchArgument`, the builder will eliminate all those unnecessary wrappers.
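A toy model of that requirement (illustrative tuples, not the real ORC `SearchArgument` builder API): every leaf is wrapped in a parent `AND`, and the build step strips single-child wrappers, so `AND(a < 1)` collapses back to `a < 1`.

```python
# Toy model of the requirement (not the real ORC SearchArgument API):
# leaves are wrapped in a parent AND, and the final build step strips
# single-child wrappers, so AND(a < 1) collapses back to a < 1.
def less_than(col, value):
    return ("lt", col, value)       # a leaf predicate

def and_(*children):
    return ("and", list(children))  # a parent predicate

def push_leaf(col, value):
    # "to push down `a < 1`, we must build `AND(a < 1)` instead"
    return and_(less_than(col, value))

def simplify(pred):
    # The builder eliminates unnecessary single-child wrappers.
    if pred[0] == "and":
        children = [simplify(c) for c in pred[1]]
        return children[0] if len(children) == 1 else ("and", children)
    return pred

print(push_leaf("a", 1))            # ('and', [('lt', 'a', 1)])
print(simplify(push_leaf("a", 1)))  # ('lt', 'a', 1)
```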
      
This PR is based on #8783 authored by zhzhan. I also took the chance to simplify `OrcFilters` a little bit to improve readability.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8799 from liancheng/spark-10623/fix-orc-ppd.
    • [MINOR] [ML] override toString of AttributeGroup · c8149ef2
      Eric Liang authored
      This makes equality test failures much more readable.
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      Author: Eric Liang <ekhliang@gmail.com>
      
      Closes #8826 from ericl/attrgroupstr.
    • [SPARK-10611] Clone Configuration for each task for NewHadoopRDD · 8074208f
      Mingyu Kim authored
      This patch attempts to fix the Hadoop Configuration thread safety issue for NewHadoopRDD in the same way SPARK-2546 fixed the issue for HadoopRDD.
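The per-task clone can be sketched with a toy config class (Hadoop's real `new Configuration(conf)` copy constructor plays this role): handing each task its own copy isolates mutation of a shared, non-thread-safe Configuration object.

```python
# Sketch of the per-task clone (toy config class; Hadoop's real
# `new Configuration(conf)` copy constructor plays this role). Giving
# each task its own copy isolates mutation of a shared, non-thread-safe
# Configuration object.
import copy

class ToyConf:
    def __init__(self, props=None):
        self.props = dict(props or {})

    def clone(self):
        return ToyConf(copy.deepcopy(self.props))

shared = ToyConf({"split": "none"})

# Each "task" clones the conf, then customizes it for its own input split.
per_task = []
for i in range(4):
    c = shared.clone()
    c.props["split"] = f"part-{i}"
    per_task.append(c)

print([c.props["split"] for c in per_task])  # ['part-0', 'part-1', 'part-2', 'part-3']
print(shared.props["split"])                 # 'none' -- untouched by the tasks
```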
      
      Author: Mingyu Kim <mkim@palantir.com>
      
      Closes #8763 from mingyukim/mkim/SPARK-10611.
    • [SPARK-9808] Remove hash shuffle file consolidation. · 348d7c9a
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8812 from rxin/SPARK-9808-1.
    • [SPARK-10449] [SQL] Don't merge decimal types with incompatible precision or scales · 3a22b100
      Holden Karau authored
      From JIRA: Schema merging should only handle struct fields. But currently we also reconcile decimal precision and scale information.
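A hedged sketch of that rule (illustrative types, not Spark's actual `DecimalType`): decimals with different precision or scale should fail loudly instead of being silently reconciled.

```python
# Hedged sketch of the rule (illustrative, not Spark's DecimalType):
# schema merging may reconcile struct fields, but decimal types with
# different precision or scale should fail loudly instead of being
# silently reconciled.
from collections import namedtuple

Dec = namedtuple("Dec", ["precision", "scale"])

def merge_decimal(a, b):
    if a == b:
        return a
    raise ValueError(f"incompatible decimals: {a} vs {b}")

print(merge_decimal(Dec(10, 2), Dec(10, 2)))  # Dec(precision=10, scale=2)
try:
    merge_decimal(Dec(10, 2), Dec(5, 0))
except ValueError as e:
    print(e)  # incompatible decimals: ...
```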
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8634 from holdenk/SPARK-10449-dont-merge-different-precision.
    • [SPARK-10539] [SQL] Project should not be pushed down through Intersect or Except #8742 · c6f8135e
      Yijie Shen authored
Intersect and Except are both set operators, and they use all the columns to compare equality between rows. Pushing their parent Project down would change the relations they operate on, so the rewrite is not an equivalent transformation.
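Why the rewrite is not equivalence-preserving can be shown with plain sets standing in for relations (illustrative only, not Catalyst plans):

```python
# Why the rewrite is not equivalence-preserving, with plain sets standing
# in for relations (illustrative only, not Catalyst plans).
left = {(1, "a"), (2, "b")}
right = {(1, "x"), (3, "c")}

# Project AFTER Intersect: rows are compared on ALL columns first.
correct = {row[0] for row in left & right}  # no full-row match -> set()

# Project pushed BELOW Intersect: comparison now sees only column 0.
pushed = {row[0] for row in left} & {row[0] for row in right}

print(correct)  # set()
print(pushed)   # {1}
```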
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-10539
      
      I added some comments based on the fix of https://github.com/apache/spark/pull/8742.
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8823 from yhuai/fix_set_optimization.
    • [SPARK-10540] Fixes flaky all-data-type test · 00a2911c
      Cheng Lian authored
      This PR breaks the original test case into multiple ones (one test case for each data type). In this way, test failure output can be much more readable.
      
Within each test case, we build a table with two columns: one holds the data type under test, the other is an "index" column used to sort the DataFrame and work around [SPARK-10591] [1].
      
      [1]: https://issues.apache.org/jira/browse/SPARK-10591
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8768 from liancheng/spark-10540/test-all-data-types.
    • [SPARK-10615] [PYSPARK] change assertEquals to assertEqual · 35e8ab93
      Yanbo Liang authored
Since ```assertEquals``` is deprecated, we need to change ```assertEquals``` to ```assertEqual``` in the existing Python unit tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8814 from yanboliang/spark-10615.
    • [SPARK-10451] [SQL] Prevent unnecessary serializations in InMemoryColumnarTableScan · 20fd35df
      Yash Datta authored
      Many of the fields in InMemoryColumnar scan and InMemoryRelation can be made transient.
      
This reduces my 1000 ms job to about 700 ms. The task size drops from 2.8 MB to ~1.3 MB.
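The effect of excluding a field from serialization can be sketched with Python's pickle as a stand-in for Java serialization (toy classes, not Spark's `InMemoryRelation`): dropping a heavy, recomputable field via `__getstate__` is the analogue of marking it `@transient`.

```python
# Python analogue of marking a field @transient (toy classes, not Spark's
# InMemoryRelation): exclude a heavy, recomputable field from the pickled
# task payload via __getstate__.
import pickle

class WithField:
    def __init__(self, payload):
        self.payload = payload  # shipped with every task

class WithTransient:
    def __init__(self, payload):
        self.payload = payload

    def __getstate__(self):
        # Drop the heavy field before serialization, like @transient.
        state = self.__dict__.copy()
        state.pop("payload")
        return state

heavy = bytes(1 << 20)  # ~1 MB of recomputable data
print(len(pickle.dumps(WithField(heavy))) > 1_000_000)  # True
print(len(pickle.dumps(WithTransient(heavy))) < 1_000)  # True
```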
      
      Author: Yash Datta <Yash.Datta@guavus.com>
      
      Closes #8604 from saucam/serde.
    • [SPARK-10684] [SQL] StructType.interpretedOrdering need not be serialized · e3b5d6cb
      navis.ryu authored
      Kryo fails with buffer overflow even with max value (2G).
      
```
org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1
Serialization trace:
containsChild (org.apache.spark.sql.catalyst.expressions.BoundReference)
child (org.apache.spark.sql.catalyst.expressions.SortOrder)
array (scala.collection.mutable.ArraySeq)
ordering (org.apache.spark.sql.catalyst.expressions.InterpretedOrdering)
interpretedOrdering (org.apache.spark.sql.types.StructType)
schema (org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema). To avoid this, increase spark.kryoserializer.buffer.max value.
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:263)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
```
      
      Author: navis.ryu <navis@apache.org>
      
      Closes #8808 from navis/SPARK-10684.
    • Added <code> tag to documentation. · 74d8f7dd
      Reynold Xin authored
    • docs/running-on-mesos.md: state default values in default column · 9a56dcdf
      Felix Bechstein authored
      This PR simply uses the default value column for defaults.
      
      Author: Felix Bechstein <felix.bechstein@otto.de>
      
      Closes #8810 from felixb/fix_mesos_doc.
    • [SPARK-9522] [SQL] SparkSubmit process cannot exit if the application is killed while HiveThriftServer is starting · 93c7650a
      linweizhong authored
      
When we start HiveThriftServer, we start SparkContext first and then HiveServer2. If we kill the application while HiveServer2 is starting, the SparkContext stops successfully, but the SparkSubmit process cannot exit.
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #7853 from Sephiroth-Lin/SPARK-9522.
    • [SPARK-10682] [GRAPHX] Remove Bagel test suites. · d009da2f
      Reynold Xin authored
Bagel has been deprecated and we haven't made any changes to it, so there is no need to run those tests.
      
      This should speed up tests by 1 min.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8807 from rxin/SPARK-10682.
  5. Sep 17, 2015