  1. Sep 18, 2015
    • Reynold Xin's avatar
      [SPARK-9808] Remove hash shuffle file consolidation. · 348d7c9a
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8812 from rxin/SPARK-9808-1.
      348d7c9a
    • Holden Karau's avatar
      [SPARK-10449] [SQL] Don't merge decimal types with incompatible precision or scales · 3a22b100
      Holden Karau authored
      From JIRA: Schema merging should only handle struct fields. But currently we also reconcile decimal precision and scale information.
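      A minimal sketch of the rule the title describes, assuming that two decimal types merge only when precision and scale match exactly. `DecimalType` and `merge_decimal` here are hypothetical Python stand-ins for illustration, not Spark's actual classes:

```python
from collections import namedtuple

# Hypothetical stand-in for a fixed-precision decimal column type.
DecimalType = namedtuple("DecimalType", ["precision", "scale"])


def merge_decimal(left, right):
    """Return the shared type, or raise if precision or scale differ."""
    if left == right:
        return left
    # Differing precision or scale is treated as a conflict, not reconciled.
    raise ValueError(
        "cannot merge decimal(%d,%d) with decimal(%d,%d)"
        % (left.precision, left.scale, right.precision, right.scale)
    )
```

      Raising instead of widening keeps schema merging from silently changing how stored values are interpreted.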
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8634 from holdenk/SPARK-10449-dont-merge-different-precision.
      3a22b100
    • Yijie Shen's avatar
      [SPARK-10539] [SQL] Project should not be pushed down through Intersect or Except #8742 · c6f8135e
      Yijie Shen authored
      Intersect and Except are both set operators, and they use all the columns to compare rows for equality. Pushing their Project parent down changes the relations they operate on, so the rewrite is not an equivalent transformation.
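      The non-equivalence can be seen with plain Python sets. This is only an illustration of the argument above; `project`, `intersect`, and `except_` are hypothetical helpers, not Spark operators:

```python
def project(rows, columns):
    """Keep only the given column positions of each row (a SQL-style Project)."""
    return [tuple(row[c] for c in columns) for row in rows]


def intersect(left, right):
    """Rows present in both inputs, compared on all columns."""
    return set(left) & set(right)


def except_(left, right):
    """Rows of the left input that are absent from the right input."""
    return set(left) - set(right)


left = [(1, "a"), (2, "b")]
right = [(1, "x"), (2, "b")]

# Project applied above the set operator (the original plan).
intersect_then_project = set(project(intersect(left, right), [0]))
except_then_project = set(project(except_(left, right), [0]))

# Project pushed below the set operator (the invalid rewrite).
project_then_intersect = intersect(project(left, [0]), project(right, [0]))
project_then_except = except_(project(left, [0]), project(right, [0]))
```

      With this data the original Intersect plan yields only the key 2, while the pushed-down plan yields both 1 and 2; for Except the original plan yields the key 1, while the pushed-down plan yields nothing.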
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-10539
      
      I added some comments based on the fix of https://github.com/apache/spark/pull/8742.
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8823 from yhuai/fix_set_optimization.
      c6f8135e
    • Cheng Lian's avatar
      [SPARK-10540] Fixes flaky all-data-type test · 00a2911c
      Cheng Lian authored
      This PR breaks the original test case into multiple ones (one test case for each data type). In this way, test failure output can be much more readable.
      
      Within each test case, we build a table with two columns: one holds the data type under test, and the other is an "index" column, which is used to sort the DataFrame and work around [SPARK-10591] [1]
      
      [1]: https://issues.apache.org/jira/browse/SPARK-10591
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8768 from liancheng/spark-10540/test-all-data-types.
      00a2911c
    • Yanbo Liang's avatar
      [SPARK-10615] [PYSPARK] change assertEquals to assertEqual · 35e8ab93
      Yanbo Liang authored
      `assertEquals` is deprecated, so we need to change `assertEquals` to `assertEqual` in the existing Python unit tests.
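      A small, self-contained sketch of the preferred spelling. The test case and its word-count logic are hypothetical and unrelated to the PySpark test suite; note that the `assertEquals` alias was later removed from `unittest` in Python 3.12:

```python
import io
import unittest


class WordCountTest(unittest.TestCase):
    def test_counts(self):
        counts = {}
        for word in "a b a".split():
            counts[word] = counts.get(word, 0) + 1
        # assertEqual is the supported name; assertEquals is the deprecated alias.
        self.assertEqual(counts, {"a": 2, "b": 1})


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```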
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8814 from yanboliang/spark-10615.
      35e8ab93
    • Yash Datta's avatar
      [SPARK-10451] [SQL] Prevent unnecessary serializations in InMemoryColumnarTableScan · 20fd35df
      Yash Datta authored
      Many of the fields in InMemoryColumnarTableScan and InMemoryRelation can be made transient.

      This reduces my 1000 ms job to about 700 ms. The task size drops from 2.8 MB to ~1300 KB.
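      The change itself is in Scala, where such fields are marked transient. As a loose analogy only, the same idea in Python is to drop derived state from the pickled form and rebuild it on demand. `CachedScan` and its fields are hypothetical:

```python
import pickle


class CachedScan:
    """Hypothetical operator that carries large, recomputable derived state."""

    def __init__(self, columns):
        self.columns = columns
        self._stats = self._build_stats()

    def _build_stats(self):
        # Stand-in for state that is cheap to rebuild from `columns`.
        return list(range(100000))

    def __getstate__(self):
        # Exclude the derived field from the serialized form.
        state = self.__dict__.copy()
        del state["_stats"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._stats = None  # rebuilt lazily on first access

    @property
    def stats(self):
        if self._stats is None:
            self._stats = self._build_stats()
        return self._stats


scan = CachedScan(["a", "b"])
payload = pickle.dumps(scan)
restored = pickle.loads(payload)
```

      Only `columns` travels in the payload; the receiver pays the rebuild cost once, and only if it reads `stats`.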
      
      Author: Yash Datta <Yash.Datta@guavus.com>
      
      Closes #8604 from saucam/serde.
      20fd35df
    • navis.ryu's avatar
      [SPARK-10684] [SQL] StructType.interpretedOrdering need not be serialized · e3b5d6cb
      navis.ryu authored
      Kryo fails with buffer overflow even with max value (2G).
      
      {noformat}
      org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1
      Serialization trace:
      containsChild (org.apache.spark.sql.catalyst.expressions.BoundReference)
      child (org.apache.spark.sql.catalyst.expressions.SortOrder)
      array (scala.collection.mutable.ArraySeq)
      ordering (org.apache.spark.sql.catalyst.expressions.InterpretedOrdering)
      interpretedOrdering (org.apache.spark.sql.types.StructType)
      schema (org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema). To avoid this, increase spark.kryoserializer.buffer.max value.
              at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:263)
              at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      {noformat}
      
      Author: navis.ryu <navis@apache.org>
      
      Closes #8808 from navis/SPARK-10684.
      e3b5d6cb
    • Reynold Xin's avatar
      Added <code> tag to documentation. · 74d8f7dd
      Reynold Xin authored
      74d8f7dd
    • Felix Bechstein's avatar
      docs/running-on-mesos.md: state default values in default column · 9a56dcdf
      Felix Bechstein authored
      This PR simply uses the default value column for defaults.
      
      Author: Felix Bechstein <felix.bechstein@otto.de>
      
      Closes #8810 from felixb/fix_mesos_doc.
      9a56dcdf
    • linweizhong's avatar
      [SPARK-9522] [SQL] SparkSubmit process can not exit if kill application when HiveThriftServer was starting · 93c7650a
      linweizhong authored
      When we start HiveThriftServer, we start SparkContext first and then start HiveServer2. If we kill the application while HiveServer2 is starting, SparkContext stops successfully, but the SparkSubmit process cannot exit.
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #7853 from Sephiroth-Lin/SPARK-9522.
      93c7650a
    • Reynold Xin's avatar
      [SPARK-10682] [GRAPHX] Remove Bagel test suites. · d009da2f
      Reynold Xin authored
      Bagel has been deprecated and we haven't made any changes to it. There is no need to run those tests.
      
      This should speed up tests by 1 min.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8807 from rxin/SPARK-10682.
      d009da2f
  2. Sep 17, 2015
  3. Sep 16, 2015
  4. Sep 15, 2015
    • Joseph K. Bradley's avatar
      [SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups · b921fe4d
      Joseph K. Bradley authored
      Various ML guide cleanups.
      
      * ml-guide.md: Make it easier to access the algorithm-specific guides.
      * LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically.  E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics.
      * mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec”
      * Clean up Binarizer user guide a little.
      * Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place.
      * spark.ml Word2Vec user guide: clean up grammar/writing
      * Chi Sq Feature Selector docs: Improve text in doc.
      
      CC: mengxr feynmanliang
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8752 from jkbradley/mlguide-fixes-1.5.
      b921fe4d