  1. Sep 08, 2015
  2. Sep 07, 2015
    • [SPARK-9767] Remove ConnectionManager. · 5ffe752b
      Reynold Xin authored
      We introduced the Netty network module for shuffle in Spark 1.2, and it has been on by default for three releases. The old ConnectionManager is difficult to maintain. If we merge this patch now, then by the time it is released, ConnectionManager will have been off by default for a year. It's time to remove it.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8161 from rxin/SPARK-9767.
  3. Sep 05, 2015
  4. Sep 04, 2015
    • [SPARK-9925] [SQL] [TESTS] Set SQLConf.SHUFFLE_PARTITIONS.key correctly for tests · 47058ca5
      Yin Huai authored
      This PR fixes the failed test and the conflict for #8155
      
      https://issues.apache.org/jira/browse/SPARK-9925
      
      Closes #8155
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8602 from davies/shuffle_partitions.
    • [SPARK-10402] [DOCS] [ML] Add defaults to the scaladoc for params in ml/ · 22eab706
      Holden Karau authored
      We should make sure the scaladoc for params includes their default values throughout the models in ml/.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8591 from holdenk/SPARK-10402-add-scaladoc-for-default-values-of-params-in-ml.
    • [SPARK-10311] [STREAMING] Reload appId and attemptId when app starts with... · eafe3723
      xutingjun authored
      [SPARK-10311] [STREAMING] Reload appId and attemptId when app starts with checkpoint file in cluster mode
      
      Author: xutingjun <xutingjun@huawei.com>
      
      Closes #8477 from XuTingjun/streaming-attempt.
    • [SPARK-10454] [SPARK CORE] wait for empty event queue · 2e1c1755
      robbins authored
      Author: robbins <robbins@uk.ibm.com>
      
      Closes #8605 from robbinspg/DAGSchedulerSuite-fix.
    • [SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode. · b087d23e
      Timothy Chen authored
      Support running pyspark with cluster mode on Mesos!
      This doesn't upload any scripts, so when running against a remote Mesos cluster the user must specify the script via a URI that the cluster can reach.
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #8349 from tnachen/mesos_python.
    • [SPARK-10450] [SQL] Minor improvements to readability / style / typos etc. · 3339e6f6
      Andrew Or authored
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8603 from andrewor14/minor-sql-changes.
    • [SPARK-10176] [SQL] Show partially analyzed plans when checkAnswer fails to analyze · c3c0e431
      Wenchen Fan authored
      This PR takes over https://github.com/apache/spark/pull/8389.
      
      This PR improves `checkAnswer` to print the partially analyzed plan in addition to the user-friendly error message, in order to aid debugging of failing tests.
      
      In doing so, I ran into a conflict with the various ways we bring a SQLContext into the tests. Depending on the trait, the current context is referred to as `sqlContext`, `_sqlContext`, `ctx` or `hiveContext`, with access modifiers `public`, `protected` and `private` depending on the defining class.
      
      I propose we refactor as follows:
      
      1. All tests should refer only to a `protected sqlContext` when testing general features, and a `protected hiveContext` when testing functionality that only exists on `HiveContext`.
      2. All tests should import only `testImplicits._` (i.e., don't import `TestHive.implicits._`).
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8584 from cloud-fan/cleanupTests.
    • MAINTENANCE: Automated closing of pull requests. · 804a0126
      Michael Armbrust authored
      This commit exists to close the following pull requests on Github:
      
      Closes #1890 (requested by andrewor14, JoshRosen)
      Closes #3558 (requested by JoshRosen, marmbrus)
      Closes #3890 (requested by marmbrus)
      Closes #3895 (requested by andrewor14, marmbrus)
      Closes #4055 (requested by andrewor14)
      Closes #4105 (requested by andrewor14)
      Closes #4812 (requested by marmbrus)
      Closes #5109 (requested by andrewor14)
      Closes #5178 (requested by andrewor14)
      Closes #5298 (requested by marmbrus)
      Closes #5393 (requested by marmbrus)
      Closes #5449 (requested by andrewor14)
      Closes #5468 (requested by marmbrus)
      Closes #5715 (requested by marmbrus)
      Closes #6192 (requested by marmbrus)
      Closes #6319 (requested by marmbrus)
      Closes #6326 (requested by marmbrus)
      Closes #6349 (requested by marmbrus)
      Closes #6380 (requested by andrewor14)
      Closes #6554 (requested by marmbrus)
      Closes #6696 (requested by marmbrus)
      Closes #6868 (requested by marmbrus)
      Closes #6951 (requested by marmbrus)
      Closes #7129 (requested by marmbrus)
      Closes #7188 (requested by marmbrus)
      Closes #7358 (requested by marmbrus)
      Closes #7379 (requested by marmbrus)
      Closes #7628 (requested by marmbrus)
      Closes #7715 (requested by marmbrus)
      Closes #7782 (requested by marmbrus)
      Closes #7914 (requested by andrewor14)
      Closes #8051 (requested by andrewor14)
      Closes #8269 (requested by andrewor14)
      Closes #8448 (requested by andrewor14)
      Closes #8576 (requested by andrewor14)
    • [MINOR] Minor style fix in SparkR · 143e521d
      Shivaram Venkataraman authored
      `dev/lintr-r` passes on my machine now
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8601 from shivaram/sparkr-style-fix.
  5. Sep 03, 2015
  6. Sep 02, 2015
    • [SPARK-9723] [ML] params getordefault should throw more useful error · 44948a2e
      Holden Karau authored
      Params.getOrDefault should throw a more meaningful exception than what you get from a bad key lookup.
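      The behavior described can be sketched in plain Python; the class and attribute names below are illustrative, not Spark's actual ML API:

```python
# Hypothetical sketch: a getOrDefault that names the missing param
# instead of surfacing a bare KeyError from the underlying dict lookup.
class Params:
    def __init__(self):
        self._paramMap = {}         # explicitly set values
        self._defaultParamMap = {}  # default values

    def getOrDefault(self, param):
        if param in self._paramMap:
            return self._paramMap[param]
        if param in self._defaultParamMap:
            return self._defaultParamMap[param]
        # Descriptive error: says which param is missing and where we looked.
        raise KeyError(
            "Cannot find param %r in the param map or its default value" % param)

p = Params()
p._defaultParamMap["maxIter"] = 10
print(p.getOrDefault("maxIter"))  # 10
```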
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8567 from holdenk/SPARK-9723-params-getordefault-should-throw-more-useful-error.
    • [SPARK-10422] [SQL] String column in InMemoryColumnarCache needs to override clone method · 03f3e91f
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-10422
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8578 from yhuai/SPARK-10422.
    • [SPARK-10417] [SQL] Iterating through Column results in infinite loop · 6cd98c18
      0x0FFF authored
      The `pyspark.sql.column.Column` object has a `__getitem__` method, which makes it iterable in Python. `__getitem__` exists to handle the case where the column holds a list or dict, so that you can access individual elements through the DataFrame API. The ability to iterate over the column is just a side effect, and it can confuse people who are new to Spark DataFrames (since iterating this way works on a pandas DataFrame, for instance).
      
      Issue reproduction:
      ```
      df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
      for i in df["name"]: print i
      ```
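      The root cause can be shown without Spark at all: Python's legacy iteration protocol treats any object with a `__getitem__` as iterable, calling `obj[0]`, `obj[1]`, ... until `IndexError` is raised. A minimal stand-in class (not Spark's actual `Column`) reproduces the endless iteration:

```python
class FakeColumn:
    """Stand-in for pyspark.sql.column.Column: defines __getitem__
    for element access but never raises IndexError."""
    def __getitem__(self, key):
        return ("item", key)  # always succeeds, so iteration never stops

col = FakeColumn()
it = iter(col)       # works even though no __iter__ is defined
print(next(it))      # ('item', 0)
print(next(it))      # ('item', 1) -- and so on, forever
```

A typical remedy is to make such a class explicitly non-iterable, e.g. by defining an `__iter__` that raises `TypeError`.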
      
      Author: 0x0FFF <programmerag@gmail.com>
      
      Closes #8574 from 0x0FFF/SPARK-10417.
    • [SPARK-10004] [SHUFFLE] Perform auth checks when clients read shuffle data. · 2da3a9e9
      Marcelo Vanzin authored
      To correctly isolate applications, when requests to read shuffle data
      arrive at the shuffle service, proper authorization checks need to
      be performed. This change makes sure that only the application that
      created the shuffle data can read from it.
      
      Such checks are only enabled when "spark.authenticate" is enabled,
      otherwise there's no secure way to make sure that the client is really
      who it says it is.
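      A hypothetical sketch of that rule in Python (class and method names are illustrative, not the shuffle service's real API):

```python
class ExternalShuffleService:
    """Toy model: shuffle blocks are registered per application, and a
    client authenticated as one app may only read that app's blocks."""
    def __init__(self, authenticate=True):
        # Mirrors "spark.authenticate": with it off, the client's claimed
        # identity cannot be trusted, so no check is possible.
        self.authenticate = authenticate
        self._blocks = {}  # (app_id, block_id) -> bytes

    def register_block(self, app_id, block_id, data):
        self._blocks[(app_id, block_id)] = data

    def open_block(self, client_app_id, app_id, block_id):
        if self.authenticate and client_app_id != app_id:
            raise PermissionError(
                "app %s may not read shuffle data owned by %s"
                % (client_app_id, app_id))
        return self._blocks[(app_id, block_id)]
```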
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8218 from vanzin/SPARK-10004.
    • [SPARK-10389] [SQL] support order by non-attribute grouping expression on Aggregate · fc483077
      Wenchen Fan authored
      For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this in Spark SQL.
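      The query shape in question, demonstrated here with Python's built-in sqlite3 (an assumption: SQLite, like PostgreSQL, already accepts it; this commit teaches Spark SQL to do the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key INTEGER, value INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 5), (2, 30)])

# ORDER BY reuses the non-attribute grouping expression "key + 1".
rows = conn.execute(
    "SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1"
).fetchall()
print(rows)  # [(20,), (30,)]
```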
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8548 from cloud-fan/support-order-by-non-attribute.
    • [SPARK-10034] [SQL] add regression test for Sort on Aggregate · 56c4c172
      Wenchen Fan authored
      Before #8371, there was a bug for `Sort` on `Aggregate`: we couldn't use an aggregate expression named `_aggOrdering`, and we couldn't use more than one ordering expression containing aggregate functions. The reason for the bug: the aggregate expression in `SortOrder` never gets resolved; we alias it with `_aggOrdering` and call `toAttribute`, which gives us an `UnresolvedAttribute`. So we were actually referencing the aggregate expression by name, not by exprId as we thought. If there was already an aggregate expression named `_aggOrdering`, or more than one ordering expression contained aggregate functions, the names conflicted and the lookup by name failed.
      
      However, after #8371 was merged, the `SortOrder`s are guaranteed to be resolved and we always reference aggregate expressions by exprId. The bug no longer exists, and this PR adds regression tests for it.
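      The formerly problematic shape (multiple ordering expressions containing aggregate functions, none of which appears in the select list) can be illustrated with Python's built-in sqlite3; using SQLite here is an assumption for illustration, while the regression tests in this PR target Spark SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key INTEGER, value INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 5), (2, 30)])

# Two ordering expressions with aggregate functions, neither of which
# appears in the select list.
rows = conn.execute(
    "SELECT key FROM src GROUP BY key ORDER BY MAX(value) DESC, MIN(value)"
).fetchall()
print(rows)  # [(2,), (1,)]
```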
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8231 from cloud-fan/sort-agg.