  1. Sep 05, 2015
  2. Sep 04, 2015
    • [SPARK-9925] [SQL] [TESTS] Set SQLConf.SHUFFLE_PARTITIONS.key correctly for tests · 47058ca5
      Yin Huai authored
      This PR fixes the failed test and resolves the conflict for #8155.
      
      https://issues.apache.org/jira/browse/SPARK-9925
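
      For context, `SQLConf.SHUFFLE_PARTITIONS.key` resolves to `spark.sql.shuffle.partitions`; a minimal sketch of how a test might set it (assuming an ambient `sqlContext`; the value is illustrative):

      ```python
      # Hedged sketch: tests should go through SQLConf.SHUFFLE_PARTITIONS.key,
      # i.e. "spark.sql.shuffle.partitions", rather than a hard-coded variant.
      sqlContext.setConf("spark.sql.shuffle.partitions", "5")
      ```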
      
      Closes #8155
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8602 from davies/shuffle_partitions.
    • [SPARK-10402] [DOCS] [ML] Add defaults to the scaladoc for params in ml/ · 22eab706
      Holden Karau authored
      We should make sure the scaladoc for params includes their default values throughout the models in ml/.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8591 from holdenk/SPARK-10402-add-scaladoc-for-default-values-of-params-in-ml.
    • [SPARK-10311] [STREAMING] Reload appId and attemptId when app starts with checkpoint file in cluster mode · eafe3723
      xutingjun authored
      [SPARK-10311] [STREAMING] Reload appId and attemptId when app starts with checkpoint file in cluster mode
      
      Author: xutingjun <xutingjun@huawei.com>
      
      Closes #8477 from XuTingjun/streaming-attempt.
    • [SPARK-10454] [SPARK CORE] wait for empty event queue · 2e1c1755
      robbins authored
      Author: robbins <robbins@uk.ibm.com>
      
      Closes #8605 from robbinspg/DAGSchedulerSuite-fix.
    • [SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode. · b087d23e
      Timothy Chen authored
      Support running pyspark with cluster mode on Mesos!
      This doesn't upload any scripts, so running against a remote Mesos cluster requires the user to specify the script via an accessible URI.
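
      A hedged illustration of the submission (the dispatcher host and script URL are hypothetical):

      ```
      spark-submit --master mesos://dispatcher-host:7077 \
        --deploy-mode cluster \
        http://example.com/my_pyspark_job.py
      ```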
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #8349 from tnachen/mesos_python.
    • [SPARK-10450] [SQL] Minor improvements to readability / style / typos etc. · 3339e6f6
      Andrew Or authored
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8603 from andrewor14/minor-sql-changes.
    • [SPARK-10176] [SQL] Show partially analyzed plans when checkAnswer fails to analyze · c3c0e431
      Wenchen Fan authored
      This PR takes over https://github.com/apache/spark/pull/8389.
      
      This PR improves `checkAnswer` to print the partially analyzed plan in addition to the user-friendly error message, in order to aid debugging failing tests.
      
      In doing so, I ran into a conflict with the various ways that we bring a SQLContext into the tests. Depending on the trait, the current context is referred to as `sqlContext`, `_sqlContext`, `ctx`, or `hiveContext`, with `public`, `protected`, or `private` access modifiers according to the defining class.
      
      I propose we refactor as follows:
      
      1. All tests should refer only to a `protected sqlContext` when testing general features, and to a `protected hiveContext` when testing a method that only exists on `HiveContext`.
      2. All tests should import only `testImplicits._` (i.e., don't import `TestHive.implicits._`).
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8584 from cloud-fan/cleanupTests.
    • MAINTENANCE: Automated closing of pull requests. · 804a0126
      Michael Armbrust authored
      This commit exists to close the following pull requests on Github:
      
      Closes #1890 (requested by andrewor14, JoshRosen)
      Closes #3558 (requested by JoshRosen, marmbrus)
      Closes #3890 (requested by marmbrus)
      Closes #3895 (requested by andrewor14, marmbrus)
      Closes #4055 (requested by andrewor14)
      Closes #4105 (requested by andrewor14)
      Closes #4812 (requested by marmbrus)
      Closes #5109 (requested by andrewor14)
      Closes #5178 (requested by andrewor14)
      Closes #5298 (requested by marmbrus)
      Closes #5393 (requested by marmbrus)
      Closes #5449 (requested by andrewor14)
      Closes #5468 (requested by marmbrus)
      Closes #5715 (requested by marmbrus)
      Closes #6192 (requested by marmbrus)
      Closes #6319 (requested by marmbrus)
      Closes #6326 (requested by marmbrus)
      Closes #6349 (requested by marmbrus)
      Closes #6380 (requested by andrewor14)
      Closes #6554 (requested by marmbrus)
      Closes #6696 (requested by marmbrus)
      Closes #6868 (requested by marmbrus)
      Closes #6951 (requested by marmbrus)
      Closes #7129 (requested by marmbrus)
      Closes #7188 (requested by marmbrus)
      Closes #7358 (requested by marmbrus)
      Closes #7379 (requested by marmbrus)
      Closes #7628 (requested by marmbrus)
      Closes #7715 (requested by marmbrus)
      Closes #7782 (requested by marmbrus)
      Closes #7914 (requested by andrewor14)
      Closes #8051 (requested by andrewor14)
      Closes #8269 (requested by andrewor14)
      Closes #8448 (requested by andrewor14)
      Closes #8576 (requested by andrewor14)
    • [MINOR] Minor style fix in SparkR · 143e521d
      Shivaram Venkataraman authored
      `dev/lintr-r` passes on my machine now
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8601 from shivaram/sparkr-style-fix.
  3. Sep 03, 2015
  4. Sep 02, 2015
    • [SPARK-9723] [ML] params getordefault should throw more useful error · 44948a2e
      Holden Karau authored
      Params.getOrDefault should throw a more meaningful exception than what you get from a bad key lookup.
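
      A minimal Python sketch of the idea (names here are hypothetical, not the actual implementation): surface the missing parameter instead of a bare lookup failure.

      ```python
      # Hypothetical sketch: report which param has no value or default,
      # rather than letting a bare key-lookup error escape.
      def get_or_default(param_map, defaults, param):
          if param in param_map:
              return param_map[param]
          if param in defaults:
              return defaults[param]
          raise KeyError("Failed to find a default value for param %r" % param)
      ```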
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8567 from holdenk/SPARK-9723-params-getordefault-should-throw-more-useful-error.
    • [SPARK-10422] [SQL] String column in InMemoryColumnarCache needs to override clone method · 03f3e91f
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-10422
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8578 from yhuai/SPARK-10422.
    • [SPARK-10417] [SQL] Iterating through Column results in infinite loop · 6cd98c18
      0x0FFF authored
      The `pyspark.sql.column.Column` object has a `__getitem__` method, which makes it iterable in Python. In fact, `__getitem__` exists to handle the case where the column holds a list or dict, so that you can access a particular element of it in the DataFrame API. The ability to iterate over the column is just a side effect that can confuse people getting familiar with Spark DataFrames (since you might iterate this way over a Pandas DataFrame, for instance).
      
      Issue reproduction:
      ```
      df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
      for i in df["name"]: print i
      ```
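
      A hedged sketch of the guard (the exact message may differ): defining `__iter__` to fail fast stops Python from falling back to `__getitem__`-based iteration.

      ```python
      # Sketch: a class with __getitem__ but no __iter__ is iterable via the
      # legacy sequence protocol; raising in __iter__ disables that fallback.
      class Column(object):
          def __getitem__(self, key):
              return "item access for %r" % key  # stand-in for real field/element access

          def __iter__(self):
              raise TypeError("Column is not iterable")
      ```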
      
      Author: 0x0FFF <programmerag@gmail.com>
      
      Closes #8574 from 0x0FFF/SPARK-10417.
    • [SPARK-10004] [SHUFFLE] Perform auth checks when clients read shuffle data. · 2da3a9e9
      Marcelo Vanzin authored
      To correctly isolate applications, when requests to read shuffle data
      arrive at the shuffle service, proper authorization checks need to
      be performed. This change makes sure that only the application that
      created the shuffle data can read from it.
      
      Such checks are only enabled when "spark.authenticate" is enabled,
      otherwise there's no secure way to make sure that the client is really
      who it says it is.
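
      A hedged configuration sketch (the secret value is hypothetical, and cluster managers may distribute the secret differently):

      ```python
      from pyspark import SparkConf

      # The authorization checks on shuffle reads only apply when
      # authentication is enabled.
      conf = (SparkConf()
              .set("spark.authenticate", "true")
              .set("spark.authenticate.secret", "my-shared-secret")  # hypothetical
              .set("spark.shuffle.service.enabled", "true"))
      ```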
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8218 from vanzin/SPARK-10004.
    • [SPARK-10389] [SQL] support order by non-attribute grouping expression on Aggregate · fc483077
      Wenchen Fan authored
      For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this in Spark SQL.
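
      For instance, a hedged PySpark equivalent (assuming a table `src(key, value)` is registered):

      ```python
      # After this change, the ORDER BY can reuse the non-attribute grouping
      # expression `key + 1` even though it is not a plain output column.
      sqlContext.sql(
          "SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1").show()
      ```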
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8548 from cloud-fan/support-order-by-non-attribute.
    • [SPARK-10034] [SQL] add regression test for Sort on Aggregate · 56c4c172
      Wenchen Fan authored
      Before #8371, there was a bug for `Sort` on `Aggregate`: we couldn't use aggregate expressions named `_aggOrdering`, and we couldn't use more than one ordering expression containing aggregate functions. The reason for this bug is that the aggregate expression in `SortOrder` never gets resolved; we alias it with `_aggOrdering` and call `toAttribute`, which gives us an `UnresolvedAttribute`. So we were actually referencing the aggregate expression by name, not by exprId as we thought. And if there was already an aggregate expression named `_aggOrdering`, or there was more than one ordering expression containing aggregate functions, we had conflicting names and couldn't search by name.
      
      However, after #8371 was merged, the `SortOrder`s are guaranteed to be resolved and we always reference aggregate expressions by exprId. The bug doesn't exist anymore, and this PR adds regression tests for it.
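
      A hedged sketch of the query shape the regression tests cover (the table `src(key, value)` is hypothetical): multiple ordering expressions that contain aggregate functions.

      ```python
      # Previously, two aggregate-bearing ordering expressions could collide on
      # the internal `_aggOrdering` alias; with exprId-based resolution this is safe.
      sqlContext.sql(
          "SELECT key, MAX(value) FROM src GROUP BY key "
          "ORDER BY MAX(value), MIN(value)").show()
      ```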
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8231 from cloud-fan/sort-agg.
    • [SPARK-7336] [HISTORYSERVER] Fix bug where application status is shown incorrectly on the JobHistory UI. · c3b881a7
      Chuan Shao authored
      Author: ArcherShao <shaochuan@huawei.com>
      
      Closes #5886 from ArcherShao/SPARK-7336.
  5. Sep 01, 2015
    • [SPARK-10392] [SQL] Pyspark - Wrong DateType support on JDBC connection · 00d9af5e
      0x0FFF authored
      This PR addresses issue [SPARK-10392](https://issues.apache.org/jira/browse/SPARK-10392)
      The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark class DateType returns 0 instead of a `datetime.date`, due to the implementation of its return statement.
      
      Issue reproduction on master:
      ```
      >>> from pyspark.sql.types import *
      >>> a = DateType()
      >>> a.fromInternal(0)
      0
      >>> a.fromInternal(1)
      datetime.date(1970, 1, 2)
      ```
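
      The likely culprit is Python's falsy zero; a self-contained sketch of the pattern (illustrative, not the exact pyspark source):

      ```python
      import datetime

      EPOCH_ORDINAL = datetime.datetime(1970, 1, 1).toordinal()

      def from_internal_buggy(v):
          # `v and ...` short-circuits to v itself when v == 0,
          # so day zero leaks out as the int 0.
          return v and datetime.date.fromordinal(v + EPOCH_ORDINAL)

      def from_internal_fixed(v):
          # Test for None explicitly so that 0 (i.e., 1970-01-01) converts correctly.
          if v is not None:
              return datetime.date.fromordinal(v + EPOCH_ORDINAL)

      print(from_internal_buggy(0))  # 0
      print(from_internal_fixed(0))  # 1970-01-01
      ```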
      
      Author: 0x0FFF <programmerag@gmail.com>
      
      Closes #8556 from 0x0FFF/SPARK-10392.
    • [SPARK-10162] [SQL] Fix the timezone omission in the PySpark DataFrame filter function · bf550a4b
      0x0FFF authored
      This PR addresses [SPARK-10162](https://issues.apache.org/jira/browse/SPARK-10162)
      The issue is with the DataFrame filter() function when a datetime.datetime is passed to it:
      * Timezone information of this datetime is ignored
      * This datetime is assumed to be in local timezone, which depends on the OS timezone setting
      
      The fix includes both a code change and a regression test. Problem reproduction code on master:
      ```python
      import pytz
      from datetime import datetime
      from pyspark.sql import *
      from pyspark.sql.types import *
      sqc = SQLContext(sc)
      df = sqc.createDataFrame([], StructType([StructField("dt", TimestampType())]))
      
      m1 = pytz.timezone('UTC')
      m2 = pytz.timezone('Etc/GMT+3')
      
      df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
      df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
      ```
      It gives the same timestamp regardless of the time zone:
      ```
      >>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
      Filter (dt#0 > 946713600000000)
       Scan PhysicalRDD[dt#0]
      
      >>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
      Filter (dt#0 > 946713600000000)
       Scan PhysicalRDD[dt#0]
      ```
      After the fix:
      ```
      >>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
      Filter (dt#0 > 946684800000000)
       Scan PhysicalRDD[dt#0]
      
      >>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
      Filter (dt#0 > 946695600000000)
       Scan PhysicalRDD[dt#0]
      ```
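
      The numbers line up with converting through UTC: 946684800 is 2000-01-01 00:00:00 UTC, and Etc/GMT+3 (UTC-3 under POSIX sign convention) adds 10800 seconds. A hedged sketch of the conversion shape (illustrative, not the exact pyspark code):

      ```python
      import calendar
      import time
      from datetime import datetime

      import pytz

      def to_internal_us(dt):
          # Respect tzinfo when present; fall back to the OS-local
          # interpretation otherwise.
          if dt.tzinfo is not None:
              seconds = calendar.timegm(dt.utctimetuple())
          else:
              seconds = time.mktime(dt.timetuple())
          return int(seconds) * 1000000 + dt.microsecond

      print(to_internal_us(datetime(2000, 1, 1, tzinfo=pytz.timezone('UTC'))))        # 946684800000000
      print(to_internal_us(datetime(2000, 1, 1, tzinfo=pytz.timezone('Etc/GMT+3'))))  # 946695600000000
      ```
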
      PR [8536](https://github.com/apache/spark/pull/8536) was accidentally closed by me when I dropped the repo.
      
      Author: 0x0FFF <programmerag@gmail.com>
      
      Closes #8555 from 0x0FFF/SPARK-10162.
    • [SPARK-4223] [CORE] Support * in acls. · ec012805
      zhuol authored
      SPARK-4223.
      
      Currently we support setting view and modify ACLs, but you have to specify a list of users. It would be nice to support `*`, meaning all users have access (see the config sketch after the test list below).
      
      Manual tests verify that "*" works for any user in:
      a. Spark UI: view and kill stage. Done.
      b. Spark history server. Done.
      c. YARN application killing. Done.
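
      A hedged sketch of the wildcard configuration (property names are the standard Spark ACL settings; values illustrative):

      ```python
      from pyspark import SparkConf

      conf = (SparkConf()
              .set("spark.acls.enable", "true")
              .set("spark.ui.view.acls", "*")   # any user may view
              .set("spark.modify.acls", "*"))   # any user may kill jobs/stages
      ```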
      
      Author: zhuol <zhuol@yahoo-inc.com>
      
      Closes #8398 from zhuoliu/4223.
    • [SPARK-10398] [DOCS] Migrate Spark download page to use new lua mirroring scripts · 3f63bd60
      Sean Owen authored
      Migrate Apache download closer.cgi refs to new closer.lua
      
      This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #8557 from srowen/SPARK-10398.