  1. Jul 06, 2016
  2. Jun 18, 2016
    • [SPARK-15973][PYSPARK] Fix GroupedData Documentation · e574c997
      Josh Howes authored
      *This contribution is my original work, and I license the work to the project under the project's open source license.*
      
      ## What changes were proposed in this pull request?
      
      Documentation updates to PySpark's GroupedData
      
      ## How was this patch tested?
      
      Manual Tests
      
      Author: Josh Howes <josh.howes@gmail.com>
      Author: Josh Howes <josh.howes@maxpoint.com>
      
      Closes #13724 from josh-howes/bugfix/SPARK-15973.
  3. May 23, 2016
    • [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code · a15ca553
      WeichenXu authored
      
      ## What changes were proposed in this pull request?
      
      Replace SQLContext and SparkContext with SparkSession, created via the builder pattern, in the Python test code.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #13242 from WeichenXu123/python_doctest_update_sparksession.
  4. Jan 14, 2016
    • [SPARK-12756][SQL] use hash expression in Exchange · 962e9bcf
      Wenchen Fan authored
      This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is the same between shuffle and bucketed data sources. This lets us shuffle only one side when joining a bucketed table with a normal one.
      
      This PR also fixes the tests that are broken by the new hash behaviour in shuffle.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
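A minimal sketch of why a shared hash function lets the join skip one shuffle. This is plain Python, not Spark. `partition_id`, `bucketize` and `join_with_bucketed` are hypothetical names, and CRC-32 stands in for Spark's Murmur3-based hash. If the bucketed table was written with the same function and bucket count that the shuffle uses, matching keys land in the same bucket index. Only the normal table then has to be redistributed.

```python
import zlib
from collections import defaultdict


def partition_id(key: str, num_partitions: int) -> int:
    # Deterministic stand-in hash (CRC-32); Spark's real hash expression is Murmur3-based.
    # zlib.crc32 returns an unsigned value in Python 3, so plain % is non-negative.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def bucketize(rows, num_buckets):
    """Group (key, value) rows by partition_id: used for writing buckets AND for shuffling."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[partition_id(key, num_buckets)].append((key, value))
    return buckets


def join_with_bucketed(bucketed, other_rows, num_buckets):
    # Only the non-bucketed side is shuffled. Because both sides use the same
    # partition_id, equal keys are guaranteed to share a bucket index.
    shuffled = bucketize(other_rows, num_buckets)
    out = []
    for b in range(num_buckets):
        for lk, lv in bucketed.get(b, []):
            for rk, rv in shuffled.get(b, []):
                if lk == rk:
                    out.append((lk, lv, rv))
    return out
```

If the two sides used different hash functions, equal keys could end up in different bucket indexes, and the bucketed side would have to be reshuffled as well.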
  5. Dec 07, 2015
  6. Nov 25, 2015
  7. Nov 24, 2015
    • [SPARK-11946][SQL] Audit pivot API for 1.6. · f3152722
      Reynold Xin authored
      Currently, pivot's signatures look like this:
      
      ```scala
      @scala.annotation.varargs
      def pivot(pivotColumn: Column, values: Column*): GroupedData
      
      @scala.annotation.varargs
      def pivot(pivotColumn: String, values: Any*): GroupedData
      ```
      
      I think we can remove the one that takes "Column" types, since callers should always be passing in literals. It would also be clearer if the values were not varargs, but rather a Seq or java.util.List.
      
      I also made similar changes for Python.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9929 from rxin/SPARK-11946.
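To illustrate what the audited API computes, here is a plain-Python sketch of a pivot followed by a sum. The pivot values are passed as an explicit list rather than as varargs. `pivot_sum` is a hypothetical helper, not Spark code. In PySpark, the corresponding call is `df.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("earnings")`. When the list is omitted, the distinct pivot values must be computed first, before the output columns are known.

```python
from collections import defaultdict


def pivot_sum(rows, group_col, pivot_col, value_col, values=None):
    """Sum value_col per (group, pivot value), with one output column per pivot value.

    `values` is an explicit list (not varargs). When it is None, the distinct
    pivot values are discovered first, which costs an extra pass over the rows.
    """
    if values is None:
        values = sorted({r[pivot_col] for r in rows})
    values = list(values)
    wanted = set(values)
    acc = defaultdict(dict)
    for r in rows:
        cells = acc[r[group_col]]  # every group gets an output row, even with no matches
        pv = r[pivot_col]
        if pv in wanted:
            cells[pv] = cells.get(pv, 0) + r[value_col]
    # Missing (group, value) combinations come out as None, analogous to SQL nulls.
    return {g: [cells.get(v) for v in values] for g, cells in acc.items()}
```

Passing a plain list keeps the call site unambiguous and avoids the distinct-values pass.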
  8. Nov 13, 2015
  9. Nov 03, 2015
    • [SPARK-11489][SQL] Only include common first order statistics in GroupedData · 5051262d
      Reynold Xin authored
      We added a bunch of higher-order statistics, such as skewness and kurtosis, to GroupedData. I don't think they are common enough to justify being listed, since users can always use the regular statistical aggregate functions.
      
      That is to say, after this change, we won't support
      ```scala
      df.groupBy("key").kurtosis("colA", "colB")
      ```
      
      However, we will still support
      ```scala
      df.groupBy("key").agg(kurtosis(col("colA")), kurtosis(col("colB")))
      ```
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9446 from rxin/SPARK-11489.
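The supported form computes kurtosis as an ordinary aggregate inside `agg`. The sketch below mimics that in plain Python. `group_agg` is a hypothetical stand-in for `groupBy(...).agg(...)`. `kurtosis` returns the population excess kurtosis n * m4 / m2^2 - 3, where m2 and m4 are sums of central moments. To our understanding this is the definition Spark's `kurtosis` uses, though edge cases such as constant columns may be handled differently.

```python
from collections import defaultdict


def kurtosis(xs):
    # Population excess kurtosis: n * m4 / m2**2 - 3, where m2 and m4 are the sums
    # of squared and fourth-power deviations from the mean.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    m4 = sum((x - mean) ** 4 for x in xs)
    if m2 == 0:
        return float("nan")  # undefined for a constant column in this sketch
    return n * m4 / (m2 * m2) - 3.0


def group_agg(rows, key, aggs):
    """Rough analogue of df.groupBy(key).agg(f1(col1), f2(col2), ...).

    `aggs` is a list of (function, column_name) pairs; one output tuple per key.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: tuple(fn([r[col] for r in rs]) for fn, col in aggs) for k, rs in groups.items()}
```

Usage mirroring `agg(kurtosis(col("colA")), kurtosis(col("colB")))`: `group_agg(rows, "key", [(kurtosis, "colA"), (kurtosis, "colB")])`.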
    • [SPARK-11467][SQL] add Python API for stddev/variance · 1d04dc95
      Davies Liu authored
      Add Python API for stddev/stddev_pop/stddev_samp/variance/var_pop/var_samp/skewness/kurtosis
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9424 from davies/py_var.
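For reference, here is a plain-Python sketch of what the sample and population variants compute. The function names mirror the Spark SQL functions, but this is an illustration, not Spark's implementation. In Spark SQL, `stddev` and `variance` are aliases of `stddev_samp` and `var_samp`. `skewness` uses sqrt(n) * m3 / m2^1.5 over sums of central moments. Kurtosis is left out here.

```python
import math


def _moments(xs):
    # Returns n and the sums of squared and cubed deviations from the mean.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    m3 = sum((x - mean) ** 3 for x in xs)
    return n, m2, m3


def var_pop(xs):
    n, m2, _ = _moments(xs)
    return m2 / n


def var_samp(xs):
    n, m2, _ = _moments(xs)
    # Sample variance divides by n - 1 (Bessel's correction); undefined for n < 2.
    return m2 / (n - 1) if n > 1 else float("nan")


def stddev_pop(xs):
    return math.sqrt(var_pop(xs))


def stddev_samp(xs):
    return math.sqrt(var_samp(xs))  # math.sqrt(nan) is nan


# The unqualified names refer to the sample versions.
variance = var_samp
stddev = stddev_samp


def skewness(xs):
    n, m2, m3 = _moments(xs)
    return math.sqrt(n) * m3 / m2 ** 1.5 if m2 else float("nan")
```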
  10. Sep 08, 2015
  11. Jul 01, 2015
    • [SPARK-8770][SQL] Create BinaryOperator abstract class. · 9fd13d56
      Reynold Xin authored
      Our current BinaryExpression abstract class is not for generic binary expressions, i.e., it requires left/right children to have the same type. However, because of its name, contributors have built new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.
      
      This patch creates a new BinaryOperator abstract class and updates the analyzer to only apply the type casting rule there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7174 from rxin/binary-opterator and squashes the following commits:
      
      f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
      fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator
      d8518cf [Reynold Xin] Updated Python tests.
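Here is a simplified Python model of the split described above. Every class here (`Expression`, `Literal`, `Cast`, `BinaryExpression`, `BinaryOperator`, `Add`, `Sha2`) and the `coerce_types` rule are hypothetical stand-ins for the Catalyst classes. The model uses a toy numeric widening order. It shows only two things: the casting rule fires for `BinaryOperator` nodes and leaves other two-child expressions alone, and `pretty_name` supplies a user-facing name.

```python
# Hypothetical, simplified numeric widening order (not Spark's real coercion rules).
_NUMERIC_ORDER = ["int", "long", "double"]


class Expression:
    data_type = "unknown"

    @property
    def pretty_name(self) -> str:
        # User-facing name; defaults to the lower-cased class name.
        return type(self).__name__.lower()


class Literal(Expression):
    def __init__(self, value, data_type):
        self.value = value
        self.data_type = data_type


class Cast(Expression):
    def __init__(self, child, data_type):
        self.child = child
        self.data_type = data_type


class BinaryExpression(Expression):
    """Any expression with two children; no constraint on their types."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


class BinaryOperator(BinaryExpression):
    """Two children that must share a type, e.g. arithmetic."""

    @property
    def data_type(self):
        return self.left.data_type


class Add(BinaryOperator):
    pass


class Sha2(BinaryExpression):
    # A string input plus an int bit length: mixed child types are legitimate here.
    data_type = "string"


def coerce_types(expr):
    """Hypothetical analyzer rule: only BinaryOperator nodes get casts inserted."""
    if not isinstance(expr, BinaryOperator):
        return expr
    lt, rt = expr.left.data_type, expr.right.data_type
    if lt == rt:
        return expr
    wider = max(lt, rt, key=_NUMERIC_ORDER.index)  # only handles the toy numeric types
    left = expr.left if lt == wider else Cast(expr.left, wider)
    right = expr.right if rt == wider else Cast(expr.right, wider)
    return type(expr)(left, right)
```

Keeping the type-equality assumption on a separate class means an expression like `Sha2` can have two children without inheriting casting behavior it does not want.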
  12. May 23, 2015
    • [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates · efe3bfdf
      Davies Liu authored
      1. ntile should take an integer as its parameter.
      2. Added a Python API (based on #6364).
      3. Updated documentation of various DataFrame Python functions.
      
      Author: Davies Liu <davies@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6374 from rxin/window-final and squashes the following commits:
      
      69004c7 [Reynold Xin] Style fix.
      288cea9 [Reynold Xin] Update documentation.
      7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
      66092b4 [Davies Liu] update docs
      ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
      ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
      8936ade [Davies Liu] fix maxint in python 3
      2649358 [Davies Liu] update docs
      778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
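As an illustration of item 1, here is a plain-Python sketch of ntile semantics with an integer bucket count. The rows of an ordered window partition are split into `n` buckets as evenly as possible, and the first `num_rows % n` buckets each get one extra row, following the usual SQL definition. The helper `ntile_buckets` is hypothetical. In PySpark the function is applied over a window, e.g. `ntile(4).over(Window.orderBy("score"))`.

```python
def ntile_buckets(num_rows: int, n: int) -> list:
    """Bucket number (1-based) for each row of an already-ordered partition."""
    if not isinstance(n, int) or n <= 0:
        raise ValueError("ntile expects a positive integer bucket count")
    base, extra = divmod(num_rows, n)
    result = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets absorb the remainder, one row each.
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result
```

For 10 rows and 4 buckets, this yields bucket sizes 3, 3, 2, 2.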
  13. May 21, 2015
    • [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs · 8ddcb25b
      Davies Liu authored
      Add version info for public Python SQL API.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6295 from davies/versions and squashes the following commits:
      
      cfd91e6 [Davies Liu] add more version for DataFrame API
      600834d [Davies Liu] add version to SQL API docs
  14. May 15, 2015
    • [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files · d7b69946
      Davies Liu authored
      dataframe.py is split into column.py, group.py and dataframe.py (line counts):
      ```
         360 column.py
        1223 dataframe.py
         183 group.py
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6201 from davies/split_df and squashes the following commits:
      
      fc8f5ab [Davies Liu] split dataframe.py into multiple files