Skip to content
Snippets Groups Projects
  1. Feb 18, 2015
    • Davies Liu's avatar
      [SPARK-5878] fix DataFrame.repartition() in Python · c1b6fa98
      Davies Liu authored
      Also add tests for distinct()
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4667 from davies/repartition and squashes the following commits:
      
      79059fd [Davies Liu] add test
      cb4915e [Davies Liu] fix repartition
      c1b6fa98
  2. Feb 17, 2015
    • Davies Liu's avatar
      [SPARK-5871] output explain in Python · 3df85dcc
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4658 from davies/explain and squashes the following commits:
      
      db87ea2 [Davies Liu] output explain in Python
      3df85dcc
    • Davies Liu's avatar
      [SPARK-5859] [PySpark] [SQL] fix DataFrame Python API · d8adefef
      Davies Liu authored
      1. added explain()
      2. add isLocal()
      3. do not call show() in __repl__
      4. add foreach() and foreachPartition()
      5. add distinct()
      6. fix functions.col()/column()/lit()
      7. fix unit tests in sql/functions.py
      8. fix unicode in showString()
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4645 from davies/df6 and squashes the following commits:
      
      6b46a2c [Davies Liu] fix DataFrame Python API
      d8adefef
  3. Feb 16, 2015
    • Liang-Chi Hsieh's avatar
      [SPARK-5799][SQL] Compute aggregation function on specified numeric columns · 5c78be7a
      Liang-Chi Hsieh authored
      Compute aggregation function on specified numeric columns. For example:
      
          val df = Seq(("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d")).toDataFrame("key", "value1", "value2", "rest")
          df.groupBy("key").min("value2")
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4592 from viirya/specific_cols_agg and squashes the following commits:
      
      9446896 [Liang-Chi Hsieh] For comments.
      314c4cd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      353fad7 [Liang-Chi Hsieh] For python unit tests.
      54ed0c4 [Liang-Chi Hsieh] Address comments.
      b079e6b [Liang-Chi Hsieh] Remove duplicate codes.
      55100fb [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      880c2ac [Liang-Chi Hsieh] Fix Python style checks.
      4c63a01 [Liang-Chi Hsieh] Fix pyspark.
      b1a24fc [Liang-Chi Hsieh] Address comments.
      2592f29 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      27069c3 [Liang-Chi Hsieh] Combine functions and add varargs annotation.
      371a3f7 [Liang-Chi Hsieh] Compute aggregation function on specified numeric columns.
      5c78be7a
  4. Feb 14, 2015
    • Reynold Xin's avatar
      [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames · e98dfe62
      Reynold Xin authored
      - The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
      - toDataFrame -> toDF
      - Dsl -> functions
      - implicits moved into SQLContext.implicits
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      
      Python changes:
      - toDataFrame -> toDF
      - Dsl -> functions package
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      - add toDF functions to RDD on SQLContext init
      - add flatMap to DataFrame
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
      
      5ef9910 [Reynold Xin] More fix
      61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
      ff5832c [Reynold Xin] Fix python
      749c675 [Reynold Xin] count(*) fixes.
      5806df0 [Reynold Xin] Fix build break again.
      d941f3d [Reynold Xin] Fixed explode compilation break.
      fe1267a [Davies Liu] flatMap
      c4afb8e [Reynold Xin] style
      d9de47f [Davies Liu] add comment
      b783994 [Davies Liu] add comment for toDF
      e2154e5 [Davies Liu] schema() -> schema
      3a1004f [Davies Liu] Dsl -> functions, toDF()
      fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      97dd47c [Davies Liu] fix mistake
      6168f74 [Davies Liu] fix test
      1fc0199 [Davies Liu] fix test
      a075cd5 [Davies Liu] clean up, toPandas
      663d314 [Davies Liu] add test for agg('*')
      9e214d5 [Reynold Xin] count(*) fixes.
      1ed7136 [Reynold Xin] Fix build break again.
      921b2e3 [Reynold Xin] Fixed explode compilation break.
      14698d4 [Davies Liu] flatMap
      ba3e12d [Reynold Xin] style
      d08c92d [Davies Liu] add comment
      5c8b524 [Davies Liu] add comment for toDF
      a4e5e66 [Davies Liu] schema() -> schema
      d377fc9 [Davies Liu] Dsl -> functions, toDF()
      6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      e98dfe62
  5. Feb 12, 2015
    • Yin Huai's avatar
      [SQL] Move SaveMode to SQL package. · c025a468
      Yin Huai authored
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4542 from yhuai/moveSaveMode and squashes the following commits:
      
      65a4425 [Yin Huai] Move SaveMode to sql package.
      c025a468
  6. Feb 11, 2015
    • Davies Liu's avatar
      [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Python DataFrame API remaining tasks · b694eb9c
      Davies Liu authored
      1. DataFrame.renameColumn
      
      2. DataFrame.show() and _repr_
      
      3. Use simpleString() rather than jsonValue in DataFrame.dtypes
      
      4. createDataFrame from local Python data, including pandas.DataFrame
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4528 from davies/df3 and squashes the following commits:
      
      014acea [Davies Liu] fix typo
      6ba526e [Davies Liu] fix tests
      46f5f95 [Davies Liu] address comments
      6cbc154 [Davies Liu] dataframe.show() and improve dtypes
      6f94f25 [Davies Liu] create DataFrame from local Python data
      b694eb9c
  7. Feb 10, 2015
    • Yin Huai's avatar
      [SPARK-5658][SQL] Finalize DDL and write support APIs · aaf50d05
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-5658
      
      Author: Yin Huai <yhuai@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #4446 from yhuai/writeSupportFollowup and squashes the following commits:
      
      f3a96f7 [Yin Huai] davies's comments.
      225ff71 [Yin Huai] Use Scala TestHiveContext to initialize the Python HiveContext in Python tests.
      2306f93 [Yin Huai] Style.
      2091fcd [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      537e28f [Yin Huai] Correctly clean up temp data.
      ae4649e [Yin Huai] Fix Python test.
      609129c [Yin Huai] Doc format.
      92b6659 [Yin Huai] Python doc and other minor updates.
      cbc717f [Yin Huai] Rename dataSourceName to source.
      d1c12d3 [Yin Huai] No need to delete the duplicate rule since it has been removed in master.
      22cfa70 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      d91ecb8 [Yin Huai] Fix test.
      4c76d78 [Yin Huai] Simplify APIs.
      3abc215 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      0832ce4 [Yin Huai] Fix test.
      98e7cdb [Yin Huai] Python style.
      2bf44ef [Yin Huai] Python APIs.
      c204967 [Yin Huai] Format
      a10223d [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      9ff97d8 [Yin Huai] Add SaveMode to saveAsTable.
      9b6e570 [Yin Huai] Update doc.
      c2be775 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      99950a2 [Yin Huai] Use Java enum for SaveMode.
      4679665 [Yin Huai] Remove duplicate rule.
      77d89dc [Yin Huai] Update doc.
      e04d908 [Yin Huai] Move import and add (Scala-specific) to scala APIs.
      cf5703d [Yin Huai] Add checkAnswer to Java tests.
      7db95ff [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      6dfd386 [Yin Huai] Add java test.
      f2f33ef [Yin Huai] Fix test.
      e702386 [Yin Huai] Apache header.
      b1e9b1b [Yin Huai] Format.
      ed4e1b4 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      af9e9b3 [Yin Huai] DDL and write support API followup.
      2a6213a [Yin Huai] Update API names.
      e6a0b77 [Yin Huai] Update test.
      43bae01 [Yin Huai] Remove createTable from HiveContext.
      5ffc372 [Yin Huai] Add more load APIs to SQLContext.
      5390743 [Yin Huai] Add more save APIs to DataFrame.
      aaf50d05
    • Michael Armbrust's avatar
      [SQL] Add toString to DataFrame/Column · de80b1ba
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4436 from marmbrus/dfToString and squashes the following commits:
      
      8a3c35f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into dfToString
      b72a81b [Michael Armbrust] add toString
      de80b1ba
  8. Feb 09, 2015
    • Davies Liu's avatar
      [SPARK-5469] restructure pyspark.sql into multiple files · 08488c17
      Davies Liu authored
      All the DataTypes moved into pyspark.sql.types
      
      The changes can be tracked by `--find-copies-harder -M25`
      ```
      davieslocalhost:~/work/spark/python$ git diff --find-copies-harder -M25 --numstat master..
      2       5       python/docs/pyspark.ml.rst
      0       3       python/docs/pyspark.mllib.rst
      10      2       python/docs/pyspark.sql.rst
      1       1       python/pyspark/mllib/linalg.py
      21      14      python/pyspark/{mllib => sql}/__init__.py
      14      2108    python/pyspark/{sql.py => sql/context.py}
      10      1772    python/pyspark/{sql.py => sql/dataframe.py}
      7       6       python/pyspark/{sql_tests.py => sql/tests.py}
      8       1465    python/pyspark/{sql.py => sql/types.py}
      4       2       python/run-tests
      1       1       sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala
      ```
      
      Also `git blame -C -C python/pyspark/sql/context.py` to track the history.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4479 from davies/sql and squashes the following commits:
      
      1b5f0a5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into sql
      2b2b983 [Davies Liu] restructure pyspark.sql
      08488c17
Loading