  1. Feb 17, 2015
    • [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation · c74b07fa
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4642 from marmbrus/docs and squashes the following commits:
      
      d291c34 [Michael Armbrust] python tests
      9be66e3 [Michael Armbrust] comments
      d56afc2 [Michael Armbrust] fix style
      f004747 [Michael Armbrust] fix build
      c4a907b [Michael Armbrust] fix tests
      42e2b73 [Michael Armbrust] [SQL] Documentation / API Clean-up.
      c74b07fa
  2. Feb 16, 2015
    • [SQL] Various DataFrame doc changes. · 0e180bfc
      Reynold Xin authored
      Added a bunch of tags.
      
      Also changed parquetFile to take varargs rather than a string followed by varargs.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4636 from rxin/df-doc and squashes the following commits:
      
      651f80c [Reynold Xin] Fixed parquetFile in PySpark.
      8dc3024 [Reynold Xin] [SQL] Various DataFrame doc changes.
      0e180bfc
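      A minimal PySpark sketch of the varargs-style parquetFile call described in the commit above; the SQLContext `sqlContext` and the Parquet paths are assumptions for illustration only.

      ```python
      # Hypothetical usage: parquetFile now accepts several paths at once (varargs)
      # rather than a single string followed by varargs. Paths are placeholders.
      df = sqlContext.parquetFile("data/events-2015-01.parquet",
                                  "data/events-2015-02.parquet")
      df.printSchema()
      ```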
  3. Feb 14, 2015
    • [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames · e98dfe62
      Reynold Xin authored
      - The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
      - toDataFrame -> toDF
      - Dsl -> functions
      - implicits moved into SQLContext.implicits
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      
      Python changes:
      - toDataFrame -> toDF
      - Dsl -> functions package
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      - add toDF functions to RDD on SQLContext init
      - add flatMap to DataFrame
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
      
      5ef9910 [Reynold Xin] More fix
      61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
      ff5832c [Reynold Xin] Fix python
      749c675 [Reynold Xin] count(*) fixes.
      5806df0 [Reynold Xin] Fix build break again.
      d941f3d [Reynold Xin] Fixed explode compilation break.
      fe1267a [Davies Liu] flatMap
      c4afb8e [Reynold Xin] style
      d9de47f [Davies Liu] add comment
      b783994 [Davies Liu] add comment for toDF
      e2154e5 [Davies Liu] schema() -> schema
      3a1004f [Davies Liu] Dsl -> functions, toDF()
      fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      97dd47c [Davies Liu] fix mistake
      6168f74 [Davies Liu] fix test
      1fc0199 [Davies Liu] fix test
      a075cd5 [Davies Liu] clean up, toPandas
      663d314 [Davies Liu] add test for agg('*')
      9e214d5 [Reynold Xin] count(*) fixes.
      1ed7136 [Reynold Xin] Fix build break again.
      921b2e3 [Reynold Xin] Fixed explode compilation break.
      14698d4 [Davies Liu] flatMap
      ba3e12d [Reynold Xin] style
      d08c92d [Davies Liu] add comment
      5c8b524 [Davies Liu] add comment for toDF
      a4e5e66 [Davies Liu] schema() -> schema
      d377fc9 [Davies Liu] Dsl -> functions, toDF()
      6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      e98dfe62
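      A hedged sketch of the renamed Python APIs listed above (toDF, the functions package, withColumn, withColumnRenamed); an existing SparkContext `sc` is assumed, and the sample data is made up.

      ```python
      from pyspark.sql import SQLContext
      from pyspark.sql import functions as F   # Dsl was renamed to the functions package

      sqlContext = SQLContext(sc)               # assumes an existing SparkContext `sc`

      # Once the SQLContext exists, RDDs gain a toDF method (formerly toDataFrame)
      rdd = sc.parallelize([("Alice", 1), ("Bob", 2)])
      df = rdd.toDF(["name", "age"])

      # addColumn -> withColumn, renameColumn -> withColumnRenamed
      df = df.withColumn("age2", df.age + 1).withColumnRenamed("age2", "age_plus_one")

      # aggregate with a function from pyspark.sql.functions, then print with show()
      df.agg(F.max(df.age_plus_one)).show()
      ```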
  4. Feb 12, 2015
  5. Feb 11, 2015
    • [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Python DataFrame API remaining tasks · b694eb9c
      Davies Liu authored
      1. DataFrame.renameColumn
      
      2. DataFrame.show() and `__repr__`
      
      3. Use simpleString() rather than jsonValue in DataFrame.dtypes
      
      4. createDataFrame from local Python data, including pandas.DataFrame
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4528 from davies/df3 and squashes the following commits:
      
      014acea [Davies Liu] fix typo
      6ba526e [Davies Liu] fix tests
      46f5f95 [Davies Liu] address comments
      6cbc154 [Davies Liu] dataframe.show() and improve dtypes
      6f94f25 [Davies Liu] create DataFrame from local Python data
      b694eb9c
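      A sketch of the features listed in the commit above, assuming an existing SQLContext `sqlContext` and a local pandas installation; the data is illustrative.

      ```python
      import pandas as pd

      # createDataFrame now accepts local Python data, including a pandas.DataFrame
      pdf = pd.DataFrame({"name": ["Alice", "Bob"], "age": [1, 2]})
      df = sqlContext.createDataFrame(pdf)

      df.show()          # prints the first rows as a plain-text table
      print(df.dtypes)   # types rendered via simpleString(), e.g. ('age', 'bigint')
      ```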
  6. Feb 10, 2015
    • [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns · ea602840
      Davies Liu authored
      Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which can take an optional `schema` to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4498 from davies/create and squashes the following commits:
      
      08469c1 [Davies Liu] remove Scala/Java API for now
      c80a7a9 [Davies Liu] fix hive test
      d1bd8f2 [Davies Liu] cleanup applySchema
      9526e97 [Davies Liu] createDataFrame from RDD with columns
      ea602840
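      A sketch of the two `schema` forms described above (a list of column names or a StructType); `sc` and `sqlContext` are assumed to already exist.

      ```python
      from pyspark.sql.types import StructType, StructField, StringType, IntegerType

      rdd = sc.parallelize([("Alice", 1), ("Bob", 2)])

      # schema given as a list of column names (field types are inferred)
      df1 = sqlContext.createDataFrame(rdd, ["name", "age"])

      # schema given as an explicit StructType
      schema = StructType([
          StructField("name", StringType(), True),
          StructField("age", IntegerType(), True),
      ])
      df2 = sqlContext.createDataFrame(rdd, schema)
      ```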
    • [SPARK-5658][SQL] Finalize DDL and write support APIs · aaf50d05
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-5658
      
      Author: Yin Huai <yhuai@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #4446 from yhuai/writeSupportFollowup and squashes the following commits:
      
      f3a96f7 [Yin Huai] davies's comments.
      225ff71 [Yin Huai] Use Scala TestHiveContext to initialize the Python HiveContext in Python tests.
      2306f93 [Yin Huai] Style.
      2091fcd [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      537e28f [Yin Huai] Correctly clean up temp data.
      ae4649e [Yin Huai] Fix Python test.
      609129c [Yin Huai] Doc format.
      92b6659 [Yin Huai] Python doc and other minor updates.
      cbc717f [Yin Huai] Rename dataSourceName to source.
      d1c12d3 [Yin Huai] No need to delete the duplicate rule since it has been removed in master.
      22cfa70 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      d91ecb8 [Yin Huai] Fix test.
      4c76d78 [Yin Huai] Simplify APIs.
      3abc215 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      0832ce4 [Yin Huai] Fix test.
      98e7cdb [Yin Huai] Python style.
      2bf44ef [Yin Huai] Python APIs.
      c204967 [Yin Huai] Format
      a10223d [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      9ff97d8 [Yin Huai] Add SaveMode to saveAsTable.
      9b6e570 [Yin Huai] Update doc.
      c2be775 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      99950a2 [Yin Huai] Use Java enum for SaveMode.
      4679665 [Yin Huai] Remove duplicate rule.
      77d89dc [Yin Huai] Update doc.
      e04d908 [Yin Huai] Move import and add (Scala-specific) to scala APIs.
      cf5703d [Yin Huai] Add checkAnswer to Java tests.
      7db95ff [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      6dfd386 [Yin Huai] Add java test.
      f2f33ef [Yin Huai] Fix test.
      e702386 [Yin Huai] Apache header.
      b1e9b1b [Yin Huai] Format.
      ed4e1b4 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      af9e9b3 [Yin Huai] DDL and write support API followup.
      2a6213a [Yin Huai] Update API names.
      e6a0b77 [Yin Huai] Update test.
      43bae01 [Yin Huai] Remove createTable from HiveContext.
      5ffc372 [Yin Huai] Add more load APIs to SQLContext.
      5390743 [Yin Huai] Add more save APIs to DataFrame.
      aaf50d05
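      The commit itself points mainly at the JIRA ticket, so the following is only a hedged sketch of how the finalized Python-side save/load calls might look; the path, the "parquet" source name, and the mode strings (assumed to mirror the Java SaveMode enum) are illustrative, and `df` and `sqlContext` are assumed to exist.

      ```python
      # Hypothetical usage of the write support APIs from PySpark.
      # Assumed mode strings: "append", "overwrite", "error", "ignore".
      df.save("data/out.parquet", source="parquet", mode="overwrite")

      # Load it back through the generic data source API
      loaded = sqlContext.load("data/out.parquet", source="parquet")
      ```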
  7. Feb 09, 2015
    • [SPARK-5469] restructure pyspark.sql into multiple files · 08488c17
      Davies Liu authored
      All the DataTypes moved into pyspark.sql.types
      
      The changes can be tracked by `--find-copies-harder -M25`
      ```
      davies@localhost:~/work/spark/python$ git diff --find-copies-harder -M25 --numstat master..
      2       5       python/docs/pyspark.ml.rst
      0       3       python/docs/pyspark.mllib.rst
      10      2       python/docs/pyspark.sql.rst
      1       1       python/pyspark/mllib/linalg.py
      21      14      python/pyspark/{mllib => sql}/__init__.py
      14      2108    python/pyspark/{sql.py => sql/context.py}
      10      1772    python/pyspark/{sql.py => sql/dataframe.py}
      7       6       python/pyspark/{sql_tests.py => sql/tests.py}
      8       1465    python/pyspark/{sql.py => sql/types.py}
      4       2       python/run-tests
      1       1       sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala
      ```
      
      Also run `git blame -C -C python/pyspark/sql/context.py` to track the history.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4479 from davies/sql and squashes the following commits:
      
      1b5f0a5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into sql
      2b2b983 [Davies Liu] restructure pyspark.sql
      08488c17
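      A sketch of the import layout implied by the diff above (context.py, dataframe.py, types.py); which classes live in which module is an assumption based on their names.

      ```python
      # Module layout after the split of the old monolithic pyspark/sql.py
      from pyspark.sql.context import SQLContext, HiveContext
      from pyspark.sql.dataframe import DataFrame
      from pyspark.sql.types import StructType, StructField, StringType
      ```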