  1. Feb 18, 2015
    • [SPARK-5722] [SQL] [PySpark] infer int as LongType · aa8f10e8
      Davies Liu authored
      Python's `int` is 64-bit on 64-bit machines (very common now), so Spark SQL should infer it as LongType.
      
      Also, LongType in SQL will come back as `int`.
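
To illustrate why the width matters, here is a minimal sketch in plain Python. The helper `infer_sql_type_name` and the range checks are hypothetical and are not PySpark's actual inference code. They only show that a Python `int` can hold values outside the signed 32-bit range, which a 64-bit type can still represent.

```python
# Hypothetical sketch, not PySpark's inference code.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def fits_int32(value):
    """True if the value fits in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def fits_int64(value):
    """True if the value fits in a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def infer_sql_type_name(value):
    """Map a Python value to an illustrative SQL type name."""
    # bool is a subclass of int, so it has to be checked first.
    if isinstance(value, bool):
        return "BooleanType"
    if isinstance(value, int):
        # Inferring the 64-bit type avoids overflow for large values.
        return "LongType"
    if isinstance(value, float):
        return "DoubleType"
    if isinstance(value, str):
        return "StringType"
    raise TypeError("unsupported type: %s" % type(value).__name__)


big = 2**40  # too large for 32 bits, fine for 64 bits
print(fits_int32(big), fits_int64(big), infer_sql_type_name(big))
```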
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4666 from davies/long and squashes the following commits:
      
      6bc6cc4 [Davies Liu] infer int as LongType
  2. Feb 17, 2015
    • [SPARK-5872] [SQL] create a sqlCtx in pyspark shell · 4d4cc760
      Davies Liu authored
      The sqlCtx will be a HiveContext if Hive is built into the assembly jar, or a SQLContext if not.
      
      It also skips the Hive tests in pyspark.sql.tests if Hive is not available.
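
The fallback described above can be sketched with stand-in classes. Everything here is an assumption made for illustration: the stub `SQLContext` and `HiveContext` classes, the `hive_available` flag, and the `create_sql_context` helper are hypothetical and do not reproduce the shell's real startup code.

```python
import unittest


class SQLContext:
    """Stand-in for a basic SQL context (hypothetical stub)."""

    def __init__(self, sc):
        self.sc = sc


class HiveContext(SQLContext):
    """Stand-in for a Hive-backed context (hypothetical stub)."""

    def __init__(self, sc, hive_available=True):
        if not hive_available:
            # Simulates Hive support missing from the build.
            raise RuntimeError("Hive classes not found")
        super().__init__(sc)


def create_sql_context(sc, hive_available):
    """Prefer the Hive-backed context, fall back to the plain one."""
    try:
        return HiveContext(sc, hive_available=hive_available)
    except RuntimeError:
        return SQLContext(sc)


HIVE_AVAILABLE = False  # assumption for this sketch


class HiveTests(unittest.TestCase):
    # Skipped rather than failed when Hive is missing.
    @unittest.skipIf(not HIVE_AVAILABLE, "Hive is not available")
    def test_hive_only_feature(self):
        ctx = create_sql_context(None, hive_available=True)
        self.assertIsInstance(ctx, HiveContext)


print(type(create_sql_context("sc", hive_available=False)).__name__)
```

Skipping instead of failing keeps the test run meaningful on builds that were assembled without Hive.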
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4659 from davies/sqlctx and squashes the following commits:
      
      0e6629a [Davies Liu] sqlCtx in pyspark
  3. Feb 14, 2015
    • [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames · e98dfe62
      Reynold Xin authored
      - The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
      - toDataFrame -> toDF
      - Dsl -> functions
      - implicits moved into SQLContext.implicits
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      
      Python changes:
      - toDataFrame -> toDF
      - Dsl -> functions package
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      - add toDF functions to RDD on SQLContext init
      - add flatMap to DataFrame
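
The method renames listed above could be applied to existing Python code with a small rewrite helper. This is a hedged sketch: `RENAMES` is copied from the list in this commit message, while `migrate_calls` is a hypothetical helper that only handles method calls written with parentheses and assumes the argument lists are unchanged. The `Dsl -> functions` module rename is not covered.

```python
import re

# Old method name -> new method name, taken from the commit message.
RENAMES = {
    "toDataFrame": "toDF",
    "addColumn": "withColumn",
    "renameColumn": "withColumnRenamed",
}

# Matches ".oldName(" so that only method calls are rewritten.
_CALL = re.compile(r"\.(%s)\(" % "|".join(map(re.escape, RENAMES)))


def migrate_calls(source):
    """Rewrite old DataFrame method calls in a source string."""
    return _CALL.sub(lambda m: "." + RENAMES[m.group(1)] + "(", source)


old = 'df.addColumn("x", c).renameColumn("x", "y")'
print(migrate_calls(old))
```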
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
      
      5ef9910 [Reynold Xin] More fix
      61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
      ff5832c [Reynold Xin] Fix python
      749c675 [Reynold Xin] count(*) fixes.
      5806df0 [Reynold Xin] Fix build break again.
      d941f3d [Reynold Xin] Fixed explode compilation break.
      fe1267a [Davies Liu] flatMap
      c4afb8e [Reynold Xin] style
      d9de47f [Davies Liu] add comment
      b783994 [Davies Liu] add comment for toDF
      e2154e5 [Davies Liu] schema() -> schema
      3a1004f [Davies Liu] Dsl -> functions, toDF()
      fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      97dd47c [Davies Liu] fix mistake
      6168f74 [Davies Liu] fix test
      1fc0199 [Davies Liu] fix test
      a075cd5 [Davies Liu] clean up, toPandas
      663d314 [Davies Liu] add test for agg('*')
      9e214d5 [Reynold Xin] count(*) fixes.
      1ed7136 [Reynold Xin] Fix build break again.
      921b2e3 [Reynold Xin] Fixed explode compilation break.
      14698d4 [Davies Liu] flatMap
      ba3e12d [Reynold Xin] style
      d08c92d [Davies Liu] add comment
      5c8b524 [Davies Liu] add comment for toDF
      a4e5e66 [Davies Liu] schema() -> schema
      d377fc9 [Davies Liu] Dsl -> functions, toDF()
      6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
  4. Feb 11, 2015
    • [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Python DataFrame API remaining tasks · b694eb9c
      Davies Liu authored
      1. DataFrame.renameColumn
      
      2. DataFrame.show() and _repr_
      
      3. Use simpleString() rather than jsonValue in DataFrame.dtypes
      
      4. createDataFrame from local Python data, including pandas.DataFrame
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4528 from davies/df3 and squashes the following commits:
      
      014acea [Davies Liu] fix typo
      6ba526e [Davies Liu] fix tests
      46f5f95 [Davies Liu] address comments
      6cbc154 [Davies Liu] dataframe.show() and improve dtypes
      6f94f25 [Davies Liu] create DataFrame from local Python data
  5. Feb 10, 2015
    • [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns · ea602840
      Davies Liu authored
      Deprecate inferSchema() and applySchema() in favor of createDataFrame(), which takes an optional `schema` to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.
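
The two accepted forms of `schema` can be sketched in plain Python. This is an illustration only: `resolve_schema` is a hypothetical helper, the `(name, type)` pairs stand in for a StructType, and the `_1`, `_2` default names are an assumption of this sketch. None of it is Spark's implementation.

```python
def resolve_schema(first_row, schema=None):
    """Return a list of (column_name, type_name) pairs for a row.

    schema may be None, a list of column names, or a list of
    (name, type_name) pairs standing in for a full struct type.
    """
    if schema is None:
        # No names given: fall back to positional names.
        names = ["_%d" % (i + 1) for i in range(len(first_row))]
    elif all(isinstance(s, str) for s in schema):
        # Only names given: types are inferred from the data.
        names = list(schema)
    else:
        # Full schema given: nothing to infer.
        return [tuple(field) for field in schema]
    if len(names) != len(first_row):
        raise ValueError("expected %d column names" % len(first_row))
    return [(n, type(v).__name__) for n, v in zip(names, first_row)]


row = ("Alice", 1)
print(resolve_schema(row, ["name", "age"]))
```

Passing only names keeps the common case short, while a full schema remains available when inferred types would be wrong.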
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4498 from davies/create and squashes the following commits:
      
      08469c1 [Davies Liu] remove Scala/Java API for now
      c80a7a9 [Davies Liu] fix hive test
      d1bd8f2 [Davies Liu] cleanup applySchema
      9526e97 [Davies Liu] createDataFrame from RDD with columns
    • [SPARK-5658][SQL] Finalize DDL and write support APIs · aaf50d05
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-5658
      
      Author: Yin Huai <yhuai@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #4446 from yhuai/writeSupportFollowup and squashes the following commits:
      
      f3a96f7 [Yin Huai] davies's comments.
      225ff71 [Yin Huai] Use Scala TestHiveContext to initialize the Python HiveContext in Python tests.
      2306f93 [Yin Huai] Style.
      2091fcd [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      537e28f [Yin Huai] Correctly clean up temp data.
      ae4649e [Yin Huai] Fix Python test.
      609129c [Yin Huai] Doc format.
      92b6659 [Yin Huai] Python doc and other minor updates.
      cbc717f [Yin Huai] Rename dataSourceName to source.
      d1c12d3 [Yin Huai] No need to delete the duplicate rule since it has been removed in master.
      22cfa70 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      d91ecb8 [Yin Huai] Fix test.
      4c76d78 [Yin Huai] Simplify APIs.
      3abc215 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      0832ce4 [Yin Huai] Fix test.
      98e7cdb [Yin Huai] Python style.
      2bf44ef [Yin Huai] Python APIs.
      c204967 [Yin Huai] Format
      a10223d [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      9ff97d8 [Yin Huai] Add SaveMode to saveAsTable.
      9b6e570 [Yin Huai] Update doc.
      c2be775 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      99950a2 [Yin Huai] Use Java enum for SaveMode.
      4679665 [Yin Huai] Remove duplicate rule.
      77d89dc [Yin Huai] Update doc.
      e04d908 [Yin Huai] Move import and add (Scala-specific) to scala APIs.
      cf5703d [Yin Huai] Add checkAnswer to Java tests.
      7db95ff [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      6dfd386 [Yin Huai] Add java test.
      f2f33ef [Yin Huai] Fix test.
      e702386 [Yin Huai] Apache header.
      b1e9b1b [Yin Huai] Format.
      ed4e1b4 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      af9e9b3 [Yin Huai] DDL and write support API followup.
      2a6213a [Yin Huai] Update API names.
      e6a0b77 [Yin Huai] Update test.
      43bae01 [Yin Huai] Remove createTable from HiveContext.
      5ffc372 [Yin Huai] Add more load APIs to SQLContext.
      5390743 [Yin Huai] Add more save APIs to DataFrame.
  6. Feb 09, 2015
    • [SPARK-5469] restructure pyspark.sql into multiple files · 08488c17
      Davies Liu authored
      All the DataTypes moved into pyspark.sql.types
      
      The changes can be tracked by `--find-copies-harder -M25`
      ```
      davies@localhost:~/work/spark/python$ git diff --find-copies-harder -M25 --numstat master..
      2       5       python/docs/pyspark.ml.rst
      0       3       python/docs/pyspark.mllib.rst
      10      2       python/docs/pyspark.sql.rst
      1       1       python/pyspark/mllib/linalg.py
      21      14      python/pyspark/{mllib => sql}/__init__.py
      14      2108    python/pyspark/{sql.py => sql/context.py}
      10      1772    python/pyspark/{sql.py => sql/dataframe.py}
      7       6       python/pyspark/{sql_tests.py => sql/tests.py}
      8       1465    python/pyspark/{sql.py => sql/types.py}
      4       2       python/run-tests
      1       1       sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala
      ```
      
      Also use `git blame -C -C python/pyspark/sql/context.py` to track the history.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4479 from davies/sql and squashes the following commits:
      
      1b5f0a5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into sql
      2b2b983 [Davies Liu] restructure pyspark.sql
  7. Feb 03, 2015
    • [SPARK-5554] [SQL] [PySpark] add more tests for DataFrame Python API · 068c0e2e
      Davies Liu authored
      Add more tests and docs for DataFrame Python API, improve test coverage, fix bugs.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4331 from davies/fix_df and squashes the following commits:
      
      dd9919f [Davies Liu] fix tests
      467332c [Davies Liu] support string in cast()
      83c92fe [Davies Liu] address comments
      c052f6f [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_df
      8dd19a9 [Davies Liu] fix tests in python 2.6
      35ccb9f [Davies Liu] fix build
      78ebcfa [Davies Liu] add sql_test.py in run_tests
      9ab78b4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_df
      6040ba7 [Davies Liu] fix docs
      3ab2661 [Davies Liu] add more tests for DataFrame