Skip to content
Snippets Groups Projects
  1. May 13, 2014
  2. May 07, 2014
    • Kan Zhang's avatar
      [SPARK-1460] Returning SchemaRDD instead of normal RDD on Set operations... · 967635a2
      Kan Zhang authored
      ... that do not change schema
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #448 from kanzhang/SPARK-1460 and squashes the following commits:
      
      111e388 [Kan Zhang] silence MiMa errors in EdgeRDD and VertexRDD
      91dc787 [Kan Zhang] Taking into account newly added Ordering param
      79ed52a [Kan Zhang] [SPARK-1460] Returning SchemaRDD on Set operations that do not change schema
      967635a2
  3. Apr 29, 2014
    • Michael Armbrust's avatar
      Minor fix to python table caching API. · 497be3ca
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #585 from marmbrus/pythonCacheTable and squashes the following commits:
      
      7ec1f91 [Michael Armbrust] Minor fix to python table caching API.
      497be3ca
  4. Apr 19, 2014
    • Michael Armbrust's avatar
      Add insertInto and saveAsTable to Python API. · 10d04213
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #447 from marmbrus/pythonInsert and squashes the following commits:
      
      c7ab692 [Michael Armbrust] Keep docstrings < 72 chars.
      ff62870 [Michael Armbrust] Add insertInto and saveAsTable to Python API.
      10d04213
  5. Apr 15, 2014
    • Michael Armbrust's avatar
      [SQL] SPARK-1424 Generalize insertIntoTable functions on SchemaRDDs · 273c2fd0
      Michael Armbrust authored
      This makes it possible to create tables and insert into them using the DSL and SQL for the scala and java apis.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #354 from marmbrus/insertIntoTable and squashes the following commits:
      
      6c6f227 [Michael Armbrust] Create random temporary files in python parquet unit tests.
      f5e6d5c [Michael Armbrust] Merge remote-tracking branch 'origin/master' into insertIntoTable
      765c506 [Michael Armbrust] Add to JavaAPI.
      77b512c [Michael Armbrust] typos.
      5c3ef95 [Michael Armbrust] use names for boolean args.
      882afdf [Michael Armbrust] Change createTableAs to saveAsTable.  Clean up api annotations.
      d07d94b [Michael Armbrust] Add tests, support for creating parquet files and hive tables.
      fa3fe81 [Michael Armbrust] Make insertInto available on JavaSchemaRDD as well.  Add createTableAs function.
      273c2fd0
    • Ahir Reddy's avatar
      SPARK-1374: PySpark API for SparkSQL · c99bcb7f
      Ahir Reddy authored
      An initial API that exposes SparkSQL functionality in PySpark. A PythonRDD composed of dictionaries, with string keys and primitive values (boolean, float, int, long, string) can be converted into a SchemaRDD that supports sql queries.
      
      ```
      from pyspark.context import SQLContext
      sqlCtx = SQLContext(sc)
      rdd = sc.parallelize([{"field1" : 1, "field2" : "row1"}, {"field1" : 2, "field2": "row2"}, {"field1" : 3, "field2": "row3"}])
      srdd = sqlCtx.applySchema(rdd)
      sqlCtx.registerRDDAsTable(srdd, "table1")
      srdd2 = sqlCtx.sql("SELECT field1 AS f1, field2 as f2 from table1")
      srdd2.collect()
      ```
      The last line yields ```[{"f1" : 1, "f2" : "row1"}, {"f1" : 2, "f2": "row2"}, {"f1" : 3, "f2": "row3"}]```
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #363 from ahirreddy/pysql and squashes the following commits:
      
      0294497 [Ahir Reddy] Updated log4j properties to supress Hive Warns
      307d6e0 [Ahir Reddy] Style fix
      6f7b8f6 [Ahir Reddy] Temporary fix MIMA checker. Since we now assemble Spark jar with Hive, we don't want to check the interfaces of all of our hive dependencies
      3ef074a [Ahir Reddy] Updated documentation because classes moved to sql.py
      29245bf [Ahir Reddy] Cache underlying SchemaRDD instead of generating and caching PythonRDD
      f2312c7 [Ahir Reddy] Moved everything into sql.py
      a19afe4 [Ahir Reddy] Doc fixes
      6d658ba [Ahir Reddy] Remove the metastore directory created by the HiveContext tests in SparkSQL
      521ff6d [Ahir Reddy] Trying to get spark to build with hive
      ab95eba [Ahir Reddy] Set SPARK_HIVE=true on jenkins
      ded03e7 [Ahir Reddy] Added doc test for HiveContext
      22de1d4 [Ahir Reddy] Fixed maven pyrolite dependency
      e4da06c [Ahir Reddy] Display message if hive is not built into spark
      227a0be [Michael Armbrust] Update API links. Fix Hive example.
      58e2aa9 [Michael Armbrust] Build Docs for pyspark SQL Api.  Minor fixes.
      4285340 [Michael Armbrust] Fix building of Hive API Docs.
      38a92b0 [Michael Armbrust] Add note to future non-python developers about python docs.
      337b201 [Ahir Reddy] Changed com.clearspring.analytics stream version from 2.4.0 to 2.5.1 to match SBT build, and added pyrolite to maven build
      40491c9 [Ahir Reddy] PR Changes + Method Visibility
      1836944 [Michael Armbrust] Fix comments.
      e00980f [Michael Armbrust] First draft of python sql programming guide.
      b0192d3 [Ahir Reddy] Added Long, Double and Boolean as usable types + unit test
      f98a422 [Ahir Reddy] HiveContexts
      79621cf [Ahir Reddy] cleaning up cruft
      b406ba0 [Ahir Reddy] doctest formatting
      20936a5 [Ahir Reddy] Added tests and documentation
      e4d21b4 [Ahir Reddy] Added pyrolite dependency
      79f739d [Ahir Reddy] added more tests
      7515ba0 [Ahir Reddy] added more tests :)
      d26ec5e [Ahir Reddy] added test
      e9f5b8d [Ahir Reddy] adding tests
      906d180 [Ahir Reddy] added todo explaining cost of creating Row object in python
      251f99d [Ahir Reddy] for now only allow dictionaries as input
      09b9980 [Ahir Reddy] made jrdd explicitly lazy
      c608947 [Ahir Reddy] SchemaRDD now has all RDD operations
      725c91e [Ahir Reddy] awesome row objects
      55d1c76 [Ahir Reddy] return row objects
      4fe1319 [Ahir Reddy] output dictionaries correctly
      be079de [Ahir Reddy] returning dictionaries works
      cd5f79f [Ahir Reddy] Switched to using Scala SQLContext
      e948bd9 [Ahir Reddy] yippie
      4886052 [Ahir Reddy] even better
      c0fb1c6 [Ahir Reddy] more working
      043ca85 [Ahir Reddy] working
      5496f9f [Ahir Reddy] doesn't crash
      b8b904b [Ahir Reddy] Added schema rdd class
      67ba875 [Ahir Reddy] java to python, and python to java
      bcc0f23 [Ahir Reddy] Java to python
      ab6025d [Ahir Reddy] compiling
      c99bcb7f
Loading