  1. Feb 12, 2015
    • [SQL][DOCS] Update sql documentation · 6a1be026
      Antonio Navarro Perez authored
      Updated examples using the new api and added DataFrame concept
      
      Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com>
      
      Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:
      
      82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
      8d5376a [Antonio Navarro Perez] fixed typo
      8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
  2. Feb 10, 2015
    • [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns · ea602840
      Davies Liu authored
      Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which can take an optional `schema` to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4498 from davies/create and squashes the following commits:
      
      08469c1 [Davies Liu] remove Scala/Java API for now
      c80a7a9 [Davies Liu] fix hive test
      d1bd8f2 [Davies Liu] cleanup applySchema
      9526e97 [Davies Liu] createDataFrame from RDD with columns
  3. Feb 05, 2015
    • [Branch-1.3] [DOC] doc fix for date · 6fa4ac1b
      Daoyuan Wang authored
      Trivial fix.
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #4400 from adrian-wang/docdate and squashes the following commits:
      
      31bbe40 [Daoyuan Wang] doc fix for date
    • [SPARK-5608] Improve SEO of Spark documentation pages · 4d74f060
      Matei Zaharia authored
      - Add meta description tags on some of the most important doc pages
      - Shorten the titles of some pages to have more relevant keywords; for
        example there's no reason to have "Spark SQL Programming Guide - Spark
        1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
        documentation".
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #4381 from mateiz/docs-seo and squashes the following commits:
      
      4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
  4. Feb 03, 2015
    • [SPARK-4987] [SQL] parquet timestamp type support · 0c20ce69
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #3820 from adrian-wang/parquettimestamp and squashes the following commits:
      
      b1e2a0d [Daoyuan Wang] fix for nanos
      4dadef1 [Daoyuan Wang] fix wrong read
      93f438d [Daoyuan Wang] parquet timestamp support
  5. Jan 18, 2015
  6. Dec 30, 2014
    • [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE is eager · 2deac748
      luogankun authored
      `CACHE TABLE tbl` is now __eager__ by default, not __lazy__.
      
      Author: luogankun <luogankun@gmail.com>
      
      Closes #3773 from luogankun/SPARK-4930 and squashes the following commits:
      
      cc17b7d [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, add CACHE [LAZY] TABLE [AS SELECT] ...
      bffe0e8 [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE tbl is eager
    • [SPARK-4916][SQL][DOCS]Update SQL programming guide about cache section · f7a41a0e
      luogankun authored
      `SchemaRDD.cache()` now uses in-memory columnar storage.
      
      Author: luogankun <luogankun@gmail.com>
      
      Closes #3759 from luogankun/SPARK-4916 and squashes the following commits:
      
      7b39864 [luogankun] [SPARK-4916]Update SQL programming guide
      6018122 [luogankun] Merge branch 'master' of https://github.com/apache/spark into SPARK-4916
      0b93785 [luogankun] [SPARK-4916]Update SQL programming guide
      99b2336 [luogankun] [SPARK-4916]Update SQL programming guide
  7. Dec 16, 2014
    • [DOCS][SQL] Add a Note on jsonFile having separate JSON objects per line · 1a9e35e5
      Peter Vandenabeele authored
      * This commit hopes to avoid the confusion I faced when trying
        to submit a regular, valid multi-line JSON file, also see
      
        http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
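
The note added by this commit is about input format: `jsonFile` expects every line to hold one complete JSON object. The sketch below uses only the Python standard library and is not Spark code; the helper name `parse_json_lines` and the sample records are hypothetical. It shows, under those assumptions, why a regular multi-line JSON document fails when it is read one line at a time.

```python
import json


def parse_json_lines(text):
    """Parse text holding one JSON object per line; blank lines are skipped."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# One self-contained object per line: each line parses on its own.
line_delimited = '{"name": "Michael"}\n{"name": "Andy", "age": 30}\n'
records = parse_json_lines(line_delimited)

# A regular, valid multi-line JSON document: no single line is valid JSON.
multi_line = '{\n  "name": "Michael"\n}\n'
try:
    parse_json_lines(multi_line)
    multi_line_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    multi_line_ok = False
```

Here `records` holds two dictionaries, while `multi_line_ok` ends up `False` because the first line of the pretty-printed document (a lone `{`) is not a complete JSON value.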
      
      Author: Peter Vandenabeele <peter@vandenabeele.com>
      
      Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:
      
      1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
      6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
      fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line
    • [SQL] SPARK-4700: Add HTTP protocol spark thrift server · 17688d14
      Judy Nash authored
      Add HTTP protocol support and test cases to the Spark Thrift server, so users can deploy the Thrift server in either TCP or HTTP mode.
      
      Author: Judy Nash <judynash@microsoft.com>
      Author: judynash <judynash@microsoft.com>
      
      Closes #3672 from judynash/master and squashes the following commits:
      
      526315d [Judy Nash] correct spacing on startThriftServer method
      31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue
      47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length
      2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide
      1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master'
      2b1d312 [Judy Nash] updated http thrift server support based on feedback
      377532c [judynash] add HTTP protocol spark thrift server
  8. Dec 04, 2014
    • Fix typo in Spark SQL docs. · 15cf3b01
      Andy Konwinski authored
      Author: Andy Konwinski <andykonwinski@gmail.com>
      
      Closes #3611 from andyk/patch-3 and squashes the following commits:
      
      7bab333 [Andy Konwinski] Fix typo in Spark SQL docs.
  9. Dec 01, 2014
  10. Nov 30, 2014
  11. Nov 17, 2014
  12. Nov 11, 2014
    • Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
  13. Nov 07, 2014
  14. Nov 03, 2014
    • [SQL] More aggressive defaults · 25bef7e6
      Michael Armbrust authored
       - Turns on compression for in-memory cached data by default
       - Changes the default parquet compression format back to gzip (we have seen more OOMs with production workloads due to the way Snappy allocates memory)
       - Ups the batch size to 10,000 rows
       - Increases the broadcast threshold to 10 MB.
       - Uses our parquet implementation instead of the hive one by default.
       - Cache parquet metadata by default.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #3064 from marmbrus/fasterDefaults and squashes the following commits:
      
      97ee9f8 [Michael Armbrust] parquet codec docs
      e641694 [Michael Armbrust] Remote also
      a12866a [Michael Armbrust] Cache metadata.
      2d73acc [Michael Armbrust] Update docs defaults.
      d63d2d5 [Michael Armbrust] document parquet option
      da373f9 [Michael Armbrust] More aggressive defaults
  15. Oct 26, 2014
  16. Oct 02, 2014
    • [SQL][Docs] Update the output of printSchema and fix a typo in SQL programming guide. · 82a6a083
      Yin Huai authored
      We have changed the output format of `printSchema`. This PR will update our SQL programming guide to show the updated format. Also, it fixes a typo (the value type of `StructType` in Java API).
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2630 from yhuai/sqlDoc and squashes the following commits:
      
      267d63e [Yin Huai] Update the output of printSchema and fix a typo.
  17. Sep 28, 2014
  18. Sep 27, 2014
  19. Sep 22, 2014
  20. Sep 17, 2014
    • [SQL][DOCS] Improve table caching section · cbf983bb
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2434 from marmbrus/patch-1 and squashes the following commits:
      
      67215be [Michael Armbrust] [SQL][DOCS] Improve table caching section
  21. Sep 16, 2014
    • [SQL][DOCS] Improve section on thrift-server · 84073eb1
      Michael Armbrust authored
      Taken from liancheng's updates. Merged conflicts with #2316.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2384 from marmbrus/sqlDocUpdate and squashes the following commits:
      
      2db6319 [Michael Armbrust] @liancheng's updates
  22. Sep 13, 2014
    • [SQL] [Docs] typo fixes · a523ceaf
      Nicholas Chammas authored
      * Fixed random typo
      * Added in missing description for DecimalType
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2367 from nchammas/patch-1 and squashes the following commits:
      
      aa528be [Nicholas Chammas] doc fix for SQL DecimalType
      3247ac1 [Nicholas Chammas] [SQL] [Docs] typo fixes
  23. Sep 12, 2014
    • [SQL][Docs] Update SQL programming guide to show the correct default value of... · e11eeb71
      Yin Huai authored
      [SQL][Docs] Update SQL programming guide to show the correct default value of containsNull in an ArrayType
      
      After #1889, the default value of `containsNull` in an `ArrayType` is `true`.
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2374 from yhuai/containsNull and squashes the following commits:
      
      dc609a3 [Yin Huai] Update the SQL programming guide to show the correct default value of containsNull in an ArrayType (the default value is true instead of false).
  24. Sep 08, 2014
    • [SQL] Minor edits to sql programming guide. · 26bc7655
      Henry Cook authored
      Author: Henry Cook <hcook@eecs.berkeley.edu>
      
      Closes #2316 from hcook/sql-docs and squashes the following commits:
      
      373f94b [Henry Cook] Minor edits to sql programming guide.
  25. Sep 07, 2014
    • [SQL] Update SQL Programming Guide · 39db1bfd
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2258 from marmbrus/sqlDocUpdate and squashes the following commits:
      
      f3d450b [Michael Armbrust] fix brackets
      bea3bfa [Michael Armbrust] Davies suggestions
      3a29fe2 [Michael Armbrust] tighten visibility
      a71aa36 [Michael Armbrust] Draft of doc updates
      52932c0 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into sqlDocUpdate
      1e8c849 [Yin Huai] Update the example used for applySchema.
      9457c39 [Yin Huai] Update doc.
      31ba240 [Yin Huai] Merge remote-tracking branch 'upstream/master' into dataTypeDoc
      29bc668 [Yin Huai] Draft doc for data type and schema APIs.
  26. Aug 29, 2014
  27. Aug 20, 2014
    • SPARK-3092 [SQL]: Always include the thriftserver when -Phive is enabled. · f2f26c2a
      Patrick Wendell authored
      Currently we have a separate profile called hive-thriftserver. I originally suggested this in case users did not want to bundle the thriftserver, but it has ultimately led to a lot of confusion. Since the thriftserver is only a few classes, I don't see a really good reason to isolate it from the rest of Hive. So let's go ahead and just include it in the same profile to simplify things.
      
      This has been suggested in the past by liancheng.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2006 from pwendell/hiveserver and squashes the following commits:
      
      742ea40 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into hiveserver
      034ad47 [Patrick Wendell] SPARK-3092: Always include the thriftserver when -Phive is enabled.
  28. Aug 18, 2014
    • SPARK-3025 [SQL]: Allow JDBC clients to set a fair scheduler pool · 6bca8898
      Patrick Wendell authored
      This definitely needs review as I am not familiar with this part of Spark.
      I tested this locally and it did seem to work.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #1937 from pwendell/scheduler and squashes the following commits:
      
      b858e33 [Patrick Wendell] SPARK-3025: Allow JDBC clients to set a fair scheduler pool
  29. Aug 03, 2014
    • [SPARK-2784][SQL] Deprecate hql() method in favor of a config option, 'spark.sql.dialect' · 236dfac6
      Michael Armbrust authored
      Many users have reported being confused by the distinction between the `sql` and `hql` methods.  Specifically, many users think that `sql(...)` cannot be used to read Hive tables.  In this PR I introduce a new configuration option `spark.sql.dialect` that picks which dialect will be used for parsing.  For `SQLContext` this must be set to `sql`.  In `HiveContext` it defaults to `hiveql` but can also be set to `sql`.
      
      The `hql` and `hiveql` methods continue to act the same but are now marked as deprecated.
      
      **This is a possibly breaking change for some users unless they set the dialect manually, though this is unlikely.**
      
      For example: `hiveContext.sql("SELECT 1")` will now throw a parsing exception by default.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1746 from marmbrus/sqlLanguageConf and squashes the following commits:
      
      ad375cc [Michael Armbrust] Merge remote-tracking branch 'apache/master' into sqlLanguageConf
      20c43f8 [Michael Armbrust] override function instead of just setting the value
      7e4ae93 [Michael Armbrust] Deprecate hql() method in favor of a config option, 'spark.sql.dialect'
  30. Aug 02, 2014
    • [SPARK-2739][SQL] Rename registerAsTable to registerTempTable · 1a804373
      Michael Armbrust authored
      There have been user complaints that the difference between `registerAsTable` and `saveAsTable` is too subtle.  This PR addresses this by renaming `registerAsTable` to `registerTempTable`, which more clearly reflects what is happening.  `registerAsTable` remains, but will cause a deprecation warning.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1743 from marmbrus/registerTempTable and squashes the following commits:
      
      d031348 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into registerTempTable
      4dff086 [Michael Armbrust] Fix .java files too
      89a2f12 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into registerTempTable
      0b7b71e [Michael Armbrust] Rename registerAsTable to registerTempTable
  31. Aug 01, 2014
    • [SQL] Documentation: Explain cacheTable command · c82fe478
      CrazyJvm authored
      add the `cacheTable` specification
      
      Author: CrazyJvm <crazyjvm@gmail.com>
      
      Closes #1681 from CrazyJvm/sql-programming-guide-cache and squashes the following commits:
      
      0a231e0 [CrazyJvm] grammar fixes
      a04020e [CrazyJvm] modify title to Cached tables
      18b6594 [CrazyJvm] fix format
      2cbbf58 [CrazyJvm] add cacheTable guide
  32. Jul 31, 2014
    • [SPARK-2397][SQL] Deprecate LocalHiveContext · 72cfb139
      Michael Armbrust authored
      LocalHiveContext is redundant with HiveContext.  The only difference is it creates `./metastore` instead of `./metastore_db`.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1641 from marmbrus/localHiveContext and squashes the following commits:
      
      e5ec497 [Michael Armbrust] Add deprecation version
      626e056 [Michael Armbrust] Don't remove from imports yet
      905cc5f [Michael Armbrust] Merge remote-tracking branch 'apache/master' into localHiveContext
      1c2727e [Michael Armbrust] Deprecate LocalHiveContext
  33. Jul 28, 2014
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module is defined outside the `hive-thriftserver` profile. Thus every pull request, even one that doesn't touch SQL code, also executes the test suites defined in `hive-thriftserver`, but the tests fail because the related .class files are not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
  34. Jul 27, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server