Commit 53aa8316 authored by Nicholas Chammas, committed by Michael Armbrust

[Docs] SQL doc formatting and typo fixes

As [reported on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC2-tp8107p8131.html):
* Code fencing with triple-backticks doesn’t seem to work like it does on GitHub. Newlines are lost. Instead, use 4-space indent to format small code blocks.
* Nested bullets need 2 leading spaces, not 1.
* Spellcheck!

Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Author: nchammas <nicholas.chammas@gmail.com>

Closes #2201 from nchammas/sql-doc-fixes and squashes the following commits:

873f889 [Nicholas Chammas] [Docs] fix skip-api flag
5195e0c [Nicholas Chammas] [Docs] SQL doc formatting and typo fixes
3b26c8d [nchammas] [Spark QA] Link to console output on test time out
parent e248328b
@@ -30,7 +30,7 @@ called `_site` containing index.html as well as the rest of the compiled files.
 You can modify the default Jekyll build as follows:
 # Skip generating API docs (which takes a while)
-$ SKIP_SCALADOC=1 jekyll build
+$ SKIP_API=1 jekyll build
 # Serve content locally on port 4000
 $ jekyll serve --watch
 # Build the site with extra features used on the live page
@@ -474,10 +474,10 @@ anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
 Spark SQL also supports reading and writing data stored in [Apache Hive](http://hive.apache.org/).
 However, since Hive has a large number of dependencies, it is not included in the default Spark assembly.
-In order to use Hive you must first run '`sbt/sbt -Phive assembly/assembly`' (or use `-Phive` for maven).
+In order to use Hive you must first run "`sbt/sbt -Phive assembly/assembly`" (or use `-Phive` for maven).
 This command builds a new assembly jar that includes Hive. Note that this Hive assembly jar must also be present
 on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries
-(SerDes) in order to acccess data stored in Hive.
+(SerDes) in order to access data stored in Hive.
 Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`.
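
For orientation, here is a minimal Scala sketch of the Hive access this section of the guide describes. It assumes `sc` is an existing `SparkContext` (as in `spark-shell`) and uses a hypothetical `src` table; treat it as an illustration, not as text added by this commit.

```scala
// Minimal sketch of querying Hive through Spark SQL (Spark 1.1-era API).
// Assumes `sc` is an existing SparkContext and that the Hive-enabled
// assembly and conf/hive-site.xml are in place, as described above.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// `src` is a hypothetical table; any existing Hive table works the same way.
hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.sql("SELECT key, value FROM src LIMIT 10").collect().foreach(println)
```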
@@ -576,9 +576,8 @@ evaluated by the SQL execution engine. A full list of the functions supported c
 ## Running the Thrift JDBC server
-The Thrift JDBC server implemented here corresponds to the [`HiveServer2`]
-(https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2) in Hive 0.12. You can test
-the JDBC server with the beeline script comes with either Spark or Hive 0.12.
+The Thrift JDBC server implemented here corresponds to the [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2)
+in Hive 0.12. You can test the JDBC server with the beeline script comes with either Spark or Hive 0.12.
 To start the JDBC server, run the following in the Spark directory:
@@ -597,7 +596,7 @@ Connect to the JDBC server in beeline with:
 Beeline will ask you for a username and password. In non-secure mode, simply enter the username on
 your machine and a blank password. For secure mode, please follow the instructions given in the
-[beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients)
+[beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).
 Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`.
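
Besides beeline, any JDBC client can talk to the Thrift server. The sketch below is an assumption-laden illustration: the driver class, the default port 10000, and the blank password for non-secure mode are the usual HiveServer2 conventions, not something specified by this commit, and the Hive JDBC driver jar must be on the classpath.

```scala
// Sketch of a programmatic client for the Thrift JDBC server, as an
// alternative to the beeline CLI. Assumes the Hive JDBC driver is on the
// classpath and the server is listening on the default port 10000.
import java.sql.DriverManager

object JdbcClientSketch {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // In non-secure mode, use your local username and a blank password.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SHOW TABLES")
    while (rs.next()) println(rs.getString(1))
    rs.close()
    stmt.close()
    conn.close()
  }
}
```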
@@ -616,11 +615,10 @@ In Shark, default reducer number is 1 and is controlled by the property `mapred.
 SQL deprecates this property by a new property `spark.sql.shuffle.partitions`, whose default value
 is 200. Users may customize this property via `SET`:
-```
-SET spark.sql.shuffle.partitions=10;
-SELECT page, count(*) c FROM logs_last_month_cached
-GROUP BY page ORDER BY c DESC LIMIT 10;
-```
+    SET spark.sql.shuffle.partitions=10;
+    SELECT page, count(*) c
+    FROM logs_last_month_cached
+    GROUP BY page ORDER BY c DESC LIMIT 10;
 You may also put this property in `hive-site.xml` to override the default value.
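
The same `SET` statement can also be issued from program code. A brief sketch, assuming a `HiveContext` named `hiveContext` as in the earlier sketch and the illustrative `logs_last_month_cached` table from the snippet above:

```scala
// Sketch: tuning spark.sql.shuffle.partitions from Scala before a
// shuffle-heavy aggregation. Assumes `hiveContext` from the sketch above;
// table and column names are illustrative.
hiveContext.sql("SET spark.sql.shuffle.partitions=10")
val topPages = hiveContext.sql(
  """SELECT page, count(*) c
    |FROM logs_last_month_cached
    |GROUP BY page ORDER BY c DESC LIMIT 10""".stripMargin)
topPages.collect().foreach(println)
```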
@@ -630,22 +628,18 @@ For now, the `mapred.reduce.tasks` property is still recognized, and is converte
 #### Caching
 The `shark.cache` table property no longer exists, and tables whose name end with `_cached` are no
-longer automcatically cached. Instead, we provide `CACHE TABLE` and `UNCACHE TABLE` statements to
+longer automatically cached. Instead, we provide `CACHE TABLE` and `UNCACHE TABLE` statements to
 let user control table caching explicitly:
-```
-CACHE TABLE logs_last_month;
-UNCACHE TABLE logs_last_month;
-```
+    CACHE TABLE logs_last_month;
+    UNCACHE TABLE logs_last_month;
-**NOTE** `CACHE TABLE tbl` is lazy, it only marks table `tbl` as "need to by cached if necessary",
+**NOTE:** `CACHE TABLE tbl` is lazy, it only marks table `tbl` as "need to by cached if necessary",
 but doesn't actually cache it until a query that touches `tbl` is executed. To force the table to be
 cached, you may simply count the table immediately after executing `CACHE TABLE`:
-```
-CACHE TABLE logs_last_month;
-SELECT COUNT(1) FROM logs_last_month;
-```
+    CACHE TABLE logs_last_month;
+    SELECT COUNT(1) FROM logs_last_month;
 Several caching related features are not supported yet:
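
As a companion to the `CACHE TABLE` / `UNCACHE TABLE` statements shown in the hunk above, a short Scala sketch of the programmatic equivalent; `cacheTable`/`uncacheTable` live on `SQLContext` in this era, and the table name here is illustrative:

```scala
// Sketch: caching a table from code instead of with CACHE TABLE.
// Assumes `hiveContext` from the earlier sketch. Like CACHE TABLE,
// caching is lazy; a count forces materialization.
hiveContext.cacheTable("logs_last_month")
hiveContext.sql("SELECT COUNT(1) FROM logs_last_month").collect()
// ... run queries against the cached data ...
hiveContext.uncacheTable("logs_last_month")
```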
@@ -655,7 +649,7 @@ Several caching related features are not supported yet:
 ### Compatibility with Apache Hive
-#### Deploying in Exising Hive Warehouses
+#### Deploying in Existing Hive Warehouses
 Spark SQL Thrift JDBC server is designed to be "out of the box" compatible with existing Hive
 installations. You do not need to modify your existing Hive Metastore or change the data placement
@@ -666,50 +660,50 @@ or partitioning of your tables.
 Spark SQL supports the vast majority of Hive features, such as:
 * Hive query statements, including:
   * `SELECT`
-  * `GROUP BY
+  * `GROUP BY`
   * `ORDER BY`
   * `CLUSTER BY`
   * `SORT BY`
 * All Hive operators, including:
-  * Relational operators (`=`, ``, `==`, `<>`, `<`, `>`, `>=`, `<=`, etc)
+  * Relational operators (`=`, `⇔`, `==`, `<>`, `<`, `>`, `>=`, `<=`, etc)
-  * Arthimatic operators (`+`, `-`, `*`, `/`, `%`, etc)
+  * Arithmetic operators (`+`, `-`, `*`, `/`, `%`, etc)
   * Logical operators (`AND`, `&&`, `OR`, `||`, etc)
   * Complex type constructors
-  * Mathemtatical functions (`sign`, `ln`, `cos`, etc)
+  * Mathematical functions (`sign`, `ln`, `cos`, etc)
   * String functions (`instr`, `length`, `printf`, etc)
 * User defined functions (UDF)
 * User defined aggregation functions (UDAF)
-* User defined serialization formats (SerDe's)
+* User defined serialization formats (SerDes)
 * Joins
   * `JOIN`
   * `{LEFT|RIGHT|FULL} OUTER JOIN`
   * `LEFT SEMI JOIN`
   * `CROSS JOIN`
 * Unions
-* Sub queries
+* Sub-queries
   * `SELECT col FROM ( SELECT a + b AS col from t1) t2`
 * Sampling
 * Explain
 * Partitioned tables
 * All Hive DDL Functions, including:
   * `CREATE TABLE`
   * `CREATE TABLE AS SELECT`
   * `ALTER TABLE`
 * Most Hive Data types, including:
   * `TINYINT`
   * `SMALLINT`
   * `INT`
   * `BIGINT`
   * `BOOLEAN`
   * `FLOAT`
   * `DOUBLE`
   * `STRING`
   * `BINARY`
   * `TIMESTAMP`
   * `ARRAY<>`
   * `MAP<>`
   * `STRUCT<>`
 #### Unsupported Hive Functionality
@@ -749,8 +743,7 @@ releases of Spark SQL.
 Hive automatically converts the join into a map join. We are adding this auto conversion in the
 next release.
 * Automatically determine the number of reducers for joins and groupbys: Currently in Spark SQL, you
-need to control the degree of parallelism post-shuffle using "SET
-spark.sql.shuffle.partitions=[num_tasks];". We are going to add auto-setting of parallelism in the
+need to control the degree of parallelism post-shuffle using "`SET spark.sql.shuffle.partitions=[num_tasks];`". We are going to add auto-setting of parallelism in the
 next release.
 * Meta-data only query: For queries that can be answered by using only meta data, Spark SQL still
 launches tasks to compute the result.