    [SPARK-2060][SQL] Querying JSON Datasets with SQL and DSL in Spark SQL · d2f4f30b
    Yin Huai authored
    JIRA: https://issues.apache.org/jira/browse/SPARK-2060
    
    Programming guide: http://yhuai.github.io/site/sql-programming-guide.html
    
    Scala doc of SQLContext: http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.sql.SQLContext
    
    Author: Yin Huai <huai@cse.ohio-state.edu>
    
    Closes #999 from yhuai/newJson and squashes the following commits:
    
    227e89e [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    ce8eedd [Yin Huai] rxin's comments.
    bc9ac51 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    94ffdaa [Yin Huai] Remove "get" from method names.
    ce31c81 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    e2773a6 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    79ea9ba [Yin Huai] Fix typos.
    5428451 [Yin Huai] Newline
    1f908ce [Yin Huai] Remove extra line.
    d7a005c [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    7ea750e [Yin Huai] marmbrus's comments.
    6a5f5ef [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    83013fb [Yin Huai] Update Java Example.
    e7a6c19 [Yin Huai] SchemaRDD.javaToPython should convert a field with the StructType to a Map.
    6d20b85 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    4fbddf0 [Yin Huai] Programming guide.
    9df8c5a [Yin Huai] Python API.
    7027634 [Yin Huai] Java API.
    cff84cc [Yin Huai] Use a SchemaRDD for a JSON dataset.
    d0bd412 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    ab810b0 [Yin Huai] Make JsonRDD private.
    6df0891 [Yin Huai] Apache header.
    8347f2e [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    66f9e76 [Yin Huai] Update docs and use the entire dataset to infer the schema.
    8ffed79 [Yin Huai] Update the example.
    a5a4b52 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    4325475 [Yin Huai] If a sampled dataset is used for schema inference, update the schema of the JsonTable after first execution.
    65b87f0 [Yin Huai] Fix sampling...
    8846af5 [Yin Huai] API doc.
    52a2275 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    0387523 [Yin Huai] Address PR comments.
    666b957 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    a2313a6 [Yin Huai] Address PR comments.
    f3ce176 [Yin Huai] After type conflict resolution, if a NullType is found, StringType is used.
    0576406 [Yin Huai] Add Apache license header.
    af91b23 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
    f45583b [Yin Huai] Infer the schema of a JSON dataset (a text file with one JSON object per line or an RDD[String] with one JSON object per string) and return a SchemaRDD.
    f31065f [Yin Huai] A query plan or a SchemaRDD can print out its schema.
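    Illustration of the inference behaviour described in f45583b, 66f9e76 and f3ce176: the sketch below is a minimal, standalone approximation in plain Python, not the Spark implementation. It reads one JSON object per string, merges the per-record field types over the entire dataset, and replaces any field that is still null-typed after merging with a string type. The type names, the helper functions (`type_of`, `merge`, `finalize`, `infer_schema`), the long-to-double widening rule, and the treatment of arrays as strings are all assumptions made for this example.

```python
import json

# Hypothetical type names for illustration; these are not Spark's classes.
NULL, BOOLEAN, LONG, DOUBLE, STRING = "null", "boolean", "long", "double", "string"


def type_of(value):
    """Map a parsed JSON value to a type name; nested objects become dicts."""
    if value is None:
        return NULL
    if isinstance(value, bool):  # bool must be tested before int
        return BOOLEAN
    if isinstance(value, int):
        return LONG
    if isinstance(value, float):
        return DOUBLE
    if isinstance(value, dict):
        return {key: type_of(item) for key, item in value.items()}
    return STRING  # strings; arrays are simplified to string in this sketch


def merge(a, b):
    """Resolve two observed types for the same field into one."""
    if a == b:
        return a
    if a == NULL:
        return b
    if b == NULL:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        keys = sorted(set(a) | set(b))
        return {key: merge(a.get(key, NULL), b.get(key, NULL)) for key in keys}
    if isinstance(a, dict) or isinstance(b, dict):
        return STRING  # object vs. primitive: fall back to string
    if {a, b} == {LONG, DOUBLE}:
        return DOUBLE  # assumed numeric widening
    return STRING  # any other conflict falls back to string


def finalize(schema):
    """After conflict resolution, turn remaining null types into string."""
    if isinstance(schema, dict):
        return {key: finalize(item) for key, item in schema.items()}
    return STRING if schema == NULL else schema


def infer_schema(lines):
    """Infer a schema from an iterable of strings, one JSON object each."""
    schema = {}
    for line in lines:
        schema = merge(schema, type_of(json.loads(line)))
    return finalize(schema)


records = [
    '{"name": "a", "age": 1, "note": null}',
    '{"name": "b", "age": 2.5, "address": {"city": "x"}}',
]
schema = infer_schema(records)
```

    Here `note` is null in every record where it appears, so it ends up typed as a string, which mirrors the rule stated in f3ce176. Scanning every record rather than a sample follows 66f9e76.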