    [SPARK-2736] PySpark converter and example script for reading Avro files · 9422a9b0
    Kan Zhang authored
    JIRA: https://issues.apache.org/jira/browse/SPARK-2736
    
    This patch includes:
    1. An Avro converter that converts Avro data types to Python. It handles all 3 Avro data mappings (Generic, Specific and Reflect).
    2. An example Python script for reading Avro files using AvroKeyInputFormat and the converter.
    3. Fixing a classloading issue.
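
A rough sketch of what item 2 might look like from the PySpark side. `AvroKeyInputFormat` comes from the description above; the key and value class names, the converter's fully qualified name, and the `avro.schema.input.key` reader-schema setting are assumptions for illustration and should be checked against the patch itself. `read_avro` is a hypothetical helper, not part of the patch.

```python
# Sketch only: apart from AvroKeyInputFormat, the class names and the conf key
# below are assumptions, not confirmed by the commit message.
AVRO_INPUT_FORMAT = "org.apache.avro.mapreduce.AvroKeyInputFormat"
AVRO_KEY_CLASS = "org.apache.avro.mapred.AvroKey"
NULL_VALUE_CLASS = "org.apache.hadoop.io.NullWritable"
# Assumed fully qualified name of the Avro-to-Java converter.
KEY_CONVERTER = (
    "org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter"
)


def read_avro(sc, path, reader_schema_json=None):
    """Return an RDD of (record, None) pairs read from an Avro file.

    `sc` is a SparkContext (or anything exposing newAPIHadoopFile).
    `reader_schema_json`, if given, is passed to the input format as the
    reader schema through the Hadoop configuration.
    """
    conf = None
    if reader_schema_json is not None:
        conf = {"avro.schema.input.key": reader_schema_json}
    return sc.newAPIHadoopFile(
        path,
        AVRO_INPUT_FORMAT,
        AVRO_KEY_CLASS,
        NULL_VALUE_CLASS,
        keyConverter=KEY_CONVERTER,
        conf=conf,
    )
```

With a live `SparkContext`, something like `read_avro(sc, "users.avro").map(lambda kv: kv[0]).collect()` would then yield the decoded records, assuming the jar containing the converter is on the driver classpath.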
    
    cc @MLnick @JoshRosen @mateiz
    
    Author: Kan Zhang <kzhang@apache.org>
    
    Closes #1916 from kanzhang/SPARK-2736 and squashes the following commits:
    
    02443f8 [Kan Zhang] [SPARK-2736] Adding .avsc files to .rat-excludes
    f74e9a9 [Kan Zhang] [SPARK-2736] nit: clazz -> className
    82cc505 [Kan Zhang] [SPARK-2736] Update data sample
    0be7761 [Kan Zhang] [SPARK-2736] Example pyspark script and data files
    c8e5881 [Kan Zhang] [SPARK-2736] Trying to work with all 3 Avro data models
    2271a5b [Kan Zhang] [SPARK-2736] Using the right class loader to find Avro classes
    536876b [Kan Zhang] [SPARK-2736] Adding Avro to Java converter