    [SPARK-13410][SQL] Support unionAll for DataFrames with UDT columns. · 0f90f4e6
    Franklyn D'souza authored
    ## What changes were proposed in this pull request?
    
    This PR adds equality operators to UDT classes so that they can be correctly tested for dataType equality during union operations.
    
    Previously, calling `unionAll` on two DataFrames with UDT columns raised `AnalysisException: u"unresolved operator 'Union;"`, as in the example below.
    
    ```
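    # Assumes an existing SQLContext is bound to `sqlCtx`.
    from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
    from pyspark.sql import types
    
    # Schema with a single nullable column whose type is a Python-only UDT.
    schema = types.StructType([types.StructField("point", PythonOnlyUDT(), True)])
    
    a = sqlCtx.createDataFrame([[PythonOnlyPoint(1.0, 2.0)]], schema)
    b = sqlCtx.createDataFrame([[PythonOnlyPoint(3.0, 4.0)]], schema)
    
    # Before this change, this line raised the AnalysisException shown above.
    c = a.unionAll(b)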
    ```
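
The role of the equality operators can be sketched without Spark. The classes below are hypothetical stand-ins, not the actual PySpark types, and `schemas_match` is an assumed simplification of the column-by-column dataType comparison a union performs. The sketch only illustrates why two separately constructed UDT instances need class-based equality to be treated as the same type.

```python
class NoEqUDT:
    """Hypothetical UDT without equality operators: falls back to identity."""


class UDTSketch:
    """Hypothetical UDT base class with class-based equality."""

    def __eq__(self, other):
        # Two UDT instances describe the same type when they share a class.
        return type(self) == type(other)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(type(self).__name__)


class PointUDT(UDTSketch):
    pass


class OtherUDT(UDTSketch):
    pass


def schemas_match(left, right):
    """Assumed stand-in for a union's check: compare column types pairwise."""
    return len(left) == len(right) and all(l == r for l, r in zip(left, right))


# Without __eq__, two fresh instances of the same UDT compare as different.
print(schemas_match([NoEqUDT()], [NoEqUDT()]))
# With class-based equality, they compare as the same type.
print(schemas_match([PointUDT()], [PointUDT()]))
# Different UDT classes still compare as different types.
print(schemas_match([PointUDT()], [OtherUDT()]))
```

In this sketch the first check fails because each schema holds its own instance of the type, which mirrors the situation in the example above where `PythonOnlyUDT()` appears in the schema of both DataFrames.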
    
    ## How was this patch tested?
    
    Tested with two unit tests in sql/tests.py and in the DataFrameSuite.
    
    Additional information: https://issues.apache.org/jira/browse/SPARK-13410
    
    Author: Franklyn D'souza <franklynd@gmail.com>
    
    Closes #11279 from damnMeddlingKid/udt-union-all.