    [SPARK-8231] [SQL] Add array_contains · d3454858
    Pedro Rodriguez authored
    This PR is based on #7580, thanks to EntilZha
    
    PR for work on https://issues.apache.org/jira/browse/SPARK-8231
    
    Currently, I have an initial implementation for contains. Based on the discussion on JIRA, it should behave the same as Hive: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128
    
    The main points are:
    1. If the array is empty or null, or the value is null, return false
    2. If there is a type mismatch, throw an error
    3. If the comparison is not supported, throw an error
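
The three points above can be sketched in plain Python. This is only an illustration of the described semantics, not the Spark or Hive implementation: the function name `array_contains` mirrors the PR title, while `UNSUPPORTED_TYPES`, the use of `TypeError`, and the order in which the checks run are assumptions made for the example.

```python
# Hypothetical stand-in for "comparison is not supported" (point 3).
# Which types actually fall into this category is not stated in the commit message.
UNSUPPORTED_TYPES = (dict,)


def array_contains(array, value):
    """Return True if `value` occurs in `array`, following the three points above."""
    # Point 1: an empty array, a null array, or a null value yields False.
    if array is None or value is None or len(array) == 0:
        return False

    # Point 3: refuse value types that this sketch treats as not comparable.
    if isinstance(value, UNSUPPORTED_TYPES):
        raise TypeError("comparison is not supported for " + type(value).__name__)

    # Point 2: every non-null element must have the same type as the value.
    for element in array:
        if element is not None and type(element) is not type(value):
            raise TypeError(
                "type mismatch: array element is "
                + type(element).__name__
                + ", value is "
                + type(value).__name__
            )

    # Plain linear scan; null elements never match a non-null value.
    return any(element == value for element in array)
```

For example, `array_contains(["a", "b"], "a")` returns `True`, `array_contains([], "a")` and `array_contains(["a"], None)` return `False`, and `array_contains(["a"], 1)` raises `TypeError`. In this sketch the null and empty checks run first, so a mismatched type against an empty array returns `False` rather than raising; a real engine that checks types ahead of execution could behave differently.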
    
    Closes #7580
    
    Author: Pedro Rodriguez <prodriguez@trulia.com>
    Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
    Author: Davies Liu <davies@databricks.com>
    
    Closes #7949 from davies/array_contains and squashes the following commits:
    
    d3c08bc [Davies Liu] use foreach() to avoid copy
    bc3d1fe [Davies Liu] fix array_contains
    719e37d [Davies Liu] Merge branch 'master' of github.com:apache/spark into array_contains
    e352cf9 [Pedro Rodriguez] fixed diff from master
    4d5b0ff [Pedro Rodriguez] added docs and another type check
    ffc0591 [Pedro Rodriguez] fixed unit test
    7a22deb [Pedro Rodriguez] Changed test to use strings instead of long/ints which are different between python 2 and 3
    b5ffae8 [Pedro Rodriguez] fixed pyspark test
    4e7dce3 [Pedro Rodriguez] added more docs
    3082399 [Pedro Rodriguez] fixed unit test
    46f9789 [Pedro Rodriguez] reverted change
    d3ca013 [Pedro Rodriguez] Fixed type checking to match hive behavior, then added tests to ensure this
    8528027 [Pedro Rodriguez] added more tests
    686e029 [Pedro Rodriguez] fix scala style
    d262e9d [Pedro Rodriguez] reworked type checking code and added more tests
    2517a58 [Pedro Rodriguez] removed unused import
    28b4f71 [Pedro Rodriguez] fixed bug with type conversions and re-added tests
    12f8795 [Pedro Rodriguez] fix scala style checks
    e8a20a9 [Pedro Rodriguez] added python df (broken atm)
    65b562c [Pedro Rodriguez] made array_contains nullable false
    33b45aa [Pedro Rodriguez] reordered test
    9623c64 [Pedro Rodriguez] fixed test
    4b4425b [Pedro Rodriguez] changed Arrays in tests to Seqs
    72cb4b1 [Pedro Rodriguez] added checkInputTypes and docs
    69c46fb [Pedro Rodriguez] added tests and codegen
    9e0bfc4 [Pedro Rodriguez] initial attempt at implementation