    [SPARK-8230][SQL] Add array/map size method · 560c658a
    Pedro Rodriguez authored
    Pull Request for: https://issues.apache.org/jira/browse/SPARK-8230
    
    The primary issue resolved is implementing array/map size for Spark SQL. The code is ready for review by a committer. Chen Hao is on the JIRA ticket, but I don't know his username on GitHub; rxin is also on the JIRA ticket.
    
    Things to review:
    1. Where to put the added functions namespace-wise. They seem to belong to a group of operations on collections that includes `sort_array` and `array_contains`, hence the names `collectionOperations.scala` and `_collection_functions` in Python.
    2. In the Python code, should it be in a `1.5.0` function array or in a collections array?
    3. Are there any missing methods on the `Size` case class? Many of these functions appear to have generated Java code; is that also needed in this case?
    4. Something else?
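
For reviewers who want the intended semantics at a glance, here is a minimal pure-Python sketch of a size operation over arrays and maps. It is not the Spark implementation: the name `size_of` is hypothetical, and the null-in, null-out behaviour is an assumption based on the switch to null-safe evaluation noted in the squashed commits.

```python
from typing import Any, Optional


def size_of(value: Any) -> Optional[int]:
    """Hypothetical model of a null-safe size over arrays and maps.

    Assumption: a null (None) input yields a null (None) result rather
    than an error or a sentinel value.
    """
    if value is None:
        # Null-safe path: propagate the null instead of measuring it.
        return None
    if isinstance(value, (list, tuple, dict)):
        # Arrays count their elements; maps count their key/value pairs.
        return len(value)
    raise TypeError(f"size_of expects an array or map, got {type(value).__name__}")
```

For example, `size_of([1, 2, 3])` counts three elements, `size_of({"a": 1})` counts one entry, and `size_of(None)` stays null under the stated assumption.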
    
    Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
    Author: Pedro Rodriguez <prodriguez@trulia.com>
    
    Closes #7462 from EntilZha/SPARK-8230 and squashes the following commits:
    
    9a442ae [Pedro Rodriguez] fixed functions and sorted __all__
    9aea3bb [Pedro Rodriguez] removed imports from python docs
    15d4bf1 [Pedro Rodriguez] Added null test case and changed to nullSafeCodeGen
    d88247c [Pedro Rodriguez] removed python code
    bd5f0e4 [Pedro Rodriguez] removed duplicate function from rebase/merge
    59931b4 [Pedro Rodriguez] fixed compile bug introduced when merging
    c187175 [Pedro Rodriguez] updated code to add size to __all__ directly and removed redundant pretty print
    130839f [Pedro Rodriguez] fixed failing test
    aa9bade [Pedro Rodriguez] fix style
    e093473 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
    0449377 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
    9a1a2ff [Pedro Rodriguez] added unit tests for map size
    2bfbcb6 [Pedro Rodriguez] added unit test for size
    20df2b4 [Pedro Rodriguez] Finished working version of size function and added it to python
    b503e75 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
    99a6a5c [Pedro Rodriguez] fixed failing test
    cac75ac [Pedro Rodriguez] fix style
    933d843 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
    42bb7d4 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
    f9c3b8a [Pedro Rodriguez] added unit tests for map size
    2515d9f [Pedro Rodriguez] added documentation
    0e60541 [Pedro Rodriguez] added unit test for size
    acf9853 [Pedro Rodriguez] Finished working version of size function and added it to python
    84a5d38 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
functions.py 30.62 KiB