  1. Aug 12, 2015
  2. Aug 06, 2015
  3. Aug 05, 2015
    • [SPARK-8231] [SQL] Add array_contains · d3454858
      Pedro Rodriguez authored
      This PR is based on #7580, thanks to EntilZha.
      
      PR for work on https://issues.apache.org/jira/browse/SPARK-8231
      
      Currently, I have an initial implementation for contains. Based on the discussion on JIRA, it should behave the same as Hive: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128
      
      Main points are:
      1. If the array is empty or null, or the value is null, return false
      2. If there is a type mismatch, throw an error
      3. If the comparison is not supported, throw an error
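
The three points above can be sketched in plain Python. This is only an illustrative model of the described semantics, not the Spark or Hive implementation: the name `array_contains_sketch` and the choice of `TypeError` are assumptions, and the "comparison not supported" case is not modeled.

```python
def array_contains_sketch(array, value):
    """Model of the semantics listed above (hypothetical helper, not Spark code)."""
    # Point 1: an empty or null array, or a null value, yields False.
    if array is None or value is None or len(array) == 0:
        return False
    # Point 2: a type mismatch between the value and an element is an error.
    for element in array:
        if element is not None and type(element) is not type(value):
            raise TypeError(
                "value type %s does not match element type %s"
                % (type(value).__name__, type(element).__name__)
            )
    # Otherwise, report whether any element equals the value.
    return any(element == value for element in array)
```

Under these assumptions, a null array, an empty array, and a null value all give `False`, while mixing strings and integers raises an error instead of silently returning `False`.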
      
      Closes #7580
      
      Author: Pedro Rodriguez <prodriguez@trulia.com>
      Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7949 from davies/array_contains and squashes the following commits:
      
      d3c08bc [Davies Liu] use foreach() to avoid copy
      bc3d1fe [Davies Liu] fix array_contains
      719e37d [Davies Liu] Merge branch 'master' of github.com:apache/spark into array_contains
      e352cf9 [Pedro Rodriguez] fixed diff from master
      4d5b0ff [Pedro Rodriguez] added docs and another type check
      ffc0591 [Pedro Rodriguez] fixed unit test
      7a22deb [Pedro Rodriguez] Changed test to use strings instead of long/ints which are different between python 2 an 3
      b5ffae8 [Pedro Rodriguez] fixed pyspark test
      4e7dce3 [Pedro Rodriguez] added more docs
      3082399 [Pedro Rodriguez] fixed unit test
      46f9789 [Pedro Rodriguez] reverted change
      d3ca013 [Pedro Rodriguez] Fixed type checking to match hive behavior, then added tests to insure this
      8528027 [Pedro Rodriguez] added more tests
      686e029 [Pedro Rodriguez] fix scala style
      d262e9d [Pedro Rodriguez] reworked type checking code and added more tests
      2517a58 [Pedro Rodriguez] removed unused import
      28b4f71 [Pedro Rodriguez] fixed bug with type conversions and re-added tests
      12f8795 [Pedro Rodriguez] fix scala style checks
      e8a20a9 [Pedro Rodriguez] added python df (broken atm)
      65b562c [Pedro Rodriguez] made array_contains nullable false
      33b45aa [Pedro Rodriguez] reordered test
      9623c64 [Pedro Rodriguez] fixed test
      4b4425b [Pedro Rodriguez] changed Arrays in tests to Seqs
      72cb4b1 [Pedro Rodriguez] added checkInputTypes and docs
      69c46fb [Pedro Rodriguez] added tests and codegen
      9e0bfc4 [Pedro Rodriguez] initial attempt at implementation
      d3454858
  4. Aug 04, 2015
    • [SPARK-9513] [SQL] [PySpark] Add python API for DataFrame functions · 2b67fdb6
      Davies Liu authored
      This adds a Python API for the DataFrame functions introduced in 1.5.
      
      There is an issue with serializing byte_array in Python 3, so some of the functions (for BinaryType) do not have tests.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7922 from davies/python_functions and squashes the following commits:
      
      8ad942f [Davies Liu] fix test
      5fb6ec3 [Davies Liu] fix bugs
      3495ed3 [Davies Liu] fix issues
      ea5f7bb [Davies Liu] Add python API for DataFrame functions
      2b67fdb6
  5. Aug 01, 2015
    • [SPARK-8269] [SQL] string function: initcap · 00cd92f3
      HuJiayin authored
      This PR is based on #7208, thanks to HuJiayin.
      
      Closes #7208
      
      Author: HuJiayin <jiayin.hu@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7850 from davies/initcap and squashes the following commits:
      
      54472e9 [Davies Liu] fix python test
      17ffe51 [Davies Liu] Merge branch 'master' of github.com:apache/spark into initcap
      ca46390 [Davies Liu] Merge branch 'master' of github.com:apache/spark into initcap
      3a906e4 [Davies Liu] implement title case in UTF8String
      8b2506a [HuJiayin] Update functions.py
      2cd43e5 [HuJiayin] fix python style check
      b616c0e [HuJiayin] add python api
      1f5a0ef [HuJiayin] add codegen
      7e0c604 [HuJiayin] Merge branch 'master' of https://github.com/apache/spark into initcap
      6a0b958 [HuJiayin] add column
      c79482d [HuJiayin] support soundex
      7ce416b [HuJiayin] support initcap rebase code
      00cd92f3
    • [SPARK-8263] [SQL] substr/substring should also support binary type · c5166f7a
      zhichao.li authored
      This is based on #7641, thanks to zhichao-li
      
      Closes #7641
      
      Author: zhichao.li <zhichao.li@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7848 from davies/substr and squashes the following commits:
      
      461b709 [Davies Liu] remove bytearry from tests
      b45377a [Davies Liu] Merge branch 'master' of github.com:apache/spark into substr
      01d795e [zhichao.li] scala style
      99aa130 [zhichao.li] add substring to dataframe
      4f68bfe [zhichao.li] add binary type support for substring
      c5166f7a
    • [SPARK-8232] [SQL] Add sort_array support · cf6c9ca3
      Cheng Hao authored
      This PR is based on #7581; it just fixes the conflict.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7851 from davies/sort_array and squashes the following commits:
      
      a80ef66 [Davies Liu] fix conflict
      7cfda65 [Davies Liu] Merge branch 'master' of github.com:apache/spark into sort_array
      664c960 [Cheng Hao] update the sort_array by using the ArrayData
      276d2d5 [Cheng Hao] add empty line
      0edab9c [Cheng Hao] Add asending/descending support for sort_array
      80fc0f8 [Cheng Hao] Add type checking
      a42b678 [Cheng Hao] Add sort_array support
      cf6c9ca3
    • Revert "[SPARK-8232] [SQL] Add sort_array support" · 60ea7ab4
      Davies Liu authored
      This reverts commit 67ad4e21.
      60ea7ab4
    • [SPARK-8232] [SQL] Add sort_array support · 67ad4e21
      Cheng Hao authored
      Add support for the `sort_array` expression.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Davies Liu <davies.liu@gmail.com>
      
      Closes #7581 from chenghao-intel/sort_array and squashes the following commits:
      
      664c960 [Cheng Hao] update the sort_array by using the ArrayData
      276d2d5 [Cheng Hao] add empty line
      0edab9c [Cheng Hao] Add asending/descending support for sort_array
      80fc0f8 [Cheng Hao] Add type checking
      a42b678 [Cheng Hao] Add sort_array support
      67ad4e21
  6. Jul 31, 2015
    • [SPARK-8264][SQL]add substring_index function · 6996bd2e
      zhichao.li authored
      This PR is based on #7533, thanks to zhichao-li.
      
      Closes #7533
      
      Author: zhichao.li <zhichao.li@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7843 from davies/str_index and squashes the following commits:
      
      391347b [Davies Liu] add python api
      3ce7802 [Davies Liu] fix substringIndex
      f2d29a1 [Davies Liu] Merge branch 'master' of github.com:apache/spark into str_index
      515519b [zhichao.li] add foldable and remove null checking
      9546991 [zhichao.li] scala style
      67c253a [zhichao.li] hide some apis and clean code
      b19b013 [zhichao.li] add codegen and clean code
      ac863e9 [zhichao.li] reduce the calling of numChars
      12e108f [zhichao.li] refine unittest
      d92951b [zhichao.li] add lastIndexOf
      52d7b03 [zhichao.li] add substring_index function
      6996bd2e
    • [SPARK-8271][SQL]string function: soundex · 4d5a6e7b
      HuJiayin authored
      This PR adds the SQL function soundex(); see https://issues.apache.org/jira/browse/HIVE-9738
      
      It's based on #7115, thanks to HuJiayin.
      
      Author: HuJiayin <jiayin.hu@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7812 from davies/soundex and squashes the following commits:
      
      fa75941 [Davies Liu] Merge branch 'master' of github.com:apache/spark into soundex
      a4bd6d8 [Davies Liu] fix soundex
      2538908 [HuJiayin] add codegen soundex
      d15d329 [HuJiayin] add back ut
      ded1a14 [HuJiayin] Merge branch 'master' of https://github.com/apache/spark
      e2dec2c [HuJiayin] support soundex rebase code
      4d5a6e7b
  7. Jul 30, 2015
    • [SPARK-8176] [SPARK-8197] [SQL] function to_date/ trunc · 83670fc9
      Daoyuan Wang authored
      This PR is based on #6988, thanks to adrian-wang.
      
      This brings two SQL functions: to_date() and trunc().
      
      Closes #6988
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7805 from davies/to_date and squashes the following commits:
      
      2c7beba [Davies Liu] Merge branch 'master' of github.com:apache/spark into to_date
      310dd55 [Daoyuan Wang] remove dup test in rebase
      980b092 [Daoyuan Wang] resolve rebase conflict
      a476c5a [Daoyuan Wang] address comments from davies
      d44ea5f [Daoyuan Wang] function to_date, trunc
      83670fc9
    • [SPARK-8186] [SPARK-8187] [SPARK-8194] [SPARK-8198] [SPARK-9133] [SPARK-9290]... · 1abf7dc1
      Daoyuan Wang authored
      [SPARK-8186] [SPARK-8187] [SPARK-8194] [SPARK-8198] [SPARK-9133] [SPARK-9290] [SQL] functions: date_add, date_sub, add_months, months_between, time-interval calculation
      
      This PR is based on #7589, thanks to adrian-wang.
      
      Added the SQL functions date_add, date_sub, add_months, and months_between, and also added a rule for
      adding/subtracting a date/timestamp and an interval.
      
      Closes #7589
      
      cc rxin
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7754 from davies/date_add and squashes the following commits:
      
      e8c633a [Davies Liu] Merge branch 'master' of github.com:apache/spark into date_add
      9e8e085 [Davies Liu] Merge branch 'master' of github.com:apache/spark into date_add
      6224ce4 [Davies Liu] fix conclict
      bd18cd4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into date_add
      e47ff2c [Davies Liu] add python api, fix date functions
      01943d0 [Davies Liu] Merge branch 'master' into date_add
      522e91a [Daoyuan Wang] fix
      e8a639a [Daoyuan Wang] fix
      42df486 [Daoyuan Wang] fix style
      87c4b77 [Daoyuan Wang] function add_months, months_between and some fixes
      1a68e03 [Daoyuan Wang] poc of time interval calculation
      c506661 [Daoyuan Wang] function date_add , date_sub
      1abf7dc1
  8. Jul 25, 2015
    • [Spark-8668][SQL] Adding expr to functions · 723db13e
      JD authored
      Author: JD <jd@csh.rit.edu>
      Author: Joseph Batchik <josephbatchik@gmail.com>
      
      Closes #7606 from JDrit/expr and squashes the following commits:
      
      ad7f607 [Joseph Batchik] fixing python linter error
      9d6daea [Joseph Batchik] removed order by per @rxin's comment
      707d5c6 [Joseph Batchik] Added expr to fuctions.py
      79df83c [JD] added example to the docs
      b89eec8 [JD] moved function up as per @rxin's comment
      4960909 [JD] updated per @JoshRosen's comment
      2cb329c [JD] updated per @rxin's comment
      9a9ad0c [JD] removing unused import
      6dc26d0 [JD] removed split
      7f2222c [JD] Adding expr function as per SPARK-8668
      723db13e
  9. Jul 21, 2015
    • [SPARK-8230][SQL] Add array/map size method · 560c658a
      Pedro Rodriguez authored
      Pull Request for: https://issues.apache.org/jira/browse/SPARK-8230
      
      The primary issue resolved is implementing array/map size for Spark SQL. The code is ready for review by a committer. Chen Hao is on the JIRA ticket, but I don't know his username on GitHub; rxin is also on the JIRA ticket.
      
      Things to review:
      1. Where to put the added functions, namespace-wise. They seem to be part of a few operations on collections, which include `sort_array` and `array_contains`, hence the names `collectionOperations.scala` and `_collection_functions` in Python.
      2. In the Python code, should it be in a `1.5.0` function array or in a collections array?
      3. Are there any missing methods on the `Size` case class? It looks like many of these functions have generated Java code; is that also needed in this case?
      4. Something else?
      
      Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
      Author: Pedro Rodriguez <prodriguez@trulia.com>
      
      Closes #7462 from EntilZha/SPARK-8230 and squashes the following commits:
      
      9a442ae [Pedro Rodriguez] fixed functions and sorted __all__
      9aea3bb [Pedro Rodriguez] removed imports from python docs
      15d4bf1 [Pedro Rodriguez] Added null test case and changed to nullSafeCodeGen
      d88247c [Pedro Rodriguez] removed python code
      bd5f0e4 [Pedro Rodriguez] removed duplicate function from rebase/merge
      59931b4 [Pedro Rodriguez] fixed compile bug instroduced when merging
      c187175 [Pedro Rodriguez] updated code to add size to __all__ directly and removed redundent pretty print
      130839f [Pedro Rodriguez] fixed failing test
      aa9bade [Pedro Rodriguez] fix style
      e093473 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
      0449377 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
      9a1a2ff [Pedro Rodriguez] added unit tests for map size
      2bfbcb6 [Pedro Rodriguez] added unit test for size
      20df2b4 [Pedro Rodriguez] Finished working version of size function and added it to python
      b503e75 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
      99a6a5c [Pedro Rodriguez] fixed failing test
      cac75ac [Pedro Rodriguez] fix style
      933d843 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
      42bb7d4 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
      f9c3b8a [Pedro Rodriguez] added unit tests for map size
      2515d9f [Pedro Rodriguez] added documentation
      0e60541 [Pedro Rodriguez] added unit test for size
      acf9853 [Pedro Rodriguez] Finished working version of size function and added it to python
      84a5d38 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
      560c658a
    • [SPARK-8255] [SPARK-8256] [SQL] Add regex_extract/regex_replace · 8c8f0ef5
      Cheng Hao authored
      Add expressions `regex_extract` & `regex_replace`
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7468 from chenghao-intel/regexp and squashes the following commits:
      
      e5ea476 [Cheng Hao] minor update for documentation
      ef96fd6 [Cheng Hao] update the code gen
      72cf28f [Cheng Hao] Add more log for compilation error
      4e11381 [Cheng Hao] Add regexp_replace / regexp_extract support
      8c8f0ef5
  10. Jul 20, 2015
    • [SPARK-9114] [SQL] [PySpark] convert returned object from UDF into internal type · 9f913c4f
      Davies Liu authored
      This PR also removes the duplicated code between registerFunction and UserDefinedFunction.
      
      cc JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7450 from davies/fix_return_type and squashes the following commits:
      
      e80bf9f [Davies Liu] remove debugging code
      f94b1f6 [Davies Liu] fix mima
      8f9c58b [Davies Liu] convert returned object from UDF into internal type
      9f913c4f
  11. Jul 19, 2015
    • [SQL] Make date/time functions more consistent with other database systems. · 3427937e
      Reynold Xin authored
      This pull request fixes some of the problems in #6981.
      
      - Added date functions to `__all__` so they get exposed
      - Rename day_of_month -> dayofmonth
      - Rename day_in_year -> dayofyear
      - Rename week_of_year -> weekofyear
      - Removed "day" from Scala/Python API since it is ambiguous. Only leaving the alias in SQL.
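
As a rough illustration of what the renamed functions compute, here is a plain-Python sketch using only the standard library. It is not the Spark implementation; in particular, the use of ISO-8601 week numbering for `weekofyear` is an assumption made for this example.

```python
from datetime import date


def dayofmonth(d):
    """Day of the month, 1-31."""
    return d.day


def dayofyear(d):
    """Day of the year, 1-366."""
    return d.timetuple().tm_yday


def weekofyear(d):
    """Week number of the year (assumption: ISO-8601 numbering)."""
    return d.isocalendar()[1]


example = date(2015, 7, 19)
print(dayofmonth(example), dayofyear(example), weekofyear(example))
```

The names mirror the new spellings from the list above, without underscores.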
      
      Author: Reynold Xin <rxin@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Reynold Xin <rxin@databricks.com>
      
      Closes #7506 from rxin/datetime and squashes the following commits:
      
      0cb24d9 [Reynold Xin] Export all functions in Python.
      e44a4a0 [Reynold Xin] Removed day function from Scala and Python.
      9c08fdc [Reynold Xin] [SQL] Make date/time functions more consistent with other database systems.
      3427937e
    • [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK... · 83b682be
      Tarek Auel authored
      [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SPARK-9115][SQL] date functions
      
      Jira:
      https://issues.apache.org/jira/browse/SPARK-8199
      https://issues.apache.org/jira/browse/SPARK-8184
      https://issues.apache.org/jira/browse/SPARK-8183
      https://issues.apache.org/jira/browse/SPARK-8182
      https://issues.apache.org/jira/browse/SPARK-8181
      https://issues.apache.org/jira/browse/SPARK-8180
      https://issues.apache.org/jira/browse/SPARK-8179
      https://issues.apache.org/jira/browse/SPARK-8177
      https://issues.apache.org/jira/browse/SPARK-8178
      https://issues.apache.org/jira/browse/SPARK-9115
      
      Regarding `day` and `dayofmonth`: are both necessary?
      
      ~~I am going to add `Quarter` to this PR as well.~~ Done.
      
      ~~As soon as the Scala coding is reviewed and discussed, I'll add the python api.~~ Done
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      Author: Tarek Auel <tarek.auel@gmail.com>
      
      Closes #6981 from tarekauel/SPARK-8199 and squashes the following commits:
      
      f7b4c8c [Tarek Auel] [SPARK-8199] fixed bug in tests
      bb567b6 [Tarek Auel] [SPARK-8199] fixed test
      3e095ba [Tarek Auel] [SPARK-8199] style and timezone fix
      256c357 [Tarek Auel] [SPARK-8199] code cleanup
      5983dcc [Tarek Auel] [SPARK-8199] whitespace fix
      6e0c78f [Tarek Auel] [SPARK-8199] removed setTimeZone in tests, according to cloud-fans comment in #7488
      4afc09c [Tarek Auel] [SPARK-8199] concise leap year handling
      ea6c110 [Tarek Auel] [SPARK-8199] fix after merging master
      70238e0 [Tarek Auel] Merge branch 'master' into SPARK-8199
      3c6ae2e [Tarek Auel] [SPARK-8199] removed binary search
      fb98ba0 [Tarek Auel] [SPARK-8199] python docstring fix
      cdfae27 [Tarek Auel] [SPARK-8199] cleanup & python docstring fix
      746b80a [Tarek Auel] [SPARK-8199] build fix
      0ad6db8 [Tarek Auel] [SPARK-8199] minor fix
      523542d [Tarek Auel] [SPARK-8199] address comments
      2259299 [Tarek Auel] [SPARK-8199] day_of_month alias
      d01b977 [Tarek Auel] [SPARK-8199] python underscore
      56c4a92 [Tarek Auel] [SPARK-8199] update python docu
      e223bc0 [Tarek Auel] [SPARK-8199] refactoring
      d6aa14e [Tarek Auel] [SPARK-8199] fixed Hive compatibility
      b382267 [Tarek Auel] [SPARK-8199] fixed bug in day calculation; removed set TimeZone in HiveCompatibilitySuite for test purposes; removed Hive tests for second and minute, because we can cast '2015-03-18' to a timestamp and extract a minute/second from it
      1b2e540 [Tarek Auel] [SPARK-8119] style fix
      0852655 [Tarek Auel] [SPARK-8119] changed from ExpectsInputTypes to implicit casts
      ec87c69 [Tarek Auel] [SPARK-8119] bug fixing and refactoring
      1358cdc [Tarek Auel] Merge remote-tracking branch 'origin/master' into SPARK-8199
      740af0e [Tarek Auel] implement date function using a calculation based on days
      4fb66da [Tarek Auel] WIP: date functions on calculation only
      1a436c9 [Tarek Auel] wip
      f775f39 [Tarek Auel] fixed return type
      ad17e96 [Tarek Auel] improved implementation
      c42b444 [Tarek Auel] Removed merge conflict file
      ccb723c [Tarek Auel] [SPARK-8199] style and fixed merge issues
      10e4ad1 [Tarek Auel] Merge branch 'master' into date-functions-fast
      7d9f0eb [Tarek Auel] [SPARK-8199] git renaming issue
      f3e7a9f [Tarek Auel] [SPARK-8199] revert change in DataFrameFunctionsSuite
      6f5d95c [Tarek Auel] [SPARK-8199] fixed year interval
      d9f8ac3 [Tarek Auel] [SPARK-8199] implement fast track
      7bc9d93 [Tarek Auel] Merge branch 'master' into SPARK-8199
      5a105d9 [Tarek Auel] [SPARK-8199] rebase after #6985 got merged
      eb6760d [Tarek Auel] Merge branch 'master' into SPARK-8199
      f120415 [Tarek Auel] improved runtime
      a8edebd [Tarek Auel] use Calendar instead of SimpleDateFormat
      5fe74e1 [Tarek Auel] fixed python style
      3bfac90 [Tarek Auel] fixed style
      356df78 [Tarek Auel] rely on cast mechanism of Spark. Simplified implementation
      02efc5d [Tarek Auel] removed doubled code
      a5ea120 [Tarek Auel] added python api; changed test to be more meaningful
      b680db6 [Tarek Auel] added codegeneration to all functions
      c739788 [Tarek Auel] added support for quarter SPARK-8178
      849fb41 [Tarek Auel] fixed stupid test
      638596f [Tarek Auel] improved codegen
      4d8049b [Tarek Auel] fixed tests and added type check
      5ebb235 [Tarek Auel] resolved naming conflict
      d0e2f99 [Tarek Auel] date functions
      83b682be
  12. Jul 15, 2015
    • [SPARK-8245][SQL] FormatNumber/Length Support for Expression · 42dea3ac
      Cheng Hao authored
      - `BinaryType` for `Length`
      - `FormatNumber`
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7034 from chenghao-intel/expression and squashes the following commits:
      
      e534b87 [Cheng Hao] python api style issue
      601bbf5 [Cheng Hao] add python API support
      3ebe288 [Cheng Hao] update as feedback
      52274f7 [Cheng Hao] add support for udf_format_number and length for binary
      42dea3ac
  13. Jul 06, 2015
    • [SPARK-8784] [SQL] Add Python API for hex and unhex · 37e4d921
      Davies Liu authored
      Add Python API for hex/unhex; also clean up Hex/Unhex.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7223 from davies/hex and squashes the following commits:
      
      6f1249d [Davies Liu] no explicit rule to cast string into binary
      711a6ed [Davies Liu] fix test
      f9fe5a3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
      f032fbb [Davies Liu] Merge branch 'hex' of github.com:davies/spark into hex
      49e325f [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
      b31fc9a [Davies Liu] Update math.scala
      25156b7 [Davies Liu] address comments and fix test
      c3af78c [Davies Liu] address commments
      1a24082 [Davies Liu] Add Python API for hex and unhex
      37e4d921
  14. Jul 04, 2015
    • [SPARK-8270][SQL] levenshtein distance · 6b3574e6
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8270
      
      Info: I cannot build the latest master; it gets stuck during the build process: `[INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml`
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7214 from tarekauel/SPARK-8270 and squashes the following commits:
      
      ab348b9 [Tarek Auel] Merge branch 'master' into SPARK-8270
      a2ad318 [Tarek Auel] [SPARK-8270] changed order of fields
      d91b12c [Tarek Auel] [SPARK-8270] python fix
      adbd075 [Tarek Auel] [SPARK-8270] fixed typo
      23185c9 [Tarek Auel] [SPARK-8270] levenshtein distance
      6b3574e6
  15. Jul 03, 2015
    • [SPARK-8226] [SQL] Add function shiftrightunsigned · ab535b9a
      zhichao.li authored
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #7035 from zhichao-li/shiftRightUnsigned and squashes the following commits:
      
      6bcca5a [zhichao.li] change coding style
      3e9f5ae [zhichao.li] python style
      d85ae0b [zhichao.li] add shiftrightunsigned
      ab535b9a
  16. Jul 02, 2015
    • Revert "[SPARK-8784] [SQL] Add Python API for hex and unhex" · e589e71a
      Reynold Xin authored
      This reverts commit fc7aebd9.
      e589e71a
    • [SPARK-8784] [SQL] Add Python API for hex and unhex · fc7aebd9
      Davies Liu authored
      Also improve the performance of hex/unhex
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7181 from davies/hex and squashes the following commits:
      
      f032fbb [Davies Liu] Merge branch 'hex' of github.com:davies/spark into hex
      49e325f [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
      b31fc9a [Davies Liu] Update math.scala
      25156b7 [Davies Liu] address comments and fix test
      c3af78c [Davies Liu] address commments
      1a24082 [Davies Liu] Add Python API for hex and unhex
      fc7aebd9
    • [SPARK-8407] [SQL] complex type constructors: struct and named_struct · 52302a80
      Yijie Shen authored
      This is a follow up of [SPARK-8283](https://issues.apache.org/jira/browse/SPARK-8283) ([PR-6828](https://github.com/apache/spark/pull/6828)), to support both `struct` and `named_struct` in Spark SQL.
      
      After [#6725](https://github.com/apache/spark/pull/6828), the semantics of the [`CreateStruct`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala#L56) methods have changed a little: they are no longer limited to columns of `NamedExpressions`, and non-NamedExpression fields are named following the Hive convention, col1, col2 ...
      
      This PR would both loosen [`struct`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L723) to take children of `Expression` type and add `named_struct` support.
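
The naming convention described above can be sketched in plain Python. This is a hedged illustration, not the `CreateStruct` code: the helper name `name_struct_fields` is hypothetical, and numbering unnamed fields by their position in the struct is an assumption of this sketch.

```python
def name_struct_fields(fields):
    """Assign names to struct fields (hypothetical helper, not Spark code).

    `fields` is a list of (name, value) pairs; name is None for a field
    that does not come from a named expression.
    """
    named = []
    for index, (name, value) in enumerate(fields, start=1):
        # Assumption: an unnamed field is called col<N>, where N is its
        # 1-based position in the struct.
        field_name = name if name is not None else "col%d" % index
        named.append((field_name, value))
    return named
```

With this sketch, a struct built from one named column `a` followed by two unnamed expressions would end up with the field names `a`, `col2`, and `col3`.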
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #6874 from yijieshen/SPARK-8283 and squashes the following commits:
      
      4cd3375ac [Yijie Shen] change struct documentation
      d599d0b [Yijie Shen] rebase code
      9a7039e [Yijie Shen] fix reviews and regenerate golden answers
      b487354 [Yijie Shen] replace assert using checkAnswer
      f07e114 [Yijie Shen] tiny fix
      9613be9 [Yijie Shen] review fix
      7fef712 [Yijie Shen] Fix checkInputTypes' implementation using foldable and nullable
      60812a7 [Yijie Shen] Fix type check
      828d694 [Yijie Shen] remove unnecessary resolved assertion inside dataType method
      fd3cd8e [Yijie Shen] remove type check from eval
      7a71255 [Yijie Shen] tiny fix
      ccbbd86 [Yijie Shen] Fix reviews
      47da332 [Yijie Shen] remove nameStruct API from DataFrame
      917e680 [Yijie Shen] Fix reviews
      4bd75ad [Yijie Shen] loosen struct method in functions.scala to take Expression children
      0acb7be [Yijie Shen] Add CreateNamedStruct in both DataFrame function API and FunctionRegistery
      52302a80
    • [SPARK-8223] [SPARK-8224] [SQL] shift left and shift right · 5b333813
      Tarek Auel authored
      Jira:
      https://issues.apache.org/jira/browse/SPARK-8223
      https://issues.apache.org/jira/browse/SPARK-8224
      
      ~~I am aware of #7174 and will update this pr, if it's merged.~~ Done
      I don't know if #7034 can simplify this, but we can have a look at it if it gets merged.
      
      rxin In the Jira ticket the function has no second argument. I added a `numBits` argument that allows specifying the number of bits. I guess this improves the usability. I wanted to add `shiftleft(value)` as well, but the `selectExpr` DataFrame tests crash if I have both. In order to do this, I added the following to functions.scala: `def shiftRight(e: Column): Column = ShiftRight(e.expr, lit(1).expr)`, but as I mentioned, this doesn't pass tests like `df.selectExpr("shiftRight(a)", ...` (not enough arguments exception).
      
      If we need the bitwise shift in order to be Hive compatible, I suggest adding `shiftLeft` and something like `shiftLeftX`.
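
To make the two-argument form concrete, here is a minimal plain-Python sketch of shifting by `numBits`. It is not the Spark expression code. The helper `to_int32` and the choice of signed 32-bit wrap-around are assumptions meant to approximate JVM `Int` behavior, and masking of the shift count is not modeled.

```python
def to_int32(value):
    """Wrap a Python int into the signed 32-bit range (assumed Int semantics)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def shift_left(value, num_bits):
    """Shift left by num_bits, wrapping the result to 32 bits."""
    return to_int32(value << num_bits)


def shift_right(value, num_bits):
    """Arithmetic shift right by num_bits; the sign bit is preserved."""
    return to_int32(value) >> num_bits


print(shift_left(21, 1), shift_right(42, 1), shift_right(-8, 1))
```

A one-argument variant, as discussed above, would simply fix `num_bits` to 1.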
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7178 from tarekauel/8223 and squashes the following commits:
      
      8023bb5 [Tarek Auel] [SPARK-8223][SPARK-8224] fixed test
      f3f64e6 [Tarek Auel] [SPARK-8223][SPARK-8224] Integer -> Int
      f628706 [Tarek Auel] [SPARK-8223][SPARK-8224] removed toString; updated function description
      3b56f2a [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223
      5189690 [Tarek Auel] [SPARK-8223][SPARK-8224] minor fix and style fix
      9434a28 [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223
      44ee324 [Tarek Auel] [SPARK-8223][SPARK-8224] docu fix
      ac7fe9d [Tarek Auel] [SPARK-8223][SPARK-8224] right and left bit shift
      5b333813
  17. Jul 01, 2015
    • [SPARK-8770][SQL] Create BinaryOperator abstract class. · 9fd13d56
      Reynold Xin authored
      Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.
      
      This patch creates a new BinaryOperator abstract class, and updates the analyzer to only apply the type casting rule there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
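
The split described above can be modeled with a small Python class hierarchy. This is a sketch of the idea only, not the Catalyst code: the method `check_input_types`, the default `pretty_name` rule, and the string type tags are all assumptions made for illustration.

```python
class Expression:
    def __init__(self, data_type):
        self.data_type = data_type

    @property
    def pretty_name(self):
        # Assumed default: the user-facing name is the lower-cased class name.
        return type(self).__name__.lower()


class Literal(Expression):
    pass


class BinaryExpression(Expression):
    """Any expression with two children; no constraint on their types."""

    def __init__(self, left, right, data_type):
        super().__init__(data_type)
        self.left = left
        self.right = right


class BinaryOperator(BinaryExpression):
    """A binary expression whose two children must share one type."""

    def check_input_types(self):
        return self.left.data_type == self.right.data_type


class Add(BinaryOperator):
    pass


class Sha2(BinaryExpression):
    # Takes a binary value and an integer bit length, so the same-type
    # constraint of BinaryOperator does not apply.
    pass
```

In this model, a type-casting rule would only need to inspect instances of `BinaryOperator`, leaving expressions such as `Sha2` untouched.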
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7174 from rxin/binary-opterator and squashes the following commits:
      
      f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
      fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator
      d8518cf [Reynold Xin] Updated Python tests.
      9fd13d56
  18. Jun 30, 2015
    • [SPARK-8727] [SQL] Missing python api; md5, log2 · ccdb0522
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8727
      
      Author: Tarek Auel <tarek.auel@gmail.com>
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7114 from tarekauel/missing-python and squashes the following commits:
      
      ef4c61b [Tarek Auel] [SPARK-8727] revert dataframe change
      4029d4d [Tarek Auel] removed dataframe pi and e unit test
      66f0d2b [Tarek Auel] removed pi and e from python api and dataframe api; added _to_java_column(col) for strlen
      4d07318 [Tarek Auel] fixed python unit test
      45f2bee [Tarek Auel] fixed result of pi and e
      c39f47b [Tarek Auel] add python api
      bd50a3a [Tarek Auel] add missing python functions
      ccdb0522
  19. Jun 29, 2015
    • [SPARK-8235] [SQL] misc function sha / sha1 · a5c2961c
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8235
      
      I added the support for sha1. If I understood rxin correctly, sha and sha1 should execute the same algorithm, shouldn't they?
      
      Please take a close look at the Python part. It is adapted from #6934.
      
      Author: Tarek Auel <tarek.auel@gmail.com>
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #6963 from tarekauel/SPARK-8235 and squashes the following commits:
      
      f064563 [Tarek Auel] change to shaHex
      7ce3cdc [Tarek Auel] rely on automatic cast
      a1251d6 [Tarek Auel] Merge remote-tracking branch 'upstream/master' into SPARK-8235
      68eb043 [Tarek Auel] added docstring
      be5aff1 [Tarek Auel] improved error message
      7336c96 [Tarek Auel] added type check
      cf23a80 [Tarek Auel] simplified example
      ebf75ef [Tarek Auel] [SPARK-8301] updated the python documentation. Removed sha in python and scala
      6d6ff0d [Tarek Auel] [SPARK-8233] added docstring
      ea191a9 [Tarek Auel] [SPARK-8233] fixed signatureof python function. Added expected type to misc
      e3fd7c3 [Tarek Auel] SPARK[8235] added sha to the list of __all__
      e5dad4e [Tarek Auel] SPARK[8235] sha / sha1
      a5c2961c
  20. Jun 26, 2015
    • Liang-Chi Hsieh's avatar
      [SPARK-8237] [SQL] Add misc function sha2 · 47c874ba
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8237
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6934 from viirya/expr_sha2 and squashes the following commits:
      
      35e0bb3 [Liang-Chi Hsieh] For comments.
      68b5284 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2
      8573aff [Liang-Chi Hsieh] Remove unnecessary Product.
      ee61e06 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2
      59e41aa [Liang-Chi Hsieh] Add misc function: sha2.
      47c874ba
  21. Jun 19, 2015
    • Liang-Chi Hsieh's avatar
      [SPARK-8207] [SQL] Add math function bin · 2c59d5c1
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8207
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6721 from viirya/expr_bin and squashes the following commits:
      
      07e1c8f [Liang-Chi Hsieh] Remove AbstractUnaryMathExpression and let BIN inherit UnaryExpression.
      0677f1a [Liang-Chi Hsieh] For comments.
      cf62b95 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      0cf20f2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      dea9c12 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      d4f4774 [Liang-Chi Hsieh] Add @ignore_unicode_prefix.
      7a0196f [Liang-Chi Hsieh] Fix python style.
      ac2bacd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      a0a2d0f [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      4cb764d [Liang-Chi Hsieh] For comments.
      0f78682 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      c0c3197 [Liang-Chi Hsieh] Add bin to FunctionRegistry.
      824f761 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      50e0c3b [Liang-Chi Hsieh] Add math function bin(a: long): string.
      2c59d5c1
  22. Jun 18, 2015
    • Reynold Xin's avatar
      [SPARK-8218][SQL] Binary log math function update. · dc413138
      Reynold Xin authored
      Some minor updates after merging #6725.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6871 from rxin/log and squashes the following commits:
      
      ab51542 [Reynold Xin] Use JVM log
      76fc8de [Reynold Xin] Fixed arg.
      a7c1522 [Reynold Xin] [SPARK-8218][SQL] Binary log math function update.
      dc413138
    • Liang-Chi Hsieh's avatar
      [SPARK-8218][SQL] Add binary log math function · fee3438a
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8218
      
      Because a unary `log` function is already defined, the binary log function is called `logarithm` for now.
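
As a rough illustration of what a binary log computes, the sketch below uses the change-of-base identity in plain Python. The argument order (base first, then value) is an assumption for this example and is not taken from the patch.

```python
import math


def logarithm(base: float, x: float) -> float:
    """Return the logarithm of x in the given base.

    Uses the change-of-base identity log_base(x) = ln(x) / ln(base).
    """
    return math.log(x) / math.log(base)


if __name__ == "__main__":
    print(logarithm(2.0, 8.0))
    print(logarithm(10.0, 1000.0))
```

Floating-point results of this kind can differ in the last digits between platforms, which may be why the Python test below rounds or stringifies the value before comparing.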
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6725 from viirya/expr_binary_log and squashes the following commits:
      
      bf96bd9 [Liang-Chi Hsieh] Compare log result in string.
      102070d [Liang-Chi Hsieh] Round log result to better comparing in python test.
      fd01863 [Liang-Chi Hsieh] For comments.
      beed631 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      6089d11 [Liang-Chi Hsieh] Remove unnecessary override.
      8cf37b7 [Liang-Chi Hsieh] For comments.
      bc89597 [Liang-Chi Hsieh] For comments.
      db7dc38 [Liang-Chi Hsieh] Use ctor instead of companion object.
      0634ef7 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      1750034 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      3d75bfc [Liang-Chi Hsieh] Fix scala style.
      5b39c02 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      23c54a3 [Liang-Chi Hsieh] Fix scala style.
      ebc9929 [Liang-Chi Hsieh] Let Logarithm accept one parameter too.
      605574d [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      21c3bfd [Liang-Chi Hsieh] Fix scala style.
      c6c187f [Liang-Chi Hsieh] For comments.
      c795342 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      f373bac [Liang-Chi Hsieh] Add binary log expression.
      fee3438a
  23. May 23, 2015
    • Davies Liu's avatar
      [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates · efe3bfdf
      Davies Liu authored
      1. ntile should take an integer as parameter.
      2. Added Python API (based on #6364)
      3. Update documentation of various DataFrame Python functions.
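
For readers unfamiliar with `ntile`, the following plain-Python sketch shows the usual SQL semantics and why an integer argument makes sense. It is not the Spark implementation; the function signature (bucket count plus row count) is an assumption chosen to keep the example self-contained.

```python
def ntile(n: int, row_count: int) -> list:
    """Assign bucket numbers 1..n to row_count ordered rows.

    Rows are spread as evenly as possible; when row_count is not divisible
    by n, the first (row_count % n) buckets receive one extra row.
    """
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("ntile expects an integer number of buckets")
    if n <= 0:
        raise ValueError("ntile expects a positive number of buckets")
    base, extra = divmod(row_count, n)
    buckets = []
    for bucket in range(1, n + 1):
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets
```

For example, `ntile(3, 7)` returns `[1, 1, 1, 2, 2, 3, 3]`.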
      
      Author: Davies Liu <davies@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6374 from rxin/window-final and squashes the following commits:
      
      69004c7 [Reynold Xin] Style fix.
      288cea9 [Reynold Xin] Update documentaiton.
      7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
      66092b4 [Davies Liu] update docs
      ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
      ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
      8936ade [Davies Liu] fix maxint in python 3
      2649358 [Davies Liu] update docs
      778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
      efe3bfdf
  24. May 21, 2015
    • Davies Liu's avatar
      [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs · 8ddcb25b
      Davies Liu authored
      Add version info for public Python SQL API.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6295 from davies/versions and squashes the following commits:
      
      cfd91e6 [Davies Liu] add more version for DataFrame API
      600834d [Davies Liu] add version to SQL API docs
      8ddcb25b
  25. May 18, 2015
    • Davies Liu's avatar
      [SPARK-6216] [PYSPARK] check python version of worker with driver · 32fbd297
      Davies Liu authored
      This PR reverts #5404 and instead passes the driver's Python version into the JVM. The worker checks it before deserializing the closure, so the check still works when the driver and worker run different major versions of Python.
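
The check described above can be sketched as follows. This is a simplified stand-alone version, not the actual worker code: the function names, the error type, and the way the driver's version string reaches the worker are assumptions. Only the idea of comparing a "major.minor" string before touching any pickled closure comes from the message.

```python
import sys


def version_string(version_info=sys.version_info) -> str:
    """Format a version tuple as 'major.minor', e.g. '3.4'."""
    return "%d.%d" % tuple(version_info[:2])


def check_worker_version(driver_version: str,
                         worker_version_info=sys.version_info) -> None:
    """Raise if the worker's Python differs from the driver's.

    Meant to run before any closure is deserialized, because a closure
    pickled under another major version may fail to load at all.
    """
    worker_version = version_string(worker_version_info)
    if worker_version != driver_version:
        raise RuntimeError(
            "Python in worker has different version %s than that in "
            "driver %s" % (worker_version, driver_version))
```

Comparing short strings rather than pickled objects keeps the check itself independent of the Python version on either side.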
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6203 from davies/py_version and squashes the following commits:
      
      b8fb76e [Davies Liu] fix test
      6ce5096 [Davies Liu] use string for version
      47c6278 [Davies Liu] check python version of worker with driver
      32fbd297
  26. May 15, 2015
    • Davies Liu's avatar
      [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files · d7b69946
      Davies Liu authored
      dataframe.py is split into column.py, group.py and dataframe.py:
      ```
         360 column.py
        1223 dataframe.py
         183 group.py
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6201 from davies/split_df and squashes the following commits:
      
      fc8f5ab [Davies Liu] split dataframe.py into multiple files
      d7b69946
  27. May 14, 2015
    • Michael Armbrust's avatar
      [SPARK-7548] [SQL] Add explode function for DataFrames · 6d0633e3
      Michael Armbrust authored
      Add an `explode` function for DataFrames and modify the analyzer so that a single table generating function (TGF) can be present in a select clause along with other expressions. The following restrictions currently apply:
       - only top level TGFs are allowed (i.e. no `select(explode('list) + 1)`)
       - only one may be present in a single select to avoid potentially confusing implicit Cartesian products.
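
What `explode` does to the rows can be sketched in plain Python, with dictionaries standing in for rows. This is an illustration only, not the DataFrame API: the `explode(rows, column)` signature is hypothetical, and dropping rows whose list is empty is the behavior of this sketch.

```python
def explode(rows, column):
    """Yield one output row per element of row[column].

    All other fields are copied unchanged; a row whose list is empty
    produces no output rows.
    """
    for row in rows:
        for item in row[column]:
            new_row = dict(row)      # copy so the input row is not mutated
            new_row[column] = item   # replace the list with a single element
            yield new_row


if __name__ == "__main__":
    data = [{"id": 1, "list": [10, 20]}, {"id": 2, "list": []}]
    for out in explode(data, "list"):
        print(out)
```

Applying two such generators to the same row would pair every element of one list with every element of the other, which is the implicit Cartesian product the second restriction avoids.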
      
      TODO:
       - [ ] Python
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6107 from marmbrus/explodeFunction and squashes the following commits:
      
      7ee2c87 [Michael Armbrust] whitespace
      6f80ba3 [Michael Armbrust] Update dataframe.py
      c176c89 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      81b5da3 [Michael Armbrust] style
      d3faa05 [Michael Armbrust] fix self join case
      f9e1e3e [Michael Armbrust] fix python, add since
      4f0d0a9 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      e710fe4 [Michael Armbrust] add java and python
      52ca0dc [Michael Armbrust] [SPARK-7548][SQL] Add explode function for dataframes.
      6d0633e3