  1. Aug 13, 2015
  2. Aug 12, 2015
  3. Aug 08, 2015
    • Davies Liu's avatar
      [SPARK-6902] [SQL] [PYSPARK] Row should be read-only · ac507a03
      Davies Liu authored
      Raise a read-only exception when a user tries to mutate a Row.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8009 from davies/readonly_row and squashes the following commits:
      
      8722f3f [Davies Liu] add tests
      05a3d36 [Davies Liu] Row should be read-only
      ac507a03
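A minimal sketch of the read-only behaviour described in the commit above, written in plain Python so it runs without Spark. `ReadOnlyRow` is a hypothetical stand-in for the real Row class, and the exception type and message are assumptions.

```python
class ReadOnlyRow(tuple):
    """Hypothetical stand-in for a Row: a tuple with named, read-only fields."""

    def __new__(cls, **fields):
        names = sorted(fields)
        row = tuple.__new__(cls, [fields[n] for n in names])
        # Bypass our own __setattr__ once, to remember the field names.
        object.__setattr__(row, "_names", names)
        return row

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[self._names.index(name)]
        except ValueError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        raise TypeError("Row is read-only")

    def __delattr__(self, name):
        raise TypeError("Row is read-only")


row = ReadOnlyRow(name="Alice", age=11)
```

Reading `row.name` works as usual, while `row.age = 12` raises instead of silently changing the row.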
    • Davies Liu's avatar
      [SPARK-4561] [PYSPARK] [SQL] turn Row into dict recursively · 74a6541a
      Davies Liu authored
      Add an option `recursive` to `Row.asDict()`; when True (the default is False), it converts nested Rows into dicts as well.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8006 from davies/as_dict and squashes the following commits:
      
      922cc5a [Davies Liu] turn Row into dict recursively
      74a6541a
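The effect of the `recursive` option in the commit above can be sketched in plain Python. Here `as_dict` and the namedtuple-based rows are hypothetical stand-ins for `Row.asDict()` and `Row`; the handling of lists and dicts is an assumption.

```python
from collections import namedtuple


def as_dict(row, recursive=False):
    """Convert a namedtuple-style row to a dict; descend into nested rows if asked."""

    def convert(value):
        if hasattr(value, "_fields"):  # a nested row
            return as_dict(value, recursive=True)
        if isinstance(value, list):
            return [convert(v) for v in value]
        if isinstance(value, dict):
            return {k: convert(v) for k, v in value.items()}
        return value

    items = zip(row._fields, row)
    if recursive:
        return {name: convert(value) for name, value in items}
    return dict(items)


Inner = namedtuple("Inner", ["city"])
Outer = namedtuple("Outer", ["name", "address"])
row = Outer(name="Alice", address=Inner(city="Oslo"))
```

With the default `recursive=False` the nested row is left untouched, so existing callers see no change in behaviour.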
  4. Aug 07, 2015
    • Reynold Xin's avatar
      [SPARK-9733][SQL] Improve physical plan explain for data sources · 05d04e10
      Reynold Xin authored
      All data sources show up as "PhysicalRDD" in physical plan explain. It'd be better if we can show the name of the data source.
      
      Without this patch:
      ```
      == Physical Plan ==
      NewAggregate with UnsafeHybridAggregationIterator ArrayBuffer(date#0, cat#1) ArrayBuffer((sum(CAST((CAST(count#2, IntegerType) + 1), LongType))2,mode=Final,isDistinct=false))
       Exchange hashpartitioning(date#0,cat#1)
        NewAggregate with UnsafeHybridAggregationIterator ArrayBuffer(date#0, cat#1) ArrayBuffer((sum(CAST((CAST(count#2, IntegerType) + 1), LongType))2,mode=Partial,isDistinct=false))
         PhysicalRDD [date#0,cat#1,count#2], MapPartitionsRDD[3] at
      ```
      
      With this patch:
      ```
      == Physical Plan ==
      TungstenAggregate(key=[date#0,cat#1], value=[(sum(CAST((CAST(count#2, IntegerType) + 1), LongType)),mode=Final,isDistinct=false)]
       Exchange hashpartitioning(date#0,cat#1)
        TungstenAggregate(key=[date#0,cat#1], value=[(sum(CAST((CAST(count#2, IntegerType) + 1), LongType)),mode=Partial,isDistinct=false)]
         ConvertToUnsafe
          Scan ParquetRelation[file:/scratch/rxin/spark/sales4][date#0,cat#1,count#2]
      ```
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8024 from rxin/SPARK-9733 and squashes the following commits:
      
      811b90e [Reynold Xin] Fixed Python test case.
      52cab77 [Reynold Xin] Cast.
      eea9ccc [Reynold Xin] Fix test case.
      fcecb22 [Reynold Xin] [SPARK-9733][SQL] Improve explain message for data source scan node.
      05d04e10
  5. Aug 06, 2015
  6. Aug 05, 2015
    • Yijie Shen's avatar
      [SPARK-6591] [SQL] Python data source load options should auto convert common types into strings · 8c320e45
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-6591
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7926 from yjshen/py_dsload_opt and squashes the following commits:
      
      b207832 [Yijie Shen] fix style
      efdf834 [Yijie Shen] resolve comment
      7a8f6a2 [Yijie Shen] lowercase
      822e769 [Yijie Shen] convert load opts to string
      8c320e45
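A rough sketch of the conversion the commit above describes: option values of common Python types are turned into strings before being handed to the data source. The helper names and the lowercase rendering of booleans are assumptions (the latter suggested by the "lowercase" commit in the list).

```python
def to_str(value):
    """Render an option value as a string; names here are illustrative only."""
    if isinstance(value, bool):
        return str(value).lower()  # True -> "true" (assumed lowercase form)
    if value is None:
        return value  # leave missing options untouched
    return str(value)


def normalize_options(**options):
    """Apply to_str to every keyword option."""
    return {key: to_str(value) for key, value in options.items()}


opts = normalize_options(header=True, samplingRatio=0.5, retries=3, path="/tmp/x")
```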
    • Pedro Rodriguez's avatar
      [SPARK-8231] [SQL] Add array_contains · d3454858
      Pedro Rodriguez authored
      This PR is based on #7580 , thanks to EntilZha
      
      PR for work on https://issues.apache.org/jira/browse/SPARK-8231
      
      Currently, I have an initial implementation for contains. Based on the discussion on JIRA, it should behave the same as Hive: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128
      
      Main points are:
      1. If the array is empty, null, or the value is null, return false
      2. If there is a type mismatch, throw error
      3. If comparison is not supported, throw error
      
      Closes #7580
      
      Author: Pedro Rodriguez <prodriguez@trulia.com>
      Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7949 from davies/array_contains and squashes the following commits:
      
      d3c08bc [Davies Liu] use foreach() to avoid copy
      bc3d1fe [Davies Liu] fix array_contains
      719e37d [Davies Liu] Merge branch 'master' of github.com:apache/spark into array_contains
      e352cf9 [Pedro Rodriguez] fixed diff from master
      4d5b0ff [Pedro Rodriguez] added docs and another type check
      ffc0591 [Pedro Rodriguez] fixed unit test
      7a22deb [Pedro Rodriguez] Changed test to use strings instead of long/ints which are different between python 2 an 3
      b5ffae8 [Pedro Rodriguez] fixed pyspark test
      4e7dce3 [Pedro Rodriguez] added more docs
      3082399 [Pedro Rodriguez] fixed unit test
      46f9789 [Pedro Rodriguez] reverted change
      d3ca013 [Pedro Rodriguez] Fixed type checking to match hive behavior, then added tests to insure this
      8528027 [Pedro Rodriguez] added more tests
      686e029 [Pedro Rodriguez] fix scala style
      d262e9d [Pedro Rodriguez] reworked type checking code and added more tests
      2517a58 [Pedro Rodriguez] removed unused import
      28b4f71 [Pedro Rodriguez] fixed bug with type conversions and re-added tests
      12f8795 [Pedro Rodriguez] fix scala style checks
      e8a20a9 [Pedro Rodriguez] added python df (broken atm)
      65b562c [Pedro Rodriguez] made array_contains nullable false
      33b45aa [Pedro Rodriguez] reordered test
      9623c64 [Pedro Rodriguez] fixed test
      4b4425b [Pedro Rodriguez] changed Arrays in tests to Seqs
      72cb4b1 [Pedro Rodriguez] added checkInputTypes and docs
      69c46fb [Pedro Rodriguez] added tests and codegen
      9e0bfc4 [Pedro Rodriguez] initial attempt at implementation
      d3454858
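The three rules listed in the commit above can be sketched in plain Python. This is only an illustration of the stated semantics, not the Spark or Hive implementation; the exact type check is an assumption, and rule 3 is not modelled because Python equality is defined for all objects.

```python
def array_contains(array, value):
    """Sketch of the described semantics for array_contains."""
    # Rule 1: a null array, an empty array, or a null value gives False.
    if array is None or value is None or len(array) == 0:
        return False
    # Rule 2: a type mismatch between elements and the value is an error.
    for element in array:
        if element is not None and type(element) is not type(value):
            raise TypeError("array element type does not match value type")
    return any(element == value for element in array)
```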
  7. Aug 04, 2015
    • Davies Liu's avatar
      [SPARK-9513] [SQL] [PySpark] Add python API for DataFrame functions · 2b67fdb6
      Davies Liu authored
      This adds the Python API for the DataFrame functions introduced in 1.5.

      There is an issue with serializing byte_array in Python 3, so some of the functions (for BinaryType) do not have tests.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7922 from davies/python_functions and squashes the following commits:
      
      8ad942f [Davies Liu] fix test
      5fb6ec3 [Davies Liu] fix bugs
      3495ed3 [Davies Liu] fix issues
      ea5f7bb [Davies Liu] Add python API for DataFrame functions
      2b67fdb6
  8. Aug 01, 2015
    • HuJiayin's avatar
      [SPARK-8269] [SQL] string function: initcap · 00cd92f3
      HuJiayin authored
      This PR is based on #7208 , thanks to HuJiayin
      
      Closes #7208
      
      Author: HuJiayin <jiayin.hu@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7850 from davies/initcap and squashes the following commits:
      
      54472e9 [Davies Liu] fix python test
      17ffe51 [Davies Liu] Merge branch 'master' of github.com:apache/spark into initcap
      ca46390 [Davies Liu] Merge branch 'master' of github.com:apache/spark into initcap
      3a906e4 [Davies Liu] implement title case in UTF8String
      8b2506a [HuJiayin] Update functions.py
      2cd43e5 [HuJiayin] fix python style check
      b616c0e [HuJiayin] add python api
      1f5a0ef [HuJiayin] add codegen
      7e0c604 [HuJiayin] Merge branch 'master' of https://github.com/apache/spark into initcap
      6a0b958 [HuJiayin] add column
      c79482d [HuJiayin] support soundex
      7ce416b [HuJiayin] support initcap rebase code
      00cd92f3
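For readers unfamiliar with `initcap`, a plain-Python sketch of title-casing follows. It assumes words are separated by single spaces and that the rest of each word is lowercased; the actual `UTF8String` implementation may differ in these details.

```python
def initcap(text):
    """Uppercase the first letter of each space-separated word, lowercase the rest."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))
```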
    • zhichao.li's avatar
      [SPARK-8263] [SQL] substr/substring should also support binary type · c5166f7a
      zhichao.li authored
      This is based on #7641, thanks to zhichao-li
      
      Closes #7641
      
      Author: zhichao.li <zhichao.li@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7848 from davies/substr and squashes the following commits:
      
      461b709 [Davies Liu] remove bytearry from tests
      b45377a [Davies Liu] Merge branch 'master' of github.com:apache/spark into substr
      01d795e [zhichao.li] scala style
      99aa130 [zhichao.li] add substring to dataframe
      4f68bfe [zhichao.li] add binary type support for substring
      c5166f7a
    • Cheng Hao's avatar
      [SPARK-8232] [SQL] Add sort_array support · cf6c9ca3
      Cheng Hao authored
      This PR is based on #7581; it just fixes the merge conflict.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7851 from davies/sort_array and squashes the following commits:
      
      a80ef66 [Davies Liu] fix conflict
      7cfda65 [Davies Liu] Merge branch 'master' of github.com:apache/spark into sort_array
      664c960 [Cheng Hao] update the sort_array by using the ArrayData
      276d2d5 [Cheng Hao] add empty line
      0edab9c [Cheng Hao] Add asending/descending support for sort_array
      80fc0f8 [Cheng Hao] Add type checking
      a42b678 [Cheng Hao] Add sort_array support
      cf6c9ca3
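A small sketch of ascending/descending array sorting as mentioned in the squashed commits above. The placement of null elements (first when ascending, last when descending) is an assumption, not something the commit message states.

```python
def sort_array(array, asc=True):
    """Sort a list; None elements go first (ascending) or last (descending)."""
    if array is None:
        return None
    non_null = sorted(x for x in array if x is not None)
    nulls = [None] * (len(array) - len(non_null))
    return nulls + non_null if asc else non_null[::-1] + nulls
```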
    • Davies Liu's avatar
      Revert "[SPARK-8232] [SQL] Add sort_array support" · 60ea7ab4
      Davies Liu authored
      This reverts commit 67ad4e21.
      60ea7ab4
    • Cheng Hao's avatar
      [SPARK-8232] [SQL] Add sort_array support · 67ad4e21
      Cheng Hao authored
      Add expression `sort_array` support.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Davies Liu <davies.liu@gmail.com>
      
      Closes #7581 from chenghao-intel/sort_array and squashes the following commits:
      
      664c960 [Cheng Hao] update the sort_array by using the ArrayData
      276d2d5 [Cheng Hao] add empty line
      0edab9c [Cheng Hao] Add asending/descending support for sort_array
      80fc0f8 [Cheng Hao] Add type checking
      a42b678 [Cheng Hao] Add sort_array support
      67ad4e21
  9. Jul 31, 2015
    • zhichao.li's avatar
      [SPARK-8264][SQL]add substring_index function · 6996bd2e
      zhichao.li authored
      This PR is based on #7533 , thanks to zhichao-li
      
      Closes #7533
      
      Author: zhichao.li <zhichao.li@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7843 from davies/str_index and squashes the following commits:
      
      391347b [Davies Liu] add python api
      3ce7802 [Davies Liu] fix substringIndex
      f2d29a1 [Davies Liu] Merge branch 'master' of github.com:apache/spark into str_index
      515519b [zhichao.li] add foldable and remove null checking
      9546991 [zhichao.li] scala style
      67c253a [zhichao.li] hide some apis and clean code
      b19b013 [zhichao.li] add codegen and clean code
      ac863e9 [zhichao.li] reduce the calling of numChars
      12e108f [zhichao.li] refine unittest
      d92951b [zhichao.li] add lastIndexOf
      52d7b03 [zhichao.li] add substring_index function
      6996bd2e
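As an illustration of what `substring_index` computes, here is a plain-Python sketch. It assumes the usual SQL meaning: return the part of the string before `count` occurrences of the delimiter, counting from the right when `count` is negative. Edge cases such as overlapping delimiters may behave differently in Spark.

```python
def substring_index(text, delim, count):
    """Return the substring before `count` occurrences of `delim`."""
    if not delim or count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # count from the left
    return delim.join(parts[count:])       # count from the right
```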
    • HuJiayin's avatar
      [SPARK-8271][SQL]string function: soundex · 4d5a6e7b
      HuJiayin authored
      This PR brings SQL function soundex(), see https://issues.apache.org/jira/browse/HIVE-9738
      
      It's based on #7115 , thanks to HuJiayin
      
      Author: HuJiayin <jiayin.hu@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7812 from davies/soundex and squashes the following commits:
      
      fa75941 [Davies Liu] Merge branch 'master' of github.com:apache/spark into soundex
      a4bd6d8 [Davies Liu] fix soundex
      2538908 [HuJiayin] add codegen soundex
      d15d329 [HuJiayin] add back ut
      ded1a14 [HuJiayin] Merge branch 'master' of https://github.com/apache/spark
      e2dec2c [HuJiayin] support soundex rebase code
      4d5a6e7b
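For context, a sketch of the classic American Soundex encoding in plain Python. This is the textbook algorithm; whether Spark's `soundex()` matches it in every edge case (non-letters, empty input) is an assumption.

```python
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(word):
    """Encode a word as one letter plus three digits."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")  # vowels map to "" and reset `prev`
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad or truncate to four characters
```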
  10. Jul 30, 2015
    • Daoyuan Wang's avatar
      [SPARK-8176] [SPARK-8197] [SQL] function to_date/ trunc · 83670fc9
      Daoyuan Wang authored
      This PR is based on #6988 , thanks to adrian-wang .
      
      This brings two SQL functions: to_date() and trunc().
      
      Closes #6988
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7805 from davies/to_date and squashes the following commits:
      
      2c7beba [Davies Liu] Merge branch 'master' of github.com:apache/spark into to_date
      310dd55 [Daoyuan Wang] remove dup test in rebase
      980b092 [Daoyuan Wang] resolve rebase conflict
      a476c5a [Daoyuan Wang] address comments from davies
      d44ea5f [Daoyuan Wang] function to_date, trunc
      83670fc9
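A plain-Python sketch of what `to_date()` and `trunc()` do conceptually. The accepted unit names and the error raised for an unknown unit are assumptions made for this example.

```python
import datetime


def to_date(text):
    """Parse the leading 'YYYY-MM-DD' part of a string into a date."""
    return datetime.datetime.strptime(text[:10], "%Y-%m-%d").date()


def trunc(date, unit):
    """Truncate a date to the first day of its year or month."""
    unit = unit.lower()
    if unit in ("year", "yyyy", "yy"):
        return date.replace(month=1, day=1)
    if unit in ("month", "mon", "mm"):
        return date.replace(day=1)
    raise ValueError("unsupported unit: %r" % unit)
```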
    • Xiangrui Meng's avatar
      [SPARK-7157][SQL] add sampleBy to DataFrame · df326695
      Xiangrui Meng authored
      This was previously committed but then reverted due to test failures (see #6769).
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7755 from rxin/SPARK-7157 and squashes the following commits:
      
      fbf9044 [Xiangrui Meng] fix python test
      542bd37 [Xiangrui Meng] update test
      604fe6d [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
      f051afd [Xiangrui Meng] use udf instead of building expression
      f4e9425 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
      8fb990b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
      103beb3 [Xiangrui Meng] add Java-friendly sampleBy
      991f26f [Xiangrui Meng] fix seed
      4a14834 [Xiangrui Meng] move sampleBy to stat
      832f7cc [Xiangrui Meng] add sampleBy to DataFrame
      df326695
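The idea behind `sampleBy` (stratified sampling with a per-stratum fraction) can be sketched without Spark. `sample_by` and its signature are hypothetical; treating unlisted strata as fraction zero is an assumption.

```python
import random


def sample_by(rows, key, fractions, seed=None):
    """Keep each row with the probability assigned to its stratum."""
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        fraction = fractions.get(key(row), 0.0)  # unlisted strata are dropped
        if rng.random() < fraction:
            sampled.append(row)
    return sampled


rows = [(i % 3, i) for i in range(30)]
kept = sample_by(rows, key=lambda r: r[0], fractions={0: 1.0, 1: 0.0}, seed=7)
```

Passing the same `seed` makes the sample reproducible, which is useful in tests.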
    • Xiangrui Meng's avatar
      [SPARK-9408] [PYSPARK] [MLLIB] Refactor linalg.py to /linalg · ca71cc8c
      Xiangrui Meng authored
      This is based on MechCoder 's PR https://github.com/apache/spark/pull/7731. Hopefully it could pass tests. MechCoder I tried to make minimal changes. If this passes Jenkins, we can merge this one first and then try to move `__init__.py` to `local.py` in a separate PR.
      
      Closes #7731
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7746 from mengxr/SPARK-9408 and squashes the following commits:
      
      0e05a3b [Xiangrui Meng] merge master
      1135551 [Xiangrui Meng] add a comment for str(...)
      c48cae0 [Xiangrui Meng] update tests
      173a805 [Xiangrui Meng] move linalg.py to linalg/__init__.py
      ca71cc8c
    • Daoyuan Wang's avatar
      [SPARK-8186] [SPARK-8187] [SPARK-8194] [SPARK-8198] [SPARK-9133] [SPARK-9290]... · 1abf7dc1
      Daoyuan Wang authored
      [SPARK-8186] [SPARK-8187] [SPARK-8194] [SPARK-8198] [SPARK-9133] [SPARK-9290] [SQL] functions: date_add, date_sub, add_months, months_between, time-interval calculation
      
      This PR is based on #7589 , thanks to adrian-wang
      
      Added the SQL functions date_add, date_sub, add_months, and months_between, and also added a rule for
      adding/subtracting a date/timestamp and an interval.
      
      Closes #7589
      
      cc rxin
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7754 from davies/date_add and squashes the following commits:
      
      e8c633a [Davies Liu] Merge branch 'master' of github.com:apache/spark into date_add
      9e8e085 [Davies Liu] Merge branch 'master' of github.com:apache/spark into date_add
      6224ce4 [Davies Liu] fix conclict
      bd18cd4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into date_add
      e47ff2c [Davies Liu] add python api, fix date functions
      01943d0 [Davies Liu] Merge branch 'master' into date_add
      522e91a [Daoyuan Wang] fix
      e8a639a [Daoyuan Wang] fix
      42df486 [Daoyuan Wang] fix style
      87c4b77 [Daoyuan Wang] function add_months, months_between and some fixes
      1a68e03 [Daoyuan Wang] poc of time interval calculation
      c506661 [Daoyuan Wang] function date_add , date_sub
      1abf7dc1
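A plain-Python sketch of three of the date functions named above. Clamping to the last day of the target month in `add_months` is an assumption about the intended behaviour; `months_between` is omitted because its fractional result depends on conventions the commit message does not spell out.

```python
import calendar
import datetime


def date_add(start, days):
    return start + datetime.timedelta(days=days)


def date_sub(start, days):
    return start - datetime.timedelta(days=days)


def add_months(start, months):
    """Shift by whole months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, min(start.day, last_day))
```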
    • Davies Liu's avatar
      [SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__ · e044705b
      Davies Liu authored
      With this change, a Python UDT can also be created without a corresponding Scala one, which is important for Python users.
      
      cc mengxr JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7453 from davies/class_in_main and squashes the following commits:
      
      4dfd5e1 [Davies Liu] add tests for Python and Scala UDT
      793d9b2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into class_in_main
      dc65f19 [Davies Liu] address comment
      a9a3c40 [Davies Liu] Merge branch 'master' of github.com:apache/spark into class_in_main
      a86e1fc [Davies Liu] fix serialization
      ad528ba [Davies Liu] Merge branch 'master' of github.com:apache/spark into class_in_main
      63f52ef [Davies Liu] fix pylint check
      655b8a9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into class_in_main
      316a394 [Davies Liu] support Python UDT with UTF
      0bcb3ef [Davies Liu] fix bug in mllib
      de986d6 [Davies Liu] fix test
      83d65ac [Davies Liu] fix bug in StructType
      55bb86e [Davies Liu] support Python UDT in __main__ (without Scala one)
      e044705b
    • Alex Angelini's avatar
      Fix reference to self.names in StructType · f5dd1133
      Alex Angelini authored
      `names` is not defined in this context, I think you meant `self.names`.
      
      davies
      
      Author: Alex Angelini <alex.louis.angelini@gmail.com>
      
      Closes #7766 from angelini/fix_struct_type_names and squashes the following commits:
      
      01543a1 [Alex Angelini] Fix reference to self.names in StructType
      f5dd1133
  11. Jul 25, 2015
    • JD's avatar
      [Spark-8668][SQL] Adding expr to functions · 723db13e
      JD authored
      Author: JD <jd@csh.rit.edu>
      Author: Joseph Batchik <josephbatchik@gmail.com>
      
      Closes #7606 from JDrit/expr and squashes the following commits:
      
      ad7f607 [Joseph Batchik] fixing python linter error
      9d6daea [Joseph Batchik] removed order by per @rxin's comment
      707d5c6 [Joseph Batchik] Added expr to fuctions.py
      79df83c [JD] added example to the docs
      b89eec8 [JD] moved function up as per @rxin's comment
      4960909 [JD] updated per @JoshRosen's comment
      2cb329c [JD] updated per @rxin's comment
      9a9ad0c [JD] removing unused import
      6dc26d0 [JD] removed split
      7f2222c [JD] Adding expr function as per SPARK-8668
      723db13e
  12. Jul 23, 2015
    • Davies Liu's avatar
      [SPARK-9069] [SPARK-9264] [SQL] remove unlimited precision support for DecimalType · 8a94eb23
      Davies Liu authored
      Remove Decimal.Unlimited (change to support precision up to 38, to match Hive and other databases).

      To keep backward source compatibility, Decimal.Unlimited is still there, but it is changed to Decimal(38, 18).

      If no precision and scale are provided, it is Decimal(10, 0) as before.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7605 from davies/decimal_unlimited and squashes the following commits:
      
      aa3f115 [Davies Liu] fix tests and style
      fb0d20d [Davies Liu] address comments
      bfaae35 [Davies Liu] fix style
      df93657 [Davies Liu] address comments and clean up
      06727fd [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_unlimited
      4c28969 [Davies Liu] fix tests
      8d783cc [Davies Liu] fix tests
      788631c [Davies Liu] fix double with decimal in Union/except
      1779bde [Davies Liu] fix scala style
      c9c7c78 [Davies Liu] remove Decimal.Unlimited
      8a94eb23
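The rules in the commit above (maximum precision 38, default `(10, 0)`, and the old "unlimited" type mapped to `(38, 18)`) can be summarised in a small sketch. The class below is hypothetical and the validation errors are assumptions; it is not the PySpark `DecimalType`.

```python
MAX_PRECISION = 38


class DecimalType(object):
    """Hypothetical fixed-precision decimal type descriptor."""

    def __init__(self, precision=10, scale=0):
        if not 0 < precision <= MAX_PRECISION:
            raise ValueError("precision must be between 1 and %d" % MAX_PRECISION)
        if not 0 <= scale <= precision:
            raise ValueError("scale must be between 0 and precision")
        self.precision = precision
        self.scale = scale

    def __repr__(self):
        return "DecimalType(%d,%d)" % (self.precision, self.scale)


# Kept only for source compatibility: no longer truly unlimited.
UNLIMITED = DecimalType(38, 18)
```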
    • Xiangrui Meng's avatar
      [SPARK-9243] [Documentation] null -> zero in crosstab doc · ecfb3127
      Xiangrui Meng authored
      We forgot to update doc. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7608 from mengxr/SPARK-9243 and squashes the following commits:
      
      0ea3236 [Xiangrui Meng] null -> zero in crosstab doc
      ecfb3127
  13. Jul 21, 2015
    • Pedro Rodriguez's avatar
      [SPARK-8230][SQL] Add array/map size method · 560c658a
      Pedro Rodriguez authored
      Pull Request for: https://issues.apache.org/jira/browse/SPARK-8230
      
      Primary issue resolved is to implement array/map size for Spark SQL. Code is ready for review by a committer. Chen Hao is on the JIRA ticket, but I don't know his username on github, rxin is also on JIRA ticket.
      
      Things to review:
      1. Where to put added functions namespace wise, they seem to be part of a few operations on collections which includes `sort_array` and `array_contains`. Hence the name given `collectionOperations.scala` and `_collection_functions` in python.
      2. In Python code, should it be in a `1.5.0` function array or in a collections array?
      3. Are there any missing methods on the `Size` case class? Looks like many of these functions have generated Java code, is that also needed in this case?
      4. Something else?
      
      Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
      Author: Pedro Rodriguez <prodriguez@trulia.com>
      
      Closes #7462 from EntilZha/SPARK-8230 and squashes the following commits:
      
      9a442ae [Pedro Rodriguez] fixed functions and sorted __all__
      9aea3bb [Pedro Rodriguez] removed imports from python docs
      15d4bf1 [Pedro Rodriguez] Added null test case and changed to nullSafeCodeGen
      d88247c [Pedro Rodriguez] removed python code
      bd5f0e4 [Pedro Rodriguez] removed duplicate function from rebase/merge
      59931b4 [Pedro Rodriguez] fixed compile bug instroduced when merging
      c187175 [Pedro Rodriguez] updated code to add size to __all__ directly and removed redundent pretty print
      130839f [Pedro Rodriguez] fixed failing test
      aa9bade [Pedro Rodriguez] fix style
      e093473 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
      0449377 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
      9a1a2ff [Pedro Rodriguez] added unit tests for map size
      2bfbcb6 [Pedro Rodriguez] added unit test for size
      20df2b4 [Pedro Rodriguez] Finished working version of size function and added it to python
      b503e75 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
      99a6a5c [Pedro Rodriguez] fixed failing test
      cac75ac [Pedro Rodriguez] fix style
      933d843 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
      42bb7d4 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
      f9c3b8a [Pedro Rodriguez] added unit tests for map size
      2515d9f [Pedro Rodriguez] added documentation
      0e60541 [Pedro Rodriguez] added unit test for size
      acf9853 [Pedro Rodriguez] Finished working version of size function and added it to python
      84a5d38 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
      560c658a
    • Cheng Hao's avatar
      [SPARK-8255] [SPARK-8256] [SQL] Add regex_extract/regex_replace · 8c8f0ef5
      Cheng Hao authored
      Add the expressions `regexp_extract` and `regexp_replace`.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7468 from chenghao-intel/regexp and squashes the following commits:
      
      e5ea476 [Cheng Hao] minor update for documentation
      ef96fd6 [Cheng Hao] update the code gen
      72cf28f [Cheng Hao] Add more log for compilation error
      4e11381 [Cheng Hao] Add regexp_replace / regexp_extract support
      8c8f0ef5
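A sketch of the two regex functions using Python's `re` module. Spark uses Java regular expressions, so pattern and replacement syntax can differ; returning an empty string when nothing matches is an assumption.

```python
import re


def regexp_extract(text, pattern, group_index):
    """Return the given capture group of the first match, or "" if none."""
    match = re.search(pattern, text)
    if match is None:
        return ""
    return match.group(group_index) or ""


def regexp_replace(text, pattern, replacement):
    """Replace every match of the pattern with the replacement string."""
    return re.sub(pattern, replacement, text)
```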
    • Cheng Lian's avatar
      [SPARK-9100] [SQL] Adds DataFrame reader/writer shortcut methods for ORC · d38c5029
      Cheng Lian authored
      This PR adds DataFrame reader/writer shortcut methods for ORC in both Scala and Python.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7444 from liancheng/spark-9100 and squashes the following commits:
      
      284d043 [Cheng Lian] Fixes PySpark test cases and addresses PR comments
      e0b09fb [Cheng Lian] Adds DataFrame reader/writer shortcut methods for ORC
      d38c5029
  14. Jul 20, 2015
  15. Jul 19, 2015
    • Reynold Xin's avatar
      [SQL] Make date/time functions more consistent with other database systems. · 3427937e
      Reynold Xin authored
      This pull request fixes some of the problems in #6981.
      
      - Added date functions to `__all__` so they get exposed
      - Rename day_of_month -> dayofmonth
      - Rename day_in_year -> dayofyear
      - Rename week_of_year -> weekofyear
      - Removed "day" from Scala/Python API since it is ambiguous. Only leaving the alias in SQL.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Reynold Xin <rxin@databricks.com>
      
      Closes #7506 from rxin/datetime and squashes the following commits:
      
      0cb24d9 [Reynold Xin] Export all functions in Python.
      e44a4a0 [Reynold Xin] Removed day function from Scala and Python.
      9c08fdc [Reynold Xin] [SQL] Make date/time functions more consistent with other database systems.
      3427937e
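To make the renamed functions concrete, here is a plain-Python sketch of what each computes. Using ISO week numbering for `weekofyear` is an assumption about the intended semantics.

```python
import datetime


def dayofmonth(d):
    return d.day


def dayofyear(d):
    return d.timetuple().tm_yday


def weekofyear(d):
    return d.isocalendar()[1]  # ISO 8601 week number (assumed)


day = datetime.date(2015, 4, 8)
```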
    • Liang-Chi Hsieh's avatar
      [SPARK-9166][SQL][PYSPARK] Capture and hide IllegalArgumentException in Python API · 9b644c41
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9166
      
      Simply capture and hide `IllegalArgumentException` in Python API.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7497 from viirya/hide_illegalargument and squashes the following commits:
      
      8324dce [Liang-Chi Hsieh] Fix python style.
      9ace67d [Liang-Chi Hsieh] Also check exception message.
      8b2ce5c [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into hide_illegalargument
      7be016a [Liang-Chi Hsieh] Capture and hide IllegalArgumentException in Python.
      9b644c41
    • Tarek Auel's avatar
      [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK... · 83b682be
      Tarek Auel authored
      [SPARK-8199][SPARK-8184][SPARK-8183][SPARK-8182][SPARK-8181][SPARK-8180][SPARK-8179][SPARK-8177][SPARK-8178][SPARK-9115][SQL] date functions
      
      Jira:
      https://issues.apache.org/jira/browse/SPARK-8199
      https://issues.apache.org/jira/browse/SPARK-8184
      https://issues.apache.org/jira/browse/SPARK-8183
      https://issues.apache.org/jira/browse/SPARK-8182
      https://issues.apache.org/jira/browse/SPARK-8181
      https://issues.apache.org/jira/browse/SPARK-8180
      https://issues.apache.org/jira/browse/SPARK-8179
      https://issues.apache.org/jira/browse/SPARK-8177
      https://issues.apache.org/jira/browse/SPARK-8178
      https://issues.apache.org/jira/browse/SPARK-9115
      
      Are `day` and `dayofmonth` both necessary?
      
      ~~I am going to add `Quarter` to this PR as well.~~ Done.
      
      ~~As soon as the Scala coding is reviewed and discussed, I'll add the python api.~~ Done
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      Author: Tarek Auel <tarek.auel@gmail.com>
      
      Closes #6981 from tarekauel/SPARK-8199 and squashes the following commits:
      
      f7b4c8c [Tarek Auel] [SPARK-8199] fixed bug in tests
      bb567b6 [Tarek Auel] [SPARK-8199] fixed test
      3e095ba [Tarek Auel] [SPARK-8199] style and timezone fix
      256c357 [Tarek Auel] [SPARK-8199] code cleanup
      5983dcc [Tarek Auel] [SPARK-8199] whitespace fix
      6e0c78f [Tarek Auel] [SPARK-8199] removed setTimeZone in tests, according to cloud-fans comment in #7488
      4afc09c [Tarek Auel] [SPARK-8199] concise leap year handling
      ea6c110 [Tarek Auel] [SPARK-8199] fix after merging master
      70238e0 [Tarek Auel] Merge branch 'master' into SPARK-8199
      3c6ae2e [Tarek Auel] [SPARK-8199] removed binary search
      fb98ba0 [Tarek Auel] [SPARK-8199] python docstring fix
      cdfae27 [Tarek Auel] [SPARK-8199] cleanup & python docstring fix
      746b80a [Tarek Auel] [SPARK-8199] build fix
      0ad6db8 [Tarek Auel] [SPARK-8199] minor fix
      523542d [Tarek Auel] [SPARK-8199] address comments
      2259299 [Tarek Auel] [SPARK-8199] day_of_month alias
      d01b977 [Tarek Auel] [SPARK-8199] python underscore
      56c4a92 [Tarek Auel] [SPARK-8199] update python docu
      e223bc0 [Tarek Auel] [SPARK-8199] refactoring
      d6aa14e [Tarek Auel] [SPARK-8199] fixed Hive compatibility
      b382267 [Tarek Auel] [SPARK-8199] fixed bug in day calculation; removed set TimeZone in HiveCompatibilitySuite for test purposes; removed Hive tests for second and minute, because we can cast '2015-03-18' to a timestamp and extract a minute/second from it
      1b2e540 [Tarek Auel] [SPARK-8119] style fix
      0852655 [Tarek Auel] [SPARK-8119] changed from ExpectsInputTypes to implicit casts
      ec87c69 [Tarek Auel] [SPARK-8119] bug fixing and refactoring
      1358cdc [Tarek Auel] Merge remote-tracking branch 'origin/master' into SPARK-8199
      740af0e [Tarek Auel] implement date function using a calculation based on days
      4fb66da [Tarek Auel] WIP: date functions on calculation only
      1a436c9 [Tarek Auel] wip
      f775f39 [Tarek Auel] fixed return type
      ad17e96 [Tarek Auel] improved implementation
      c42b444 [Tarek Auel] Removed merge conflict file
      ccb723c [Tarek Auel] [SPARK-8199] style and fixed merge issues
      10e4ad1 [Tarek Auel] Merge branch 'master' into date-functions-fast
      7d9f0eb [Tarek Auel] [SPARK-8199] git renaming issue
      f3e7a9f [Tarek Auel] [SPARK-8199] revert change in DataFrameFunctionsSuite
      6f5d95c [Tarek Auel] [SPARK-8199] fixed year interval
      d9f8ac3 [Tarek Auel] [SPARK-8199] implement fast track
      7bc9d93 [Tarek Auel] Merge branch 'master' into SPARK-8199
      5a105d9 [Tarek Auel] [SPARK-8199] rebase after #6985 got merged
      eb6760d [Tarek Auel] Merge branch 'master' into SPARK-8199
      f120415 [Tarek Auel] improved runtime
      a8edebd [Tarek Auel] use Calendar instead of SimpleDateFormat
      5fe74e1 [Tarek Auel] fixed python style
      3bfac90 [Tarek Auel] fixed style
      356df78 [Tarek Auel] rely on cast mechanism of Spark. Simplified implementation
      02efc5d [Tarek Auel] removed doubled code
      a5ea120 [Tarek Auel] added python api; changed test to be more meaningful
      b680db6 [Tarek Auel] added codegeneration to all functions
      c739788 [Tarek Auel] added support for quarter SPARK-8178
      849fb41 [Tarek Auel] fixed stupid test
      638596f [Tarek Auel] improved codegen
      4d8049b [Tarek Auel] fixed tests and added type check
      5ebb235 [Tarek Auel] resolved naming conflict
      d0e2f99 [Tarek Auel] date functions
      83b682be
  16. Jul 15, 2015
    • Cheng Hao's avatar
      [SPARK-8245][SQL] FormatNumber/Length Support for Expression · 42dea3ac
      Cheng Hao authored
      - `BinaryType` for `Length`
      - `FormatNumber`
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7034 from chenghao-intel/expression and squashes the following commits:
      
      e534b87 [Cheng Hao] python api style issue
      601bbf5 [Cheng Hao] add python API support
      3ebe288 [Cheng Hao] update as feedback
      52274f7 [Cheng Hao] add support for udf_format_number and length for binary
      42dea3ac
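A sketch of the two expressions in plain Python. The thousands separator and rounding come from Python's own float formatting, which may not match Spark's rounding mode in every case; the function names are only illustrative.

```python
def format_number(value, decimals):
    """Format a number with thousands separators and a fixed number of decimals."""
    return format(value, ",.%df" % decimals)


def length(value):
    """Length of a string (characters) or of binary data (bytes)."""
    return len(value)
```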
  17. Jul 12, 2015
    • Davies Liu's avatar
      [SPARK-9006] [PYSPARK] fix microsecond loss in Python 3 · 20b47433
      Davies Liu authored
      A microsecond may be lost if the timestamp is handled as a float; it should be an `int` instead.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7363 from davies/fix_microsecond and squashes the following commits:
      
      36f6007 [Davies Liu] fix microsecond loss in Python 3
      20b47433
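The fix can be illustrated with integer-only timestamp arithmetic. This sketch is not the PySpark code; `to_micros` and `from_micros` are hypothetical helpers that treat naive datetimes as UTC. Keeping the value as an `int` number of microseconds avoids the rounding that a float number of seconds can introduce.

```python
import calendar
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def to_micros(dt):
    """Microseconds since the epoch, computed with integers only."""
    seconds = calendar.timegm(dt.utctimetuple())  # whole seconds, as an int
    return seconds * 1000000 + dt.microsecond


def from_micros(micros):
    """Inverse of to_micros; timedelta keeps microsecond precision exactly."""
    return EPOCH + datetime.timedelta(microseconds=micros)


dt = datetime.datetime(2015, 7, 12, 10, 30, 0, 999999)
```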
  18. Jul 10, 2015