    [SPARK-6638] [SQL] Improve performance of StringType in SQL · 85842760
    Davies Liu authored
    This PR changes the internal representation of StringType from java.lang.String to UTF8String, which is implemented using Array[Byte].
    
    This PR should not break any public API: Row.getString() will still return java.lang.String.
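
As a rough illustration of the idea (not the code in this PR), a byte-backed string type can be sketched in Java as below. The class name `Utf8Text` and its methods are hypothetical and do not mirror Spark's actual UTF8String API. The sketch assumes the value is stored as UTF-8 bytes internally and decoded to java.lang.String only at the public boundary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a UTF-8 byte-backed string; not Spark's UTF8String.
final class Utf8Text {
    private final byte[] bytes; // UTF-8 encoded content

    private Utf8Text(byte[] bytes) {
        this.bytes = bytes;
    }

    // Encode once when the value enters the internal representation.
    static Utf8Text fromString(String s) {
        return new Utf8Text(s.getBytes(StandardCharsets.UTF_8));
    }

    int numBytes() {
        return bytes.length;
    }

    // Count code points: every byte that is not a UTF-8 continuation
    // byte (bit pattern 10xxxxxx) starts a new code point.
    int numChars() {
        int n = 0;
        for (byte b : bytes) {
            if ((b & 0xC0) != 0x80) {
                n++;
            }
        }
        return n;
    }

    // Equality and hashing work on the raw bytes, with no decoding step.
    @Override
    public boolean equals(Object o) {
        return o instanceof Utf8Text && Arrays.equals(bytes, ((Utf8Text) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    // Decode only when a java.lang.String is requested by the caller.
    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

In this sketch, comparisons and hashing never allocate a java.lang.String, while callers that ask for one (analogous to Row.getString()) still receive an ordinary java.lang.String.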
    
    This is the first step toward improving the performance of StringType in SQL.
    
    cc rxin
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #5350 from davies/string and squashes the following commits:
    
    3b7bfa8 [Davies Liu] fix schema of AddJar
    2772f0d [Davies Liu] fix new test failure
    6d776a9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
    59025c8 [Davies Liu] address comments from @marmbrus
    341ec2c [Davies Liu] turn off scala style check in UTF8StringSuite
    744788f [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
    b04a19c [Davies Liu] add comment for getString/setString
    08d897b [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
    5116b43 [Davies Liu] rollback unrelated changes
    1314a37 [Davies Liu] address comments from Yin
    867bf50 [Davies Liu] fix String filter push down
    13d9d42 [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
    2089d24 [Davies Liu] add hashcode check back
    ac18ae6 [Davies Liu] address comment
    fd11364 [Davies Liu] optimize UTF8String
    8d17f21 [Davies Liu] fix hive compatibility tests
    e5fa5b8 [Davies Liu] remove clone in UTF8String
    28f3d81 [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
    28d6f32 [Davies Liu] refactor
    537631c [Davies Liu] some comment about Date
    9f4c194 [Davies Liu] convert data type for data source
    956b0a4 [Davies Liu] fix hive tests
    73e4363 [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
    9dc32d1 [Davies Liu] fix some hive tests
    23a766c [Davies Liu] refactor
    8b45864 [Davies Liu] fix codegen with UTF8String
    bb52e44 [Davies Liu] fix scala style
    c7dd4d2 [Davies Liu] fix some catalyst tests
    38c303e [Davies Liu] fix python sql tests
    5f9e120 [Davies Liu] fix sql tests
    6b499ac [Davies Liu] fix style
    a85fb27 [Davies Liu] refactor
    d32abd1 [Davies Liu] fix utf8 for python api
    4699c3a [Davies Liu] use Array[Byte] in UTF8String
    21f67c6 [Davies Liu] cleanup
    685fd07 [Davies Liu] use UTF8String instead of String for StringType