[SPARK-19843][SQL] UTF8String => (int / long) conversion expensive for invalid inputs
## What changes were proposed in this pull request?

Jira: https://issues.apache.org/jira/browse/SPARK-19843

Created wrapper classes (`IntWrapper`, `LongWrapper`) to hold the result of parsing, which is a primitive type. The parsing methods now return a boolean that signals success, so invalid input produces `false` instead of an exception.

## How was this patch tested?

- Added new unit tests
- Ran a production job that converts strings to ints and verified the outputs

## Performance

Negligible difference when all strings are valid integers:

```
conversion to int:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------
trunk                       502 /  522         33.4          29.9       1.0X
SPARK-19843                 493 /  503         34.0          29.4       1.0X
```

Roughly 220x faster when all strings are invalid integers:

```
conversion to int:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------
trunk                     33913 / 34219          0.5        2021.4       1.0X
SPARK-19843                 154 /  162        108.8           9.2     220.0X
```

Author: Tejas Patil <tejasp@fb.com>

Closes #17184 from tejasapatil/SPARK-19843_is_numeric_maybe.
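The core idea (report success through a boolean and write the parsed value into a caller-supplied mutable wrapper, rather than throwing on invalid input) can be sketched in plain Java. This is a minimal, hypothetical sketch: `FlagParse`, its nested `IntWrapper`, and `toInt(CharSequence, IntWrapper)` are illustrative names, not the code in `UTF8String.java`. It accepts only an optional sign followed by ASCII digits and detects overflow with the same negative-accumulation approach used by `Integer.parseInt`; Spark's real parser may accept additional input forms.

```java
/**
 * Hypothetical sketch of flag-returning parsing: on bad input the method
 * returns false instead of constructing and throwing an exception.
 * Not Spark's actual implementation.
 */
public final class FlagParse {

    /** Mutable holder for a parsed int; one instance can be reused across rows. */
    public static final class IntWrapper {
        public int value;
    }

    private FlagParse() {}

    /**
     * Parses an optionally signed base-10 integer made of ASCII digits.
     * On success, stores the result in out.value and returns true.
     * On empty input, a lone sign, a non-digit, or overflow, returns false
     * and leaves out.value unchanged.
     */
    public static boolean toInt(CharSequence s, IntWrapper out) {
        int len = s.length();
        if (len == 0) return false;
        int i = 0;
        boolean negative = false;
        char first = s.charAt(0);
        if (first == '-' || first == '+') {
            negative = first == '-';
            i = 1;
            if (len == 1) return false;            // sign with no digits
        }
        // Accumulate as a negative number so Integer.MIN_VALUE is representable.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multMin = limit / 10;
        int result = 0;
        for (; i < len; i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) return false;  // not an ASCII digit
            if (result < multMin) return false;        // result * 10 would overflow
            result *= 10;
            if (result < limit + digit) return false;  // result - digit would overflow
            result -= digit;
        }
        out.value = negative ? result : -result;
        return true;
    }

    public static void main(String[] args) {
        IntWrapper w = new IntWrapper();  // a single wrapper reused for every input
        String[] inputs = {"42", "-17", "abc", "2147483648"};
        for (String in : inputs) {
            if (toInt(in, w)) {
                System.out.println(in + " -> " + w.value);
            } else {
                // A caller such as a cast expression could map this to a null result.
                System.out.println(in + " -> null");
            }
        }
    }
}
```

Because the failure path allocates no exception object and fills in no stack trace, an invalid input costs about as much as a valid one, which fits the gap between the two benchmark tables. Reusing one wrapper per call site also avoids allocating a boxed result for every row.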
- common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: 72 additions, 48 deletions
- common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java: 125 additions, 3 deletions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: 50 additions, 31 deletions