    fab563b9
    hyukjinkwon authored
    [SPARK-23517][PYTHON] Make `pyspark.util._exception_message` produce the trace from Java side by Py4JJavaError
    
    ## What changes were proposed in this pull request?
    
    This PR proposes that `pyspark.util._exception_message` produce the Java-side trace for `Py4JJavaError`.
    
    Currently, in Python 2, it uses the `message` attribute, which `Py4JJavaError` happens not to populate:
    
    ```python
    >>> from pyspark.util import _exception_message
    >>> try:
    ...     sc._jvm.java.lang.String(None)
    ... except Exception as e:
    ...     pass
    ...
    >>> e.message
    ''
    ```
    
    It seems we should use `str` instead for now:
    
     https://github.com/bartdag/py4j/blob/aa6c53b59027925a426eb09b58c453de02c21b7c/py4j-python/src/py4j/protocol.py#L412
    
    but this doesn't address the problem with non-ASCII strings from the Java side:
     `https://github.com/bartdag/py4j/issues/306`
    
    So, we could directly call `__str__()`:
    
    ```python
    >>> e.__str__()
    u'An error occurred while calling None.java.lang.String.\n: java.lang.NullPointerException\n\tat java.lang.String.<init>(String.java:588)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.lang.reflect.Constructor.newInstance(Constructor.java:422)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:238)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:214)\n\tat java.lang.Thread.run(Thread.java:745)\n'
    ```
    
    which does not coerce `unicode` to `str` in Python 2.
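
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `FakeJavaError` is a hypothetical stand-in for `Py4JJavaError` so the snippet runs without py4j, and `exception_message` and its `java_error_type` parameter are illustrative names.

```python
class FakeJavaError(Exception):
    """Hypothetical stand-in for py4j's Py4JJavaError (assumption, not the real class)."""

    def __init__(self, java_trace):
        # No positional args are passed up, mirroring an exception whose
        # Python 2 `message` attribute would end up empty.
        super().__init__()
        self.java_trace = java_trace

    def __str__(self):
        return "An error occurred while calling a Java method.\n" + self.java_trace


def exception_message(excp, java_error_type=FakeJavaError):
    """Return a readable message for `excp` (illustrative sketch only)."""
    if isinstance(excp, java_error_type):
        # Call __str__() directly so the result is returned as-is,
        # without the coercion that str() applies in Python 2.
        return excp.__str__()
    if hasattr(excp, "message"):
        # Python 2 style exceptions carry the text in `message`.
        return excp.message
    # Python 3 exceptions have no `message` attribute; fall back to str().
    return str(excp)


err = FakeJavaError(": java.lang.NullPointerException\n")
print(exception_message(err))
print(exception_message(ValueError("bad input")))
```

Checking the Java error type first matters in this sketch: if the `message` branch ran first under Python 2, the empty `message` would be returned and the trace would be lost.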
    
    This can actually be a problem:
    
    ```python
    from pyspark.sql.functions import udf
    spark.conf.set("spark.sql.execution.arrow.enabled", True)
    spark.range(1).select(udf(lambda x: [[]])()).toPandas()
    ```
    
    **Before**
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/dataframe.py", line 2009, in toPandas
        raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
    RuntimeError:
    Note: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this.
    ```
    
    **After**
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/dataframe.py", line 2009, in toPandas
        raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
    RuntimeError: An error occurred while calling o47.collectAsArrowToPython.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 1 times, most recent failure: Lost task 7.0 in stage 0.0 (TID 7, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/.../spark/python/pyspark/worker.py", line 245, in main
        process()
      File "/.../spark/python/pyspark/worker.py", line 240, in process
    ...
    Note: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this.
    ```
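
The difference between the two tracebacks follows from the format string shown in them, `"%s\n%s" % (_exception_message(e), msg)`. The snippet below is only a sketch of that formatting step: `wrap` is a hypothetical helper, and `msg` is copied from the output above.

```python
# Note text copied from the traceback output above.
msg = ("Note: toPandas attempted Arrow optimization because "
       "'spark.sql.execution.arrow.enabled' is set to true. "
       "Please set it to false to disable this.")


def wrap(detail):
    """Hypothetical helper applying the same format string as the traceback."""
    return "%s\n%s" % (detail, msg)


# An empty detail leaves a blank first line, as in the "Before" output.
print(repr(wrap("")[:1]))

# A non-empty detail appears before the note, as in the "After" output.
print(wrap("An error occurred while calling a Java method.").splitlines()[0])
```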
    
    ## How was this patch tested?
    
    Tested manually, and unit tests were added.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #20680 from HyukjinKwon/SPARK-23517.