    [SPARK-14603][SQL][FOLLOWUP] Verification of Metadata Operations by Session Catalog · ef7a5e0b
    gatorsmile authored
    #### What changes were proposed in this pull request?
    This follow-up PR addresses the remaining review comments in https://github.com/apache/spark/pull/12385
    
    The main change in this PR is to produce cleaner error messages in PySpark, using the mechanism proposed by davies in https://github.com/apache/spark/pull/7135
    
    For example, suppose we run the following statements in PySpark:
    ```python
    >>> l = [('Alice', 1)]
    >>> df = sqlContext.createDataFrame(l)
    >>> df.createTempView("people")
    >>> df.createTempView("people")
    ```
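The second call fails because the session catalog refuses to register a temporary view under a name that is already taken. The sketch below is a toy Python model of that verification step, not Spark's code: `ToySessionCatalog`, `create_temp_view`, and the `override_if_exists` flag are hypothetical names chosen for illustration, and the exception class only imitates the message format seen in the traces here.

```python
class TempTableAlreadyExistsException(Exception):
    """Toy stand-in for the Scala exception of the same name (hypothetical, not Spark's class)."""

    def __init__(self, name):
        # Message text copied from the traces in this PR description.
        super().__init__("Temporary table '%s' already exists;" % name)


class ToySessionCatalog:
    """Hypothetical model of temp-view bookkeeping; not Spark's SessionCatalog."""

    def __init__(self):
        self._temp_views = {}  # view name -> logical plan (any object in this toy)

    def create_temp_view(self, name, plan, override_if_exists=False):
        # Verify the metadata operation before mutating catalog state.
        if name in self._temp_views and not override_if_exists:
            raise TempTableAlreadyExistsException(name)
        self._temp_views[name] = plan


catalog = ToySessionCatalog()
catalog.create_temp_view("people", plan="plan-1")
try:
    catalog.create_temp_view("people", plan="plan-2")  # same name, no override
except TempTableAlreadyExistsException as e:
    print(e)  # Temporary table 'people' already exists;
```

Checking before writing keeps the catalog unchanged when the operation is rejected: the first plan stays registered under `people`.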
    Before this PR, the second call raises an exception like this:
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/dataframe.py", line 152, in createTempView
        self._jdf.createTempView(name)
      File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
      File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 63, in deco
        return f(*a, **kw)
      File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o35.createTempView.
    : org.apache.spark.sql.catalyst.analysis.TempTableAlreadyExistsException: Temporary table 'people' already exists;
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTempView(SessionCatalog.scala:324)
        at org.apache.spark.sql.SparkSession.createTempView(SparkSession.scala:523)
        at org.apache.spark.sql.Dataset.createTempView(Dataset.scala:2328)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:211)
        at java.lang.Thread.run(Thread.java:745)
    ```
    After this PR, the same call raises a much cleaner exception:
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/dataframe.py", line 152, in createTempView
        self._jdf.createTempView(name)
      File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
      File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 75, in deco
        raise AnalysisException(s.split(': ', 1)[1], stackTrace)
    pyspark.sql.utils.AnalysisException: u"Temporary table 'people' already exists;"
    ```
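The translation performed by the `deco` wrapper in `pyspark/sql/utils.py` can be sketched as follows. This is a minimal, self-contained approximation, not the actual PySpark source: `FakeJavaError` is a hypothetical stand-in for py4j's `Py4JJavaError`, the prefix list is an assumption, and `create_temp_view_via_jvm` is a hypothetical function that fails the way the JVM call above does. The key step is the one visible in the new traceback: keep only the text after the first `': '` of the Java exception string and re-raise it as a Python `AnalysisException`.

```python
import functools


class FakeJavaError(Exception):
    """Hypothetical stand-in for py4j's Py4JJavaError; holds the Java exception's toString()."""

    def __init__(self, java_string, stack_trace=""):
        super().__init__(java_string)
        self.java_string = java_string
        self.stack_trace = stack_trace


class AnalysisException(Exception):
    """Python-side exception carrying only the message and the Java stack trace."""

    def __init__(self, desc, stackTrace):
        super().__init__(desc)
        self.desc = desc
        self.stackTrace = stackTrace

    def __str__(self):
        # Returning repr(desc) prints the message in quotes, as in the new traceback.
        return repr(self.desc)


# Assumed prefixes of Java exception strings that should surface as AnalysisException.
ANALYSIS_PREFIXES = (
    "org.apache.spark.sql.AnalysisException: ",
    "org.apache.spark.sql.catalyst.analysis.",
)


def capture_sql_exception(f):
    """Wrap a JVM-calling function and translate known Java errors into Python ones."""
    @functools.wraps(f)
    def deco(*a, **kw):
        try:
            return f(*a, **kw)
        except FakeJavaError as e:
            s = e.java_string
            if s.startswith(ANALYSIS_PREFIXES):
                # Drop the Java class name; keep the text after the first ': '.
                # 'from None' hides the wrapped Java error from the printed traceback.
                raise AnalysisException(s.split(': ', 1)[1], e.stack_trace) from None
            raise  # anything unrecognized propagates unchanged
    return deco


@capture_sql_exception
def create_temp_view_via_jvm(name):
    # Hypothetical JVM call that fails the same way as the trace above.
    raise FakeJavaError(
        "org.apache.spark.sql.catalyst.analysis.TempTableAlreadyExistsException: "
        "Temporary table '%s' already exists;" % name)
```

Calling `create_temp_view_via_jvm("people")` raises an `AnalysisException` whose `desc` is `Temporary table 'people' already exists;`, while Java errors with unrecognized class names are re-raised untouched. Defining `__str__` as `repr(self.desc)` is assumed to match PySpark's behavior and would explain the `u"..."` form in the Python 2 traceback above.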
    
    #### How was this patch tested?
    Fixed an existing PySpark test case
    
    Author: gatorsmile <gatorsmile@gmail.com>
    
    Closes #13126 from gatorsmile/followup-14684.