  1. Feb 17, 2015
    • Cheng Hao's avatar
      [SQL] [Minor] Update the HiveContext Unittest · 9d281fa5
      Cheng Hao authored
In the unit tests, the table src(key INT, value STRING) is not the same as the Hive src(key STRING, value STRING):
      https://github.com/apache/hive/blob/branch-0.13/data/scripts/q_test_init.sql
      
And in reflect.q, the test failed for the expression `reflect("java.lang.Integer", "valueOf", key, 16)`, which expects the argument `key` to be a STRING, not an INT.
      
This PR doesn't aim to change the `src` schema; we can do that after 1.3 is released, though we would probably need to re-generate all the golden files.
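The type mismatch can be sketched in plain Python (a hypothetical analogue; `integer_value_of` stands in for `java.lang.Integer.valueOf(String, int)`, which parses a string in a given radix and therefore cannot accept an INT column):

```python
def integer_value_of(s, radix=10):
    # Python analogue of Integer.valueOf(String, int): the first argument
    # must be a string, mirroring why reflect(...) needs key as STRING.
    if not isinstance(s, str):
        raise TypeError("valueOf expects a string, got %s" % type(s).__name__)
    return int(s, radix)

print(integer_value_of("238", 16))  # parses hex "238" -> 568
```

Passing an integer key (as the test's `src(key INT, ...)` schema would) fails the same way the Hive UDF resolution does.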
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4584 from chenghao-intel/reflect and squashes the following commits:
      
      e5bdc3a [Cheng Hao] Move the test case reflect into blacklist
      184abfd [Cheng Hao] revert the change to table src1
      d9bcf92 [Cheng Hao] Update the HiveContext Unittest
      9d281fa5
    • Liang-Chi Hsieh's avatar
      [Minor][SQL] Use same function to check path parameter in JSONRelation · ac506b7c
      Liang-Chi Hsieh authored
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4649 from viirya/use_checkpath and squashes the following commits:
      
      0f9a1a1 [Liang-Chi Hsieh] Use same function to check path parameter.
      ac506b7c
    • Liang-Chi Hsieh's avatar
      [SPARK-5862][SQL] Only transformUp the given plan once in HiveMetastoreCatalog · 4611de1c
      Liang-Chi Hsieh authored
The current `ParquetConversions` in `HiveMetastoreCatalog` will transformUp the given plan multiple times if there are many Metastore Parquet tables. Since the transformUp operation is recursive, it is better to perform it only once.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4651 from viirya/parquet_atonce and squashes the following commits:
      
      c1ed29d [Liang-Chi Hsieh] Fix bug.
      e0f919b [Liang-Chi Hsieh] Only transformUp the given plan once.
      4611de1c
    • CodingCat's avatar
      [Minor] fix typo in SQL document · 31efb39c
      CodingCat authored
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #4656 from CodingCat/fix_typo and squashes the following commits:
      
      b41d15c [CodingCat] recover
      689fe46 [CodingCat] fix typo
      31efb39c
    • Davies Liu's avatar
      [SPARK-5864] [PySpark] support .jar as python package · fc4eb950
      Davies Liu authored
A jar file containing Python sources can be used as a Python package, just like a zip file.

spark-submit already puts the jar file on the PYTHONPATH; this patch also puts it on `sys.path`, so it can be used in the Python worker.
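Since a .jar is just a zip archive, Python's zipimport machinery can load packages from it once it is on `sys.path`. A minimal self-contained sketch (the jar and `mypkg` names are illustrative, not from the patch):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny .jar containing a Python package.
tmp = tempfile.mkdtemp()
jar_path = os.path.join(tmp, "deps.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("mypkg/__init__.py", "ANSWER = 42\n")

# What the patch does for submitted jars: prepend them to sys.path so the
# Python worker can import packages from them, just like from a .zip.
sys.path.insert(0, jar_path)
import mypkg

print(mypkg.ANSWER)  # 42
```

The extension is irrelevant to zipimport; only the archive format matters.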
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4652 from davies/jar and squashes the following commits:
      
      17d3f76 [Davies Liu] support .jar as python package
      fc4eb950
    • Sean Owen's avatar
      SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager · 49c19fdb
      Sean Owen authored
Avoid calling the remove-shutdown-hook method from within the shutdown hook itself
      
      CC pwendell JoshRosen MattWhelan
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4648 from srowen/SPARK-5841.2 and squashes the following commits:
      
      51548db [Sean Owen] Avoid call to remove shutdown hook being called from shutdown hook
      49c19fdb
    • Patrick Wendell's avatar
      MAINTENANCE: Automated closing of pull requests. · 24f358b9
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #3297 (close requested by 'andrewor14')
      Closes #3345 (close requested by 'pwendell')
      Closes #2729 (close requested by 'srowen')
      Closes #2320 (close requested by 'pwendell')
      Closes #4529 (close requested by 'andrewor14')
      Closes #2098 (close requested by 'srowen')
      Closes #4120 (close requested by 'andrewor14')
      24f358b9
    • MechCoder's avatar
      [SPARK-3381] [MLlib] Eliminate bins for unordered features in DecisionTrees · 9b746f38
      MechCoder authored
For unordered features, it is sufficient to use splits, since the threshold of the split corresponds to the threshold of the HighSplit of the bin, and the LowSplit is unused.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4231 from MechCoder/spark-3381 and squashes the following commits:
      
      58c19a5 [MechCoder] COSMIT
      c274b74 [MechCoder] Remove unordered feature calculation in labeledPointToTreePoint
      b2b9b89 [MechCoder] COSMIT
      d3ee042 [MechCoder] [SPARK-3381] [MLlib] Eliminate bins for unordered features
      9b746f38
    • xukun 00228947's avatar
      [SPARK-5661]function hasShutdownDeleteTachyonDir should use... · b271c265
      xukun 00228947 authored
      [SPARK-5661]function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether contains file
      
`hasShutdownDeleteTachyonDir(file: TachyonFile)` should use `shutdownDeleteTachyonPaths` (not `shutdownDeletePaths`) to determine whether it contains the file. To fix this, delete the two unused functions.
      
      Author: xukun 00228947 <xukun.xu@huawei.com>
      Author: viper-kun <xukun.xu@huawei.com>
      
      Closes #4418 from viper-kun/deleteunusedfun and squashes the following commits:
      
      87340eb [viper-kun] fix style
      3d6c69e [xukun 00228947] fix bug
      2bc397e [xukun 00228947] deleteunusedfun
      b271c265
    • Ryan Williams's avatar
      [SPARK-5778] throw if nonexistent metrics config file provided · d8f69cf7
      Ryan Williams authored
The previous behavior was to log an error. This is fine in the general
case where no `spark.metrics.conf` parameter was specified: a default
`metrics.properties` is looked for, and the exception is logged and
suppressed if it doesn't exist.

If the user has purposefully specified a metrics config file, however,
it makes more sense to show them an error when said file doesn't
exist.
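The distinction can be sketched in Python (a hedged analogue of the new behavior; the function and constant names are assumptions, loosely following the renamed `DEFAULT_METRICS_CONF_FILENAME` from the commit list):

```python
import os
import tempfile

DEFAULT_METRICS_CONF_FILENAME = "metrics.properties"

def load_metrics_config(user_path=None):
    """A missing file is fatal only when the user explicitly configured one."""
    if user_path is not None:
        # spark.metrics.conf was set explicitly: missing file is a hard error
        if not os.path.isfile(user_path):
            raise FileNotFoundError("metrics config file does not exist: %s"
                                    % user_path)
        path = user_path
    else:
        # default lookup: absence is fine, fall back to empty config
        if not os.path.isfile(DEFAULT_METRICS_CONF_FILENAME):
            return {}
        path = DEFAULT_METRICS_CONF_FILENAME
    with open(path) as f:
        return dict(line.strip().split("=", 1) for line in f
                    if line.strip() and not line.startswith("#"))

# An explicitly configured but missing file now raises instead of logging:
missing = os.path.join(tempfile.mkdtemp(), "missing.conf")
try:
    load_metrics_config(missing)
except FileNotFoundError as e:
    print("raised:", e)
```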
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #4571 from ryan-williams/metrics and squashes the following commits:
      
      5bccb14 [Ryan Williams] private-ize some MetricsConfig members
      08ff998 [Ryan Williams] rename METRICS_CONF: DEFAULT_METRICS_CONF_FILENAME
      f4d7fab [Ryan Williams] fix tests
      ad24b0e [Ryan Williams] add "metrics.properties" to .rat-excludes
      94e810b [Ryan Williams] throw if nonexistent Sink class is specified
      31d2c30 [Ryan Williams] metrics code review feedback
      56287db [Ryan Williams] throw if nonexistent metrics config file provided
      d8f69cf7
    • Davies Liu's avatar
      [SPARK-5859] [PySpark] [SQL] fix DataFrame Python API · d8adefef
      Davies Liu authored
1. add explain()
2. add isLocal()
3. do not call show() in __repr__
4. add foreach() and foreachPartition()
5. add distinct()
6. fix functions.col()/column()/lit()
7. fix unit tests in sql/functions.py
8. fix unicode in showString()
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4645 from davies/df6 and squashes the following commits:
      
      6b46a2c [Davies Liu] fix DataFrame Python API
      d8adefef
    • Michael Armbrust's avatar
      [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation · c74b07fa
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4642 from marmbrus/docs and squashes the following commits:
      
      d291c34 [Michael Armbrust] python tests
      9be66e3 [Michael Armbrust] comments
      d56afc2 [Michael Armbrust] fix style
      f004747 [Michael Armbrust] fix build
      c4a907b [Michael Armbrust] fix tests
      42e2b73 [Michael Armbrust] [SQL] Documentation / API Clean-up.
      c74b07fa
    • Xiangrui Meng's avatar
      [SPARK-5858][MLLIB] Remove unnecessary first() call in GLM · c76da36c
      Xiangrui Meng authored
`numFeatures` is only used by multinomial logistic regression. Calling `.first()` for every GLM causes a performance regression, especially in Python.
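The fix can be illustrated with a tiny stand-in RDD that counts how often `first()` is triggered (all names here are hypothetical, not the MLlib code):

```python
class CountingRDD:
    """Minimal RDD stand-in that records how many times first() is called."""
    def __init__(self, rows):
        self.rows = rows
        self.first_calls = 0

    def first(self):
        self.first_calls += 1
        return self.rows[0]

def run_glm(data, num_features=-1, iterations=5):
    # check numFeatures only once, before the optimization loop,
    # instead of calling data.first() per iteration
    if num_features < 0:
        num_features = len(data.first()[1])
    weights = [0.0] * num_features
    for _ in range(iterations):
        pass  # gradient updates would go here
    return weights

rdd = CountingRDD([(1.0, [0.5, 0.2, 0.1])])
w = run_glm(rdd)
print(len(w), rdd.first_calls)  # 3 1
```

Hoisting the call keeps the cost constant regardless of the number of iterations.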
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4647 from mengxr/SPARK-5858 and squashes the following commits:
      
      036dc7f [Xiangrui Meng] remove unnecessary first() call
      12c5548 [Xiangrui Meng] check numFeatures only once
      c76da36c
    • Patrick Wendell's avatar
      SPARK-5856: In Maven build script, launch Zinc with more memory · 3ce46e94
      Patrick Wendell authored
      I've seen out of memory exceptions when trying
      to run many parallel builds against the same Zinc
      server during packaging. We should use the same
      increased memory settings we use for Maven itself.
      
      I tested this and confirmed that the Nailgun JVM
      launched with higher memory.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4643 from pwendell/zinc-memory and squashes the following commits:
      
      717cfb0 [Patrick Wendell] SPARK-5856: Launch Zinc with larger memory options.
      3ce46e94
    • Josh Rosen's avatar
      Revert "[SPARK-5363] [PySpark] check ending mark in non-block way" · ee6e3eff
      Josh Rosen authored
      This reverts commits ac6fe67e and c06e42f2.
      ee6e3eff
    • jerryshao's avatar
      [SPARK-5826][Streaming] Fix Configuration not serializable problem · a65766bf
      jerryshao authored
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #4612 from jerryshao/SPARK-5826 and squashes the following commits:
      
      7ec71db [jerryshao] Remove transient for conf statement
      88d84e6 [jerryshao] Fix Configuration not serializable problem
      a65766bf
    • Patrick Wendell's avatar
      HOTFIX: Style issue causing build break · c06e42f2
      Patrick Wendell authored
      Caused by #4601
      c06e42f2
    • Xiangrui Meng's avatar
      [SPARK-5802][MLLIB] cache transformed data in glm · fd84229e
      Xiangrui Meng authored
      If we need to transform the input data, we should cache the output to avoid re-computing feature vectors every iteration. dbtsai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4593 from mengxr/SPARK-5802 and squashes the following commits:
      
      ae3be84 [Xiangrui Meng] cache transformed data in glm
      fd84229e
  2. Feb 16, 2015
    • Reynold Xin's avatar
      [SPARK-5853][SQL] Schema support in Row. · d380f324
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4640 from rxin/SPARK-5853 and squashes the following commits:
      
      9c6f569 [Reynold Xin] [SPARK-5853][SQL] Schema support in Row.
      d380f324
    • Patrick Wendell's avatar
      SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream · a51d51ff
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4638 from pwendell/SPARK-5850 and squashes the following commits:
      
      386126f [Patrick Wendell] SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream.
      a51d51ff
    • Davies Liu's avatar
      [SPARK-5363] [PySpark] check ending mark in non-block way · ac6fe67e
      Davies Liu authored
There is a chance of deadlock where the JVM is waiting for the ending mark from the Python process, but the mark has been eaten by a corrupted stream.

This PR checks for the ending mark from Python in a non-blocking way, so the check will not be blocked by the Python process.

There is a small chance that the ending mark has been sent by the Python process but is not available yet; in that case the Python worker will not be reused.
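A non-blocking readiness check like the one described can be sketched with `select` (a hedged Python analogue; the Scala code operates on the worker's stream, and the mark bytes below are invented for the demo):

```python
import select
import socket

def ending_mark_ready(sock, timeout=0.0):
    # Only attempt to read the ending mark if data is already available,
    # instead of blocking on the stream indefinitely.
    ready, _, _ = select.select([sock], [], [], timeout)
    return bool(ready)

a, b = socket.socketpair()
print(ending_mark_ready(a))            # nothing sent yet -> False
b.sendall(b"\x00\x00\x00\x00")         # pretend this is the ending mark
print(ending_mark_ready(a, timeout=1.0))  # -> True
```

With `timeout=0.0` the call returns immediately, which is what prevents the freeze when the mark never arrives.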
      
      cc JoshRosen pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4601 from davies/freeze and squashes the following commits:
      
      e15a8c3 [Davies Liu] update logging
      890329c [Davies Liu] Merge branch 'freeze' of github.com:davies/spark into freeze
      2bd2228 [Davies Liu] add more logging
      656d544 [Davies Liu] Update PythonRDD.scala
      05e1085 [Davies Liu] check ending mark in non-block way
      ac6fe67e
    • Reynold Xin's avatar
      [SQL] Various DataFrame doc changes. · 0e180bfc
      Reynold Xin authored
      Added a bunch of tags.
      
      Also changed parquetFile to take varargs rather than a string followed by varargs.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4636 from rxin/df-doc and squashes the following commits:
      
      651f80c [Reynold Xin] Fixed parquetFile in PySpark.
      8dc3024 [Reynold Xin] [SQL] Various DataFrame doc changes.
      0e180bfc
    • Josh Rosen's avatar
      [SPARK-5849] Handle more types of invalid JSON requests in SubmitRestProtocolMessage.parseAction · 58a82a78
      Josh Rosen authored
This patch improves SubmitRestProtocol's handling of invalid JSON requests in cases where those requests were parsable as JSON but not as JSON objects (e.g. they could be parsed as arrays or strings).  I replaced an unchecked cast with pattern-matching and added a new test case.
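The shape of the fix, sketched in Python (a hypothetical analogue of `parseAction`, not the Scala code; the point is the explicit type check replacing an unchecked cast):

```python
import json

def parse_action(body):
    """Reject JSON that is valid but not a JSON object with a clear error."""
    try:
        parsed = json.loads(body)
    except ValueError as e:
        raise ValueError("malformed JSON: %s" % e)
    # valid JSON, but an array/string/number rather than an object -> reject,
    # instead of blindly casting and failing later
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object, got: %r" % parsed)
    if "action" not in parsed:
        raise ValueError("JSON object has no 'action' field")
    return parsed["action"]

print(parse_action('{"action": "CreateSubmissionRequest"}'))
```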
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4637 from JoshRosen/rest-protocol-cast and squashes the following commits:
      
      b3f282b [Josh Rosen] [SPARK-5849] Handle more types of invalid JSON in SubmitRestProtocolMessage.parseAction
      58a82a78
    • azagrebin's avatar
      [SPARK-3340] Deprecate ADD_JARS and ADD_FILES · 16687651
      azagrebin authored
I created a patch that deprecates the environment variables.
The Scala and Python shells now log a warning message to notify the user about the deprecation,
with the following messages:
scala: "ADD_JARS environment variable is deprecated, use --jar spark submit argument instead"
python: "Warning: ADD_FILES environment variable is deprecated, use --py-files argument instead"

Is this what is expected, or should the code associated with the variables be removed completely?
Should this be documented somewhere?
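The warning path can be sketched as follows (a hedged analogue; the function name and the use of `warnings` are assumptions, since the real shells just print to their logs):

```python
import os
import warnings

def warn_deprecated_env(environ=os.environ):
    # Still honour the old variables, but nag the user about the deprecation.
    if environ.get("ADD_JARS"):
        warnings.warn("ADD_JARS environment variable is deprecated, "
                      "use --jar spark submit argument instead",
                      DeprecationWarning)
    if environ.get("ADD_FILES"):
        warnings.warn("ADD_FILES environment variable is deprecated, "
                      "use --py-files argument instead",
                      DeprecationWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated_env({"ADD_JARS": "extra.jar"})

print(len(caught))  # 1
```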
      
      Author: azagrebin <azagrebin@gmail.com>
      
      Closes #4616 from azagrebin/master and squashes the following commits:
      
      bab1aa9 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES: minor readability issue
      0643895 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES: add warning messages
      42f0107 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES
      16687651
    • Davies Liu's avatar
      [SPARK-5788] [PySpark] capture the exception in python write thread · b1bd1dd3
      Davies Liu authored
An exception in the Python writer thread will shut down the executor.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4577 from davies/exception and squashes the following commits:
      
      eb0ceff [Davies Liu] Update PythonRDD.scala
      139b0db [Davies Liu] capture the exception in python write thread
      b1bd1dd3
    • Matt Whelan's avatar
      SPARK-5848: tear down the ConsoleProgressBar timer · 1294a6e0
      Matt Whelan authored
      The timer is a GC root, and failing to terminate it leaks SparkContext
      instances.
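The leak and its fix can be modeled with `threading.Timer` (a hypothetical Python analogue; the real bar uses a java.util.Timer, but the principle is the same: an un-cancelled timer keeps its owner reachable):

```python
import threading

class ConsoleProgressBar:
    """Sketch: the refresh timer must be cancelled on stop, otherwise the
    live timer thread roots this object and everything it references."""
    def __init__(self, period=5.0):
        self._timer = threading.Timer(period, self._refresh)
        self._timer.daemon = True
        self._timer.start()

    def _refresh(self):
        pass  # would redraw the progress bar here

    def stop(self):
        # tear down the timer so it is no longer a GC root
        self._timer.cancel()
        self._timer.join()

bar = ConsoleProgressBar()
bar.stop()
print(bar._timer.is_alive())  # False
```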
      
      Author: Matt Whelan <mwhelan@perka.com>
      
      Closes #4635 from MattWhelan/SPARK-5848 and squashes the following commits:
      
      2a1e8a5 [Matt Whelan] SPARK-5848: teardown the ConsoleProgressBar timer
      1294a6e0
    • Yin Huai's avatar
      [SPARK-4865][SQL]Include temporary tables in SHOW TABLES · e189cbb0
      Yin Huai authored
This PR adds a `ShowTablesCommand` to support the `SHOW TABLES [IN databaseName]` SQL command. The result of `SHOW TABLES` has two columns, `tableName` and `isTemporary`. For temporary tables, the value of the `isTemporary` column is `true`.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-4865
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4618 from yhuai/showTablesCommand and squashes the following commits:
      
      0c09791 [Yin Huai] Use ShowTablesCommand.
      85ee76d [Yin Huai] Since SHOW TABLES is not a Hive native command any more and we will not see "OK" (originally generated by Hive's driver), use SHOW DATABASES in the test.
      94bacac [Yin Huai] Add SHOW TABLES to the list of noExplainCommands.
      d71ed09 [Yin Huai] Fix test.
      a4a6ec3 [Yin Huai] Add SHOW TABLE command.
      e189cbb0
    • kai's avatar
      [SQL] Optimize arithmetic and predicate operators · cb6c48c8
      kai authored
The existing implementations of arithmetic operators and BinaryComparison operators contain redundant type-checking code, e.g.:
`Expression.n2` is used by Add/Subtract/Multiply.
(1) n2 always checks `left.dataType == right.dataType`; however, this check should be done once, when we resolve expression types;
(2) n2 requires that `dataType` be a NumericType; this can also be checked once.
      
      This PR optimizes arithmetic and predicate operators by removing such redundant type-checking codes.
      
      Some preliminary benchmarking on 10G TPC-H data over 5 r3.2xlarge EC2 machines shows that this PR can reduce the query time by 5.5% to 11%.
      The benchmark queries follow the template below, where OP is plus/minus/times/divide/remainder/bitwise and/bitwise or/bitwise xor.
      
```sql
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity OP cnt1), SUM(l_quantity OP cnt2), ...., SUM(l_quantity OP cnt700)
FROM (
    SELECT l_returnflag, l_linestatus, l_quantity,
           1 AS cnt1, 2 AS cnt2, ..., 700 AS cnt700
    FROM lineitem
    WHERE l_shipdate <= '1998-09-01'
)
GROUP BY l_returnflag, l_linestatus;
```
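The core of the optimization can be sketched in Python (a hedged analogue, not the Catalyst code): move the operand-type check from per-row evaluation to one-time expression resolution, so the hot evaluator does no checking at all.

```python
def add_checked_per_row(left, right):
    # old style: compatibility re-checked for every input row
    def evaluate(row):
        l, r = left(row), right(row)
        if type(l) is not type(r) or not isinstance(l, (int, float)):
            raise TypeError("incompatible operand types")
        return l + r
    return evaluate

def add_resolved_once(left, right, left_type, right_type):
    # optimized style: check once at resolution time,
    # then return a lean evaluator with no per-row checks
    if left_type is not right_type or left_type not in (int, float):
        raise TypeError("incompatible operand types")
    def evaluate(row):
        return left(row) + right(row)
    return evaluate

row = {"l_quantity": 17.0, "cnt1": 1.0}
fast = add_resolved_once(lambda r: r["l_quantity"],
                         lambda r: r["cnt1"], float, float)
print(fast(row))  # 18.0
```

With hundreds of OP terms per row, as in the benchmark template, removing the per-row branch is where the 5.5%–11% saving plausibly comes from.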
      
      Author: kai <kaizeng@eecs.berkeley.edu>
      
      Closes #4472 from kai-zeng/arithmetic-optimize and squashes the following commits:
      
      fef0cf1 [kai] Merge branch 'master' of github.com:apache/spark into arithmetic-optimize
      4b3a1bb [kai] chmod a-x
      5a41e49 [kai] chmod a-x Expression.scala
      cb37c94 [kai] rebase onto spark master
      7f6e968 [kai] chmod 100755 -> 100644
      6cddb46 [kai] format
      7490dbc [kai] fix unresolved-expression exception for EqualTo
      9c40bc0 [kai] fix bitwisenot
      3cbd363 [kai] clean up test code
      ca47801 [kai] override evalInternal for bitwise ops
      8fa84a1 [kai] add bitwise or and xor
      6892fc4 [kai] revert override evalInternal
      f8eba24 [kai] override evalInternal
      31ccdd4 [kai] rewrite all bitwise op and remove evalInternal
      86297e2 [kai] generalized
      cb92ae1 [kai] bitwise-and: override eval
      97a7d6c [kai] bitwise-and: override evalInternal using and func
      0906c39 [kai] add bitwise test
      62abbbc [kai] clean up predicate and arithmetic
      b34d58d [kai] add caching and benmark option
      12c5b32 [kai] override eval
      1cd7571 [kai] fix sqrt and maxof
      03fd0c3 [kai] fix predicate
      16fd84c [kai] optimize + - * / % -(unary) abs < > <= >=
      fd95823 [kai] remove unnecessary type checking
      24d062f [kai] test suite
      cb6c48c8
    • Yin Huai's avatar
      [SPARK-5839][SQL]HiveMetastoreCatalog does not recognize table names and... · f3ff1eb2
      Yin Huai authored
      [SPARK-5839][SQL]HiveMetastoreCatalog does not recognize table names and aliases of data source tables.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-5839
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4626 from yhuai/SPARK-5839 and squashes the following commits:
      
      f779d85 [Yin Huai] Use subqeury to wrap replaced ParquetRelation.
      2695f13 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-5839
      f1ba6ca [Yin Huai] Address comment.
      2c7fa08 [Yin Huai] Use Subqueries to wrap a data source table.
      f3ff1eb2
    • Yin Huai's avatar
      [SPARK-5746][SQL] Check invalid cases for the write path of data source API · 5b6cd65c
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5746
      
      liancheng marmbrus
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4617 from yhuai/insertOverwrite and squashes the following commits:
      
      8e3019d [Yin Huai] Fix compilation error.
      499e8e7 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
      e76e85a [Yin Huai] Address comments.
      ac31b3c [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
      f30bdad [Yin Huai] Use toDF.
      99da57e [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
      6b7545c [Yin Huai] Add a pre write check to the data source API.
      a88c516 [Yin Huai] DDLParser will take a parsering function to take care CTAS statements.
      5b6cd65c
    • Patrick Wendell's avatar
      HOTFIX: Break in Jekyll build from #4589 · 04b401da
      Patrick Wendell authored
That patch had a line break in the middle of a `{{ }}` Liquid expression, which is not allowed.
      04b401da
    • Josh Rosen's avatar
      [SPARK-2313] Use socket to communicate GatewayServer port back to Python driver · 0cfda846
      Josh Rosen authored
This patch changes PySpark so that the GatewayServer's port is communicated back to the Python process that launches it over a local socket instead of a pipe.  The old pipe-based approach was brittle and could fail if `spark-submit` printed unexpected output to stdout.
      
      To accomplish this, I wrote a custom `PythonGatewayServer.main()` function to use in place of Py4J's `GatewayServer.main()`.
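The socket handshake can be sketched end-to-end in Python (a hypothetical simulation: here one process plays both sides, and 25333 is just an example gateway port, not a value from the patch):

```python
import socket
import struct

# Parent (Python driver) listens on an ephemeral local port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
_, callback_port = server.getsockname()

# In the real flow, PythonGatewayServer in the JVM connects back and writes
# its Py4J port as a big-endian int; we simulate that child here.
child = socket.create_connection(("127.0.0.1", callback_port))
child.sendall(struct.pack("!i", 25333))

conn, _ = server.accept()
gateway_port = struct.unpack("!i", conn.recv(4))[0]
print(gateway_port)  # 25333

conn.close()
child.close()
server.close()
```

Unlike scraping stdout, nothing else the child prints can corrupt this channel.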
      
      Closes #3424.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4603 from JoshRosen/SPARK-2313 and squashes the following commits:
      
      6a7740b [Josh Rosen] Remove EchoOutputThread since it's no longer needed
      0db501f [Josh Rosen] Use select() so that we don't block if GatewayServer dies.
      9bdb4b6 [Josh Rosen] Handle case where getListeningPort returns -1
      3fb7ed1 [Josh Rosen] Remove stdout=PIPE
      2458934 [Josh Rosen] Use underscore to mark env var. as private
      d12c95d [Josh Rosen] Use Logging and Utils.tryOrExit()
      e5f9730 [Josh Rosen] Wrap everything in a giant try-block
      2f70689 [Josh Rosen] Use stdin PIPE to share fate with driver
      8bf956e [Josh Rosen] Initial cut at passing Py4J gateway port back to driver via socket
      0cfda846
    • Matt Whelan's avatar
      SPARK-5357: Update commons-codec version to 1.10 (current) · c01c4ebc
      Matt Whelan authored
      Resolves https://issues.apache.org/jira/browse/SPARK-5357
      
In commons-codec 1.5, Base64 instances are not thread-safe; this was only the case in versions 1.4–1.6.
      
      Author: Matt Whelan <mwhelan@perka.com>
      
      Closes #4153 from MattWhelan/depsUpdate and squashes the following commits:
      
      b4a91f4 [Matt Whelan] SPARK-5357: Update commons-codec version to 1.10 (current)
      c01c4ebc
    • Matt Whelan's avatar
      SPARK-5841: remove DiskBlockManager shutdown hook on stop · bb05982d
      Matt Whelan authored
      After a call to stop, the shutdown hook is redundant, and causes a
      memory leak.
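The pattern can be modeled with `atexit` (a hypothetical Python analogue of the Scala fix; the real code registers a JVM shutdown hook, but the leak mechanics are the same: a registered hook keeps the object reachable):

```python
import atexit

class DiskBlockManager:
    """Sketch: register a cleanup hook at construction, and unregister it in
    stop(), so an explicitly stopped manager is not kept alive by the
    interpreter's exit-hook list."""
    def __init__(self):
        self.stopped = False
        atexit.register(self._cleanup)

    def _cleanup(self):
        self.stopped = True  # would delete local spill directories here

    def stop(self):
        # the hook is redundant after an explicit stop; remove it first
        atexit.unregister(self._cleanup)
        self._cleanup()

m = DiskBlockManager()
m.stop()
print(m.stopped)  # True
```

The companion hotfix (SPARK-5841 above) covers the other direction: never try to unregister the hook while the shutdown sequence itself is running.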
      
      Author: Matt Whelan <mwhelan@perka.com>
      
      Closes #4627 from MattWhelan/SPARK-5841 and squashes the following commits:
      
      d5f5c7f [Matt Whelan] SPARK-5841: remove DiskBlockManager shutdown hook on stop
      bb05982d
    • Cheng Lian's avatar
      [SPARK-5833] [SQL] Adds REFRESH TABLE command · c51ab37f
      Cheng Lian authored
      Lifts `HiveMetastoreCatalog.refreshTable` to `Catalog`. Adds `RefreshTable` command to refresh (possibly cached) metadata in external data sources tables.
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4624 from liancheng/refresh-table and squashes the following commits:
      
      8d1aa4c [Cheng Lian] Adds REFRESH TABLE command
      c51ab37f
    • Cheng Lian's avatar
      [SPARK-5296] [SQL] Add more filter types for data sources API · 6f54dee6
      Cheng Lian authored
      This PR adds the following filter types for data sources API:
      
      - `IsNull`
      - `IsNotNull`
      - `Not`
      - `And`
      - `Or`
      
The code which converts Catalyst predicate expressions to data sources filters is very similar to the filter conversion logic in `ParquetFilters`, which converts Catalyst predicates to Parquet filter predicates. In this way we can support nested AND/OR/NOT predicates without changing the current `BaseScan` type hierarchy.
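The conversion can be sketched in Python, with tuples standing in for Catalyst expressions (a hedged illustration, not the Scala code; `None` means the predicate cannot be pushed down, and a partial And/Or must not be pushed down either):

```python
def to_source_filter(pred):
    """Translate a predicate tree into a data-source filter tree, or None."""
    op = pred[0]
    if op == "isnull":
        return ("IsNull", pred[1])
    if op == "isnotnull":
        return ("IsNotNull", pred[1])
    if op == "not":
        child = to_source_filter(pred[1])
        return ("Not", child) if child is not None else None
    if op in ("and", "or"):
        left = to_source_filter(pred[1])
        right = to_source_filter(pred[2])
        if left is None or right is None:
            # pushing only one side of And/Or would change the result set
            return None
        return ("And" if op == "and" else "Or", left, right)
    return None  # unsupported expression: evaluate it in Spark instead

print(to_source_filter(("not", ("isnull", "name"))))
```

Recursing like this is what gives nested AND/OR/NOT support for free.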
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #4623 from liancheng/more-fiters and squashes the following commits:
      
      1b296f4 [Cheng Lian] Add more filter types for data sources API
      6f54dee6
    • OopsOutOfMemory's avatar
      [SQL] Add fetched row count in SparkSQLCLIDriver · b4d7c703
      OopsOutOfMemory authored
      before this change:
      ```scala
      Time taken: 0.619 seconds
      ```
      
after this change:
      ```scala
      Time taken: 0.619 seconds, Fetched: 4 row(s)
      ```
      
      Author: OopsOutOfMemory <victorshengli@126.com>
      
      Closes #4604 from OopsOutOfMemory/rowcount and squashes the following commits:
      
      7252dea [OopsOutOfMemory] add fetched row count
      b4d7c703
    • Michael Armbrust's avatar
      [SQL] Initial support for reporting location of error in sql string · 104b2c45
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4587 from marmbrus/position and squashes the following commits:
      
      0810052 [Michael Armbrust] fix tests
      395c019 [Michael Armbrust] Merge remote-tracking branch 'marmbrus/position' into position
      e155dce [Michael Armbrust] more errors
      f3efa51 [Michael Armbrust] Update AnalysisException.scala
      d45ff60 [Michael Armbrust] [SQL] Initial support for reporting location of error in sql string
      104b2c45
    • Daoyuan Wang's avatar
      [SPARK-5824] [SQL] add null format in ctas and set default col comment to null · 275a0c08
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #4609 from adrian-wang/ctas and squashes the following commits:
      
      0a75d5a [Daoyuan Wang] reorder import
      93d1863 [Daoyuan Wang] add null format in ctas and set default col comment to null
      275a0c08
    • Cheng Hao's avatar
      [SQL] [Minor] Update the SpecificMutableRow.copy · cc552e04
      Cheng Hao authored
When profiling Join / Aggregate queries via VisualVM, I noticed lots of `SpecificMutableRow` objects being created, as well as `MutableValue` objects. `SpecificMutableRow` is mostly used in data source implementations, but its `copy` method can be called multiple times in upper modules (e.g. in Join / Aggregation), so creating duplicated instances should be avoided.
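The allocation pattern can be sketched in Python (a hypothetical analogue of the change: `copy()` materializes a plain immutable row instead of allocating a fresh wrapper object per column):

```python
class MutableValue:
    """Mutable per-column holder, as used inside the specific row."""
    def __init__(self, value=None):
        self.value = value

class SpecificMutableRow:
    def __init__(self, values):
        self.values = [MutableValue(v) for v in values]

    def copy(self):
        # one tuple allocation instead of N MutableValue wrappers per copy;
        # the snapshot is immutable, so later mutation cannot leak into it
        return tuple(mv.value for mv in self.values)

row = SpecificMutableRow([1, "a", 2.5])
snapshot = row.copy()
row.values[0].value = 99  # mutate the source after copying
print(snapshot)  # (1, 'a', 2.5)
```

In a hot Join/Aggregate loop that calls `copy` per row, cutting the per-copy object count is exactly the kind of win VisualVM would surface.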
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4619 from chenghao-intel/specific_mutable_row and squashes the following commits:
      
      9300d23 [Cheng Hao] update the SpecificMutableRow.copy
      cc552e04