  1. Jan 16, 2017
    • Felix Cheung's avatar
      [SPARK-18828][SPARKR] Refactor scripts for R · c84f7d3e
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Refactored the scripts to remove duplication and give each script a clearer purpose
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16249 from felixcheung/rscripts.
      c84f7d3e
    • Felix Cheung's avatar
      [SPARK-19232][SPARKR] Update Spark distribution download cache location on Windows · a115a543
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Windows seems to be the only platform with an appauthor component in the path, for which we should say "Apache" (and it is case sensitive).
      The current path of `AppData\Local\spark\spark\Cache` is a bit odd.
      
      ## How was this patch tested?
      
      manual.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16590 from felixcheung/rcachedir.
      a115a543
    • wm624@hotmail.com's avatar
      [SPARK-19066][SPARKR] SparkR LDA doesn't set optimizer correctly · 12c8c216
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      spark.lda passes the optimizer ("em" or "online") as a string to the backend. However, LDAWrapper doesn't set the optimizer based on the value from R. Therefore, for the "em" optimizer, the `isDistributed` field is FALSE, while it should be TRUE based on the Scala code.
      
      In addition, the `summary` method should bring back the results related to `DistributedLDAModel`.
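      
      As a hedged illustration (not the actual LDAWrapper change), the Scala side needs to forward the optimizer string received from R roughly like this; `fitLda` and its parameters are hypothetical names:
      
      ```scala
      import org.apache.spark.ml.clustering.{DistributedLDAModel, LDA}
      import org.apache.spark.sql.DataFrame
      
      // Sketch: pass the optimizer string from R through to the Scala LDA estimator
      // instead of silently keeping the default ("online").
      def fitLda(dataset: DataFrame, k: Int, optimizer: String): Boolean = {
        val lda = new LDA()
          .setK(k)
          .setFeaturesCol("features")
          .setOptimizer(optimizer) // "em" or "online", as passed from spark.lda
        val model = lda.fit(dataset)
        // With optimizer = "em" the fitted model is distributed, so the R-side
        // `isDistributed` field should be TRUE.
        model.isInstanceOf[DistributedLDAModel]
      }
      ```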
      
      ## How was this patch tested?
      Manual tests by comparing with the Scala example.
      Modified the current unit tests: fixed the incorrect unit test and added the necessary tests for the `summary` method.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16464 from wangmiao1981/new.
      12c8c216
    • jiangxingbo's avatar
      [SPARK-18801][SQL][FOLLOWUP] Alias the view with its child · e635cbb6
      jiangxingbo authored
      ## What changes were proposed in this pull request?
      
      This PR is a follow-up to address the comments https://github.com/apache/spark/pull/16233/files#r95669988 and https://github.com/apache/spark/pull/16233/files#r95662299.
      
      We try to wrap the child by:
      1. Generate the `queryOutput` by:
          1.1. If the query column names are defined, map the column names to attributes in the child output by name;
          1.2. Else set the child output attributes to `queryOutput`.
      2. Map the `queryOutput` to the view output by index; if the corresponding attributes don't match, try to up-cast and alias the attribute in `queryOutput` to the attribute in the view output.
      3. Add a Project over the child, with the new output generated by the previous steps (a rough sketch follows below).
      If the view output doesn't have the same number of columns as either the child output or the query column names, throw an AnalysisException.
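      
      A rough sketch of steps 2-3 above, using a plain Cast in place of the up-cast logic and illustrative names, not the exact code in this PR:
      
      ```scala
      import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, Cast, NamedExpression}
      import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
      
      // Align queryOutput with the view output by casting and aliasing attributes
      // that differ, then wrap the child in a Project with the aligned output.
      def aliasViewChild(
          viewOutput: Seq[Attribute],
          queryOutput: Seq[Attribute],
          child: LogicalPlan): LogicalPlan = {
        require(viewOutput.length == queryOutput.length,
          "the view output and the query output must have the same number of columns")
        val newOutput: Seq[NamedExpression] = viewOutput.zip(queryOutput).map {
          case (viewAttr, queryAttr) if viewAttr.dataType == queryAttr.dataType &&
              viewAttr.name == queryAttr.name =>
            queryAttr
          case (viewAttr, queryAttr) =>
            Alias(Cast(queryAttr, viewAttr.dataType), viewAttr.name)()
        }
        Project(newOutput, child)
      }
      ```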
      
      ## How was this patch tested?
      
      Add new test cases in `SQLViewSuite`.
      
      Author: jiangxingbo <jiangxb1987@gmail.com>
      
      Closes #16561 from jiangxb1987/alias-view.
      e635cbb6
    • Liang-Chi Hsieh's avatar
      [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parquet · 61e48f52
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      We have a config `spark.sql.files.ignoreCorruptFiles` which can be used to ignore corrupt files when reading files in SQL. Currently the `ignoreCorruptFiles` config has two issues and can't work for Parquet:
      
      1. We only ignore corrupt files in `FileScanRDD`. Actually, we begin to read those files as early as inferring the data schema from the files. For corrupt files, we can't read the schema and the program fails. A related issue was reported at http://apache-spark-developers-list.1001551.n3.nabble.com/Skip-Corrupted-Parquet-blocks-footer-tc20418.html
      2. In `FileScanRDD`, we assume that we only begin to read the files when starting to consume the iterator. However, the files may be read before that. In this case, the `ignoreCorruptFiles` config doesn't work either.
      
      This patch targets Parquet datasource. If this direction is ok, we can address the same issue for other datasources like Orc.
      
      Two main changes in this patch:
      
      1. Replace `ParquetFileReader.readAllFootersInParallel` by implementing the logic to read footers in a multi-threaded manner
      
          We can't ignore corrupt files if we use `ParquetFileReader.readAllFootersInParallel`. So this patch implements similar logic in `readParquetFootersInParallel`.
      
      2. In `FileScanRDD`, we need to ignore corrupt files too when we call `readFunction` to return the iterator.
      
      One thing to notice is:
      
      We read the schema from the Parquet file's footer. The method to read the footer, `ParquetFileReader.readFooter`, throws `RuntimeException` instead of `IOException` if it can't successfully read the footer. Please check out https://github.com/apache/parquet-mr/blob/df9d8e415436292ae33e1ca0b8da256640de9710/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L470. So this patch catches `RuntimeException`. One concern is that it might also shadow runtime exceptions other than those caused by reading corrupt files.
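      
      A hedged sketch of reading footers in parallel while skipping corrupt files; names and structure are illustrative, not the exact code in this patch:
      
      ```scala
      import java.io.IOException
      
      import scala.concurrent.{Await, ExecutionContext, Future}
      import scala.concurrent.duration.Duration
      
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.FileStatus
      import org.apache.parquet.format.converter.ParquetMetadataConverter
      import org.apache.parquet.hadoop.{Footer, ParquetFileReader}
      
      // Read the footer of each file on its own future; when ignoreCorruptFiles is
      // set, swallow the failure for that file instead of failing the whole job.
      def readParquetFootersInParallel(
          conf: Configuration,
          partFiles: Seq[FileStatus],
          ignoreCorruptFiles: Boolean)(implicit ec: ExecutionContext): Seq[Footer] = {
        val futures = partFiles.map { file =>
          Future {
            try {
              // readFooter throws RuntimeException (not IOException) on a corrupt footer.
              val metadata = ParquetFileReader.readFooter(conf, file, ParquetMetadataConverter.NO_FILTER)
              Some(new Footer(file.getPath, metadata))
            } catch {
              case e @ (_: RuntimeException | _: IOException) if ignoreCorruptFiles =>
                None // skip the corrupt file
            }
          }
        }
        futures.flatMap(f => Await.result(f, Duration.Inf))
      }
      ```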
      
      ## How was this patch tested?
      
      Jenkins tests.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #16474 from viirya/fix-ignorecorrupted-parquet-files.
      61e48f52
  2. Jan 15, 2017
    • gatorsmile's avatar
      [SPARK-19120] Refresh Metadata Cache After Loading Hive Tables · de62ddf7
      gatorsmile authored
      ### What changes were proposed in this pull request?
      ```Scala
              sql("CREATE TABLE tab (a STRING) STORED AS PARQUET")
      
              // This table fetch is to fill the cache with zero leaf files
              spark.table("tab").show()
      
              sql(
                s"""
                   |LOAD DATA LOCAL INPATH '$newPartitionDir' OVERWRITE
                   |INTO TABLE tab
                 """.stripMargin)
      
              spark.table("tab").show()
      ```
      
      In the above example, the returned result is empty after the table is loaded. The metadata cache can be out of date after loading new data into the table, because loading/inserting does not update the cache. So far, the metadata cache is only used for data source tables. Thus, for Hive serde tables, only the `parquet` and `orc` formats face this issue, because Hive serde tables in the parquet/orc formats can be converted to data source tables when `spark.sql.hive.convertMetastoreParquet`/`spark.sql.hive.convertMetastoreOrc` is on.
      
      This PR is to refresh the metadata cache after processing the `LOAD DATA` command.
      
      In addition, Spark SQL does not convert **partitioned** Hive tables (orc/parquet) to data source tables in the write path, but the read path uses the metadata cache for both **partitioned** and non-partitioned Hive tables (orc/parquet). That means writing partitioned parquet/orc tables still uses `InsertIntoHiveTable` instead of `InsertIntoHadoopFsRelationCommand`. To avoid reading the out-of-date cache, `InsertIntoHiveTable` needs to refresh the metadata cache for partitioned tables. Note, it does not need to refresh the cache for non-partitioned parquet/orc tables, because they do not go through `InsertIntoHiveTable` at all. Based on the comments, this PR keeps the existing logic unchanged: we always refresh the table no matter whether it is partitioned or not.
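      
      For illustration, the effect of the fix corresponds to refreshing the cached metadata right after the load, which user code can also do today via the public Catalog API (the path below is a placeholder):
      
      ```scala
      spark.sql("LOAD DATA LOCAL INPATH '/path/to/new/data' OVERWRITE INTO TABLE tab")
      // Invalidate the cached metadata/file listing for `tab` so the next read
      // sees the newly loaded files.
      spark.catalog.refreshTable("tab")
      spark.table("tab").show()
      ```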
      
      ### How was this patch tested?
      Added test cases in parquetSuites.scala
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #16500 from gatorsmile/refreshInsertIntoHiveTable.
      de62ddf7
    • uncleGen's avatar
      [SPARK-19206][DOC][DSTREAM] Fix outdated parameter descriptions in kafka010 · a5e651f4
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      Fix outdated parameter descriptions in kafka010
      
      ## How was this patch tested?
      
      cc koeninger  zsxwing
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #16569 from uncleGen/SPARK-19206.
      a5e651f4
    • Shixiong Zhu's avatar
      [SPARK-18971][CORE] Upgrade Netty to 4.0.43.Final · a8567e34
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Upgrade Netty to `4.0.43.Final` to add the fix for https://github.com/netty/netty/issues/6153
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16568 from zsxwing/SPARK-18971.
      a8567e34
    • Maurus Cuelenaere's avatar
      [MINOR][DOC] Document local[*,F] master modes · 3df2d931
      Maurus Cuelenaere authored
      ## What changes were proposed in this pull request?
      
      core/src/main/scala/org/apache/spark/SparkContext.scala contains the LOCAL_N_FAILURES_REGEX master mode, but it was never documented, so document it.
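      
      For reference, a minimal example of the `local[*,F]` master mode being documented, where F is the number of allowed task failures (the values below are arbitrary):
      
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      
      // local[*,3]: use as many worker threads as logical cores and set maxFailures to 3.
      val conf = new SparkConf().setAppName("local-master-example").setMaster("local[*,3]")
      val sc = new SparkContext(conf)
      ```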
      
      ## How was this patch tested?
      
      By using the Github Markdown preview feature.
      
      Author: Maurus Cuelenaere <mcuelenaere@gmail.com>
      
      Closes #16562 from mcuelenaere/patch-1.
      3df2d931
    • xiaojian.fxj's avatar
      [SPARK-19042] spark executor can't download the jars when uber jar's http url... · c9d612f8
      xiaojian.fxj authored
      [SPARK-19042] spark executor can't download the jars when uber jar's http url contains any query strings
      
      If the uber jar's HTTP URL contains any query string, the Executor.updateDependencies method can't download the jars correctly. This is because `localName = name.split("/").last` won't get the expected jar file name from the URL. The bug fix is the same as [SPARK-17855]
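      
      A small illustration of the problem and the SPARK-17855-style fix (the URL below is hypothetical):
      
      ```scala
      import java.net.URI
      
      val name = "http://repo.example.com/jars/uber.jar?token=abc123"
      
      val buggyLocalName = name.split("/").last                   // "uber.jar?token=abc123"
      val fixedLocalName = new URI(name).getPath.split("/").last  // "uber.jar"
      ```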
      
      Author: xiaojian.fxj <xiaojian.fxj@alibaba-inc.com>
      
      Closes #16509 from hustfxj/bug.
      c9d612f8
    • Tsuyoshi Ozawa's avatar
      [SPARK-19207][SQL] LocalSparkSession should use Slf4JLoggerFactory.INSTANCE · 9112f31b
      Tsuyoshi Ozawa authored
      ## What changes were proposed in this pull request?
      
      Use Slf4JLoggerFactory.INSTANCE instead of creating a Slf4JLoggerFactory object with its constructor, which is deprecated.
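      
      The gist of the change, sketched (Netty's `Slf4JLoggerFactory.INSTANCE` singleton replaces the deprecated constructor):
      
      ```scala
      import io.netty.util.internal.logging.{InternalLoggerFactory, Slf4JLoggerFactory}
      
      // Before (deprecated constructor):
      //   InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory())
      // After (shared singleton):
      InternalLoggerFactory.setDefaultFactory(Slf4JLoggerFactory.INSTANCE)
      ```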
      
      ## How was this patch tested?
      
      By running StateStoreRDDSuite.
      
      Author: Tsuyoshi Ozawa <ozawa@apache.org>
      
      Closes #16570 from oza/SPARK-19207.
      9112f31b
  3. Jan 14, 2017
    • windpiger's avatar
      [SPARK-19151][SQL] DataFrameWriter.saveAsTable support hive overwrite · 89423539
      windpiger authored
      ## What changes were proposed in this pull request?
      
      After [SPARK-19107](https://issues.apache.org/jira/browse/SPARK-19107), we can now treat Hive as a data source and create Hive tables with DataFrameWriter and Catalog. However, the support is not complete; there are still some cases we do not support.
      
      This PR implements:
      DataFrameWriter.saveAsTable working with the Hive format in overwrite mode
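      
      A hedged example of what this enables (the table name is illustrative; assumes the `hive` format source introduced by SPARK-19107):
      
      ```scala
      import org.apache.spark.sql.{SaveMode, SparkSession}
      
      val spark = SparkSession.builder()
        .master("local[*]")
        .enableHiveSupport()
        .getOrCreate()
      
      // Write (and overwrite on re-run) a Hive-format table via DataFrameWriter.
      spark.range(10).toDF("id")
        .write
        .format("hive")
        .mode(SaveMode.Overwrite)
        .saveAsTable("default.hive_overwrite_example")
      ```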
      
      ## How was this patch tested?
      unit test added
      
      Author: windpiger <songjun@outlook.com>
      
      Closes #16549 from windpiger/saveAsTableWithHiveOverwrite.
      89423539
    • hyukjinkwon's avatar
      [SPARK-19221][PROJECT INFRA][R] Add winutils binaries to the path in AppVeyor... · b6a7aa4f
      hyukjinkwon authored
      [SPARK-19221][PROJECT INFRA][R] Add winutils binaries to the path in AppVeyor tests for Hadoop libraries to call native codes properly
      
      ## What changes were proposed in this pull request?
      
      It seems the Hadoop libraries need the winutils binaries on the path to use native libraries properly.
      
      It is not a problem in tests for now because we are only testing SparkR on Windows via AppVeyor, but it can be a problem if we run Scala tests via AppVeyor, as below:
      
      ```
       - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (3 seconds, 937 milliseconds)
         org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:609)
         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
         ...
      ```
      
      This PR proposes to add it to the `Path` for AppVeyor tests.
      
      ## How was this patch tested?
      
      Manually via AppVeyor.
      
      **Before**
      https://ci.appveyor.com/project/spark-test/spark/build/549-windows-complete/job/gc8a1pjua2bc4i8m
      
      **After**
      https://ci.appveyor.com/project/spark-test/spark/build/572-windows-complete/job/c4vrysr5uvj2hgu7
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16584 from HyukjinKwon/set-path-appveyor.
      b6a7aa4f
  4. Jan 13, 2017
    • Yucai Yu's avatar
      [SPARK-19180] [SQL] the offset of short should be 2 in OffHeapColumn · ad0dadaa
      Yucai Yu authored
      ## What changes were proposed in this pull request?
      
      The offset of a short is 4 in OffHeapColumnVector's putShorts, but it should actually be 2.
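      
      A hedged sketch mirroring the corrected `putShorts` (simplified, not the exact code): a short is 2 bytes, so the source offset must advance by `srcIndex * 2`.
      
      ```scala
      import org.apache.spark.unsafe.Platform
      
      // Copy `count` shorts from src (starting at srcIndex) into off-heap memory.
      def putShorts(data: Long, rowId: Int, count: Int, src: Array[Short], srcIndex: Int): Unit = {
        Platform.copyMemory(
          src,
          Platform.SHORT_ARRAY_OFFSET + srcIndex * 2L, // was `srcIndex * 4` before the fix
          null,
          data + 2L * rowId,
          count * 2L)
      }
      ```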
      
      ## How was this patch tested?
      
      unit test
      
      Author: Yucai Yu <yucai.yu@intel.com>
      
      Closes #16555 from yucai/offheap_short.
      ad0dadaa
    • Felix Cheung's avatar
      [SPARK-18335][SPARKR] createDataFrame to support numPartitions parameter · b0e8eb6d
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      To allow specifying the number of partitions when the DataFrame is created
      
      ## How was this patch tested?
      
      manual, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16512 from felixcheung/rnumpart.
      b0e8eb6d
    • Vinayak's avatar
      [SPARK-18687][PYSPARK][SQL] Backward compatibility - creating a Dataframe on a... · 285a7798
      Vinayak authored
      [SPARK-18687][PYSPARK][SQL] Backward compatibility - creating a Dataframe on a new SQLContext object fails with a Derby error
      
      The change is for SQLContext to reuse the active SparkSession during construction if the SparkContext supplied is the same as the currently active SparkContext. Without this change, a new SparkSession is instantiated, which results in a Derby error when attempting to create a DataFrame using a new SQLContext object, even though the SparkContext supplied to the new SQLContext is the same as the currently active one. Refer to https://issues.apache.org/jira/browse/SPARK-18687 for details on the error and a repro.
      
      Existing unit tests and a new unit test added to pyspark-sql:
      
      /python/run-tests --python-executables=python --modules=pyspark-sql
      
      Author: Vinayak <vijoshi5@in.ibm.com>
      Author: Vinayak Joshi <vijoshi@users.noreply.github.com>
      
      Closes #16119 from vijoshi/SPARK-18687_master.
      285a7798
    • Andrew Ash's avatar
      Fix missing close-parens for In filter's toString · b040cef2
      Andrew Ash authored
      Otherwise the opening parenthesis isn't closed in query plan descriptions of batch scans.
      
          PushedFilters: [In(COL_A, [1,2,4,6,10,16,219,815], IsNotNull(COL_B), ...
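      
      A simplified, hedged mirror of the fix to `org.apache.spark.sql.sources.In`:
      
      ```scala
      // The trailing ')' was missing from the string form before this change.
      case class In(attribute: String, values: Array[Any]) {
        override def toString: String =
          s"In($attribute, [${values.mkString(",")}])"
      }
      ```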
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #16558 from ash211/patch-9.
      b040cef2
    • Wenchen Fan's avatar
      [SPARK-19178][SQL] convert string of large numbers to int should return null · 6b34e745
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      When we convert a string to an integral type, we first convert that string to `decimal(20, 0)`, so that we can turn a string in decimal format into a truncated integral, e.g. `CAST('1.2' AS int)` will return `1`.
      
      However, this brings problems when we convert a string with large numbers to integral, e.g. `CAST('1234567890123' AS int)` will return `1912276171`, while Hive returns null as we expected.
      
      This is a long-standing bug (it seems to have been there since the first day Spark SQL was created); this PR fixes it by adding native support for converting `UTF8String` to integral types.
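      
      For illustration, the behavior after this fix (run in a local session):
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession.builder().master("local[*]").appName("cast-example").getOrCreate()
      
      // A decimal-format string is still truncated to an int.
      spark.sql("SELECT CAST('1.2' AS int)").show()            // 1
      // An out-of-range string now returns null (previously 1912276171).
      spark.sql("SELECT CAST('1234567890123' AS int)").show()  // null
      ```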
      
      ## How was this patch tested?
      
      new regression tests
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #16550 from cloud-fan/string-to-int.
      6b34e745
    • wm624@hotmail.com's avatar
      [SPARK-19142][SPARKR] spark.kmeans should take seed, initSteps, and tol as parameters · 7f24a0b6
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      spark.kmeans doesn't have an interface to set initSteps, seed, and tol. As Spark's k-means algorithm doesn't take the same set of parameters as R's kmeans, we should maintain a different interface in spark.kmeans.
      
      Add the missing parameters and the corresponding documentation.
      
      Modified existing unit tests to take additional parameters.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16523 from wangmiao1981/kmeans.
      7f24a0b6
  5. Jan 12, 2017
  6. Jan 11, 2017
    • hyukjinkwon's avatar
      [SPARK-16848][SQL] Check schema validation for user-specified schema in jdbc and table APIs · 24100f16
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to throw an exception for both the jdbc and table APIs when a user-specified schema is not allowed or is useless.
      
      **DataFrameReader.jdbc(...)**
      
      ``` scala
      spark.read.schema(StructType(Nil)).jdbc(...)
      ```
      
      **DataFrameReader.table(...)**
      
      ```scala
      spark.read.schema(StructType(Nil)).table("usrdb.test")
      ```
      
      ## How was this patch tested?
      
      Unit test in `JDBCSuite` and `DataFrameReaderWriterSuite`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #14451 from HyukjinKwon/SPARK-16848.
      24100f16
    • wangzhenhua's avatar
      [SPARK-19132][SQL] Add test cases for row size estimation and aggregate estimation · 43fa21b3
      wangzhenhua authored
      ## What changes were proposed in this pull request?
      
      In this PR, we add more test cases for project and aggregate estimation.
      
      ## How was this patch tested?
      
      Add test cases.
      
      Author: wangzhenhua <wangzhenhua@huawei.com>
      
      Closes #16551 from wzhfy/addTests.
      43fa21b3
    • Reynold Xin's avatar
      [SPARK-19149][SQL] Follow-up: simplify cache implementation. · 66fe819a
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch slightly simplifies the logical plan statistics cache implementation, as discussed in https://github.com/apache/spark/pull/16529
      
      ## How was this patch tested?
      N/A - this has no behavior change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #16544 from rxin/SPARK-19149.
      66fe819a
    • jiangxingbo's avatar
      [SPARK-18801][SQL] Support resolve a nested view · 30a07071
      jiangxingbo authored
      ## What changes were proposed in this pull request?
      
      We should be able to resolve a nested view. The main advantage is that if you update an underlying view, the current view also gets updated.
      The new approach should be compatible with older versions of Spark/Hive, which means:
      1. The new approach should be able to resolve the views created by older versions of Spark/Hive;
      2. The new approach should be able to resolve the views that are currently supported by Spark SQL.
      
      The new approach mainly brings in the following changes:
      1. Add a new operator called `View` to keep track of the CatalogTable that describes the view, and the output attributes as well as the child of the view;
      2. Update the `ResolveRelations` rule to resolve the relations and views, note that a nested view should be resolved correctly;
      3. Add `viewDefaultDatabase` variable to `CatalogTable` to keep track of the default database name used to resolve a view, if the `CatalogTable` is not a view, then the variable should be `None`;
      4. Add `AnalysisContext` to enable us to still support a view created with CTE/Windows query;
      5. Enable view support without enabling Hive support (i.e., enableHiveSupport);
      6. Fix a weird behavior: the result of a view query may have a different schema if the referenced table has been changed. After this PR, we try to cast the child output attributes to those from the view schema, and throw an AnalysisException if the cast is not allowed.
      
      Note this is compatible with the views defined by older versions of Spark (before 2.2), which have an empty `defaultDatabase` and all the relations in `viewText` have the database part defined.
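      
      A hedged sketch of the `View` operator described in item 1 above; the real operator carries more overrides, but the shape is the view's catalog metadata, its output attributes, and the child plan:
      
      ```scala
      import org.apache.spark.sql.catalyst.catalog.CatalogTable
      import org.apache.spark.sql.catalyst.expressions.Attribute
      import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
      
      // desc:   the CatalogTable that describes the view
      // output: the view's output attributes
      // child:  the resolved plan of the view definition
      case class View(
          desc: CatalogTable,
          output: Seq[Attribute],
          child: LogicalPlan) extends LogicalPlan {
        override def children: Seq[LogicalPlan] = child :: Nil
      }
      ```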
      
      ## How was this patch tested?
      1. Add new tests in `SessionCatalogSuite` to test the function `lookupRelation`;
      2. Add new test case in `SQLViewSuite` to test resolve a nested view.
      
      Author: jiangxingbo <jiangxb1987@gmail.com>
      
      Closes #16233 from jiangxb1987/resolve-view.
      30a07071
    • Bryan Cutler's avatar
      [SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings... · 3bc2eff8
      Bryan Cutler authored
      [SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings used to resolve packages/artifacts
      
      ## What changes were proposed in this pull request?
      
      Adding an option to spark-submit to allow overriding the default IvySettings used to resolve artifacts as part of the Spark Packages functionality.  This will allow all artifact resolution to go through a centrally managed repository, such as Nexus or Artifactory, where site admins can better approve and control what is used with Spark apps.
      
      This change restructures the creation of the IvySettings object in two distinct ways.  First, if the `spark.ivy.settings` option is not defined then `buildIvySettings` will create a default settings instance, as before, with defined repositories (Maven Central) included.  Second, if the option is defined, the ivy settings file will be loaded from the given path and only repositories defined within will be used for artifact resolution.
      
      ## How was this patch tested?
      
      Existing tests for default behaviour, and manual tests that load an ivysettings.xml file with local and Nexus repositories defined.  Added a new test to load a simple Ivy settings file with a local filesystem resolver.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      Author: Ian Hummel <ian@themodernlife.net>
      
      Closes #15119 from BryanCutler/spark-custom-IvySettings.
      3bc2eff8
    • Felix Cheung's avatar
      [SPARK-19130][SPARKR] Support setting literal value as column implicitly · d749c066
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      ```
      df$foo <- 1
      ```
      
      instead of
      ```
      df$foo <- lit(1)
      ```
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16510 from felixcheung/rlitcol.
      d749c066
    • jerryshao's avatar
      [SPARK-19021][YARN] Generailize HDFSCredentialProvider to support non HDFS security filesystems · 4239a108
      jerryshao authored
      Currently Spark can only get the token renewal interval from secure HDFS (hdfs://). If Spark runs with other secure file systems like webHDFS (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not get renewal intervals from them, which makes Spark unable to work with these secure clusters. So instead of only checking the HDFS token, we should generalize to support different DelegationTokenIdentifier types.
      
      ## How was this patch tested?
      
      Manually verified in security cluster.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #16432 from jerryshao/SPARK-19021.
      4239a108
    • wangzhenhua's avatar
      [SPARK-19149][SQL] Unify two sets of statistics in LogicalPlan · a6155135
      wangzhenhua authored
      ## What changes were proposed in this pull request?
      
      Currently we have two sets of statistics in LogicalPlan: simple stats and stats estimated by the CBO. The computing logic and naming are quite confusing, so we need to unify these two sets of stats.
      
      ## How was this patch tested?
      
      Just modify existing tests.
      
      Author: wangzhenhua <wangzhenhua@huawei.com>
      Author: Zhenhua Wang <wzh_zju@163.com>
      
      Closes #16529 from wzhfy/unifyStats.
      a6155135
  7. Jan 10, 2017
    • Wenchen Fan's avatar
      [SPARK-19157][SQL] should be able to change spark.sql.runSQLOnFiles at runtime · 3b19c74e
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      The analyzer rule that supports querying files directly is added to `Analyzer.extendedResolutionRules` when the SparkSession is created, according to the `spark.sql.runSQLOnFiles` flag. If the flag is off when we create the `SparkSession`, this rule is not added and we cannot query files directly even if we turn on the flag later.
      
      This PR fixes this bug by always adding that rule to `Analyzer.extendedResolutionRules`.
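      
      A hedged illustration of the scenario this fixes (the file path is a placeholder): the flag is off when the session is created and is turned on later at runtime.
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession.builder()
        .master("local[*]")
        .config("spark.sql.runSQLOnFiles", "false") // flag off at session creation
        .getOrCreate()
      
      spark.conf.set("spark.sql.runSQLOnFiles", "true")
      // Before this fix the rule was never registered, so this would still fail;
      // after the fix, querying a file directly works once the flag is on.
      spark.sql("SELECT * FROM parquet.`/tmp/example_table`").show()
      ```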
      
      ## How was this patch tested?
      
      new regression test
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #16531 from cloud-fan/sql-on-files.
      3b19c74e
    • Shixiong Zhu's avatar
      [SPARK-19140][SS] Allow update mode for non-aggregation streaming queries · bc6c56e9
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR allows update mode for non-aggregation streaming queries. It will be the same as append mode if a query has no aggregations.
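      
      A hedged example of a non-aggregation streaming query run in update mode, which this change permits (the socket source and console sink are chosen just for illustration):
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession.builder().master("local[*]").appName("update-mode-example").getOrCreate()
      
      val lines = spark.readStream
        .format("socket")
        .option("host", "localhost")
        .option("port", 9999)
        .load()
      
      val query = lines.writeStream
        .outputMode("update") // behaves like append when the query has no aggregation
        .format("console")
        .start()
      
      query.awaitTermination()
      ```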
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16520 from zsxwing/update-without-agg.
      bc6c56e9
    • Sean Owen's avatar
      [SPARK-18997][CORE] Recommended upgrade libthrift to 0.9.3 · 856bae6a
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Updates to libthrift 0.9.3 to address a CVE.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16530 from srowen/SPARK-18997.
      856bae6a
    • Felix Cheung's avatar
      [SPARK-19133][SPARKR][ML] fix glm for Gamma, clarify glm family supported · 9bc3507e
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      R's `family` supports a longer list than what Spark supports.
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16511 from felixcheung/rdocglmfamily.
      9bc3507e