  1. Mar 01, 2016
  2. Feb 29, 2016
    • [SPARK-13478][YARN] Use real user when fetching delegation tokens. · c7fccb56
      Marcelo Vanzin authored
      The Hive client library is not smart enough to notice that the current
      user is a proxy user; so when using a proxy user, it fails to fetch
      delegation tokens from the metastore because of a missing Kerberos
      TGT for the current user.
      
      To fix it, just run the code that fetches the delegation token as the
      real logged in user.
      
      Tested on a Kerberos cluster both submitting normally and with a proxy
      user; Hive and HBase tokens are retrieved correctly in both cases.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11358 from vanzin/SPARK-13478.
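      The fix follows the standard Hadoop UGI impersonation pattern: token-fetching code must run as the real user behind the proxy. A minimal sketch of that pattern (not the actual patch; `doAsRealUser` is an illustrative name):
      
      ```
      import java.security.PrivilegedExceptionAction
      import org.apache.hadoop.security.UserGroupInformation
      
      // Run `fn` as the real logged-in user; for a proxy user, getRealUser
      // returns the underlying user that actually holds the Kerberos TGT.
      def doAsRealUser[T](fn: => T): T = {
        val currentUser = UserGroupInformation.getCurrentUser
        val realUser = Option(currentUser.getRealUser).getOrElse(currentUser)
        realUser.doAs(new PrivilegedExceptionAction[T] {
          override def run(): T = fn
        })
      }
      ```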
    • [SPARK-13123][SQL] Implement whole stage codegen for sort · 4bd697da
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      This PR adds support for whole stage codegen for sort. It builds heavily on nongli's PR: https://github.com/apache/spark/pull/11008 (which actually implements the feature) and adds the following changes on top:
      
      - [x]  Generated code updates peak execution memory metrics
      - [x]  Unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`
      
      ## How was this patch tested?
      
      New unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`. Further, all existing sort tests should pass.
      
      Author: Sameer Agarwal <sameer@databricks.com>
      Author: Nong Li <nong@databricks.com>
      
      Closes #11359 from sameeragarwal/sort-codegen.
    • [SPARK-13522][CORE] Fix the exit log place for heartbeat · 644dbb64
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Fixes the placement of the exit log message introduced by #11401.
      
      ## How was this patch tested?
      
      unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11432 from zsxwing/SPARK-13522-follow-up.
    • [SPARK-13522][CORE] Executor should kill itself when it's unable to heartbeat to driver more than N times · 17a253cb
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      Sometimes the network disconnection event is not triggered, due to other potential race conditions that we may not have thought of, and then the executor keeps sending heartbeats to the driver and never exits.
      
      This PR adds a new configuration `spark.executor.heartbeat.maxFailures` to kill Executor when it's unable to heartbeat to the driver more than `spark.executor.heartbeat.maxFailures` times.
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11401 from zsxwing/SPARK-13522.
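      A self-contained sketch of the fail-counter pattern this describes (illustrative names, not the actual executor code): any successful heartbeat resets the counter, and crossing the threshold kills the process.
      
      ```
      import scala.util.control.NonFatal
      
      // `sendHeartbeat` stands in for the executor -> driver RPC.
      class HeartbeatLoop(maxFailures: Int)(sendHeartbeat: () => Unit) {
        private var failures = 0
      
        def beat(): Unit = {
          try {
            sendHeartbeat()
            failures = 0                      // a success resets the counter
          } catch {
            case NonFatal(e) =>
              failures += 1
              if (failures >= maxFailures) {
                // the real executor logs an error and calls System.exit here
                sys.error(s"Unable to heartbeat to driver $failures times; exiting")
              }
          }
        }
      }
      ```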
    • [SPARK-13544][SQL] Rewrite/Propagate Constraints for Aliases in Aggregate · bc65f60e
      gatorsmile authored
      #### What changes were proposed in this pull request?
      
      After analysis by the Analyzer, two operators can carry aliases: `Project` and `Aggregate`. So far, we only rewrite and propagate constraints when an `Alias` is defined in `Project`. This PR resolves the same issue for `Aggregate`.
      
      #### How was this patch tested?
      
      Added a test case for `Aggregate` in `ConstraintPropagationSuite`.
      
      marmbrus sameeragarwal
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #11422 from gatorsmile/validConstraintsInUnaryNodes.
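      A conceptual sketch of what alias rewriting buys here, written as comments (the query is illustrative, not from the patch):
      
      ```
      // SQL rendering of the case this PR covers:
      //   SELECT a AS a2, COUNT(b) AS cnt FROM t WHERE a > 10 GROUP BY a
      // Before this PR, only Project rewrote constraints through aliases, so the
      // Aggregate's output dropped "a > 10". Now the constraint is rewritten as
      // "a2 > 10" on the Aggregate's output, letting downstream filters be
      // inferred or pruned.
      ```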
    • [SPARK-13509][SPARK-13507][SQL] Support for writing CSV with a single function call · 02aa499d
      hyukjinkwon authored
      https://issues.apache.org/jira/browse/SPARK-13507
      https://issues.apache.org/jira/browse/SPARK-13509
      
      ## What changes were proposed in this pull request?
      This PR adds support for writing CSV data directly to a given path with a single function call.
      
      Several unit tests were added for each piece of functionality.
      ## How was this patch tested?
      
      This was tested with unit tests and with `dev/run_tests` for coding style.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Hyukjin Kwon <gurwls223@gmail.com>
      
      Closes #11389 from HyukjinKwon/SPARK-13507-13509.
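      A usage sketch of the single-call write (assuming `df` is an existing DataFrame; the output path is illustrative):
      
      ```
      // the new one-liner
      df.write.csv("/tmp/csv-out")
      
      // equivalent long form through the datasource name
      df.write.format("csv").save("/tmp/csv-out")
      ```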
    • [SPARK-13540][SQL] Supports using nested classes within Scala objects as Dataset element type · 916fc34f
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      Nested classes defined within Scala objects are translated into Java static nested classes. Unlike inner classes, they don't need outer scopes. But the analyzer still thinks that an outer scope is required.
      
      This PR fixes this issue simply by checking whether a nested class is static before looking up its outer scope.
      
      ## How was this patch tested?
      
      A test case is added to `DatasetSuite`. It checks contents of a Dataset whose element type is a nested class declared in a Scala object.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #11421 from liancheng/spark-13540-object-as-outer-scope.
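      The shape of the fixed case, mirroring the new `DatasetSuite` test (a sketch assuming a `sqlContext` with its implicits in scope):
      
      ```
      object OuterObject {
        case class InnerClass(a: String)   // compiles to a static nested class
      }
      
      import sqlContext.implicits._
      // Before the fix the analyzer demanded an outer scope here; now it checks
      // that the class is static and skips the outer-scope lookup.
      val ds = Seq(OuterObject.InnerClass("foo")).toDS()
      ```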
    • [SPARK-13506][MLLIB] Fix the wrong parameter in R code comment in AssociationRulesSuite · ac5c6352
      Zheng RuiFeng authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13506
      
      ## What changes were proposed in this pull request?
      
      Just changes the R snippet comment in `AssociationRulesSuite`.
      
      ## How was this patch tested?
      
      Unit tests passed.
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #11387 from zhengruifeng/ars.
    • [SPARK-13481] Desc order of appID by default for history server page. · 2f91f5ac
      zhuol authored
      ## What changes were proposed in this pull request?
      
      By default, the page currently sorts in ascending order of appId. Descending order is the better default, since it shows the latest applications at the top.
      
      ## How was this patch tested?
      
      Manually tested. See the screenshot below:
      
      ![desc-sort](https://cloud.githubusercontent.com/assets/11683054/13307473/102f4cf8-db31-11e5-8dd5-391edbf32f0d.png)
      
      Author: zhuol <zhuol@yahoo-inc.com>
      
      Closes #11357 from zhuoliu/13481.
    • [SPARK-12633][PYSPARK][DOC] PySpark regression parameter desc to consistent format · 236e3c8f
      vijaykiran authored
      Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the regression module.  Also, updated 2 params in classification to read as `Supported values:` to be consistent.
      
      closes #10600
      
      Author: vijaykiran <mail@vijaykiran.com>
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #11404 from BryanCutler/param-desc-consistent-regression-SPARK-12633.
    • [SPARK-12994][CORE] It is not necessary to create ExecutorAllocationManager in local mode · 99fe8993
      Jeff Zhang authored
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10914 from zjffdu/SPARK-12994.
    • [SPARK-13545][MLLIB][PYSPARK] Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and Python · d81a7135
      Yanbo Liang authored
      
      ## What changes were proposed in this pull request?
      * The default value of `regParam` for PySpark MLlib `LogisticRegressionWithLBFGS` should be consistent with Scala, which is `0.0`. (This is also consistent with ML `LogisticRegression`.)
      * BTW, if we use a known updater (L1 or L2) for binary classification, `LogisticRegressionWithLBFGS` will call the ML implementation. We should update the API doc to clarify that `numCorrections` has no effect if we fall into that route.
      * Made a pass over all parameters of `LogisticRegressionWithLBFGS`; the others are set properly.
      
      cc mengxr dbtsai
      ## How was this patch tested?
      No new tests, it should pass all current tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #11424 from yanboliang/spark-13545.
    • [SPARK-13309][SQL] Fix type inference issue with CSV data · dd3b5455
      Rahul Tanwani authored
      Fix type inference issue for sparse CSV data - https://issues.apache.org/jira/browse/SPARK-13309
      
      Author: Rahul Tanwani <rahul@Rahuls-MacBook-Pro.local>
      
      Closes #11194 from tanwanirahul/master.
  3. Feb 28, 2016
    • [SPARK-13537][SQL] Fix readBytes in VectorizedPlainValuesReader · 6dfc4a76
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13537
      
      ## What changes were proposed in this pull request?
      
      In `readBytes` of `VectorizedPlainValuesReader`, we use `buffer[offset]` to access bytes in the buffer. This is incorrect because `Platform.BYTE_ARRAY_OFFSET` is added to `offset` at initialization, so indexing the array with it double-counts the base offset. We should fix it.
      
      ## How was this patch tested?
      
      `ParquetHadoopFsRelationSuite` sometimes (depending on the randomly generated data) [fails](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52136/consoleFull) because of this bug. After applying this fix, the test passes.
      
      I added a test to `ParquetHadoopFsRelationSuite` with data that fails without this patch.
      
      The error exception:
      
          [info] ParquetHadoopFsRelationSuite:
          [info] - test all data types - StringType (440 milliseconds)
          [info] - test all data types - BinaryType (434 milliseconds)
          [info] - test all data types - BooleanType (406 milliseconds)
          20:59:38.618 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 2597.0 (TID 67966)
          java.lang.ArrayIndexOutOfBoundsException: 46
      	at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readBytes(VectorizedPlainValuesReader.java:88)
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11418 from viirya/fix-readbytes.
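      A self-contained illustration of the bug class (not Spark's actual code): once an offset already includes a base such as `Platform.BYTE_ARRAY_OFFSET`, plain array indexing double-counts it.
      
      ```
      val BASE = 16                         // stands in for Platform.BYTE_ARRAY_OFFSET
      val buffer = Array.tabulate[Byte](64)(_.toByte)
      val offset = BASE + 5                 // absolute offset, as the reader keeps it
      
      val wrong = buffer(offset)            // reads element 21, not element 5
      val right = buffer(offset - BASE)     // strip the base before array indexing
      assert(right == 5.toByte)
      ```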
    • [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top-level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
  4. Feb 27, 2016
    • [SPARK-13526][SQL] Move SQLContext per-session states to new class · cca79fad
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This creates a `SessionState`, which groups a few fields that existed in `SQLContext`. Because `HiveContext` extends `SQLContext` we also need to make changes there. This is mainly a cleanup task that will soon pave the way for merging the two contexts.
      
      ## How was this patch tested?
      
      Existing unit tests; this patch introduces no change in behavior.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #11405 from andrewor14/refactor-session.
    • Closes #11413 · 4c5e968d
      Reynold Xin authored
    • [SPARK-13533][SQL] Fix readBytes in VectorizedPlainValuesReader · d780ed8b
      Nong Li authored
      ## What changes were proposed in this pull request?
      
      Fix readBytes in VectorizedPlainValuesReader. This fixes a copy and paste issue.
      
      ## How was this patch tested?
      
      Ran ParquetHadoopFsRelationSuite which failed before this.
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #11414 from nongli/spark-13533.
    • [SPARK-13530][SQL] Add ShortType support to UnsafeRowParquetRecordReader · 3814d0bc
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13530
      
      ## What changes were proposed in this pull request?
      
      With the vectorized parquet scanner enabled by default, the unit test `ParquetHadoopFsRelationSuite` (based on `HadoopFsRelationTest`) fails due to the lack of short type support in `UnsafeRowParquetRecordReader`. This PR fixes that.
      
      The error exception:
      
          [info] ParquetHadoopFsRelationSuite:
          [info] - test all data types - StringType (499 milliseconds)
          [info] - test all data types - BinaryType (447 milliseconds)
          [info] - test all data types - BooleanType (520 milliseconds)
          [info] - test all data types - ByteType (418 milliseconds)
          00:22:58.920 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 124.0 (TID 1949)
          org.apache.commons.lang.NotImplementedException: Unimplemented type: ShortType
      	at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.readIntBatch(UnsafeRowParquetRecordReader.java:769)
      	at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.readBatch(UnsafeRowParquetRecordReader.java:640)
      	at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.access$000(UnsafeRowParquetRecordReader.java:461)
      	at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader.nextBatch(UnsafeRowParquetRecordReader.java:224)

## How was this patch tested?
      
      The unit test `ParquetHadoopFsRelationSuite` (based on `HadoopFsRelationTest`) [fails](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52110/consoleFull) due to the lack of short type support in `UnsafeRowParquetRecordReader`. With this support added, the test passes.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11412 from viirya/add-shorttype-support.
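      A sketch of the missing piece under stated assumptions: Parquet stores shorts as INT32, so the int batch path can narrow values. `readInt` and `putShort` stand in for the real value reader and column-vector APIs.
      
      ```
      def readShortBatch(readInt: () => Int, putShort: (Int, Short) => Unit,
                         rowId: Int, num: Int): Unit = {
        var i = 0
        while (i < num) {
          putShort(rowId + i, readInt().toShort)   // narrow INT32 -> short
          i += 1
        }
      }
      ```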
    • [SPARK-7483][MLLIB] Upgrade Chill to 0.7.2 to support Kryo with FPGrowth · ec0cc75e
      mark800 authored
      Chill 0.7.2 registers more Scala classes, including `ListBuffer`, to support Kryo with FPGrowth.
      
      See https://github.com/twitter/chill/releases for Chill's change log.
      
      Author: mark800 <yky800@126.com>
      
      Closes #11041 from mark800/master.
    • [SPARK-13518][SQL] Enable vectorized parquet scanner by default · 7a0cb4e5
      Nong Li authored
      ## What changes were proposed in this pull request?
      
      Change the default of the flag to enable this feature now that the implementation is complete.
      
      ## How was this patch tested?
      
      The new parquet reader should be a drop-in replacement, so it will be exercised by the existing tests.
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #11397 from nongli/spark-13518.
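      The patch only flips a default, so the reader stays toggleable through its SQL conf flag. The key below is the one later Spark releases use; that it is the exact key at this commit is an assumption.
      
      ```
      // opt out of the new default if needed
      sqlContext.setConf("spark.sql.parquet.enableVectorizedReader", "false")
      ```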
    • [SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts · 59e3e10b
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      We provide a very limited set of cluster management scripts in Spark for Tachyon, although Tachyon itself provides a much better version of them. Given that Spark users can now simply use Tachyon as a normal file system without extensive configuration, we can remove these management capabilities to simplify Spark's bash scripts.
      
      Note that this also reduces coupling between a 3rd-party external system and Spark's release scripts, and eliminates the possibility of failures such as Tachyon being renamed or the tarballs being relocated.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11400 from rxin/release-script.
  5. Feb 26, 2016
    • [SPARK-13474][PROJECT INFRA] Update packaging scripts to push artifacts to home.apache.org · f77dc4e1
      Josh Rosen authored
      Due to the people.apache.org -> home.apache.org migration, we need to update our packaging scripts to publish artifacts to the new server. Because the new server only supports sftp instead of ssh, we need to update the scripts to use lftp instead of ssh + rsync.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11350 from JoshRosen/update-release-scripts-for-apache-home.
    • [SPARK-13519][CORE] Driver should tell Executor to stop itself when cleaning executor's state · ad615291
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      When the driver removes an executor's state, the connection between the driver and the executor may still be alive, so the executor cannot exit automatically (e.g., the Master sends RemoveExecutor when a worker is lost but the executor is still alive); therefore the driver should try to tell the executor to stop itself. Otherwise, we will leak an executor.
      
      This PR modified the driver to send `StopExecutor` to the executor when it's removed.
      
      ## How was this patch tested?
      
      Manual test: increased the worker heartbeat interval to force it to always time out and verified that the leaked executors are gone.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11399 from zsxwing/SPARK-13519.
    • [SPARK-13505][ML] Add Python API for MaxAbsScaler · 1e5fcdf9
      zlpmichelle authored
      ## What changes were proposed in this pull request?
      After SPARK-13028, we should add a Python API for MaxAbsScaler.
      
      ## How was this patch tested?
      unit test
      
      Author: zlpmichelle <zlpmichelle@gmail.com>
      
      Closes #11393 from zlpmichelle/master.
    • [SPARK-13465] Add a task failure listener to TaskContext · 391755dc
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      
      TaskContext supports a task completion callback, which is called regardless of whether the task failed. However, there is no way for that listener to know if there was an error. This patch adds a new listener that is called when a task fails.
      
      ## How was this patch tested?
      New unit test case and integration test case covering the code path
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11340 from rxin/SPARK-13465.
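      A usage sketch of the new hook inside a task body (`rdd`, `cleanup`, and `process` are illustrative names, not part of the patch):
      
      ```
      import org.apache.spark.TaskContext
      import org.apache.spark.util.TaskFailureListener
      
      rdd.foreachPartition { iter =>
        TaskContext.get().addTaskFailureListener(new TaskFailureListener {
          override def onTaskFailure(context: TaskContext, error: Throwable): Unit = {
            cleanup(error)            // runs only if this task fails
          }
        })
        iter.foreach(process)         // completion listeners still fire either way
      }
      ```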
    • [SPARK-13499][SQL] Performance improvements for parquet reader. · 0598a2b8
      Nong Li authored
      ## What changes were proposed in this pull request?
      
      This patch includes these performance fixes:
        - Remove unnecessary setNotNull() calls. The NULL bits are cleared already.
        - Speed up RLE group decoding
        - Speed up dictionary decoding by decoding NULLs directly into the result.
      
      ## How was this patch tested?
      
      In addition to the updated benchmarks, on TPCDS, the result of these changes
      running Q55 (sf40) is:
      
      ```
      TPCDS:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
      ---------------------------------------------------------------------------------
      q55 (Before)                             6398 / 6616         18.0          55.5
      q55 (After)                              4983 / 5189         23.1          43.3
      ```
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #11375 from nongli/spark-13499.
    • [SPARK-12313][SQL] Improve performance of BroadcastNestedLoopJoin · 6df1e55a
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      Currently, BroadcastNestedLoopJoin is implemented for the worst case, so it is too slow and can easily hang forever. This PR creates fast paths for some combinations of joinType and buildSide, and also improves the worst case (which now uses much less memory than before).
      
      Before this PR, one task required O(N*K) + O(K) memory in the worst case, where N is the number of rows in one partition of the streamed table and K the number of rows on the build side; this could hang the job (because of GC).
      
      In order to work around this for InnerJoin, we had to disable auto-broadcast and switch to CartesianProduct; see https://forums.databricks.com/questions/6747/how-do-i-get-a-cartesian-product-of-a-huge-dataset.html
      
      In this PR, the following joins get a fast path:
      
      - InnerJoin with BuildLeft or BuildRight
      - LeftOuterJoin with BuildRight
      - RightOuterJoin with BuildLeft
      - LeftSemi with BuildRight
      
      These fast paths are all stream-based (they take one pass over the streamed table) and require O(1) memory.
      
      All other join types and build sides take two passes over the streamed table: one pass to find the matched rows (which includes the streamed part), requiring O(1) memory, and another pass to find the rows from the build table that have no matching row in the streamed table, requiring O(K) memory, where K is the number of rows on the build side, at one bit per row; this should be much smaller than the memory for the broadcast. The following join types work this way (see the sketch below):
      
      - LeftOuterJoin with BuildLeft
      - RightOuterJoin with BuildRight
      - FullOuterJoin with BuildLeft or BuildRight
      - LeftSemi with BuildLeft
      
      This PR also added tests for all the join types for BroadcastNestedLoopJoin.
      
      After this PR, for InnerJoin with one small table, BroadcastNestedLoopJoin should be faster than CartesianProduct, and we no longer need that workaround.
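      A self-contained sketch of the second pass described above: record which build-side rows ever matched, one bit per row, then emit the unmatched ones (e.g. for the outer side of an outer join). Names are illustrative.
      
      ```
      import scala.collection.mutable.BitSet
      
      def unmatchedBuildRows[T, U](streamed: Iterator[T], build: IndexedSeq[U])
                                  (matches: (T, U) => Boolean): Seq[U] = {
        val matched = new BitSet(build.size)    // O(K) memory: one bit per build row
        for (row <- streamed; i <- build.indices) {
          if (matches(row, build(i))) matched += i
        }
        build.indices.filterNot(matched).map(build)
      }
      ```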
      
      ## How was this patch tested?
      
      Added unit tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11328 from davies/nested_loop.
    • [MINOR][SQL] Fix modifier order. · 727e7801
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes the modifier order from `abstract public` to `public abstract`.
      Currently, `./dev/lint-java` reports the following error:
      ```
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/util/sketch/CountMinSketch.java:[53,10] (modifier) ModifierOrder: 'public' modifier out of order with the JLS suggestions.
      ```
      
      ## How was this patch tested?
      
      ```
      $ ./dev/lint-java
      Checkstyle checks passed.
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11390 from dongjoon-hyun/fix_modifier_order.
    • [SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using include_example · 7af0de07
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR replaces the example code in `mllib-linear-methods.md` with `include_example`
      by doing the following:
        * Extracts the example code (Scala, Java, Python) into files in the `example` module.
        * Merges some dialog-style examples into a single file.
        * Hides redundant code in HTML for consistency with other docs.
      
      ## How was this patch tested?
      
      Manual test.
      This PR can be tested by generating the documentation with `SKIP_API=1 jekyll build`.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11320 from dongjoon-hyun/SPARK-11381.
    • [SPARK-12634][PYSPARK][DOC] PySpark tree parameter desc to consistent format · b33261f9
      Bryan Cutler authored
      Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent.  This is for the tree module.
      
      closes #10601
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      Author: vijaykiran <mail@vijaykiran.com>
      
      Closes #11353 from BryanCutler/param-desc-consistent-tree-SPARK-12634.
    • [SPARK-13457][SQL] Removes DataFrame RDD operations · 99dfcedb
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This is another try of PR #11323.
      
      This PR removes DataFrame RDD operations except for `foreach` and `foreachPartitions` (they are actions rather than transformations). Original calls are now replaced by calls to methods of `DataFrame.rdd`.
      
      PR #11323 was reverted because it introduced a regression: both `DataFrame.foreach` and `DataFrame.foreachPartitions` wrap the underlying RDD operations with `withNewExecutionId` to track Spark jobs, but #11323 removed them.
      
      ## How was this patch tested?
      
      No extra tests are added. Existing tests should do the work.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #11388 from liancheng/remove-df-rdd-ops.
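      A migration sketch (assuming `df` is a DataFrame): direct RDD-style calls move behind `.rdd`, while `foreach`/`foreachPartitions` remain on DataFrame.
      
      ```
      // before: df.map(_.getString(0))          // removed by this PR
      val names = df.rdd.map(_.getString(0))     // explicit hop to the RDD API
      
      df.foreach(row => println(row))            // still a DataFrame action, still
                                                 // wrapped with withNewExecutionId
      ```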
    • [SPARK-12523][YARN] Support long-running Spark applications on HBase and the Hive metastore. · 5c3912e5
      huangzhaowei authored
      Obtains the Hive metastore and HBase tokens as well as the HDFS token in `DelegationTokenRenewer` to support long-running Spark applications on HBase or the Thrift server.
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #10645 from SaintBacchus/SPARK-12523.
    • [MINOR][STREAMING] Fix a minor naming issue in JavaDStreamLike · 318bf411
      Liwei Lin authored
      Author: Liwei Lin <proflin.me@gmail.com>
      
      Closes #11385 from proflin/Fix-minor-naming.
    • [SPARK-13503][SQL] Support specifying the compression codec as a write option for TEXT · 9812a24a
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-13503
      This PR makes the TEXT datasource able to compress its output via an option, instead of requiring Hadoop configurations to be set manually.
      For resolving the codec by name, this is similar to https://github.com/apache/spark/pull/10805 and https://github.com/apache/spark/pull/10858.
      
      ## How was this patch tested?
      
      This was tested with unit tests and with `dev/run_tests` for coding style.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #11384 from HyukjinKwon/SPARK-13503.
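      A usage sketch (assuming `df` has a single string column; `compression` is the option name this family of PRs converged on):
      
      ```
      df.write
        .format("text")
        .option("compression", "gzip")   // codec resolved by short name
        .save("/tmp/text-out")
      ```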
    • [SPARK-13487][SQL] User-facing RuntimeConfig interface · 26ac6080
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch creates the public API for runtime configuration and an implementation for it. The public runtime configuration includes configs for existing SQL, as well as Hadoop Configuration.
      
      This new interface is currently dead code. It will be wired into SQLContext and a session entry point to Spark when that is added.
      
      ## How was this patch tested?
      a new unit test suite
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11378 from rxin/SPARK-13487.
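      A hypothetical shape of the interface based on the description above; method names are assumptions, since the entry point lands in a later PR.
      
      ```
      trait RuntimeConfig {
        def set(key: String, value: String): RuntimeConfig  // SQL configs
        def get(key: String): String
        def unset(key: String): Unit
        // plus a view over the underlying Hadoop Configuration
      }
      ```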
    • [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype · 8afe4914
      thomastechs authored
      
      ## What changes were proposed in this pull request?
      
      This pull request fixes SPARK-12941 by creating a data type mapping to Oracle for the DataFrame data type `StringType`. This PR targets the master branch; another PR has already been tested against branch 1.4.
      
      ## How was this patch tested?
      
      This patch was tested using the Oracle Docker image; a new integration suite was created for it. The oracle.jdbc jar had to be downloaded manually from the Oracle site and installed locally, since no JDBC jar is available in the Maven repository. For SparkQA test runs, the ojdbc jar may likewise need to be manually placed in the local Maven repository (com/oracle/ojdbc6/11.2.0.2.0).
      
      Author: thomastechs <thomas.sebastian@tcs.com>
      
      Closes #11306 from thomastechs/master.