  1. Mar 03, 2016
    • Dongjoon Hyun's avatar
      [HOT-FIX] Recover some deprecations for 2.10 compatibility. · 02b7677e
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      #11479 [SPARK-13627] broke 2.10 compatibility: [2.10-Build](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-scala-2.10/292/console)
      At this moment, we need to support both 2.10 and 2.11.
      This PR recovers some deprecated methods which were replaced by [SPARK-13627].
      
      ## How was this patch tested?
      
      Jenkins build: Both 2.10, 2.11.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11488 from dongjoon-hyun/hotfix_compatibility_with_2.10.
      02b7677e
    • Liang-Chi Hsieh's avatar
      [SPARK-13466] [SQL] Remove projects that become redundant after column pruning rule · 7b25dc7b
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13466
      
      ## What changes were proposed in this pull request?
      
      With column pruning rule in optimizer, some Project operators will become redundant. We should remove these redundant Projects.
      
      For an example query:
      
          val input = LocalRelation('key.int, 'value.string)
      
          val query =
            Project(Seq($"x.key", $"y.key"),
              Join(
                SubqueryAlias("x", input),
                BroadcastHint(SubqueryAlias("y", input)), Inner, None))
      
      After the first run of column pruning, it would look like:
      
          Project(Seq($"x.key", $"y.key"),
            Join(
              Project(Seq($"x.key"), SubqueryAlias("x", input)),
              Project(Seq($"y.key"),      <-- inserted by the rule
                BroadcastHint(SubqueryAlias("y", input))),
              Inner, None))
      
      Actually we don't need the outside Project now. This patch will remove it:
      
          Join(
            Project(Seq($"x.key"), SubqueryAlias("x", input)),
            Project(Seq($"y.key"),
              BroadcastHint(SubqueryAlias("y", input))),
            Inner, None)
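
The idea can be sketched outside Spark. The following is a minimal Python model, not Catalyst code: the `Relation`, `Project` and `Join` classes and the `remove_redundant_projects` function are hypothetical stand-ins, and the rule is simplified to "drop a Project whose column list equals its child's output".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relation:
    name: str
    output: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: object

    @property
    def output(self):
        return self.columns


@dataclass(frozen=True)
class Join:
    left: object
    right: object

    @property
    def output(self):
        return self.left.output + self.right.output


def remove_redundant_projects(plan):
    """Drop any Project whose column list equals its child's output (simplified rule)."""
    if isinstance(plan, Project):
        child = remove_redundant_projects(plan.child)
        if plan.columns == child.output:
            return child
        return Project(plan.columns, child)
    if isinstance(plan, Join):
        return Join(remove_redundant_projects(plan.left),
                    remove_redundant_projects(plan.right))
    return plan


x = Relation("x", ("x.key", "x.value"))
y = Relation("y", ("y.key", "y.value"))
# Plan shape after column pruning has pushed Projects below the Join.
pruned = Project(("x.key", "y.key"),
                 Join(Project(("x.key",), x), Project(("y.key",), y)))
result = remove_redundant_projects(pruned)
```

Here the outer Project disappears because the Join already outputs exactly `x.key` and `y.key`, while the two inner Projects stay because they still narrow their inputs.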
      
      ## How was this patch tested?
      
      A unit test is added to ColumnPruningSuite.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11341 from viirya/remove-redundant-project.
      7b25dc7b
    • Liang-Chi Hsieh's avatar
      [SPARK-13635] [SQL] Enable LimitPushdown optimizer rule because we have... · 1085bd86
      Liang-Chi Hsieh authored
      [SPARK-13635] [SQL] Enable LimitPushdown optimizer rule because we have whole-stage codegen for Limit
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-13635
      
      ## What changes were proposed in this pull request?
      
      The LimitPushdown optimizer rule was disabled because Limit had no whole-stage codegen support. Now that Limit supports whole-stage codegen, we should re-enable the rule.
      
      ## How was this patch tested?
      
      Since this only re-enables the LimitPushdown optimizer rule, no new tests are needed.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11483 from viirya/enable-limitpushdown.
      1085bd86
    • Devaraj K's avatar
      [SPARK-13621][CORE] TestExecutor.scala needs to be moved to test package · 56e3d007
      Devaraj K authored
      Moved TestExecutor.scala from src to test package and removed the unused file TestClient.scala.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #11474 from devaraj-kavali/SPARK-13621.
      56e3d007
    • Liang-Chi Hsieh's avatar
      [SPARK-13616][SQL] Let SQLBuilder convert logical plan without a project on top of it · f87ce050
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13616
      
      ## What changes were proposed in this pull request?
      
      A logical plan may have had the `Project` removed from its top, or it may never have had a top `Project` because one was not necessary. Currently `SQLBuilder` can't convert such plans back to SQL. This change adds that capability.
      
      ## How was this patch tested?
      
      A test is added to `LogicalPlanToSQLSuite`.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11466 from viirya/sqlbuilder-notopselect.
      f87ce050
  2. Mar 02, 2016
    • Dongjoon Hyun's avatar
      [SPARK-13627][SQL][YARN] Fix simple deprecation warnings. · 9c274ac4
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR aims to fix the following deprecation warnings.
        * MethodSymbolApi.paramss -> paramLists
        * AnnotationApi.tpe -> tree.tpe
        * BufferLike.readOnly -> toList
        * StandardNames.nme -> termNames
        * scala.tools.nsc.interpreter.AbstractFileClassLoader -> scala.reflect.internal.util.AbstractFileClassLoader
        * TypeApi.declarations -> decls
      
      ## How was this patch tested?
      
      Check the compile build log and pass the tests.
      ```
      ./build/sbt
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11479 from dongjoon-hyun/SPARK-13627.
      9c274ac4
    • Wenchen Fan's avatar
      [SPARK-13617][SQL] remove unnecessary GroupingAnalytics trait · b60b8137
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      The `trait GroupingAnalytics` has only one implementation, so it is an unnecessary abstraction. This PR removes it and simplifies the code that resolves `GroupingSet`.
      
      ## How was this patch tested?
      
      existing tests
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #11469 from cloud-fan/groupingset.
      b60b8137
    • Takeshi YAMAMURO's avatar
      [SPARK-13528][SQL] Make the short names of compression codecs consistent in ParquetRelation · 6250cf1e
      Takeshi YAMAMURO authored
      ## What changes were proposed in this pull request?
      This PR makes the short names of compression codecs in `ParquetRelation` consistent with the others. This PR comes from #11324.
      
      ## How was this patch tested?
      Add more tests in `TextSuite`.
      
      Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
      
      Closes #11408 from maropu/SPARK-13528.
      6250cf1e
    • Wenchen Fan's avatar
      [SPARK-13594][SQL] remove typed operations(e.g. map, flatMap) from python DataFrame · 4dd24811
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Remove `map`, `flatMap`, `mapPartitions` from python DataFrame, to prepare for Dataset API in the future.
      
      ## How was this patch tested?
      
      existing tests
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #11445 from cloud-fan/python-clean.
      4dd24811
    • Nong Li's avatar
      [SPARK-13574] [SQL] Add benchmark to measure string dictionary decode. · e2780ce8
      Nong Li authored
      ## What changes were proposed in this pull request?
      
      Also updated the other benchmarks when the default to use vectorized decode was flipped.
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #11454 from nongli/benchmark.
      e2780ce8
    • Davies Liu's avatar
      [SPARK-13601] call failure callbacks before writer.close() · b5a59a0f
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      So that the OutputStream can tell whether the task has failed, we should call the failure callbacks BEFORE calling writer.close().
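
The ordering matters because `close()` may behave differently once it knows the task failed. Below is a minimal Python sketch of that ordering, not Spark's code: `RecordingWriter`, `on_task_failure` and `run_task` are hypothetical names chosen for illustration.

```python
class RecordingWriter:
    """Toy writer that decides at close() time whether to commit or abort."""

    def __init__(self):
        self.task_failed = False
        self.events = []

    def write(self, record):
        if record is None:
            raise ValueError("bad record")
        self.events.append(f"write:{record}")

    def on_task_failure(self, error):
        # Failure callback: records that the task has failed.
        self.task_failed = True
        self.events.append("failure-callback")

    def close(self):
        outcome = "abort" if self.task_failed else "commit"
        self.events.append(f"close:{outcome}")


def run_task(records, writer, failure_callbacks):
    try:
        for record in records:
            writer.write(record)
    except Exception as error:
        # Run the failure callbacks first ...
        for callback in failure_callbacks:
            callback(error)
        raise
    finally:
        # ... so that close() already sees the failure flag.
        writer.close()


writer = RecordingWriter()
try:
    run_task(["a", None], writer, [writer.on_task_failure])
except ValueError:
    pass
```

If the callbacks ran after `close()`, the toy writer above would record `close:commit` for a failed task.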
      
      ## How was this patch tested?
      
      Added new unit tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11450 from davies/callback.
      b5a59a0f
    • gatorsmile's avatar
      [SPARK-13535][SQL] Fix Analysis Exceptions when Using Backticks in Transform Clause · 9e01fe2e
      gatorsmile authored
      #### What changes were proposed in this pull request?
      ```SQL
      FROM
      (FROM test SELECT TRANSFORM(key, value) USING 'cat' AS (`thing1` int, thing2 string)) t
      SELECT thing1 + 1
      ```
      This query returns an analysis error, like:
      ```
      Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot resolve '`thing1`' given input columns: [`thing1`, thing2]; line 3 pos 7
      'Project [unresolvedalias(('thing1 + 1), None)]
      +- SubqueryAlias t
         +- ScriptTransformation [key#2,value#3], cat, [`thing1`#6,thing2#7], HiveScriptIOSchema(List(),List(),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),List((field.delim,	)),List((field.delim,	)),Some(org.apache.hadoop.hive.ql.exec.TextRecordReader),Some(org.apache.hadoop.hive.ql.exec.TextRecordWriter),false)
            +- SubqueryAlias test
               +- Project [_1#0 AS key#2,_2#1 AS value#3]
                  +- LocalRelation [_1#0,_2#1], [[1,1],[2,2],[3,3],[4,4],[5,5]]
      ```
      
      The backticks of \`thing1\` should be cleaned before entering Parser/Analyzer. This PR fixes this issue.
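
As a rough illustration of the cleanup step (a Python sketch, not the actual Spark parser code; `clean_identifier` is a hypothetical helper), stripping the enclosing backticks makes the quoted and unquoted spellings of a column name resolve to the same attribute:

```python
def clean_identifier(name: str) -> str:
    """Strip one pair of enclosing backticks; unquoted names pass through unchanged."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return name[1:-1]
    return name


# Output column names as written in the TRANSFORM ... AS (...) clause above.
columns = [clean_identifier(c) for c in ["`thing1`", "thing2"]]
```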
      
      #### How was this patch tested?
      
      Added a test case and modified an existing test case
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #11415 from gatorsmile/scriptTransform.
      9e01fe2e
    • Josh Rosen's avatar
      [SPARK-12817] Add BlockManager.getOrElseUpdate and remove CacheManager · d6969ffc
      Josh Rosen authored
      CacheManager directly calls MemoryStore.unrollSafely() and has its own logic for handling graceful fallback to disk when cached data does not fit in memory. However, this logic also exists inside of the MemoryStore itself, so this appears to be unnecessary duplication.
      
      Thanks to the addition of block-level read/write locks in #10705, we can refactor the code to remove the CacheManager and replace it with an atomic `BlockManager.getOrElseUpdate()` method.
      
      This pull request replaces / subsumes #10748.
      
      /cc andrewor14 and nongli for review. Note that this changes the locking semantics of a couple of internal BlockManager methods (`doPut()` and `lockNewBlockForWriting`), so please pay attention to the Scaladoc changes and new test cases for those methods.
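
To illustrate the contract of an atomic get-or-compute operation, here is a minimal Python sketch. It is an assumption-laden simplification: `BlockStore` and `get_or_else_update` are hypothetical names, and a single store-wide lock stands in for the block-level read/write locks mentioned above.

```python
import threading


class BlockStore:
    """Toy block store; one global lock replaces per-block read/write locks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._blocks = {}

    def get_or_else_update(self, block_id, make_values):
        # Check and compute under the same lock, so a block is computed at most once.
        with self._lock:
            if block_id not in self._blocks:
                self._blocks[block_id] = list(make_values())
            return self._blocks[block_id]


calls = []


def compute():
    calls.append(1)
    return range(3)


store = BlockStore()
threads = [
    threading.Thread(target=store.get_or_else_update, args=("rdd_0_0", compute))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four concurrent callers ask for the same block, but `compute` runs only once; the other callers read the cached values.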
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11436 from JoshRosen/remove-cachemanager.
      d6969ffc
    • gatorsmile's avatar
      [SPARK-13609] [SQL] Support Column Pruning for MapPartitions · 8f8d8a23
      gatorsmile authored
      #### What changes were proposed in this pull request?
      
      This PR prunes unnecessary columns when the operator is `MapPartitions`. The solution is to add an extra `Project` in the child node.
      
      Pruning for the other two operators, `AppendColumns` and `MapGroups`, sounds doable but needs more discussion. The major reason is that the current implementation of the `inputPlan` of `groupBy` is based on the child of `AppendColumns`, which might be a bug. A separate PR will be submitted for them.
      
      #### How was this patch tested?
      
      Added a test case in ColumnPruningSuite to verify the rule. Added another test case in DatasetSuite.scala to verify the data.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #11460 from gatorsmile/datasetPruningNew.
      8f8d8a23
    • lgieron's avatar
      [SPARK-13515] Make FormatNumber work irrespective of locale. · d8afd45f
      lgieron authored
      ## What changes were proposed in this pull request?
      
      Change in class FormatNumber to make it work irrespective of locale.
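
As an analogy only (FormatNumber itself is Scala code in Spark, and this is not its implementation), the Python sketch below shows what locale-independent formatting means: the `,` and `.` format specifiers always use a comma for grouping and a dot for decimals, regardless of the process locale. The helper name `format_number` is hypothetical.

```python
def format_number(value: float, decimals: int) -> str:
    """Format with ',' grouping and '.' decimals, independent of the process locale."""
    return f"{value:,.{decimals}f}"


formatted = format_number(12345.678, 2)
```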
      
      ## How was this patch tested?
      
      Unit tests.
      
      Author: lgieron <lgieron@gmail.com>
      
      Closes #11396 from lgieron/SPARK-13515_Fix_Format_Number.
      d8afd45f
    • Wojciech Jurczyk's avatar
      Fix run-tests.py typos · 75e618de
      Wojciech Jurczyk authored
      ## What changes were proposed in this pull request?
      
      The PR fixes typos in an error message in dev/run-tests.py.
      
      Author: Wojciech Jurczyk <wojciech.jurczyk@codilime.com>
      
      Closes #11467 from wjur/wjur/typos_run_tests.
      75e618de
    • Dongjoon Hyun's avatar
      [MINOR][STREAMING] Replace deprecated `apply` with `create` in example. · 366f26d2
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Twitter Algebird deprecated `apply` in HyperLogLog.scala.
      ```
      @deprecated("Use toHLL", since = "0.10.0 / 2015-05")
      def apply[T <% Array[Byte]](t: T) = create(t)
      ```
      This PR replaces the deprecated usage `apply` with the new `create`
      according to the upstream change.
      
      ## How was this patch tested?
      manual.
      ```
      /bin/spark-submit --class org.apache.spark.examples.streaming.TwitterAlgebirdHLL examples/target/scala-2.11/spark-examples-2.0.0-SNAPSHOT-hadoop2.2.0.jar
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11451 from dongjoon-hyun/replace_deprecated_hll_apply.
      366f26d2
  3. Mar 01, 2016
    • jerryshao's avatar
      [BUILD][MINOR] Fix SBT build error with network-yarn module · b4d096de
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      ```
      [error] Expected ID character
      [error] Not a valid command: common (similar: completions)
      [error] Expected project ID
      [error] Expected configuration
      [error] Expected ':' (if selecting a configuration)
      [error] Expected key
      [error] Not a valid key: common (similar: commands)
      [error] common/network-yarn/test
      ```
      
      `common/network-yarn` is not a valid sbt project; we should change it to `network-yarn`.
      
      ## How was this patch tested?
      
      Ran the unit tests locally.
      
      CC rxin: we should either change it here or change the sbt project name.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #11456 from jerryshao/build-fix.
      b4d096de
    • Joseph K. Bradley's avatar
      [SPARK-13008][ML][PYTHON] Put one alg per line in pyspark.ml all lists · 9495c40f
      Joseph K. Bradley authored
      This is to fix a long-time annoyance: Whenever we add a new algorithm to pyspark.ml, we have to add it to the ```__all__``` list at the top.  Since we keep it alphabetized, it often creates a lot more changes than needed.  It is also easy to add the Estimator and forget the Model.  I'm going to switch it to have one algorithm per line.
      
      This also alphabetizes a few out-of-place classes in pyspark.ml.feature.  No changes have been made to the moved classes.
      
      CC: thunterdb
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #10927 from jkbradley/ml-python-all-list.
      9495c40f
    • sureshthalamati's avatar
      [SPARK-13167][SQL] Include rows with null values for partition column when... · e42724b1
      sureshthalamati authored
      [SPARK-13167][SQL] Include rows with null values for partition column when reading from JDBC datasources.
      
      Rows with null values in the partition column are not included in the results because none of the partition
      where clauses specifies an is null predicate on the partition column. This fix adds an is null predicate on the partition column to the first JDBC partition's where clause.
      
      Example:
      JDBCPartition(THEID < 1 or THEID is null, 0), JDBCPartition(THEID >= 1 AND THEID < 2, 1),
      JDBCPartition(THEID >= 2, 2)
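
The example above can be reproduced with a small Python sketch. This is not Spark's JDBC code: `partition_where_clauses` is a hypothetical helper, the stride is simplified to integer division, and at least two partitions are assumed.

```python
def partition_where_clauses(column, lower, upper, num_partitions):
    """Build one where clause per partition; the first one also matches nulls."""
    if num_partitions < 2:
        raise ValueError("this sketch assumes at least two partitions")
    stride = (upper - lower) // num_partitions  # simplified stride computation
    clauses = []
    bound = lower
    for i in range(num_partitions):
        lower_pred = f"{column} >= {bound}" if i != 0 else None
        bound += stride
        upper_pred = f"{column} < {bound}" if i != num_partitions - 1 else None
        if upper_pred is None:
            clause = lower_pred                      # last partition: open upper end
        elif lower_pred is None:
            clause = f"{upper_pred} or {column} is null"  # first partition: add nulls
        else:
            clause = f"{lower_pred} AND {upper_pred}"
        clauses.append(clause)
    return clauses


clauses = partition_where_clauses("THEID", 0, 3, 3)
```

Attaching the null predicate to exactly one partition means each null row is read once rather than being dropped or duplicated.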
      
      Author: sureshthalamati <suresh.thalamati@gmail.com>
      
      Closes #11063 from sureshthalamati/nullable_jdbc_part_col_spark-13167.
      e42724b1
    • Davies Liu's avatar
      [SPARK-13598] [SQL] remove LeftSemiJoinBNL · a640c5b4
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      Broadcast left semi join without joining keys is already supported in BroadcastNestedLoopJoin, which has the same implementation as LeftSemiJoinBNL, so we should remove LeftSemiJoinBNL.
      
      ## How was this patch tested?
      
      Updated unit tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11448 from davies/remove_bnl.
      a640c5b4
    • Reynold Xin's avatar
      [SPARK-13548][BUILD] Move tags and unsafe modules into common · b0ee7d43
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch moves tags and unsafe modules into common directory to remove 2 top level non-user-facing directories.
      
      ## How was this patch tested?
      Jenkins should suffice.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11426 from rxin/SPARK-13548.
      b0ee7d43
    • Davies Liu's avatar
      [SPARK-13582] [SQL] defer dictionary decoding in parquet reader · c27ba0d5
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR defers resolving a dictionary id to its value until the column is actually accessed (inside getInt/getLong). This is very useful for columns and rows that are filtered out. It is also useful for binary types, since we no longer need to copy all the byte arrays.
      
      This PR also changes the underlying type for small decimals that fit within an Int, in order to use getInt() to look up the value from IntDictionary.
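
The benefit of deferring the lookup can be shown with a toy column in Python. This is a sketch under assumptions, not the Parquet reader: `DictionaryColumn` is a hypothetical class that stores dictionary ids and resolves them only inside `get_int`.

```python
class DictionaryColumn:
    """Toy column that keeps dictionary ids and decodes a value only on access."""

    def __init__(self, dictionary, ids):
        self.dictionary = dictionary
        self.ids = ids
        self.lookups = 0  # counts how many values were actually decoded

    def get_int(self, row):
        self.lookups += 1
        return self.dictionary[self.ids[row]]


price = DictionaryColumn(dictionary=[100, 250, 999], ids=[0, 2, 1, 0, 2])
keep = [True, False, False, True, False]  # result of a filter on another column
selected = [price.get_int(row) for row, kept in enumerate(keep) if kept]
```

Only the two rows that survive the filter are decoded; an eager reader would have decoded all five.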
      
      ## How was this patch tested?
      
      Manually tested TPCDS Q7 with scale factor 10 and saw about a 30% improvement (after PR #11274).
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11437 from davies/decode_dict.
      c27ba0d5
    • Xiangrui Meng's avatar
      Closes #11320 · c37bbb3a
      Xiangrui Meng authored
      Closes #10940
      Closes #11302
      Closes #11430
      Closes #10912
      c37bbb3a
    • Yanbo Liang's avatar
      [SPARK-12811][ML] Estimator for Generalized Linear Models(GLMs) · 5ed48dd8
      Yanbo Liang authored
      Estimator for Generalized Linear Models(GLMs) which will be solved by IRLS.
      
      cc mengxr
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #11136 from yanboliang/spark-12811.
      5ed48dd8
    • Liang-Chi Hsieh's avatar
      [SPARK-13511] [SQL] Add wholestage codegen for limit · c43899a0
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13511
      
      ## What changes were proposed in this pull request?
      
      The current limit operator doesn't support wholestage codegen. This PR adds support for it.
      
      In the `doConsume` of `GlobalLimit` and `LocalLimit`, we use a count term to count the processed rows. Once the row count reaches the limit, we set the variable `stopEarly` of `BufferedRowIterator` (newly added in this PR) to `true`, which indicates that we want to stop processing the remaining rows. Then, when the wholestage codegen framework checks `shouldStop()`, it stops processing the row iterator.
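
The control flow described above can be approximated in plain Python. This is a hand-written sketch, not generated code: the class below only borrows the `BufferedRowIterator`, `stopEarly` and `shouldStop()` names, and `run_limit` is a hypothetical stand-in for the fused producer loop.

```python
class BufferedRowIterator:
    """Toy stand-in holding the output rows and the early-stop flag."""

    def __init__(self):
        self.rows = []
        self.stop_early = False

    def should_stop(self):
        return self.stop_early


def run_limit(source, limit):
    it = BufferedRowIterator()
    count = 0      # the "count term" kept by the limit operator
    consumed = 0   # how many input rows the producer actually read
    for row in source:            # producer loop, e.g. a Range scan
        if it.should_stop():      # checked by the framework between rows
            break
        consumed += 1
        if count < limit:         # limit's consume step: emit and count the row
            count += 1
            it.rows.append(row)
        if count >= limit:        # limit reached: ask the producer to stop
            it.stop_early = True
    return it.rows, consumed


rows, consumed = run_limit(range(1000), 3)
```

With the flag in place the producer reads three rows and stops instead of scanning all 1000.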
      
      Before this, the executed plan for a query `sqlContext.range(N).limit(100).groupBy().sum()` is:
      
          TungstenAggregate(key=[], functions=[(sum(id#5L),mode=Final,isDistinct=false)], output=[sum(id)#6L])
          +- TungstenAggregate(key=[], functions=[(sum(id#5L),mode=Partial,isDistinct=false)], output=[sum#9L])
             +- GlobalLimit 100
                +- Exchange SinglePartition, None
                   +- LocalLimit 100
                      +- Range 0, 1, 1, 524288000, [id#5L]
      
      After adding wholestage codegen support:
      
          WholeStageCodegen
          :  +- TungstenAggregate(key=[], functions=[(sum(id#40L),mode=Final,isDistinct=false)], output=[sum(id)#41L])
          :     +- TungstenAggregate(key=[], functions=[(sum(id#40L),mode=Partial,isDistinct=false)], output=[sum#44L])
          :        +- GlobalLimit 100
          :           +- INPUT
          +- Exchange SinglePartition, None
             +- WholeStageCodegen
                :  +- LocalLimit 100
                :     +- Range 0, 1, 1, 524288000, [id#40L]
      
      ## How was this patch tested?
      
      A test is added into BenchmarkWholeStageCodegen.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11391 from viirya/wholestage-limit.
      c43899a0
    • Masayoshi TSUZUKI's avatar
      [SPARK-13592][WINDOWS] fix path of spark-submit2.cmd in spark-submit.cmd · 12a2a57e
      Masayoshi TSUZUKI authored
      ## What changes were proposed in this pull request?
      
      This patch fixes the problem that pyspark fails on Windows because pyspark can't find ```spark-submit2.cmd```.
      
      ## How was this patch tested?
      
      manual tests:
        I ran ```bin\pyspark.cmd``` and checked that pyspark launched correctly after this patch was applied.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #11442 from tsudukim/feature/SPARK-13592.
      12a2a57e
    • Zheng RuiFeng's avatar
      [SPARK-13550][ML] Add java example for ml.clustering.BisectingKMeans · 3c5f5e3b
      Zheng RuiFeng authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13550
      
      ## What changes were proposed in this pull request?
      
      Just add a java example for ml.clustering.BisectingKMeans
      
      ## How was this patch tested?
      
      manual tests were done.
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #11428 from zhengruifeng/ml_bkm_je.
      3c5f5e3b
    • Zheng RuiFeng's avatar
      [SPARK-13551][MLLIB] Fix wrong comment and remove meanless lines in... · 0a4b620f
      Zheng RuiFeng authored
      [SPARK-13551][MLLIB] Fix wrong comment and remove meanless lines in mllib.JavaBisectingKMeansExample
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-13551
      
      ## What changes were proposed in this pull request?
      
      Fix the wrong comment and remove meaningless lines in mllib.JavaBisectingKMeansExample
      
      ## How was this patch tested?
      
      manual test
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #11429 from zhengruifeng/mllib_bkm_je.
      0a4b620f
  4. Feb 29, 2016
    • Marcelo Vanzin's avatar
      [SPARK-13478][YARN] Use real user when fetching delegation tokens. · c7fccb56
      Marcelo Vanzin authored
      The Hive client library is not smart enough to notice that the current
      user is a proxy user; so when using a proxy user, it fails to fetch
      delegation tokens from the metastore because of a missing kerberos
      TGT for the current user.
      
      To fix it, just run the code that fetches the delegation token as the
      real logged in user.
      
      Tested on a kerberos cluster both submitting normally and with a proxy
      user; Hive and HBase tokens are retrieved correctly in both cases.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11358 from vanzin/SPARK-13478.
      c7fccb56
    • Sameer Agarwal's avatar
      [SPARK-13123][SQL] Implement whole state codegen for sort · 4bd697da
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      This PR adds whole-stage codegen support for sort. It builds heavily on nongli's PR: https://github.com/apache/spark/pull/11008 (which actually implements the feature), and adds the following changes on top:
      
      - [x]  Generated code updates peak execution memory metrics
      - [x]  Unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`
      
      ## How was this patch tested?
      
      New unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`. Further, all existing sort tests should pass.
      
      Author: Sameer Agarwal <sameer@databricks.com>
      Author: Nong Li <nong@databricks.com>
      
      Closes #11359 from sameeragarwal/sort-codegen.
      4bd697da
    • Shixiong Zhu's avatar
      [SPARK-13522][CORE] Fix the exit log place for heartbeat · 644dbb64
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Just fixed the location of the exit log introduced by #11401.
      
      ## How was this patch tested?
      
      unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11432 from zsxwing/SPARK-13522-follow-up.
      644dbb64
    • Shixiong Zhu's avatar
      [SPARK-13522][CORE] Executor should kill itself when it's unable to heartbeat... · 17a253cb
      Shixiong Zhu authored
      [SPARK-13522][CORE] Executor should kill itself when it's unable to heartbeat to driver more than N times
      
      ## What changes were proposed in this pull request?
      
      Sometimes the network disconnection event won't be triggered because of other potential race conditions that we may not have thought of; the executor then keeps sending heartbeats to the driver and won't exit.
      
      This PR adds a new configuration `spark.executor.heartbeat.maxFailures` to kill Executor when it's unable to heartbeat to the driver more than `spark.executor.heartbeat.maxFailures` times.
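
A minimal Python sketch of the failure-counting logic follows. It is not the executor's code: `run_heartbeats` and `flaky` are hypothetical helpers, and two details are assumptions of this sketch, namely that a successful heartbeat resets the counter and that the executor exits as soon as the count reaches the configured maximum.

```python
def run_heartbeats(send_heartbeat, max_failures, attempts):
    """Return "exit" once max_failures consecutive heartbeats fail, else "running"."""
    failures = 0
    for _ in range(attempts):
        try:
            send_heartbeat()
            failures = 0            # assumption: success resets the counter
        except ConnectionError:
            failures += 1
            if failures >= max_failures:
                return "exit"       # stand-in for killing the executor
    return "running"


def flaky(outcomes):
    """Build a heartbeat function that succeeds or fails following `outcomes`."""
    it = iter(outcomes)

    def send():
        if not next(it):
            raise ConnectionError("driver unreachable")

    return send


gave_up = run_heartbeats(flaky([True, False, False, False]), max_failures=3, attempts=4)
recovered = run_heartbeats(flaky([False, True, False, False]), max_failures=3, attempts=4)
```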
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11401 from zsxwing/SPARK-13522.
      17a253cb
    • gatorsmile's avatar
      [SPARK-13544][SQL] Rewrite/Propagate Constraints for Aliases in Aggregate · bc65f60e
      gatorsmile authored
      #### What changes were proposed in this pull request?
      
      After analysis by the Analyzer, two operators can have aliases: `Project` and `Aggregate`. So far, we only rewrite and propagate constraints if an `Alias` is defined in `Project`. This PR does the same for `Aggregate`.
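
The rewrite can be pictured with a small Python sketch. It is a simplification under assumptions, not Catalyst code: constraints are modelled as `(attribute, operator, value)` tuples, aliases as a plain dict, and `propagate_alias_constraints` is a hypothetical helper.

```python
def propagate_alias_constraints(constraints, aliases):
    """Copy each constraint on a source attribute onto every alias of that attribute.

    constraints: set of (attribute, operator, value) tuples
    aliases: dict mapping alias name -> source attribute
    """
    result = set(constraints)
    for alias, source in aliases.items():
        for attr, op, value in constraints:
            if attr == source:
                result.add((alias, op, value))
    return result


child_constraints = {("a", ">", 10)}
# Models an operator that outputs `a AS x`, whether it is a Project or an Aggregate.
propagated = propagate_alias_constraints(child_constraints, {"x": "a"})
```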
      
      #### How was this patch tested?
      
      Added a test case for `Aggregate` in `ConstraintPropagationSuite`.
      
      marmbrus sameeragarwal
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #11422 from gatorsmile/validConstraintsInUnaryNodes.
      bc65f60e
    • hyukjinkwon's avatar
      [SPARK-13509][SPARK-13507][SQL] Support for writing CSV with a single function call · 02aa499d
      hyukjinkwon authored
      https://issues.apache.org/jira/browse/SPARK-13507
      https://issues.apache.org/jira/browse/SPARK-13509
      
      ## What changes were proposed in this pull request?
      This PR adds support for writing CSV data directly to the given path with a single call.
      
      Several unit tests were added for each piece of functionality.
      ## How was this patch tested?
      
      This was tested with unit tests and with `dev/run_tests` for coding style.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Hyukjin Kwon <gurwls223@gmail.com>
      
      Closes #11389 from HyukjinKwon/SPARK-13507-13509.
      02aa499d
    • Cheng Lian's avatar
      [SPARK-13540][SQL] Supports using nested classes within Scala objects as Dataset element type · 916fc34f
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      Nested classes defined within Scala objects are translated into Java static nested classes. Unlike inner classes, they don't need outer scopes. But the analyzer still thinks that an outer scope is required.
      
      This PR fixes this issue simply by checking whether a nested class is static before looking up its outer scope.
      
      ## How was this patch tested?
      
      A test case is added to `DatasetSuite`. It checks contents of a Dataset whose element type is a nested class declared in a Scala object.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #11421 from liancheng/spark-13540-object-as-outer-scope.
      916fc34f
    • Zheng RuiFeng's avatar
      [SPARK-13506][MLLIB] Fix the wrong parameter in R code comment in AssociationRulesSuite · ac5c6352
      Zheng RuiFeng authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13506
      
      ## What changes were proposed in this pull request?
      
      Just change the R snippet comment in AssociationRulesSuite.
      
      ## How was this patch tested?
      
      Unit tests passed.
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #11387 from zhengruifeng/ars.
      ac5c6352
    • zhuol's avatar
      [SPARK-13481] Desc order of appID by default for history server page. · 2f91f5ac
      zhuol authored
      ## What changes were proposed in this pull request?
      
      Currently the history server page sorts by appId in ascending order by default. We might prefer descending order by default, which shows the latest application at the top.
      
      ## How was this patch tested?
      
      Manually tested. See screenshot below:
      
      ![desc-sort](https://cloud.githubusercontent.com/assets/11683054/13307473/102f4cf8-db31-11e5-8dd5-391edbf32f0d.png)
      
      Author: zhuol <zhuol@yahoo-inc.com>
      
      Closes #11357 from zhuoliu/13481.
      2f91f5ac
    • vijaykiran's avatar
      [SPARK-12633][PYSPARK] [DOC] PySpark regression parameter desc to consistent format · 236e3c8f
      vijaykiran authored
      Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the regression module.  Also, updated 2 params in classification to read as `Supported values:` to be consistent.
      
      closes #10600
      
      Author: vijaykiran <mail@vijaykiran.com>
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #11404 from BryanCutler/param-desc-consistent-regression-SPARK-12633.
      236e3c8f
    • Jeff Zhang's avatar
      [SPARK-12994][CORE] It is not necessary to create ExecutorAllocationM… · 99fe8993
      Jeff Zhang authored
      …anager in local mode
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10914 from zjffdu/SPARK-12994.
      99fe8993