  1. Apr 19, 2016
    • gatorsmile's avatar
      [SPARK-12457] Fixed the Wrong Description and Missing Example in Collection Functions · d9620e76
      gatorsmile authored
      #### What changes were proposed in this pull request?
      https://github.com/apache/spark/pull/12185 contains the original PR I submitted in https://github.com/apache/spark/pull/10418
      
However, it misses one of the extended examples, has a wrong description, and contains a few typos in the collection functions. This PR fixes all of these issues.
      
      #### How was this patch tested?
      The existing test cases already cover it.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #12492 from gatorsmile/expressionUpdate.
      d9620e76
    • tedyu's avatar
      [SPARK-13904] Add exit code parameter to exitExecutor() · e8963360
      tedyu authored
      ## What changes were proposed in this pull request?
      
This PR adds an exit code parameter to exitExecutor() so that callers can specify different exit codes.
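A minimal standalone sketch of the shape of the change; the real method lives in CoarseGrainedExecutorBackend and also logs the reason, so the body below is illustrative only:

```scala
// Simplified, self-contained model of the change; not the actual Spark code.
class ExecutorBackendSketch {
  // Before: the executor always exited with a fixed code.
  // After: callers pass a specific exit code, with a sensible default.
  def exitExecutor(code: Int = 1, reason: String = "unknown"): Unit = {
    System.err.println(s"Executor self-exiting with code $code: $reason")
    sys.exit(code)
  }
}
```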
      
      ## How was this patch tested?
      
      Existing test
      
      rxin hbhanawat
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #12457 from tedyu/master.
      e8963360
    • Wenchen Fan's avatar
      [SPARK-14491] [SQL] refactor object operator framework to make it easy to eliminate serializations · 9ee95b6e
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
This PR tries to separate the serialization and deserialization logic from object operators, so that it's easier to eliminate unnecessary serializations in the optimizer.
      
Typed-aggregate-related operators are special: they deserialize the input row into multiple objects, which is difficult to abstract with a single deserializer operator, so we still mix the deserialization logic into them.
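A toy model of the optimization this refactoring enables, with simplified stand-ins for Catalyst plan nodes:

```scala
// Adjacent serialize/deserialize pairs over the same type are a no-op and
// can be removed. These case classes are not the real Catalyst classes.
object EliminateSerializationSketch {
  sealed trait Plan
  case class Deserialize(tpe: String, child: Plan) extends Plan
  case class Serialize(tpe: String, child: Plan) extends Plan
  case class Leaf(name: String) extends Plan

  def eliminate(plan: Plan): Plan = plan match {
    // A deserializer fed directly by a serializer of the same type cancels out.
    case Deserialize(t1, Serialize(t2, child)) if t1 == t2 => eliminate(child)
    case Deserialize(t, child) => Deserialize(t, eliminate(child))
    case Serialize(t, child)   => Serialize(t, eliminate(child))
    case leaf: Leaf            => leaf
  }
}
```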
      
      ## How was this patch tested?
      
      existing tests and new test in `EliminateSerializationSuite`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #12260 from cloud-fan/encoder.
      9ee95b6e
    • Cheng Lian's avatar
      [SPARK-13681][SPARK-14458][SPARK-14566][SQL] Add back once removed... · 5e360c93
      Cheng Lian authored
      [SPARK-13681][SPARK-14458][SPARK-14566][SQL] Add back once removed CommitFailureTestRelationSuite and SimpleTextHadoopFsRelationSuite
      
      ## What changes were proposed in this pull request?
      
      These test suites were removed while refactoring `HadoopFsRelation` related API. This PR brings them back.
      
      This PR also fixes two regressions:
      
- SPARK-14458, which causes a runtime error when saving partitioned tables using `FileFormat` data sources that cannot infer their own schemata. This bug wasn't detected by any built-in data source because all of them happen to support schema inference.
      
- SPARK-14566, which happens to be covered by SPARK-14458 and causes wrong query results or runtime errors when
  - appending a Dataset `ds` to a persisted partitioned data source relation `t`, and
  - the partition columns in `ds` don't all appear after the data columns
      
      ## How was this patch tested?
      
      `CommitFailureTestRelationSuite` uses a testing relation that always fails when committing write tasks to test write job cleanup.
      
      `SimpleTextHadoopFsRelationSuite` uses a testing relation to test general `HadoopFsRelation` and `FileFormat` interfaces.
      
      The two regressions are both covered by existing test cases.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #12179 from liancheng/spark-13681-commit-failure-test.
      5e360c93
    • Dongjoon Hyun's avatar
      [SPARK-14577][SQL] Add spark.sql.codegen.maxCaseBranches config option · 3d46d796
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
We currently disable codegen for `CaseWhen` if the number of branches is greater than 20 (in `CaseWhen.MAX_NUM_CASES_FOR_CODEGEN`). It would be better if this value were a non-public config defined in SQLConf.
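A hedged sketch of the kind of internal entry this adds to SQLConf, assuming the SQLConfigBuilder pattern used in that file at the time (the field name and doc string are illustrative):

```scala
// Assumed shape only; the actual builder calls may differ slightly.
val MAX_CASES_BRANCHES = SQLConfigBuilder("spark.sql.codegen.maxCaseBranches")
  .internal()                      // non-public: not listed in user-facing docs
  .doc("The maximum number of CASE WHEN branches supported by codegen.")
  .intConf
  .createWithDefault(20)           // preserves the old hard-coded limit
```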
      
      ## How was this patch tested?
      
      Pass the Jenkins tests (including a new testcase `Support spark.sql.codegen.maxCaseBranches option`)
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12353 from dongjoon-hyun/SPARK-14577.
      3d46d796
    • bomeng's avatar
      [SPARK-14398][SQL] Audit non-reserved keyword list in ANTLR4 parser · 74fe235a
      bomeng authored
      ## What changes were proposed in this pull request?
      
I have compared the non-reserved lists in Antlr3 and Antlr4 one by one, as well as all the existing keywords defined in Antlr4, and added the missing keywords to the non-reserved keywords list. If we need to support more syntax, we can add more keywords then.
      
      Any recommendation for the above is welcome.
      
      ## How was this patch tested?
      
      I manually checked the keywords one by one. Please let me know if there is a better way to test.
      
Another thought: I suggest putting all the keyword definitions and the non-reserved list in order, which will make them much easier to check in the future.
      
      Author: bomeng <bmeng@us.ibm.com>
      
      Closes #12191 from bomeng/SPARK-14398.
      74fe235a
    • Wenchen Fan's avatar
      [SPARK-14595][SQL] add input metrics for FileScanRDD · d4b94ead
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
This is roughly based on the input metrics logic in `SqlNewHadoopRDD`.
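A toy sketch of the idea, with a stand-in for Spark's InputMetrics:

```scala
object FileScanMetricsSketch {
  // Stand-in for Spark's InputMetrics holder.
  final class InputMetricsSketch { var recordsRead = 0L; var bytesRead = 0L }

  // Wrap the file iterator so that consuming a record updates the metrics.
  def withMetrics[T](it: Iterator[T], m: InputMetricsSketch): Iterator[T] =
    it.map { record =>
      m.recordsRead += 1 // the real code also periodically samples bytes read
      record
    }
}
```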
      
      ## How was this patch tested?
      
      Not sure how to write a test, I manually verified it in Spark UI.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #12352 from cloud-fan/metrics.
      d4b94ead
  2. Apr 18, 2016
    • Sameer Agarwal's avatar
      [SPARK-14722][SQL] Rename upstreams() -> inputRDDs() in WholeStageCodegen · 6f880068
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      
Per rxin's suggestions, this patch renames `upstreams()` to `inputRDDs()` in `WholeStageCodegen` to better convey the semantics.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Sameer Agarwal <sameer@databricks.com>
      
      Closes #12486 from sameeragarwal/codegen-cleanup.
      6f880068
    • Sameer Agarwal's avatar
      [SPARK-14718][SQL] Avoid mutating ExprCode in doGenCode · 4eae1dbd
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      
The `doGenCode` method currently takes an `ExprCode`, mutates it, and returns the Java code to evaluate the given expression. It should instead just return a new `ExprCode` to avoid passing around mutable objects during code generation.
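A simplified sketch of the new shape, with an illustrative stand-in for Catalyst's ExprCode and a made-up unary-minus body:

```scala
// Simplified stand-in for Catalyst's ExprCode triple. Before the change,
// doGenCode received an ExprCode and mutated its fields; after, it returns
// a fresh immutable value instead.
object ExprCodeSketch {
  case class ExprCode(code: String, isNull: String, value: String)

  def doGenCode(childValue: String): ExprCode = {
    val result = "value_1" // a codegen-fresh variable name
    ExprCode(
      code = s"int $result = -($childValue);",
      isNull = "false",
      value = result)
  }
}
```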
      
      ## How was this patch tested?
      
      Existing Tests
      
      Author: Sameer Agarwal <sameer@databricks.com>
      
      Closes #12483 from sameeragarwal/new-exprcode-2.
      4eae1dbd
    • Josh Rosen's avatar
      [SPARK-14719] WriteAheadLogBasedBlockHandler should ignore BlockManager put errors · ed2de029
      Josh Rosen authored
      WriteAheadLogBasedBlockHandler will currently throw exceptions if its BlockManager `put()` calls fail, even though those calls are only performed as a performance optimization. Instead, it should log and ignore exceptions during that `put()`.
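A hedged sketch of the fix, with illustrative names; the real change wraps the handler's BlockManager call:

```scala
import scala.util.control.NonFatal

// The BlockManager put is only a performance optimization, so failures are
// logged and ignored rather than propagated.
object BestEffortPut {
  def storeInBlockManager(put: () => Unit): Unit =
    try put() catch {
      case NonFatal(e) =>
        // The block is already durable in the write-ahead log; losing the
        // in-memory copy only costs read performance, not correctness.
        System.err.println(s"Ignoring BlockManager put failure: ${e.getMessage}")
    }
}
```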
      
      This is a longstanding issue that was masked by an incorrect test case. I think that we haven't noticed this in production because
      
      1. most people probably use a `MEMORY_AND_DISK` storage level, and
      2. typically, individual blocks may be small enough relative to the total storage memory such that they're able to evict blocks from previous batches, so `put()` failures here may be rare in practice.
      
      This patch fixes the faulty test and fixes the bug.
      
      /cc tdas
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #12484 from JoshRosen/received-block-hadndler-fix.
      ed2de029
    • Reynold Xin's avatar
      [SPARK-14667] Remove HashShuffleManager · 5e92583d
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      The sort shuffle manager has been the default since Spark 1.2. It is time to remove the old hash shuffle manager.
      
      ## How was this patch tested?
      Removed some tests related to the old manager.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #12423 from rxin/SPARK-14667.
      5e92583d
    • CodingCat's avatar
      [SPARK-13227] Risky apply() in OpenHashMap · 4b3d1294
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-13227
      
It might confuse future developers who use OpenHashMap.apply() with a numeric value type.
      
`null.asInstanceOf[Int]`, `null.asInstanceOf[Long]`, `null.asInstanceOf[Float]`, and `null.asInstanceOf[Double]` return 0, 0L, 0.0f, and 0.0 respectively, which might confuse the developer if the value set contains 0/0.0/0L for an existing key.
      
The current patch only adds comments describing the issue, in order to keep the changes to the code base minimal.
      
The more direct, yet more aggressive, approach is to use Option as the return type.
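The ambiguity in miniature, using a plain Scala Map as a stand-in for OpenHashMap:

```scala
// When the "missing" result is the numeric zero value, a missing key and a
// key explicitly mapped to 0 are indistinguishable.
object ZeroDefaultSketch extends App {
  val m = Map("present" -> 0)
  def lookup(key: String): Int = m.getOrElse(key, 0) // mimics apply() semantics
  assert(lookup("present") == lookup("absent"))      // both return 0
  // An Option-returning lookup keeps the two cases distinct, at some cost:
  assert(m.get("present") == Some(0) && m.get("absent").isEmpty)
}
```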
      
      andrewor14  JoshRosen  any thoughts about how to avoid the potential issue?
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #11107 from CodingCat/SPARK-13227.
      4b3d1294
    • Mark Grover's avatar
      [SPARK-14711][BUILD] Examples jar not a part of distribution. · 2b151b6b
      Mark Grover authored
      ## What changes were proposed in this pull request?
      
      Move the spark-examples.jar from being in examples/target to examples/target/scala-2.11/jars
      
      ## How was this patch tested?
      
      Built distribution to make sure examples jar was being included in the tarball.
      Ran run-example to make sure examples were run.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #12476 from markgrover/spark-14711.
      2b151b6b
    • Joseph K. Bradley's avatar
      [SPARK-14714][ML][PYTHON] Fixed issues with non-kwarg typeConverter arg for Param constructor · d29e429e
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      PySpark Param constructors need to pass the TypeConverter argument by name, partly to make sure it is not mistaken for the expectedType arg and partly because we will remove the expectedType arg in 2.1. In several places, this is not being done correctly.
      
      This PR changes all usages in pyspark/ml/ to keyword args.
      
      ## How was this patch tested?
      
      Existing unit tests.  I will not test type conversion for every Param unless we really think it necessary.
      
      Also, if you start the PySpark shell and import classes (e.g., pyspark.ml.feature.StandardScaler), then you no longer get this warning:
      ```
      /Users/josephkb/spark/python/pyspark/ml/param/__init__.py:58: UserWarning: expectedType is deprecated and will be removed in 2.1. Use typeConverter instead, as a keyword argument.
        "Use typeConverter instead, as a keyword argument.")
      ```
That warning came from the typeConverter argument being passed as the expectedType arg by mistake.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12480 from jkbradley/typeconverter-fix.
      d29e429e
    • Zheng RuiFeng's avatar
      [SPARK-14515][DOC] Add python example for ChiSqSelector · 9bfb35da
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Add the missing python example for ChiSqSelector
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12283 from zhengruifeng/chi2_pe.
      9bfb35da
    • Wenchen Fan's avatar
      [SPARK-14628][CORE][FOLLLOW-UP] Always tracking read/write metrics · 60273408
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
This PR is a follow-up to https://github.com/apache/spark/pull/12417; now we always track input/output/shuffle metrics in the Spark JSON protocol and status API.
      
Most of the line changes come from re-generating the golden answers for `HistoryServerSuite`, which now include a lot of 0 values for read/write metrics.
      
      ## How was this patch tested?
      
      existing tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #12462 from cloud-fan/follow.
      60273408
    • Shixiong Zhu's avatar
      [SPARK-14713][TESTS] Fix the flaky test NettyBlockTransferServiceSuite · 6ff04358
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      When there are multiple tests running, "NettyBlockTransferServiceSuite.can bind to a specific port twice and the second increments" may fail.
      
E.g., assume there are 2 tests running. Here is the execution order that reproduces the test failure.
      
      | Execution Order | Test 1 | Test 2 |
      | ------------- | ------------- | ------------- |
      | 1 | service0 binds to 17634 |  |
      | 2 |  | service0 binds to 17635 (17634 is occupied) |
      | 3 | service1 binds to 17636 |  |
      | 4 | pass test |  |
      | 5 | service0.close (release 17634) |  |
      | 6 |  | service1 binds to 17634 |
      | 7 |  | `service1.port should be (service0.port + 1)` fails (17634 != 17635 + 1) |
      
      Here is an example in Jenkins: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.2/786/testReport/junit/org.apache.spark.network.netty/NettyBlockTransferServiceSuite/can_bind_to_a_specific_port_twice_and_the_second_increments/
      
This PR makes two changes (sketched below):
      
      - Use a random port between 17634 and 27634 to reduce the possibility of port conflicts.
      - Make `service1` use `service0.port` to bind to avoid the above race condition.
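A hedged sketch of both changes, with `bindService` standing in for the suite's helper (the real helper retries on the next port when a bind fails, which is why deriving from the actually-bound port matters):

```scala
import scala.util.Random

object PortTestSketch {
  case class Service(port: Int)
  def bindService(requested: Int): Service = Service(requested) // stub

  val basePort = 17634 + Random.nextInt(10000) // 1) randomize the base port
  val service0 = bindService(basePort)
  // 2) derive service1's port from the port service0 *actually* bound, not
  //    the one it merely requested; the real test binds at service0.port and
  //    relies on the conflict to increment to service0.port + 1.
  val service1 = bindService(service0.port + 1)
  assert(service1.port == service0.port + 1)
}
```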
      
      ## How was this patch tested?
      
      Jenkins unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #12477 from zsxwing/SPARK-14713.
      6ff04358
    • Luciano Resende's avatar
      [SPARK-14504][SQL] Enable Oracle docker tests · 68450c8c
      Luciano Resende authored
      ## What changes were proposed in this pull request?
      
      Enable Oracle docker tests
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #12270 from lresende/oracle.
      68450c8c
    • Andrew Or's avatar
      [SPARK-14674][SQL] Move HiveContext.hiveconf to HiveSessionState · f1a11976
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This is just cleanup. This allows us to remove HiveContext later without inflating the diff too much. This PR fixes the conflicts of https://github.com/apache/spark/pull/12431. It also removes the `def hiveConf` from `HiveSqlParser`. So, we will pass the HiveConf associated with a session explicitly instead of relying on Hive's `SessionState` to pass `HiveConf`.
      
      ## How was this patch tested?
      Existing tests.
      
      Closes #12431
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12449 from yhuai/hiveconf.
      f1a11976
    • Sameer Agarwal's avatar
      [SPARK-14710][SQL] Rename gen/genCode to genCode/doGenCode to better reflect the semantics · 8bd81213
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      
Per rxin's suggestions, this patch renames `s/gen/genCode` and `s/genCode/doGenCode` to better reflect the semantics of these two function calls.
      
      ## How was this patch tested?
      
      N/A (refactoring only)
      
      Author: Sameer Agarwal <sameer@databricks.com>
      
      Closes #12475 from sameeragarwal/gencode.
      8bd81213
    • hyukjinkwon's avatar
      [MINOR] Revert removing explicit typing (changed in some examples and StatFunctions) · 6fc1e72d
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR reverts some changes in https://github.com/apache/spark/pull/12413. (please see the discussion in that PR).
      
      from
      ```scala
          words.foreachRDD { (rdd, time) =>
          ...
      ```
      
      to
      ```scala
          words.foreachRDD { (rdd: RDD[String], time: Time) =>
          ...
      ```
      
      Also, this was discussed in dev-mailing list, [here](http://apache-spark-developers-list.1001551.n3.nabble.com/Question-about-Scala-style-explicit-typing-within-transformation-functions-and-anonymous-val-td17173.html)
      
      ## How was this patch tested?
      
      This was tested with `sbt scalastyle`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12452 from HyukjinKwon/revert-explicit-typing.
      6fc1e72d
    • Xusen Yin's avatar
      [SPARK-14299][EXAMPLES] Remove duplications for scala.examples.ml · 8c62edb7
      Xusen Yin authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-14299
      
      Delete duplications in scala/examples/ml.
      
      TrainValidationSplitExample.scala --> ModelSelectionViaTrainValidationSplitExample
      CrossValidatorExample.scala --> ModelSelectionViaCrossValidationExample
      
      ## How was this patch tested?
      
      Existing tests passed.
      
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #12366 from yinxusen/SPARK-14299-2.
      8c62edb7
    • Xusen Yin's avatar
      [SPARK-14440][PYSPARK] Remove pipeline specific reader and writer · f31a62d1
      Xusen Yin authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-14440
      
      Remove
      
      * PipelineMLWriter
      * PipelineMLReader
      * PipelineModelMLWriter
      * PipelineModelMLReader
      
      and modify comments.
      
      ## How was this patch tested?
      
Tested with unit tests.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #12216 from yinxusen/SPARK-14440.
      f31a62d1
    • Andrew Or's avatar
      [SPARK-14647][SQL] Group SQLContext/HiveContext state into SharedState · 28ee1570
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This patch adds a SharedState that groups state shared across multiple SQLContexts. This is analogous to the SessionState added in SPARK-13526 that groups session-specific state. This cleanup makes the constructors of the contexts simpler and ultimately allows us to remove HiveContext in the near future.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12463 from yhuai/sharedState.
      28ee1570
    • Reynold Xin's avatar
      [HOTFIX] Fix Scala 2.10 compilation break. · e4ae9742
      Reynold Xin authored
      e4ae9742
    • Jason Lee's avatar
      [SPARK-14564][ML][MLLIB][PYSPARK] Python Word2Vec missing setWindowSize method · 3d66a2ce
      Jason Lee authored
      ## What changes were proposed in this pull request?
      Added windowSize getter/setter to ML/MLlib
      
      ## How was this patch tested?
      Added test cases in tests.py under both ML and MLlib
      
      Author: Jason Lee <cjlee@us.ibm.com>
      
      Closes #12428 from jasoncl/SPARK-14564.
      3d66a2ce
    • Dongjoon Hyun's avatar
      [SPARK-14580][SPARK-14655][SQL] Hive IfCoercion should preserve predicate. · d280d1da
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
Currently, `HiveTypeCoercion.IfCoercion` removes all predicates whose return type is null. However, some UDFs need to be evaluated because they are designed to throw exceptions. This PR fixes the rule to preserve those predicates. Also, `assert_true` is now implemented as a Spark SQL function.
      
      **Before**
      ```
      scala> sql("select if(assert_true(false),2,3)").head
      res2: org.apache.spark.sql.Row = [3]
      ```
      
      **After**
      ```
      scala> sql("select if(assert_true(false),2,3)").head
      ... ASSERT_TRUE ...
      ```
      
      **Hive**
      ```
      hive> select if(assert_true(false),2,3);
      OK
      Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: ASSERT_TRUE(): assertion failed.
      ```
      
      ## How was this patch tested?
      
      Pass the Jenkins tests (including a new testcase in `HivePlanTest`)
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12340 from dongjoon-hyun/SPARK-14580.
      d280d1da
    • Xusen Yin's avatar
      [SPARK-14306][ML][PYSPARK] PySpark ml.classification OneVsRest support export/import · b64482f4
      Xusen Yin authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-14306
      
      Add PySpark OneVsRest save/load supports.
      
      ## How was this patch tested?
      
Tested with Python unit tests.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #12439 from yinxusen/SPARK-14306-0415.
      b64482f4
    • Tathagata Das's avatar
      [SPARK-14473][SQL] Define analysis rules to catch operations not supported in streaming · 775cf17e
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      There are many operations that are currently not supported in the streaming execution. For example:
       - joining two streams
       - unioning a stream and a batch source
       - sorting
       - window functions (not time windows)
       - distinct aggregates
      
      Furthermore, executing a query with a stream source as a batch query should also fail.
      
This patch adds an additional step after analysis in QueryExecution which checks whether all the operations in the analyzed logical plan are supported.
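A toy model of the added check, with simplified stand-ins for the logical plan nodes:

```scala
// Walk the analyzed plan and reject operations the streaming execution
// cannot run; only the two-stream join case is shown here.
object UnsupportedOperationsSketch {
  sealed trait Plan { def children: Seq[Plan] }
  case object StreamSource extends Plan { val children = Seq.empty[Plan] }
  case class Batch(name: String) extends Plan { val children = Seq.empty[Plan] }
  case class Join(left: Plan, right: Plan) extends Plan { val children = Seq(left, right) }

  def isStreaming(p: Plan): Boolean =
    p == StreamSource || p.children.exists(isStreaming)

  def checkSupported(p: Plan): Unit = p match {
    case Join(l, r) if isStreaming(l) && isStreaming(r) =>
      throw new UnsupportedOperationException("joining two streams is not supported")
    case other => other.children.foreach(checkSupported)
  }
}
```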
      
      ## How was this patch tested?
      unit tests.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #12246 from tdas/SPARK-14473.
      775cf17e
    • Dongjoon Hyun's avatar
      [SPARK-14614] [SQL] Add `bround` function · 432d1399
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
This PR aims to add the `bround` function (aka banker's rounding) by extending the current `round` implementation. [Hive supports `bround` since 1.3.0.](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF)
      
      **Hive (1.3 ~ 2.0)**
      ```
      hive> select round(2.5), bround(2.5);
      OK
      3.0	2.0
      ```
      
      **After this PR**
      ```scala
      scala> sql("select round(2.5), bround(2.5)").head
      res0: org.apache.spark.sql.Row = [3,2]
      ```
      
      ## How was this patch tested?
      
      Pass the Jenkins tests (with extended tests).
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12376 from dongjoon-hyun/SPARK-14614.
      432d1399
    • jerryshao's avatar
      [SPARK-14423][YARN] Avoid same name files added to distributed cache again · d6fb485d
      jerryshao authored
      ## What changes were proposed in this pull request?
      
In the current implementation of assembly-free Spark deployment, jars under `assembly/target/scala-xxx/jars` are uploaded to the distributed cache by default, so there's a chance their names will conflict with the names of jars specified in `--jars`, which causes an exception when starting the application:
      
      ```
      client token: N/A
      	 diagnostics: Application application_1459907402325_0004 failed 2 times due to AM Container for appattempt_1459907402325_0004_000002 exited with  exitCode: -1000
      For more detailed output, check application tracking page:http://hw12100.local:8088/proxy/application_1459907402325_0004/Then, click on links to logs of each attempt.
      Diagnostics: Resource hdfs://localhost:8020/user/sshao/.sparkStaging/application_1459907402325_0004/avro-mapred-1.7.7-hadoop2.jar changed on src filesystem (expected 1459909780508, was 1459909782590
      java.io.IOException: Resource hdfs://localhost:8020/user/sshao/.sparkStaging/application_1459907402325_0004/avro-mapred-1.7.7-hadoop2.jar changed on src filesystem (expected 1459909780508, was 1459909782590
      	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
      	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
      	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
      	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      ```
      
So this PR checks file names to avoid uploading files with the same name again.
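A hedged sketch of the idea with illustrative names (not the actual yarn/Client.scala helpers):

```scala
import scala.collection.mutable

// Remember file names already staged and skip any resource whose name has
// been seen before, so the same jar is never uploaded twice.
object DistCacheNameCheck {
  private val stagedNames = mutable.HashSet[String]()

  def shouldUpload(path: String): Boolean = {
    val fileName = path.substring(path.lastIndexOf('/') + 1)
    stagedNames.add(fileName) // false if a same-named file was already staged
  }
  // e.g. the assembly-dir copy wins and the --jars duplicate is skipped:
  //   shouldUpload("jars/avro-mapred-1.7.7-hadoop2.jar")     => true
  //   shouldUpload("/usr/lib/avro-mapred-1.7.7-hadoop2.jar") => false
}
```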
      
      ## How was this patch tested?
      
Unit tests and a manual integration test were done locally.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #12203 from jerryshao/SPARK-14423.
      d6fb485d
    • Reynold Xin's avatar
      [SPARK-14696][SQL] Add implicit encoders for boxed primitive types · 1a396647
      Reynold Xin authored
      ## What changes were proposed in this pull request?
We currently only have implicit encoders for Scala primitive types. We should also add implicit encoders for boxed primitives. Otherwise, the following code would not have an encoder:
      
      ```scala
      sqlContext.range(1000).map { i => i }
      ```
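A hedged sketch of the kind of implicits this adds (the real ones live in SQLImplicits; the method names follow the pattern but may not match exactly):

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Assumed shape: delegate each boxed type to the existing Encoders factory.
trait BoxedEncodersSketch {
  implicit def boxedIntEncoder: Encoder[java.lang.Integer] = Encoders.INT
  implicit def boxedLongEncoder: Encoder[java.lang.Long] = Encoders.LONG
  implicit def boxedDoubleEncoder: Encoder[java.lang.Double] = Encoders.DOUBLE
}
```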
      
      ## How was this patch tested?
      Added a unit test case for this.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #12466 from rxin/SPARK-14696.
      1a396647
    • Wenchen Fan's avatar
      [SPARK-13363][SQL] support Aggregator in RelationalGroupedDataset · 2f1d0320
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
Set the input encoder for `TypedColumn` in `RelationalGroupedDataset.agg`.
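A toy sketch of the typed-aggregation shape this enables; the trait is a simplified stand-in for org.apache.spark.sql.expressions.Aggregator, whose input encoder is exactly what this PR wires up for the untyped agg path:

```scala
object AggregatorSketch {
  trait Agg[IN, BUF, OUT] {
    def zero: BUF
    def reduce(b: BUF, a: IN): BUF
    def merge(b1: BUF, b2: BUF): BUF
    def finish(b: BUF): OUT
  }

  // A minimal typed sum aggregator.
  object SumLong extends Agg[Long, Long, Long] {
    def zero = 0L
    def reduce(b: Long, a: Long) = b + a
    def merge(b1: Long, b2: Long) = b1 + b2
    def finish(b: Long) = b
  }
  // Conceptually, after this PR: df.groupBy("key").agg(SumLong.toColumn)
}
```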
      
      ## How was this patch tested?
      
      new tests in `DatasetAggregatorSuite`
      
Closes https://github.com/apache/spark/pull/11269
      
This PR brings https://github.com/apache/spark/pull/12359 up to date and fixes the compilation.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #12451 from cloud-fan/agg.
      2f1d0320
  3. Apr 17, 2016
    • Andrew Or's avatar
      7de06a64
    • Subhobrata Dey's avatar
      [SPARK-14632] randomSplit method fails on dataframes with maps in schema · 699a4dfd
      Subhobrata Dey authored
      ## What changes were proposed in this pull request?
      
The patch fixes an issue with the randomSplit method, which is not able to split dataframes that have maps in their schema. The bug was introduced in Spark 1.6.1.
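A hedged repro sketch of the failing shape, assuming a `sqlContext` is in scope:

```scala
// Before the fix, splitting a DataFrame whose schema contains a MapType
// failed at runtime; after the fix it succeeds.
val df = sqlContext.createDataFrame(Seq(
  (1, Map("k" -> "v")),
  (2, Map("k" -> "w"))
)).toDF("id", "props")

val Array(a, b) = df.randomSplit(Array(0.5, 0.5))
```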
      
      ## How was this patch tested?
      
      Tested with unit tests.
      
      
      Author: Subhobrata Dey <sbcd90@gmail.com>
      
      Closes #12438 from sbcd90/randomSplitIssue.
      699a4dfd
    • Reynold Xin's avatar
      8a87f7d5
    • Hemant Bhanawat's avatar
      [SPARK-13904][SCHEDULER] Add support for pluggable cluster manager · af1f4da7
      Hemant Bhanawat authored
      ## What changes were proposed in this pull request?
      
This commit adds support for a pluggable cluster manager. It also allows a cluster manager to clean up tasks without taking the parent process down.
      
To plug in a new external cluster manager, the ExternalClusterManager trait should be implemented. It returns a task scheduler and a backend scheduler that will be used by SparkContext to schedule tasks. An external cluster manager is registered using the java.util.ServiceLoader mechanism (the same mechanism used to register data sources like parquet, json, jdbc, etc.). This allows auto-loading implementations of the ExternalClusterManager interface.
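A hedged sketch of the trait's shape; the real private[spark] trait uses Spark's concrete scheduler types and may differ in exact signatures:

```scala
trait ExternalClusterManagerSketch {
  // Whether this manager handles the given master URL (e.g. "myCluster://host").
  def canCreate(masterURL: String): Boolean
  // Factories for the pieces SparkContext uses to schedule tasks.
  def createTaskScheduler(masterURL: String): AnyRef
  def createSchedulerBackend(masterURL: String, taskScheduler: AnyRef): AnyRef
  // Called once both pieces exist, so they can be wired together.
  def initialize(taskScheduler: AnyRef, backend: AnyRef): Unit
}
// Implementations are discovered via java.util.ServiceLoader: list the class
// name in a META-INF/services/ file named after the interface.
```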
      
      Currently, when a driver fails, executors exit using system.exit. This does not bode well for cluster managers that would like to reuse the parent process of an executor. Hence,
      
  1. Moving system.exit to a function that can be overridden in subclasses of CoarseGrainedExecutorBackend.
  2. Adding the functionality of killing all the running tasks in an executor.
      
      ## How was this patch tested?
      ExternalClusterManagerSuite.scala was added to test this patch.
      
      Author: Hemant Bhanawat <hemant@snappydata.io>
      
      Closes #11723 from hbhanawat/pluggableScheduler.
      af1f4da7
  4. Apr 16, 2016
    • Andrew Or's avatar
      [SPARK-14672][SQL] Move HiveContext analyze logic to AnalyzeTable · 3394b12c
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      Move the implementation of `hiveContext.analyze` to the command of `AnalyzeTable`.
      
      ## How was this patch tested?
      Existing tests.
      
      Closes #12429
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12448 from yhuai/analyzeTable.
      3394b12c
    • Andrew Or's avatar
      [SPARK-14647][SQL] Group SQLContext/HiveContext state into SharedState · 5cefecc9
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This patch adds a SharedState that groups state shared across multiple SQLContexts. This is analogous to the SessionState added in SPARK-13526 that groups session-specific state. This cleanup makes the constructors of the contexts simpler and ultimately allows us to remove HiveContext in the near future.
      
      ## How was this patch tested?
      Existing tests.
      
      Closes #12405
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12447 from yhuai/sharedState.
      5cefecc9
    • 杨博 (Yang Bo)'s avatar
      [SPARK-14683][DOCUMENTATION] Configure external links in ScalaDoc · 3f49afee
      杨博 (Yang Bo) authored
Right now Spark's Scaladoc does not link to the Scala standard library or other dependencies. This can bother Spark newcomers because they may not be experienced Scala programmers.
      
      This patch fixes these links in ScalaDoc.
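A hedged sbt sketch of the general mechanism (Spark's actual build configures this in its project definition; the keys below are standard sbt, and the URL is illustrative):

```scala
// build.sbt sketch: tell scaladoc where dependency docs live so that
// references to their types become hyperlinks.
autoAPIMappings := true
apiMappings += (
  scalaInstance.value.libraryJar ->
    url(s"http://www.scala-lang.org/api/${scalaVersion.value}/")
)
```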
      
      Author: 杨博 (Yang Bo) <pop.atry@gmail.com>
      
      Closes #12444 from Atry/patch-1.
      3f49afee