  1. May 09, 2016
    • [SPARK-15211][SQL] Select features column from LibSVMRelation causes failure · 635ef407
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      We need to use `requiredSchema` in `LibSVMRelation` to project the required columns when loading data from this data source. Otherwise, selecting the `features` column causes a failure.
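      A minimal sketch of the failing case (assuming a `SQLContext` in scope; the path is illustrative):
      
      ```scala
      // Load libsvm data and select only the features column, which failed
      // before this fix because requiredSchema was ignored.
      val df = sqlContext.read.format("libsvm")
        .load("data/mllib/sample_libsvm_data.txt")
      df.select("features").show()
      ```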
      
      ## How was this patch tested?
      `LibSVMRelationSuite`.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #12986 from viirya/fix-libsvmrelation.
    • [SPARK-15184][SQL] Fix Silent Removal of An Existent Temp Table by Rename Table · a59ab594
      gatorsmile authored
      #### What changes were proposed in this pull request?
      Currently, if we rename a temp table `Tab1` to another existent temp table `Tab2`, `Tab2` is silently removed. This PR detects this case and issues an exception.
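      A sketch of the behavior this PR guards against (table names are illustrative):
      
      ```scala
      sqlContext.range(10).registerTempTable("Tab1")
      sqlContext.range(20).registerTempTable("Tab2")
      // Previously this silently dropped Tab2; it now raises an exception.
      sqlContext.sql("ALTER TABLE Tab1 RENAME TO Tab2")
      ```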
      
      In addition, this PR fixes another issue in the rename table command: when the destination table identifier has a database name, we should not ignore it, since that might mean the user is trying to rename a regular table.
      
      #### How was this patch tested?
      Added two related test cases
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #12959 from gatorsmile/rewriteTable.
  2. May 08, 2016
    • [SPARK-15185][SQL] InMemoryCatalog: Silent Removal of an Existent Table/Function/Partitions by Rename · e9131ec2
      gatorsmile authored
      
      #### What changes were proposed in this pull request?
      So far, the implementation of InMemoryCatalog does not check whether the new/destination table/function/partition already exists. Thus, we silently remove the existing table/function/partition.
      
      This PR is to detect them and issue an appropriate exception.
      
      #### How was this patch tested?
      Added the related test cases. They also verify that HiveExternalCatalog detects these errors.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #12960 from gatorsmile/renameInMemoryCatalog.
    • [SPARK-12479][SPARKR] sparkR collect on GroupedData throws R error "missing value where TRUE/FALSE needed" · 454ba4d6
      Sun Rui authored
      
      ## What changes were proposed in this pull request?
      
      This PR is a workaround for NA handling in hash code computation.
      
      This PR is on behalf of paulomagalhaes whose PR is https://github.com/apache/spark/pull/10436
      
      ## How was this patch tested?
      SparkR unit tests.
      
      Author: Sun Rui <sunrui2016@gmail.com>
      Author: ray <ray@rays-MacBook-Air.local>
      
      Closes #12976 from sun-rui/SPARK-12479.
  3. May 07, 2016
    • [SPARK-15178][CORE] Remove LazyFileRegion instead use netty's DefaultFileRegion · 6e268b9e
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      Remove LazyFileRegion and use Netty's DefaultFileRegion instead. LazyFileRegion was created so that we didn't open a file descriptor before having to send the file; Netty's DefaultFileRegion now serves that purpose.
      
      ## How was this patch tested?
      Existing tests
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #12977 from techaddict/SPARK-15178.
    • [DOC][MINOR] Fixed minor errors in feature.ml user guide doc · 5d188a69
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Fixed some minor errors found while reviewing the feature.ml user guide.
      
      ## How was this patch tested?
      built docs locally
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #12940 from BryanCutler/feature.ml-doc_fixes-DOCS-MINOR.
    • [MINOR][ML][PYSPARK] ALS example cleanup · b0cafdb6
      Nick Pentreath authored
      Cleans up ALS examples by removing unnecessary casts to double for `rating` and `prediction` columns, since `RegressionEvaluator` now supports `Double` & `Float` input types.
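      A sketch of the cleaned-up evaluator setup (`predictions` stands in for the DataFrame produced by the ALS model):
      
      ```scala
      import org.apache.spark.ml.evaluation.RegressionEvaluator
      
      // No cast to double on the rating/prediction columns is needed anymore.
      val evaluator = new RegressionEvaluator()
        .setMetricName("rmse")
        .setLabelCol("rating")
        .setPredictionCol("prediction")
      val rmse = evaluator.evaluate(predictions)
      ```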
      
      ## How was this patch tested?
      
      Manual compile and run with `run-example ml.ALSExample` and `spark-submit examples/src/main/python/ml/als_example.py`.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #12892 from MLnick/als-examples-cleanup.
  4. May 06, 2016
    • [SPARK-15122] [SQL] Fix TPC-DS 41 - Normalize predicates before pulling them out · df89f1d4
      Herman van Hovell authored
      ## What changes were proposed in this pull request?
      The official TPC-DS 41 query currently fails because it contains a scalar subquery with a disjunctive correlated predicate (the correlated predicates were nested in ORs). This makes the `Analyzer` pull out the entire predicate, which is wrong and causes the following (correct) analysis exception: `The correlated scalar subquery can only contain equality predicates`
      
      This PR fixes this by first simplifying (or normalizing) the correlated predicates before pulling them out of the subquery.
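      A simplified sketch of the problematic pattern (not the actual TPC-DS text; table and column names are made up):
      
      ```scala
      // The correlated predicate t2.k = t1.k is nested inside an OR, so the
      // Analyzer used to pull out the whole disjunction. Normalizing first
      // factors it out as a plain equality predicate.
      sqlContext.sql("""
        SELECT * FROM t1
        WHERE t1.a = (SELECT max(t2.a) FROM t2
                      WHERE (t2.k = t1.k AND t2.flag = 1)
                         OR (t2.k = t1.k AND t2.flag = 2))
      """)
      ```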
      
      ## How was this patch tested?
      Manual testing on TPC-DS 41, and added a test to SubquerySuite.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #12954 from hvanhovell/SPARK-15122.
    • [SPARK-15051][SQL] Create a TypedColumn alias · 607a27a0
      Kevin Yu authored
      ## What changes were proposed in this pull request?
      
      Currently, when we create an alias against a TypedColumn from a user-defined Aggregator (for example, `agg(aggSum.toColumn as "a")`), Spark uses the alias function from `Column` (`as`), which returns a column containing a `TypedAggregateExpression`. That expression is unresolved because its `inputDeserializer` is not defined. Later, the aggregate function (`agg`) injects the `inputDeserializer` back into the `TypedAggregateExpression`, but only if the aggregate columns are `TypedColumn`s. In the case above, the `TypedAggregateExpression` remains unresolved because it is wrapped in a plain `Column`, causing the problem reported in [SPARK-15051](https://issues.apache.org/jira/browse/SPARK-15051?jql=project%20%3D%20SPARK).
      
      This PR proposes to create an alias function for `TypedColumn` that returns a `TypedColumn`, using a code path similar to `Column`'s alias function.
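      A sketch of the fixed usage with the current `Aggregator` API (the aggregator itself is illustrative; assumes `spark.implicits._` is in scope):
      
      ```scala
      import org.apache.spark.sql.{Encoder, Encoders}
      import org.apache.spark.sql.expressions.Aggregator
      
      // A minimal sum aggregator.
      val aggSum = new Aggregator[Int, Long, Long] {
        def zero: Long = 0L
        def reduce(b: Long, a: Int): Long = b + a
        def merge(b1: Long, b2: Long): Long = b1 + b2
        def finish(b: Long): Long = b
        def bufferEncoder: Encoder[Long] = Encoders.scalaLong
        def outputEncoder: Encoder[Long] = Encoders.scalaLong
      }
      
      val ds = Seq(1, 2, 3).toDS()
      // With the new alias, `as` keeps the TypedColumn type, so the
      // expression resolves correctly inside select/agg.
      ds.select(aggSum.toColumn as "a")
      ```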
      
      For Spark's built-in aggregate functions, such as `max`, aliasing already works; for example:
      
      ```scala
      val df1 = Seq(1 -> "a", 2 -> "b", 3 -> "b").toDF("i", "j")
      checkAnswer(df1.agg(max("j") as "b"), Row(3) :: Nil)
      ```
      
      Thanks for comments.
      
      ## How was this patch tested?
      
      
      Added test cases in DatasetAggregatorSuite.scala and ran the SQL-related queries against this patch.
      
      Author: Kevin Yu <qyu@us.ibm.com>
      
      Closes #12893 from kevinyu98/spark-15051.
    • [SPARK-15087][MINOR][DOC] Follow Up: Fix the Comments · a21a3bbe
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      Remove the comment, since it no longer applies; see the discussion [here](https://github.com/apache/spark/pull/12865#discussion-diff-61946906).
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #12953 from techaddict/SPARK-15087-FOLLOW-UP.
    • [SPARK-1239] Improve fetching of map output statuses · cc95f1ed
      Thomas Graves authored
      The main issue we are trying to solve is memory bloat on the driver when tasks request the map output statuses. With a large number of tasks you either need a huge amount of memory on the driver or you have to repartition to a smaller number of partitions, which makes it really difficult to run jobs with, say, 50,000 tasks.
      
      The main issues that cause the memory bloat are:
      1) No flow control on sending the map output status responses. We serialize the map output statuses and then hand them off to Netty to send. Netty sends asynchronously and can't send them fast enough to keep up with incoming requests, so we end up with many copies of the serialized map output statuses sitting in memory. This causes huge bloat when you have tens of thousands of tasks and the map output status is in the tens of MB.
      2) When the initial reduce tasks start up, they all request the map output statuses from the driver. These requests are handled by multiple threads in parallel, so even though we check for a cached version, many of the initial requests arrive before one exists and each ends up serializing the exact same map output statuses.
      
      This patch does a couple of things:
      - When the map output status size is over a threshold (default 512K), it uses a broadcast variable to send the map statuses. This means we no longer serialize a large map output status, so we don't have issues with memory bloat; message sizes are now in the 300-400 byte range and the map output statuses themselves are broadcast. If it's under the threshold, it is sent as before, except the message now contains a DIRECT indicator (see the config sketch below).
      - Synchronize the incoming requests so that one thread caches the serialized output and broadcasts the map output statuses, which everyone else can then use. This ensures we don't create multiple broadcast variables when we don't need to. To ensure this happens, I added a second thread pool to which the Dispatcher hands the requests, so that those threads can block without blocking the main dispatcher threads (which would cause things like heartbeats and such not to come through).
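      A sketch of the relevant settings (the config names and defaults below are assumptions based on this description, not confirmed by it):
      
      ```scala
      import org.apache.spark.SparkConf
      
      val conf = new SparkConf()
        // Statuses larger than this are sent via broadcast.
        .set("spark.shuffle.mapOutput.minSizeForBroadcast", "512k")
        // Dedicated pool so request-serving threads can block safely.
        .set("spark.shuffle.mapOutput.dispatcher.numThreads", "8")
      ```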
      
      Note that some of the design and code was contributed by mridulm.
      
      ## How was this patch tested?
      
      Unit tests and a lot of manually testing.
      Ran with akka and netty rpc. Ran with both dynamic allocation on and off.
      
      One of the large jobs I used to test this was a join of 15 TB of data. It had 200,000 map tasks and 20,000 reduce tasks, and executors ranged from 200 to 2000. This job ran successfully with 5 GB of memory on the driver with these changes; without these changes I was using 20 GB and only had 500 reduce tasks. The job has 50 MB of serialized map output statuses, and the executors took roughly the same amount of time to get the map output statuses as before.
      
      Ran a variety of other jobs, from large wordcounts to small ones not using broadcasts.
      
      Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
      
      Closes #12113 from tgravescs/SPARK-1239.
    • [SPARK-14997][SQL] Fixed FileCatalog to return correct set of files when there is no partitioning scheme in the given paths · f7b7ef41
      Tathagata Das authored
      
      ## What changes were proposed in this pull request?
      Let's say there are JSON files in the following directory structure:
      ```
      xyz/file0.json
      xyz/subdir1/file1.json
      xyz/subdir2/file2.json
      xyz/subdir1/subsubdir1/file3.json
      ```
      `sqlContext.read.json("xyz")` should read only file0.json according to behavior in Spark 1.6.1. However in current master, all the 4 files are read.
      
      The fix is to make FileCatalog return only the child files of the given path if no partitioning is detected (instead of the full recursive list of files), as the sketch below shows.
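      Per the layout above, the fixed behavior is:
      
      ```scala
      // Reads only xyz/file0.json, not the files in the subdirectories.
      val df = sqlContext.read.json("xyz")
      ```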
      
      Closes #12774
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #12856 from tdas/SPARK-14997.
    • [SPARK-14050][ML] Add multiple languages support and additional methods for Stop Words Remover · e20cd9f4
      Burak Köse authored
      ## What changes were proposed in this pull request?
      
      This PR continues the work from #11871 with the following changes:
      * load English stop words by default (see the sketch below)
      * convert stop words to a list in Python
      * update some tests and docs
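      A sketch of the resulting API (column names are illustrative; `loadDefaultStopWords` is the new helper):
      
      ```scala
      import org.apache.spark.ml.feature.StopWordsRemover
      
      // English stop words are used by default; other languages load explicitly.
      val remover = new StopWordsRemover()
        .setInputCol("raw")
        .setOutputCol("filtered")
        .setStopWords(StopWordsRemover.loadDefaultStopWords("turkish"))
      ```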
      
      ## How was this patch tested?
      
      Unit tests.
      
      Closes #11871
      
      cc: burakkose srowen
      
      Author: Burak Köse <burakks41@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Burak KOSE <burakks41@gmail.com>
      
      Closes #12843 from mengxr/SPARK-14050.
    • [SPARK-15108][SQL] Describe Permanent UDTF · 5c8fad7b
      gatorsmile authored
      #### What changes were proposed in this pull request?
      When describing a UDTF, the command returns a wrong result: it cannot find the function, which has been created and stored in the catalog but is not in the functionRegistry.
      
      This PR corrects this: if the function is not in the functionRegistry, we check the catalog to collect the information about the UDTF.
      
      #### How was this patch tested?
      Added test cases to verify the results
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #12885 from gatorsmile/showFunction.
    • [SPARK-14512] [DOC] Add python example for QuantileDiscretizer · 76ad04d9
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Add the missing python example for QuantileDiscretizer
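      For reference, a Scala sketch of the equivalent usage (column names are illustrative):
      
      ```scala
      import org.apache.spark.ml.feature.QuantileDiscretizer
      
      val discretizer = new QuantileDiscretizer()
        .setInputCol("hour")
        .setOutputCol("result")
        .setNumBuckets(3)
      val bucketed = discretizer.fit(df).transform(df)
      ```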
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12281 from zhengruifeng/discret_pe.
    • [SPARK-14962][SQL] Do not push down isnotnull/isnull on unsupported types in ORC · fa928ff9
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-14962
      
      ORC filters were being pushed down for all types for both `IsNull` and `IsNotNull`.
      
      This apparently builds fine because neither `IsNull` nor `IsNotNull` takes a type as an argument (Hive 1.2.x) when building filters (`SearchArgument`) on the Spark side, but the filters do not work correctly because the stored statistics always produce `null` for unsupported types (e.g. `ArrayType`) on the ORC side. So `IsNull` is always `true`, which ends up making `IsNotNull` always `false`. (Please see [RecordReaderImpl.java#L296-L318](https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java#L296-L318) and [RecordReaderImpl.java#L359-L365](https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java#L359-L365) in Hive 1.2.)
      
      This appears to be prevented in Hive 1.3.x and later by requiring a type ([`PredicateLeaf.Type`](https://github.com/apache/hive/blob/e085b7e9bd059d91aaf013df0db4d71dca90ec6f/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/PredicateLeaf.java#L50-L56)) when building a filter ([`SearchArgument`](https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java#L260)), but Hive 1.2.x does not seem to do this.
      
      This PR prevents ORC filter creation for `IsNull` and `IsNotNull` on unsupported types, making `OrcFilters` resemble `ParquetFilters` (see the sketch below).
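      A sketch of the kind of guard involved (illustrative, not the actual `OrcFilters` code):
      
      ```scala
      import org.apache.spark.sql.types._
      
      // Only build IsNull/IsNotNull leaves for types whose ORC statistics
      // are reliable; complex types are skipped.
      def isSearchableType(dt: DataType): Boolean = dt match {
        case _: ArrayType | _: MapType | _: StructType => false
        case _ => true
      }
      ```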
      
      ## How was this patch tested?
      
      Unit tests in `OrcQuerySuite` and `OrcFilterSuite`, plus `sbt scalastyle`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Hyukjin Kwon <gurwls223@gmail.com>
      
      Closes #12777 from HyukjinKwon/SPARK-14962.
    • [SPARK-14738][BUILD] Separate docker integration tests from main build · a03c5e68
      Luciano Resende authored
      ## What changes were proposed in this pull request?
      
      * Create a Maven profile for executing the docker integration tests using Maven
      * Remove docker integration tests from the main sbt build
      * Update documentation on how to run docker integration tests from sbt
      
      ## How was this patch tested?
      
      Manual test of the docker integration tests as in:
      
      ```
      mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 compile test
      ```
      
      ## Other comments
      
      Note that the DB2 docker tests are still disabled, as there is a kernel version issue on the AMPLab Jenkins slaves and we would need to get them to the right level before enabling those tests. They do run OK locally with the updates from PR #12348.
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #12508 from lresende/docker.
  5. May 05, 2016
    • [SPARK-11395][SPARKR] Support over and window specification in SparkR. · 157a49aa
      Sun Rui authored
      This PR:
      1. Implement WindowSpec S4 class.
      2. Implement Window.partitionBy() and Window.orderBy() as utility functions to create WindowSpec objects.
      3. Implement over() of Column class. (A Scala sketch of the equivalent API follows.)
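      The sketch below uses an illustrative DataFrame `df` and column names:
      
      ```scala
      import org.apache.spark.sql.expressions.Window
      import org.apache.spark.sql.functions.rank
      
      val ws = Window.partitionBy("dept").orderBy("salary")
      val ranked = df.withColumn("rank", rank().over(ws))
      ```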
      
      Author: Sun Rui <rui.sun@intel.com>
      Author: Sun Rui <sunrui2016@gmail.com>
      
      Closes #10094 from sun-rui/SPARK-11395.
    • [HOTFIX] Fix MLUtils compile · 7f5922aa
      Andrew Or authored
    • [SPARK-15152][DOC][MINOR] Scaladoc and Code style Improvements · bbb77734
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Minor doc and code style fixes
      
      ## How was this patch tested?
      
      local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #12928 from jaceklaskowski/SPARK-15152.
    • [SPARK-14893][SQL] Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed · 02c07e89
      Dilip Biswal authored
      ## What changes were proposed in this pull request?
      
      Enable the test that was disabled when HiveContext was removed.
      
      ## How was this patch tested?
      
      Made sure the enabled test passes with the new jar.
      
      Author: Dilip Biswal <dbiswal@us.ibm.com>
      
      Closes #12924 from dilipbiswal/spark-14893.
    • [SPARK-9926] Parallelize partition logic in UnionRDD. · 08db4912
      Ryan Blue authored
      This patch has the new logic from #8512 that uses a parallel collection to compute partitions in UnionRDD. The rest of #8512 added an alternative code path for calculating splits in S3, but that isn't necessary to get the same speedup. The underlying problem wasn't that bulk listing wasn't used, it was that an extra FileStatus was retrieved for each file. The fix was just committed as [HADOOP-12810](https://issues.apache.org/jira/browse/HADOOP-12810). (I think the original commit also used a single prefix to enumerate all paths, but that isn't always helpful and it was removed in later versions so there is no need for SparkS3Utils.)
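      A sketch of the technique, given `rdds: Seq[RDD[_]]` (the threshold of 10 is an assumption, not taken from this description):
      
      ```scala
      // Evaluate the partitions of many parent RDDs with a parallel
      // collection instead of sequentially.
      val allPartitions =
        if (rdds.length > 10) rdds.par.map(_.partitions).seq
        else rdds.map(_.partitions)
      ```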
      
      I tested this using the same table that piapiaozhexiu was using. Calculating splits for a 10-day period took 25 seconds with this change and HADOOP-12810, which is on par with the results from #8512.
      
      Author: Ryan Blue <blue@apache.org>
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #11242 from rdblue/SPARK-9926-parallelize-union-rdd.
    • [SPARK-15158][CORE] downgrade shouldRollover message to debug level · 5c47db06
      depend authored
      ## What changes were proposed in this pull request?
      Set the log level to debug when checking shouldRollover.
      
      ## How was this patch tested?
      It's tested manually.
      
      Author: depend <depend@gmail.com>
      
      Closes #12931 from depend/master.
    • [SPARK-15134][EXAMPLE] Indent SparkSession builder patterns and update binary_classification_metrics_example.py · 2c170dd3
      Dongjoon Hyun authored
      
      ## What changes were proposed in this pull request?
      
      This issue addresses the comments in SPARK-15031 and also fixes java-linter errors.
      - Use multiline format in SparkSession builder patterns (sketched below).
      - Update `binary_classification_metrics_example.py` to use `SparkSession`.
      - Fix Java linter errors (in SPARK-13745, SPARK-15031, and others so far).
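      The multiline builder format being applied (sketch; the app name is illustrative):
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession
        .builder
        .appName("BinaryClassificationMetricsExample")
        .getOrCreate()
      ```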
      
      ## How was this patch tested?
      
      Passed the Jenkins tests and ran `dev/lint-java` manually.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12911 from dongjoon-hyun/SPARK-15134.
    • [SPARK-15135][SQL] Make sure SparkSession thread safe · bb9991de
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Went through SparkSession and its members and fixed non-thread-safe classes used by SparkSession
      
      ## How was this patch tested?
      
      Existing unit tests
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #12915 from zsxwing/spark-session-thread-safe.
    • [SPARK-15072][SQL][REPL][EXAMPLES] Remove SparkSession.withHiveSupport · ed6f3f8a
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      Remove the `withHiveSupport` method of `SparkSession`; use `enableHiveSupport` instead (see the sketch below).
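      The replacement pattern (sketch):
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      // Instead of the removed SparkSession.withHiveSupport(...):
      val spark = SparkSession
        .builder
        .enableHiveSupport()
        .getOrCreate()
      ```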
      
      ## How was this patch tested?
      ran tests locally
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #12851 from techaddict/SPARK-15072.
    • [SPARK-14124][SQL][FOLLOWUP] Implement Database-related DDL Commands · 8cba57a7
      gatorsmile authored
      #### What changes were proposed in this pull request?
      
      First, a few test cases failed on Mac OS X because the property value of `java.io.tmpdir` does not include a trailing slash on some platforms, while Hive always removes the trailing slash. For example, from what I found on the web:
      ```
      Win NT  --> C:\TEMP\
      Win XP  --> C:\TEMP
      Solaris --> /var/tmp/
      Linux   --> /var/tmp
      ```
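      One way the tests can normalize `java.io.tmpdir` (an assumption, mirroring Hive's removal of the trailing slash):
      
      ```scala
      // Drop any trailing separator so paths compare equal across platforms.
      val tmpDir = System.getProperty("java.io.tmpdir").stripSuffix("/")
      ```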
      Second, a couple of test cases are added to verify that the commands work properly.
      
      #### How was this patch tested?
      Added a test case and corrected the previous test cases.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      Closes #12081 from gatorsmile/mkdir.
    • [MINOR][BUILD] Adds spark-warehouse/ to .gitignore · 63db2bd2
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      Adds spark-warehouse/ to `.gitignore`.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #12929 from liancheng/gitignore-spark-warehouse.
    • [SPARK-15110] [SPARKR] Implement repartitionByColumn for SparkR DataFrames · 22226fcc
      NarineK authored
      ## What changes were proposed in this pull request?
      
      Implement repartitionByColumn on DataFrame. This will allow us to run R functions on each partition identified by column groups with the dapply() method. (A Scala sketch of the analogous API follows.)
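      Sketch (the column name is illustrative):
      
      ```scala
      import org.apache.spark.sql.functions.col
      
      // Partitions are identified by the values of the given column(s).
      val repartitioned = df.repartition(col("dept"))
      ```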
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: NarineK <narine.kokhlikyan@us.ibm.com>
      
      Closes #12887 from NarineK/repartitionByColumns.
    • [SPARK-15148][SQL] Upgrade Univocity library from 2.0.2 to 2.1.0 · ac12b35d
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-15148
      
      Mainly, it improves performance by roughly 30%-40% according to the [release note](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.1.0). The details of the purpose are described in the JIRA.
      
      This PR upgrades Univocity library from 2.0.2 to 2.1.0.
      
      ## How was this patch tested?
      
      Existing tests should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12923 from HyukjinKwon/SPARK-15148.
    • [SPARK-14139][SQL] RowEncoder should preserve schema nullability · 55cc1c99
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      The problem: in `RowEncoder`, we use `Invoke` to get the field of an external row, which loses the nullability information. This PR creates a `GetExternalRowField` expression so that we can preserve the nullability info (see the sketch below).
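      A sketch using the (internal) `RowEncoder` API:
      
      ```scala
      import org.apache.spark.sql.catalyst.encoders.RowEncoder
      import org.apache.spark.sql.types._
      
      val schema = new StructType().add("a", IntegerType, nullable = false)
      // With this change, the encoder keeps nullable = false for field "a".
      val encoder = RowEncoder(schema)
      ```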
      
      TODO: simplify the null handling logic in `RowEncoder` to remove the many if branches, in a follow-up PR.
      
      ## How was this patch tested?
      
      new tests in `RowEncoderSuite`
      
      Note that this PR takes over https://github.com/apache/spark/pull/11980 with a little simplification, so all credit should go to koertkuipers.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      Author: Koert Kuipers <koert@tresata.com>
      
      Closes #12364 from cloud-fan/nullable.
    • [SPARK-14915][CORE] Don't re-queue a task if another attempt has already succeeded · 77361a43
      Jason Moore authored
      ## What changes were proposed in this pull request?
      
      Don't re-queue a task if another attempt has already succeeded. This currently happens when a speculative task is denied the right to commit its result because another copy of the task has already succeeded.
      
      ## How was this patch tested?
      
      I'm running a job which has a fair bit of skew in the processing time across the tasks for speculation to trigger in the last quarter (default settings), causing many commit denied exceptions to be thrown.  Previously, these tasks were then being retried over and over again until the stage possibly completes (despite using compute resources on these superfluous tasks).  With this change (applied to the 1.6 branch), they no longer retry and the stage completes successfully without these extra task attempts.
      
      Author: Jason Moore <jasonmoore2k@outlook.com>
      
      Closes #12751 from jasonmoore2k/SPARK-14915.
    • [SPARK-14589][SQL] Enhance DB2 JDBC Dialect docker tests · 10443022
      Luciano Resende authored
      ## What changes were proposed in this pull request?
      
      Enhance the DB2 JDBC dialect docker tests, as they had some issues on a previous merge that caused some tests to fail.
      
      ## How was this patch tested?
      
      By running the integration tests locally.
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #12348 from lresende/SPARK-14589.
    • [SPARK-15106][PYSPARK][ML] Add PySpark package doc for ML component & remove "BETA" · 4c0d827c
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Copy the package documentation from Scala/Java to Python for the ML package and remove the BETA tags. Not super sure if we want to keep the BETA tag, but since we are making this package the default it seems like the time to remove it (happy to put it back if we want to keep it BETA).
      
      ## How was this patch tested?
      
      Python documentation built locally as HTML and text and verified output.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12883 from holdenk/SPARK-15106-add-pyspark-package-doc-for-ml.
    • [SPARK-12154] Upgrade to Jersey 2 · b7fdc23c
      mcheah authored
      ## What changes were proposed in this pull request?
      
      Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things.
      
      ## How was this patch tested?
      
      I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications.
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #12715 from mccheah/feature/upgrade-jersey.
    • [SPARK-15123] upgrade org.json4s to 3.2.11 version · 592fc455
      Lining Sun authored
      ## What changes were proposed in this pull request?
      
      We hit this issue when using Snowplow in our Spark applications: Snowplow requires json4s version 3.2.11, while Spark still uses the years-old version 3.2.10. The change upgrades the json4s jar to 3.2.11.
      
      ## How was this patch tested?
      
      We built Spark jar and successfully ran our applications in local and cluster modes.
      
      Author: Lining Sun <lining@gmail.com>
      
      Closes #12901 from liningalex/master.
    • [SPARK-15045] [CORE] Remove dead code in TaskMemoryManager.cleanUpAllAllocatedMemory for pageTable · 1a5c6fce
      Abhinav Gupta authored
      ## What changes were proposed in this pull request?
      
      Removed the dead code as suggested.
      
      Author: Abhinav Gupta <abhi.951990@gmail.com>
      
      Closes #12829 from abhi951990/master.
    • [SPARK-15132][MINOR][SQL] Debug log for generated code should be printed with proper indentation · 1a9b3415
      Kousuke Saruta authored
      ## What changes were proposed in this pull request?
      
      Similar to #11990, GenerateOrdering and GenerateColumnAccessor should print debug log for generated code with proper indentation.
      
      ## How was this patch tested?
      
      Manually checked.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #12908 from sarutak/SPARK-15132.