  1. Aug 15, 2017
    • [SPARK-17742][CORE] Handle child process exit in SparkLauncher. · cba826d0
      Marcelo Vanzin authored
      Currently the launcher handle does not monitor the child spark-submit
      process it launches; this means that if the child exits with an error,
      the handle's state will never change, and the caller will never learn
      that the application has failed.
      
      This change adds code to monitor the child process, and changes the
      handle state appropriately when the child process exits.
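
      A hedged sketch (not part of this commit) of how a caller might observe the new behaviour through the public launcher API; the app resource, main class and master below are placeholders:

      ```
      import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

      // Placeholder resource/class/master values; assumes the Spark 2.x launcher API.
      val handle = new SparkLauncher()
        .setAppResource("/path/to/app.jar")
        .setMainClass("com.example.MyApp")
        .setMaster("local[*]")
        .startApplication(new SparkAppHandle.Listener {
          override def stateChanged(h: SparkAppHandle): Unit = {
            // With child-process monitoring, a crashing spark-submit should now surface
            // a terminal state (e.g. FAILED) here instead of the handle staying unchanged.
            println(s"state changed: ${h.getState}")
          }
          override def infoChanged(h: SparkAppHandle): Unit = ()
        })
      ```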
      
      Tested with added unit tests.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18877 from vanzin/SPARK-17742.
    • [SPARK-18464][SQL][FOLLOWUP] support old table which doesn't store schema in table properties · 14bdb25f
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      This is a follow-up of https://github.com/apache/spark/pull/15900 , to fix one more bug:
      When the table schema is empty and needs to be inferred at runtime, we should not resolve parent plans before the schema has been inferred; otherwise the parent plans are resolved against an empty schema and may produce wrong results for something like `select *`.
      
      The fix logic is: introduce `UnresolvedCatalogRelation` as a placeholder. Then we replace it with `LogicalRelation` or `HiveTableRelation` during analysis, so that it's guaranteed that we won't resolve parent plans until the schema has been inferred.
      
      ## How was this patch tested?
      
      regression test
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #18907 from cloud-fan/bug.
    • [SPARK-19471][SQL] AggregationIterator does not initialize the generated... · bc990258
      donnyzone authored
      [SPARK-19471][SQL] AggregationIterator does not initialize the generated result projection before using it
      
      ## What changes were proposed in this pull request?
      
      This is a follow-up PR that moves the test case in PR-18920 (https://github.com/apache/spark/pull/18920) to DataFrameAggregateSuite.
      
      ## How was this patch tested?
      unit test
      
      Author: donnyzone <wellfengzhu@gmail.com>
      
      Closes #18946 from DonnyZone/branch-19471-followingPR.
    • [SPARK-21732][SQL] Lazily init hive metastore client · 12411b5e
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR changes the code to lazily initialize the Hive metastore client, so that we can create a SparkSession without talking to the Hive metastore server.
      
      It's pretty helpful when you have configured a Hive metastore server but it's down: you can still start the Spark shell to debug.
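
      A hedged sketch (not from the patch) of the behaviour this enables, assuming an otherwise standard Hive-enabled build:

      ```
      import org.apache.spark.sql.SparkSession

      // Building a Hive-enabled session no longer requires the metastore to be reachable;
      // the metastore client is only created when it is first needed.
      val spark = SparkSession.builder()
        .appName("lazy-metastore-demo")   // placeholder app name
        .enableHiveSupport()
        .getOrCreate()

      spark.range(10).show()              // works even if the metastore server is down
      // A metastore-backed call such as spark.sql("SHOW TABLES") is what finally
      // triggers (and may fail on) the client initialization.
      ```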
      
      ## How was this patch tested?
      
      The new unit test.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #18944 from zsxwing/hive-lazy-init.
    • [SPARK-21724][SQL][DOC] Adds since information in the documentation of date functions · 0422ce06
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR adds `since` annotation in documentation so that this can be rendered as below:
      
      <img width="290" alt="2017-08-14 6 54 26" src="https://user-images.githubusercontent.com/6477701/29267050-034c1f64-8122-11e7-862b-7dfc38e292bf.png">
      
      ## How was this patch tested?
      
      Manually checked the documentation by `cd sql && ./create-docs.sh`.
      Also, Jenkins tests are required.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18939 from HyukjinKwon/add-sinces-date-functions.
    • [SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are successfully removed · 4c3cf1cc
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      We put the staging path to delete into the deleteOnExit cache of `FileSystem` in case the path can't be successfully removed. But when we do remove the path successfully, we don't remove it from the cache. We should, to keep the cache from growing indefinitely.
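
      A hedged sketch of the Hadoop FileSystem calls involved (not the patch itself; the staging path is a placeholder):

      ```
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      val fs = FileSystem.get(new Configuration())
      val staging = new Path("/tmp/.hive-staging_example")   // placeholder path

      fs.deleteOnExit(staging)             // safety net in case the delete below fails
      if (fs.delete(staging, true)) {
        fs.cancelDeleteOnExit(staging)     // removed successfully, so drop it from the cache
      }
      ```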
      
      ## How was this patch tested?
      
      Added a test.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #18934 from viirya/SPARK-21721.
  2. Aug 14, 2017
    • [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files · 282f00b4
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Directly writing a snapshot file may generate a partial file. This PR changes it to write to a temp file and then rename it to the target file.
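
      A hedged sketch of the write-temp-then-rename pattern described above (illustrative names, not the actual state store code):

      ```
      import org.apache.hadoop.fs.{FileSystem, Path}

      def writeSnapshotAtomically(fs: FileSystem, target: Path, bytes: Array[Byte]): Unit = {
        val temp = new Path(target.getParent, s".${target.getName}.tmp")
        val out = fs.create(temp, true)
        try out.write(bytes) finally out.close()
        // Only a fully written file becomes visible under the final snapshot name.
        if (!fs.rename(temp, target)) {
          throw new java.io.IOException(s"Failed to rename $temp to $target")
        }
      }
      ```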
      
      ## How was this patch tested?
      
      Jenkins.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #18928 from zsxwing/SPARK-21696.
    • [SPARK-19471][SQL] AggregationIterator does not initialize the generated... · fbc26925
      donnyzone authored
      [SPARK-19471][SQL] AggregationIterator does not initialize the generated result projection before using it
      
      ## What changes were proposed in this pull request?
      
      Recently, we have also encountered such NPE issues in our production environment as described in:
      https://issues.apache.org/jira/browse/SPARK-19471
      
      This issue can be reproduced by the following examples:
      ```
      val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

      // HashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
      df.groupBy("x").agg(rand(), sum("y")).show()

      // ObjectHashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
      df.groupBy("x").agg(rand(), collect_list("y")).show()

      // SortAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false && SQLConf.USE_OBJECT_HASH_AGG.key=false
      df.groupBy("x").agg(rand(), collect_list("y")).show()
      ```
      
      This PR is based on PR-16820 (https://github.com/apache/spark/pull/16820) and adds test cases for all aggregation paths. We want to push it forward.
      
      > When AggregationIterator generates result projection, it does not call the initialize method of the Projection class. This will cause a runtime NullPointerException when the projection involves nondeterministic expressions.
      
      ## How was this patch tested?
      
      unit test
      verified in production environment
      
      Author: donnyzone <wellfengzhu@gmail.com>
      
      Closes #18920 from DonnyZone/Branch-spark-19471.
    • [MINOR][SQL][TEST] no uncache table in joinsuite test · 0326b69c
      caoxuewen authored
      ## What changes were proposed in this pull request?
      
      At present, the test("broadcasted hash outer join operator selection") case caches testData2 with _CACHE TABLE_ but never runs _uncache table_ on it, which can be confusing.
      In addition, in the JoinSuite test cases, clearing cached tables is already handled by SharedSQLContext via _spark.sharedState.cacheManager.clearCache_, so we do not need to uncache the table explicitly.
      Let's fix it. Thanks.
      
      ## How was this patch tested?
      Existing test cases.
      
      Author: caoxuewen <cao.xuewen@zte.com.cn>
      
      Closes #18914 from heary-cao/uncache_table.
    • [SPARK-21658][SQL][PYSPARK] Add default None for value in na.replace in PySpark · 0fcde87a
      byakuinss authored
      ## What changes were proposed in this pull request?
      JIRA issue: https://issues.apache.org/jira/browse/SPARK-21658
      
      Add a default of None for value in `na.replace` since `DataFrame.replace` and `DataFrameNaFunctions.replace` are aliases.
      
      The default values are the same now.
      ```
      >>> df = sqlContext.createDataFrame([('Alice', 10, 80.0)])
      >>> df.replace({"Alice": "a"}).first()
      Row(_1=u'a', _2=10, _3=80.0)
      >>> df.na.replace({"Alice": "a"}).first()
      Row(_1=u'a', _2=10, _3=80.0)
      ```
      
      ## How was this patch tested?
      Existing tests.
      
      cc viirya
      
      Author: byakuinss <grace.chinhanyu@gmail.com>
      
      Closes #18895 from byakuinss/SPARK-21658.
    • [SPARK-21563][CORE] Fix race condition when serializing TaskDescriptions and adding jars · 6847e93c
      Andrew Ash authored
      ## What changes were proposed in this pull request?
      
      Fix the race condition when serializing TaskDescriptions and adding jars by keeping the set of jars and files for a TaskSet constant across the lifetime of the TaskSet. Otherwise TaskDescription serialization can produce an invalid result when new files/jars are added concurrently while the TaskDescription is being serialized.
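
      A hedged, illustrative sketch of the snapshot idea (the class and field names below are not Spark's actual internals):

      ```
      import scala.collection.mutable

      // Take an immutable snapshot of the driver's mutable jar/file maps when the TaskSet
      // is created, so concurrent sc.addJar()/addFile() calls cannot change what gets
      // serialized into this TaskSet's TaskDescriptions.
      class TaskSetResources(addedJars: mutable.Map[String, Long], addedFiles: mutable.Map[String, Long]) {
        val jars: Map[String, Long]  = addedJars.toMap    // immutable snapshot
        val files: Map[String, Long] = addedFiles.toMap   // immutable snapshot
      }
      ```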
      
      ## How was this patch tested?
      
      Additional unit test ensures jars/files contained in the TaskDescription remain constant throughout the lifetime of the TaskSet.
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #18913 from ash211/SPARK-21563.
    • [SPARK-21176][WEB UI] Format worker page links to work with proxy · 34d2134a
      Anderson Osagie authored
      ## What changes were proposed in this pull request?
      
      Several links on the worker page do not work correctly with the proxy because:
      1) They don't acknowledge the proxy
      2) They use relative paths (unlike the Application Page which uses full paths)
      
      This patch fixes that. It also fixes a mistake in the proxy's Location header parsing which caused it to incorrectly handle redirects.
      
      ## How was this patch tested?
      
      I checked the validity of every link with the proxy on and off.
      
      Author: Anderson Osagie <osagie@gmail.com>
      
      Closes #18915 from aosagie/fix/proxy-links.
  3. Aug 13, 2017
    • [MINOR][SQL] Additional test case for CheckCartesianProducts rule · 5596ce83
      aokolnychyi authored
      ## What changes were proposed in this pull request?
      
      While discovering optimization rules and their test coverage, I did not find any tests for `CheckCartesianProducts` in the Catalyst folder. So, I decided to create a new test suite. Once I finished, I found a test in `JoinSuite` for this functionality so feel free to discard this change if it does not make much sense. The proposed test suite covers a few additional use cases.
      
      Author: aokolnychyi <anton.okolnychyi@sap.com>
      
      Closes #18909 from aokolnychyi/check-cartesian-join-tests.
  4. Aug 12, 2017
    • [SPARK-21709][BUILD] sbt 0.13.16 and some plugin updates · c0e333db
      pj.fanning authored
      ## What changes were proposed in this pull request?
      
      Update sbt version to 0.13.16. I think this is a useful stepping stone to getting to sbt 1.0.0.
      
      ## How was this patch tested?
      
      Existing Build.
      
      Author: pj.fanning <pj.fanning@workday.com>
      
      Closes #18921 from pjfanning/SPARK-21709.
    • [SPARK-17025][ML][PYTHON] Persistence for Pipelines with Python-only Stages · 35db3b9f
      Ajay Saini authored
      ## What changes were proposed in this pull request?
      
      Implemented a Python-only persistence framework for pipelines containing stages that cannot be saved using Java.
      
      ## How was this patch tested?
      
      Created a custom Python-only UnaryTransformer, included it in a Pipeline, and saved/loaded the pipeline. The loaded pipeline was compared against the original using _compare_pipelines() in tests.py.
      
      Author: Ajay Saini <ajays725@gmail.com>
      
      Closes #18888 from ajaysaini725/PythonPipelines.
    • [MINOR][BUILD] Download RAT and R version info over HTTPS; use RAT 0.12 · b0bdfce9
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      This is trivial, but bugged me. We should download software over HTTPS.
      And we can use RAT 0.12 while at it to pick up bug fixes.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #18927 from srowen/Rat012.
  5. Aug 11, 2017
    • [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alone cluster mode · da8c59bd
      Stavros Kontopoulos authored
      Fixes the --packages flag for the standalone case in cluster mode. Adds to the driver classpath the jars that are resolved via ivy, along with any other jars passed via `spark.jars`. Jars not resolved by ivy are downloaded explicitly to a tmp folder on the driver node. Similar code exists in SparkSubmit, so we refactored part of it for reuse in the DriverWrapper class, which is responsible for launching the driver in standalone cluster mode.
      
      Note: In stand-alone mode `spark.jars` contains the user jar so it can be fetched later on at the executor side.
      
      Tested manually by submitting a driver in cluster mode within a standalone cluster and checking whether dependencies were resolved on the driver side.
      
      Author: Stavros Kontopoulos <st.kontopoulos@gmail.com>
      
      Closes #18630 from skonto/fix_packages_stand_alone_cluster.
    • [SPARK-19122][SQL] Unnecessary shuffle+sort added if join predicates ordering... · 7f16c691
      Tejas Patil authored
      [SPARK-19122][SQL] Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order
      
      ## What changes were proposed in this pull request?
      
      Jira : https://issues.apache.org/jira/browse/SPARK-19122
      
      `leftKeys` and `rightKeys` in `SortMergeJoinExec` are altered based on the ordering of join keys in the child's `outputPartitioning`. This is done every time `requiredChildDistribution` is invoked during query planning.
      
      ## How was this patch tested?
      
      - Added new test case
      - Existing tests
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #16985 from tejasapatil/SPARK-19122_join_order_shuffle.
    • [SPARK-21595] Separate thresholds for buffering and spilling in ExternalAppendOnlyUnsafeRowArray · 94439997
      Tejas Patil authored
      ## What changes were proposed in this pull request?
      
      [SPARK-21595](https://issues.apache.org/jira/browse/SPARK-21595) reported that there is excessive spilling to disk because the default spill threshold for `ExternalAppendOnlyUnsafeRowArray` is quite small for the WINDOW operator. The old behaviour of the WINDOW operator (pre https://github.com/apache/spark/pull/16909) would hold data in an array for the first 4096 records, after which it would switch to `UnsafeExternalSorter` and start spilling to disk after reaching `spark.shuffle.spill.numElementsForceSpillThreshold` (or earlier if memory ran short due to excessive consumers).
      
      Currently, both the switch from in-memory storage to `UnsafeExternalSorter` and the point at which `UnsafeExternalSorter` spills to disk are controlled by a single threshold for `ExternalAppendOnlyUnsafeRowArray`. This PR aims to separate them to allow more granular control.
      
      ## How was this patch tested?
      
      Added unit tests
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #18843 from tejasapatil/SPARK-21595.
    • [SPARK-21519][SQL] Add an option to the JDBC data source to initialize the target DB environment · 0377338b
      LucaCanali authored
      Add an option to the JDBC data source to initialize the environment of the remote database session
      
      ## What changes were proposed in this pull request?
      
      This proposes an option for the JDBC data source, tentatively called "sessionInitStatement", to implement the session-initialization functionality present, for example, in the Sqoop connector for Oracle (see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements). After each database session is opened to the remote DB, and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block in the case of Oracle).
      
      See also https://issues.apache.org/jira/browse/SPARK-21519
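
      A hedged usage sketch, assuming the option keeps the tentative name and an existing SparkSession `spark`; the URL, credentials, table and init statement are placeholders for an Oracle-style setup:

      ```
      val jdbcDF = spark.read
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")
        .option("dbtable", "MYSCHEMA.MYTABLE")
        .option("user", "scott")
        .option("password", "tiger")
        .option("sessionInitStatement",
          """BEGIN execute immediate 'alter session set "_serial_direct_read"=true'; END;""")
        .load()
      ```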
      
      ## How was this patch tested?
      
      Manually tested using Spark SQL data source and Oracle JDBC
      
      Author: LucaCanali <luca.canali@cern.ch>
      
      Closes #18724 from LucaCanali/JDBC_datasource_sessionInitStatement.
    • [SPARK-21675][WEBUI] Add a navigation bar at the bottom of the Details for Stage Page · 2387f1e3
      Kent Yao authored
      ## What changes were proposed in this pull request?
      
      1. In the Spark Web UI, the Details for Stage page doesn't have a navigation bar at the bottom. When we scroll down to the bottom, it would be better to have a nav bar right there so we can jump wherever we want.
      2. Executor ID is not equivalent to Host; it may be better to separate them, and then we can group the tasks by host.
      
      ## How was this patch tested?
      manually test
      ![wx20170809-165606](https://user-images.githubusercontent.com/8326978/29114161-f82b4920-7d25-11e7-8d0c-0c036b008a78.png)
      
      
      Author: Kent Yao <yaooqinn@hotmail.com>
      
      Closes #18893 from yaooqinn/SPARK-21675.
  6. Aug 10, 2017
    • [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog · 584c7f14
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch removes the unused SessionCatalog.getTableMetadataOption and ExternalCatalog.getTableOption.
      
      ## How was this patch tested?
      Removed the test case.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #18912 from rxin/remove-getTableOption.
    • [SPARK-21638][ML] Fix RF/GBT Warning message error · ca695585
      Peng Meng authored
      ## What changes were proposed in this pull request?
      
      When training an RF model, there are many warning messages like this:
      
      > WARN  RandomForest: Tree learning is using approximately 268492800 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 2622 nodes in this iteration.
      
      This warning message is unnecessary and the reported data is not accurate.
      
      Actually, the warning is shown whenever not all of the nodes can be split in one iteration. In most cases not all nodes can be split in a single iteration, so in most cases this warning is shown for each iteration.
      
      ## How was this patch tested?
      The existing UT
      
      Author: Peng Meng <peng.meng@intel.com>
      
      Closes #18868 from mpjlu/fixRFwarning.
    • [SPARK-21669] Internal API for collecting metrics/stats during FileFormatWriter jobs · 95ad960c
      Adrian Ionescu authored
      ## What changes were proposed in this pull request?
      
      This patch introduces an internal interface for tracking metrics and/or statistics on data on the fly, as it is being written to disk during a `FileFormatWriter` job and partially reimplements SPARK-20703 in terms of it.
      
      The interface basically consists of 3 traits:
      - `WriteTaskStats`: just a tag for classes that represent statistics collected during a `WriteTask`
        The only constraint it adds is that the class should be `Serializable`, as instances of it will be collected on the driver from all executors at the end of the `WriteJob`.
      - `WriteTaskStatsTracker`: a trait for classes that can actually compute statistics based on tuples that are processed by a given `WriteTask` and eventually produce a `WriteTaskStats` instance.
      - `WriteJobStatsTracker`: a trait for classes that act as containers of `Serializable` state that's necessary for instantiating `WriteTaskStatsTracker` on executors and finally process the resulting collection of `WriteTaskStats`, once they're gathered back on the driver.
      
      Potential future use of this interface is e.g. CBO stats maintenance during `INSERT INTO table ... ` operations.
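
      A heavily simplified, hedged sketch of how the three traits might relate; the method names and signatures below are illustrative only, not the actual internal API:

      ```
      import org.apache.spark.sql.catalyst.InternalRow

      trait WriteTaskStats extends Serializable

      trait WriteTaskStatsTracker {
        def newRow(row: InternalRow): Unit       // observe each tuple as it is written
        def getFinalStats(): WriteTaskStats      // produce this task's stats at the end
      }

      trait WriteJobStatsTracker extends Serializable {
        def newTaskInstance(): WriteTaskStatsTracker        // instantiated on each executor
        def processStats(stats: Seq[WriteTaskStats]): Unit  // consumed back on the driver
      }
      ```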
      
      ## How was this patch tested?
      Existing tests for SPARK-20703 exercise the new code: `hive/SQLMetricsSuite`, `sql/JavaDataFrameReaderWriterSuite`, etc.
      
      Author: Adrian Ionescu <adrian@databricks.com>
      
      Closes #18884 from adrian-ionescu/write-stats-tracker-api.
  7. Aug 09, 2017
    • [SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None · 84454d7d
      bravo-zhang authored
      ## What changes were proposed in this pull request?
      
      Currently `df.na.replace("*", Map[String, String]("NULL" -> null))` throws an exception.
      This PR enables passing null/None as value in the replacement map in DataFrame.replace().
      Note that the replacement map keys and values should still be the same type, while the values can have a mix of null/None and that type.
      For example, this PR enables the following operations:
      `df.na.replace("*", Map[String, String]("NULL" -> null))`(scala)
      `df.na.replace("*", Map[Any, Any](60 -> null, 70 -> 80))`(scala)
      `df.na.replace('Alice', None)`(python)
      `df.na.replace([10, 20])`(python, the replacement value defaults to None)
      One use case could be: replace all the empty strings with null/None because they were incorrectly generated, and then drop all null/None data:
      `df.na.replace("*", Map("" -> null)).na.drop()`(scala)
      `df.replace(u'', None).dropna()`(python)
      
      ## How was this patch tested?
      
      Scala unit test.
      Python doctest and unit test.
      
      Author: bravo-zhang <mzhang1230@gmail.com>
      
      Closes #18820 from bravo-zhang/spark-14932.
    • [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator · c06f3f5a
      peay authored
      ## What changes were proposed in this pull request?
      
      This modification increases the timeout for `serveIterator` (which is not dynamically configurable). This fixes timeout issues in pyspark when using `collect` and similar functions, in cases where Python may take more than a couple seconds to connect.
      
      See https://issues.apache.org/jira/browse/SPARK-21551
      
      ## How was this patch tested?
      
      Ran the tests.
      
      cc rxin
      
      Author: peay <peay@protonmail.com>
      
      Closes #18752 from peay/spark-21551.
    • [SPARK-21587][SS] Added filter pushdown through watermarks. · 0fb73253
      Jose Torres authored
      ## What changes were proposed in this pull request?
      
      Push filter predicates through EventTimeWatermark if they're deterministic and do not reference the watermarked attribute. (This is similar but not identical to the logic for pushing through UnaryNode.)
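
      A hedged sketch of a query shape this affects; the rate source and column names are placeholders:

      ```
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().appName("watermark-pushdown-sketch").getOrCreate()
      import spark.implicits._

      val events = spark.readStream.format("rate").load()     // columns: timestamp, value
        .withColumnRenamed("timestamp", "eventTime")

      val filtered = events
        .withWatermark("eventTime", "10 minutes")
        .filter($"value" % 2 === 0)   // deterministic and does not reference eventTime,
                                      // so it is now eligible to be pushed below the watermark
      ```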
      
      ## How was this patch tested?
      unit tests
      
      Author: Jose Torres <joseph-torres@databricks.com>
      
      Closes #18790 from joseph-torres/SPARK-21587.
    • [SPARK-21504][SQL] Add spark version info into table metadata · 2d799d08
      gatorsmile authored
      ## What changes were proposed in this pull request?
      This PR adds the Spark version info to the table metadata; the value is assigned when the table is created. It can help users find out which version of Spark was used to create the table.
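
      A hedged sketch of how a user might check this, assuming an existing SparkSession `spark`; the table name is a placeholder:

      ```
      spark.sql("CREATE TABLE version_demo (id INT) USING parquet")
      spark.sql("DESCRIBE TABLE EXTENDED version_demo").show(100, truncate = false)
      // The detailed output should now include the Spark version that created the table.
      ```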
      
      ## How was this patch tested?
      N/A
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #18709 from gatorsmile/addVersion.
    • [SPARK-21276][CORE] Update lz4-java to the latest (v1.4.0) · b78cf13b
      Takeshi Yamamuro authored
      ## What changes were proposed in this pull request?
      This PR updates `lz4-java` to the latest release (v1.4.0) and removes the custom `LZ4BlockInputStream`. We currently use the custom `LZ4BlockInputStream` to read concatenated byte streams in shuffle, but this functionality has been implemented in the latest lz4-java (https://github.com/lz4/lz4-java/pull/105), so we can update to the latest release and remove the custom `LZ4BlockInputStream`.
      
      Major diffs between the latest release and v1.3.0 in the master are as follows (https://github.com/lz4/lz4-java/compare/62f7547abb0819d1ca1e669645ee1a9d26cd60b0...6d4693f56253fcddfad7b441bb8d917b182efa2d);
      - fixed NPE in XXHashFactory similarly
      - Don't place resources in default package to support shading
      - Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed
      - Try to load lz4-java from java.library.path, then fallback to bundled
      - Add ppc64le binary
      - Add s390x JNI binding
      - Add basic LZ4 Frame v1.5.0 support
      - enable aarch64 support for lz4-java
      - Allow unsafeInstance() for ppc64le architecture
      - Add unsafeInstance support for AArch64
      - Support 64-bit JNI build on Solaris
      - Avoid over-allocating a buffer
      - Allow EndMark to be incompressible for LZ4FrameInputStream.
      - Concat byte stream
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Takeshi Yamamuro <yamamuro@apache.org>
      
      Closes #18883 from maropu/SPARK-21276.
    • [SPARK-21665][CORE] Need to close resources after use · 83fe3b5e
      vinodkc authored
      ## What changes were proposed in this pull request?
      Resources are now closed after use in Core (SparkSubmitArguments.scala), the Spark launcher (AbstractCommandBuilder.java), and resource-managers/YARN (Client.scala).
      
      ## How was this patch tested?
      No new test cases added; existing unit tests pass.
      
      Author: vinodkc <vinod.kc.in@gmail.com>
      
      Closes #18880 from vinodkc/br_fixresouceleak.
    • [SPARK-21663][TESTS] test("remote fetch below max RPC message size") should... · 6426adff
      10087686 authored
      [SPARK-21663][TESTS] test("remote fetch below max RPC message size") should call masterTracker.stop() in MapOutputTrackerSuite
      
      Signed-off-by: 10087686 <wang.jiaochun@zte.com.cn>
      
      ## What changes were proposed in this pull request?
      After the unit tests end, masterTracker.stop() should be called to free resources.
      
      ## How was this patch tested?
      Ran unit tests.
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: 10087686 <wang.jiaochun@zte.com.cn>
      
      Closes #18867 from wangjiaochun/mapout.
    • [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search · b35660dd
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Update breeze to 0.13.2 for an emergency bugfix in the strong Wolfe line search
      https://github.com/scalanlp/breeze/pull/651
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #18797 from WeichenXu123/update-breeze.
    • [SPARK-21176][WEB UI] Use a single ProxyServlet to proxy all workers and applications · ae8a2b14
      Anderson Osagie authored
      ## What changes were proposed in this pull request?
      
      Currently, each application and each worker creates their own proxy servlet. Each proxy servlet is backed by its own HTTP client and a relatively large number of selector threads. This is excessive but was fixed (to an extent) by https://github.com/apache/spark/pull/18437.
      
      However, a single HTTP client (backed by a single selector thread) should be enough to handle all proxy requests. This PR creates a single proxy servlet no matter how many applications and workers there are.
      
      ## How was this patch tested?
      The unit tests for rewriting proxied locations and headers were updated. I then spun up a 100-node cluster to ensure that proxying worked correctly.
      
      jiangxb1987 Please let me know if there's anything else I can do to help push this thru. Thanks!
      
      Author: Anderson Osagie <osagie@gmail.com>
      
      Closes #18499 from aosagie/fix/minimize-proxy-threads.
    • [SPARK-21503][UI] Spark UI shows incorrect task status for a killed Executor Process · f016f5c8
      pgandhi authored
      The Executors tab on the Spark UI page shows a task as completed when the executor process running that task is killed using the kill command.
      Added the ExecutorLostFailure case, which was previously missing; without it, the default case would be executed and the task would be marked as completed. This case covers all situations where the executor's connection to the Spark driver was lost, e.g. due to killing the executor process, a lost network connection, etc.
      
      ## How was this patch tested?
      Manually Tested the fix by observing the UI change before and after.
      Before:
      <img width="1398" alt="screen shot-before" src="https://user-images.githubusercontent.com/22228190/28482929-571c9cea-6e30-11e7-93dd-728de5cdea95.png">
      After:
      <img width="1385" alt="screen shot-after" src="https://user-images.githubusercontent.com/22228190/28482964-8649f5ee-6e30-11e7-91bd-2eb2089c61cc.png">
      
      
      Author: pgandhi <pgandhi@yahoo-inc.com>
      Author: pgandhi999 <parthkgandhi9@gmail.com>
      
      Closes #18707 from pgandhi999/master.
    • [SPARK-21608][SPARK-9221][SQL] Window rangeBetween() API should allow literal boundary · 031910b0
      Xingbo Jiang authored
      ## What changes were proposed in this pull request?
      
      The Window rangeBetween() API should allow literal boundaries; that means the window range frame can be computed over double/date/timestamp values.
      
      Example of the use case can be:
      ```
      SELECT
      	val_timestamp,
      	cate,
      	avg(val_timestamp) OVER(PARTITION BY cate ORDER BY val_timestamp RANGE BETWEEN CURRENT ROW AND interval 23 days 4 hours FOLLOWING)
      FROM testData
      ```
      
      This PR refactors the Window `rangeBetween` and `rowsBetween` APIs; legacy user code should still be valid.
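
      A hedged sketch of the pre-existing long-based frame API that stays valid after the refactor; `testData` and the column names are placeholders:

      ```
      import org.apache.spark.sql.expressions.Window
      import org.apache.spark.sql.functions.avg

      val w = Window
        .partitionBy("cate")
        .orderBy("val_timestamp")
        .rangeBetween(Window.unboundedPreceding, Window.currentRow)

      val result = testData.withColumn("running_avg", avg("val_timestamp").over(w))
      ```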
      
      ## How was this patch tested?
      
      Add new test cases both in `DataFrameWindowFunctionsSuite` and in `window.sql`.
      
      Author: Xingbo Jiang <xingbo.jiang@databricks.com>
      
      Closes #18814 from jiangxb1987/literal-boundary.
  8. Aug 08, 2017