  1. May 19, 2017
      [SPARK-20506][DOCS] 2.2 migration guide · b5d8d9ba
      Nick Pentreath authored
Update the ML guide for the `2.1` -> `2.2` migration, and update the previous version's migration guide section.
      
      ## How was this patch tested?
      
      Build doc locally.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #17996 from MLnick/SPARK-20506-2.2-migration-guide.
      [SPARKR] Fix bad examples in DataFrame methods and style issues · 7f203a24
      Wayne Zhang authored
      ## What changes were proposed in this pull request?
      Some examples in the DataFrame methods are syntactically wrong, even though they are pseudo code. Fix these and some style issues.
      
      Author: Wayne Zhang <actuaryzhang@uber.com>
      
      Closes #18003 from actuaryzhang/sparkRDoc3.
      [SPARKR][DOCS][MINOR] Use consistent names in rollup and cube examples · 2d90c04f
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Rename `carsDF` to `df` in SparkR `rollup` and `cube` examples.
      
      ## How was this patch tested?
      
      Manual tests.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17988 from zero323/cube-docs.
[SPARK-20763][SQL] The `month` and `day` functions return values which are not what we expect. · ea3b1e35
      liuxian authored
      ## What changes were proposed in this pull request?
```
spark-sql> select month("1582-09-28");
10
```

For this case, the expected result is 9, but it is 10.

```
spark-sql> select day("1582-04-18");
28
```

For this case, the expected result is 18, but it is 28.

When the date is before "1582-10-04", the `month` and `day` functions return values which are not what we expect.
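
A hedged sketch (spark-shell style, using the DataFrame API rather than the spark-sql shell) of the behavior expected after the fix; the expected values come from the cases above, everything else is illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{dayofmonth, lit, month, to_date}

val spark = SparkSession.builder().master("local[*]").appName("month-day-sketch").getOrCreate()

spark.range(1)
  .select(
    month(to_date(lit("1582-09-28"))),     // expected: 9
    dayofmonth(to_date(lit("1582-04-18"))) // expected: 18
  ).show()
```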
      
      ## How was this patch tested?
      unit tests
      
      Author: liuxian <liu.xian3@zte.com.cn>
      
      Closes #17997 from 10110346/wip_lx_0516.
      [SPARK-20751][SQL] Add built-in SQL Function - COT · bff021df
      Yuming Wang authored
      ## What changes were proposed in this pull request?
      
      Add built-in SQL Function - COT.
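
A hedged usage sketch, assuming COT follows the usual definition cot(x) = 1 / tan(x); requires a SparkSession named `spark` in scope (e.g., in spark-shell):

```scala
// cot(1) = 1 / tan(1) ≈ 0.6420926159343306
spark.sql("SELECT COT(1)").show()
```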
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Yuming Wang <wgyumg@gmail.com>
      
      Closes #17999 from wangyum/SPARK-20751.
      [SPARK-20759] SCALA_VERSION in _config.yml should be consistent with pom.xml · dba2ca2c
      liuzhaokun authored
      [https://issues.apache.org/jira/browse/SPARK-20759](https://issues.apache.org/jira/browse/SPARK-20759)
SCALA_VERSION is 2.11.7 in _config.yml but 2.11.8 in pom.xml, so SCALA_VERSION in _config.yml should be made consistent with pom.xml.
      
      Author: liuzhaokun <liu.zhaokun@zte.com.cn>
      
      Closes #17992 from liu-zhaokun/new.
      [SPARK-20607][CORE] Add new unit tests to ShuffleSuite · f398640d
      caoxuewen authored
      ## What changes were proposed in this pull request?
      
This PR makes two updates:
1. Adds new unit tests: testing is performed when there is no shuffle stage, verifying that shuffle does not generate the data file and the index files.
2. Modifies the '[SPARK-4085] rerun map stage if reduce stage cannot find its local shuffle file' unit test: `parallelize` uses 1 partition rather than 2, and the index file is checked and deleted.
      
      ## How was this patch tested?
The new unit tests.
      
      Author: caoxuewen <cao.xuewen@zte.com.cn>
      
      Closes #17868 from heary-cao/ShuffleSuite.
      [SPARK-20773][SQL] ParquetWriteSupport.writeFields is quadratic in number of fields · 3f2cd51e
      tpoterba authored
      Fix quadratic List indexing in ParquetWriteSupport.
      
I noticed this function while profiling some code today. It showed up as a significant factor in a table with twenty columns; with hundreds of columns, it could dominate any other function call.
      
      ## What changes were proposed in this pull request?
      
      The writeFields method iterates from 0 until number of fields, indexing into rootFieldWriters for each element. rootFieldWriters is a List, so indexing is a linear operation. The complexity of the writeFields method is thus quadratic in the number of fields.
      
      Solution: explicitly convert rootFieldWriters to Array (implicitly converted to WrappedArray) for constant-time indexing.
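
A minimal, self-contained sketch (hypothetical data, not the actual ParquetWriteSupport code) contrasting List indexing, which is O(i) per access and O(n^2) for the whole loop, with Array indexing, which is O(1) per access and O(n) overall:

```scala
object IndexingSketch {
  def main(args: Array[String]): Unit = {
    val n = 50000
    val fieldWriters: List[Long] = List.tabulate(n)(_.toLong)
    val writersArray: Array[Long] = fieldWriters.toArray // one-time conversion

    var i = 0
    var sum = 0L
    while (i < n) {
      // fieldWriters(i) here would walk the list from the head on each access;
      // Array indexing is constant-time, keeping the whole loop linear.
      sum += writersArray(i)
      i += 1
    }
    println(sum)
  }
}
```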
      
      ## How was this patch tested?
      
      This is a one-line change for performance reasons.
      
      Author: tpoterba <tpoterba@broadinstitute.org>
      Author: Tim Poterba <tpoterba@gmail.com>
      
      Closes #18005 from tpoterba/tpoterba-patch-1.
      [SPARK-20798] GenerateUnsafeProjection should check if a value is null before calling the getter · ce8edb8b
      Ala Luszczak authored
      ## What changes were proposed in this pull request?
      
      GenerateUnsafeProjection.writeStructToBuffer() did not honor the assumption that the caller must make sure that a value is not null before using the getter. This could lead to various errors. This change fixes that behavior.
      
      Example of code generated before:
      ```scala
      /* 059 */         final UTF8String fieldName = value.getUTF8String(0);
      /* 060 */         if (value.isNullAt(0)) {
      /* 061 */           rowWriter1.setNullAt(0);
      /* 062 */         } else {
      /* 063 */           rowWriter1.write(0, fieldName);
      /* 064 */         }
      ```
      
      Example of code generated now:
      ```scala
      /* 060 */         boolean isNull1 = value.isNullAt(0);
      /* 061 */         UTF8String value1 = isNull1 ? null : value.getUTF8String(0);
      /* 062 */         if (isNull1) {
      /* 063 */           rowWriter1.setNullAt(0);
      /* 064 */         } else {
      /* 065 */           rowWriter1.write(0, value1);
      /* 066 */         }
      ```
      
      ## How was this patch tested?
      
      Adds GenerateUnsafeProjectionSuite.
      
      Author: Ala Luszczak <ala@databricks.com>
      
      Closes #18030 from ala/fix-generate-unsafe-projection.
  2. May 18, 2017
      [DSTREAM][DOC] Add documentation for kinesis retry configurations · 92580bd0
      Yash Sharma authored
      ## What changes were proposed in this pull request?
      
The changes were merged as part of https://github.com/apache/spark/pull/17467, but the documentation was missed somewhere in the review iterations. This adds the documentation where it belongs.
      
      ## How was this patch tested?
      Docs. Not tested.
      
      cc budde , brkyvz
      
      Author: Yash Sharma <ysharma@atlassian.com>
      
      Closes #18028 from yssharma/ysharma/kinesis_retry_docs.
      [SPARK-20364][SQL] Disable Parquet predicate pushdown for fields having dots in the names · 8fb3d5c6
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
This is an alternative workaround that simply avoids predicate pushdown for columns having dots in their names. It is a different approach from https://github.com/apache/spark/pull/17680.

The downside of this PR is that it does not push down filters on columns having dots in their names at all (neither at the record level nor at the row-group level), whereas the downside of the approach in that PR is that it does not use Parquet's API properly, supporting this case in a hacky way.

I assume we prefer the safe way here, using the Parquet API properly; this closes that PR, as here we are basically just avoiding the problem.

This looks like a simple workaround and is probably fine given the problem is arguably a rather corner case (although it might end up reading whole row groups under the hood, neither option looks best).

Currently, if there are dots in the column name, predicate pushdown fails in Parquet.
      
      **With dots**
      
      ```scala
      val path = "/tmp/abcde"
      Seq(Some(1), None).toDF("col.dots").write.parquet(path)
      spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
      ```
      
      ```
      +--------+
      |col.dots|
      +--------+
      +--------+
      ```
      
      **Without dots**
      
      ```scala
      val path = "/tmp/abcde"
      Seq(Some(1), None).toDF("coldots").write.parquet(path)
      spark.read.parquet(path).where("`coldots` IS NOT NULL").show()
      ```
      
      ```
      +-------+
      |coldots|
      +-------+
      |      1|
      +-------+
      ```
      
      **After**
      
      ```scala
      val path = "/tmp/abcde"
      Seq(Some(1), None).toDF("col.dots").write.parquet(path)
      spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
      ```
      
      ```
      +--------+
      |col.dots|
      +--------+
      |       1|
      +--------+
      ```
      
      ## How was this patch tested?
      
      Unit tests added in `ParquetFilterSuite`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18000 from HyukjinKwon/SPARK-20364-workaround.
      [SPARK-20796] the location of start-master.sh in spark-standalone.md is wrong · 99452df4
      liuzhaokun authored
      [https://issues.apache.org/jira/browse/SPARK-20796](https://issues.apache.org/jira/browse/SPARK-20796)
The location of start-master.sh in spark-standalone.md should be "sbin/start-master.sh" rather than "bin/start-master.sh".
      
      Author: liuzhaokun <liu.zhaokun@zte.com.cn>
      
      Closes #18027 from liu-zhaokun/sbin.
[SPARK-20779][EXAMPLES] The ASF header is placed in an incorrect location in some files. · 4779b86b
      zuotingbing authored
      ## What changes were proposed in this pull request?
      
The ASF license header is not at the top of some files, and it would be best to update these places so the header placement is consistent with other files.
      
      ## How was this patch tested?
      
      manual tests
      
      Author: zuotingbing <zuo.tingbing9@zte.com.cn>
      
      Closes #18012 from zuotingbing/spark-license.
      [INFRA] Close stale PRs · 5d2750aa
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
This PR proposes to close PRs that are:

  - inactive on review comments for more than a month
  - WIP and inactive for more than a month
  - failing the Jenkins build and inactive for more than a month
  - suggested to be closed, with no comment objecting to that
  - obviously inappropriate (e.g., targeting Branch 0.5)

To make sure, I left a comment on each PR about a week ago, and I did not get a response back from the author of any of the PRs below:
      
      Closes #11129
      Closes #12085
      Closes #12162
      Closes #12419
      Closes #12420
      Closes #12491
      Closes #13762
      Closes #13837
      Closes #13851
      Closes #13881
      Closes #13891
      Closes #13959
      Closes #14091
      Closes #14481
      Closes #14547
      Closes #14557
      Closes #14686
      Closes #15594
      Closes #15652
      Closes #15850
      Closes #15914
      Closes #15918
      Closes #16285
      Closes #16389
      Closes #16652
      Closes #16743
      Closes #16893
      Closes #16975
      Closes #17001
      Closes #17088
      Closes #17119
      Closes #17272
      Closes #17971
      
      Added:
      Closes #17778
      Closes #17303
      Closes #17872
      
      ## How was this patch tested?
      
      N/A
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18017 from HyukjinKwon/close-inactive-prs.
      [SPARK-20700][SQL] InferFiltersFromConstraints stackoverflows for query (v2) · b7aac15d
      Xingbo Jiang authored
      ## What changes were proposed in this pull request?
      
In the previous approach we used `aliasMap` to link an `Attribute` to an expression of potentially the form `f(a, b)`, but we only searched `expressions` and `children.expressions` for this, which is not enough when an `Alias` lies deep in the logical plan. In that case, we can't generate the valid equivalent constraint classes and thus fail to prevent the recursive deductions.

We fix this problem by collecting all `Alias`es from the logical plan.

## How was this patch tested?

No additional test case is added, but one existing test case is modified to cover this situation.
      
      Author: Xingbo Jiang <xingbo.jiang@databricks.com>
      
      Closes #18020 from jiangxb1987/inferConstrants.
  3. May 17, 2017
  4. May 16, 2017
      [SPARK-20690][SQL] Subqueries in FROM should have alias names · 7463a88b
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
We add missing attributes into Filter in the Analyzer, but we shouldn't do it through subqueries like this:

    select 1 from (select 1 from onerow t1 LIMIT 1) where t1.c1=1

This query works in the current codebase. However, the outer WHERE clause shouldn't be able to refer to the `t1.c1` attribute.

The root cause is that we previously allowed subqueries in FROM to have no alias names. This is confusing and isn't supported by various databases such as MySQL, Postgres, and Oracle, so we shouldn't support it either.
      
      ## How was this patch tested?
      
      Jenkins tests.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #17935 from viirya/SPARK-20690.
      [SQL][TRIVIAL] Lower parser log level to debug · 69bb7715
      Herman van Hovell authored
      ## What changes were proposed in this pull request?
Currently the parser logs the query it is parsing at `info` level. This is too high; this PR lowers the log level to `debug`.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Herman van Hovell <hvanhovell@databricks.com>
      
      Closes #18006 from hvanhovell/lower_parser_log_level.
      [SPARK-20140][DSTREAM] Remove hardcoded kinesis retry wait and max retries · 38f4e869
      Yash Sharma authored
      ## What changes were proposed in this pull request?
      
This pull request proposes to remove the hardcoded Amazon Kinesis values MIN_RETRY_WAIT_TIME_MS and MAX_RETRIES.

This change is critical for Kinesis checkpoint recovery when the Kinesis-backed RDD is huge.
The following happens in a typical Kinesis recovery:
- Kinesis throttles the large number of requests made while recovering
- retries in case of throttling cannot recover due to the small wait period
- Kinesis throttles per second, so the wait period should be configurable for recovery

The patch picks up the Spark Kinesis configs from (see the sketch after this list):
- spark.streaming.kinesis.retry.wait.time
- spark.streaming.kinesis.retry.max.attempts
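
A hedged sketch of supplying these settings; the config keys come from this PR's description, while the values and the duration format are assumptions:

```scala
import org.apache.spark.SparkConf

// Hypothetical values for the two configs introduced by this PR.
val conf = new SparkConf()
  .set("spark.streaming.kinesis.retry.wait.time", "500ms") // assumed duration format
  .set("spark.streaming.kinesis.retry.max.attempts", "5")
```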
      
      Jira : https://issues.apache.org/jira/browse/SPARK-20140
      
      ## How was this patch tested?
      
      Modified the KinesisBackedBlockRDDSuite.scala to run kinesis tests with the modified configurations. Wasn't able to test the patch with actual throttling.
      
      Author: Yash Sharma <ysharma@atlassian.com>
      
      Closes #17467 from yssharma/ysharma/spark-kinesis-retries.
[SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit · 6f62e9d9
      Kazuaki Ishizaki authored
      ## What changes were proposed in this pull request?
      
When an expression for `df.filter()` has many nodes (e.g. 400), the Java bytecode for the generated code exceeds the 64KB size limit. This produces a Java exception and, as a result, execution fails.
This PR lets execution continue by calling `Expression.eval()` with code generation disabled when such an exception has been caught (sketched below).
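
A minimal, self-contained sketch (hypothetical names, not the actual Spark code) of the fallback pattern: try the codegen path, and fall back to interpreted evaluation if compilation throws:

```scala
object FallbackSketch {
  // Stands in for the codegen path; simulates hitting the 64KB limit.
  def compiledEval(x: Int): Int =
    throw new UnsupportedOperationException("generated method exceeds 64KB")

  // Stands in for interpreted Expression.eval().
  def interpretedEval(x: Int): Int = x + 1

  def eval(x: Int): Int =
    try compiledEval(x)
    catch { case _: Exception => interpretedEval(x) }

  def main(args: Array[String]): Unit =
    println(eval(41)) // prints 42 via the interpreted fallback
}
```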
      
      ## How was this patch tested?
      
      Add a test suite into `DataFrameSuite`
      
      Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
      
      Closes #17087 from kiszk/SPARK-19372.
      [SPARK-20529][CORE] Allow worker and master work with a proxy server · 9150bca4
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
In the current code, when a worker connects to a master, the master sends its address to the worker. The worker then saves this address and uses it to reconnect in case of failure. However, sometimes this address is not correct: if there is a proxy between the master and the worker, the address the master sends is not the address of the proxy.

In this PR, the master address the worker used is sent to the master, and the master just replies this address back; the worker then uses this address to reconnect in case of failure. In other words, the worker uses the master address configured on the worker side, if possible, rather than the one set on the master side (a hedged sketch of the handshake messages follows).

There is still one potential issue, though. When a master is restarted or takes over leadership, the worker will use the address sent from the master to connect. If there is still a proxy between the master and the worker, that address may be wrong. However, there is no way to figure this out from the worker alone.
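
A hedged sketch of the handshake idea, with hypothetical message names (not Spark's actual RPC classes):

```scala
// The worker reports the master address it actually used ...
case class RegisterWorker(workerId: String, masterAddressUsed: String)
// ... and the master echoes that address back, so the worker reconnects
// through the same (possibly proxied) address it already knows works.
case class RegisteredWorker(masterAddressForReconnect: String)
```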
      
      ## How was this patch tested?
      
      The new added unit test.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #17821 from zsxwing/SPARK-20529.
      [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all performance PRs · 25b4f41d
      Nick Pentreath authored
      Small clean ups from #17742 and #17845.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #17919 from MLnick/SPARK-20677-als-perf-followup.
      [SPARK-20553][ML][PYSPARK] Update ALS examples with recommend-all methods · 6af7b43b
      Nick Pentreath authored
      Update ALS examples illustrating use of "recommendForAllX" methods.
      
      ## How was this patch tested?
      Built and ran examples locally
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #17950 from MLnick/SPARK-20553-update-als-examples.
  5. May 15, 2017
      [SPARK-20501][ML] ML 2.2 QA: New Scala APIs, docs · dbe81633
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Review new Scala APIs introduced in 2.2.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #17934 from yanboliang/spark-20501.
      [SPARK-20707][ML] ML deprecated APIs should be removed in major release. · d4022d49
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
Before 2.2, MLlib removed APIs deprecated in the last feature/minor release. But from Spark 2.2, we decided to remove deprecated APIs only in a major release, so we need to change the corresponding annotations to tell users those APIs will be removed in 3.0.
Meanwhile, this fixes bugs in the ML documents: the original ML docs couldn't show deprecation annotations in the `MLWriter`- and `MLReader`-related classes; we correct that in this PR.
      
      Before:
      ![image](https://cloud.githubusercontent.com/assets/1962026/25939889/f8c55f20-3666-11e7-9fa2-0605bfb3ed06.png)
      
      After:
      ![image](https://cloud.githubusercontent.com/assets/1962026/25939870/e9b0d5be-3666-11e7-9765-5e04885e4b32.png)
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #17946 from yanboliang/spark-20707.
      [SPARK-20588][SQL] Cache TimeZone instances. · c8c878a4
      Takuya UESHIN authored
      ## What changes were proposed in this pull request?
      
Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the TimeZone class, concurrent calls of this method become a bottleneck.
This especially happens when casting a string value containing timezone info to a timestamp value, which uses `DateTimeUtils.stringToTimestamp()` and gets a TimeZone instance at the call site.

This PR caches the generated TimeZone instances to avoid the synchronization (a minimal sketch follows).
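
A minimal sketch (hypothetical names, not the actual Spark code) of caching TimeZone instances so that only cache misses pay for the synchronized `TimeZone.getTimeZone` call:

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap

object TimeZoneCacheSketch {
  private val cache = new ConcurrentHashMap[String, TimeZone]()

  private val computeTimeZone =
    new java.util.function.Function[String, TimeZone] {
      override def apply(id: String): TimeZone = TimeZone.getTimeZone(id)
    }

  // Concurrent reads hit the map without synchronizing on the TimeZone class.
  def getTimeZone(id: String): TimeZone =
    cache.computeIfAbsent(id, computeTimeZone)
}
```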
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Takuya UESHIN <ueshin@databricks.com>
      
      Closes #17933 from ueshin/issues/SPARK-20588.
      [SPARK-20735][SQL][TEST] Enable cross join in TPCDSQueryBenchmark · bbd163d5
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Since [SPARK-17298](https://issues.apache.org/jira/browse/SPARK-17298), some queries (q28, q61, q77, q88, q90) in the test suites fail with a message "_Use the CROSS JOIN syntax to allow cartesian products between these relations_".
      
      This benchmark is used as a reference model for Spark TPC-DS, so this PR aims to enable the correct configuration in `TPCDSQueryBenchmark.scala`.
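
A hedged sketch of the configuration involved, assuming the benchmark relies on the Spark 2.x flag `spark.sql.crossJoin.enabled` to allow cartesian products:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("TPCDSQueryBenchmark-sketch")
  .config("spark.sql.crossJoin.enabled", "true") // allow cartesian products
  .getOrCreate()
```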
      
      ## How was this patch tested?
      
      Manual. (Run TPCDSQueryBenchmark)
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #17977 from dongjoon-hyun/SPARK-20735.
      [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior · 499ba2cb
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      Timeout and state data are two independent entities and should be settable independently. Therefore, in the same call of the user-defined function, one should be able to set the timeout before initializing the state and also after removing the state. Whether timeouts can be set or not, should not depend on the current state, and vice versa.
      
      However, a limitation of the current implementation is that state cannot be null while timeout is set. This is checked lazily after the function call has completed.
      
      ## How was this patch tested?
- Updated existing unit tests that test the behavior of GroupState.setTimeout*** with respect to the current state
- Added new tests that verify the disallowed cases where state is undefined but timeout is set.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #17957 from tdas/SPARK-20717.
      [SPARK-17729][SQL] Enable creating hive bucketed tables · d2416925
      Tejas Patil authored
      ## What changes were proposed in this pull request?
      
Hive allows inserting data into a bucketed table without guaranteeing bucketed-ness and sorted-ness, based on these two configs: `hive.enforce.bucketing` and `hive.enforce.sorting`.

What does this PR achieve?
- Spark will disallow users from writing output to Hive bucketed tables by default (given that the output won't adhere to Hive's semantics).
- If a user still wants to write to a Hive bucketed table, the only resort is to set `hive.enforce.bucketing=false` and `hive.enforce.sorting=false`, which means the user does NOT care about bucketing guarantees.

Changes done in this PR:
- Extract the table's bucketing information in `HiveClientImpl`
- While writing table info to the metastore, `HiveClientImpl` now populates the bucketing information in the Hive `Table` object
- `InsertIntoHiveTable` allows inserts to a bucketed table only if both `hive.enforce.bucketing` and `hive.enforce.sorting` are `false`

The ability to create bucketed tables will enable adding test cases to Spark while I add more changes related to Hive bucketing support (a hedged example follows). Design doc for Hive bucketing support: https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit#
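
A hedged example (hypothetical table and column names; requires a Hive-enabled SparkSession named `spark`) of the kind of Hive bucketed table this change lets Spark create:

```scala
// Standard Hive bucketed-table DDL.
spark.sql("""
  CREATE TABLE bucketed_tbl (key INT, value STRING)
  CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS
""")
```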
      
      ## How was this patch tested?
      - Added test for creating bucketed and sorted table.
      - Added test to ensure that INSERTs fail if strict bucket / sort is enforced
      - Added test to ensure that INSERTs can go through if strict bucket / sort is NOT enforced
      - Added test to validate that bucketing information shows up in output of DESC FORMATTED
      - Added test to ensure that `SHOW CREATE TABLE` works for hive bucketed tables
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #17644 from tejasapatil/SPARK-17729_create_bucketed_table.
      [SPARK-20716][SS] StateStore.abort() should not throw exceptions · 271175e2
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
StateStore.abort() should make a best-effort attempt to clean up temporary resources. It should not throw errors, especially because it is called in a TaskCompletionListener, where an error could hide earlier real errors in the task. A minimal sketch of the pattern follows.
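
A minimal, self-contained sketch (hypothetical names, not the actual StateStore code) of the best-effort pattern described above:

```scala
import scala.util.control.NonFatal

object AbortSketch {
  // Stands in for deleting temporary state-store resources; may itself fail.
  def cleanupTempResources(): Unit =
    throw new java.io.IOException("temp resource already gone")

  // Best effort: swallow non-fatal cleanup errors instead of rethrowing,
  // so an abort never masks the task's original failure.
  def abort(): Unit =
    try cleanupTempResources()
    catch { case NonFatal(e) => println(s"ignored during abort: $e") }

  def main(args: Array[String]): Unit = abort()
}
```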
      
      ## How was this patch tested?
      No unit test.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #17958 from tdas/SPARK-20716.
      [SPARK-12837][SPARK-20666][CORE][FOLLOWUP] getting name should not fail if... · e1aaab1e
      Wenchen Fan authored
      [SPARK-12837][SPARK-20666][CORE][FOLLOWUP] getting name should not fail if accumulator is garbage collected
      
      ## What changes were proposed in this pull request?
      
After https://github.com/apache/spark/pull/17596, we no longer send an internal accumulator's name to the executor side, and always look up the accumulator name in `AccumulatorContext`.

This causes a regression if the accumulator has already been garbage collected; this PR fixes that by still sending the accumulator name for `SQLMetrics`.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #17931 from cloud-fan/bug.
      [SPARK-20669][ML] LoR.family and LDA.optimizer should be case insensitive · 9970aa09
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
Make the param `family` in LoR and `optimizer` in LDA case-insensitive.
      
      ## How was this patch tested?
      updated tests
      
      yanboliang
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #17910 from zhengruifeng/lr_family_lowercase.
      [SPARK-20730][SQL] Add an optimizer rule to combine nested Concat · b0888d1a
      Takeshi Yamamuro authored
      ## What changes were proposed in this pull request?
This PR adds a new optimizer rule to combine nested Concat expressions. Master supports the pipeline operator '||' for concatenating strings as of #17711 (this PR is a follow-up). Since the parser currently generates nested Concat expressions, the optimizer needs to combine them (see the sketch below).
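
A hedged sketch of the combining idea on a hypothetical, simplified expression tree (not Catalyst's actual classes):

```scala
sealed trait Expr
case class Literal(s: String) extends Expr
case class Concat(children: Seq[Expr]) extends Expr

// "a" || "b" || "c" parses as Concat(Seq(Concat(Seq(a, b)), c));
// combining flattens it into a single Concat(Seq(a, b, c)).
def combine(e: Expr): Expr = e match {
  case Concat(cs) =>
    Concat(cs.map(combine).flatMap {
      case Concat(inner) => inner
      case other         => Seq(other)
    })
  case other => other
}
```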
      
      ## How was this patch tested?
      Added tests in `CombineConcatSuite` and `SQLQueryTestSuite`.
      
      Author: Takeshi Yamamuro <yamamuro@apache.org>
      
      Closes #17970 from maropu/SPARK-20730.
      [SPARK-20720][WEB-UI] Executor Summary' should show the exact number, 'Removed... · 8da6e8b1
      guoxiaolong authored
[SPARK-20720][WEB-UI] 'Executor Summary' should show the exact number, and 'Removed Executors' should display the specific number, in the Application Page
      
      ## What changes were proposed in this pull request?
      
When the number of Spark worker executors is large, displaying the specific number helps us analyze and observe things in the Spark UI.

Although this is a small improvement, it is indeed very valuable.
      
      After fix:
      ![executor1](https://cloud.githubusercontent.com/assets/26266482/25986597/2d8e4386-3723-11e7-9c24-e5bff17c26e2.png)
      
      ## How was this patch tested?
      
      manual tests
      
      Author: guoxiaolong <guo.xiaolong1@zte.com.cn>
      Author: 郭小龙 10207633 <guo.xiaolong1@zte.com.cn>
      Author: guoxiaolongzte <guo.xiaolong1@zte.com.cn>
      
      Closes #17961 from guoxiaolongzte/SPARK-20720.
      [SPARK-20705][WEB-UI] The sort function can not be used in the master page... · 99d57999
      guoxiaolong authored
      [SPARK-20705][WEB-UI] The sort function can not be used in the master page when you use Firefox or Google Chrome.
      
      ## What changes were proposed in this pull request?
When you open the master page in Firefox or Google Chrome, the browser console shows errors, but there is no problem in IE.
e.g.
![error](https://cloud.githubusercontent.com/assets/26266482/25946143/74467a5c-367c-11e7-8f9f-d3585b1aea88.png)

My Firefox version is 48.0.2.
My Google Chrome version is 49.0.2623.75 m.
      
      ## How was this patch tested?
      
      manual tests
      
      Author: guoxiaolong <guo.xiaolong1@zte.com.cn>
      Author: 郭小龙 10207633 <guo.xiaolong1@zte.com.cn>
      Author: guoxiaolongzte <guo.xiaolong1@zte.com.cn>
      
      Closes #17952 from guoxiaolongzte/SPARK-20705.
  6. May 14, 2017
      [SPARK-20726][SPARKR] wrapper for SQL broadcast · 5a799fd8
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Adds R wrapper for `o.a.s.sql.functions.broadcast`.
      - Renames `broadcast` to `broadcast_`.
      
      ## How was this patch tested?
      
      Unit tests, check `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17965 from zero323/SPARK-20726.