  1. Feb 25, 2016
    • Terence Yim's avatar
      [SPARK-13441][YARN] Fix NPE in yarn Client.createConfArchive method · fae88af1
      Terence Yim authored
      ## What changes were proposed in this pull request?
      
      Instead of using the result of File.listFiles() directly, which may be null and cause an NPE, check for null first. If it is null, log a warning instead.
      
      ## How was this patch tested?
      
      Ran the ./dev/run-tests locally
      Tested manually on a cluster
      
      Author: Terence Yim <terence@cask.co>
      
      Closes #11337 from chtyim/fixes/SPARK-13441-null-check.
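Java's `File.listFiles()` returns null (rather than an empty array) when a directory cannot be listed, which is what caused the NPE here. A minimal Python sketch of the same guard pattern, with a hypothetical `list_files` callable standing in for the Java API:

```python
import logging

def safe_list(list_files, path):
    """Guard against an API that, like Java's File.listFiles(), returns
    None instead of raising when a directory cannot be listed.

    `list_files` is a hypothetical callable standing in for the Java API.
    """
    entries = list_files(path)
    if entries is None:
        logging.warning("Failed to list files under directory %s", path)
        return []
    return entries
```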
      fae88af1
    • Oliver Pierson's avatar
      [SPARK-13444][MLLIB] QuantileDiscretizer chooses bad splits on large DataFrames · 6f8e835c
      Oliver Pierson authored
      ## What changes were proposed in this pull request?
      
      Change line 113 of QuantileDiscretizer.scala to
      
      `val requiredSamples = math.max(numBins * numBins, 10000.0)`
      
      so that `requiredSamples` is a `Double`.  This will fix the division in line 114 which currently results in zero if `requiredSamples < dataset.count`
      
      ## How was this patch tested?
      Manual tests. I was having problems using QuantileDiscretizer with a dataset, and after making this change QuantileDiscretizer behaves as expected.
      
      Author: Oliver Pierson <ocp@gatech.edu>
      Author: Oliver Pierson <opierson@umd.edu>
      
      Closes #11319 from oliverpierson/SPARK-13444.
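The pitfall here is integer division: when both operands are integers, the sampling fraction truncates to zero whenever the required sample count is smaller than the row count. A small Python illustration (variable names are illustrative, not the Scala ones):

```python
num_bins = 10
count = 1_000_000  # stand-in for dataset.count()

# Integer arithmetic: 10000 // 1_000_000 truncates to 0, so the computed
# sampling fraction is zero and no rows get sampled.
required_int = max(num_bins * num_bins, 10000)
bad_fraction = required_int // count

# Promoting the constant to a float (10000.0) keeps the ratio meaningful.
required = max(num_bins * num_bins, 10000.0)
fraction = required / count
```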
      6f8e835c
    • Cheng Lian's avatar
      [SPARK-13473][SQL] Don't push predicate through project with nondeterministic field(s) · 3fa6491b
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      Predicates shouldn't be pushed through project with nondeterministic field(s).
      
      See https://github.com/graphframes/graphframes/pull/23 and SPARK-13473 for more details.
      
      This PR targets master, branch-1.6, and branch-1.5.
      
      ## How was this patch tested?
      
      A test case is added in `FilterPushdownSuite`. It constructs a query plan where a filter is over a project with a nondeterministic field. The optimized query plan shouldn't change in this case.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #11348 from liancheng/spark-13473-no-ppd-through-nondeterministic-project-field.
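Why pushdown is unsafe here: a nondeterministic expression can produce different values depending on how many rows it is evaluated over, so evaluating the filter first changes the project's output. A toy Python sketch, using a shared counter as a stand-in for a nondeterministic column:

```python
import itertools

def nondet_project(rows, counter):
    # The second column is "nondeterministic": its value depends on how
    # many rows have been evaluated so far, modeled by a shared counter
    # (a stand-in for something like rand()).
    return [(r, next(counter)) for r in rows]

rows = [1, 2, 3, 4]

# Filter evaluated AFTER the project: every row reaches the counter first.
correct = [t for t in nondet_project(rows, itertools.count()) if t[0] % 2 == 0]

# Filter pushed BELOW the project: fewer rows reach the counter, so the
# nondeterministic column takes different values.
pushed = nondet_project([r for r in rows if r % 2 == 0], itertools.count())
```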
      3fa6491b
    • Devaraj K's avatar
      [SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0 · 2e44031f
      Devaraj K authored
      Fixed the HTTP server host name/IP issue, i.e. made the HTTP server take the
      configured host name/IP instead of always using '0.0.0.0'.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #11133 from devaraj-kavali/SPARK-13117.
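The change amounts to binding the server socket to the configured host rather than the wildcard address. A minimal Python sketch of the idea (illustrative only; Spark's web UI uses Jetty, not raw sockets):

```python
import socket

def bind_server(configured_host, port=0):
    """Bind to the host the user configured rather than hard-coding the
    wildcard address 0.0.0.0."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((configured_host, port))
    return sock
```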
      2e44031f
    • Reynold Xin's avatar
      [SPARK-13486][SQL] Move SQLConf into an internal package · 2b2c8c33
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch moves SQLConf into org.apache.spark.sql.internal package to make it very explicit that it is internal. Soon I will also submit more API work that creates implementations of interfaces in this internal package.
      
      ## How was this patch tested?
      If it compiles, then the refactoring should work.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11363 from rxin/SPARK-13486.
      2b2c8c33
    • Davies Liu's avatar
      [SPARK-13376] [SPARK-13476] [SQL] improve column pruning · 07f92ef1
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR mostly rewrites the ColumnPruning rule to support most of the SQL logical plans (except those for Dataset).
      
      This PR also fixes a bug in Generate (it should always output UnsafeRow) and adds a regression test for that.
      
      ## How was this patch tested?
      
      This is tested by unit tests, and also tested manually with TPCDS Q78, which could prune all unused columns successfully, improving the performance by 78% (from 22s to 12s).
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11354 from davies/fix_column_pruning.
      07f92ef1
    • huangzhaowei's avatar
      [SPARK-13482][MINOR][CONFIGURATION] Make consistency of the configuration named in TransportConf. · 264533b5
      huangzhaowei authored
      `spark.storage.memoryMapThreshold` has two kinds of values: one is 2*1024*1024 as an integer, and the other is '2m' as a string.
      "2m" is recommended in the documentation, but it will go wrong if the code goes into `TransportConf#memoryMapBytes`.
      
      [Jira](https://issues.apache.org/jira/browse/SPARK-13482)
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #11360 from SaintBacchus/SPARK-13482.
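The inconsistency above comes from two encodings of the same threshold. A sketch of a parser that accepts both forms uniformly (illustrative only; Spark's real byte-string parser supports more units and stricter validation):

```python
def parse_bytes(value):
    """Accept either a plain integer byte count or a suffixed string like
    '2m' or '512k'. Illustrative sketch only, not Spark's actual parser."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    s = str(value).strip().lower()
    if s and s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)
```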
      264533b5
    • Kai Jiang's avatar
      [SPARK-7106][MLLIB][PYSPARK] Support model save/load in Python's FPGrowth · 4d2864b2
      Kai Jiang authored
      ## What changes were proposed in this pull request?
      
      The Python API supports model save/load in FPGrowth.
      JIRA: [https://issues.apache.org/jira/browse/SPARK-7106](https://issues.apache.org/jira/browse/SPARK-7106)
      ## How was this patch tested?
      
      The patch is tested with Python doctest.
      
      Author: Kai Jiang <jiangkai@gmail.com>
      
      Closes #11321 from vectorijk/spark-7106.
      4d2864b2
    • Joseph K. Bradley's avatar
      [SPARK-13479][SQL][PYTHON] Added Python API for approxQuantile · 13ce10e9
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      * Scala DataFrameStatFunctions: Added a version of approxQuantile taking a List instead of an Array, for Python compatibility
      * Python DataFrame and DataFrameStatFunctions: Added approxQuantile
      
      ## How was this patch tested?
      
      * unit test in sql/tests.py
      
      Documentation was copied from the existing approxQuantile exactly.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #11356 from jkbradley/approx-quantile-python.
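The shape of the API (a list of probabilities in, one value per probability out) can be mimicked with an exact, pure-Python stand-in. This sketch is not the real algorithm: Spark's implementation is approximate and actually honors `relativeError`.

```python
def approx_quantile(values, probabilities, relative_error=0.0):
    """Toy, exact stand-in for DataFrame.approxQuantile: one result value
    per requested probability. relative_error is accepted only for
    signature parity and ignored in this sketch."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(int(p * n), n - 1)] for p in probabilities]
```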
      13ce10e9
  2. Feb 24, 2016
    • Michael Armbrust's avatar
      [SPARK-13092][SQL] Add ExpressionSet for constraint tracking · 2b042577
      Michael Armbrust authored
      This PR adds a new abstraction called an `ExpressionSet` which attempts to canonicalize expressions to remove cosmetic differences.  Deterministic expressions that are in the set after canonicalization will always return the same answer given the same input (i.e. false positives should not be possible). However, it is possible that two canonical expressions that are not equal will in fact return the same answer given any input (i.e. false negatives are possible).
      
      ```scala
      val set = ExpressionSet('a + 1 :: 1 + 'a :: Nil)
      
      set.iterator => Iterator('a + 1)
      set.contains('a + 1) => true
      set.contains(1 + 'a) => true
      set.contains('a + 2) => false
      ```
      
      Other relevant changes include:
       - Since this concept overlaps with the existing `semanticEquals` and `semanticHash`, those functions are also ported to this new infrastructure.
       - A memoized `canonicalized` version of the expression is added as a `lazy val` to `Expression` and is used by both `semanticEquals` and `ExpressionSet`.
       - A set of unit tests for `ExpressionSet` are added
       - Tests which expect `semanticEquals` to be less intelligent than it now is are updated.
      
      As a followup, we should consider auditing the places where we do `O(n)` `semanticEquals` operations and replace them with `ExpressionSet`.  We should also consider consolidating `AttributeSet` as a specialized factory for an `ExpressionSet.`
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #11338 from marmbrus/expressionSet.
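A condensed Python sketch of the canonicalization idea, modeling expressions as nested tuples. This only handles commutative reordering; Spark's canonicalization covers far more cosmetic differences.

```python
class ExprSet:
    """Sketch of the ExpressionSet idea: store expressions keyed by a
    canonical form so that cosmetic differences ('a + 1 vs 1 + 'a)
    collapse to one entry. Expressions are nested tuples like ("+", "a", 1)."""

    def __init__(self, exprs=()):
        self._canon = {}
        for e in exprs:
            self.add(e)

    @staticmethod
    def canonicalize(expr):
        if isinstance(expr, tuple) and expr and expr[0] in ("+", "*"):
            # Commutative operators: order operands deterministically.
            ops = sorted((ExprSet.canonicalize(x) for x in expr[1:]), key=repr)
            return (expr[0], *ops)
        return expr

    def add(self, expr):
        # Keep the first representative seen for each canonical form.
        self._canon.setdefault(ExprSet.canonicalize(expr), expr)

    def __contains__(self, expr):
        return ExprSet.canonicalize(expr) in self._canon

    def __iter__(self):
        return iter(self._canon.values())
```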
      2b042577
    • Nong Li's avatar
      [SPARK-13250] [SQL] Update PhysicalRDD to convert to UnsafeRow if using the vectorized scanner. · 5a7af9e7
      Nong Li authored
      Some parts of the engine rely on UnsafeRow, which the vectorized Parquet scanner does not want
      to produce. This adds a conversion in PhysicalRDD. In the case where codegen is used (and the
      scan is the start of the pipeline), there is no requirement to use UnsafeRow. This patch
      updates PhysicalRDD to support codegen, which eliminates the need for the UnsafeRow conversion
      in those cases.
      
      These changes reduce the TPCDS-Q19 query time at the 10 GB scale factor from 9.5 seconds
      to 6.5 seconds.
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #11141 from nongli/spark-13250.
      5a7af9e7
    • Yin Huai's avatar
      [SPARK-13383][SQL] Fix test · cbb0b65a
      Yin Huai authored
      ## What changes were proposed in this pull request?
      
      Reverting SPARK-13376 (https://github.com/apache/spark/commit/d563c8fa01cfaebb5899ff7970115d0f2e64e8d5) affects the test added by SPARK-13383. So, I am fixing the test.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #11355 from yhuai/SPARK-13383-fix-test.
      cbb0b65a
    • Yin Huai's avatar
      [SPARK-13475][TESTS][SQL] HiveCompatibilitySuite should still run in PR... · bc353805
      Yin Huai authored
      [SPARK-13475][TESTS][SQL] HiveCompatibilitySuite should still run in PR builder even if a PR only changes sql/core
      
      ## What changes were proposed in this pull request?
      
      `HiveCompatibilitySuite` should still run in the PR builder even if a PR only changes sql/core. So, I am going to remove the `ExtendedHiveTest` annotation from `HiveCompatibilitySuite`.
      
      https://issues.apache.org/jira/browse/SPARK-13475
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #11351 from yhuai/SPARK-13475.
      bc353805
    • gatorsmile's avatar
      [HOT][TEST] Disable a Test that Requires Nested Union Support. · 5289837a
      gatorsmile authored
      ## What changes were proposed in this pull request?
      Since "[SPARK-13321][SQL] Support nested UNION in parser" has been reverted, we need to disable the test case that depends on it. Thanks!
      
      rxin yhuai marmbrus
      
      ## How was this patch tested?
      
      N/A
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #11352 from gatorsmile/disableTestCase.
      5289837a
    • Wenchen Fan's avatar
      [SPARK-13467] [PYSPARK] abstract python function to simplify pyspark code · a60f9128
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      When we pass a Python function to the JVM side, we also need to send its context, e.g. `envVars`, `pythonIncludes`, `pythonExec`, etc. However, it's annoying to pass around so many parameters in so many places. This PR abstracts the Python function along with its context, to simplify some PySpark code and make the logic clearer.
      
      ## How was this patch tested?
      
      by existing unit tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #11342 from cloud-fan/python-clean.
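The refactoring described above can be sketched as bundling the function and its context into one object. Field names mirror those mentioned in the commit; the class itself is illustrative, not Spark's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class PythonFunction:
    """Bundle a serialized Python function with the context it needs on
    the JVM side, instead of threading each field through every call site.
    Illustrative sketch only."""
    command: bytes
    env_vars: dict = field(default_factory=dict)
    python_includes: list = field(default_factory=list)
    python_exec: str = "python3"

def launch(func):
    # One argument now carries everything the worker launcher needs.
    return (func.python_exec, func.env_vars, func.python_includes)
```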
      a60f9128
    • Reynold Xin's avatar
      Revert "[SPARK-13321][SQL] Support nested UNION in parser" · f92f53fa
      Reynold Xin authored
      This reverts commit 55d6fdf2.
      f92f53fa
    • Reynold Xin's avatar
      65805ab6
    • Reynold Xin's avatar
      Revert "[SPARK-13376] [SQL] improve column pruning" · d563c8fa
      Reynold Xin authored
      This reverts commit e9533b41.
      d563c8fa
    • Reynold Xin's avatar
      Revert "[SPARK-13383][SQL] Keep broadcast hint after column pruning" · 382b27ba
      Reynold Xin authored
      This reverts commit f3739869.
      382b27ba
    • Liang-Chi Hsieh's avatar
      [SPARK-13383][SQL] Keep broadcast hint after column pruning · f3739869
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13383
      
      ## What changes were proposed in this pull request?
      
      When we do column pruning in the Optimizer, we put an additional Project on top of a logical plan. However, when we have already wrapped a BroadcastHint around a logical plan, the added Project will hide the BroadcastHint from later execution.
      
      We should take care of BroadcastHint when we do column pruning.
      
      ## How was this patch tested?
      
      Unit test is added.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11260 from viirya/keep-broadcasthint.
      f3739869
    • Liang-Chi Hsieh's avatar
      [SPARK-13472] [SPARKR] Fix unstable Kmeans test in R · 89301818
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13472
      
      ## What changes were proposed in this pull request?
      
      One Kmeans test in R is unstable and sometimes fails. We should fix it.
      
      ## How was this patch tested?
      
      Unit test is modified in this PR.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11345 from viirya/fix-kmeans-r-test and squashes the following commits:
      
      f959f61 [Liang-Chi Hsieh] Sort resulted clusters.
      89301818
    • Daniel Jalova's avatar
      [SPARK-12759][CORE] Spark should fail fast if --executor-memory is too small for Spark to start · bcfd55fa
      Daniel Jalova authored
      Added an exception to be thrown in UnifiedMemoryManager.scala if the configuration given for executor memory is too low. Also modified the exception message thrown when driver memory is too low.
      
      This patch was tested manually by passing in config options to Spark shell. I also added a test in UnifiedMemoryManagerSuite.scala
      
      Author: Daniel Jalova <djalova@us.ibm.com>
      
      Closes #11255 from djalova/SPARK-12759.
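A sketch of the fail-fast validation described above. The 1.5x-of-reserved-memory threshold and the constant are assumptions of this sketch, not Spark's exact rule:

```python
RESERVED_SYSTEM_MEMORY = 300 * 1024 * 1024  # illustrative constant

def validate_executor_memory(executor_memory_bytes):
    """Fail fast with an actionable message instead of letting the executor
    start and die later. Threshold logic is illustrative only."""
    min_memory = int(RESERVED_SYSTEM_MEMORY * 1.5)
    if executor_memory_bytes < min_memory:
        raise ValueError(
            f"Executor memory {executor_memory_bytes} must be at least "
            f"{min_memory}. Please increase executor memory using the "
            f"--executor-memory option or spark.executor.memory in Spark "
            f"configuration.")
    return executor_memory_bytes
```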
      bcfd55fa
    • Rahul Tanwani's avatar
      [MINOR][MAINTENANCE] Fix typo for the pull request template. · 83142917
      Rahul Tanwani authored
      ## What changes were proposed in this pull request?
      
      (Please fill in changes proposed in this fix)
      
      ## How was this patch tested?
      
      (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
      
      (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
      
      Author: Rahul Tanwani <tanwanirahul@gmail.com>
      
      Closes #11343 from tanwanirahul/pull_request_template.
      83142917
  3. Feb 23, 2016
    • Davies Liu's avatar
      [SPARK-13431] [SQL] [test-maven] split keywords from ExpressionParser.g · 86c852cf
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR pulls all the keywords (and some other rules) out of ExpressionParser.g into KeywordParser.g, because ExpressionParser is too large to compile.
      
      ## How was this patch tested?
      
      unit test, maven build
      
      Closes #11329
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11331 from davies/split_expr.
      86c852cf
    • Davies Liu's avatar
      [SPARK-13376] [SQL] improve column pruning · e9533b41
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR mostly rewrites the ColumnPruning rule to support most of the SQL logical plans (except those for Dataset).
      
      ## How was this patch tested?
      
      This is tested by unit tests, and also tested manually with TPCDS Q78, which could prune all unused columns successfully, improving the performance by 78% (from 22s to 12s).
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11256 from davies/fix_column_pruning.
      e9533b41
    • JeremyNixon's avatar
      [SPARK-10759][ML] update cross validator with include_example · 230bbeaa
      JeremyNixon authored
      This pull request uses `{% include_example %}` to add an example for the Python cross validator to ml-guide.
      
      Author: JeremyNixon <jnixon2@gmail.com>
      
      Closes #11240 from JeremyNixon/pipeline_include_example.
      230bbeaa
    • Xusen Yin's avatar
      [SPARK-13011] K-means wrapper in SparkR · 8d29001d
      Xusen Yin authored
      https://issues.apache.org/jira/browse/SPARK-13011
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #11124 from yinxusen/SPARK-13011.
      8d29001d
    • Timothy Hunter's avatar
      [SPARK-6761][SQL][ML] Fixes to API and documentation of approximate quantiles · 15e30155
      Timothy Hunter authored
      ## What changes were proposed in this pull request?
      
      This continues thunterdb's work on the `approxQuantile` API. It changes the signature of `approxQuantile` from `(col: String, quantile: Double, epsilon: Double): Double` to `(col: String, probabilities: Array[Double], relativeError: Double): Array[Double]` and updates the API doc. It also improves the error message in tests and simplifies the merge algorithm for summaries.
      
      ## How was this patch tested?
      
      Use the same unit tests as before.
      
      Closes #11325
      
      Author: Timothy Hunter <timhunter@databricks.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #11332 from mengxr/SPARK-6761.
      15e30155
    • Davies Liu's avatar
      [SPARK-13373] [SQL] generate sort merge join · 9cdd867d
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      Generates code for SortMergeJoin.
      
      ## How was this patch tested?
      
      Unit tests, plus manual testing with TPCDS Q72, which showed a 70% performance improvement (from 42s to 25s); micro benchmarks only show minor improvements, which may depend on the data distribution and the number of columns.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11248 from davies/gen_smj.
      9cdd867d
    • Davies Liu's avatar
      [SPARK-13329] [SQL] considering output for statistics of logical plan · c481bdf5
      Davies Liu authored
      The current implementation of statistics for UnaryNode does not consider the output (for example, Project may produce far fewer columns than its child); we should consider it to make a better guess.
      
      We usually join with only a few columns from a Parquet table, so the size of the projected plan could be much smaller than the original Parquet files. Having a better guess of the size helps us choose between broadcast join and sort merge join.
      
      After this PR, I saw a few queries choose broadcast join rather than sort merge join without tuning spark.sql.autoBroadcastJoinThreshold for every query, ending up with about 6-8X improvements in end-to-end time.
      
      We use the `defaultSize` of a DataType to estimate the size of a column. Currently, for DecimalType/StringType/BinaryType and UDTs, we over-estimate by far too much (4096 bytes), so this PR changes them to more reasonable values. Here are the new defaultSizes for them:
      
      DecimalType:  8 or 16 bytes, based on the precision
      StringType:  20 bytes
      BinaryType: 100 bytes
      UDT: default size of the underlying SQL type
      
      These numbers are not perfect (it is hard to have a perfect number for them), but they should be better than 4096.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #11210 from davies/statics.
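The per-column estimates above can be sketched as a simple lookup. The 18-digit cutoff for the decimal rule ("8 or 16 bytes, based on the precision") is an assumption of this sketch:

```python
def decimal_size(precision):
    # "8 or 16 bytes, based on the precision"; the cutoff of 18 digits
    # (what fits in a long) is an assumption of this sketch.
    return 8 if precision <= 18 else 16

DEFAULT_SIZES = {"string": 20, "binary": 100}

def estimate_row_size(schema):
    """schema: list of (type_name, precision_or_None) pairs.
    Returns a rough per-row size estimate in bytes."""
    total = 0
    for dtype, precision in schema:
        if dtype == "decimal":
            total += decimal_size(precision)
        else:
            total += DEFAULT_SIZES.get(dtype, 8)  # fixed-width default
    return total
```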
      c481bdf5
    • Michael Armbrust's avatar
      [SPARK-13440][SQL] ObjectType should accept any ObjectType, If should not care about nullability · c5bfe5d2
      Michael Armbrust authored
      The type checking functions of `If` and `UnwrapOption` are fixed to eliminate spurious failures.  `UnwrapOption` was checking for an input of `ObjectType` but `ObjectType`'s accept function was hard coded to return `false`.  `If`'s type check was returning a false negative in the case that the two options differed only by nullability.
      
      Tests added:
       -  an end-to-end regression test is added to `DatasetSuite` for the reported failure.
       - all the unit tests in `ExpressionEncoderSuite` are augmented to also confirm successful analysis.  These tests are actually what pointed out the additional issues with `If` resolution.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #11316 from marmbrus/datasetOptions.
      c5bfe5d2
    • Lianhui Wang's avatar
      [SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab · 9f426339
      Lianhui Wang authored
      andrewor14 squito Dead executors should also be displayed on the Executor tab,
      as follows:
      ![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png)
      
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Andrew Or <andrew@databricks.com>
      
      Closes #10058 from lianhuiwang/SPARK-7729.
      9f426339
    • Grzegorz Chilkiewicz's avatar
      [SPARK-13338][ML] Allow setting 'degree' parameter to 1 for PolynomialExpansion · 5d69eaf0
      Grzegorz Chilkiewicz authored
      Author: Grzegorz Chilkiewicz <grzegorz.chilkiewicz@codilime.com>
      
      Closes #11216 from grzegorz-chilkiewicz/master.
      5d69eaf0
    • zhuol's avatar
      [SPARK-13364] Sort appId as num rather than str in history page. · 4d1e5f92
      zhuol authored
      ## What changes were proposed in this pull request?
      
      The history page currently sorts the appID as a string, which can lead to an unexpected order for cases like "application_11111_9" and "application_11111_20".
      Adding a new sort type called appId-numeric fixes it.
      
      ## How was this patch tested?
      This patch was manually tested with the UI. See the screenshot below:
      ![sortappidbetter](https://cloud.githubusercontent.com/assets/11683054/13185564/7f941a16-d707-11e5-8fb7-0316368d3030.png)
      
      Author: zhuol <zhuol@yahoo-inc.com>
      
      Closes #11259 from zhuoliu/13364.
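The lexicographic-vs-numeric ordering problem is easy to reproduce; a Python sketch of a numeric sort key for IDs of the form `application_<timestamp>_<seq>`:

```python
def app_id_key(app_id):
    """Sort application IDs of the form application_<timestamp>_<seq>
    by their numeric components rather than as plain strings."""
    prefix, timestamp, seq = app_id.rsplit("_", 2)
    return (prefix, int(timestamp), int(seq))

ids = ["application_11111_20", "application_11111_9"]
```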
      4d1e5f92
    • Liang-Chi Hsieh's avatar
      [SPARK-13358] [SQL] Retrieve grep path when do benchmark · 87d7f890
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-13358
      
      When trying to run a benchmark, I found that on my Ubuntu Linux box grep is not in /usr/bin/ but in /bin/. So I wonder if it is better to use `which` to retrieve the grep path.
      
      cc davies
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #11231 from viirya/benchmark-grep-path.
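In Python the same portable lookup is a single standard-library call; the idea matches shelling out to `which`:

```python
import shutil

# Resolve grep from PATH instead of hard-coding /usr/bin/grep, which is
# absent on distros where grep lives in /bin.
grep_path = shutil.which("grep")
```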
      87d7f890
    • jerryshao's avatar
      [SPARK-13220][CORE] deprecate yarn-client and yarn-cluster mode · e99d0170
      jerryshao authored
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #11229 from jerryshao/SPARK-13220.
      e99d0170
    • gatorsmile's avatar
      [SPARK-13263][SQL] SQL Generation Support for Tablesample · 87250580
      gatorsmile authored
      In the parser, the tableSample clause is part of tableSource.
      ```
      tableSource
      init { gParent.pushMsg("table source", state); }
      after { gParent.popMsg(state); }
          : tabname=tableName
          ((tableProperties) => props=tableProperties)?
          ((tableSample) => ts=tableSample)?
          ((KW_AS) => (KW_AS alias=Identifier)
          |
          (Identifier) => (alias=Identifier))?
          -> ^(TOK_TABREF $tabname $props? $ts? $alias?)
          ;
      ```
      
      Two typical query samples using TABLESAMPLE are:
      ```
          "SELECT s.id FROM t0 TABLESAMPLE(10 PERCENT) s"
          "SELECT * FROM t0 TABLESAMPLE(0.1 PERCENT)"
      ```
      
      FYI, the logical plan of a TABLESAMPLE query:
      ```
      sql("SELECT * FROM t0 TABLESAMPLE(0.1 PERCENT)").explain(true)
      
      == Analyzed Logical Plan ==
      id: bigint
      Project [id#16L]
      +- Sample 0.0, 0.001, false, 381
         +- Subquery t0
            +- Relation[id#16L] ParquetRelation
      ```
      
      Thanks! cc liancheng
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      This patch had conflicts when merged, resolved by
      Committer: Cheng Lian <lian@databricks.com>
      
      Closes #11148 from gatorsmile/tablesplitsample.
      87250580
    • movelikeriver's avatar
      [SPARK-13257][IMPROVEMENT] Refine naive Bayes example by checking model after loading it · 5cd3e6f6
      movelikeriver authored
      Refine naive Bayes example by checking model after loading it
      
      Author: movelikeriver <mars.lenjoy@gmail.com>
      
      Closes #11125 from movelikeriver/naive_bayes.
      5cd3e6f6
    • Xiangrui Meng's avatar
      [SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply · 764ca180
      Xiangrui Meng authored
      `GraphImpl.fromExistingRDDs` expects preprocessed vertex RDD as input. We call it in LDA without validating this requirement. So it might introduce errors. Replacing it by `Graph.apply` would be safer and more proper because it is a public API. The tests still pass. So maybe it is safe to use `fromExistingRDDs` here (though it doesn't seem so based on the implementation) or the test cases are special. jkbradley ankurdave
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #11226 from mengxr/SPARK-13355.
      764ca180
    • Yanbo Liang's avatar
      [SPARK-13429][MLLIB] Unify Logistic Regression convergence tolerance of ML & MLlib · 72427c3e
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      In order to provide better and consistent result, let's change the default value of MLlib ```LogisticRegressionWithLBFGS convergenceTol``` from ```1E-4``` to ```1E-6``` which will be equal to ML ```LogisticRegression```.
      cc dbtsai
      ## How was this patch tested?
      unit tests
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #11299 from yanboliang/spark-13429.
      72427c3e