  1. Aug 24, 2016
• [SPARKR][MINOR] Fix doc for show method · d2932a0e
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
The original doc of `show` put methods for multiple classes together, but the text only talks about `SparkDataFrame`. This PR fixes this problem.
      
      ## How was this patch tested?
      
      Manual test.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14776 from junyangq/SPARK-FixShowDoc.
• [MINOR][DOC] Fix wrong ml.feature.Normalizer document. · 45b786ac
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
The `ml.feature.Normalizer` examples illustrate the L1 norm rather than L2; the corresponding document should be corrected.
      ![image](https://cloud.githubusercontent.com/assets/1962026/17928637/85aec284-69b0-11e6-9b13-d465ee560581.png)
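
For reference, a minimal Scala sketch (with made-up data) of the difference the doc should reflect: `setP(1.0)` selects the L1 norm, `setP(2.0)` the L2 norm.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.Normalizer
import org.apache.spark.ml.linalg.Vectors

val spark = SparkSession.builder().master("local").getOrCreate()
// One toy vector: L1 norm = |1| + |-3| = 4, L2 norm = sqrt(1 + 9) ≈ 3.162
val df = spark.createDataFrame(Seq((0, Vectors.dense(1.0, -3.0)))).toDF("id", "features")

val l1 = new Normalizer().setInputCol("features").setOutputCol("l1").setP(1.0)
val l2 = new Normalizer().setInputCol("features").setOutputCol("l2").setP(2.0)

l1.transform(df).show(false)  // [0.25, -0.75]
l2.transform(df).show(false)  // [0.3162..., -0.9486...]
```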
      
      ## How was this patch tested?
      Doc change, no test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14787 from yanboliang/normalizer.
• [SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscretizer... · 92c0eaf3
      VinceShieh authored
      [SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscretizer when some quantiles are duplicated
      
      ## What changes were proposed in this pull request?
      
When QuantileDiscretizer is applied to a numeric column whose computed quantiles contain duplicated elements, we now take only the unique elements generated from approxQuantiles as input for Bucketizer.
      
      ## How was this patch tested?
      
A unit test is added in QuantileDiscretizerSuite.
      
QuantileDiscretizer.fit previously threw an InvalidArgumentException when calling setSplits on a list of splits
with duplicated elements. Bucketizer.setSplits should only accept a numeric vector of two
or more unique cut points, even though that may produce fewer buckets than requested.
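
A Scala sketch of the idea behind the fix (not the exact patch): deduplicate the candidate cut points before handing them to Bucketizer.

```scala
import org.apache.spark.ml.feature.Bucketizer

// Quantiles computed on skewed data can repeat (here a hypothetical output of
// approxQuantile); Bucketizer.setSplits rejects splits with duplicated values.
val rawSplits = Array(Double.NegativeInfinity, 1.0, 2.0, 2.0, 2.0, 3.0, Double.PositiveInfinity)
val distinctSplits = rawSplits.distinct  // keeps one 2.0; may yield fewer buckets than requested

val bucketizer = new Bucketizer()
  .setInputCol("input")
  .setOutputCol("result")
  .setSplits(distinctSplits)
```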
      
Signed-off-by: VinceShieh <vincent.xie@intel.com>
      
      Author: VinceShieh <vincent.xie@intel.com>
      
      Closes #14747 from VinceShieh/SPARK-17086.
• [MINOR][BUILD] Fix Java CheckStyle Error · 673a80d2
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
As Spark 2.0.1 will be released soon (mentioned on the Spark dev mailing list), it's better to fix the code style errors, in addition to the critical bugs, before the release.
      
      Before:
      ```
      ./dev/lint-java
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119).
      [ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
      ```
      After:
      ```
      ./dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
      ## How was this patch tested?
      Manual.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14768 from Sherry302/fixjavastyle.
• [SPARK-17186][SQL] remove catalog table type INDEX · 52fa45d6
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
Spark SQL doesn't actually support indexes; the catalog table type `INDEX` comes from Hive. Most operations in Spark SQL can't handle index tables, e.g. create table, alter table, etc.

Logically, index tables should be invisible to end users, and Hive generates special table names for index tables so that users can't access them directly. Hive has dedicated SQL syntax to create/show/drop index tables.

On the Spark SQL side, although we can describe an index table directly, the result is unreadable; we should use the dedicated SQL syntax instead (e.g. `SHOW INDEX ON tbl`). Spark SQL can also read an index table directly, but the result is always empty. (Can Hive read index tables directly?)

This PR removes the table type `INDEX` to make it clear that Spark SQL doesn't currently support indexes.
      
      ## How was this patch tested?
      
      existing tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #14752 from cloud-fan/minor2.
• [MINOR][SQL] Remove implemented functions from comments of 'HiveSessionCatalog.scala' · b9994ad0
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      This PR removes implemented functions from comments of `HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`.
      
      ## How was this patch tested?
      Manual.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14769 from Sherry302/cleanComment.
  2. Aug 23, 2016
• [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader` · c1937dd1
      Tejas Patil authored
      ## What changes were proposed in this pull request?
      
      Jira: https://issues.apache.org/jira/browse/SPARK-16862
      
The `BufferedInputStream` used in `UnsafeSorterSpillReader` uses the default 8 KB buffer to read data off disk. This PR makes the buffer size configurable to improve disk reads. I have set the default value to 1 MB, as I observed improved performance with that value.
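
A Scala sketch of the change (treat the config key as approximate to this patch):

```scala
import java.io.{BufferedInputStream, File, FileInputStream}
import org.apache.spark.SparkConf

val conf = new SparkConf()
// Read the buffer size from the conf instead of relying on BufferedInputStream's
// 8 KB default; 1 MB is the default suggested by this PR's benchmarks.
val bufferSizeBytes =
  conf.getSizeAsBytes("spark.unsafe.sorter.spill.reader.buffer.size", "1m").toInt
val spillFile = new File("/tmp/spill0.bin")  // hypothetical spill file
val in = new BufferedInputStream(new FileInputStream(spillFile), bufferSizeBytes)
```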
      
      ## How was this patch tested?
      
      I am relying on the existing unit tests.
      
      ## Performance
      
After deploying this change to prod and setting the config to 1 MB, there was a 12% reduction in CPU time and a 19.5% reduction in CPU reservation time.
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #14726 from tejasapatil/spill_buffer_2.
• [SPARK-17194] Use single quotes when generating SQL for string literals · bf8ff833
      Josh Rosen authored
When Spark emits SQL for a string literal, it should wrap the string in single quotes, not double quotes. Databases which adhere more strictly to the ANSI SQL standard, such as Postgres, allow only single quotes for denoting string literals (see http://stackoverflow.com/a/1992331/590203).
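
A minimal Scala sketch of ANSI-style quoting (not the exact Spark code): wrap the value in single quotes and escape embedded single quotes by doubling them.

```scala
// 'it''s' is a valid ANSI SQL string literal; "it's" is not, in Postgres.
def toSqlLiteral(s: String): String = "'" + s.replace("'", "''") + "'"

assert(toSqlLiteral("it's") == "'it''s'")
```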
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14763 from JoshRosen/SPARK-17194.
• [TRIVIAL] Typo Fix · 6555ef0c
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Fix a typo
      
      ## How was this patch tested?
      no tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #14772 from zhengruifeng/minor_numClasses.
• [MINOR][DOC] Use standard quotes instead of "curly quote" marks from Mac in... · 58855991
      hyukjinkwon authored
      [MINOR][DOC] Use standard quotes instead of "curly quote" marks from Mac in structured streaming programming guides
      
      ## What changes were proposed in this pull request?
      
This PR replaces curly quotes (`“` and `”`) with standard quotes (`"`).

Curly quotes are an actual problem when users copy and paste the examples: the pasted code would not work.

This seems to happen only in `structured-streaming-programming-guide.md`.
      
      ## How was this patch tested?
      
      Manually built.
      
This changes some examples to be rendered correctly, as below:
      
      ![2016-08-23 3 24 13](https://cloud.githubusercontent.com/assets/6477701/17882878/2a38332e-694a-11e6-8e84-76bdb89151e0.png)
      
      to
      
      ![2016-08-23 3 26 06](https://cloud.githubusercontent.com/assets/6477701/17882888/376eaa28-694a-11e6-8b88-32ea83997037.png)
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #14770 from HyukjinKwon/minor-quotes.
• [SPARKR][MINOR] Remove reference link for common Windows environment variables · 8fd63e80
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
The PR removes a reference link in the doc for environment variables for common Windows folders. The CRAN check returned code 503 (service unavailable) for the original link.
      
      ## How was this patch tested?
      
      Manual check.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14767 from junyangq/SPARKR-RemoveLink.
• [SPARK-13286] [SQL] add the next expression of SQLException as cause · 9afdfc94
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
Some JDBC drivers (for example PostgreSQL) do not set the underlying exception as the cause, but provide another API (`getNextException`) to access it, so it isn't included in the error logging, making it hard to find the root cause, especially in batch mode.

This PR pulls out the next exception and adds it as the cause (if it's different) or as a suppressed exception (if there is already a different cause).
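
A Scala sketch of the described strategy (not the exact patch):

```scala
import java.sql.SQLException

// Surface the driver's "next" exception either as the cause (if none is set)
// or as a suppressed exception (if a different cause already exists).
def attachNextException(e: SQLException): Unit = {
  val next = e.getNextException
  if (next != null && next != e.getCause) {
    if (e.getCause == null) e.initCause(next)
    else e.addSuppressed(next)
  }
}
```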
      
      ## How was this patch tested?
      
      Can't reproduce this on the default JDBC driver, so did not add a regression test.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #14722 from davies/keep_cause.
• [SPARK-17095] [Documentation] [Latex and Scala doc do not play nicely] · 97d461b7
      Jagadeesan authored
      ## What changes were proposed in this pull request?
      
In LaTeX, it is common to find `}}}` when closing several expressions at once. [SPARK-16822](https://issues.apache.org/jira/browse/SPARK-16822) added MathJax to render LaTeX equations in scaladoc. However, when scaladoc sees `}}}` or `{{{`, it treats it as a code-block delimiter. This results in some very strange output.
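
A hypothetical illustration of the clash inside a scaladoc comment:

```scala
/**
 * MathJax renders the expression below, but its tail closes three groups at
 * once, producing a literal "}}}" -- which scaladoc parses as the end of a
 * {{{ ... }}} code block:
 *
 *   \( e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}} \)
 */
class GaussianDoc
```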
      
      Author: Jagadeesan <as2@us.ibm.com>
      
      Closes #14688 from jagadeesanas2/SPARK-17095.
• [SPARK-17199] Use CatalystConf.resolver for case-sensitivity comparison · 9d376ad7
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Use `CatalystConf.resolver` consistently for case-sensitivity comparison (removed dups).
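
A Scala sketch of the pattern being consolidated (simplified from Catalyst's actual definitions):

```scala
// A Resolver is just a name-equality function chosen once from the
// case-sensitivity setting, instead of ad-hoc comparisons at each call site.
type Resolver = (String, String) => Boolean

val caseSensitiveResolution: Resolver = (a, b) => a == b
val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)

def resolver(caseSensitiveAnalysis: Boolean): Resolver =
  if (caseSensitiveAnalysis) caseSensitiveResolution else caseInsensitiveResolution
```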
      
      ## How was this patch tested?
      
Local build; waiting for Jenkins to ensure a clean build and test run.
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #14771 from jaceklaskowski/17199-catalystconf-resolver.
• [SPARK-17188][SQL] Moves class QuantileSummaries to project catalyst for... · cc33460a
      Sean Zhong authored
      [SPARK-17188][SQL] Moves class QuantileSummaries to project catalyst for implementing percentile_approx
      
      ## What changes were proposed in this pull request?
      
      This is a sub-task of [SPARK-16283](https://issues.apache.org/jira/browse/SPARK-16283) (Implement percentile_approx SQL function), which moves class QuantileSummaries to project catalyst so that it can be reused when implementing aggregation function `percentile_approx`.
      
      ## How was this patch tested?
      
This PR only relocates the class; its implementation is not changed.
      
      Author: Sean Zhong <seanzhong@databricks.com>
      
      Closes #14754 from clockfly/move_QuantileSummaries_to_catalyst.
  3. Aug 22, 2016
• [SPARKR][MINOR] Update R DESCRIPTION file · d2b3d3e6
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Update DESCRIPTION
      
      ## How was this patch tested?
      
      Run install and CRAN tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14764 from felixcheung/rpackagedescription.
• [SPARK-17182][SQL] Mark Collect as non-deterministic · 2cdd92a7
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This PR marks the abstract class `Collect` as non-deterministic since the results of `CollectList` and `CollectSet` depend on the actual order of input rows.
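
An illustrative Scala example (hypothetical data) of why the result depends on row order:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

val spark = SparkSession.builder().master("local[4]").getOrCreate()
import spark.implicits._

// The order of elements in the collected array follows the physical row order,
// which is not guaranteed after shuffles or repartitioning.
val df = Seq(("a", 1), ("a", 2), ("a", 3)).toDF("k", "v").repartition(4)
df.groupBy("k").agg(collect_list($"v")).show()  // [1, 2, 3] is NOT guaranteed
```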
      
      ## How was this patch tested?
      
      Existing test cases should be enough.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #14749 from liancheng/spark-17182-non-deterministic-collect.
• [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh · 920806ab
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
This change adds CRAN documentation checks to be run as part of `R/run-tests.sh`.

## How was this patch tested?

As this script is also used by Jenkins, this means that we will get documentation checks on every PR going forward.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #14759 from shivaram/sparkr-cran-jenkins.
• [SPARK-17090][FOLLOW-UP][ML] Add expert param support to SharedParamsCodeGen · 37f0ab70
      hqzizania authored
      ## What changes were proposed in this pull request?
      
Add expert param support to SharedParamsCodeGen, where aggregationDepth, an expert param, is added.
      
      Author: hqzizania <hqzizania@gmail.com>
      
      Closes #14738 from hqzizania/SPARK-17090-minor.
• [SPARK-17144][SQL] Removal of useless CreateHiveTableAsSelectLogicalPlan · 6d93f9e0
      gatorsmile authored
      ## What changes were proposed in this pull request?
`CreateHiveTableAsSelectLogicalPlan` is dead code after refactoring.
      
      ## How was this patch tested?
      N/A
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #14707 from gatorsmile/removeCreateHiveTable.
• [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication · 8e223ea6
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
This is a straightforward clone of JoshRosen's original patch. I have follow-up changes to fix block replication for REPL-defined classes as well, but those appear to cause flaky tests, so I'm going to leave that for SPARK-17042.
      
      ## How was this patch tested?
      
      End-to-end test in ReplSuite (also more tests in DistributedSuite from the original patch).
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #14311 from ericl/spark-16550.
• [SPARK-16508][SPARKR] doc updates and more CRAN check fixes · 71afeeea
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
- Replace ``` ` ``` in code docs with `\code{thing}`
- Remove the added `...` for drop(DataFrame)
- Fix remaining CRAN check warnings
      
      ## How was this patch tested?
      
      create doc with knitr
      
      junyangq
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14734 from felixcheung/rdoccleanup.
• [SPARK-17162] Range does not support SQL generation · 84770b59
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
The range operator previously didn't support SQL generation, which made it unusable in views.
      
      ## How was this patch tested?
      
      Unit tests.
      
      cc hvanhovell
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #14724 from ericl/spark-17162.
• [MINOR][SQL] Fix some typos in comments and test hints · 929cb8be
      Sean Zhong authored
      ## What changes were proposed in this pull request?
      
      Fix some typos in comments and test hints
      
      ## How was this patch tested?
      
      N/A.
      
      Author: Sean Zhong <seanzhong@databricks.com>
      
      Closes #14755 from clockfly/fix_minor_typo.
• [SPARKR][MINOR] Add Xiangrui and Felix to maintainers · 6f3cd36f
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
      This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the package description.
      
      ## How was this patch tested?
      
N/A.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #14758 from shivaram/sparkr-maintainers.
• [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reformat, fix deprecation in test · 0583ecda
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      refactor, cleanup, reformat, fix deprecation in test
      
      ## How was this patch tested?
      
      unit tests, manual tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14735 from felixcheung/rmllibutil.
• [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6 · 342278c0
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
Collects the GC discussion in one section and documents findings about the G1 GC heap region size.
      
      ## How was this patch tested?
      
      Jekyll doc build
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14732 from srowen/SPARK-16320.
• [SPARKR][MINOR] Fix Cache Folder Path in Windows · 209e1b3c
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
This PR fixes the scheme of the local cache folder on Windows. The name of the environment variable should be `LOCALAPPDATA` rather than `%LOCALAPPDATA%`.
      
      ## How was this patch tested?
      
      Manual test in Windows 7.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14743 from junyangq/SPARKR-FixWindowsInstall.
• [SPARK-15113][PYSPARK][ML] Add missing num features num classes · b264cbb1
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
Add missing `numFeatures` and `numClasses` to the wrapped Java models in PySpark ML pipelines. Also tag `DecisionTreeClassificationModel` as Experimental to match the Scala doc.
      
      ## How was this patch tested?
      
      Extended doctests
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12889 from holdenk/SPARK-15113-add-missing-numFeatures-numClasses.
• [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS] · bd965506
      Jagadeesan authored
Changes in the Spark Structured Streaming doc at this link:
      https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations
      
      Author: Jagadeesan <as2@us.ibm.com>
      
      Closes #14715 from jagadeesanas2/SPARK-17085.
• [SPARK-17115][SQL] decrease the threshold when split expressions · 8d35a6f6
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
In 2.0, we changed the threshold for splitting expressions from 16K to 64K, which causes very bad performance on wide tables because the generated methods can't be JIT-compiled by default (above the 8K bytecode limit).

This PR decreases it to 1K, based on the benchmark results for a wide table with 400 columns of LongType.

It also fixes a bug around splitting expressions in whole-stage codegen (it should not split them).
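
A Scala sketch of the splitting heuristic (simplified; not Spark's actual codegen):

```scala
// Accumulate generated statements into a method body until it nears the
// threshold, then start a new method, keeping each one small enough for the
// JVM's JIT (which by default refuses to compile methods above 8K bytecode).
def splitCode(stmts: Seq[String], thresholdChars: Int = 1024): Seq[String] = {
  val methods = scala.collection.mutable.ArrayBuffer(new StringBuilder)
  for (s <- stmts) {
    if (methods.last.nonEmpty && methods.last.length + s.length > thresholdChars) {
      methods += new StringBuilder
    }
    methods.last.append(s).append('\n')
  }
  methods.map(_.toString).toSeq
}
```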
      
      ## How was this patch tested?
      
      Added benchmark suite.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #14692 from davies/split_exprs.
• [SPARK-16968] Document additional options in jdbc Writer · 4b6c2cbc
      GraceH authored
      ## What changes were proposed in this pull request?
      
This is the documentation for the previously added JDBC writer options.
      
      ## How was this patch tested?
      
A unit test was added in the previous PR.
      
      Author: GraceH <jhuang1@paypal.com>
      
      Closes #14683 from GraceH/jdbc_options.
• [SPARK-17127] Make unaligned access in unsafe available for AArch64 · 083de00c
      Richael authored
## What changes were proposed in this pull request?
      
Since Spark 2.0.0, when MemoryMode.OFF_HEAP is set, Spark checks whether the architecture supports unaligned access. If the check doesn't pass, an exception is raised.

We know that AArch64 also supports unaligned access, but currently only i386, x86, amd64, and x86_64 are included.

I think we should include aarch64 when performing the check.
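
A Scala sketch of the proposed check (simplified from the arch whitelist described above):

```scala
// "os.arch" is normalized to lower case; aarch64 joins the set of
// architectures known to support unaligned access.
val arch = System.getProperty("os.arch", "").toLowerCase
val unalignedSupported = Set("i386", "x86", "amd64", "x86_64", "aarch64").contains(arch)
```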
      
      ## How was this patch tested?
      
      Unit test suite
      
      Author: Richael <Richael.Zhuang@arm.com>
      
      Closes #14700 from yimuxi/zym_change_unsafe.
• [SPARK-16498][SQL] move hive hack for data source table into HiveExternalCatalog · b2074b66
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
Spark SQL doesn't have its own metastore yet and currently uses Hive's. However, Hive's metastore has some limitations (e.g. columns can't be too many, it is not case-preserving, it has bad decimal type support, etc.), so we have some hacks to successfully store data source table metadata in the Hive metastore, i.e. we put all the information in table properties.

This PR moves these hacks to `HiveExternalCatalog`, to isolate Hive-specific logic in one place.
      
      changes overview:
      
1. **Before this PR**: we need to put the metadata (schema, partition columns, etc.) of data source tables into table properties before saving it to the external catalog, even if the external catalog doesn't use the Hive metastore (e.g. `InMemoryCatalog`).
**After this PR**: the table properties tricks are only in `HiveExternalCatalog`; the caller side doesn't need to take care of them anymore.
      
2. **Before this PR**: because the table properties tricks are done outside of the external catalog, we also need to revert these tricks when we read the table metadata from the external catalog and use it, e.g. in `DescribeTableCommand` we read the schema and partition columns from table properties.
**After this PR**: the table metadata read from the external catalog is exactly the same as what we saved to it.
      
Bonus: now we can create data source tables using `SessionCatalog`, if a schema is specified.
Breaks: `schemaStringLengthThreshold` and `hive.default.rcfile.serde` are not configurable anymore.
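
An illustrative Scala sketch of the table-properties trick described above (treat the exact property keys as approximate):

```scala
import org.apache.spark.sql.types._

// Hive's metastore lower-cases column names and limits schema width, so the
// data source schema is serialized as JSON into string-valued table properties.
val schema = new StructType().add("id", LongType).add("CamelCased", StringType)

val tableProperties = Map(
  "spark.sql.sources.provider"        -> "parquet",
  "spark.sql.sources.schema.numParts" -> "1",
  "spark.sql.sources.schema.part.0"   -> schema.json  // case-preserving, unlike Hive
)
```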
      
      ## How was this patch tested?
      
      existing tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #14155 from cloud-fan/catalog-table.
  4. Aug 21, 2016
• [SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) OVER` correctly · 91c23976
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Currently, `NullPropagation` optimizer replaces `COUNT` on null literals in a bottom-up fashion. During that, `WindowExpression` is not covered properly. This PR adds the missing propagation logic.
      
      **Before**
      ```scala
      scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
      java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
      ```
      
      **After**
      ```scala
      scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
      +----------------------------------------------------------------------------------------------+
      |count((1 + CAST(NULL AS INT))) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)|
      +----------------------------------------------------------------------------------------------+
      |                                                                                             0|
      +----------------------------------------------------------------------------------------------+
      ```
      
      ## How was this patch tested?
      
      Pass the Jenkins test with a new test case.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #14689 from dongjoon-hyun/SPARK-17098.
• [MINOR][R] add SparkR.Rcheck/ and SparkR_*.tar.gz to R/.gitignore · ab714346
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      Ignore temp files generated by `check-cran.sh`.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #14740 from mengxr/R-gitignore.
• [SPARK-17002][CORE] Document that spark.ssl.protocol. is required for SSL · e328f577
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
Setting `spark.ssl.enabled` to true while failing to set `spark.ssl.protocol` fails with a meaningless exception; `spark.ssl.protocol` is required when `spark.ssl.enabled` is true.

Improvement: require `spark.ssl.protocol` when initializing SSLContext, otherwise throw an exception indicating the missing setting.

Remove the OrElse("default").

Document this requirement in configure.md.
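
A minimal Scala sketch of the added validation (hypothetical signature; the message matches the exception shown in the test log below):

```scala
def validateSslConf(sslEnabled: Boolean, sslProtocol: Option[String]): Unit = {
  if (sslEnabled) {
    // Fail fast with a meaningful message instead of a cryptic SSLContext error.
    require(sslProtocol.isDefined,
      "spark.ssl.protocol is required when enabling SSL connections.")
  }
}
```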
      
      ## How was this patch tested?
      
Manual tests:

- Built the document and checked it.
- Configured `spark.ssl.enabled` only; it throws the exception below:

```
6/08/16 16:04:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mwang); groups with view permissions: Set(); users  with modify permissions: Set(mwang); groups with modify permissions: Set()
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: spark.ssl.protocol is required when enabling SSL connections.
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:285)
	at org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1026)
	at org.apache.spark.deploy.master.Master$.main(Master.scala:1011)
	at org.apache.spark.deploy.master.Master.main(Master.scala)
```
      
- Configured both `spark.ssl.enabled` and `spark.ssl.protocol`; it works fine.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #14674 from wangmiao1981/ssl.
• [SPARK-16961][FOLLOW-UP][SPARKR] More robust test case for spark.gaussianMixture. · 7f08a60b
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
#14551 fixed an off-by-one bug in `randomizeInPlace` and some test failures caused by that fix.
But for the SparkR `spark.gaussianMixture` test case, the fix was inappropriate: it only changed the output result of native R which SparkR is compared against; it did not change the R code in the annotation which is used for reproducing the result in native R. This will confuse users who cannot reproduce the same result in native R. This PR provides a more robust test case which produces the same result in SparkR and native R.
      
      ## How was this patch tested?
      Unit test update.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14730 from yanboliang/spark-16961-followup.
  5. Aug 20, 2016
• [SPARK-17090][ML] Make tree aggregation level in linear/logistic regression configurable · 61ef74f2
      hqzizania authored
      ## What changes were proposed in this pull request?
      
Linear/logistic regression use treeAggregate with the default depth (always 2) for collecting coefficient gradient updates to the driver. For high-dimensional problems, this can cause OOM errors on the driver. This patch makes the depth configurable to avoid this problem when users' input data has many features. It adds a HasTreeDepth API in `sharedParams.scala`, and extends it to both linear regression and logistic regression in .ml.
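
An illustrative Scala example of treeAggregate's depth parameter (toy data standing in for gradient updates):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val data = spark.sparkContext.parallelize(1L to 1000000L)

// A deeper tree performs more intermediate combines on executors, so fewer and
// smaller partial results reach the driver -- the OOM-avoidance knob this
// patch exposes.
val sum = data.treeAggregate(0L)(
  seqOp = (acc, x) => acc + x,
  combOp = (a, b) => a + b,
  depth = 4  // default is 2
)
```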
      
      Author: hqzizania <hqzizania@gmail.com>
      
      Closes #14717 from hqzizania/SPARK-17090.
• [SPARK-12666][CORE] SparkSubmit packages fix for when 'default' conf doesn't... · 9f37d4ea
      Bryan Cutler authored
      [SPARK-12666][CORE] SparkSubmit packages fix for when 'default' conf doesn't exist in dependent module
      
      ## What changes were proposed in this pull request?
      
Adding "(runtime)" to the dependency configuration sets a fallback configuration to be used if the requested one is not found. E.g. with the setting "default(runtime)", Ivy will look for the conf "default" in the module's ivy file and, if not found, will look for the conf "runtime". This helps with the case of using "sbt publishLocal", which does not write a "default" conf in the published ivy.xml file.
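
A Scala sketch of the fallback-conf mapping in Ivy terms (hypothetical coordinates; treat the exact Ivy calls as approximate):

```scala
import org.apache.ivy.core.module.descriptor.DefaultDependencyDescriptor
import org.apache.ivy.core.module.id.ModuleRevisionId

// "default(runtime)" tells Ivy: use the dependency's "default" conf if its
// ivy.xml declares one, otherwise fall back to "runtime" (the case for
// artifacts published with `sbt publishLocal`).
val mrid = ModuleRevisionId.newInstance("com.example", "mylib", "0.1.0-SNAPSHOT")
val dd = new DefaultDependencyDescriptor(mrid, /* force = */ false)
dd.addDependencyConfiguration("default", "default(runtime)")
```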
      
      ## How was this patch tested?
Used spark-submit with the --packages option for a package published locally with no default conf, and for a package resolved from Maven Central.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13428 from BryanCutler/fallback-package-conf-SPARK-12666.