  1. Aug 09, 2016
    • Mariusz Strzelecki's avatar
      [SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3 · 29081b58
      Mariusz Strzelecki authored
      ## What changes were proposed in this pull request?
      
      Enables KafkaUtils.createDirectStream with starting offsets in Python 3 by using java.lang.Number instead of Long during parameter mapping in the Scala helper. This allows py4j to pass either an Integer or a Long into the map and resolves the ClassCastException problems.
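
      A minimal sketch of the mapping idea (not the actual helper in the patch; the generic key type and helper name are assumptions):

      ```scala
      import java.{lang => jl, util => ju}
      import scala.collection.JavaConverters._

      // Accepting java.lang.Number lets py4j hand over either an Integer or a
      // Long; the helper then normalizes every offset to Long via longValue().
      def toOffsets[K](fromOffsets: ju.Map[K, jl.Number]): Map[K, Long] =
        fromOffsets.asScala.map { case (tp, n) => (tp, n.longValue()) }.toMap
      ```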
      
      ## How was this patch tested?
      
      unit tests
      
      jerryshao  - could you please look at this PR?
      
      Author: Mariusz Strzelecki <mariusz.strzelecki@allegrogroup.com>
      
      Closes #14540 from szczeles/kafka_pyspark.
      29081b58
    • Yanbo Liang's avatar
      [SPARK-16933][ML] Fix AFTAggregator in AFTSurvivalRegression serializes unnecessary data. · 182e1190
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Similar to ```LeastSquaresAggregator``` in #14109, ```AFTAggregator``` used for ```AFTSurvivalRegression``` ends up serializing the ```parameters``` and ```featuresStd```, which is not necessary and can cause performance issues for high dimensional data. This patch removes this serialization. This PR is highly inspired by #14109.
      
      ## How was this patch tested?
      I tested this locally and verified the serialization reduction.
      
      Before patch
      ![image](https://cloud.githubusercontent.com/assets/1962026/17512035/abb93f04-5dda-11e6-97d3-8ae6b61a0dfd.png)
      
      After patch
      ![image](https://cloud.githubusercontent.com/assets/1962026/17512024/9e0dc44c-5dda-11e6-93d0-6e130ba0d6aa.png)
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14519 from yanboliang/spark-16933.
      182e1190
    • Reynold Xin's avatar
      [SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.execution package · 511f52f8
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This package is meant to be internal, and as a result it does not make sense to mark things as private[sql] or private[spark]. It simply makes debugging harder when Spark developers need to inspect the plans at runtime.
      
      This patch removes all private[sql] and private[spark] visibility modifiers in org.apache.spark.sql.execution.
      
      ## How was this patch tested?
      N/A - just visibility changes.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14554 from rxin/remote-private.
      511f52f8
    • Michael Gummelt's avatar
      [SPARK-16809] enable history server links in dispatcher UI · 62e62124
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Links the Spark Mesos Dispatcher UI to the history server UI
      
      - adds `spark.mesos.dispatcher.historyServer.url` (see the sketch below)
      - explicitly generates frameworkIDs for the launched drivers, so the dispatcher can correlate drivers with frameworkIDs
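
      A minimal usage sketch, assuming a placeholder history server URL (the hostname below is invented, not from the PR):

      ```scala
      import org.apache.spark.SparkConf

      // Point the dispatcher at a history server so driver links resolve.
      val conf = new SparkConf()
        .set("spark.mesos.dispatcher.historyServer.url", "http://historyserver:18080")
      ```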
      
      ## How was this patch tested?
      
      manual testing
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      Author: Sergiusz Urbaniak <sur@mesosphere.io>
      
      Closes #14414 from mgummelt/history-server.
      62e62124
    • Dongjoon Hyun's avatar
      [SPARK-16940][SQL] `checkAnswer` should raise `TestFailedException` for wrong results · 2154345b
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes the following to make `checkAnswer` raise `TestFailedException` again instead of `java.util.NoSuchElementException: key not found: TZ` in environments without the `TZ` variable. This PR also adds a `QueryTestSuite` class for testing `QueryTest` itself.
      
      ```scala
      - |Timezone Env: ${sys.env("TZ")}
      + |Timezone Env: ${sys.env.getOrElse("TZ", "")}
      ```
      
      ## How was this patch tested?
      
      Pass the Jenkins tests with a new test suite.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #14528 from dongjoon-hyun/SPARK-16940.
      2154345b
    • Sun Rui's avatar
      [SPARK-16522][MESOS] Spark application throws exception on exit. · af710e5b
      Sun Rui authored
      ## What changes were proposed in this pull request?
      Spark applications running on Mesos throw exception upon exit. For details, refer to https://issues.apache.org/jira/browse/SPARK-16522.
      
      I am not sure if there is a better fix, so I will wait for review comments.
      
      ## How was this patch tested?
      Manual test. Observed that the exception is gone upon application exit.
      
      Author: Sun Rui <sunrui2016@gmail.com>
      
      Closes #14175 from sun-rui/SPARK-16522.
      af710e5b
    • Sean Owen's avatar
      [SPARK-16606][CORE] Misleading warning for SparkContext.getOrCreate "WARN... · 801e4d09
      Sean Owen authored
      [SPARK-16606][CORE] Misleading warning for SparkContext.getOrCreate "WARN SparkContext: Use an existing SparkContext, some configuration may not take effect."
      
      ## What changes were proposed in this pull request?
      
      SparkContext.getOrCreate shouldn't warn about ignored config if
      
      - the config wasn't actually ignored, because a new context was created with it, or
      - no config was provided at all (see the sketch below)
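
      A minimal sketch of the intended guard, assuming hypothetical names and a simplified overload shape (the real `getOrCreate` differs):

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}

      // Warn only when an existing context is reused AND configuration was supplied.
      def getOrCreate(config: SparkConf, activeContext: Option[SparkContext]): SparkContext =
        activeContext match {
          case Some(sc) =>
            if (config.getAll.nonEmpty) {  // only then can settings be ignored
              println("WARN Use an existing SparkContext, some configuration may not take effect.")
            }
            sc
          case None => new SparkContext(config)
        }
      ```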
      
      ## How was this patch tested?
      
      Jenkins + existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14533 from srowen/SPARK-16606.
      801e4d09
  2. Aug 08, 2016
    • hyukjinkwon's avatar
      [SPARK-16610][SQL] Add `orc.compress` as an alias for `compression` option. · bb2b9d0a
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      For the ORC source, Spark SQL has a writer option `compression`, which is used to set the codec, and its value is also set to `orc.compress` (the ORC conf used for the codec). However, if a user only sets `orc.compress` in the writer options, we should not use the default value of `compression` (snappy) as the codec; instead, we should respect the value of `orc.compress`.
      
      This PR makes the ORC data source respect `orc.compress` when `compression` is unset.
      
      The resulting behaviour, sketched in code below the list, is:
      
       1. Check `compression` and use this if it is set.
       2. If `compression` is not set, check `orc.compress` and use it.
       3. If `compression` and `orc.compress` are not set, then use the default snappy.
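
      A minimal sketch of that precedence (a hypothetical helper, not the actual OrcOptions code):

      ```scala
      // Resolve the codec: compression wins, then orc.compress, then snappy.
      def resolveCodec(options: Map[String, String]): String =
        options.get("compression")
          .orElse(options.get("orc.compress"))
          .getOrElse("snappy")

      // resolveCodec(Map("orc.compress" -> "ZLIB"))  // "ZLIB" is no longer ignored
      ```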
      
      ## How was this patch tested?
      
      Unit test in `OrcQuerySuite`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #14518 from HyukjinKwon/SPARK-16610.
      bb2b9d0a
    • Alice's avatar
      [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug · e17a76ef
      Alice authored
      ## What changes were proposed in this pull request?
      
      Add a constant iterator that points to the head of the result set. That head iterator is used to reset the live iterator whenever results are fetched from the first row again (see the sketch below).
      JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563
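
      A self-contained sketch of the idea, with hypothetical names (not the actual Thrift server code):

      ```scala
      // Keep the materialized rows, and rebuild the iterator whenever the client
      // asks to fetch from the first row again (FETCH_FIRST semantics).
      class ResettableResult[T](rows: Seq[T]) {
        private var iter: Iterator[T] = rows.iterator
        def fetchNext(n: Int): Seq[T] = {
          val buf = scala.collection.mutable.ArrayBuffer.empty[T]
          while (buf.size < n && iter.hasNext) buf += iter.next()
          buf.toSeq
        }
        def resetToFirst(): Unit = { iter = rows.iterator }  // re-point at the head
      }
      ```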
      
      ## How was this patch tested?
      
      This bug was found when connecting Cloudera HUE to the Spark SQL Thrift server: a SQL statement's result could only be fetched once. The fix was tested manually with Cloudera HUE; with it, HUE can fetch Spark SQL results repeatedly through the Thrift server.
      
      Author: Alice <alice.gugu@gmail.com>
      Author: Alice <guhq@garena.com>
      
      Closes #14218 from alicegugu/SparkSQLFetchResultsBug.
      e17a76ef
    • Sean Zhong's avatar
      [SPARK-16898][SQL] Adds argument type information for typed logical plan like... · bca43cd6
      Sean Zhong authored
      [SPARK-16898][SQL] Adds argument type information for typed logical plan like MapElements, TypedFilter, and AppendColumn
      
      ## What changes were proposed in this pull request?
      
      This PR adds argument type information to typed logical plans like MapElements, TypedFilter, and AppendColumn, so that this information can be used in customized optimizer rules.
      
      ## How was this patch tested?
      
      Existing test.
      
      Author: Sean Zhong <seanzhong@databricks.com>
      
      Closes #14494 from clockfly/add_more_info_for_typed_operator.
      bca43cd6
    • Herman van Hovell's avatar
      [SPARK-16749][SQL] Simplify processing logic in LEAD/LAG processing. · df106588
      Herman van Hovell authored
      ## What changes were proposed in this pull request?
      The logic for LEAD/LAG processing is more complex than it needs to be. This PR simplifies it.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Herman van Hovell <hvanhovell@databricks.com>
      
      Closes #14376 from hvanhovell/SPARK-16749.
      df106588
    • Michael Gummelt's avatar
      Update docs to include SASL support for RPC · 53d1c787
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Update docs to include SASL support for RPC
      
      Evidence: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala#L63
      
      ## How was this patch tested?
      
      Docs change only
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #14549 from mgummelt/sasl.
      53d1c787
    • Holden Karau's avatar
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add... · 9216901d
      Holden Karau authored
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting
      
      ## What changes were proposed in this pull request?
      
      Avoid using postfix notation for command execution in SQLQuerySuite, where it wasn't whitelisted, and audit the existing whitelistings, removing postfix operators from most places. Notable places where postfix notation remains are XML parsing and time units (seconds, millis, etc.), where it arguably improves readability. The sketch below contrasts the two styles.
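
      A small illustration (not from the patch) of postfix vs. explicit style:

      ```scala
      import scala.concurrent.duration._
      import scala.sys.process._

      // Postfix style would need `import scala.language.postfixOps` (or whitelisting):
      //   val out = "ls" !!          // postfix command execution
      //   val t   = 10 seconds       // postfix time unit
      // The explicit equivalents avoid the feature flag entirely:
      val out = Process("ls").!!     // explicit command execution
      val t = 10.seconds             // explicit method invocation
      ```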
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
      9216901d
    • Tathagata Das's avatar
      [SPARK-16953] Make requestTotalExecutors public Developer API to be consistent... · 86502390
      Tathagata Das authored
      [SPARK-16953] Make requestTotalExecutors public Developer API to be consistent with requestExecutors/killExecutors
      
      ## What changes were proposed in this pull request?
      
      requestExecutors and killExecutors are public developer APIs for managing the number of executors allocated to the SparkContext. For consistency, requestTotalExecutors should also be a public Developer API, as it provides similar functionality. In fact, using requestTotalExecutors is more convenient than requestExecutors, as the former is idempotent and the latter is not (see the sketch below).
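
      A quick contrast, assuming an active `SparkContext` named `sc` (the argument shapes follow the 2.x signatures as I understand them; treat them as an assumption):

      ```scala
      // requestExecutors is a delta: calling it twice asks for 4 more in total.
      sc.requestExecutors(2)

      // requestTotalExecutors sets an absolute target, so repeating the same call
      // is idempotent: (target, locality-aware task count, host -> task count map)
      sc.requestTotalExecutors(10, 0, Map.empty)
      ```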
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #14541 from tdas/SPARK-16953.
      86502390
    • Marcelo Vanzin's avatar
      [SPARK-16586][CORE] Handle JVM errors printed to stdout. · 1739e75f
      Marcelo Vanzin authored
      Some very rare JVM errors are printed to stdout, and that confuses
      the code in spark-class. So add a check so that those cases are
      detected and the proper error message is shown to the user.
      
      Tested by running spark-submit after setting "ulimit -v 32000".
      
      Closes #14231
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14508 from vanzin/SPARK-16586.
      1739e75f
    • gatorsmile's avatar
      [SPARK-16936][SQL] Case Sensitivity Support for Refresh Temp Table · 5959df21
      gatorsmile authored
      ### What changes were proposed in this pull request?
      Currently, the `refreshTable` API is always case sensitive.
      
      When users use the view name without an exact case match, the API silently ignores the call. Users might assume the command completed successfully. However, when they run subsequent SQL commands, they might still get an exception like
      ```
      Job aborted due to stage failure:
      Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 7, localhost):
      java.io.FileNotFoundException:
      File file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/spark-bd4b9ea6-9aec-49c5-8f05-01cff426211e/part-r-00000-0c84b915-c032-4f2e-abf5-1d48fdbddf38.snappy.parquet does not exist
      ```
      
      This PR is to fix the issue.
      
      ### How was this patch tested?
      Added a test case.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #14523 from gatorsmile/refreshTempTable.
      5959df21
    • gatorsmile's avatar
      [SPARK-16457][SQL] Fix Wrong Messages when CTAS with a Partition By Clause · ab126909
      gatorsmile authored
      #### What changes were proposed in this pull request?
      When doing a CTAS with a Partition By clause, we got a wrong error message.
      
      For example,
      ```SQL
      CREATE TABLE gen__tmp
      PARTITIONED BY (key string)
      AS SELECT key, value FROM mytable1
      ```
      The error message we get now is like
      ```
      Operation not allowed: Schema may not be specified in a Create Table As Select (CTAS) statement(line 2, pos 0)
      ```
      
      However, based on the code, the message we should get is like
      ```
      Operation not allowed: A Create Table As Select (CTAS) statement is not allowed to create a partitioned table using Hive's file formats. Please use the syntax of "CREATE TABLE tableName USING dataSource OPTIONS (...) PARTITIONED BY ..." to create a partitioned table through a CTAS statement.(line 2, pos 0)
      ```
      
      Currently, partitioning columns are part of the schema. This PR fixes the bug by changing the order of the checks.
      
      #### How was this patch tested?
      Added test cases.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #14113 from gatorsmile/ctas.
      ab126909
    • Sean Zhong's avatar
      [SPARK-16906][SQL] Adds auxiliary info like input class and input schema in... · 94a9d11e
      Sean Zhong authored
      [SPARK-16906][SQL] Adds auxiliary info like input class and input schema in TypedAggregateExpression
      
      ## What changes were proposed in this pull request?
      
      This PR adds auxiliary info like input class and input schema in TypedAggregateExpression
      
      ## How was this patch tested?
      
      Manual test.
      
      Author: Sean Zhong <seanzhong@databricks.com>
      
      Closes #14501 from clockfly/typed_aggregation.
      94a9d11e
    • Nattavut Sutyanyong's avatar
      [SPARK-16804][SQL] Correlated subqueries containing non-deterministic... · 06f5dc84
      Nattavut Sutyanyong authored
      [SPARK-16804][SQL] Correlated subqueries containing non-deterministic operations return incorrect results
      
      ## What changes were proposed in this pull request?
      
      This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase by returning an error message when the LIMIT is found in the path from the parent table to the correlated predicate in the subquery.
      
      ## How was this patch tested?
      
      `./dev/run-tests`, plus a new unit test on the problematic pattern.
      
      Author: Nattavut Sutyanyong <nsy.can@gmail.com>
      
      Closes #14411 from nsyca/master.
      06f5dc84
    • Weiqing Yang's avatar
      [SPARK-16945] Fix Java Lint errors · e10ca8de
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      This PR fixes the following minor Java linter errors:
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10] (modifier) RedundantModifier: Redundant 'final' modifier.
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10] (modifier) RedundantModifier: Redundant 'final' modifier.
      
      ## How was this patch tested?
      Manual test.
      dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14532 from Sherry302/master.
      e10ca8de
    • sethah's avatar
      [SPARK-16404][ML] LeastSquaresAggregators serializes unnecessary data · 1db1c656
      sethah authored
      ## What changes were proposed in this pull request?
      Similar to `LogisticAggregator`, `LeastSquaresAggregator` used for linear regression ends up serializing the coefficients and the features standard deviations, which is not necessary and can cause performance issues for high dimensional data. This patch removes this serialization.
      
      In https://github.com/apache/spark/pull/13729 the approach was to pass these values directly to the add method. The approach used here, instead, is to mark these fields as transient, which keeps the signature of the add method simple and interpretable. The downside is that it requires `transient lazy val`s, which are difficult to reason about if one is not familiar with serialization in Scala/Spark.
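
      A minimal, self-contained sketch of the pattern, with hypothetical field names (not the actual aggregator):

      ```scala
      import org.apache.spark.broadcast.Broadcast

      // The coefficients are not serialized with the task closure; each executor
      // re-derives them lazily from the broadcast variable instead.
      class AggregatorSketch(bcCoefficients: Broadcast[Array[Double]]) extends Serializable {
        @transient private lazy val coefficients: Array[Double] = bcCoefficients.value

        def add(features: Array[Double]): this.type = {
          val margin = features.zip(coefficients).map { case (x, w) => x * w }.sum
          // ... accumulate gradient and loss using `margin` here ...
          this
        }
      }
      ```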
      
      ## How was this patch tested?
      
      **MLlib**
      ![image](https://cloud.githubusercontent.com/assets/7275795/16703660/436f79fa-4524-11e6-9022-ef00058ec718.png)
      
      **ML without patch**
      ![image](https://cloud.githubusercontent.com/assets/7275795/16703831/c4d50b9e-4525-11e6-80cb-9b58c850cd41.png)
      
      **ML with patch**
      ![image](https://cloud.githubusercontent.com/assets/7275795/16703675/63e0cf40-4524-11e6-9120-1f512a70e083.png)
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #14109 from sethah/LIR_serialize.
      1db1c656
    • Tejas Patil's avatar
      [SPARK-16919] Configurable update interval for console progress bar · e076fb05
      Tejas Patil authored
      ## What changes were proposed in this pull request?
      
      Currently the update interval for the console progress bar is hardcoded. This PR makes it configurable for users.
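
      A hedged sketch of setting it; the property name below is my reading of the PR and should be treated as an assumption, not documented API:

      ```scala
      import org.apache.spark.SparkConf

      // Update the console progress bar once per second instead of the default.
      val conf = new SparkConf()
        .set("spark.ui.consoleProgress.update.interval", "1000")  // milliseconds
      ```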
      
      ## How was this patch tested?
      
      Ran a long-running job; with a high update-interval value, the updates were shown less frequently.
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #14507 from tejasapatil/SPARK-16919.
      e076fb05
  3. Aug 07, 2016
  4. Aug 06, 2016
    • Josh Rosen's avatar
      [SPARK-16925] Master should call schedule() after all executor exit events, not only failures · 4f5f9b67
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      This patch fixes a bug in Spark's standalone Master which could cause applications to hang if tasks cause executors to exit with zero exit codes.
      
      As an example of the bug, run
      
      ```
      sc.parallelize(1 to 1, 1).foreachPartition { _ => System.exit(0) }
      ```
      
      on a standalone cluster which has a single Spark application. This will cause all executors to die but those executors won't be replaced unless another Spark application or worker joins or leaves the cluster (or if an executor exits with a non-zero exit code). This behavior is caused by a bug in how the Master handles the `ExecutorStateChanged` event: the current implementation calls `schedule()` only if the executor exited with a non-zero exit code, so a task which causes a JVM to unexpectedly exit "cleanly" will skip the `schedule()` call.
      
      This patch addresses this by modifying the `ExecutorStateChanged` handler to call `schedule()` unconditionally. This should be safe because it is always safe to call `schedule()`: extra `schedule()` calls can only affect performance and should not introduce correctness bugs.
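
      A simplified, self-contained sketch of the change (hypothetical names, not the actual Master.scala):

      ```scala
      object MasterSketch {
        private def removeExecutor(): Unit = ()  // stand-in for executor bookkeeping
        private def schedule(): Unit = ()        // stand-in for resource scheduling

        def onExecutorStateChanged(exitStatus: Option[Int]): Unit = {
          removeExecutor()
          // Before the patch: if (exitStatus.exists(_ != 0)) schedule()
          schedule()  // after: unconditional, so zero-exit-code executors get replaced too
        }
      }
      ```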
      
      ## How was this patch tested?
      
      I added a regression test in `DistributedSuite`.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14510 from JoshRosen/SPARK-16925.
      4f5f9b67
  5. Aug 05, 2016
    • Nicholas Chammas's avatar
      [SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration + Update "important classes" · 2dd03886
      Nicholas Chammas authored
      ## Proposed Changes
      
      * Update the list of "important classes" in `pyspark.sql` to match 2.0.
      * Fix references to `UDFRegistration` so that the class shows up in the docs. It currently [doesn't](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html).
      * Remove some unnecessary whitespace in the Python RST doc files.
      
      I reused the [existing JIRA](https://issues.apache.org/jira/browse/SPARK-16772) I created last week for similar API doc fixes.
      
      ## How was this patch tested?
      
      * I ran `lint-python` successfully.
      * I ran `make clean build` on the Python docs and confirmed the results are as expected locally in my browser.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #14496 from nchammas/SPARK-16772-UDFRegistration.
      2dd03886
    • Artur Sukhenko's avatar
      [SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ… · 14dba452
      Artur Sukhenko authored
      ## What changes were proposed in this pull request?
      
      Mask `spark.authenticate.secret` on Spark environment page (Web UI).
      This is addition to https://github.com/apache/spark/pull/14409
      
      ## How was this patch tested?
      `./dev/run-tests`
      [info] ScalaTest
      [info] Run completed in 1 hour, 8 minutes, 38 seconds.
      [info] Total number of tests run: 2166
      [info] Suites: completed 65, aborted 0
      [info] Tests: succeeded 2166, failed 0, canceled 0, ignored 590, pending 0
      [info] All tests passed.
      
      Author: Artur Sukhenko <artur.sukhenko@gmail.com>
      
      Closes #14484 from Devian-ua/SPARK-16796.
      14dba452
    • hyukjinkwon's avatar
      [SPARK-16847][SQL] Prevent potentially reading corrupt statistics on binary in... · 55d6dad6
      hyukjinkwon authored
      [SPARK-16847][SQL] Prevent potentially reading corrupt statistics on binary in Parquet vectorized reader
      
      ## What changes were proposed in this pull request?
      
      This problem was found in [PARQUET-251](https://issues.apache.org/jira/browse/PARQUET-251), and we previously disabled filter pushdown on binary columns in Spark. We re-enabled it after upgrading Parquet, but it seems there is a potential incompatibility for Parquet files written by older Spark versions.
      
      Currently, this does not happen in the normal Parquet reader, where the case is handled. However, Spark's vectorized reader is implemented separately from Parquet's standard API and does not handle it.
      
      It is okay to just pass `FileMetaData`; this is how parquet-mr handles it (see https://github.com/apache/parquet-mr/commit/e3b95020f777eb5e0651977f654c1662e3ea1f29). This prevents loading corrupt statistics for each page in Parquet.
      
      This PR replaces the deprecated usage of the constructor.
      
      ## How was this patch tested?
      
      N/A
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #14450 from HyukjinKwon/SPARK-16847.
      55d6dad6
    • Yin Huai's avatar
      [SPARK-16901] Hive settings in hive-site.xml may be overridden by Hive's default values · e679bc3c
      Yin Huai authored
      ## What changes were proposed in this pull request?
      When we create the HiveConf for the metastore client, we use a Hadoop Conf as the base, which may contain Hive settings from hive-site.xml (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L49). However, HiveConf's initialize function basically ignores the base Hadoop Conf and always uses its own default values (i.e. settings with non-null default values) as the base (https://github.com/apache/hive/blob/release-1.2.1/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2687). So, even if a user puts javax.jdo.option.ConnectionURL in hive-site.xml, it is not used, and Hive falls back to its default, which is jdbc:derby:;databaseName=metastore_db;create=true.
      
      This issue only shows up when `spark.sql.hive.metastore.jars` is not set to builtin.
      
      ## How was this patch tested?
      New test in HiveSparkSubmitSuite.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #14497 from yhuai/SPARK-16901.
      e679bc3c
    • Yanbo Liang's avatar
      [SPARK-16750][FOLLOW-UP][ML] Add transformSchema for... · 6cbde337
      Yanbo Liang authored
      [SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/VectorAssembler and fix failed tests.
      
      ## What changes were proposed in this pull request?
      This is a follow-up for #14378. When we added ```transformSchema``` for all estimators and transformers, I found that tests failed for ```StringIndexer``` and ```VectorAssembler```. So I moved this part of the work into a separate PR, to make it clearer to review.
      The corresponding tests should throw ```IllegalArgumentException``` during schema validation once we add ```transformSchema```. It is more efficient to throw the exception at the start of ```fit``` or ```transform``` than partway through the process (see the sketch below).
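
      A hedged sketch of the fail-fast validation idea (a hypothetical transformer's check, not the actual StringIndexer code):

      ```scala
      import org.apache.spark.sql.types._

      // Validate the input column up front; both failure paths throw
      // IllegalArgumentException before any data is processed.
      def transformSchemaSketch(schema: StructType, inputCol: String): StructType = {
        val field = schema(inputCol)  // throws if the column is missing
        require(field.dataType == StringType,
          s"Input column $inputCol must be StringType but was ${field.dataType}.")
        schema.add(StructField(inputCol + "_indexed", DoubleType, nullable = false))
      }
      ```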
      
      ## How was this patch tested?
      Modified unit tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14455 from yanboliang/transformSchema.
      6cbde337
    • Ekasit Kijsipongse's avatar
      [SPARK-13238][CORE] Add ganglia dmax parameter · 1f96c97f
      Ekasit Kijsipongse authored
      The current Ganglia reporter doesn't set a metric expiration time (dmax), so the metrics of all finished applications are left displayed indefinitely in the Ganglia web UI. The dmax parameter allows the user to set the lifetime of the metrics. The default value is 0, for compatibility with previous versions.
      
      Author: Ekasit Kijsipongse <ekasitk@gmail.com>
      
      Closes #11127 from ekasitk/ganglia-dmax.
      1f96c97f
    • Bryan Cutler's avatar
      [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs · 180fd3e0
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Improve example outputs to better reflect the functionality that is being presented.  This mostly consisted of modifying what was printed at the end of the example, such as calling show() with truncate=False, but sometimes required minor tweaks in the example data to get relevant output.  Explicitly set parameters when they are used as part of the example.  Fixed Java examples that failed to run because of using old-style MLlib Vectors or problem with schema.  Synced examples between different APIs.
      
      ## How was this patch tested?
      Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #14308 from BryanCutler/ml-examples-improve-output-SPARK-16260.
      180fd3e0
    • Sylvain Zimmer's avatar
      [SPARK-16826][SQL] Switch to java.net.URI for parse_url() · 2460f03f
      Sylvain Zimmer authored
      ## What changes were proposed in this pull request?
      The java.net.URL class has a globally synchronized Hashtable, which limits the throughput of any single executor doing lots of calls to parse_url(). Tests have shown that a 36-core machine can only get to 10% CPU use because the threads are locked most of the time.
      
      This patch switches to java.net.URI, which offers fewer features than java.net.URL but focuses on URI parsing, which is enough for parse_url().
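
      A small illustration of the swap (standard-library calls only):

      ```scala
      import java.net.URI

      // URI parsing involves no global lock, unlike java.net.URL, whose protocol
      // handler table is a synchronized Hashtable shared across threads.
      val uri = new URI("http://example.com/path?query=1#frag")
      val host = uri.getHost        // "example.com"
      val query = uri.getRawQuery   // "query=1"
      ```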
      
      New tests were added to make sure a few common edge cases didn't change behaviour.
      https://issues.apache.org/jira/browse/SPARK-16826
      
      ## How was this patch tested?
      I've kept the old URL code commented for now, so that people can verify that the new unit tests do pass with java.net.URL.
      
      Thanks to srowen for the help!
      
      Author: Sylvain Zimmer <sylvain@sylvainzimmer.com>
      
      Closes #14488 from sylvinus/master.
      2460f03f
    • Yuming Wang's avatar
      [SPARK-16625][SQL] General data types to be mapped to Oracle · 39a2b2ea
      Yuming Wang authored
      ## What changes were proposed in this pull request?
      
      Spark will convert **BooleanType** to **BIT(1)**, **LongType** to **BIGINT**, **ByteType**  to **BYTE** when saving DataFrame to Oracle, but Oracle does not support BIT, BIGINT and BYTE types.
      
      This PR converts the following _Spark types_ to _Oracle types_, per the [Oracle Developer's Guide](https://docs.oracle.com/cd/E19501-01/819-3659/gcmaz/); a dialect sketch follows the table.
      
      Spark Type | Oracle
      ----|----
      BooleanType | NUMBER(1)
      IntegerType | NUMBER(10)
      LongType | NUMBER(19)
      FloatType | NUMBER(19, 4)
      DoubleType | NUMBER(19, 4)
      ByteType | NUMBER(3)
      ShortType | NUMBER(5)
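
      A hedged sketch of how such a mapping is expressed through a custom JDBC dialect (the PR changes OracleDialect itself; this merely mirrors the table above):

      ```scala
      import java.sql.Types
      import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
      import org.apache.spark.sql.types._

      object OracleMappingSketch extends JdbcDialect {
        override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

        // Map Spark types to Oracle NUMBER types instead of BIT/BIGINT/BYTE.
        override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
          case BooleanType => Some(JdbcType("NUMBER(1)", Types.NUMERIC))
          case IntegerType => Some(JdbcType("NUMBER(10)", Types.NUMERIC))
          case LongType    => Some(JdbcType("NUMBER(19)", Types.NUMERIC))
          case FloatType   => Some(JdbcType("NUMBER(19, 4)", Types.NUMERIC))
          case DoubleType  => Some(JdbcType("NUMBER(19, 4)", Types.NUMERIC))
          case ByteType    => Some(JdbcType("NUMBER(3)", Types.NUMERIC))
          case ShortType   => Some(JdbcType("NUMBER(5)", Types.NUMERIC))
          case _           => None
        }
      }
      ```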
      
      ## How was this patch tested?
      
      Add new tests in [JDBCSuite.scala](https://github.com/wangyum/spark/commit/22b0c2a4228cb8b5098ad741ddf4d1904e745ff6#diff-dc4b58851b084b274df6fe6b189db84d) and [OracleDialect.scala](https://github.com/wangyum/spark/commit/22b0c2a4228cb8b5098ad741ddf4d1904e745ff6#diff-5e0cadf526662f9281aa26315b3750ad)
      
      Author: Yuming Wang <wgyumg@gmail.com>
      
      Closes #14377 from wangyum/SPARK-16625.
      39a2b2ea
    • petermaxlee's avatar
      [MINOR] Update AccumulatorV2 doc to not mention "+=". · e0260641
      petermaxlee authored
      ## What changes were proposed in this pull request?
      As reported by Bryan Cutler on the mailing list, AccumulatorV2 does not have a += method, yet the documentation still references it.
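
      The documented replacement is add(); a quick sketch assuming an active SparkContext `sc`:

      ```scala
      val acc = sc.longAccumulator("events")
      sc.parallelize(1 to 100).foreach(_ => acc.add(1))  // not: acc += 1
      println(acc.value)  // 100
      ```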
      
      ## How was this patch tested?
      N/A
      
      Author: petermaxlee <petermaxlee@gmail.com>
      
      Closes #14466 from petermaxlee/accumulator.
      e0260641