  1. May 09, 2017
    • [SPARK-20615][ML][TEST] SparseVector.argmax throws IndexOutOfBoundsException · be53a783
      Jon McLean authored
      ## What changes were proposed in this pull request?
      
      Added a check for the number of defined values. Previously, the argmax function assumed that at least one value was defined if the vector size was greater than zero.
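
      For context, a minimal reproduction sketch (a hedged example, not the test added by the PR):

      ```scala
      import org.apache.spark.ml.linalg.Vectors

      // A sparse vector with non-zero size but no explicitly defined values.
      val v = Vectors.sparse(5, Array.empty[Int], Array.empty[Double])

      // Previously argmax assumed `values` was non-empty and threw
      // IndexOutOfBoundsException; with the added check it can return index 0,
      // the position of the first implicit zero.
      println(v.argmax)
      ```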
      
      ## How was this patch tested?
      
      Tests were added to the existing VectorsSuite to cover this case.
      
      Author: Jon McLean <jon.mclean@atsid.com>
      
      Closes #17877 from jonmclean/vectorArgmaxIndexBug.
    • [SPARK-20587][ML] Improve performance of ML ALS recommendForAll · 10b00aba
      Nick Pentreath authored
      This PR is a `DataFrame` version of #17742 for [SPARK-11968](https://issues.apache.org/jira/browse/SPARK-11968), for improving the performance of `recommendAll` methods.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #17845 from MLnick/ml-als-perf.
    • [SPARK-11968][MLLIB] Optimize MLLIB ALS recommendForAll · 80794247
      Peng authored
      The recommendForAll method of MLLIB ALS is very slow, and GC is a key problem of the current implementation.
      Each task uses the following code to keep its temporary result:
      val output = new Array[(Int, (Int, Double))](m*n)
      where m = n = 4096 (the default value, with no way to set it), so output takes about 4k * 4k * (4 + 4 + 8) = 256M. Allocating this much memory causes serious GC problems and frequently leads to OOM.

      Actually, we don't need to keep all the temporary results. Suppose we recommend the topK (topK is about 10 or 20) products for each user; then we only need 4k * topK * (4 + 4 + 8) of memory to hold the temporary result.
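
      To illustrate the idea (a sketch of the bounded top-k approach, not the actual patch; the helper below is hypothetical):

      ```scala
      import scala.collection.mutable

      // Keep only the running top-k (item, score) pairs instead of all m * n.
      def topK(scores: Iterator[(Int, Double)], k: Int): Array[(Int, Double)] = {
        // Min-heap on score: the weakest of the current top-k sits at the head.
        val heap = mutable.PriorityQueue.empty(Ordering.by((p: (Int, Double)) => -p._2))
        scores.foreach { case (item, score) =>
          if (heap.size < k) {
            heap.enqueue((item, score))
          } else if (score > heap.head._2) {
            heap.dequeue()
            heap.enqueue((item, score))
          }
        }
        heap.dequeueAll.reverse.toArray // highest score first
      }
      ```

      With the default 4096 x 4096 blocks, memory per task then scales with 4k * topK rather than 4k * 4k.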
      
      Test environment:
      3 workers, each with 10 cores, 30G memory, and 1 executor.
      Data: 480,000 users and 17,000 items.
      
      | Block size    | 1024 | 2048 | 4096 | 8192 |
      |---------------|------|------|------|------|
      | Old method    | 245s | 332s | 488s | OOM  |
      | This solution | 121s | 118s | 117s | 120s |
      
      The existing UT.
      
      Author: Peng <peng.meng@intel.com>
      Author: Peng Meng <peng.meng@intel.com>
      
      Closes #17742 from mpjlu/OptimizeAls.
    • [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames() test fails · b952b44a
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Change it to check a relative count, as done for the catalog APIs in this test: https://github.com/apache/spark/blame/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L3355
      
      ## How was this patch tested?
      
      Unit tests; this needs to be combined with another commit containing the SQL change in order to verify.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17905 from felixcheung/rtabletests.
  2. May 08, 2017
    • [SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails · 2abfee18
      Hossein authored
      ## What changes were proposed in this pull request?
      Clean up existing temp tables before running the tableNames tests.
      
      ## How was this patch tested?
      SparkR Unit tests
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #17903 from falaki/SPARK-20661.
    • [SPARK-20605][CORE][YARN][MESOS] Deprecate not used AM and executor port configuration · 829cd7b8
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      After SPARK-10997, the client-mode Netty RpcEnv doesn't need to start a server, so these port configurations are not used any more. This PR proposes to deprecate the two configurations "spark.executor.port" and "spark.am.port".
      
      ## How was this patch tested?
      
      Existing UTs.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #17866 from jerryshao/SPARK-20605.
    • [SPARK-20621][DEPLOY] Delete deprecated config parameter in 'spark-env.sh' · aeb2ecc0
      Xianyang Liu authored
      ## What changes were proposed in this pull request?
      
      Currently, `spark.executor.instances` is deprecated in `spark-env.sh`, because we suggest configuring it in `spark-defaults.conf` or another config file. This parameter also has no effect even if you set it in `spark-env.sh`, so this patch removes it.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Xianyang Liu <xianyang.liu@intel.com>
      
      Closes #17881 from ConeyLiu/deprecatedParam.
    • [SPARK-20596][ML][TEST] Consolidate and improve ALS recommendAll test cases · 58518d07
      Nick Pentreath authored
      Existing test cases for `recommendForAllX` methods (added in [SPARK-19535](https://issues.apache.org/jira/browse/SPARK-19535)) test `k < num items` and `k = num items`. Technically we should also test that `k > num items` returns the same results as `k = num items`.
      
      ## How was this patch tested?
      
      Updated existing unit tests.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #17860 from MLnick/SPARK-20596-als-rec-tests.
    • [SPARK-19956][CORE] Optimize a location order of blocks with topology information · 15526653
      Xianyang Liu authored
      ## What changes were proposed in this pull request?
      
      When calling the getLocations method of BlockManager, we only compare the hosts of the data blocks. Non-local data blocks are selected at random, which may pick a block located in a different rack. This patch adds rack information to the sorting of block locations.
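
      Roughly, the intended ordering looks like this (a sketch with assumed names, not the actual BlockManager code):

      ```scala
      // Prefer same-host locations, then same-rack locations, then the rest.
      case class Loc(host: String, rack: Option[String])

      def sortByLocality(
          locs: Seq[Loc],
          localHost: String,
          localRack: Option[String]): Seq[Loc] = {
        val (hostLocal, nonLocal) = locs.partition(_.host == localHost)
        val (rackLocal, others) =
          nonLocal.partition(l => localRack.isDefined && l.rack == localRack)
        hostLocal ++ rackLocal ++ others
      }
      ```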
      
      ## How was this patch tested?
      
      New test case.
      
      Author: Xianyang Liu <xianyang.liu@intel.com>
      
      Closes #17300 from ConeyLiu/blockmanager.
    • [SPARK-20519][SQL][CORE] Modify to prevent some possible runtime exceptions · 0f820e2b
      liuxian authored
      Signed-off-by: liuxian <liu.xian3@zte.com.cn>
      
      ## What changes were proposed in this pull request?
      
      When an input parameter is null, a runtime exception may occur.
      
      ## How was this patch tested?
      Existing unit tests
      
      Author: liuxian <liu.xian3@zte.com.cn>
      
      Closes #17796 from 10110346/wip_lx_0428.
    • [SPARKR][DOC] fix typo in vignettes · 2fdaeb52
      Wayne Zhang authored
      ## What changes were proposed in this pull request?
      Fix typo in vignettes
      
      Author: Wayne Zhang <actuaryzhang@uber.com>
      
      Closes #17884 from actuaryzhang/typo.
    • [SPARK-20380][SQL] Unable to set/unset table comment property using ALTER TABLE SET/UNSET TBLPROPERTIES ddl · 42cc6d13
      sujith71955 authored
      
      ### What changes were proposed in this pull request?
      The table comment was not getting set/unset by an **ALTER TABLE SET/UNSET TBLPROPERTIES** query,
      e.g.: ALTER TABLE table_with_comment SET TBLPROPERTIES ("comment" = "modified comment")
      When the user altered the table properties and added/updated the table comment, the comment, which is a field of the **CatalogTable** instance, was not getting updated, and the old table comment, if present, was shown to the user. To handle this issue, the comment field value in **CatalogTable** is updated with the newly added/modified comment, along with the other table-level properties, when the user executes an **ALTER TABLE SET TBLPROPERTIES** query.

      This PR also takes care of unsetting the table comment when the user executes an **ALTER TABLE UNSET TBLPROPERTIES** query in order to unset or remove the table comment,
      e.g.: ALTER TABLE table_comment UNSET TBLPROPERTIES IF EXISTS ('comment')
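
      A sketch of the fixed behavior via `spark.sql` (assuming a SparkSession `spark` with Hive support; the table name is hypothetical):

      ```scala
      spark.sql("CREATE TABLE t (id INT) COMMENT 'initial comment'")
      spark.sql("ALTER TABLE t SET TBLPROPERTIES ('comment' = 'modified comment')")
      // DESC FORMATTED should now show the modified comment, not the old one.
      spark.sql("DESC FORMATTED t").show(truncate = false)
      // Unsetting the property removes the table comment.
      spark.sql("ALTER TABLE t UNSET TBLPROPERTIES IF EXISTS ('comment')")
      ```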
      
      ### How was this patch tested?
      Added test cases  as part of **SQLQueryTestSuite** for verifying  table comment using desc formatted table query after adding/modifying table comment as part of **AlterTableSetPropertiesCommand** and unsetting the table comment using **AlterTableUnsetPropertiesCommand**.
      
      Author: sujith71955 <sujithchacko.2010@gmail.com>
      
      Closes #17649 from sujith71955/alter_table_comment.
    • [SPARK-20626][SPARKR] address date test warning with timezone on windows · c24bdaab
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      set timezone on windows
      
      ## How was this patch tested?
      
      unit test, AppVeyor
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17892 from felixcheung/rtimestamptest.
  3. May 07, 2017
    • [SPARK-12297][SQL] Hive compatibility for Parquet Timestamps · 22691556
      Imran Rashid authored
      ## What changes were proposed in this pull request?
      
      This change allows timestamps in parquet-based Hive tables to behave as a "floating time", without a timezone, as timestamps do for other file formats. If the storage timezone is the same as the session timezone, this conversion is a no-op. When data is read from a Hive table, the table property is *always* respected. This allows Spark not to change behavior when reading old data, while reading newly written data correctly (whatever the source of the data is).
      
      Spark inherited the original behavior from Hive, but Hive is also updating behavior to use the same  scheme in HIVE-12767 / HIVE-16231.
      
      The default for Spark remains unchanged; created tables do not include the new table property.
      
      This will only apply to hive tables; nothing is added to parquet metadata to indicate the timezone, so data that is read or written directly from parquet files will never have any conversions applied.
      
      ## How was this patch tested?
      
      Added a unit test which creates tables, reads and writes data, under a variety of permutations (different storage timezones, different session timezones, vectorized reading on and off).
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #16781 from squito/SPARK-12297.
    • [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucketBy · f53a8207
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds Python wrappers for `DataFrameWriter.bucketBy` and `DataFrameWriter.sortBy` ([SPARK-16931](https://issues.apache.org/jira/browse/SPARK-16931))
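
      For reference, a minimal sketch of the underlying Scala `DataFrameWriter` API that the new Python wrapper mirrors (assuming a SparkSession `spark`; names are illustrative):

      ```scala
      val df = spark.range(100).selectExpr("id % 10 AS user_id", "id AS ts")

      df.write
        .bucketBy(4, "user_id")         // hash rows into 4 buckets by user_id
        .sortBy("ts")                   // sort rows within each bucket
        .saveAsTable("bucketed_events") // bucketBy requires saveAsTable
      ```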
      
      ## How was this patch tested?
      
      Unit tests covering new feature.
      
      __Note__: Based on work of GregBowyer (f49b9a23468f7af32cb53d2b654272757c151725)
      
      CC HyukjinKwon
      
      Author: zero323 <zero323@users.noreply.github.com>
      Author: Greg Bowyer <gbowyer@fastmail.co.uk>
      
      Closes #17077 from zero323/SPARK-16931.
    • [SPARK-20550][SPARKR] R wrapper for Dataset.alias · 1f73d358
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add SparkR wrapper for `Dataset.alias`.
      - Adjust roxygen annotations for `functions.alias` (including example usage).
      
      ## How was this patch tested?
      
      Unit tests, `check_cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17825 from zero323/SPARK-20550.
    • [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (and typo hunting) · 500436b4
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      * Docs are consistent (across different `unix_timestamp` variants and their internal expressions)
      * typo hunting
      
      ## How was this patch tested?
      
      local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #17801 from jaceklaskowski/unix_timestamp.
    • [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppVeyor · 7087e011
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      add environment
      
      ## How was this patch tested?
      
      wait for appveyor run
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17878 from felixcheung/appveyorrcran.
    • [SPARK-7481][BUILD] Add spark-hadoop-cloud module to pull in object store access. · 2cf83c47
      Steve Loughran authored
      ## What changes were proposed in this pull request?
      
      Add a new `spark-hadoop-cloud` module and maven profile to pull in object store support from `hadoop-openstack`, `hadoop-aws` and `hadoop-azure` (Hadoop 2.7+) JARs, along with their dependencies, fixing up the dependencies so that everything works, in particular Jackson.
      
      It restores `s3n://` access to S3, adds its `s3a://` replacement, OpenStack `swift://` and azure `wasb://`.
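
      With the profile enabled, object store paths then work like any other filesystem URI; a hedged example (bucket and path hypothetical):

      ```scala
      // Reading from S3 via the s3a connector pulled in by hadoop-cloud.
      val logs = spark.read.text("s3a://my-bucket/logs/2017-05-04/*.gz")
      logs.show(5)
      ```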
      
      There's a documentation page, `cloud_integration.md`, which covers the basic details of using Spark with object stores, referring the reader to the supplier's own documentation, with specific warnings on security and the possible mismatch between a store's behavior and that of a filesystem. In particular, users are advised to be very cautious when trying to use an object store as the destination of data, and to consult the documentation of the storage supplier and the connector.
      
      (this is the successor to #12004; I can't re-open it)
      
      ## How was this patch tested?
      
      Downstream tests exist in [https://github.com/steveloughran/spark-cloud-examples/tree/master/cloud-examples](https://github.com/steveloughran/spark-cloud-examples/tree/master/cloud-examples)
      
      Those verify that the dependencies are sufficient to allow downstream applications to work with s3a, azure wasb and swift storage connectors, and perform basic IO & dataframe operations thereon. All seems well.
      
      Manually clean build & verify that assembly contains the relevant aws-* hadoop-* artifacts on Hadoop 2.6; azure on a hadoop-2.7 profile.
      
      SBT build: `build/sbt -Phadoop-cloud -Phadoop-2.7 package`
      maven build `mvn install -Phadoop-cloud -Phadoop-2.7`
      
      This PR *does not* update `dev/deps/spark-deps-hadoop-2.7` or `dev/deps/spark-deps-hadoop-2.6`, because unless the hadoop-cloud profile is enabled, no extra JARs show up in the dependency list. The dependency check in Jenkins isn't setting the property, so the new JARs aren't visible.
      
      Author: Steve Loughran <stevel@apache.org>
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #17834 from steveloughran/cloud/SPARK-7481-current.
    • [SPARK-20484][MLLIB] Add documentation to ALS code · 88e6d750
      Daniel Li authored
      ## What changes were proposed in this pull request?
      
      This PR adds documentation to the ALS code.
      
      ## How was this patch tested?
      
      Existing tests were used.
      
      mengxr srowen
      
      This contribution is my original work.  I have the license to work on this project under the Spark project’s open source license.
      
      Author: Daniel Li <dan@danielyli.com>
      
      Closes #17793 from danielyli/spark-20484.
    • [SPARK-20518][CORE] Supplement the new blockidsuite unit tests · 37f963ac
      caoxuewen authored
      ## What changes were proposed in this pull request?
      
      This PR adds new unit tests to cover ShuffleDataBlockId, ShuffleIndexBlockId, TempShuffleBlockId, and TempLocalBlockId.
      
      ## How was this patch tested?
      
      The new unit test.
      
      Author: caoxuewen <cao.xuewen@zte.com.cn>
      
      Closes #17794 from heary-cao/blockidsuite.
    • [SPARK-18777][PYTHON][SQL] Return UDF from udf.register · 63d90e7d
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Move udf wrapping code from `functions.udf` to `functions.UserDefinedFunction`.
      - Return wrapped udf from `catalog.registerFunction` and dependent methods.
      - Update docstrings in `catalog.registerFunction` and `SQLContext.registerFunction`.
      - Unit tests.
      
      ## How was this patch tested?
      
      - Existing unit tests and docstests.
      - Additional tests covering new feature.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17831 from zero323/SPARK-18777.
    • [SPARK-20557][SQL] Support JDBC data type Time with Time Zone · cafca54c
      Xiao Li authored
      ### What changes were proposed in this pull request?
      
      This PR is to support the JDBC data type TIME WITH TIME ZONE. It can be converted to TIMESTAMP.
      
      In addition, before this PR, for unsupported data types, we simply output the type number instead of the type name.
      
      ```
      java.sql.SQLException: Unsupported type 2014
      ```
      After this PR, the message is like
      ```
      java.sql.SQLException: Unsupported type TIMESTAMP_WITH_TIMEZONE
      ```
      
      - Also upgrade the H2 version to `1.4.195` which has the type fix for "TIMESTAMP WITH TIMEZONE". However, it is not fully supported. Thus, we capture the exception, but we still need it to partially test the support of "TIMESTAMP WITH TIMEZONE", because Docker tests are not regularly run.
      
      ### How was this patch tested?
      Added test cases.
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #17835 from gatorsmile/h2.
  4. May 05, 2017
    • [SPARK-20614][PROJECT INFRA] Use the same log4j configuration with Jenkins in AppVeyor · b433acae
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      Currently, the logs flood the AppVeyor console. This has been fine because we can download all the logs; however (given my observations so far), the logs get truncated when there are too many. The output has grown recently and has started to get truncated. For example, see https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1209-master
      
      Even after the log is downloaded, it looks truncated as below:
      
      ```
      [00:44:21] 17/05/04 18:56:18 INFO TaskSetManager: Finished task 197.0 in stage 601.0 (TID 9211) in 0 ms on localhost (executor driver) (194/200)
      [00:44:21] 17/05/04 18:56:18 INFO Executor: Running task 199.0 in stage 601.0 (TID 9213)
      [00:44:21] 17/05/04 18:56:18 INFO Executor: Finished task 198.0 in stage 601.0 (TID 9212). 2473 bytes result sent to driver
      ...
      ```
      
      It probably looks better to use the same log4j configuration that we use for the SparkR tests in Jenkins (please see https://github.com/apache/spark/blob/fc472bddd1d9c6a28e57e31496c0166777af597e/R/run-tests.sh#L26 and https://github.com/apache/spark/blob/fc472bddd1d9c6a28e57e31496c0166777af597e/R/log4j.properties):
      ```
      # Set everything to be logged to the file target/unit-tests.log
      log4j.rootCategory=INFO, file
      log4j.appender.file=org.apache.log4j.FileAppender
      log4j.appender.file.append=true
      log4j.appender.file.file=R/target/unit-tests.log
      log4j.appender.file.layout=org.apache.log4j.PatternLayout
      log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
      
      # Ignore messages below warning level from Jetty, because it's a bit verbose
      log4j.logger.org.eclipse.jetty=WARN
      org.eclipse.jetty.LEVEL=WARN
      ```
      
      ## How was this patch tested?
      
      Manually tested with spark-test account
        - https://ci.appveyor.com/project/spark-test/spark/build/672-r-log4j (there is an example for flaky test here)
        - https://ci.appveyor.com/project/spark-test/spark/build/673-r-log4j (I re-ran the build).
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17873 from HyukjinKwon/appveyor-reduce-logs.
    • [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch · 5d75b14b
      Juliusz Sompolski authored
      ## What changes were proposed in this pull request?
      
      Due to a likely typo, the logDebug message printing the diff of query plans showed the diff against the initial plan, not against the plan at the start of the batch.
      
      ## How was this patch tested?
      
      Now the debug message prints the diff between start and end of batch.
      
      Author: Juliusz Sompolski <julek@databricks.com>
      
      Closes #17875 from juliuszsompolski/SPARK-20616.
    • [SPARK-20557][SQL] Support for db column type TIMESTAMP WITH TIME ZONE · b31648c0
      Jannik Arndt authored
      ## What changes were proposed in this pull request?
      
      SparkSQL can now read from a database table with column type [TIMESTAMP WITH TIME ZONE](https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html#TIMESTAMP_WITH_TIMEZONE).
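
      A hedged sketch of the read path (connection details are hypothetical; the TIMESTAMP WITH TIME ZONE column surfaces as Spark's timestamp type):

      ```scala
      val df = spark.read
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/orcl") // hypothetical
        .option("dbtable", "events_with_tstz")
        .option("user", "scott")
        .option("password", "tiger")
        .load()

      df.printSchema() // the TIMESTAMP WITH TIME ZONE column maps to timestamp
      ```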
      
      ## How was this patch tested?
      
      Tested against Oracle database.
      
      JoshRosen, you seem to know the class, would you look at this? Thanks!
      
      Author: Jannik Arndt <jannik@jannikarndt.de>
      
      Closes #17832 from JannikArndt/spark-20557-timestamp-with-timezone.
    • [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load · bd578828
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create the Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the creation happens during the first test, `deserialization of initial offset with Spark 2.1.0`, and causes it to time out.
      
      This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster.
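
      In other words, the embedded test broker is configured with something like the following (the property name is Kafka's broker config; the surrounding broker setup is assumed):

      ```scala
      import java.util.Properties

      // Broker properties for the embedded Kafka instance used by the tests.
      val brokerProps = new Properties()
      // Create the internal __consumer_offsets topic with 1 partition, not 50.
      brokerProps.put("offsets.topic.num.partitions", "1")
      ```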
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #17863 from zsxwing/fix-kafka-flaky-test.
    • [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ObjectHashAggregateExec · 41439fd5
      Yucai authored
      ## What changes were proposed in this pull request?
      
      ObjectHashAggregateExec is missing the numOutputRows metric; this PR adds it.
      
      ## How was this patch tested?
      
      Added unit tests for the new metrics.
      
      Author: Yucai <yucai.yu@intel.com>
      
      Closes #17678 from yucai/objectAgg_numOutputRows.
    • [SPARK-20613] Remove excess quotes in Windows executable · b9ad2d19
      Jarrett Meyer authored
      ## What changes were proposed in this pull request?
      
      Quotes are already added to the RUNNER variable on line 54. There is no need to put quotes on line 67. If you do, you will get an error when launching Spark.
      
      '""C:\Program' is not recognized as an internal or external command, operable program or batch file.
      
      ## How was this patch tested?
      
      Tested manually on Windows 10.
      
      Author: Jarrett Meyer <jarrettmeyer@gmail.com>
      
      Closes #17861 from jarrettmeyer/fix-windows-cmd.
    • [SPARK-20495][SQL][CORE] Add StorageLevel to cacheTable API · 9064f1b0
      madhu authored
      ## What changes were proposed in this pull request?
      Currently the cacheTable API only supports MEMORY_AND_DISK. This PR adds an additional API that takes different storage levels.
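
      A sketch of the intended usage (signature as proposed in this PR; assuming a SparkSession `spark`):

      ```scala
      import org.apache.spark.storage.StorageLevel

      // Cache a table with an explicit storage level instead of the default.
      spark.catalog.cacheTable("my_table", StorageLevel.MEMORY_ONLY)
      ```
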
      ## How was this patch tested?
      unit tests
      
      Author: madhu <phatak.dev@gmail.com>
      
      Closes #17802 from phatak-dev/cacheTableAPI.
    • [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode · 5773ab12
      jyu00 authored
      ## What changes were proposed in this pull request?
      
      Updated spark-class to turn off posix mode so the process substitution doesn't cause a syntax error.
      
      ## How was this patch tested?
      
      Existing unit tests, manual spark-shell testing with posix mode on
      
      Author: jyu00 <jessieyu@us.ibm.com>
      
      Closes #17852 from jyu00/master.
    • [SPARK-19660][SQL] Replace the deprecated property name fs.default.name to fs.defaultFS that newly introduced · 37cdf077
      Yuming Wang authored
      
      ## What changes were proposed in this pull request?
      
      Replace the deprecated property name `fs.default.name` with the newly introduced `fs.defaultFS`.
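
      For instance (a minimal sketch; `fs.defaultFS` is the current Hadoop property name):

      ```scala
      // Set the default filesystem using the non-deprecated property name.
      spark.sparkContext.hadoopConfiguration.set("fs.defaultFS", "hdfs://namenode:8020")
      ```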
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Yuming Wang <wgyumg@gmail.com>
      
      Closes #17856 from wangyum/SPARK-19660.
    • [INFRA] Close stale PRs · 4411ac70
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to close a stale PR, several PRs that a committer suggested closing, and obviously inappropriate PRs.
      
      Closes #11119
      Closes #17853
      Closes #17732
      Closes #17456
      Closes #17410
      Closes #17314
      Closes #17362
      Closes #17542
      
      ## How was this patch tested?
      
      N/A
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17855 from HyukjinKwon/close-pr.
  5. May 04, 2017
    • [SPARK-20574][ML] Allow Bucketizer to handle non-Double numeric column · 0d16faab
      Wayne Zhang authored
      ## What changes were proposed in this pull request?
      Bucketizer currently requires the input column to be Double, but the logic should work on any numeric data type. Many practical problems have integer/float data types, and it can get very tedious to manually cast them to Double before calling Bucketizer. This PR extends Bucketizer to handle all numeric types.
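
      A short sketch of what now works directly (data and splits illustrative; assuming a SparkSession `spark`):

      ```scala
      import org.apache.spark.ml.feature.Bucketizer

      // An integer input column no longer needs a manual cast to Double.
      val df = spark.createDataFrame(Seq((1, 5), (2, 15), (3, 25))).toDF("id", "value")

      val bucketizer = new Bucketizer()
        .setInputCol("value")
        .setOutputCol("bucket")
        .setSplits(Array(0.0, 10.0, 20.0, 30.0))

      bucketizer.transform(df).show()
      ```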
      
      ## How was this patch tested?
      New test.
      
      Author: Wayne Zhang <actuaryzhang@uber.com>
      
      Closes #17840 from actuaryzhang/bucketizer.
    • [SPARK-20566][SQL] ColumnVector should support `appendFloats` for array · bfc8c79c
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR aims to add a missing `appendFloats` API for array into **ColumnVector** class. For double type, there is `appendDoubles` for array [here](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java#L818-L824).
      
      ## How was this patch tested?
      
      Pass the Jenkins with a newly added test case.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #17836 from dongjoon-hyun/SPARK-20566.
    • [SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regression follow up · c5dceb8c
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Address some minor comments for #17715:
      * Put bound-constrained optimization params under expertParams.
      * Update some docs.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #17829 from yanboliang/spark-20047-followup.
    • [SPARK-20571][SPARKR][SS] Flaky Structured Streaming tests · 57b64703
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Make the tests more reliable by having them wait until the data is processed. Increasing the timeout value might help, but ultimately the flakiness from processing delay when Jenkins is loaded is hard to account for. This relies on an API that isn't actually supported as public.
      
      ## How was this patch tested?
      unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17857 from felixcheung/rsstestrelia.
    • [SPARK-20544][SPARKR] R wrapper for input_file_name · f21897fc
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds wrapper for `o.a.s.sql.functions.input_file_name`
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17818 from zero323/SPARK-20544.
    • [SPARK-20585][SPARKR] R generic hint support · 9c36aa27
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds support for generic hints on `SparkDataFrame`
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17851 from zero323/SPARK-20585.
    • [SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example · b8302ccd
      Felix Cheung authored
      
      ## What changes were proposed in this pull request?
      
      Add
      - R vignettes
      - R programming guide
      - SS programming guide
      - R example
      
      Also disable spark.als in vignettes for now since it's failing (SPARK-20402)
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17814 from felixcheung/rdocss.