  1. Feb 17, 2015
  2. Feb 16, 2015
    • [SPARK-5853][SQL] Schema support in Row. · d380f324
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4640 from rxin/SPARK-5853 and squashes the following commits:
      
      9c6f569 [Reynold Xin] [SPARK-5853][SQL] Schema support in Row.
    • SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream · a51d51ff
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4638 from pwendell/SPARK-5850 and squashes the following commits:
      
      386126f [Patrick Wendell] SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream.
    • [SPARK-5363] [PySpark] check ending mark in non-block way · ac6fe67e
      Davies Liu authored
      There is a chance of deadlock in which the Python process waits for the ending mark from the JVM, but that mark has been consumed by a corrupted stream.

      This PR checks the ending mark from Python in a non-blocking way, so the JVM will not be blocked by the Python process.

      There is a small chance that the ending mark has been sent by the Python process but is not yet available; in that case the Python process will not be reused.
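      For illustration, a minimal sketch of such a non-blocking check (names and details invented here; the actual PythonRDD code differs):

      ```scala
      import java.io.{DataInputStream, EOFException}

      object EndingMarkSketch {
        // Only consume the ending mark if its bytes are already buffered, so a
        // corrupted or stalled stream cannot block the JVM side.
        def endingMarkOk(stream: DataInputStream, expected: Int): Boolean = {
          if (stream.available() >= 4) {
            try stream.readInt() == expected
            catch { case _: EOFException => false }
          } else {
            // The mark may have been sent but not arrived yet; the worker is
            // then conservatively treated as unusable rather than blocking.
            false
          }
        }
      }
      ```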
      
      cc JoshRosen pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4601 from davies/freeze and squashes the following commits:
      
      e15a8c3 [Davies Liu] update logging
      890329c [Davies Liu] Merge branch 'freeze' of github.com:davies/spark into freeze
      2bd2228 [Davies Liu] add more logging
      656d544 [Davies Liu] Update PythonRDD.scala
      05e1085 [Davies Liu] check ending mark in non-block way
    • [SQL] Various DataFrame doc changes. · 0e180bfc
      Reynold Xin authored
      Added a bunch of tags.
      
      Also changed parquetFile to take varargs rather than a string followed by varargs.
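      For reference, a minimal sketch of the signature change (`DataFrame` here is a placeholder; the real method lives on `SQLContext`):

      ```scala
      trait DataFrame

      object SqlContextSketch {
        // Before: def parquetFile(path: String, paths: String*): DataFrame
        // After: plain varargs, so one or many paths are accepted uniformly.
        def parquetFile(paths: String*): DataFrame = new DataFrame {}
      }
      ```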
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4636 from rxin/df-doc and squashes the following commits:
      
      651f80c [Reynold Xin] Fixed parquetFile in PySpark.
      8dc3024 [Reynold Xin] [SQL] Various DataFrame doc changes.
    • [SPARK-5849] Handle more types of invalid JSON requests in SubmitRestProtocolMessage.parseAction · 58a82a78
      Josh Rosen authored
      This patch improves SubmitRestProtocol's handling of invalid JSON requests in cases where those requests were parsable as JSON but not as JSON objects (e.g. they could be parsed as arrays or strings). I replaced an unchecked cast with pattern matching and added a new test case.
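      A sketch of the pattern-matching approach, assuming json4s; the object and field names here are illustrative, not the actual patch:

      ```scala
      import org.json4s._
      import org.json4s.jackson.JsonMethods

      object ParseActionSketch {
        def parseAction(json: String): String = JsonMethods.parse(json) match {
          case JObject(fields) =>
            fields.collectFirst { case ("action", JString(a)) => a }
              .getOrElse(throw new IllegalArgumentException("missing 'action' field"))
          case _ =>
            // Arrays, strings, numbers, etc. are valid JSON but not JSON objects.
            throw new IllegalArgumentException("request is not a JSON object")
        }
      }
      ```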
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4637 from JoshRosen/rest-protocol-cast and squashes the following commits:
      
      b3f282b [Josh Rosen] [SPARK-5849] Handle more types of invalid JSON in SubmitRestProtocolMessage.parseAction
    • [SPARK-3340] Deprecate ADD_JARS and ADD_FILES · 16687651
      azagrebin authored
      I created a patch that disables the environment variables.
      The Scala and Python shells now log a warning message to notify the user about the deprecation,
      with the following messages:
      scala: "ADD_JARS environment variable is deprecated, use --jar spark submit argument instead"
      python: "Warning: ADD_FILES environment variable is deprecated, use --py-files argument instead"

      Is this what is expected, or should the code associated with the variables be removed completely?
      Should it be documented somewhere?
      
      Author: azagrebin <azagrebin@gmail.com>
      
      Closes #4616 from azagrebin/master and squashes the following commits:
      
      bab1aa9 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES: minor readability issue
      0643895 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES: add warning messages
      42f0107 [azagrebin] [SPARK-3340] Deprecate ADD_JARS and ADD_FILES
    • [SPARK-5788] [PySpark] capture the exception in python write thread · b1bd1dd3
      Davies Liu authored
      An uncaught exception in the Python writer thread would shut down the executor; this PR captures it instead.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4577 from davies/exception and squashes the following commits:
      
      eb0ceff [Davies Liu] Update PythonRDD.scala
      139b0db [Davies Liu] capture the exception in python write thread
    • SPARK-5848: tear down the ConsoleProgressBar timer · 1294a6e0
      Matt Whelan authored
      The timer is a GC root, and failing to terminate it leaks SparkContext
      instances.
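      A simplified sketch of the pattern (class and method names invented; the real class is Spark's ConsoleProgressBar):

      ```scala
      import java.util.{Timer, TimerTask}

      class ProgressBarSketch {
        // The Timer's background thread is a GC root: as long as it runs, it
        // keeps this object (and its SparkContext) reachable.
        private val timer = new Timer("refresh progress", true)
        timer.schedule(new TimerTask { def run(): Unit = refresh() }, 0L, 200L)

        private def refresh(): Unit = { /* redraw the console progress bar */ }

        // Cancelling the timer on stop lets the SparkContext be collected.
        def stop(): Unit = timer.cancel()
      }
      ```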
      
      Author: Matt Whelan <mwhelan@perka.com>
      
      Closes #4635 from MattWhelan/SPARK-5848 and squashes the following commits:
      
      2a1e8a5 [Matt Whelan] SPARK-5848: teardown the ConsoleProgressBar timer
    • [SPARK-4865][SQL] Include temporary tables in SHOW TABLES · e189cbb0
      Yin Huai authored
      This PR adds a `ShowTablesCommand` to support the `SHOW TABLES [IN databaseName]` SQL command. The result of `SHOW TABLES` has two columns, `tableName` and `isTemporary`. For temporary tables, the value of the `isTemporary` column will be `true`.
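      A hypothetical usage sketch (assumes a DataFrame `df` and a `sqlContext` in scope; the table name is invented):

      ```scala
      // Register a temporary table, then list tables with their scope.
      df.registerTempTable("people")
      sqlContext.sql("SHOW TABLES").collect()
      // e.g. Row("people", true): temporary tables report isTemporary = true
      ```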
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-4865
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4618 from yhuai/showTablesCommand and squashes the following commits:
      
      0c09791 [Yin Huai] Use ShowTablesCommand.
      85ee76d [Yin Huai] Since SHOW TABLES is not a Hive native command any more and we will not see "OK" (originally generated by Hive's driver), use SHOW DATABASES in the test.
      94bacac [Yin Huai] Add SHOW TABLES to the list of noExplainCommands.
      d71ed09 [Yin Huai] Fix test.
      a4a6ec3 [Yin Huai] Add SHOW TABLE command.
    • [SQL] Optimize arithmetic and predicate operators · cb6c48c8
      kai authored
      The existing implementations of arithmetic operators and BinaryComparison operators contain redundant type-checking code. For example, `Expression.n2` is used by Add/Subtract/Multiply:
      (1) `n2` always checks `left.dataType == right.dataType`, but this check should be done once, when expression types are resolved;
      (2) `n2` requires that `dataType` is a `NumericType`, which can likewise be checked once.

      This PR optimizes arithmetic and predicate operators by removing such redundant type-checking code.
      
      Some preliminary benchmarking on 10G TPC-H data over 5 r3.2xlarge EC2 machines shows that this PR can reduce the query time by 5.5% to 11%.
      The benchmark queries follow the template below, where OP is plus/minus/times/divide/remainder/bitwise and/bitwise or/bitwise xor.
      
      ```sql
      SELECT l_returnflag, l_linestatus,
             SUM(l_quantity OP cnt1), SUM(l_quantity OP cnt2), ..., SUM(l_quantity OP cnt700)
      FROM (
          SELECT l_returnflag, l_linestatus, l_quantity, 1 AS cnt1, 2 AS cnt2, ..., 700 AS cnt700
          FROM lineitem
          WHERE l_shipdate <= '1998-09-01'
      )
      GROUP BY l_returnflag, l_linestatus;
      ```
      
      Author: kai <kaizeng@eecs.berkeley.edu>
      
      Closes #4472 from kai-zeng/arithmetic-optimize and squashes the following commits:
      
      fef0cf1 [kai] Merge branch 'master' of github.com:apache/spark into arithmetic-optimize
      4b3a1bb [kai] chmod a-x
      5a41e49 [kai] chmod a-x Expression.scala
      cb37c94 [kai] rebase onto spark master
      7f6e968 [kai] chmod 100755 -> 100644
      6cddb46 [kai] format
      7490dbc [kai] fix unresolved-expression exception for EqualTo
      9c40bc0 [kai] fix bitwisenot
      3cbd363 [kai] clean up test code
      ca47801 [kai] override evalInternal for bitwise ops
      8fa84a1 [kai] add bitwise or and xor
      6892fc4 [kai] revert override evalInternal
      f8eba24 [kai] override evalInternal
      31ccdd4 [kai] rewrite all bitwise op and remove evalInternal
      86297e2 [kai] generalized
      cb92ae1 [kai] bitwise-and: override eval
      97a7d6c [kai] bitwise-and: override evalInternal using and func
      0906c39 [kai] add bitwise test
      62abbbc [kai] clean up predicate and arithmetic
      b34d58d [kai] add caching and benmark option
      12c5b32 [kai] override eval
      1cd7571 [kai] fix sqrt and maxof
      03fd0c3 [kai] fix predicate
      16fd84c [kai] optimize + - * / % -(unary) abs < > <= >=
      fd95823 [kai] remove unnecessary type checking
      24d062f [kai] test suite
    • [SPARK-5839][SQL] HiveMetastoreCatalog does not recognize table names and aliases of data source tables · f3ff1eb2
      Yin Huai authored
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-5839
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4626 from yhuai/SPARK-5839 and squashes the following commits:
      
      f779d85 [Yin Huai] Use subqeury to wrap replaced ParquetRelation.
      2695f13 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-5839
      f1ba6ca [Yin Huai] Address comment.
      2c7fa08 [Yin Huai] Use Subqueries to wrap a data source table.
    • [SPARK-5746][SQL] Check invalid cases for the write path of data source API · 5b6cd65c
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5746
      
      liancheng marmbrus
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4617 from yhuai/insertOverwrite and squashes the following commits:
      
      8e3019d [Yin Huai] Fix compilation error.
      499e8e7 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
      e76e85a [Yin Huai] Address comments.
      ac31b3c [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
      f30bdad [Yin Huai] Use toDF.
      99da57e [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
      6b7545c [Yin Huai] Add a pre write check to the data source API.
      a88c516 [Yin Huai] DDLParser will take a parsering function to take care CTAS statements.
    • HOTFIX: Break in Jekyll build from #4589 · 04b401da
      Patrick Wendell authored
      That patch had a line break in the middle of a {{ }} expression, which is not allowed.
    • [SPARK-2313] Use socket to communicate GatewayServer port back to Python driver · 0cfda846
      Josh Rosen authored
      This patch changes PySpark so that the GatewayServer's port is communicated back to the Python process that launches it over a local socket instead of a pipe.  The old pipe-based approach was brittle and could fail if `spark-submit` printed unexpected output to stdout.
      
      To accomplish this, I wrote a custom `PythonGatewayServer.main()` function to use in place of Py4J's `GatewayServer.main()`.
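      A minimal sketch of such a handshake, assuming the launcher passes a callback port through an environment variable (the variable name and wire format here are assumptions, not the actual patch):

      ```scala
      import java.io.DataOutputStream
      import java.net.Socket

      object GatewayPortSketch {
        // The Python launcher listens on a loopback socket; the JVM connects
        // and writes the Py4J gateway port back as a single int.
        def reportGatewayPort(gatewayPort: Int): Unit = {
          val callbackPort = sys.env("_PYSPARK_DRIVER_CALLBACK_PORT").toInt
          val socket = new Socket("127.0.0.1", callbackPort)
          val out = new DataOutputStream(socket.getOutputStream)
          out.writeInt(gatewayPort) // Python reads a big-endian 4-byte int
          out.flush()
          socket.close()
        }
      }
      ```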
      
      Closes #3424.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4603 from JoshRosen/SPARK-2313 and squashes the following commits:
      
      6a7740b [Josh Rosen] Remove EchoOutputThread since it's no longer needed
      0db501f [Josh Rosen] Use select() so that we don't block if GatewayServer dies.
      9bdb4b6 [Josh Rosen] Handle case where getListeningPort returns -1
      3fb7ed1 [Josh Rosen] Remove stdout=PIPE
      2458934 [Josh Rosen] Use underscore to mark env var. as private
      d12c95d [Josh Rosen] Use Logging and Utils.tryOrExit()
      e5f9730 [Josh Rosen] Wrap everything in a giant try-block
      2f70689 [Josh Rosen] Use stdin PIPE to share fate with driver
      8bf956e [Josh Rosen] Initial cut at passing Py4J gateway port back to driver via socket
    • SPARK-5357: Update commons-codec version to 1.10 (current) · c01c4ebc
      Matt Whelan authored
      Resolves https://issues.apache.org/jira/browse/SPARK-5357
      
      In commons-codec 1.5, Base64 instances are not thread-safe; that was true only of versions 1.4 through 1.6.
      
      Author: Matt Whelan <mwhelan@perka.com>
      
      Closes #4153 from MattWhelan/depsUpdate and squashes the following commits:
      
      b4a91f4 [Matt Whelan] SPARK-5357: Update commons-codec version to 1.10 (current)
    • SPARK-5841: remove DiskBlockManager shutdown hook on stop · bb05982d
      Matt Whelan authored
      After a call to stop, the shutdown hook is redundant, and causes a
      memory leak.
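      A simplified sketch of the pattern (field and method names invented, not the actual DiskBlockManager internals):

      ```scala
      class DiskManagerSketch {
        private def deleteLocalDirs(): Unit = { /* delete temp dirs */ }

        private val shutdownHook = new Thread("delete Spark local dirs") {
          override def run(): Unit = deleteLocalDirs()
        }
        Runtime.getRuntime.addShutdownHook(shutdownHook)

        def stop(): Unit = {
          // After stop(), the hook is redundant but still pins this object
          // (and transitively its SparkContext) as a GC root, so remove it.
          Runtime.getRuntime.removeShutdownHook(shutdownHook)
          deleteLocalDirs()
        }
      }
      ```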
      
      Author: Matt Whelan <mwhelan@perka.com>
      
      Closes #4627 from MattWhelan/SPARK-5841 and squashes the following commits:
      
      d5f5c7f [Matt Whelan] SPARK-5841: remove DiskBlockManager shutdown hook on stop
    • [SPARK-5833] [SQL] Adds REFRESH TABLE command · c51ab37f
      Cheng Lian authored
      Lifts `HiveMetastoreCatalog.refreshTable` to `Catalog`. Adds a `RefreshTable` command to refresh (possibly cached) metadata in external data source tables.
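      A hypothetical usage sketch (assumes a `sqlContext` in scope; the table name is invented):

      ```scala
      // Pick up files added to an external table's location outside of Spark.
      sqlContext.sql("REFRESH TABLE logs")
      ```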
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4624 from liancheng/refresh-table and squashes the following commits:
      
      8d1aa4c [Cheng Lian] Adds REFRESH TABLE command
    • [SPARK-5296] [SQL] Add more filter types for data sources API · 6f54dee6
      Cheng Lian authored
      This PR adds the following filter types for data sources API:
      
      - `IsNull`
      - `IsNotNull`
      - `Not`
      - `And`
      - `Or`
      
      The code that converts Catalyst predicate expressions into data source filters is very similar to the filter conversion logic in `ParquetFilters`, which converts Catalyst predicates into Parquet filter predicates. In this way we can support nested AND/OR/NOT predicates without changing the current `BaseScan` type hierarchy.
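      A trimmed sketch of what such a filter algebra looks like (the real classes live in `org.apache.spark.sql.sources`; this version is for illustration only):

      ```scala
      sealed trait Filter
      case class EqualTo(attribute: String, value: Any) extends Filter
      case class IsNull(attribute: String) extends Filter
      case class IsNotNull(attribute: String) extends Filter
      case class Not(child: Filter) extends Filter
      case class And(left: Filter, right: Filter) extends Filter
      case class Or(left: Filter, right: Filter) extends Filter

      object FilterDemo {
        // Nested predicates compose naturally:
        // (a IS NOT NULL AND a = 1) OR NOT (b IS NULL)
        val f: Filter = Or(And(IsNotNull("a"), EqualTo("a", 1)), Not(IsNull("b")))
      }
      ```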
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #4623 from liancheng/more-fiters and squashes the following commits:
      
      1b296f4 [Cheng Lian] Add more filter types for data sources API
    • [SQL] Add fetched row count in SparkSQLCLIDriver · b4d7c703
      OopsOutOfMemory authored
      before this change:
      ```scala
      Time taken: 0.619 seconds
      ```
      
      after this change:
      ```scala
      Time taken: 0.619 seconds, Fetched: 4 row(s)
      ```
      
      Author: OopsOutOfMemory <victorshengli@126.com>
      
      Closes #4604 from OopsOutOfMemory/rowcount and squashes the following commits:
      
      7252dea [OopsOutOfMemory] add fetched row count
    • [SQL] Initial support for reporting location of error in sql string · 104b2c45
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4587 from marmbrus/position and squashes the following commits:
      
      0810052 [Michael Armbrust] fix tests
      395c019 [Michael Armbrust] Merge remote-tracking branch 'marmbrus/position' into position
      e155dce [Michael Armbrust] more errors
      f3efa51 [Michael Armbrust] Update AnalysisException.scala
      d45ff60 [Michael Armbrust] [SQL] Initial support for reporting location of error in sql string
    • [SPARK-5824] [SQL] add null format in ctas and set default col comment to null · 275a0c08
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #4609 from adrian-wang/ctas and squashes the following commits:
      
      0a75d5a [Daoyuan Wang] reorder import
      93d1863 [Daoyuan Wang] add null format in ctas and set default col comment to null
    • [SQL] [Minor] Update the SpecificMutableRow.copy · cc552e04
      Cheng Hao authored
      While profiling Join / Aggregate queries via VisualVM, I noticed lots of `SpecificMutableRow` objects being created, as well as the accompanying `MutableValue` objects. `SpecificMutableRow` is mostly used in data source implementations, but its `copy` method can be called multiple times in upper modules (e.g. in Join / aggregation), so creating duplicated instances should be avoided.
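      A toy sketch of the general idea (not the actual SpecificMutableRow code): copy into a plain immutable row of raw values instead of re-wrapping every field in a fresh mutable wrapper.

      ```scala
      class GenericRowSketch(val values: Array[Any])

      class MutableRowSketch(fields: Array[Any]) {
        def copy(): GenericRowSketch = {
          val copied = new Array[Any](fields.length)
          System.arraycopy(fields, 0, copied, 0, fields.length)
          new GenericRowSketch(copied) // one allocation, no per-field wrappers
        }
      }
      ```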
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4619 from chenghao-intel/specific_mutable_row and squashes the following commits:
      
      9300d23 [Cheng Hao] update the SpecificMutableRow.copy
    • SPARK-5795 [STREAMING] api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not be friendly to Java · 8e25373c
      Sean Owen authored
      Revise the JavaPairDStream API declaration of the saveAs Hadoop methods, to allow them to be called directly from Java as intended.
      
      CC tdas for review
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4608 from srowen/SPARK-5795 and squashes the following commits:
      
      36f1ead [Sean Owen] Add code that shows compile problem and fix
      036bd27 [Sean Owen] Revise JavaPairDStream API declaration on saveAs Hadoop methods, to allow it to be called directly as intended.
    • [SPARK-5799][SQL] Compute aggregation function on specified numeric columns · 5c78be7a
      Liang-Chi Hsieh authored
      Compute aggregation function on specified numeric columns. For example:
      
          val df = Seq(("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d")).toDataFrame("key", "value1", "value2", "rest")
          df.groupBy("key").min("value2")
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4592 from viirya/specific_cols_agg and squashes the following commits:
      
      9446896 [Liang-Chi Hsieh] For comments.
      314c4cd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      353fad7 [Liang-Chi Hsieh] For python unit tests.
      54ed0c4 [Liang-Chi Hsieh] Address comments.
      b079e6b [Liang-Chi Hsieh] Remove duplicate codes.
      55100fb [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      880c2ac [Liang-Chi Hsieh] Fix Python style checks.
      4c63a01 [Liang-Chi Hsieh] Fix pyspark.
      b1a24fc [Liang-Chi Hsieh] Address comments.
      2592f29 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      27069c3 [Liang-Chi Hsieh] Combine functions and add varargs annotation.
      371a3f7 [Liang-Chi Hsieh] Compute aggregation function on specified numeric columns.
    • SPARK-5815 [MLLIB] Part 2. Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS · a3afa4a1
      Sean Owen authored
      Now deprecates runSVDPlusPlus and updates run; for 1.4.0 / master only.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4625 from srowen/SPARK-5815.2 and squashes the following commits:
      
      6fd2ca5 [Sean Owen] Now, deprecated runSVDPlusPlus and update run, for 1.4.0 / master only
    • [SPARK-5831][Streaming] When the number of checkpoint files exceeds 10, delete the old ones · 1115e8e7
      Xutingjun authored
      Author: Xutingjun <1039320815@qq.com>
      
      Closes #4621 from XuTingjun/checkpointFile and squashes the following commits:
      
      b5f2816 [Xutingjun] fix bug
    • [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet data source with the newly introduced write support for data source API · 3ce58cf9
      Cheng Lian authored
      
      This PR migrates the Parquet data source to the new data source write support API. Now users can also overwrite and append to existing tables. Note that inserting into partitioned tables is not supported yet.

      When the Parquet data source is enabled, insertion into Hive Metastore Parquet tables is also fulfilled by the Parquet data source. This is done by the newly introduced `HiveMetastoreCatalog.ParquetConversions` rule, which is a "proper" implementation of the original hacky `HiveStrategies.ParquetConversion`. The latter is still preserved, and can be removed together with the old Parquet support in the future.
      
      TODO:
      
      - [x] Update outdated comments in `newParquet.scala`.
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4563 from liancheng/parquet-refining and squashes the following commits:
      
      fa98d27 [Cheng Lian] Fixes test cases which should disable off Parquet data source
      2476e82 [Cheng Lian] Fixes compilation error introduced during rebasing
      a83d290 [Cheng Lian] Passes Hive Metastore partitioning information to ParquetRelation2
    • [Minor] [SQL] Renames stringRddToDataFrame to stringRddToDataFrameHolder for consistency · 199a9e80
      Cheng Lian authored
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4613 from liancheng/df-implicit-rename and squashes the following commits:
      
      db8bdd3 [Cheng Lian] Renames stringRddToDataFrame to stringRddToDataFrameHolder for consistency
    • [Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop · d51d6ba1
      Peter Rudenko authored
      On a big dataset, explicitly unpersisting the train and validation folds allows more data to be loaded into memory in the next loop iteration. On my environment (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross-validation), this saved more than 5 minutes.
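      A minimal sketch of the pattern under these assumptions (generic fold pairs and an opaque `evaluate` function; not CrossValidator's actual internals):

      ```scala
      import org.apache.spark.rdd.RDD

      object KFoldSketch {
        def crossValidate[T](folds: Array[(RDD[T], RDD[T])])
                            (evaluate: (RDD[T], RDD[T]) => Double): Double = {
          val scores = folds.map { case (training, validation) =>
            training.cache()
            validation.cache()
            val score = evaluate(training, validation) // fit + score one fold
            // Unpersist eagerly so the next fold's data fits in memory,
            // instead of waiting for these RDDs to be evicted.
            training.unpersist()
            validation.unpersist()
            score
          }
          scores.sum / scores.length
        }
      }
      ```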
      
      Author: Peter Rudenko <petro.rudenko@gmail.com>
      
      Closes #4595 from petro-rudenko/patch-2 and squashes the following commits:
      
      66a7cfb [Peter Rudenko] Move validationDataset cache to declaration
      c5f3265 [Peter Rudenko] [Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop
  3. Feb 15, 2015
    • [Ml] SPARK-5796 Don't transform data on a last estimator in Pipeline · c78a12c4
      Peter Rudenko authored
      If it's the last estimator in the Pipeline, there's no need to transform the data, since there's no next stage that would consume it.
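      A minimal sketch of the idea (a toy stage type, not the real `ml.Pipeline` code):

      ```scala
      object PipelineSketch {
        // A stage "fits" on data and yields a transformer function.
        trait Stage { def fit(data: Seq[Double]): Seq[Double] => Seq[Double] }

        def fitPipeline(stages: Seq[Stage],
                        dataset: Seq[Double]): Seq[Seq[Double] => Seq[Double]] = {
          var cur = dataset
          stages.zipWithIndex.map { case (stage, i) =>
            val transformer = stage.fit(cur)
            // Skip the final transform: no downstream stage consumes it.
            if (i < stages.length - 1) {
              cur = transformer(cur)
            }
            transformer
          }
        }
      }
      ```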
      
      Author: Peter Rudenko <petro.rudenko@gmail.com>
      
      Closes #4590 from petro-rudenko/patch-1 and squashes the following commits:
      
      d13ec33 [Peter Rudenko] [Ml] SPARK-5796 Don't transform data on a last estimator in Pipeline
    • SPARK-5815 [MLLIB] Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS · acf2558d
      Sean Owen authored
      Deprecate SVDPlusPlus.run and introduce SVDPlusPlus.runSVDPlusPlus with a return type that doesn't include DoubleMatrix
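      A simplified sketch of the API-evolution pattern used here (signatures and version string are illustrative, not the actual SVDPlusPlus ones): keep the old method as a deprecated forwarder so callers get a warning instead of a break.

      ```scala
      object SVDPlusPlusSketch {
        case class Result(values: Array[Double]) // stand-in for the JBLAS-free type

        def runSVDPlusPlus(input: Seq[Double]): Result = Result(input.toArray)

        @deprecated("Use runSVDPlusPlus, which does not expose JBLAS types", "1.3.0")
        def run(input: Seq[Double]): Result = runSVDPlusPlus(input)
      }
      ```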
      
      CC mengxr
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4614 from srowen/SPARK-5815 and squashes the following commits:
      
      288cb05 [Sean Owen] Clarify deprecation plans in scaladoc
      497458e [Sean Owen] Deprecate SVDPlusPlus.run and introduce SVDPlusPlus.runSVDPlusPlus with return type that doesn't include DoubleMatrix
    • [SPARK-5769] Set params in constructors and in setParams in Python ML pipelines · cd4a1536
      Xiangrui Meng authored
      This PR allows Python users to set params in constructors and in setParams, where we use the decorator `keyword_only` to force keyword arguments. The trade-off is discussed in the design doc of SPARK-4586.
      
      Generated doc:
      ![screen shot 2015-02-12 at 3 06 58 am](https://cloud.githubusercontent.com/assets/829644/6166491/9cfcd06a-b265-11e4-99ea-473d866634fc.png)
      
      CC: davies rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4564 from mengxr/py-pipeline-kw and squashes the following commits:
      
      fedf720 [Xiangrui Meng] use toDF
      d565f2c [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into py-pipeline-kw
      cbc15d3 [Xiangrui Meng] fix style
      5032097 [Xiangrui Meng] update pipeline signature
      950774e [Xiangrui Meng] simplify keyword_only and update constructor/setParams signatures
      fdde5fc [Xiangrui Meng] fix style
      c9384b8 [Xiangrui Meng] fix sphinx doc
      8e59180 [Xiangrui Meng] add setParams and make constructors take params, where we force keyword args
    • SPARK-5669 [BUILD] Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS · 836577b3
      Sean Owen authored
      Exclude libgfortran and libgcc bundled by JBLAS for Windows. This much is simple and solves the essential license issue. But the more important question is whether MLlib still works on Windows afterwards.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4453 from srowen/SPARK-5669 and squashes the following commits:
      
      734dd86 [Sean Owen] Exclude libgfortran, libgcc bundled by JBLAS, affecting Windows / OS X / Linux 32-bit (not Linux 64-bit)
    • [MLLIB][SPARK-5502] User guide for isotonic regression · 61eb1267
      martinzapletal authored
      User guide for isotonic regression added to docs/mllib-regression.md including code examples for Scala and Java.
      
      Author: martinzapletal <zapletal-martin@email.cz>
      
      Closes #4536 from zapletal-martin/SPARK-5502 and squashes the following commits:
      
      67fe773 [martinzapletal] SPARK-5502 reworded model prediction rules to use more general language rather than the code/implementation specific terms
      80bd4c3 [martinzapletal] SPARK-5502 created docs page for isotonic regression, added links to the page, updated data and examples
      7d8136e [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
      504b5c3 [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java