  1. Jun 02, 2015
    • [SPARK-7547] [ML] Scala Example code for ElasticNet · a86b3e9b
      DB Tsai authored
      This is Scala example code for both linear and logistic regression. Python and Java versions are to be added.
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #6576 from dbtsai/elasticNetExample and squashes the following commits:
      
      e7ca406 [DB Tsai] fix test
      6bb6d77 [DB Tsai] fix suite and remove duplicated setMaxIter
      136e0dd [DB Tsai] address feedback
      1ec29d4 [DB Tsai] fix style
      9462f5f [DB Tsai] add example
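The elastic-net penalty these examples tune is documented for Spark ML as λ(α‖w‖₁ + (1 − α)/2 ‖w‖₂²), where λ is `regParam` and α is `elasticNetParam`. The following is a minimal Python sketch of that formula only, not Spark's optimizer; the function name is hypothetical.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Return reg_param * (a * ||w||_1 + (1 - a) / 2 * ||w||_2^2), a = elastic_net_param."""
    if not 0.0 <= elastic_net_param <= 1.0:
        raise ValueError("elastic_net_param must be in [0, 1]")
    l1 = sum(abs(w) for w in weights)      # lasso term
    l2_sq = sum(w * w for w in weights)    # ridge term (squared L2 norm)
    return reg_param * (elastic_net_param * l1
                        + (1.0 - elastic_net_param) / 2.0 * l2_sq)


w = [1.0, -2.0]
print(elastic_net_penalty(w, 0.1, 1.0))  # alpha = 1: pure L1 (lasso)
print(elastic_net_penalty(w, 0.1, 0.0))  # alpha = 0: pure L2 (ridge)
print(elastic_net_penalty(w, 0.1, 0.5))  # a mix of both
```

Setting α = 1 or α = 0 recovers lasso or ridge regression respectively, which is why a single parameter can cover both linear and logistic regression examples.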
    • [SPARK-7387] [ML] [DOC] CrossValidator example code in Python · c3f4c325
      Ram Sriharsha authored
      Author: Ram Sriharsha <rsriharsha@hw11853.local>
      
      Closes #6358 from harsha2010/SPARK-7387 and squashes the following commits:
      
      63efda2 [Ram Sriharsha] more examples for classifier to distinguish mapreduce from spark properly
      aeb6bb6 [Ram Sriharsha] Python Style Fix
      54a500c [Ram Sriharsha] Merge branch 'master' into SPARK-7387
      615e91c [Ram Sriharsha] cleanup
      204c4e3 [Ram Sriharsha] Merge branch 'master' into SPARK-7387
      7246d35 [Ram Sriharsha] [SPARK-7387][ml][doc] CrossValidator example code in Python
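What a cross-validator does can be sketched in plain Python: for each parameter setting, train on k − 1 folds, evaluate on the held-out fold, average the metric, and keep the best setting. This is a simplified illustration, not PySpark's `CrossValidator`: the helper names are hypothetical, folds here are contiguous rather than randomly split, and a larger metric is assumed to be better.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(data, param_grid, fit, evaluate, k=3):
    """Return (best_params, avg_metrics); assumes a larger metric is better."""
    folds = k_fold_indices(len(data), k)
    avg_metrics = []
    for params in param_grid:
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [x for i, x in enumerate(data) if i not in held]
            valid = [data[i] for i in held_out]
            scores.append(evaluate(fit(train, params), valid))
        avg_metrics.append(mean(scores))
    best = max(range(len(param_grid)), key=lambda i: avg_metrics[i])
    return param_grid[best], avg_metrics


# Toy usage: the "model" is just a slope chosen from the grid.
def fit(train, params):
    return params["slope"]


def evaluate(slope, valid):
    return -mean((y - slope * x) ** 2 for x, y in valid)  # negated MSE


data = [(x, 2.0 * x) for x in range(9)]
best, metrics = cross_validate(data, [{"slope": 1.0}, {"slope": 2.0}], fit, evaluate)
print(best)
```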
    • [SQL] [TEST] [MINOR] Follow-up of PR #6493, use Guava API to ensure Java 6 friendliness · 5cd6a63d
      Cheng Lian authored
      This is a follow-up of PR #6493, which has been reverted in branch-1.4 because it uses Java 7 specific APIs and breaks Java 6 build. This PR replaces those APIs with equivalent Guava ones to ensure Java 6 friendliness.
      
      cc andrewor14 pwendell, this should also be backported to branch-1.4.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6547 from liancheng/override-log4j and squashes the following commits:
      
      c900cfd [Cheng Lian] Addresses Shixiong's comment
      72da795 [Cheng Lian] Uses Guava API to ensure Java 6 friendliness
    • [SPARK-8049] [MLLIB] drop tmp col from OneVsRest output · 89f21f66
      Xiangrui Meng authored
      The temporary column should be dropped after we get the prediction column. harsha2010
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6592 from mengxr/SPARK-8049 and squashes the following commits:
      
      1d89107 [Xiangrui Meng] use SparkFunSuite
      6ee70de [Xiangrui Meng] drop tmp col from OneVsRest output
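A minimal Python sketch of the one-vs-rest prediction step described above, assuming rows are dicts and each per-class classifier is a function returning a confidence score. The temporary column name `_tmp_scores` is hypothetical; the point is that it is removed once the `prediction` column exists, so it does not leak into the output.

```python
def one_vs_rest_predict(rows, classifiers):
    """Predict the class whose binary classifier is most confident."""
    out = []
    for row in rows:
        # Temporary column: one score per class (name is hypothetical).
        labeled = dict(row, _tmp_scores=[clf(row["features"]) for clf in classifiers])
        scores = labeled["_tmp_scores"]
        labeled["prediction"] = float(max(range(len(scores)), key=scores.__getitem__))
        del labeled["_tmp_scores"]  # drop the temp column after predicting
        out.append(labeled)
    return out


classifiers = [lambda f: f[0], lambda f: f[1]]  # class 0 and class 1 scorers
print(one_vs_rest_predict([{"features": [0.9, 0.1]}, {"features": [0.2, 0.8]}], classifiers))
```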
    • [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise() · 605ddbb2
      Davies Liu authored
      Thanks ogirardot, closes #6580
      
      cc rxin JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6590 from davies/when and squashes the following commits:
      
      c0f2069 [Davies Liu] fix Column.when() and otherwise()
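The intended semantics of chained `when()`/`otherwise()` follow SQL `CASE WHEN`: conditions are checked in order, the first match wins, `otherwise()` supplies the fallback, and without it unmatched rows get null. The class below is a hypothetical toy model of those semantics in plain Python, not the PySpark `Column` API.

```python
_MISSING = object()  # sentinel: otherwise() was never called


class CaseWhen:
    """Toy model of chained when()/otherwise() evaluated on a dict row."""

    def __init__(self):
        self.branches = []
        self.default = _MISSING

    def when(self, condition, value):
        self.branches.append((condition, value))
        return self  # allow chaining

    def otherwise(self, value):
        self.default = value
        return self

    def evaluate(self, row):
        for condition, value in self.branches:
            if condition(row):
                return value  # first matching branch wins
        return None if self.default is _MISSING else self.default


expr = (CaseWhen()
        .when(lambda r: r["age"] > 4, 1)
        .when(lambda r: r["age"] < 3, -1)
        .otherwise(0))
print([expr.evaluate({"age": a}) for a in (5, 2, 3)])
```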
    • [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a... · 686a45f0
      Cheng Lian authored
      [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append
      
      The current code references the schema of the DataFrame to be written before checking the save mode. This triggers expensive metadata discovery prematurely. For save modes other than `Append`, this metadata discovery is useless, since we later either ignore the result (for `Ignore` and `ErrorIfExists`) or delete the existing files (for `Overwrite`).
      
      This PR fixes the issue by deferring metadata discovery until after the save mode has been checked.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6583 from liancheng/spark-8014 and squashes the following commits:
      
      1aafabd [Cheng Lian] Updates comments
      088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined
      8fbd93f [Cheng Lian] Fixes SPARK-8014
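The reordering can be sketched as: branch on the save mode first, and call the expensive discovery step only on the `Append` path. This Python sketch is illustrative only; `SaveMode` mirrors the four Spark mode names, while `save` and its callback parameters are hypothetical stand-ins for the real write path.

```python
from enum import Enum


class SaveMode(Enum):
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "error"
    IGNORE = "ignore"


def save(mode, path_exists, discover_existing_schema, delete_existing, write_data):
    """Check the save mode first; only Append ever triggers metadata discovery."""
    if path_exists:
        if mode is SaveMode.ERROR_IF_EXISTS:
            raise FileExistsError("output path already exists")
        if mode is SaveMode.IGNORE:
            return "ignored"  # nothing is written, so no discovery is needed
        if mode is SaveMode.OVERWRITE:
            delete_existing()  # existing files (and their metadata) are discarded
            write_data(None)
            return "overwritten"
        # Append is the only mode that must reconcile with the existing data.
        write_data(discover_existing_schema())
        return "appended"
    write_data(None)
    return "written"
```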
    • [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating... · ad06727f
      Mike Dusenberry authored
      [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating ML Doc "Estimator, Transformer, and Param" examples.
      
      Updating ML Doc's *"Estimator, Transformer, and Param"* example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists.
      
      mengxr, I believe this addresses (part of) the *update documentation* TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820).
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits:
      
      6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from those of the Model.
      d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed.
      0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists.
      7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists.
    • [SPARK-8015] [FLUME] Remove Guava dependency from flume-sink. · 0071bd8d
      Marcelo Vanzin authored
      The minimal change would be to disable shading of Guava in the module,
      and rely on the transitive dependency from other libraries instead. But
      since Guava's use is so localized, I think it's better to just not use
      it instead, so I replaced that code and removed all traces of Guava from
      the module's build.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6555 from vanzin/SPARK-8015 and squashes the following commits:
      
      c0ceea8 [Marcelo Vanzin] Add comments about dependency management.
      c38228d [Marcelo Vanzin] Add guava dep in test scope.
      b7a0349 [Marcelo Vanzin] Add libthrift exclusion.
      6e0942d [Marcelo Vanzin] Add comment in pom.
      2d79260 [Marcelo Vanzin] [SPARK-8015] [flume] Remove Guava dependency from flume-sink.
    • [SPARK-8037] [SQL] Ignores files whose name starts with dot in HadoopFsRelation · 1bb5d716
      Cheng Lian authored
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6581 from liancheng/spark-8037 and squashes the following commits:
      
      d08e97b [Cheng Lian] Ignores files whose name starts with dot in HadoopFsRelation
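A minimal sketch of the filtering rule, assuming POSIX-style paths: a file is skipped when its base name (not the whole path) starts with a dot, which covers hidden files such as checksum files like `.part-00000.crc`. The helper names are hypothetical and this is not the HadoopFsRelation code.

```python
import posixpath


def is_data_file(path):
    """Treat files whose base name starts with '.' as hidden and skip them."""
    return not posixpath.basename(path).startswith(".")


def list_data_files(paths):
    return [p for p in paths if is_data_file(p)]


print(list_data_files(["/t/part-00000.parquet", "/t/.part-00000.parquet.crc", "/t/.DS_Store"]))
```

Checking only the base name matters: a directory such as `/data/.v1/part-00000` is not filtered by this rule, because only the last path component is inspected.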
    • [SPARK-7432] [MLLIB] fix flaky CrossValidator doctest · bd97840d
      Xiangrui Meng authored
      The new test uses CV to compare `maxIter=0` and `maxIter=1`, and validates the evaluation result. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6572 from mengxr/SPARK-7432 and squashes the following commits:
      
      c236bb8 [Xiangrui Meng] fix flaky cv doctest
    • [SPARK-8021] [SQL] [PYSPARK] make Python read/write API consistent with Scala · 445647a1
      Davies Liu authored
      Adds schema()/format()/options() to the reader, and mode()/format()/options()/partitionBy() to the writer.
      
      cc rxin yhuai pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6578 from davies/readwrite and squashes the following commits:
      
      720d293 [Davies Liu] address comments
      b65dfa2 [Davies Liu] Update readwriter.py
      1299ab6 [Davies Liu] make Python API consistent with Scala
    • [SPARK-8023][SQL] Add "deterministic" attribute to Expression to avoid... · 0f80990b
      Yin Huai authored
      [SPARK-8023][SQL] Add "deterministic" attribute to Expression to avoid collapsing nondeterministic projects.
      
      This closes #6570.
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6573 from rxin/deterministic and squashes the following commits:
      
      356cd22 [Reynold Xin] Added unit test for the optimizer.
      da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023
      da56200 [Yin Huai] Comments.
      e38f264 [Yin Huai] Comment.
      f9d6a73 [Yin Huai] Add a deterministic method to Expression.
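Why collapsing must be guarded: with `SELECT r, r FROM (SELECT rand() AS r)`, the inner projection evaluates `rand()` once per row, but naively inlining it into the outer projection yields `SELECT rand(), rand()`, which evaluates it twice and can return two different values. The Python toy below uses a counter as a stand-in for `rand()`; `can_collapse` is a hypothetical simplification of the guard, not the optimizer rule itself.

```python
import itertools

_counter = itertools.count()


def nondeterministic():
    """Stand-in for rand(): returns a different value on every call."""
    return next(_counter)


def inner_then_outer():
    # SELECT r, r FROM (SELECT rand() AS r): the inner expression runs once.
    r = nondeterministic()
    return (r, r)


def collapsed():
    # Naively inlined SELECT rand(), rand(): the expression runs twice.
    return (nondeterministic(), nondeterministic())


def can_collapse(inner_exprs_deterministic):
    """Only merge adjacent projections if every inner expression is deterministic."""
    return all(inner_exprs_deterministic)


print(inner_then_outer(), collapsed())
```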
    • [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get... · 7b7f7b6c
      Yin Huai authored
      [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early
      
      https://issues.apache.org/jira/browse/SPARK-8020
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6571 from yhuai/SPARK-8020-1 and squashes the following commits:
      
      0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive.
    • [SPARK-6917] [SQL] DecimalType is not read back when non-native type exists · bcb47ad7
      Davies Liu authored
      cc yhuai
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6558 from davies/decimalType and squashes the following commits:
      
      c877ca8 [Davies Liu] Update ParquetConverter.scala
      48cc57c [Davies Liu] Update ParquetConverter.scala
      b43845c [Davies Liu] add test
      3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists
    • [SPARK-7582] [MLLIB] user guide for StringIndexer · 0221c7f0
      Xiangrui Meng authored
      This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:
      
      4bba4f1 [Xiangrui Meng] fix example
      ba1cd1b [Xiangrui Meng] fix style
      7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
      136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
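`StringIndexer` maps string labels to numeric indices ordered by label frequency, so the most frequent label gets index `0.0`. The following is a plain-Python sketch of that mapping, not the Spark implementation. Tie-breaking here (first appearance wins) is an assumption of the sketch; the function names are hypothetical.

```python
from collections import Counter


def fit_string_indexer(labels):
    """Map each distinct label to an index, most frequent label first (0.0)."""
    counts = Counter(labels)  # preserves first-appearance order for ties
    ordered = sorted(counts, key=lambda label: -counts[label])  # stable sort
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(labels, index):
    return [index[label] for label in labels]


categories = ["a", "b", "c", "a", "a", "c"]
index = fit_string_indexer(categories)
print(index)
print(transform(categories, index))
```

Here `"a"` occurs three times, `"c"` twice, and `"b"` once, so they receive `0.0`, `1.0`, and `2.0` respectively. Indices like these are what a downstream `OneHotEncoder` expects as input.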
  2. Jun 01, 2015
  3. May 31, 2015
    • [SPARK-7952][SPARK-7984][SQL] equality check between boolean type and numeric type is broken. · a0e46a0d
      Wenchen Fan authored
      The origin code has several problems:
      * `true <=> 1` will return false as we didn't set a rule to handle it.
      * `true = a` where `a` is not `Literal` and its value is 1, will return false as we only handle literal values.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6505 from cloud-fan/tmp1 and squashes the following commits:
      
      77f0f39 [Wenchen Fan] minor fix
      b6401ba [Wenchen Fan] add type coercion for CaseKeyWhen and address comments
      ebc8c61 [Wenchen Fan] use SQLTestUtils and If
      625973c [Wenchen Fan] improve
      9ba2130 [Wenchen Fan] address comments
      fc0d741 [Wenchen Fan] fix style
      2846a04 [Wenchen Fan] fix 7952
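A Python sketch of one reasonable boolean/numeric comparison semantics, assumed here for illustration: `1` equals `true`, `0` equals `false`, any other number equals neither, `=` yields null when either side is null, and null-safe `<=>` treats two nulls as equal. Because the check runs on the value rather than on a literal, it also covers the non-literal `true = a` case. The function names are hypothetical and this is not the Catalyst coercion rule.

```python
def bool_numeric_equal(b, n):
    """SQL '=' between a boolean and a number; None models NULL."""
    if b is None or n is None:
        return None  # NULL = anything is NULL
    if n == 1:
        return b is True
    if n == 0:
        return b is False
    return False  # other numbers match neither true nor false


def bool_numeric_null_safe_equal(b, n):
    """SQL '<=>': NULL <=> NULL is true, NULL <=> non-NULL is false."""
    if b is None or n is None:
        return b is None and n is None
    return bool_numeric_equal(b, n)


print(bool_numeric_null_safe_equal(True, 1))  # the `true <=> 1` case
```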
    • [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton · 91777a1c
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6532 from davies/decimal and squashes the following commits:
      
      c7fcbce [Davies Liu] Update tests.py
      1425359 [Davies Liu] DecimalType should not be singleton
    • [SPARK-7986] Split scalastyle config into 3 sections. · 6f006b5f
      Reynold Xin authored
      (1) rules that we enforce.
      (2) rules that we would like to enforce, but haven't cleaned up the codebase to
          turn on yet (or we need to make the scalastyle rule more configurable).
      (3) rules that we don't want to enforce.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6543 from rxin/scalastyle and squashes the following commits:
      
      beefaab [Reynold Xin] [SPARK-7986] Split scalastyle config into 3 sections.
    • [MINOR] Enable PySpark SQL readerwriter and window tests · 9126ea4d
      Josh Rosen authored
      PySpark SQL's `readerwriter` and `window` doctests weren't being run by our test runner script; this patch re-enables them.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6542 from JoshRosen/enable-more-pyspark-sql-tests and squashes the following commits:
      
      9f46ce4 [Josh Rosen] Enable PySpark SQL readerwriter and window tests.
    • [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame. · 46576ab3
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:
      
      dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna().
      41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.
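The semantics of `dropna`/`fillna` can be sketched over a list of dict rows: `how="any"` drops a row containing any null, `how="all"` drops it only if every considered column is null, and a minimum count of non-null values, when given, takes precedence over `how`. This plain-Python sketch is not the SparkR code. The parameter names mirror the R API loosely and are assumptions, and unlike Spark this `fillna` ignores column types.

```python
def dropna(rows, how="any", min_non_nulls=None, cols=None):
    """Drop rows with too many None values in the considered columns."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")

    def keep(row):
        columns = cols if cols is not None else list(row)
        non_nulls = sum(row[c] is not None for c in columns)
        if min_non_nulls is not None:
            return non_nulls >= min_non_nulls  # overrides `how`
        return non_nulls == len(columns) if how == "any" else non_nulls >= 1

    return [r for r in rows if keep(r)]


def fillna(rows, value, cols=None):
    """Replace None with `value` in the selected columns (all if cols is None)."""
    return [{k: (value if v is None and (cols is None or k in cols) else v)
             for k, v in r.items()} for r in rows]


rows = [{"a": 1, "b": None}, {"a": None, "b": None}, {"a": 2, "b": 3}]
print(dropna(rows), dropna(rows, how="all"), fillna(rows, 0, cols=["b"]))
```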
    • [SPARK-3850] Turn style checker on for trailing whitespaces. · 866652c9
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6541 from rxin/trailing-whitespace-on and squashes the following commits:
      
      f72ebe4 [Reynold Xin] [SPARK-3850] Turn style checker on for trailing whitespaces.
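The check this commit turns on flags lines that end in spaces or tabs. The real enforcement is done by the Scala style checker; the Python function below is only a hypothetical illustration of the rule itself.

```python
import re

_TRAILING_WS = re.compile(r"[ \t]+$")  # one or more spaces/tabs at end of line


def trailing_whitespace_lines(text):
    """Return 1-based line numbers that end with spaces or tabs."""
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if _TRAILING_WS.search(line)]


print(trailing_whitespace_lines("ok\nbad  \n\tfine\nalso bad\t\n"))
```

Leading indentation such as the tab in `"\tfine"` is not flagged; only whitespace immediately before the line break is.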
    • [SPARK-7949] [MLLIB] [DOC] update document with some missing save/load · 06747003
      Yuhao Yang authored
      Adds save/load to the examples for:
      KMeansModel
      PowerIterationClusteringModel
      Word2VecModel
      IsotonicRegressionModel
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #6498 from hhbyyh/docSaveLoad and squashes the following commits:
      
      7f9f06d [Yuhao Yang] add missing imports
      c604cad [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docSaveLoad
      1dd77cc [Yuhao Yang] update document with some missing save/load
    • [SPARK-3850] Trim trailing spaces for MLlib. · e1067d0a
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6534 from rxin/whitespace-mllib and squashes the following commits:
      
      38926e3 [Reynold Xin] [SPARK-3850] Trim trailing spaces for MLlib.
    • [MINOR] Add license for dagre-d3 and graphlib-dot · d1d2def2
      zsxwing authored
      Add license for dagre-d3 and graphlib-dot
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6539 from zsxwing/LICENSE and squashes the following commits:
      
      82b0475 [zsxwing] Add license for dagre-d3 and graphlib-dot