  1. Aug 14, 2015
    • [SPARK-8744] [ML] Add a public constructor to StringIndexer · a7317ccd
      Holden Karau authored
      It would be helpful to allow users to pass a pre-computed index to create an indexer, rather than always going through StringIndexer to create the model.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7267 from holdenk/SPARK-8744-StringIndexerModel-should-have-public-constructor.
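
As a rough illustration of the idea (not Spark's actual API), a model that accepts a pre-computed label list through its constructor might look like the pure-Python sketch below. The class name `SimpleStringIndexerModel` and its `transform` method are hypothetical.

```python
class SimpleStringIndexerModel:
    """Hypothetical stand-in for an indexer model built from known labels."""

    def __init__(self, labels):
        # The position of each label in the list is its index.
        self.labels = list(labels)
        self._label_to_index = {label: i for i, label in enumerate(self.labels)}

    def transform(self, values):
        # Raises KeyError for labels that were not supplied up front.
        return [float(self._label_to_index[v]) for v in values]


# No fitting step is needed: the index is passed in directly.
model = SimpleStringIndexerModel(["a", "b", "c"])
indexed = model.transform(["b", "a", "c", "b"])
```

The point of the sketch is only that the caller controls the label order, instead of having it derived from the data during fitting.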
    • [SPARK-9956] [ML] Make trees work with one-category features · 7ecf0c46
      Joseph K. Bradley authored
      This modifies DecisionTreeMetadata construction to treat 1-category features as continuous, so that trees do not fail with such features.  It is important for the pipelines API, where VectorIndexer can automatically categorize certain features as categorical.
      
      As stated in the JIRA, this is a temp fix which we can improve upon later by automatically filtering out those features. That will take longer, though, since it will require careful indexing.
      
      Targeted for 1.5 and master
      
      CC: manishamde  mengxr yanboliang
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8187 from jkbradley/tree-1cat.
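
The behaviour described above can be sketched in plain Python. This is an assumption-laden illustration, not the DecisionTreeMetadata code: the function `split_features` and its input shape (a dict from feature index to number of categories) are hypothetical.

```python
def split_features(categorical_arity):
    """Return (categorical, continuous) given {feature_index: num_categories}.

    Assumption for this sketch: a feature with exactly one category offers
    no categorical split, so it is moved to the continuous set instead of
    causing a failure.
    """
    categorical = {}
    continuous = []
    for feature, arity in sorted(categorical_arity.items()):
        if arity < 1:
            raise ValueError("feature %d has invalid arity %d" % (feature, arity))
        if arity == 1:
            continuous.append(feature)
        else:
            categorical[feature] = arity
    return categorical, continuous


categorical, continuous = split_features({0: 3, 1: 1, 2: 2})
```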
    • [SPARK-9661] [MLLIB] minor clean-up of SPARK-9661 · a0e1abbd
      Xiangrui Meng authored
      Some minor clean-ups after SPARK-9661. See my inline comments. MechCoder jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8190 from mengxr/SPARK-9661-fix.
    • [SPARK-9958] [SQL] Make HiveThriftServer2Listener thread-safe and update the tab name to "JDBC/ODBC Server" · c8677d73
      zsxwing authored
      This PR fixes the thread-safety issue in HiveThriftServer2Listener and changes the tab name to "JDBC/ODBC Server", since the old name conflicts with the new SQL tab.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8185 from zsxwing/SPARK-9958.
    • [MINOR] [SQL] Remove canEqual in Row · 7c7c7529
      Liang-Chi Hsieh authored
      As `InternalRow` does not extend `Row` now, I think we can remove it.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #8170 from viirya/remove_canequal.
  2. Aug 13, 2015
    • [SPARK-9945] [SQL] pageSize should be calculated from executor.memory · bd35385d
      Davies Liu authored
      Currently, the pageSize of TungstenSort is calculated from driver.memory; it should use executor.memory instead.

      Also, because of rounding, the safeFactor could be 4 in the worst case, so this increases it to 16.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8175 from davies/page_size.
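
One way to picture the calculation is the sketch below. Only the use of executor memory and the factor of 16 come from the commit message; the formula, the 1 MiB and 64 MiB bounds, and the function names are assumptions made for illustration.

```python
def next_power_of_2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    power = 1
    while power < n:
        power *= 2
    return power


def default_page_size(executor_memory_bytes, cores, safety_factor=16):
    # Hypothetical bounds for this sketch: 1 MiB minimum, 64 MiB maximum.
    min_page = 1 << 20
    max_page = 64 << 20
    per_core = executor_memory_bytes // cores // safety_factor
    # Rounding up to a power of two can nearly double the value, which is
    # one reason a generous safety factor is useful.
    size = next_power_of_2(max(per_core, 1))
    return min(max_page, max(min_page, size))


# 4 GiB of executor memory shared by 8 cores.
page = default_page_size(4 * 1024**3, cores=8)
tiny = default_page_size(1024, cores=1)
```

With 4 GiB and 8 cores the per-core share divided by 16 is exactly 32 MiB, which is already a power of two; very small memory settings are clamped to the assumed 1 MiB floor.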
    • [SPARK-9580] [SQL] Replace singletons in SQL tests · 8187b3ae
      Andrew Or authored
      A fundamental limitation of the existing SQL tests is that *there is simply no way to create your own `SparkContext`*. This is a serious limitation because the user may wish to use a different master or config. As a case in point, `BroadcastJoinSuite` is entirely commented out because there is no way to make it pass with the existing infrastructure.
      
      This patch removes the singletons `TestSQLContext` and `TestData`, and instead introduces a `SharedSQLContext` that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch *all* the SQL test files.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8111 from andrewor14/sql-tests-refactor.
    • [SPARK-9943] [SQL] deserialized UnsafeHashedRelation should be serializable · c50f97da
      Davies Liu authored
      When free memory in the executor runs low, cached broadcast objects need to be serialized to disk, but currently a deserialized UnsafeHashedRelation can't be serialized and fails with an NPE. This PR fixes that.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8174 from davies/serialize_hashed.
    • [SPARK-8976] [PYSPARK] fix open mode in python3 · 693949ba
      Davies Liu authored
      This bug only happens on Python 3 and Windows.

      I tested this manually with Python 3 and the Python daemon disabled; there is no unit test yet.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8181 from davies/open_mode.
    • [SPARK-9922] [ML] rename StringIndexerReverse to IndexToString · 6c5858bc
      Xiangrui Meng authored
      What `StringIndexerInverse` does is not strictly associated with `StringIndexer`, and the name does not clearly describe the transformation. Renaming it to `IndexToString` might be better.
      
      ~~I also changed `invert` to `inverse` without arguments. `inputCol` and `outputCol` could be set after.~~
      I also removed `invert`.
      
      jkbradley holdenk
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8152 from mengxr/SPARK-9922.
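
The transformation the new name describes is simply a lookup from numeric index back to label. A minimal pure-Python sketch follows; the function `index_to_string` is hypothetical and is not the Spark transformer.

```python
def index_to_string(indices, labels):
    """Map numeric indices back to their original string labels."""
    # Indices may arrive as floats (for example 1.0), so convert before lookup.
    return [labels[int(i)] for i in indices]


restored = index_to_string([1.0, 0.0, 2.0], ["a", "b", "c"])
```

Nothing here depends on how the labels were produced, which is the argument for a name that does not mention `StringIndexer`.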
    • [SPARK-9935] [SQL] EqualNotNull not processed in ORC · c2520f50
      hyukjinkwon authored
      https://issues.apache.org/jira/browse/SPARK-9935
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #8163 from HyukjinKwon/master.
    • [SPARK-9942] [PYSPARK] [SQL] ignore exceptions while try to import pandas · a8d2f4c5
      Davies Liu authored
      If pandas is broken (it can't be imported and raises exceptions other than ImportError), pyspark can't be imported either. We should ignore all exceptions raised while importing pandas.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8173 from davies/fix_pandas.
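
The general pattern can be sketched as follows. This is an illustration rather than the PySpark code: the helper `optional_import` and the injectable `importer` argument are hypothetical and exist only so that a broken install can be simulated.

```python
import importlib


def optional_import(name, importer=importlib.import_module):
    """Return the imported module, or None if importing fails for any reason."""
    try:
        return importer(name)
    except Exception:
        # A broken install can raise errors other than ImportError
        # while its module body runs, so catch broadly here.
        return None


def broken_importer(name):
    # Simulates a package whose import raises something other than ImportError.
    raise RuntimeError("simulated broken install of " + name)


has_pandas = optional_import("pandas", importer=broken_importer) is not None
json_module = optional_import("json")
```

Catching only `ImportError` would let the simulated `RuntimeError` propagate and abort the importing module, which is the failure the commit describes.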
    • [SPARK-9661] [MLLIB] [ML] Java compatibility · 864de8ea
      MechCoder authored
      I skimmed through the docs for various instances of Object and replaced them with Java-compatible versions of the same.
      
      1. Some methods in LDAModel.
      2. runMiniBatchSGD
      3. kolmogorovSmirnovTest
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #8126 from MechCoder/java_incop.
    • [SPARK-9649] Fix MasterSuite, third time's a charm · 8815ba2f
      Andrew Or authored
      This particular test did not load the default configurations so
      it continued to start the REST server, which causes port bind
      exceptions.
    • [MINOR] [DOC] fix mllib pydoc warnings · 65fec798
      Xiangrui Meng authored
      Switch to correct Sphinx syntax. MechCoder
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8169 from mengxr/mllib-pydoc-fix.
    • [MINOR] [ML] change MultilayerPerceptronClassifierModel to MultilayerPerceptronClassificationModel · 4b70798c
      Yanbo Liang authored
      To follow the naming rule of ML, change `MultilayerPerceptronClassifierModel` to `MultilayerPerceptronClassificationModel` like `DecisionTreeClassificationModel`, `GBTClassificationModel` and so on.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8164 from yanboliang/mlp-name.
    • [SPARK-8965] [DOCS] Add ml-guide Python Example: Estimator, Transformer, and Param · 7a539ef3
      Rosstin authored
      Added ml-guide Python Example: Estimator, Transformer, and Param
      /docs/_site/ml-guide.html
      
      Author: Rosstin <asterazul@gmail.com>
      
      Closes #8081 from Rosstin/SPARK-8965.
    • [SPARK-9073] [ML] spark.ml Models copy() should call setParent when there is a parent · 2932e25d
      lewuathe authored
      Copied ML models must have the same parent as the original ones.
      
      Author: lewuathe <lewuathe@me.com>
      Author: Lewuathe <lewuathe@me.com>
      
      Closes #7447 from Lewuathe/SPARK-9073.
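
The contract can be illustrated with a small pure-Python sketch. The `Estimator` and `Model` classes below are hypothetical stand-ins, not the spark.ml classes.

```python
from copy import deepcopy


class Estimator:
    def __init__(self, name):
        self.name = name


class Model:
    def __init__(self, weights, parent=None):
        self.weights = weights
        self.parent = parent

    def set_parent(self, parent):
        self.parent = parent
        return self

    def copy(self):
        # Copy the model's own state, then re-attach the same parent object
        # rather than dropping it or duplicating it.
        copied = Model(deepcopy(self.weights))
        if self.parent is not None:
            copied.set_parent(self.parent)
        return copied


estimator = Estimator("lr")
model = Model([0.1, 0.2], parent=estimator)
clone = model.copy()
orphan_clone = Model([1.0]).copy()
```

The clone gets its own copy of the weights but shares the parent reference, and a model without a parent stays without one.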
    • [SPARK-9757] [SQL] Fixes persistence of Parquet relation with decimal column · 69930310
      Cheng Lian authored
      PR #7967 enables us to save data source relations to metastore in Hive compatible format when possible. But it fails to persist Parquet relations with decimal column(s) to Hive metastore of versions lower than 1.2.0. This is because `ParquetHiveSerDe` in Hive versions prior to 1.2.0 doesn't support decimal. This PR checks for this case and falls back to Spark SQL specific metastore table format.
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8130 from liancheng/spark-9757/old-hive-parquet-decimal.
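
The fallback decision described above might be sketched like this. The function names and the two format labels are hypothetical; only the rule (decimal columns plus a Hive metastore older than 1.2.0 trigger the fallback) comes from the commit message.

```python
def parse_version(version):
    """Turn '0.13.1' into (0, 13, 1), padding short versions with zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def choose_table_format(hive_version, column_types):
    has_decimal = any(t.lower().startswith("decimal") for t in column_types)
    if has_decimal and parse_version(hive_version) < (1, 2, 0):
        # Older ParquetHiveSerDe cannot handle decimal, so do not claim
        # Hive compatibility for this table.
        return "spark_sql_specific"
    return "hive_compatible"


old_hive = choose_table_format("0.13.1", ["int", "decimal(10,2)"])
new_hive = choose_table_format("1.2.1", ["int", "decimal(10,2)"])
```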
    • [SPARK-9885] [SQL] Also pass barrierPrefixes and sharedPrefixes to IsolatedClientLoader when hiveMetastoreJars is set to maven. · 84a27916
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-9885
      
      cc marmbrus liancheng
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8158 from yhuai/classloaderMaven.
    • [SPARK-9918] [MLLIB] remove runs from k-means and rename epsilon to tol · 68f99571
      Xiangrui Meng authored
      This requires some discussion. I'm not sure whether `runs` is a useful parameter. It certainly complicates the implementation. We might want to optimize the k-means implementation with block matrix operations. In this case, having `runs` may not be worth the trade-off. Also it increases the communication cost in a single job, which might cause other issues.
      
      This PR also renames `epsilon` to `tol` to have consistent naming among algorithms. The Python constructor is updated to include all parameters.
      
      jkbradley yu-iskw
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8148 from mengxr/SPARK-9918 and squashes the following commits:
      
      149b9e5 [Xiangrui Meng] fix constructor in Python and rename epsilon to tol
      3cc15b3 [Xiangrui Meng] fix test and change initStep to initSteps in python
      a0a0274 [Xiangrui Meng] remove runs from k-means in the pipeline API
    • [SPARK-9927] [SQL] Revert 8049 since it's pushing wrong filter down · d0b18919
      Yijie Shen authored
      I made a mistake in #8049 by casting the literal value to the attribute's data type, which would simply truncate the literal value and push a wrong filter down.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-9927
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #8157 from yjshen/rever8049.
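
A small worked example shows why truncating the literal changes the result. The column values and the predicate below are made up for illustration.

```python
rows = [0, 1, 2, 3]   # values of a hypothetical integer column `a`
literal = 1.5         # the predicate is: a < 1.5

# Evaluating the predicate as written keeps 0 and 1.
correct = [a for a in rows if a < literal]

# Casting the literal to the column's integer type truncates 1.5 to 1,
# so the pushed-down filter becomes a < 1 and wrongly drops the row a = 1.
truncated_literal = int(literal)
pushed_down = [a for a in rows if a < truncated_literal]
```

Because the pushed-down filter is stricter than the original predicate, the lost row cannot be recovered by re-checking the predicate later.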
    • [SPARK-9914] [ML] define setters explicitly for Java and use setParam group in RFormula · d7eb371e
      Xiangrui Meng authored
      The problem with defining setters in the base class is that it doesn't return the correct type in Java.
      
      ericl
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8143 from mengxr/SPARK-9914 and squashes the following commits:
      
      d36c887 [Xiangrui Meng] remove setters from model
      a49021b [Xiangrui Meng] define setters explicitly for Java and use setParam group
  3. Aug 12, 2015