  1. Jun 03, 2015
    • Reynold Xin's avatar
      [SPARK-8074] Parquet should throw AnalysisException during setup for data... · 939e4f3d
      Reynold Xin authored
      [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6608 from rxin/parquet-analysis and squashes the following commits:
      
      b5dc8e2 [Reynold Xin] Code review feedback.
      5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.
      939e4f3d
    • Sun Rui's avatar
      [SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable... · 708c63bb
      Sun Rui authored
      [SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #6605 from sun-rui/SPARK-8063 and squashes the following commits:
      
      51ca48b [Sun Rui] [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.
      708c63bb
    • Hari Shreedharan's avatar
      [SPARK-7161] [HISTORY SERVER] Provide REST api to download event logs fro... · d2a86eb8
      Hari Shreedharan authored
      ...m History Server
      
      This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.
      
      This also adds a method to the ApplicationHistoryProvider to get the raw files, zipped.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #5792 from harishreedharan/eventlog-download and squashes the following commits:
      
      221cc26 [Hari Shreedharan] Update docs with new API information.
      a131be6 [Hari Shreedharan] Fix style issues.
      5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
      6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
      d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
      ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
      1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by refactoring interfaces.
      5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
      0b66948 [Hari Shreedharan] Minor formatting/import fixes.
      4fc518c [Hari Shreedharan] Fix rat failures.
      a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
      0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
      350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
      fd6ab00 [Hari Shreedharan] Fix style issues
      32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
      7b362b2 [Hari Shreedharan] Almost working.
      3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
      d2a86eb8
    • animesh's avatar
      [SPARK-7980] [SQL] Support SQLContext.range(end) · d053a31b
      animesh authored
      1. range() overloaded in SQLContext.scala
      2. range() modified in python sql context.py
      3. Tests added accordingly in DataFrameSuite.scala and python sql tests.py
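The overload described in point 1 can be sketched in plain Python (names here are illustrative, not the actual SQLContext internals): a single-argument call treats the argument as the exclusive end, with start defaulting to 0, matching Python's builtin range().

```python
def sql_range(start, end=None, step=1):
    """Sketch of the SQLContext.range overloads: range(end) or range(start, end[, step]).

    When only one argument is given, it is taken as the exclusive end
    and the range starts at 0.
    """
    if end is None:
        start, end = 0, start
    return list(range(start, end, step))
```

For example, `sql_range(3)` yields the same ids as `sql_range(0, 3)`.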
      
      Author: animesh <animesh@apache.spark>
      
      Closes #6609 from animeshbaranawal/SPARK-7980 and squashes the following commits:
      
      935899c [animesh] SPARK-7980:python+scala changes
      d053a31b
    • Patrick Wendell's avatar
      [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
      2c4d550e
    • Yin Huai's avatar
      [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests. · f1646e10
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-7973
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6525 from yhuai/SPARK-7973 and squashes the following commits:
      
      763b821 [Yin Huai] Also change the timeout of "Single command with -e" to 2 minutes.
      e598a08 [Yin Huai] Increase the timeout to 3 minutes.
      f1646e10
    • Yuhao Yang's avatar
      [SPARK-7983] [MLLIB] Add require for one-based indices in loadLibSVMFile · 28dbde38
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-7983
      
      Customers frequently use zero-based indices in their LIBSVM files. Spark reports no warnings or errors during the subsequent computation, and this usually leads to weird results for many algorithms (like GBDT).
      
      This PR adds a quick check.
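The kind of check this adds can be sketched in plain Python (a hypothetical helper, not the MLlib implementation): require that the feature indices of each LIBSVM line are one-based and strictly ascending, failing fast on the zero index that a zero-based file would contain.

```python
def check_libsvm_indices(indices):
    """Validate the feature indices of one LIBSVM line.

    LIBSVM uses one-based, strictly ascending indices; an index of 0 is
    the classic symptom of a zero-based file and should fail fast.
    """
    previous = 0  # every valid index must exceed the one before it, starting above 0
    for idx in indices:
        if idx <= previous:
            raise ValueError(
                "indices should be one-based and in ascending order; "
                f"found {idx} after {previous}")
        previous = idx
```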
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #6538 from hhbyyh/loadSVM and squashes the following commits:
      
      79d9c11 [Yuhao Yang] optimization as respond to comments
      4310710 [Yuhao Yang] merge conflict
      96460f1 [Yuhao Yang] merge conflict
      20a2811 [Yuhao Yang] use require
      6e4f8ca [Yuhao Yang] add check for ascending order
      9956365 [Yuhao Yang] add ut for 0-based loadlibsvm exception
      5bd1f9a [Yuhao Yang] add require for one-based in loadLIBSVM
      28dbde38
    • Wenchen Fan's avatar
      [SPARK-7562][SPARK-6444][SQL] Improve error reporting for expression data type mismatch · d38cf217
      Wenchen Fan authored
      It seems hard to find a common pattern of checking types in `Expression`. Sometimes we know what input types we need (like `And`, where we need two booleans), sometimes we just have some rules (like `Add`, where we need two numeric types that are equal). So I defined a general interface `checkInputDataTypes` in `Expression` which returns a `TypeCheckResult`. `TypeCheckResult` can tell whether this expression passes the type checking or what the type mismatch is.
      
      This PR mainly works on apply input types checking for arithmetic and predicate expressions.
      
      TODO: apply type checking interface to more expressions.
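The interface described above can be sketched in plain Python (illustrative names, not Catalyst's actual Scala API): each expression's check returns either success or a message describing the mismatch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TypeCheckResult:
    """Success when message is None, otherwise carries the mismatch description."""
    message: Optional[str] = None

    @property
    def is_success(self):
        return self.message is None

def check_add(left_type, right_type):
    """Sketch of checkInputDataTypes for an Add-like expression:
    both inputs must be numeric and of the same type."""
    numeric = {"int", "long", "double", "decimal"}
    if left_type not in numeric or right_type not in numeric:
        return TypeCheckResult(
            f"Add requires numeric types, got {left_type} and {right_type}")
    if left_type != right_type:
        return TypeCheckResult(f"differing types {left_type} and {right_type}")
    return TypeCheckResult()
```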
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6405 from cloud-fan/6444 and squashes the following commits:
      
      b5ff31b [Wenchen Fan] address comments
      b917275 [Wenchen Fan] rebase
      39929d9 [Wenchen Fan] add todo
      0808fd2 [Wenchen Fan] make constructor of TypeCheckResult private
      3bee157 [Wenchen Fan] add decimal type coercion rule for binary comparison
      8883025 [Wenchen Fan] apply type check interface to CaseWhen
      cffb67c [Wenchen Fan] to have resolved call the data type check function
      6eaadff [Wenchen Fan] add equal type constraint to EqualTo
      3affbd8 [Wenchen Fan] more fixes
      654d46a [Wenchen Fan] improve tests
      e0a3628 [Wenchen Fan] improve error message
      1524ff6 [Wenchen Fan] fix style
      69ca3fe [Wenchen Fan] add error message and tests
      c71d02c [Wenchen Fan] fix hive tests
      6491721 [Wenchen Fan] use value class TypeCheckResult
      7ae76b9 [Wenchen Fan] address comments
      cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch
      d38cf217
    • Reynold Xin's avatar
      [SPARK-8060] Improve DataFrame Python test coverage and documentation. · ce320cb2
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits:
      
      baa8ad5 [Reynold Xin] Code review feedback.
      f081d47 [Reynold Xin] More documentation updates.
      c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing.
      ce320cb2
    • MechCoder's avatar
      [SPARK-8032] [PYSPARK] Make version checking for NumPy in MLlib more robust · 452eb82d
      MechCoder authored
      The current check compares version strings, treating `1.x` as less than `1.4`. This fails whenever x has more than one digit: even though x > 4 numerically, the string `1.x` still sorts before `1.4`.
      
      It fails in my system since I have version `1.10` :P
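A robust check can be sketched in plain Python (illustrative, not the exact MLlib code): convert the version components to integers and compare tuples rather than strings.

```python
def at_least(version, minimum=(1, 4)):
    """Compare a version string like '1.10.2' against a minimum as int tuples.

    String comparison wrongly reports '1.10' < '1.4'; converting each
    component to int gives (1, 10) >= (1, 4) as intended.
    """
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= minimum
```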
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6579 from MechCoder/np_ver and squashes the following commits:
      
      15430f8 [MechCoder] fix syntax error
      893fb7e [MechCoder] remove equal to
      e35f0d4 [MechCoder] minor
      e89376c [MechCoder] Better checking
      22703dd [MechCoder] [SPARK-8032] Make version checking for NumPy in MLlib more robust
      452eb82d
    • Yuhao Yang's avatar
      [SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc · 43adbd56
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-8043
      
      I found some issues during testing the save/load examples in markdown Documents, as a part of 1.4 QA plan
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits:
      
      a01a206 [Yuhao Yang] fix for Gaussian mixture
      2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc
      43adbd56
    • WangTaoTheTonic's avatar
      [MINOR] make the launcher project name consistent with others · ccaa8232
      WangTaoTheTonic authored
      I found this by chance while building Spark and think it is better to keep its name consistent with the other sub-projects (Spark Project *).
      
      I am not going to file a JIRA as this is a pretty small issue.
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #6603 from WangTaoTheTonic/projName and squashes the following commits:
      
      994b3ba [WangTaoTheTonic] make the project name consistent
      ccaa8232
    • Joseph K. Bradley's avatar
      [SPARK-8053] [MLLIB] renamed scalingVector to scalingVec · 07c16cb5
      Joseph K. Bradley authored
      I searched the Spark codebase for all occurrences of "scalingVector"
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #6596 from jkbradley/scalingVec-rename and squashes the following commits:
      
      d3812f8 [Joseph K. Bradley] renamed scalingVector to scalingVec
      07c16cb5
    • Josh Rosen's avatar
      [SPARK-7691] [SQL] Refactor CatalystTypeConverter to use type-specific row accessors · cafd5056
      Josh Rosen authored
      This patch significantly refactors CatalystTypeConverters to both clean up the code and enable these conversions to work with future Project Tungsten features.
      
      At a high level, I've reorganized the code so that all functions dealing with the same type are grouped together into type-specific subclasses of `CatalystTypeConveter`.  In addition, I've added new methods that allow the Catalyst Row -> Scala Row conversions to access the Catalyst row's fields through type-specific `getTYPE()` methods rather than the generic `get()` / `Row.apply` methods.  This refactoring is a blocker to being able to unit test new operators that I'm developing as part of Project Tungsten, since those operators may output `UnsafeRow` instances which don't support the generic `get()`.
      
      The stricter use of types here has uncovered some bugs in other parts of Spark SQL:
      
      - #6217: DescribeCommand is assigned wrong output attributes in SparkStrategies
      - #6218: DataFrame.describe() should cast all aggregates to String
      - #6400: Use output schema, not relation schema, for data source input conversion
      
      Spark SQL currently has undefined behavior for what happens when you try to create a DataFrame from user-specified rows whose values don't match the declared schema.  According to the `createDataFrame()` Scaladoc:
      
      >  It is important to make sure that the structure of every [[Row]] of the provided RDD matches the provided schema. Otherwise, there will be runtime exception.
      
      Given this, it sounds like it's technically not a breach of our API contract to fail fast when the data types don't match. However, there appear to be many cases where we don't fail even though the types don't match. For example, `JavaHashingTFSuite.hasingTF` passes a column of integer values for a "label" column which is supposed to contain floats.  This column isn't actually read or modified as part of query processing, so its concrete type doesn't seem to matter. In other cases, generic numeric aggregates may tolerate being called with different numeric types than the schema specified, which can be okay due to numeric conversions.
      
      In the long run, we will probably want to come up with precise semantics for implicit type conversions / widening when converting Java / Scala rows to Catalyst rows.  Until then, though, I think that failing fast with a ClassCastException is a reasonable behavior; this is the approach taken in this patch.  Note that certain optimizations in the inbound conversion functions for primitive types mean that we'll probably preserve the old undefined behavior in a majority of cases.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6222 from JoshRosen/catalyst-converters-refactoring and squashes the following commits:
      
      740341b [Josh Rosen] Optimize method dispatch for primitive type conversions
      befc613 [Josh Rosen] Add tests to document Option-handling behavior.
      5989593 [Josh Rosen] Use new SparkFunSuite base in CatalystTypeConvertersSuite
      6edf7f8 [Josh Rosen] Re-add convertToScala(), since a Hive test still needs it
      3f7b2d8 [Josh Rosen] Initialize converters lazily so that the attributes are resolved first
      6ad0ebb [Josh Rosen] Fix JavaHashingTFSuite ClassCastException
      677ff27 [Josh Rosen] Fix null handling bug; add tests.
      8033d4c [Josh Rosen] Fix serialization error in UserDefinedGenerator.
      85bba9d [Josh Rosen] Fix wrong input data in InMemoryColumnarQuerySuite
      9c0e4e1 [Josh Rosen] Remove last use of convertToScala().
      ae3278d [Josh Rosen] Throw ClassCastException errors during inbound conversions.
      7ca7fcb [Josh Rosen] Comments and cleanup
      1e87a45 [Josh Rosen] WIP refactoring of CatalystTypeConverters
      cafd5056
  2. Jun 02, 2015
    • DB Tsai's avatar
      [SPARK-7547] [ML] Scala Example code for ElasticNet · a86b3e9b
      DB Tsai authored
      This is Scala example code for both linear and logistic regression. Python and Java versions are to be added.
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #6576 from dbtsai/elasticNetExample and squashes the following commits:
      
      e7ca406 [DB Tsai] fix test
      6bb6d77 [DB Tsai] fix suite and remove duplicated setMaxIter
      136e0dd [DB Tsai] address feedback
      1ec29d4 [DB Tsai] fix style
      9462f5f [DB Tsai] add example
      a86b3e9b
    • Ram Sriharsha's avatar
      [SPARK-7387] [ML] [DOC] CrossValidator example code in Python · c3f4c325
      Ram Sriharsha authored
      Author: Ram Sriharsha <rsriharsha@hw11853.local>
      
      Closes #6358 from harsha2010/SPARK-7387 and squashes the following commits:
      
      63efda2 [Ram Sriharsha] more examples for classifier to distinguish mapreduce from spark properly
      aeb6bb6 [Ram Sriharsha] Python Style Fix
      54a500c [Ram Sriharsha] Merge branch 'master' into SPARK-7387
      615e91c [Ram Sriharsha] cleanup
      204c4e3 [Ram Sriharsha] Merge branch 'master' into SPARK-7387
      7246d35 [Ram Sriharsha] [SPARK-7387][ml][doc] CrossValidator example code in Python
      c3f4c325
    • Cheng Lian's avatar
      [SQL] [TEST] [MINOR] Follow-up of PR #6493, use Guava API to ensure Java 6 friendliness · 5cd6a63d
      Cheng Lian authored
      This is a follow-up of PR #6493, which has been reverted in branch-1.4 because it uses Java 7 specific APIs and breaks Java 6 build. This PR replaces those APIs with equivalent Guava ones to ensure Java 6 friendliness.
      
      cc andrewor14 pwendell, this should also be back ported to branch-1.4.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6547 from liancheng/override-log4j and squashes the following commits:
      
      c900cfd [Cheng Lian] Addresses Shixiong's comment
      72da795 [Cheng Lian] Uses Guava API to ensure Java 6 friendliness
      5cd6a63d
    • Xiangrui Meng's avatar
      [SPARK-8049] [MLLIB] drop tmp col from OneVsRest output · 89f21f66
      Xiangrui Meng authored
      The temporary column should be dropped after we get the prediction column. harsha2010
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6592 from mengxr/SPARK-8049 and squashes the following commits:
      
      1d89107 [Xiangrui Meng] use SparkFunSuite
      6ee70de [Xiangrui Meng] drop tmp col from OneVsRest output
      89f21f66
    • Davies Liu's avatar
      [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise() · 605ddbb2
      Davies Liu authored
      Thanks ogirardot, closes #6580
      
      cc rxin JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6590 from davies/when and squashes the following commits:
      
      c0f2069 [Davies Liu] fix Column.when() and otherwise()
      605ddbb2
    • Cheng Lian's avatar
      [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a... · 686a45f0
      Cheng Lian authored
      [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append
      
      The current code references the schema of the DataFrame to be written before checking the save mode. This triggers expensive metadata discovery prematurely. For save modes other than `Append`, this metadata discovery is useless since we either ignore the result (for `Ignore` and `ErrorIfExists`) or delete existing files (for `Overwrite`) later.
      
      This PR fixes this issue by deferring metadata discovery after save mode checking.
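The fix can be sketched in plain Python (hypothetical names, not the HadoopFsRelation code): consult the save mode first and invoke the expensive discovery only when the write will actually proceed.

```python
def write_relation(mode, path_exists, discover_schema):
    """Sketch of deferred metadata discovery.

    discover_schema is an expensive callable; it is invoked only after
    the save-mode check says the write should proceed.
    """
    if mode == "ErrorIfExists" and path_exists:
        raise IOError("path already exists")  # fail before any discovery
    if mode == "Ignore" and path_exists:
        return None                           # no write, so no discovery either
    return discover_schema()                  # Append/Overwrite: discover now
```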
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6583 from liancheng/spark-8014 and squashes the following commits:
      
      1aafabd [Cheng Lian] Updates comments
      088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined
      8fbd93f [Cheng Lian] Fixes SPARK-8014
      686a45f0
    • Mike Dusenberry's avatar
      [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating... · ad06727f
      Mike Dusenberry authored
      [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating ML Doc "Estimator, Transformer, and Param" examples.
      
      Updating ML Doc's *"Estimator, Transformer, and Param"* example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists.
      
      mengxr, I believe this addresses (part of) the *update documentation* TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820).
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits:
      
      6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from those of the Model.
      d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed.
      0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists.
      7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists.
      ad06727f
    • Marcelo Vanzin's avatar
      [SPARK-8015] [FLUME] Remove Guava dependency from flume-sink. · 0071bd8d
      Marcelo Vanzin authored
      The minimal change would be to disable shading of Guava in the module,
      and rely on the transitive dependency from other libraries instead. But
      since Guava's use is so localized, I think it's better to just not use
      it instead, so I replaced that code and removed all traces of Guava from
      the module's build.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6555 from vanzin/SPARK-8015 and squashes the following commits:
      
      c0ceea8 [Marcelo Vanzin] Add comments about dependency management.
      c38228d [Marcelo Vanzin] Add guava dep in test scope.
      b7a0349 [Marcelo Vanzin] Add libthrift exclusion.
      6e0942d [Marcelo Vanzin] Add comment in pom.
      2d79260 [Marcelo Vanzin] [SPARK-8015] [flume] Remove Guava dependency from flume-sink.
      0071bd8d
    • Cheng Lian's avatar
      [SPARK-8037] [SQL] Ignores files whose name starts with dot in HadoopFsRelation · 1bb5d716
      Cheng Lian authored
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6581 from liancheng/spark-8037 and squashes the following commits:
      
      d08e97b [Cheng Lian] Ignores files whose name starts with dot in HadoopFsRelation
      1bb5d716
    • Xiangrui Meng's avatar
      [SPARK-7432] [MLLIB] fix flaky CrossValidator doctest · bd97840d
      Xiangrui Meng authored
      The new test uses CV to compare `maxIter=0` and `maxIter=1`, and validate on the evaluation result. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6572 from mengxr/SPARK-7432 and squashes the following commits:
      
      c236bb8 [Xiangrui Meng] fix flaky cv doctest
      bd97840d
    • Davies Liu's avatar
      [SPARK-8021] [SQL] [PYSPARK] make Python read/write API consistent with Scala · 445647a1
      Davies Liu authored
      Adds schema()/format()/options() for the reader, and mode()/format()/options()/partitionBy() for the writer.
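The builder surface this describes can be sketched in plain Python (illustrative, not the pyspark implementation): each setter records state and returns self so calls chain, which is what keeps the Python API shaped like the Scala one.

```python
class Writer:
    """Minimal sketch of a chained writer builder (mode/format/options/partitionBy)."""

    def __init__(self):
        self._conf = {"mode": "error", "format": "parquet",
                      "options": {}, "partition_by": []}

    def mode(self, m):
        self._conf["mode"] = m
        return self            # returning self is what makes chaining work

    def format(self, f):
        self._conf["format"] = f
        return self

    def options(self, **opts):
        self._conf["options"].update(opts)
        return self

    def partitionBy(self, *cols):
        self._conf["partition_by"] = list(cols)
        return self
```

A call chain like `Writer().mode("overwrite").format("json").partitionBy("year")` then accumulates all the settings in one object before the terminal save step.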
      
      cc rxin yhuai  pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6578 from davies/readwrite and squashes the following commits:
      
      720d293 [Davies Liu] address comments
      b65dfa2 [Davies Liu] Update readwriter.py
      1299ab6 [Davies Liu] make Python API consistent with Scala
      445647a1
    • Yin Huai's avatar
      [SPARK-8023][SQL] Add "deterministic" attribute to Expression to avoid... · 0f80990b
      Yin Huai authored
      [SPARK-8023][SQL] Add "deterministic" attribute to Expression to avoid collapsing nondeterministic projects.
      
      This closes #6570.
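The optimizer concern can be sketched in plain Python (a hypothetical string-based representation, not Catalyst's tree API): adjacent projections may be merged only when every inner expression is deterministic, since substituting something like rand() into the outer projection would change how many times it is evaluated.

```python
def collapse_projects(outer_exprs, inner_exprs, inner_deterministic):
    """Sketch: merge Project(outer) over Project(inner) only if every
    inner expression is deterministic; return None to keep both projections."""
    if not all(inner_deterministic):
        return None  # cannot collapse; substitution would duplicate nondeterminism
    # substitute each inner expression into the outer ones (symbolically here)
    return [e.format(**inner_exprs) for e in outer_exprs]
```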
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6573 from rxin/deterministic and squashes the following commits:
      
      356cd22 [Reynold Xin] Added unit test for the optimizer.
      da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023
      da56200 [Yin Huai] Comments.
      e38f264 [Yin Huai] Comment.
      f9d6a73 [Yin Huai] Add a deterministic method to Expression.
      0f80990b
    • Yin Huai's avatar
      [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get... · 7b7f7b6c
      Yin Huai authored
      [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early
      
      https://issues.apache.org/jira/browse/SPARK-8020
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6571 from yhuai/SPARK-8020-1 and squashes the following commits:
      
      0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive.
      7b7f7b6c
    • Davies Liu's avatar
      [SPARK-6917] [SQL] DecimalType is not read back when non-native type exists · bcb47ad7
      Davies Liu authored
      cc yhuai
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6558 from davies/decimalType and squashes the following commits:
      
      c877ca8 [Davies Liu] Update ParquetConverter.scala
      48cc57c [Davies Liu] Update ParquetConverter.scala
      b43845c [Davies Liu] add test
      3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists
      bcb47ad7
    • Xiangrui Meng's avatar
      [SPARK-7582] [MLLIB] user guide for StringIndexer · 0221c7f0
      Xiangrui Meng authored
      This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:
      
      4bba4f1 [Xiangrui Meng] fix example
      ba1cd1b [Xiangrui Meng] fix style
      7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
      136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
      0221c7f0
  3. Jun 01, 2015