Commits · 2bcdf8c239d2ba79f64fb8878da83d4c2ec28b30 · cs525-sp18-g07 / spark

Jun 04, 2015

[SPARK-7440][SQL] Remove physical Distinct operator in favor of Aggregate · 2bcdf8c2

Reynold Xin authored 9 years ago

This patch replaces Distinct with Aggregate in the optimizer, so Distinct will become
more efficient over time as we optimize Aggregate (via Tungsten).

Author: Reynold Xin <rxin@databricks.com>

Closes #6637 from rxin/replace-distinct and squashes the following commits:

b3cc50e [Reynold Xin] Mima excludes.
93d6117 [Reynold Xin] Code review feedback.
87e4741 [Reynold Xin] [SPARK-7440][SQL] Remove physical Distinct operator in favor of Aggregate.

2bcdf8c2

Fixed style issues for [SPARK-6909][SQL] Remove Hive Shim code. · 65938422
Reynold Xin authored 9 years ago

65938422

[SPARK-6909][SQL] Remove Hive Shim code · 0526fea4

Cheolsoo Park authored 9 years ago

This is a follow-up on #6393. I am removing the following files in this PR.
```
./sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala
./sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala
```
Basically, I re-factored the shim code as follows-
* Rewrote code directly with Hive 0.13 methods, or
* Converted code into private methods, or
* Extracted code into separate classes

But for leftover code that didn't fit in any of these cases, I created a HiveShim object. For eg, helper functions which wrap Hive 0.13 methods to work around Hive bugs are placed here.

Author: Cheolsoo Park <cheolsoop@netflix.com>

Closes #6604 from piaozhexiu/SPARK-6909 and squashes the following commits:

5dccc20 [Cheolsoo Park] Remove hive shim code

0526fea4

[SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh · 3dc00528

Shivaram Venkataraman authored 9 years ago

This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available

Related to discussion in #6567

cc pwendell srowen -- Let me know if this looks better

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:

b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available

3dc00528

[SPARK-7743] [SQL] Parquet 1.7 · cd3176bd

Thomas Omans authored 9 years ago

Resolves [SPARK-7743](https://issues.apache.org/jira/browse/SPARK-7743).

Trivial changes of versions, package names, as well as a small issue in `ParquetTableOperations.scala`

```diff
-    val readContext = getReadSupport(configuration).init(
+    val readContext = ParquetInputFormat.getReadSupportInstance(configuration).init(
```

Since ParquetInputFormat.getReadSupport was made package private in the latest release.

Thanks
-- Thomas Omans

Author: Thomas Omans <tomans@cj.com>

Closes #6597 from eggsby/SPARK-7743 and squashes the following commits:

2df0d1b [Thomas Omans] [SPARK-7743] [SQL] Upgrading parquet version to 1.7.0

cd3176bd

[SPARK-7969] [SQL] Added a DataFrame.drop function that accepts a Column reference. · df7da07a

Mike Dusenberry authored 9 years ago

Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests. Basically iterates through the `DataFrame` to find a column with an expression that is equivalent to that of the `Column` argument supplied to the function.

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits:

514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0.
2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over.
6bf7c0e [Mike Dusenberry] Minor code formatting change.
e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name.
5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names.
4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference.
986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests. Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function.

df7da07a

[SPARK-7956] [SQL] Use Janino to compile SQL expressions into bytecode · c8709dcf

Davies Liu authored 9 years ago

In order to reduce the overhead of codegen, this PR switch to use Janino to compile SQL expressions into bytecode.

After this, the time used to compile a SQL expression is decreased from 100ms to 5ms, which is necessary to turn on codegen for general workload, also tests.

cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #6479 from davies/janino and squashes the following commits:

cc689f5 [Davies Liu] remove globalLock
262d848 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
eec3a33 [Davies Liu] address comments from Josh
f37c8c3 [Davies Liu] fix DecimalType and cast to String
202298b [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
a21e968 [Davies Liu] fix style
0ed3dc6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
551a851 [Davies Liu] fix tests
c3bdffa [Davies Liu] remove print
6089ce5 [Davies Liu] change logging level
7e46ac3 [Davies Liu] fix style
d8f0f6c [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
da4926a [Davies Liu] fix tests
03660f3 [Davies Liu] WIP: use Janino to compile Java source
f2629cd [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
f7d66cf [Davies Liu] use template based string for codegen

c8709dcf

Fix maxTaskFailures comment · 10ba1880

Daniel Darabos authored 9 years ago

If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code supports this reading, I think it's just this comment that was off. It's easy to make this mistake — can you please double-check if I'm correct? Thanks!

Author: Daniel Darabos <darabos.daniel@gmail.com>

Closes #6621 from darabos/patch-2 and squashes the following commits:

dfebdec [Daniel Darabos] Fix comment.

10ba1880

MAINTENANCE: Automated closing of pull requests. · 9982d453

Patrick Wendell authored 9 years ago

This commit exists to close the following pull requests on Github:

Closes #5976 (close requested by 'JoshRosen')
Closes #4576 (close requested by 'pwendell')
Closes #3430 (close requested by 'pwendell')
Closes #2495 (close requested by 'pwendell')

9982d453

Jun 03, 2015

[BUILD] Fix Maven build for Kinesis · 984ad601

Andrew Or authored 9 years ago

A necessary dependency that is transitively referenced is not
provided, causing compilation failures in builds that provide
the kinesis-asl profile.

984ad601

[BUILD] Use right branch when checking against Hive · 9cf740f3

Andrew Or authored 9 years ago

Right now we always run hive tests in branch-1.4 PRs because we compare whether the diff against master involves hive changes. Really we should be comparing against the target branch itself.

Author: Andrew Or <andrew@databricks.com>

Closes #6629 from andrewor14/build-check-hive and squashes the following commits:

450fbbd [Andrew Or] [BUILD] Use right branch when checking against Hive

9cf740f3

[BUILD] Increase Jenkins test timeout · e35cd36e

Andrew Or authored 9 years ago

Currently hive tests alone take 40m. The right thing to do is
to reduce the test time. However, that is a bigger project and
we currently have PRs blocking on tests not timing out.

e35cd36e

[SPARK-8084] [SPARKR] Make SparkR scripts fail on error · 0576c3c4

Shivaram Venkataraman authored 9 years ago

cc shaneknapp pwendell JoshRosen

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6623 from shivaram/SPARK-8084 and squashes the following commits:

0ec5b26 [Shivaram Venkataraman] Make SparkR scripts fail on error

0576c3c4

[SPARK-8088] don't attempt to lower number of executors by 0 · 51898b51

Ryan Williams authored 9 years ago

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #6624 from ryan-williams/execs and squashes the following commits:

b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0

51898b51

[HOTFIX] History Server API docs error fix. · 566cb594

Hari Shreedharan authored 9 years ago

Minor error in the monitoring docs. Also made indentation changes in `ApiRootResource`

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #6628 from harishreedharan/eventlog-formatting and squashes the following commits:

a12553d [Hari Shreedharan] Javadoc updates.
ca399b6 [Hari Shreedharan] [HOTFIX] History Server API docs error fix.

566cb594

[HOTFIX] [TYPO] Fix typo in #6546 · bfbdab12
Andrew Or authored 9 years ago

bfbdab12

[SPARK-6164] [ML] CrossValidatorModel should keep stats from fitting · d8662cd9

leahmcguire authored 9 years ago

Added stats from cross validation as a val in the cross validation model to save them for user access.

Author: leahmcguire <lmcguire@salesforce.com>

Closes #5915 from leahmcguire/saveCVmetrics and squashes the following commits:

49b507b [leahmcguire] fixed tyle error
67537b1 [leahmcguire] rebased
85907f0 [leahmcguire] fixed name
59987cc [leahmcguire] changed param name and test according to comments
36e71e3 [leahmcguire] rebasing
4b8223e [leahmcguire] fixed name
4ddffc6 [leahmcguire] changed param name and test according to comments
3a995da [leahmcguire] Added stats from cross validation as a val in the cross validation model to save them for user access

d8662cd9

[SPARK-8051] [MLLIB] make StringIndexerModel silent if input column does not exist · 26c9d7a0

Xiangrui Meng authored 9 years ago

This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6595 from mengxr/SPARK-8051 and squashes the following commits:

b6a36b9 [Xiangrui Meng] add doc
f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051
8ee7c7e [Xiangrui Meng] use SparkFunSuite
e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist

26c9d7a0

[SPARK-3674] [EC2] Clear SPARK_WORKER_INSTANCES when using YARN · d3e026f8

Shivaram Venkataraman authored 9 years ago

cc andrewor14

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6424 from shivaram/spark-worker-instances-yarn-ec2 and squashes the following commits:

db244ae [Shivaram Venkataraman] Make Python Lint happy
0593d1b [Shivaram Venkataraman] Clear SPARK_WORKER_INSTANCES when using YARN

d3e026f8

[HOTFIX] Fix Hadoop-1 build caused by #5792. · a8f1f154

Hari Shreedharan authored 9 years ago

Replaced `fs.listFiles` with Hadoop-1 friendly `fs.listStatus` method.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #6619 from harishreedharan/evetlog-hadoop-1-fix and squashes the following commits:

6192078 [Hari Shreedharan] [HOTFIX] Fix Hadoop-1 build caused by #5972.

a8f1f154

[SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and... · f2713478

zsxwing authored 9 years ago

[SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite

The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs.

This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up.

Author: zsxwing <zsxwing@gmail.com>

Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits:

5560e09 [zsxwing] Fix a typo
3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite

f2713478

[SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout · 1d8669f1

zsxwing authored 9 years ago

Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.

Author: zsxwing <zsxwing@gmail.com>

Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:

607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout

1d8669f1

[SPARK-8059] [YARN] Wake up allocation thread when new requests arrive. · aa40c442

Marcelo Vanzin authored 9 years ago

This should help reduce latency for new executor allocations.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6600 from vanzin/SPARK-8059 and squashes the following commits:

8387a3a [Marcelo Vanzin] [SPARK-8059] [yarn] Wake up allocation thread when new requests arrive.

aa40c442

[SPARK-8083] [MESOS] Use the correct base path in mesos driver page. · bfbf12b3

Timothy Chen authored 9 years ago

Author: Timothy Chen <tnachen@gmail.com>

Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:

4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.

bfbf12b3

[MINOR] [UI] Improve confusing message on log page · c6a6dd0d

Andrew Or authored 9 years ago

It's good practice to check if the input path is in the directory
we expect to avoid potentially confusing error messages.

c6a6dd0d

[SPARK-8054] [MLLIB] Added several Java-friendly APIs + unit tests · 20a26b59

Joseph K. Bradley authored 9 years ago

Java-friendly APIs added:
* GaussianMixture.run()
* GaussianMixtureModel.predict()
* DistributedLDAModel.javaTopicDistributions()
* StreamingKMeans: trainOn, predictOn, predictOnValues
* Statistics.corr
* params
  * added doc to w() since Java docs do not inherit doc
  * removed non-Java-friendly w() from StringArrayParam and DoubleArrayParam
  * made DoubleArrayParam Java-friendly w() actually Java-friendly

I generated the doc and verified all changes.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6562 from jkbradley/java-api-1.4 and squashes the following commits:

c16821b [Joseph K. Bradley] Small fixes based on code review.
d955581 [Joseph K. Bradley] unit test fixes
29b6b0d [Joseph K. Bradley] small fixes
fe6dcfe [Joseph K. Bradley] Added several Java-friendly APIs + unit tests: NaiveBayes, GaussianMixture, LDA, StreamingKMeans, Statistics.corr, params

20a26b59

Update documentation for [SPARK-7980] [SQL] Support SQLContext.range(end) · 2c5a06ca
Reynold Xin authored 9 years ago

2c5a06ca

[SPARK-8074] Parquet should throw AnalysisException during setup for data... · 939e4f3d

Reynold Xin authored 9 years ago

[SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.

Author: Reynold Xin <rxin@databricks.com>

Closes #6608 from rxin/parquet-analysis and squashes the following commits:

b5dc8e2 [Reynold Xin] Code review feedback.
5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.

939e4f3d

[SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable... · 708c63bb

Sun Rui authored 9 years ago

[SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.

Author: Sun Rui <rui.sun@intel.com>

Closes #6605 from sun-rui/SPARK-8063 and squashes the following commits:

51ca48b [Sun Rui] [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.

708c63bb

[SPARK-7161] [HISTORY SERVER] Provide REST api to download event logs fro... · d2a86eb8

Hari Shreedharan authored 9 years ago

...m History Server

This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.

This also add an additional method to the ApplicationHistoryProvider to get the raw files, zipped.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #5792 from harishreedharan/eventlog-download and squashes the following commits:

221cc26 [Hari Shreedharan] Update docs with new API information.
a131be6 [Hari Shreedharan] Fix style issues.
5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by rafactoring interfaces.
5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
0b66948 [Hari Shreedharan] Minor formatting/import fixes.
4fc518c [Hari Shreedharan] Fix rat failures.
a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
fd6ab00 [Hari Shreedharan] Fix style issues
32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
7b362b2 [Hari Shreedharan] Almost working.
3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.

d2a86eb8

[SPARK-7980] [SQL] Support SQLContext.range(end) · d053a31b

animesh authored 9 years ago

1. range() overloaded in SQLContext.scala
2. range() modified in python sql context.py
3. Tests added accordingly in DataFrameSuite.scala and python sql tests.py

Author: animesh <animesh@apache.spark>

Closes #6609 from animeshbaranawal/SPARK-7980 and squashes the following commits:

935899c [animesh] SPARK-7980:python+scala changes

d053a31b

[SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e

Patrick Wendell authored 9 years ago

Author: Patrick Wendell <patrick@databricks.com>

Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:

2f42d02 [Patrick Wendell] A few more excludes
4bebcf0 [Patrick Wendell] Update to RC4
61aaf46 [Patrick Wendell] Using new release candidate
55f1610 [Patrick Wendell] Another exclude
04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0

2c4d550e

[SPARK-7973] [SQL] Increase the timeout of two CliSuite tests. · f1646e10

Yin Huai authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-7973

Author: Yin Huai <yhuai@databricks.com>

Closes #6525 from yhuai/SPARK-7973 and squashes the following commits:

763b821 [Yin Huai] Also change the timeout of "Single command with -e" to 2 minutes.
e598a08 [Yin Huai] Increase the timeout to 3 minutes.

f1646e10

[SPARK-7983] [MLLIB] Add require for one-based indices in loadLibSVMFile · 28dbde38

Yuhao Yang authored 9 years ago

jira: https://issues.apache.org/jira/browse/SPARK-7983

Customers frequently use zero-based indices in their LIBSVM files. No warnings or errors from Spark will be reported during their computation afterwards, and usually it will lead to wired result for many algorithms (like GBDT).

add a quick check.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #6538 from hhbyyh/loadSVM and squashes the following commits:

79d9c11 [Yuhao Yang] optimization as respond to comments
4310710 [Yuhao Yang] merge conflict
96460f1 [Yuhao Yang] merge conflict
20a2811 [Yuhao Yang] use require
6e4f8ca [Yuhao Yang] add check for ascending order
9956365 [Yuhao Yang] add ut for 0-based loadlibsvm exception
5bd1f9a [Yuhao Yang] add require for one-based in loadLIBSVM

28dbde38

[SPARK-7562][SPARK-6444][SQL] Improve error reporting for expression data type mismatch · d38cf217

Wenchen Fan authored 9 years ago

It seems hard to find a common pattern of checking types in `Expression`. Sometimes we know what input types we need(like `And`, we know we need two booleans), sometimes we just have some rules(like `Add`, we need 2 numeric types which are equal). So I defined a general interface `checkInputDataTypes` in `Expression` which returns a `TypeCheckResult`. `TypeCheckResult` can tell whether this expression passes the type checking or what the type mismatch is.

This PR mainly works on apply input types checking for arithmetic and predicate expressions.

TODO: apply type checking interface to more expressions.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #6405 from cloud-fan/6444 and squashes the following commits:

b5ff31b [Wenchen Fan] address comments
b917275 [Wenchen Fan] rebase
39929d9 [Wenchen Fan] add todo
0808fd2 [Wenchen Fan] make constrcutor of TypeCheckResult private
3bee157 [Wenchen Fan] and decimal type coercion rule for binary comparison
8883025 [Wenchen Fan] apply type check interface to CaseWhen
cffb67c [Wenchen Fan] to have resolved call the data type check function
6eaadff [Wenchen Fan] add equal type constraint to EqualTo
3affbd8 [Wenchen Fan] more fixes
654d46a [Wenchen Fan] improve tests
e0a3628 [Wenchen Fan] improve error message
1524ff6 [Wenchen Fan] fix style
69ca3fe [Wenchen Fan] add error message and tests
c71d02c [Wenchen Fan] fix hive tests
6491721 [Wenchen Fan] use value class TypeCheckResult
7ae76b9 [Wenchen Fan] address comments
cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch

d38cf217

[SPARK-8060] Improve DataFrame Python test coverage and documentation. · ce320cb2

Reynold Xin authored 9 years ago

Author: Reynold Xin <rxin@databricks.com>

Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits:

baa8ad5 [Reynold Xin] Code review feedback.
f081d47 [Reynold Xin] More documentation updates.
c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing.

ce320cb2

[SPARK-8032] [PYSPARK] Make version checking for NumPy in MLlib more robust · 452eb82d

MechCoder authored 9 years ago

The current checking does version `1.x' is less than `1.4' this will fail if x has greater than 1 digit, since x > 4, however `1.x` < `1.4`

It fails in my system since I have version `1.10` :P

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6579 from MechCoder/np_ver and squashes the following commits:

15430f8 [MechCoder] fix syntax error
893fb7e [MechCoder] remove equal to
e35f0d4 [MechCoder] minor
e89376c [MechCoder] Better checking
22703dd [MechCoder] [SPARK-8032] Make version checking for NumPy in MLlib more robust

452eb82d

[SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc · 43adbd56

Yuhao Yang authored 9 years ago

jira: https://issues.apache.org/jira/browse/SPARK-8043

I found some issues during testing the save/load examples in markdown Documents, as a part of 1.4 QA plan

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits:

a01a206 [Yuhao Yang] fix for Gaussian mixture
2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc

43adbd56

[MINOR] make the launcher project name consistent with others · ccaa8232

WangTaoTheTonic authored 9 years ago

I found this by chance while building spark and think it is better to keep its name consistent with other sub-projects (Spark Project *).

I am not gonna file JIRA as it is a pretty small issue.

Author: WangTaoTheTonic <wangtao111@huawei.com>

Closes #6603 from WangTaoTheTonic/projName and squashes the following commits:

994b3ba [WangTaoTheTonic] make the project name consistent

ccaa8232

[SPARK-8053] [MLLIB] renamed scalingVector to scalingVec · 07c16cb5

Joseph K. Bradley authored 9 years ago

I searched the Spark codebase for all occurrences of "scalingVector"

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6596 from jkbradley/scalingVec-rename and squashes the following commits:

d3812f8 [Joseph K. Bradley] renamed scalingVector to scalingVec

07c16cb5