Commits · d3f87dc39480f075170817bbd00142967a938078 · cs525-sp18-g07 / spark

Aug 28, 2015

[SPARK-10325] Override hashCode() for public Row · d3f87dc3

Josh Rosen authored 9 years ago

This commit fixes an issue where the public SQL `Row` class did not override `hashCode`, causing it to violate the hashCode() + equals() contract. To fix this, I simply ported the `hashCode` implementation from the 1.4.x version of `Row`.
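
The contract mentioned above requires that two objects that compare equal also produce the same hash code. As a rough illustration only, here is a Python sketch using a hypothetical `SimpleRow` class (an assumption for this example, not Spark's `Row`, whose actual fix is in Scala). It derives the hash from the same fields that equality compares:

```python
class SimpleRow:
    """Hypothetical stand-in for a row of values; not Spark's Row class."""

    def __init__(self, *values):
        self.values = tuple(values)

    def __eq__(self, other):
        if not isinstance(other, SimpleRow):
            return NotImplemented
        return self.values == other.values

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so that
        # equal rows always produce equal hash values.
        return hash(self.values)


a = SimpleRow(1, "x")
b = SimpleRow(1, "x")
```

With both methods defined consistently, `a` and `b` collapse to a single element when placed in a set or used as dictionary keys.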

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8500 from JoshRosen/SPARK-10325 and squashes the following commits:

51ffea1 [Josh Rosen] Override hashCode() for public Row.

d3f87dc3

[SPARK-8952] [SPARKR] - Wrap normalizePath calls with suppressWarnings · 499e8e15

Luciano Resende authored 9 years ago

This is based on davies' comment on SPARK-8952, which suggests calling normalizePath() only when the path starts with '~'.

Author: Luciano Resende <lresende@apache.org>

Closes #8343 from lresende/SPARK-8952.

499e8e15

[SPARK-9890] [DOC] [ML] User guide for CountVectorizer · e2a84309

Yuhao Yang authored 9 years ago

jira: https://issues.apache.org/jira/browse/SPARK-9890

Documentation with Scala and Java examples.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8487 from hhbyyh/cvDoc.

e2a84309

[YARN] [MINOR] Avoid hard code port number in YarnShuffleService test · 1502a0f6

jerryshao authored 9 years ago

The port number in the test is currently fixed to the default (7337), which can cause a port contention exception. It is better to use a random port number in the unit test.

squito, it seems you're the author of this unit test; would you mind taking a look at this fix? Thanks a lot.

```
[info] - executor state kept across NM restart *** FAILED *** (597 milliseconds)
[info]   org.apache.hadoop.service.ServiceStateException: java.net.BindException: Address already in use
[info]   at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
[info]   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
[info]   at org.apache.spark.network.yarn.YarnShuffleServiceSuite$$anonfun$1.apply$mcV$sp(YarnShuffleServiceSuite.scala:72)
[info]   at org.apache.spark.network.yarn.YarnShuffleServiceSuite$$anonfun$1.apply(YarnShuffleServiceSuite.scala:70)
[info]   at org.apache.spark.network.yarn.YarnShuffleServiceSuite$$anonfun$1.apply(YarnShuffleServiceSuite.scala:70)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
...
```
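
One common way to avoid a hard-coded port is to ask the operating system for a free one by binding to port 0. The Python sketch below shows that general technique; the helper name `find_free_port` is hypothetical, and this is not the code from the Scala test suite changed in this commit:

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Return a port number the OS reports as free at the time of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 asks the OS to pick an unused ephemeral port.
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

The socket is closed before the port is returned, so another process could still claim the port before the test binds to it. Where the service under test can bind to port 0 itself, that avoids this small race.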

Author: jerryshao <sshao@hortonworks.com>

Closes #8502 from jerryshao/avoid-hardcode-port.

1502a0f6

typo in comment · 71a077f6

Dharmesh Kakadia authored 9 years ago

Author: Dharmesh Kakadia <dharmeshkakadia@users.noreply.github.com>

Closes #8497 from dharmeshkakadia/patch-2.

71a077f6

Fix DynamodDB/DynamoDB typo in Kinesis Integration doc · 18294cd8

Keiji Yoshida authored 9 years ago

Fix DynamodDB/DynamoDB typo in Kinesis Integration doc

Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>

Closes #8501 from yosssi/patch-1.

18294cd8

[SPARK-10295] [CORE] Dynamic allocation in Mesos does not release when RDDs are cached · cc398030

Sean Owen authored 9 years ago

Remove obsolete warning about dynamic allocation not working with cached RDDs

See discussion in https://issues.apache.org/jira/browse/SPARK-10295

Author: Sean Owen <sowen@cloudera.com>

Closes #8489 from srowen/SPARK-10295.

cc398030

[SPARK-10260] [ML] Add @Since annotation to ml.clustering · 4eeda8d4

Yu ISHIKAWA authored 9 years ago

### JIRA
[[SPARK-10260] Add Since annotation to ml.clustering - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10260)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8455 from yu-iskw/SPARK-10260.

4eeda8d4

[SPARK-10328] [SPARKR] Fix generic for na.omit · 2f99c372

Shivaram Venkataraman authored 9 years ago

The S3 function is documented at https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8495 from shivaram/na-omit-fix.

2f99c372

[SPARK-10188] [PYSPARK] Pyspark CrossValidator with RMSE selects incorrect model · 7583681e

noelsmith authored 9 years ago

* Added isLargerBetter() method to Pyspark Evaluator to match the Scala version.
* JavaEvaluator delegates isLargerBetter() to underlying Scala object.
* Added check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax.
* Added test cases for where smaller is better (RMSE) and larger is better (R-Squared).
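
The selection logic described in the list above can be sketched in plain Python as follows. The function name `select_best_index` and its parameters are hypothetical, chosen for illustration; this is not the PySpark `CrossValidator` source:

```python
def select_best_index(metrics, is_larger_better):
    """Return the index of the best metric value.

    Uses argmax when larger values are better (for example R-squared)
    and argmin when smaller values are better (for example RMSE).
    """
    if not metrics:
        raise ValueError("metrics must not be empty")
    choose = max if is_larger_better else min
    return choose(range(len(metrics)), key=metrics.__getitem__)


rmse_per_model = [0.9, 0.4, 0.7]
best_by_rmse = select_best_index(rmse_per_model, is_larger_better=False)
```

Always taking the maximum, regardless of the metric, would pick the model with the highest RMSE here, which is the kind of incorrect selection the commit title describes.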

(This contribution is my original work, and I license the work to the project under Spark's open source license.)

Author: noelsmith <mail@noelsmith.com>

Closes #8399 from noel-smith/pyspark-rmse-xval-fix.

7583681e

[SPARK-SQL] [MINOR] Fixes some typos in HiveContext · 89b94343

Cheng Lian authored 9 years ago

Author: Cheng Lian <lian@databricks.com>

Closes #8481 from liancheng/hive-context-typo.

89b94343

Aug 27, 2015

[SPARK-9905] [ML] [DOC] Adds LinearRegressionSummary user guide · af0e1249

Feynman Liang authored 9 years ago

* Adds user guide for `LinearRegressionSummary`
* Fixes unresolved issues in #8197

CC jkbradley mengxr

Author: Feynman Liang <fliang@databricks.com>

Closes #8491 from feynmanliang/SPARK-9905.

af0e1249

[SPARK-9911] [DOC] [ML] Update Userguide for Evaluator · 30734d45

MechCoder authored 9 years ago

I added a small note about the different types of evaluator and the metrics used.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #8304 from MechCoder/multiclass_evaluator.

30734d45

[SPARK-8505] [SPARKR] Add settings to kick `lint-r` from `./dev/run-test.py` · 1f90c5e2

Yu ISHIKAWA authored 9 years ago

JoshRosen we'd like to check the SparkR source code with the `dev/lint-r` script on Jenkins. I tried to incorporate the script into `dev/run-test.py`. Could you review it when you have time?

shivaram I modified `dev/lint-r` and `dev/lint-r.R` to install the lintr package into a local directory (`R/lib/`) and to exit with a lint status. Could you review it?

- [[SPARK-8505] Add settings to kick `lint-r` from `./dev/run-test.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8505)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #7883 from yu-iskw/SPARK-8505.

1f90c5e2

[SPARK-10321] sizeInBytes in HadoopFsRelation · 54cda0de

Davies Liu authored 9 years ago

Adds sizeInBytes to HadoopFsRelation to enable broadcast joins.
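
To illustrate why a size estimate matters for broadcast joins, here is a simplified Python sketch of a size-versus-threshold check. The function name `can_broadcast` is hypothetical, and the logic is an assumption made for illustration rather than Spark's planner code. The 10 MB value is the documented default of `spark.sql.autoBroadcastJoinThreshold`:

```python
# Assumed default: 10 MB, matching the documented default of
# spark.sql.autoBroadcastJoinThreshold.
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def can_broadcast(size_in_bytes, threshold=DEFAULT_BROADCAST_THRESHOLD):
    """Return True if the estimated relation size fits under the threshold.

    A non-positive threshold is treated as "broadcasting disabled".
    """
    return threshold > 0 and size_in_bytes <= threshold
```

Without a size estimate for a relation, a check of this kind has nothing to compare against the threshold, so the relation cannot be chosen as the broadcast side.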

cc marmbrus

Author: Davies Liu <davies@databricks.com>

Closes #8490 from davies/sizeInByte.

54cda0de

[SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path · b3dd569a

Yin Huai authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-10287

After porting JSON to HadoopFsRelation, it seems hard to keep the behavior of automatically picking up new files for JSON. This PR removes this behavior, so JSON is consistent with the other data sources (ORC and Parquet).

Author: Yin Huai <yhuai@databricks.com>

Closes #8469 from yhuai/jsonRefresh.

b3dd569a

[SPARK-9680] [MLLIB] [DOC] StopWordsRemovers user guide and Java compatibility test · 5bfe9e11

Feynman Liang authored 9 years ago

* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* The follow-up Python user guide code example is tracked by SPARK-10249

Author: Feynman Liang <fliang@databricks.com>

Closes #8436 from feynmanliang/SPARK-10230.

5bfe9e11

[SPARK-9906] [ML] User guide for LogisticRegressionSummary · c94ecdfc

MechCoder authored 9 years ago

User guide for LogisticRegression summaries

Author: MechCoder <manojkumarsivaraj334@gmail.com>
Author: Manoj Kumar <mks542@nyu.edu>
Author: Feynman Liang <fliang@databricks.com>

Closes #8197 from MechCoder/log_summary_user_guide.

c94ecdfc

[SPARK-9901] User guide for RowMatrix Tall-and-skinny QR · 6185cdd2

Yuhao Yang authored 9 years ago

jira: https://issues.apache.org/jira/browse/SPARK-9901

The JIRA covers only the documentation update. I can provide example code for QR (like the examples for SVD and PCA) in a separate PR.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8462 from hhbyyh/qrDoc.

6185cdd2

[SPARK-10315] remove document on spark.akka.failure-detector.threshold · 84baa5e9

CodingCat authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-10315

This parameter is no longer used, and the current documentation is also mistaken: the name should be 'akka.remote.watch-failure-detector.threshold'.

Author: CodingCat <zhunansjtu@gmail.com>

Closes #8483 from CodingCat/SPARK_10315.

84baa5e9

[SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide · dc86a227

Michael Armbrust authored 9 years ago

Author: Michael Armbrust <michael@databricks.com>

Closes #8441 from marmbrus/documentation.

dc86a227

[SPARK-10182] [MLLIB] GeneralizedLinearModel doesn't unpersist cached data · fdd466be

Vyacheslav Baranov authored 9 years ago

`GeneralizedLinearModel` creates a cached RDD when building a model. This is inconvenient, since these RDDs flood the memory when building several models in a row, so useful data might get evicted from the cache.

The proposed solution is to always cache the dataset and remove the warning. There's a caveat though: the input dataset gets evaluated twice, first in line 270 when fitting `StandardScaler`, and a second time when running the optimizer. So it might be worth restoring the removed warning.

Another possible solution is to disable caching entirely and restore the removed warning. I don't really know which approach is better.

Author: Vyacheslav Baranov <slavik.baranov@gmail.com>

Closes #8395 from SlavikBaranov/SPARK-10182.

fdd466be

[SPARK-10257] [MLLIB] Removes Guava from all spark.mllib Java tests · e1f4de4a

Feynman Liang authored 9 years ago

* Replaces instances of `Lists.newArrayList` with `Arrays.asList`
* Uses `commons.lang.StringUtils` instead of `com.google.collections.Strings`
* Uses the `List` interface instead of `ArrayList` implementations

This PR along with #8445 #8446 #8447 completely removes all `com.google.collections.Lists` dependencies within mllib's Java tests.

Author: Feynman Liang <fliang@databricks.com>

Closes #8451 from feynmanliang/SPARK-10257.

e1f4de4a

[SPARK-9613] [HOTFIX] Fix usage of JavaConverters removed in Scala 2.11 · b02e8187

Jacek Laskowski authored 9 years ago

Fix for [JavaConverters.asJavaListConverter](http://www.scala-lang.org/api/2.10.5/index.html#scala.collection.JavaConverters$) being removed in 2.11.7, which causes the build to fail with the 2.11 profile enabled. Tested with the default 2.10 and 2.11 profiles; BUILD SUCCESS in both cases.

Build for 2.10:

    ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -DskipTests clean install

and 2.11:

    ./dev/change-scala-version.sh 2.11
    ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Dscala-2.11 -DskipTests clean install

Author: Jacek Laskowski <jacek@japila.pl>

Closes #8479 from jaceklaskowski/SPARK-9613-hotfix.

b02e8187

[SPARK-10256] [ML] Removes guava dependency from spark.ml.classification JavaTests · 1a446f75

Feynman Liang authored 9 years ago

Author: Feynman Liang <fliang@databricks.com>

Closes #8447 from feynmanliang/SPARK-10256.

1a446f75

[SPARK-10255] [ML] Removes Guava dependencies from spark.ml.param JavaTests · 75d62307

Feynman Liang authored 9 years ago

Author: Feynman Liang <fliang@databricks.com>

Closes #8446 from feynmanliang/SPARK-10255.

75d62307

[SPARK-10254] [ML] Removes Guava dependencies in spark.ml.feature JavaTests · 1650f6f5

Feynman Liang authored 9 years ago

* Replaces `com.google.common` dependencies with `java.util.Arrays`
* Small clean up in `JavaNormalizerSuite`

Author: Feynman Liang <fliang@databricks.com>

Closes #8445 from feynmanliang/SPARK-10254.

1650f6f5

[DOCS] [STREAMING] [KAFKA] Fix typo in exactly once semantics · 9625d13d

Moussa Taifi authored 9 years ago

Fix Typo in exactly once semantics
[Semantics of output operations] link

Author: Moussa Taifi <moutai10@gmail.com>

Closes #8468 from moutai/patch-3.

9625d13d

[SPARK-10251] [CORE] some common types are not registered for Kryo Serialization by default · de027828

Ram Sriharsha authored 9 years ago

Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #8465 from harsha2010/SPARK-10251.

de027828

[SPARK-10219] [SPARKR] Fix varargsToEnv and add test case · e936cf80

Shivaram Venkataraman authored 9 years ago

cc sun-rui davies

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #8475 from shivaram/varargs-fix.

e936cf80

[SPARK-9964] [PYSPARK] [SQL] PySpark DataFrameReader accept RDD of String for JSON · ce97834d

Yanbo Liang authored 9 years ago

PySpark DataFrameReader should accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path.
If this PR is merged, it should be duplicated to cover the other input types (not just JSON).

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8444 from yanboliang/spark-9964.

ce97834d

Aug 26, 2015

[SPARK-9424] [SQL] Parquet programming guide updates for 1.5 · 0fac144f

Cheng Lian authored 9 years ago

Author: Cheng Lian <lian@databricks.com>

Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.

0fac144f

[MINOR] [SPARKR] Fix some validation problems in SparkR · 773ca037

Yu ISHIKAWA authored 9 years ago

Getting rid of some validation problems in SparkR
https://github.com/apache/spark/pull/7883

cc shivaram

```
inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
  expect_equal(class(x), "character")
                                     ^~
inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.

^~
R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.

^~~~~~~~~~~~~~
R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
                df <- data.frame(row.names = 1 : nrow)
                                                      ^~~~~~~~~~~~~~~~
R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.

^~~~~~~~~~~~~~
R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
                    df[[names[colIndex]]] <- vec
                                                ^~~~~~~~~~~~~~~~~~
```

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8474 from yu-iskw/minor-fix-sparkr.

773ca037

[SPARK-10308] [SPARKR] Add %in% to the exported namespace · ad7f0f16

Shivaram Venkataraman authored 9 years ago

I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine.

cc yu-iskw

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #8473 from shivaram/in-namespace.

ad7f0f16

[SPARK-10305] [SQL] fix create DataFrame from Python class · d41d6c48

Davies Liu authored 9 years ago

cc jkbradley

Author: Davies Liu <davies@databricks.com>

Closes #8470 from davies/fix_create_df.

d41d6c48

[SPARK-10241] [MLLIB] update since versions in mllib.recommendation · 086d4681

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.recommendation`.

cc srowen coderxiang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8432 from mengxr/SPARK-10241.

086d4681

HOTFIX: Increase PRB timeout · de7209c2

Patrick Wendell authored 9 years ago

de7209c2

[SPARK-9665] [MLLIB] audit MLlib API annotations · 6519fd06

Xiangrui Meng authored 9 years ago

I only found `ml.NaiveBayes` missing `Experimental` annotation. This PR doesn't cover Python APIs.

cc jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #8452 from mengxr/SPARK-9665.

6519fd06

Closes #8443 · bb164052

Reynold Xin authored 9 years ago

bb164052

[SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select) · 75d4773a

felixcheung authored 9 years ago

Add support for:
```
   df[df$name == "Smith", c(1,2)]
   df[df$age %in% c(19, 30), 1:2]
```

shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #8394 from felixcheung/rsubset.

75d4773a