Commits · aaec7d4a80ed370847671e9e29ce2e92f1cff2c7 · cs525-sp18-g07 / spark

Feb 22, 2014

SPARK-1117: update accumulator docs · aaec7d4a

Xiangrui Meng authored 11 years ago

The current doc hints that Spark doesn't support accumulators of type `Long`, which is wrong.

JIRA: https://spark-project.atlassian.net/browse/SPARK-1117

Author: Xiangrui Meng <meng@databricks.com>

Closes #631 from mengxr/acc and squashes the following commits:

45ecd25 [Xiangrui Meng] update accumulator docs

aaec7d4a

Feb 21, 2014

[SPARK-1113] External spilling - fix Int.MaxValue hash code collision bug · fefd22f4

Andrew Or authored 11 years ago

The original poster of this bug is @guojc, who opened a PR that preceded this one at https://github.com/apache/incubator-spark/pull/612.

ExternalAppendOnlyMap uses key hash code to order the buffer streams from which spilled files are read back into memory. When a buffer stream is empty, the default hash code for that stream is equal to Int.MaxValue. This is, however, a perfectly legitimate candidate for a key hash code. When reading from a spilled map containing such a key, a hash collision may occur, in which case we attempt to read from an empty stream and throw NoSuchElementException.

The fix is to maintain the invariant that empty buffer streams are never added back to the merge queue to be considered. This guarantees that we never read from an empty buffer stream, ever again.

This PR also includes two new tests for hash collisions.
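The invariant can be illustrated with a small k-way merge. This is only a sketch under stated assumptions, not the ExternalAppendOnlyMap code: `StreamBuffer`, `mergeSorted`, and the use of the `Int` key itself as its hash are hypothetical simplifications.

```scala
import scala.collection.mutable

object MergeQueueSketch {
  // Hypothetical stand-in for a spilled-file stream: pairs already sorted by key hash.
  final class StreamBuffer(it: Iterator[(Int, String)]) {
    private val buffered = it.buffered
    def isEmpty: Boolean = !buffered.hasNext
    // Only meaningful for a non-empty buffer; there is no default such as Int.MaxValue.
    def minKeyHash: Int = buffered.head._1
    def pop(): (Int, String) = buffered.next()
  }

  def mergeSorted(streams: Seq[Iterator[(Int, String)]]): List[(Int, String)] = {
    // PriorityQueue is a max-heap, so reverse the ordering to dequeue the smallest hash first.
    val queue = mutable.PriorityQueue.empty[StreamBuffer](
      Ordering.by[StreamBuffer, Int](_.minKeyHash).reverse)
    // Empty streams never enter the queue...
    streams.map(new StreamBuffer(_)).filterNot(_.isEmpty).foreach(b => queue += b)
    val out = mutable.ListBuffer.empty[(Int, String)]
    while (queue.nonEmpty) {
      val buf = queue.dequeue()
      out += buf.pop()
      // ...and a drained stream is never added back, so we never read from an empty one.
      if (!buf.isEmpty) queue += buf
    }
    out.toList
  }
}
```

With this shape a key whose hash is Int.MaxValue is just another key, because no sentinel value is needed to represent an exhausted stream.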

Author: Andrew Or <andrewor14@gmail.com>

Closes #624 from andrewor14/spilling-bug and squashes the following commits:

9e7263d [Andrew Or] Slightly optimize next()
2037ae2 [Andrew Or] Move a few comments around...
cf95942 [Andrew Or] Remove default value of Int.MaxValue for minKeyHash
c11f03b [Andrew Or] Fix Int.MaxValue hash collision bug in ExternalAppendOnlyMap
21c1a39 [Andrew Or] Add hash collision tests to ExternalAppendOnlyMapSuite

fefd22f4

MLLIB-25: Implicit ALS runs out of memory for moderately large numbers of features · c8a4c9b1

Sean Owen authored 11 years ago

There's a step in implicit ALS where the matrix `Yt * Y` is computed. It's computed as the sum of matrices; an f x f matrix is created for each of n user/item rows in a partition. In `ALS.scala:214`:

```
        factors.flatMapValues{ case factorArray =>
          factorArray.map{ vector =>
            val x = new DoubleMatrix(vector)
            x.mmul(x.transpose())
          }
        }.reduceByKeyLocally((a, b) => a.addi(b))
         .values
         .reduce((a, b) => a.addi(b))
```

Completely correct, but there's a subtle and quite large memory problem here: map() is going to create all of these matrices in memory at once, when they never need to all exist at the same time.
For example, if a partition has n = 100000 rows, and f = 200, then this intermediate product requires 32GB of heap (100000 matrices of 200 x 200 doubles at 8 bytes each). The computation will never work unless you can cough up workers with (more than) that much heap.

Fortunately there's a trivial change that fixes it; just add `.view` in there.
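As a rough illustration of why `.view` helps, here is a minimal sketch that uses plain arrays instead of jblas `DoubleMatrix`. The names `outer`, `addInPlace`, and `sumOuterProducts` are hypothetical and not taken from `ALS.scala`.

```scala
object ViewSketch {
  type Matrix = Array[Array[Double]]

  // Outer product x * x^T as an f x f matrix.
  def outer(v: Array[Double]): Matrix = v.map(a => v.map(b => a * b))

  // Adds m into acc in place and returns acc, similar in spirit to addi.
  def addInPlace(acc: Matrix, m: Matrix): Matrix = {
    for (i <- acc.indices; j <- acc(i).indices) acc(i)(j) += m(i)(j)
    acc
  }

  // vectors.map(outer) would build all n matrices before reducing.
  // vectors.view.map(outer) is lazy: each matrix is created only when
  // reduce asks for it, so only a couple of them are live at any time.
  def sumOuterProducts(vectors: Array[Array[Double]]): Matrix =
    vectors.view.map(outer).reduce(addInPlace)
}
```

The result is the same as for the strict version; only the peak number of temporary matrices changes.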

Author: Sean Owen <sowen@cloudera.com>

Closes #629 from srowen/ALSMatrixAllocationOptimization and squashes the following commits:

062cda9 [Sean Owen] Update style per review comments
e9a5d63 [Sean Owen] Avoid unnecessary out of memory situation by not simultaneously allocating lots of matrices

c8a4c9b1

SPARK-1111: URL Validation Throws Error for HDFS URL's · 45b15e27

Patrick Wendell authored 11 years ago

Fixes an error where HDFS URL's cause an exception. Should be merged into master and 0.9.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #625 from pwendell/url-validation and squashes the following commits:

d14bfe3 [Patrick Wendell] SPARK-1111: URL Validation Throws Error for HDFS URL's

45b15e27

Feb 20, 2014

SPARK-1114: Allow PySpark to use existing JVM and Gateway · 59b13795

Ahir Reddy authored 11 years ago

Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.

Author: Ahir Reddy <ahirreddy@gmail.com>

Closes #622 from ahirreddy/pyspark-existing-jvm and squashes the following commits:

a86f457 [Ahir Reddy] Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.

59b13795

Super minor: Add require for mergeCombiners in combineByKey · 3fede483

Aaron Davidson authored 11 years ago

We changed the behavior in 0.9.0 from requiring that mergeCombiners be null when mapSideCombine was false to requiring that mergeCombiners *never* be null, for external sorting. This patch adds a require() to make this behavior change explicitly messaged rather than resulting in a NPE.
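A minimal sketch of the kind of check described, assuming a hypothetical helper `validate` and an illustrative error message (the real check sits inside combineByKey):

```scala
object CombinerCheckSketch {
  // Fails fast with an IllegalArgumentException and a readable message,
  // instead of letting a null function surface later as a NullPointerException.
  def validate[C](mergeCombiners: (C, C) => C): Unit =
    require(mergeCombiners != null, "mergeCombiners must be defined")
}
```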

Author: Aaron Davidson <aaron@databricks.com>

Closes #623 from aarondav/master and squashes the following commits:

520b80c [Aaron Davidson] Super minor: Add require for mergeCombiners in combineByKey

3fede483

MLLIB-22. Support negative implicit input in ALS · 9e63f80e

Sean Owen authored 11 years ago

I'm back with another less trivial suggestion for ALS:

In ALS for implicit feedback, input values are treated as weights on squared-errors in a loss function (or rather, the weight is a simple function of the input r, like c = 1 + alpha*r). The paper on which it's based assumes that the input is positive. Indeed, if the input is negative, it will create a negative weight on squared-errors, which causes things to go haywire. The optimization will try to make the error in a cell as large as possible, and the result is silently bogus.

There is a good use case for negative input values though. Implicit feedback is usually collected from signals of positive interaction like a view or like or buy, but equally, can come from "not interested" signals. The natural representation is negative values.

The algorithm can be extended quite simply to provide a sound interpretation of these values: negative values should encourage the factorization to come up with 0 for cells with large negative input values, just as much as positive values encourage it to come up with 1.

The implications for the algorithm are simple:
* the confidence function value must not be negative, and so can become 1 + alpha*|r|
* the matrix P should have a value 1 where the input R is _positive_, not merely where it is non-zero. Actually, that's what the paper already says, it's just that we can't assume P = 1 when a cell in R is specified anymore, since it may be negative

This in turn entails just a few lines of code change in `ALS.scala`:
* `rs(i)` becomes `abs(rs(i))`
* When constructing `userXy(us(i))`, it's implicitly only adding where P is 1. That had been true for any us(i) that is iterated over, before, since these are exactly the ones for which P is 1. But now P is zero where rs(i) <= 0, and should not be added
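The two rules can be written down as small functions. This is a sketch of the formulas only, with hypothetical names `confidence` and `preference`; it is not the code in `ALS.scala`.

```scala
object ImplicitFeedbackSketch {
  // Confidence weight c = 1 + alpha * |r|, which is never negative.
  def confidence(r: Double, alpha: Double): Double = 1.0 + alpha * math.abs(r)

  // Preference P is 1 only where the input is positive; zero or negative input gives 0.
  def preference(r: Double): Double = if (r > 0) 1.0 else 0.0
}
```

For example, r = -2 with alpha = 0.5 yields the same confidence as r = 2, but a preference of 0 rather than 1, so the factorization is pushed toward 0 for that cell.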

I think it's a safe change because:
* It doesn't change any existing behavior (unless you're using negative values, in which case results are already borked)
* It's the simplest direct extension of the paper's algorithm
* (I've used it to good effect in production FWIW)

Tests included.

I tweaked minor things en route:
* `ALS.scala` javadoc writes "R = Xt*Y" when the paper and rest of code defines it as "R = X*Yt"
* RMSE in the ALS tests uses a confidence-weighted mean, but the denominator is not actually sum of weights
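For reference, a confidence-weighted RMSE whose denominator really is the sum of the weights might look like the following. The function name and signature are assumptions for illustration, not the test code itself.

```scala
object WeightedRmseSketch {
  // sqrt( sum_i w_i * (predicted_i - expected_i)^2 / sum_i w_i )
  def weightedRmse(predicted: Seq[Double], expected: Seq[Double], weights: Seq[Double]): Double = {
    require(predicted.size == expected.size && expected.size == weights.size,
      "inputs must have equal length")
    val weightedSqErr = predicted.indices.map { i =>
      val err = predicted(i) - expected(i)
      weights(i) * err * err
    }.sum
    math.sqrt(weightedSqErr / weights.sum)
  }
}
```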

Excuse my Scala style; I'm sure it needs tweaks.

Author: Sean Owen <sowen@cloudera.com>

Closes #500 from srowen/ALSNegativeImplicitInput and squashes the following commits:

cf902a9 [Sean Owen] Support negative implicit input in ALS
953be1c [Sean Owen] Make weighted RMSE in ALS test actually weighted; adjust comment about R = X*Yt

9e63f80e

MLLIB-24: url of "Collaborative Filtering for Implicit Feedback Datasets" in ALS is invalid now · f9b7d64a

Chen Chao authored 11 years ago

url of "Collaborative Filtering for Implicit Feedback Datasets" is invalid now. A new url is provided. http://research.yahoo.com/files/HuKorenVolinsky-ICDM08.pdf

Author: Chen Chao <crazyjvm@gmail.com>

Closes #619 from CrazyJvm/master and squashes the following commits:

a0b54e4 [Chen Chao] change url to IEEE
9e0e9f0 [Chen Chao] correct spell mistale
fcfab5d [Chen Chao] wrap line to to fit within 100 chars
590d56e [Chen Chao] url error

f9b7d64a

Feb 19, 2014

[SPARK-1105] fix site scala version error in docs · 7b012c93

CodingCat authored 11 years ago

https://spark-project.atlassian.net/browse/SPARK-1105

fix site scala version error

Author: CodingCat <zhunansjtu@gmail.com>

Closes #618 from CodingCat/doc_version and squashes the following commits:

39bb8aa [CodingCat] more fixes
65bedb0 [CodingCat] fix site scala version error in doc

7b012c93

Feb 18, 2014

SPARK-1106: check key name and identity file before launch a cluster · b61435c7

Xiangrui Meng authored 11 years ago

I launched an EC2 cluster without providing a key name and an identity file. The error showed up after two minutes. It would be good to check those options before launch, given the fact that EC2 billing rounds up to hours.

JIRA: https://spark-project.atlassian.net/browse/SPARK-1106

Author: Xiangrui Meng <meng@databricks.com>

Closes #617 from mengxr/ec2 and squashes the following commits:

2dfb316 [Xiangrui Meng] check key name and identity file before launch a cluster

b61435c7

Revert "[SPARK-1105] fix site scala version error in doc" · d9bb32a7
Patrick Wendell authored 11 years ago
```
This reverts commit d99773d5.
```
d9bb32a7

[SPARK-1105] fix site scala version error in doc · d99773d5

CodingCat authored 11 years ago

https://spark-project.atlassian.net/browse/SPARK-1105

fix site scala version error

Author: CodingCat <zhunansjtu@gmail.com>

Closes #616 from CodingCat/doc_version and squashes the following commits:

eafd99a [CodingCat] fix site scala version error in doc

d99773d5

Optimized imports · ccb327a4

NirmalReddy authored 11 years ago

Optimized imports and arranged according to scala style guide @
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports

Author: NirmalReddy <nirmal.reddy@imaginea.com>
Author: NirmalReddy <nirmal_reddy2000@yahoo.com>

Closes #613 from NirmalReddy/opt-imports and squashes the following commits:

578b4f5 [NirmalReddy] imported java.lang.Double as JDouble
a2cbcc5 [NirmalReddy] addressed the comments
776d664 [NirmalReddy] Optimized imports in core

ccb327a4

Feb 17, 2014

SPARK-1098: Minor cleanup of ClassTag usage in Java API · f74ae0eb

Aaron Davidson authored 11 years ago

Our usage of fake ClassTags in this manner is probably not healthy, but I'm not sure if there's a better solution available, so I just cleaned up and documented the current one.

Author: Aaron Davidson <aaron@databricks.com>

Closes #604 from aarondav/master and squashes the following commits:

b398e89 [Aaron Davidson] SPARK-1098: Minor cleanup of ClassTag usage in Java API

f74ae0eb

[SPARK-1090] improvement on spark_shell (help information, configure memory) · e0d49ad2

CodingCat authored 11 years ago

https://spark-project.atlassian.net/browse/SPARK-1090

spark-shell should print help information about its parameters and should allow the user to configure executor memory.
There is no documentation on how to set --cores/-c in spark-shell.

Users should also be able to set executor memory through command-line options.

In this PR I also check the format of the options passed by the user

Author: CodingCat <zhunansjtu@gmail.com>

Closes #599 from CodingCat/spark_shell_improve and squashes the following commits:

de5aa38 [CodingCat] add parameter to set driver memory
915cbf8 [CodingCat] improvement on spark_shell (help information, configure memory)

e0d49ad2

Fix typos in Spark Streaming programming guide · 767e3ae1

Andrew Or authored 11 years ago

Author: Andrew Or <andrewor14@gmail.com>

Closes #536 from andrewor14/streaming-typos and squashes the following commits:

a05faa6 [Andrew Or] Fix broken link and wording
bc2e4bc [Andrew Or] Merge github.com:apache/incubator-spark into streaming-typos
d5515b4 [Andrew Or] TD's comments
767ef12 [Andrew Or] Fix broken links
8f4c731 [Andrew Or] Fix typos in programming guide

767e3ae1

Worker registration logging fix · c0795cf4

Andrew Ash authored 11 years ago

Author: Andrew Ash <andrew@andrewash.com>

Closes #608 from ash211/patch-7 and squashes the following commits:

bd85f2a [Andrew Ash] Worker registration logging fix

c0795cf4

Feb 16, 2014

Add subtractByKey to the JavaPairRDD wrapper · 5af4477c

Punya Biswal authored 11 years ago

Author: Punya Biswal <pbiswal@palantir.com>

Closes #600 from punya/subtractByKey-java and squashes the following commits:

e961913 [Punya Biswal] Hide implicit ClassTags from Java API
c5d317b [Punya Biswal] Add subtractByKey to the JavaPairRDD wrapper

5af4477c

fix for https://spark-project.atlassian.net/browse/SPARK-1052 · 73cfdcfe

Bijay Bisht authored 11 years ago

Author: Bijay Bisht <bijay.bisht@gmail.com>

Closes #568 from bijaybisht/SPARK-1052 and squashes the following commits:

da70395 [Bijay Bisht] fix for https://spark-project.atlassian.net/browse/SPARK-1052 - comments incorporated
fdb1d94 [Bijay Bisht] fix for https://spark-project.atlassian.net/browse/SPARK-1052



(cherry picked from commit e797c1ab)
Signed-off-by: Aaron Davidson <aaron@databricks.com>

73cfdcfe

[SPARK-1092] print warning information if user use SPARK_MEM to regulate executor memory usage · 1cad3813

CodingCat authored 11 years ago

https://spark-project.atlassian.net/browse/SPARK-1092?jql=project%20%3D%20SPARK

Print a warning if the user sets SPARK_MEM to regulate the memory usage of executors.

----
OUTDATED:

Currently, users usually set SPARK_MEM to control the memory usage of driver programs (in spark-class):

    JAVA_OPTS="$OUR_JAVA_OPTS"
    JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
    JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"

If they didn't set spark.executor.memory, the value of this environment variable also affects the memory usage of executors, because of the following lines in SparkContext:

    private[spark] val executorMemory = conf.getOption("spark.executor.memory")
      .orElse(Option(System.getenv("SPARK_MEM")))
      .map(Utils.memoryStringToMb)
      .getOrElse(512)

Also, since deprecating SPARK_MEM has been proposed in SPARK-929 (https://spark-project.atlassian.net/browse/SPARK-929) and the corresponding PR (https://github.com/apache/incubator-spark/pull/104), we should remove this line.
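The fallback order described above, combined with the warning from the commit title, could be sketched as follows. Everything here is a hypothetical simplification: `resolve`, the warning text, and the reduced `memoryStringToMb` parser are assumptions, and configuration and environment are passed in as plain maps so the logic is easy to exercise.

```scala
object ExecutorMemorySketch {
  // Simplified parser (assumption): accepts only values such as "512m" or "2g".
  def memoryStringToMb(s: String): Int = {
    val lower = s.trim.toLowerCase
    if (lower.endsWith("g")) lower.dropRight(1).toInt * 1024
    else if (lower.endsWith("m")) lower.dropRight(1).toInt
    else throw new IllegalArgumentException(s"unsupported memory string: $s")
  }

  // Returns the executor memory in MB plus an optional warning for the caller to log.
  def resolve(conf: Map[String, String], env: Map[String, String]): (Int, Option[String]) =
    conf.get("spark.executor.memory") match {
      case Some(v) => (memoryStringToMb(v), None)
      case None =>
        env.get("SPARK_MEM") match {
          case Some(v) =>
            (memoryStringToMb(v),
              Some("SPARK_MEM is used for executor memory; set spark.executor.memory instead"))
          case None => (512, None) // default from the snippet above
        }
    }
}
```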

Author: CodingCat <zhunansjtu@gmail.com>

Closes #602 from CodingCat/clean_spark_mem and squashes the following commits:

302bb28 [CodingCat] print warning information if user use SPARK_MEM to regulate executor memory usage

1cad3813

Feb 14, 2014

Typo: Standlone -> Standalone · eec4bd1a

Andrew Ash authored 11 years ago

Author: Andrew Ash <andrew@andrewash.com>

Closes #601 from ash211/typo and squashes the following commits:

9cd43ac [Andrew Ash] Change docs references to metrics.properties, not metrics.conf
3813ff1 [Andrew Ash] Typo: mulitcast -> multicast
873bd2f [Andrew Ash] Typo: Standlone -> Standalone

eec4bd1a

Feb 13, 2014

Merge pull request #598 from shivaram/master. · 2414ed31

Shivaram Venkataraman authored 11 years ago

Update spark_ec2 to use 0.9.0 by default

Backports change from branch-0.9

Author: Shivaram Venkataraman <shivaram@eecs.berkeley.edu>

Closes #598 and squashes the following commits:

f6d3ed0 [Shivaram Venkataraman] Update spark_ec2 to use 0.9.0 by default Backports change from branch-0.9

2414ed31

Add c3 instance types to Spark EC2 · 5fa53c02

Christian Lundgren authored 11 years ago

The number of disks for the c3 instance types taken from here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#StorageOnInstanceTypes



Author: Christian Lundgren <christian.lundgren@gameanalytics.com>

Closes #595 from chrisavl/branch-0.9 and squashes the following commits:

c8af5f9 [Christian Lundgren] Add c3 instance types to Spark EC2
(cherry picked from commit 19b4bb2b)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>

5fa53c02

Ported hadoopClient jar for < 1.0.1 fix · a3bb8617

Bijay Bisht authored 11 years ago


#522 got messed up after I rewrote the branch hadoop_jar_name, so I created a new one.

Author: Bijay Bisht <bijay.bisht@gmail.com>

Closes #584 from bijaybisht/hadoop_jar_name_on_0.9.0 and squashes the following commits:

1b6fb3c [Bijay Bisht] Ported hadoopClient jar for < 1.0.1 fix
(cherry picked from commit 8093de1b)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>

a3bb8617

SPARK-1073 Keep GitHub pull request title as commit summary · 6ee0ad8f

Andrew Ash authored 11 years ago

The first line of a git commit message is the line that's used with many git
tools as the most concise textual description of that message.  The most
common use that I see is in the short log, which is a one line per commit
log of recent commits.

This commit moves the line

  Merge pull request #%s from %s.

Lower into the message to reserve the first line of the resulting commit for
the much more important pull request title.

http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html

Author: Andrew Ash <andrew@andrewash.com>

Closes #574 from ash211/gh-pr-merge-title and squashes the following commits:

b240823 [Andrew Ash] More merge_message improvements
d2986db [Andrew Ash] Keep GitHub pull request title as commit summary

6ee0ad8f

Merge pull request #592 from rxin/test. · 7fe7a55c

Reynold Xin authored 11 years ago

SPARK-1088: Create a script for running tests so we can have version specific testing on Jenkins.

@pwendell

Author: Reynold Xin <rxin@apache.org>

Closes #592 and squashes the following commits:

be02359 [Reynold Xin] SPARK-1088: Create a script for running tests so we can have version specific testing on Jenkins.

7fe7a55c

Feb 12, 2014

Merge pull request #591 from mengxr/transient-new. · 7e29e027

Xiangrui Meng authored 11 years ago

SPARK-1076: [Fix #578] add @transient to some vals

I'll try to be more careful next time.

Author: Xiangrui Meng <meng@databricks.com>

Closes #591 and squashes the following commits:

2b4f044 [Xiangrui Meng] add @transient to prev in ZippedWithIndexRDD add @transient to seed in PartitionwiseSampledRDD

7e29e027

Merge pull request #589 from mengxr/index. · 2bea0709

Xiangrui Meng authored 11 years ago

SPARK-1076: Convert Int to Long to avoid overflow

Patch for PR #578.

Author: Xiangrui Meng <meng@databricks.com>

Closes #589 and squashes the following commits:

98c435e [Xiangrui Meng] cast Int to Long to avoid Int overflow

2bea0709

Merge pull request #578 from mengxr/rank. · e733d655

Xiangrui Meng authored 11 years ago

SPARK-1076: zipWithIndex and zipWithUniqueId to RDD

Assigning ranks to an ordered or unordered data set is a common operation. This could be done by first counting records in each partition and then assigning ranks in parallel.

The purpose of assigning ranks to an unordered set is usually to get a unique id for each item, e.g., to map feature names to feature indices. In such cases, the assignment could be done without counting records, saving one spark job.

https://spark-project.atlassian.net/browse/SPARK-1076

== update ==
Because assigning ranks is very similar to Scala's zipWithIndex, I changed the method name to zipWithIndex and put the index in the value field.
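Both ideas can be sketched on plain collections, with partitions modelled as `Seq[Seq[T]]` instead of an RDD. The unique-id scheme shown (item i of partition k gets i * n + k for n partitions) is an assumption chosen for illustration; the message only states that no counting is needed.

```scala
object RankSketch {
  // Two passes: count records per partition, then offset each partition's local index.
  def zipWithIndex[T](parts: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    // Prefix sums of the partition sizes give the starting rank of every partition.
    val offsets = parts.map(_.size.toLong).scanLeft(0L)(_ + _)
    parts.zip(offsets).map { case (part, start) =>
      part.zipWithIndex.map { case (x, i) => (x, start + i) }
    }
  }

  // One pass, no counting: ids are unique but not consecutive.
  def zipWithUniqueId[T](parts: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    val n = parts.size.toLong
    parts.zipWithIndex.map { case (part, k) =>
      part.zipWithIndex.map { case (x, i) => (x, i * n + k) }
    }
  }
}
```

In a distributed setting the counting pass is a separate job, which is the cost that the unique-id variant avoids.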

Author: Xiangrui Meng <meng@databricks.com>

Closes #578 and squashes the following commits:

52a05e1 [Xiangrui Meng] changed assignRanks to zipWithIndex changed assignUniqueIds to zipWithUniqueId minor updates
756881c [Xiangrui Meng] simplified RankedRDD by implementing assignUniqueIds separately moved couting iterator size to Utils do not count items in the last partition and skip counting if there is only one partition
630868c [Xiangrui Meng] newline
21b434b [Xiangrui Meng] add assignRanks and assignUniqueIds to RDD

e733d655

Merge pull request #583 from colorant/zookeeper. · 68b2c0d0

Raymond Liu authored 11 years ago

Minor fix for ZooKeeperPersistenceEngine to use configured working dir

Author: Raymond Liu <raymond.liu@intel.com>

Closes #583 and squashes the following commits:

91b0609 [Raymond Liu] Minor fix for ZooKeeperPersistenceEngine to use configured working dir

68b2c0d0

Feb 11, 2014

Merge pull request #571 from holdenk/switchtobinarysearch. · b0dab1bb

Holden Karau authored 11 years ago

SPARK-1072 Use binary search when needed in RangePartitioner

Author: Holden Karau <holden@pigscanfly.ca>

Closes #571 and squashes the following commits:

f31a2e1 [Holden Karau] Swith to using CollectionsUtils in Partitioner
4c7a0c3 [Holden Karau] Add CollectionsUtil as suggested by aarondav
7099962 [Holden Karau] Add the binary search to only init once
1bef01d [Holden Karau] CR feedback
a21e097 [Holden Karau] Use binary search if we have more than 1000 elements inside of RangePartitioner
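A minimal sketch of the idea, assuming strictly ascending `Int` range bounds. The object and method names are hypothetical; only the 1000-element threshold comes from the commit list above.

```scala
object RangeLookupSketch {
  // Linear scan: index of the first bound that is >= key, or bounds.length if none.
  def linear(bounds: Array[Int], key: Int): Int = {
    var p = 0
    while (p < bounds.length && key > bounds(p)) p += 1
    p
  }

  // Binary search: a negative result from Arrays.binarySearch encodes the insertion point.
  def binary(bounds: Array[Int], key: Int): Int = {
    val i = java.util.Arrays.binarySearch(bounds, key)
    if (i >= 0) i else -i - 1
  }

  // Small arrays keep the simple scan; larger ones switch to binary search.
  def partitionFor(bounds: Array[Int], key: Int, threshold: Int = 1000): Int =
    if (bounds.length > threshold) binary(bounds, key) else linear(bounds, key)
}
```

Both paths return the same partition index for sorted, distinct bounds, so the switch only affects lookup cost.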

b0dab1bb

Merge pull request #577 from hsaputra/fix_simple_streaming_doc. · ba38d989

Henry Saputra authored 11 years ago

SPARK-1075 Fix doc in the Spark Streaming custom receiver closing bracket in the class constructor

The closing parenthesis in the constructor in the first code block example is reversed:

    diff --git a/docs/streaming-custom-receivers.md b/docs/streaming-custom-receivers.md
    index 4e27d65..3fb540c 100644
    --- a/docs/streaming-custom-receivers.md
    +++ b/docs/streaming-custom-receivers.md
    @@ -14,7 +14,7 @@ This starts with implementing NetworkReceiver(api/streaming/index.html#org.apa
     The following is a simple socket text-stream receiver.
     {% highlight scala %}
    - class SocketTextStreamReceiver(host: String, port: Int(
    + class SocketTextStreamReceiver(host: String, port: Int)
     extends NetworkReceiver[String]
     {
     protected lazy val blocksGenerator: BlockGenerator =

Author: Henry Saputra <henry@platfora.com>

Closes #577 and squashes the following commits:

6508341 [Henry Saputra] SPARK-1075 Fix doc in the Spark Streaming custom receiver.

ba38d989

Merge pull request #579 from CrazyJvm/patch-1. · 4afe6ccf

Chen Chao authored 11 years ago

"in the source DStream" rather than "int the source DStream"

"flatMap is a one-to-many DStream operation that creates a new DStream by generating multiple new records from each record int the source DStream."

Author: Chen Chao <crazyjvm@gmail.com>

Closes #579 and squashes the following commits:

4abcae3 [Chen Chao] in the source DStream

4afe6ccf

Feb 10, 2014

Revert "Merge pull request #560 from pwendell/logging. Closes #560." · d6a9bdc0
Patrick Wendell authored 11 years ago
```
This reverts commit b6d40b78.
```
d6a9bdc0

Merge pull request #567 from ScrapCodes/style2. · 919bd7f6

Prashant Sharma authored 11 years ago

SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Pt 2

Continuation of PR #557

With this all scala style errors are fixed across the code base !!

The reason for creating a separate PR was to not interrupt an already reviewed and ready to merge PR. Hope this gets reviewed soon and merged too.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #567 and squashes the following commits:

3b1ec30 [Prashant Sharma] scala style fixes

919bd7f6

Feb 09, 2014

Merge pull request #566 from martinjaggi/copy-MLlib-d. · 2182aa3c

Martin Jaggi authored 11 years ago

new MLlib documentation for optimization, regression and classification

new documentation with tex formulas, hopefully improving usability and reproducibility of the offered MLlib methods.
also did some minor changes in the code for consistency. scala tests pass.

this is the rebased branch, i deleted the old PR

jira:
https://spark-project.atlassian.net/browse/MLLIB-19

Author: Martin Jaggi <m.jaggi@gmail.com>

Closes #566 and squashes the following commits:

5f0f31e [Martin Jaggi] line wrap at 100 chars
4e094fb [Martin Jaggi] better description of GradientDescent
1d6965d [Martin Jaggi] remove broken url
ea569c3 [Martin Jaggi] telling what updater actually does
964732b [Martin Jaggi] lambda R() in documentation
a6c6228 [Martin Jaggi] better comments in SGD code for regression
b32224a [Martin Jaggi] new optimization documentation
d5dfef7 [Martin Jaggi] new classification and regression documentation
b07ead6 [Martin Jaggi] correct scaling for MSE loss
ba6158c [Martin Jaggi] use d for the number of features
bab2ed2 [Martin Jaggi] renaming LeastSquaresGradient

2182aa3c

Merge pull request #551 from qqsun8819/json-protocol. · afc8f3cb

qqsun8819 authored 11 years ago

[SPARK-1038] Add more fields in JsonProtocol and add tests that verify the JSON itself

This is a PR for SPARK-1038. Two major changes:
1 Add some fields to JsonProtocol that are new and important to standalone-related data structures
2 Use Diff in liftweb.json to verify the stringified JSON output, to detect when someone changes a type T to Option[T]

Author: qqsun8819 <jin.oyj@alibaba-inc.com>

Closes #551 and squashes the following commits:

fdf0b4e [qqsun8819] [SPARK-1038] 1. Change code style for more readable according to rxin review 2. change submitdate hard-coded string to a date object toString for more complexiblity
095a26f [qqsun8819] [SPARK-1038] mod according to  review of pwendel, use hard-coded json string for json data validation. Each test use its own json string
0524e41 [qqsun8819] Merge remote-tracking branch 'upstream/master' into json-protocol
d203d5c [qqsun8819] [SPARK-1038] Add more fields in JsonProtocol and add tests that verify the JSON itself

afc8f3cb

Merge pull request #569 from pwendell/merge-fixes. · 94ccf869

Patrick Wendell authored 11 years ago

Fixes bug where merges won't close associated pull request.

Previously we added "Closes #XX" in the title. Github will sometimes
line-break the title in a way that causes this to not work. This patch
instead adds the line in the body.

This also makes the commit format more concise for merge commits.
We might consider just dropping those in the future.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #569 and squashes the following commits:

732eba1 [Patrick Wendell] Fixes bug where merges won't close associated pull request.

94ccf869

Merge pull request #557 from ScrapCodes/style. Closes #557. · b69f8b2a

Patrick Wendell authored 11 years ago

SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build.

Author: Patrick Wendell <pwendell@gmail.com>
Author: Prashant Sharma <scrapcodes@gmail.com>

== Merge branch commits ==

commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4
Author: Prashant Sharma <scrapcodes@gmail.com>
Date:   Sun Feb 9 17:39:07 2014 +0530

    scala style fixes

commit f91709887a8e0b608c5c2b282db19b8a44d53a43
Author: Patrick Wendell <pwendell@gmail.com>
Date:   Fri Jan 24 11:22:53 2014 -0800

    Adding scalastyle snapshot

b69f8b2a

Merge pull request #556 from CodingCat/JettyUtil. Closes #556. · b6dba10a

CodingCat authored 11 years ago

[SPARK-1060] startJettyServer should explicitly use IP information

https://spark-project.atlassian.net/browse/SPARK-1060

In the current implementation, the webserver in Master/Worker is started with

val (srv, bPort) = JettyUtils.startJettyServer("0.0.0.0", port, handlers)

inside startJettyServer:

val server = new Server(currentPort) //here, the Server will take "0.0.0.0" as the hostname, i.e. will always bind to the IP address of the first NIC

This can cause wrong IP binding: e.g., if the host has two NICs, N1 and N2, and the user specifies SPARK_LOCAL_IP as N2's IP address, then when the web server starts, for the reason stated above, it will always bind to N1's address.
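Binding to an explicit host can be illustrated with the JDK's `java.net.ServerSocket` rather than Jetty. This is an analogue only: `BindSketch` and `boundAddress` are hypothetical names, and the Jetty-side change is not shown here.

```scala
import java.net.{InetSocketAddress, ServerSocket}

object BindSketch {
  // Binds a throwaway server socket to the given host on an ephemeral port (0)
  // and returns the address it actually bound to.
  def boundAddress(host: String): String = {
    val socket = new ServerSocket()
    try {
      // Passing host and port together makes the bind address explicit,
      // instead of leaving the choice to a port-only constructor.
      socket.bind(new InetSocketAddress(host, 0))
      socket.getInetAddress.getHostAddress
    } finally socket.close()
  }
}
```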

Author: CodingCat <zhunansjtu@gmail.com>

== Merge branch commits ==

commit 6c6d9a8ccc9ec4590678a3b34cb03df19092029d
Author: CodingCat <zhunansjtu@gmail.com>
Date:   Thu Feb 6 14:53:34 2014 -0500

    startJettyServer should explicitly use IP information

b6dba10a