- Dec 27, 2014
-
Brennon York authored
Creates a top-level script (as `build/mvn`) that automatically downloads zinc and the specific version of Scala needed to build Spark. It also downloads and installs Maven if the user doesn't already have it, and all packages are hosted under the `build/` directory. Tested on both Linux and OS X. All commands pass through to the Maven binary, so it acts exactly as a traditional Maven call would.

Author: Brennon York <brennon.york@capitalone.com>

Closes #3707 from brennonyork/SPARK-4501 and squashes the following commits:

0e5a0e4 [Brennon York] minor incorrect doc verbage (with -> this)
9b79e38 [Brennon York] fixed merge conflicts with dev/run-tests, properly quoted args in sbt/sbt, fixed bug where relative paths would fail if passed in from build/mvn
d2d41b6 [Brennon York] added blurb about leverging zinc with build/mvn
b979c58 [Brennon York] updated the merge conflict
c5634de [Brennon York] updated documentation to overview build/mvn, updated all points where sbt/sbt was referenced with build/sbt
b8437ba [Brennon York] set progress bars for curl and wget when not run on jenkins, no progress bar when run on jenkins, moved sbt script to build/sbt, wrote stub and warning under sbt/sbt which calls build/sbt, modified build/sbt to use the correct directory, fixed bug in build/sbt-launch-lib.bash to correctly pull the sbt version
be11317 [Brennon York] added switch to silence download progress only if AMPLAB_JENKINS is set
28d0a99 [Brennon York] updated to remove the python dependency, uses grep instead
7e785a6 [Brennon York] added silent and quiet flags to curl and wget respectively, added single echo output to denote start of a download if download is needed
14a5da0 [Brennon York] removed unnecessary zinc output on startup
1af4a94 [Brennon York] fixed bug with uppercase vs lowercase variable
3e8b9b3 [Brennon York] updated to properly only restart zinc if it was freshly installed
a680d12 [Brennon York] Added comments to functions and tested various mvn calls
bb8cc9d [Brennon York] removed package files
ef017e6 [Brennon York] removed OS complexities, setup generic install_app call, removed extra file complexities, removed help, removed forced install (defaults now), removed double-dash from cli
07bf018 [Brennon York] Updated to specifically handle pulling down the correct scala version
f914dea [Brennon York] Beginning final portions of localized scala home
69c4e44 [Brennon York] working linux and osx installers for purely local mvn build
4a1609c [Brennon York] finalizing working linux install for maven to local ./build/apache-maven folder
cbfcc68 [Brennon York] Changed the default sbt/sbt to build/sbt and added a build/mvn which will automatically download, install, and execute maven with zinc for easier build capability
-
GuoQiang Li authored
Author: GuoQiang Li <witgo@qq.com>

Closes #3788 from witgo/SPARK-4952 and squashes the following commits:

d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
-
Zhang, Liye authored
The master and worker Spark versions may not be the same as the driver's Spark version, because the Spark jar file might be replaced for a new application without restarting the cluster. The Spark version should therefore be logged in both the Master and Worker logs.

Author: Zhang, Liye <liye.zhang@intel.com>

Closes #3790 from liyezhang556520/version4Standalone and squashes the following commits:

e05e1e3 [Zhang, Liye] add spark version infomation in log for standalone mode
-
Jongyoul Lee authored
...ore-asl

- set the same version for jackson-mapper-asl and jackson-core-asl
- related to #2818
- coded the same patch from the latest master

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #3716 from jongyoul/SPARK-3955 and squashes the following commits:

efa29aa [Jongyoul Lee] [SPARK-3955] Different versions between jackson-mapper-asl and jackson-core-asl - set the same version to jackson-mapper-asl and jackson-core-asl
-
Patrick Wendell authored
Meant to merge this in when committing SPARK-3787.
-
Kousuke Saruta authored
This PR is another solution for the assembly jar name. When we build with sbt with a Hadoop profile but without a property for the Hadoop version, like `sbt/sbt -Phadoop-2.2 assembly`, the jar name always uses the default version (1.0.4). When we build with Maven under the same conditions, the default version for the chosen profile is used instead. For instance, if we build like `mvn -Phadoop-2.2 package`, the jar name uses 2.2.0, the default version of the hadoop-2.2 profile.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3046 from sarutak/fix-assembly-jarname-2 and squashes the following commits:

41ef90e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname-2
50c8676 [Kousuke Saruta] Merge branch 'fix-assembly-jarname-2' of github.com:sarutak/spark into fix-assembly-jarname-2
52a1cd2 [Kousuke Saruta] Fixed comflicts
dd30768 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname2
f1c90bb [Kousuke Saruta] Fixed SparkBuild.scala in order to read `hadoop.version` property from pom.xml
af6b100 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
c81806b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
ad1f96e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
b2318eb [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
5fc1259 [Kousuke Saruta] Fixed typo.
eebbb7d [Kousuke Saruta] Fixed wrong jar name
-
Patrick Wendell authored
This commit exists to close the following pull requests on Github:

Closes #3456 (close requested by 'pwendell')
Closes #1602 (close requested by 'tdas')
Closes #2633 (close requested by 'tdas')
Closes #2059 (close requested by 'JoshRosen')
Closes #2348 (close requested by 'tdas')
Closes #3662 (close requested by 'tdas')
Closes #2031 (close requested by 'andrewor14')
Closes #265 (close requested by 'JoshRosen')
-
- Dec 26, 2014
-
CodingCat authored
Author: CodingCat <zhunansjtu@gmail.com>

Closes #3807 from CodingCat/new_branch and squashes the following commits:

5167f01 [CodingCat] fix typo in the comment
-
- Dec 25, 2014
-
zsxwing authored
There is only one implicit function, `toPairDStreamFunctions`, in `StreamingContext`. This PR does a reorganization similar to [SPARK-4397](https://issues.apache.org/jira/browse/SPARK-4397). The following code was compiled with Spark Streaming 1.1.0 and run with this PR; everything works fine.

```Scala
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

object StreamingApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FileWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.textFileStream("/some/path")
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Author: zsxwing <zsxwing@gmail.com>

Closes #3464 from zsxwing/SPARK-4608 and squashes the following commits:

aa6d44a [zsxwing] Fix a copy-paste error
f74c190 [zsxwing] Merge branch 'master' into SPARK-4608
e6f9cc9 [zsxwing] Update the docs
27833bb [zsxwing] Remove `import StreamingContext._`
c15162c [zsxwing] Reorganize StreamingContext implicit to improve API convenience
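After this change, the same program also compiles without the `StreamingContext._` import; a small sketch, assuming a build that includes this PR (the implicits are then found via the `DStream` companion object):

```Scala
import org.apache.spark._
import org.apache.spark.streaming._
// Note: no `import org.apache.spark.streaming.StreamingContext._` needed here.

object StreamingAppNoImport {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FileWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))
    val wordCounts = ssc.textFileStream("/some/path")
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)   // resolved via the implicit in object DStream
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```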
-
jerryshao authored
Add `processingDelay`, `schedulingDelay` and `totalDelay` for the last completed batch. Add `lastReceivedBatchRecords` and `totalReceivedBatchRecords` to the received records counting.

Author: jerryshao <saisai.shao@intel.com>

Closes #3466 from jerryshao/SPARK-4537 and squashes the following commits:

00f5f7f [jerryshao] Change the code style and add totalProcessedRecords
44721a6 [jerryshao] Further address the comments
c097ddc [jerryshao] Address the comments
02dd44f [jerryshao] Fix the addressed comments
c7a9376 [jerryshao] Expand StreamingSource to add more metrics
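For context, a hedged sketch of how such a gauge can be registered with the Codahale metrics registry that Spark's metrics system uses; the metric name and the constant value below are illustrative, not the exact code of this patch:

```Scala
import com.codahale.metrics.{Gauge, MetricRegistry}

val registry = new MetricRegistry()

// Register a read-only gauge whose value is recomputed on every poll.
def registerGauge(name: String, value: () => Long): Unit = {
  registry.register(MetricRegistry.name("streaming", name), new Gauge[Long] {
    override def getValue: Long = value()
  })
}

// Illustrative only: in Spark the value would come from a streaming listener.
registerGauge("lastCompletedBatch_processingDelay", () => 0L)
```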
-
Nicholas Chammas authored
Going forward, we'll use matching branch names across the mesos/spark-ec2 and apache/spark repositories, per [the discussion here](https://github.com/mesos/spark-ec2/pull/85#issuecomment-68069589).

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #3804 from nchammas/patch-2 and squashes the following commits:

cd2c0d4 [Nicholas Chammas] [EC2] Update mesos/spark-ec2 branch to branch-1.3
-
Nicholas Chammas authored
Now that 1.2.0 is out, let's update the default Spark version.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #3793 from nchammas/patch-1 and squashes the following commits:

3255832 [Nicholas Chammas] add 1.2.0 version to Spark-Shark map
ec0e904 [Nicholas Chammas] [EC2] Update default Spark version to 1.2.0
-
Denny Lee authored
Corrected the link to the Building Spark with Maven page from its original (http://spark.apache.org/docs/latest/building-with-maven.html) to the current page (http://spark.apache.org/docs/latest/building-spark.html).

Author: Denny Lee <denny.g.lee@gmail.com>

Closes #3802 from dennyglee/patch-1 and squashes the following commits:

15f601a [Denny Lee] Update README.md
-
Kousuke Saruta authored
In the section "Specifying the Hadoop Version" of building-spark.md, there is a description of building with YARN against Hadoop 0.23. Spark 1.3.0 will not support Hadoop 0.23, so we should fix the description.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3787 from sarutak/SPARK-4953 and squashes the following commits:

ee9c355 [Kousuke Saruta] Removed description related to a specific vendor
9ab0c24 [Kousuke Saruta] Fix the description about building SPARK with YARN
-
- Dec 24, 2014
-
zsxwing authored
[SPARK-4873][Streaming] Use `Future.zip` instead of `Future.flatMap` (for-loop) in WriteAheadLogBasedBlockHandler. `zip` implies the two Futures will run concurrently, while `flatMap` usually means one Future depends on the other.

Author: zsxwing <zsxwing@gmail.com>

Closes #3721 from zsxwing/SPARK-4873 and squashes the following commits:

46a2cd9 [zsxwing] Use Future.zip instead of Future.flatMap(for-loop)
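A minimal sketch of the difference (not the actual WriteAheadLogBasedBlockHandler code; `slowTask` is a stand-in for storing the block and writing the log):

```Scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def slowTask(name: String): String = { Thread.sleep(1000); name }

// Sequential: the second Future is not even created until the first one
// completes, so this takes ~2 seconds.
val sequential: Future[(String, String)] = for {
  a <- Future(slowTask("store block"))
  b <- Future(slowTask("write log"))
} yield (a, b)

// Concurrent: both Futures start immediately; `zip` just pairs the results,
// so this takes ~1 second.
val storeFuture = Future(slowTask("store block"))
val logFuture = Future(slowTask("write log"))
val concurrent: Future[(String, String)] = storeFuture.zip(logFuture)

println(Await.result(concurrent, 5.seconds))
```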
-
Sean Owen authored
There are a number of warnings generated in a normal, successful build right now. They're mostly Java unchecked cast warnings, which can be suppressed. But there's a grab bag of other Scala language warnings and so on that can all be easily fixed. This PR fixes about 90% of the build warnings I see now.

Author: Sean Owen <sowen@cloudera.com>

Closes #3157 from srowen/SPARK-4297 and squashes the following commits:

8c9e469 [Sean Owen] Suppress unchecked cast warnings, and several other build warning fixes
-
- Dec 23, 2014
-
Kousuke Saruta authored
It's really a minor issue. In ApplicationMaster, there is code like the following:

val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean

I think the code can be simplified like this (shown side by side in the sketch below):

val preserveFiles = sparkConf.getBoolean("spark.yarn.preserve.staging.files", false)

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3733 from sarutak/SPARK-4881 and squashes the following commits:

1771430 [Kousuke Saruta] Modified the code like sparkConf.get(...).toBoolean to sparkConf.getBoolean(...)
c63daa0 [Kousuke Saruta] Simplified code
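The change side by side, taken directly from the description above (assuming a `SparkConf` instance is in scope):

```Scala
import org.apache.spark.SparkConf

val sparkConf = new SparkConf()

// Before: fetch the value as a String, then convert it.
val before = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean

// After: use the typed accessor directly.
val after = sparkConf.getBoolean("spark.yarn.preserve.staging.files", false)
```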
-
jbencook authored
This PR modifies the python `SchemaRDD` to use `sample()` and `takeSample()` from Scala instead of the slower python implementations from `rdd.py`. This is worthwhile because the `Row`s are already serialized as Java objects. In order to use the faster `takeSample()`, a `takeSampleToPython()` method was implemented in `SchemaRDD.scala` following the pattern of `collectToPython()`.

Author: jbencook <jbenjamincook@gmail.com>
Author: J. Benjamin Cook <jbenjamincook@gmail.com>

Closes #3764 from jbencook/master and squashes the following commits:

6fbc769 [J. Benjamin Cook] [SPARK-4860][pyspark][sql] fixing sloppy indentation for takeSampleToPython() arguments
5170da2 [J. Benjamin Cook] [SPARK-4860][pyspark][sql] fixing typo: from RDD to SchemaRDD
de22f70 [jbencook] [SPARK-4860][pyspark][sql] using sample() method from JavaSchemaRDD
b916442 [jbencook] [SPARK-4860][pyspark][sql] adding sample() to JavaSchemaRDD
020cbdf [jbencook] [SPARK-4860][pyspark][sql] using Scala implementations of `sample()` and `takeSample()`
-
Marcelo Vanzin authored
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3460 from vanzin/SPARK-4606 and squashes the following commits:

031207d [Marcelo Vanzin] [SPARK-4606] Send EOF to child JVM when there's no more data to read.
-
jerryshao authored
Currently a streaming block is replicated when a replicated storage level is set. Since the write-ahead log is already fault tolerant, replication is needless and will hurt the throughput of the streaming application.

Hi tdas, as per the discussion about this issue, I fixed it with this implementation; I'm not sure whether this is the way you want, would you mind taking a look at it? Thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes #3534 from jerryshao/SPARK-4671 and squashes the following commits:

500b456 [jerryshao] Do not replicate streaming block when WAL is enabled
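A sketch of the idea (not the actual receiver code in this patch): when the WAL already provides fault tolerance, the replication factor of the requested storage level can be dropped to 1.

```Scala
import org.apache.spark.storage.StorageLevel

def effectiveStorageLevel(requested: StorageLevel, walEnabled: Boolean): StorageLevel =
  if (walEnabled && requested.replication > 1) {
    // Keep every other property of the level, but store a single replica.
    StorageLevel(requested.useDisk, requested.useMemory, requested.useOffHeap,
      requested.deserialized, 1)
  } else {
    requested
  }

// e.g. MEMORY_AND_DISK_SER_2 becomes a single-replica level when the WAL is on.
val level = effectiveStorageLevel(StorageLevel.MEMORY_AND_DISK_SER_2, walEnabled = true)
```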
-
Ilayaperumal Gopinathan authored
Once the streaming receiver is de-registered at the executor, the `ReceiverTrackerActor` needs to remove the corresponding receiverInfo entry from the `receiverInfo` map at `ReceiverTracker`.

Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io>

Closes #3647 from ilayaperumalg/receiverInfo-RTracker and squashes the following commits:

6eb97d5 [Ilayaperumal Gopinathan] Polishing based on the review
3640c86 [Ilayaperumal Gopinathan] Remove receiverInfo once receiver is de-registered
-
Liang-Chi Hsieh authored
SPARK-2261 uses a single file to log events for an app, so `eventLogDir` in `ApplicationDescription` is replaced with `eventLogFile`. However, `ApplicationDescription` in `SparkDeploySchedulerBackend` is initialized with `SparkContext`'s `eventLogDir`, which is just the log directory, not the actual log file path, so `Master.rebuildSparkUI` cannot correctly rebuild a new SparkUI for the app. Because the `ApplicationDescription` is remotely registered with `Master` and the app's id is then generated in `Master`, we cannot get the app id in advance of registration. So the received description needs to be modified with the correct `eventLogFile` value.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #3755 from viirya/fix_app_logdir and squashes the following commits:

5e0ea35 [Liang-Chi Hsieh] Revision for comment.
b5730a1 [Liang-Chi Hsieh] Fix incorrect event log path.

Closes #3777 (a duplicate PR for the same JIRA)
-
Andrew Or authored
See https://issues.apache.org/jira/browse/SPARK-4730.

Author: Andrew Or <andrew@databricks.com>

Closes #3590 from andrewor14/yarn-settings and squashes the following commits:

36e0753 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-settings
dcd1316 [Andrew Or] Warn against deprecated YARN settings
-
Cheng Lian authored
This PR tries to fix the Hive test failures encountered in PR #3157 by cleaning `lib_managed` before building the assembly jar against Hive 0.13.1 in `dev/run-tests`. Otherwise two sets of datanucleus jars would be left in `lib_managed` and may mess up class paths while executing Hive test suites. Please refer to [this thread] [1] for details. A clean build would be even safer, but we only clean `lib_managed` here to save build time. This PR also takes the chance to clean up some minor typos and formatting issues in the comments.

[1]: https://github.com/apache/spark/pull/3157#issuecomment-67656488

Author: Cheng Lian <lian@databricks.com>

Closes #3756 from liancheng/clean-lib-managed and squashes the following commits:

e2bd21d [Cheng Lian] Adds lib_managed to clean set
c9f2f3e [Cheng Lian] Cleans lib_managed before compiling with Hive 0.13.1
-
Takeshi Yamamuro authored
Trivial modifications for usability.

Author: Takeshi Yamamuro <linguin.m.s@gmail.com>

Closes #3775 from maropu/AddHelpCommentInAnalytics and squashes the following commits:

fbea8f5 [Takeshi Yamamuro] Add help comments in Analytics
-
Marcelo Vanzin authored
Commit 7aacb7bf added support for sharing downloaded files among multiple executors of the same app. That works great in Yarn, since the app's directory is cleaned up after the app is done. But Spark standalone mode didn't do that, so the lock/cache files created by that change were left around and could eventually fill up the disk hosting /tmp.

To solve that, create app-specific directories under the local dirs when launching executors. Multiple executors launched by the same Worker will use the same app directories, so they should be able to share the downloaded files. When the application finishes, a new message is sent to all workers telling them the application has finished; once that message has been received, and all executors registered for the application have shut down, those directories will be cleaned up by the Worker.

Note: unit testing this is hard (if even possible), since local-cluster mode doesn't seem to leave the Master/Worker daemons running long enough after `sc.stop()` is called for the clean up protocol to take effect.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3705 from vanzin/SPARK-4834 and squashes the following commits:

b430534 [Marcelo Vanzin] Remove seemingly unnecessary synchronization.
50eb4b9 [Marcelo Vanzin] Review feedback.
c0e5ea5 [Marcelo Vanzin] [SPARK-4834] [standalone] Clean up application files after app finishes.
-
zsxwing authored
Currently, the format of the log4j section in running-on-yarn.md is a bit messy.




Author: zsxwing <zsxwing@gmail.com>

Closes #3774 from zsxwing/SPARK-4931 and squashes the following commits:

4a5f853 [zsxwing] Fix the format of running-on-yarn.md
-
Nicholas Chammas authored
PR #3737 changed `spark-ec2` to automatically download boto from PyPI. This PR tells git to ignore those downloaded library files.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #3770 from nchammas/ignore-ec2-lib and squashes the following commits:

5c440d3 [Nicholas Chammas] gitignore downloaded EC2 libs
-
Nicholas Chammas authored
Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #3772 from nchammas/patch-1 and squashes the following commits:

b7d9083 [Nicholas Chammas] [Docs] Minor typo fixes
-
- Dec 22, 2014
-
DB Tsai authored
In most academic papers and algorithm implementations, people use L = 1/(2n) ||A weights - y||^2 instead of L = 1/n ||A weights - y||^2 for the least-squares loss. See Eq. (1) in http://web.stanford.edu/~hastie/Papers/glmnet.pdf. Since MLlib used a different convention, this resulted in different residuals, and all the stats properties differed from the GLMNET package in R. The model coefficients are still the same under this change.

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #3746 from dbtsai/lir and squashes the following commits:

19c2e85 [DB Tsai] make stepsize twice to converge to the same solution
0b2c29c [DB Tsai] first commit
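For reference, the two conventions and their gradients, with $w$ the weights, $A$ the design matrix, $y$ the targets, and $n$ the number of samples:

```latex
% GLMNET-style convention (adopted by this change):
L(w) = \frac{1}{2n} \lVert A w - y \rVert^2,
\qquad
\nabla L(w) = \frac{1}{n} A^{\top} (A w - y)

% Previous MLlib convention:
L(w) = \frac{1}{n} \lVert A w - y \rVert^2,
\qquad
\nabla L(w) = \frac{2}{n} A^{\top} (A w - y)
```

The gradient of the 1/n loss is exactly twice that of the 1/(2n) loss, which is why the first commit above doubles the step size to converge to the same solution.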
-
zsxwing authored
In Scala, `map` and `flatMap` of `Iterable` will copy the contents of the `Iterable` to a new `Seq`. For example:

```Scala
val iterable = Seq(1, 2, 3).map(v => {
  println(v)
  v
})
println("Iterable map done")

val iterator = Seq(1, 2, 3).iterator.map(v => {
  println(v)
  v
})
println("Iterator map done")
```

outputs

```
1
2
3
Iterable map done
Iterator map done
```

(The `Iterable` version evaluates eagerly, while the `Iterator` version prints nothing until consumed.) So we should use 'iterator' to reduce the memory consumed by join. Found by Johannes Simon in http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3C5BE70814-9D03-4F61-AE2C-0D63F2DE4446%40mail.de%3E

Author: zsxwing <zsxwing@gmail.com>

Closes #3671 from zsxwing/SPARK-4824 and squashes the following commits:

48ee7b9 [zsxwing] Remove the explicit types
95d59d6 [zsxwing] Add 'iterator' to reduce memory consumed by join
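The same distinction applied to the join pattern; a minimal sketch of the idea, not the exact PairRDDFunctions code:

```Scala
val vs = Seq(1, 2, 3)
val ws = Seq("a", "b")

// Eager: Iterable#flatMap materializes the whole cross product as a Seq.
val eager: Seq[(Int, String)] = for (v <- vs; w <- ws) yield (v, w)

// Lazy: iterators produce each pair on demand, so the full product is
// never held in memory at once.
val lazyPairs: Iterator[(Int, String)] =
  for (v <- vs.iterator; w <- ws.iterator) yield (v, w)

println(eager.size)        // 6
println(lazyPairs.length)  // 6 (consumes the iterator)
```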
-
genmao.ygm authored
It is not convenient to see the Spark version in the UI. We can keep the same style as the Spark website.



Author: genmao.ygm <genmao.ygm@alibaba-inc.com>

Closes #3763 from uncleGen/master-clean-141222 and squashes the following commits:

0dcb9a9 [genmao.ygm] [SPARK-4920][UI]:current spark version in UI is not striking.
-
Liang-Chi Hsieh authored
Minor fix for an obvious scala doc error.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #3751 from viirya/fix_scaladoc and squashes the following commits:

03fddaa [Liang-Chi Hsieh] Fix scala doc.
-
Aaron Davidson authored
Author: Aaron Davidson <aaron@databricks.com>

Closes #3713 from aarondav/netty-configs and squashes the following commits:

8a8b373 [Aaron Davidson] Address Patrick's comments
3b1f84e [Aaron Davidson] [SPARK-4864] Add documentation to Netty-based configs
-
Kostas Sakellis authored
This commit consolidates some of the exceptions thrown when compression codecs are not available. If a bad configuration string was passed in, a ClassNotFoundException was thrown. Also, if Snappy was not available, an InvocationTargetException was thrown when the codec was being used (not when it was being initialized). Now, an IllegalArgumentException is thrown when a codec is not available at creation time, either because the class does not exist or because the codec itself is not available on the system. This allows us to give a better message and fail faster.

Author: Kostas Sakellis <kostas@cloudera.com>

Closes #3119 from ksakellis/kostas-spark-4079 and squashes the following commits:

9709c7c [Kostas Sakellis] Removed unnecessary Logging class
63bfdd0 [Kostas Sakellis] Removed isAvailable to preserve binary compatibility
1d0ef2f [Kostas Sakellis] [SPARK-4079] [CORE] Added more information to exception
64f3d27 [Kostas Sakellis] [SPARK-4079] [CORE] Code review feedback
52dfa8f [Kostas Sakellis] [SPARK-4079] [CORE] Default to LZF if Snappy not available
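A hypothetical sketch of the fail-fast pattern described above (not the actual CompressionCodec factory): resolve and instantiate the codec class at creation time, and turn any failure (missing class or unavailable native library) into an IllegalArgumentException with a clear message.

```Scala
def createCodec(codecClassName: String): AnyRef =
  try {
    // Fails here, at creation time, rather than later at first use.
    Class.forName(codecClassName).getConstructor().newInstance().asInstanceOf[AnyRef]
  } catch {
    case e: Exception =>
      throw new IllegalArgumentException(
        s"Codec [$codecClassName] is not available. " +
          "Consider setting spark.io.compression.codec to a different codec.", e)
  }
```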
-
Sandy Ryza authored
Author: Sandy Ryza <sandy@cloudera.com>

Closes #3652 from sryza/sandy-spark-4447 and squashes the following commits:

2791158 [Sandy Ryza] Review feedback
c23507b [Sandy Ryza] Strip margin from client arguments help string
18be7ba [Sandy Ryza] SPARK-4447
-
Takeshi Yamamuro authored
Add missing Javadoc comments in ShuffleDependency.

Author: Takeshi Yamamuro <linguin.m.s@gmail.com>

Closes #3594 from maropu/DependencyJavadocFix and squashes the following commits:

32129b4 [Takeshi Yamamuro] Fix comments in @aggregator and @mapSideCombine
303c75d [Takeshi Yamamuro] [SPARK-4733] Add missing prameter comments in ShuffleDependency
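A sketch of the kind of `@param` doc this adds (illustrative wording and a stand-in class, not the exact text of the patch):

```Scala
/**
 * Represents a dependency on the output of a shuffle stage.
 *
 * @param aggregator map/reduce-side aggregator for the RDD's shuffle
 * @param mapSideCombine whether to perform partial aggregation
 *                       (also known as map-side combine)
 */
class ShuffleDependencyDocSketch(
    val aggregator: Option[String],  // stand-in type, for illustration only
    val mapSideCombine: Boolean)
```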
-
carlmartin authored
Use `val arr1 = (0 until num).toArray` instead of `val arr1 = new Array[Int](num); for (i <- 0 until arr1.length) { arr1(i) = i }`, for brevity (expanded in the sketch below).

Author: carlmartin <carlmartinmax@gmail.com>

Closes #3750 from SaintBacchus/BroadcastTest and squashes the following commits:

43adb70 [carlmartin] Improve some code in BroadcastTest for short
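The before/after from the description, expanded so the snippet runs standalone (`num` is given a sample value here):

```Scala
// Before: allocate the array, then fill it with an explicit loop.
val num = 1000000  // sample size, for illustration
val arr1 = new Array[Int](num)
for (i <- 0 until arr1.length) {
  arr1(i) = i
}

// After: build the same array in a single expression.
val arr2 = (0 until num).toArray

assert(arr1.sameElements(arr2))
```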
-
zsxwing authored
Author: zsxwing <zsxwing@gmail.com>

Closes #3734 from zsxwing/SPARK-4883 and squashes the following commits:

e6f2b61 [zsxwing] Fix the name
cc74727 [zsxwing] Add a name to the directoryCleaner thread
-
Zhang, Liye authored
Author: Zhang, Liye <liye.zhang@intel.com>

Closes #3717 from liyezhang556520/version2Log and squashes the following commits:

ccd30d7 [Zhang, Liye] delete log in sparkConf
330f70c [Zhang, Liye] move the log from SaprkConf to SparkContext
96dc115 [Zhang, Liye] remove curly brace
e833330 [Zhang, Liye] add spark version to driver log
-