Commits · e772b4e4e1b790199dd000bd096a8917cb8def24 · cs525-sp18-g07 / spark

Feb 06, 2015

SPARK-5403: Ignore UserKnownHostsFile in SSH calls · e772b4e4

Grzegorz Dubicki authored 10 years ago

See https://issues.apache.org/jira/browse/SPARK-5403

Author: Grzegorz Dubicki <grzegorz.dubicki@gmail.com>

Closes #4196 from grzegorz-dubicki/SPARK-5403 and squashes the following commits:

a7d863f [Grzegorz Dubicki] Resolve start command hanging issue

e772b4e4

[SPARK-5601][MLLIB] make streaming linear algorithms Java-friendly · 0e23ca9f

Xiangrui Meng authored 10 years ago

Overload `trainOn`, `predictOn`, and `predictOnValues`.

CC freeman-lab

Author: Xiangrui Meng <meng@databricks.com>

Closes #4432 from mengxr/streaming-java and squashes the following commits:

6a79b85 [Xiangrui Meng] add java test for streaming logistic regression
2d7b357 [Xiangrui Meng] organize imports
1f662b3 [Xiangrui Meng] make streaming linear algorithms Java-friendly

0e23ca9f

[SQL] [Minor] HiveParquetSuite was disabled by mistake, re-enable them · c4021401

Cheng Lian authored 10 years ago

Author: Cheng Lian <lian@databricks.com>

Closes #4440 from liancheng/parquet-oops and squashes the following commits:

f21ede4 [Cheng Lian] HiveParquetSuite was disabled by mistake, re-enable them.

c4021401

[SQL] Use TestSQLContext in Java tests · 76c4bf59

Michael Armbrust authored 10 years ago

Sometimes tests were failing due to the creation of multiple `SparkContext`s in a single JVM.

Author: Michael Armbrust <michael@databricks.com>

Closes #4441 from marmbrus/javaTests and squashes the following commits:

657b1e0 [Michael Armbrust] [SQL] Use TestSQLContext in Java tests

76c4bf59

[SPARK-4994][network]Cleanup removed executors' ShuffleInfo in yarn shuffle service · 61073f83

lianhuiwang authored 10 years ago

When an application completes, YARN's NodeManager can remove the application's local dirs, but the metadata for all executors of the completed application is not removed. As a result, the YARN ShuffleService needs much more memory to store executors' ShuffleInfo, so this metadata needs to be removed.

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #3828 from lianhuiwang/SPARK-4994 and squashes the following commits:

f3ba1d2 [lianhuiwang] Cleanup removed executors' ShuffleInfo

61073f83

[SPARK-5444][Network]Add a retry to deal with the conflict port in netty server. · 2bda1c1d

huangzhaowei authored 10 years ago

If `spark.blockManager.port` conflicts with a port that is already in use, Spark throws an exception and exits.
This adds a retry to avoid that situation.
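
The retry idea can be sketched as follows. This is a minimal illustration in Python, not the code from this patch: the function name `bind_with_retry`, the retry limit, and the "try the next port" policy are all assumptions made for the example.

```python
import socket


def bind_with_retry(start_port, max_retries=16, host="127.0.0.1"):
    """Try start_port, then start_port + 1, ... until a bind succeeds.

    Returns the bound socket and the port it ended up on. The retry limit
    and the "next port" policy are assumptions made for this sketch.
    """
    last_error = None
    for offset in range(max_retries + 1):
        port = start_port + offset
        if port > 65535:  # ran out of valid TCP ports
            break
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError as err:  # typically "Address already in use"
            sock.close()
            last_error = err
    raise OSError(
        "no free port found starting at %d: %s" % (start_port, last_error)
    )
```

Without the loop, the first failed `bind` would be fatal, which is the behaviour described above.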

Author: huangzhaowei <carlmartinmax@gmail.com>

Closes #4240 from SaintBacchus/NettyPortConflict and squashes the following commits:

cc926d2 [huangzhaowei] Add a retry to deal with the conflict port in netty server.

2bda1c1d

[SPARK-4874] [CORE] Collect record count metrics · dcd1e42d

Kostas Sakellis authored 10 years ago

Collects record counts for both Input/Output and Shuffle Metrics. For the input/output metrics, it just increments the counter every time the iterators get accessed.

For shuffle on the write side, we count the metrics post aggregation (after a map side combine) and on the read side we count the metrics pre aggregation. This allows both the bytes read/written metrics and the records read/written to line up.

For backwards compatibility, if we deserialize an older event that doesn't have record metrics, we set the metric to -1.
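
A rough sketch of the two ideas above, counting records as an iterator is consumed and falling back to -1 for older events, might look like this in Python. The class name `CountingIterator`, the function `records_read_from_json`, and the JSON key `"Records Read"` are hypothetical and chosen only for illustration.

```python
import json


class CountingIterator:
    """Wraps an iterator and bumps a record counter for every item returned."""

    def __init__(self, inner):
        self._inner = iter(inner)
        self.records_read = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._inner)  # StopIteration propagates; count unchanged
        self.records_read += 1
        return item


def records_read_from_json(event_json):
    """Return the record count stored in an event, or -1 if it is absent.

    The key name "Records Read" is an assumption for this sketch.
    """
    metrics = json.loads(event_json)
    return metrics.get("Records Read", -1)
```

Using -1 rather than 0 lets a reader of the metrics tell "not recorded" apart from "zero records".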

Author: Kostas Sakellis <kostas@cloudera.com>

Closes #4067 from ksakellis/kostas-spark-4874 and squashes the following commits:

bd919be [Kostas Sakellis] Changed 'Records Read' in shuffleReadMetrics json output to 'Total Records Read'
dad4d57 [Kostas Sakellis] Add a comment and check to BlockObjectWriter so that it cannot be reopend.
6f236a1 [Kostas Sakellis] Renamed _recordsWritten in ShuffleWriteMetrics to be more consistent
70620a0 [Kostas Sakellis] CR Feedback
17faa3a [Kostas Sakellis] Removed AtomicLong in favour of using Long
b6f9923 [Kostas Sakellis] Merge AfterNextInterceptingIterator with InterruptableIterator to save a function call
46c8186 [Kostas Sakellis] Combined Bytes and # records into one column
57551c1 [Kostas Sakellis] Conforms to SPARK-3288
6cdb44e [Kostas Sakellis] Removed the generic InterceptingIterator and repalced it with specific implementation
1aa273c [Kostas Sakellis] CR Feedback
1bb78b1 [Kostas Sakellis] [SPARK-4874] [CORE] Collect record count metrics

dcd1e42d

[HOTFIX] Fix the maven build after adding sqlContext to spark-shell · 57961567

Michael Armbrust authored 10 years ago

Follow up to #4387 to fix the build break.

Author: Michael Armbrust <michael@databricks.com>

Closes #4443 from marmbrus/fixMaven and squashes the following commits:

1eeba7d [Michael Armbrust] try again
7f5fb15 [Michael Armbrust] [HOTFIX] Fix the maven build after adding sqlContext to spark-shell

57961567

[SPARK-5600] [core] Clean up FsHistoryProvider test, fix app sort order. · 5687bab8

Marcelo Vanzin authored 10 years ago

Clean up some test setup code to remove duplicate instantiation of the
provider. Also make sure unfinished apps are sorted correctly.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #4370 from vanzin/SPARK-5600 and squashes the following commits:

0d048d5 [Marcelo Vanzin] Cleanup test code a bit.
2585119 [Marcelo Vanzin] Review feedback.
8b97544 [Marcelo Vanzin] Merge branch 'master' into SPARK-5600
be979e9 [Marcelo Vanzin] Merge branch 'master' into SPARK-5600
298371c [Marcelo Vanzin] [SPARK-5600] [core] Clean up FsHistoryProvider test, fix app sort order.

5687bab8

SPARK-5613: Catch the ApplicationNotFoundException exception to avoid thread... · ca66159a

Kashish Jain authored 10 years ago

SPARK-5613: Catch the ApplicationNotFoundException exception to avoid thread from getting killed on yarn restart.

[SPARK-5613] Added a catch block to catch the ApplicationNotFoundException. Without this catch block, the thread is killed when this exception occurs. The exception occurs when YARN restarts and tries to find an application id for a Spark job that was interrupted because YARN was stopped.
See the stacktrace in the bug for more details.
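
The general pattern, catching the "not found" error inside the polling loop so that the loop survives, can be sketched as below. This is a Python illustration only: `ApplicationNotFoundError` is a hypothetical stand-in for YARN's exception, and `poll_application` is not a function from this patch.

```python
class ApplicationNotFoundError(Exception):
    """Hypothetical stand-in for YARN's ApplicationNotFoundException."""


def poll_application(fetch_state, max_polls):
    """Call fetch_state() up to max_polls times and collect the results.

    A missing application is recorded as "NOT_FOUND" instead of being allowed
    to propagate, so the polling loop (and the thread running it) keeps going.
    """
    seen = []
    for _ in range(max_polls):
        try:
            seen.append(fetch_state())
        except ApplicationNotFoundError:
            seen.append("NOT_FOUND")
    return seen
```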

Author: Kashish Jain <kashish.jain@guavus.com>

Closes #4392 from kasjain/branch-1.2 and squashes the following commits:

4831000 [Kashish Jain] SPARK-5613: Catch the ApplicationNotFoundException exception to avoid thread from getting killed on yarn restart.

ca66159a

SPARK-5633 pyspark saveAsTextFile support for compression codec · b3872e00

Vladimir Vladimirov authored 10 years ago

See https://issues.apache.org/jira/browse/SPARK-5633 for details

Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com>

Closes #4403 from smartkiwi/master and squashes the following commits:

94c014e [Vladimir Vladimirov] SPARK-5633 pyspark saveAsTextFile support for compression codec

b3872e00

[HOTFIX][MLLIB] fix a compilation error with java 6 · 65181b75

Xiangrui Meng authored 10 years ago

Author: Xiangrui Meng <meng@databricks.com>

Closes #4442 from mengxr/java6-fix and squashes the following commits:

2098500 [Xiangrui Meng] fix a compilation error with java 6

65181b75

[SPARK-4983] Insert waiting time before tagging EC2 instances · 0f3a3607

GenTang authored 10 years ago

The boto API doesn't support tagging EC2 instances in the same call that launches them.
We add a five-second wait so EC2 has enough time to propagate the information so that
the tagging can succeed.
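
In outline, the change amounts to a pause between launching and tagging. The sketch below is an assumption-laden Python illustration: `launch_and_tag` and its `launch`, `tag`, and `sleep` parameters are hypothetical and do not correspond to boto calls or to names in `spark-ec2`.

```python
import time


def launch_and_tag(launch, tag, tags, wait_seconds=5, sleep=time.sleep):
    """Launch instances, wait, then tag each one.

    `launch` returns a list of instances and `tag(instance, tags)` applies
    the tags; both are supplied by the caller. The wait gives the service
    time to make the new instances visible before tagging.
    """
    instances = launch()
    sleep(wait_seconds)  # five seconds by default, as described above
    for instance in instances:
        tag(instance, tags)
    return instances
```

Passing `sleep` as a parameter is only a convenience for testing the ordering without actually waiting.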

Author: GenTang <gen.tang86@gmail.com>
Author: Gen TANG <gen.tang86@gmail.com>

Closes #3986 from GenTang/spark-4983 and squashes the following commits:

13e257d [Gen TANG] modification of comments
47f06755 [GenTang] print the information
ab7a931 [GenTang] solve the issus spark-4983 by inserting waiting time
3179737 [GenTang] Revert "handling exceptions about adding tags to ec2"
6a8b53b [GenTang] Revert "the improvement of exception handling"
13e97a6 [GenTang] Revert "typo"
63fd360 [GenTang] typo
692fc2b [GenTang] the improvement of exception handling
6adcf6d [GenTang] handling exceptions about adding tags to ec2

0f3a3607

[SPARK-5586][Spark Shell][SQL] Make `sqlContext` available in spark shell · 3d3ecd77

OopsOutOfMemory authored 10 years ago

The result looks like this:
```
15/02/05 13:41:22 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/02/05 13:41:22 INFO SparkILoop: Created sql context..
SQLContext available as sqlContext.

scala> sq
sql          sqlContext   sqlParser    sqrt
```

Author: OopsOutOfMemory <victorshengli@126.com>

Closes #4387 from OopsOutOfMemory/sqlContextInShell and squashes the following commits:

c7f5203 [OopsOutOfMemory] auto-import sql() function
e160697 [OopsOutOfMemory] Merge branch 'sqlContextInShell' of https://github.com/OopsOutOfMemory/spark into sqlContextInShell
37c0a16 [OopsOutOfMemory] auto detect hive support
a9c59d9 [OopsOutOfMemory] rename and reduce range of imports
6b9e309 [OopsOutOfMemory] Merge branch 'master' into sqlContextInShell
cae652f [OopsOutOfMemory] make sqlContext available in spark shell

3d3ecd77

[SPARK-5278][SQL] Introduce UnresolvedGetField and complete the check of... · 4793c840

Wenchen Fan authored 10 years ago

[SPARK-5278][SQL] Introduce UnresolvedGetField and complete the check of ambiguous reference to fields

When a `GetField` chain (`a.b.c.d.....`) is interrupted by a `GetItem`, as in `a.b[0].c.d....`, the check for ambiguous references to fields is broken.
The reason is that for something like `a.b[0].c.d`, we first parse it to `GetField(GetField(GetItem(Unresolved("a.b"), 0), "c"), "d")`. Then in `LogicalPlan#resolve`, we resolve `"a.b"` and build a `GetField` chain from the bottom (the relation). But for the 2 outer `GetField`s, we have to resolve them in `Analyzer` or do it in `GetField` lazily: check the data type of the child, search for the needed field, etc., which is similar to what we have done in `LogicalPlan#resolve`.
So in this PR, the fix is just to copy the same logic from `LogicalPlan#resolve` to `Analyzer`, which is simple and quick, but I do suggest introducing `UnresolvedGetField` as I explained in https://github.com/apache/spark/pull/2405.
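
To make the parse shape concrete, here is a toy Python sketch that turns a reference such as `a.b[0].c.d` into the nested form quoted above. It is not the Catalyst parser: `parse_reference` is a hypothetical helper that only handles integer indexes and simple field names, and it returns the expression as a string.

```python
import re

# Matches either an index like "[0]" or a field access like ".name".
_TOKEN = re.compile(r"\[(\d+)\]|\.([A-Za-z_]\w*)")


def parse_reference(text):
    """Render a dotted/indexed reference as nested GetField/GetItem calls.

    Everything before the first "[" is kept as one unresolved name; each
    later "[n]" wraps the expression in GetItem and each ".x" in GetField.
    """
    bracket = text.find("[")
    if bracket == -1:
        return 'Unresolved("%s")' % text
    expr = 'Unresolved("%s")' % text[:bracket]
    pos = bracket
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError("cannot parse %r at offset %d" % (text, pos))
        if match.group(1) is not None:
            expr = "GetItem(%s, %s)" % (expr, match.group(1))
        else:
            expr = 'GetField(%s, "%s")' % (expr, match.group(2))
        pos = match.end()
    return expr
```

The two outer `GetField` wrappers in the result are the ones that the unresolved prefix cannot account for, which is why they need separate resolution.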

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #4068 from cloud-fan/simple and squashes the following commits:

a6857b5 [Wenchen Fan] fix import order
8411c40 [Wenchen Fan] use UnresolvedGetField

4793c840

[SQL][Minor] Remove cache keyword in SqlParser · bc363560

wangfei authored 10 years ago

The cache keyword is already defined in `SparkSQLParser`, and catalyst's `SqlParser` is a more general parser that should not cover keywords related to the underlying compute engine, so this removes the cache keyword from `SqlParser`.

Author: wangfei <wangfei1@huawei.com>

Closes #4393 from scwf/remove-cache-keyword and squashes the following commits:

10ade16 [wangfei] remove cache keyword in sql parser

bc363560

[SQL][HiveConsole][DOC] HiveConsole `correct hiveconsole imports` · b62c3524

OopsOutOfMemory authored 10 years ago

Sorry, PR #4330 had some mistakes.

This corrects them, so it works correctly now.

Author: OopsOutOfMemory <victorshengli@126.com>

Closes #4389 from OopsOutOfMemory/doc and squashes the following commits:

843eed9 [OopsOutOfMemory] correct hiveconsole imports

b62c3524

[SPARK-5595][SPARK-5603][SQL] Add a rule to do PreInsert type casting and... · 3eccf29c

Yin Huai authored 10 years ago

[SPARK-5595][SPARK-5603][SQL] Add a rule to do PreInsert type casting and field renaming and invalidating in memory cache after INSERT

This PR adds a rule to Analyzer that will add preinsert data type casting and field renaming to the select clause in an `INSERT INTO/OVERWRITE` statement. Also, with the change of this PR, we always invalidate our in memory data cache after inserting into a BaseRelation.

cc marmbrus liancheng

Author: Yin Huai <yhuai@databricks.com>

Closes #4373 from yhuai/insertFollowUp and squashes the following commits:

08237a7 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertFollowUp
316542e [Yin Huai] Doc update.
c9ccfeb [Yin Huai] Revert a unnecessary change.
84aecc4 [Yin Huai] Address comments.
1951fe1 [Yin Huai] Merge remote-tracking branch 'upstream/master'
c18da34 [Yin Huai] Invalidate cache after insert.
727f21a [Yin Huai] Preinsert casting and renaming.

3eccf29c

[SPARK-5324][SQL] Results of describe can't be queried · 0b7eb3f3

OopsOutOfMemory authored 10 years ago

Make the code below work.
```
sql("DESCRIBE test").registerTempTable("describeTest")
sql("SELECT * FROM describeTest").collect()
```

Author: OopsOutOfMemory <victorshengli@126.com>
Author: Sheng, Li <OopsOutOfMemory@users.noreply.github.com>

Closes #4249 from OopsOutOfMemory/desc_query and squashes the following commits:

6fee13d [OopsOutOfMemory] up-to-date
e71430a [Sheng, Li] Update HiveOperatorQueryableSuite.scala
3ba1058 [OopsOutOfMemory] change to default argument
aac7226 [OopsOutOfMemory] Merge branch 'master' into desc_query
68eb6dd [OopsOutOfMemory] Merge branch 'desc_query' of github.com:OopsOutOfMemory/spark into desc_query
354ad71 [OopsOutOfMemory] query describe command
d541a35 [OopsOutOfMemory] refine test suite
e1da481 [OopsOutOfMemory] refine test suite
a780539 [OopsOutOfMemory] Merge branch 'desc_query' of github.com:OopsOutOfMemory/spark into desc_query
0015f82 [OopsOutOfMemory] code style
dd0aaef [OopsOutOfMemory] code style
c7d606d [OopsOutOfMemory] rename test suite
75f2342 [OopsOutOfMemory] refine code and test suite
f942c9b [OopsOutOfMemory] initial
11559ae [OopsOutOfMemory] code style
c5fdecf [OopsOutOfMemory] code style
aeaea5f [OopsOutOfMemory] rename test suite
ac2c3bb [OopsOutOfMemory] refine code and test suite
544573e [OopsOutOfMemory] initial

0b7eb3f3

[SPARK-5619][SQL] Support 'show roles' in HiveContext · a958d609

q00251598 authored 10 years ago

Author: q00251598 <qiyadong@huawei.com>

Closes #4397 from watermen/SPARK-5619 and squashes the following commits:

f819b6c [q00251598] Support show roles in HiveContext.

a958d609

[SPARK-5640] Synchronize ScalaReflection where necessary · 500dc2b4

Tobias Schlatter authored 10 years ago

Author: Tobias Schlatter <tobias@meisch.ch>

Closes #4431 from gzm0/sync-scala-refl and squashes the following commits:

c5da21e [Tobias Schlatter] [SPARK-5640] Synchronize ScalaReflection where necessary

500dc2b4

[SPARK-5650][SQL] Support optional 'FROM' clause · d4338161

Liang-Chi Hsieh authored 10 years ago

In Hive, the 'FROM' clause is optional. This PR supports it.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #4426 from viirya/optional_from and squashes the following commits:

fe81f31 [Liang-Chi Hsieh] Support optional 'FROM' clause.

d4338161

[SPARK-5628] Add version option to spark-ec2 · 70e5b030

Nicholas Chammas authored 10 years ago

Every proper command line tool should include a `--version` option or something similar.

This PR adds this to `spark-ec2` using the standard functionality provided by `optparse`.

One thing we don't do here is follow the Python convention of setting `__version__`, since it seems awkward given how `spark-ec2` is laid out.
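
For reference, `optparse` adds a `--version` option automatically when a `version` string is passed to `OptionParser`. The sketch below shows that mechanism only; the version value, the usage string, and the `build_parser` helper are assumptions for illustration and are not taken from `spark-ec2`.

```python
from optparse import OptionParser

# Assumed value for illustration; a module-level constant instead of __version__.
SPARK_EC2_VERSION = "1.2.0"


def build_parser():
    """Build a parser whose --version flag prints the program name and version."""
    return OptionParser(
        prog="spark-ec2",
        version="%prog {v}".format(v=SPARK_EC2_VERSION),
        usage="%prog [options] <action> <cluster_name>",
    )
```

With this in place, calling `build_parser().parse_args(["--version"])` prints the version string and exits, with no extra option-handling code.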

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #4414 from nchammas/spark-ec2-show-version and squashes the following commits:

914cab5 [Nicholas Chammas] add version info

70e5b030

[SPARK-2945][YARN][Doc]add doc for spark.executor.instances · d34f79c8

WangTaoTheTonic authored 10 years ago

https://issues.apache.org/jira/browse/SPARK-2945

spark.executor.instances works. As this JIRA recommended, we should add docs for this common config.

Author: WangTaoTheTonic <wangtao111@huawei.com>

Closes #4350 from WangTaoTheTonic/SPARK-2945 and squashes the following commits:

4c3913a [WangTaoTheTonic] not compatible with dynamic allocation
5fa9c46 [WangTaoTheTonic] add doc for spark.executor.instances

d34f79c8

[SPARK-4361][Doc] Add more docs for Hadoop Configuration · af2a2a26

zsxwing authored 10 years ago

I'm trying to point out reusing a Configuration in these APIs is dangerous. Any better idea?

Author: zsxwing <zsxwing@gmail.com>

Closes #3225 from zsxwing/SPARK-4361 and squashes the following commits:

fe4e3d5 [zsxwing] Add more docs for Hadoop Configuration

af2a2a26

[HOTFIX] Fix test build break in ExecutorAllocationManagerSuite. · fb6c0cba

Josh Rosen authored 10 years ago

This was caused because #3486 added a new field to ExecutorInfo and #4369
added new tests that created ExecutorInfos. These patches were merged in
quick succession and were never tested together, hence the compilation error.

fb6c0cba

[SPARK-5652][Mllib] Use broadcasted weights in LogisticRegressionModel · 80f3bcb5

Liang-Chi Hsieh authored 10 years ago

`LogisticRegressionModel`'s `predictPoint` should directly use broadcasted weights. This PR also fixes the compilation errors in two unit test suites: `JavaLogisticRegressionSuite` and `JavaLinearRegressionSuite`.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #4429 from viirya/use_bcvalue and squashes the following commits:

5a797e5 [Liang-Chi Hsieh] Use broadcasted weights. Fix compilation error.

80f3bcb5

[SPARK-5555] Enable UISeleniumSuite tests · 0d74bd7f

Josh Rosen authored 10 years ago

This patch enables UISeleniumSuite, a set of tests for the Spark application web UI. These tests were previously disabled because they were slow, but I think we now have sufficient test time budget that the benefit of enabling them outweighs the time costs.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4334 from JoshRosen/enable-uiseleniumsuite and squashes the following commits:

4ab9477 [Josh Rosen] Use BeforeAndAfterAll to cleanup WebDriver
71efc72 [Josh Rosen] Update broken UISeleniumSuite tests; use random port #.
a5ab595 [Josh Rosen] Enable UISeleniumSuite tests.

0d74bd7f

SPARK-2450 Adds executor log links to Web UI · 32e964c4

Kostas Sakellis authored 10 years ago

Adds links to stderr/stdout in the executor tab of the webUI for:
1) Standalone
2) Yarn client
3) Yarn cluster

This tries to add the log url support in a general way so as to make it easy to add support for all the
cluster managers. This is done by using environment variables to pass to the executor the log urls. The
SPARK_LOG_URL_ prefix is used and so additional logs besides stderr/stdout can also be added.

To propagate this information to the UI we use the onExecutorAdded spark listener event.

Although this commit doesn't add log urls when running on a mesos cluster, it should be possible to add using the same mechanism.
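
The environment-variable mechanism can be illustrated with a short Python sketch. Only the `SPARK_LOG_URL_` prefix comes from the description above; the helper name `log_urls_from_env`, the lower-casing of the log name, and the example URLs are assumptions.

```python
LOG_URL_PREFIX = "SPARK_LOG_URL_"


def log_urls_from_env(env):
    """Collect log URLs from variables named SPARK_LOG_URL_<NAME>.

    Returns a dict from the lower-cased log name (an assumption of this
    sketch) to its URL; unrelated variables are ignored.
    """
    return {
        name[len(LOG_URL_PREFIX):].lower(): url
        for name, url in env.items()
        if name.startswith(LOG_URL_PREFIX)
    }
```

Because any suffix after the prefix is accepted, a cluster manager could expose additional logs without a change to the collection code, for example `log_urls_from_env({"SPARK_LOG_URL_GC": "http://host/gc"})`.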

Author: Kostas Sakellis <kostas@cloudera.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #3486 from ksakellis/kostas-spark-2450 and squashes the following commits:

d190936 [Josh Rosen] Fix a few minor style / formatting nits. Reset listener after each test Don't null listener out at end of main().
8673fe1 [Kostas Sakellis] CR feedback. Hide the log column if there are no logs available
5bf6952 [Kostas Sakellis] [SPARK-2450] [CORE] Adds exeuctor log links to Web UI

32e964c4

[SPARK-5618][Spark Core][Minor] Optimise utility code. · 4cdb26c1

Makoto Fukuhara authored 10 years ago

Author: Makoto Fukuhara <fukuo33@gmail.com>

Closes #4396 from fukuo33/fix-unnecessary-regex and squashes the following commits:

cd07fd6 [Makoto Fukuhara] fix unnecessary regex.

4cdb26c1

[SPARK-5593][Core]Replace BlockManagerListener with ExecutorListener in ExecutorAllocationListener · 6072fcc1

lianhuiwang authored 10 years ago

More strictly, in ExecutorAllocationListener we need to replace onBlockManagerAdded and onBlockManagerRemoved with onExecutorAdded and onExecutorRemoved, because onExecutorAdded and onExecutorRemoved express these events more accurately. For example, in SPARK-5529 the BlockManager has been removed but the executor still exists.
andrewor14 sryza

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #4369 from lianhuiwang/SPARK-5593 and squashes the following commits:

333367c [lianhuiwang] Replace BlockManagerListener with ExecutorListener in ExecutorAllocationListener

6072fcc1

[SPARK-4877] Allow user first classes to extend classes in the parent. · 9792bec5

Stephen Haberman authored 10 years ago

Previously, the classloader isolation was almost too good, such
that if a child class needed to load/reference a class that was
only available in the parent, it could not do so.

This adds tests for that case, the user-first Fake2 class extends
the only-in-parent Fake3 class.

It also sneaks in a fix where only the first stage seemed to work,
and on subsequent stages, a LinkageError happened because classes
from the user-first classpath were getting defined twice.

Author: Stephen Haberman <stephen@exigencecorp.com>

Closes #3725 from stephenh/4877_user_first_parent_inheritance and squashes the following commits:

dabcd35 [Stephen Haberman] [SPARK-4877] Respect userClassPathFirst for the driver code too.
3d0fa7c [Stephen Haberman] [SPARK-4877] Allow user first classes to extend classes in the parent.

9792bec5

[SPARK-5396] Syntax error in spark scripts on windows. · c01b9852

Masayoshi TSUZUKI authored 10 years ago

Fixed a syntax error in spark-submit2.cmd. Command prompt doesn't have a "defined" operator.

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>

Closes #4428 from tsudukim/feature/SPARK-5396 and squashes the following commits:

ec18465 [Masayoshi TSUZUKI] [SPARK-5396] Syntax error in spark scripts on windows.

c01b9852

[SPARK-5636] Ramp up faster in dynamic allocation · fe3740c4

Andrew Or authored 10 years ago

A recent patch #4051 made the initial number default to 0. With this change, any Spark application using dynamic allocation's default settings will ramp up very slowly. Since we never request more executors than needed to saturate the pending tasks, it is safe to ramp up quickly. The current default of 60 may be too slow.

Author: Andrew Or <andrew@databricks.com>

Closes #4409 from andrewor14/dynamic-allocation-interval and squashes the following commits:

d3cc485 [Andrew Or] Lower request interval

fe3740c4

SPARK-4337. [YARN] Add ability to cancel pending requests · 1a88f20d

Sandy Ryza authored 10 years ago

Author: Sandy Ryza <sandy@cloudera.com>

Closes #4141 from sryza/sandy-spark-4337 and squashes the following commits:

a98bd20 [Sandy Ryza] Andrew's comments
cdaab7f [Sandy Ryza] SPARK-4337. Add ability to cancel pending requests to YARN

1a88f20d

[SPARK-5653][YARN] In ApplicationMaster rename isDriver to isClusterMode · cc6e5311

lianhuiwang authored 10 years ago

In ApplicationMaster, rename isDriver to isClusterMode. Client already uses isClusterMode, so ApplicationMaster should stay consistent with it. isClusterMode is also easier to understand.
andrewor14 sryza

Author: lianhuiwang <lianhuiwang09@gmail.com>

Closes #4430 from lianhuiwang/am-isDriver-rename and squashes the following commits:

f9f3ed0 [lianhuiwang] rename isDriver to isClusterMode

cc6e5311

[SPARK-5013] [MLlib] Added documentation and sample data file for GaussianMixture · 9ad56ad2

Travis Galoppo authored 10 years ago

Simple description and code samples (and sample data) for GaussianMixture

Author: Travis Galoppo <tjg2107@columbia.edu>

Closes #4401 from tgaloppo/spark-5013 and squashes the following commits:

c9ff9a5 [Travis Galoppo] Fixed link in mllib-clustering.md Added Gaussian mixture and power iteration as available clustering techniques in mllib-guide
2368690 [Travis Galoppo] Minor fixes
3eb41fa [Travis Galoppo] [SPARK-5013] Added documentation and sample data file for GaussianMixture

9ad56ad2

[SPARK-5416] init Executor.threadPool before ExecutorSource · 37d35ab5

Ryan Williams authored 10 years ago

Some ExecutorSource metrics can NPE by attempting to reference the
threadpool otherwise.

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #4212 from ryan-williams/threadpool and squashes the following commits:

236f2ad [Ryan Williams] init Executor.threadPool before ExecutorSource

37d35ab5

[Build] Set all Debian package permissions to 755 · cf6778e8

Nicholas Chammas authored 10 years ago

755 means the owner can read, write, and execute, and everyone else can just read and execute. I think that's what we want here since without execute permissions others cannot open directories.
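
As a quick check of what the octal value means, the following Python sketch decodes the permission bits with the standard `stat` module. The helper names `describe_mode` and `others_can_enter` are made up for this example.

```python
import stat


def describe_mode(mode):
    """Return the nine-character rwx string for a permission mode."""
    return stat.filemode(mode)[1:]  # drop the leading file-type character


def others_can_enter(mode):
    """True if users other than the owner and group have the execute bit,
    which is what allows them to open a directory."""
    return bool(mode & stat.S_IXOTH)
```

For 755, the owner has read, write, and execute, while group and others have read and execute only.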

Inspired by [this comment on a separate PR](https://github.com/apache/spark/pull/3297#issuecomment-63286730).

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #4277 from nchammas/patch-1 and squashes the following commits:

da77fb0 [Nicholas Chammas] [Build] Set all Debian package permissions to 755

cf6778e8

Update ec2-scripts.md · f827ef4d

Miguel Peralvo authored 10 years ago

Change spark-version from 1.1.0 to 1.2.0 in the example for spark-ec2/Launch Cluster.

Author: Miguel Peralvo <miguel.peralvo@gmail.com>

Closes #4300 from MiguelPeralvo/patch-1 and squashes the following commits:

38adf0b [Miguel Peralvo] Update ec2-scripts.md
1850869 [Miguel Peralvo] Update ec2-scripts.md

f827ef4d