Commits · d43415075b3468fe8aa56de5d2907d409bb96347 · cs525-sp18-g07 / spark

Jul 04, 2014

[SPARK-1199][REPL] Remove VALId and use the original import style for defined classes. · d4341507

Prashant Sharma authored 10 years ago

This is an alternate solution to #1176.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #1179 from ScrapCodes/SPARK-1199/repl-fix-second-approach and squashes the following commits:

820b34b [Prashant Sharma] Here we generate two kinds of import wrappers based on whether it is a class or not.

d4341507

[SPARK-2059][SQL] Don't throw TreeNodeException in `execution.ExplainCommand` · 54488045

Cheng Lian authored 10 years ago

This is a fix for the problem revealed by PR #1265.

Currently `HiveComparisonSuite` ignores output of `ExplainCommand` since Catalyst query plan is quite different from Hive query plan. But exceptions throw from `CheckResolution` still breaks test cases. This PR catches any `TreeNodeException` and reports it as part of the query explanation.

After merging this PR, PR #1265 can also be merged safely.

For a normal query:

```
scala> hql("explain select key from src").foreach(println)
...
[Physical execution plan:]
[HiveTableScan [key#9], (MetastoreRelation default, src, None), None]
```

For a wrong query with unresolved attribute(s):

```
scala> hql("explain select kay from src").foreach(println)
...
[Error occurred during query planning: ]
[Unresolved attributes: 'kay, tree:]
[Project ['kay]]
[ LowerCaseSchema ]
[  MetastoreRelation default, src, None]
```

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1294 from liancheng/safe-explain and squashes the following commits:

4318911 [Cheng Lian] Don't throw TreeNodeException in `execution.ExplainCommand`

54488045

SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark · 97a0bfe1

Aaron Davidson authored 10 years ago

JIRA: https://issues.apache.org/jira/browse/SPARK-2282

This issue is caused by a buildup of sockets in the TIME_WAIT stage of TCP, which is a stage that lasts for some period of time after the communication closes.

This solution simply allows us to reuse sockets that are in TIME_WAIT, to avoid issues with the buildup of the rapid creation of these sockets.

Author: Aaron Davidson <aaron@databricks.com>

Closes #1220 from aarondav/SPARK-2282 and squashes the following commits:

2e5cab3 [Aaron Davidson] SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark

97a0bfe1

[SPARK-2307][Reprise] Correctly report RDD blocks on SparkUI · 3894a49b

Andrew Or authored 10 years ago

**Problem.** The existing code in `ExecutorPage.scala` requires a linear scan through all the blocks to filter out the uncached ones. Every refresh could be expensive if there are many blocks and many executors.

**Solution.** The proper semantics should be the following: `StorageStatusListener` should contain only block statuses that are cached. This means as soon as a block is unpersisted by any mean, its status should be removed. This is reflected in the changes made in `StorageStatusListener.scala`.

Further, the `StorageTab` must stop relying on the `StorageStatusListener` changing a dropped block's status to `StorageLevel.NONE` (which no longer happens). This is reflected in the changes made in `StorageTab.scala` and `StorageUtils.scala`.

----------

If you have been following this chain of PRs like pwendell, you will quickly notice that this reverts the changes in #1249, which reverts the changes in #1080. In other words, we are adding back the changes from #1080, and fixing SPARK-2307 on top of those changes. Please ask questions if you are confused.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1255 from andrewor14/storage-ui-fix-reprise and squashes the following commits:

45416fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into storage-ui-fix-reprise
a82ea25 [Andrew Or] Add tests for StorageStatusListener
8773b01 [Andrew Or] Update comment / minor changes
3afde3f [Andrew Or] Correctly report the number of blocks on SparkUI

3894a49b

[SPARK-2350] Don't NPE while launching drivers · 586feb5c

Aaron Davidson authored 10 years ago

Prior to this change, we could throw a NPE if we launch a driver while another one is waiting, because removing from an iterator while iterating over it is not safe.

Author: Aaron Davidson <aaron@databricks.com>

Closes #1289 from aarondav/master-fail and squashes the following commits:

1cf1cf4 [Aaron Davidson] SPARK-2350: Don't NPE while launching drivers

586feb5c

Jul 03, 2014

[SPARK-1097] Workaround Hadoop conf ConcurrentModification issue · 5fa0a057

Raymond Liu authored 10 years ago

Workaround Hadoop conf ConcurrentModification issue

Author: Raymond Liu <raymond.liu@intel.com>

Closes #1273 from colorant/hadoopRDD and squashes the following commits:

994e98b [Raymond Liu] Address comments
e2cda3d [Raymond Liu] Workaround Hadoop conf ConcurrentModification issue

5fa0a057

Streaming programming guide typos · fdc4c112

Clément MATHIEU authored 10 years ago

Fix a bad Java code sample and a broken link in the streaming programming guide.

Author: Clément MATHIEU <clement@unportant.info>

Closes #1286 from cykl/streaming-programming-guide-typos and squashes the following commits:

b0908cb [Clément MATHIEU] Fix broken URL
9d3c535 [Clément MATHIEU] Spark streaming requires at least two working threads (scala version was OK)

fdc4c112

[HOTFIX] Synchronize on SQLContext.settings in tests. · d4c30cd9

Zongheng Yang authored 10 years ago

Let's see if this fixes the ongoing series of test failures in a master build machine (https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT-pre-YARN/SPARK_HADOOP_VERSION=1.0.4,label=centos/81/).

pwendell marmbrus

Author: Zongheng Yang <zongheng.y@gmail.com>

Closes #1277 from concretevitamin/test-fix and squashes the following commits:

28c88bd [Zongheng Yang] Synchronize on SQLContext.settings in tests.

d4c30cd9

[SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. · 731f683b

Prashant Sharma authored 10 years ago

Trivial fix.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:

77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.

731f683b

[SPARK-2342] Evaluation helper's output type doesn't conform to input ty... · a9b52e56

Yijie Shen authored 10 years ago

The function cast doesn't conform to the intention of "Those expressions are supposed to be in the same data type, and also the return type." comment

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #1283 from yijieshen/master and squashes the following commits:

c7aaa4b [Yijie Shen] [SPARK-2342] Evaluation helper's output type doesn't conform to input type

a9b52e56

SPARK-1675. Make clear whether computePrincipalComponents requires centered data · 2b36344f

Sean Owen authored 10 years ago

Just closing out this small JIRA, resolving with a comment change.

Author: Sean Owen <sowen@cloudera.com>

Closes #1171 from srowen/SPARK-1675 and squashes the following commits:

45ee9b7 [Sean Owen] Add simple note that data need not be centered for computePrincipalComponents

2b36344f

[SPARK] Fix NPE for ExternalAppendOnlyMap · c4805377

Andrew Or authored 10 years ago

It did not handle null keys very gracefully before.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1288 from andrewor14/fix-external and squashes the following commits:

312b8d8 [Andrew Or] Abstract key hash code
ed5adf9 [Andrew Or] Fix NPE for ExternalAppendOnlyMap

c4805377

[SPARK-2324] SparkContext should not exit directly when spark.local.dir is a... · 3bbeca64

yantangzhai authored 10 years ago

[SPARK-2324] SparkContext should not exit directly when spark.local.dir is a list of multiple paths and one of them has error

The spark.local.dir is configured as a list of multiple paths as follows /data1/sparkenv/local,/data2/sparkenv/local. If the disk data2 of the driver node has error, the application will exit since DiskBlockManager exits directly at createLocalDirs. If the disk data2 of the worker node has error, the executor will exit either.
DiskBlockManager should not exit directly at createLocalDirs if one of spark.local.dir has error. Since spark.local.dir has multiple paths, a problem should not affect the overall situation.
I think DiskBlockManager could ignore the bad directory at createLocalDirs.

Author: yantangzhai <tyz0303@163.com>

Closes #1274 from YanTangZhai/SPARK-2324 and squashes the following commits:

609bf48 [yantangzhai] [SPARK-2324] SparkContext should not exit directly when spark.local.dir is a list of multiple paths and one of them has error
df08673 [yantangzhai] [SPARK-2324] SparkContext should not exit directly when spark.local.dir is a list of multiple paths and one of them has error

3bbeca64

Jul 02, 2014

[SPARK-2287] [SQL] Make ScalaReflection be able to handle Generic case classes. · bc7041a4

Takuya UESHIN authored 10 years ago

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1226 from ueshin/issues/SPARK-2287 and squashes the following commits:

32ef7c3 [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`.
541dc8d [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287
fac5fae [Takuya UESHIN] Remove unnecessary method receiver.
d306e60 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287
7de5706 [Takuya UESHIN] Make ScalaReflection be able to handle Generic case classes.

bc7041a4

[SPARK-2328] [SQL] Add execution of `SHOW TABLES` before `TestHive.reset()`. · 1e2c26c8

Takuya UESHIN authored 10 years ago

`PruningSuite` is executed first of Hive tests unfortunately, `TestHive.reset()` breaks the test environment.
To prevent this, we must run a query before calling reset the first time.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1268 from ueshin/issues/SPARK-2328 and squashes the following commits:

043ceac [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`.

1e2c26c8

SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG · 5c6ec94d

Ximo Guanter Gonzalbez authored 10 years ago

**Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and other.
**Testing** Unit tests added.

Author: Ximo Guanter Gonzalbez <ximo@tid.es>

Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits:

fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions
e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG

5c6ec94d

Jul 01, 2014

update the comments in SqlParser · 6596392d

CodingCat authored 10 years ago

SqlParser has been case-insensitive after https://github.com/apache/spark/commit/dab5439a083b5f771d5d5b462d0d517fa8e9aaf2 was merged

Author: CodingCat <zhunansjtu@gmail.com>

Closes #1275 from CodingCat/master and squashes the following commits:

17931cd [CodingCat] update the comments in SqlParser

6596392d

[SPARK-2185] Emit warning when task size exceeds a threshold. · 05c3d90e

Kay Ousterhout authored 10 years ago

This functionality was added in an earlier commit but shortly
after was removed due to a bad git merge (totally my fault).

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1149 from kayousterhout/warning_bug and squashes the following commits:

3f1bb00 [Kay Ousterhout] Fixed Json tests
462a664 [Kay Ousterhout] Removed task set name from warning message
e89b2f6 [Kay Ousterhout] Fixed Json tests.
7af424c [Kay Ousterhout] Emit warning when task size exceeds a threshold.

05c3d90e

SPARK-2332 [build] add exclusion for old servlet-api on hadoop-client in core · 3319a3e3

Peter MacKinnon authored 10 years ago

Fix for class of test suite failures in jenkins

Author: Peter MacKinnon <pmackinn@redhat.com>

Closes #1271 from pdmack/master and squashes the following commits:

cfe59fd [Peter MacKinnon] exclude servlet-api in hadoop-client for sbt
6f39fec [Peter MacKinnon] add exclusion for old servlet-api on hadoop-client in core

3319a3e3

Jun 30, 2014

SPARK-2293. Replace RDD.zip usage by map with predict inside. · 04fa1223

Sean Owen authored 10 years ago

This is the only occurrence of this pattern in the examples that needs to be replaced. It only addresses the example change.

Author: Sean Owen <sowen@cloudera.com>

Closes #1250 from srowen/SPARK-2293 and squashes the following commits:

6b1b28c [Sean Owen] Compute prediction-and-label RDD directly rather than by zipping, for efficiency

04fa1223

[SPARK-2318] When exiting on a signal, print the signal name first. · 5fccb567

Reynold Xin authored 10 years ago

Author: Reynold Xin <rxin@apache.org>

Closes #1260 from rxin/signalhandler1 and squashes the following commits:

8e73552 [Reynold Xin] Uh add Logging back in ApplicationMaster.
0402ba8 [Reynold Xin] Synchronize SignalLogger.register.
dc70705 [Reynold Xin] Added SignalLogger to YARN ApplicationMaster.
79a21b4 [Reynold Xin] Added license header.
0da052c [Reynold Xin] Added the SignalLogger itself.
e587d2e [Reynold Xin] [SPARK-2318] When exiting on a signal, print the signal name first.

5fccb567

[SPARK-2322] Exception in resultHandler should NOT crash DAGScheduler and shutdown SparkContext. · 358ae153

Reynold Xin authored 10 years ago

This should go into 1.0.1.

Author: Reynold Xin <rxin@apache.org>

Closes #1264 from rxin/SPARK-2322 and squashes the following commits:

c77c07f [Reynold Xin] Added comment to SparkDriverExecutionException and a test case for accumulator.
5d8d920 [Reynold Xin] [SPARK-2322] Exception in resultHandler could crash DAGScheduler and shutdown SparkContext.

358ae153

SPARK-2077 Log serializer that actually ends up being used · 68036422

Andrew Ash authored 10 years ago

I could settle with this being a debug also if we provided an example of how to turn it on in `log4j.properties`

https://issues.apache.org/jira/browse/SPARK-2077

Author: Andrew Ash <andrew@andrewash.com>

Closes #1017 from ash211/SPARK-2077 and squashes the following commits:

580f680 [Andrew Ash] Drop to debug
0266415 [Andrew Ash] SPARK-2077 Log serializer that actually ends up being used

68036422

SPARK-897: preemptively serialize closures · a484030d

William Benton authored 10 years ago

These commits cause `ClosureCleaner.clean` to attempt to serialize the cleaned closure with the default closure serializer and throw a `SparkException` if doing so fails. This behavior is enabled by default but can be disabled at individual callsites of `SparkContext.clean`.

Commit 98e01ae8 fixes some no-op assertions in `GraphSuite` that this work exposed; I'm happy to put that in a separate PR if that would be more appropriate.

Author: William Benton <willb@redhat.com>

Closes #143 from willb/spark-897 and squashes the following commits:

bceab8a [William Benton] Commented DStream corner cases for serializability checking.
64d04d2 [William Benton] FailureSuite now checks both messages and causes.
3b3f74a [William Benton] Stylistic and doc cleanups.
b215dea [William Benton] Fixed spurious failures in ImplicitOrderingSuite
be1ecd6 [William Benton] Don't check serializability of DStream transforms.
abe816b [William Benton] Make proactive serializability checking optional.
5bfff24 [William Benton] Adds proactive closure-serializablilty checking
ed2ccf0 [William Benton] Test cases for SPARK-897.

a484030d

[SPARK-2104] Fix task serializing issues when sort with Java non serializable class · 66135a34

jerryshao authored 10 years ago

Details can be see in [SPARK-2104](https://issues.apache.org/jira/browse/SPARK-2104). This work is based on Reynold's work, add some unit tests to validate the issue.

@rxin , would you please take a look at this PR, thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes #1245 from jerryshao/SPARK-2104 and squashes the following commits:

c8ee362 [jerryshao] Make field partitions transient
2b41917 [jerryshao] Minor changes
47d763c [jerryshao] Fix task serializing issue when sort with Java non serializable class

66135a34

[SPARK-1683] Track task read metrics. · 7b71a0e0

Kay Ousterhout authored 10 years ago

This commit adds a new metric in TaskMetrics to record
the input data size and displays this information in the UI.

An earlier version of this commit also added the read time,
which can be useful for diagnosing straggler problems,
but unfortunately that change introduced a significant performance
regression for jobs that don't do much computation. In order to
track read time, we'll need to do sampling.

The screenshots below show the UI with the new "Input" field,
which I added to the stage summary page, the executor summary page,
and the per-stage page.

![image](https://cloud.githubusercontent.com/assets/1108612/3167930/2627f92a-eb77-11e3-861c-98ea5bb7a1a2.png)

![image](https://cloud.githubusercontent.com/assets/1108612/3167936/475a889c-eb77-11e3-9706-f11c48751f17.png)

![image](https://cloud.githubusercontent.com/assets/1108612/3167948/80ebcf12-eb77-11e3-87ed-349fce6a770c.png)

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #962 from kayousterhout/read_metrics and squashes the following commits:

f13b67d [Kay Ousterhout] Correctly format input bytes on executor page
8b70cde [Kay Ousterhout] Added comment about potential inaccuracy of bytesRead
d1016e8 [Kay Ousterhout] Udated SparkListenerSuite test
8461492 [Kay Ousterhout] Miniscule style fix
ae04d99 [Kay Ousterhout] Remove input metrics for parallel collections
719f19d [Kay Ousterhout] Style fixes
bb6ec62 [Kay Ousterhout] Small fixes
869ac7b [Kay Ousterhout] Updated Json tests
44a0301 [Kay Ousterhout] Fixed accidentally added line
4bd0568 [Kay Ousterhout] Added input source, renamed Hdfs to Hadoop.
f27e535 [Kay Ousterhout] Updates based on review comments and to fix rebase
bf41029 [Kay Ousterhout] Updated Json tests to pass
0fc33e0 [Kay Ousterhout] Added explicit backward compatibility test
4e52925 [Kay Ousterhout] Added Json output and associated tests.
365400b [Kay Ousterhout] [SPARK-1683] Track task read metrics.

7b71a0e0

Jun 29, 2014

[SPARK-2320] Reduce exception/code block font size in web ui · cdf613fc

Reynold Xin authored 10 years ago

Author: Reynold Xin <rxin@apache.org>

Closes #1261 from rxin/ui-pre-size and squashes the following commits:

7ab1a69 [Reynold Xin] [SPARK-2320] Reduce exception/code block font size in web ui

cdf613fc

Jun 28, 2014

Improve MapOutputTracker error logging. · 2053d793

Reynold Xin authored 10 years ago

Author: Reynold Xin <rxin@apache.org>

Closes #1258 from rxin/mapOutputTracker and squashes the following commits:

a7c95b6 [Reynold Xin] Improve MapOutputTracker error logging.

2053d793

[SPARK-1394] Remove SIGCHLD handler in worker subprocess · 3c104c79

Matthew Farrellee authored 10 years ago

It should not be the responsibility of the worker subprocess, which
does not intentionally fork, to try and cleanup child processes. Doing
so is complex and interferes with operations such as
platform.system().

If it is desirable to have tighter control over subprocesses, then
namespaces should be used and it should be the manager's resposibility
to handle cleanup.

Author: Matthew Farrellee <matt@redhat.com>

Closes #1247 from mattf/SPARK-1394 and squashes the following commits:

c36f308 [Matthew Farrellee] [SPARK-1394] Remove SIGCHLD handler in worker subprocess

3c104c79

[SPARK-2233] make-distribution script should list the git hash in the RELEASE file · b8f2e13a

Guillaume Ballet authored 10 years ago

This patch adds the git revision hash (short version) to the RELEASE file. It uses git instead of simply checking for the existence of .git, so as to make sure that this is a functional repository.

Author: Guillaume Ballet <gballet@gmail.com>

Closes #1216 from gballet/master and squashes the following commits:

eabc50f [Guillaume Ballet] Refactored the script to take comments into account.
d93e5e8 [Guillaume Ballet] [SPARK 2233] make-distribution script now lists the git hash tag in the RELEASE file.

b8f2e13a

Jun 27, 2014

[SPARK-2003] Fix python SparkContext example · 0e0686d3

Matthew Farrellee authored 10 years ago

Author: Matthew Farrellee <matt@redhat.com>

Closes #1246 from mattf/SPARK-2003 and squashes the following commits:

b12e7ca [Matthew Farrellee] [SPARK-2003] Fix python SparkContext example

0e0686d3

[SPARK-2259] Fix highly misleading docs on cluster / client deploy modes · f17510e3

Andrew Or authored 10 years ago

The existing docs are highly misleading. For standalone mode, for example, it encourages the user to use standalone-cluster mode, which is not officially supported. The safeguards have been added in Spark submit itself to prevent bad documentation from leading users down the wrong path in the future.

This PR is prompted by countless headaches users of Spark have run into on the mailing list.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1200 from andrewor14/submit-docs and squashes the following commits:

5ea2460 [Andrew Or] Rephrase cluster vs client explanation
c827f32 [Andrew Or] Clarify spark submit messages
9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards

f17510e3

[SPARK-2307] SparkUI - storage tab displays incorrect RDDs · 21e0f77b

Andrew Or authored 10 years ago

The issue here is that the `StorageTab` listens for updates from the `StorageStatusListener`, but when a block is kicked out of the cache, `StorageStatusListener` removes it from its list. Thus, there is no way for the `StorageTab` to know whether a block has been dropped.

This issue was introduced in #1080, which was itself a bug fix. Here we revert that PR and offer a different fix for the original bug (SPARK-2144).

Author: Andrew Or <andrewor14@gmail.com>

Closes #1249 from andrewor14/storage-ui-fix and squashes the following commits:

af019ce [Andrew Or] Fix SPARK-2307

21e0f77b

Jun 26, 2014

SPARK-2181:The keys for sorting the columns of Executor page in SparkUI are incorrect · 18f29b96

witgo authored 10 years ago

Author: witgo <witgo@qq.com>

Closes #1135 from witgo/SPARK-2181 and squashes the following commits:

39dad90 [witgo] The keys for sorting the columns of Executor page in SparkUI are incorrect

18f29b96

[SPARK-2251] fix concurrency issues in random sampler · c23f5db3

Xiangrui Meng authored 10 years ago

The following code is very likely to throw an exception:

~~~
val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1)
rdd.zip(rdd).count()
~~~

because the same random number generator is used in compute partitions.

Author: Xiangrui Meng <meng@databricks.com>

Closes #1229 from mengxr/fix-sample and squashes the following commits:

f1ee3d7 [Xiangrui Meng] fix concurrency issues in random sampler

c23f5db3

[SPARK-2297][UI] Make task attempt and speculation more explicit in UI. · d1636dd7

Reynold Xin authored 10 years ago

New UI:

![screen shot 2014-06-26 at 1 43 52 pm](https://cloud.githubusercontent.com/assets/323388/3404643/82b9ddc6-fd73-11e3-96f9-f7592a7aee79.png)

Author: Reynold Xin <rxin@apache.org>

Closes #1236 from rxin/ui-task-attempt and squashes the following commits:

3b645dd [Reynold Xin] Expose attemptId in Stage.
c0474b1 [Reynold Xin] Beefed up unit test.
c404bdd [Reynold Xin] Fix ReplayListenerSuite.
f56be4b [Reynold Xin] Fixed JsonProtocolSuite.
e29e0f7 [Reynold Xin] Minor update.
5e4354a [Reynold Xin] [SPARK-2297][UI] Make task attempt and speculation more explicit in UI.

d1636dd7

Removed throwable field from FetchFailedException and added MetadataFetchFailedException · bf578dea

Reynold Xin authored 10 years ago

FetchFailedException used to have a Throwable field, but in reality we never propagate any of the throwable/exceptions back to the driver because Executor explicitly looks for FetchFailedException and then sends FetchFailed as the TaskEndReason.

This pull request removes the throwable and adds a MetadataFetchFailedException that extends FetchFailedException (so now MapOutputTracker throws MetadataFetchFailedException instead).

Author: Reynold Xin <rxin@apache.org>

Closes #1227 from rxin/metadataFetchException and squashes the following commits:

5cb1e0a [Reynold Xin] MetadataFetchFailedException extends FetchFailedException.
8861ee2 [Reynold Xin] Throw MetadataFetchFailedException in MapOutputTracker.

bf578dea

[SQL]Extract the joinkeys from join condition · 981bde9b

Cheng Hao authored 10 years ago

Extract the join keys from equality conditions, that can be evaluated using equi-join.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #1190 from chenghao-intel/extract_join_keys and squashes the following commits:

4a1060a [Cheng Hao] Fix some of the small issues
ceb4924 [Cheng Hao] Remove the redundant pattern of join keys extraction
cec34e8 [Cheng Hao] Update the code style issues
dcc4584 [Cheng Hao] Extract the joinkeys from join condition

981bde9b

Strip '@' symbols when merging pull requests. · f1f7385a

Patrick Wendell authored 10 years ago

Currently all of the commits with 'X' in them cause person X to
receive e-mails every time someone makes a public fork of Spark.

marmbrus who requested this.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #1239 from pwendell/strip and squashes the following commits:

22e5a97 [Patrick Wendell] Strip '@' symbols when merging pull requests.

f1f7385a

Fixing AWS instance type information based upon current EC2 data · 62d4a0fa

Zichuan Ye authored 10 years ago

Fixed a problem in previous file in which some information regarding AWS instance types were wrong. Such information was updated base upon current AWS EC2 data.

Author: Zichuan Ye <jerry@tangentds.com>

Closes #1156 from jerry86/master and squashes the following commits:

ff36e95 [Zichuan Ye] Fixing AWS instance type information based upon current EC2 data

62d4a0fa