Commits · f17510e371dfbeaada3c72b884d70c36503ea30a · cs525-sp18-g07 / spark

Jun 27, 2014

[SPARK-2259] Fix highly misleading docs on cluster / client deploy modes · f17510e3

Andrew Or authored 10 years ago

The existing docs are highly misleading. For standalone mode, for example, they encourage the user to use standalone-cluster mode, which is not officially supported. Safeguards have also been added in Spark submit itself to prevent bad documentation from leading users down the wrong path in the future.

This PR is prompted by countless headaches users of Spark have run into on the mailing list.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1200 from andrewor14/submit-docs and squashes the following commits:

5ea2460 [Andrew Or] Rephrase cluster vs client explanation
c827f32 [Andrew Or] Clarify spark submit messages
9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards

[SPARK-2307] SparkUI - storage tab displays incorrect RDDs · 21e0f77b

Andrew Or authored 10 years ago

The issue here is that the `StorageTab` listens for updates from the `StorageStatusListener`, but when a block is kicked out of the cache, `StorageStatusListener` removes it from its list. Thus, there is no way for the `StorageTab` to know whether a block has been dropped.

This issue was introduced in #1080, which was itself a bug fix. Here we revert that PR and offer a different fix for the original bug (SPARK-2144).

Author: Andrew Or <andrewor14@gmail.com>

Closes #1249 from andrewor14/storage-ui-fix and squashes the following commits:

af019ce [Andrew Or] Fix SPARK-2307

Jun 26, 2014

SPARK-2181: The keys for sorting the columns of the Executor page in SparkUI are incorrect · 18f29b96

witgo authored 10 years ago

Author: witgo <witgo@qq.com>

Closes #1135 from witgo/SPARK-2181 and squashes the following commits:

39dad90 [witgo] The keys for sorting the columns of Executor page in SparkUI are incorrect

[SPARK-2251] fix concurrency issues in random sampler · c23f5db3

Xiangrui Meng authored 10 years ago

The following code is very likely to throw an exception:

~~~
val rdd = sc.parallelize(0 until 111, 10).sample(false, 0.1)
rdd.zip(rdd).count()
~~~

because the same random number generator instance is shared when computing different partitions concurrently.
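A minimal, Spark-free sketch of why per-partition generators avoid the problem, assuming each partition gets its own generator seeded from a base seed plus the partition index. `PartitionSampling`, `samplePartition`, and that seeding scheme are hypothetical illustrations, not Spark's `RandomSampler` code. Since no generator state is shared, sampling the same partition twice yields the same elements, which is what zipping an RDD with itself depends on.

```scala
import scala.util.Random

object PartitionSampling {
  // Hypothetical helper: Bernoulli-samples one partition with its own RNG,
  // seeded from (seed, partitionIndex), so no generator state is shared
  // between partition computations.
  def samplePartition[T](data: Seq[T], fraction: Double, seed: Long, partitionIndex: Int): Seq[T] = {
    val rng = new Random(seed + partitionIndex)
    data.filter(_ => rng.nextDouble() < fraction)
  }

  // Splits 0 until n into numSlices contiguous ranges, similar in spirit to
  // how parallelize slices a range.
  def partitions(n: Int, numSlices: Int): Seq[Seq[Int]] =
    (0 until numSlices).map { i =>
      (i * n / numSlices until (i + 1) * n / numSlices).toSeq
    }

  // Samples every partition; the same seed always gives the same samples.
  def sampleAll(n: Int, numSlices: Int, fraction: Double, seed: Long): Seq[Seq[Int]] =
    partitions(n, numSlices).zipWithIndex.map { case (p, i) =>
      samplePartition(p, fraction, seed, i)
    }
}
```

Calling `sampleAll(111, 10, 0.1, 42L)` twice returns identical per-partition samples; with a single generator advanced concurrently by several partitions, that guarantee disappears.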

Author: Xiangrui Meng <meng@databricks.com>

Closes #1229 from mengxr/fix-sample and squashes the following commits:

f1ee3d7 [Xiangrui Meng] fix concurrency issues in random sampler

[SPARK-2297][UI] Make task attempt and speculation more explicit in UI. · d1636dd7

Reynold Xin authored 10 years ago

New UI:

![screen shot 2014-06-26 at 1 43 52 pm](https://cloud.githubusercontent.com/assets/323388/3404643/82b9ddc6-fd73-11e3-96f9-f7592a7aee79.png)

Author: Reynold Xin <rxin@apache.org>

Closes #1236 from rxin/ui-task-attempt and squashes the following commits:

3b645dd [Reynold Xin] Expose attemptId in Stage.
c0474b1 [Reynold Xin] Beefed up unit test.
c404bdd [Reynold Xin] Fix ReplayListenerSuite.
f56be4b [Reynold Xin] Fixed JsonProtocolSuite.
e29e0f7 [Reynold Xin] Minor update.
5e4354a [Reynold Xin] [SPARK-2297][UI] Make task attempt and speculation more explicit in UI.

Removed throwable field from FetchFailedException and added MetadataFetchFailedException · bf578dea

Reynold Xin authored 10 years ago

FetchFailedException used to have a Throwable field, but in reality we never propagate any of the throwables back to the driver, because the Executor explicitly looks for FetchFailedException and then sends FetchFailed as the TaskEndReason.

This pull request removes the throwable and adds a MetadataFetchFailedException that extends FetchFailedException (so now MapOutputTracker throws MetadataFetchFailedException instead).
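A simplified sketch of the hierarchy described above. The class names come from the commit message; the constructor parameters, the trimmed `TaskEndReason` cases, and the `taskEndReason` helper are hypothetical. Because `MetadataFetchFailedException` extends `FetchFailedException`, a single match case on the parent covers both, and only IDs (not a `Throwable`) travel back to the driver.

```scala
object FetchFailures {
  // Simplified stand-ins for the classes named in the commit; the
  // constructor parameters are illustrative assumptions.
  class FetchFailedException(val shuffleId: Int, val reduceId: Int, message: String)
    extends Exception(message)

  // Thrown by the map-output tracker when map output metadata is missing.
  class MetadataFetchFailedException(shuffle: Int, reduce: Int, message: String)
    extends FetchFailedException(shuffle, reduce, message)

  sealed trait TaskEndReason
  case class FetchFailed(shuffleId: Int, reduceId: Int) extends TaskEndReason
  case class ExceptionFailure(className: String, description: String) extends TaskEndReason

  // Hypothetical executor-side mapping: any FetchFailedException, including
  // the metadata subclass, becomes FetchFailed carrying only IDs.
  def taskEndReason(t: Throwable): TaskEndReason = t match {
    case ff: FetchFailedException => FetchFailed(ff.shuffleId, ff.reduceId)
    case other => ExceptionFailure(other.getClass.getName, String.valueOf(other.getMessage))
  }
}
```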

Author: Reynold Xin <rxin@apache.org>

Closes #1227 from rxin/metadataFetchException and squashes the following commits:

5cb1e0a [Reynold Xin] MetadataFetchFailedException extends FetchFailedException.
8861ee2 [Reynold Xin] Throw MetadataFetchFailedException in MapOutputTracker.

[SQL] Extract the join keys from the join condition · 981bde9b

Cheng Hao authored 10 years ago

Extract the join keys from equality conditions so that the join can be evaluated as an equi-join.
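A toy sketch of equi-join key extraction, assuming a minimal expression tree (`Attr`, `EqualTo`, `GreaterThan`, `And`) that stands in for Catalyst's expressions: split the condition into conjuncts, keep equalities with one attribute from each side as key pairs, and leave everything else as a residual filter.

```scala
import scala.collection.mutable.ListBuffer

object JoinKeys {
  // Tiny hypothetical expression tree; `side` is "left" or "right".
  sealed trait Expr
  case class Attr(name: String, side: String) extends Expr
  case class EqualTo(l: Expr, r: Expr) extends Expr
  case class GreaterThan(l: Expr, r: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr

  // Flattens nested ANDs into a list of conjuncts.
  def conjuncts(e: Expr): List[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other => List(other)
  }

  // Returns (leftKey, rightKey) pairs plus residual predicates that must
  // still be evaluated on joined rows.
  def extract(cond: Expr): (List[(Attr, Attr)], List[Expr]) = {
    val keys = ListBuffer.empty[(Attr, Attr)]
    val rest = ListBuffer.empty[Expr]
    conjuncts(cond).foreach {
      case EqualTo(a: Attr, b: Attr) if a.side == "left" && b.side == "right" => keys += ((a, b))
      case EqualTo(a: Attr, b: Attr) if a.side == "right" && b.side == "left" => keys += ((b, a))
      case other => rest += other
    }
    (keys.toList, rest.toList)
  }
}
```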

Author: Cheng Hao <hao.cheng@intel.com>

Closes #1190 from chenghao-intel/extract_join_keys and squashes the following commits:

4a1060a [Cheng Hao] Fix some of the small issues
ceb4924 [Cheng Hao] Remove the redundant pattern of join keys extraction
cec34e8 [Cheng Hao] Update the code style issues
dcc4584 [Cheng Hao] Extract the joinkeys from join condition

Strip '@' symbols when merging pull requests. · f1f7385a

Patrick Wendell authored 10 years ago

Currently all of the commits with '@X' in them cause person X to
receive e-mails every time someone makes a public fork of Spark.

This was requested by marmbrus.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #1239 from pwendell/strip and squashes the following commits:

22e5a97 [Patrick Wendell] Strip '@' symbols when merging pull requests.

Fixing AWS instance type information based upon current EC2 data · 62d4a0fa

Zichuan Ye authored 10 years ago

Fixed a problem in the previous version of the file, in which some information regarding AWS instance types was wrong. That information was updated based upon current AWS EC2 data.

Author: Zichuan Ye <jerry@tangentds.com>

Closes #1156 from jerry86/master and squashes the following commits:

ff36e95 [Zichuan Ye] Fixing AWS instance type information based upon current EC2 data

[SPARK-2286][UI] Report exception/errors for failed tasks that are not ExceptionFailure · 6587ef7c

Reynold Xin authored 10 years ago

Also added inline doc for each TaskEndReason.

Author: Reynold Xin <rxin@apache.org>

Closes #1225 from rxin/SPARK-2286 and squashes the following commits:

6a7959d [Reynold Xin] Fix unit test failure.
cf9d5eb [Reynold Xin] Merge branch 'master' into SPARK-2286
a61fae1 [Reynold Xin] Move to line above ...
38c7391 [Reynold Xin] [SPARK-2286][UI] Report exception/errors for failed tasks that are not ExceptionFailure.

[SPARK-2295] [SQL] Make JavaBeans nullability stricter. · 32a1ad75

Takuya UESHIN authored 10 years ago

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1235 from ueshin/issues/SPARK-2295 and squashes the following commits:

201c508 [Takuya UESHIN] Make JavaBeans nullability stricter.

Remove use of spark.worker.instances · 48a82a82

Kay Ousterhout authored 10 years ago

spark.worker.instances was added as part of this commit: https://github.com/apache/spark/commit/1617816090e7b20124a512a43860a21232ebf511

My understanding is that SPARK_WORKER_INSTANCES is supported for backwards compatibility,
but spark.worker.instances is never used (SparkSubmit.scala sets spark.executor.instances) so should
not have been added.

@sryza @pwendell @tgravescs LMK if I'm understanding this correctly

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1214 from kayousterhout/yarn_config and squashes the following commits:

3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances

[SPARK-2254] [SQL] ScalaReflection should mark primitive types as non-nullable. · e4899a25

Takuya UESHIN authored 10 years ago

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #1193 from ueshin/issues/SPARK-2254 and squashes the following commits:

cfd6088 [Takuya UESHIN] Modify ScalaRefection.schemaFor method to return nullability of Scala Type.
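A rough sketch of the idea, assuming a `ClassTag`-based check in place of the full Scala-reflection type analysis that `ScalaReflection.schemaFor` performs; `nullableFor` is a hypothetical name. Scala's `Int`, `Double`, and similar value types erase to JVM primitives, which cannot be null, so they can be reported as non-nullable.

```scala
import scala.reflect.ClassTag

object Nullability {
  // Hypothetical simplification: inspects only the erased runtime class.
  // JVM primitives (int, double, ...) can never hold null.
  def nullableFor[T](implicit ct: ClassTag[T]): Boolean =
    !ct.runtimeClass.isPrimitive
}
```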

[SPARK-2172] PySpark cannot import mllib modules in YARN-client mode · 441cdcca

Szul, Piotr authored 10 years ago

Include pyspark/mllib Python sources as resources in the mllib jar.
This way they will be included in the final assembly.

Author: Szul, Piotr <Piotr.Szul@csiro.au>

Closes #1223 from piotrszul/branch-1.0 and squashes the following commits:

69d5174 [Szul, Piotr] Removed unsed resource directory src/main/resource from mllib pom
f8c52a0 [Szul, Piotr] [SPARK-2172] PySpark cannot import mllib modules in YARN-client mode Include pyspark/mllib python sources as resources in the jar

(cherry picked from commit fa167194)
Signed-off-by: Reynold Xin <rxin@apache.org>

[SPARK-2284][UI] Mark all failed tasks as failures. · 4a346e24

Reynold Xin authored 10 years ago

Previously, only tasks that failed with an ExceptionFailure reason were marked as failures.
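A simplified sketch of the behavior change, assuming a trimmed-down `TaskEndReason` hierarchy (the real one has more cases and fields): any reason other than `Success` counts as a failure, and reasons other than `ExceptionFailure` still yield an error string for the UI. `isFailure` and `errorMessage` are hypothetical helpers.

```scala
object TaskFailures {
  // Trimmed-down, hypothetical subset of task end reasons.
  sealed trait TaskEndReason
  case object Success extends TaskEndReason
  case class ExceptionFailure(description: String) extends TaskEndReason
  case class FetchFailed(shuffleId: Int) extends TaskEndReason
  case object TaskKilled extends TaskEndReason

  // Before the fix only ExceptionFailure counted; now anything but Success does.
  def isFailure(r: TaskEndReason): Boolean = r != Success

  // Every failure gets some message, not just ExceptionFailure.
  def errorMessage(r: TaskEndReason): Option[String] = r match {
    case Success => None
    case ExceptionFailure(desc) => Some(desc)
    case other => Some(other.toString) // e.g. "FetchFailed(3)"
  }
}
```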

Author: Reynold Xin <rxin@apache.org>

Closes #1224 from rxin/SPARK-2284 and squashes the following commits:

be79dbd [Reynold Xin] [SPARK-2284][UI] Mark all failed tasks as failures.

Jun 25, 2014

[SPARK-1749] Job cancellation when SchedulerBackend does not implement killTask · b88a59a6

Mark Hamstra authored 10 years ago

This is a fixed up version of #686 (cc @markhamstra @pwendell). The last commit (the only one I authored) reflects the changes I made from Mark's original patch.

Author: Mark Hamstra <markhamstra@gmail.com>
Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1219 from kayousterhout/mark-SPARK-1749 and squashes the following commits:

42dfa7e [Kay Ousterhout] Got rid of terrible double-negative name
80b3205 [Kay Ousterhout] Don't notify listeners of job failure if it wasn't successfully cancelled.
d156d33 [Mark Hamstra] Do nothing in no-kill submitTasks
9312baa [Mark Hamstra] code review update
cc353c8 [Mark Hamstra] scalastyle
e61f7f8 [Mark Hamstra] Catch UnsupportedOperationException when DAGScheduler tries to cancel a job on a SchedulerBackend that does not implement killTask
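A minimal sketch of the cancellation logic described in the squashed commits above, assuming a hypothetical `SchedulerBackend` trait whose `killTask` may throw `UnsupportedOperationException`. `cancelJob` reports whether cancellation succeeded, so a caller can skip notifying listeners of job failure when it did not.

```scala
import scala.collection.mutable

object JobCancellation {
  // Hypothetical backend interface; some backends cannot kill tasks.
  trait SchedulerBackend { def killTask(taskId: Long): Unit }

  class NoKillBackend extends SchedulerBackend {
    def killTask(taskId: Long): Unit =
      throw new UnsupportedOperationException("killTask not supported")
  }

  class KillingBackend extends SchedulerBackend {
    val killed: mutable.Buffer[Long] = mutable.Buffer.empty[Long]
    def killTask(taskId: Long): Unit = killed += taskId
  }

  // True only if every running task was killed; on an unsupported backend
  // the exception is caught and the job is reported as not cancelled.
  def cancelJob(backend: SchedulerBackend, runningTasks: Seq[Long]): Boolean =
    try {
      runningTasks.foreach(backend.killTask)
      true
    } catch {
      case _: UnsupportedOperationException => false
    }
}
```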

[SPARK-2283][SQL] Reset test environment before running PruningSuite · 7f196b00

Cheng Lian authored 10 years ago

JIRA issue: [SPARK-2283](https://issues.apache.org/jira/browse/SPARK-2283)

If `PruningSuite` is run right after `HiveCompatibilitySuite`, the first test case fails because the `srcpart` table is cached in memory by `HiveCompatibilitySuite`, but column pruning is not yet implemented for the `InMemoryColumnarTableScan` operator.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1221 from liancheng/spark-2283 and squashes the following commits:

dc0b663 [Cheng Lian] SPARK-2283: reset test environment before running PruningSuite

[SQL] SPARK-1800 Add broadcast hash join operator & associated hints. · 9d824fed

Zongheng Yang authored 10 years ago

This PR is based off Michael's [PR 734](https://github.com/apache/spark/pull/734) and includes a bunch of cleanups.

Moreover, this PR also
- makes `SparkLogicalPlan` take a `tableName: String`, which facilitates testing.
- moves join-related tests to a single file.
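A minimal in-memory sketch of the hash join that underlies both `BroadcastHashJoin` and `ShuffledHashJoin`, assuming plain key/value pairs instead of Spark SQL rows: build a hash table from the smaller side, then stream the other side and probe it. In the broadcast variant the build side is shipped to every node, so the larger input need not be shuffled. `hashJoin` is a hypothetical helper.

```scala
object HashJoins {
  // Build a hash table on the (small) build side, then probe it with each
  // row of the streamed side. Output follows the streamed side's order.
  def hashJoin[K, A, B](build: Seq[(K, A)], stream: Seq[(K, B)]): Seq[(K, A, B)] = {
    val table: Map[K, Seq[A]] =
      build.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
    for {
      (k, b) <- stream
      a <- table.getOrElse(k, Seq.empty)
    } yield (k, a, b)
  }
}
```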

Author: Zongheng Yang <zongheng.y@gmail.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #1163 from concretevitamin/auto-broadcast-hash-join and squashes the following commits:

d0f4991 [Zongheng Yang] Fix bug in broadcast hash join & add test to cover it.
af080d7 [Zongheng Yang] Fix in joinIterators()'s next().
440d277 [Zongheng Yang] Fixes to imports; add back requiredChildDistribution (lost when merging)
208d5f6 [Zongheng Yang] Make LeftSemiJoinHash mix in HashJoin.
ad6c7cc [Zongheng Yang] Minor cleanups.
814b3bf [Zongheng Yang] Merge branch 'master' into auto-broadcast-hash-join
a8a093e [Zongheng Yang] Minor cleanups.
6fd8443 [Zongheng Yang] Cut down size estimation related stuff.
a4267be [Zongheng Yang] Add test for broadcast hash join and related necessary refactorings:
0e64b08 [Zongheng Yang] Scalastyle fix.
91461c2 [Zongheng Yang] Merge branch 'master' into auto-broadcast-hash-join
7c7158b [Zongheng Yang] Prototype of auto conversion to broadcast hash join.
0ad122f [Zongheng Yang] Merge branch 'master' into auto-broadcast-hash-join
3e5d77c [Zongheng Yang] WIP: giant and messy WIP.
a92ed0c [Michael Armbrust] Formatting.
76ca434 [Michael Armbrust] A simple strategy that broadcasts tables only when they are found in a configuration hint.
cf6b381 [Michael Armbrust] Split out generic logic for hash joins and create two concrete physical operators: BroadcastHashJoin and ShuffledHashJoin.
a8420ca [Michael Armbrust] Copy records in executeCollect to avoid issues with mutable rows.

[SPARK-2204] Launch tasks on the proper executors in mesos fine-grained mode · 1132e472

Sebastien Rainville authored 10 years ago

The scheduler for Mesos in fine-grained mode launches tasks on the wrong executors. `MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer])` assumes that `TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer])` returns task lists in the same order as the offers it was passed, but the current implementation of `TaskSchedulerImpl.resourceOffers` shuffles the offers to avoid always assigning tasks to the same executors. As a result, tasks are launched on the wrong executors. Jobs sometimes complete, but most of the time they fail. It seems that as soon as something goes wrong with a task, Spark cannot recover, because it is mistaken about where the tasks are actually running. Also, the more heavily the cluster is loaded, the more likely the job is to fail, because Spark is more likely to try to launch a task on a slave that doesn't actually have enough resources, again because it is using the wrong offers.

The solution is to not assume that the tasks are returned in the same order as the offers, and simply launch each task on the executor chosen by `TaskSchedulerImpl.resourceOffers`. One thing I am not sure about: I treated slaveId and executorId as the same, which is true at least in my setup, but I don't know whether that always holds.
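A minimal sketch of the position-independent assignment, assuming, as the author did, that the slave ID equals the executor ID. The `Offer` and `TaskDesc` types and `tasksPerOffer` are hypothetical simplifications of the Mesos and Spark types; only the idea of a HashMap from slave to offer index comes from the squashed commit list.

```scala
import scala.collection.mutable

object OfferAssignment {
  // Hypothetical simplified types.
  case class Offer(slaveId: String)
  case class TaskDesc(taskId: Long, executorId: String)

  // The scheduler may return task lists in a different order than the
  // offers, so instead of pairing by position, each task is routed to the
  // offer whose slave matches the executor the scheduler chose.
  def tasksPerOffer(offers: Seq[Offer], scheduled: Seq[Seq[TaskDesc]]): Map[Int, Seq[Long]] = {
    val offerIndex = mutable.HashMap.empty[String, Int]
    offers.zipWithIndex.foreach { case (o, i) => offerIndex(o.slaveId) = i }
    scheduled.flatten
      .groupBy(t => offerIndex(t.executorId))
      .map { case (i, ts) => i -> ts.map(_.taskId) }
  }
}
```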

I tested this on top of the 1.0.0 release and it seems to work fine on our cluster.

Author: Sebastien Rainville <sebastien@hopper.com>

Closes #1140 from sebastienrainville/fine-grained-mode-fix-master and squashes the following commits:

a98b0e0 [Sebastien Rainville] Use a HashMap to retrieve the offer indices
d6ffe54 [Sebastien Rainville] Launch tasks on the proper executors in mesos fine-grained mode

[SPARK-2270] Kryo cannot serialize results returned by asJavaIterable · 7ff2c754

Reynold Xin authored 10 years ago

As a result, groupBy/cogroup are broken in the Java APIs when Kryo is used.

@pwendell this should be merged into 1.0.1.

Thanks @sorenmacbeth for reporting this & helping out with the fix.

Author: Reynold Xin <rxin@apache.org>

Closes #1206 from rxin/kryo-iterable-2270 and squashes the following commits:

09da0aa [Reynold Xin] Updated the comment.
009bf64 [Reynold Xin] [SPARK-2270] Kryo cannot serialize results returned by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo is used).

[SPARK-2258 / 2266] Fix a few worker UI bugs · 9aa60329

Andrew Or authored 10 years ago

**SPARK-2258.** Worker UI displays zombie processes if the executor throws an exception before a process is launched. This is because we only inform the Worker of the change if the process is already launched, which in this case it isn't.

**SPARK-2266.** We expose "Some(app-id)" on the log page. This is fairly minor.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1213 from andrewor14/fix-worker-ui and squashes the following commits:

c1223fe [Andrew Or] Fix worker UI bugs

[SPARK-2242] HOTFIX: pyspark shell hangs on simple job · 5603e4c4

Andrew Or authored 10 years ago

This reverts a change introduced in 38702487, which redirected all stderr to the OS pipe instead of directly to the `bin/pyspark` shell output. This causes a simple job to hang in two ways:

1. If the cluster is not configured correctly or does not have enough resources, the job hangs without producing any output, because the relevant warning messages are masked.
2. If the stderr volume is large, this could lead to a deadlock if we redirect everything to the OS pipe. From the [python docs](https://docs.python.org/2/library/subprocess.html):

```
Note Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock
based on the child process output volume. Use Popen with the communicate() method
when you need pipes.
```

Note that we cannot remove `stdout=PIPE` in a similar way, because we currently use it to communicate the py4j port. However, it should be fine (as it has been for a long time) because we do not produce a ton of traffic through `stdout`.
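The same principle can be sketched on the JVM with `java.lang.ProcessBuilder`; this is an analogue, not the PySpark fix itself, which is in Python. Only stdout is piped (to read a single line, much as PySpark reads the py4j port), while stderr is inherited by the parent, so warnings stay visible and a chatty child cannot block on a full, unread stderr pipe. `launchAndReadFirstLine` is a hypothetical name, and the usage assumes a POSIX `sh`.

```scala
import java.io.{BufferedReader, InputStreamReader}

object ChildLauncher {
  // Starts `command`, pipes only its stdout back to us, and sends its
  // stderr straight to our own stderr (Redirect.INHERIT).
  // Returns the first line the child writes to stdout.
  def launchAndReadFirstLine(command: Seq[String]): String = {
    val cmd = new java.util.ArrayList[String]()
    command.foreach(c => cmd.add(c))
    val pb = new ProcessBuilder(cmd)
    pb.redirectError(ProcessBuilder.Redirect.INHERIT)
    val proc = pb.start()
    val reader = new BufferedReader(new InputStreamReader(proc.getInputStream))
    try reader.readLine()
    finally {
      reader.close()
      proc.waitFor()
    }
  }
}
```

For example, `launchAndReadFirstLine(Seq("sh", "-c", "echo 12345; echo warning 1>&2"))` returns `"12345"`, while `warning` appears directly on the parent's stderr.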

That commit was not merged in branch-1.0, so this fix is for master only.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1178 from andrewor14/fix-python and squashes the following commits:

e68e870 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-python
20849a8 [Andrew Or] Tone down stdout interference message
a09805b [Andrew Or] Return more than 1 line of error message to user
6dfbd1e [Andrew Or] Don't swallow original exception
0d1861f [Andrew Or] Provide more helpful output if stdout is garbled
21c9d7c [Andrew Or] Do not mask stderr from output

Replace doc reference to Shark with Spark SQL. · ac06a85d

Reynold Xin authored 10 years ago

SPARK-2038: rename "conf" parameters in the saveAsHadoop functions with source-compatibility · acc01ab3

CodingCat authored 10 years ago

https://issues.apache.org/jira/browse/SPARK-2038

Renames the "conf" parameters to differentiate them from the SparkConf object, while keeping source-level compatibility.

Author: CodingCat <zhunansjtu@gmail.com>

Closes #1137 from CodingCat/SPARK-2038 and squashes the following commits:

11abeba [CodingCat] revise the comments
7ee5712 [CodingCat] to keep the source-compatibility
763975f [CodingCat] style fix
d91288d [CodingCat] rename "conf" parameters in the saveAsHadoop functions

[BUGFIX][SQL] Should match java.math.BigDecimal when unwrapping Hive output · 22036aeb

Cheng Lian authored 10 years ago

The `BigDecimal` branch in `unwrap` matches `scala.math.BigDecimal` rather than `java.math.BigDecimal`.
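A minimal sketch of the corrected branch, assuming `unwrap` converts Hive's `java.math.BigDecimal` into Scala's `BigDecimal`. An unqualified `BigDecimal` in a Scala pattern refers to `scala.math.BigDecimal`, so a Java instance falls through it; naming the Java type explicitly fixes the match. `HiveUnwrap` is a hypothetical container.

```scala
object HiveUnwrap {
  // `case b: BigDecimal` would mean scala.math.BigDecimal and never match
  // the java.math.BigDecimal values Hive returns, so name the Java type.
  def unwrap(value: Any): Any = value match {
    case null => null
    case b: java.math.BigDecimal => BigDecimal(b) // wrap as scala.math.BigDecimal
    case other => other
  }
}
```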

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1199 from liancheng/javaBigDecimal and squashes the following commits:

e9bb481 [Cheng Lian] Should match java.math.BigDecimal when wnrapping Hive output

[SPARK-2263][SQL] Support inserting MAP<K, V> to Hive tables · 8fade897

Cheng Lian authored 10 years ago

JIRA issue: [SPARK-2263](https://issues.apache.org/jira/browse/SPARK-2263)

Map objects were not converted to Hive types before inserting into Hive tables.
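A minimal sketch of the missing conversion, assuming the Hive side expects a `java.util.Map` and that nested keys and values need the same treatment. `HiveWrap.wrap` is a hypothetical helper, not Spark's actual wrapper function.

```scala
object HiveWrap {
  // Recursively converts Scala Maps into java.util.HashMap instances;
  // all other values pass through unchanged.
  def wrap(value: Any): Any = value match {
    case m: scala.collection.Map[_, _] =>
      val out = new java.util.HashMap[Any, Any]()
      m.foreach { case (k, v) => out.put(wrap(k), wrap(v)) }
      out
    case other => other
  }
}
```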

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1205 from liancheng/spark-2263 and squashes the following commits:

c7a4373 [Cheng Lian] Addressed @concretevitamin's comment
784940b [Cheng Lian] SARPK-2263: support inserting MAP<K, V> to Hive tables

Jun 24, 2014

SPARK-2248: spark.default.parallelism does not apply in local mode · b6b44853

witgo authored 10 years ago

Author: witgo <witgo@qq.com>

Closes #1194 from witgo/SPARK-2248 and squashes the following commits:

6ac950b [witgo] spark.default.parallelism does not apply in local mode

Fix possible null pointer in accumulator toString · 2714968e

Michael Armbrust authored 10 years ago

Author: Michael Armbrust <michael@databricks.com>

Closes #1204 from marmbrus/nullPointerToString and squashes the following commits:

35b5fce [Michael Armbrust] Fix possible null pointer in acumulator toString

Autodetect JAVA_HOME on RPM-based systems · 54055fb2

Matthew Farrellee authored 10 years ago

Author: Matthew Farrellee <matt@redhat.com>

Closes #1185 from mattf/master-1 and squashes the following commits:

42150fc [Matthew Farrellee] Autodetect JAVA_HOME on RPM-based systems

[SQL]Add base row updating methods for JoinedRow · 133495d8

Cheng Hao authored 10 years ago

This will be helpful in join operators.
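A minimal sketch of why update methods help, assuming a row is just an `IndexedSeq[Any]`: a join operator can keep one `JoinedRow` and swap its halves per match instead of allocating a new object each time. The `withLeft`/`withRight` names and the `Rows` wrapper are illustrative assumptions, not necessarily the methods this commit added.

```scala
object Rows {
  type Row = IndexedSeq[Any]

  // A view over two rows; the with* methods replace one side in place and
  // return `this`, so callers can chain them and reuse a single instance.
  final class JoinedRow(private var left: Row, private var right: Row) {
    def this() = this(IndexedSeq.empty, IndexedSeq.empty)
    def withLeft(newLeft: Row): JoinedRow = { left = newLeft; this }
    def withRight(newRight: Row): JoinedRow = { right = newRight; this }
    def length: Int = left.length + right.length
    def apply(i: Int): Any = if (i < left.length) left(i) else right(i - left.length)
  }
}
```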

Author: Cheng Hao <hao.cheng@intel.com>

Closes #1187 from chenghao-intel/joinedRow and squashes the following commits:

87c19e3 [Cheng Hao] Add base row set methods for JoinedRow

[SPARK-1112, 2156] Bootstrap to fetch the driver's Spark properties. · 8ca41769

Xiangrui Meng authored 10 years ago

This is an alternative solution to #1124. Before launching the executor backend, we first fetch the driver's Spark properties and use them to overwrite the executor's Spark properties. This should be better than #1124.
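A minimal sketch of the overwrite rule, assuming both property sets are available as `Map[String, String]`: driver values win on key collisions, and executor-only keys are kept. `effectiveExecutorConf` and the keys used in the check below are hypothetical.

```scala
object ConfBootstrap {
  // Properties fetched from the driver override the executor's local ones;
  // with `++`, the right-hand map wins on key collisions.
  def effectiveExecutorConf(
      executorProps: Map[String, String],
      driverProps: Map[String, String]): Map[String, String] =
    executorProps ++ driverProps
}
```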

@pwendell Are there spark properties that might be different on the driver and on the executors?

Author: Xiangrui Meng <meng@databricks.com>

Closes #1132 from mengxr/akka-bootstrap and squashes the following commits:

77ff32d [Xiangrui Meng] organize imports
68e1dfb [Xiangrui Meng] use timeout from AkkaUtils; remove props from RegisteredExecutor
46d332d [Xiangrui Meng] fix a test
7947c18 [Xiangrui Meng] increase slack size for akka
4ab696a [Xiangrui Meng] bootstrap to retrieve driver spark conf

[SPARK-2264][SQL] Fix failing CachedTableSuite · a162c9b3

Michael Armbrust authored 10 years ago

Author: Michael Armbrust <michael@databricks.com>

Closes #1201 from marmbrus/fixCacheTests and squashes the following commits:

9d87ed1 [Michael Armbrust] Use analyzer (which runs to fixed point) instead of manually removing analysis operators.

Fix broken Json tests. · 1978a903

Kay Ousterhout authored 10 years ago

The assertJsonStringEquals method was missing an "assert" so
did not actually check that the strings were equal. This commit
adds the missing assert and fixes subsequently revealed problems
with the JsonProtocolSuite.
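A minimal sketch of the bug and its fix. `JsonAssertions` is hypothetical, and comparing whitespace-stripped strings is a simplification (it would also strip spaces inside string values); the real suite compares JSON. The buggy shape computes a `Boolean` and discards it, so it can never fail.

```scala
object JsonAssertions {
  // Buggy shape: the comparison result is thrown away, so nothing is checked.
  def assertJsonStringEqualsBuggy(expected: String, actual: String): Unit = {
    expected.filterNot(_.isWhitespace) == actual.filterNot(_.isWhitespace)
  }

  // Fixed shape: the comparison is actually asserted.
  def assertJsonStringEquals(expected: String, actual: String): Unit = {
    assert(expected.filterNot(_.isWhitespace) == actual.filterNot(_.isWhitespace),
      s"JSON mismatch: $expected vs $actual")
  }
}
```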

@andrewor14 I changed some of the test functionality to match what it
looks like you intended based on the expected strings -- let me know if
anything here looks wrong.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #1198 from kayousterhout/json_test_fix and squashes the following commits:

77f858f [Kay Ousterhout] Fix broken Json tests.

HOTFIX: Disabling tests per SPARK-2264 · 221909e6

Patrick Wendell authored 10 years ago

SPARK-1937: fix issue with task locality · 924b7082

Rui Li authored 10 years ago

Don't check executor/host availability when creating a TaskSetManager. The executors may not have registered yet when the TaskSetManager is created, in which case all tasks would be considered to have no preferred locations, losing data locality in later scheduling.
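A minimal sketch of the fix's idea, assuming tasks are identified by index and preferred locations are host names: index pending tasks by every preferred host at creation time, without filtering by which hosts are currently alive, so tasks whose executors register later keep their locality. `Locality`, `pendingByHost`, and `tasksFor` are hypothetical.

```scala
object Locality {
  // Index tasks by preferred host with no availability check, so a host
  // that registers later still finds its local tasks waiting.
  def pendingByHost(prefs: Map[Int, Seq[String]]): Map[String, Seq[Int]] =
    prefs.toSeq
      .flatMap { case (task, hosts) => hosts.map(h => h -> task) }
      .groupBy(_._1)
      .map { case (h, pairs) => h -> pairs.map(_._2).sorted }

  // Tasks that prefer `host`, e.g. once its executor has registered.
  def tasksFor(host: String, pending: Map[String, Seq[Int]]): Seq[Int] =
    pending.getOrElse(host, Seq.empty)
}
```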

Author: Rui Li <rui.li@intel.com>
Author: lirui-intel <rui.li@intel.com>

Closes #892 from lirui-intel/delaySchedule and squashes the following commits:

8444d7c [Rui Li] fix code style
fafd57f [Rui Li] keep locality constraints within the valid levels
18f9e05 [Rui Li] restrict allowed locality
5b3fb2f [Rui Li] refine UT
99f843e [Rui Li] add unit test and fix bug
fff4123 [Rui Li] fix computing valid locality levels
685ed3d [Rui Li] remove delay shedule for pendingTasksWithNoPrefs
7b0177a [Rui Li] remove redundant code
c7b93b5 [Rui Li] revise patch
3d7da02 [lirui-intel] Update TaskSchedulerImpl.scala
cab4c71 [Rui Li] revised patch
539a578 [Rui Li] fix code style
cf0d6ac [Rui Li] fix code style
3dfae86 [Rui Li] re-compute pending tasks when new host is added
a225ac2 [Rui Li] SPARK-1937: fix issue with task locality

[SPARK-2252] Fix MathJax for HTTPs. · 420c1c3e

Reynold Xin authored 10 years ago

Found out about this from the Hacker News link to GraphX, which was using HTTPS.

@mengxr

Author: Reynold Xin <rxin@apache.org>

Closes #1189 from rxin/mllib-doc and squashes the following commits:

5328be0 [Reynold Xin] [SPARK-2252] Fix MathJax for HTTPs.

Jun 23, 2014

[SPARK-2124] Move aggregation into shuffle implementations · 56eb8af1

jerryshao authored 10 years ago

This PR is a sub-task of SPARK-2044 to move the execution of aggregation into shuffle implementations.

I leave `CoGroupedRDD` and `SubtractedRDD` unchanged because they have their own aggregation implementations, and I'm not sure whether it is suitable to change these two RDDs.

I also do not move the sort-related code of `OrderedRDDFunctions` into shuffle; this will be addressed in another sub-task.
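A minimal sketch of what moving aggregation into the shuffle means, assuming a combineByKey-style `Aggregator` with `createCombiner`, `mergeValue`, and `mergeCombiners`: the writer combines values per key within each map output, and the reader merges the partial combiners it fetches. `writeCombined` and `readMerged` are hypothetical names, not Spark's shuffle interfaces.

```scala
object ShuffleAggregation {
  final case class Aggregator[K, V, C](
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C)

  // Writer side: combine values per key within one map output.
  def writeCombined[K, V, C](records: Iterator[(K, V)], agg: Aggregator[K, V, C]): Map[K, C] =
    records.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(agg.mergeValue(_, v)).getOrElse(agg.createCombiner(v)))
    }

  // Reader side: merge the partial combiners fetched from every map output.
  def readMerged[K, V, C](blocks: Seq[Map[K, C]], agg: Aggregator[K, V, C]): Map[K, C] =
    blocks.foldLeft(Map.empty[K, C]) { (acc, block) =>
      block.foldLeft(acc) { case (a, (k, c)) =>
        a.updated(k, a.get(k).map(agg.mergeCombiners(_, c)).getOrElse(c))
      }
    }
}
```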

Author: jerryshao <saisai.shao@intel.com>

Closes #1064 from jerryshao/SPARK-2124 and squashes the following commits:

4a05a40 [jerryshao] Modify according to comments
1f7dcc8 [jerryshao] Style changes
50a2fd6 [jerryshao] Fix test suite issue after moving aggregator to Shuffle reader and writer
1a96190 [jerryshao] Code modification related to the ShuffledRDD
308f635 [jerryshao] initial works of move combiner to ShuffleManager's reader and writer

[SPARK-2227] Support dfs command in SQL. · 51c81683

Reynold Xin authored 10 years ago

Note that nothing gets printed to the console because we don't properly maintain session state right now.

I will have a followup PR that fixes it.

Author: Reynold Xin <rxin@apache.org>

Closes #1167 from rxin/commands and squashes the following commits:

56f04f8 [Reynold Xin] [SPARK-2227] Support dfs command in SQL.

Cleanup on Connection, ConnectionManagerId, ConnectionManager classes part 2 · 383bf72c

Henry Saputra authored 10 years ago

Cleanup on the Connection, ConnectionManagerId, and ConnectionManager classes, part 2, done while I was working on the code there, to help the IDE:
1. Remove unused imports.
2. Remove parentheses in calls to methods that have no side effects.
3. Add parentheses in calls to methods that do have side effects or are not simple getters of object properties.
4. Replace if-else checks (via isInstanceOf) on the Connection class type with a Scala expression, for consistency and cleanliness.
5. Remove semicolons.
6. Remove extra spaces.
7. Remove redundant returns for consistency.

Author: Henry Saputra <henry.saputra@gmail.com>

Closes #1157 from hsaputra/cleanup_connection_classes_part2 and squashes the following commits:

4be6906 [Henry Saputra] Fix Spark Scala style for line over 100 chars.
85b24f7 [Henry Saputra] Cleanup on Connection and ConnectionManager classes part 2 while I was working at the code there to help IDE: 1. Remove unused imports 2. Remove parentheses in method calls that do not have side affect. 3. Add parentheses in method calls that do have side effect. 4. Change if-else check (via isInstanceOf) for Connection class type with Scala expression for consitency and cleanliness. 5. Remove semicolon 6. Remove extra spaces.

[SPARK-1768] History server enhancements. · 21ddd7d1

Marcelo Vanzin authored 10 years ago

Two improvements to the history server:

- Separate the HTTP handling from history fetching, so that it's easy to add
  new backends later (thinking about SPARK-1537 in the long run)

- Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
  memory for faster access. This allows the app limit to go away, since holding
  just the listing in memory shouldn't be too expensive unless the user has millions
  of completed apps in the history (at which point I'd expect other issues to arise
  aside from history server memory usage, such as FileSystem.listStatus()
  starting to become ridiculously expensive).

I also fixed a few minor things along the way which aren't really worth mentioning.
I also removed the app's log path from the UI since that information may not even
exist depending on which backend is used (even though there is only one now).
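A minimal sketch of the "keep a few UIs in memory, load the rest lazily" approach, assuming an LRU policy built on Java's access-ordered `LinkedHashMap`; the eviction policy is an assumption here, not something stated above. `LruUiCache` and its `load` function (standing in for rebuilding an app UI from its event log) are hypothetical.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Access-ordered map that evicts the least recently used entry once it
// holds more than `capacity` entries; misses call `load` lazily.
class LruUiCache[K, V](capacity: Int, load: K => V) {
  private val cache = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean = size() > capacity
  }

  def get(key: K): V = Option(cache.get(key)).getOrElse {
    val v = load(key)
    cache.put(key, v)
    v
  }

  // Cached keys, least recently used first.
  def cachedKeys: List[K] = {
    val it = cache.keySet().iterator()
    var out = List.empty[K]
    while (it.hasNext) out = it.next() :: out
    out.reverse
  }
}
```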

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #718 from vanzin/hist-server and squashes the following commits:

53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording.
c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs.
dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration.
4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo.
2a7f68d [Marcelo Vanzin] Address review feedback.
4e72c77 [Marcelo Vanzin] Remove comment about ordering.
249bcea [Marcelo Vanzin] Remove offset / count from provider interface.
ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps.
6e2432f [Marcelo Vanzin] Second round of feedback.
b2c570a [Marcelo Vanzin] Make class package-private.
4406f61 [Marcelo Vanzin] Cosmetic change to listing header.
e852149 [Marcelo Vanzin] Initialize new app array to expected size.
e8026f4 [Marcelo Vanzin] Review feedback.
49d2fd3 [Marcelo Vanzin] Fix a comment.
91e96ca [Marcelo Vanzin] Fix scalastyle issues.
6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info.
eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down.
bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI.
b284478 [Marcelo Vanzin] Separate history server from history backend.
