Commits · d12c0711faa3d4333513fcbbbee4868bcb784a26 · cs525-sp18-g07 / spark

Dec 16, 2014

[SPARK-3405] add subnet-id and vpc-id options to spark_ec2.py · d12c0711

Mike Jennings authored 10 years ago

Based on this gist:
https://gist.github.com/amar-analytx/0b62543621e1f246c0a2

We use security group ids instead of security group to get around this issue:
https://github.com/boto/boto/issues/350

Author: Mike Jennings <mvj101@gmail.com>
Author: Mike Jennings <mvj@google.com>

Closes #2872 from mvj101/SPARK-3405 and squashes the following commits:

be9cb43 [Mike Jennings] `pep8 spark_ec2.py` runs cleanly.
4dc6756 [Mike Jennings] Remove duplicate comment
731d94c [Mike Jennings] Update for code review.
ad90a36 [Mike Jennings] Merge branch 'master' of https://github.com/apache/spark into SPARK-3405
1ebffa1 [Mike Jennings] Merge branch 'master' into SPARK-3405
52aaeec [Mike Jennings] [SPARK-3405] add subnet-id and vpc-id options to spark_ec2.py

[SPARK-4855][mllib] testing the Chi-squared hypothesis test · cb484474

jbencook authored 10 years ago

This PR tests the PySpark Chi-squared hypothesis test from this commit: c8abddc5 and moves some of the error messaging into Python.

It is a port of the Scala tests here: [HypothesisTestSuite.scala](https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala)

Hopefully, SPARK-2980 can be closed.

Author: jbencook <jbenjamincook@gmail.com>

Closes #3679 from jbencook/master and squashes the following commits:

44078e0 [jbencook] checking that bad input throws the correct exceptions
f12ee10 [jbencook] removing checks for ValueError since input tests are on the Scala side
7536cf1 [jbencook] removing python checks for invalid input
a17ee84 [jbencook] [SPARK-2980][mllib] adding unit tests for the pyspark chi-squared test
3aeb0d9 [jbencook] [SPARK-2980][mllib] bringing Chi-squared error messages to the python side

[SPARK-4437] update doc for WholeCombineFileRecordReader · ed362008

Davies Liu authored 10 years ago

update doc for WholeCombineFileRecordReader

Author: Davies Liu <davies@databricks.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #3301 from davies/fix_doc and squashes the following commits:

1d7422f [Davies Liu] Merge pull request #2 from JoshRosen/whole-text-file-cleanup
dc3d21a [Josh Rosen] More genericization in ConfigurableCombineFileRecordReader.
95d13eb [Davies Liu] address comment
bf800b9 [Davies Liu] update doc for WholeCombineFileRecordReader

[SPARK-4841] fix zip with textFile() · c246b95d

Davies Liu authored 10 years ago

UTF8Deserializer cannot be used in BatchedSerializer, so always use PickleSerializer() when changing batchSize in zip().

Also, if two RDDs already have the same batch size, they no longer need to be re-serialized.
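
As a rough illustration of why batch size matters for `zip()`, here is a plain-Python model rather than PySpark code. The helpers `batched`, `zip_batches`, and `rebatch` are hypothetical names for this sketch and are not PySpark internals. It assumes each side is a stream of batches that are paired batch by batch, so the two sides must use the same batch size before pairing.

```python
from itertools import islice


def batched(items, batch_size):
    """Split items into lists of at most batch_size elements (hypothetical helper)."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def zip_batches(left, right):
    """Pair two batched streams batch by batch; only valid if batch sizes match."""
    for a, b in zip(left, right):
        if len(a) != len(b):
            raise ValueError("batches have different sizes; re-batch first")
        yield from zip(a, b)


def rebatch(batches, batch_size):
    """Flatten a batched stream and split it again with a new batch size."""
    return batched((x for batch in batches for x in batch), batch_size)


numbers = list(batched(range(6), 2))   # three batches of two elements
letters = list(batched("abcdef", 3))   # two batches of three elements

# The batch sizes differ, so align them before pairing.
pairs = list(zip_batches(rebatch(numbers, 3), letters))
```

If both sides already share a batch size, the `rebatch` step can be skipped, which mirrors the second point of the commit message.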

Author: Davies Liu <davies@databricks.com>

Closes #3706 from davies/fix_4841 and squashes the following commits:

20ce3a3 [Davies Liu] fix bug in _reserialize()
e3ebf7c [Davies Liu] add comment
379d2c8 [Davies Liu] fix zip with textFile()

[SPARK-4792] Add error message when making local dir unsuccessfully · c7628771

meiyoula authored 10 years ago

Author: meiyoula <1039320815@qq.com>

Closes #3635 from XuTingjun/master and squashes the following commits:

dd1c66d [meiyoula] when old is deleted, it will throw an exception where call it
2a55bc2 [meiyoula] Update DiskBlockManager.scala
1483a4a [meiyoula] Delete multiple retries to make dir
67f7902 [meiyoula] Try some times to make dir maybe more reasonable
1c51a0c [meiyoula] Update DiskBlockManager.scala

Dec 15, 2014

SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionError from... · 81112e4b

Sean Owen authored 10 years ago

SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionError from Hive's LazyBinaryInteger

This enables assertions for the Maven and SBT build, but overrides the Hive module to not enable assertions.

Author: Sean Owen <sowen@cloudera.com>

Closes #3692 from srowen/SPARK-4814 and squashes the following commits:

caca704 [Sean Owen] Disable assertions just for Hive
f71e783 [Sean Owen] Enable assertions for SBT and Maven build

[Minor][Core] fix comments in MapOutputTracker · 5c24759d

wangfei authored 10 years ago

Using "driver" and "executor" in the comments of `MapOutputTracker` is clearer.

Author: wangfei <wangfei1@huawei.com>

Closes #3700 from scwf/commentFix and squashes the following commits:

aa68524 [wangfei] master and worker should be driver and executor

SPARK-785 [CORE] ClosureCleaner not invoked on most PairRDDFunctions · 2a28bc61

Sean Owen authored 10 years ago

This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it.

Author: Sean Owen <sowen@cloudera.com>

Closes #3690 from srowen/SPARK-785 and squashes the following commits:

8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey

[SPARK-4668] Fix some documentation typos. · 8176b7a0

Ryan Williams authored 10 years ago

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #3523 from ryan-williams/tweaks and squashes the following commits:

d2eddaa [Ryan Williams] code review feedback
ce27fc1 [Ryan Williams] CoGroupedRDD comment nit
c6cfad9 [Ryan Williams] remove unnecessary if statement
b74ea35 [Ryan Williams] comment fix
b0221f0 [Ryan Williams] fix a gendered pronoun
c71ffed [Ryan Williams] use names on a few boolean parameters
89954aa [Ryan Williams] clarify some comments in {Security,Shuffle}Manager
e465dac [Ryan Williams] Saved building-spark.md with Dillinger.io
83e8358 [Ryan Williams] fix pom.xml typo
dc4662b [Ryan Williams] typo fixes in tuning.md, configuration.md

[SPARK-1037] The name of findTaskFromList & findTask in TaskSetManager.scala is confusing · 38703bbc

Ilya Ganelin authored 10 years ago

Hi all - I've renamed the methods referenced in this JIRA to clarify that they modify the provided arrays (find vs. dequeue).

Author: Ilya Ganelin <ilya.ganelin@capitalone.com>

Closes #3665 from ilganeli/SPARK-1037B and squashes the following commits:

64c177c [Ilya Ganelin] Renamed deque to dequeue
f27d85e [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters
683482a [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters

[SPARK-4826] Fix generation of temp file names in WAL tests · f6b8591a

Josh Rosen authored 10 years ago

This PR should fix SPARK-4826, an issue where a bug in how we generate temp. file names was causing spurious test failures in the write ahead log suites.

Closes #3695.
Closes #3701.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3704 from JoshRosen/SPARK-4826 and squashes the following commits:

f2307f5 [Josh Rosen] Use Spark Utils class for directory creation/deletion
a693ddb [Josh Rosen] remove unused Random import
b275e41 [Josh Rosen] Move creation of temp. dir to beforeEach/afterEach.
9362919 [Josh Rosen] [SPARK-4826] Fix bug in generation of temp file names. in WAL suites.
86c1944 [Josh Rosen] Revert "HOTFIX: Disabling failing block manager test"

[SPARK-4494][mllib] IDFModel.transform() add support for single vector · 8098fab0

Yuu ISHIKAWA authored 10 years ago

I improved `IDFModel.transform` to allow using a single vector.

[[SPARK-4494] IDFModel.transform() add support for single vector - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-4494)

Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #3603 from yu-iskw/idf and squashes the following commits:

256ff3d [Yuu ISHIKAWA] Fix typo
a3bf566 [Yuu ISHIKAWA] - Fix typo - Optimize import order - Aggregate the assertion tests - Modify `IDFModel.transform` API for pyspark
d25e49b [Yuu ISHIKAWA] Add the implementation of `IDFModel.transform` for a term frequency vector

HOTFIX: Disabling failing block manager test · 4c067387

Patrick Wendell authored 10 years ago

Dec 14, 2014

fixed spelling errors in documentation · 2a2983f7

Peter Klipfel authored 10 years ago

changed "form" to "from" in 3 documentation entries for Kafka integration

Author: Peter Klipfel <peter@klipfel.me>

Closes #3691 from peterklipfel/master and squashes the following commits:

0fe7fc5 [Peter Klipfel] fixed spelling errors in documentation

Dec 12, 2014

MAINTENANCE: Automated closing of pull requests. · ef84dab8

Patrick Wendell authored 10 years ago

This commit exists to close the following pull requests on Github:

Closes #3488 (close requested by 'pwendell')
Closes #2939 (close requested by 'marmbrus')
Closes #3173 (close requested by 'marmbrus')

[SPARK-4829] [SQL] add rule to fold count(expr) if expr is not null · 41a3f934

Daoyuan Wang authored 10 years ago

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3676 from adrian-wang/countexpr and squashes the following commits:

dc5765b [Daoyuan Wang] add rule to fold count(expr) if expr is not null

[SPARK-4742][SQL] The name of Parquet File generated by... · 8091dd62

Sasaki Toru authored 10 years ago

[SPARK-4742][SQL] The name of Parquet File generated by AppendingParquetOutputFormat should be zero padded

When I write a Parquet file as an output file using ParquetOutputFormat#getDefaultWorkFile, the file name is not zero-padded, while RDD#saveAsText does zero padding.
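
One practical reason zero padding matters is that padded names sort lexicographically in partition order, while unpadded names do not. The sketch below is plain Python. The `part-r-<n>.parquet` pattern and the padding width of 5 are assumptions made for illustration and are not taken from the patch.

```python
def part_name(partition, width=5):
    """Build a zero-padded part-file name (hypothetical naming pattern)."""
    return "part-r-" + str(partition).zfill(width) + ".parquet"


partitions = (1, 2, 10)

# Without padding, string sorting puts partition 10 before partition 2.
unpadded = sorted("part-r-{}.parquet".format(i) for i in partitions)

# With padding, string order and numeric order agree.
padded = sorted(part_name(i) for i in partitions)
```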

Author: Sasaki Toru <sasakitoa@nttdata.co.jp>

Closes #3602 from sasakitoa/parquet-zeroPadding and squashes the following commits:

6b0e58f [Sasaki Toru] Merge branch 'master' of git://github.com/apache/spark into parquet-zeroPadding
20dc79d [Sasaki Toru] Fixed the name of Parquet File generated by AppendingParquetOutputFormat

[SPARK-4825] [SQL] CTAS fails to resolve when created using saveAsTable · 0abbff28

Cheng Hao authored 10 years ago

Fixes a bug triggered by a query like the following:
```
  test("save join to table") {
    val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString))
    sql("CREATE TABLE test1 (key INT, value STRING)")
    testData.insertInto("test1")
    sql("CREATE TABLE test2 (key INT, value STRING)")
    testData.insertInto("test2")
    testData.insertInto("test2")
    sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test")
    checkAnswer(
      table("test"),
      sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
  }
```

Author: Cheng Hao <hao.cheng@intel.com>

Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits:

e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS
e004895 [Cheng Hao] fix bug

[SQL] enable empty aggr test case · cbb634ae

Daoyuan Wang authored 10 years ago

This is fixed by SPARK-4318 #3184

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3445 from adrian-wang/emptyaggr and squashes the following commits:

982575e [Daoyuan Wang] enable empty aggr test case

[SPARK-4828] [SQL] sum and avg on empty table should always return null · acb3be6b

Daoyuan Wang authored 10 years ago

So the optimizations are not valid. Also, I think the case these optimizations target is rarely encountered, so removing them should not affect performance.

Can we merge #3445 before I add a comparison test case from this?
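
The semantics named in the title can be checked against any standard SQL engine. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so it demonstrates the general SQL behavior and not Spark SQL itself. The table name `t` and column `x` are made up for the example.

```python
import sqlite3

# An in-memory database with an empty table (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# SUM and AVG over zero rows yield NULL (None in Python); COUNT yields 0.
row = conn.execute("SELECT SUM(x), AVG(x), COUNT(x) FROM t").fetchone()
conn.close()
```

This is why an optimization that rewrites `SUM` or `AVG` into a constant numeric result is unsafe when the input may be empty.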

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3675 from adrian-wang/sumempty and squashes the following commits:

42df763 [Daoyuan Wang] sum and avg on empty table should always return null

[SQL] Remove unnecessary case in HiveContext.toHiveString · d8cf6785

scwf authored 10 years ago

A follow-up to #3547
/cc marmbrus

Author: scwf <wangfei1@huawei.com>

Closes #3563 from scwf/rnc and squashes the following commits:

9395661 [scwf] remove unnecessary condition

[SPARK-4293][SQL] Make Cast be able to handle complex types. · 33448036

Takuya UESHIN authored 10 years ago

Inserting data whose type includes `ArrayType.containsNull == false`, `MapType.valueContainsNull == false`, or `StructType.fields.exists(_.nullable == false)` into a Hive table fails because the `Cast` inserted by the `HiveMetastoreCatalog.PreInsertionCasts` rule of `Analyzer` can't handle these types correctly.

Complex type cast rule proposal:

- Cast for non-complex types should be able to cast the same as before.
- Cast for `ArrayType` can evaluate if
  - Element type can cast
  - Nullability rule doesn't break
- Cast for `MapType` can evaluate if
  - Key type can cast
  - Nullability for casted key type is `false`
  - Value type can cast
  - Nullability rule for value type doesn't break
- Cast for `StructType` can evaluate if
  - The field size is the same
  - Each field can cast
  - Nullability rule for each field doesn't break
- The nested structure should be the same.

Nullability rule:

- If the casted type is `nullable == true`, the target nullability should be `true`
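
The proposal above can be sketched as a small recursive check. This is a toy model in plain Python, not the Catalyst implementation: the classes below only borrow the names `ArrayType`, `MapType`, and `StructType`, the `Atomic` class is hypothetical, and the sketch assumes every pair of non-complex types is castable. The map-key nullability condition is left out because atomic casts in this model never introduce nulls.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class ArrayType:
    element: object
    contains_null: bool


@dataclass(frozen=True)
class MapType:
    key: object
    value: object
    value_contains_null: bool


@dataclass(frozen=True)
class Field:
    dtype: object
    nullable: bool


@dataclass(frozen=True)
class StructType:
    fields: Tuple[Field, ...]


def nullability_ok(source_nullable, target_nullable):
    """Nullability rule: a nullable source requires a nullable target."""
    return (not source_nullable) or target_nullable


def can_cast(src, dst):
    """Return True if src can be cast to dst under the rules listed above."""
    if isinstance(src, Atomic) and isinstance(dst, Atomic):
        return True  # assumption: all non-complex casts are allowed here
    if isinstance(src, ArrayType) and isinstance(dst, ArrayType):
        return (can_cast(src.element, dst.element)
                and nullability_ok(src.contains_null, dst.contains_null))
    if isinstance(src, MapType) and isinstance(dst, MapType):
        return (can_cast(src.key, dst.key)
                and can_cast(src.value, dst.value)
                and nullability_ok(src.value_contains_null,
                                   dst.value_contains_null))
    if isinstance(src, StructType) and isinstance(dst, StructType):
        return (len(src.fields) == len(dst.fields)
                and all(can_cast(s.dtype, d.dtype)
                        and nullability_ok(s.nullable, d.nullable)
                        for s, d in zip(src.fields, dst.fields)))
    return False  # the nested structure differs


INT, LONG = Atomic("int"), Atomic("long")
widening_ok = can_cast(ArrayType(INT, False), ArrayType(LONG, True))
narrowing_ok = can_cast(ArrayType(INT, True), ArrayType(LONG, False))
```

In this model `widening_ok` is true and `narrowing_ok` is false, because an array that may contain nulls cannot be cast to one that forbids them.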

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #3150 from ueshin/issues/SPARK-4293 and squashes the following commits:

e935939 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
ba14003 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
8999868 [Takuya UESHIN] Fix a test title.
f677c30 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
287f410 [Takuya UESHIN] Add tests to insert data of types ArrayType / MapType / StructType with nullability is false into Hive table.
4f71bb8 [Takuya UESHIN] Make Cast be able to handle complex types.

[SPARK-4639] [SQL] Pass maxIterations in as a parameter in Analyzer · c152dde7

Jacky Li authored 10 years ago

fix a TODO in Analyzer:
// TODO: pass this in as a parameter
val fixedPoint = FixedPoint(100)
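
To show what passing the iteration cap as a parameter buys, here is a minimal fixed-point loop in plain Python. It is a sketch of the general technique, not the Catalyst rule executor. The function `run_to_fixed_point`, its `max_iterations` argument, and the toy rule `halve_even` are all hypothetical names for this example.

```python
def run_to_fixed_point(plan, rules, max_iterations=100):
    """Apply all rules repeatedly until the plan stops changing or the cap is hit.

    Returns the final plan and the number of passes that were run.
    """
    for iteration in range(1, max_iterations + 1):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan, iteration  # fixed point reached
        plan = new_plan
    return plan, max_iterations  # gave up at the cap


def halve_even(n):
    """Toy rewrite rule: halve positive even numbers, leave others unchanged."""
    return n // 2 if n % 2 == 0 and n != 0 else n


converged = run_to_fixed_point(40, [halve_even])
capped = run_to_fixed_point(40, [halve_even], max_iterations=2)
```

With the default cap the loop converges on 5 after four passes, while a caller that passes `max_iterations=2` stops early at 10. A hard-coded constant would not allow that choice.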

Author: Jacky Li <jacky.likun@huawei.com>

Closes #3499 from jackylk/config and squashes the following commits:

4c1252c [Jacky Li] fix scalastyle
820f460 [Jacky Li] pass maxIterations in as a parameter

[SPARK-4662] [SQL] Whitelist more unittest · a7f07f51

Cheng Hao authored 10 years ago

Whitelist more Hive unit tests:

"create_like_tbl_props"
"udf5"
"udf_java_method"
"decimal_1"
"udf_pmod"
"udf_to_double"
"udf_to_float"
"udf7" (this will fail in Hive 0.12)

Author: Cheng Hao <hao.cheng@intel.com>

Closes #3522 from chenghao-intel/unittest and squashes the following commits:

f54e4c7 [Cheng Hao] work around to clean up the hive.table.parameters.default in reset
16fee22 [Cheng Hao] Whitelist more unittest

[SPARK-4713] [SQL] SchemaRDD.unpersist() should not raise exception if it is not persisted · bf40cf89

Cheng Hao authored 10 years ago

Unpersisting an uncached RDD does not raise an exception. For example:
```
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)
distData.unpersist(true)
```

But a `SchemaRDD` raises an exception if it is not cached. Since `SchemaRDD` is a subclass of `RDD`, it should follow the same behavior.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #3572 from chenghao-intel/try_uncache and squashes the following commits:

50a7a89 [Cheng Hao] SchemaRDD.unpersist() should not raise exception if it is not persisted

Dec 11, 2014

[SPARK-4806] Streaming doc update for 1.2 · b004150a

Tathagata Das authored 10 years ago

Important updates to the streaming programming guide:
- Make the fault-tolerance properties easier to understand, with information about write ahead logs
- Update the information about deploying the Spark Streaming app with information about Driver HA
- Update Receiver guide to discuss reliable vs unreliable receivers.

Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>
Author: Josh Rosen <rosenville@gmail.com>

Closes #3653 from tdas/streaming-doc-update-1.2 and squashes the following commits:

f53154a [Tathagata Das] Addressed Josh's comments.
ce299e4 [Tathagata Das] Minor update.
ca19078 [Tathagata Das] Minor change
f746951 [Tathagata Das] Mentioned performance problem with WAL
7787209 [Tathagata Das] Merge branch 'streaming-doc-update-1.2' of github.com:tdas/spark into streaming-doc-update-1.2
2184729 [Tathagata Das] Updated Kafka and Flume guides with reliability information.
2f3178c [Tathagata Das] Added more information about writing reliable receivers in the custom receiver guide.
91aa5aa [Tathagata Das] Improved API Docs menu
5707581 [Tathagata Das] Added Pythn API badge
b9c8c24 [Tathagata Das] Merge pull request #26 from JoshRosen/streaming-programming-guide
b8c8382 [Josh Rosen] minor fixes
a4ef126 [Josh Rosen] Restructure parts of the fault-tolerance section to read a bit nicer when skipping over the headings
65f66cd [Josh Rosen] Fix broken link to fault-tolerance semantics section.
f015397 [Josh Rosen] Minor grammar / pluralization fixes.
3019f3a [Josh Rosen] Fix minor Markdown formatting issues
aa8bb87 [Tathagata Das] Small update.
195852c [Tathagata Das] Updated based on Josh's comments, updated receiver reliability and deploying section, and also updated configuration.
17b99fb [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-doc-update-1.2
a0217c0 [Tathagata Das] Changed Deploying menu layout
67fcffc [Tathagata Das] Added cluster mode + supervise example to submitting application guide.
e45453b [Tathagata Das] Update streaming guide, added deploying section.
192c7a7 [Tathagata Das] Added more info about Python API, and rewrote the checkpointing section.

[SPARK-4791] [sql] Infer schema from case class with multiple constructors · 2a5b5fd4

Joseph K. Bradley authored 10 years ago

Modified ScalaReflection.schemaFor to take primary constructor of Product when there are multiple constructors. Added test to suite which failed before but works now.

Needed for [https://github.com/apache/spark/pull/3637]

CC: marmbrus

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #3646 from jkbradley/sql-reflection and squashes the following commits:

796b2e4 [Joseph K. Bradley] Modified ScalaReflection.schemaFor to take primary constructor of Product when there are multiple constructors. Added test to suite which failed before but works now.

Dec 10, 2014

[CORE]codeStyle: uniform ConcurrentHashMap define in StorageLevel.scala with other places · 57d37f9c

Zhang, Liye authored 10 years ago

Author: Zhang, Liye <liye.zhang@intel.com>

Closes #2793 from liyezhang556520/uniformHashMap and squashes the following commits:

5884735 [Zhang, Liye] [CORE]codeStyle: uniform ConcurrentHashMap define in StorageLevel.scala

SPARK-3526 Add section about data locality to the tuning guide · 652b781a

Andrew Ash authored 10 years ago

cc kayousterhout

I have a few outstanding questions from compiling this documentation:
- What's the difference between NO_PREF and ANY?  I understand the implications of the ordering but don't know what an example of each would be
- Why is NO_PREF ahead of RACK_LOCAL?  I would think it'd be better to schedule rack-local tasks ahead of no preference if you could only do one or the other.  Is the idea to wait longer and hope for the rack-local tasks to turn into node-local or better?
- Will there be a datacenter-local locality level in the future?  Apache Cassandra for example has this level

Author: Andrew Ash <andrew@andrewash.com>

Closes #2519 from ash211/SPARK-3526 and squashes the following commits:

44cff28 [Andrew Ash] Link to spark.locality parameters rather than copying the list
6d5d966 [Andrew Ash] Stay focused on Spark, no astronaut architecture mumbo-jumbo
20e0e31 [Andrew Ash] SPARK-3526 Add section about data locality to the tuning guide

MAINTENANCE: Automated closing of pull requests. · 36bdb5b7

Patrick Wendell authored 10 years ago

This commit exists to close the following pull requests on Github:

Closes #2883 (close requested by 'pwendell')
Closes #3364 (close requested by 'pwendell')
Closes #4458 (close requested by 'pwendell')
Closes #1574 (close requested by 'andrewor14')
Closes #2546 (close requested by 'andrewor14')
Closes #2516 (close requested by 'andrewor14')
Closes #154 (close requested by 'andrewor14')

[SPARK-4759] Fix driver hanging from coalescing partitions · 4f93d0ca

Andrew Or authored 10 years ago

The driver hangs sometimes when we coalesce RDD partitions. See JIRA for more details and reproduction.

This is because our use of empty string as default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `""` (empty string). The intended semantics here, however, is that the partition does not have a preferred location, and the TSM should schedule the corresponding task accordingly.
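
The difference between "no preference" and "prefers host `""`" can be modeled in a few lines. This is a toy scheduler in plain Python, not the `TaskSetManager` logic. The functions `preferred_locations` and `pick_host` and the host names are hypothetical, and returning `None` stands in for a task that keeps waiting.

```python
from typing import List, Optional


def preferred_locations(host: Optional[str]) -> List[str]:
    """Return the preferred hosts; an empty list means no preference at all."""
    return [host] if host else []


def pick_host(preferred: List[str], live_hosts: List[str]) -> Optional[str]:
    """Toy scheduler: use a live preferred host, else any host if no preference."""
    for host in preferred:
        if host in live_hosts:
            return host
    if preferred:
        return None  # a preference exists but no such host is live: keep waiting
    return live_hosts[0] if live_hosts else None


live = ["host-a", "host-b"]

# An empty string treated as a real host name never matches a live host.
stuck = pick_host([""], live)

# Encoding "no preference" as an empty list lets the task run anywhere.
scheduled = pick_host(preferred_locations(None), live)
```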

Author: Andrew Or <andrew@databricks.com>

Closes #3633 from andrewor14/coalesce-preferred-loc and squashes the following commits:

e520d6b [Andrew Or] Oops
3ebf8bd [Andrew Or] A few comments
f370a4e [Andrew Or] Fix tests
2f7dfb6 [Andrew Or] Avoid using empty string as default preferred location

[SPARK-4569] Rename 'externalSorting' in Aggregator · 447ae2de

Ilya Ganelin authored 10 years ago

Hi all - I've renamed the unhelpfully named variable and added a comment clarifying what's actually happening.

Author: Ilya Ganelin <ilya.ganelin@capitalone.com>

Closes #3666 from ilganeli/SPARK-4569B and squashes the following commits:

1810394 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
e2d2092 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
d7cefec [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
5b3f39c [Ilya Ganelin] [SPARK-4569] Rename in Aggregator

[SPARK-4793] [Deploy] ensure .jar at end of line · e230da18

Daoyuan Wang authored 10 years ago

Sometimes I switch between different versions and do not want to rebuild Spark, so I rename assembly.jar to .jar.bak, but it is still picked up by `compute-classpath.sh`.
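
The underlying idea is to anchor the `.jar` match at the end of the name. The sketch below shows the same idea with Python's `re` module and is not the shell code from `compute-classpath.sh`. The file names and the helper `select_jars` are made up for illustration.

```python
import re

# Anchored pattern: ".jar" must be the last thing in the name.
JAR_AT_END = re.compile(r"\.jar$")

# Unanchored pattern: ".jar" may appear anywhere, so ".jar.bak" also matches.
JAR_ANYWHERE = re.compile(r"\.jar")


def select_jars(file_names, pattern=JAR_AT_END):
    """Keep only the names that the given pattern matches."""
    return [name for name in file_names if pattern.search(name)]


# Hypothetical directory listing with one renamed backup file.
names = ["spark-assembly-1.2.0.jar", "spark-assembly-1.1.0.jar.bak"]

anchored = select_jars(names)
unanchored = select_jars(names, JAR_ANYWHERE)
```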

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3641 from adrian-wang/jar and squashes the following commits:

45cbfd0 [Daoyuan Wang] ensure .jar at end of line

[SPARK-4215] Allow requesting / killing executors only in YARN mode · faa8fd81

Andrew Or authored 10 years ago

Currently this doesn't do anything in other modes, so we might as well just disable it rather than having the user mistakenly rely on it.

Author: Andrew Or <andrew@databricks.com>

Closes #3615 from andrewor14/dynamic-allocation-yarn-only and squashes the following commits:

ce6487a [Andrew Or] Allow requesting / killing executors only in YARN mode

[SPARK-4771][Docs] Document standalone cluster supervise mode · 56212831

Andrew Or authored 10 years ago

tdas looks like streaming already refers to the supervise mode. The link from there is broken though.

Author: Andrew Or <andrew@databricks.com>

Closes #3627 from andrewor14/document-supervise and squashes the following commits:

9ca0908 [Andrew Or] Wording changes
2b55ed2 [Andrew Or] Document standalone cluster supervise mode

[SPARK-4329][WebUI] HistoryPage pagenation · 0fc637b4

Kousuke Saruta authored 10 years ago

Currently, HistoryPage only has links to the previous and next pages.
I suggest adding an index so that history pages are easier to reach.

I implemented it as shown in the following pictures.

If there are many pages, the index shows the current page +/- N pages, the first page, and the last page.

![2014-11-10 16 13 25](https://cloud.githubusercontent.com/assets/4736016/4986246/9c7bbac4-6937-11e4-8695-8634d039d5b6.png)
![2014-11-10 16 03 21](https://cloud.githubusercontent.com/assets/4736016/4986210/3951bb74-6937-11e4-8b4e-9f90d266d736.png)
![2014-11-10 16 03 39](https://cloud.githubusercontent.com/assets/4736016/4986211/3b196ad8-6937-11e4-9f81-74bc0a6dad5b.png)
![2014-11-10 16 03 49](https://cloud.githubusercontent.com/assets/4736016/4986213/40686138-6937-11e4-86c0-41100f0404f6.png)
![2014-11-10 16 04 04](https://cloud.githubusercontent.com/assets/4736016/4986215/4326c9b4-6937-11e4-87ac-0f30c86ec6e3.png)
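
The indexing rule described above (current page +/- N, plus the first and last page) can be sketched as follows. This is plain Python rather than the Scala code in the patch. The function name `page_index`, the 1-based page numbering, and the default `n=2` are assumptions for this example.

```python
def page_index(current, last, n=2):
    """Return the page numbers to link: first, last, and current +/- n.

    Pages are 1-based and the window is clipped to the valid range.
    """
    if last < 1:
        return []
    window = range(max(1, current - n), min(last, current + n) + 1)
    return sorted({1, last, *window})


middle = page_index(10, 50)
short = page_index(1, 3)
```

For page 10 of 50 this links pages 1, 8 to 12, and 50. A renderer could draw an ellipsis wherever consecutive entries are not adjacent.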

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3194 from sarutak/history-page-indexing and squashes the following commits:

15d3d2d [Kousuke Saruta] Simplified code
c93932e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing
1c2f605 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing
76b05e3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing
b2240f8 [Kousuke Saruta] Fixed style
ec7922e [Kousuke Saruta] Simplified code
755a004 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing
cfa242b [Kousuke Saruta] Added index to HistoryPage

[SPARK-4161]Spark shell class path is not correctly set if... · 742e7093

GuoQiang Li authored 10 years ago

[SPARK-4161]Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf

Author: GuoQiang Li <witgo@qq.com>

Closes #3050 from witgo/SPARK-4161 and squashes the following commits:

abb6fa4 [GuoQiang Li] move usejavacp opt to spark-shell
89e39e7 [GuoQiang Li] review commit
c2a6f04 [GuoQiang Li] Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf

[SPARK-4772] Clear local copies of accumulators as soon as we're done with them · 94b377f9

Nathan Kronenfeld authored 10 years ago

Accumulators keep thread-local copies of themselves. These copies were only cleared at the beginning of a task. This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker.

This PR clears the thread-local copies of accumulators at the end of each task, in the task's finally block, to make sure they are cleaned up between tasks. It also stores them in a ThreadLocal object, so that if, for some reason, the thread dies, any memory they are using at the time should be freed up.
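
The pattern of clearing per-thread state in a `finally` block can be sketched with Python's `threading.local`. This illustrates the general technique and is not Spark's accumulator code. The names `local_accums`, `run_task`, and `count_words` are hypothetical.

```python
import threading

# Per-thread storage: each thread sees its own "accums" dictionary.
_local = threading.local()


def local_accums():
    """Return this thread's accumulator dictionary, creating it on first use."""
    if not hasattr(_local, "accums"):
        _local.accums = {}
    return _local.accums


def run_task(task):
    """Run a task, then clear the thread-local state even if the task fails."""
    try:
        return task(local_accums())
    finally:
        local_accums().clear()


def count_words(accums):
    """Toy task that records a count in the thread-local accumulators."""
    accums["words"] = accums.get("words", 0) + 3
    return accums["words"]


result = run_task(count_words)
leftover = dict(local_accums())  # empty: nothing survives past the task
```

Clearing at the end of the task, rather than at the start of the next one, means the memory is released as soon as the task is done with it.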

Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>

Closes #3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits:

a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests
b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevent, as it will only surface if a master is talking to a worker running a different version of spark.
537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task.
39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them

[Minor] Use <sup> tag for help icon in web UI page header · f79c1cfc

Josh Rosen authored 10 years ago

This small commit makes the `(?)` web UI help link into a superscript, which should address feedback that the current design makes it look like an error occurred or like information is missing.

Before:

![image](https://cloud.githubusercontent.com/assets/50748/5370611/a3ed0034-7fd9-11e4-870f-05bd9faad5b9.png)

After:

![image](https://cloud.githubusercontent.com/assets/50748/5370602/6c5ca8d6-7fd9-11e4-8d1a-568d71290aa7.png)

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3659 from JoshRosen/webui-help-sup and squashes the following commits:

bd72899 [Josh Rosen] Use <sup> tag for help icon in web UI page header.

Dec 09, 2014

Config updates for the new shuffle transport. · 9bd9334f

Reynold Xin authored 10 years ago

Author: Reynold Xin <rxin@databricks.com>

Closes #3657 from rxin/conf-update and squashes the following commits:

7370eab [Reynold Xin] Config updates for the new shuffle transport.
