Commits · 949e393101e19cd00591a9930c4b364278e22609 · cs525-sp18-g07 / spark

Apr 28, 2014

SPARK-1654 and SPARK-1653: Fixes in spark-submit. · 949e3931

Patrick Wendell authored 10 years ago

Deals with two issues:
1. Spark shell didn't correctly pass quoted arguments to spark-submit.
```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
   which triggered warnings. Now we use new, more narrowly scoped,
   variables.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #576 from pwendell/spark-submit and squashes the following commits:

67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.

949e3931

SPARK-1652: Spark submit should fail gracefully if YARN not enabled · cae054aa

Patrick Wendell authored 10 years ago

Author: Patrick Wendell <pwendell@gmail.com>

Closes #579 from pwendell/spark-submit-yarn-2 and squashes the following commits:

05e1b11 [Patrick Wendell] Small fix
d2a40ad [Patrick Wendell] SPARK-1652: Spark submit should fail gracefully if YARN support not enabled

cae054aa

Changes to dev release script · 8421034e
Patrick Wendell authored 10 years ago

8421034e

[SPARK-1633][Streaming] Java API unit test and example for custom streaming receiver in Java · 1d84964b

Tathagata Das authored 10 years ago

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #558 from tdas/more-fixes and squashes the following commits:

c0c84e6 [Tathagata Das] Removing extra println()
d8a8cf4 [Tathagata Das] More tweaks to make unit test work in Jenkins.
b7caa98 [Tathagata Das] More tweaks.
d337367 [Tathagata Das] More tweaks
22d6f2d [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
40a961b [Tathagata Das] Modified java test to reduce flakiness.
9410ca6 [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
86d9147 [Tathagata Das] scala style fix
2f3d7b1 [Tathagata Das] Added Scala custom receiver example.
d677611 [Tathagata Das] Merge remote-tracking branch 'apache/master' into more-fixes
bec3fc2 [Tathagata Das] Added license.
51d6514 [Tathagata Das] Fixed docs on receiver.
81aafa0 [Tathagata Das] Added Java test for Receiver API, and added JavaCustomReceiver example.

1d84964b

[SQL]Append some missing types for HiveUDF · f7358844

Cheng Hao authored 10 years ago

Add the missing types

Author: Cheng Hao <hao.cheng@intel.com>

Closes #459 from chenghao-intel/missing_types and squashes the following commits:

21cba2e [Cheng Hao] Append some missing types for HiveUDF

f7358844

Update the import package name for TestHive in sbt shell · ea01affc

Cheng Hao authored 10 years ago

sbt/sbt hive/console will fail as TestHive changed its package from "org.apache.spark.sql.hive" to "org.apache.spark.sql.hive.test".

Author: Cheng Hao <hao.cheng@intel.com>

Closes #574 from chenghao-intel/hive_console and squashes the following commits:

de14035 [Cheng Hao] Update the import package name for TestHive in sbt shell

ea01affc

Apr 27, 2014

Fix SPARK-1609: Executor fails to start when Command.extraJavaOptions... · 71f4d261

witgo authored 10 years ago

Fix SPARK-1609:  Executor fails to start when Command.extraJavaOptions contains multiple Java options

Author: witgo <witgo@qq.com>

Closes #547 from witgo/SPARK-1609 and squashes the following commits:

deb6a4c [witgo] review commit
91da0bb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
0640852 [witgo] review commit
8f90b22 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
bcf36cb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
1185605 [witgo] fix extraJavaOptions split
f7c0ab7 [witgo] bugfix
86fc4bb [witgo] bugfix
8a265b7 [witgo] Fix SPARK-1609: Executor fails to start when use spark-submit

71f4d261

SPARK-1145: Memory mapping with many small blocks can cause JVM allocation failures · 6b3c6e5d

Patrick Wendell authored 10 years ago

This includes some minor code clean-up as well. The main change is that small files are not memory mapped. There is a nicer way to write that code block using Scala's `Try` but to make it easy to back port and as simple as possible, I opted for the more explicit but less pretty format.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #43 from pwendell/block-iter-logging and squashes the following commits:

1cff512 [Patrick Wendell] Small issue from merge.
49f6c269 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into block-iter-logging
4943351 [Patrick Wendell] Added a test and feedback on mateis review
a637a18 [Patrick Wendell] Review feedback and adding rewind() when reading byte buffers.
b76b95f [Patrick Wendell] Review feedback
4e1514e [Patrick Wendell] Don't memory map for small files
d238b88 [Patrick Wendell] Some logging and clean-up

6b3c6e5d

HOTFIX: Minor patch to merge script. · 3d9fb096
Patrick Wendell authored 10 years ago

3d9fb096

SPARK-1651: Delete existing deployment directory · eefb90d3

Rahul Singhal authored 10 years ago

Small bug fix to make sure the "spark contents" are copied to the
deployment directory correctly.

Author: Rahul Singhal <rahul.singhal@guavus.com>

Closes #573 from rahulsinghaliitd/SPARK-1651 and squashes the following commits:

402c999 [Rahul Singhal] SPARK-1651: Delete existing deployment directory

eefb90d3

SPARK-1648 Support closing JIRA's as part of merge script. · fe65beea

Patrick Wendell authored 10 years ago

Adds an automated hook in the merge script that can close the JIRA,
set the fix versions, and leave a comment on the JIRA indicating the
PR in which it was resolved. This ensures that (a) we always close JIRA's
when issues are merged and (b) there is a link to the pull request in every JIRA.

This requires a python library called `jira-client`. We could look at embedding this
library in our project, but it seemed simple enough to just gracefully disable this
feature if it is not installed. It can be installed with `pip install jira-client`.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #570 from pwendell/jira-pr-merge and squashes the following commits:

3022b96 [Patrick Wendell] SPARK-1648 Support closing JIRA's as part of merge script.

fe65beea

SPARK-1650: Correctly identify maven project version · 7b2527d7

Rahul Singhal authored 10 years ago

Better account for various side-effect outputs while executing
"mvn help:evaluate -Dexpression=project.version"

Author: Rahul Singhal <rahul.singhal@guavus.com>

Closes #572 from rahulsinghaliitd/SPARK-1650 and squashes the following commits:

fd6a611 [Rahul Singhal] SPARK-1650: Correctly identify maven project version

7b2527d7

Apr 26, 2014

SPARK-1606: Infer user application arguments instead of requiring --arg. · aa9a7f5d

Patrick Wendell authored 10 years ago

This modifies spark-submit to do something more like the Hadoop `jar`
command. Now we have the following syntax:

./bin/spark-submit [options] user.jar [user options]

Author: Patrick Wendell <pwendell@gmail.com>

Closes #563 from pwendell/spark-submit and squashes the following commits:

32241fc [Patrick Wendell] Review feedback
3adfb69 [Patrick Wendell] Small fix
bc48139 [Patrick Wendell] SPARK-1606: Infer user application arguments instead of requiring --arg.

aa9a7f5d

SPARK-1467: Make StorageLevel.apply() factory methods Developer APIs · 762af4e9

Sandeep authored 10 years ago

We may want to evolve these in the future to add things like SSDs, so let's mark them as experimental for now. Long-term the right solution might be some kind of builder. The stable API should be the existing StorageLevel constants.

Author: Sandeep <sandeep@techaddict.me>

Closes #551 from techaddict/SPARK-1467 and squashes the following commits:

6bdda24 [Sandeep] SPARK-1467: Make StorageLevel.apply() factory methods as Developer Api's We may want to evolve these in the future to add things like SSDs, so let's mark them as experimental for now. Long-term the right solution might be some kind of builder. The stable API should be the existing StorageLevel constants.

762af4e9

[SPARK-1608] [SQL] Fix Cast.nullable when cast from StringType to NumericType/TimestampType. · 8e37ed6e

Takuya UESHIN authored 10 years ago

`Cast.nullable` should be `true` when cast from `StringType` to `NumericType` or `TimestampType`.
Because if `StringType` expression has an illegal number string or illegal timestamp string, the casted value becomes `null`.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #532 from ueshin/issues/SPARK-1608 and squashes the following commits:

065d37c [Takuya UESHIN] Add tests to check nullabilities of cast expressions.
f278ed7 [Takuya UESHIN] Revert test to keep it readable and concise.
9fc9380 [Takuya UESHIN] Fix Cast.nullable when cast from StringType to NumericType/TimestampType.

8e37ed6e

add note of how to support table with more than 22 fields · e6e44e46

wangfei authored 10 years ago

Author: wangfei <wangfei1@huawei.com>

Closes #564 from scwf/patch-6 and squashes the following commits:

a331876 [wangfei] Update sql-programming-guide.md
685135b [wangfei] Update sql-programming-guide.md
10b3dc0 [wangfei] Update sql-programming-guide.md
1c40480 [wangfei] add note of how to support table with 22 fields

e6e44e46

Apr 25, 2014

[Spark-1382] Fix NPE in DStream.slice (updated version of #365) · 058797c1

zsxwing authored 10 years ago

@zsxwing I cherry-picked your changes and merged the master. #365 had some conflicts once again!

Author: zsxwing <zsxwing@gmail.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #562 from tdas/SPARK-1382 and squashes the following commits:

e2962c1 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-1382
20968d9 [zsxwing] Replace Exception with SparkException in DStream
e476651 [zsxwing] Merge remote-tracking branch 'origin/master' into SPARK-1382
35ba56a [zsxwing] SPARK-1382: Fix NPE in DStream.slice

058797c1

SPARK-1632. Remove unnecessary boxing in compares in ExternalAppendOnlyM... · 87cf35c2

Sandy Ryza authored 10 years ago

...ap

Author: Sandy Ryza <sandy@cloudera.com>

Closes #559 from sryza/sandy-spark-1632 and squashes the following commits:

a6cd352 [Sandy Ryza] Only compute hashes once
04e3884 [Sandy Ryza] SPARK-1632. Remove unnecessary boxing in compares in ExternalAppendOnlyMap

87cf35c2

SPARK-1235: manage the DAGScheduler EventProcessActor with supervisor and... · 027f1b85

CodingCat authored 10 years ago

SPARK-1235: manage the DAGScheduler EventProcessActor with supervisor and refactor the DAGScheduler with Akka

https://spark-project.atlassian.net/browse/SPARK-1235

In the current implementation, the running job will hang if the DAGScheduler crashes for some reason (eventProcessActor throws exception in receive() )

The reason is that the actor will automatically restart when the exception is thrown during the running but is not captured properly (Akka behaviour), and the JobWaiters are still waiting there for the completion of the tasks

In this patch, I refactored the DAGScheduler with Akka and manage the eventProcessActor with supervisor, so that upon the failure of a eventProcessActor, the supervisor will terminate the EventProcessActor and close the SparkContext

thanks for @kayousterhout and @markhamstra to give the hints in JIRA

Author: CodingCat <zhunansjtu@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>
Author: Nan Zhu <CodingCat@users.noreply.github.com>

Closes #186 from CodingCat/SPARK-1235 and squashes the following commits:

a7fb0ee [CodingCat] throw Exception on failure of creating DAG
124d82d [CodingCat] blocking the constructor until event actor is ready
baf2d38 [CodingCat] fix the issue brought by non-blocking actorOf
35c886a [CodingCat] fix bug
82d08b3 [CodingCat] calling actorOf on system to ensure it is blocking
310a579 [CodingCat] style fix
cd02d9a [Nan Zhu] small fix
561cfbc [CodingCat] recover doCheckpoint
c048d0e [CodingCat] call submitWaitingStages for every event
a9eea039 [CodingCat] address Matei's comments
ac878ab [CodingCat] typo fix
5d1636a [CodingCat] re-trigger the test.....
9dfb033 [CodingCat] remove unnecessary changes
a7a2a97 [CodingCat] add StageCancelled message
fdf3b17 [CodingCat] just to retrigger the test......
089bc2f [CodingCat] address andrew's comments
228f4b0 [CodingCat] address comments from Mark
b68c1c7 [CodingCat] refactor DAGScheduler with Akka
810efd8 [Xiangrui Meng] akka solution

027f1b85

SPARK-1607. HOTFIX: Fix syntax adapting Int result to Short · df6d8142

Sean Owen authored 10 years ago

Sorry folks. This should make the change for SPARK-1607 compile again. Verified this time with the yarn build enabled.

Author: Sean Owen <sowen@cloudera.com>

Closes #556 from srowen/SPARK-1607.2 and squashes the following commits:

e3fe7a3 [Sean Owen] Fix syntax adapting Int result to Short

df6d8142

Update KafkaWordCount.scala · 8aaef5c7

baishuo(白硕) authored 10 years ago

modify the required args number

Author: baishuo(白硕) <vc_java@hotmail.com>

Closes #523 from baishuo/master and squashes the following commits:

0368ba9 [baishuo(白硕)] Update KafkaWordCount.scala

8aaef5c7

Delete the val that never used · 25a276dd

WangTao authored 10 years ago

It seems that the val "startTime" and "endTime" is never used, so delete them.

Author: WangTao <barneystinson@aliyun.com>

Closes #553 from WangTaoTheTonic/master and squashes the following commits:

4fcb639 [WangTao] Delete the val that never used

25a276dd

SPARK-1621 Upgrade Chill to 0.3.6 · a24d918c

Matei Zaharia authored 10 years ago

It registers more Scala classes, including things like Ranges that we had to register manually before. See https://github.com/twitter/chill/releases for Chill's change log.

Author: Matei Zaharia <matei@databricks.com>

Closes #543 from mateiz/chill-0.3.6 and squashes the following commits:

a1dc5e0 [Matei Zaharia] Upgrade Chill to 0.3.6 and remove our special registration of Ranges

a24d918c

SPARK-1619 Launch spark-shell with spark-submit · dc3b640a

Patrick Wendell authored 10 years ago

This simplifies the shell a bunch and passes all arguments through to spark-submit.

There is a tiny incompatibility from 0.9.1 which is that you can't put `-c` _or_ `--cores`, only `--cores`. However, spark-submit will give a good error message in this case, I don't think many people used this, and it's a trivial change for users.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #542 from pwendell/spark-shell and squashes the following commits:

9eb3e6f [Patrick Wendell] Updating Spark docs
b552459 [Patrick Wendell] Andrew's feedback
97720fa [Patrick Wendell] Review feedback
aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit

dc3b640a

SPARK-1607. Replace octal literals, removed in Scala 2.11, with hex literals · 6e101f11

Sean Owen authored 10 years ago

Octal literals like "0700" are deprecated in Scala 2.10, generating a warning. They have been removed entirely in 2.11. See https://issues.scala-lang.org/browse/SI-7618

This change simply replaces two uses of octals with hex literals, which seemed the next-best representation since they express a bit mask (file permission in particular)

Author: Sean Owen <sowen@cloudera.com>

Closes #529 from srowen/SPARK-1607 and squashes the following commits:

1ee0e67 [Sean Owen] Use Integer.parseInt(...,8) for octal literal instead of hex equivalent
0102f3d [Sean Owen] Replace octal literals, removed in Scala 2.11, with hex literals

6e101f11

Call correct stop(). · 45ad7f0c

Aaron Davidson authored 10 years ago

Oopsie in #504.

Author: Aaron Davidson <aaron@databricks.com>

Closes #527 from aarondav/stop and squashes the following commits:

8d1446a [Aaron Davidson] Call correct stop().

45ad7f0c

SPARK-1242 Add aggregate to python rdd · e03bc379

Holden Karau authored 10 years ago

Author: Holden Karau <holden@pigscanfly.ca>

Closes #139 from holdenk/add_aggregate_to_python_api and squashes the following commits:

0f39ae3 [Holden Karau] Merge in master
4879c75 [Holden Karau] CR feedback, fix issue with empty RDDs in aggregate
70b4724 [Holden Karau] Style fixes from code review
96b047b [Holden Karau] Add aggregate to python rdd

e03bc379

Apr 24, 2014

Fix [SPARK-1078]: Remove the Unnecessary lift-json dependency · 095b5182

Sandeep authored 10 years ago

Remove the Unnecessary lift-json dependency from pom.xml

Author: Sandeep <sandeep@techaddict.me>

Closes #536 from techaddict/FIX-SPARK-1078 and squashes the following commits:

bd0fd1d [Sandeep] Fix [SPARK-1078]: Replace lift-json with json4s-jackson. Remove the Unnecessary lift-json dependency from pom.xml

095b5182

[Typo] In the maven docs: chd -> cdh · 06e82d94

Andrew Or authored 10 years ago

Author: Andrew Or <andrewor14@gmail.com>

Closes #548 from andrewor14/doc-typo and squashes the following commits:

3eaf4c4 [Andrew Or] chd -> cdh

06e82d94

Generalize pattern for planning hash joins. · 86ff8b10

Michael Armbrust authored 10 years ago

This will be helpful for [SPARK-1495](https://issues.apache.org/jira/browse/SPARK-1495) and other cases where we want to have custom hash join implementations but don't want to repeat the logic for finding the join keys.

Author: Michael Armbrust <michael@databricks.com>

Closes #418 from marmbrus/hashFilter and squashes the following commits:

d5cc79b [Michael Armbrust] Address @rxin 's comments.
366b6d9 [Michael Armbrust] style fixes
14560eb [Michael Armbrust] Generalize pattern for planning hash joins.
f4809c1 [Michael Armbrust] Move common functions to PredicateHelper.

86ff8b10

[SPARK-1617] and [SPARK-1618] Improvements to streaming ui and bug fix to socket receiver · cd12dd9b

Tathagata Das authored 10 years ago

1617: These changes expose the receiver state (active or inactive) and last error in the UI
1618: If the socket receiver cannot connect in the first attempt, it should try to restart after a delay. That was broken, as the thread that restarts (hence, stops) the receiver waited on Thread.join on itself!

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #540 from tdas/streaming-ui-fix and squashes the following commits:

e469434 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-ui-fix
dbddf75 [Tathagata Das] Style fix.
66df1a5 [Tathagata Das] Merge remote-tracking branch 'apache/master' into streaming-ui-fix
ad98bc9 [Tathagata Das] Refactored streaming listener to use ReceiverInfo.
d7f849c [Tathagata Das] Revert "Moved BatchInfo from streaming.scheduler to streaming.ui"
5c80919 [Tathagata Das] Moved BatchInfo from streaming.scheduler to streaming.ui
da244f6 [Tathagata Das] Fixed socket receiver as well as made receiver state and error visible in the streamign UI.

cd12dd9b

SPARK-1586 Windows build fixes · 968c0187

Mridul Muralidharan authored 10 years ago

Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.

Author: Mridul Muralidharan <mridulm80@apache.org>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #505 from mridulm/windows_fixes and squashes the following commits:

ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently
cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
3267f4b [Mridul Muralidharan] Fix build failures
35b277a [Mridul Muralidharan] Fix Scalastyle failures
bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
1337abd [Mridul Muralidharan] fix classpath while running in windows

968c0187

SPARK-1584: Upgrade Flume dependency to 1.4.0 · d5c6ae6c

tmalaska authored 10 years ago

Updated the Flume dependency in the maven pom file and the scala build file.

Author: tmalaska <ted.malaska@cloudera.com>

Closes #507 from tmalaska/master and squashes the following commits:

79492c8 [tmalaska] excluded all thrift
159c3f1 [tmalaska] fixed the flume pom file issues
5bf56a7 [tmalaska] Upgrade flume version

d5c6ae6c

[SPARK-986]: Job cancelation for PySpark · e53eb4f0

Ahir Reddy authored 10 years ago

* Additions to the PySpark API to cancel jobs
* Monitor Thread in PythonRDD to kill Python workers if a task is interrupted

Author: Ahir Reddy <ahirreddy@gmail.com>

Closes #541 from ahirreddy/python-cancel and squashes the following commits:

dfdf447 [Ahir Reddy] Changed success -> completed and made logging message clearer
6c860ab [Ahir Reddy] PR Comments
4b4100a [Ahir Reddy] Success flag
adba6ed [Ahir Reddy] Destroy python workers
27a2f8f [Ahir Reddy] Start the writer thread...
d422f7b [Ahir Reddy] Remove unnecesssary vals
adda337 [Ahir Reddy] Busy wait on the ocntext.interrupted flag, and then kill the python worker
d9e472f [Ahir Reddy] Revert "removed unnecessary vals"
5b9cae5 [Ahir Reddy] removed unnecessary vals
07b54d9 [Ahir Reddy] Fix canceling unit test
8ae9681 [Ahir Reddy] Don't interrupt worker
7722342 [Ahir Reddy] Monitor Thread for python workers
db04e16 [Ahir Reddy] Added canceling api to PySpark

e53eb4f0

[SPARK-1615] Synchronize accesses to the LiveListenerBus' event queue · ee6f7e22

Andrew Or authored 10 years ago

Original poster is @zsxwing, who reported this bug in #516.

Much of SparkListenerSuite relies on LiveListenerBus's `waitUntilEmpty()` method. As the name suggests, this waits until the event queue is empty. However, the following race condition could happen:

(1) We dequeue an event
(2) The queue is empty, we return true (even though the event has not been processed)
(3) The test asserts something assuming that all listeners have finished executing (and fails)
(4) The listeners receive and process the event

This PR makes (1) and (4) atomic by synchronizing around it. To do that, however, we must avoid using `eventQueue.take`, which is blocking and will cause a deadlock if we synchronize around it. As a workaround, we use the non-blocking `eventQueue.poll` + a semaphore to provide the same semantics.

This has been a possible race condition for a long time, but for some reason we've never run into it.

Author: Andrew Or <andrewor14@gmail.com>

Closes #544 from andrewor14/stage-info-test-fix and squashes the following commits:

3cbe40c [Andrew Or] Merge github.com:apache/spark into stage-info-test-fix
56dbbcb [Andrew Or] Check if event is actually added before releasing semaphore
eb486ae [Andrew Or] Synchronize accesses to the LiveListenerBus' event queue

ee6f7e22

[SPARK-1510] Spark Streaming metrics source for metrics system · 80429f3e

jerryshao authored 10 years ago

This pulls in changes made by @jerryshao in https://github.com/apache/spark/pull/424 and merges with the master.

Author: jerryshao <saisai.shao@intel.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #545 from tdas/streaming-metrics and squashes the following commits:

034b443 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-metrics
fb3b0a5 [jerryshao] Modify according master update
21939f5 [jerryshao] Style changes according to style check error
976116b [jerryshao] Add StreamSource in StreamingContext for better monitoring through metrics system

80429f3e

Spark 1489 Fix the HistoryServer view acls · 44da5ab2

Thomas Graves authored 10 years ago

This allows the view acls set by the user to be enforced by the history server. It also fixes filters being applied properly.

Author: Thomas Graves <tgraves@apache.org>

Closes #509 from tgravescs/SPARK-1489 and squashes the following commits:

869c186 [Thomas Graves] change to either acls enabled or disabled
0d8333c [Thomas Graves] Add history ui policy to allow acls to either use application set, history server force acls on, or off
65148b5 [Thomas Graves] SPARK-1489 Fix the HistoryServer view acls

44da5ab2

[SQL] Add support for parsing indexing into arrays in SQL. · 4660991e

Michael Armbrust authored 10 years ago

Author: Michael Armbrust <michael@databricks.com>

Closes #518 from marmbrus/parseArrayIndex and squashes the following commits:

afd2d6b [Michael Armbrust] 100 chars
c3d6026 [Michael Armbrust] Add support for parsing indexing into arrays in SQL.

4660991e

[SPARK-1592][streaming] Automatically remove streaming input blocks · 526a518b

Tathagata Das authored 10 years ago

The raw input data is stored as blocks in BlockManagers. Earlier they were cleared by cleaner ttl. Now since streaming does not require cleaner TTL to be set, the block would not get cleared. This increases up the Spark's memory usage, which is not even accounted and shown in the Spark storage UI. It may cause the data blocks to spill over to disk, which eventually slows down the receiving of data (persisting to memory become bottlenecked by writing to disk).

The solution in this PR is to automatically remove those blocks. The mechanism to keep track of which BlockRDDs (which has presents the raw data blocks as a RDD) can be safely cleared already exists. Just use it to explicitly remove blocks from BlockRDDs.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #512 from tdas/block-rdd-unpersist and squashes the following commits:

d25e610 [Tathagata Das] Merge remote-tracking branch 'apache/master' into block-rdd-unpersist
5f46d69 [Tathagata Das] Merge remote-tracking branch 'apache/master' into block-rdd-unpersist
2c320cd [Tathagata Das] Updated configuration with spark.streaming.unpersist setting.
2d4b2fd [Tathagata Das] Automatically removed input blocks

526a518b

SPARK-1438 RDD.sample() make seed param optional · 35e3d199

Arun Ramakrishnan authored 10 years ago

copying form previous pull request https://github.com/apache/spark/pull/462

Its probably better to let the underlying language implementation take care of the default . This was easier to do with python as the default value for seed in random and numpy random is None.

In Scala/Java side it might mean propagating an Option or null(oh no!) down the chain until where the Random is constructed. But, looks like the convention in some other methods was to use System.nanoTime. So, followed that convention.

Conflict with overloaded method in sql.SchemaRDD.sample which also defines default params.
sample(fraction, withReplacement=false, seed=math.random)
Scala does not allow more than one overloaded to have default params. I believe the author intended to override the RDD.sample method and not overload it. So, changed it.

If backward compatible is important, 3 new method can be introduced (without default params) like this
sample(fraction)
sample(fraction, withReplacement)
sample(fraction, withReplacement, seed)

Added some tests for the scala RDD takeSample method.

Author: Arun Ramakrishnan <smartnut007@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #477 from smartnut007/master and squashes the following commits:

07bb06e [Arun Ramakrishnan] SPARK-1438 fixing more space formatting issues
b9ebfe2 [Arun Ramakrishnan] SPARK-1438 removing redundant import of random in python rddsampler
8d05b1a [Arun Ramakrishnan] SPARK-1438 RDD . Replace System.nanoTime with a Random generated number. python: use a separate instance of Random instead of seeding language api global Random instance.
69619c6 [Arun Ramakrishnan] SPARK-1438 fix spacing issue
0c247db [Arun Ramakrishnan] SPARK-1438 RDD language apis to support optional seed in RDD methods sample/takeSample

35e3d199