  1. Dec 15, 2013
    • Merge pull request #256 from MLnick/master · d2ced6d5
      Josh Rosen authored
      Fix 'IPYTHON=1 ./pyspark' throwing ValueError
      
      This fixes an annoying issue where running ```IPYTHON=1 ./pyspark``` resulted in:
      
      ```
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 0.8.0
            /_/
      
      Using Python version 2.7.5 (default, Jun 20 2013 11:06:30)
      Spark context avaiable as sc.
      ---------------------------------------------------------------------------
      ValueError                                Traceback (most recent call last)
      /usr/local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
          202             else:
          203                 filename = fname
      --> 204             __builtin__.execfile(filename, *where)
      
      /Users/Nick/workspace/scala/spark-0.8.0-incubating-bin-hadoop1/python/pyspark/shell.py in <module>()
           30 add_files = os.environ.get("ADD_FILES").split(',') if os.environ.get("ADD_FILES") != None else None
           31
      ---> 32 sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell", pyFiles=add_files)
           33
           34 print """Welcome to
      
      /Users/Nick/workspace/scala/spark-0.8.0-incubating-bin-hadoop1/python/pyspark/context.pyc in __init__(self, master, jobName, sparkHome, pyFiles, environment, batchSize)
           70         with SparkContext._lock:
           71             if SparkContext._active_spark_context:
      ---> 72                 raise ValueError("Cannot run multiple SparkContexts at once")
           73             else:
           74                 SparkContext._active_spark_context = self
      
      ValueError: Cannot run multiple SparkContexts at once
      ```
      
      The issue arises because older IPython releases did not appear to respect ```$PYTHONSTARTUP```, but since at least 1.0.0 IPython does honour it, so the PySpark startup script ends up being executed twice and tries to create a second SparkContext. Technically this might break on older versions of IPython, but most users should be able to upgrade to at least 1.0.0 (and should be encouraged to do so :).
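      
      For illustration, a minimal sketch of a guard that would sidestep the double execution. This is not the change made in this PR (which instead adjusts how the pyspark launcher invokes IPython); it only reuses the ```SparkContext._active_spark_context``` field visible in the traceback above:
      
      ```
      # Hypothetical guard for a PySpark startup script: reuse an already-active
      # SparkContext instead of creating a second one and hitting the ValueError.
      import os
      from pyspark import SparkContext
      
      if SparkContext._active_spark_context is not None:
          sc = SparkContext._active_spark_context  # first execution already created it
      else:
          add_files = os.environ.get("ADD_FILES")
          add_files = add_files.split(',') if add_files else None
          sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell", pyFiles=add_files)
      ```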
      
      New behaviour:
      ```
      Nicks-MacBook-Pro:incubator-spark-mlnick Nick$ IPYTHON=1 ./pyspark
      Python 2.7.5 (default, Jun 20 2013, 11:06:30)
      Type "copyright", "credits" or "license" for more information.
      
      IPython 1.1.0 -- An enhanced Interactive Python.
      ?         -> Introduction and overview of IPython's features.
      %quickref -> Quick reference.
      help      -> Python's own help system.
      object?   -> Details about 'object', use 'object??' for extra details.
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/Users/Nick/workspace/scala/incubator-spark-mlnick/tools/target/scala-2.9.3/spark-tools-assembly-0.9.0-incubating-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/Users/Nick/workspace/scala/incubator-spark-mlnick/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      13/12/12 13:08:15 WARN Utils: Your hostname, Nicks-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.0.0.4 instead (on interface en0)
      13/12/12 13:08:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
      13/12/12 13:08:15 INFO Slf4jEventHandler: Slf4jEventHandler started
      13/12/12 13:08:15 INFO SparkEnv: Registering BlockManagerMaster
      13/12/12 13:08:15 INFO DiskBlockManager: Created local directory at /var/folders/_l/06wxljt13wqgm7r08jlc44_r0000gn/T/spark-local-20131212130815-0e76
      13/12/12 13:08:15 INFO MemoryStore: MemoryStore started with capacity 326.7 MB.
      13/12/12 13:08:15 INFO ConnectionManager: Bound socket to port 53732 with id = ConnectionManagerId(10.0.0.4,53732)
      13/12/12 13:08:15 INFO BlockManagerMaster: Trying to register BlockManager
      13/12/12 13:08:15 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager 10.0.0.4:53732 with 326.7 MB RAM
      13/12/12 13:08:15 INFO BlockManagerMaster: Registered BlockManager
      13/12/12 13:08:15 INFO HttpBroadcast: Broadcast server started at http://10.0.0.4:53733
      13/12/12 13:08:15 INFO SparkEnv: Registering MapOutputTracker
      13/12/12 13:08:15 INFO HttpFileServer: HTTP File server directory is /var/folders/_l/06wxljt13wqgm7r08jlc44_r0000gn/T/spark-8f40e897-8211-4628-a7a8-755562d5244c
      13/12/12 13:08:16 INFO SparkUI: Started Spark Web UI at http://10.0.0.4:4040
      2013-12-12 13:08:16.337 java[56801:4003] Unable to load realm info from SCDynamicStore
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 0.9.0-SNAPSHOT
            /_/
      
      Using Python version 2.7.5 (default, Jun 20 2013 11:06:30)
      Spark context avaiable as sc.
      ```
    • Merge pull request #257 from tgravescs/sparkYarnFixName · c55e6985
      Reynold Xin authored
      Fix the --name option for Spark on Yarn
      
      It looks like the --name option accidentally got broken in one of the merges; right now the YARN Client hangs if --name is used.
    • Merge pull request #264 from shivaram/spark-class-fix · ab85f88f
      Reynold Xin authored
      Use CoarseGrainedExecutorBackend in spark-class
    • Making IPython PySpark compatible across versions <1.0.0. Also cleaned up '-i'... · bb5277b1
      Nick Pentreath authored
      Making IPython PySpark compatible across versions <1.0.0. Also cleaned up '-i' option and made IPYTHON_OPTS work
    • Nick Pentreath authored · d36ee3b1
  2. Dec 14, 2013
    • Merge pull request #251 from pwendell/master · 7db91659
      Reynold Xin authored
      Fix list rendering in YARN markdown docs.
      
      This is some minor clean-up which makes the list render correctly.
    • Merge pull request #249 from ngbinh/partitionInJavaSortByKey · 2fd781d3
      Josh Rosen authored
      Expose numPartitions parameter in JavaPairRDD.sortByKey()
      
      This change makes the Java and Scala APIs for sortByKey() consistent.
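      
      For illustration only (the PR itself adds the overload to JavaPairRDD, not to PySpark), the analogous call with an explicit partition count looks like this in PySpark:
      
      ```
      # Hypothetical PySpark analogue of sortByKey(ascending, numPartitions).
      from pyspark import SparkContext
      
      sc = SparkContext("local", "SortByKeyExample")
      pairs = sc.parallelize([("b", 2), ("a", 1), ("c", 3)])
      
      # Sort by key and control the number of output partitions explicitly.
      sorted_pairs = pairs.sortByKey(ascending=True, numPartitions=2)
      print(sorted_pairs.collect())  # [('a', 1), ('b', 2), ('c', 3)]
      ```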
    • Merge pull request #259 from pwendell/scala-2.10 · 97ac0601
      Patrick Wendell authored
      Migration to Scala 2.10
      
      == Below description was written by Prashant Sharma ==
      
      This PR migrates Spark to Scala 2.10.
      
      Summary of changes apart from the Scala 2.10 migration:
      (These have no implications for users.)
      1. Migrated Akka to 2.2.3.
      
      Remote death watch is not used because it has a bug where it keeps sending messages to a dead node indefinitely.
      
      An indestructible ActorSystem, which tolerates errors, is used on executors only.
      
      (Might be useful for users.)
      4. New configuration settings introduced:
      
      System.getProperty("spark.akka.heartbeat.pauses", "600")
      System.getProperty("spark.akka.failure-detector.threshold", "300.0")
      System.getProperty("spark.akka.heartbeat.interval", "1000")
      
      The defaults for these are fairly large so that, by default, they only disable the failure detector that comes with Akka. The reason is that we already have our own failure-detector-like mechanism in place, so Akka's is just extra overhead on top of that and leads to a lot of false positives. With these properties it is still possible to enable it. A good use case for enabling it is when someone wants Spark to be sensitive (in a controllable manner, of course) to GC pauses or network lags and to quickly evict executors that experience them. More information is included in configuration.md.
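      
      As an illustration only (not part of this PR), these settings could be overridden from application code before the context is created. A rough sketch, assuming a PySpark release that supports SparkConf, with hypothetical values (the spark.akka.* keys belong to this era of Spark and were removed in later versions):
      
      ```
      # Hypothetical example: make the Akka failure detector more sensitive so that
      # executors hit by long GC pauses or network lags are evicted quickly.
      from pyspark import SparkConf, SparkContext
      
      conf = (SparkConf()
              .setMaster("local")
              .setAppName("AkkaTuningExample")
              .set("spark.akka.heartbeat.pauses", "60")
              .set("spark.akka.failure-detector.threshold", "30.0")
              .set("spark.akka.heartbeat.interval", "100"))
      
      sc = SparkContext(conf=conf)
      ```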
      
      Once SPARK-544 is merged, I would like to deprecate at least these Akka properties, and maybe others too.
      
      This PR is a duplicate of #221 (where all the discussion happened); that one pointed to master, while this one points to the scala-2.10 branch.
    • Merge pull request #262 from pwendell/mvn-fix · 7ac944fc
      Patrick Wendell authored
      Fix maven build issues in 2.10 branch
      
      Found some issues when testing the Maven build locally.
    • Fix maven build issues in 2.10 branch · 6e8a96c7
      Patrick Wendell authored
  3. Dec 13, 2013
  4. Dec 12, 2013
    • Merge pull request #255 from ScrapCodes/scala-2.10 · 0aeb182b
      Patrick Wendell authored
      Disabled YARN 2.2 in the sbt and Maven builds and added a message to the sbt build.
    • Fix the --name option for Spark on Yarn · 842eb55f
      Thomas Graves authored
    • Merge pull request #254 from ScrapCodes/scala-2.10 · 2e89398e
      Patrick Wendell authored
      Scala 2.10 migration
      
      This PR migrates Spark to Scala 2.10.
      
      Summary of changes apart from the Scala 2.10 migration:
      (These have no implications for users.)
      1. Migrated Akka to 2.2.3.
      
      Remote death watch is not used because it has a bug where it keeps sending messages to a dead node indefinitely.
      
      An indestructible ActorSystem, which tolerates errors, is used on executors only.
      
      (Might be useful for users.)
      4. New configuration settings introduced:
      
      System.getProperty("spark.akka.heartbeat.pauses", "600")
      System.getProperty("spark.akka.failure-detector.threshold", "300.0")
      System.getProperty("spark.akka.heartbeat.interval", "1000")
      
      The defaults for these are fairly large so that, by default, they only disable the failure detector that comes with Akka. The reason is that we already have our own failure-detector-like mechanism in place, so Akka's is just extra overhead on top of that and leads to a lot of false positives. With these properties it is still possible to enable it. A good use case for enabling it is when someone wants Spark to be sensitive (in a controllable manner, of course) to GC pauses or network lags and to quickly evict executors that experience them. More information is included in configuration.md.
      
      Once SPARK-544 is merged, I would like to deprecate at least these Akka properties, and maybe others too.
      
      This PR is a duplicate of #221 (where all the discussion happened); that one pointed to master, while this one points to the scala-2.10 branch.
  5. Dec 11, 2013
  6. Dec 10, 2013
  7. Dec 09, 2013
  8. Dec 08, 2013