  1. Dec 19, 2013
  2. Dec 18, 2013
    • In experimental clusters we've observed that a 10-second timeout was insufficient, · 293a0af5
      Aaron Davidson authored
      despite having a low number of nodes and a relatively small workload (16 nodes, <1.5 TB of data).
      This would cause an entire job to fail at the beginning of the reduce phase.
      There is no particular reason for this value to be small, as a timeout should only occur
      in an exceptional situation.
      
      Also centralized the reading of spark.akka.askTimeout to AkkaUtils (surely this can later
      be cleaned up to use Typesafe).
      
      Finally, deleted some lurking implicits. If anyone can think of a reason they should still
      be there, please let me know.
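      A minimal sketch of what that centralization can look like, assuming the helper lives in an AkkaUtils object and the default is read from a system property (the shape, name, and default value here are assumptions, not necessarily the exact Spark code):

      ```
      import scala.concurrent.duration._

      // One place owns the ask-timeout default, so every call site agrees on it
      // and the value can be raised (or overridden) in a single spot.
      object AkkaUtils {
        def askTimeout: FiniteDuration =
          System.getProperty("spark.akka.askTimeout", "30").toInt.seconds
      }
      ```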
    • Merge pull request #267 from JoshRosen/cygwin · c64a53a4
      Reynold Xin authored
      Fix Cygwin support in several scripts.
      
      This allows the spark-shell, spark-class, run-example, make-distribution.sh,
      and ./bin/start-* scripts to work under Cygwin.  Note that this doesn't
      support PySpark under Cygwin, since that requires many additional `cygpath`
      calls from within Python and will be non-trivial to implement.
      
      This PR was inspired by, and subsumes, #253 (so close #253 after this is merged).
    • Merge pull request #274 from azuryy/master · 5ea18727
      Reynold Xin authored
      Fixed the example link in the Scala programming guide.
      
      The old link was not accessible, so I changed it to the new one.
    • fengdong authored · ad8ce014
    • Merge pull request #273 from rxin/top · f4effb37
      Reynold Xin authored
      Fixed a performance problem in RDD.top and BoundedPriorityQueue
      
      BoundedPriorityQueue was actually traversing the entire queue to calculate its size, resulting in poor insertion performance; see the sketch below.
      
      This should also cherry-pick cleanly into branch-0.8.
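      To make the problem concrete, here is a minimal sketch of such a bounded priority queue (illustrative names, not Spark's actual implementation). The fix amounts to relying on the underlying queue's O(1) size instead of recomputing the size by traversing every element on each insertion:

      ```
      import scala.collection.mutable

      // Keeps only the maxSize largest elements seen so far.
      class BoundedPQ[A](maxSize: Int)(implicit ord: Ordering[A]) {
        // Reversed ordering yields a min-heap: head is the smallest element.
        private val underlying = mutable.PriorityQueue.empty[A](ord.reverse)

        def +=(elem: A): this.type = {
          if (underlying.size < maxSize) {            // O(1) size check, no traversal
            underlying += elem
          } else if (ord.gt(elem, underlying.head)) { // beats the current minimum?
            underlying.dequeue()                      // evict the minimum
            underlying += elem
          }
          this
        }

        def toSeq: Seq[A] = underlying.toSeq
      }
      ```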
  3. Dec 17, 2013
  4. Dec 16, 2013
  5. Dec 15, 2013
    • Fix Cygwin support in several scripts. · f8ba89da
      Josh Rosen authored
      This allows the spark-shell, spark-class, run-example, make-distribution.sh,
      and ./bin/start-* scripts to work under Cygwin.  Note that this doesn't
      support PySpark under Cygwin, since that requires many additional `cygpath`
      calls from within Python and will be non-trivial to implement.
      
      This PR was inspired by, and subsumes, #253 (so close #253 after this is merged).
    • Merge pull request #256 from MLnick/master · d2ced6d5
      Josh Rosen authored
      Fix 'IPYTHON=1 ./pyspark' throwing ValueError
      
      This fixes an annoying issue where running ```IPYTHON=1 ./pyspark``` resulted in:
      
      ```
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 0.8.0
            /_/
      
      Using Python version 2.7.5 (default, Jun 20 2013 11:06:30)
      Spark context avaiable as sc.
      ---------------------------------------------------------------------------
      ValueError                                Traceback (most recent call last)
      /usr/local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
          202             else:
          203                 filename = fname
      --> 204             __builtin__.execfile(filename, *where)
      
      /Users/Nick/workspace/scala/spark-0.8.0-incubating-bin-hadoop1/python/pyspark/shell.py in <module>()
           30 add_files = os.environ.get("ADD_FILES").split(',') if os.environ.get("ADD_FILES") != None else None
           31
      ---> 32 sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell", pyFiles=add_files)
           33
           34 print """Welcome to
      
      /Users/Nick/workspace/scala/spark-0.8.0-incubating-bin-hadoop1/python/pyspark/context.pyc in __init__(self, master, jobName, sparkHome, pyFiles, environment, batchSize)
           70         with SparkContext._lock:
           71             if SparkContext._active_spark_context:
      ---> 72                 raise ValueError("Cannot run multiple SparkContexts at once")
           73             else:
           74                 SparkContext._active_spark_context = self
      
      ValueError: Cannot run multiple SparkContexts at once
      ```
      
      The issue arises because previously IPython did not seem to respect ```$PYTHONSTARTUP```, but since at least version 1.0.0 it does. Technically this might break for older versions of IPython, but most users should be able to upgrade to at least 1.0.0 (and should be encouraged to do so :).
      
      New behaviour:
      ```
      Nicks-MacBook-Pro:incubator-spark-mlnick Nick$ IPYTHON=1 ./pyspark
      Python 2.7.5 (default, Jun 20 2013, 11:06:30)
      Type "copyright", "credits" or "license" for more information.
      
      IPython 1.1.0 -- An enhanced Interactive Python.
      ?         -> Introduction and overview of IPython's features.
      %quickref -> Quick reference.
      help      -> Python's own help system.
      object?   -> Details about 'object', use 'object??' for extra details.
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/Users/Nick/workspace/scala/incubator-spark-mlnick/tools/target/scala-2.9.3/spark-tools-assembly-0.9.0-incubating-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/Users/Nick/workspace/scala/incubator-spark-mlnick/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      13/12/12 13:08:15 WARN Utils: Your hostname, Nicks-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.0.0.4 instead (on interface en0)
      13/12/12 13:08:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
      13/12/12 13:08:15 INFO Slf4jEventHandler: Slf4jEventHandler started
      13/12/12 13:08:15 INFO SparkEnv: Registering BlockManagerMaster
      13/12/12 13:08:15 INFO DiskBlockManager: Created local directory at /var/folders/_l/06wxljt13wqgm7r08jlc44_r0000gn/T/spark-local-20131212130815-0e76
      13/12/12 13:08:15 INFO MemoryStore: MemoryStore started with capacity 326.7 MB.
      13/12/12 13:08:15 INFO ConnectionManager: Bound socket to port 53732 with id = ConnectionManagerId(10.0.0.4,53732)
      13/12/12 13:08:15 INFO BlockManagerMaster: Trying to register BlockManager
      13/12/12 13:08:15 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager 10.0.0.4:53732 with 326.7 MB RAM
      13/12/12 13:08:15 INFO BlockManagerMaster: Registered BlockManager
      13/12/12 13:08:15 INFO HttpBroadcast: Broadcast server started at http://10.0.0.4:53733
      13/12/12 13:08:15 INFO SparkEnv: Registering MapOutputTracker
      13/12/12 13:08:15 INFO HttpFileServer: HTTP File server directory is /var/folders/_l/06wxljt13wqgm7r08jlc44_r0000gn/T/spark-8f40e897-8211-4628-a7a8-755562d5244c
      13/12/12 13:08:16 INFO SparkUI: Started Spark Web UI at http://10.0.0.4:4040
      2013-12-12 13:08:16.337 java[56801:4003] Unable to load realm info from SCDynamicStore
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 0.9.0-SNAPSHOT
            /_/
      
      Using Python version 2.7.5 (default, Jun 20 2013 11:06:30)
      Spark context avaiable as sc.
      ```
    • Merge pull request #257 from tgravescs/sparkYarnFixName · c55e6985
      Reynold Xin authored
      Fix the --name option for Spark on Yarn
      
      It looks like the --name option was accidentally broken in one of the merges. Currently, the Client hangs if the --name option is used.
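      The commit does not state the root cause, but as a purely hypothetical illustration, one common way a broken option can hang a client is an argument-parsing loop that fails to consume the option's value and therefore never advances (all names below are made up):

      ```
      def parseArgs(args: List[String]): Map[String, String] = {
        @annotation.tailrec
        def loop(rest: List[String], acc: Map[String, String]): Map[String, String] =
          rest match {
            case "--name" :: value :: tail =>
              // Both the flag and its value must be consumed; recursing on
              // `rest` instead of `tail` here would spin forever (a "hang").
              loop(tail, acc + ("name" -> value))
            case _ :: tail => loop(tail, acc) // skip unrecognized tokens
            case Nil       => acc
          }
        loop(args, Map.empty)
      }
      ```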
    • Merge pull request #264 from shivaram/spark-class-fix · ab85f88f
      Reynold Xin authored
      Use CoarseGrainedExecutorBackend in spark-class
    • Use scala.binary.version in POMs · 09ed7ddf
      Mark Hamstra authored
    • Shivaram Venkataraman authored
    • Making IPython PySpark compatible across versions <1.0.0. Also cleaned up the '-i' option and made IPYTHON_OPTS work · bb5277b1
      Nick Pentreath authored
    • Nick Pentreath authored · d36ee3b1
  6. Dec 14, 2013
    • Merge pull request #251 from pwendell/master · 7db91659
      Reynold Xin authored
      Fix list rendering in YARN markdown docs.
      
      This is some minor clean-up that makes the list render correctly.
    • Merge pull request #249 from ngbinh/partitionInJavaSortByKey · 2fd781d3
      Josh Rosen authored
      Expose numPartitions parameter in JavaPairRDD.sortByKey()
      
      This change makes the Java and Scala sortByKey() APIs consistent; see the usage sketch below.
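      A brief usage sketch of the corresponding Scala API, assuming a running SparkContext named `sc` (the data is illustrative):

      ```
      // Sort a pair RDD by key, explicitly choosing the number of output partitions.
      val pairs  = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
      val sorted = pairs.sortByKey(ascending = true, numPartitions = 4)
      // sorted.collect() => Array((a,1), (b,2), (c,3))
      ```

      The change above exposes the same numPartitions parameter through JavaPairRDD.sortByKey().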
    • Merge pull request #259 from pwendell/scala-2.10 · 97ac0601
      Patrick Wendell authored
      Migration to Scala 2.10
      
      == Below description was written by Prashant Sharma ==
      
      This PR migrates Spark to Scala 2.10.
      
      Summary of changes apart from the Scala 2.10 migration:
      
      (No implications for users:)
      1. Migrated Akka to 2.2.3.
      
      Remote death watch is not used, because it has a bug where it tries to send messages to a dead node indefinitely.
      
      An indestructible ActorSystem, which tolerates errors, is used only on executors.
      
      (Might be useful for users:)
      2. New configuration settings introduced:
      
      System.getProperty("spark.akka.heartbeat.pauses", "600")
      System.getProperty("spark.akka.failure-detector.threshold", "300.0")
      System.getProperty("spark.akka.heartbeat.interval", "1000")
      
      Defaults for these are fairly large, which effectively disables the failure detector that comes with Akka. The reason is that we already have our own failure-detector-like mechanism in place, so Akka's is just overhead on top of that, and it leads to a lot of false positives. With these properties, however, it is possible to enable it; a sketch follows below. A good use case for enabling it is when someone wants Spark to be sensitive (in a controllable manner, of course) to GC pauses or network lags, and to quickly evict executors that experience them. More information is included in configuration.md.
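      As a sketch only (the values below are examples, not recommendations): Spark at this point read its settings from Java system properties, so the Akka failure detector could be enabled by overriding the large defaults before Spark initializes:

      ```
      // Must run before the SparkContext (and hence the actor system) is created.
      System.setProperty("spark.akka.heartbeat.pauses", "60")             // example value
      System.setProperty("spark.akka.failure-detector.threshold", "12.0") // example value
      System.setProperty("spark.akka.heartbeat.interval", "5")            // example value
      ```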
      
      Once SPARK-544 is merged, I would like to deprecate at least these Akka properties, and maybe others too.
      
      This PR is a duplicate of #221 (where all the discussion happened); that one pointed to master, while this one points to the scala-2.10 branch.
    • Merge pull request #262 from pwendell/mvn-fix · 7ac944fc
      Patrick Wendell authored
      Fix maven build issues in 2.10 branch
      
      Found some issues when testing the Maven build locally.
    • Fix maven build issues in 2.10 branch · 6e8a96c7
      Patrick Wendell authored
  7. Dec 13, 2013