    [SPARK-5338] [MESOS] Add cluster mode support for Mesos · 53befacc
    Timothy Chen authored
    This patch adds support for running Spark in cluster mode on Mesos.
    It introduces a new Mesos framework dedicated to launching new apps/drivers. The framework is invoked through the spark-submit script by pointing the --master flag at the cluster mode REST interface instead of the Mesos master.
    
    Example:
    ./bin/spark-submit --deploy-mode cluster --class org.apache.spark.examples.SparkPi --master mesos://10.0.0.206:8077 --executor-memory 1G --total-executor-cores 100 examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar 30
    
    This patch also abstracts the StandaloneRestServer so that the REST endpoints can have different implementations.
    
    Features of the cluster mode in this PR:
    - Supports supervise mode, in which the scheduler keeps trying to reschedule a job that has exited.
    - Adds a new UI for the cluster mode scheduler that shows all running jobs, finished jobs, and supervised jobs waiting to be retried.
    - Supports state persistence to ZK, so that when the cluster scheduler fails over it can pick up all the queued and running jobs.
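
A minimal sketch of how a supervised submission might be composed, assuming spark-submit's existing --supervise flag is what enables the supervise mode described above. The helper name build_submit_cmd is hypothetical and not part of this patch. It only prints the command rather than running it, and the URL and jar path are reused from the example earlier in this message.

```shell
#!/bin/sh
# Hypothetical helper (not part of this patch): prints a spark-submit
# command line for a supervised cluster-mode submission.
#   $1   URL of the cluster mode REST interface (not the Mesos master)
#   $2   application jar
#   rest application arguments, passed through unchanged
build_submit_cmd() {
  dispatcher_url="$1"
  app_jar="$2"
  shift 2
  # Assumption: --supervise asks the scheduler to retry an exited driver.
  printf '%s' "./bin/spark-submit --deploy-mode cluster --supervise"
  printf ' %s' "--master" "$dispatcher_url"
  printf ' %s' "--class" "org.apache.spark.examples.SparkPi"
  printf ' %s' "$app_jar"
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Dry run: print the command instead of executing it.
build_submit_cmd "mesos://10.0.0.206:8077" \
  "examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar" 30
```

Printing the command first makes it easy to confirm that --master points at the cluster mode REST interface before anything is submitted.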
    
    Author: Timothy Chen <tnachen@gmail.com>
    Author: Luc Bourlier <luc.bourlier@typesafe.com>
    
    Closes #5144 from tnachen/mesos_cluster_mode and squashes the following commits:
    
    069e946 [Timothy Chen] Fix rebase.
    e24b512 [Timothy Chen] Persist submitted driver.
    390c491 [Timothy Chen] Fix zk conf key for mesos zk engine.
    e324ac1 [Timothy Chen] Fix merge.
    fd5259d [Timothy Chen] Address review comments.
    1553230 [Timothy Chen] Address review comments.
    c6c6b73 [Timothy Chen] Pass spark properties to mesos cluster tasks.
    f7d8046 [Timothy Chen] Change app name to spark cluster.
    17f93a2 [Timothy Chen] Fix head of line blocking in scheduling drivers.
    6ff8e5c [Timothy Chen] Address comments and add logging.
    df355cd [Timothy Chen] Add metrics to mesos cluster scheduler.
    20f7284 [Timothy Chen] Address review comments
    7252612 [Timothy Chen] Fix tests.
    a46ad66 [Timothy Chen] Allow zk cli param override.
    920fc4b [Timothy Chen] Fix scala style issues.
    862b5b5 [Timothy Chen] Support asking driver status when it's retrying.
    7f214c2 [Timothy Chen] Fix RetryState visibility
    e0f33f7 [Timothy Chen] Add supervise support and persist retries.
    371ce65 [Timothy Chen] Handle cluster mode recovery and state persistence.
    3d4dfa1 [Luc Bourlier] Adds support to kill submissions
    febfaba [Timothy Chen] Bound the finished drivers in memory
    543a98d [Timothy Chen] Schedule multiple jobs
    6887e5e [Timothy Chen] Support looking at SPARK_EXECUTOR_URI env variable in schedulers
    8ec76bc [Timothy Chen] Fix Mesos dispatcher UI.
    d57d77d [Timothy Chen] Add documentation
    825afa0 [Luc Bourlier] Supports more spark-submit parameters
    b8e7181 [Luc Bourlier] Adds a shutdown latch to keep the daemon running
    0fa7780 [Luc Bourlier] Launch task through the mesos scheduler
    5b7a12b [Timothy Chen] WIP: Making a cluster mode a mesos framework.
    4b2f5ef [Timothy Chen] Specify user jar in command to be replaced with local.
    e775001 [Timothy Chen] Support fetching remote uris in driver runner.
    7179495 [Timothy Chen] Change Driver page output and add logging
    880bc27 [Timothy Chen] Add Mesos Cluster UI to display driver results
    9986731 [Timothy Chen] Kill drivers when shutdown
    67cbc18 [Timothy Chen] Rename StandaloneRestClient to RestClient and add sbin scripts
    e3facdd [Timothy Chen] Add Mesos Cluster dispatcher