  1. May 08, 2015
• updated ec2 instance types · 1c78f686
      Brendan Collins authored
      I needed to run some d2 instances, so I updated the spark_ec2.py accordingly
      
      Author: Brendan Collins <bcollins@blueraster.com>
      
      Closes #6014 from brendancol/ec2-instance-types-update and squashes the following commits:
      
      d7b4191 [Brendan Collins] Merge branch 'ec2-instance-types-update' of github.com:brendancol/spark into ec2-instance-types-update
      6366c45 [Brendan Collins] added back cc1.4xlarge
      fc2931f [Brendan Collins] updated ec2 instance types
      80c2aa6 [Brendan Collins] vertically aligned whitespace
      85c6236 [Brendan Collins] vertically aligned whitespace
      1657c26 [Brendan Collins] updated ec2 instance types
      1c78f686
  2. Apr 16, 2015
• [SPARK-4897] [PySpark] Python 3 support · 04e44b37
      Davies Liu authored
This PR updates PySpark to support Python 3 (tested with 3.4).
      
Known issue: unpickling arrays from Pyrolite is broken in Python 3, so those tests are skipped.
      
      TODO: ec2/spark-ec2.py is not fully tested with python3.
      
      Author: Davies Liu <davies@databricks.com>
      Author: twneale <twneale@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5173 from davies/python3 and squashes the following commits:
      
      d7d6323 [Davies Liu] fix tests
      6c52a98 [Davies Liu] fix mllib test
      99e334f [Davies Liu] update timeout
      b716610 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      cafd5ec [Davies Liu] adddress comments from @mengxr
      bf225d7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      179fc8d [Davies Liu] tuning flaky tests
      8c8b957 [Davies Liu] fix ResourceWarning in Python 3
      5c57c95 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      4006829 [Davies Liu] fix test
      2fc0066 [Davies Liu] add python3 path
      71535e9 [Davies Liu] fix xrange and divide
      5a55ab4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      125f12c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ed498c8 [Davies Liu] fix compatibility with python 3
      820e649 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      e8ce8c9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ad7c374 [Davies Liu] fix mllib test and warning
      ef1fc2f [Davies Liu] fix tests
      4eee14a [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      20112ff [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      59bb492 [Davies Liu] fix tests
      1da268c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ca0fdd3 [Davies Liu] fix code style
      9563a15 [Davies Liu] add imap back for python 2
      0b1ec04 [Davies Liu] make python examples work with Python 3
      d2fd566 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      a716d34 [Davies Liu] test with python 3.4
      f1700e8 [Davies Liu] fix test in python3
      671b1db [Davies Liu] fix test in python3
      692ff47 [Davies Liu] fix flaky test
      7b9699f [Davies Liu] invalidate import cache for Python 3.3+
      9c58497 [Davies Liu] fix kill worker
      309bfbf [Davies Liu] keep compatibility
      5707476 [Davies Liu] cleanup, fix hash of string in 3.3+
      8662d5b [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      f53e1f0 [Davies Liu] fix tests
      70b6b73 [Davies Liu] compile ec2/spark_ec2.py in python 3
      a39167e [Davies Liu] support customize class in __main__
      814c77b [Davies Liu] run unittests with python 3
      7f4476e [Davies Liu] mllib tests passed
      d737924 [Davies Liu] pass ml tests
      375ea17 [Davies Liu] SQL tests pass
      6cc42a9 [Davies Liu] rename
      431a8de [Davies Liu] streaming tests pass
      78901a7 [Davies Liu] fix hash of serializer in Python 3
      24b2f2e [Davies Liu] pass all RDD tests
      35f48fe [Davies Liu] run future again
      1eebac2 [Davies Liu] fix conflict in ec2/spark_ec2.py
      6e3c21d [Davies Liu] make cloudpickle work with Python3
      2fb2db3 [Josh Rosen] Guard more changes behind sys.version; still doesn't run
      1aa5e8f [twneale] Turned out `pickle.DictionaryType is dict` == True, so swapped it out
      7354371 [twneale] buffer --> memoryview  I'm not super sure if this a valid change, but the 2.7 docs recommend using memoryview over buffer where possible, so hoping it'll work.
      b69ccdf [twneale] Uses the pure python pickle._Pickler instead of c-extension _pickle.Pickler. It appears pyspark 2.7 uses the pure python pickler as well, so this shouldn't degrade pickling performance (?).
      f40d925 [twneale] xrange --> range
      e104215 [twneale] Replaces 2.7 types.InstsanceType with 3.4 `object`....could be horribly wrong depending on how types.InstanceType is used elsewhere in the package--see http://bugs.python.org/issue8206
      79de9d0 [twneale] Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
      2adb42d [Josh Rosen] Fix up some import differences between Python 2 and 3
      854be27 [Josh Rosen] Run `futurize` on Python code:
      7c5b4ce [Josh Rosen] Remove Python 3 check in shell.py.
      04e44b37
  3. Apr 08, 2015
• [SPARK-5242]: Add --private-ips flag to EC2 script · 86403f55
      Michelangelo D'Agostino authored
The `spark_ec2.py` script currently references the `ip_address` and `public_dns_name` attributes of an instance. On private networks these fields aren't set, which breaks the script.
      
      This PR introduces a `--private-ips` flag that instead refers to the `private_ip_address` attribute in both cases.
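The approach can be sketched as a small helper that chooses the address attribute based on the flag. The helper name and error message below are illustrative stand-ins, not necessarily what the PR adds; the instance objects mimic boto's attributes.

```python
# Hypothetical sketch of the --private-ips idea: pick the right address
# attribute for an instance depending on the flag.
def get_dns_name(instance, private_ips=False):
    """Return the address spark-ec2 should use to reach an instance."""
    dns = instance.private_ip_address if private_ips else instance.public_dns_name
    if not dns:
        # Fail fast instead of passing an empty hostname to ssh/rsync later.
        raise ValueError(
            "Failed to determine hostname of {i}. "
            "Please check that --private-ips was set if needed.".format(i=instance))
    return dns
```

Failing fast here surfaces the misconfiguration at lookup time rather than as an opaque SSH error later in the launch.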
      
      Author: Michelangelo D'Agostino <mdagostino@civisanalytics.com>
      
      Closes #5244 from mdagost/ec2_private_nets and squashes the following commits:
      
      b684c67 [Michelangelo D'Agostino] STY: A few python lint changes.
      a4a2eac [Michelangelo D'Agostino] ENH: Fix IP's typo and refactor conditional logic into functions.
      c004604 [Michelangelo D'Agostino] ENH: Add --private-ips flag.
      86403f55
  4. Apr 07, 2015
• [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py · 6f0d55d7
      Matt Aasted authored
      The spark_ec2.py script uses public_dns_name everywhere in the script except for testing ssh availability, which is done using the public ip address of the instances. This breaks the script for users who are deploying the cluster with a private-network-only security group. The fix is to use public_dns_name in the remaining place.
      
      Author: Matt Aasted <aasted@twitch.tv>
      
      Closes #5302 from aasted/master and squashes the following commits:
      
      60cf6ee [Matt Aasted] [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py
      6f0d55d7
  5. Apr 01, 2015
  6. Mar 19, 2015
• [SPARK-6219] [Build] Check that Python code compiles · f17d43b0
      Nicholas Chammas authored
      This PR expands the Python lint checks so that they check for obvious compilation errors in our Python code.
      
      For example:
      
      ```
      $ ./dev/lint-python
      Python lint checks failed.
      Compiling ./ec2/spark_ec2.py ...
        File "./ec2/spark_ec2.py", line 618
          return (master_nodes,, slave_nodes)
                               ^
      SyntaxError: invalid syntax
      
      ./ec2/spark_ec2.py:618:25: E231 missing whitespace after ','
      ./ec2/spark_ec2.py:1117:101: E501 line too long (102 > 100 characters)
      ```
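The core of such a compile check can be sketched with the standard `py_compile` module. This is a standalone illustration only; the actual check lives in `dev/lint-python`.

```python
# Byte-compile a Python file without executing it, reporting syntax errors.
import py_compile

def compiles_cleanly(path):
    """Return (True, '') if `path` byte-compiles, else (False, error text)."""
    try:
        py_compile.compile(path, doraise=True)
        return True, ""
    except py_compile.PyCompileError as e:
        # The error text includes the offending line and a caret marker,
        # much like the lint output shown above.
        return False, str(e)
```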
      
      This PR also bumps up the version of `pep8`. It ignores new types of checks introduced by that version bump while fixing problems missed by the older version of `pep8` we were using.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4941 from nchammas/compile-spark-ec2 and squashes the following commits:
      
      75e31d8 [Nicholas Chammas] upgrade pep8 + check compile
      b33651c [Nicholas Chammas] PEP8 line length
      f17d43b0
• [SPARK-6402][DOC] - Remove some refererences to shark in docs and ec2 · 797f8a00
      Pierre Borckmans authored
The EC2 script and the job scheduling documentation still referred to Shark.
I removed these references.
      
      I also removed a remaining `SHARK_VERSION` variable from `ec2-variables.sh`.
      
      Author: Pierre Borckmans <pierre.borckmans@realimpactanalytics.com>
      
      Closes #5083 from pierre-borckmans/remove_refererences_to_shark_in_docs and squashes the following commits:
      
      4e90ffc [Pierre Borckmans] Removed deprecated SHARK_VERSION
      caea407 [Pierre Borckmans] Remove shark reference from ec2 script doc
      196c744 [Pierre Borckmans] Removed references to Shark
      797f8a00
  7. Mar 10, 2015
• [SPARK-6186] [EC2] Make Tachyon version configurable in EC2 deployment script · 7c7d2d5e
      cheng chang authored
      This PR comes from Tachyon community to solve the issue:
      https://tachyon.atlassian.net/browse/TACHYON-11
      
      An accompanying PR is in mesos/spark-ec2:
      https://github.com/mesos/spark-ec2/pull/101
      
      Author: cheng chang <myairia@gmail.com>
      
      Closes #4901 from uronce-cc/master and squashes the following commits:
      
      313aa36 [cheng chang] minor re-wording
      fd2a48e [cheng chang] Remove Tachyon when deploying through git hash
      1d53c5c [cheng chang] add default value to --tachyon-version
      6f8887e [cheng chang] make tachyon version configurable
      7c7d2d5e
• [SPARK-6191] [EC2] Generalize ability to download libs · d14df06c
      Nicholas Chammas authored
      Right now we have a method to specifically download boto. This PR generalizes it so it's easy to download additional libraries if we want.
      
      For example, adding new external libraries for spark-ec2 is now as simple as:
      
      ```python
      external_libs = [
          {
               "name": "boto",
               "version": "2.34.0",
               "md5": "5556223d2d0cc4d06dd4829e671dcecd"
          },
          {
              "name": "PyYAML",
              "version": "3.11",
              "md5": "f50e08ef0fe55178479d3a618efe21db"
          },
          {
              "name": "argparse",
              "version": "1.3.0",
              "md5": "9bcf7f612190885c8c85e30ba41db3c7"
          }
      ]
      ```
      Likely use cases:
      * Downloading PyYAML to allow spark-ec2 configs to be persisted as a YAML file. ([SPARK-925](https://issues.apache.org/jira/browse/SPARK-925))
      * Downloading argparse to clean up / modernize our option parsing.
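A hedged sketch of the helpers such a list could drive. The directory layout, function names, and MD5 step are assumptions for illustration; the real logic lives in `spark_ec2.py`.

```python
# Sketch: decide whether a lib is already installed, and verify a download
# against the pinned MD5 from the external_libs list above.
import hashlib
import os

def is_lib_present(lib_dir, lib):
    """A lib counts as installed if its versioned directory exists."""
    return os.path.isdir(
        os.path.join(lib_dir, "{name}-{version}".format(**lib)))

def md5_matches(tarball_path, expected_md5):
    """Verify a downloaded tarball against the pinned MD5 checksum."""
    with open(tarball_path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest() == expected_md5
```

Pinning an MD5 per lib means a corrupted or tampered download is caught before the tarball is unpacked onto the path.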
      
      First run output, with PyYAML and argparse added just for demonstration purposes:
      
      ```shell
      $ ./spark-ec2 --version
      Downloading external libraries that spark-ec2 needs from PyPI to /path/to/spark/ec2/lib...
      This should be a one-time operation.
       - Downloading boto...
       - Finished downloading boto.
       - Downloading PyYAML...
       - Finished downloading PyYAML.
       - Downloading argparse...
       - Finished downloading argparse.
      spark-ec2 1.2.1
      ```
      
      Output thereafter:
      
      ```shell
      $ ./spark-ec2 --version
      spark-ec2 1.2.1
      ```
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4919 from nchammas/setup-ec2-libs and squashes the following commits:
      
      a077955 [Nicholas Chammas] print default region
      c95fb7d [Nicholas Chammas] to docstring
      5448845 [Nicholas Chammas] remove libs added for demo purposes
      60d8c23 [Nicholas Chammas] generalize ability to download libs
      d14df06c
  8. Mar 09, 2015
• [EC2] [SPARK-6188] Instance types can be mislabeled when re-starting cluster with default arguments · f7c79920
      Theodore Vasiloudis authored
      As described in https://issues.apache.org/jira/browse/SPARK-6188 and discovered in https://issues.apache.org/jira/browse/SPARK-5838.
      
When re-starting a cluster, if the user does not provide the instance types (which is currently the recommended behavior in the docs), the instances will be assigned the default type, m1.large. This then affects the setup of the machines.
      
This PR solves the problem by getting the instance types from the existing instances and overwriting the default options.
      
      EDIT: Further clarification of the issue:
      
      In short, while the instances themselves are the same as launched, their setup is done assuming the default instance type, m1.large.
      
This means that the machines are assumed to have 2 disks, which leads to the problems described in issue [5838](https://issues.apache.org/jira/browse/SPARK-5838): machines that have only one disk end up spilling shuffle data to the small (8GB) snapshot partition, which quickly fills up and causes jobs to fail with "No space left on device" errors.
      
Other instance-specific settings configured by the spark_ec2.py script are likely to be wrong as well.
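The fix can be sketched as follows, with `opts` and the node objects standing in for the option namespace and boto instance objects the script actually uses.

```python
# Sketch: when relaunching, read the instance type back from the live
# master/slave instances instead of trusting the command-line defaults.
def override_instance_types(opts, master_nodes, slave_nodes):
    """Overwrite default instance-type options with the running types."""
    if slave_nodes:
        opts.instance_type = slave_nodes[0].instance_type
    if master_nodes:
        opts.master_instance_type = master_nodes[0].instance_type
    return opts
```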
      
      Author: Theodore Vasiloudis <thvasilo@users.noreply.github.com>
      Author: Theodore Vasiloudis <tvas@sics.se>
      
      Closes #4916 from thvasilo/SPARK-6188]-Instance-types-can-be-mislabeled-when-re-starting-cluster-with-default-arguments and squashes the following commits:
      
      6705b98 [Theodore Vasiloudis] Added comment to clarify setting master instance type to the empty string.
      a3d29fe [Theodore Vasiloudis] More trailing whitespace
      7b32429 [Theodore Vasiloudis] Removed trailing whitespace
      3ebd52a [Theodore Vasiloudis] Make sure that the instance type is correct when relaunching a cluster.
      f7c79920
  9. Mar 08, 2015
• [SPARK-6193] [EC2] Push group filter up to EC2 · 52ed7da1
      Nicholas Chammas authored
      When looking for a cluster, spark-ec2 currently pulls down [info for all instances](https://github.com/apache/spark/blob/eb48fd6e9d55fb034c00e61374bb9c2a86a82fb8/ec2/spark_ec2.py#L620) and filters locally. When working on an AWS account with hundreds of active instances, this step alone can take over 10 seconds.
      
      This PR improves how spark-ec2 searches for clusters by pushing the filter up to EC2.
      
      Basically, the problem (and solution) look like this:
      
      ```python
      >>> timeit.timeit('blah = conn.get_all_reservations()', setup='from __main__ import conn', number=10)
      116.96390509605408
      >>> timeit.timeit('blah = conn.get_all_reservations(filters={"instance.group-name": ["my-cluster-master"]})', setup='from __main__ import conn', number=10)
      4.629754066467285
      ```
      
      Translated to a user-visible action, this looks like (against an AWS account with ~200 active instances):
      
      ```shell
      # master
      $ python -m timeit -n 3 --setup 'import subprocess' 'subprocess.call("./spark-ec2 get-master my-cluster --region us-west-2", shell=True)'
      ...
      3 loops, best of 3: 9.83 sec per loop
      
      # this PR
      $ python -m timeit -n 3 --setup 'import subprocess' 'subprocess.call("./spark-ec2 get-master my-cluster --region us-west-2", shell=True)'
      ...
      3 loops, best of 3: 1.47 sec per loop
      ```
      
      This PR also refactors `get_existing_cluster()` to make it, I hope, simpler.
      
      Finally, this PR fixes some minor grammar issues related to printing status to the user. :tophat: :clap:
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4922 from nchammas/get-existing-cluster-faster and squashes the following commits:
      
      18802f1 [Nicholas Chammas] ignore shutting-down
      f2a5b9f [Nicholas Chammas] fix grammar
      d96a489 [Nicholas Chammas] push group filter up to EC2
      52ed7da1
  10. Mar 07, 2015
• [SPARK-5641] [EC2] Allow spark_ec2.py to copy arbitrary files to cluster · 334c5bd1
      Florian Verhein authored
      Give users an easy way to rcp a directory structure to the master's / as part of the cluster launch, at a useful point in the workflow (before setup.sh is called on the master).
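One way to sketch the upload command the PR describes (the flag set and helper name here are illustrative; the PR implements this as `--deploy-root-dir`, and per the commit log it deliberately does not append a trailing slash, so the user controls rsync's copy semantics):

```python
# Sketch: build an rsync-over-ssh command that copies a local directory
# structure to the master's / before setup.sh runs.
def deploy_root_dir_command(root_dir, master_host, identity_file):
    """Build an rsync command that copies root_dir to the master's /."""
    return [
        "rsync", "-rv",
        "-e", "ssh -i {f} -o StrictHostKeyChecking=no".format(f=identity_file),
        root_dir,                      # note: no trailing slash is added
        "root@{h}:/".format(h=master_host),
    ]
```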
      
      This is an alternative approach to meeting requirements discussed in https://github.com/apache/spark/pull/4487
      
      Author: Florian Verhein <florian.verhein@gmail.com>
      
      Closes #4583 from florianverhein/master and squashes the following commits:
      
      49dee88 [Florian Verhein] removed addition of trailing / in rsync to give user this option, added documentation in help
      7b8e3d8 [Florian Verhein] remove unused args
      87d922c [Florian Verhein] [SPARK-5641] [EC2] implement --deploy-root-dir
      334c5bd1
• [EC2] Reorder print statements on termination · 2646794f
      Nicholas Chammas authored
      The PR reorders some print statements slightly on cluster termination so that they read better.
      
      For example, from this:
      
      ```
      Are you sure you want to destroy the cluster spark-cluster-test?
      The following instances will be terminated:
      Searching for existing cluster spark-cluster-test in region us-west-2...
      Found 1 master(s), 2 slaves
      > ...
      ALL DATA ON ALL NODES WILL BE LOST!!
      Destroy cluster spark-cluster-test (y/N):
      ```
      
      To this:
      
      ```
      Searching for existing cluster spark-cluster-test in region us-west-2...
      Found 1 master(s), 2 slaves
      The following instances will be terminated:
      > ...
      ALL DATA ON ALL NODES WILL BE LOST!!
      Are you sure you want to destroy the cluster spark-cluster-test? (y/N)
      ```
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4932 from nchammas/termination-print-order and squashes the following commits:
      
      c23711d [Nicholas Chammas] reorder prints on termination
      2646794f
  11. Feb 12, 2015
• [SPARK-5335] Fix deletion of security groups within a VPC · ada993e9
      Vladimir Grigor authored
      Please see https://issues.apache.org/jira/browse/SPARK-5335.
      
The fix itself is in commit e58a8b01a8bedcbfbbc6d04b1c1489255865cf87. The two earlier commits fix another VPC-related bug and are waiting to be merged; I should have created that earlier fix in its own branch, so this fix would not carry those commits. :(
      
      This code is released under the project's license.
      
      Author: Vladimir Grigor <vladimir@kiosked.com>
      Author: Vladimir Grigor <vladimir@voukka.com>
      
      Closes #4122 from voukka/SPARK-5335_delete_sg_vpc and squashes the following commits:
      
      090dca9 [Vladimir Grigor] fixes as per review: removed printing of group_id and added comment
      730ec05 [Vladimir Grigor] fix for SPARK-5335: Destroying cluster in VPC with "--delete-groups" fails to remove security groups
      ada993e9
• [EC2] Update default Spark version to 1.2.1 · 9c807650
      Katsunori Kanda authored
      Author: Katsunori Kanda <potix2@gmail.com>
      
      Closes #4566 from potix2/ec2-update-version-1-2-1 and squashes the following commits:
      
      77e7840 [Katsunori Kanda] [EC2] Update default Spark version to 1.2.1
      9c807650
  12. Feb 10, 2015
• [SPARK-5668] Display region in spark_ec2.py get_existing_cluster() · c49a4049
      Miguel Peralvo authored
      Show the region for the different messages displayed by get_existing_cluster(): The search, found and error messages.
      
      Author: Miguel Peralvo <miguel.peralvo@gmail.com>
      
      Closes #4457 from MiguelPeralvo/patch-2 and squashes the following commits:
      
      a5514c8 [Miguel Peralvo] Update spark_ec2.py
      0a837b0 [Miguel Peralvo] Update spark_ec2.py
      3923f36 [Miguel Peralvo] Update spark_ec2.py
      4ecd9f9 [Miguel Peralvo] [SPARK-5668] Display region in spark_ec2.py get_existing_cluster()
      c49a4049
• [SPARK-1805] [EC2] Validate instance types · 50820f15
      Nicholas Chammas authored
      Addresses [SPARK-1805](https://issues.apache.org/jira/browse/SPARK-1805), though doesn't resolve it completely.
      
      Error out quickly if the user asks for the master and slaves to have different AMI virtualization types, since we don't currently support that.
      
In addition to that, we print warnings if the given instance types are not recognized, though I would prefer that we errored out. Elsewhere in the script it seems [we allow unrecognized instance types](https://github.com/apache/spark/blob/5de14cc2763a8211f77eeb55940dec025822eb78/ec2/spark_ec2.py#L331), though I think we should remove that.
      
      It's messy, but it should serve us until we enhance spark-ec2 to support clusters with mixed virtualization types.
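The check can be sketched like this; the virtualization table below is a tiny illustrative excerpt, not the script's full table, and the message wording is assumed.

```python
# Sketch: error out early when master and slaves would need AMIs with
# different virtualization types; warn on unrecognized instance types.
EC2_INSTANCE_VIRT = {
    "m1.large": "pvm",
    "m3.medium": "hvm",
    "t1.micro": "pvm",
}

def check_virtualization(master_type, slave_type):
    for t in (master_type, slave_type):
        if t not in EC2_INSTANCE_VIRT:
            print("Warning: unrecognized EC2 instance type: " + t)
    m = EC2_INSTANCE_VIRT.get(master_type)
    s = EC2_INSTANCE_VIRT.get(slave_type)
    if m and s and m != s:
        raise SystemExit(
            "Error: spark-ec2 does not support clusters with mixed "
            "virtualization types ({m} vs. {s}).".format(m=m, s=s))
```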
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4455 from nchammas/ec2-master-slave-different-virtualization and squashes the following commits:
      
      ce28609 [Nicholas Chammas] fix style
      b0adba0 [Nicholas Chammas] validate input instance types
      50820f15
  13. Feb 09, 2015
• [SPARK-5611] [EC2] Allow spark-ec2 repo and branch to be set on CLI of spark_ec2.py · b884daa5
      Florian Verhein authored
      and by extension, the ami-list
      
      Useful for using alternate spark-ec2 repos or branches.
      
      Author: Florian Verhein <florian.verhein@gmail.com>
      
      Closes #4385 from florianverhein/master and squashes the following commits:
      
      7e2b4be [Florian Verhein] [SPARK-5611] [EC2] typo
      8b653dc [Florian Verhein] [SPARK-5611] [EC2] Enforce only supporting spark-ec2 forks from github, log improvement
      bc4b0ed [Florian Verhein] [SPARK-5611] allow spark-ec2 repos with different names
      8b5c551 [Florian Verhein] improve option naming, fix logging, fix lint failing, add guard to enforce spark-ec2
      7724308 [Florian Verhein] [SPARK-5611] [EC2] fixes
      b42b68c [Florian Verhein] [SPARK-5611] [EC2] Allow spark-ec2 repo and branch to be set on CLI of spark_ec2.py
      b884daa5
• [SPARK-5473] [EC2] Expose SSH failures after status checks pass · 4dfe180f
      Nicholas Chammas authored
      If there is some fatal problem with launching a cluster, `spark-ec2` just hangs without giving the user useful feedback on what the problem is.
      
      This PR exposes the output of the SSH calls to the user if the SSH test fails during cluster launch for any reason but the instance status checks are all green. It also removes the growing trail of dots while waiting in favor of a fixed 3 dots.
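A minimal sketch of the reporting behavior: run the probe command and, on failure, print the return code and combined output rather than silently retrying. The message format mirrors the sample below; the actual ssh command construction is omitted.

```python
# Sketch: run an ssh availability probe and surface its output on failure.
import subprocess

def ssh_probe(cmd):
    """Run a probe command; return (ok, returncode, combined output)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    text = out.decode()
    ok = proc.returncode == 0
    if not ok:
        print("Warning: SSH connection error. (This could be temporary.)")
        print("SSH return code: {c}".format(c=proc.returncode))
        print("SSH output: {o}".format(o=text.strip()))
    return ok, proc.returncode, text
```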
      
      For example:
      
      ```
      $ ./ec2/spark-ec2 -k key -i /incorrect/path/identity.pem --instance-type m3.medium --slaves 1 --zone us-east-1c launch "spark-test"
      Setting up security groups...
      Searching for existing cluster spark-test...
      Spark AMI: ami-35b1885c
      Launching instances...
      Launched 1 slaves in us-east-1c, regid = r-7dadd096
      Launched master in us-east-1c, regid = r-fcadd017
      Waiting for cluster to enter 'ssh-ready' state...
      Warning: SSH connection error. (This could be temporary.)
      Host: 127.0.0.1
      SSH return code: 255
      SSH output: Warning: Identity file /incorrect/path/identity.pem not accessible: No such file or directory.
      Warning: Permanently added '127.0.0.1' (RSA) to the list of known hosts.
      Permission denied (publickey).
      ```
      
      This should give users enough information when some unrecoverable error occurs during launch so they can know to abort the launch. This will help avoid situations like the ones reported [here on Stack Overflow](http://stackoverflow.com/q/28002443/) and [here on the user list](http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3C1422323829398-21381.postn3.nabble.com%3E), where the users couldn't tell what the problem was because it was being hidden by `spark-ec2`.
      
      This is a usability improvement that should be backported to 1.2.
      
      Resolves [SPARK-5473](https://issues.apache.org/jira/browse/SPARK-5473).
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4262 from nchammas/expose-ssh-failure and squashes the following commits:
      
      8bda6ed [Nicholas Chammas] default to print SSH output
      2b92534 [Nicholas Chammas] show SSH output after status check pass
      4dfe180f
  14. Feb 08, 2015
• [SPARK-5366][EC2] Check the mode of private key · 6fb141e2
      liuchang0812 authored
Check the mode of the private key file.
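The check can be sketched with the standard `stat` module: reject keys readable by group or others, as ssh itself does. The exact error message is illustrative.

```python
# Sketch: refuse identity files whose permissions are too open
# (anything beyond owner read/write/execute).
import os
import stat

def check_key_permissions(path):
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise SystemExit(
            "ERROR: The identity file '{p}' must not be accessible by "
            "others. Try: chmod 600 {p}".format(p=path))
```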
      
      Author: liuchang0812 <liuchang0812@gmail.com>
      
      Closes #4162 from Liuchang0812/ec2-script and squashes the following commits:
      
      fc37355 [liuchang0812] quota file name
      01ed464 [liuchang0812] more output
      ce2a207 [liuchang0812] pep8
      f44efd2 [liuchang0812] move code to real_main
      8475a54 [liuchang0812] fix bug
      cd61a1a [liuchang0812] import stat
      c106cb2 [liuchang0812] fix trivis bug
      89c9953 [liuchang0812] more output about checking private key
      1177a90 [liuchang0812] remove commet
      41188ab [liuchang0812] check the mode of private key
      6fb141e2
  15. Feb 06, 2015
• SPARK-5403: Ignore UserKnownHostsFile in SSH calls · e772b4e4
      Grzegorz Dubicki authored
      See https://issues.apache.org/jira/browse/SPARK-5403
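The resulting ssh options can be sketched as follows. EC2 recycles public addresses, so recording host keys in `~/.ssh/known_hosts` can make later launches hang or fail on key conflicts; the helper name is illustrative.

```python
# Sketch: common ssh options for spark-ec2 — skip host-key prompts and
# don't record hosts, since EC2 re-uses IPs across clusters.
def ssh_args(identity_file):
    return [
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        "-i", identity_file,
    ]
```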
      
      Author: Grzegorz Dubicki <grzegorz.dubicki@gmail.com>
      
      Closes #4196 from grzegorz-dubicki/SPARK-5403 and squashes the following commits:
      
      a7d863f [Grzegorz Dubicki] Resolve start command hanging issue
      e772b4e4
• [SPARK-4983] Insert waiting time before tagging EC2 instances · 0f3a3607
      GenTang authored
The boto API doesn't support tagging EC2 instances in the same call that launches them.
We add a five-second wait so that EC2 has enough time to propagate the information
and the tagging can succeed.
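The PR itself inserts a fixed five-second sleep. As a hedged illustration of the same idea, the sketch below wraps the tagging call in a short retry loop; `tag_fn` is a purely hypothetical stand-in for boto's `instance.add_tag()`, under the assumption that tagging fails until EC2 has propagated the new instance.

```python
# Sketch: retry tagging for a few seconds instead of a single fixed sleep,
# since a freshly launched instance may not be visible to the tag API yet.
import time

def tag_with_retry(tag_fn, attempts=5, delay=1):
    """Call tag_fn until it succeeds or attempts run out."""
    for _ in range(attempts):
        try:
            tag_fn()
            return True
        except Exception:
            time.sleep(delay)
    return False
```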
      
      Author: GenTang <gen.tang86@gmail.com>
      Author: Gen TANG <gen.tang86@gmail.com>
      
      Closes #3986 from GenTang/spark-4983 and squashes the following commits:
      
      13e257d [Gen TANG] modification of comments
      47f06755 [GenTang] print the information
      ab7a931 [GenTang] solve the issus spark-4983 by inserting waiting time
      3179737 [GenTang] Revert "handling exceptions about adding tags to ec2"
      6a8b53b [GenTang] Revert "the improvement of exception handling"
      13e97a6 [GenTang] Revert "typo"
      63fd360 [GenTang] typo
      692fc2b [GenTang] the improvement of exception handling
      6adcf6d [GenTang] handling exceptions about adding tags to ec2
      0f3a3607
• [SPARK-5628] Add version option to spark-ec2 · 70e5b030
      Nicholas Chammas authored
      Every proper command line tool should include a `--version` option or something similar.
      
      This PR adds this to `spark-ec2` using the standard functionality provided by `optparse`.
      
      One thing we don't do here is follow the Python convention of setting `__version__`, since it seems awkward given how `spark-ec2` is laid out.
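With `optparse`, this amounts to passing a `version` string to the parser, which then handles `--version` automatically; `SPARK_EC2_VERSION` below is a stand-in constant.

```python
# Sketch: optparse expands %prog in the version string and wires up
# --version for free.
from optparse import OptionParser

SPARK_EC2_VERSION = "1.2.1"

def make_parser():
    return OptionParser(
        prog="spark-ec2",
        version="%prog {v}".format(v=SPARK_EC2_VERSION),
        usage="%prog [options] <action> <cluster_name>")
```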
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4414 from nchammas/spark-ec2-show-version and squashes the following commits:
      
      914cab5 [Nicholas Chammas] add version info
      70e5b030
  16. Jan 28, 2015
• [SPARK-5434] [EC2] Preserve spaces in EC2 path · d44ee436
      Nicholas Chammas authored
      Fixes [SPARK-5434](https://issues.apache.org/jira/browse/SPARK-5434).
      
      Simple demonstration of the problem and the fix:
      
      ```
      $ spacey_path="/path/with some/spaces"
      $ dirname $spacey_path
      usage: dirname path
      $ echo $?
      1
      $ dirname "$spacey_path"
      /path/with some
      $ echo $?
      0
      ```
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4224 from nchammas/patch-1 and squashes the following commits:
      
      960711a [Nicholas Chammas] [EC2] Preserve spaces in EC2 path
      d44ee436
  17. Jan 08, 2015
  18. Dec 25, 2014
  19. Dec 19, 2014
• [SPARK-4890] Upgrade Boto to 2.34.0; automatically download Boto from PyPi instead of packaging it · c28083f4
      Josh Rosen authored
      This patch upgrades `spark-ec2`'s Boto version to 2.34.0, since this is blocking several features.  Newer versions of Boto don't work properly when they're loaded from a zipfile since they try to read a JSON file from a path relative to the Boto library sources.
      
Therefore, this patch also changes spark-ec2 to automatically download Boto from PyPI if it's not present in `SPARK_EC2_DIR/lib`, similar to what we do in the `sbt/sbt` script. This shouldn't be an issue for users, since they already need an internet connection to launch an EC2 cluster. By performing the download in spark_ec2.py instead of the Bash script, this should also work for Windows users.
      
      I've tested this with Python 2.6, too.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #3737 from JoshRosen/update-boto and squashes the following commits:
      
      0aa43cc [Josh Rosen] Remove unused setup_standalone_cluster() method.
      f02935d [Josh Rosen] Enable Python deprecation warnings and fix one Boto warning:
      587ae89 [Josh Rosen] [SPARK-4890] Upgrade Boto to 2.34.0; automatically download Boto from PyPi instead of packaging it
      c28083f4
  20. Dec 16, 2014
• SPARK-4767: Add support for launching in a specified placement group to spark_ec2 · b0dfdbdd
      Holden Karau authored
Placement groups are cool and all the cool kids are using them. Let's add support for them to spark_ec2.py because I'm lazy
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #3623 from holdenk/SPARK-4767-add-support-for-launching-in-a-specified-placement-group-to-spark-ec2-scripts and squashes the following commits:
      
      111a5fd [Holden Karau] merge in master
      70ace25 [Holden Karau] Placement groups are cool and all the cool kids are using them. Lets add support for them to spark_ec2.py because I'm lazy
      b0dfdbdd
• [SPARK-3405] add subnet-id and vpc-id options to spark_ec2.py · d12c0711
      Mike Jennings authored
      Based on this gist:
      https://gist.github.com/amar-analytx/0b62543621e1f246c0a2
      
      We use security group ids instead of security group to get around this issue:
      https://github.com/boto/boto/issues/350
      
      Author: Mike Jennings <mvj101@gmail.com>
      Author: Mike Jennings <mvj@google.com>
      
      Closes #2872 from mvj101/SPARK-3405 and squashes the following commits:
      
      be9cb43 [Mike Jennings] `pep8 spark_ec2.py` runs cleanly.
      4dc6756 [Mike Jennings] Remove duplicate comment
      731d94c [Mike Jennings] Update for code review.
      ad90a36 [Mike Jennings] Merge branch 'master' of https://github.com/apache/spark into SPARK-3405
      1ebffa1 [Mike Jennings] Merge branch 'master' into SPARK-3405
      52aaeec [Mike Jennings] [SPARK-3405] add subnet-id and vpc-id options to spark_ec2.py
      d12c0711
  21. Dec 04, 2014
    • alexdebrie's avatar
      [SPARK-4745] Fix get_existing_cluster() function with multiple security groups · 794f3aec
      alexdebrie authored
      The current get_existing_cluster() function would only find an instance belonging to a cluster if the instance's security groups == cluster_name + "-master" (or "-slaves"). This fix allows for multiple security groups by checking whether the cluster_name + "-master" security group is in the list of groups for a particular instance.
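
      The fixed predicate can be sketched as follows (the function name and argument shape are illustrative, not the patch itself):

      ```python
      # Sketch of the fixed membership test: an instance belongs to the
      # cluster if the "<cluster>-master" (or "-slaves") group appears
      # anywhere in its security group list, rather than being the only
      # group attached to the instance.
      def is_cluster_member(instance_group_names, cluster_name, role):
          target = cluster_name + "-" + role  # e.g. "mycluster-master"
          return target in instance_group_names

      # The old check was effectively: instance_group_names == [target],
      # which breaks as soon as the instance carries any extra group.
      ```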
      
      Author: alexdebrie <alexdebrie1@gmail.com>
      
      Closes #3596 from alexdebrie/master and squashes the following commits:
      
      9d51232 [alexdebrie] Fix get_existing_cluster() function with multiple security groups
      794f3aec
  22. Nov 29, 2014
    • Nicholas Chammas's avatar
      [SPARK-3398] [SPARK-4325] [EC2] Use EC2 status checks. · 317e114e
      Nicholas Chammas authored
      This PR re-introduces [0e648bc](https://github.com/apache/spark/commit/0e648bc2bedcbeb55fce5efac04f6dbad9f063b4) from PR #2339, which somehow never made it into the codebase.
      
      Additionally, it removes a now-unnecessary linear backoff on the SSH checks since we are blocking on EC2 status checks before testing SSH.
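
      The shape of that wait can be sketched like so; `get_statuses` stands in for a thin wrapper around boto's `get_all_instance_status` (assumption: it returns one status string per instance, `"ok"` once both EC2 status checks pass):

      ```python
      import time

      # Sketch: block on EC2 status checks before probing SSH, which is why
      # the SSH probe no longer needs its own linear backoff. Interval and
      # poll-count defaults are illustrative.
      def wait_for_status_checks(get_statuses, poll_interval=10, max_polls=60):
          for _ in range(max_polls):
              statuses = get_statuses()
              if statuses and all(s == "ok" for s in statuses):
                  return True
              time.sleep(poll_interval)
          return False
      ```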
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #3195 from nchammas/remove-ec2-ssh-backoff and squashes the following commits:
      
      efb29e1 [Nicholas Chammas] Revert "Remove linear backoff."
      ef3ca99 [Nicholas Chammas] reuse conn
      adb4eaa [Nicholas Chammas] Remove linear backoff.
      55caa24 [Nicholas Chammas] Check EC2 status checks before SSH.
      317e114e
  23. Nov 28, 2014
    • Sean Owen's avatar
      SPARK-1450 [EC2] Specify the default zone in the EC2 script help · 48223d88
      Sean Owen authored
      This looks like a one-liner, so I took a shot at it. There can be no fixed default availability zone since the names are different per region. But the default behavior can be documented:
      
      ```
          if opts.zone == "":
              opts.zone = random.choice(conn.get_all_zones()).name
      ```
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3454 from srowen/SPARK-1450 and squashes the following commits:
      
      9193cf3 [Sean Owen] Document that --zone defaults to a single random zone
      48223d88
  24. Nov 25, 2014
    • Xiangrui Meng's avatar
      [Spark-4509] Revert EC2 tag-based cluster membership patch · 7eba0fbe
      Xiangrui Meng authored
      This PR reverts changes related to tag-based cluster membership. As discussed in SPARK-3332, we didn't figure out a safe strategy to use tags to determine cluster membership, because tagging is not atomic. The following changes are reverted:
      
      SPARK-2333: 94053a7b
      SPARK-3213: 7faf755a
      SPARK-3608: 78d4220f.
      
      I tested launch, login, and destroy. It is easy to check the diff by comparing it to Josh's patch for branch-1.1:
      
      https://github.com/apache/spark/pull/2225/files
      
      JoshRosen I sent the PR to master. It might be easier for us to keep master and branch-1.2 the same at this time. We can always re-apply the patch once we figure out a stable solution.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #3453 from mengxr/SPARK-4509 and squashes the following commits:
      
      f0b708b [Xiangrui Meng] revert 94053a7b
      4298ea5 [Xiangrui Meng] revert 7faf755a
      35963a1 [Xiangrui Meng] Revert "SPARK-3608 Break if the instance tag naming succeeds"
      7eba0fbe
  25. Nov 05, 2014
  26. Nov 03, 2014
    • Nicholas Chammas's avatar
      [EC2] Factor out Mesos spark-ec2 branch · 2aca97c7
      Nicholas Chammas authored
      We reference a specific branch in two places. This patch makes it one place.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #3008 from nchammas/mesos-spark-ec2-branch and squashes the following commits:
      
      10a6089 [Nicholas Chammas] factor out mess spark-ec2 branch
      2aca97c7
  27. Oct 09, 2014
  28. Oct 07, 2014
    • Nicholas Chammas's avatar
      [SPARK-3398] [EC2] Have spark-ec2 intelligently wait for specific cluster states · 5912ca67
      Nicholas Chammas authored
      Instead of waiting arbitrary amounts of time for the cluster to reach a specific state, this patch lets `spark-ec2` explicitly wait for a cluster to reach a desired state.
      
      This is useful in a couple of situations:
      * The cluster is launching and you want to wait until SSH is available before installing stuff.
      * The cluster is being terminated and you want to wait until all the instances are terminated before trying to delete security groups.
      
      This patch removes the need for the `--wait` option and removes some of the time-based retry logic that was being used.
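
      The core idea can be sketched as a poll on instance state rather than a fixed sleep (the function and its injected `get_instance_states` callable are illustrative, not the patch itself):

      ```python
      import time

      # Sketch: poll until every instance in the cluster reports the desired
      # lifecycle state, e.g. "running" before installing anything over SSH,
      # or "terminated" before deleting security groups.
      def wait_for_cluster_state(get_instance_states, desired, poll_interval=10):
          while True:
              states = get_instance_states()
              if states and all(s == desired for s in states):
                  return
              time.sleep(poll_interval)
      ```

      Compared with a fixed `--wait` duration, this returns as soon as the cluster actually reaches the state, and never returns early on a slow launch.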
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2339 from nchammas/spark-ec2-wait-properly and squashes the following commits:
      
      43a69f0 [Nicholas Chammas] short-circuit SSH check; linear backoff
      9a9e035 [Nicholas Chammas] remove extraneous comment
      26c5ed0 [Nicholas Chammas] replace print with write()
      bb67c06 [Nicholas Chammas] deprecate wait option; remove dead code
      7969265 [Nicholas Chammas] fix long line (PEP 8)
      126e4cf [Nicholas Chammas] wait for specific cluster states
      5912ca67
  29. Sep 29, 2014
    • Nicholas Chammas's avatar
      [EC2] Sort long, manually-inputted dictionaries · aedd251c
      Nicholas Chammas authored
      Similar to the work done in #2571, this PR just sorts the remaining manually-inputted dicts in the EC2 script so they are easier to maintain.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2578 from nchammas/ec2-dict-sort and squashes the following commits:
      
      f55c692 [Nicholas Chammas] sort long dictionaries
      aedd251c
  30. Sep 28, 2014
    • Nicholas Chammas's avatar
      [EC2] Cleanup Python parens and disk dict · 1651cc11
      Nicholas Chammas authored
      Minor fixes:
      * Remove unnecessary parens (Python style)
      * Sort `disks_by_instance` dict and remove duplicate `t1.micro` key
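
      The duplicate-key hazard being fixed is easy to reproduce (values below are illustrative, not the script's real disk counts):

      ```python
      # Keeping a hand-maintained dict sorted makes a duplicate key such as
      # "t1.micro" easy to spot: in a Python dict literal the last duplicate
      # silently wins, so the mistake produces no error at all.
      disks_by_instance = {
          "m1.large": 2,
          "m1.medium": 1,
          "t1.micro": 0,
          "t1.micro": 0,  # duplicate entry: merged away without warning
      }
      ```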
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2571 from nchammas/ec2-polish and squashes the following commits:
      
      9d203d5 [Nicholas Chammas] paren and dict cleanup
      1651cc11