  1. Sep 06, 2014
• [SPARK-3397] Bump pom.xml version number of master branch to 1.2.0-SNAPSHOT · 607ae39c
      GuoQiang Li authored
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2268 from witgo/SPARK-3397 and squashes the following commits:
      
      eaf913f [GuoQiang Li] Bump pom.xml version number of master branch to 1.2.0-SNAPSHOT
      607ae39c
• Spark-3406 add a default storage level to python RDD persist API · da35330e
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #2280 from holdenk/SPARK-3406-Python-RDD-persist-api-does-not-have-default-storage-level and squashes the following commits:
      
      33eaade [Holden Karau] As Josh pointed out, sql also override persist. Make persist behave the same as in the underlying RDD as well
      e658227 [Holden Karau] Fix the test I added
      e95a6c5 [Holden Karau] The Python persist function did not have a default storageLevel unlike the Scala API. Noticed this issue because we got a bug report back from the book where we had documented it as if it was the same as the Scala API
      da35330e
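The fix above boils down to giving the Python `persist()` a default argument so it matches the Scala API. A minimal plain-Python sketch of that idea (the `StorageLevel` and `FakeRDD` classes below are illustrative, not the actual PySpark code, and `MEMORY_ONLY` is just an example default):

```python
# Sketch of SPARK-3406: give persist() a default storage level so callers can
# write rdd.persist() with no argument, as in the Scala API.
# StorageLevel and FakeRDD are hypothetical stand-ins, not PySpark classes.

class StorageLevel:
    MEMORY_ONLY = "MEMORY_ONLY"
    MEMORY_AND_DISK = "MEMORY_AND_DISK"

class FakeRDD:
    def __init__(self):
        self.storage_level = None

    def persist(self, storageLevel=StorageLevel.MEMORY_ONLY):
        # Before the fix, storageLevel was a required argument in Python,
        # unlike the Scala API, which defaults it.
        self.storage_level = storageLevel
        return self

rdd = FakeRDD()
rdd.persist()                              # uses the default
assert rdd.storage_level == StorageLevel.MEMORY_ONLY
rdd.persist(StorageLevel.MEMORY_AND_DISK)  # explicit override still works
```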
• [SPARK-2419][Streaming][Docs] More updates to the streaming programming guide · baff7e93
      Tathagata Das authored
      - Improvements to the kinesis integration guide from @cfregly
      - More information about unified input dstreams in main guide
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Chris Fregly <chris@fregly.com>
      
      Closes #2307 from tdas/streaming-doc-fix1 and squashes the following commits:
      
      ec40b5d [Tathagata Das] Updated figure with kinesis
      fdb9c5e [Tathagata Das] Fixed style issues with kinesis guide
      036d219 [Chris Fregly] updated kinesis docs and added an arch diagram
      24f622a [Tathagata Das] More modifications.
      baff7e93
• [EC2] don't duplicate default values · 0c681dd6
      Nicholas Chammas authored
      This PR makes two minor changes to the `spark-ec2` script:
      
1. The script's input parameter default values are duplicated in the help text, which is unnecessary. This PR replaces the duplicated info with the appropriate `optparse` placeholder.
2. The default Spark version currently needs to be updated by hand during each release, a process known to be error-prone. This PR places that default value in an easy-to-spot place.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2290 from nchammas/spark-ec2-default-version and squashes the following commits:
      
      0c6d3bb [Nicholas Chammas] don't duplicate default values
      0c681dd6
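The `optparse` placeholder mentioned above is the literal string `%default`, which the help formatter expands to the option's default value so it never has to be repeated by hand. A small sketch (the option name and version string below are illustrative, not taken from `spark-ec2`):

```python
# optparse expands the literal "%default" in a help string to the option's
# default value, so the default is defined in exactly one place.
from optparse import OptionParser

SPARK_VERSION_DEFAULT = "1.1.0"  # illustrative value, easy to spot and update

parser = OptionParser()
parser.add_option(
    "--spark-version", default=SPARK_VERSION_DEFAULT,
    help="Version of Spark to use (default: %default)")

# The formatted help text shows the real default without duplicating it:
help_text = parser.format_help()
assert "1.1.0" in help_text
```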
• [SPARK-3409][SQL] Avoid pulling in Exchange operator itself in Exchange's closures. · 1b9001f7
      Reynold Xin authored
This is a tiny optimization that moves the if check of sortBasedShuffledOn outside the closures, so the closures don't need to pull in the entire Exchange operator object.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2282 from rxin/SPARK-3409 and squashes the following commits:
      
      1de3f88 [Reynold Xin] [SPARK-3409][SQL] Avoid pulling in Exchange operator itself in Exchange's closures.
      1b9001f7
• [SPARK-3361] Expand PEP 8 checks to include EC2 script and Python examples · 9422c4ee
      Nicholas Chammas authored
      This PR resolves [SPARK-3361](https://issues.apache.org/jira/browse/SPARK-3361) by expanding the PEP 8 checks to cover the remaining Python code base:
      * The EC2 script
      * All Python / PySpark examples
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2297 from nchammas/pep8-rulez and squashes the following commits:
      
      1e5ac9a [Nicholas Chammas] PEP 8 fixes to Python examples
      c3dbeff [Nicholas Chammas] PEP 8 fixes to EC2 script
      65ef6e8 [Nicholas Chammas] expand PEP 8 checks
      9422c4ee
  2. Sep 05, 2014
• [Build] suppress curl/wget progress bars · 19f61c16
      Nicholas Chammas authored
      In the Jenkins console output, `curl` gives us mountains of `#` symbols as it tries to show its download progress.
      
      ![noise from curl in Jenkins output](http://i.imgur.com/P2E7yUw.png)
      
I don't think this is useful, so I've changed things to suppress these progress bars. If this output actually has some use, feel free to reject this proposal.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2279 from nchammas/trim-test-output and squashes the following commits:
      
      14a720c [Nicholas Chammas] suppress curl/wget progress bars
      19f61c16
• SPARK-3211 .take() is OOM-prone with empty partitions · ba5bcadd
      Andrew Ash authored
      Instead of jumping straight from 1 partition to all partitions, do exponential
      growth and double the number of partitions to attempt each time instead.
      
      Fix proposed by Paul Nepywoda
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #2117 from ash211/SPARK-3211 and squashes the following commits:
      
      8b2299a [Andrew Ash] Quadruple instead of double for a minor speedup
      e5f7e4d [Andrew Ash] Update comment to better reflect what we're doing
      09a27f7 [Andrew Ash] Update PySpark to be less OOM-prone as well
      3a156b8 [Andrew Ash] SPARK-3211 .take() is OOM-prone with empty partitions
      ba5bcadd
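The strategy above can be sketched in plain Python (this is an illustration of the growth idea, not Spark's actual implementation): rather than jumping from scanning 1 partition straight to scanning all of them, grow the number of partitions tried each round.

```python
# Pure-Python sketch of exponential partition-scan growth for take().
GROWTH_FACTOR = 4  # the commit quadruples rather than doubles, for a minor speedup

def take(partitions, num):
    """Collect the first `num` elements from a list of partitions,
    scanning an exponentially growing number of partitions per round."""
    taken = []
    scanned = 0
    num_to_try = 1
    while len(taken) < num and scanned < len(partitions):
        for part in partitions[scanned:scanned + num_to_try]:
            taken.extend(part)
        scanned += num_to_try
        num_to_try *= GROWTH_FACTOR  # grow instead of jumping to all partitions
    return taken[:num]

# A long run of empty partitions no longer forces a scan of everything at once:
parts = [[]] * 10 + [[1, 2, 3], [4, 5]]
assert take(parts, 4) == [1, 2, 3, 4]
```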
• [SPARK-3399][PySpark] Test for PySpark should ignore HADOOP_CONF_DIR and YARN_CONF_DIR · 7ff8c45d
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2270 from sarutak/SPARK-3399 and squashes the following commits:
      
      7613be6 [Kousuke Saruta] Modified pyspark script to ignore environment variables YARN_CONF_DIR and HADOOP_CONF_DIR while testing
      7ff8c45d
• [SPARK-3375] spark on yarn container allocation issues · 62c55760
      Thomas Graves authored
If YARN doesn't grant the containers immediately, we stop asking for them and the YARN application hangs, never getting any executors.

The issue here is that after sending the original request for X containers, we send a follow-up request for 0 containers, which on the YARN side clears out the original request.

For a ping we should just send empty asks.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #2275 from tgravescs/SPARK-3375 and squashes the following commits:
      
      74b6820 [Thomas Graves] send empty resource requests when we aren't asking for containers
      62c55760
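The protocol pitfall described in this fix can be sketched in plain Python (the `FakeResourceManager` below is a hypothetical stand-in, not the YARN API): a request for 0 containers overwrites the outstanding request, while a heartbeat carrying no ask at all leaves it intact.

```python
# Sketch of the ask-replacement semantics behind SPARK-3375.
class FakeResourceManager:
    def __init__(self):
        self.outstanding = 0

    def allocate(self, asks):
        # Each ask in the list *replaces* the outstanding request;
        # an empty list is just a heartbeat and changes nothing.
        for count in asks:
            self.outstanding = count

rm = FakeResourceManager()
rm.allocate([5])   # original request for 5 containers
rm.allocate([0])   # buggy ping: asking for 0 clears the request
assert rm.outstanding == 0

rm.allocate([5])
rm.allocate([])    # fixed ping: empty ask list, request preserved
assert rm.outstanding == 5
```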
• [SPARK-3260] yarn - pass acls along with executor launch · 51b53a75
      Thomas Graves authored
      Pass along the acl settings when we launch a container so that they can be applied to viewing the logs on a running NodeManager.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #2185 from tgravescs/SPARK-3260 and squashes the following commits:
      
      6f94b5a [Thomas Graves] make unit test more robust
      28b9dd3 [Thomas Graves] yarn - pass acls along with executor launch
      51b53a75
• [Docs] fix minor MLlib case typo · 6a37ed83
      Nicholas Chammas authored
      Also make the list of features consistent in style.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2278 from nchammas/patch-1 and squashes the following commits:
      
      56df319 [Nicholas Chammas] [Docs] fix minor MLlib case typo
      6a37ed83
• [SPARK-3391][EC2] Support attaching up to 8 EBS volumes. · 1725a1a5
      Reynold Xin authored
      Please merge this at the same time as https://github.com/mesos/spark-ec2/pull/66
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2260 from rxin/ec2-ebs-vol and squashes the following commits:
      
      b9527d9 [Reynold Xin] Removed io1 ebs type.
      bf9c403 [Reynold Xin] Made EBS volume type configurable.
      c8e25ea [Reynold Xin] Support up to 8 EBS volumes.
      adf4f2e [Reynold Xin] Revert git repo change.
      020c542 [Reynold Xin] [SPARK-3391] Support attaching more than 1 EBS volumes.
      1725a1a5
  3. Sep 04, 2014
• [SPARK-3392] [SQL] Show value spark.sql.shuffle.partitions for mapred.reduce.tasks · 1904bac3
      Cheng Hao authored
This is a tiny fix for getting the value of "mapred.reduce.tasks", which makes more sense for Hive users, as well as for the command "set -v", which should output verbose information for all of the key/value pairs.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2261 from chenghao-intel/set_mapreduce_tasks and squashes the following commits:
      
      653858a [Cheng Hao] show value spark.sql.shuffle.partitions for mapred.reduce.tasks
      1904bac3
• [SPARK-2219][SQL] Added support for the "add jar" command · ee575f12
      Cheng Lian authored
      Adds logical and physical command classes for the "add jar" command.
      
      Note that this PR conflicts with and should be merged after #2215.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2242 from liancheng/add-jar and squashes the following commits:
      
      e43a2f1 [Cheng Lian] Updates AddJar according to conventions introduced in #2215
      b99107f [Cheng Lian] Added test case for ADD JAR command
      095b2c7 [Cheng Lian] Also forward ADD JAR command to Hive
      9be031b [Cheng Lian] Trims Jar path string
      8195056 [Cheng Lian] Added support for the "add jar" command
      ee575f12
• [SPARK-3310][SQL] Directly use currentTable without unnecessary implicit conversion · 3eb6ef31
      Liang-Chi Hsieh authored
      We can directly use currentTable there without unnecessary implicit conversion.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2203 from viirya/direct_use_inmemoryrelation and squashes the following commits:
      
      4741d02 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into direct_use_inmemoryrelation
      b671f67 [Liang-Chi Hsieh] Can directly use currentTable there without unnecessary implicit conversion.
      3eb6ef31
• Manually close old PR · 90b17a70
      Matei Zaharia authored
      Closes #544
      90b17a70
• Manually close old PR · 0fdf2f5a
      Matei Zaharia authored
      Closes #1588
      0fdf2f5a
• [SPARK-3378] [DOCS] Replace the word "SparkSQL" with right word "Spark SQL" · dc1ba9e9
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2251 from sarutak/SPARK-3378 and squashes the following commits:
      
      0bfe234 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3378
      bb5938f [Kousuke Saruta] Replaced rest of "SparkSQL" with "Spark SQL"
      6df66de [Kousuke Saruta] Replaced "SparkSQL" with "Spark SQL"
      dc1ba9e9
• [SPARK-3401][PySpark] Wrong usage of tee command in python/run-tests · 4feb46c5
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2272 from sarutak/SPARK-3401 and squashes the following commits:
      
      2b35a59 [Kousuke Saruta] Modified wrong usage of tee command in python/run-tests
      4feb46c5
• [Minor]Remove extra semicolon in FlumeStreamSuite.scala · 90586190
      GuoQiang Li authored
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2265 from witgo/FlumeStreamSuite and squashes the following commits:
      
      6c99e6e [GuoQiang Li] Remove extra semicolon in FlumeStreamSuite.scala
      90586190
• [HOTFIX] [SPARK-3400] Revert 9b225ac3 "fix GraphX EdgeRDD zipPartitions" · 00362dac
      Ankur Dave authored
      9b225ac3 has been causing GraphX tests
      to fail nondeterministically, which is blocking development for others.
      
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #2271 from ankurdave/SPARK-3400 and squashes the following commits:
      
      10c2a97 [Ankur Dave] [HOTFIX] [SPARK-3400] Revert 9b225ac3 "fix GraphX EdgeRDD zipPartitions"
      00362dac
  4. Sep 03, 2014
• [SPARK-3372] [MLlib] MLlib doesn't pass maven build / checkstyle due to... · 1bed0a38
      Kousuke Saruta authored
      [SPARK-3372] [MLlib] MLlib doesn't pass maven build / checkstyle due to multi-byte character contained in Gradient.scala
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2248 from sarutak/SPARK-3372 and squashes the following commits:
      
      73a28b8 [Kousuke Saruta] Replaced UTF-8 hyphen with ascii hyphen
      1bed0a38
• [SPARK-2435] Add shutdown hook to pyspark · 7c6e71f0
      Matthew Farrellee authored
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #2183 from mattf/SPARK-2435 and squashes the following commits:
      
      ee0ee99 [Matthew Farrellee] [SPARK-2435] Add shutdown hook to pyspark
      7c6e71f0
• [SPARK-3335] [SQL] [PySpark] support broadcast in Python UDF · c5cbc492
      Davies Liu authored
      After this patch, broadcast can be used in Python UDF.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2243 from davies/udf_broadcast and squashes the following commits:
      
      7b88861 [Davies Liu] support broadcast in UDF
      c5cbc492
• [SPARK-2961][SQL] Use statistics to prune batches within cached partitions · 248067ad
      Cheng Lian authored
      This PR is based on #1883 authored by marmbrus. Key differences:
      
      1. Batch pruning instead of partition pruning
      
   When #1883 was authored, batched column buffer building (#1880) hadn't been introduced yet. This PR combines the two and provides partition batch level pruning, which leads to smaller memory footprints and can generally skip more elements. The cost is that the pruning predicates are evaluated more frequently (the partition count multiplied by the number of batches per partition).
      
2. More filters are supported

   Filter predicates built from `=`, `<`, `<=`, `>`, `>=`, and their conjunctions and disjunctions are supported.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2188 from liancheng/in-mem-batch-pruning and squashes the following commits:
      
      68cf019 [Cheng Lian] Marked sqlContext as @transient
      4254f6c [Cheng Lian] Enables in-memory partition pruning in PartitionBatchPruningSuite
      3784105 [Cheng Lian] Overrides InMemoryColumnarTableScan.sqlContext
      d2a1d66 [Cheng Lian] Disables in-memory partition pruning by default
      062c315 [Cheng Lian] HiveCompatibilitySuite code cleanup
      16b77bf [Cheng Lian] Fixed pruning predication conjunctions and disjunctions
      16195c5 [Cheng Lian] Enabled both disjunction and conjunction
      89950d0 [Cheng Lian] Worked around Scala style check
      9c167f6 [Cheng Lian] Minor code cleanup
      3c4d5c7 [Cheng Lian] Minor code cleanup
      ea59ee5 [Cheng Lian] Renamed PartitionSkippingSuite to PartitionBatchPruningSuite
      fc517d0 [Cheng Lian] More test cases
      1868c18 [Cheng Lian] Code cleanup, bugfix, and adding tests
      cb76da4 [Cheng Lian] Added more predicate filters, fixed table scan stats for testing purposes
      385474a [Cheng Lian] Merge branch 'inMemStats' into in-mem-batch-pruning
      248067ad
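The pruning idea above can be illustrated in plain Python (a hand-rolled sketch, not the Spark SQL implementation): each batch in a cached partition keeps min/max statistics per column, and a batch is skipped whenever those statistics prove that a predicate such as `col < v` cannot match any row in it.

```python
# Sketch of statistics-based batch pruning within cached partitions.

def overlaps(stats, op, value):
    """Can a batch whose column has (lo, hi) stats contain a matching row?"""
    lo, hi = stats
    if op == "=":
        return lo <= value <= hi
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    raise ValueError(op)

def prune(batches, op, value):
    """Keep only the batches whose stats admit a match; only these are scanned."""
    return [rows for (stats, rows) in batches if overlaps(stats, op, value)]

batches = [((1, 10), [1, 5, 10]), ((11, 20), [11, 20]), ((21, 30), [25])]
assert prune(batches, "<", 11) == [[1, 5, 10]]   # later batches skipped
assert prune(batches, ">=", 20) == [[11, 20], [25]]
```

Conjunctions and disjunctions then compose: a batch survives an AND only if every conjunct overlaps it, and survives an OR if any disjunct does.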
• [SPARK-2973][SQL] Lightweight SQL commands without distributed jobs when calling .collect() · f48420fd
      Cheng Lian authored
By overriding `executeCollect()` in the physical plan classes of all commands, we can avoid kicking off a distributed job when collecting the result of a SQL command, e.g. `sql("SET").collect()`.

Previously, `Command.sideEffectResult` returned a `Seq[Any]`, and the `execute()` method in sub-classes of `Command` typically converted that to a `Seq[Row]` and then parallelized it into an RDD. Now with this PR, `sideEffectResult` is required to return a `Seq[Row]` directly, so that `executeCollect()` can leverage it directly and be factored into the `Command` parent class.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2215 from liancheng/lightweight-commands and squashes the following commits:
      
      3fbef60 [Cheng Lian] Factored execute() method of physical commands to parent class Command
      5a0e16c [Cheng Lian] Passes test suites
      e0e12e9 [Cheng Lian] Refactored Command.sideEffectResult and Command.executeCollect
      995bdd8 [Cheng Lian] Cleaned up DescribeHiveTableCommand
      542977c [Cheng Lian] Avoids confusion between logical and physical plan by adding package prefixes
      55b2aa5 [Cheng Lian] Avoids distributed jobs when execution SQL commands
      f48420fd
• [SPARK-3233] Executor never stops its SparkEnv, BlockManager, ConnectionManager etc. · 4bba10c4
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2138 from sarutak/SPARK-3233 and squashes the following commits:
      
      c0205b7 [Kousuke Saruta] Merge branch 'SPARK-3233' of github.com:sarutak/spark into SPARK-3233
      064679d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3233
      d3005fd [Kousuke Saruta] Modified Class definition format of BlockManagerMaster
      039b747 [Kousuke Saruta] Modified style
      889e2d1 [Kousuke Saruta] Modified BlockManagerMaster to be able to be past isDriver flag
      4da8535 [Kousuke Saruta] Modified BlockManagerMaster#stop to send StopBlockManagerMaster message when sender is Driver
      6518c3a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3233
      d5ab19a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3233
      6bce25c [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3233
      6058a58 [Kousuke Saruta] Modified Executor not to invoke SparkEnv#stop in local mode
      e5ad9d3 [Kousuke Saruta] Modified Executor to stop SparnEnv at the end of itself
      4bba10c4
• [SPARK-3303][core] fix SparkContextSchedulerCreationSuite test error · e08ea739
      scwf authored
Run the test on the master branch with this command when the Mesos native library is set:
sbt/sbt -Phive "test-only org.apache.spark.SparkContextSchedulerCreationSuite"

This produces the following error:
      [info] SparkContextSchedulerCreationSuite:
      [info] - bad-master
      [info] - local
      [info] - local-*
      [info] - local-n
      [info] - local--n-failures
      [info] - local-n-failures
      [info] - bad-local-n
      [info] - bad-local-n-failures
      [info] - local-default-parallelism
      [info] - simr
      [info] - local-cluster
      [info] - yarn-cluster
      [info] - yarn-standalone
      [info] - yarn-client
      [info] - mesos fine-grained
      [info] - mesos coarse-grained ** FAILED ***
      [info] Executor Spark home `spark.mesos.executor.home` is not set!
      
Since `executorSparkHome` is only used in `createCommand`, move `val executorSparkHome...` into `createCommand` to fix this issue.
      
      Author: scwf <wangfei1@huawei.com>
      Author: wangfei <wangfei_hello@126.com>
      
      Closes #2199 from scwf/SparkContextSchedulerCreationSuite and squashes the following commits:
      
      ef1de22 [scwf] fix code fomate
      19d26f3 [scwf] fix conflict
      d9a8a60 [wangfei] fix SparkContextSchedulerCreationSuite test error
      e08ea739
• [SPARK-2419][Streaming][Docs] Updates to the streaming programming guide · a5224079
      Tathagata Das authored
      Updated the main streaming programming guide, and also added source-specific guides for Kafka, Flume, Kinesis.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #2254 from tdas/streaming-doc-fix and squashes the following commits:
      
      e45c6d7 [Jacek Laskowski] More fixes from an old PR
      5125316 [Tathagata Das] Fixed links
      dc02f26 [Tathagata Das] Refactored streaming kinesis guide and made many other changes.
      acbc3e3 [Tathagata Das] Fixed links between streaming guides.
      cb7007f [Tathagata Das] Added Streaming + Flume integration guide.
      9bd9407 [Tathagata Das] Updated streaming programming guide with additional information from SPARK-2419.
      a5224079
• [SPARK-3345] Do correct parameters for ShuffleFileGroup · 996b7434
      Liang-Chi Hsieh authored
In the method `newFileGroup` of class `FileShuffleBlockManager`, the parameters used to create a new `ShuffleFileGroup` object are in the wrong order.

Because the parameters `shuffleId` and `fileId` are not currently used, this doesn't cause a problem now. However, it should be corrected for readability and to avoid future problems.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2235 from viirya/correct_shufflefilegroup_params and squashes the following commits:
      
      fe72567 [Liang-Chi Hsieh] Do correct parameters for ShuffleFileGroup.
      996b7434
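This class of bug (same-typed positional parameters passed in the wrong order, which compiles and runs without complaint) is easy to sketch in plain Python; the class below borrows the names from the report but is illustrative, not Spark's code.

```python
# Sketch of the swapped-parameter bug behind SPARK-3345.
class ShuffleFileGroup:
    def __init__(self, shuffleId, fileId, files):
        self.shuffleId = shuffleId
        self.fileId = fileId
        self.files = files

shuffle_id, file_id = 7, 3

# Wrong order runs fine because both ids are plain ints:
swapped = ShuffleFileGroup(file_id, shuffle_id, [])
assert swapped.shuffleId == 3  # silently wrong

# Passing by keyword makes the order irrelevant and the intent explicit:
correct = ShuffleFileGroup(shuffleId=shuffle_id, fileId=file_id, files=[])
assert (correct.shuffleId, correct.fileId) == (7, 3)
```

In Scala the equivalent defense is named arguments (`ShuffleFileGroup(shuffleId = ..., fileId = ...)`) or distinct wrapper types for the two ids.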
• [Minor] Fix outdated Spark version · 2784822e
      Andrew Or authored
      This is causing the event logs to include a file called SPARK_VERSION_1.0.0, which is not accurate.
      
      Author: Andrew Or <andrewor14@gmail.com>
      Author: andrewor14 <andrewor14@gmail.com>
      
      Closes #2255 from andrewor14/spark-version and squashes the following commits:
      
      1fbdfe9 [andrewor14] Snapshot
      805a1c8 [Andrew Or] JK. Update Spark version to 1.2.0 instead.
      bffbaab [Andrew Or] Update Spark version to 1.1.0
      2784822e
• [SPARK-3388] Expose application ID in ApplicationStart event, use it in history server. · f2b5b619
      Marcelo Vanzin authored
      This change exposes the application ID generated by the Spark Master, Mesos or Yarn
      via the SparkListenerApplicationStart event. It then uses that information to expose the
      application via its ID in the history server, instead of using the internal directory name
      generated by the event logger as an application id. This allows someone who knows
the application ID to easily figure out the URL for the application's entry in the HS; it also looks better.
      
      In Yarn mode, this is used to generate a direct link from the RM application list to the
      Spark history server entry (thus providing a fix for SPARK-2150).
      
      Note this sort of assumes that the different managers will generate app ids that are
      sufficiently different from each other that clashes will not occur.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Andrew Or <andrewor14@gmail.com>
      
      Closes #1218 from vanzin/yarn-hs-link-2 and squashes the following commits:
      
      2d19f3c [Marcelo Vanzin] Review feedback.
      6706d3a [Marcelo Vanzin] Implement applicationId() in base classes.
      56fe42e [Marcelo Vanzin] Fix cluster mode history address, plus a cleanup.
      44112a8 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      8278316 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      a86bbcf [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      a0056e6 [Marcelo Vanzin] Unbreak test.
      4b10cfd [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      cb0cab2 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      25f2826 [Marcelo Vanzin] Add MIMA excludes.
      f0ba90f [Marcelo Vanzin] Use BufferedIterator.
      c90a08d [Marcelo Vanzin] Remove unused code.
      3f8ec66 [Marcelo Vanzin] Review feedback.
      21aa71b [Marcelo Vanzin] Fix JSON test.
      b022bae [Marcelo Vanzin] Undo SparkContext cleanup.
      c6d7478 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      4e3483f [Marcelo Vanzin] Fix test.
      57517b8 [Marcelo Vanzin] Review feedback. Mostly, more consistent use of Scala's Option.
      311e49d [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      d35d86f [Marcelo Vanzin] Fix yarn backend after rebase.
      36dc362 [Marcelo Vanzin] Don't use Iterator::takeWhile().
      0afd696 [Marcelo Vanzin] Wait until master responds before returning from start().
      abc4697 [Marcelo Vanzin] Make FsHistoryProvider keep a map of applications by id.
      26b266e [Marcelo Vanzin] Use Mesos framework ID as Spark application ID.
      b3f3664 [Marcelo Vanzin] [yarn] Make the RM link point to the app direcly in the HS.
      2fb7de4 [Marcelo Vanzin] Expose the application ID in the ApplicationStart event.
      ed10348 [Marcelo Vanzin] Expose application id to spark context.
      f2b5b619
• [SPARK-2845] Add timestamps to block manager events. · ccc69e26
      Marcelo Vanzin authored
      These are not used by the UI but are useful when analysing the
      logs from a spark job.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #654 from vanzin/bm-event-tstamp and squashes the following commits:
      
      d5d6e66 [Marcelo Vanzin] Fix tests.
      ec06218 [Marcelo Vanzin] Review feedback.
      f134dbc [Marcelo Vanzin] Merge branch 'master' into bm-event-tstamp
      b495b7c [Marcelo Vanzin] Merge branch 'master' into bm-event-tstamp
      7d2fe9e [Marcelo Vanzin] Review feedback.
      d6f381c [Marcelo Vanzin] Update tests added after patch was created.
      45e3bf8 [Marcelo Vanzin] Fix unit test after merge.
      b37a10f [Marcelo Vanzin] Use === in test assertions.
      ef72824 [Marcelo Vanzin] Handle backwards compatibility with 1.0.0.
      aca1151 [Marcelo Vanzin] Fix unit test to check new fields.
      efdda8e [Marcelo Vanzin] Add timestamps to block manager events.
      ccc69e26
• [SPARK-3263][GraphX] Fix changes made to GraphGenerator.logNormalGraph in PR #720 · e5d37680
      RJ Nowling authored
      PR #720 made multiple changes to GraphGenerator.logNormalGraph including:
      
      * Replacing the call to functions for generating random vertices and edges with in-line implementations with different equations. Based on reading the Pregel paper, I believe the in-line functions are incorrect.
      * Hard-coding of RNG seeds so that method now generates the same graph for a given number of vertices, edges, mu, and sigma -- user is not able to override seed or specify that seed should be randomly generated.
      * Backwards-incompatible change to logNormalGraph signature with introduction of new required parameter.
      * Failed to update scala docs and programming guide for API changes
      * Added a Synthetic Benchmark in the examples.
      
      This PR:
      * Removes the in-line calls and calls original vertex / edge generation functions again
      * Adds an optional seed parameter for deterministic behavior (when desired)
      * Keeps the number of partitions parameter that was added.
      * Keeps compatibility with the synthetic benchmark example
      * Maintains backwards-compatible API
      
      Author: RJ Nowling <rnowling@gmail.com>
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #2168 from rnowling/graphgenrand and squashes the following commits:
      
      f1cd79f [Ankur Dave] Style fixes
      e11918e [RJ Nowling] Fix bad comparisons in unit tests
      785ac70 [RJ Nowling] Fix style error
      c70868d [RJ Nowling] Fix logNormalGraph scala doc for seed
      41fd1f8 [RJ Nowling] Fix logNormalGraph scala doc for seed
      799f002 [RJ Nowling] Added test for different seeds for sampleLogNormal
      43949ad [RJ Nowling] Added test for different seeds for generateRandomEdges
      2faf75f [RJ Nowling] Added unit test for logNormalGraph
      82f22397 [RJ Nowling] Add unit test for sampleLogNormal
      b99cba9 [RJ Nowling] Make sampleLogNormal private to Spark (vs private) for unit testing
      6803da1 [RJ Nowling] Add GraphGeneratorsSuite with test for generateRandomEdges
      1c8fc44 [RJ Nowling] Connected components part of SynthBenchmark was failing to call count on RDD before printing
      dfbb6dd [RJ Nowling] Fix parameter name in SynthBenchmark docs
      b5eeb80 [RJ Nowling] Add optional seed parameter to SynthBenchmark and set default to randomly generate a seed
      1ff8d30 [RJ Nowling] Fix bug in generateRandomEdges where numVertices instead of numEdges was used to control number of edges to generate
      98bb73c [RJ Nowling] Add documentation for logNormalGraph parameters
      d40141a [RJ Nowling] Fix style error
      684804d [RJ Nowling] revert PR #720 which introduce errors in logNormalGraph and messed up seeding of RNGs.  Add user-defined optional seed for deterministic behavior
      c183136 [RJ Nowling] Fix to deterministic GraphGenerators.logNormalGraph that allows generating graphs randomly using optional seed.
      015010c [RJ Nowling] Fixed GraphGenerator logNormalGraph API to make backward-incompatible change in commit 894ecde0
      e5d37680
• [SPARK-3309] [PySpark] Put all public API in __all__ · 6481d274
      Davies Liu authored
Put all public APIs in __all__, and also list them in pyspark.__init__.py, so we can get the documentation for all public APIs via `pydoc pyspark`. This can also be used by other programs (such as Sphinx or Epydoc) to generate documentation only for public APIs.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2205 from davies/public and squashes the following commits:
      
      c6c5567 [Davies Liu] fix message
      f7b35be [Davies Liu] put SchemeRDD, Row in pyspark.sql module
      7e3016a [Davies Liu] add __all__ in mllib
      6281b48 [Davies Liu] fix doc for SchemaRDD
      6caab21 [Davies Liu] add public interfaces into pyspark.__init__.py
      6481d274
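The mechanism the PR applies is standard Python: names listed in a module's `__all__` are what `from module import *` exports, and documentation tools such as pydoc, Sphinx, and Epydoc consult it to decide which APIs are public. A minimal demonstration (the module is written to a temporary file purely for illustration):

```python
# Demonstration of __all__ as the public-API declaration of a module.
import importlib.util
import os
import tempfile

module_source = '''
__all__ = ["public_function"]

def public_function():
    return "public"

def _internal_helper():
    return "private"
'''

# Write the example module to disk and import it by path.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(module_source)
    path = f.name

spec = importlib.util.spec_from_file_location("example_mod", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
os.unlink(path)

# Only the names in __all__ are advertised as public:
assert mod.__all__ == ["public_function"]
assert mod.public_function() == "public"
```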
• [SPARK-3187] [yarn] Cleanup allocator code. · 6a72a369
      Marcelo Vanzin authored
      Move all shared logic to the base YarnAllocator class, and leave
      the version-specific logic in the version-specific module.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2169 from vanzin/SPARK-3187 and squashes the following commits:
      
      46c2826 [Marcelo Vanzin] Hide the privates.
      4dc9c83 [Marcelo Vanzin] Actually release containers.
      8b1a077 [Marcelo Vanzin] Changes to the Yarn alpha allocator.
      f3f5f1d [Marcelo Vanzin] [SPARK-3187] [yarn] Cleanup allocator code.
      6a72a369
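The refactoring pattern described above (shared logic in a base class, version-specific logic overridden per module) can be sketched in plain Python; the class and method names below are illustrative, not the actual YARN allocator code.

```python
# Template-method sketch of the SPARK-3187 allocator cleanup.
class YarnAllocatorBase:
    def __init__(self):
        self.released = []

    def allocate(self, wanted):
        # Shared logic: normalize the request, then delegate the
        # version-specific request format to the subclass.
        return self._request_containers(max(wanted, 0))

    def release(self, container):
        # Shared bookkeeping, identical across YARN versions.
        self.released.append(container)

    def _request_containers(self, count):
        raise NotImplementedError  # version-specific

class AlphaYarnAllocator(YarnAllocatorBase):
    def _request_containers(self, count):
        return f"alpha-api:request({count})"

class StableYarnAllocator(YarnAllocatorBase):
    def _request_containers(self, count):
        return f"stable-api:ask({count})"

assert AlphaYarnAllocator().allocate(3) == "alpha-api:request(3)"
assert StableYarnAllocator().allocate(3) == "stable-api:ask(3)"
```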
  5. Sep 02, 2014
• SPARK-3358: [EC2] Switch back to HVM instances for m3.X. · c64cc435
      Patrick Wendell authored
      During regression tests of Spark 1.1 we discovered perf issues with
      PVM instances when running PySpark. This reverts a change added in #1156
      which changed the default type for m3 instances to PVM.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2244 from pwendell/ec2-hvm and squashes the following commits:
      
      1342d7e [Patrick Wendell] SPARK-3358: [EC2] Switch back to HVM instances for m3.X.
      c64cc435
• [SPARK-3300][SQL] No need to call clear() and shorten build() · 24ab3840
      Liang-Chi Hsieh authored
The function `ensureFreeSpace` in object `ColumnBuilder` clears the old buffer before copying its content to the new buffer. This PR fixes that.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2195 from viirya/fix_buffer_clear and squashes the following commits:
      
      792f009 [Liang-Chi Hsieh] no need to call clear(). use flip() instead of calling limit(), position() and rewind().
      df2169f [Liang-Chi Hsieh] should clean old buffer after copying its content.
      24ab3840
• [SQL] Renamed ColumnStat to ColumnMetrics to avoid confusion between ColumnStats · 19d3e1e8
      Cheng Lian authored
      Class names of these two are just too similar.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2189 from liancheng/column-metrics and squashes the following commits:
      
      8bb3b21 [Cheng Lian] Renamed ColumnStat to ColumnMetrics to avoid confusion between ColumnStats
      19d3e1e8