  1. May 02, 2015
    • Burak Yavuz's avatar
      [SPARK-7242] added python api for freqItems in DataFrames · 2e0f3579
      Burak Yavuz authored
      The Python API for DataFrame's freqItems, plus changes addressing your comments from the previous PR.
      rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5859 from brkyvz/df-freq-py2 and squashes the following commits:
      
      f9aa9ce [Burak Yavuz] addressed comments v0.1
      4b25056 [Burak Yavuz] added python api for freqItems
      2e0f3579
  2. May 01, 2015
    • Mridul Muralidharan's avatar
      [SPARK-7317] [Shuffle] Expose shuffle handle · b79aeb95
      Mridul Muralidharan authored
      Details are in the JIRA. In a nutshell, all the machinery for custom RDDs to leverage Spark shuffle directly (without exposing implementation details of shuffle) exists, except for this small piece.
      
      Exposing this will allow custom dependencies to get a handle to ShuffleHandle, which they can then leverage on the reduce side.
      
      Author: Mridul Muralidharan <mridulm@yahoo-inc.com>
      
      Closes #5857 from mridulm/expose_shuffle_handle and squashes the following commits:
      
      d8b6bd4 [Mridul Muralidharan] Expose ShuffleHandle
      b79aeb95
    • Marcelo Vanzin's avatar
      [SPARK-6229] Add SASL encryption to network library. · 38d4e9e4
      Marcelo Vanzin authored
      There are two main parts of this change:
      
      - Extending the bootstrap mechanism in the network library to add a server-side
        bootstrap (which works a little bit differently than the client-side bootstrap), and
        to allow the bootstraps to modify the underlying channel.
      
      - Use SASL to encrypt data going through the RPC channel.
      
      The second item requires some non-optimal code to be able to work around the
      fact that the outbound path in netty is not thread-safe, and ordering is very important
      when encryption is in the picture.
      
      A lot of the changes outside the network/common library are just to adjust to the
      changed API for initializing the RPC server.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5377 from vanzin/SPARK-6229 and squashes the following commits:
      
      ff01966 [Marcelo Vanzin] Use fancy new size config style.
      be53f32 [Marcelo Vanzin] Merge branch 'master' into SPARK-6229
      47d4aff [Marcelo Vanzin] Merge branch 'master' into SPARK-6229
      7a2a805 [Marcelo Vanzin] Clean up some unneeded changes.
      2f92237 [Marcelo Vanzin] Add comment.
      67bb0c6 [Marcelo Vanzin] Revert "Avoid exposing ByteArrayWritableChannel outside of test code."
      065f684 [Marcelo Vanzin] Add test to verify chunking.
      3d1695d [Marcelo Vanzin] Minor cleanups.
      73cff0e [Marcelo Vanzin] Skip bytes in decode path too.
      318ad23 [Marcelo Vanzin] Avoid exposing ByteArrayWritableChannel outside of test code.
      346f829 [Marcelo Vanzin] Avoid trip through channel selector by not reporting 0 bytes written.
      a4a5938 [Marcelo Vanzin] Review feedback.
      4797519 [Marcelo Vanzin] Remove unused import.
      9908ada [Marcelo Vanzin] Fix test, SASL backend disposal.
      7fe1489 [Marcelo Vanzin] Add a test that makes sure encryption is actually enabled.
      adb6f9d [Marcelo Vanzin] Review feedback.
      cf2a605 [Marcelo Vanzin] Clean up some code.
      8584323 [Marcelo Vanzin] Fix a comment.
      e98bc55 [Marcelo Vanzin] Add option to only allow encrypted connections to the server.
      dad42fc [Marcelo Vanzin] Make encryption thread-safe, less memory-intensive.
      b00999a [Marcelo Vanzin] Consolidate ByteArrayWritableChannel, fix SASL code to match master changes.
      b923cae [Marcelo Vanzin] Make SASL encryption handler thread-safe, handle FileRegion messages.
      39539a7 [Marcelo Vanzin] Add config option to enable SASL encryption.
      351a86f [Marcelo Vanzin] Add SASL encryption to network library.
      fbe6ccb [Marcelo Vanzin] Add TransportServerBootstrap, make SASL code use it.
      38d4e9e4
    • Chris Heller's avatar
      [SPARK-2691] [MESOS] Support for Mesos DockerInfo · 8f50a07d
      Chris Heller authored
      This patch adds partial support for running Spark on Mesos inside a Docker container. Only fine-grained mode is presently supported, and no check is made that the version of libmesos is recent enough to have a DockerInfo structure in the protobuf (other than pinning a Mesos version in the pom.xml).
      
      Author: Chris Heller <hellertime@gmail.com>
      
      Closes #3074 from hellertime/SPARK-2691 and squashes the following commits:
      
      d504af6 [Chris Heller] Assist type inference
      f64885d [Chris Heller] Fix errant line length
      17c41c0 [Chris Heller] Base Dockerfile on mesosphere/mesos image
      8aebda4 [Chris Heller] Simplfy Docker image docs
      1ae7f4f [Chris Heller] Style points
      974bd56 [Chris Heller] Convert map to flatMap
      5d8bdf7 [Chris Heller] Factor out the DockerInfo construction.
      7b75a3d [Chris Heller] Align to styleguide
      80108e7 [Chris Heller] Bend to the will of RAT
      ba77056 [Chris Heller] Explicit RAT exclude
      abda5e5 [Chris Heller] Wildcard .rat-excludes
      2f2873c [Chris Heller] Exclude spark-mesos from RAT
      a589a5b [Chris Heller] Add example Dockerfile
      b6825ce [Chris Heller] Remove use of EasyMock
      eae1b86 [Chris Heller] Move properties under 'spark.mesos.'
      c184d00 [Chris Heller] Use map on Option to be consistent with non-coarse code
      fb9501a [Chris Heller] Bumped mesos version to current release
      fa11879 [Chris Heller] Add listenerBus to EasyMock
      882151e [Chris Heller] Changes to scala style
      b22d42d [Chris Heller] Exclude template from RAT
      db536cf [Chris Heller] Remove unneeded mocks
      dea1bd5 [Chris Heller] Force default protocol
      7dac042 [Chris Heller] Add test for DockerInfo
      5456c0c [Chris Heller] Adjust syntax style
      521c194 [Chris Heller] Adjust version info
      6e38f70 [Chris Heller] Document Mesos Docker properties
      29572ab [Chris Heller] Support all DockerInfo fields
      b8c0dea [Chris Heller] Support for mesos DockerInfo in coarse-mode.
      482a9fd [Chris Heller] Support for mesos DockerInfo in fine-grained mode.
      8f50a07d
    • WangTaoTheTonic's avatar
      [SPARK-6443] [SPARK SUBMIT] Could not submit app in standalone cluster mode when HA is enabled · b4b43df8
      WangTaoTheTonic authored
      **3/26 update:**
      * Akka-based:
        Use an array of `ActorSelection` to represent multiple masters. Add an `activeMasterActor` for querying the status of the driver. Lost masters (including the standby one) are added to `lostMasters`.
        When the size of `lostMasters` is equal to or greater than the number of all masters, we report an error that no master is available.
      
      * Rest-based:
        When no master is available (an exception is thrown), we use the Akka gateway to submit apps.
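
      The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration in Python, not the patch's Akka code: `submit_to_any_master`, `try_submit`, and `AllMastersUnavailable` are hypothetical names, and treating a standby master the same as an unreachable one is an assumption made for the sketch.

```python
from typing import Callable, Iterable, List


class AllMastersUnavailable(Exception):
    """Raised once every configured master has been marked as lost."""


def submit_to_any_master(
    masters: Iterable[str],
    try_submit: Callable[[str], bool],
) -> str:
    """Return the URL of the first master that accepts the submission.

    `try_submit` is a hypothetical callback: it returns True if the master
    accepted the request, and returns False or raises ConnectionError for
    a standby or unreachable master.
    """
    all_masters: List[str] = list(masters)
    lost_masters: List[str] = []

    for master in all_masters:
        try:
            if try_submit(master):
                return master
        except ConnectionError:
            pass  # Unreachable: handled like a standby master below.
        lost_masters.append(master)

    # The loop only ends without returning when every master was lost,
    # i.e. len(lost_masters) == len(all_masters).
    raise AllMastersUnavailable(
        "no active master among: " + ", ".join(lost_masters)
    )
```

      With two masters where the first is down and the second is active, the call returns the second master's URL; when every master refuses, the error is raised only after all of them have been tried.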
      
      I have done a simple test on a standalone HA cluster (with two masters alive, and with one alive and one dead), and it worked.
      
      There might remain some issues with style or message printing, but we can check the solution first and then fix them together.
      
      /cc srowen andrewor14
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #5116 from WangTaoTheTonic/SPARK-6443 and squashes the following commits:
      
      2a28aab [WangTaoTheTonic] based the newest change https://github.com/apache/spark/pull/5144
      76fd411 [WangTaoTheTonic] rebase
      f4f972b [WangTaoTheTonic] rebase...again
      a41de0b [WangTaoTheTonic] rebase
      220cb3c [WangTaoTheTonic] move connect exception inside
      35119a0 [WangTaoTheTonic] style and compile issues
      9d636be [WangTaoTheTonic] per Andrew's comments
      979760c [WangTaoTheTonic] rebase
      e4f4ece [WangTaoTheTonic] fix failed test
      5d23958 [WangTaoTheTonic] refact some duplicated code, style and comments
      7a881b3 [WangTaoTheTonic] when one of masters is gone, we still can submit
      2b011c9 [WangTaoTheTonic] fix broken tests
      60d97a4 [WangTaoTheTonic] rebase
      fa1fa80 [WangTaoTheTonic] submit app to HA cluster in standalone cluster mode
      b4b43df8
    • Timothy Chen's avatar
      [SPARK-7216] [MESOS] Add driver details page to Mesos cluster UI. · 20221934
      Timothy Chen authored
      Add a details page that displays Mesos driver in the Mesos cluster UI
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #5763 from tnachen/mesos_cluster_page and squashes the following commits:
      
      55f36eb [Timothy Chen] Add driver details page to Mesos cluster UI.
      20221934
    • Sandy Ryza's avatar
      [SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a negative number of executors · 099327d5
      Sandy Ryza authored
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #5704 from sryza/sandy-spark-6954 and squashes the following commits:
      
      b7890fb [Sandy Ryza] Avoid ramping up to an existing number of executors
      6eb516a [Sandy Ryza] SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors
      099327d5
    • Holden Karau's avatar
      [SPARK-3444] Provide an easy way to change log level · ae98eec7
      Holden Karau authored
      Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433; includes code-review feedback from pwendel & davies.
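
      The "only allow valid log levels" behaviour mentioned in the commits below can be sketched like this. It is an illustration rather than the SparkContext code: `validate_log_level` is a hypothetical helper, the level names are the standard log4j ones, and the upper-casing of the input is an assumption of the sketch.

```python
# Standard log4j level names; the set itself is not copied from the patch.
VALID_LOG_LEVELS = frozenset(
    ["ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"]
)


def validate_log_level(level: str) -> str:
    """Return the normalized level name, or raise ValueError if unknown."""
    normalized = level.upper()  # Assumption: accept any letter case.
    if normalized not in VALID_LOG_LEVELS:
        raise ValueError(
            "invalid log level %r; expected one of: %s"
            % (level, ", ".join(sorted(VALID_LOG_LEVELS)))
        )
    return normalized
```

      Rejecting unknown names up front means a typo such as "WARNING" fails loudly instead of silently leaving the level unchanged.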
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #5791 from holdenk/SPARK-3444-provide-an-easy-way-to-change-log-level-r2 and squashes the following commits:
      
      3bf3be9 [Holden Karau] fix exception
      42ba873 [Holden Karau] fix exception
      9117244 [Holden Karau] Only allow valid log levels, throw exception if invalid log level.
      338d7bf [Holden Karau] rename setLoggingLevel to setLogLevel
      fac14a0 [Holden Karau] Fix style errors
      d9d03f3 [Holden Karau] Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433 includes CR feedback from @pwendel & @davies
      ae98eec7
    • cody koeninger's avatar
      [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2 · 47864840
      cody koeninger authored
      I don't think this should be merged until after 1.3.0 is final.
      
      Author: cody koeninger <cody@koeninger.org>
      Author: Helena Edelson <helena.edelson@datastax.com>
      
      Closes #4537 from koeninger/wip-2808-kafka-0.8.2-upgrade and squashes the following commits:
      
      803aa2c [cody koeninger] [SPARK-2808][Streaming][Kafka] code cleanup per TD
      e6dfaf6 [cody koeninger] [SPARK-2808][Streaming][Kafka] pointless whitespace change to trigger jenkins again
      1770abc [cody koeninger] [SPARK-2808][Streaming][Kafka] make waitUntilLeaderOffset easier to call, call it from python tests as well
      d4267e9 [cody koeninger] [SPARK-2808][Streaming][Kafka] fix stderr redirect in python test script
      30d991d [cody koeninger] [SPARK-2808][Streaming][Kafka] remove stderr prints since it breaks python 3 syntax
      1d896e2 [cody koeninger] [SPARK-2808][Streaming][Kafka] add even even more logging to python test
      4c4557f [cody koeninger] [SPARK-2808][Streaming][Kafka] add even more logging to python test
      115aeee [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
      2712649 [cody koeninger] [SPARK-2808][Streaming][Kafka] add more logging to python test, see why its timing out in jenkins
      2b92d3f [cody koeninger] [SPARK-2808][Streaming][Kafka] wait for leader offsets in the java test as well
      3824ce3 [cody koeninger] [SPARK-2808][Streaming][Kafka] naming / comments per tdas
      61b3464 [cody koeninger] [SPARK-2808][Streaming][Kafka] delay for second send in boundary condition test
      af6f3ec [cody koeninger] [SPARK-2808][Streaming][Kafka] delay test until latest leader offset matches expected value
      9edab4c [cody koeninger] [SPARK-2808][Streaming][Kafka] more shots in the dark on jenkins failing test
      c70ee43 [cody koeninger] [SPARK-2808][Streaming][Kafka] add more asserts to test, try to figure out why it fails on jenkins but not locally
      1d10751 [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
      ed02d2c [cody koeninger] [SPARK-2808][Streaming][Kafka] move default argument for api version to overloaded method, for binary compat
      407382e [cody koeninger] [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2.1
      77de6c2 [cody koeninger] Merge branch 'master' into wip-2808-kafka-0.8.2-upgrade
      6953429 [cody koeninger] [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2
      2e67c66 [Helena Edelson] #SPARK-2808 Update to Kafka 0.8.2.0 GA from beta.
      d9dc2bc [Helena Edelson] Merge remote-tracking branch 'upstream/master' into wip-2808-kafka-0.8.2-upgrade
      e768164 [Helena Edelson] #2808 update kafka to version 0.8.2
      47864840
    • jerryshao's avatar
      [SPARK-7112][Streaming][WIP] Add a InputInfoTracker to track all the input streams · b88c275e
      jerryshao authored
      Author: jerryshao <saisai.shao@intel.com>
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #5680 from jerryshao/SPARK-7111 and squashes the following commits:
      
      339f854 [Saisai Shao] Add an end-to-end test
      812bcaf [jerryshao] Continue address the comments
      abd0036 [jerryshao] Address the comments
      727264e [jerryshao] Fix comment typo
      6682bef [jerryshao] Fix compile issue
      8325787 [jerryshao] Fix rebase issue
      17fa251 [jerryshao] Refactor to build InputInfoTracker
      ee1b536 [jerryshao] Add DirectStreamTracker to track the direct streams
      b88c275e
    • zsxwing's avatar
      [SPARK-7309] [CORE] [STREAMING] Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler · ebc25a4d
      zsxwing authored
      Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler when stopping them.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5845 from zsxwing/SPARK-7309 and squashes the following commits:
      
      6c004fd [zsxwing] Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler
      ebc25a4d
    • Cheng Hao's avatar
      [SPARK-6999] [SQL] Remove the infinite recursive method (useless) · 98e70458
      Cheng Hao authored
      Remove the method, since it causes infinite recursion. It also appears to be a dummy method, since we already have the API:
      `def createDataFrame(rowRDD: JavaRDD[Row], schema: StructType): DataFrame`
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #5804 from chenghao-intel/spark_6999 and squashes the following commits:
      
      63220a8 [Cheng Hao] remove the infinite recursive method (useless)
      98e70458
    • Rajendra Gokhale (rvgcentos)'s avatar
      [SPARK-7304] [BUILD] Include $@ in call to mvn consistently in make-distribution.sh · e6fb3771
      Rajendra Gokhale (rvgcentos) authored
      Adding the $@ allows the caller of this script to supply additional arguments to the mvn command, and is consistent with how mvn is invoked elsewhere in the scripts.
      
      Author: Rajendra Gokhale (rvgcentos) <rvg@cloudera.com>
      
      Closes #5846 from palamau/master and squashes the following commits:
      
      e5f2adb [Rajendra Gokhale (rvgcentos)] Add $@ in call to mvn consistently in make-distribution.sh
      e6fb3771
    • Yin Huai's avatar
      [SPARK-7312][SQL] SPARK-6913 broke jdk6 build · 41c6a44b
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7312
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5847 from yhuai/jdbcJava6 and squashes the following commits:
      
      68433a2 [Yin Huai] compile with Java 6
      41c6a44b
    • Patrick Wendell's avatar
      5c1fabaf
    • Hari Shreedharan's avatar
      [SPARK-5342] [YARN] Allow long running Spark apps to run on secure YARN/HDFS · b1f4ca82
      Hari Shreedharan authored
      Take 2. Does the same thing as #4688, but fixes Hadoop-1 build.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #5823 from harishreedharan/kerberos-longrunning and squashes the following commits:
      
      3c86bba [Hari Shreedharan] Import fixes. Import postfixOps explicitly.
      4d04301 [Hari Shreedharan] Minor formatting fixes.
      b5e7a72 [Hari Shreedharan] Remove reflection, use a method in SparkHadoopUtil to update the token renewer.
      7bff6e9 [Hari Shreedharan] Make sure all required classes are present in the jar. Fix import order.
      e851f70 [Hari Shreedharan] Move the ExecutorDelegationTokenRenewer to yarn module. Use reflection to use it.
      36eb8a9 [Hari Shreedharan] Change the renewal interval config param. Fix a bunch of comments.
      611923a [Hari Shreedharan] Make sure the namenodes are listed correctly for creating tokens.
      09fe224 [Hari Shreedharan] Use token.renew to get token's renewal interval rather than using hdfs-site.xml
      6963bbc [Hari Shreedharan] Schedule renewal in AM before starting user class. Else, a restarted AM cannot access HDFS if the user class tries to.
      072659e [Hari Shreedharan] Fix build failure caused by thread factory getting moved to ThreadUtils.
      f041dd3 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      42eead4 [Hari Shreedharan] Remove RPC part. Refactor and move methods around, use renewal interval rather than max lifetime to create new tokens.
      ebb36f5 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      bc083e3 [Hari Shreedharan] Overload RegisteredExecutor to send tokens. Minor doc updates.
      7b19643 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      8a4f268 [Hari Shreedharan] Added docs in the security guide. Changed some code to ensure that the renewer objects are created only if required.
      e800c8b [Hari Shreedharan] Restore original RegisteredExecutor message, and send new tokens via NewTokens message.
      0e9507e [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      7f1bc58 [Hari Shreedharan] Minor fixes, cleanup.
      bcd11f9 [Hari Shreedharan] Refactor AM and Executor token update code into separate classes, also send tokens via akka on executor startup.
      f74303c [Hari Shreedharan] Move the new logic into specialized classes. Add cleanup for old credentials files.
      2f9975c [Hari Shreedharan] Ensure new tokens are written out immediately on AM restart. Also, pikc up the latest suffix from HDFS if the AM is restarted.
      61b2b27 [Hari Shreedharan] Account for AM restarts by making sure lastSuffix is read from the files on HDFS.
      62c45ce [Hari Shreedharan] Relogin from keytab periodically.
      fa233bd [Hari Shreedharan] Adding logging, fixing minor formatting and ordering issues.
      42813b4 [Hari Shreedharan] Remove utils.sh, which was re-added due to merge with master.
      0de27ee [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      55522e3 [Hari Shreedharan] Fix failure caused by Preconditions ambiguity.
      9ef5f1b [Hari Shreedharan] Added explanation of how the credentials refresh works, some other minor fixes.
      f4fd711 [Hari Shreedharan] Fix SparkConf usage.
      2debcea [Hari Shreedharan] Change the file structure for credentials files. I will push a followup patch which adds a cleanup mechanism for old credentials files. The credentials files are small and few enough for it to cause issues on HDFS.
      af6d5f0 [Hari Shreedharan] Cleaning up files where changes weren't required.
      f0f54cb [Hari Shreedharan] Be more defensive when updating the credentials file.
      f6954da [Hari Shreedharan] Got rid of Akka communication to renew, instead the executors check a known file's modification time to read the credentials.
      5c11c3e [Hari Shreedharan] Move tests to YarnSparkHadoopUtil to fix compile issues.
      b4cb917 [Hari Shreedharan] Send keytab to AM via DistributedCache rather than directly via HDFS
      0985b4e [Hari Shreedharan] Write tokens to HDFS and read them back when required, rather than sending them over the wire.
      d79b2b9 [Hari Shreedharan] Make sure correct credentials are passed to FileSystem#addDelegationTokens()
      8c6928a [Hari Shreedharan] Fix issue caused by direct creation of Actor object.
      fb27f46 [Hari Shreedharan] Make sure principal and keytab are set before CoarseGrainedSchedulerBackend is started. Also schedule re-logins in CoarseGrainedSchedulerBackend#start()
      41efde0 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      d282d7a [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
      bcfc374 [Hari Shreedharan] Fix Hadoop-1 build by adding no-op methods in SparkHadoopUtil, with impl in YarnSparkHadoopUtil.
      f8fe694 [Hari Shreedharan] Handle None if keytab-login is not scheduled.
      2b0d745 [Hari Shreedharan] [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
      ccba5bc [Hari Shreedharan] WIP: More changes wrt kerberos
      77914dd [Hari Shreedharan] WIP: Add kerberos principal and keytab to YARN client.
      b1f4ca82
    • Burak Yavuz's avatar
      [SPARK-7240][SQL] Single pass covariance calculation for dataframes · 4dc8d744
      Burak Yavuz authored
      Added the calculation of covariance between two columns to DataFrames.
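
      For readers unfamiliar with the single-pass approach, here is a minimal sketch of a one-pass sample covariance using the usual online co-moment update. It is written in plain Python and is not the DataFrame implementation; the function name `sample_covariance` is chosen for illustration only.

```python
from typing import Iterable, Tuple


def sample_covariance(pairs: Iterable[Tuple[float, float]]) -> float:
    """Compute the sample covariance of (x, y) pairs in a single pass."""
    n = 0
    mean_x = 0.0
    mean_y = 0.0
    co_moment = 0.0  # Running sum of (x - mean_x) * (y - mean_y).

    for x, y in pairs:
        n += 1
        dx = x - mean_x           # Deviation from the OLD mean of x.
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        co_moment += dx * (y - mean_y)  # Uses the NEW mean of y.

    if n < 2:
        raise ValueError("sample covariance needs at least two pairs")
    return co_moment / (n - 1)    # n - 1 gives the sample covariance.
```

      Because each row updates only four running values, the data never has to be scanned twice or held in memory, which is the point of a single-pass calculation.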
      
      cc mengxr rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5825 from brkyvz/df-cov and squashes the following commits:
      
      cb18046 [Burak Yavuz] changed to sample covariance
      f2e862b [Burak Yavuz] fixed failed test
      51e39b8 [Burak Yavuz] moved implementation
      0c6a759 [Burak Yavuz] addressed math comments
      8456eca [Burak Yavuz] fix pyStyle3
      aa2ad29 [Burak Yavuz] fix pyStyle2
      4e97a50 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into df-cov
      e3b0b85 [Burak Yavuz] addressed comments v0.1
      a7115f1 [Burak Yavuz] fix python style
      7dc6dbc [Burak Yavuz] reorder imports
      408cb77 [Burak Yavuz] initial commit
      4dc8d744
    • Marcelo Vanzin's avatar
      [SPARK-7281] [YARN] Add option to set AM's lib path in client mode. · 7b5dd3e3
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5813 from vanzin/SPARK-7281 and squashes the following commits:
      
      1cb6f42 [Marcelo Vanzin] [SPARK-7281] [yarn] Add option to set AM's lib path in client mode.
      7b5dd3e3
    • Nishkam Ravi's avatar
      [SPARK-7213] [YARN] Check for read permissions before copying a Hadoop config file · f53a4882
      Nishkam Ravi authored
      Author: Nishkam Ravi <nravi@cloudera.com>
      Author: nishkamravi2 <nishkamravi@gmail.com>
      Author: nravi <nravi@c1704.halxg.cloudera.com>
      
      Closes #5760 from nishkamravi2/master_nravi and squashes the following commits:
      
      eaa13b5 [nishkamravi2] Update Client.scala
      981afd2 [Nishkam Ravi] Check for read permission before initiating copy
      1b81383 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      0f1abd0 [nishkamravi2] Update Utils.scala
      474e3bf [nishkamravi2] Update DiskBlockManager.scala
      97c383e [nishkamravi2] Update Utils.scala
      8691e0c [Nishkam Ravi] Add a try/catch block around Utils.removeShutdownHook
      2be1e76 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      1c13b79 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      bad4349 [nishkamravi2] Update Main.java
      36a6f87 [Nishkam Ravi] Minor changes and bug fixes
      b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
      d9658d6 [Nishkam Ravi] Changes for SPARK-6406
      ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
      345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ac58975 [Nishkam Ravi] spark-class changes
      06bfeb0 [nishkamravi2] Update spark-class
      35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
      4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
      746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
      bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      d453197 [nishkamravi2] Update NewHadoopRDD.scala
      6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
      0ce2c32 [nishkamravi2] Update HadoopRDD.scala
      f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
      71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      494d8c0 [nishkamravi2] Update DiskBlockManager.scala
      3c5ddba [nishkamravi2] Update DiskBlockManager.scala
      f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
      79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
      535295a [nishkamravi2] Update TaskSetManager.scala
      3e1b616 [Nishkam Ravi] Modify test for maxResultSize
      9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
      5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      636a9ff [nishkamravi2] Update YarnAllocator.scala
      8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
      35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
      5ac2ec1 [Nishkam Ravi] Remove out
      dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
      42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
      362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
      c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
      1cf2d1e [nishkamravi2] Update YarnAllocator.scala
      ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
      2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
      2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
      3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
      5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
      eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
      df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
      6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
      5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
      681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
      f53a4882
    • Patrick Wendell's avatar
      c6d9a429
    • Patrick Wendell's avatar
      Revert "[SPARK-7287] enabled fixed test" · 58d6584d
      Patrick Wendell authored
      This reverts commit 7cf1eb79.
      58d6584d
    • Reynold Xin's avatar
      [SPARK-7274] [SQL] Create Column expression for array/struct creation. · 37537760
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5802 from rxin/SPARK-7274 and squashes the following commits:
      
      19aecaa [Reynold Xin] Fixed unicode tests.
      bfc1538 [Reynold Xin] Export all Python functions.
      2517b8c [Reynold Xin] Code review.
      23da335 [Reynold Xin] Fixed Python bug.
      132002e [Reynold Xin] Fixed tests.
      56fce26 [Reynold Xin] Added Python support.
      b0d591a [Reynold Xin] Fixed debug error.
      86926a6 [Reynold Xin] Added test suite.
      7dbb9ab [Reynold Xin] Ok one more.
      470e2f5 [Reynold Xin] One more MLlib ...
      e2d14f0 [Reynold Xin] [SPARK-7274][SQL] Create Column expression for array/struct creation.
      37537760
    • Liang-Chi Hsieh's avatar
      [SPARK-7183] [NETWORK] Fix memory leak of TransportRequestHandler.streamIds · 16860327
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7183
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5743 from viirya/fix_requesthandler_memory_leak and squashes the following commits:
      
      cf2c086 [Liang-Chi Hsieh] For comments.
      97e205c [Liang-Chi Hsieh] Remove unused import.
      d35f19a [Liang-Chi Hsieh] For comments.
      f9a0c37 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_requesthandler_memory_leak
      45908b7 [Liang-Chi Hsieh] for style.
      17f020f [Liang-Chi Hsieh] Remove unused import.
      37a4b6c [Liang-Chi Hsieh] Remove streamIds from TransportRequestHandler.
      3b3f38a [Liang-Chi Hsieh] Fix memory leak of TransportRequestHandler.streamIds.
      16860327
    • Sean Owen's avatar
      [SPARK-6846] [WEBUI] [HOTFIX] return to GET for kill link in UI since YARN AM won't proxy POST · 1262e310
      Sean Owen authored
      Partial undoing of SPARK-6846; YARN AM proxy won't forward POSTs, so go back to GET for kill links in Spark UI. Standalone UIs are not affected.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5837 from srowen/SPARK-6846.2 and squashes the following commits:
      
      c17c386 [Sean Owen] Partial undoing of SPARK-6846; YARN AM proxy won't forward POSTs, so go back to GET for kill links in Spark UI. Standalone UIs are not affected.
      1262e310
    • Dan McClary's avatar
      [SPARK-5854] personalized page rank · 7d427222
      Dan McClary authored
      Here's a modification to PageRank which does personalized PageRank. The approach is basically similar to that outlined by Bahmani et al. from 2010 (http://arxiv.org/pdf/1006.2880.pdf).
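
      As a rough illustration of what "personalized" means here, the sketch below runs a textbook power iteration in which all reset probability returns to a single source vertex. It is plain Python, not the Pregel-based GraphX code in this patch; the function name, the adjacency-dict input, and the handling of dangling vertices (their mass is sent back to the source) are assumptions of the sketch.

```python
from typing import Dict, List


def personalized_pagerank(
    out_links: Dict[str, List[str]],
    source: str,
    reset_prob: float = 0.15,
    num_iter: int = 50,
) -> Dict[str, float]:
    """Power-iteration personalized PageRank with respect to `source`."""
    vertices = set(out_links)
    for targets in out_links.values():
        vertices.update(targets)
    if source not in vertices:
        raise ValueError("source vertex is not in the graph")

    # Start with all rank on the source; the total stays equal to 1.
    rank = {v: 0.0 for v in vertices}
    rank[source] = 1.0

    for _ in range(num_iter):
        nxt = {v: 0.0 for v in vertices}
        for v in vertices:
            targets = out_links.get(v, [])
            if targets:
                share = (1.0 - reset_prob) * rank[v] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Assumption: a dangling vertex returns its mass to source.
                nxt[source] += (1.0 - reset_prob) * rank[v]
        # The walker restarts at the source, never at a uniform vertex.
        nxt[source] += reset_prob
        rank = nxt
    return rank
```

      In ordinary PageRank the reset mass is spread over every vertex; sending it only to the source is what makes the resulting scores measure proximity to that vertex.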
      
      I'm sure this needs tuning up or other considerations, so let me know how I can improve this.
      
      Author: Dan McClary <dan.mcclary@gmail.com>
      Author: dwmclary <dan.mcclary@gmail.com>
      
      Closes #4774 from dwmclary/SPARK-5854-Personalized-PageRank and squashes the following commits:
      
      8b907db [dwmclary] fixed scalastyle errors in PageRankSuite
      2c20e5d [dwmclary] merged with upstream master
      d6cebac [dwmclary] updated as per style requests
      7d00c23 [Dan McClary] fixed line overrun in personalizedVertexPageRank
      d711677 [Dan McClary] updated vertexProgram to restore binary compatibility for inner method
      bb8d507 [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank
      fba0edd [Dan McClary] fixed silly mistakes
      de51be2 [Dan McClary] cleaned up whitespace between comments and methods
      0c30d0c [Dan McClary] updated to maintain binary compatibility
      aaf0b4b [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank
      76773f6 [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank
      44ada8e [Dan McClary] updated tolerance on chain PPR
      1ffed95 [Dan McClary] updated tolerance on chain PPR
      b67ac69 [Dan McClary] updated tolerance on chain PPR
      a560942 [Dan McClary] rolled PPR into pregel code for PageRank
      6dc2c29 [Dan McClary] initial implementation of personalized page rank
      7d427222
    • niranda's avatar
      changing persistence engine trait to an abstract class · 27de6fef
      niranda authored
      Author: niranda <niranda.perera@gmail.com>
      
      Closes #5832 from nirandaperera/PersistanceEngine_abstract_class and squashes the following commits:
      
      67b9d5a [niranda] changing persistence engine trait to an abstract class
      27de6fef
    • Chris Biow's avatar
      Limit help option regex · c8c481da
      Chris Biow authored
      Added word-boundary delimiters so that embedded text such as "-h" within command line options and values doesn't trigger the usage script and exit.
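
The idea can be sketched in Python as follows. This is not the shell pattern from the patch: the name `HELP_FLAG` and the whitespace-based lookarounds are assumptions chosen for the example, since Python's `\b` does not treat a leading `-` as part of a word.

```python
import re

# Hypothetical pattern: match -h or --help only when it is a whole argument,
# i.e. not preceded or followed by a non-whitespace character.
HELP_FLAG = re.compile(r"(?<!\S)(?:-h|--help)(?!\S)")


def wants_help(args):
    """Return True if the joined argument string contains a standalone help flag."""
    return HELP_FLAG.search(" ".join(args)) is not None
```

With this pattern, a value such as `spark://node-h1:7077` no longer counts as a request for help, while a bare `-h` or `--help` still does.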
      
      Author: Chris Biow <chris.biow@10gen.com>
      
      Closes #5816 from cbiow/patch-1 and squashes the following commits:
      
      36b3726 [Chris Biow] Limit help option regex
      c8c481da
    • Liang-Chi Hsieh's avatar
      [SPARK-5891] [ML] Add Binarizer ML Transformer · 7630213c
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5891
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5699 from viirya/add_binarizer and squashes the following commits:
      
      1a0b9a4 [Liang-Chi Hsieh] For comments.
      bc397f2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_binarizer
      cc4f03c [Liang-Chi Hsieh] Implement threshold param and use merged params map.
      7564c63 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_binarizer
      1682f8c [Liang-Chi Hsieh] Add Binarizer ML Transformer.
      7630213c
    • Debasish Das's avatar
      [SPARK-3066] [MLLIB] Support recommendAll in matrix factorization model · 3b514af8
      Debasish Das authored
      This is based on #3098 from debasish83.
      
      1. BLAS' GEMM is used to compute inner products.
      2. Reverted changes to MovieLensALS. SPARK-4231 should be addressed in a separate PR.
      3. ~~Fixed a bug in topByKey~~
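
A minimal sketch of the underlying computation, in plain Python: score every user against every product by inner product and keep the top k per user. The commit uses BLAS GEMM for the inner products; the loops and `heapq.nlargest` below are stand-ins, and the name `recommend_all` and the dictionary-based inputs are assumptions for the example.

```python
import heapq


def recommend_all(user_factors, product_factors, k):
    """Return the top-k (product_id, score) pairs for every user.

    Both arguments map an id to a list of latent factors of equal length.
    """
    result = {}
    for user_id, u in user_factors.items():
        # Inner product of the user's factors with each product's factors.
        scores = (
            (product_id, sum(a * b for a, b in zip(u, p)))
            for product_id, p in product_factors.items()
        )
        # Keep only the k best-scoring products for this user.
        result[user_id] = heapq.nlargest(k, scores, key=lambda pair: pair[1])
    return result
```

Keeping only k entries per user avoids materializing and sorting the full user-by-product score matrix.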
      
      Closes #3098
      
      debasish83 coderxiang
      
      Author: Debasish Das <debasish.das@one.verizon.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5829 from mengxr/SPARK-3066 and squashes the following commits:
      
      22e6a87 [Xiangrui Meng] topByKey was correct. update its usage
      389b381 [Xiangrui Meng] fix indentation
      49953de [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-3066
      cb9799a [Xiangrui Meng] revert MovieLensALS
      f864f5e [Xiangrui Meng] update test and fix a bug in topByKey
      c5e0181 [Xiangrui Meng] use GEMM and topByKey
      3a0c4eb [Debasish Das] updated with spark master
      98fa424 [Debasish Das] updated with master
      ee99571 [Debasish Das] addressed initial review comments;merged with master;added tests for batch predict APIs in matrix factorization
      3f97c49 [Debasish Das] fixed spark coding style for imports
      7163a5c [Debasish Das] Added API for batch user and product recommendation; MAP calculation for product recommendation per user using randomized split
      d144f57 [Debasish Das] recommendAll API to MatrixFactorizationModel, uses topK finding using BoundedPriorityQueue similar to RDD.top
      f38a1b5 [Debasish Das] use sampleByKey for per user sampling
      10cbb37 [Debasish Das] provide ratio for topN product validation; generate MAP and prec@k metric for movielens dataset
      9fa063e [Debasish Das] import scala.math.round
      4bbae0f [Debasish Das] comments fixed as per scalastyle
      cd3ab31 [Debasish Das] merged with AbstractParams serialization bug
      9b3951f [Debasish Das] validate user/product on MovieLens dataset through user input and compute map measure along with rmse
      3b514af8
    • Marcelo Vanzin's avatar
      [SPARK-4705] Handle multiple app attempts event logs, history server. · 3052f491
      Marcelo Vanzin authored
      This change modifies the event logging listener to write the logs for different application
      attempts to different files. The attempt ID is set by the scheduler backend, so as long
      as the backend returns that ID to SparkContext, things should work. Currently, the
      YARN backend does that.
      
      The history server was also modified to model multiple attempts per application. Each
      attempt has its own UI and a separate row in the listing table, so that users can look at
      all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info
      when all the applications being shown have a single attempt.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      Author: twinkle sachdeva <twinkle@kite.ggn.in.guavus.com>
      Author: twinkle.sachdeva <twinkle.sachdeva@guavus.com>
      Author: twinkle sachdeva <twinkle.sachdeva@guavus.com>
      
      Closes #5432 from vanzin/SPARK-4705 and squashes the following commits:
      
      7e289fa [Marcelo Vanzin] Review feedback.
      f66dcc5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
      bc885b7 [Marcelo Vanzin] Review feedback.
      76a3651 [Marcelo Vanzin] Fix log cleaner, add test.
      7c381ec [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
      1aa309d [Marcelo Vanzin] Improve sorting of app attempts.
      2ad77e7 [Marcelo Vanzin] Missed a reference to the old property name.
      9d59d92 [Marcelo Vanzin] Scalastyle...
      d5a9c37 [Marcelo Vanzin] Update JsonProtocol test, make property name consistent.
      ba34b69 [Marcelo Vanzin] Use Option[String] for attempt id.
      f1cb9b3 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
      c14ec19 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
      9092d39 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
      86de638 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
      07446c6 [Marcelo Vanzin] Disable striping for app id / name when multiple attempts exist.
      9092af5 [Marcelo Vanzin] Fix HistoryServer test.
      3a14503 [Marcelo Vanzin] Argh scalastyle.
      657ec18 [Marcelo Vanzin] Fix yarn history URL, app links.
      c3e0a82 [Marcelo Vanzin] Move app name to app info, more UI fixes.
      ce5ee5d [Marcelo Vanzin] Misc UI, test, style fixes.
      cbe8bba [Marcelo Vanzin] Attempt ID in listener event should be an option.
      88b1de8 [Marcelo Vanzin] Add a test for apps with multiple attempts.
      3245aa2 [Marcelo Vanzin] Make app attempts part of the history server model.
      5fd5c6f [Marcelo Vanzin] Fix my broken rebase.
      318525a [twinkle.sachdeva] SPARK-4705: 1) moved from directory structure to single file, as per the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II
      6b2e521 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
      4c1fc26 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
      0eb7722 [twinkle sachdeva] SPARK-4705: Doing cherry-pick of fix into master
      3052f491
    • Kousuke Saruta's avatar
      [SPARK-3468] [WEBUI] Timeline-View feature · 7fe0f3f2
      Kousuke Saruta authored
      I sometimes troubleshoot and analyse the cause of long-running jobs.
      
      In that case, I first find the stages that take a long time or fail, then the tasks that take a long time or fail, and finally analyse the proportion of each phase within a task.
      
      In another case, I find executors that take a long time to run a task and analyse the details of that task.
      
      In such situations, I think it's helpful to visualize a timeline for each application, job, and task, along with the proportion of each activity within a task.
      
      I added 3 timeline features to the existing Web UI.
      
      [Application Timeline View]
      This view shows the following things.
      
      * When each executor was added/removed and the reason why it was removed.
      * When each job was started/finished.
      * Status of each job.
      
      ![screenshot from 2015-04-01 16 49 25](https://cloud.githubusercontent.com/assets/4736016/6936886/e35fd582-d891-11e4-980d-8de13f50e442.png)
      
      [Stage Timeline View]
      Similar to the Application Timeline View, this view shows the following things.
      
      * When each executor was added/removed and the reason why it was removed.
      * When each job was started/finished.
      * Status of each stage.
      
      ![screenshot from 2015-04-01 16 50 59](https://cloud.githubusercontent.com/assets/4736016/6936900/0dca6526-d892-11e4-84a8-efd9037af444.png)
      
      [Task Assignment Timeline View]
      This view shows the following things.
      
      * When each task started/finished.
      * How long each task took and the proportion of time spent in each phase.
      * Status of each task.
      * Where each task ran.
      
      ![screenshot from 2015-04-01 16 51 54](https://cloud.githubusercontent.com/assets/4736016/6936910/20fd5acc-d892-11e4-9018-80e463881fc2.png)
      
      All of the views above can be zoomed with the mouse wheel and scrolled by dragging.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2342 from sarutak/timeline-viewer-feature and squashes the following commits:
      
      11fe67d [Kousuke Saruta] Fixed conflict
      79ac03d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      a91abd3 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
      ef34a5b [Kousuke Saruta] Implement tooltip using bootstrap
      b09d0c5 [Kousuke Saruta] Move `stroke` and `fill` attribute of rect elements to css
      d3c63c8 [Kousuke Saruta] Fixed a little bit bugs
      a36291b [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
      28714b6 [Kousuke Saruta] Fixed highlight issue
      0dc4278 [Kousuke Saruta] Addressed most of Patrics's feedbacks
      8110acf [Kousuke Saruta] Added scroll limit to Job timeline
      974a64a [Kousuke Saruta] Removed unused function
      ee7a7f0 [Kousuke Saruta] Refactored
      6a91872 [Kousuke Saruta] Temporary commit
      6693f34 [Kousuke Saruta] Added link to job/stage box in the timeline in order to move to corresponding row when we click
      8f88222 [Kousuke Saruta] Added job/stage description
      aeed4b1 [Kousuke Saruta] Removed stage timeline
      fc1696c [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
      999ccd4 [Kousuke Saruta] Improved scalability
      0fc6a31 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      19815ae [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      68b7540 [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
      52b5f0b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      dec85db [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      fcdab7d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      dab7cc1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      09cce97 [Kousuke Saruta] Cleanuped
      16f82cf [Kousuke Saruta] Cleanuped
      9fb522e [Kousuke Saruta] Cleanuped
      d05f2c2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      e85e9aa [Kousuke Saruta] Cleanup: Added TimelineViewUtils.scala
      a76e569 [Kousuke Saruta] Removed unused setting in timeline-view.css
      5ce1b21 [Kousuke Saruta] Added vis.min.js, vis.min.css and vis.map to .rat-exclude
      082f709 [Kousuke Saruta] Added Timeline-View feature for Applications, Jobs and Stages
      7fe0f3f2
    • MechCoder's avatar
      [SPARK-6257] [PYSPARK] [MLLIB] MLlib API missing items in Recommendation · c24aeb6a
      MechCoder authored
      Adds rank, recommendUsers and recommendProducts to MatrixFactorizationModel in PySpark.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #5807 from MechCoder/spark-6257 and squashes the following commits:
      
      09629c6 [MechCoder] doc
      953b326 [MechCoder] [SPARK-6257] MLlib API missing items in Recommendation
      c24aeb6a
    • zsxwing's avatar
      [SPARK-7291] [CORE] Fix a flaky test in AkkaRpcEnvSuite · 14b32886
      zsxwing authored
      Read the port from RpcEnv to check the result, so that the test succeeds even if the port conflicts.
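
A loosely analogous pattern, sketched with Python sockets rather than Spark's `RpcEnv`: a test should assert against the port the server actually bound to, not the one it asked for. The helper name `start_server` is hypothetical, and requesting port 0 (letting the OS choose) is only a stand-in for a server that may end up on a different port than requested.

```python
import socket


def start_server(requested_port=0):
    """Bind a listening socket and return it with the port it actually got.

    With requested_port=0 the OS picks a free port, so the caller must read
    the real port back from the socket instead of assuming a fixed value.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", requested_port))
    server.listen(1)
    actual_port = server.getsockname()[1]
    return server, actual_port
```

A test that connects to `actual_port` keeps passing when another process already holds the port the test would otherwise have hard-coded.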
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5822 from zsxwing/SPARK-7291 and squashes the following commits:
      
      e521b84 [zsxwing] Fix a flaky test in AkkaRpcEnvSuite
      14b32886
    • Burak Yavuz's avatar
      [SPARK-7287] enabled fixed test · 7cf1eb79
      Burak Yavuz authored
      andrewor14 pwendell I reenabled the test. Let's see if it's fixed. I also noticed, as mentioned in the JIRA, that `--jars` started to fail after this test was ignored; [here](https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/2238/consoleFull), for example, `--jars` fails for exactly the same reason.
      
      Has any change been made to Spark Submit recently? Did the test setup on Jenkins change? If you look at last month's flaky tests, you won't find this test among them.
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5826 from brkyvz/restart-test and squashes the following commits:
      
      f509f68 [Burak Yavuz] enabled fixed test
      7cf1eb79
    • Sandy Ryza's avatar
      [SPARK-4550] In sort-based shuffle, store map outputs in serialized form · 0a2b15ce
      Sandy Ryza authored
      Refer to the JIRA for the design doc and some perf results.
      
      I wanted to call out some of the potentially more controversial changes up front:
      * Map outputs are only stored in serialized form when Kryo is in use.  I'm still unsure whether Java-serialized objects can be relocated.  At the very least, Java serialization writes out a stream header which causes problems with the current approach, so I decided to leave investigating this to future work.
      * The shuffle now explicitly operates on key-value pairs instead of any object.  Data is written to shuffle files in alternating keys and values instead of key-value tuples.  `BlockObjectWriter.write` now accepts a key argument and a value argument instead of any object.
      * The map output buffer can hold a maximum of Integer.MAX_VALUE bytes, though this wouldn't be terribly difficult to change.
      * When spilling occurs, the objects that are still in memory at merge time end up serialized and deserialized an extra time.
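
The "alternating keys and values" layout from the second bullet can be sketched as follows. `pickle` and `io.BytesIO` stand in for Spark's serializer and shuffle files, and the function names are hypothetical; this shows only the record layout, not the actual on-disk format.

```python
import io
import pickle


def write_records(stream, records):
    """Write each pair as two separate objects: key first, then value."""
    for key, value in records:
        pickle.dump(key, stream)
        pickle.dump(value, stream)


def read_records(stream):
    """Read alternating keys and values back into (key, value) pairs."""
    while True:
        try:
            key = pickle.load(stream)
        except EOFError:
            return  # Clean end of stream: no more records.
        value = pickle.load(stream)
        yield key, value
```

Because keys and values are separate objects in the stream, a reader can deserialize a key without first wrapping it in a tuple together with its value.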
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #4450 from sryza/sandy-spark-4550 and squashes the following commits:
      
      8c70dd9 [Sandy Ryza] Fix serialization
      9c16fe6 [Sandy Ryza] Fix a couple tests and move getAutoReset to KryoSerializerInstance
      6c54e06 [Sandy Ryza] Fix scalastyle
      d8462d8 [Sandy Ryza] SPARK-4550
      0a2b15ce
    • Patrick Wendell's avatar
      HOTFIX: Disable buggy dependency checker · a9fc5055
      Patrick Wendell authored
      a9fc5055
    • Zhan Zhang's avatar
      [SPARK-6479] [BLOCK MANAGER] Create off-heap block storage API · 36a7a680
      Zhan Zhang authored
      These are the classes for the off-heap block storage API. The change also includes the migration for Tachyon. The diff looks big, but it mainly just renames tachyon to offheap. A new implementation for HDFS will be submitted for review in SPARK-6112.
      
      Author: Zhan Zhang <zhazhan@gmail.com>
      
      Closes #5430 from zhzhan/SPARK-6479 and squashes the following commits:
      
      60acd84 [Zhan Zhang] minor change to kickoff the test
      12f54c9 [Zhan Zhang] solve merge conflicts
      a54132c [Zhan Zhang] solve review comments
      ffb8e00 [Zhan Zhang] rebase to sparkcontext change
      6e121e0 [Zhan Zhang] resolve review comments and restructure blockmanasger code
      a7aed6c [Zhan Zhang] add Tachyon migration code
      186de31 [Zhan Zhang] initial commit for off-heap block storage api
      36a7a680
  3. Apr 30, 2015
    • Burak Yavuz's avatar
      [SPARK-7248] implemented random number generators for DataFrames · b5347a46
      Burak Yavuz authored
      Adds the functions `rand` (uniform distribution) and `randn` (normal distribution) as expressions to DataFrames.
      
      cc mengxr rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5819 from brkyvz/df-rng and squashes the following commits:
      
      50d69d4 [Burak Yavuz] add seed for test that failed
      4234c3a [Burak Yavuz] fix Rand expression
      13cad5c [Burak Yavuz] couple fixes
      7d53953 [Burak Yavuz] waiting for hive tests
      b453716 [Burak Yavuz] move radn with seed down
      03637f0 [Burak Yavuz] fix broken hive func
      c5909eb [Burak Yavuz] deleted old implementation of Rand
      6d43895 [Burak Yavuz] implemented random generators
      b5347a46
    • zsxwing's avatar
      [SPARK-7282] [STREAMING] Fix the race conditions in StreamingListenerSuite · 69a739c7
      zsxwing authored
      Fixed the following flaky test
      ```Scala
      [info] StreamingListenerSuite:
      [info] - batch info reporting (782 milliseconds)
      [info] - receiver info reporting *** FAILED *** (3 seconds, 911 milliseconds)
      [info]   The code passed to eventually never returned normally. Attempted 10 times over 3.4735783689999997 seconds. Last failure message: 0 did not equal 1. (StreamingListenerSuite.scala:104)
      [info]   org.scalatest.exceptions.TestFailedDueToTimeoutException:
      [info]   at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
      [info]   at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
      [info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
      [info]   at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
      [info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply$mcV$sp(StreamingListenerSuite.scala:104)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply(StreamingListenerSuite.scala:94)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply(StreamingListenerSuite.scala:94)
      [info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
      [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
      [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
      [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
      [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
      [info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
      [info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
      [info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
      [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
      [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite.org$scalatest$BeforeAndAfter$$super$runTest(StreamingListenerSuite.scala:34)
      [info]   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite.runTest(StreamingListenerSuite.scala:34)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
      [info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
      [info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
      [info]   at scala.collection.immutable.List.foreach(List.scala:318)
      [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
      [info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
      [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
      [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
      [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
      [info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
      [info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
      [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
      [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite.org$scalatest$BeforeAndAfter$$super$run(StreamingListenerSuite.scala:34)
      [info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite.run(StreamingListenerSuite.scala:34)
      [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
      [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
      [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      [info]   at java.lang.Thread.run(Thread.java:745)
      [info]   Cause: org.scalatest.exceptions.TestFailedException: 0 did not equal 1
      [info]   at org.scalatest.MatchersHelper$.newTestFailedException(MatchersHelper.scala:160)
      [info]   at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6231)
      [info]   at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6277)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcV$sp(StreamingListenerSuite.scala:105)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply(StreamingListenerSuite.scala:104)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply(StreamingListenerSuite.scala:104)
      [info]   at org.scalatest.concurrent.Eventually$class.makeAValiantAttempt$1(Eventually.scala:394)
      [info]   at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:408)
      [info]   at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
      [info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
      [info]   at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
      [info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply$mcV$sp(StreamingListenerSuite.scala:104)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply(StreamingListenerSuite.scala:94)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite$$anonfun$2.apply(StreamingListenerSuite.scala:94)
      [info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
      [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
      [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
      [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
      [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
      [info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
      [info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
      [info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
      [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
      [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite.org$scalatest$BeforeAndAfter$$super$runTest(StreamingListenerSuite.scala:34)
      [info]   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite.runTest(StreamingListenerSuite.scala:34)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
      [info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
      [info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
      [info]   at scala.collection.immutable.List.foreach(List.scala:318)
      [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
      [info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
      [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
      [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
      [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
      [info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
      [info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
      [info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
      [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
      [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite.org$scalatest$BeforeAndAfter$$super$run(StreamingListenerSuite.scala:34)
      [info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
      [info]   at org.apache.spark.streaming.StreamingListenerSuite.run(StreamingListenerSuite.scala:34)
      [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
      [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
      [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      [info]   at java.lang.Thread.run(Thread.java:745)
      ```
      
      The original code didn't have a memory barrier in the `eventually` closure, which could cause the test to fail, because the JVM doesn't guarantee memory consistency between threads without a memory barrier.
      
      This PR uses `ConcurrentLinkedQueue` to set up the memory barrier.
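
The general pattern, a listener thread publishing results through a thread-safe queue that the test thread polls, can be sketched in Python. Python's memory model differs from the JVM's, so this only illustrates the shape of the fix; `collect_events` and its polling loop are assumptions for the example, with the loop playing the role of ScalaTest's `eventually`.

```python
import queue
import threading
import time


def collect_events(num_events, timeout=5.0):
    """Collect events published by a background listener thread."""
    events = queue.Queue()  # Thread-safe hand-off between the two threads.

    def listener():
        for i in range(num_events):
            events.put(i)

    worker = threading.Thread(target=listener)
    worker.start()

    # Poll until all events are visible or the deadline passes.
    deadline = time.monotonic() + timeout
    while events.qsize() < num_events and time.monotonic() < deadline:
        time.sleep(0.01)

    worker.join()
    return [events.get_nowait() for _ in range(events.qsize())]
```

The test thread never reads a plain shared variable written by the listener; everything it observes passes through the synchronized queue.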
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5812 from zsxwing/SPARK-7282 and squashes the following commits:
      
      59115ef [zsxwing] Use SynchronizedBuffer
      014dd2b [zsxwing] Fix the race conditions in StreamingListenerSuite
      69a739c7
    • Patrick Wendell's avatar
      Revert "[SPARK-5213] [SQL] Pluggable SQL Parser Support" · beeafcfd
      Patrick Wendell authored
      This reverts commit 3ba5aaab.
      beeafcfd