  1. Dec 23, 2014
    • [SPARK-4671][Streaming] Do not replicate streaming block when WAL is enabled · 3f5f4cc4
      jerryshao authored
      Currently a streaming block will be replicated when a specific storage level is set. Since the WAL is already fault tolerant, replication is needless and will hurt the throughput of a streaming application.
      
      Hi tdas, as discussed for this issue, I fixed it with this implementation. I'm not sure this is the way you want it; would you mind taking a look at it? Thanks a lot.
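
      A rough sketch of the idea (illustrative names, not the actual receiver code): when the WAL already guarantees fault tolerance, the requested storage level can be downgraded to a single replica.
      ```Scala
      import org.apache.spark.storage.StorageLevel

      // Hypothetical helper: keep the memory/disk/serialization flags the
      // caller asked for, but drop extra replicas when the WAL is enabled.
      def effectiveStorageLevel(requested: StorageLevel, walEnabled: Boolean): StorageLevel = {
        if (walEnabled && requested.replication > 1) {
          StorageLevel(requested.useDisk, requested.useMemory,
            requested.useOffHeap, requested.deserialized, 1)
        } else {
          requested
        }
      }
      ```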
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #3534 from jerryshao/SPARK-4671 and squashes the following commits:
      
      500b456 [jerryshao] Do not replicate streaming block when WAL is enabled
      3f5f4cc4
    • [SPARK-4802] [streaming] Remove receiverInfo once receiver is de-registered · 10d69e9c
      Ilayaperumal Gopinathan authored
      Once the streaming receiver is de-registered at the executor, the `ReceiverTrackerActor` needs to
      remove the corresponding receiverInfo entry from the `receiverInfo` map at `ReceiverTracker`.
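
      A minimal sketch of the fix, with the map simplified to plain types (the real tracker maps stream ids to `ReceiverInfo`):
      ```Scala
      import scala.collection.mutable

      // streamId -> receiver metadata (simplified to String here)
      val receiverInfo = new mutable.HashMap[Int, String]()

      def deregisterReceiver(streamId: Int): Unit = {
        // Remove the entry instead of leaving a stale record behind.
        receiverInfo -= streamId
      }
      ```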
      
      Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io>
      
      Closes #3647 from ilayaperumalg/receiverInfo-RTracker and squashes the following commits:
      
      6eb97d5 [Ilayaperumal Gopinathan] Polishing based on the review
      3640c86 [Ilayaperumal Gopinathan] Remove receiverInfo once receiver is de-registered
      10d69e9c
    • [SPARK-4913] Fix incorrect event log path · 96281cd0
      Liang-Chi Hsieh authored
      SPARK-2261 uses a single file to log events for an app. `eventLogDir` in `ApplicationDescription` is replaced with `eventLogFile`. However, `ApplicationDescription` in `SparkDeploySchedulerBackend` is initialized with `SparkContext`'s `eventLogDir`, which is just the log directory, not the actual log file path. As a result, `Master.rebuildSparkUI` cannot correctly rebuild a new SparkUI for the app.
      
      Because the `ApplicationDescription` is remotely registered with `Master` and the app's id is only generated in `Master`, we cannot know the app id in advance of registration. So the received description needs to be modified with the correct `eventLogFile` value.
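
      A hedged sketch of that flow, using a simplified stand-in for `ApplicationDescription` (field and helper names here are illustrative):
      ```Scala
      case class ApplicationDescription(name: String, eventLogFile: Option[String])

      // The app id only exists after the Master assigns it, so the log file
      // path can only be completed on the Master side, not on the driver.
      def withEventLogFile(desc: ApplicationDescription, eventLogDir: String,
          appId: String): ApplicationDescription = {
        desc.copy(eventLogFile = Some(s"$eventLogDir/$appId"))
      }
      ```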
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #3755 from viirya/fix_app_logdir and squashes the following commits:
      
      5e0ea35 [Liang-Chi Hsieh] Revision for comment.
      b5730a1 [Liang-Chi Hsieh] Fix incorrect event log path.
      
      Closes #3777 (a duplicate PR for the same JIRA)
      96281cd0
    • [SPARK-4730][YARN] Warn against deprecated YARN settings · 27c5399f
      Andrew Or authored
      See https://issues.apache.org/jira/browse/SPARK-4730.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #3590 from andrewor14/yarn-settings and squashes the following commits:
      
      36e0753 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-settings
      dcd1316 [Andrew Or] Warn against deprecated YARN settings
      27c5399f
    • [SPARK-4914][Build] Cleans lib_managed before compiling with Hive 0.13.1 · 395b771f
      Cheng Lian authored
      This PR tries to fix the Hive tests failure encountered in PR #3157 by cleaning `lib_managed` before building assembly jar against Hive 0.13.1 in `dev/run-tests`. Otherwise two sets of datanucleus jars would be left in `lib_managed` and may mess up class paths while executing Hive test suites. Please refer to [this thread] [1] for details. A clean build would be even safer, but we only clean `lib_managed` here to save build time.
      
      This PR also takes the chance to clean up some minor typos and formatting issues in the comments.
      
      [1]: https://github.com/apache/spark/pull/3157#issuecomment-67656488
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #3756 from liancheng/clean-lib-managed and squashes the following commits:
      
      e2bd21d [Cheng Lian] Adds lib_managed to clean set
      c9f2f3e [Cheng Lian] Cleans lib_managed before compiling with Hive 0.13.1
      395b771f
    • [SPARK-4932] Add help comments in Analytics · 9c251c55
      Takeshi Yamamuro authored
      Trivial modifications for usability.
      
      Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
      
      Closes #3775 from maropu/AddHelpCommentInAnalytics and squashes the following commits:
      
      fbea8f5 [Takeshi Yamamuro] Add help comments in Analytics
      9c251c55
    • [SPARK-4834] [standalone] Clean up application files after app finishes. · dd155369
      Marcelo Vanzin authored
      Commit 7aacb7bf added support for sharing downloaded files among multiple
      executors of the same app. That works great in Yarn, since the app's directory
      is cleaned up after the app is done.
      
      But Spark standalone mode didn't do that, so the lock/cache files created
      by that change were left around and could eventually fill up the disk hosting
      /tmp.
      
      To solve that, create app-specific directories under the local dirs when
      launching executors. Multiple executors launched by the same Worker will
      use the same app directories, so they should be able to share the downloaded
      files. When the application finishes, a new message is sent to all workers
      telling them the application has finished; once that message has been received
      and all executors registered for the application have shut down, those
      directories will be cleaned up by the Worker.
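
      A rough sketch of the protocol, with illustrative names (not the actual Worker code):
      ```Scala
      import java.io.File
      import scala.collection.mutable

      val appDirectories = new mutable.HashMap[String, Seq[File]]()  // appId -> local dirs
      val finishedApps = new mutable.HashSet[String]()

      // Called when the "application finished" message arrives; directories are
      // only removed once no executor for the app is still registered.
      def handleApplicationFinished(appId: String, executorsStillRunning: Boolean): Unit = {
        finishedApps += appId
        if (!executorsStillRunning) {
          appDirectories.remove(appId).foreach(_.foreach(deleteRecursively))
        }
      }

      def deleteRecursively(f: File): Unit = {
        if (f.isDirectory) {
          Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
        }
        f.delete()
      }
      ```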
      
      Note: Unit testing this is hard (if even possible), since local-cluster mode
      doesn't seem to leave the Master/Worker daemons running long enough after
      `sc.stop()` is called for the clean up protocol to take effect.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3705 from vanzin/SPARK-4834 and squashes the following commits:
      
      b430534 [Marcelo Vanzin] Remove seemingly unnecessary synchronization.
      50eb4b9 [Marcelo Vanzin] Review feedback.
      c0e5ea5 [Marcelo Vanzin] [SPARK-4834] [standalone] Clean up application files after app finishes.
      dd155369
    • [SPARK-4931][Yarn][Docs] Fix the format of running-on-yarn.md · 2d215aeb
      zsxwing authored
      Currently, the formatting of the log4j section in running-on-yarn.md is a bit messy.
      
      ![running-on-yarn](https://cloud.githubusercontent.com/assets/1000778/5535248/204c4b64-8ab4-11e4-83c3-b4722ea0ad9d.png)
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3774 from zsxwing/SPARK-4931 and squashes the following commits:
      
      4a5f853 [zsxwing] Fix the format of running-on-yarn.md
      2d215aeb
    • [SPARK-4890] Ignore downloaded EC2 libs · 2823c7f0
      Nicholas Chammas authored
      PR #3737 changed `spark-ec2` to automatically download boto from PyPI. This PR tells git to ignore those downloaded library files.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #3770 from nchammas/ignore-ec2-lib and squashes the following commits:
      
      5c440d3 [Nicholas Chammas] gitignore downloaded EC2 libs
      2823c7f0
    • [Docs] Minor typo fixes · 0e532ccb
      Nicholas Chammas authored
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #3772 from nchammas/patch-1 and squashes the following commits:
      
      b7d9083 [Nicholas Chammas] [Docs] Minor typo fixes
      0e532ccb
  2. Dec 22, 2014
    • [SPARK-4907][MLlib] Inconsistent loss and gradient in LeastSquaresGradient compared with R · a96b7278
      DB Tsai authored
      In most academic papers and algorithm implementations, people use
      L = 1/(2n) ||A weights - y||^2 instead of L = 1/n ||A weights - y||^2
      for the least-squares loss. See Eq. (1) in http://web.stanford.edu/~hastie/Papers/glmnet.pdf
      
      Since MLlib uses a different convention, this results in different residuals, and
      all the statistical properties will differ from the GLMNET package in R.
      
      The model coefficients will still be the same under this change.
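
      Spelled out (a worked comparison consistent with the message above; notation assumed: design matrix A, weights w, targets y, n rows):
      ```
      L_new(w) = \frac{1}{2n} \lVert Aw - y \rVert^2  =>  \nabla L_new(w) = \frac{1}{n} A^\top (Aw - y)

      L_old(w) = \frac{1}{n} \lVert Aw - y \rVert^2   =>  \nabla L_old(w) = \frac{2}{n} A^\top (Aw - y)
      ```
      The two losses differ only by a positive constant factor, so the minimizer (and hence the coefficients) is unchanged; but the gradient under the 1/(2n) convention is half as large, which is why the squashed commit below doubles the step size to converge to the same solution.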
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #3746 from dbtsai/lir and squashes the following commits:
      
      19c2e85 [DB Tsai] make stepsize twice to converge to the same solution
      0b2c29c [DB Tsai] first commit
      a96b7278
    • [SPARK-4818][Core] Add 'iterator' to reduce memory consumed by join · c233ab3d
      zsxwing authored
      In Scala, `map` and `flatMap` on an `Iterable` copy the contents of the `Iterable` into a new `Seq`. For example:
      ```Scala
        val iterable = Seq(1, 2, 3).map(v => {
          println(v)
          v
        })
        println("Iterable map done")
      
        val iterator = Seq(1, 2, 3).iterator.map(v => {
          println(v)
          v
        })
        println("Iterator map done")
      ```
      outputs
      ```
      1
      2
      3
      Iterable map done
      Iterator map done
      ```
      So we should use 'iterator' to reduce the memory consumed by join.
      
      Found by Johannes Simon in http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3C5BE70814-9D03-4F61-AE2C-0D63F2DE4446%40mail.de%3E
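
      A minimal sketch of how the lazy variant applies to join-like code (illustrative; the actual change touches the cogroup-based join implementations):
      ```Scala
      // Desugars to vs.iterator.flatMap(v => ws.iterator.map(w => (v, w))):
      // pairs are produced one at a time instead of being buffered in a Seq.
      def joinPair[V, W](vs: Iterable[V], ws: Iterable[W]): Iterator[(V, W)] =
        for (v <- vs.iterator; w <- ws.iterator) yield (v, w)
      ```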
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3671 from zsxwing/SPARK-4824 and squashes the following commits:
      
      48ee7b9 [zsxwing] Remove the explicit types
      95d59d6 [zsxwing] Add 'iterator' to reduce memory consumed by join
      c233ab3d
    • [SPARK-4920][UI]: Current Spark version in UI is not striking. · de9d7d2b
      genmao.ygm authored
      It is not convenient to see the Spark version in the UI. We can keep the same style as the Spark website.
      
      ![spark_version](https://cloud.githubusercontent.com/assets/7402327/5527025/1c8c721c-8a35-11e4-8d6a-2734f3c6bdf8.jpg)
      
      Author: genmao.ygm <genmao.ygm@alibaba-inc.com>
      
      Closes #3763 from uncleGen/master-clean-141222 and squashes the following commits:
      
      0dcb9a9 [genmao.ygm] [SPARK-4920][UI]:current spark version in UI is not striking.
      de9d7d2b
    • [Minor] Fix scala doc · a61aa669
      Liang-Chi Hsieh authored
      Minor fix for an obvious scala doc error.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #3751 from viirya/fix_scaladoc and squashes the following commits:
      
      03fddaa [Liang-Chi Hsieh] Fix scala doc.
      a61aa669
    • [SPARK-4864] Add documentation to Netty-based configs · fbca6b6c
      Aaron Davidson authored
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3713 from aarondav/netty-configs and squashes the following commits:
      
      8a8b373 [Aaron Davidson] Address Patrick's comments
      3b1f84e [Aaron Davidson] [SPARK-4864] Add documentation to Netty-based configs
      fbca6b6c
    • [SPARK-4079] [CORE] Consolidates Errors if a CompressionCodec is not available · 7c0ed13d
      Kostas Sakellis authored
      This commit consolidates some of the exceptions thrown when compression codecs are not available. If a bad configuration string was passed in, a ClassNotFoundException was thrown. Also, if Snappy was not available, an InvocationTargetException was thrown when the codec was being used (not when it was being initialized). Now, an IllegalArgumentException is thrown when a codec is not available at creation time, either because the class does not exist or because the codec itself is not available on the system. This gives us a better message and fails faster.
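
      A simplified sketch of the new behavior (illustrative only; the real factory also takes a SparkConf and resolves short codec names):
      ```Scala
      def createCodec(codecName: String): AnyRef = {
        try {
          // Fails here, at creation time, whether the class is missing or the
          // codec cannot be instantiated on this system.
          Class.forName(codecName).getConstructor().newInstance().asInstanceOf[AnyRef]
        } catch {
          case e: Exception =>
            throw new IllegalArgumentException(s"Codec [$codecName] is not available.", e)
        }
      }
      ```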
      
      Author: Kostas Sakellis <kostas@cloudera.com>
      
      Closes #3119 from ksakellis/kostas-spark-4079 and squashes the following commits:
      
      9709c7c [Kostas Sakellis] Removed unnecessary Logging class
      63bfdd0 [Kostas Sakellis] Removed isAvailable to preserve binary compatibility
      1d0ef2f [Kostas Sakellis] [SPARK-4079] [CORE] Added more information to exception
      64f3d27 [Kostas Sakellis] [SPARK-4079] [CORE] Code review feedback
      52dfa8f [Kostas Sakellis] [SPARK-4079] [CORE] Default to LZF if Snappy not available
      7c0ed13d
    • SPARK-4447. Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha · d62da642
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3652 from sryza/sandy-spark-4447 and squashes the following commits:
      
      2791158 [Sandy Ryza] Review feedback
      c23507b [Sandy Ryza] Strip margin from client arguments help string
      18be7ba [Sandy Ryza] SPARK-4447
      d62da642
    • [SPARK-4733] Add missing parameter comments in ShuffleDependency · fb8e85e8
      Takeshi Yamamuro authored
      Add missing Javadoc comments in ShuffleDependency.
      
      Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
      
      Closes #3594 from maropu/DependencyJavadocFix and squashes the following commits:
      
      32129b4 [Takeshi Yamamuro] Fix comments in @aggregator and @mapSideCombine
      303c75d [Takeshi Yamamuro] [SPARK-4733] Add missing parameter comments in ShuffleDependency
      fb8e85e8
    • [Minor] Improve some code in BroadcastTest for short · 1d9788e4
      carlmartin authored
      Using
          val arr1 = (0 until num).toArray
      instead of
          val arr1 = new Array[Int](num)
          for (i <- 0 until arr1.length) {
            arr1(i) = i
          }
      for brevity.
      
      Author: carlmartin <carlmartinmax@gmail.com>
      
      Closes #3750 from SaintBacchus/BroadcastTest and squashes the following commits:
      
      43adb70 [carlmartin] Improve some code in BroadcastTest for short
      1d9788e4
    • [SPARK-4883][Shuffle] Add a name to the directoryCleaner thread · 8773705f
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3734 from zsxwing/SPARK-4883 and squashes the following commits:
      
      e6f2b61 [zsxwing] Fix the name
      cc74727 [zsxwing] Add a name to the directoryCleaner thread
      8773705f
    • [SPARK-4870] Add spark version to driver log · 39272c8c
      Zhang, Liye authored
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #3717 from liyezhang556520/version2Log and squashes the following commits:
      
      ccd30d7 [Zhang, Liye] delete log in sparkConf
      330f70c [Zhang, Liye] move the log from SaprkConf to SparkContext
      96dc115 [Zhang, Liye] remove curly brace
      e833330 [Zhang, Liye] add spark version to driver log
      39272c8c
    • [SPARK-4915][YARN] Fix classname to be specified for external shuffle service. · 96606f69
      Tsuyoshi Ozawa authored
      Author: Tsuyoshi Ozawa <ozawa.tsuyoshi@lab.ntt.co.jp>
      
      Closes #3757 from oza/SPARK-4915 and squashes the following commits:
      
      3b0d6d6 [Tsuyoshi Ozawa] Fix classname to be specified for external shuffle service.
      96606f69
    • [SPARK-4918][Core] Reuse Text in saveAsTextFile · 93b2f3a8
      zsxwing authored
      Reuse Text in saveAsTextFile to reduce GC.
      
      /cc rxin
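
      A minimal sketch of the pattern, assuming the usual Hadoop text output types (not the exact Spark code):
      ```Scala
      import org.apache.hadoop.io.{NullWritable, Text}

      // One Text per partition, reset for each record; safe because the record
      // writer serializes each pair before the next one is produced.
      def toHadoopRecords(iter: Iterator[String]): Iterator[(NullWritable, Text)] = {
        val text = new Text()
        iter.map { x =>
          text.set(x)
          (NullWritable.get(), text)
        }
      }
      ```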
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3762 from zsxwing/SPARK-4918 and squashes the following commits:
      
      59f03eb [zsxwing] Reuse Text in saveAsTextFile
      93b2f3a8
    • [SPARK-2075][Core] Make the compiler generate the same byte code for Hadoop 1.+ and Hadoop 2.+ · 6ee6aa70
      zsxwing authored
      `NullWritable` is a `Comparable` rather than a `Comparable[NullWritable]` in Hadoop 1.+, so the compiler cannot find an implicit Ordering for it and will generate different anonymous classes for `saveAsTextFile` under Hadoop 1.+ and Hadoop 2.+. Therefore, we provide an Ordering for NullWritable here so that the compiler generates the same code.
      
      I used the following commands to confirm the generated byte code is the same.
      ```
      mvn -Dhadoop.version=1.2.1 -DskipTests clean package -pl core -am
      javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop1.txt
      
      mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package -pl core -am
      javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop2.txt
      
      diff ~/hadoop1.txt ~/hadoop2.txt
      ```
      
      However, the compiler will still generate different code for classes that call methods of `JobContext`/`TaskAttemptContext`: `JobContext`/`TaskAttemptContext` is a class in Hadoop 1.+, so calling its methods uses `invokevirtual`, while it is an interface in Hadoop 2.+, where calls use `invokeinterface`.
      
      To fix it, we can use reflection to call `JobContext/TaskAttemptContext.getConfiguration`.
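
      A hedged sketch of both fixes (simplified; the `null` Ordering mirrors the squashed commit below):
      ```Scala
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.io.NullWritable

      // 1. Pin down the implicit Ordering explicitly so both Hadoop versions
      //    see the same implicit argument and the compiler emits identical
      //    anonymous classes.
      implicit val nullWritableOrdering: Ordering[NullWritable] = null

      // 2. Call getConfiguration reflectively, since JobContext is a class in
      //    Hadoop 1.+ (invokevirtual) but an interface in 2.+ (invokeinterface).
      def getConfiguration(context: AnyRef): Configuration =
        context.getClass.getMethod("getConfiguration")
          .invoke(context).asInstanceOf[Configuration]
      ```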
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3740 from zsxwing/SPARK-2075 and squashes the following commits:
      
      39d9df2 [zsxwing] Fix the code style
      e4ad8b5 [zsxwing] Use null for the implicit Ordering
      734bac9 [zsxwing] Explicitly set the implicit parameters
      ca03559 [zsxwing] Use reflection to access JobContext/TaskAttemptContext.getConfiguration
      fa40db0 [zsxwing] Add an Ordering for NullWritable to make the compiler generate same byte codes for RDD
      6ee6aa70
  3. Dec 21, 2014
    • SPARK-4910 [CORE] build failed (use of FileStatus.isFile in Hadoop 1.x) · c6a3c0d5
      Sean Owen authored
      Fix small Hadoop 1 compile error from SPARK-2261. In Hadoop 1.x, all we have is FileStatus.isDir, so these "is file" assertions are changed to "is not a dir". This is how similar checks are done elsewhere in the code base.
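
      A tiny sketch of the compatible check (illustrative):
      ```Scala
      import org.apache.hadoop.fs.FileStatus

      // Hadoop 1.x has no FileStatus.isFile, so assert "is not a dir" instead.
      def assertIsFile(status: FileStatus): Unit =
        require(!status.isDir, s"${status.getPath} is a directory, expected a file")
      ```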
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3754 from srowen/SPARK-4910 and squashes the following commits:
      
      52c5e4e [Sean Owen] Fix small Hadoop 1 compile error from SPARK-2261
      c6a3c0d5
  4. Dec 20, 2014
  5. Dec 19, 2014
    • [SPARK-4140] Document dynamic allocation · 15c03e1e
      Andrew Or authored
      Once the external shuffle service is also documented, the dynamic allocation section will link to it. Let me know if the whole dynamic allocation section should be moved to its own page; I personally think the organization might be cleaner that way.
      
      This patch builds on top of oza's work in #3689.
      
      aarondav pwendell
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Tsuyoshi Ozawa <ozawa.tsuyoshi@gmail.com>
      
      Closes #3731 from andrewor14/document-dynamic-allocation and squashes the following commits:
      
      1281447 [Andrew Or] Address a few comments
      b9843f2 [Andrew Or] Document the configs as well
      246fb44 [Andrew Or] Merge branch 'SPARK-4839' of github.com:oza/spark into document-dynamic-allocation
      8c64004 [Andrew Or] Add documentation for dynamic allocation (without configs)
      6827b56 [Tsuyoshi Ozawa] Fixing a documentation of spark.dynamicAllocation.enabled.
      53cff58 [Tsuyoshi Ozawa] Adding a documentation about dynamic resource allocation.
      15c03e1e
    • [SPARK-4831] Do not include SPARK_CLASSPATH if empty · 7cb3f547
      Daniel Darabos authored
      My guess for fixing https://issues.apache.org/jira/browse/SPARK-4831.
      
      Author: Daniel Darabos <darabos.daniel@gmail.com>
      
      Closes #3678 from darabos/patch-1 and squashes the following commits:
      
      36e1243 [Daniel Darabos] Do not include SPARK_CLASSPATH if empty.
      7cb3f547
    • SPARK-2641: Passing num executors to spark arguments from properties file · 1d648123
      Kanwaljit Singh authored
      Since we can set the Spark executor memory and executor cores from a properties file, we should also be allowed to set the number of executor instances.
      
      Author: Kanwaljit Singh <kanwaljit.singh@guavus.com>
      
      Closes #1657 from kjsingh/branch-1.0 and squashes the following commits:
      
      d8a5a12 [Kanwaljit Singh] SPARK-2641: Fixing how spark arguments are loaded from properties file for num executors
      
      Conflicts:
      	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
      1d648123
    • [SPARK-3060] spark-shell.cmd doesn't accept application options in Windows OS · 8d932475
      Masayoshi TSUZUKI authored
      Added a module equivalent to utils.sh and modified spark-shell2.cmd to use it to parse options.
      
      Now we can use application options.
        ex) `bin\spark-shell.cmd --master spark://master:7077 -i path\to\script.txt`
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #3350 from tsudukim/feature/SPARK-3060 and squashes the following commits:
      
      4551e56 [Masayoshi TSUZUKI] Modified too long line which defines the submission options to pass findstr command.
      3a11361 [Masayoshi TSUZUKI] [SPARK-3060] spark-shell.cmd doesn't accept application options in Windows OS
      8d932475
    • change signature of example to match released code · c25c669d
      Eran Medan authored
      the signature of registerKryoClasses actually takes an Array[Class[_]], not a Seq
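
      For example (with hypothetical user classes):
      ```Scala
      import org.apache.spark.SparkConf

      class MyClass1  // hypothetical user classes
      class MyClass2

      val conf = new SparkConf()
        .registerKryoClasses(Array(classOf[MyClass1], classOf[MyClass2]))
      ```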
      
      Author: Eran Medan <ehrann.mehdan@gmail.com>
      
      Closes #3747 from eranation/patch-1 and squashes the following commits:
      
      ee9885d [Eran Medan] change signature of example to match released code
      c25c669d
    • [SPARK-2261] Make event logger use a single file. · 45645191
      Marcelo Vanzin authored
      Currently the event logger uses a directory and several files to
      describe an app's event log, all but one of which are empty. This
      is not very HDFS-friendly, since creating lots of nodes in HDFS
      (especially when they don't contain any data) is frowned upon due
      to the node metadata being kept in the NameNode's memory.
      
      Instead, add a header section to the event log file that contains metadata
      needed to read the events. This metadata includes things like the Spark
      version (for future code that may need it for backwards compatibility) and
      the compression codec used for the event data.
      
      With the new approach, aside from reducing the load on the NameNode, far
      fewer remote calls are needed when reading the log directory.
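
      An illustrative sketch of the layout (the header keys and delimiter here are assumptions, not the real format):
      ```Scala
      import java.io.{OutputStream, PrintWriter}

      // Plain-text header with the metadata needed to read the events,
      // followed by the (possibly compressed) event data.
      def writeHeader(out: OutputStream, sparkVersion: String, codec: Option[String]): Unit = {
        val writer = new PrintWriter(out)
        writer.println(s"SPARK_VERSION=$sparkVersion")
        codec.foreach(c => writer.println(s"COMPRESSION_CODEC=$c"))
        writer.println("HEADER_END_MARKER")  // hypothetical delimiter
        writer.flush()
      }
      ```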
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #1222 from vanzin/hist-server-single-log and squashes the following commits:
      
      cc8f5de [Marcelo Vanzin] Store header in plain text.
      c7e6123 [Marcelo Vanzin] Update comment.
      59c561c [Marcelo Vanzin] Review feedback.
      216c5a3 [Marcelo Vanzin] Review comments.
      dce28e9 [Marcelo Vanzin] Fix log overwrite test.
      f91c13e [Marcelo Vanzin] Handle "spark.eventLog.overwrite", and add unit test.
      346f0b4 [Marcelo Vanzin] Review feedback.
      ed0023e [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log
      3f4500f [Marcelo Vanzin] Unit test for SPARK-3697.
      45c7a1f [Marcelo Vanzin] Version of SPARK-3697 for this branch.
      b3ee30b [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log
      a6d5c50 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log
      16fd491 [Marcelo Vanzin] Use unique log directory for each codec.
      0ef3f70 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log
      d93c44a [Marcelo Vanzin] Add a newline to make the header more readable.
      9e928ba [Marcelo Vanzin] Add types.
      bd6ba8c [Marcelo Vanzin] Review feedback.
      a624a89 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log
      04364dc [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log
      bb7c2d3 [Marcelo Vanzin] Fix scalastyle warning.
      16661a3 [Marcelo Vanzin] Simplify some internal code.
      cc6bce4 [Marcelo Vanzin] Some review feedback.
      a722184 [Marcelo Vanzin] Do not encode metadata in log file name.
      3700586 [Marcelo Vanzin] Restore log flushing.
      f677930 [Marcelo Vanzin] Fix botched rebase.
      ae571fa [Marcelo Vanzin] Fix end-to-end event logger test.
      9db0efd [Marcelo Vanzin] Show prettier name in UI.
      8f42274 [Marcelo Vanzin] Make history server parse old-style log directories.
      6251dd7 [Marcelo Vanzin] Make event logger use a single file.
      45645191
    • [SPARK-4890] Upgrade Boto to 2.34.0; automatically download Boto from PyPi instead of packaging it · c28083f4
      Josh Rosen authored
      This patch upgrades `spark-ec2`'s Boto version to 2.34.0, since this is blocking several features.  Newer versions of Boto don't work properly when they're loaded from a zipfile since they try to read a JSON file from a path relative to the Boto library sources.
      
      Therefore, this patch also changes spark-ec2 to automatically download Boto from PyPI if it's not present in `SPARK_EC2_DIR/lib`, similar to what we do in the `sbt/sbt` script. This shouldn't be an issue for users, since they already need an internet connection to launch an EC2 cluster. By performing the download in spark_ec2.py instead of the Bash script, this should also work for Windows users.
      
      I've tested this with Python 2.6, too.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #3737 from JoshRosen/update-boto and squashes the following commits:
      
      0aa43cc [Josh Rosen] Remove unused setup_standalone_cluster() method.
      f02935d [Josh Rosen] Enable Python deprecation warnings and fix one Boto warning:
      587ae89 [Josh Rosen] [SPARK-4890] Upgrade Boto to 2.34.0; automatically download Boto from PyPi instead of packaging it
      c28083f4
    • [SPARK-4896] don’t redundantly overwrite executor JAR deps · 7981f969
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #2848 from ryan-williams/fetch-file and squashes the following commits:
      
      c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently
      8e39c16 [Ryan Williams] code review feedback
      788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps
      7981f969
    • [SPARK-4889] update history server example cmds · cdb2c645
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #3736 from ryan-williams/hist and squashes the following commits:
      
      421d8ff [Ryan Williams] add another random typo fix
      76d6a4c [Ryan Williams] remove hdfs example
      a2d0f82 [Ryan Williams] code review feedback
      9ca7629 [Ryan Williams] [SPARK-4889] update history server example cmds
      cdb2c645
    • Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor. · 336cd341
      Reynold Xin authored
      This consolidates some code path and makes constructor arguments simpler for a few classes.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #3738 from rxin/sparkEnvDepRefactor and squashes the following commits:
      
      82e02cc [Reynold Xin] Fixed couple bugs.
      217062a [Reynold Xin] Code review feedback.
      bd00af7 [Reynold Xin] Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor.
      336cd341
    • [Build] Remove spark-staging-1038 · 8e253ebb
      scwf authored
      Author: scwf <wangfei1@huawei.com>
      
      Closes #3743 from scwf/abc and squashes the following commits:
      
      7d98bc8 [scwf] removing spark-staging-1038
      8e253ebb
    • [SPARK-4901] [SQL] Hot fix for ByteWritables.copyBytes · 5479450c
      Cheng Hao authored
      HiveInspectors.scala failed in compiling with Hadoop 1, as the BytesWritable.copyBytes is not available in Hadoop 1.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3742 from chenghao-intel/settable_oi_hotfix and squashes the following commits:
      
      bb04d1f [Cheng Hao] hot fix for ByteWritables.copyBytes
      5479450c
    • SPARK-3428. TaskMetrics for running tasks is missing GC time metrics · 283263ff
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits:
      
      cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics
      283263ff
  6. Dec 18, 2014
    • [SPARK-4674] Refactor getCallSite · d7fc69a8
      Liang-Chi Hsieh authored
      The current version of `getCallSite` visits the collection of `StackTraceElement`s twice. This is unnecessary, since we can do the work in a single pass. We also do not need to keep the filtered `StackTraceElement`s.
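
      A minimal sketch of the single-pass approach (illustrative names, not the exact Utils code):
      ```Scala
      // One traversal with early exit: record the last Spark-internal method
      // and the first user-code frame, without building a filtered collection.
      def getCallSite(trace: Array[StackTraceElement],
          isSparkClass: String => Boolean): String = {
        var lastSparkMethod = "<unknown>"
        var firstUserCall = "<unknown>"
        var insideSpark = true
        var i = 0
        while (insideSpark && i < trace.length) {
          val el = trace(i)
          if (isSparkClass(el.getClassName)) {
            lastSparkMethod = el.getMethodName
          } else {
            firstUserCall = s"${el.getFileName}:${el.getLineNumber}"
            insideSpark = false
          }
          i += 1
        }
        s"$lastSparkMethod at $firstUserCall"
      }
      ```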
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #3532 from viirya/refactor_getCallSite and squashes the following commits:
      
      62aa124 [Liang-Chi Hsieh] Fix style.
      e741017 [Liang-Chi Hsieh] Refactor getCallSite.
      d7fc69a8