  1. May 16, 2014
    • Patrick Wendell's avatar
      Version bump of spark-ec2 scripts · c0ab85d7
      Patrick Wendell authored
      This will allow us to change things in spark-ec2 related to the 1.0 release.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #809 from pwendell/spark-ec2 and squashes the following commits:
      
      59117fb [Patrick Wendell] Version bump of spark-ec2 scripts
      c0ab85d7
    • Michael Armbrust's avatar
      SPARK-1864 Look in spark conf instead of system properties when propagating... · a80a6a13
      Michael Armbrust authored
      SPARK-1864 Look in spark conf instead of system properties when propagating configuration to executors.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #808 from marmbrus/confClasspath and squashes the following commits:
      
      4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors.
      a80a6a13
    • Matei Zaharia's avatar
      Tweaks to Mesos docs · fed6303f
      Matei Zaharia authored
      - Mention Apache downloads first
      - Shorten some wording
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #806 from mateiz/doc-update and squashes the following commits:
      
      d9345cd [Matei Zaharia] typo
      a179f8d [Matei Zaharia] Tweaks to Mesos docs
      fed6303f
    • Andre Schumacher's avatar
      SPARK-1487 [SQL] Support record filtering via predicate pushdown in Parquet · 40d6acd6
      Andre Schumacher authored
      Simple filter predicates such as LessThan, GreaterThan, etc., where one side is a literal and the other one a NamedExpression are now pushed down to the underlying ParquetTableScan. Here are some results for a microbenchmark with a simple schema of six fields of different types where most records failed the test:
      
                | Uncompressed | Compressed
      --------- | ------------ | ----------
      File size | 10 GB        | 2 GB
      Speedup   | 2            | 1.8
      
      Since mileage may vary, I added a new option to SparkConf:
      
      `org.apache.spark.sql.parquet.filter.pushdown`
      
      The default value is `true`; setting it to `false` disables the pushdown. Disabling pushdown can perform better when most rows are expected to pass the filter, or when there are only a few fields. The default should fit situations with a reasonable number of (possibly nested) fields where, on average, not too many records pass the filter.
      
      Because of an issue with Parquet ([see here](https://github.com/Parquet/parquet-mr/issues/371)), currently only predicates on non-nullable attributes are pushed down. If it were known that, for a given table, no optional fields have missing values, overriding this could also be allowed.
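As a rough illustration of the trade-off described above, here is a toy Python sketch (the names and the in-memory "scan" are purely illustrative, not Spark's or Parquet's API): with pushdown, the predicate is evaluated while reading, so non-matching records are never fully materialized.

```python
def scan_then_filter(rows, predicate):
    """Baseline: fully decode every record, then filter afterwards."""
    materialized = [dict(r) for r in rows]          # full decode of every record
    return [r for r in materialized if predicate(r)]

def scan_with_pushdown(rows, predicate):
    """Pushdown: evaluate the predicate during the scan, decode survivors only."""
    return [dict(r) for r in rows if predicate(r)]

rows = [{"id": i, "value": i * 10} for i in range(1000)]
pred = lambda r: r["value"] < 50                    # LessThan(value, 50): one side a literal

# Both produce the same answer; pushdown just avoids decoding non-matches.
assert scan_then_filter(rows, pred) == scan_with_pushdown(rows, pred)
```

This matches the benchmark intuition: the speedup grows when most records fail the predicate, and shrinks (or reverses) when most records pass.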
      
      Author: Andre Schumacher <andre.schumacher@iki.fi>
      
      Closes #511 from AndreSchumacher/parquet_filter and squashes the following commits:
      
      16bfe83 [Andre Schumacher] Removing leftovers from merge during rebase
      7b304ca [Andre Schumacher] Fixing formatting
      c36d5cb [Andre Schumacher] Scalastyle
      3da98db [Andre Schumacher] Second round of review feedback
      7a78265 [Andre Schumacher] Fixing broken formatting in ParquetFilter
      a86553b [Andre Schumacher] First round of code review feedback
      b0f7806 [Andre Schumacher] Optimizing imports in ParquetTestData
      85fea2d [Andre Schumacher] Adding SparkConf setting to disable filter predicate pushdown
      f0ad3cf [Andre Schumacher] Undoing changes not needed for this PR
      210e9cb [Andre Schumacher] Adding disjunctive filter predicates
      a93a588 [Andre Schumacher] Adding unit test for filtering
      6d22666 [Andre Schumacher] Extending ParquetFilters
      93e8192 [Andre Schumacher] First commit Parquet record filtering
      40d6acd6
    • Michael Armbrust's avatar
      [SQL] Implement between in hql · 032d6632
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #804 from marmbrus/between and squashes the following commits:
      
      ae24672 [Michael Armbrust] add golden answer.
      d9997ef [Michael Armbrust] Implement between in hql.
      9bd4433 [Michael Armbrust] Better error on parse failures.
      032d6632
    • Zhen Peng's avatar
      bugfix: overflow of graphx Edge compare function · fa6de408
      Zhen Peng authored
      Author: Zhen Peng <zhenpeng01@baidu.com>
      
      Closes #769 from zhpengg/bugfix-graphx-edge-compare and squashes the following commits:
      
      8a978ff [Zhen Peng] add ut for graphx Edge.lexicographicOrdering.compare
      413c258 [Zhen Peng] there maybe a overflow for two Long's substraction
      fa6de408
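The overflow this commit fixes is easy to reproduce. A Python sketch (Python ints don't overflow, so a helper wraps values to signed 64 bits the way a JVM Long would):

```python
def to_int64(x):
    """Wrap a Python int to a signed 64-bit value, like a JVM Long."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

# Comparing by subtraction gives the WRONG sign for widely separated values:
a, b = -2, 2**63 - 1            # b is Long.MaxValue
assert a < b
assert to_int64(a - b) > 0      # subtraction wraps around and claims a > b

# The safe fix: an explicit three-way comparison instead of subtraction.
def compare(a, b):
    return -1 if a < b else (1 if a > b else 0)

assert compare(a, b) == -1
```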
    • Patrick Wendell's avatar
      HOTFIX: Duplication of hbase version · e304eb99
      Patrick Wendell authored
      e304eb99
    • Patrick Wendell's avatar
      SPARK-1862: Support for MapR in the Maven build. · 17702e28
      Patrick Wendell authored
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #803 from pwendell/mapr-support and squashes the following commits:
      
      8df60e4 [Patrick Wendell] SPARK-1862: Support for MapR in the Maven build.
      17702e28
    • Cheng Hao's avatar
      [Spark-1461] Deferred Expression Evaluation (short-circuit evaluation) · a20fea98
      Cheng Hao authored
      This patch unifies the foldable & nullable interface for Expression.
      1) Non-deterministic UDFs (like Rand()) cannot be folded.
      2) Short-circuiting significantly improves performance in expression evaluation; however, stateful UDFs must not be skipped during short-circuit evaluation (e.g., in the expression col1 > 0 and row_sequence() < 1000, row_sequence() cannot be skipped even if col1 > 0 is false).
      
      I borrowed the concept of a DeferredObject from Hive, which has two child classes (EagerResult / DeferredResult): the former triggers evaluation before it is created, while the latter triggers evaluation the first time its get() method is called.
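A minimal Python sketch of the EagerResult / DeferredResult distinction described above (class names follow the description; this is not Spark's or Hive's actual code):

```python
class EagerResult:
    """Evaluated before construction; get() just returns the stored value."""
    def __init__(self, value):
        self.value = value
    def get(self):
        return self.value

class DeferredResult:
    """Evaluation is deferred until get() is first called, then cached."""
    def __init__(self, thunk):
        self.thunk = thunk
        self.evaluated = False
        self.value = None
    def get(self):
        if not self.evaluated:
            self.value = self.thunk()
            self.evaluated = True
        return self.value

calls = []
def stateful_udf():
    calls.append(1)              # side effect: records each evaluation
    return 42

deferred = DeferredResult(stateful_udf)
assert calls == []               # nothing evaluated until get() is called
assert deferred.get() == 42
deferred.get()
assert len(calls) == 1           # evaluated exactly once, even on repeated get()
```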
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #446 from chenghao-intel/expression_deferred_evaluation and squashes the following commits:
      
      d2729de [Cheng Hao] Fix the codestyle issues
      a08f09c [Cheng Hao] fix bug in or/and short-circuit evaluation
      af2236b [Cheng Hao] revert the short-circuit expression evaluation for IF
      b7861d2 [Cheng Hao] Add Support for Deferred Expression Evaluation
      a20fea98
  2. May 15, 2014
    • Aaron Davidson's avatar
      SPARK-1860: Do not cleanup application work/ directories by default · bb98ecaf
      Aaron Davidson authored
      This causes an unrecoverable error for applications that run for longer
      than 7 days and have jars added to the SparkContext, as the jars are cleaned up
      even though the application is still running.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #800 from aarondav/shitty-defaults and squashes the following commits:
      
      a573fbb [Aaron Davidson] SPARK-1860: Do not cleanup application work/ directories by default
      bb98ecaf
    • Huajian Mao's avatar
      Typos in Spark · 94c51396
      Huajian Mao authored
      Author: Huajian Mao <huajianmao@gmail.com>
      
      Closes #798 from huajianmao/patch-1 and squashes the following commits:
      
      208a454 [Huajian Mao] A typo in Task
      1b515af [Huajian Mao] A typo in the message
      94c51396
    • Prashant Sharma's avatar
      Fixes a misplaced comment. · e1e3416c
      Prashant Sharma authored
      Fixes a misplaced comment from #785.
      
      @pwendell
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #788 from ScrapCodes/patch-1 and squashes the following commits:
      
      3ef6a69 [Prashant Sharma] Update package-info.java
      67d9461 [Prashant Sharma] Update package-info.java
      e1e3416c
    • Michael Armbrust's avatar
      [SQL] Fix tiny/small ints from HiveMetastore. · a4aafe5f
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #797 from marmbrus/smallInt and squashes the following commits:
      
      2db9dae [Michael Armbrust] Fix tiny/small ints from HiveMetastore.
      a4aafe5f
    • Stevo Slavić's avatar
      SPARK-1803 Replaced colon in filenames with a dash · e66e31be
      Stevo Slavić authored
      This patch replaces colon in several filenames with dash to make these filenames Windows compatible.
      
      Author: Stevo Slavić <sslavic@gmail.com>
      Author: Stevo Slavic <sslavic@gmail.com>
      
      Closes #739 from sslavic/SPARK-1803 and squashes the following commits:
      
      3ec66eb [Stevo Slavic] Removed extra empty line which was causing test to fail
      b967cc3 [Stevo Slavić] Aligned tests and names of test resources
      2b12776 [Stevo Slavić] Fixed a typo in file name
      1c5dfff [Stevo Slavić] Replaced colon in file name with dash
      8f5bf7f [Stevo Slavić] Replaced colon in file name with dash
      c5b5083 [Stevo Slavić] Replaced colon in file name with dash
      a49801f [Stevo Slavić] Replaced colon in file name with dash
      401d99e [Stevo Slavić] Replaced colon in file name with dash
      40a9621 [Stevo Slavić] Replaced colon in file name with dash
      4774580 [Stevo Slavić] Replaced colon in file name with dash
      004f8bb [Stevo Slavić] Replaced colon in file name with dash
      d6a3e2c [Stevo Slavić] Replaced colon in file name with dash
      b585126 [Stevo Slavić] Replaced colon in file name with dash
      028e48a [Stevo Slavić] Replaced colon in file name with dash
      ece0507 [Stevo Slavić] Replaced colon in file name with dash
      84f5d2f [Stevo Slavić] Replaced colon in file name with dash
      2fc7854 [Stevo Slavić] Replaced colon in file name with dash
      9e1467d [Stevo Slavić] Replaced colon in file name with dash
      e66e31be
    • Sandy Ryza's avatar
      SPARK-1851. Upgrade Avro dependency to 1.7.6 so Spark can read Avro file... · 08e7606a
      Sandy Ryza authored
      ...s
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #795 from sryza/sandy-spark-1851 and squashes the following commits:
      
      79c8227 [Sandy Ryza] SPARK-1851. Upgrade Avro dependency to 1.7.6 so Spark can read Avro files
      08e7606a
    • Xiangrui Meng's avatar
      [SPARK-1741][MLLIB] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans · d52761d6
      Xiangrui Meng authored
      `model.predict` returns an RDD of a Scala primitive type (Int/Double), which is recognized as Object in Java. Adding predict(JavaRDD) could make life easier for Java users.
      
      Added tests for KMeans, LinearRegression, and NaiveBayes.
      
      Will update examples after https://github.com/apache/spark/pull/653 gets merged.
      
      cc: @srowen
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #670 from mengxr/predict-javardd and squashes the following commits:
      
      b77ccd8 [Xiangrui Meng] Merge branch 'master' into predict-javardd
      43caac9 [Xiangrui Meng] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans
      d52761d6
    • Takuya UESHIN's avatar
      [SPARK-1819] [SQL] Fix GetField.nullable. · 94c9d6f5
      Takuya UESHIN authored
      `GetField.nullable` should be `true` not only when `field.nullable` is `true` but also when `child.nullable` is `true`.
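The fix reduces to one line of logic: a GetField result can be null either because the field itself is nullable or because the child struct it is extracted from may itself be null. A sketch (hypothetical helper, not the actual Catalyst code):

```python
def get_field_nullable(child_nullable, field_nullable):
    # Nullable if EITHER the extracted field or the containing struct can be null.
    return child_nullable or field_nullable

assert get_field_nullable(child_nullable=True,  field_nullable=False) is True
assert get_field_nullable(child_nullable=False, field_nullable=True)  is True
assert get_field_nullable(child_nullable=False, field_nullable=False) is False
```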
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #757 from ueshin/issues/SPARK-1819 and squashes the following commits:
      
      8781a11 [Takuya UESHIN] Modify a test to use named parameters.
      5bfc77d [Takuya UESHIN] Fix GetField.nullable.
      94c9d6f5
    • Takuya UESHIN's avatar
      [SPARK-1845] [SQL] Use AllScalaRegistrar for SparkSqlSerializer to register serializers of ... · db8cc6f2
      Takuya UESHIN authored
      ...Scala collections.
      
      When I execute `orderBy` or `limit` on a `SchemaRDD` containing `ArrayType` or `MapType`, `SparkSqlSerializer` throws the following exception:
      
      ```
      com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.$colon$colon
      ```
      
      or
      
      ```
      com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.Vector
      ```
      
      or
      
      ```
      com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.HashMap$HashTrieMap
      ```
      
      and so on.
      
      This is because registrations of serializers for the concrete collection classes are missing in `SparkSqlSerializer`.
      I believe it should use `AllScalaRegistrar`.
      `AllScalaRegistrar` covers serializers for many concrete subclasses of `Seq` and `Map`, which back `ArrayType` and `MapType`.
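A toy Python model of why the unregistered classes fail: like Kryo's default instantiation strategy, the fallback path needs a no-arg constructor, which classes such as scala.collection.immutable.:: lack. All names here are illustrative, not Kryo's API:

```python
class Registry:
    """Maps a class to a custom deserializer; falls back to no-arg construction."""
    def __init__(self):
        self.deserializers = {}
    def register(self, cls, from_fields):
        self.deserializers[cls] = from_fields
    def roundtrip(self, obj):
        cls = type(obj)
        if cls in self.deserializers:
            return self.deserializers[cls](vars(obj))
        return cls()                     # default path: needs a no-arg constructor

class Cons:                              # stand-in for :: — no no-arg constructor
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

reg = Registry()
try:
    reg.roundtrip(Cons(1, None))         # fails, like "missing no-arg constructor"
    failed = False
except TypeError:
    failed = True
assert failed

reg.register(Cons, lambda d: Cons(**d))  # registering a serializer fixes it
assert reg.roundtrip(Cons(1, None)).head == 1
```

AllScalaRegistrar plays the role of that `register` call, in bulk, for the Scala collection classes.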
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #790 from ueshin/issues/SPARK-1845 and squashes the following commits:
      
      d1ed992 [Takuya UESHIN] Use AllScalaRegistrar for SparkSqlSerializer to register serializers of Scala collections.
      db8cc6f2
    • Andrew Ash's avatar
      SPARK-1846 Ignore logs directory in RAT checks · 3abe2b73
      Andrew Ash authored
      https://issues.apache.org/jira/browse/SPARK-1846
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #793 from ash211/SPARK-1846 and squashes the following commits:
      
      3f50db5 [Andrew Ash] SPARK-1846 Ignore logs directory in RAT checks
      3abe2b73
    • Patrick Wendell's avatar
      HOTFIX: Don't build Javadoc in Maven when creating releases. · 514157f2
      Patrick Wendell authored
      Because we've added java package descriptions in some packages that don't
      have any Java files, running the Javadoc target hits this issue:
      
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4492654
      
      To fix this I've simply removed the javadoc target when publishing
      releases.
      514157f2
    • witgo's avatar
      fix different versions of commons-lang dependency and apache/spark#746 addendum · bae07e36
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #754 from witgo/commons-lang and squashes the following commits:
      
      3ebab31 [witgo] merge master
      f3b8fa2 [witgo] merge master
      2083fae [witgo] repeat definition
      5599cdb [witgo] multiple version of sbt  dependency
      c1b66a1 [witgo] fix different versions of commons-lang dependency
      bae07e36
    • Prashant Sharma's avatar
      Package docs · 46324279
      Prashant Sharma authored
      This is a few changes based on the original patch by @scrapcodes.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #785 from pwendell/package-docs and squashes the following commits:
      
      c32b731 [Patrick Wendell] Changes based on Prashant's patch
      c0463d3 [Prashant Sharma] added eof new line
      ce8bf73 [Prashant Sharma] Added eof new line to all files.
      4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs
      46324279
    • Patrick Wendell's avatar
      Documentation: Encourage use of reduceByKey instead of groupByKey. · 21570b46
      Patrick Wendell authored
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #784 from pwendell/group-by-key and squashes the following commits:
      
      9b4505f [Patrick Wendell] Small fix
      6347924 [Patrick Wendell] Documentation: Encourage use of reduceByKey instead of groupByKey.
      21570b46
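A toy model of the reasoning behind this advice (plain Python, no Spark required): reduceByKey can combine values on the map side before the shuffle, so far fewer records cross the network, while groupByKey must ship every (key, value) pair.

```python
from collections import defaultdict

def map_side_combine(partition):
    """Local pre-aggregation, as reduceByKey does before shuffling."""
    acc = defaultdict(int)
    for k, v in partition:
        acc[k] += v
    return list(acc.items())

partitions = [[("a", 1)] * 1000 + [("b", 1)] * 500, [("a", 1)] * 300]

shuffled_group = sum(len(p) for p in partitions)                     # groupByKey path
shuffled_reduce = sum(len(map_side_combine(p)) for p in partitions)  # reduceByKey path
assert shuffled_group == 1800    # every record is shuffled
assert shuffled_reduce == 3      # only one combined record per key per partition
```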
  3. May 14, 2014
    • Matei Zaharia's avatar
      Add language tabs and Python version to interactive part of quick-start · f10de042
      Matei Zaharia authored
      This is an addition of some stuff that was missed in https://issues.apache.org/jira/browse/SPARK-1567. I've also updated the doc to show submitting the Python application with spark-submit.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #782 from mateiz/spark-1567-extra and squashes the following commits:
      
      6f8f2aa [Matei Zaharia] tweaks
      9ed9874 [Matei Zaharia] tweaks
      ae67c3e [Matei Zaharia] tweak
      b303ba3 [Matei Zaharia] tweak
      1433a4d [Matei Zaharia] Add language tabs and Python version to interactive part of quick-start guide
      f10de042
    • Tathagata Das's avatar
      [SPARK-1840] SparkListenerBus prints out scary error message when terminated normally · ad4e60ee
      Tathagata Das authored
      Running SparkPi example gave this error.
      ```
      Pi is roughly 3.14374
      14/05/14 18:16:19 ERROR Utils: Uncaught exception in thread SparkListenerBus
      scala.runtime.NonLocalReturnControl$mcV$sp
      ```
      This is due to the catch-all in the SparkListenerBus, which logged the control throwables used internally by Scala.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #783 from tdas/controlexception-fix and squashes the following commits:
      
      a466c8d [Tathagata Das] Ignored control exceptions when logging all exceptions.
      ad4e60ee
    • Chen Chao's avatar
      default task number misleading in several places · 2f639957
      Chen Chao authored
        private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
          new HashPartitioner(numPartitions)
        }
      
      This shows that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is decided by the config property spark.default.parallelism.
      
      The property "spark.default.parallelism" refers to https://github.com/apache/spark/pull/389
      
      Author: Chen Chao <crazyjvm@gmail.com>
      
      Closes #766 from CrazyJvm/patch-7 and squashes the following commits:
      
      0b7efba [Chen Chao] Update streaming-programming-guide.md
      cc5b66c [Chen Chao] default task number misleading in several places
      2f639957
    • wangfei's avatar
      [SPARK-1826] fix the head notation of package object dsl · 44165fc9
      wangfei authored
      Author: wangfei <scnbwf@yeah.net>
      
      Closes #765 from scwf/dslfix and squashes the following commits:
      
      d2d1a9d [wangfei] Update package.scala
      66ff53b [wangfei] fix the head notation of package object dsl
      44165fc9
    • andrewor14's avatar
      [Typo] propertes -> properties · 9ad096d5
      andrewor14 authored
      Author: andrewor14 <andrewor14@gmail.com>
      
      Closes #780 from andrewor14/submit-typo and squashes the following commits:
      
      e70e057 [andrewor14] propertes -> properties
      9ad096d5
    • Xiangrui Meng's avatar
      [SPARK-1696][MLLIB] use alpha in dense dspr · e3d72a74
      Xiangrui Meng authored
      It doesn't affect existing code because only `alpha = 1.0` is used in the code.
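For context, BLAS dspr performs a symmetric rank-1 update, A := alpha * x * xᵀ + A; the fix makes the dense branch honor alpha instead of implicitly assuming 1.0. A toy dense version (illustrative, not the actual MLlib code):

```python
def dspr(alpha, x, A):
    """Symmetric rank-1 update on the lower triangle: A += alpha * x * x^T."""
    n = len(x)
    for i in range(n):
        for j in range(i + 1):
            A[i][j] += alpha * x[i] * x[j]
    return A

A = [[0.0, 0.0], [0.0, 0.0]]
dspr(2.0, [1.0, 3.0], A)
assert A[0][0] == 2.0     # 2 * 1 * 1
assert A[1][0] == 6.0     # 2 * 3 * 1
assert A[1][1] == 18.0    # 2 * 3 * 3
```

With alpha hard-coded to 1.0, every entry above would be half its correct value whenever a caller passed alpha != 1.0.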
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #778 from mengxr/mllib-dspr-fix and squashes the following commits:
      
      a37402e [Xiangrui Meng] use alpha in dense dspr
      e3d72a74
    • Jacek Laskowski's avatar
      String interpolation + some other small changes · 601e3719
      Jacek Laskowski authored
      After having been invited to make the change in https://github.com/apache/spark/commit/6bee01dd04ef73c6b829110ebcdd622d521ea8ff#commitcomment-6284165 by @witgo.
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #748 from jaceklaskowski/sparkenv-string-interpolation and squashes the following commits:
      
      be6ebac [Jacek Laskowski] String interpolation + some other small changes
      601e3719
    • Xiangrui Meng's avatar
      [FIX] do not load defaults when testing SparkConf in pyspark · 94c6c06e
      Xiangrui Meng authored
      The default constructor loads default properties, which can fail the test.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #775 from mengxr/pyspark-conf-fix and squashes the following commits:
      
      83ef6c4 [Xiangrui Meng] do not load defaults when testing SparkConf in pyspark
      94c6c06e
    • Patrick Wendell's avatar
      SPARK-1833 - Have an empty SparkContext constructor. · 65533c7e
      Patrick Wendell authored
      This is nicer than relying on new SparkContext(new SparkConf())
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #774 from pwendell/spark-context and squashes the following commits:
      
      ef9f12f [Patrick Wendell] SPARK-1833 - Have an empty SparkContext constructor.
      65533c7e
    • Andrew Ash's avatar
      SPARK-1829 Sub-second durations shouldn't round to "0 s" · a3315d7f
      Andrew Ash authored
      Show "99 ms" for durations up to 99 ms.
      Show "0.1 s" through "0.9 s" for durations from 0.1 s up to 0.9 s.
      
      https://issues.apache.org/jira/browse/SPARK-1829
      
      Compare the first image to the second here: http://imgur.com/RaLEsSZ,7VTlgfo#0
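A minimal formatter matching the behavior described above (thresholds are assumed from the commit message; this is not the actual Spark Utils code):

```python
def format_duration(ms):
    if ms < 100:
        return f"{ms} ms"              # sub-0.1 s: keep millisecond precision
    if ms < 1000:
        return f"{ms / 1000:.1f} s"    # 0.1 s .. 0.9 s: one decimal, not "0 s"
    return f"{round(ms / 1000)} s"     # whole seconds beyond that

assert format_duration(99) == "99 ms"
assert format_duration(100) == "0.1 s"
assert format_duration(900) == "0.9 s"
```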
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #768 from ash211/spark-1829 and squashes the following commits:
      
      1c15b8e [Andrew Ash] SPARK-1829 Format sub-second durations more appropriately
      a3315d7f
    • witgo's avatar
      Fix: sbt test throws a java.lang.OutOfMemoryError: PermGen space · fde82c15
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #773 from witgo/sbt_javaOptions and squashes the following commits:
      
      26c7d38 [witgo] Improve sbt configuration
      fde82c15
    • Mark Hamstra's avatar
      [SPARK-1620] Handle uncaught exceptions in function run by Akka scheduler · 17f3075b
      Mark Hamstra authored
      If the intended behavior was that uncaught exceptions thrown in functions run by the Akka scheduler should end up in the default uncaught exception handler set in Executor, and if that behavior is in fact correct, then this is a way to accomplish it. I'm not certain, though, that we shouldn't handle uncaught exceptions from some of these scheduled functions differently.
      
      In any event, this PR covers all of the cases I comment on in [SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620).
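A sketch of the idea behind the commit's Utils.tryOrExit wrapper (the handler here just records the exception; in Spark it routes to the Executor's uncaught exception handler, which can terminate the process):

```python
handled = []

def uncaught_handler(exc):
    """Stand-in for the process-level handler; Spark's would exit the JVM."""
    handled.append(type(exc).__name__)

def try_or_exit(task):
    """Run a scheduled task; never let the scheduler silently swallow a failure."""
    try:
        task()
    except Exception as exc:
        uncaught_handler(exc)

def flaky_task():
    raise RuntimeError("boom")

try_or_exit(flaky_task)
assert handled == ["RuntimeError"]
```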
      
      Author: Mark Hamstra <markhamstra@gmail.com>
      
      Closes #622 from markhamstra/SPARK-1620 and squashes the following commits:
      
      071d193 [Mark Hamstra] refactored post-SPARK-1772
      1a6a35e [Mark Hamstra] another style fix
      d30eb94 [Mark Hamstra] scalastyle
      3573ecd [Mark Hamstra] Use wrapped try/catch in Utils.tryOrExit
      8fc0439 [Mark Hamstra] Make functions run by the Akka scheduler use Executor's UncaughtExceptionHandler
      17f3075b
    • Patrick Wendell's avatar
      SPARK-1828: Created forked version of hive-exec that doesn't bundle other dependencies · d58cb33f
      Patrick Wendell authored
      See https://issues.apache.org/jira/browse/SPARK-1828 for more information.
      
      This is being submitted to Jenkins for testing. The dependency won't fully
      propagate in Maven central for a few more hours.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #767 from pwendell/hive-shaded and squashes the following commits:
      
      ea10ac5 [Patrick Wendell] SPARK-1828: Created forked version of hive-exec that doesn't bundle other dependencies
      d58cb33f
    • Andrew Ash's avatar
      SPARK-1818 Freshen Mesos documentation · d1d41cce
      Andrew Ash authored
      Place more emphasis on using precompiled binary versions of Spark and Mesos
      instead of encouraging the reader to compile from source.
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #756 from ash211/spark-1818 and squashes the following commits:
      
      7ef3b33 [Andrew Ash] Brief explanation of the interactions between Spark and Mesos
      e7dea8e [Andrew Ash] Add troubleshooting and debugging section
      956362d [Andrew Ash] Don't need to pass spark.executor.uri into the spark shell
      de3353b [Andrew Ash] Wrap to 100char
      7ebf6ef [Andrew Ash] Polish on the section on Mesos Master URLs
      3dcc2c1 [Andrew Ash] Use --tgz parameter of make-distribution
      41b68ed [Andrew Ash] Period at end of sentence; formatting on :5050
      8bf2c53 [Andrew Ash] Update site.MESOS_VERSIOn to match /pom.xml
      74f2040 [Andrew Ash] SPARK-1818 Freshen Mesos documentation
      d1d41cce
    • Sean Owen's avatar
      SPARK-1827. LICENSE and NOTICE files need a refresh to contain transitive dependency info · 2e5a7cde
      Sean Owen authored
      LICENSE and NOTICE policy is explained here:
      
      http://www.apache.org/dev/licensing-howto.html
      http://www.apache.org/legal/3party.html
      
      This leads to the following changes.
      
      First, this change enables two extensions to maven-shade-plugin in assembly/ that will try to include and merge all NOTICE and LICENSE files. This can't hurt.
      
      This generates a consolidated NOTICE file that I manually added to NOTICE.
      
      Next, a list of all dependencies and their licenses was generated:
      `mvn ... license:aggregate-add-third-party`
      to create: `target/generated-sources/license/THIRD-PARTY.txt`
      
      Each dependency is listed with one or more licenses; I determined the most compatible license for each when there was more than one.
      
      For "unknown" license dependencies, I manually evaluated their licenses. Many are actually Apache projects or components of projects already covered. The only non-trivial one was Colt, which has its own (compatible) license.
      
      I ignored Apache-licensed and public domain dependencies as these require no further action (beyond NOTICE above).
      
      BSD and MIT licenses (permissive Category A licenses) are evidently supposed to be mentioned in LICENSE, so I added a section with output from the THIRD-PARTY.txt file appropriately.
      
      Everything else (Category B licenses) is evidently supposed to be mentioned in NOTICE; I did the same there.
      
      LICENSE contained some license statements for source code that is redistributed. I left this as I think that is the right place to put it.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #770 from srowen/SPARK-1827 and squashes the following commits:
      
      a764504 [Sean Owen] Add LICENSE and NOTICE info for all transitive dependencies as of 1.0
      2e5a7cde
    • Tathagata Das's avatar
      Fixed streaming examples docs to use run-example instead of spark-submit · 68f28dab
      Tathagata Das authored
      Pretty self-explanatory
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #722 from tdas/example-fix and squashes the following commits:
      
      7839979 [Tathagata Das] Minor changes.
      0673441 [Tathagata Das] Fixed java docs of java streaming example
      e687123 [Tathagata Das] Fixed scala style errors.
      9b8d112 [Tathagata Das] Fixed streaming examples docs to use run-example instead of spark-submit.
      68f28dab
    • Andrew Or's avatar
      [SPARK-1769] Executor loss causes NPE race condition · 69f75022
      Andrew Or authored
      This PR replaces the Schedulable data structures in Pool.scala with thread-safe ones from java. Note that Scala's `with SynchronizedBuffer` trait is soon to be deprecated in 2.11 because it is ["inherently unreliable"](http://www.scala-lang.org/api/2.11.0/index.html#scala.collection.mutable.SynchronizedBuffer). We should slowly drift away from `SynchronizedBuffer` in other places too.
      
      Note that this PR introduces an API-breaking change: `sc.getAllPools` now returns an Array rather than an ArrayBuffer. This is because we want the method to return an immutable copy; returning the mutable original could confuse users who try to modify their copy, since such modifications would have no effect on the underlying data structure.
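The immutable-copy rationale can be sketched in a toy Python model (names are illustrative, not Spark's API): callers get a snapshot, so later scheduler mutations never leak into it and caller "edits" cannot appear to work.

```python
pools = ["production", "adhoc"]      # internal, mutable scheduler state

def get_all_pools():
    return tuple(pools)              # immutable snapshot, analogous to the Array copy

snapshot = get_all_pools()
pools.append("etl")                  # scheduler state changes afterwards

assert snapshot == ("production", "adhoc")   # snapshot is unaffected
assert "etl" in get_all_pools()              # a fresh call sees the new state
```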
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #762 from andrewor14/pool-npe and squashes the following commits:
      
      383e739 [Andrew Or] JavaConverters -> JavaConversions
      3f32981 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
      769be19 [Andrew Or] Assorted minor changes
      2189247 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
      05ad9e9 [Andrew Or] Fix test - contains is not the same as containsKey
      0921ea0 [Andrew Or] var -> val
      07d720c [Andrew Or] Synchronize Schedulable data structures
      69f75022