  1. May 07, 2014
    • SPARK-1569 Spark on Yarn, authentication broken by pr299 · 4bec84b6
      Thomas Graves authored
      Pass the configs as Java options, since the executor needs to know before it registers whether to create the connection using authentication or not. We could see about passing only the authentication configs, but for now I just had it pass them all.
      
      I also updated it to use a list to construct the command, making it the same as ClientBase and avoiding any issues with spaces.
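
      As a rough illustration of the two ideas above (a minimal sketch, not the actual ExecutorRunnable code; the backend class name is only an example), the spark.auth* settings become -D Java options and the launch command is assembled as a list of arguments rather than one big string:

      ```scala
      import org.apache.spark.SparkConf

      val conf = new SparkConf()

      // Forward authentication-related settings to the executor JVM as -D options,
      // so they are known before the executor registers.
      val authJavaOpts: Seq[String] = conf.getAll.toSeq
        .filter { case (k, _) => k.startsWith("spark.auth") }
        .map { case (k, v) => s"-D$k=$v" }

      // One list element per argument sidesteps quoting problems with values
      // that contain spaces.
      val command: Seq[String] =
        Seq("java", "-server") ++ authJavaOpts ++
          Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend")
      ```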
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #649 from tgravescs/SPARK-1569 and squashes the following commits:
      
      0178ab8 [Thomas Graves] add akka settings
      22a8735 [Thomas Graves] Change to only pass spark.auth* configs
      8ccc1d4 [Thomas Graves] SPARK-1569 Spark on Yarn, authentication broken
    • [SPARK-1688] Propagate PySpark worker stderr to driver · 52008722
      Andrew Or authored
      When at least one of the following conditions is true, PySpark cannot be loaded:
      
      1. PYTHONPATH is not set
      2. PYTHONPATH does not contain the python directory (or jar, in the case of YARN)
      3. The jar does not contain pyspark files (YARN)
      4. The jar does not contain py4j files (YARN)
      
      However, we currently throw the same generic `java.io.EOFException` for all of the above cases when trying to read from the Python daemon's output. This message is super unhelpful.
      
      This PR includes the python stderr and the PYTHONPATH in the exception propagated to the driver. Now, the exception message looks something like:
      
      ```
      Error from python worker:
        : No module named pyspark
      PYTHONPATH was:
        /path/to/spark/python:/path/to/some/jar
      java.io.EOFException
        <stack trace>
      ```
      
      whereas before it was just
      
      ```
      java.io.EOFException
        <stack trace>
      ```
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #603 from andrewor14/pyspark-exception and squashes the following commits:
      
      10d65d3 [Andrew Or] Throwable -> Exception, worker -> daemon
      862d1d7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
      a5ed798 [Andrew Or] Use block string and interpolation instead of var (minor)
      cc09c45 [Andrew Or] Account for the fact that the python daemon may not have terminated yet
      444f019 [Andrew Or] Use the new RedirectThread + include system PYTHONPATH
      aab00ae [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
      0cc2402 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
      783efe2 [Andrew Or] Make python daemon stderr indentation consistent
      9524172 [Andrew Or] Avoid potential NPE / error stream contention + Move things around
      29f9688 [Andrew Or] Add back original exception type
      e92d36b [Andrew Or] Include python worker stderr in the exception propagated to the driver
      7c69360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
      cdbc185 [Andrew Or] Fix python attribute not found exception when PYTHONPATH is not set
      dcc0353 [Andrew Or] Check both python and system environment variables for PYTHONPATH
      6c09c21 [Andrew Or] Validate PYTHONPATH and PySpark modules before starting python workers
    • Typo fix: fetchting -> fetching · d00981a9
      Andrew Ash authored
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #680 from ash211/patch-3 and squashes the following commits:
      
      9ce3746 [Andrew Ash] Typo fix: fetchting -> fetching
    • Nicer logging for SecurityManager startup · 7f6f4a10
      Andrew Ash authored
      Happy to open a JIRA ticket if you'd like to track this there.
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #678 from ash211/SecurityManagerLogging and squashes the following commits:
      
      2aa0b7a [Andrew Ash] Nicer logging for SecurityManager startup
    • [SQL] Fix Performance Issue in data type casting · ca431868
      Cheng Hao authored
      Using a lazy val instead of a function in the Cast class improved performance by nearly 2x in my local micro-benchmark.
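
      A schematic sketch of the pattern (simplified; not Catalyst's actual Cast class): a `def` rebuilds the conversion function on every evaluation, while a `lazy val` builds it once per instance and reuses it for every row.

      ```scala
      // Toy stand-in for a Cast-like expression; names are illustrative only.
      class CastLike {
        // Rebuilt on every call: each evaluation allocates a fresh closure.
        def castFnPerCall: Any => Double = (v: Any) => v.toString.toDouble

        // Built once on first use, then cached for all later evaluations.
        lazy val castFn: Any => Double = (v: Any) => v.toString.toDouble

        def eval(input: Any): Double = castFn(input)
      }
      ```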
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #679 from chenghao-intel/fix_type_casting and squashes the following commits:
      
      71b0902 [Cheng Hao] using lazy val object instead of function for data type casting
    • SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions · 3308722c
      Aaron Davidson authored
      This patch includes several cleanups to PythonRDD, focused around fixing [SPARK-1579](https://issues.apache.org/jira/browse/SPARK-1579) cleanly. Listed in order of approximate importance:
      
      - The Python daemon waits for Spark to close the socket before exiting,
        in order to avoid causing spurious IOExceptions in Spark's
        `PythonRDD::WriterThread`.
      - Removes the Python Monitor Thread, which polled for task cancellations
        in order to kill the Python worker. Instead, we do this in the
        onCompleteCallback, since this is guaranteed to be called during
        cancellation.
      - Adds a "completed" variable to TaskContext to avoid the issue noted in
        [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), where onCompleteCallbacks may be execution-order dependent.
        Along with this, I removed the "context.interrupted = true" flag in
        the onCompleteCallback.
      - Extracts PythonRDD::WriterThread to its own class.
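
      A minimal sketch of the "completed" flag idea (not the real TaskContext; names and structure are simplified): callbacks can consult the flag directly instead of depending on the order in which other callbacks happen to run.

      ```scala
      class SimpleTaskContext {
        // Set before callbacks run, so every callback observes completed == true
        // regardless of its position in the callback list.
        @volatile var completed: Boolean = false

        private var onCompleteCallbacks: List[() => Unit] = Nil

        def addOnCompleteCallback(f: () => Unit): Unit = {
          onCompleteCallbacks = f :: onCompleteCallbacks
        }

        def markTaskCompleted(): Unit = {
          completed = true
          onCompleteCallbacks.foreach(f => f())
        }
      }
      ```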
      
      Since this patch provides an alternative solution to [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), I did test it with
      
      ```
      sc.textFile("latlon.tsv").take(5)
      ```
      
      many times without error.
      
      Additionally, in order to test the unswallowed exceptions, I performed
      
      ```
      sc.textFile("s3n://<big file>").count()
      ```
      
      and cut my internet connection during execution. Prior to this patch, we got the unhelpful "stdin writer exited early" message. Now, the SocketExceptions are propagated through Spark to the user, and we get proper (though unsuccessful) task retries.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #640 from aarondav/pyspark-io and squashes the following commits:
      
      b391ff8 [Aaron Davidson] Detect "clean socket shutdowns" and stop waiting on the socket
      c0c49da [Aaron Davidson] SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions
    • [SPARK-1460] Returning SchemaRDD instead of normal RDD on Set operations... · 967635a2
      Kan Zhang authored
      ... that do not change schema
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #448 from kanzhang/SPARK-1460 and squashes the following commits:
      
      111e388 [Kan Zhang] silence MiMa errors in EdgeRDD and VertexRDD
      91dc787 [Kan Zhang] Taking into account newly added Ordering param
      79ed52a [Kan Zhang] [SPARK-1460] Returning SchemaRDD on Set operations that do not change schema
    • [WIP][Spark-SQL] Optimize the Constant Folding for Expression · 3eb53bd5
      Cheng Hao authored
      Currently, expressions do not handle "constant null" well in constant folding;
      e.g. Sum(a, 0) actually always produces Literal(0, NumericType) at runtime.
      
      For example:
      ```
      explain select isnull(key+null)  from src;
      == Logical Plan ==
      Project [HiveGenericUdf#isnull((key#30 + CAST(null, IntegerType))) AS c_0#28]
       MetastoreRelation default, src, None
      
      == Optimized Logical Plan ==
      Project [true AS c_0#28]
       MetastoreRelation default, src, None
      
      == Physical Plan ==
      Project [true AS c_0#28]
       HiveTableScan [], (MetastoreRelation default, src, None), None
      ```
      
      I've created a new optimization rule called NullPropagation for this kind of constant folding.
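
      To make the idea concrete, here is a schematic null-propagation pass over a toy expression tree (real Catalyst expressions and the actual NullPropagation rule are more general; these case classes are illustrative only):

      ```scala
      sealed trait Expr
      case class Literal(value: Any) extends Expr
      case class Add(left: Expr, right: Expr) extends Expr
      case class IsNull(child: Expr) extends Expr

      // Fold children first, then rewrite expressions whose result is known
      // as soon as a constant null appears.
      def foldNulls(e: Expr): Expr = e match {
        case Add(l, r) =>
          (foldNulls(l), foldNulls(r)) match {
            case (Literal(null), _) | (_, Literal(null)) => Literal(null) // null + x => null
            case (fl, fr)                                => Add(fl, fr)
          }
        case IsNull(child) =>
          foldNulls(child) match {
            case Literal(null) => Literal(true)
            case Literal(_)    => Literal(false)
            case folded        => IsNull(folded)
          }
        case other => other
      }

      // Mirrors the explain output above: isnull(key + null) folds to true.
      assert(foldNulls(IsNull(Add(Literal(1), Literal(null)))) == Literal(true))
      ```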
      
      Author: Cheng Hao <hao.cheng@intel.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #482 from chenghao-intel/optimize_constant_folding and squashes the following commits:
      
      2f14b50 [Cheng Hao] Fix code style issues
      68b9fad [Cheng Hao] Remove the Literal pattern matching for NullPropagation
      29c8166 [Cheng Hao] Update the code for feedback of code review
      50444cc [Cheng Hao] Remove the unnecessary null checking
      80f9f18 [Cheng Hao] Update the UnitTest for aggregation constant folding
      27ea3d7 [Cheng Hao] Fix Constant Folding Bugs & Add More Unittests
      b28e03a [Cheng Hao] Merge pull request #1 from marmbrus/pr/482
      9ccefdb [Michael Armbrust] Add tests for optimized expression evaluation.
      543ef9d [Cheng Hao] fix code style issues
      9cf0396 [Cheng Hao] update code according to the code review comment
      536c005 [Cheng Hao] Add Exceptional case for constant folding
      3c045c7 [Cheng Hao] Optimize the Constant Folding by adding more rules
      2645d4f [Cheng Hao] Constant Folding(null propagation)
    • SPARK-1746: Support setting SPARK_JAVA_OPTS on executors for backwards compatibility · 913a0a9c
      Patrick Wendell authored
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #676 from pwendell/worker-opts and squashes the following commits:
      
      54456c4 [Patrick Wendell] SPARK-1746: Support setting SPARK_JAVA_OPTS on executors for backwards compatibility
  2. May 06, 2014
    • [HOTFIX] SPARK-1637: There are some Streaming examples added after the PR #571 was last updated. · fdae095d
      Sandeep authored
      This resulted in compilation errors.
      cc @mateiz: the project is not compiling currently.
      
      Author: Sandeep <sandeep@techaddict.me>
      
      Closes #673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits:
      
      b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors.
    • Proposal: clarify Scala programming guide on caching ... · 48ba3b8c
      Ethan Jewett authored
      ... with regards to saved map output. Wording taken partially from Matei Zaharia's email to the Spark user list. http://apache-spark-user-list.1001560.n3.nabble.com/performance-improvement-on-second-operation-without-caching-td5227.html
      
      Author: Ethan Jewett <esjewett@gmail.com>
      
      Closes #668 from esjewett/Doc-update and squashes the following commits:
      
      11793ce [Ethan Jewett] Update based on feedback
      171e670 [Ethan Jewett] Clarify Scala programming guide on caching ...
    • SPARK-1727. Correct small compile errors, typos, and markdown issues in (primarily) MLlib docs · 25ad8f93
      Sean Owen authored
      While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors, and some typos. This led to finding and fixing a few similar items in other docs.
      
      Then in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #653 from srowen/SPARK-1727 and squashes the following commits:
      
      6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count
      8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output)
      99966a9 [Sean Owen] Update issue tracker URL in docs
      23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak)
      8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs
    • SPARK-1637: Clean up examples for 1.0 · a000b5c3
      Sandeep authored
      - [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
      - [x] Move Python examples into examples/src/main/python
      - [x] Update docs to reflect these changes
      
      Author: Sandeep <sandeep@techaddict.me>
      
      This patch had conflicts when merged, resolved by
      Committer: Matei Zaharia <matei@databricks.com>
      
      Closes #571 from techaddict/SPARK-1637 and squashes the following commits:
      
      47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
      8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
      5f96121 [Sandeep] Move Python examples into examples/src/main/python
      0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
    • SPARK-1737: Warn rather than fail when Java 7+ is used to create distributions · 39b8b148
      Patrick Wendell authored
      Also moves a few lines of code around in make-distribution.sh.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #669 from pwendell/make-distribution and squashes the following commits:
      
      8bfac49 [Patrick Wendell] Small fix
      46918ec [Patrick Wendell] SPARK-1737: Warn rather than fail when Java 7+ is used to create distributions.
    • [SPARK-1549] Add Python support to spark-submit · 951a5d93
      Matei Zaharia authored
      This PR updates spark-submit to allow submitting Python scripts (currently only with deploy-mode=client, but that's all that was supported before) and updates the PySpark code to properly find various paths, etc. One significant change is that we assume we can always find the Python files either from the Spark assembly JAR (which will happen with the Maven assembly build in make-distribution.sh) or from SPARK_HOME (which will exist in local mode even if you use sbt assembly, and should be enough for testing). This means we no longer need a weird hack to modify the environment for YARN.
      
      This patch also updates the Python worker manager to run python with -u, which means unbuffered output (it is sent to our logs right away instead of some time after it was written); this should simplify debugging.
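
      A minimal sketch of launching a worker with unbuffered output (the script path and environment handling here are placeholders, not the actual PythonWorkerFactory code):

      ```scala
      import scala.collection.JavaConverters._

      // `-u` makes the interpreter flush stdout/stderr immediately, so worker
      // output reaches Spark's logs without buffering delays.
      val pb = new ProcessBuilder(Seq("python", "-u", "pyspark/worker.py").asJava)
      pb.environment().put("PYTHONPATH", sys.env.getOrElse("PYTHONPATH", ""))
      val worker = pb.start()
      ```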
      
      In addition, it fixes https://issues.apache.org/jira/browse/SPARK-1709, setting the main class from a JAR's Main-Class attribute if not specified by the user, and fixes a few help strings and style issues in spark-submit.
      
      In the future we may want to make the `pyspark` shell use spark-submit as well, but it seems unnecessary for 1.0.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #664 from mateiz/py-submit and squashes the following commits:
      
      15e9669 [Matei Zaharia] Fix some uses of path.separator property
      051278c [Matei Zaharia] Small style fixes
      0afe886 [Matei Zaharia] Add license headers
      4650412 [Matei Zaharia] Add pyFiles to PYTHONPATH in executors, remove old YARN stuff, add tests
      15f8e1e [Matei Zaharia] Set PYTHONPATH in PythonWorkerFactory in case it wasn't set from outside
      47c0655 [Matei Zaharia] More work to make spark-submit work with Python:
      d4375bd [Matei Zaharia] Clean up description of spark-submit args a bit and add Python ones
    • SPARK-1734: spark-submit throws an exception: Exception in thread "main"... · ec09acdd
      witgo authored
      ... java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory
      
      Author: witgo <witgo@qq.com>
      
      Closes #665 from witgo/SPARK-1734 and squashes the following commits:
      
      cacf238 [witgo] SPARK-1734: spark-submit throws an exception: Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory
    • [SPARK-1685] Cancel retryTimer on restart of Worker or AppClient · fbfe69de
      Mark Hamstra authored
      See https://issues.apache.org/jira/browse/SPARK-1685 for a more complete description, but in essence: If the Worker or AppClient actor restarts before successfully registering with Master, multiple retryTimers will be running, which will lead to less than the full number of registration retries being attempted before the new actor is forced to give up.
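
      The shape of the fix, as a schematic Akka sketch (simplified from the actual Worker/AppClient code; the message and intervals are placeholders): keep a handle to the scheduled retry and cancel it when the actor restarts, so there is never more than one live timer.

      ```scala
      import akka.actor.{Actor, Cancellable}
      import scala.concurrent.duration._

      class RegisteringActor extends Actor {
        import context.dispatcher

        private var registrationRetryTimer: Option[Cancellable] = None

        override def preStart(): Unit = {
          registrationRetryTimer = Some(
            context.system.scheduler.schedule(1.second, 5.seconds, self, "retry-registration"))
        }

        // Without this, a restarted actor would start a second timer while the
        // first kept firing, burning through the retry budget twice as fast.
        override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
          registrationRetryTimer.foreach(_.cancel())
          super.preRestart(reason, message)
        }

        def receive = {
          case "retry-registration" => // re-send the registration message here
        }
      }
      ```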
      
      Author: Mark Hamstra <markhamstra@gmail.com>
      
      Closes #602 from markhamstra/SPARK-1685 and squashes the following commits:
      
      11cc088 [Mark Hamstra] retryTimer -> registrationRetryTimer
      69c348c [Mark Hamstra] Cancel retryTimer on restart of Worker or AppClient
    • Fix two download suggestions in the docs: · 7b978c1a
      Patrick Wendell authored
      1) On the quick start page provide a direct link to the downloads (suggested by @pbailis).
      2) On the index page, don't suggest users always have to build Spark, since many won't.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #662 from pwendell/quick-start and squashes the following commits:
      
      0622f27 [Patrick Wendell] Fix two download suggestions in the docs:
    • SPARK-1474: Spark on yarn assembly doesn't include AmIpFilter · 1e829905
      Thomas Graves authored
      We use org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter in Spark on YARN but did not include it in the assembly jar.
      
      I tested this on a YARN cluster by removing the YARN jars from the classpath, and Spark runs fine now.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #406 from tgravescs/SPARK-1474 and squashes the following commits:
      
      1548bf9 [Thomas Graves] SPARK-1474: Spark on yarn assembly doesn't include org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
    • Update OpenHashSet.scala · 0a5a4681
      ArcherShao authored
      Fix an incorrect comment on the function addWithoutResize.
      
      Author: ArcherShao <ArcherShao@users.noreply.github.com>
      
      Closes #667 from ArcherShao/patch-3 and squashes the following commits:
      
      a607358 [ArcherShao] Update OpenHashSet.scala
    • [SQL] SPARK-1732 - Support for null primitive values. · 3c64750b
      Michael Armbrust authored
      I also removed a println that I bumped into.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #658 from marmbrus/nullPrimitives and squashes the following commits:
      
      a3ec4f3 [Michael Armbrust] Remove println.
      695606b [Michael Armbrust] Support for null primitives when using scala and java reflection.
    • [SPARK-1735] Add the missing special profiles to make-distribution.sh · a2262cdb
      Andrew Or authored
      73b0cbcc introduced a few special profiles that are not covered in `make-distribution.sh`. This affects hadoop versions 2.2.x, 2.3.x, and 2.4.x. Without these special profiles, a Java version error for protobuf is thrown at run time.
      
      I took the opportunity to rewrite the way we construct the maven command. Previously, the only hadoop version that triggered the `yarn-alpha` profile was 0.23.x, which was inconsistent with the [docs](https://github.com/apache/spark/blob/master/docs/building-with-maven.md). This is now generalized to hadoop versions from 0.23.x to 2.1.x.
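
      A schematic sketch of the version-to-profile mapping described above, transliterated to Scala for clarity (the real logic lives in the make-distribution.sh shell script, and this assumes YARN support is requested):

      ```scala
      def yarnProfiles(hadoopVersion: String): Seq[String] = {
        val special = Map("2.2" -> "-Phadoop-2.2", "2.3" -> "-Phadoop-2.3", "2.4" -> "-Phadoop-2.4")
        val majorMinor = hadoopVersion.split('.').take(2).mkString(".")
        special.get(majorMinor) match {
          case Some(profile) => Seq("-Pyarn", profile) // hadoop 2.2.x-2.4.x: stable YARN + special profile
          case None          => Seq("-Pyarn-alpha")    // hadoop 0.23.x-2.1.x: alpha YARN API
        }
      }
      ```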
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #660 from andrewor14/hadoop-distribution and squashes the following commits:
      
      6740126 [Andrew Or] Generalize the yarn profile to hadoop versions 2.2+
      88f192d [Andrew Or] Add the required special profiles to make-distribution.sh
  3. May 05, 2014
    • [SPARK-1678][SPARK-1679] In-memory compression bug fix and made compression... · 6d721c5f
      Cheng Lian authored
      [SPARK-1678][SPARK-1679] In-memory compression bug fix and made compression configurable, disabled by default
      
      In-memory compression is now configurable in `SparkConf` by the `spark.sql.inMemoryCompression.enabled` property, and is disabled by default.
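
      A usage sketch for the property named above (assuming a standard SparkConf; the value shown simply flips the off-by-default setting on):

      ```scala
      import org.apache.spark.SparkConf

      // In-memory column compression is disabled by default; opt in explicitly.
      val conf = new SparkConf().set("spark.sql.inMemoryCompression.enabled", "true")
      ```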
      
      To help code review, the bug fix is in [the first commit](https://github.com/liancheng/spark/commit/d537a367edf0bf24d0b925cc58b21d805ccbc11f), and the compression configuration is in [the second one](https://github.com/liancheng/spark/commit/4ce09aa8aa820bbbbbaa0f3f084a6cff1d4e6195).
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #608 from liancheng/spark-1678 and squashes the following commits:
      
      66c3a8d [Cheng Lian] Renamed in-memory compression configuration key
      f8fb3a0 [Cheng Lian] Added assertion for testing .hasNext of various decoder
      4ce09aa [Cheng Lian] Made in-memory compression configurable via SparkConf
      d537a36 [Cheng Lian] Fixed SPARK-1678
    • [SPARK-1594][MLLIB] Cleaning up MLlib APIs and guide · 98750a74
      Xiangrui Meng authored
      Final pass before the v1.0 release.
      
      * Remove `VectorRDDs`
      * Move `BinaryClassificationMetrics` from `evaluation.binary` to `evaluation`
      * Change default value of `addIntercept` to false and allow adding an intercept in Ridge and Lasso.
      * Clean `DecisionTree` package doc and test suite.
      * Mark model constructors `private[spark]`
      * Rename `loadLibSVMData` to `loadLibSVMFile` and hide `LabelParser` from users.
      * Add `saveAsLibSVMFile` (see the usage sketch after this list).
      * Add `appendBias` to `MLUtils`.
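
      A short usage sketch of the renamed and added helpers (paths are placeholders; assumes a local SparkContext):

      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.mllib.util.MLUtils

      val sc = new SparkContext("local", "libsvm-io-example")

      // Load labeled points in LIBSVM format, then write them back out.
      val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")
      MLUtils.saveAsLibSVMFile(data, "out/libsvm")
      ```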
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #524 from mengxr/mllib-cleaning and squashes the following commits:
      
      295dc8b [Xiangrui Meng] update loadLibSVMFile doc
      1977ac1 [Xiangrui Meng] fix doc of appendBias
      649fcf0 [Xiangrui Meng] rename loadLibSVMData to loadLibSVMFile; hide LabelParser from user APIs
      54b812c [Xiangrui Meng] add appendBias
      a71e7d0 [Xiangrui Meng] add saveAsLibSVMFile
      d976295 [Xiangrui Meng] Merge branch 'master' into mllib-cleaning
      b7e5cec [Xiangrui Meng] remove some experimental annotations and make model constructors private[mllib]
      9b02b93 [Xiangrui Meng] minor code style update
      a593ddc [Xiangrui Meng] fix python tests
      fc28c18 [Xiangrui Meng] mark more classes experimental
      f6cbbff [Xiangrui Meng] fix Java tests
      0af70b0 [Xiangrui Meng] minor
      6e139ef [Xiangrui Meng] Merge branch 'master' into mllib-cleaning
      94e6dce [Xiangrui Meng] move BinaryLabelCounter and BinaryConfusionMatrixImpl to evaluation.binary
      df34907 [Xiangrui Meng] clean DecisionTreeSuite to use LocalSparkContext
      c81807f [Xiangrui Meng] set the default value of AddIntercept to false
      03389c0 [Xiangrui Meng] allow to add intercept in Ridge and Lasso
      c66c56f [Xiangrui Meng] move tree md to package object doc
      a2695df [Xiangrui Meng] update guide for BinaryClassificationMetrics
      9194f4c [Xiangrui Meng] move BinaryClassificationMetrics one level up
      1c1a0e3 [Xiangrui Meng] remove VectorRDDs because it only contains one function that is not necessary for us to maintain
    • Expose SparkListeners and relevant classes as DeveloperApi · ea10b312
      Andrew Or authored
      Hopefully this can go into 1.0, as a few people on the user list have asked for this.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #648 from andrewor14/expose-listeners and squashes the following commits:
      
      e45e1ef [Andrew Or] Add missing colons (minor)
      350d643 [Andrew Or] Expose SparkListeners and relevant classes as DeveloperApi
    • SPARK-1728. JavaRDDLike.mapPartitionsWithIndex requires ClassTag · 8e724dcb
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #657 from sryza/sandy-spark-1728 and squashes the following commits:
      
      4751443 [Sandy Ryza] SPARK-1728. JavaRDDLike.mapPartitionsWithIndex requires ClassTag
    • [SPARK-1681] Include datanucleus jars in Spark Hive distribution · cf0a8f02
      Andrew Or authored
      This copies the datanucleus jars over from `lib_managed` into `dist/lib`, if any. The `CLASSPATH` must also be updated to reflect this change.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #610 from andrewor14/hive-distribution and squashes the following commits:
      
      a4bc96f [Andrew Or] Rename search path in jar error check
      fa205e1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution
      7855f58 [Andrew Or] Have jar command respect JAVA_HOME + check for jar errors both cases
      c16bbfd [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution
      32f6826 [Andrew Or] Leave the double colons
      940a1bb [Andrew Or] Add back 2>/dev/null
      58357cc [Andrew Or] Include datanucleus jars in Spark distribution built with Hive support
    • [SPARK-1504], [SPARK-1505], [SPARK-1558] Updated Spark Streaming guide · a975a19f
      Tathagata Das authored
      - SPARK-1558: Updated custom receiver guide to match it with the new API
      - SPARK-1504: Added deployment and monitoring subsection to streaming
      - SPARK-1505: Added migration guide for migrating from 0.9.x and below to Spark 1.0
      - Updated various Java streaming examples to use JavaReceiverInputDStream to highlight the API change.
      - Removed the requirement for cleaner ttl from streaming guide
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #652 from tdas/doc-fix and squashes the following commits:
      
      cb4f4b7 [Tathagata Das] Possible fix for flaky graceful shutdown test.
      ab71f7f [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into doc-fix
      8d6ff9b [Tathagata Das] Added migration guide to Spark Streaming.
      7d171df [Tathagata Das] Added reference to JavaReceiverInputStream in examples and streaming guide.
      49edd7c [Tathagata Das] Change java doc links to use Java docs.
      11528d7 [Tathagata Das] Updated links on index page.
      ff80970 [Tathagata Das] More updates to streaming guide.
      4dc42e9 [Tathagata Das] Added monitoring and other documentation in the streaming guide.
      14c6564 [Tathagata Das] Updated custom receiver guide.
    • SPARK-1721: Reset the thread classLoader in the Mesos Executor · 3292e2a7
      Bouke van der Bijl authored
      This is because Mesos calls it with a different environment or something; the result is that the Spark jar is missing and it can't load classes.
      
      This fixes http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html
      
      I have no idea whether this is the right fix, I can only confirm that it fixes the issue for us.
      
      The `registered` method is called from mesos (https://github.com/apache/mesos/blob/765ff9bc2ac5a12d4362f8235b572a37d646390a/src/java/jni/org_apache_mesos_MesosExecutorDriver.cpp)
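
      A minimal sketch of the idea (the merged fix uses Utils.getContextOrSparkClassLoader at the class-loading sites, per the commit below; the function name here is hypothetical): the Mesos-created callback thread may carry a context classloader that cannot see the Spark jar, so point it at one that can before loading any Spark classes.

      ```scala
      def onMesosRegistered(): Unit = {
        // Reset the callback thread's context classloader before any
        // reflective class loading that needs the Spark jar.
        Thread.currentThread.setContextClassLoader(getClass.getClassLoader)
        // ... continue with normal executor registration ...
      }
      ```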
      
      I am unsure which commit caused this regression.
      
      Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
      
      Closes #620 from bouk/mesos-classloader-fix and squashes the following commits:
      
      c13eae0 [Bouke van der Bijl] Use getContextOrSparkClassLoader in SparkEnv and CompressionCodec
    • SPARK-1556. jets3t dep doesn't update properly with newer Hadoop versions · 73b0cbcc
      Sean Owen authored
      See related discussion at https://github.com/apache/spark/pull/468
      
      This PR may still overstep what you have in mind, but let me put it on the table to start. Besides fixing the issue, it has one substantive change, and that is to manage Hadoop-specific things only in Hadoop-related profiles. This does _not_ remove `yarn.version`.
      
      - Moves the YARN and Hadoop profiles together in pom.xml. Sorry that this makes the diff a little hard to grok but the changes are only as follows.
      - Removes `hadoop.major.version`
      - Introduce `hadoop-2.2` and `hadoop-2.3` profiles to control Hadoop-specific changes:
        - like the protobuf version issue - this was only 'solved' now by enabling YARN for 2.2+, which is really an orthogonal issue
        - like the jets3t version issue now
      - Hadoop profiles set an appropriate default `hadoop.version`, that can be overridden
      - _(YARN profiles in the parent now only exist to add the sub-module)_
      - Fixes the jets3t dependency issue
       - and makes it a runtime dependency
       - and centralizes config of this guy in the parent pom
      - Updates build docs
      - Updates SBT build too
        - and fixes a regex problem along the way
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #629 from srowen/SPARK-1556 and squashes the following commits:
      
      c3fa967 [Sean Owen] Fix hadoop-2.4 profile typo in doc
      a2105fd [Sean Owen] Add hadoop-2.4 profile and don't set hadoop.version in profiles
      274f4f9 [Sean Owen] Make jets3t a runtime dependency, and bring its exclusion up into parent config
      bbed826 [Sean Owen] Use jets3t 0.9.0 for Hadoop 2.3+ (and correct similar regex issue in SBT build)
      f21f356 [Sean Owen] Build changes to set up for jets3t fix
    • Updated doc for spark.closure.serializer to indicate only the Java serializer works. · f2eb070a
      Reynold Xin authored
      See discussion from http://apache-spark-developers-list.1001551.n3.nabble.com/bug-using-kryo-as-closure-serializer-td6473.html
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #642 from rxin/docs-ser and squashes the following commits:
      
      a507db5 [Reynold Xin] Use "Java" instead of default.
      5eb8cdd [Reynold Xin] Updated doc for spark.closure.serializer to indicate only the default serializer work.
  4. May 04, 2014
    • Address SPARK-1717 · bb2bb0cf
      msiddalingaiah authored
      I tested the change locally with Spark 0.9.1, but I can't test with 1.0.0 because there was no AMI for it at the time. It's a trivial fix, so it shouldn't cause any problems.
      
      Author: msiddalingaiah <madhu@madhu.com>
      
      Closes #641 from msiddalingaiah/master and squashes the following commits:
      
      a4f7404 [msiddalingaiah] Address SPARK-1717
    • SPARK-1710: spark-submit should print better errors than "InvocationTargetException" · b48a55ae
      Sandeep authored
      Catch the InvocationTargetException and print its getTargetException instead.
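
      A sketch of the unwrapping pattern (the method handle and exit behavior are illustrative, not the exact spark-submit code): reflection wraps whatever the user's main method throws, so surface the underlying cause rather than the wrapper.

      ```scala
      import java.lang.reflect.{InvocationTargetException, Method}

      def invokeMain(mainMethod: Method, args: Array[String]): Unit = {
        try {
          mainMethod.invoke(null, args)
        } catch {
          case e: InvocationTargetException =>
            // Print the real failure, not the reflective wrapper around it.
            e.getTargetException.printStackTrace()
            sys.exit(1)
        }
      }
      ```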
      
      Author: Sandeep <sandeep@techaddict.me>
      
      Closes #630 from techaddict/SPARK-1710 and squashes the following commits:
      
      834d79b [Sandeep] changes from srowen  suggestions
      109d604 [Sandeep] SPARK-1710: spark-submit should print better errors than "InvocationTargetException"
    • EC2 script should exit with non-zero code on UsageError · bcb9b7fd
      Allan Douglas R. de Oliveira authored
      This is especially important because some SSH errors are raised as UsageError, preventing automated usage of the script from detecting the failure.
      
      Author: Allan Douglas R. de Oliveira <allan@chaordicsystems.com>
      
      Closes #638 from douglaz/ec2_exit_code_fix and squashes the following commits:
      
      5915e6d [Allan Douglas R. de Oliveira] EC2 script should exit with non-zero code on UsageError
    • SPARK-1693: Most of the tests throw a java.lang.SecurityException when s... · d940e4c1
      witgo authored
      ...park is built for hadoop 2.3.0, 2.4.0
      
      Author: witgo <witgo@qq.com>
      
      Closes #628 from witgo/SPARK-1693_new and squashes the following commits:
      
      e3af968 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1693_new
      dc63905 [witgo] SPARK-1693: Most of the tests throw a java.lang.SecurityException when spark built for hadoop 2.3.0, 2.4.0
    • SPARK-1629. Addendum: Depend on commons lang3 (already used by tachyon) as... · f5041579
      Sean Owen authored
      SPARK-1629. Addendum: Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to using the lang3 utility in Utils.scala
      
      For consideration. This was proposed in related discussion: https://github.com/apache/spark/pull/569
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #635 from srowen/SPARK-1629.2 and squashes the following commits:
      
      a442b98 [Sean Owen] Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to use lang3 utility in Utils.scala
    • SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7. · 0c98a8f6
      Patrick Wendell authored
      This adds some guards and good warning messages if users hit this issue. /cc @aarondav, with whom I discussed parts of the design.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #627 from pwendell/jdk6 and squashes the following commits:
      
      a38a958 [Patrick Wendell] Code review feedback
      94e9f84 [Patrick Wendell] SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7.
    • SPARK-1663. (Addendum) Fix signature of one version of JavaPairRDDStream.reduceByKeyAndWindow() · 0088cede
      Sean Owen authored
      It appears that one of these methods doesn't use `org.apache.spark.api.java.function.Function2` like all the others, but uses Scala's `Function2`.
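
      A signature sketch of the corrected overload (heavily simplified: the trait and the Long durations stand in for the real JavaPairDStream and Duration types): taking the Java API's Function2 keeps the method callable from Java with an anonymous class.

      ```scala
      import org.apache.spark.api.java.function.{Function2 => JFunction2}

      trait JavaPairDStreamLike[K, V] {
        // Java-friendly: callers implement org.apache.spark.api.java.function.Function2,
        // rather than Scala's Function2, which is awkward to implement from Java.
        def reduceByKeyAndWindow(
            reduceFunc: JFunction2[V, V, V],
            windowDurationMs: Long,
            slideDurationMs: Long): Unit
      }
      ```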
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #633 from srowen/SPARK-1663.2 and squashes the following commits:
      
      1e0232d [Sean Owen] Fix signature of one version of reduceByKeyAndWindow to use Java API Function2, as apparently intended
    • SPARK-1658: Correctly identify if maven is installed and working · e97a2e67
      Rahul Singhal authored
      The current test checks the exit code of "tail" rather than "mvn".
      The new check makes sure that mvn is installed and was able to
      execute the "version" command.
      
      Author: Rahul Singhal <rahul.singhal@guavus.com>
      
      Closes #580 from rahulsinghaliitd/SPARK-1658 and squashes the following commits:
      
      83c0313 [Rahul Singhal] SPARK-1658: Correctly identify if maven is installed and working
      bf821b9 [Rahul Singhal] SPARK-1658: Correctly identify if maven is installed and working
    • The default version of yarn is equal to the hadoop version · fb054322
      witgo authored
      This is a part of [PR 590](https://github.com/apache/spark/pull/590)
      
      Author: witgo <witgo@qq.com>
      
      Closes #626 from witgo/yarn_version and squashes the following commits:
      
      c390631 [witgo] restore the yarn dependency declarations
      f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha
      2df6cf5 [witgo] review commit
      a1d876a [witgo] review commit
      20e7e3e [witgo] review commit
      c76763b [witgo] The default value of yarn.version is equal to hadoop.version