  1. Apr 07, 2014
    • SPARK-1099: Introduce local[*] mode to infer number of cores · 0307db0f
      Aaron Davidson authored
      This is the default mode for running spark-shell and pyspark, intended to allow users running spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core.
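      As a rough illustration of the behavior described above, the Scala sketch below maps a master string to a thread count: "local" stays at exactly one thread, "local[N]" uses N threads, and "local[*]" uses the number of available cores. The object and method names are hypothetical, and this is not the parsing code in SparkContext.

      ```scala
      // Hypothetical sketch, not Spark's actual SparkContext parsing logic.
      object LocalMasterSketch {
        // Matches "local[N]" or "local[*]"; a Scala Regex extractor must match the whole string.
        private val LocalN = """local\[([0-9]+|\*)\]""".r

        /** Number of worker threads implied by a local master string, if it is one. */
        def localThreads(master: String): Option[Int] = master match {
          case "local"     => Some(1) // plain "local" keeps its old meaning: exactly one core
          case LocalN("*") => Some(Runtime.getRuntime.availableProcessors()) // infer core count
          case LocalN(n)   => Some(n.toInt)
          case _           => None // not a local-mode master string
        }

        def main(args: Array[String]): Unit = {
          println(localThreads("local"))    // Some(1)
          println(localThreads("local[4]")) // Some(4)
          println(localThreads("local[*]")) // Some(<cores on this machine>)
        }
      }
      ```

      Keeping plain "local" at one thread is what preserves backwards compatibility; only the new "[*]" form consults the machine's core count.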
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #182 from aarondav/110 and squashes the following commits:
      
      a88294c [Aaron Davidson] Rebased changes for new spark-shell
      a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores
    • HOTFIX: Disable actor input stream test. · 2a2ca48b
      Patrick Wendell authored
      This test makes incorrect assumptions about the behavior of Thread.sleep().
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #347 from pwendell/stream-tests and squashes the following commits:
      
      10e09e0 [Patrick Wendell] HOTFIX: Disable actor input stream.
    • SPARK-1252. On YARN, use container-log4j.properties for executors · 9dd8b916
      Sandy Ryza authored
      container-log4j.properties is a file that YARN provides so that containers can have log4j.properties distinct from that of the NodeManagers.
      
      Logs now go to syslog, and stderr and stdout just have the process's standard err and standard out.
      
      I tested this on pseudo-distributed clusters for both yarn (Hadoop 2.2) and yarn-alpha (Hadoop 0.23.7).
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #148 from sryza/sandy-spark-1252 and squashes the following commits:
      
      c0043b8 [Sandy Ryza] Put log4j.properties file under common
      55823da [Sandy Ryza] Add license headers to new files
      10934b8 [Sandy Ryza] Add log4j-spark-container.properties and support SPARK_LOG4J_CONF
      e74450b [Sandy Ryza] SPARK-1252. On YARN, use container-log4j.properties for executors
    • [sql] Rename Expression.apply to eval for better readability. · 83f2a2f1
      Reynold Xin authored
      Also used this opportunity to add a number of `override` modifiers and make some members private.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #340 from rxin/eval and squashes the following commits:
      
      a7c7ca7 [Reynold Xin] Fixed conflicts in merge.
      9069de6 [Reynold Xin] Merge branch 'master' into eval
      3ccc313 [Reynold Xin] Merge branch 'master' into eval
      1a47e10 [Reynold Xin] Renamed apply to eval for generators and added a bunch of override's.
      ea061de [Reynold Xin] Rename Expression.apply to eval for better readability.
    • SPARK-1432: Make sure that all metadata fields are properly cleaned · a3c51c6e
      Davis Shepherd authored
      While working on SPARK-1337 with @pwendell, we noticed that not all of the metadata maps in JobProgressListener were being properly cleaned. This could lead to a (hypothetical) memory leak should a job run long enough. This patch aims to address the issue.
      
      Author: Davis Shepherd <davis@conviva.com>
      
      Closes #338 from dgshep/master and squashes the following commits:
      
      a77b65c [Davis Shepherd] In the context of SPARK-1337: Make sure that all metadata fields are properly cleaned
    • [SQL] SPARK-1427 Fix toString for SchemaRDD NativeCommands. · b5bae849
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #343 from marmbrus/toStringFix and squashes the following commits:
      
      37198fe [Michael Armbrust] Fix toString for SchemaRDD NativeCommands.
    • [SQL] SPARK-1371 Hash Aggregation Improvements · accd0999
      Michael Armbrust authored
      Given:
      ```scala
      case class Data(a: Int, b: Int)
      val rdd =
        sparkContext
          .parallelize(1 to 200)
          .flatMap(_ => (1 to 50000).map(i => Data(i % 100, i)))
      rdd.registerAsTable("data")
      cacheTable("data")
      ```
      Before:
      ```
      SELECT COUNT(*) FROM data:[10000000]
      16795.567ms
      SELECT a, SUM(b) FROM data GROUP BY a
      7536.436ms
      SELECT SUM(b) FROM data
      10954.1ms
      ```
      
      After:
      ```
      SELECT COUNT(*) FROM data:[10000000]
      1372.175ms
      SELECT a, SUM(b) FROM data GROUP BY a
      2070.446ms
      SELECT SUM(b) FROM data
      958.969ms
      ```
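      The `GROUP BY` query above is the kind of workload a hash aggregate handles: the input is streamed once, and a running aggregate is kept per grouping key in a hash map. The Scala sketch below shows that general technique on plain collections; it is an illustration only, not Spark SQL's `Aggregate` operator, and its names are hypothetical.

      ```scala
      import scala.collection.mutable

      // Illustrative hash aggregation for: SELECT a, SUM(b) FROM data GROUP BY a
      object HashAggSketch {
        case class Data(a: Int, b: Int)

        def sumByKey(rows: Iterator[Data]): Map[Int, Long] = {
          // One pass over the input; each row updates the running sum for its key.
          val sums = mutable.HashMap.empty[Int, Long]
          rows.foreach { r => sums(r.a) = sums.getOrElse(r.a, 0L) + r.b }
          sums.toMap
        }

        def main(args: Array[String]): Unit = {
          val rows = Iterator(Data(1, 10), Data(2, 5), Data(1, 7))
          println(sumByKey(rows).toSeq.sortBy(_._1).mkString(", ")) // (1,17), (2,5)
        }
      }
      ```

      Because only one entry per distinct key is held in memory, the cost is a single pass over the rows plus a map sized by the number of groups (100 in the benchmark above).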
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #295 from marmbrus/hashAgg and squashes the following commits:
      
      ec63575 [Michael Armbrust] Add comment.
      d0495a9 [Michael Armbrust] Use scaladoc instead.
      b4a6887 [Michael Armbrust] Address review comments.
      a2d90ba [Michael Armbrust] Capture child output statically to avoid issues with generators and serialization.
      7c13112 [Michael Armbrust] Rewrite Aggregate operator to stream input and use projections.  Remove unused local RDD functions implicits.
      5096f99 [Michael Armbrust] Make HiveUDAF fields transient since object inspectors are not serializable.
      6a4b671 [Michael Armbrust] Add option to avoid binding operators expressions automatically.
      92cca08 [Michael Armbrust] Always include serialization debug info when running tests.
      1279df2 [Michael Armbrust] Increase default number of partitions.
  2. Apr 06, 2014
    • SPARK-1431: Allow merging conflicting pull requests · 87d0928a
      Patrick Wendell authored
      Sometimes if there is a small conflict it's nice to be able to just
      manually fix it up rather than have another RTT with the contributor.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #342 from pwendell/merge-conflicts and squashes the following commits:
      
      cdce61a [Patrick Wendell] SPARK-1431: Allow merging conflicting pull requests
    • SPARK-1154: Clean up app folders in worker nodes · 1440154c
      Evan Chan authored
      This is a fix for [SPARK-1154](https://issues.apache.org/jira/browse/SPARK-1154).   The issue is that worker nodes fill up with a huge number of app-* folders after some time.  This change adds a periodic cleanup task which asynchronously deletes app directories older than a configurable TTL.
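      A minimal Scala sketch of the age check such a cleanup task needs is shown below. It assumes a TTL given in seconds and compares it against `File.lastModified`, which returns milliseconds (the unit mix-up fixed in one of the squashed commits). `findOldDirs` and `deleteRecursively` are hypothetical names, not the actual `Utils.findOldFiles()` implementation, and the periodic scheduling is omitted.

      ```scala
      import java.io.File
      import java.nio.file.Files

      // Hypothetical helpers illustrating TTL-based cleanup; not Spark's Utils code.
      object AppDirCleanupSketch {
        /** Subdirectories of `workDir` last modified more than `ttlSeconds` before `nowMillis`. */
        def findOldDirs(workDir: File, ttlSeconds: Long,
                        nowMillis: Long = System.currentTimeMillis()): Seq[File] = {
          // File.lastModified() is in milliseconds, so convert the TTL before comparing.
          val cutoffMillis = nowMillis - ttlSeconds * 1000L
          val children = Option(workDir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
          children.filter(f => f.isDirectory && f.lastModified() < cutoffMillis)
        }

        /** Deletes a file, or a directory and everything under it. */
        def deleteRecursively(f: File): Unit = {
          if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
          f.delete()
        }

        def main(args: Array[String]): Unit = {
          val work = Files.createTempDirectory("work").toFile
          val oldApp = new File(work, "app-old"); oldApp.mkdir()
          val newApp = new File(work, "app-new"); newApp.mkdir()
          // Pretend app-old was last touched two days ago.
          oldApp.setLastModified(System.currentTimeMillis() - 2L * 24 * 3600 * 1000)

          val stale = findOldDirs(work, ttlSeconds = 24L * 3600) // one-day TTL
          stale.foreach(deleteRecursively)
          println(stale.map(_.getName).mkString(",")) // app-old
          println(work.list().sorted.mkString(","))   // app-new
          deleteRecursively(work)
        }
      }
      ```

      Computing `nowMillis` once per scan, rather than per file, mirrors the squashed commit "Don't recompute current time with every new file".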
      
      Two new configuration parameters have been introduced:
        spark.worker.cleanup_interval
        spark.worker.app_data_ttl
      
      This change does not include moving the downloads of application jars to a location outside of the work directory.  We will address that if we have time, but that potentially involves caching so it will come either as part of this PR or a separate PR.
      
      Author: Evan Chan <ev@ooyala.com>
      Author: Kelvin Chu <kelvinkwchu@yahoo.com>
      
      Closes #288 from velvia/SPARK-1154-cleanup-app-folders and squashes the following commits:
      
      0689995 [Evan Chan] CR from @aarondav - move config, clarify for standalone mode
      9f10d96 [Evan Chan] CR from @pwendell - rename configs and add cleanup.enabled
      f2f6027 [Evan Chan] CR from @andrewor14
      553d8c2 [Kelvin Chu] change the variable name to currentTimeMillis since it actually tracks in seconds
      8dc9cb5 [Kelvin Chu] Fixed a bug in Utils.findOldFiles() after merge.
      cb52f2b [Kelvin Chu] Change the name of findOldestFiles() to findOldFiles()
      72f7d2d [Kelvin Chu] Fix a bug of Utils.findOldestFiles(). file.lastModified is returned in milliseconds.
      ad99955 [Kelvin Chu] Add unit test for Utils.findOldestFiles()
      dc1a311 [Evan Chan] Don't recompute current time with every new file
      e3c408e [Evan Chan] Document the two new settings
      b92752b [Evan Chan] SPARK-1154: Add a periodic task to clean up app directories
    • SPARK-1314: Use SPARK_HIVE to determine if we include Hive in packaging · 41065584
      Aaron Davidson authored
      Previously, we decided whether to include the datanucleus jars based on the existence of a spark-hive-assembly jar, which was incidentally built whenever "sbt assembly" was run. This meant that a typical, previously supported build pathway would start using Hive jars.
      
      This patch has the following features/bug fixes:
      
      - Use of SPARK_HIVE (default false) to determine if we should include Hive in the assembly jar.
      - Analogous feature in Maven with -Phive (previously, there was no support for adding Hive to any of our jars produced by Maven)
      - assemble-deps fixed since we no longer use a different ASSEMBLY_DIR
      - avoid adding log message in compute-classpath.sh to the classpath :)
      
      Still TODO before mergeable:
      - We need to download the datanucleus jars outside of sbt. Perhaps we can have spark-class download them if SPARK_HIVE is set similar to how sbt downloads itself.
      - Spark SQL documentation updates.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #237 from aarondav/master and squashes the following commits:
      
      5dc4329 [Aaron Davidson] Typo fixes
      dd4f298 [Aaron Davidson] Doc update
      dd1a365 [Aaron Davidson] Eliminate need for SPARK_HIVE at runtime by d/ling datanucleus from Maven
      a9269b5 [Aaron Davidson] [WIP] Use SPARK_HIVE to determine if we include Hive in packaging
    • SPARK-1349: spark-shell gets its own command history · 7ce52c4a
      Aaron Davidson authored
      Currently, spark-shell shares its command history with the Scala REPL.
      
      This fix is simply a modification of the default FileBackedHistory file setting:
      https://github.com/scala/scala/blob/master/src/repl/scala/tools/nsc/interpreter/session/FileBackedHistory.scala#L77
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #267 from aarondav/repl and squashes the following commits:
      
      f9c62d2 [Aaron Davidson] SPARK-1349: spark-shell gets its own command history separate from scala repl
    • SPARK-1387. Update build plugins, avoid plugin version warning, centralize versions · 856c50f5
      Sean Owen authored
      Another handful of small build changes to organize and standardize a bit, and avoid warnings:
      
      - Update Maven plugin versions for good measure
      - Since plugins need maven 3.0.4 already, require it explicitly (<3.0.4 had some bugs anyway)
      - Use variables to define versions across dependencies where they should move in lock step
      - ... and make this consistent between Maven/SBT
      
      OK, I also updated the JIRA URL while I was at it here.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #291 from srowen/SPARK-1387 and squashes the following commits:
      
      461eca1 [Sean Owen] Couldn't resist also updating JIRA location to new one
      c2d5cc5 [Sean Owen] Update plugins and Maven version; use variables consistently across Maven/SBT to define dependency versions that should stay in step.
    • [SPARK-1259] Make RDD locally iterable · e258e504
      Egor Pakhomov authored
      Author: Egor Pakhomov <pahomov.egor@gmail.com>
      
      Closes #156 from epahomov/SPARK-1259 and squashes the following commits:
      
      8ec8f24 [Egor Pakhomov] Make to local iterator shorter
      34aa300 [Egor Pakhomov] Fix toLocalIterator docs
      08363ef [Egor Pakhomov] SPARK-1259 from toLocallyIterable to toLocalIterator
      6a994eb [Egor Pakhomov] SPARK-1259 Make RDD locally iterable
      8be3dcf [Egor Pakhomov] SPARK-1259 Make RDD locally iterable
      33ecb17 [Egor Pakhomov] SPARK-1259 Make RDD locally iterable
    • Fix SPARK-1420 The maven build error for Spark Catalyst · 7012ffaf
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #333 from witgo/SPARK-1420 and squashes the following commits:
      
      902519e [witgo] add dependency scala-reflect to catalyst
  3. Apr 05, 2014
    • SPARK-1421. Make MLlib work on Python 2.6 · 0b855167
      Matei Zaharia authored
      The reason it wasn't working was passing a bytearray to stream.write(), which is not supported in Python 2.6 but is in 2.7. (This array came from NumPy when we converted data to send it over to Java). Now we just convert those bytearrays to strings of bytes, which preserves nonprintable characters as well.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #335 from mateiz/mllib-python-2.6 and squashes the following commits:
      
      f26c59f [Matei Zaharia] Update docs to no longer say we need Python 2.7
      a84d6af [Matei Zaharia] SPARK-1421. Make MLlib work on Python 2.6
    • Fix for PR #195 for Java 6 · 890d63bd
      Sean Owen authored
      Use Java 6's recommended equivalent of Java 7's Logger.getGlobal() to retain Java 6 compatibility. See PR #195
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #334 from srowen/FixPR195ForJava6 and squashes the following commits:
      
      f92fbd3 [Sean Owen] Use Java 6's recommended equivalent of Java 7's Logger.getGlobal() to retain Java 6 compatibility
    • [SPARK-1371] fix computePreferredLocations signature to not depend on underlying implementation · 6e88583a
      Mridul Muralidharan authored
      Change to Map and Set - not mutable HashMap and HashSet
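      The general pattern behind this change can be sketched in Scala: build the result with mutable collections internally, but declare and return the immutable `Map` and `Set` interfaces so that callers never depend on `HashMap`/`HashSet`. The object and method below are hypothetical and only illustrate the signature style, not `computePreferredLocations` itself.

      ```scala
      import scala.collection.mutable

      // Hypothetical example: mutable internals, immutable Map/Set in the public signature.
      object PreferredLocationsSketch {
        /** Groups (host, split) pairs by host. */
        def splitsByHost(pairs: Seq[(String, String)]): Map[String, Set[String]] = {
          val acc = mutable.HashMap.empty[String, mutable.HashSet[String]]
          for ((host, split) <- pairs)
            acc.getOrElseUpdate(host, mutable.HashSet.empty[String]) += split
          // Convert to immutable types at the API boundary.
          acc.map { case (host, splits) => host -> splits.toSet }.toMap
        }

        def main(args: Array[String]): Unit = {
          val byHost = splitsByHost(Seq("h1" -> "s1", "h2" -> "s2", "h1" -> "s3"))
          println(byHost("h1").toSeq.sorted.mkString(",")) // s1,s3
          println(byHost("h2").mkString(","))              // s2
        }
      }
      ```

      The internal data structure can later change without breaking any caller, since only the immutable interfaces appear in the signature.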
      
      Author: Mridul Muralidharan <mridulm80@apache.org>
      
      Closes #302 from mridulm/master and squashes the following commits:
      
      df747af [Mridul Muralidharan] Address review comments
      17e2907 [Mridul Muralidharan] fix computePreferredLocations signature to not depend on underlying implementation
    • Remove the getStageInfo() method from SparkContext. · 2d0150c1
      Kay Ousterhout authored
      This method exposes the Stage objects, which are
      private to Spark and should not be exposed to the
      user.
      
      This method was added in https://github.com/apache/spark/commit/01d77f329f5878b7c8672bbdc1859f3ca95d759d; ccing @squito here in case there's a good reason to keep this!
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #308 from kayousterhout/remove_public_method and squashes the following commits:
      
      2e2f009 [Kay Ousterhout] Remove the getStageInfo() method from SparkContext.
    • HOTFIX for broken CI, by SPARK-1336 · 7c18428f
      Prashant Sharma authored
      Learned that `set -o pipefail` is very useful.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Prashant Sharma <scrapcodes@gmail.com>
      
      Closes #321 from ScrapCodes/hf-SPARK-1336 and squashes the following commits:
      
      9d22bc2 [Prashant Sharma] added comment why echo -e q exists.
      f865951 [Prashant Sharma] made the error pattern match on a word boundary so that "errors" does not match. This is there to make sure the build fails if the provided SparkBuild has compile errors.
      7fffdf2 [Prashant Sharma] Removed a stray line.
      97379d8 [Prashant Sharma] HOTFIX for broken CI, by SPARK-1336
  4. Apr 04, 2014
    • small fix ( proogram -> program ) · 0acc7a02
      Prabeesh K authored
      Author: Prabeesh K <prabsmails@gmail.com>
      
      Closes #331 from prabeesh/patch-3 and squashes the following commits:
      
      9399eb5 [Prabeesh K] small fix(proogram -> program)
    • [SQL] SPARK-1366 Consistent sql function across different types of SQLContexts · 8de038eb
      Michael Armbrust authored
      Now users who want to use HiveQL should explicitly say `hiveql` or `hql`.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #319 from marmbrus/standardizeSqlHql and squashes the following commits:
      
      de68d0e [Michael Armbrust] Fix sampling test.
      fbe4a54 [Michael Armbrust] Make `sql` always use spark sql parser, users of hive context can now use hql or hiveql to run queries using HiveQL instead.
    • SPARK-1305: Support persisting RDD's directly to Tachyon · b50ddfde
      Haoyuan Li authored
      Moves PR #468 from apache-incubator-spark to apache-spark:
      "Adding an option to persist Spark RDD blocks into Tachyon."
      
      Author: Haoyuan Li <haoyuan@cs.berkeley.edu>
      Author: RongGu <gurongwalker@gmail.com>
      
      Closes #158 from RongGu/master and squashes the following commits:
      
      72b7768 [Haoyuan Li] merge master
      9f7fa1b [Haoyuan Li] fix code style
      ae7834b [Haoyuan Li] minor cleanup
      a8b3ec6 [Haoyuan Li] merge master branch
      e0f4891 [Haoyuan Li] better check offheap.
      55b5918 [RongGu] address matei's comment on the replication of offHeap storagelevel
      7cd4600 [RongGu] remove some logic code for tachyonstore's replication
      51149e7 [RongGu] address aaron's comment on returning value of the remove() function in tachyonstore
      8adfcfa [RongGu] address aaron's comment on inTachyonSize
      120e48a [RongGu] changed the root-level dir name in Tachyon
      5cc041c [Haoyuan Li] address aaron's comments
      9b97935 [Haoyuan Li] address aaron's comments
      d9a6438 [Haoyuan Li] fix for pspark
      77d2703 [Haoyuan Li] change python api.git status
      3dcace4 [Haoyuan Li] address matei's comments
      91fa09d [Haoyuan Li] address patrick's comments
      589eafe [Haoyuan Li] use TRY_CACHE instead of MUST_CACHE
      64348b2 [Haoyuan Li] update conf docs.
      ed73e19 [Haoyuan Li] Merge branch 'master' of github.com:RongGu/spark-1
      619a9a8 [RongGu] set number of directories in TachyonStore back to 64; added a TODO tag for duplicated code from the DiskStore
      be79d77 [RongGu] find a way to clean up some unnecessary methods and classes to make the code simpler
      49cc724 [Haoyuan Li] update docs with off_heap option
      4572f9f [RongGu] reserving the old apply function API of StorageLevel
      04301d3 [RongGu] rename StorageLevel.TACHYON to Storage.OFF_HEAP
      c9aeabf [RongGu] rename the StorageLevel.TACHYON as StorageLevel.OFF_HEAP
      76805aa [RongGu] unifies the config properties name prefix; add the configs into docs/configuration.md
      e700d9c [RongGu] add the SparkTachyonHdfsLR example and some comments
      fd84156 [RongGu] use randomUUID to generate sparkapp directory name on tachyon;minor code style fix
      939e467 [Haoyuan Li] 0.4.1-thrift from maven central
      86a2eab [Haoyuan Li] tachyon 0.4.1-thrift is in the staging repo. but jenkins failed to download it. temporarily revert it back to 0.4.1
      16c5798 [RongGu] make the dependency on tachyon as tachyon-0.4.1-thrift
      eacb2e8 [RongGu] Merge branch 'master' of https://github.com/RongGu/spark-1
      bbeb4de [RongGu] fix the JsonProtocolSuite test failure problem
      6adb58f [RongGu] Merge branch 'master' of https://github.com/RongGu/spark-1
      d827250 [RongGu] fix JsonProtocolSuite test failure
      716e93b [Haoyuan Li] revert the version
      ca14469 [Haoyuan Li] bump tachyon version to 0.4.1-thrift
      2825a13 [RongGu] up-merging to the current master branch of the apache spark
      6a22c1a [Haoyuan Li] fix scalastyle
      8968b67 [Haoyuan Li] exclude more libraries from tachyon dependency to be the same as referencing tachyon-client.
      77be7e8 [RongGu] address mateiz's comment about the temp folder name problem. The implementation followed mateiz's advice.
      1dcadf9 [Haoyuan Li] typo
      bf278fa [Haoyuan Li] fix python tests
      e82909c [Haoyuan Li] minor cleanup
      776a56c [Haoyuan Li] address patrick's and ali's comments from the previous PR
      8859371 [Haoyuan Li] various minor fixes and clean up
      e3ddbba [Haoyuan Li] add doc to use Tachyon cache mode.
      fcaeab2 [Haoyuan Li] address Aaron's comment
      e554b1e [Haoyuan Li] add python code
      47304b3 [Haoyuan Li] make tachyonStore in BlockManager lazy val; add more comments to StorageLevels.
      dc8ef24 [Haoyuan Li] add old storelevel constructor
      e01a271 [Haoyuan Li] update tachyon 0.4.1
      8011a96 [RongGu] fix a brought-in mistake in StorageLevel
      70ca182 [RongGu] a bit change in comment
      556978b [RongGu] fix the scalastyle errors
      791189b [RongGu] "Adding an option to persist Spark RDD blocks into Tachyon." move the PR#468 of apache-incubator-spark to the apache-spark
    • [SPARK-1419] Bumped parent POM to apache 14 · 1347ebd4
      Mark Hamstra authored
      Keeping up-to-date with the parent, which includes some bugfixes.
      
      Author: Mark Hamstra <markhamstra@gmail.com>
      
      Closes #328 from markhamstra/Apache14 and squashes the following commits:
      
      3f19975 [Mark Hamstra] Bumped parent POM to apache 14
    • Add test utility for generating Jar files with compiled classes. · 5f3c1bb5
      Patrick Wendell authored
      This was requested by a few different people and may be generally
      useful, so I'd like to contribute this and not block on a different
      PR for it to get in.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #326 from pwendell/class-loader-test-utils and squashes the following commits:
      
      ff3e88e [Patrick Wendell] Add test utility for generating Jar files with compiled classes.
    • SPARK-1414. Python API for SparkContext.wholeTextFiles · 60e18ce7
      Matei Zaharia authored
      Also clarified comment on each file having to fit in memory
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #327 from mateiz/py-whole-files and squashes the following commits:
      
      9ad64a5 [Matei Zaharia] SPARK-1414. Python API for SparkContext.wholeTextFiles
    • [SQL] Minor fixes. · d956cc25
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #315 from marmbrus/minorFixes and squashes the following commits:
      
      b23a15d [Michael Armbrust] fix scaladoc
      11062ac [Michael Armbrust] Fix registering "SELECT *" queries as tables and caching them. Add some tests for this and self-joins.
      3997dc9 [Michael Armbrust] Move Row extractor to catalyst.
      208bf5e [Michael Armbrust] More idiomatic naming of DSL functions. * subquery => as * for join condition => on, i.e., `r.join(s, condition = 'a == 'b)` =>`r.join(s, on = 'a == 'b)`
      87211ce [Michael Armbrust] Correctly handle self joins of in-memory cached tables.
      69e195e [Michael Armbrust] Change != to !== in the DSL since != will always translate to != on Any.
      01f2dd5 [Michael Armbrust] Correctly assign aliases to tables in SqlParser.
    • [SPARK-1198] Allow pipes tasks to run in different sub-directories · 198892fe
      Thomas Graves authored
      This works as is on Linux/Mac/etc. but doesn't cover Windows. Here I use `ln -sf` for symlinks; I'm putting this up for comments on that approach. Should we perhaps create some classes for running shell commands (Linux vs. Windows)? Is there some other way we want to do this? I assume we are still supporting JDK 1.6?
      
      Also should I update the Java API for pipes to allow this parameter?
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #128 from tgravescs/SPARK1198 and squashes the following commits:
      
      abc1289 [Thomas Graves] remove extra tag in pom file
      ba23fc0 [Thomas Graves] Add support for symlink on windows, remove commons-io usage
      da4b221 [Thomas Graves] Merge branch 'master' of https://github.com/tgravescs/spark into SPARK1198
      61be271 [Thomas Graves] Fix file name filter
      6b783bd [Thomas Graves] style fixes
      1ab49ca [Thomas Graves] Add support for running pipe tasks in separate directories
    • Don't create SparkContext in JobProgressListenerSuite. · a02b535d
      Patrick Wendell authored
      This reduces the time of the test from 11 seconds to 20 milliseconds.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #324 from pwendell/job-test and squashes the following commits:
      
      868d9eb [Patrick Wendell] Don't create SparkContext in JobProgressListenerSuite.
    • SPARK-1375. Additional spark-submit cleanup · 16b83088
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #278 from sryza/sandy-spark-1375 and squashes the following commits:
      
      5fbf1e9 [Sandy Ryza] SPARK-1375. Additional spark-submit cleanup
    • [SPARK-1133] Add whole text files reader in MLlib · f1fa6170
      Xusen Yin authored
      Here is a pointer to the former [PR164](https://github.com/apache/spark/pull/164).
      
      This pull request addresses the JIRA issue [SPARK-1133](https://spark-project.atlassian.net/browse/SPARK-1133) by adding a new whole-text-files reader API to MLlib.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #252 from yinxusen/whole-files-input and squashes the following commits:
      
      7191be6 [Xusen Yin] refine comments
      0af3faf [Xusen Yin] add JavaAPI test
      01745ee [Xusen Yin] fix deletion error
      cc97dca [Xusen Yin] move whole text file API to Spark core
      d792cee [Xusen Yin] remove the typo character "+"
      6bdf2c2 [Xusen Yin] test for small local file system block size
      a1f1e7e [Xusen Yin] add two extra spaces
      28cb0fe [Xusen Yin] add whole text files reader
    • SPARK-1404: Always upgrade spark-env.sh vars to environment vars · 01cf4c40
      Aaron Davidson authored
      This was broken when spark-env.sh was made idempotent, as the idempotence check is an environment variable, but the spark-env.sh variables may not have been.
      
      Tested in zsh, bash, and sh.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #310 from aarondav/SPARK-1404 and squashes the following commits:
      
      c3406a5 [Aaron Davidson] Add extra export in spark-shell
      6a0e340 [Aaron Davidson] SPARK-1404: Always upgrade spark-env.sh vars to environment vars
    • SPARK-1350. Always use JAVA_HOME to run executor container JVMs. · 7f32fd42
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #313 from sryza/sandy-spark-1350 and squashes the following commits:
      
      bb6d187 [Sandy Ryza] SPARK-1350. Always use JAVA_HOME to run executor container JVMs.
    • SPARK-1337: Application web UI garbage collects newest stages · ee6e9e7d
      Patrick Wendell authored
      Simple fix...
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #320 from pwendell/stage-clean-up and squashes the following commits:
      
      29be62e [Patrick Wendell] SPARK-1337: Application web UI garbage collects newest stages instead of old ones
  5. Apr 03, 2014
    • Revert "[SPARK-1398] Removed findbugs jsr305 dependency" · 33e63618
      Patrick Wendell authored
      This reverts commit 92a86b28.
    • Fix jenkins from giving the green light to builds that don't compile. · 9231b011
      Michael Armbrust authored
      Adding `| grep` swallows the non-zero return code from sbt failures. See [here](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13735/consoleFull) for a Jenkins run that fails to compile, but still gets a green light.
      
      Note the [BUILD FIX] commit isn't actually part of this PR, but github is out of date.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #317 from marmbrus/fixJenkins and squashes the following commits:
      
      7c77ff9 [Michael Armbrust] Remove output filter that was swallowing non-zero exit codes for test failures.
    • [BUILD FIX] Fix compilation of Spark SQL Java API. · d94826be
      Michael Armbrust authored
      The JavaAPI and the Parquet improvements PRs didn't conflict, but broke the build.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #316 from marmbrus/hotFixJavaApi and squashes the following commits:
      
      0b84c2d [Michael Armbrust] Fix compilation of Spark SQL Java API.
    • [SPARK-1134] Fix and document passing of arguments to IPython · a599e43d
      Diana Carroll authored
      This is based on @dianacarroll's previous pull request https://github.com/apache/spark/pull/227, and @joshrosen's comments on https://github.com/apache/spark/pull/38. Since we do want to allow passing arguments to IPython, this does the following:
      * It documents that IPython can't be used with standalone jobs for now. (Later versions of IPython will deal with PYTHONSTARTUP properly and enable this, see https://github.com/ipython/ipython/pull/5226, but no released version has that fix.)
      * If you run `pyspark` with `IPYTHON=1`, it passes your command-line arguments to it. This way you can do stuff like `IPYTHON=1 bin/pyspark notebook`.
      * The old `IPYTHON_OPTS` remains, but I've removed it from the documentation. This is in case people read an old tutorial that uses it.
      
      This is not a perfect solution and I'd also be okay with keeping things as they are today (ignoring `$@` for IPython and using IPYTHON_OPTS), and only doing the doc change. With this change though, when IPython fixes https://github.com/ipython/ipython/pull/5226, people will immediately be able to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get all the benefits of running scripts in IPython (presumably better debugging and such). Without it, there will be no way to run scripts in IPython.
      
      @joshrosen you should probably take the final call on this.
      
      Author: Diana Carroll <dcarroll@cloudera.com>
      
      Closes #294 from mateiz/spark-1134 and squashes the following commits:
      
      747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents non-interactive use with spark; only call ipython if no command line arguments were supplied
    • [SQL] SPARK-1333 First draft of java API · b8f53419
      Michael Armbrust authored
      WIP: Some work remains...
       * [x] Hive support
       * [x] Tests
       * [x] Update docs
      
      Feedback welcome!
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #248 from marmbrus/javaSchemaRDD and squashes the following commits:
      
      b393913 [Michael Armbrust] @srowen 's java style suggestions.
      f531eb1 [Michael Armbrust] Address matei's comments.
      33a1b1a [Michael Armbrust] Ignore JavaHiveSuite.
      822f626 [Michael Armbrust] improve docs.
      ab91750 [Michael Armbrust] Improve Java SQL API: * Change JavaRow => Row * Add support for querying RDDs of JavaBeans * Docs * Tests * Hive support
      0b859c8 [Michael Armbrust] First draft of java API.
    • Spark 1162 Implemented takeOrdered in pyspark. · c1ea3afb
      Prashant Sharma authored
      Python has no library for max heaps, and the usual tricks, such as inverting values, do not work in all cases.
      
      We therefore have our own implementation of a max heap.
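      The bounded max-heap idea can be sketched in Scala, whose `mutable.PriorityQueue` dequeues the largest element first: keep at most k elements, and when a smaller one arrives, evict the current maximum. This illustrates the technique only; it is not the Python code added to PySpark, and the names are hypothetical.

      ```scala
      import scala.collection.mutable

      // Illustration of takeOrdered via a bounded max heap; not PySpark's implementation.
      object TakeOrderedSketch {
        /** Returns the `k` smallest elements of `items` in ascending order. */
        def takeOrdered[T](items: Iterator[T], k: Int)(implicit ord: Ordering[T]): Seq[T] = {
          // mutable.PriorityQueue dequeues the *largest* element first, i.e. it is a max heap.
          val heap = mutable.PriorityQueue.empty[T](ord)
          items.foreach { x =>
            if (heap.size < k) heap.enqueue(x)
            else if (k > 0 && ord.lt(x, heap.head)) {
              heap.dequeue() // evict the largest of the k elements kept so far
              heap.enqueue(x)
            }
          }
          heap.toList.sorted(ord)
        }

        def main(args: Array[String]): Unit = {
          println(takeOrdered(Iterator(5, 1, 9, 3, 7, 2), 3).mkString(","))       // 1,2,3
          // Works for any Ordering, e.g. strings, where "inverting values" is not possible.
          println(takeOrdered(Iterator("pear", "apple", "fig"), 2).mkString(",")) // apple,fig
        }
      }
      ```

      Memory stays bounded by k regardless of input size, which is why a max heap (rather than sorting everything) suits a per-partition top-k.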
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #97 from ScrapCodes/SPARK-1162/pyspark-top-takeOrdered2 and squashes the following commits:
      
      35f86ba [Prashant Sharma] code review
      2b1124d [Prashant Sharma] fixed tests
      e8a08e2 [Prashant Sharma] Code review comments.
      49e6ba7 [Prashant Sharma] SPARK-1162 added takeOrdered to pyspark
    • [SPARK-1360] Add Timestamp Support for SQL · 5d1feda2
      Cheng Hao authored
      This PR includes:
      1) Add new data type Timestamp
      2) Add more data type casting based on Hive's rules
      3) Fix bug where data types were missing in both parsers (HiveQl & SQLParser).
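      To illustrate the casting behavior one of the squashed commits refers to (a failed numeric parse should yield null rather than 0), here is a small Scala sketch that models SQL NULL as `None`. It is a hypothetical illustration, not Catalyst's `Cast` implementation; note that `java.sql.Timestamp.valueOf` expects the `yyyy-[m]m-[d]d hh:mm:ss[.f...]` format.

      ```scala
      import java.sql.Timestamp
      import scala.util.Try

      // Hypothetical null-on-failure casts; SQL NULL is modeled as None.
      object CastSketch {
        /** String -> Int; unparseable input becomes None instead of 0. */
        def stringToInt(s: String): Option[Int] = Try(s.trim.toInt).toOption

        /** String -> Timestamp; Timestamp.valueOf throws IllegalArgumentException on bad input. */
        def stringToTimestamp(s: String): Option[Timestamp] =
          Try(Timestamp.valueOf(s.trim)).toOption

        def main(args: Array[String]): Unit = {
          println(stringToInt("42"))                                  // Some(42)
          println(stringToInt("abc"))                                 // None
          println(stringToTimestamp("2014-04-03 12:34:56").isDefined) // true
          println(stringToTimestamp("not a date"))                    // None
        }
      }
      ```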
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #275 from chenghao-intel/timestamp and squashes the following commits:
      
      df709e5 [Cheng Hao] Move orc_ends_with_nulls to blacklist
      24b04b0 [Cheng Hao] Put 3 cases into the black lists(describe_pretty,describe_syntax,lateral_view_outer)
      fc512c2 [Cheng Hao] remove the unnecessary data type equality check in data casting
      d0d1919 [Cheng Hao] Add more data type for scala reflection
      3259808 [Cheng Hao] Add the new Golden files
      3823b97 [Cheng Hao] Update the UnitTest cases & add timestamp type for HiveQL
      54a0489 [Cheng Hao] fix bug mapping to 0 (which is supposed to be null) when NumberFormatException occurs
      9cb505c [Cheng Hao] Fix issues according to PR comments
      e529168 [Cheng Hao] Fix bug of converting from String
      6fc8100 [Cheng Hao] Update Unit Test & CodeStyle
      8a1d4d6 [Cheng Hao] Add DataType for SqlParser
      ce4385e [Cheng Hao] Add TimestampType Support