  1. Jun 01, 2014
    • Better explanation for how to use MIMA excludes. · d17d2214
      Patrick Wendell authored
      This patch does a few things:
      1. We have a file MimaExcludes.scala exclusively for excludes (see the sketch after this list).
      2. The test runner tells users about that file if a test fails.
      3. I've added back the excludes used from 0.9->1.0. We should keep
         these in the project as an official audit trail of times where
         we decided to make exceptions.
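
      For illustration, a minimal sketch of what an entry in MimaExcludes.scala can look like, using MIMA's ProblemFilters API (the class and method names below are hypothetical):

      ```scala
      import com.typesafe.tools.mima.core._

      object MimaExcludes {
        // Each entry records a deliberate binary-compatibility exception.
        val excludes = Seq(
          ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.SomeClass.someMethod")
        )
      }
      ```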
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #937 from pwendell/mima and squashes the following commits:
      
      7ee0db2 [Patrick Wendell] Better explanation for how to use MIMA excludes.
    • Made spark_ec2.py PEP8 compliant. · eea3aab4
      Reynold Xin authored
      The change set is actually pretty small -- mostly whitespace changes. Admittedly this is a scary change due to the lack of tests to cover the ec2 scripts, and also because indentation actually impacts control flow in Python ...
      
      Look at changes without whitespace diff here: https://github.com/apache/spark/pull/891/files?w=1
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #891 from rxin/spark-ec2-pep8 and squashes the following commits:
      
      ac1bf11 [Reynold Xin] Made spark_ec2.py PEP8 compliant.
  2. May 31, 2014
    • updated java code blocks in spark SQL guide such that ctx will refer to ... · 366c0c4c
      Yadid Ayzenberg authored
      ...a JavaSparkContext and sqlCtx will refer to a JavaSQLContext
      
      Author: Yadid Ayzenberg <yadid@media.mit.edu>
      
      Closes #932 from yadid/master and squashes the following commits:
      
      f92fb3a [Yadid Ayzenberg] updated java code blocks in spark SQL guide such that ctx will refer to a JavaSparkContext and sqlCtx will refer to a JavaSQLContext
    • SPARK-1917: fix PySpark import of scipy.special functions · 5e98967b
      Uri Laserson authored
      https://issues.apache.org/jira/browse/SPARK-1917
      
      Author: Uri Laserson <laserson@cloudera.com>
      
      Closes #866 from laserson/SPARK-1917 and squashes the following commits:
      
      d947e8c [Uri Laserson] Added test for scipy.special importing
      1798bbd [Uri Laserson] SPARK-1917: fix PySpark import of scipy.special
    • Improve maven plugin configuration · d8c005d5
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #786 from witgo/maven_plugin and squashes the following commits:
      
      5de86a2 [witgo] Merge branch 'master' of https://github.com/apache/spark into maven_plugin
      c35ef73 [witgo] Improve maven plugin configuration
    • SPARK-1839: PySpark RDD#take() shouldn't always read from driver · 9909efc1
      Aaron Davidson authored
      This patch simply ports over the Scala implementation of RDD#take(), which reads the first partition at the driver, then decides how many more partitions it needs to read and will possibly start a real job if it's more than 1. (Note that SparkContext#runJob(allowLocal=true) only runs the job locally if there's 1 partition selected and no parent stages.)
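
      A standalone sketch of that scaling strategy (the names and the 1.5x overshoot are illustrative, not Spark's exact internals):

      ```scala
      // Collect up to `num` items while scanning as few partitions as possible:
      // read one partition first, then grow the number of partitions scanned
      // based on how full the buffer is after each pass.
      def takeSketch[T](partitions: Seq[Seq[T]], num: Int): Seq[T] = {
        val buf = scala.collection.mutable.ArrayBuffer.empty[T]
        var partsScanned = 0
        while (buf.size < num && partsScanned < partitions.size) {
          var numPartsToTry = 1
          if (partsScanned > 0) {
            // Nothing found yet: quadruple. Otherwise estimate how many
            // partitions are still needed and overshoot by 50%.
            numPartsToTry =
              if (buf.isEmpty) partsScanned * 4
              else math.max(1, (1.5 * num * partsScanned / buf.size).toInt - partsScanned)
          }
          val upper = math.min(partsScanned + numPartsToTry, partitions.size)
          (partsScanned until upper).foreach(i => buf ++= partitions(i).take(num - buf.size))
          partsScanned = upper
        }
        buf.toSeq
      }
      ```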
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #922 from aarondav/take and squashes the following commits:
      
      fa06df9 [Aaron Davidson] SPARK-1839: PySpark RDD#take() shouldn't always read from driver
    • Super minor: Close inputStream in SparkSubmitArguments · 7d52777e
      Aaron Davidson authored
      `Properties#load()` doesn't close the InputStream, but it'd be closed after being GC'd anyway...
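
      The explicit-close pattern, as a hedged sketch (the file path here is assumed):

      ```scala
      import java.io.{File, FileInputStream}
      import java.util.Properties

      val propertiesFile = new File("conf/spark-defaults.conf") // assumed path
      val properties = new Properties()
      val inputStream = new FileInputStream(propertiesFile)
      try {
        properties.load(inputStream) // load() reads the stream but never closes it
      } finally {
        inputStream.close() // close explicitly instead of waiting for GC
      }
      ```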
      
      Also changed file.getName to file, because getName only shows the filename. This will show the full (possibly relative) path, which is less confusing if it's not found.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #914 from aarondav/tiny and squashes the following commits:
      
      db9d072 [Aaron Davidson] Super minor: Close inputStream in SparkSubmitArguments
    • [SQL] SPARK-1964 Add timestamp to hive metastore type parser. · 1a0da0ec
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #913 from marmbrus/timestampMetastore and squashes the following commits:
      
      8e0154f [Michael Armbrust] Add timestamp to hive metastore type parser.
    • Optionally include Hive as a dependency of the REPL. · 7463cd24
      Michael Armbrust authored
      Due to the way spark-shell launches from an assembly jar, I don't think this change will affect anyone who isn't trying to launch the shell directly from sbt.  That said, it is kinda nice to be able to launch all things directly from SBT when developing.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #801 from marmbrus/hiveRepl and squashes the following commits:
      
      9570571 [Michael Armbrust] Optionally include Hive as a dependency of the REPL.
    • [SPARK-1947] [SQL] Child of SumDistinct or Average should be widened to... · 3ce81494
      Takuya UESHIN authored
      [SPARK-1947] [SQL] Child of SumDistinct or Average should be widened to prevent overflows the same as Sum.
      
      Child of `SumDistinct` or `Average` should be widened to prevent overflows the same as `Sum`.
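
      A quick plain-Scala illustration of the overflow being guarded against (not the Catalyst code): summing Int values can wrap around before the average is taken, while widening to Long keeps the result correct.

      ```scala
      val xs = Seq(Int.MaxValue, Int.MaxValue)
      val overflowed = xs.sum / xs.size               // Int sum wraps: result is -1
      val widened    = xs.map(_.toLong).sum / xs.size // 2147483647, as expected
      ```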
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #902 from ueshin/issues/SPARK-1947 and squashes the following commits:
      
      99c3dcb [Takuya UESHIN] Insert Cast for SumDistinct and Average.
    • correct tiny comment error · 9ecc40d3
      Chen Chao authored
      Author: Chen Chao <crazyjvm@gmail.com>
      
      Closes #928 from CrazyJvm/patch-8 and squashes the following commits:
      
      144328b [Chen Chao] correct tiny comment error
    • [SPARK-1959] String "NULL" shouldn't be interpreted as null value · cf989601
      Cheng Lian authored
      JIRA issue: [SPARK-1959](https://issues.apache.org/jira/browse/SPARK-1959)
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #909 from liancheng/spark-1959 and squashes the following commits:
      
      306659c [Cheng Lian] [SPARK-1959] String "NULL" shouldn't be interpreted as null value
    • SPARK-1976: fix the misleading part in streaming docs · 41bfdda3
      CodingCat authored
      Spark Streaming requires at least two working threads, but the documentation gives an example like this:
      
      import org.apache.spark.api.java.function._
      import org.apache.spark.streaming._
      import org.apache.spark.streaming.api._
      // Create a StreamingContext with a local master
      val ssc = new StreamingContext("local", "NetworkWordCount", Seconds(1))
      http://spark.apache.org/docs/latest/streaming-programming-guide.html
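
      The receiver occupies one thread, so at least one more is needed to process the received data; a corrected form along the lines of the fix would be:

      ```scala
      // "local[2]" provides one thread for the receiver and one for processing.
      val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
      ```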
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #924 from CodingCat/master and squashes the following commits:
      
      bb89f20 [CodingCat] update streaming docs
    • updated link to mailing list · 23ae3663
      nchammas authored
      Author: nchammas <nicholas.chammas@gmail.com>
      
      Closes #923 from nchammas/patch-1 and squashes the following commits:
      
      65c4d18 [nchammas] updated link to mailing list
    • Typo: and -> an · 9c1f204d
      Andrew Ash authored
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #927 from ash211/patch-5 and squashes the following commits:
      
      79b577d [Andrew Ash] Typo: and -> an
  3. May 30, 2014
    • [SPARK-1901] worker should make sure executor has exited before updating executor's info · ff562b23
      Zhen Peng authored
      https://issues.apache.org/jira/browse/SPARK-1901
      
      Author: Zhen Peng <zhenpeng01@baidu.com>
      
      Closes #854 from zhpengg/bugfix-worker-kills-executor and squashes the following commits:
      
      21d380b [Zhen Peng] add some error messages
      506cea6 [Zhen Peng] add some docs for killProcess()
      a0b9860 [Zhen Peng] [SPARK-1901] worker should make sure executor has exited before updating executor's info
    • [SPARK-1971] Update MIMA to compare against Spark 1.0.0 · 79fa8fd4
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #910 from ScrapCodes/enable-mima/spark-core and squashes the following commits:
      
      79f3687 [Prashant Sharma] updated Mima to check against version 1.0
      1e8969c [Prashant Sharma] Spark core missed out on Mima settings. So in effect we never tested spark core for mima related errors.
    • [SPARK-1566] consolidate programming guide, and general doc updates · c8bf4131
      Matei Zaharia authored
      This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:
      
      * A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
      * New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
      * Spark-submit guide moved to a separate page and expanded slightly
      * Various cleanups of the menu system, security docs, and others
      * Updated look of title bar to differentiate the docs from previous Spark versions
      
      You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #896 from mateiz/1.0-docs and squashes the following commits:
      
      03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
      0779508 [Matei Zaharia] tweak
      ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
      1bf4112 [Matei Zaharia] Review comments
      4414f88 [Matei Zaharia] tweaks
      d04e979 [Matei Zaharia] Fix some old links to Java guide
      a34ed33 [Matei Zaharia] tweak
      541bb3b [Matei Zaharia] miscellaneous changes
      fcefdec [Matei Zaharia] Moved submitting apps to separate doc
      61d72b4 [Matei Zaharia] stuff
      181f217 [Matei Zaharia] migration guide, remove old language guides
      e11a0da [Matei Zaharia] Add more API functions
      6a030a9 [Matei Zaharia] tweaks
      8db0ae3 [Matei Zaharia] Added key-value pairs section
      318d2c9 [Matei Zaharia] tweaks
      1c81477 [Matei Zaharia] New section on basics and function syntax
      e38f559 [Matei Zaharia] Actually added programming guide to Git
      a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
      3b6a876 [Matei Zaharia] More CSS tweaks
      01ec8bf [Matei Zaharia] More CSS tweaks
      e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
    • [SPARK-1820] Make GenerateMimaIgnore @DeveloperApi annotation aware. · eeee978a
      Prashant Sharma authored
      We add all the classes annotated as `DeveloperApi` to `~/.mima-excludes`.
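
      A hedged sketch of the underlying idea (not the actual GenerateMimaIgnore code, and assuming the annotation is retained at runtime):

      ```scala
      // Keep only classes carrying @DeveloperApi so their names can be
      // written out as MIMA excludes.
      def isDeveloperApi(cls: Class[_]): Boolean =
        cls.getAnnotations.exists(
          _.annotationType.getName == "org.apache.spark.annotation.DeveloperApi")

      def developerApiClassNames(candidates: Seq[Class[_]]): Seq[String] =
        candidates.filter(isDeveloperApi).map(_.getName)
      ```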
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: nikhil7sh <nikhilsharmalnmiit@gmail.com>
      
      Closes #904 from ScrapCodes/SPARK-1820/ignore-Developer-Api and squashes the following commits:
      
      de944f9 [Prashant Sharma] Code review.
      e3c5215 [Prashant Sharma] Incorporated patrick's suggestions and fixed the scalastyle build.
      9983a42 [nikhil7sh] [SPARK-1820] Make GenerateMimaIgnore @DeveloperApi annotation aware
  4. May 29, 2014
    • initial version of LPA · b7e28fa4
      Ankur Dave authored
      A straightforward implementation of LPA algorithm for detecting graph communities using the Pregel framework.  Amongst the growing literature on community detection algorithms in networks, LPA is perhaps the most elementary, and despite its flaws it remains a nice and simple approach.
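
      A sketch of how LPA maps onto Pregel (close in spirit to, though not necessarily identical with, the committed code): each vertex starts with its own id as its label and repeatedly adopts the most frequent label among its neighbors.

      ```scala
      import scala.reflect.ClassTag
      import org.apache.spark.graphx._

      def labelPropagation[VD, ED: ClassTag](graph: Graph[VD, ED], maxSteps: Int): Graph[VertexId, ED] = {
        // Initialize every vertex's label to its own id.
        val lpaGraph = graph.mapVertices { case (vid, _) => vid }
        // Each endpoint tells the other about its current label.
        def sendMessage(e: EdgeTriplet[VertexId, ED]) =
          Iterator((e.srcId, Map(e.dstAttr -> 1L)), (e.dstId, Map(e.srcAttr -> 1L)))
        // Merge label histograms arriving from different neighbors.
        def mergeMessage(a: Map[VertexId, Long], b: Map[VertexId, Long]) =
          (a.keySet ++ b.keySet).map { k =>
            k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))
          }.toMap
        // Adopt the most frequent neighboring label.
        def vertexProgram(vid: VertexId, attr: VertexId, message: Map[VertexId, Long]) =
          if (message.isEmpty) attr else message.maxBy(_._2)._1
        Pregel(lpaGraph, Map.empty[VertexId, Long], maxIterations = maxSteps)(
          vertexProgram, sendMessage, mergeMessage)
      }
      ```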
      
      Author: Ankur Dave <ankurdave@gmail.com>
      Author: haroldsultan <haroldsultan@gmail.com>
      Author: Harold Sultan <haroldsultan@gmail.com>
      
      Closes #905 from haroldsultan/master and squashes the following commits:
      
      327aee0 [haroldsultan] Merge pull request #2 from ankurdave/label-propagation
      227a4d0 [Ankur Dave] Untabify
      0ac574c [haroldsultan] Merge pull request #1 from ankurdave/label-propagation
      0e24303 [Ankur Dave] Add LabelPropagationSuite
      84aa061 [Ankur Dave] LabelPropagation: Fix compile errors and style; rename from LPA
      9830342 [Harold Sultan] initial version of LPA
    • [SPARK-1368][SQL] Optimized HiveTableScan · 8f7141fb
      Cheng Lian authored
      JIRA issue: [SPARK-1368](https://issues.apache.org/jira/browse/SPARK-1368)
      
      This PR introduces two major updates:
      
      - Replaced FP style code with a `while` loop and a reusable `GenericMutableRow` object in the critical path of `HiveTableScan` (see the sketch below).
      - Used `ColumnProjectionUtils` to help optimize RCFile and ORC column pruning.
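
      As a generic illustration of the first change (not the actual HiveTableScan code), the reuse pattern replaces a per-record allocation with one buffer overwritten in a `while` loop:

      ```scala
      // Functional style: allocates a fresh row for every record.
      def scanAllocating(rows: Iterator[Array[String]]): Iterator[Array[Any]] =
        rows.map(raw => raw.toArray[Any])

      // Reuse style: one buffer (standing in for GenericMutableRow), overwritten
      // per record. The caller must consume each row before calling next().
      def scanReusing(rows: Iterator[Array[String]], width: Int): Iterator[Array[Any]] = {
        val reusedRow = new Array[Any](width)
        rows.map { raw =>
          var i = 0
          while (i < width) { reusedRow(i) = raw(i); i += 1 }
          reusedRow
        }
      }
      ```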
      
      My quick micro benchmark suggests these two optimizations made the optimized version 2x and 2.5x faster when scanning the CSV table and the RCFile table, respectively:
      
      ```
      Original:
      
      [info] CSV: 27676 ms, RCFile: 26415 ms
      [info] CSV: 27703 ms, RCFile: 26029 ms
      [info] CSV: 27511 ms, RCFile: 25962 ms
      
      Optimized:
      
      [info] CSV: 13820 ms, RCFile: 10402 ms
      [info] CSV: 14158 ms, RCFile: 10691 ms
      [info] CSV: 13606 ms, RCFile: 10346 ms
      ```
      
      The micro benchmark loads a 609MB CSV file (structurally similar to the `src` test table) into a normal Hive table with `LazySimpleSerDe` and an RCFile table, then scans both tables.
      
      Preparation code:
      
      ```scala
      package org.apache.spark.examples.sql.hive
      
      import org.apache.spark.sql.hive.LocalHiveContext
      import org.apache.spark.{SparkConf, SparkContext}
      
      object HiveTableScanPrepare extends App {
        val sparkContext = new SparkContext(
          new SparkConf()
            .setMaster("local")
            .setAppName(getClass.getSimpleName.stripSuffix("$")))
      
        val hiveContext = new LocalHiveContext(sparkContext)
      
        import hiveContext._
      
        hql("drop table scan_csv")
        hql("drop table scan_rcfile")
      
        hql("""create table scan_csv (key int, value string)
              |  row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
              |  with serdeproperties ('field.delim'=',')
            """.stripMargin)
      
        hql(s"""load data local inpath "${args(0)}" into table scan_csv""")
      
        hql("""create table scan_rcfile (key int, value string)
              |  row format serde 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
              |stored as
              |  inputformat 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
              |  outputformat 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
            """.stripMargin)
      
        hql(
          """
            |from scan_csv
            |insert overwrite table scan_rcfile
            |select scan_csv.key, scan_csv.value
          """.stripMargin)
      }
      ```
      
      Benchmark code:
      
      ```scala
      package org.apache.spark.examples.sql.hive
      
      import org.apache.spark.sql.hive.LocalHiveContext
      import org.apache.spark.{SparkConf, SparkContext}
      
      object HiveTableScanBenchmark extends App {
        val sparkContext = new SparkContext(
          new SparkConf()
            .setMaster("local")
            .setAppName(getClass.getSimpleName.stripSuffix("$")))
      
        val hiveContext = new LocalHiveContext(sparkContext)
      
        import hiveContext._
      
        val scanCsv = hql("select key from scan_csv")
        val scanRcfile = hql("select key from scan_rcfile")
      
        val csvDuration = benchmark(scanCsv.count())
        val rcfileDuration = benchmark(scanRcfile.count())
      
        println(s"CSV: $csvDuration ms, RCFile: $rcfileDuration ms")
      
        def benchmark(f: => Unit) = {
          val begin = System.currentTimeMillis()
          f
          val end = System.currentTimeMillis()
          end - begin
        }
      }
      ```
      
      @marmbrus Please help review, thanks!
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #758 from liancheng/fastHiveTableScan and squashes the following commits:
      
      4241a19 [Cheng Lian] Distinguishes sorted and possibly not sorted operations more accurately in HiveComparisonTest
      cf640d8 [Cheng Lian] More HiveTableScan optimisations:
      bf0e7dc [Cheng Lian] Added SortedOperation pattern to match *some* definitely sorted operations and avoid some sorting cost in HiveComparisonTest.
      6d1c642 [Cheng Lian] Using ColumnProjectionUtils to optimise RCFile and ORC column pruning
      eb62fd3 [Cheng Lian] [SPARK-1368] Optimized HiveTableScan
    • SPARK-1935: Explicitly add commons-codec 1.5 as a dependency. · 60b89fe6
      Yin Huai authored
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #889 from yhuai/SPARK-1935 and squashes the following commits:
      
      7d50ef1 [Yin Huai] Explicitly add commons-codec 1.5 as a dependency.
    • Added doctest and method description in context.py · 9cff1dd2
      Jyotiska NK authored
      Added doctest for method textFile and description for methods _initialize_context and _ensure_initialized in context.py
      
      Author: Jyotiska NK <jyotiska123@gmail.com>
      
      Closes #187 from jyotiska/pyspark_context and squashes the following commits:
      
      356f945 [Jyotiska NK] Added doctest for textFile method in context.py
      5b23686 [Jyotiska NK] Updated context.py with method descriptions
  5. May 28, 2014
    • [SPARK-1712]: TaskDescription instance is too big causes Spark to hang · 4dbb27b0
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #694 from witgo/SPARK-1712_new and squashes the following commits:
      
      0f52483 [witgo] review commit
      83ce29b [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      52e6752 [witgo] reset test SparkContext
      63636b6 [witgo] review commit
      44a59ee [witgo] review commit
      3b6d48c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      926bd6a [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      9a5cfad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      03cc562 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      b0930b0 [witgo] review commit
      b1174bd [witgo] merge master
      f76679b [witgo] merge master
      689495d [witgo] fix scala style bug
      1d35c3c [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      062c182 [witgo] fix small bug for code style
      0a428cf [witgo] add unit tests
      158b2dc [witgo] review commit
      4afe71d [witgo] review commit
      9e4ffa7 [witgo] review commit
      1d35c7d [witgo] fix hang
      7965580 [witgo] fix Statement order
      0e29eac [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      3ea1ca1 [witgo] remove duplicate serialize
      743a7ad [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      86e2048 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1712_new
      2a89adc [witgo] SPARK-1712: TaskDescription instance is too big causes Spark to hang
    • Spark 1916 · 4312cf0b
      David Lemieux authored
      
      The changes could be ported back to 0.9 as well.
      Changing in.read to in.readFully to read the whole input stream rather than the first 1020 bytes.
      This should be ok considering that Flume caps the body size to 32K by default.
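
      The distinction at the heart of the fix, as a small sketch: read() may return after fewer bytes than requested, while readFully() keeps reading until the buffer is filled.

      ```scala
      import java.io.{ByteArrayInputStream, DataInputStream}

      val payload = new Array[Byte](4096) // stands in for a Flume event body
      val in = new DataInputStream(new ByteArrayInputStream(payload))
      val buf = new Array[Byte](payload.length)
      // in.read(buf) may legally stop early (e.g. after the first 1020 bytes);
      // readFully(buf) returns only once all buf.length bytes have been read.
      in.readFully(buf)
      ```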
      
      Author: David Lemieux <david.lemieux@radialpoint.com>
      
      Closes #865 from lemieud/SPARK-1916 and squashes the following commits:
      
      a265673 [David Lemieux] Updated SparkFlumeEvent to read the whole stream rather than the first X bytes.
      (cherry picked from commit 0b769b73)
      
      Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    • Organize configuration docs · 7801d44f
      Patrick Wendell authored
      This PR improves and organizes the config option page
      and makes a few other changes to config docs. See a preview here:
      http://people.apache.org/~pwendell/config-improvements/configuration.html
      
      The biggest changes are:
      1. The configs for the standalone master/workers were moved to the
      standalone page and out of the general config doc.
      2. SPARK_LOCAL_DIRS was missing from the standalone docs.
      3. Expanded discussion of injecting configs with spark-submit, including an
      example.
      4. Config options were organized into the following categories:
      - Runtime Environment
      - Shuffle Behavior
      - Spark UI
      - Compression and Serialization
      - Execution Behavior
      - Networking
      - Scheduling
      - Security
      - Spark Streaming
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #880 from pwendell/config-cleanup and squashes the following commits:
      
      93f56c3 [Patrick Wendell] Feedback from Matei
      6f66efc [Patrick Wendell] More feedback
      16ae776 [Patrick Wendell] Adding back header section
      d9c264f [Patrick Wendell] Small fix
      e0c1728 [Patrick Wendell] Response to Matei's review
      27d57db [Patrick Wendell] Reverting changes to index.html (covered in #896)
      e230ef9 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a374369 [Patrick Wendell] Line wrapping fixes
      fdff7fc [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      3289ea4 [Patrick Wendell] Pulling in changes from #856
      106ee31 [Patrick Wendell] Small link fix
      f7e79bc [Patrick Wendell] Re-organizing config options.
      54b184d [Patrick Wendell] Adding standalone configs to the standalone page
      592e94a [Patrick Wendell] Stash
      29b5446 [Patrick Wendell] Better discussion of spark-submit in configuration docs
      2d719ef [Patrick Wendell] Small fix
      4af9e07 [Patrick Wendell] Adding SPARK_LOCAL_DIRS docs
      204b248 [Patrick Wendell] Small fixes
    • Fix doc about NetworkWordCount/JavaNetworkWordCount usage of spark streaming · 82eadc3b
      jmu authored
      Usage: NetworkWordCount <master> <hostname> <port>
      -->
      Usage: NetworkWordCount <hostname> <port>
      
      Usage: JavaNetworkWordCount <master> <hostname> <port>
      -->
      Usage: JavaNetworkWordCount <hostname> <port>
      
      Author: jmu <jmujmu@gmail.com>
      
      Closes #826 from jmu/master and squashes the following commits:
      
      9fb7980 [jmu] Merge branch 'master' of https://github.com/jmu/spark
      b9a6b02 [jmu] Fix doc for NetworkWordCount/JavaNetworkWordCount Usage: NetworkWordCount <master> <hostname> <port> --> Usage: NetworkWordCount <hostname> <port>
    • [SPARK-1938] [SQL] ApproxCountDistinctMergeFunction should return Int value. · 9df86835
      Takuya UESHIN authored
      `ApproxCountDistinctMergeFunction` should return `Int` value because the `dataType` of `ApproxCountDistinct` is `IntegerType`.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #893 from ueshin/issues/SPARK-1938 and squashes the following commits:
      
      3970e88 [Takuya UESHIN] Remove a superfluous line.
      5ad7ec1 [Takuya UESHIN] Make dataType for each of CountDistinct, ApproxCountDistinctMerge and ApproxCountDistinct LongType.
      cbe7c71 [Takuya UESHIN] Revert a change.
      fc3ac0f [Takuya UESHIN] Fix evaluated value type of ApproxCountDistinctMergeFunction to Int.
  6. May 26, 2014
    • Updated dev Python scripts to make them PEP8 compliant. · 9ed37190
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #875 from rxin/pep8-dev-scripts and squashes the following commits:
      
      04b084f [Reynold Xin] Made dev Python scripts PEP8 compliant.
    • SPARK-1929 DAGScheduler suspended by local task OOM · 8d271c90
      Zhen Peng authored
      DAGScheduler does not handle local task OOM properly, and will wait for the job result forever.
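
      A generic sketch of the fix's shape (names assumed; not the DAGScheduler code): a fatal error thrown by the locally-run task must be reported back rather than silently killing the thread.

      ```scala
      // Run the task body and surface *any* throwable, including
      // OutOfMemoryError, as a job failure so nothing waits forever.
      def runLocalTask(body: => Unit)(reportFailure: Throwable => Unit): Unit =
        try body catch {
          case t: Throwable => reportFailure(t)
        }
      ```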
      
      Author: Zhen Peng <zhenpeng01@baidu.com>
      
      Closes #883 from zhpengg/bugfix-dag-scheduler-oom and squashes the following commits:
      
      76f7eda [Zhen Peng] remove redundant memory allocations
      aa63161 [Zhen Peng] SPARK-1929 DAGScheduler suspended by local task OOM
    • [SPARK-1931] Reconstruct routing tables in Graph.partitionBy · 56c771cb
      Ankur Dave authored
      905173df introduced a bug in partitionBy where, after repartitioning the edges, it reuses the VertexRDD without updating the routing tables to reflect the new edge layout. Subsequent accesses of the triplets contain nulls for many vertex properties.
      
      This commit adds a test for this bug and fixes it by introducing `VertexRDD#withEdges` and calling it in `partitionBy`.
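
      An assumed repro shape for the symptom (not the committed test):

      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.graphx._

      val sc = new SparkContext("local", "partitionBy-check")
      val graph = Graph
        .fromEdgeTuples(sc.parallelize(Seq((1L, 2L), (2L, 3L))), defaultValue = "v")
        .partitionBy(PartitionStrategy.EdgePartition2D)
      // Before the fix, stale routing tables could surface null attributes here.
      graph.triplets.collect().foreach(t => assert(t.srcAttr != null && t.dstAttr != null))
      ```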
      
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #885 from ankurdave/SPARK-1931 and squashes the following commits:
      
      3930cdd [Ankur Dave] Note how to set up VertexRDD for efficient joins
      9bdbaa4 [Ankur Dave] [SPARK-1931] Reconstruct routing tables in Graph.partitionBy
    • SPARK-1925: Replace '&' with '&&' · cb7fe503
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1925
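
      For context, a minimal illustration of why the distinction matters: `&` on Booleans evaluates both operands, while `&&` short-circuits.

      ```scala
      def nonEmptyShortCircuit(s: String) = s != null && s.length > 0 // safe when s == null
      def nonEmptyEager(s: String)        = s != null &  s.length > 0 // throws NPE when s == null
      ```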
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #879 from zsxwing/SPARK-1925 and squashes the following commits:
      
      5cf5a6d [zsxwing] SPARK-1925: Replace '&' with '&&'
    • Fix scalastyle warnings in yarn alpha · bee6c4f4
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #884 from witgo/scalastyle and squashes the following commits:
      
      4b08ae4 [witgo] Fix scalastyle warnings in yarn alpha