  1. Nov 10, 2014
    • [SPARK-4312] bash doesn't have "die" · c5db8e2c
      Jey Kottalam authored
      sbt-launch-lib.bash includes a `die` command, but `die` is not a valid command on Linux, Mac OS X, or Windows.
      
      Closes #2898
      
      Author: Jey Kottalam <jey@kottalam.net>
      
      Closes #3182 from sarutak/SPARK-4312 and squashes the following commits:
      
      24c6677 [Jey Kottalam] bash doesn't have "die"
    • Update RecoverableNetworkWordCount.scala · 0340c56a
      comcmipi authored
      While trying this example, I missed the moment when the checkpoint was initiated.
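
      For anyone else hitting this, a minimal sketch of where checkpointing is initiated in this example (the directory path is illustrative):

      ```scala
      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      def createContext(checkpointDir: String): StreamingContext = {
        val conf = new SparkConf().setAppName("RecoverableNetworkWordCount")
        val ssc = new StreamingContext(conf, Seconds(1))
        ssc.checkpoint(checkpointDir) // checkpointing is initiated here, on first creation
        ssc
      }

      // On restart, getOrCreate recovers the context from the checkpoint
      // instead of calling createContext again.
      val ssc = StreamingContext.getOrCreate("/tmp/checkpoint",
        () => createContext("/tmp/checkpoint"))
      ```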
      
      Author: comcmipi <pitonak@fns.uniba.sk>
      
      Closes #2735 from comcmipi/patch-1 and squashes the following commits:
      
      b6d8001 [comcmipi] Update RecoverableNetworkWordCount.scala
      96fe274 [comcmipi] Update RecoverableNetworkWordCount.scala
    • SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing · 3a02d416
      Sean Owen authored
      Here's my attempt to re-port `RecoverableNetworkWordCount` to Java, following the example of its Scala and Java siblings. I believe I also fixed a few minor doc/formatting issues along the way.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2564 from srowen/SPARK-2548 and squashes the following commits:
      
      0d0bf29 [Sean Owen] Update checkpoint call as in https://github.com/apache/spark/pull/2735
      35f23e3 [Sean Owen] Remove old comment about running in standalone mode
      179b3c2 [Sean Owen] Re-port RecoverableNetworkWordCount to Java example, and touch up doc / formatting in related examples
    • [SPARK-4169] [Core] Accommodate non-English Locales in unit tests · ed8bf1ea
      Niklas Wilcke authored
      For me, the core tests failed because there are two locale-dependent parts in the code.
      See the JIRA ticket for details.
      
      Why is it necessary to check the exception message in isBindCollision in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1686?
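
      For context, a sketch of the kind of locale-dependent behavior that breaks tests on non-English machines (illustrative, not the exact code from the ticket):

      ```scala
      import java.util.Locale

      // Under the Turkish locale, toLowerCase maps 'I' to dotless 'ı', so
      // case-insensitive string matching can fail on non-English machines.
      val turkish = new Locale("tr", "TR")
      println("I".toLowerCase(turkish))              // "ı", not "i"

      // Pinning an explicit locale makes formatting deterministic in tests.
      println("%,d".formatLocal(Locale.US, 1000000)) // always "1,000,000"
      ```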
      
      Author: Niklas Wilcke <1wilcke@informatik.uni-hamburg.de>
      
      Closes #3036 from numbnut/core-test-fix and squashes the following commits:
      
      1fb0d04 [Niklas Wilcke] Fixing locale-dependent code and tests
    • [SQL] support udt to hive types conversion (hive->udt is not supported) · 894a7245
      Xiangrui Meng authored
      marmbrus
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #3164 from mengxr/hive-udt and squashes the following commits:
      
      57c7519 [Xiangrui Meng] support udt->hive types (hive->udt is not supported)
    • [SPARK-2703][Core]Make Tachyon related unit tests execute without deploying a... · bd86cb17
      RongGu authored
      [SPARK-2703][Core] Make Tachyon related unit tests execute without deploying a Tachyon system locally.
      
      Author: RongGu <gurongwalker@gmail.com>
      
      Closes #3030 from RongGu/SPARK-2703 and squashes the following commits:
      
      ad08827 [RongGu] Make Tachyon related unit tests execute without deploying a Tachyon system locally
    • MAINTENANCE: Automated closing of pull requests. · 227488d8
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #2898 (close requested by 'pwendell')
      Closes #2212 (close requested by 'pwendell')
      Closes #2102 (close requested by 'pwendell')
    • SPARK-3179. Add task OutputMetrics. · 3c2cff4b
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #2968 from sryza/sandy-spark-3179 and squashes the following commits:
      
      dce4784 [Sandy Ryza] More review feedback
      8d350d1 [Sandy Ryza] Fix test against Hadoop 2.5+
      e7c74d0 [Sandy Ryza] More review feedback
      6cff9c4 [Sandy Ryza] Review feedback
      fb2dde0 [Sandy Ryza] SPARK-3179
    • SPARK-1209 [CORE] (Take 2) SparkHadoop{MapRed,MapReduce}Util should not use... · f8e57323
      Sean Owen authored
      SPARK-1209 [CORE] (Take 2) SparkHadoop{MapRed,MapReduce}Util should not use package org.apache.hadoop
      
      andrewor14 Another try at SPARK-1209, to address https://github.com/apache/spark/pull/2814#issuecomment-61197619
      
      I successfully tested with `mvn -Dhadoop.version=1.0.4 -DskipTests clean package; mvn -Dhadoop.version=1.0.4 test`. I assume that is what failed Jenkins last time. I also tried `-Dhadoop.version=1.2.1` and `-Phadoop-2.4 -Pyarn -Phive` for more coverage.
      
      So this is why the class was put in `org.apache.hadoop` to begin with, I assume. One option is to leave this as-is for now and move it only when Hadoop 1.0.x support goes away.
      
      This is the other option, which adds a call to force the constructor to be public at run-time. It's probably less surprising than putting Spark code in `org.apache.hadoop`, but, does involve reflection. A `SecurityManager` might forbid this, but it would forbid a lot of stuff Spark does. This would also only affect Hadoop 1.0.x it seems.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3048 from srowen/SPARK-1209 and squashes the following commits:
      
      0d48f4b [Sean Owen] For Hadoop 1.0.x, make certain constructors public, which were public in later versions
      466e179 [Sean Owen] Disable MIMA warnings resulting from moving the class -- this was also part of the PairRDDFunctions type hierarchy though?
      eb61820 [Sean Owen] Move SparkHadoopMapRedUtil / SparkHadoopMapReduceUtil from org.apache.hadoop to org.apache.spark
  2. Nov 09, 2014
    • MAINTENANCE: Automated closing of pull requests. · f73b56f5
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #464 (close requested by 'JoshRosen')
      Closes #283 (close requested by 'pwendell')
      Closes #449 (close requested by 'pwendell')
      Closes #907 (close requested by 'pwendell')
      Closes #2478 (close requested by 'JoshRosen')
      Closes #2192 (close requested by 'tdas')
      Closes #918 (close requested by 'pwendell')
      Closes #1465 (close requested by 'pwendell')
      Closes #3135 (close requested by 'JoshRosen')
      Closes #1693 (close requested by 'tdas')
      Closes #1279 (close requested by 'pwendell')
    • SPARK-1344 [DOCS] Scala API docs for top methods · d1362659
      Sean Owen authored
      Use "k" in javadoc of top and takeOrdered to avoid confusion with type K in pair RDDs. I think this resolves the discussion in SPARK-1344.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3168 from srowen/SPARK-1344 and squashes the following commits:
      
      6963fcc [Sean Owen] Use "k" in javadoc of top and takeOrdered to avoid confusion with type K in pair RDDs
    • SPARK-971 [DOCS] Link to Confluence wiki from project website / documentation · 8c99a47a
      Sean Owen authored
      This is a trivial change to add links to the wiki from `README.md` and the main docs page. It is already linked to from spark.apache.org.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3169 from srowen/SPARK-971 and squashes the following commits:
      
      dcb84d0 [Sean Owen] Add link to wiki from README, docs home page
  3. Nov 08, 2014
    • [SPARK-4301] StreamingContext should not allow start() to be called after calling stop() · 7b41b17f
      Josh Rosen authored
      In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op with no side-effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe this almost always indicates an error and is not behavior we should support. Since we don't allow `start() stop() start()`, I don't think it makes sense to allow `stop() start()`.
      
      The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures.
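
      A minimal sketch of the semantics this patch enforces (the exception type is my reading of the change, not quoted from it):

      ```scala
      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      val conf = new SparkConf().setAppName("test").setMaster("local[2]")
      val ssc = new StreamingContext(conf, Seconds(1))

      ssc.stop(stopSparkContext = true) // stops the underlying SparkContext, started or not
      ssc.start()                       // now fails (IllegalStateException) instead of silently starting
      ```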
      
      Prior discussions:
      - https://github.com/apache/spark/pull/3053#discussion-diff-19710333R490
      - https://github.com/apache/spark/pull/3121#issuecomment-61927353
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #3160 from JoshRosen/SPARK-4301 and squashes the following commits:
      
      dbcc929 [Josh Rosen] Address more review comments
      bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before.
      03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called.
      832a7f4 [Josh Rosen] Address review comment
      5142517 [Josh Rosen] Add tests; improve Scaladoc.
      813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
      5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet.
    • [Minor] [Core] Don't NPE on closeQuietly(null) · 4af5c7e2
      Aaron Davidson authored
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3166 from aarondav/closeQuietlyer and squashes the following commits:
      
      78096b5 [Aaron Davidson] Don't NPE on closeQuietly(null)
    • [SPARK-4291][Build] Rename network module projects · 7afc8564
      Andrew Or authored
      The names of the recently introduced network modules are inconsistent with those of the other modules in the project. We should just drop the "Code" suffix since it doesn't sacrifice any meaning, especially before they get into an official release.
      
      ```
      [INFO] Reactor Build Order:
      [INFO]
      [INFO] Spark Project Parent POM
      [INFO] Spark Project Common Network Code
      [INFO] Spark Project Shuffle Streaming Service Code
      [INFO] Spark Project Core
      [INFO] Spark Project Bagel
      [INFO] Spark Project GraphX
      [INFO] Spark Project Streaming
      [INFO] Spark Project Catalyst
      [INFO] Spark Project SQL
      [INFO] Spark Project ML Library
      [INFO] Spark Project Tools
      [INFO] Spark Project Hive
      [INFO] Spark Project REPL
      [INFO] Spark Project YARN Parent POM
      [INFO] Spark Project YARN Stable API
      [INFO] Spark Project Assembly
      [INFO] Spark Project External Twitter
      [INFO] Spark Project External Kafka
      [INFO] Spark Project External Flume Sink
      [INFO] Spark Project External Flume
      [INFO] Spark Project External ZeroMQ
      [INFO] Spark Project External MQTT
      [INFO] Spark Project Examples
      [INFO] Spark Project Yarn Shuffle Service Code
      ```
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #3148 from andrewor14/build-drop-code and squashes the following commits:
      
      eac839b [Andrew Or] Network -> Networking
      d01ad47 [Andrew Or] Rename network module project names
    • [MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API · 7e9d9756
      Michelangelo D'Agostino authored
      SPARK-1553 added alternating nonnegative least squares to MLlib; however, it was not possible to access it via the Python API. This pull request resolves that.
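
      For reference, a sketch of the JVM-side setter-style API the squash notes mention (parameter values are illustrative):

      ```scala
      import org.apache.spark.mllib.recommendation.{ALS, Rating}
      import org.apache.spark.rdd.RDD

      def trainNonnegative(ratingsRDD: RDD[Rating]) =
        new ALS()
          .setRank(10)
          .setIterations(10)
          .setNonnegative(true) // the flag this PR makes reachable from Python
          .setSeed(42L)
          .run(ratingsRDD)
      ```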
      
      Author: Michelangelo D'Agostino <mdagostino@civisanalytics.com>
      
      Closes #3095 from mdagost/python_nmf and squashes the following commits:
      
      a6743ad [Michelangelo D'Agostino] Use setters instead of static methods in PythonMLLibAPI.  Remove the new static methods I added.  Set seed in tests.  Change ratings to ratingsRDD in both train and trainImplicit for consistency.
      7cffd39 [Michelangelo D'Agostino] Swapped nonnegative and seed in a few more places.
      3fdc851 [Michelangelo D'Agostino] Moved seed to the end of the python parameter list.
      bdcc154 [Michelangelo D'Agostino] Change seed type to java.lang.Long so that it can handle null.
      cedf043 [Michelangelo D'Agostino] Added in ability to set the seed from python and made that play nice with the nonnegative changes.  Also made the python ALS tests more exact.
      a72fdc9 [Michelangelo D'Agostino] Expose nonnegative ALS in the python API.
  4. Nov 07, 2014
    • [SPARK-4304] [PySpark] Fix sort on empty RDD · 77791097
      Davies Liu authored
      This PR fixes sortBy()/sortByKey() on an empty RDD.

      This should be backported into 1.1/1.2.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3162 from davies/fix_sort and squashes the following commits:
      
      84f64b7 [Davies Liu] add tests
      52995b5 [Davies Liu] fix sortByKey() on empty RDD
    • MAINTENANCE: Automated closing of pull requests. · 5923dd98
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #3016 (close requested by 'andrewor14')
      Closes #2798 (close requested by 'andrewor14')
      Closes #2864 (close requested by 'andrewor14')
      Closes #3154 (close requested by 'JoshRosen')
      Closes #3156 (close requested by 'JoshRosen')
      Closes #214 (close requested by 'kayousterhout')
      Closes #2584 (close requested by 'andrewor14')
    • Update JavaCustomReceiver.java · 7c9ec529
      xiao321 authored
      Array index out of bounds.
      
      Author: xiao321 <1042460381@qq.com>
      
      Closes #3153 from xiao321/patch-1 and squashes the following commits:
      
      0ed17b5 [xiao321] Update JavaCustomReceiver.java
    • [SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC · d6e55524
      wangfei authored
      Running `select * from src` returns the wrong result set, as follows:
      ```
      ...
      | 309  | val_309  |
      | 309  | val_309  |
      | 309  | val_309  |
      | 309  | val_309  |
      | 309  | val_309  |
      | 309  | val_309  |
      | 309  | val_309  |
      | 309  | val_309  |
      | 309  | val_309  |
      | 309  | val_309  |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      | 97   | val_97   |
      ...
      
      ```
      
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #3149 from scwf/SPARK-4292 and squashes the following commits:
      
      1574a43 [wangfei] using result.collect
      8b2d845 [wangfei] adding test
      f64eddf [wangfei] result set iter bug
    • [SPARK-4203][SQL] Partition directories in random order when inserting into hive table · ac70c972
      Matthew Taylor authored
      When doing an insert into a Hive table with partitions, the folders written to the file system are in a random order instead of the order defined in the table creation. It seems that the loadPartition method in Hive.java takes a Map<String, String> parameter but expects to be called with a map that has a defined ordering, such as LinkedHashMap. Working on a test, but having IntelliJ problems.
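
      To illustrate the ordering point, a standalone sketch (not the Hive code path itself):

      ```scala
      import java.util.{HashMap => JHashMap, LinkedHashMap => JLinkedHashMap}

      object PartitionOrder extends App {
        val partSpec = Seq("year" -> "2014", "month" -> "11", "day" -> "07")

        val plain = new JHashMap[String, String]()
        partSpec.foreach { case (k, v) => plain.put(k, v) }
        // Iteration order is unspecified, so partition directories can come
        // out as e.g. month=11/day=07/year=2014.
        println(plain.keySet())

        val linked = new JLinkedHashMap[String, String]()
        partSpec.foreach { case (k, v) => linked.put(k, v) }
        // Iteration order matches insertion order: year, month, day.
        println(linked.keySet())
      }
      ```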
      
      Author: Matthew Taylor <matthew.t@tbfe.net>
      
      Closes #3076 from tbfenet/partition_dir_order_problem and squashes the following commits:
      
      f1b9a52 [Matthew Taylor] Comment format fix
      bca709f [Matthew Taylor] review changes
      0e50f6b [Matthew Taylor] test fix
      99f1a31 [Matthew Taylor] partition ordering fix
      369e618 [Matthew Taylor] partition ordering fix
    • [SPARK-4270][SQL] Fix Cast from DateType to DecimalType. · a6405c5d
      Takuya UESHIN authored
      `Cast` from `DateType` to `DecimalType` throws `NullPointerException`.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #3134 from ueshin/issues/SPARK-4270 and squashes the following commits:
      
      7394e4b [Takuya UESHIN] Fix Cast from DateType to DecimalType.
    • [SPARK-4272] [SQL] Add more unwrapper functions for primitive type in TableReader · 60ab80f5
      Cheng Hao authored
      Currently, the data "unwrap" only supports a couple of primitive types, not all of them. This will not cause exceptions, but it may cost some performance in table scanning for types like binary, date, timestamp, decimal, etc.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3136 from chenghao-intel/table_reader and squashes the following commits:
      
      fffb729 [Cheng Hao] fix bug for retrieving the timestamp object
      e9c97a4 [Cheng Hao] Add more unwrapper functions for primitive type in TableReader
    • [SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators · 14c54f18
      Kousuke Saruta authored
      The following description is quoted from the JIRA:
      
      When I issue a hql query against a HiveContext where my predicate uses a column of string type with one of LT, LTE, GT, or GTE operator, I get the following error:
      scala.MatchError: StringType (of class org.apache.spark.sql.catalyst.types.StringType$)
      Looking at the code in org.apache.spark.sql.parquet.ParquetFilters, StringType is absent from the corresponding functions for creating these filters.
      To reproduce, in a Hive 0.13.1 shell, I created the following table (at a specified DB):
      
          create table sparkbug (
          id int,
          event string
          ) stored as parquet;
      
      Insert some sample data:
      
          insert into table sparkbug select 1, '2011-06-18' from <some table> limit 1;
          insert into table sparkbug select 2, '2012-01-01' from <some table> limit 1;
      
      Launch a spark shell and create a HiveContext to the metastore where the table above is located.
      
          import org.apache.spark.sql._
          import org.apache.spark.sql.SQLContext
          import org.apache.spark.sql.hive.HiveContext
          val hc = new HiveContext(sc)
          hc.setConf("spark.sql.shuffle.partitions", "10")
          hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")
          hc.setConf("spark.sql.parquet.compression.codec", "snappy")
          import hc._
          hc.hql("select * from <db>.sparkbug where event >= '2011-12-01'")
      
      A scala.MatchError will appear in the output.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3083 from sarutak/SPARK-4213 and squashes the following commits:
      
      4ab6e56 [Kousuke Saruta] WIP
      b6890c6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4213
      9a1fae7 [Kousuke Saruta] Fixed ParquetFilters so that compare Strings
    • [SQL] Modify keyword val location according to ordering · 68609c51
      Jacky Li authored
      'DOUBLE' should be moved before 'ELSE' according to the ordering convention
      
      Author: Jacky Li <jacky.likun@gmail.com>
      
      Closes #3080 from jackylk/patch-5 and squashes the following commits:
      
      3c11df7 [Jacky Li] [SQL] Modify keyword val location according to ordering
    • [SQL] Support ScalaReflection of schema in different universes · 8154ed7d
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #3096 from marmbrus/reflectionContext and squashes the following commits:
      
      adc221f [Michael Armbrust] Support ScalaReflection of schema in different universes
    • [SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version · 86e9eaa3
      Cheng Lian authored
      This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in the assembly jar to inspect Spark version. Currently, when built with Maven, the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 MANIFEST.MF, probably because of the assembly/shading tricks.
      
      Another related PR is #3103, which tries to fix the MANIFEST issue.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #3105 from liancheng/spark-4225 and squashes the following commits:
      
      d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version
    • [SQL][DOC][Minor] Spark SQL Hive now support dynamic partitioning · 636d7bcc
      wangfei authored
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #3127 from scwf/patch-9 and squashes the following commits:
      
      e39a560 [wangfei] now support dynamic partitioning
    • [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages · d4fa04e5
      Aaron Davidson authored
      This PR eliminates the network package's usage of the Java serializer and replaces it with Encodable, which is a lightweight binary protocol. Each message is preceded by a type id, which will allow us to change messages (by only adding new ones), or to change the format entirely by switching to a special id (such as -1).
      
      This protocol has the advantage over Java that we can guarantee that messages will remain compatible across compiled versions and JVMs, though it does not provide a clean way to do schema migration. In the future, it may be good to use a more heavy-weight serialization format like protobuf, thrift, or avro, but these all add several dependencies which are unnecessary at the present time.
      
      Additionally this unifies the RPC messages of NettyBlockTransferService and ExternalShuffleClient.
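
      A toy sketch of the type-id-prefixed framing described above (the message shape and names are invented for illustration; see the PR for the real Encodable messages):

      ```scala
      import java.nio.ByteBuffer

      // Hypothetical one-field message, framed as [typeId][length][bytes].
      case class OpenBlocks(appId: String) {
        def encode(): ByteBuffer = {
          val bytes = appId.getBytes("UTF-8")
          val buf = ByteBuffer.allocate(1 + 4 + bytes.length)
          buf.put(0.toByte).putInt(bytes.length).put(bytes)
          buf.flip()
          buf
        }
      }

      def decode(buf: ByteBuffer): OpenBlocks = buf.get() match {
        case 0 =>
          val bytes = new Array[Byte](buf.getInt())
          buf.get(bytes)
          OpenBlocks(new String(bytes, "UTF-8"))
        case -1 => sys.error("reserved: would signal an entirely new format")
        case id => sys.error("unknown message type " + id)
      }
      ```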
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3146 from aarondav/free and squashes the following commits:
      
      ed1102a [Aaron Davidson] Remove some unused imports
      b8e2a49 [Aaron Davidson] Add appId to test
      538f2a3 [Aaron Davidson] [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages
  5. Nov 06, 2014
    • [SPARK-4204][Core][WebUI] Change Utils.exceptionString to contain the inner... · 3abdb1b2
      zsxwing authored
      [SPARK-4204][Core][WebUI] Change Utils.exceptionString to contain the inner exceptions and make the error information in Web UI more friendly
      
      This PR fixes `Utils.exceptionString` to output the full exception information. However, the stack trace can become very large, so I also updated the Web UI to collapse the error information by default (display the first line; clicking `+detail` displays the full info).
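
      In essence, the fix formats the whole cause chain rather than only the top-level stack trace; a minimal sketch of that idea (not the exact Utils code):

      ```scala
      import java.io.{PrintWriter, StringWriter}

      // printStackTrace walks every "Caused by" link, so writing it into a
      // buffer captures the full chain shown in the screenshots below.
      def exceptionString(e: Throwable): String = {
        val sw = new StringWriter()
        e.printStackTrace(new PrintWriter(sw))
        sw.toString
      }
      ```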
      
      Here are the screenshots:
      
      Stages:
      ![stages](https://cloud.githubusercontent.com/assets/1000778/4882441/66d8cc68-6356-11e4-8346-6318677d9470.png)
      
      Details for one stage:
      ![stage](https://cloud.githubusercontent.com/assets/1000778/4882513/1311043c-6357-11e4-8804-ca14240a9145.png)
      
      The full information in the gray text field is:
      ```Java
      org.apache.spark.shuffle.FetchFailedException: Connection reset by peer
      	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
      	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
      	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
      	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      	at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
      	at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:160)
      	at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
      	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
      	at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:159)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
      	at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
      	at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
      	at org.apache.spark.scheduler.Task.run(Task.scala:56)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: java.io.IOException: Connection reset by peer
      	at sun.nio.ch.FileDispatcher.read0(Native Method)
      	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
      	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
      	at sun.nio.ch.IOUtil.read(IOUtil.java:166)
      	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
      	at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
      	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
      	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
      	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
      	... 1 more
      ```
      
      /cc aarondav
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3073 from zsxwing/SPARK-4204 and squashes the following commits:
      
      176d1e3 [zsxwing] Add comments to explain the stack trace difference
      ca509d3 [zsxwing] Add fullStackTrace to the constructor of ExceptionFailure
      a07057b [zsxwing] Core style fix
      dfb0032 [zsxwing] Backward compatibility for old history server
      1e50f71 [zsxwing] Update as per review and increase the max height of the stack trace details
      94f2566 [zsxwing] Change Utils.exceptionString to contain the inner exceptions and make the error information in Web UI more friendly
    • [SPARK-4236] Cleanup removed applications' files in shuffle service · 48a19a6d
      Aaron Davidson authored
      This relies on a hook from whoever is hosting the shuffle service to invoke removeApplication() when the application is completed. Once invoked, we will clean up all the executors' shuffle directories we know about.
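
      A rough sketch of that hook (all names here are assumptions for illustration, not the real shuffle service API):

      ```scala
      import java.io.File

      case class ExecutorShuffleInfo(appId: String, localDirs: Seq[String])

      def deleteRecursively(f: File): Unit = {
        Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
        f.delete()
      }

      // Invoked by whoever hosts the shuffle service once the app completes:
      // remove every shuffle directory registered for that application.
      def applicationRemoved(appId: String, executors: Seq[ExecutorShuffleInfo]): Unit =
        executors.filter(_.appId == appId)
          .flatMap(_.localDirs)
          .foreach(dir => deleteRecursively(new File(dir)))
      ```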
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3126 from aarondav/cleanup and squashes the following commits:
      
      33a64a9 [Aaron Davidson] Missing brace
      e6e428f [Aaron Davidson] Address comments
      16a0d27 [Aaron Davidson] Cleanup
      e4df3e7 [Aaron Davidson] [SPARK-4236] Cleanup removed applications' files in shuffle service
    • [SPARK-4188] [Core] Perform network-level retry of shuffle file fetches · f165b2bb
      Aaron Davidson authored
      This adds a RetryingBlockFetcher to the NettyBlockTransferService which is wrapped around our typical OneForOneBlockFetcher, adding retry logic in the event of an IOException.
      
      This sort of retry allows us to avoid marking an entire executor as failed due to garbage collection or high network load.
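
      The core retry idea as a minimal sketch (the bound and wait are illustrative parameters, not Spark's configuration):

      ```scala
      import java.io.IOException

      def fetchWithRetry[T](maxRetries: Int, waitMs: Long)(fetch: => T): T = {
        var attempt = 0
        while (true) {
          try {
            return fetch // succeed, or rethrow once retries are exhausted
          } catch {
            case _: IOException if attempt < maxRetries =>
              attempt += 1         // transient failure (GC pause, network load):
              Thread.sleep(waitMs) // wait briefly and try again
          }
        }
        throw new IllegalStateException("unreachable")
      }
      ```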
      
      TODO:
      - [x] unit tests
      - [x] put in ExternalShuffleClient too
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3101 from aarondav/retry and squashes the following commits:
      
      72a2a32 [Aaron Davidson] Add that we should remove the condition around the retry thingy
      c7fd107 [Aaron Davidson] Fix unit tests
      e80e4c2 [Aaron Davidson] Address initial comments
      6f594cd [Aaron Davidson] Fix unit test
      05ff43c [Aaron Davidson] Add to external shuffle client and add unit test
      66e5a24 [Aaron Davidson] [SPARK-4238] [Core] Perform network-level retry of shuffle file fetches
    • [SPARK-4277] Support external shuffle service on Standalone Worker · 6e9ef10f
      Aaron Davidson authored
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3142 from aarondav/worker and squashes the following commits:
      
      3780bd7 [Aaron Davidson] Address comments
      2dcdfc1 [Aaron Davidson] Add private[worker]
      47f49d3 [Aaron Davidson] NettyBlockTransferService shouldn't care about app ids (it's only b/t executors)
      258417c [Aaron Davidson] [SPARK-4277] Support external shuffle service on executor
    • [SPARK-3797] Minor addendum to Yarn shuffle service · 96136f22
      Andrew Or authored
      I did not realize there was a `network.util.JavaUtils` when I wrote this code. This PR moves the `ByteBuffer` string conversion to the appropriate place. I tested the changes on a stable yarn cluster.
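
      For reference, a sketch of what the moved conversion amounts to, following the squash notes (guava Charsets, netty Unpooled); the exact JavaUtils signatures are my assumption:

      ```scala
      import java.nio.ByteBuffer
      import com.google.common.base.Charsets
      import io.netty.buffer.Unpooled

      def stringToBytes(s: String): ByteBuffer =
        Unpooled.wrappedBuffer(s.getBytes(Charsets.UTF_8)).nioBuffer()

      def bytesToString(b: ByteBuffer): String =
        Unpooled.wrappedBuffer(b).toString(Charsets.UTF_8)
      ```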
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #3144 from andrewor14/yarn-shuffle-util and squashes the following commits:
      
      b6c08bf [Andrew Or] Remove unused import
      94e205c [Andrew Or] Use netty Unpooled
      85202a5 [Andrew Or] Use guava Charsets
      057135b [Andrew Or] Reword comment
      adf186d [Andrew Or] Move byte buffer String conversion logic to JavaUtils
    • [HOT FIX] Make distribution fails · 470881b2
      Andrew Or authored
      This was added by me in https://github.com/apache/spark/commit/61a5cced049a8056292ba94f23fa7bd040f50685. The real fix will be added in [SPARK-4281](https://issues.apache.org/jira/browse/SPARK-4281).
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #3145 from andrewor14/fix-make-distribution and squashes the following commits:
      
      c78be61 [Andrew Or] Hot fix make distribution
    • [SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx · d15c6e9d
      lianhuiwang authored
      At first srcIds is not initialized and its entries are all 0, so we use edgeArray(0).srcId to initialize currSrcId.
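
      A simplified sketch of the fix (names simplified from GraphX's EdgePartitionBuilder):

      ```scala
      case class Edge(srcId: Long, dstId: Long)

      // Build an index of (srcId, first position) runs over sorted edges.
      def buildIndex(edgeArray: Array[Edge]): Seq[(Long, Int)] = {
        if (edgeArray.isEmpty) return Seq.empty
        val index = collection.mutable.ArrayBuffer((edgeArray(0).srcId, 0))
        // The fix: seed currSrcId from edgeArray(0).srcId; the srcIds array
        // is not filled yet at this point, so its entries are all 0.
        var currSrcId = edgeArray(0).srcId
        for (i <- 1 until edgeArray.length) {
          if (edgeArray(i).srcId != currSrcId) {
            currSrcId = edgeArray(i).srcId
            index += ((currSrcId, i))
          }
        }
        index
      }
      ```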
      
      Author: lianhuiwang <lianhuiwang09@gmail.com>
      
      Closes #3138 from lianhuiwang/SPARK-4249 and squashes the following commits:
      
      3f4e503 [lianhuiwang] fix a problem of EdgePartitionBuilder in Graphx
    • [SPARK-4264] Completion iterator should only invoke callback once · 23eaf0e1
      Aaron Davidson authored
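
      A minimal sketch of the guarded-callback idea (simplified from the real CompletionIterator):

      ```scala
      // Wraps an iterator and runs `completion` exactly once when exhausted,
      // even if hasNext keeps being called at the end.
      class CompletionIterator[A](sub: Iterator[A], completion: () => Unit)
          extends Iterator[A] {
        private var completed = false
        def next(): A = sub.next()
        def hasNext: Boolean = {
          val r = sub.hasNext
          if (!r && !completed) {
            completed = true
            completion()
          }
          r
        }
      }
      ```
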
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3128 from aarondav/compiter and squashes the following commits:
      
      698e4be [Aaron Davidson] [SPARK-4264] Completion iterator should only invoke callback once
    • [SPARK-4186] add binaryFiles and binaryRecords in Python · b41a39e2
      Davies Liu authored
      add binaryFiles() and binaryRecords() in Python
      ```
      binaryFiles(self, path, minPartitions=None):
          :: Developer API ::
      
          Read a directory of binary files from HDFS, a local file system
          (available on all nodes), or any Hadoop-supported file system URI
          as a byte array. Each file is read as a single record and returned
          in a key-value pair, where the key is the path of each file, the
          value is the content of each file.
      
          Note: Small files are preferred; large files are also allowable, but
          may cause bad performance.
      
      binaryRecords(self, path, recordLength):
          Load data from a flat binary file, assuming each record is a set of numbers
          with the specified numerical format (see ByteBuffer), and the number of
          bytes per record is constant.
      
          :param path: Directory to the input data files
          :param recordLength: The length at which to split the records
      ```
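
      For comparison, the JVM-side calls these Python methods mirror (paths and record length are illustrative):

      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.input.PortableDataStream
      import org.apache.spark.rdd.RDD

      def readBinaries(sc: SparkContext): Unit = {
        // Whole files as (path, stream) pairs.
        val files: RDD[(String, PortableDataStream)] =
          sc.binaryFiles("hdfs:///data/blobs")
        // Fixed-length records as raw byte arrays.
        val records: RDD[Array[Byte]] =
          sc.binaryRecords("hdfs:///data/fixed.bin", recordLength = 16)
      }
      ```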
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3078 from davies/binary and squashes the following commits:
      
      cd0bdbd [Davies Liu] Merge branch 'master' of github.com:apache/spark into binary
      3aa349b [Davies Liu] add experimental notes
      24e84b6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into binary
      5ceaa8a [Davies Liu] Merge branch 'master' of github.com:apache/spark into binary
      1900085 [Davies Liu] bugfix
      bb22442 [Davies Liu] add binaryFiles and binaryRecords in Python
    • [SPARK-4255] Fix incorrect table striping · 5f27ae16
      Kay Ousterhout authored
      This commit stripes table rows after hiding some rows, to
      ensure that rows are correctly striped, alternating white
      and grey, even when some rows are hidden by default.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #3117 from kayousterhout/striping and squashes the following commits:
      
      be6e10a [Kay Ousterhout] [SPARK-4255] Fix incorrect table striping