  1. Oct 28, 2014
    • fix broken links in README.md · 4ceb048b
      Ryan Williams authored
      seems like `building-spark.html` was renamed to `building-with-maven.html`?
      
      Is Maven the blessed build tool these days, or SBT? I couldn't find a building-with-sbt page so I went with the Maven one here.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #2859 from ryan-williams/broken-links-readme and squashes the following commits:
      
      7692253 [Ryan Williams] fix broken links in README.md
    • [SPARK-4064]NioBlockTransferService.fetchBlocks may cause spark to hang. · 7c0c26cd
      GuoQiang Li authored
      cc @rxin
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2929 from witgo/SPARK-4064 and squashes the following commits:
      
      20110f2 [GuoQiang Li] Modify the exception msg
      3425225 [GuoQiang Li] review commits
      2b07e49 [GuoQiang Li] If we create a lot of big broadcast variables, Spark may hang
    • [SPARK-3907][SQL] Add truncate table support · 0c34fa5b
      wangxiaojing authored
      JIRA issue: [SPARK-3907]https://issues.apache.org/jira/browse/SPARK-3907
      
      Add truncate table support:
      TRUNCATE TABLE table_name [PARTITION partition_spec];
      partition_spec:
        : (partition_col = partition_col_value, partition_col = partition_col_value, ...)
      Removes all rows from a table or partition(s). Currently the target table must be a native/managed table, otherwise an exception is thrown. The user can specify a partial partition_spec to truncate multiple partitions at once; omitting partition_spec truncates all partitions in the table.
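
      For illustration, here is a minimal sketch of issuing the new statement through a HiveContext, assuming an existing SparkContext `sc`; the table and partition names are hypothetical:

      ```
      import org.apache.spark.sql.hive.HiveContext

      val hiveContext = new HiveContext(sc)

      // Remove all rows from a managed table (hypothetical table name).
      hiveContext.sql("TRUNCATE TABLE page_views")

      // Remove rows only from the partitions matching a partial partition_spec.
      hiveContext.sql("TRUNCATE TABLE page_views PARTITION (dt = '2014-10-28')")
      ```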
      
      Author: wangxiaojing <u9jing@gmail.com>
      
      Closes #2770 from wangxiaojing/spark-3907 and squashes the following commits:
      
      63dbd81 [wangxiaojing] change hive scalastyle
      7a03707 [wangxiaojing] add comment
      f6e710e [wangxiaojing] change truncate table
      a1f692c [wangxiaojing] Correct spelling mistakes
      3b20007 [wangxiaojing] add truncate can not support column err message
      e483547 [wangxiaojing] add golden file
      77b1f20 [wangxiaojing]  add truncate table support
  2. Oct 27, 2014
    • [SQL] Correct a variable name in JavaApplySchemaSuite.applySchemaToJSON · 27470d34
      Yin Huai authored
      `schemaRDD2` is not tested because `schemaRDD1` is registered again.
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2869 from yhuai/JavaApplySchemaSuite and squashes the following commits:
      
      95fe894 [Yin Huai] Correct variable name.
    • [SPARK-4041][SQL] Attributes names in table scan should converted to lowercase... · 89af6dfc
      wangfei authored
      [SPARK-4041][SQL] Attribute names in table scan should be converted to lowercase when compared with relation attributes
      
      In ```MetastoreRelation``` the attribute names are lowercase because Hive stores field names in lowercase, so attribute names in the table scan should be lowercased before the comparison in ```indexWhere(_.name == a.name)```; otherwise ```neededColumnIDs``` may be incorrect.
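
      As a minimal standalone sketch of the idea (toy types, not the actual MetastoreRelation code), requested attributes can be resolved against the relation output case-insensitively, since the metastore lowercases field names:

      ```
      case class Attribute(name: String)

      // Indices of the requested columns in the relation output, lowercasing the
      // requested name so e.g. "pageId" still resolves against "pageid".
      def neededColumnIDs(requested: Seq[Attribute], relationOutput: Seq[Attribute]): Seq[Int] =
        requested.flatMap { a =>
          val i = relationOutput.indexWhere(_.name == a.name.toLowerCase)
          if (i >= 0) Some(i) else None
        }
      ```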
      
      Author: wangfei <wangfei1@huawei.com>
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2884 from scwf/fixColumnIds and squashes the following commits:
      
      6174046 [scwf] use AttributeMap for this issue
      dc74a24 [wangfei] use lowerName and add a test case for this issue
      3ff3a80 [wangfei] more safer change
      294fcb7 [scwf] attributes names in table scan should convert lowercase in neededColumnsIDs
    • [SPARK-3816][SQL] Add table properties from storage handler to output jobConf · 698a7eab
      Alex Liu authored
      Add table properties from storage handler to output job conf in SparkHadoopWriter class.
      
      Author: Alex Liu <alex_liu68@yahoo.com>
      
      Closes #2677 from alexliu68/SPARK-SQL-3816 and squashes the following commits:
      
      79c269b [Alex Liu] [SPARK-3816][SQL] Add table properties from storage handler to job conf
    • [SPARK-3911] [SQL] HiveSimpleUdf can not be optimized in constant folding · 418ad83f
      Cheng Hao authored
      ```
      explain extended select cos(null) from src limit 1;
      ```
      outputs:
      ```
       Project [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFCos(null) AS c_0#5]
        MetastoreRelation default, src, None
      
      == Optimized Logical Plan ==
      Limit 1
       Project [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFCos(null) AS c_0#5]
        MetastoreRelation default, src, None
      
      == Physical Plan ==
      Limit 1
       Project [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFCos(null) AS c_0#5]
        HiveTableScan [], (MetastoreRelation default, src, None), None
      ```
      With this PR applied, it outputs:
      ```
      == Parsed Logical Plan ==
      Limit 1
       Project ['cos(null) AS c_0#0]
        UnresolvedRelation None, src, None
      
      == Analyzed Logical Plan ==
      Limit 1
       Project [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFCos(null) AS c_0#0]
        MetastoreRelation default, src, None
      
      == Optimized Logical Plan ==
      Limit 1
       Project [null AS c_0#0]
        MetastoreRelation default, src, None
      
      == Physical Plan ==
      Limit 1
       Project [null AS c_0#0]
        HiveTableScan [], (MetastoreRelation default, src, None), None
      ```
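
      For context, here is a self-contained toy illustration of what constant folding does to an expression tree; it is not the actual Catalyst rule, just the technique: any subtree built purely from literals is evaluated at planning time and replaced by a literal.

      ```
      sealed trait Expr { def foldable: Boolean }
      case class Lit(value: Double) extends Expr { val foldable = true }
      case class Col(name: String) extends Expr { val foldable = false }
      case class Cos(child: Expr) extends Expr { def foldable = child.foldable }

      // Replace foldable subtrees with the literal result of evaluating them.
      def fold(e: Expr): Expr = e match {
        case Cos(child) =>
          fold(child) match {
            case Lit(v) => Lit(math.cos(v)) // evaluated once at plan time
            case other  => Cos(other)       // left for runtime evaluation
          }
        case other => other
      }

      // fold(Cos(Lit(0.0))) == Lit(1.0)
      // fold(Cos(Col("x"))) == Cos(Col("x"))
      ```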
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2771 from chenghao-intel/hive_udf_constant_folding and squashes the following commits:
      
      1379c73 [Cheng Hao] duplicate the PlanTest with catalyst/plans/PlanTest
      1e52dda [Cheng Hao] add unit test for hive simple udf constant folding
      01609ff [Cheng Hao] support constant folding for HiveSimpleUdf
    • [MLlib] SPARK-3987: add test case on objective value for NNLS · 7e3a1ada
      coderxiang authored
      Also update step parameter to pass the proposed test
      
      Author: coderxiang <shuoxiangpub@gmail.com>
      
      Closes #2965 from coderxiang/nnls-test and squashes the following commits:
      
      24b06f9 [coderxiang] add test case on objective value for NNLS; update step parameter to pass the test
    • SPARK-4022 [CORE] [MLLIB] Replace colt dependency (LGPL) with commons-math · bfa614b1
      Sean Owen authored
      This change replaces usages of colt with commons-math3 equivalents, and makes some minor necessary adjustments to related code and tests to match.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2928 from srowen/SPARK-4022 and squashes the following commits:
      
      61a232f [Sean Owen] Fix failure due to different sampling in JavaAPISuite.sample()
      16d66b8 [Sean Owen] Simplify seeding with call to reseedRandomGenerator
      a1a78e0 [Sean Owen] Use Well19937c
      31c7641 [Sean Owen] Fix Python Poisson test by choosing a different seed; about 88% of seeds should work but 1 didn't, it seems
      5c9c67f [Sean Owen] Additional test fixes from review
      d8f88e0 [Sean Owen] Replace colt with commons-math3. Some tests do not pass yet.
    • [SQL] Fixes caching related JoinSuite failure · 1d7bcc88
      Cheng Lian authored
      PR #2860 refines in-memory table statistics and enables broader broadcast hash join optimization for in-memory tables. This makes `JoinSuite` fail when another test suite caches the test table `testData` and runs before `JoinSuite`, because the expected `ShuffledHashJoin`s are then optimized into `BroadcastedHashJoin`s based on the collected in-memory table statistics.
      
      This PR fixes this issue by clearing the cache before testing join operator selection. A separate test case is also added to test broadcasted hash join operator selection.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #2960 from liancheng/fix-join-suite and squashes the following commits:
      
      715b2de [Cheng Lian] Fixes caching related JoinSuite failure
    • SPARK-2621. Update task InputMetrics incrementally · dea302dd
      Sandy Ryza authored
      The patch takes advantage of an API provided in Hadoop 2.5 that allows getting accurate data on Hadoop FileSystem bytes read. It eliminates the old method, which naively reported the split size as the input bytes. An impact of this change is that input metrics go away when running against Hadoop versions earlier than 2.5. I can add this back in, but my opinion is that no metrics are better than inaccurate metrics.
      
      This is difficult to write a test for because we don't usually build against a version of Hadoop that contains the function we need.  I've tested it manually on a pseudo-distributed cluster.
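
      A minimal sketch (assuming Hadoop 2.5+ on the classpath) of reading the per-thread bytes-read counters from FileSystem statistics, which is the kind of measurement this patch relies on; on older Hadoop versions the method below does not exist, hence the fallback to the split size:

      ```
      import scala.collection.JavaConverters._
      import org.apache.hadoop.fs.FileSystem

      // Sum the bytes read by the current thread across all registered FileSystems.
      // FileSystem.Statistics.getThreadStatistics() is only available in Hadoop 2.5+.
      def bytesReadByThisThread(): Long =
        FileSystem.getAllStatistics.asScala
          .map(_.getThreadStatistics.getBytesRead)
          .sum
      ```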
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #2087 from sryza/sandy-spark-2621 and squashes the following commits:
      
      23010b8 [Sandy Ryza] Missing style fixes
      74fc9bb [Sandy Ryza] Make getFSBytesReadOnThreadCallback private
      1ab662d [Sandy Ryza] Clear things up a bit
      984631f [Sandy Ryza] Switch from pull to push model and add test
      7ef7b22 [Sandy Ryza] Add missing curly braces
      219abc9 [Sandy Ryza] Fall back to split size
      90dbc14 [Sandy Ryza] SPARK-2621. Update task InputMetrics incrementally
    • [SPARK-4032] Deprecate YARN alpha support in Spark 1.2 · c9e05ca2
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2878 from ScrapCodes/SPARK-4032/deprecate-yarn-alpha and squashes the following commits:
      
      17e9857 [Prashant Sharma] added deperecated comment to Client and ExecutorRunnable.
      3a34b1e [Prashant Sharma] Updated docs...
      4608dea [Prashant Sharma] [SPARK-4032] Deprecate YARN alpha support in Spark 1.2
    • [SPARK-4030] Make destroy public for broadcast variables · 9aa340a2
      Shivaram Venkataraman authored
      This change makes the destroy function public for broadcast variables. Motivation for the change is described in https://issues.apache.org/jira/browse/SPARK-4030.
      This patch also logs where destroy was called from if a broadcast variable is used after destruction.
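
      A short usage sketch of the now-public method, assuming an existing SparkContext `sc` (for example in spark-shell); the data is illustrative:

      ```
      val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
      val total = sc.parallelize(Seq("a", "b", "a")).map(lookup.value(_)).reduce(_ + _)

      // Once the broadcast variable is no longer needed, release its resources on
      // the driver and the executors. Using `lookup` after this point fails, and
      // with this patch the error message includes where destroy() was called.
      lookup.destroy()
      ```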
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #2922 from shivaram/broadcast-destroy and squashes the following commits:
      
      a11abab [Shivaram Venkataraman] Fix scala style in Utils.scala
      bed9c9d [Shivaram Venkataraman] Make destroy blocking by default
      e80c1ab [Shivaram Venkataraman] Make destroy public for broadcast variables Also log where destroy was called from if a broadcast variable is used after destruction.
  3. Oct 26, 2014
    • [SPARK-3970] Remove duplicate removal of local dirs · 6377adaf
      Liang-Chi Hsieh authored
      The shutdown hook of `DiskBlockManager` already removes the local dirs, so there is no need to also register them with `Utils.registerShutdownDeleteDir`. Doing both causes duplicate removal of these local dirs and the corresponding exceptions.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2826 from viirya/fix_duplicate_localdir_remove and squashes the following commits:
      
      051d4b5 [Liang-Chi Hsieh] check dir existing and return empty List as default.
      2b91a9c [Liang-Chi Hsieh] remove duplicate removal of local dirs.
    • [SPARK-4042][SQL] Append columns ids and names before broadcast · f4e8c289
      scwf authored
      Append column ids and names before broadcasting ```hiveExtraConf``` in ```HadoopTableReader```.
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2885 from scwf/HadoopTableReader and squashes the following commits:
      
      a8c498c [scwf] append columns ids and names before broadcast
    • [SPARK-4061][SQL] We cannot use EOL character in the operand of LIKE predicate. · 3a9d66cf
      Kousuke Saruta authored
      We cannot use an EOL character such as \n or \r in the operand of a LIKE predicate,
      so the following condition is never true.
      
          -- someStr is 'hoge\nfuga'
          where someStr LIKE 'hoge_fuga'
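
      For illustration, here is a standalone sketch of why this happens and how a DOTALL-style match addresses it. It mirrors the usual LIKE-to-regex translation rather than the exact Spark SQL code:

      ```
      import java.util.regex.Pattern

      // Translate a SQL LIKE pattern into a regex: '_' matches one char, '%' any run.
      def likeToRegex(like: String): String =
        like.flatMap {
          case '_'   => "."
          case '%'   => ".*"
          case other => Pattern.quote(other.toString)
        }

      val input = "hoge\nfuga"
      val regex = likeToRegex("hoge_fuga")

      // Without DOTALL, '.' does not match '\n', so the LIKE never succeeds.
      Pattern.compile(regex).matcher(input).matches()                 // false
      // With DOTALL, '_' and '%' can span EOL characters, which is the intent of the fix.
      Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches() // true
      ```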
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2908 from sarutak/spark-sql-like-match-modification and squashes the following commits:
      
      d15798b [Kousuke Saruta] Remove test setting for thriftserver
      f99a2f4 [Kousuke Saruta] Fixed LIKE predicate so that we can use EOL character as in a operand
    • [SPARK-3959][SPARK-3960][SQL] SqlParser fails to parse literal... · ace41e8b
      Kousuke Saruta authored
      [SPARK-3959][SPARK-3960][SQL] SqlParser fails to parse literal -9223372036854775808 (Long.MinValue). / We can apply unary minus only to literal.
      
      SqlParser fails to parse -9223372036854775808 (Long.MinValue), so we cannot write queries such as the following.
      
          SELECT value FROM someTable WHERE value > -9223372036854775808
      
      Additionally, because of the wrong syntax definition, unary minus can be applied only to literals, so we cannot write expressions such as the following.
      
          -(value1 + value2) // Parenthesized expressions
          -column // Columns
          -MAX(column) // Functions
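
      As a rough illustration of the grammar change, here is a toy parser-combinator grammar (not Spark's actual SqlParser): the lexer absorbs the sign of a numeric literal so Long.MinValue parses, and unary minus is allowed on any term rather than only on literals:

      ```
      import scala.util.parsing.combinator.JavaTokenParsers

      object ToyExprParser extends JavaTokenParsers {
        sealed trait Expr
        case class Lit(v: Long) extends Expr
        case class Col(name: String) extends Expr
        case class Neg(child: Expr) extends Expr

        // A signed integer literal is parsed as one token, so -9223372036854775808
        // never passes through an intermediate positive literal that would overflow.
        def literal: Parser[Expr] = opt("-") ~ wholeNumber ^^ {
          case sign ~ digits => Lit((sign.getOrElse("") + digits).toLong)
        }
        def column: Parser[Expr] = ident ^^ Col
        def term: Parser[Expr] =
          literal | column | "(" ~> expr <~ ")" | "-" ~> term ^^ Neg
        def expr: Parser[Expr] = term // additions, comparisons, etc. omitted

        def main(args: Array[String]): Unit = {
          println(parseAll(expr, "-9223372036854775808")) // Lit(-9223372036854775808)
          println(parseAll(expr, "-(someColumn)"))        // Neg(Col(someColumn))
        }
      }
      ```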
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2816 from sarutak/spark-sql-dsl-improvement2 and squashes the following commits:
      
      32a5005 [Kousuke Saruta] Remove test setting for thriftserver
      c2bab5e [Kousuke Saruta] Fixed SPARK-3959 and SPARK-3960
    • [SPARK-3483][SQL] Special chars in column names · 974d7b23
      ravipesala authored
      Support special chars in column names by using backticks. Closed https://github.com/apache/spark/pull/2804 and created this PR because it had merge conflicts.
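
      A small usage sketch, assuming an existing SparkContext `sc` (for example in spark-shell); the table and the dotted column name are hypothetical:

      ```
      import org.apache.spark.sql.SQLContext

      val sqlContext = new SQLContext(sc)

      // A JSON record whose field name contains dots (hypothetical data).
      val records = sqlContext.jsonRDD(sc.parallelize(Seq("""{"key.with.dots": 42}""")))
      records.registerTempTable("records")

      // Backticks quote the special-character column name in the SQL text.
      sqlContext.sql("SELECT `key.with.dots` FROM records").collect().foreach(println)
      ```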
      
      Author: ravipesala <ravindra.pesala@huawei.com>
      
      Closes #2927 from ravipesala/SPARK-3483-NEW and squashes the following commits:
      
      f6329f3 [ravipesala] Rebased with master
    • [SPARK-4068][SQL] NPE in jsonRDD schema inference · 0481aaa8
      Yin Huai authored
      Please refer to added tests for cases that can trigger the bug.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-4068
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2918 from yhuai/SPARK-4068 and squashes the following commits:
      
      d360eae [Yin Huai] Handle nulls when building key paths from elements of an array.
    • [SPARK-4052][SQL] Use scala.collection.Map for pattern matching instead of... · 05308426
      Yin Huai authored
      [SPARK-4052][SQL] Use scala.collection.Map for pattern matching instead of using Predef.Map (it is scala.collection.immutable.Map)
      
      Please check https://issues.apache.org/jira/browse/SPARK-4052 for cases triggering this bug.
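
      For context, here is a minimal standalone demonstration of the pitfall: `Predef.Map` is `scala.collection.immutable.Map`, so a pattern match typed against it silently misses mutable maps, while matching on `scala.collection.Map` covers both:

      ```
      // Predef.Map is scala.collection.immutable.Map, so this match misses mutable maps.
      def describeImmutableOnly(v: Any): String = v match {
        case m: Map[_, _] => s"map with ${m.size} entries" // Map here means Predef.Map
        case _            => "not matched"
      }

      // Matching on scala.collection.Map covers immutable and mutable maps alike.
      def describeAnyMap(v: Any): String = v match {
        case m: scala.collection.Map[_, _] => s"map with ${m.size} entries"
        case _                             => "not matched"
      }

      val m = scala.collection.mutable.Map("a" -> 1)
      describeImmutableOnly(m) // "not matched"
      describeAnyMap(m)        // "map with 1 entries"
      ```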
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2899 from yhuai/SPARK-4052 and squashes the following commits:
      
      1188f70 [Yin Huai] Address liancheng's comments.
      b6712be [Yin Huai] Use scala.collection.Map instead of Predef.Map (scala.collection.immutable.Map).
    • [SPARK-3953][SQL][Minor] Confusable variable name. · d518bc24
      Kousuke Saruta authored
      In SqlParser.scala, there is the following code.
      
          case d ~ p ~ r ~ f ~ g ~ h ~ o ~ l  =>
            val base = r.getOrElse(NoRelation)
            val withFilter = f.map(f => Filter(f, base)).getOrElse(base)
      
      In the code above, two variables named "f" appear close together:
      one is the receiver "f" and the other is the bound variable "f".
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2807 from sarutak/SPARK-3953 and squashes the following commits:
      
      4957c32 [Kousuke Saruta] Improved variable name in SqlParser.scala
    • [SQL][DOC] Wrong package name "scala.math.sql" in sql-programming-guide.md · dc51f4d6
      Kousuke Saruta authored
      In sql-programming-guide.md, there is a wrong package name "scala.math.sql".
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2873 from sarutak/wrong-packagename-fix and squashes the following commits:
      
      4d5ecf4 [Kousuke Saruta] Fixed wrong package name in sql-programming-guide.md
    • [SPARK-3997][Build]scalastyle should output the error location · 89e8a5d8
      GuoQiang Li authored
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2846 from witgo/SPARK-3997 and squashes the following commits:
      
      d6a57f8 [GuoQiang Li] scalastyle should output the error location
    • [SPARK-3537][SPARK-3914][SQL] Refines in-memory columnar table statistics · 2838bf8a
      Cheng Lian authored
      This PR refines in-memory columnar table statistics:
      
      1. adds 2 more statistics for in-memory table columns: `count` and `sizeInBytes`
      2. adds filter pushdown support for `IS NULL` and `IS NOT NULL`.
      3. caches and propagates statistics in `InMemoryRelation` once the underlying cached RDD is materialized.
      
         Statistics are collected to driver side with an accumulator.
      
      This PR also fixes SPARK-3914 by properly propagating in-memory statistics.
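
      As a rough sketch of the accumulator-based collection described above (illustrative only, with a hypothetical batch type rather than the real InMemoryRelation internals), assuming an existing SparkContext `sc`:

      ```
      import org.apache.spark.SparkContext._

      // Hypothetical stand-in for a cached column batch and its encoded size.
      case class ColumnBatch(rows: Int, sizeInBytes: Long)

      val batches = sc.parallelize(Seq(ColumnBatch(1000, 64L * 1024), ColumnBatch(500, 32L * 1024)))

      // Accumulate the total size on the driver as the cached RDD is materialized;
      // once known, the planner can use it to decide whether to broadcast the table.
      val sizeInBytes = sc.accumulator(0L)
      val materialized = batches.map { batch => sizeInBytes += batch.sizeInBytes; batch }
      materialized.cache().count()

      println(s"estimated in-memory size: ${sizeInBytes.value} bytes")
      ```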
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #2860 from liancheng/propagates-in-mem-stats and squashes the following commits:
      
      0cc5271 [Cheng Lian] Restricts visibility of o.a.s.s.c.p.l.Statistics
      c5ff904 [Cheng Lian] Fixes test table name conflict
      a8c818d [Cheng Lian] Refines tests
      1d01074 [Cheng Lian] Bug fix: shouldn't call STRING.actualSize on null string value
      7dc6a34 [Cheng Lian] Adds more in-memory table statistics and propagates them properly
    • [HOTFIX][SQL] Temporarily turn off hive-server tests. · 879a1658
      Michael Armbrust authored
      The thrift server is not available in the default (hive13) profile yet, which is breaking all SQL-only PRs. This turns off these tests until #2685 is merged.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2950 from marmbrus/fixTests and squashes the following commits:
      
      1a6dfee [Michael Armbrust] [HOTFIX][SQL] Temporarily turn of hive-server tests.
    • [SPARK-3925][SQL] Do not consider the ordering of qualifiers during comparison · 0af7e514
      Liang-Chi Hsieh authored
      The orderings should not be considered during the comparison between old qualifiers and new qualifiers.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2783 from viirya/full_qualifier_comp and squashes the following commits:
      
      89f652c [Liang-Chi Hsieh] modification for comment.
      abb5762 [Liang-Chi Hsieh] More comprehensive comparison of qualifiers.
    • Just fixing comment that shows usage · 677852c3
      anant asthana authored
      Author: anant asthana <anant.asty@gmail.com>
      
      Closes #2948 from anantasty/patch-1 and squashes the following commits:
      
      d8fea0b [anant asthana] Just fixing comment that shows usage
    • [SPARK-3616] Add basic Selenium tests to WebUISuite · bf589fc7
      Josh Rosen authored
      This patch adds Selenium tests for Spark's web UI.  To avoid adding extra
      dependencies to the test environment, the tests use Selenium's HtmlUnitDriver,
      which is pure-Java, instead of, say, ChromeDriver.
      
      I added new tests to try to reproduce a few UI bugs reported on JIRA, namely
      SPARK-3021, SPARK-2105, and SPARK-2527.  I wasn't able to reproduce these bugs;
      I suspect that the older ones might have been fixed by other patches.
      
      In order to use HtmlUnitDriver, I added an explicit dependency on the
      org.apache.httpcomponents version of httpclient in order to prevent jets3t's
      older version from taking precedence on the classpath.
      
      I also upgraded ScalaTest to 2.2.1.
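
      A minimal sketch of this kind of test using ScalaTest with Selenium's HtmlUnitDriver; the UI address and the asserted text are assumptions, not taken from the actual suite:

      ```
      import org.openqa.selenium.htmlunit.HtmlUnitDriver
      import org.scalatest.{BeforeAndAfterAll, FunSuite}
      import org.apache.spark.{SparkConf, SparkContext}

      class WebUISeleniumSketch extends FunSuite with BeforeAndAfterAll {
        private var sc: SparkContext = _

        override def beforeAll(): Unit = {
          sc = new SparkContext(new SparkConf().setMaster("local").setAppName("ui-test"))
        }

        override def afterAll(): Unit = {
          if (sc != null) sc.stop()
        }

        test("stages page renders") {
          sc.parallelize(1 to 100).count() // run a job so the UI has something to show
          val driver = new HtmlUnitDriver  // pure-Java browser, no external binary needed
          try {
            // Assumed default UI address; the real suite reads the bound port from the SparkUI.
            driver.get("http://localhost:4040/stages/")
            assert(driver.getPageSource.contains("Completed Stages"))
          } finally {
            driver.quit()
          }
        }
      }
      ```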
      
      Author: Josh Rosen <joshrosen@apache.org>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2474 from JoshRosen/webui-selenium-tests and squashes the following commits:
      
      fcc9e83 [Josh Rosen] scalautils -> scalactic package rename
      510e54a [Josh Rosen] [SPARK-3616] Add basic Selenium tests to WebUISuite.
    • Update RoaringBitmap to 0.4.3 · b7595401
      Daniel Lemire authored
      Roaring has been updated to version 0.4.3. We fixed a rarely occurring bug with serialization. No API or format changes were made.
      
      Author: Daniel Lemire <lemire@gmail.com>
      
      Closes #2938 from lemire/master and squashes the following commits:
      
      431f3a0 [Daniel Lemire] Recommended bug fix release
    • SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work with Java 8 · df7974b8
      Sean Owen authored
      This follows https://github.com/apache/spark/pull/2893, but does not completely fix SPARK-3359 either. It fixes minor scaladoc/javadoc issues that Javadoc 8 treats as errors.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2909 from srowen/SPARK-3359 and squashes the following commits:
      
      f62c347 [Sean Owen] Fix some javadoc issues that javadoc 8 considers errors. This is not all of the errors turned up when javadoc 8 runs on output of genjavadoc.
  4. Oct 25, 2014
    • [SPARK-4071] Unroll fails silently if BlockManager is small · c6834440
      Andrew Or authored
      In tests, we may want to have BlockManagers of size < 1MB (spark.storage.unrollMemoryThreshold). However, these BlockManagers are useless because we can't unroll anything in them ever. At the very least we need to log a warning.
      
      tdas
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #2917 from andrewor14/unroll-safely-logging and squashes the following commits:
      
      38947e3 [Andrew Or] Warn against starting a block manager that's too small
      fd621b4 [Andrew Or] Warn against failure to reserve initial memory threshold
    • Revert "[SPARK-4056] Upgrade snappy-java to 1.1.1.5" · 2e52e4f8
      Josh Rosen authored
      This reverts commit 898b22ab.
      
      Reverting because this may be causing OOMs.
    • [SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM · e41786c7
      Davies Liu authored
      In the case of take() or an exception in Python, the Python worker may exit before the JVM has read() the whole response, and then the write thread may raise a "Connection reset" exception.
      
      Python should always wait for the JVM to close the socket first.
      
      cc JoshRosen This is a warm fix, or the tests will be flaky, sorry for that.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #2941 from davies/fix_exit and squashes the following commits:
      
      9d4d21e [Davies Liu] fix race
    • [SPARK-2321] Stable pull-based progress / status API · 95303168
      Josh Rosen authored
      This pull request is a first step towards the implementation of a stable, pull-based progress / status API for Spark (see [SPARK-2321](https://issues.apache.org/jira/browse/SPARK-2321)).  For now, I'd like to discuss the basic implementation, API names, and overall interface design.  Once we arrive at a good design, I'll go back and add additional methods to expose more information via these API.
      
      #### Design goals:
      
      - Pull-based API
      - Usable from Java / Scala / Python (eventually, likely with a wrapper)
      - Can be extended to expose more information without introducing binary incompatibilities.
      - Returns immutable objects.
      - Don't leak any implementation details, preserving our freedom to change the implementation.
      
      #### Implementation:
      
      - Add public methods (`getJobInfo`, `getStageInfo`) to SparkContext to allow status / progress information to be retrieved.
      - Add public interfaces (`SparkJobInfo`, `SparkStageInfo`) for our API return values.  These interfaces consist entirely of Java-style getter methods.  The interfaces are currently implemented in Java.  I decided to explicitly separate the interface from its implementation (`SparkJobInfoImpl`, `SparkStageInfoImpl`) in order to prevent users from constructing these responses themselves.
      - Allow an existing JobProgressListener to be used when constructing a live SparkUI.  This allows us to re-use these listeners in the implementation of this status API.  There are a few reasons why this listener re-use makes sense:
         - The status API and web UI are guaranteed to show consistent information.
         - These listeners are already well-tested.
         - The same garbage-collection / information retention configurations can apply to both this API and the web UI.
      - Extend JobProgressListener to maintain `jobId -> Job` and `stageId -> Stage` mappings.
      
      The progress API methods are implemented in a separate trait that's mixed into SparkContext. This helps keep SparkContext.scala from becoming larger and more difficult to read.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #2696 from JoshRosen/progress-reporting-api and squashes the following commits:
      
      e6aa78d [Josh Rosen] Add tests.
      b585c16 [Josh Rosen] Accept SparkListenerBus instead of more specific subclasses.
      c96402d [Josh Rosen] Address review comments.
      2707f98 [Josh Rosen] Expose current stage attempt id
      c28ba76 [Josh Rosen] Update demo code:
      646ff1d [Josh Rosen] Document spark.ui.retainedJobs.
      7f47d6d [Josh Rosen] Clean up SparkUI constructors, per Andrew's feedback.
      b77b3d8 [Josh Rosen] Merge remote-tracking branch 'origin/master' into progress-reporting-api
      787444c [Josh Rosen] Move status API methods into trait that can be mixed into SparkContext.
      f9a9a00 [Josh Rosen] More review comments:
      3dc79af [Josh Rosen] Remove creation of unused listeners in SparkContext.
      249ca16 [Josh Rosen] Address several review comments:
      da5648e [Josh Rosen] Add example of basic progress reporting in Java.
      7319ffd [Josh Rosen] Add getJobIdsForGroup() and num*Tasks() methods.
      cc568e5 [Josh Rosen] Add note explaining that interfaces should not be implemented outside of Spark.
      6e840d4 [Josh Rosen] Remove getter-style names and "consistent snapshot" semantics:
      08cbec9 [Josh Rosen] Begin to sketch the interfaces for a stable, public status API.
      ac2d13a [Josh Rosen] Add jobId->stage, stageId->stage mappings in JobProgressListener
      24de263 [Josh Rosen] Create UI listeners in SparkContext instead of in Tabs:
  5. Oct 24, 2014
    • [SQL] Update Hive test harness for Hive 12 and 13 · 3a845d3c
      Michael Armbrust authored
      As part of the upgrade I also copy the newest version of the query tests, and whitelist a bunch of new ones that are now passing.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2936 from marmbrus/fix13tests and squashes the following commits:
      
      d9cbdab [Michael Armbrust] Remove user specific tests
      65801cd [Michael Armbrust] style and rat
      8f6b09a [Michael Armbrust] Update test harness to work with both Hive 12 and 13.
      f044843 [Michael Armbrust] Update Hive query tests and golden files to 0.13
    • [SPARK-4056] Upgrade snappy-java to 1.1.1.5 · 898b22ab
      Josh Rosen authored
      This upgrades snappy-java to 1.1.1.5, which improves error messages when attempting to deserialize empty inputs using SnappyInputStream (see https://github.com/xerial/snappy-java/issues/89).
      
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2911 from JoshRosen/upgrade-snappy-java and squashes the following commits:
      
      adec96c [Josh Rosen] Use snappy-java 1.1.1.5
      cc953d6 [Josh Rosen] [SPARK-4056] Upgrade snappy-java to 1.1.1.4
    • [SPARK-4080] Only throw IOException from [write|read][Object|External] · 6c98c29a
      Josh Rosen authored
      If classes implementing Serializable or Externalizable interfaces throw
      exceptions other than IOException or ClassNotFoundException from their
      (de)serialization methods, then this results in an unhelpful
      "IOException: unexpected exception type" rather than the actual exception that
      produced the (de)serialization error.
      
      This patch fixes this by adding a utility method that re-wraps any uncaught
      exceptions in IOException (unless they are already instances of IOException).
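
      A minimal sketch of the kind of wrapper the patch describes (the method name here is illustrative):

      ```
      import java.io.IOException
      import scala.util.control.NonFatal

      // Run a (de)serialization block and re-wrap any non-IOException failure in an
      // IOException, so ObjectOutputStream/ObjectInputStream surface the real cause
      // instead of "IOException: unexpected exception type".
      def tryOrIOException[T](block: => T): T =
        try {
          block
        } catch {
          case e: IOException => throw e
          case NonFatal(t)    => throw new IOException(t)
        }

      // Example use inside a Serializable class:
      //   private def writeObject(out: java.io.ObjectOutputStream): Unit = tryOrIOException {
      //     out.defaultWriteObject()
      //     // ... custom state that might throw something other than IOException ...
      //   }
      ```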
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits:
      
      cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write|read][Object|External].
    • [HOTFIX][SQL] Remove sleep on reset() failure. · 3a906c66
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2934 from marmbrus/patch-2 and squashes the following commits:
      
      a96dab2 [Michael Armbrust] Remove sleep on reset() failure.
    • [GraphX] Modify option name according to example doc in SynthBenchmark · 07e439b4
      Grace authored
      The graphx.SynthBenchmark example has an iteration-count option named "niter", but its documentation calls it "niters". The mismatch between the implementation and the documentation causes an IllegalArgumentException when running the example as documented.
      
      Author: Grace <jie.huang@intel.com>
      
      Closes #2888 from GraceH/synthbenchmark and squashes the following commits:
      
      f101ee1 [Grace] Modify option name according to example doc
    • [SPARK-4067] refactor ExecutorUncaughtExceptionHandler · f80dcf2a
      Nan Zhu authored
      https://issues.apache.org/jira/browse/SPARK-4067
      
      Currently we call Utils.tryOrExit in several places (AppClient, Executor, TaskSchedulerImpl), which makes the name ExecutorUncaughtExceptionHandler no longer fit its actual use.
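
      For context, here is a minimal sketch of the pattern being renamed here: a JVM-wide uncaught exception handler that exits with a distinct code. The object name and exit codes below are illustrative, not Spark's actual constants:

      ```
      // Illustrative uncaught-exception handler: log the failure and halt with a
      // code the cluster manager can distinguish from a normal shutdown.
      object UncaughtExceptionHandlerSketch extends Thread.UncaughtExceptionHandler {
        val UNCAUGHT_EXCEPTION = 50 // hypothetical exit code
        val OUT_OF_MEMORY      = 52 // hypothetical exit code

        override def uncaughtException(thread: Thread, exception: Throwable): Unit = {
          System.err.println(s"Uncaught exception in thread ${thread.getName}: $exception")
          exception match {
            case _: OutOfMemoryError => Runtime.getRuntime.halt(OUT_OF_MEMORY)
            case _                   => Runtime.getRuntime.halt(UNCAUGHT_EXCEPTION)
          }
        }
      }

      // Installed once per JVM, e.g. during executor or driver startup:
      // Thread.setDefaultUncaughtExceptionHandler(UncaughtExceptionHandlerSketch)
      ```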
      
      Author: Nan Zhu <nanzhu@Nans-MacBook-Pro.local>
      Author: Nan Zhu <nanzhu@nans-mbp.home>
      
      Closes #2913 from CodingCat/SPARK-4067 and squashes the following commits:
      
      035ee3d [Nan Zhu] make RAT happy
      e62e416 [Nan Zhu] add some general Exit code
      a10b63f [Nan Zhu] refactor