  1. Apr 16, 2015
    • [SPARK-6855] [SPARKR] Set R includes to get the right collate order. · 55f553a9
      Shivaram Venkataraman authored
      This prevents tools like devtools::document from creating invalid collate orders.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #5462 from shivaram/collate-order and squashes the following commits:
      
      f3db562 [Shivaram Venkataraman] Set R includes to get the right collate order. This prevents tools like devtools::document creating invalid collate orders
    • [SPARK-6934][Core] Use 'spark.akka.askTimeout' for the ask timeout · ef3fb801
      zsxwing authored
      Fixed my mistake in #4588
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5529 from zsxwing/SPARK-6934 and squashes the following commits:
      
      9890b2d [zsxwing] Use 'spark.akka.askTimeout' for the ask timeout
    • [SPARK-6694][SQL]SparkSQL CLI must be able to specify an option --database on the command line. · 3ae37b93
      Jin Adachi authored
      The SparkSQL CLI advertises a --database option, as shown below, but the option is ignored.
      
      ```
      $ spark-sql --help
      :
      CLI options:
          :
          --database <databasename>     Specify the database to use
      ```
      
      Author: Jin Adachi <adachij2002@yahoo.co.jp>
      Author: adachij <adachij@nttdata.co.jp>
      
      Closes #5345 from adachij2002/SPARK-6694 and squashes the following commits:
      
      8659084 [Jin Adachi] Merge branch 'master' of https://github.com/apache/spark into SPARK-6694
      0301eb9 [Jin Adachi] Merge branch 'master' of https://github.com/apache/spark into SPARK-6694
      df81086 [Jin Adachi] Modify code style.
      846f83e [Jin Adachi] Merge branch 'master' of https://github.com/apache/spark into SPARK-6694
      dbe8c63 [Jin Adachi] Change file permission to 644.
      7b58f42 [Jin Adachi] Merge branch 'master' of https://github.com/apache/spark into SPARK-6694
      c581d06 [Jin Adachi] Add an option --database test
      db56122 [Jin Adachi] Merge branch 'SPARK-6694' of https://github.com/adachij2002/spark into SPARK-6694
      ee09fa5 [adachij] Merge branch 'master' into SPARK-6694
      c804c03 [adachij] SparkSQL CLI must be able to specify an option --database on the command line.
    • [SPARK-4194] [core] Make SparkContext initialization exception-safe. · de4fa6b6
      Marcelo Vanzin authored
      SparkContext has a very long constructor, where multiple things are
      initialized, multiple threads are spawned, and multiple opportunities
      for exceptions to be thrown exist. If one of these happens at an
      inopportune time, lots of garbage tends to stick around.
      
      This patch re-organizes SparkContext so that its internal state is
      initialized in a big "try" block. The fields keeping state are now
      completely private to SparkContext, and are "vars", because Scala
      doesn't allow you to initialize a val later. The existing API interface
      is kept by turning vals into defs (which works because Scala guarantees
      the same binary interface for those).
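      
      The pattern described above can be sketched in plain Scala. This is a simplified model, not SparkContext's actual code: `Context` and `Resource` are hypothetical stand-ins for SparkContext and the threads or actor systems it starts. State lives in private vars assigned inside one `try` block, a failure part-way through calls `stop()` to release whatever was already created, and the former public vals become `def` accessors.
      
      ```scala
      import scala.collection.mutable.ListBuffer
      import scala.util.control.NonFatal
      
      // Hypothetical resource standing in for a thread, actor system, etc.
      final class Resource(val name: String, log: ListBuffer[String]) {
        log += s"start $name"
        def close(): Unit = log += s"stop $name"
      }
      
      // Hypothetical context whose constructor may fail part-way through.
      class Context(failAfterFirst: Boolean, val log: ListBuffer[String] = ListBuffer.empty[String]) {
        // Private vars: a val could not be assigned later inside the try block.
        private var _heartbeat: Resource = null
        private var _scheduler: Resource = null
      
        // Public accessors keep the old API shape (def instead of val).
        def heartbeat: Resource = _heartbeat
        def scheduler: Resource = _scheduler
      
        try {
          _heartbeat = new Resource("heartbeat", log)
          if (failAfterFirst) throw new IllegalStateException("init failed")
          _scheduler = new Resource("scheduler", log)
        } catch {
          case NonFatal(e) =>
            // Release whatever was created before the failure, then rethrow.
            stop()
            throw e
        }
      
        // Null-safe: only closes fields that were actually initialized.
        def stop(): Unit = {
          if (_scheduler != null) { _scheduler.close(); _scheduler = null }
          if (_heartbeat != null) { _heartbeat.close(); _heartbeat = null }
        }
      }
      ```
      
      Because `stop()` checks each field for null, the same method can serve both the normal shutdown path and the cleanup after a failed constructor.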
      
      On top of that, a few things in other areas were changed to avoid more
      things leaking:
      
      - Executor was changed to explicitly wait for the heartbeat thread to
        stop. LocalBackend was changed to wait for the "StopExecutor"
        message to be received, since otherwise there could be a race
        between that message arriving and the actor system being shut down.
      - ConnectionManager could hang during shutdown, because an
        interrupt at the wrong moment could cause the selector thread to
        still call select and then wait forever. The selector is now also
        woken up so that this situation is avoided.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5335 from vanzin/SPARK-4194 and squashes the following commits:
      
      746b661 [Marcelo Vanzin] Fix borked merge.
      80fc00e [Marcelo Vanzin] Merge branch 'master' into SPARK-4194
      408dada [Marcelo Vanzin] Merge branch 'master' into SPARK-4194
      2621609 [Marcelo Vanzin] Merge branch 'master' into SPARK-4194
      6b73fcb [Marcelo Vanzin] Scalastyle.
      c671c46 [Marcelo Vanzin] Fix merge.
      3979aad [Marcelo Vanzin] Merge branch 'master' into SPARK-4194
      8caa8b3 [Marcelo Vanzin] [SPARK-4194] [core] Make SparkContext initialization exception-safe.
      071f16e [Marcelo Vanzin] Nits.
      27456b9 [Marcelo Vanzin] More exception safety.
      a0b0881 [Marcelo Vanzin] Stop alloc manager before scheduler.
      5545d83 [Marcelo Vanzin] [SPARK-6650] [core] Stop ExecutorAllocationManager when context stops.
    • SPARK-4783 [CORE] System.exit() calls in SparkContext disrupt applications embedding Spark · 6179a948
      Sean Owen authored
      Avoid `System.exit(1)` in `TaskSchedulerImpl` and convert to `SparkException`; ensure scheduler calls `sc.stop()` even when this exception is thrown.
      
      CC mateiz aarondav as those who may have last touched this code.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5492 from srowen/SPARK-4783 and squashes the following commits:
      
      60dc682 [Sean Owen] Avoid System.exit(1) in TaskSchedulerImpl and convert to SparkException; ensure scheduler calls sc.stop() even when this exception is thrown
    • [Streaming][minor] Remove additional quote and unneeded imports · 83705505
      jerryshao authored
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #5540 from jerryshao/minor-fix and squashes the following commits:
      
      ebaa646 [jerryshao] Minor fix
    • [SPARK-6893][ML] default pipeline parameter handling in python · 57cd1e86
      Xiangrui Meng authored
      Same as #5431 but for Python. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5534 from mengxr/SPARK-6893 and squashes the following commits:
      
      d3b519b [Xiangrui Meng] address comments
      ebaccc6 [Xiangrui Meng] style update
      fce244e [Xiangrui Meng] update explainParams with test
      4d6b07a [Xiangrui Meng] add tests
      5294500 [Xiangrui Meng] update default param handling in python
  2. Apr 15, 2015
    • SPARK-6938: All require statements now have an informative error message. · 52c3439a
      Juliet Hougland authored
      This PR adds informative error messages to all require statements in the Vectors class that did not previously have them. See [SPARK-6938](https://issues.apache.org/jira/browse/SPARK-6938).
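      
      A minimal sketch of the pattern, using a hypothetical `sparseVector` helper rather than MLlib's actual `Vectors.sparse`, and with message texts written for this illustration: Scala's `require(condition, message)` throws an `IllegalArgumentException` whose message is "requirement failed: " followed by the supplied text, so the second argument is what makes the failure informative.
      
      ```scala
      // Hypothetical helper that validates sparse-vector inputs and returns a dense copy.
      def sparseVector(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
        require(size >= 0, s"The vector size must be non-negative but got $size.")
        require(indices.length == values.length,
          s"The number of indices must equal the number of values, " +
            s"but got ${indices.length} indices and ${values.length} values.")
        val dense = new Array[Double](size)
        indices.zip(values).foreach { case (i, v) =>
          // Each index is checked individually so the message can name it.
          require(i >= 0 && i < size, s"Index $i is out of bounds for a vector of size $size.")
          dense(i) = v
        }
        dense
      }
      ```
      
      Without the message argument, a caller would only see "requirement failed" and have to read the source to learn which check tripped.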
      
      Author: Juliet Hougland <juliet@cloudera.com>
      
      Closes #5532 from jhlch/SPARK-6938 and squashes the following commits:
      
      ab321bb [Juliet Hougland] Remove braces from string interpolation when not required.
      1221f94 [Juliet Hougland] All require statements now have an informative error message.
    • [SPARK-5277][SQL] - SparkSqlSerializer doesn't always register user specified KryoRegistrators · 8a53de16
      Max Seiden authored
      There were a few places where new SparkSqlSerializer instances were created with new, empty SparkConfs, resulting in user-specified registrators sometimes not being initialized.
      
      The fix is to try to pull a conf from the SparkEnv, and to construct a new conf (which loads defaults) if one cannot be found.
      
      The changes touched:
          1) SparkSqlSerializer's resource pool (this appears to fix the issue in the comment)
          2) execution.Exchange (for all of the partitioners)
          3) execution.Limit (for the HashPartitioner)
      
      A few tests were added to ColumnTypeSuite to ensure that a custom registrator and serde are initialized and used when in-memory columns are written.
      
      Author: Max Seiden <max@platfora.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #5237 from mhseiden/sql_udt_kryo and squashes the following commits:
      
      3175c2f [Max Seiden] [SPARK-5277][SQL] - address code review comments
      e5011fb [Max Seiden] [SPARK-5277][SQL] - SparkSqlSerializer does not register user specified KryoRegistrators
    • [SPARK-2312] Logging Unhandled messages · d5f1b965
      Isaias Barroso authored
      The previous solution was changed based on the discussion in https://github.com/apache/spark/pull/2048.
      
      Author: Isaias Barroso <isaias.barroso@gmail.com>
      
      Closes #2055 from isaias/SPARK-2312 and squashes the following commits:
      
      f61d9e6 [Isaias Barroso] Change Log level for unhandled message to debug
      f341777 [Isaias Barroso] [SPARK-2312] Logging Unhandled messages
    • [SPARK-2213] [SQL] sort merge join for spark sql · 585638e8
      Daoyuan Wang authored
      Thanks for the initial work from Ishiihara in #3173
      
      This PR introduces a new join method, sort merge join, which first ensures that rows with the same key value are in the same partition and that, inside each partition, rows are sorted by key. Both sides can then be scanned together, finding matching rows with a [sort merge join](http://en.wikipedia.org/wiki/Sort-merge_join). This way, we do not have to store the whole hash table of one side as hash join does, so memory usage is lower. This PR would also benefit from #3438, which makes the sorting phase much more efficient.
      
      We introduced a new configuration, "spark.sql.planner.sortMergeJoin", to switch between this join (`true`) and ShuffledHashJoin (`false`); we probably want its default value to be `false` at first.
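      
      The merge step can be sketched in plain Scala over two inputs that are already sorted by key. This is a generic textbook inner sort-merge join, not Spark's operator; rows are modeled as hypothetical `(key, value)` pairs, and only the current run of equal keys from one side is buffered, in the spirit of the "use buffer for only one side" commit below.
      
      ```scala
      // Inner sort-merge join of two inputs, each already sorted by key.
      // Only the run of right-side rows sharing the current key is buffered.
      def sortMergeJoin[K, A, B](left: IndexedSeq[(K, A)], right: IndexedSeq[(K, B)])
                                (implicit ord: Ordering[K]): Vector[(K, A, B)] = {
        val out = Vector.newBuilder[(K, A, B)]
        var i = 0
        var j = 0
        while (i < left.length && j < right.length) {
          val cmp = ord.compare(left(i)._1, right(j)._1)
          if (cmp < 0) i += 1        // left key is smaller: advance left
          else if (cmp > 0) j += 1   // right key is smaller: advance right
          else {
            val key = left(i)._1
            // Buffer the run of right rows with this key.
            val runStart = j
            while (j < right.length && ord.equiv(right(j)._1, key)) j += 1
            val run = right.slice(runStart, j)
            // Pair every left row with this key against the buffered run.
            while (i < left.length && ord.equiv(left(i)._1, key)) {
              run.foreach { case (_, b) => out += ((key, left(i)._2, b)) }
              i += 1
            }
          }
        }
        out.result()
      }
      ```
      
      Each input is read once in order, so memory is bounded by the largest run of duplicate keys rather than by a whole hash table. To try the new operator itself, the flag above can be set on a `SQLContext` (assumed here to be named `sqlContext`) with `sqlContext.setConf("spark.sql.planner.sortMergeJoin", "true")`.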
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #5208 from adrian-wang/smj and squashes the following commits:
      
      2493b9f [Daoyuan Wang] fix style
      5049d88 [Daoyuan Wang] propagate rowOrdering for RangePartitioning
      f91a2ae [Daoyuan Wang] yin's comment: use external sort if option is enabled, add comments
      f515cd2 [Daoyuan Wang] yin's comment: outputOrdering, join suite refine
      ec8061b [Daoyuan Wang] minor change
      413fd24 [Daoyuan Wang] Merge pull request #3 from marmbrus/pr/5208
      952168a [Michael Armbrust] add type
      5492884 [Michael Armbrust] copy when ordering
      7ddd656 [Michael Armbrust] Cleanup addition of ordering requirements
      b198278 [Daoyuan Wang] inherit ordering in project
      c8e82a3 [Daoyuan Wang] fix style
      6e897dd [Daoyuan Wang] hide boundReference from manually construct RowOrdering for key compare in smj
      8681d73 [Daoyuan Wang] refactor Exchange and fix copy for sorting
      2875ef2 [Daoyuan Wang] fix changed configuration
      61d7f49 [Daoyuan Wang] add omitted comment
      00a4430 [Daoyuan Wang] fix bug
      078d69b [Daoyuan Wang] address comments: add comments, do sort in shuffle, and others
      3af6ba5 [Daoyuan Wang] use buffer for only one side
      171001f [Daoyuan Wang] change default outputordering
      47455c9 [Daoyuan Wang] add apache license ...
      a28277f [Daoyuan Wang] fix style
      645c70b [Daoyuan Wang] address comments using sort
      068c35d [Daoyuan Wang] fix new style and add some tests
      925203b [Daoyuan Wang] address comments
      07ce92f [Daoyuan Wang] fix ArrayIndexOutOfBound
      42fca0e [Daoyuan Wang] code clean
      e3ec096 [Daoyuan Wang] fix comment style..
      2edd235 [Daoyuan Wang] fix outputpartitioning
      57baa40 [Daoyuan Wang] fix sort eval bug
      303b6da [Daoyuan Wang] fix several errors
      95db7ad [Daoyuan Wang] fix brackets for if-statement
      4464f16 [Daoyuan Wang] fix error
      880d8e9 [Daoyuan Wang] sort merge join for spark sql
    • [SPARK-6898][SQL] completely support special chars in column names · 4754e16f
      Wenchen Fan authored
      Even if we wrap column names in backticks, like `` `a#$b.c` ``, we still handle the "." inside the column name specially. I think it's fragile to use a special character to split name parts; why not put the name parts in `UnresolvedAttribute` directly?
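      
      The idea of keeping name parts separate can be illustrated with a small splitter that breaks a dotted attribute name into parts while treating dots inside backticks literally. `parseNameParts` is a hypothetical helper written for this note, not the parser in this patch, and it assumes backticks cannot be escaped inside a quoted part.
      
      ```scala
      import scala.collection.mutable.ArrayBuffer
      
      // Split a possibly backtick-quoted, dot-separated name into its parts.
      // Dots inside backticks are literal, so "`a#$b.c`.d" yields Seq("a#$b.c", "d").
      def parseNameParts(name: String): Seq[String] = {
        val parts = ArrayBuffer.empty[String]
        val current = new StringBuilder
        var inBackticks = false
        for (c <- name) {
          if (c == '`') inBackticks = !inBackticks   // toggle quoting; the backtick itself is dropped
          else if (c == '.' && !inBackticks) {       // an unquoted dot ends the current part
            parts += current.toString
            current.clear()
          } else current += c
        }
        require(!inBackticks, s"Unclosed backtick in name: $name")
        parts += current.toString
        parts.toSeq
      }
      ```
      
      Once the parts are kept as a sequence, later stages never need to re-split a string on ".", which is what makes special characters in column names safe.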
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #5511 from cloud-fan/6898 and squashes the following commits:
      
      48e3e57 [Wenchen Fan] more style fix
      820dc45 [Wenchen Fan] do not ignore newName in UnresolvedAttribute
      d81ad43 [Wenchen Fan] fix style
      11699d6 [Wenchen Fan] completely support special chars in column names
    • [SPARK-6937][MLLIB] Fixed bug in PICExample in which the radius were not being accepted on command line · 557a797a
      sboeschhuawei authored
      Fixes a small bug in PowerIterationClusteringExample in which the radius was not accepted from the command line.
      
      Author: sboeschhuawei <stephen.boesch@huawei.com>
      
      Closes #5531 from javadba/picsub and squashes the following commits:
      
      2aab8cf [sboeschhuawei] Fixed bug in PICExample in which the radius were not being accepted on command line
    • [SPARK-6844][SQL] Clean up accumulators used in InMemoryRelation when it is uncached · cf38fe04
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-6844
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5475 from viirya/cache_memory_leak and squashes the following commits:
      
      0b41235 [Liang-Chi Hsieh] fix style.
      dc1d5d5 [Liang-Chi Hsieh] For comments.
      78af229 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cache_memory_leak
      26c9bb6 [Liang-Chi Hsieh] Add configuration to enable in-memory table scan accumulators.
      1c3b06e [Liang-Chi Hsieh] Clean up accumulators used in InMemoryRelation when it is uncached.
    • [SPARK-6638] [SQL] Improve performance of StringType in SQL · 85842760
      Davies Liu authored
      This PR changes the internal representation of StringType from java.lang.String to UTF8String, which is implemented using Array[Byte].
      
      This PR should not break any public API; Row.getString() will still return java.lang.String.
      
      This is the first step toward improving the performance of strings in SQL.
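      
      A minimal sketch of the representation change: a string wrapper that stores UTF-8 bytes and converts to `java.lang.String` only at the API boundary. `Utf8Str` is a hypothetical class for illustration, not Spark's `UTF8String`; equality, hashing, and ordering are defined directly on the bytes so no decoding is needed for comparisons.
      
      ```scala
      import java.nio.charset.StandardCharsets.UTF_8
      import java.util.Arrays
      
      // Hypothetical UTF-8 string wrapper backed by Array[Byte].
      final class Utf8Str private (private val bytes: Array[Byte]) extends Ordered[Utf8Str] {
        // Number of bytes, not characters: a non-ASCII character takes 2 to 4 bytes.
        def numBytes: Int = bytes.length
      
        // Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
        def numChars: Int = bytes.count(b => (b & 0xC0) != 0x80)
      
        // Decode only when a java.lang.String is required (e.g. for Row.getString()).
        override def toString: String = new String(bytes, UTF_8)
      
        // Byte-wise equality and hashing.
        override def equals(other: Any): Boolean = other match {
          case that: Utf8Str => Arrays.equals(bytes, that.bytes)
          case _ => false
        }
        override def hashCode(): Int = Arrays.hashCode(bytes)
      
        // Unsigned byte-wise comparison, which matches code point order for UTF-8.
        def compare(that: Utf8Str): Int = {
          val n = math.min(bytes.length, that.bytes.length)
          var i = 0
          while (i < n) {
            val diff = (bytes(i) & 0xFF) - (that.bytes(i) & 0xFF)
            if (diff != 0) return diff
            i += 1
          }
          bytes.length - that.bytes.length
        }
      }
      
      object Utf8Str {
        def apply(s: String): Utf8Str = new Utf8Str(s.getBytes(UTF_8))
      }
      ```
      
      The bytes must be compared as unsigned values: Java bytes are signed, so a naive comparison would sort "é" (leading byte 0xC3) before "z" (0x7A).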
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5350 from davies/string and squashes the following commits:
      
      3b7bfa8 [Davies Liu] fix schema of AddJar
      2772f0d [Davies Liu] fix new test failure
      6d776a9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
      59025c8 [Davies Liu] address comments from @marmbrus
      341ec2c [Davies Liu] turn off scala style check in UTF8StringSuite
      744788f [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
      b04a19c [Davies Liu] add comment for getString/setString
      08d897b [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
      5116b43 [Davies Liu] rollback unrelated changes
      1314a37 [Davies Liu] address comments from Yin
      867bf50 [Davies Liu] fix String filter push down
      13d9d42 [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
      2089d24 [Davies Liu] add hashcode check back
      ac18ae6 [Davies Liu] address comment
      fd11364 [Davies Liu] optimize UTF8String
      8d17f21 [Davies Liu] fix hive compatibility tests
      e5fa5b8 [Davies Liu] remove clone in UTF8String
      28f3d81 [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
      28d6f32 [Davies Liu] refactor
      537631c [Davies Liu] some comment about Date
      9f4c194 [Davies Liu] convert data type for data source
      956b0a4 [Davies Liu] fix hive tests
      73e4363 [Davies Liu] Merge branch 'master' of github.com:apache/spark into string
      9dc32d1 [Davies Liu] fix some hive tests
      23a766c [Davies Liu] refactor
      8b45864 [Davies Liu] fix codegen with UTF8String
      bb52e44 [Davies Liu] fix scala style
      c7dd4d2 [Davies Liu] fix some catalyst tests
      38c303e [Davies Liu] fix python sql tests
      5f9e120 [Davies Liu] fix sql tests
      6b499ac [Davies Liu] fix style
      a85fb27 [Davies Liu] refactor
      d32abd1 [Davies Liu] fix utf8 for python api
      4699c3a [Davies Liu] use Array[Byte] in UTF8String
      21f67c6 [Davies Liu] cleanup
      685fd07 [Davies Liu] use UTF8String instead of String for StringType
    • [SPARK-6887][SQL] ColumnBuilder misses FloatType · 785f9558
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-6887
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5499 from yhuai/inMemFloat and squashes the following commits:
      
      84cba38 [Yin Huai] Add test.
      4b75ba6 [Yin Huai] Add FloatType back.
    • [SPARK-6800][SQL] Update doc for JDBCRelation's columnPartition · e3e4e9a3
      Liang-Chi Hsieh authored
      JIRA https://issues.apache.org/jira/browse/SPARK-6800
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5488 from viirya/fix_jdbc_where and squashes the following commits:
      
      51386c8 [Liang-Chi Hsieh] Update code comment.
      1dcc929 [Liang-Chi Hsieh] Update document.
      3eb74d6 [Liang-Chi Hsieh] Revert and modify doc.
      df11783 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_jdbc_where
      3e7db15 [Liang-Chi Hsieh] Fix wrong logic to generate WHERE clause for JDBC.
    • [SPARK-6730][SQL] Allow using keyword as identifier in OPTIONS · b75b3070
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-6730
      
      It is quite likely that keywords will be used as identifiers in `OPTIONS`; this PR makes that work.
      
      However, another approach would be to require that `OPTIONS` not include keywords and use an alternative identifier (e.g. table -> cassandraTable) if needed.
      
      If that approach is preferred, please let me know and I will close this PR. Thanks.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5520 from viirya/relax_options and squashes the following commits:
      
      339fd68 [Liang-Chi Hsieh] Use regex parser.
      92be11c [Liang-Chi Hsieh] Allow using keyword as identifier in OPTIONS.
    • [SPARK-6886] [PySpark] fix big closure with shuffle · f11288d5
      Davies Liu authored
      Currently, the created broadcast object has the same life cycle as the RDD in Python. For multistage jobs, a PythonRDD is created in the JVM while the RDD in Python may be garbage collected, so the broadcast may be destroyed in the JVM before the PythonRDD is done with it.
      
      This PR changes this to use the PythonRDD to track the lifecycle of the broadcast object. It also refactors getNumPartitions() to avoid unnecessarily creating a PythonRDD, which could be expensive.
      
      cc JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5496 from davies/big_closure and squashes the following commits:
      
      9a0ea4c [Davies Liu] fix big closure with shuffle
    • SPARK-6861 [BUILD] Scalastyle config prevents building Maven child modules alone · 6c5ed8a6
      Sean Owen authored
      Move scalastyle-config.xml to dev/ (SBT config still doesn't work) to fix running mvn targets from subdirs; make scalastyle a verify stage target again in Maven; output results in target not project root; update to scalastyle 0.7.0
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5471 from srowen/SPARK-6861 and squashes the following commits:
      
      acac637 [Sean Owen] Oops, add back execution but leave it at the default verify phase
      35a4fd2 [Sean Owen] Revert change to scalastyle-config.xml location, but return scalastyle Maven check to verify phase instead of package to get it farther out of the way, since the Maven invocation is optional
      c4fb42c [Sean Owen] Move scalastyle-config.xml to dev/ (SBT config still doesn't work) to fix running mvn targets from subdirs; make scalastyle a verify stage target again in Maven; output results in target not project root; update to scalastyle 0.7.0
    • [HOTFIX] [SPARK-6896] [SQL] fix compile error in hive-thriftserver · 29aabdd6
      Daoyuan Wang authored
      SPARK-6440 (#5424) imported Guava but did not promote the Guava dependency to compile scope, which causes the following build error:
      
      ```
      [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
      [info] Compiling 8 Scala sources to /root/projects/spark/sql/hive-thriftserver/target/scala-2.10/classes...
      [error] bad symbolic reference. A signature in Utils.class refers to term util
      [error] in package com.google.common which is not available.
      [error] It may be completely missing from the current classpath, or the version on
      [error] the classpath might be incompatible with the version used when compiling Utils.class.
      [error]
      [error] while compiling: /root/projects/spark/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
      [error] during phase: erasure
      [error] library version: version 2.10.4
      [error] compiler version: version 2.10.4
      [error] reconstructed args: -deprecation -classpath
      ```
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #5507 from adrian-wang/guava and squashes the following commits:
      
      c337dad [Daoyuan Wang] fix compile error
    • [SPARK-6871][SQL] WITH clause in CTE can not following another WITH clause · 6be91894
      Liang-Chi Hsieh authored
      JIRA https://issues.apache.org/jira/browse/SPARK-6871
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5480 from viirya/no_cte_after_cte and squashes the following commits:
      
      4da3712 [Liang-Chi Hsieh] Create new test.
      40b38ed [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into no_cte_after_cte
      0edf568 [Liang-Chi Hsieh] for comments.
      6591b79 [Liang-Chi Hsieh] WITH clause in CTE can not following another WITH clause.
  3. Apr 14, 2015
    • [SPARK-5634] [core] Show correct message in HS when no incomplete apps found. · 30a6e0dc
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5515 from vanzin/SPARK-5634 and squashes the following commits:
      
      f74ecf1 [Marcelo Vanzin] [SPARK-5634] [core] Show correct message in HS when no incomplete apps found.
    • [SPARK-6890] [core] Fix launcher lib work with SPARK_PREPEND_CLASSES. · 97173893
      Marcelo Vanzin authored
      The fix for SPARK-6406 broke the case where sub-processes are launched
      when SPARK_PREPEND_CLASSES is set, because the code now would only add
      the launcher's build directory to the sub-process's classpath instead
      of the complete assembly.
      
      This patch fixes the problem by having the launch scripts stash the
      assembly's location in an environment variable. This is not the prettiest
      solution, but it avoids having to plumb that location all the way through
      the Worker code that launches executors. The env variable is always
      set by the launch scripts, so users cannot override it.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5504 from vanzin/SPARK-6890 and squashes the following commits:
      
      7aec921 [Marcelo Vanzin] Fix tests.
      ff87a60 [Marcelo Vanzin] Merge branch 'master' into SPARK-6890
      31d3ce8 [Marcelo Vanzin] [SPARK-6890] [core] Fix launcher lib work with SPARK_PREPEND_CLASSES.
    • [SPARK-6796][Streaming][WebUI] Add "Active Batches" and "Completed Batches" lists to StreamingPage · 6de282e2
      zsxwing authored
      This PR adds two lists, `Active Batches` and `Completed Batches`. Here is the screenshot:
      
      ![batch_list](https://cloud.githubusercontent.com/assets/1000778/7060458/d8898572-deb3-11e4-938b-6f8602c71a9f.png)
      
      Due to [SPARK-6766](https://issues.apache.org/jira/browse/SPARK-6766), I had to merge #5414 locally to get the screenshot above.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5434 from zsxwing/SPARK-6796 and squashes the following commits:
      
      be50fc6 [zsxwing] Fix the code style
      51b792e [zsxwing] Fix the unit test
      6f3078e [zsxwing] Make 'startTime' readable
      f40e0a9 [zsxwing] Merge branch 'master' into SPARK-6796
      2525336 [zsxwing] Rename 'Processed batches' and 'Waiting batches' and also add links
      a69c091 [zsxwing] Show the number of total completed batches too
      a12ad7b [zsxwing] Change 'records' to 'events' in the UI
      86b5e7f [zsxwing] Make BatchTableBase abstract
      b248787 [zsxwing] Add tests to verify the new tables
      d18ab7d [zsxwing] Fix the code style
      6ceffb3 [zsxwing] Add "Active Batches" and "Completed Batches" lists to StreamingPage
    • Revert "[SPARK-6352] [SQL] Add DirectParquetOutputCommitter" · a76b921a
      Josh Rosen authored
      This reverts commit b29663ee.
      
      I'm reverting this because it broke test compilation for the Hadoop 1.x
      profiles.
    • [SPARK-6769][YARN][TEST] Usage of the ListenerBus in YarnClusterSuite is wrong · 4d4b2492
      Kousuke Saruta authored
      In YarnClusterSuite, a test case uses `SaveExecutorInfo` to handle ExecutorAddedEvent as follows.
      
      ```
      private class SaveExecutorInfo extends SparkListener {
        val addedExecutorInfos = mutable.Map[String, ExecutorInfo]()
      
        override def onExecutorAdded(executor: SparkListenerExecutorAdded) {
          addedExecutorInfos(executor.executorId) = executor.executorInfo
        }
      }
      
      ...
      
          listener = new SaveExecutorInfo
          val sc = new SparkContext(new SparkConf()
            .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and $dollarSigns"))
          sc.addSparkListener(listener)
          val status = new File(args(0))
          var result = "failure"
          try {
            val data = sc.parallelize(1 to 4, 4).collect().toSet
            assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))
            data should be (Set(1, 2, 3, 4))
            result = "success"
          } finally {
            sc.stop()
            Files.write(result, status, UTF_8)
          }
      ```
      
      However, this usage is wrong: executors are spawned while SparkContext is being initialized, and SparkContext#addSparkListener can only be invoked after initialization, i.e. after the executors have spawned, so SaveExecutorInfo never receives ExecutorAddedEvent.
      
      The following code checks the result of handling ExecutorAddedEvent. For the reason above, `addedExecutorInfos` is empty, so the assertion is never reached.
      
      ```
          // verify log urls are present
          listener.addedExecutorInfos.values.foreach { info =>
            assert(info.logUrlMap.nonEmpty)
          }
      ```
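      
      The ordering problem can be reproduced with a tiny, self-contained model. `Bus`, `ExecutorAdded`, `Listener`, and `startContext` are hypothetical names, not Spark classes: the bus delivers each event only to listeners registered at the time it is posted, so a listener added after "initialization" never sees the executor-added events.
      
      ```scala
      import scala.collection.mutable.{ArrayBuffer, Map => MutableMap}
      
      // Hypothetical event and listener types.
      final case class ExecutorAdded(executorId: String, logUrl: String)
      trait Listener { def onExecutorAdded(e: ExecutorAdded): Unit }
      
      // A bus that delivers events only to listeners that are already registered.
      final class Bus {
        private val listeners = ArrayBuffer.empty[Listener]
        def addListener(l: Listener): Unit = listeners += l
        def post(e: ExecutorAdded): Unit = listeners.foreach(_.onExecutorAdded(e))
      }
      
      final class SaveExecutorInfo extends Listener {
        val added = MutableMap.empty[String, String]
        def onExecutorAdded(e: ExecutorAdded): Unit = added.update(e.executorId, e.logUrl)
      }
      
      // Models context start-up: executors register (and events are posted) here,
      // so only listeners passed in up front can observe them.
      def startContext(bus: Bus, early: Seq[Listener]): Unit = {
        early.foreach(bus.addListener)
        bus.post(ExecutorAdded("1", "http://host:8042/logs/1"))
        bus.post(ExecutorAdded("2", "http://host:8042/logs/2"))
      }
      ```
      
      A listener registered after `startContext` returns ends up with an empty map, which is exactly why a `foreach` over `addedExecutorInfos` in the test silently asserts nothing.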
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #5417 from sarutak/SPARK-6769 and squashes the following commits:
      
      8adc8ba [Kousuke Saruta] Fixed compile error
      e258530 [Kousuke Saruta] Fixed style
      591cf3e [Kousuke Saruta] Fixed style
      48ec89a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6769
      860c965 [Kousuke Saruta] Simplified code
      207d325 [Kousuke Saruta] Added findListenersByClass method to ListenerBus
      2408c84 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6769
      2d7e409 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6769
      3874adf [Kousuke Saruta] Fixed the usage of listener bus in LogUrlsStandaloneSuite
      153a91b [Kousuke Saruta] Fixed the usage of listener bus in YarnClusterSuite
    • [SPARK-5808] [build] Package pyspark files in sbt assembly. · 65774370
      Marcelo Vanzin authored
      This turned out to be more complicated than I wanted because the
      layout of python/ doesn't really follow the usual maven conventions.
      So some extra code is needed to copy just the right things.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5461 from vanzin/SPARK-5808 and squashes the following commits:
      
      7153dac [Marcelo Vanzin] Only try to create resource dir if it doesn't already exist.
      ee90e84 [Marcelo Vanzin] [SPARK-5808] [build] Package pyspark files in sbt assembly.
    • [SPARK-6905] Upgrade to snappy-java 1.1.1.7 · 6adb8bcb
      Josh Rosen authored
      We should upgrade our snappy-java dependency to 1.1.1.7 in order to include a fix for a bug that results in worse compression in SnappyOutputStream (see https://github.com/xerial/snappy-java/issues/100).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits:
      
      f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7.
    • [SPARK-6700] [yarn] Re-enable flaky test. · b075e4b7
      Marcelo Vanzin authored
      Test runs have been successful on Jenkins, so let's re-enable the test, watch for any failures, and fix things as needed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5459 from vanzin/SPARK-6700 and squashes the following commits:
      
      2ead85b [Marcelo Vanzin] WIP: re-enable flaky test to catch failure in jenkins.
    • SPARK-1706: Allow multiple executors per worker in Standalone mode · 8f8dc45f
      CodingCat authored
      Resubmission of https://github.com/apache/spark/pull/636 with a completely different algorithm.
      
      https://issues.apache.org/jira/browse/SPARK-1706
      
      In the current implementation, the user has to start multiple workers on a server in order to run multiple executors on that server, which introduces additional overhead from the extra JVM processes.
      
      In this patch, I changed the scheduling logic in the Master so that the user can start multiple executor processes under the same worker JVM process.
      
      1. The user configures spark.executor.maxCoreNumPerExecutor to suggest the maximum number of cores to allocate to each executor.
      
      2. The Master assigns executors to workers based primarily on memoryPerExecutor and worker.freeMemory, and tries to allocate as many cores as possible to each executor: ```min(min(memoryPerExecutor, worker.freeCore), maxLeftCoreToAssign)```, where ```maxLeftCoreToAssign = maxExecutorCanAssign * maxCoreNumPerExecutor```.
      
      ---------------------------------------
      
      Other small changes include:
      
      Renamed memoryPerSlave in ApplicationDescription to memoryPerExecutor, because "Slave" is overloaded to mean both worker and executor in the documentation. (Have we discussed this before?)
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #731 from CodingCat/SPARK-1706-2 and squashes the following commits:
      
      6dee808 [CodingCat] change filter predicate
      fbeb7e5 [CodingCat] address the comments
      940cb42 [CodingCat] avoid unnecessary allocation
      b8ca561 [CodingCat] revert a change
      45967b4 [CodingCat] remove unused method
      2eeff77 [CodingCat] stylistic fixes
      12a1b32 [CodingCat] change the semantic of coresPerExecutor to exact core number
      f035423 [CodingCat] stylistic fix
      d9c1685 [CodingCat] remove unused var
      f595bd6 [CodingCat] recover some unintentional changes
      63b3df9 [CodingCat] change the description of the parameter in the submit script
      4cf61f1 [CodingCat] improve the code and docs
      ff011e2 [CodingCat] start multiple executors on the worker by rewriting startExeuctor logic
      2c2bcc5 [CodingCat] fix wrong usage info
      497ec2c [CodingCat] address andrew's comments
      878402c [CodingCat] change the launching executor code
      f64a28d [CodingCat] typo fix
      387f4ec [CodingCat] bug fix
      35c462c [CodingCat] address Andrew's comments
      0b64fea [CodingCat] fix compilation issue
      19d3da7 [CodingCat] address the comments
      5b81466 [CodingCat] remove outdated comments
      ec7d421 [CodingCat] test commit
      e5efabb [CodingCat] more java docs and consolidate canUse function
      a26096d [CodingCat] stylistic fix
      a5d629a [CodingCat] java doc
      b34ec0c [CodingCat] make master support multiple executors per worker
      8f8dc45f
    • GuoQiang Li's avatar
      [SPARK-2033] Automatically cleanup checkpoint · 25998e4d
      GuoQiang Li authored
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #855 from witgo/cleanup_checkpoint_date and squashes the following commits:
      
      1649850 [GuoQiang Li] review commit
      c0087e0 [GuoQiang Li] Automatically cleanup checkpoint
      25998e4d
    • pankaj arora's avatar
      [CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure · dcf8a9f3
      pankaj arora authored
      Fixed the null check for the case where all the dependent stages are cancelled because of a previous stage failure. This happens when an executor node goes down and all the dependent stages are cancelled.
      
      Author: pankaj arora <pankaj.arora@guavus.com>
      
      Closes #5494 from pankajarora12/NEWBRANCH and squashes the following commits:
      
      55ba5e3 [pankaj arora] [CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure
      4575720 [pankaj arora] [CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure
      dcf8a9f3
    • WangTaoTheTonic's avatar
      [SPARK-6894]spark.executor.extraLibraryOptions => spark.executor.extraLibraryPath · f63b44a5
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-6894
      
      cc vanzin
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #5506 from WangTaoTheTonic/SPARK-6894 and squashes the following commits:
      
      4b7ced7 [WangTaoTheTonic] spark.executor.extraLibraryOptions => spark.executor.extraLibraryPath
      f63b44a5
    • Timothy Chen's avatar
      [SPARK-6081] Support fetching http/https uris in driver runner. · 320bca45
      Timothy Chen authored
      Currently, when passed http/https URIs, the driver runner cannot fetch them because it only fetches through the Hadoop FileSystem.
      This fix uses the existing utility method so that remote URIs can be fetched as well.
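      A minimal Python sketch of scheme-based dispatch, assuming the runner picks a fetch strategy from the URI scheme: http/https go through a generic remote download, and everything else (hdfs://, file://, bare local paths) keeps the Hadoop FileSystem path. `choose_fetcher` and the strategy labels are hypothetical; the actual patch reuses an existing Spark utility method rather than code like this.

```python
from urllib.parse import urlparse

# Schemes this sketch treats as remote web resources (assumption based on the
# commit text, which mentions http/https).
REMOTE_SCHEMES = {"http", "https"}


def choose_fetcher(uri: str) -> str:
    """Return a label for the strategy used to fetch `uri` (hypothetical)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in REMOTE_SCHEMES:
        # Per the commit, these previously went only through the Hadoop
        # FileSystem path and could not be fetched.
        return "remote-download"
    # hdfs://, file:// and bare local paths keep the Hadoop FileSystem path.
    return "hadoop-fs"
```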
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #4832 from tnachen/driver_remote and squashes the following commits:
      
      aa52cd6 [Timothy Chen] Support fetching remote uris in driver runner.
      320bca45
    • Erik van Oosten's avatar
      SPARK-6878 [CORE] Fix for sum on empty RDD fails with exception · 51b306b9
      Erik van Oosten authored
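      The title refers to `sum` failing with an exception on an empty RDD. A common cause of this class of bug, assumed here rather than read from the patch, is summing with a reduce that has no zero element and therefore cannot handle empty input; folding from `0.0` returns `0.0` instead. The sketch below illustrates the difference in Python with `functools.reduce` on plain lists, not with Spark's RDD API.

```python
from functools import reduce
from operator import add
from typing import Iterable


def sum_with_reduce(values: Iterable[float]) -> float:
    # No initial value: raises TypeError on an empty iterable.
    return reduce(add, values)


def sum_with_fold(values: Iterable[float]) -> float:
    # Seeding with a zero value makes the empty case well-defined.
    return reduce(add, values, 0.0)
```

      With this sketch, `sum_with_fold([])` returns `0.0`, while `sum_with_reduce([])` raises `TypeError`.
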
      Author: Erik van Oosten <evanoosten@ebay.com>
      
      Closes #5489 from erikvanoosten/master and squashes the following commits:
      
      1c91954 [Erik van Oosten] Rewrote double range matcher to an exact equality assert (SPARK-6878)
      f1708c9 [Erik van Oosten] Fix for sum on empty RDD fails with exception (SPARK-6878)
      51b306b9
    • Punyashloka Biswal's avatar
      [SPARK-6731] Bump version of apache commons-math3 · 628a72f7
      Punyashloka Biswal authored
      Version 3.1.1 is two years old and the newer version includes
      approximate percentile statistics (among other things).
      
      Author: Punyashloka Biswal <punya.biswal@gmail.com>
      
      Closes #5380 from punya/patch-1 and squashes the following commits:
      
      226622b [Punyashloka Biswal] Bump version of apache commons-math3
      628a72f7
    • Brennon York's avatar
      [WIP][HOTFIX][SPARK-4123]: Fix bug in PR dependency (all deps. removed issue) · 77eeb10f
      Brennon York authored
      We're sporadically seeing a bug in the new PR dependency comparison test whereby it reports that *all* dependencies have been removed. This happens when the current PR is built but the final, sorted dependency file is left blank. I believe this is caused either by the way the `git checkout` calls are made or by an error within the `mvn` build for that PR (again, likely related to the `git checkout`). As such, I've set the checkouts to force (with the `-f` flag), which is more in line with what Jenkins currently does on the initial checkout.
      
      Marking this as WIP for now so I can trigger the build process many times and see whether the issue still arises.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5443 from brennonyork/HOTFIX2-SPARK-4123 and squashes the following commits:
      
      f2186be [Brennon York] added output for the various git commit refs
      3f073d6 [Brennon York] removed the git checkouts piping to dev null
      07765a6 [Brennon York] updated the diff logic to reference the filenames rather than hardlink
      e3f63c7 [Brennon York] added '-f' to the checkout flags for git
      710c8d1 [Brennon York] added 30 minutes to the test benchmark
      77eeb10f
  4. Apr 13, 2015
    • Xiangrui Meng's avatar
      [SPARK-5957][ML] better handling of parameters · 971b95b0
      Xiangrui Meng authored
      The design doc was posted on the JIRA page. Python changes will be in a follow-up PR. jkbradley
      
      1. Use codegen for shared params.
      1. Move shared params to package `ml.param.shared`.
      1. Set default values in `Params` instead of in `Param`.
      1. Add a few methods to `Params` and `ParamMap`.
      1. Move schema handling to `SchemaUtils` from `Params`.
      
      - [x] check visibility of the methods added
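      A minimal Python sketch of the default-value handling in points 3 and 4, assuming default values live on the `Params` owner and explicitly set values override them when values are extracted. The names loosely mirror those in the squashed commits (`setDefault`, `getOrElse` on `ParamMap`, `extractValues`), but the snake_case signatures and the precedence order (defaults, then instance-set values, then an extra `ParamMap` passed at call time) are assumptions for illustration, not Spark ML's Scala API.

```python
from typing import Any, Dict, Optional


class ParamMap:
    """Explicit parameter values; a sketch, not Spark's ParamMap."""

    def __init__(self, values: Optional[Dict[str, Any]] = None) -> None:
        self._values: Dict[str, Any] = dict(values or {})

    def put(self, name: str, value: Any) -> "ParamMap":
        self._values[name] = value
        return self

    def get_or_else(self, name: str, default: Any) -> Any:
        # Analogue of the getOrElse added to ParamMap in this change.
        return self._values.get(name, default)

    def items(self):
        return self._values.items()


class Params:
    """Owner of parameters; defaults are stored here rather than on each param."""

    def __init__(self) -> None:
        self._defaults: Dict[str, Any] = {}
        self._explicit = ParamMap()

    def set_default(self, name: str, value: Any) -> None:
        self._defaults[name] = value

    def set(self, name: str, value: Any) -> None:
        self._explicit.put(name, value)

    def extract_values(self, extra: Optional[ParamMap] = None) -> Dict[str, Any]:
        # Assumed precedence, lowest to highest: defaults, values set on the
        # instance, then the extra ParamMap supplied at call time.
        merged = dict(self._defaults)
        merged.update(self._explicit.items())
        if extra is not None:
            merged.update(extra.items())
        return merged
```

      For instance, after `set_default("maxIter", 100)`, `set_default("regParam", 0.0)` and `set("regParam", 0.1)`, calling `extract_values(ParamMap({"maxIter": 10}))` yields `{"maxIter": 10, "regParam": 0.1}`.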
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5431 from mengxr/SPARK-5957 and squashes the following commits:
      
      d19236d [Xiangrui Meng] fix test
      26ae2d7 [Xiangrui Meng] re-gen code and mark clear protected
      38b78c7 [Xiangrui Meng] update Param.toString and remove Params.explain()
      409e2d5 [Xiangrui Meng] address comments
      2d637bd [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5957
      eec2264 [Xiangrui Meng] make get* public in Params
      4090d95 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5957
      4fee9e7 [Xiangrui Meng] re-gen shared params
      2737c2d [Xiangrui Meng] rename SharedParamCodeGen to SharedParamsCodeGen
      e938f81 [Xiangrui Meng] update code to set default parameter values
      28ed322 [Xiangrui Meng] merge master
      55be1f3 [Xiangrui Meng] merge master
      d63b5cc [Xiangrui Meng] fix examples
      29b004c [Xiangrui Meng] update ParamsSuite
      94fd98e [Xiangrui Meng] fix explain params
      48d0e84 [Xiangrui Meng] add remove and update explainParams
      4ac6348 [Xiangrui Meng] move schema utils to SchemaUtils add a few methods to Params
      0d9594e [Xiangrui Meng] add getOrElse to ParamMap
      eeeffe8 [Xiangrui Meng] map ++ paramMap => extractValues
      0d3fc5b [Xiangrui Meng] setDefault after param
      a9dbf59 [Xiangrui Meng] minor updates
      d9302b8 [Xiangrui Meng] generate default values
      1c72579 [Xiangrui Meng] pass test compile
      abb7a3b [Xiangrui Meng] update default values handling
      dcab97a [Xiangrui Meng] add codegen for shared params
      971b95b0
    • hlin09's avatar
      [Minor][SparkR] Minor refactor and removes redundancy related to cleanClosure. · 0ba3fdd5
      hlin09 authored
      1. Only use `cleanClosure` when creating RRDDs. Users and developers normally do not need to call `cleanClosure` in their function definitions.
      2. Remove redundant code (e.g. unnecessary wrapper functions) related to `cleanClosure`.
      
      Author: hlin09 <hlin09pu@gmail.com>
      
      Closes #5495 from hlin09/cleanClosureFix and squashes the following commits:
      
      74ec303 [hlin09] Minor refactor and removes redundancy.
      0ba3fdd5