  1. May 15, 2014
    • Patrick Wendell's avatar
      HOTFIX: Don't build Javadoc in Maven when creating releases. · 514157f2
      Patrick Wendell authored
      Because we've added java package descriptions in some packages that don't
      have any Java files, running the Javadoc target hits this issue:
      
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4492654
      
      To fix this I've simply removed the javadoc target when publishing
      releases.
      514157f2
    • witgo's avatar
      fix different versions of commons-lang dependency and apache/spark#746 addendum · bae07e36
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #754 from witgo/commons-lang and squashes the following commits:
      
      3ebab31 [witgo] merge master
      f3b8fa2 [witgo] merge master
      2083fae [witgo] repeat definition
      5599cdb [witgo] multiple version of sbt  dependency
      c1b66a1 [witgo] fix different versions of commons-lang dependency
      bae07e36
    • Prashant Sharma's avatar
      Package docs · 46324279
      Prashant Sharma authored
      These are a few changes based on the original patch by @scrapcodes.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #785 from pwendell/package-docs and squashes the following commits:
      
      c32b731 [Patrick Wendell] Changes based on Prashant's patch
      c0463d3 [Prashant Sharma] added eof new line
      ce8bf73 [Prashant Sharma] Added eof new line to all files.
      4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs
      46324279
    • Patrick Wendell's avatar
      Documentation: Encourage use of reduceByKey instead of groupByKey. · 21570b46
      Patrick Wendell authored
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #784 from pwendell/group-by-key and squashes the following commits:
      
      9b4505f [Patrick Wendell] Small fix
      6347924 [Patrick Wendell] Documentation: Encourage use of reduceByKey instead of groupByKey.
      21570b46
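      As an illustration of this guidance (a minimal sketch, not taken from the docs change itself): both snippets below compute per-key sums, but reduceByKey combines values on the map side before shuffling, while groupByKey ships every value across the network first.

      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.SparkContext._  // brings pair-RDD operations into scope in Spark 1.x

      def perKeySums(sc: SparkContext): Unit = {
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
        val withReduce = pairs.reduceByKey(_ + _)             // preferred: map-side combine
        val withGroup  = pairs.groupByKey().mapValues(_.sum)  // same result, larger shuffle
        withReduce.collect().foreach(println)
        withGroup.collect().foreach(println)
      }
      ```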
  2. May 14, 2014
    • Matei Zaharia's avatar
      Add language tabs and Python version to interactive part of quick-start · f10de042
      Matei Zaharia authored
      This is an addition of some stuff that was missed in https://issues.apache.org/jira/browse/SPARK-1567. I've also updated the doc to show submitting the Python application with spark-submit.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #782 from mateiz/spark-1567-extra and squashes the following commits:
      
      6f8f2aa [Matei Zaharia] tweaks
      9ed9874 [Matei Zaharia] tweaks
      ae67c3e [Matei Zaharia] tweak
      b303ba3 [Matei Zaharia] tweak
      1433a4d [Matei Zaharia] Add language tabs and Python version to interactive part of quick-start guide
      f10de042
    • Tathagata Das's avatar
      [SPARK-1840] SparkListenerBus prints out scary error message when terminated normally · ad4e60ee
      Tathagata Das authored
      Running the SparkPi example gave this error.
      ```
      Pi is roughly 3.14374
      14/05/14 18:16:19 ERROR Utils: Uncaught exception in thread SparkListenerBus
      scala.runtime.NonLocalReturnControl$mcV$sp
      ```
      This is due to the catch-all in SparkListenerBus, which logged the control throwable that the Scala runtime uses internally for non-local returns.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #783 from tdas/controlexception-fix and squashes the following commits:
      
      a466c8d [Tathagata Das] Ignored control exceptions when logging all exceptions.
      ad4e60ee
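      A hedged sketch of the fix described above (names here are illustrative, not the actual Utils code): when logging "all" uncaught exceptions, rethrow Scala's control-flow throwables such as NonLocalReturnControl instead of reporting them as errors.

      ```scala
      import scala.util.control.ControlThrowable

      def logUncaughtExceptions(block: => Unit): Unit = {
        try {
          block
        } catch {
          case ct: ControlThrowable =>
            throw ct  // used by the compiler for control flow (e.g. non-local return); not an error
          case t: Throwable =>
            System.err.println(s"Uncaught exception in thread ${Thread.currentThread().getName}: $t")
            throw t
        }
      }
      ```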
    • Chen Chao's avatar
      default task number misleading in several places · 2f639957
      Chen Chao authored
        private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
          new HashPartitioner(numPartitions)
        }

      This shows that the default number of tasks in Spark Streaming depends on the defaultParallelism value of SparkContext, which is in turn determined by the config property spark.default.parallelism.

      For the property "spark.default.parallelism", see https://github.com/apache/spark/pull/389 (a small configuration sketch follows this entry).
      
      Author: Chen Chao <crazyjvm@gmail.com>
      
      Closes #766 from CrazyJvm/patch-7 and squashes the following commits:
      
      0b7efba [Chen Chao] Update streaming-programming-guide.md
      cc5b66c [Chen Chao] default task number misleading in several places
      2f639957
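      A minimal sketch of the point above (configuration values are illustrative): the default number of tasks used by shuffle operations in Spark Streaming follows spark.default.parallelism unless numPartitions is passed explicitly.

      ```scala
      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      val conf = new SparkConf()
        .setAppName("default-parallelism-demo")
        .setMaster("local[2]")
        .set("spark.default.parallelism", "8")  // feeds defaultPartitioner(...) above
      val ssc = new StreamingContext(conf, Seconds(1))
      println(ssc.sparkContext.defaultParallelism)  // prints 8
      ssc.stop()
      ```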
    • wangfei's avatar
      [SPARK-1826] fix the head notation of package object dsl · 44165fc9
      wangfei authored
      Author: wangfei <scnbwf@yeah.net>
      
      Closes #765 from scwf/dslfix and squashes the following commits:
      
      d2d1a9d [wangfei] Update package.scala
      66ff53b [wangfei] fix the head notation of package object dsl
      44165fc9
    • andrewor14's avatar
      [Typo] propertes -> properties · 9ad096d5
      andrewor14 authored
      Author: andrewor14 <andrewor14@gmail.com>
      
      Closes #780 from andrewor14/submit-typo and squashes the following commits:
      
      e70e057 [andrewor14] propertes -> properties
      9ad096d5
    • Xiangrui Meng's avatar
      [SPARK-1696][MLLIB] use alpha in dense dspr · e3d72a74
      Xiangrui Meng authored
      It doesn't affect existing code because only `alpha = 1.0` is used in the code.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #778 from mengxr/mllib-dspr-fix and squashes the following commits:
      
      a37402e [Xiangrui Meng] use alpha in dense dspr
      e3d72a74
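      For context, dspr is the BLAS symmetric packed rank-1 update A := alpha * x * x^T + A. The sketch below (written for this note, not MLlib's code) shows where the alpha factor belongs when updating the packed upper triangle of A.

      ```scala
      // `a` holds the upper triangle of the symmetric matrix A in column-major packed order.
      def dspr(alpha: Double, x: Array[Double], a: Array[Double]): Unit = {
        val n = x.length
        var k = 0
        var j = 0
        while (j < n) {
          var i = 0
          while (i <= j) {
            a(k) += alpha * x(i) * x(j)  // dropping alpha silently assumes alpha == 1.0
            i += 1
            k += 1
          }
          j += 1
        }
      }
      ```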
    • Jacek Laskowski's avatar
      String interpolation + some other small changes · 601e3719
      Jacek Laskowski authored
      After having been invited to make the change in https://github.com/apache/spark/commit/6bee01dd04ef73c6b829110ebcdd622d521ea8ff#commitcomment-6284165 by @witgo.
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #748 from jaceklaskowski/sparkenv-string-interpolation and squashes the following commits:
      
      be6ebac [Jacek Laskowski] String interpolation + some other small changes
      601e3719
    • Xiangrui Meng's avatar
      [FIX] do not load defaults when testing SparkConf in pyspark · 94c6c06e
      Xiangrui Meng authored
      The default constructor loads default properties, which can fail the test.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #775 from mengxr/pyspark-conf-fix and squashes the following commits:
      
      83ef6c4 [Xiangrui Meng] do not load defaults when testing SparkConf in pyspark
      94c6c06e
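      The fix above is in PySpark's doctests, but the Scala constructor exposes the same switch; a minimal illustration:

      ```scala
      import org.apache.spark.SparkConf

      // loadDefaults = false keeps system properties (spark.*) from leaking into a test's config.
      val testConf = new SparkConf(loadDefaults = false).set("spark.app.name", "conf-test")
      println(testConf.get("spark.app.name"))  // "conf-test", regardless of the JVM's spark.* properties
      ```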
    • Patrick Wendell's avatar
      SPARK-1833 - Have an empty SparkContext constructor. · 65533c7e
      Patrick Wendell authored
      This is nicer than relying on new SparkContext(new SparkConf())
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #774 from pwendell/spark-context and squashes the following commits:
      
      ef9f12f [Patrick Wendell] SPARK-1833 - Have an empty SparkContext constructor.
      65533c7e
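      A minimal sketch of how the new constructor reads in practice (assuming all configuration comes from the environment, e.g. spark-submit or spark-defaults.conf):

      ```scala
      import org.apache.spark.SparkContext

      object EmptyConstructorExample {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext()  // previously this required new SparkContext(new SparkConf())
          println(sc.defaultParallelism)
          sc.stop()
        }
      }
      ```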
    • Andrew Ash's avatar
      SPARK-1829 Sub-second durations shouldn't round to "0 s" · a3315d7f
      Andrew Ash authored
      As "99 ms" up to 99 ms
      As "0.1 s" from 0.1 s up to 0.9 s
      
      https://issues.apache.org/jira/browse/SPARK-1829
      
      Compare the first image to the second here: http://imgur.com/RaLEsSZ,7VTlgfo#0
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #768 from ash211/spark-1829 and squashes the following commits:
      
      1c15b8e [Andrew Ash] SPARK-1829 Format sub-second durations more appropriately
      a3315d7f
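      A minimal sketch of the formatting rule described above (not the actual UI code): durations below one second render as milliseconds or tenths of a second instead of rounding to "0 s".

      ```scala
      def formatDuration(ms: Long): String = {
        if (ms < 100) s"$ms ms"                                  // "99 ms" up to 99 ms
        else if (ms < 1000) "%.1f s".format((ms / 100) / 10.0)   // "0.1 s" .. "0.9 s"
        else s"${ms / 1000} s"
      }
      ```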
    • witgo's avatar
      Fix: sbt test throw an java.lang.OutOfMemoryError: PermGen space · fde82c15
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #773 from witgo/sbt_javaOptions and squashes the following commits:
      
      26c7d38 [witgo] Improve sbt configuration
      fde82c15
    • Mark Hamstra's avatar
      [SPARK-1620] Handle uncaught exceptions in function run by Akka scheduler · 17f3075b
      Mark Hamstra authored
      If the intended behavior was that uncaught exceptions thrown in functions being run by the Akka scheduler would end up being handled by the default uncaught exception handler set in Executor, and if that behavior is, in fact, correct, then this is a way to accomplish that.  I'm not certain, though, that we shouldn't be doing something different to handle uncaught exceptions from some of these scheduled functions.
      
      In any event, this PR covers all of the cases I comment on in [SPARK-1620](https://issues.apache.org/jira/browse/SPARK-1620).
      
      Author: Mark Hamstra <markhamstra@gmail.com>
      
      Closes #622 from markhamstra/SPARK-1620 and squashes the following commits:
      
      071d193 [Mark Hamstra] refactored post-SPARK-1772
      1a6a35e [Mark Hamstra] another style fix
      d30eb94 [Mark Hamstra] scalastyle
      3573ecd [Mark Hamstra] Use wrapped try/catch in Utils.tryOrExit
      8fc0439 [Mark Hamstra] Make functions run by the Akka scheduler use Executor's UncaughtExceptionHandler
      17f3075b
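      A hedged sketch of the tryOrExit idea referenced in the commits above (the real Utils.tryOrExit may differ): run the scheduled block and hand any uncaught exception to the JVM-wide uncaught exception handler rather than letting the scheduler swallow it.

      ```scala
      def tryOrExit(block: => Unit): Unit = {
        try {
          block
        } catch {
          case t: Throwable =>
            Option(Thread.getDefaultUncaughtExceptionHandler) match {
              case Some(handler) => handler.uncaughtException(Thread.currentThread(), t)
              case None          => throw t
            }
        }
      }
      ```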
    • Patrick Wendell's avatar
      SPARK-1828: Created forked version of hive-exec that doesn't bundle other dependencies · d58cb33f
      Patrick Wendell authored
      See https://issues.apache.org/jira/browse/SPARK-1828 for more information.
      
      This is being submitted to Jenkins for testing. The dependency won't fully
      propagate in Maven Central for a few more hours.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #767 from pwendell/hive-shaded and squashes the following commits:
      
      ea10ac5 [Patrick Wendell] SPARK-1828: Created forked version of hive-exec that doesn't bundle other dependencies
      d58cb33f
    • Andrew Ash's avatar
      SPARK-1818 Freshen Mesos documentation · d1d41cce
      Andrew Ash authored
      Place more emphasis on using precompiled binary versions of Spark and Mesos
      instead of encouraging the reader to compile from source.
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #756 from ash211/spark-1818 and squashes the following commits:
      
      7ef3b33 [Andrew Ash] Brief explanation of the interactions between Spark and Mesos
      e7dea8e [Andrew Ash] Add troubleshooting and debugging section
      956362d [Andrew Ash] Don't need to pass spark.executor.uri into the spark shell
      de3353b [Andrew Ash] Wrap to 100char
      7ebf6ef [Andrew Ash] Polish on the section on Mesos Master URLs
      3dcc2c1 [Andrew Ash] Use --tgz parameter of make-distribution
      41b68ed [Andrew Ash] Period at end of sentence; formatting on :5050
      8bf2c53 [Andrew Ash] Update site.MESOS_VERSIOn to match /pom.xml
      74f2040 [Andrew Ash] SPARK-1818 Freshen Mesos documentation
      d1d41cce
    • Sean Owen's avatar
      SPARK-1827. LICENSE and NOTICE files need a refresh to contain transitive dependency info · 2e5a7cde
      Sean Owen authored
      LICENSE and NOTICE policy is explained here:
      
      http://www.apache.org/dev/licensing-howto.html
      http://www.apache.org/legal/3party.html
      
      This leads to the following changes.
      
      First, this change enables two extensions to maven-shade-plugin in assembly/ that will try to include and merge all NOTICE and LICENSE files. This can't hurt.
      
      This generates a consolidated NOTICE file that I manually added to NOTICE.
      
      Next, a list of all dependencies and their licenses was generated:
      `mvn ... license:aggregate-add-third-party`
      to create: `target/generated-sources/license/THIRD-PARTY.txt`
      
      Each dependency is listed with one or more licenses. Determine the most-compatible license for each if there is more than one.
      
      For "unknown" license dependencies, I manually evaluateD their license. Many are actually Apache projects or components of projects covered already. The only non-trivial one was Colt, which has its own (compatible) license.
      
      I ignored Apache-licensed and public domain dependencies as these require no further action (beyond NOTICE above).
      
      BSD and MIT licenses (permissive Category A licenses) are evidently supposed to be mentioned in LICENSE, so I added a section with output from the THIRD-PARTY.txt file appropriately.
      
      Everything else (Category B licenses) is evidently supposed to be mentioned in NOTICE; I did the same there.
      
      LICENSE contained some license statements for source code that is redistributed. I left this as I think that is the right place to put it.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #770 from srowen/SPARK-1827 and squashes the following commits:
      
      a764504 [Sean Owen] Add LICENSE and NOTICE info for all transitive dependencies as of 1.0
      2e5a7cde
    • Tathagata Das's avatar
      Fixed streaming examples docs to use run-example instead of spark-submit · 68f28dab
      Tathagata Das authored
      Pretty self-explanatory
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #722 from tdas/example-fix and squashes the following commits:
      
      7839979 [Tathagata Das] Minor changes.
      0673441 [Tathagata Das] Fixed java docs of java streaming example
      e687123 [Tathagata Das] Fixed scala style errors.
      9b8d112 [Tathagata Das] Fixed streaming examples docs to use run-example instead of spark-submit.
      68f28dab
    • Andrew Or's avatar
      [SPARK-1769] Executor loss causes NPE race condition · 69f75022
      Andrew Or authored
      This PR replaces the Schedulable data structures in Pool.scala with thread-safe ones from java. Note that Scala's `with SynchronizedBuffer` trait is soon to be deprecated in 2.11 because it is ["inherently unreliable"](http://www.scala-lang.org/api/2.11.0/index.html#scala.collection.mutable.SynchronizedBuffer). We should slowly drift away from `SynchronizedBuffer` in other places too.
      
      Note that this PR introduces an API-breaking change: `sc.getAllPools` now returns an Array rather than an ArrayBuffer. This is because we want this method to return an immutable copy; handing back the mutable buffer could confuse users who try to modify the copy, since those modifications have no effect on the original data structure.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #762 from andrewor14/pool-npe and squashes the following commits:
      
      383e739 [Andrew Or] JavaConverters -> JavaConversions
      3f32981 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
      769be19 [Andrew Or] Assorted minor changes
      2189247 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pool-npe
      05ad9e9 [Andrew Or] Fix test - contains is not the same as containsKey
      0921ea0 [Andrew Or] var -> val
      07d720c [Andrew Or] Synchronize Schedulable data structures
      69f75022
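      A hedged sketch of the pattern described above (a stand-in, not the actual Pool.scala code): keep the schedulables in java.util.concurrent collections and hand callers an immutable snapshot rather than the live collection.

      ```scala
      import java.util.concurrent.ConcurrentHashMap
      import scala.collection.JavaConverters._
      import scala.reflect.ClassTag

      class PoolRegistry[T: ClassTag] {
        private val poolsByName = new ConcurrentHashMap[String, T]()

        def register(name: String, pool: T): Unit = poolsByName.put(name, pool)

        // Returns a copy; mutating the returned Array cannot affect the internal map.
        def getAllPools: Array[T] = poolsByName.values.asScala.toArray
      }
      ```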
    • Marcelo Vanzin's avatar
      Fix dep exclusion: avro-ipc, not avro, depends on netty. · 54ae8328
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #763 from vanzin/netty-dep-hell and squashes the following commits:
      
      dfb6ce2 [Marcelo Vanzin] Fix dep exclusion: avro-ipc, not avro, depends on netty.
      54ae8328
    • Koert Kuipers's avatar
      SPARK-1801. expose InterruptibleIterator and TaskKilledException in developer api · b22952fa
      Koert Kuipers authored
      
      Author: Koert Kuipers <koert@tresata.com>
      
      Closes #764 from koertkuipers/feat-rdd-developerapi and squashes the following commits:
      
      8516dd2 [Koert Kuipers] SPARK-1801. expose InterruptibleIterator and TaskKilledException in developer api
      b22952fa
    • Michael Armbrust's avatar
      [SQL] Improve column pruning. · 6ce08844
      Michael Armbrust authored
      Fixed a bug that was preventing us from ever pruning beneath Joins.
      
      ## TPC-DS Q3
      ### Before:
      ```
      Aggregate false, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand_id#64 AS brand_id#0,i_brand#65 AS brand#1,SUM(PartialSum#79) AS sum_agg#2]
       Exchange (HashPartitioning [d_year#12:0,i_brand#65:1,i_brand_id#64:2], 150)
        Aggregate true, [d_year#12,i_brand#65,i_brand_id#64], [d_year#12,i_brand#65,i_brand_id#64,SUM(CAST(ss_ext_sales_price#49, DoubleType)) AS PartialSum#79]
         Project [d_year#12:6,i_brand#65:59,i_brand_id#64:58,ss_ext_sales_price#49:43]
          HashJoin [ss_item_sk#36], [i_item_sk#57], BuildRight
           Exchange (HashPartitioning [ss_item_sk#36:30], 150)
            HashJoin [d_date_sk#6], [ss_sold_date_sk#34], BuildRight
             Exchange (HashPartitioning [d_date_sk#6:0], 150)
              Filter (d_moy#14:8 = 12)
               HiveTableScan [d_date_sk#6,d_date_id#7,d_date#8,d_month_seq#9,d_week_seq#10,d_quarter_seq#11,d_year#12,d_dow#13,d_moy#14,d_dom#15,d_qoy#16,d_fy_year#17,d_fy_quarter_seq#18,d_fy_week_seq#19,d_day_name#20,d_quarter_name#21,d_holiday#22,d_weekend#23,d_following_holiday#24,d_first_dom#25,d_last_dom#26,d_same_day_ly#27,d_same_day_lq#28,d_current_day#29,d_current_week#30,d_current_month#31,d_current_quarter#32,d_current_year#33], (MetastoreRelation default, date_dim, Some(dt)), None
             Exchange (HashPartitioning [ss_sold_date_sk#34:0], 150)
              HiveTableScan [ss_sold_date_sk#34,ss_sold_time_sk#35,ss_item_sk#36,ss_customer_sk#37,ss_cdemo_sk#38,ss_hdemo_sk#39,ss_addr_sk#40,ss_store_sk#41,ss_promo_sk#42,ss_ticket_number#43,ss_quantity#44,ss_wholesale_cost#45,ss_list_price#46,ss_sales_price#47,ss_ext_discount_amt#48,ss_ext_sales_price#49,ss_ext_wholesale_cost#50,ss_ext_list_price#51,ss_ext_tax#52,ss_coupon_amt#53,ss_net_paid#54,ss_net_paid_inc_tax#55,ss_net_profit#56], (MetastoreRelation default, store_sales, None), None
           Exchange (HashPartitioning [i_item_sk#57:0], 150)
            Filter (i_manufact_id#70:13 = 436)
             HiveTableScan [i_item_sk#57,i_item_id#58,i_rec_start_date#59,i_rec_end_date#60,i_item_desc#61,i_current_price#62,i_wholesale_cost#63,i_brand_id#64,i_brand#65,i_class_id#66,i_class#67,i_category_id#68,i_category#69,i_manufact_id#70,i_manufact#71,i_size#72,i_formulation#73,i_color#74,i_units#75,i_container#76,i_manager_id#77,i_product_name#78], (MetastoreRelation default, item, None), None
      ```
      ### After
      ```
      Aggregate false, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand_id#224 AS brand_id#160,i_brand#225 AS brand#161,SUM(PartialSum#239) AS sum_agg#162]
       Exchange (HashPartitioning [d_year#172:0,i_brand#225:1,i_brand_id#224:2], 150)
        Aggregate true, [d_year#172,i_brand#225,i_brand_id#224], [d_year#172,i_brand#225,i_brand_id#224,SUM(CAST(ss_ext_sales_price#209, DoubleType)) AS PartialSum#239]
         Project [d_year#172:1,i_brand#225:5,i_brand_id#224:3,ss_ext_sales_price#209:0]
          HashJoin [ss_item_sk#196], [i_item_sk#217], BuildRight
           Exchange (HashPartitioning [ss_item_sk#196:2], 150)
            Project [ss_ext_sales_price#209:2,d_year#172:1,ss_item_sk#196:3]
             HashJoin [d_date_sk#166], [ss_sold_date_sk#194], BuildRight
              Exchange (HashPartitioning [d_date_sk#166:0], 150)
               Project [d_date_sk#166:0,d_year#172:1]
                Filter (d_moy#174:2 = 12)
                 HiveTableScan [d_date_sk#166,d_year#172,d_moy#174], (MetastoreRelation default, date_dim, Some(dt)), None
              Exchange (HashPartitioning [ss_sold_date_sk#194:2], 150)
               HiveTableScan [ss_ext_sales_price#209,ss_item_sk#196,ss_sold_date_sk#194], (MetastoreRelation default, store_sales, None), None
           Exchange (HashPartitioning [i_item_sk#217:1], 150)
            Project [i_brand_id#224:0,i_item_sk#217:1,i_brand#225:2]
             Filter (i_manufact_id#230:3 = 436)
              HiveTableScan [i_brand_id#224,i_item_sk#217,i_brand#225,i_manufact_id#230], (MetastoreRelation default, item, None), None
      ```
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #729 from marmbrus/fixPruning and squashes the following commits:
      
      5feeff0 [Michael Armbrust] Improve column pruning.
      6ce08844
  3. May 13, 2014
    • larvaboy's avatar
      Implement ApproximateCountDistinct for SparkSql · c33b8dcb
      larvaboy authored
      Add the implementation for ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib, and do the count in two phases: 1) count the number of distinct elements in each partition, and 2) merge the HyperLogLog results from the different partitions.
      
      A simple serializer and test cases are added as well.
      
      Author: larvaboy <larvaboy@gmail.com>
      
      Closes #737 from larvaboy/master and squashes the following commits:
      
      bd8ef3f [larvaboy] Add support of user-provided standard deviation to ApproxCountDistinct.
      9ba8360 [larvaboy] Fix alignment and null handling issues.
      95b4067 [larvaboy] Add a test case for count distinct and approximate count distinct.
      f57917d [larvaboy] Add the parser for the approximate count.
      a2d5d10 [larvaboy] Add ApproximateCountDistinct aggregates and functions.
      7ad273a [larvaboy] Add SparkSql serializer for HyperLogLog.
      1d9aacf [larvaboy] Fix a minor typo in the toString method of the Count case class.
      653542b [larvaboy] Fix a couple of minor typos.
      c33b8dcb
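      A hedged sketch of the two-phase structure described above, using a stand-in Sketch trait in place of stream-lib's HyperLogLog (the method names here are assumptions, not the SparkSql implementation):

      ```scala
      import org.apache.spark.rdd.RDD

      trait Sketch extends Serializable {
        def offer(value: Any): Unit       // phase 1: add one element to the partition's sketch
        def merge(other: Sketch): Sketch  // phase 2: combine partial sketches
        def cardinality(): Long
      }

      def approxCountDistinct[T](rdd: RDD[T], newSketch: () => Sketch): Long = {
        val merged = rdd
          .mapPartitions { iter =>       // phase 1: one sketch per partition
            val sketch = newSketch()
            iter.foreach(sketch.offer)
            Iterator(sketch)
          }
          .reduce((a, b) => a.merge(b))  // phase 2: merge across partitions
        merged.cardinality()
      }
      ```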
    • Syed Hashmi's avatar
      [SPARK-1784] Add a new partitioner to allow specifying # of keys per partition · 92cebada
      Syed Hashmi authored
      This change adds a new partitioner which allows users
      to specify # of keys per partition.
      
      Author: Syed Hashmi <shashmi@cloudera.com>
      
      Closes #721 from syedhashmi/master and squashes the following commits:
      
      4ca94cc [Syed Hashmi] [SPARK-1784] Add a new partitioner
      92cebada
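      A hedged sketch of the idea (not the PR's actual partitioner): assuming keys are 0-based Long indices, e.g. produced by zipWithIndex, consecutive keys are bucketed into partitions of a fixed size.

      ```scala
      import org.apache.spark.Partitioner

      class FixedKeysPerPartitionPartitioner(totalKeys: Long, keysPerPartition: Int) extends Partitioner {
        require(keysPerPartition > 0, "keysPerPartition must be positive")

        override val numPartitions: Int =
          math.max(1, math.ceil(totalKeys.toDouble / keysPerPartition).toInt)

        override def getPartition(key: Any): Int =
          (key.asInstanceOf[Long] / keysPerPartition).toInt
      }
      ```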
    • Michael Armbrust's avatar
      [SQL] Make it possible to create Java/Python SQLContexts from an existing Scala SQLContext. · 44233865
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #761 from marmbrus/existingContext and squashes the following commits:
      
      4651051 [Michael Armbrust] Make it possible to create Java/Python SQLContexts from an existing Scala SQLContext.
      44233865
    • Ye Xianjin's avatar
      [SPARK-1527] change rootDir*.getName to rootDir*.getAbsolutePath · 753b04de
      Ye Xianjin authored
      JIRA issue: [SPARK-1527](https://issues.apache.org/jira/browse/SPARK-1527)
      
      getName() only gets the last component of the file path. When deleting test-generated directories,
      we should pass the generated directory's absolute path to DiskBlockManager.
      
      Author: Ye Xianjin <advancedxy@gmail.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Patrick Wendell <pwendell@gmail.com>
      
      Closes #436 from advancedxy/SPARK-1527 and squashes the following commits:
      
      4678bab [Ye Xianjin] change rootDir*.getname to rootDir*.getAbsolutePath so the temporary directories are deleted when the test is finished.
      753b04de
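      A small illustration of the distinction behind this fix (paths are examples): getName is only the last path component, so handing it to cleanup code points at the wrong location, while getAbsolutePath identifies the actual directory to delete.

      ```scala
      import java.io.File

      val rootDir = new File(System.getProperty("java.io.tmpdir"), "spark-local-test")
      println(rootDir.getName)          // e.g. "spark-local-test"
      println(rootDir.getAbsolutePath)  // e.g. "/tmp/spark-local-test"
      ```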
    • Andrew Or's avatar
      [SPARK-1816] LiveListenerBus dies if a listener throws an exception · 5c0dafc2
      Andrew Or authored
      The solution is to wrap a try / catch / log around the posting of each event to each listener.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #759 from andrewor14/listener-die and squashes the following commits:
      
      aee5107 [Andrew Or] Merge branch 'master' of github.com:apache/spark into listener-die
      370939f [Andrew Or] Remove two layers of indirection
      422d278 [Andrew Or] Explicitly throw an exception instead of 1 / 0
      0df0e2a [Andrew Or] Try/catch and log exceptions when posting events
      5c0dafc2
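      A minimal sketch of the fix described above (assumed names, not the actual LiveListenerBus code): post each event to each listener inside its own try/catch so one misbehaving listener cannot kill the bus thread.

      ```scala
      def postToAll[E](event: E, listeners: Seq[E => Unit]): Unit = {
        listeners.foreach { listener =>
          try {
            listener(event)
          } catch {
            case e: Exception =>
              System.err.println(s"Listener threw an exception, continuing: $e")  // log and move on
          }
        }
      }
      ```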
    • Andrew Tulloch's avatar
      SPARK-1791 - SVM implementation does not use threshold parameter · d1e48747
      Andrew Tulloch authored
      Summary:
      https://issues.apache.org/jira/browse/SPARK-1791
      
      Simple fix, and backward compatible, since
      
      - anyone who set the threshold was getting completely wrong answers.
      - anyone who did not set the threshold had the default 0.0 value for the threshold anyway.
      
      Test Plan:
      Unit test added that is verified to fail under the old implementation,
      and pass under the new implementation.
      
      Reviewers:
      
      CC:
      
      Author: Andrew Tulloch <andrew@tullo.ch>
      
      Closes #725 from ajtulloch/SPARK-1791-SVM and squashes the following commits:
      
      770f55d [Andrew Tulloch] SPARK-1791 - SVM implementation does not use threshold parameter
      d1e48747
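      A hedged illustration of the bug class described above (not MLlib's code): the raw margin must be compared against the configured threshold rather than a hard-coded 0.0, and an unset threshold returns the raw margin.

      ```scala
      def predictPoint(margin: Double, threshold: Option[Double]): Double = threshold match {
        case Some(t) => if (margin > t) 1.0 else 0.0  // honor the user-supplied threshold
        case None    => margin                        // no threshold set: expose the raw margin
      }
      ```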
    • William Benton's avatar
      SPARK-571: forbid return statements in cleaned closures · 16ffadcc
      William Benton authored
      This patch checks top-level closure arguments to `ClosureCleaner.clean` for `return` statements and raises an exception if it finds any.  This is mainly a user-friendliness addition, since programs with return statements in closure arguments will currently fail upon RDD actions with a less-than-intuitive error message.
      
      Author: William Benton <willb@redhat.com>
      
      Closes #717 from willb/spark-571 and squashes the following commits:
      
      c41eb7d [William Benton] Another test case for SPARK-571
      30c42f4 [William Benton] Stylistic cleanups
      559b16b [William Benton] Stylistic cleanups from review
      de13b79 [William Benton] Style fixes
      295b6a5 [William Benton] Forbid return statements in closure arguments.
      b017c47 [William Benton] Added a test for SPARK-571
      16ffadcc
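      An illustrative example of why such closures fail today (written for this note): the `return` compiles to a NonLocalReturnControl throwable that escapes the closure at action time, so rejecting it up front with a clear message is friendlier.

      ```scala
      import org.apache.spark.rdd.RDD

      def containsNegative(rdd: RDD[Int]): Boolean = {
        rdd.foreach { x =>
          if (x < 0) return true  // targets containsNegative, not the lambda; throws NonLocalReturnControl on the executor
        }
        false
      }
      ```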
    • Patrick Wendell's avatar
      52d90529
  4. May 12, 2014
    • Sandy Ryza's avatar
      SPARK-1815. SparkContext should not be marked DeveloperApi · 2792bd01
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #753 from sryza/sandy-spark-1815 and squashes the following commits:
      
      957a8ac [Sandy Ryza] SPARK-1815. SparkContext should not be marked DeveloperApi
      2792bd01
    • Andrew Or's avatar
      [SPARK-1753 / 1773 / 1814] Update outdated docs for spark-submit, YARN, standalone etc. · 2ffd1eaf
      Andrew Or authored
      YARN
      - SparkPi was updated to not take in master as an argument; we should update the docs to reflect that.
      - The default YARN build guide should be in maven, not sbt.
      - This PR also adds a paragraph on steps to debug a YARN application.
      
      Standalone
      - Emphasize spark-submit more. Right now it's one small paragraph preceding the legacy way of launching through `org.apache.spark.deploy.Client`.
      - The way we set configurations / environment variables according to the old docs is outdated. This needs to reflect changes introduced by the Spark configuration changes we made.
      
      In general, this PR also adds a little more documentation on the new spark-shell, spark-submit, spark-defaults.conf etc here and there.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #701 from andrewor14/yarn-docs and squashes the following commits:
      
      e2c2312 [Andrew Or] Merge in changes in #752 (SPARK-1814)
      25cfe7b [Andrew Or] Merge in the warning from SPARK-1753
      a8c39c5 [Andrew Or] Minor changes
      336bbd9 [Andrew Or] Tabs -> spaces
      4d9d8f7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      041017a [Andrew Or] Abstract Spark submit documentation to cluster-overview.html
      3cc0649 [Andrew Or] Detail how to set configurations + remove legacy instructions
      5b7140a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      85a51fc [Andrew Or] Update run-example, spark-shell, configuration etc.
      c10e8c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      381fe32 [Andrew Or] Update docs for standalone mode
      757c184 [Andrew Or] Add a note about the requirements for the debugging trick
      f8ca990 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-docs
      924f04c [Andrew Or] Revert addition of --deploy-mode
      d5fe17b [Andrew Or] Update the YARN docs
      2ffd1eaf
    • Andrew Or's avatar
      [SPARK-1780] Non-existent SPARK_DAEMON_OPTS is lurking around · ba96bb3d
      Andrew Or authored
      What they really mean is SPARK_DAEMON_***JAVA***_OPTS
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #751 from andrewor14/spark-daemon-opts and squashes the following commits:
      
      70c41f9 [Andrew Or] SPARK_DAEMON_OPTS -> SPARK_DAEMON_JAVA_OPTS
      ba96bb3d
    • Andrew Ash's avatar
      SPARK-1757 Failing test for saving null primitives with .saveAsParquetFile() · 156df87e
      Andrew Ash authored
      https://issues.apache.org/jira/browse/SPARK-1757
      
      The first test succeeds, but the second test fails with an exception:
      
      ```
      [info] - save and load case class RDD with Nones as parquet *** FAILED *** (14 milliseconds)
      [info]   java.lang.RuntimeException: Unsupported datatype StructType(List())
      [info]   at scala.sys.package$.error(package.scala:27)
      [info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetRelation.scala:201)
      [info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235)
      [info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$1.apply(ParquetRelation.scala:235)
      [info]   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      [info]   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      [info]   at scala.collection.immutable.List.foreach(List.scala:318)
      [info]   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
      [info]   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
      [info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetRelation.scala:234)
      [info]   at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetRelation.scala:267)
      [info]   at org.apache.spark.sql.parquet.ParquetRelation$.createEmpty(ParquetRelation.scala:143)
      [info]   at org.apache.spark.sql.parquet.ParquetRelation$.create(ParquetRelation.scala:122)
      [info]   at org.apache.spark.sql.execution.SparkStrategies$ParquetOperations$.apply(SparkStrategies.scala:139)
      [info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      [info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      [info]   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      [info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
      [info]   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:264)
      [info]   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:264)
      [info]   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:265)
      [info]   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:265)
      [info]   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:268)
      [info]   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:268)
      [info]   at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:66)
      [info]   at org.apache.spark.sql.SchemaRDD.saveAsParquetFile(SchemaRDD.scala:98)
      ```
      
      Author: Andrew Ash <andrew@andrewash.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #690 from ash211/rdd-parquet-save and squashes the following commits:
      
      747a0b9 [Andrew Ash] Merge pull request #1 from marmbrus/pr/690
      54bd00e [Michael Armbrust] Need to put Option first since Option <: Seq.
      8f3f281 [Andrew Ash] SPARK-1757 Add failing test for saving SparkSQL Schemas with Option[?] fields as parquet
      156df87e
    • Kousuke Saruta's avatar
      Modify a typo in monitoring.md · 9cf9f189
      Kousuke Saruta authored
      As I mentioned in SPARK-1765, the word 'JXM' appears in monitoring.md.
      I think it's a typo for 'JMX'.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #698 from sarutak/SPARK-1765 and squashes the following commits:
      
      bae9843 [Kousuke Saruta] modified a typoe in monitoring.md
      9cf9f189
    • DB Tsai's avatar
      L-BFGS Documentation · 5c2275d6
      DB Tsai authored
      Documentation for L-BFGS, and an example of training binary L2 logistic regression using L-BFGS.
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #702 from dbtsai/dbtsai-lbfgs-doc and squashes the following commits:
      
      0712215 [DB Tsai] Update
      38fdfa1 [DB Tsai] Removed extra empty line
      5745b64 [DB Tsai] Update again
      e9e418e [DB Tsai] Update
      7381521 [DB Tsai] L-BFGS Documentation
      5c2275d6
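      A sketch in the spirit of that documentation (parameter values are illustrative; the runLBFGS signature is the one the MLlib optimization docs describe):

      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.mllib.linalg.Vectors
      import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
      import org.apache.spark.mllib.regression.LabeledPoint
      import org.apache.spark.rdd.RDD

      def trainL2Logistic(sc: SparkContext, points: RDD[LabeledPoint], numFeatures: Int) = {
        val data = points.map(p => (p.label, p.features))  // (label, feature vector) pairs
        val (weights, lossHistory) = LBFGS.runLBFGS(
          data,
          new LogisticGradient(),
          new SquaredL2Updater(),  // L2 regularization
          10,                      // numCorrections
          1e-4,                    // convergenceTol
          100,                     // maxNumIterations
          0.1,                     // regParam
          Vectors.dense(new Array[Double](numFeatures)))  // initialWeights
        (weights, lossHistory)
      }
      ```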
    • Andrew Ash's avatar
      Typo: resond -> respond · a5150d19
      Andrew Ash authored
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #743 from ash211/patch-4 and squashes the following commits:
      
      c959f3b [Andrew Ash] Typo: resond -> respond
      a5150d19