  1. Apr 03, 2015
    • [SPARK-6428] Turn on explicit type checking for public methods. · 82701ee2
      Reynold Xin authored
      This builds on my earlier pull requests and turns on the explicit type checking in scalastyle.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5342 from rxin/SPARK-6428 and squashes the following commits:
      
      7b531ab [Reynold Xin] import ordering
      2d9a8a5 [Reynold Xin] jl
      e668b1c [Reynold Xin] override
      9b9e119 [Reynold Xin] Parenthesis.
      82e0cf5 [Reynold Xin] [SPARK-6428] Turn on explicit type checking for public methods.
      82701ee2
    • [SPARK-6575][SQL] Converted Parquet Metastore tables no longer cache metadata · c42c3fc7
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-6575
      
      Author: Yin Huai <yhuai@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Cheng Lian <lian@databricks.com>
      
      Closes #5339 from yhuai/parquetRelationCache and squashes the following commits:
      
      b0e1a42 [Yin Huai] Address comments.
      83d9846 [Yin Huai] Remove unnecessary change.
      c0dc7a4 [Yin Huai] Cache converted parquet relations.
      c42c3fc7
    • [SPARK-6621][Core] Fix the bug that calling EventLoop.stop in... · 440ea31b
      zsxwing authored
      [SPARK-6621][Core] Fix the bug that calling EventLoop.stop in EventLoop.onReceive/onError/onStart doesn't call onStop
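The failure mode can be sketched in a few lines of Python (an illustrative stand-in for Spark's Scala `EventLoop`, not its actual code): `onStop` must run exactly once even when `stop` is invoked re-entrantly from inside `onReceive`, and the event thread must never try to join itself.

```python
import queue
import threading


class EventLoop:
    """Toy event loop: stop() may be called re-entrantly from
    on_receive, and on_stop must still run exactly once."""

    def __init__(self):
        self._queue = queue.Queue()
        self._stopped = threading.Event()
        self._on_stop_called = False
        self._thread = threading.Thread(target=self._run)

    def _run(self):
        while not self._stopped.is_set():
            try:
                event = self._queue.get(timeout=0.1)
            except queue.Empty:
                continue
            self.on_receive(event)
        self.on_stop()  # runs even if stop() came from on_receive

    def on_receive(self, event):
        pass

    def on_stop(self):
        self._on_stop_called = True

    def start(self):
        self._thread.start()

    def post(self, event):
        self._queue.put(event)

    def stop(self):
        self._stopped.set()
        # Never join the event thread from itself: that would deadlock
        # when stop() is called from inside on_receive.
        if threading.current_thread() is not self._thread:
            self._thread.join()
```

Calling `loop.stop()` from inside `on_receive` exercises the re-entrant path: the loop thread exits its dispatch loop and still invokes `on_stop`.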
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5280 from zsxwing/SPARK-6621 and squashes the following commits:
      
      521125e [zsxwing] Fix the bug that calling EventLoop.stop in EventLoop.onReceive and EventLoop.onError doesn't call onStop
      440ea31b
  2. Apr 02, 2015
    • [SPARK-6345][STREAMING][MLLIB] Fix for training with prediction · 6e1c1ec6
      freeman authored
      This patch fixes a reported bug causing model updates to not properly propagate to model predictions during streaming regression. These minor changes in model declaration fix the problem, and I expanded the tests to include the scenario in which the bug was arising. The two new tests failed prior to the patch and now pass.
      
      cc mengxr
      
      Author: freeman <the.freeman.lab@gmail.com>
      
      Closes #5037 from freeman-lab/train-predict-fix and squashes the following commits:
      
      3af953e [freeman] Expand test coverage to include combined training and prediction
      8f84fc8 [freeman] Move model declaration
      6e1c1ec6
    • [CORE] The description of the jobHistory config should be spark.history.fs.logDirectory · 8a0aa81c
      KaiXinXiaoLei authored
The config option is spark.history.fs.logDirectory, not spark.fs.history.logDirectory, so the description should be changed. Thanks.
      
      Author: KaiXinXiaoLei <huleilei1@huawei.com>
      
      Closes #5332 from KaiXinXiaoLei/historyConfig and squashes the following commits:
      
      5ffbfb5 [KaiXinXiaoLei] the describe of jobHistory config is error
      8a0aa81c
    • [SPARK-6575][SQL] Converted Parquet Metastore tables no longer cache metadata · 4b82bd73
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-6575
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5339 from yhuai/parquetRelationCache and squashes the following commits:
      
      83d9846 [Yin Huai] Remove unnecessary change.
      c0dc7a4 [Yin Huai] Cache converted parquet relations.
      4b82bd73
    • [SPARK-6650] [core] Stop ExecutorAllocationManager when context stops. · 45134ec9
      Marcelo Vanzin authored
      This fixes the thread leak. I also changed the unit test to keep track
      of allocated contexts and make sure they're closed after tests are
      run; this is needed since some tests use this pattern:
      
          val sc = createContext()
          doSomethingThatMayThrow()
          sc.stop()
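The hazard in that pattern, and the tracked-context fix, can be sketched in Python (the `Context` class and `run_test` helper are hypothetical, not Spark's API): if the middle call raises, `sc.stop()` is skipped unless cleanup is guaranteed.

```python
class Context:
    """Toy stand-in for a SparkContext that owns a resource."""
    _live = []  # contexts allocated but not yet stopped

    def __init__(self):
        self.stopped = False
        Context._live.append(self)

    def stop(self):
        self.stopped = True
        if self in Context._live:
            Context._live.remove(self)


def run_test(body):
    """Run a test body against a fresh context, guaranteeing stop()
    even when the body raises."""
    sc = Context()
    try:
        body(sc)
    finally:
        sc.stop()
```

After each test the `_live` registry can be checked to confirm nothing leaked, mirroring the tracked-context approach described above.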
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5311 from vanzin/SPARK-6650 and squashes the following commits:
      
      652c73b [Marcelo Vanzin] Nits.
      5711512 [Marcelo Vanzin] More exception safety.
      cc5a744 [Marcelo Vanzin] Stop alloc manager before scheduler.
      9886f69 [Marcelo Vanzin] [SPARK-6650] [core] Stop ExecutorAllocationManager when context stops.
      45134ec9
    • [SPARK-6686][SQL] Use resolved output instead of names for toDF rename · 052dee07
      Michael Armbrust authored
      This is a workaround for a problem reported on the user list.  This doesn't fix the core problem, but in general is a more robust way to do renames.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #5337 from marmbrus/toDFrename and squashes the following commits:
      
      6a3159d [Michael Armbrust] [SPARK-6686][SQL] Use resolved output instead of names for toDF rename
      052dee07
    • [SPARK-6243][SQL] The match operation did not consider the scenarios that... · 947802cb
      DoingDone9 authored
[SPARK-6243][SQL] The match operation did not consider the scenarios where order.dataType does not match NativeType
      
It did not consider that order.dataType may not match NativeType, so I added `case other => ...` for the other scenarios.
      
      Author: DoingDone9 <799203320@qq.com>
      
      Closes #4959 from DoingDone9/case_ and squashes the following commits:
      
      6278846 [DoingDone9] Update rows.scala
      cb1852d [DoingDone9] Merge pull request #2 from apache/master
      c3f046f [DoingDone9] Merge pull request #1 from apache/master
      947802cb
    • [SQL][Minor] Use analyzed logical instead of unresolved in HiveComparisonTest · dfd2982b
      Cheng Hao authored
Some internal unit tests failed due to the logical plan node used in pattern matching in `HiveComparisonTest`, e.g.
https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala#L137

which may call the `output` function on an unresolved logical plan.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4946 from chenghao-intel/logical and squashes the following commits:
      
      432ecb3 [Cheng Hao] Use analyzed instead of logical in HiveComparisonTest
      dfd2982b
    • [SPARK-6618][SPARK-6669][SQL] Lock Hive metastore client correctly. · 5db89127
      Yin Huai authored
      Author: Yin Huai <yhuai@databricks.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #5333 from yhuai/lookupRelationLock and squashes the following commits:
      
      59c884f [Michael Armbrust] [SQL] Lock metastore client in analyzeTable
      7667030 [Yin Huai] Merge pull request #2 from marmbrus/pr/5333
      e4a9b0b [Michael Armbrust] Correctly lock on MetastoreCatalog
      d6fc32f [Yin Huai] Missing `)`.
      1e241af [Yin Huai] Protect InsertIntoHive.
      fee7e9c [Yin Huai] A test?
      5416b0f [Yin Huai] Just protect client.
      5db89127
    • [Minor] [SQL] Follow-up of PR #5210 · d3944b6f
      Cheng Lian authored
      This PR addresses rxin's comments in PR #5210.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #5219 from liancheng/spark-6554-followup and squashes the following commits:
      
      41f3a09 [Cheng Lian] Addresses comments in #5210
      d3944b6f
    • [SPARK-6655][SQL] We need to read the schema of a data source table stored in... · 251698fb
      Yin Huai authored
      [SPARK-6655][SQL] We need to read the schema of a data source table stored in spark.sql.sources.schema property
      
      https://issues.apache.org/jira/browse/SPARK-6655
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5313 from yhuai/SPARK-6655 and squashes the following commits:
      
      1e00c03 [Yin Huai] Unnecessary change.
      f131bd9 [Yin Huai] Fix.
      f1218c1 [Yin Huai] Failed test.
      251698fb
    • [SQL] Throw UnsupportedOperationException instead of NotImplementedError · 4214e50f
      Michael Armbrust authored
NotImplementedError in Scala 2.10 is a fatal exception, which is not a good thing to throw when the condition is not actually fatal.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #5315 from marmbrus/throwUnsupported and squashes the following commits:
      
      c29e03b [Michael Armbrust] [SQL] Throw UnsupportedOperationException instead of NotImplementedError
      052e05b [Michael Armbrust] [SQL] Throw UnsupportedOperationException instead of NotImplementedError
      4214e50f
    • SPARK-6414: Spark driver failed with NPE on job cancelation · e3202aa2
      Hung Lin authored
      Use Option for ActiveJob.properties to avoid NPE bug
      
      Author: Hung Lin <hung.lin@gmail.com>
      
      Closes #5124 from hunglin/SPARK-6414 and squashes the following commits:
      
      2290b6b [Hung Lin] [SPARK-6414][core] Fix NPE in SparkContext.cancelJobGroup()
      e3202aa2
    • [SPARK-6667] [PySpark] remove setReuseAddress · 0cce5451
      Davies Liu authored
Reusing the address on the server side caused the server to fail to acknowledge connected connections, so it is removed.
      
This PR retries once after a timeout, and it also adds a timeout on the client side.
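The retry-once-after-timeout behavior can be sketched generically in Python (the helper below is illustrative, not PySpark's actual client code):

```python
def retry_on_timeout(op, retries=1):
    """Run op(); on TimeoutError, retry up to `retries` more times,
    then re-raise the last timeout."""
    for attempt in range(retries + 1):
        try:
            return op()
        except TimeoutError:
            if attempt == retries:
                raise
```

With `retries=1` this matches "retry once after timeout": the first timeout is swallowed and the operation is attempted again; a second timeout propagates to the caller.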
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5324 from davies/collect_hang and squashes the following commits:
      
      e5a51a2 [Davies Liu] remove setReuseAddress
      7977c2f [Davies Liu] do retry on client side
      b838f35 [Davies Liu] retry after timeout
      0cce5451
    • [SPARK-6672][SQL] convert row to catalyst in createDataFrame(RDD[Row], ...) · 424e987d
      Xiangrui Meng authored
      We assume that `RDD[Row]` contains Scala types. So we need to convert them into catalyst types in createDataFrame. liancheng
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5329 from mengxr/SPARK-6672 and squashes the following commits:
      
      2d52644 [Xiangrui Meng] set needsConversion = false in jsonRDD
      06896e4 [Xiangrui Meng] add createDataFrame without conversion
      4a3767b [Xiangrui Meng] convert Row to catalyst
      424e987d
    • [SPARK-6627] Some clean-up in shuffle code. · 6562787b
      Patrick Wendell authored
      Before diving into review #4450 I did a look through the existing shuffle
      code to learn how it works. Unfortunately, there are some very
      confusing things in this code. This patch makes a few small changes
to simplify things. It is not easy to concisely describe the changes
      because of how convoluted the issues were, but they are fairly small
      logically:
      
      1. There is a trait named `ShuffleBlockManager` that only deals with
         one logical function which is retrieving shuffle block data given shuffle
   block coordinates. This trait has two implementors, FileShuffleBlockManager
         and IndexShuffleBlockManager. Confusingly the vast majority of those
         implementations have nothing to do with this particular functionality.
         So I've renamed the trait to ShuffleBlockResolver and documented it.
      2. The aforementioned trait had two almost identical methods, for no good
         reason. I removed one method (getBytes) and modified callers to use the
         other one. I think the behavior is preserved in all cases.
      3. The sort shuffle code uses an identifier "0" in the reduce slot of a
         BlockID as a placeholder. I made it into a constant since it needs to
         be consistent across multiple places.
      
      I think for (3) there is actually a better solution that would avoid the
      need to do this type of workaround/hack in the first place, but it's more
      complex so I'm punting it for now.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #5286 from pwendell/cleanup and squashes the following commits:
      
      c71fbc7 [Patrick Wendell] Open interface back up for testing
      f36edd5 [Patrick Wendell] Code review feedback
      d1c0494 [Patrick Wendell] Style fix
      a406079 [Patrick Wendell] [HOTFIX] Some clean-up in shuffle code.
      6562787b
    • [SPARK-6663] [SQL] use Literal.create instead of the constructor · 40df5d49
      Davies Liu authored
In order to do bounds checking and type conversion, we should use Literal.create() instead of the constructor.
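The difference between a validating factory and a raw constructor can be sketched in Python (the `Literal` class below is a toy, not Catalyst's):

```python
class Literal:
    """Toy literal node: the raw constructor trusts its caller,
    while the create() factory coerces and validates first."""

    def __init__(self, value, dtype):
        self.value = value   # stored as given, unchecked
        self.dtype = dtype

    @classmethod
    def create(cls, value, dtype):
        # Convert (and implicitly validate) before constructing.
        return cls(dtype(value), dtype)
```

Routing construction through the factory guarantees the stored value already has the declared type, which the bare constructor cannot promise.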
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5320 from davies/literal and squashes the following commits:
      
      1667604 [Davies Liu] fix style and add comment
      5f8c0fd [Davies Liu] use Literal.create instread of constructor
      40df5d49
  3. Apr 01, 2015
    • [SPARK-6658][SQL] Update DataFrame documentation to fix type references. · 191524e7
      Chet Mancini authored
First contribution here; would love to get some code contributions in soon. Let me know if there's anything about the contribution process I should improve.
      
      Author: Chet Mancini <chetmancini@gmail.com>
      
      Closes #5316 from chetmancini/SPARK_6658_dataframe_doc and squashes the following commits:
      
      53b627a [Chet Mancini] [SQL] SPARK-6658: Update DataFrame documentation to refer to correct types
      191524e7
    • [SPARK-6578] Small rewrite to make the logic more clear in MessageWithHeader.transferTo. · 899ebcb1
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5319 from rxin/SPARK-6578 and squashes the following commits:
      
      7c62a64 [Reynold Xin] Small rewrite to make the logic more clear in transferTo.
      899ebcb1
    • [SPARK-6660][MLLIB] pythonToJava doesn't recognize object arrays · 4815bc21
      Xiangrui Meng authored
      davies
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5318 from mengxr/SPARK-6660 and squashes the following commits:
      
      0f66ec2 [Xiangrui Meng] recognize object arrays
      ad8c42f [Xiangrui Meng] add a test for SPARK-6660
      4815bc21
    • [SPARK-6553] [pyspark] Support functools.partial as UDF · 757b2e91
      ksonj authored
      
      Use `f.__repr__()` instead of `f.__name__` when instantiating `UserDefinedFunction`s, so `functools.partial`s may be used.
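The underlying Python issue is easy to demonstrate: `functools.partial` objects carry no `__name__`, so any code that reads it unconditionally fails, while `repr()` works for every callable. The `udf_label` helper below is illustrative, not PySpark's actual code:

```python
import functools


def add(a, b):
    return a + b


# partial objects have func/args/keywords, but no __name__ attribute
inc = functools.partial(add, 1)


def udf_label(f):
    """Label a callable: prefer __name__, fall back to repr() so
    functools.partial (and other nameless callables) work too."""
    return getattr(f, "__name__", repr(f))
```

A UDF wrapper that labels functions this way accepts plain functions, partials, and callable objects alike.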
      
      Author: ksonj <kson@siberie.de>
      
      Closes #5206 from ksonj/partials and squashes the following commits:
      
      ea66f3d [ksonj] Inserted blank lines for PEP8 compliance
      d81b02b [ksonj] added tests for udf with partial function and callable object
      2c76100 [ksonj] Makes UDFs work with all types of callables
      b814a12 [ksonj] support functools.partial as udf
      
      (cherry picked from commit 98f72dfc)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
      757b2e91
    • [SPARK-6580] [MLLIB] Optimize LogisticRegressionModel.predictPoint · 86b43993
      Yanbo Liang authored
      https://issues.apache.org/jira/browse/SPARK-6580
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #5249 from yanboliang/spark-6580 and squashes the following commits:
      
      6f47f21 [Yanbo Liang] address comments
      4e0bd0f [Yanbo Liang] fix typos
      04e2e2a [Yanbo Liang] trigger jenkins
      cad5bcd [Yanbo Liang] Optimize LogisticRegressionModel.predictPoint
      86b43993
    • [SPARK-6576] [MLlib] [PySpark] DenseMatrix in PySpark should support indexing · 2fa3b47d
      MechCoder authored
      Support indexing in DenseMatrices in PySpark
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #5232 from MechCoder/SPARK-6576 and squashes the following commits:
      
      a735078 [MechCoder] Change bounds
      a062025 [MechCoder] Matrices are stored in column order
      7917bc1 [MechCoder] [SPARK-6576] DenseMatrix in PySpark should support indexing
      2fa3b47d
    • [SPARK-6642][MLLIB] use 1.2 lambda scaling and remove addImplicit from NormalEquation · ccafd757
      Xiangrui Meng authored
      This PR changes lambda scaling from number of users/items to number of explicit ratings. The latter is the behavior in 1.2. Slight refactor of NormalEquation to make it independent of ALS models. srowen codexiang
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5314 from mengxr/SPARK-6642 and squashes the following commits:
      
      dc655a1 [Xiangrui Meng] relax python tests
      f410df2 [Xiangrui Meng] use 1.2 scaling and remove addImplicit from NormalEquation
      ccafd757
    • [SPARK-6578] [core] Fix thread-safety issue in outbound path of network library. · f084c5de
      Marcelo Vanzin authored
      While the inbound path of a netty pipeline is thread-safe, the outbound
      path is not. That means that multiple threads can compete to write messages
      to the next stage of the pipeline.
      
      The network library sometimes breaks a single RPC message into multiple
      buffers internally to avoid copying data (see MessageEncoder). This can
      result in the following scenario (where "FxBy" means "frame x, buffer y"):
      
                     T1         F1B1            F1B2
                                  \               \
                                   \               \
                     socket        F1B1   F2B1    F1B2  F2B2
                                           /             /
                                          /             /
                     T2                  F2B1         F2B2
      
      And the frames now cannot be rebuilt on the receiving side because the
      different messages have been mixed up on the wire.
      
      The fix wraps these multi-buffer messages into a `FileRegion` object
      so that these messages are written "atomically" to the next pipeline handler.
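The interleaving hazard and the atomic-frame remedy can be sketched in Python (using a plain lock for illustration; the actual patch wraps the buffers in a custom `FileRegion` rather than locking):

```python
import threading


class FramedWriter:
    """Sketch: write all buffers of one frame as a single atomic unit
    so concurrent writers cannot interleave F1B2 after F2B1."""

    def __init__(self):
        self._lock = threading.Lock()
        self.wire = []  # what actually went "on the wire"

    def write_frame(self, buffers):
        with self._lock:  # the whole frame is written atomically
            for buf in buffers:
                self.wire.append(buf)
```

Without the per-frame atomicity, two writer threads can each get partway through a multi-buffer frame, producing exactly the mixed ordering shown in the diagram above.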
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5234 from vanzin/SPARK-6578 and squashes the following commits:
      
      16b2d70 [Marcelo Vanzin] Forgot to update a type.
      c9c2e4e [Marcelo Vanzin] Review comments: simplify some code.
      9c888ac [Marcelo Vanzin] Small style nits.
      8474bab [Marcelo Vanzin] Fix multiple calls to MessageWithHeader.transferTo().
      e26509f [Marcelo Vanzin] Merge branch 'master' into SPARK-6578
      c503f6c [Marcelo Vanzin] Implement a custom FileRegion instead of using locks.
      84aa7ce [Marcelo Vanzin] Rename handler to the correct name.
      432f3bd [Marcelo Vanzin] Remove unneeded method.
      8d70e60 [Marcelo Vanzin] Fix thread-safety issue in outbound path of network library.
      f084c5de
    • [SPARK-6657] [Python] [Docs] fixed python doc build warnings · fb25e8c7
      Joseph K. Bradley authored
      fixed python doc build warnings
      
      CC whomever wants to review: rxin mengxr davies
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #5317 from jkbradley/python-doc-warnings and squashes the following commits:
      
      4cd43c2 [Joseph K. Bradley] fixed python doc build warnings
      fb25e8c7
    • [SPARK-6651][MLLIB] delegate dense vector arithmetics to the underlying numpy array · 2275acce
      Xiangrui Meng authored
      Users should be able to use numpy operators directly on dense vectors. davies atalwalkar
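The delegation pattern can be sketched in Python (a toy `DenseVector`, not MLlib's class): forward each arithmetic dunder to the wrapped numpy array, then re-wrap the result.

```python
import numpy as np


class DenseVector:
    """Toy dense vector delegating arithmetic to its numpy array and
    wrapping each result back into a DenseVector."""

    def __init__(self, values):
        self.array = np.asarray(values, dtype=np.float64)

    def _delegate(op):
        def method(self, other):
            if isinstance(other, DenseVector):
                other = other.array
            # Forward the operator to numpy, re-wrap the ndarray result.
            return DenseVector(getattr(self.array, op)(other))
        return method

    __add__ = _delegate("__add__")
    __sub__ = _delegate("__sub__")
    __mul__ = _delegate("__mul__")
    del _delegate
```

Delegating this way gives vectors numpy's elementwise semantics for free while keeping the result in the vector type.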
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5312 from mengxr/SPARK-6651 and squashes the following commits:
      
      e665c5c [Xiangrui Meng] wrap the result in a dense vector
      23dfca3 [Xiangrui Meng] delegate dense vector arithmetics to the underlying numpy array
      2275acce
    • SPARK-6433 hive tests to import spark-sql test JAR for QueryTest access · ee11be25
      Steve Loughran authored
      1. Test JARs are built & published
      1. log4j.resources is explicitly excluded. Without this, downstream test run logging depends on the order the JARs are listed/loaded
1. sql/hive pulls in spark-sql & spark-catalyst for its test runs
      1. The copied in test classes were rm'd, and a test edited to remove its now duplicate assert method
1. Spark streaming is now built with the same plugin/phase as the rest, but its shade plugin declaration is kept in (so it differs from the rest of the test plugins). Due to (#2), this means the test JAR no longer includes its log4j file.
      
      Outstanding issues:
      * should the JARs be shaded? `spark-streaming-test.jar` does, but given these are test jars for developers only, especially in the same spark source tree, it's hard to justify.
* `maven-jar-plugin` v 2.6 was explicitly selected; without this, the apache-1.4 parent template's JAR plugin version (2.4) is chosen.
      * Are there any other resources to exclude?
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #5119 from steveloughran/stevel/patches/SPARK-6433-test-jars and squashes the following commits:
      
      81ceb01 [Steve Loughran] SPARK-6433 add a clearer comment explaining what the plugin is doing & why
      a6dca33 [Steve Loughran] SPARK-6433 : pull configuration section form archive plugin
      c2b5f89 [Steve Loughran] SPARK-6433 omit "jar" goal from jar plugin
      fdac51b [Steve Loughran] SPARK-6433 -002; indentation & delegate plugin version to parent
      650f442 [Steve Loughran] SPARK-6433 patch 001: test JARs are built; sql/hive pulls in spark-sql & spark-catalyst for its test runs
      ee11be25
    • [SPARK-6608] [SQL] Makes DataFrame.rdd a lazy val · d36c5fca
      Cheng Lian authored
      Before 1.3.0, `SchemaRDD.id` works as a unique identifier of each `SchemaRDD`. In 1.3.0, unlike `SchemaRDD`, `DataFrame` is no longer an RDD, and `DataFrame.rdd` is actually a function which always returns a new RDD instance. Making `DataFrame.rdd` a lazy val should bring the unique identifier back.
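The effect of a `lazy val` here can be sketched in Python with a memoizing property (the classes below are illustrative, not Spark's): the derived object is built once, so repeated accesses return the identical instance.

```python
class DataFrame:
    """Toy DataFrame: the derived 'rdd' is built once and cached,
    like a Scala `lazy val`, so its identity is stable."""

    def __init__(self, data):
        self._data = data
        self._rdd = None  # not built until first access

    @property
    def rdd(self):
        if self._rdd is None:
            self._rdd = list(self._data)  # stand-in for RDD creation
        return self._rdd
```

Because the result is cached, `df.rdd is df.rdd` holds, restoring the unique-identifier property; a plain method that rebuilt the object would return a fresh instance on every call.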
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #5265 from liancheng/spark-6608 and squashes the following commits:
      
      7500968 [Cheng Lian] Updates javadoc
      7f37d21 [Cheng Lian] Makes DataFrame.rdd a lazy val
      d36c5fca
    • SPARK-6626 [DOCS]: Corrected Scala:TwitterUtils parameters · 0358b08d
      jayson authored
      Per Sean Owen's request, here is the update call for TwitterUtils using Scala :)
      
      Author: jayson <jayson@ziprecruiter.com>
      
      Closes #5295 from JaysonSunshine/master and squashes the following commits:
      
      df1d056 [jayson] Corrected Scala:TwitterUtils parameters
      0358b08d
    • [SPARK-6597][Minor] Replace `input:checkbox` with `input[type="checkbox"]` in additional-metrics.js · d824c11c
      Kousuke Saruta authored
In additional-metrics.js there are selectors written as `input:checkbox`, but jQuery's official documentation says `input[type="checkbox"]` is better.
      
      https://api.jquery.com/checkbox-selector/
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #5254 from sarutak/SPARK-6597 and squashes the following commits:
      
      a253bc4 [Kousuke Saruta] Replaced input:checkbox with input[type="checkbox"]
      d824c11c
    • [EC2] [SPARK-6600] Open ports in ec2/spark_ec2.py to allow HDFS NFS gateway · 41226234
      Florian Verhein authored
Authorizes incoming access to the master on the ports required to use the Hadoop HDFS NFS gateway from outside the cluster.
      
      Author: Florian Verhein <florian.verhein@gmail.com>
      
      Closes #5257 from florianverhein/master and squashes the following commits:
      
      72a586a [Florian Verhein] [EC2] [SPARK-6600] initial impl
      41226234
    • [SPARK-4655][Core] Split Stage into ShuffleMapStage and ResultStage subclasses · ff1915e1
      Ilya Ganelin authored
      Hi all - this patch changes the Stage class to an abstract class and introduces two new classes that extend it: ShuffleMapStage and ResultStage - with the goal of increasing readability of the DAGScheduler class. Their usage is updated within DAGScheduler.
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      Author: Ilya Ganelin <ilganeli@gmail.com>
      
      Closes #4708 from ilganeli/SPARK-4655 and squashes the following commits:
      
      c248924 [Ilya Ganelin] Merge branch 'SPARK-4655' of github.com:ilganeli/spark into SPARK-4655
      d930385 [Ilya Ganelin] Fixed merge conflict from
      a9a765f [Ilya Ganelin] Update DAGScheduler.scala
      c03563c [Ilya Ganelin] Minor fixeS
      c39e971 [Ilya Ganelin] Added return typing for public methods
      845bc87 [Ilya Ganelin] Merge branch 'SPARK-4655' of github.com:ilganeli/spark into SPARK-4655
      e8031d8 [Ilya Ganelin] Minor string fixes
      4ec53ac [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-4655
      c004f62 [Ilya Ganelin] Update DAGScheduler.scala
      a2cb03f [Ilya Ganelin] [SPARK-4655] Replaced usages of Nil and eliminated some code reuse
      3d5cf20 [Ilya Ganelin] [SPARK-4655] Moved mima exclude to 1.4
      6912c55 [Ilya Ganelin] Resolved merge conflict
      4bff208 [Ilya Ganelin] Minor stylistic fixes
      c6fffbb [Ilya Ganelin] newline
      41402ad [Ilya Ganelin] Style fixes
      02c6981 [Ilya Ganelin] Merge branch 'SPARK-4655' of github.com:ilganeli/spark into SPARK-4655
      c755a09 [Ilya Ganelin] Some more stylistic updates and minor refactoring
      b6257a0 [Ilya Ganelin] Update MimaExcludes.scala
      0f0c624 [Ilya Ganelin] Fixed merge conflict
      2eba262 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-4655
      6b43d7b [Ilya Ganelin] Got rid of some spaces
      6f1a5db [Ilya Ganelin] Revert "More minor formatting and refactoring"
      1b3471b [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-4655
      c9288e2 [Ilya Ganelin] More minor formatting and refactoring
      d548caf [Ilya Ganelin] Formatting fix
      c3ae5c2 [Ilya Ganelin] Explicit typing
      0dacaf3 [Ilya Ganelin] Got rid of stale import
      6da3a71 [Ilya Ganelin] Trailing whitespace
      b85c5fe [Ilya Ganelin] Added minor fixes
      a57dfcd [Ilya Ganelin] Added MiMA exclusion to get around binary compatibility check
      83ed849 [Ilya Ganelin] moved braces for consistency
      96dd161 [Ilya Ganelin] Fixed minor style error
      cfd6f10 [Ilya Ganelin] Updated DAGScheduler to use new ResultStage and ShuffleMapStage classes
      83494e9 [Ilya Ganelin] Added new Stage classes
      ff1915e1
  4. Mar 31, 2015
    • [Doc] Improve Python DataFrame documentation · 305abe1e
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5287 from rxin/pyspark-df-doc-cleanup-context and squashes the following commits:
      
      1841b60 [Reynold Xin] Lint.
      f2007f1 [Reynold Xin] functions and types.
      bc3b72b [Reynold Xin] More improvements to DataFrame Python doc.
      ac1d4c0 [Reynold Xin] Bug fix.
      b163365 [Reynold Xin] Python fix. Added Experimental flag to DataFrameNaFunctions.
      608422d [Reynold Xin] [Doc] Cleanup context.py Python docs.
      305abe1e
    • [SPARK-6614] OutputCommitCoordinator should clear authorized committer only... · 37326079
      Josh Rosen authored
      [SPARK-6614] OutputCommitCoordinator should clear authorized committer only after authorized committer fails, not after any failure
      
      In OutputCommitCoordinator, there is some logic to clear the authorized committer's lock on committing in case that task fails.  However, it looks like the current code also clears this lock if other non-authorized tasks fail, which is an obvious bug.
      
      In theory, it's possible that this could allow a new committer to start, run to completion, and commit output before the authorized committer finished, but it's unlikely that this race occurs often in practice due to the complex combination of failure and timing conditions that would be required to expose it.
      
      This patch addresses this issue and adds a regression test.
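The fixed logic can be sketched in Python (a toy coordinator, not Spark's actual `OutputCommitCoordinator`): a task failure releases the commit authorization only when the failed attempt is the one that holds it.

```python
class OutputCommitCoordinator:
    """Sketch of the fixed behavior: only the authorized committer's
    failure clears the per-partition commit authorization."""

    def __init__(self):
        self._authorized = {}  # partition -> attempt id

    def can_commit(self, partition, attempt):
        # First asker becomes the authorized committer; others are denied.
        holder = self._authorized.setdefault(partition, attempt)
        return holder == attempt

    def task_failed(self, partition, attempt):
        # The buggy version cleared on ANY failure; clear only when the
        # failed attempt actually held the authorization.
        if self._authorized.get(partition) == attempt:
            del self._authorized[partition]
```

With this guard, a failing non-authorized attempt can no longer open the race window described above.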
      
      Thanks to aarondav for spotting this issue.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5276 from JoshRosen/SPARK-6614 and squashes the following commits:
      
      d532ba7 [Josh Rosen] Check whether failed task was authorized committer
      cbb3784 [Josh Rosen] Add regression test for SPARK-6614
      37326079
    • [SPARK-5692] [MLlib] Word2Vec save/load · 0e00f12d
      MechCoder authored
      Word2Vec model now supports saving and loading.
      
      a] The Metadata stored in JSON format consists of "version", "classname", "vectorSize" and "numWords"
      b] The data stored in Parquet file format consists of an Array of rows with each row consisting of 2 columns, first being the word: String and the second, an Array of Floats.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #5291 from MechCoder/spark-5692 and squashes the following commits:
      
      1142f3a [MechCoder] Add numWords to metaData
      bfe4c39 [MechCoder] [SPARK-5692] Word2Vec save/load
      0e00f12d
    • [SPARK-6633][SQL] Should be "Contains" instead of "EndsWith" when constructing... · 2036bc59
      Liang-Chi Hsieh authored
      [SPARK-6633][SQL] Should be "Contains" instead of "EndsWith" when constructing sources.StringContains
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5299 from viirya/stringcontains and squashes the following commits:
      
      c1ece4c [Liang-Chi Hsieh] Should be Contains instead of EndsWith.
      2036bc59