  1. Jun 25, 2015
    • Joshi's avatar
      [SPARK-5768] [WEB UI] Fix for incorrect memory in Spark UI · 085a7216
      Joshi authored
      Fix for incorrect memory in Spark UI as per SPARK-5768
      
      Author: Joshi <rekhajoshm@gmail.com>
      Author: Rekha Joshi <rekhajoshm@gmail.com>
      
      Closes #6972 from rekhajoshm/SPARK-5768 and squashes the following commits:
      
      b678a91 [Joshi] Fix for incorrect memory in Spark UI
      2fe53d9 [Joshi] Fix for incorrect memory in Spark UI
      eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI
      0be142d [Rekha Joshi] Merge pull request #3 from apache/master
      106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
      e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
      085a7216
    • Cheng Lian's avatar
      [SPARK-8604] [SQL] HadoopFsRelation subclasses should set their output format class · c337844e
      Cheng Lian authored
      `HadoopFsRelation` subclasses, especially `ParquetRelation2`, should set their own output format class so that the default output committer can be set up correctly when appending data (in which case user-defined output committers are ignored).
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6998 from liancheng/spark-8604 and squashes the following commits:
      
      9be51d1 [Cheng Lian] Adds more comments
      6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class
      c337844e
    • Matt Massie's avatar
      [SPARK-7884] Move block deserialization from BlockStoreShuffleFetcher to ShuffleReader · 7bac2fe7
      Matt Massie authored
      This commit updates the shuffle read path to give ShuffleReader implementations more control over the deserialization process.
      
      The BlockStoreShuffleFetcher.fetch() method has been renamed to BlockStoreShuffleFetcher.fetchBlockStreams(). Previously, this method returned a record iterator; now, it returns an iterator of (BlockId, InputStream). Deserialization of records is now handled in the ShuffleReader.read() method.
      
      This change creates a cleaner separation of concerns and allows implementations of ShuffleReader more flexibility in how records are retrieved.
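
      A minimal sketch of the new split of responsibilities, assuming simplified stand-ins: `BlockId`, `fetchBlockStreams` and `read` below are hypothetical and do not reproduce Spark's real signatures, and the block encoding (a record count followed by that many ints) is invented for illustration. The fetcher only hands back raw `(BlockId, InputStream)` pairs; the reader owns deserialization and closes each stream once its records are consumed.

      ```scala
      import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, InputStream}

      // Hypothetical stand-in for Spark's block identifier.
      final case class BlockId(name: String)

      object ShuffleReadSketch {
        // Helper: encode records as a count followed by the ints themselves.
        def encode(records: Seq[Int]): Array[Byte] = {
          val bytes = new ByteArrayOutputStream()
          val out = new DataOutputStream(bytes)
          out.writeInt(records.length)
          records.foreach(r => out.writeInt(r))
          out.close()
          bytes.toByteArray
        }

        // Fetcher side: locate blocks and return raw streams, with no deserialization.
        def fetchBlockStreams(blocks: Seq[(BlockId, Array[Byte])]): Iterator[(BlockId, InputStream)] =
          blocks.iterator.map { case (id, bytes) => (id, new ByteArrayInputStream(bytes)) }

        // Reader side: decides how records are decoded from each block stream.
        def read(streams: Iterator[(BlockId, InputStream)]): Iterator[Int] =
          streams.flatMap { case (_, in) =>
            val data = new DataInputStream(in)
            val records = Vector.fill(data.readInt())(data.readInt())
            data.close() // release the block's stream as soon as its records are read
            records
          }
      }
      ```

      Because the fetcher never looks inside a block, a different reader could swap in another serializer without touching the fetch logic.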
      
      Author: Matt Massie <massie@cs.berkeley.edu>
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6423 from massie/shuffle-api-cleanup and squashes the following commits:
      
      8b0632c [Matt Massie] Minor Scala style fixes
      d0a1b39 [Matt Massie] Merge pull request #1 from kayousterhout/massie_shuffle-api-cleanup
      290f1eb [Kay Ousterhout] Added test for HashShuffleReader.read()
      5186da0 [Kay Ousterhout] Revert "Add test to ensure HashShuffleReader is freeing resources"
      f98a1b9 [Matt Massie] Add test to ensure HashShuffleReader is freeing resources
      a011bfa [Matt Massie] Use PrivateMethodTester on check that delegate stream is closed
      4ea1712 [Matt Massie] Small code cleanup for readability
      7429a98 [Matt Massie] Update tests to check that BufferReleasingStream is closing delegate InputStream
      f458489 [Matt Massie] Remove unnecessary map() on return Iterator
      4abb855 [Matt Massie] Consolidate metric code. Make it clear why InterrubtibleIterator is needed.
      5c30405 [Matt Massie] Return visibility of BlockStoreShuffleFetcher to private[hash]
      7eedd1d [Matt Massie] Small Scala import cleanup
      28f8085 [Matt Massie] Small import nit
      f93841e [Matt Massie] Update shuffle read metrics in ShuffleReader instead of BlockStoreShuffleFetcher.
      7e8e0fe [Matt Massie] Minor Scala style fixes
      01e8721 [Matt Massie] Explicitly cast iterator in branches for type clarity
      7c8f73e [Matt Massie] Close Block InputStream immediately after all records are read
      208b7a5 [Matt Massie] Small code style changes
      b70c945 [Matt Massie] Make BlockStoreShuffleFetcher visible to shuffle package
      19135f2 [Matt Massie] [SPARK-7884] Allow Spark shuffle APIs to be more customizable
      7bac2fe7
  2. Jun 24, 2015
    • Reynold Xin's avatar
      Two minor SQL cleanups (compiler warning & indent). · 82f80c1c
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7000 from rxin/minor-cleanup and squashes the following commits:
      
      046044c [Reynold Xin] Two minor SQL cleanup (compiler warning & indent).
      82f80c1c
    • Wenchen Fan's avatar
      [SPARK-8075] [SQL] apply type check interface to more expressions · b71d3254
      Wenchen Fan authored
      A follow-up of https://github.com/apache/spark/pull/6405.
      Note: this is not a big change; much of the diff comes from moving code in `aggregates.scala` so that each aggregate function sits right below its corresponding aggregate expression.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6723 from cloud-fan/type-check and squashes the following commits:
      
      2124301 [Wenchen Fan] fix tests
      5a658bb [Wenchen Fan] add tests
      287d3bb [Wenchen Fan] apply type check interface to more expressions
      b71d3254
    • Yin Huai's avatar
      [SPARK-8567] [SQL] Increase the timeout of HiveSparkSubmitSuite · 7daa7029
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-8567
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6957 from yhuai/SPARK-8567 and squashes the following commits:
      
      62dff5b [Yin Huai] Increase the timeout.
      7daa7029
    • fe2s's avatar
      [SPARK-8558] [BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set · dca21a83
      fe2s authored
      Author: fe2s <aka.fe2s@gmail.com>
      Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>
      
      Closes #6956 from fe2s/fix-run-tests and squashes the following commits:
      
      31b6edc [fe2s] str is a built-in function, so using it as a variable name will lead to spurious warnings in some Python linters
      7d781a0 [fe2s] fixing for openjdk/IBM, seems like they have slightly different wording, but all have 'version' word. Surrounding with spaces for the case if version word appears in _JAVA_OPTIONS
      cd455ef [fe2s] address comment, looking for java version string rather than expecting to have on a certain line number
      ad577d7 [Oleksiy Dyagilev] [SPARK-8558][BUILD] Script /dev/run-tests fails when _JAVA_OPTIONS env var set
      dca21a83
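
      The squashed commits describe the fix: instead of expecting the version on a fixed line of `java -version` output, search for the word `version` surrounded by spaces, because with `_JAVA_OPTIONS` set the JVM prints an extra "Picked up _JAVA_OPTIONS: ..." line first. The script itself is Python; the following is a hedged Scala sketch of the same idea, with hypothetical helper names and made-up sample output in the checks.

      ```scala
      object JavaVersionSketch {
        // Find the line that mentions " version " rather than assuming it is the first line.
        // The surrounding spaces avoid matching "version" inside the options themselves,
        // e.g. "-Dfoo.version=1".
        def versionLine(output: String): Option[String] =
          output.split("\\r?\\n").find(_.contains(" version "))

        // Pull out the quoted part, e.g. 1.8.0_45 from: java version "1.8.0_45"
        def versionString(output: String): Option[String] =
          versionLine(output).flatMap { line =>
            val start = line.indexOf('"')
            val end = line.indexOf('"', start + 1)
            if (start >= 0 && end > start) Some(line.substring(start + 1, end)) else None
          }
      }
      ```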
    • Cheng Lian's avatar
      [SPARK-6777] [SQL] Implements backwards compatibility rules in CatalystSchemaConverter · 8ab50765
      Cheng Lian authored
      This PR introduces `CatalystSchemaConverter` for converting Parquet schemas to Spark SQL schemas and vice versa. The original conversion code in `ParquetTypesConverter` is removed. Benefits of the new version are:
      
      1. When converting Spark SQL schemas, it generates standard Parquet schemas conforming to [the latest Parquet format spec][1]. Converting to old-style Parquet schemas is also supported via the feature flag `spark.sql.parquet.followParquetFormatSpec` (which is set to `false` for now, and should be set to `true` after both the read and write paths are fixed).
      
         Note that although this version of the Parquet format spec hasn't been officially released yet, Parquet MR 1.7.0 already follows it, so it should be safe to follow.
      
      1. It implements the backwards-compatibility rules described in the latest Parquet format spec, and thus can recognize more schema patterns generated by other or legacy systems and tools.
      1. Code organization follows the conventions used in [parquet-mr][2], which makes it easier to follow. (The structure of `CatalystSchemaConverter` is similar to `AvroSchemaConverter`.)
      
      To fully implement the backwards-compatibility rules in both the read and write paths, we also need to update `CatalystRowConverter` (which is responsible for converting Parquet records to `Row`s), `RowReadSupport`, and `RowWriteSupport`. This will be done in follow-up PRs.
      
      TODO
      
      - [x] More schema conversion test cases for legacy schema patterns.
      
      [1]: https://github.com/apache/parquet-format/blob/ea095226597fdbecd60c2419d96b54b2fdb4ae6c/LogicalTypes.md
      [2]: https://github.com/apache/parquet-mr/
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6617 from liancheng/spark-6777 and squashes the following commits:
      
      2a2062d [Cheng Lian] Don't convert decimals without precision information
      b60979b [Cheng Lian] Adds a constructor which accepts a Configuration, and fixes default value of assumeBinaryIsString
      743730f [Cheng Lian] Decimal scale shouldn't be larger than precision
      a104a9e [Cheng Lian] Fixes Scala style issue
      1f71d8d [Cheng Lian] Adds feature flag to allow falling back to old style Parquet schema conversion
      ba84f4b [Cheng Lian] Fixes MapType schema conversion bug
      13cb8d5 [Cheng Lian] Fixes MiMa failure
      81de5b0 [Cheng Lian] Fixes UDT, workaround read path, and add tests
      28ef95b [Cheng Lian] More AnalysisExceptions
      b10c322 [Cheng Lian] Replaces require() with analysisRequire() which throws AnalysisException
      cceaf3f [Cheng Lian] Implements backwards compatibility rules in CatalystSchemaConverter
      8ab50765
    • MechCoder's avatar
      [SPARK-7633] [MLLIB] [PYSPARK] Python bindings for StreamingLogisticRegressionwithSGD · fb32c388
      MechCoder authored
      Add Python bindings for StreamingLogisticRegressionWithSGD.
      
      No Java wrappers are needed as models are updated directly using train.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6849 from MechCoder/spark-3258 and squashes the following commits:
      
      b4376a5 [MechCoder] minor
      d7e5fc1 [MechCoder] Refactor into StreamingLinearAlgorithm Better docs
      9c09d4e [MechCoder] [SPARK-7633] Python bindings for StreamingLogisticRegressionwithSGD
      fb32c388
    • Wenchen Fan's avatar
      [SPARK-7289] handle project -> limit -> sort efficiently · f04b5672
      Wenchen Fan authored
      Makes the `TakeOrdered` strategy and operator more general, so that they can optionally handle a projection when necessary.
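
      A sketch of the idea on plain Scala collections, assuming a hypothetical `takeOrderedAndProject` helper (not Spark's operator): for a `Project -> Limit -> Sort` plan, keep only the top `limit` rows with a bounded priority queue, then apply the projection to those rows alone rather than to the whole input.

      ```scala
      import scala.collection.mutable

      object TakeOrderedSketch {
        // Returns the `limit` smallest rows under `ord`, in ascending order, then projects them.
        def takeOrderedAndProject[A, B](rows: Iterator[A], limit: Int)(project: A => B)(implicit ord: Ordering[A]): List[B] = {
          // Max-heap under `ord`: its head is the largest of the rows currently kept.
          val heap = mutable.PriorityQueue.empty[A]
          rows.foreach { row =>
            heap.enqueue(row)
            if (heap.size > limit) heap.dequeue() // evict the largest, keeping at most `limit` rows
          }
          // The projection runs on at most `limit` rows, not on the full input.
          heap.toList.sorted(ord).map(project)
        }
      }
      ```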
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6780 from cloud-fan/limit and squashes the following commits:
      
      34aa07b [Wenchen Fan] revert
      07d5456 [Wenchen Fan] clean closure
      20821ec [Wenchen Fan] fix
      3676a82 [Wenchen Fan] address comments
      b558549 [Wenchen Fan] address comments
      214842b [Wenchen Fan] fix style
      2d8be83 [Wenchen Fan] add LimitPushDown
      948f740 [Wenchen Fan] fix existing
      f04b5672
    • Santiago M. Mola's avatar
      [SPARK-7088] [SQL] Fix analysis for 3rd party logical plan. · b84d4b4d
      Santiago M. Mola authored
      The `ResolveReferences` analysis rule no longer throws when it cannot resolve references in a self-join.
      
      Author: Santiago M. Mola <smola@stratio.com>
      
      Closes #6853 from smola/SPARK-7088 and squashes the following commits:
      
      af71ac7 [Santiago M. Mola] [SPARK-7088] Fix analysis for 3rd party logical plan.
      b84d4b4d
    • Holden Karau's avatar
      [SPARK-8506] Add packages to R context created through init. · 43e66192
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits:
      
      b60dd63 [Holden Karau] Add an example with the spark-csv package
      fa8bc92 [Holden Karau] typo: sparm -> spark
      865a90c [Holden Karau] strip spaces for comparision
      c7a4471 [Holden Karau] Add some documentation
      c1a9233 [Holden Karau] refactor for testing
      c818556 [Holden Karau] Add pakages to R
      43e66192
    • BenFradet's avatar
      [SPARK-8399] [STREAMING] [WEB UI] Overlap between histograms and axis' name in Spark Streaming UI · 1173483f
      BenFradet authored
      Moves the X-axis label (#batches) of the histograms in the Spark Streaming web UI so that the histograms and the axis label no longer overlap.
      
      Author: BenFradet <benjamin.fradet@gmail.com>
      
      Closes #6845 from BenFradet/SPARK-8399 and squashes the following commits:
      
      b63695f [BenFradet] adjusted inner histograms
      eb610ee [BenFradet] readjusted #batches on the x axis
      dd46f98 [BenFradet] aligned all unit labels and ticks
      0564b62 [BenFradet] readjusted #batches placement
      edd0936 [BenFradet] moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui
      1173483f
    • Nicholas Chammas's avatar
      [SPARK-8576] Add spark-ec2 options to set IAM roles and instance-initiated shutdown behavior · 31f48e5a
      Nicholas Chammas authored
      Both of these options are useful when spark-ec2 is used as part of an automated pipeline: the engineers can minimize the need to pass around AWS keys for access to resources like S3 (the keys are replaced by the IAM role), and they can launch a cluster that terminates itself cleanly.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #6962 from nchammas/additional-ec2-options and squashes the following commits:
      
      fcf252e [Nicholas Chammas] PEP8 fixes
      efba9ee [Nicholas Chammas] add help for --instance-initiated-shutdown-behavior
      598aecf [Nicholas Chammas] option to launch instances into IAM role
      2743632 [Nicholas Chammas] add option for instance initiated shutdown
      31f48e5a
    • Yin Huai's avatar
      [SPARK-8578] [SQL] Should ignore user defined output committer when appending data · bba6699d
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-8578
      
      It is not very safe to use a custom output committer when appending data to an existing directory. This change adds logic to check whether we are appending data and, if so, uses the output committer associated with the file output format.
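
      The decision can be sketched as follows. `WriteMode`, `Append`, `Overwrite` and `chooseCommitter` are hypothetical names for illustration (committers are represented by plain class-name strings), not Spark's or Hadoop's actual API. The fallback comes from the file output format, which is also why SPARK-8604 has each `HadoopFsRelation` subclass set its own output format class.

      ```scala
      object CommitterChoiceSketch {
        // Hypothetical write modes; committers are represented by class-name strings.
        sealed trait WriteMode
        case object Append extends WriteMode
        case object Overwrite extends WriteMode

        def chooseCommitter(mode: WriteMode, userDefined: Option[String], formatDefault: String): String =
          mode match {
            // Appending into an existing directory: ignore any user-defined committer and
            // fall back to the committer associated with the file output format.
            case Append    => formatDefault
            // Otherwise honor the user's choice when one is configured.
            case Overwrite => userDefined.getOrElse(formatDefault)
          }
      }
      ```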
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6964 from yhuai/SPARK-8578 and squashes the following commits:
      
      43544c4 [Yin Huai] Do not use a custom output commiter when appendiing data.
      bba6699d
    • Cheng Lian's avatar
      [SPARK-8567] [SQL] Debugging flaky HiveSparkSubmitSuite · 9d36ec24
      Cheng Lian authored
      Uses an approach similar to the one in `HiveThriftServer2Suite`: print the stdout/stderr of the spawned process instead of logging it, so that we can see what happens on Jenkins. (This test suite only fails on Jenkins and doesn't emit any logs.)
      
      cc yhuai
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6978 from liancheng/debug-hive-spark-submit-suite and squashes the following commits:
      
      b031647 [Cheng Lian] Prints process stdout/stderr instead of logging them
      9d36ec24
    • Cheng Lian's avatar
      [SPARK-8138] [SQL] Improves error message when conflicting partition columns are found · cc465fd9
      Cheng Lian authored
      This PR improves the error message shown when conflicting partition column names are detected. Such conflicts can be particularly annoying and confusing when there are a large number of partitions and only a handful of them happen to contain unexpected temporary files. Now all suspicious directories are listed, as shown below:
      
      ```
      java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
      
              Partition column name list #0: b, c, d
              Partition column name list #1: b, c
              Partition column name list #2: b
      
      For partitioned table directories, data files should only live in leaf directories. Please check the following directories for unexpected files:
      
              file:/tmp/foo/b=0
              file:/tmp/foo/b=1
              file:/tmp/foo/b=1/c=1
              file:/tmp/foo/b=0/c=0
      ```
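
      The check itself can be sketched as grouping directories by the partition column names parsed from their paths and reporting every distinct list when there is more than one. `columnNames` and `checkConsistent` are hypothetical helpers; unlike the real message, this sketch simply lists all input directories rather than narrowing them down to the suspicious ones.

      ```scala
      object PartitionConflictSketch {
        // "/tmp/foo/b=1/c=1" relative to "/tmp/foo" -> Seq("b", "c")
        def columnNames(path: String, basePath: String): Seq[String] =
          path.stripPrefix(basePath).split("/").toSeq.filter(_.contains("=")).map(_.takeWhile(_ != '='))

        // None if all directories agree on their partition columns; otherwise an error message.
        def checkConsistent(paths: Seq[String], basePath: String): Option[String] = {
          val byColumns = paths.groupBy(p => columnNames(p, basePath))
          if (byColumns.size <= 1) None
          else {
            val lists = byColumns.keys.toSeq.sortBy(cols => -cols.length).zipWithIndex.map {
              case (cols, i) => s"Partition column name list #$i: ${cols.mkString(", ")}"
            }
            val lines = Seq("Conflicting partition column names detected:") ++ lists ++
              Seq("Please check these directories:") ++ paths
            Some(lines.mkString("\n"))
          }
        }
      }
      ```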
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6610 from liancheng/part-errmsg and squashes the following commits:
      
      7d05f2c [Cheng Lian] Fixes Scala style issue
      a149250 [Cheng Lian] Adds test case for the error message
      6b74dd8 [Cheng Lian] Also lists suspicious non-leaf partition directories
      a935eb8 [Cheng Lian] Improves error message when conflicting partition columns are found
      cc465fd9
    • Wenchen Fan's avatar
      [SPARK-8371] [SQL] improve unit test for MaxOf and MinOf and fix bugs · 09fcf96b
      Wenchen Fan authored
      A follow-up of https://github.com/apache/spark/pull/6813
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6825 from cloud-fan/cg and squashes the following commits:
      
      43170cc [Wenchen Fan] fix bugs in code gen
      09fcf96b
    • Josh Rosen's avatar
      [HOTFIX] [BUILD] Fix MiMa checks in master branch; enable MiMa for launcher project · 13ae806b
      Josh Rosen authored
      This commit changes the MiMa tests to test against the released 1.4.0 artifacts rather than 1.4.0-rc4; this change is necessary to fix a Jenkins build break since it seems that the RC4 snapshot is no longer available via Maven.
      
      I also enabled MiMa checks for the `launcher` subproject, which we should have done right after 1.4.0 was released.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6974 from JoshRosen/mima-hotfix and squashes the following commits:
      
      4b4175a [Josh Rosen] [HOTFIX] [BUILD] Fix MiMa checks in master branch; enable MiMa for launcher project
      13ae806b
    • Eric Liang's avatar
      [SPARK-6749] [SQL] Make metastore client robust to underlying socket connection loss · 50c3a86f
      Eric Liang authored
      This works around a bug in the underlying RetryingMetaStoreClient (HIVE-10384) by refreshing the metastore client on Thrift exceptions. We attempt to emulate the proper Hive behavior by retrying only as configured by hiveconf.
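
      The retry pattern can be sketched as below, assuming a hypothetical `withRetry` helper and a stand-in `TransportException` in place of Thrift's exception types; `maxRetries` plays the role of the retry count read from the Hive configuration. On a transport failure the client is rebuilt with `refresh` before the next attempt.

      ```scala
      object MetastoreRetrySketch {
        // Hypothetical stand-in for a Thrift transport failure.
        final class TransportException(msg: String) extends RuntimeException(msg)

        // Runs `op` against the current client. On a transport failure, rebuilds the client via
        // `refresh` and retries, allowing at most `maxRetries` extra attempts; then rethrows.
        def withRetry[C, T](initial: C, maxRetries: Int)(refresh: () => C)(op: C => T): T = {
          var client = initial
          var attempt = 0
          while (true) {
            try {
              return op(client)
            } catch {
              case _: TransportException if attempt < maxRetries =>
                attempt += 1
                client = refresh()
            }
          }
          throw new IllegalStateException("unreachable")
        }
      }
      ```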
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #6912 from ericl/spark-6749 and squashes the following commits:
      
      2d54b55 [Eric Liang] use conf from state
      0e3a74e [Eric Liang] use shim properly
      980b3e5 [Eric Liang] Fix conf parsing hive 0.14 conf.
      92459b6 [Eric Liang] Work around RetryingMetaStoreClient bug
      50c3a86f
  3. Jun 23, 2015
    • Reynold Xin's avatar
      Revert "[SPARK-7157][SQL] add sampleBy to DataFrame" · a458efc6
      Reynold Xin authored
      This reverts commit 0401cbaa.
      
      The new test case on Jenkins is failing.
      a458efc6
    • Xiangrui Meng's avatar
      [SPARK-7157][SQL] add sampleBy to DataFrame · 0401cbaa
      Xiangrui Meng authored
      Add `sampleBy` to DataFrame. rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6769 from mengxr/SPARK-7157 and squashes the following commits:
      
      991f26f [Xiangrui Meng] fix seed
      4a14834 [Xiangrui Meng] move sampleBy to stat
      832f7cc [Xiangrui Meng] add sampleBy to DataFrame
      0401cbaa
    • Cheng Lian's avatar
      [SPARK-8139] [SQL] Updates docs and comments of data sources and Parquet output committer options · 111d6b9b
      Cheng Lian authored
      This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6683 from liancheng/output-committer-docs and squashes the following commits:
      
      b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option
      ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
      111d6b9b
    • Davies Liu's avatar
      [SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column is used in boolean expression · 7fb5ae50
      Davies Liu authored
      It's a common mistake for users to put a Column in a boolean expression (together with `and` or `or`), which does not work as expected. We should raise an exception in that case and suggest using `&` and `|` instead.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6961 from davies/column_bool and squashes the following commits:
      
      9f19beb [Davies Liu] update message
      af74bd6 [Davies Liu] fix tests
      07dff84 [Davies Liu] address comments, fix tests
      f70c08e [Davies Liu] raise Exception if column is used in booelan expression
      7fb5ae50
    • Cheng Lian's avatar
      [DOC] [SQL] Adds Hive metastore Parquet table conversion section · d96d7b55
      Cheng Lian authored
      This PR adds a section about Hive metastore Parquet table conversion. It documents:
      
      1. Schema reconciliation rules introduced in #5214 (see [this comment] [1] in #5188)
      2. Metadata refreshing requirement introduced in #5339
      
      [1]: https://github.com/apache/spark/pull/5188#issuecomment-86531248
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits:
      
      42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet
      4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet
      756e660 [Cheng Lian] Adds Python snippet for metadata refreshing
      50675db [Cheng Lian] Addes Hive metastore Parquet table conversion section
      d96d7b55
    • Oleksiy Dyagilev's avatar
      [SPARK-8525] [MLLIB] fix LabeledPoint parser when there is a whitespace... · a8031183
      Oleksiy Dyagilev authored
      [SPARK-8525] [MLLIB] fix LabeledPoint parser when there is a whitespace between label and features vector
      
      Fixes the LabeledPoint parser when there is whitespace between the label and the features vector, e.g.
      (y, [x1, x2, x3])
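
      A hedged sketch of a whitespace-tolerant parser for the dense form `(y, [x1, x2, x3])`: `LabeledPointParseSketch.parse` is a hypothetical standalone helper returning a `(label, features)` pair, not MLlib's `LabeledPoint.parse`, and it does not handle sparse vectors. Trimming each token is what makes the space after the comma harmless.

      ```scala
      object LabeledPointParseSketch {
        // Parses "(y, [x1, x2, x3])", tolerating whitespace around every token.
        def parse(s: String): (Double, Array[Double]) = {
          val t = s.trim
          require(t.startsWith("(") && t.endsWith(")"), s"not a labeled point: $s")
          val body = t.substring(1, t.length - 1)
          val comma = body.indexOf(',')
          require(comma > 0, s"missing label separator: $s")
          val label = body.substring(0, comma).trim.toDouble
          val vec = body.substring(comma + 1).trim
          require(vec.startsWith("[") && vec.endsWith("]"), s"not a dense vector: $vec")
          val inner = vec.substring(1, vec.length - 1).trim
          val features = if (inner.isEmpty) Array.empty[Double] else inner.split(",").map(_.trim.toDouble)
          (label, features)
        }
      }
      ```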
      
      Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>
      
      Closes #6954 from fe2s/SPARK-8525 and squashes the following commits:
      
      0755b9d [Oleksiy Dyagilev] [SPARK-8525][MLLIB] addressing comment, removing dep on commons-lang
      c1abc2b [Oleksiy Dyagilev] [SPARK-8525][MLLIB] fix LabeledPoint parser when there is a whitespace on specific position
      a8031183
    • Alok  Singh's avatar
      [SPARK-8111] [SPARKR] SparkR shell should display Spark logo and version banner on startup. · f2fb0285
      Alok Singh authored
      The Spark version was initially taken from the environment variable SPARK_VERSION; following review feedback, it is now read from the SparkContext.
      
      Author: Alok  Singh <singhal@Aloks-MacBook-Pro.local>
      Author: Alok  Singh <singhal@aloks-mbp.usca.ibm.com>
      
      Closes #6944 from aloknsingh/aloknsingh_spark_jiras and squashes the following commits:
      
      ed607bd [Alok  Singh] [SPARK-8111][SparkR] As per suggestion, 1) using the version from sparkContext rather than the Sys.env. 2) change "Welcome to SparkR!" to "Welcome to" followed by Spark logo and version
      acd5b85 [Alok  Singh] fix the jira SPARK-8111 to add the spark version and logo. Currently spark version is taken from the environment variable SPARK_VERSION
      f2fb0285
    • MechCoder's avatar
      [SPARK-8265] [MLLIB] [PYSPARK] Add LinearDataGenerator to pyspark.mllib.utils · f2022fa0
      MechCoder authored
      Generating linear data is useful for testing linear models, and in general. Scala already has a generator; this is just a wrapper around the Scala code.
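
      The kind of data such a generator produces can be sketched as follows: features `x` drawn uniformly from [-1, 1) and labels `y = intercept + w·x + noise * eps` with Gaussian `eps`. `LinearDataSketch.generate` and its parameters are hypothetical and only approximate what the Scala `LinearDataGenerator` does; its API docs describe the real parameters.

      ```scala
      import scala.util.Random

      object LinearDataSketch {
        // Hypothetical helper: n points with y = intercept + w·x + noise * N(0, 1).
        def generate(intercept: Double, weights: Array[Double], n: Int, noise: Double, seed: Long): Seq[(Array[Double], Double)] = {
          val rnd = new Random(seed)
          Seq.fill(n) {
            val x = Array.fill(weights.length)(2 * rnd.nextDouble() - 1) // features in [-1, 1)
            val dot = weights.zip(x).map { case (w, xi) => w * xi }.sum
            (x, intercept + dot + noise * rnd.nextGaussian())
          }
        }
      }
      ```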
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6715 from MechCoder/generate_linear_input and squashes the following commits:
      
      6182884 [MechCoder] Minor changes
      8bda047 [MechCoder] Minor style fixes
      0f1053c [MechCoder] [SPARK-8265] Add LinearDataGenerator to pyspark.mllib.utils
      f2022fa0
    • Holden Karau's avatar
      [SPARK-7888] Be able to disable intercept in linear regression in ml package · 2b1111dd
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6927 from holdenk/SPARK-7888-Be-able-to-disable-intercept-in-Linear-Regression-in-ML-package and squashes the following commits:
      
      0ad384c [Holden Karau] Add MiMa excludes
      4016fac [Holden Karau] Switch to wild card import, remove extra blank lines
      ae5baa8 [Holden Karau] CR feedback, move the fitIntercept down rather than changing ymean and etc above
      f34971c [Holden Karau] Fix some more long lines
      319bd3f [Holden Karau] Fix long lines
      3bb9ee1 [Holden Karau] Update the regression suite tests
      7015b9f [Holden Karau] Our code performs the same with R, except we need more than one data point but that seems reasonable
      0b0c8c0 [Holden Karau] fix the issue with the sample R code
      e2140ba [Holden Karau] Add a test, it fails!
      5e84a0b [Holden Karau] Write out thoughts and use the correct trait
      91ffc0a [Holden Karau] more murh
      006246c [Holden Karau] murp?
      2b1111dd
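
      For intuition about what disabling the intercept changes, here is the single-feature least-squares case worked out in a small sketch (not the ml package's solver, which is iterative): with an intercept, w = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - w·x̄; without one, the line is forced through the origin and w = Σxy / Σx². `InterceptSketch` is a hypothetical name.

      ```scala
      object InterceptSketch {
        // Least squares through the origin: minimize sum((y - w*x)^2) => w = sum(x*y) / sum(x*x).
        def fitNoIntercept(xs: Seq[Double], ys: Seq[Double]): Double =
          xs.zip(ys).map { case (x, y) => x * y }.sum / xs.map(x => x * x).sum

        // Ordinary least squares with an intercept: returns (w, b).
        def fitWithIntercept(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
          val xMean = xs.sum / xs.length
          val yMean = ys.sum / ys.length
          val cov = xs.zip(ys).map { case (x, y) => (x - xMean) * (y - yMean) }.sum
          val varX = xs.map(x => (x - xMean) * (x - xMean)).sum
          val w = cov / varX // needs at least two distinct x values
          (w, yMean - w * xMean)
        }
      }
      ```

      On the points (1, 3), (2, 5), (3, 7), which lie on y = 2x + 1, the fit with an intercept recovers w = 2 and b = 1, while the no-intercept fit gives w = 34/14 (about 2.43).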
    • Davies Liu's avatar
      [SPARK-8432] [SQL] fix hashCode() and equals() of BinaryType in Row · 6f4cadf5
      Davies Liu authored
      Also added more tests in LiteralExpressionSuite
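
      The underlying issue is that JVM arrays, and therefore `Array[Byte]` binary values, use reference equality and identity hash codes, so two rows holding identical bytes would compare unequal. A minimal sketch of content-based comparison, using hypothetical helpers over rows modeled as `Seq[Any]` (not Spark's `Row` implementation):

      ```scala
      object BinaryRowSketch {
        // Compare binary values by content; everything else by ordinary equality.
        def valueEquals(a: Any, b: Any): Boolean = (a, b) match {
          case (x: Array[Byte], y: Array[Byte]) => java.util.Arrays.equals(x, y)
          case _                                => a == b
        }

        // Hash binary values by content so that equal rows get equal hash codes.
        def valueHash(v: Any): Int = v match {
          case x: Array[Byte] => java.util.Arrays.hashCode(x)
          case null           => 0
          case other          => other.hashCode()
        }

        def rowEquals(r1: Seq[Any], r2: Seq[Any]): Boolean =
          r1.length == r2.length && r1.zip(r2).forall { case (a, b) => valueEquals(a, b) }

        def rowHash(r: Seq[Any]): Int = r.foldLeft(37)((h, v) => 37 * h + valueHash(v))
      }
      ```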
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6876 from davies/fix_hashcode and squashes the following commits:
      
      429c2c0 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_hashcode
      32d9811 [Davies Liu] fix test
      a0626ed [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_hashcode
      89c2432 [Davies Liu] fix style
      bd20780 [Davies Liu] check with catalyst types
      41caec6 [Davies Liu] change for to while
      d96929b [Davies Liu] address comment
      6ad2a90 [Davies Liu] fix style
      5819d33 [Davies Liu] unify equals() and hashCode()
      0fff25d [Davies Liu] fix style
      53c38b1 [Davies Liu] fix hashCode() and equals() of BinaryType in Row
      6f4cadf5
    • Cheng Hao's avatar
      [SPARK-7235] [SQL] Refactor the grouping sets · 7b1450b6
      Cheng Hao authored
      The logical plan `Expand` takes `output` as a constructor argument, which breaks the reference chain. We need to refactor the code, as well as the column pruning.
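
      For reference, the semantics an `Expand`-style operator implements for grouping sets can be sketched on plain maps: each input row is emitted once per grouping set, with grouping columns outside that set nulled out and a grouping id attached. `expand` is a hypothetical helper, and using the grouping set's index as the id is a simplification, not Spark's actual grouping id encoding.

      ```scala
      object ExpandSketch {
        // Emits one copy of `row` per grouping set; grouping columns not in the set become null,
        // and "gid" records which grouping set produced the copy (its index, for simplicity).
        def expand(row: Map[String, Any], groupByCols: Seq[String], groupingSets: Seq[Set[String]]): Seq[Map[String, Any]] =
          groupingSets.zipWithIndex.map { case (set, gid) =>
            val projected: Map[String, Any] = row.map { case (col, value) =>
              col -> (if (groupByCols.contains(col) && !set.contains(col)) null else value)
            }
            projected + ("gid" -> gid)
          }
      }
      ```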
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #5780 from chenghao-intel/expand and squashes the following commits:
      
      76e4aa4 [Cheng Hao] revert the change for case insenstive
      7c10a83 [Cheng Hao] refactor the grouping sets
      7b1450b6
    • lockwobr's avatar
      [SQL] [DOCS] updated the documentation for explode · 4f7fbefb
      lockwobr authored
      The syntax in the `explode` example was incorrect.
      
      Author: lockwobr <lockwobr@gmail.com>
      
      Closes #6943 from lockwobr/master and squashes the following commits:
      
      3d864d1 [lockwobr] updated the documentation for explode
      4f7fbefb
    • Holden Karau's avatar
      [SPARK-8498] [TUNGSTEN] fix npe in errorhandling path in unsafeshuffle writer · 0f92be5b
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6918 from holdenk/SPARK-8498-fix-npe-in-errorhandling-path-in-unsafeshuffle-writer and squashes the following commits:
      
      f807832 [Holden Karau] Log error if we can't throw it
      855f9aa [Holden Karau] Spelling - not my strongest suite. Fix Propegates to Propagates.
      039d620 [Holden Karau] Add missing closeandwriteoutput
      30e558d [Holden Karau] go back to try/finally
      e503b8c [Holden Karau] Improve the test to ensure we aren't masking the underlying exception
      ae0b7a7 [Holden Karau] Fix the test
      2e6abf7 [Holden Karau] Be more cautious when cleaning up during failed write and re-throw user exceptions
      0f92be5b
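
      The squashed commits describe the shape of the fix: be cautious when cleaning up after a failed write, re-throw the user's exception, and only log errors that cannot be thrown. A minimal sketch of that pattern, assuming a hypothetical `writeWithCleanup` helper (not the actual unsafe shuffle writer code), which only cleans up on failure:

      ```scala
      import scala.util.control.NonFatal

      object SafeCleanupSketch {
        // Runs `write`. If it fails, runs `cleanup`, but a failure during cleanup (for example an
        // NPE from partially initialized state) is only logged so it cannot mask the original error.
        def writeWithCleanup[T](write: () => T)(cleanup: () => Unit)(log: String => Unit): T =
          try write()
          catch {
            case original: Throwable =>
              try cleanup()
              catch { case NonFatal(secondary) => log(s"cleanup failed after write error: $secondary") }
              throw original
          }
      }
      ```

      Attaching the cleanup failure with `original.addSuppressed(secondary)` would be an alternative to logging it.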
    • Reynold Xin's avatar
      [SPARK-8300] DataFrame hint for broadcast join. · 6ceb1696
      Reynold Xin authored
      Users can now do
      ```scala
      import org.apache.spark.sql.functions.broadcast

      left.join(broadcast(right), "joinKey")
      ```
      to give the query planner a hint that the `right` DataFrame is small and should be broadcast.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6751 from rxin/broadcastjoin-hint and squashes the following commits:
      
      953eec2 [Reynold Xin] Code review feedback.
      88752d8 [Reynold Xin] Fixed import.
      8187b88 [Reynold Xin] [SPARK-8300] DataFrame hint for broadcast join.
      6ceb1696
    • Scott Taylor's avatar
      [SPARK-8541] [PYSPARK] test the absolute error in approx doctests · f0dcbe8a
      Scott Taylor authored
      A minor change, but one which is (presumably) visible on the public API docs webpage.
      
      Author: Scott Taylor <github@megatron.me.uk>
      
      Closes #6942 from megatron-me-uk/patch-3 and squashes the following commits:
      
      fbed000 [Scott Taylor] test the absolute error in approx doctests
      f0dcbe8a
    • Hari Shreedharan's avatar
      [SPARK-8483] [STREAMING] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0 · 9b618fb0
      Hari Shreedharan authored
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6910 from harishreedharan/remove-commons-lang3 and squashes the following commits:
      
      9875f7d [Hari Shreedharan] Revert back to Flume 1.4.0
      ca35eb0 [Hari Shreedharan] [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
      9b618fb0
    • Liang-Chi Hsieh's avatar
      [SPARK-8359] [SQL] Fix incorrect decimal precision after multiplication · 31bd3068
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8359
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6814 from viirya/fix_decimal2 and squashes the following commits:
      
      071a757 [Liang-Chi Hsieh] Remove maximum precision and use MathContext.UNLIMITED.
      df217d4 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal2
      a43bfc3 [Liang-Chi Hsieh] Add MathContext with maximum supported precision.
      72eeb3f [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal2
      44c9348 [Liang-Chi Hsieh] Fix incorrect decimal precision after multiplication.
      31bd3068
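
      The final commit (071a757) removes the maximum precision and uses `MathContext.UNLIMITED`. The effect is easy to see with `java.math.BigDecimal` directly: a bounded context such as `MathContext.DECIMAL128` (Scala's `BigDecimal` default) keeps 34 significant digits, so the exact 40-digit product of two 20-digit numbers gets rounded. A hedged illustration with hypothetical helper names, not Spark's `Decimal` code:

      ```scala
      import java.math.{MathContext, BigDecimal => JBigDecimal}

      object DecimalMultiplySketch {
        // A bounded context rounds the product to its precision (34 digits for DECIMAL128).
        def boundedProduct(a: JBigDecimal, b: JBigDecimal): JBigDecimal =
          a.multiply(b, MathContext.DECIMAL128)

        // MathContext.UNLIMITED performs no rounding, so the product is exact.
        def exactProduct(a: JBigDecimal, b: JBigDecimal): JBigDecimal =
          a.multiply(b, MathContext.UNLIMITED)
      }
      ```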
    • Yu ISHIKAWA's avatar
      [SPARK-8431] [SPARKR] Add in operator to DataFrame Column in SparkR · d4f63351
      Yu ISHIKAWA authored
      [[SPARK-8431] Add in operator to DataFrame Column in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8431)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6941 from yu-iskw/SPARK-8431 and squashes the following commits:
      
      1f64423 [Yu ISHIKAWA] Modify the comment
      f4309a7 [Yu ISHIKAWA] Make a `setMethod` for `%in%` be independent
      6e37936 [Yu ISHIKAWA] Modify a variable name
      c196173 [Yu ISHIKAWA] [SPARK-8431][SparkR] Add in operator to DataFrame Column in SparkR
      d4f63351
    • Holden Karau's avatar
      [SPARK-7781] [MLLIB] gradient boosted trees.train regressor missing max bins · 164fe2aa
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6331 from holdenk/SPARK-7781-GradientBoostedTrees.trainRegressor-missing-max-bins and squashes the following commits:
      
      2894695 [Holden Karau] remove extra blank line
      2573e8d [Holden Karau] Update the scala side of the pythonmllibapi and make the test a bit nicer too
      3a09170 [Holden Karau] add maxBins to to the train method as well
      af7f274 [Holden Karau] Add maxBins to GradientBoostedTrees.trainRegressor and correctly mention the default of 32 in other places where it mentioned 100
      164fe2aa
  4. Jun 22, 2015