  1. Aug 11, 2015
    • [SPARK-9649] Fix flaky test MasterSuite again - disable REST · ca8f70e9
      Andrew Or authored
      The REST server is not actually used in most tests and so we can disable it. It is a source of flakiness because it tries to bind to a specific port in vain. There was also some code that avoided the shuffle service in tests. This is actually not necessary because the shuffle service is already off by default.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8084 from andrewor14/fix-master-suite-again.
    • [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible · afa757c9
      Reynold Xin authored
      DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option, so we must be able to resolve the old committer qualified name.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8114 from rxin/SPARK-9849.
    • [SPARK-9074] [LAUNCHER] Allow arbitrary Spark args to be set. · 5a5bbc29
      Marcelo Vanzin authored
      This change allows any Spark argument to be added to the app to
      be started using SparkLauncher. Known arguments are properly
      validated, while unknown arguments are allowed so that the
      library can launch newer Spark versions (in case SPARK_HOME points
      at one).
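
A minimal sketch of the validation policy described above, written in Python rather than the launcher's Java. The option table and the `add_arg` helper are hypothetical and are not the SparkLauncher API; the sketch only shows known options being checked while unknown ones pass through.

```python
# Hypothetical table: option name -> whether the option expects a value.
KNOWN_OPTIONS = {"--master": True, "--deploy-mode": True, "--verbose": False}


def add_arg(args, name, value=None):
    """Append an option to args, validating it only if it is a known option."""
    expects_value = KNOWN_OPTIONS.get(name)  # None means "unknown option"
    if expects_value is True and value is None:
        raise ValueError(f"{name} requires a value")
    if expects_value is False and value is not None:
        raise ValueError(f"{name} does not take a value")
    # Unknown options are passed through unchecked, so a newer Spark
    # version can still receive arguments this table does not list.
    args.append(name)
    if value is not None:
        args.append(value)
    return args
```

Under these assumptions, `add_arg([], "--some-future-flag", "x")` is accepted as-is, while `add_arg([], "--master")` is rejected.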
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7975 from vanzin/SPARK-9074 and squashes the following commits:
      
      b5e451a [Marcelo Vanzin] [SPARK-9074] [launcher] Allow arbitrary Spark args to be set.
    • [HOTFIX] Fix style error caused by 017b5de0 · 736af95b
      Andrew Or authored
    • [SPARK-8925] [MLLIB] Add @since tags to mllib.util · 017b5de0
      Sudhakar Thota authored
      Went through the change history of the file MLUtils.scala and picked the version in which each change went in.
      
      Author: Sudhakar Thota <sudhakarthota@yahoo.com>
      Author: Sudhakar Thota <sudhakarthota@sudhakars-mbp-2.usca.ibm.com>
      
      Closes #7436 from sthota2014/SPARK-8925_thotas.
    • [SPARK-9788] [MLLIB] Fix LDA Binary Compatibility · be3e2716
      Feynman Liang authored
      1. Add `asymmetricDocConcentration` and revert the docConcentration changes. If the (internal) doc concentration vector is a single value, `getDocConcentration` returns it. If it is a constant vector, `getDocConcentration` returns the first item; otherwise it fails.
      2. Give `LDAModel.gammaShape` a default value in `LDAModel` concrete class constructors.
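
The getter behaviour in item 1 can be sketched as follows. This is a Python illustration under assumptions, not the MLlib code: a plain list stands in for the internal vector, and the function name and error message are made up for the example.

```python
def get_doc_concentration(alpha):
    """Return a single concentration value, or fail if alpha is asymmetric.

    alpha is a non-empty list of floats standing in for the internal
    doc concentration vector (an assumption of this sketch).
    """
    if len(alpha) == 1:
        # Single value: return it directly.
        return alpha[0]
    first = alpha[0]
    if all(x == first for x in alpha):
        # Constant vector: the first item represents every entry.
        return first
    raise ValueError(
        "doc concentration is asymmetric; read the full vector instead"
    )
```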
      
      jkbradley
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #8077 from feynmanliang/SPARK-9788 and squashes the following commits:
      
      6b07bc8 [Feynman Liang] Code review changes
      9d6a71e [Feynman Liang] Add asymmetricAlpha alias
      bf4e685 [Feynman Liang] Asymmetric docConcentration
      4cab972 [Feynman Liang] Default gammaShape
    • Closes #1290 · 423cdfd8
      Xiangrui Meng authored
      Closes #4934
    • [SPARK-9824] [CORE] Fix the issue that InternalAccumulator leaks WeakReference · f16bc68d
      zsxwing authored
      `InternalAccumulator.create` doesn't call `registerAccumulatorForCleanup` to register itself with ContextCleaner, so `WeakReference`s for these accumulators in `Accumulators.originals` won't be removed.
      
      This PR added `registerAccumulatorForCleanup` for internal accumulators to avoid the memory leak.
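
A rough Python analogy of the leak, assuming CPython's `weakref` module in place of the JVM's `WeakReference` and `ContextCleaner`. The `Accumulator` class, the `register` helper and the `for_cleanup` flag are hypothetical. Without a cleanup hook the dead weak reference stays in the registry; with one, the entry is removed when the object is collected.

```python
import gc
import weakref

originals = {}  # accumulator id -> weak reference, as in a global registry


class Accumulator:
    def __init__(self, acc_id):
        self.acc_id = acc_id


def register(acc, for_cleanup):
    acc_id = acc.acc_id
    if for_cleanup:
        # The callback removes the registry entry once acc is collected.
        # It captures only the id, so it does not keep acc alive.
        originals[acc_id] = weakref.ref(
            acc, lambda _ref: originals.pop(acc_id, None)
        )
    else:
        # No cleanup hook: the dead weak reference is never removed.
        originals[acc_id] = weakref.ref(acc)


leaky = Accumulator(1)
register(leaky, for_cleanup=False)
cleaned = Accumulator(2)
register(cleaned, for_cleanup=True)

del leaky, cleaned
gc.collect()
# Entry 1 is still present but its referent is gone; entry 2 was removed.
leaked_ids = sorted(originals)
```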
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8108 from zsxwing/internal-accumulators-leak.
    • [SPARK-9814] [SQL] EqualNotNull not passing to data sources · 00c02728
      hyukjinkwon authored
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: 권혁진 <gurwls223@gmail.com>
      
      Closes #8096 from HyukjinKwon/master.
    • [SPARK-7726] Add import so Scaladoc doesn't fail. · 2a3be4dd
      Patrick Wendell authored
      This is another import needed so Scala 2.11 doc generation doesn't fail.
      See SPARK-7726 for more detail. I tested this locally and the 2.11
      install goes from failing to succeeding with this patch.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #8095 from pwendell/scaladoc.
    • [SPARK-9750] [MLLIB] Improve equals on SparseMatrix and DenseMatrix · 520ad44b
      Feynman Liang authored
      Adds unit test for `equals` on `mllib.linalg.Matrix` class and `equals` to both `SparseMatrix` and `DenseMatrix`. Supports equality testing between `SparseMatrix` and `DenseMatrix`.
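
Equality based on matrix contents rather than representation can be sketched like this. The two classes are simplified Python stand-ins (column-major dense storage and compressed sparse column storage are assumptions of the sketch), and the element-by-element loop is only illustrative; the squashed commits below mention a Breeze-based comparison in the actual patch.

```python
class DenseMatrix:
    def __init__(self, num_rows, num_cols, values):
        self.num_rows, self.num_cols = num_rows, num_cols
        self.values = values  # column-major order

    def get(self, i, j):
        return self.values[i + self.num_rows * j]


class SparseMatrix:
    def __init__(self, num_rows, num_cols, col_ptrs, row_indices, values):
        self.num_rows, self.num_cols = num_rows, num_cols
        self.col_ptrs = col_ptrs        # start offset of each column
        self.row_indices = row_indices  # row index of each stored value
        self.values = values

    def get(self, i, j):
        for k in range(self.col_ptrs[j], self.col_ptrs[j + 1]):
            if self.row_indices[k] == i:
                return self.values[k]
        return 0.0  # entries that are not stored are zero


def matrices_equal(a, b):
    """Compare by shape and entries, whatever the storage format."""
    if (a.num_rows, a.num_cols) != (b.num_rows, b.num_cols):
        return False
    return all(
        a.get(i, j) == b.get(i, j)
        for j in range(a.num_cols)
        for i in range(a.num_rows)
    )


dense = DenseMatrix(2, 2, [1.0, 0.0, 0.0, 2.0])
sparse = SparseMatrix(2, 2, [0, 1, 2], [0, 1], [1.0, 2.0])
same = matrices_equal(dense, sparse)
```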
      
      mengxr
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #8042 from feynmanliang/SPARK-9750 and squashes the following commits:
      
      bb70d5e [Feynman Liang] Breeze compare for dense matrices as well, in case other is sparse
      ab6f3c8 [Feynman Liang] Sparse matrix compare for equals
      22782df [Feynman Liang] Add equality based on matrix semantics, not representation
      78f9426 [Feynman Liang] Add casts
      43d28fa [Feynman Liang] Fix failing test
      6416fa0 [Feynman Liang] Add failing sparse matrix equals tests
    • [SPARK-9646] [SQL] Add metrics for all join and aggregate operators · 5831294a
      zsxwing authored
      This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two cases:
      1. If the iterator is not fully consumed, the metric values will be lower than the actual number of rows.
      2. Recreating the iterators will make metric values look bigger than the size of the input source, such as `CartesianProduct`.
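
Both effects can be reproduced with a counting wrapper around an iterator. This is a Python sketch with hypothetical names (`Metric`, `counted`), not the Spark SQL metric classes.

```python
from itertools import islice


class Metric:
    def __init__(self):
        self.value = 0


def counted(rows, metric):
    """Yield rows unchanged while counting each row that is pulled."""
    for row in rows:
        metric.value += 1
        yield row


source = list(range(10))

# Case 1: the consumer stops early, so the metric undercounts the input.
partial = Metric()
first_three = list(islice(counted(source, partial), 3))

# Case 2: a nested-loop (Cartesian) style consumer recreates the right-hand
# iterator once per left row, so the metric exceeds the input size.
right = ["x", "y", "z"]
repeated = Metric()
pairs = [(l, r) for l in (1, 2) for r in counted(right, repeated)]
```

Here `partial.value` ends up smaller than `len(source)`, and `repeated.value` ends up larger than `len(right)`.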
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8060 from zsxwing/sql-metrics and squashes the following commits:
      
      40f3fc1 [zsxwing] Mark LongSQLMetric private[metric] to avoid using incorrectly and leak memory
      b1b9071 [zsxwing] Merge branch 'master' into sql-metrics
      4bef25a [zsxwing] Add metrics for SortMergeOuterJoin
      95ccfc6 [zsxwing] Merge branch 'master' into sql-metrics
      67cb4dd [zsxwing] Add metrics for Project and TungstenProject; remove metrics from PhysicalRDD and LocalTableScan
      0eb47d4 [zsxwing] Merge branch 'master' into sql-metrics
      dd9d932 [zsxwing] Avoid creating new Iterators
      589ea26 [zsxwing] Add metrics for all join and aggregate operators
    • [SPARK-9572] [STREAMING] [PYSPARK] Added StreamingContext.getActiveOrCreate() in Python · 5b8bb1b2
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #8080 from tdas/SPARK-9572 and squashes the following commits:
      
      64a231d [Tathagata Das] Fix based on comments
      741a0d0 [Tathagata Das] Fixed style
      f4f094c [Tathagata Das] Tweaked test
      9afcdbe [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9572
      e21488d [Tathagata Das] Minor update
      1a371d9 [Tathagata Das] Addressed comments.
      60479da [Tathagata Das] Fixed indent
      9c2da9c [Tathagata Das] Fixed bugs
      b5bd32c [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9572
      b55b348 [Tathagata Das] Removed prints
      5781728 [Tathagata Das] Fix style issues
      b711214 [Tathagata Das] Reverted run-tests.py
      643b59d [Tathagata Das] Revert unnecessary change
      150e58c [Tathagata Das] Added StreamingContext.getActiveOrCreate() in Python
    • [SPARK-8764] [ML] string indexer should take option to handle unseen values · dbd778d8
      Holden Karau authored
      As a precursor to adding a public constructor, add an option to handle unseen values by skipping them rather than throwing an exception (the default remains throwing an exception).
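
A minimal Python sketch of the two behaviours. The function name and the `"skip"` / `"error"` option strings are assumptions made for illustration; only the `handleInvalid` name comes from the commits listed below.

```python
def index_labels(values, label_to_index, handle_invalid="error"):
    """Map each value to its index; unseen values are skipped or rejected."""
    indexed = []
    for value in values:
        if value in label_to_index:
            indexed.append(label_to_index[value])
        elif handle_invalid == "skip":
            continue  # drop the row that carries an unseen label
        else:
            raise ValueError(f"Unseen label: {value}")
    return indexed


labels = {"a": 0, "b": 1}
skipped = index_labels(["a", "c", "b"], labels, handle_invalid="skip")
```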
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7266 from holdenk/SPARK-8764-string-indexer-should-take-option-to-handle-unseen-values and squashes the following commits:
      
      38a4de9 [Holden Karau] fix long line
      045bf22 [Holden Karau] Add a second b entry so b gets 0 for sure
      81dd312 [Holden Karau] Update the docs for handleInvalid param to be more descriptive
      7f37f6e [Holden Karau] remove extra space (scala style)
      414e249 [Holden Karau] And switch to using handleInvalid instead of skipInvalid
      1e53f9b [Holden Karau] update the param (codegen side)
      7a22215 [Holden Karau] fix typo
      100a39b [Holden Karau] Merge in master
      aa5b093 [Holden Karau] Since we filter we should never go down this code path if getSkipInvalid is true
      75ffa69 [Holden Karau] Remove extra newline
      d69ef5e [Holden Karau] Add a test
      b5734be [Holden Karau] Add support for unseen labels
      afecd4e [Holden Karau] Add a param to skip invalid entries.
    • [SPARK-8345] [ML] Add an SQL node as a feature transformer · 8cad854e
      Yanbo Liang authored
      Implements transformations that are defined by a SQL statement.
      Currently we only support SQL syntax like 'SELECT ... FROM __THIS__'
      where '__THIS__' represents the underlying table of the input dataset.
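
The placeholder idea can be sketched with Python's built-in `sqlite3` module standing in for Spark SQL. The `sql_transform` helper, the table name and the column names are hypothetical.

```python
import sqlite3


def sql_transform(conn, table_name, statement):
    """Run the statement with '__THIS__' replaced by the input table name."""
    return conn.execute(statement.replace("__THIS__", table_name)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_data (v1 REAL, v2 REAL)")
conn.executemany(
    "INSERT INTO input_data VALUES (?, ?)", [(1.0, 2.0), (3.0, 4.0)]
)

# The statement never names the table; it only refers to __THIS__.
rows = sql_transform(
    conn, "input_data", "SELECT v1, v2, v1 + v2 AS v3 FROM __THIS__"
)
```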
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #7465 from yanboliang/spark-8345 and squashes the following commits:
      
      b403fcb [Yanbo Liang] address comments
      0d4bb15 [Yanbo Liang] a better transformSchema() implementation
      51eb9e7 [Yanbo Liang] Add an SQL node as a feature transformer
    • Fix comment error · bce72797
      Jeff Zhang authored
      The API was updated but its doc comment was not.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #8097 from zjffdu/dev.
    • [SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering · dfe347d2
      Josh Rosen authored
      HashPartitioning compatibility is currently defined w.r.t the _set_ of expressions, but the ordering of those expressions matters when computing hash codes; this could lead to incorrect answers if we mistakenly avoided a shuffle based on the assumption that HashPartitionings with the same expressions in different orders will produce equivalent row hashcodes. The first commit adds a regression test which illustrates this problem.
      
      The fix for this is simple: make `HashPartitioning.compatibleWith` and `HashPartitioning.guarantees` sensitive to the expression ordering (i.e. do not perform set comparison).
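
Why ordering matters can be shown with a toy hash. The sketch below uses a simple `31 * h + v` combination as an illustrative stand-in (it is not Spark's row hash), and all names are hypothetical.

```python
def row_hash(values):
    """Order-sensitive hash: combine values left to right."""
    h = 0
    for v in values:
        h = (31 * h + v) & 0xFFFFFFFF
    return h


def partition_id(row, exprs, num_partitions):
    return row_hash([row[e] for e in exprs]) % num_partitions


def compatible(exprs_a, exprs_b):
    # Ordered comparison; a set comparison would wrongly accept reorderings.
    return list(exprs_a) == list(exprs_b)


row = {"a": 1, "b": 2}
p_ab = partition_id(row, ["a", "b"], 4)
p_ba = partition_id(row, ["b", "a"], 4)
```

The same row lands in different partitions under `["a", "b"]` and `["b", "a"]`, so treating the two partitionings as interchangeable and skipping a shuffle would co-locate the wrong rows.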
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8074 from JoshRosen/hashpartitioning-compatiblewith-fixes and squashes the following commits:
      
      b61412f [Josh Rosen] Demonstrate that I haven't cheated in my fix
      0b4d7d9 [Josh Rosen] Update so that clusteringSet is only used in satisfies().
      dc9c9d7 [Josh Rosen] Add failing regression test for SPARK-9785
    • [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform. · d378396f
      Reynold Xin authored
      PlatformDependent.UNSAFE is way too verbose.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8094 from rxin/SPARK-9815 and squashes the following commits:
      
      229b603 [Reynold Xin] [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform.
    • [SPARK-9727] [STREAMING] [BUILD] Updated streaming kinesis SBT project name to be more consistent · 600031eb
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #8092 from tdas/SPARK-9727 and squashes the following commits:
      
      b1b01fd [Tathagata Das] Updated streaming kinesis project name
    • [SPARK-9810] [BUILD] Remove individual commit messages from the squash commit message · 55752d88
      Reynold Xin authored
      For more information, please see the JIRA ticket and the associated dev list discussion.
      
      https://issues.apache.org/jira/browse/SPARK-9810
      
      http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-Removing-individual-commit-messages-from-the-squash-commit-message-td13295.html
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8091 from rxin/SPARK-9810.
    • [SPARK-9640] [STREAMING] [TEST] Do not run Python Kinesis tests when the Kinesis assembly JAR has not been generated · 0f90d605
      Tathagata Das authored
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7961 from tdas/SPARK-9640 and squashes the following commits:
      
      974ce19 [Tathagata Das] Undo changes related to SPARK-9727
      004ae26 [Tathagata Das] style fixes
      9bbb97d [Tathagata Das] Minor style fixes
      e6a677e [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9640
      ca90719 [Tathagata Das] Removed extra line
      ba9cfc7 [Tathagata Das] Improved kinesis test selection logic
      88d59bd [Tathagata Das] updated test modules
      871fcc8 [Tathagata Das] Fixed SparkBuild
      94be631 [Tathagata Das] Fixed style
      b858196 [Tathagata Das] Fixed conditions and few other things based on PR comments.
      e292e64 [Tathagata Das] Added filters for Kinesis python tests
    • [SPARK-9729] [SPARK-9363] [SQL] Use sort merge join for left and right outer join · 91e9389f
      Josh Rosen authored
      This patch adds a new `SortMergeOuterJoin` operator that performs left and right outer joins using sort merge join.  It also refactors `SortMergeJoin` in order to improve performance and code clarity.
      
      Along the way, I also performed a couple pieces of minor cleanup and optimization:
      
      - Rename the `HashJoin` physical planner rule to `EquiJoinSelection`, since it's also used for non-hash joins.
      - Rewrite the comment at the top of `HashJoin` to better explain the precedence for choosing join operators.
      - Update `JoinSuite` to use `SqlTestUtils.withConf` for changing SQLConf settings.
      
      This patch incorporates several ideas from adrian-wang's patch, #5717.
      
      Closes #5717.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #7904 from JoshRosen/outer-join-smj and squashes 1 commits.
  2. Aug 10, 2015
    • [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists · 071bbad5
      Damian Guy authored
      This PR is inspired by #8063 authored by dguy. In particular, the Parquet test files added here are all taken from that PR.
      
      **Committer who merges this PR should attribute it to "Damian Guy <damian.guygmail.com>".**
      
      ----
      
      SPARK-6776 and SPARK-6777 followed `parquet-avro` to implement backwards-compatibility rules defined in `parquet-format` spec. However, both Spark SQL and `parquet-avro` neglected the following statement in `parquet-format`:
      
      > This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted as a required list of required elements where the element type is the type of the field.
      
      One consequence is that Parquet files generated by `parquet-protobuf` that contain unannotated repeated fields are not correctly converted to Catalyst arrays.
      
      This PR fixes this issue by
      
      1. Handling unannotated repeated fields in `CatalystSchemaConverter`.
      2. Converting these special repeated fields to Catalyst arrays in `CatalystRowConverter`.
      
         Two special converters, `RepeatedPrimitiveConverter` and `RepeatedGroupConverter`, are added. They delegate the actual conversion work to a child `elementConverter` and accumulate elements in an `ArrayBuffer`.
      
         Two extra methods, `start()` and `end()`, are added to `ParentContainerUpdater` so that they can be used to initialize new `ArrayBuffer`s for unannotated repeated fields and to propagate converted array values upstream.
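
The `start()` / `end()` pattern can be sketched in Python as below. The `ArrayUpdater` class and its `set` method are hypothetical stand-ins, not the converter classes from the patch; a list plays the role of the `ArrayBuffer`.

```python
class ArrayUpdater:
    """Collects the values of one repeated field for the current record."""

    def __init__(self, set_parent_value):
        self.set_parent_value = set_parent_value
        self.buffer = None

    def start(self):
        # A fresh buffer for every record, so values never leak across rows.
        self.buffer = []

    def set(self, value):
        # Called once per occurrence of the repeated field.
        self.buffer.append(value)

    def end(self):
        # Propagate the finished array to the parent container.
        self.set_parent_value(list(self.buffer))


record = {}
updater = ArrayUpdater(lambda array: record.__setitem__("xs", array))

updater.start()
for raw in (1, 2, 3):
    updater.set(raw)
updater.end()
```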
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8070 from liancheng/spark-9340/unannotated-parquet-list and squashes the following commits:
      
      ace6df7 [Cheng Lian] Moves ParquetProtobufCompatibilitySuite
      f1c7bfd [Cheng Lian] Updates .rat-excludes
      420ad2b [Cheng Lian] Fixes converting unannotated Parquet lists
    • [SPARK-9801] [STREAMING] Check if file exists before deleting temporary files. · 3c9802d9
      Hao Zhu authored
      Spark Streaming deletes the temp file and the backup files without checking whether they exist.
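
The guard amounts to the following, shown as a Python sketch with a hypothetical helper name (the actual change is in Spark's Scala code). Note that a check followed by a delete is not atomic, so another process could still remove the file in between.

```python
import os


def delete_if_exists(path):
    """Delete path only when it is present; report whether it was deleted."""
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```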
      
      Author: Hao Zhu <viadeazhu@gmail.com>
      
      Closes #8082 from viadea/master and squashes the following commits:
      
      242d05f [Hao Zhu] [SPARK-9801][Streaming]No need to check the existence of those files
      fd143f2 [Hao Zhu] [SPARK-9801][Streaming]Check if backupFile exists before deleting backupFile files.
      087daf0 [Hao Zhu] SPARK-9801
    • [SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python · 853809e9
      Prabeesh K authored
      This PR is based on #4229, thanks prabeesh.
      
      Closes #4229
      
      Author: Prabeesh K <prabsmails@gmail.com>
      Author: zsxwing <zsxwing@gmail.com>
      Author: prabs <prabsmails@gmail.com>
      Author: Prabeesh K <prabeesh.k@namshi.com>
      
      Closes #7833 from zsxwing/pr4229 and squashes the following commits:
      
      9570bec [zsxwing] Fix the variable name and check null in finally
      4a9c79e [zsxwing] Fix pom.xml indentation
      abf5f18 [zsxwing] Merge branch 'master' into pr4229
      935615c [zsxwing] Fix the flaky MQTT tests
      47278c5 [zsxwing] Include the project class files
      478f844 [zsxwing] Add unpack
      5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
      734db99 [zsxwing] Merge branch 'master' into pr4229
      126608a [Prabeesh K] address the comments
      b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
      d07f454 [zsxwing] Register StreamingListener before starting StreamingContext; Revert unnecessary changes; fix the python unit test
      a6747cb [Prabeesh K] wait for starting the receiver before publishing data
      87fc677 [Prabeesh K] address the comments:
      97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
      80474d1 [Prabeesh K] fix
      1f0cfe9 [Prabeesh K] python style fix
      e1ee016 [Prabeesh K] scala style fix
      a5a8f9f [Prabeesh K] added Python test
      9767d82 [Prabeesh K] implemented Python-friendly class
      a11968b [Prabeesh K] fixed python style
      795ec27 [Prabeesh K] address comments
      ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
      3f4df12 [Prabeesh K] updated version
      b34c3c1 [prabs] address comments
      3aa7fff [prabs] Added Python streaming mqtt word count example
      b7d42ff [prabs] Mqtt streaming support in Python
    • [SPARK-9759] [SQL] improve decimal.times() and cast(int, decimalType) · c4fd2a24
      Davies Liu authored
      This patch optimizes two things:
      
      1. Pass a MathContext to JavaBigDecimal.multiply/divide/remainder so that they round correctly, because java.math.BigDecimal.apply(MathContext) is expensive.
      
      2. Cast integer/short/byte to decimal directly (without going through double).
      
      These two optimizations could speed up the end-to-end time of an aggregation (SUM(short * decimal(5, 2))) by 75% (from 19s to 10.8s).
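
The two ideas can be sketched with Python's `decimal` module as an analogy for the Java/Scala decimal classes. The precision of 38 and the helper names are assumptions of this sketch.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

# Assumed context for the sketch: 38 significant digits, half-up rounding.
ctx = Context(prec=38, rounding=ROUND_HALF_UP)


def times(a, b):
    # Round once, as part of the multiplication itself, instead of
    # multiplying first and re-rounding the product in a second pass.
    return ctx.multiply(a, b)


def int_to_decimal(n):
    # Build the decimal straight from the integer, with no float detour.
    return Decimal(n)


product = times(Decimal("1.25"), int_to_decimal(3))
```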
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8052 from davies/optimize_decimal and squashes the following commits:
      
      225efad [Davies Liu] improve decimal.times() and cast(int, decimalType)
    • [SPARK-9620] [SQL] generated UnsafeProjection should support many columns or large expressions · fe2fb7fb
      Davies Liu authored
      Currently, the generated UnsafeProjection can hit Java's 64k bytecode limit. This patch splits the generated expressions into multiple functions to avoid that limitation.
      
      After this patch, we can work with tables that have up to 64k columns (at which point we hit Java's limit on the number of constants), which should be enough in practice.
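
The splitting strategy can be illustrated with a small code generator. This Python sketch emits Python source rather than Java, and the chunk size, function names and the `row[i] + 1` expression are all hypothetical; it only shows one large generated body being broken into several smaller functions that a driver function calls in turn.

```python
def generate_projection(num_columns, chunk_size=100):
    """Generate and compile a projection split into fixed-size chunks."""
    lines = []
    calls = []
    for start in range(0, num_columns, chunk_size):
        end = min(start + chunk_size, num_columns)
        name = f"apply_{start}"
        # One helper function per chunk of column expressions.
        lines.append(f"def {name}(row, out):")
        for i in range(start, end):
            lines.append(f"    out[{i}] = row[{i}] + 1")
        calls.append(f"    {name}(row, out)")
    # The driver stays small: it only calls the chunk helpers in order.
    lines.append("def project(row):")
    lines.append(f"    out = [None] * {num_columns}")
    lines.extend(calls)
    lines.append("    return out")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["project"]


project = generate_projection(250, chunk_size=100)
result = project(list(range(250)))
```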
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8044 from davies/wider_table and squashes the following commits:
      
      9192e6c [Davies Liu] fix generated safe projection
      d1ef81a [Davies Liu] fix failed tests
      737b3d3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
      ffcd132 [Davies Liu] address comments
      1b95be4 [Davies Liu] put the generated class into sql package
      77ed72d [Davies Liu] address comments
      4518e17 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
      75ccd01 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
      495e932 [Davies Liu] support wider table with more than 1k columns for generated projections
    • [SPARK-9763][SQL] Minimize exposure of internal SQL classes. · 40ed2af5
      Reynold Xin authored
      There are a few changes in this pull request:
      
      1. Moved all data sources to execution.datasources, except the public JDBC APIs.
      2. In order to maintain backward compatibility from 1, added a backward compatibility translation map in data source resolution.
      3. Moved ui and metric package into execution.
      4. Added more documentation on some internal classes.
      5. Renamed DataSourceRegister.format -> shortName.
      6. Added "override" modifier on shortName.
      7. Removed IntSQLMetric.
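
A backward compatibility translation map of the kind mentioned in item 2 can be sketched as a simple lookup. The class names in the table are illustrative examples of an old and a new qualified name, not entries copied from the patch, and `resolve_class_name` is a hypothetical helper.

```python
# Illustrative entries only: old qualified name -> new qualified name.
BACKWARD_COMPATIBILITY_MAP = {
    "org.apache.spark.sql.parquet.DefaultSource":
        "org.apache.spark.sql.execution.datasources.parquet.DefaultSource",
}


def resolve_class_name(name):
    """Translate a legacy name if one is registered, else return it as-is."""
    return BACKWARD_COMPATIBILITY_MAP.get(name, name)
```

Names that were never moved fall through unchanged, so existing user configuration keeps working either way.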
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8056 from rxin/SPARK-9763 and squashes the following commits:
      
      9df4801 [Reynold Xin] Removed hardcoded name in test cases.
      d9babc6 [Reynold Xin] Shorten.
      e484419 [Reynold Xin] Removed VisibleForTesting.
      171b812 [Reynold Xin] MimaExcludes.
      2041389 [Reynold Xin] Compile ...
      79dda42 [Reynold Xin] Compile.
      0818ba3 [Reynold Xin] Removed IntSQLMetric.
      c46884f [Reynold Xin] Two more fixes.
      f9aa88d [Reynold Xin] [SPARK-9763][SQL] Minimize exposure of internal SQL classes.
    • [SPARK-9784] [SQL] Exchange.isUnsafe should check whether codegen and unsafe are enabled · 0fe66744
      Josh Rosen authored
      Exchange.isUnsafe should check whether codegen and unsafe are enabled.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8073 from JoshRosen/SPARK-9784 and squashes the following commits:
      
      7a1019f [Josh Rosen] [SPARK-9784] Exchange.isUnsafe should check whether codegen and unsafe are enabled
    • Fixed AtomicReference<> Example · d2852127
      Mahmoud Lababidi authored
      Author: Mahmoud Lababidi <lababidi@gmail.com>
      
      Closes #8076 from lababidi/master and squashes the following commits:
      
      af4553b [Mahmoud Lababidi] Fixed AtomicReference<> Example
    • [SPARK-9755] [MLLIB] Add docs to MultivariateOnlineSummarizer methods · 00b655cc
      Feynman Liang authored
      Adds method documentation back to `MultivariateOnlineSummarizer`; it was present in 1.4 but disappeared somewhere along the way to 1.5.
      
      jkbradley
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #8045 from feynmanliang/SPARK-9755 and squashes the following commits:
      
      af67fde [Feynman Liang] Add MultivariateOnlineSummarizer docs
    • [SPARK-9710] [TEST] Fix RPackageUtilsSuite when R is not available. · 0f3366a4
      Marcelo Vanzin authored
      RUtils.isRInstalled throws an exception if R is not installed,
      instead of returning false. Fix that.
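
The intended contract (report `False` instead of raising when the tool is missing) can be sketched in Python as follows. `is_installed` is a hypothetical helper, and probing with `--version` is an assumption of the sketch rather than what `RUtils.isRInstalled` does.

```python
import subprocess


def is_installed(command):
    """Return True if the command can be launched and exits successfully."""
    try:
        completed = subprocess.run(
            [command, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return completed.returncode == 0
    except OSError:
        # Launch failures, such as a missing executable, mean
        # "not installed" rather than an error for the caller.
        return False
```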
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8008 from vanzin/SPARK-9710 and squashes the following commits:
      
      df72d8c [Marcelo Vanzin] [SPARK-9710] [test] Fix RPackageUtilsSuite when R is not available.
    • [SPARK-9743] [SQL] Fixes JSONRelation refreshing · e3fef0f9
      Cheng Lian authored
      PR #7696 added two `HadoopFsRelation.refresh()` calls ([this] [1], and [this] [2]) in `DataSourceStrategy` to make the test case `InsertSuite.save directly to the path of a JSON table` pass. However, this forces every `HadoopFsRelation` table scan to do a refresh, which can be very expensive for tables with a large number of partitions.
      
      The original test case fails without the `refresh()` calls because the old JSON relation builds the base RDD from the input paths, while `HadoopFsRelation` provides `FileStatus`es of leaf files. With the old JSON relation, we can create a temporary table based on a path, write data to it, and then read the newly written data without refreshing the table. This is no longer true for `HadoopFsRelation`.
      
      This PR removes those two expensive refresh calls, and moves the refresh into `JSONRelation` to fix this issue. We might want to update `HadoopFsRelation` interface to provide better support for this use case.
      
      [1]: https://github.com/apache/spark/blob/ebfd91c542aaead343cb154277fcf9114382fee7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L63
      [2]: https://github.com/apache/spark/blob/ebfd91c542aaead343cb154277fcf9114382fee7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L91
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8035 from liancheng/spark-9743/fix-json-relation-refreshing and squashes the following commits:
      
      ec1957d [Cheng Lian] Fixes JSONRelation refreshing
    • [SPARK-9777] [SQL] Window operator can accept UnsafeRows · be80def0
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-9777
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8064 from yhuai/windowUnsafe and squashes the following commits:
      
      8fb3537 [Yin Huai] Set canProcessUnsafeRows to true.
  3. Aug 09, 2015
    • [CORE] [SPARK-9760] Use Option instead of Some for Ivy repos · 46025616
      Shivaram Venkataraman authored
      This was introduced in #7599
      
      cc rxin brkyvz
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8055 from shivaram/spark-packages-repo-fix and squashes the following commits:
      
      890f306 [Shivaram Venkataraman] Remove test case
      51d69ee [Shivaram Venkataraman] Add test case for --packages without --repository
      c02e0b4 [Shivaram Venkataraman] Use Option instead of Some for Ivy repos
    • [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles · 23cf5af0
      Josh Rosen authored
      This pull request refactors the `EnsureRequirements` planning rule in order to avoid the addition of certain unnecessary shuffles.
      
      As an example of how unnecessary shuffles can occur, consider SortMergeJoin, which requires clustered distribution and sorted ordering of its children's input rows. Say that both of SMJ's children produce unsorted output but are both SinglePartition. In this case, we will need to inject sort operators but should not need to inject Exchanges. Unfortunately, it looks like `EnsureRequirements` unnecessarily repartitions using a hash partitioning.
      
      This patch solves this problem by refactoring `EnsureRequirements` to properly implement the `compatibleWith` checks that were broken in earlier implementations. See the significant inline comments for a better description of how this works. The majority of this PR is new comments and test cases, with few actual changes to the code.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7988 from JoshRosen/exchange-fixes and squashes the following commits:
      
      38006e7 [Josh Rosen] Rewrite EnsureRequirements _yet again_ to make things even simpler
      0983f75 [Josh Rosen] More guarantees vs. compatibleWith cleanup; delete BroadcastPartitioning.
      8784bd9 [Josh Rosen] Giant comment explaining compatibleWith vs. guarantees
      1307c50 [Josh Rosen] Update conditions for requiring child compatibility.
      18cddeb [Josh Rosen] Rename DummyPlan to DummySparkPlan.
      2c7e126 [Josh Rosen] Merge remote-tracking branch 'origin/master' into exchange-fixes
      fee65c4 [Josh Rosen] Further refinement to comments / reasoning
      642b0bb [Josh Rosen] Further expand comment / reasoning
      06aba0c [Josh Rosen] Add more comments
      8dbc845 [Josh Rosen] Add even more tests.
      4f08278 [Josh Rosen] Fix the test by adding the compatibility check to EnsureRequirements
      a1c12b9 [Josh Rosen] Add failing test to demonstrate allCompatible bug
      0725a34 [Josh Rosen] Small assertion cleanup.
      5172ac5 [Josh Rosen] Add test for requiresChildrenToProduceSameNumberOfPartitions.
      2e0f33a [Josh Rosen] Write a more generic test for EnsureRequirements.
      752b8de [Josh Rosen] style fix
      c628daf [Josh Rosen] Revert accidental ExchangeSuite change.
      c9fb231 [Josh Rosen] Rewrite exchange to fix better handle this case.
      adcc742 [Josh Rosen] Move test to PlannerSuite.
      0675956 [Josh Rosen] Preserving ordering and partitioning in row format converters also does not help.
      cc5669c [Josh Rosen] Adding outputPartitioning to Repartition does not fix the test.
      2dfc648 [Josh Rosen] Add failing test illustrating bad exchange planning.
    • [SPARK-9737] [YARN] Add the suggested configuration when required executor memory is above the max threshold of this cluster on YARN mode · 86fa4ba6
      Yadong Qi authored
      
      Author: Yadong Qi <qiyadong2010@gmail.com>
      
      Closes #8028 from watermen/SPARK-9737 and squashes the following commits:
      
      48bdf3d [Yadong Qi] Add suggested configuration.
    • [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions · 68ccc6e1
      Yijie Shen authored
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #8057 from yjshen/explode_star and squashes the following commits:
      
      eae181d [Yijie Shen] change explanation message
      54c9d11 [Yijie Shen] meaning message for * in explode
    • [SPARK-9752][SQL] Support UnsafeRow in Sample operator. · e9c36938
      Reynold Xin authored
      In order for this to work, I had to disable gap sampling.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8040 from rxin/SPARK-9752 and squashes the following commits:
      
      f9e248c [Reynold Xin] Fix the test case for real this time.
      adbccb3 [Reynold Xin] Fixed test case.
      589fb23 [Reynold Xin] Merge branch 'SPARK-9752' of github.com:rxin/spark into SPARK-9752
      55ccddc [Reynold Xin] Fixed core test.
      78fa895 [Reynold Xin] [SPARK-9752][SQL] Support UnsafeRow in Sample operator.
      c9e7112 [Reynold Xin] [SPARK-9752][SQL] Support UnsafeRow in Sample operator.