  1. Aug 12, 2015
    • [SPARK-9575] [MESOS] Add documentation around Mesos shuffle service. · 741a29f9
      Timothy Chen authored
      andrewor14
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #7907 from tnachen/mesos_shuffle.
    • [SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos · 5c99d8bf
      Timothy Chen authored
      Some users like to download additional files into their sandbox so that they can refer to them from their Spark program, or even mount them into another directory later.
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #7195 from tnachen/mesos_files.
    • [SPARK-9426] [WEBUI] Job page DAG visualization is not shown · bab89232
      Carson Wang authored
      To reproduce the issue, go to the stage page and click DAG Visualization once, then go to the job page to show the job DAG visualization. You will only see the first stage of the job.
      Root cause: the JavaScript uses local storage to remember your selection. Once you click the stage DAG visualization, local storage sets `expand-dag-viz-arrow-stage` to true. When you go to the job page, the JS checks `expand-dag-viz-arrow-stage` in local storage first and tries to show the stage DAG visualization on the job page.
      To fix this, I set an id on the DAG span to distinguish the job page from the stage page. The JS code now checks the id and local storage together to make sure the correct DAG visualization is shown.
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #8104 from carsonwang/SPARK-9426.
    • [SPARK-9829] [WEBUI] Display the update value for peak execution memory · 4e3f4b93
      zsxwing authored
      The peak execution memory is not correct because it shows the sum of finished tasks' values when a task finishes.
      
      This PR fixes it by using the update value rather than the accumulator value.
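A minimal Python sketch of the difference, with hypothetical names (`peak_memory_columns`, `finished`): the accumulator value is a running sum over finished tasks, while each task's own update value is what a per-task peak column should display.

```python
from typing import Dict, List, Tuple


def peak_memory_columns(
    finished: List[Tuple[int, int]],
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (accumulator_view, update_view) for tasks in finish order.

    `finished` holds hypothetical (task_id, peak_bytes) pairs.
    """
    accumulator_view: Dict[int, int] = {}
    update_view: Dict[int, int] = {}
    running_total = 0
    for task_id, peak_bytes in finished:
        running_total += peak_bytes                # accumulator value: sum so far
        accumulator_view[task_id] = running_total  # the misleading number
        update_view[task_id] = peak_bytes          # this task's own update
    return accumulator_view, update_view
```

For `[(0, 100), (1, 50)]` the accumulator view reports 150 for task 1, while the update view reports its actual 50.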
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8121 from zsxwing/SPARK-9829.
    • [SPARK-9806] [WEB UI] Don't share ReplayListenerBus between multiple applications · a807fcbe
      Rohit Agarwal authored
      Author: Rohit Agarwal <rohita@qubole.com>
      
      Closes #8088 from mindprince/SPARK-9806.
    • [SPARK-8366] maxNumExecutorsNeeded should properly handle failed tasks · b85f9a24
      xutingjun authored
      Author: xutingjun <xutingjun@huawei.com>
      Author: meiyoula <1039320815@qq.com>
      
      Closes #6817 from XuTingjun/SPARK-8366.
    • [SPARK-9854] [SQL] RuleExecutor.timeMap should be thread-safe · b1581ac2
      Josh Rosen authored
      `RuleExecutor.timeMap` is currently a non-thread-safe mutable HashMap; this can lead to infinite loops if multiple threads are concurrently modifying the map.  I believe that this is responsible for some hangs that I've observed in HiveQuerySuite.
      
      This patch addresses this by using a Guava `AtomicLongMap`.
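Guava's `AtomicLongMap` is a Java class; as a language-neutral illustration, the sketch below swaps in a Python dictionary guarded by a single `threading.Lock`. The class name `ConcurrentCounterMap` is hypothetical. It shows the property the fix relies on: concurrent increments are serialized, so no update is lost and the map is never mutated by two threads at once.

```python
import threading
from collections import defaultdict
from typing import Dict


class ConcurrentCounterMap:
    """Map from key to a running total that is safe to update from many threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._totals: Dict[str, int] = defaultdict(int)

    def add_and_get(self, key: str, delta: int) -> int:
        # The read-modify-write happens under one lock, so no update is lost.
        with self._lock:
            self._totals[key] += delta
            return self._totals[key]

    def get(self, key: str) -> int:
        with self._lock:
            return self._totals.get(key, 0)

    def snapshot(self) -> Dict[str, int]:
        # Copy under the lock so callers never iterate a map being mutated.
        with self._lock:
            return dict(self._totals)
```

A single lock serializes all writers, which is simpler than a lock-free map but may contend under heavy load.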
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8120 from JoshRosen/rule-executor-time-map-fix.
    • [SPARK-9831] [SQL] fix serialization with empty broadcast · c3e9a120
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8117 from davies/fix_serialization and squashes the following commits:
      
      d21ac71 [Davies Liu] fix serialization with empty broadcast
  2. Aug 11, 2015
    • [SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5 · 74a293f4
      Eric Liang authored
      This documents the use of R model formulae in the SparkR guide. It also fixes some bugs in the R API docs.
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #8085 from ericl/docs.
    • [SPARK-1517] Refactor release scripts to facilitate nightly publishing · 3ef0f329
      Patrick Wendell authored
      This update contains some code changes to the release scripts that allow easier nightly publishing. I've been using these new scripts on Jenkins for cutting and publishing nightly snapshots for the last month or so, and it has been going well. I'd like to get them merged back upstream so this can be maintained by the community.
      
      The main changes are:
      1. Separates the release tagging from various build possibilities for an already tagged release (`release-tag.sh` and `release-build.sh`).
      2. Allow for injecting credentials through the environment, including GPG keys. This is then paired with secure key injection in Jenkins.
      3. Support for copying build results to a remote directory, and also "rotating" results, e.g. the ability to keep the last N copies of binary or doc builds.
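The "rotating" behavior in item 3 can be sketched in Python. This is not the release scripts' code (they are shell scripts); `rotate_builds` is a hypothetical helper, and it assumes build directory names sort chronologically, for example date-stamped names.

```python
import os
import shutil
from typing import List


def rotate_builds(root: str, keep: int) -> List[str]:
    """Keep only the `keep` newest build directories directly under `root`.

    Assumes directory names sort chronologically (an assumption of this
    sketch). Returns the names of the directories kept, newest first.
    """
    names = sorted(
        (n for n in os.listdir(root) if os.path.isdir(os.path.join(root, n))),
        reverse=True,
    )
    for stale in names[keep:]:
        shutil.rmtree(os.path.join(root, stale))  # drop older copies
    return names[:keep]
```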
      
      I'm happy if anyone wants to take a look at this; it's not user-facing but an internal utility used for generating releases.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #7411 from pwendell/release-script-updates and squashes the following commits:
      
      74f9beb [Patrick Wendell] Moving maven build command to a variable
      233ce85 [Patrick Wendell] [SPARK-1517] Refactor release scripts to facilitate nightly publishing
    • [SPARK-9649] Fix flaky test MasterSuite again - disable REST · ca8f70e9
      Andrew Or authored
      The REST server is not actually used in most tests and so we can disable it. It is a source of flakiness because it tries to bind to a specific port in vain. There was also some code that avoided the shuffle service in tests. This is actually not necessary because the shuffle service is already off by default.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8084 from andrewor14/fix-master-suite-again.
    • [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible · afa757c9
      Reynold Xin authored
      DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option, so we must be able to resolve the old committer qualified name.
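One way to keep the old name resolvable is a translation table consulted before class loading, similar in spirit to the backward compatibility map described in SPARK-9763. In this Python sketch, the map contents (including the exact package names) and `resolve_committer_class` are assumptions for illustration, not Spark's actual code.

```python
# Hypothetical translation table from pre-move class names to their new
# locations; the package names here are assumptions of this sketch.
BACKWARD_COMPATIBILITY_MAP = {
    "org.apache.spark.sql.parquet.DirectParquetOutputCommitter":
        "org.apache.spark.sql.execution.datasources.parquet."
        "DirectParquetOutputCommitter",
}


def resolve_committer_class(configured_name: str) -> str:
    """Translate an old fully qualified name; pass any other name through."""
    return BACKWARD_COMPATIBILITY_MAP.get(configured_name, configured_name)
```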
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8114 from rxin/SPARK-9849.
    • [SPARK-9074] [LAUNCHER] Allow arbitrary Spark args to be set. · 5a5bbc29
      Marcelo Vanzin authored
      This change allows any Spark argument to be added to the app to
      be started using SparkLauncher. Known arguments are properly
      validated, while unknown arguments are allowed so that the
      library can launch newer Spark versions (in case SPARK_HOME points
      at one).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7975 from vanzin/SPARK-9074 and squashes the following commits:
      
      b5e451a [Marcelo Vanzin] [SPARK-9074] [launcher] Allow arbitrary Spark args to be set.
    • [HOTFIX] Fix style error caused by 017b5de0 · 736af95b
      Andrew Or authored
    • [SPARK-8925] [MLLIB] Add @since tags to mllib.util · 017b5de0
      Sudhakar Thota authored
      Went through the change history of MLUtils.scala and picked the version in which each change went in.
      
      Author: Sudhakar Thota <sudhakarthota@yahoo.com>
      Author: Sudhakar Thota <sudhakarthota@sudhakars-mbp-2.usca.ibm.com>
      
      Closes #7436 from sthota2014/SPARK-8925_thotas.
    • [SPARK-9788] [MLLIB] Fix LDA Binary Compatibility · be3e2716
      Feynman Liang authored
      1. Add `asymmetricDocConcentration` and revert the docConcentration changes. If the (internal) doc concentration vector is a single value, `getDocConcentration` returns it. If it is a constant vector, `getDocConcentration` returns the first item, and it fails otherwise.
      2. Give `LDAModel.gammaShape` a default value in `LDAModel` concrete class constructors.
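The `getDocConcentration` rule in item 1 can be expressed as a short Python sketch; the function name mirrors the Scala getter, and raising on an empty vector is an extra assumption of this sketch.

```python
from typing import Sequence


def get_doc_concentration(alpha: Sequence[float]) -> float:
    """Collapse a doc-concentration vector to one value when that is lossless.

    A single value is returned as is; a constant vector returns its first
    entry; anything else raises, since a scalar would discard information.
    """
    if len(alpha) == 0:
        raise ValueError("docConcentration vector is empty")
    first = alpha[0]
    if all(a == first for a in alpha):
        return first
    raise ValueError("docConcentration is asymmetric; use the vector form instead")
```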
      
      jkbradley
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #8077 from feynmanliang/SPARK-9788 and squashes the following commits:
      
      6b07bc8 [Feynman Liang] Code review changes
      9d6a71e [Feynman Liang] Add asymmetricAlpha alias
      bf4e685 [Feynman Liang] Asymmetric docConcentration
      4cab972 [Feynman Liang] Default gammaShape
    • Closes #1290 · 423cdfd8
      Xiangrui Meng authored
      Closes #4934
    • [SPARK-9824] [CORE] Fix the issue that InternalAccumulator leaks WeakReference · f16bc68d
      zsxwing authored
      `InternalAccumulator.create` doesn't call `registerAccumulatorForCleanup` to register itself with ContextCleaner, so `WeakReference`s for these accumulators in `Accumulators.originals` won't be removed.
      
      This PR added `registerAccumulatorForCleanup` for internal accumulators to avoid the memory leak.
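A rough Python model of the leak, using the standard `weakref` module in place of Spark's `ContextCleaner`: `originals`, `Accumulator`, and `create_accumulator` are hypothetical stand-ins. Without the cleanup registration, a dead weak reference stays in the registry forever; with it, the entry is removed once the accumulator is collected.

```python
import weakref
from typing import Dict


class Accumulator:
    """Hypothetical stand-in for an internal accumulator."""


# Hypothetical registry of weak references, loosely modeled on
# `Accumulators.originals`.
originals: Dict[int, weakref.ref] = {}


def create_accumulator(acc_id: int, register_for_cleanup: bool) -> Accumulator:
    acc = Accumulator()
    originals[acc_id] = weakref.ref(acc)
    if register_for_cleanup:
        # Remove the registry entry once `acc` has been garbage collected.
        weakref.finalize(acc, originals.pop, acc_id, None)
    return acc
```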
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8108 from zsxwing/internal-accumulators-leak.
    • [SPARK-9814] [SQL] EqualNotNull not passing to data sources · 00c02728
      hyukjinkwon authored
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: 권혁진 <gurwls223@gmail.com>
      
      Closes #8096 from HyukjinKwon/master.
    • [SPARK-7726] Add import so Scaladoc doesn't fail. · 2a3be4dd
      Patrick Wendell authored
      This is another import needed so Scala 2.11 doc generation doesn't fail.
      See SPARK-7726 for more detail. I tested this locally and the 2.11
      install goes from failing to succeeding with this patch.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #8095 from pwendell/scaladoc.
    • [SPARK-9750] [MLLIB] Improve equals on SparseMatrix and DenseMatrix · 520ad44b
      Feynman Liang authored
      Adds unit test for `equals` on `mllib.linalg.Matrix` class and `equals` to both `SparseMatrix` and `DenseMatrix`. Supports equality testing between `SparseMatrix` and `DenseMatrix`.
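A Python sketch of equality based on matrix semantics rather than representation. The class shapes loosely follow `mllib.linalg` (column-major dense values, CSC sparse storage), but these are simplified stand-ins, not the MLlib classes; transposed storage, for instance, is ignored.

```python
from dataclasses import dataclass
from typing import List

Rows = List[List[float]]


@dataclass(eq=False)
class DenseMatrix:
    num_rows: int
    num_cols: int
    values: List[float]  # column-major

    def to_rows(self) -> Rows:
        return [
            [self.values[j * self.num_rows + i] for j in range(self.num_cols)]
            for i in range(self.num_rows)
        ]

    def __eq__(self, other: object) -> bool:
        return matrices_equal(self, other)


@dataclass(eq=False)
class SparseMatrix:
    num_rows: int
    num_cols: int
    col_ptrs: List[int]  # compressed sparse column (CSC) layout
    row_indices: List[int]
    values: List[float]

    def to_rows(self) -> Rows:
        rows = [[0.0] * self.num_cols for _ in range(self.num_rows)]
        for j in range(self.num_cols):
            for k in range(self.col_ptrs[j], self.col_ptrs[j + 1]):
                rows[self.row_indices[k]][j] = self.values[k]
        return rows

    def __eq__(self, other: object) -> bool:
        return matrices_equal(self, other)


def matrices_equal(a: object, b: object) -> bool:
    """Compare by shape and entries, ignoring dense vs. sparse storage."""
    kinds = (DenseMatrix, SparseMatrix)
    if not isinstance(a, kinds) or not isinstance(b, kinds):
        return False
    same_shape = (a.num_rows, a.num_cols) == (b.num_rows, b.num_cols)
    return same_shape and a.to_rows() == b.to_rows()
```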
      
      mengxr
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #8042 from feynmanliang/SPARK-9750 and squashes the following commits:
      
      bb70d5e [Feynman Liang] Breeze compare for dense matrices as well, in case other is sparse
      ab6f3c8 [Feynman Liang] Sparse matrix compare for equals
      22782df [Feynman Liang] Add equality based on matrix semantics, not representation
      78f9426 [Feynman Liang] Add casts
      43d28fa [Feynman Liang] Fix failing test
      6416fa0 [Feynman Liang] Add failing sparse matrix equals tests
    • [SPARK-9646] [SQL] Add metrics for all join and aggregate operators · 5831294a
      zsxwing authored
      This PR adds metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two cases:
      1. The iterator is not fully consumed, so the metric values will be smaller than expected.
      2. Recreating the iterators will make metric values look bigger than the size of the input source, such as `CartesianProduct`.
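Both cases can be reproduced with a counting wrapper in Python. `counted` and `cartesian` are hypothetical helpers, not Spark operators: the counter only sees rows that are actually pulled, and a nested-loop product that rebuilds its right-side iterator counts the right input once per left row.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def counted(rows: Iterable[T], metrics: Dict[str, int], name: str) -> Iterator[T]:
    """Yield rows unchanged, counting only the rows a consumer actually pulls."""
    for row in rows:
        metrics[name] = metrics.get(name, 0) + 1
        yield row


def cartesian(left: Iterable[T], make_right: Callable[[], Iterable[U]],
              metrics: Dict[str, int]) -> Iterator[Tuple[T, U]]:
    """Nested-loop product that recreates the right-side iterator per left row."""
    for l_row in counted(left, metrics, "left"):
        for r_row in counted(make_right(), metrics, "right"):
            yield (l_row, r_row)
```

Taking 2 rows from a 5-row scan records 2, and a 3 x 2 product records 6 rows for a right side that only holds 2.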
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8060 from zsxwing/sql-metrics and squashes the following commits:
      
      40f3fc1 [zsxwing] Mark LongSQLMetric private[metric] to avoid using incorrectly and leak memory
      b1b9071 [zsxwing] Merge branch 'master' into sql-metrics
      4bef25a [zsxwing] Add metrics for SortMergeOuterJoin
      95ccfc6 [zsxwing] Merge branch 'master' into sql-metrics
      67cb4dd [zsxwing] Add metrics for Project and TungstenProject; remove metrics from PhysicalRDD and LocalTableScan
      0eb47d4 [zsxwing] Merge branch 'master' into sql-metrics
      dd9d932 [zsxwing] Avoid creating new Iterators
      589ea26 [zsxwing] Add metrics for all join and aggregate operators
    • [SPARK-9572] [STREAMING] [PYSPARK] Added StreamingContext.getActiveOrCreate() in Python · 5b8bb1b2
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #8080 from tdas/SPARK-9572 and squashes the following commits:
      
      64a231d [Tathagata Das] Fix based on comments
      741a0d0 [Tathagata Das] Fixed style
      f4f094c [Tathagata Das] Tweaked test
      9afcdbe [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9572
      e21488d [Tathagata Das] Minor update
      1a371d9 [Tathagata Das] Addressed comments.
      60479da [Tathagata Das] Fixed indent
      9c2da9c [Tathagata Das] Fixed bugs
      b5bd32c [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9572
      b55b348 [Tathagata Das] Removed prints
      5781728 [Tathagata Das] Fix style issues
      b711214 [Tathagata Das] Reverted run-tests.py
      643b59d [Tathagata Das] Revert unnecessary change
      150e58c [Tathagata Das] Added StreamingContext.getActiveOrCreate() in Python
    • [SPARK-8764] [ML] string indexer should take option to handle unseen values · dbd778d8
      Holden Karau authored
      As a precursor to adding a public constructor, add an option to handle unseen values by skipping them rather than throwing an exception (the default remains throwing an exception).
      
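A Python sketch of the two behaviors; `fit_string_indexer` and `transform` are hypothetical names, and the option values `"error"` and `"skip"` are assumptions modeled on the `handleInvalid` param mentioned in the squashed commits. Indices are assigned by descending label frequency, so the most frequent label gets 0.

```python
from collections import Counter
from typing import Dict, Iterable, List


def fit_string_indexer(labels: Iterable[str]) -> Dict[str, float]:
    """Assign indices by descending frequency; the most common label gets 0.0.

    Ties are broken alphabetically here, which is this sketch's own choice.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(mapping: Dict[str, float], labels: Iterable[str],
              handle_invalid: str = "error") -> List[float]:
    """Map labels to indices; unseen labels raise ("error") or are dropped ("skip")."""
    if handle_invalid not in ("error", "skip"):
        raise ValueError(f"unsupported handle_invalid: {handle_invalid!r}")
    out: List[float] = []
    for label in labels:
        if label in mapping:
            out.append(mapping[label])
        elif handle_invalid == "error":
            raise ValueError(f"Unseen label: {label}")
        # handle_invalid == "skip": the row is silently dropped
    return out
```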
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7266 from holdenk/SPARK-8764-string-indexer-should-take-option-to-handle-unseen-values and squashes the following commits:
      
      38a4de9 [Holden Karau] fix long line
      045bf22 [Holden Karau] Add a second b entry so b gets 0 for sure
      81dd312 [Holden Karau] Update the docs for handleInvalid param to be more descriptive
      7f37f6e [Holden Karau] remove extra space (scala style)
      414e249 [Holden Karau] And switch to using handleInvalid instead of skipInvalid
      1e53f9b [Holden Karau] update the param (codegen side)
      7a22215 [Holden Karau] fix typo
      100a39b [Holden Karau] Merge in master
      aa5b093 [Holden Karau] Since we filter we should never go down this code path if getSkipInvalid is true
      75ffa69 [Holden Karau] Remove extra newline
      d69ef5e [Holden Karau] Add a test
      b5734be [Holden Karau] Add support for unseen labels
      afecd4e [Holden Karau] Add a param to skip invalid entries.
    • [SPARK-8345] [ML] Add an SQL node as a feature transformer · 8cad854e
      Yanbo Liang authored
      Implements transforms that are defined by a SQL statement.
      Currently we only support SQL syntax like 'SELECT ... FROM __THIS__'
      where '__THIS__' represents the underlying table of the input dataset.
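The `__THIS__` convention can be illustrated with Python's built-in `sqlite3` standing in for Spark SQL: the input rows are loaded into a temporary table and the placeholder is replaced with that table's name before the statement runs. `sql_transform` and the table name are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple

THIS = "__THIS__"


def sql_transform(columns: Sequence[str], rows: Sequence[Tuple],
                  statement: str) -> List[Tuple]:
    """Run `statement` with __THIS__ bound to an in-memory table of `rows`."""
    table = "input_table"  # hypothetical temporary table name
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return conn.execute(statement.replace(THIS, table)).fetchall()
    finally:
        conn.close()
```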
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #7465 from yanboliang/spark-8345 and squashes the following commits:
      
      b403fcb [Yanbo Liang] address comments
      0d4bb15 [Yanbo Liang] a better transformSchema() implementation
      51eb9e7 [Yanbo Liang] Add an SQL node as a feature transformer
    • Fix comment error · bce72797
      Jeff Zhang authored
      The API was updated, but its doc comment was not.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #8097 from zjffdu/dev.
    • [SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering · dfe347d2
      Josh Rosen authored
      HashPartitioning compatibility is currently defined w.r.t the _set_ of expressions, but the ordering of those expressions matters when computing hash codes; this could lead to incorrect answers if we mistakenly avoided a shuffle based on the assumption that HashPartitionings with the same expressions in different orders will produce equivalent row hashcodes. The first commit adds a regression test which illustrates this problem.
      
      The fix for this is simple: make `HashPartitioning.compatibleWith` and `HashPartitioning.guarantees` sensitive to the expression ordering (i.e. do not perform set comparison).
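A small Python sketch shows why order matters. It uses a hypothetical Java-style `31 * h + x` hash rather than Spark's actual hash function: the same row lands in different partitions depending on whether it is hashed on `(a, b)` or `(b, a)`, even though both partitionings use the same set of expressions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


def hash_values(values: Sequence[int]) -> int:
    """Order-sensitive 32-bit hash in the style of Java's `31 * h + x`."""
    h = 17
    for v in values:
        h = (31 * h + v) & 0xFFFFFFFF
    return h


@dataclass(frozen=True)
class HashPartitioning:
    expressions: Tuple[str, ...]
    num_partitions: int

    def partition_for(self, row: Dict[str, int]) -> int:
        return hash_values([row[e] for e in self.expressions]) % self.num_partitions

    def compatible_with(self, other: "HashPartitioning") -> bool:
        # Compare ordered tuples; comparing set(self.expressions) would
        # wrongly treat (a, b) and (b, a) as interchangeable.
        return (self.expressions == other.expressions
                and self.num_partitions == other.num_partitions)
```

With 4 partitions, the row `{"a": 1, "b": 2}` goes to partition 2 under `(a, b)` but partition 0 under `(b, a)`, so skipping the shuffle would pair rows incorrectly.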
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8074 from JoshRosen/hashpartitioning-compatiblewith-fixes and squashes the following commits:
      
      b61412f [Josh Rosen] Demonstrate that I haven't cheated in my fix
      0b4d7d9 [Josh Rosen] Update so that clusteringSet is only used in satisfies().
      dc9c9d7 [Josh Rosen] Add failing regression test for SPARK-9785
    • [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform. · d378396f
      Reynold Xin authored
      PlatformDependent.UNSAFE is way too verbose.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8094 from rxin/SPARK-9815 and squashes the following commits:
      
      229b603 [Reynold Xin] [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform.
    • [SPARK-9727] [STREAMING] [BUILD] Updated streaming kinesis SBT project name to be more consistent · 600031eb
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #8092 from tdas/SPARK-9727 and squashes the following commits:
      
      b1b01fd [Tathagata Das] Updated streaming kinesis project name
    • [SPARK-9810] [BUILD] Remove individual commit messages from the squash commit message · 55752d88
      Reynold Xin authored
      For more information, please see the JIRA ticket and the associated dev list discussion.
      
      https://issues.apache.org/jira/browse/SPARK-9810
      
      http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-Removing-individual-commit-messages-from-the-squash-commit-message-td13295.html
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8091 from rxin/SPARK-9810.
    • [SPARK-9640] [STREAMING] [TEST] Do not run Python Kinesis tests when the... · 0f90d605
      Tathagata Das authored
      [SPARK-9640] [STREAMING] [TEST] Do not run Python Kinesis tests when the Kinesis assembly JAR has not been generated
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7961 from tdas/SPARK-9640 and squashes the following commits:
      
      974ce19 [Tathagata Das] Undo changes related to SPARK-9727
      004ae26 [Tathagata Das] style fixes
      9bbb97d [Tathagata Das] Minor style fies
      e6a677e [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9640
      ca90719 [Tathagata Das] Removed extra line
      ba9cfc7 [Tathagata Das] Improved kinesis test selection logic
      88d59bd [Tathagata Das] updated test modules
      871fcc8 [Tathagata Das] Fixed SparkBuild
      94be631 [Tathagata Das] Fixed style
      b858196 [Tathagata Das] Fixed conditions and few other things based on PR comments.
      e292e64 [Tathagata Das] Added filters for Kinesis python tests
    • [SPARK-9729] [SPARK-9363] [SQL] Use sort merge join for left and right outer join · 91e9389f
      Josh Rosen authored
      This patch adds a new `SortMergeOuterJoin` operator that performs left and right outer joins using sort merge join.  It also refactors `SortMergeJoin` in order to improve performance and code clarity.
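The core merge step of a left outer sort-merge join can be sketched in Python as follows. This is a simplified single-machine illustration with a hypothetical function name, not the `SortMergeOuterJoin` operator; it assumes non-null, comparable keys. A right outer join is the same procedure with the inputs swapped.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def sort_merge_left_outer_join(
    left: Sequence[A], right: Sequence[B],
    left_key: Callable[[A], int], right_key: Callable[[B], int],
) -> List[Tuple[A, Optional[B]]]:
    """Left outer join by sorting both sides on the key and merging once."""
    left_sorted = sorted(left, key=left_key)
    right_sorted = sorted(right, key=right_key)
    result: List[Tuple[A, Optional[B]]] = []
    j = 0
    for l_row in left_sorted:
        k = left_key(l_row)
        # Skip right rows with a smaller key; they can never match again.
        while j < len(right_sorted) and right_key(right_sorted[j]) < k:
            j += 1
        # Emit the matching group without consuming it, so a following
        # left row with the same key can reuse it.
        m = j
        while m < len(right_sorted) and right_key(right_sorted[m]) == k:
            result.append((l_row, right_sorted[m]))
            m += 1
        if m == j:  # no match: pad the right side with None (SQL NULL)
            result.append((l_row, None))
    return result
```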
      
      Along the way, I also performed a couple pieces of minor cleanup and optimization:
      
      - Rename the `HashJoin` physical planner rule to `EquiJoinSelection`, since it's also used for non-hash joins.
      - Rewrite the comment at the top of `HashJoin` to better explain the precedence for choosing join operators.
      - Update `JoinSuite` to use `SqlTestUtils.withConf` for changing SQLConf settings.
      
      This patch incorporates several ideas from adrian-wang's patch, #5717.
      
      Closes #5717.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #7904 from JoshRosen/outer-join-smj and squashes 1 commits.
  3. Aug 10, 2015
    • [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists · 071bbad5
      Damian Guy authored
      This PR is inspired by #8063 authored by dguy. In particular, the Parquet test files added here are all taken from that PR.
      
      **Committer who merges this PR should attribute it to "Damian Guy <damian.guygmail.com>".**
      
      ----
      
      SPARK-6776 and SPARK-6777 followed `parquet-avro` to implement backwards-compatibility rules defined in `parquet-format` spec. However, both Spark SQL and `parquet-avro` neglected the following statement in `parquet-format`:
      
      > This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted as a required list of required elements where the element type is the type of the field.
      
      One of the consequences is that Parquet files generated by `parquet-protobuf` containing unannotated repeated fields are not correctly converted to Catalyst arrays.
      
      This PR fixes this issue by
      
      1. Handling unannotated repeated fields in `CatalystSchemaConverter`.
      2. Converting this kind of special repeated fields to Catalyst arrays in `CatalystRowConverter`.
      
         Two special converters, `RepeatedPrimitiveConverter` and `RepeatedGroupConverter`, are added. They delegate the actual conversion work to a child `elementConverter` and accumulate elements in an `ArrayBuffer`.
      
         Two extra methods, `start()` and `end()`, are added to `ParentContainerUpdater`, so that they can be used to initialize new `ArrayBuffer`s for unannotated repeated fields and propagate converted array values upstream.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8070 from liancheng/spark-9340/unannotated-parquet-list and squashes the following commits:
      
      ace6df7 [Cheng Lian] Moves ParquetProtobufCompatibilitySuite
      f1c7bfd [Cheng Lian] Updates .rat-excludes
      420ad2b [Cheng Lian] Fixes converting unannotated Parquet lists
    • [SPARK-9801] [STREAMING] Check if file exists before deleting temporary files. · 3c9802d9
      Hao Zhu authored
      Spark Streaming deletes the temp file and backup files without checking whether they exist.
      
      Author: Hao Zhu <viadeazhu@gmail.com>
      
      Closes #8082 from viadea/master and squashes the following commits:
      
      242d05f [Hao Zhu] [SPARK-9801][Streaming]No need to check the existence of those files
      fd143f2 [Hao Zhu] [SPARK-9801][Streaming]Check if backupFile exists before deleting backupFile files.
      087daf0 [Hao Zhu] SPARK-9801
    • [SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python · 853809e9
      Prabeesh K authored
      This PR is based on #4229, thanks prabeesh.
      
      Closes #4229
      
      Author: Prabeesh K <prabsmails@gmail.com>
      Author: zsxwing <zsxwing@gmail.com>
      Author: prabs <prabsmails@gmail.com>
      Author: Prabeesh K <prabeesh.k@namshi.com>
      
      Closes #7833 from zsxwing/pr4229 and squashes the following commits:
      
      9570bec [zsxwing] Fix the variable name and check null in finally
      4a9c79e [zsxwing] Fix pom.xml indentation
      abf5f18 [zsxwing] Merge branch 'master' into pr4229
      935615c [zsxwing] Fix the flaky MQTT tests
      47278c5 [zsxwing] Include the project class files
      478f844 [zsxwing] Add unpack
      5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
      734db99 [zsxwing] Merge branch 'master' into pr4229
      126608a [Prabeesh K] address the comments
      b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
      d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test
      a6747cb [Prabeesh K] wait for starting the receiver before publishing data
      87fc677 [Prabeesh K] address the comments:
      97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
      80474d1 [Prabeesh K] fix
      1f0cfe9 [Prabeesh K] python style fix
      e1ee016 [Prabeesh K] scala style fix
      a5a8f9f [Prabeesh K] added Python test
      9767d82 [Prabeesh K] implemented Python-friendly class
      a11968b [Prabeesh K] fixed python style
      795ec27 [Prabeesh K] address comments
      ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
      3f4df12 [Prabeesh K] updated version
      b34c3c1 [prabs] adress comments
      3aa7fff [prabs] Added Python streaming mqtt word count example
      b7d42ff [prabs] Mqtt streaming support in Python
    • [SPARK-9759] [SQL] improve decimal.times() and cast(int, decimalType) · c4fd2a24
      Davies Liu authored
      This patch optimizes two things:
      
      1. Pass a MathContext to JavaBigDecimal.multiply/divide/remainder so that the result is rounded correctly, because java.math.BigDecimal.apply(MathContext) is expensive.
      
      2. Cast integer/short/byte to decimal directly (without going through double).
      
      These two optimizations speed up the end-to-end time of an aggregation (`SUM(short * decimal(5, 2))`) by 75% (from 19s to 10.8s).
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8052 from davies/optimize_decimal and squashes the following commits:
      
      225efad [Davies Liu] improve decimal.times() and cast(int, decimalType)
    • [SPARK-9620] [SQL] generated UnsafeProjection should support many columns or large expressions · fe2fb7fb
      Davies Liu authored
      Currently, generated UnsafeProjection can hit the JVM's 64KB per-method bytecode limit. This patch splits the generated expressions into multiple functions to avoid that limit.
      
      After this patch, we can handle tables with up to 64k columns (at which point Java's limit on the number of constants is hit), which should be enough in practice.
      
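The splitting idea can be sketched in Python: instead of emitting one large function, the generator emits several helpers of bounded size plus a small driver that calls them. Spark's generator emits Java source; `generate_projection` and `compile_projection` here are hypothetical and only illustrate the chunking.

```python
from typing import Callable, List


def generate_projection(statements: List[str], max_per_function: int) -> str:
    """Emit source that spreads `statements` over several small functions.

    Each statement reads `row` and writes `out`; the top-level `project`
    only calls the helpers, so no single function grows without bound.
    """
    helpers: List[str] = []
    calls: List[str] = []
    for start in range(0, len(statements), max_per_function):
        name = f"_apply_{start // max_per_function}"
        chunk = statements[start:start + max_per_function]
        body = "\n".join("    " + s for s in chunk)
        helpers.append(f"def {name}(row, out):\n{body}\n")
        calls.append(f"    {name}(row, out)")
    main = "def project(row, out):\n" + ("\n".join(calls) or "    pass") + "\n"
    return "\n".join(helpers + [main])


def compile_projection(statements: List[str], max_per_function: int) -> Callable:
    namespace: dict = {}
    exec(generate_projection(statements, max_per_function), namespace)
    return namespace["project"]
```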
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8044 from davies/wider_table and squashes the following commits:
      
      9192e6c [Davies Liu] fix generated safe projection
      d1ef81a [Davies Liu] fix failed tests
      737b3d3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
      ffcd132 [Davies Liu] address comments
      1b95be4 [Davies Liu] put the generated class into sql package
      77ed72d [Davies Liu] address comments
      4518e17 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
      75ccd01 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
      495e932 [Davies Liu] support wider table with more than 1k columns for generated projections
    • [SPARK-9763][SQL] Minimize exposure of internal SQL classes. · 40ed2af5
      Reynold Xin authored
      There are a few changes in this pull request:
      
      1. Moved all data sources to execution.datasources, except the public JDBC APIs.
      2. In order to maintain backward compatibility from 1, added a backward compatibility translation map in data source resolution.
      3. Moved ui and metric package into execution.
      4. Added more documentation on some internal classes.
      5. Renamed DataSourceRegister.format -> shortName.
      6. Added "override" modifier on shortName.
      7. Removed IntSQLMetric.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8056 from rxin/SPARK-9763 and squashes the following commits:
      
      9df4801 [Reynold Xin] Removed hardcoded name in test cases.
      d9babc6 [Reynold Xin] Shorten.
      e484419 [Reynold Xin] Removed VisibleForTesting.
      171b812 [Reynold Xin] MimaExcludes.
      2041389 [Reynold Xin] Compile ...
      79dda42 [Reynold Xin] Compile.
      0818ba3 [Reynold Xin] Removed IntSQLMetric.
      c46884f [Reynold Xin] Two more fixes.
      f9aa88d [Reynold Xin] [SPARK-9763][SQL] Minimize exposure of internal SQL classes.
    • [SPARK-9784] [SQL] Exchange.isUnsafe should check whether codegen and unsafe are enabled · 0fe66744
      Josh Rosen authored
      Exchange.isUnsafe should check whether codegen and unsafe are enabled.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8073 from JoshRosen/SPARK-9784 and squashes the following commits:
      
      7a1019f [Josh Rosen] [SPARK-9784] Exchange.isUnsafe should check whether codegen and unsafe are enabled
    • Fixed AtomicReference<> Example · d2852127
      Mahmoud Lababidi authored
      Author: Mahmoud Lababidi <lababidi@gmail.com>
      
      Closes #8076 from lababidi/master and squashes the following commits:
      
      af4553b [Mahmoud Lababidi] Fixed AtmoicReference<> Example