  1. Jul 22, 2015
    • [SPARK-9180] fix spark-shell to accept --name option · 430cd781
      Kenichi Maehashi authored
      This patch fixes [[SPARK-9180]](https://issues.apache.org/jira/browse/SPARK-9180).
      Users can now set the app name of spark-shell using `spark-shell --name "whatever"`.
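
The squashed commit e24991a mentions using `setIfMissing` instead of `setAppName`. A minimal Python sketch of why that distinction matters, assuming the `--name` value is already stored under `spark.app.name` before the shell applies its own default. The `Conf` class and the default name "Spark shell" are illustrative stand-ins, not Spark's actual `SparkConf`:

```python
class Conf:
    """Toy key/value configuration (hypothetical, not Spark's SparkConf)."""

    def __init__(self, initial=None):
        self._settings = dict(initial or {})

    def set(self, key, value):
        # Unconditional write: replaces whatever the user supplied.
        self._settings[key] = value
        return self

    def set_if_missing(self, key, value):
        # Write only when the key has not been set already.
        self._settings.setdefault(key, value)
        return self

    def get(self, key):
        return self._settings[key]


# Assumed scenario: the user passed --name "whatever" on the command line.
user_conf = {"spark.app.name": "whatever"}

overwritten = Conf(user_conf).set("spark.app.name", "Spark shell")
preserved = Conf(user_conf).set_if_missing("spark.app.name", "Spark shell")
defaulted = Conf().set_if_missing("spark.app.name", "Spark shell")
```

With an unconditional set the user's name is lost; with the set-if-missing variant it survives, and the default only applies when no name was given.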
      
      Author: Kenichi Maehashi <webmaster@kenichimaehashi.com>
      
      Closes #7512 from kmaehashi/fix-spark-shell-app-name and squashes the following commits:
      
      e24991a [Kenichi Maehashi] use setIfMissing instead of setAppName
      18aa4ad [Kenichi Maehashi] fix spark-shell to accept --name option
    • [SPARK-8975] [STREAMING] Adds a mechanism to send a new rate from the driver to the block generator · 798dff7b
      Iulian Dragos authored
      First step for [SPARK-7398](https://issues.apache.org/jira/browse/SPARK-7398).
      
      tdas huitseeker
      
      Author: Iulian Dragos <jaguarul@gmail.com>
      Author: François Garillot <francois@garillot.net>
      
      Closes #7471 from dragos/topic/streaming-bp/dynamic-rate and squashes the following commits:
      
      8941cf9 [Iulian Dragos] Renames and other nitpicks.
      162d9e5 [Iulian Dragos] Use Reflection for accessing truly private `executor` method and use the listener bus to know when receivers have registered (`onStart` is called before receivers have registered, leading to flaky behavior).
      210f495 [Iulian Dragos] Revert "Added a few tests that measure the receiver’s rate."
      0c51959 [Iulian Dragos] Added a few tests that measure the receiver’s rate.
      261a051 [Iulian Dragos] - removed field to hold the current rate limit in rate limiter - made rate limit a Long and default to Long.MaxValue (consequence of the above) - removed custom `waitUntil` and replaced it by `eventually`
      cd1397d [Iulian Dragos] Add a test for the propagation of a new rate limit from driver to receivers.
      6369b30 [Iulian Dragos] Merge pull request #15 from huitseeker/SPARK-8975
      d15de42 [François Garillot] [SPARK-8975][Streaming] Adds Ratelimiter unit tests w.r.t. spark.streaming.receiver.maxRate
      4721c7d [François Garillot] [SPARK-8975][Streaming] Add a mechanism to send a new rate from the driver to the block generator
    • [SPARK-9244] Increase some memory defaults · fe26584a
      Matei Zaharia authored
      There are a few memory limits that people hit often and that we could
      make higher, especially now that memory sizes have grown.
      
      - spark.akka.frameSize: This defaults to 10 but is often hit for map
        output statuses in large shuffles. This memory is not fully allocated
        up-front, so we can just make this larger and still not affect jobs
        that never sent a status that large. We increase it to 128.
      
      - spark.executor.memory: Defaults to 512m, which is really small. We
        increase it to 1g.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #7586 from mateiz/configs and squashes the following commits:
      
      ce0038a [Matei Zaharia] [SPARK-9244] Increase some memory defaults
    • [SPARK-8536] [MLLIB] Generalize OnlineLDAOptimizer to asymmetric document-topic Dirichlet priors · 1aca9c13
      Feynman Liang authored
      Modify `LDA` to take asymmetric document-topic prior distributions and `OnlineLDAOptimizer` to use the asymmetric prior during variational inference.
      
      This PR only generalizes `OnlineLDAOptimizer` and the associated `LocalLDAModel`; `EMLDAOptimizer` and `DistributedLDAModel` still only support symmetric `alpha` (checked during `EMLDAOptimizer.initialize`).
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #7575 from feynmanliang/SPARK-8536-LDA-asymmetric-priors and squashes the following commits:
      
      af8fbb7 [Feynman Liang] Fix merge errors
      ef5821d [Feynman Liang] Merge remote-tracking branch 'apache/master' into SPARK-8536-LDA-asymmetric-priors
      58f1d7b [Feynman Liang] Fix from review feedback
      a6dcf70 [Feynman Liang] Change docConcentration interface and move LDAOptimizer validation to initialize, add sad path tests
      72038ff [Feynman Liang] Add tests referenced against gensim
      d4284fa [Feynman Liang] Generalize OnlineLDA to asymmetric priors, no tests
    • [SPARK-4366] [SQL] [Follow-up] Fix SqlParser compiling warning. · cf21d05f
      Yin Huai authored
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7588 from yhuai/SPARK-4366-update1 and squashes the following commits:
      
      25f5f36 [Yin Huai] Fix SqlParser Warning.
    • [SPARK-9224] [MLLIB] OnlineLDA Performance Improvements · 8486cd85
      Feynman Liang authored
      In-place updates, reduce number of transposes, and vectorize operations in OnlineLDA implementation.
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #7454 from feynmanliang/OnlineLDA-perf-improvements and squashes the following commits:
      
      78b0f5a [Feynman Liang] Make in-place variables vals, fix BLAS error
      7f62a55 [Feynman Liang] --amend
      c62cb1e [Feynman Liang] Outer product for stats, revert Range slicing
      aead650 [Feynman Liang] Range slice, in-place update, reduce transposes
    • [SPARK-9024] Unsafe HashJoin/HashOuterJoin/HashSemiJoin · e0b7ba59
      Davies Liu authored
      This PR introduces unsafe versions (using UnsafeRow) of HashJoin, HashOuterJoin and HashSemiJoin, including the broadcast and shuffle variants (except FullOuterJoin, which is better implemented using SortMergeJoin).
      
      It uses HashMap to store UnsafeRow for now; this will change to BytesToBytesMap for better performance (in another PR).
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7480 from davies/unsafe_join and squashes the following commits:
      
      6294b1e [Davies Liu] fix projection
      10583f1 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_join
      dede020 [Davies Liu] fix test
      84c9807 [Davies Liu] address comments
      a05b4f6 [Davies Liu] support UnsafeRow in LeftSemiJoinBNL and BroadcastNestedLoopJoin
      611d2ed [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_join
      9481ae8 [Davies Liu] return UnsafeRow after join()
      ca2b40f [Davies Liu] revert unrelated change
      68f5cd9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_join
      0f4380d [Davies Liu] ada a comment
      69e38f5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_join
      1a40f02 [Davies Liu] refactor
      ab1690f [Davies Liu] address comments
      60371f2 [Davies Liu] use UnsafeRow in SemiJoin
      a6c0b7d [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_join
      184b852 [Davies Liu] fix style
      6acbb11 [Davies Liu] fix tests
      95d0762 [Davies Liu] remove println
      bea4a50 [Davies Liu] Unsafe HashJoin
    • [SPARK-9165] [SQL] codegen for CreateArray, CreateStruct and CreateNamedStruct · 86f80e2b
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9165
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7537 from yjshen/array_struct_codegen and squashes the following commits:
      
      3a6dce6 [Yijie Shen] use infix notion in createArray test
      5e90f0a [Yijie Shen] resolve comments: classOf
      39cefb8 [Yijie Shen] codegen for createArray createStruct & createNamedStruct
    • [SPARK-9082] [SQL] Filter using non-deterministic expressions should not be pushed down · 76520955
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7446 from cloud-fan/filter and squashes the following commits:
      
      330021e [Wenchen Fan] add exists to tree node
      2cab68c [Wenchen Fan] more enhance
      949be07 [Wenchen Fan] push down part of predicate if possible
      3912f84 [Wenchen Fan] address comments
      8ce15ca [Wenchen Fan] fix bug
      557158e [Wenchen Fan] Filter using non-deterministic expressions should not be pushed down
    • [SPARK-9254] [BUILD] [HOTFIX] sbt-launch-lib.bash should support HTTP/HTTPS redirection · b55a36bc
      Cheng Lian authored
      Target file(s) can be hosted on CDN nodes. HTTP/HTTPS redirection must be supported to download these files.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7597 from liancheng/spark-9254 and squashes the following commits:
      
      fd266ca [Cheng Lian] Uses `--fail' to make curl return non-zero value and remove garbage output when the download fails
      a7cbfb3 [Cheng Lian] Supports HTTP/HTTPS redirection
    • [SPARK-4233] [SPARK-4367] [SPARK-3947] [SPARK-3056] [SQL] Aggregation Improvement · c03299a1
      Yin Huai authored
      This is the first PR for the aggregation improvement, which is tracked by https://issues.apache.org/jira/browse/SPARK-4366 (umbrella JIRA). This PR contains work for its subtasks, SPARK-3056, SPARK-3947, SPARK-4233, and SPARK-4367.
      
      This PR introduces a new code path for evaluating aggregate functions. This code path is guarded by `spark.sql.useAggregate2` and by default the value of this flag is true.
      
      This new code path contains:
      * A new aggregate function interface (`AggregateFunction2`) and 7 built-in aggregate functions based on this new interface (`AVG`, `COUNT`, `FIRST`, `LAST`, `MAX`, `MIN`, `SUM`)
      * A UDAF interface (`UserDefinedAggregateFunction`) based on the new code path and two example UDAFs (`MyDoubleAvg` and `MyDoubleSum`).
      * A sort-based aggregate operator (`Aggregate2Sort`) for the new aggregate function interface.
      * A sort-based aggregate operator (`FinalAndCompleteAggregate2Sort`) for distinct aggregations (for distinct aggregations the query plan will use `Aggregate2Sort` and `FinalAndCompleteAggregate2Sort` together).
      
      With this change, when `spark.sql.useAggregate2` is `true`, the flow of compiling an aggregation query is:
      1. Our analyzer looks up functions and returns aggregate functions built based on the old aggregate function interface.
      2. When our planner is compiling the physical plan, it tries to convert all aggregate functions to the ones built based on the new interface. The planner will fall back to the old code path if any of the following conditions is true:
      * code-gen is disabled.
      * there is any function that cannot be converted (right now, Hive UDAFs).
      * the schema of grouping expressions contains any complex data type.
      * There are multiple distinct columns.
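
The fallback rule listed above can be sketched as a single predicate. This is a rough Python model under stated assumptions: `AggregationQuery` and its field names are hypothetical summaries of what the planner inspects, not classes from the Spark code base.

```python
from dataclasses import dataclass


@dataclass
class AggregationQuery:
    """Hypothetical summary of the properties the planner checks."""
    codegen_enabled: bool
    has_unconvertible_function: bool  # e.g. a Hive UDAF
    grouping_has_complex_type: bool
    distinct_column_count: int


def use_new_aggregation_path(query, use_aggregate2=True):
    """Return True when the flag is on and no fallback condition applies."""
    if not use_aggregate2:
        return False
    falls_back = (
        not query.codegen_enabled
        or query.has_unconvertible_function
        or query.grouping_has_complex_type
        or query.distinct_column_count > 1
    )
    return not falls_back


simple = AggregationQuery(True, False, False, 1)
hive_udaf = AggregationQuery(True, True, False, 0)
two_distinct = AggregationQuery(True, False, False, 2)
```

Here `simple` would take the new code path, while `hive_udaf` and `two_distinct` would fall back to the old one.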
      
      Right now, the new code path handles a single distinct column in the query (you can have multiple aggregate functions using that distinct column). For a query having an aggregate function with DISTINCT and regular aggregate functions, the generated plan will do partial aggregations for those regular aggregate functions.
      
      Thanks chenghao-intel for his initial work on it.
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #7458 from yhuai/UDAF and squashes the following commits:
      
      7865f5e [Yin Huai] Put the catalyst expression in the comment of the generated code for it.
      b04d6c8 [Yin Huai] Remove unnecessary change.
      f1d5901 [Yin Huai] Merge remote-tracking branch 'upstream/master' into UDAF
      35b0520 [Yin Huai] Use semanticEquals to replace grouping expressions in the output of the aggregate operator.
      3b43b24 [Yin Huai] bug fix.
      00eb298 [Yin Huai] Make it compile.
      a3ca551 [Yin Huai] Merge remote-tracking branch 'upstream/master' into UDAF
      e0afca3 [Yin Huai] Gracefully fallback to old aggregation code path.
      8a8ac4a [Yin Huai] Merge remote-tracking branch 'upstream/master' into UDAF
      88c7d4d [Yin Huai] Enable spark.sql.useAggregate2 by default for testing purpose.
      dc96fd1 [Yin Huai] Many updates:
      85c9c4b [Yin Huai] newline.
      43de3de [Yin Huai] Merge remote-tracking branch 'upstream/master' into UDAF
      c3614d7 [Yin Huai] Handle single distinct column.
      68b8ee9 [Yin Huai] Support single distinct column set. WIP
      3013579 [Yin Huai] Format.
      d678aee [Yin Huai] Remove AggregateExpressionSuite.scala since our built-in aggregate functions will be based on AlgebraicAggregate and we need to have another way to test it.
      e243ca6 [Yin Huai] Add aggregation iterators.
      a101960 [Yin Huai] Change MyJavaUDAF to MyDoubleSum.
      594cdf5 [Yin Huai] Change existing AggregateExpression to AggregateExpression1 and add an AggregateExpression as the common interface for both AggregateExpression1 and AggregateExpression2.
      380880f [Yin Huai] Merge remote-tracking branch 'upstream/master' into UDAF
      0a827b3 [Yin Huai] Add comments and doc. Move some classes to the right places.
      a19fea6 [Yin Huai] Add UDAF interface.
      262d4c4 [Yin Huai] Make it compile.
      b2e358e [Yin Huai] Merge remote-tracking branch 'upstream/master' into UDAF
      6edb5ac [Yin Huai] Format update.
      70b169c [Yin Huai] Remove groupOrdering.
      4721936 [Yin Huai] Add CheckAggregateFunction to extendedCheckRules.
      d821a34 [Yin Huai] Cleanup.
      32aea9c [Yin Huai] Merge remote-tracking branch 'upstream/master' into UDAF
      5b46d41 [Yin Huai] Bug fix.
      aff9534 [Yin Huai] Make Aggregate2Sort work with both algebraic AggregateFunctions and non-algebraic AggregateFunctions.
      2857b55 [Yin Huai] Merge remote-tracking branch 'upstream/master' into UDAF
      4435f20 [Yin Huai] Add ConvertAggregateFunction to HiveContext's analyzer.
      1b490ed [Michael Armbrust] make hive test
      8cfa6a9 [Michael Armbrust] add test
      1b0bb3f [Yin Huai] Do not bind references in AlgebraicAggregate and use code gen for all places.
      072209f [Yin Huai] Bug fix: Handle expressions in grouping columns that are not attribute references.
      f7d9e54 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into UDAF
      39ee975 [Yin Huai] Code cleanup: Remove unnecesary AttributeReferences.
      b7720ba [Yin Huai] Add an analysis rule to convert aggregate function to the new version.
      5c00f3f [Michael Armbrust] First draft of codegen
      6bbc6ba [Michael Armbrust] now with correct answers\!
      f7996d0 [Michael Armbrust] Add AlgebraicAggregate
      dded1c5 [Yin Huai] wip
    • [SPARK-9232] [SQL] Duplicate code in JSONRelation · f4785f5b
      Andrew Or authored
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7576 from andrewor14/clean-up-json-relation and squashes the following commits:
      
      ea80803 [Andrew Or] Clean up duplicate code
    • [SPARK-9121] [SPARKR] Get rid of the warnings about `no visible global... · 63f4bcc7
      Yu ISHIKAWA authored
      [SPARK-9121] [SPARKR] Get rid of the warnings about `no visible global function definition` in SparkR
      
      [[SPARK-9121] Get rid of the warnings about `no visible global function definition` in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9121)
      
      ## The Result of `dev/lint-r`
      [The result of lint-r for SPARK-9121 at the revision:1ddd0f2f when I have sent a PR](https://gist.github.com/yu-iskw/6f55953425901725edf6)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7567 from yu-iskw/SPARK-9121 and squashes the following commits:
      
      c8cfd63 [Yu ISHIKAWA] Fix the typo
      b1f19ed [Yu ISHIKAWA] Add a validate statement for local SparkR
      1a03987 [Yu ISHIKAWA] Load the `testthat` package in `dev/lint-r.R`, instead of using the full path of function.
      3a5e0ab [Yu ISHIKAWA] [SPARK-9121][SparkR] Get rid of the warnings about `no visible global function definition` in SparkR
  2. Jul 21, 2015
    • [SPARK-9154][SQL] Rename formatString to format_string. · a4c83cb1
      Reynold Xin authored
      Also make format_string the canonical form, rather than printf.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7579 from rxin/format_strings and squashes the following commits:
      
      53ee54f [Reynold Xin] Fixed unit tests.
      52357e1 [Reynold Xin] Add format_string alias.
      b40a42a [Reynold Xin] [SPARK-9154][SQL] Rename formatString to format_string.
    • [SPARK-9154] [SQL] codegen StringFormat · d4c7a7a3
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-9154
      
      fixes bug of #7546
      
      marmbrus I can't reopen the other PR, because I didn't close it. Can you trigger Jenkins?
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7571 from tarekauel/SPARK-9154 and squashes the following commits:
      
      dcae272 [Tarek Auel] [SPARK-9154][SQL] build fix
      1487602 [Tarek Auel] Merge remote-tracking branch 'upstream/master' into SPARK-9154
      f512c5f [Tarek Auel] [SPARK-9154][SQL] build fix
      a943d3e [Tarek Auel] [SPARK-9154] implicit input cast, added tests for null, support for null primitives
      10b4de8 [Tarek Auel] [SPARK-9154][SQL] codegen removed fallback trait
      cd8322b [Tarek Auel] [SPARK-9154][SQL] codegen string format
      086caba [Tarek Auel] [SPARK-9154][SQL] codegen string format
    • [SPARK-9206] [SQL] Fix HiveContext classloading for GCS connector. · c07838b5
      Dennis Huo authored
      IsolatedClientLoader.isSharedClass includes all of com.google.\*, presumably
      for Guava, protobuf, and/or other shared Google libraries, but needs to
      count com.google.cloud.\* as "hive classes" when determining which ClassLoader
      to use. Otherwise, things like HiveContext.parquetFile will throw a
      ClassCastException when fs.defaultFS is set to a Google Cloud Storage (gs://)
      path. On StackOverflow: http://stackoverflow.com/questions/31478955
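
The classification described above can be pictured with a short Python sketch. It models only the `com.google.*` rule from this message; the function name mirrors `isSharedClass` for readability, but the body is an illustrative simplification and not the actual Scala implementation, which checks other prefixes as well.

```python
def is_shared_class(name):
    """Simplified rule: Google libraries are shared, except the cloud connector."""
    if name.startswith("com.google.cloud."):
        # Assumed reading of the fix: counted as a "hive class", not shared.
        return False
    return name.startswith("com.google.")


guava = is_shared_class("com.google.common.collect.ImmutableList")
gcs = is_shared_class("com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
```

Checking the more specific prefix first keeps the general `com.google.` rule intact for Guava and protobuf.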
      
      EDIT: Adding yhuai who worked on the relevant classloading isolation pieces.
      
      Author: Dennis Huo <dhuo@google.com>
      
      Closes #7549 from dennishuo/dhuo-fix-hivecontext-gcs and squashes the following commits:
      
      1f8db07 [Dennis Huo] Fix HiveContext classloading for GCS connector.
    • [SPARK-8906][SQL] Move all internal data source classes into execution.datasources. · 60c0ce13
      Reynold Xin authored
      This way, the sources package contains only public facing interfaces.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7565 from rxin/move-ds and squashes the following commits:
      
      7661aff [Reynold Xin] Mima
      9d5196a [Reynold Xin] Rearranged imports.
      3dd7174 [Reynold Xin] [SPARK-8906][SQL] Move all internal data source classes into execution.datasources.
    • [SPARK-8357] Fix unsafe memory leak on empty inputs in GeneratedAggregate · 9ba7c64d
      navis.ryu authored
      This patch fixes a managed memory leak in GeneratedAggregate.  The leak occurs when the unsafe aggregation path is used to perform grouped aggregation on an empty input; in this case, GeneratedAggregate allocates an UnsafeFixedWidthAggregationMap that is never cleaned up because `next()` is never called on the aggregate result iterator.
      
      This patch fixes this by short-circuiting on empty inputs.
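
A minimal Python sketch of the short-circuit idea, assuming a resource that must be freed explicitly. `TrackedMap` and `aggregate` are hypothetical names used for illustration; they do not correspond to `UnsafeFixedWidthAggregationMap` or to the real operator code.

```python
class TrackedMap:
    """Hypothetical aggregation buffer that must be freed explicitly."""
    allocated = 0  # total number of maps ever created
    live = 0       # maps created but not yet freed

    def __init__(self):
        TrackedMap.allocated += 1
        TrackedMap.live += 1
        self.groups = {}

    def free(self):
        TrackedMap.live -= 1


_EMPTY = object()  # sentinel marking an exhausted input


def aggregate(rows):
    """Sum values per key, allocating the buffer only when input exists."""
    rows = iter(rows)
    first = next(rows, _EMPTY)
    if first is _EMPTY:
        return []  # short-circuit: nothing allocated, so nothing can leak
    agg = TrackedMap()
    try:
        key, value = first
        agg.groups[key] = value
        for key, value in rows:
            agg.groups[key] = agg.groups.get(key, 0) + value
        return sorted(agg.groups.items())
    finally:
        agg.free()
```

The empty-input branch returns before any buffer exists, which is the property the patch description relies on.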
      
      This patch is an updated version of #6810.
      
      Closes #6810.
      
      Author: navis.ryu <navis@apache.org>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7560 from JoshRosen/SPARK-8357 and squashes the following commits:
      
      3486ce4 [Josh Rosen] Some minor cleanup
      c649310 [Josh Rosen] Revert SparkPlan change:
      3c7db0f [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-8357
      adc8239 [Josh Rosen] Back out Projection changes.
      c5419b3 [navis.ryu] addressed comments
      143e1ef [navis.ryu] fixed format & added test for CCE case
      735972f [navis.ryu] used new conf apis
      1a02a55 [navis.ryu] Rolled-back test-conf cleanup & fixed possible CCE & added more tests
      51178e8 [navis.ryu] addressed comments
      4d326b9 [navis.ryu] fixed test fails
      15c5afc [navis.ryu] added a test as suggested by JoshRosen
      d396589 [navis.ryu] added comments
      1b07556 [navis.ryu] [SPARK-8357] [SQL] Memory leakage on unsafe aggregation path with empty input
    • Revert "[SPARK-9154] [SQL] codegen StringFormat" · 87d890cc
      Michael Armbrust authored
      This reverts commit 7f072c3d.
      
      Revert #7546
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #7570 from marmbrus/revert9154 and squashes the following commits:
      
      ed2c32a [Michael Armbrust] Revert "[SPARK-9154] [SQL] codegen StringFormat"
    • [SPARK-5989] [MLLIB] Model save/load for LDA · 89db3c0b
      MechCoder authored
      Add support for saving and loading LDA models, both the local and distributed versions.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6948 from MechCoder/lda_save_load and squashes the following commits:
      
      49bcdce [MechCoder] minor style fixes
      cc14054 [MechCoder] minor
      4587d1d [MechCoder] Minor changes
      c753122 [MechCoder] Load and save the model in private methods
      2782326 [MechCoder] [SPARK-5989] Model save/load for LDA
    • [SPARK-9154] [SQL] codegen StringFormat · 7f072c3d
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-9154
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7546 from tarekauel/SPARK-9154 and squashes the following commits:
      
      a943d3e [Tarek Auel] [SPARK-9154] implicit input cast, added tests for null, support for null primitives
      10b4de8 [Tarek Auel] [SPARK-9154][SQL] codegen removed fallback trait
      cd8322b [Tarek Auel] [SPARK-9154][SQL] codegen string format
      086caba [Tarek Auel] [SPARK-9154][SQL] codegen string format
    • [SPARK-5423] [CORE] Register a TaskCompletionListener to make sure release all resources · d45355ee
      zsxwing authored
      Make `DiskMapIterator.cleanup` idempotent and register a TaskCompletionListener to make sure `cleanup` is called.
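
The pattern can be sketched in Python as follows. Both classes are hypothetical stand-ins: `TaskContext` here is a toy callback registry and `SpillIterator` plays the role of the disk-backed iterator; neither reproduces Spark's API.

```python
class TaskContext:
    """Toy task context that runs callbacks when the task ends."""

    def __init__(self):
        self._listeners = []

    def add_completion_listener(self, callback):
        self._listeners.append(callback)

    def mark_completed(self):
        for callback in self._listeners:
            callback()


class SpillIterator:
    """Iterator that releases its resource exactly once."""

    def __init__(self, items, context):
        self._items = iter(items)
        self._closed = False
        self.cleanup_calls = 0  # how many times the resource was really released
        # Guarantee cleanup even if the consumer stops iterating early.
        context.add_completion_listener(self.cleanup)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._items)
        except StopIteration:
            self.cleanup()  # normal path: input exhausted
            raise

    def cleanup(self):
        if self._closed:  # idempotent: later calls are no-ops
            return
        self._closed = True
        self.cleanup_calls += 1


ctx = TaskContext()
partial = SpillIterator([1, 2, 3], ctx)
first_item = next(partial)  # consumer reads one item, then abandons the iterator
ctx.mark_completed()        # the listener still releases the resource
partial.cleanup()           # an extra call is harmless
```

Because `cleanup` is idempotent, it is safe for both the exhaustion path and the completion listener to call it.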
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7529 from zsxwing/SPARK-5423 and squashes the following commits:
      
      3e3c413 [zsxwing] Remove TODO
      9556c78 [zsxwing] Fix NullPointerException for tests
      3d574d9 [zsxwing] Register a TaskCompletionListener to make sure release all resources
    • [SPARK-4598] [WEBUI] Task table pagination for the Stage page · 4f7f1ee3
      zsxwing authored
      This PR adds pagination for the task table to solve the scalability issue of the stage page. Here is the initial screenshot:
      <img width="1347" alt="pagination" src="https://cloud.githubusercontent.com/assets/1000778/8679669/9e63863c-2a8e-11e5-94e4-994febcd6717.png">
      The task table shows only 100 tasks at a time. There is a page navigation above the table. Users can click the page navigation or type a page number to jump to another page. The table can be sorted by clicking the headers. However, unlike the previous implementation, sorting is now done on the server, so clicking a column header to sort reloads the web page.
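
Server-side sorting followed by slicing out one page can be sketched in a few lines of Python. The function name, the dictionary-based task rows, and the `task_id`/`duration` keys are assumptions made for illustration; the actual UI code is not shown in this commit message.

```python
def task_page(tasks, page, page_size=100, sort_key="task_id", descending=False):
    """Sort on the server, then return only the requested page (1-based)."""
    if page_size <= 0:
        raise ValueError("page size must be positive")
    total_pages = max(1, -(-len(tasks) // page_size))  # ceiling division
    if not 1 <= page <= total_pages:
        raise ValueError(f"page {page} out of range 1..{total_pages}")
    ordered = sorted(tasks, key=lambda task: task[sort_key], reverse=descending)
    start = (page - 1) * page_size
    return ordered[start:start + page_size], total_pages


# 250 hypothetical tasks; durations run in the opposite order to the ids.
tasks = [{"task_id": i, "duration": 250 - i} for i in range(250)]
last_rows, pages = task_page(tasks, page=3)
fastest_rows, _ = task_page(tasks, page=1, sort_key="duration")
```

Since the whole task list is sorted before slicing, each sort request needs a new round trip to the server, which matches the page reload described above.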
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7399 from zsxwing/task-table-pagination and squashes the following commits:
      
      144f513 [zsxwing] Display the page navigation when the page number is out of range
      a3eee22 [zsxwing] Add extra space for the error message
      54c5b84 [zsxwing] Reset page to 1 if the user changes the page size
      c2f7f39 [zsxwing] Add a text field to let users fill the page size
      bad52eb [zsxwing] Display user-friendly error messages
      410586b [zsxwing] Scroll down to the tasks table if the url contains any sort column
      a0746d1 [zsxwing] Use expand-dag-viz-arrow-job and expand-dag-viz-arrow-stage instead of expand-dag-viz-arrow-true and expand-dag-viz-arrow-false
      b123f67 [zsxwing] Use localStorage to remember the user's actions and replay them when loading the page
      894a342 [zsxwing] Show the link cursor when hovering for headers and page links and other minor fix
      4d4fecf [zsxwing] Address Carson's comments
      d9285f0 [zsxwing] Add comments and fix the style
      74285fa [zsxwing] Merge branch 'master' into task-table-pagination
      db6c859 [zsxwing] Task table pagination for the Stage page
    • [SPARK-7171] Added a method to retrieve metrics sources in TaskContext · 31954910
      Jacek Lewandowski authored
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #5805 from jacek-lewandowski/SPARK-7171 and squashes the following commits:
      
      ed20bda [Jacek Lewandowski] SPARK-7171: Added a method to retrieve metrics sources in TaskContext
    • [SPARK-9128] [CORE] Get outerclasses and objects with only one method calling in ClosureCleaner · 9a4fd875
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9128
      
      Currently, in `ClosureCleaner`, the outer classes and objects are retrieved using two different methods. However, the logic of the two methods is the same, so both the outer classes and the objects can be obtained with a single method call.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7459 from viirya/remove_extra_closurecleaner and squashes the following commits:
      
      7c9858d [Liang-Chi Hsieh] For comments.
      a096941 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into remove_extra_closurecleaner
      2ec5ce1 [Liang-Chi Hsieh] Remove unnecessary methods.
      4df5a51 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into remove_extra_closurecleaner
      dc110d1 [Liang-Chi Hsieh] Add method to get outerclasses and objects at the same time.
    • [SPARK-9036] [CORE] SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol · f67da43c
      Ben authored
      This PR implements a JSON serializer and deserializer in JsonProtocol to handle the (de)serialization of SparkListenerExecutorMetricsUpdate events. It also includes a unit test in JsonProtocolSuite. This was implemented to satisfy the improvement request in JIRA issue SPARK-9036.
      
      Author: Ben <benjaminpiering@gmail.com>
      
      Closes #7555 from NamelessAnalyst/master and squashes the following commits:
      
      fb4e3cc [Ben] Update JSON Protocol and tests
      aa69517 [Ben] Update JSON Protocol and tests --Corrected Stage Attempt to Stage Attempt ID
      33e5774 [Ben] Update JSON Protocol Tests
      3f237e7 [Ben] Update JSON Protocol Tests
      84ca798 [Ben] Update JSON Protocol Tests
      cde57a0 [Ben] Update JSON Protocol Tests
      8049600 [Ben] Update JSON Protocol Tests
      c5bc061 [Ben] Update JSON Protocol Tests
      6f25785 [Ben] Merge remote-tracking branch 'origin/master'
      df2a609 [Ben] Update JSON Protocol
      dcda80b [Ben] Update JSON Protocol
    • [SPARK-9193] Avoid assigning tasks to "lost" executor(s) · 6592a605
      Grace authored
      Currently, when executors are killed by dynamic allocation, tasks are sometimes mis-assigned to lost executors. Such mis-assignment causes task failures, or even job failure if the error repeats 4 times.
      
      The root cause is that ***killExecutors*** does not remove the executors being killed right away. It relies on the ***OnDisassociated*** event to refresh the active working list later. The delay depends on cluster status (from several milliseconds to under a minute). New tasks scheduled during that window are assigned to executors that are still "active" but being killed, and those tasks then fail with "executor lost". A better approach is to exclude the executors being killed in makeOffers(), so that no tasks are allocated to executors that are about to be lost.
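
The proposed filtering step can be illustrated with a small Python sketch. The data shapes (a dict of free cores per executor and a set of executor ids pending kill) are assumptions for illustration, not the scheduler backend's real structures.

```python
def make_offers(executors, pending_kill):
    """Build (executor_id, free_cores) offers, skipping executors being killed."""
    return [
        (executor_id, free_cores)
        for executor_id, free_cores in executors.items()
        if executor_id not in pending_kill
    ]


# Hypothetical cluster state: exec-2 has been asked to die but is still registered.
executors = {"exec-1": 4, "exec-2": 8, "exec-3": 2}
pending_kill = {"exec-2"}
offers = make_offers(executors, pending_kill)
```

Filtering at offer time closes the window between the kill request and the later disassociation event.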
      
      Author: Grace <jie.huang@intel.com>
      
      Closes #7528 from GraceH/AssignToLostExecutor and squashes the following commits:
      
      ecc1da6 [Grace] scala style fix
      6e2ed96 [Grace] Re-word makeOffers by more readable lines
      b5546ce [Grace] Add comments about the fix
      30a9ad0 [Grace] Avoid assigning tasks to lost executors
    • [SPARK-8915] [DOCUMENTATION, MLLIB] Added @since tags to mllib.classification · df4ddb31
      petz2000 authored
      Created since tags for methods in mllib.classification
      
      Author: petz2000 <petz2000@gmail.com>
      
      Closes #7371 from petz2000/add_since_mllib.classification and squashes the following commits:
      
      39fe291 [petz2000] Removed whitespace in block comment
      c9b1e03 [petz2000] Removed @since tags again from protected and private methods
      cd759b6 [petz2000] Added @since tags to methods
    • [SPARK-9081] [SPARK-9168] [SQL] nanvl & dropna/fillna supporting nan as well · be5c5d37
      Yijie Shen authored
      JIRA:
      https://issues.apache.org/jira/browse/SPARK-9081
      https://issues.apache.org/jira/browse/SPARK-9168
      
      This PR makes three modifications:
      1.  Change `isNaN` to return `false` on `null` input
      2.  Make `dropna` and `fillna` drop/fill NaN values as well
      3.  Implement `nanvl`
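
The intended semantics of the three changes can be modelled in plain Python, with `None` standing in for SQL `null`. This is a simplified sketch: the helper names are illustrative, rows are tuples, and the real `fillna` has type-specific behavior that is not modelled here.

```python
import math


def is_nan(value):
    """NaN check that returns False (rather than null) for null input."""
    return isinstance(value, float) and math.isnan(value)


def nanvl(left, right):
    """Return `left` unless it is NaN, in which case return `right`."""
    return right if is_nan(left) else left


def drop_na(rows):
    """Drop rows that contain a null or a NaN."""
    return [row for row in rows if not any(v is None or is_nan(v) for v in row)]


def fill_na(rows, replacement):
    """Replace nulls and NaNs with `replacement`."""
    return [
        tuple(replacement if v is None or is_nan(v) else v for v in row)
        for row in rows
    ]


rows = [(1.0, 2.0), (float("nan"), 3.0), (None, 4.0)]
kept = drop_na(rows)
filled = fill_na(rows, 0.0)
```

In this model both the NaN row and the null row are dropped by `drop_na` and repaired by `fill_na`.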
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7523 from yjshen/fillna_dropna and squashes the following commits:
      
      f0a51db [Yijie Shen] make coalesce untouched and implement nanvl
      1d3e35f [Yijie Shen] make Coalesce aware of NaN in order to support fillna
      2760cbc [Yijie Shen] change isNaN(null) to false as well as implement dropna
    • [SPARK-8401] [BUILD] Scala version switching build enhancements · f5b6dc5e
      Michael Allman authored
      These commits address a few minor issues in the Scala cross-version support in the build:
      
        1. Correct two missing `${scala.binary.version}` pom file substitutions.
        2. Don't update `scala.binary.version` in parent POM. This property is set through profiles.
        3. Update the source of the generated scaladocs in `docs/_plugins/copy_api_dirs.rb`.
        4. Factor common code out of `dev/change-version-to-*.sh` and add some validation. We also test `sed` to see if it's GNU sed and try `gsed` as an alternative if not. This prevents the script from running with a non-GNU sed.
      
      This is my original work and I license this work to the Spark project under the Apache License.
      
      Author: Michael Allman <michael@videoamp.com>
      
      Closes #6832 from mallman/scala-versions and squashes the following commits:
      
      cde2f17 [Michael Allman] Delete dev/change-version-to-*.sh, replacing them with single dev/change-scala-version.sh script that takes a version as argument
      02296f2 [Michael Allman] Make the scala version change scripts cross-platform by restricting ourselves to POSIX sed syntax instead of looking for GNU sed
      ad9b40a [Michael Allman] Factor change-scala-version.sh out of change-version-to-*.sh, adding command line argument validation and testing for GNU sed
      bdd20bf [Michael Allman] Update source of scaladocs when changing Scala version
      475088e [Michael Allman] Replace jackson-module-scala_2.10 with jackson-module-scala_${scala.binary.version}
      f5b6dc5e
    • Kay Ousterhout's avatar
      [SPARK-8875] Remove BlockStoreShuffleFetcher class · 6364735b
      Kay Ousterhout authored
      The shuffle code has gotten increasingly difficult to read as it has evolved, and many classes
      have evolved significantly since they were originally created. The BlockStoreShuffleFetcher class
      now serves little purpose other than to make the code more difficult to read; this commit moves its
      functionality into the ShuffleBlockFetcherIterator class.
      
      cc massie JoshRosen (Josh, this PR also removes the Try you pointed out as being confusing / not necessarily useful in a previous comment).  Matt, would be helpful to know whether this will interfere in any negative ways with your new shuffle PR (I took a look and it seems like this should still cleanly integrate with your parquet work, but want to double check).
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #7268 from kayousterhout/SPARK-8875 and squashes the following commits:
      
      2b24a97 [Kay Ousterhout] Fixed DAGSchedulerSuite compile error
      98a1831 [Kay Ousterhout] Merge remote-tracking branch 'upstream/master' into SPARK-8875
      90f0e89 [Kay Ousterhout] Fixed broken test
      14bfcbb [Kay Ousterhout] Last style fix
      bc69d2b [Kay Ousterhout] Style improvements based on Josh's code review
      ad3c8d1 [Kay Ousterhout] Better documentation for MapOutputTracker methods
      0bc0e59 [Kay Ousterhout] [SPARK-8875] Remove BlockStoreShuffleFetcher class
      6364735b
    • Yijie Shen's avatar
      [SPARK-9173][SQL]UnionPushDown should also support Intersect and Except · ae230596
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9173
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7540 from yjshen/union_pushdown and squashes the following commits:
      
      278510a [Yijie Shen] rename UnionPushDown to SetOperationPushDown
      91741c1 [Yijie Shen] Add UnionPushDown support for intersect and except
      ae230596
    • Pedro Rodriguez's avatar
      [SPARK-8230][SQL] Add array/map size method · 560c658a
      Pedro Rodriguez authored
      Pull Request for: https://issues.apache.org/jira/browse/SPARK-8230
      
      The primary issue resolved is implementing array/map size for Spark SQL. The code is ready for review by a committer. Chen Hao is on the JIRA ticket, but I don't know his username on GitHub; rxin is also on the JIRA ticket.
      
      Things to review:
      1. Where to put the added functions namespace-wise: they seem to belong to a group of operations on collections that includes `sort_array` and `array_contains`, hence the names `collectionOperations.scala` and `_collection_functions` in Python.
      2. In the Python code, should it be in a `1.5.0` function array or in a collections array?
      3. Are there any missing methods on the `Size` case class? It looks like many of these functions have generated Java code; is that also needed in this case?
      4. Something else?
      
      Author: Pedro Rodriguez <ski.rodriguez@gmail.com>
      Author: Pedro Rodriguez <prodriguez@trulia.com>
      
      Closes #7462 from EntilZha/SPARK-8230 and squashes the following commits:
      
      9a442ae [Pedro Rodriguez] fixed functions and sorted __all__
      9aea3bb [Pedro Rodriguez] removed imports from python docs
      15d4bf1 [Pedro Rodriguez] Added null test case and changed to nullSafeCodeGen
      d88247c [Pedro Rodriguez] removed python code
      bd5f0e4 [Pedro Rodriguez] removed duplicate function from rebase/merge
      59931b4 [Pedro Rodriguez] fixed compile bug introduced when merging
      c187175 [Pedro Rodriguez] updated code to add size to __all__ directly and removed redundant pretty print
      130839f [Pedro Rodriguez] fixed failing test
      aa9bade [Pedro Rodriguez] fix style
      e093473 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
      0449377 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
      9a1a2ff [Pedro Rodriguez] added unit tests for map size
      2bfbcb6 [Pedro Rodriguez] added unit test for size
      20df2b4 [Pedro Rodriguez] Finished working version of size function and added it to python
      b503e75 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
      99a6a5c [Pedro Rodriguez] fixed failing test
      cac75ac [Pedro Rodriguez] fix style
      933d843 [Pedro Rodriguez] updated python code with docs, switched classes/traits implemented, added (failing) expression tests
      42bb7d4 [Pedro Rodriguez] refactored code to use better abstract classes/traits and implementations
      f9c3b8a [Pedro Rodriguez] added unit tests for map size
      2515d9f [Pedro Rodriguez] added documentation
      0e60541 [Pedro Rodriguez] added unit test for size
      acf9853 [Pedro Rodriguez] Finished working version of size function and added it to python
      84a5d38 [Pedro Rodriguez] First attempt at implementing size for maps and arrays
      560c658a
    • Cheng Hao's avatar
      [SPARK-8255] [SPARK-8256] [SQL] Add regex_extract/regex_replace · 8c8f0ef5
      Cheng Hao authored
      Add the expressions `regexp_extract` and `regexp_replace`.
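
      As a rough illustration of what the two expressions compute (named `regexp_extract` and `regexp_replace` in the squashed commit below), here is a plain-Python approximation built on the standard `re` module. It is a sketch under assumptions, not the Spark implementation: Spark uses Java regular expressions, whose syntax and replacement-string conventions differ from Python's in places, and returning an empty string when nothing matches is an assumption about the behaviour rather than something stated in this commit message.

```python
import re


def regexp_extract(text, pattern, group_index):
    """Return the capture group `group_index` of the first match.

    Assumption: an empty string is returned when the pattern or the
    requested group does not match.
    """
    match = re.search(pattern, text)
    if match is None:
        return ""
    return match.group(group_index) or ""


def regexp_replace(text, pattern, replacement):
    """Replace every match of `pattern` in `text` with `replacement`."""
    return re.sub(pattern, replacement, text)
```

      For instance, `regexp_extract("100-200", r"(\d+)-(\d+)", 1)` gives `"100"`, and `regexp_replace("100-200", r"\d+", "#")` gives `"#-#"`.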
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7468 from chenghao-intel/regexp and squashes the following commits:
      
      e5ea476 [Cheng Hao] minor update for documentation
      ef96fd6 [Cheng Hao] update the code gen
      72cf28f [Cheng Hao] Add more log for compilation error
      4e11381 [Cheng Hao] Add regexp_replace / regexp_extract support
      8c8f0ef5
    • Cheng Lian's avatar
      [SPARK-9100] [SQL] Adds DataFrame reader/writer shortcut methods for ORC · d38c5029
      Cheng Lian authored
      This PR adds DataFrame reader/writer shortcut methods for ORC in both Scala and Python.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7444 from liancheng/spark-9100 and squashes the following commits:
      
      284d043 [Cheng Lian] Fixes PySpark test cases and addresses PR comments
      e0b09fb [Cheng Lian] Adds DataFrame reader/writer shortcut methods for ORC
      d38c5029
    • Tarek Auel's avatar
      [SPARK-9161][SQL] codegen FormatNumber · 1ddd0f2f
      Tarek Auel authored
      Jira https://issues.apache.org/jira/browse/SPARK-9161
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7545 from tarekauel/SPARK-9161 and squashes the following commits:
      
      21425c8 [Tarek Auel] [SPARK-9161][SQL] codegen FormatNumber
      1ddd0f2f
    • Shivaram Venkataraman's avatar
      [SPARK-9179] [BUILD] Use default primary author if unspecified · 228ab65a
      Shivaram Venkataraman authored
      Fixes feature introduced in #7508 to use the default value if nothing is specified in command line
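
      The fallback described above can be sketched as follows. The function and parameter names are hypothetical, and treating the first entry of the author list as the default is an assumption made for illustration; the actual merge script may differ.

```python
def choose_primary_author(entered, distinct_authors):
    """Return the author typed at the prompt, or the default if none was given.

    `entered` is the raw text from the command-line prompt (hypothetical),
    `distinct_authors` is the list of commit authors; its first entry is
    assumed to be the default.
    """
    entered = entered.strip()
    return entered if entered else distinct_authors[0]
```

      With an empty response the default is used, so `choose_primary_author("", ["A <a@example.com>", "B <b@example.com>"])` returns `"A <a@example.com>"`.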
      
      cc liancheng rxin pwendell
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #7558 from shivaram/merge-script-fix and squashes the following commits:
      
      7092141 [Shivaram Venkataraman] Use default primary author if unspecified
      228ab65a
    • Josh Rosen's avatar
      [SPARK-9023] [SQL] Followup for #7456 (Efficiency improvements for UnsafeRows in Exchange) · 48f8fd46
      Josh Rosen authored
      This patch addresses code review feedback from #7456.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7551 from JoshRosen/unsafe-exchange-followup and squashes the following commits:
      
      76dbdf8 [Josh Rosen] Add comments + more methods to UnsafeRowSerializer
      3d7a1f2 [Josh Rosen] Add writeToStream() method to UnsafeRow
      48f8fd46
    • Reynold Xin's avatar
      [SPARK-9208][SQL] Remove variant of DataFrame string functions that accept column names. · 67570bee
      Reynold Xin authored
      It can be ambiguous whether that is a string literal or a column name.
      
      cc marmbrus
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7556 from rxin/str-exprs and squashes the following commits:
      
      92afa83 [Reynold Xin] [SPARK-9208][SQL] Remove variant of DataFrame string functions that accept column names.
      67570bee
    • Tarek Auel's avatar
      [SPARK-9157] [SQL] codegen substring · 560b355c
      Tarek Auel authored
      https://issues.apache.org/jira/browse/SPARK-9157
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7534 from tarekauel/SPARK-9157 and squashes the following commits:
      
      e65e3e9 [Tarek Auel] [SPARK-9157] indent fix
      44e89f8 [Tarek Auel] [SPARK-9157] use EMPTY_UTF8
      37d54c4 [Tarek Auel] Merge branch 'master' into SPARK-9157
      60732ea [Tarek Auel] [SPARK-9157] created substringSQL in UTF8String
      18c3576 [Tarek Auel] [SPARK-9157][SQL] remove slice pos
      1a2e611 [Tarek Auel] [SPARK-9157][SQL] codegen substring
      560b355c