  1. Nov 03, 2015
    • Cheng Lian's avatar
      [SPARK-10978][SQL] Allow data sources to eliminate filters · ebf8b0b4
      Cheng Lian authored
      This PR adds a new method `unhandledFilters` to `BaseRelation`. Data sources which implement this method properly may avoid the overhead of defensive filtering done by Spark SQL.
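
      A minimal sketch of the contract described above, written against stand-in types so that it runs without Spark on the classpath. The `Filter` hierarchy and `SketchRelation` below are hypothetical and only mimic the shape of the data source API; the method name and its array-in, array-out signature follow the PR description.

      ```scala
      object UnhandledFiltersSketch {
        // Stand-in filter types: these mimic, but are not, Spark's filter classes.
        sealed trait Filter
        case class EqualTo(attribute: String, value: Any) extends Filter
        case class GreaterThan(attribute: String, value: Any) extends Filter
        case class StringContains(attribute: String, value: String) extends Filter

        // Hypothetical relation that can evaluate equality and range predicates itself.
        class SketchRelation {
          // Return only the filters this source cannot evaluate.
          // The engine still has to apply the returned filters after the scan.
          def unhandledFilters(filters: Array[Filter]): Array[Filter] =
            filters.filter {
              case _: EqualTo | _: GreaterThan => false // evaluated by the source
              case _ => true                            // left for the engine
            }
        }
      }
      ```

      With this sketch, passing `EqualTo("a", 1)` and `StringContains("b", "x")` leaves only the `StringContains` filter for the engine to re-evaluate. A source that returns its input unchanged keeps the defensive behaviour, since every filter is then treated as unhandled.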
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #9399 from liancheng/spark-10978.unhandled-filters.
      ebf8b0b4
    • Mark Grover's avatar
      [SPARK-9790][YARN] Expose in WebUI if NodeManager is the reason why executors were killed. · b2e4b314
      Mark Grover authored
      Author: Mark Grover <grover.markgrover@gmail.com>
      
      Closes #8093 from markgrover/nm2.
      b2e4b314
    • Yanbo Liang's avatar
      [SPARK-11349][ML] Support transform string label for RFormula · f54ff19b
      Yanbo Liang authored
      Currently ```RFormula``` can only handle labels of ```NumericType``` or ```BinaryType``` (casting them to ```DoubleType``` as the label for Linear Regression training). We should also support labels of ```StringType```, which are needed for Logistic Regression (glm with family = "binomial").
      For labels of ```StringType```, we should use ```StringIndexer``` to transform them to 0-based indices.
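
      The indexing step can be illustrated with a plain Scala sketch that does not use Spark. It assigns 0-based indices by descending label frequency, which is how ```StringIndexer``` orders labels. The alphabetical tie-break and the names `LabelIndexSketch`, `fitIndex` and `transform` are assumptions of this sketch, not part of the patch.

      ```scala
      object LabelIndexSketch {
        // Map each distinct label to a 0-based index, most frequent label first.
        // Ties are broken alphabetically (an assumption made for determinism).
        def fitIndex(labels: Seq[String]): Map[String, Double] =
          labels.groupBy(identity).toSeq
            .map { case (label, occurrences) => (label, occurrences.size) }
            .sortBy { case (label, count) => (-count, label) }
            .zipWithIndex
            .map { case ((label, _), index) => label -> index.toDouble }
            .toMap

        // Replace every string label by its numeric index.
        def transform(labels: Seq[String]): Seq[Double] = {
          val index = fitIndex(labels)
          labels.map(index)
        }
      }
      ```

      For the labels `yes, no, yes, yes, no, maybe`, the sketch maps `yes` to 0.0, `no` to 1.0 and `maybe` to 2.0, giving a numeric label column that a classifier can consume.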
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9302 from yanboliang/spark-11349.
      f54ff19b
    • Yanbo Liang's avatar
      [MINOR][ML] Fix naming conventions of AFTSurvivalRegression coefficients · 3434572b
      Yanbo Liang authored
      Rename ```regressionCoefficients``` back to ```coefficients```, and rename ```weights``` to ```parameters```.
      See discussion [here](https://github.com/apache/spark/pull/9311/files#diff-e277fd0bc21f825d3196b4551c01fe5fR230). mengxr vectorijk dbtsai
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9431 from yanboliang/aft-coefficients.
      3434572b
    • Yanbo Liang's avatar
      [SPARK-9836][ML] Provide R-like summary statistics for OLS via normal equation solver · d6f10aa7
      Yanbo Liang authored
      https://issues.apache.org/jira/browse/SPARK-9836
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9413 from yanboliang/spark-9836.
      d6f10aa7
    • Liang-Chi Hsieh's avatar
      [SPARK-10304] [SQL] Partition discovery should throw an exception if the dir structure is invalid · d6035d97
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-10304
      
      This patch detects if the structure of partition directories is not valid.
      
      The test cases are from #8547. Thanks zhzhan.
      
      cc liancheng
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #8840 from viirya/detect_invalid_part_dir.
      d6035d97
    • Reynold Xin's avatar
      [SPARK-11256] Mark all Stage/ResultStage/ShuffleMapStage internal state as private. · 57446eb6
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9219 from rxin/stage-cleanup1.
      57446eb6
    • Daoyuan Wang's avatar
      [SPARK-10533][SQL] handle scientific notation in sqlParser · d188a677
      Daoyuan Wang authored
      https://issues.apache.org/jira/browse/SPARK-10533
      
      val df = sqlContext.createDataFrame(Seq(("a",1.0),("b",2.0),("c",3.0)))
      df.filter("_2 < 2.0e1").show
      
      Before this change, the scientific notation in the filter expression above (`2.0e1`) was not handled by the SQL parser.
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #9085 from adrian-wang/scinotation.
      d188a677
    • Jacek Lewandowski's avatar
      [SPARK-11344] Made ApplicationDescription and DriverDescription case classes · 233e534a
      Jacek Lewandowski authored
      DriverDescription was refactored into a case class because it had no mutable fields.
      
      ApplicationDescription had one mutable field, appUiUrl. The driver set this field to point to the driver web UI, and the Master modified it when the application was removed in order to redirect requests to the history server. This was wrong because objects that are sent over the wire should be immutable. Now appUiUrl is immutable in ApplicationDescription and always points to the driver UI, even if that UI has already shut down. The UI URL that the Master exposes to the user and modifies dynamically now lives in ApplicationInfo, a data object that describes the application state internally in the Master. That URL in ApplicationInfo is initialised with the value from ApplicationDescription.
      
      ApplicationDescription also included the value user, which is now one of the case class fields.
      
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #9299 from jacek-lewandowski/SPARK-11344.
      233e534a
    • Michael Armbrust's avatar
      [SPARK-11404] [SQL] Support for groupBy using column expressions · b86f2cab
      Michael Armbrust authored
      This PR adds a new method `groupBy(cols: Column*)` to `Dataset` that allows users to group using column expressions instead of a lambda function. Since the return type of these expressions is not known at compile time, we just set the key type as a generic `Row`. If the user would like to work with the key in a type-safe way, they can call `grouped.asKey[Type]`, which is also added in this PR.
      
      ```scala
      val ds = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1)).toDS()
      val grouped = ds.groupBy($"_1").asKey[String]
      val agged = grouped.mapGroups { case (g, iter) =>
        Iterator((g, iter.map(_._2).sum))
      }
      
      agged.collect()
      
      // res0: Array(("a", 30), ("b", 3), ("c", 1))
      ```
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #9359 from marmbrus/columnGroupBy and squashes the following commits:
      
      bbcb03b [Michael Armbrust] Update DatasetSuite.scala
      8fd2908 [Michael Armbrust] Update DatasetSuite.scala
      0b0e2f8 [Michael Armbrust] [SPARK-11404] [SQL] Support for groupBy using column expressions
      b86f2cab
    • Wenchen Fan's avatar
      [SPARK-11436] [SQL] rebind right encoder when join 2 datasets · 425ff03f
      Wenchen Fan authored
      When we join 2 datasets, we combine the 2 encoders into a tupled one and use it as the encoder for the joined dataset. Assume both encoders are flat: their `constructExpression`s both reference the first element of the input row. However, once the 2 encoders are combined, the schema of the input row changes, and the right encoder should now reference the second element of the input row. So we should rebind the right encoder to let it know the new schema of the input row before combining it.
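
      The ordinal problem can be reproduced with a small model that does not depend on Spark. `FlatEncoder`, `rebind` and `tuple` below are hypothetical stand-ins for the real encoder machinery; the sketch only illustrates why the right-hand side has to be pointed at the second element of the joined row.

      ```scala
      object RebindSketch {
        type Row = Seq[Any]

        // Hypothetical stand-in for a flat encoder: it rebuilds a value by
        // reading a single ordinal of the input row.
        case class FlatEncoder[T](ordinal: Int) {
          def construct(row: Row): T = row(ordinal).asInstanceOf[T]
          // Point the encoder at a different position of the input row.
          def rebind(newOrdinal: Int): FlatEncoder[T] = copy(ordinal = newOrdinal)
        }

        // Combine two encoders into one that yields a pair from a joined row.
        def tuple[A, B](left: FlatEncoder[A], right: FlatEncoder[B]): Row => (A, B) =
          row => (left.construct(row), right.construct(row))

        val left = FlatEncoder[String](0)
        val right = FlatEncoder[String](0)
        val joinedRow: Row = Seq("l", "r")

        // Without rebinding, both sides read ordinal 0 of the joined row.
        val wrong = tuple(left, right)(joinedRow)
        // After rebinding, the right side reads ordinal 1.
        val fixed = tuple(left, right.rebind(1))(joinedRow)
      }
      ```

      In this model `wrong` duplicates the left value, while `fixed` pairs the left and right values as intended.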
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #9391 from cloud-fan/join and squashes the following commits:
      
      846d3ab [Wenchen Fan] rebind right encoder when join 2 datasets
      425ff03f
    • Davies Liu's avatar
      [SPARK-10429] [SQL] make mutableProjection atomic · 67e23b39
      Davies Liu authored
      Right now, SQL's mutable projection updates every value of the mutable project after it evaluates the corresponding expression. This makes the behavior of MutableProjection confusing and complicates the implementation of common aggregate functions like stddev, because developers need to be aware that when the `i+1`th expression of a mutable projection is evaluated, the `i`th slot of the mutable row has already been updated.
      
      This PR makes the MutableProjection atomic by generating all the results of the expressions first and then copying them into mutableRow.
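
      The difference between the two update strategies can be shown with a small model in plain Scala. The row is an `Array[Double]` and each expression is a function of the row; `projectInPlace`, `projectAtomic` and the sample expressions are hypothetical names for this sketch and do not correspond to the generated code.

      ```scala
      object AtomicProjectionSketch {
        type Expr = Array[Double] => Double

        // Non-atomic: each result is written to its slot as soon as it is
        // computed, so later expressions observe already-updated slots.
        def projectInPlace(exprs: Seq[Expr], row: Array[Double]): Unit =
          exprs.zipWithIndex.foreach { case (e, i) => row(i) = e(row) }

        // Atomic: evaluate every expression against the old row first,
        // then copy all results into the row.
        def projectAtomic(exprs: Seq[Expr], row: Array[Double]): Unit = {
          val results = exprs.map(e => e(row)) // the row is still unmodified here
          results.zipWithIndex.foreach { case (v, i) => row(i) = v }
        }

        // Slot 0 holds a count; slot 1 adds the count to a running value.
        val exprs: Seq[Expr] = Seq(r => r(0) + 1, r => r(1) + r(0))
      }
      ```

      Starting from the row `(1.0, 10.0)`, the in-place version yields `(2.0, 12.0)` because the second expression sees the updated count, whereas the atomic version yields `(2.0, 11.0)` because both expressions read the old row.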
      
      A micro-benchmark showed no notable performance difference between using class members and local variables.
      
      cc yhuai
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9422 from davies/atomic_mutable and squashes the following commits:
      
      bbc1758 [Davies Liu] support wide table
      8a0ae14 [Davies Liu] fix bug
      bec07da [Davies Liu] refactor
      2891628 [Davies Liu] make mutableProjection atomic
      67e23b39
    • Yin Huai's avatar
      [SPARK-9858][SPARK-9859][SPARK-9861][SQL] Add an ExchangeCoordinator to... · d728d5c9
      Yin Huai authored
      [SPARK-9858][SPARK-9859][SPARK-9861][SQL] Add an ExchangeCoordinator to estimate the number of post-shuffle partitions for aggregates and joins
      
      https://issues.apache.org/jira/browse/SPARK-9858
      https://issues.apache.org/jira/browse/SPARK-9859
      https://issues.apache.org/jira/browse/SPARK-9861
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #9276 from yhuai/numReducer.
      d728d5c9
    • navis.ryu's avatar
      [SPARK-9034][SQL] Reflect field names defined in GenericUDTF · c34c27fe
      navis.ryu authored
      Although Hive's GenericUDTF#initialize() defines field names in the returned schema,
      the current HiveGenericUDTF drops these names.
      We might need to reflect these in a logical plan tree.
      
      Author: navis.ryu <navis@apache.org>
      
      Closes #8456 from navis/SPARK-9034.
      c34c27fe
  2. Nov 02, 2015
  3. Nov 01, 2015
    • Liang-Chi Hsieh's avatar
      [SPARK-9298][SQL] Add pearson correlation aggregation function · 3e770a64
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9298
      
      This patch adds a Pearson correlation aggregation function based on `AggregateExpression2`.
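
      For reference, the quantity being aggregated is the covariance of the two columns divided by the product of their standard deviations. The sketch below computes it over in-memory sequences in two passes for readability; it is not the aggregate implementation from the patch, and `PearsonSketch.corr` is a hypothetical name.

      ```scala
      object PearsonSketch {
        // Pearson correlation of paired samples:
        // sum((x - meanX) * (y - meanY)) / sqrt(sum((x - meanX)^2) * sum((y - meanY)^2))
        def corr(xs: Seq[Double], ys: Seq[Double]): Double = {
          require(xs.size == ys.size && xs.nonEmpty, "need equally sized, non-empty inputs")
          val n = xs.size
          val meanX = xs.sum / n
          val meanY = ys.sum / n
          val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
          val varX = xs.map(x => math.pow(x - meanX, 2)).sum
          val varY = ys.map(y => math.pow(y - meanY, 2)).sum
          // NaN when either input is constant, since the denominator is zero.
          cov / math.sqrt(varX * varY)
        }
      }
      ```

      Perfectly linear inputs such as `(1, 2, 3)` and `(2, 4, 6)` give a correlation of 1, and reversing one of them gives -1.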
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #8587 from viirya/corr_aggregation.
      3e770a64
    • Marcelo Vanzin's avatar
      [SPARK-11073][CORE][YARN] Remove akka dependency in secret key generation. · f8d93ede
      Marcelo Vanzin authored
      Use standard JDK APIs for that (with a little help from Guava). Most of the
      changes here are in test code, since there were no tests specific to that
      part of the code.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9257 from vanzin/SPARK-11073.
      f8d93ede
    • Marcelo Vanzin's avatar
      [SPARK-11020][CORE] Wait for HDFS to leave safe mode before initializing HS. · cf04fdfe
      Marcelo Vanzin authored
      Large HDFS clusters may take a while to leave safe mode when starting; this change
      makes the HS wait for that before doing checks about its configuration. This means
      the HS won't stop right away if HDFS is in safe mode and the configuration is not
      correct, but that should be a very uncommon situation.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9043 from vanzin/SPARK-11020.
      cf04fdfe
    • Nong Li's avatar
      [SPARK-11410][SQL] Add APIs to provide functionality similar to Hive's DISTRIBUTE BY and SORT BY. · 046e32ed
      Nong Li authored
      DISTRIBUTE BY allows the user to hash partition the data by specified exprs. It also allows for
      optionally sorting within each resulting partition. There is no required relationship between the
      exprs for partitioning and sorting (i.e. one does not need to be a prefix of the other).
      
      This patch adds two APIs to DataFrames which can be used together to provide this functionality:
        1. distributeBy() which partitions the data frame into a specified number of partitions using the
           partitioning exprs.
        2. localSort() which sorts each partition using the provided sorting exprs.
      
      To get the DISTRIBUTE BY functionality, the user simply does: df.distributeBy(...).localSort(...)
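
      The semantics of the two steps can be modelled on local collections without Spark. The sketch below hash-partitions records by one key and then sorts inside each partition by an unrelated key. The method names mirror the APIs named above, but the signatures and the `DistributeSortSketch` object are assumptions made for illustration only.

      ```scala
      object DistributeSortSketch {
        // Hash-partition records into numPartitions buckets by the given key.
        def distributeBy[T, K](rows: Seq[T], numPartitions: Int)(key: T => K): Vector[Seq[T]] = {
          val buckets = rows.groupBy { r =>
            val h = key(r).hashCode % numPartitions
            if (h < 0) h + numPartitions else h // keep the bucket index non-negative
          }
          Vector.tabulate(numPartitions)(i => buckets.getOrElse(i, Seq.empty))
        }

        // Sort each partition independently; no ordering holds across partitions.
        def localSort[T, S: Ordering](partitions: Vector[Seq[T]])(sortKey: T => S): Vector[Seq[T]] =
          partitions.map(_.sortBy(sortKey))
      }
      ```

      Partitioning the pairs `(1,"b"), (2,"z"), (1,"a"), (3,"c")` into two buckets by the first field and sorting by the second puts `(2,"z")` in bucket 0 and `(1,"a"), (1,"b"), (3,"c")` in bucket 1. The sort key is unrelated to the partitioning key, matching the note that neither needs to be a prefix of the other.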
      
      Author: Nong Li <nongli@gmail.com>
      
      Closes #9364 from nongli/spark-11410.
      046e32ed
    • Christian Kadner's avatar
      [SPARK-11338] [WEBUI] Prepend app links on HistoryPage with uiRoot path · dc7e399f
      Christian Kadner authored
      [SPARK-11338: HistoryPage not multi-tenancy enabled ...](https://issues.apache.org/jira/browse/SPARK-11338)
      - `HistoryPage.scala` ...prepending all page links with the web proxy (`uiRoot`) path
      - `HistoryServerSuite.scala` ...adding a test case to verify all site-relative links are prefixed when the environment variable `APPLICATION_WEB_PROXY_BASE` (or System property `spark.ui.proxyBase`) is set
      
      Author: Christian Kadner <ckadner@us.ibm.com>
      
      Closes #9291 from ckadner/SPARK-11338 and squashes the following commits:
      
      01d2f35 [Christian Kadner] [SPARK-11338][WebUI] nit fixes
      d054bd7 [Christian Kadner] [SPARK-11338][WebUI] prependBaseUri in method makePageLink
      8bcb3dc [Christian Kadner] [SPARK-11338][WebUI] Prepend application links on HistoryPage with uiRoot path
      dc7e399f