  1. Sep 17, 2017
  2. Sep 13, 2017
  3. Sep 12, 2017
    • [SPARK-21976][DOC] Fix wrong documentation for Mean Absolute Error. · e7696ebe
      FavioVazquez authored
      ## What changes were proposed in this pull request?
      
      Fixed wrong documentation for Mean Absolute Error.
      
      Even though the code is correct for the MAE:
      
      ```scala
      Since("1.2.0")
        def meanAbsoluteError: Double = {
          summary.normL1(1) / summary.count
        }
      ```
      In the documentation the division by N is missing.
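
      For reference, the intended formula (the standard MAE definition, which the code above computes as `normL1 / count`) is:

      ```latex
      \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|
      ```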
      
      ## How was this patch tested?
      
      All of the Spark tests were run.
      
      Author: FavioVazquez <favio.vazquezp@gmail.com>
      Author: faviovazquez <favio.vazquezp@gmail.com>
      Author: Favio André Vázquez <favio.vazquezp@gmail.com>
      
      Closes #19190 from FavioVazquez/mae-fix.
      
      (cherry picked from commit e2ac2f1c)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
      e7696ebe
  4. Sep 10, 2017
    • [SPARKR][BACKPORT-2.1] backporting package and test changes · ae4e8ae4
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Cherry-picking or manually porting the package and test changes to branch 2.1.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Wayne Zhang <actuaryzhang@uber.com>
      
      Closes #19165 from felixcheung/rbackportpkg21.
      ae4e8ae4
  5. Sep 08, 2017
  6. Aug 30, 2017
  7. Aug 24, 2017
  8. Aug 20, 2017
  9. Aug 15, 2017
    • [SPARK-21721][SQL][BACKPORT-2.1] Clear FileSystem deleteOnExit cache when... · 6f366fbb
      Liang-Chi Hsieh authored
      [SPARK-21721][SQL][BACKPORT-2.1] Clear FileSystem deleteOnExit cache when paths are successfully removed
      
      ## What changes were proposed in this pull request?
      
      Backport SPARK-21721 to branch 2.1:
      
      We put the staging path to delete into FileSystem's deleteOnExit cache in case the path can't be successfully removed. But when we do successfully remove the path, we don't remove it from the cache. We should, to keep the cache from growing continuously.
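
      A minimal sketch of the idea (the helper name and structure are illustrative, not the actual patch):

      ```scala
      import org.apache.hadoop.fs.{FileSystem, Path}

      // Once the staging path is successfully deleted, also drop it from the
      // deleteOnExit cache so the cache stops growing; only fall back to
      // deleteOnExit when the delete fails.
      def cleanupStagingDir(fs: FileSystem, stagingDir: Path): Unit = {
        if (fs.delete(stagingDir, true)) {
          fs.cancelDeleteOnExit(stagingDir)
        } else {
          fs.deleteOnExit(stagingDir)
        }
      }
      ```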
      
      ## How was this patch tested?
      
      Added test.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #18947 from viirya/SPARK-21721-backport-2.1.
      6f366fbb
  10. Aug 07, 2017
    • [SPARK-21306][ML] For branch 2.1, OneVsRest should support setWeightCol · 9b749b6c
      Yan Facai (颜发才) authored
      The PR is related to #18554, and is modified for branch 2.1.
      
      ## What changes were proposed in this pull request?
      
      add `setWeightCol` method for OneVsRest.
      
      `weightCol` is ignored if the classifier doesn't inherit the `HasWeightCol` trait.
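
      A usage sketch of the new setter (the column name "weight" is illustrative):

      ```scala
      import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

      val ovr = new OneVsRest()
        .setClassifier(new LogisticRegression()) // LogisticRegression mixes in HasWeightCol
        .setWeightCol("weight")                  // ignored if the classifier lacks HasWeightCol
      ```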
      
      ## How was this patch tested?
      
      + [x] Add a unit test.
      
      Author: Yan Facai (颜发才) <facai.yan@gmail.com>
      
      Closes #18763 from facaiy/BUG/branch-2.1_OneVsRest_support_setWeightCol.
      9b749b6c
    • [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive information · 444cca14
      Mark Grover authored
      ## What changes were proposed in this pull request?
      
      Backporting SPARK-18535 and SPARK-19720 to spark 2.1
      
      It's a backport PR that redacts sensitive information, based on configuration, in the Spark UI and spark-submit console logs.
      
      Using Mark Grover's (mark@apache.org) original PRs as reference.
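
      A simplified sketch of configuration-driven redaction (the regex and helper are illustrative; the real patches wire this through Spark's config system and the UI / console-log rendering):

      ```scala
      val redactionPattern = "(?i)secret|password|token|access[._]key".r

      // Replace the values of any matching keys before they are displayed or logged.
      def redact(kvs: Seq[(String, String)]): Seq[(String, String)] = kvs.map {
        case (key, _) if redactionPattern.findFirstIn(key).isDefined => (key, "*********(redacted)")
        case other => other
      }
      ```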
      
      ## How was this patch tested?
      
      The same tests from the original PRs were applied.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #18802 from dmvieira/feature-redact.
      444cca14
  11. Aug 06, 2017
    • [SPARK-21588][SQL] SQLContext.getConf(key, null) should return null · 5634fadb
      vinodkc authored
      
      ## What changes were proposed in this pull request?
      
      SQLContext.getConf(key, null), for a key that is not defined in the conf and has no default value defined, throws an NPE. This happens only when the conf entry has a value converter.

      Added a null check on defaultValue inside SQLConf.getConfString to avoid calling entry.valueConverter(defaultValue).
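
      A minimal, self-contained sketch of the fix (names are simplified, not the actual SQLConf internals):

      ```scala
      object ConfSketch {
        private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]()
        private val valueConverters = Map[String, String => Any](
          "spark.sql.shuffle.partitions" -> (_.toInt))

        def getConfString(key: String, defaultValue: String): String = {
          Option(settings.get(key)).getOrElse {
            // Previously the converter ran unconditionally and converter(null) threw an NPE;
            // only validate the default when it is non-null.
            valueConverters.get(key).foreach { convert =>
              if (defaultValue != null) convert(defaultValue)
            }
            defaultValue
          }
        }
      }
      ```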
      
      ## How was this patch tested?
      Added unit test
      
      Author: vinodkc <vinod.kc.in@gmail.com>
      
      Closes #18852 from vinodkc/br_Fix_SPARK-21588.
      
      (cherry picked from commit 1ba967b2)
      Signed-off-by: gatorsmile <gatorsmile@gmail.com>
      5634fadb
  12. Aug 04, 2017
    • [SPARK-21330][SQL] Bad partitioning does not allow to read a JDBC table with... · 734b144d
      Andrew Ray authored
      [SPARK-21330][SQL] Bad partitioning does not allow to read a JDBC table with extreme values on the partition column
      
      ## What changes were proposed in this pull request?
      
      An overflow of the difference of bounds on the partitioning column leads to no data being read. This
      patch checks for this overflow.
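
      An illustrative sketch of the overflow (not the actual JDBC relation code):

      ```scala
      // With extreme Long bounds the raw difference wraps around, so a naive stride
      // computation yields partitions that cover no data; BigInt arithmetic avoids it.
      val lowerBound = Long.MinValue
      val upperBound = Long.MaxValue
      val numPartitions = 4

      val naiveDelta = upperBound - lowerBound                              // wraps to -1
      val safeStride = (BigInt(upperBound) - BigInt(lowerBound)) / numPartitions
      println(s"naive delta = $naiveDelta, safe stride = $safeStride")
      ```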
      
      ## How was this patch tested?
      
      New unit test.
      
      Author: Andrew Ray <ray.andrew@gmail.com>
      
      Closes #18800 from aray/SPARK-21330.
      
      (cherry picked from commit 25826c77)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
      734b144d
  13. Aug 02, 2017
    • [SPARK-12717][PYTHON][BRANCH-2.1] Adding thread-safe broadcast pickle registry · d93e45b8
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      When using PySpark broadcast variables in a multi-threaded environment, `SparkContext._pickled_broadcast_vars` becomes a shared resource. A race condition can occur when broadcast variables pickled from one thread get added to the shared `_pickled_broadcast_vars` and become part of the Python command from another thread. This PR introduces a thread-safe pickled registry using thread-local storage, so that when a Python command is pickled (causing the broadcast variable to be pickled and added to the registry), each thread has its own view of the pickle registry to retrieve and clear the broadcast variables used.
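
      A conceptual sketch of a thread-local registry (written in Scala for consistency with the other examples here; the actual fix lives in PySpark's Python code):

      ```scala
      import scala.collection.mutable.ArrayBuffer

      // Each thread sees its own buffer, so items registered on one thread never
      // leak into another thread's view.
      class ThreadLocalRegistry[T] {
        private val buffers = new ThreadLocal[ArrayBuffer[T]] {
          override def initialValue(): ArrayBuffer[T] = ArrayBuffer.empty[T]
        }
        def add(item: T): Unit = buffers.get() += item
        def drain(): Seq[T] = {
          val items = buffers.get().toList
          buffers.get().clear()
          items
        }
      }
      ```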
      
      ## How was this patch tested?
      
      Added a unit test that causes this race condition using another thread.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #18825 from BryanCutler/pyspark-bcast-threadsafe-SPARK-12717-2_1.
      d93e45b8
  14. Aug 01, 2017
  15. Jul 29, 2017
    • [SPARK-21555][SQL] RuntimeReplaceable should be compared semantically by its canonicalized child · 78f7cdfa
      Liang-Chi Hsieh authored
      
      ## What changes were proposed in this pull request?
      
      When there are aliases (added for nested fields) as parameters in `RuntimeReplaceable`, those aliases are not among the children expressions, so they can't be cleaned up by the analyzer rule `CleanupAliases`.
      
      An expression `nvl(foo.foo1, "value")` can be resolved to two semantically different expressions in a group by query because they contain different aliases.
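
      For illustration, a query of this shape hits the issue (the schema and names are hypothetical; runnable in spark-shell, where `spark` and its implicits are in scope):

      ```scala
      case class Foo(foo1: String)
      case class Rec(foo: Foo)
      val df = Seq(Rec(Foo("a")), Rec(Foo(null))).toDF()
      df.createOrReplaceTempView("t")
      // The nvl over the nested field appears in both the SELECT list and GROUP BY
      // and must canonicalize to the same expression.
      spark.sql("SELECT nvl(foo.foo1, 'value') AS k, count(*) AS c FROM t GROUP BY nvl(foo.foo1, 'value')").show()
      ```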
      
      Because those aliases are not children of `RuntimeReplaceable`, which is a `UnaryExpression`, we can't trim them out by simply transforming the expressions in `CleanupAliases`.
      
      If we want to replace the non-children aliases in `RuntimeReplaceable`, we need to add more codes to `RuntimeReplaceable` and modify all expressions of `RuntimeReplaceable`. It makes the interface ugly IMO.
      
      Considering that those aliases will be replaced later during optimization and so do no harm, this patch chooses to simply override `canonicalized` of `RuntimeReplaceable`.
      
      One concern is about `CleanupAliases`: it actually cannot clean up ALL aliases inside a plan. To make callers of this rule aware of that, this patch adds a comment to `CleanupAliases`.
      
      ## How was this patch tested?
      
      Added test.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #18761 from viirya/SPARK-21555.
      
      (cherry picked from commit 9c8109ef)
      Signed-off-by: gatorsmile <gatorsmile@gmail.com>
      78f7cdfa
  16. Jul 28, 2017
  17. Jul 27, 2017
  18. Jul 19, 2017
    • [SPARK-21446][SQL] Fix setAutoCommit never executed · 94987987
      DFFuture authored
      ## What changes were proposed in this pull request?
      JIRA Issue: https://issues.apache.org/jira/browse/SPARK-21446
      
      
      options.asConnectionProperties cannot contain fetchsize, because fetchsize is a Spark-only option, and Spark-only options are excluded from the connection properties.
      So this change switches the properties passed to beforeFetch from options.asConnectionProperties.asScala.toMap to options.asProperties.asScala.toMap.
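
      A usage sketch that this fix affects (connection details are placeholders):

      ```scala
      // "fetchsize" is a Spark-only option, so it only reaches the Postgres dialect's
      // beforeFetch hook when the full option map (asProperties) is passed along.
      val df = spark.read
        .format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/testdb")
        .option("dbtable", "public.some_table")
        .option("fetchsize", "1000")
        .load()
      ```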
      
      ## How was this patch tested?
      
      Author: DFFuture <albert.zhang23@gmail.com>
      
      Closes #18665 from DFFuture/sparksql_pg.
      
      (cherry picked from commit c9729187)
      Signed-off-by: gatorsmile <gatorsmile@gmail.com>
      94987987
    • [SPARK-21441][SQL] Incorrect Codegen in SortMergeJoinExec results failures in some cases · ac206934
      donnyzone authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/projects/SPARK/issues/SPARK-21441
      
      
      
      This issue can be reproduced by the following example:
      
      ```
      val spark = SparkSession
         .builder()
         .appName("smj-codegen")
         .master("local")
         .config("spark.sql.autoBroadcastJoinThreshold", "1")
         .getOrCreate()
      val df1 = spark.createDataFrame(Seq((1, 1), (2, 2), (3, 3))).toDF("key", "int")
      val df2 = spark.createDataFrame(Seq((1, "1"), (2, "2"), (3, "3"))).toDF("key", "str")
      val df = df1.join(df2, df1("key") === df2("key"))
         .filter("int = 2 or reflect('java.lang.Integer', 'valueOf', str) = 1")
         .select("int")
         df.show()
      ```
      
      To conclude, the issue happens when:
      (1) the SortMergeJoin condition contains CodegenFallback expressions, and
      (2) in the PhysicalPlan tree, the SortMergeJoin node is the child of the root node, e.g., the Project in the above example.
      
      This patch fixes the logic in `CollapseCodegenStages` rule.
      
      ## How was this patch tested?
      Unit test and manual verification in our cluster.
      
      Author: donnyzone <wellfengzhu@gmail.com>
      
      Closes #18656 from DonnyZone/Fix_SortMergeJoinExec.
      
      (cherry picked from commit 6b6dd682)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
      ac206934
  19. Jul 17, 2017
    • [SPARK-21332][SQL] Incorrect result type inferred for some decimal expressions · caf32b3c
      aokolnychyi authored
      
      ## What changes were proposed in this pull request?
      
      This PR changes the direction of expression transformation in the DecimalPrecision rule. Previously, the expressions were transformed down, which led to incorrect result types when decimal expressions had other decimal expressions as their operands. The root cause of this issue was in visiting outer nodes before their children. Consider the example below:
      
      ```
          val inputSchema = StructType(StructField("col", DecimalType(26, 6)) :: Nil)
          val sc = spark.sparkContext
          val rdd = sc.parallelize(1 to 2).map(_ => Row(BigDecimal(12)))
          val df = spark.createDataFrame(rdd, inputSchema)
      
          // Works correctly since no nested decimal expression is involved
          // Expected result type: (26, 6) * (26, 6) = (38, 12)
          df.select($"col" * $"col").explain(true)
          df.select($"col" * $"col").printSchema()
      
          // Gives a wrong result since there is a nested decimal expression that should be visited first
          // Expected result type: ((26, 6) * (26, 6)) * (26, 6) = (38, 12) * (26, 6) = (38, 18)
          df.select($"col" * $"col" * $"col").explain(true)
          df.select($"col" * $"col" * $"col").printSchema()
      ```
      
      The example above gives the following output:
      
      ```
      // Correct result without sub-expressions
      == Parsed Logical Plan ==
      'Project [('col * 'col) AS (col * col)#4]
      +- LogicalRDD [col#1]
      
      == Analyzed Logical Plan ==
      (col * col): decimal(38,12)
      Project [CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS (col * col)#4]
      +- LogicalRDD [col#1]
      
      == Optimized Logical Plan ==
      Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4]
      +- LogicalRDD [col#1]
      
      == Physical Plan ==
      *Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4]
      +- Scan ExistingRDD[col#1]
      
      // Schema
      root
       |-- (col * col): decimal(38,12) (nullable = true)
      
      // Incorrect result with sub-expressions
      == Parsed Logical Plan ==
      'Project [(('col * 'col) * 'col) AS ((col * col) * col)#11]
      +- LogicalRDD [col#1]
      
      == Analyzed Logical Plan ==
      ((col * col) * col): decimal(38,12)
      Project [CheckOverflow((promote_precision(cast(CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS ((col * col) * col)#11]
      +- LogicalRDD [col#1]
      
      == Optimized Logical Plan ==
      Project [CheckOverflow((cast(CheckOverflow((col#1 * col#1), DecimalType(38,12)) as decimal(26,6)) * col#1), DecimalType(38,12)) AS ((col * col) * col)#11]
      +- LogicalRDD [col#1]
      
      == Physical Plan ==
      *Project [CheckOverflow((cast(CheckOverflow((col#1 * col#1), DecimalType(38,12)) as decimal(26,6)) * col#1), DecimalType(38,12)) AS ((col * col) * col)#11]
      +- Scan ExistingRDD[col#1]
      
      // Schema
      root
       |-- ((col * col) * col): decimal(38,12) (nullable = true)
      ```
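
      For reference, a sketch of the decimal multiplication typing rule applied in the expected results above (the result is then capped at a maximum precision of 38):

      ```latex
      (p_1,\, s_1) \times (p_2,\, s_2) \;\rightarrow\; (p_1 + p_2 + 1,\; s_1 + s_2)
      ```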
      
      ## How was this patch tested?
      
      This PR was tested with available unit tests. Moreover, there are tests to cover previously failing scenarios.
      
      Author: aokolnychyi <anton.okolnychyi@sap.com>
      
      Closes #18583 from aokolnychyi/spark-21332.
      
      (cherry picked from commit 0be5fb41)
      Signed-off-by: gatorsmile <gatorsmile@gmail.com>
      caf32b3c
    • [SPARK-19104][BACKPORT-2.1][SQL] Lambda variables in ExternalMapToCatalyst should be global · a9efce46
      Kazuaki Ishizaki authored
      ## What changes were proposed in this pull request?
      
      This PR is backport of #18418 to Spark 2.1. [SPARK-21391](https://issues.apache.org/jira/browse/SPARK-21391) reported this problem in Spark 2.1.
      
      The issue happens in `ExternalMapToCatalyst`. For example, the following code creates an `ExternalMapToCatalyst` to convert a Scala Map to Catalyst map format.
      
      ```
      val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> InnerData("name", i + 100))))
      val ds = spark.createDataset(data)
      ```
      The `valueConverter` in `ExternalMapToCatalyst` looks like:
      
      ```
      if (isnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true))) null else named_struct(name, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).name, true), value, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, ExternalMapToCatalyst_value_isNull52, ObjectType(class org.apache.spark.sql.InnerData), true)).value)
      ```
      There is a `CreateNamedStruct` expression (`named_struct`) to create a row of `InnerData.name` and `InnerData.value` that are referred by `ExternalMapToCatalyst_value52`.
      
      Because `ExternalMapToCatalyst_value52` is a local variable, when `CreateNamedStruct` splits expressions into individual functions, the local variable can't be accessed anymore.
      
      ## How was this patch tested?
      
      Added a new test suite into `DatasetPrimitiveSuite`
      
      Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
      
      Closes #18627 from kiszk/SPARK-21391.
      a9efce46
  20. Jul 14, 2017
  21. Jul 09, 2017
  22. Jul 08, 2017
  23. Jul 06, 2017
    • [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream · 7f7b63bb
      Sumedh Wale authored
      
      ## What changes were proposed in this pull request?
      
      Corrects the offsetInBytes calculation in UnsafeRow.writeToStream. Known failures include writes to some DataSources that have their own SparkPlan implementations and cause an EXCHANGE in writes.
      
      ## How was this patch tested?
      
      Extended UnsafeRowSuite.writeToStream to include an UnsafeRow over byte array having non-zero offset.
      
      Author: Sumedh Wale <swale@snappydata.io>
      
      Closes #18535 from sumwale/SPARK-21312.
      
      (cherry picked from commit 14a3bb3a)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
      7f7b63bb
  24. Jul 04, 2017
    • [SPARK-20256][SQL][BRANCH-2.1] SessionState should be created more lazily · 8f1ca695
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      `SessionState` is designed to be created lazily. However, in reality, it is created immediately in `SparkSession.Builder.getOrCreate` ([here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L943)).
      
      This PR aims to recover the lazy behavior by keeping the options in `initialSessionOptions`. The benefit is the following: users can start `spark-shell` and use RDD operations without any problems.
      
      **BEFORE**
      ```scala
      $ bin/spark-shell
      java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'
      ...
      Caused by: org.apache.spark.sql.AnalysisException:
          org.apache.hadoop.hive.ql.metadata.HiveException:
             MetaException(message:java.security.AccessControlException:
                Permission denied: user=spark, access=READ,
                   inode="/apps/hive/warehouse":hive:hdfs:drwx------
      ```
      As reported in SPARK-20256, this happens when the warehouse directory is not allowed for this user.
      
      **AFTER**
      ```scala
      $ bin/spark-shell
      ...
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 2.1.2-SNAPSHOT
            /_/
      
      Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> sc.range(0, 10, 1).count()
      res0: Long = 10
      ```
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #18530 from dongjoon-hyun/SPARK-20256-BRANCH-2.1.
      8f1ca695
  25. Jun 30, 2017
  26. Jun 29, 2017
    • [SPARK-21258][SQL] Fix WindowExec complex object aggregation with spilling · d995dac1
      Herman van Hovell authored
      ## What changes were proposed in this pull request?
      `WindowExec` currently improperly stores complex objects (UnsafeRow, UnsafeArrayData, UnsafeMapData, UTF8String) during aggregation by keeping a reference in the buffer used by `GeneratedMutableProjections` to the actual input data. Things go wrong when the input object (or the backing bytes) is reused for other things. This can happen in window functions when the operator starts spilling to disk. When reading back the spill files, the `UnsafeSorterSpillReader` reuses the buffer to which the `UnsafeRow` points, leading to weird corruption scenarios. Note that this only happens for aggregate functions that preserve (parts of) their input, for example `FIRST`, `LAST`, `MIN` & `MAX`.
      
      This was not seen before, because the spilling logic was not doing actual spills as much and actually used an in-memory page. This page was not cleaned up during window processing and made sure unsafe objects pointed to their own dedicated memory location. This was changed by https://github.com/apache/spark/pull/16909; after that PR Spark spills more eagerly.
      
      This PR provides a surgical fix because we are close to releasing Spark 2.2. This change just makes sure that there cannot be any object reuse, at the expense of a little bit of performance. We will follow up with a more subtle solution at a later point.
      
      ## How was this patch tested?
      Added a regression test to `DataFrameWindowFunctionsSuite`.
      
      Author: Herman van Hovell <hvanhovell@databricks.com>
      
      Closes #18470 from hvanhovell/SPARK-21258.
      
      (cherry picked from commit e2f32ee4)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
      d995dac1
    • [SPARK-21176][WEB UI] Limit number of selector threads for admin ui proxy servlets to 8 · 083adb07
      IngoSchuster authored
      ## What changes were proposed in this pull request?
      Please see also https://issues.apache.org/jira/browse/SPARK-21176
      
      This change limits the number of selector threads that Jetty creates to a maximum of 8 per proxy servlet (the Jetty default is the number of processors / 2).
      The newHttpClient method of Jetty's ProxyServlet class is overridden to avoid the Jetty defaults (which are designed for high-performance HTTP servers).
      Once https://github.com/eclipse/jetty.project/issues/1643 is available, the code could be cleaned up to avoid the method override.
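
      A rough sketch of the override (Jetty 9.x API, simplified from the actual patch):

      ```scala
      import org.eclipse.jetty.client.HttpClient
      import org.eclipse.jetty.client.http.HttpClientTransportOverHTTP
      import org.eclipse.jetty.proxy.ProxyServlet

      class BoundedSelectorProxyServlet extends ProxyServlet {
        // Cap the selector pool at 8 instead of Jetty's availableProcessors / 2 default.
        private val numSelectors = math.min(8, Runtime.getRuntime.availableProcessors())

        override def newHttpClient(): HttpClient =
          new HttpClient(new HttpClientTransportOverHTTP(numSelectors), null)
      }
      ```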
      
      I really need this on v2.1.1 - what is the best way for a backport (automatic merge works fine)? Shall I create another PR?
      
      ## How was this patch tested?
      The patch was tested manually on a Spark cluster with a head node that has 88 processors using JMX to verify that the number of selector threads is now limited to 8 per proxy.
      
      gurvindersingh zsxwing can you please review the change?
      
      Author: IngoSchuster <ingo.schuster@de.ibm.com>
      Author: Ingo Schuster <ingo.schuster@de.ibm.com>
      
      Closes #18437 from IngoSchuster/master.
      
      (cherry picked from commit 88a536ba)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
      083adb07
  27. Jun 25, 2017
  28. Jun 24, 2017
    • [SPARK-21203][SQL] Fix wrong results of insertion of Array of Struct · 0d6b701e
      gatorsmile authored
      
      ### What changes were proposed in this pull request?
      ```SQL
      CREATE TABLE `tab1`
      (`custom_fields` ARRAY<STRUCT<`id`: BIGINT, `value`: STRING>>)
      USING parquet
      
      INSERT INTO `tab1`
      SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b'))
      
      SELECT custom_fields.id, custom_fields.value FROM tab1
      ```
      
      The above query always returns the last struct of the array, because the rule `SimplifyCasts` incorrectly rewrites the query. The underlying cause is that we always use the same `GenericInternalRow` object when doing the cast.
      
      ### How was this patch tested?
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #18412 from gatorsmile/castStruct.
      
      (cherry picked from commit 2e1586f6)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
      0d6b701e
    • [SPARK-21159][CORE] Don't try to connect to launcher in standalone cluster mode. · 6750db3f
      Marcelo Vanzin authored
      
      Monitoring for standalone cluster mode is not implemented (see SPARK-11033), but
      the same scheduler implementation is used, and if it tries to connect to the
      launcher it will fail. So fix the scheduler so it only tries that in client mode;
      cluster mode applications will be correctly launched and will work, but monitoring
      through the launcher handle will not be available.
      
      Tested by running a cluster mode app with "SparkLauncher.startApplication".
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18397 from vanzin/SPARK-21159.
      
      (cherry picked from commit bfd73a7c)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
      6750db3f
    • [SPARK-20555][SQL] Fix mapping of Oracle DECIMAL types to Spark types in read path · f12883e3
      Gabor Feher authored
      This PR is to revert some code changes in the read path of https://github.com/apache/spark/pull/14377. The original fix is https://github.com/apache/spark/pull/17830
      
      When merging this PR, please give the credit to gaborfeher
      
      Added a test case to OracleIntegrationSuite.scala
      
      Author: Gabor Feher <gabor.feher@lynxanalytics.com>
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #18408 from gatorsmile/OracleType.
      f12883e3
  29. Jun 23, 2017
    • [MINOR][DOCS] Docs in DataFrameNaFunctions.scala use wrong method · bcaf06c4
      Ong Ming Yang authored
      
      ## What changes were proposed in this pull request?
      
      * Following the first few examples in this file, the remaining methods should also be methods of `df.na` not `df`.
      * Filled in some missing parentheses
      
      ## How was this patch tested?
      
      N/A
      
      Author: Ong Ming Yang <me@ongmingyang.com>
      
      Closes #18398 from ongmingyang/master.
      
      (cherry picked from commit 4cc62951)
      Signed-off-by: Xiao Li <gatorsmile@gmail.com>
      bcaf06c4
    • [SPARK-21181] Release byteBuffers to suppress netty error messages · f8fd3b48
      Dhruve Ashar authored
      ## What changes were proposed in this pull request?
      We are explicitly calling release on the ByteBufs used to encode the string to Base64, to suppress the memory-leak error message reported by Netty. This is to make it less confusing for the user.
      
      ### Changes proposed in this fix
      By explicitly invoking release on the ByteBufs, we decrement the internal reference counts for the wrapped ByteBufs. Now, when the GC kicks in, these will be reclaimed as before, but Netty won't report any memory-leak error messages since the internal ref counts are now 0.
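
      A minimal sketch of the pattern (Netty API, not the exact Spark call site):

      ```scala
      import io.netty.buffer.{ByteBuf, Unpooled}
      import io.netty.handler.codec.base64.Base64
      import io.netty.util.CharsetUtil

      // Release the intermediate ByteBufs created while Base64-encoding so their
      // reference counts drop to zero and Netty's leak detector stays quiet.
      def encodeToBase64String(bytes: Array[Byte]): String = {
        val raw: ByteBuf = Unpooled.wrappedBuffer(bytes)
        val encoded: ByteBuf = Base64.encode(raw)
        try {
          encoded.toString(CharsetUtil.UTF_8)
        } finally {
          encoded.release()
          raw.release()
        }
      }
      ```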
      
      ## How was this patch tested?
      Ran a few spark-applications and examined the logs. The error message no longer appears.
      
      Original PR was opened against branch-2.1 => https://github.com/apache/spark/pull/18392
      
      
      
      Author: Dhruve Ashar <dhruveashar@gmail.com>
      
      Closes #18407 from dhruve/master.
      
      (cherry picked from commit 1ebe7ffe)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
      f8fd3b48
  30. Jun 22, 2017