  1. Apr 10, 2017
    • Sean Owen's avatar
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish... · a26e3ed5
      Sean Owen authored
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish locale bug" causes Spark problems
      
      ## What changes were proposed in this pull request?
      
      Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
      
      The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
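
      As a minimal sketch of the problem (illustrative, not code from this patch), the same call can give different results depending on the JVM default locale, which is exactly what passing `Locale.ROOT` avoids:
      
      ```scala
      import java.util.Locale
      
      // Under a Turkish locale, uppercase "I" lowercases to dotless "ı" (U+0131),
      // so locale-sensitive lowercasing breaks ASCII keyword comparisons.
      val turkish = new Locale("tr", "TR")
      println("DISABLE_CODEGEN".toLowerCase(turkish))      // "dısable_codegen" -- surprising
      println("DISABLE_CODEGEN".toLowerCase(Locale.ROOT))  // "disable_codegen" -- as intended
      ```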
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17527 from srowen/SPARK-20156.
      a26e3ed5
    • Xiao Li's avatar
      [SPARK-20273][SQL] Disallow Non-deterministic Filter push-down into Join Conditions · fd711ea1
      Xiao Li authored
      ## What changes were proposed in this pull request?
      ```
      sql("SELECT t1.b, rand(0) as r FROM cachedData, cachedData t1 GROUP BY t1.b having r > 0.5").show()
      ```
      We will get the following error:
      ```
      Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 8, localhost, executor driver): java.lang.NullPointerException
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
      	at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
      	at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
      ```
      Filters could be pushed down into the join conditions by the optimizer rule `PushPredicateThroughJoin`. However, the Analyzer [blocks users from adding non-deterministic conditions](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L386-L395) (for details, see PR https://github.com/apache/spark/pull/7535).
      
      We should not push down non-deterministic conditions; otherwise, we would need to explicitly initialize the non-deterministic expressions. This PR simply blocks the push-down.
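
      A toy illustration of the guard (simplified stand-in types, not the actual `PushPredicateThroughJoin` code): only deterministic predicates remain candidates for push-down into a join condition.
      
      ```scala
      // Simplified stand-in for Catalyst expressions: each predicate carries a
      // `deterministic` flag; non-deterministic ones stay in a Filter above the join.
      case class Pred(sql: String, deterministic: Boolean)
      
      def splitForJoinPushdown(preds: Seq[Pred]): (Seq[Pred], Seq[Pred]) =
        preds.partition(_.deterministic)
      
      val (pushable, kept) = splitForJoinPushdown(Seq(
        Pred("t1.b = t2.b", deterministic = true),
        Pred("rand(0) > 0.5", deterministic = false)))
      // pushable == Seq(Pred("t1.b = t2.b", true)); `kept` is evaluated above the join
      ```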
      
      ## How was this patch tested?
      Added a test case
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #17585 from gatorsmile/joinRandCondition.
      fd711ea1
    • hyukjinkwon's avatar
      [SPARK-19518][SQL] IGNORE NULLS in first / last in SQL · 5acaf8c0
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to add the `IGNORE NULLS` keyword to `first`/`last` in Spark's parser, similar to Oracle's syntax (http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions057.htm). It simply maps the keyword to the existing `ignoreNullsExpr`.
      
      **Before**
      
      ```scala
      scala> sql("select first('a' IGNORE NULLS)").show()
      ```
      
      ```
      org.apache.spark.sql.catalyst.parser.ParseException:
      extraneous input 'NULLS' expecting {')', ','}(line 1, pos 24)
      
      == SQL ==
      select first('a' IGNORE NULLS)
      ------------------------^^^
      
        at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:210)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:112)
        at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:46)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:66)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:622)
        ... 48 elided
      ```
      
      **After**
      
      ```scala
      scala> sql("select first('a' IGNORE NULLS)").show()
      ```
      
      ```
      +--------------+
      |first(a, true)|
      +--------------+
      |             a|
      +--------------+
      ```
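
      For reference, the DataFrame API already exposes the same flag; a sketch (the SparkSession `spark` and a DataFrame `df` with column `a` are assumed for illustration):
      
      ```scala
      import org.apache.spark.sql.functions.{first, last}
      import spark.implicits._  // assumes a SparkSession named `spark`
      
      // Same effect as the SQL IGNORE NULLS form above: nulls are skipped.
      df.agg(first($"a", ignoreNulls = true), last($"a", ignoreNulls = true)).show()
      ```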
      
      ## How was this patch tested?
      
      Unit tests in `ExpressionParserSuite`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17566 from HyukjinKwon/SPARK-19518.
      5acaf8c0
    • Bogdan Raducanu's avatar
      [SPARK-20243][TESTS] DebugFilesystem.assertNoOpenStreams thread race · 4f7d49b9
      Bogdan Raducanu authored
      ## What changes were proposed in this pull request?
      
      Synchronize access to openStreams map.
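
      A minimal sketch of the idea (illustrative, not the actual `DebugFilesystem` code): every access to the shared map takes the same lock, so the assertion cannot race an open/close on another thread.
      
      ```scala
      import scala.collection.mutable
      
      object OpenStreamTracker {
        // maps a stream id to the stack trace of where it was opened
        private val openStreams = mutable.Map[Long, Throwable]()
      
        def opened(id: Long): Unit = openStreams.synchronized {
          openStreams(id) = new Throwable("stream opened here")
        }
      
        def closed(id: Long): Unit = openStreams.synchronized {
          openStreams.remove(id)
        }
      
        def assertNoOpenStreams(): Unit = openStreams.synchronized {
          assert(openStreams.isEmpty, s"${openStreams.size} stream(s) still open")
        }
      }
      ```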
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Bogdan Raducanu <bogdan@databricks.com>
      
      Closes #17592 from bogdanrdc/SPARK-20243.
      4f7d49b9
    • Wenchen Fan's avatar
      [SPARK-20229][SQL] add semanticHash to QueryPlan · 3d7f201f
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Like `Expression`, `QueryPlan` should also have a `semanticHash` method, then we can put plans to a hash map and look it up fast. This PR refactors `QueryPlan` to follow `Expression` and put all the normalization logic in `QueryPlan.canonicalized`, so that it's very natural to implement `semanticHash`.
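
      A toy sketch of the contract (not the Catalyst implementation): `semanticHash` is derived from the same canonicalized form that `sameResult` compares, so plans with the same result hash alike.
      
      ```scala
      // Simplified stand-in for QueryPlan: canonicalization strips cosmetic differences
      // (here, just an alias); sameResult and semanticHash are both defined on it.
      case class Plan(expr: String, alias: String) {
        lazy val canonicalized: Plan = copy(alias = "")
        def sameResult(other: Plan): Boolean = canonicalized == other.canonicalized
        def semanticHash: Int = canonicalized.hashCode
      }
      
      val p1 = Plan("a + 1", alias = "x")
      val p2 = Plan("a + 1", alias = "y")
      assert(p1.sameResult(p2) && p1.semanticHash == p2.semanticHash)
      ```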
      
      follow-up: improve `CacheManager` to leverage this `semanticHash` and speed up plan lookup, instead of iterating all cached plans.
      
      ## How was this patch tested?
      
      Existing tests. Note that we don't need to test the `semanticHash` method directly: once the existing tests prove `sameResult` is correct, we are good.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #17541 from cloud-fan/plan-semantic.
      3d7f201f
    • DB Tsai's avatar
      [SPARK-20270][SQL] na.fill should not change the values in long or integer... · 1a0bc416
      DB Tsai authored
      [SPARK-20270][SQL] na.fill should not change the values in long or integer when the default value is in double
      
      ## What changes were proposed in this pull request?
      
      This bug was partially addressed in SPARK-18555 (https://github.com/apache/spark/pull/15994), but the root cause isn't completely solved. The bug is pretty critical since, in our application, it changes a Long member id whenever the id is too big to be represented losslessly as a Double.
      
      Here is an example of how this happens. With
      ```
            Seq[(java.lang.Long, java.lang.Double)]((null, 3.14), (9123146099426677101L, null),
              (9123146560113991650L, 1.6), (null, null)).toDF("a", "b").na.fill(0.2),
      ```
      the logical plan will be
      ```
      == Analyzed Logical Plan ==
      a: bigint, b: double
      Project [cast(coalesce(cast(a#232L as double), cast(0.2 as double)) as bigint) AS a#240L, cast(coalesce(nanvl(b#233, cast(null as double)), 0.2) as double) AS b#241]
      +- Project [_1#229L AS a#232L, _2#230 AS b#233]
         +- LocalRelation [_1#229L, _2#230]
      ```
      
      Note that even when the value is not null, Spark will cast the Long to Double first. Then, if it's not null, Spark casts it back to Long, which loses precision.
      
      The original value should not be changed when it's not null, but Spark changes it, which is wrong.
      
      With the PR, the logical plan will be
      ```
      == Analyzed Logical Plan ==
      a: bigint, b: double
      Project [coalesce(a#232L, cast(0.2 as bigint)) AS a#240L, coalesce(nanvl(b#233, cast(null as double)), cast(0.2 as double)) AS b#241]
      +- Project [_1#229L AS a#232L, _2#230 AS b#233]
         +- LocalRelation [_1#229L, _2#230]
      ```
      which behaves correctly without changing the original Long values and also avoids extra cost of unnecessary casting.
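
      A sketch of the user-visible effect (values taken from the example above; assumes a SparkSession named `spark`):
      
      ```scala
      import spark.implicits._
      
      val df = Seq[(java.lang.Long, java.lang.Double)](
        (9123146099426677101L, null), (null, 3.14)).toDF("a", "b")
      
      // Before this PR: column `a` was cast Long -> Double -> Long, silently altering
      // ids too large to be represented exactly as a Double.
      // After this PR: non-null Long values come back unchanged.
      df.na.fill(0.2).show(false)
      ```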
      
      ## How was this patch tested?
      
      unit test added.
      
      +cc srowen rxin cloud-fan gatorsmile
      
      Thanks.
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #17577 from dbtsai/fixnafill.
      1a0bc416
  2. Apr 09, 2017
    • Reynold Xin's avatar
      [SPARK-20264][SQL] asm should be non-test dependency in sql/core · 7bfa05e0
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      The sql/core module currently declares asm as a test-scope dependency. It should actually be a normal dependency, since the core module defines it and it comes in transitively. This occasionally confuses IntelliJ.
      
      ## How was this patch tested?
      N/A - This is a build change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #17574 from rxin/SPARK-20264.
      7bfa05e0
    • Kazuaki Ishizaki's avatar
      [SPARK-20253][SQL] Remove unnecessary nullchecks of a return value from Spark... · 7a63f5e8
      Kazuaki Ishizaki authored
      [SPARK-20253][SQL] Remove unnecessary nullchecks of a return value from Spark runtime routines in generated Java code
      
      ## What changes were proposed in this pull request?
      
      This PR eliminates unnecessary null checks of return values from known Spark runtime routines. We know whether a given Spark runtime routine returns ``null`` or not (e.g. ``ArrayData.toDoubleArray()`` never returns ``null``), so we can eliminate the null check on its return value.
      
      When we run the following example program, we currently get the Java code shown under "Without this PR". In that code, since we know ``ArrayData.toDoubleArray()`` never returns ``null``, we can eliminate the null checks at lines 90-92 and 97.
      
      ```scala
      val ds = sparkContext.parallelize(Seq(Array(1.1, 2.2)), 1).toDS.cache
      ds.count
      ds.map(e => e).show
      ```
      
      Without this PR
      ```java
      /* 050 */   protected void processNext() throws java.io.IOException {
      /* 051 */     while (inputadapter_input.hasNext() && !stopEarly()) {
      /* 052 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
      /* 053 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
      /* 054 */       ArrayData inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getArray(0));
      /* 055 */
      /* 056 */       ArrayData deserializetoobject_value1 = null;
      /* 057 */
      /* 058 */       if (!inputadapter_isNull) {
      /* 059 */         int deserializetoobject_dataLength = inputadapter_value.numElements();
      /* 060 */
      /* 061 */         Double[] deserializetoobject_convertedArray = null;
      /* 062 */         deserializetoobject_convertedArray = new Double[deserializetoobject_dataLength];
      /* 063 */
      /* 064 */         int deserializetoobject_loopIndex = 0;
      /* 065 */         while (deserializetoobject_loopIndex < deserializetoobject_dataLength) {
      /* 066 */           MapObjects_loopValue2 = (double) (inputadapter_value.getDouble(deserializetoobject_loopIndex));
      /* 067 */           MapObjects_loopIsNull2 = inputadapter_value.isNullAt(deserializetoobject_loopIndex);
      /* 068 */
      /* 069 */           if (MapObjects_loopIsNull2) {
      /* 070 */             throw new RuntimeException(((java.lang.String) references[0]));
      /* 071 */           }
      /* 072 */           if (false) {
      /* 073 */             deserializetoobject_convertedArray[deserializetoobject_loopIndex] = null;
      /* 074 */           } else {
      /* 075 */             deserializetoobject_convertedArray[deserializetoobject_loopIndex] = MapObjects_loopValue2;
      /* 076 */           }
      /* 077 */
      /* 078 */           deserializetoobject_loopIndex += 1;
      /* 079 */         }
      /* 080 */
      /* 081 */         deserializetoobject_value1 = new org.apache.spark.sql.catalyst.util.GenericArrayData(deserializetoobject_convertedArray); /*###*/
      /* 082 */       }
      /* 083 */       boolean deserializetoobject_isNull = true;
      /* 084 */       double[] deserializetoobject_value = null;
      /* 085 */       if (!inputadapter_isNull) {
      /* 086 */         deserializetoobject_isNull = false;
      /* 087 */         if (!deserializetoobject_isNull) {
      /* 088 */           Object deserializetoobject_funcResult = null;
      /* 089 */           deserializetoobject_funcResult = deserializetoobject_value1.toDoubleArray();
      /* 090 */           if (deserializetoobject_funcResult == null) {
      /* 091 */             deserializetoobject_isNull = true;
      /* 092 */           } else {
      /* 093 */             deserializetoobject_value = (double[]) deserializetoobject_funcResult;
      /* 094 */           }
      /* 095 */
      /* 096 */         }
      /* 097 */         deserializetoobject_isNull = deserializetoobject_value == null;
      /* 098 */       }
      /* 099 */
      /* 100 */       boolean mapelements_isNull = true;
      /* 101 */       double[] mapelements_value = null;
      /* 102 */       if (!false) {
      /* 103 */         mapelements_resultIsNull = false;
      /* 104 */
      /* 105 */         if (!mapelements_resultIsNull) {
      /* 106 */           mapelements_resultIsNull = deserializetoobject_isNull;
      /* 107 */           mapelements_argValue = deserializetoobject_value;
      /* 108 */         }
      /* 109 */
      /* 110 */         mapelements_isNull = mapelements_resultIsNull;
      /* 111 */         if (!mapelements_isNull) {
      /* 112 */           Object mapelements_funcResult = null;
      /* 113 */           mapelements_funcResult = ((scala.Function1) references[1]).apply(mapelements_argValue);
      /* 114 */           if (mapelements_funcResult == null) {
      /* 115 */             mapelements_isNull = true;
      /* 116 */           } else {
      /* 117 */             mapelements_value = (double[]) mapelements_funcResult;
      /* 118 */           }
      /* 119 */
      /* 120 */         }
      /* 121 */         mapelements_isNull = mapelements_value == null;
      /* 122 */       }
      /* 123 */
      /* 124 */       serializefromobject_resultIsNull = false;
      /* 125 */
      /* 126 */       if (!serializefromobject_resultIsNull) {
      /* 127 */         serializefromobject_resultIsNull = mapelements_isNull;
      /* 128 */         serializefromobject_argValue = mapelements_value;
      /* 129 */       }
      /* 130 */
      /* 131 */       boolean serializefromobject_isNull = serializefromobject_resultIsNull;
      /* 132 */       final ArrayData serializefromobject_value = serializefromobject_resultIsNull ? null : org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(serializefromobject_argValue);
      /* 133 */       serializefromobject_isNull = serializefromobject_value == null;
      /* 134 */       serializefromobject_holder.reset();
      /* 135 */
      /* 136 */       serializefromobject_rowWriter.zeroOutNullBytes();
      /* 137 */
      /* 138 */       if (serializefromobject_isNull) {
      /* 139 */         serializefromobject_rowWriter.setNullAt(0);
      /* 140 */       } else {
      /* 141 */         // Remember the current cursor so that we can calculate how many bytes are
      /* 142 */         // written later.
      /* 143 */         final int serializefromobject_tmpCursor = serializefromobject_holder.cursor;
      /* 144 */
      /* 145 */         if (serializefromobject_value instanceof UnsafeArrayData) {
      /* 146 */           final int serializefromobject_sizeInBytes = ((UnsafeArrayData) serializefromobject_value).getSizeInBytes();
      /* 147 */           // grow the global buffer before writing data.
      /* 148 */           serializefromobject_holder.grow(serializefromobject_sizeInBytes);
      /* 149 */           ((UnsafeArrayData) serializefromobject_value).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor);
      /* 150 */           serializefromobject_holder.cursor += serializefromobject_sizeInBytes;
      /* 151 */
      /* 152 */         } else {
      /* 153 */           final int serializefromobject_numElements = serializefromobject_value.numElements();
      /* 154 */           serializefromobject_arrayWriter.initialize(serializefromobject_holder, serializefromobject_numElements, 8);
      /* 155 */
      /* 156 */           for (int serializefromobject_index = 0; serializefromobject_index < serializefromobject_numElements; serializefromobject_index++) {
      /* 157 */             if (serializefromobject_value.isNullAt(serializefromobject_index)) {
      /* 158 */               serializefromobject_arrayWriter.setNullDouble(serializefromobject_index);
      /* 159 */             } else {
      /* 160 */               final double serializefromobject_element = serializefromobject_value.getDouble(serializefromobject_index);
      /* 161 */               serializefromobject_arrayWriter.write(serializefromobject_index, serializefromobject_element);
      /* 162 */             }
      /* 163 */           }
      /* 164 */         }
      /* 165 */
      /* 166 */         serializefromobject_rowWriter.setOffsetAndSize(0, serializefromobject_tmpCursor, serializefromobject_holder.cursor - serializefromobject_tmpCursor);
      /* 167 */       }
      /* 168 */       serializefromobject_result.setTotalSize(serializefromobject_holder.totalSize());
      /* 169 */       append(serializefromobject_result);
      /* 170 */       if (shouldStop()) return;
      /* 171 */     }
      /* 172 */   }
      ```
      
      With this PR (removed most of lines 90-97 in the above code)
      ```java
      /* 050 */   protected void processNext() throws java.io.IOException {
      /* 051 */     while (inputadapter_input.hasNext() && !stopEarly()) {
      /* 052 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
      /* 053 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
      /* 054 */       ArrayData inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getArray(0));
      /* 055 */
      /* 056 */       ArrayData deserializetoobject_value1 = null;
      /* 057 */
      /* 058 */       if (!inputadapter_isNull) {
      /* 059 */         int deserializetoobject_dataLength = inputadapter_value.numElements();
      /* 060 */
      /* 061 */         Double[] deserializetoobject_convertedArray = null;
      /* 062 */         deserializetoobject_convertedArray = new Double[deserializetoobject_dataLength];
      /* 063 */
      /* 064 */         int deserializetoobject_loopIndex = 0;
      /* 065 */         while (deserializetoobject_loopIndex < deserializetoobject_dataLength) {
      /* 066 */           MapObjects_loopValue2 = (double) (inputadapter_value.getDouble(deserializetoobject_loopIndex));
      /* 067 */           MapObjects_loopIsNull2 = inputadapter_value.isNullAt(deserializetoobject_loopIndex);
      /* 068 */
      /* 069 */           if (MapObjects_loopIsNull2) {
      /* 070 */             throw new RuntimeException(((java.lang.String) references[0]));
      /* 071 */           }
      /* 072 */           if (false) {
      /* 073 */             deserializetoobject_convertedArray[deserializetoobject_loopIndex] = null;
      /* 074 */           } else {
      /* 075 */             deserializetoobject_convertedArray[deserializetoobject_loopIndex] = MapObjects_loopValue2;
      /* 076 */           }
      /* 077 */
      /* 078 */           deserializetoobject_loopIndex += 1;
      /* 079 */         }
      /* 080 */
      /* 081 */         deserializetoobject_value1 = new org.apache.spark.sql.catalyst.util.GenericArrayData(deserializetoobject_convertedArray); /*###*/
      /* 082 */       }
      /* 083 */       boolean deserializetoobject_isNull = true;
      /* 084 */       double[] deserializetoobject_value = null;
      /* 085 */       if (!inputadapter_isNull) {
      /* 086 */         deserializetoobject_isNull = false;
      /* 087 */         if (!deserializetoobject_isNull) {
      /* 088 */           Object deserializetoobject_funcResult = null;
      /* 089 */           deserializetoobject_funcResult = deserializetoobject_value1.toDoubleArray();
      /* 090 */           deserializetoobject_value = (double[]) deserializetoobject_funcResult;
      /* 091 */
      /* 092 */         }
      /* 093 */
      /* 094 */       }
      /* 095 */
      /* 096 */       boolean mapelements_isNull = true;
      /* 097 */       double[] mapelements_value = null;
      /* 098 */       if (!false) {
      /* 099 */         mapelements_resultIsNull = false;
      /* 100 */
      /* 101 */         if (!mapelements_resultIsNull) {
      /* 102 */           mapelements_resultIsNull = deserializetoobject_isNull;
      /* 103 */           mapelements_argValue = deserializetoobject_value;
      /* 104 */         }
      /* 105 */
      /* 106 */         mapelements_isNull = mapelements_resultIsNull;
      /* 107 */         if (!mapelements_isNull) {
      /* 108 */           Object mapelements_funcResult = null;
      /* 109 */           mapelements_funcResult = ((scala.Function1) references[1]).apply(mapelements_argValue);
      /* 110 */           if (mapelements_funcResult == null) {
      /* 111 */             mapelements_isNull = true;
      /* 112 */           } else {
      /* 113 */             mapelements_value = (double[]) mapelements_funcResult;
      /* 114 */           }
      /* 115 */
      /* 116 */         }
      /* 117 */         mapelements_isNull = mapelements_value == null;
      /* 118 */       }
      /* 119 */
      /* 120 */       serializefromobject_resultIsNull = false;
      /* 121 */
      /* 122 */       if (!serializefromobject_resultIsNull) {
      /* 123 */         serializefromobject_resultIsNull = mapelements_isNull;
      /* 124 */         serializefromobject_argValue = mapelements_value;
      /* 125 */       }
      /* 126 */
      /* 127 */       boolean serializefromobject_isNull = serializefromobject_resultIsNull;
      /* 128 */       final ArrayData serializefromobject_value = serializefromobject_resultIsNull ? null : org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(serializefromobject_argValue);
      /* 129 */       serializefromobject_isNull = serializefromobject_value == null;
      /* 130 */       serializefromobject_holder.reset();
      /* 131 */
      /* 132 */       serializefromobject_rowWriter.zeroOutNullBytes();
      /* 133 */
      /* 134 */       if (serializefromobject_isNull) {
      /* 135 */         serializefromobject_rowWriter.setNullAt(0);
      /* 136 */       } else {
      /* 137 */         // Remember the current cursor so that we can calculate how many bytes are
      /* 138 */         // written later.
      /* 139 */         final int serializefromobject_tmpCursor = serializefromobject_holder.cursor;
      /* 140 */
      /* 141 */         if (serializefromobject_value instanceof UnsafeArrayData) {
      /* 142 */           final int serializefromobject_sizeInBytes = ((UnsafeArrayData) serializefromobject_value).getSizeInBytes();
      /* 143 */           // grow the global buffer before writing data.
      /* 144 */           serializefromobject_holder.grow(serializefromobject_sizeInBytes);
      /* 145 */           ((UnsafeArrayData) serializefromobject_value).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor);
      /* 146 */           serializefromobject_holder.cursor += serializefromobject_sizeInBytes;
      /* 147 */
      /* 148 */         } else {
      /* 149 */           final int serializefromobject_numElements = serializefromobject_value.numElements();
      /* 150 */           serializefromobject_arrayWriter.initialize(serializefromobject_holder, serializefromobject_numElements, 8);
      /* 151 */
      /* 152 */           for (int serializefromobject_index = 0; serializefromobject_index < serializefromobject_numElements; serializefromobject_index++) {
      /* 153 */             if (serializefromobject_value.isNullAt(serializefromobject_index)) {
      /* 154 */               serializefromobject_arrayWriter.setNullDouble(serializefromobject_index);
      /* 155 */             } else {
      /* 156 */               final double serializefromobject_element = serializefromobject_value.getDouble(serializefromobject_index);
      /* 157 */               serializefromobject_arrayWriter.write(serializefromobject_index, serializefromobject_element);
      /* 158 */             }
      /* 159 */           }
      /* 160 */         }
      /* 161 */
      /* 162 */         serializefromobject_rowWriter.setOffsetAndSize(0, serializefromobject_tmpCursor, serializefromobject_holder.cursor - serializefromobject_tmpCursor);
      /* 163 */       }
      /* 164 */       serializefromobject_result.setTotalSize(serializefromobject_holder.totalSize());
      /* 165 */       append(serializefromobject_result);
      /* 166 */       if (shouldStop()) return;
      /* 167 */     }
      /* 168 */   }
      ```
      
      ## How was this patch tested?
      
      Add test suites to ``DatasetPrimitiveSuite``
      
      Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
      
      Closes #17569 from kiszk/SPARK-20253.
      7a63f5e8
    • Vijay Ramesh's avatar
      [SPARK-20260][MLLIB] String interpolation required for error message · 261eaf51
      Vijay Ramesh authored
      ## What changes were proposed in this pull request?
      This error message doesn't get properly formatted because of a missing `s`.  Currently the error looks like:
      
      ```
      Caused by: java.lang.IllegalArgumentException: requirement failed: indices should be one-based and in ascending order; found current=$current, previous=$previous; line="$line"
      ```
      (note the literal `$current` instead of the interpolated value)
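
      A minimal illustration of the fix (not the exact MLlib message):
      
      ```scala
      val current = 3
      // Without the `s` prefix the placeholders are printed literally:
      println("found current=$current")   // found current=$current
      // With the interpolator the value is substituted as intended:
      println(s"found current=$current")  // found current=3
      ```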
      
      Author: Vijay Ramesh <vramesh@demandbase.com>
      
      Closes #17572 from vijaykramesh/master.
      261eaf51
    • Sean Owen's avatar
      [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer performance improvement · 1f0de3c1
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Avoid `NoSuchElementException` every time `ConfigProvider.get(val, default)` falls back to default. This apparently causes non-trivial overhead in at least one path, and can easily be avoided.
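
      A sketch of the idea (hypothetical helper, not the actual `ConfigProvider` code): return the default directly instead of throwing and catching `NoSuchElementException` on every miss.
      
      ```scala
      // Exception-based fallback: throws and catches on every miss (hot-path overhead).
      def getSlow(conf: Map[String, String], key: String, default: String): String =
        try conf(key) catch { case _: NoSuchElementException => default }
      
      // Direct fallback: no exception on the common "key absent" path.
      def getFast(conf: Map[String, String], key: String, default: String): String =
        conf.getOrElse(key, default)
      ```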
      
      See https://github.com/apache/spark/pull/17329
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17567 from srowen/SPARK-19991.
      1f0de3c1
    • asmith26's avatar
      [MINOR] Issue: Change "slice" vs "partition" in exception messages (and code?) · 34fc48fb
      asmith26 authored
      ## What changes were proposed in this pull request?
      
      I came across the term "slice" when running some Spark Scala code. A Google search indicated that "slices" and "partitions" refer to the same thing; indeed see:
      
      - [This issue](https://issues.apache.org/jira/browse/SPARK-1701)
      - [This pull request](https://github.com/apache/spark/pull/2305)
      - [This StackOverflow answer](http://stackoverflow.com/questions/23436640/what-is-the-difference-between-an-rdd-partition-and-a-slice) and [this one](http://stackoverflow.com/questions/24269495/what-are-the-differences-between-slices-and-partitions-of-rdds)
      
      Thus this pull request fixes the occurrence of "slice" I came across. Nonetheless, [it would appear](https://github.com/apache/spark/search?utf8=%E2%9C%93&q=slice&type=) there are still many references to "slice"/"slices" - hence this pull request to address the issue (sorry if this is the wrong place; I'm not too familiar with raising Apache issues).
      
      ## How was this patch tested?
      
      (Not tested locally - only a minor exception message change.)
      
      Author: asmith26 <asmith26@users.noreply.github.com>
      
      Closes #17565 from asmith26/master.
      34fc48fb
  3. Apr 07, 2017
    • Reynold Xin's avatar
      [SPARK-20262][SQL] AssertNotNull should throw NullPointerException · e1afc4dc
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      AssertNotNull currently throws RuntimeException. It should throw NullPointerException, which is more specific.
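
      A sketch of the behavior change (hypothetical helper with an illustrative signature, not the generated code):
      
      ```scala
      // A null value now surfaces as the more specific NullPointerException
      // instead of a plain RuntimeException.
      def assertNotNull[T](value: T, walkedTypePath: Seq[String]): T = {
        if (value == null) {
          throw new NullPointerException(
            s"Null value appeared in non-nullable field: ${walkedTypePath.mkString(", ")}")
        }
        value
      }
      ```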
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #17573 from rxin/SPARK-20262.
      e1afc4dc
    • Wenchen Fan's avatar
      [SPARK-20246][SQL] should not push predicate down through aggregate with... · 7577e9c3
      Wenchen Fan authored
      [SPARK-20246][SQL] should not push predicate down through aggregate with non-deterministic expressions
      
      ## What changes were proposed in this pull request?
      
      Similar to `Project`, when `Aggregate` has non-deterministic expressions, we should not push predicate down through it, as it will change the number of input rows and thus change the evaluation result of non-deterministic expressions in `Aggregate`.
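
      A sketch of the kind of query affected (assumes a SparkSession named `spark`; column names are illustrative): the filter on the grouping column `k` could previously be pushed below the aggregate, but now stays above it because the aggregate contains `rand(0)`.
      
      ```scala
      import org.apache.spark.sql.functions.expr
      
      spark.range(10)
        .selectExpr("id % 2 AS k", "id AS v")
        .groupBy("k")
        .agg(expr("sum(v)").as("s"), expr("rand(0)").as("r"))
        .where("k = 0")  // no longer pushed below the aggregate: rand(0) is non-deterministic
      ```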
      
      ## How was this patch tested?
      
      new regression test
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #17562 from cloud-fan/filter.
      7577e9c3
    • Adrian Ionescu's avatar
      [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex · 589f3edb
      Adrian Ionescu authored
      ## What changes were proposed in this pull request?
      
      Trying to get a grip on the `FileIndex` hierarchy, I was confused by the following inconsistency:
      
      On the one hand, `PartitioningAwareFileIndex` defines `leafFiles` and `leafDirToChildrenFiles` as abstract, but on the other it fully implements `listLeafFiles` which does all the listing of files. However, the latter is only used by `InMemoryFileIndex`.
      
      I'm hereby proposing to move this method (and all its dependencies) to the implementation class that actually uses it, and thus unclutter the `PartitioningAwareFileIndex` interface.
      
      ## How was this patch tested?
      
      `./build/sbt sql/test`
      
      Author: Adrian Ionescu <adrian@databricks.com>
      
      Closes #17570 from adrian-ionescu/list-leaf-files.
      589f3edb
    • actuaryzhang's avatar
      [SPARK-20258][DOC][SPARKR] Fix SparkR logistic regression example in... · 1ad73f0a
      actuaryzhang authored
      [SPARK-20258][DOC][SPARKR] Fix SparkR logistic regression example in programming guide (did not converge)
      
      ## What changes were proposed in this pull request?
      
      The SparkR logistic regression example in the programming guide did not converge (with IRWLS). All estimates are essentially zero:
      
      ```
      training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
      df_list2 <- randomSplit(training2, c(7,3), 2)
      binomialDF <- df_list2[[1]]
      binomialTestDF <- df_list2[[2]]
      binomialGLM <- spark.glm(binomialDF, label ~ features, family = "binomial")
      
      17/04/07 11:42:03 WARN WeightedLeastSquares: Cholesky solver failed due to singular covariance matrix. Retrying with Quasi-Newton solver.
      
      > summary(binomialGLM)
      
      Coefficients:
                       Estimate
      (Intercept)    9.0255e+00
      features_0     0.0000e+00
      features_1     0.0000e+00
      features_2     0.0000e+00
      features_3     0.0000e+00
      features_4     0.0000e+00
      features_5     0.0000e+00
      features_6     0.0000e+00
      features_7     0.0000e+00
      ```
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17571 from actuaryzhang/programGuide2.
      1ad73f0a
    • Felix Cheung's avatar
      [SPARK-20197][SPARKR] CRAN check fail with package installation · 8feb799a
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Test failed because SPARK_HOME is not set before Spark is installed.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17516 from felixcheung/rdircheckincran.
      8feb799a
    • actuaryzhang's avatar
      [SPARK-20026][DOC][SPARKR] Add Tweedie example for SparkR in programming guide · 870b9d9a
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      Add Tweedie example for SparkR in programming guide.
      The doc was already updated in #17103.
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17553 from actuaryzhang/programGuide.
      870b9d9a
    • 郭小龙 10207633's avatar
      [SPARK-20218][DOC][APP-ID] '/applications/[app-id]/stages' in REST API, add description. · 9e0893b5
      郭小龙 10207633 authored
      ## What changes were proposed in this pull request?
      
      1. '/applications/[app-id]/stages' in the REST API: the status parameter should get the description '?status=[active|complete|pending|failed] list only stages in the state.'
      
      Without this description, users of this API do not know that they can filter the stage list by status.
      
      2. '/applications/[app-id]/stages/[stage-id]' in the REST API: remove the redundant description '?status=[active|complete|pending|failed] list only stages in the state.', because a single stage is already determined by the stage-id.
      
      code:
      
          @GET
          def stageList(@QueryParam("status") statuses: JList[StageStatus]): Seq[StageData] = {
            val listener = ui.jobProgressListener
            val stageAndStatus = AllStagesResource.stagesAndStatus(ui)
            val adjStatuses = {
              if (statuses.isEmpty()) {
                Arrays.asList(StageStatus.values(): _*)
              } else {
                statuses
              }
            };
      
      ## How was this patch tested?
      
      manual tests
      
      Author: 郭小龙 10207633 <guo.xiaolong1@zte.com.cn>
      
      Closes #17534 from guoxiaolongzte/SPARK-20218.
      9e0893b5
    • Liang-Chi Hsieh's avatar
      [SPARK-20076][ML][PYSPARK] Add Python interface for ml.stats.Correlation · 1a52a623
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      The DataFrame-based support for correlation statistics was added in #17108. This patch adds the Python interface for it.
      
      ## How was this patch tested?
      
      Python unit test.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #17494 from viirya/correlation-python-api.
      1a52a623
    • Wenchen Fan's avatar
      [SPARK-20245][SQL][MINOR] pass output to LogicalRelation directly · ad3cc131
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Currently `LogicalRelation` has an `expectedOutputAttributes` parameter, which makes it hard to reason about what the actual output is. Like other leaf nodes, `LogicalRelation` should take `output` as a parameter directly, to simplify the logic.
      
      ## How was this patch tested?
      
      existing tests
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #17552 from cloud-fan/minor.
      ad3cc131
  4. Apr 06, 2017
    • Reynold Xin's avatar
      [SPARK-19495][SQL] Make SQLConf slightly more extensible - addendum · 626b4caf
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This is a tiny addendum to SPARK-19495 to remove the private visibility for copy, which is the only package private method in the entire file.
      
      ## How was this patch tested?
      N/A - no semantic change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #17555 from rxin/SPARK-19495-2.
      626b4caf
    • Dustin Koupal's avatar
      [MINOR][DOCS] Fix typo in Hive Examples · 8129d59d
      Dustin Koupal authored
      ## What changes were proposed in this pull request?
      
      Fix typo in hive examples from "DaraFrames" to "DataFrames"
      
      ## How was this patch tested?
      
      N/A
      
      Author: Dustin Koupal <dkoupal@blizzard.com>
      
      Closes #17554 from cooper6581/typo-daraframes.
      8129d59d
    • jerryshao's avatar
      [SPARK-17019][CORE] Expose on-heap and off-heap memory usage in various places · a4491626
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      With [SPARK-13992](https://issues.apache.org/jira/browse/SPARK-13992), Spark supports persisting data in off-heap memory, but the usage of on-heap and off-heap memory is not currently exposed. That makes it inconvenient for users to monitor and profile, so this PR proposes to expose off-heap as well as on-heap memory usage in various places:
      1. Spark UI's executor page will display both on-heap and off-heap memory usage.
      2. REST request returns both on-heap and off-heap memory.
      3. Also this can be gotten from MetricsSystem.
      4. Last this usage can be obtained programmatically from SparkListener.
      
      Attach the UI changes:
      
      ![screen shot 2016-08-12 at 11 20 44 am](https://cloud.githubusercontent.com/assets/850797/17612032/6c2f4480-607f-11e6-82e8-a27fb8cbb4ae.png)
      
      Backward compatibility is also considered for the event log and the REST API. Old event logs can still be replayed, with off-heap usage displayed as 0. For the REST API, only new fields are added, so JSON backward compatibility is kept.
      
      ## How was this patch tested?
      
      Unit test added and manual verification.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #14617 from jerryshao/SPARK-17019.
      a4491626
    • Felix Cheung's avatar
      [SPARK-20195][SPARKR][SQL] add createTable catalog API and deprecate createExternalTable · 5a693b41
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Following up on #17483, add createTable (which is new in 2.2.0) and deprecate createExternalTable, plus a number of minor fixes
      
      ## How was this patch tested?
      
      manual, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17511 from felixcheung/rceatetable.
      5a693b41
    • Felix Cheung's avatar
      [SPARK-20196][PYTHON][SQL] update doc for catalog functions for all languages,... · bccc3301
      Felix Cheung authored
      [SPARK-20196][PYTHON][SQL] update doc for catalog functions for all languages, add pyspark refreshByPath API
      
      ## What changes were proposed in this pull request?
      
      Update doc to remove external for createTable, add refreshByPath in python
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17512 from felixcheung/catalogdoc.
      bccc3301
    • setjet's avatar
      [SPARK-20064][PYSPARK] Bump the PySpark version number to 2.2 · d009fb36
      setjet authored
      ## What changes were proposed in this pull request?
      The PySpark version in version.py was lagging behind.
      Versioning is in line with PEP 440: https://www.python.org/dev/peps/pep-0440/
      
      ## How was this patch tested?
      Simply rebuild the project with existing tests
      
      Author: setjet <rubenljanssen@gmail.com>
      Author: Ruben Janssen <rubenljanssen@gmail.com>
      
      Closes #17523 from setjet/SPARK-20064.
      d009fb36
    • Kalvin Chau's avatar
      [SPARK-20085][MESOS] Configurable mesos labels for executors · c8fc1f3b
      Kalvin Chau authored
      ## What changes were proposed in this pull request?
      
      Add a spark.mesos.task.labels configuration option to attach Mesos key:value labels to executors.
      
      The format is "k1:v1,k2:v2", with colons separating keys from values and commas separating entries.
      
      Discussion of labels with mgummelt at #17404
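
      A sketch of parsing that format (illustrative helper, not the actual scheduler-backend code):
      
      ```scala
      // "k1:v1,k2:v2" -> Map("k1" -> "v1", "k2" -> "v2")
      def parseLabels(spec: String): Map[String, String] =
        spec.split(",").filter(_.nonEmpty).map { kv =>
          val Array(key, value) = kv.split(":", 2)
          key -> value
        }.toMap
      
      parseLabels("k1:v1,k2:v2")  // Map(k1 -> v1, k2 -> v2)
      ```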
      
      ## How was this patch tested?
      
      Added unit tests to verify labels were added correctly, with incorrect labels being ignored and added a test to test the name of the executor.
      
      Tested with: `./build/sbt -Pmesos mesos/test`
      
      Author: Kalvin Chau <kalvin.chau@viasat.com>
      
      Closes #17413 from kalvinnchau/mesos-labels.
      c8fc1f3b
    • Bryan Cutler's avatar
      [SPARK-19953][ML] Random Forest Models use parent UID when being fit · e156b5dd
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      The ML `RandomForestClassificationModel` and `RandomForestRegressionModel` were not using the estimator parent UID when being fit.  This change fixes that so the models can be properly be identified with their parents.
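
      A sketch of the property the added check enforces (assumes a training DataFrame `trainingDf` with the usual `label`/`features` columns):
      
      ```scala
      import org.apache.spark.ml.classification.RandomForestClassifier
      
      val rf = new RandomForestClassifier().setNumTrees(3)
      val model = rf.fit(trainingDf)  // trainingDf assumed to exist
      // With this change the fitted model carries its parent estimator's UID.
      assert(model.uid == rf.uid)
      ```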
      
      ## How was this patch tested?
      
      Existing tests.
      
      Added check to verify that model uid matches that of the parent, then renamed `checkCopy` to `checkCopyAndUids` and verified that it was called by one test for each ML algorithm.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #17296 from BryanCutler/rfmodels-use-parent-uid-SPARK-19953.
      e156b5dd
  5. Apr 05, 2017
    • Eric Liang's avatar
      [SPARK-20217][CORE] Executor should not fail stage if killed task throws non-interrupted exception · 5142e5d4
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
      If tasks throw non-interrupted exceptions on kill (e.g. java.nio.channels.ClosedByInterruptException), their death is reported back as TaskFailed instead of TaskKilled. This causes stage failure in some cases.
      
      This is reproducible as follows. Run the following, and then use SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will fail since we threw a RuntimeException instead of InterruptedException.
      
      ```
      spark.range(100).repartition(100).foreach { i =>
        try {
          Thread.sleep(10000000)
        } catch {
          case t: InterruptedException =>
            throw new RuntimeException(t)
        }
      }
      ```
      Based on the code in TaskSetManager, I think this also affects kills of speculative tasks. However, since the number of speculative tasks is small, and you usually need to fail a task a few times before the stage is cancelled, it is unlikely this would be noticed in production unless speculation was enabled and the number of allowed task failures was set to 1.
      
      We should probably unconditionally return TaskKilled instead of TaskFailed if the task was killed by the driver, regardless of the actual exception thrown.
      
      ## How was this patch tested?
      
      Unit test. The test fails before the change in Executor.scala
      
      cc JoshRosen
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #17531 from ericl/fix-task-interrupt.
      5142e5d4
    • Ioana Delaney's avatar
      [SPARK-20231][SQL] Refactor star schema code for the subsequent star join detection in CBO · 4000f128
      Ioana Delaney authored
      ## What changes were proposed in this pull request?
      
      This commit moves star schema code from ```join.scala``` to ```StarSchemaDetection.scala```. It also applies some minor fixes in ```StarJoinReorderSuite.scala```.
      
      ## How was this patch tested?
      Run existing ```StarJoinReorderSuite.scala```.
      
      Author: Ioana Delaney <ioanamdelaney@gmail.com>
      
      Closes #17544 from ioana-delaney/starSchemaCBOv2.
      4000f128
    • Liang-Chi Hsieh's avatar
      [SPARK-20214][ML] Make sure converted csc matrix has sorted indices · 12206058
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      `_convert_to_vector` converts a scipy sparse matrix to a csc matrix for initializing `SparseVector`. However, it doesn't guarantee that the converted csc matrix has sorted indices, so a failure happens when you do something like this:
      
          from scipy.sparse import lil_matrix
          lil = lil_matrix((4, 1))
          lil[1, 0] = 1
          lil[3, 0] = 2
          _convert_to_vector(lil.todok())
      
          File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 78, in _convert_to_vector
            return SparseVector(l.shape[0], csc.indices, csc.data)
          File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 556, in __init__
            % (self.indices[i], self.indices[i + 1]))
          TypeError: Indices 3 and 1 are not strictly increasing
      
      A simple test can confirm that `dok_matrix.tocsc()` won't guarantee sorted indices:
      
          >>> from scipy.sparse import lil_matrix
          >>> lil = lil_matrix((4, 1))
          >>> lil[1, 0] = 1
          >>> lil[3, 0] = 2
          >>> dok = lil.todok()
          >>> csc = dok.tocsc()
          >>> csc.has_sorted_indices
          0
          >>> csc.indices
          array([3, 1], dtype=int32)
      
      I checked the scipy source code. The only way to guarantee sorted indices is to go through `csc_matrix.tocsr()` / `csr_matrix.tocsc()`.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #17532 from viirya/make-sure-sorted-indices.
      12206058
    • Dilip Biswal's avatar
      [SPARK-20204][SQL][FOLLOWUP] SQLConf should react to change in default timezone settings · 9d68c672
      Dilip Biswal authored
      ## What changes were proposed in this pull request?
      Make sure SESSION_LOCAL_TIMEZONE reflects the change in JVM's default timezone setting. Currently several timezone related tests fail as the change to default timezone is not picked up by SQLConf.
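
      A toy illustration of the pitfall (not the SQLConf code): a default captured once at initialization never sees a later `TimeZone.setDefault`, while one read at use time does.
      
      ```scala
      import java.util.TimeZone
      
      object FrozenDefault { val timeZoneId: String = TimeZone.getDefault.getID }  // captured once, at init
      object LiveDefault   { def timeZoneId: String = TimeZone.getDefault.getID }  // re-read on every call
      
      val before = FrozenDefault.timeZoneId  // forces initialization with the original default
      TimeZone.setDefault(TimeZone.getTimeZone("Asia/Tokyo"))
      FrozenDefault.timeZoneId == before     // true: the later change is not picked up
      LiveDefault.timeZoneId                 // "Asia/Tokyo": reflects the change
      ```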
      
      ## How was this patch tested?
      Added a unit test in ConfigEntrySuite.
      
      Author: Dilip Biswal <dbiswal@us.ibm.com>
      
      Closes #17537 from dilipbiswal/timezone_debug.
      9d68c672
    • Tathagata Das's avatar
      [SPARK-20224][SS] Updated docs for streaming dropDuplicates and mapGroupsWithState · 9543fc0e
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      - Fixed bug in Java API not passing timeout conf to scala API
      - Updated markdown docs
      - Updated scala docs
      - Added scala and Java example
      
      ## How was this patch tested?
      Manually ran examples.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #17539 from tdas/SPARK-20224.
      9543fc0e
    • zero323's avatar
      [SPARK-19454][PYTHON][SQL] DataFrame.replace improvements · e2773996
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Allows skipping `value` argument if `to_replace` is a `dict`:
      	```python
      	df = sc.parallelize([("Alice", 1, 3.0)]).toDF()
      	df.replace({"Alice": "Bob"}).show()
      	```
      - Adds validation step to ensure homogeneous values / replacements.
      - Simplifies internal control flow.
      - Improves unit tests coverage.
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, manual testing.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #16793 from zero323/SPARK-19454.
      e2773996
    • wangzhenhua's avatar
      [SPARK-20223][SQL] Fix typo in tpcds q77.sql · a2d8d767
      wangzhenhua authored
      ## What changes were proposed in this pull request?
      
      Fix typo in tpcds q77.sql
      
      ## How was this patch tested?
      
      N/A
      
      Author: wangzhenhua <wangzhenhua@huawei.com>
      
      Closes #17538 from wzhfy/typoQ77.
      a2d8d767
    • shaolinliu's avatar
      [SPARK-19807][WEB UI] Add reason for cancellation when a stage is killed using web UI · 71c3c481
      shaolinliu authored
      ## What changes were proposed in this pull request?
      
      When a user kills a stage using the web UI (on the Stages page), StagesTab.handleKillRequest asks SparkContext to cancel the stage without giving a reason. SparkContext has cancelStage(stageId: Int, reason: String), which Spark could use to pass that information along for monitoring/debugging purposes.
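
      A sketch of the call the UI handler could make instead (names `sc` and `stageId` are assumed; the two-argument overload is the one mentioned above):
      
      ```scala
      // The reason string then shows up alongside the cancellation in the UI and logs.
      sc.cancelStage(stageId, "cancelled from the web UI Stages page")
      ```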
      
      ## How was this patch tested?
      
      manual tests
      
      Author: shaolinliu <liu.shaolin1@zte.com.cn>
      Author: lvdongr <lv.dongdong@zte.com.cn>
      
      Closes #17258 from shaolinliu/SPARK-19807.
      71c3c481
    • Oliver Köth's avatar
      [SPARK-20042][WEB UI] Fix log page buttons for reverse proxy mode · 6f09dc70
      Oliver Köth authored
      With spark.ui.reverseProxy=true, full-path URLs like /log point to the master web endpoint, which serves the worker UI as a reverse proxy. To access a REST endpoint on the worker in reverse proxy mode, the leading /proxy/"target"/ part of the base URI must be retained.
      
      Added logic to log-view.js to handle this, similar to executorspage.js.
      
      Patch was tested manually
      
      Author: Oliver Köth <okoeth@de.ibm.com>
      
      Closes #17370 from okoethibm/master.
      6f09dc70
    • Tathagata Das's avatar
      [SPARK-20209][SS] Execute next trigger immediately if previous batch took... · dad499f3
      Tathagata Das authored
      [SPARK-20209][SS] Execute next trigger immediately if previous batch took longer than trigger interval
      
      ## What changes were proposed in this pull request?
      
      For large trigger intervals (e.g. 10 minutes), if a batch takes 11 minutes, then it will wait for 9 minutes before starting the next batch. This does not make sense. The processing-time-based trigger policy should be to process batches as fast as possible, but no faster than one per trigger interval. If batches are already taking longer than the trigger interval, there is no point waiting an extra trigger interval.
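
      A toy sketch of the intended policy (not the actual `ProcessingTimeExecutor` code): the next trigger time is derived from the batch's start time, so a batch that overruns the interval is followed immediately by the next one.
      
      ```scala
      // now() and waitUntil() stand in for a clock; runBatch() runs one micro-batch.
      def processingTimeLoop(intervalMs: Long, runBatch: () => Unit,
                             now: () => Long, waitUntil: Long => Unit): Unit = {
        require(intervalMs > 0)
        while (true) {
          val batchStart = now()
          val nextTrigger = batchStart / intervalMs * intervalMs + intervalMs
          runBatch()
          waitUntil(nextTrigger)  // returns immediately if the batch already ran past it
        }
      }
      ```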
      
      In this PR, I modified the ProcessingTimeExecutor to do so. Another minor change I did was to extract our StreamManualClock into a separate class so that it can be used outside subclasses of StreamTest. For example, ProcessingTimeExecutorSuite does not need to create any context for testing, just needs the StreamManualClock.
      
      ## How was this patch tested?
      Added new unit tests to comprehensively test this behavior.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #17525 from tdas/SPARK-20209.
      dad499f3
    • Reynold Xin's avatar
      Small doc fix for ReuseSubquery. · b6e71032
      Reynold Xin authored
      b6e71032
    • Felix Cheung's avatar
      [SPARKR][DOC] update doc for fpgrowth · c1b8b667
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      minor update
      
      zero323
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17526 from felixcheung/rfpgrowthfollowup.
      c1b8b667