- Jul 25, 2016
Philipp Hoffmann authored
## What changes were proposed in this pull request?

Mesos agents by default will not pull Docker images that are already cached locally. In order to run Spark executors from mutable tags like `:latest`, this commit introduces a Spark setting, `spark.mesos.executor.docker.forcePullImage`. Setting this flag to `true` tells the Mesos agent to force pull the Docker image; the default is `false`, which is consistent with the previous implementation and Mesos' default behaviour.

## How was this patch tested?

I ran a sample application including this change on a Mesos cluster and verified the correct behaviour both with and without force pulling the executor image. As expected, the image is force pulled when the flag is set.

Author: Philipp Hoffmann <mail@philipphoffmann.de>

Closes #13051 from philipphoffmann/force-pull-image.
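For context, a minimal sketch of enabling the new flag from application code, assuming a Mesos cluster; the executor image name is a hypothetical placeholder, and master/deploy settings are assumed to come from spark-submit:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: "example/spark-executor:latest" is a placeholder image name.
// forcePullImage makes the Mesos agent re-pull the image even when a copy is
// already cached locally, which matters for mutable tags like :latest.
val conf = new SparkConf()
  .set("spark.mesos.executor.docker.image", "example/spark-executor:latest")
  .set("spark.mesos.executor.docker.forcePullImage", "true")

val spark = SparkSession.builder().config(conf).getOrCreate()
```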
Reynold Xin authored
## What changes were proposed in this pull request?

This patch removes dev/audit-release. It was initially created to do basic release auditing, but it has been unused for more than a year.

## How was this patch tested?

N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #14342 from rxin/SPARK-16685.
WeichenXu authored
## What changes were proposed in this pull request?

Change the ANN convergence tolerance param default from 1e-4 to 1e-6 so that it matches the other algorithms in MLlib that use LBFGS as the optimizer.

## How was this patch tested?

Existing tests.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14286 from WeichenXu123/update_ann_tol.
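For reference, a minimal sketch of where this tolerance surfaces in the ML API; the layer sizes and the commented-out training data are hypothetical:

```scala
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

// Sketch only: the multilayer perceptron (ANN) exposes the LBFGS convergence
// tolerance through setTol; 1e-6 mirrors the new default described above.
val mlp = new MultilayerPerceptronClassifier()
  .setLayers(Array[Int](4, 5, 3)) // hypothetical: 4 inputs, 5 hidden units, 3 classes
  .setTol(1e-6)
  .setMaxIter(100)
// val model = mlp.fit(trainingData)  // trainingData: a DataFrame with "features"/"label"
```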
Felix Cheung authored
## What changes were proposed in this pull request?

Fix a broken URL. Also, the sparkR.session.stop doc page should have that name in its header instead of saying "sparkR.stop". The Data type section is in the middle of a list of gapply/gapplyCollect subsections (screenshots in the PR, not reproduced here).

## How was this patch tested?

Manual test.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #14329 from felixcheung/rdoclinkfix.
Cheng Lian authored
## What changes were proposed in this pull request?

This PR fixes a minor formatting issue of `WindowSpecDefinition.sql` when no partitioning expressions are present.

Before:

```sql
( ORDER BY `a` ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```

After:

```sql
(ORDER BY `a` ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```

## How was this patch tested?

New test case added in `ExpressionSQLBuilderSuite`.

Author: Cheng Lian <lian@databricks.com>

Closes #14334 from liancheng/window-spec-sql-format.
hyukjinkwon authored
## What changes were proposed in this pull request?

This seems to be a regression, judging from https://issues.apache.org/jira/browse/SPARK-16698. A field name containing dots throws an exception. For example, the code below:

```scala
val path = "/tmp/path"
val json = """{"a.b":"data"}"""
spark.sparkContext
  .parallelize(json :: Nil)
  .saveAsTextFile(path)
spark.read.json(path).collect()
```

throws an exception as below:

```
Unable to resolve a.b given [a.b];
org.apache.spark.sql.AnalysisException: Unable to resolve a.b given [a.b];
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
  at scala.Option.getOrElse(Option.scala:121)
```

This problem was introduced in https://github.com/apache/spark/commit/17eec0a71ba8713c559d641e3f43a1be726b037c#diff-27c76f96a7b2733ecfd6f46a1716e153R121

When extracting the data columns, it does not account for field names that contain dots. Also, field names are not expected to be quoted when defining a schema, so there is no need to consider whether a name is wrapped in quotes: the actual schema (inferred or user-given) would not carry quotes around field names. For example, this throws an exception (loading JSON from an RDD is fine):

```scala
val json = """{"a.b":"data"}"""
val rdd = spark.sparkContext.parallelize(json :: Nil)
spark.read.schema(StructType(Seq(StructField("`a.b`", StringType, true))))
  .json(rdd).select("`a.b`").printSchema()
```

as below:

````
cannot resolve '```a.b```' given input columns: [`a.b`];
org.apache.spark.sql.AnalysisException: cannot resolve '```a.b```' given input columns: [`a.b`];
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
````

## How was this patch tested?

Unit tests in `FileSourceStrategySuite`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14339 from HyukjinKwon/SPARK-16698-regression.
Sameer Agarwal authored
[SPARK-16668][TEST] Test parquet reader for row groups containing both dictionary and plain encoded pages

## What changes were proposed in this pull request?

This patch adds an explicit test for [SPARK-14217] by setting the parquet dictionary and page size such that the generated parquet file spans 3 pages (within a single row group), where the first page is dictionary encoded and the remaining two are plain encoded.

## How was this patch tested?

1. `ParquetEncodingSuite`
2. Also manually tested that this test fails without https://github.com/apache/spark/pull/12279

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14304 from sameeragarwal/hybrid-encoding-test.
Wenchen Fan authored
## What changes were proposed in this pull request?

It's weird that we have `BucketSpec` to abstract bucket info, but don't use it in `CatalogTable`. This PR moves `BucketSpec` into the catalyst module.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #14331 from cloud-fan/check.
Wenchen Fan authored
## What changes were proposed in this pull request?

`CreateViewCommand` only needs some of the information in a `CatalogTable`, not all of it. We rely on some tricks (e.g. we need to check that the table type is `VIEW`, and we need to make `CatalogColumn.dataType` nullable) to allow it to take a `CatalogTable`. This PR cleans that up and passes only the necessary information to `CreateViewCommand`.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #14297 from cloud-fan/minor2.
hyukjinkwon authored
## What changes were proposed in this pull request?

Currently, `JDBCRDD.compute` does type dispatch for each row to read the appropriate values. This does not have to be done per row, because the schema is already kept in `JDBCRDD`. Instead, appropriate converters can be created up front according to the schema and then applied to each row.

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14313 from HyukjinKwon/SPARK-16674.
Cheng Lian authored
## What changes were proposed in this pull request?

The default `TreeNode.withNewChildren` implementation doesn't work for `Last` when both constructor arguments are the same, e.g.:

```sql
LAST_VALUE(FALSE) -- The 2nd argument defaults to FALSE
LAST_VALUE(FALSE, FALSE)
LAST_VALUE(TRUE, TRUE)
```

This is because although `Last` is a unary expression, both of its constructor arguments, `child` and `ignoreNullsExpr`, are `Expression`s. When they have the same value, `TreeNode.withNewChildren` mistakenly treats both of them as child nodes. `First` is affected by this issue in exactly the same way. This PR fixes the issue by making `ignoreNullsExpr` a child expression of `First` and `Last`.

## How was this patch tested?

New test case added in `WindowQuerySuite`.

Author: Cheng Lian <lian@databricks.com>

Closes #14295 from liancheng/spark-16648-last-value.
- Jul 24, 2016
Qifan Pu authored
In the following code in `VectorizedHashMapGenerator.scala`:

```
def hashBytes(b: String): String = {
  val hash = ctx.freshName("hash")
  s"""
     |int $result = 0;
     |for (int i = 0; i < $b.length; i++) {
     |  ${genComputeHash(ctx, s"$b[i]", ByteType, hash)}
     |  $result = ($result ^ (0x9e3779b9)) + $hash + ($result << 6) + ($result >>> 2);
     |}
   """.stripMargin
}
```

when `b = input.getBytes()`, the current 2.0 code results in `getBytes()` being called n times, n being the length of the input. `getBytes()` involves a memory copy and is thus expensive, causing a performance degradation. The fix is to evaluate `getBytes()` once, before the for loop.

Performance bug, no additional test added.

Author: Qifan Pu <qifan.pu@gmail.com>

Closes #14337 from ooq/SPARK-16699.

(cherry picked from commit d226dce1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Brian Cho authored
…close between each partition

## What changes were proposed in this pull request?

Replace commitAndClose with separate commit and close to avoid opening and closing the file between partitions.

## How was this patch tested?

Run existing unit tests; add a few unit tests regarding reverts. Observed a ~20% reduction in total time in tasks on stages with shuffle writes to many partitions.

JoshRosen

Author: Brian Cho <bcho@fb.com>

Closes #13382 from dafrista/separatecommit-master.
Wenchen Fan authored
## What changes were proposed in this pull request?

We also store data source table options in this field, so it's unreasonable to call it `serdeProperties`.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #14283 from cloud-fan/minor1.
Mikael Ståldal authored
## What changes were proposed in this pull request?

Force eager creation of loggers to avoid shutdown hook conflicts.

## How was this patch tested?

Manually tested with a project using Log4j 2, verified that the shutdown hook conflict issue was solved.

Author: Mikael Ståldal <mikael.staldal@magine.com>

Closes #14320 from mikaelstaldal/shutdown-hook-logging.
WeichenXu authored
## What changes were proposed in this pull request?

Add a `SparseMatrix` class which supports the pickler.

## How was this patch tested?

Existing tests.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14265 from WeichenXu123/picklable_py.
Dongjoon Hyun authored
## What changes were proposed in this pull request?

This PR adds a boolean option, `truncate`, for `SaveMode.Overwrite` of the JDBC DataFrameWriter. If this option is `true`, it tries to take advantage of `TRUNCATE TABLE` instead of `DROP TABLE`. This is a trivial option, but it provides great **convenience** for BI tool users working on RDBMS tables generated by Spark.

**Goal**

- Without the `CREATE/DROP` privilege, we can save a dataframe to the database. Sometimes these privileges are not allowed for security reasons.
- It preserves the existing table information, so users can add and keep additional `INDEX` and `CONSTRAINT`s for the table.
- Sometimes, `TRUNCATE` is faster than the combination of `DROP/CREATE`.

**Supported DBMS**

The following is the `truncate`-option support table. Due to the different behavior of `TRUNCATE TABLE` among DBMSs, it's not always safe to use `TRUNCATE TABLE`. Spark will ignore the `truncate` option for **unknown** DBMSs and for **some** DBMSs with **default CASCADING** behavior. Newly added JDBCDialects should implement the corresponding function to support the `truncate` option additionally.

| Spark Dialects | `truncate` OPTION SUPPORT |
|--------------------|---------------------------|
| MySQLDialect       | O |
| PostgresDialect    | X |
| DB2Dialect         | O |
| MsSqlServerDialect | O |
| DerbyDialect       | O |
| OracleDialect      | O |

**Before (TABLE with INDEX case)**: Spark shell & MySQL CLI are interleaved intentionally.

```scala
scala> val (url, prop) = ("jdbc:mysql://localhost:3306/temp?useSSL=false", new java.util.Properties)
scala> prop.setProperty("user", "root")
scala> df.write.mode("overwrite").jdbc(url, "table_with_index", prop)
scala> spark.range(10).write.mode("overwrite").jdbc(url, "table_with_index", prop)

mysql> DESC table_with_index;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | bigint(20) | NO   |     | NULL    |       |
+-------+------------+------+-----+---------+-------+

mysql> CREATE UNIQUE INDEX idx_id ON table_with_index(id);
mysql> DESC table_with_index;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | bigint(20) | NO   | PRI | NULL    |       |
+-------+------------+------+-----+---------+-------+

scala> spark.range(10).write.mode("overwrite").jdbc(url, "table_with_index", prop)

mysql> DESC table_with_index;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | bigint(20) | NO   |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
```

**After (TABLE with INDEX case)**

```scala
scala> spark.range(10).write.mode("overwrite").option("truncate", true).jdbc(url, "table_with_index", prop)

mysql> DESC table_with_index;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | bigint(20) | NO   | PRI | NULL    |       |
+-------+------------+------+-----+---------+-------+
```

**Error Handling**

- In case of exceptions, Spark will not retry. Users should turn off the `truncate` option.
- In case of schema change:
  - If one of the column names changes, this will raise an exception, as expected.
  - If there is only a type difference, this will work like Append mode.

## How was this patch tested?

Pass the Jenkins tests with an updated test case.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #14086 from dongjoon-hyun/SPARK-16410.
Liwei Lin authored
## Problem

The current `sed` in `test_script.sh` is missing a `$`, leading to the failure of the `script` test on OS X:

```
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![x1_y1]                    [x1]
![x2_y2]                    [x2]
```

In addition, this `script` test would also fail on systems like Windows where we are not able to invoke `bash` or `echo | sed`.

## What changes were proposed in this pull request?

This patch
- fixes `sed` in `test_script.sh`
- adds command guards so that the `script` test would pass on systems like Windows

## How was this patch tested?

- Jenkins
- Manually verified tests pass on OS X

Author: Liwei Lin <lwlin7@gmail.com>

Closes #14280 from lw-lin/osx-sed.
Sean Owen authored
Closes #11598
Closes #7278
Closes #13882
Closes #12053
Closes #14125
Closes #8760
Closes #12848
Closes #14224

Author: Sean Owen <sowen@cloudera.com>

Closes #14328 from srowen/CloseOldPRs.
- Jul 23, 2016
Cheng Lian authored
This PR is based on PR #14098 authored by wangmiao1981.

## What changes were proposed in this pull request?

This PR replaces the original Python Spark SQL example file with the following three files:

- `sql/basic.py` Demonstrates basic Spark SQL features.
- `sql/datasource.py` Demonstrates various Spark SQL data sources.
- `sql/hive.py` Demonstrates Spark SQL Hive interaction.

This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag.

## How was this patch tested?

Manually tested.

Author: wm624@hotmail.com <wm624@hotmail.com>
Author: Cheng Lian <lian@databricks.com>

Closes #14317 from liancheng/py-examples-update.
Wenchen Fan authored
## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/12945, we renamed `registerTempTable` to `createTempView`, as we actually create a view. This PR renames `SQLTestUtils.withTempTable` to reflect that change.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #14318 from cloud-fan/minor4.
WeichenXu authored
## What changes were proposed in this pull request?

Move the `HiveContext` deprecation warning into the `HiveContext` constructor, so that the warning appears only when we actually use `HiveContext`; otherwise the warning would always appear whenever we merely reference the pyspark.ml.context code file.

## How was this patch tested?

Manual.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14301 from WeichenXu123/hiveContext_python_warning_update.
WeichenXu authored
## What changes were proposed in this pull request?

Rename variables to make the code clearer:

- `nnz` => `weightSum`
- `weightSum` => `totalWeightSum`

Also add a new member vector `nnz` (not the `nnz` in the previous code, which was renamed to `weightSum`) to count the number of non-zero values in each dimension. Use this new `nnz` instead of `weightSum` when calculating min/max, which fixes several numerical errors in some extreme cases.

## How was this patch tested?

A new test case added.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14216 from WeichenXu123/multivarOnlineSummary.
- Jul 22, 2016
Liang-Chi Hsieh authored
[SPARK-16622][SQL] Fix NullPointerException when the returned value of the called method in Invoke is null

## What changes were proposed in this pull request?

Currently we don't check the value returned by the called method in `Invoke`. When the returned value is null and is assigned to a variable of primitive type, a `NullPointerException` will be thrown.

## How was this patch tested?

Jenkins tests.

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>

Closes #14259 from viirya/agg-empty-ds.
Dongjoon Hyun authored
[SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description more consistent with Scala API

## What changes were proposed in this pull request?

`withColumnRenamed` and `drop` are no-ops if the given column name does not exist. The Python documentation already describes that, but this PR adds a more explicit line, consistent with Scala, to reduce ambiguity.

## How was this patch tested?

It's about docs.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #14288 from dongjoon-hyun/SPARK-16651.
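Although the doc change is on the Python side, the behaviour is the same in the Scala API; a small sketch with hypothetical column names:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("noop-example").getOrCreate()
val df = spark.range(3).toDF("id")

// Neither call matches an existing column, so each returns the DataFrame unchanged.
val afterDrop   = df.drop("no_such_column")
val afterRename = df.withColumnRenamed("no_such_column", "renamed")
assert(afterDrop.columns.sameElements(df.columns))
assert(afterRename.columns.sameElements(df.columns))
```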
Tom Graves authored
Clarify documentation on spark.task.maxFailures

No tests run, as it is documentation.

Author: Tom Graves <tgraves@yahoo-inc.com>

Closes #14287 from tgravescs/SPARK-16650.
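For reference, a minimal sketch of raising this setting; the value shown is an arbitrary example, not a recommendation:

```scala
import org.apache.spark.SparkConf

// spark.task.maxFailures is the number of attempts allowed for any single task
// before the whole job is failed (default 4). The value 8 below is illustrative.
val conf = new SparkConf()
  .setAppName("max-failures-example")
  .set("spark.task.maxFailures", "8")
```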
WeichenXu authored
## What changes were proposed in this pull request?

Add PyDev & RStudio project files to the gitignore list. I think these two IDEs are used by many developers, so this avoids the need for a personal gitignore_global config.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14293 from WeichenXu123/update_gitignore.
Ahmed Mahran authored
## What changes were proposed in this pull request?

In `JobGenerator`, the code reads as though some batches might not get marked as fully processed. In the flowchart included in the PR description (not reproduced here), the batch should get marked fully processed before endpoint C, but it is not. Currently this does not actually cause an issue, because the condition "(time - zeroTime) is a multiple of the checkpoint duration?" always evaluates to `true`, as the checkpoint duration is always set equal to the batch duration.

This PR fixes the issue to improve code readability and to avoid any potential problem in case a future change sets the checkpoint duration differently from the batch duration.

Author: Ahmed Mahran <ahmed.mahran@mashin.io>

Closes #14145 from ahmed-mahran/b-mark-batch-fully-processed.
Jacek Laskowski authored
## What changes were proposed in this pull request?

Build fix for [SPARK-16287][SQL] Implement str_to_map SQL function, which introduced this compilation error:

```
/Users/jacek/dev/oss/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala:402: error: annotation argument needs to be a constant; found: "_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text ".+("into key/value pairs using delimiters. ").+("Default delimiters are \',\' for pairDelim and \':\' for keyValueDelim.")
    "into key/value pairs using delimiters. " +
                                              ^
```

## How was this patch tested?

Local build.

Author: Jacek Laskowski <jacek@japila.pl>

Closes #14315 from jaceklaskowski/build-fix-complexTypeCreator.
gatorsmile authored
### What changes were proposed in this pull request?

**Issue 1: Silent ignorance of the bucket specification when creating a table using schema inference**

When creating a data source table without explicit specification of a schema or a SELECT clause, we silently ignore the bucket specification (CLUSTERED BY... SORTED BY...) in [the code](https://github.com/apache/spark/blob/ce3b98bae28af72299722f56e4e4ef831f471ec0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L339-L354). For example:

```sql
CREATE TABLE jsonTable
USING org.apache.spark.sql.json
OPTIONS (
  path '${tempDir.getCanonicalPath}'
)
CLUSTERED BY (inexistentColumnA) SORTED BY (inexistentColumnB) INTO 2 BUCKETS
```

This PR captures it and issues an error message.

**Issue 2: Run-time `java.lang.ArithmeticException` when the number of buckets is set to zero**

For example:

```sql
CREATE TABLE t USING PARQUET
OPTIONS (PATH '${path.toString}')
CLUSTERED BY (a) SORTED BY (b) INTO 0 BUCKETS
AS SELECT 1 AS a, 2 AS b
```

The exception we got is

```
ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 1.0 (TID 2)
java.lang.ArithmeticException: / by zero
```

This PR captures the misuse and issues an appropriate error message.

### How was this patch tested?

Added a test case in DDLSuite.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #14210 from gatorsmile/createTableWithoutSchema.
- Jul 21, 2016
Sandeep Singh authored
## What changes were proposed in this pull request?

This PR adds the `str_to_map` SQL function in order to remove a Hive fallback.

## How was this patch tested?

Pass the Jenkins tests with newly added tests.

Author: Sandeep Singh <sandeep@techaddict.me>

Closes #13990 from techaddict/SPARK-16287.
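For reference, a small sketch of calling the new function through the SQL interface; the input string is an arbitrary example, and the delimiters shown match the documented defaults (',' for pairDelim, ':' for keyValueDelim):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("str-to-map-example").getOrCreate()

// str_to_map(text, pairDelim, keyValueDelim) splits text into key/value pairs;
// the example below yields the map a -> 1, b -> 2, c -> 3.
spark.sql("SELECT str_to_map('a:1,b:2,c:3', ',', ':') AS m").show(false)
```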
Sameer Agarwal authored
## What changes were proposed in this pull request?

As part of the bugfix in https://github.com/apache/spark/pull/12279, if a row batch consists of both dictionary encoded and non-dictionary encoded pages, we explicitly decode the dictionary for the values that are already dictionary encoded. Currently we reset the dictionary while reading every page, which can potentially cause a `java.lang.ArrayIndexOutOfBoundsException` while decoding older pages. This patch fixes the problem by maintaining a single dictionary per row-batch in the vectorized parquet reader.

## How was this patch tested?

Manual tests against a number of hand-generated parquet files.

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14225 from sameeragarwal/vectorized.
Yin Huai authored
## What changes were proposed in this pull request?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62593/testReport/junit/org.apache.spark.sql.sources/CreateTableAsSelectSuite/create_a_table__drop_it_and_create_another_one_with_the_same_name/ shows that `create a table, drop it and create another one with the same name` failed, but other runs were good. It seems to be a flaky test. This PR tries to make this test more stable.

Author: Yin Huai <yhuai@databricks.com>

Closes #14289 from yhuai/SPARK-16656.
Michael Gummelt authored
## What changes were proposed in this pull request?

Added a new configuration namespace: `spark.mesos.driverEnv.*`

This allows a user submitting a job in cluster mode to set arbitrary environment variables on the driver: `spark.mesos.driverEnv.KEY=VAL` will result in the env var "KEY" being set to "VAL".

I've also refactored the tests a bit so we can re-use code in MesosClusterScheduler, and I've refactored the command building logic in `buildDriverCommand`. Command builder values were very intertwined before; now it's easier to determine exactly how each variable is set.

## How was this patch tested?

Unit tests.

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #14167 from mgummelt/driver-env-vars.
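A hedged sketch of the new namespace in use; the variable name and value are hypothetical, and in practice this is usually supplied via `--conf` on spark-submit rather than in application code:

```scala
import org.apache.spark.SparkConf

// Hypothetical variable: when the Mesos dispatcher launches the driver in cluster
// mode, this setting exports MY_SERVICE_URL into the driver's environment.
// Typically passed as:
//   --conf spark.mesos.driverEnv.MY_SERVICE_URL=http://example.com
val conf = new SparkConf()
  .set("spark.mesos.driverEnv.MY_SERVICE_URL", "http://example.com")
```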
Cheng Lian authored
## What changes were proposed in this pull request?

PR #14278 is a more general and simpler fix for SPARK-16632 than PR #14272. After merging #14278, we no longer need the changes made in #14272, so here I revert them. This PR targets both master and branch-2.0.

## How was this patch tested?

Existing tests.

Author: Cheng Lian <lian@databricks.com>

Closes #14300 from liancheng/revert-pr-14272.
Liang-Chi Hsieh authored
## What changes were proposed in this pull request?

The Elt function doesn't support codegen execution now; we should add that support.

## How was this patch tested?

Jenkins tests.

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>

Closes #14277 from viirya/elt-codegen.
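For context, a quick sketch of what `elt` does; the literal arguments are arbitrary examples:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("elt-example").getOrCreate()

// elt(n, str1, str2, ...) returns the n-th string argument (1-based); this selects 'scala'.
spark.sql("SELECT elt(2, 'java', 'scala', 'python') AS picked").show()
```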
Cheng Lian authored
## What changes were proposed in this pull request?

In `SpecificParquetRecordReaderBase`, which is used by the vectorized Parquet reader, we convert the Parquet requested schema into a Spark schema to guide column reader initialization. However, the Parquet requested schema is tailored from the schema of the physical file being scanned, and may have inaccurate type information due to bugs in other systems (e.g. HIVE-14294).

On the other hand, we already set the real Spark requested schema into the Hadoop configuration in [`ParquetFileFormat`][1]. This PR simply reads out that schema to replace the converted one.

## How was this patch tested?

New test case added in `ParquetQuerySuite`.

[1]: https://github.com/apache/spark/blob/v2.0.0-rc5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L292-L294

Author: Cheng Lian <lian@databricks.com>

Closes #14278 from liancheng/spark-16632-simpler-fix.
Sean Owen authored
## What changes were proposed in this pull request?

Saving partitions to JDBC in a transaction can use a weaker transaction isolation level to reduce locking. Use a better method to check whether transactions are supported.

## How was this patch tested?

Existing Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #14054 from srowen/SPARK-16226.
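A hedged sketch of how a caller might relax the write-side isolation; this assumes the JDBC writer's `isolationLevel` option is the knob associated with this change, and the URL, table name, and function name are placeholders:

```scala
import java.util.Properties
import org.apache.spark.sql.DataFrame

// Assumption: the JDBC data source accepts an "isolationLevel" write option
// (e.g. "NONE" or "READ_COMMITTED") controlling the transaction isolation used
// while saving partitions. URL and table name below are placeholders.
def saveWithWeakIsolation(df: DataFrame): Unit = {
  df.write
    .mode("append")
    .option("isolationLevel", "READ_COMMITTED")
    .jdbc("jdbc:postgresql://localhost/testdb", "target_table", new Properties())
}
```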
Holden Karau authored
## What changes were proposed in this pull request?

Fix parquet to csv in a comment to match the input format being read.

## How was this patch tested?

N/A (doc change only)

Author: Holden Karau <holden@us.ibm.com>

Closes #14274 from holdenk/minor-docfix-schema-of-csv-rather-than-parquet.
- Jul 20, 2016
Wenchen Fan authored
## What changes were proposed in this pull request?

Aggregate expressions can only be executed inside `Aggregate`; if we propagate one up with constraints, the parent operator cannot execute it and will fail at runtime.

## How was this patch tested?

New test in SQLQuerySuite.

Author: Wenchen Fan <wenchen@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #14281 from cloud-fan/bug.