- Aug 04, 2016
-
-
WeichenXu authored
## What changes were proposed in this pull request? To Make sure ANN layer input training data to be persisted, so that it can avoid overhead cost if the RDD need to be computed from lineage. ## How was this patch tested? Existing Tests. Author: WeichenXu <WeichenXu123@outlook.com> Closes #14483 from WeichenXu123/add_ann_persist_training_data.
-
Zheng RuiFeng authored
## What changes were proposed in this pull request? Add the missing args-checking for randomSplit and sample ## How was this patch tested? unit tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #14478 from zhengruifeng/fix_randomSplit.
-
Eric Liang authored
## What changes were proposed in this pull request? This moves DataSourceScanExec out so it's more discoverable, and now that it doesn't necessarily depend on an existing RDD. cc davies ## How was this patch tested? Existing tests. Author: Eric Liang <ekl@databricks.com> Closes #14487 from ericl/split-scan.
-
Davies Liu authored
## What changes were proposed in this pull request? This patch fix the overflow in LongToUnsafeRowMap when the range of key is very wide (the key is much much smaller then minKey, for example, key is Long.MinValue, minKey is > 0). ## How was this patch tested? Added regression test (also for SPARK-16740) Author: Davies Liu <davies@databricks.com> Closes #14464 from davies/fix_overflow.
-
Sean Zhong authored
## What changes were proposed in this pull request? For DataSet typed select: ``` def select[U1: Encoder](c1: TypedColumn[T, U1]): Dataset[U1] ``` If type T is a case class or a tuple class that is not atomic, the resulting logical plan's schema will mismatch with `Dataset[T]` encoder's schema, which will cause encoder error and throw AnalysisException. ### Before change: ``` scala> case class A(a: Int, b: Int) scala> Seq((0, A(1,2))).toDS.select($"_2".as[A]) org.apache.spark.sql.AnalysisException: cannot resolve '`a`' given input columns: [_2]; .. ``` ### After change: ``` scala> case class A(a: Int, b: Int) scala> Seq((0, A(1,2))).toDS.select($"_2".as[A]).show +---+---+ | a| b| +---+---+ | 1| 2| +---+---+ ``` ## How was this patch tested? Unit test. Author: Sean Zhong <seanzhong@databricks.com> Closes #14474 from clockfly/SPARK-16853.
-
Wenchen Fan authored
## What changes were proposed in this pull request? These 2 methods take `CatalogTable` as parameter, which already have the database information. ## How was this patch tested? existing test Author: Wenchen Fan <wenchen@databricks.com> Closes #14476 from cloud-fan/minor5.
-
Sean Zhong authored
## What changes were proposed in this pull request? Implements `eval()` method for expression `AssertNotNull` so that we can convert local projection on LocalRelation to another LocalRelation. ### Before change: ``` scala> import org.apache.spark.sql.catalyst.dsl.expressions._ scala> import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull scala> import org.apache.spark.sql.Column scala> case class A(a: Int) scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain java.lang.UnsupportedOperationException: Only code-generated evaluation is supported. at org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull.eval(objects.scala:850) ... ``` ### After the change: ``` scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain(true) == Parsed Logical Plan == 'Project [assertnotnull('_1) AS assertnotnull(_1)#5] +- LocalRelation [_1#2, _2#3] == Analyzed Logical Plan == assertnotnull(_1): struct<a:int> Project [assertnotnull(_1#2) AS assertnotnull(_1)#5] +- LocalRelation [_1#2, _2#3] == Optimized Logical Plan == LocalRelation [assertnotnull(_1)#5] == Physical Plan == LocalTableScan [assertnotnull(_1)#5] ``` ## How was this patch tested? Unit test. Author: Sean Zhong <seanzhong@databricks.com> Closes #14486 from clockfly/assertnotnull_eval.
-
Cheng Lian authored
## What changes were proposed in this pull request? This PR fixes a minor formatting issue (missing space after comma) of `SorgAggregateExec.toString`. Before: ``` SortAggregate(key=[a#76,b#77], functions=[max(c#78),min(c#78)], output=[a#76,b#77,max(c)#89,min(c)#90]) +- *Sort [a#76 ASC, b#77 ASC], false, 0 +- Exchange hashpartitioning(a#76, b#77, 200) +- SortAggregate(key=[a#76,b#77], functions=[partial_max(c#78),partial_min(c#78)], output=[a#76,b#77,max#99,min#100]) +- *Sort [a#76 ASC, b#77 ASC], false, 0 +- LocalTableScan <empty>, [a#76, b#77, c#78] ``` After: ``` SortAggregate(key=[a#76, b#77], functions=[max(c#78), min(c#78)], output=[a#76, b#77, max(c)#89, min(c)#90]) +- *Sort [a#76 ASC, b#77 ASC], false, 0 +- Exchange hashpartitioning(a#76, b#77, 200) +- SortAggregate(key=[a#76, b#77], functions=[partial_max(c#78), partial_min(c#78)], output=[a#76, b#77, max#99, min#100]) +- *Sort [a#76 ASC, b#77 ASC], false, 0 +- LocalTableScan <empty>, [a#76, b#77, c#78] ``` ## How was this patch tested? Manually tested. Author: Cheng Lian <lian@databricks.com> Closes #14480 from liancheng/fix-sort-based-agg-string-format.
-
- Aug 03, 2016
-
-
sharkd authored
## What changes were proposed in this pull request? SpillReader NPE when spillFile has no data. See follow logs: 16/07/31 20:54:04 INFO collection.ExternalSorter: spill memory to file:/data4/yarnenv/local/usercache/tesla/appcache/application_1465785263942_56138/blockmgr-db5f46c3-d7a4-4f93-8b77-565e469696fb/09/temp_shuffle_ec3ece08-4569-4197-893a-4a5dfcbbf9fa, fileSize:0.0 B 16/07/31 20:54:04 WARN memory.TaskMemoryManager: leak 164.3 MB memory from org.apache.spark.util.collection.ExternalSorter3db4b52d 16/07/31 20:54:04 ERROR executor.Executor: Managed memory leak detected; size = 190458101 bytes, TID = 2358516/07/31 20:54:04 ERROR executor.Executor: Exception in task 1013.0 in stage 18.0 (TID 23585) java.lang.NullPointerException at org.apache.spark.util.collection.ExternalSorter$SpillReader.cleanup(ExternalSorter.scala:624) at org.apache.spark.util.collection.ExternalSorter$SpillReader.nextBatchStream(ExternalSorter.scala:539) at org.apache.spark.util.collection.ExternalSorter$SpillReader.<init>(ExternalSorter.scala:507) at org.apache.spark.util.collection.ExternalSorter$SpillableIterator.spill(ExternalSorter.scala:816) at org.apache.spark.util.collection.ExternalSorter.forceSpill(ExternalSorter.scala:251) at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:109) at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:154) at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249) at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:346) at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:367) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237) at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 16/07/31 20:54:30 INFO executor.Executor: Executor is trying to kill task 1090.1 in stage 18.0 (TID 23793) 16/07/31 20:54:30 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown ## How was this patch tested? Manual test. Author: sharkd <sharkd.tu@gmail.com> Author: sharkdtu <sharkdtu@tencent.com> Closes #14479 from sharkdtu/master.
-
Holden Karau authored
## What changes were proposed in this pull request? Replace deprecated ParquetWriter with the new builders ## How was this patch tested? Existing tests Author: Holden Karau <holden@us.ibm.com> Closes #14419 from holdenk/SPARK-16814-fix-deprecated-parquet-constructor-usage.
-
Stefan Schulze authored
## What changes were proposed in this pull request? As of Scala 2.11.x there is no longer a org.scala-lang:jline version aligned to the scala version itself. Scala console now uses the plain jline:jline module. Spark's dependency management did not reflect this change properly, causing Maven to pull in Jline via transitive dependency. Unfortunately Jline 2.12 contained a minor but very annoying bug rendering the shell almost useless for developers with german keyboard layout. This request contains the following chages: - Exclude transitive dependency 'jline:jline' from hive-exec module - Remove global properties 'jline.version' and 'jline.groupId' - Add both properties and dependency to 'scala-2.11' profile - Add explicit dependency on 'jline:jline' to module 'spark-repl' ## How was this patch tested? - Running mvn dependency:tree and checking for correct Jline version 2.12.1 - Running full builds with assembly and checking for jline-2.12.1.jar in 'lib' folder of generated tarball Author: Stefan Schulze <stefan.schulze@pentasys.de> Closes #14429 from stsc-pentasys/SPARK-16770.
-
Kevin McHale authored
This is a pull request that was originally merged against branch-1.6 as #12000, now being merged into master as well. srowen zzcclp JoshRosen This pull request fixes an issue in which cluster-mode executors fail to properly register a JDBC driver when the driver is provided in a jar by the user, but the driver class name is derived from a JDBC URL (rather than specified by the user). The consequence of this is that all JDBC accesses under the described circumstances fail with an IllegalStateException. I reported the issue here: https://issues.apache.org/jira/browse/SPARK-14204 My proposed solution is to have the executors register the JDBC driver class under all circumstances, not only when the driver is specified by the user. This patch was tested manually. I built an assembly jar, deployed it to a cluster, and confirmed that the problem was fixed. Author: Kevin McHale <kevin@premise.com> Closes #14420 from mchalek/mchalek-jdbc_driver_registration.
-
Eric Liang authored
[SPARK-16596] [SQL] Refactor DataSourceScanExec to do partition discovery at execution instead of planning time ## What changes were proposed in this pull request? Partition discovery is rather expensive, so we should do it at execution time instead of during physical planning. Right now there is not much benefit since ListingFileCatalog will read scan for all partitions at planning time anyways, but this can be optimized in the future. Also, there might be more information for partition pruning not available at planning time. This PR moves a lot of the file scan logic from planning to execution time. All file scan operations are handled by `FileSourceScanExec`, which handles both batched and non-batched file scans. This requires some duplication with `RowDataSourceScanExec`, but is probably worth it so that `FileSourceScanExec` does not need to depend on an input RDD. TODO: In another pr, move DataSourceScanExec to it's own file. ## How was this patch tested? Existing tests (it might be worth adding a test that catalog.listFiles() is delayed until execution, but this can be delayed until there is an actual benefit to doing so). Author: Eric Liang <ekl@databricks.com> Closes #14241 from ericl/refactor.
-
Wenchen Fan authored
[SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type ## What changes were proposed in this pull request? Here is a table about the behaviours of `array`/`map` and `greatest`/`least` in Hive, MySQL and Postgres: | |Hive|MySQL|Postgres| |---|---|---|---|---| |`array`/`map`|can find a wider type with decimal type arguments, and will truncate the wider decimal type if necessary|can find a wider type with decimal type arguments, no truncation problem|can find a wider type with decimal type arguments, no truncation problem| |`greatest`/`least`|can find a wider type with decimal type arguments, and truncate if necessary, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion| I think these behaviours makes sense and Spark SQL should follow them. This PR fixes `array` and `map` by using `findWiderCommonType` to get the wider type. This PR fixes `greatest` and `least` by add a `findWiderTypeWithoutStringPromotion`, which provides similar semantic of `findWiderCommonType`, but without string promotion. ## How was this patch tested? new tests in `TypeCoersionSuite` Author: Wenchen Fan <wenchen@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #14439 from cloud-fan/bug.
-
=^_^= authored
## What changes were proposed in this pull request? avgMetrics was summed, not averaged, across folds Author: =^_^= <maxmoroz@gmail.com> Closes #14456 from pkch/pkch-patch-1.
-
- Aug 02, 2016
-
-
Wenchen Fan authored
## What changes were proposed in this pull request? a small code style change, it's better to make the type parameter more accurate. ## How was this patch tested? N/A Author: Wenchen Fan <wenchen@databricks.com> Closes #14458 from cloud-fan/parquet.
-
Artur Sukhenko authored
## What changes were proposed in this pull request? Mask spark.ssl.keyPassword, spark.ssl.keyStorePassword, spark.ssl.trustStorePassword in Web UI environment page. (Changes their values to ***** in env. page) ## How was this patch tested? I've built spark, run spark shell and checked that this values have been masked with *****. Also run tests: ./dev/run-tests [info] ScalaTest [info] Run completed in 1 hour, 9 minutes, 5 seconds. [info] Total number of tests run: 2166 [info] Suites: completed 65, aborted 0 [info] Tests: succeeded 2166, failed 0, canceled 0, ignored 590, pending 0 [info] All tests passed.  Author: Artur Sukhenko <artur.sukhenko@gmail.com> Closes #14409 from Devian-ua/maskpass.
-
gatorsmile authored
### What changes were proposed in this pull request? This PR is to remove `TestHiveSharedState`. Also, this is also associated with the Hive refractoring for removing `HiveSharedState`. ### How was this patch tested? The existing test cases Author: gatorsmile <gatorsmile@gmail.com> Closes #14463 from gatorsmile/removeTestHiveSharedState.
-
Josh Rosen authored
## What changes were proposed in this pull request? The behavior of `SparkContext.addFile()` changed slightly with the introduction of the Netty-RPC-based file server, which was introduced in Spark 1.6 (where it was disabled by default) and became the default / only file server in Spark 2.0.0. Prior to 2.0, calling `SparkContext.addFile()` with files that have the same name and identical contents would succeed. This behavior was never explicitly documented but Spark has behaved this way since very early 1.x versions. In 2.0 (or 1.6 with the Netty file server enabled), the second `addFile()` call will fail with a requirement error because NettyStreamManager tries to guard against duplicate file registration. This problem also affects `addJar()` in a more subtle way: the `fileServer.addJar()` call will also fail with an exception but that exception is logged and ignored; I believe that the problematic exception-catching path was mistakenly copied from some old code which was only relevant to very old versions of Spark and YARN mode. I believe that this change of behavior was unintentional, so this patch weakens the `require` check so that adding the same filename at the same path will succeed. At file download time, Spark tasks will fail with exceptions if an executor already has a local copy of a file and that file's contents do not match the contents of the file being downloaded / added. As a result, it's important that we prevent files with the same name and different contents from being served because allowing that can effectively brick an executor by preventing it from successfully launching any new tasks. Before this patch's change, this was prevented by forbidding `addFile()` from being called twice on files with the same name. Because Spark does not defensively copy local files that are passed to `addFile` it is vulnerable to files' contents changing, so I think it's okay to rely on an implicit assumption that these files are intended to be immutable (since if they _are_ mutable then this can lead to either explicit task failures or implicit incorrectness (in case new executors silently get newer copies of the file while old executors continue to use an older version)). To guard against this, I have decided to only update the file addition timestamps on the first call to `addFile()`; duplicate calls will succeed but will not update the timestamp. This behavior is fine as long as we assume files are immutable, which seems reasonable given the behaviors described above. As part of this change, I also improved the thread-safety of the `addedJars` and `addedFiles` maps; this is important because these maps may be concurrently read by a task launching thread and written by a driver thread in case the user's driver code is multi-threaded. ## How was this patch tested? I added regression tests in `SparkContextSuite`. Author: Josh Rosen <joshrosen@databricks.com> Closes #14396 from JoshRosen/SPARK-16787.
-
Wenchen Fan authored
## What changes were proposed in this pull request? `Greatest` and `Least` are not conditional expressions, but arithmetic expressions. ## How was this patch tested? N/A Author: Wenchen Fan <wenchen@databricks.com> Closes #14460 from cloud-fan/move.
-
sandy authored
## What changes were proposed in this pull request? Modify java example which is also reflect in document. ## How was this patch tested? run test cases. Author: sandy <phalodi@gmail.com> Closes #14436 from phalodi/SPARK-16816.
-
Herman van Hovell authored
## What changes were proposed in this pull request? In Spark 1.6 (with Hive support) we could use `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions as literals (without adding braces), for example: ```SQL select /* Spark 1.6: */ current_date, /* Spark 1.6 & Spark 2.0: */ current_date() ``` This was accidentally dropped in Spark 2.0. This PR reinstates this functionality. ## How was this patch tested? Added a case to ExpressionParserSuite. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #14442 from hvanhovell/SPARK-16836.
-
Liang-Chi Hsieh authored
## What changes were proposed in this pull request? There are two related bugs of Python-only UDTs. Because the test case of second one needs the first fix too. I put them into one PR. If it is not appropriate, please let me know. ### First bug: When MapObjects works on Python-only UDTs `RowEncoder` will use `PythonUserDefinedType.sqlType` for its deserializer expression. If the sql type is `ArrayType`, we will have `MapObjects` working on it. But `MapObjects` doesn't consider `PythonUserDefinedType` as its input data type. It causes error like: import pyspark.sql.group from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT from pyspark.sql.types import * schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT()) df = spark.createDataFrame([(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)], schema=schema) df.show() File "/home/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o36.showString. : java.lang.RuntimeException: Error while decoding: scala.MatchError: org.apache.spark.sql.types.PythonUserDefinedTypef4ceede8 (of class org.apache.spark.sql.types.PythonUserDefinedType) ... ### Second bug: When Python-only UDTs is the element type of ArrayType import pyspark.sql.group from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT from pyspark.sql.types import * schema = StructType().add("key", LongType()).add("val", ArrayType(PythonOnlyUDT())) df = spark.createDataFrame([(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in range(10)], schema=schema) df.show() ## How was this patch tested? PySpark's sql tests. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Closes #13778 from viirya/fix-pyudt.
-
Tom Magrino authored
## What changes were proposed in this pull request? Fix of incorrect arguments (dropping slideDuration and using windowDuration) in constructors for TimeWindow. The JIRA this addresses is here: https://issues.apache.org/jira/browse/SPARK-16837 ## How was this patch tested? Added a test to TimeWindowSuite to check that the results of TimeWindow object apply and TimeWindow class constructor are equivalent. Author: Tom Magrino <tmagrino@fb.com> Closes #14441 from tmagrino/windowing-fix.
-
Shuai Lin authored
## What changes were proposed in this pull request? Support using latex in scaladoc by adding MathJax javascript to the js template. ## How was this patch tested? Generated scaladoc. Preview: - LogisticGradient: [before](https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.mllib.optimization.LogisticGradient) and [after](https://sparkdocs.lins05.pw/spark-16822/api/scala/index.html#org.apache.spark.mllib.optimization.LogisticGradient) - MinMaxScaler: [before](https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.ml.feature.MinMaxScaler) and [after](https://sparkdocs.lins05.pw/spark-16822/api/scala/index.html#org.apache.spark.ml.feature.MinMaxScaler) Author: Shuai Lin <linshuai2012@gmail.com> Closes #14438 from lins05/spark-16822-support-latex-in-scaladoc.
-
Maciej Brynski authored
## What changes were proposed in this pull request? Casting ConcurrentHashMap to ConcurrentMap allows to run code compiled with Java 8 on Java 7 ## How was this patch tested? Compilation. Existing automatic tests Author: Maciej Brynski <maciej.brynski@adpilot.pl> Closes #14459 from maver1ck/spark-15541-master.
-
Xusen Yin authored
[SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector ## What changes were proposed in this pull request? mllib.LDAExample uses ML pipeline and MLlib LDA algorithm. The former transforms original data into MLVector format, while the latter uses MLlibVector format. ## How was this patch tested? Test manually. Author: Xusen Yin <yinxusen@gmail.com> Closes #14212 from yinxusen/SPARK-16558.
-
Zheng RuiFeng authored
## What changes were proposed in this pull request? Add a length checking for threshoulds' length in method `setThreshoulds()` of classification models. ## How was this patch tested? unit tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #14457 from zhengruifeng/check_setThresholds.
-
petermaxlee authored
## What changes were proposed in this pull request? Greatest/least function does not have the most friendly error message for data types. This patch improves the error message to not show the Seq type, and use more human readable data types. Before: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST (ArrayBuffer(DecimalType(2,1), StringType)).; line 1 pos 7 ``` After: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST(decimal(2,1), string).; line 1 pos 7 ``` ## How was this patch tested? Manually verified the output and also added unit tests to ConditionalExpressionSuite. Author: petermaxlee <petermaxlee@gmail.com> Closes #14453 from petermaxlee/SPARK-16850.
-
Cheng Lian authored
## What changes were proposed in this pull request? This PR makes various minor updates to examples of all language bindings to make sure they are consistent with each other. Some typos and missing parts (JDBC example in Scala/Java/Python) are also fixed. ## How was this patch tested? Manually tested. Author: Cheng Lian <lian@databricks.com> Closes #14368 from liancheng/revise-examples.
-
jiangxingbo authored
## What changes were proposed in this pull request? With SPARK-15034, we could use the value of spark.sql.warehouse.dir to set the warehouse location. In TestHive, we can now simply set the temporary warehouse path in sc's conf, and thus, param "warehousePath" could be removed. ## How was this patch tested? exsiting testsuites. Author: jiangxingbo <jiangxingbo@meituan.com> Closes #14401 from jiangxb1987/warehousePath.
-
- Aug 01, 2016
-
-
Wenchen Fan authored
## What changes were proposed in this pull request? These 2 expressions are not needed anymore after we have `Greatest` and `Least`. This PR removes them and related tests. ## How was this patch tested? N/A Author: Wenchen Fan <wenchen@databricks.com> Closes #14434 from cloud-fan/minor1.
-
Shixiong Zhu authored
## What changes were proposed in this pull request? Moved `asScala` to a `map` to avoid NPE. ## How was this patch tested? Existing unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #14443 from zsxwing/SPARK-15869.
-
Holden Karau authored
## What changes were proposed in this pull request? Removes the deprecated timestamp constructor and incidentally fixes the use which was using system timezone rather than the one specified when working near DST. This change also causes the roundtrip tests to fail since it now actually uses all the timezones near DST boundaries where it didn't before. Note: this is only a partial the solution, longer term we should follow up with https://issues.apache.org/jira/browse/SPARK-16788 to avoid this problem & simplify our timezone handling code. ## How was this patch tested? New tests for two timezones added so even if user timezone happens to coincided with one, the other tests should still fail. Important note: this (temporarily) disables the round trip tests until we can fix the issue more thoroughly. Author: Holden Karau <holden@us.ibm.com> Closes #14398 from holdenk/SPARK-16774-fix-use-of-deprecated-timestamp-constructor.
-
eyal farago authored
## What changes were proposed in this pull request? a failing test case + fix to SPARK-16791 (https://issues.apache.org/jira/browse/SPARK-16791) ## How was this patch tested? added a failing test case to CastSuit, then fixed the Cast code and rerun the entire CastSuit Author: eyal farago <eyal farago> Author: Eyal Farago <eyal.farago@actimize.com> Closes #14400 from eyalfa/SPARK-16791_cast_struct_with_timestamp_field_fails.
-
hyukjinkwon authored
## What changes were proposed in this pull request? This PR replaces the old Kafka API to 0.10.0 ones in `KafkaTestUtils`. The change include: - `Producer` to `KafkaProducer` - Change configurations to equalvant ones. (I referred [here](http://kafka.apache.org/documentation.html#producerconfigs) for 0.10.0 and [here](http://kafka.apache.org/082/documentation.html#producerconfigs ) for old, 0.8.2). This PR will remove the build warning as below: ```scala [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:71: class Producer in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.KafkaProducer instead. [WARNING] private var producer: Producer[String, String] = _ [WARNING] ^ [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:181: class Producer in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.KafkaProducer instead. [WARNING] producer = new Producer[String, String](new ProducerConfig(producerConfiguration)) [WARNING] ^ [WARNING] .../spark/streaming/kafka010/KafkaTestUtils.scala:181: class ProducerConfig in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.ProducerConfig instead. [WARNING] producer = new Producer[String, String](new ProducerConfig(producerConfiguration)) [WARNING] ^ [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:182: class KeyedMessage in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.ProducerRecord instead. [WARNING] producer.send(messages.map { new KeyedMessage[String, String](topic, _ ) }: _*) [WARNING] ^ [WARNING] four warnings found [WARNING] warning: [options] bootstrap class path not set in conjunction with -source 1.7 [WARNING] 1 warning ``` ## How was this patch tested? Existing tests that use `KafkaTestUtils` should cover this. Author: hyukjinkwon <gurwls223@gmail.com> Closes #14416 from HyukjinKwon/SPARK-16776.
-
Holden Karau authored
## What changes were proposed in this pull request? Change to non-deprecated constructor for SQLContext. ## How was this patch tested? Existing tests Author: Holden Karau <holden@us.ibm.com> Closes #14406 from holdenk/SPARK-16778-fix-use-of-deprecated-SQLContext-constructor.
-
Shuai Lin authored
## What changes were proposed in this pull request? Removed useless latex in a log messge. ## How was this patch tested? Check generated scaladoc. Author: Shuai Lin <linshuai2012@gmail.com> Closes #14380 from lins05/fix-docs-formatting.
-
Dongjoon Hyun authored
## What changes were proposed in this pull request? Currently, `UNION` queries on incompatible types show misleading error messages, i.e., `unresolved operator Union`. We had better show a more correct message. This will help users in the situation of [SPARK-16704](https://issues.apache.org/jira/browse/SPARK-16704). **Before** ```scala scala> sql("select 1,2,3 union (select 1,array(2),3)") org.apache.spark.sql.AnalysisException: unresolved operator 'Union; scala> sql("select 1,2,3 intersect (select 1,array(2),3)") org.apache.spark.sql.AnalysisException: unresolved operator 'Intersect; scala> sql("select 1,2,3 except (select 1,array(2),3)") org.apache.spark.sql.AnalysisException: unresolved operator 'Except; ``` **After** ```scala scala> sql("select 1,2,3 union (select 1,array(2),3)") org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. ArrayType(IntegerType,false) <> IntegerType at the second column of the second table; scala> sql("select 1,2,3 intersect (select 1,array(2),3)") org.apache.spark.sql.AnalysisException: Intersect can only be performed on tables with the compatible column types. ArrayType(IntegerType,false) <> IntegerType at the second column of the second table; scala> sql("select 1,2,3 except (select array(1),array(2),3)") org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. ArrayType(IntegerType,false) <> IntegerType at the first column of the second table; ``` ## How was this patch tested? Pass the Jenkins test with a new test case. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #14355 from dongjoon-hyun/SPARK-16726.
-
- Jul 31, 2016
-
-
Reynold Xin authored
## What changes were proposed in this pull request? It is useful to log the timezone when query result does not match, especially on build machines that have different timezone from AMPLab Jenkins. ## How was this patch tested? This is a test-only change. Author: Reynold Xin <rxin@databricks.com> Closes #14413 from rxin/SPARK-16805.
-