- Jun 21, 2017
-
-
Li Yichao authored
## What changes were proposed in this pull request? Currently the shuffle service registration timeout and retry count are hardcoded. This works well for small workloads, but under heavy workload, when the shuffle service is busy transferring large amounts of data, we see significant delays in responding to the registration request; as a result we often see executors fail to register with the shuffle service, eventually failing the job. We need to make these two parameters configurable. ## How was this patch tested? * Updated `BlockManagerSuite` to test that the registration timeout and max attempts configuration actually works. cc sitalkedia Author: Li Yichao <lyc@zhihu.com> Closes #18092 from liyichao/SPARK-20640.
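For illustration, a minimal Scala sketch of raising these settings for a busy cluster; the key names and units below are assumptions taken from this change, not verified against the configuration docs:
```scala
import org.apache.spark.SparkConf

// Sketch only: raise the shuffle-service registration timeout and retry count
// for a heavily loaded external shuffle service. Key names and units are assumed.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.shuffle.registration.timeout", "60000")   // assumed: milliseconds
  .set("spark.shuffle.registration.maxAttempts", "5")    // assumed: retry attempts
```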
-
sureshthalamati authored
This patch adds DB2-specific data type mappings for decfloat, real, xml, and timestamp with time zone (a DB2Z-specific type) on read, and for byte and short data types on write, to the JDBC data source's DB2 dialect. The default mappings do not work for these types when reading from or writing to a DB2 database. Added a Docker test and a JDBC unit test case. Author: sureshthalamati <suresh.thalamati@gmail.com> Closes #9162 from sureshthalamati/db2dialect_enhancements-spark-10655.
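For context, a hedged Scala sketch of reading a DB2 table through the JDBC data source, which is where the dialect mappings apply; the connection URL, schema, table and credentials are placeholders, and a spark-shell `spark` session is assumed:
```scala
// Sketch: reading a DB2 table via JDBC; the DB2 dialect now maps DECFLOAT, REAL, XML
// and TIMESTAMP WITH TIME ZONE columns on read. All connection values are placeholders.
val db2df = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://db2host:50000/SAMPLE")
  .option("dbtable", "MYSCHEMA.MEASUREMENTS")
  .option("user", "db2inst1")
  .option("password", "secret")
  .load()
db2df.printSchema()
```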
-
- Jun 20, 2017
-
-
Reynold Xin authored
## What changes were proposed in this pull request? QueryPlanConstraints should be part of LogicalPlan, rather than QueryPlan, since the constraint framework is only used for query plan rewriting and not for physical planning. ## How was this patch tested? Should be covered by existing tests, since it is a simple refactoring. Author: Reynold Xin <rxin@databricks.com> Closes #18310 from rxin/SPARK-21103.
-
Wenchen Fan authored
## What changes were proposed in this pull request? This is a regression in Spark 2.2. In Spark 2.2, we introduced a new way to resolve persisted views: https://issues.apache.org/jira/browse/SPARK-18209 , but this makes persisted views non-case-preserving because we store the schema in the Hive metastore directly. We should follow data source tables and store the schema in table properties. ## How was this patch tested? New regression test. Author: Wenchen Fan <wenchen@databricks.com> Closes #18360 from cloud-fan/view.
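A small sketch of the behaviour being restored, assuming a spark-shell `spark` session and illustrative table/view names; after the fix the view's column names should keep their original casing when read back from the metastore:
```scala
// Sketch: column-name casing of a persisted view should survive a metastore round trip.
spark.sql("CREATE TABLE src (Id INT, EventTime TIMESTAMP) USING parquet")
spark.sql("CREATE VIEW eventView AS SELECT Id, EventTime FROM src")
spark.table("eventView").printSchema()  // expect `Id` and `EventTime`, not lower-cased names
```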
-
Xingbo Jiang authored
[SPARK-20989][CORE] Fail to start multiple workers on one host if external shuffle service is enabled in standalone mode ## What changes were proposed in this pull request? In standalone mode, if we enable external shuffle service by setting `spark.shuffle.service.enabled` to true, and then we try to start multiple workers on one host(by setting `SPARK_WORKER_INSTANCES=3` in spark-env.sh, and then run `sbin/start-slaves.sh`), we can only launch one worker on each host successfully and the rest of the workers fail to launch. The reason is the port of external shuffle service if configed by `spark.shuffle.service.port`, so currently we could start no more than one external shuffle service on each host. In our case, each worker tries to start a external shuffle service, and only one of them succeeded doing this. We should give explicit reason of failure instead of fail silently. ## How was this patch tested? Manually test by the following steps: 1. SET `SPARK_WORKER_INSTANCES=1` in `conf/spark-env.sh`; 2. SET `spark.shuffle.service.enabled` to `true` in `conf/spark-defaults.conf`; 3. Run `sbin/start-all.sh`. Before the change, you will see no error in the command line, as the following: ``` starting org.apache.spark.deploy.master.Master, logging to /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.master.Master-1-xxx.local.out localhost: starting org.apache.spark.deploy.worker.Worker, logging to /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-1-xxx.local.out localhost: starting org.apache.spark.deploy.worker.Worker, logging to /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-2-xxx.local.out localhost: starting org.apache.spark.deploy.worker.Worker, logging to /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-3-xxx.local.out ``` And you can see in the webUI that only one worker is running. After the change, you get explicit error messages in the command line: ``` starting org.apache.spark.deploy.master.Master, logging to /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.master.Master-1-xxx.local.out localhost: starting org.apache.spark.deploy.worker.Worker, logging to /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-1-xxx.local.out localhost: failed to launch: nice -n 0 /Users/xxx/workspace/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://xxx.local:7077 localhost: 17/06/13 23:24:53 INFO SecurityManager: Changing view acls to: xxx localhost: 17/06/13 23:24:53 INFO SecurityManager: Changing modify acls to: xxx localhost: 17/06/13 23:24:53 INFO SecurityManager: Changing view acls groups to: localhost: 17/06/13 23:24:53 INFO SecurityManager: Changing modify acls groups to: localhost: 17/06/13 23:24:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); groups with view permissions: Set(); users with modify permissions: Set(xxx); groups with modify permissions: Set() localhost: 17/06/13 23:24:54 INFO Utils: Successfully started service 'sparkWorker' on port 63354. localhost: Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Start multiple worker on one host failed because we may launch no more than one external shuffle service on each host, please set spark.shuffle.service.enabled to false or set SPARK_WORKER_INSTANCES to 1 to resolve the conflict. 
localhost: at scala.Predef$.require(Predef.scala:224) localhost: at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:752) localhost: at org.apache.spark.deploy.worker.Worker.main(Worker.scala) localhost: full log in /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-1-xxx.local.out localhost: starting org.apache.spark.deploy.worker.Worker, logging to /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-2-xxx.local.out localhost: failed to launch: nice -n 0 /Users/xxx/workspace/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8082 spark://xxx.local:7077 localhost: 17/06/13 23:24:56 INFO SecurityManager: Changing view acls to: xxx localhost: 17/06/13 23:24:56 INFO SecurityManager: Changing modify acls to: xxx localhost: 17/06/13 23:24:56 INFO SecurityManager: Changing view acls groups to: localhost: 17/06/13 23:24:56 INFO SecurityManager: Changing modify acls groups to: localhost: 17/06/13 23:24:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); groups with view permissions: Set(); users with modify permissions: Set(xxx); groups with modify permissions: Set() localhost: 17/06/13 23:24:56 INFO Utils: Successfully started service 'sparkWorker' on port 63359. localhost: Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Start multiple worker on one host failed because we may launch no more than one external shuffle service on each host, please set spark.shuffle.service.enabled to false or set SPARK_WORKER_INSTANCES to 1 to resolve the conflict. localhost: at scala.Predef$.require(Predef.scala:224) localhost: at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:752) localhost: at org.apache.spark.deploy.worker.Worker.main(Worker.scala) localhost: full log in /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-2-xxx.local.out localhost: starting org.apache.spark.deploy.worker.Worker, logging to /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-3-xxx.local.out localhost: failed to launch: nice -n 0 /Users/xxx/workspace/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8083 spark://xxx.local:7077 localhost: 17/06/13 23:24:59 INFO SecurityManager: Changing view acls to: xxx localhost: 17/06/13 23:24:59 INFO SecurityManager: Changing modify acls to: xxx localhost: 17/06/13 23:24:59 INFO SecurityManager: Changing view acls groups to: localhost: 17/06/13 23:24:59 INFO SecurityManager: Changing modify acls groups to: localhost: 17/06/13 23:24:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); groups with view permissions: Set(); users with modify permissions: Set(xxx); groups with modify permissions: Set() localhost: 17/06/13 23:24:59 INFO Utils: Successfully started service 'sparkWorker' on port 63360. localhost: Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Start multiple worker on one host failed because we may launch no more than one external shuffle service on each host, please set spark.shuffle.service.enabled to false or set SPARK_WORKER_INSTANCES to 1 to resolve the conflict. 
localhost: at scala.Predef$.require(Predef.scala:224) localhost: at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:752) localhost: at org.apache.spark.deploy.worker.Worker.main(Worker.scala) localhost: full log in /Users/xxx/workspace/spark/logs/spark-xxx-org.apache.spark.deploy.worker.Worker-3-xxx.local.out ``` Author: Xingbo Jiang <xingbo.jiang@databricks.com> Closes #18290 from jiangxb1987/start-slave.
-
Joseph K. Bradley authored
## What changes were proposed in this pull request? LinearSVC should use its own threshold param, rather than the shared one, since it applies to rawPrediction instead of probability. This PR changes the param in the Scala, Python and R APIs. ## How was this patch tested? New unit test to make sure the threshold can be set to any Double value. Author: Joseph K. Bradley <joseph@databricks.com> Closes #18151 from jkbradley/ml-2.2-linearsvc-cleanup.
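A hedged Scala sketch of the dedicated param: because it applies to rawPrediction (the signed margin) rather than a probability, any Double, including a negative value, is legal (parameter values here are illustrative):
```scala
import org.apache.spark.ml.classification.LinearSVC

// Sketch: LinearSVC's own threshold is compared against the raw margin, not a probability,
// so values outside [0, 1] are valid.
val svc = new LinearSVC()
  .setMaxIter(10)
  .setRegParam(0.1)
  .setThreshold(-0.25)
```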
-
- Jun 19, 2017
-
-
actuaryzhang authored
## What changes were proposed in this pull request? Grouped documentation for the aggregate functions for Column. Author: actuaryzhang <actuaryzhang10@gmail.com> Closes #18025 from actuaryzhang/sparkRDoc4.
-
Yuming Wang authored
## What changes were proposed in this pull request? Fix HighlyCompressedMapStatus#writeExternal NPE: ``` 17/06/18 15:00:27 ERROR Utils: Exception encountered java.lang.NullPointerException at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) at org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) at org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) ... 17 more 17/06/18 15:00:27 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.17.47.20:50188 17/06/18 15:00:27 ERROR Utils: Exception encountered java.lang.NullPointerException at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) at org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` ## How was this patch tested? manual tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18343 from wangyum/SPARK-21133.
-
Marcelo Vanzin authored
Closes #18311 Closes #18278
-
sharkdtu authored
[SPARK-21138][YARN] Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different ## What changes were proposed in this pull request? When I set different clusters for "spark.hadoop.fs.defaultFS" and "spark.yarn.stagingDir" as follows: ``` spark.hadoop.fs.defaultFS hdfs://tl-nn-tdw.tencent-distribute.com:54310 spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark ``` the staging dir cannot be deleted; it prompts the following message: ``` java.lang.IllegalArgumentException: Wrong FS: hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310 ``` ## How was this patch tested? Existing tests Author: sharkdtu <sharkdtu@tencent.com> Closes #18352 from sharkdtu/master.
-
Marcelo Vanzin authored
The jobs page currently shows the application user, but it assumes the OS user is the same as the user running the application, which may not be true in all scenarios (e.g., Kerberos). While it might be useful to show both in the UI, this change just chooses the application user over the OS user, since the latter can be found on the environment page if needed. Tested in a live application and in the history server. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #18331 from vanzin/SPARK-21124.
-
Xianyang Liu authored
## What changes were proposed in this pull request? Fix some typos in the documentation. ## How was this patch tested? Existing tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Xianyang Liu <xianyang.liu@intel.com> Closes #18350 from ConeyLiu/fixtypo.
-
Dongjoon Hyun authored
## What changes were proposed in this pull request? This PR cleans up a few Java linter errors for Apache Spark 2.2 release. ## How was this patch tested? ```bash $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` We can check the result at Travis CI, [here](https://travis-ci.org/dongjoon-hyun/spark/builds/244297894). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #18345 from dongjoon-hyun/fix_lint_java_2.
-
Yong Tang authored
## What changes were proposed in this pull request? This fix tries to address the issue in SPARK-19975 where we have `map_keys` and `map_values` functions in SQL yet there are no equivalent Python functions. This fix adds the `map_keys` and `map_values` functions to Python. ## How was this patch tested? This fix is tested manually (see the Python docs for examples). Author: Yong Tang <yong.tang.github@outlook.com> Closes #17328 from yongtang/SPARK-19975.
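The underlying SQL expressions already existed before this change, so a brief Scala sketch via `spark.sql` illustrates what the new Python wrappers expose (a spark-shell `spark` session is assumed):
```scala
// Sketch: the SQL map_keys / map_values expressions that the new Python functions wrap.
spark.sql("SELECT map_keys(map('a', 1, 'b', 2)) AS ks, map_values(map('a', 1, 'b', 2)) AS vs").show()
// expected: ks = [a, b], vs = [1, 2]
```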
-
assafmendelson authored
## What changes were proposed in this pull request? The descriptions for several options of the File Source for Structured Streaming appeared in the File Sink description instead. This pull request has two commits: the first includes changes to the documentation as it appeared in Spark 2.1, and the second handles an additional option added for Spark 2.2. ## How was this patch tested? Built the documentation with `SKIP_API=1 jekyll build` and visually inspected the Structured Streaming programming guide. The original documentation was written by tdas and lw-lin. Author: assafmendelson <assaf.mendelson@gmail.com> Closes #18342 from assafmendelson/spark-21123.
-
saturday_s authored
## What changes were proposed in this pull request? Reload the `spark.yarn.credentials.file` property when restarting a streaming application from a checkpoint. ## How was this patch tested? Manually tested with 1.6.3 and 2.1.1. I didn't test this with master because of some compile problems, but I think the result will be the same. ## Notice This should be merged into maintenance branches too. jira: [SPARK-21008](https://issues.apache.org/jira/browse/SPARK-21008) Author: saturday_s <shi.indetail@gmail.com> Closes #18230 from saturday-shi/SPARK-21008.
-
hyukjinkwon authored
## What changes were proposed in this pull request? #17753 bumps the master branch version to 2.3.0-SNAPSHOT, but it seems the SparkR and PySpark versions were omitted. Ditto https://github.com/apache/spark/pull/16488 / https://github.com/apache/spark/pull/17523 ## How was this patch tested? N/A Author: hyukjinkwon <gurwls223@gmail.com> Closes #18341 from HyukjinKwon/r-version.
-
Xiao Li authored
### What changes were proposed in this pull request? We should not silently ignore `DISTINCT` when it is not supported in the function arguments. This PR blocks these cases and issues error messages. ### How was this patch tested? Added test cases for both regular functions and window functions. Author: Xiao Li <gatorsmile@gmail.com> Closes #18340 from gatorsmile/firstCount.
-
Xingbo Jiang authored
## What changes were proposed in this pull request? Fix any inconsistent part in JsonProtocol with the UI. This PR also contains the modifications in #17181 ## How was this patch tested? Updated JsonProtocolSuite. Before this change, localhost:8080/json shows: ``` { "url" : "spark://xingbos-MBP.local:7077", "workers" : [ { "id" : "worker-20170615172946-192.168.0.101-49450", "host" : "192.168.0.101", "port" : 49450, "webuiaddress" : "http://192.168.0.101:8081", "cores" : 8, "coresused" : 8, "coresfree" : 0, "memory" : 15360, "memoryused" : 1024, "memoryfree" : 14336, "state" : "ALIVE", "lastheartbeat" : 1497519481722 }, { "id" : "worker-20170615172948-192.168.0.101-49452", "host" : "192.168.0.101", "port" : 49452, "webuiaddress" : "http://192.168.0.101:8082", "cores" : 8, "coresused" : 8, "coresfree" : 0, "memory" : 15360, "memoryused" : 1024, "memoryfree" : 14336, "state" : "ALIVE", "lastheartbeat" : 1497519484160 }, { "id" : "worker-20170615172951-192.168.0.101-49469", "host" : "192.168.0.101", "port" : 49469, "webuiaddress" : "http://192.168.0.101:8083", "cores" : 8, "coresused" : 8, "coresfree" : 0, "memory" : 15360, "memoryused" : 1024, "memoryfree" : 14336, "state" : "ALIVE", "lastheartbeat" : 1497519486905 } ], "cores" : 24, "coresused" : 24, "memory" : 46080, "memoryused" : 3072, "activeapps" : [ { "starttime" : 1497519426990, "id" : "app-20170615173706-0001", "name" : "Spark shell", "user" : "xingbojiang", "memoryperslave" : 1024, "submitdate" : "Thu Jun 15 17:37:06 CST 2017", "state" : "RUNNING", "duration" : 65362 } ], "completedapps" : [ { "starttime" : 1497519250893, "id" : "app-20170615173410-0000", "name" : "Spark shell", "user" : "xingbojiang", "memoryperslave" : 1024, "submitdate" : "Thu Jun 15 17:34:10 CST 2017", "state" : "FINISHED", "duration" : 116895 } ], "activedrivers" : [ ], "status" : "ALIVE" } ``` After the change: ``` { "url" : "spark://xingbos-MBP.local:7077", "workers" : [ { "id" : "worker-20170615175032-192.168.0.101-49951", "host" : "192.168.0.101", "port" : 49951, "webuiaddress" : "http://192.168.0.101:8081", "cores" : 8, "coresused" : 8, "coresfree" : 0, "memory" : 15360, "memoryused" : 1024, "memoryfree" : 14336, "state" : "ALIVE", "lastheartbeat" : 1497520292900 }, { "id" : "worker-20170615175034-192.168.0.101-49953", "host" : "192.168.0.101", "port" : 49953, "webuiaddress" : "http://192.168.0.101:8082", "cores" : 8, "coresused" : 8, "coresfree" : 0, "memory" : 15360, "memoryused" : 1024, "memoryfree" : 14336, "state" : "ALIVE", "lastheartbeat" : 1497520280301 }, { "id" : "worker-20170615175037-192.168.0.101-49955", "host" : "192.168.0.101", "port" : 49955, "webuiaddress" : "http://192.168.0.101:8083", "cores" : 8, "coresused" : 8, "coresfree" : 0, "memory" : 15360, "memoryused" : 1024, "memoryfree" : 14336, "state" : "ALIVE", "lastheartbeat" : 1497520282884 } ], "aliveworkers" : 3, "cores" : 24, "coresused" : 24, "memory" : 46080, "memoryused" : 3072, "activeapps" : [ { "id" : "app-20170615175122-0001", "starttime" : 1497520282115, "name" : "Spark shell", "cores" : 24, "user" : "xingbojiang", "memoryperslave" : 1024, "submitdate" : "Thu Jun 15 17:51:22 CST 2017", "state" : "RUNNING", "duration" : 10805 } ], "completedapps" : [ { "id" : "app-20170615175058-0000", "starttime" : 1497520258766, "name" : "Spark shell", "cores" : 24, "user" : "xingbojiang", "memoryperslave" : 1024, "submitdate" : "Thu Jun 15 17:50:58 CST 2017", "state" : "FINISHED", "duration" : 9876 } ], "activedrivers" : [ ], "completeddrivers" : [ ], "status" : "ALIVE" } ``` Author: Xingbo 
Jiang <xingbo.jiang@databricks.com> Closes #18303 from jiangxb1987/json-protocol.
-
- Jun 18, 2017
-
-
liuxian authored
## What changes were proposed in this pull request? 1. In `acquireStorageMemory`, when the memory mode is OFF_HEAP, `maxOffHeapMemory` should be changed to `maxOffHeapStorageMemory`; after this PR it behaves the same as the ON_HEAP memory mode. When the requested memory is between `maxOffHeapStorageMemory` and `maxOffHeapMemory` the request is certain to fail, so if the requested memory is greater than `maxOffHeapStorageMemory` (even if not greater than `maxOffHeapMemory`) we should fail fast. 2. When borrowing memory from execution, changing `numBytes` to `numBytes - storagePool.memoryFree` is more reasonable, because we only need to acquire `(numBytes - storagePool.memoryFree)`; borrowing the full `numBytes` from execution is unnecessary. ## How was this patch tested? Added a unit test case. Author: liuxian <liu.xian3@zte.com.cn> Closes #18296 from 10110346/wip-lx-0614.
-
Yuming Wang authored
## What changes were proposed in this pull request? The built-in SQL functions UnaryMinus/UnaryPositive now support string type; if the argument is a string, it is converted to double type. After this PR: ```sql spark-sql> select positive('-1.11'), negative('-1.11'); -1.11 1.11 spark-sql> ``` ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18173 from wangyum/SPARK-20948.
-
Yuming Wang authored
## What changes were proposed in this pull request? The function `char_length` is shorthand for the `character_length` function. Both Hive and PostgreSQL support `character_length`; this PR adds support for `character_length`. Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions https://www.postgresql.org/docs/current/static/functions-string.html ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18330 from wangyum/SPARK-20749-character_length.
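A one-line Scala sketch of the new alias, assuming a spark-shell `spark` session; for a plain ASCII literal it should agree with `length`:
```scala
// Sketch: character_length should match length for a simple string literal.
spark.sql("SELECT character_length('Spark'), length('Spark')").show()
// expected: 5 and 5
```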
-
actuaryzhang authored
## What changes were proposed in this pull request? Add SQL trunc function ## How was this patch tested? standard test Author: actuaryzhang <actuaryzhang10@gmail.com> Closes #18291 from actuaryzhang/sparkRTrunc2.
-
hyukjinkwon authored
## What changes were proposed in this pull request? This PR proposes to list the files in test _after_ removing both "spark-warehouse" and "metastore_db" so that the next run of R tests pass fine. This is sometimes a bit annoying. ## How was this patch tested? Manually running multiple times R tests via `./R/run-tests.sh`. **Before** Second run: ``` SparkSQL functions: Spark package found in SPARK_HOME: .../spark ............................................................................................................................................................... ............................................................................................................................................................... ............................................................................................................................................................... ............................................................................................................................................................... ............................................................................................................................................................... ....................................................................................................1234....................... Failed ------------------------------------------------------------------------- 1. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3384) length(list1) not equal to length(list2). 1/1 mismatches [1] 25 - 23 == 2 2. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3384) sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). 10/25 mismatches x[16]: "metastore_db" y[16]: "pkg" x[17]: "pkg" y[17]: "R" x[18]: "R" y[18]: "README.md" x[19]: "README.md" y[19]: "run-tests.sh" x[20]: "run-tests.sh" y[20]: "SparkR_2.2.0.tar.gz" x[21]: "metastore_db" y[21]: "pkg" x[22]: "pkg" y[22]: "R" x[23]: "R" y[23]: "README.md" x[24]: "README.md" y[24]: "run-tests.sh" x[25]: "run-tests.sh" y[25]: "SparkR_2.2.0.tar.gz" 3. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3388) length(list1) not equal to length(list2). 1/1 mismatches [1] 25 - 23 == 2 4. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3388) sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). 10/25 mismatches x[16]: "metastore_db" y[16]: "pkg" x[17]: "pkg" y[17]: "R" x[18]: "R" y[18]: "README.md" x[19]: "README.md" y[19]: "run-tests.sh" x[20]: "run-tests.sh" y[20]: "SparkR_2.2.0.tar.gz" x[21]: "metastore_db" y[21]: "pkg" x[22]: "pkg" y[22]: "R" x[23]: "R" y[23]: "README.md" x[24]: "README.md" y[24]: "run-tests.sh" x[25]: "run-tests.sh" y[25]: "SparkR_2.2.0.tar.gz" DONE =========================================================================== ``` **After** Second run: ``` SparkSQL functions: Spark package found in SPARK_HOME: .../spark ............................................................................................................................................................... ............................................................................................................................................................... 
............................................................................................................................................................... ............................................................................................................................................................... ............................................................................................................................................................... ............................................................................................................................... ``` Author: hyukjinkwon <gurwls223@gmail.com> Closes #18335 from HyukjinKwon/SPARK-21128.
-
hyukjinkwon authored
## What changes were proposed in this pull request? This PR proposes three things as below: **Install packages per documentation** - this does not affect the tests itself (but CRAN which we are not doing via AppVeyor) up to my knowledge. This adds `knitr` and `rmarkdown` per https://github.com/apache/spark/blob/45824fb608930eb461e7df53bb678c9534c183a9/R/WINDOWS.md#unit-tests (please see https://github.com/apache/spark/commit/45824fb608930eb461e7df53bb678c9534c183a9) **Improve logs/shorten logs** - actually, long logs can be a problem on AppVeyor (e.g., see https://github.com/apache/spark/pull/17873) `R -e ...` repeats printing R information for each invocation as below: ``` R version 3.3.1 (2016-06-21) -- "Bug in Your Hair" Copyright (C) 2016 The R Foundation for Statistical Computing Platform: i386-w64-mingw32/i386 (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. ``` It looks reducing the call might be slightly better and print out the versions together looks more readable. Before: ``` # R information ... > packageVersion('testthat') [1] '1.0.2' > > # R information ... > packageVersion('e1071') [1] '1.6.8' > > ... 3 more times ``` After: ``` # R information ... > packageVersion('knitr'); packageVersion('rmarkdown'); packageVersion('testthat'); packageVersion('e1071'); packageVersion('survival') [1] ‘1.16’ [1] ‘1.6’ [1] ‘1.0.2’ [1] ‘1.6.8’ [1] ‘2.41.3’ ``` **Add`appveyor.yml`/`dev/appveyor-install-dependencies.ps1` for triggering the test** Changing this file might break the test, e.g., https://github.com/apache/spark/pull/16927 ## How was this patch tested? Before (please see https://ci.appveyor.com/project/HyukjinKwon/spark/build/169-master) After (please see the AppVeyor build in this PR): Author: hyukjinkwon <gurwls223@gmail.com> Closes #18336 from HyukjinKwon/minor-add-knitr-and-rmarkdown.
-
liuzhaokun authored
[SPARK-21126] The configuration named "spark.core.connection.auth.wait.timeout" is not used in Spark [https://issues.apache.org/jira/browse/SPARK-21126](https://issues.apache.org/jira/browse/SPARK-21126) The configuration named "spark.core.connection.auth.wait.timeout" is no longer used in Spark, so I think it should be removed from configuration.md. Author: liuzhaokun <liu.zhaokun@zte.com.cn> Closes #18333 from liu-zhaokun/new3.
-
- Jun 16, 2017
-
-
zuotingbing authored
## What changes were proposed in this pull request? Make `spark.eventLog.dir` support paths with space characters. 1. Update EventLoggingListenerSuite, e.g. `testDir = Utils.createTempDir(namePrefix = s"history log")` 2. Fix the EventLoggingListenerSuite tests ## How was this patch tested? Updated unit tests Author: zuotingbing <zuo.tingbing9@zte.com.cn> Closes #18285 from zuotingbing/spark-resolveURI.
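A hedged Scala sketch of the kind of configuration this fixes; the directory path is a placeholder chosen only because it contains a space:
```scala
import org.apache.spark.SparkConf

// Sketch: an event-log directory whose path contains a space should now resolve correctly.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:/tmp/history log")  // placeholder path with a space
```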
-
Yuming Wang authored
## What changes were proposed in this pull request? The ABS function now supports string type; Hive/MySQL support this feature. Ref: https://github.com/apache/hive/blob/4ba713ccd85c3706d195aeef9476e6e6363f1c21/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAbs.java#L93 ## How was this patch tested? unit tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18153 from wangyum/SPARK-20931.
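A one-line sketch of the new behaviour (the string argument is assumed to be implicitly cast to double, and a spark-shell `spark` session is assumed):
```scala
// Sketch: abs accepting a string argument, cast to double before taking the absolute value.
spark.sql("SELECT abs('-3.13')").show()  // expected: 3.13
```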
-
Wenchen Fan authored
## What changes were proposed in this pull request? The previous code mistakenly used `table.properties.get("comment")` to read the existing table comment; we should use `table.comment`. ## How was this patch tested? New regression test. Author: Wenchen Fan <wenchen@databricks.com> Closes #18325 from cloud-fan/unset.
-
jinxing authored
## What changes were proposed in this pull request? In the current code, blockIds in `OpenBlocks` are stored in the iterator on the shuffle service. There are some redundant characters in each blockId (`"shuffle_" + shuffleId + "_" + mapId + "_" + reduceId`). This PR proposes to improve the footprint and alleviate the memory pressure on the shuffle service. Author: jinxing <jinxing6042@126.com> Closes #18231 from jinxing64/SPARK-20994-v2.
-
Yuming Wang authored
## What changes were proposed in this pull request? Update the Running R Tests dependency packages to: ```bash R -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 'e1071', 'survival'), repos='http://cran.us.r-project.org')" ``` ## How was this patch tested? manual tests Author: Yuming Wang <wgyumg@gmail.com> Closes #18271 from wangyum/building-spark.
-
jerryshao authored
[SPARK-12552][FOLLOWUP] Fix flaky test for "o.a.s.deploy.master.MasterSuite.master correctly recover the application" ## What changes were proposed in this pull request? Due to asynchronous RPC event processing, the test "correctly recover the application" could potentially fail. The issue can be seen here: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78126/testReport/org.apache.spark.deploy.master/MasterSuite/master_correctly_recover_the_application/. This patch fixes the flaky test. ## How was this patch tested? Existing UT. CC cloud-fan jiangxb1987 , please help to review, thanks! Author: jerryshao <sshao@hortonworks.com> Closes #18321 from jerryshao/SPARK-12552-followup.
-
Kazuaki Ishizaki authored
## What changes were proposed in this pull request? This PR adds built-in SQL function `BIT_LENGTH()`, `CHAR_LENGTH()`, and `OCTET_LENGTH()` functions. `BIT_LENGTH()` returns the bit length of the given string or binary expression. `CHAR_LENGTH()` returns the length of the given string or binary expression. (i.e. equal to `LENGTH()`) `OCTET_LENGTH()` returns the byte length of the given string or binary expression. ## How was this patch tested? Added new test suites for these three functions Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #18046 from kiszk/SPARK-20749.
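A small sketch exercising the three functions on an ASCII literal, assuming a spark-shell `spark` session (expected values assume one byte per character):
```scala
// Sketch: the three new length functions on a plain ASCII string.
spark.sql("SELECT bit_length('Spark'), char_length('Spark'), octet_length('Spark')").show()
// expected: 40, 5, 5
```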
-
- Jun 15, 2017
-
-
Xianyang Liu authored
## What changes were proposed in this pull request? As the function name and comments of `TreeNode.mapChildren` indicate, the function should be applied to all of the current node's children. So, the following code should check whether the argument is actually a child node. https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L342 ## How was this patch tested? Existing tests. Author: Xianyang Liu <xianyang.liu@intel.com> Closes #18284 from ConeyLiu/treenode.
-
Xiao Li authored
### What changes were proposed in this pull request? `ALTER TABLE SET TBLPROPERTIES` should not overwrite `COMMENT` even if the input property does not have the property of `COMMENT`. This PR is to fix the issue. ### How was this patch tested? Covered by the existing tests. Author: Xiao Li <gatorsmile@gmail.com> Closes #18318 from gatorsmile/fixTableComment.
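A sketch of the behaviour being protected, assuming a spark-shell `spark` session; the table, comment, and property names are illustrative:
```scala
// Sketch: setting an unrelated table property must not wipe out the existing comment.
spark.sql("CREATE TABLE t (id INT) USING parquet COMMENT 'important table'")
spark.sql("ALTER TABLE t SET TBLPROPERTIES ('owner' = 'data-team')")
spark.sql("DESCRIBE TABLE EXTENDED t").show(truncate = false)  // comment should still read 'important table'
```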
-
Michael Gummelt authored
## What changes were proposed in this pull request? Move Hadoop delegation token code from `spark-yarn` to `spark-core`, so that other schedulers (such as Mesos) may use it. In order to avoid exposing Hadoop interfaces in spark-core, the new Hadoop delegation token classes are kept private. In order to provide backward compatibility, and to allow YARN users to continue to load their own delegation token providers via Java service loading, the old YARN interfaces, as well as the client code that uses them, have been retained. Summary: - Move registered `yarn.security.ServiceCredentialProvider` classes from `spark-yarn` to `spark-core`. Moved them into a new, private hierarchy under `HadoopDelegationTokenProvider`. Client code in `HadoopDelegationTokenManager` now loads credentials from a whitelist of three providers (`HadoopFSDelegationTokenProvider`, `HiveDelegationTokenProvider`, `HBaseDelegationTokenProvider`), instead of service loading, which means that users are not able to implement their own delegation token providers, as they are in the `spark-yarn` module. - The `yarn.security.ServiceCredentialProvider` interface has been kept for backwards compatibility, and to continue to allow YARN users to implement their own delegation token provider implementations. Client code in YARN now fetches tokens via the new `YARNHadoopDelegationTokenManager` class, which fetches tokens from the core providers through `HadoopDelegationTokenManager`, as well as service-loading them from `yarn.security.ServiceCredentialProvider`. Old Hierarchy: ``` yarn.security.ServiceCredentialProvider (service loaded) HadoopFSCredentialProvider HiveCredentialProvider HBaseCredentialProvider yarn.security.ConfigurableCredentialManager ``` New Hierarchy: ``` HadoopDelegationTokenManager HadoopDelegationTokenProvider (not service loaded) HadoopFSDelegationTokenProvider HiveDelegationTokenProvider HBaseDelegationTokenProvider yarn.security.ServiceCredentialProvider (service loaded) yarn.security.YARNHadoopDelegationTokenManager ``` ## How was this patch tested? unit tests Author: Michael Gummelt <mgummelt@mesosphere.io> Author: Dr. Stefan Schimanski <sttts@mesosphere.io> Closes #17723 from mgummelt/SPARK-20434-refactor-kerberos.
-
Xingbo Jiang authored
[SPARK-16251][SPARK-20200][CORE][TEST] Flaky test: org.apache.spark.rdd.LocalCheckpointSuite.missing checkpoint block fails with informative message ## What changes were proposed in this pull request? Currently we don't wait to confirm the removal of the block from the slave's BlockManager; if the removal takes too long, the assertion in this test case fails. The failure can be easily reproduced by sleeping for a while before removing the block in BlockManagerSlaveEndpoint.receiveAndReply(). ## How was this patch tested? N/A Author: Xingbo Jiang <xingbo.jiang@databricks.com> Closes #18314 from jiangxb1987/LocalCheckpointSuite.
-
Felix Cheung authored
## What changes were proposed in this pull request? doc only change ## How was this patch tested? manually Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #18312 from felixcheung/sqljsonwholefiledoc.
-
ALeksander Eskilson authored
## What changes were proposed in this pull request? This pull-request exclusively includes the class splitting feature described in #16648. When code for a given class would grow beyond 1600k bytes, a private, nested sub-class is generated into which subsequent functions are inlined. Additional sub-classes are generated as the code threshold is met subsequent times. This code includes 3 changes: 1. Includes helper maps, lists, and functions for keeping track of sub-classes during code generation (included in the `CodeGenerator` class). These helper functions allow nested classes and split functions to be initialized/declared/inlined to the appropriate locations in the various projection classes. 2. Changes `addNewFunction` to return a string to support instances where a split function is inlined to a nested class and not the outer class (and so must be invoked using the class-qualified name). Uses of `addNewFunction` throughout the codebase are modified so that the returned name is properly used. 3. Removes instances of the `this` keyword when used on data inside generated classes. All state declared in the outer class is by default global and accessible to the nested classes. However, if a reference to global state in a nested class is prepended with the `this` keyword, it would attempt to reference state belonging to the nested class (which would not exist), rather than the correct variable belonging to the outer class. ## How was this patch tested? Added a test case to the `GeneratedProjectionSuite` that increases the number of columns tested in various projections to a threshold that would previously have triggered a `JaninoRuntimeException` for the Constant Pool. Note: This PR does not address the second Constant Pool issue with code generation (also mentioned in #16648): excess global mutable state. A second PR may be opened to resolve that issue. Author: ALeksander Eskilson <alek.eskilson@cerner.com> Closes #18075 from bdrillard/class_splitting_only.
-
Xiao Li authored
### What changes were proposed in this pull request? The current option name `wholeFile` is misleading for CSV users: it does not mean one record per file; a single file can contain multiple records. Thus, we should rename it. The proposal is `multiLine`. ### How was this patch tested? N/A Author: Xiao Li <gatorsmile@gmail.com> Closes #18202 from gatorsmile/renameCVSOption.
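A sketch of the renamed option as it would be used when quoted records span multiple lines, assuming a spark-shell `spark` session; the file path is a placeholder:
```scala
// Sketch: reading CSV whose quoted records contain embedded newlines, using the renamed option.
val multi = spark.read
  .option("multiLine", "true")   // previously exposed as `wholeFile`
  .option("header", "true")
  .csv("/path/to/records.csv")   // placeholder path
```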
-