From d8e14db84f5ea752fbe92036209f67232b4dcc1f Mon Sep 17 00:00:00 2001
From: hyukjinkwon <gurwls223@gmail.com>
Date: Tue, 27 Dec 2016 18:50:54 +0000
Subject: [PATCH] [SPARK-18842][TESTS] De-duplicate paths in classpaths in
 processes for local-cluster mode in ReplSuite to work around the length
 limitation on Windows

## What changes were proposed in this pull request?

`ReplSuite`s hang due to the length limitation on Windows with the exception as below:

```
Spark context available as 'sc' (master = local-cluster[1,1,1024], app id = app-20161223114000-0000).
Spark session available as 'spark'.
Exception in thread "ExecutorRunner for app-20161223114000-0000/26995" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOf(Arrays.java:3332)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
	at java.lang.StringBuilder.append(StringBuilder.java:202)
	at java.lang.ProcessImpl.createCommandLine(ProcessImpl.java:194)
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
	at java.lang.ProcessImpl.start(ProcessImpl.java:137)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:167)
	at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
```

The reason is, it keeps failing and goes in an infinite loop. This fails because it uses the paths (via `getFile`) from URLs in the tests whereas some added afterward are normal local paths.
(`url.getFile` gives `/C:/a/b/c` and some paths are added later as the format of `C:\a\b\c`. )

So, many classpaths are duplicated because normal local paths and paths from URLs are mixed. This length is up to 40K which hits the length limitation problem (32K) on Windows.

The full command line built here is - https://gist.github.com/HyukjinKwon/46af7946c9a5fd4c6fc70a8a0aba1beb

## How was this patch tested?

Manually via AppVeyor.

**Before**
https://ci.appveyor.com/project/spark-test/spark/build/395-find-path-issues

**After**
https://ci.appveyor.com/project/spark-test/spark/build/398-find-path-issues

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16398 from HyukjinKwon/SPARK-18842-more.
---
 .../src/test/scala/org/apache/spark/repl/ReplSuite.scala        | 2 +-
 .../src/test/scala/org/apache/spark/repl/ReplSuite.scala        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala
index 26b8600c32..b3688c9606 100644
--- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala
+++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala
@@ -44,7 +44,7 @@ class ReplSuite extends SparkFunSuite {
         }
       }
     }
-    val classpath = paths.mkString(File.pathSeparator)
+    val classpath = paths.map(new File(_).getAbsolutePath).mkString(File.pathSeparator)
 
     val oldExecutorClasspath = System.getProperty(CONF_EXECUTOR_CLASSPATH)
     System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath)
diff --git a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
index 9262e938c2..55c91675ed 100644
--- a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
+++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
@@ -45,7 +45,7 @@ class ReplSuite extends SparkFunSuite {
         }
       }
     }
-    val classpath = paths.mkString(File.pathSeparator)
+    val classpath = paths.map(new File(_).getAbsolutePath).mkString(File.pathSeparator)
 
     val oldExecutorClasspath = System.getProperty(CONF_EXECUTOR_CLASSPATH)
     System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath)
-- 
GitLab