Commit 3b4c4c7f authored by Tathagata Das

Merge remote-tracking branch 'apache/master' into project-refactor

Conflicts:
	examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
	streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
	streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
	streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
	streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
parents d0fd3b9a a2e7e049
Showing 59 additions and 88 deletions
*~
*.swp
*.ipr
*.iml
*.iws
.idea/
.settings
.cache
...
@@ -13,20 +13,20 @@ This README file only contains basic setup instructions.
 ## Building
 Spark requires Scala 2.10. The project is built using Simple Build Tool (SBT),
-which is packaged with it. To build Spark and its example programs, run:
+which can be obtained [here](http://www.scala-sbt.org). To build Spark and its example programs, run:
-    sbt/sbt assembly
+    sbt assembly
 Once you've built Spark, the easiest way to start using it is the shell:
-    ./spark-shell
+    ./bin/spark-shell
-Or, for the Python API, the Python shell (`./pyspark`).
+Or, for the Python API, the Python shell (`./bin/pyspark`).
 Spark also comes with several sample programs in the `examples` directory.
-To run one of them, use `./run-example <class> <params>`. For example:
+To run one of them, use `./bin/run-example <class> <params>`. For example:
-    ./run-example org.apache.spark.examples.SparkLR local[2]
+    ./bin/run-example org.apache.spark.examples.SparkLR local[2]
 will run the Logistic Regression example locally on 2 CPUs.
@@ -36,7 +36,13 @@ All of the Spark samples take a `<master>` parameter that is the cluster URL
 to connect to. This can be a mesos:// or spark:// URL, or "local" to run
 locally with one thread, or "local[N]" to run locally with N threads.
+## Running tests
+Testing first requires [Building](#Building) Spark. Once Spark is built, tests
+can be run using:
+`sbt test`
 ## A Note About Hadoop Versions
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
@@ -49,22 +55,22 @@ For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
 versions without YARN, use:
     # Apache Hadoop 1.2.1
-    $ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
+    $ SPARK_HADOOP_VERSION=1.2.1 sbt assembly
     # Cloudera CDH 4.2.0 with MapReduce v1
-    $ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
+    $ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt assembly
 For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
 with YARN, also set `SPARK_YARN=true`:
     # Apache Hadoop 2.0.5-alpha
-    $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
+    $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt assembly
     # Cloudera CDH 4.2.0 with MapReduce v2
-    $ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly
+    $ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt assembly
     # Apache Hadoop 2.2.X and newer
-    $ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
+    $ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt assembly
 When developing a Spark application, specify the Hadoop version by adding the
 "hadoop-client" artifact to your project's dependencies. For example, if you're
...
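Taken together, the README hunks above replace the bundled `sbt/sbt` wrapper with a system-installed sbt and move the launcher scripts under `bin/`. A minimal sketch of the resulting workflow, using only commands and the example class shown in the hunks:

```bash
# Build Spark and its examples with a system sbt, then launch the shells and
# one sample program from the new bin/ layout.
sbt assembly                       # was: sbt/sbt assembly
sbt test                           # run the test suites once Spark is built
./bin/spark-shell                  # was: ./spark-shell
./bin/pyspark                      # was: ./pyspark
./bin/run-example org.apache.spark.examples.SparkLR local[2]   # Logistic Regression on 2 local cores
```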
Copyright (c) 2009-2011, Barthelemy Dagenais All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
- The name of the author may not be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
b7924aabe9c5e63f0a4d8bbd17019534c7ec014e
File deleted
-<?xml version="1.0" encoding="UTF-8"?>
-<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns="http://maven.apache.org/POM/4.0.0"
-    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
-  <modelVersion>4.0.0</modelVersion>
-  <groupId>net.sf.py4j</groupId>
-  <artifactId>py4j</artifactId>
-  <version>0.7</version>
-  <description>POM was created from install:install-file</description>
-</project>
File deleted
-<?xml version="1.0" encoding="UTF-8"?>
-<metadata>
-  <groupId>net.sf.py4j</groupId>
-  <artifactId>py4j</artifactId>
-  <versioning>
-    <release>0.7</release>
-    <versions>
-      <version>0.7</version>
-    </versions>
-    <lastUpdated>20130828020333</lastUpdated>
-  </versioning>
-</metadata>
@@ -67,7 +67,7 @@
     <dependency>
       <groupId>net.sf.py4j</groupId>
       <artifactId>py4j</artifactId>
-      <version>0.7</version>
+      <version>0.8.1</version>
     </dependency>
   </dependencies>
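With py4j bumped to a version published to Maven Central, the vendored 0.7 POM and repository metadata deleted above are no longer needed. A hedged sanity check that the new coordinate resolves; `dependency:get` is a stock maven-dependency-plugin goal, not something this commit adds:

```bash
# Pull net.sf.py4j:py4j:0.8.1 from Maven Central into the local repository.
mvn -q dependency:get -Dartifact=net.sf.py4j:py4j:0.8.1
```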
@@ -124,7 +124,17 @@
   <profiles>
     <profile>
-      <id>hadoop2-yarn</id>
+      <id>yarn-alpha</id>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.spark</groupId>
+          <artifactId>spark-yarn-alpha_${scala.binary.version}</artifactId>
+          <version>${project.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
+    <profile>
+      <id>yarn</id>
       <dependencies>
         <dependency>
           <groupId>org.apache.spark</groupId>
...
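The hunk above splits the old `hadoop2-yarn` profile into `yarn-alpha` (for the pre-GA YARN APIs in Hadoop 0.23.x/2.0.x) and `yarn` (for the stable APIs in 2.2 and newer). A hedged sketch of selecting them at build time: `-P` is standard Maven, and the `hadoop.version` property name is assumed from the Spark build of this era rather than shown in this hunk:

```bash
# Activate one of the renamed YARN profiles; Hadoop versions follow the
# README's build matrix above.
mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -DskipTests package
mvn -Pyarn       -Dhadoop.version=2.2.0       -DskipTests package
```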
@@ -39,23 +39,20 @@
     </fileSet>
     <fileSet>
       <directory>
-        ${project.parent.basedir}/bin/
+        ${project.parent.basedir}/sbin/
       </directory>
-      <outputDirectory>/bin</outputDirectory>
+      <outputDirectory>/sbin</outputDirectory>
       <includes>
         <include>**/*</include>
       </includes>
     </fileSet>
     <fileSet>
       <directory>
-        ${project.parent.basedir}
+        ${project.parent.basedir}/bin/
       </directory>
       <outputDirectory>/bin</outputDirectory>
       <includes>
-        <include>run-example*</include>
-        <include>spark-class*</include>
-        <include>spark-shell*</include>
-        <include>spark-executor*</include>
+        <include>**/*</include>
       </includes>
     </fileSet>
   </fileSets>
...
@@ -29,7 +29,7 @@ rem Load environment variables from conf\spark-env.cmd, if it exists
 if exist "%FWDIR%conf\spark-env.cmd" call "%FWDIR%conf\spark-env.cmd"
 rem Build up classpath
-set CLASSPATH=%SPARK_CLASSPATH%;%FWDIR%conf
+set CLASSPATH=%FWDIR%conf
 if exist "%FWDIR%RELEASE" (
   for %%d in ("%FWDIR%jars\spark-assembly*.jar") do (
     set ASSEMBLY_JAR=%%d
...
@@ -26,7 +26,7 @@ SCALA_VERSION=2.10
 FWDIR="$(cd `dirname $0`/..; pwd)"
 # Load environment variables from conf/spark-env.sh, if it exists
-if [ -e $FWDIR/conf/spark-env.sh ] ; then
+if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
   . $FWDIR/conf/spark-env.sh
 fi
...
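The only change in this hunk is quoting `$FWDIR` inside the `[ -e ... ]` test. A minimal repro of the bug the quotes fix, with an illustrative path that is not from the commit:

```bash
# Unquoted, a Spark home containing a space word-splits into two arguments
# and `[` aborts with an error instead of testing the intended path.
FWDIR="/opt/spark dir"
[ -e $FWDIR/conf/spark-env.sh ]     # bash: [: too many arguments
[ -e "$FWDIR/conf/spark-env.sh" ]   # correct: one argument, test just returns false
```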
@@ -18,7 +18,7 @@
 #
 # Figure out where the Scala framework is installed
-FWDIR="$(cd `dirname $0`; pwd)"
+FWDIR="$(cd `dirname $0`/..; pwd)"
 # Export this as SPARK_HOME
 export SPARK_HOME="$FWDIR"
@@ -31,13 +31,13 @@ if [ ! -f "$FWDIR/RELEASE" ]; then
   ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*.jar >& /dev/null
   if [[ $? != 0 ]]; then
     echo "Failed to find Spark assembly in $FWDIR/assembly/target" >&2
-    echo "You need to build Spark with sbt/sbt assembly before running this program" >&2
+    echo "You need to build Spark with sbt assembly before running this program" >&2
     exit 1
   fi
 fi
 # Load environment variables from conf/spark-env.sh, if it exists
-if [ -e $FWDIR/conf/spark-env.sh ] ; then
+if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
   . $FWDIR/conf/spark-env.sh
 fi
...
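The recurring `dirname $0` edit follows the scripts moving one level below the Spark root: SPARK_HOME must now resolve to the parent of the script's directory. A short illustration with a hypothetical install path:

```bash
# If a launcher is invoked as /opt/spark/bin/spark-shell:
FWDIR="$(cd `dirname $0`; pwd)"      # old: /opt/spark/bin -- the wrong home
FWDIR="$(cd `dirname $0`/..; pwd)"   # new: /opt/spark -- the actual Spark root
export SPARK_HOME="$FWDIR"           # the cd/pwd pair also yields an absolute, canonical path
```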
File moved
@@ -20,7 +20,7 @@ rem
 set SCALA_VERSION=2.10
 rem Figure out where the Spark framework is installed
-set FWDIR=%~dp0
+set FWDIR=%~dp0..\
 rem Export this as SPARK_HOME
 set SPARK_HOME=%FWDIR%
...
@@ -25,13 +25,13 @@ esac
 SCALA_VERSION=2.10
 # Figure out where the Scala framework is installed
-FWDIR="$(cd `dirname $0`; pwd)"
+FWDIR="$(cd `dirname $0`/..; pwd)"
 # Export this as SPARK_HOME
 export SPARK_HOME="$FWDIR"
 # Load environment variables from conf/spark-env.sh, if it exists
-if [ -e $FWDIR/conf/spark-env.sh ] ; then
+if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
   . $FWDIR/conf/spark-env.sh
 fi
@@ -55,7 +55,7 @@ if [ -e "$EXAMPLES_DIR"/target/spark-examples*[0-9Tg].jar ]; then
 fi
 if [[ -z $SPARK_EXAMPLES_JAR ]]; then
   echo "Failed to find Spark examples assembly in $FWDIR/examples/target" >&2
-  echo "You need to build Spark with sbt/sbt assembly before running this program" >&2
+  echo "You need to build Spark with sbt assembly before running this program" >&2
   exit 1
 fi
...
File moved
@@ -20,7 +20,7 @@ rem
 set SCALA_VERSION=2.10
 rem Figure out where the Spark framework is installed
-set FWDIR=%~dp0
+set FWDIR=%~dp0..\
 rem Export this as SPARK_HOME
 set SPARK_HOME=%FWDIR%
@@ -49,7 +49,7 @@ if "x%SPARK_EXAMPLES_JAR%"=="x" (
 rem Compute Spark classpath using external script
 set DONT_PRINT_CLASSPATH=1
-call "%FWDIR%bin\compute-classpath.cmd"
+call "%FWDIR%sbin\compute-classpath.cmd"
 set DONT_PRINT_CLASSPATH=0
 set CLASSPATH=%SPARK_EXAMPLES_JAR%;%CLASSPATH%
...
@@ -25,13 +25,13 @@ esac
 SCALA_VERSION=2.10
 # Figure out where the Scala framework is installed
-FWDIR="$(cd `dirname $0`; pwd)"
+FWDIR="$(cd `dirname $0`/..; pwd)"
 # Export this as SPARK_HOME
 export SPARK_HOME="$FWDIR"
 # Load environment variables from conf/spark-env.sh, if it exists
-if [ -e $FWDIR/conf/spark-env.sh ] ; then
+if [ -e "$FWDIR/conf/spark-env.sh" ] ; then
   . $FWDIR/conf/spark-env.sh
 fi
@@ -92,7 +92,7 @@ JAVA_OPTS="$OUR_JAVA_OPTS"
 JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
 JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"
 # Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e $FWDIR/conf/java-opts ] ; then
+if [ -e "$FWDIR/conf/java-opts" ] ; then
   JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
 fi
 export JAVA_OPTS
@@ -104,7 +104,7 @@ if [ ! -f "$FWDIR/RELEASE" ]; then
   jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
   if [ "$num_jars" -eq "0" ]; then
     echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
-    echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
+    echo "You need to build Spark with 'sbt assembly' before running this program." >&2
     exit 1
   fi
   if [ "$num_jars" -gt "1" ]; then
@@ -129,11 +129,16 @@ fi
 # Compute classpath using external script
 CLASSPATH=`$FWDIR/bin/compute-classpath.sh`
-CLASSPATH="$CLASSPATH:$SPARK_TOOLS_JAR"
+if [ "$1" == "org.apache.spark.tools.JavaAPICompletenessChecker" ]; then
+  CLASSPATH="$CLASSPATH:$SPARK_TOOLS_JAR"
+fi
 if $cygwin; then
   CLASSPATH=`cygpath -wp $CLASSPATH`
-  export SPARK_TOOLS_JAR=`cygpath -w $SPARK_TOOLS_JAR`
+  if [ "$1" == "org.apache.spark.tools.JavaAPICompletenessChecker" ]; then
+    export SPARK_TOOLS_JAR=`cygpath -w $SPARK_TOOLS_JAR`
+  fi
 fi
 export CLASSPATH
...
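The final hunk stops appending `$SPARK_TOOLS_JAR` to every classpath: the tools assembly is now added only for the single main class that needs it, so running ordinary classes no longer depends on having built spark-tools. A hedged usage sketch; the class name is the one hard-coded in the hunk, while the script's `sbin/` location is assumed from this refactor's layout:

```bash
# Only this invocation now picks up the spark-tools assembly on its classpath;
# all other main classes run without $SPARK_TOOLS_JAR.
./sbin/spark-class org.apache.spark.tools.JavaAPICompletenessChecker
```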
File moved
@@ -20,7 +20,7 @@ rem
 set SCALA_VERSION=2.10
 rem Figure out where the Spark framework is installed
-set FWDIR=%~dp0
+set FWDIR=%~dp0..\
 rem Export this as SPARK_HOME
 set SPARK_HOME=%FWDIR%
@@ -73,7 +73,7 @@ for %%d in ("%TOOLS_DIR%\target\scala-%SCALA_VERSION%\spark-tools*assembly*.jar"
 rem Compute classpath using external script
 set DONT_PRINT_CLASSPATH=1
-call "%FWDIR%bin\compute-classpath.cmd"
+call "%FWDIR%sbin\compute-classpath.cmd"
 set DONT_PRINT_CLASSPATH=0
 set CLASSPATH=%CLASSPATH%;%SPARK_TOOLS_JAR%
...