- Dec 26, 2013
-
-
Matei Zaharia authored
Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.

For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.

I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endianness differs from java.nio.ByteBuffer's idea of the native byte order.
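The byte-order risk described above can be illustrated with a minimal sketch. This is not the wire format used by mllib.py or PythonMLLibAPI.scala; the layout (a 4-byte length followed by 8-byte doubles) and the function names `serialize_vector` and `deserialize_vector` are assumptions made only for illustration. The point is that both sides must agree on byte order, and that naming it explicitly avoids depending on each side's notion of "native".

```python
import struct


def serialize_vector(values, byte_order="<"):
    """Pack floats as an int32 length followed by that many float64 values.

    Hypothetical layout for illustration; byte_order is "<" (little-endian)
    or ">" (big-endian), stated explicitly instead of relying on native order.
    """
    return struct.pack(f"{byte_order}i{len(values)}d", len(values), *values)


def deserialize_vector(data, byte_order="<"):
    """Inverse of serialize_vector; must be called with the same byte_order."""
    (count,) = struct.unpack_from(f"{byte_order}i", data, 0)
    # The length header occupies the first 4 bytes; the doubles follow it.
    return list(struct.unpack_from(f"{byte_order}{count}d", data, 4))


payload = serialize_vector([1.0, 2.5, -3.0])
roundtrip = deserialize_vector(payload)

# Reading the little-endian length header as big-endian yields a wrong count,
# which is the kind of failure a byte-order mismatch would cause.
(misread_count,) = struct.unpack_from(">i", payload, 0)
```

Here `roundtrip` equals the original list, while `misread_count` is no longer 3, so a reader that assumed the other byte order would fail before decoding any values.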
-
- Dec 25, 2013
-
-
Tor Myklebust authored
-
Tor Myklebust authored
-
Matei Zaharia authored
Typo: avaiable -> available
-
Reynold Xin authored
Fixed job name in the Java streaming example.
-
- Dec 24, 2013
-
-
Tor Myklebust authored
-
Tor Myklebust authored
-
Tor Myklebust authored
-
Andrew Ash authored
-
Patrick Wendell authored
Deduplicate Local and Cluster schedulers.

The code in LocalScheduler/LocalTaskSetManager was nearly identical to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy made updating the schedulers unnecessarily painful and error-prone. This commit combines the two into a single TaskScheduler/TaskSetManager.

Unfortunately the diff makes this change look much more invasive than it is: TaskScheduler.scala is only superficially changed (names updated, overrides removed) from the old ClusterScheduler.scala, and the same is true of TaskSetManager.scala.

Thanks @rxin for suggesting this change!
-
Patrick Wendell authored
Clean up shuffle files once their metadata is gone

Previously, we would only clean the in-memory metadata for consolidated shuffle files. This commit also fixes a bug where the Metadata Cleaner was ignoring type-specific TTLs.
-
Kay Ousterhout authored
-
Tor Myklebust authored
-
Tor Myklebust authored
-
Tor Myklebust authored
-
Matei Zaharia authored
Change the order of CLASSPATH.

SPARK_TOOLS_JAR should be placed after CLASSPATH, or at least after SPARK_CLASSPATH. If SPARK_TOOLS_JAR is placed before CLASSPATH, all assembled classes and resources in spark-tools-assembly.jar take precedence over those in CLASSPATH or SPARK_CLASSPATH, which may have been replaced by customized versions.
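The ordering concern in the commit message above follows from first-match-wins lookup along a classpath. The sketch below models that behaviour in Python under simplifying assumptions: each entry is a name plus the set of class names it contains, and `resolve`, the entry names, and `org.example.Foo` are hypothetical and not taken from the Spark scripts.

```python
def resolve(class_name, classpath):
    """Return the name of the first classpath entry that contains class_name.

    classpath is an ordered list of (entry_name, set_of_class_names) pairs.
    Earlier entries shadow later ones, mirroring first-match-wins lookup.
    """
    for entry_name, classes in classpath:
        if class_name in classes:
            return entry_name
    return None


# Hypothetical entries: both provide the same class name.
tools_jar = ("spark-tools-assembly.jar", {"org.example.Foo"})
custom = ("custom-classes", {"org.example.Foo"})

# Tools jar first: the customized class is shadowed.
before = resolve("org.example.Foo", [tools_jar, custom])
# Tools jar last: the customized class wins.
after = resolve("org.example.Foo", [custom, tools_jar])
```

With the tools jar listed first, `before` names the assembly jar; moving it to the end lets the customized entry win, which is the effect the reordering is after.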
-
Tor Myklebust authored
-
azuryyu authored
-
azuryyu authored
-
Reynold Xin authored
Show full stack trace and time taken in unit tests.
-
- Dec 23, 2013
-
-
Reynold Xin authored
-
Matei Zaharia authored
Refactored the streaming scheduler and added StreamingListener interface

  - Refactored the streaming scheduler for cleaner code. Specifically, the JobManager was renamed to JobScheduler, as it does the actual scheduling of Spark jobs to the SparkContext. The earlier Scheduler was renamed to JobGenerator, as it actually generates the jobs from the DStreams. The JobScheduler starts the JobGenerator. Also moved all the scheduler-related code from spark.streaming to the spark.streaming.scheduler package.
  - Implemented the StreamingListener interface, similar to SparkListener. The streaming version of StatusReportListener prints the batch processing time statistics (for now). Added StreamingListenerSuite to test it.
  - Refactored streaming TestSuiteBase to deduplicate code in the other streaming test suites.
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
-
Reynold Xin authored
Added SPARK-968 implementation for review
-
wangda.tan authored
-
- Dec 22, 2013
-
-
Tor Myklebust authored
-
Kay Ousterhout authored
-
wangda.tan authored
-
Kay Ousterhout authored
-
- Dec 21, 2013
-
-
Tor Myklebust authored
-
Tor Myklebust authored
-
- Dec 20, 2013
-
-
Tathagata Das authored
-
Kay Ousterhout authored
-
Kay Ousterhout authored
Conflicts:
  core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
  core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
  core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
-
Patrick Wendell authored
Minor cleanup for standalone scheduler

See commit messages.
-
Tor Myklebust authored
-
Tor Myklebust authored
-
Tor Myklebust authored
-