- Aug 21, 2013
-
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Matei Zaharia authored
-
Reynold Xin authored
-
Reynold Xin authored
-
Matei Zaharia authored
-
- Aug 20, 2013
-
-
Reynold Xin authored
Removed meaningless types
-
Mark Hamstra authored
-
Matei Zaharia authored
Conflicts: core/src/main/scala/spark/PairRDDFunctions.scala
-
Matei Zaharia authored
Renamed 'priority' to 'jobId' and assorted minor changes
-
Mark Hamstra authored
-
Mark Hamstra authored
-
Matei Zaharia authored
Scheduler fixes and improvements
-
- Aug 19, 2013
-
-
Matei Zaharia authored
Small fixes to web UI
-
Matei Zaharia authored
-
Matei Zaharia authored
Allow subclasses of Product2 in all key-value related classes
-
Matei Zaharia authored
  - Use SPARK_PUBLIC_DNS environment variable if set (for EC2)
  - Use a non-ephemeral port (3030 instead of 33000) by default
  - Updated test to use non-ephemeral port too
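The hostname and port behaviour described in this commit can be illustrated with a minimal sketch. It is not the code from the commit: the helper names `public_host` and `advertised_address` are hypothetical, and falling back to the local hostname when the variable is unset is an assumption. Only the `SPARK_PUBLIC_DNS` variable name and the port 3030 come from the message.

```python
import os
import socket

# Default port named in the commit message (3030 rather than the ephemeral 33000).
DEFAULT_PORT = 3030


def public_host(env=None):
    """Return SPARK_PUBLIC_DNS if it is set and non-empty, else the local hostname.

    The fallback to socket.gethostname() is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("SPARK_PUBLIC_DNS") or socket.gethostname()


def advertised_address(env=None, port=DEFAULT_PORT):
    """Hypothetical helper: build the host:port string a service would advertise."""
    return f"{public_host(env)}:{port}"
```

Passing an explicit dictionary as `env` keeps the sketch testable without touching the real process environment.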
-
Reynold Xin authored
Code review feedback. (added tests for cogroup and subtract; added more documentation on MutablePair)
-
Jey Kottalam authored
-
Reynold Xin authored
-
Reynold Xin authored
Made PairRDDFunctions take only Tuple2, but made the rest of the shuffle code path work with general Product2.
-
- Aug 18, 2013
-
-
Reynold Xin authored
-
Reynold Xin authored
Allow subclasses of Product2 in all key-value related classes (ShuffleDependency, PairRDDFunctions, etc).
-
Matei Zaharia authored
resetting locality level after a non-local launch
-
Matei Zaharia authored
-
Matei Zaharia authored
  - When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers
  - Simplify ClusterScheduler.prioritizeContainers
  - Add docs on the new configuration options
-
Matei Zaharia authored
-
Matei Zaharia authored
  - Added periodic revival of offers in StandaloneSchedulerBackend
  - Replaced task scheduling aggression with multi-level delay scheduling in ClusterTaskSetManager
  - Fixed ZippedRDD preferred locations because they can't currently be process-local
  - Fixed some uses of hostPort
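As a rough illustration of what "multi-level delay scheduling" means, the sketch below shows a task set that only accepts launches at its current locality level and relaxes to the next level after waiting a fixed time without a launch. It is a simplified model, not the ClusterTaskSetManager code: the class name `DelayScheduler`, the single shared wait time, and the level names in `LEVELS` are assumptions made for this example.

```python
# Locality levels from most to least local (names assumed for illustration).
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


class DelayScheduler:
    """Hypothetical model of multi-level delay scheduling for one task set."""

    def __init__(self, wait_seconds):
        self.wait = wait_seconds   # how long to hold out at each level
        self.level = 0             # index into LEVELS
        self.last_launch = 0.0     # time of the last launch or level change

    def allowed_level(self, now):
        """Relax one level for every full wait period that has elapsed."""
        while self.level < len(LEVELS) - 1 and now - self.last_launch >= self.wait:
            self.last_launch += self.wait
            self.level += 1
        return LEVELS[self.level]

    def record_launch(self, now, level_name):
        """After a launch, continue from the level at which the task ran."""
        self.level = LEVELS.index(level_name)
        self.last_launch = now
```

In this model a launch at a more local level pulls the task set back to that level, so a brief gap in local offers does not permanently degrade locality.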
-
Matei Zaharia authored
  - Replace use of hostPort vs host in Task.preferredLocations with a TaskLocation class that contains either an executorId and a host or just a host. This is part of a bigger effort to eliminate hostPort based data structures and just use executorID, since the hostPort vs host stuff is confusing (and not checkable with static typing, leading to ugly debug code), and hostPorts are not provided by Mesos.
  - Replaced most hostPort-based data structures and fields as above.
  - Simplified ClusterTaskSetManager to deal with preferred locations in a more concise way and generally be more concise.
  - Updated the way ClusterTaskSetManager handles racks: instead of enqueueing a task to a separate queue for all the hosts in the rack, which would create lots of large queues, have one queue per rack name.
  - Removed non-local fallback stuff in ClusterScheduler that tried to launch less-local tasks on a node once the local ones were all assigned. This change didn't work because many cluster schedulers send offers for just one node at a time (even the standalone and YARN ones do so as nodes join the cluster one by one). Thus, lots of non-local tasks would be assigned even though a node with locality for them would be able to receive tasks just a short time later.
  - Renamed MapOutputTracker "generations" to "epochs".
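The two data-structure ideas in this message, a location that is either an executor plus a host or just a host, and one pending queue per rack name, can be sketched as follows. This is an illustrative model only: the field names, the `is_process_local` property, and the `queue_by_rack` helper with its `rack_of` lookup are hypothetical and do not reproduce the Scala code from the commit.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskLocation:
    """A preferred location: always a host, optionally a specific executor on it."""
    host: str
    executor_id: Optional[str] = None

    @property
    def is_process_local(self) -> bool:
        # Only a location that names an executor can be process-local.
        return self.executor_id is not None


def queue_by_rack(tasks, rack_of):
    """Build one pending queue per rack name (not one per host in the rack).

    tasks:   iterable of (task_id, [TaskLocation, ...]) pairs
    rack_of: function mapping a host name to a rack name, or None if unknown
    """
    queues = defaultdict(deque)
    for task_id, locations in tasks:
        racks = {rack_of(loc.host) for loc in locations}
        for rack in racks:
            if rack is not None:
                queues[rack].append(task_id)
    return queues
```

Keying the queues by rack name means a task with several preferred hosts in the same rack is enqueued once for that rack rather than once per host.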
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Matei Zaharia authored
Implementing SPARK-878 for PySpark: adding zip and egg files to context ...
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Matei Zaharia authored
Two minor RDD refactorings
-
- Aug 17, 2013
-
-
Reynold Xin authored
Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations.
-
Reynold Xin authored
-
Reynold Xin authored
-