Commits · 6585f49841ada637b0811e0aadcf93132fff7001 · cs525-sp18-g07 / spark

Aug 21, 2013
- Update build docs · 6585f498
  Jey Kottalam authored 11 years ago
- Allow build configuration to be set in conf/spark-env.sh · 66e7a38a
  Jey Kottalam authored 11 years ago
- Use "hadoop.version" property when specifying Hadoop YARN version too · 31644a01
  Jey Kottalam authored 11 years ago
- Update jekyll plugin to match docs/README.md · 9c6f8df3
  Jey Kottalam authored 11 years ago
- Change default SPARK_HADOOP_VERSION in make-distribution.sh too · 111b2741
  Matei Zaharia authored 11 years ago
- Merge branch 'master' of github.com:mesos/spark · 8e3ea4c7
  Reynold Xin authored 11 years ago
- Downgraded default build hadoop version to 1.0.4. · af602ba9
  Reynold Xin authored 11 years ago
- Update docs for Spark UI port · 53b1c306
  Matei Zaharia authored 11 years ago
Aug 20, 2013
- Merge pull request #851 from markhamstra/MutablePairTE · 2905611c
  Reynold Xin authored 11 years ago
  
  Removed meaningless types
- Removed meaningless types · 5eea613e
  Mark Hamstra authored 11 years ago
- Merge remote-tracking branch 'jey/hadoop-agnostic' · aa2b89d9
  Matei Zaharia authored 11 years ago
  
  Conflicts: core/src/main/scala/spark/PairRDDFunctions.scala
- Merge pull request #844 from markhamstra/priorityRename · d61337f6
  Matei Zaharia authored 11 years ago
  
  Renamed 'priority' to 'jobId' and assorted minor changes
- changeGeneration --> changeEpoch renaming · 1630fbf8
  Mark Hamstra authored 11 years ago
- Renamed 'priority' to 'jobId' and assorted minor changes · ad184104
  Mark Hamstra authored 11 years ago
- Merge pull request #828 from mateiz/sched-improvements · 8cae72e9
  Matei Zaharia authored 11 years ago
  
  Scheduler fixes and improvements
Aug 19, 2013
- Merge pull request #849 from mateiz/web-fixes · efeb1429
  Matei Zaharia authored 11 years ago
  
  Small fixes to web UI
- Allow some wiggle room in UISuite port test and in EC2 ports · 793a722f
  Matei Zaharia authored 11 years ago
- Merge pull request #847 from rxin/rdd · abdc1f8b
  Matei Zaharia authored 11 years ago
  
  Allow subclasses of Product2 in all key-value related classes
- Small fixes to web UI: · 498a2618
  Matei Zaharia authored 11 years ago
  
  - Use SPARK_PUBLIC_DNS environment variable if set (for EC2)
  - Use a non-ephemeral port (3030 instead of 33000) by default
  - Updated test to use non-ephemeral port too
- Code review feedback. (added tests for cogroup and substract; added more... · 5054abd4
  Reynold Xin authored 11 years ago
  
  Code review feedback. (added tests for cogroup and substract; added more documentation on MutablePair)
- Update SBT build to use simpler fix for Hadoop 0.23.9 · 6f6944c8
  Jey Kottalam authored 11 years ago
- Added a test for sorting using MutablePair's. · acc4aa1f
  Reynold Xin authored 11 years ago
- Made PairRDDFunctions taking only Tuple2, but made the rest of the shuffle... · 71d705a6
  Reynold Xin authored 11 years ago
  
  Made PairRDDFunctions taking only Tuple2, but made the rest of the shuffle code path working with general Product2.
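
Commits 71d705a6 and 82bf4c03 describe widening the shuffle code path from `Tuple2` to any subclass of `Product2`. The sketch below illustrates why that matters, under stated assumptions: `SimplePair` is a hypothetical stand-in for a reusable mutable record (it is not Spark's `MutablePair`), and `sumByKey` and `reusingIterator` are hypothetical helpers, not Spark APIs. Because `sumByKey` is typed against `Product2[K, Int]`, it accepts ordinary tuples as well as a single pair object that is overwritten for every record.

```scala
// Hypothetical reusable pair: a stand-in for a mutable pair type,
// not Spark's actual MutablePair.
final class SimplePair[K, V](private var k: K, private var v: V) extends Product2[K, V] {
  def _1: K = k
  def _2: V = v
  // Overwrite both fields in place and return the same instance.
  def set(newK: K, newV: V): this.type = { k = newK; v = newV; this }
  def canEqual(that: Any): Boolean = that.isInstanceOf[SimplePair[_, _]]
}

// Hypothetical helper typed against Product2, so Tuple2 and SimplePair both fit.
def sumByKey[K](records: Iterator[Product2[K, Int]]): Map[K, Int] =
  records.foldLeft(Map.empty[K, Int]) { (acc, rec) =>
    acc.updated(rec._1, acc.getOrElse(rec._1, 0) + rec._2)
  }

// Emits one record per key, reusing a single SimplePair instance for all of them.
def reusingIterator(keys: Seq[String]): Iterator[SimplePair[String, Int]] = {
  val pair = new SimplePair[String, Int](null, 0)
  keys.iterator.map(key => pair.set(key, 1))
}
```

Reusing one object per record is only safe when the consumer reads `_1` and `_2` immediately, as `foldLeft` does here; a consumer that buffered the references would see every element alias the last record.
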
Aug 18, 2013
- Added the missing RDD files and cleaned up SparkContext. · 2a7b99c0
  Reynold Xin authored 11 years ago
- Allow subclasses of Product2 in all key-value related classes... · 82bf4c03
  Reynold Xin authored 11 years ago
  
  Allow subclasses of Product2 in all key-value related classes (ShuffleDependency, PairRDDFunctions, etc).
- Added unit tests for ClusterTaskSetManager, and fix a bug found with · 8ac3d1e2
  Matei Zaharia authored 11 years ago
  
  resetting locality level after a non-local launch
- Added some comments on threading in scheduler code · 4004cf77
  Matei Zaharia authored 11 years ago
- Address some review comments: · 2a4ed102
  Matei Zaharia authored 11 years ago
  
  - When a resourceOffers() call has multiple offers, force the TaskSets
    to consider them in increasing order of locality levels so that they
    get a chance to launch stuff locally across all offers
  - Simplify ClusterScheduler.prioritizeContainers
  - Add docs on the new configuration options
- Comment cleanup (via Kay) and some debug messages · 222c8971
  Matei Zaharia authored 11 years ago
- More scheduling fixes: · cf39d45d
  Matei Zaharia authored 11 years ago
  
  - Added periodic revival of offers in StandaloneSchedulerBackend
  - Replaced task scheduling aggression with multi-level delay scheduling
    in ClusterTaskSetManager
  - Fixed ZippedRDD preferred locations because they can't currently be
    process-local
  - Fixed some uses of hostPort
- Initial work towards scheduler refactoring: · 90a04dab
  Matei Zaharia authored 11 years ago
  
  - Replace use of hostPort vs host in Task.preferredLocations with a
    TaskLocation class that contains either an executorId and a host or
    just a host. This is part of a bigger effort to eliminate hostPort
    based data structures and just use executorID, since the hostPort vs
    host stuff is confusing (and not checkable with static typing, leading
    to ugly debug code), and hostPorts are not provided by Mesos.
  - Replaced most hostPort-based data structures and fields as above.
  - Simplified ClusterTaskSetManager to deal with preferred locations in a
    more concise way and generally be more concise.
  - Updated the way ClusterTaskSetManager handles racks: instead of
    enqueueing a task to a separate queue for all the hosts in the rack,
    which would create lots of large queues, have one queue per rack name.
  - Removed non-local fallback stuff in ClusterScheduler that tried to
    launch less-local tasks on a node once the local ones were all
    assigned. This change didn't work because many cluster schedulers send
    offers for just one node at a time (even the standalone and YARN ones
    do so as nodes join the cluster one by one). Thus, lots of non-local
    tasks would be assigned even though a node with locality for them
    would be able to receive tasks just a short time later.
  - Renamed MapOutputTracker "generations" to "epochs".
- Remove redundant dependencies from POMs · 23f4622a
  Jey Kottalam authored 11 years ago
- Fix Maven build with Hadoop 0.23.9 · bdd861c6
  Jey Kottalam authored 11 years ago
- Merge pull request #840 from AndreSchumacher/zipegg · 8fa07479
  Matei Zaharia authored 11 years ago
  
  Implementing SPARK-878 for PySpark: adding zip and egg files to context ...
- Don't assume spark-examples JAR always exists · 47a7c433
  Jey Kottalam authored 11 years ago
- Make YARN POM file valid · 44000b10
  Jey Kottalam authored 11 years ago
- Merge pull request #846 from rxin/rdd · 1e137a5a
  Matei Zaharia authored 11 years ago
  
  Two minor RDD refactoring
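
Commits 90a04dab, cf39d45d and 2a4ed102 describe a `TaskLocation` holding either an executor ID plus a host or just a host, multi-level delay scheduling in `ClusterTaskSetManager`, and considering multiple offers in increasing order of locality. The following is a simplified model of those ideas, not Spark's code: the level constants, the `Offer`, `DelayScheduledTaskSet` and `assignRound` names, the one-task-per-offer rule and the wait values are all assumptions made for illustration. A task set starts at the most local level and relaxes to the next level only after `waitMs(level)` has elapsed without a launch; after each launch it restarts from the level that launch actually used.

```scala
import scala.collection.mutable

// Illustrative locality levels, most local first (assumed names).
object Locality {
  val ProcessLocal = 0
  val NodeLocal = 1
  val RackLocal = 2
  val AnyLevel = 3
}

// Either an executor on a host, or just a host (shape taken from the commit text).
final case class TaskLocation(host: String, executorId: Option[String] = None)

// Hypothetical single-slot resource offer from one executor.
final case class Offer(executorId: String, host: String)

// Hypothetical task set using multi-level delay scheduling.
// waitMs(i) = time without a launch at level i before level i + 1 is allowed.
final class DelayScheduledTaskSet(
    prefs: Vector[Seq[TaskLocation]],
    rackOf: String => Option[String],
    waitMs: Vector[Long],
    startMs: Long) {
  private var pending: List[Int] = prefs.indices.toList
  private var levelIdx: Int = Locality.ProcessLocal
  private var lastLaunchMs: Long = startMs

  def remaining: Int = pending.size

  // Most local level task t can reach on this offer. Tasks with no
  // preference are treated as process-local (a simplification).
  private def levelFor(t: Int, offer: Offer): Int = {
    val locs = prefs(t)
    val offerRack = rackOf(offer.host)
    if (locs.isEmpty || locs.exists(_.executorId.contains(offer.executorId))) Locality.ProcessLocal
    else if (locs.exists(_.host == offer.host)) Locality.NodeLocal
    else if (offerRack.isDefined && locs.exists(l => rackOf(l.host) == offerRack)) Locality.RackLocal
    else Locality.AnyLevel
  }

  // Relax the level once for every wait that has expired since the last launch.
  private def allowedLevel(nowMs: Long): Int = {
    while (levelIdx < Locality.AnyLevel && nowMs - lastLaunchMs >= waitMs(levelIdx)) {
      lastLaunchMs += waitMs(levelIdx)
      levelIdx += 1
    }
    levelIdx
  }

  // Launch the most local pending task permitted right now, if any.
  def resourceOffer(offer: Offer, nowMs: Long, maxLevel: Int = Locality.AnyLevel): Option[Int] = {
    val allowed = math.min(allowedLevel(nowMs), maxLevel)
    val candidates = pending.map(t => (t, levelFor(t, offer))).filter(_._2 <= allowed)
    if (candidates.isEmpty) None
    else {
      val (task, level) = candidates.minBy(_._2)
      pending = pending.filterNot(_ == task)
      levelIdx = level      // restart from the level this launch used
      lastLaunchMs = nowMs  // and restart the wait timer
      Some(task)
    }
  }
}

// Try every offer at each level before relaxing to the next level.
def assignRound(ts: DelayScheduledTaskSet, offers: Seq[Offer], nowMs: Long): Seq[(Offer, Int)] = {
  val launched = mutable.ArrayBuffer.empty[(Offer, Int)]
  val used = mutable.Set.empty[Offer]
  for (max <- Locality.ProcessLocal to Locality.AnyLevel; offer <- offers if !used(offer)) {
    ts.resourceOffer(offer, nowMs, max).foreach { t =>
      launched += ((offer, t))
      used += offer
    }
  }
  launched.toSeq
}
```

With all waits at zero, a naive loop over the offers in arrival order would hand a task that prefers `hostA` to a `hostC` offer that arrives first; trying every offer at each level first lets the node-local `hostA` offer win.
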

Aug 17, 2013
- Moved shuffle serializer setting from a constructor parameter to a... · 2c00ea3e
  Reynold Xin authored 11 years ago
  
  Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations.
- Removed the mapSideCombine option in partitionBy. · 0e84fee7
  Reynold Xin authored 11 years ago
- Removed the mapSideCombine option in CoGroupedRDD. · 10af952a
  Reynold Xin authored 11 years ago
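
Commit 2c00ea3e replaces a constructor parameter with a `setSerializer` method on RDDs that shuffle. A minimal sketch of that setter style follows; `ShuffledDataset`, the string class-name argument and the `"default"` fallback are assumptions of this sketch, not Spark's actual signatures. Returning `this.type` keeps the setter chainable directly after the constructor call.

```scala
// Hypothetical shuffle-producing dataset; not Spark's ShuffledRDD.
final class ShuffledDataset[K, V](val records: Seq[(K, V)], val numPartitions: Int) {
  // None means "fall back to the default serializer" in this sketch.
  private var serializerClass: Option[String] = None

  // Chainable setter: returns this same instance.
  def setSerializer(cls: String): this.type = {
    serializerClass = Option(cls) // a null class name resets to the default
    this
  }

  def serializer: String = serializerClass.getOrElse("default")
}

val plain = new ShuffledDataset(Seq(1 -> "a", 2 -> "b"), numPartitions = 4)
val custom = new ShuffledDataset(Seq(1 -> "a"), numPartitions = 2).setSerializer("example.MySerializer")
```

One plausible benefit of this shape is that callers content with the default never have to mention the serializer at all.
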