- Nov 17, 2013
Reynold Xin authored
Add PrimitiveVectorSuite and fix bug in resize()
-
Aaron Davidson authored
-
Reynold Xin authored
-
Reynold Xin authored
1. Added trim() method.
2. Added size method.
3. Renamed getUnderlyingArray to array.
4. Minor documentation update.
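For context, a minimal Scala sketch of what an append-only primitive vector with these operations can look like. This is illustrative only, not Spark's actual PrimitiveVector source; the class name and internals here are assumptions.

```scala
import scala.reflect.ClassTag

// Illustrative append-only vector with the operations named above:
// size, array (formerly "getUnderlyingArray"), trim(), and an internal resize().
class SimplePrimitiveVector[T: ClassTag](initialCapacity: Int = 64) {
  private var numElements = 0
  private var buffer = new Array[T](initialCapacity)

  def size: Int = numElements

  // Expose the backing array directly (may be longer than size until trim() is called).
  def array: Array[T] = buffer

  def +=(value: T): Unit = {
    if (numElements == buffer.length) resize(math.max(1, buffer.length * 2))
    buffer(numElements) = value
    numElements += 1
  }

  // Shrink the backing array to exactly the number of stored elements.
  def trim(): this.type = resize(numElements)

  // Copy only the live elements into a new backing array of the requested length.
  private def resize(newLength: Int): this.type = {
    val newArray = new Array[T](newLength)
    Array.copy(buffer, 0, newArray, 0, numElements)
    buffer = newArray
    this
  }
}
```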
-
- Nov 16, 2013
Matei Zaharia authored
Simple cleanup on Spark's Scala code Simple cleanup on Spark's Scala code while testing some modules: -) Remove some of unused imports as I found them -) Remove ";" in the imports statements -) Remove () at the end of method calls like size that does not have size effect.
-
- Nov 15, 2013
Henry Saputra authored
-) Remove some of the unused imports as I found them
-) Remove ";" in the import statements
-) Remove () at the end of method calls like size that have no side effects
-
Matei Zaharia authored
Fix bug where scheduler could hang after task failure. When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched. This isn't currently unit tested but will be once my pull request for merging the cluster and local schedulers goes in -- at which point many more of the unit tests will exercise the code paths through the cluster scheduler (currently the failure test suite uses the local scheduler, which is why we didn't see this bug before).
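A hedged sketch of the ordering problem described above. The class and method names below are illustrative stand-ins, not the real ClusterScheduler/ClusterTaskSetManager API; the point is only that the pending-task bookkeeping must be updated before reviveOffers() runs.

```scala
import scala.collection.mutable

// Toy model: reviveOffers() only helps if the failed task has already been
// marked pending; updating that state afterwards reproduces the hang.
class ToyScheduler {
  private val pendingTasks = mutable.Queue[Long]()

  def reviveOffers(): Unit = synchronized {
    while (pendingTasks.nonEmpty) {
      val taskId = pendingTasks.dequeue()
      println(s"relaunching task $taskId on a fresh offer")
    }
  }

  def handleFailedTask(taskId: Long): Unit = synchronized {
    pendingTasks.enqueue(taskId) // 1. record that the task must run again
    reviveOffers()               // 2. only then ask the backend for offers
  }
}
```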
-
- Nov 14, 2013
Matei Zaharia authored
Don't retry tasks when they fail due to a NotSerializableException As with my previous pull request, this will be unit tested once the Cluster and Local schedulers get merged.
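A hedged sketch of the idea (illustrative names only, not the actual TaskSetManager code): a NotSerializableException is deterministic, so retrying the task cannot help and the task set should give up instead.

```scala
import java.io.NotSerializableException

object RetryPolicySketch {
  // Decide whether a failed task is worth resubmitting.
  def shouldRetry(failure: Throwable, attempts: Int, maxAttempts: Int): Boolean =
    failure match {
      case _: NotSerializableException => false          // will fail the same way every time
      case _                           => attempts < maxAttempts
    }
}
```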
-
Matei Zaharia authored
Write Spark UI URL to driver file on HDFS. This makes the SIMR code path simpler.
-
Kay Ousterhout authored
-
Kay Ousterhout authored
When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched.
-
Reynold Xin authored
Don't ignore spark.cores.max when using Mesos Coarse mode. totalCoresAcquired is decremented but never incremented, causing Spark to effectively ignore spark.cores.max in coarse-grained Mesos mode.
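A hedged sketch of the bookkeeping being fixed (variable and method names are illustrative, not the Mesos backend's actual fields): if totalCoresAcquired is only ever decremented, the cap check always passes and spark.cores.max is never enforced.

```scala
object CoarseModeBookkeepingSketch {
  val maxCores = 8                // stand-in for spark.cores.max
  var totalCoresAcquired = 0

  // Accept an offer only while we are still under the configured cap.
  def canAccept(offeredCores: Int): Boolean =
    totalCoresAcquired + offeredCores <= maxCores

  def onOfferAccepted(cores: Int): Unit =
    totalCoresAcquired += cores   // the missing increment described in the commit

  def onExecutorLost(cores: Int): Unit =
    totalCoresAcquired -= cores
}
```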
-
Reynold Xin authored
Fixed a scaladoc typo in HadoopRDD.scala
-
Reynold Xin authored
Fixed typos in the CDH4 distributions' version codes. Nothing important, but annoying when doing a copy/paste...
-
RIA-pierre-borckmans authored
-
Lian, Cheng authored
-
Kay Ousterhout authored
-
- Nov 13, 2013
Matei Zaharia authored
Migrate the daemon thread started by DAGScheduler to an Akka actor. `DAGScheduler` uses an event queue and a daemon thread that polls it to process events sent to a `DAGScheduler`. This is a classic actor use case. By migrating this thread to an Akka actor, we may benefit from both cleaner code and better performance (the context-switching cost of an Akka actor is much lower than that of a native thread).

But things become a little complicated when taking existing test code into consideration. Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler`, and directly calls `DAGScheduler.processEvent` instead of posting event messages to `DAGScheduler`. To minimize code change, I chose to let the actor delegate messages to `processEvent`. Maybe this doesn't follow conventional actor usage, but I tried to make it apparently correct.

Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` provided by its field `env`, `env` cannot be null. But the `dagScheduler` field created in `DAGSchedulerSuite.before` was given a null `env`. What's more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to determine whether to run the production code or the test code (bad smell here, huh?). I went through all callers of `BlockManager.blockIdsToBlockManagers`, and made sure that if `env != null` holds, then `blockManagerMaster == null` must also hold. That's the logic behind `BlockManager.scala` [line 896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896).

Lastly, since `DAGScheduler` instances are always `start()`ed after creation, I removed the `start()` method and start the `eventProcessActor` within the constructor.
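A hedged sketch of the pattern described above, written against Akka's classic actor API. The event types and the processEvent function here are placeholders, not the real DAGScheduler types.

```scala
import akka.actor.{Actor, ActorSystem, Props}

sealed trait DAGSchedulerEvent                       // placeholder event hierarchy
case class JobSubmitted(jobId: Int) extends DAGSchedulerEvent
case class TaskCompleted(taskId: Long) extends DAGSchedulerEvent

// The actor does nothing but delegate to processEvent, replacing the
// hand-rolled event queue plus daemon polling thread.
class EventProcessActor(processEvent: DAGSchedulerEvent => Unit) extends Actor {
  def receive = {
    case event: DAGSchedulerEvent => processEvent(event)
  }
}

object EventProcessActorSketch extends App {
  val system = ActorSystem("sketch")
  // In the commit, the actor is started from the DAGScheduler constructor,
  // which is why a separate start() method is no longer needed.
  val eventProcessActor =
    system.actorOf(Props(new EventProcessActor(e => println(s"processing $e"))))
  eventProcessActor ! JobSubmitted(1)
  system.shutdown()  // classic 2013-era API; newer Akka uses terminate()
}
```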
-
Matei Zaharia authored
spark-assembly.jar fails to authenticate with YARN ResourceManager. The META-INF/services/ sbt MergeStrategy was discarding support for Kerberos, among others. This pull request changes it to a merge strategy similar to sbt-assembly's default. I've also included an update to sbt-assembly 0.9.2, which has a minor fix to its zip file handling.
-
Ahir Reddy authored
-
Matei Zaharia authored
SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients
-
- Nov 12, 2013
Matei Zaharia authored
Allow Spark on YARN to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into HDFS. Allows you to specify the files on a different HDFS cluster, and it will copy them over. It makes sure permissions are correct and puts things into the public distributed cache so they can be reused amongst users if their permissions are appropriate. Also adds a bit of error handling for missing arguments.
-
Matei Zaharia authored
Enable stopping and starting a spot cluster. Clusters launched using `--spot-price` contain an on-demand master and spot slaves. Because EC2 does not support stopping spot instances, the spark-ec2 script previously could only destroy such clusters. This pull request makes it possible to stop and restart a spot cluster.
* The `stop` command works as expected for a spot cluster: the master is stopped and the slaves are terminated.
* To start a stopped spot cluster, the user must invoke `launch --use-existing-master`. This launches fresh spot slaves but resumes the existing master.
-
Matei Zaharia authored
Fix bug JIRA-923. Fix column sort issue in UI for JIRA-923. https://spark-project.atlassian.net/browse/SPARK-923
Conflicts:
  core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
  core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
-
Ahir Reddy authored
SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients
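A hedged sketch (illustrative, not the actual SIMR backend code) of writing a URL to a file on HDFS with the Hadoop FileSystem API so a client can read it back later; the object and method names are assumptions for the example.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object UiUrlWriterSketch {
  // Write the driver's Spark UI URL to a well-known HDFS location.
  def writeUiUrl(hdfsPath: String, uiUrl: String): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(new URI(hdfsPath), conf)
    val out = fs.create(new Path(hdfsPath), true) // overwrite if the file exists
    try out.writeUTF(uiUrl) finally out.close()
  }
}
```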
-
Nathan Howell authored
-
Nathan Howell authored
sbt-assembly is set up to pick the first META-INF/services/org.apache.hadoop.security.SecurityInfo file instead of merging them. This causes Kerberos authentication to fail, which manifests itself in the "info:null" debug log statements:

DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
DEBUG UserGroupInformation: PrivilegedAction as:foo@BAR (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:583)
WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

This previously would just contain a single class:

$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
Archive:  assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
  inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
org.apache.hadoop.security.AnnotatedSecurityInfo

And now has the full list of classes:

$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
Archive:  assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
  inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
org.apache.hadoop.security.AnnotatedSecurityInfo
org.apache.hadoop.mapreduce.v2.app.MRClientSecurityInfo
org.apache.hadoop.mapreduce.v2.security.client.ClientHSSecurityInfo
org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo
org.apache.hadoop.yarn.security.ContainerManagerSecurityInfo
org.apache.hadoop.yarn.security.SchedulerSecurityInfo
org.apache.hadoop.yarn.security.admin.AdminSecurityInfo
org.apache.hadoop.yarn.server.RMNMSecurityInfoClass
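For illustration, a hedged build.sbt fragment showing the kind of merge strategy that concatenates ServiceLoader registration files instead of keeping only the first one. This uses sbt-assembly's later assemblyMergeStrategy key rather than the 2013-era setting syntax, so treat it as a sketch of the idea, not the exact change in this pull request.

```scala
// build.sbt (sbt-assembly plugin enabled)
assemblyMergeStrategy in assembly := {
  // Merge META-INF/services/* by keeping the distinct lines from every jar,
  // so each SecurityInfo implementation survives into the assembly jar.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
  case _                                         => MergeStrategy.first
}
```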
-
Matei Zaharia authored
Made block generator thread safe to fix Kafka bug. This is a very important bug fix. Data could be, and was being, lost in the Kafka input due to this.
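A hedged sketch of the thread-safety concern (a toy class, not the actual BlockGenerator): the receiver thread appending records and the timer thread that swaps the buffer out into a block must synchronize on the same lock, otherwise appends can be lost.

```scala
import scala.collection.mutable.ArrayBuffer

class ToyBlockGenerator {
  private var currentBuffer = new ArrayBuffer[Any]

  // Called from the receiver thread for every incoming record.
  def addData(record: Any): Unit = synchronized {
    currentBuffer += record
  }

  // Called periodically from a timer thread to emit the accumulated block.
  def pushBlock(): Seq[Any] = synchronized {
    val block = currentBuffer
    currentBuffer = new ArrayBuffer[Any]
    block.toSeq
  }
}
```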
-
Tathagata Das authored
-
- Nov 11, 2013
Ankur Dave authored
-
Matei Zaharia authored
add tachyon module
-
tgravescs authored
-
Andrew xia authored
-
Andrew xia authored
-
Andrew xia authored
-
- Nov 10, 2013
Lian, Cheng authored
-
Haoyuan Li authored
-
Lian, Cheng authored
-
Matei Zaharia authored
3 Kryo-related changes.
1. Call Kryo setReferences before calling the user-specified Kryo registrator. This is done so the user-specified registrator can override the default setting.
2. Register more internal classes (MapStatus, BlockManagerId).
3. Slightly refactored the internal class registration to allocate less memory.
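A hedged sketch of the ordering in change 1. The newKryo helper and the default value chosen here are illustrative, not Spark's KryoSerializer source; only the KryoRegistrator trait and the Kryo calls are real APIs.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// A user-provided registrator can override whatever was set before it runs.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.setReferences(true)              // user choice wins because it runs last
    kryo.register(classOf[Array[Double]])
  }
}

object KryoSetupSketch {
  def newKryo(userRegistrator: Option[KryoRegistrator]): Kryo = {
    val kryo = new Kryo()
    kryo.setReferences(false)             // illustrative default, applied first...
    // ...internal classes (e.g. MapStatus, BlockManagerId) would be registered here...
    userRegistrator.foreach(_.registerClasses(kryo))  // ...then the user registrator
    kryo
  }
}
```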
-
Reynold Xin authored
Moved the Spark internal class registration for Kryo into an object, and added more classes (e.g. MapStatus, BlockManagerId) to the registration.
-