- Nov 17, 2013
Reynold Xin authored
Add PrimitiveVectorSuite and fix bug in resize()
-
Aaron Davidson authored
-
Reynold Xin authored
-
Reynold Xin authored
1. Added trim() method.
2. Added size method.
3. Renamed getUnderlyingArray to array.
4. Minor documentation update.
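For context, a minimal Scala sketch of what an append-only primitive vector with these operations can look like. This is illustrative only, not Spark's actual PrimitiveVector source; the class name and internals here are assumptions.

```scala
import scala.reflect.ClassTag

// Illustrative append-only vector with the operations named above:
// size, array (formerly "getUnderlyingArray"), trim(), and an internal resize().
class SimplePrimitiveVector[T: ClassTag](initialCapacity: Int = 64) {
  private var numElements = 0
  private var buffer = new Array[T](initialCapacity)

  def size: Int = numElements

  // Expose the backing array directly (may be longer than size until trim() is called).
  def array: Array[T] = buffer

  def +=(value: T): Unit = {
    if (numElements == buffer.length) resize(math.max(1, buffer.length * 2))
    buffer(numElements) = value
    numElements += 1
  }

  // Shrink the backing array to exactly the number of stored elements.
  def trim(): this.type = resize(numElements)

  // Copy only the live elements into a new backing array of the requested length.
  private def resize(newLength: Int): this.type = {
    val newArray = new Array[T](newLength)
    Array.copy(buffer, 0, newArray, 0, numElements)
    buffer = newArray
    this
  }
}
```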
-
- Nov 16, 2013
Matei Zaharia authored
Simple cleanup on Spark's Scala code Simple cleanup on Spark's Scala code while testing some modules: -) Remove some of unused imports as I found them -) Remove ";" in the imports statements -) Remove () at the end of method calls like size that does not have size effect.
-
- Nov 15, 2013
Henry Saputra authored
-) Remove some of the unused imports as I found them
-) Remove ";" in the import statements
-) Remove () at the end of method calls like size that have no side effects
-
Matei Zaharia authored
Fix bug where scheduler could hang after task failure. When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched. This isn't currently unit tested but will be once my pull request for merging the cluster and local schedulers goes in -- at which point many more of the unit tests will exercise the code paths through the cluster scheduler (currently the failure test suite uses the local scheduler, which is why we didn't see this bug before).
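A hedged sketch of the ordering problem described above. The class and method names below are illustrative stand-ins, not the real ClusterScheduler/ClusterTaskSetManager API; the point is only that the pending-task bookkeeping must be updated before reviveOffers() runs.

```scala
import scala.collection.mutable

// Toy model: reviveOffers() only helps if the failed task has already been
// marked pending; updating that state afterwards reproduces the hang.
class ToyScheduler {
  private val pendingTasks = mutable.Queue[Long]()

  def reviveOffers(): Unit = synchronized {
    while (pendingTasks.nonEmpty) {
      val taskId = pendingTasks.dequeue()
      println(s"relaunching task $taskId on a fresh offer")
    }
  }

  def handleFailedTask(taskId: Long): Unit = synchronized {
    pendingTasks.enqueue(taskId) // 1. record that the task must run again
    reviveOffers()               // 2. only then ask the backend for offers
  }
}
```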
-
- Nov 14, 2013
Matei Zaharia authored
Don't retry tasks when they fail due to a NotSerializableException As with my previous pull request, this will be unit tested once the Cluster and Local schedulers get merged.
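A hedged sketch of the idea (illustrative names only, not the actual TaskSetManager code): a NotSerializableException is deterministic, so retrying the task cannot help and the task set should give up instead.

```scala
import java.io.NotSerializableException

object RetryPolicySketch {
  // Decide whether a failed task is worth resubmitting.
  def shouldRetry(failure: Throwable, attempts: Int, maxAttempts: Int): Boolean =
    failure match {
      case _: NotSerializableException => false          // will fail the same way every time
      case _                           => attempts < maxAttempts
    }
}
```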
-
Matei Zaharia authored
Write Spark UI URL to driver file on HDFS. This makes the SIMR code path simpler.
-
Kay Ousterhout authored
-
Kay Ousterhout authored
When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched.
-
Reynold Xin authored
Don't ignore spark.cores.max when using Mesos Coarse mode. totalCoresAcquired is decremented but never incremented, causing Spark to effectively ignore spark.cores.max in coarse-grained Mesos mode.
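A hedged sketch of the bookkeeping being fixed (variable and method names are illustrative, not the Mesos backend's actual fields): if totalCoresAcquired is only ever decremented, the cap check always passes and spark.cores.max is never enforced.

```scala
object CoarseModeBookkeepingSketch {
  val maxCores = 8                // stand-in for spark.cores.max
  var totalCoresAcquired = 0

  // Accept an offer only while we are still under the configured cap.
  def canAccept(offeredCores: Int): Boolean =
    totalCoresAcquired + offeredCores <= maxCores

  def onOfferAccepted(cores: Int): Unit =
    totalCoresAcquired += cores   // the missing increment described in the commit

  def onExecutorLost(cores: Int): Unit =
    totalCoresAcquired -= cores
}
```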
-
Reynold Xin authored
Fixed a scaladoc typo in HadoopRDD.scala
-
Reynold Xin authored
Fixed typos in the CDH4 distributions' version codes. Nothing important, but annoying when doing a copy/paste...
-
RIA-pierre-borckmans authored
-
Lian, Cheng authored
-
Kay Ousterhout authored
-
- Nov 13, 2013
Matei Zaharia authored
Migrate the daemon thread started by DAGScheduler to an Akka actor. `DAGScheduler` uses an event queue and a daemon thread that polls it to process events sent to a `DAGScheduler`. This is a classic actor use case. By migrating this thread to an Akka actor, we may benefit from both cleaner code and better performance (the context-switching cost of an Akka actor is much lower than that of a native thread).

But things become a little complicated when taking existing test code into consideration. Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler`, and directly calls `DAGScheduler.processEvent` instead of posting event messages to `DAGScheduler`. To minimize code change, I chose to let the actor delegate messages to `processEvent`. Maybe this doesn't follow conventional actor usage, but I tried to make it apparently correct.

Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` provided by its field `env`, `env` cannot be null. But the `dagScheduler` field created in `DAGSchedulerSuite.before` was given a null `env`. What's more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to determine whether to run the production code or the test code (bad smell here, huh?). I went through all callers of `BlockManager.blockIdsToBlockManagers`, and made sure that if `env != null` holds, then `blockManagerMaster == null` must also hold. That's the logic behind `BlockManager.scala` [line 896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896).

Lastly, since `DAGScheduler` instances are always `start()`ed after creation, I removed the `start()` method and start the `eventProcessActor` within the constructor.
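A hedged sketch of the pattern described above, written against Akka's classic actor API. The event types and the processEvent function here are placeholders, not the real DAGScheduler types.

```scala
import akka.actor.{Actor, ActorSystem, Props}

sealed trait DAGSchedulerEvent                       // placeholder event hierarchy
case class JobSubmitted(jobId: Int) extends DAGSchedulerEvent
case class TaskCompleted(taskId: Long) extends DAGSchedulerEvent

// The actor does nothing but delegate to processEvent, replacing the
// hand-rolled event queue plus daemon polling thread.
class EventProcessActor(processEvent: DAGSchedulerEvent => Unit) extends Actor {
  def receive = {
    case event: DAGSchedulerEvent => processEvent(event)
  }
}

object EventProcessActorSketch extends App {
  val system = ActorSystem("sketch")
  // In the commit, the actor is started from the DAGScheduler constructor,
  // which is why a separate start() method is no longer needed.
  val eventProcessActor =
    system.actorOf(Props(new EventProcessActor(e => println(s"processing $e"))))
  eventProcessActor ! JobSubmitted(1)
  system.shutdown()  // classic 2013-era API; newer Akka uses terminate()
}
```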
-
Matei Zaharia authored
spark-assembly.jar fails to authenticate with YARN ResourceManager. The META-INF/services/ sbt MergeStrategy was discarding support for Kerberos, among others. This pull request changes it to a merge strategy similar to sbt-assembly's default. I've also included an update to sbt-assembly 0.9.2, which has a minor fix to its zip file handling.
-
Ahir Reddy authored
-
Matei Zaharia authored
SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients
-
- Nov 12, 2013
Matei Zaharia authored
Allow Spark on YARN to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into HDFS. Allows you to specify the files on a different HDFS cluster, and it will copy them over. It makes sure permissions are correct and puts things into the public distributed cache so they can be reused amongst users if their permissions are appropriate. Also adds a bit of error handling for missing arguments.
-
Matei Zaharia authored
Enable stopping and starting a spot cluster. Clusters launched using `--spot-price` contain an on-demand master and spot slaves. Because EC2 does not support stopping spot instances, the spark-ec2 script previously could only destroy such clusters. This pull request makes it possible to stop and restart a spot cluster.
* The `stop` command works as expected for a spot cluster: the master is stopped and the slaves are terminated.
* To start a stopped spot cluster, the user must invoke `launch --use-existing-master`. This launches fresh spot slaves but resumes the existing master.
-
Matei Zaharia authored
Fix bug JIRA-923. Fix column sort issue in UI for JIRA-923. https://spark-project.atlassian.net/browse/SPARK-923
Conflicts:
  core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
  core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
-
Ahir Reddy authored
SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients
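A hedged sketch (illustrative, not the actual SIMR backend code) of writing a URL to a file on HDFS with the Hadoop FileSystem API so a client can read it back later; the object and method names are assumptions for the example.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object UiUrlWriterSketch {
  // Write the driver's Spark UI URL to a well-known HDFS location.
  def writeUiUrl(hdfsPath: String, uiUrl: String): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(new URI(hdfsPath), conf)
    val out = fs.create(new Path(hdfsPath), true) // overwrite if the file exists
    try out.writeUTF(uiUrl) finally out.close()
  }
}
```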
-
Nathan Howell authored
-
Nathan Howell authored
sbt-assembly is set up to pick the first META-INF/services/org.apache.hadoop.security.SecurityInfo file instead of merging them. This causes Kerberos authentication to fail, which manifests itself in the "info:null" debug log statements:

DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
DEBUG UserGroupInformation: PrivilegedAction as:foo@BAR (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:583)
WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

This previously would just contain a single class:

$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
Archive:  assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
  inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
org.apache.hadoop.security.AnnotatedSecurityInfo

And now has the full list of classes:

$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
Archive:  assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
  inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
org.apache.hadoop.security.AnnotatedSecurityInfo
org.apache.hadoop.mapreduce.v2.app.MRClientSecurityInfo
org.apache.hadoop.mapreduce.v2.security.client.ClientHSSecurityInfo
org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo
org.apache.hadoop.yarn.security.ContainerManagerSecurityInfo
org.apache.hadoop.yarn.security.SchedulerSecurityInfo
org.apache.hadoop.yarn.security.admin.AdminSecurityInfo
org.apache.hadoop.yarn.server.RMNMSecurityInfoClass
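For illustration, a hedged build.sbt fragment showing the kind of merge strategy that concatenates ServiceLoader registration files instead of keeping only the first one. This uses sbt-assembly's later assemblyMergeStrategy key rather than the 2013-era setting syntax, so treat it as a sketch of the idea, not the exact change in this pull request.

```scala
// build.sbt (sbt-assembly plugin enabled)
assemblyMergeStrategy in assembly := {
  // Merge META-INF/services/* by keeping the distinct lines from every jar,
  // so each SecurityInfo implementation survives into the assembly jar.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
  case _                                         => MergeStrategy.first
}
```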
-
Matei Zaharia authored
Made block generator thread safe to fix Kafka bug. This is a very important bug fix. Data could be, and was being, lost in the Kafka input due to this.
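A hedged sketch of the thread-safety concern (a toy class, not the actual BlockGenerator): the receiver thread appending records and the timer thread that swaps the buffer out into a block must synchronize on the same lock, otherwise appends can be lost.

```scala
import scala.collection.mutable.ArrayBuffer

class ToyBlockGenerator {
  private var currentBuffer = new ArrayBuffer[Any]

  // Called from the receiver thread for every incoming record.
  def addData(record: Any): Unit = synchronized {
    currentBuffer += record
  }

  // Called periodically from a timer thread to emit the accumulated block.
  def pushBlock(): Seq[Any] = synchronized {
    val block = currentBuffer
    currentBuffer = new ArrayBuffer[Any]
    block.toSeq
  }
}
```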
-
Tathagata Das authored
-
- Nov 11, 2013
Ankur Dave authored
-
Matei Zaharia authored
add tachyon module
-
tgravescs authored
-
Andrew xia authored
-
Andrew xia authored
-
Andrew xia authored
-
- Nov 10, 2013
Lian, Cheng authored
-
Haoyuan Li authored
-
Lian, Cheng authored
-
Matei Zaharia authored
3 Kryo-related changes.
1. Call Kryo setReferences before calling the user-specified Kryo registrator. This is done so the user-specified registrator can override the default setting.
2. Register more internal classes (MapStatus, BlockManagerId).
3. Slightly refactored the internal class registration to allocate less memory.
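A hedged sketch of the ordering in change 1. The newKryo helper and the default value chosen here are illustrative, not Spark's KryoSerializer source; only the KryoRegistrator trait and the Kryo calls are real APIs.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// A user-provided registrator can override whatever was set before it runs.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.setReferences(true)              // user choice wins because it runs last
    kryo.register(classOf[Array[Double]])
  }
}

object KryoSetupSketch {
  def newKryo(userRegistrator: Option[KryoRegistrator]): Kryo = {
    val kryo = new Kryo()
    kryo.setReferences(false)             // illustrative default, applied first...
    // ...internal classes (e.g. MapStatus, BlockManagerId) would be registered here...
    userRegistrator.foreach(_.registerClasses(kryo))  // ...then the user registrator
    kryo
  }
}
```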
-
Reynold Xin authored
Moved the Spark internal class registration for Kryo into an object, and added more classes (e.g. MapStatus, BlockManagerId) to the registration.
-