- Dec 31, 2013
Reynold Xin authored
Approximate distinct count: added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count the number of distinct elements and the number of distinct values per key, respectively. Both functions use HyperLogLog from stream-lib for counting, and both take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scaladoc and test suites for both methods.
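The commit above relies on HyperLogLog with a tunable accuracy parameter. As an illustration only, here is a minimal, self-contained HyperLogLog in Scala. It is not stream-lib's implementation and not Spark's API: the class name `SimpleHyperLogLog` and its `relativeSD` parameter are hypothetical, and the sizing rule assumes the textbook HyperLogLog error estimate of about 1.04/sqrt(m) for m registers, so halving the target error roughly quadruples the memory used.

```scala
import scala.util.hashing.MurmurHash3

// Illustrative HyperLogLog sketch (hypothetical; not stream-lib's or Spark's code).
// relativeSD is the target relative standard error of the estimate.
final class SimpleHyperLogLog(val relativeSD: Double) {
  require(relativeSD > 0 && relativeSD < 1, "relativeSD must be in (0, 1)")

  // Textbook HLL bound: error ~= 1.04 / sqrt(m), so m ~= (1.04 / relativeSD)^2,
  // rounded up to a power of two 2^p (at least 2^4 registers).
  val p: Int = {
    val target = math.pow(1.04 / relativeSD, 2)
    math.max(4, math.ceil(math.log(target) / math.log(2)).toInt)
  }
  require(p <= 24, "relativeSD is too small for this sketch")
  val m: Int = 1 << p

  // One small register per bucket: memory grows as the target error shrinks.
  private val registers = new Array[Byte](m)

  def add(value: Any): Unit = {
    val h = MurmurHash3.stringHash(value.toString) // 32-bit hash
    val idx = h >>> (32 - p)                        // top p bits choose the register
    val rest = h << p                               // remaining 32 - p bits, left-aligned
    // Rank = 1-based position of the leftmost 1-bit in the remaining bits.
    val rank = math.min(Integer.numberOfLeadingZeros(rest), 32 - p) + 1
    if (rank > registers(idx)) registers(idx) = rank.toByte
  }

  def estimate: Long = {
    val alpha = m match {
      case 16 => 0.673
      case 32 => 0.697
      case 64 => 0.709
      case _  => 0.7213 / (1 + 1.079 / m)
    }
    val z = registers.map(r => math.pow(2.0, -r)).sum
    val raw = alpha * m * m / z
    val zeros = registers.count(_ == 0)
    // Small-range correction: use linear counting while many registers are empty.
    if (raw <= 2.5 * m && zeros > 0) math.round(m * math.log(m.toDouble / zeros))
    else math.round(raw)
  }

  // Sketches with the same precision merge by taking the element-wise max.
  def merge(other: SimpleHyperLogLog): SimpleHyperLogLog = {
    require(other.p == p, "sketches must use the same precision")
    val out = new SimpleHyperLogLog(relativeSD)
    for (i <- 0 until m)
      out.registers(i) = math.max(registers(i).toInt, other.registers(i).toInt).toByte
    out
  }
}
```

Because merging is an element-wise max over registers, sketches built independently (for example, one per partition) can be combined and still estimate the size of the union, which is one reason HyperLogLog suits distributed counting.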
Hossein Falaki authored
Hossein Falaki authored
Patrick Wendell authored
Upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final. The changes are listed at https://github.com/netty/netty/wiki/New-and-noteworthy
Patrick Wendell authored
Bug fixes for file input stream and checkpointing
- Fixed bugs in the file input stream that caused the stream to fail on transient HDFS errors (for example, listing files while a background thread was deleting them caused errors).
- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, so that checkpoints can be written to any HDFS-compatible store that requires special configuration.
- Changed the API of SparkContext.setCheckpointDir(): eliminated the unnecessary 'useExisting' parameter. SparkContext now always creates a unique subdirectory within the user-specified checkpoint directory, to ensure that previous checkpoint files are not accidentally overwritten.
- Fixed a bug where setting the checkpoint directory to a relative local path caused checkpointing to fail.
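One way to picture the new setCheckpointDir() behavior is the sketch below: resolve the user-supplied directory to an absolute path (so a relative local path works), then create a fresh, randomly named subdirectory inside it so earlier checkpoint files cannot be overwritten. It is not Spark's implementation: it uses java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem API, and `CheckpointDirSketch.createUniqueCheckpointDir` is a hypothetical name.

```scala
import java.nio.file.{Files, Path, Paths}
import java.util.UUID

// Hypothetical helper illustrating the behavior described in the commit;
// it is not Spark's code and works only on the local filesystem.
object CheckpointDirSketch {
  def createUniqueCheckpointDir(userDir: String): Path = {
    // Resolve relative paths against the current working directory.
    val base = Paths.get(userDir).toAbsolutePath.normalize()
    // A random UUID name keeps each run's checkpoints separate from older ones.
    val dir = base.resolve(UUID.randomUUID().toString)
    // Creates the base directory too if it does not exist yet.
    Files.createDirectories(dir)
  }
}
```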
Tathagata Das authored
- Dec 30, 2013
Hossein Falaki authored
Hossein Falaki authored
Hossein Falaki authored
Hossein Falaki authored
Hossein Falaki authored
Hossein Falaki authored
Patrick Wendell authored
Changed naming of StageCompleted event to be consistent. The rest of the SparkListener events are named with "SparkListener" as the prefix of the name; this commit renames the StageCompleted event to SparkListenerStageCompleted for consistency.
- Dec 29, 2013
Kay Ousterhout authored
Reynold Xin authored
This reverts commit 79b20e4d, reversing changes made to 7375047d.
Reynold Xin authored
Fix typo in the Accumulators section: change 'val' to 'var'
- Dec 28, 2013
Jyun-Fan Tsai authored
val => var
Patrick Wendell authored
Removed unused failed and causeOfFailure variables (in TaskSetManager)
- Dec 27, 2013
Matei Zaharia authored
Removed unused OtherFailure TaskEndReason. The OtherFailure TaskEndReason was added by @mateiz 3 years ago in this commit: https://github.com/apache/incubator-spark/commit/24a1e7f8380bfd8d4fbdda688482a451bd6ea215 Unless I am missing something, it doesn't seem to have been used then, and is not used now, so seems safe for deletion.
Matei Zaharia authored
Remove unused hasPendingTasks methods
Kay Ousterhout authored
The rest of the SparkListener events are named with "SparkListener" as the prefix of the name; this commit renames the StageCompleted event to SparkListenerStageCompleted for consistency.
Kay Ousterhout authored
Kay Ousterhout authored
Patrick Wendell authored
Fixed >100-char lines in DAGScheduler.scala. There's no changed functionality here, only line spacing and one grammatical fix in a comment.
Tathagata Das authored
Kay Ousterhout authored
Kay Ousterhout authored
Binh Nguyen authored
Also clean up a bit.
Kay Ousterhout authored
Reynold Xin authored
Minor: Decrease margin of left side of Log page. It's a start anyway...
Reynold Xin authored
SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10. Reported by Qiuzhuang Lian.
Patrick Wendell authored
- Dec 26, 2013
Matei Zaharia authored
Avoid a lump of coal (NPE) in JobProgressListener's stocking.
Aaron Davidson authored
Tathagata Das authored
Tathagata Das authored
Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored).
Matei Zaharia authored
Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn package
Tathagata Das authored
Removed slack time in file stream and added better handling of failures caused by FileNotFound exceptions.
liguoqiang authored
Mark Hamstra authored