- Jan 01, 2014
-
-
Matei Zaharia authored
Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc to make code that uses this simpler later on.
-
Matei Zaharia authored
Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
-
Patrick Wendell authored
SPARK-1008: Logging improvments 1. Adds a default log4j file that gets loaded if users haven't specified a log4j file. 2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings after building with SBT (and I've seen similar warnings on the mailing list).
-
Patrick Wendell authored
Conflicts: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
-
Matei Zaharia authored
Conflicts: project/SparkBuild.scala
-
- Dec 31, 2013
-
-
Reynold Xin authored
restore core/pom.xml file modification
-
liguoqiang authored
-
Reynold Xin authored
Approximate distinct count Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count distinct number of elements and distinct number of values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
-
Patrick Wendell authored
-
Hossein Falaki authored
-
Hossein Falaki authored
-
Matei Zaharia authored
-
Matei Zaharia authored
Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
-
Patrick Wendell authored
upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final the changes are listed at https://github.com/netty/netty/wiki/New-and-noteworthy
-
Patrick Wendell authored
Bug fixes for file input stream and checkpointing - Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files when a background thread it deleting fails caused errors, etc.) - Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS compatible store requiring special configuration. - Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten. - Fixed bug where setting checkpoint directory as a relative local path caused the checkpointing to fail.
-
Tathagata Das authored
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
-
- Dec 30, 2013
-
-
Hossein Falaki authored
-
Hossein Falaki authored
-
Hossein Falaki authored
-
Hossein Falaki authored
-
Hossein Falaki authored
-
Matei Zaharia authored
-
Hossein Falaki authored
-
Patrick Wendell authored
-
Patrick Wendell authored
Changed naming of StageCompleted event to be consistent The rest of the SparkListener events are named with "SparkListener" as the prefix of the name; this commit renames the StageCompleted event to SparkListenerStageCompleted for consistency.
-
Patrick Wendell authored
1. Adds a default log4j file that gets loaded if users haven't specified a log4j file. 2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings after building with SBT (and I've seen similar warnings on the mailing list).
-
- Dec 29, 2013
-
-
Kay Ousterhout authored
-
Matei Zaharia authored
-
Matei Zaharia authored
tests so we don't get the test spark.conf on the classpath.
-
Matei Zaharia authored
Typesafe Config caches system properties the first time it's invoked by default, ignoring later changes unless you do something special
-
Matei Zaharia authored
-
Matei Zaharia authored
-
Matei Zaharia authored
Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
-
Matei Zaharia authored
-
Matei Zaharia authored
The test in context.py created two different instances of the SparkContext class by copying "globals", so that some tests can have a global "sc" object and others can try initializing their own contexts. This led to two JVM gateways being created since SparkConf also looked at pyspark.context.SparkContext to get the JVM.
-
Matei Zaharia authored
-