Commits · e2c68642c64345434e2034082cf9b299491e9e9f · cs525-sp18-g07 / spark

Jan 01, 2014

Miscellaneous fixes from code review. · e2c68642

Matei Zaharia authored 11 years ago

Also replaced SparkConf.getOrElse with just a "get" that takes a default
value, and added getInt, getLong, etc. to make code that uses this
simpler later on.
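The "get with a default" plus typed getters described above can be sketched as follows. `SimpleConf` is a hypothetical stand-in written for illustration, not Spark's SparkConf, and its method signatures are assumptions modeled on the commit message.

```scala
import scala.collection.mutable

// Hypothetical SparkConf-like settings holder (illustration only, not Spark's class).
class SimpleConf {
  private val settings = mutable.HashMap[String, String]()

  // Store a key/value pair and return this instance so calls can be chained.
  def set(key: String, value: String): SimpleConf = {
    settings(key) = value
    this
  }

  // Required lookup: throws if the key has not been set.
  def get(key: String): String =
    settings.getOrElse(key, throw new NoSuchElementException(key))

  // Lookup with a default value, taking the place of a separate getOrElse method.
  def get(key: String, defaultValue: String): String =
    settings.getOrElse(key, defaultValue)

  // Typed helpers parse the stored string, or return the default if the key is unset.
  def getInt(key: String, defaultValue: Int): Int =
    settings.get(key).map(_.toInt).getOrElse(defaultValue)

  def getLong(key: String, defaultValue: Long): Long =
    settings.get(key).map(_.toLong).getOrElse(defaultValue)

  def getBoolean(key: String, defaultValue: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(defaultValue)
}
```

For example, after `new SimpleConf().set("spark.cores.max", "8")`, calling `getInt("spark.cores.max", 4)` yields 8, while `getInt` on an unset key simply returns the supplied default, so callers no longer need to parse strings themselves.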

Merge remote-tracking branch 'apache/master' into conf2 · 45ff8f41

Matei Zaharia authored 11 years ago

Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
	core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala

Merge pull request #312 from pwendell/log4j-fix-2 · c1d928a8

Patrick Wendell authored 11 years ago

SPARK-1008: Logging improvements

1. Adds a default log4j file that gets loaded if users haven't specified a log4j file.
2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings
after building with SBT (and I've seen similar warnings on the mailing list).
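A minimal sketch of item 1, assuming logging setup should run at most once per JVM even when several threads trigger it concurrently. `LoggingInitializer` is a hypothetical class that only records which configuration it would load instead of calling log4j, and the default file name used with it is illustrative. The unlocked outer check keeps the already-initialized path cheap; the inner check under the lock ensures only one thread performs the setup.

```scala
// Hypothetical sketch: picks the user's log4j file if one was specified, otherwise a
// bundled default, and performs this choice exactly once (double-checked locking).
class LoggingInitializer(defaultConfig: String) {
  @volatile private var chosen: String = null
  private val lock = new Object

  // Number of times setup actually ran; should never exceed 1.
  @volatile var initCount: Int = 0

  // userConfig: the log4j file the user configured, if any.
  def ensureInitialized(userConfig: Option[String]): String = {
    if (chosen == null) {           // outer check: no lock on the common path
      lock.synchronized {
        if (chosen == null) {       // inner check: only one thread does the setup
          initCount += 1
          chosen = userConfig.getOrElse(defaultConfig)
        }
      }
    }
    chosen
  }
}
```

Once initialized, later calls return the first choice regardless of their argument, which matches "load the default only if the user hasn't specified anything" being decided a single time.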

Merge remote-tracking branch 'apache-github/master' into log4j-fix-2 · f8d245bd
Patrick Wendell authored 11 years ago

Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala

Merge remote-tracking branch 'apache/master' into conf2 · 0e5b2adb
Matei Zaharia authored 11 years ago

Conflicts:
	project/SparkBuild.scala

Dec 31, 2013

Merge pull request #314 from witgo/master · 9a0ff721
Reynold Xin authored 11 years ago

restore core/pom.xml file modification

restore core/pom.xml file modification · b5d0b3b0
liguoqiang authored 11 years ago

Merge pull request #73 from falaki/ApproximateDistinctCount · 8b8e70eb

Reynold Xin authored 11 years ago

Approximate distinct count

Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count the number of distinct elements and the number of distinct values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
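As a rough illustration of how HyperLogLog trades memory for accuracy, here is a small from-scratch version. It is not stream-lib's implementation and not what Spark calls internally; `SimpleHyperLogLog` and its precision parameter `p` are hypothetical. With m = 2^p registers the standard error is roughly 1.04 / sqrt(m), so each extra bit of `p` doubles memory and cuts the error by about a factor of sqrt(2), which is the same kind of knob the RDD methods expose.

```scala
import scala.util.hashing.MurmurHash3

// Minimal HyperLogLog sketch (illustrative only). p index bits => m = 2^p registers.
class SimpleHyperLogLog(p: Int) {
  require(p >= 4 && p <= 16, "p must be in [4, 16]")
  private val m = 1 << p
  private val registers = new Array[Int](m)

  // Bias-correction constant alpha_m from the original HyperLogLog paper.
  private val alpha: Double = m match {
    case 16 => 0.673
    case 32 => 0.697
    case 64 => 0.709
    case _  => 0.7213 / (1.0 + 1.079 / m)
  }

  def add(item: String): Unit = {
    val h = MurmurHash3.stringHash(item)   // 32-bit hash of the element
    val idx = h >>> (32 - p)               // top p bits select a register
    val rest = h << p                      // remaining 32 - p bits, left-aligned
    // Rank = 1-based position of the first 1-bit in `rest`, capped at 32 - p + 1.
    val rank = math.min(Integer.numberOfLeadingZeros(rest), 32 - p) + 1
    if (rank > registers(idx)) registers(idx) = rank
  }

  def estimate: Double = {
    val sum = registers.map(r => math.pow(2.0, -r)).sum
    val raw = alpha * m * m / sum
    val zeros = registers.count(_ == 0)
    // Small-range correction: use linear counting while many registers are empty.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros)
    else raw
  }
}
```

Adding the same element twice never changes a register, which is why duplicates do not inflate the count. With `p = 12` the sketch keeps 4096 small registers and has an expected error of about 1.6%; it omits the large-range correction, so it is only meant for cardinalities well below 2^32.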

Adding outer checkout when initializing logging · 37c43c9d
Patrick Wendell authored 11 years ago

Made the code more compact and readable · bee445c9
Hossein Falaki authored 11 years ago

minor improvements · acb03230
Hossein Falaki authored 11 years ago

Fix two compile errors introduced in merge · 42bcfb2b
Matei Zaharia authored 11 years ago

Merge remote-tracking branch 'apache/master' into conf2 · ba9338f1

Matei Zaharia authored 11 years ago

Conflicts:
	core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala

Merge pull request #238 from ngbinh/upgradeNetty · 63b411dd

Patrick Wendell authored 11 years ago

upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final

the changes are listed at https://github.com/netty/netty/wiki/New-and-noteworthy

Merge pull request #289 from tdas/filestream-fix · 55b7e2fd

Patrick Wendell authored 11 years ago

Bug fixes for file input stream and checkpointing

- Fixed bugs in the file input stream that caused the stream to fail on transient HDFS errors (e.g., listing files while a background thread is deleting them could fail).
- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, so that checkpoints can be written to any HDFS-compatible store that requires special configuration.
- Changed the API of SparkContext.setCheckpointDir(): eliminated the unnecessary 'useExisting' parameter. SparkContext now always creates a unique subdirectory within the user-specified checkpoint directory, which ensures that previous checkpoint files are not accidentally overwritten.
- Fixed a bug where setting the checkpoint directory to a relative local path caused checkpointing to fail.
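The unique-subdirectory behavior of setCheckpointDir() can be sketched on the local filesystem with plain java.io. `CheckpointDirs.createUniqueSubdir` is a hypothetical helper, not Spark's code (which goes through Hadoop's file system APIs), and resolving relative paths to absolute ones up front is an assumed way of making a relative local path safe to use.

```scala
import java.io.{File, IOException}
import java.util.UUID

// Hypothetical helper modeling the described behavior; not Spark's implementation.
object CheckpointDirs {
  // Creates and returns a fresh, uniquely named subdirectory under userDir.
  // Each call gets its own directory, so earlier checkpoint files are never reused
  // or overwritten. Relative paths are resolved against the current working
  // directory immediately (an assumption made for this sketch).
  def createUniqueSubdir(userDir: String): File = {
    val base = new File(userDir).getAbsoluteFile
    val subdir = new File(base, UUID.randomUUID().toString)
    if (!subdir.mkdirs()) throw new IOException(s"Could not create $subdir")
    subdir
  }
}
```

Using a random UUID as the subdirectory name makes collisions between runs practically impossible without having to inspect what is already in the parent directory.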

Fixed comments and long lines based on comments on PR 289. · fcd17a1e
Tathagata Das authored 11 years ago

Tiny typo fix · 4abb0c57
Patrick Wendell authored 11 years ago

Removing use in test · 4d009dca
Patrick Wendell authored 11 years ago

Minor fixes · 3c254f2e
Patrick Wendell authored 11 years ago

Removing initLogging entirely · 18181e6c
Patrick Wendell authored 11 years ago

Dec 30, 2013
- Added Java unit tests for countApproxDistinct and countApproxDistinctByKey · d6cded71
  Hossein Falaki authored 11 years ago
- Added Java API for countApproxDistinct · c3073b6c
  Hossein Falaki authored 11 years ago
- Added Java API for countApproxDistinctByKey · ed06500d
  Hossein Falaki authored 11 years ago
- Added stream 2.5.1 jar dependency · b75d7c98
  Hossein Falaki authored 11 years ago
- Renamed countDistinct and countDistinctByKey methods to include Approx · a7de8e9b
  Hossein Falaki authored 11 years ago
- Updated docs for SparkConf and handled review comments · 0fa58097
  Matei Zaharia authored 11 years ago
- Using origin version · d50ccc5c
  Hossein Falaki authored 11 years ago
- Response to Shivaram's review · 1cbef081
  Patrick Wendell authored 11 years ago
- Merge pull request #308 from kayousterhout/stage_naming · 50e3b8ec
  Patrick Wendell authored 11 years ago
  
  Changed naming of StageCompleted event to be consistent. The rest of the SparkListener events are named with "SparkListener" as the prefix of the name; this commit renames the StageCompleted event to SparkListenerStageCompleted for consistency.
- SPARK-1008: Logging improvements · cffe1c1d
  Patrick Wendell authored 11 years ago
  
  1. Adds a default log4j file that gets loaded if users haven't specified a log4j file.
  2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings after building with SBT (and I've seen similar warnings on the mailing list).

Dec 29, 2013

Updated code style according to Patrick's comments · c2c1af39
Kay Ousterhout authored 11 years ago

Properly show Spark properties on web UI, and change app name property · 994f080f
Matei Zaharia authored 11 years ago

Fix some Python docs and make sure to unset SPARK_TESTING in Python tests so we don't get the test spark.conf on the classpath. · eaa8a68f
Matei Zaharia authored 11 years ago

Added tests for SparkConf and fixed a bug · 11540b79

Matei Zaharia authored 11 years ago

Typesafe Config caches system properties the first time it's invoked
by default, ignoring later changes unless you do something special
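The caching pitfall described above can be reproduced with a toy loader that snapshots system properties on first use. `CachingLoader`, `load`, and `invalidate` are hypothetical names invented for this illustration; the code does not reproduce Typesafe Config's internals or the fix made in this commit, and `invalidate()` merely stands in for whatever cache reset a real library provides.

```scala
// Toy model of the pitfall: system properties are read once and then cached,
// so later System.setProperty calls are invisible until the cache is cleared.
object CachingLoader {
  private var cached: Option[Map[String, String]] = None

  // Returns the snapshot taken on the first call (or the first call after invalidate()).
  def load(): Map[String, String] = synchronized {
    if (cached.isEmpty) cached = Some(sys.props.toMap)
    cached.get
  }

  // Drops the snapshot so the next load() re-reads the current system properties.
  def invalidate(): Unit = synchronized {
    cached = None
  }
}
```

A test that sets a property after something has already triggered `load()` will see the stale value, which is exactly the kind of bug a SparkConf test suite can surface.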

Fix a change that was lost during merge · 1ee7f5ae
Matei Zaharia authored 11 years ago

Fix a few settings that were being read as system properties after merge · 0bd1900c
Matei Zaharia authored 11 years ago

Merge remote-tracking branch 'origin/master' into conf2 · b4ceed40

Matei Zaharia authored 11 years ago

Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
	core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
	core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
	core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
	core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
	new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
	streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
	streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
	streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
	streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
	streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
	streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
	streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala

Add Python docs about SparkConf · 58c6fa20
Matei Zaharia authored 11 years ago

Fix some other Python tests due to initializing JVM in a different way · 615fb649

Matei Zaharia authored 11 years ago

The test in context.py created two different instances of the
SparkContext class by copying "globals", so that some tests can have a
global "sc" object and others can try initializing their own contexts.
This led to two JVM gateways being created since SparkConf also looked
at pyspark.context.SparkContext to get the JVM.

Add SparkConf support in Python · cd00225d
Matei Zaharia authored 11 years ago
