Commits · 80aabc0bd33dc5661a90133156247e7a8c1bf7f5 · cs525-sp18-g07 / spark

Nov 28, 2016

Preparing Spark release v2.1.0-rc1 · 80aabc0b

Patrick Wendell authored 8 years ago

80aabc0b

Jul 19, 2016

[SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition... · 21a6dd2a

Xin Ren authored 8 years ago

[SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition and inherited from the parent

https://issues.apache.org/jira/browse/SPARK-16535

## What changes were proposed in this pull request?

While scanning the pom.xml files of the sub-projects, I found the warning below (see also the attached screenshot):
```
Definition of groupId is redundant, because it's inherited from the parent
```
![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png)

I removed some of the following `groupId` definitions, and the build on my local machine still succeeds:
```
<groupId>org.apache.spark</groupId>
```
Spark 2.x uses `<maven.version>3.3.9</maven.version>`, and per the answer referenced below, Maven 3 supports versionless parent elements: it removes the need to specify the parent version in submodules (as of Maven 3.1).

ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762

## How was this patch tested?

I rebuilt the project, and the build succeeded.

Author: Xin Ren <iamshrek@126.com>

Closes #14189 from keypointt/SPARK-16535.

21a6dd2a

Jul 11, 2016

[SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05

Reynold Xin authored 8 years ago

## What changes were proposed in this pull request?
After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #14130 from rxin/SPARK-16477.

ffcb6e05

May 17, 2016

[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into spark-tags · 122302cb

Sean Owen authored 8 years ago

## What changes were proposed in this pull request?

(See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was apparently the problem last time.)

Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags`

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13074 from srowen/SPARK-15290.

122302cb

Apr 28, 2016

Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local" · 9c7c42bc

Yin Huai authored 8 years ago

```
This reverts commit dae538a4.
```

9c7c42bc

[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local · dae538a4

Pravin Gadakh authored 8 years ago

## What changes were proposed in this pull request?

This PR adds the `@Since` tag to the matrix and vector classes in spark-mllib-local.

## How was this patch tested?

Scala-style checks passed.

Author: Pravin Gadakh <prgadakh@in.ibm.com>

Closes #12416 from pravingadakh/SPARK-14613.

dae538a4

Mar 21, 2016

[SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle rule · 20fd2541

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

The [Spark Coding Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) sets a 100-character limit on lines, but the check has been disabled for Java since 11/09/15. This PR re-enables the **LineLength** checkstyle rule. To support that, it also introduces the **RedundantImport** and **RedundantModifier** rules. The following is the diff of `checkstyle.xml`.

```xml
-        <!-- TODO: 11/09/15 disabled - the lengths are currently > 100 in many places -->
-        <!--
         <module name="LineLength">
             <property name="max" value="100"/>
             <property name="ignorePattern" value="^package.*|^import.*|a href|href|http://|https://|ftp://"/>
         </module>
-        -->
         <module name="NoLineWrap"/>
         <module name="EmptyBlock">
             <property name="option" value="TEXT"/>
@@ -167,5 +164,7 @@
         </module>
         <module name="CommentsIndentation"/>
         <module name="UnusedImports"/>
+        <module name="RedundantImport"/>
+        <module name="RedundantModifier"/>
```

## How was this patch tested?

Currently, `lint-java` is disabled in Jenkins, so it needs a manual test.
After passing the Jenkins tests, `dev/lint-java` should pass locally.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11831 from dongjoon-hyun/SPARK-14011.

20fd2541

Mar 16, 2016

[SPARK-13823][SPARK-13397][SPARK-13395][CORE] More warnings, StandardCharset follow up · 3b461d9e

Sean Owen authored 9 years ago

## What changes were proposed in this pull request?

Follow up to https://github.com/apache/spark/pull/11657

- Also update `String.getBytes("UTF-8")` to use `StandardCharsets.UTF_8`
- And fix one last new Coverity warning that turned up (use of unguarded `wait()` replaced by simpler/more robust `java.util.concurrent` classes in tests)
- And while we're here cleaning up Coverity warnings, just fix about 15 more build warnings
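
As a rough illustration of the first two bullets (not the actual code touched by this PR), the sketch below encodes a string with `StandardCharsets.UTF_8` and waits for a worker thread with a `CountDownLatch` instead of an unguarded `wait()`. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical example class; not part of Spark.
public class CharsetAndLatchExample {

  // Passing a Charset constant avoids the charset-name lookup and the checked
  // UnsupportedEncodingException that String.getBytes("UTF-8") declares.
  public static byte[] encodeUtf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Runs the task on a worker thread and waits for it with a CountDownLatch
  // rather than a hand-rolled wait()/notify() pair.
  // Returns true if the task finished within the timeout.
  public static boolean runAndAwait(Runnable task, long timeoutMillis) {
    CountDownLatch done = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      try {
        task.run();
      } finally {
        done.countDown(); // always signal, even if the task throws
      }
    });
    worker.start();
    try {
      return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(encodeUtf8("h\u00e9llo").length);
    System.out.println(runAndAwait(() -> { }, 1000));
  }
}
```

The latch cannot miss a signal that arrives before the waiter starts waiting, which is the usual hazard with an unguarded `wait()`.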

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #11725 from srowen/SPARK-13823.2.

3b461d9e

Feb 26, 2016

[MINOR][SQL] Fix modifier order. · 727e7801

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

This PR fixes the modifier order, changing `abstract public` to `public abstract`.
Currently, running `./dev/lint-java` reports the following error.
```
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/util/sketch/CountMinSketch.java:[53,10] (modifier) ModifierOrder: 'public' modifier out of order with the JLS suggestions.
```
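
For readers unfamiliar with the rule, here is a small sketch of the ordering that the `ModifierOrder` check expects. The classes are hypothetical and are not taken from `CountMinSketch.java`.

```java
// Hypothetical example class; not part of Spark.
public class ModifierOrderExample {

  // Preferred: the access modifier comes before `abstract`.
  public abstract static class Counter {
    public abstract long estimate(long item);

    // Also legal Java, but the ModifierOrder check reports it:
    // abstract public long estimate(long item);
  }

  // Trivial concrete subclass so the example can run.
  public static final class ConstantCounter extends Counter {
    private final long value;

    public ConstantCounter(long value) {
      this.value = value;
    }

    @Override
    public long estimate(long item) {
      return value;
    }
  }

  public static void main(String[] args) {
    System.out.println(new ConstantCounter(7).estimate(1L));
  }
}
```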

## How was this patch tested?

```
$ ./dev/lint-java
Checkstyle checks passed.
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11390 from dongjoon-hyun/fix_modifier_order.

727e7801

Feb 22, 2016

[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments · 024482bf

Dongjoon Hyun authored 9 years ago

## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was this patch tested?

Manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.

024482bf

Jan 30, 2016

[SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2

Josh Rosen authored 9 years ago

This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).

The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).

After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10608 from JoshRosen/SPARK-6363.

289373b2

Jan 29, 2016

[SPARK-12818] Polishes spark-sketch module · 2b027e9a

Cheng Lian authored 9 years ago

Fixes various minor code and Javadoc styling issues.

Author: Cheng Lian <lian@databricks.com>

Closes #10985 from liancheng/sketch-polishing.

2b027e9a

[SPARK-13050][BUILD] Scalatest tags fail build with the addition of the sketch module · 8d3cc3de

Alex Bozarth authored 9 years ago

A dependency on the Spark test tags was left out of the sketch module's pom file, causing builds to fail when test tags were used. This dependency is present in the pom file of every other module in Spark.

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes #10954 from ajbozarth/spark13050.

8d3cc3de

Jan 28, 2016

[SPARK-12818][SQL] Specialized integral and string types for Count-min Sketch · 415d0a85

Cheng Lian authored 9 years ago

This PR is a follow-up of #10911. It adds specialized update methods for `CountMinSketch` so that we can avoid doing internal/external row format conversion in `DataFrame.countMinSketch()`.

Author: Cheng Lian <lian@databricks.com>

Closes #10968 from liancheng/cms-specialized.

415d0a85

Jan 27, 2016

[SPARK-12938][SQL] DataFrame API for Bloom filter · 680afabe

Wenchen Fan authored 9 years ago

This PR integrates Bloom filter from spark-sketch into DataFrame. This version resorts to RDD.aggregate for building the filter. A more performant UDAF version can be built in future follow-up PRs.

This PR also adds two specialized `put` variants (`putBinary` and `putLong`) to `BloomFilter`, which make it easier to build a Bloom filter over a `DataFrame`.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10937 from cloud-fan/bloom-filter.

680afabe

Jan 26, 2016

[SPARK-12935][SQL] DataFrame API for Count-Min Sketch · ce38a35b

Cheng Lian authored 9 years ago

This PR integrates Count-Min Sketch from spark-sketch into DataFrame. This version resorts to `RDD.aggregate` for building the sketch. A more performant UDAF version can be built in future follow-up PRs.

Author: Cheng Lian <lian@databricks.com>

Closes #10911 from liancheng/cms-df-api.

ce38a35b

[SPARK-12937][SQL] bloom filter serialization · 6743de3a

Wenchen Fan authored 9 years ago

This PR adds serialization support for BloomFilter.

A version number is added to version the serialized binary format.
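
The idea of a versioned binary format can be sketched as follows. This is only an illustration under assumed names and an assumed layout (version, word count, then the words); it is not the format this PR actually writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example class; the layout below is an assumption for illustration.
public class VersionedFormatExample {
  static final int VERSION_1 = 1;

  // Writes: version (int), number of words (int), then each word (long).
  public static byte[] write(long[] words) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeInt(VERSION_1);
      out.writeInt(words.length);
      for (long w : words) {
        out.writeLong(w);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  // Reads the version first and rejects anything it does not understand.
  public static long[] read(byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      int version = in.readInt();
      if (version != VERSION_1) {
        throw new IOException("Unexpected format version: " + version);
      }
      long[] words = new long[in.readInt()];
      for (int i = 0; i < words.length; i++) {
        words[i] = in.readLong();
      }
      return words;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] roundTripped = read(write(new long[] {1L, 2L, 3L}));
    System.out.println(roundTripped.length);
  }
}
```

Putting the version first lets a later reader branch on it before interpreting any of the remaining bytes.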

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10920 from cloud-fan/bloom-filter.

6743de3a

Jan 25, 2016

[SPARK-12934] use try-with-resources for streams · fdcc3512

tedyu authored 9 years ago

liancheng please take a look

Author: tedyu <yuzhihong@gmail.com>

Closes #10906 from tedyu/master.

fdcc3512

[SPARK-12936][SQL] Initial bloom filter implementation · 109061f7

Wenchen Fan authored 9 years ago

This PR adds an initial implementation of a Bloom filter in the newly added sketch module. The implementation is based on the [`BloomFilter` class in guava](https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/hash/BloomFilter.java).

Some differences from the design doc:

* Expose `bitSize` instead of `sizeInBytes` to the user.
* Always require the `expectedInsertions` parameter when creating a Bloom filter.
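
To make the roles of `expectedInsertions` and `bitSize` concrete, here is a minimal, self-contained Bloom filter sketch. It is not the implementation added by this PR: the class name, the `fpp` parameter, and the `mix64` hash (a SplitMix64-style mixer standing in for Murmur3) are assumptions for illustration. It sizes the bit array with the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.

```java
import java.util.BitSet;

// Hypothetical example class; not Spark's BloomFilter.
public class SimpleBloomFilter {
  private final BitSet bits;
  private final int bitSize;
  private final int numHashFunctions;

  public SimpleBloomFilter(long expectedInsertions, double fpp) {
    if (expectedInsertions <= 0 || fpp <= 0.0 || fpp >= 1.0) {
      throw new IllegalArgumentException("need expectedInsertions > 0 and 0 < fpp < 1");
    }
    // Optimal number of bits: m = -n * ln(p) / (ln 2)^2
    double m = -expectedInsertions * Math.log(fpp) / (Math.log(2) * Math.log(2));
    this.bitSize = (int) Math.max(1, Math.min(Integer.MAX_VALUE, Math.ceil(m)));
    // Optimal number of hash functions: k = (m / n) * ln 2
    this.numHashFunctions =
        Math.max(1, (int) Math.round((double) bitSize / expectedInsertions * Math.log(2)));
    this.bits = new BitSet(bitSize);
  }

  public int bitSize() {
    return bitSize;
  }

  // SplitMix64-style finalizer, used here only as a stand-in hash function.
  private static long mix64(long z) {
    z += 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  // Derives the i-th bit index from two 32-bit halves of one 64-bit hash.
  private int index(long hash, int i) {
    int h1 = (int) hash;
    int h2 = (int) (hash >>> 32);
    int combined = h1 + i * h2;
    if (combined < 0) {
      combined = ~combined; // flip to a non-negative value
    }
    return combined % bitSize;
  }

  public void putLong(long item) {
    long hash = mix64(item);
    for (int i = 1; i <= numHashFunctions; i++) {
      bits.set(index(hash, i));
    }
  }

  // May return true for items never inserted, but never false for inserted ones.
  public boolean mightContainLong(long item) {
    long hash = mix64(item);
    for (int i = 1; i <= numHashFunctions; i++) {
      if (!bits.get(index(hash, i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    SimpleBloomFilter filter = new SimpleBloomFilter(1000, 0.03);
    filter.putLong(42L);
    System.out.println(filter.mightContainLong(42L));
  }
}
```

Requiring `expectedInsertions` up front is what allows the bit array to be sized once at construction time.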

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10883 from cloud-fan/bloom-filter.

109061f7

[SPARK-12934][SQL] Count-min sketch serialization · 6f0f1d9e

Cheng Lian authored 9 years ago

This PR adds serialization support for `CountMinSketch`.

A version number is added to version the serialized binary format.

Author: Cheng Lian <lian@databricks.com>

Closes #10893 from liancheng/cms-serialization.

6f0f1d9e

Jan 23, 2016

[SPARK-12933][SQL] Initial implementation of Count-Min sketch · 1c690dda

Cheng Lian authored 9 years ago

This PR adds an initial implementation of count min sketch, contained in a new module spark-sketch under `common/sketch`. The implementation is based on the [`CountMinSketch` class in stream-lib][1].

As required by the [design doc][2], spark-sketch should have no external dependency.
Two classes, `Murmur3_x86_32` and `Platform`, are copied to spark-sketch from spark-unsafe for their hashing facilities. They'll also be used in the upcoming bloom filter implementation.
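
For orientation, the data structure itself can be sketched in a few lines. The code below is a simplified illustration, not the stream-lib-derived implementation in this PR: the class name is hypothetical, the caller chooses `depth` and `width` directly, and a SplitMix64-style mixer stands in for `Murmur3_x86_32`.

```java
// Hypothetical example class; not Spark's CountMinSketch.
public class SimpleCountMinSketch {
  private final int depth;
  private final int width;
  private final long[][] table;
  private long totalCount;

  public SimpleCountMinSketch(int depth, int width) {
    if (depth <= 0 || width <= 0) {
      throw new IllegalArgumentException("depth and width must be positive");
    }
    this.depth = depth;
    this.width = width;
    this.table = new long[depth][width];
  }

  // SplitMix64-style finalizer, used here only as a stand-in hash function.
  private static long mix64(long z) {
    z += 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  // Each row uses a differently seeded hash of the item.
  private int bucket(long item, int row) {
    long h = mix64(item + (row + 1) * 0x9E3779B97F4A7C15L);
    return (int) Math.floorMod(h, (long) width);
  }

  public void add(long item, long count) {
    for (int row = 0; row < depth; row++) {
      table[row][bucket(item, row)] += count;
    }
    totalCount += count;
  }

  // Collisions only inflate counters, so the minimum across rows is an
  // upper bound on the true count that is as tight as this sketch allows.
  public long estimateCount(long item) {
    long min = Long.MAX_VALUE;
    for (int row = 0; row < depth; row++) {
      min = Math.min(min, table[row][bucket(item, row)]);
    }
    return min;
  }

  public long totalCount() {
    return totalCount;
  }

  // Sketches with identical dimensions merge by adding counters cell by cell.
  public void mergeInPlace(SimpleCountMinSketch other) {
    if (other.depth != depth || other.width != width) {
      throw new IllegalArgumentException("cannot merge sketches of different shapes");
    }
    for (int row = 0; row < depth; row++) {
      for (int col = 0; col < width; col++) {
        table[row][col] += other.table[row][col];
      }
    }
    totalCount += other.totalCount;
  }

  public static void main(String[] args) {
    SimpleCountMinSketch sketch = new SimpleCountMinSketch(4, 64);
    sketch.add(42L, 5);
    System.out.println(sketch.estimateCount(42L) >= 5);
  }
}
```

Because merging is a cell-wise sum, per-partition sketches can be built independently and combined afterwards.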

The following features will be added in future follow-up PRs:

- Serialization support
- DataFrame API integration

[1]: https://github.com/addthis/stream-lib/blob/aac6b4d23a8686b000f80baa447e0922ecac3bcb/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java
[2]: https://issues.apache.org/jira/secure/attachment/12782378/BloomFilterandCount-MinSketchinSpark2.0.pdf

Author: Cheng Lian <lian@databricks.com>

Closes #10851 from liancheng/count-min-sketch.

1c690dda