Commits · 2d4f6e70f7de50489c2b5f0d6a4756c3b1aace7d · cs525-sp18-g07 / spark

Dec 02, 2014

Minor nit style cleanup in GraphX. · 2d4f6e70
Reynold Xin authored 10 years ago

[SPARK-4695][SQL] Get result using executeCollect · 3ae0cda8

wangfei authored 10 years ago

Use `executeCollect` to collect the result, because `executeCollect` is a custom implementation of `collect` in Spark SQL that is better than the RDD's `collect`.

Author: wangfei <wangfei1@huawei.com>

Closes #3547 from scwf/executeCollect and squashes the following commits:

a5ab68e [wangfei] Revert "adding debug info"
a60d680 [wangfei] fix test failure
0db7ce8 [wangfei] adding debug info
184c594 [wangfei] using executeCollect instead collect

[SPARK-4670] [SQL] wrong symbol for bitwise not · 1f5ddf17

Daoyuan Wang authored 10 years ago

We should use `~` instead of `-` for bitwise NOT.
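
For illustration, a minimal plain-Scala sketch of the difference (this is not the Spark SQL expression code; `BitwiseNotSketch`, `notInt`, and `notByte` are hypothetical names). On two's-complement integers `~x` equals `-x - 1`, so unary minus gives a different answer. Scala widens `~` on a `Byte` to `Int`, so the byte variant narrows the result back, in the spirit of the squashed commit about keeping the data type for byte and short:

```scala
object BitwiseNotSketch {
  // Bitwise NOT flips every bit; on two's-complement integers ~x == -x - 1.
  def notInt(x: Int): Int = ~x

  // Scala widens ~ on a Byte to Int, so narrow the result back
  // to keep the original data type.
  def notByte(x: Byte): Byte = (~x).toByte

  def main(args: Array[String]): Unit = {
    println(notInt(5))          // prints -6
    println(-5)                 // prints -5, what unary minus would give instead
    println(notByte(5.toByte))  // prints -6, still a Byte
  }
}
```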

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3528 from adrian-wang/symbol and squashes the following commits:

affd4ad [Daoyuan Wang] fix code gen test case
56efb79 [Daoyuan Wang] ensure bitwise NOT over byte and short persist data type
f55fbae [Daoyuan Wang] wrong symbol for bitwise not

[SPARK-4593][SQL] Return null when denominator is 0 · f6df609d

Daoyuan Wang authored 10 years ago

`SELECT max(1/0) FROM src`
would return a very large number, which is obviously not right.
For hive-0.12, Hive returns `Infinity` for 1/0, while for hive-0.13.1 it returns `NULL`.
I think it is better to keep our behavior consistent with the newer Hive version.
This PR ensures that when the divisor is 0, the result of the expression is NULL, the same as in hive-0.13.1.
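
A minimal sketch of the intended semantics in plain Scala, assuming `Option[Double]` stands in for a nullable SQL value (`SafeDivideSketch` and `divide` are hypothetical names, not the Catalyst code changed by this PR):

```scala
object SafeDivideSketch {
  // None stands in for SQL NULL in this sketch.
  def divide(numerator: Double, denominator: Double): Option[Double] =
    if (denominator == 0.0) None else Some(numerator / denominator)

  def main(args: Array[String]): Unit = {
    println(1.0 / 0.0)         // prints Infinity: plain JVM floating-point division
    println(divide(1.0, 0.0))  // prints None
    println(divide(6.0, 3.0))  // prints Some(2.0)
  }
}
```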

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3443 from adrian-wang/div and squashes the following commits:

2e98677 [Daoyuan Wang] fix code gen for divide 0
85c28ba [Daoyuan Wang] temp
36236a5 [Daoyuan Wang] add test cases
6f5716f [Daoyuan Wang] fix comments
cee92bd [Daoyuan Wang] avoid evaluation 2 times
22ecd9a [Daoyuan Wang] fix style
cf28c58 [Daoyuan Wang] divide fix
2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type

[SPARK-4676][SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null · 10664276

YanTangZhai authored 10 years ago

val jsc = new org.apache.spark.api.java.JavaSparkContext(sc)
val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc)
val nrdd = jhc.hql("select null from spark_test.for_test")
println(nrdd.schema)
Then the error is thrown as follows:
scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$)
at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43)
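
The failure mode can be reproduced without Spark. The sketch below uses hypothetical stand-in types, not the real Catalyst classes or `DataTypeConversions`: a pattern match with no case for `NullType` throws `scala.MatchError`, and adding the case removes the error. How the PR itself handles `NullType` is not shown here.

```scala
object MatchErrorSketch {
  // Hypothetical stand-ins for the Catalyst data types; not the real classes.
  trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case object NullType extends DataType

  // A conversion that forgets NullType: throws scala.MatchError at runtime.
  def incomplete(t: DataType): String = t match {
    case IntegerType => "java-integer"
    case StringType  => "java-string"
  }

  // Handling NullType explicitly removes the failure.
  def complete(t: DataType): String = t match {
    case IntegerType => "java-integer"
    case StringType  => "java-string"
    case NullType    => "java-null"
    case other       => sys.error(s"unsupported type: $other")
  }
}
```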

Author: YanTangZhai <hakeemzhai@tencent.com>
Author: yantangzhai <tyz0303@163.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #3538 from YanTangZhai/MatchNullType and squashes the following commits:

e052dff [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
4b4bb34 [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
896c7b7 [yantangzhai] fix NullType MatchError in JavaSchemaRDD when sql has null
6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
e249846 [YanTangZhai] Merge pull request #10 from apache/master
d26d982 [YanTangZhai] Merge pull request #9 from apache/master
76d4027 [YanTangZhai] Merge pull request #8 from apache/master
03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
8a00106 [YanTangZhai] Merge pull request #6 from apache/master
cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master

[SPARK-4663][sql]add finally to avoid resource leak · 69b6fed2

baishuo authored 10 years ago

Author: baishuo <vc_java@hotmail.com>

Closes #3526 from baishuo/master-trycatch and squashes the following commits:

d446e14 [baishuo] correct the code style
b36bf96 [baishuo] correct the code style
ae0e447 [baishuo] add finally to avoid resource leak

[SPARK-4536][SQL] Add sqrt and abs to Spark SQL DSL · e75e04f9

Kousuke Saruta authored 10 years ago

Spark SQL has built-in `sqrt` and `abs`, but the DSL doesn't support those functions.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits:

07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite
8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL

Indent license header properly for interfaces.scala. · b1f8fe31

Reynold Xin authored 10 years ago

A very small nit update.

Author: Reynold Xin <rxin@databricks.com>

Closes #3552 from rxin/license-header and squashes the following commits:

df8d1a4 [Reynold Xin] Indent license header properly for interfaces.scala.

[SPARK-4686] Link to allowed master URLs is broken · d9a148ba

Kay Ousterhout authored 10 years ago

The link points to the old scala programming guide; it should point to the submitting applications page.

This should be backported to 1.1.2 (it's been broken as of 1.0).

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #3542 from kayousterhout/SPARK-4686 and squashes the following commits:

a8fc43b [Kay Ousterhout] [SPARK-4686] Link to allowed master URLs is broken

[SPARK-4397][Core] Cleanup 'import SparkContext._' in core · 6dfe38a0

zsxwing authored 10 years ago

This PR cleans up `import SparkContext._` in core for SPARK-4397 (#3262), to show that the change really works well.

Author: zsxwing <zsxwing@gmail.com>

Closes #3530 from zsxwing/SPARK-4397-cleanup and squashes the following commits:

04e2273 [zsxwing] Cleanup 'import SparkContext._' in core

Dec 01, 2014

[SPARK-4611][MLlib] Implement the efficient vector norm · 64f3175b

DB Tsai authored 10 years ago

The vector norm in breeze is implemented with `activeIterator`, which is known to be very slow.
This PR implements an efficient vector norm; with this API, `Normalizer` and
`k-means` both run faster.

Here is the benchmark against the mnist8m dataset.

a) `Normalizer`
Before
DenseVector: 68.25secs
SparseVector: 17.01secs

With this PR
DenseVector: 12.71secs
SparseVector: 2.73secs

b) `k-means`
Before
DenseVector: 83.46secs
SparseVector: 61.60secs

With this PR
DenseVector: 70.04secs
SparseVector: 59.05secs
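
As a rough illustration of the idea, here is a sketch of a p-norm computed with index-based `while` loops over a plain array rather than through an iterator. It assumes the stored values are available as an `Array[Double]` (for a sparse vector, only the non-zero values, since zeros do not contribute to the norm); `VectorNormSketch` and `norm` are hypothetical names, not the MLlib API added by this PR:

```scala
object VectorNormSketch {
  // p-norm of the stored values of a vector, using index-based while loops
  // instead of an iterator. For a sparse vector, pass only the non-zero values.
  def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0, s"p must be >= 1.0, got $p")
    var i = 0
    var acc = 0.0
    if (p == 1.0) {
      while (i < values.length) { acc += math.abs(values(i)); i += 1 }
      acc
    } else if (p == 2.0) {
      while (i < values.length) { acc += values(i) * values(i); i += 1 }
      math.sqrt(acc)
    } else if (p == Double.PositiveInfinity) {
      while (i < values.length) { acc = math.max(acc, math.abs(values(i))); i += 1 }
      acc
    } else {
      while (i < values.length) { acc += math.pow(math.abs(values(i)), p); i += 1 }
      math.pow(acc, 1.0 / p)
    }
  }
}
```

The special cases for p = 1, p = 2, and p = infinity avoid calling `math.pow` per element, which is where most of the time would otherwise go in this sketch.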

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #3462 from dbtsai/norm and squashes the following commits:

63c7165 [DB Tsai] typo
0c3637f [DB Tsai] add import org.apache.spark.SparkContext._ back
6fa616c [DB Tsai] address feedback
9b7cb56 [DB Tsai] move norm to static method
0b632e6 [DB Tsai] kmeans
dbed124 [DB Tsai] style
c1a877c [DB Tsai] first commit

MAINTENANCE: Automated closing of pull requests. · b0a46d89

Patrick Wendell authored 10 years ago

This commit exists to close the following pull requests on Github:

Closes #1612 (close requested by 'marmbrus')
Closes #2723 (close requested by 'marmbrus')
Closes #1737 (close requested by 'marmbrus')
Closes #2252 (close requested by 'marmbrus')
Closes #2029 (close requested by 'marmbrus')
Closes #2386 (close requested by 'marmbrus')
Closes #2997 (close requested by 'marmbrus')

[SPARK-4268][SQL] Use #::: to get benefit from Stream in SqlLexical.allCaseVersions · d3e02ddd

zsxwing authored 10 years ago

In addition, use `s.isEmpty` to eliminate the string comparison.
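
A sketch of what such a function could look like, assuming `allCaseVersions` enumerates every upper/lower-case spelling of a string (the body below is an illustration, not copied from `SqlLexical`). `#:::` takes its right-hand stream by name, so the second branch is only built once the first has been consumed:

```scala
object AllCaseVersionsSketch {
  // Lazily enumerates every upper/lower-case spelling of s.
  // #::: defers its right-hand side, so the upper-case branch is only
  // built when the stream is traversed that far.
  // Note: Stream is deprecated in favor of LazyList from Scala 2.13 on.
  def allCaseVersions(s: String, prefix: String = ""): Stream[String] =
    if (s.isEmpty) {
      Stream(prefix)
    } else {
      allCaseVersions(s.tail, prefix + s.head.toLower) #:::
        allCaseVersions(s.tail, prefix + s.head.toUpper)
    }
}
```

A caller that only needs the first few spellings, for example `allCaseVersions("select").take(2)`, never pays for the remaining ones.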

Author: zsxwing <zsxwing@gmail.com>

Closes #3132 from zsxwing/SPARK-4268 and squashes the following commits:

358e235 [zsxwing] Improvement of allCaseVersions

[SPARK-4529] [SQL] support view with column alias · 4df60a8c

Daoyuan Wang authored 10 years ago

Support view definition like

CREATE VIEW view3(valoo)
TBLPROPERTIES ("fear" = "factor")
AS SELECT upper(value) FROM src WHERE key=86;

[`valoo` is the alias of `upper(value)`]. This is the missing part of SPARK-4239, needed for full view support.

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3396 from adrian-wang/viewcolumn and squashes the following commits:

4d001d0 [Daoyuan Wang] support view with column alias

[SQL][DOC] Date type in SQL programming guide · 5edbcbfb

Daoyuan Wang authored 10 years ago

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #3535 from adrian-wang/datedoc and squashes the following commits:

18ff1ed [Daoyuan Wang] [DOC] Date type

[SQL] Minor fix for doc and comment · 7b799578

wangfei authored 10 years ago

Author: wangfei <wangfei1@huawei.com>

Closes #3533 from scwf/sql-doc1 and squashes the following commits:

962910b [wangfei] doc and comment fix

[SPARK-4658][SQL] Code documentation issue in DDL of datasource API · bc353819

ravipesala authored 10 years ago

Author: ravipesala <ravindra.pesala@huawei.com>

Closes #3516 from ravipesala/ddl_doc and squashes the following commits:

d101fdf [ravipesala] Style issues fixed
d2238cd [ravipesala] Corrected documentation

[SPARK-4650][SQL] Supporting multi column support in countDistinct function... · 6a9ff19d

ravipesala authored 10 years ago

[SPARK-4650][SQL] Supporting multi column support in countDistinct function like count(distinct c1,c2..) in Spark SQL

Support multiple columns in the `countDistinct` function, like `count(distinct c1, c2, ...)`, in Spark SQL.

Author: ravipesala <ravindra.pesala@huawei.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #3511 from ravipesala/countdistinct and squashes the following commits:

cc4dbb1 [ravipesala] style
070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL

[SPARK-4358][SQL] Let BigDecimal do checking type compatibility · b57365a1

Liang-Chi Hsieh authored 10 years ago

Remove the hardcoded max and min values for types. Let `BigDecimal` check type compatibility instead.
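
A small sketch of the approach in plain Scala, assuming numeric literals are narrowed to `Int`, `Long`, or `BigDecimal` (`NumericLiteralSketch` and `narrow` are hypothetical names; the real parser code is not reproduced here). `scala.math.BigDecimal` already provides `isValidInt` and `isValidLong`, so no hardcoded bounds are needed:

```scala
object NumericLiteralSketch {
  // Picks the narrowest of Int, Long, or BigDecimal that can hold the literal,
  // relying on BigDecimal's own range checks instead of hardcoded bounds.
  def narrow(text: String): Any = {
    val value = BigDecimal(text)
    if (value.isValidInt) value.intValue
    else if (value.isValidLong) value.longValue
    else value
  }
}
```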

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #3208 from viirya/more_numericLit and squashes the following commits:

e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
91fe489 [Liang-Chi Hsieh] add Byte and Short.
1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.

[SQL] add @group tab in limit() and count() · bafee67e

Jacky Li authored 10 years ago

The `@group` tag is missing from the scaladoc.

Author: Jacky Li <jacky.likun@gmail.com>

Closes #3458 from jackylk/patch-7 and squashes the following commits:

0121a70 [Jacky Li] add @group tab in limit() and count()

[SPARK-4258][SQL][DOC] Documents spark.sql.parquet.filterPushdown · 5db8dcaf

Cheng Lian authored 10 years ago

Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on.

Author: Cheng Lian <lian@databricks.com>

Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:

2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown

Documentation: add description for repartitionAndSortWithinPartitions · 2b233f5f

Madhu Siddalingaiah authored 10 years ago

Author: Madhu Siddalingaiah <madhu@madhu.com>

Closes #3390 from msiddalingaiah/master and squashes the following commits:

cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again)
332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code>
cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions

[SPARK-4661][Core] Minor code and docs cleanup · 30a86acd

zsxwing authored 10 years ago

Author: zsxwing <zsxwing@gmail.com>

Closes #3521 from zsxwing/SPARK-4661 and squashes the following commits:

03cbe3f [zsxwing] Minor code and docs cleanup

[SPARK-4664][Core] Throw an exception when spark.akka.frameSize > 2047 · 1d238f22

zsxwing authored 10 years ago

If `spark.akka.frameSize` > 2047, it overflows and becomes negative when converted to bytes. `maxFrameSizeBytes` should have an assertion to warn people.
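
The overflow is plain `Int` arithmetic. The sketch below assumes the setting is given in MB and converted to bytes by multiplying by 1024 * 1024; `FrameSizeSketch` and `uncheckedBytes` are hypothetical names, and the `require` message is illustrative rather than the text used in Spark:

```scala
object FrameSizeSketch {
  // Int arithmetic wraps around: 2048 * 1024 * 1024 == 2^31 does not fit in an Int.
  def uncheckedBytes(frameSizeMb: Int): Int = frameSizeMb * 1024 * 1024

  // Fail fast instead of returning a negative size.
  def maxFrameSizeBytes(frameSizeMb: Int): Int = {
    require(frameSizeMb <= 2047,
      s"frame size should not be greater than 2047 MB, got $frameSizeMb")
    frameSizeMb * 1024 * 1024
  }

  def main(args: Array[String]): Unit = {
    println(uncheckedBytes(2047))  // prints 2146435072
    println(uncheckedBytes(2048))  // prints -2147483648
  }
}
```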

Author: zsxwing <zsxwing@gmail.com>

Closes #3527 from zsxwing/SPARK-4664 and squashes the following commits:

0089c7a [zsxwing] Throw an exception when spark.akka.frameSize > 2047

SPARK-2192 [BUILD] Examples Data Not in Binary Distribution · 6384f42a

Sean Owen authored 10 years ago

Simply, add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI.

Author: Sean Owen <sowen@cloudera.com>

Closes #3480 from srowen/SPARK-2192 and squashes the following commits:

47688f1 [Sean Owen] Add data/ to distributions

Fix wrong file name pattern in .gitignore · 97eb6d7f

Kousuke Saruta authored 10 years ago

In .gitignore, there is an entry for spark-*-bin.tar.gz, but considering make-distribution.sh, the name pattern should be spark-*-bin-*.tgz.

This change is really small, so I didn't open an issue in JIRA. If one is needed, please let me know.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #3529 from sarutak/fix-wrong-tgz-pattern and squashes the following commits:

de3c70a [Kousuke Saruta] Fixed wrong file name pattern in .gitignore

Nov 30, 2014

[SPARK-4632] version update · 5e7a6dcb

Prabeesh K authored 10 years ago

Author: Prabeesh K <prabsmails@gmail.com>

Closes #3495 from prabeesh/master and squashes the following commits:

ab03d50 [Prabeesh K] Update pom.xml
8c6437e [Prabeesh K] Revert
e10b40a [Prabeesh K] version update
dbac9eb [Prabeesh K] Revert
ec0b1c3 [Prabeesh K] [SPARK-4632] version update
a835505 [Prabeesh K] [SPARK-4632] version update
831391b [Prabeesh K]  [SPARK-4632] version update

MAINTENANCE: Automated closing of pull requests. · 06dc1b15

Patrick Wendell authored 10 years ago

This commit exists to close the following pull requests on Github:

Closes #2915 (close requested by 'JoshRosen')
Closes #3140 (close requested by 'JoshRosen')
Closes #3366 (close requested by 'JoshRosen')

[DOC] Fixes formatting typo in SQL programming guide · 2a4d389f

Cheng Lian authored 10 years ago

Author: Cheng Lian <lian@databricks.com>

Closes #3498 from liancheng/fix-sql-doc-typo and squashes the following commits:

865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide

[SPARK-4656][Doc] Typo in Programming Guide markdown · a217ec5f

lewuathe authored 10 years ago

Grammatical error in Programming Guide document

Author: lewuathe <lewuathe@me.com>

Closes #3412 from Lewuathe/typo-programming-guide and squashes the following commits:

a3e2f00 [lewuathe] Typo in Programming Guide markdown

[SPARK-4623]Add the some error infomation if using spark-sql in yarn-cluster mode · aea7a997

carlmartin authored 10 years ago

If spark-sql is used in yarn-cluster mode, print an error message, just as the Spark shell does in yarn-cluster mode.

Author: carlmartin <carlmartinmax@gmail.com>
Author: huangzhaowei <carlmartinmax@gmail.com>

Closes #3479 from SaintBacchus/sparkSqlShell and squashes the following commits:

35829a9 [carlmartin] improve the description of comment
e6c1eb7 [carlmartin] add a comment in bin/spark-sql to remind user who wants to change the class
f1c5c8d [carlmartin] Merge branch 'master' into sparkSqlShell
8e112c5 [huangzhaowei] singular form
ec957bc [carlmartin] Add the some error infomation if using spark-sql in yarn-cluster mode
7bcecc2 [carlmartin] Merge branch 'master' of https://github.com/apache/spark into codereview
4fad75a [carlmartin] Add the Error infomation using spark-sql in yarn-cluster mode

SPARK-2143 [WEB UI] Add Spark version to UI footer · 048ecca6

Sean Owen authored 10 years ago

This PR adds the Spark version number to the UI footer; this is how it looks:

![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png)

Author: Sean Owen <sowen@cloudera.com>

Closes #3410 from srowen/SPARK-2143 and squashes the following commits:

e9b3a7a [Sean Owen] Add Spark version to footer

Nov 29, 2014

[DOCS][BUILD] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'. · 0fcd24cc

Takuya UESHIN authored 10 years ago

To build with Scala 2.11, we have to run `change-version-to-2.11.sh` before running Maven; otherwise, inter-module dependencies are broken.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #3361 from ueshin/docs/building-spark_2.11 and squashes the following commits:

1d29126 [Takuya UESHIN] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'.

SPARK-4507: PR merge script should support closing multiple JIRA tickets · 4316a7b0

Takayuki Hasegawa authored 10 years ago

This will fix SPARK-4507.

For pull requests that reference multiple JIRAs in their titles, it would be helpful if the PR merge script offered to close all of them.
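
A sketch of the extraction step, written in Scala for illustration (the names and the `SPARK-[0-9]+` pattern are assumptions, and this is not the merge script's own code):

```scala
object JiraIdsSketch {
  // Matches ids such as SPARK-4507; the pattern is an assumption for this sketch.
  private val JiraId = "SPARK-[0-9]+".r

  // Returns every distinct JIRA id in the title, in order of appearance.
  // A title without any id yields an empty list.
  def jiraIds(title: String): List[String] =
    JiraId.findAllIn(title).toList.distinct
}
```

A script built on this could then offer to close each returned id in turn, and skip the prompt entirely when the list is empty.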

Author: Takayuki Hasegawa <takayuki.hasegawa0311@gmail.com>

Closes #3428 from hase1031/SPARK-4507 and squashes the following commits:

bf6d64b [Takayuki Hasegawa] SPARK-4507: try to resolve issue when no JIRAs in title
401224c [Takayuki Hasegawa] SPARK-4507: moved codes as before
ce89021 [Takayuki Hasegawa] SPARK-4507: PR merge script should support closing multiple JIRA tickets

[SPARK-4505][Core] Add a ClassTag parameter to CompactBuffer[T] · c0622242

zsxwing authored 10 years ago

Added a ClassTag parameter to CompactBuffer so that CompactBuffer[T] can create primitive arrays for primitive types. This significantly reduces memory usage for primitive types at the cost of a minor performance loss.

Here is my test code:
```Scala
  // Call org.apache.spark.util.SizeEstimator.estimate
  def estimateSize(obj: AnyRef): Long = {
    val c = Class.forName("org.apache.spark.util.SizeEstimator$")
    val f = c.getField("MODULE$")
    val o = f.get(c)
    val m = c.getMethod("estimate", classOf[Object])
    m.setAccessible(true)
    m.invoke(o, obj).asInstanceOf[Long]
  }

  sc.parallelize(1 to 10000).groupBy(_ => 1).foreach {
    case (k, v) =>
      println(v.getClass() + " size: " + estimateSize(v))
  }
```

The previous CompactBuffer output:
```
class org.apache.spark.util.collection.CompactBuffer size: 313358
```

The new CompactBuffer output:
```
class org.apache.spark.util.collection.CompactBuffer size: 65712
```

In this case, the new `CompactBuffer` used only about 21% of the memory of the previous one. It's really helpful for `groupByKey` when using a primitive value.
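
The effect of the `ClassTag` can be seen in isolation. In this sketch (`ClassTagArraySketch` and its methods are hypothetical names, not part of `CompactBuffer`), a context-bound `ClassTag` lets `new Array[T]` allocate a primitive array such as `int[]`, whereas an untyped container has to hold boxed references:

```scala
import scala.reflect.ClassTag

object ClassTagArraySketch {
  // With a ClassTag in scope, new Array[T] allocates a primitive array
  // (for example int[]) when T is a primitive type.
  def typedArray[T: ClassTag](n: Int): Array[T] = new Array[T](n)

  // Without a ClassTag, a generic container typically falls back to
  // an array of references, which boxes every primitive element.
  def boxedArray(n: Int): Array[AnyRef] = new Array[AnyRef](n)

  def main(args: Array[String]): Unit = {
    println(typedArray[Int](4).getClass.getComponentType)  // prints int
    println(boxedArray(4).getClass.getComponentType)       // prints class java.lang.Object
  }
}
```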

Author: zsxwing <zsxwing@gmail.com>

Closes #3378 from zsxwing/SPARK-4505 and squashes the following commits:

4abdbba [zsxwing] Add a ClassTag parameter to reduce the memory usage of CompactBuffer[T] when T is a primitive type

[SPARK-4057] Use -agentlib instead of -Xdebug in sbt-launch-lib.bash for debugging · 938dc141

Kousuke Saruta authored 10 years ago

In sbt-launch-lib.bash, the -Xdebug option is used for debugging. We should use the -agentlib option for Java 6+.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2904 from sarutak/SPARK-4057 and squashes the following commits:

39b5320 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4057
26b4af8 [Kousuke Saruta] Improved java option for debugging

Include the key name when failing on an invalid value. · 95290bf4

Stephen Haberman authored 10 years ago

Admittedly a really small tweak.

Author: Stephen Haberman <stephen@exigencecorp.com>

Closes #3514 from stephenh/include-key-name-in-npe and squashes the following commits:

937740a [Stephen Haberman] Include the key name when failing on an invalid value.

[SPARK-3398] [SPARK-4325] [EC2] Use EC2 status checks. · 317e114e

Nicholas Chammas authored 10 years ago

This PR re-introduces [0e648bc](https://github.com/apache/spark/commit/0e648bc2bedcbeb55fce5efac04f6dbad9f063b4) from PR #2339, which somehow never made it into the codebase.

Additionally, it removes a now-unnecessary linear backoff on the SSH checks since we are blocking on EC2 status checks before testing SSH.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #3195 from nchammas/remove-ec2-ssh-backoff and squashes the following commits:

efb29e1 [Nicholas Chammas] Revert "Remove linear backoff."
ef3ca99 [Nicholas Chammas] reuse conn
adb4eaa [Nicholas Chammas] Remove linear backoff.
55caa24 [Nicholas Chammas] Check EC2 status checks before SSH.

Nov 28, 2014

MAINTENANCE: Automated closing of pull requests. · 047ff573

Patrick Wendell authored 10 years ago

This commit exists to close the following pull requests on Github:

Closes #3451 (close requested by 'pwendell')
Closes #1310 (close requested by 'pwendell')
Closes #3207 (close requested by 'JoshRosen')

[SPARK-4597] Use proper exception and reset variable in Utils.createTempDir() · 49fe8797

Liang-Chi Hsieh authored 10 years ago

`File.exists()` and `File.mkdirs()` throw only `SecurityException`, not `IOException`. Also, when an exception is thrown, `dir` should be reset.
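
A sketch of the retry loop this describes, assuming `createTempDir` retries with fresh random names up to a fixed number of attempts (the structure, the `sketch-` prefix, and the `maxAttempts` parameter are assumptions, not the actual `Utils.createTempDir()` code):

```scala
import java.io.{File, IOException}
import java.util.UUID

object TempDirSketch {
  def createTempDir(root: String = System.getProperty("java.io.tmpdir"),
                    maxAttempts: Int = 10): File = {
    var attempts = 0
    var dir: File = null
    while (dir == null) {
      attempts += 1
      if (attempts > maxAttempts) {
        throw new IOException(
          s"Failed to create a temp directory under $root after $maxAttempts attempts")
      }
      try {
        dir = new File(root, "sketch-" + UUID.randomUUID.toString)
        if (dir.exists() || !dir.mkdirs()) {
          dir = null  // name collision or creation failure: retry
        }
      } catch {
        // exists() and mkdirs() can throw SecurityException, not IOException;
        // reset dir so the loop retries instead of returning an unusable path.
        case _: SecurityException => dir = null
      }
    }
    dir
  }
}
```

Without the reset in the `catch` branch, `dir` would still hold the path that failed, the loop would exit, and the caller would receive a directory that was never created.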

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #3449 from viirya/fix_createtempdir and squashes the following commits:

36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.
