  1. Oct 28, 2014
    • [SPARK-4058] [PySpark] Log file name is hard coded even though there is a variable '$LOG_FILE ' · 6c1b981c
      Kousuke Saruta authored
      In the script 'python/run-tests', the log file name is held in the variable 'LOG_FILE', but some log file names in the script are still hard-coded.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2905 from sarutak/SPARK-4058 and squashes the following commits:
      
      7710490 [Kousuke Saruta] Fixed python/run-tests not to use hard-coded log file name
    • [SPARK-4065] Add check for IPython on Windows · 2f254dac
      Michael Griffiths authored
      This change employs logic similar to the bash launcher (pyspark) to check
      whether IPYTHON=1 and, if so, launch ipython with the options in IPYTHON_OPTS.
      The fix assumes that ipython is available on the system Path and can
      be invoked with a plain "ipython" command.
      
      Author: Michael Griffiths <msjgriffiths@gmail.com>
      
      Closes #2910 from msjgriffiths/pyspark-windows and squashes the following commits:
      
      ef34678 [Michael Griffiths] Change build message to comply with [SPARK-3775]
      361e3d8 [Michael Griffiths] [SPARK-4065] Add check for IPython on Windows
      9ce72d1 [Michael Griffiths] [SPARK-4065] Add check for IPython on Windows
    • [SPARK-4089][Doc][Minor] The version number of Spark in _config.yaml is wrong. · 4d52cec2
      Kousuke Saruta authored
      The version number of Spark in docs/_config.yaml for master branch should be 1.2.0 for now.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2943 from sarutak/SPARK-4089 and squashes the following commits:
      
      aba7fb4 [Kousuke Saruta] Fixed the version number of Spark in _config.yaml
    • [SPARK-3657] yarn alpha YarnRMClientImpl throws NPE... · 247c529b
      Kousuke Saruta authored
      [SPARK-3657] yarn alpha YarnRMClientImpl throws NPE appMasterRequest.setTrackingUrl starting spark-shell
      
      tgravescs reported this issue.
      
      Following is quoted from tgravescs' report.
      
      YarnRMClientImpl.registerApplicationMaster can throw a null pointer exception when setting the tracking URL if it is empty:
      
          appMasterRequest.setTrackingUrl(new URI(uiAddress).getAuthority())
      
      I hit this just by starting spark-shell without the tracking URL set.
      
      14/09/23 16:18:34 INFO yarn.YarnRMClientImpl: Connecting to ResourceManager at kryptonitered-jt1.red.ygrid.yahoo.com/98.139.154.99:8030
      Exception in thread "main" java.lang.NullPointerException
              at org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterRequestProto$Builder.setTrackingUrl(YarnServiceProtos.java:710)
              at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.RegisterApplicationMasterRequestPBImpl.setTrackingUrl(RegisterApplicationMasterRequestPBImpl.java:132)
              at org.apache.spark.deploy.yarn.YarnRMClientImpl.registerApplicationMaster(YarnRMClientImpl.scala:102)
              at org.apache.spark.deploy.yarn.YarnRMClientImpl.register(YarnRMClientImpl.scala:55)
              at org.apache.spark.deploy.yarn.YarnRMClientImpl.register(YarnRMClientImpl.scala:38)
              at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:168)
              at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:206)
              at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:120)
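
      The failure mode and the kind of guard the fix needs can be sketched in Python (the names here are illustrative; the actual fix is in Scala's `YarnRMClientImpl`). In Java, `new URI("").getAuthority()` returns null, and passing null to `setTrackingUrl` triggers the NPE above, so the empty case must be guarded:

```python
from urllib.parse import urlparse

def register_application_master(request, ui_address):
    """Set the tracking URL only when one is actually available.

    Mirrors the Java failure: an empty uiAddress yields a null authority,
    and setting a null tracking URL raises the NPE shown above.
    """
    if ui_address:  # guard: skip entirely when the tracking URL is unset/empty
        authority = urlparse(ui_address).netloc
        if authority:
            request["tracking_url"] = authority
    return request

req = register_application_master({}, "")  # spark-shell case: no tracking URL
assert "tracking_url" not in req
req = register_application_master({}, "http://host:4040")
assert req["tracking_url"] == "host:4040"
```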
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2981 from sarutak/SPARK-3657-2 and squashes the following commits:
      
      e2fd6bc [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3657
      70b8882 [Kousuke Saruta] Fixed NPE thrown
    • [SPARK-4096][YARN]let ApplicationMaster accept executor memory argument in... · 1ea3e3dc
      WangTaoTheTonic authored
      [SPARK-4096][YARN]let ApplicationMaster accept executor memory argument in same format as JVM memory strings
      
      Currently `ApplicationMaster` accepts the executor memory argument only as a plain number; we should let it accept JVM-style memory strings (e.g. 512m, 2g) as well.
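
      A minimal sketch of such JVM-style memory-string parsing (a hypothetical helper; the real change touches the ApplicationMaster argument handling in Scala):

```python
import re

def parse_memory_mb(s):
    """Parse a JVM-style memory string into megabytes.

    Bare numbers are taken as MB (the old behaviour); suffixes follow
    JVM memory strings: k, m, g, t.
    """
    m = re.fullmatch(r"(\d+)([kmgt]?)", s.strip().lower())
    if not m:
        raise ValueError(f"invalid memory string: {s!r}")
    num, unit = int(m.group(1)), m.group(2)
    to_bytes = {"": 1 << 20, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40}
    return num * to_bytes[unit] // (1 << 20)

assert parse_memory_mb("512") == 512   # plain number still means MB
assert parse_memory_mb("2g") == 2048   # JVM-style string now accepted
assert parse_memory_mb("1024k") == 1
```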
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #2955 from WangTaoTheTonic/modifyDesc and squashes the following commits:
      
      ab98c70 [WangTaoTheTonic] append parameter passed in
      3779767 [WangTaoTheTonic] Update executor memory description in the help message
    • [SPARK-4110] Wrong comments about default settings in spark-daemon.sh · 44d8b45a
      Kousuke Saruta authored
      In spark-daemon.sh, there are the following comments.
      
          #   SPARK_CONF_DIR  Alternate conf dir. Default is ${SPARK_PREFIX}/conf.
          #   SPARK_LOG_DIR   Where log files are stored.  PWD by default.
      
      But the actual default for SPARK_CONF_DIR is `${SPARK_HOME}/conf`, and for SPARK_LOG_DIR it is `${SPARK_HOME}/logs`.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2972 from sarutak/SPARK-4110 and squashes the following commits:
      
      5a171a2 [Kousuke Saruta] Fixed wrong comments
    • [SPARK-4031] Make torrent broadcast read blocks on use. · 7768a800
      Shivaram Venkataraman authored
      This avoids reading torrent broadcast variables when they are referenced in a closure but not actually used. This is done by using a `lazy val` to read broadcast blocks.
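
      The idea can be illustrated in Python with a lazily evaluated attribute (names here are illustrative, not Spark's API):

```python
class TorrentBroadcast:
    """Defer reading broadcast blocks until .value is first accessed."""

    def __init__(self, block_reader):
        self._read = block_reader
        self._cached = None
        self._loaded = False

    @property
    def value(self):
        # Equivalent of Scala's `lazy val`: read blocks on first use only.
        if not self._loaded:
            self._cached = self._read()
            self._loaded = True
        return self._cached

reads = []
b = TorrentBroadcast(lambda: reads.append(1) or [1, 2, 3])
assert reads == []           # captured by a closure but unused: no read yet
assert b.value == [1, 2, 3]  # first use triggers the read
assert b.value == [1, 2, 3] and reads == [1]  # later uses hit the cache
```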
      
      cc rxin JoshRosen for review
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #2871 from shivaram/broadcast-read-value and squashes the following commits:
      
      1456d65 [Shivaram Venkataraman] Use getUsedTimeMs and remove readObject
      d6c5ee9 [Shivaram Venkataraman] Use laxy val to implement readBroadcastBlock
      0b34df7 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into broadcast-read-value
      9cec507 [Shivaram Venkataraman] Test if broadcast variables are read lazily
      768b40b [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into broadcast-read-value
      8792ed8 [Shivaram Venkataraman] Make torrent broadcast read blocks on use. This avoids reading broadcast variables when they are referenced in the closure but not used by the code.
    • [SPARK-4098][YARN]use appUIAddress instead of appUIHostPort in yarn-client mode · 0ac52e30
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-4098
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #2958 from WangTaoTheTonic/useAddress and squashes the following commits:
      
      29236e6 [WangTaoTheTonic] use appUIAddress instead of appUIHostPort in yarn-cluster mode
    • [SPARK-4095][YARN][Minor]extract val isLaunchingDriver in ClientBase · e8813be6
      WangTaoTheTonic authored
      Instead of checking whether `args.userClass` is null repeatedly, we extract it to a global val as in `ApplicationMaster`.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #2954 from WangTaoTheTonic/MemUnit and squashes the following commits:
      
      13bda20 [WangTaoTheTonic] extract val isLaunchingDriver in ClientBase
    • [SPARK-4116][YARN]Delete the abandoned log4j-spark-container.properties · 47346cd0
      WangTaoTheTonic authored
      Since it was renamed in https://github.com/apache/spark/pull/560, log4j-spark-container.properties has never been used again.
      I have searched for its name globally in the code and found no reference to it.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #2977 from WangTaoTheTonic/delLog4j and squashes the following commits:
      
      fb2729f [WangTaoTheTonic] delete the log4j file obsoleted
    • [SPARK-3961] [MLlib] [PySpark] Python API for mllib.feature · fae095bc
      Davies Liu authored
      Added a complete Python API for mllib.feature:
      
      Normalizer
      StandardScalerModel
      StandardScaler
      HashTF
      IDFModel
      IDF
      
      cc mengxr
      
      Author: Davies Liu <davies@databricks.com>
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2819 from davies/feature and squashes the following commits:
      
      4f48f48 [Davies Liu] add a note for HashingTF
      67f6d21 [Davies Liu] address comments
      b628693 [Davies Liu] rollback changes in Word2Vec
      efb4f4f [Davies Liu] Merge branch 'master' into feature
      806c7c2 [Davies Liu] address comments
      3abb8c2 [Davies Liu] address comments
      59781b9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into feature
      a405ae7 [Davies Liu] fix tests
      7a1891a [Davies Liu] fix tests
      486795f [Davies Liu] update programming guide, HashTF -> HashingTF
      8a50584 [Davies Liu] Python API for mllib.feature
    • [SPARK-4107] Fix incorrect handling of read() and skip() return values · 46c63417
      Josh Rosen authored
      `read()` may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors.  `skip()` faces similar issues, too.
      
      This patch fixes several cases where we mis-handle these methods' return values.
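
      The general pattern for handling short reads, sketched in Python (the actual fixes are in the Java/Scala stream code):

```python
import io

def read_fully(stream, n):
    """Read exactly n bytes, looping because read() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # EOF before n bytes: fail loudly instead of corrupting
            raise EOFError(f"expected {n} bytes, got only {n - len(chunk or b'') - remaining + n - n}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

assert read_fully(io.BytesIO(b"hello!"), 5) == b"hello"
try:
    read_fully(io.BytesIO(b"hi"), 5)  # too few bytes available
    raise AssertionError("should have raised EOFError")
except EOFError:
    pass
```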
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2969 from JoshRosen/file-channel-read-fix and squashes the following commits:
      
      e724a9f [Josh Rosen] Fix similar issue of not checking skip() return value.
      cbc03ce [Josh Rosen] Update the other log message, too.
      01e6015 [Josh Rosen] file.getName -> file.getAbsolutePath
      d961d95 [Josh Rosen] Fix another issue in FileServerSuite.
      b9265d2 [Josh Rosen] Fix a similar (minor) issue in TestUtils.
      cd9d76f [Josh Rosen] Fix a similar error in Tachyon:
      3db0008 [Josh Rosen] Fix a similar read() error in Utils.offsetBytes().
      db985ed [Josh Rosen] Fix unsafe usage of FileChannel.read():
    • fix broken links in README.md · 4ceb048b
      Ryan Williams authored
      seems like `building-spark.html` was renamed to `building-with-maven.html`?
      
      Is Maven the blessed build tool these days, or SBT? I couldn't find a building-with-sbt page so I went with the Maven one here.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #2859 from ryan-williams/broken-links-readme and squashes the following commits:
      
      7692253 [Ryan Williams] fix broken links in README.md
    • [SPARK-4064]NioBlockTransferService.fetchBlocks may cause spark to hang. · 7c0c26cd
      GuoQiang Li authored
      cc @rxin
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2929 from witgo/SPARK-4064 and squashes the following commits:
      
      20110f2 [GuoQiang Li] Modify the exception msg
      3425225 [GuoQiang Li] review commits
      2b07e49 [GuoQiang Li] If we create a lot of big broadcast variables, Spark may hang
    • [SPARK-3907][SQL] Add truncate table support · 0c34fa5b
      wangxiaojing authored
      JIRA issue: [SPARK-3907]https://issues.apache.org/jira/browse/SPARK-3907
      
      Add truncate table support:

          TRUNCATE TABLE table_name [PARTITION partition_spec];
          partition_spec:
            : (partition_col = partition_col_value, partition_col = partition_col_value, ...)

      Removes all rows from a table or partition(s). Currently the target table must be a native/managed table, or an exception will be thrown. Users can specify a partial partition_spec to truncate multiple partitions at once; omitting partition_spec truncates all partitions in the table.
      
      Author: wangxiaojing <u9jing@gmail.com>
      
      Closes #2770 from wangxiaojing/spark-3907 and squashes the following commits:
      
      63dbd81 [wangxiaojing] change hive scalastyle
      7a03707 [wangxiaojing] add comment
      f6e710e [wangxiaojing] change truncate table
      a1f692c [wangxiaojing] Correct spelling mistakes
      3b20007 [wangxiaojing] add truncate can not support column err message
      e483547 [wangxiaojing] add golden file
      77b1f20 [wangxiaojing]  add truncate table support
  2. Oct 27, 2014
    • [SQL] Correct a variable name in JavaApplySchemaSuite.applySchemaToJSON · 27470d34
      Yin Huai authored
      `schemaRDD2` is not tested because `schemaRDD1` is registered again.
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2869 from yhuai/JavaApplySchemaSuite and squashes the following commits:
      
      95fe894 [Yin Huai] Correct variable name.
    • [SPARK-4041][SQL] Attributes names in table scan should converted to lowercase... · 89af6dfc
      wangfei authored
      [SPARK-4041][SQL] Attributes names in table scan should converted to lowercase when compare with relation attributes
      
      In ```MetastoreRelation``` the attribute names are lowercase because Hive uses lowercase field names, so we should lowercase the attribute names from the table scan before the ```indexWhere(_.name == a.name)``` comparison.
      Otherwise ```neededColumnIDs``` may be incorrect.
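
      A toy Python illustration of the case-insensitive lookup (hypothetical names, not Spark's code):

```python
def needed_column_ids(relation_attrs, requested):
    """Hive stores field names lowercased, so compare names case-insensitively."""
    ids = []
    for a in requested:
        idx = next((i for i, r in enumerate(relation_attrs)
                    if r == a.lower()),  # lowercase before comparing
                   None)
        if idx is not None:
            ids.append(idx)
    return ids

attrs = ["id", "username"]  # lowercase, as stored by the metastore
# Mixed-case query attributes still resolve to the right column ids:
assert needed_column_ids(attrs, ["userName", "ID"]) == [1, 0]
```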
      
      Author: wangfei <wangfei1@huawei.com>
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2884 from scwf/fixColumnIds and squashes the following commits:
      
      6174046 [scwf] use AttributeMap for this issue
      dc74a24 [wangfei] use lowerName and add a test case for this issue
      3ff3a80 [wangfei] more safer change
      294fcb7 [scwf] attributes names in table scan should convert lowercase in neededColumnsIDs
    • [SPARK-3816][SQL] Add table properties from storage handler to output jobConf · 698a7eab
      Alex Liu authored
      ...ob conf in SparkHadoopWriter class
      
      Author: Alex Liu <alex_liu68@yahoo.com>
      
      Closes #2677 from alexliu68/SPARK-SQL-3816 and squashes the following commits:
      
      79c269b [Alex Liu] [SPARK-3816][SQL] Add table properties from storage handler to job conf
    • [SPARK-3911] [SQL] HiveSimpleUdf can not be optimized in constant folding · 418ad83f
      Cheng Hao authored
      ```
      explain extended select cos(null) from src limit 1;
      ```
      outputs:
      ```
       Project [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFCos(null) AS c_0#5]
        MetastoreRelation default, src, None
      
      == Optimized Logical Plan ==
      Limit 1
       Project [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFCos(null) AS c_0#5]
        MetastoreRelation default, src, None
      
      == Physical Plan ==
      Limit 1
       Project [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFCos(null) AS c_0#5]
        HiveTableScan [], (MetastoreRelation default, src, None), None
      ```
      After patching this PR it outputs
      ```
      == Parsed Logical Plan ==
      Limit 1
       Project ['cos(null) AS c_0#0]
        UnresolvedRelation None, src, None
      
      == Analyzed Logical Plan ==
      Limit 1
       Project [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFCos(null) AS c_0#0]
        MetastoreRelation default, src, None
      
      == Optimized Logical Plan ==
      Limit 1
       Project [null AS c_0#0]
        MetastoreRelation default, src, None
      
      == Physical Plan ==
      Limit 1
       Project [null AS c_0#0]
        HiveTableScan [], (MetastoreRelation default, src, None), None
      ```
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2771 from chenghao-intel/hive_udf_constant_folding and squashes the following commits:
      
      1379c73 [Cheng Hao] duplicate the PlanTest with catalyst/plans/PlanTest
      1e52dda [Cheng Hao] add unit test for hive simple udf constant folding
      01609ff [Cheng Hao] support constant folding for HiveSimpleUdf
    • [MLlib] SPARK-3987: add test case on objective value for NNLS · 7e3a1ada
      coderxiang authored
      Also update step parameter to pass the proposed test
      
      Author: coderxiang <shuoxiangpub@gmail.com>
      
      Closes #2965 from coderxiang/nnls-test and squashes the following commits:
      
      24b06f9 [coderxiang] add test case on objective value for NNLS; update step parameter to pass the test
    • SPARK-4022 [CORE] [MLLIB] Replace colt dependency (LGPL) with commons-math · bfa614b1
      Sean Owen authored
      This change replaces usages of colt with commons-math3 equivalents, and makes some minor necessary adjustments to related code and tests to match.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2928 from srowen/SPARK-4022 and squashes the following commits:
      
      61a232f [Sean Owen] Fix failure due to different sampling in JavaAPISuite.sample()
      16d66b8 [Sean Owen] Simplify seeding with call to reseedRandomGenerator
      a1a78e0 [Sean Owen] Use Well19937c
      31c7641 [Sean Owen] Fix Python Poisson test by choosing a different seed; about 88% of seeds should work but 1 didn't, it seems
      5c9c67f [Sean Owen] Additional test fixes from review
      d8f88e0 [Sean Owen] Replace colt with commons-math3. Some tests do not pass yet.
    • [SQL] Fixes caching related JoinSuite failure · 1d7bcc88
      Cheng Lian authored
      PR #2860 refines in-memory table statistics and enables broader broadcasted hash join optimization for in-memory tables. This makes `JoinSuite` fail when some test suite caches the test table `testData` and is executed before `JoinSuite`, because the expected `ShuffledHashJoin`s are optimized to `BroadcastHashJoin`s according to the collected in-memory table statistics.
      
      This PR fixes this issue by clearing the cache before testing join operator selection. A separate test case is also added to test broadcasted hash join operator selection.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #2960 from liancheng/fix-join-suite and squashes the following commits:
      
      715b2de [Cheng Lian] Fixes caching related JoinSuite failure
    • SPARK-2621. Update task InputMetrics incrementally · dea302dd
      Sandy Ryza authored
      The patch takes advantage of an API provided in Hadoop 2.5 that allows getting accurate data on Hadoop FileSystem bytes read.  It eliminates the old method, which naively reported the split size as the input bytes.  An impact of this change is that input metrics go away when running against Hadoop versions earlier than 2.5.  I can add this back in, but my opinion is that no metrics are better than inaccurate metrics.
      
      This is difficult to write a test for because we don't usually build against a version of Hadoop that contains the function we need.  I've tested it manually on a pseudo-distributed cluster.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #2087 from sryza/sandy-spark-2621 and squashes the following commits:
      
      23010b8 [Sandy Ryza] Missing style fixes
      74fc9bb [Sandy Ryza] Make getFSBytesReadOnThreadCallback private
      1ab662d [Sandy Ryza] Clear things up a bit
      984631f [Sandy Ryza] Switch from pull to push model and add test
      7ef7b22 [Sandy Ryza] Add missing curly braces
      219abc9 [Sandy Ryza] Fall back to split size
      90dbc14 [Sandy Ryza] SPARK-2621. Update task InputMetrics incrementally
    • [SPARK-4032] Deprecate YARN alpha support in Spark 1.2 · c9e05ca2
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2878 from ScrapCodes/SPARK-4032/deprecate-yarn-alpha and squashes the following commits:
      
      17e9857 [Prashant Sharma] added deperecated comment to Client and ExecutorRunnable.
      3a34b1e [Prashant Sharma] Updated docs...
      4608dea [Prashant Sharma] [SPARK-4032] Deprecate YARN alpha support in Spark 1.2
    • [SPARK-4030] Make destroy public for broadcast variables · 9aa340a2
      Shivaram Venkataraman authored
      This change makes the destroy function public for broadcast variables. Motivation for the change is described in https://issues.apache.org/jira/browse/SPARK-4030.
      This patch also logs where destroy was called from if a broadcast variable is used after destruction.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #2922 from shivaram/broadcast-destroy and squashes the following commits:
      
      a11abab [Shivaram Venkataraman] Fix scala style in Utils.scala
      bed9c9d [Shivaram Venkataraman] Make destroy blocking by default
      e80c1ab [Shivaram Venkataraman] Make destroy public for broadcast variables Also log where destroy was called from if a broadcast variable is used after destruction.
  3. Oct 26, 2014
    • [SPARK-3970] Remove duplicate removal of local dirs · 6377adaf
      Liang-Chi Hsieh authored
      The shutdown hook of `DiskBlockManager` already removes localDirs, so we do not need to register them with `Utils.registerShutdownDeleteDir`. Doing so causes duplicate removal of these local dirs and corresponding exceptions.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2826 from viirya/fix_duplicate_localdir_remove and squashes the following commits:
      
      051d4b5 [Liang-Chi Hsieh] check dir existing and return empty List as default.
      2b91a9c [Liang-Chi Hsieh] remove duplicate removal of local dirs.
    • [SPARK-4042][SQL] Append columns ids and names before broadcast · f4e8c289
      scwf authored
      Append column ids and names before broadcasting ```hiveExtraConf``` in ```HadoopTableReader```.
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2885 from scwf/HadoopTableReader and squashes the following commits:
      
      a8c498c [scwf] append columns ids and names before broadcast
    • [SPARK-4061][SQL] We cannot use EOL character in the operand of LIKE predicate. · 3a9d66cf
      Kousuke Saruta authored
      We cannot use an EOL character such as \n or \r in the operand of a LIKE predicate,
      so the following condition is never true.
      
          -- someStr is 'hoge\nfuga'
          where someStr LIKE 'hoge_fuga'
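
      In regex terms, the fix amounts to letting the LIKE wildcards match newline characters, e.g. compiling the translated pattern with DOTALL. A Python sketch of the semantics (not Spark's actual implementation):

```python
import re

def sql_like(value, pattern):
    """Translate a SQL LIKE pattern to a regex; DOTALL lets '_'/'%' match \n."""
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                    for c in pattern)
    return re.fullmatch(regex, value, re.DOTALL) is not None

assert sql_like("hoge\nfuga", "hoge_fuga")   # '_' now matches the newline
assert sql_like("hoge\nfuga", "hoge%fuga")   # so does '%'
assert not sql_like("hogefuga", "hoge_fuga")  # '_' still requires one char
```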
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2908 from sarutak/spark-sql-like-match-modification and squashes the following commits:
      
      d15798b [Kousuke Saruta] Remove test setting for thriftserver
      f99a2f4 [Kousuke Saruta] Fixed LIKE predicate so that we can use EOL character as in a operand
    • [SPARK-3959][SPARK-3960][SQL] SqlParser fails to parse literal... · ace41e8b
      Kousuke Saruta authored
      [SPARK-3959][SPARK-3960][SQL] SqlParser fails to parse literal -9223372036854775808 (Long.MinValue). / We can apply unary minus only to literal.
      
      SqlParser fails to parse -9223372036854775808 (Long.MinValue), so we cannot write queries such as the following.
      
          SELECT value FROM someTable WHERE value > -9223372036854775808
      
      Additionally, because of the wrong syntax definition, we can apply unary minus only to literals, so we cannot write expressions such as these:
      
          -(value1 + value2) // Parenthesized expressions
          -column // Columns
          -MAX(column) // Functions
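
      The Long.MinValue corner case exists because a parser that reads the literal as a positive number and then negates it overflows: 9223372036854775808 does not fit in a signed 64-bit long, while its negation does. A Python illustration of the bounds involved (a conceptual sketch, not the SqlParser fix):

```python
LONG_MIN, LONG_MAX = -(1 << 63), (1 << 63) - 1

def parse_long_literal(text):
    """Parse a long literal with the sign folded into the literal itself."""
    n = int(text)
    if not LONG_MIN <= n <= LONG_MAX:
        raise ValueError(f"{text} is out of range for a 64-bit long")
    return n

# The unsigned magnitude alone overflows, but with the sign included it is valid:
assert parse_long_literal("-9223372036854775808") == LONG_MIN
try:
    parse_long_literal("9223372036854775808")  # what a negate-after-parse sees
    raise AssertionError("should have raised ValueError")
except ValueError:
    pass
```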
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2816 from sarutak/spark-sql-dsl-improvement2 and squashes the following commits:
      
      32a5005 [Kousuke Saruta] Remove test setting for thriftserver
      c2bab5e [Kousuke Saruta] Fixed SPARK-3959 and SPARK-3960
    • [SPARK-3483][SQL] Special chars in column names · 974d7b23
      ravipesala authored
      Supporting special chars in column names by using backticks. Closed https://github.com/apache/spark/pull/2804 and created this PR, as that one had merge conflicts.
      
      Author: ravipesala <ravindra.pesala@huawei.com>
      
      Closes #2927 from ravipesala/SPARK-3483-NEW and squashes the following commits:
      
      f6329f3 [ravipesala] Rebased with master
    • [SPARK-4068][SQL] NPE in jsonRDD schema inference · 0481aaa8
      Yin Huai authored
      Please refer to added tests for cases that can trigger the bug.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-4068
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2918 from yhuai/SPARK-4068 and squashes the following commits:
      
      d360eae [Yin Huai] Handle nulls when building key paths from elements of an array.
    • [SPARK-4052][SQL] Use scala.collection.Map for pattern matching instead of... · 05308426
      Yin Huai authored
      [SPARK-4052][SQL] Use scala.collection.Map for pattern matching instead of using Predef.Map (it is scala.collection.immutable.Map)
      
      Please check https://issues.apache.org/jira/browse/SPARK-4052 for cases triggering this bug.
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2899 from yhuai/SPARK-4052 and squashes the following commits:
      
      1188f70 [Yin Huai] Address liancheng's comments.
      b6712be [Yin Huai] Use scala.collection.Map instead of Predef.Map (scala.collection.immutable.Map).
    • [SPARK-3953][SQL][Minor] Confusable variable name. · d518bc24
      Kousuke Saruta authored
      In SqlParser.scala, there is the following code.
      
          case d ~ p ~ r ~ f ~ g ~ h ~ o ~ l  =>
            val base = r.getOrElse(NoRelation)
            val withFilter = f.map(f => Filter(f, base)).getOrElse(base)
      
      In the code above, there are two variables named "f" in close proximity:
      one is the receiver "f" and the other is the bound variable "f".
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2807 from sarutak/SPARK-3953 and squashes the following commits:
      
      4957c32 [Kousuke Saruta] Improved variable name in SqlParser.scala
    • [SQL][DOC] Wrong package name "scala.math.sql" in sql-programming-guide.md · dc51f4d6
      Kousuke Saruta authored
      In sql-programming-guide.md, there is a wrong package name "scala.math.sql".
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2873 from sarutak/wrong-packagename-fix and squashes the following commits:
      
      4d5ecf4 [Kousuke Saruta] Fixed wrong package name in sql-programming-guide.md
    • [SPARK-3997][Build]scalastyle should output the error location · 89e8a5d8
      GuoQiang Li authored
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2846 from witgo/SPARK-3997 and squashes the following commits:
      
      d6a57f8 [GuoQiang Li] scalastyle should output the error location
    • [SPARK-3537][SPARK-3914][SQL] Refines in-memory columnar table statistics · 2838bf8a
      Cheng Lian authored
      This PR refines in-memory columnar table statistics:
      
      1. adds 2 more statistics for in-memory table columns: `count` and `sizeInBytes`
      1. adds filter pushdown support for `IS NULL` and `IS NOT NULL`.
      1. caches and propagates statistics in `InMemoryRelation` once the underlying cached RDD is materialized.
      
         Statistics are collected to driver side with an accumulator.
      
      This PR also fixes SPARK-3914 by properly propagating in-memory statistics.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #2860 from liancheng/propagates-in-mem-stats and squashes the following commits:
      
      0cc5271 [Cheng Lian] Restricts visibility of o.a.s.s.c.p.l.Statistics
      c5ff904 [Cheng Lian] Fixes test table name conflict
      a8c818d [Cheng Lian] Refines tests
      1d01074 [Cheng Lian] Bug fix: shouldn't call STRING.actualSize on null string value
      7dc6a34 [Cheng Lian] Adds more in-memory table statistics and propagates them properly
    • [HOTFIX][SQL] Temporarily turn off hive-server tests. · 879a1658
      Michael Armbrust authored
      The Thrift server is not available in the default (hive13) profile yet, which is breaking all SQL-only PRs.  This turns off these tests until #2685 is merged.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2950 from marmbrus/fixTests and squashes the following commits:
      
      1a6dfee [Michael Armbrust] [HOTFIX][SQL] Temporarily turn of hive-server tests.
    • [SPARK-3925][SQL] Do not consider the ordering of qualifiers during comparison · 0af7e514
      Liang-Chi Hsieh authored
      The orderings should not be considered during the comparison between old qualifiers and new qualifiers.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2783 from viirya/full_qualifier_comp and squashes the following commits:
      
      89f652c [Liang-Chi Hsieh] modification for comment.
      abb5762 [Liang-Chi Hsieh] More comprehensive comparison of qualifiers.
    • Just fixing comment that shows usage · 677852c3
      anant asthana authored
      Author: anant asthana <anant.asty@gmail.com>
      
      Closes #2948 from anantasty/patch-1 and squashes the following commits:
      
      d8fea0b [anant asthana] Just fixing comment that shows usage
    • [SPARK-3616] Add basic Selenium tests to WebUISuite · bf589fc7
      Josh Rosen authored
      This patch adds Selenium tests for Spark's web UI.  To avoid adding extra
      dependencies to the test environment, the tests use Selenium's HtmlUnitDriver,
      which is pure-Java, instead of, say, ChromeDriver.
      
      I added new tests to try to reproduce a few UI bugs reported on JIRA, namely
      SPARK-3021, SPARK-2105, and SPARK-2527.  I wasn't able to reproduce these bugs;
      I suspect that the older ones might have been fixed by other patches.
      
      In order to use HtmlUnitDriver, I added an explicit dependency on the
      org.apache.httpcomponents version of httpclient in order to prevent jets3t's
      older version from taking precedence on the classpath.
      
      I also upgraded ScalaTest to 2.2.1.
      
      Author: Josh Rosen <joshrosen@apache.org>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2474 from JoshRosen/webui-selenium-tests and squashes the following commits:
      
      fcc9e83 [Josh Rosen] scalautils -> scalactic package rename
      510e54a [Josh Rosen] [SPARK-3616] Add basic Selenium tests to WebUISuite.