  1. May 27, 2015
    • Liang-Chi Hsieh's avatar
      [SPARK-7697][SQL] Use LongType for unsigned int in JDBCRDD · 4f98d7a7
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7697
      
The reported problem case is MySQL. The H2 database has no unsigned int type, however, so a corresponding test cannot be added.
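A minimal sketch of why this mapping matters (illustrative only, not Spark's actual JDBCRDD code; the function name is hypothetical): an unsigned 32-bit INT can hold values above the signed 32-bit maximum, so it needs a 64-bit LongType target.

```python
# Sketch: map a JDBC INTEGER column to a Catalyst type name, widening
# unsigned columns to a 64-bit type so large values do not overflow.
def catalyst_type_for_int(signed: bool) -> str:
    return "IntegerType" if signed else "LongType"

INT32_MAX = 2**31 - 1    # largest value IntegerType can hold
UINT32_MAX = 2**32 - 1   # fits in a signed 64-bit LongType
```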
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6229 from viirya/unsignedint_as_long and squashes the following commits:
      
      dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into unsignedint_as_long
      608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD.
      4f98d7a7
    • Cheolsoo Park's avatar
      [SPARK-7850][BUILD] Hive 0.12.0 profile in POM should be removed · 6dd64587
      Cheolsoo Park authored
      I grep'ed hive-0.12.0 in the source code and removed all the profiles and doc references.
      
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits:
      
      fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile
      82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code
      f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
      6dd64587
    • Xiangrui Meng's avatar
      [SPARK-7535] [.1] [MLLIB] minor changes to the pipeline API · a9f1c0c5
      Xiangrui Meng authored
      1. removed `Params.validateParams(extra)`
      2. added `Evaluate.evaluate(dataset, paramPairs*)`
      3. updated `RegressionEvaluator` doc
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6392 from mengxr/SPARK-7535.1 and squashes the following commits:
      
      5ff5af8 [Xiangrui Meng] add unit test for CV.validateParams
      f1f8369 [Xiangrui Meng] update CV.validateParams() to test estimatorParamMaps
      607445d [Xiangrui Meng] merge master
      8716f5f [Xiangrui Meng] specify default metric name in RegressionEvaluator
      e4e5631 [Xiangrui Meng] update RegressionEvaluator doc
      801e864 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7535.1
      fcbd3e2 [Xiangrui Meng] Merge branch 'master' into SPARK-7535.1
      2192316 [Xiangrui Meng] remove validateParams(extra); add evaluate(dataset, extra*)
      a9f1c0c5
  2. May 26, 2015
    • Cheng Lian's avatar
      [SPARK-7868] [SQL] Ignores _temporary directories in HadoopFsRelation · b463e6d6
      Cheng Lian authored
      So that potential partial/corrupted data files left by failed tasks/jobs won't affect normal data scan.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6411 from liancheng/spark-7868 and squashes the following commits:
      
      273ea36 [Cheng Lian] Ignores _temporary directories
      b463e6d6
    • Josh Rosen's avatar
      [SPARK-7858] [SQL] Use output schema, not relation schema, for data source input conversion · 0c33c7b4
      Josh Rosen authored
      In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as the target schema for converting incoming rows into Catalyst rows.  However, we should be using the output schema instead, since our scan might return a subset of the relation's columns.
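A pure-Python analogy of the bug (illustrative, not Catalyst itself): when a scan prunes columns, conversion must use the schema of the rows actually returned, not the full relation schema.

```python
# Sketch: convert a scanned row using the *output* schema.
relation_schema = ["a", "b", "c"]   # full relation
output_schema = ["a", "b"]          # columns the scan returned

def convert(row, schema):
    # Converting with relation_schema would fail on the pruned
    # column "c"; the output schema matches the row exactly.
    return tuple(row[col] for col in schema)

row = {"a": 1, "b": 2}  # column "c" was pruned by the scan
```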
      
This patch incorporates #6414 by liancheng, which fixes an issue in `SimpleTextRelation` that prevented this bug from being caught by our old tests:
      
      > In `SimpleTextRelation`, we specified `needsConversion` to `true`, indicating that values produced by this testing relation should be of Scala types, and need to be converted to Catalyst types when necessary. However, we also used `Cast` to convert strings to expected data types. And `Cast` always produces values of Catalyst types, thus no conversion is done at all. This PR makes `SimpleTextRelation` produce Scala values so that data conversion code paths can be properly tested.
      
      Closes #5986.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Cheng Lian <lian@databricks.com>
      Author: Cheng Lian <liancheng@users.noreply.github.com>
      
      Closes #6400 from JoshRosen/SPARK-7858 and squashes the following commits:
      
      e71c866 [Josh Rosen] Re-fix bug so that the tests pass again
      56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites
      2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator
      6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion.
      5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858
      8ba195c [Cheng Lian] Merge 9968fba9979287aaa1f141ba18bfb9d4c116a3b3 into 61664732
      9968fba [Cheng Lian] Tests the data type conversion code paths
      0c33c7b4
    • rowan's avatar
      [SPARK-7637] [SQL] O(N) merge implementation for StructType merge · 03668348
      rowan authored
Contribution is my original work and I license the work to the project under the project's open source license.
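A hypothetical sketch of an O(N) merge using a lookup map (the `fieldsMap` name appears in the squashed commits below this entry, but this code and its merge rule are illustrative, not Spark's actual StructType implementation): indexing one side in a dict replaces the per-field linear scan that makes a naive merge O(N²).

```python
# Sketch: merge two field lists (name, datatype) in O(N) using a map.
def merge_fields(left, right):
    fields_map = {name: dt for name, dt in left}  # O(N) index
    merged = list(left)
    for name, dt in right:
        if name not in fields_map:                # O(1) lookup
            merged.append((name, dt))
    return merged
```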
      
      Author: rowan <rowan.chattaway@googlemail.com>
      
      Closes #6259 from rowan000/SPARK-7637 and squashes the following commits:
      
      c479df4 [rowan] SPARK-7637: rename mapFields to fieldsMap as per comments on github.
      8d2e419 [rowan] SPARK-7637: fix up whitespace changes
      0e9d662 [rowan] SPARK-7637: O(N) merge implementatio for StructType merge
      03668348
    • Mike Dusenberry's avatar
      [SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in MLlib... · 0463428b
      Mike Dusenberry authored
      [SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation.
      
      Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6422 from dusenberrymw/Fix_MLlib_Collab_Filtering_trainImplicit_Example and squashes the following commits:
      
      36492f4 [Mike Dusenberry] Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
      0463428b
    • Andrew Or's avatar
      [SPARK-7864] [UI] Do not kill innocent stages from visualization · 8f208242
      Andrew Or authored
      **Reproduction.** Run a long-running job, go to the job page, expand the DAG visualization, and click into a stage. Your stage is now killed. Why? This is because the visualization code just reaches into the stage table and grabs the first link it finds. In our case, this first link happens to be the kill link instead of the one to the stage page.
      
      **Fix.** Use proper CSS selectors to avoid ambiguity.
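The ambiguity can be reproduced with Python's stdlib HTML parser (the markup and class names here are hypothetical, not the actual Spark UI's): grabbing the first `<a>` in a stage row picks the kill link, while selecting by class is unambiguous.

```python
from html.parser import HTMLParser

# Sketch: find a link's href, optionally filtered by CSS class.
class LinkFinder(HTMLParser):
    def __init__(self, wanted_class=None):
        super().__init__()
        self.wanted_class = wanted_class
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and self.href is None:
            if self.wanted_class is None or a.get("class") == self.wanted_class:
                self.href = a.get("href")

def find_href(html, wanted_class=None):
    finder = LinkFinder(wanted_class)
    finder.feed(html)
    return finder.href

row = ('<td><a class="kill-link" href="/stages/stage/kill?id=1">(kill)</a>'
      '<a class="name-link" href="/stages/stage?id=1">Stage 1</a></td>')
```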
      
      This is an alternative to #6407. Thanks carsonwang for catching this.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6419 from andrewor14/fix-ui-viz-kill and squashes the following commits:
      
      25203bd [Andrew Or] Do not kill innocent stages
      8f208242
    • Xiangrui Meng's avatar
      [SPARK-7748] [MLLIB] Graduate spark.ml from alpha · 836a7589
      Xiangrui Meng authored
With decent coverage of feature transformers, algorithms, and model tuning support, it is time to graduate `spark.ml` from alpha. This PR changes all `AlphaComponent` annotations to either `DeveloperApi` or `Experimental`, depending on whether we expect a class/method to be used by end users (who use the pipeline API to assemble/tune their ML pipelines but not to create new pipeline components). `UnaryTransformer` becomes a `DeveloperApi` in this PR.
      
      jkbradley harsha2010
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6417 from mengxr/SPARK-7748 and squashes the following commits:
      
      effbccd [Xiangrui Meng] organize imports
      c15028e [Xiangrui Meng] added missing docs
      1b2e5f8 [Xiangrui Meng] update package doc
      73ca791 [Xiangrui Meng] alpha -> ex/dev for the rest
      93819db [Xiangrui Meng] alpha -> ex/dev in ml.param
      55ca073 [Xiangrui Meng] alpha -> ex/dev in ml.feature
      83572f1 [Xiangrui Meng] add Experimental and DeveloperApi tags (wip)
      836a7589
    • zsxwing's avatar
      [SPARK-6602] [CORE] Remove some places in core that calling SparkEnv.actorSystem · 9f742241
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6333 from zsxwing/remove-actor-system-usage and squashes the following commits:
      
      f125aa6 [zsxwing] Fix YarnAllocatorSuite
      ceadcf6 [zsxwing] Change the "port" parameter type of "AkkaUtils.address" to "int"; update ApplicationMaster and YarnAllocator to get the driverUrl from RpcEnv
      3239380 [zsxwing] Remove some places in core that calling SparkEnv.actorSystem
      9f742241
    • Shivaram Venkataraman's avatar
      [SPARK-3674] YARN support in Spark EC2 · 2e9a5f22
      Shivaram Venkataraman authored
This corresponds to https://github.com/mesos/spark-ec2/pull/116 in the spark-ec2 repo. The only change required in the spark_ec2.py script is to open the RM port.
      
      cc andrewor14
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6376 from shivaram/spark-ec2-yarn and squashes the following commits:
      
      961504a [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into spark-ec2-yarn
      152c94c [Shivaram Venkataraman] Open 8088 for YARN in EC2
      2e9a5f22
    • MechCoder's avatar
      [SPARK-7844] [MLLIB] Fix broken tests in KernelDensity · 61664732
      MechCoder authored
The densities in KernelDensity are scaled down by (number of parallel processes × number of points), when the scaling factor should be just the number of samples. This results in broken tests in KernelDensitySuite, which hadn't been testing the computation properly.
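The normalization issue can be seen in a plain-Python Gaussian KDE (illustrative, not MLlib's code): dividing by the number of samples n makes the estimate a proper density, whereas dividing by (number of partitions × number of points) does not.

```python
import math

# Sketch: Gaussian kernel density estimate at x, normalized by the
# sample count n (the correct factor), with bandwidth h.
def kde(x, samples, h):
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-((x - s) ** 2) / (2 * h * h)) for s in samples)
```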
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6383 from MechCoder/spark-7844 and squashes the following commits:
      
      ab81302 [MechCoder] Math->math
      9b8ed50 [MechCoder] Make one pass to update count
      a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity
      61664732
    • Zhang, Liye's avatar
      [SPARK-7854] [TEST] refine Kryo test suite · 63099122
      Zhang, Liye authored
This modification is based on JoshRosen's comments; for details, please refer to [#5934](https://github.com/apache/spark/pull/5934/files#r30949751).
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #6395 from liyezhang556520/kryoTest and squashes the following commits:
      
      da214c8 [Zhang, Liye] refine Kryo test suite accroding to Josh's comments
      63099122
    • Mike Dusenberry's avatar
      [DOCS] [MLLIB] Fixing misformatted links in v1.4 MLlib Naive Bayes... · e5a63a0e
      Mike Dusenberry authored
      [DOCS] [MLLIB] Fixing misformatted links in v1.4 MLlib Naive Bayes documentation by removing space and newline characters.
      
      A couple of links in the MLlib Naive Bayes documentation for v1.4 were broken due to the addition of either space or newline characters between the link title and link URL in the markdown doc.  (Interestingly enough, they are rendered correctly in the GitHub viewer, but not when compiled to HTML by Jekyll.)
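The fix can be sketched as a whitespace-collapsing pass over the markdown (an assumed pattern, not the actual edit, which simply removed the stray characters by hand): kramdown/Jekyll stops recognizing a link when whitespace separates `](` in `[title](url)`.

```python
import re

# Sketch: join a markdown link title and URL that were split by a
# space or newline, e.g. "[Bayes]\n(http://...)".
def fix_links(md: str) -> str:
    return re.sub(r"\]\s+\(", "](", md)
```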
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6412 from dusenberrymw/Fix_Broken_Links_In_MLlib_Naive_Bayes_Docs and squashes the following commits:
      
      91a4028 [Mike Dusenberry] Fixing misformatted links by removing space and newline characters.
      e5a63a0e
    • meawoppl's avatar
      [SPARK-7806][EC2] Fixes that allow the spark_ec2.py tool to run with Python3 · 8dbe7777
      meawoppl authored
      I have used this script to launch, destroy, start, and stop clusters successfully.
      
      Author: meawoppl <meawoppl@gmail.com>
      
      Closes #6336 from meawoppl/py3ec2spark and squashes the following commits:
      
      2e87046 [meawoppl] Py3 compat fixes.
      8dbe7777
    • linweizhong's avatar
      [SPARK-7339] [PYSPARK] PySpark shuffle spill memory sometimes are not correct · 8948ad3f
      linweizhong authored
In PySpark we get the memory used before and after a spill, then use the difference of these two values as memorySpilled. But if the before value is smaller than the after value, we get a negative value; in that scenario a value of 0 is more reasonable.
      
Below is the result in HistoryServer we have tested:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Shuffle Spill (Memory) | Shuffle Spill (Disk) | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 21 s | 0.1 s | 128.1 MB (hadoop) / 3237 | 70 ms | 10.1 MB / 2529 | 0.0 B | 5.7 MB | |
| 2 | 2 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3205 | 0.1 s | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB | |
| 1 | 1 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3271 | 68 ms | 10.1 MB / 2529 | -1048576.0 B | 5.6 MB | |
| 4 | 4 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3192 | 51 ms | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB | |
| 3 | 3 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3262 | 51 ms | 10.1 MB / 2529 | 1024.0 KB | 5.8 MB | |
| 5 | 5 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3256 | 93 ms | 10.1 MB / 2529 | -1048576.0 B | 5.7 MB | |
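The fix (per the "Use max function to get a nonnegative value" commit below) can be sketched as a one-line clamp; the function name here is illustrative:

```python
# Sketch: clamp the spill delta at zero so a measurement where
# memory used "after" exceeds "before" never reports negative bytes.
def memory_spilled(before_bytes: int, after_bytes: int) -> int:
    return max(before_bytes - after_bytes, 0)
```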
      
      /cc davies
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #5887 from Sephiroth-Lin/spark-7339 and squashes the following commits:
      
      9186c81 [linweizhong] Use max function to get a nonnegative value
      d41672b [linweizhong] Update MemoryBytesSpilled when memorySpilled > 0
      8948ad3f
    • scwf's avatar
      [CORE] [TEST] Fix SimpleDateParamTest · bf49c221
      scwf authored
      ```
      sbt.ForkMain$ForkError: 1424424077190 was not equal to 1424474477190
      	at org.scalatest.MatchersHelper$.newTestFailedException(MatchersHelper.scala:160)
      	at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6231)
      	at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6265)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply$mcV$sp(SimpleDateParamTest.scala:25)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23)
      	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
      	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
      	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      	at org.scalatest.Transformer.apply(Transformer.scala:22)
      	at org.scalatest.Transformer.apply(Transformer.scala:20)
      	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
      	at org.scalatest.Suite$class.withFixture(Suite.scala:
      ```
      
      Set timezone to fix SimpleDateParamTest
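The underlying issue can be shown in Python (illustrative values, not the Scala test itself): the epoch value of a wall-clock date string depends on the timezone it is parsed in, so a test comparing against a fixed epoch must pin the zone.

```python
from datetime import datetime

# Sketch: parse a date string with an explicit UTC offset so the
# resulting epoch millis are deterministic across machines.
def to_epoch_millis(date_str: str) -> int:
    dt = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S %z")
    return int(dt.timestamp() * 1000)
```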
      
      Author: scwf <wangfei1@huawei.com>
      Author: Fei Wang <wangfei1@huawei.com>
      
      Closes #6377 from scwf/fix-SimpleDateParamTest and squashes the following commits:
      
      b8df1e5 [Fei Wang] Update SimpleDateParamSuite.scala
      8bb74f0 [scwf] fix SimpleDateParamSuite
      bf49c221
    • Konstantin Shaposhnikov's avatar
      [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x · 43aa819c
      Konstantin Shaposhnikov authored
Both Akka 2.3.x and hadoop-2.x use protobuf 2.5, so only the hadoop-1 build needs the custom 2.3.4-spark Akka version that shades protobuf 2.5.
      
      This partially fixes SPARK-7042 (for hadoop-2.x builds)
      
      Author: Konstantin Shaposhnikov <Konstantin.Shaposhnikov@sc.com>
      
      Closes #6341 from kostya-sh/SPARK-7042 and squashes the following commits:
      
      7eb8c60 [Konstantin Shaposhnikov] [SPARK-7042][BUILD] use the standard akka artifacts with hadoop-2.x
      43aa819c
    • Reynold Xin's avatar
      [SQL][minor] Removed unused Catalyst logical plan DSL. · c9adcad8
      Reynold Xin authored
      The Catalyst DSL is no longer used as a public facing API. This pull request removes the UDF and writeToFile feature from it since they are not used in unit tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6350 from rxin/unused-logical-dsl and squashes the following commits:
      
      90b3de6 [Reynold Xin] [SQL][minor] Removed unused Catalyst logical plan DSL.
      c9adcad8
  3. May 25, 2015
    • Yin Huai's avatar
      [SPARK-7832] [Build] Always run SQL tests in master build. · f38e619c
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-7832
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6385 from yhuai/runSQLTests and squashes the following commits:
      
      3d399bc [Yin Huai] Always run SQL tests in master build.
      f38e619c
    • Calvin Jia's avatar
      [SPARK-6391][DOCS] Document Tachyon compatibility. · ce0051d6
      Calvin Jia authored
      Adds a section in the RDD persistence section of the programming-guide docs detailing Spark-Tachyon version compatibility as discussed in [[SPARK-6391]](https://issues.apache.org/jira/browse/SPARK-6391).
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #6382 from calvinjia/spark-6391 and squashes the following commits:
      
      113e863 [Calvin Jia] Move compatibility info to the offheap storage level section.
      7942dc5 [Calvin Jia] Add a section in the programming-guide docs for Tachyon compatibility.
      ce0051d6
    • Cheng Lian's avatar
      [SPARK-7842] [SQL] Makes task committing/aborting in InsertIntoHadoopFsRelation more robust · 8af1bf10
      Cheng Lian authored
      When committing/aborting a write task issued in `InsertIntoHadoopFsRelation`, if an exception is thrown from `OutputWriter.close()`, the committing/aborting process will be interrupted, and leaves messy stuff behind (e.g., the `_temporary` directory created by `FileOutputCommitter`).
      
This PR makes these two processes more robust by catching potential exceptions and falling back to normal task commit/abort.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6378 from liancheng/spark-7838 and squashes the following commits:
      
      f18253a [Cheng Lian] Makes task committing/aborting in InsertIntoHadoopFsRelation more robust
      8af1bf10
    • Cheng Lian's avatar
      [SPARK-7684] [SQL] Invoking HiveContext.newTemporaryConfiguration() shouldn't... · bfeedc69
      Cheng Lian authored
      [SPARK-7684] [SQL] Invoking HiveContext.newTemporaryConfiguration() shouldn't create new metastore directory
      
The "Database does not exist" error reported in SPARK-7684 was caused by `HiveContext.newTemporaryConfiguration()`, which always creates a new temporary metastore directory and returns a metastore configuration pointing to that directory. This makes `TestHive.reset()` always replace the old temporary metastore with an empty new one.
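The shape of the fix can be sketched as memoizing the directory (illustrative Python, not the actual Scala change; the function name is hypothetical): create the temporary metastore directory once and reuse it, rather than minting a fresh one on every call.

```python
import tempfile

# Sketch: lazily create the temp metastore dir once and cache it,
# so repeated calls do not silently reset the metastore.
_metastore_dir = None

def temporary_metastore_dir() -> str:
    global _metastore_dir
    if _metastore_dir is None:
        _metastore_dir = tempfile.mkdtemp(prefix="metastore")
    return _metastore_dir
```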
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6359 from liancheng/spark-7684 and squashes the following commits:
      
      95d2eb8 [Cheng Lian] Addresses @marmbrust's comment
      042769d [Cheng Lian] Don't create new temp directory in HiveContext.newTemporaryConfiguration()
      bfeedc69
    • tedyu's avatar
      Add test which shows Kryo buffer size configured in mb is properly supported · fd31fd49
      tedyu authored
This PR adds a test which shows that a Kryo buffer size configured in MB is supported properly.
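A simplified sketch of the suffix handling being tested (this is not Spark's actual size parser; units and the bare-number default are assumptions):

```python
# Sketch: interpret a buffer size string with a k/m/g suffix,
# returning kilobytes; a bare number is assumed to already be KB.
_UNITS = {"k": 1, "m": 1024, "g": 1024 * 1024}  # multiplier to KB

def buffer_size_kb(conf_value: str) -> int:
    v = conf_value.strip().lower()
    if v[-1] in _UNITS:
        return int(v[:-1]) * _UNITS[v[-1]]
    return int(v)
```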
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #6390 from tedyu/master and squashes the following commits:
      
      c51ea64 [tedyu] Fix KryoSerializer creation
      f12ee04 [tedyu] Correct conf variable name in test
      642de51 [tedyu] Drop change in KryoSerializer so that the new test runs
      d2fdbc4 [tedyu] Give bufferSizeKb initial value
      9a17277 [tedyu] Rewrite bufferSize checking
      4739998 [tedyu] Rewrite bufferSize checking
      830d0d0 [tedyu] Kryo buffer size configured in mb should be properly supported
      fd31fd49
    • tedyu's avatar
      Close HBaseAdmin at the end of HBaseTest · 23bea97d
      tedyu authored
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #6381 from ted-yu/master and squashes the following commits:
      
      e2f0ea1 [tedyu] Close HBaseAdmin at the end of HBaseTest
      23bea97d
  4. May 24, 2015
  5. May 23, 2015
    • Shivaram Venkataraman's avatar
      [HOTFIX] Copy SparkR lib if it exists in make-distribution · b231baa2
      Shivaram Venkataraman authored
      This is to fix an issue reported in #6373 where the `cp` would fail if `-Psparkr` was not used in the build
      
      cc dragos pwendell
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6379 from shivaram/make-distribution-hotfix and squashes the following commits:
      
      08eb7e4 [Shivaram Venkataraman] Copy SparkR lib if it exists in make-distribution
      b231baa2
    • Yin Huai's avatar
      [SPARK-7654] [SQL] Move insertInto into reader/writer interface. · 2b7e6358
      Yin Huai authored
      This one continues the work of https://github.com/apache/spark/pull/6216.
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6366 from yhuai/insert and squashes the following commits:
      
      3d717fb [Yin Huai] Use insertInto to handle the casue when table exists and Append is used for saveAsTable.
      56d2540 [Yin Huai] Add PreWriteCheck to HiveContext's analyzer.
      c636e35 [Yin Huai] Remove unnecessary empty lines.
      cf83837 [Yin Huai] Move insertInto to write. Also, remove the partition columns from InsertIntoHadoopFsRelation.
      0841a54 [Reynold Xin] Removed experimental tag for deprecated methods.
      33ed8ef [Reynold Xin] [SPARK-7654][SQL] Move insertInto into reader/writer interface.
      2b7e6358
    • Davies Liu's avatar
      Fix install jira-python · a4df0f2d
      Davies Liu authored
The jira-python package should be installed with:
      
        sudo pip install jira
      
      cc pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6367 from davies/fix_jira_python2 and squashes the following commits:
      
      fbb3c8e [Davies Liu] Fix install jira-python
      a4df0f2d
    • Davies Liu's avatar
      [SPARK-7840] add insertInto() to Writer · be47af1b
      Davies Liu authored
      Add tests later.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6375 from davies/insertInto and squashes the following commits:
      
      826423e [Davies Liu] add insertInto() to Writer
      be47af1b
    • Davies Liu's avatar
      [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates · efe3bfdf
      Davies Liu authored
      1. ntile should take an integer as parameter.
      2. Added Python API (based on #6364)
      3. Update documentation of various DataFrame Python functions.
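Point 1 can be sketched with plain-Python NTILE semantics (illustrative, not Spark's window implementation): the argument must be a positive integer, and rows split into n buckets whose sizes differ by at most one, with earlier buckets getting the extra rows.

```python
# Sketch: assign SQL NTILE(n) bucket numbers to num_rows ordered rows.
def ntile(n: int, num_rows: int):
    if not isinstance(n, int) or n <= 0:
        raise TypeError("ntile takes a positive integer")
    base, extra = divmod(num_rows, n)
    out = []
    for bucket in range(1, n + 1):
        out.extend([bucket] * (base + (1 if bucket <= extra else 0)))
    return out
```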
      
      Author: Davies Liu <davies@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6374 from rxin/window-final and squashes the following commits:
      
      69004c7 [Reynold Xin] Style fix.
      288cea9 [Reynold Xin] Update documentaiton.
      7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
      66092b4 [Davies Liu] update docs
      ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
      ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
      8936ade [Davies Liu] fix maxint in python 3
      2649358 [Davies Liu] update docs
      778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
      efe3bfdf
    • zsxwing's avatar
      [SPARK-7777][Streaming] Handle the case when there is no block in a batch · ad0badba
      zsxwing authored
      In the old implementation, if a batch has no block, `areWALRecordHandlesPresent` will be `true` and it will return `WriteAheadLogBackedBlockRDD`.
      
      This PR handles this case by returning `WriteAheadLogBackedBlockRDD` or `BlockRDD` according to the configuration.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6372 from zsxwing/SPARK-7777 and squashes the following commits:
      
      788f895 [zsxwing] Handle the case when there is no block in a batch
      ad0badba
    • Shivaram Venkataraman's avatar
      [SPARK-6811] Copy SparkR lib in make-distribution.sh · a40bca01
      Shivaram Venkataraman authored
This change also removes native libraries from SparkR to make sure our distribution works across platforms.
      
      Tested by building on Mac, running on Amazon Linux (CentOS), Windows VM and vice-versa (built on Linux run on Mac)
      
      I will also test this with YARN soon and update this PR.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6373 from shivaram/sparkr-binary and squashes the following commits:
      
      ae41b5c [Shivaram Venkataraman] Remove native libraries from SparkR Also include the built SparkR package in make-distribution.sh
      a40bca01
    • Davies Liu's avatar
      [SPARK-6806] [SPARKR] [DOCS] Fill in SparkR examples in programming guide · 7af3818c
      Davies Liu authored
      sqlCtx -> sqlContext
      
      You can check the docs by:
      
      ```
      $ cd docs
      $ SKIP_SCALADOC=1 jekyll serve
      ```
      cc shivaram
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5442 from davies/r_docs and squashes the following commits:
      
      7a12ec6 [Davies Liu] remove rdd in R docs
      8496b26 [Davies Liu] remove the docs related to RDD
      e23b9d6 [Davies Liu] delete R docs for RDD API
      222e4ff [Davies Liu] Merge branch 'master' into r_docs
      89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
      f0a10e1 [Davies Liu] address comments from @shivaram
      f61de71 [Davies Liu] Update pairRDD.R
      3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
      2f10a77 [Davies Liu] address comments from @cafreeman
      9c2a062 [Davies Liu] mention R api together with Python API
      23f751a [Davies Liu] Fill in SparkR examples in programming guide
      7af3818c
    • GenTang's avatar
      [SPARK-5090] [EXAMPLES] The improvement of python converter for hbase · 4583cf4b
      GenTang authored
      Hi,
      
Following the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-td10001.html, I made some modifications to three files in the examples package:
1. HBaseConverters.scala: the new converter converts all the records in an HBase result into a single string
2. hbase_input.py: as the value string may contain several records, we can use the ast package to convert the string into a dict
3. HBaseTest.scala: as the examples package uses HBase 0.98.7, the original HTableDescriptor constructor is deprecated, so it has been updated to the new constructor
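The round trip described in points 1 and 2 can be sketched in pure Python (illustrative; function names and cell layout are hypothetical): the converter emits all cells of one row as a single string, and the Python side turns that string back into a dict with `ast`.

```python
import ast
import json

# Sketch: serialize a row's cells to one string, then recover the dict.
def to_single_string(cells: dict) -> str:
    return json.dumps(cells)

def parse_value(value: str) -> dict:
    # ast.literal_eval safely parses the dict-like literal (works here
    # because the cells contain only strings, no true/false/null).
    return ast.literal_eval(value)
```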
      
      Author: GenTang <gen.tang86@gmail.com>
      
      Closes #3920 from GenTang/master and squashes the following commits:
      
      d2153df [GenTang] import JSONObject precisely
      4802481 [GenTang] dump the result into a singl String
      62df7f0 [GenTang] remove the comment
      21de653 [GenTang] return the string in json format
      15b1fe3 [GenTang] the modification of comments
      5cbbcfc [GenTang] the improvement of pythonconverter
      ceb31c5 [GenTang] the modification for adapting updation of hbase
      3253b61 [GenTang] the modification accompanying the improvement of pythonconverter
      4583cf4b