  1. May 27, 2015
    • Liang-Chi Hsieh's avatar
      [SPARK-7697][SQL] Use LongType for unsigned int in JDBCRDD · 4f98d7a7
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7697
      
The reported problem case is MySQL. The H2 database has no unsigned int type, however, so a corresponding test cannot be added.
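A minimal sketch of why this mapping matters (illustrative only, not Spark's actual JDBCRDD code; the function name is hypothetical): an unsigned 32-bit INT can hold values above the signed 32-bit maximum, so it needs a 64-bit LongType target.

```python
# Sketch: map a JDBC INTEGER column to a Catalyst type name, widening
# unsigned columns to a 64-bit type so large values do not overflow.
def catalyst_type_for_int(signed: bool) -> str:
    return "IntegerType" if signed else "LongType"

INT32_MAX = 2**31 - 1    # largest value IntegerType can hold
UINT32_MAX = 2**32 - 1   # fits in a signed 64-bit LongType
```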
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6229 from viirya/unsignedint_as_long and squashes the following commits:
      
      dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into unsignedint_as_long
      608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD.
      4f98d7a7
    • Cheolsoo Park's avatar
      [SPARK-7850][BUILD] Hive 0.12.0 profile in POM should be removed · 6dd64587
      Cheolsoo Park authored
      I grep'ed hive-0.12.0 in the source code and removed all the profiles and doc references.
      
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits:
      
      fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile
      82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code
      f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
      6dd64587
    • Xiangrui Meng's avatar
      [SPARK-7535] [.1] [MLLIB] minor changes to the pipeline API · a9f1c0c5
      Xiangrui Meng authored
      1. removed `Params.validateParams(extra)`
      2. added `Evaluate.evaluate(dataset, paramPairs*)`
      3. updated `RegressionEvaluator` doc
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6392 from mengxr/SPARK-7535.1 and squashes the following commits:
      
      5ff5af8 [Xiangrui Meng] add unit test for CV.validateParams
      f1f8369 [Xiangrui Meng] update CV.validateParams() to test estimatorParamMaps
      607445d [Xiangrui Meng] merge master
      8716f5f [Xiangrui Meng] specify default metric name in RegressionEvaluator
      e4e5631 [Xiangrui Meng] update RegressionEvaluator doc
      801e864 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7535.1
      fcbd3e2 [Xiangrui Meng] Merge branch 'master' into SPARK-7535.1
      2192316 [Xiangrui Meng] remove validateParams(extra); add evaluate(dataset, extra*)
      a9f1c0c5
  2. May 26, 2015
    • Cheng Lian's avatar
      [SPARK-7868] [SQL] Ignores _temporary directories in HadoopFsRelation · b463e6d6
      Cheng Lian authored
      So that potential partial/corrupted data files left by failed tasks/jobs won't affect normal data scan.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6411 from liancheng/spark-7868 and squashes the following commits:
      
      273ea36 [Cheng Lian] Ignores _temporary directories
      b463e6d6
    • Josh Rosen's avatar
      [SPARK-7858] [SQL] Use output schema, not relation schema, for data source input conversion · 0c33c7b4
      Josh Rosen authored
      In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as the target schema for converting incoming rows into Catalyst rows.  However, we should be using the output schema instead, since our scan might return a subset of the relation's columns.
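A pure-Python analogy of the bug (illustrative, not Catalyst itself): when a scan prunes columns, conversion must use the schema of the rows actually returned, not the full relation schema.

```python
# Sketch: convert a scanned row using the *output* schema.
relation_schema = ["a", "b", "c"]   # full relation
output_schema = ["a", "b"]          # columns the scan returned

def convert(row, schema):
    # Converting with relation_schema would fail on the pruned
    # column "c"; the output schema matches the row exactly.
    return tuple(row[col] for col in schema)

row = {"a": 1, "b": 2}  # column "c" was pruned by the scan
```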
      
This patch incorporates #6414 by liancheng, which fixes an issue in `SimpleTextRelation` that prevented this bug from being caught by our old tests:
      
      > In `SimpleTextRelation`, we specified `needsConversion` to `true`, indicating that values produced by this testing relation should be of Scala types, and need to be converted to Catalyst types when necessary. However, we also used `Cast` to convert strings to expected data types. And `Cast` always produces values of Catalyst types, thus no conversion is done at all. This PR makes `SimpleTextRelation` produce Scala values so that data conversion code paths can be properly tested.
      
      Closes #5986.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Cheng Lian <lian@databricks.com>
      Author: Cheng Lian <liancheng@users.noreply.github.com>
      
      Closes #6400 from JoshRosen/SPARK-7858 and squashes the following commits:
      
      e71c866 [Josh Rosen] Re-fix bug so that the tests pass again
      56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites
      2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator
      6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion.
      5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858
      8ba195c [Cheng Lian] Merge 9968fba9979287aaa1f141ba18bfb9d4c116a3b3 into 61664732
      9968fba [Cheng Lian] Tests the data type conversion code paths
      0c33c7b4
    • rowan's avatar
      [SPARK-7637] [SQL] O(N) merge implementation for StructType merge · 03668348
      rowan authored
Contribution is my original work and I license the work to the project under the project's open source license.
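A hypothetical sketch of an O(N) merge using a lookup map (the `fieldsMap` name appears in the squashed commits below this entry, but this code and its merge rule are illustrative, not Spark's actual StructType implementation): indexing one side in a dict replaces the per-field linear scan that makes a naive merge O(N²).

```python
# Sketch: merge two field lists (name, datatype) in O(N) using a map.
def merge_fields(left, right):
    fields_map = {name: dt for name, dt in left}  # O(N) index
    merged = list(left)
    for name, dt in right:
        if name not in fields_map:                # O(1) lookup
            merged.append((name, dt))
    return merged
```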
      
      Author: rowan <rowan.chattaway@googlemail.com>
      
      Closes #6259 from rowan000/SPARK-7637 and squashes the following commits:
      
      c479df4 [rowan] SPARK-7637: rename mapFields to fieldsMap as per comments on github.
      8d2e419 [rowan] SPARK-7637: fix up whitespace changes
      0e9d662 [rowan] SPARK-7637: O(N) merge implementatio for StructType merge
      03668348
    • Mike Dusenberry's avatar
      [SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in MLlib... · 0463428b
      Mike Dusenberry authored
      [SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation.
      
      Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6422 from dusenberrymw/Fix_MLlib_Collab_Filtering_trainImplicit_Example and squashes the following commits:
      
      36492f4 [Mike Dusenberry] Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
      0463428b
    • Andrew Or's avatar
      [SPARK-7864] [UI] Do not kill innocent stages from visualization · 8f208242
      Andrew Or authored
      **Reproduction.** Run a long-running job, go to the job page, expand the DAG visualization, and click into a stage. Your stage is now killed. Why? This is because the visualization code just reaches into the stage table and grabs the first link it finds. In our case, this first link happens to be the kill link instead of the one to the stage page.
      
      **Fix.** Use proper CSS selectors to avoid ambiguity.
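The ambiguity can be reproduced with Python's stdlib HTML parser (the markup and class names here are hypothetical, not the actual Spark UI's): grabbing the first `<a>` in a stage row picks the kill link, while selecting by class is unambiguous.

```python
from html.parser import HTMLParser

# Sketch: find a link's href, optionally filtered by CSS class.
class LinkFinder(HTMLParser):
    def __init__(self, wanted_class=None):
        super().__init__()
        self.wanted_class = wanted_class
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and self.href is None:
            if self.wanted_class is None or a.get("class") == self.wanted_class:
                self.href = a.get("href")

def find_href(html, wanted_class=None):
    finder = LinkFinder(wanted_class)
    finder.feed(html)
    return finder.href

row = ('<td><a class="kill-link" href="/stages/stage/kill?id=1">(kill)</a>'
      '<a class="name-link" href="/stages/stage?id=1">Stage 1</a></td>')
```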
      
      This is an alternative to #6407. Thanks carsonwang for catching this.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6419 from andrewor14/fix-ui-viz-kill and squashes the following commits:
      
      25203bd [Andrew Or] Do not kill innocent stages
      8f208242
    • Xiangrui Meng's avatar
      [SPARK-7748] [MLLIB] Graduate spark.ml from alpha · 836a7589
      Xiangrui Meng authored
With decent coverage of feature transformers, algorithms, and model tuning support, it is time to graduate `spark.ml` from alpha. This PR changes all `AlphaComponent` annotations to either `DeveloperApi` or `Experimental`, depending on whether we expect a class/method to be used by end users (who use the pipeline API to assemble/tune their ML pipelines but not to create new pipeline components). `UnaryTransformer` becomes a `DeveloperApi` in this PR.
      
      jkbradley harsha2010
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6417 from mengxr/SPARK-7748 and squashes the following commits:
      
      effbccd [Xiangrui Meng] organize imports
      c15028e [Xiangrui Meng] added missing docs
      1b2e5f8 [Xiangrui Meng] update package doc
      73ca791 [Xiangrui Meng] alpha -> ex/dev for the rest
      93819db [Xiangrui Meng] alpha -> ex/dev in ml.param
      55ca073 [Xiangrui Meng] alpha -> ex/dev in ml.feature
      83572f1 [Xiangrui Meng] add Experimental and DeveloperApi tags (wip)
      836a7589
    • zsxwing's avatar
      [SPARK-6602] [CORE] Remove some places in core that calling SparkEnv.actorSystem · 9f742241
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6333 from zsxwing/remove-actor-system-usage and squashes the following commits:
      
      f125aa6 [zsxwing] Fix YarnAllocatorSuite
      ceadcf6 [zsxwing] Change the "port" parameter type of "AkkaUtils.address" to "int"; update ApplicationMaster and YarnAllocator to get the driverUrl from RpcEnv
      3239380 [zsxwing] Remove some places in core that calling SparkEnv.actorSystem
      9f742241
    • Shivaram Venkataraman's avatar
      [SPARK-3674] YARN support in Spark EC2 · 2e9a5f22
      Shivaram Venkataraman authored
This corresponds to https://github.com/mesos/spark-ec2/pull/116 in the spark-ec2 repo. The only change required in the spark_ec2.py script is to open the RM port.
      
      cc andrewor14
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6376 from shivaram/spark-ec2-yarn and squashes the following commits:
      
      961504a [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into spark-ec2-yarn
      152c94c [Shivaram Venkataraman] Open 8088 for YARN in EC2
      2e9a5f22
    • MechCoder's avatar
      [SPARK-7844] [MLLIB] Fix broken tests in KernelDensity · 61664732
      MechCoder authored
The densities in KernelDensity are scaled down by (number of parallel processes × number of points), when the scaling factor should be just the number of samples. This results in broken tests in KernelDensitySuite, which hadn't been testing the computation properly.
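The normalization issue can be seen in a plain-Python Gaussian KDE (illustrative, not MLlib's code): dividing by the number of samples n makes the estimate a proper density, whereas dividing by (number of partitions × number of points) does not.

```python
import math

# Sketch: Gaussian kernel density estimate at x, normalized by the
# sample count n (the correct factor), with bandwidth h.
def kde(x, samples, h):
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-((x - s) ** 2) / (2 * h * h)) for s in samples)
```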
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6383 from MechCoder/spark-7844 and squashes the following commits:
      
      ab81302 [MechCoder] Math->math
      9b8ed50 [MechCoder] Make one pass to update count
      a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity
      61664732
    • Zhang, Liye's avatar
      [SPARK-7854] [TEST] refine Kryo test suite · 63099122
      Zhang, Liye authored
This modification is based on JoshRosen's comments; for details, please refer to [#5934](https://github.com/apache/spark/pull/5934/files#r30949751).
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #6395 from liyezhang556520/kryoTest and squashes the following commits:
      
      da214c8 [Zhang, Liye] refine Kryo test suite accroding to Josh's comments
      63099122
    • Mike Dusenberry's avatar
      [DOCS] [MLLIB] Fixing misformatted links in v1.4 MLlib Naive Bayes... · e5a63a0e
      Mike Dusenberry authored
      [DOCS] [MLLIB] Fixing misformatted links in v1.4 MLlib Naive Bayes documentation by removing space and newline characters.
      
      A couple of links in the MLlib Naive Bayes documentation for v1.4 were broken due to the addition of either space or newline characters between the link title and link URL in the markdown doc.  (Interestingly enough, they are rendered correctly in the GitHub viewer, but not when compiled to HTML by Jekyll.)
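The fix can be sketched as a whitespace-collapsing pass over the markdown (an assumed pattern, not the actual edit, which simply removed the stray characters by hand): kramdown/Jekyll stops recognizing a link when whitespace separates `](` in `[title](url)`.

```python
import re

# Sketch: join a markdown link title and URL that were split by a
# space or newline, e.g. "[Bayes]\n(http://...)".
def fix_links(md: str) -> str:
    return re.sub(r"\]\s+\(", "](", md)
```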
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6412 from dusenberrymw/Fix_Broken_Links_In_MLlib_Naive_Bayes_Docs and squashes the following commits:
      
      91a4028 [Mike Dusenberry] Fixing misformatted links by removing space and newline characters.
      e5a63a0e
    • meawoppl's avatar
      [SPARK-7806][EC2] Fixes that allow the spark_ec2.py tool to run with Python3 · 8dbe7777
      meawoppl authored
      I have used this script to launch, destroy, start, and stop clusters successfully.
      
      Author: meawoppl <meawoppl@gmail.com>
      
      Closes #6336 from meawoppl/py3ec2spark and squashes the following commits:
      
      2e87046 [meawoppl] Py3 compat fixes.
      8dbe7777
    • linweizhong's avatar
      [SPARK-7339] [PYSPARK] PySpark shuffle spill memory sometimes are not correct · 8948ad3f
      linweizhong authored
In PySpark we get the memory used before and after a spill, then use the difference of these two values as memorySpilled. But if the before value is smaller than the after value, we get a negative value; in that scenario a value of 0 is more reasonable.
      
Below is the result in HistoryServer we have tested:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Shuffle Spill (Memory) | Shuffle Spill (Disk) | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 21 s | 0.1 s | 128.1 MB (hadoop) / 3237 | 70 ms | 10.1 MB / 2529 | 0.0 B | 5.7 MB | |
| 2 | 2 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3205 | 0.1 s | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB | |
| 1 | 1 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3271 | 68 ms | 10.1 MB / 2529 | -1048576.0 B | 5.6 MB | |
| 4 | 4 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3192 | 51 ms | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB | |
| 3 | 3 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3262 | 51 ms | 10.1 MB / 2529 | 1024.0 KB | 5.8 MB | |
| 5 | 5 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3256 | 93 ms | 10.1 MB / 2529 | -1048576.0 B | 5.7 MB | |
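The fix (per the "Use max function to get a nonnegative value" commit below) can be sketched as a one-line clamp; the function name here is illustrative:

```python
# Sketch: clamp the spill delta at zero so a measurement where
# memory used "after" exceeds "before" never reports negative bytes.
def memory_spilled(before_bytes: int, after_bytes: int) -> int:
    return max(before_bytes - after_bytes, 0)
```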
      
      /cc davies
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #5887 from Sephiroth-Lin/spark-7339 and squashes the following commits:
      
      9186c81 [linweizhong] Use max function to get a nonnegative value
      d41672b [linweizhong] Update MemoryBytesSpilled when memorySpilled > 0
      8948ad3f
    • scwf's avatar
      [CORE] [TEST] Fix SimpleDateParamTest · bf49c221
      scwf authored
      ```
      sbt.ForkMain$ForkError: 1424424077190 was not equal to 1424474477190
      	at org.scalatest.MatchersHelper$.newTestFailedException(MatchersHelper.scala:160)
      	at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6231)
      	at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6265)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply$mcV$sp(SimpleDateParamTest.scala:25)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23)
      	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
      	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
      	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      	at org.scalatest.Transformer.apply(Transformer.scala:22)
      	at org.scalatest.Transformer.apply(Transformer.scala:20)
      	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
      	at org.scalatest.Suite$class.withFixture(Suite.scala:
      ```
      
      Set timezone to fix SimpleDateParamTest
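The underlying issue can be shown in Python (illustrative values, not the Scala test itself): the epoch value of a wall-clock date string depends on the timezone it is parsed in, so a test comparing against a fixed epoch must pin the zone.

```python
from datetime import datetime

# Sketch: parse a date string with an explicit UTC offset so the
# resulting epoch millis are deterministic across machines.
def to_epoch_millis(date_str: str) -> int:
    dt = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S %z")
    return int(dt.timestamp() * 1000)
```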
      
      Author: scwf <wangfei1@huawei.com>
      Author: Fei Wang <wangfei1@huawei.com>
      
      Closes #6377 from scwf/fix-SimpleDateParamTest and squashes the following commits:
      
      b8df1e5 [Fei Wang] Update SimpleDateParamSuite.scala
      8bb74f0 [scwf] fix SimpleDateParamSuite
      bf49c221
    • Konstantin Shaposhnikov's avatar
      [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x · 43aa819c
      Konstantin Shaposhnikov authored
Both Akka 2.3.x and hadoop-2.x use protobuf 2.5, so only the hadoop-1 build needs the custom 2.3.4-spark Akka version that shades protobuf 2.5.
      
      This partially fixes SPARK-7042 (for hadoop-2.x builds)
      
      Author: Konstantin Shaposhnikov <Konstantin.Shaposhnikov@sc.com>
      
      Closes #6341 from kostya-sh/SPARK-7042 and squashes the following commits:
      
      7eb8c60 [Konstantin Shaposhnikov] [SPARK-7042][BUILD] use the standard akka artifacts with hadoop-2.x
      43aa819c
    • Reynold Xin's avatar
      [SQL][minor] Removed unused Catalyst logical plan DSL. · c9adcad8
      Reynold Xin authored
      The Catalyst DSL is no longer used as a public facing API. This pull request removes the UDF and writeToFile feature from it since they are not used in unit tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6350 from rxin/unused-logical-dsl and squashes the following commits:
      
      90b3de6 [Reynold Xin] [SQL][minor] Removed unused Catalyst logical plan DSL.
      c9adcad8
  3. May 25, 2015
    • Yin Huai's avatar
      [SPARK-7832] [Build] Always run SQL tests in master build. · f38e619c
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-7832
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6385 from yhuai/runSQLTests and squashes the following commits:
      
      3d399bc [Yin Huai] Always run SQL tests in master build.
      f38e619c
    • Calvin Jia's avatar
      [SPARK-6391][DOCS] Document Tachyon compatibility. · ce0051d6
      Calvin Jia authored
      Adds a section in the RDD persistence section of the programming-guide docs detailing Spark-Tachyon version compatibility as discussed in [[SPARK-6391]](https://issues.apache.org/jira/browse/SPARK-6391).
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #6382 from calvinjia/spark-6391 and squashes the following commits:
      
      113e863 [Calvin Jia] Move compatibility info to the offheap storage level section.
      7942dc5 [Calvin Jia] Add a section in the programming-guide docs for Tachyon compatibility.
      ce0051d6
    • Cheng Lian's avatar
      [SPARK-7842] [SQL] Makes task committing/aborting in InsertIntoHadoopFsRelation more robust · 8af1bf10
      Cheng Lian authored
      When committing/aborting a write task issued in `InsertIntoHadoopFsRelation`, if an exception is thrown from `OutputWriter.close()`, the committing/aborting process will be interrupted, and leaves messy stuff behind (e.g., the `_temporary` directory created by `FileOutputCommitter`).
      
This PR makes these two processes more robust by catching potential exceptions and falling back to normal task commit/abort.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6378 from liancheng/spark-7838 and squashes the following commits:
      
      f18253a [Cheng Lian] Makes task committing/aborting in InsertIntoHadoopFsRelation more robust
      8af1bf10
    • Cheng Lian's avatar
      [SPARK-7684] [SQL] Invoking HiveContext.newTemporaryConfiguration() shouldn't... · bfeedc69
      Cheng Lian authored
      [SPARK-7684] [SQL] Invoking HiveContext.newTemporaryConfiguration() shouldn't create new metastore directory
      
The "Database does not exist" error reported in SPARK-7684 was caused by `HiveContext.newTemporaryConfiguration()`, which always creates a new temporary metastore directory and returns a metastore configuration pointing to that directory. This makes `TestHive.reset()` always replace the old temporary metastore with an empty new one.
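The shape of the fix can be sketched as memoizing the directory (illustrative Python, not the actual Scala change; the function name is hypothetical): create the temporary metastore directory once and reuse it, rather than minting a fresh one on every call.

```python
import tempfile

# Sketch: lazily create the temp metastore dir once and cache it,
# so repeated calls do not silently reset the metastore.
_metastore_dir = None

def temporary_metastore_dir() -> str:
    global _metastore_dir
    if _metastore_dir is None:
        _metastore_dir = tempfile.mkdtemp(prefix="metastore")
    return _metastore_dir
```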
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6359 from liancheng/spark-7684 and squashes the following commits:
      
      95d2eb8 [Cheng Lian] Addresses @marmbrust's comment
      042769d [Cheng Lian] Don't create new temp directory in HiveContext.newTemporaryConfiguration()
      bfeedc69
    • tedyu's avatar
      Add test which shows Kryo buffer size configured in mb is properly supported · fd31fd49
      tedyu authored
This PR adds a test which shows that a Kryo buffer size configured in MB is supported properly.
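A simplified sketch of the suffix handling being tested (this is not Spark's actual size parser; units and the bare-number default are assumptions):

```python
# Sketch: interpret a buffer size string with a k/m/g suffix,
# returning kilobytes; a bare number is assumed to already be KB.
_UNITS = {"k": 1, "m": 1024, "g": 1024 * 1024}  # multiplier to KB

def buffer_size_kb(conf_value: str) -> int:
    v = conf_value.strip().lower()
    if v[-1] in _UNITS:
        return int(v[:-1]) * _UNITS[v[-1]]
    return int(v)
```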
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #6390 from tedyu/master and squashes the following commits:
      
      c51ea64 [tedyu] Fix KryoSerializer creation
      f12ee04 [tedyu] Correct conf variable name in test
      642de51 [tedyu] Drop change in KryoSerializer so that the new test runs
      d2fdbc4 [tedyu] Give bufferSizeKb initial value
      9a17277 [tedyu] Rewrite bufferSize checking
      4739998 [tedyu] Rewrite bufferSize checking
      830d0d0 [tedyu] Kryo buffer size configured in mb should be properly supported
      fd31fd49
    • tedyu's avatar
      Close HBaseAdmin at the end of HBaseTest · 23bea97d
      tedyu authored
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #6381 from ted-yu/master and squashes the following commits:
      
      e2f0ea1 [tedyu] Close HBaseAdmin at the end of HBaseTest
      23bea97d
  4. May 24, 2015
  5. May 23, 2015
    • Shivaram Venkataraman's avatar
      [HOTFIX] Copy SparkR lib if it exists in make-distribution · b231baa2
      Shivaram Venkataraman authored
      This is to fix an issue reported in #6373 where the `cp` would fail if `-Psparkr` was not used in the build
      
      cc dragos pwendell
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6379 from shivaram/make-distribution-hotfix and squashes the following commits:
      
      08eb7e4 [Shivaram Venkataraman] Copy SparkR lib if it exists in make-distribution
      b231baa2
    • Yin Huai's avatar
      [SPARK-7654] [SQL] Move insertInto into reader/writer interface. · 2b7e6358
      Yin Huai authored
      This one continues the work of https://github.com/apache/spark/pull/6216.
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6366 from yhuai/insert and squashes the following commits:
      
      3d717fb [Yin Huai] Use insertInto to handle the casue when table exists and Append is used for saveAsTable.
      56d2540 [Yin Huai] Add PreWriteCheck to HiveContext's analyzer.
      c636e35 [Yin Huai] Remove unnecessary empty lines.
      cf83837 [Yin Huai] Move insertInto to write. Also, remove the partition columns from InsertIntoHadoopFsRelation.
      0841a54 [Reynold Xin] Removed experimental tag for deprecated methods.
      33ed8ef [Reynold Xin] [SPARK-7654][SQL] Move insertInto into reader/writer interface.
      2b7e6358
    • Davies Liu's avatar
      Fix install jira-python · a4df0f2d
      Davies Liu authored
The jira-python package should be installed with:
      
        sudo pip install jira
      
      cc pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6367 from davies/fix_jira_python2 and squashes the following commits:
      
      fbb3c8e [Davies Liu] Fix install jira-python
      a4df0f2d
    • Davies Liu's avatar
      [SPARK-7840] add insertInto() to Writer · be47af1b
      Davies Liu authored
      Add tests later.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6375 from davies/insertInto and squashes the following commits:
      
      826423e [Davies Liu] add insertInto() to Writer
      be47af1b
    • Davies Liu's avatar
      [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates · efe3bfdf
      Davies Liu authored
      1. ntile should take an integer as parameter.
      2. Added Python API (based on #6364)
      3. Update documentation of various DataFrame Python functions.
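Point 1 can be sketched with plain-Python NTILE semantics (illustrative, not Spark's window implementation): the argument must be a positive integer, and rows split into n buckets whose sizes differ by at most one, with earlier buckets getting the extra rows.

```python
# Sketch: assign SQL NTILE(n) bucket numbers to num_rows ordered rows.
def ntile(n: int, num_rows: int):
    if not isinstance(n, int) or n <= 0:
        raise TypeError("ntile takes a positive integer")
    base, extra = divmod(num_rows, n)
    out = []
    for bucket in range(1, n + 1):
        out.extend([bucket] * (base + (1 if bucket <= extra else 0)))
    return out
```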
      
      Author: Davies Liu <davies@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6374 from rxin/window-final and squashes the following commits:
      
      69004c7 [Reynold Xin] Style fix.
      288cea9 [Reynold Xin] Update documentaiton.
      7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
      66092b4 [Davies Liu] update docs
      ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
      ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
      8936ade [Davies Liu] fix maxint in python 3
      2649358 [Davies Liu] update docs
      778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
      efe3bfdf
    • zsxwing's avatar
      [SPARK-7777][Streaming] Handle the case when there is no block in a batch · ad0badba
      zsxwing authored
      In the old implementation, if a batch has no block, `areWALRecordHandlesPresent` will be `true` and it will return `WriteAheadLogBackedBlockRDD`.
      
      This PR handles this case by returning `WriteAheadLogBackedBlockRDD` or `BlockRDD` according to the configuration.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6372 from zsxwing/SPARK-7777 and squashes the following commits:
      
      788f895 [zsxwing] Handle the case when there is no block in a batch
      ad0badba
    • Shivaram Venkataraman's avatar
      [SPARK-6811] Copy SparkR lib in make-distribution.sh · a40bca01
      Shivaram Venkataraman authored
This change also removes native libraries from SparkR to make sure our distribution works across platforms.
      
      Tested by building on Mac, running on Amazon Linux (CentOS), Windows VM and vice-versa (built on Linux run on Mac)
      
      I will also test this with YARN soon and update this PR.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6373 from shivaram/sparkr-binary and squashes the following commits:
      
      ae41b5c [Shivaram Venkataraman] Remove native libraries from SparkR Also include the built SparkR package in make-distribution.sh
      a40bca01
    • Davies Liu's avatar
      [SPARK-6806] [SPARKR] [DOCS] Fill in SparkR examples in programming guide · 7af3818c
      Davies Liu authored
      sqlCtx -> sqlContext
      
      You can check the docs by:
      
      ```
      $ cd docs
      $ SKIP_SCALADOC=1 jekyll serve
      ```
      cc shivaram
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5442 from davies/r_docs and squashes the following commits:
      
      7a12ec6 [Davies Liu] remove rdd in R docs
      8496b26 [Davies Liu] remove the docs related to RDD
      e23b9d6 [Davies Liu] delete R docs for RDD API
      222e4ff [Davies Liu] Merge branch 'master' into r_docs
      89684ce [Davies Liu] Merge branch 'r_docs' of github.com:davies/spark into r_docs
      f0a10e1 [Davies Liu] address comments from @shivaram
      f61de71 [Davies Liu] Update pairRDD.R
      3ef7cf3 [Davies Liu] use + instead of function(a,b) a+b
      2f10a77 [Davies Liu] address comments from @cafreeman
      9c2a062 [Davies Liu] mention R api together with Python API
      23f751a [Davies Liu] Fill in SparkR examples in programming guide
      7af3818c
    • GenTang's avatar
      [SPARK-5090] [EXAMPLES] The improvement of python converter for hbase · 4583cf4b
      GenTang authored
      Hi,
      
Following the discussion in http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-td10001.html, I made some modifications to three files in the examples package:
1. HBaseConverters.scala: the new converter converts all the records in an HBase result into a single string
2. hbase_input.py: as the value string may contain several records, we can use the ast package to convert the string into a dict
3. HBaseTest.scala: as the examples package uses HBase 0.98.7, the original HTableDescriptor constructor is deprecated, so it has been updated to the new constructor
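The round trip described in points 1 and 2 can be sketched in pure Python (illustrative; function names and cell layout are hypothetical): the converter emits all cells of one row as a single string, and the Python side turns that string back into a dict with `ast`.

```python
import ast
import json

# Sketch: serialize a row's cells to one string, then recover the dict.
def to_single_string(cells: dict) -> str:
    return json.dumps(cells)

def parse_value(value: str) -> dict:
    # ast.literal_eval safely parses the dict-like literal (works here
    # because the cells contain only strings, no true/false/null).
    return ast.literal_eval(value)
```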
      
      Author: GenTang <gen.tang86@gmail.com>
      
      Closes #3920 from GenTang/master and squashes the following commits:
      
      d2153df [GenTang] import JSONObject precisely
      4802481 [GenTang] dump the result into a singl String
      62df7f0 [GenTang] remove the comment
      21de653 [GenTang] return the string in json format
      15b1fe3 [GenTang] the modification of comments
      5cbbcfc [GenTang] the improvement of pythonconverter
      ceb31c5 [GenTang] the modification for adapting updation of hbase
      3253b61 [GenTang] the modification accompanying the improvement of pythonconverter
      4583cf4b