  1. Sep 19, 2014
    • [Docs] Fix outdated docs for standalone cluster · 8af23706
      andrewor14 authored
      This is now supported!
      
      Author: andrewor14 <andrewor14@gmail.com>
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2461 from andrewor14/document-standalone-cluster and squashes the following commits:
      
      85c8b9e [andrewor14] Wording change per Patrick
      35e30ee [Andrew Or] Fix outdated docs for standalone cluster
      8af23706
    • [Build] Fix passing of args to sbt · 99b06b6f
      Nicholas Chammas authored
      Simple mistake, simple fix:
      ```shell
      args="arg1 arg2 arg3"
      
      sbt $args    # sbt sees 3 arguments
      sbt "$args"  # sbt sees 1 argument
      ```
      
      Should fix the problems we are seeing [here](https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/694/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console), for example.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2462 from nchammas/fix-sbt-master-build and squashes the following commits:
      
      4500c86 [Nicholas Chammas] warn about quoting
      10018a6 [Nicholas Chammas] Revert "test hadoop1 build"
      7d5356c [Nicholas Chammas] Revert "re-add bad quoting for testing"
      061600c [Nicholas Chammas] re-add bad quoting for testing
      b2de56c [Nicholas Chammas] test hadoop1 build
      43fb854 [Nicholas Chammas] unquote profile args
      99b06b6f
    • [SPARK-3485][SQL] Use GenericUDFUtils.ConversionHelper for Simple UDF type conversions · ba68a51c
      Daoyuan Wang authored
      This is just another solution to SPARK-3485, in addition to PR #2355.
      In this patch, we use ConversionHelper and FunctionRegistry to invoke simple UDF evaluation, which relies more on Hive but is much cleaner and safer.
      We can discuss which one is better.
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #2407 from adrian-wang/simpleudf and squashes the following commits:
      
      15762d2 [Daoyuan Wang] add posmod test which would fail the test but now ok
      0d69eb4 [Daoyuan Wang] another way to pass to hive simple udf
      ba68a51c
    • SPARK-3605. Fix typo in SchemaRDD. · 3b9cd13e
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #2460 from sryza/sandy-spark-3605 and squashes the following commits:
      
      09d940b [Sandy Ryza] SPARK-3605. Fix typo in SchemaRDD.
      3b9cd13e
    • [SPARK-3592] [SQL] [PySpark] support applySchema to RDD of Row · a95ad99e
      Davies Liu authored
      Fix the issue when applySchema() is applied to an RDD of Row.
      
      Also add type mapping for BinaryType.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2448 from davies/row and squashes the following commits:
      
      dd220cf [Davies Liu] fix test
      3f3f188 [Davies Liu] add more test
      f559746 [Davies Liu] add tests, fix serialization
      9688fd2 [Davies Liu] support applySchema to RDD of Row
      a95ad99e
    • [SPARK-2594][SQL] Support CACHE TABLE <name> AS SELECT ... · 5522151e
      ravipesala authored
      This feature allows the user to cache the table produced by a SELECT query.
      Example: ```CACHE TABLE testCacheTable AS SELECT * FROM TEST_TABLE```
      Spark treats this type of SQL as a command and caches the table lazily, just as ```SQLContext.cacheTable``` and ```CACHE TABLE <name>``` do.
      It can be executed from both SQLContext and HiveContext.
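      
      For illustration, a minimal sketch of issuing the new statement from a HiveContext (not part of the original patch; the table names and setup are assumptions):
      
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.hive.HiveContext
      
      object CacheTableAsSelectExample {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("cache-as-select").setMaster("local[*]"))
          val hiveContext = new HiveContext(sc)
      
          // Cache the result of a SELECT under a new table name. Caching is lazy,
          // so the data is materialized on first use, as with SQLContext.cacheTable.
          hiveContext.sql("CACHE TABLE testCacheTable AS SELECT * FROM TEST_TABLE")
      
          // Subsequent queries against testCacheTable read the cached data.
          hiveContext.sql("SELECT count(*) FROM testCacheTable").collect().foreach(println)
      
          sc.stop()
        }
      }
      ```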
      
      Recreated the pull request after rebasing with master, and fixed all the comments raised in the previous pull requests:
      https://github.com/apache/spark/pull/2381
      https://github.com/apache/spark/pull/2390
      
      Author: ravipesala <ravindra.pesala@huawei.com>
      
      Author: ravipesala <ravindra.pesala@huawei.com>
      
      Closes #2397 from ravipesala/SPARK-2594 and squashes the following commits:
      
      a5f0beb [ravipesala] Simplified the code as per Admin comment.
      8059cd2 [ravipesala] Changed the behaviour from eager caching to lazy caching.
      d6e469d [ravipesala] Code review comments by Admin are handled.
      c18aa38 [ravipesala] Merge remote-tracking branch 'remotes/ravipesala/Add-Cache-table-as' into SPARK-2594
      394d5ca [ravipesala] Changed style
      fb1759b [ravipesala] Updated as per Admin comments
      8c9993c [ravipesala] Changed the style
      d8b37b2 [ravipesala] Updated as per the comments by Admin
      bc0bffc [ravipesala] Merge remote-tracking branch 'ravipesala/Add-Cache-table-as' into Add-Cache-table-as
      e3265d0 [ravipesala] Updated the code as per the comments by Admin in pull request.
      724b9db [ravipesala] Changed style
      aaf5b59 [ravipesala] Added comment
      dc33895 [ravipesala] Updated parser to support add cache table command
      b5276b2 [ravipesala] Updated parser to support add cache table command
      eebc0c1 [ravipesala] Add CACHE TABLE <name> AS SELECT ...
      6758f80 [ravipesala] Changed style
      7459ce3 [ravipesala] Added comment
      13c8e27 [ravipesala] Updated parser to support add cache table command
      4e858d8 [ravipesala] Updated parser to support add cache table command
      b803fc8 [ravipesala] Add CACHE TABLE <name> AS SELECT ...
      5522151e
    • [SPARK-3501] [SQL] Fix the bug of Hive SimpleUDF creates unnecessary type cast · 2c3cc764
      Cheng Hao authored
      When running a query like:
      ```
      select datediff(cast(value as timestamp), cast('2002-03-21 00:00:00' as timestamp)) from src;
      ```
      Spark SQL raises an exception:
      ```
      [info] scala.MatchError: TimestampType (of class org.apache.spark.sql.catalyst.types.TimestampType$)
      [info] at org.apache.spark.sql.catalyst.expressions.Cast.castToTimestamp(Cast.scala:77)
      [info] at org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:251)
      [info] at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
      [info] at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
      [info] at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$5$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:217)
      [info] at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$5$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:210)
      [info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
      [info] at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$2.apply(TreeNode.scala:180)
      [info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      [info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      ```
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2368 from chenghao-intel/cast_exception and squashes the following commits:
      
      5c9c3a5 [Cheng Hao] make more clear code
      49dfc50 [Cheng Hao] Add no-op for Cast and revert the position of SimplifyCasts
      b804abd [Cheng Hao] Add unit test to show the failure in identical data type casting
      330a5c8 [Cheng Hao] Update Code based on comments
      b834ed4 [Cheng Hao] Fix bug of HiveSimpleUDF with unnecessary type cast which cause exception in constant folding
      2c3cc764
    • [SPARK-3491] [MLlib] [PySpark] use pickle to serialize data in MLlib · fce5e251
      Davies Liu authored
      Currently, we serialize the data between the JVM and Python case by case, manually; this cannot scale to support the many APIs in MLlib.
      
      This patch tries to address the problem by serializing the data using the pickle protocol, with the Pyrolite library handling serialization/deserialization in the JVM. The pickle protocol can be easily extended to support customized classes.
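      
      For illustration, a minimal sketch of the round trip this enables (not part of the patch; assumes Pyrolite's `net.razorvine.pickle` Pickler/Unpickler API):
      
      ```scala
      import net.razorvine.pickle.{Pickler, Unpickler}
      
      object PickleRoundTrip {
        def main(args: Array[String]): Unit = {
          // Serialize a JVM object into pickle bytes that Python can load directly.
          val pickler = new Pickler()
          val bytes: Array[Byte] = pickler.dumps(Array(1.0, 2.0, 3.0))
      
          // Deserialize pickle bytes (produced on either side) back into a JVM object.
          val unpickler = new Unpickler()
          println(unpickler.loads(bytes))
        }
      }
      ```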
      
      All the modules are refactored to use this protocol.
      
      Known issues: there will be some performance regression (both CPU and memory; the serialized data is larger).
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2378 from davies/pickle_mllib and squashes the following commits:
      
      dffbba2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into pickle_mllib
      810f97f [Davies Liu] fix equal of matrix
      032cd62 [Davies Liu] add more type check and conversion for user_product
      bd738ab [Davies Liu] address comments
      e431377 [Davies Liu] fix cache of rdd, refactor
      19d0967 [Davies Liu] refactor Picklers
      2511e76 [Davies Liu] cleanup
      1fccf1a [Davies Liu] address comments
      a2cc855 [Davies Liu] fix tests
      9ceff73 [Davies Liu] test size of serialized Rating
      44e0551 [Davies Liu] fix cache
      a379a81 [Davies Liu] fix pickle array in python2.7
      df625c7 [Davies Liu] Merge commit '154d141' into pickle_mllib
      154d141 [Davies Liu] fix autobatchedpickler
      44736d7 [Davies Liu] speed up pickling array in Python 2.7
      e1d1bfc [Davies Liu] refactor
      708dc02 [Davies Liu] fix tests
      9dcfb63 [Davies Liu] fix style
      88034f0 [Davies Liu] rafactor, address comments
      46a501e [Davies Liu] choose batch size automatically
      df19464 [Davies Liu] memorize the module and class name during pickleing
      f3506c5 [Davies Liu] Merge branch 'master' into pickle_mllib
      722dd96 [Davies Liu] cleanup _common.py
      0ee1525 [Davies Liu] remove outdated tests
      b02e34f [Davies Liu] remove _common.py
      84c721d [Davies Liu] Merge branch 'master' into pickle_mllib
      4d7963e [Davies Liu] remove muanlly serialization
      6d26b03 [Davies Liu] fix tests
      c383544 [Davies Liu] classification
      f2a0856 [Davies Liu] mllib/regression
      d9f691f [Davies Liu] mllib/util
      cccb8b1 [Davies Liu] mllib/tree
      8fe166a [Davies Liu] Merge branch 'pickle' into pickle_mllib
      aa2287e [Davies Liu] random
      f1544c4 [Davies Liu] refactor clustering
      52d1350 [Davies Liu] use new protocol in mllib/stat
      b30ef35 [Davies Liu] use pickle to serialize data for mllib/recommendation
      f44f771 [Davies Liu] enable tests about array
      3908f5c [Davies Liu] Merge branch 'master' into pickle
      c77c87b [Davies Liu] cleanup debugging code
      60e4e2f [Davies Liu] support unpickle array.array for Python 2.6
      fce5e251
    • [SPARK-1701] [PySpark] remove slice terminology from python examples · a03e5b81
      Matthew Farrellee authored
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #2304 from mattf/SPARK-1701-partition-over-slice-for-python-examples and squashes the following commits:
      
      928a581 [Matthew Farrellee] [SPARK-1701] [PySpark] remove slice terminology from python examples
      a03e5b81
    • [SPARK-1701] Clarify slice vs partition in the programming guide · be0c7563
      Matthew Farrellee authored
      This is a partial solution to SPARK-1701, only addressing the
      documentation confusion.
      
      Additional work would be to actually change the numSlices parameter name
      across languages, with care required for Scala and Python to maintain
      backward compatibility for named parameters.
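      
      For context, a minimal sketch of the parameter in question (illustrative only; `numSlices` is the existing name in the Scala API):
      
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      
      object SliceVsPartition {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("slices").setMaster("local[*]"))
      
          // "Slices" and "partitions" are the same concept: numSlices determines
          // how many partitions the resulting RDD has.
          val rdd = sc.parallelize(1 to 100, numSlices = 4)
          println(rdd.partitions.length)  // 4
      
          sc.stop()
        }
      }
      ```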
      
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #2305 from mattf/SPARK-1701 and squashes the following commits:
      
      c0af05d [Matthew Farrellee] Further tweak
      06f80fc [Matthew Farrellee] Wording tweak from Josh Rosen's review
      7b045e0 [Matthew Farrellee] [SPARK-1701] Clarify slice vs partition in the programming guide
      be0c7563
    • MAINTENANCE: Automated closing of pull requests. · a48956f5
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #726 (close requested by 'pwendell')
      Closes #151 (close requested by 'pwendell')
      a48956f5
    • [SPARK-2062][GraphX] VertexRDD.apply does not use the mergeFunc · 3bbbdd81
      Larry Xiao authored
      VertexRDD.apply had a bug where it ignored the merge function for
      duplicate vertices and instead used whichever vertex attribute occurred
      first. This commit fixes the bug by passing the merge function through
      to ShippableVertexPartition.apply, which merges any duplicates using the
      merge function and then fills in missing vertices using the specified
      default vertex attribute. This commit also adds a unit test for
      VertexRDD.apply.
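      
      A minimal sketch of the corrected behavior (illustrative only; assumes the VertexRDD.apply overload that takes a merge function):
      
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.graphx.{Edge, Graph, VertexRDD}
      
      object VertexRddMergeFunc {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("mergeFunc").setMaster("local[*]"))
      
          // Vertex 1L appears twice; the merge function decides how to combine the duplicates.
          val vertices = sc.parallelize(Seq((1L, 10), (1L, 32), (2L, 5)))
          val edges = Graph.fromEdges(sc.parallelize(Seq(Edge(1L, 2L, 0))), defaultValue = 0).edges
      
          // With the fix, duplicates are combined with the supplied function (here, addition),
          // so vertex 1L ends up with attribute 42 rather than whichever value came first.
          val vertexRDD = VertexRDD(vertices, edges, 0, (a: Int, b: Int) => a + b)
          vertexRDD.collect().foreach(println)
      
          sc.stop()
        }
      }
      ```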
      
      Author: Larry Xiao <xiaodi@sjtu.edu.cn>
      Author: Blie Arkansol <xiaodi@sjtu.edu.cn>
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #1903 from larryxiao/2062 and squashes the following commits:
      
      625aa9d [Blie Arkansol] Merge pull request #1 from ankurdave/SPARK-2062
      476770b [Ankur Dave] ShippableVertexPartition.initFrom: Don't run mergeFunc on default values
      614059f [Larry Xiao] doc update: note about the default null value vertices construction
      dfdb3c9 [Larry Xiao] minor fix
      1c70366 [Larry Xiao] scalastyle check: wrap line, parameter list indent 4 spaces
      e4ca697 [Larry Xiao] [TEST] VertexRDD.apply mergeFunc
      6a35ea8 [Larry Xiao] [TEST] VertexRDD.apply mergeFunc
      4fbc29c [Blie Arkansol] undo unnecessary change
      efae765 [Larry Xiao] fix mistakes: should be able to call with or without mergeFunc
      b2422f9 [Larry Xiao] Merge branch '2062' of github.com:larryxiao/spark into 2062
      52dc7f7 [Larry Xiao] pass mergeFunc to VertexPartitionBase, where merge is handled
      581e9ee [Larry Xiao] TODO: VertexRDDSuite
      20d80a3 [Larry Xiao] [SPARK-2062][GraphX] VertexRDD.apply does not use the mergeFunc
      3bbbdd81
    • [SPARK-3418] Sparse Matrix support (CCS) and additional native BLAS operations added · e76ef5cb
      Burak authored
      Local `SparseMatrix` support added in Compressed Column Storage (CCS) format in addition to Level-2 and Level-3 BLAS operations such as dgemv and dgemm respectively.
      
      BLAS doesn't support sparse matrix operations, so support for `SparseMatrix`-`DenseMatrix` and `SparseMatrix`-`DenseVector` multiplication has been added. I will post performance comparisons in the comments momentarily.
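      
      For illustration, a minimal sketch of a CCS-format local matrix and a sparse-times-dense multiply (assumes the `SparseMatrix` constructor and `multiply` method introduced by this patch):
      
      ```scala
      import org.apache.spark.mllib.linalg.{DenseVector, SparseMatrix}
      
      object CcsExample {
        def main(args: Array[String]): Unit = {
          // A 3 x 2 matrix stored column-by-column (Compressed Column Storage):
          //   column 0 holds 1.0 (row 0) and 2.0 (row 2); column 1 holds 3.0 (row 1).
          // colPtrs marks where each column's entries start in rowIndices/values.
          val sm = new SparseMatrix(
            3, 2,
            Array(0, 2, 3),        // colPtrs
            Array(0, 2, 1),        // rowIndices
            Array(1.0, 2.0, 3.0))  // values
      
          // Sparse-matrix-times-dense-vector multiplication (one of the added operations).
          val result = sm.multiply(new DenseVector(Array(2.0, 1.0)))
          println(result)  // expected: [2.0, 3.0, 4.0]
        }
      }
      ```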
      
      Author: Burak <brkyvz@gmail.com>
      
      Closes #2294 from brkyvz/SPARK-3418 and squashes the following commits:
      
      88814ed [Burak] Hopefully fixed MiMa this time
      47e49d5 [Burak] really fixed MiMa issue
      f0bae57 [Burak] [SPARK-3418] Fixed MiMa compatibility issues (excluded from check)
      4b7dbec [Burak] 9/17 comments addressed
      7af2f83 [Burak] sealed traits Vector and Matrix
      d3a8a16 [Burak] [SPARK-3418] Squashed missing alpha bug.
      421045f [Burak] [SPARK-3418] New code review comments addressed
      f35a161 [Burak] [SPARK-3418] Code review comments addressed and multiplication further optimized
      2508577 [Burak] [SPARK-3418] Fixed one more style issue
      d16e8a0 [Burak] [SPARK-3418] Fixed style issues and added documentation for methods
      204a3f7 [Burak] [SPARK-3418] Fixed failing Matrix unit test
      6025297 [Burak] [SPARK-3418] Fixed Scala-style errors
      dc7be71 [Burak] [SPARK-3418][MLlib] Matrix unit tests expanded with indexing and updating
      d2d5851 [Burak] [SPARK-3418][MLlib] Sparse Matrix support and additional native BLAS operations added
      e76ef5cb
  2. Sep 17, 2014
    • [SPARK-3565]Fix configuration item not consistent with document · 3f169bfe
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-3565
      
      "spark.ports.maxRetries" should be "spark.port.maxRetries". Make the configuration keys in document and code consistent.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #2427 from WangTaoTheTonic/fixPortRetries and squashes the following commits:
      
      c178813 [WangTaoTheTonic] Use blank lines trigger Jenkins
      646f3fe [WangTaoTheTonic] also in SparkBuild.scala
      3700dba [WangTaoTheTonic] Fix configuration item not consistent with document
      3f169bfe
    • [SPARK-3567] appId field in SparkDeploySchedulerBackend should be volatile · 1147973f
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2428 from sarutak/appid-volatile-modification and squashes the following commits:
      
      c7d890d [Kousuke Saruta] Added volatile modifier to appId field in SparkDeploySchedulerBackend
      1147973f
    • [SPARK-3564][WebUI] Display App ID on HistoryPage · 6688a266
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2424 from sarutak/display-appid-on-webui and squashes the following commits:
      
      417fe90 [Kousuke Saruta] Added "App ID column" to HistoryPage
      6688a266
    • [SPARK-3571] Spark standalone cluster mode doesn't work. · cbc06503
      Kousuke Saruta authored
      I think this issue is caused by #1106.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2436 from sarutak/SPARK-3571 and squashes the following commits:
      
      7a4deea [Kousuke Saruta] Modified Master.scala to use numWorkersVisited and numWorkersAlive instead of stopPos
      4e51e35 [Kousuke Saruta] Modified Master to prevent from 0 divide
      4817ecd [Kousuke Saruta] Brushed up previous change
      71e84b6 [Kousuke Saruta] Modified Master to enable schedule normally
      cbc06503
    • [SPARK-3534] Fix expansion of testing arguments to sbt · 7fc3bb7c
      Nicholas Chammas authored
      Testing arguments to `sbt` need to be passed as an array, not a single, long string.
      
      Fixes a bug introduced in #2420.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2437 from nchammas/selective-testing and squashes the following commits:
      
      a9f9c1c [Nicholas Chammas] fix printing of sbt test arguments
      cf57cbf [Nicholas Chammas] fix sbt test arguments
      e33b978 [Nicholas Chammas] Merge pull request #2 from apache/master
      0b47ca4 [Nicholas Chammas] Merge branch 'master' of github.com:nchammas/spark
      8051486 [Nicholas Chammas] Merge pull request #1 from apache/master
      03180a4 [Nicholas Chammas] Merge branch 'master' of github.com:nchammas/spark
      d4c5f43 [Nicholas Chammas] Merge pull request #6 from apache/master
      7fc3bb7c
    • Docs: move HA subsections to a deeper indentation level · b3830b28
      Andrew Ash authored
      Makes the table of contents read better.
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #2402 from ash211/docs/better-indentation and squashes the following commits:
      
      ea0e130 [Andrew Ash] Move HA subsections to a deeper indentation level
      b3830b28
    • [SPARK-1455] [SPARK-3534] [Build] When possible, run SQL tests only. · 5044e495
      Nicholas Chammas authored
      If the only files changed are related to SQL, then only run the SQL tests.
      
      This patch includes some cosmetic/maintainability refactoring. I would be more than happy to undo some of these changes if they are inappropriate.
      
      We can accept this patch mostly as-is and address the immediate need documented in [SPARK-3534](https://issues.apache.org/jira/browse/SPARK-3534), or we can keep it open until a satisfactory solution along the lines [discussed here](https://issues.apache.org/jira/browse/SPARK-1455?focusedCommentId=14136424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14136424) is reached.
      
      Note: I had to hack this patch up to test it locally, so what I'm submitting here and what I tested are technically different.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2420 from nchammas/selective-testing and squashes the following commits:
      
      db3fa2d [Nicholas Chammas] diff against master!
      f9e23f6 [Nicholas Chammas] when possible, run SQL tests only
      5044e495
    • [SQL][DOCS] Improve table caching section · cbf983bb
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2434 from marmbrus/patch-1 and squashes the following commits:
      
      67215be [Michael Armbrust] [SQL][DOCS] Improve table caching section
      cbf983bb
    • [Docs] minor grammar fix · 8fbd5f4a
      Nicholas Chammas authored
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2430 from nchammas/patch-2 and squashes the following commits:
      
      d476bfb [Nicholas Chammas] [Docs] minor grammar fix
      8fbd5f4a
    • SPARK-3177 (on Master Branch) · 7d1a3723
      chesterxgchen authored
      The JIRA and PR were originally created for branch-1.1 and have now been moved to the master branch.
      Chester
      
      The issue is that yarn-alpha and yarn have different APIs for certain class fields. In this particular case, ClientBase uses reflection to address this, so we need a different way to test ClientBase's method. The original ClientBaseSuite used getFieldValue() to do this, but that doesn't work for yarn-alpha, because the API returns an array of String instead of just a String (which is what the Yarn-stable API returns).
      
      To fix the test, I added a new method:
      
        def getFieldValue2[A: ClassTag, A1: ClassTag, B](
            clazz: Class[_], field: String, defaults: => B)
            (mapTo: A => B)(mapTo1: A1 => B): B =
          Try(clazz.getField(field)).map(_.get(null)).map {
            case v: A => mapTo(v)
            case v1: A1 => mapTo1(v1)
            case _ => defaults
          }.toOption.getOrElse(defaults)
      
      to handle the cases where the field type can be either A or A1. In this new method, the type A or A1 is pattern matched and the corresponding mapping function (mapTo or mapTo1) is applied.
      
      Author: chesterxgchen <chester@alpinenow.com>
      
      Closes #2204 from chesterxgchen/SPARK-3177-master and squashes the following commits:
      
      e72a6ea [chesterxgchen]  The Issue is due to that yarn-alpha and yarn have different APIs for certain class fields. In this particular case,  the ClientBase using reflection to to address this issue, and we need to different way to test the ClientBase's method.  Original ClientBaseSuite using getFieldValue() method to do this. But it doesn't work for yarn-alpha as the API returns an array of String instead of just String (which is the case for Yarn-stable API).
      7d1a3723
    • [Docs] Correct spark.files.fetchTimeout default value · 983609a4
      viper-kun authored
      Change the documented default value of spark.files.fetchTimeout.
      
      Author: viper-kun <xukun.xu@huawei.com>
      
      Closes #2406 from viper-kun/master and squashes the following commits:
      
      ecb0d46 [viper-kun] [Docs] Correct spark.files.fetchTimeout default value
      7cf4c7a [viper-kun] Update configuration.md
      983609a4
  3. Sep 16, 2014
    • [Minor]ignore all config files in conf · 008a5ed4
      wangfei authored
      Some config files in ```conf``` should be ignored, such as:
              conf/fairscheduler.xml
              conf/hive-log4j.properties
              conf/metrics.properties
      ...
      So ignore all ```sh```/```properties```/```conf```/```xml``` files in ```conf```.
      
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #2395 from scwf/patch-2 and squashes the following commits:
      
      3dc53f2 [wangfei] duplicate ```conf/*.conf```
      3c2986f [wangfei] ignore all config files
      008a5ed4
    • [SPARK-3555] Fix UISuite race condition · 0a7091e6
      Andrew Or authored
      The test "jetty selects different port under contention" is flaky.
      
      If another process binds to 4040 before the test starts, the first server we start there will fail, and subsequent servers may still successfully bind to 4040 if it is released in the meantime. Instead, we should just let Java find a random free port for us and hold onto it for the duration of the test.
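      
      A minimal sketch of that approach (illustrative only, not the actual test code):
      
      ```scala
      import java.net.ServerSocket
      
      object ReserveFreePort {
        def main(args: Array[String]): Unit = {
          // Binding to port 0 lets the OS pick any free port; keeping the socket
          // open holds on to that port for the duration of the test.
          val socket = new ServerSocket(0)
          try {
            val port = socket.getLocalPort
            println(s"Reserved free port: $port")
            // ... start the servers under test against this port ...
          } finally {
            socket.close()
          }
        }
      }
      ```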
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2418 from andrewor14/fix-port-contention and squashes the following commits:
      
      0cd4974 [Andrew Or] Stop them servers
      a7071fe [Andrew Or] Pick random port instead of 4040
      0a7091e6
    • Add a Community Projects page · a6e1712f
      Evan Chan authored
      This adds a new page to the docs listing community projects -- those created outside of Apache Spark that are of interest to the community of Spark users. Anybody can add to it just by submitting a PR.
      
      There was a discussion thread about alternatives:
      * Creating a GitHub organization for Spark projects - we could not find any sponsors for this, and it would be difficult to organize since many folks just create repos in their company organization or personal accounts.
      * Apache has some place for storing community projects, but it was deemed difficult to work with, and again there would be some permissions issues -- not everyone could update it.
      
      Author: Evan Chan <velvia@gmail.com>
      
      Closes #2219 from velvia/community-projects-page and squashes the following commits:
      
      7316822 [Evan Chan] Point to Spark wiki: supplemental projects page
      613b021 [Evan Chan] Add a few more projects
      a85eaaf [Evan Chan] Add a Community Projects page
      a6e1712f
    • [SPARK-787] Add S3 configuration parameters to the EC2 deploy scripts · b2017126
      Dan Osipov authored
      When deploying to AWS, additional configuration is required to read S3 files. EMR creates it automatically, and there is no reason the Spark EC2 script shouldn't do the same.
      
      This PR requires a corresponding PR to the mesos/spark-ec2 to be merged, as it gets cloned in the process of setting up machines: https://github.com/mesos/spark-ec2/pull/58
      
      Author: Dan Osipov <daniil.osipov@shazam.com>
      
      Closes #1120 from danosipov/s3_credentials and squashes the following commits:
      
      758da8b [Dan Osipov] Modify documentation to include the new parameter
      71fab14 [Dan Osipov] Use a parameter --copy-aws-credentials to enable S3 credential deployment
      7e0da26 [Dan Osipov] Get AWS credentials out of boto connection instance
      39bdf30 [Dan Osipov] Add S3 configuration parameters to the EC2 deploy scripts
      b2017126
    • [SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx · ec1adecb
      Davies Liu authored
      Using Sphinx to generate API docs for PySpark.
      
      requirement: Sphinx
      
      ```
      $ cd python/docs/
      $ make html
      ```
      
      The generated API docs will be located at python/docs/_build/html/index.html
      
      It can co-exist with the docs generated by Epydoc.
      
      This is a first working version; after it is merged in, we can continue to improve it and eventually replace Epydoc.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2292 from davies/sphinx and squashes the following commits:
      
      425a3b1 [Davies Liu] cleanup
      1573298 [Davies Liu] move docs to python/docs/
      5fe3903 [Davies Liu] Merge branch 'master' into sphinx
      9468ab0 [Davies Liu] fix makefile
      b408f38 [Davies Liu] address all comments
      e2ccb1b [Davies Liu] update name and version
      9081ead [Davies Liu] generate PySpark API docs using Sphinx
      ec1adecb
    • [SPARK-3546] InputStream of ManagedBuffer is not closed and causes running out of file descriptor · a9e91043
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2408 from sarutak/resolve-resource-leak-issue and squashes the following commits:
      
      074781d [Kousuke Saruta] Modified SuffleBlockFetcherIterator
      5f63f67 [Kousuke Saruta] Move metrics increment logic and debug logging outside try block
      b37231a [Kousuke Saruta] Modified FileSegmentManagedBuffer#nioByteBuffer to check null or not before invoking channel.close
      bf29d4a [Kousuke Saruta] Modified FileSegment to close channel
      a9e91043