  1. Aug 09, 2014
    • [SPARK-1766] sorted functions to meet pedantic requirements · e45daf22
      Chris Cope authored
      Pedantry is underrated
      
      Author: Chris Cope <ccope@resilientscience.com>
      
      Closes #1859 from copester/master and squashes the following commits:
      
      0fb4499 [Chris Cope] [SPARK-1766] sorted functions to meet pedantic requirements
      e45daf22
    • [SPARK-2861] Fix Doc comment of histogram method · b431e674
      Chandan Kumar authored
      Tested and ready to merge.
      
      Author: Chandan Kumar <chandan.kumar@imaginea.com>
      
      Closes #1786 from nrchandan/spark-2861 and squashes the following commits:
      
      cb0bc1e [Chandan Kumar] [SPARK-2861] Fix a typo in the histogram doc comment
      6a2a71b [Chandan Kumar] SPARK-2861. Fix Doc comment of histogram method
      b431e674
    • [SPARK-2635] Fix race condition at SchedulerBackend.isReady in standalone mode · 28dbae85
      li-zhihui authored
      In SPARK-1946 (PR #900), the configuration `spark.scheduler.minRegisteredExecutorsRatio` was introduced. However, in standalone mode there is a race condition where isReady() can return true because totalExpectedExecutors has not yet been correctly set.
      
      Because the number of expected executors is uncertain in standalone mode, this PR uses CPU cores (`--total-executor-cores`) as the expected resources to judge whether the SchedulerBackend is ready.
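      For illustration, a minimal Scala sketch of the readiness check this implies (the class and names are hypothetical, not the actual SparkDeploySchedulerBackend code):
      
      ```
      // Readiness based on registered cores vs. the requested total (--total-executor-cores),
      // rather than on an executor count that standalone mode cannot know up front.
      class ReadinessTracker(totalExpectedCores: Int, minRegisteredRatio: Double) {
        @volatile private var registeredCores = 0
      
        def executorAdded(cores: Int): Unit = synchronized { registeredCores += cores }
      
        def sufficientResourcesRegistered(): Boolean =
          registeredCores >= totalExpectedCores * minRegisteredRatio
      }
      ```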
      
      Author: li-zhihui <zhihui.li@intel.com>
      Author: Li Zhihui <zhihui.li@intel.com>
      
      Closes #1525 from li-zhihui/fixre4s and squashes the following commits:
      
      e9a630b [Li Zhihui] Rename variable totalExecutors and clean codes
      abf4860 [Li Zhihui] Push down variable totalExpectedResources to children classes
      ca54bd9 [li-zhihui] Format log with String interpolation
      88c7dc6 [li-zhihui] Few codes and docs refactor
      41cf47e [li-zhihui] Fix race condition at SchedulerBackend.isReady in standalone mode
      28dbae85
  2. Aug 08, 2014
    • [SPARK-2911] apply parent[T](j) to clarify UnionRDD code · 43af2817
      Erik Erlandson authored
      References to dependencies(j) for actually obtaining RDD parents are less common than I originally estimated. It does clarify UnionRDD (and will also clarify some of my other PRs).
      
      Use of firstParent[T] is ubiquitous, but I'm not as sure that it benefits from being replaced with parent[T](0).
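      For reference, a Scala sketch of what such an accessor boils down to (the free-standing helper name below is hypothetical; the real method lives on RDD itself):
      
      ```
      import org.apache.spark.rdd.RDD
      
      // A typed accessor over dependencies(j), so UnionRDD-style code can ask for the jth
      // parent directly instead of writing dependencies(j).rdd.asInstanceOf[RDD[T]].
      def parentOf[U](rdd: RDD[_], j: Int): RDD[U] =
        rdd.dependencies(j).rdd.asInstanceOf[RDD[U]]
      ```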
      
      Author: Erik Erlandson <eerlands@redhat.com>
      
      Closes #1858 from erikerlandson/spark-2911-pr2 and squashes the following commits:
      
      7ffea74 [Erik Erlandson] [SPARK-2911] apply parent[T](j) to clarify UnionRDD code
      43af2817
    • [Web UI]Make decision order of Worker's WebUI port consistent with Master's · 1c84dba9
      WangTao authored
      The decision order for the Worker's WebUI port is "--webui-port", SPARK_WORKER_WEBUI_PORT, 8081 (default), spark.worker.ui.port. In the Master, however, the order is "--webui-port", spark.master.ui.port, SPARK_MASTER_WEBUI_PORT, and 8080 (default).
      
      So we change the Worker's order to keep it consistent with the Master's.
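      A minimal Scala sketch of the unified precedence (hypothetical helper, not the actual Worker code): command-line flag first, then the Spark property, then the environment variable, then the default.
      
      ```
      def resolveWorkerWebUiPort(cliPort: Option[Int],
                                 conf: Map[String, String],
                                 env: Map[String, String]): Int =
        cliPort
          .orElse(conf.get("spark.worker.ui.port").map(_.toInt))
          .orElse(env.get("SPARK_WORKER_WEBUI_PORT").map(_.toInt))
          .getOrElse(8081)
      ```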
      
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #1838 from WangTaoTheTonic/reOrder and squashes the following commits:
      
      460f4d4 [WangTao] Make decision order of Worker's WebUI consistent with Master's
      1c84dba9
    • [SPARK-2897][SPARK-2920]TorrentBroadcast does use the serializer class... · ec79063f
      GuoQiang Li authored
      [SPARK-2897][SPARK-2920]TorrentBroadcast does use the serializer class specified in the spark option "spark.serializer"
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #1836 from witgo/SPARK-2897 and squashes the following commits:
      
      23cdc5b [GuoQiang Li] review commit
      ada4fba [GuoQiang Li] TorrentBroadcast does not support broadcast compression
      fb91792 [GuoQiang Li] org.apache.spark.broadcast.TorrentBroadcast does use the serializer class specified in the spark option "spark.serializer"
      ec79063f
    • [SPARK-1997][MLLIB] update breeze to 0.9 · 74d6f622
      Xiangrui Meng authored
      0.9 dependencies (this version doesn't depend on scalalogging, and I excluded commons-math3 from its transitive dependencies):
      ~~~
      +-org.scalanlp:breeze_2.10:0.9 [S]
        +-com.github.fommil.netlib:core:1.1.2
        +-com.github.rwl:jtransforms:2.4.0
        +-net.sf.opencsv:opencsv:2.3
        +-net.sourceforge.f2j:arpack_combined_all:0.1
        +-org.scalanlp:breeze-macros_2.10:0.3.1 [S]
        | +-org.scalamacros:quasiquotes_2.10:2.0.0 [S]
        |
        +-org.slf4j:slf4j-api:1.7.5
        +-org.spire-math:spire_2.10:0.7.4 [S]
          +-org.scalamacros:quasiquotes_2.10:2.0.0 [S]
          |
          +-org.spire-math:spire-macros_2.10:0.7.4 [S]
            +-org.scalamacros:quasiquotes_2.10:2.0.0 [S]
      ~~~
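      In sbt terms, the dependency with that exclusion looks roughly like this (illustrative only; the actual change is in Spark's build definitions):
      
      ```
      libraryDependencies += "org.scalanlp" %% "breeze" % "0.9" exclude("org.apache.commons", "commons-math3")
      ```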
      
      Closes #1749
      
      CC: witgo avati
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1857 from mengxr/breeze-0.9 and squashes the following commits:
      
      7fc16b6 [Xiangrui Meng] don't know why but exclude a private method for mima
      dcc502e [Xiangrui Meng] update breeze to 0.9
      74d6f622
    • [SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) should be... · b7c89a7f
      chutium authored
      [SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
      
      Author: chutium <teng.qiu@gmail.com>
      
      Closes #1691 from chutium/SPARK-2700 and squashes the following commits:
      
      b76ae8c [chutium] [SPARK-2700] [SQL] fixed styling issue
      d75a8bd [chutium] [SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
      b7c89a7f
    • [SPARK-2919] [SQL] Basic support for analyze command in HiveQl · 45d8f4de
      Yin Huai authored
      The command we will support is
      ```
      ANALYZE TABLE tablename COMPUTE STATISTICS noscan
      ```
      Other cases shown in https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables will still be treated as Hive native commands.
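      For example, issuing the supported form through a HiveContext (Scala sketch; the table name is illustrative and the HiveQL dialect is assumed):
      
      ```
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.hive.HiveContext
      
      val sc = new SparkContext(new SparkConf().setAppName("analyze-example").setMaster("local"))
      val hiveContext = new HiveContext(sc)
      hiveContext.sql("ANALYZE TABLE tablename COMPUTE STATISTICS noscan")
      ```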
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-2919
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1848 from yhuai/sqlAnalyze and squashes the following commits:
      
      0b79d36 [Yin Huai] Typo and format.
      c59d94b [Yin Huai] Support "ANALYZE TABLE tableName COMPUTE STATISTICS noscan".
      45d8f4de
    • [SPARK-2877] [SQL] MetastoreRelation should use SparkClassLoader when creating the tableDesc · c874723f
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-2877
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1806 from yhuai/SPARK-2877 and squashes the following commits:
      
      4142bcb [Yin Huai] Use Spark's classloader.
      c874723f
    • [SPARK-2908] [SQL] JsonRDD.nullTypeToStringType does not convert all NullType to StringType · 0489cee6
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-2908
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1840 from yhuai/SPARK-2908 and squashes the following commits:
      
      86e833e [Yin Huai] Update test.
      cb11759 [Yin Huai] nullTypeToStringType should check columns with the type of array of structs.
      0489cee6
    • [SPARK-2888] [SQL] Fix addColumnMetadataToConf in HiveTableScan · 9016af3f
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-2888
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1817 from yhuai/fixAddColumnMetadataToConf and squashes the following commits:
      
      fba728c [Yin Huai] Fix addColumnMetadataToConf.
      9016af3f
    • [SPARK-2911]: provide rdd.parent[T](j) to obtain jth parent RDD · 9a54de16
      Erik Erlandson authored
      Author: Erik Erlandson <eerlands@redhat.com>
      
      Closes #1841 from erikerlandson/spark-2911-pr and squashes the following commits:
      
      4699e2f [Erik Erlandson] [SPARK-2911]: provide rdd.parent[T](j) to obtain jth parent RDD
      9a54de16
  3. Aug 07, 2014
    • [SPARK-2904] Remove non-used local variable in SparkSubmitArguments · 9de6a42b
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #1834 from sarutak/SPARK-2904 and squashes the following commits:
      
      38e7d45 [Kousuke Saruta] Removed non-used variable in SparkSubmitArguments
      9de6a42b
    • SPARK-2565. Update ShuffleReadMetrics as blocks are fetched · 4c51098f
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1507 from sryza/sandy-spark-2565 and squashes the following commits:
      
      74dad41 [Sandy Ryza] SPARK-2565. Update ShuffleReadMetrics as blocks are fetched
      4c51098f
    • SPARK-2787: Make sort-based shuffle write files directly when there's no... · 6906b69c
      Matei Zaharia authored
      SPARK-2787: Make sort-based shuffle write files directly when there's no sorting/aggregation and # partitions is small
      
      As described in https://issues.apache.org/jira/browse/SPARK-2787, right now sort-based shuffle is more expensive than hash-based for map operations that do no partial aggregation or sorting, such as groupByKey. This is because it has to serialize each data item twice (once when spilling to intermediate files, and then again when merging these files object-by-object). This patch adds a code path to just write separate files directly if the # of output partitions is small, and concatenate them at the end to produce a sorted file.
      
      On the unit test side, I added some tests that force or don't force this bypass path to be used, and checked that our tests for other features (e.g. all the operations) cover both cases.
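      A minimal Scala sketch of the bypass decision (names and the threshold default are illustrative; the real logic lives in ExternalSorter and is driven by a configurable threshold):
      
      ```
      // Write one file per reduce partition and concatenate at the end, instead of spilling
      // and merge-sorting, when there is no aggregation/ordering and few output partitions.
      def shouldBypassMergeSort(numPartitions: Int,
                                hasAggregator: Boolean,
                                hasOrdering: Boolean,
                                bypassThreshold: Int = 200): Boolean =
        !hasAggregator && !hasOrdering && numPartitions <= bypassThreshold
      ```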
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #1799 from mateiz/SPARK-2787 and squashes the following commits:
      
      88cf26a [Matei Zaharia] Fix rebase
      10233af [Matei Zaharia] Review comments
      398cb95 [Matei Zaharia] Fix looking up shuffle manager in conf
      ca3efd9 [Matei Zaharia] Add docs for shuffle manager properties, and allow short names for them
      d0ae3c5 [Matei Zaharia] Fix some comments
      90d084f [Matei Zaharia] Add code path to bypass merge-sort in ExternalSorter, and tests
      31e5d7c [Matei Zaharia] Move existing logic for writing partitioned files into ExternalSorter
      6906b69c
    • SPARK-2899 Doc generation is back to working in new SBT Build. · 32096c2a
      Prashant Sharma authored
      The reason for this bug was the introduction of the OldDeps project. It had to be excluded to prevent unidoc from trying to put it on the "docs compile" classpath.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1830 from ScrapCodes/doc-fix and squashes the following commits:
      
      e5d52e6 [Prashant Sharma] SPARK-2899 Doc generation is back to working in new SBT Build.
      32096c2a
    • SPARK-2905 Fixed path sbin => bin · 80ec5bad
      Oleg Danilov authored
      Author: Oleg Danilov <oleg.danilov@wandisco.com>
      
      Closes #1835 from dosoft/SPARK-2905 and squashes the following commits:
      
      4df423c [Oleg Danilov] SPARK-2905 Fixed path sbin => bin
      80ec5bad
    • [SPARK-2852][MLLIB] Separate model from IDF/StandardScaler algorithms · b9e9e537
      Xiangrui Meng authored
      This is part of SPARK-2828:
      
      1. separate IDF model from IDF algorithm (which generates a model); see the usage sketch after this list
      2. separate StandardScaler model from StandardScaler
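      A usage sketch of the separated fit/transform split (Scala; treat the exact signatures as illustrative):
      
      ```
      import org.apache.spark.mllib.feature.IDF
      import org.apache.spark.mllib.linalg.Vector
      import org.apache.spark.rdd.RDD
      
      def tfidf(termFreqs: RDD[Vector]): RDD[Vector] = {
        val idfModel = new IDF().fit(termFreqs)  // the algorithm produces a model...
        idfModel.transform(termFreqs)            // ...and the model does the transformation
      }
      ```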
      
      CC: dbtsai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1814 from mengxr/feature-api-update and squashes the following commits:
      
      40d863b [Xiangrui Meng] move mean and variance to model
      48a0fff [Xiangrui Meng] separate Model from StandardScaler algorithm
      89f3486 [Xiangrui Meng] update IDF to separate Model from Algorithm
      b9e9e537
    • [mllib] DecisionTree Strategy parameter checks · 8d1dec4f
      Joseph K. Bradley authored
      Added some checks to Strategy to print out meaningful error messages when given invalid DecisionTree parameters.
      CC mengxr
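      An illustrative Scala sketch of that kind of validation (parameter names and bounds are examples, not the actual Strategy checks):
      
      ```
      case class TreeParams(maxDepth: Int, maxBins: Int, numClasses: Int) {
        require(maxDepth >= 0, s"DecisionTree parameter maxDepth must be >= 0, but was given $maxDepth.")
        require(maxBins >= 2, s"DecisionTree parameter maxBins must be >= 2, but was given $maxBins.")
        require(numClasses >= 2, s"DecisionTree parameter numClasses must be >= 2 for classification, but was given $numClasses.")
      }
      ```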
      
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      
      Closes #1821 from jkbradley/dt-robustness and squashes the following commits:
      
      4dc449a [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-robustness
      7a61f7b [Joseph K. Bradley] Added some checks to Strategy to print out meaningful error messages when given invalid DecisionTree parameters
      8d1dec4f
    • SPARK-2879 part 2 [BUILD] Use HTTPS to access Maven Central and other repos · 75993a65
      Sean Owen authored
      .. and use canonical repo1.maven.org Maven Central repo. (And make sure snapshots are disabled for plugins from Maven Central.)
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1828 from srowen/SPARK-2879.2 and squashes the following commits:
      
      639f495 [Sean Owen] .. and use canonical repo1.maven.org Maven Central repo. (And make sure snapshots are disabled for plugins from Maven Central.)
      75993a65
    • [SPARK-2851] [mllib] DecisionTree Python consistency update · 47ccd5e7
      Joseph K. Bradley authored
      Added 6 static train methods to match the Python API, but without default arguments (the Python default args are noted in the docs).
      
      Added factory classes for Algo and Impurity, but made them private[mllib].
      
      CC: mengxr dorx  Please let me know if there are other changes which would help with API consistency---thanks!
      
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      
      Closes #1798 from jkbradley/dt-python-consistency and squashes the following commits:
      
      6f7edf8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency
      a0d7dbe [Joseph K. Bradley] DecisionTree: In Java-friendly train* methods, changed to use JavaRDD instead of RDD.
      ee1d236 [Joseph K. Bradley] DecisionTree API updates: * Removed train() function in Python API (tree.py) ** Removed corresponding function in Scala/Java API (the ones taking basic types)
      00f820e [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency
      fe6dbfa [Joseph K. Bradley] removed unnecessary imports
      e358661 [Joseph K. Bradley] DecisionTree API change: * Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).
      c699850 [Joseph K. Bradley] a few doc comments
      eaf84c0 [Joseph K. Bradley] Added DecisionTree static train() methods API to match Python, but without default parameters
      47ccd5e7
  4. Aug 06, 2014
    • [SPARK-2887] fix bug of countApproxDistinct() when have more than one partition · ffd1f59a
      Davies Liu authored
      Fix a bug in countApproxDistinct() when there is more than one partition.
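      Usage sketch for the multi-partition case the fix targets (Scala; the accuracy value is illustrative and an existing SparkContext `sc` is assumed):
      
      ```
      val approx = sc.parallelize(1 to 10000, numSlices = 4).countApproxDistinct(relativeSD = 0.05)
      ```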
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #1812 from davies/approx and squashes the following commits:
      
      bf757ce [Davies Liu] fix bug of countApproxDistinct() when have more than one partition
      ffd1f59a
    • HOTFIX: Support custom Java 7 location · a263a7e9
      Patrick Wendell authored
      a263a7e9
    • SPARK-2879 [BUILD] Use HTTPS to access Maven Central and other repos · 4201d271
      Sean Owen authored
      Maven Central has just enabled HTTPS access for everyone (http://central.sonatype.org/articles/2014/Aug/03/https-support-launching-now/). This is timely, as a reminder of how easily an attacker can slip malicious code into a build that downloads artifacts over HTTP (http://blog.ontoillogical.com/blog/2014/07/28/how-to-take-over-any-java-developer/).
      
      In the meantime, it looks like the Spring repo also now supports HTTPS, so can be used this way too.
      
      I propose to use HTTPS to access these repos.
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1805 from srowen/SPARK-2879 and squashes the following commits:
      
      7043a8e [Sean Owen] Use HTTPS for Maven Central libs and plugins; use id 'central' to override parent properly; use HTTPS for Spring repo
      4201d271
    • [SPARK-2583] ConnectionManager error reporting · 17caae48
      Kousuke Saruta authored
      This patch modifies the ConnectionManager so that error messages are sent in reply when uncaught exceptions occur during message processing.  This prevents message senders from hanging while waiting for an acknowledgment if the remote message processing failed.
      
      This is an updated version of sarutak's PR, #1490.  The main change is to use Futures / Promises to signal errors.
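      A minimal Scala sketch of the Futures/Promises error-signaling pattern (hypothetical types, heavily simplified relative to ConnectionManager): the sender's Future is failed instead of left hanging when processing throws.
      
      ```
      import scala.concurrent.{Future, Promise}
      import scala.util.control.NonFatal
      
      case class Message(body: Array[Byte])
      
      def sendMessageReliably(msg: Message, process: Message => Message): Future[Message] = {
        val promise = Promise[Message]()
        try {
          promise.success(process(msg))  // normal acknowledgment
        } catch {
          case NonFatal(e) =>
            promise.failure(new java.io.IOException("remote error while processing message", e))
        }
        promise.future
      }
      ```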
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #1758 from JoshRosen/connection-manager-fixes and squashes the following commits:
      
      68620cb [Josh Rosen] Fix test in BlockFetcherIteratorSuite:
      83673de [Josh Rosen] Error ACKs should trigger IOExceptions, so catch only those exceptions in the test.
      b8bb4d4 [Josh Rosen] Fix manager.id vs managerServer.id typo that broke security tests.
      659521f [Josh Rosen] Include previous exception when throwing new one
      a2f745c [Josh Rosen] Remove sendMessageReliablySync; callers can wait themselves.
      c01c450 [Josh Rosen] Return Try[Message] from sendMessageReliablySync.
      f1cd1bb [Josh Rosen] Clean up @sarutak's PR #1490 for [SPARK-2583]: ConnectionManager error reporting
      7399c6b [Josh Rosen] Merge remote-tracking branch 'origin/pr/1490' into connection-manager-fixes
      ee91bb7 [Kousuke Saruta] Modified BufferMessage.scala to keep the spark code style
      9dfd0d8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      e7d9aa6 [Kousuke Saruta] rebase to master
      326a17f [Kousuke Saruta] Add test cases to ConnectionManagerSuite.scala for SPARK-2583
      2a18d6b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      22d7ebd [Kousuke Saruta] Add test cases to BlockManagerSuite for SPARK-2583
      e579302 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      281589c [Kousuke Saruta] Add a test case to BlockFetcherIteratorSuite.scala for fetching block from remote from successfully
      0654128 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      ffaa83d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      12d3de8 [Kousuke Saruta] Added BlockFetcherIteratorSuite.scala
      4117b8f [Kousuke Saruta] Modified ConnectionManager to be alble to handle error during processing message
      717c9c3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      6635467 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      e2b8c4a [Kousuke Saruta] Modify to propagete error using ConnectionManager
      17caae48
    • SPARK-2882: Spark build now checks local maven cache for dependencies · 4e008334
      Gregory Owen authored
      Fixes [SPARK-2882](https://issues.apache.org/jira/browse/SPARK-2882)
      
      Author: Gregory Owen <greowen@gmail.com>
      
      Closes #1818 from GregOwen/spark-2882 and squashes the following commits:
      
      294446d [Gregory Owen] SPARK-2882: Spark build now checks local maven cache for dependencies
      4e008334
    • [HOTFIX][Streaming] Handle port collisions in flume polling test · c6889d2c
      Andrew Or authored
      This is failing my tests in #1777. @tdas
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1803 from andrewor14/fix-flaky-streaming-test and squashes the following commits:
      
      ea11a03 [Andrew Or] Catch all exceptions caused by BindExceptions
      54a0ca0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-flaky-streaming-test
      664095c [Andrew Or] Tone down bind exception message
      af3ddc9 [Andrew Or] Handle port collisions in flume polling test
      c6889d2c
    • [PySpark] Add blanklines to Python docstrings so example code renders correctly · e537b33c
      RJ Nowling authored
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #1808 from rnowling/pyspark_docs and squashes the following commits:
      
      c06d774 [RJ Nowling] Add blanklines to Python docstrings so example code renders correctly
      e537b33c
    • [SPARK-2852][MLLIB] API consistency for `mllib.feature` · 25cff101
      Xiangrui Meng authored
      This is part of SPARK-2828:
      
      1. added a Java-friendly fit method to Word2Vec, with tests
      2. changed DeveloperApi to Experimental for Normalizer & StandardScaler
      3. changed the default feature dimension to 2^20 in HashingTF (usage sketch below)
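      A usage sketch for item 3 (Scala; the terms are illustrative):
      
      ```
      import org.apache.spark.mllib.feature.HashingTF
      
      val tf = new HashingTF()  // default numFeatures is now 2^20
      val vec = tf.transform(Seq("spark", "mllib", "spark"))  // hashes terms into a sparse vector
      ```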
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1807 from mengxr/feature-api-check and squashes the following commits:
      
      773c1a9 [Xiangrui Meng] change default numFeatures to 2^20 in HashingTF change annotation from DeveloperApi to Experimental in Normalizer and StandardScaler
      883e122 [Xiangrui Meng] add @Experimental to Word2VecModel add a Java-friendly method to Word2Vec.fit with tests
      25cff101
    • SPARK-2566. Update ShuffleWriteMetrics incrementally · 4e982364
      Sandy Ryza authored
      I haven't tested this out on a cluster yet, but wanted to make sure the approach (passing ShuffleWriteMetrics down to DiskBlockObjectWriter) was ok
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1481 from sryza/sandy-spark-2566 and squashes the following commits:
      
      8090d88 [Sandy Ryza] Fix ExternalSorter
      b2a62ed [Sandy Ryza] Fix more test failures
      8be6218 [Sandy Ryza] Fix test failures and mark a couple variables private
      c5e68e5 [Sandy Ryza] SPARK-2566. Update ShuffleWriteMetrics incrementally
      4e982364
    • [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically · d614967b
      Nicholas Chammas authored
      As described in [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627), we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.
      
      Notes:
      * We may need to install [`pep8`](https://pypi.python.org/pypi/pep8) on the build server.
      * I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
      * I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
      * Updates to the ["Contributing to Spark"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      Author: nchammas <nicholas.chammas@gmail.com>
      
      Closes #1744 from nchammas/master and squashes the following commits:
      
      274b238 [Nicholas Chammas] [SPARK-2627] [PySpark] minor indentation changes
      983d963 [nchammas] Merge pull request #5 from apache/master
      1db5314 [nchammas] Merge pull request #4 from apache/master
      0e0245f [Nicholas Chammas] [SPARK-2627] undo erroneous whitespace fixes
      bf30942 [Nicholas Chammas] [SPARK-2627] PEP8: comment spacing
      6db9a44 [nchammas] Merge pull request #3 from apache/master
      7b4750e [Nicholas Chammas] merge upstream changes
      91b7584 [Nicholas Chammas] [SPARK-2627] undo unnecessary line breaks
      44e3e56 [Nicholas Chammas] [SPARK-2627] use tox.ini to exclude files
      b09fae2 [Nicholas Chammas] don't wrap comments unnecessarily
      bfb9f9f [Nicholas Chammas] [SPARK-2627] keep up with the PEP 8 fixes
      9da347f [nchammas] Merge pull request #2 from apache/master
      aa5b4b5 [Nicholas Chammas] [SPARK-2627] follow Spark bash style for if blocks
      d0a83b9 [Nicholas Chammas] [SPARK-2627] check that pep8 downloaded fine
      dffb5dd [Nicholas Chammas] [SPARK-2627] download pep8 at runtime
      a1ce7ae [Nicholas Chammas] [SPARK-2627] space out test report sections
      21da538 [Nicholas Chammas] [SPARK-2627] it's PEP 8, not PEP8
      6f4900b [Nicholas Chammas] [SPARK-2627] more misc PEP 8 fixes
      fe57ed0 [Nicholas Chammas] removing merge conflict backups
      9c01d4c [nchammas] Merge pull request #1 from apache/master
      9a66cb0 [Nicholas Chammas] resolving merge conflicts
      a31ccc4 [Nicholas Chammas] [SPARK-2627] miscellaneous PEP 8 fixes
      beaa9ac [Nicholas Chammas] [SPARK-2627] fail check on non-zero status
      723ed39 [Nicholas Chammas] always delete the report file
      0541ebb [Nicholas Chammas] [SPARK-2627] call Python linter from run-tests
      12440fa [Nicholas Chammas] [SPARK-2627] add Scala linter
      61c07b9 [Nicholas Chammas] [SPARK-2627] add Python linter
      75ad552 [Nicholas Chammas] make check output style consistent
      d614967b
    • [SPARK-2678][Core][SQL] A workaround for SPARK-2678 · a6cd3110
      Cheng Lian authored
      JIRA issues:
      
      - Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      - Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874)
      
      Related PR:
      
      - #1715
      
      This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API-level changes that need further discussion, and we decided not to include it in the Spark 1.1 release. As SPARK-2678 currently only affects Spark SQL scripts, this workaround is enough for Spark 1.1. The command-line option handling logic in the bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full backward compatibility for now.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1801 from liancheng/spark-2874 and squashes the following commits:
      
      8045d7a [Cheng Lian] Make sure test suites pass
      8493a9e [Cheng Lian] Using eval to retain quoted arguments
      aed523f [Cheng Lian] Fixed typo in bin/spark-sql
      f12a0b1 [Cheng Lian] Worked arount SPARK-2678
      daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts
      a6cd3110
    • [SPARK-2875] [PySpark] [SQL] handle null in schemaRDD() · 48789117
      Davies Liu authored
      Handle nulls in a SchemaRDD when converting it into Python.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #1802 from davies/json and squashes the following commits:
      
      88e6b1f [Davies Liu] handle null in schemaRDD()
      48789117
    • [SPARK-2157] Enable tight firewall rules for Spark · 09f7e458
      Andrew Or authored
      The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107.
      
      The list covered here may or may not be the complete set of ports needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs.
      
      My spark-env.sh looks like this:
      ```
      export SPARK_MASTER_PORT=6060
      export SPARK_WORKER_PORT=7070
      export SPARK_MASTER_WEBUI_PORT=9090
      export SPARK_WORKER_WEBUI_PORT=9091
      ```
      and my spark-defaults.conf looks like this:
      ```
      spark.master spark://andrews-mbp:6060
      spark.driver.port 5001
      spark.fileserver.port 5011
      spark.broadcast.port 5021
      spark.replClassServer.port 5031
      spark.blockManager.port 5041
      spark.executor.port 5051
      ```
      
      Author: Andrew Or <andrewor14@gmail.com>
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #1777 from andrewor14/configure-ports and squashes the following commits:
      
      621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      8a6b820 [Andrew Or] Use a random UI port during tests
      7da0493 [Andrew Or] Fix tests
      523c30e [Andrew Or] Add test for isBindCollision
      b97b02a [Andrew Or] Minor fixes
      c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      93d359f [Andrew Or] Executors connect to wrong port when collision occurs
      d502e5f [Andrew Or] Handle port collisions when creating Akka systems
      a2dd05c [Andrew Or] Patrick's comment nit
      86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port
      1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode
      cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.)
      e837cde [Andrew Or] Remove outdated TODOs
      bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      de1b207 [Andrew Or] Update docs to reflect new ports
      b565079 [Andrew Or] Add spark.ports.maxRetries
      2551eb2 [Andrew Or] Remove spark.worker.watcher.port
      151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      9868358 [Andrew Or] Add a few miscellaneous ports
      6016e77 [Andrew Or] Add spark.executor.port
      8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT
      4d9e6f3 [Andrew Or] Fix super subtle bug
      3f8e51b [Andrew Or] Correct erroneous docs...
      e111d08 [Andrew Or] Add names for UI services
      470f38c [Andrew Or] Special case non-"Address already in use" exceptions
      1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port
      ba32280 [Andrew Or] Minor fixes
      6b550b0 [Andrew Or] Assorted fixes
      73fbe89 [Andrew Or] Move start service logic to Utils
      ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports
      038a579 [Andrew Ash] Trust the server start function to report the port the service started on
      7c5bdc4 [Andrew Ash] Fix style issue
      0347aef [Andrew Ash] Unify port fallback logic to a single place
      24a4c32 [Andrew Ash] Remove type on val to match surrounding style
      9e4ad96 [Andrew Ash] Reformat for style checker
      5d84e0e [Andrew Ash] Document new port configuration options
      066dc7a [Andrew Ash] Fix up HttpServer port increments
      cad16da [Andrew Ash] Add fallover increment logic for HttpServer
      c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment
      b80d2fd [Andrew Ash] Make Spark's block manager port configurable
      17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server
      f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast
      49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer
      1c0981a [Andrew Ash] Make port in HttpServer configurable
      09f7e458
    • [SPARK-1022][Streaming][HOTFIX] Fixed zookeeper dependency of Kafka · ee7f3085
      Tathagata Das authored
      https://github.com/apache/spark/pull/1751 caused maven builds to fail.
      
      ```
      ~/Apache/spark(branch-1.1|✔️) ➤ mvn -U -DskipTests clean install
      .
      .
      .
      [error] Apache/spark/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala:36: object NIOServerCnxnFactory is not a member of package org.apache.zookeeper.server
      [error] import org.apache.zookeeper.server.NIOServerCnxnFactory
      [error]        ^
      [error] Apache/spark/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala:199: not found: type NIOServerCnxnFactory
      [error]     val factory = new NIOServerCnxnFactory()
      [error]                       ^
      [error] two errors found
      [error] Compile failed at Aug 5, 2014 1:42:36 PM [0.503s]
      ```
      
      The problem is how SBT and Maven resolve multiple versions of the same library, which in this case is ZooKeeper. Observing and comparing the dependency trees from Maven and SBT showed this: Spark depends on ZK 3.4.5, whereas Apache Kafka transitively depends upon ZK 3.3.4. SBT evicts 3.3.4 and uses the higher version 3.4.5, but Maven sticks to the closest (in the tree) dependent version, 3.3.4, and 3.3.4 does not have NIOServerCnxnFactory.
      
      The solution in this patch excludes ZooKeeper from the apache-kafka dependency in the streaming-kafka module so that it just inherits ZooKeeper from Spark core.
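      Expressed in sbt terms, the exclusion looks roughly like this (the Kafka version is illustrative; the actual fix edits the streaming-kafka POM):
      
      ```
      libraryDependencies += ("org.apache.kafka" %% "kafka" % "0.8.0")
        .exclude("org.apache.zookeeper", "zookeeper")
      ```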
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1797 from tdas/kafka-zk-fix and squashes the following commits:
      
      94b3931 [Tathagata Das] Fixed zookeeper dependency of Kafka
      ee7f3085
    • [MLlib] Use this.type as return type in k-means' builder pattern · c7b52010
      DB Tsai authored
      to ensure that the returned object is the instance itself.
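      A minimal Scala sketch of the this.type idiom (hypothetical builder, not the actual KMeans code):
      
      ```
      class KMeansBuilder {
        private var k: Int = 2
        private var maxIterations: Int = 20
      
        // Returning this.type instead of KMeansBuilder keeps the concrete type in subclasses,
        // so chained setters still return the subclass.
        def setK(k: Int): this.type = { this.k = k; this }
        def setMaxIterations(n: Int): this.type = { this.maxIterations = n; this }
      }
      ```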
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #1796 from dbtsai/dbtsai-kmeans and squashes the following commits:
      
      658989e [DB Tsai] Alpine Data Labs
      c7b52010
    • SPARK-2294: fix locality inversion bug in TaskManager · 63bdb1f4
      CodingCat authored
      copied from original JIRA (https://issues.apache.org/jira/browse/SPARK-2294):
      
      If an executor E is free, a task may be speculatively assigned to E when there are other tasks in the job that have not been launched (at all) yet. Similarly, a task without any locality preferences may be assigned to E when there was another NODE_LOCAL task that could have been scheduled.
      This happens because TaskSchedulerImpl calls TaskSetManager.resourceOffer (which in turn calls TaskSetManager.findTask) with increasing locality levels, beginning with PROCESS_LOCAL, followed by NODE_LOCAL, and so on until the highest currently allowed level. Now, suppose NODE_LOCAL is the highest currently allowed locality level. The first time findTask is called, it will be called with max level PROCESS_LOCAL; if it cannot find any PROCESS_LOCAL tasks, it will try to schedule tasks with no locality preferences or speculative tasks. As a result, speculative tasks or tasks with no preferences may be scheduled instead of NODE_LOCAL tasks.
      
      ----
      
      I added an additional parameter, maxLocality, to resourceOffer and findTask, indicating when we should consider tasks without locality preferences.
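      A heavily simplified Scala sketch of the gating idea (hypothetical, not the actual TaskSetManager code): tasks with no locality preference are only considered once the allowed level reaches NODE_LOCAL, so they cannot jump ahead of NODE_LOCAL tasks during a PROCESS_LOCAL-only pass.
      
      ```
      object Locality extends Enumeration {
        val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
      }
      
      def findTask(maxLocality: Locality.Value,
                   processLocalTasks: Seq[Int],
                   nodeLocalTasks: Seq[Int],
                   noPrefTasks: Seq[Int]): Option[Int] =
        processLocalTasks.headOption
          .orElse(if (maxLocality >= Locality.NODE_LOCAL) nodeLocalTasks.headOption else None)
          .orElse(if (maxLocality >= Locality.NODE_LOCAL) noPrefTasks.headOption else None)
      ```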
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #1313 from CodingCat/SPARK-2294 and squashes the following commits:
      
      bf3f13b [CodingCat] rollback some forgotten changes
      89f9bc0 [CodingCat] address matei's comments
      18cae02 [CodingCat] add test case for node-local tasks
      2ba6195 [CodingCat] fix failed test cases
      87dd09e [CodingCat] fix style
      9b9432f [CodingCat] remove hasNodeLocalOnlyTasks
      fdd1573 [CodingCat] fix failed test cases
      941a4fd [CodingCat] see my shocked face..........
      f600085 [CodingCat] remove hasNodeLocalOnlyTasks checking
      0b8a46b [CodingCat] test whether hasNodeLocalOnlyTasks affect the results
      73ceda8 [CodingCat] style fix
      b3a430b [CodingCat] remove fine granularity tracking for node-local only tasks
      f9a2ad8 [CodingCat] simplify the logic in TaskSchedulerImpl
      c8c1de4 [CodingCat] simplify the patch
      be652ed [CodingCat] avoid unnecessary delay when we only have nopref tasks
      dee9e22 [CodingCat] fix locality inversion bug in TaskManager by moving nopref branch
      63bdb1f4
    • [SQL] Fix logging warn -> debug · 5a826c00
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1800 from marmbrus/warning and squashes the following commits:
      
      8ea9cf1 [Michael Armbrust] [SQL] Fix logging warn -> debug.
      5a826c00
    • [SQL] Tighten the visibility of various SQLConf methods and renamed setter/getters · b70bae40
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1794 from rxin/sql-conf and squashes the following commits:
      
      3ac11ef [Reynold Xin] getAllConfs return an immutable Map instead of an Array.
      4b19d6c [Reynold Xin] Tighten the visibility of various SQLConf methods and renamed setter/getters.
      b70bae40