  1. Aug 07, 2014
    • SPARK-2879 part 2 [BUILD] Use HTTPS to access Maven Central and other repos · 75993a65
      Sean Owen authored
      .. and use canonical repo1.maven.org Maven Central repo. (And make sure snapshots are disabled for plugins from Maven Central.)
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1828 from srowen/SPARK-2879.2 and squashes the following commits:
      
      639f495 [Sean Owen] .. and use canonical repo1.maven.org Maven Central repo. (And make sure snapshots are disabled for plugins from Maven Central.)
    • [SPARK-2851] [mllib] DecisionTree Python consistency update · 47ccd5e7
      Joseph K. Bradley authored
      Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).
      
      Added factory classes for Algo and Impurity, but made private[mllib].
      
      CC: mengxr dorx  Please let me know if there are other changes which would help with API consistency---thanks!
      
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      
      Closes #1798 from jkbradley/dt-python-consistency and squashes the following commits:
      
      6f7edf8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency
      a0d7dbe [Joseph K. Bradley] DecisionTree: In Java-friendly train* methods, changed to use JavaRDD instead of RDD.
      ee1d236 [Joseph K. Bradley] DecisionTree API updates: * Removed train() function in Python API (tree.py) ** Removed corresponding function in Scala/Java API (the ones taking basic types)
      00f820e [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency
      fe6dbfa [Joseph K. Bradley] removed unnecessary imports
      e358661 [Joseph K. Bradley] DecisionTree API change: * Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).
      c699850 [Joseph K. Bradley] a few doc comments
      eaf84c0 [Joseph K. Bradley] Added DecisionTree static train() methods API to match Python, but without default parameters
  2. Aug 06, 2014
    • [SPARK-2887] fix bug of countApproxDistinct() when have more than one partition · ffd1f59a
      Davies Liu authored
      Fix a bug in countApproxDistinct() when there is more than one partition.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #1812 from davies/approx and squashes the following commits:
      
      bf757ce [Davies Liu] fix bug of countApproxDistinct() when have more than one partition
    • HOTFIX: Support custom Java 7 location · a263a7e9
      Patrick Wendell authored
    • SPARK-2879 [BUILD] Use HTTPS to access Maven Central and other repos · 4201d271
      Sean Owen authored
      Maven Central has just enabled HTTPS access for everyone (http://central.sonatype.org/articles/2014/Aug/03/https-support-launching-now/). This is timely, as a reminder of how easily an attacker can slip malicious code into a build that downloads artifacts over HTTP (http://blog.ontoillogical.com/blog/2014/07/28/how-to-take-over-any-java-developer/).
      
      In the meantime, it looks like the Spring repo also now supports HTTPS, so it can be used this way too.
      
      I propose to use HTTPS to access these repos.
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1805 from srowen/SPARK-2879 and squashes the following commits:
      
      7043a8e [Sean Owen] Use HTTPS for Maven Central libs and plugins; use id 'central' to override parent properly; use HTTPS for Spring repo
    • [SPARK-2583] ConnectionManager error reporting · 17caae48
      Kousuke Saruta authored
      This patch modifies the ConnectionManager so that error messages are sent in reply when uncaught exceptions occur during message processing.  This prevents message senders from hanging while waiting for an acknowledgment if the remote message processing failed.
      
      This is an updated version of sarutak's PR, #1490.  The main change is to use Futures / Promises to signal errors.
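
As a rough illustration of the idea described above (not the actual ConnectionManager code, which is Scala), the sketch below uses Python's `concurrent.futures.Future` as a stand-in for a promise. The names `send_message_reliably`, `handle_message`, and `failing_handler` are hypothetical and exist only for this example.

```python
import threading
from concurrent.futures import Future


def send_message_reliably(message, handle_message):
    """Return a Future that completes with the reply or fails with IOError.

    `handle_message` is a hypothetical stand-in for the remote side's
    message processing.
    """
    ack = Future()

    def process():
        try:
            ack.set_result(handle_message(message))
        except Exception as exc:
            # Report the failure back instead of staying silent, so the
            # sender is not left waiting forever for an acknowledgment.
            error = IOError("remote processing failed: %s" % exc)
            error.__cause__ = exc  # keep the original exception attached
            ack.set_exception(error)

    threading.Thread(target=process).start()
    return ack


def failing_handler(message):
    raise ValueError("cannot process " + message)


ok = send_message_reliably("ping", lambda m: m.upper())
bad = send_message_reliably("ping", failing_handler)

print(ok.result(timeout=5))
try:
    bad.result(timeout=5)
except IOError as err:
    print("sender saw error:", err)
```

The sender decides how long to wait on the returned future, which mirrors the squashed commit that removed the synchronous variant so callers can wait themselves.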
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #1758 from JoshRosen/connection-manager-fixes and squashes the following commits:
      
      68620cb [Josh Rosen] Fix test in BlockFetcherIteratorSuite:
      83673de [Josh Rosen] Error ACKs should trigger IOExceptions, so catch only those exceptions in the test.
      b8bb4d4 [Josh Rosen] Fix manager.id vs managerServer.id typo that broke security tests.
      659521f [Josh Rosen] Include previous exception when throwing new one
      a2f745c [Josh Rosen] Remove sendMessageReliablySync; callers can wait themselves.
      c01c450 [Josh Rosen] Return Try[Message] from sendMessageReliablySync.
      f1cd1bb [Josh Rosen] Clean up @sarutak's PR #1490 for [SPARK-2583]: ConnectionManager error reporting
      7399c6b [Josh Rosen] Merge remote-tracking branch 'origin/pr/1490' into connection-manager-fixes
      ee91bb7 [Kousuke Saruta] Modified BufferMessage.scala to keep the spark code style
      9dfd0d8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      e7d9aa6 [Kousuke Saruta] rebase to master
      326a17f [Kousuke Saruta] Add test cases to ConnectionManagerSuite.scala for SPARK-2583
      2a18d6b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      22d7ebd [Kousuke Saruta] Add test cases to BlockManagerSuite for SPARK-2583
      e579302 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      281589c [Kousuke Saruta] Add a test case to BlockFetcherIteratorSuite.scala for fetching block from remote from successfully
      0654128 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      ffaa83d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      12d3de8 [Kousuke Saruta] Added BlockFetcherIteratorSuite.scala
      4117b8f [Kousuke Saruta] Modified ConnectionManager to be alble to handle error during processing message
      717c9c3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      6635467 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2583
      e2b8c4a [Kousuke Saruta] Modify to propagete error using ConnectionManager
    • SPARK-2882: Spark build now checks local maven cache for dependencies · 4e008334
      Gregory Owen authored
      Fixes [SPARK-2882](https://issues.apache.org/jira/browse/SPARK-2882)
      
      Author: Gregory Owen <greowen@gmail.com>
      
      Closes #1818 from GregOwen/spark-2882 and squashes the following commits:
      
      294446d [Gregory Owen] SPARK-2882: Spark build now checks local maven cache for dependencies
    • [HOTFIX][Streaming] Handle port collisions in flume polling test · c6889d2c
      Andrew Or authored
      This is failing my tests in #1777. @tdas
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1803 from andrewor14/fix-flaky-streaming-test and squashes the following commits:
      
      ea11a03 [Andrew Or] Catch all exceptions caused by BindExceptions
      54a0ca0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-flaky-streaming-test
      664095c [Andrew Or] Tone down bind exception message
      af3ddc9 [Andrew Or] Handle port collisions in flume polling test
    • [PySpark] Add blanklines to Python docstrings so example code renders correctly · e537b33c
      RJ Nowling authored
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #1808 from rnowling/pyspark_docs and squashes the following commits:
      
      c06d774 [RJ Nowling] Add blanklines to Python docstrings so example code renders correctly
    • [SPARK-2852][MLLIB] API consistency for `mllib.feature` · 25cff101
      Xiangrui Meng authored
      This is part of SPARK-2828:
      
      1. added a Java-friendly fit method to Word2Vec with tests
      2. changed DeveloperApi to Experimental for Normalizer & StandardScaler
      3. changed the default feature dimension to 2^20 in HashingTF
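
To make the third item concrete, here is a minimal sketch of the hashing trick with a 2^20-dimensional feature space. It is not MLlib's HashingTF; `hashing_tf` is a hypothetical helper, and `zlib.crc32` is an assumption chosen only to keep the example deterministic.

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features=2 ** 20):
    """Map terms to a sparse {index: count} vector with the hashing trick.

    zlib.crc32 is an assumption made for this sketch; it is not the hash
    function MLlib uses.
    """
    counts = Counter()
    for term in terms:
        # Each term lands in one of num_features buckets.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


vector = hashing_tf(["spark", "mllib", "spark"])
print(vector)
```

A larger feature dimension lowers the chance that two distinct terms collide in the same bucket, and the sparse representation keeps the cost of the extra dimensions small.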
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1807 from mengxr/feature-api-check and squashes the following commits:
      
      773c1a9 [Xiangrui Meng] change default numFeatures to 2^20 in HashingTF change annotation from DeveloperApi to Experimental in Normalizer and StandardScaler
      883e122 [Xiangrui Meng] add @Experimental to Word2VecModel add a Java-friendly method to Word2Vec.fit with tests
    • SPARK-2566. Update ShuffleWriteMetrics incrementally · 4e982364
      Sandy Ryza authored
      I haven't tested this on a cluster yet, but wanted to make sure the approach (passing ShuffleWriteMetrics down to DiskBlockObjectWriter) is OK.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1481 from sryza/sandy-spark-2566 and squashes the following commits:
      
      8090d88 [Sandy Ryza] Fix ExternalSorter
      b2a62ed [Sandy Ryza] Fix more test failures
      8be6218 [Sandy Ryza] Fix test failures and mark a couple variables private
      c5e68e5 [Sandy Ryza] SPARK-2566. Update ShuffleWriteMetrics incrementally
    • [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically · d614967b
      Nicholas Chammas authored
      As described in [SPARK-2627](https://issues.apache.org/jira/browse/SPARK-2627), we'd like Python code to automatically be checked for PEP 8 compliance by Jenkins. This pull request aims to do that.
      
      Notes:
      * We may need to install [`pep8`](https://pypi.python.org/pypi/pep8) on the build server.
      * I'm expecting tests to fail now that PEP 8 compliance is being checked as part of the build. I'm fine with cleaning up any remaining PEP 8 violations as part of this pull request.
      * I did not understand why the RAT and scalastyle reports are saved to text files. I did the same for the PEP 8 check, but only so that the console output style can match those for the RAT and scalastyle checks. The PEP 8 report is removed right after the check is complete.
      * Updates to the ["Contributing to Spark"](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) guide will be submitted elsewhere, as I don't believe that text is part of the Spark repo.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      Author: nchammas <nicholas.chammas@gmail.com>
      
      Closes #1744 from nchammas/master and squashes the following commits:
      
      274b238 [Nicholas Chammas] [SPARK-2627] [PySpark] minor indentation changes
      983d963 [nchammas] Merge pull request #5 from apache/master
      1db5314 [nchammas] Merge pull request #4 from apache/master
      0e0245f [Nicholas Chammas] [SPARK-2627] undo erroneous whitespace fixes
      bf30942 [Nicholas Chammas] [SPARK-2627] PEP8: comment spacing
      6db9a44 [nchammas] Merge pull request #3 from apache/master
      7b4750e [Nicholas Chammas] merge upstream changes
      91b7584 [Nicholas Chammas] [SPARK-2627] undo unnecessary line breaks
      44e3e56 [Nicholas Chammas] [SPARK-2627] use tox.ini to exclude files
      b09fae2 [Nicholas Chammas] don't wrap comments unnecessarily
      bfb9f9f [Nicholas Chammas] [SPARK-2627] keep up with the PEP 8 fixes
      9da347f [nchammas] Merge pull request #2 from apache/master
      aa5b4b5 [Nicholas Chammas] [SPARK-2627] follow Spark bash style for if blocks
      d0a83b9 [Nicholas Chammas] [SPARK-2627] check that pep8 downloaded fine
      dffb5dd [Nicholas Chammas] [SPARK-2627] download pep8 at runtime
      a1ce7ae [Nicholas Chammas] [SPARK-2627] space out test report sections
      21da538 [Nicholas Chammas] [SPARK-2627] it's PEP 8, not PEP8
      6f4900b [Nicholas Chammas] [SPARK-2627] more misc PEP 8 fixes
      fe57ed0 [Nicholas Chammas] removing merge conflict backups
      9c01d4c [nchammas] Merge pull request #1 from apache/master
      9a66cb0 [Nicholas Chammas] resolving merge conflicts
      a31ccc4 [Nicholas Chammas] [SPARK-2627] miscellaneous PEP 8 fixes
      beaa9ac [Nicholas Chammas] [SPARK-2627] fail check on non-zero status
      723ed39 [Nicholas Chammas] always delete the report file
      0541ebb [Nicholas Chammas] [SPARK-2627] call Python linter from run-tests
      12440fa [Nicholas Chammas] [SPARK-2627] add Scala linter
      61c07b9 [Nicholas Chammas] [SPARK-2627] add Python linter
      75ad552 [Nicholas Chammas] make check output style consistent
    • [SPARK-2678][Core][SQL] A workaround for SPARK-2678 · a6cd3110
      Cheng Lian authored
      JIRA issues:
      
      - Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      - Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874)
      
      Related PR:
      
      - #1715
      
      This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API-level changes that need further discussion, and we decided not to include it in the Spark 1.1 release. Since SPARK-2678 currently affects only Spark SQL scripts, this workaround is enough for Spark 1.1. The command line option handling logic in the bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface and retains full backward compatibility for now.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1801 from liancheng/spark-2874 and squashes the following commits:
      
      8045d7a [Cheng Lian] Make sure test suites pass
      8493a9e [Cheng Lian] Using eval to retain quoted arguments
      aed523f [Cheng Lian] Fixed typo in bin/spark-sql
      f12a0b1 [Cheng Lian] Worked arount SPARK-2678
      daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts
    • [SPARK-2875] [PySpark] [SQL] handle null in schemaRDD() · 48789117
      Davies Liu authored
      Handle null values in a schemaRDD when converting it to Python.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #1802 from davies/json and squashes the following commits:
      
      88e6b1f [Davies Liu] handle null in schemaRDD()
    • [SPARK-2157] Enable tight firewall rules for Spark · 09f7e458
      Andrew Or authored
      The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107.
      
      The list covered here may or may not be the complete set of ports needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs.
      
      My spark-env.sh looks like this:
      ```
      export SPARK_MASTER_PORT=6060
      export SPARK_WORKER_PORT=7070
      export SPARK_MASTER_WEBUI_PORT=9090
      export SPARK_WORKER_WEBUI_PORT=9091
      ```
      and my spark-defaults.conf looks like this:
      ```
      spark.master spark://andrews-mbp:6060
      spark.driver.port 5001
      spark.fileserver.port 5011
      spark.broadcast.port 5021
      spark.replClassServer.port 5031
      spark.blockManager.port 5041
      spark.executor.port 5051
      ```
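
The squashed commits below mention retrying on port collisions, a `spark.ports.maxRetries` setting, and special handling of port 0. The following is a simplified sketch of that retry idea, not Spark's implementation; `start_service_on_port`, `fake_bind`, and the `taken` set are hypothetical names used only for illustration.

```python
import errno


def start_service_on_port(start_port, start_service, max_retries=3):
    """Try start_port, start_port + 1, ... until start_service succeeds.

    `start_service(port)` is a hypothetical callable that binds a service
    and returns the port it bound to. It is expected to raise OSError with
    errno EADDRINUSE when the port is already taken.
    """
    for offset in range(max_retries + 1):
        # Port 0 means "let the OS pick", so it is never incremented.
        port = 0 if start_port == 0 else start_port + offset
        try:
            return start_service(port)
        except OSError as exc:
            if exc.errno != errno.EADDRINUSE:
                raise  # only "address already in use" is retried
    raise OSError(errno.EADDRINUSE,
                  "no free port after %d retries" % max_retries)


taken = {5041, 5042}  # pretend these ports are occupied


def fake_bind(port):
    if port in taken:
        raise OSError(errno.EADDRINUSE, "Address already in use")
    return port


print(start_service_on_port(5041, fake_bind))  # prints 5043
```

Because retries walk upward from the configured port, a firewall rule for such a service would need to allow the range from the configured port up to that port plus the retry limit.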
      
      Author: Andrew Or <andrewor14@gmail.com>
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #1777 from andrewor14/configure-ports and squashes the following commits:
      
      621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      8a6b820 [Andrew Or] Use a random UI port during tests
      7da0493 [Andrew Or] Fix tests
      523c30e [Andrew Or] Add test for isBindCollision
      b97b02a [Andrew Or] Minor fixes
      c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      93d359f [Andrew Or] Executors connect to wrong port when collision occurs
      d502e5f [Andrew Or] Handle port collisions when creating Akka systems
      a2dd05c [Andrew Or] Patrick's comment nit
      86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port
      1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode
      cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.)
      e837cde [Andrew Or] Remove outdated TODOs
      bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      de1b207 [Andrew Or] Update docs to reflect new ports
      b565079 [Andrew Or] Add spark.ports.maxRetries
      2551eb2 [Andrew Or] Remove spark.worker.watcher.port
      151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      9868358 [Andrew Or] Add a few miscellaneous ports
      6016e77 [Andrew Or] Add spark.executor.port
      8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT
      4d9e6f3 [Andrew Or] Fix super subtle bug
      3f8e51b [Andrew Or] Correct erroneous docs...
      e111d08 [Andrew Or] Add names for UI services
      470f38c [Andrew Or] Special case non-"Address already in use" exceptions
      1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port
      ba32280 [Andrew Or] Minor fixes
      6b550b0 [Andrew Or] Assorted fixes
      73fbe89 [Andrew Or] Move start service logic to Utils
      ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports
      038a579 [Andrew Ash] Trust the server start function to report the port the service started on
      7c5bdc4 [Andrew Ash] Fix style issue
      0347aef [Andrew Ash] Unify port fallback logic to a single place
      24a4c32 [Andrew Ash] Remove type on val to match surrounding style
      9e4ad96 [Andrew Ash] Reformat for style checker
      5d84e0e [Andrew Ash] Document new port configuration options
      066dc7a [Andrew Ash] Fix up HttpServer port increments
      cad16da [Andrew Ash] Add fallover increment logic for HttpServer
      c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment
      b80d2fd [Andrew Ash] Make Spark's block manager port configurable
      17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server
      f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast
      49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer
      1c0981a [Andrew Ash] Make port in HttpServer configurable
    • [SPARK-1022][Streaming][HOTFIX] Fixed zookeeper dependency of Kafka · ee7f3085
      Tathagata Das authored
      https://github.com/apache/spark/pull/1751 caused maven builds to fail.
      
      ```
      ~/Apache/spark(branch-1.1|:heavy_check_mark:) ➤ mvn -U -DskipTests clean install
      .
      .
      .
      [error] Apache/spark/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala:36: object NIOServerCnxnFactory is not a member of package org.apache.zookeeper.server
      [error] import org.apache.zookeeper.server.NIOServerCnxnFactory
      [error]        ^
      [error] Apache/spark/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala:199: not found: type NIOServerCnxnFactory
      [error]     val factory = new NIOServerCnxnFactory()
      [error]                       ^
      [error] two errors found
      [error] Compile failed at Aug 5, 2014 1:42:36 PM [0.503s]
      ```
      
      The problem is how SBT and Maven resolve multiple versions of the same library, which in this case is ZooKeeper. Comparing the dependency trees from Maven and SBT showed this. Spark depends on ZK 3.4.5, whereas Apache Kafka transitively depends on ZK 3.3.4. SBT evicts 3.3.4 and uses the higher version, 3.4.5, but Maven sticks to the closest (in the tree) dependent version, 3.3.4, and 3.3.4 does not have NIOServerCnxnFactory.
      
      The solution in this patch excludes zookeeper from the apache-kafka dependency in streaming-kafka module so that it just inherits zookeeper from Spark core.
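
The two resolution strategies described above can be sketched as follows. This is a toy model, not either tool's resolver; the function names are hypothetical and the tree depths are illustrative assumptions, chosen only so that 3.3.4 is the closer declaration as the message states.

```python
def maven_nearest_wins(candidates):
    """Pick the version declared closest to the root of the dependency tree."""
    # min() keeps the first candidate on ties, i.e. first declaration wins.
    return min(candidates, key=lambda c: c[1])[0]


def sbt_latest_wins(candidates):
    """Pick the highest version and evict the others."""
    def version_key(candidate):
        return tuple(int(part) for part in candidate[0].split("."))
    return max(candidates, key=version_key)[0]


# (version, depth in the dependency tree); depths are assumed for illustration
zookeeper = [("3.3.4", 2), ("3.4.5", 3)]

print(maven_nearest_wins(zookeeper))  # prints 3.3.4
print(sbt_latest_wins(zookeeper))     # prints 3.4.5
```

Excluding the transitive ZooKeeper dependency from Kafka removes the 3.3.4 candidate entirely, so both strategies end up with 3.4.5.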
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1797 from tdas/kafka-zk-fix and squashes the following commits:
      
      94b3931 [Tathagata Das] Fixed zookeeper dependency of Kafka
    • [MLlib] Use this.type as return type in k-means' builder pattern · c7b52010
      DB Tsai authored
      to ensure that the returned object is the instance itself.
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #1796 from dbtsai/dbtsai-kmeans and squashes the following commits:
      
      658989e [DB Tsai] Alpine Data Labs
    • SPARK-2294: fix locality inversion bug in TaskManager · 63bdb1f4
      CodingCat authored
      copied from original JIRA (https://issues.apache.org/jira/browse/SPARK-2294):
      
      If an executor E is free, a task may be speculatively assigned to E when there are other tasks in the job that have not been launched (at all) yet. Similarly, a task without any locality preferences may be assigned to E when there was another NODE_LOCAL task that could have been scheduled.
      This happens because TaskSchedulerImpl calls TaskSetManager.resourceOffer (which in turn calls TaskSetManager.findTask) with increasing locality levels, beginning with PROCESS_LOCAL, followed by NODE_LOCAL, and so on until the highest currently allowed level. Now, suppose NODE_LOCAL is the highest currently allowed locality level. The first time findTask is called, it will be called with max level PROCESS_LOCAL; if it cannot find any PROCESS_LOCAL tasks, it will try to schedule tasks with no locality preferences or speculative tasks. As a result, speculative tasks or tasks with no preferences may be scheduled instead of NODE_LOCAL tasks.
      
      ----
      
      I added an additional parameter, maxLocality, to resourceOffer and findTask, indicating when we should consider tasks without a locality preference.
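
The inversion and the effect of a maximum locality level can be sketched with a toy scheduler. This is a simplified model rather than the TaskSetManager code; the function names are hypothetical, and ranking the no-preference level after NODE_LOCAL is an assumption made for this example.

```python
# Locality levels in increasing order; the position of NO_PREF is an assumption.
PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = range(5)


def buggy_find_task(pending, max_locality):
    """Old behaviour (simplified): no-preference tasks are always eligible."""
    for name, level in pending:
        if level == NO_PREF or level <= max_locality:
            return name
    return None


def find_task(pending, max_locality):
    """Fixed behaviour (simplified): a no-preference task is eligible only
    once max_locality has reached the NO_PREF level."""
    for name, level in pending:
        if level <= max_locality:
            return name
    return None


def schedule(pending, finder, highest_allowed):
    """Offer one free executor at increasing locality levels."""
    for max_locality in range(PROCESS_LOCAL, highest_allowed + 1):
        task = finder(pending, max_locality)
        if task is not None:
            return task
    return None


pending = [("node-local-task", NODE_LOCAL), ("no-pref-task", NO_PREF)]
print(schedule(pending, buggy_find_task, NODE_LOCAL))  # no-pref-task
print(schedule(pending, find_task, NODE_LOCAL))        # node-local-task
```

In the buggy variant the very first offer, at PROCESS_LOCAL, already hands out the no-preference task, so the NODE_LOCAL task never gets its turn on this executor.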
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #1313 from CodingCat/SPARK-2294 and squashes the following commits:
      
      bf3f13b [CodingCat] rollback some forgotten changes
      89f9bc0 [CodingCat] address matei's comments
      18cae02 [CodingCat] add test case for node-local tasks
      2ba6195 [CodingCat] fix failed test cases
      87dd09e [CodingCat] fix style
      9b9432f [CodingCat] remove hasNodeLocalOnlyTasks
      fdd1573 [CodingCat] fix failed test cases
      941a4fd [CodingCat] see my shocked face..........
      f600085 [CodingCat] remove hasNodeLocalOnlyTasks checking
      0b8a46b [CodingCat] test whether hasNodeLocalOnlyTasks affect the results
      73ceda8 [CodingCat] style fix
      b3a430b [CodingCat] remove fine granularity tracking for node-local only tasks
      f9a2ad8 [CodingCat] simplify the logic in TaskSchedulerImpl
      c8c1de4 [CodingCat] simplify the patch
      be652ed [CodingCat] avoid unnecessary delay when we only have nopref tasks
      dee9e22 [CodingCat] fix locality inversion bug in TaskManager by moving nopref branch
    • [SQL] Fix logging warn -> debug · 5a826c00
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1800 from marmbrus/warning and squashes the following commits:
      
      8ea9cf1 [Michael Armbrust] [SQL] Fix logging warn -> debug.
    • [SQL] Tighten the visibility of various SQLConf methods and renamed setter/getters · b70bae40
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1794 from rxin/sql-conf and squashes the following commits:
      
      3ac11ef [Reynold Xin] getAllConfs return an immutable Map instead of an Array.
      4b19d6c [Reynold Xin] Tighten the visibility of various SQLConf methods and renamed setter/getters.
  3. Aug 05, 2014
    • [SPARK-2806] core - upgrade to json4s-jackson 3.2.10 · 82624e2c
      Anand Avati authored
      Scala 2.11 packages are not available for the current version (3.2.6).
      
      Signed-off-by: Anand Avati <avati@redhat.com>
      
      Author: Anand Avati <avati@redhat.com>
      
      Closes #1702 from avati/SPARK-1812-json4s-jackson-3.2.10 and squashes the following commits:
      
      7be8324 [Anand Avati] SPARK-1812: core - upgrade to json4s 3.2.10
    • [SPARK-2866][SQL] Support attributes in ORDER BY that aren't in SELECT · 1d70c4f6
      Michael Armbrust authored
      Minor refactoring to allow resolution using either a node's input or its output.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1795 from marmbrus/ordering and squashes the following commits:
      
      237f580 [Michael Armbrust] style
      74d833b [Michael Armbrust] newline
      705d963 [Michael Armbrust] Add a rule for resolving ORDER BY expressions that reference attributes not present in the SELECT clause.
      82cabda [Michael Armbrust] Generalize attribute resolution.
    • [SPARK-2854][SQL] Finalize _acceptable_types in pyspark.sql · 69ec678d
      Yin Huai authored
      This PR aims to finalize accepted data value types in Python RDDs provided to Python `applySchema`.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-2854
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1793 from yhuai/SPARK-2854 and squashes the following commits:
      
      32f0708 [Yin Huai] LongType only accepts long values.
      c2b23dd [Yin Huai] Do data type conversions based on the specified Spark SQL data type.
    • [SPARK-2650][SQL] Try to partially fix SPARK-2650 by adjusting initial buffer size and reducing memory allocation · d0ae3f39
      Cheng Lian authored
      JIRA issue: [SPARK-2650](https://issues.apache.org/jira/browse/SPARK-2650)
      
      Please refer to [comments](https://issues.apache.org/jira/browse/SPARK-2650?focusedCommentId=14084397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14084397) of SPARK-2650 for some other details.
      
      This PR adjusts the initial in-memory columnar buffer size to 1MB, same as the default value of Shark's `shark.column.partitionSize.mb` property when running in local mode. Will add Shark style partition size estimation in another PR.
      
      Also, before this PR, `NullableColumnBuilder` copies the whole buffer to add the null positions section, and then `CompressibleColumnBuilder` copies and compresses the buffer again, even if compression is disabled (`PassThrough` compression scheme is used to disable compression). In this PR the first buffer copy is eliminated to reduce memory consumption.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1769 from liancheng/spark-2650 and squashes the following commits:
      
      88a042e [Cheng Lian] Fixed method visibility and removed dead code
      001f2e5 [Cheng Lian] Try fixing SPARK-2650 by adjusting initial buffer size and reducing memory allocation
    • [sql] rename project name in pom.xml of hive-thriftserver module · d94f5990
      wangfei authored
      The modules spark-hive-thriftserver_2.10 and spark-hive_2.10 are both named "Spark Project Hive" in pom.xml, so rename the spark-hive-thriftserver_2.10 project to "Spark Project Hive Thrift Server".
      
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #1789 from scwf/patch-1 and squashes the following commits:
      
      ca1f5e9 [wangfei] [sql] rename module name of hive-thriftserver
    • SPARK-2869 - Fix tiny bug in JdbcRdd for closing jdbc connection · 2643e660
      Stephen Boesch authored
      I inquired on the dev mailing list about the motivation for checking the JDBC statement instead of the connection in the close() logic of JdbcRDD. Ted Yu believes there essentially is none; it is a simple cut-and-paste issue. So here is the tiny fix to patch it.
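
The kind of slip described above can be sketched as follows. This is an illustration under assumptions, not the JdbcRDD source: `FakeResource`, `close_buggy`, and `close_fixed` are hypothetical names, and the exact shape of the original check is assumed from the description.

```python
class FakeResource:
    """Minimal stand-in for a JDBC statement or connection."""

    def __init__(self):
        self.closed = False

    def is_closed(self):
        return self.closed

    def close(self):
        self.closed = True


def close_buggy(conn, stmt):
    if stmt is not None and not stmt.is_closed():
        stmt.close()
    # Copy-and-paste slip: the statement's state guards the connection,
    # and the statement was just closed above, so this never runs.
    if conn is not None and not stmt.is_closed():
        conn.close()


def close_fixed(conn, stmt):
    if stmt is not None and not stmt.is_closed():
        stmt.close()
    # Check the connection's own state before closing it.
    if conn is not None and not conn.is_closed():
        conn.close()


conn, stmt = FakeResource(), FakeResource()
close_buggy(conn, stmt)
print(conn.is_closed())  # False: the connection leaked

close_fixed(conn, stmt)
print(conn.is_closed())  # True
```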
      
      Author: Stephen Boesch <javadba>
      Author: Stephen Boesch <javadba@gmail.com>
      
      Closes #1792 from javadba/closejdbc and squashes the following commits:
      
      363be4f [Stephen Boesch] SPARK-2869 - Fix tiny bug in JdbcRdd for closing jdbc connection (reformat with braces)
      6518d36 [Stephen Boesch] SPARK-2689 Fix tiny bug in JdbcRdd for closing jdbc connection
      3fb23ed [Stephen Boesch] SPARK-2689 Fix potential leak of connection/PreparedStatement in case of error in JdbcRDD
      095b2c9 [Stephen Boesch] Fix tiny bug (likely copy and paste error) in closing jdbc connection
    • [SPARK-2550][MLLIB][APACHE SPARK] Support regularization and intercept in pyspark's linear methods · 1aad9114
      Michael Giannakopoulos authored
      Related to Jira Issue: [SPARK-2550](https://issues.apache.org/jira/browse/SPARK-2550?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20priority%20%3D%20Major%20ORDER%20BY%20key%20DESC)
      
      Author: Michael Giannakopoulos <miccagiann@gmail.com>
      
      Closes #1775 from miccagiann/linearMethodsReg and squashes the following commits:
      
      cb774c3 [Michael Giannakopoulos] MiniBatchFraction added in related PythonMLLibAPI java stubs.
      81fcbc6 [Michael Giannakopoulos] Fixing a typo-error.
      8ad263e [Michael Giannakopoulos] Adding regularizer type and intercept parameters to LogisticRegressionWithSGD and SVMWithSGD.
    • [SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.kb) to 32KB. · acff9a7f
      Reynold Xin authored
      This can substantially reduce memory usage during shuffle.
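
A back-of-the-envelope estimate shows why the buffer size matters. The sketch assumes that each concurrently running map task keeps one output buffer per reduce partition, and that the previous default was 100 KB; both are assumptions for illustration, as are the task and partition counts.

```python
def shuffle_buffer_bytes(concurrent_tasks, reducers, buffer_kb):
    """Estimate buffer memory: one buffer per (running task, reducer) pair."""
    return concurrent_tasks * reducers * buffer_kb * 1024


# Hypothetical workload: 8 concurrent map tasks, 1000 reduce partitions.
old = shuffle_buffer_bytes(8, 1000, 100)  # assumed previous default
new = shuffle_buffer_bytes(8, 1000, 32)   # value set by this commit

print(old / 2 ** 20)  # 781.25 (MiB)
print(new / 2 ** 20)  # 250.0 (MiB)
```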
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1781 from rxin/SPARK-2503-spark.shuffle.file.buffer.kb and squashes the following commits:
      
      104b8d8 [Reynold Xin] [SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.kb) to 32KB.
    • [SPARK-2864][MLLIB] fix random seed in word2vec; move model to local · cc491f69
      Xiangrui Meng authored
      It also moves the model to local in order to map `RDD[String]` to `RDD[Vector]`.
      
      Ishiihara
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1790 from mengxr/word2vec-fix and squashes the following commits:
      
      a87146c [Xiangrui Meng] add setters and make a default constructor
      e5c923b [Xiangrui Meng] fix random seed in word2vec; move model to local
    • SPARK-1680: use configs for specifying environment variables on YARN · 41e0a21b
      Thomas Graves authored
      Note that this also documents spark.executorEnv.*, which to me means it's public. If we don't want that, please speak up.
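
A minimal sketch of how prefixed config entries could be turned into environment variables. The helper name `executor_env` and the sample values are hypothetical; only the `spark.executorEnv.` prefix comes from the commit message.

```python
EXECUTOR_ENV_PREFIX = "spark.executorEnv."


def executor_env(conf):
    """Return {ENV_NAME: value} for every key that starts with the prefix."""
    return {
        key[len(EXECUTOR_ENV_PREFIX):]: value
        for key, value in conf.items()
        if key.startswith(EXECUTOR_ENV_PREFIX)
    }


# Hypothetical configuration: only the prefixed entry becomes an env variable.
sample_conf = {
    "spark.executorEnv.JAVA_HOME": "/opt/jdk",
    "spark.app.name": "demo",
}
print(executor_env(sample_conf))
```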
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1512 from tgravescs/SPARK-1680 and squashes the following commits:
      
      11525df [Thomas Graves] more doc changes
      553bad0 [Thomas Graves] fix documentation
      152bf7c [Thomas Graves] fix docs
      5382326 [Thomas Graves] try fix docs
      32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN
      41e0a21b
    • Patrick Wendell's avatar
      SPARK-2380: Support displaying accumulator values in the web UI · 74f82c71
      Patrick Wendell authored
      This patch adds support for giving accumulators user-visible names and displaying accumulator values in the web UI. This allows users to create custom counters that can display in the UI. The current approach displays both the accumulator deltas caused by each task and a "current" value of the accumulator totals for each stage, which is updated as tasks finish.
      
      Currently in Spark, developers have been extending the `TaskMetrics` functionality to provide custom instrumentation for RDDs. This patch provides a potentially nicer alternative: going through the existing accumulator framework (actually, `TaskMetrics` and accumulators are on an awkward collision course as we add more features to the former). The current patch demos how we can use the feature to provide instrumentation for RDD input sizes. The nice thing about going through accumulators is that users can read the current value of the data being tracked in their programs. This could be useful, e.g., to decide to short-circuit a Spark stage depending on how things are going.
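
The bookkeeping described above (per-task deltas plus a running per-stage total) can be sketched as follows. This is an illustration only: the class `StageAccumulators`, its method names, and the counter name are hypothetical and are not Spark's API.

```python
from collections import defaultdict


class StageAccumulators:
    """Tracks named accumulator values for one stage (illustrative only)."""

    def __init__(self):
        self.totals = defaultdict(int)  # accumulator name -> running stage total
        self.task_deltas = {}           # task id -> {accumulator name: delta}

    def on_task_end(self, task_id, deltas):
        # Remember what this task contributed, then fold it into the totals.
        self.task_deltas[task_id] = dict(deltas)
        for name, delta in deltas.items():
            self.totals[name] += delta


stage = StageAccumulators()
stage.on_task_end(0, {"input bytes": 128})
stage.on_task_end(1, {"input bytes": 64})
print(stage.task_deltas[1], stage.totals["input bytes"])
```

Keeping the deltas separate from the totals is what lets a UI show both views at once.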
      
      ![counters](https://cloud.githubusercontent.com/assets/320616/3488815/6ee7bc34-0505-11e4-84ce-e36d9886e2cf.png)
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #1309 from pwendell/metrics and squashes the following commits:
      
      8815308 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into HEAD
      93fbe0f [Patrick Wendell] Other minor fixes
      cc43f68 [Patrick Wendell] Updating unit tests
      c991b1b [Patrick Wendell] Moving some code into the Accumulators class
      9a9ba3c [Patrick Wendell] More merge fixes
      c5ace9e [Patrick Wendell] More merge conflicts
      1da15e3 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into metrics
      9860c55 [Patrick Wendell] Potential solution to posting listener events
      0bb0e33 [Patrick Wendell] Remove "display" variable and assume display = name.isDefined
      0ec4ac7 [Patrick Wendell] Java API's
      e95bf69 [Patrick Wendell] Stash
      be97261 [Patrick Wendell] Style fix
      8407308 [Patrick Wendell] Removing examples in Hadoop and RDD class
      64d405f [Patrick Wendell] Adding missing file
      5d8b156 [Patrick Wendell] Changes based on Kay's review.
      9f18bad [Patrick Wendell] Minor style changes and tests
      7a63abc [Patrick Wendell] Adding Json serialization and responding to Reynold's feedback
      ad85076 [Patrick Wendell] Example of using named accumulators for custom RDD metrics.
      0b72660 [Patrick Wendell] Initial WIP example of supporing globally named accumulators.
      74f82c71
    • Guancheng (G.C.) Chen's avatar
      [SPARK-2859] Update url of Kryo project in related docs · ac3440f4
      Guancheng (G.C.) Chen authored
      JIRA Issue: https://issues.apache.org/jira/browse/SPARK-2859
      
      The Kryo project has migrated from Google Code to GitHub, so its URL needs to be updated in related docs such as tuning.md.
      
      Author: Guancheng (G.C.) Chen <chenguancheng@gmail.com>
      
      Closes #1782 from gchen/kryo-docs and squashes the following commits:
      
      b62543c [Guancheng (G.C.) Chen] update url of Kryo project
      ac3440f4
    • Michael Armbrust's avatar
      [SPARK-2860][SQL] Fix coercion of CASE WHEN. · 6e821e3d
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1785 from marmbrus/caseNull and squashes the following commits:
      
      126006d [Michael Armbrust] better error message
      2fe357f [Michael Armbrust] Fix coercion of CASE WHEN.
      6e821e3d
    • Thomas Graves's avatar
      SPARK-1890 and SPARK-1891- add admin and modify acls · 1c5555a2
      Thomas Graves authored
      It was easier to combine these two JIRAs since they touch many of the same places. This PR does the following:
      
      - adds modify acls
      - adds admin acls (a list of admins/users that get added to both the view and modify acls)
      - modifies the Kill button on the UI to take modify acls into account
      - changes the config name spark.ui.acls.enable to spark.acls.enable, since I chose poorly with the original name. We keep backwards compatibility, so people can still use spark.ui.acls.enable. The acls should apply to any web UI as well as any CLI interfaces.
      - sends view and modify acl information on to YARN so that YARN interfaces can use it (the yarn CLI for killing applications, for example).
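
A simplified sketch of the permission logic in the list above. `spark.acls.enable` and `spark.ui.acls.enable` are named in the commit message; the list-valued keys `spark.modify.acls`, `spark.ui.view.acls`, and `spark.admin.acls`, and all function names, are assumptions made for illustration. Special cases such as the user who launched the application are ignored.

```python
def parse_users(value):
    """Split a comma-separated user list into a set, ignoring blanks."""
    return {user.strip() for user in value.split(",") if user.strip()}


def acls_enabled(conf):
    # The new key wins; the old key is still honoured for backwards compatibility.
    value = conf.get("spark.acls.enable", conf.get("spark.ui.acls.enable", "false"))
    return value.lower() == "true"


def is_allowed(user, conf, acl_key):
    if not acls_enabled(conf):
        return True
    # Admins are implicitly members of both the view and the modify acls.
    allowed = parse_users(conf.get(acl_key, "")) | parse_users(conf.get("spark.admin.acls", ""))
    return user in allowed


def can_modify(user, conf):
    return is_allowed(user, conf, "spark.modify.acls")  # assumed key name


def can_view(user, conf):
    return is_allowed(user, conf, "spark.ui.view.acls")  # assumed key name


# Hypothetical configuration that still uses the old enable key.
acl_conf = {
    "spark.ui.acls.enable": "true",
    "spark.modify.acls": "alice",
    "spark.admin.acls": "root",
}
print(can_modify("alice", acl_conf), can_modify("bob", acl_conf), can_view("root", acl_conf))
```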
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1196 from tgravescs/SPARK-1890 and squashes the following commits:
      
      8292eb1 [Thomas Graves] review comments
      b92ec89 [Thomas Graves] remove unneeded variable from applistener
      4c765f4 [Thomas Graves] Add in admin acls
      72eb0ac [Thomas Graves] Add modify acls
      1c5555a2
    • Thomas Graves's avatar
      SPARK-1528 - spark on yarn, add support for accessing remote HDFS · 2c0f705e
      Thomas Graves authored
      Add a config (spark.yarn.access.namenodes) to allow applications running on YARN to access other secure HDFS clusters. The user just specifies the namenodes of the other clusters, and we get tokens for those and ship them with the Spark application.
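
A small sketch of reading such a setting, assuming the value is a comma-separated list of namenode URIs. The helper name and the host names below are hypothetical, and the token fetching itself is not shown.

```python
def parse_namenodes(value):
    """Split the config value into a list of namenode URIs, ignoring blanks."""
    return [nn.strip() for nn in value.split(",") if nn.strip()]


# Hypothetical value for spark.yarn.access.namenodes.
namenodes = parse_namenodes("hdfs://nn1.example.com:8020, hdfs://nn2.example.com:8020")
for nn in namenodes:
    # A real client would request a delegation token for each of these.
    print("would fetch token for", nn)
```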
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1159 from tgravescs/spark-1528 and squashes the following commits:
      
      ddbcd16 [Thomas Graves] review comments
      0ac8501 [Thomas Graves] SPARK-1528 - add support for accessing remote HDFS
      2c0f705e
    • jerryshao's avatar
      [SPARK-1022][Streaming] Add Kafka real unit test · e87075df
      jerryshao authored
      This PR is an updated version of https://github.com/apache/spark/pull/557 that actually tests sending and receiving data through Kafka and fixes the previous flakiness.
      
      @tdas, would you mind reviewing this PR? Thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #1751 from jerryshao/kafka-unit-test and squashes the following commits:
      
      b6a505f [jerryshao] code refactor according to comments
      5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it
      5525f10 [jerryshao] Fix flaky issue of Kafka real unit test
      4559310 [jerryshao] Minor changes for Kafka unit test
      860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness
      796d4ca [jerryshao] Add real Kafka streaming test
      e87075df
    • Reynold Xin's avatar
      [SPARK-2856] Decrease initial buffer size for Kryo to 64KB. · 184048f8
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1780 from rxin/kryo-init-size and squashes the following commits:
      
      551b935 [Reynold Xin] [SPARK-2856] Decrease initial buffer size for Kryo to 64KB.
      184048f8
    • wangfei's avatar
      [SPARK-1779] Throw an exception if memory fractions are not between 0 and 1 · 9862c614
      wangfei authored
      Author: wangfei <scnbwf@yeah.net>
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #714 from scwf/memoryFraction and squashes the following commits:
      
      6e385b9 [wangfei] Update SparkConf.scala
      da6ee59 [wangfei] add configs
      829a195 [wangfei] add indent
      717c0ca [wangfei] updated to make more concise
      fc45476 [wangfei] validate memoryfraction in sparkconf
      2e79b3d [wangfei] && => ||
      43621bd [wangfei] && => ||
      cf38bcf [wangfei] throw IllegalArgumentException
      14d18ac [wangfei] throw IllegalArgumentException
      dff1f0f [wangfei] Update BlockManager.scala
      764965f [wangfei] Update ExternalAppendOnlyMap.scala
      a59d76b [wangfei] Throw exception when memoryFracton is out of range
      7b899c2 [wangfei] 【SPARK-1779】
      9862c614
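
The range check this commit describes can be sketched as below. This is a Python illustration rather than the Scala change itself; the function name is hypothetical, the bounds are assumed to be inclusive, and `ValueError` stands in for `IllegalArgumentException`. The "&& => ||" commits above hint at the pitfall: the two bounds must be joined with "or", since no value is below 0 and above 1 at once.

```python
def validate_memory_fraction(key, value):
    """Return the fraction as a float, or raise if it lies outside [0, 1]."""
    fraction = float(value)
    # "or", not "and": a value can never be both below 0 and above 1.
    if fraction < 0 or fraction > 1:
        raise ValueError(f"{key} should be between 0 and 1 (was '{value}').")
    return fraction


print(validate_memory_fraction("spark.storage.memoryFraction", "0.6"))
```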
    • Andrew Or's avatar
      [SPARK-2857] Correct properties to set Master / Worker ports · a646a365
      Andrew Or authored
      `master.ui.port` and `worker.ui.port` were never picked up by SparkConf, simply because they are not prefixed with "spark." Unfortunately, this is also currently the documented way of setting these values.
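
A minimal sketch of the behaviour described above, assuming the configuration object only copies properties whose names start with "spark.". The function name and the prefixed key shown are illustrative assumptions, not taken from the commit.

```python
def load_spark_properties(system_properties):
    """Keep only the properties a "spark."-prefixed loader would pick up."""
    return {
        key: value
        for key, value in system_properties.items()
        if key.startswith("spark.")
    }


# Hypothetical properties: the unprefixed key is silently ignored.
props = {"master.ui.port": "8080", "spark.master.ui.port": "8081"}
print(load_spark_properties(props))
```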
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1779 from andrewor14/master-worker-port and squashes the following commits:
      
      8475e95 [Andrew Or] Update docs to reflect changes in configs
      4db3d5d [Andrew Or] Stop using configs that don't actually work
      a646a365
    • Matei Zaharia's avatar
      SPARK-2711. Create a ShuffleMemoryManager to track memory for all spilling collections · 4fde28c2
      Matei Zaharia authored
      This tracks memory properly when there are multiple spilling collections in the same task (which was a problem before). It also implements an algorithm that lets each thread grow up to 1 / 2N of the memory pool (where N is the number of threads) before spilling. This avoids an inefficiency we had before with small spills, where some threads would spill many times at 0-1 MB because the pool was allocated elsewhere.
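
A single-process bookkeeping sketch of the idea, not the actual `ShuffleMemoryManager`. The class and method names are hypothetical, there is no locking or blocking, and "wait instead of spilling" is represented only by `should_spill` returning False. It shows partial grants and the 1 / 2N floor below which a thread is not asked to spill.

```python
class ShuffleMemoryPool:
    """Tracks how many bytes each thread holds out of a shared pool."""

    def __init__(self, pool_bytes):
        self.pool_bytes = pool_bytes
        self.held = {}  # thread id -> bytes currently granted

    def fair_floor(self):
        # 1 / 2N of the pool, where N is the number of registered threads.
        return self.pool_bytes // (2 * max(len(self.held), 1))

    def try_acquire(self, thread_id, num_bytes):
        """Grant up to num_bytes; the grant may be partial or zero."""
        self.held.setdefault(thread_id, 0)
        free = self.pool_bytes - sum(self.held.values())
        granted = min(num_bytes, free)
        self.held[thread_id] += granted
        return granted

    def should_spill(self, thread_id):
        # Only ask a thread to spill once it holds at least its 1 / 2N floor.
        return self.held.get(thread_id, 0) >= self.fair_floor()

    def release_all(self, thread_id):
        self.held.pop(thread_id, None)


pool = ShuffleMemoryPool(100)
got_a = pool.try_acquire("a", 60)  # fully granted
got_b = pool.try_acquire("b", 60)  # partially granted: only 40 bytes are free
got_c = pool.try_acquire("c", 10)  # nothing left to grant
print(got_a, got_b, got_c, pool.should_spill("b"), pool.should_spill("c"))
```

In this run thread "c" holds nothing, so spilling would be pointless; a fuller implementation would have it wait until other threads release memory.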
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #1707 from mateiz/spark-2711 and squashes the following commits:
      
      debf75b [Matei Zaharia] Review comments
      24f28f3 [Matei Zaharia] Small rename
      c8f3a8b [Matei Zaharia] Update ShuffleMemoryManager to be able to partially grant requests
      315e3a5 [Matei Zaharia] Some review comments
      b810120 [Matei Zaharia] Create central manager to track memory for all spilling collections
      4fde28c2