  1. Oct 30, 2014
    • [SPARK-4027][Streaming] WriteAheadLogBackedBlockRDD to read received either from BlockManager or WAL in HDFS · fb1fbca2
      Tathagata Das authored
      [SPARK-4027][Streaming] WriteAheadLogBackedBlockRDD to read received either from BlockManager or WAL in HDFS
      
      As part of the initiative to prevent data loss on streaming driver failure, this sub-task implements a BlockRDD that is backed by HDFS. This BlockRDD can either read data from Spark's BlockManager, or read it from file segments in the write ahead log in HDFS.
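      A minimal sketch of the fallback pattern this describes; the names below (`BlockStore`, `WALReader`, `readPartition`) are illustrative stand-ins, not the actual Spark classes:
      
      ```
      // Hypothetical stand-ins for Spark's BlockManager and the HDFS WAL reader.
      case class WALSegment(path: String, offset: Long, length: Int)
      
      trait BlockStore {
        def get(blockId: String): Option[Iterator[Array[Byte]]]
      }
      
      trait WALReader {
        def read(segment: WALSegment): Iterator[Array[Byte]]
      }
      
      // Each partition carries both its block id and the WAL segment holding a copy.
      def readPartition(
          store: BlockStore,
          wal: WALReader,
          blockId: String,
          segment: WALSegment): Iterator[Array[Byte]] = {
        // Fast path: the received block may still be in the BlockManager.
        store.get(blockId).getOrElse {
          // Recovery path (e.g. after driver restart): re-read the exact
          // byte range from the write ahead log segment in HDFS.
          wal.read(segment)
        }
      }
      ```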
      
      Most of this code has been written by @harishreedharan
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #2931 from tdas/driver-ha-rdd and squashes the following commits:
      
      209e49c [Tathagata Das] Better fix to style issue.
      4a5866f [Tathagata Das] Addressed one more comment.
      ed5fbf0 [Tathagata Das] Minor updates.
      b0a18b1 [Tathagata Das] Fixed import order.
      20aa7c6 [Tathagata Das] Fixed more line length issues.
      29aa099 [Tathagata Das] Fixed line length issues.
      9e47b5b [Tathagata Das] Renamed class, simplified+added unit tests.
      6e1bfb8 [Tathagata Das] Tweaks testuite to create spark contxt lazily to prevent contxt leaks.
      9c86a61 [Tathagata Das] Merge pull request #22 from harishreedharan/driver-ha-rdd
      2878c38 [Hari Shreedharan] Shutdown spark context after tests. Formatting/minor fixes
      c709f2f [Tathagata Das] Merge pull request #21 from harishreedharan/driver-ha-rdd
      5cce16f [Hari Shreedharan] Make sure getBlockLocations uses offset and length to find the blocks on HDFS
      eadde56 [Tathagata Das] Transferred HDFSBackedBlockRDD for the driver-ha-working branch
    • [SPARK-4028][Streaming] ReceivedBlockHandler interface to abstract the functionality of storage of received data · 234de923
      Tathagata Das authored
      [SPARK-4028][Streaming] ReceivedBlockHandler interface to abstract the functionality of storage of received data
      
      As part of the initiative to prevent data loss on streaming driver failure, this JIRA tracks the subtask of implementing a ReceivedBlockHandler that abstracts how received data blocks are stored. The default implementation maintains the current behavior of storing the data in the BlockManager. The optional implementation stores the data in both the BlockManager and a write ahead log.
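      The contract could look roughly like the following sketch; the names are assumed from the description above, not copied from the patch:
      
      ```
      trait ReceivedBlock
      trait ReceivedBlockStoreResult // metadata describing where the block went
      
      // One interface, two storage strategies: BlockManager only, or
      // BlockManager plus a write ahead log.
      trait ReceivedBlockHandler {
        /** Store a received block, returning implementation-specific metadata. */
        def storeBlock(blockId: String, block: ReceivedBlock): ReceivedBlockStoreResult
      
        /** Drop stored blocks older than the given threshold time (in ms). */
        def cleanupOldBlocks(threshTime: Long): Unit
      }
      ```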
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2940 from tdas/driver-ha-rbh and squashes the following commits:
      
      78a4aaa [Tathagata Das] Fixed bug causing test failures.
      f192f47 [Tathagata Das] Fixed import order.
      df5f320 [Tathagata Das] Updated code to use ReceivedBlockStoreResult as the return type for handler's storeBlock
      33c30c9 [Tathagata Das] Added license, and organized imports.
      2f025b3 [Tathagata Das] Updates based on PR comments.
      18aec1e [Tathagata Das] Moved ReceivedBlockInfo back into spark.streaming.scheduler package
      95a4987 [Tathagata Das] Added ReceivedBlockHandler and its associated tests
  2. Oct 29, 2014
    • [SPARK-4053][Streaming] Made the ReceiverSuite test more reliable, by fixing block generator throttling · 12342580
      Tathagata Das authored
      [SPARK-4053][Streaming] Made the ReceiverSuite test more reliable, by fixing block generator throttling
      
      In the unit test that checked whether blocks generated by the throttled block generator had the expected number of records, the thresholds were too tight, which sometimes led to the test failing.
      This PR fixes it by relaxing the thresholds and the time intervals for testing.
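      For illustration only (the numbers here are made up), the relaxed check amounts to accepting a tolerance band around the expected count rather than a near-exact match:
      
      ```
      // Accept any block whose record count is within +/-25% of the expected
      // rate, so scheduling jitter in the test JVM cannot push it over the line.
      val expectedRecordsPerBlock = 100.0
      val tolerance = 0.25
      def blockSizeOk(actual: Int): Boolean =
        math.abs(actual - expectedRecordsPerBlock) <= expectedRecordsPerBlock * tolerance
      ```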
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2900 from tdas/receiver-suite-flakiness and squashes the following commits:
      
      28508a2 [Tathagata Das] Made the ReceiverSuite test more reliable
  3. Oct 24, 2014
    • [SPARK-4080] Only throw IOException from [write|read][Object|External] · 6c98c29a
      Josh Rosen authored
      If classes implementing the Serializable or Externalizable interfaces throw
      exceptions other than IOException or ClassNotFoundException from their
      (de)serialization methods, this results in an unhelpful
      "IOException: unexpected exception type" rather than the actual exception that
      produced the (de)serialization error.
      
      This patch fixes this by adding a utility method that re-wraps any uncaught
      exceptions in IOException (unless they are already instances of IOException).
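      In sketch form (a simplified rendering of such a utility, not the exact code):
      
      ```
      import java.io.IOException
      
      // Run a (de)serialization body; let IOExceptions through untouched and
      // wrap anything else, so java.io surfaces the real cause instead of
      // "IOException: unexpected exception type".
      def tryOrIOException[T](block: => T): T = {
        try {
          block
        } catch {
          case e: IOException => throw e
          case e: Throwable   => throw new IOException(e)
        }
      }
      ```
      
      A `writeObject` or `readObject` implementation then simply wraps its body in the helper.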
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits:
      
      cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write|read][Object|External].
    • [SPARK-4026][Streaming] Write ahead log management · 6a40a768
      Hari Shreedharan authored
      As part of the effort to avoid data loss on Spark Streaming driver failure, we want to implement a write ahead log that can write received data to HDFS. This allows the received data to be persisted across driver failures, so when the streaming driver is restarted, it can find and reprocess all the data that was received but not processed.
      
      This was primarily implemented by @harishreedharan. This is still WIP, as he is going to improve the unit tests by using an HDFS mini cluster.
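      The manager's surface, under assumed names (inferred from the description, not copied from the patch):
      
      ```
      import java.nio.ByteBuffer
      
      // Handle returned by a write: enough to locate the record again in HDFS.
      case class FileSegment(path: String, offset: Long, length: Int)
      
      trait WriteAheadLogManager {
        /** Durably append one record and return where it was written. */
        def writeRecord(record: ByteBuffer): FileSegment
      
        /** Read a single record back, given its segment handle. */
        def readRecord(segment: FileSegment): ByteBuffer
      
        /** On driver restart, replay all records that were logged. */
        def readAll(): Iterator[ByteBuffer]
      
        /** Delete log files that contain only data older than the threshold. */
        def cleanupOldLogs(threshTime: Long): Unit
      }
      ```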
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2882 from tdas/driver-ha-wal and squashes the following commits:
      
      e4bee20 [Tathagata Das] Removed synchronized, Path.getFileSystem is threadsafe
      55514e2 [Tathagata Das] Minor changes based on PR comments.
      d29fddd [Tathagata Das] Merge pull request #20 from harishreedharan/driver-ha-wal
      a317a4d [Hari Shreedharan] Directory deletion should not fail tests
      9514dc8 [Tathagata Das] Added unit tests to test reading of corrupted data and other minor edits
      3881706 [Tathagata Das] Merge pull request #19 from harishreedharan/driver-ha-wal
      4705fff [Hari Shreedharan] Sort listed files by name. Use local files for WAL tests.
      eb356ca [Tathagata Das] Merge pull request #18 from harishreedharan/driver-ha-wal
      82ce56e [Hari Shreedharan] Fix file ordering issue in WALManager tests
      5ff90ee [Hari Shreedharan] Fix tests to not ignore ordering and also assert all data is present
      ef8db09 [Tathagata Das] Merge pull request #17 from harishreedharan/driver-ha-wal
      7e40e56 [Hari Shreedharan] Restore old build directory after tests
      587b876 [Hari Shreedharan] Fix broken test. Call getFileSystem only from synchronized method.
      b4be0c1 [Hari Shreedharan] Remove unused method
      edcbee1 [Hari Shreedharan] Tests reading and writing data using writers now use Minicluster.
      5c70d1f [Hari Shreedharan] Remove underlying stream from the WALWriter.
      4ab602a [Tathagata Das] Refactored write ahead stuff from streaming.storage to streaming.util
      b06be2b [Tathagata Das] Adding missing license.
      5182ffb [Hari Shreedharan] Added documentation
      172358d [Tathagata Das] Pulled WriteAheadLog-related stuff from tdas/spark/tree/driver-ha-working
  4. Oct 21, 2014
    • replace awaitTransformation with awaitTermination in scaladoc/javadoc · 2aeb84bc
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #2861 from holdenk/SPARK-4015-Documentation-in-the-streaming-context-references-non-existent-function and squashes the following commits:
      
      081db8a [Holden Karau] fix pyspark streaming doc too
      0e03863 [Holden Karau] replace awaitTransformation with awaitTermination
  5. Oct 12, 2014
    • [SPARK-2377] Python API for Streaming · 69c67aba
      giwa authored
      This patch brings the Python API for Streaming.
      
      This patch is based on work from @giwa
      
      Author: giwa <ugw.gi.world@gmail.com>
      Author: Ken Takagiwa <ken@Kens-MacBook-Pro.local>
      Author: Davies Liu <davies.liu@gmail.com>
      Author: Ken Takagiwa <ken@kens-mbp.gateway.sonic.net>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Ken <ugw.gi.world@gmail.com>
      Author: Ken Takagiwa <ugw.gi.world@gmail.com>
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #2538 from davies/streaming and squashes the following commits:
      
      64561e4 [Davies Liu] fix tests
      331ecce [Davies Liu] fix example
      3e2492b [Davies Liu] change updateStateByKey() to easy API
      182be73 [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      02d0575 [Davies Liu] add wrapper for foreachRDD()
      bebeb4a [Davies Liu] address all comments
      6db00da [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      8380064 [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      52c535b [Davies Liu] remove fix for sum()
      e108ec1 [Davies Liu]  address comments
      37fe06f [Davies Liu] use random port for callback server
      d05871e [Davies Liu] remove reuse of PythonRDD
      be5e5ff [Davies Liu] merge branch of env, make tests stable.
      8071541 [Davies Liu] Merge branch 'env' into streaming
      c7bbbce [Davies Liu] fix sphinx docs
      6bb9d91 [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      4d0ea8b [Davies Liu] clear reference of SparkEnv after stop
      54bd92b [Davies Liu] improve tests
      c2b31cb [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      7a88f9f [Davies Liu] rollback RDD.setContext(), use textFileStream() to test checkpointing
      bd8a4c2 [Davies Liu] fix scala style
      7797c70 [Davies Liu] refactor
      ff88bec [Davies Liu] rename RDDFunction to TransformFunction
      d328aca [Davies Liu] fix serializer in queueStream
      6f0da2f [Davies Liu] recover from checkpoint
      fa7261b [Davies Liu] refactor
      a13ff34 [Davies Liu] address comments
      8466916 [Davies Liu] support checkpoint
      9a16bd1 [Davies Liu] change number of partitions during tests
      b98d63f [Davies Liu] change private[spark] to private[python]
      eed6e2a [Davies Liu] rollback not needed changes
      e00136b [Davies Liu] address comments
      069a94c [Davies Liu] fix the number of partitions during window()
      338580a [Davies Liu] change _first(), _take(), _collect() as private API
      19797f9 [Davies Liu] clean up
      6ebceca [Davies Liu] add more tests
      c40c52d [Davies Liu] change first(), take(n) to has the same behavior as RDD
      98ac6c2 [Davies Liu] support ssc.transform()
      b983f0f [Davies Liu] address comments
      847f9b9 [Davies Liu] add more docs, add first(), take()
      e059ca2 [Davies Liu] move check of window into Python
      fce0ef5 [Davies Liu] rafactor of foreachRDD()
      7001b51 [Davies Liu] refactor of queueStream()
      26ea396 [Davies Liu] refactor
      74df565 [Davies Liu] fix print and docs
      b32774c [Davies Liu] move java_import into streaming
      604323f [Davies Liu] enable streaming tests
      c499ba0 [Davies Liu] remove Time and Duration
      3f0fb4b [Davies Liu] refactor fix tests
      c28f520 [Davies Liu] support updateStateByKey
      d357b70 [Davies Liu] support windowed dstream
      bd13026 [Davies Liu] fix examples
      eec401e [Davies Liu] refactor, combine TransformedRDD, fix reuse PythonRDD, fix union
      9a57685 [Davies Liu] fix python style
      bd27874 [Davies Liu] fix scala style
      7339be0 [Davies Liu] delete tests
      7f53086 [Davies Liu] support transform(), refactor and cleanup
      df098fc [Davies Liu] Merge branch 'master' into giwa
      550dfd9 [giwa] WIP fixing 1.1 merge
      5cdb6fa [giwa] changed for SCCallSiteSync
      e685853 [giwa] meged with rebased 1.1 branch
      2d32a74 [giwa] added some StreamingContextTestSuite
      4a59e1e [giwa] WIP:added more test for StreamingContext
      8ffdbf1 [giwa] added atexit to handle callback server
      d5f5fcb [giwa] added comment for StreamingContext.sparkContext
      63c881a [giwa] added StreamingContext.sparkContext
      d39f102 [giwa] added StreamingContext.remember
      d542743 [giwa] clean up code
      2fdf0de [Matthew Farrellee] Fix scalastyle errors
      c0a06bc [giwa] delete not implemented functions
      f385976 [giwa] delete inproper comments
      b0f2015 [giwa] added comment in dstream._test_output
      bebb3f3 [giwa] remove the last brank line
      fbed8da [giwa] revert pom.xml
      8ed93af [giwa] fixed explanaiton
      066ba90 [giwa] revert pom.xml
      fa4af88 [giwa] remove duplicated import
      6ae3caa [giwa] revert pom.xml
      7dc7391 [giwa] fixed typo
      62dc7a3 [giwa] clean up exmples
      f04882c [giwa] clen up examples
      b171ec3 [giwa] fixed pep8 violation
      f198d14 [giwa] clean up code
      3166d31 [giwa] clean up
      c00e091 [giwa] change test case not to use awaitTermination
      e80647e [giwa] adopted the latest compression way of python command
      58e41ff [giwa] merge with master
      455e5af [giwa] removed wasted print in DStream
      af336b7 [giwa] add comments
      ddd4ee1 [giwa] added TODO coments
      99ce042 [giwa] added saveAsTextFiles and saveAsPickledFiles
      2a06cdb [giwa] remove waste duplicated code
      c5ecfc1 [giwa] basic function test cases are passed
      8dcda84 [giwa] all tests are passed if numSlice is 2 and the numver of each input is over 4
      795b2cd [giwa] broke something
      1e126bf [giwa] WIP: solved partitioned and None is not recognized
      f67cf57 [giwa] added mapValues and flatMapVaules WIP for glom and mapPartitions test
      953deb0 [giwa] edited the comment to add more precise description
      af610d3 [giwa] removed unnesessary changes
      c1d546e [giwa] fixed PEP-008 violation
      99410be [giwa] delete waste file
      b3b0362 [giwa] added basic operation test cases
      9cde7c9 [giwa] WIP added test case
      bd3ba53 [giwa] WIP
      5c04a5f [giwa] WIP: added PythonTestInputStream
      019ef38 [giwa] WIP
      1934726 [giwa] update comment
      376e3ac [giwa] WIP
      932372a [giwa] clean up dstream.py
      0b09cff [giwa] added stop in StreamingContext
      92e333e [giwa] implemented reduce and count function in Dstream
      1b83354 [giwa] Removed the waste line
      88f7506 [Ken Takagiwa] Kill py4j callback server properly
      54b5358 [Ken Takagiwa] tried to restart callback server
      4f07163 [Tathagata Das] Implemented DStream.foreachRDD in the Python API using Py4J callback server.
      fe02547 [Ken Takagiwa] remove waste file
      2ad7bd3 [Ken Takagiwa] clean up codes
      6197a11 [Ken Takagiwa] clean up code
      eb4bf48 [Ken Takagiwa] fix map function
      98c2a00 [Ken Takagiwa] added count operation but this implementation need double check
      58591d2 [Ken Takagiwa] reduceByKey is working
      0df7111 [Ken Takagiwa] delete old file
      f485b1d [Ken Takagiwa] fied input of socketTextDStream
      dd6de81 [Ken Takagiwa] initial commit for socketTextStream
      247fd74 [Ken Takagiwa] modified the code base on comment in https://github.com/tdas/spark/pull/10
      4bcb318 [Ken Takagiwa] implementing transform function in Python
      38adf95 [Ken Takagiwa] added reducedByKey not working yet
      66fcfff [Ken Takagiwa] modify dstream.py to fix indent error
      41886c2 [Ken Takagiwa] comment PythonDStream.PairwiseDStream
      0b99bec [Ken] initial commit for pySparkStreaming
      c214199 [giwa] added testcase for combineByKey
      5625bdc [giwa] added gorupByKey testcase
      10ab87b [giwa] added sparkContext as input parameter in StreamingContext
      10b5b04 [giwa] removed wasted print in DStream
      e54f986 [giwa] add comments
      16aa64f [giwa] added TODO coments
      74535d4 [giwa] added saveAsTextFiles and saveAsPickledFiles
      f76c182 [giwa] remove waste duplicated code
      18c8723 [giwa] modified streaming test case to add coment
      13fb44c [giwa] basic function test cases are passed
      3000b2b [giwa] all tests are passed if numSlice is 2 and the numver of each input is over 4
      ff14070 [giwa] broke something
      bcdec33 [giwa] WIP: solved partitioned and None is not recognized
      270a9e1 [giwa] added mapValues and flatMapVaules WIP for glom and mapPartitions test
      bb10956 [giwa] edited the comment to add more precise description
      253a863 [giwa] removed unnesessary changes
      3d37822 [giwa] fixed PEP-008 violation
      f21cab3 [giwa] delete waste file
      878bad7 [giwa] added basic operation test cases
      ce2acd2 [giwa] WIP added test case
      9ad6855 [giwa] WIP
      1df77f5 [giwa] WIP: added PythonTestInputStream
      1523b66 [giwa] WIP
      8a0fbbc [giwa] update comment
      fe648e3 [giwa] WIP
      29c2bc5 [giwa] initial commit for testcase
      4d40d63 [giwa] clean up dstream.py
      c462bb3 [giwa] added stop in StreamingContext
      d2c01ba [giwa] clean up examples
      3c45cd2 [giwa] implemented reduce and count function in Dstream
      b349649 [giwa] Removed the waste line
      3b498e1 [Ken Takagiwa] Kill py4j callback server properly
      84a9668 [Ken Takagiwa] tried to restart callback server
      9ab8952 [Tathagata Das] Added extra line.
      05e991b [Tathagata Das] Added missing file
      b1d2a30 [Tathagata Das] Implemented DStream.foreachRDD in the Python API using Py4J callback server.
      678e854 [Ken Takagiwa] remove waste file
      0a8bbbb [Ken Takagiwa] clean up codes
      bab31c1 [Ken Takagiwa] clean up code
      72b9738 [Ken Takagiwa] fix map function
      d3ee86a [Ken Takagiwa] added count operation but this implementation need double check
      15feea9 [Ken Takagiwa] edit python sparkstreaming example
      6f98e50 [Ken Takagiwa] reduceByKey is working
      c455c8d [Ken Takagiwa] added reducedByKey not working yet
      dc6995d [Ken Takagiwa] delete old file
      b31446a [Ken Takagiwa] fixed typo of network_workdcount.py
      ccfd214 [Ken Takagiwa] added doctest for pyspark.streaming.duration
      0d1b954 [Ken Takagiwa] fied input of socketTextDStream
      f746109 [Ken Takagiwa] initial commit for socketTextStream
      bb7ccf3 [Ken Takagiwa] remove unused import in python
      224fc5e [Ken Takagiwa] add empty line
      d2099d8 [Ken Takagiwa] sorted the import following Spark coding convention
      5bac7ec [Ken Takagiwa] revert streaming/pom.xml
      e1df940 [Ken Takagiwa] revert pom.xml
      494cae5 [Ken Takagiwa] remove not implemented DStream functions in python
      17a74c6 [Ken Takagiwa] modified the code base on comment in https://github.com/tdas/spark/pull/10
      1a0f065 [Ken Takagiwa] implementing transform function in Python
      d7b4d6f [Ken Takagiwa] added reducedByKey not working yet
      87438e2 [Ken Takagiwa] modify dstream.py to fix indent error
      b406252 [Ken Takagiwa] comment PythonDStream.PairwiseDStream
      454981d [Ken] initial commit for pySparkStreaming
      150b94c [giwa] added some StreamingContextTestSuite
      f7bc8f9 [giwa] WIP:added more test for StreamingContext
      ee50c5a [giwa] added atexit to handle callback server
      fdc9125 [giwa] added comment for StreamingContext.sparkContext
      f5bfb70 [giwa] added StreamingContext.sparkContext
      da09768 [giwa] added StreamingContext.remember
      d68b568 [giwa] clean up code
      4afa390 [giwa] clean up code
      1fd6bc7 [Ken Takagiwa] Merge pull request #2 from mattf/giwa-master
      d9d59fe [Matthew Farrellee] Fix scalastyle errors
      67473a9 [giwa] delete not implemented functions
      c97377c [giwa] delete inproper comments
      2ea769e [giwa] added comment in dstream._test_output
      3b27bd4 [giwa] remove the last brank line
      acfcaeb [giwa] revert pom.xml
      93f7637 [giwa] fixed explanaiton
      50fd6f9 [giwa] revert pom.xml
      4f82c89 [giwa] remove duplicated import
      9d1de23 [giwa] revert pom.xml
      7339df2 [giwa] fixed typo
      9c85e48 [giwa] clean up exmples
      24f95db [giwa] clen up examples
      0d30109 [giwa] fixed pep8 violation
      b7dab85 [giwa] improve test case
      583e66d [giwa] move tests for streaming inside streaming directory
      1d84142 [giwa] remove unimplement test
      f0ea311 [giwa] clean up code
      171edeb [giwa] clean up
      4dedd2d [giwa] change test case not to use awaitTermination
      268a6a5 [giwa] Changed awaitTermination not to call awaitTermincation in Scala. Just use time.sleep instread
      09a28bf [giwa] improve testcases
      58150f5 [giwa] Changed the test case to focus the test operation
      199e37f [giwa] adopted the latest compression way of python command
      185fdbf [giwa] merge with master
      f1798c4 [giwa] merge with master
      e70f706 [giwa] added testcase for combineByKey
      e162822 [giwa] added gorupByKey testcase
      97742fe [giwa] added sparkContext as input parameter in StreamingContext
      14d4c0e [giwa] removed wasted print in DStream
      6d8190a [giwa] add comments
      4aa99e4 [giwa] added TODO coments
      e9fab72 [giwa] added saveAsTextFiles and saveAsPickledFiles
      94f2b65 [giwa] remove waste duplicated code
      580fbc2 [giwa] modified streaming test case to add coment
      99e4bb3 [giwa] basic function test cases are passed
      7051a84 [giwa] all tests are passed if numSlice is 2 and the numver of each input is over 4
      35933e1 [giwa] broke something
      9767712 [giwa] WIP: solved partitioned and None is not recognized
      4f2d7e6 [giwa] added mapValues and flatMapVaules WIP for glom and mapPartitions test
      33c0f94d [giwa] edited the comment to add more precise description
      774f18d [giwa] removed unnesessary changes
      3a671cc [giwa] remove export PYSPARK_PYTHON in spark submit
      8efa266 [giwa] fixed PEP-008 violation
      fa75d71 [giwa] delete waste file
      7f96294 [giwa] added basic operation test cases
      3dda31a [giwa] WIP added test case
      1f68b78 [giwa] WIP
      c05922c [giwa] WIP: added PythonTestInputStream
      1fd12ae [giwa] WIP
      c880a33 [giwa] update comment
      5d22c92 [giwa] WIP
      ea4b06b [giwa] initial commit for testcase
      5a9b525 [giwa] clean up dstream.py
      79c5809 [giwa] added stop in StreamingContext
      189dcea [giwa] clean up examples
      b8d7d24 [giwa] implemented reduce and count function in Dstream
      b6468e6 [giwa] Removed the waste line
      b47b5fd [Ken Takagiwa] Kill py4j callback server properly
      19ddcdd [Ken Takagiwa] tried to restart callback server
      c9fc124 [Tathagata Das] Added extra line.
      4caae3f [Tathagata Das] Added missing file
      4eff053 [Tathagata Das] Implemented DStream.foreachRDD in the Python API using Py4J callback server.
      5e822d4 [Ken Takagiwa] remove waste file
      aeaf8a5 [Ken Takagiwa] clean up codes
      9fa249b [Ken Takagiwa] clean up code
      05459c6 [Ken Takagiwa] fix map function
      a9f4ecb [Ken Takagiwa] added count operation but this implementation need double check
      d1ee6ca [Ken Takagiwa] edit python sparkstreaming example
      0b8b7d0 [Ken Takagiwa] reduceByKey is working
      d25d5cf [Ken Takagiwa] added reducedByKey not working yet
      7f7c5d1 [Ken Takagiwa] delete old file
      967dc26 [Ken Takagiwa] fixed typo of network_workdcount.py
      57fb740 [Ken Takagiwa] added doctest for pyspark.streaming.duration
      4b69fb1 [Ken Takagiwa] fied input of socketTextDStream
      02f618a [Ken Takagiwa] initial commit for socketTextStream
      4ce4058 [Ken Takagiwa] remove unused import in python
      856d98e [Ken Takagiwa] add empty line
      490e338 [Ken Takagiwa] sorted the import following Spark coding convention
      5594bd4 [Ken Takagiwa] revert pom.xml
      2adca84 [Ken Takagiwa] remove not implemented DStream functions in python
      e551e13 [Ken Takagiwa] add coment for hack why PYSPARK_PYTHON is needed in spark-submit
      3758175 [Ken Takagiwa] add coment for hack why PYSPARK_PYTHON is needed in spark-submit
      c5518b4 [Ken Takagiwa] modified the code base on comment in https://github.com/tdas/spark/pull/10
      dcf243f [Ken Takagiwa] implementing transform function in Python
      9af03f4 [Ken Takagiwa] added reducedByKey not working yet
      6e0d9c7 [Ken Takagiwa] modify dstream.py to fix indent error
      e497b9b [Ken Takagiwa] comment PythonDStream.PairwiseDStream
      5c3a683 [Ken] initial commit for pySparkStreaming
      665bfdb [giwa] added testcase for combineByKey
      a3d2379 [giwa] added gorupByKey testcase
      636090a [giwa] added sparkContext as input parameter in StreamingContext
      e7ebb08 [giwa] removed wasted print in DStream
      d8b593b [giwa] add comments
      ea9c873 [giwa] added TODO coments
      89ae38a [giwa] added saveAsTextFiles and saveAsPickledFiles
      e3033fc [giwa] remove waste duplicated code
      a14c7e1 [giwa] modified streaming test case to add coment
      536def4 [giwa] basic function test cases are passed
      2112638 [giwa] all tests are passed if numSlice is 2 and the numver of each input is over 4
      080541a [giwa] broke something
      0704b86 [giwa] WIP: solved partitioned and None is not recognized
      90a6484 [giwa] added mapValues and flatMapVaules WIP for glom and mapPartitions test
      a65f302 [giwa] edited the comment to add more precise description
      bdde697 [giwa] removed unnesessary changes
      e8c7bfc [giwa] remove export PYSPARK_PYTHON in spark submit
      3334169 [giwa] fixed PEP-008 violation
      db0a303 [giwa] delete waste file
      2cfd3a0 [giwa] added basic operation test cases
      90ae568 [giwa] WIP added test case
      a120d07 [giwa] WIP
      f671cdb [giwa] WIP: added PythonTestInputStream
      56fae45 [giwa] WIP
      e35e101 [giwa] Merge branch 'master' into testcase
      ba5112d [giwa] update comment
      28aa56d [giwa] WIP
      fb08559 [giwa] initial commit for testcase
      a613b85 [giwa] clean up dstream.py
      c40c0ef [giwa] added stop in StreamingContext
      31e4260 [giwa] clean up examples
      d2127d6 [giwa] implemented reduce and count function in Dstream
      48f7746 [giwa] Removed the waste line
      0f83eaa [Ken Takagiwa] delete py4j 0.8.1
      1679808 [Ken Takagiwa] Kill py4j callback server properly
      f96cd4e [Ken Takagiwa] tried to restart callback server
      fe86198 [Ken Takagiwa] add py4j 0.8.2.1 but server is not launched
      1064fe0 [Ken Takagiwa] Merge branch 'master' of https://github.com/giwa/spark
      28c6620 [Ken Takagiwa] Implemented DStream.foreachRDD in the Python API using Py4J callback server
      85b0fe1 [Ken Takagiwa] Merge pull request #1 from tdas/python-foreach
      54e2e8c [Tathagata Das] Added extra line.
      e185338 [Tathagata Das] Added missing file
      a778d4b [Tathagata Das] Implemented DStream.foreachRDD in the Python API using Py4J callback server.
      cc2092b [Ken Takagiwa] remove waste file
      d042ac6 [Ken Takagiwa] clean up codes
      84a021f [Ken Takagiwa] clean up code
      bd20e17 [Ken Takagiwa] fix map function
      d01a125 [Ken Takagiwa] added count operation but this implementation need double check
      7d05109 [Ken Takagiwa] merge with remote branch
      ae464e0 [Ken Takagiwa] edit python sparkstreaming example
      04af046 [Ken Takagiwa] reduceByKey is working
      3b6d7b0 [Ken Takagiwa] implementing transform function in Python
      571d52d [Ken Takagiwa] added reducedByKey not working yet
      5720979 [Ken Takagiwa] delete old file
      e604fcb [Ken Takagiwa] fixed typo of network_workdcount.py
      4b7c08b [Ken Takagiwa] Merge branch 'master' of https://github.com/giwa/spark
      ce7d426 [Ken Takagiwa] added doctest for pyspark.streaming.duration
      a8c9fd5 [Ken Takagiwa] fixed for socketTextStream
      a61fa9e [Ken Takagiwa] fied input of socketTextDStream
      1e84f41 [Ken Takagiwa] initial commit for socketTextStream
      6d012f7 [Ken Takagiwa] remove unused import in python
      25d30d5 [Ken Takagiwa] add empty line
      6e0a64a [Ken Takagiwa] sorted the import following Spark coding convention
      fa4a7fc [Ken Takagiwa] revert streaming/pom.xml
      8f8202b [Ken Takagiwa] revert streaming pom.xml
      c9d79dd [Ken Takagiwa] revert pom.xml
      57e3e52 [Ken Takagiwa] remove not implemented DStream functions in python
      0a516f5 [Ken Takagiwa] add coment for hack why PYSPARK_PYTHON is needed in spark-submit
      a7a0b5c [Ken Takagiwa] add coment for hack why PYSPARK_PYTHON is needed in spark-submit
      72bfc66 [Ken Takagiwa] modified the code base on comment in https://github.com/tdas/spark/pull/10
      69e9cd3 [Ken Takagiwa] implementing transform function in Python
      94a0787 [Ken Takagiwa] added reducedByKey not working yet
      88068cf [Ken Takagiwa] modify dstream.py to fix indent error
      1367be5 [Ken Takagiwa] comment PythonDStream.PairwiseDStream
      eb2b3ba [Ken] Merge remote-tracking branch 'upstream/master'
      d8e51f9 [Ken] initial commit for pySparkStreaming
  6. Oct 10, 2014
    • [SPARK-2805] Upgrade Akka to 2.3.4 · 411cf29f
      Anand Avati authored
      This is a second rev of the Akka upgrade (earlier merged, but reverted). I made one slight modification: I also upgraded Hive to deal with a compatibility issue related to the protocol buffers library.
      
      Author: Anand Avati <avati@redhat.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2752 from pwendell/akka-upgrade and squashes the following commits:
      
      4c7ca3f [Patrick Wendell] Upgrading to new hive->protobuf version
      57a2315 [Anand Avati] SPARK-1812: streaming - remove tests which depend on akka.actor.IO
      2a551d3 [Anand Avati] SPARK-1812: core - upgrade to akka 2.3.4
  7. Oct 09, 2014
    • SPARK-3811 [CORE] More robust / standard Utils.deleteRecursively, Utils.createTempDir · 363baaca
      Sean Owen authored
      I noticed a few issues with how temp directories are created and deleted:
      
      *Minor*
      
      * Guava's `Files.createTempDir()` plus `File.deleteOnExit()` is used in many tests to make a temp dir, but `Utils.createTempDir()` seems to be the standard Spark mechanism
      * The call to `File.deleteOnExit()` could be pushed into `Utils.createTempDir()` as well, along with this replacement
      * _I messed up the message in an exception in `Utils` in SPARK-3794; fixed here_
      
      *Bit Less Minor*
      
      * `Utils.deleteRecursively()` fails immediately if any `IOException` occurs, instead of trying to delete any remaining files and subdirectories. I've observed this leave temp dirs around. I suggest changing it to continue in the face of an exception and to throw one of the possibly several exceptions that occur at the end (see the sketch after this list).
      * `Utils.createTempDir()` adds a JVM shutdown hook every time the method is called, even when the new dir sits under a dir already registered for deletion, because that check happens inside the hook itself. However, `Utils` already manages a set of all dirs to delete on shutdown, called `shutdownDeletePaths`, so a single hook can be registered to delete all of these on exit. This is how Tachyon temp paths are cleaned up in `TachyonBlockManager`.
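      A sketch of both suggestions, with assumed structure (not the exact `Utils` code):
      
      ```
      import java.io.{File, IOException}
      
      // Keep deleting after a failure; surface one of the exceptions at the end.
      def deleteRecursively(file: File): Unit = {
        var saved: IOException = null
        if (file.isDirectory) {
          Option(file.listFiles()).getOrElse(Array.empty[File]).foreach { child =>
            try deleteRecursively(child)
            catch { case e: IOException => saved = e } // continue with siblings
          }
        }
        if (!file.delete() && file.exists()) {
          saved = new IOException(s"Failed to delete: $file")
        }
        if (saved != null) throw saved
      }
      
      // One shutdown hook for every registered temp dir, instead of one per call.
      val shutdownDeletePaths = scala.collection.mutable.HashSet.empty[String]
      Runtime.getRuntime.addShutdownHook(new Thread("delete Spark temp dirs") {
        override def run(): Unit =
          shutdownDeletePaths.foreach(p => deleteRecursively(new File(p)))
      })
      ```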
      
      I noticed a few other things that might be changed but wanted to ask first:
      
      * Shouldn't the set of dirs to delete be `File`, not just `String` paths?
      * `Utils` manages the set of `TachyonFile` that have been registered for deletion, but the shutdown hook is managed in `TachyonBlockManager`. Should this logic not live together, and not in `Utils`? It's more specific to Tachyon, and it looks slightly odd to import in such a generic place.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2670 from srowen/SPARK-3811 and squashes the following commits:
      
      071ae60 [Sean Owen] Update per @vanzin's review
      da0146d [Sean Owen] Make Utils.deleteRecursively try to delete all paths even when an exception occurs; use one shutdown hook instead of one per method call to delete temp dirs
      3a0faa4 [Sean Owen] Standardize on Utils.createTempDir instead of Files.createTempDir
    • Revert "[SPARK-2805] Upgrade to akka 2.3.4" · 1faa1135
      Patrick Wendell authored
      This reverts commit b9df8af6.
    • [SPARK-2805] Upgrade to akka 2.3.4 · b9df8af6
      Anand Avati authored
      Upgrade to akka 2.3.4
      
      Author: Anand Avati <avati@redhat.com>
      
      Closes #1685 from avati/SPARK-1812-akka-2.3 and squashes the following commits:
      
      57a2315 [Anand Avati] SPARK-1812: streaming - remove tests which depend on akka.actor.IO
      2a551d3 [Anand Avati] SPARK-1812: core - upgrade to akka 2.3.4
  8. Oct 07, 2014
    • [SPARK-3762] clear reference of SparkEnv after stop · 65503296
      Davies Liu authored
      SparkEnv is cached in a ThreadLocal object, so after stopping a SparkContext and creating a new one, the old SparkEnv is still used by some threads. This triggers many problems; for example, pyspark has problems after restarting a SparkContext, because py4j uses a thread pool for RPC.
      
      This patch clears all the references after a SparkEnv is stopped.
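      A minimal model of the fix (field names assumed): stopping must null out the cached reference and clear the ThreadLocal, not just shut the environment down.
      
      ```
      class Env
      
      object EnvHolder {
        private var lastEnv: Env = _
        private val threadEnv = new ThreadLocal[Env]
      
        def set(e: Env): Unit = { lastEnv = e; threadEnv.set(e) }
        def get: Env = Option(threadEnv.get).getOrElse(lastEnv)
      
        def clear(): Unit = {
          lastEnv = null
          threadEnv.remove() // otherwise a pooled thread keeps seeing the dead Env
        }
      }
      ```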
      
      cc mateiz tdas pwendell
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2624 from davies/env and squashes the following commits:
      
      a69f30c [Davies Liu] deprecate getThreadLocal
      ba77ca4 [Davies Liu] remove getThreadLocal(), update docs
      ee62bb7 [Davies Liu] cleanup ThreadLocal of SparnENV
      4d0ea8b [Davies Liu] clear reference of SparkEnv after stop
  9. Oct 03, 2014
    • [SPARK-3377] [SPARK-3610] Metrics can be accidentally aggregated / History server log name should not be based on user input · 79e45c93
      Kousuke Saruta authored
      [SPARK-3377] [SPARK-3610] Metrics can be accidentally aggregated / History server log name should not be based on user input
      
      This PR is another solution for #2250
      
      I'm using Spark's Codahale-based MetricsSystem with JMX or Graphite, and I saw the following two problems.
      
      (1) When applications that have the same spark.app.name run on the cluster at the same time, some metric names are mixed up. For instance, if two or more applications run on the cluster at the same time, each emits the same-named metric, such as "SparkPi.DAGScheduler.stage.failedStages", and Graphite cannot distinguish which application a metric belongs to.
      
      (2) When two or more executors run on the same machine, the JVM metrics of the executors are mixed up. For instance, executors running on the same node can emit the same-named metric "jvm.memory", and Graphite cannot distinguish which executor a metric came from.
      
      There is a similar issue: the directory for event logs is named using the application name.
      The application name is defined by the user and can include characters that are illegal in path names.
      Furthermore, the directory name consists of the application name and System.currentTimeMillis, even though each application has a unique application ID, so if we run jobs that have the same name, it's difficult to identify which directory belongs to which application.
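      The direction of the fix, sketched with an assumed method shape (the real logic lives in MetricsSystem#buildRegistryName):
      
      ```
      // Prefix metric names with the unique application ID plus the
      // driver/executor ID instead of the user-chosen app name, so neither
      // concurrent applications nor co-located executors collide.
      def buildRegistryName(appId: Option[String],
                            executorId: Option[String],
                            sourceName: String): String =
        (appId, executorId) match {
          case (Some(app), Some(exec)) => s"$app.$exec.$sourceName"
          case _                       => sourceName // warn and fall back
        }
      
      // e.g. "app-20141003123456-0001.0.jvm.memory" rather than bare "jvm.memory"
      ```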
      
      Closes #2250
      Closes #1067
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2432 from sarutak/metrics-structure-improvement2 and squashes the following commits:
      
      3288b2b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      39169e4 [Kousuke Saruta] Fixed style
      6570494 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      817e4f0 [Kousuke Saruta] Simplified MetricsSystem#buildRegistryName
      67fa5eb [Kousuke Saruta] Unified MetricsSystem#registerSources and registerSinks in start
      10be654 [Kousuke Saruta] Fixed style.
      990c078 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      f0c7fba [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      59cc2cd [Kousuke Saruta] Modified SparkContextSchedulerCreationSuite
      f9b6fb3 [Kousuke Saruta] Modified style.
      2cf8a0f [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      389090d [Kousuke Saruta] Replaced taskScheduler.applicationId() with getApplicationId in SparkContext#postApplicationStart
      ff45c89 [Kousuke Saruta] Added some test cases to MetricsSystemSuite
      69c46a6 [Kousuke Saruta] Added warning logging logic to MetricsSystem#buildRegistryName
      5cca0d2 [Kousuke Saruta] Added Javadoc comment to SparkContext#getApplicationId
      16a9f01 [Kousuke Saruta] Added data types to be returned to some methods
      6434b06 [Kousuke Saruta] Reverted changes related to ApplicationId
      0413b90 [Kousuke Saruta] Deleted ApplicationId.java and ApplicationIdSuite.java
      a42300c [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      0fc1b09 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      42bea55 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      248935d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      f6af132 [Kousuke Saruta] Modified SchedulerBackend and TaskScheduler to return System.currentTimeMillis as an unique Application Id
      1b8b53e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      97cb85c [Kousuke Saruta] Modified confliction of MimExcludes
      2cdd009 [Kousuke Saruta] Modified defailt implementation of applicationId
      9aadb0b [Kousuke Saruta] Modified NetworkReceiverSuite to ensure "executor.start()" is finished in test "network receiver life cycle"
      3011efc [Kousuke Saruta] Added ApplicationIdSuite.scala
      d009c55 [Kousuke Saruta] Modified ApplicationId#equals to compare appIds
      dfc83fd [Kousuke Saruta] Modified ApplicationId to implement Serializable
      9ff4851 [Kousuke Saruta] Modified MimaExcludes.scala to ignore createTaskScheduler method in SparkContext
      4567ffc [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      6a91b14 [Kousuke Saruta] Modified SparkContextSchedulerCreationSuite, ExecutorRunnerTest and EventLoggingListenerSuite
      0325caf [Kousuke Saruta] Added ApplicationId.scala
      0a2fc14 [Kousuke Saruta] Modified style
      eabda80 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      0f890e6 [Kousuke Saruta] Modified SparkDeploySchedulerBackend and Master to pass baseLogDir instead f eventLogDir
      bcf25bf [Kousuke Saruta] Modified directory name for EventLogs
      28d4d93 [Kousuke Saruta] Modified SparkContext and EventLoggingListener so that the directory for EventLogs is named same for Application ID
      203634e [Kousuke Saruta] Modified comment in SchedulerBackend#applicationId and TaskScheduler#applicationId
      424fea4 [Kousuke Saruta] Modified  the subclasses of TaskScheduler and SchedulerBackend so that they can return non-optional Unique Application ID
      b311806 [Kousuke Saruta] Swapped last 2 arguments passed to CoarseGrainedExecutorBackend
      8a2b6ec [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      086ee25 [Kousuke Saruta] Merge branch 'metrics-structure-improvement2' of github.com:sarutak/spark into metrics-structure-improvement2
      e705386 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      36d2f7a [Kousuke Saruta] Added warning message for the situation we cannot get application id for the prefix for the name of metrics
      eea6e19 [Kousuke Saruta] Modified CoarseGrainedMesosSchedulerBackend and MesosSchedulerBackend so that we can get Application ID
      c229fbe [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      e719c39 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      4a93c7f [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      4776f9e [Kousuke Saruta] Modified MetricsSystemSuite.scala
      efcb6e1 [Kousuke Saruta] Modified to add application id to metrics name
      2ec848a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      3ea7896 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      ead8966 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      08e627e [Kousuke Saruta] Revert "tmp"
      7b67f5a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      45bd33d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      93e263a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      848819c [Kousuke Saruta] Merge branch 'metrics-structure-improvement' of github.com:sarutak/spark into metrics-structure-improvement
      912a637 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      e4a4593 [Kousuke Saruta] tmp
      3e098d8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      4603a39 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      fa7175b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      15f88a3 [Kousuke Saruta] Modified MetricsSystem#buildRegistryName because conf.get does not return null when correspondin entry is absent
      6f7dcd4 [Kousuke Saruta] Modified constructor of DAGSchedulerSource and BlockManagerSource because the instance of SparkContext is no longer used
      6fc5560 [Kousuke Saruta] Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource
      4e057c9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      85ffc02 [Kousuke Saruta] Revert "Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource"
      868e326 [Kousuke Saruta] Modified MetricsSystem to set registry name with unique application-id and driver/executor-id
      71609f5 [Kousuke Saruta] Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource
      55debab [Kousuke Saruta] Modified SparkContext and Executor to set spark.executor.id to identifiers
      4180993 [Kousuke Saruta] Modified SparkContext to retain spark.unique.app.name property in SparkConf
  10. Oct 01, 2014
    • [SPARK-3748] Log thread name in unit test logs · 3888ee2f
      Reynold Xin authored
      Thread names are useful for correlating failures.
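      Concretely, this amounts to adding the `%t` conversion to the test `log4j.properties` pattern; the appender name below is illustrative:
      
      ```
      log4j.appender.file.layout=org.apache.log4j.PatternLayout
      # %t inserts the thread name into every unit-test log line
      log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
      ```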
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2600 from rxin/log4j and squashes the following commits:
      
      83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs
  11. Sep 25, 2014
    • SPARK-2932 [STREAMING] Move MasterFailureTest out of "main" source directory · c3f2a858
      Sean Owen authored
      (HT @vanzin) Whatever the reason was for having this test class in `main`, if there was one, it appears to be moot. This may have been a result of an earlier streaming test reorganization.
      
      This simply puts `MasterFailureTest` back under `test/`, removes some redundant copied code, and touches up a few tiny inspection warnings along the way.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2399 from srowen/SPARK-2932 and squashes the following commits:
      
      3909411 [Sean Owen] Move MasterFailureTest to src/test, and remove redundant TestOutputStream
  12. Sep 24, 2014
    • [SPARK-546] Add full outer join to RDD and DStream. · 8ca4ecb6
      Aaron Staple authored
      leftOuterJoin and rightOuterJoin are already implemented.  This patch adds fullOuterJoin.
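      A usage sketch (assuming `sc` is an existing SparkContext): keys present on either side survive, with each side's value wrapped in Option.
      
      ```
      val left  = sc.parallelize(Seq("a" -> 1, "b" -> 2))
      val right = sc.parallelize(Seq("b" -> 20, "c" -> 30))
      
      val joined = left.fullOuterJoin(right) // RDD[(String, (Option[Int], Option[Int]))]
      // ("a", (Some(1), None)), ("b", (Some(2), Some(20))), ("c", (None, Some(30)))
      ```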
      
      Author: Aaron Staple <aaron.staple@gmail.com>
      
      Closes #1395 from staple/SPARK-546 and squashes the following commits:
      
      1f5595c [Aaron Staple] Fix python style
      7ac0aa9 [Aaron Staple] [SPARK-546] Add full outer join to RDD and DStream.
      3b5d137 [Aaron Staple] In JavaPairDStream, make class tag specification in rightOuterJoin consistent with other functions.
      31f2956 [Aaron Staple] Fix left outer join documentation comments.
  13. Sep 23, 2014
    • [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI · 729952a5
      Mubarak Seyed authored
      This is a refactored version of the original PR https://github.com/apache/spark/pull/1723 by mubarak
      
      Please take a look andrewor14, mubarak
      
      Author: Mubarak Seyed <mubarak.seyed@gmail.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2464 from tdas/streaming-callsite and squashes the following commits:
      
      dc54c71 [Tathagata Das] Made changes based on PR comments.
      390b45d [Tathagata Das] Fixed minor bugs.
      904cd92 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-callsite
      7baa427 [Tathagata Das] Refactored getCallSite and setCallSite to make it simpler. Also added unit test for DStream creation site.
      b9ed945 [Mubarak Seyed] Adding streaming utils
      c461cf4 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      ceb43da [Mubarak Seyed] Changing default regex function name
      8c5d443 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      196121b [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      491a1eb [Mubarak Seyed] Removing streaming visibility from getRDDCreationCallSite in DStream
      33a7295 [Mubarak Seyed] Fixing review comments: Merging both setCallSite methods
      c26d933 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      f51fd9f [Mubarak Seyed] Fixing scalastyle, Regex for Utils.getCallSite, and changing method names in DStream
      5051c58 [Mubarak Seyed] Getting return value of compute() into variable and call setCallSite(prevCallSite) only once. Adding return for other code paths (for None)
      a207eb7 [Mubarak Seyed] Fixing code review comments
      ccde038 [Mubarak Seyed] Removing Utils import from MappedDStream
      2a09ad6 [Mubarak Seyed] Changes in Utils.scala for SPARK-1853
      1d90cc3 [Mubarak Seyed] Changes for SPARK-1853
      5f3105a [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      70f494f [Mubarak Seyed] Changes for SPARK-1853
      1500deb [Mubarak Seyed] Changes in Spark Streaming UI
      9d38d3c [Mubarak Seyed] [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
      d466d75 [Mubarak Seyed] Changes for spark streaming UI
    • SPARK-2745 [STREAMING] Add Java friendly methods to Duration class · e73b48ac
      Sean Owen authored
      tdas is this what you had in mind for this JIRA? I saw this one and thought it would be easy to take care of, and helpful as I use streaming from Java.
      
      I could do the same for `Time`? Happy to do so.
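      The resulting Java-friendly factories sit alongside the existing Scala syntax, e.g.:
      
      ```
      import org.apache.spark.streaming.{Durations, Seconds}
      
      val batchInterval = Durations.seconds(1) // callable from Java
      val sameInterval  = Seconds(1)           // existing Scala-only form
      assert(batchInterval == sameInterval)
      ```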
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2403 from srowen/SPARK-2745 and squashes the following commits:
      
      5a9e706 [Sean Owen] Change "Duration" to "Durations" to avoid changing Duration case class API
      bda301c [Sean Owen] Just delegate to Scala binary operator syntax to avoid scalastyle warning
      7dde949 [Sean Owen] Disable scalastyle for false positives. Add Java static factory methods seconds(), minutes() to Duration. Add Java-friendly methods to Time too, and unit tests. Remove unnecessary math.floor from Time.floor()
      4dee32e [Sean Owen] Add named methods to Duration in parallel to symbolic methods for Java-friendliness. Also add unit tests for Duration, in Scala and Java.
  14. Sep 13, 2014
    • SPARK-3470 [CORE] [STREAMING] Add Closeable / close() to Java context objects · feaa3706
      Sean Owen authored
      ...  that expose a stop() lifecycle method. This doesn't add `AutoCloseable`, which is Java 7+ only. But it should be possible to use try-with-resources on a `Closeable` in Java 7, as long as the `close()` does not throw a checked exception, and these don't. Q.E.D.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2346 from srowen/SPARK-3470 and squashes the following commits:
      
      612c21d [Sean Owen] Add Closeable / close() to Java context objects that expose a stop() lifecycle method
  15. Sep 11, 2014
    • SPARK-2482: Resolve sbt warnings during build · 33c7a738
      witgo authored
      At the same time, importing both `scala.language.postfixOps` and `org.scalatest.time.SpanSugar._` causes `scala.language.postfixOps` to stop working.
      
      Author: witgo <witgo@qq.com>
      
      Closes #1330 from witgo/sbt_warnings3 and squashes the following commits:
      
      179ba61 [witgo] Resolve sbt warnings during build
    • [Spark-3490] Disable SparkUI for tests · 6324eb7b
      Andrew Or authored
      We currently open many ephemeral ports during the tests, and as a result we occasionally can't bind to new ones. This has caused the `DriverSuite` and the `SparkSubmitSuite` to fail intermittently.
      
      By disabling the `SparkUI` when it's not needed, we already cut down on the number of ports opened significantly, on the order of the number of `SparkContexts` ever created. We must keep it enabled for a few tests for the UI itself, however.
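      The per-test change is just a conf flag, e.g.:
      
      ```
      import org.apache.spark.SparkConf
      
      // Tests that don't exercise the web UI never bind an ephemeral port for it.
      val conf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("test")
        .set("spark.ui.enabled", "false")
      ```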
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2363 from andrewor14/disable-ui-for-tests and squashes the following commits:
      
      332a7d5 [Andrew Or] No need to set spark.ui.port to 0 anymore
      30c93a2 [Andrew Or] Simplify streaming UISuite
      a431b84 [Andrew Or] Fix streaming test failures
      8f5ae53 [Andrew Or] Fix no new line at the end
      29c9b5b [Andrew Or] Disable SparkUI for tests
  16. Sep 09, 2014
    • [Docs] actorStream storageLevel default is MEMORY_AND_DISK_SER_2 · c419e4f1
      Mario Pastorelli authored
      The comment on the storageLevel param of actorStream says that it defaults to memory-only, while the actual default is MEMORY_AND_DISK_SER_2.
      
      Author: Mario Pastorelli <pastorelli.mario@gmail.com>
      
      Closes #2319 from melrief/master and squashes the following commits:
      
      7b6ce68 [Mario Pastorelli] [Docs] actorStream storageLevel default is MEMORY_AND_DISK_SER_2
  17. Aug 28, 2014
    • [SPARK-3285] [examples] Using values.sum is easier to understand than using values.foldLeft(0)(_ + _) · 39012452
      Yadong Qi authored
      [SPARK-3285] [examples] Using values.sum is easier to understand than using values.foldLeft(0)(_ + _)
      
      def sum[B >: A](implicit num: Numeric[B]): B = foldLeft(num.zero)(num.plus)
      Using values.sum is easier to understand than using values.foldLeft(0)(_ + _), so we'd better use values.sum instead.
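      The two forms side by side:
      
      ```
      val values = Seq(1, 2, 3)
      val bySum  = values.sum                 // reads as plain arithmetic
      val byFold = values.foldLeft(0)(_ + _)  // what the examples used before
      assert(bySum == byFold) // both are 6
      ```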
      
      Author: Yadong Qi <qiyadong2010@gmail.com>
      
      Closes #2182 from watermen/bug-fix3 and squashes the following commits:
      
      17be9fb [Yadong Qi] Update CheckpointSuite.scala
      714bda5 [Yadong Qi] Update BasicOperationsSuite.scala
      57e704c [Yadong Qi] Update StatefulNetworkWordCount.scala
  18. Aug 26, 2014
    • [SPARK-2886] Use more specific actor system name than "spark" · b21ae5bb
      Andrew Or authored
      As of #1777 we log the name of the actor system when it binds to a port. The current name "spark" is super general and does not convey any meaning. For instance, the following line is taken from my driver log after setting `spark.driver.port` to 5001.
      ```
      14/08/13 19:33:29 INFO Remoting: Remoting started; listening on addresses:
      [akka.tcp://sparkandrews-mbp:5001]
      14/08/13 19:33:29 INFO Remoting: Remoting now listens on addresses:
      [akka.tcp://sparkandrews-mbp:5001]
      14/08/06 13:40:05 INFO Utils: Successfully started service 'spark' on port 5001.
      ```
      This commit renames this to "sparkDriver" and "sparkExecutor". The goal of this unambitious PR is simply to make the logged information more explicit without introducing any change in functionality.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1810 from andrewor14/service-name and squashes the following commits:
      
      8c459ed [Andrew Or] Use a common variable for driver/executor actor system names
      3a92843 [Andrew Or] Change actor name to sparkDriver and sparkExecutor
      921363e [Andrew Or] Merge branch 'master' of github.com:apache/spark into service-name
      c8c6a62 [Andrew Or] Do not include hyphens in actor name
      1c1b42e [Andrew Or] Avoid spaces in akka system name
      f644b55 [Andrew Or] Use more specific service name
  19. Aug 20, 2014
    • [HOTFIX][Streaming][MLlib] use temp folder for checkpoint · fce5c0fb
      Xiangrui Meng authored
      or Jenkins will complain about no Apache header in checkpoint files. tdas rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #2046 from mengxr/tmp-checkpoint and squashes the following commits:
      
      0d3ec73 [Xiangrui Meng] remove ssc.stop
      9797843 [Xiangrui Meng] change checkpointDir to lazy val
      89964ab [Xiangrui Meng] use temp folder for checkpoint
  20. Aug 19, 2014
    • [SPARK-3128][MLLIB] Use streaming test suite for StreamingLR · 31f0b071
      freeman authored
      Refactored tests for streaming linear regression to use existing  streaming test utilities. Summary of changes:
      - Made ``mllib`` depend on tests from ``streaming``
      - Rewrote accuracy and convergence tests to use ``setupStreams`` and ``runStreams``
      - Added new test for the accuracy of predictions generated by ``predictOnValue``
      
      These tests should run faster, be easier to extend/maintain, and provide a reference for new tests.
      
      mengxr tdas
      
      Author: freeman <the.freeman.lab@gmail.com>
      
      Closes #2037 from freeman-lab/streamingLR-predict-tests and squashes the following commits:
      
      e851ca7 [freeman] Fixed long lines
      50eb0bf [freeman] Refactored tests to use streaming test tools
      32c43c2 [freeman] Added test for prediction
  21. Aug 18, 2014
    • [SPARK-2169] Don't copy appName / basePath everywhere. · 66ade00f
      Marcelo Vanzin authored
      Instead of keeping copies in all pages, just reference the values
      kept in the base SparkUI instance (by making them available via
      getters).
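      The pattern, simplified (not the actual class bodies):
      
      ```
      class SparkUI(val appName: String, val basePath: String)
      
      abstract class WebUIPage(parent: SparkUI) {
        // Pages read through to the parent instead of storing their own copies.
        protected def appName: String = parent.appName
        protected def basePath: String = parent.basePath
      }
      ```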
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #1252 from vanzin/SPARK-2169 and squashes the following commits:
      
      4412fc6 [Marcelo Vanzin] Simplify UIUtils.headerSparkPage signature.
      4e5d35a [Marcelo Vanzin] [SPARK-2169] Don't copy appName / basePath everywhere.
  22. Aug 15, 2014
    • [SPARK-2924] remove default args to overloaded methods · 7589c39d
      Anand Avati authored
      Default arguments on overloaded methods are not supported in Scala 2.11. Split them into separate methods instead.
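
      A sketch of the restriction and the workaround; the class and method names are made up:

      ```scala
      class WriterSketch {
        // Scala 2.11 rejects default arguments on more than one alternative
        // of an overloaded method, where 2.10 was more permissive:
        //
        //   def save(path: String, overwrite: Boolean = false): Unit = ???
        //   def save(paths: Seq[String], overwrite: Boolean = false): Unit = ???
        //
        // The fix applied here: drop the defaults and spell out each overload.
        def save(path: String): Unit = save(path, overwrite = false)
        def save(path: String, overwrite: Boolean): Unit = { /* write one file */ }
        def save(paths: Seq[String]): Unit = save(paths, overwrite = false)
        def save(paths: Seq[String], overwrite: Boolean): Unit = { /* write many */ }
      }
      ```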
      
      Author: Anand Avati <avati@redhat.com>
      
      Closes #1704 from avati/SPARK-1812-default-args and squashes the following commits:
      
      3e3924a [Anand Avati] SPARK-1812: Add Mima excludes for the broken ABI
      901dfc7 [Anand Avati] SPARK-1812: core - Fix overloaded methods with default arguments
      07f00af [Anand Avati] SPARK-1812: streaming - Fix overloaded methods with default arguments
      7589c39d
    • Sean Owen's avatar
      SPARK-2955 [BUILD] Test code fails to compile with "mvn compile" without "install" · e1b85f31
      Sean Owen authored
      (This is the corrected follow-up to https://issues.apache.org/jira/browse/SPARK-2903)
      
      Right now, `mvn compile test-compile` fails to compile Spark. (Don't worry; `mvn package` works, so this is not major.) The issue stems from test code in some modules depending on test code in other modules. That is perfectly fine and supported by Maven.
      
      It takes extra work to get this to work with scalatest, and this has been attempted: https://github.com/apache/spark/blob/master/sql/catalyst/pom.xml#L86
      
      This formulation is not quite enough, since the SQL Core module's tests fail to compile because they cannot find test classes in SQL Catalyst, and likewise most Streaming integration modules fail for lack of core Streaming test code. Example:
      
      ```
      [error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:23: not found: type PlanTest
      [error] class QueryTest extends PlanTest {
      [error]                         ^
      [error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:28: package org.apache.spark.sql.test is not a value
      [error]   test("SPARK-1669: cacheTable should be idempotent") {
      [error]   ^
      ...
      ```
      
      The issue, I believe, is that generation of a `test-jar` is bound here to the `compile` phase, but the test classes are not yet compiled in that phase. It should bind to the `test-compile` phase instead.
      
      It works when executing `mvn package` or `mvn install` since test-jar artifacts are actually generated and made available through normal Maven mechanisms as each module is built. They are then found normally, regardless of scalatest configuration.
      
      It would be nice for a simple `mvn compile test-compile` to work since the test code is perfectly compilable given the Maven declarations.
      
      On the plus side, this change is low-risk as it only affects tests.
      yhuai, who made the original scalatest change, has glanced at this and thinks it makes sense.
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1879 from srowen/SPARK-2955 and squashes the following commits:
      
      ad8242f [Sean Owen] Generate test-jar on test-compile for modules whose tests are needed by others' tests
      e1b85f31
  25. Aug 14, 2014
    • Reynold Xin's avatar
      Minor cleanup of metrics.Source · eaeb0f76
      Reynold Xin authored
      - Added override.
      - Marked some variables as private.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1943 from rxin/metricsSource and squashes the following commits:
      
      fbfa943 [Reynold Xin] Minor cleanup of metrics.Source. - Added override. - Marked some variables as private.
      eaeb0f76
  26. Aug 02, 2014
    • Andrew Or's avatar
      [SPARK-2454] Do not ship spark home to Workers · 148af608
      Andrew Or authored
      When standalone Workers launch executors, they inherit the Spark home set by the driver. This means if the worker machines do not share the same directory structure as the driver node, the Workers will attempt to run scripts (e.g. bin/compute-classpath.sh) that do not exist locally and fail. This is a common scenario if the driver is launched from outside of the cluster.
      
      The solution is to simply not pass the driver's Spark home to the Workers. This PR further makes an attempt to avoid overloading the usages of `spark.home`, which is now only used for setting executor Spark home on Mesos and in python.
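
      A sketch of the resulting behavior; the field and class names are illustrative, not the actual `ApplicationDescription` or `Worker` code:

      ```scala
      // Before: the application description shipped the driver's local path,
      // which may not exist on the worker machine.
      case class AppDescBefore(name: String, sparkHome: String)

      // After: no path is shipped; each Worker substitutes a directory it
      // knows exists locally, e.g. its own working directory.
      case class AppDescAfter(name: String)

      class WorkerSketch(workDir: String) {
        def executorSparkHome(app: AppDescAfter): String = workDir
      }
      ```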
      
      This is based on top of #1392 and originally reported by YanTangZhai. Tested on standalone cluster.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1734 from andrewor14/spark-home-reprise and squashes the following commits:
      
      f71f391 [Andrew Or] Revert changes in python
      1c2532c [Andrew Or] Merge branch 'master' of github.com:apache/spark into spark-home-reprise
      188fc5d [Andrew Or] Avoid using spark.home where possible
      09272b7 [Andrew Or] Always use Worker's working directory as spark home
      148af608
  27. Aug 01, 2014
    • joyyoj's avatar
      [SPARK-2379] Fix the bug that streaming's receiver may fall into a dead loop · b270309d
      joyyoj authored
      Author: joyyoj <sunshch@gmail.com>
      
      Closes #1694 from joyyoj/SPARK-2379 and squashes the following commits:
      
      d73790d [joyyoj] SPARK-2379 Fix the bug that streaming's receiver may fall into a dead loop
      22e7821 [joyyoj] Merge remote-tracking branch 'apache/master'
      3f4a602 [joyyoj] Merge remote-tracking branch 'remotes/apache/master'
      f4660c5 [joyyoj] [SPARK-1998] SparkFlumeEvent with body bigger than 1020 bytes are not read properly
      b270309d
  28. Jul 30, 2014
    • Sean Owen's avatar
      SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven... · 6ab96a6f
      Sean Owen authored
      SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep
      
      The Maven-based builds in the build matrix have been failing for a few days:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/
      
      On inspection, it looks like the Spark SQL Java tests don't compile:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/244/consoleFull
      
      I confirmed it by repeating the command vs master:
      
      `mvn -Dhadoop.version=1.0.4 -Dlabel=centos -DskipTests clean package`
      
      The problem is that this module doesn't depend on JUnit. In fact, none of the modules do, but `com.novocode:junit-interface` (the SBT-JUnit bridge) pulls it in, in most places. However, this module doesn't depend on `com.novocode:junit-interface`.
      
      Adding the `junit:junit` dependency fixes the compile problem. In fact, the other modules with Java tests should probably depend on it explicitly instead of happening to get it via `com.novocode:junit-interface`, since that is a bit SBT/Scala-specific (and I am not even sure it's needed).
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1660 from srowen/SPARK-2749 and squashes the following commits:
      
      858ff7c [Sean Owen] Add explicit junit dep to other modules with Java tests for robustness
      9636794 [Sean Owen] Add junit dep so that Spark SQL Java tests compile
      6ab96a6f
  29. Jul 28, 2014
    • Cheng Lian's avatar
      [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate `hive-thriftserver` profile in the sub-project `assembly`, but the `hive-thriftserver` module was defined outside that profile. Thus every pull request, even one that doesn't touch SQL code, would also execute the test suites defined in `hive-thriftserver`, and those tests fail because the related .class files are not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
      a7a9d144
  30. Jul 27, 2014
    • Patrick Wendell's avatar
      Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
      e5bbce9a
    • Andrew Or's avatar
      [SPARK-1777] Prevent OOMs from single partitions · ecf30ee7
      Andrew Or authored
      **Problem.** When caching, we currently unroll the entire RDD partition before making sure we have enough free memory. This is a common cause for OOMs especially when (1) the BlockManager has little free space left in memory, and (2) the partition is large.
      
      **Solution.** We maintain a global memory pool of `M` bytes shared across all threads, similar to the way we currently manage memory for shuffle aggregation. Then, while we unroll each partition, periodically check if there is enough space to continue. If not, drop enough RDD blocks to ensure we have at least `M` bytes to work with, then try again. If we still don't have enough space to unroll the partition, give up and drop the block to disk directly if applicable.
      
      **New configurations.**
      - `spark.storage.bufferFraction` - the value of `M` as a fraction of the storage memory. (default: 0.2)
      - `spark.storage.safetyFraction` - a margin of safety in case size estimation is slightly off. This is the equivalent of the existing `spark.shuffle.safetyFraction`. (default: 0.9)
      
      For more detail, see the [design document](https://issues.apache.org/jira/secure/attachment/12651793/spark-1777-design-doc.pdf). Tests for performance and memory usage patterns are pending.
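
      A simplified sketch of the core unrolling loop, with `M` supplied by the caller (roughly storage memory times `spark.storage.bufferFraction` in this design); `sizeOf` and `dropToDisk` are illustrative stand-ins for Spark's size estimation and disk store, and the drop-other-blocks-and-retry step described above is omitted:

      ```scala
      import scala.collection.mutable.ArrayBuffer
      import scala.reflect.ClassTag

      object SafeUnrollSketch {
        // Returns Some(array) if the partition fit within the unroll pool,
        // or None after spilling everything to disk.
        def unrollSafely[T: ClassTag](
            values: Iterator[T],
            maxUnrollMemory: Long,
            sizeOf: T => Long,
            dropToDisk: Iterator[T] => Unit): Option[Array[T]] = {
          val buffer = new ArrayBuffer[T]
          var used = 0L
          while (values.hasNext) {
            val v = values.next()
            used += sizeOf(v)  // the real code estimates size periodically
            if (used > maxUnrollMemory) {
              // Pool exhausted: give up on memory and stream what we have
              // buffered, plus the rest of the iterator, to disk.
              dropToDisk(buffer.iterator ++ values)
              return None
            }
            buffer += v
          }
          Some(buffer.toArray)
        }
      }
      ```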
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1165 from andrewor14/them-rdd-memories and squashes the following commits:
      
      e77f451 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      c7c8832 [Andrew Or] Simplify logic + update a few comments
      269d07b [Andrew Or] Very minor changes to tests
      6645a8a [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      b7e165c [Andrew Or] Add new tests for unrolling blocks
      f12916d [Andrew Or] Slightly clean up tests
      71672a7 [Andrew Or] Update unrollSafely tests
      369ad07 [Andrew Or] Correct ensureFreeSpace and requestMemory behavior
      f4d035c [Andrew Or] Allow one thread to unroll multiple blocks
      a66fbd2 [Andrew Or] Rename a few things + update comments
      68730b3 [Andrew Or] Fix weird scalatest behavior
      e40c60d [Andrew Or] Fix MIMA excludes
      ff77aa1 [Andrew Or] Fix tests
      1a43c06 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      b9a6eee [Andrew Or] Simplify locking behavior on unrollMemoryMap
      ed6cda4 [Andrew Or] Formatting fix (super minor)
      f9ff82e [Andrew Or] putValues -> putIterator + putArray
      beb368f [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      8448c9b [Andrew Or] Fix tests
      a49ba4d [Andrew Or] Do not expose unroll memory check period
      69bc0a5 [Andrew Or] Always synchronize on putLock before unrollMemoryMap
      3f5a083 [Andrew Or] Simplify signature of ensureFreeSpace
      dce55c8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      8288228 [Andrew Or] Synchronize put and unroll properly
      4f18a3d [Andrew Or] bufferFraction -> unrollFraction
      28edfa3 [Andrew Or] Update a few comments / log messages
      728323b [Andrew Or] Do not synchronize every 1000 elements
      5ab2329 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      129c441 [Andrew Or] Fix bug: Use toArray rather than array
      9a65245 [Andrew Or] Update a few comments + minor control flow changes
      57f8d85 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      abeae4f [Andrew Or] Add comment clarifying the MEMORY_AND_DISK case
      3dd96aa [Andrew Or] AppendOnlyBuffer -> Vector (+ a few small changes)
      f920531 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      0871835 [Andrew Or] Add an effective storage level interface to BlockManager
      64e7d4c [Andrew Or] Add/modify a few comments (minor)
      8af2f35 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      4f4834e [Andrew Or] Use original storage level for blocks dropped to disk
      ecc8c2d [Andrew Or] Fix binary incompatibility
      24185ea [Andrew Or] Avoid dropping a block back to disk if reading from disk
      2b7ee66 [Andrew Or] Fix bug in SizeTracking*
      9b9a273 [Andrew Or] Fix tests
      20eb3e5 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      649bdb3 [Andrew Or] Document spark.storage.bufferFraction
      a10b0e7 [Andrew Or] Add initial memory request threshold + rename a few things
      e9c3cb0 [Andrew Or] cacheMemoryMap -> unrollMemoryMap
      198e374 [Andrew Or] Unfold -> unroll
      0d50155 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      d9d02a8 [Andrew Or] Remove unused param in unfoldSafely
      ec728d8 [Andrew Or] Add tests for safe unfolding of blocks
      22b2209 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      078eb83 [Andrew Or] Add check for hasNext in PrimitiveVector.iterator
      0871535 [Andrew Or] Fix tests in BlockManagerSuite
      d68f31e [Andrew Or] Safely unfold blocks for all memory puts
      5961f50 [Andrew Or] Fix tests
      195abd7 [Andrew Or] Refactor: move unfold logic to MemoryStore
      1e82d00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      3ce413e [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      d5dd3b4 [Andrew Or] Free buffer memory in finally
      ea02eec [Andrew Or] Fix tests
      b8e1d9c [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      a8704c1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      e1b8b25 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      87aa75c [Andrew Or] Fix mima excludes again (typo)
      11eb921 [Andrew Or] Clarify comment (minor)
      50cae44 [Andrew Or] Remove now duplicate mima exclude
      7de5ef9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      df47265 [Andrew Or] Fix binary incompatibility
      6d05a81 [Andrew Or] Merge branch 'master' of github.com:apache/spark into them-rdd-memories
      f94f5af [Andrew Or] Update a few comments (minor)
      776aec9 [Andrew Or] Prevent OOM if a single RDD partition is too large
      bbd3eea [Andrew Or] Fix CacheManagerSuite to use Array
      97ea499 [Andrew Or] Change BlockManager interface to use Arrays
      c12f093 [Andrew Or] Add SizeTrackingAppendOnlyBuffer and tests
      ecf30ee7
    • Cheng Lian's avatar
      [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement for #1399, trying to fix a potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
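
      The random-port trick referenced above, as a minimal sketch (illustrative, not the actual test code):

      ```scala
      import java.net.ServerSocket

      object FreePortSketch {
        // Bind to port 0 so the OS assigns a free ephemeral port, then release
        // it and pass the number to the server under test; parallel builds are
        // then very unlikely to collide on the same port.
        def findFreePort(): Int = {
          val socket = new ServerSocket(0)
          try socket.getLocalPort finally socket.close()
        }
      }
      ```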
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
      f6ff2a61
  31. Jul 25, 2014
    • Michael Armbrust's avatar
      Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail. We should investigate and put this back once it's passing tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
      afd757a2