[SPARK-4969][STREAMING][PYTHON] Add binaryRecords to streaming
In Spark 1.2 we added a `binaryRecords` input method for loading flat binary data. This format is useful for numerical array data, e.g. in scientific computing applications. This PR adds support for the same format in Streaming applications, where it is similarly useful, especially for streaming time series or sensor data.

Summary of additions:
- adding `binaryRecordsStream` to Spark Streaming
- exposing `binaryRecordsStream` in the new PySpark Streaming
- new unit tests in Scala and Python

This required adding an optional Hadoop configuration parameter to `fileStream` and `FileInputDStream`, but was otherwise straightforward.

tdas davies

Author: freeman <the.freeman.lab@gmail.com>

Closes #3803 from freeman-lab/streaming-binary-records and squashes the following commits:

b676534 [freeman] Clarify note
5ff1b75 [freeman] Add note to java streaming context
eba925c [freeman] Simplify notes
c4237b8 [freeman] Add experimental tag
30eba67 [freeman] Add filter and newFilesOnly alongside conf
c2cfa6d [freeman] Expose new version of fileStream with conf in java
34d20ef [freeman] Add experimental tag
14bca9a [freeman] Add experimental tag
b85bffc [freeman] Formatting
47560f4 [freeman] Space formatting
9a3715a [freeman] Refactor to reflect changes to FileInputSuite
7373f73 [freeman] Add note and defensive assertion for byte length
3ceb684 [freeman] Merge remote-tracking branch 'upstream/master' into streaming-binary-records
317b6d1 [freeman] Make test inline
fcb915c [freeman] Formatting
becb344 [freeman] Formatting
d3e75b2 [freeman] Add tests in python
a4324a3 [freeman] Line length
029d49c [freeman] Formatting
1c739aa [freeman] Simpler default arg handling
94d90d0 [freeman] Spelling
2843e9d [freeman] Add params to docstring
8b70fbc [freeman] Reorganization
28bff9b [freeman] Fix missing arg
9398bcb [freeman] Expose optional hadoop configuration
23dd69f [freeman] Tests for binaryRecordsStream
36cb0fd [freeman] Add binaryRecordsStream to scala
fe4e803 [freeman] Add binaryRecordStream to Java API
ecef0eb [freeman] Add binaryRecordsStream to python
8550c26 [freeman] Expose additional argument combination
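For readers new to the format: "flat binary" means records are concatenated back to back with no header or delimiter, and every record has the same byte length (`recordLength`). With this change, a PySpark job would call `ssc.binaryRecordsStream(directory, recordLength)` and receive the records of each new file in `directory` as a DStream of raw byte strings. The stdlib-only sketch below does not use Spark; it assumes a hypothetical layout of three little-endian float64 values per record (24 bytes) and shows how a producer might write such a file and how its bytes split back into fixed-length records. The names `write_records`, `read_records`, and `decode` are illustrative, and rejecting a file whose size is not a multiple of the record length is this sketch's own choice, not necessarily Spark's behavior.

```python
import os
import struct
import tempfile

# Hypothetical record layout (an assumption, not from the PR): one sensor
# reading of three little-endian float64 values, so each record is 24 bytes.
RECORD_FORMAT = "<3d"
RECORD_LENGTH = struct.calcsize(RECORD_FORMAT)  # 24


def write_records(path, rows):
    """Write rows as flat binary: fixed-size records, no header, no delimiters."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(RECORD_FORMAT, *row))


def read_records(path, record_length):
    """Split a flat binary file into a list of fixed-length byte records."""
    with open(path, "rb") as f:
        data = f.read()
    # Sketch's own policy: refuse files with a trailing partial record.
    if len(data) % record_length != 0:
        raise ValueError("file size is not a multiple of the record length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def decode(record):
    """Unpack one raw record back into a tuple of floats."""
    return struct.unpack(RECORD_FORMAT, record)


if __name__ == "__main__":
    rows = [(1.0, 2.0, 3.0), (4.5, -1.25, 0.0)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "batch-0.bin")
        write_records(path, rows)
        records = read_records(path, RECORD_LENGTH)
        print([decode(r) for r in records])  # [(1.0, 2.0, 3.0), (4.5, -1.25, 0.0)]
```

In a streaming job, the per-record `decode` step would typically be applied with `map` over the DStream returned by `binaryRecordsStream`, since each element of that stream is one raw record of `recordLength` bytes.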
Showing 7 changed files with 212 additions and 7 deletions:
- core/src/main/scala/org/apache/spark/SparkContext.scala: 8 additions, 1 deletion
- python/pyspark/streaming/context.py: 15 additions, 1 deletion
- python/pyspark/streaming/tests.py: 15 additions, 0 deletions
- streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala: 58 additions, 1 deletion
- streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala: 51 additions, 1 deletion
- streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala: 14 additions, 3 deletions
- streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala: 51 additions, 0 deletions