    [SPARK-4969][STREAMING][PYTHON] Add binaryRecords to streaming · 242b4f02
    freeman authored
    In Spark 1.2 we added a `binaryRecords` input method for loading flat binary data. This format is useful for numerical array data, e.g. in scientific computing applications. This PR adds support for the same format in Streaming applications, where it is similarly useful, especially for streaming time series or sensor data.
    
    Summary of additions
    - adding `binaryRecordsStream` to Spark Streaming
    - exposing `binaryRecordsStream` in the new PySpark Streaming
    - new unit tests in Scala and Python
    
    This required adding an optional Hadoop configuration parameter to `fileStream` and `FileInputDStream`, but was otherwise straightforward.
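
To make the "flat binary data" format concrete, here is a minimal sketch in plain Python (no Spark dependency) of how a byte buffer of fixed-length records can be split and decoded. The helper names `split_records` and `decode_record`, the three-doubles record layout, and the whole-buffer length check are assumptions for illustration only; they are not the code added by this PR.

```python
import struct
from typing import List, Tuple


def split_records(data: bytes, record_length: int) -> List[bytes]:
    """Split a flat byte buffer into consecutive fixed-length records.

    Hypothetical helper: raises ValueError when the buffer is not a whole
    number of records, a defensive check assumed for this sketch.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if len(data) % record_length != 0:
        raise ValueError(
            "buffer of %d bytes is not a multiple of record length %d"
            % (len(data), record_length)
        )
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def decode_record(record: bytes, fmt: str) -> Tuple[float, ...]:
    """Decode one record using a struct format string."""
    return struct.unpack(fmt, record)


# Assumed layout: each record holds three little-endian float64 values.
fmt = "<3d"
record_length = struct.calcsize(fmt)

# Two records written back to back, with no delimiters or headers.
payload = struct.pack(fmt, 1.0, 2.0, 3.0) + struct.pack(fmt, 4.0, 5.0, 6.0)

records = split_records(payload, record_length)
decoded = [decode_record(r, fmt) for r in records]
```

Because the records carry no delimiters, the reader has to know the record length in advance, which is why the streaming method takes it as an argument.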
    
    tdas davies
    
    Author: freeman <the.freeman.lab@gmail.com>
    
    Closes #3803 from freeman-lab/streaming-binary-records and squashes the following commits:
    
    b676534 [freeman] Clarify note
    5ff1b75 [freeman] Add note to java streaming context
    eba925c [freeman] Simplify notes
    c4237b8 [freeman] Add experimental tag
    30eba67 [freeman] Add filter and newFilesOnly alongside conf
    c2cfa6d [freeman] Expose new version of fileStream with conf in java
    34d20ef [freeman] Add experimental tag
    14bca9a [freeman] Add experimental tag
    b85bffc [freeman] Formatting
    47560f4 [freeman] Space formatting
    9a3715a [freeman] Refactor to reflect changes to FileInputSuite
    7373f73 [freeman] Add note and defensive assertion for byte length
    3ceb684 [freeman] Merge remote-tracking branch 'upstream/master' into streaming-binary-records
    317b6d1 [freeman] Make test inline
    fcb915c [freeman] Formatting
    becb344 [freeman] Formatting
    d3e75b2 [freeman] Add tests in python
    a4324a3 [freeman] Line length
    029d49c [freeman] Formatting
    1c739aa [freeman] Simpler default arg handling
    94d90d0 [freeman] Spelling
    2843e9d [freeman] Add params to docstring
    8b70fbc [freeman] Reorganization
    28bff9b [freeman] Fix missing arg
    9398bcb [freeman] Expose optional hadoop configuration
    23dd69f [freeman] Tests for binaryRecordsStream
    36cb0fd [freeman] Add binaryRecordsStream to scala
    fe4e803 [freeman] Add binaryRecordStream to Java API
    ecef0eb [freeman] Add binaryRecordsStream to python
    8550c26 [freeman] Expose additional argument combination