Commit a234cc61 authored by Liwei Lin's avatar Liwei Lin Committed by Michael Armbrust

[SPARK-14874][SQL][STREAMING] Remove the obsolete Batch representation

## What changes were proposed in this pull request?

The `Batch` class, which was used to pass a batch of data through a streaming query along with an indication of progress in the stream, was superseded by [[SPARK-13985][SQL] Deterministic batches with ids](https://github.com/apache/spark/commit/caea15214571d9b12dcf1553e5c1cc8b83a8ba5b) and has been unused since.

This patch:
- removes the `Batch` class
- ~~does some related renaming~~ (update: this has been reverted)
- fixes some related comments

## How was this patch tested?

N/A

Author: Liwei Lin <lwlin7@gmail.com>

Closes #12638 from lw-lin/remove-batch.
parent 7dd01d9c
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark.sql.execution.streaming

import org.apache.spark.sql.DataFrame
/**
* Used to pass a batch of data through a streaming query execution along with an indication
* of progress in the stream.
*/
class Batch(val end: Offset, val data: DataFrame)
@@ -88,7 +88,7 @@ class FileStreamSource(
}
/**
- * Returns the next batch of data that is available after `start`, if any is available.
+ * Returns the data that is between the offsets (`start`, `end`].
*/
override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
val startId = start.map(_.asInstanceOf[LongOffset].offset).getOrElse(-1L)
......
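The `(start, end]` contract that `getBatch` implements can be modeled with plain collections. This is a hedged sketch, not Spark's actual implementation: `ToySource` and its record list are hypothetical, but the default of `-1L` for a missing `start` mirrors the `getOrElse(-1L)` above.

```scala
// Simplified, hypothetical model of the (start, end] contract in getBatch.
// Offsets index records 0..n-1; a missing start defaults to -1, so the very
// first batch begins at the first available record.
case class LongOffset(offset: Long)

class ToySource(records: IndexedSeq[String]) {
  // Returns the records whose offsets lie in (start, end]: exclusive of
  // start, inclusive of end.
  def getBatch(start: Option[LongOffset], end: LongOffset): Seq[String] = {
    val startId = start.map(_.offset).getOrElse(-1L)
    records.slice((startId + 1).toInt, (end.offset + 1).toInt)
  }
}
```

Exclusive-start/inclusive-end ranges compose cleanly: the `end` of one batch can be handed back as the `start` of the next without skipping or duplicating records.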
@@ -27,7 +27,7 @@ import org.apache.spark.sql.DataFrame
trait Sink {
/**
 * Adds a batch of data to this sink. The data for a given `batchId` is deterministic and if
* this method is called more than once with the same batchId (which will happen in the case of
* failures), then `data` should only be added once.
*/
......
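The idempotence requirement in the `addBatch` comment can be sketched with a hypothetical in-memory sink (not a real Spark class): because the data for a given `batchId` is deterministic, a sink that remembers committed batch ids can safely ignore replays after a failure.

```scala
// Hypothetical sink illustrating the addBatch contract: replaying the same
// batchId (as happens during failure recovery) must not add the data twice.
import scala.collection.mutable

class ToySink {
  private val committed = mutable.Set.empty[Long]
  private val rows = mutable.ArrayBuffer.empty[String]

  def addBatch(batchId: Long, data: Seq[String]): Unit = {
    // Set.add returns false when batchId was already committed, so a
    // replayed batch is dropped rather than appended a second time.
    if (committed.add(batchId)) {
      rows ++= data
    }
  }

  def allRows: Seq[String] = rows.toList
}
```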
@@ -34,7 +34,7 @@ trait Source {
def getOffset: Option[Offset]
/**
 * Returns the data that is between the offsets (`start`, `end`]. When `start` is `None` then
* the batch should begin with the first available record. This method must always return the
* same data for a particular `start` and `end` pair.
*/
......
@@ -91,7 +91,7 @@ case class MemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)
}
/**
- * Returns the next batch of data that is available after `start`, if any is available.
+ * Returns the data that is between the offsets (`start`, `end`].
*/
override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
val startOrdinal =
......
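The determinism requirement ("must always return the same data for a particular `start` and `end` pair") can be checked against a hypothetical buffer-backed source, loosely modeled on `MemoryStream`'s ordinal indexing; the class and its batch buffer are illustrative assumptions, not Spark code.

```scala
// Hypothetical buffer-backed source: start/end are batch ordinals, and
// getBatch must return identical data every time for a given (start, end).
case class LongOffset(offset: Long)

class BufferSource(batches: Vector[Seq[String]]) {
  // Returns the batches with ordinals in (start, end], flattened into rows.
  def getBatch(start: Option[LongOffset], end: LongOffset): Seq[String] = {
    val startOrdinal = start.map(_.offset).getOrElse(-1L).toInt + 1
    batches.slice(startOrdinal, end.offset.toInt + 1).flatten
  }
}
```

Since the underlying buffer is never mutated by reads, repeated calls with the same offset pair trivially return the same rows, which is what lets a failed batch be recomputed and re-delivered safely.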