Commit 1a49496b authored by Cheng Lian, committed by Michael Armbrust

[SPARK-6082] [SQL] Provides better error message for malformed rows when caching tables

Constructs like Hive `TRANSFORM` may generate malformed rows (via badly authored external scripts for example). I'm a bit hesitant to have this feature, since it introduces per-tuple cost when caching tables. However, considering caching tables is usually a one-time cost, this is probably worth having.


Author: Cheng Lian <lian@databricks.com>

Closes #4842 from liancheng/spark-6082 and squashes the following commits:

b05dbff [Cheng Lian] Provides better error message for malformed rows when caching tables
parent 8223ce6a
@@ -119,6 +119,17 @@ private[sql] case class InMemoryRelation(
var rowCount = 0
while (rowIterator.hasNext && rowCount < batchSize) {
  val row = rowIterator.next()
  // Added for SPARK-6082. This assertion is useful in scenarios where something
  // like Hive TRANSFORM is used. The external data generation script used in TRANSFORM
  // may produce malformed rows, causing an ArrayIndexOutOfBoundsException, which is
  // somewhat hard to decipher.
  assert(
    row.size == columnBuilders.size,
    s"""Row column number mismatch, expected ${output.size} columns, but got ${row.size}.
       |Row content: $row
     """.stripMargin)
  var i = 0
  while (i < row.length) {
    columnBuilders(i).appendFrom(row, i)
......
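The effect of the added assertion can be illustrated in isolation. The following is a minimal sketch, not Spark's actual code: `ArityCheckSketch`, its `Row` alias, the toy `ColumnBuilder`, and `appendRow` are hypothetical stand-ins introduced here, assuming a row is a plain `Seq[Any]` and a column builder simply collects values. It shows how checking the row's column count before the append loop yields a readable error instead of an index-out-of-bounds failure deep inside the loop.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark's classes.
object ArityCheckSketch {
  type Row = Seq[Any]

  // Toy column builder: collects the values of a single column.
  final class ColumnBuilder {
    private val values = scala.collection.mutable.ArrayBuffer.empty[Any]
    def appendFrom(row: Row, ordinal: Int): Unit = values += row(ordinal)
    def size: Int = values.size
  }

  // Appends one row to the builders, failing fast with a descriptive message
  // when the row's column count differs from the number of builders.
  def appendRow(row: Row, builders: IndexedSeq[ColumnBuilder]): Unit = {
    assert(
      row.size == builders.size,
      s"""Row column number mismatch, expected ${builders.size} columns, but got ${row.size}.
         |Row content: $row""".stripMargin)
    var i = 0
    while (i < row.length) {
      builders(i).appendFrom(row, i)
      i += 1
    }
  }
}
```

With two builders, `appendRow(Seq(1, "a"), builders)` succeeds, while `appendRow(Seq(1, "a", "extra"), builders)` throws an `AssertionError` naming the expected and actual column counts before any value is appended. Note that Scala's `assert` can be elided at compile time (for example with `-Xdisable-assertions`), so a check of this kind is a diagnostic aid rather than a guaranteed validation step.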