Skip to content
Snippets Groups Projects
Commit 8c49cebc authored by Josh Rosen's avatar Josh Rosen Committed by Reynold Xin
Browse files

[SPARK-14966] SizeEstimator should ignore classes in the scala.reflect package

In local profiling, I noticed SizeEstimator spending tons of time estimating the size of objects which contain TypeTag or ClassTag fields. The problem with these tags is that they reference global Scala reflection objects, which, in turn, reference many singletons, such as TestHive. This throws off the accuracy of the size estimation and wastes tons of time traversing a huge object graph.

As a result, I think that SizeEstimator should ignore any classes in the `scala.reflect` package.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #12741 from JoshRosen/ignore-scala-reflect-in-size-estimator.
parent f5ebb18c
No related branches found
No related tags found
No related merge requests found
......@@ -207,6 +207,9 @@ object SizeEstimator extends Logging {
val cls = obj.getClass
if (cls.isArray) {
visitArray(obj, cls, state)
} else if (cls.getName.startsWith("scala.reflect")) {
// Many objects in the scala.reflect package reference global reflection objects which, in
// turn, reference many other large global objects. Do nothing in this case.
} else if (obj.isInstanceOf[ClassLoader] || obj.isInstanceOf[Class[_]]) {
// Hadoop JobConfs created in the interpreter have a ClassLoader, which greatly confuses
// the size estimator since it references the whole REPL. Do nothing in this case. In
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment