    ce95bd8e
    [SPARK-4531] [MLlib] cache serialized java object · ce95bd8e
    Davies Liu authored
    Pyrolite is quite slow compared with the ad hoc serializer in 1.1, and it causes a significant performance regression in 1.2: we cache the serialized Python objects in the JVM and then deserialize them into Java objects at every step.
    
    This PR changes the code to cache the deserialized JavaRDD instead of the PythonRDD, which avoids the repeated Pyrolite deserialization. Memory usage should be similar to before, but it should be much faster.
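
The effect of caching the deserialized form can be illustrated with a minimal sketch. This is not the code in this PR: it assumes Python's `pickle` as a stand-in for Pyrolite and plain lists as a stand-in for RDDs, and the function names are hypothetical. It only counts how many deserializations each strategy performs over a number of iterative steps.

```python
import pickle


def make_cached_bytes(records):
    """Serialize each record once, mimicking a cache of serialized objects."""
    return [pickle.dumps(r) for r in records]


def sum_from_serialized(cached_bytes, steps):
    """Old behaviour (sketch): deserialize every cached record at each step."""
    total = 0.0
    deserializations = 0
    for _ in range(steps):
        for blob in cached_bytes:
            total += sum(pickle.loads(blob))
            deserializations += 1
    return total, deserializations


def sum_from_deserialized(cached_bytes, steps):
    """New behaviour (sketch): deserialize once, then reuse the cached objects."""
    objects = [pickle.loads(blob) for blob in cached_bytes]
    deserializations = len(objects)
    total = 0.0
    for _ in range(steps):
        for obj in objects:
            total += sum(obj)
    return total, deserializations
```

With two records and five steps, the first strategy deserializes ten times and the second only twice, while both compute the same result. The gap grows linearly with the number of steps, which is why iterative algorithms are the ones most affected.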
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #3397 from davies/cache and squashes the following commits:
    
    7f6e6ce [Davies Liu] Update -> Updater
    4b52edd [Davies Liu] using named argument
    63b984e [Davies Liu] fix
    7da0332 [Davies Liu] add unpersist()
    dff33e1 [Davies Liu] address comments
    c2bdfc2 [Davies Liu] refactor
    d572f00 [Davies Liu] Merge branch 'master' into cache
    f1063e1 [Davies Liu] cache serialized java object