  1. Oct 27, 2015
    • [SPARK-5569][STREAMING] fix ObjectInputStreamWithLoader for supporting load array classes. · 17f49992
      maxwell authored
      When using the Kafka DirectStream API to create a checkpoint and then restoring the saved checkpoint on restart,
      a ClassNotFoundException occurs.
      
      The reason for this error is that ObjectInputStreamWithLoader extends the ObjectInputStream class and overrides its resolveClass method. But instead of using Class.forName(desc, false, loader), Spark uses loader.loadClass(desc) to load the class, which does not work with array classes.
      
      For example:
      Class.forName("[Lorg.apache.spark.streaming.kafka.OffsetRange;", false, loader) works well, while loader.loadClass("[Lorg.apache.spark.streaming.kafka.OffsetRange;") throws a ClassNotFoundException.
      
      Details of the difference between Class.forName and loader.loadClass can be found here:
      http://bugs.java.com/view_bug.do?bug_id=6446627
      
      Author: maxwell <maxwellzdm@gmail.com>
      Author: DEMING ZHU <deming.zhu@linecorp.com>
      
      Closes #8955 from maxwellzdm/master.
    • [SPARK-11270][STREAMING] Add improved equality testing for TopicAndPartition from the Kafka Streaming API · 8f888eea
      Nick Evans authored
      
      jerryshao tdas
      
      I know this is kind of minor, and I know you all are busy, but this brings this class in line with the `OffsetRange` class, and makes tests a little more concise.
      
      Instead of doing something like:
      ```
      assert topic_and_partition_instance._topic == "foo"
      assert topic_and_partition_instance._partition == 0
      ```
      
      You can do something like:
      ```
      assert topic_and_partition_instance == TopicAndPartition("foo", 0)
      ```
      
      Before:
      ```
      >>> from pyspark.streaming.kafka import TopicAndPartition
      >>> TopicAndPartition("foo", 0) == TopicAndPartition("foo", 0)
      False
      ```
      
      After:
      ```
      >>> from pyspark.streaming.kafka import TopicAndPartition
      >>> TopicAndPartition("foo", 0) == TopicAndPartition("foo", 0)
      True
      ```
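
Value-based equality of this kind is commonly implemented in Python by overriding `__eq__` and `__ne__` to compare the wrapped fields. The following is a minimal sketch, not the actual PySpark source: the class name `TopicAndPartitionSketch` is hypothetical, the `__hash__` method is an addition made here for consistency, and only the `_topic` and `_partition` attribute names are taken from the snippets above.

```python
class TopicAndPartitionSketch(object):
    """Hypothetical stand-in for pyspark.streaming.kafka.TopicAndPartition."""

    def __init__(self, topic, partition):
        self._topic = topic
        self._partition = partition

    def __eq__(self, other):
        # Compare by value only when the other object is of the same type.
        if isinstance(other, self.__class__):
            return (self._topic == other._topic
                    and self._partition == other._partition)
        return False

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        return not self.__eq__(other)

    def __hash__(self):
        # Assumption of this sketch: keep hashing consistent with equality.
        return hash((self._topic, self._partition))
```

With a definition along these lines, two instances built from the same topic and partition compare equal, while an instance compared against an unrelated type compares unequal instead of raising.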
      
      I couldn't find any tests - am I missing something?
      
      Author: Nick Evans <me@nicolasevans.org>
      
      Closes #9236 from manygrams/topic_and_partition_equality.
    • [SPARK-11276][CORE] SizeEstimator prevents class unloading · feb8d6a4
      Sem Mulder authored
      The SizeEstimator keeps a cache of ClassInfos, but this cache uses Class objects as keys,
      which results in strong references to the Class objects. If these classes are dynamically created,
      this prevents the corresponding ClassLoader from being GCed, leading to PermGen exhaustion.
      
      We use a Map with weak keys to prevent this issue.
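
The effect of weak keys can be illustrated with a small Python analogy. This is a sketch under stated assumptions, not Spark's Scala code: `ClassInfoSketch`, `make_dynamic_class`, `demo`, and both caches are hypothetical names, and Python's `weakref.WeakKeyDictionary` stands in for the weak-keyed map mentioned above. It also relies on CPython reclaiming the unreferenced classes when `gc.collect()` runs.

```python
import gc
import weakref


class ClassInfoSketch(object):
    """Hypothetical cached metadata about a class (must not reference the key)."""

    def __init__(self, field_count):
        self.field_count = field_count


# A plain dict holds strong references to its keys.
strong_cache = {}
# A WeakKeyDictionary drops an entry once its key is no longer referenced elsewhere.
weak_cache = weakref.WeakKeyDictionary()


def make_dynamic_class(name):
    # Create a class at runtime, similar in spirit to dynamically generated classes.
    return type(name, (), {"a": 1, "b": 2})


def demo():
    strong_cls = make_dynamic_class("StrongDyn")
    weak_cls = make_dynamic_class("WeakDyn")
    strong_cache[strong_cls] = ClassInfoSketch(2)
    weak_cache[weak_cls] = ClassInfoSketch(2)
    # Drop the only outside references, then ask the collector to run.
    del strong_cls, weak_cls
    gc.collect()
    return len(strong_cache), len(weak_cache)
```

Calling `demo()` returns the sizes of the two caches after collection: the strongly keyed cache still pins its class, whereas the weakly keyed one lets it go.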
      
      Author: Sem Mulder <sem.mulder@site2mobile.com>
      
      Closes #9244 from SemMulder/fix-sizeestimator-classunloading.
    • [SPARK-11297] Add new code tags · d77d198f
      Xusen Yin authored
      mengxr https://issues.apache.org/jira/browse/SPARK-11297
      
      Add new code tags to keep the same look and feel as previous documents.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #9265 from yinxusen/SPARK-11297.
    • [SPARK-10654][MLLIB] Add columnSimilarities to IndexedRowMatrix · 8b292b19
      Reza Zadeh authored
      Add columnSimilarities to IndexedRowMatrix by delegating to functionality already in RowMatrix.
      
      With a test.
      
      Author: Reza Zadeh <reza@databricks.com>
      
      Closes #8792 from rezazadeh/colsims.
  2. Oct 26, 2015
  3. Oct 25, 2015
    • [SPARK-11127][STREAMING] upgrade AWS SDK and Kinesis Client Library (KCL) · 87f82a5f
      Xiangrui Meng authored
      AWS SDK 1.9.40 is the latest 1.9.x release. KCL 1.5.1 is the latest release that uses AWS SDK 1.9.x. The main goal is to let the Kinesis consumer read messages generated by the Kinesis Producer Library (KPL). The API should be compatible with old versions.
      
      tdas brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #9153 from mengxr/SPARK-11127.
    • [SPARK-10984] Simplify *MemoryManager class structure · 85e654c5
      Josh Rosen authored
      This patch refactors the MemoryManager class structure. After #9000, Spark had the following classes:
      
      - MemoryManager
      - StaticMemoryManager
      - ExecutorMemoryManager
      - TaskMemoryManager
      - ShuffleMemoryManager
      
      This is fairly confusing. To simplify things, this patch consolidates several of these classes:
      
      - ShuffleMemoryManager and ExecutorMemoryManager were merged into MemoryManager.
      - TaskMemoryManager is moved into Spark Core.
      
      **Key changes and tasks**:
      
      - [x] Merge ExecutorMemoryManager into MemoryManager.
        - [x] Move pooling logic into Allocator.
      - [x] Move TaskMemoryManager from `spark-unsafe` to `spark-core`.
      - [x] Refactor the existing Tungsten TaskMemoryManager interactions so Tungsten code uses only this and not both this and ShuffleMemoryManager.
      - [x] Refactor non-Tungsten code to use the TaskMemoryManager instead of ShuffleMemoryManager.
      - [x] Merge ShuffleMemoryManager into MemoryManager.
        - [x] Move code
        - [x] ~~Simplify 1/n calculation.~~ **Will defer to followup, since this needs more work.**
      - [x] Port ShuffleMemoryManagerSuite tests.
      - [x] Move classes from `unsafe` package to `memory` package.
      - [ ] Figure out how to handle the hacky use of the memory managers in HashedRelation's broadcast variable construction.
      - [x] Test porting and cleanup: several tests relied on mock functionality (such as `TestShuffleMemoryManager.markAsOutOfMemory`) which has been changed or broken during the memory manager consolidation
        - [x] AbstractBytesToBytesMapSuite
        - [x] UnsafeExternalSorterSuite
        - [x] UnsafeFixedWidthAggregationMapSuite
        - [x] UnsafeKVExternalSorterSuite
      
      **Compatibility notes**:
      
      - This patch introduces breaking changes in `ExternalAppendOnlyMap`, which is marked as `DeveloperApi` (likely for legacy reasons): this class now cannot be used outside of a task.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #9127 from JoshRosen/SPARK-10984.
    • [SPARK-10891][STREAMING][KINESIS] Add MessageHandler to KinesisUtils.createStream similar to Direct Kafka · 63accc79
      Burak Yavuz authored
      
      This PR allows users to map a Kinesis `Record` to a generic `T` when creating a Kinesis stream. This is particularly useful if you would like to do extra work with Kinesis metadata such as the sequence number and partition key.
      
      TODO:
       - [x] add tests
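
The message-handler idea can be sketched in plain Python. This is only an illustration of the pattern, not the `KinesisUtils.createStream` API: `RecordSketch`, `create_stream_sketch`, `default_handler`, and `with_metadata` are hypothetical names, the record is a simple named tuple instead of a real Kinesis `Record`, and the payload-only default is an assumption of this sketch.

```python
from collections import namedtuple

# Hypothetical stand-in for a Kinesis record; field names are illustrative only.
RecordSketch = namedtuple(
    "RecordSketch", ["sequence_number", "partition_key", "data"])


def default_handler(record):
    """Return only the payload bytes, discarding the metadata."""
    return record.data


def create_stream_sketch(records, message_handler=default_handler):
    """Lazily map each record to whatever type the handler returns."""
    return (message_handler(record) for record in records)


def with_metadata(record):
    """Example handler that keeps the metadata next to the decoded payload."""
    return (record.partition_key,
            record.sequence_number,
            record.data.decode("utf-8"))
```

Passing `with_metadata` as the handler yields tuples that carry the partition key and sequence number alongside the payload, which is the kind of extra work with metadata that the description mentions.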
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #8954 from brkyvz/kinesis-handler.
    • [SPARK-11287] Fixed class name to properly start TestExecutor from deploy.client.TestClient · 80279ac1
      Bryan Cutler authored
      Executing deploy.client.TestClient fails due to bad class name for TestExecutor in ApplicationDescription.
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #9255 from BryanCutler/fix-TestClient-classname-SPARK-11287.
    • [SPARK-6428][SQL] Removed unnecessary typecasts in MutableInt, MutableDouble etc. · 92b9c5ed
      Alexander Slesarenko authored
      marmbrus rxin I believe these typecasts are not required in the presence of explicit return types.
      
      Author: Alexander Slesarenko <avslesarenko@gmail.com>
      
      Closes #9262 from aslesarenko/remove-typecasts.
    • [SPARK-11299][DOC] Fix link to Scala DataFrame Functions reference · b67dc6a4
      Josh Rosen authored
      The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #9269 from JoshRosen/SPARK-11299.
  4. Oct 24, 2015
    • Fix typos · 146da0d8
      Jacek Laskowski authored
      Two typos squashed.
      
      BTW, let me know how to proceed with other typos if I run across any. I don't feel comfortable leaving them aside, nor sending pull requests with such tiny changes. Guide me.
      
      Author: Jacek Laskowski <jacek.laskowski@deepsense.io>
      
      Closes #9250 from jaceklaskowski/typos-hunting.
    • [SPARK-11264] bin/spark-class can't find assembly jars with certain GREP_OPTIONS set · 28132ceb
      Jeffrey Naisbitt authored
      Temporarily remove GREP_OPTIONS if set in bin/spark-class.
      
      Some GREP_OPTIONS will modify the output of the grep commands that are looking for the assembly jars.
      For example, if the -n option is specified, the grep output will look like:
      5:spark-assembly-1.5.1-hadoop2.4.0.jar
      
      This will not match the regular expressions, and so the jar files will not be found.  We could improve the regular expression to handle this case and trim off extra characters, but it is difficult to know which options may or may not be set.  Unsetting GREP_OPTIONS within the script handles all the cases and gives the desired output.
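
The mismatch can be reproduced with a short Python sketch. The pattern below is an assumption: it is an anchored expression similar in spirit to what a jar lookup might use, not a copy of the one in `bin/spark-class`, and the helper names `find_assembly_jars` and `env_without_grep_options` are hypothetical.

```python
import re

# Assumed pattern, anchored at both ends; not copied from bin/spark-class.
ASSEMBLY_JAR_PATTERN = re.compile(r"^spark-assembly.*hadoop.*\.jar$")


def find_assembly_jars(grep_output_lines):
    """Keep only the lines that look like an assembly jar file name."""
    return [line for line in grep_output_lines
            if ASSEMBLY_JAR_PATTERN.match(line)]


def env_without_grep_options(env):
    """Return a copy of env with GREP_OPTIONS removed; env itself is untouched."""
    return {key: value for key, value in env.items()
            if key != "GREP_OPTIONS"}
```

A plain file name passes the filter, while the same name prefixed with a line number (as produced with `-n`) is rejected because the pattern is anchored at the start. Dropping the variable from the environment used for the lookup avoids having to anticipate each option individually.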
      
      Author: Jeffrey Naisbitt <jnaisbitt@familysearch.org>
      
      Closes #9231 from naisbitt/unset-GREP_OPTIONS.
    • [SPARK-11245] update twitter4j to 4.0.4 version · e5bc8c27
      dima authored
      update twitter4j to 4.0.4 version
      https://issues.apache.org/jira/browse/SPARK-11245
      
      Author: dima <pronix.service@gmail.com>
      
      Closes #9221 from pronix/twitter4j_update.
    • [SPARK-11125] [SQL] Uninformative exception when running spark-sql without building with -Phive-thriftserver and SPARK_PREPEND_CLASSES is set · ffed0049
      Jeff Zhang authored
      
      This is the exception after this patch. Please help review.
      ```
      java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
      	at java.lang.ClassLoader.defineClass1(Native Method)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
      	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
      	at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
      	at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      	at java.lang.Class.forName0(Native Method)
      	at java.lang.Class.forName(Class.java:270)
      	at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
      	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:647)
      	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
      	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
      	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
      	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliDriver
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      	... 21 more
      Failed to load hive class.
      You need to build Spark with -Phive and -Phive-thriftserver.
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #9134 from zjffdu/SPARK-11125.
  5. Oct 23, 2015