  1. Nov 18, 2015
  2. Nov 17, 2015
    • [SPARK-11797][SQL] collect, first, and take should use encoders for serialization · 91f4b6f2
      Reynold Xin authored
      Previously, they used Spark's default serializer.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9787 from rxin/SPARK-11797.
    • [SPARK-11737] [SQL] Fix serialization of UTF8String with Kyro · 98be8169
      Davies Liu authored
      The default Kryo serialization of UTF8String may be incorrect, because BYTE_ARRAY_OFFSET can differ across JVMs.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9704 from davies/kyro_string.
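The portability problem can be illustrated with a small Python model (Spark's UTF8String is Java, so this is only an analogy). The sketch assumes a string view stored as (base, offset, num_bytes), where offset already includes a JVM-specific array header size; the class name, the two header sizes, and the length-prefixed wire format are hypothetical, not Spark's actual Kryo code. Copying the raw offset only works when the reading JVM uses the same header size, whereas writing the length followed by the bytes does not depend on it.

```python
import struct
from dataclasses import dataclass

# Hypothetical array header sizes for two different JVMs. On a real JVM the
# value comes from Unsafe.arrayBaseOffset and is not guaranteed to match.
WRITER_BYTE_ARRAY_OFFSET = 16
READER_BYTE_ARRAY_OFFSET = 24


@dataclass
class Utf8View:
    """Hypothetical stand-in for a UTF8String: a slice of `base`.

    `offset` is absolute, i.e. it already includes the JVM's array header size.
    """
    base: bytes
    offset: int
    num_bytes: int

    def to_bytes(self, byte_array_offset: int) -> bytes:
        start = self.offset - byte_array_offset
        return self.base[start:start + self.num_bytes]


def write_naive(s: Utf8View):
    # Copies the JVM-relative offset verbatim: only correct if the reader
    # uses the same array header size as the writer.
    return (s.base, s.offset, s.num_bytes)


def read_naive(record) -> Utf8View:
    base, offset, num_bytes = record
    return Utf8View(base, offset, num_bytes)


def write_portable(s: Utf8View, byte_array_offset: int) -> bytes:
    # Writes a 4-byte big-endian length followed by the raw bytes only.
    payload = s.to_bytes(byte_array_offset)
    return struct.pack(">i", len(payload)) + payload


def read_portable(data: bytes, byte_array_offset: int) -> Utf8View:
    (num_bytes,) = struct.unpack_from(">i", data, 0)
    payload = data[4:4 + num_bytes]
    # Rebuild the view using the reader's own header size.
    return Utf8View(payload, byte_array_offset, num_bytes)


s = Utf8View(b"xxhello", WRITER_BYTE_ARRAY_OFFSET + 2, 5)
naive = read_naive(write_naive(s))
portable = read_portable(write_portable(s, WRITER_BYTE_ARRAY_OFFSET), READER_BYTE_ARRAY_OFFSET)
print(naive.to_bytes(READER_BYTE_ARRAY_OFFSET))     # b'xhell' (corrupted)
print(portable.to_bytes(READER_BYTE_ARRAY_OFFSET))  # b'hello'
```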
    • [SPARK-11583] [CORE] MapStatus Using RoaringBitmap More Properly · e33053ee
      Kent Yao authored
      This PR upgrades RoaringBitmap to 0.5.10 to optimize the memory layout, which becomes much smaller when most of the blocks are empty.
      
      This PR is based on #9661 (with conflicts fixed); see all of the comments at https://github.com/apache/spark/pull/9661.
      
      Author: Kent Yao <yaooqinn@hotmail.com>
      Author: Davies Liu <davies@databricks.com>
      Author: Charles Allen <charles@allen-net.com>
      
      Closes #9746 from davies/roaring_mapstatus.
    • [SPARK-11016] Move RoaringBitmap to explicit Kryo serializer · bf25f9bd
      Davies Liu authored
      Fix the serialization of RoaringBitmap with the Kryo serializer.
      
      This PR came from https://github.com/metamx/spark/pull/1; thanks to drcrallen.
      
      Author: Davies Liu <davies@databricks.com>
      Author: Charles Allen <charles@allen-net.com>
      
      Closes #9748 from davies/SPARK-11016.
    • [SPARK-11793][SQL] Dataset should set the resolved encoders internally for maps. · ed8d1531
      Reynold Xin authored
      I also wrote a test case, but unfortunately it does not work yet because of SPARK-11795.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9784 from rxin/SPARK-11503.
    • [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API · 75a29229
      jerryshao authored
      Fixed the merge conflicts in #7410
      
      Closes #7410
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      Author: jerryshao <saisai.shao@intel.com>
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #9742 from zsxwing/pr7410.
    • [SPARK-11726] Throw exception on timeout when waiting for REST server response · b362d50f
      Jacek Lewandowski authored
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #9692 from jacek-lewandowski/SPARK-11726.
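The commit title describes failing loudly when the REST server does not answer in time. A minimal Python sketch of that behavior, assuming a blocking `fetch()` callable and a hypothetical `RestResponseTimeout` exception (neither is Spark's actual API), waits on a background thread with a bounded timeout and raises instead of returning nothing:

```python
import queue
import threading


class RestResponseTimeout(Exception):
    """Hypothetical exception type; Spark's real client may use a different one."""


def wait_for_response(fetch, timeout_s):
    """Run fetch() on a daemon thread and wait at most timeout_s seconds for it.

    Returns fetch()'s result, re-raises its exception, or raises
    RestResponseTimeout if nothing arrives in time.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", fetch()))
        except Exception as exc:  # hand server-side failures back to the caller
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        status, value = results.get(timeout=timeout_s)
    except queue.Empty:
        # Surface the timeout as an explicit error instead of continuing
        # without a response.
        raise RestResponseTimeout(f"no response within {timeout_s} seconds") from None
    if status == "error":
        raise value
    return value


print(wait_for_response(lambda: "SUBMITTED", timeout_s=1.0))  # SUBMITTED
```

With a `fetch` that blocks longer than `timeout_s` (for example `lambda: threading.Event().wait(5)`), the call raises `RestResponseTimeout` after roughly `timeout_s` seconds.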
    • [SPARK-11771][YARN][TRIVIAL] maximum memory in yarn is controlled by two params have both in error msg · 52c734b5
      Holden Karau authored
      When the requested memory exceeds the maximum, tell users to increase both params instead of just one.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #9758 from holdenk/SPARK-11771-maximum-memory-in-yarn-is-controlled-by-two-params-have-both-in-error-msg.
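A sketch of the improved check, in Python. It assumes the two YARN properties are `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb` (the commit message does not name them), and the function name and message wording are illustrative rather than copied from Spark:

```python
# Assumed names of the two YARN settings that cap a container's memory.
YARN_MAX_PARAMS = (
    "yarn.scheduler.maximum-allocation-mb",
    "yarn.nodemanager.resource.memory-mb",
)


def check_executor_memory(executor_memory_mb, overhead_mb, cluster_max_mb):
    """Return the total MB requested, or raise if it exceeds the cluster maximum.

    The error names both YARN settings, since raising only one of them may not
    be enough to admit the container.
    """
    required_mb = executor_memory_mb + overhead_mb
    if required_mb > cluster_max_mb:
        raise ValueError(
            f"Required executor memory ({executor_memory_mb}+{overhead_mb} MB) is above "
            f"the max threshold ({cluster_max_mb} MB) of this cluster! "
            f"Please check the values of '{YARN_MAX_PARAMS[0]}' and/or '{YARN_MAX_PARAMS[1]}'."
        )
    return required_mb


print(check_executor_memory(4096, 384, 8192))  # 4480
```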
    • [SPARK-11790][STREAMING][TESTS] Increase the connection timeout · 3720b148
      Shixiong Zhu authored
      On a slow Jenkins worker, EmbeddedZookeeper sometimes needs more than 6 seconds to set up, so this just increases the timeout. It won't increase the test time when the test passes.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #9778 from zsxwing/SPARK-11790.
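Raising a timeout is cheap when the wait polls for readiness: the loop returns as soon as the condition holds, so the timeout only bounds how long a failing run waits. A minimal Python sketch with a hypothetical `wait_until` helper (not the actual Scala test code, which may wait differently):

```python
import time


def wait_until(condition, timeout_s, poll_interval_s=0.01):
    """Poll condition() until it is true; return the seconds waited.

    Raises TimeoutError only if timeout_s elapses first, so a generous
    timeout costs nothing when the condition becomes true quickly.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        time.sleep(poll_interval_s)
    return time.monotonic() - start


# Simulated server that becomes ready after about 50 ms.
ready_at = time.monotonic() + 0.05
elapsed = wait_until(lambda: time.monotonic() >= ready_at, timeout_s=30.0)
print(f"ready after {elapsed:.2f} s despite a 30 s timeout")
```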
    • [MINOR] Correct comments in JavaDirectKafkaWordCount · e29656f8
      Rohan Bhanderi authored
      Author: Rohan Bhanderi <rohan.bhanderi@sjsu.edu>
      
      Closes #9781 from RohanBhanderi/patch-3.
    • [SPARK-9552] Add force control for killExecutors to avoid false killing for those busy executors · 965245d0
      Grace authored
      With dynamic allocation, busy executors are sometimes killed by mistake: executors that have task assignments get killed because they appear to have been idle long enough (say, 60 seconds). The root cause is that the task-launch listener event is asynchronous.
      
      For example, tasks are being assigned to some executors, but the listener notification has not been sent yet. Meanwhile, the dynamic allocation idle timeout for those executors expires (e.g., after 60 seconds), which triggers a killExecutor event at the same time.
       1. The timer expires before the listener event arrives.
       2. The task then runs on the killed (or being-killed) executor, which eventually leads to task failure.
      
      Here is the proposed fix: add a force flag to killExecutor. If force is not set (i.e., false), check whether the executor being killed is idle or busy. If the executor has tasks assigned, do not kill it, and return false to indicate that the kill failed. Dynamic allocation turns force killing off (force = false), so an attempt to kill a busy executor fails and the executor's idle timer stays in place. When the task assignment event arrives later, the idle timer is removed accordingly. This avoids falsely killing busy executors under dynamic allocation.
      
      For all other usages, end users can decide for themselves whether to use force killing. If that option is turned on, killExecutor kills the executor without any status check.
      
      Author: Grace <jie.huang@intel.com>
      Author: Andrew Or <andrew@databricks.com>
      Author: Jie Huang <jie.huang@intel.com>
      
      Closes #7888 from GraceH/forcekill.
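The proposed force flag can be modeled with a toy tracker in Python. The class and method names are hypothetical and the sketch ignores the scheduler's real bookkeeping; it only shows that a non-forced kill of a busy executor is refused (returns False), so dynamic allocation keeps the executor and its idle timer, while `force=True` skips the status check:

```python
class ExecutorTracker:
    """Toy model of killExecutor's force flag; all names are hypothetical."""

    def __init__(self):
        self.assigned_tasks = {}  # executor id -> tasks assigned but not finished
        self.killed = set()

    def assign_task(self, executor_id):
        # In this model the assignment is recorded immediately, before any
        # listener event would reach dynamic allocation.
        self.assigned_tasks[executor_id] = self.assigned_tasks.get(executor_id, 0) + 1

    def finish_task(self, executor_id):
        self.assigned_tasks[executor_id] -= 1

    def kill_executor(self, executor_id, force=False):
        """Return True if the executor was killed, False if the kill was refused."""
        if not force and self.assigned_tasks.get(executor_id, 0) > 0:
            # Busy executor and no force: refuse so the caller keeps tracking it.
            return False
        self.killed.add(executor_id)
        return True


tracker = ExecutorTracker()
tracker.assign_task("exec-1")
# The idle timer fires before the task-launch listener event arrives:
print(tracker.kill_executor("exec-1"))              # False, the busy executor survives
print(tracker.kill_executor("exec-1", force=True))  # True, no status check
```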
    • [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch · 928d6316
      Shixiong Zhu authored
      We checkpoint both when generating a batch and when completing a batch. When a batch's processing time exceeds the batch interval, the checkpoint for completing an old batch may run after the checkpoint for generating a new batch. If this happens, the old batch's checkpoint actually has the latest information, so we want to recover from it. This PR uses the latest checkpoint time as the file name, so that we can always recover from the latest checkpoint file.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #9707 from zsxwing/fix-checkpoint.
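A toy Python model of the naming rule described above, assuming a writer that remembers the largest checkpoint time it has seen and names every file after it (the class and the file-name format are hypothetical). When the completion checkpoint of an old batch runs late, it overwrites the newest-named file, so recovery from the newest file sees the latest information:

```python
class CheckpointWriter:
    """Toy checkpoint writer; class and file names are hypothetical."""

    def __init__(self):
        self.files = {}  # file name -> checkpoint contents
        self.latest_checkpoint_time = None

    def write(self, checkpoint_time, state):
        # Remember the largest checkpoint time seen so far.
        if self.latest_checkpoint_time is None or checkpoint_time > self.latest_checkpoint_time:
            self.latest_checkpoint_time = checkpoint_time
        # Name the file after the latest time, not this checkpoint's own time,
        # so a late checkpoint for an old batch replaces the newest file.
        self.files[f"checkpoint-{self.latest_checkpoint_time}"] = state

    def recover(self):
        """Return the contents of the file with the largest time in its name."""
        if not self.files:
            return None
        newest = max(self.files, key=lambda name: int(name.rsplit("-", 1)[1]))
        return self.files[newest]


writer = CheckpointWriter()
writer.write(1000, "generated batch 1000")
writer.write(2000, "generated batch 2000")
writer.write(1000, "completed batch 1000")  # slow batch finishes late
print(writer.recover())  # completed batch 1000
```

Had each file been named after its own checkpoint time, the late write would have gone to `checkpoint-1000` and recovery would have picked the older state in `checkpoint-2000`.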
    • [SPARK-11786][CORE] Tone down messages from akka error monitor. · 936bc0bc
      Marcelo Vanzin authored
      These events happen normally during the app's lifecycle, so always printing
      them as ERROR logs is misleading, and can actually affect the usability
      of interactive shells.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9772 from vanzin/SPARK-11786.
    • [SPARK-11764][ML] make Param.jsonEncode/jsonDecode support Vector · 3e9e6380
      Xiangrui Meng authored
      This PR makes the default read/write work with simple transformers/estimators that have params of type `Param[Vector]`. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #9776 from mengxr/SPARK-11764.
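Supporting `Param[Vector]` requires a JSON form for vectors. The Python sketch below assumes a tagged encoding (type 0 for sparse with size, indices and values; type 1 for dense with values); treat the exact field names as an assumption rather than Spark's documented format. Dense vectors are plain lists and sparse vectors are `(size, indices, values)` tuples here:

```python
import json


def vector_to_json(vec):
    """Encode a dense list or a sparse (size, indices, values) tuple as JSON."""
    if isinstance(vec, tuple):
        size, indices, values = vec
        return json.dumps({"type": 0, "size": size,
                           "indices": list(indices), "values": list(values)})
    return json.dumps({"type": 1, "values": list(vec)})


def vector_from_json(text):
    """Decode the tagged JSON form produced by vector_to_json."""
    obj = json.loads(text)
    if obj["type"] == 0:
        return (obj["size"], obj["indices"], obj["values"])
    if obj["type"] == 1:
        return obj["values"]
    raise ValueError(f"unknown vector type: {obj['type']}")


print(vector_to_json([1.0, 0.0, 3.0]))  # {"type": 1, "values": [1.0, 0.0, 3.0]}
print(vector_from_json(vector_to_json((4, [0, 3], [1.0, 2.5]))))  # (4, [0, 3], [1.0, 2.5])
```

Tagging each value with its type keeps decoding unambiguous, so a param's default read/write can restore the same kind of vector it saved.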
    • [SPARK-11763][ML] Add save,load to LogisticRegression Estimator · 6eb7008b
      Joseph K. Bradley authored
      Add save/load to the LogisticRegression Estimator, and refactor the tests a little to make it easier to add similar support to other Estimator/Model pairs.
      
      Moved LogisticRegressionReader/Writer into LogisticRegressionModel.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #9749 from jkbradley/lr-io-2.
    • [SPARK-11729] Replace example code in ml-linear-methods.md using include_example · 328eb49e
      Xusen Yin authored
      JIRA link: https://issues.apache.org/jira/browse/SPARK-11729
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #9713 from yinxusen/SPARK-11729.
    • [SPARK-11732] Removes some MiMa false positives · fa603e08
      Timothy Hunter authored
      This adds an extra filter for private or protected classes. Until now, we only filtered out package-private classes.
      
      Author: Timothy Hunter <timhunter@databricks.com>
      
      Closes #9697 from thunterdb/spark-11732.