  1. Dec 18, 2015
    • Shixiong Zhu's avatar
      [SPARK-11097][CORE] Add channelActive callback to RpcHandler to monitor the new connections · 007a32f9
      Shixiong Zhu authored
      Added `channelActive` to `RpcHandler` so that `NettyRpcHandler` doesn't need `clients` any more.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10301 from zsxwing/network-events.
      007a32f9
    • Nong Li's avatar
      [SPARK-12411][CORE] Decrease executor heartbeat timeout to match heartbeat interval · 0514e8d4
      Nong Li authored
      Previously, the rpc timeout was the default network timeout, which is the same value
      the driver uses to determine dead executors. This means if there is a network issue,
      the executor is determined dead after one heartbeat attempt. There is a separate config
      for the heartbeat interval which is a better value to use for the heartbeat RPC. With
      this change, the executor will make multiple heartbeat attempts even with RPC issues.
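
      As a rough illustration of the reasoning above, the sketch below counts how many heartbeat RPCs can time out before the driver declares the executor dead. The 120s and 10s values are assumed defaults for `spark.network.timeout` and `spark.executor.heartbeatInterval`; the object and method names are hypothetical, and the arithmetic ignores scheduling delays.

      ```scala
      object HeartbeatAttempts {
        // Assumed defaults, not stated in the commit message:
        // spark.network.timeout = 120s, spark.executor.heartbeatInterval = 10s.
        val driverDeadExecutorTimeoutSec = 120
        val heartbeatIntervalSec = 10

        /** Rough number of heartbeat RPCs that can time out before the driver gives up. */
        def attemptsBeforeDead(rpcTimeoutSec: Int): Int =
          driverDeadExecutorTimeoutSec / rpcTimeoutSec
      }
      ```

      With the RPC timeout equal to the network timeout there is room for only one attempt; with the heartbeat interval as the timeout there is room for roughly twelve.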
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #10365 from nongli/spark-12411.
      0514e8d4
    • Grace's avatar
      [SPARK-9552] Return "false" while nothing to kill in killExecutors · 60da0e11
      Grace authored
      In the discussion on SPARK-9552, we proposed a force kill in `killExecutors`. However, if there is nothing to kill, it still returns true (acknowledgement). As a result, executors that are not eligible to be killed are added to the pendingToRemove list for further action.
      
      This patch changes the return semantics: if there is nothing to kill, `killExecutors` returns "false", so those non-eligible executors are not added to the pendingToRemove list.
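
      The changed return semantics can be sketched as follows. This is a simplified model, not the actual Spark code: `Backend`, `Allocator`, and `eligible` are hypothetical names used only for illustration.

      ```scala
      object KillExecutorsSketch {
        import scala.collection.mutable

        // Hypothetical backend: only executors in `eligible` can be killed.
        final class Backend(eligible: Set[String]) {
          /** Returns false when none of the requested executors can be killed. */
          def killExecutors(ids: Seq[String]): Boolean = ids.exists(eligible)
        }

        // Hypothetical caller that tracks executors awaiting removal.
        final class Allocator(backend: Backend) {
          val pendingToRemove = mutable.Set.empty[String]

          def removeExecutor(id: String): Boolean = {
            val acknowledged = backend.killExecutors(Seq(id))
            // Only acknowledged kills are recorded as pending.
            if (acknowledged) pendingToRemove += id
            acknowledged
          }
        }
      }
      ```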
      
      vanzin andrewor14 As the follow up of PR#7888, please let me know your comments.
      
      Author: Grace <jie.huang@intel.com>
      Author: Jie Huang <hjie@fosun.com>
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #9796 from GraceH/emptyPendingToRemove.
      60da0e11
    • Burak Yavuz's avatar
      [SPARK-11985][STREAMING][KINESIS][DOCS] Update Kinesis docs · 2377b707
      Burak Yavuz authored
       - Provide example on `message handler`
       - Provide bit on KPL record de-aggregation
       - Fix typos
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #9970 from brkyvz/kinesis-docs.
      2377b707
    • Kousuke Saruta's avatar
      [SPARK-12404][SQL] Ensure objects passed to StaticInvoke is Serializable · 6eba6552
      Kousuke Saruta authored
      Currently `StaticInvoke` receives `Any` as an object. `StaticInvoke` itself can be serialized, but sometimes the object passed to it is not serializable.
      
      For example, the following code raises an exception because `RowEncoder#extractorsFor`, which is invoked indirectly, creates a `StaticInvoke`.
      
      ```
      case class TimestampContainer(timestamp: java.sql.Timestamp)
      val rdd = sc.parallelize(1 to 2).map(_ => TimestampContainer(new java.sql.Timestamp(System.currentTimeMillis)))
      val df = rdd.toDF
      val ds = df.as[TimestampContainer]
      val rdd2 = ds.rdd  // <-- invokes extractorsFor indirectly
      ```
      
      I'll add test cases.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #10357 from sarutak/SPARK-12404.
      6eba6552
    • Yin Huai's avatar
      [SPARK-12218][SQL] Invalid splitting of nested AND expressions in Data Source filter API · 41ee7c57
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-12218
      
      When creating filters for Parquet/ORC, we should not push nested AND expressions partially.
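
      A minimal sketch of the idea, assuming a toy expression tree rather than Spark's real filter classes (`Pred`, `And`, `Or`, `Not`, and `convert` are hypothetical names): a nested AND is converted only when both sides convert. Pushing just one side would be wrong under a NOT, because `NOT (a AND b)` is not implied by `NOT a` alone.

      ```scala
      object NestedAndPushdown {
        sealed trait Expr
        // `convertible` marks whether the data source can evaluate the predicate.
        case class Pred(name: String, convertible: Boolean) extends Expr
        case class And(left: Expr, right: Expr) extends Expr
        case class Or(left: Expr, right: Expr) extends Expr
        case class Not(child: Expr) extends Expr

        /** Returns a source-side filter only if the whole expression converts. */
        def convert(e: Expr): Option[String] = e match {
          case Pred(name, ok) => if (ok) Some(name) else None
          // Both sides are required; a half-converted AND is dropped entirely.
          case And(l, r) => for (a <- convert(l); b <- convert(r)) yield s"($a AND $b)"
          case Or(l, r)  => for (a <- convert(l); b <- convert(r)) yield s"($a OR $b)"
          case Not(c)    => convert(c).map(f => s"NOT $f")
        }
      }
      ```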
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #10362 from yhuai/SPARK-12218.
      41ee7c57
    • Davies Liu's avatar
      [SPARK-12054] [SQL] Consider nullability of expression in codegen · 4af647c7
      Davies Liu authored
      This can simplify the generated code for expressions that are not nullable.
      
      This PR also fixes many bugs related to nullability.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10333 from davies/skip_nullable.
      4af647c7
    • Dilip Biswal's avatar
      [SPARK-11619][SQL] cannot use UDTF in DataFrame.selectExpr · ee444fe4
      Dilip Biswal authored
      Description of the problem from cloud-fan
      
      Actually this line: https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L689
      When we use `selectExpr`, we pass in `UnresolvedFunction` to `DataFrame.select` and fall in the last case. A workaround is to do special handling for UDTF as we did for `explode` (and `json_tuple` in 1.6): wrap it with `MultiAlias`.
      Another workaround is to use `expr`, for example `df.select(expr("explode(a)").as(Nil))`. I think `selectExpr` is no longer needed now that we have the `expr` function.
      
      Author: Dilip Biswal <dbiswal@us.ibm.com>
      
      Closes #9981 from dilipbiswal/spark-11619.
      ee444fe4
    • Marcelo Vanzin's avatar
      [SPARK-12350][CORE] Don't log errors when requested stream is not found. · 27828182
      Marcelo Vanzin authored
      If a client requests a non-existent stream, just send a failure message
      back, without logging any error on the server side (since it's not a
      server error).
      
      On the executor side, avoid error logs by translating any errors during
      transfer to a `ClassNotFoundException`, so that loading the class is
      retried on the parent class loader. This can mask IO errors during
      transmission, but the most common cause is that the class is not
      served by the remote end.
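
      The error translation on the executor side can be sketched like this. It is only an illustration of the translation step, not Spark's executor class loader: `FetchingClassLoader` and `fetch` are hypothetical, and the standard `ClassLoader` delegation order is used.

      ```scala
      object StreamClassLoaderSketch {
        // `fetch` is a hypothetical stand-in for downloading class bytes from the remote end.
        final class FetchingClassLoader(parent: ClassLoader, fetch: String => Array[Byte])
            extends ClassLoader(parent) {

          override def findClass(name: String): Class[_] = {
            try {
              val bytes = fetch(name)
              defineClass(name, bytes, 0, bytes.length)
            } catch {
              // Any transfer problem is reported as "class not found", which
              // callers of loadClass already know how to handle.
              case e: Exception => throw new ClassNotFoundException(name, e)
            }
          }
        }
      }
      ```

      The original cause stays attached to the `ClassNotFoundException`, so an IO error is masked from the logs but not lost.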
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10337 from vanzin/SPARK-12350.
      27828182
    • Jeff L's avatar
      [SPARK-9057][STREAMING] Twitter example joining to static RDD of word sentiment values · ea59b0f3
      Jeff L authored
      Example of joining a static RDD of word sentiments to a streaming RDD of Tweets in order to demo the usage of the transform() method.
      
      Author: Jeff L <sha0lin@alumni.carnegiemellon.edu>
      
      Closes #8431 from Agent007/SPARK-9057.
      ea59b0f3
    • Michael Gummelt's avatar
      [SPARK-12413] Fix Mesos ZK persistence · 2bebaa39
      Michael Gummelt authored
      I believe this fixes SPARK-12413.  I'm currently running an integration test to verify.
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #10366 from mgummelt/fix-zk-mesos.
      2bebaa39
    • Jeff Zhang's avatar
      [CORE][TESTS] minor fix of JavaSerializerSuite · 40e52a27
      Jeff Zhang authored
      No JIRA was created.
      The original test passed because the class cast is lazy (it only happens when a method of the object is invoked).
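
      The lazy cast can be reproduced with a small sketch, independent of the serializer itself (`unsafeCast` and `lengthOf` are hypothetical helpers): because the type parameter is erased, the cast inside the generic method never fails, and the error only shows up once a method of the expected type is invoked.

      ```scala
      object LazyCastSketch {
        // T is erased at runtime, so this cast never fails by itself.
        def unsafeCast[T](value: Any): T = value.asInstanceOf[T]

        // Invoking a String method forces the real cast and fails for non-String input.
        def lengthOf(value: Any): Int = unsafeCast[String](value).length
      }
      ```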
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10371 from zjffdu/minor_fix.
      40e52a27
  2. Dec 17, 2015
  3. Dec 16, 2015
    • Andrew Or's avatar
      [SPARK-12390] Clean up unused serializer parameter in BlockManager · 97678ede
      Andrew Or authored
      No change in functionality is intended. This only changes internal API.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10343 from andrewor14/clean-bm-serializer.
      97678ede
    • Marcelo Vanzin's avatar
      [SPARK-12386][CORE] Fix NPE when spark.executor.port is set. · d1508dd9
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10339 from vanzin/SPARK-12386.
      d1508dd9
    • Rohit Agarwal's avatar
      [SPARK-12186][WEB UI] Send the complete request URI including the query string when redirecting. · fdb38227
      Rohit Agarwal authored
      Author: Rohit Agarwal <rohita@qubole.com>
      
      Closes #10180 from mindprince/SPARK-12186.
      fdb38227
    • tedyu's avatar
      [SPARK-12365][CORE] Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called · f590178d
      tedyu authored
      SPARK-9886 fixed ExternalBlockStore.scala
      
      This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook()
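
      To illustrate why a central manager is preferable to direct `Runtime.getRuntime.addShutdownHook()` calls, here is a minimal sketch of a priority-ordered hook registry. It is not Spark's `ShutdownHookManager`; `SimpleHookManager` and its methods are hypothetical.

      ```scala
      object HookSketch {
        import scala.collection.mutable

        final class SimpleHookManager {
          private val hooks = mutable.ArrayBuffer.empty[(Int, () => Unit)]

          /** Registers a hook; higher priorities run first. */
          def add(priority: Int)(hook: () => Unit): Unit = synchronized {
            hooks += ((priority, hook))
          }

          /** Runs all hooks in priority order; a failing hook does not stop the rest. */
          def runAll(): Unit = {
            val ordered = synchronized { hooks.sortBy(h => -h._1).toList }
            ordered.foreach { case (_, hook) =>
              try hook() catch { case _: Throwable => () }
            }
          }

          /** Installs a single JVM shutdown hook that drives all registered hooks. */
          def install(): Unit =
            Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
              def run(): Unit = runAll()
            }))
        }
      }
      ```

      With one JVM-level hook, the order of cleanup is controlled by the registry instead of by the JVM, which runs independent shutdown hooks in no guaranteed order.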
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #10325 from ted-yu/master.
      f590178d
    • Imran Rashid's avatar
      [SPARK-10248][CORE] track exceptions in dagscheduler event loop in tests · 38d9795a
      Imran Rashid authored
      `DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs).  However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.
      
      This was suggested by mateiz on https://github.com/apache/spark/pull/7699.  It may have already turned up an issue in "zero split job".
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #8466 from squito/SPARK-10248.
      38d9795a
    • Andrew Or's avatar
      MAINTENANCE: Automated closing of pull requests. · ce5fd400
      Andrew Or authored
      This commit exists to close the following pull requests on Github:
      
      Closes #1217 (requested by ankurdave, srowen)
      Closes #4650 (requested by andrewor14)
      Closes #5307 (requested by vanzin)
      Closes #5664 (requested by andrewor14)
      Closes #5713 (requested by marmbrus)
      Closes #5722 (requested by andrewor14)
      Closes #6685 (requested by srowen)
      Closes #7074 (requested by srowen)
      Closes #7119 (requested by andrewor14)
      Closes #7997 (requested by jkbradley)
      Closes #8292 (requested by srowen)
      Closes #8975 (requested by andrewor14, vanzin)
      Closes #8980 (requested by andrewor14, davies)
      ce5fd400
    • Andrew Or's avatar
      [MINOR] Add missing interpolation in NettyRPCEnv · 861549ac
      Andrew Or authored
      ```
      Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
      Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
      	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
      ```
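
      The message above shows the symptom: without the `s` prefix, Scala keeps the placeholder as literal text. A minimal sketch, with a hypothetical `timeout` value:

      ```scala
      object InterpolationSketch {
        val timeout = "120 seconds" // hypothetical value for illustration

        // No `s` prefix: the placeholder is kept as literal text.
        val broken: String = "Cannot receive any reply in ${timeout}."

        // With the `s` prefix the placeholder is replaced by the value.
        val fixed: String = s"Cannot receive any reply in ${timeout}."
      }
      ```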
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10334 from andrewor14/rpc-typo.
      861549ac
    • Davies Liu's avatar
      [SPARK-12380] [PYSPARK] use SQLContext.getOrCreate in mllib · 27b98e99
      Davies Liu authored
      MLlib should use SQLContext.getOrCreate() instead of creating a new SQLContext.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10338 from davies/create_context.
      27b98e99
    • Martin Menestret's avatar
      [SPARK-9690][ML][PYTHON] pyspark CrossValidator random seed · 3a44aebd
      Martin Menestret authored
      Extend CrossValidator with HasSeed in PySpark.
      
      This PR replaces [https://github.com/apache/spark/pull/7997]
      
      CC: yanboliang thunterdb mmenestret  Would one of you mind taking a look?  Thanks!
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      Author: Martin MENESTRET <mmenestret@ippon.fr>
      
      Closes #10268 from jkbradley/pyspark-cv-seed.
      3a44aebd
    • hyukjinkwon's avatar
      [SPARK-11677][SQL] ORC filter tests all pass if filters are actually not pushed down. · 9657ee87
      hyukjinkwon authored
      Currently ORC filters are not tested properly. All the tests pass even if the filters are not pushed down or are disabled. This PR adds checks for this.
      Since ORC does not fully filter record by record, the tests check the count of the result and whether it contains the expected values.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #9687 from HyukjinKwon/SPARK-11677.
      9657ee87
    • gatorsmile's avatar
      [SPARK-12164][SQL] Decode the encoded values and then display · edf65cd9
      gatorsmile authored
      Based on the suggestions from marmbrus and cloud-fan in https://github.com/apache/spark/pull/10165, this PR prints the decoded values (user objects) in `Dataset.show`:
      ```scala
          implicit val kryoEncoder = Encoders.kryo[KryoClassData]
          val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
          ds.show(20, false);
      ```
      The current output looks like this:
      ```
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |value                                                                                                                                                                                 |
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]|
      |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]|
      |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]|
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      ```
      After the fix, the output looks like the following if and only if users override the `toString` function in the class `KryoClassData`:
      ```scala
      override def toString: String = s"KryoClassData($a, $b)"
      ```
      ```
      +-------------------+
      |value              |
      +-------------------+
      |KryoClassData(a, 1)|
      |KryoClassData(b, 2)|
      |KryoClassData(c, 3)|
      +-------------------+
      ```
      
      If users do not override the `toString` function, the results look like this:
      ```
      +---------------------------------------+
      |value                                  |
      +---------------------------------------+
      |org.apache.spark.sql.KryoClassData@68ef|
      |org.apache.spark.sql.KryoClassData@6915|
      |org.apache.spark.sql.KryoClassData@693b|
      +---------------------------------------+
      ```
      
      Question: Should we add another optional parameter to the function `show` that decides whether it displays the hex values or the object values?
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10215 from gatorsmile/showDecodedValue.
      edf65cd9
    • Wenchen Fan's avatar
      [SPARK-12320][SQL] throw exception if the number of fields does not line up for Tuple encoder · a783a8ed
      Wenchen Fan authored
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10293 from cloud-fan/err-msg.
      a783a8ed
    • Yanbo Liang's avatar
      [SPARK-12364][ML][SPARKR] Add ML example for SparkR · 1a8b2a17
      Yanbo Liang authored
      We have a DataFrame example for SparkR; we also need to add an ML example under `examples/src/main/r`.
      
      cc mengxr jkbradley shivaram
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10324 from yanboliang/spark-12364.
      1a8b2a17
    • Joseph K. Bradley's avatar
      [SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6 · 8148cc7a
      Joseph K. Bradley authored
      No known breaking changes, but some deprecations and changes of behavior.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #10235 from jkbradley/mllib-guide-update-1.6.
      8148cc7a