  1. Nov 25, 2015
    • [SPARK-11984][SQL][PYTHON] Fix typos in doc for pivot for scala and python · faabdfa2
      felixcheung authored
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #9967 from felixcheung/pypivotdoc.
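      For context, a minimal Scala sketch of the pivot API whose documentation the commit above fixes; the column names and data here are illustrative, not taken from the patch.
      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.sql.SQLContext

      val sc = new SparkContext("local[*]", "pivot-example")
      val sqlContext = new SQLContext(sc)

      // Toy data: one row per (year, course, earnings).
      val df = sqlContext.createDataFrame(Seq(
        (2012, "dotNET", 10000), (2012, "Java", 20000), (2013, "dotNET", 48000)
      )).toDF("year", "course", "earnings")

      // pivot(pivotColumn, values): listing the values up front avoids an extra pass
      // over the data to discover the distinct pivot values.
      df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings").show()
      ```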
    • [SPARK-11956][CORE] Fix a few bugs in network lib-based file transfer. · c1f85fc7
      Marcelo Vanzin authored
      - NettyRpcEnv::openStream() now correctly propagates errors to
        the read side of the pipe.
      - NettyStreamManager now throws if the file being transferred does
        not exist.
      - The network library now correctly handles zero-sized streams.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9941 from vanzin/SPARK-11956.
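      A hedged, simplified sketch of the second and third bullets above; this is illustrative code, not the actual NettyStreamManager implementation: fail fast when a requested file is missing instead of handing back a broken stream, and treat a zero-length file as a valid, empty stream.
      ```scala
      import java.io.File

      // Illustrative only: resolve a requested file before serving it.
      def resolveServedFile(baseDir: File, fileName: String): File = {
        val file = new File(baseDir, fileName)
        if (!file.isFile) {
          throw new IllegalArgumentException(s"File not found: ${file.getAbsolutePath}")
        }
        // A zero-length file is still valid; the transfer layer must cope with size == 0.
        file
      }
      ```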
    • [SPARK-10666][SPARK-6880][CORE] Use properties from ActiveJob associated with a Stage · 0a5aef75
      Mark Hamstra authored
      This issue was addressed in https://github.com/apache/spark/pull/5494, but the fix in that PR, while safe in the sense that it will prevent the SparkContext from shutting down, misses the actual bug.  The intent of `submitMissingTasks` should be understood as "submit the Tasks that are missing for the Stage, and run them as part of the ActiveJob identified by jobId".  Because of a long-standing bug, the `jobId` parameter was never being used.  Instead, we were trying to use the jobId with which the Stage was created -- which may no longer exist as an ActiveJob, hence the crash reported in SPARK-6880.
      
      The correct fix is to use the ActiveJob specified by the supplied jobId parameter, which is guaranteed to exist at the call sites of submitMissingTasks.
      
      This fix should be applied to all maintenance branches, since the bug has existed since 1.0. A simplified sketch of the change follows this entry.
      
      kayousterhout pankajarora12
      
      Author: Mark Hamstra <markhamstra@gmail.com>
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #6291 from markhamstra/SPARK-6880.
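      A hedged, simplified sketch of the idea, using illustrative stand-ins rather than the literal DAGScheduler code: take the scheduling properties from the ActiveJob looked up by the supplied jobId, not from whatever job originally created the stage.
      ```scala
      import java.util.Properties
      import scala.collection.mutable

      // Illustrative stand-in for the scheduler's bookkeeping.
      case class ActiveJob(jobId: Int, properties: Properties)
      val jobIdToActiveJob = mutable.HashMap[Int, ActiveJob]()

      def submitMissingTasks(stageId: Int, jobId: Int): Unit = {
        // Use the ActiveJob identified by the supplied jobId; callers guarantee it exists.
        val properties = jobIdToActiveJob(jobId).properties
        println(s"Submitting missing tasks of stage $stageId for job $jobId with $properties")
      }
      ```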
    • [SPARK-11860][PYSPARK][DOCUMENTATION] Invalid argument specification for registerFunction [Python] · b9b6fbe8
      Jeff Zhang authored
      
      Straightforward change to the Python docstring; a Scala counterpart of the registration is sketched after this entry.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #9901 from zjffdu/SPARK-11860.
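      The change itself is to the Python docstring for registerFunction; for reference, a hedged Scala sketch of the equivalent registration (the function name is made up for illustration).
      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.sql.SQLContext

      val sc = new SparkContext("local[*]", "udf-example")
      val sqlContext = new SQLContext(sc)

      // Scala counterpart of PySpark's sqlContext.registerFunction(name, f, returnType):
      // register a function under a name so SQL queries can call it, e.g. strLen(name).
      sqlContext.udf.register("strLen", (s: String) => s.length)
      ```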
    • [SPARK-11686][CORE] Issue WARN when dynamic allocation is disabled due to spark.dynamicAllocation.enabled and spark.executor.instances both set · 63850026
      Ashwin Swaroop authored
      
      Changed the log level from 'info' to 'warning' as required; the conflicting configuration is sketched after this entry.
      
      Author: Ashwin Swaroop <Ashwin Swaroop>
      
      Closes #9926 from ashwinswaroop/master.
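      A hedged sketch of the situation being warned about; the two configuration keys are the ones named in the title, while the surrounding check is illustrative rather than the patch itself.
      ```scala
      import org.apache.spark.SparkConf

      val conf = new SparkConf()
        .set("spark.dynamicAllocation.enabled", "true") // ask for dynamic allocation...
        .set("spark.executor.instances", "4")           // ...but also pin the executor count

      // When both are set, dynamic allocation is effectively disabled, which now logs a WARN.
      val dynamicAllocationDisabled =
        conf.getBoolean("spark.dynamicAllocation.enabled", false) &&
          conf.getInt("spark.executor.instances", 0) > 0
      if (dynamicAllocationDisabled) {
        println("WARN: spark.executor.instances overrides spark.dynamicAllocation.enabled")
      }
      ```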
    • [SPARK-11981][SQL] Move implementations of methods back to DataFrame from Queryable · a0f1a118
      Reynold Xin authored
      Also added show methods to Dataset; a short usage example follows this entry.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9964 from rxin/SPARK-11981.
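      A short, hedged example of the show methods mentioned above; the data is illustrative.
      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.sql.SQLContext

      val sc = new SparkContext("local[*]", "dataset-show")
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      val ds = Seq(("a", 1), ("b", 2)).toDS()
      ds.show()   // tabular rendering, as on DataFrame
      ds.show(1)  // cap the number of rows printed
      ```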
    • [SPARK-11970][SQL] Adding JoinType into JoinWith and support Sample in Dataset API · 2610e061
      gatorsmile authored
      Besides inner joins, the other join types may also be useful when users call the joinWith function. This adds a joinType argument to the existing joinWith call in the Dataset API; a usage sketch follows this entry.
      
      Also provides another joinWith interface for Cartesian-join-like functionality.
      
      Please provide your opinions. marmbrus rxin cloud-fan Thank you!
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #9921 from gatorsmile/joinWith.
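      A hedged usage sketch of the new joinType argument; the case classes and column names are illustrative, and only the joinWith(other, condition, joinType) shape comes from the change above.
      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.sql.SQLContext

      val sc = new SparkContext("local[*]", "joinwith-example")
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      case class Click(uid: Int, page: String)
      case class User(userId: Int, name: String)

      val clicks = Seq(Click(1, "home"), Click(3, "docs")).toDS()
      val users  = Seq(User(1, "ann"), User(2, "bob")).toDS()

      // Previously joinWith was effectively inner-only; passing a join type now keeps
      // unmatched rows from the left side, paired with a null on the right.
      val joined = clicks.joinWith(users, $"uid" === $"userId", "left_outer")
      joined.show()
      ```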
    • [SPARK-11979][STREAMING] Empty TrackStateRDD cannot be checkpointed and recovered from checkpoint file · 21698868
      Tathagata Das authored
      
      This solves the following exception, raised when an empty state RDD is checkpointed and recovered. The root cause is that an empty OpenHashMapBasedStateMap cannot be deserialized because its initialCapacity is set to zero; a simplified sketch of the guard follows this entry, after the stack trace.
      ```
      Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 20, localhost): java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity
      	at scala.Predef$.require(Predef.scala:233)
      	at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.<init>(StateMap.scala:96)
      	at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.<init>(StateMap.scala:86)
      	at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.readObject(StateMap.scala:291)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
      	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
      	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
      	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
      	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
      	at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
      	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
      	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
      	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
      	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
      	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
      	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
      	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
      	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
      	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
      	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
      	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:921)
      	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:921)
      	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
      	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      	at org.apache.spark.scheduler.Task.run(Task.scala:88)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      ```
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #9958 from tdas/SPARK-11979.
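      A hedged, simplified sketch of the failure mode and the kind of guard that avoids it; this is illustrative code, not the actual OpenHashMapBasedStateMap: a capacity of zero recovered from a checkpoint must not be fed straight back into a require(initialCapacity >= 1) check.
      ```scala
      // Illustrative only: clamp the recovered capacity so an empty map deserializes cleanly.
      class ExampleStateMap[K, V](initialCapacity: Int) {
        private val capacity = math.max(initialCapacity, 1) // never allocate a zero-sized table
        private val keys = new Array[Any](capacity)
        private val values = new Array[Any](capacity)
        def isEmpty: Boolean = true // sketch only
      }

      val recovered = new ExampleStateMap[String, Int](0) // previously hit "Invalid initial capacity"
      ```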
  2. Nov 24, 2015
  3. Nov 23, 2015
    • Updated sql programming guide to include jdbc fetch size · 026ea2ea
      Stephen Samuel authored
      Author: Stephen Samuel <sam@sksamuel.com>
      
      Closes #9377 from sksamuel/master.
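      A hedged Scala sketch of the option the guide now covers; the URL, table name, and exact option casing are illustrative rather than quoted from the guide.
      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.sql.SQLContext

      val sc = new SparkContext("local[*]", "jdbc-fetchsize")
      val sqlContext = new SQLContext(sc)

      // fetchSize controls how many rows the JDBC driver pulls per round trip; drivers
      // with a tiny default can be much faster with a larger value.
      val orders = sqlContext.read.format("jdbc")
        .option("url", "jdbc:postgresql://db.example.com/sales")
        .option("dbtable", "orders")
        .option("fetchSize", "1000")
        .load()
      ```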
    • [SPARK-10560][PYSPARK][MLLIB][DOCS] Make StreamingLogisticRegressionWithSGD Python API equal to Scala one · 10574564
      Bryan Cutler authored
      
      This is to bring the API documentation of StreamingLogisticRegressionWithSGD and StreamingLinearRegressionWithSGD in line with the Scala versions.
      
      - Fixed the algorithm descriptions
      - Added default values to parameter descriptions
      - Changed StreamingLogisticRegressionWithSGD regParam to default to 0, as in the Scala version (see the sketch after this entry)
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #9141 from BryanCutler/StreamingLogisticRegressionWithSGD-python-api-sync.
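      For reference, a hedged Scala sketch of the estimator the Python docs were aligned to; only the regParam default of 0 is stated in the commit, and the other setter values here are illustrative.
      ```scala
      import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
      import org.apache.spark.mllib.linalg.Vectors

      // The Scala model the Python docs now mirror; regParam defaults to 0.0 there.
      val model = new StreamingLogisticRegressionWithSGD()
        .setStepSize(0.1)
        .setNumIterations(50)
        .setRegParam(0.0)
        .setInitialWeights(Vectors.zeros(2))
      // model.trainOn(labeledPointStream) would then update the weights on each batch.
      ```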
    • [SPARK-9866][SQL] Speed up VersionsSuite by using persistent Ivy cache · 9db5f601
      Josh Rosen authored
      This patch attempts to speed up VersionsSuite by storing fetched Hive JARs in an Ivy cache that persists across test runs. If `SPARK_VERSIONS_SUITE_IVY_PATH` is set, that path will be used for the cache; if it is not set, VersionsSuite will create a temporary Ivy cache which is deleted after the test completes. A sketch of the lookup follows this entry.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #9624 from JoshRosen/SPARK-9866.
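      A hedged sketch of the lookup the description implies (simplified, not the suite's literal code): prefer the caller-provided cache directory, otherwise fall back to a throwaway one.
      ```scala
      import java.nio.file.Files

      // Reuse a persistent Ivy cache across runs if SPARK_VERSIONS_SUITE_IVY_PATH is set;
      // otherwise create a temporary directory (the real suite cleans it up after the test).
      val ivyPath: String = sys.env.getOrElse("SPARK_VERSIONS_SUITE_IVY_PATH",
        Files.createTempDirectory("versions-suite-ivy").toFile.getAbsolutePath)
      ```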
    • [SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv. · c2467dad
      Marcelo Vanzin authored
      This change abstracts the code that serves jars / files to executors so that
      each RpcEnv can have its own implementation; the akka version uses the existing
      HTTP-based file serving mechanism, while the netty version uses the new stream
      support added to the network lib. This makes file transfers benefit from the
      easier security configuration of the network library and should also reduce
      overhead overall. An illustrative sketch of the abstraction follows this entry.
      
      The change includes a small fix to TransportChannelHandler so that it propagates
      user events to downstream handlers.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9530 from vanzin/SPARK-11140.
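      A hedged sketch of the abstraction the description implies; the trait and method names below are illustrative stand-ins, not the code added by the patch: each RpcEnv exposes its own file server, and callers only see the URI handed to executors.
      ```scala
      import java.io.File

      // Illustrative trait: an akka-based env would return http:// URIs from its HTTP
      // file server, while a netty-based env serves the bytes over the network library's
      // stream support and returns URIs in its own scheme.
      trait ExampleRpcEnvFileServer {
        def addFile(file: File): String // returns a URI that executors can fetch
        def addJar(file: File): String
      }

      class ExampleNettyFileServer(driverHost: String, port: Int) extends ExampleRpcEnvFileServer {
        private var served = Map.empty[String, File]
        override def addFile(file: File): String = {
          served += (file.getName -> file)
          s"spark://$driverHost:$port/files/${file.getName}"
        }
        override def addJar(file: File): String = {
          served += (file.getName -> file)
          s"spark://$driverHost:$port/jars/${file.getName}"
        }
      }
      ```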
    • [SPARK-11865][NETWORK] Avoid returning inactive client in TransportClientFactory. · 7cfa4c6b
      Marcelo Vanzin authored
      There's a very narrow race here where it would be possible for the timeout handler
      to close a channel after the client factory verified that the channel was still
      active. This change makes sure the client is marked as being recently in use so
      that the timeout handler does not close it until a new timeout cycle elapses.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9853 from vanzin/SPARK-11865.
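      A hedged, simplified sketch of the pattern described above; this is not the actual TransportClientFactory code: record fresh activity on the cached client before returning it, so a concurrently running idle-timeout sweep does not close it in the same cycle.
      ```scala
      import java.util.concurrent.atomic.AtomicLong

      // Illustrative stand-in for a pooled network client.
      class PooledClient {
        val lastUsedNanos = new AtomicLong(System.nanoTime())
        @volatile var active = true
        def markInUse(): Unit = lastUsedNanos.set(System.nanoTime())
      }

      def checkout(cached: PooledClient): Option[PooledClient] = {
        if (cached.active) {
          cached.markInUse() // refresh activity *before* handing the client out
          Some(cached)
        } else {
          None // caller falls back to creating a new client
        }
      }
      ```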
    • [SPARK-11910][STREAMING][DOCS] Update twitter4j dependency version · 242be7da
      Luciano Resende authored
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #9892 from lresende/SPARK-11910.
    • [SPARK-11836][SQL] udf/cast should not create new SQLContext · 1d912020
      Davies Liu authored
      The udf and cast code paths should reuse the existing SQLContext instead of creating a new one; see the sketch after this entry.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9914 from davies/create_udf.
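      A hedged illustration of the pattern the one-line description points to: reuse the SQLContext already associated with the SparkContext rather than constructing a fresh one.
      ```scala
      import org.apache.spark.SparkContext
      import org.apache.spark.sql.SQLContext

      val sc = new SparkContext("local[*]", "reuse-sqlcontext")

      // Returns the existing SQLContext for this SparkContext if one was already created,
      // instead of `new SQLContext(sc)`, which would quietly spin up a second one.
      val sqlContext = SQLContext.getOrCreate(sc)
      ```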
    • [SPARK-4424] Remove spark.driver.allowMultipleContexts override in tests · 1b6e938b
      Josh Rosen authored
      This patch removes `spark.driver.allowMultipleContexts=true` from our test configuration. The multiple SparkContexts check was originally disabled because certain tests suites in SQL needed to create multiple contexts. As far as I know, this configuration change is no longer necessary, so we should remove it in order to make it easier to find test cleanup bugs.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #9865 from JoshRosen/SPARK-4424.
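      For reference, a hedged sketch of the setting removed from the test configuration; with the override gone, a second SparkContext in the same JVM fails fast instead of being silently allowed.
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}

      val conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("single-context")
        .set("spark.driver.allowMultipleContexts", "false") // the default; tests no longer flip it to true

      val sc = new SparkContext(conf)
      // Creating another SparkContext here would now throw, surfacing leaked contexts in tests.
      ```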