  Nov 04, 2015
    • Closes #9464 · 987df4bf
      Reynold Xin authored
    • [SPARK-11490][SQL] variance should alias var_samp instead of var_pop. · 3bd6f5d2
      Reynold Xin authored
      stddev is an alias for stddev_samp. variance should be consistent with stddev.
      
      Also took the chance to remove internal Stddev and Variance, and only kept StddevSamp/StddevPop and VarianceSamp/VariancePop.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9449 from rxin/SPARK-11490.
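
      For readers unfamiliar with the distinction, a minimal sketch of the aliasing described above. It is shown with the modern `SparkSession` API (the original change predates it); the data is illustrative.

      ```scala
      import org.apache.spark.sql.SparkSession

      object VarianceAliasDemo {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder().master("local[*]").appName("variance-demo").getOrCreate()
          spark.range(1, 5).createOrReplaceTempView("t")  // values 1, 2, 3, 4

          // With this change, variance follows sample semantics, matching var_samp:
          //   var_samp = sum((x - mean)^2) / (n - 1)  ->  5 / 3 ≈ 1.667
          //   var_pop  = sum((x - mean)^2) / n        ->  5 / 4 = 1.25
          spark.sql("SELECT variance(id), var_samp(id), var_pop(id) FROM t").show()
          spark.stop()
        }
      }
      ```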
    • [SPARK-11197][SQL] add doc for running SQL on files directly · e0fc9c7e
      Wenchen Fan authored
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #9467 from cloud-fan/doc.
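
      The feature this PR documents, sketched briefly in spark-shell style: SQL can be run directly against a file by qualifying it with its data source, with no table registration step. The path is hypothetical, and a `SparkSession` named `spark` is assumed.

      ```scala
      // Query a parquet file directly, without creating a table first.
      val df = spark.sql("SELECT * FROM parquet.`/path/to/users.parquet`")
      df.show()
      ```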
    • [SPARK-11485][SQL] Make DataFrameHolder and DatasetHolder public. · cd1df662
      Reynold Xin authored
      These two classes should be public, since they are used in public code.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9445 from rxin/SPARK-11485.
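
      For context, a rough sketch of where these classes surface: the implicit conversions wrap a local collection in a `DataFrameHolder` or `DatasetHolder`, whose `toDF()`/`toDS()` methods yield the actual DataFrame or Dataset, which is why the holder types appear in user-facing signatures. Shown with the modern `spark.implicits._` (spark-shell style; a `SparkSession` named `spark` is assumed).

      ```scala
      import spark.implicits._

      val df = Seq((1, "a"), (2, "b")).toDF("id", "label")  // goes through DataFrameHolder
      val ds = Seq(1, 2, 3).toDS()                          // goes through DatasetHolder
      ```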
    • [SPARK-11235][NETWORK] Add ability to stream data using network lib. · 27feafcc
      Marcelo Vanzin authored
      The current interface used to fetch shuffle data is not very efficient for
      large buffers; it requires the receiver to buffer the entirety of the
      contents being downloaded in memory before processing the data.
      
      To use the network library to transfer large files (such as those that
      can be added using SparkContext addJar / addFile), this change adds a
      more efficient way of downloading data, by streaming the data and
      feeding it to a callback as data arrives.
      
      This is achieved by a custom frame decoder that replaces the current netty
      one; this decoder allows entering a mode where framing is skipped and data
      is instead provided directly to a callback. The existing netty classes
      (ByteToMessageDecoder and LengthFieldBasedFrameDecoder) could not be reused
      since their semantics do not allow for the interception approach the new
      decoder uses.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9206 from vanzin/SPARK-11235.
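
      A hedged sketch of the interception approach described above, written against Netty's public API. The names (`StreamCallback`, `InterceptingDecoder`, `activateStream`) are illustrative, not Spark's actual network-common classes.

      ```scala
      import io.netty.buffer.ByteBuf
      import io.netty.channel.{ChannelHandlerContext, ChannelInboundHandlerAdapter}

      /** Invoked as chunks arrive; must consume `buf` before returning. */
      trait StreamCallback {
        def onData(buf: ByteBuf): Unit
        def onComplete(): Unit
      }

      /**
       * Normally passes bytes through for regular frame decoding downstream.
       * Once a stream is activated, framing is skipped and the next
       * `remaining` bytes go straight to the callback, with no buffering.
       */
      class InterceptingDecoder extends ChannelInboundHandlerAdapter {
        private var callback: StreamCallback = _
        private var remaining: Long = 0L

        def activateStream(byteCount: Long, cb: StreamCallback): Unit = {
          remaining = byteCount
          callback = cb
        }

        override def channelRead(ctx: ChannelHandlerContext, msg: AnyRef): Unit = {
          val buf = msg.asInstanceOf[ByteBuf]
          if (callback != null) {
            // Streaming mode: hand bytes directly to the callback.
            val take = math.min(remaining, buf.readableBytes().toLong).toInt
            callback.onData(buf.readSlice(take))
            remaining -= take
            if (remaining == 0L) {
              callback.onComplete()
              callback = null
            }
            // Any leftover bytes belong to the next (framed) message.
            if (buf.isReadable()) ctx.fireChannelRead(buf) else buf.release()
          } else {
            ctx.fireChannelRead(buf)  // normal framed path
          }
        }
      }
      ```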
    • [SPARK-10622][CORE][YARN] Differentiate dead from "mostly dead" executors. · 8790ee6d
      Marcelo Vanzin authored
      In YARN mode, when preemption is enabled, we may leave executors in a
      zombie state while we wait to retrieve the reason for which the executor
      exited. This is so that we don't account for failed tasks that were
      running on a preempted executor.
      
      The issue is that while we wait for this information, the scheduler
      might decide to schedule tasks on the executor, which will never be
      able to run them. Other side effects include the block manager still
      considering the executor available to cache blocks, for example.
      
      So, when we know that an executor went down but we don't know why, we
      stop everything related to the executor, except its running tasks.
      Only when we know the reason for the exit (or give up waiting for
      it) do we update the running tasks.
      
      This is achieved by a new `disableExecutor()` method in the
      `Schedulable` interface. For managers that do not behave like this
      (i.e. every one but YARN), the existing `executorLost()` method
      will behave the same way it did before.
      
      On top of that change, a few minor changes that made debugging easier,
      and fixed some other minor issues:
      - The cluster-mode AM was printing a misleading log message every
        time an executor disconnected from the driver (because the akka
        actor system was shared between driver and AM).
      - Avoid sending unnecessary requests for an executor's exit reason
        when we already know it was explicitly disabled / killed. This
        avoids both multiple requests, and unnecessary requests that would
        just cause warning messages on the AM (in the explicit kill case).
      - Tone down a log message about the executor being lost when it
        exited normally (e.g. preemption).
      - Wake up the AM monitor thread when requests for executor loss
        reasons arrive too, so that we can more quickly remove executors
        from this zombie state.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8887 from vanzin/SPARK-10622.
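
      A hedged sketch of the "mostly dead" distinction described above: a disabled executor stops receiving new work (and stops being used for caching) while its already-running tasks are left alone until the exit reason is known. Types and method bodies are illustrative, not Spark's actual scheduler code.

      ```scala
      sealed trait ExecutorState
      case object Alive extends ExecutorState
      case object Disabled extends ExecutorState // zombie: no new tasks, tasks not yet failed
      case object Lost extends ExecutorState     // exit reason known: tasks accounted for

      class ExecutorTracker {
        private var states = Map.empty[String, ExecutorState].withDefaultValue(Alive)

        /** Stop scheduling on (and caching blocks on) the executor; keep its tasks. */
        def disableExecutor(execId: String): Unit =
          states = states.updated(execId, Disabled)

        /** Once the exit reason is known (or we gave up waiting), update tasks. */
        def executorLost(execId: String, reason: String): Unit = {
          states = states.updated(execId, Lost)
          // ... fail or re-enqueue the tasks that were running on execId ...
        }

        def canSchedule(execId: String): Boolean = states(execId) == Alive
      }
      ```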
    • [SPARK-11443] Reserve space lines · 9b214cea
      Xusen Yin authored
      The trim_codeblock(lines) function in include_example.rb removes some blank lines in the code.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #9400 from yinxusen/SPARK-11443.
    • [SPARK-11380][DOCS] Replace example code in mllib-frequent-pattern-mining.md using include_example · 820064e6
      Pravin Gadakh authored
      Author: Pravin Gadakh <pravingadakh177@gmail.com>
      Author: Pravin Gadakh <prgadakh@in.ibm.com>
      
      Closes #9340 from pravingadakh/SPARK-11380.
    • [SPARK-9492][ML][R] LogisticRegression in R should provide model statistics · e328b69c
      Yanbo Liang authored
      Like ml `LinearRegression`, `LogisticRegression` should provide a training summary including feature names and their coefficients.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9303 from yanboliang/spark-9492.
    • [SPARK-11442] Reduce numSlices for local metrics test of SparkListenerSuite · c09e5139
      tedyu authored
      In the thread http://search-hadoop.com/m/q3RTtcQiFSlTxeP/test+failed+due+to+OOME&subj=test+failed+due+to+OOME, it was discussed that the memory consumption of SparkListenerSuite should be brought down.

      This is an attempt in that direction, reducing numSlices for the local metrics test.
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #9384 from tedyu/master.
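
      For context: `numSlices` is the partition count passed to `SparkContext.parallelize`, so fewer slices means fewer concurrent tasks, and less memory held, while the test runs. A rough spark-shell-style illustration (the values are made up, not the suite's actual numbers; a `SparkContext` named `sc` is assumed):

      ```scala
      // 4 partitions instead of a larger default -> fewer simultaneous tasks.
      val rdd = sc.parallelize(1 to 1000, numSlices = 4)
      rdd.count()
      ```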
    • [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) · 8aff36e9
      jerryshao authored
      This PR is based on the work of roji to support running Spark scripts
      from symlinks. Thanks for the great work, roji. Would you mind taking a
      look at this PR? Thanks a lot.

      Distributions like HDP normally expose the Spark executables as symlinks
      placed on `PATH`, but Spark's current scripts cannot recursively resolve
      the real path behind a symlink, so Spark fails to execute when launched
      through one. This PR tries to solve the issue by finding the absolute
      path behind the symlink.

      Unlike the earlier attempt (https://github.com/apache/spark/pull/2386),
      this does not use `readlink -f`, because the `-f` flag is not supported
      on Mac; instead, the path is resolved manually in a loop (see the sketch
      after this entry).

      I've tested on Mac and Linux (CentOS), and it looks fine.

      This PR does not fix the scripts under the `sbin` folder; I'm not sure
      whether those need to be fixed as well.

      Please help to review; any comments are greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      Author: Shay Rojansky <roji@roji.org>
      
      Closes #8669 from jerryshao/SPARK-2960.
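
      The resolution idea, sketched in Scala for illustration only (the actual fix lives in the launch scripts): follow symlinks one hop at a time instead of relying on `readlink -f`, which the BSD `readlink` shipped with Mac does not support.

      ```scala
      import java.nio.file.{Files, Path, Paths}

      /** Follow symlinks hop by hop until a non-link path is reached. */
      def resolveSymlinks(start: Path): Path = {
        var p = start
        while (Files.isSymbolicLink(p)) {
          val target = Files.readSymbolicLink(p)
          // Relative link targets are resolved against the link's directory.
          p = if (target.isAbsolute) target else p.getParent.resolve(target).normalize()
        }
        p.toAbsolutePath.normalize()
      }

      // e.g. resolveSymlinks(Paths.get("/usr/bin/spark-shell"))
      ```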