  1. Feb 13, 2017
    • [SPARK-19520][STREAMING] Do not encrypt data written to the WAL. · 7fe3543f
      Marcelo Vanzin authored
      
      Spark's I/O encryption uses an ephemeral key for each driver instance.
      So driver B cannot decrypt data written by driver A since it doesn't
      have the correct key.
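
The point above can be illustrated with a toy sketch. This is not Spark's I/O encryption code: `ToyDriver`, its keystream construction, and the sample record are all hypothetical, and the cipher is for illustration only. It only shows that data protected with a key generated per instance cannot be read by another instance.

```python
import hashlib
import hmac
import secrets


class ToyDriver:
    """Hypothetical stand-in for a driver that creates its own ephemeral key."""

    def __init__(self):
        # Assumption for illustration: a fresh random key per instance, never persisted.
        self._key = secrets.token_bytes(32)

    def _keystream(self, nonce, length):
        # Toy keystream from HMAC-SHA256 in counter mode (not production crypto).
        out = b""
        counter = 0
        while len(out) < length:
            block = nonce + counter.to_bytes(8, "big")
            out += hmac.new(self._key, block, hashlib.sha256).digest()
            counter += 1
        return out[:length]

    def encrypt(self, plaintext):
        nonce = secrets.token_bytes(16)
        stream = self._keystream(nonce, len(plaintext))
        body = bytes(p ^ k for p, k in zip(plaintext, stream))
        # The tag lets decrypt() detect that the wrong key is being used.
        tag = hmac.new(self._key, nonce + body, hashlib.sha256).digest()
        return nonce + body + tag

    def decrypt(self, blob):
        nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
        expected = hmac.new(self._key, nonce + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("wrong key: cannot decrypt")
        stream = self._keystream(nonce, len(body))
        return bytes(c ^ k for c, k in zip(body, stream))


driver_a = ToyDriver()
driver_b = ToyDriver()  # e.g. a new driver instance started for recovery

record = driver_a.encrypt(b"wal record")
same_driver_ok = driver_a.decrypt(record) == b"wal record"

try:
    driver_b.decrypt(record)
    other_driver_ok = True
except ValueError:
    other_driver_ok = False
```

Here `same_driver_ok` is true while `other_driver_ok` is false, which is the situation a recovering driver would be in if the log were encrypted with the first driver's key.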
      
      The write ahead log is used for recovery, thus needs to be readable by
      a different driver. So it cannot be encrypted by Spark's I/O encryption
      code.
      
      The BlockManager APIs used by the WAL code to write the data automatically
      encrypt data, so changes are needed so that callers can opt out of
      encryption.
      
      Aside from that, the "putBytes" API in the BlockManager does not do
      encryption, so a separate situation arose where the WAL would write
      unencrypted data to the BM and, when those blocks were read, decryption
      would fail. So the WAL code needs to ask the BM to encrypt that data
      when encryption is enabled; this code is not optimal since it results
      in a (temporary) second copy of the data block in memory, but should be
      OK for now until a more performant solution is added. The non-encryption
      case should not be affected.
      
      Tested with new unit tests, and by running streaming apps that do
      recovery using the WAL data with I/O encryption turned on.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16862 from vanzin/SPARK-19520.
      
      (cherry picked from commit 0169360e)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
  2. Dec 29, 2016
    • [SPARK-19003][DOCS] Add Java example in Spark Streaming Guide, section Design Patterns for using foreachRDD · 47ab4afe
      adesharatushar authored
      
      ## What changes were proposed in this pull request?
      
      Added missing Java example under section "Design Patterns for using foreachRDD". Now this section has examples in all 3 languages, improving consistency of documentation.
      
      ## How was this patch tested?
      
      Manual.
      Generated docs using command "SKIP_API=1 jekyll build" and verified generated HTML page manually.
      
      The syntax of the example was tested for correctness using sample code on Java 1.7 and Spark 2.2.0-SNAPSHOT.
      
      Author: adesharatushar <tushar_adeshara@persistent.com>
      
      Closes #16408 from adesharatushar/streaming-doc-fix.
      
      (cherry picked from commit dba81e1d)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  3. Nov 23, 2016
  4. Nov 19, 2016
    • [SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that`/`'''Note:'''` across Scala/Java API documentation · 4b396a65
      hyukjinkwon authored
      
      It seems that in Scala/Java, the following forms are all in use:
      
      - `Note:`
      - `NOTE:`
      - `Note that`
      - `'''Note:'''`
      - `note`
      
      This PR proposes to fix those to `note` to be consistent.
      
      **Before**
      
      - Scala
        ![2016-11-17 6 16 39](https://cloud.githubusercontent.com/assets/6477701/20383180/1a7aed8c-acf2-11e6-9611-5eaf6d52c2e0.png)
      
      - Java
        ![2016-11-17 6 14 41](https://cloud.githubusercontent.com/assets/6477701/20383096/c8ffc680-acf1-11e6-914a-33460bf1401d.png)
      
      **After**
      
      - Scala
        ![2016-11-17 6 16 44](https://cloud.githubusercontent.com/assets/6477701/20383167/09940490-acf2-11e6-937a-0d5e1dc2cadf.png)
      
      - Java
        ![2016-11-17 6 13 39](https://cloud.githubusercontent.com/assets/6477701/20383132/e7c2a57e-acf1-11e6-9c47-b849674d4d88.png)
      
      The notes were found via
      
      ```bash
      grep -r "NOTE: " . |  # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// NOTE: " |  # lines starting with // do not appear in API documentation
      grep -E '.scala|.java' |  # java/scala files
      grep -v Suite |  # exclude tests
      grep -v Test |  # exclude tests
      # packages that appear in API documentation; note that each pattern is a regular
      # expression, so actual matches were mostly `org/apache/spark/api/java/functions ...`
      grep -e 'org.apache.spark.api.java' \
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
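
The remark in the pipeline above about the package names being regular expressions can be checked with a small sketch. It uses Python's `re` module rather than grep, on the assumption that an unescaped `.` behaves the same way in both (it matches any single character). The sample path line is hypothetical.

```python
import re

# Unescaped dots, as in the `grep -e` patterns.
pattern = re.compile(r"org.apache.spark.api.java")

# Hypothetical line of grep output: a file path followed by the matched text.
path_line = "./core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala: * Note: ..."
dotted_name = "org.apache.spark.api.java"

matches_path = pattern.search(path_line) is not None      # '.' matches '/'
matches_dotted = pattern.search(dotted_name) is not None  # '.' also matches '.'

# Escaping the dots restricts the match to the literal dotted package name.
literal = re.compile(re.escape("org.apache.spark.api.java"))
literal_matches_path = literal.search(path_line) is not None
```

With the unescaped pattern both checks succeed, which is why the filter selects files by directory path; the escaped pattern no longer matches the path.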
      
      ```bash
      grep -r "Note that " . |  # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note that " |  # lines starting with // do not appear in API documentation
      grep -E '.scala|.java' |  # java/scala files
      grep -v Suite |  # exclude tests
      grep -v Test |  # exclude tests
      # packages that appear in API documentation
      grep -e 'org.apache.spark.api.java' \
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note: " . |  # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note: " |  # lines starting with // do not appear in API documentation
      grep -E '.scala|.java' |  # java/scala files
      grep -v Suite |  # exclude tests
      grep -v Test |  # exclude tests
      # packages that appear in API documentation
      grep -e 'org.apache.spark.api.java' \
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "'''Note:'''" . |  # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// '''Note:''' " |  # lines starting with // do not appear in API documentation
      grep -E '.scala|.java' |  # java/scala files
      grep -v Suite |  # exclude tests
      grep -v Test |  # exclude tests
      # packages that appear in API documentation
      grep -e 'org.apache.spark.api.java' \
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      The matches were then fixed one by one, comparing against the API documentation and access modifiers.
      
      After that, manually tested via `jekyll build`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #15889 from HyukjinKwon/SPARK-18437.
      
      (cherry picked from commit d5b1d5fc)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  5. Sep 29, 2016
    • [DOCS] Reorganize explanation of Accumulators and Broadcast Variables · 95820049
      José Hiram Soltren authored
      ## What changes were proposed in this pull request?
      
      The discussion of the interaction of Accumulators and Broadcast Variables should logically follow the discussion on Checkpointing. As currently written, this section discusses Checkpointing before it is formally introduced. To remedy this:
      
       - Rename this section to "Accumulators, Broadcast Variables, and Checkpoints", and
       - Move this section after "Checkpointing".
      
      ## How was this patch tested?
      
      Testing: ran

      $ SKIP_API=1 jekyll build

      and verified the changes in a web browser pointed at docs/_site/index.html.
      
      Author: José Hiram Soltren <jose@cloudera.com>
      
      Closes #15281 from jsoltren/doc-changes.
  6. Sep 14, 2016
  7. Sep 09, 2016
    • Streaming doc correction. · 7098a129
      Satendra Kumar authored
      ## What changes were proposed in this pull request?
      
      (Please fill in changes proposed in this fix)
      Streaming doc correction.
      
      ## How was this patch tested?
      
      (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
      
      (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
      
      Author: Satendra Kumar <satendra@knoldus.com>
      
      Closes #14996 from satendrakumar06/patch-1.
  8. Aug 30, 2016
    • [MINOR][DOCS] Fix minor typos in python example code · d4eee993
      Dmitriy Sokolov authored
      ## What changes were proposed in this pull request?
      
      Fix minor typos in the Python example code in the streaming programming guide.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Dmitriy Sokolov <silentsokolov@gmail.com>
      
      Closes #14805 from silentsokolov/fix-typos.
  9. Aug 25, 2016
  10. Aug 13, 2016
    • [SPARK-12370][DOCUMENTATION] Documentation should link to examples … · e46cb78b
      Jagadeesan authored
      ## What changes were proposed in this pull request?
      
      When documentation is built, it should reference examples from the same build. There are times when the docs have links that point to files in the GitHub head, which may not be valid for the current release. Changed the URLs to point to the right tag in git using ```SPARK_VERSION_SHORT```.
      
      …from its own release version] [Streaming programming guide]
      
      Author: Jagadeesan <as2@us.ibm.com>
      
      Closes #14596 from jagadeesanas2/SPARK-12370.
  11. Aug 12, 2016
  12. Aug 07, 2016
    • [SPARK-16911] Fix the links in the programming guide · 6c1ecb19
      Shivansh authored
      ## What changes were proposed in this pull request?
      
      Fix the broken links in the programming guide for the GraphX migration and understanding closures sections.

      ## How was this patch tested?

      By running the test cases and checking the links.
      
      Author: Shivansh <shiv4nsh@gmail.com>
      
      Closes #14503 from shiv4nsh/SPARK-16911.
  13. Aug 05, 2016
  14. Jul 27, 2016
    • [MINOR][DOC] missing keyword new · bc4851ad
      Bartek Wiśniewski authored
      ## What changes were proposed in this pull request?
      
      Added the missing `new` keyword in a Java example.
      
      ## How was this patch tested?
      
      wasn't
      
      Author: Bartek Wiśniewski <wedi@Ava.local>
      
      Closes #14381 from wedi-dev/quickfix/missing_keyword.
  15. Jul 15, 2016
    • [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide · 5ffd5d38
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Made DataFrame-based API primary
      * Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
      * mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
      * ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
        * **Reviewers: please check this carefully**
      * (minor) Titles for DF API no longer include "- spark.ml" suffix.  Titles for RDD API have "- RDD-based API" suffix
      * Moved migration guide to ml-guide from mllib-guide
        * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
        * **Reviewers**: I did not change any of the content of the migration guides.
      
      Reorganized DataFrame-based guide:
      * ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
      * Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
        * **Reviewers**: I did not change the content of these guides, except some intro text.
      * Sidebar remains the same, but with pipeline and tuning sections added
      
      Other:
      * ml-classification-regression.html: Moved text about linear methods to new section in page
      
      ## How was this patch tested?
      
      Generated docs locally
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #14213 from jkbradley/ml-guide-2.0.
  16. Jul 06, 2016
    • [DOC][SQL] update out-of-date code snippets using SQLContext in all documents. · b1310425
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      I searched the whole docs directory for uses of SQLContext and updated the following places:

      - docs/configuration.md: SparkR code snippets.
      - docs/streaming-programming-guide.md: several code examples.
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14025 from WeichenXu123/WIP_SQLContext_update.
  17. Jun 15, 2016
  18. Jun 12, 2016
    • [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API · f51dfe61
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Deprecate old Java accumulator API; should use Scala now
      - Update Java tests and examples
      - Don't bother testing old accumulator API in Java 8 (too)
      - (fix a misspelling too)
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13606 from srowen/SPARK-15086.
  19. Jun 07, 2016
    • [MINOR] fix typo in documents · 1e2c9311
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      I used a spell-check tool to find typos in the Spark documents and fixed them.
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #13538 from WeichenXu123/fix_doc_typo.
  20. Jun 02, 2016
    • [SPARK-15208][WIP][CORE][STREAMING][DOCS] Update Spark examples with AccumulatorV2 · a0eec8e8
      Liwei Lin authored
      ## What changes were proposed in this pull request?
      
      The patch updates the code and docs in the examples module as well as the related docs module:
      
      - [ ] [docs] `streaming-programming-guide.md`
        - [x] scala code part
        - [ ] java code part
        - [ ] python code part
      - [x] [examples] `RecoverableNetworkWordCount.scala`
      - [ ] [examples] `JavaRecoverableNetworkWordCount.java`
      - [ ] [examples] `recoverable_network_wordcount.py`
      
      ## How was this patch tested?
      
      Ran the examples and verified results manually.
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #12981 from lw-lin/accumulatorV2-examples.
  21. May 30, 2016
    • [DOCS] fix example code issues in documentation · 2d34183b
      Matthew Wise authored
      ## What changes were proposed in this pull request?
      
      Fixed broken Java code examples in the streaming documentation.
      
      Attn: tdas
      
      Author: Matthew Wise <matthew.rs.wise@gmail.com>
      
      Closes #13388 from mawise/fix_docs_java_streaming_example.
  22. May 27, 2016
    • [MINOR] Fix Typos 'a -> an' · 6b1a6180
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `a` -> `an`
      
      I used a regex to generate candidate error lines:
      `grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml/*/*scala`
      and reviewed them line by line.
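
A minimal sketch of the same search in Python, to show why the matches are only candidates and still need review. The sample lines are hypothetical and not taken from the Spark source; `re.IGNORECASE` is used as the counterpart of grep's `-i`.

```python
import re

# Same pattern as in the grep command above.
pattern = re.compile(r" a [aeiou]", re.IGNORECASE)

# Hypothetical sample lines for illustration.
lines = [
    "Returns a iterator over the elements.",  # likely typo: "an iterator"
    "Creates a user-defined function.",       # flagged, but "a user" is correct
    "Builds a new model.",                    # not flagged
]

# Collect (line number, text) pairs, like `grep -n` would print.
candidates = [
    (number, text)
    for number, text in enumerate(lines, start=1)
    if pattern.search(text)
]
```

The second line is a false positive, since the pattern looks at spelling rather than pronunciation.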
      
      ## How was this patch tested?
      
      local build
      `lint-java` checking
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13317 from zhengruifeng/a_an.
  23. May 17, 2016
  24. May 11, 2016
    • [SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact · 89e67d66
      cody koeninger authored
      ## What changes were proposed in this pull request?
      Renaming the streaming-kafka artifact to include kafka version, in anticipation of needing a different artifact for later kafka versions
      
      ## How was this patch tested?
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #12946 from koeninger/SPARK-15085.
  25. Apr 02, 2016
  26. Mar 26, 2016
    • [SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter · d23ad7c1
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      This PR removes all docs about the old streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter projects since I have already copied them to https://github.com/spark-packages
      
      Also remove mqtt_wordcount.py that I forgot to remove previously.
      
      ## How was this patch tested?
      
      Jenkins PR Build.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11824 from zsxwing/remove-doc.
  27. Mar 15, 2016
  28. Mar 09, 2016
    • [SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. · c3689bc2
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      To make `docs/examples` (and other related code) simpler, more readable, and more user-friendly, this PR replaces existing code like the following with the `diamond` operator.
      
      ```
      -    final ArrayList<Product2<Object, Object>> dataToWrite =
      -      new ArrayList<Product2<Object, Object>>();
      +    final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
      ```
      
      Java 7 and higher support the **diamond** operator, which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark's Java code uses it inconsistently.
      
      ## How was this patch tested?
      
      Manual.
      Pass the existing tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11541 from dongjoon-hyun/SPARK-13702.
  29. Mar 07, 2016
    • [SPARK-13705][DOCS] UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount · 4b13896e
      rmishra authored
      
      ## What changes were proposed in this pull request?
      The reference to StatefulNetworkWordCount.scala in the updateStateByKey documentation should be removed until there is an example for updateStateByKey.
      
      ## How was this patch tested?
      Have tested the new documentation with jekyll build.
      
      Author: rmishra <rmishra@pivotal.io>
      
      Closes #11545 from rishitesh/SPARK-13705.
  30. Feb 22, 2016
  31. Feb 19, 2016
  32. Feb 10, 2016
    • [SPARK-12414][CORE] Remove closure serializer · 29c54730
      Sean Owen authored
      Remove spark.closure.serializer option and use JavaSerializer always
      
      CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11150 from srowen/SPARK-12414.
  33. Jan 26, 2016
    • [SPARK-3369][CORE][STREAMING] Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator · 649e9d0f
      Sean Owen authored
      
      Fix Java function API methods for flatMap and mapPartitions to require producing only an Iterator, not Iterable. Also fix DStream.flatMap to require a function producing TraversableOnce only, not Traversable.
      
      CC rxin pwendell for API change; tdas since it also touches streaming.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10413 from srowen/SPARK-3369.
  34. Jan 20, 2016
    • [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project · b7d74a60
      Shixiong Zhu authored
      Include the following changes:
      
      1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
      2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
      3. Update the ActorWordCount example and add the JavaActorWordCount example
      4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10744 from zsxwing/streaming-akka-2.
  35. Jan 08, 2016
  36. Jan 07, 2016
  37. Dec 22, 2015
  38. Nov 23, 2015
  39. Nov 17, 2015
  40. Nov 09, 2015
    • [DOCS] Fix typo for Python section on unifying Kafka streams · 874cd66d
      chriskang90 authored
      1) kafkaStreams is a list.  The list should be unpacked when passing it into the streaming context union method, which accepts a variable number of streams.
      2) print() should be pprint() for pyspark.
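
The first point can be sketched without Spark. The `union` function below is a hypothetical stand-in that only mimics a variadic signature, and the stream names are placeholder strings rather than real DStream objects. It shows what changes when the list is unpacked with `*`.

```python
def union(*dstreams):
    """Hypothetical stand-in for a variadic union: returns the arguments it received."""
    if not dstreams:
        raise ValueError("expected at least one stream")
    return list(dstreams)


# Placeholders for the Kafka input streams.
kafka_streams = ["stream-0", "stream-1", "stream-2"]

# Without unpacking, the function receives ONE argument: the list itself.
not_unpacked = union(kafka_streams)

# With unpacking, each stream is passed as its own argument.
unpacked = union(*kafka_streams)
```

In the guide's Python example this presumably corresponds to calling the streaming context's union method with `*kafkaStreams` instead of `kafkaStreams`.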
      
      This contribution is my original work, and I license the work to the project under the project's open source license.
      
      Author: chriskang90 <jckang@uchicago.edu>
      
      Closes #9545 from c-kang/streaming_python_typo.