  1. Dec 12, 2016
  2. Dec 10, 2016
  3. Dec 09, 2016
    • [SPARK-18812][MLLIB] explain "Spark ML" · e45345d9
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      There has been some confusion around "Spark ML" vs. "MLlib". This PR adds some FAQ-like entries to the MLlib user guide to explain "Spark ML" and reduce the confusion.
      
      I checked the [Spark FAQ page](http://spark.apache.org/faq.html), which seems too high-level for the content here. So I added it to the MLlib user guide instead.
      
      cc: mateiz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #16241 from mengxr/SPARK-18812.
      
      (cherry picked from commit d2493a20)
      Signed-off-by: Xiangrui Meng <meng@databricks.com>
    • [MINOR][CORE][SQL][DOCS] Typo fixes · b226f10e
      Jacek Laskowski authored
      
      ## What changes were proposed in this pull request?
      
      Typo fixes
      
      ## How was this patch tested?
      
      Local build. Awaiting the official build.
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #16144 from jaceklaskowski/typo-fixes.
      
      (cherry picked from commit b162cc0c)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  4. Dec 08, 2016
    • [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide · 9095c152
      Yanbo Liang authored
      
      ## What changes were proposed in this pull request?
      * Add all R examples for ML wrappers which were added during 2.1 release cycle.
      * Split the whole ```ml.R``` example file into an individual example for each algorithm, which makes them convenient for users to rerun.
      * Add corresponding examples to ML user guide.
      * Update ML section of SparkR user guide.
      
      Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using an R ```formula``` to specify ```featuresCol``` and ```labelCol```.
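
To illustrate how a formula can stand in for explicit `featuresCol` and `labelCol` settings, here is a toy sketch in Python. `split_formula` is a hypothetical helper written for this note; it handles only `+`-separated terms and is not how SparkR or Spark's formula support is implemented. The column names are made up for the example.

```python
from typing import List, Tuple


def split_formula(formula: str) -> Tuple[str, List[str]]:
    """Split a simple 'label ~ f1 + f2' formula into (label, features).

    Toy parser for illustration only: no interactions, no '.', no '-' terms.
    """
    label, sep, rhs = formula.partition("~")
    if not sep or not label.strip() or not rhs.strip():
        raise ValueError(f"not a 'label ~ features' formula: {formula!r}")
    features = [term.strip() for term in rhs.split("+") if term.strip()]
    return label.strip(), features


# Hypothetical column names, chosen only for the example.
label_col, feature_cols = split_formula("Species ~ Sepal_Length + Sepal_Width")
print(label_col, feature_cols)
```

The point of the sketch is only that one formula string carries both pieces of information that the Scala/Java/Python examples pass as separate column parameters.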
      
      ## How was this patch tested?
      Run all examples manually.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16148 from yanboliang/spark-18325.
      
      (cherry picked from commit 9bf8f3cd)
      Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
    • Patrick Wendell's avatar
      48aa6775
    • Preparing Spark release v2.1.0-rc2 · 08071749
      Patrick Wendell authored
  5. Dec 07, 2016
  6. Dec 05, 2016
  7. Dec 04, 2016
    • [SPARK-18643][SPARKR] SparkR hangs at session start when installed as a package without Spark · c13c2939
      Felix Cheung authored
      
      ## What changes were proposed in this pull request?
      
      If SparkR is running as a package and has previously downloaded the Spark JAR, it should be able to run as before without having to set SPARK_HOME. With this bug, the automatic Spark install only works in the first session.
      
      This seems to be a regression from the earlier behavior.
      
      The fix is to always try to install Spark, or check for the cached copy, when running in an interactive session.
      As discussed before, we should probably only install Spark iff running in an interactive session (R shell, RStudio, etc.).
      
      ## How was this patch tested?
      
      Manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16077 from felixcheung/rsessioninteractive.
      
      (cherry picked from commit b019b3a8)
      Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
  8. Dec 03, 2016
    • [SPARK-18081][ML][DOCS] Add user guide for Locality Sensitive Hashing(LSH) · 28f698b4
      Yunni authored
      
      ## What changes were proposed in this pull request?
      The user guide for LSH is added to ml-features.md, with several Scala/Java examples in spark-examples.
      
      ## How was this patch tested?
      Doc has been generated through Jekyll, and checked through manual inspection.
      
      Author: Yunni <Euler57721@gmail.com>
      Author: Yun Ni <yunn@uber.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      Author: Yun Ni <Euler57721@gmail.com>
      
      Closes #15795 from Yunni/SPARK-18081-lsh-guide.
      
      (cherry picked from commit 34777184)
      Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
  9. Dec 02, 2016
  10. Nov 30, 2016
  11. Nov 29, 2016
  12. Nov 28, 2016
    • [SPARK-18547][CORE] Propagate I/O encryption key when executors register. · c4cbdc86
      Marcelo Vanzin authored
      
      This change modifies the method used to propagate encryption keys used during
      shuffle. Instead of relying on YARN's UserGroupInformation credential propagation,
      this change explicitly distributes the key using the messages exchanged between
      driver and executor during registration. When RPC encryption is enabled, this means
      key propagation is also secure.
      
      This allows shuffle encryption to work in non-YARN mode, which means that it's
      easier to write unit tests for areas of the code that are affected by the feature.
      
      The key is stored in the SecurityManager; because there are many instances of
      that class used in the code, the key is only guaranteed to exist in the instance
      managed by the SparkEnv. This path was chosen to avoid storing the key in the
      SparkConf, which would risk having the key being written to disk as part of the
      configuration (as, for example, is done when starting YARN applications).
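
The propagation scheme described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the classes `SecurityManager`, `Driver`, `Executor` and `RegisteredExecutor`, and the config key `io.encryption.enabled`, are hypothetical stand-ins invented for the sketch, not Spark's actual classes or settings, and no real RPC or encryption takes place.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


class SecurityManager:
    """Hypothetical stand-in: holds the I/O encryption key in memory only."""

    def __init__(self) -> None:
        self.io_key: Optional[bytes] = None


@dataclass
class RegisteredExecutor:
    """Hypothetical reply the driver sends when an executor registers."""
    io_key: Optional[bytes]


class Driver:
    def __init__(self, conf: Dict[str, str]) -> None:
        self.conf = conf
        self.security = SecurityManager()
        # Assumed config name, used only for this sketch.
        if conf.get("io.encryption.enabled") == "true":
            # The key lives in the security manager, never in the config.
            self.security.io_key = secrets.token_bytes(16)

    def register_executor(self, executor_id: str) -> RegisteredExecutor:
        # The key travels in the registration reply itself.
        return RegisteredExecutor(io_key=self.security.io_key)


class Executor:
    def __init__(self, executor_id: str, driver: Driver) -> None:
        self.security = SecurityManager()
        reply = driver.register_executor(executor_id)
        self.security.io_key = reply.io_key


driver = Driver({"io.encryption.enabled": "true"})
executor = Executor("exec-1", driver)
print(executor.security.io_key == driver.security.io_key)
```

Because the key never enters the configuration dictionary, writing that configuration to disk cannot leak it, which mirrors the motivation given in the commit message.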
      
      Tested by new and existing unit tests (which were moved from the YARN module to
      core), and by running apps with shuffle encryption enabled.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #15981 from vanzin/SPARK-18547.
      
      (cherry picked from commit 8b325b17)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    • Patrick Wendell's avatar
      75d73d13
    • Preparing Spark release v2.1.0-rc1 · 80aabc0b
      Patrick Wendell authored
  13. Nov 26, 2016
  14. Nov 23, 2016
  15. Nov 19, 2016
    • [SPARK-18353][CORE] spark.rpc.askTimeout default value is not 120s · 30a6fbbb
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      Avoid hard-coding spark.rpc.askTimeout to a non-default value in Client; fix the doc about the spark.rpc.askTimeout default.
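
The difference between hard-coding a setting and only supplying it when the user has not can be sketched as follows. This is a minimal Python model, assuming that an unset `spark.rpc.askTimeout` falls back to `spark.network.timeout` and then to `120s`; the commit message does not spell that chain out, and the helper names `get_with_fallback`, `ask_timeout` and `set_if_missing` are hypothetical.

```python
from typing import Dict, Sequence

# Assumed lookup order for this sketch: the specific key, then the general one.
ASK_TIMEOUT_KEYS = ("spark.rpc.askTimeout", "spark.network.timeout")
ASSUMED_DEFAULT = "120s"


def get_with_fallback(conf: Dict[str, str], keys: Sequence[str], default: str) -> str:
    """Return the value of the first key present in conf, else the default."""
    for key in keys:
        if key in conf:
            return conf[key]
    return default


def ask_timeout(conf: Dict[str, str]) -> str:
    return get_with_fallback(conf, ASK_TIMEOUT_KEYS, ASSUMED_DEFAULT)


def set_if_missing(conf: Dict[str, str], key: str, value: str) -> None:
    """Supply a value only when the user has not chosen one."""
    conf.setdefault(key, value)


user_conf = {"spark.rpc.askTimeout": "30s"}

# Hard-coding silently discards the user's choice.
hard_coded = dict(user_conf)
hard_coded["spark.rpc.askTimeout"] = "10s"

# Setting only when missing keeps it.
respectful = dict(user_conf)
set_if_missing(respectful, "spark.rpc.askTimeout", "10s")

print(ask_timeout(hard_coded), ask_timeout(respectful), ask_timeout({}))
```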
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15833 from srowen/SPARK-18353.
      
      (cherry picked from commit 8b1e1088)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • [SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that`/`'''Note:'''` across Scala/Java API documentation · 4b396a65
      hyukjinkwon authored
      
      It seems that the Scala/Java API documentation marks notes in several different ways:
      
      - `Note:`
      - `NOTE:`
      - `Note that`
      - `'''Note:'''`
      - `note`
      
      This PR proposes to change all of them to `note` for consistency.
      
      **Before**
      
      - Scala
        ![2016-11-17 6 16 39](https://cloud.githubusercontent.com/assets/6477701/20383180/1a7aed8c-acf2-11e6-9611-5eaf6d52c2e0.png)
      
      - Java
        ![2016-11-17 6 14 41](https://cloud.githubusercontent.com/assets/6477701/20383096/c8ffc680-acf1-11e6-914a-33460bf1401d.png)
      
      **After**
      
      - Scala
        ![2016-11-17 6 16 44](https://cloud.githubusercontent.com/assets/6477701/20383167/09940490-acf2-11e6-937a-0d5e1dc2cadf.png)
      
      - Java
        ![2016-11-17 6 13 39](https://cloud.githubusercontent.com/assets/6477701/20383132/e7c2a57e-acf1-11e6-9c47-b849674d4d88.png)
      
      The notes were found via
      
      ```bash
      grep -r "NOTE: " . |  # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// NOTE: " |  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' |  # java/scala files
      grep -v Suite |  # exclude tests
      grep -v Test |  # exclude tests
      # packages that appear in API documentation; note that these are regular expressions,
      # so actual matches were mostly `org/apache/spark/api/java/functions ...`
      grep -e 'org.apache.spark.api.java' \
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note that " . |  # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note that " |  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' |  # java/scala files
      grep -v Suite |  # exclude tests
      grep -v Test |  # exclude tests
      # packages that appear in API documentation
      grep -e 'org.apache.spark.api.java' \
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note: " . |  # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note: " |  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' |  # java/scala files
      grep -v Suite |  # exclude tests
      grep -v Test |  # exclude tests
      # packages that appear in API documentation
      grep -e 'org.apache.spark.api.java' \
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "'''Note:'''" . |  # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// '''Note:''' " |  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' |  # java/scala files
      grep -v Suite |  # exclude tests
      grep -v Test |  # exclude tests
      # packages that appear in API documentation
      grep -e 'org.apache.spark.api.java' \
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      The matches were then fixed one by one, comparing against the API documentation and access modifiers.
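
The four searches look for different spellings of the same marker, so they could also be expressed as one pass. The following Python sketch is an illustration only, not what the PR author ran: `find_note_variants` is a hypothetical helper, the sample lines are invented, and it simplifies the `grep -v` step by skipping lines that start with `//` rather than lines that contain `// ` followed by the marker.

```python
import re
from typing import Iterable, List, Tuple

# The four spellings searched for in the grep commands above.
NOTE_VARIANTS = re.compile(r"NOTE: |Note: |Note that |'''Note:'''")


def find_note_variants(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, marker) for each line using an old note spelling."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Simplification: plain // comments do not reach the API docs, skip them.
        if line.lstrip().startswith("//"):
            continue
        match = NOTE_VARIANTS.search(line)
        if match:
            hits.append((number, match.group(0).strip()))
    return hits


# Invented sample input; the last line stands for an already-converted note.
sample = [
    "  /** NOTE: this is experimental. */",
    "  // NOTE: internal remark",
    "   * Note that the result is cached.",
    "   * '''Note:''' not thread safe.",
    "   * @note already converted",
]
print(find_note_variants(sample))
```

As with the grep output, each hit would still need manual review against the generated documentation before being changed.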
      
      After that, manually tested via `jekyll build`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #15889 from HyukjinKwon/SPARK-18437.
      
      (cherry picked from commit d5b1d5fc)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  16. Nov 17, 2016
    • [SPARK-18480][DOCS] Fix wrong links for ML guide docs · 536a2159
      Zheng RuiFeng authored
      
      ## What changes were proposed in this pull request?
      1. There are two `[Graph.partitionBy]` entries in `graphx-programming-guide.md`; the first one had no effect.
      2. `DataFrame`, `Transformer`, `Pipeline` and `Parameter` in `ml-pipeline.md` were linked to `ml-guide.html` by mistake.
      3. `PythonMLLibAPI` in `mllib-linear-methods.md` was not accessible, because class `PythonMLLibAPI` is private.
      4. Other link updates.

      ## How was this patch tested?
      Manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15912 from zhengruifeng/md_fix.
      
      (cherry picked from commit cdaf4ce9)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • [YARN][DOC] Remove non-Yarn specific configurations from running-on-yarn.md · 2ee4fc88
      Weiqing Yang authored
      
      ## What changes were proposed in this pull request?
      
      Remove `spark.driver.memory`, `spark.executor.memory`, `spark.driver.cores`, and `spark.executor.cores` from `running-on-yarn.md`, as they are not YARN-specific and are already defined in `configuration.md`.
      
      ## How was this patch tested?
      Build passed; checked manually.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #15869 from weiqingy/yarnDoc.
      
      (cherry picked from commit a3cac7bd)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  17. Nov 16, 2016
  18. Nov 15, 2016
    • [SPARK-18427][DOC] Update docs of mllib.KMeans · 0762c0ce
      Zheng RuiFeng authored
      
      ## What changes were proposed in this pull request?
      1. Remove `runs` from the docs of mllib.KMeans.
      2. Add notes for `k` according to the comments in the source.

      ## How was this patch tested?
      Existing tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15873 from zhengruifeng/update_doc_mllib_kmeans.
      
      (cherry picked from commit 33be4da5)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  19. Nov 14, 2016
  20. Nov 13, 2016
  21. Nov 08, 2016
  22. Nov 07, 2016
    • [SPARK-16575][CORE] partition calculation mismatch with sc.binaryFiles · c8879bf1
      fidato authored
      
      ## What changes were proposed in this pull request?
      
      This pull request contains the changes for the critical bug SPARK-16575. It fixes the BinaryFileRDD partition calculation: an RDD created with sc.binaryFiles always consisted of only two partitions.

      ## How was this patch tested?
      
      The original issue (getNumPartitions on a binaryFiles RDD always returning two partitions) was first reproduced and then retested with the changes applied. The unit tests were also run and passed.
      
      This contribution is my original work and I license the work to the project under the project's open source license.
      
      srowen hvanhovell rxin vanzin skyluc kmader zsxwing datafarmer Please have a look.
      
      Author: fidato <fidato.july13@gmail.com>
      
      Closes #15327 from fidato13/SPARK-16575.
      
      (cherry picked from commit 6f369713)
      Signed-off-by: Reynold Xin <rxin@databricks.com>