  1. Mar 26, 2017
    • [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrowth · 0bc8847a
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add `HasSupport` and `HasConfidence` `Params`.
      - Add new module `pyspark.ml.fpm`.
      - Add `FPGrowth` / `FPGrowthModel` wrappers.
      - Provide tests for new features.
      
      ## How was this patch tested?
      
      Unit tests.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17218 from zero323/SPARK-19281.
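      For readers unfamiliar with FP-Growth, the quantities that the new support and confidence `Params` control can be sketched in plain Python. This is an illustrative brute-force computation over made-up transactions, not Spark's FP-tree implementation:

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force support counting, for illustration only: keep every
    itemset contained in at least min_support fraction of transactions.
    Spark's FPGrowth computes the same result with an FP-tree."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def confidence(freq, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent | consequent) / support(antecedent)."""
    return freq[antecedent | consequent] / freq[antecedent]

transactions = [["a", "b"], ["a", "b", "c"], ["a", "c"]]
freq = frequent_itemsets(transactions, min_support=0.5)
```

      With `min_support=0.5`, `{a, b}` survives at support 2/3 and the rule `a -> b` has confidence 2/3; the new wrappers expose the same quantities at scale.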
  3. Feb 17, 2017
    • [SPARK-19517][SS] KafkaSource fails to initialize partition offsets · 1a3f5f8c
      Roberto Agostino Vitillo authored
      ## What changes were proposed in this pull request?
      
      This patch fixes a bug in `KafkaSource` with the (de)serialization of the length of the JSON string that contains the initial partition offsets.
      
      ## How was this patch tested?
      
      I ran the test suite for spark-sql-kafka-0-10.
      
      Author: Roberto Agostino Vitillo <ra.vitillo@gmail.com>
      
      Closes #16857 from vitillo/kafka_source_fix.
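      The shape of this bug is easy to see with a length-prefixed JSON record. A hedged Python sketch (Spark's actual metadata-log format differs; this only illustrates why the length of the serialized string must be written and read consistently):

```python
import json
import struct

def serialize_offsets(offsets):
    """Length-prefix the UTF-8 JSON payload with a 4-byte big-endian int.
    Writing one length but reading another (characters vs. bytes, wrong
    endianness) is the class of bug being fixed here."""
    payload = json.dumps(offsets).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def deserialize_offsets(data):
    (length,) = struct.unpack(">I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))

offsets = {"topic-0": 42, "topic-1": 7}
round_tripped = deserialize_offsets(serialize_offsets(offsets))
```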
  4. Feb 16, 2017
    • [SPARK-19550][HOTFIX][BUILD] Use JAVA_HOME/bin/java if JAVA_HOME is set in dev/mima · dcc2d540
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Use JAVA_HOME/bin/java if JAVA_HOME is set in dev/mima script to run MiMa
      This follows on https://github.com/apache/spark/pull/16871 -- it's a slightly separate issue, but, is currently causing a build failure.
      
      ## How was this patch tested?
      
      Manually tested.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16957 from srowen/SPARK-19550.2.
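      The selection logic is the usual one; a minimal Python sketch of the same rule the shell script applies:

```python
import os

def resolve_java(env):
    """Prefer $JAVA_HOME/bin/java when JAVA_HOME is set and non-empty;
    otherwise fall back to whatever `java` is on the PATH."""
    java_home = env.get("JAVA_HOME")
    if java_home:
        return os.path.join(java_home, "bin", "java")
    return "java"
```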
    • [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, sql, and remove that module
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
  5. Feb 14, 2017
  6. Feb 08, 2017
    • [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop2.6 · c618ccdb
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      After SPARK-19464, **SparkPullRequestBuilder** fails because it still tries to use hadoop2.3.
      
      **BEFORE**
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
      ```
      ========================================================================
      Building Spark
      ========================================================================
      [error] Could not find hadoop2.3 in the list. Valid options  are ['hadoop2.6', 'hadoop2.7']
      Attempting to post to Github...
       > Post successful.
      ```
      
      **AFTER**
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
      ```
      ========================================================================
      Building Spark
      ========================================================================
      [info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
      Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
      Note, this will be overridden by -java-home if it is set.
      ```
      
      ## How was this patch tested?
      
      Pass the existing test.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #16858 from dongjoon-hyun/hotfix_run-tests.
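      The failure above comes from validating the requested Hadoop profile against the list the build still supports. A small Python sketch of that check (the real logic lives in dev/run-tests.py; names here are illustrative):

```python
VALID_HADOOP_PROFILES = ["hadoop2.6", "hadoop2.7"]

def select_hadoop_profile(requested, valid=VALID_HADOOP_PROFILES):
    """Fail fast, in the spirit of the run-tests error shown above, when
    the requested Hadoop profile is no longer supported by the build."""
    if requested not in valid:
        raise ValueError(
            "Could not find %s in the list. Valid options are %s" % (requested, valid))
    # e.g. "hadoop2.6" -> the sbt/maven flag "-Phadoop-2.6"
    return "-P" + requested.replace("hadoop", "hadoop-")
```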
    • [SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier · e8d3fca4
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove support for Hadoop 2.5 and earlier
      - Remove reflection and code constructs only needed to support multiple versions at once
      - Update docs to reflect newer versions
      - Remove older versions' builds and profiles.
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16810 from srowen/SPARK-19464.
  8. Jan 25, 2017
    • [SPARK-19064][PYSPARK] Fix pip installing of sub components · 965c82d8
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Fix installation of the mllib and ml sub-components, and clean up cache files more eagerly during the test script & make-distribution.
      
      ## How was this patch tested?
      
      Updated sanity test script to import mllib and ml sub-components.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #16465 from holdenk/SPARK-19064-fix-pip-install-sub-components.
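      pip only installs the packages enumerated in setup.py's `packages` list, so sub-packages such as `pyspark.mllib` must each be listed (or discovered). A stdlib sketch of the discovery that setuptools' `find_packages` performs, shown on a throwaway directory tree:

```python
import os
import tempfile

def find_subpackages(root_pkg_dir):
    """List every package under root_pkg_dir (any directory that has an
    __init__.py), in dotted form -- the entries setup.py's `packages`
    list needs so pip installs the sub-components too."""
    parent = os.path.dirname(root_pkg_dir)
    packages = []
    for dirpath, _, filenames in os.walk(root_pkg_dir):
        if "__init__.py" in filenames:
            rel = os.path.relpath(dirpath, parent)
            packages.append(rel.replace(os.sep, "."))
    return sorted(packages)

# Demo on a throwaway miniature tree: pyspark with nested sub-packages.
root = tempfile.mkdtemp()
for pkg in ("pyspark", "pyspark/mllib", "pyspark/mllib/linalg"):
    d = os.path.join(root, *pkg.split("/"))
    os.makedirs(d, exist_ok=True)
    open(os.path.join(d, "__init__.py"), "w").close()
found = find_subpackages(os.path.join(root, "pyspark"))
```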
  9. Jan 19, 2017
    • [SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting · 640f9423
      José Hiram Soltren authored
      Builds on top of work in SPARK-8425 to update Application Level Blacklisting in the scheduler.
      
      ## What changes were proposed in this pull request?
      
      Adds a UI to these patches by:
      - defining new listener events for blacklisting and unblacklisting, nodes and executors;
      - sending said events at the relevant points in BlacklistTracker;
      - adding JSON (de)serialization code for these events;
      - augmenting the Executors UI page to show which, and how many, executors are blacklisted;
      - adding a unit test to make sure events are being fired;
      - adding HistoryServerSuite coverage to verify that the SHS reads these events correctly;
      - updating the Executor UI to show Blacklisted/Active/Dead as a tri-state in Executors Status.
      
      Updates .rat-excludes to pass tests.
      
      username squito
      
      ## How was this patch tested?
      
      ./dev/run-tests
      testOnly org.apache.spark.util.JsonProtocolSuite
      testOnly org.apache.spark.scheduler.BlacklistTrackerSuite
      testOnly org.apache.spark.deploy.history.HistoryServerSuite
      https://github.com/jsoltren/jose-utils/blob/master/blacklist/test-blacklist.sh
      ![blacklist-20161219](https://cloud.githubusercontent.com/assets/1208477/21335321/9eda320a-c623-11e6-8b8c-9c912a73c276.jpg)
      
      Author: José Hiram Soltren <jose@cloudera.com>
      
      Closes #16346 from jsoltren/SPARK-16654-submit.
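      The new listener events round-trip through JSON so the history server can replay them. A hedged Python sketch of that round trip; the event and field names are illustrative, not necessarily those the patch adds to JsonProtocol:

```python
import json

KNOWN_EVENTS = {
    "SparkListenerExecutorBlacklisted",
    "SparkListenerExecutorUnblacklisted",
    "SparkListenerNodeBlacklisted",
    "SparkListenerNodeUnblacklisted",
}

def event_to_json(event):
    # A type tag lets a log reader dispatch on the event kind, the way
    # Spark's JsonProtocol tags every SparkListenerEvent.
    return json.dumps(event)

def event_from_json(serialized):
    event = json.loads(serialized)
    if event.get("Event") not in KNOWN_EVENTS:
        raise ValueError("unknown event type: %r" % event.get("Event"))
    return event

evt = {"Event": "SparkListenerExecutorBlacklisted",
       "Time": 1484000000000, "Executor ID": "1", "Task Failures": 4}
round_tripped = event_from_json(event_to_json(evt))
```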
  11. Jan 16, 2017
    • [SPARK-18828][SPARKR] Refactor scripts for R · c84f7d3e
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Refactored the scripts to remove duplication and give each script a clearer purpose.
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16249 from felixcheung/rscripts.
  13. Jan 14, 2017
    • [SPARK-19221][PROJECT INFRA][R] Add winutils binaries to the path in AppVeyor tests for Hadoop libraries to call native codes properly · b6a7aa4f
      hyukjinkwon authored
      
      ## What changes were proposed in this pull request?
      
      It seems the Hadoop libraries need the winutils binaries on the path to load their native libraries.
      
      It is not a problem in tests for now because we are only testing SparkR on Windows via AppVeyor but it can be a problem if we run Scala tests via AppVeyor as below:
      
      ```
       - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (3 seconds, 937 milliseconds)
         org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:609)
         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
         ...
      ```
      
      This PR proposes to add it to the `Path` for AppVeyor tests.
      
      ## How was this patch tested?
      
      Manually via AppVeyor.
      
      **Before**
      https://ci.appveyor.com/project/spark-test/spark/build/549-windows-complete/job/gc8a1pjua2bc4i8m
      
      **After**
      https://ci.appveyor.com/project/spark-test/spark/build/572-windows-complete/job/c4vrysr5uvj2hgu7
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16584 from HyukjinKwon/set-path-appveyor.
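      The fix amounts to prepending the winutils directory to the `Path` environment variable before the tests run. A small Python sketch of the operation (paths are examples):

```python
import os

def prepend_to_path(directory, env):
    """Return a copy of env with `directory` prepended to PATH, so
    binaries there (e.g. winutils.exe) win the lookup."""
    new_env = dict(env)
    new_env["PATH"] = directory + os.pathsep + new_env.get("PATH", "")
    return new_env

env = prepend_to_path("C:\\hadoop\\bin", {"PATH": "C:\\Windows\\system32"})
```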
  15. Jan 02, 2017
    • [SPARK-19002][BUILD][PYTHON] Check pep8 against all Python scripts · 46b21260
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to check pep8 against all other Python scripts and fix the errors as below:
      
      ```bash
      ./dev/create-release/generate-contributors.py
      ./dev/create-release/releaseutils.py
      ./dev/create-release/translate-contributors.py
      ./dev/lint-python
      ./python/docs/epytext.py
      ./examples/src/main/python/mllib/decision_tree_classification_example.py
      ./examples/src/main/python/mllib/decision_tree_regression_example.py
      ./examples/src/main/python/mllib/gradient_boosting_classification_example.py
      ./examples/src/main/python/mllib/gradient_boosting_regression_example.py
      ./examples/src/main/python/mllib/linear_regression_with_sgd_example.py
      ./examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
      ./examples/src/main/python/mllib/naive_bayes_example.py
      ./examples/src/main/python/mllib/random_forest_classification_example.py
      ./examples/src/main/python/mllib/random_forest_regression_example.py
      ./examples/src/main/python/mllib/svm_with_sgd_example.py
      ./examples/src/main/python/streaming/network_wordjoinsentiments.py
      ./sql/hive/src/test/resources/data/scripts/cat.py
      ./sql/hive/src/test/resources/data/scripts/cat_error.py
      ./sql/hive/src/test/resources/data/scripts/doubleescapedtab.py
      ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py
      ./sql/hive/src/test/resources/data/scripts/escapedcarriagereturn.py
      ./sql/hive/src/test/resources/data/scripts/escapednewline.py
      ./sql/hive/src/test/resources/data/scripts/escapedtab.py
      ./sql/hive/src/test/resources/data/scripts/input20_script.py
      ./sql/hive/src/test/resources/data/scripts/newline.py
      ```
      
      ## How was this patch tested?
      
      - `./python/docs/epytext.py`
      
        ```bash
  cd ./python/docs && make html
        ```
      
      - pep8 check (Python 2.7 / Python 3.3.6)
      
        ```
        ./dev/lint-python
        ```
      
      - `./dev/merge_spark_pr.py` (Python 2.7 only / Python 3.3.6 not working)
      
        ```bash
        python -m doctest -v ./dev/merge_spark_pr.py
        ```
      
      - `./dev/create-release/releaseutils.py` `./dev/create-release/generate-contributors.py` `./dev/create-release/translate-contributors.py` (Python 2.7 only / Python 3.3.6 not working)
      
        ```bash
        python generate-contributors.py
        python translate-contributors.py
        ```
      
      - Examples (Python 2.7 / Python 3.3.6)
      
        ```bash
        ./bin/spark-submit examples/src/main/python/mllib/decision_tree_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/decision_tree_regression_example.py
        ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_regression_example.p
        ./bin/spark-submit examples/src/main/python/mllib/random_forest_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/random_forest_regression_example.py
        ```
      
      - Examples (Python 2.7 only / Python 3.3.6 not working)
        ```
        ./bin/spark-submit examples/src/main/python/mllib/linear_regression_with_sgd_example.py
        ./bin/spark-submit examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
        ./bin/spark-submit examples/src/main/python/mllib/naive_bayes_example.py
        ./bin/spark-submit examples/src/main/python/mllib/svm_with_sgd_example.py
        ```
      
      - `sql/hive/src/test/resources/data/scripts/*.py` (Python 2.7 / Python 3.3.6 within suggested changes)
      
        Manually tested only changed ones.
      
      - `./dev/github_jira_sync.py` (Python 2.7 only / Python 3.3.6 not working)
      
        Manually tested this after disabling actually adding comments and links.
      
      And also via Jenkins tests.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16405 from HyukjinKwon/minor-pep8.
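      Two of the most common pep8 findings fixed by a change like this (long lines and trailing whitespace) can be sketched with the stdlib; the real `./dev/lint-python` drives the pep8 tool, which checks far more:

```python
def check_pep8_basics(source, max_len=100):
    """Report (line_no, code, message) for two simple PEP 8 rules:
    E501 (line too long) and W291 (trailing whitespace). A sketch of
    the kind of per-line checks pep8 performs, not a replacement."""
    problems = []
    for i, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_len:
            problems.append((i, "E501",
                             "line too long (%d > %d characters)" % (len(line), max_len)))
        if line != line.rstrip():
            problems.append((i, "W291", "trailing whitespace"))
    return problems

bad = "x = 1   \n" + "y" * 120
```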
  16. Dec 29, 2016
    • Update known_translations for contributor names and also fix a small issue in translate-contributors.py · 63036aee
      Yin Huai authored
      
      ## What changes were proposed in this pull request?
      This PR updates dev/create-release/known_translations to add more contributor name mapping. It also fixes a small issue in translate-contributors.py
      
      ## How was this patch tested?
      manually tested
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #16423 from yhuai/contributors.
  17. Dec 21, 2016
    • [BUILD] make-distribution should find JAVA_HOME for non-RHEL systems · e1b43dc4
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      make-distribution.sh should find JAVA_HOME for Ubuntu, Mac and other non-RHEL systems
      
      ## How was this patch tested?
      
      Manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16363 from felixcheung/buildjava.
    • [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test · 95efc895
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      When KafkaSource fails on Kafka errors, we should create a new consumer to retry rather than using the existing broken one because it's possible that the broken one will fail again.
      
      This PR also assigns a new group id to the newly created consumer, to guard against a possible race condition: the broken consumer may be unable to talk to the Kafka cluster in `close` while the new consumer can. I'm not sure if this will happen or not; it's just for safety, to avoid the Kafka cluster thinking there are two consumers with the same group id in a short time window. (Note: CachedKafkaConsumer doesn't need this fix since `assign` never uses the group id.)
      
      ## How was this patch tested?
      
      In https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70370/console , it ran this flaky test 120 times and all passed.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16282 from zsxwing/kafka-fix.
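      The retry pattern is worth spelling out: discard the possibly-broken client and construct a new one with a fresh group id on every attempt. A Python sketch with a stand-in client factory (not the actual KafkaSource code):

```python
import uuid

def fetch_with_fresh_client(make_client, fetch, attempts=3):
    """On failure, throw the client away and build a new one with a fresh
    group id, rather than retrying on the possibly-broken instance."""
    last_err = None
    for _ in range(attempts):
        client = make_client("spark-kafka-source-" + uuid.uuid4().hex)
        try:
            return fetch(client)
        except RuntimeError as err:
            last_err = err  # next attempt constructs a brand-new client
    raise last_err

group_ids = []
def make_client(group_id):
    group_ids.append(group_id)
    return len(group_ids)  # stand-in for a consumer handle

def fetch(client):
    if client < 3:
        raise RuntimeError("broken consumer")  # first two clients fail
    return "offsets"

result = fetch_with_fresh_client(make_client, fetch)
```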
    • [SPARK-18951] Upgrade com.thoughtworks.paranamer/paranamer to 2.6 · 1a643889
      Yin Huai authored
      ## What changes were proposed in this pull request?
      I recently hit a bug in com.thoughtworks.paranamer/paranamer, which causes jackson to fail to handle a byte array defined in a case class. Then I found https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests that it is caused by a bug in paranamer. Let's upgrade paranamer. Since we are using jackson 2.6.5, and jackson-module-paranamer 2.6.5 uses com.thoughtworks.paranamer/paranamer 2.6, I suggest that we upgrade paranamer to 2.6.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #16359 from yhuai/SPARK-18951.
  18. Dec 15, 2016
  19. Dec 14, 2016
    • [SPARK-18730] Post Jenkins test report page instead of the full console output page to GitHub · ba4aab9b
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      Currently, the full console output page of a Spark Jenkins PR build can be as large as several megabytes. It takes a relatively long time to load and may even freeze the browser for quite a while.
      
      This PR makes the build script post the test report page link to GitHub instead. The test report page is far more concise and is usually the first page I'd check when investigating a Jenkins build failure.
      
      Note that for builds where a test report is not available (ongoing builds and builds that fail before test execution), the test report link automatically redirects to the build page.
      
      ## How was this patch tested?
      
      N/A.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #16163 from liancheng/jenkins-test-report.
  21. Dec 08, 2016
    • [SPARKR][PYSPARK] Fix R source package name to match Spark version. Remove pip tar.gz from distribution · 4ac8b20b
      Shivaram Venkataraman authored
      
      ## What changes were proposed in this pull request?
      
      Fixes name of R source package so that the `cp` in release-build.sh works correctly.
      
      Issue discussed in https://github.com/apache/spark/pull/16014#issuecomment-265867125
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #16221 from shivaram/fix-sparkr-release-build-name.
    • [SPARK-18590][SPARKR] Change the R source build to Hadoop 2.6 · 202fcd21
      Shivaram Venkataraman authored
      This PR changes the SparkR source release tarball to be built using the Hadoop 2.6 profile. Previously it was using the "without hadoop" profile, which leads to an error as discussed in https://github.com/apache/spark/pull/16014#issuecomment-265843991
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #16218 from shivaram/fix-sparkr-release-build.
    • [SPARK-18590][SPARKR] build R source package when making distribution · c3d3a9d0
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      This PR has 2 key changes. One, we are building a source package (aka bundle package) for SparkR which could be released on CRAN. Two, we should include in the official Spark binary distributions SparkR installed from this source package instead (which includes the help/vignettes rds files needed for those to work when the SparkR package is loaded in R, whereas the earlier approach with devtools did not).
      
      But, because of various differences in how R performs different tasks, this PR is a fair bit more complicated. More details below.
      
      This PR also includes a few minor fixes.
      
      ### more details
      
      These are the additional steps in make-distribution; please see [here](https://github.com/apache/spark/blob/master/R/CRAN_RELEASE.md) on what's going to a CRAN release, which is now run during make-distribution.sh.
      1. package needs to be installed because the first code block in vignettes is `library(SparkR)` without lib path
      2. `R CMD build` will build vignettes (this process runs Spark/SparkR code and captures outputs into pdf documentation)
      3. `R CMD check` on the source package will install package and build vignettes again (this time from source packaged) - this is a key step required to release R package on CRAN
       (will skip tests here but tests will need to pass for the CRAN release process to succeed - ideally, during release signoff we should install from the R source package and run tests)
      4. `R CMD INSTALL` on the source package (this is the only way to generate doc/vignettes rds files correctly, not in step # 1)
       (the output of this step is what we package into Spark dist and sparkr.zip)
      
      Alternatively,
         `R CMD build` should already be installing the package in a temp directory; we might just need to find this location and set it as the `lib.loc` parameter. Another approach is perhaps to call `R CMD INSTALL --build pkg` instead.
       But in any case, despite installing the package multiple times this is relatively fast.
      Building vignettes takes a while though.
      
      ## How was this patch tested?
      
      Manually, CI.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16014 from felixcheung/rdist.
  24. Dec 01, 2016
    • [SPARK-18639] Build only a single pip package · 37e52f87
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      We currently build 5 separate pip binary tarballs, doubling the release script runtime. It'd be better to build one, especially for use cases that are just using Spark locally. In the long run, it would make more sense to have Hadoop support be pluggable.
      
      ## How was this patch tested?
      N/A - this is a release build script that doesn't have any automated test coverage. We will know if it goes wrong when we prepare releases.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #16072 from rxin/SPARK-18639.
  25. Nov 28, 2016
    • [SPARK-18602] Set the version of org.codehaus.janino:commons-compiler to 3.0.0 to match the version of org.codehaus.janino:janino · eba72775
      Yin Huai authored
      
      ## What changes were proposed in this pull request?
      org.codehaus.janino:janino depends on org.codehaus.janino:commons-compiler, and we have already upgraded to org.codehaus.janino:janino 3.0.0.
      
      However, seems we are still pulling in org.codehaus.janino:commons-compiler 2.7.6 because of calcite. It looks like an accident because we exclude janino from calcite (see here https://github.com/apache/spark/blob/branch-2.1/pom.xml#L1759). So, this PR upgrades org.codehaus.janino:commons-compiler to 3.0.0.
      
      ## How was this patch tested?
      jenkins
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #16025 from yhuai/janino-commons-compile.
  27. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).
      
      Done:
      - pip installable on conda [manual tested]
      - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone with in the project or should we ask PyPI or should we choose a different name to publish with like ApachePySpark?)
      - Windows support and or testing ( SPARK-18136 )
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
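      One of the things a pip package like this has to get right is letting launcher code locate the bundled jars relative to the installed Python package. A hedged stdlib sketch of that lookup (the directory layout is assumed for illustration, not taken from the PR), demonstrated on a temp directory:

```python
import os
import tempfile

def find_spark_jars(pyspark_dir):
    """Prefer jars bundled inside the installed pyspark package; fall
    back to $SPARK_HOME/jars. Layout assumed for this sketch."""
    bundled = os.path.join(pyspark_dir, "jars")
    if os.path.isdir(bundled):
        return bundled
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        return os.path.join(spark_home, "jars")
    raise RuntimeError("Could not locate Spark jars")

# Demo: a fake installed package directory that bundles its jars.
pkg_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(pkg_dir, "jars"))
jars = find_spark_jars(pkg_dir)
```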
    • [SPARK-18420][BUILD] Fix the errors caused by lint check in Java · 7569cf6c
      Xianyang Liu authored
      ## What changes were proposed in this pull request?
      
      A small fix for the errors caused by the Java lint check:
      
      - Remove unused objects and unused imports (`UnusedImports`).
      - Add comments around the method `finalize` of `NioBufferedFileInputStream` to turn off checkstyle.
      - Split lines longer than 100 characters into two lines.
      
      ## How was this patch tested?
      Travis CI.
      ```
      $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
      $ dev/lint-java
      ```
      Before:
      ```
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[21,8] (imports) UnusedImports: Unused import - org.apache.commons.crypto.cipher.CryptoCipherFactory.
      [ERROR] src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java:[516,5] (modifier) RedundantModifier: Redundant 'public' modifier.
      [ERROR] src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java:[133] (coding) NoFinalizer: Avoid using finalizer method.
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeMapData.java:[71] (sizes) LineLength: Line is longer than 100 characters (found 113).
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java:[112] (sizes) LineLength: Line is longer than 100 characters (found 110).
      [ERROR] src/test/java/org/apache/spark/sql/catalyst/expressions/HiveHasherSuite.java:[31,17] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions.
      [ERROR]src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[22,8] (imports) UnusedImports: Unused import - org.apache.spark.ml.linalg.Vectors.
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[51] (regexp) RegexpSingleline: No trailing whitespace allowed.
      ```
      
      After:
      ```
      $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
      $ dev/lint-java
      Using `mvn` from path: /home/travis/build/ConeyLiu/spark/build/apache-maven-3.3.9/bin/mvn
      Checkstyle checks passed.
      ```
      
      Author: Xianyang Liu <xyliu0530@icloud.com>
      
      Closes #15865 from ConeyLiu/master.