  1. Jun 29, 2015
    • Yu ISHIKAWA's avatar
      [SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license` · 715f084c
      Yu ISHIKAWA authored
      [[SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8554)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6947 from yu-iskw/SPARK-8554 and squashes the following commits:
      
      5ca240c [Yu ISHIKAWA] [SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license`
      715f084c
    • Brennon York's avatar
      [SPARK-8693] [PROJECT INFRA] profiles and goals are not printed in a nice way · 5c796d57
      Brennon York authored
      Hotfix to correct formatting errors of print statements within the dev and jenkins builds. Error looks like:
      
      ```
      -Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  streaming-kafka-assembly/assembly
      ```
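
The garbled output above is consistent with a missing comma between two adjacent string literals, which would also fit the "added commas" fix named in the squashed commit. Python concatenates adjacent literals first, so the whole message ends up as the separator for `join`. This cause is inferred from the output, not stated in the commit. A minimal sketch with a shortened, illustrative argument list:

```python
# Illustrative reproduction; ARGS is a shortened, hypothetical argument list.
ARGS = ["-Phadoop-1", "-Dhadoop.version=1.0.4", "package"]

# Missing comma: the two literals are concatenated first, and the combined
# string is then used as the separator between the arguments.
garbled = "[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: " " ".join(ARGS)

# With the comma, the message and the joined arguments are separate values.
message, joined_args = "[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: ", " ".join(ARGS)
fixed = message + joined_args

print(garbled)
print(fixed)
```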
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #7085 from brennonyork/SPARK-8693 and squashes the following commits:
      
      c5575f1 [Brennon York] added commas to end of print statements for proper printing
      5c796d57
    • zsxwing's avatar
      [SPARK-8702] [WEBUI] Avoid massive concating strings in Javascript · 630bd5fd
      zsxwing authored
      When a stage has a very large number of tasks, such as `sc.parallelize(1 to 100000, 10000).count()`, the JavaScript generated for the stage page contains a lot of string concatenations, nearly 40 for each task.
      
      We can generate the whole string for a task up front instead of executing the string concatenations in the browser.
      
      Before this patch, the load time of the page is about 21 seconds.
      ![screen shot 2015-06-29 at 6 44 04 pm](https://cloud.githubusercontent.com/assets/1000778/8406644/eb55ed18-1e90-11e5-9ad5-50d27ad1dff1.png)
      
      After this patch, it reduces to about 17 seconds.
      
      ![screen shot 2015-06-29 at 6 47 34 pm](https://cloud.githubusercontent.com/assets/1000778/8406665/087003ca-1e91-11e5-80a8-3485aa9adafa.png)
      
      One disadvantage is that the generated JavaScript becomes harder to read.
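
The trade-off can be sketched in a few lines. This is not the Spark web UI code (which is written in Scala); the fragment list and both helper functions are hypothetical, and `json.dumps` is used only as a convenient way to emit a quoted string literal.

```python
import json

# Hypothetical HTML fragments that make up the markup for one task.
FRAGMENTS = ["<div>", "Task 0", " (attempt 0)", "</div>"]


def js_with_concatenation(fragments):
    # Emits a JavaScript expression that the browser has to evaluate by
    # concatenating every fragment when the page loads.
    return " + ".join(json.dumps(fragment) for fragment in fragments)


def js_single_literal(fragments):
    # Concatenates while generating the page and emits one string literal.
    return json.dumps("".join(fragments))


print(js_with_concatenation(FRAGMENTS))
print(js_single_literal(FRAGMENTS))
```

The second form leaves the browser nothing to concatenate, at the cost of a long literal that is harder to read in the page source.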
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7082 from zsxwing/js-string and squashes the following commits:
      
      b29231d [zsxwing] Avoid massive concating strings in Javascript
      630bd5fd
    • Reynold Xin's avatar
      [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should... · 660c6cec
      Reynold Xin authored
      [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7079 from rxin/SPARK-8698 and squashes the following commits:
      
      8513e1c [Reynold Xin] [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.
      660c6cec
    • Cheolsoo Park's avatar
      [SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala · ac2e17b0
      Cheolsoo Park authored
      I compared the PySpark DataFrameReader/Writer against the Scala ones. The `option` function is missing from both the reader and the writer, but the rest all seem to match.
      
      I added `option` to the reader and the writer and updated the `pyspark-sql` test.
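
As a rough illustration of the chaining style such a method enables, here is a stand-in class. `SketchReader` is hypothetical and is not the PySpark `DataFrameReader`; the format name and option keys are illustrative only.

```python
class SketchReader:
    """Stand-in for a reader with chainable setters (hypothetical)."""

    def __init__(self):
        self.source = None
        self.settings = {}

    def format(self, source):
        self.source = source
        return self

    def option(self, key, value):
        # Set a single key/value pair and return self so calls can be chained.
        self.settings[key] = value
        return self

    def options(self, **opts):
        # Set several pairs at once by delegating to option().
        for key, value in opts.items():
            self.option(key, value)
        return self


reader = SketchReader().format("json").option("key1", "a").options(key2="b", key3="c")
print(reader.source, reader.settings)
```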
      
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #7078 from piaozhexiu/SPARK-8355 and squashes the following commits:
      
      c63d419 [Cheolsoo Park] Fix version
      524e0aa [Cheolsoo Park] Add option function to df reader and writer
      ac2e17b0
    • BenFradet's avatar
      [SPARK-8575] [SQL] Deprecate callUDF in favor of udf · 0b10662f
      BenFradet authored
      Follow up of [SPARK-8356](https://issues.apache.org/jira/browse/SPARK-8356) and #6902.
      Removes the unit test for the now deprecated `callUdf`
      Unit test in SQLQuerySuite now uses `udf` instead of `callUDF`
      Replaced `callUDF` by `udf` where possible in mllib
      
      Author: BenFradet <benjamin.fradet@gmail.com>
      
      Closes #6993 from BenFradet/SPARK-8575 and squashes the following commits:
      
      26f5a7a [BenFradet] 2 spaces instead of 1
      1ddb452 [BenFradet] renamed initUDF in order to be consistent in OneVsRest
      48ca15e [BenFradet] used vector type tag for udf call in VectorIndexer
      0ebd0da [BenFradet] replace the now deprecated callUDF by udf in VectorIndexer
      8013409 [BenFradet] replaced the now deprecated callUDF by udf in Predictor
      94345b5 [BenFradet] unifomized udf calls in ProbabilisticClassifier
      1305492 [BenFradet] uniformized udf calls in Classifier
      a672228 [BenFradet] uniformized udf calls in OneVsRest
      49e4904 [BenFradet] Revert "removal of the unit test for the now deprecated callUdf"
      bbdeaf3 [BenFradet] fixed syntax for init udf in OneVsRest
      fe2a10b [BenFradet] callUDF => udf in ProbabilisticClassifier
      0ea30b3 [BenFradet] callUDF => udf in Classifier where possible
      197ec82 [BenFradet] callUDF => udf in OneVsRest
      84d6780 [BenFradet] modified unit test in SQLQuerySuite to use udf instead of callUDF
      477709f [BenFradet] removal of the unit test for the now deprecated callUdf
      0b10662f
    • Yanbo Liang's avatar
      [SPARK-5962] [MLLIB] Python support for Power Iteration Clustering · dfde31da
      Yanbo Liang authored
      Python support for Power Iteration Clustering
      https://issues.apache.org/jira/browse/SPARK-5962
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6992 from yanboliang/pyspark-pic and squashes the following commits:
      
      6b03d82 [Yanbo Liang] address comments
      4be4423 [Yanbo Liang] Python support for Power Iteration Clustering
      dfde31da
    • Feynman Liang's avatar
      [SPARK-7212] [MLLIB] Add sequence learning flag · 25f574eb
      Feynman Liang authored
      Support mining of ordered frequent item sequences.
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #6997 from feynmanliang/fp-sequence and squashes the following commits:
      
      7c14e15 [Feynman Liang] Improve scalatests with R code and Seq
      0d3e4b6 [Feynman Liang] Fix python test
      ce987cb [Feynman Liang] Backwards compatibility aux constructor
      34ef8f2 [Feynman Liang] Fix failing test due to reverse orderering
      f04bd50 [Feynman Liang] Naming, add ordered to FreqItemsets, test ordering using Seq
      648d4d4 [Feynman Liang] Test case for frequent item sequences
      252a36a [Feynman Liang] Add sequence learning flag
      25f574eb
  2. Jun 28, 2015
    • Cheng Lian's avatar
      [SPARK-7845] [BUILD] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1 · 00a9d22b
      Cheng Lian authored
      PR #5694 reverted PR #6384 while refactoring `dev/run-tests` to `dev/run-tests.py`. Also, PR #6384 didn't bump the Hadoop 1 version defined in the POM.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7062 from liancheng/spark-7845 and squashes the following commits:
      
      c088b72 [Cheng Lian] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1
      00a9d22b
    • Liang-Chi Hsieh's avatar
      [SPARK-8677] [SQL] Fix non-terminating decimal expansion for decimal divide operation · 24fda738
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8677
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #7056 from viirya/fix_decimal3 and squashes the following commits:
      
      34d7419 [Liang-Chi Hsieh] Fix Non-terminating decimal expansion for decimal divide operation.
      24fda738
    • Vincent D. Warmerdam's avatar
      [SPARK-8596] [EC2] Added port for Rstudio · 9ce78b43
      Vincent D. Warmerdam authored
      This would otherwise need to be set manually by R users in AWS.
      
      https://issues.apache.org/jira/browse/SPARK-8596
      
      Author: Vincent D. Warmerdam <vincentwarmerdam@gmail.com>
      Author: vincent <vincentwarmerdam@gmail.com>
      
      Closes #7068 from koaning/rstudio-port-number and squashes the following commits:
      
      ac8100d [vincent] Update spark_ec2.py
      ce6ad88 [Vincent D. Warmerdam] added port number for rstudio
      9ce78b43
    • Kousuke Saruta's avatar
      [SPARK-8686] [SQL] DataFrame should support `where` with expression represented by String · ec784381
      Kousuke Saruta authored
      DataFrame supports `filter` function with two types of argument, `Column` and `String`. But `where` doesn't.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #7063 from sarutak/SPARK-8686 and squashes the following commits:
      
      180f9a4 [Kousuke Saruta] Added test
      d61aec4 [Kousuke Saruta] Add "where" method with String argument to DataFrame
      ec784381
    • Davies Liu's avatar
      [SPARK-8610] [SQL] Separate Row and InternalRow (part 2) · 77da5be6
      Davies Liu authored
      Currently, we use GenericRow for both Row and InternalRow, which is confusing because it can contain Scala types as well as Catalyst types.
      
      This PR changes to use GenericInternalRow for InternalRow (which contains Catalyst types) and GenericRow for Row (which contains Scala types).
      
      Also fixes some incorrect use of InternalRow or Row.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7003 from davies/internalrow and squashes the following commits:
      
      d05866c [Davies Liu] fix test: rollback changes for pyspark
      72878dd [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow
      efd0b25 [Davies Liu] fix copy of MutableRow
      87b13cf [Davies Liu] fix test
      d2ebd72 [Davies Liu] fix style
      eb4b473 [Davies Liu] mark expensive API as final
      bd4e99c [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow
      bdfb78f [Davies Liu] remove BaseMutableRow
      6f99a97 [Davies Liu] fix catalyst test
      defe931 [Davies Liu] remove BaseRow
      288b31f [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow
      9d24350 [Davies Liu] separate Row and InternalRow (part 2)
      77da5be6
    • Thomas Szymanski's avatar
      [SPARK-8649] [BUILD] Mapr repository is not defined properly · 52d12818
      Thomas Szymanski authored
      The previous committer on this part was pwendell
      
      The previous URL gives a 404; the new one seems to be OK.
      
      This patch is added under the Apache License 2.0.
      
      The JIRA link: https://issues.apache.org/jira/browse/SPARK-8649
      
      Author: Thomas Szymanski <develop@tszymanski.com>
      
      Closes #7054 from tszym/SPARK-8649 and squashes the following commits:
      
      bfda9c4 [Thomas Szymanski] [SPARK-8649] [BUILD] Mapr repository is not defined properly
      52d12818
    • Josh Rosen's avatar
      [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all · f5100451
      Josh Rosen authored
      Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
      
      See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
      
      70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
      f5100451
    • Josh Rosen's avatar
      42db3a1c
  3. Jun 27, 2015
    • Josh Rosen's avatar
      [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with... · 40648c56
      Josh Rosen authored
      [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system
      
      This patch refactors the `python/run-tests` script:
      
      - It's now written in Python instead of Bash.
      - The descriptions of the tests to run are now stored in `dev/run-tests`'s modules.  This allows the pull request builder to skip Python test suites that were not affected by the pull request's changes.  For example, we can now skip the PySpark Streaming test cases when only SQL files are changed.
      - `python/run-tests` now supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482):
      
        ```
      Usage: run-tests [options]
      
      Options:
        -h, --help            show this help message and exit
        --python-executables=PYTHON_EXECUTABLES
                              A comma-separated list of Python executables to test
                              against (default: python2.6,python3.4,pypy)
        --modules=MODULES     A comma-separated list of Python modules to test
                              (default: pyspark-core,pyspark-ml,pyspark-mllib
                              ,pyspark-sql,pyspark-streaming)
         ```
      - `dev/run-tests` has been split into multiple files: the module definitions and test utility functions are now stored inside of a `dev/sparktestsupport` Python module, allowing them to be re-used from the Python test runner script.
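
The usage text above has the shape that Python's `optparse` module produces. A minimal sketch of a parser with the same two flags and defaults follows; it is not the actual `python/run-tests` code, and the `parse_flags` helper is hypothetical.

```python
from optparse import OptionParser


def build_parser():
    parser = OptionParser(prog="run-tests")
    parser.add_option(
        "--python-executables", type="string",
        default="python2.6,python3.4,pypy",
        help="A comma-separated list of Python executables to test against "
             "(default: %default)")
    parser.add_option(
        "--modules", type="string",
        default="pyspark-core,pyspark-ml,pyspark-mllib,pyspark-sql,pyspark-streaming",
        help="A comma-separated list of Python modules to test (default: %default)")
    return parser


def parse_flags(argv):
    # Returns (executables, modules) as lists; argv excludes the program name.
    opts, _ = build_parser().parse_args(argv)
    return opts.python_executables.split(","), opts.modules.split(",")


print(parse_flags(["--modules=pyspark-sql"]))
```

Running a single suite then amounts to passing one module name, as in the call above.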
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6967 from JoshRosen/run-tests-python-modules and squashes the following commits:
      
      f578d6d [Josh Rosen] Fix print for Python 2.x
      8233d61 [Josh Rosen] Add python/run-tests.py to Python lint checks
      34c98d2 [Josh Rosen] Fix universal_newlines for Python 3
      8f65ed0 [Josh Rosen] Fix handling of  module in python/run-tests
      37aff00 [Josh Rosen] Python 3 fix
      27a389f [Josh Rosen] Skip MLLib tests for PyPy
      c364ccf [Josh Rosen] Use which() to convert PYSPARK_PYTHON to an absolute path before shelling out to run tests
      568a3fd [Josh Rosen] Fix hashbang
      3b852ae [Josh Rosen] Fall back to PYSPARK_PYTHON when sys.executable is None (fixes a test)
      f53db55 [Josh Rosen] Remove python2 flag, since the test runner script also works fine under Python 3
      9c80469 [Josh Rosen] Fix passing of PYSPARK_PYTHON
      d33e525 [Josh Rosen] Merge remote-tracking branch 'origin/master' into run-tests-python-modules
      4f8902c [Josh Rosen] Python lint fixes.
      8f3244c [Josh Rosen] Use universal_newlines to fix dev/run-tests doctest failures on Python 3.
      f542ac5 [Josh Rosen] Fix lint check for Python 3
      fff4d09 [Josh Rosen] Add dev/sparktestsupport to pep8 checks
      2efd594 [Josh Rosen] Update dev/run-tests to use new Python test runner flags
      b2ab027 [Josh Rosen] Add command-line options for running individual suites in python/run-tests
      caeb040 [Josh Rosen] Fixes to PySpark test module definitions
      d6a77d3 [Josh Rosen] Fix the tests of dev/run-tests
      def2d8a [Josh Rosen] Two minor fixes
      aec0b8f [Josh Rosen] Actually get the Kafka stuff to run properly
      04015b9 [Josh Rosen] First attempt at getting PySpark Kafka test to work in new runner script
      4c97136 [Josh Rosen] PYTHONPATH fixes
      dcc9c09 [Josh Rosen] Fix time division
      32660fc [Josh Rosen] Initial cut at Python test runner refactoring
      311c6a9 [Josh Rosen] Move shell utility functions to own module.
      1bdeb87 [Josh Rosen] Move module definitions to separate file.
      40648c56
    • Josh Rosen's avatar
      [SPARK-8606] Prevent exceptions in RDD.getPreferredLocations() from crashing DAGScheduler · 0b5abbf5
      Josh Rosen authored
      If `RDD.getPreferredLocations()` throws an exception it may crash the DAGScheduler and SparkContext. This patch addresses this by adding a try-catch block.
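
The idea can be sketched outside Spark as follows. All names are hypothetical, and what the scheduler does after catching the exception is an assumption here: the sketch marks the one job as failed and carries on.

```python
def submit_job(job_id, partitions, get_preferred_locations):
    """Return ("submitted", locations) or ("failed", reason) without raising."""
    try:
        # User-supplied code runs here and may raise anything.
        locations = {p: get_preferred_locations(p) for p in partitions}
    except Exception as exc:
        return ("failed", "job %s: preferred locations lookup raised %r" % (job_id, exc))
    return ("submitted", locations)


def buggy_locations(partition):
    raise RuntimeError("bad locality info")


def healthy_locations(partition):
    return ["host-%d" % partition]


# The failing job does not prevent the next one from being handled.
results = [
    submit_job(1, [0, 1], buggy_locations),
    submit_job(2, [0, 1], healthy_locations),
]
print(results)
```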
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7023 from JoshRosen/SPARK-8606 and squashes the following commits:
      
      770b169 [Josh Rosen] Fix getPreferredLocations() DAGScheduler crash with try block.
      44a9b55 [Josh Rosen] Add test of a buggy getPartitions() method
      19aa9f7 [Josh Rosen] Add (failing) regression test for getPreferredLocations() DAGScheduler crash
      0b5abbf5
    • Sandy Ryza's avatar
      [SPARK-8623] Hadoop RDDs fail to properly serialize configuration · 4153776f
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #7050 from sryza/sandy-spark-8623 and squashes the following commits:
      
      58a8079 [Sandy Ryza] SPARK-8623. Hadoop RDDs fail to properly serialize configuration
      4153776f
    • Neelesh Srinivas Salian's avatar
      [SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN" document · d48e7893
      Neelesh Srinivas Salian authored
      As per the description in the JIRA, I moved the contents of the page and added some additional content.
      
      Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
      
      Closes #6924 from nssalian/SPARK-3629 and squashes the following commits:
      
      944b7a0 [Neelesh Srinivas Salian] Changed the lines about deploy-mode and added backticks to all parameters
      40dbc0b [Neelesh Srinivas Salian] Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line
      9cbc072 [Neelesh Srinivas Salian] Updated a few lines in the Launching Spark on YARN Section
      8e8db7f [Neelesh Srinivas Salian] Removed the changes in this commit to help clearly distinguish movement from update
      151c298 [Neelesh Srinivas Salian] SPARK-3629: Improvement of the Spark on YARN document
      d48e7893
    • Rosstin's avatar
      [SPARK-8639] [DOCS] Fixed Minor Typos in Documentation · b5a6663d
      Rosstin authored
      Ticket: [SPARK-8639](https://issues.apache.org/jira/browse/SPARK-8639)
      
      fixed minor typos in docs/README.md and docs/api.md
      
      Author: Rosstin <asterazul@gmail.com>
      
      Closes #7046 from Rosstin/SPARK-8639 and squashes the following commits:
      
      6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
      b5a6663d
  4. Jun 26, 2015
    • cafreeman's avatar
      [SPARK-8607] SparkR -- jars not being added to application classpath correctly · 9d118177
      cafreeman authored
      Add `getStaticClass` method in SparkR's `RBackendHandler`
      
      This is a fix for the problem referenced in [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185).
      
      cc shivaram
      
      Author: cafreeman <cfreeman@alteryx.com>
      
      Closes #7001 from cafreeman/branch-1.4 and squashes the following commits:
      
      8f81194 [cafreeman] Add missing license
      31aedcf [cafreeman] Refactor test to call an external R script
      2c22073 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
      0bea809 [cafreeman] Fixed relative path issue and added smaller JAR
      ee25e60 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
      9a5c362 [cafreeman] test for including JAR when launching sparkContext
      9101223 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
      5a80844 [cafreeman] Fix style nits
      7c6bd0c [cafreeman] [SPARK-8607] SparkR
      
      (cherry picked from commit 2579948b)
      Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      9d118177
    • cafreeman's avatar
      [SPARK-8662] SparkR Update SparkSQL Test · a56516fc
      cafreeman authored
      Test `infer_type` using a more fine-grained approach rather than comparing environments. Since `all.equal`'s behavior changed in R 3.2, the test could no longer pass.
      
      JIRA here:
      https://issues.apache.org/jira/browse/SPARK-8662
      
      
      
      Author: cafreeman <cfreeman@alteryx.com>
      
      Closes #7045 from cafreeman/R32_Test and squashes the following commits:
      
      b97cc52 [cafreeman] Add `checkStructField` utility
      3381e5c [cafreeman] Update SparkSQL Test
      
      (cherry picked from commit 78b31a2a)
      Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      a56516fc
    • Josh Rosen's avatar
      [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() · 41afa165
      Josh Rosen authored
      This patch addresses a critical issue in the PySpark tests:
      
      Several of our Python modules' `__main__` methods call `doctest.testmod()` in order to run doctests but forget to check and handle its return value. As a result, some PySpark test failures can go unnoticed because they will not fail the build.
      
      Fortunately, there was only one test failure which was masked by this bug: a `pyspark.profiler` doctest was failing due to changes in RDD pipelining.
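
The pitfall is easy to reproduce: `doctest.testmod()` prints failures but signals them only through its return value. The sketch below uses two in-memory demo modules (hypothetical, built only for this example) and a helper that turns the result into an exit code.

```python
import doctest
import types

# Hypothetical module sources used only for this demonstration.
PASSING_SOURCE = '''
def add(a, b):
    """
    >>> add(1, 2)
    3
    """
    return a + b
'''

FAILING_SOURCE = '''
def double(a):
    """
    >>> double(2)
    5
    """
    return a * 2
'''


def load_module(name, source):
    # Build an in-memory module so the example needs no extra files.
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def doctest_exit_code(module):
    # testmod() reports failures through its return value, not by raising.
    results = doctest.testmod(module)
    return 1 if results.failed else 0


print(doctest_exit_code(load_module("passing_demo", PASSING_SOURCE)))
print(doctest_exit_code(load_module("failing_demo", FAILING_SOURCE)))
```

A module's `__main__` block can pass such a value to `sys.exit()` so that a failing doctest fails the build.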
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7032 from JoshRosen/testmod-fix and squashes the following commits:
      
      60dbdc0 [Josh Rosen] Account for int vs. long formatting change in Python 3
      8b8d80a [Josh Rosen] Fix failing test.
      e6423f9 [Josh Rosen] Check return code for all uses of doctest.testmod().
      41afa165
    • Marcelo Vanzin's avatar
      [SPARK-8302] Support heterogeneous cluster install paths on YARN. · 37bf76a2
      Marcelo Vanzin authored
      Some users have Hadoop installations on different paths across
      their cluster. Currently, that makes it hard to set up some
      configuration in Spark since that requires hardcoding paths to
      jar files or native libraries, which wouldn't work on such a cluster.
      
      This change introduces a couple of YARN-specific configurations
      that instruct the backend to replace certain paths when launching
      remote processes. That way, if the configuration says the Spark
      jar is in "/spark/spark.jar", and also says that "/spark" should be
      replaced with "{{SPARK_INSTALL_DIR}}", YARN will start containers
      in the NMs with "{{SPARK_INSTALL_DIR}}/spark.jar" as the location
      of the jar.
      
      Coupled with YARN's environment whitelist (which allows certain
      env variables to be exposed to containers), this allows users to
      support such heterogeneous environments, as long as a single
      replacement is enough. (Otherwise, this feature would need to be
      extended to support multiple path replacements.)
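
The single-replacement behaviour described above can be sketched as a small helper. The function and parameter names are hypothetical and do not correspond to the actual configuration keys; matching only on whole path components is a choice made for this sketch.

```python
def replace_prefix(path, local_prefix, replacement):
    """Swap a leading local_prefix for replacement; leave other paths alone."""
    prefix = local_prefix.rstrip("/")
    # Only match whole path components, so "/sparkling" is not treated
    # as being under "/spark".
    if path == prefix or path.startswith(prefix + "/"):
        return replacement + path[len(prefix):]
    return path


print(replace_prefix("/spark/spark.jar", "/spark", "{{SPARK_INSTALL_DIR}}"))
print(replace_prefix("/opt/native/lib.so", "/spark", "{{SPARK_INSTALL_DIR}}"))
```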
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6752 from vanzin/SPARK-8302 and squashes the following commits:
      
      4bff8d4 [Marcelo Vanzin] Add docs, rename configs.
      0aa2a02 [Marcelo Vanzin] Only do replacement for paths that need it.
      2e9cc9d [Marcelo Vanzin] Style.
      a5e1f68 [Marcelo Vanzin] [SPARK-8302] Support heterogeneous cluster install paths on YARN.
      37bf76a2
    • Holden Karau's avatar
      [SPARK-8613] [ML] [TRIVIAL] add param to disable linear feature scaling · c9e05a31
      Holden Karau authored
      Add a param to disable linear feature scaling (to be implemented later in linear & logistic regression). Done as a separate PR so we can use the same param & avoid conflicts while working on the sub-tasks.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7024 from holdenk/SPARK-8522-Disable-Linear_featureScaling-Spark-8613-Add-param and squashes the following commits:
      
      ce8931a [Holden Karau] Regenerate the sharedParams code
      fa6427e [Holden Karau] update text for standardization param.
      7b24a2b [Holden Karau] generate the new standardization param
      3c190af [Holden Karau] Add the standardization param to sharedparamscodegen
      c9e05a31
    • Josh Rosen's avatar
      [SPARK-8344] Add message processing time metric to DAGScheduler · 9fed6abf
      Josh Rosen authored
      This commit adds a new metric, `messageProcessingTime`, to the DAGScheduler metrics source. This metric tracks the time taken to process messages in the scheduler's event processing loop, which is a helpful debugging aid for diagnosing performance issues in the scheduler (such as SPARK-4961).
      
      In order to do this, I moved the creation of the DAGSchedulerSource metrics source into DAGScheduler itself, similar to how MasterSource is created and registered in Master.
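
The general pattern of timing each message in an event loop can be sketched as follows. Both classes are hypothetical stand-ins, not Spark or metrics-library classes, and the event names are illustrative.

```python
import time


class TimerMetric:
    """Minimal stand-in for a metrics timer (hypothetical)."""

    def __init__(self):
        self.count = 0
        self.total_seconds = 0.0

    def update(self, seconds):
        self.count += 1
        self.total_seconds += seconds


class SketchEventLoop:
    def __init__(self, handler):
        self.handler = handler
        self.message_processing_time = TimerMetric()

    def on_receive(self, event):
        start = time.perf_counter()
        try:
            self.handler(event)
        finally:
            # Record the duration even if the handler raises.
            self.message_processing_time.update(time.perf_counter() - start)


handled = []
loop = SketchEventLoop(handled.append)
for event in ["event-1", "event-2", "event-3"]:
    loop.on_receive(event)
print(loop.message_processing_time.count)
```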
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7002 from JoshRosen/SPARK-8344 and squashes the following commits:
      
      57f914b [Josh Rosen] Fix import ordering
      7d6bb83 [Josh Rosen] Add message processing time metrics to DAGScheduler
      9fed6abf
    • Wenchen Fan's avatar
      [SPARK-8635] [SQL] improve performance of CatalystTypeConverters · 1a79f0eb
      Wenchen Fan authored
      In `CatalystTypeConverters.createToCatalystConverter`, we add special handling for primitive types. We can apply this strategy to more places to improve performance.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7018 from cloud-fan/converter and squashes the following commits:
      
      8b16630 [Wenchen Fan] another fix
      326c82c [Wenchen Fan] optimize type converter
      1a79f0eb
    • Wenchen Fan's avatar
      [SPARK-8620] [SQL] cleanup CodeGenContext · 40360112
      Wenchen Fan authored
      Fix docs, remove nativeTypes, and use the Java type to get the boxed type, default value, etc., to avoid handling `DateType` and `TimestampType` as int and long again and again.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7010 from cloud-fan/cg and squashes the following commits:
      
      aa01cf9 [Wenchen Fan] cleanup CodeGenContext
      40360112
    • Liang-Chi Hsieh's avatar
      [SPARK-8237] [SQL] Add misc function sha2 · 47c874ba
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8237
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6934 from viirya/expr_sha2 and squashes the following commits:
      
      35e0bb3 [Liang-Chi Hsieh] For comments.
      68b5284 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2
      8573aff [Liang-Chi Hsieh] Remove unnecessary Product.
      ee61e06 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2
      59e41aa [Liang-Chi Hsieh] Add misc function: sha2.
      47c874ba
  5. Jun 25, 2015
    • Shivaram Venkataraman's avatar
      [SPARK-8637] [SPARKR] [HOTFIX] Fix packages argument, sparkSubmitBinName · c392a9ef
      Shivaram Venkataraman authored
      cc cafreeman
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #7022 from shivaram/sparkr-init-hotfix and squashes the following commits:
      
      9178d15 [Shivaram Venkataraman] Fix packages argument, sparkSubmitBinName
      c392a9ef
    • Yanbo Liang's avatar
      [MINOR] [MLLIB] rename some functions of PythonMLLibAPI · 2519dcc3
      Yanbo Liang authored
      Keep the same naming conventions for PythonMLLibAPI.
      Only the following three functions are named differently from the others:
      ```scala
      trainNaiveBayes
      trainGaussianMixture
      trainWord2Vec
      ```
      So change them to
      ```scala
      trainNaiveBayesModel
      trainGaussianMixtureModel
      trainWord2VecModel
      ```
      This does not affect any users or public APIs; it only makes the code easier to understand for developers.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #7011 from yanboliang/py-mllib-api-rename and squashes the following commits:
      
      771ffec [Yanbo Liang] rename some functions of PythonMLLibAPI
      2519dcc3
    • Yin Huai's avatar
      [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite. · f9b397f5
      Yin Huai authored
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7009 from yhuai/SPARK-8567 and squashes the following commits:
      
      62fb1f9 [Yin Huai] Add sc.stop().
      b22cf7d [Yin Huai] Add logs.
      f9b397f5
    • Tom Graves's avatar
      [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/ta… · e988adb5
      Tom Graves authored
      …rget versions.
      
      I basically copied the compatibility rules from the top level pom.xml into here.  Someone more familiar with all the options in the top level pom may want to make sure nothing else should be copied on down.
      
      With this change I can build with jdk8 and run with lower versions.  Source shows as compiled for jdk6, as it's supposed to.
      
      Author: Tom Graves <tgraves@yahoo-inc.com>
      Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
      
      Closes #6989 from tgravescs/SPARK-8574 and squashes the following commits:
      
      e1ea2d4 [Thomas Graves] Change to use combine.children="append"
      150d645 [Tom Graves] [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/target versions
      e988adb5
    • Joshi's avatar
      [SPARK-5768] [WEB UI] Fix for incorrect memory in Spark UI · 085a7216
      Joshi authored
      Fix for incorrect memory in Spark UI as per SPARK-5768
      
      Author: Joshi <rekhajoshm@gmail.com>
      Author: Rekha Joshi <rekhajoshm@gmail.com>
      
      Closes #6972 from rekhajoshm/SPARK-5768 and squashes the following commits:
      
      b678a91 [Joshi] Fix for incorrect memory in Spark UI
      2fe53d9 [Joshi] Fix for incorrect memory in Spark UI
      eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI
      0be142d [Rekha Joshi] Merge pull request #3 from apache/master
      106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
      e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
      085a7216
    • Cheng Lian's avatar
      [SPARK-8604] [SQL] HadoopFsRelation subclasses should set their output format class · c337844e
      Cheng Lian authored
      `HadoopFsRelation` subclasses, especially `ParquetRelation2`, should set their own output format class, so that the default output committer can be set up correctly when appending (where we ignore user-defined output committers).
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6998 from liancheng/spark-8604 and squashes the following commits:
      
      9be51d1 [Cheng Lian] Adds more comments
      6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class
      c337844e
    • Matt Massie's avatar
      [SPARK-7884] Move block deserialization from BlockStoreShuffleFetcher to ShuffleReader · 7bac2fe7
      Matt Massie authored
      This commit updates the shuffle read path to give ShuffleReader implementations more control over the deserialization process.
      
      The BlockStoreShuffleFetcher.fetch() method has been renamed to BlockStoreShuffleFetcher.fetchBlockStreams(). Previously, this method returned a record iterator; now, it returns an iterator of (BlockId, InputStream). Deserialization of records is now handled in the ShuffleReader.read() method.
      
      This change creates a cleaner separation of concerns and allows implementations of ShuffleReader more flexibility in how records are retrieved.
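
The split of responsibilities can be sketched in miniature. The real code is Scala; the function names, the block ids, and the use of `pickle` as the record format are all assumptions made for illustration.

```python
import io
import pickle


def fetch_block_streams(blocks):
    """Fetcher side: yield (block_id, stream) pairs without deserializing."""
    for block_id, payload in blocks.items():
        yield block_id, io.BytesIO(payload)


def read(blocks, deserialize_stream):
    """Reader side: decide how each stream is turned into records."""
    for _block_id, stream in fetch_block_streams(blocks):
        # Close each stream as soon as all of its records have been read.
        with stream:
            for record in deserialize_stream(stream):
                yield record


def pickle_records(stream):
    # One possible deserializer: read pickled records until end of stream.
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            return


def serialize(records):
    buffer = io.BytesIO()
    for record in records:
        pickle.dump(record, buffer)
    return buffer.getvalue()


# Hypothetical block ids and contents.
blocks = {
    "block-0": serialize([("a", 1), ("b", 2)]),
    "block-1": serialize([("c", 3)]),
}
records = list(read(blocks, pickle_records))
print(records)
```

Because the fetcher hands back raw streams, a different reader could plug in another deserializer without touching the fetch logic.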
      
      Author: Matt Massie <massie@cs.berkeley.edu>
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6423 from massie/shuffle-api-cleanup and squashes the following commits:
      
      8b0632c [Matt Massie] Minor Scala style fixes
      d0a1b39 [Matt Massie] Merge pull request #1 from kayousterhout/massie_shuffle-api-cleanup
      290f1eb [Kay Ousterhout] Added test for HashShuffleReader.read()
      5186da0 [Kay Ousterhout] Revert "Add test to ensure HashShuffleReader is freeing resources"
      f98a1b9 [Matt Massie] Add test to ensure HashShuffleReader is freeing resources
      a011bfa [Matt Massie] Use PrivateMethodTester on check that delegate stream is closed
      4ea1712 [Matt Massie] Small code cleanup for readability
      7429a98 [Matt Massie] Update tests to check that BufferReleasingStream is closing delegate InputStream
      f458489 [Matt Massie] Remove unnecessary map() on return Iterator
      4abb855 [Matt Massie] Consolidate metric code. Make it clear why InterrubtibleIterator is needed.
      5c30405 [Matt Massie] Return visibility of BlockStoreShuffleFetcher to private[hash]
      7eedd1d [Matt Massie] Small Scala import cleanup
      28f8085 [Matt Massie] Small import nit
      f93841e [Matt Massie] Update shuffle read metrics in ShuffleReader instead of BlockStoreShuffleFetcher.
      7e8e0fe [Matt Massie] Minor Scala style fixes
      01e8721 [Matt Massie] Explicitly cast iterator in branches for type clarity
      7c8f73e [Matt Massie] Close Block InputStream immediately after all records are read
      208b7a5 [Matt Massie] Small code style changes
      b70c945 [Matt Massie] Make BlockStoreShuffleFetcher visible to shuffle package
      19135f2 [Matt Massie] [SPARK-7884] Allow Spark shuffle APIs to be more customizable
      7bac2fe7
  6. Jun 24, 2015