  1. Jul 27, 2014
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
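
One common way to avoid port collisions between parallel builds is to let the operating system hand out a free port. The sketch below illustrates that idea only; it is an assumption about the general technique, not the code this commit added to the `HiveThriftServer2` tests, and `pick_free_port` is a hypothetical helper name.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host`.

    Binding to port 0 makes the kernel choose a free port. The port is
    released when the socket closes, so another process could still grab
    it before the caller reuses it; that race is usually rare enough for
    test suites.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(pick_free_port())
```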
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  2. Jul 25, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail. We should investigate and put this back once it passes tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
    • [SPARK-2682] Javadoc generated from Scala source code is not in javadoc's index · a19d8c89
      Yin Huai authored
      Add genjavadocSettings back to SparkBuild. It requires #1585.
      
      https://issues.apache.org/jira/browse/SPARK-2682
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1584 from yhuai/SPARK-2682 and squashes the following commits:
      
      2e89461 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2682
      54e3b66 [Yin Huai] Add genjavadocSettings back.
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert the changes related to this bug, since it involves more subtle considerations and is worth a separate PR.
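
The symptom described above (launcher options being consumed even when they appear after the application resource) can be illustrated with a small argument splitter. This is a hypothetical sketch, not the logic of `SparkSubmitArguments` and not the fix tracked by SPARK-2678; the option sets and the `split_args` name are assumptions chosen for illustration.

```python
# Hypothetical launcher options, loosely named after common spark-submit flags.
LAUNCHER_FLAGS = {"--help", "--verbose"}
LAUNCHER_OPTIONS_WITH_VALUE = {"--class", "--master"}


def split_args(argv):
    """Split argv into (launcher options, primary resource, application args).

    Launcher parsing stops at the first positional argument, which is
    treated as the primary resource. Everything after it is handed to the
    application untouched, even if it looks like a launcher option.
    """
    launcher = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in LAUNCHER_OPTIONS_WITH_VALUE:
            if i + 1 >= len(argv):
                raise ValueError(f"missing value for {arg}")
            launcher[arg] = argv[i + 1]
            i += 2
        elif arg in LAUNCHER_FLAGS:
            launcher[arg] = True
            i += 1
        elif arg.startswith("--"):
            raise ValueError(f"unknown launcher option {arg}")
        else:
            # First positional argument: the resource. The rest belongs to the app.
            return launcher, arg, argv[i + 1:]
    return launcher, None, []
```

With this splitting rule, the `--help` in the example command would reach `BeeLine` as an application argument instead of being interpreted by the launcher.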
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  3. Jul 23, 2014
    • [SPARK-2549] Functions defined inside of other functions trigger failures · 9b763329
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1510 from ScrapCodes/SPARK-2549/fun-in-fun and squashes the following commits:
      
      9458bc5 [Prashant Sharma] Tested by removing an inner function from excludes.
      bc03b1c [Prashant Sharma] SPARK-2549 Functions defined inside of other functions trigger failures
  4. Jul 22, 2014
  5. Jul 21, 2014
    • [SPARK-2086] Improve output of toDebugString to make shuffle boundaries more clear · c3462c65
      Gregory Owen authored
      Changes RDD.toDebugString() to show hierarchy and shuffle transformations more clearly
      
      New output:
      
      ```
      (3) FlatMappedValuesRDD[325] at apply at Transformer.scala:22
       |  MappedValuesRDD[324] at apply at Transformer.scala:22
       |  CoGroupedRDD[323] at apply at Transformer.scala:22
       +-(5) MappedRDD[320] at apply at Transformer.scala:22
       |  |  MappedRDD[319] at apply at Transformer.scala:22
       |  |  MappedValuesRDD[318] at apply at Transformer.scala:22
       |  |  MapPartitionsRDD[317] at apply at Transformer.scala:22
       |  |  ShuffledRDD[316] at apply at Transformer.scala:22
       |  +-(10) MappedRDD[315] at apply at Transformer.scala:22
       |     |   ParallelCollectionRDD[314] at apply at Transformer.scala:22
       +-(100) MappedRDD[322] at apply at Transformer.scala:22
           |   ParallelCollectionRDD[321] at apply at Transformer.scala:22
      ```
      
      Author: Gregory Owen <greowen@gmail.com>
      
      Closes #1364 from GregOwen/to-debug-string and squashes the following commits:
      
      08f5c78 [Gregory Owen] toDebugString: prettier debug printing to show shuffles and joins more clearly
      1603f7b [Gregory Owen] toDebugString: prettier debug printing to show shuffles and joins more clearly
  6. Jul 18, 2014
    • [MLlib] SPARK-1536: multiclass classification support for decision tree · d88f6be4
      Manish Amde authored
      The ability to perform multiclass classification is a big advantage of decision trees and was a highly requested feature for MLlib. This pull request adds multiclass classification support to the MLlib decision tree. It also adds sample weights support using the WeightedLabeledPoint class for handling unbalanced datasets during classification. It will also support algorithms such as AdaBoost, which require instances to be weighted.
      
      It handles the special case where the categorical variables cannot be ordered for multiclass classification, so the optimizations used for speeding up binary classification cannot be applied directly to multiclass classification with categorical variables. More specifically, for m categories in a categorical feature, it analyses all `2^(m-1) - 1` categorical splits, provided that the number of splits is less than the maxBins value provided in the input. This condition will not be met for features with a large number of categories -- using decision trees is not recommended for such datasets in general, since the categorical features are favored over continuous features. Moreover, the user can use a combination of tricks (increasing the bin size of the tree algorithms, using binary encoding for categorical features, or using a one-vs-all classification strategy) to avoid these constraints.
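
The `2^(m-1) - 1` count can be checked by enumerating the splits directly. The following is a minimal sketch, not the MLlib implementation: `unordered_binary_splits` and `fits_in_bins` are hypothetical names, and the comparison against `max_bins` simply restates the condition described above.

```python
from itertools import combinations


def unordered_binary_splits(categories):
    """Return every way to split `categories` into two non-empty groups.

    `categories` must be a non-empty list of distinct values. The first
    category is pinned to the left group so each split is produced once
    rather than twice with left and right swapped.
    """
    first, rest = categories[0], categories[1:]
    splits = []
    # `size` extra categories join `first`; never all of them, so the
    # right group is always non-empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(categories) - left
            splits.append((left, right))
    return splits


def fits_in_bins(num_categories, max_bins):
    """True if all unordered splits of the feature stay below max_bins."""
    return 2 ** (num_categories - 1) - 1 < max_bins
```

For example, a feature with 4 categories yields 7 candidate splits, while a feature with 20 categories would need more than half a million, which is why features with many categories fall outside this scheme.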
      
      The new code is accompanied by unit tests and has also been tested on the iris and covtype datasets.
      
      cc: mengxr, etrain, hirakendu, atalwalkar, srowen
      
      Author: Manish Amde <manish9ue@gmail.com>
      Author: manishamde <manish9ue@gmail.com>
      Author: Evan Sparks <sparks@cs.berkeley.edu>
      
      Closes #886 from manishamde/multiclass and squashes the following commits:
      
      26f8acc [Manish Amde] another attempt at fixing mima
      c5b2d04 [Manish Amde] more MIMA fixes
      1ce7212 [Manish Amde] change problem filter for mima
      10fdd82 [Manish Amde] fixing MIMA excludes
      e1c970d [Manish Amde] merged master
      abf2901 [Manish Amde] adding classes to MimaExcludes.scala
      45e767a [Manish Amde] adding developer api annotation for overriden methods
      c8428c4 [Manish Amde] fixing weird multiline bug
      afced16 [Manish Amde] removed label weights support
      2d85a48 [Manish Amde] minor: fixed scalastyle issues reprise
      4e85f2c [Manish Amde] minor: fixed scalastyle issues
      b2ae41f [Manish Amde] minor: scalastyle
      e4c1321 [Manish Amde] using while loop for regression histograms
      d75ac32 [Manish Amde] removed WeightedLabeledPoint from this PR
      0fecd38 [Manish Amde] minor: add newline to EOF
      2061cf5 [Manish Amde] merged from master
      06b1690 [Manish Amde] fixed off-by-one error in bin to split conversion
      9cc3e31 [Manish Amde] added implicit conversion import
      5c1b2ca [Manish Amde] doc for PointConverter class
      485eaae [Manish Amde] implicit conversion from LabeledPoint to WeightedLabeledPoint
      3d7f911 [Manish Amde] updated doc
      8e44ab8 [Manish Amde] updated doc
      adc7315 [Manish Amde] support ordered categorical splits for multiclass classification
      e3e8843 [Manish Amde] minor code formatting
      23d4268 [Manish Amde] minor: another minor code style
      34ee7b9 [Manish Amde] minor: code style
      237762d [Manish Amde] renaming functions
      12e6d0a [Manish Amde] minor: removing line in doc
      9a90c93 [Manish Amde] Merge branch 'master' into multiclass
      1892a2c [Manish Amde] tests and use multiclass binaggregate length when atleast one categorical feature is present
      f5f6b83 [Manish Amde] multiclass for continous variables
      8cfd3b6 [Manish Amde] working for categorical multiclass classification
      828ff16 [Manish Amde] added categorical variable test
      bce835f [Manish Amde] code cleanup
      7e5f08c [Manish Amde] minor doc
      1dd2735 [Manish Amde] bin search logic for multiclass
      f16a9bb [Manish Amde] fixing while loop
      d811425 [Manish Amde] multiclass bin aggregate logic
      ab5cb21 [Manish Amde] multiclass logic
      d8e4a11 [Manish Amde] sample weights
      ed5a2df [Manish Amde] fixed classification requirements
      d012be7 [Manish Amde] fixed while loop
      18d2835 [Manish Amde] changing default values for num classes
      6b912dc [Manish Amde] added numclasses to tree runner, predict logic for multiclass, add multiclass option to train
      75f2bfc [Manish Amde] minor code style fix
      e547151 [Manish Amde] minor modifications
      34549d0 [Manish Amde] fixing error during merge
      098e8c5 [Manish Amde] merged master
      e006f9d [Manish Amde] changing variable names
      5c78e1a [Manish Amde] added multiclass support
      6c7af22 [Manish Amde] prepared for multiclass without breaking binary classification
      46e06ee [Manish Amde] minor mods
      3f85a17 [Manish Amde] tests for multiclass classification
      4d5f70c [Manish Amde] added multiclass support for find splits bins
      46f909c [Manish Amde] todo for multiclass support
      455bea9 [Manish Amde] fixed tests
      14aea48 [Manish Amde] changing instance format to weighted labeled point
      a1a6e09 [Manish Amde] added weighted point class
      968ca9d [Manish Amde] merged master
      7fc9545 [Manish Amde] added docs
      ce004a1 [Manish Amde] minor formatting
      b27ad2c [Manish Amde] formatting
      426bb28 [Manish Amde] programming guide blurb
      8053fed [Manish Amde] more formatting
      5eca9e4 [Manish Amde] grammar
      4731cda [Manish Amde] formatting
      5e82202 [Manish Amde] added documentation, fixed off by 1 error in max level calculation
      cbd9f14 [Manish Amde] modified scala.math to math
      dad9652 [Manish Amde] removed unused imports
      e0426ee [Manish Amde] renamed parameter
      718506b [Manish Amde] added unit test
      1517155 [Manish Amde] updated documentation
      9dbdabe [Manish Amde] merge from master
      719d009 [Manish Amde] updating user documentation
      fecf89a [manishamde] Merge pull request #6 from etrain/deep_tree
      0287772 [Evan Sparks] Fixing scalastyle issue.
      2f1e093 [Manish Amde] minor: added doc for maxMemory parameter
      2f6072c [manishamde] Merge pull request #5 from etrain/deep_tree
      abc5a23 [Evan Sparks] Parameterizing max memory.
      50b143a [Manish Amde] adding support for very deep trees
  7. Jul 17, 2014
    • [SPARK-2534] Avoid pulling in the entire RDD in various operators · d988d345
      Reynold Xin authored
      This should go into both master and branch-1.0.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1450 from rxin/agg-closure and squashes the following commits:
      
      e40f363 [Reynold Xin] Mima check excludes.
      9186364 [Reynold Xin] Define the return type more explicitly.
      38e348b [Reynold Xin] Fixed the cases in RDD.scala.
      ea6b34d [Reynold Xin] Blah
      89b9c43 [Reynold Xin] Fix other instances of accidentally pulling in extra stuff in closures.
      73b2783 [Reynold Xin] [SPARK-2534] Avoid pulling in the entire RDD in groupByKey.
  8. Jul 15, 2014
    • [SPARK-2474][SQL] For a registered table in OverrideCatalog, the Analyzer failed to resolve references in the format of "tableName.fieldName" · 8af46d58
      Yin Huai authored
      
      Please refer to JIRA (https://issues.apache.org/jira/browse/SPARK-2474) for how to reproduce the problem and my understanding of the root cause.
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1406 from yhuai/SPARK-2474 and squashes the following commits:
      
      96b1627 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2474
      af36d65 [Yin Huai] Fix comment.
      be86ba9 [Yin Huai] Correct SQL console settings.
      c43ad00 [Yin Huai] Wrap the relation in a Subquery named by the table name in OverrideCatalog.lookupRelation.
      a5c2145 [Yin Huai] Support sql/console.
    • [SPARK-2467] Revert SparkBuild to publish-local to both .m2 and .ivy2. · e2255e4b
      Takuya UESHIN authored
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1398 from ueshin/issues/SPARK-2467 and squashes the following commits:
      
      7f01d58 [Takuya UESHIN] Revert SparkBuild to publish-local to both .m2 and .ivy2.
  9. Jul 12, 2014
    • [SPARK-1969][MLlib] Online summarizer APIs for mean, variance, min, and max · 55960869
      DB Tsai authored
      This moves the private ColumnStatisticsAggregator class out of RowMatrix into a publicly available DeveloperApi, with documentation and unit tests.
      
      Changes:
      1) Moved the private implementation from org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to org.apache.spark.mllib.stat.MultivariateOnlineSummarizer.
      2) When creating an OnlineSummarizer object, the number of columns is not needed in the constructor. It is determined when users add the first sample.
      3) Added the API documentation for MultivariateOnlineSummarizer.
      4) Added the unit tests for MultivariateOnlineSummarizer.
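
To illustrate what an online summarizer of this kind does, here is a minimal sketch that tracks per-column mean, variance, min, and max in one pass and infers the number of columns from the first sample. It uses Welford's update as a stand-in technique; the class name, method names, and update rule are assumptions and do not mirror the `MultivariateOnlineSummarizer` API.

```python
class OnlineColumnSummarizer:
    """One-pass per-column statistics; the dimension is set by the first sample."""

    def __init__(self):
        self.count = 0
        self.mean = None
        self._m2 = None   # running sum of squared deviations per column
        self.min = None
        self.max = None

    def add(self, sample):
        if self.count == 0:
            # The number of columns is determined by the first sample.
            dim = len(sample)
            self.mean = [0.0] * dim
            self._m2 = [0.0] * dim
            self.min = list(sample)
            self.max = list(sample)
        elif len(sample) != len(self.mean):
            raise ValueError("sample has a different number of columns")
        self.count += 1
        for j, x in enumerate(sample):
            delta = x - self.mean[j]
            self.mean[j] += delta / self.count
            self._m2[j] += delta * (x - self.mean[j])
            self.min[j] = min(self.min[j], x)
            self.max[j] = max(self.max[j], x)
        return self

    def variance(self):
        """Unbiased sample variance per column (0.0 with fewer than two samples)."""
        if self.count == 0:
            raise ValueError("no samples added")
        if self.count < 2:
            return [0.0] * len(self.mean)
        return [m2 / (self.count - 1) for m2 in self._m2]
```

Because each sample is folded into running totals, the data never has to be held in memory at once.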
      
      Author: DB Tsai <dbtsai@dbtsai.com>
      
      Closes #955 from dbtsai/dbtsai-summarizer and squashes the following commits:
      
      b13ac90 [DB Tsai] dbtsai-summarizer
  10. Jul 11, 2014
    • [SPARK-2437] Rename MAVEN_PROFILES to SBT_MAVEN_PROFILES and add SBT_MAVEN_PROPERTIES · b23e9c3e
      Prashant Sharma authored
      NOTE: It is not possible to use both the `SBT_MAVEN_PROFILES` environment variable and the `-P` flag at the same time. If specified, `-P` takes precedence.
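
The precedence rule in the note can be sketched as follows. This is an illustration only, not the build's actual code: `resolve_profiles` is a hypothetical helper, and accepting both commas and whitespace as separators in the environment variable is an assumption.

```python
def resolve_profiles(cli_args, env):
    """Return the active Maven profiles for a build invocation.

    Profiles given with -P on the command line win outright; the
    SBT_MAVEN_PROFILES environment variable is consulted only when no
    -P flag is present. The two sources are never merged.
    """
    cli_profiles = [arg[2:] for arg in cli_args if arg.startswith("-P") and len(arg) > 2]
    if cli_profiles:
        return cli_profiles
    raw = env.get("SBT_MAVEN_PROFILES", "")
    # Assumption: profiles may be separated by commas and/or whitespace.
    return raw.replace(",", " ").split()
```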
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1374 from ScrapCodes/SPARK-2437/rename-MAVEN_PROFILES and squashes the following commits:
      
      8694bde [Prashant Sharma] [SPARK-2437] Rename MAVEN_PROFILES to SBT_MAVEN_PROFILES and add SBT_MAVEN_PROPERTIES
  11. Jul 10, 2014
    • [SPARK-1478].3: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 · 40a8fef4
      tmalaska authored
      This is a modified version of this PR https://github.com/apache/spark/pull/1168 done by @tmalaska
      Adds MIMA binary check exclusions.
      
      Author: tmalaska <ted.malaska@cloudera.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1347 from tdas/FLUME-1915 and squashes the following commits:
      
      96065df [Tathagata Das] Added Mima exclusion for FlumeReceiver.
      41d5338 [tmalaska] Address line 57 that was too long
      12617e5 [tmalaska] SPARK-1478: Upgrade FlumeInputDStream's Flume...
    • [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
      This patch introduces the new way of working while retaining the existing ways of doing things.
      
      For example, the build instruction for YARN in Maven is
      `mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
      in sbt it can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      Also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
      72651ca [Prashant Sharma] Addresses code reivew comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
      65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
  12. Jul 08, 2014
    • [SPARK-2413] Upgrade junit_xml_listener to 0.5.1 · ac9cdc11
      DB Tsai authored
      which fixes the following issues
      
      1) Fix the class name to be the fully qualified class name.
      2) Make sure the reporting time is in seconds, not milliseconds; the wrong unit caused the JUnit HTML report to show incorrect numbers.
      3) Make sure the durations of the tests are accumulated.
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #1333 from dbtsai/dbtsai-junit and squashes the following commits:
      
      bbeac4b [DB Tsai] Upgrade junit_xml_listener to 0.5.1 which fixes the following issues
  13. Jul 01, 2014
  14. Jun 23, 2014
    • [SPARK-1768] History server enhancements. · 21ddd7d1
      Marcelo Vanzin authored
      Two improvements to the history server:
      
      - Separate the HTTP handling from history fetching, so that it's easy to add
        new backends later (thinking about SPARK-1537 in the long run)
      
      - Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
        memory for faster access. This allows the app limit to go away, since holding
        just the listing in memory shouldn't be too expensive unless the user has millions
        of completed apps in the history (at which point I'd expect other issues to arise
        aside from history server memory usage, such as FileSystem.listStatus()
        starting to become ridiculously expensive).
      
      I also fixed a few minor things along the way which aren't really worth mentioning.
      I also removed the app's log path from the UI since that information may not even
      exist depending on which backend is used (even though there is only one now).
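
The lazy-loading idea described above can be sketched as a small bounded cache that loads a UI only on first access and keeps a few recent ones in memory. This is an illustrative sketch, not the history server's code: `LazyUICache`, the `loader` callback, and the least-recently-used eviction policy are all assumptions.

```python
from collections import OrderedDict


class LazyUICache:
    """Load application UIs on demand and keep only the most recently used."""

    def __init__(self, loader, max_entries):
        self._loader = loader            # hypothetical callback: app_id -> UI object
        self._max_entries = max_entries
        self._cache = OrderedDict()

    def get(self, app_id):
        if app_id in self._cache:
            # Cache hit: mark as most recently used.
            self._cache.move_to_end(app_id)
            return self._cache[app_id]
        ui = self._loader(app_id)        # cache miss: load lazily
        self._cache[app_id] = ui
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)   # evict the least recently used UI
        return ui
```

Only the lightweight application listing needs to stay resident; the expensive UI objects are rebuilt when an evicted application is requested again.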
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #718 from vanzin/hist-server and squashes the following commits:
      
      53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording.
      c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs.
      dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration.
      4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo.
      2a7f68d [Marcelo Vanzin] Address review feedback.
      4e72c77 [Marcelo Vanzin] Remove comment about ordering.
      249bcea [Marcelo Vanzin] Remove offset / count from provider interface.
      ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps.
      6e2432f [Marcelo Vanzin] Second round of feedback.
      b2c570a [Marcelo Vanzin] Make class package-private.
      4406f61 [Marcelo Vanzin] Cosmetic change to listing header.
      e852149 [Marcelo Vanzin] Initialize new app array to expected size.
      e8026f4 [Marcelo Vanzin] Review feedback.
      49d2fd3 [Marcelo Vanzin] Fix a comment.
      91e96ca [Marcelo Vanzin] Fix scalastyle issues.
      6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info.
      eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down.
      bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI.
      b284478 [Marcelo Vanzin] Separate history server from history backend.
  15. Jun 21, 2014
  16. Jun 20, 2014
    • Fix some tests. · 648553d4
      Marcelo Vanzin authored
      - JavaAPISuite was trying to compare a bare path with a URI. Fix by
        extracting the path from the URI, since we know it should be a
        local path anyway.
      
      - b9be1609 excluded the ASM dependency everywhere, but easymock needs
        it (because cglib needs it). So re-add the dependency, with test
        scope this time.
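
The first fix above (comparing a bare path with a URI) comes down to normalizing both sides to a plain path before comparing. A minimal sketch of that idea, using a hypothetical `local_path` helper rather than the code in JavaAPISuite:

```python
from urllib.parse import unquote, urlparse


def local_path(path_or_uri):
    """Return the filesystem path for a file: URI, or the input unchanged.

    Only the `file` scheme is handled; anything else (including bare
    paths) is returned as-is.
    """
    parsed = urlparse(path_or_uri)
    if parsed.scheme == "file":
        return unquote(parsed.path)
    return path_or_uri
```

Comparing `local_path(a) == local_path(b)` then gives the same answer whether either side is written as `/tmp/x.txt`, `file:/tmp/x.txt`, or `file:///tmp/x.txt`.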
      
      The second one above actually uncovered a weird situation: the maven
      test target works, even though I can't find the class sbt complains
      about in its classpath. sbt complains with:
      
        [error] Uncaught exception when running org.apache.spark.util
        .random.RandomSamplerSuite: java.lang.NoClassDefFoundError:
        org/objectweb/asm/Type
      
      To avoid more weirdness caused by that, I explicitly added the asm
      dependency to both maven and sbt (for tests only), and verified
      the classes don't end up in the final assembly.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #917 from vanzin/flaky-tests and squashes the following commits:
      
      d022320 [Marcelo Vanzin] Fix some tests.
  17. Jun 17, 2014
    • [SPARK-2060][SQL] Querying JSON Datasets with SQL and DSL in Spark SQL · d2f4f30b
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-2060
      
      Programming guide: http://yhuai.github.io/site/sql-programming-guide.html
      
      Scala doc of SQLContext: http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.sql.SQLContext
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #999 from yhuai/newJson and squashes the following commits:
      
      227e89e [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      ce8eedd [Yin Huai] rxin's comments.
      bc9ac51 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      94ffdaa [Yin Huai] Remove "get" from method names.
      ce31c81 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      e2773a6 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      79ea9ba [Yin Huai] Fix typos.
      5428451 [Yin Huai] Newline
      1f908ce [Yin Huai] Remove extra line.
      d7a005c [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      7ea750e [Yin Huai] marmbrus's comments.
      6a5f5ef [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      83013fb [Yin Huai] Update Java Example.
      e7a6c19 [Yin Huai] SchemaRDD.javaToPython should convert a field with the StructType to a Map.
      6d20b85 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      4fbddf0 [Yin Huai] Programming guide.
      9df8c5a [Yin Huai] Python API.
      7027634 [Yin Huai] Java API.
      cff84cc [Yin Huai] Use a SchemaRDD for a JSON dataset.
      d0bd412 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      ab810b0 [Yin Huai] Make JsonRDD private.
      6df0891 [Yin Huai] Apache header.
      8347f2e [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      66f9e76 [Yin Huai] Update docs and use the entire dataset to infer the schema.
      8ffed79 [Yin Huai] Update the example.
      a5a4b52 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      4325475 [Yin Huai] If a sampled dataset is used for schema inferring, update the schema of the JsonTable after first execution.
      65b87f0 [Yin Huai] Fix sampling...
      8846af5 [Yin Huai] API doc.
      52a2275 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      0387523 [Yin Huai] Address PR comments.
      666b957 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      a2313a6 [Yin Huai] Address PR comments.
      f3ce176 [Yin Huai] After type conflict resolution, if a NullType is found, StringType is used.
      0576406 [Yin Huai] Add Apache license header.
      af91b23 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson
      f45583b [Yin Huai] Infer the schema of a JSON dataset (a text file with one JSON object per line or a RDD[String] with one JSON object per string) and returns a SchemaRDD.
      f31065f [Yin Huai] A query plan or a SchemaRDD can print out its schema.
  18. Jun 12, 2014
    • [Minor] Fix style, formatting and naming in BlockManager etc. · 44daec5a
      Andrew Or authored
      This is a precursor to a bigger change. I wanted to separate out the relatively insignificant changes so the ultimate PR is not inflated.
      
      (Warning: this PR is full of unimportant nitpicks)
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1058 from andrewor14/bm-minor and squashes the following commits:
      
      8e12eaf [Andrew Or] SparkException -> BlockException
      c36fd53 [Andrew Or] Make parts of BlockManager more readable
      0a5f378 [Andrew Or] Entry -> MemoryEntry
      e9762a5 [Andrew Or] Tone down string interpolation (minor reverts)
      c4de9ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into bm-minor
      b3470f1 [Andrew Or] More string interpolation (minor)
      7f9dcab [Andrew Or] Use string interpolation (minor)
      94a425b [Andrew Or] Refactor against duplicate code + minor changes
      8a6a7dc [Andrew Or] Exception -> SparkException
      97c410f [Andrew Or] Deal with MIMA excludes
      2480f1d [Andrew Or] Fixes in StorgeLevel.scala
      abb0163 [Andrew Or] Style, formatting and naming fixes
    • SPARK-1939 Refactor takeSample method in RDD to use ScaSRS · 1de1d703
      Doris Xin authored
      Modified the takeSample method in RDD to use the ScaSRS sampling technique to improve performance. Added a private method that computes sampling rate > sample_size/total to ensure sufficient sample size with success rate >= 0.9999. Added a unit test for the private method to validate choice of sampling rate.
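
A sampling rate slightly above `sample_size/total` with a failure probability of at most `1 - 0.9999` can be derived from a Chernoff lower-tail bound on the size of a Bernoulli sample. The sketch below shows that derivation for sampling without replacement; the function name, the `1e-10` floor, and the assumption that this matches the PR's private method are mine and are not confirmed by the commit message.

```python
import math


def sampling_fraction(sample_size, total, delta=1e-4):
    """Bernoulli sampling rate that yields at least `sample_size` items
    out of `total` with probability >= 1 - delta (without replacement).

    Solving the Chernoff bound exp(-(n*p - s)^2 / (2*n*p)) = delta for p
    gives p = f + gamma + sqrt(gamma^2 + 2*gamma*f), where f = s/n and
    gamma = -ln(delta)/n.
    """
    if total <= 0 or sample_size < 0:
        raise ValueError("total must be positive and sample_size non-negative")
    fraction = sample_size / total
    gamma = -math.log(delta) / total
    rate = fraction + gamma + math.sqrt(gamma * gamma + 2 * gamma * fraction)
    # Clamp to a valid probability; the small floor avoids a rate of exactly zero.
    return min(1.0, max(1e-10, rate))
```

For 100 items out of 10,000 this gives a rate of roughly 1.5% instead of the naive 1%, so the oversampled result can be trimmed down to the requested size.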
      
      Author: Doris Xin <doris.s.xin@gmail.com>
      Author: dorx <doris.s.xin@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #916 from dorx/takeSample and squashes the following commits:
      
      5b061ae [Doris Xin] merge master
      444e750 [Doris Xin] edge cases
      3de882b [dorx] Merge pull request #2 from mengxr/SPARK-1939
      82dde31 [Xiangrui Meng] update pyspark's takeSample
      48d954d [Doris Xin] remove unused imports from RDDSuite
      fb1452f [Doris Xin] allowing num to be greater than count in all cases
      1481b01 [Doris Xin] washing test tubes and making coffee
      dc699f3 [Doris Xin] give back imports removed by accident in rdd.py
      64e445b [Doris Xin] logwarnning as soon as it enters the while loop
      55518ed [Doris Xin] added TODO for logging in rdd.py
      eff89e2 [Doris Xin] addressed reviewer comments.
      ecab508 [Doris Xin] fixed checkstyle violation
      0a9b3e3 [Doris Xin] "reviewer comment addressed"
      f80f270 [Doris Xin] Merge branch 'master' into takeSample
      ae3ad04 [Doris Xin] fixed edge cases to prevent overflow
      065ebcd [Doris Xin] Merge branch 'master' into takeSample
      9bdd36e [Doris Xin] Check sample size and move computeFraction
      e3fd6a6 [Doris Xin] Merge branch 'master' into takeSample
      7cab53a [Doris Xin] fixed import bug in rdd.py
      ffea61a [Doris Xin] SPARK-1939: Refactor takeSample method in RDD
      1441977 [Doris Xin] SPARK-1939 Refactor takeSample method in RDD to use ScaSRS
      1de1d703
    • Patrick Wendell's avatar
      SPARK-1843: Replace assemble-deps with env variable. · 1c04652c
      Patrick Wendell authored
      (This change is actually small; I moved some logic into
      compute-classpath that was previously in spark-class.)
      
      Assemble deps has existed for a while to allow developers to
      run local code with new changes quickly. When I'm developing I
      typically use a simpler approach which just prepends the Spark
      classes to the classpath before the assembly jar. This is well
      defined in the JVM and the Spark classes take precedence over those
      in the assembly.
      
      This approach is portable across both builds which is the main reason I'd
      like to switch to it. It's also a bit easier to toggle on and off quickly.
      
      The way you use this is the following:
      ```
      $ ./bin/spark-shell # Use spark with the normal assembly
      $ export SPARK_PREPEND_CLASSES=true
      $ ./bin/spark-shell # Now it's using compiled classes
      $ unset SPARK_PREPEND_CLASSES
      $ ./bin/spark-shell # Back to normal
      ```
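
The precedence rule behind this toggle can be sketched as below. This is an illustration only: `build_classpath` and its arguments are hypothetical, and treating any non-empty `SPARK_PREPEND_CLASSES` value as "enabled" is an assumption rather than something the commit message states.

```python
import os


def build_classpath(assembly_jar, class_dirs, env=None):
    """Join classpath entries, putting compiled class dirs first if enabled.

    The JVM searches entries left to right, so classes found in an earlier
    directory shadow same-named classes inside the assembly jar.
    """
    env = os.environ if env is None else env
    entries = []
    # Assumption: any non-empty value switches the behaviour on.
    if env.get("SPARK_PREPEND_CLASSES"):
        entries.extend(class_dirs)
    entries.append(assembly_jar)
    return os.pathsep.join(entries)
```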
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #877 from pwendell/assemble-deps and squashes the following commits:
      
      8a11345 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into assemble-deps
      faa3168 [Patrick Wendell] Adding a warning for compatibility
      3f151a7 [Patrick Wendell] Small fix
      bbfb73c [Patrick Wendell] Review feedback
      328e9f8 [Patrick Wendell] SPARK-1843: Replace assemble-deps with env variable.
      1c04652c
    • Sandy Ryza's avatar
      SPARK-554. Add aggregateByKey. · ce92a9c1
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #705 from sryza/sandy-spark-554 and squashes the following commits:
      
      2302b8f [Sandy Ryza] Add MIMA exclude
      f52e0ad [Sandy Ryza] Fix Python tests for real
      2f3afa3 [Sandy Ryza] Fix Python test
      0b735e9 [Sandy Ryza] Fix line lengths
      ae56746 [Sandy Ryza] Fix doc (replace T with V)
      c2be415 [Sandy Ryza] Java and Python aggregateByKey
      23bf400 [Sandy Ryza] SPARK-554.  Add aggregateByKey.
      ce92a9c1
  19. Jun 11, 2014
    • Tor Myklebust's avatar
      [SPARK-1672][MLLIB] Separate user and product partitioning in ALS · d9203350
      Tor Myklebust authored
      Some clean up work following #593.
      
      1. Allow setting different numbers of user blocks and product blocks in `ALS`.
      2. Update `MovieLensALS` to reflect the change.
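
As a rough sketch of what separate block counts mean, the snippet below buckets the same ratings once by user and once by product, each with its own number of blocks. The function names and the plain modulo bucketing are assumptions made for illustration; they are not the partitioning code from this PR.

```python
def block_of(entity_id, num_blocks):
    """Map an id to a block index in [0, num_blocks)."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    # Python's % is non-negative for a positive divisor, so negative ids work.
    return entity_id % num_blocks


def partition_ratings(ratings, num_user_blocks, num_product_blocks):
    """Group (user, product, rating) triples by user block and product block."""
    by_user, by_product = {}, {}
    for user, product, rating in ratings:
        triple = (user, product, rating)
        by_user.setdefault(block_of(user, num_user_blocks), []).append(triple)
        by_product.setdefault(block_of(product, num_product_blocks), []).append(triple)
    return by_user, by_product
```

With independent counts, a data set with many users and few products can use many user blocks without creating nearly empty product blocks.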
      
      Author: Tor Myklebust <tmyklebu@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1014 from mengxr/SPARK-1672 and squashes the following commits:
      
      0e910dd [Xiangrui Meng] change private[this] to private[recommendation]
      36420c7 [Xiangrui Meng] set exclusion rules for ALS
      9128b77 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-1672
      294efe9 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-1672
      9bab77b [Xiangrui Meng] clean up add numUserBlocks and numProductBlocks to MovieLensALS
      84c8e8c [Xiangrui Meng] Merge branch 'master' into SPARK-1672
      d17a8bf [Xiangrui Meng] merge master
      a4925fd [Tor Myklebust] Style.
      bd8a75c [Tor Myklebust] Merge branch 'master' of github.com:apache/spark into alsseppar
      021f54b [Tor Myklebust] Separate user and product blocks.
      dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread that needle everywhere it needs to go.
      23d6f91 [Tor Myklebust] Stop making the partitioner configurable.
      495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
      674933a [Tor Myklebust] Fix style.
      40edc23 [Tor Myklebust] Fix missing space.
      f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach.
      5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'.
      36a0f43 [Tor Myklebust] Make the partitioner private.
      d872b09 [Tor Myklebust] Add negative id ALS test.
      df27697 [Tor Myklebust] Support custom partitioners.  Currently we use the same partitioner for users and products.
      c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing.
      c774d7d [Tor Myklebust] Make the partitioner a member variable and use it instead of modding directly.
      d9203350
    • Prashant Sharma's avatar
      [SPARK-2069] MIMA false positives · 5b754b45
      Prashant Sharma authored
      Fixes SPARK-2070 and SPARK-2071.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1021 from ScrapCodes/SPARK-2070/package-private-methods and squashes the following commits:
      
      7979a57 [Prashant Sharma] addressed code review comments
      558546d [Prashant Sharma] A little fancy error message.
      59275ab [Prashant Sharma] SPARK-2071 Mima ignores classes and its members from previous versions too.
      0c4ff2b [Prashant Sharma] SPARK-2070 Ignore methods along with annotated classes.
      5b754b45
  20. Jun 06, 2014
    • witgo's avatar
      [SPARK-1841]: update scalatest to version 2.1.5 · 41c4a331
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #713 from witgo/scalatest and squashes the following commits:
      
      b627a6a [witgo] merge master
      51fb3d6 [witgo] merge master
      3771474 [witgo] fix RDDSuite
      996d6f9 [witgo] fix TimeStampedWeakValueHashMap test
      9dfa4e7 [witgo] merge bug
      1479b22 [witgo] merge master
      29b9194 [witgo] fix code style
      022a7a2 [witgo] fix test dependency
      a52c0fa [witgo] fix test dependency
      cd8f59d [witgo] Merge branch 'master' of https://github.com/apache/spark into scalatest
      046540d [witgo] fix RDDSuite.scala
      2c543b9 [witgo] fix ReplSuite.scala
      c458928 [witgo] update scalatest to version 2.1.5
      41c4a331
  21. Jun 05, 2014
    • Marcelo Vanzin's avatar
      Remove compile-scoped junit dependency. · 668cb1de
      Marcelo Vanzin authored
      This avoids having junit classes showing up in the assembly jar.
      I verified that only test classes in the jtransforms package
      use junit.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #794 from vanzin/junit-dep-exclusion and squashes the following commits:
      
      274e1c2 [Marcelo Vanzin] Remove junit from assembly in sbt build also.
      ad950be [Marcelo Vanzin] Remove compile-scoped junit dependency.
      668cb1de
    • Kalpit Shah's avatar
      sbt 0.13.X should be using sbt-assembly 0.11.X · 5473aa7c
      Kalpit Shah authored
      https://github.com/sbt/sbt-assembly/blob/master/README.md
      
      Author: Kalpit Shah <shahkalpit84@gmail.com>
      
      Closes #555 from kalpit/upgrade/sbtassembly and squashes the following commits:
      
      1fa7324 [Kalpit Shah] sbt 0.13.X should be using sbt-assembly 0.11.X
      5473aa7c
  22. Jun 04, 2014
    • Kan Zhang's avatar
      [SPARK-1817] RDD.zip() should verify partition sizes for each partition · c402a4a6
      Kan Zhang authored
      RDD.zip() will throw an exception if it finds partition sizes are not the same.
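
The per-partition check can be pictured with the sketch below, which zips two iterators and fails as soon as one runs out before the other. It is a plain-Python illustration with a hypothetical name (`zip_partitions_strict`), not the Spark implementation.

```python
def zip_partitions_strict(left, right):
    """Yield pairs from two iterables, raising if their lengths differ."""
    left_it, right_it = iter(left), iter(right)
    done = object()  # unique marker meaning "this iterator is exhausted"
    while True:
        a = next(left_it, done)
        b = next(right_it, done)
        if a is done and b is done:
            return
        if a is done or b is done:
            raise ValueError(
                "Can only zip partitions with the same number of elements"
            )
        yield a, b
```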
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #944 from kanzhang/SPARK-1817 and squashes the following commits:
      
      c073848 [Kan Zhang] [SPARK-1817] Cosmetic updates
      524c670 [Kan Zhang] [SPARK-1817] RDD.zip() should verify partition sizes for each partition
      c402a4a6
  23. Jun 03, 2014
    • Reynold Xin's avatar
      SPARK-1941: Update streamlib to 2.7.0 and use HyperLogLogPlus instead of HyperLogLog. · 1faef149
      Reynold Xin authored
      I also corrected some errors in the previous HLL approximate-count API, including the fact that relativeSD wasn't really a measure of error (even though we used it to test error bounds in test results).
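
For context, the usual HyperLogLog rule of thumb ties the relative standard deviation to the number of registers m = 2^p as roughly 1.04 / sqrt(m). The sketch below turns a target relative SD into a precision p under that assumption; the function names are hypothetical and this is not the code from the PR.

```python
import math


def precision_for(relative_sd):
    """Smallest p such that 1.04 / sqrt(2**p) <= relative_sd."""
    if not 0 < relative_sd < 1:
        raise ValueError("relative_sd must be in (0, 1)")
    return math.ceil(2 * math.log2(1.04 / relative_sd))


def expected_relative_sd(p):
    """Approximate relative standard deviation for 2**p registers."""
    return 1.04 / math.sqrt(2 ** p)
```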
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #897 from rxin/hll and squashes the following commits:
      
      4d83f41 [Reynold Xin] New error bound and non-randomness.
      f154ea0 [Reynold Xin] Added a comment on the value bound for testing.
      e367527 [Reynold Xin] One more round of code review.
      41e649a [Reynold Xin] Update final mima list.
      9e320c8 [Reynold Xin] Incorporate code review feedback.
      e110d70 [Reynold Xin] Merge branch 'master' into hll
      354deb8 [Reynold Xin] Added comment on the Mima exclude rules.
      acaa524 [Reynold Xin] Added the right exclude rules in MimaExcludes.
      6555bfe [Reynold Xin] Added a default method and re-arranged MimaExcludes.
      1db1522 [Reynold Xin] Excluded util.SerializableHyperLogLog from MIMA check.
      9221b27 [Reynold Xin] Merge branch 'master' into hll
      88cfe77 [Reynold Xin] Updated documentation and restored the old incorrect API to maintain API compatibility.
      1294be6 [Reynold Xin] Updated HLL+.
      e7786cb [Reynold Xin] Merge branch 'master' into hll
      c0ef0c2 [Reynold Xin] SPARK-1941: Update streamlib to 2.7.0 and use HyperLogLogPlus instead of HyperLogLog.
      1faef149
    • Joseph E. Gonzalez's avatar
      Synthetic GraphX Benchmark · 894ecde0
      Joseph E. Gonzalez authored
      This PR accomplishes two things:
      
      1. It introduces a Synthetic Benchmark application that generates an arbitrarily large log-normal graph and executes either PageRank or connected components on the graph. This can be used to profile the GraphX system on arbitrary clusters without access to large graph datasets.
      
      2. This PR improves the implementation of the log-normal graph generator.
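
A log-normal graph here means one whose vertex out-degrees follow a log-normal distribution. The sketch below shows the idea in plain Python; the function name, the default `mu` and `sigma`, and the choice of uniformly random targets are all assumptions for illustration and do not describe the GraphX generator.

```python
import random


def log_normal_graph(num_vertices, mu=1.0, sigma=0.5, seed=42):
    """Return a list of (src, dst) edges with log-normal out-degrees."""
    rng = random.Random(seed)  # seeded so repeated runs give the same graph
    edges = []
    for src in range(num_vertices):
        # Draw the out-degree, then cap it at the number of vertices.
        degree = min(num_vertices, int(round(rng.lognormvariate(mu, sigma))))
        # Pick distinct targets uniformly at random (self-loops are allowed).
        targets = rng.sample(range(num_vertices), degree)
        edges.extend((src, dst) for dst in targets)
    return edges
```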
      
      Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #720 from jegonzal/graphx_synth_benchmark and squashes the following commits:
      
      e40812a [Ankur Dave] Exclude all of GraphX from compatibility checks vs. 1.0.0
      bccccad [Ankur Dave] Fix long lines
      374678a [Ankur Dave] Bugfix and style changes
      1bdf39a [Joseph E. Gonzalez] updating options
      d943972 [Joseph E. Gonzalez] moving the benchmark application into the examples folder.
      f4f839a [Joseph E. Gonzalez] Creating a synthetic benchmark script.
      894ecde0
    • tzolov's avatar
      Add support for Pivotal HD in the Maven build: SPARK-1992 · b1f28535
      tzolov authored
      Allow Spark to build against particular Pivotal HD distributions. For example, to build Spark against Pivotal HD 2.0.1, one can run:
      ```
      mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0-gphd-3.0.1.0 -DskipTests clean package
      ```
      
      Author: tzolov <christian.tzolov@gmail.com>
      
      Closes #942 from tzolov/master and squashes the following commits:
      
      bc3e05a [tzolov] Add support for Pivotal HD in the Maven build and SBT build: [SPARK-1992]
      b1f28535
  24. Jun 01, 2014
    • Patrick Wendell's avatar
      Better explanation for how to use MIMA excludes. · d17d2214
      Patrick Wendell authored
      This patch does a few things:
      1. We have a file MimaExcludes.scala exclusively for excludes.
      2. The test runner tells users about that file if a test fails.
      3. I've added back the excludes used from 0.9->1.0. We should keep
         these in the project as an official audit trail of times where
         we decided to make exceptions.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #937 from pwendell/mima and squashes the following commits:
      
      7ee0db2 [Patrick Wendell] Better explanation for how to use MIMA excludes.
      d17d2214
  25. May 31, 2014
    • Michael Armbrust's avatar
      Optionally include Hive as a dependency of the REPL. · 7463cd24
      Michael Armbrust authored
      Due to the way spark-shell launches from an assembly jar, I don't think this change will affect anyone who isn't trying to launch the shell directly from sbt.  That said, it is kinda nice to be able to launch all things directly from SBT when developing.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #801 from marmbrus/hiveRepl and squashes the following commits:
      
      9570571 [Michael Armbrust] Optionally include Hive as a dependency of the REPL.
      7463cd24
  26. May 30, 2014
    • Prashant Sharma's avatar
      [SPARK-1971] Update MIMA to compare against Spark 1.0.0 · 79fa8fd4
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #910 from ScrapCodes/enable-mima/spark-core and squashes the following commits:
      
      79f3687 [Prashant Sharma] updated Mima to check against version 1.0
      1e8969c [Prashant Sharma] Spark core missed out on Mima settings. So in effect we never tested spark core for mima related errors.
      79fa8fd4
  27. May 29, 2014
  28. May 19, 2014
  29. May 15, 2014
    • witgo's avatar
      fix different versions of commons-lang dependency and apache/spark#746 addendum · bae07e36
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #754 from witgo/commons-lang and squashes the following commits:
      
      3ebab31 [witgo] merge master
      f3b8fa2 [witgo] merge master
      2083fae [witgo] repeat definition
      5599cdb [witgo] multiple version of sbt  dependency
      c1b66a1 [witgo] fix different versions of commons-lang dependency
      bae07e36