  1. Oct 07, 2014
    • [SPARK-3398] [EC2] Have spark-ec2 intelligently wait for specific cluster states · 5912ca67
      Nicholas Chammas authored
      Instead of waiting arbitrary amounts of time for the cluster to reach a specific state, this patch lets `spark-ec2` explicitly wait for a cluster to reach a desired state.
      
      This is useful in a couple of situations:
      * The cluster is launching and you want to wait until SSH is available before installing stuff.
      * The cluster is being terminated and you want to wait until all the instances are terminated before trying to delete security groups.
      
      This patch removes the need for the `--wait` option and removes some of the time-based retry logic that was being used.
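
      As a rough illustration of the idea only (spark-ec2 itself is a Python script), waiting for a desired state with linear backoff looks roughly like the following sketch; the `describeState` callback and the state names are purely hypothetical:

      ```scala
      // Illustrative sketch only: poll a hypothetical describeState() callback until the
      // cluster reports the desired state, backing off linearly between checks.
      def waitForClusterState(desired: String, describeState: () => String,
                              maxTries: Int = 30): Boolean = {
        var attempt = 0
        while (attempt < maxTries) {
          if (describeState() == desired) return true   // e.g. "ssh-ready" or "terminated"
          attempt += 1
          Thread.sleep(1000L * attempt)                  // linear backoff between polls
        }
        false
      }
      ```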
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2339 from nchammas/spark-ec2-wait-properly and squashes the following commits:
      
      43a69f0 [Nicholas Chammas] short-circuit SSH check; linear backoff
      9a9e035 [Nicholas Chammas] remove extraneous comment
      26c5ed0 [Nicholas Chammas] replace print with write()
      bb67c06 [Nicholas Chammas] deprecate wait option; remove dead code
      7969265 [Nicholas Chammas] fix long line (PEP 8)
      126e4cf [Nicholas Chammas] wait for specific cluster states
      5912ca67
    • [SPARK-3832][MLlib] Upgrade Breeze dependency to 0.10 · b32bb72e
      DB Tsai authored
      In Breeze 0.10, the L1regParam can be configured through an anonymous function in OWLQN, so each component can be penalized differently. This is required for GLMNET in MLlib with L1/L2 regularization.
      https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c
      
      Author: DB Tsai <dbtsai@dbtsai.com>
      
      Closes #2693 from dbtsai/breeze0.10 and squashes the following commits:
      
      7a0c45c [DB Tsai] In Breeze 0.10, the L1regParam can be configured through anonymous function in OWLQN, and each component can be penalized differently. This is required for GLMNET in MLlib with L1/L2 regularization. https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c
      b32bb72e
    • [SPARK-3486][MLlib][PySpark] PySpark support for Word2Vec · 098c7344
      Liquan Pei authored
      mengxr
      Added PySpark support for Word2Vec
      Change list:
      (1) PySpark support for Word2Vec
      (2) SerDe support for string sequences on both the Python side and the JVM side
      (3) Tests for SerDe of string sequences on the JVM side
      
      Author: Liquan Pei <liquanpei@gmail.com>
      
      Closes #2356 from Ishiihara/Word2Vec-python and squashes the following commits:
      
      476ea34 [Liquan Pei] style fixes
      b13a0b9 [Liquan Pei] resolve merge conflicts and minor fixes
      8671eba [Liquan Pei] Merge remote-tracking branch 'upstream/master' into Word2Vec-python
      daf88a6 [Liquan Pei] modification according to feedback
      a73fa19 [Liquan Pei] clean up
      3d8007b [Liquan Pei] fix findSynonyms for vector
      1bdcd2e [Liquan Pei] minor fixes
      cdef9f4 [Liquan Pei] add missing comments
      b7447eb [Liquan Pei] modify according to feedback
      b9a7383 [Liquan Pei] cache words RDD in fit
      89490bf [Liquan Pei] add tests and Word2VecModelWrapper
      78bbb53 [Liquan Pei] use pickle for seq string SerDe
      a264b08 [Liquan Pei] Merge remote-tracking branch 'upstream/master' into Word2Vec-python
      ca1e5ff [Liquan Pei] fix test
      68e7276 [Liquan Pei] minor style fixes
      48d5e72 [Liquan Pei] Functionality improvement
      0ad3ac1 [Liquan Pei] minor fix
      c867fdf [Liquan Pei] add Word2Vec to pyspark
      098c7344
    • [SPARK-3790][MLlib] CosineSimilarity Example · 3d7b36e0
      Reza Zadeh authored
      Provides an example for `RowMatrix.columnSimilarities()`.
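
      For context, a minimal sketch of how the API covered by the example is invoked (assumes a live SparkContext `sc`; the threshold overload uses DIMSUM sampling):

      ```scala
      import org.apache.spark.mllib.linalg.Vectors
      import org.apache.spark.mllib.linalg.distributed.RowMatrix

      // Build a small RowMatrix and compute pairwise cosine similarities between its columns.
      val rows = sc.parallelize(Seq(
        Vectors.dense(1.0, 2.0, 3.0),
        Vectors.dense(4.0, 5.0, 6.0)))
      val mat = new RowMatrix(rows)

      val exact  = mat.columnSimilarities()      // exact all-pairs column similarities
      val approx = mat.columnSimilarities(0.1)   // approximate, sampling guided by the threshold
      ```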
      
      Author: Reza Zadeh <rizlar@gmail.com>
      
      Closes #2622 from rezazadeh/dimsumexample and squashes the following commits:
      
      8f20b82 [Reza Zadeh] update comment
      379066d [Reza Zadeh] cache rows
      792b81c [Reza Zadeh] Address review comments
      e573c7a [Reza Zadeh] Average absolute error
      b15685f [Reza Zadeh] Use scopt. Distribute evaluation.
      eca3dfd [Reza Zadeh] Documentation
      ac96fb2 [Reza Zadeh] Compute approximation error, add command line.
      4533579 [Reza Zadeh] CosineSimilarity Example
      3d7b36e0
    • [SPARK-3777] Display "Executor ID" for Tasks in Stage page · 446063ec
      zsxwing authored
      Now the Stage page only displays "Executor" (host) for tasks. However, there may be more than one executor running on the same host. Currently, when a task hangs, I only know the host of the faulty executor, so I have to check all executors on that host.
      
      Adding "Executor ID" in the Tasks table. would be helpful to locate the faulty executor. Here is the new page:
      
      ![add_executor_id_for_tasks](https://cloud.githubusercontent.com/assets/1000778/4505774/acb9648c-4afa-11e4-8826-8768a0a60cc9.png)
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #2642 from zsxwing/SPARK-3777 and squashes the following commits:
      
      37945af [zsxwing] Put Executor ID and Host into one cell
      4bbe2c7 [zsxwing] [SPARK-3777] Display "Executor ID" for Tasks in Stage page
      446063ec
    • [SPARK-3825] Log more detail when unrolling a block fails · 553737c6
      Andrew Or authored
      Before:
      ```
      14/10/06 16:45:42 WARN CacheManager: Not enough space to cache partition rdd_0_2
      in memory! Free memory is 481861527 bytes.
      ```
      After:
      ```
      14/10/07 11:08:24 WARN MemoryStore: Not enough space to cache rdd_2_0 in memory!
      (computed 68.8 MB so far)
      14/10/07 11:08:24 INFO MemoryStore: Memory use = 1088.0 B (blocks) + 445.1 MB
      (scratch space shared across 8 thread(s)) = 445.1 MB. Storage limit = 459.5 MB.
      ```
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2688 from andrewor14/cache-log-message and squashes the following commits:
      
      28e33d6 [Andrew Or] Shy away from "unrolling"
      5638c49 [Andrew Or] Grammar
      39a0c28 [Andrew Or] Log more detail when unrolling a block fails
      553737c6
    • [SPARK-3731] [PySpark] fix memory leak in PythonRDD · bc87cc41
      Davies Liu authored
      The parent.getOrCompute() of PythonRDD is executed in a separate thread, so that thread should finally release the memory it reserved for shuffle and unrolling.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2668 from davies/leak and squashes the following commits:
      
      ae98be2 [Davies Liu] fix memory leak in PythonRDD
      bc87cc41
    • [SPARK-3762] clear reference of SparkEnv after stop · 65503296
      Davies Liu authored
      SparkEnv is cached in a ThreadLocal object, so after stopping a SparkContext and creating a new one, the old SparkEnv is still used by some threads. This triggers many problems; for example, PySpark misbehaves after restarting a SparkContext because Py4J uses a thread pool for RPC.

      This patch clears all such references after a SparkEnv is stopped.
      
      cc mateiz tdas pwendell
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2624 from davies/env and squashes the following commits:
      
      a69f30c [Davies Liu] deprecate getThreadLocal
      ba77ca4 [Davies Liu] remove getThreadLocal(), update docs
      ee62bb7 [Davies Liu] cleanup ThreadLocal of SparnENV
      4d0ea8b [Davies Liu] clear reference of SparkEnv after stop
      65503296
    • [SPARK-3808] PySpark fails to start in Windows · 12e2551e
      Masayoshi TSUZUKI authored
      Fixed a syntax error in the *.cmd scripts.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2669 from tsudukim/feature/SPARK-3808 and squashes the following commits:
      
      7f804e6 [Masayoshi TSUZUKI] [SPARK-3808] PySpark fails to start in Windows
      12e2551e
    • [SPARK-3827] Very long RDD names are not rendered properly in web UI · d65fd554
      Hossein authored
      With Spark SQL we generate very long RDD names. These names are not properly rendered in the web UI.
      
      This PR fixes the rendering issue.
      
      [SPARK-3827] #comment Linking PR with JIRA
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #2687 from falaki/sparkTableUI and squashes the following commits:
      
      fd06409 [Hossein] Limit width of cell when RDD name is too long
      d65fd554
    • [SPARK-3627] - [yarn] - fix exit code and final status reporting to RM · 70e824f7
      Thomas Graves authored
      See the description and what's handled in the JIRA comment: https://issues.apache.org/jira/browse/SPARK-3627?focusedCommentId=14150013&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14150013

      This does not handle YARN client mode reporting of the driver to the AM. I think that should be handled when we make it an unmanaged AM.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #2577 from tgravescs/SPARK-3627 and squashes the following commits:
      
      9c2efbf [Thomas Graves] review comments
      e8cc261 [Thomas Graves] fix accidental typo during fixing comment
      24c98e3 [Thomas Graves] rework
      85f1901 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into SPARK-3627
      fab166d [Thomas Graves] update based on review comments
      32f4dfa [Thomas Graves] switch back
      f0b6519 [Thomas Graves] change order of cleanup staging dir
      d3cc800 [Thomas Graves] SPARK-3627 - yarn - fix exit code and final status reporting to RM
      70e824f7
  2. Oct 06, 2014
    • [SPARK-3479] [Build] Report failed test category · 69c3f441
      Nicholas Chammas authored
      This PR allows SparkQA (i.e. Jenkins) to report in its posts to GitHub what category of test failed, if one can be determined.
      
      The failure categories are:
      * general failure
      * RAT checks failed
      * Scala style checks failed
      * Python style checks failed
      * Build failed
      * Spark unit tests failed
      * PySpark unit tests failed
      * MiMa checks failed
      
      This PR also fixes the diffing logic used to determine if a patch introduces new classes.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2606 from nchammas/report-failed-test-category and squashes the following commits:
      
      d67df03 [Nicholas Chammas] report what test category failed
      69c3f441
    • [SPARK-3773][PySpark][Doc] Sphinx build warning · 2300eb58
      cocoatomo authored
      When building the Sphinx documentation for PySpark, we get 12 warnings. Almost all of them are caused by docstrings in broken reST format.

      To reproduce this issue, run the following commands at commit 6e27cb63.
      
      ```bash
      $ cd ./python/docs
      $ make clean html
      ...
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of pyspark.SparkContext.sequenceFile:4: ERROR: Unexpected indentation.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of pyspark.RDD.saveAsSequenceFile:4: ERROR: Unexpected indentation.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:14: ERROR: Unexpected indentation.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:16: WARNING: Definition list ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:17: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:14: ERROR: Unexpected indentation.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:16: WARNING: Definition list ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:17: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/docs/pyspark.mllib.rst:50: WARNING: missing attribute mentioned in :members: or __all__: module pyspark.mllib.regression, attribute RidgeRegressionModelLinearRegressionWithSGD
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.DecisionTreeModel.predict:3: ERROR: Unexpected indentation.
      ...
      checking consistency... /Users/<user>/MyRepos/Scala/spark/python/docs/modules.rst:: WARNING: document isn't included in any toctree
      ...
      copying static files... WARNING: html_static_path entry u'/Users/<user>/MyRepos/Scala/spark/python/docs/_static' does not exist
      ...
      build succeeded, 12 warnings.
      ```
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2653 from cocoatomo/issues/3773-sphinx-build-warnings and squashes the following commits:
      
      6f65661 [cocoatomo] [SPARK-3773][PySpark][Doc] Sphinx build warning
      2300eb58
    • [SPARK-3786] [PySpark] speedup tests · 4f01265f
      Davies Liu authored
      This patch tries to speed up the PySpark tests: it reuses the SparkContext in tests.py and mllib/tests.py to reduce the overhead of creating a SparkContext, and removes some test cases that did not make sense. It also improves the performance of some cases, such as MergerTests and SortTests.
      
      before this patch:
      
      real	21m27.320s
      user	4m42.967s
      sys	0m17.343s
      
      after this patch:
      
      real	9m47.541s
      user	2m12.947s
      sys	0m14.543s
      
      This cuts the total time by more than half.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2646 from davies/tests and squashes the following commits:
      
      c54de60 [Davies Liu] revert change about memory limit
      6a2a4b0 [Davies Liu] refactor of tests, speedup 100%
      4f01265f
    • [SPARK-2461] [PySpark] Add a toString method to GeneralizedLinearModel · 20ea54cc
      Sandy Ryza authored
      Add a toString method to GeneralizedLinearModel, and change `__str__` to `__repr__` for some classes to provide a better message in repr.
      
      This PR is based on #1388, thanks to sryza!
      
      closes #1388
      
      Author: Sandy Ryza <sandy@cloudera.com>
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2625 from davies/string and squashes the following commits:
      
      3544aad [Davies Liu] fix LinearModel
      0bcd642 [Davies Liu] Merge branch 'sandy-spark-2461' of github.com:sryza/spark
      1ce5c2d [Sandy Ryza] __repr__ back to __str__ in a couple places
      aa9e962 [Sandy Ryza] Switch __str__ to __repr__
      a0c5041 [Sandy Ryza] Add labels back in
      1aa17f5 [Sandy Ryza] Match existing conventions
      fac1bc4 [Sandy Ryza] Fix PEP8 error
      f7b58ed [Sandy Ryza] SPARK-2461. Add a toString method to GeneralizedLinearModel
      20ea54cc
  3. Oct 05, 2014
    • [SPARK-3765][Doc] Add test information to sbt build docs · c9ae79fb
      scwf authored
      Add testing with sbt to doc ```building-spark.md```
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2629 from scwf/sbt-doc and squashes the following commits:
      
      fd9cf29 [scwf] add testing with sbt to docs
      c9ae79fb
    • Rectify gereneric parameter names between SparkContext and AccumulablePa... · fd7b1553
      Nathan Kronenfeld authored
      AccumulableParam gave its generic parameters as 'R, T', whereas SparkContext labeled them 'T, R'.
      
      Trivial, but really confusing.
      
      I resolved this in favor of AccumulableParam's ordering, because it seemed to have some logic behind its names. I also carried this minimal, but at least present, justification over into the SparkContext comments.
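
      For reference, a paraphrased sketch of the trait with the ordering this settles on: `R` is the accumulated result type and `T` is the type of the values added to it.

      ```scala
      // Paraphrased shape of the trait (not an exact quote of the Spark source):
      // R is the full accumulated value, T is a partial value merged into it.
      trait AccumulableParam[R, T] extends Serializable {
        def addAccumulator(r: R, t: T): R   // add a new element to the accumulated value
        def addInPlace(r1: R, r2: R): R     // merge two accumulated values
        def zero(initialValue: R): R        // the "zero" (identity) accumulated value
      }
      ```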
      
      Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>
      
      Closes #2637 from nkronenfeld/accumulators and squashes the following commits:
      
      98d6b74 [Nathan Kronenfeld] Rectify gereneric parameter names between SparkContext and AccumulableParam
      fd7b1553
    • SPARK-3794 [CORE] Building spark core fails due to inadvertent dependency on Commons IO · 8d22dbb5
      Sean Owen authored
      Remove references to Commons IO FileUtils and replace with pure Java version, which doesn't need to traverse the whole directory tree first.
      
      I think this method could be refined further if it would be alright to rename it and its args and break it down into two methods. I'm starting with a simple recursive rendition.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2662 from srowen/SPARK-3794 and squashes the following commits:
      
      4cd172f [Sean Owen] Remove references to Commons IO FileUtils and replace with pure Java version, which doesn't need to traverse the whole directory tree first
      8d22dbb5
    • [SPARK-3776][SQL] Wrong conversion to Catalyst for Option[Product] · 90897ea5
      Renat Yusupov authored
      Author: Renat Yusupov <re.yusupov@2gis.ru>
      
      Closes #2641 from r3natko/feature/catalyst_option and squashes the following commits:
      
      55d0c06 [Renat Yusupov] [SQL] SPARK-3776: Wrong conversion to Catalyst for Option[Product]
      90897ea5
    • [SPARK-3645][SQL] Makes table caching eager by default and adds syntax for lazy caching · 34b97a06
      Cheng Lian authored
      Although lazy caching for in-memory tables seems consistent with the `RDD.cache()` API, it's relatively confusing for users who mainly work with SQL and are not familiar with Spark internals. The `CACHE TABLE t; SELECT COUNT(*) FROM t;` pattern is also commonly seen, just to ensure predictable performance.
      
      This PR makes both the `CACHE TABLE t [AS SELECT ...]` statement and the `SQLContext.cacheTable()` API eager by default, and adds a new `CACHE LAZY TABLE t [AS SELECT ...]` syntax to provide lazy in-memory table caching.
      
      Also, this PR takes the chance to do some refactoring: `CacheCommand` and `CacheTableAsSelectCommand` are now merged and renamed to `CacheTableCommand`, since the former is strictly a special case of the latter. A new `UncacheTableCommand` is added for the `UNCACHE TABLE t` statement.
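
      A quick sketch of the resulting syntax, issued through a hypothetical `sqlContext` against made-up table names:

      ```scala
      sqlContext.sql("CACHE TABLE logs")                               // eager: materialized right away
      sqlContext.sql("CACHE LAZY TABLE recent AS SELECT * FROM logs")  // lazy: cached on first use
      sqlContext.sql("UNCACHE TABLE logs")                             // handled by UncacheTableCommand
      sqlContext.cacheTable("logs")                                    // the programmatic API is now eager too
      ```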
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2513 from liancheng/eager-caching and squashes the following commits:
      
      fe92287 [Cheng Lian] Makes table caching eager by default and adds syntax for lazy caching
      34b97a06
    • [SPARK-3792][SQL] Enable JavaHiveQLSuite · 58f5361c
      scwf authored
      Do not use TestSQLContext in JavaHiveQLSuite, since that may lead to two SparkContexts in one JVM, and re-enable JavaHiveQLSuite.
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2652 from scwf/fix-JavaHiveQLSuite and squashes the following commits:
      
      be35c91 [scwf] enable JavaHiveQLSuite
      58f5361c
    • [Minor] Trivial fix to make codes more readable · 79b2108d
      Liang-Chi Hsieh authored
      It should just use `maxResults` there.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2654 from viirya/trivial_fix and squashes the following commits:
      
      1362289 [Liang-Chi Hsieh] Trivial fix to make codes more readable.
      79b2108d
    • HOTFIX: Fix unicode error in merge script. · e222221e
      Patrick Wendell authored
      The merge script builds up a big command array and sometimes
      this contains both unicode and ascii strings. This doesn't work
      if you try to join them into a single string. A longer-term solution
      is to make sure the source of all strings is unicode.
      
      This patch provides a simpler solution... just print the array
      rather than joining. I actually prefer printing an array here
      anyways since joining on spaces is lossy in the case of arguments
      that themselves contain spaces.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2645 from pwendell/merge-script and squashes the following commits:
      
      167b792 [Patrick Wendell] HOTFIX: Fix unicode error in merge script.
      e222221e
    • [SPARK-3007][SQL] Fixes dynamic partitioning support for lower Hadoop versions · 1b97a941
      Cheng Lian authored
      This is a follow up of #2226 and #2616 to fix Jenkins master SBT build failures for lower Hadoop versions (1.0.x and 2.0.x).
      
      The root cause is the semantics difference of `FileSystem.globStatus()` between different versions of Hadoop, as illustrated by the following test code:
      
      ```scala
      object GlobExperiments extends App {
        val conf = new Configuration()
        val fs = FileSystem.getLocal(conf)
        fs.globStatus(new Path("/tmp/wh/*/*/*")).foreach { status =>
          println(status.getPath)
        }
      }
      ```
      
      Target directory structure:
      
      ```
      /tmp/wh
      ├── dir0
      │   ├── dir1
      │   │   └── level2
      │   └── level1
      └── level0
      ```
      
      Hadoop 2.4.1 result:
      
      ```
      file:/tmp/wh/dir0/dir1/level2
      ```
      
      Hadoop 1.0.4 result:
      
      ```
      file:/tmp/wh/dir0/dir1/level2
      file:/tmp/wh/dir0/level1
      file:/tmp/wh/level0
      ```
      
      In #2226 and #2616, we call `FileOutputCommitter.commitJob()` at the end of the job, and the `_SUCCESS` marker file is written. When working with lower Hadoop versions, due to the `globStatus()` semantics issue, `_SUCCESS` is picked up as a separate partition data file by `Hive.loadDynamicPartitions()` and fails the partition spec check.  The fix introduced in this PR is kind of a hack: when inserting data with dynamic partitioning, we intentionally avoid writing the `_SUCCESS` marker to work around this issue.

      Hive doesn't suffer from this issue because `FileSinkOperator` doesn't call `FileOutputCommitter.commitJob()`; instead, it calls `Utilities.mvFileToFinalPath()` to clean up the output directory and then loads it into the Hive warehouse with `loadDynamicPartitions()`/`loadPartition()`/`loadTable()`. This approach is better because it handles failed jobs and speculative tasks properly. We should add this step to `InsertIntoHiveTable` in another PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2663 from liancheng/dp-hadoop-1-fix and squashes the following commits:
      
      0177dae [Cheng Lian] Fixes dynamic partitioning support for lower Hadoop versions
      1b97a941
    • SPARK-1656: Fix potential resource leaks · a7c73130
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1656
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #577 from zsxwing/SPARK-1656 and squashes the following commits:
      
      c431095 [zsxwing] Add a comment and fix the code style
      2de96e5 [zsxwing] Make sure file will be deleted if exception happens
      28b90dc [zsxwing] Update to follow the code style
      4521d6e [zsxwing] Merge branch 'master' into SPARK-1656
      afc3383 [zsxwing] Update to follow the code style
      071fdd1 [zsxwing] SPARK-1656: Fix potential resource leaks
      a7c73130
    • [SPARK-3597][Mesos] Implement `killTask`. · 32fad423
      Brenden Matthews authored
      The MesosSchedulerBackend did not previously implement `killTask`,
      resulting in an exception.
      
      Author: Brenden Matthews <brenden@diddyinc.com>
      
      Closes #2453 from brndnmtthws/implement-killtask and squashes the following commits:
      
      23ddcdc [Brenden Matthews] [SPARK-3597][Mesos] Implement `killTask`.
      32fad423
  4. Oct 03, 2014
    • [SPARK-1860] More conservative app directory cleanup. · cf1d32e3
      mcheah authored
      This is my first contribution to the project, so I apologize for any significant errors.
      
      This PR addresses [SPARK-1860]. The application directories are now cleaned up in a more conservative manner.
      
      Previously, app-* directories were cleaned up if the directory's timestamp was older than a given time. However, the timestamp on a directory does not reflect the modification times of the files in that directory. Therefore, app-* directories were wiped out even if the files inside them were created recently and possibly being used by Executor tasks.
      
      The solution is to change the cleanup logic to inspect all files within the app-* directory and only eliminate the app-* directory if all files in the directory are stale.
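
      A minimal sketch of the "delete only when every file is stale" check described above (illustrative, not the actual Worker code):

      ```scala
      import java.io.File

      // An app-* directory is removed only if every file underneath it is older than the cutoff.
      def allFilesStale(dir: File, cutoffMs: Long): Boolean = {
        val entries = Option(dir.listFiles()).getOrElse(Array.empty[File])
        entries.forall { f =>
          if (f.isDirectory) allFilesStale(f, cutoffMs)
          else f.lastModified() < cutoffMs
        }
      }
      ```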
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #2609 from mccheah/worker-better-app-dir-cleanup and squashes the following commits:
      
      87b5d03 [mcheah] [SPARK-1860] Using more string interpolation. Better error logging.
      802473e [mcheah] [SPARK-1860] Cleaning up the logs generated when cleaning directories.
      e0a1f2e [mcheah] [SPARK-1860] Fixing broken unit test.
      77a9de0 [mcheah] [SPARK-1860] More conservative app directory cleanup.
      cf1d32e3
    • [SPARK-3377] [SPARK-3610] Metrics can be accidentally aggregated / History... · 79e45c93
      Kousuke Saruta authored
      [SPARK-3377] [SPARK-3610] Metrics can be accidentally aggregated / History server log name should not be based on user input
      
      This PR is another solution for #2250
      
      I'm using Spark's Codahale-based MetricsSystem with JMX or Graphite, and I saw the following two problems.

      (1) When applications with the same spark.app.name run on a cluster at the same time, some metric names collide. For instance, if two or more applications run on the cluster at the same time, each emits identically named metrics like "SparkPi.DAGScheduler.stage.failedStages", and Graphite cannot tell which application a metric belongs to.

      (2) When two or more executors run on the same machine, the JVM metrics of each executor are mixed. For instance, executors running on the same node can emit the identically named metric "jvm.memory", and Graphite cannot tell which executor a metric comes from.

      There is a similar issue with event logs: the directory for event logs is named using the application name.
      The application name is defined by the user and can include characters that are illegal in path names.
      Furthermore, the directory name consists of the application name and System.currentTimeMillis even though each application has a unique application ID, so if we run jobs with the same name, it's difficult to identify which directory belongs to which application.
      
      Closes #2250
      Closes #1067
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2432 from sarutak/metrics-structure-improvement2 and squashes the following commits:
      
      3288b2b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      39169e4 [Kousuke Saruta] Fixed style
      6570494 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      817e4f0 [Kousuke Saruta] Simplified MetricsSystem#buildRegistryName
      67fa5eb [Kousuke Saruta] Unified MetricsSystem#registerSources and registerSinks in start
      10be654 [Kousuke Saruta] Fixed style.
      990c078 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      f0c7fba [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      59cc2cd [Kousuke Saruta] Modified SparkContextSchedulerCreationSuite
      f9b6fb3 [Kousuke Saruta] Modified style.
      2cf8a0f [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      389090d [Kousuke Saruta] Replaced taskScheduler.applicationId() with getApplicationId in SparkContext#postApplicationStart
      ff45c89 [Kousuke Saruta] Added some test cases to MetricsSystemSuite
      69c46a6 [Kousuke Saruta] Added warning logging logic to MetricsSystem#buildRegistryName
      5cca0d2 [Kousuke Saruta] Added Javadoc comment to SparkContext#getApplicationId
      16a9f01 [Kousuke Saruta] Added data types to be returned to some methods
      6434b06 [Kousuke Saruta] Reverted changes related to ApplicationId
      0413b90 [Kousuke Saruta] Deleted ApplicationId.java and ApplicationIdSuite.java
      a42300c [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      0fc1b09 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      42bea55 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      248935d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      f6af132 [Kousuke Saruta] Modified SchedulerBackend and TaskScheduler to return System.currentTimeMillis as an unique Application Id
      1b8b53e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      97cb85c [Kousuke Saruta] Modified confliction of MimExcludes
      2cdd009 [Kousuke Saruta] Modified defailt implementation of applicationId
      9aadb0b [Kousuke Saruta] Modified NetworkReceiverSuite to ensure "executor.start()" is finished in test "network receiver life cycle"
      3011efc [Kousuke Saruta] Added ApplicationIdSuite.scala
      d009c55 [Kousuke Saruta] Modified ApplicationId#equals to compare appIds
      dfc83fd [Kousuke Saruta] Modified ApplicationId to implement Serializable
      9ff4851 [Kousuke Saruta] Modified MimaExcludes.scala to ignore createTaskScheduler method in SparkContext
      4567ffc [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      6a91b14 [Kousuke Saruta] Modified SparkContextSchedulerCreationSuite, ExecutorRunnerTest and EventLoggingListenerSuite
      0325caf [Kousuke Saruta] Added ApplicationId.scala
      0a2fc14 [Kousuke Saruta] Modified style
      eabda80 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      0f890e6 [Kousuke Saruta] Modified SparkDeploySchedulerBackend and Master to pass baseLogDir instead f eventLogDir
      bcf25bf [Kousuke Saruta] Modified directory name for EventLogs
      28d4d93 [Kousuke Saruta] Modified SparkContext and EventLoggingListener so that the directory for EventLogs is named same for Application ID
      203634e [Kousuke Saruta] Modified comment in SchedulerBackend#applicationId and TaskScheduler#applicationId
      424fea4 [Kousuke Saruta] Modified  the subclasses of TaskScheduler and SchedulerBackend so that they can return non-optional Unique Application ID
      b311806 [Kousuke Saruta] Swapped last 2 arguments passed to CoarseGrainedExecutorBackend
      8a2b6ec [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      086ee25 [Kousuke Saruta] Merge branch 'metrics-structure-improvement2' of github.com:sarutak/spark into metrics-structure-improvement2
      e705386 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      36d2f7a [Kousuke Saruta] Added warning message for the situation we cannot get application id for the prefix for the name of metrics
      eea6e19 [Kousuke Saruta] Modified CoarseGrainedMesosSchedulerBackend and MesosSchedulerBackend so that we can get Application ID
      c229fbe [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      e719c39 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      4a93c7f [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement2
      4776f9e [Kousuke Saruta] Modified MetricsSystemSuite.scala
      efcb6e1 [Kousuke Saruta] Modified to add application id to metrics name
      2ec848a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      3ea7896 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      ead8966 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      08e627e [Kousuke Saruta] Revert "tmp"
      7b67f5a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      45bd33d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      93e263a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      848819c [Kousuke Saruta] Merge branch 'metrics-structure-improvement' of github.com:sarutak/spark into metrics-structure-improvement
      912a637 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      e4a4593 [Kousuke Saruta] tmp
      3e098d8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      4603a39 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      fa7175b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      15f88a3 [Kousuke Saruta] Modified MetricsSystem#buildRegistryName because conf.get does not return null when correspondin entry is absent
      6f7dcd4 [Kousuke Saruta] Modified constructor of DAGSchedulerSource and BlockManagerSource because the instance of SparkContext is no longer used
      6fc5560 [Kousuke Saruta] Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource
      4e057c9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into metrics-structure-improvement
      85ffc02 [Kousuke Saruta] Revert "Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource"
      868e326 [Kousuke Saruta] Modified MetricsSystem to set registry name with unique application-id and driver/executor-id
      71609f5 [Kousuke Saruta] Modified sourceName of ExecutorSource, DAGSchedulerSource and BlockManagerSource
      55debab [Kousuke Saruta] Modified SparkContext and Executor to set spark.executor.id to identifiers
      4180993 [Kousuke Saruta] Modified SparkContext to retain spark.unique.app.name property in SparkConf
      79e45c93
    • [SPARK-3763] The example of building with sbt should be "sbt assembly" instead of "sbt compile" · 1eb8389c
      Kousuke Saruta authored
      In building-spark.md, there are examples of making an assembled package with Maven, but the example for building with sbt only covers compiling.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2627 from sarutak/SPARK-3763 and squashes the following commits:
      
      fadb990 [Kousuke Saruta] Modified the example to build with sbt in building-spark.md
      1eb8389c
    • [SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA. · 30abef15
      Marcelo Vanzin authored
      The existing code only considered one of the RMs when running in
      Yarn HA mode, so it was possible to get errors if the active RM
      was not registered in the filter.
      
      The change makes use of a new API added to Yarn that returns all
      proxy addresses, and falls back to the old behavior if the API
      is not present. While there, I also made a change to look for the
      scheme (http or https) being used by Yarn when building the proxy
      URIs.
      
      Since, in the case of multiple RMs, Yarn uses commas as a separator,
      it was not possible anymore to use spark.filter.params to propagate
      this information (which used commas to delimit different config params).
      Instead, I added a new param (spark.filter.jsonParams) which expects
      a JSON string containing a map with the config data. I chose not to
      add it to the documentation at this point since I don't believe users
      will use it directly.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2469 from vanzin/SPARK-3606 and squashes the following commits:
      
      aeb458a [Marcelo Vanzin] Undelete needed import.
      65e400d [Marcelo Vanzin] Remove unused import.
      d121883 [Marcelo Vanzin] Use separate config for each param instead of json.
      04bc156 [Marcelo Vanzin] Review feedback.
      4d4d6b9 [Marcelo Vanzin] [SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA.
      30abef15
    • [SPARK-3774] typo comment in bin/utils.sh · e5566e05
      Masayoshi TSUZUKI authored
      Fixed a typo in a comment in bin/utils.sh.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2639 from tsudukim/feature/SPARK-3774 and squashes the following commits:
      
      707b779 [Masayoshi TSUZUKI] [SPARK-3774] typo comment in bin/utils.sh
      e5566e05
    • [SPARK-3775] Not suitable error message in spark-shell.cmd · 358d7ffd
      Masayoshi TSUZUKI authored
      Reworded some error messages in the bin\*.cmd scripts.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2640 from tsudukim/feature/SPARK-3775 and squashes the following commits:
      
      3458afb [Masayoshi TSUZUKI] [SPARK-3775] Not suitable error message in spark-shell.cmd
      358d7ffd
    • [SPARK-3535][Mesos] Fix resource handling. · a8c52d53
      Brenden Matthews authored
      Author: Brenden Matthews <brenden@diddyinc.com>
      
      Closes #2401 from brndnmtthws/master and squashes the following commits:
      
      4abaa5d [Brenden Matthews] [SPARK-3535][Mesos] Fix resource handling.
      a8c52d53
    • [SPARK-3212][SQL] Use logical plan matching instead of temporary tables for table caching · 6a1d48f4
      Michael Armbrust authored
      _Also addresses: SPARK-1671, SPARK-1379 and SPARK-3641_
      
      This PR introduces a new trait, `CacheManager`, which replaces the previous temporary-table-based caching system.  Instead of creating a temporary table that shadows an existing table with an equivalent cached representation, the cache manager maintains a separate list of logical plans and their cached data.  After optimization, this list is searched for any matching plan fragments.  When a matching plan fragment is found, it is replaced with the cached data.
      
      There are several advantages to this approach:
       - Calling .cache() on a SchemaRDD now works as you would expect, and uses the more efficient columnar representation.
       - It's now possible to provide a list of temporary tables without having to decide whether a given table is actually just a cached persistent table. (To be done in a follow-up PR)
       - In some cases it is possible that cached data will be used, even if a cached table was not explicitly requested.  This is because we now look at the logical structure instead of the table name.
       - We now correctly invalidate when data is inserted into a hive table.
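
      A small sketch of what this looks like in practice; the table name and query are made up, and whether a later query hits the cache depends on its optimized plan matching a cached fragment:

      ```scala
      // Assumes a SQLContext named sqlContext with a registered table "logs".
      val errors = sqlContext.sql("SELECT * FROM logs WHERE level = 'ERROR'")
      errors.cache()      // now uses the in-memory columnar representation
      errors.count()      // materializes the cached data

      // A later query whose optimized plan contains the same fragment can reuse the cached
      // data, even though it never asks for caching explicitly.
      sqlContext.sql("SELECT COUNT(*) FROM logs WHERE level = 'ERROR'").collect()
      ```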
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2501 from marmbrus/caching and squashes the following commits:
      
      63fbc2c [Michael Armbrust] Merge remote-tracking branch 'origin/master' into caching.
      0ea889e [Michael Armbrust] Address comments.
      1e23287 [Michael Armbrust] Add support for cache invalidation for hive inserts.
      65ed04a [Michael Armbrust] fix tests.
      bdf9a3f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into caching
      b4b77f2 [Michael Armbrust] Address comments
      6923c9d [Michael Armbrust] More comments / tests
      80f26ac [Michael Armbrust] First draft of improved semantics for Spark SQL caching.
      6a1d48f4
    • [SPARK-3007][SQL] Adds dynamic partitioning support · bec0d0ea
      Cheng Lian authored
      PR #2226 was reverted because it broke Jenkins builds for an unknown reason. This debugging PR aims to fix the Jenkins build.
      
      This PR also fixes two bugs:
      
      1. Compression configurations in `InsertIntoHiveTable` are disabled by mistake
      
         The `FileSinkDesc` object passed to the writer container doesn't have compression related configurations. These configurations are not taken care of until `saveAsHiveFile` is called. This PR moves compression code forward, right after instantiation of the `FileSinkDesc` object.
      
      1. `PreInsertionCasts` doesn't take table partitions into account
      
         In `castChildOutput`, `table.attributes` only contains non-partition columns, thus for partitioned tables `childOutputDataTypes` never equals `tableOutputDataTypes`. This results in a funny analyzed plan like this:
      
         ```
         == Analyzed Logical Plan ==
         InsertIntoTable Map(partcol1 -> None, partcol2 -> None), false
          MetastoreRelation default, dynamic_part_table, None
          Project [c_0#1164,c_1#1165,c_2#1166]
           Project [c_0#1164,c_1#1165,c_2#1166]
            Project [c_0#1164,c_1#1165,c_2#1166]
             ... (repeats 99 times) ...
              Project [c_0#1164,c_1#1165,c_2#1166]
               Project [c_0#1164,c_1#1165,c_2#1166]
                Project [1 AS c_0#1164,1 AS c_1#1165,1 AS c_2#1166]
                 Filter (key#1170 = 150)
                  MetastoreRelation default, src, None
         ```
      
         Awful though this logical plan looks, it's harmless because all the projects will be eliminated by the optimizer. I guess that's why this issue hasn't been caught before.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      Author: baishuo(白硕) <vc_java@hotmail.com>
      Author: baishuo <vc_java@hotmail.com>
      
      Closes #2616 from liancheng/dp-fix and squashes the following commits:
      
      21935b6 [Cheng Lian] Adds back deleted trailing space
      f471c4b [Cheng Lian] PreInsertionCasts should take table partitions into account
      a132c80 [Cheng Lian] Fixes output compression
      9c6eb2d [Cheng Lian] Adds tests to verify dynamic partitioning folder layout
      0eed349 [Cheng Lian] Addresses @yhuai's comments
      26632c3 [Cheng Lian] Adds more tests
      9227181 [Cheng Lian] Minor refactoring
      c47470e [Cheng Lian] Refactors InsertIntoHiveTable to a Command
      6fb16d7 [Cheng Lian] Fixes typo in test name, regenerated golden answer files
      d53daa5 [Cheng Lian] Refactors dynamic partitioning support
      b821611 [baishuo] pass check style
      997c990 [baishuo] use HiveConf.DEFAULTPARTITIONNAME to replace hive.exec.default.partition.name
      761ecf2 [baishuo] modify according micheal's advice
      207c6ac [baishuo] modify for some bad indentation
      caea6fb [baishuo] modify code to pass scala style checks
      b660e74 [baishuo] delete a empty else branch
      cd822f0 [baishuo] do a little modify
      8e7268c [baishuo] update file after test
      3f91665 [baishuo(白硕)] Update Cast.scala
      8ad173c [baishuo(白硕)] Update InsertIntoHiveTable.scala
      051ba91 [baishuo(白硕)] Update Cast.scala
      d452eb3 [baishuo(白硕)] Update HiveQuerySuite.scala
      37c603b [baishuo(白硕)] Update InsertIntoHiveTable.scala
      98cfb1f [baishuo(白硕)] Update HiveCompatibilitySuite.scala
      6af73f4 [baishuo(白硕)] Update InsertIntoHiveTable.scala
      adf02f1 [baishuo(白硕)] Update InsertIntoHiveTable.scala
      1867e23 [baishuo(白硕)] Update SparkHadoopWriter.scala
      6bb5880 [baishuo(白硕)] Update HiveQl.scala
      bec0d0ea
    • [SPARK-2778] [yarn] Add workaround for race in MiniYARNCluster. · fbe8e985
      Marcelo Vanzin authored
      Sometimes the cluster's start() method returns before the configuration
      has been updated, which is done by ClientRMService in, I assume, a
      separate thread (otherwise there would be no race). That can cause tests
      to fail if the old configuration data is read, since it will contain
      the wrong RM address.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2605 from vanzin/SPARK-2778 and squashes the following commits:
      
      8d02ce0 [Marcelo Vanzin] Minor cleanup.
      5bebee7 [Marcelo Vanzin] [SPARK-2778] [yarn] Add workaround for race in MiniYARNCluster.
      fbe8e985
    • [SPARK-2693][SQL] Supported for UDAF Hive Aggregates like PERCENTILE · 22f8e1ee
      ravipesala authored
      Implemented Hive UDAF aggregates by adding a wrapper to Spark's Hive support.
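
      A minimal usage sketch (assumes a live SparkContext `sc` and a Hive table named `src` with an integer column `key`):

      ```scala
      import org.apache.spark.sql.hive.HiveContext

      // With the UDAF wrapper in place, Hive aggregates such as percentile resolve like any
      // other function in HiveQL queries.
      val hiveContext = new HiveContext(sc)
      hiveContext.sql("SELECT percentile(key, 0.5) FROM src").collect()
      ```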
      
      Author: ravipesala <ravindra.pesala@huawei.com>
      
      Closes #2620 from ravipesala/SPARK-2693 and squashes the following commits:
      
      a8df326 [ravipesala] Removed resolver from constructor arguments
      caf25c6 [ravipesala] Fixed style issues
      5786200 [ravipesala] Supported for UDAF Hive Aggregates like PERCENTILE
      22f8e1ee
    • [SPARK-3696]Do not override the user-difined conf_dir · 9d320e22
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-3696
      
      We now check whether SPARK_CONF_DIR is already defined before assigning it.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #2541 from WangTaoTheTonic/confdir and squashes the following commits:
      
      c3f31e0 [WangTaoTheTonic] Do not override the user-difined conf_dir
      9d320e22
    • SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR · f0811f92
      EugenCepoi authored
      Update of PR #997.
      
      With this PR, setting SPARK_CONF_DIR overrides SPARK_HOME/conf (not only spark-defaults.conf and spark-env).
      
      Author: EugenCepoi <cepoi.eugen@gmail.com>
      
      Closes #2481 from EugenCepoi/SPARK-2058 and squashes the following commits:
      
      0bb32c2 [EugenCepoi] use orElse orNull and fixing trailing percent in compute-classpath.cmd
      77f35d7 [EugenCepoi] SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR
      f0811f92
    • [SPARK-3366][MLLIB]Compute best splits distributively in decision tree · 2e4eae3a
      qiping.lqp authored
      Currently, all best splits are computed on the driver, which makes the driver a bottleneck for both communication and computation. This PR fixes this problem by computing best splits on the executors.
      Instead of sending all aggregate stats to the driver node, we can send the aggregate stats for a node to a particular executor using a `reduceByKey` operation, and then compute the best split for that node there.
      
      Implementation details:
      
      Each node now has a nodeStatsAggregator, which saves the aggregate stats for all features and bins.
      First, use mapPartitions to compute node aggregate stats for all nodes in each partition.
      Then transform the node aggregate stats into (nodeIndex, nodeStatsAggregator) pairs and use a `reduceByKey` operation to combine the nodeStatsAggregators for the same node.
      After all stats have been combined, the best split can be computed for each node based on its aggregate stats. The best-split results are collected to the driver to construct the decision tree.
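
      A self-contained sketch of this mapPartitions + reduceByKey pattern with a toy aggregator; `NodeStats` here only stands in for MLlib's internal statistics classes:

      ```scala
      import org.apache.spark.SparkContext._   // pair-RDD operations such as reduceByKey
      import org.apache.spark.rdd.RDD

      // Toy stand-in for the per-node statistics aggregator.
      case class NodeStats(count: Long, sum: Double) {
        def add(x: Double): NodeStats = NodeStats(count + 1, sum + x)
        def merge(other: NodeStats): NodeStats = NodeStats(count + other.count, sum + other.sum)
      }

      def aggregateByNode(points: RDD[(Int, Double)]): Map[Int, NodeStats] =
        points
          .mapPartitions { iter =>
            // Combine each partition's points locally before any shuffle.
            val local = scala.collection.mutable.Map.empty[Int, NodeStats]
            iter.foreach { case (node, x) =>
              local(node) = local.getOrElse(node, NodeStats(0L, 0.0)).add(x)
            }
            local.iterator
          }
          .reduceByKey((a, b) => a.merge(b))   // per-node stats combined on executors
          .collectAsMap()                      // only small per-node summaries reach the driver
          .toMap
      ```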
      
      CC: mengxr manishamde jkbradley, please help me review this, thanks.
      
      Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
      Author: chouqin <liqiping1991@gmail.com>
      
      Closes #2595 from chouqin/dt-dist-agg and squashes the following commits:
      
      db0d24a [chouqin] fix a minor bug and adjust code
      a0d9de3 [chouqin] adjust code based on comments
      9f201a6 [chouqin] fix bug: statsSize -> allStatsSize
      a8a7ed0 [chouqin] Merge branch 'master' of https://github.com/apache/spark into dt-dist-agg
      f13b346 [chouqin] adjust randomforest comments
      c32636e [chouqin] adjust code based on comments
      ac6a505 [chouqin] adjust code based on comments
      7bbb787 [chouqin] add comments
      bdd2a63 [qiping.lqp] fix test suite
      a75df27 [qiping.lqp] fix test suite
      b5b0bc2 [qiping.lqp] fix style
      e76414f [qiping.lqp] fix testsuite
      748bd45 [qiping.lqp] fix type-mismatch bug
      24eacd8 [qiping.lqp] fix type-mismatch bug
      5f63d6c [qiping.lqp] add multiclassification using One-Vs-All strategy
      4f56496 [qiping.lqp] fix bug
      f00fc22 [qiping.lqp] fix bug
      532993a [qiping.lqp] Compute best splits distributively in decision tree
      2e4eae3a