  1. Sep 18, 2014
  2. Sep 17, 2014
    • [SPARK-3565]Fix configuration item not consistent with document · 3f169bfe
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-3565
      
      "spark.ports.maxRetries" should be "spark.port.maxRetries". Make the configuration keys in the documentation and code consistent.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #2427 from WangTaoTheTonic/fixPortRetries and squashes the following commits:
      
      c178813 [WangTaoTheTonic] Use blank lines trigger Jenkins
      646f3fe [WangTaoTheTonic] also in SparkBuild.scala
      3700dba [WangTaoTheTonic] Fix configuration item not consistent with document
      3f169bfe
    • [SPARK-3567] appId field in SparkDeploySchedulerBackend should be volatile · 1147973f
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2428 from sarutak/appid-volatile-modification and squashes the following commits:
      
      c7d890d [Kousuke Saruta] Added volatile modifier to appId field in SparkDeploySchedulerBackend
      1147973f
    • [SPARK-3564][WebUI] Display App ID on HistoryPage · 6688a266
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2424 from sarutak/display-appid-on-webui and squashes the following commits:
      
      417fe90 [Kousuke Saruta] Added "App ID column" to HistoryPage
      6688a266
    • [SPARK-3571] Spark standalone cluster mode doesn't work. · cbc06503
      Kousuke Saruta authored
      I think this issue is caused by #1106.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2436 from sarutak/SPARK-3571 and squashes the following commits:
      
      7a4deea [Kousuke Saruta] Modified Master.scala to use numWorkersVisited and numWorkersAlive instead of stopPos
      4e51e35 [Kousuke Saruta] Modified Master to prevent from 0 divide
      4817ecd [Kousuke Saruta] Brushed up previous change
      71e84b6 [Kousuke Saruta] Modified Master to enable schedule normally
      cbc06503
    • [SPARK-3534] Fix expansion of testing arguments to sbt · 7fc3bb7c
      Nicholas Chammas authored
      Testing arguments to `sbt` need to be passed as an array, not a single, long string.
      
      Fixes a bug introduced in #2420.
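      The string-vs-array distinction can be sketched outside of bash; a minimal Python illustration (the sbt flags below are placeholders, not the actual test arguments):

```python
import shlex

# One long string passed as a single element stays ONE argv token;
# sbt would see the whole thing as a single argument.
as_single_string = ['-Phive test-only org.apache.spark.sql.*']

# Splitting into an array yields one token per argument, as sbt expects.
as_array = shlex.split('-Phive test-only org.apache.spark.sql.*')

print(len(as_single_string))  # 1
print(len(as_array))          # 3
```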
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2437 from nchammas/selective-testing and squashes the following commits:
      
      a9f9c1c [Nicholas Chammas] fix printing of sbt test arguments
      cf57cbf [Nicholas Chammas] fix sbt test arguments
      e33b978 [Nicholas Chammas] Merge pull request #2 from apache/master
      0b47ca4 [Nicholas Chammas] Merge branch 'master' of github.com:nchammas/spark
      8051486 [Nicholas Chammas] Merge pull request #1 from apache/master
      03180a4 [Nicholas Chammas] Merge branch 'master' of github.com:nchammas/spark
      d4c5f43 [Nicholas Chammas] Merge pull request #6 from apache/master
      7fc3bb7c
    • Docs: move HA subsections to a deeper indentation level · b3830b28
      Andrew Ash authored
      Makes the table of contents read better
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #2402 from ash211/docs/better-indentation and squashes the following commits:
      
      ea0e130 [Andrew Ash] Move HA subsections to a deeper indentation level
      b3830b28
    • [SPARK-1455] [SPARK-3534] [Build] When possible, run SQL tests only. · 5044e495
      Nicholas Chammas authored
      If the only files changed are related to SQL, then only run the SQL tests.
      
      This patch includes some cosmetic/maintainability refactoring. I would be more than happy to undo some of these changes if they are inappropriate.
      
      We can accept this patch mostly as-is and address the immediate need documented in [SPARK-3534](https://issues.apache.org/jira/browse/SPARK-3534), or we can keep it open until a satisfactory solution along the lines [discussed here](https://issues.apache.org/jira/browse/SPARK-1455?focusedCommentId=14136424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14136424) is reached.
      
      Note: I had to hack this patch up to test it locally, so what I'm submitting here and what I tested are technically different.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2420 from nchammas/selective-testing and squashes the following commits:
      
      db3fa2d [Nicholas Chammas] diff against master!
      f9e23f6 [Nicholas Chammas] when possible, run SQL tests only
      5044e495
    • [SQL][DOCS] Improve table caching section · cbf983bb
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2434 from marmbrus/patch-1 and squashes the following commits:
      
      67215be [Michael Armbrust] [SQL][DOCS] Improve table caching section
      cbf983bb
    • [Docs] minor grammar fix · 8fbd5f4a
      Nicholas Chammas authored
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2430 from nchammas/patch-2 and squashes the following commits:
      
      d476bfb [Nicholas Chammas] [Docs] minor grammar fix
      8fbd5f4a
    • SPARK-3177 (on Master Branch) · 7d1a3723
      chesterxgchen authored
      The JIRA and PR were originally created for branch-1.1 and have now been moved to the master branch.
      Chester
      
      The issue is that yarn-alpha and yarn have different APIs for certain class fields. In this particular case, ClientBase uses reflection to work around the difference, so we need a different way to test ClientBase's method. The original ClientBaseSuite used the getFieldValue() method to do this, but that doesn't work for yarn-alpha, where the API returns an array of String instead of a single String (as the Yarn-stable API does).
      
      To fix the test, I added a new method:
      
      ```scala
      def getFieldValue2[A: ClassTag, A1: ClassTag, B](clazz: Class[_], field: String,
                                                       defaults: => B)
                                                      (mapTo: A => B)(mapTo1: A1 => B): B =
        Try(clazz.getField(field)).map(_.get(null)).map {
          case v: A => mapTo(v)
          case v1: A1 => mapTo1(v1)
          case _ => defaults
        }.toOption.getOrElse(defaults)
      ```
      
      to handle the cases where the field's type can be either A or A1. The method pattern matches on the type (A or A1) and applies the corresponding mapping function (mapTo or mapTo1).
      
      Author: chesterxgchen <chester@alpinenow.com>
      
      Closes #2204 from chesterxgchen/SPARK-3177-master and squashes the following commits:
      
      e72a6ea [chesterxgchen]  The Issue is due to that yarn-alpha and yarn have different APIs for certain class fields. In this particular case,  the ClientBase using reflection to to address this issue, and we need to different way to test the ClientBase's method.  Original ClientBaseSuite using getFieldValue() method to do this. But it doesn't work for yarn-alpha as the API returns an array of String instead of just String (which is the case for Yarn-stable API).
      7d1a3723
    • [Docs] Correct spark.files.fetchTimeout default value · 983609a4
      viper-kun authored
      change the value of spark.files.fetchTimeout
      
      Author: viper-kun <xukun.xu@huawei.com>
      
      Closes #2406 from viper-kun/master and squashes the following commits:
      
      ecb0d46 [viper-kun] [Docs] Correct spark.files.fetchTimeout default value
      7cf4c7a [viper-kun] Update configuration.md
      983609a4
  3. Sep 16, 2014
    • [Minor]ignore all config files in conf · 008a5ed4
      wangfei authored
      Some config files in ```conf``` should be ignored, such as:
              conf/fairscheduler.xml
              conf/hive-log4j.properties
              conf/metrics.properties
      ...
      So ignore all ```sh```/```properties```/```conf```/```xml``` files.
      
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #2395 from scwf/patch-2 and squashes the following commits:
      
      3dc53f2 [wangfei] duplicate ```conf/*.conf```
      3c2986f [wangfei] ignore all config files
      008a5ed4
    • [SPARK-3555] Fix UISuite race condition · 0a7091e6
      Andrew Or authored
      The test "jetty selects different port under contention" is flaky.
      
      If another process binds to 4040 before the test starts, then the first server we start there will fail, and the servers we start after it may still bind to 4040 successfully if it is released in the meantime. Instead, we should just let Java find a random free port for us and hold onto it for the duration of the test.
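      The trick of letting the OS pick a free port has a direct analogue in most languages; a minimal Python sketch (an illustration only, not the UISuite code):

```python
import socket

# Bind to port 0 and the OS assigns any free ephemeral port; holding the
# socket open for the test's duration prevents anyone else grabbing it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
port = server.getsockname()[1]
print(port)  # an OS-chosen free port, never 0
server.close()
```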
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2418 from andrewor14/fix-port-contention and squashes the following commits:
      
      0cd4974 [Andrew Or] Stop them servers
      a7071fe [Andrew Or] Pick random port instead of 4040
      0a7091e6
    • Add a Community Projects page · a6e1712f
      Evan Chan authored
      This adds a new page to the docs listing community projects -- those created outside of Apache Spark that are of interest to the community of Spark users.   Anybody can add to it just by submitting a PR.
      
      There was a discussion thread about alternatives:
      * Creating a Github organization for Spark projects - we could not find any sponsors for this, and it would be difficult to organize since many folks just create repos in their company organization or personal accounts
      * Apache has some place for storing community projects, but it was deemed difficult to work with, and again there would be permissions issues -- not everyone could update it.
      
      Author: Evan Chan <velvia@gmail.com>
      
      Closes #2219 from velvia/community-projects-page and squashes the following commits:
      
      7316822 [Evan Chan] Point to Spark wiki: supplemental projects page
      613b021 [Evan Chan] Add a few more projects
      a85eaaf [Evan Chan] Add a Community Projects page
      a6e1712f
    • [SPARK-787] Add S3 configuration parameters to the EC2 deploy scripts · b2017126
      Dan Osipov authored
      When deploying to AWS, additional configuration is required to read S3 files. EMR creates it automatically, and there is no reason the Spark EC2 script shouldn't as well.
      
      This PR requires a corresponding PR to the mesos/spark-ec2 to be merged, as it gets cloned in the process of setting up machines: https://github.com/mesos/spark-ec2/pull/58
      
      Author: Dan Osipov <daniil.osipov@shazam.com>
      
      Closes #1120 from danosipov/s3_credentials and squashes the following commits:
      
      758da8b [Dan Osipov] Modify documentation to include the new parameter
      71fab14 [Dan Osipov] Use a parameter --copy-aws-credentials to enable S3 credential deployment
      7e0da26 [Dan Osipov] Get AWS credentials out of boto connection instance
      39bdf30 [Dan Osipov] Add S3 configuration parameters to the EC2 deploy scripts
      b2017126
    • [SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx · ec1adecb
      Davies Liu authored
      Using Sphinx to generate API docs for PySpark.
      
      requirement: Sphinx
      
      ```
      $ cd python/docs/
      $ make html
      ```
      
      The generated API docs will be located at python/docs/_build/html/index.html
      
      It can co-exist with the docs generated by Epydoc.
      
      This is the first working version; after it is merged in, we can continue to improve it and eventually replace Epydoc.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2292 from davies/sphinx and squashes the following commits:
      
      425a3b1 [Davies Liu] cleanup
      1573298 [Davies Liu] move docs to python/docs/
      5fe3903 [Davies Liu] Merge branch 'master' into sphinx
      9468ab0 [Davies Liu] fix makefile
      b408f38 [Davies Liu] address all comments
      e2ccb1b [Davies Liu] update name and version
      9081ead [Davies Liu] generate PySpark API docs using Sphinx
      ec1adecb
    • [SPARK-3546] InputStream of ManagedBuffer is not closed and causes running out of file descriptor · a9e91043
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2408 from sarutak/resolve-resource-leak-issue and squashes the following commits:
      
      074781d [Kousuke Saruta] Modified SuffleBlockFetcherIterator
      5f63f67 [Kousuke Saruta] Move metrics increment logic and debug logging outside try block
      b37231a [Kousuke Saruta] Modified FileSegmentManagedBuffer#nioByteBuffer to check null or not before invoking channel.close
      bf29d4a [Kousuke Saruta] Modified FileSegment to close channel
      a9e91043
    • [SQL][DOCS] Improve section on thrift-server · 84073eb1
      Michael Armbrust authored
      Taken from liancheng's updates. Merged conflicts with #2316.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2384 from marmbrus/sqlDocUpdate and squashes the following commits:
      
      2db6319 [Michael Armbrust] @liancheng's updates
      84073eb1
    • [Docs] minor punctuation fix · df90e81f
      Nicholas Chammas authored
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2414 from nchammas/patch-1 and squashes the following commits:
      
      14664bf [Nicholas Chammas] [Docs] minor punctuation fix
      df90e81f
    • [SPARK-2314][SQL] Override collect and take in python library, and count in... · 8e7ae477
      Aaron Staple authored
      [SPARK-2314][SQL] Override collect and take in python library, and count in java library, with optimized versions.
      
      SchemaRDD overrides RDD functions, including collect, count, and take, with optimized versions making use of the query optimizer.  The java and python interface classes wrapping SchemaRDD need to ensure the optimized versions are called as well.  This patch overrides relevant calls in the python and java interfaces with optimized versions.
      
      Adds a new Row serialization pathway between python and java, based on JList[Array[Byte]] versus the existing RDD[Array[Byte]]. I wasn’t overjoyed about doing this, but I noticed that some QueryPlans implement optimizations in executeCollect(), which outputs an Array[Row] rather than the typical RDD[Row] that can be shipped to python using the existing serialization code. To me it made sense to ship the Array[Row] over to python directly instead of converting it back to an RDD[Row] just for the purpose of sending the Rows to python using the existing serialization code.
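      The idea of shipping an already-materialized Array[Row] as a list of serialized byte chunks, rather than round-tripping through an RDD, can be sketched in plain Python (pickle stands in for the real serializer; this is not Spark's actual code):

```python
import pickle

rows = [("alice", 1), ("bob", 2)]  # stand-in for a collected Array[Row]

# Serialize each row to its own byte chunk -- the JList[Array[Byte]] shape.
chunks = [pickle.dumps(r) for r in rows]

# The Python side deserializes chunk by chunk.
received = [pickle.loads(c) for c in chunks]
print(received == rows)  # True
```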
      
      Author: Aaron Staple <aaron.staple@gmail.com>
      
      Closes #1592 from staple/SPARK-2314 and squashes the following commits:
      
      89ff550 [Aaron Staple] Merge with master.
      6bb7b6c [Aaron Staple] Fix typo.
      b56d0ac [Aaron Staple] [SPARK-2314][SQL] Override count in JavaSchemaRDD, forwarding to SchemaRDD's count.
      0fc9d40 [Aaron Staple] Fix comment typos.
      f03cdfa [Aaron Staple] [SPARK-2314][SQL] Override collect and take in sql.py, forwarding to SchemaRDD's collect.
      8e7ae477
    • [SPARK-2890][SQL] Allow reading of data when case insensitive resolution could... · 30f288ae
      Michael Armbrust authored
      [SPARK-2890][SQL] Allow reading of data when case insensitive resolution could cause possible ambiguity.
      
      Throwing an error in the constructor makes it impossible to run queries, even when there is no actual ambiguity. Remove this check in favor of throwing an error in analysis when the query actually is ambiguous.
      
      Also took the opportunity to add test cases that would have caught a subtle bug in my first attempt at fixing this and refactor some other test code.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2209 from marmbrus/sameNameStruct and squashes the following commits:
      
      729cca4 [Michael Armbrust] Better tests.
      a003aeb [Michael Armbrust] Remove error (it'll be caught in analysis).
      30f288ae
    • [SPARK-3308][SQL] Ability to read JSON Arrays as tables · 75836998
      Yin Huai authored
      This PR adds support for reading top-level JSON arrays, taking every element in such an array as a row (an empty array will not generate any rows).
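      The intended semantics can be sketched with Python's standard json module (an illustration only, not Spark's parser):

```python
import json

def to_rows(text):
    """Treat a top-level JSON array as one row per element;
    a single object is one row, and an empty array yields no rows."""
    parsed = json.loads(text)
    return parsed if isinstance(parsed, list) else [parsed]

print(len(to_rows('[{"a": 1}, {"a": 2}]')))  # 2
print(len(to_rows('{"a": 1}')))              # 1
print(len(to_rows('[]')))                    # 0
```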
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-3308
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2400 from yhuai/SPARK-3308 and squashes the following commits:
      
      990077a [Yin Huai] Handle top level JSON arrays.
      75836998
    • [SPARK-3519] add distinct(n) to PySpark · 9d5fa763
      Matthew Farrellee authored
      Added the missing rdd.distinct(numPartitions) and associated tests.
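      A rough single-process model of what distinct(numPartitions) computes -- hash-partition the elements, then de-duplicate within each partition (a sketch of the semantics only, not PySpark's implementation):

```python
def distinct(elems, num_partitions):
    # Equal elements hash to the same partition, so per-partition
    # de-duplication is globally correct.
    partitions = [set() for _ in range(num_partitions)]
    for e in elems:
        partitions[hash(e) % num_partitions].add(e)
    return [e for part in partitions for e in part]

print(sorted(distinct([1, 2, 2, 3, 3, 3], 2)))  # [1, 2, 3]
```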
      
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #2383 from mattf/SPARK-3519 and squashes the following commits:
      
      30b837a [Matthew Farrellee] Combine test cases to save on JVM startups
      6bc4a2c [Matthew Farrellee] [SPARK-3519] add distinct(n) to SchemaRDD in PySpark
      7a17f2b [Matthew Farrellee] [SPARK-3519] add distinct(n) to PySpark
      9d5fa763
    • [SPARK-3527] [SQL] Strip the string message · 86d253ec
      Cheng Hao authored
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2392 from chenghao-intel/trim and squashes the following commits:
      
      e52024f [Cheng Hao] trim the string message
      86d253ec
    • [SPARK-2182] Scalastyle rule blocking non ascii characters. · 7b8008f5
      Prashant Sharma authored
      ...erators.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2358 from ScrapCodes/scalastyle-unicode and squashes the following commits:
      
      12a20f2 [Prashant Sharma] [SPARK-2182] Scalastyle rule blocking (non keyboard typeable) unicode operators.
      7b8008f5
    • SPARK-3069 [DOCS] Build instructions in README are outdated · 61e21fe7
      Sean Owen authored
      Here's my crack at Bertrand's suggestion. The Github `README.md` contains build info that's outdated. It should just point to the current online docs, and reflect that Maven is the primary build now.
      
      (Incidentally, the stanza at the end about contributions of original work should go in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark too. It won't hurt to be crystal clear about the agreement to license, given that ICLAs are not required of anyone here.)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2014 from srowen/SPARK-3069 and squashes the following commits:
      
      501507e [Sean Owen] Note that Zinc is for Maven builds too
      db2bd97 [Sean Owen] sbt -> sbt/sbt and add note about zinc
      be82027 [Sean Owen] Fix additional occurrences of building-with-maven -> building-spark
      91c921f [Sean Owen] Move building-with-maven to building-spark and create a redirect. Update doc links to building-spark.html Add jekyll-redirect-from plugin and make associated config changes (including fixing pygments deprecation). Add example of SBT to README.md
      999544e [Sean Owen] Change "Building Spark with Maven" title to "Building Spark"; reinstate tl;dr info about dev/run-tests in README.md; add brief note about building with SBT
      c18d140 [Sean Owen] Optionally, remove the copy of contributing text from main README.md
      8e83934 [Sean Owen] Add CONTRIBUTING.md to trigger notice on new pull request page
      b1c04a1 [Sean Owen] Refer to current online documentation for building, and remove slightly outdated copy in README.md
      61e21fe7
  4. Sep 15, 2014
    • [SPARK-3040] pick up a more proper local ip address for Utils.findLocalIpAddress method · febafefa
      Ye Xianjin authored
      Short version: NetworkInterface.getNetworkInterfaces returns interfaces in reverse order compared to ifconfig output, so it may pick up an IP address associated with tun0 or another virtual network interface.
      See [SPARK-3040](https://issues.apache.org/jira/browse/SPARK-3040) for more detail.
      
      Author: Ye Xianjin <advancedxy@gmail.com>
      
      Closes #1946 from advancedxy/SPARK-3040 and squashes the following commits:
      
      f33f6b2 [Ye Xianjin] add windows support
      087a785 [Ye Xianjin] reverse the Networkinterface.getNetworkInterfaces output order to get a more proper local ip address.
      febafefa
    • [SPARK-3433][BUILD] Fix for Mima false-positives with @DeveloperAPI and @Experimental annotations. · ecf0c029
      Prashant Sharma authored
      The false positives reported were actually due to the MiMa generator not picking up the new jars in the presence of old jars (theoretically this should not have happened). As a workaround, we run them both separately and just append the results together.
      
      Author: Prashant Sharma <prashant@apache.org>
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2285 from ScrapCodes/mima-fix and squashes the following commits:
      
      093c76f [Prashant Sharma] Update mima
      59012a8 [Prashant Sharma] Update mima
      35b6c71 [Prashant Sharma] SPARK-3433 Fix for Mima false-positives with @DeveloperAPI and @Experimental annotations.
      ecf0c029
    • [SPARK-3540] Add reboot-slaves functionality to the ec2 script · d428ac6a
      Reynold Xin authored
      Tested on a real cluster.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2404 from rxin/ec2-reboot-slaves and squashes the following commits:
      
      00a2dbd [Reynold Xin] Allow rebooting slaves.
      d428ac6a
    • [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. · 60050f42
      Aaron Staple authored
      Also made some cosmetic cleanups.
      
      Author: Aaron Staple <aaron.staple@gmail.com>
      
      Closes #2385 from staple/SPARK-1087 and squashes the following commits:
      
      7b3bb13 [Aaron Staple] Address review comments, cosmetic cleanups.
      10ba6e1 [Aaron Staple] [SPARK-1087] Move python traceback utilities into new traceback_utils.py file.
      60050f42
    • [SPARK-2951] [PySpark] support unpickle array.array for Python 2.6 · da33acb8
      Davies Liu authored
      Pyrolite cannot unpickle array.array objects pickled by Python 2.6; this patch fixes that by extending Pyrolite.
      
      There is also a bug in Pyrolite when unpickling arrays of float/double; this patch works around it by reversing the endianness for float/double values. The workaround should be removed once Pyrolite publishes a new release that fixes the issue.
      
      I have sent a PR to Pyrolite to fix it: https://github.com/irmen/Pyrolite/pull/11
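      The byte-reversal workaround can be illustrated with Python's struct module (an illustration of the idea only, not Pyrolite's code):

```python
import struct

value = 1.5
le_bytes = struct.pack('<d', value)                # written little-endian
misread = struct.unpack('>d', le_bytes)[0]         # read big-endian: wrong
repaired = struct.unpack('>d', le_bytes[::-1])[0]  # reverse bytes first
print(misread == value, repaired == value)  # False True
```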
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2365 from davies/pickle and squashes the following commits:
      
      f44f771 [Davies Liu] enable tests about array
      3908f5c [Davies Liu] Merge branch 'master' into pickle
      c77c87b [Davies Liu] cleanup debugging code
      60e4e2f [Davies Liu] support unpickle array.array for Python 2.6
      da33acb8
    • [SPARK-3516] [mllib] DecisionTree: Add minInstancesPerNode, minInfoGain params... · fdb302f4
      qiping.lqp authored
      [SPARK-3516] [mllib] DecisionTree: Add minInstancesPerNode, minInfoGain params to example and Python API
      
      Added minInstancesPerNode, minInfoGain params to:
      * DecisionTreeRunner.scala example
      * Python API (tree.py)
      
      Also:
      * Fixed typo in tree suite test "do not choose split that does not satisfy min instance per node requirements"
      * small style fixes
      
      CC: mengxr
      
      Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      Author: chouqin <liqiping1991@gmail.com>
      
      Closes #2349 from jkbradley/chouqin-dt-preprune and squashes the following commits:
      
      61b2e72 [Joseph K. Bradley] Added max of 10GB for maxMemoryInMB in Strategy.
      a95e7c8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into chouqin-dt-preprune
      95c479d [Joseph K. Bradley] * Fixed typo in tree suite test "do not choose split that does not satisfy min instance per node requirements" * small style fixes
      e2628b6 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into chouqin-dt-preprune
      19b01af [Joseph K. Bradley] Merge remote-tracking branch 'chouqin/dt-preprune' into chouqin-dt-preprune
      f1d11d1 [chouqin] fix typo
      c7ebaf1 [chouqin] fix typo
      39f9b60 [chouqin] change edge `minInstancesPerNode` to 2 and add one more test
      c6e2dfc [Joseph K. Bradley] Added minInstancesPerNode and minInfoGain parameters to DecisionTreeRunner.scala and to Python API in tree.py
      0278a11 [chouqin] remove `noSplit` and set `Predict` private to tree
      d593ec7 [chouqin] fix docs and change minInstancesPerNode to 1
      efcc736 [qiping.lqp] fix bug
      10b8012 [qiping.lqp] fix style
      6728fad [qiping.lqp] minor fix: remove empty lines
      bb465ca [qiping.lqp] Merge branch 'master' of https://github.com/apache/spark into dt-preprune
      cadd569 [qiping.lqp] add api docs
      46b891f [qiping.lqp] fix bug
      e72c7e4 [qiping.lqp] add comments
      845c6fa [qiping.lqp] fix style
      f195e83 [qiping.lqp] fix style
      987cbf4 [qiping.lqp] fix bug
      ff34845 [qiping.lqp] separate calculation of predict of node from calculation of info gain
      ac42378 [qiping.lqp] add min info gain and min instances per node parameters in decision tree
      fdb302f4
    • [MLlib] Update SVD documentation in IndexedRowMatrix · 983d6a9c
      Reza Zadeh authored
      Updating this to reflect the newest SVD via ARPACK
      
      Author: Reza Zadeh <rizlar@gmail.com>
      
      Closes #2389 from rezazadeh/irmdocs and squashes the following commits:
      
      7fa1313 [Reza Zadeh] Update svd docs
      715da25 [Reza Zadeh] Updated computeSVD documentation IndexedRowMatrix
      983d6a9c
    • [SPARK-3396][MLLIB] Use SquaredL2Updater in LogisticRegressionWithSGD · 3b931281
      Christoph Sawade authored
      SimpleUpdater ignores the regularizer, which leads to an unregularized
      LogReg. To enable the common L2 regularizer (and the corresponding
      regularization parameter) for logistic regression, the SquaredL2Updater
      has to be used in SGD (see, e.g., [SVMWithSGD]).
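      The effect of the updater choice shows up in a single SGD step: with L2 regularization the loss gradient is augmented by reg_param * w, which SimpleUpdater omits. A sketch (the function below is illustrative, not MLlib's API):

```python
def l2_sgd_step(w, loss_grad, step_size, reg_param):
    # SquaredL2Updater-style step: shrink each weight toward zero in
    # proportion to reg_param; reg_param = 0 reduces to SimpleUpdater.
    return [wi - step_size * (gi + reg_param * wi)
            for wi, gi in zip(w, loss_grad)]

w = l2_sgd_step([1.0, -2.0], [0.5, 0.5], step_size=0.1, reg_param=0.1)
print(w)  # approximately [0.94, -2.03]
```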
      
      Author: Christoph Sawade <christoph@sawade.me>
      
      Closes #2398 from BigCrunsh/fix-regparam-logreg and squashes the following commits:
      
      0820c04 [Christoph Sawade] Use SquaredL2Updater in LogisticRegressionWithSGD
      3b931281
    • [SPARK-2714] DAGScheduler logs jobid when runJob finishes · 37d92528
      yantangzhai authored
      DAGScheduler logs jobid when runJob finishes
      
      Author: yantangzhai <tyz0303@163.com>
      
      Closes #1617 from YanTangZhai/SPARK-2714 and squashes the following commits:
      
      0a0243f [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
      fbb1150 [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
      7aec2a9 [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
      fb42f0f [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
      090d908 [yantangzhai] [SPARK-2714] DAGScheduler logs jobid when runJob finishes
      37d92528
    • [SPARK-3518] Remove wasted statement in JsonProtocol · e59fac1f
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2380 from sarutak/SPARK-3518 and squashes the following commits:
      
      8a1464e [Kousuke Saruta] Replaced a variable with simple field reference
      c660fbc [Kousuke Saruta] Removed useless statement in JsonProtocol.scala
      e59fac1f
    • [SPARK-3425] do not set MaxPermSize for OpenJDK 1.8 · fe2b1d6a
      Matthew Farrellee authored
      Closes #2387
      
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #2301 from mattf/SPARK-3425 and squashes the following commits:
      
      20f3c09 [Matthew Farrellee] [SPARK-3425] do not set MaxPermSize for OpenJDK 1.8
      fe2b1d6a
    • [SPARK-3410] The priority of shutdownhook for ApplicationMaster should not be integer literal · cc146444
      Kousuke Saruta authored
      I think we need to keep the priority of the shutdown hook for ApplicationMaster higher than the priority of the shutdown hook for o.a.h.FileSystem, and define it relative to FileSystem's priority so that it tracks any change to that value.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2283 from sarutak/SPARK-3410 and squashes the following commits:
      
      1d44fef [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3410
      bd6cc53 [Kousuke Saruta] Modified style
      ee6f1aa [Kousuke Saruta] Added constant "SHUTDOWN_HOOK_PRIORITY" to ApplicationMaster
      54eb68f [Kousuke Saruta] Changed Shutdown hook priority to 20
      2f0aee3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3410
      4c5cb93 [Kousuke Saruta] Modified the priority for AM's shutdown hook
      217d1a4 [Kousuke Saruta] Removed unused import statements
      717aba2 [Kousuke Saruta] Modified ApplicationMaster to make to keep the priority of shutdown hook for ApplicationMaster higher than the priority of shutdown hook for HDFS
      cc146444