  1. Feb 26, 2015
    • Judy Nash's avatar
      [SPARK-5914] to run spark-submit requiring only user perm on windows · 51a6f909
      Judy Nash authored
Because Windows by default does not grant read permission on jars to anyone but administrators, spark-submit would fail with a "ClassNotFound" exception if the user runs the slave service with only user permission.
This fix adds read permission for the owner of the jar (which would be the slave service account on Windows).
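As a rough illustration only (a minimal sketch of the general idea, not necessarily how the actual patch does it), granting the jar's owner read permission via `java.io.File#setReadable`:

```scala
import java.io.File

// Hypothetical helper, not the actual Spark patch: make sure the jar's owner
// (the account running the slave service on Windows) can read the jar.
def ensureOwnerReadable(jarPath: String): Unit = {
  val jar = new File(jarPath)
  if (jar.exists() && !jar.canRead()) {
    // setReadable(readable = true, ownerOnly = true) only touches the owner's permission.
    if (!jar.setReadable(true, true)) {
      System.err.println(s"Could not add owner read permission to $jarPath")
    }
  }
}
```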
      
      Author: Judy Nash <judynash@microsoft.com>
      
      Closes #4742 from judynash/SPARK-5914 and squashes the following commits:
      
      e288e56 [Judy Nash] Fix spacing and refactor code
      1de3c0e [Judy Nash] [SPARK-5914] Enable spark-submit to run requiring only user permission on windows
      51a6f909
    • Xiangrui Meng's avatar
      [SPARK-5976][MLLIB] Add partitioner to factors returned by ALS · e43139f4
      Xiangrui Meng authored
The model trained by ALS requires partitioning information to do quick lookups of user/item factors when making recommendations for individual requests. In the new implementation, we didn't set partitioners on the factors returned by ALS, which would cause a performance regression.
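A simplified sketch of why the partitioner matters, using a hypothetical (userId, factor) pair RDD rather than the real ALS output: once a partitioner is set, `lookup` only scans the single partition that can contain the key.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical data, not ALS internals: key the factors and set a partitioner.
val sc = new SparkContext(new SparkConf().setAppName("als-partitioner-sketch").setMaster("local[*]"))
val userFactors = sc.parallelize(Seq(1 -> Array(0.1, 0.2), 2 -> Array(0.3, 0.4)))
  .partitionBy(new HashPartitioner(8))
  .cache()

// With a partitioner, this touches a single partition; without one,
// Spark would have to scan every partition to find userId 2.
val factorForUser2 = userFactors.lookup(2)
```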
      
      srowen coderxiang
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4748 from mengxr/SPARK-5976 and squashes the following commits:
      
      9373a09 [Xiangrui Meng] add partitioner to factors returned by ALS
      260f183 [Xiangrui Meng] add a test for partitioner
      e43139f4
  2. Feb 25, 2015
    • Joseph K. Bradley's avatar
      [SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT · d20559b1
      Joseph K. Bradley authored
      * Add GradientBoostedTrees Python examples to ML guide
        * I ran these in the pyspark shell, and they worked.
      * Add save/load to examples in ML guide
* Added a note to the Python docs about predict/transform not working within RDD actions/transformations in some cases (see SPARK-5981)
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:
      
      c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
      bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide.  Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
      6d81c3e [Joseph K. Bradley] completed python GBT examples
      9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
      c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide.  Added GBT examples to ML guide
      d20559b1
    • Brennon York's avatar
      [SPARK-1182][Docs] Sort the configuration parameters in configuration.md · 46a044a3
      Brennon York authored
      Sorts all configuration options present on the `configuration.md` page to ease readability.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #3863 from brennonyork/SPARK-1182 and squashes the following commits:
      
      5696f21 [Brennon York] fixed merge conflict with port comments
      81a7b10 [Brennon York] capitalized A in Allocation
      e240486 [Brennon York] moved all spark.mesos properties into the running-on-mesos doc
      7de5f75 [Brennon York] moved serialization from application to compression and serialization section
      a16fec0 [Brennon York] moved shuffle settings from network to shuffle
      f8fa286 [Brennon York] sorted encryption category
      1023f15 [Brennon York] moved initialExecutors
      e9d62aa [Brennon York] fixed akka.heartbeat.interval
      25e6f6f [Brennon York] moved spark.executer.user*
      4625ade [Brennon York] added spark.executor.extra* items
      4ee5648 [Brennon York] fixed merge conflicts
      1b49234 [Brennon York] sorting mishap
      2b5758b [Brennon York] sorting mishap
      6fbdf42 [Brennon York] sorting mishap
      55dc6f8 [Brennon York] sorted security
      ec34294 [Brennon York] sorted dynamic allocation
      2a7c4a3 [Brennon York] sorted scheduling
      aa9acdc [Brennon York] sorted networking
      a4380b8 [Brennon York] sorted execution behavior
      27f3919 [Brennon York] sorted compression and serialization
      80a5bbb [Brennon York] sorted spark ui
      3f32e5b [Brennon York] sorted shuffle behavior
      6c51b38 [Brennon York] sorted runtime environment
      efe9d6f [Brennon York] sorted application properties
      46a044a3
    • Yanbo Liang's avatar
      [SPARK-5926] [SQL] make DataFrame.explain leverage queryExecution.logical · 41e2e5ac
      Yanbo Liang authored
DataFrame.explain returns the wrong result when the query is a DDL command.

For example, the following two queries should print out the same execution plan, but they do not:
sql("create table tb as select * from src where key > 490").explain(true)
sql("explain extended create table tb as select * from src where key > 490")

This is because DataFrame.explain uses logicalPlan, which has already been forced to execute; we should use the unexecuted plan, queryExecution.logical, instead.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #4707 from yanboliang/spark-5926 and squashes the following commits:
      
      fa6db63 [Yanbo Liang] logicalPlan is not lazy
      0e40a1b [Yanbo Liang] make DataFrame.explain leverage queryExecution.logical
      41e2e5ac
    • Liang-Chi Hsieh's avatar
      [SPARK-5999][SQL] Remove duplicate Literal matching block · 12dbf98c
      Liang-Chi Hsieh authored
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4760 from viirya/dup_literal and squashes the following commits:
      
      06e7516 [Liang-Chi Hsieh] Remove duplicate Literal matching block.
      12dbf98c
    • Cheng Lian's avatar
      [SPARK-6010] [SQL] Merging compatible Parquet schemas before computing splits · e0fdd467
      Cheng Lian authored
`ReadContext.init` calls `InitContext.getMergedKeyValueMetadata`, which doesn't know how to merge conflicting user-defined key-value metadata and throws an exception. In our case, when dealing with different but compatible schemas, we have different Spark SQL schema JSON strings in different Parquet part-files, which causes this problem. Reading similar Parquet files generated by Hive doesn't suffer from this issue.
      
      In this PR, we manually merge the schemas before passing it to `ReadContext` to avoid the exception.
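A rough sketch of the manual-merge idea with a hypothetical helper (this is not Spark's actual Parquet code; it merges by field name only and ignores type conflicts):

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical helper illustrating "merge compatible schemas": keep every field
// from the left schema, then append fields that only the right schema has.
def mergeCompatibleSchemas(left: StructType, right: StructType): StructType = {
  val leftNames = left.fieldNames.toSet
  val extraFields = right.fields.filterNot(f => leftNames.contains(f.name))
  StructType(left.fields ++ extraFields)
}
```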
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4768 from liancheng/spark-6010 and squashes the following commits:
      
      9002f0a [Cheng Lian] Fixes SPARK-6010
      e0fdd467
    • Davies Liu's avatar
      [SPARK-5944] [PySpark] fix version in Python API docs · f3f4c87b
      Davies Liu authored
      use RELEASE_VERSION when building the Python API docs
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4731 from davies/api_version and squashes the following commits:
      
      c9744c9 [Davies Liu] Update create-release.sh
      08cbc3f [Davies Liu] fix python docs
      f3f4c87b
    • Kay Ousterhout's avatar
      [SPARK-5982] Remove incorrect Local Read Time Metric · 838a4803
      Kay Ousterhout authored
      This metric is incomplete, because the files are memory mapped, so much of the read from disk occurs later as tasks actually read the file's data.
      
      This should be merged into 1.3, so that we never expose this incorrect metric to users.
      
      CC pwendell ksakellis sryza
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #4749 from kayousterhout/SPARK-5982 and squashes the following commits:
      
      9737b5e [Kay Ousterhout] More fixes
      a1eb300 [Kay Ousterhout] Removed one more use of local read time
      cf13497 [Kay Ousterhout] [SPARK-5982] Remove incorrectwq Local Read Time Metric
      838a4803
    • Brennon York's avatar
      [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing · 9f603fce
      Brennon York authored
Fixes the issue whereby, when VertexRDDs are `diff`ed, `innerJoin`ed, or `leftJoin`ed and have different numbers of partitions, they fail in the `zipPartitions` method. This fix tests whether the partition counts are equal and, if not, repartitions the other VertexRDD to match the partitioning of the calling VertexRDD.
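A simplified analogue of the fix using plain pair RDDs instead of VertexRDDs (the helper name is made up): check the partition counts and repartition the other side to match the caller before zipping.

```scala
import scala.reflect.ClassTag
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// zipPartitions requires both sides to have the same number of partitions,
// so align the other RDD with the calling RDD first.
def alignPartitions[V: ClassTag](self: RDD[(Long, V)], other: RDD[(Long, V)]): RDD[(Long, V)] = {
  if (self.partitions.length == other.partitions.length) other
  else other.partitionBy(new HashPartitioner(self.partitions.length))
}
```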
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #4705 from brennonyork/SPARK-1955 and squashes the following commits:
      
      0882590 [Brennon York] updated to properly handle differently-partitioned vertexRDDs
      9f603fce
    • Milan Straka's avatar
      [SPARK-5970][core] Register directory created in getOrCreateLocalRootDirs for automatic deletion. · a777c65d
      Milan Straka authored
      As documented in createDirectory, the result of createDirectory is not registered for automatic removal. Currently there are 4 directories left in `/tmp` after just running `pyspark`.
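A minimal sketch of the register-for-deletion idea with a hypothetical helper (not Spark's internal Utils.createTempDir): create the temp directory and hook its recursive removal into JVM shutdown so nothing is left behind in /tmp.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical helper: a temp directory that cleans itself up on JVM exit.
def createManagedTempDir(prefix: String = "spark-"): File = {
  val dir = Files.createTempDirectory(prefix).toFile
  sys.addShutdownHook {
    // Recursively delete the directory when the JVM shuts down.
    def delete(f: File): Unit = {
      Option(f.listFiles()).foreach(_.foreach(delete))
      f.delete()
    }
    delete(dir)
  }
  dir
}
```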
      
      Author: Milan Straka <fox@ucw.cz>
      
      Closes #4759 from foxik/remove-tmp-dirs and squashes the following commits:
      
      280450d [Milan Straka] Use createTempDir in getOrCreateLocalRootDirs...
      a777c65d
    • Sean Owen's avatar
      SPARK-5930 [DOCS] Documented default of spark.shuffle.io.retryWait is confusing · 7d8e6a2e
      Sean Owen authored
      Clarify default max wait in spark.shuffle.io.retryWait docs
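For context: assuming the defaults of the time were spark.shuffle.io.maxRetries = 3 and spark.shuffle.io.retryWait = 5s, the maximum wait before a fetch is given up works out to 3 × 5s = 15 seconds, which is the kind of concrete figure the clarified docs spell out.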
      
      CC andrewor14
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4769 from srowen/SPARK-5930 and squashes the following commits:
      
      ae2792b [Sean Owen] Clarify default max wait in spark.shuffle.io.retryWait docs
      7d8e6a2e
    • Michael Armbrust's avatar
      [SPARK-5996][SQL] Fix specialized outbound conversions · f84c799e
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4757 from marmbrus/udtConversions and squashes the following commits:
      
      3714aad [Michael Armbrust] [SPARK-5996][SQL] Fix specialized outbound conversions
      f84c799e
    • guliangliang's avatar
      [SPARK-5771] Number of Cores in Completed Applications of Standalone Master... · dd077abf
      guliangliang authored
      [SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
      
In standalone mode, the number of cores shown under Completed Applications on the Master web page will always be zero if sc.stop() is called, but the number will be correct if sc.stop() is not called.
The likely reason: after sc.stop() is called, the removeExecutor function of the ApplicationInfo class is invoked, which reduces the coresGranted variable to zero. coresGranted is the variable used to display the number of cores on the web page.
      
      Author: guliangliang <guliangliang@qiyi.com>
      
      Closes #4567 from marsishandsome/Spark5771 and squashes the following commits:
      
      694796e [guliangliang] remove duplicate code
      a20e390 [guliangliang] change to Cores Using & Requested
      0c19c95 [guliangliang] change Cores to Cores (max)
      cfbd97d [guliangliang] [SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
      dd077abf
    • Benedikt Linse's avatar
      [GraphX] fixing 3 typos in the graphx programming guide · 5b8480e0
      Benedikt Linse authored
      Corrected 3 Typos in the GraphX programming guide. I hope this is the correct way to contribute.
      
      Author: Benedikt Linse <benedikt.linse@gmail.com>
      
      Closes #4766 from 1123/master and squashes the following commits:
      
      8a63812 [Benedikt Linse] fixing 3 typos in the graphx programming guide
      5b8480e0
    • prabs's avatar
      [SPARK-5666][streaming][MQTT streaming] some trivial fixes · d51ed263
      prabs authored
Modified to adhere to accepted coding standards, as pointed out by tdas in PR #3844.
      
      Author: prabs <prabsmails@gmail.com>
      Author: Prabeesh K <prabsmails@gmail.com>
      
      Closes #4178 from prabeesh/master and squashes the following commits:
      
      bd2cb49 [Prabeesh K] adress the comment
      ccc0765 [prabs] adress the comment
      46f9619 [prabs] adress the comment
      c035bdc [prabs] adress the comment
      22dd7f7 [prabs] address the comments
      0cc67bd [prabs] adress the comment
      838c38e [prabs] adress the comment
      cd57029 [prabs] address the comments
      66919a3 [Prabeesh K] changed MqttDefaultFilePersistence to MemoryPersistence
      5857989 [prabs] modified to adhere to accepted coding standards
      d51ed263
  3. Feb 24, 2015
    • Davies Liu's avatar
      [SPARK-5994] [SQL] Python DataFrame documentation fixes · d641fbb3
      Davies Liu authored
* select empty should NOT be the same as select; make sure selectExpr behaves the same
* join param documentation
* link to source doesn't work in the Jekyll-generated file
* cross-reference of columns (i.e. enabling linking)
* show(): move the df example before df.show()
* move tests in SQLContext out of the docstring, otherwise the doc is too long
* Column.desc and .asc don't have any documentation
* in documentation, sort functions.*
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4756 from davies/df_docs and squashes the following commits:
      
      f30502c [Davies Liu] fix doc
      32f0d46 [Davies Liu] fix DataFrame docs
      d641fbb3
    • Yin Huai's avatar
      [SPARK-5286][SQL] SPARK-5286 followup · 769e092b
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-5286
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4755 from yhuai/SPARK-5286-throwable and squashes the following commits:
      
      4c0c450 [Yin Huai] Catch Throwable instead of Exception.
      769e092b
    • Tathagata Das's avatar
      [SPARK-5993][Streaming][Build] Fix assembly jar location of kafka-assembly · 922b43b3
      Tathagata Das authored
The published kafka-assembly JAR was empty in 1.3.0-RC1.
This is because the Maven build generated two JARs:
1. an empty JAR file (since kafka-assembly has no code of its own), and
2. an assembly JAR file containing everything, generated in a different location than 1.
The Maven publishing plugin uploaded 1 and not 2.
Instead, if 2 is not configured to be generated in a different location, there is only one JAR containing everything, and that is what gets published.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4753 from tdas/SPARK-5993 and squashes the following commits:
      
      c390db8 [Tathagata Das] Fix assembly jar location of kafka-assembly
      922b43b3
    • Reynold Xin's avatar
      [SPARK-5985][SQL] DataFrame sortBy -> orderBy in Python. · fba11c2f
      Reynold Xin authored
Also added desc/asc functions for constructing sorting expressions more conveniently, and added a small fix to lift aliases out of cast expressions.
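For reference, a small Scala sketch of the equivalent sorting expressions (the commit itself targets the Python API; the DataFrame and "age" column here are hypothetical):

```scala
import org.apache.spark.sql.DataFrame

// Sort a hypothetical DataFrame by its "age" column.
def sortedByAge(df: DataFrame): DataFrame =
  df.orderBy(df("age").desc)   // descending; df("age").asc gives the ascending order
```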
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4752 from rxin/SPARK-5985 and squashes the following commits:
      
      aeda5ae [Reynold Xin] Added Experimental flag to ColumnName.
      047ad03 [Reynold Xin] Lift alias out of cast.
      c9cf17c [Reynold Xin] [SPARK-5985][SQL] DataFrame sortBy -> orderBy in Python.
      fba11c2f
    • Reynold Xin's avatar
      [SPARK-5904][SQL] DataFrame Java API test suites. · 53a1ebf3
      Reynold Xin authored
      Added a new test suite to make sure Java DF programs can use varargs properly.
      Also moved all suites into test.org.apache.spark package to make sure the suites also test for method visibility.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4751 from rxin/df-tests and squashes the following commits:
      
      1e8b8e4 [Reynold Xin] Fixed imports and renamed JavaAPISuite.
      a6ca53b [Reynold Xin] [SPARK-5904][SQL] DataFrame Java API test suites.
      53a1ebf3
    • Cheng Lian's avatar
      [SPARK-5751] [SQL] [WIP] Revamped HiveThriftServer2Suite for robustness · f816e739
      Cheng Lian authored
      **NOTICE** Do NOT merge this, as we're waiting for #3881 to be merged.
      
`HiveThriftServer2Suite` has been notorious for its flakiness for a while. This was mostly due to spawning and communicating with external server processes. This PR revamps the test suite for better robustness:
      
1. Fixes a race condition that occurred while using `tail -f` to check the log file
      
         It's possible that the line we are looking for has already been printed into the log file before we start the `tail -f` process. This PR uses `tail -n +0 -f` to ensure all lines are checked.
      
      2. Retries up to 3 times if the server fails to start
      
   In most cases, the server fails to start because of a port conflict. This PR no longer asks the system to choose an available TCP port; instead it uses a random port first and retries up to 3 times if the server fails to start (see the sketch after this list).
      
      3. A server instance is reused among all test cases within a single suite
      
   The original `HiveThriftServer2Suite` is split into two test suites, `HiveThriftBinaryServerSuite` and `HiveThriftHttpServerSuite`. Each suite starts a `HiveThriftServer2` instance and reuses it for all of its test cases.
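A sketch of the retry-on-port-conflict approach from point 2 above; `startThriftServer` is a hypothetical stand-in for launching the server on a given port:

```scala
import scala.util.{Failure, Random, Success, Try}

// Try a random port; if startup fails (e.g. the port is already in use),
// pick another random port and retry, up to maxAttempts times.
def startWithRetries(startThriftServer: Int => Unit, maxAttempts: Int = 3): Int = {
  def randomPort(): Int = 10000 + Random.nextInt(50000)
  def attempt(remaining: Int): Int = {
    val port = randomPort()
    Try(startThriftServer(port)) match {
      case Success(_) => port
      case Failure(_) if remaining > 1 => attempt(remaining - 1)
      case Failure(e) => throw e
    }
  }
  attempt(maxAttempts)
}
```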
      
      **TODO**
      
      - [ ] Starts the Thrift server in foreground once #3881 is merged (adding `--foreground` flag to `spark-daemon.sh`)
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4720 from liancheng/revamp-thrift-server-tests and squashes the following commits:
      
      d6c80eb [Cheng Lian] Relaxes server startup timeout
      6f14eb1 [Cheng Lian] Revamped HiveThriftServer2Suite for robustness
      f816e739
    • MechCoder's avatar
      [SPARK-5436] [MLlib] Validate GradientBoostedTrees using runWithValidation · 2a0fe348
      MechCoder authored
One can stop early if the decrease in error is less than a certain tolerance, or if the error increases because the training data is overfit.

This introduces a new method, runWithValidation, which takes in a pair of RDDs: one for the training data and the other for validation.
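A usage sketch of the new method, with an assumed parameter setup (regression with default boosting parameters):

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
import org.apache.spark.rdd.RDD

// Pass a training RDD and a validation RDD; boosting stops early once the
// validation error no longer improves enough.
def trainWithEarlyStopping(training: RDD[LabeledPoint],
                           validation: RDD[LabeledPoint]) = {
  val boostingStrategy = BoostingStrategy.defaultParams("Regression")
  boostingStrategy.numIterations = 100
  new GradientBoostedTrees(boostingStrategy).runWithValidation(training, validation)
}
```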
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4677 from MechCoder/spark-5436 and squashes the following commits:
      
      1bb21d4 [MechCoder] Combine regression and classification tests into a single one
      e4d799b [MechCoder] Addresses indentation and doc comments
      b48a70f [MechCoder] COSMIT
      b928a19 [MechCoder] Move validation while training section under usage tips
      fad9b6e [MechCoder] Made the following changes 1. Add section to documentation 2. Return corresponding to bestValidationError 3. Allow negative tolerance.
      55e5c3b [MechCoder] One liner for prevValidateError
      3e74372 [MechCoder] TST: Add test for classification
      77549a9 [MechCoder] [SPARK-5436] Validate GradientBoostedTrees using runWithValidation
      2a0fe348
    • Davies Liu's avatar
      [SPARK-5973] [PySpark] fix zip with two RDDs with AutoBatchedSerializer · da505e59
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4745 from davies/fix_zip and squashes the following commits:
      
      2124b2c [Davies Liu] Update tests.py
      b5c828f [Davies Liu] increase the number of records
      c1e40fd [Davies Liu] fix zip with two RDDs with AutoBatchedSerializer
      da505e59
    • Michael Armbrust's avatar
      [SPARK-5952][SQL] Lock when using hive metastore client · a2b91379
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4746 from marmbrus/hiveLock and squashes the following commits:
      
      8b871cf [Michael Armbrust] [SPARK-5952][SQL] Lock when using hive metastore client
      a2b91379
    • Judy's avatar
      [Spark-5708] Add Slf4jSink to Spark Metrics · c5ba975e
      Judy authored
Add Slf4jSink to Spark Metrics using Coda Hale's Slf4jReporter.
This sends metrics to log4j, allowing Spark users to reuse the log4j pipeline for metrics collection.

Reviewed the existing unit tests and didn't see any sink-related tests. Please advise on whether tests should be added.
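For reference, a hypothetical metrics.properties fragment showing how such a sink would typically be enabled; the class name is inferred from the commit title and the option names from other sinks, so treat both as assumptions:

```
# Hypothetical configuration; class and option names assumed, not verified.
*.sink.slf4j.class=org.apache.spark.metrics.sink.Slf4jSink
*.sink.slf4j.period=10
*.sink.slf4j.unit=seconds
```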
      
      Author: Judy <judynash@microsoft.com>
      Author: judynash <judynash@microsoft.com>
      
      Closes #4644 from judynash/master and squashes the following commits:
      
      57ef214 [judynash] doc clarification and indent fixes
      a751a66 [Judy] Spark-5708: Add Slf4jSink to Spark Metrics
      c5ba975e
    • Xiangrui Meng's avatar
      [MLLIB] Change x_i to y_i in Variance's user guide · 105791e3
      Xiangrui Meng authored
      Variance is calculated on labels/responses.
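For reference, the standard variance impurity written over the labels y_i, which is presumably the form the corrected guide uses (N is the number of training instances reaching the node):

```latex
\mathrm{Variance} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2,
\qquad \mu = \frac{1}{N} \sum_{i=1}^{N} y_i
```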
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4740 from mengxr/patch-1 and squashes the following commits:
      
      673317b [Xiangrui Meng] [MLLIB] Change x_i to y_i in Variance's user guide
      105791e3
    • Andrew Or's avatar
      [SPARK-5965] Standalone Worker UI displays {{USER_JAR}} · 6d2caa57
      Andrew Or authored
      For screenshot see: https://issues.apache.org/jira/browse/SPARK-5965
      This was caused by 20a60131.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4739 from andrewor14/user-jar-blocker and squashes the following commits:
      
      23c4a9e [Andrew Or] Use right argument
      6d2caa57
    • Tathagata Das's avatar
      [Spark-5967] [UI] Correctly clean JobProgressListener.stageIdToActiveJobIds · 64d2c01f
      Tathagata Das authored
      Patch should be self-explanatory
      pwendell JoshRosen
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4741 from tdas/SPARK-5967 and squashes the following commits:
      
      653b5bb [Tathagata Das] Fixed the fix and added test
      e2de972 [Tathagata Das] Clear stages which have no corresponding active jobs.
      64d2c01f
    • Michael Armbrust's avatar
      [SPARK-5532][SQL] Repartition should not use external rdd representation · 20123662
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4738 from marmbrus/udtRepart and squashes the following commits:
      
      c06d7b5 [Michael Armbrust] fix compilation
      91c8829 [Michael Armbrust] [SQL][SPARK-5532] Repartition should not use external rdd representation
      20123662
    • Michael Armbrust's avatar
      [SPARK-5910][SQL] Support for as in selectExpr · 0a59e45e
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4736 from marmbrus/asExprs and squashes the following commits:
      
      5ba97e4 [Michael Armbrust] [SPARK-5910][SQL] Support for as in selectExpr
      0a59e45e
    • Cheng Lian's avatar
      [SPARK-5968] [SQL] Suppresses ParquetOutputCommitter WARN logs · 84033313
      Cheng Lian authored
      Please refer to the [JIRA ticket] [1] for the motivation.
      
      [1]: https://issues.apache.org/jira/browse/SPARK-5968
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4744 from liancheng/spark-5968 and squashes the following commits:
      
      caac6a8 [Cheng Lian] Suppresses ParquetOutputCommitter WARN logs
      84033313
    • Xiangrui Meng's avatar
      [SPARK-5958][MLLIB][DOC] update block matrix user guide · cf2e4165
      Xiangrui Meng authored
      * Removed SVD code from examples.
      * Corrected Java API doc link.
      * Updated variable names: `AtransposeA` -> `ata`.
      * Minor changes.
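A small sketch along the lines of the updated guide (variable names assumed): build a BlockMatrix from coordinate entries and compute ata = Aᵀ A.

```scala
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
import org.apache.spark.rdd.RDD

// Hypothetical entries RDD; convert to a BlockMatrix and multiply by its transpose.
def computeAta(entries: RDD[MatrixEntry]): BlockMatrix = {
  val matA: BlockMatrix = new CoordinateMatrix(entries).toBlockMatrix().cache()
  matA.validate()                  // sanity-check block sizes
  matA.transpose.multiply(matA)    // ata
}
```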
      
      brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4737 from mengxr/update-block-matrix-user-guide and squashes the following commits:
      
      70f53ac [Xiangrui Meng] update block matrix user guide
      cf2e4165
  4. Feb 23, 2015
    • Michael Armbrust's avatar
      [SPARK-5873][SQL] Allow viewing of partially analyzed plans in queryExecution · 1ed57086
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4684 from marmbrus/explainAnalysis and squashes the following commits:
      
      afbaa19 [Michael Armbrust] fix python
      d93278c [Michael Armbrust] fix hive
      e5fa0a4 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explainAnalysis
      52119f2 [Michael Armbrust] more tests
      82a5431 [Michael Armbrust] fix tests
      25753d2 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explainAnalysis
      aee1e6a [Michael Armbrust] fix hive
      b23a844 [Michael Armbrust] newline
      de8dc51 [Michael Armbrust] more comments
      acf620a [Michael Armbrust] [SPARK-5873][SQL] Show partially analyzed plans in query execution
      1ed57086
    • Yin Huai's avatar
      [SPARK-5935][SQL] Accept MapType in the schema provided to a JSON dataset. · 48376bfe
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5935
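A sketch of supplying a schema containing a MapType when loading a JSON dataset; the field names and sample record are made up:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{IntegerType, MapType, StringType, StructField, StructType}

// Provide an explicit schema whose "scores" field is a MapType.
def loadScores(sc: SparkContext) = {
  val sqlContext = new SQLContext(sc)
  val json = sc.parallelize(Seq("""{"name":"a","scores":{"math":90,"bio":85}}"""))
  val schema = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("scores", MapType(StringType, IntegerType), nullable = true)))
  sqlContext.jsonRDD(json, schema)
}
```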
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #4710 from yhuai/jsonMapType and squashes the following commits:
      
      3e40390 [Yin Huai] Remove unnecessary changes.
      f8e6267 [Yin Huai] Fix test.
      baa36e3 [Yin Huai] Accept MapType in the schema provided to jsonFile/jsonRDD.
      48376bfe
    • Joseph K. Bradley's avatar
      [SPARK-5912] [docs] [mllib] Small fixes to ChiSqSelector docs · 59536cc8
      Joseph K. Bradley authored
      Fixes:
      * typo in Scala example
      * Removed comment "usually applied on sparse data" since that is debatable
      * small edits to text for clarity
      
      CC: avulanov  I noticed a typo post-hoc and ended up making a few small edits.  Do the changes look OK?
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4732 from jkbradley/chisqselector-docs and squashes the following commits:
      
      9656a3b [Joseph K. Bradley] added Java example for ChiSqSelector to guide
      3f3f9f4 [Joseph K. Bradley] small fixes to ChiSqSelector docs
      59536cc8
    • Alexander Ulanov's avatar
      [MLLIB] SPARK-5912 Programming guide for feature selection · 28ccf5ee
      Alexander Ulanov authored
Added a description of ChiSqSelector and a few words about feature selection in general. I could add a code example; however, it would not look reasonable in the absence of a feature discretizer or a dataset in the `data` folder that has redundant features.
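Purely for reference here (the guide change deliberately omits an example), a minimal sketch of ChiSqSelector usage, assuming already-discretized LabeledPoint data:

```scala
import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Select the top-k features by chi-squared test and project each point onto them.
def selectTopFeatures(data: RDD[LabeledPoint], numTopFeatures: Int = 50): RDD[LabeledPoint] = {
  val model = new ChiSqSelector(numTopFeatures).fit(data)
  data.map(lp => LabeledPoint(lp.label, model.transform(lp.features)))
}
```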
      
      Author: Alexander Ulanov <nashb@yandex.ru>
      
      Closes #4709 from avulanov/SPARK-5912 and squashes the following commits:
      
      19a8a4e [Alexander Ulanov] Addressing reviewers comments @jkbradley
      58d9e4d [Alexander Ulanov] Addressing reviewers comments @jkbradley
      eb6b9fe [Alexander Ulanov] Typo
      2921a1d [Alexander Ulanov] ChiSqSelector example of use
      c845350 [Alexander Ulanov] ChiSqSelector docs
      28ccf5ee
    • Jacky Li's avatar
      [SPARK-5939][MLLib] make FPGrowth example app take parameters · 651a1c01
      Jacky Li authored
Add parameter parsing to the FPGrowth example app in Scala and Java.
A sample data file is also added in the data/mllib folder.
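A sketch of what a parameterized run looks like; minSupport and numPartitions stand in for the kind of command-line parameters the example app now accepts:

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// Mine frequent itemsets with configurable support and parallelism;
// returns the number of frequent itemsets found.
def mineFrequentItemsets(transactions: RDD[Array[String]],
                         minSupport: Double = 0.3,
                         numPartitions: Int = 10): Long = {
  val model = new FPGrowth()
    .setMinSupport(minSupport)
    .setNumPartitions(numPartitions)
    .run(transactions)
  model.freqItemsets.count()
}
```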
      
      Author: Jacky Li <jacky.likun@huawei.com>
      
      Closes #4714 from jackylk/parameter and squashes the following commits:
      
      8c478b3 [Jacky Li] fix according to comments
      3bb74f6 [Jacky Li] make FPGrowth exampl app take parameters
      f0e4d10 [Jacky Li] make FPGrowth exampl app take parameters
      651a1c01
    • CodingCat's avatar
      [SPARK-5724] fix the misconfiguration in AkkaUtils · 242d4958
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-5724
      
In AkkaUtils, we set several failure-detector-related parameters as follows:
      
      ```
val akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
            .withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
            s"""
            |akka.daemonic = on
            |akka.loggers = [""akka.event.slf4j.Slf4jLogger""]
            |akka.stdout-loglevel = "ERROR"
            |akka.jvm-exit-on-fatal-error = off
            |akka.remote.require-cookie = "$requireCookie"
            |akka.remote.secure-cookie = "$secureCookie"
            |akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s
            |akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s
            |akka.remote.transport-failure-detector.threshold = $akkaFailureDetector
            |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
            |akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport"
            |akka.remote.netty.tcp.hostname = "$host"
            |akka.remote.netty.tcp.port = $port
            |akka.remote.netty.tcp.tcp-nodelay = on
            |akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
            |akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
            |akka.remote.netty.tcp.execution-pool-size = $akkaThreads
            |akka.actor.default-dispatcher.throughput = $akkaBatchSize
            |akka.log-config-on-start = $logAkkaConfig
            |akka.remote.log-remote-lifecycle-events = $lifecycleEvents
            |akka.log-dead-letters = $lifecycleEvents
            |akka.log-dead-letters-during-shutdown = $lifecycleEvents
            """.stripMargin))
      
      ```
      
Actually, there is no parameter named "akka.remote.transport-failure-detector.threshold"
(see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html);
what we have is "akka.remote.watch-failure-detector.threshold".
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits:
      
      bafe56e [CodingCat] fix the grammar in configuration doc
      338296e [CodingCat] remove failure-detector related info
      8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils
      242d4958
    • Saisai Shao's avatar
      [SPARK-5943][Streaming] Update the test to use new API to reduce the warning · 757b14b8
      Saisai Shao authored
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #4722 from jerryshao/SPARK-5943 and squashes the following commits:
      
      1b01233 [Saisai Shao] Update the test to use new API to reduce the warning
      757b14b8