  1. Apr 26, 2016
    • Jacek Laskowski's avatar
      [MINOR][DOCS] Minor typo fixes · b208229b
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Minor typo fixes (too minor to deserve a separate JIRA)
      
      ## How was this patch tested?
      
      local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #12469 from jaceklaskowski/minor-typo-fixes.
      b208229b
    • Azeem Jiva's avatar
      [SPARK-14756][CORE] Use parseLong instead of valueOf · de6e6334
      Azeem Jiva authored
      ## What changes were proposed in this pull request?
      
      Use Long.parseLong, which returns a primitive.
      Use a series of append() calls, which avoids creating an extra StringBuilder.
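      
      For illustration only (not code from this patch), a minimal Scala sketch of the two points:
      
      ```scala
      // Long.valueOf returns a boxed java.lang.Long that is then unboxed,
      // while Long.parseLong returns the primitive long directly.
      val boxed: java.lang.Long = java.lang.Long.valueOf("42")
      val primitive: Long = java.lang.Long.parseLong("42")
      
      // Chaining append() on a single StringBuilder avoids building
      // intermediate strings (and extra builders) via concatenation.
      val sb = new StringBuilder
      sb.append("executor-").append(primitive).append("-memory")
      val result = sb.toString
      ```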
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: Azeem Jiva <azeemj@gmail.com>
      
      Closes #12520 from javawithjiva/minor.
      de6e6334
    • Subhobrata Dey's avatar
      [SPARK-14889][SPARK CORE] scala.MatchError: NONE (of class scala.Enumeration)... · f70e4fff
      Subhobrata Dey authored
      [SPARK-14889][SPARK CORE] scala.MatchError: NONE (of class scala.Enumeration) when spark.scheduler.mode=NONE
      
      ## What changes were proposed in this pull request?
      
      Handle the exception for the issue shown below:
      
      ```
      ➜  spark git:(master) ✗ ./bin/spark-shell -c spark.scheduler.mode=NONE
      16/04/25 09:15:00 ERROR SparkContext: Error initializing SparkContext.
      scala.MatchError: NONE (of class scala.Enumeration$Val)
      	at org.apache.spark.scheduler.Pool.<init>(Pool.scala:53)
      	at org.apache.spark.scheduler.TaskSchedulerImpl.initialize(TaskSchedulerImpl.scala:131)
      	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2352)
      	at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
      ```
      
      The exception now looks like
      
      ```
      java.lang.RuntimeException: The scheduler mode NONE is not supported by Spark.
      ```
      
      ## How was this patch tested?
      
      manual tests
      
      Author: Subhobrata Dey <sbcd90@gmail.com>
      
      Closes #12666 from sbcd90/schedulerModeIssue.
      f70e4fff
    • Michael Gummelt's avatar
      Fix dynamic allocation docs to address cached data. · 6a7ba1ff
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Documentation changes
      
      ## How was this patch tested?
      
      No tests
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #12664 from mgummelt/fix-dynamic-docs.
      6a7ba1ff
    • BenFradet's avatar
      [SPARK-13962][ML] spark.ml Evaluators should support other numeric types for label · 2a5c9307
      BenFradet authored
      ## What changes were proposed in this pull request?
      
      Made BinaryClassificationEvaluator, MulticlassClassificationEvaluator and RegressionEvaluator accept all numeric types for label
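      
      A small sketch of what this enables (not from the patch itself; assumes the default `label`/`prediction` column names):
      
      ```scala
      import org.apache.spark.ml.evaluation.RegressionEvaluator
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._
      
      // The label column is IntegerType rather than DoubleType; with this change
      // the evaluator casts it to double instead of rejecting it.
      val predictions = Seq((1, 0.9), (0, 0.2), (1, 0.7)).toDF("label", "prediction")
      val rmse = new RegressionEvaluator().setMetricName("rmse").evaluate(predictions)
      ```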
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: BenFradet <benjamin.fradet@gmail.com>
      
      Closes #12500 from BenFradet/SPARK-13962.
      2a5c9307
  2. Apr 25, 2016
    • Reynold Xin's avatar
      f8709218
    • Reynold Xin's avatar
      [HOTFIX] Fix compilation · d2614eaa
      Reynold Xin authored
      d2614eaa
    • Andrew Or's avatar
      [SPARK-14861][SQL] Replace internal usages of SQLContext with SparkSession · 18c2c925
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      In Spark 2.0, `SparkSession` is the new thing. Internally we should stop using `SQLContext` everywhere, since it is no longer supposed to be the main user-facing API.
      
      In this patch I took care to not break any public APIs. The one place that's suspect is `o.a.s.ml.source.libsvm.DefaultSource`, but according to mengxr it's not supposed to be public so it's OK to change the underlying `FileFormat` trait.
      
      **Reviewers**: This is a big patch that may be difficult to review but the changes are actually really straightforward. If you prefer I can break it up into a few smaller patches, but it will delay the progress of this issue a little.
      
      ## How was this patch tested?
      
      No change in functionality intended.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12625 from andrewor14/spark-session-refactor.
      18c2c925
    • Andrew Or's avatar
      [SPARK-14904][SQL] Put removed HiveContext in compatibility module · fa3c0698
      Andrew Or authored
      ## What changes were proposed in this pull request?
      This is for users who can't upgrade and need to continue to use HiveContext.
      
      ## How was this patch tested?
      Added some basic tests for sanity check.
      
      This is based on #12672 and closes #12672.
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #12682 from rxin/add-back-hive-context.
      fa3c0698
    • Sameer Agarwal's avatar
      [SPARK-14870][SQL][FOLLOW-UP] Move decimalDataWithNulls in DataFrameAggregateSuite · c71c6853
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      
      Minor followup to https://github.com/apache/spark/pull/12651
      
      ## How was this patch tested?
      
      Test-only change
      
      Author: Sameer Agarwal <sameer@databricks.com>
      
      Closes #12674 from sameeragarwal/tpcds-fix-2.
      c71c6853
    • Andrew Or's avatar
      [SPARK-14902][SQL] Expose RuntimeConfig in SparkSession · cfa64882
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      `RuntimeConfig` is the new user-facing API in 2.0 added in #11378. Until now, however, it's been dead code. This patch uses `RuntimeConfig` in `SessionState` and exposes that through the `SparkSession`.
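      
      A rough sketch of the intended user-facing access, assuming the accessor is exposed as `conf` on the session:
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      
      // RuntimeConfig backed by the session's SessionState.
      spark.conf.set("spark.sql.shuffle.partitions", "8")
      assert(spark.conf.get("spark.sql.shuffle.partitions") == "8")
      ```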
      
      ## How was this patch tested?
      
      New test in `SQLContextSuite`.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12669 from andrewor14/use-runtime-conf.
      cfa64882
    • Reynold Xin's avatar
      [SPARK-14888][SQL] UnresolvedFunction should use FunctionIdentifier · f36c9c83
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch changes UnresolvedFunction and UnresolvedGenerator to use a FunctionIdentifier rather than just a String for function name. Also changed SessionCatalog to accept FunctionIdentifier in lookupFunction.
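      
      For illustration, a hypothetical use of `FunctionIdentifier` (the database name is invented):
      
      ```scala
      import org.apache.spark.sql.catalyst.FunctionIdentifier
      
      // A function name optionally qualified by a database, mirroring TableIdentifier.
      val builtin   = FunctionIdentifier("upper", None)
      val qualified = FunctionIdentifier("my_udf", Some("analytics_db"))
      ```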
      
      ## How was this patch tested?
      Updated related unit tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #12659 from rxin/SPARK-14888.
      f36c9c83
    • Andrew Or's avatar
      [SPARK-14828][SQL] Start SparkSession in REPL instead of SQLContext · 34336b62
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      ```
      Spark context available as 'sc' (master = local[*], app id = local-1461283768192).
      Spark session available as 'spark'.
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
            /_/
      
      Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> sql("SHOW TABLES").collect()
      16/04/21 17:09:39 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
      16/04/21 17:09:39 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
      res0: Array[org.apache.spark.sql.Row] = Array([src,false])
      
      scala> sql("SHOW TABLES").collect()
      res1: Array[org.apache.spark.sql.Row] = Array([src,false])
      
      scala> spark.createDataFrame(Seq((1, 1), (2, 2), (3, 3)))
      res2: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
      ```
      
      Hive things are loaded lazily.
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12589 from andrewor14/spark-session-repl.
      34336b62
    • Yanbo Liang's avatar
      [SPARK-14312][ML][SPARKR] NaiveBayes model persistence in SparkR · 9cb3ba10
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      SparkR ```NaiveBayesModel``` supports ```save/load``` by the following API:
      ```
      df <- createDataFrame(sqlContext, infert)
      model <- naiveBayes(education ~ ., df, laplace = 0)
      ml.save(model, path)
      model2 <- ml.load(path)
      ```
      
      ## How was this patch tested?
      Add unit tests.
      
      cc mengxr
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #12573 from yanboliang/spark-14312.
      9cb3ba10
    • gatorsmile's avatar
      [SPARK-13739][SQL] Push Predicate Through Window · 0c47e274
      gatorsmile authored
      #### What changes were proposed in this pull request?
      
      For performance, predicates can be pushed through Window if and only if the following conditions are satisfied (see the sketch after this list):
       1. All the expressions are part of the window partitioning key. The expressions can be compound.
       2. The predicate is deterministic.
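      
      A hypothetical example (column names invented, not from this patch) where both conditions hold, so the filter can be evaluated below the Window operator:
      
      ```scala
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.expressions.Window
      import org.apache.spark.sql.functions._
      
      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._
      
      val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("grp", "value")
      val w = Window.partitionBy("grp").orderBy("value")
      
      // The filter references only the partitioning key `grp` and is deterministic,
      // so pushing it below the Window does not change the result.
      val result = df.withColumn("rnk", rank().over(w)).filter($"grp" === "a")
      ```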
      
      #### How was this patch tested?
      
      TODO:
      - [X]  DSL needs to be modified for window
      - [X] more tests will be added.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      Closes #11635 from gatorsmile/pushPredicateThroughWindow.
      0c47e274
    • Andrew Or's avatar
      [SPARK-14721][SQL] Remove HiveContext (part 2) · 3c5e65c3
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.
      
      Note: A couple of things will break after this patch. These will be fixed separately.
      - the python HiveContext
      - all the documentation / comments referencing HiveContext
      - there will be no more HiveContext in the REPL (fixed by #12589)
      
      ## How was this patch tested?
      
      No change in functionality.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12585 from andrewor14/delete-hive-context.
      3c5e65c3
    • Lianhui Wang's avatar
      [SPARK-14731][shuffle]Revert SPARK-12130 to make 2.0 shuffle service compatible with 1.x · 6bfe42a3
      Lianhui Wang authored
      ## What changes were proposed in this pull request?
      SPARK-12130 made the 2.0 shuffle service incompatible with 1.x. Following the discussion at http://apache-spark-developers-list.1001551.n3.nabble.com/YARN-Shuffle-service-and-its-compatibility-td17222.html, we should maintain compatibility between Spark 1.x and Spark 2.x's shuffle service.
      I put the string comparison into the executor's registration so that strings are not compared in getBlockData every time.
      
      ## How was this patch tested?
      N/A
      
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      Closes #12568 from lianhuiwang/SPARK-14731.
      6bfe42a3
    • Yanbo Liang's avatar
      [SPARK-10574][ML][MLLIB] HashingTF supports MurmurHash3 · 425f6916
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      As discussed at [SPARK-10574](https://issues.apache.org/jira/browse/SPARK-10574), ```HashingTF``` should support MurmurHash3 and make it the default hash algorithm. We should also expose a set/get API for ```hashAlgorithm``` so that users can choose the hash method.
      
      Note: The problem that ```mllib.feature.HashingTF``` behaves differently between Scala/Java and Python will be resolved in follow-up work.
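      
      A sketch of the proposed usage, assuming the param is exposed as ```hashAlgorithm``` with a `setHashAlgorithm` setter as described:
      
      ```scala
      import org.apache.spark.ml.feature.HashingTF
      
      // "murmur3" as the proposed default; the setter would let users switch back
      // to the previous native Scala hashCode-based method if needed.
      val tf = new HashingTF()
        .setInputCol("words")
        .setOutputCol("features")
        .setHashAlgorithm("murmur3")
      ```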
      
      ## How was this patch tested?
      unit tests.
      
      cc jkbradley MLnick
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12498 from yanboliang/spark-10574.
      425f6916
    • gatorsmile's avatar
      [SPARK-14892][SQL][TEST] Disable the HiveCompatibilitySuite test case for... · 88e54218
      gatorsmile authored
      [SPARK-14892][SQL][TEST] Disable the HiveCompatibilitySuite test case for INPUTDRIVER and OUTPUTDRIVER.
      
      #### What changes were proposed in this pull request?
      Disable the test case involving INPUTDRIVER and OUTPUTDRIVER, which are not supported.
      
      #### How was this patch tested?
      N/A
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #12662 from gatorsmile/disableInOutDriver.
      88e54218
    • Joseph K. Bradley's avatar
      [MINOR][ML][PYTHON][DOC] Remove use of JavaMLWriter/Reader in public Python API docs · c7758ba3
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Removed instances of JavaMLWriter, JavaMLReader appearing in public Python API docs
      
      ## How was this patch tested?
      
      n/a
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12542 from jkbradley/javamlwriter-doc.
      c7758ba3
    • wm624@hotmail.com's avatar
      [SPARK-14433][PYSPARK][ML] PySpark ml GaussianMixture · b50e2eca
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      Add Python API in ML for GaussianMixture
      
      ## How was this patch tested?
      
      Added doctests; the test cases are the same as the mllib Python tests.
      ./dev/lint-python
      PEP8 checks passed.
      rm -rf _build/*
      pydoc checks passed.
      
      ./python/run-tests --python-executables=python2.7 --modules=pyspark-ml
      Running PySpark tests. Output is in /Users/mwang/spark_ws_0904/python/unit-tests.log
      Will test against the following Python executables: ['python2.7']
      Will test the following Python modules: ['pyspark-ml']
      Finished test(python2.7): pyspark.ml.evaluation (18s)
      Finished test(python2.7): pyspark.ml.clustering (40s)
      Finished test(python2.7): pyspark.ml.classification (49s)
      Finished test(python2.7): pyspark.ml.recommendation (44s)
      Finished test(python2.7): pyspark.ml.feature (64s)
      Finished test(python2.7): pyspark.ml.regression (45s)
      Finished test(python2.7): pyspark.ml.tuning (30s)
      Finished test(python2.7): pyspark.ml.tests (56s)
      Tests passed in 106 seconds
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #12402 from wangmiao1981/gmm.
      b50e2eca
    • Marcelo Vanzin's avatar
      [SPARK-14744][EXAMPLES] Clean up examples packaging, remove outdated examples. · a680562a
      Marcelo Vanzin authored
      First, make all dependencies in the examples module provided, and explicitly
      list a couple that are somehow promoted to compile scope by Maven. This
      means that to run streaming examples, the streaming connector package needs
      to be provided to run-examples using --packages or --jars, just like regular
      apps.
      
      Also, remove a couple of outdated examples. HBase has had Spark bindings for
      a while and is even including them in the HBase distribution in the next
      version, making the examples obsolete. The same applies to Cassandra, which
      seems to have a proper Spark binding library already.
      
      I just tested the build, which passes, and ran SparkPi. The examples jars
      directory now has only two jars:
      
      ```
      $ ls -1 examples/target/scala-2.11/jars/
      scopt_2.11-3.3.0.jar
      spark-examples_2.11-2.0.0-SNAPSHOT.jar
      ```
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12544 from vanzin/SPARK-14744.
      a680562a
    • Jason Lee's avatar
      [SPARK-14768][ML][PYSPARK] removed expectedType from Param __init__() · bfda0999
      Jason Lee authored
      ## What changes were proposed in this pull request?
      Removed expectedType arg from PySpark Param __init__, as suggested by the JIRA.
      
      ## How was this patch tested?
      Manually looked through all places that use Param. Compiled and ran all ML PySpark test cases before and after the fix.
      
      Author: Jason Lee <cjlee@us.ibm.com>
      
      Closes #12581 from jasoncl/SPARK-14768.
      bfda0999
    • Cheng Lian's avatar
      [SPARK-14875][SQL] Makes OutputWriterFactory.newInstance public · e66afd5c
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This method was accidentally made `private[sql]` in Spark 2.0. This PR makes it public again, since 3rd party data sources like spark-avro depend on it.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #12652 from liancheng/spark-14875.
      e66afd5c
    • Peter Ableda's avatar
      [SPARK-14636] Add minimum memory checks for drivers and executors · cef77d1f
      Peter Ableda authored
      ## What changes were proposed in this pull request?
      
      Implement the same memory size validations for the StaticMemoryManager (Legacy) as the UnifiedMemoryManager has.
      
      ## How was this patch tested?
      
      Manual tests were done in CDH cluster.
      
      Test with small executor memory:
      ```
      spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client --master yarn --executor-memory 15m --conf spark.memory.useLegacyMode=true /opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples*.jar 10
      ```
      
      Exception thrown:
      ```
      ERROR spark.SparkContext: Error initializing SparkContext.
      java.lang.IllegalArgumentException: Executor memory 15728640 must be at least 471859200. Please increase executor memory using the --executor-memory option or spark.executor.memory in Spark configuration.
      	at org.apache.spark.memory.StaticMemoryManager$.org$apache$spark$memory$StaticMemoryManager$$getMaxExecutionMemory(StaticMemoryManager.scala:127)
      	at org.apache.spark.memory.StaticMemoryManager.<init>(StaticMemoryManager.scala:46)
      	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:352)
      	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
      	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:289)
      	at org.apache.spark.SparkContext.<init>(SparkContext.scala:462)
      	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
      	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
      	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
      	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
      	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
      	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      ```
      
      Author: Peter Ableda <peter.ableda@cloudera.com>
      
      Closes #12395 from peterableda/SPARK-14636.
      cef77d1f
    • Zheng RuiFeng's avatar
      [SPARK-14758][ML] Add checking for StepSize and Tol · e6f954a5
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Add checking for StepSize and Tol in sharedParams.
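      
      A rough sketch of the kind of validation this adds (names and bounds are illustrative, not the exact sharedParams code):
      
      ```scala
      import org.apache.spark.ml.param.{DoubleParam, ParamValidators, Params}
      
      trait HasStepSizeAndTolExample extends Params {
        // stepSize must be strictly positive; tol must be non-negative.
        final val stepSize = new DoubleParam(this, "stepSize", "step size (> 0)", ParamValidators.gt(0))
        final val tol = new DoubleParam(this, "tol", "convergence tolerance (>= 0)", ParamValidators.gtEq(0))
      }
      ```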
      
      ## How was this patch tested?
      Unit tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12530 from zhengruifeng/ml_args_checking.
      e6f954a5
    • Eric Liang's avatar
      [SPARK-14790] Always run scalastyle on sbt compile and test · 761fc46c
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
      Sbt compile and test should also run scalastyle. This makes it less likely that you forget to run scalastyle and then fail in Jenkins. Scalastyle results are cached for efficiency.
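      
      Roughly, the standard scalastyle-sbt-plugin recipe for this looks like the following (a sketch for a plain build.sbt, not the actual SparkBuild change):
      
      ```scala
      // In the sbt build definition: run scalastyle before every compile.
      lazy val compileScalastyle = taskKey[Unit]("run scalastyle as part of compile")
      
      compileScalastyle := (scalastyle in Compile).toTask("").value
      
      (compile in Compile) := ((compile in Compile) dependsOn compileScalastyle).value
      ```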
      
      This patch was originally written by ahirreddy; I just fixed it up to work with scalastyle 0.8.0.
      
      ## How was this patch tested?
      
      Tested manually with `build/sbt package`.
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #12555 from ericl/scalastyle.
      761fc46c
    • Sameer Agarwal's avatar
      [SPARK-14870] [SQL] Fix NPE in TPCDS q14a · cbdcd4ed
      Sameer Agarwal authored
      ## What changes were proposed in this pull request?
      
      This PR fixes a bug in `TungstenAggregate` that manifests while aggregating by keys over nullable `BigDecimal` columns. This causes a null pointer exception while executing TPCDS q14a.
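      
      A sketch of the failing shape (illustrative data, not the exact regression test): aggregating with a nullable decimal grouping key.
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._
      
      // A nullable BigDecimal grouping column: the None row is the case that
      // previously led to the null pointer exception.
      val df = Seq(("a", Some(BigDecimal("1.10"))), ("b", None: Option[BigDecimal]))
        .toDF("key", "dec")
      df.groupBy("key", "dec").count().collect()
      ```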
      
      ## How was this patch tested?
      
      1. Added regression test in `DataFrameAggregateSuite`.
      2. Verified that TPCDS q14a works
      
      Author: Sameer Agarwal <sameer@databricks.com>
      
      Closes #12651 from sameeragarwal/tpcds-fix.
      cbdcd4ed
    • felixcheung's avatar
      [SPARK-14881] [PYTHON] [SPARKR] pyspark and sparkR shell default log level... · c752b6c5
      felixcheung authored
      [SPARK-14881] [PYTHON] [SPARKR] pyspark and sparkR shell default log level should match spark-shell/Scala
      
      ## What changes were proposed in this pull request?
      
      Change default logging to WARN for pyspark shell and sparkR shell for a much cleaner environment.
      
      ## How was this patch tested?
      
      Manually running pyspark and sparkR shell
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #12648 from felixcheung/pylogging.
      c752b6c5
    • Dongjoon Hyun's avatar
      [SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date · 6ab4d9e0
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules.
      
      - Remove the wrong usage of `map`. We need to use `lapply` in `sparkR` if needed. However, `lapply` is private so far. The corrected example will be added later.
      - Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency
      - Fix datatypes in `sparkr.md`.
      - Update a data result in `sparkr.md`.
      - Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet
      - Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet
      - Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
      - Other minor syntax fixes and a typo.
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12649 from dongjoon-hyun/SPARK-14883.
      6ab4d9e0
  3. Apr 24, 2016
    • Yin Huai's avatar
      [SPARK-14885][SQL] When creating a CatalogColumn, we should use the... · 35319d32
      Yin Huai authored
      [SPARK-14885][SQL] When creating a CatalogColumn, we should use the catalogString of a DataType object.
      
      ## What changes were proposed in this pull request?
      
      Right now, the data type field of a CatalogColumn is using the string representation. When we create this string from a DataType object, there are places where we use simpleString instead of catalogString. Although catalogString is the same as simpleString right now, it is still good to use catalogString, so that we do not silently introduce issues when we change the semantics of simpleString or the implementation of catalogString.
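      
      For reference, a small illustration of the two methods on a `DataType` (as noted above, the outputs currently coincide):
      
      ```scala
      import org.apache.spark.sql.types.{ArrayType, IntegerType}
      
      val dt = ArrayType(IntegerType)
      dt.simpleString   // "array<int>"
      dt.catalogString  // same today, but kept separate so the catalog form can evolve independently
      ```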
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12654 from yhuai/useCatalogString.
      35319d32
    • Dongjoon Hyun's avatar
      [SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix lint-java errors · d34d6503
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Spark enforces the `NewLineAtEofChecker` rule for Scala code via ScalaStyle, and most Java code already complies with it. This PR enforces the equivalent rule `NewlineAtEndOfFile` explicitly via Checkstyle. It also fixes the lint-java errors introduced since SPARK-14465. The changes are:
      
      - Adds a newline at the end of the files (19 files)
      - Fixes 25 lint-java errors (12 RedundantModifier, 6 **ArrayTypeStyle**, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder)
      
      ## How was this patch tested?
      
      After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins does not run lint-java.)
      ```bash
      $ dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12632 from dongjoon-hyun/SPARK-14868.
      d34d6503
    • Reynold Xin's avatar
      [SPARK-14876][SQL] SparkSession should be case insensitive by default · d0ca5797
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch changes SparkSession to be case insensitive by default, in order to match other database systems.
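      
      For illustration (invented column names), the behaviour this defaults to:
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._
      
      val df = Seq((1, 2)).toDF("colA", "colB")
      // With case-insensitive resolution as the default, a differently-cased
      // column reference still resolves instead of failing analysis.
      df.select("COLA").show()
      ```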
      
      ## How was this patch tested?
      N/A - I'm sure some tests will fail and I will need to fix those.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #12643 from rxin/SPARK-14876.
      d0ca5797
    • Reynold Xin's avatar
      Disable flaky script transformation test · 0c8e5332
      Reynold Xin authored
      0c8e5332
    • jliwork's avatar
      [SPARK-14548][SQL] Support not greater than and not less than operator in Spark SQL · f0f1a8af
      jliwork authored
      `!<` means "not less than", which is equivalent to `>=`.
      `!>` means "not greater than", which is equivalent to `<=`.
      
      I'd like to create a PR to support these two operators.
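      
      A quick sketch of the semantics (table and data invented):
      
      ```scala
      import org.apache.spark.sql.SparkSession
      
      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._
      
      Seq(("alice", 17), ("bob", 40)).toDF("name", "age").registerTempTable("people")
      
      // `age !< 18` is equivalent to `age >= 18`; `age !> 65` would be `age <= 65`.
      spark.sql("SELECT name FROM people WHERE age !< 18").show()
      ```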
      
      I've added new test cases in: DataFrameSuite, ExpressionParserSuite, JDBCSuite, PlanParserSuite, SQLQuerySuite
      
      dilipbiswal viirya gatorsmile
      
      Author: jliwork <jiali@us.ibm.com>
      
      Closes #12316 from jliwork/SPARK-14548.
      f0f1a8af
    • gatorsmile's avatar
      [SPARK-14691][SQL] Simplify and Unify Error Generation for Unsupported Alter Table DDL · 337289d7
      gatorsmile authored
      #### What changes were proposed in this pull request?
      So far, we capture each unsupported Alter Table statement in a separate visit function. These should be unified to issue the same ParseException instead.
      
      This PR is to refactor the existing implementation and make error message consistent for Alter Table DDL.
      
      #### How was this patch tested?
      Updated the existing test cases and also added new test cases to ensure all the unsupported statements are covered.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      Closes #12459 from gatorsmile/cleanAlterTable.
      337289d7
    • Jacek Laskowski's avatar
      [DOCS][MINOR] Screenshot + minor fixes to improve reading for accumulators · 8df8a818
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Added screenshot + minor fixes to improve reading
      
      ## How was this patch tested?
      
      Manual
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #12569 from jaceklaskowski/docs-accumulators.
      8df8a818
    • Steve Loughran's avatar
      [SPARK-13267][WEB UI] document the ?param arguments of the REST API; lift the… · db7113b1
      Steve Loughran authored
      Add to the REST API documentation details on the `?` arguments, with examples from the test suite.
      
      I've used the existing table, adding all the fields to the second table.
      
      see [in the pr](https://github.com/steveloughran/spark/blob/history/SPARK-13267-doc-params/docs/monitoring.md).
      
      There's a slightly more sophisticated option: make the table 3 columns wide, and for all existing entries, have the initial `td` span 2 columns. The new entries would then have an empty 1st column, param in 2nd and text in 3rd, with any examples after a `br` entry.
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #11152 from steveloughran/history/SPARK-13267-doc-params.
      db7113b1
    • mathieu longtin's avatar
      Support single argument version of sqlContext.getConf · 902c15c5
      mathieu longtin authored
      ## What changes were proposed in this pull request?
      
      In Python, sqlContext.getConf didn't allow getting the system default (getConf with one parameter).
      
      Now the following are supported:
      ```
      sqlContext.getConf(confName)  # System default if not locally set, this is new
      sqlContext.getConf(confName, myDefault)  # myDefault if not locally set, old behavior
      ```
      
      I also added doctests to this function. The original behavior does not change.
      
      ## How was this patch tested?
      
      Manually, but doctests were added.
      
      Author: mathieu longtin <mathieu.longtin@nuance.com>
      
      Closes #12488 from mathieulongtin/pyfixgetconf3.
      902c15c5
    • Yin Huai's avatar
      [SPARK-14879][SQL] Move CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect to sql/core · 1672149c
      Yin Huai authored
      ## What changes were proposed in this pull request?
      
      CreateMetastoreDataSource and CreateMetastoreDataSourceAsSelect are not Hive-specific. So, this PR moves them from sql/hive to sql/core. Also, I am adding `Command` suffix to these two classes.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #12645 from yhuai/moveCreateDataSource.
      1672149c