  1. Jun 24, 2015
    • [SPARK-8578] [SQL] Should ignore user defined output committer when appending data · bba6699d
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-8578
      
      It is not safe to use a custom output committer when appending data to an existing directory. This change adds logic to check whether we are appending data and, if so, uses the output committer associated with the file output format.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6964 from yhuai/SPARK-8578 and squashes the following commits:
      
      43544c4 [Yin Huai] Do not use a custom output commiter when appendiing data.
      bba6699d
    • [SPARK-8567] [SQL] Debugging flaky HiveSparkSubmitSuite · 9d36ec24
      Cheng Lian authored
      Uses an approach similar to the one in `HiveThriftServer2Suite`: prints the stdout/stderr of the spawned process instead of logging them, to see what happens on Jenkins. (This test suite only fails on Jenkins and doesn't produce any log output...)
      
      cc yhuai
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6978 from liancheng/debug-hive-spark-submit-suite and squashes the following commits:
      
      b031647 [Cheng Lian] Prints process stdout/stderr instead of logging them
      9d36ec24
    • [SPARK-8138] [SQL] Improves error message when conflicting partition columns are found · cc465fd9
      Cheng Lian authored
      This PR improves the error message shown when conflicting partition column names are detected.  The old message could be particularly annoying and confusing when there were a large number of partitions and only a handful of them happened to contain unexpected temporary file(s).  Now all suspicious directories are listed, as below:
      
      ```
      java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
      
              Partition column name list #0: b, c, d
              Partition column name list #1: b, c
              Partition column name list #2: b
      
      For partitioned table directories, data files should only live in leaf directories. Please check the following directories for unexpected files:
      
              file:/tmp/foo/b=0
              file:/tmp/foo/b=1
              file:/tmp/foo/b=1/c=1
              file:/tmp/foo/b=0/c=0
      ```
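
The check behind a message like the one above can be sketched in Python as follows. This is a minimal illustration, not Spark's actual implementation: the function names `partition_columns` and `find_conflicts` are hypothetical, and it assumes the caller already knows which directories contain data files.

```python
from collections import defaultdict


def partition_columns(path, base):
    """Return the partition column names encoded in `path` below `base`."""
    relative = path[len(base):].strip("/")
    return tuple(seg.split("=", 1)[0] for seg in relative.split("/") if "=" in seg)


def find_conflicts(base, data_dirs):
    """Group directories that contain data files by their partition column list.

    Returns an empty dict when every directory agrees on the column list,
    otherwise a dict mapping each distinct column list to its directories.
    """
    by_columns = defaultdict(list)
    for directory in data_dirs:
        by_columns[partition_columns(directory, base)].append(directory)
    return dict(by_columns) if len(by_columns) > 1 else {}


# Hypothetical input: data files found at three different depths.
conflicts = find_conflicts(
    "file:/tmp/foo",
    ["file:/tmp/foo/b=0", "file:/tmp/foo/b=1/c=1", "file:/tmp/foo/b=1/c=1/d=2"],
)
for columns, dirs in conflicts.items():
    print(", ".join(columns), "->", dirs)
```

Listing every directory per column list is what lets a user spot the few non-leaf directories that hold stray files.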
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6610 from liancheng/part-errmsg and squashes the following commits:
      
      7d05f2c [Cheng Lian] Fixes Scala style issue
      a149250 [Cheng Lian] Adds test case for the error message
      6b74dd8 [Cheng Lian] Also lists suspicious non-leaf partition directories
      a935eb8 [Cheng Lian] Improves error message when conflicting partition columns are found
      cc465fd9
    • [SPARK-8371] [SQL] improve unit test for MaxOf and MinOf and fix bugs · 09fcf96b
      Wenchen Fan authored
      A follow-up to https://github.com/apache/spark/pull/6813
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6825 from cloud-fan/cg and squashes the following commits:
      
      43170cc [Wenchen Fan] fix bugs in code gen
      09fcf96b
    • [HOTFIX] [BUILD] Fix MiMa checks in master branch; enable MiMa for launcher project · 13ae806b
      Josh Rosen authored
      This commit changes the MiMa tests to test against the released 1.4.0 artifacts rather than 1.4.0-rc4; this change is necessary to fix a Jenkins build break since it seems that the RC4 snapshot is no longer available via Maven.
      
      I also enabled MiMa checks for the `launcher` subproject, which we should have done right after 1.4.0 was released.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6974 from JoshRosen/mima-hotfix and squashes the following commits:
      
      4b4175a [Josh Rosen] [HOTFIX] [BUILD] Fix MiMa checks in master branch; enable MiMa for launcher project
      13ae806b
    • [SPARK-6749] [SQL] Make metastore client robust to underlying socket connection loss · 50c3a86f
      Eric Liang authored
      This works around a bug in the underlying RetryingMetaStoreClient (HIVE-10384) by refreshing the metastore client on Thrift exceptions. We attempt to emulate the proper Hive behavior by retrying only as configured by hiveconf.
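
The refresh-and-retry idea can be sketched in Python roughly as follows. This is only an illustration of the pattern: `TransportError`, `RetryingClient`, `retry_limit`, and `retry_delay_seconds` are hypothetical names, not part of Hive or Spark, and the real change reads its retry settings from the Hive configuration.

```python
import time


class TransportError(Exception):
    """Stand-in for a Thrift transport exception (hypothetical)."""


class RetryingClient:
    """Wraps a client and rebuilds it whenever a call fails on the transport."""

    def __init__(self, make_client, retry_limit, retry_delay_seconds=0.0):
        self._make_client = make_client      # factory that opens a fresh connection
        self._retry_limit = retry_limit      # assumed to come from configuration
        self._retry_delay = retry_delay_seconds
        self._client = make_client()

    def call(self, operation):
        """Run `operation(client)`, refreshing the client on transport errors."""
        attempts = 0
        while True:
            try:
                return operation(self._client)
            except TransportError:
                attempts += 1
                if attempts > self._retry_limit:
                    raise  # give up once the configured number of retries is used
                time.sleep(self._retry_delay)
                self._client = self._make_client()  # drop the broken connection
```

Only transport-level failures trigger a refresh in this sketch; any other exception propagates to the caller unchanged.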
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #6912 from ericl/spark-6749 and squashes the following commits:
      
      2d54b55 [Eric Liang] use conf from state
      0e3a74e [Eric Liang] use shim properly
      980b3e5 [Eric Liang] Fix conf parsing hive 0.14 conf.
      92459b6 [Eric Liang] Work around RetryingMetaStoreClient bug
      50c3a86f
  2. Jun 23, 2015
    • Revert "[SPARK-7157][SQL] add sampleBy to DataFrame" · a458efc6
      Reynold Xin authored
      This reverts commit 0401cbaa.
      
      The new test case on Jenkins is failing.
      a458efc6
    • [SPARK-7157][SQL] add sampleBy to DataFrame · 0401cbaa
      Xiangrui Meng authored
      Add `sampleBy` to DataFrame. rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6769 from mengxr/SPARK-7157 and squashes the following commits:
      
      991f26f [Xiangrui Meng] fix seed
      4a14834 [Xiangrui Meng] move sampleBy to stat
      832f7cc [Xiangrui Meng] add sampleBy to DataFrame
      0401cbaa
    • [SPARK-8139] [SQL] Updates docs and comments of data sources and Parquet output committer options · 111d6b9b
      Cheng Lian authored
      This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6683 from liancheng/output-committer-docs and squashes the following commits:
      
      b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option
      ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
      111d6b9b
    • [SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column is used in booelan expression · 7fb5ae50
      Davies Liu authored
      A common mistake is to put a Column in a boolean expression (together with `and` or `or`), which does not work as expected. We should raise an exception in that case and suggest that the user use `&` or `|` instead.
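
The mechanism can be illustrated with a small stand-alone Python class. `Expr` is a hypothetical stand-in for a column expression, not PySpark's `Column`, and the error text is illustrative. Python's `and`/`or` call `__bool__` on their operands, so raising there turns a silent mistake into an error, while `&` and `|` remain overloadable.

```python
class Expr:
    """Hypothetical expression node that builds a textual predicate."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # `a & b` builds a combined expression instead of evaluating anything.
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Expr(f"({self.text} OR {other.text})")

    def __bool__(self):
        # `a and b` / `a or b` call bool(a) first, so the misuse is caught here.
        raise ValueError(
            "Cannot convert expression to bool: use '&' for 'and', '|' for 'or'"
        )


combined = Expr("age > 18") & Expr("country = 'NL'")
print(combined.text)

try:
    Expr("age > 18") and Expr("country = 'NL'")
except ValueError as err:
    print("rejected:", err)
```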
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6961 from davies/column_bool and squashes the following commits:
      
      9f19beb [Davies Liu] update message
      af74bd6 [Davies Liu] fix tests
      07dff84 [Davies Liu] address comments, fix tests
      f70c08e [Davies Liu] raise Exception if column is used in booelan expression
      7fb5ae50
    • [DOC] [SQL] Addes Hive metastore Parquet table conversion section · d96d7b55
      Cheng Lian authored
      This PR adds a section about Hive metastore Parquet table conversion. It documents:
      
      1. Schema reconciliation rules introduced in #5214 (see [this comment] [1] in #5188)
      2. Metadata refreshing requirement introduced in #5339
      
      [1]: https://github.com/apache/spark/pull/5188#issuecomment-86531248
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits:
      
      42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet
      4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet
      756e660 [Cheng Lian] Adds Python snippet for metadata refreshing
      50675db [Cheng Lian] Addes Hive metastore Parquet table conversion section
      d96d7b55
    • [SPARK-8525] [MLLIB] fix LabeledPoint parser when there is a whitespace between label and features vector · a8031183
      Oleksiy Dyagilev authored
      
      Fix the LabeledPoint parser when there is whitespace between the label and the features vector, e.g.
      (y, [x1, x2, x3])
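
A whitespace-tolerant parser for this text format can be sketched in Python as below. This is a simplified illustration of the idea, handling dense vectors only, and `parse_labeled_point` is a hypothetical name, not the MLlib parser itself.

```python
def parse_labeled_point(text):
    """Parse "(label, [f1, f2, ...])" into (label, features), ignoring spaces."""
    body = text.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError(f"expected parentheses around: {text!r}")
    # Split on the first comma only: everything after it is the vector.
    label_part, _, features_part = body[1:-1].partition(",")
    features_part = features_part.strip()
    if not (features_part.startswith("[") and features_part.endswith("]")):
        raise ValueError(f"expected a bracketed feature vector in: {text!r}")
    inner = features_part[1:-1].strip()
    # float() already tolerates surrounding whitespace in each element.
    features = [float(x) for x in inner.split(",")] if inner else []
    return float(label_part), features


print(parse_labeled_point("(1.0, [1.0, 2.0, 3.0])"))
print(parse_labeled_point("(1.0,[1.0,2.0,3.0])"))
```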
      
      Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>
      
      Closes #6954 from fe2s/SPARK-8525 and squashes the following commits:
      
      0755b9d [Oleksiy Dyagilev] [SPARK-8525][MLLIB] addressing comment, removing dep on commons-lang
      c1abc2b [Oleksiy Dyagilev] [SPARK-8525][MLLIB] fix LabeledPoint parser when there is a whitespace on specific position
      a8031183
    • [SPARK-8111] [SPARKR] SparkR shell should display Spark logo and version banner on startup. · f2fb0285
      Alok Singh authored
      spark version is taken from the environment variable SPARK_VERSION
      
      Author: Alok  Singh <singhal@Aloks-MacBook-Pro.local>
      Author: Alok  Singh <singhal@aloks-mbp.usca.ibm.com>
      
      Closes #6944 from aloknsingh/aloknsingh_spark_jiras and squashes the following commits:
      
      ed607bd [Alok  Singh] [SPARK-8111][SparkR] As per suggestion, 1) using the version from sparkContext rather than the Sys.env. 2) change "Welcome to SparkR!" to "Welcome to" followed by Spark logo and version
      acd5b85 [Alok  Singh] fix the jira SPARK-8111 to add the spark version and logo. Currently spark version is taken from the environment variable SPARK_VERSION
      f2fb0285
    • [SPARK-8265] [MLLIB] [PYSPARK] Add LinearDataGenerator to pyspark.mllib.utils · f2022fa0
      MechCoder authored
      It is useful to generate linear data for easy testing of linear models and in general. Scala already has it. This is just a wrapper around the Scala code.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6715 from MechCoder/generate_linear_input and squashes the following commits:
      
      6182884 [MechCoder] Minor changes
      8bda047 [MechCoder] Minor style fixes
      0f1053c [MechCoder] [SPARK-8265] Add LinearDataGenerator to pyspark.mllib.utils
      f2022fa0
    • [SPARK-7888] Be able to disable intercept in linear regression in ml package · 2b1111dd
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6927 from holdenk/SPARK-7888-Be-able-to-disable-intercept-in-Linear-Regression-in-ML-package and squashes the following commits:
      
      0ad384c [Holden Karau] Add MiMa excludes
      4016fac [Holden Karau] Switch to wild card import, remove extra blank lines
      ae5baa8 [Holden Karau] CR feedback, move the fitIntercept down rather than changing ymean and etc above
      f34971c [Holden Karau] Fix some more long lines
      319bd3f [Holden Karau] Fix long lines
      3bb9ee1 [Holden Karau] Update the regression suite tests
      7015b9f [Holden Karau] Our code performs the same with R, except we need more than one data point but that seems reasonable
      0b0c8c0 [Holden Karau] fix the issue with the sample R code
      e2140ba [Holden Karau] Add a test, it fails!
      5e84a0b [Holden Karau] Write out thoughts and use the correct trait
      91ffc0a [Holden Karau] more murh
      006246c [Holden Karau] murp?
      2b1111dd
    • [SPARK-8432] [SQL] fix hashCode() and equals() of BinaryType in Row · 6f4cadf5
      Davies Liu authored
      Also added more tests in LiteralExpressionSuite
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6876 from davies/fix_hashcode and squashes the following commits:
      
      429c2c0 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_hashcode
      32d9811 [Davies Liu] fix test
      a0626ed [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_hashcode
      89c2432 [Davies Liu] fix style
      bd20780 [Davies Liu] check with catalyst types
      41caec6 [Davies Liu] change for to while
      d96929b [Davies Liu] address comment
      6ad2a90 [Davies Liu] fix style
      5819d33 [Davies Liu] unify equals() and hashCode()
      0fff25d [Davies Liu] fix style
      53c38b1 [Davies Liu] fix hashCode() and equals() of BinaryType in Row
      6f4cadf5
    • [SPARK-7235] [SQL] Refactor the grouping sets · 7b1450b6
      Cheng Hao authored
      The logical plan `Expand` takes the `output` as a constructor argument, which breaks the reference chain. We need to refactor the code, as well as the column pruning.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #5780 from chenghao-intel/expand and squashes the following commits:
      
      76e4aa4 [Cheng Hao] revert the change for case insenstive
      7c10a83 [Cheng Hao] refactor the grouping sets
      7b1450b6
    • [SQL] [DOCS] updated the documentation for explode · 4f7fbefb
      lockwobr authored
      The syntax in the `explode` example was incorrect.
      
      Author: lockwobr <lockwobr@gmail.com>
      
      Closes #6943 from lockwobr/master and squashes the following commits:
      
      3d864d1 [lockwobr] updated the documentation for explode
      4f7fbefb
    • [SPARK-8498] [TUNGSTEN] fix npe in errorhandling path in unsafeshuffle writer · 0f92be5b
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6918 from holdenk/SPARK-8498-fix-npe-in-errorhandling-path-in-unsafeshuffle-writer and squashes the following commits:
      
      f807832 [Holden Karau] Log error if we can't throw it
      855f9aa [Holden Karau] Spelling - not my strongest suite. Fix Propegates to Propagates.
      039d620 [Holden Karau] Add missing closeandwriteoutput
      30e558d [Holden Karau] go back to try/finally
      e503b8c [Holden Karau] Improve the test to ensure we aren't masking the underlying exception
      ae0b7a7 [Holden Karau] Fix the test
      2e6abf7 [Holden Karau] Be more cautious when cleaning up during failed write and re-throw user exceptions
      0f92be5b
    • [SPARK-8300] DataFrame hint for broadcast join. · 6ceb1696
      Reynold Xin authored
      Users can now do
      ```scala
      left.join(broadcast(right), "joinKey")
      ```
      to give the query planner a hint that the `right` DataFrame is small and should be broadcast.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6751 from rxin/broadcastjoin-hint and squashes the following commits:
      
      953eec2 [Reynold Xin] Code review feedback.
      88752d8 [Reynold Xin] Fixed import.
      8187b88 [Reynold Xin] [SPARK-8300] DataFrame hint for broadcast join.
      6ceb1696
    • [SPARK-8541] [PYSPARK] test the absolute error in approx doctests · f0dcbe8a
      Scott Taylor authored
      A minor change but one which is (presumably) visible on the public api docs webpage.
      
      Author: Scott Taylor <github@megatron.me.uk>
      
      Closes #6942 from megatron-me-uk/patch-3 and squashes the following commits:
      
      fbed000 [Scott Taylor] test the absolute error in approx doctests
      f0dcbe8a
    • [SPARK-8483] [STREAMING] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0 · 9b618fb0
      Hari Shreedharan authored
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6910 from harishreedharan/remove-commons-lang3 and squashes the following commits:
      
      9875f7d [Hari Shreedharan] Revert back to Flume 1.4.0
      ca35eb0 [Hari Shreedharan] [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
      9b618fb0
    • [SPARK-8359] [SQL] Fix incorrect decimal precision after multiplication · 31bd3068
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8359
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6814 from viirya/fix_decimal2 and squashes the following commits:
      
      071a757 [Liang-Chi Hsieh] Remove maximum precision and use MathContext.UNLIMITED.
      df217d4 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal2
      a43bfc3 [Liang-Chi Hsieh] Add MathContext with maximum supported precision.
      72eeb3f [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal2
      44c9348 [Liang-Chi Hsieh] Fix incorrect decimal precision after multiplication.
      31bd3068
    • [SPARK-8431] [SPARKR] Add in operator to DataFrame Column in SparkR · d4f63351
      Yu ISHIKAWA authored
      [[SPARK-8431] Add in operator to DataFrame Column in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8431)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6941 from yu-iskw/SPARK-8431 and squashes the following commits:
      
      1f64423 [Yu ISHIKAWA] Modify the comment
      f4309a7 [Yu ISHIKAWA] Make a `setMethod` for `%in%` be independent
      6e37936 [Yu ISHIKAWA] Modify a variable name
      c196173 [Yu ISHIKAWA] [SPARK-8431][SparkR] Add in operator to DataFrame Column in SparkR
      d4f63351
    • [SPARK-7781] [MLLIB] gradient boosted trees.train regressor missing max bins · 164fe2aa
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #6331 from holdenk/SPARK-7781-GradientBoostedTrees.trainRegressor-missing-max-bins and squashes the following commits:
      
      2894695 [Holden Karau] remove extra blank line
      2573e8d [Holden Karau] Update the scala side of the pythonmllibapi and make the test a bit nicer too
      3a09170 [Holden Karau] add maxBins to to the train method as well
      af7f274 [Holden Karau] Add maxBins to GradientBoostedTrees.trainRegressor and correctly mention the default of 32 in other places where it mentioned 100
      164fe2aa
  3. Jun 22, 2015
    • [SPARK-8548] [SPARKR] Remove the trailing whitespaces from the SparkR files · 44fa7df6
      Yu ISHIKAWA authored
      [[SPARK-8548] Remove the trailing whitespaces from the SparkR files - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8548)
      
      - This is the result of `lint-r`
          https://gist.github.com/yu-iskw/0019b37a2c1167f33986
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6945 from yu-iskw/SPARK-8548 and squashes the following commits:
      
      0bd567a [Yu ISHIKAWA] [SPARK-8548][SparkR] Remove the trailing whitespaces from the SparkR files
      44fa7df6
    • MAINTENANCE: Automated closing of pull requests. · c4d23439
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #2849 (close requested by 'srowen')
      Closes #2786 (close requested by 'andrewor14')
      Closes #4678 (close requested by 'JoshRosen')
      Closes #5457 (close requested by 'andrewor14')
      Closes #3346 (close requested by 'andrewor14')
      Closes #6518 (close requested by 'andrewor14')
      Closes #5403 (close requested by 'pwendell')
      Closes #2110 (close requested by 'srowen')
      c4d23439
    • [SPARK-7859] [SQL] Collect_set() behavior differences which fails the unit test under jdk8 · 13321e65
      Cheng Hao authored
      To reproduce that:
      ```
      JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 build/sbt -Phadoop-2.3 -Phive  'test-only org.apache.spark.sql.hive.execution.HiveWindowFunctionQueryWithoutCodeGenSuite'
      ```
      
      A simple workaround is to update the original query to get the output size instead of the exact elements of the array (output by collect_set()).
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #6402 from chenghao-intel/windowing and squashes the following commits:
      
      99312ad [Cheng Hao] add order by for the select clause
      edf8ce3 [Cheng Hao] update the code as suggested
      7062da7 [Cheng Hao] fix the collect_set() behaviour differences under different versions of JDK
      13321e65
    • [SPARK-8307] [SQL] improve timestamp from parquet · 6b7f2cea
      Davies Liu authored
      This PR converts the Julian day to a Unix timestamp directly (without going through Calendar and Timestamp).
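
The direct conversion is plain integer arithmetic, sketched here in Python. The sketch assumes the Parquet value is split into a Julian day number plus nanoseconds within that day, and that the result is wanted in microseconds since the Unix epoch. Both the function name and the choice of microseconds are assumptions for illustration, not taken from the PR.

```python
JULIAN_DAY_OF_EPOCH = 2440588   # Julian day number of 1970-01-01
SECONDS_PER_DAY = 86400


def julian_to_unix_micros(julian_day, nanos_of_day):
    """Convert (Julian day, nanoseconds within the day) to Unix microseconds."""
    seconds = (julian_day - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY
    return seconds * 1_000_000 + nanos_of_day // 1_000


# The epoch day itself maps to zero.
print(julian_to_unix_micros(2440588, 0))
```

Because only subtraction, multiplication, and integer division are involved, no calendar or timestamp objects need to be allocated per value.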
      
      cc adrian-wang rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6759 from davies/improve_ts and squashes the following commits:
      
      849e301 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
      b0e4cad [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
      8e2d56f [Davies Liu] address comments
      634b9f5 [Davies Liu] fix mima
      4891efb [Davies Liu] address comment
      bfc437c [Davies Liu] fix build
      ae5979c [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
      602b969 [Davies Liu] remove jodd
      2f2e48c [Davies Liu] fix test
      8ace611 [Davies Liu] fix mima
      212143b [Davies Liu] fix mina
      c834108 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
      a3171b8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
      5233974 [Davies Liu] fix scala style
      361fd62 [Davies Liu] address comments
      ea196d4 [Davies Liu] improve timestamp from parquet
      6b7f2cea
    • [SPARK-7153] [SQL] support all integral type ordinal in GetArrayItem · 860a49ef
      Wenchen Fan authored
      First convert `ordinal` to `Number`, then convert it to int type.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #5706 from cloud-fan/7153 and squashes the following commits:
      
      915db79 [Wenchen Fan] fix 7153
      860a49ef
    • [HOTFIX] [TESTS] Typo mqqt -> mqtt · 1dfb0f7b
      Andrew Or authored
      This was introduced in #6866.
      1dfb0f7b
    • [SPARK-8492] [SQL] support binaryType in UnsafeRow · 96aa0137
      Davies Liu authored
      Support BinaryType in UnsafeRow, just like StringType.
      
      Also changes the layout of StringType and BinaryType in UnsafeRow by combining the offset and size into a single Long, which limits the size of a Row to under 2G (given that no single buffer can be bigger than 2G in the JVM).
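
Packing two 32-bit quantities into one 64-bit word can be sketched in Python as follows. Which half holds the offset and which holds the size is an assumption made for illustration, and `pack_offset_and_size` / `unpack_offset_and_size` are hypothetical names, not UnsafeRow methods.

```python
MAX_INT32 = 2**31 - 1  # largest value of a signed 32-bit int, hence the 2G limit


def pack_offset_and_size(offset, size):
    """Store `offset` in the upper 32 bits and `size` in the lower 32 bits."""
    if not (0 <= offset <= MAX_INT32 and 0 <= size <= MAX_INT32):
        raise ValueError("offset and size must each fit in a signed 32-bit int")
    return (offset << 32) | size


def unpack_offset_and_size(word):
    """Recover (offset, size) from a packed 64-bit word."""
    return word >> 32, word & 0xFFFFFFFF


word = pack_offset_and_size(16, 5)
print(unpack_offset_and_size(word))
```

A single fixed-width word per variable-length field means one read recovers both where the bytes start and how many there are.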
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6911 from davies/unsafe_bin and squashes the following commits:
      
      d68706f [Davies Liu] update comment
      519f698 [Davies Liu] address comment
      98a964b [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_bin
      180b49d [Davies Liu] fix zero-out
      22e4c0a [Davies Liu] zero-out padding bytes
      6abfe93 [Davies Liu] fix style
      447dea0 [Davies Liu] support binaryType in UnsafeRow
      96aa0137
    • [SPARK-8356] [SQL] Reconcile callUDF and callUdf · 50d3242d
      BenFradet authored
      Deprecates `callUdf` in favor of `callUDF`.
      
      Author: BenFradet <benjamin.fradet@gmail.com>
      
      Closes #6902 from BenFradet/SPARK-8356 and squashes the following commits:
      
      ef4e9d8 [BenFradet] deprecated callUDF, use udf instead
      9b1de4d [BenFradet] reinstated unit test for the deprecated callUdf
      cbd80a5 [BenFradet] deprecated callUdf in favor of callUDF
      50d3242d
    • [SPARK-8537] [SPARKR] Add a validation rule about the curly braces in SparkR to `.lintr` · b1f3a489
      Yu ISHIKAWA authored
      [[SPARK-8537] Add a validation rule about the curly braces in SparkR to `.lintr` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8537)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6940 from yu-iskw/SPARK-8537 and squashes the following commits:
      
      7eec1a0 [Yu ISHIKAWA] [SPARK-8537][SparkR] Add a validation rule about the curly braces in SparkR to `.lintr`
      b1f3a489
    • [SPARK-8455] [ML] Implement n-gram feature transformer · afe35f05
      Feynman Liang authored
      Implementation of n-gram feature transformer for ML.
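
The core of an n-gram transformation can be sketched in a few lines of Python. This is an illustration of the technique rather than the Spark ML code: the function name `ngrams` is hypothetical and joining tokens with a single space is an assumption. It does reproduce the corner case named in the squashed commits, where `n` larger than the input length yields empty output.

```python
def ngrams(tokens, n):
    """Return all runs of `n` consecutive tokens, each joined by a space."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # range() is empty when n > len(tokens), so the result is an empty list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams(["the", "quick", "brown", "fox"], 2))
print(ngrams(["short"], 3))
```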
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #6887 from feynmanliang/ngram-featurizer and squashes the following commits:
      
      d2c839f [Feynman Liang] Make n > input length yield empty output
      9fadd36 [Feynman Liang] Add empty and corner test cases, fix names and spaces
      fe93873 [Feynman Liang] Implement n-gram feature transformer
      afe35f05
    • [SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode · 5ab9fcfb
      Yin Huai authored
      
      https://issues.apache.org/jira/browse/SPARK-8532
      
      This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6937 from yhuai/SPARK-8532 and squashes the following commits:
      
      f972d5d [Yin Huai] davies's comment.
      d37abd2 [Yin Huai] style.
      d21290a [Yin Huai] Python doc.
      889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet.
      7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value.
      d696dff [Yin Huai] Python style.
      88eb6c4 [Yin Huai] If mode is "error", do not call mode method.
      c40c461 [Yin Huai] Regression test.
      5ab9fcfb
    • [SPARK-8104] [SQL] auto alias expressions in analyzer · da7bbb94
      Wenchen Fan authored
      Currently we auto-alias expressions in the parser. However, during the parsing phase we don't have enough information to choose the right alias. For example, a Generator that produces more than one kind of element needs a MultiAlias, and an ExtractValue doesn't need an Alias if it is in the middle of an ExtractValue chain.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6647 from cloud-fan/alias and squashes the following commits:
      
      552eba4 [Wenchen Fan] fix python
      5b5786d [Wenchen Fan] fix agg
      73a90cb [Wenchen Fan] fix case-preserve of ExtractValue
      4cfd23c [Wenchen Fan] fix order by
      d18f401 [Wenchen Fan] refine
      9f07359 [Wenchen Fan] address comments
      39c1aef [Wenchen Fan] small fix
      33640ec [Wenchen Fan] auto alias expressions in analyzer
      da7bbb94
    • [SPARK-8511] [PYSPARK] Modify a test to remove a saved model in `regression.py` · 5d89d9f0
      Yu ISHIKAWA authored
      [[SPARK-8511] Modify a test to remove a saved model in `regression.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8511)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6926 from yu-iskw/SPARK-8511 and squashes the following commits:
      
      7cd0948 [Yu ISHIKAWA] Use `shutil.rmtree()` to temporary directories for saving model testings, instead of `os.removedirs()`
      4a01c9e [Yu ISHIKAWA] [SPARK-8511][pyspark] Modify a test to remove a saved model in `regression.py`
      5d89d9f0
    • [SPARK-8482] Added M4 instances to the list. · ba8a4537
      Pradeep Chhetri authored
      AWS recently added M4 instances (https://aws.amazon.com/blogs/aws/the-new-m4-instance-type-bonus-price-reduction-on-m3-c4/).
      
      Author: Pradeep Chhetri <pradeep.chhetri89@gmail.com>
      
      Closes #6899 from pradeepchhetri/master and squashes the following commits:
      
      4f4ea79 [Pradeep Chhetri] Added t2.large instance
      3d2bb6c [Pradeep Chhetri] Added M4 instances to the list
      ba8a4537
    • [SPARK-8429] [EC2] Add ability to set additional tags · 42a1f716
      Stefano Parmesan authored
      Adds the `--additional-tags` parameter, which allows setting additional tags on all the created instances (masters and slaves).
      
      The user can specify multiple tags by separating them with a comma (`,`), while each tag name and value should be separated by a colon (`:`); for example, `Task:MySparkProject,Env:production` would add two tags, `Task` and `Env`, with the given values.
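
The tag format described above can be parsed with a few lines of Python. This is a sketch of the format only: `parse_additional_tags` is a hypothetical helper, not the function used in the EC2 script, and trimming whitespace and rejecting pairs without a colon are choices made here for illustration.

```python
def parse_additional_tags(spec):
    """Parse "Name:Value,Name2:Value2" into a dict of tag names to values."""
    tags = {}
    for pair in spec.split(","):
        if not pair.strip():
            continue  # tolerate an empty spec or a trailing comma
        name, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"tag {pair!r} is missing a ':' separator")
        tags[name.strip()] = value.strip()
    return tags


print(parse_additional_tags("Task:MySparkProject,Env:production"))
```

Splitting on the first colon only means a tag value may itself contain colons.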
      
      Author: Stefano Parmesan <s.parmesan@gmail.com>
      
      Closes #6857 from armisael/patch-1 and squashes the following commits:
      
      c5ac92c [Stefano Parmesan] python style (pep8)
      8e614f1 [Stefano Parmesan] Set multiple tags in a single request
      bfc56af [Stefano Parmesan] Address SPARK-7900 by inceasing sleep time
      daf8615 [Stefano Parmesan] Add ability to set additional tags
      42a1f716