  1. Jul 10, 2015
    • [SPARK-7977] [BUILD] Disallowing println · e14b545d
      Jonathan Alter authored
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7093 from jonalter/SPARK-7977 and squashes the following commits:
      
      ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite
      7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite
      10724b6 [Jonathan Alter] Changing some printlns to logs in tests
      eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0b1dcb4 [Jonathan Alter] More println cleanup
      aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      0c16fa3 [Jonathan Alter] Replacing some printlns with logs
      45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      5c8e283 [Jonathan Alter] Allowing println in audit-release examples
      5b50da1 [Jonathan Alter] Allowing printlns in example files
      ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      83ab635 [Jonathan Alter] Fixing new printlns
      54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977
      1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns
      b837c3a [Jonathan Alter] Disallowing println
      e14b545d
  2. Jul 09, 2015
    • [DOCS] Added important updateStateByKey details · d538919c
      Michael Vogiatzis authored
      The update function runs for *all* existing keys, and returning "None" removes the key-value pair.
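The semantics can be illustrated without Spark. Below is a plain-Python sketch of an updateStateByKey-style update function; all names and the removal rule used here are illustrative, not Spark's actual code:

```python
def update_state(new_values, current_state):
    """Called for *every* existing key each batch, even when new_values
    is empty; returning None removes the key-value pair from the state."""
    if not new_values and current_state is None:
        return None
    total = sum(new_values) + (current_state or 0)
    # Illustrative expiry rule: drop keys whose count is not positive.
    return total if total > 0 else None

def run_batch(state, batch):
    # Emulate one micro-batch: call the update function for all
    # existing keys plus all keys with new values.
    keys = set(state) | set(batch)
    next_state = {}
    for k in keys:
        result = update_state(batch.get(k, []), state.get(k))
        if result is not None:  # None => key removed
            next_state[k] = result
    return next_state
```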
      
      Author: Michael Vogiatzis <michaelvogiatzis@gmail.com>
      
      Closes #7229 from mvogiatzis/patch-1 and squashes the following commits:
      
      e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
      00283ed [Michael Vogiatzis] Removed space
      c2656f9 [Michael Vogiatzis] Moved description farther up
      0a42551 [Michael Vogiatzis] Added important updateStateByKey details
      d538919c
    • [SPARK-8839] [SQL] ThriftServer2 will remove session and execution no matter it's finished or not. · 1903641e
      huangzhaowei authored
      In my test, the numbers of `sessions` and `executions` in ThriftServer2 did not match the number of connections.
      For example, with 200 clients connected to the server, there were more than 200 `sessions` and `executions`.
      So once the `retainedStatements` limit is reached, the server has to remove objects that are not yet finished,
      which can cause the exception described in the [Jira Address](https://issues.apache.org/jira/browse/SPARK-8839)
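The idea of the fix ("add a filter before take": evict only *finished* records when trimming to the retention limit) can be sketched in plain Python; the names below are illustrative, not Spark's internals:

```python
def trim_executions(executions, retained):
    """Evict the oldest *finished* executions until at most `retained`
    remain; in-flight executions are never removed, even if that means
    temporarily exceeding the limit."""
    overflow = len(executions) - retained
    if overflow <= 0:
        return list(executions)
    # Filter before take: only finished entries are candidates.
    removable = [e["id"] for e in executions if e["finished"]][:overflow]
    drop = set(removable)
    return [e for e in executions if e["id"] not in drop]
```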
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #7239 from SaintBacchus/SPARK-8839 and squashes the following commits:
      
      cf7ef40 [huangzhaowei] Remove the a meanless funciton call
      3e9a5a6 [huangzhaowei] Add a filter before take
      9d5ceb8 [huangzhaowei] [SPARK-8839][SQL]ThriftServer2 will remove session and execution no matter it's finished or not.
      1903641e
    • [SPARK-8913] [ML] Simplify LogisticRegression suite to use Vector <-> Vector comparison · 27273046
      Holden Karau authored
      Clean up tests from SPARK-8700.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7335 from holdenk/SPARK-8913-cleanup-tests-from-SPARK-8700-logistic-regression-r2-really-logistic-regression-this-time and squashes the following commits:
      
      e5e2c5f [Holden Karau] Simplify LogisticRegression suite to use Vector <-> Vector comparisions instead of comparing element by element
      27273046
    • [SPARK-8852] [FLUME] Trim dependencies in flume assembly. · 0e78e40c
      Marcelo Vanzin authored
      Also, add support for the *-provided profiles. This avoids repackaging
      things that are already in the Spark assembly, or, in the case of the
      *-provided profiles, are provided by the distribution.
      
      The flume-ng-auth dependency was also excluded since it's not really
      used by Spark.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7247 from vanzin/SPARK-8852 and squashes the following commits:
      
      298a7d5 [Marcelo Vanzin] Feedback.
      c962082 [Marcelo Vanzin] [SPARK-8852] [flume] Trim dependencies in flume assembly.
      0e78e40c
    • [SPARK-8959] [SQL] [HOTFIX] Removes parquet-thrift and libthrift dependencies · 2d45571f
      Cheng Lian authored
      These two dependencies were introduced in #7231 to help testing Parquet compatibility with `parquet-thrift`. However, they somehow crash the Scala compiler in Maven builds.
      
      This PR fixes this issue by:
      
      1. Removing these two dependencies, and
      2. Instead of generating the testing Parquet file programmatically, checking in an actual testing Parquet file generated by `parquet-thrift` as a test resource.
      
      This is just a quick fix to bring back Maven builds. We still need to figure out the root cause, since checked-in binary Parquet files are harder to maintain.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7330 from liancheng/spark-8959 and squashes the following commits:
      
      cf69512 [Cheng Lian] Brings back Maven builds
      2d45571f
    • [SPARK-8538] [SPARK-8539] [ML] Linear Regression Training and Testing Results · a0cc3e5a
      Feynman Liang authored
      Adds results (e.g. objective value at each iteration, residuals) on training and user-specified test sets for LinearRegressionModel.
      
      Notes to Reviewers:
       * Are the `*TrainingResults` and `Results` classes too specialized for `LinearRegressionModel`? Where would be an appropriate level of abstraction?
       * Please check `transient` annotations are correct; the datasets should not be copied and kept during serialization.
       * Any thoughts on `RDD`s versus `DataFrame`s? If using `DataFrame`s, suggested schemas for each intermediate step? Also, how to create a "local DataFrame" without a `sqlContext`?
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #7099 from feynmanliang/SPARK-8538 and squashes the following commits:
      
      d219fa4 [Feynman Liang] Update docs
      4a42680 [Feynman Liang] Change Summary to hold values, move transient annotations down to metrics and predictions DF
      6300031 [Feynman Liang] Code review changes
      0a5e762 [Feynman Liang] Fix build error
      e71102d [Feynman Liang] Merge branch 'master' into SPARK-8538
      3367489 [Feynman Liang] Merge branch 'master' into SPARK-8538
      70f267c [Feynman Liang] Make TrainingSummary transient and remove Serializable from *Summary and RegressionMetrics
      1d9ea42 [Feynman Liang] Fix failing Java test
      a65dfda [Feynman Liang] Make TrainingSummary and metrics serializable, prediction dataframe transient
      0a605d8 [Feynman Liang] Replace Params from LinearRegression*Summary with private constructor vals
      c2fe835 [Feynman Liang] Optimize imports
      02d8a70 [Feynman Liang] Add Params to LinearModel*Summary, refactor tests and add test for evaluate()
      8f999f4 [Feynman Liang] Refactor from jkbradley code review
      072e948 [Feynman Liang] Style
      509ae36 [Feynman Liang] Use DFs and localize serialization to LinearRegressionModel
      9509c79 [Feynman Liang] Fix imports
      b2bbaa3 [Feynman Liang] Refactored LinearRegressionResults API to be more private
      ffceaec [Feynman Liang] Merge branch 'master' into SPARK-8538
      1cedb2b [Feynman Liang] Add test for decreasing objective trace
      dab0aff [Feynman Liang] Add LinearRegressionTrainingResults tests, make test suite code copy+pasteable
      97b0a81 [Feynman Liang] Add LinearRegressionModel.evaluate() to get results on test sets
      dc51bce [Feynman Liang] Style guide fixes
      521f397 [Feynman Liang] Use RDD[(Double, Double)] instead of DF
      2ff5710 [Feynman Liang] Add training results and model summary to ML LinearRegression
      a0cc3e5a
    • [SPARK-8963][ML] cleanup tests in linear regression suite · e29ce319
      Holden Karau authored
      Simplify model weight assertions to use vector comparison; switch to using absTol when comparing with 0.0 intercepts.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7327 from holdenk/SPARK-8913-cleanup-tests-from-SPARK-8700-logistic-regression and squashes the following commits:
      
      5bac185 [Holden Karau] Simplify model weight assertions to use vector comparision, switch to using absTol when comparing with 0.0 intercepts
      e29ce319
    • Closes #6837 · 69165330
      Xiangrui Meng authored
      Closes #7321
      Closes #2634
      Closes #4963
      Closes #2137
      69165330
    • [SPARK-8865] [STREAMING] FIX BUG: check key in kafka params · 89770036
      guowei2 authored
      Author: guowei2 <guowei@growingio.com>
      
      Closes #7254 from guowei2/spark-8865 and squashes the following commits:
      
      48ca17a [guowei2] fix contains key
      89770036
    • [SPARK-7902] [SPARK-6289] [SPARK-8685] [SQL] [PYSPARK] Refactor of... · c9e2ef52
      Davies Liu authored
      [SPARK-7902] [SPARK-6289] [SPARK-8685] [SQL] [PYSPARK] Refactor of serialization for Python DataFrame
      
      This PR fixes the long-standing serialization issue between Python RDDs and DataFrames. It changes to using a customized Pickler for InternalRow to enable customized unpickling (type conversion, especially for UDTs), so we can now support UDTs for UDFs. cc mengxr.
      
      There is no generated `Row` anymore.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7301 from davies/sql_ser and squashes the following commits:
      
      81bef71 [Davies Liu] address comments
      e9217bd [Davies Liu] add regression tests
      db34167 [Davies Liu] Refactor of serialization for Python DataFrame
      c9e2ef52
    • [SPARK-8389] [STREAMING] [PYSPARK] Expose KafkaRDDs offsetRange in Python · 3ccebf36
      jerryshao authored
      This PR proposes a simple way to expose OffsetRange in Python code; the usage of offsetRanges is similar to the Scala/Java way. In Python we can get an OffsetRange like:
      
      ```
      dstream.foreachRDD(lambda r: KafkaUtils.offsetRanges(r))
      ```
      
      The reason I didn't follow what SPARK-8389 suggested is that the Python Kafka API has one extra step to decode the message compared to Scala/Java, which makes the Python API return a transformed RDD/DStream rather than a directly wrapped JavaKafkaRDD. This makes it hard to backtrack to the original RDD to get the offsetRange.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #7185 from jerryshao/SPARK-8389 and squashes the following commits:
      
      4c6d320 [jerryshao] Another way to fix subclass deserialization issue
      e6a8011 [jerryshao] Address the comments
      fd13937 [jerryshao] Fix serialization bug
      7debf1c [jerryshao] bug fix
      cff3893 [jerryshao] refactor the code according to the comments
      2aabf9e [jerryshao] Style fix
      848c708 [jerryshao] Add HasOffsetRanges for Python
      3ccebf36
    • [SPARK-8701] [STREAMING] [WEBUI] Add input metadata in the batch page · 1f6b0b12
      zsxwing authored
      This PR adds `metadata` to `InputInfo`. `InputDStream` can report its metadata for a batch and it will be shown in the batch page.
      
      For example,
      
      ![screen shot](https://cloud.githubusercontent.com/assets/1000778/8403741/d6ffc7e2-1e79-11e5-9888-c78c1575123a.png)
      
      FileInputDStream will display the new files for a batch, and DirectKafkaInputDStream will display its offset ranges.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7081 from zsxwing/input-metadata and squashes the following commits:
      
      f7abd9b [zsxwing] Revert the space changes in project/MimaExcludes.scala
      d906209 [zsxwing] Merge branch 'master' into input-metadata
      74762da [zsxwing] Fix MiMa tests
      7903e33 [zsxwing] Merge branch 'master' into input-metadata
      450a46c [zsxwing] Address comments
      1d94582 [zsxwing] Raname InputInfo to StreamInputInfo and change "metadata" to Map[String, Any]
      d496ae9 [zsxwing] Add input metadata in the batch page
      1f6b0b12
    • [SPARK-6287] [MESOS] Add dynamic allocation to the coarse-grained Mesos scheduler · c4830598
      Iulian Dragos authored
      This is largely based on extracting the dynamic allocation parts from tnachen's #3861.
      
      Author: Iulian Dragos <jaguarul@gmail.com>
      
      Closes #4984 from dragos/issue/mesos-coarse-dynamicAllocation and squashes the following commits:
      
      39df8cd [Iulian Dragos] Update tests to latest changes in core.
      9d2c9fa [Iulian Dragos] Remove adjustment of executorLimitOption in doKillExecutors.
      8b00f52 [Iulian Dragos] Latest round of reviews.
      0cd00e0 [Iulian Dragos] Add persistent shuffle directory
      15c45c1 [Iulian Dragos] Add dynamic allocation to the Spark coarse-grained scheduler.
      c4830598
    • [SPARK-2017] [UI] Stage page hangs with many tasks · ebdf5853
      Andrew Or authored
      (This reopens a patch that was closed in the past: #6248)
      
      When you view the stage page while running the following:
      ```
      sc.parallelize(1 to X, 10000).count()
      ```
      The page never loads, the job is stalled, and you end up running into an OOM:
      ```
      HTTP ERROR 500
      
      Problem accessing /stages/stage/. Reason:
          Server Error
      Caused by:
      java.lang.OutOfMemoryError: Java heap space
          at java.util.Arrays.copyOf(Arrays.java:2367)
          at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
      ```
      This patch compresses Jetty responses in gzip. The correct long-term fix is to add pagination.
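Gzip helps here because a stage page with tens of thousands of task rows is highly repetitive HTML, which compresses extremely well. A rough plain-Python illustration (the row markup below is made up):

```python
import gzip

# Hypothetical stand-in for a huge stage page: many near-identical task rows.
page = b"<tr><td>task 12345</td><td>SUCCESS</td></tr>" * 10000
compressed = gzip.compress(page)
ratio = len(compressed) / len(page)
# Repetitive markup typically shrinks by well over 10x, so the
# response the server must send becomes dramatically smaller.
```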
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7296 from andrewor14/gzip-jetty and squashes the following commits:
      
      a051c64 [Andrew Or] Use GZIP to compress Jetty responses
      ebdf5853
    • [SPARK-7419] [STREAMING] [TESTS] Fix CheckpointSuite.recovery with file input stream · 88bf4303
      zsxwing authored
      Fix this failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2886/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=centos/testReport/junit/org.apache.spark.streaming/CheckpointSuite/recovery_with_file_input_stream/
      
      To reproduce this failure, you can add `Thread.sleep(2000)` before this line
      https://github.com/apache/spark/blob/a9c4e29950a14e32acaac547e9a0e8879fd37fc9/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala#L477
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7323 from zsxwing/SPARK-7419 and squashes the following commits:
      
      b3caf58 [zsxwing] Fix CheckpointSuite.recovery with file input stream
      88bf4303
    • [SPARK-8953] SPARK_EXECUTOR_CORES is not read in SparkSubmit · 930fe953
      xutingjun authored
      The configuration ```SPARK_EXECUTOR_CORES``` is not put into the ```SparkConf```, so it has no effect on dynamic executor allocation.
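The idea of the fix can be sketched in plain Python: copy the environment setting into the config object so that features reading the config (such as dynamic allocation) can see it. The env var and config key names mirror Spark's, but this is not Spark code:

```python
import os

def load_executor_cores(conf, environ=None):
    """Copy SPARK_EXECUTOR_CORES from the environment into the config
    dict, without clobbering an explicitly set value."""
    environ = os.environ if environ is None else environ
    cores = environ.get("SPARK_EXECUTOR_CORES")
    if cores is not None and "spark.executor.cores" not in conf:
        conf["spark.executor.cores"] = cores
    return conf
```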
      
      Author: xutingjun <xutingjun@huawei.com>
      
      Closes #7322 from XuTingjun/SPARK_EXECUTOR_CORES and squashes the following commits:
      
      2cafa89 [xutingjun] make SPARK_EXECUTOR_CORES has effect to dynamicAllocation
      930fe953
    • [MINOR] [STREAMING] Fix log statements in ReceiverSupervisorImpl · 7ce3b818
      Tathagata Das authored
      Log statements incorrectly showed that the executor was being stopped when the receiver was being stopped.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7328 from tdas/fix-log and squashes the following commits:
      
      9cc6e99 [Tathagata Das] Fix log statements.
      7ce3b818
    • [SPARK-8247] [SPARK-8249] [SPARK-8252] [SPARK-8254] [SPARK-8257] [SPARK-8258]... · 0b0b9cea
      Cheng Hao authored
      [SPARK-8247] [SPARK-8249] [SPARK-8252] [SPARK-8254] [SPARK-8257] [SPARK-8258] [SPARK-8259] [SPARK-8261] [SPARK-8262] [SPARK-8253] [SPARK-8260] [SPARK-8267] [SQL] Add String Expressions
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #6762 from chenghao-intel/str_funcs and squashes the following commits:
      
      b09a909 [Cheng Hao] update the code as feedback
      7ebbf4c [Cheng Hao] Add more string expressions
      0b0b9cea
    • [SPARK-8703] [ML] Add CountVectorizer as a ml transformer to convert document to words count vector · 0cd84c86
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-8703
      
      Converts a text document to a sparse vector of token counts.
      
      I can further add an estimator to extract vocabulary from corpus if that's appropriate.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #7084 from hhbyyh/countVectorization and squashes the following commits:
      
      5f3f655 [Yuhao Yang] text change
      24728e4 [Yuhao Yang] style improvement
      576728a [Yuhao Yang] rename to model and some fix
      1deca28 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into countVectorization
      99b0c14 [Yuhao Yang] undo extension from HashingTF
      12c2dc8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into countVectorization
      7ee1c31 [Yuhao Yang] extends HashingTF
      809fb59 [Yuhao Yang] minor fix for ut
      7c61fb3 [Yuhao Yang] add countVectorizer
      0cd84c86
    • [SPARK-8863] [EC2] Check aws access key from aws credentials if there is no boto config · c59e268d
      JPark authored
      'spark_ec2.py' uses boto to control EC2, and boto supports '~/.aws/credentials', which is the AWS CLI's default configuration file.
      
      This is documented in the boto reference:
      
      "A boto config file is a text file formatted like an .ini configuration file that specifies values for options that control the behavior of the boto library. In Unix/Linux systems, on startup, the boto library looks for configuration files in the following locations and in the following order:
      /etc/boto.cfg - for site-wide settings that all users on this machine will use
      (if profile is given) ~/.aws/credentials - for credentials shared between SDKs
      (if profile is given) ~/.boto - for user-specific settings
      ~/.aws/credentials - for credentials shared between SDKs
      ~/.boto - for user-specific settings"
      
      * ref of boto: http://boto.readthedocs.org/en/latest/boto_config_tut.html
      * ref of aws cli : http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
      
      However, 'spark_ec2.py' only checks the boto config and environment variables; even when '~/.aws/credentials' exists, 'spark_ec2.py' terminates.
      
      So I changed it to also check '~/.aws/credentials'.
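The lookup order quoted above can be sketched as a small helper that returns the first configuration file that exists (illustrative Python, not the actual spark_ec2.py code):

```python
import os

def find_credentials_file(candidates):
    """Return the first existing file from an ordered list of
    candidate config paths (boto-style lookup), or None."""
    for path in candidates:
        expanded = os.path.expanduser(path)
        if os.path.isfile(expanded):
            return expanded
    return None
```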
      
      cc rxin
      
      Jira : https://issues.apache.org/jira/browse/SPARK-8863
      
      Author: JPark <JPark@JPark.me>
      
      Closes #7252 from JuhongPark/master and squashes the following commits:
      
      23c5792 [JPark] Check aws access key from aws credentials if there is no boto config
      c59e268d
    • [SPARK-8938][SQL] Implement toString for Interval data type · f6c0bd5c
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7315 from cloud-fan/toString and squashes the following commits:
      
      4fc8d80 [Wenchen Fan] Implement toString for Interval data type
      f6c0bd5c
    • [SPARK-8926][SQL] Code review followup. · a870a82f
      Reynold Xin authored
      I merged https://github.com/apache/spark/pull/7303 so it unblocks another PR. This addresses my own code review comment for that PR.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7313 from rxin/adt and squashes the following commits:
      
      7ade82b [Reynold Xin] Fixed unit tests.
      f8d5533 [Reynold Xin] [SPARK-8926][SQL] Code review followup.
      a870a82f
    • [SPARK-8948][SQL] Remove ExtractValueWithOrdinal abstract class · e204d22b
      Reynold Xin authored
      Also added more documentation for the file.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7316 from rxin/extract-value and squashes the following commits:
      
      069cb7e [Reynold Xin] Removed ExtractValueWithOrdinal.
      621b705 [Reynold Xin] Reverted a line.
      11ebd6c [Reynold Xin] [Minor][SQL] Improve documentation for complex type extractors.
      e204d22b
    • [SPARK-8940] [SPARKR] Don't overwrite given schema in createDataFrame · 59cc3894
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8940
      
      Currently, the given `schema` parameter is overwritten in `createDataFrame`. If it is not null, we shouldn't overwrite it.
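The guard described above amounts to the following pattern (sketched in Python for illustration; the actual fix is in SparkR code):

```python
def resolve_schema(given_schema, infer_schema):
    """Only infer a schema when the caller did not supply one."""
    return given_schema if given_schema is not None else infer_schema()
```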
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7311 from viirya/df_not_overwrite_schema and squashes the following commits:
      
      2385139 [Liang-Chi Hsieh] Don't overwrite given schema if it is not null.
      59cc3894
    • [SPARK-8830] [SQL] native levenshtein distance · a1964e9d
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8830
      
      rxin and HuJiayin, can you have a look at it?
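For reference, the standard two-row dynamic-programming formulation of Levenshtein distance looks like this in Python (the commit implements it natively in Spark's string expressions, not as below):

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]
```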
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7236 from tarekauel/native-levenshtein-distance and squashes the following commits:
      
      ee4c4de [Tarek Auel] [SPARK-8830] implemented improvement proposals
      c252e71 [Tarek Auel] [SPARK-8830] removed chartAt; use unsafe method for byte array comparison
      ddf2222 [Tarek Auel] Merge branch 'master' into native-levenshtein-distance
      179920a [Tarek Auel] [SPARK-8830] added description
      5e9ed54 [Tarek Auel] [SPARK-8830] removed StringUtils import
      dce4308 [Tarek Auel] [SPARK-8830] native levenshtein distance
      a1964e9d
    • [SPARK-8931] [SQL] Fallback to interpreted evaluation if failed to compile in codegen · 23448a9e
      Davies Liu authored
      Exceptions will not be caught during tests.
      
      cc marmbrus rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7309 from davies/fallback and squashes the following commits:
      
      969a612 [Davies Liu] throw exception during tests
      f844f77 [Davies Liu] fallback
      a3091bc [Davies Liu] Merge branch 'master' of github.com:apache/spark into fallback
      364a0d6 [Davies Liu] fallback to interpret mode if failed to compile
      23448a9e
    • [SPARK-6266] [MLLIB] PySpark SparseVector missing doc for size, indices, values · f88b1253
      lewuathe authored
      Write missing pydocs in `SparseVector` attributes.
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #7290 from Lewuathe/SPARK-6266 and squashes the following commits:
      
      51d9895 [lewuathe] Update docs
      0480d35 [lewuathe] Merge branch 'master' into SPARK-6266
      ba42cf3 [lewuathe] [SPARK-6266] PySpark SparseVector missing doc for size, indices, values
      f88b1253
    • [SPARK-8942][SQL] use double not decimal when cast double and float to timestamp · 09cb0d9c
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7312 from cloud-fan/minor and squashes the following commits:
      
      a4589fa [Wenchen Fan] use double not decimal when cast double and float to timestamp
      09cb0d9c
    • [SPARK-8928] [SQL] Makes CatalystSchemaConverter sticking to 1.4.x- when... · 851e247c
      Weizhong Lin authored
      [SPARK-8928] [SQL] Makes CatalystSchemaConverter sticking to 1.4.x- when handling Parquet LISTs in compatible mode
      
      This PR is based on #7209 authored by Sephiroth-Lin.
      
      Author: Weizhong Lin <linweizhong@huawei.com>
      
      Closes #7314 from liancheng/spark-8928 and squashes the following commits:
      
      75267fe [Cheng Lian] Makes CatalystSchemaConverter sticking to 1.4.x- when handling LISTs in compatible mode
      851e247c
    • Revert "[SPARK-8928] [SQL] Makes CatalystSchemaConverter sticking to 1.4.x-... · c056484c
      Cheng Lian authored
      Revert "[SPARK-8928] [SQL] Makes CatalystSchemaConverter sticking to 1.4.x- when handling Parquet LISTs in compatible mode"
      
      This reverts commit 3dab0da4.
      c056484c
    • [SPARK-8928] [SQL] Makes CatalystSchemaConverter sticking to 1.4.x- when... · 3dab0da4
      Cheng Lian authored
      [SPARK-8928] [SQL] Makes CatalystSchemaConverter sticking to 1.4.x- when handling Parquet LISTs in compatible mode
      
      This PR is based on #7209 authored by Sephiroth-Lin.
      
      Author: Weizhong Lin <linweizhong@huawei.com>
      
      Closes #7304 from liancheng/spark-8928 and squashes the following commits:
      
      75267fe [Cheng Lian] Makes CatalystSchemaConverter sticking to 1.4.x- when handling LISTs in compatible mode
      3dab0da4
    • Closes #7310. · a240bf3b
      Reynold Xin authored
      a240bf3b
    • [SPARK-8926][SQL] Good errors for ExpectsInputType expressions · 768907eb
      Michael Armbrust authored
      For example: `cannot resolve 'testfunction(null)' due to data type mismatch: argument 1 is expected to be of type int, however, null is of type datetype.`
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #7303 from marmbrus/expectsTypeErrors and squashes the following commits:
      
      c654a0e [Michael Armbrust] fix udts and make errors pretty
      137160d [Michael Armbrust] style
      5428fda [Michael Armbrust] style
      10fac82 [Michael Armbrust] [SPARK-8926][SQL] Good errors for ExpectsInputType expressions
      768907eb
  3. Jul 08, 2015
    • [SPARK-8937] [TEST] A setting `spark.unsafe.exceptionOnMemoryLeak ` is missing in ScalaTest config. · aba5784d
      Kousuke Saruta authored
      `spark.unsafe.exceptionOnMemoryLeak` is present in the surefire config:
      
      ```
              <!-- Surefire runs all Java tests -->
              <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.18.1</version>
                <!-- Note config is repeated in scalatest config -->
      ...
      
      <spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak>
                  </systemProperties>
      ...
      ```
      
      but it is absent from the ScalaTest config.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #7308 from sarutak/add-setting-for-memory-leak and squashes the following commits:
      
      95644e7 [Kousuke Saruta] Added a setting for memory leak
      aba5784d
    • [SPARK-8910] Fix MiMa flaky due to port contention issue · 47ef423f
      Andrew Or authored
      Due to the way MiMa works, we currently start a `SQLContext` pretty early on. This causes us to start a `SparkUI` that attempts to bind to port 4040. Because many tests run in parallel on the Jenkins machines, this sometimes causes port contention and fails the MiMa tests.
      
      Note that we already disabled the SparkUI for scalatests. However, the MiMa test is run before we even have a chance to load the default scalatest settings, so we need to explicitly disable the UI ourselves.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7300 from andrewor14/mima-flaky and squashes the following commits:
      
      b55a547 [Andrew Or] Do not enable SparkUI during tests
      47ef423f
    • [SPARK-8932] Support copy() for UnsafeRows that do not use ObjectPools · b55499a4
      Josh Rosen authored
      We call Row.copy() in many places throughout SQL, but UnsafeRow currently throws UnsupportedOperationException when copy() is called.
      
      Supporting copying when ObjectPool is used may be difficult, since we may need to handle deep-copying of objects in the pool. In addition, this copy() method needs to produce a self-contained row object which may be passed around / buffered by downstream code which does not understand the UnsafeRow format.
      
      In the long run, we'll need to figure out how to handle the ObjectPool corner cases, but this may be unnecessary if other changes are made. Therefore, in order to unblock my sort patch (#6444) I propose that we support copy() for the cases where UnsafeRow does not use an ObjectPool and continue to throw UnsupportedOperationException when an ObjectPool is used.
      
      This patch accomplishes this by modifying UnsafeRow so that it knows the size of the row's backing data in order to be able to copy it into a byte array.
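The key idea (knowing the size of the row's backing data makes a byte-for-byte, self-contained copy possible) can be sketched with a toy class; names and structure are illustrative, not UnsafeRow's actual layout:

```python
class FlatRow:
    """Toy stand-in for a row whose fields live in one contiguous buffer."""

    def __init__(self, backing, offset, size):
        self.backing = backing
        self.offset = offset
        self.size = size

    def copy(self):
        # Knowing `size` lets us clone exactly the row's bytes, producing
        # a copy that owns its data and is independent of the original buffer.
        data = bytearray(self.backing[self.offset:self.offset + self.size])
        return FlatRow(data, 0, self.size)
```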
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7306 from JoshRosen/SPARK-8932 and squashes the following commits:
      
      338e6bf [Josh Rosen] Support copy for UnsafeRows that do not use ObjectPools.
      b55499a4
    • [SPARK-8866][SQL] use 1us precision for timestamp type · a2908148
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8866
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7283 from yijieshen/micro_timestamp and squashes the following commits:
      
      dc735df [Yijie Shen] update CastSuite to avoid round error
      714eaea [Yijie Shen] add timestamp_udf into blacklist due to precision lose
      c3ca2f4 [Yijie Shen] fix unhandled case in CurrentTimestamp
      8d4aa6b [Yijie Shen] use 1us precision for timestamp type
      a2908148
    • [SPARK-8927] [DOCS] Format wrong for some config descriptions · 28fa01e2
      Jonathan Alter authored
      A couple of descriptions were not inside `<td></td>` and were being displayed immediately under the section title instead of in their rows.
      
      Author: Jonathan Alter <jonalter@users.noreply.github.com>
      
      Closes #7292 from jonalter/docs-config and squashes the following commits:
      
      5ce1570 [Jonathan Alter] [DOCS] Format wrong for some config descriptions
      28fa01e2
    • [SPARK-8450] [SQL] [PYSARK] cleanup type converter for Python DataFrame · 74d8d3d9
      Davies Liu authored
      This PR fixes the converter for Python DataFrame, especially for DecimalType
      
      Closes #7106
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7131 from davies/decimal_python and squashes the following commits:
      
      4d3c234 [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_python
      20531d6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_python
      7d73168 [Davies Liu] fix conflit
      6cdd86a [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_python
      7104e97 [Davies Liu] improve type infer
      9cd5a21 [Davies Liu] run python tests with SPARK_PREPEND_CLASSES
      829a05b [Davies Liu] fix UDT in python
      c99e8c5 [Davies Liu] fix mima
      c46814a [Davies Liu] convert decimal for Python DataFrames
      74d8d3d9