  1. Jun 17, 2015
    • zsxwing's avatar
      [SPARK-8373] [PYSPARK] Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD · 0fc4b96f
      zsxwing authored
      This PR fixes the sum issue and also adds `emptyRDD` so that it's easy to create a test case.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6826 from zsxwing/python-emptyRDD and squashes the following commits:
      
      b36993f [zsxwing] Update the return type to JavaRDD[T]
      71df047 [zsxwing] Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD
      0fc4b96f
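The PR text doesn't show the fix itself; as an illustrative sketch in plain Python (no Spark required, and not PySpark's actual implementation), the underlying pitfall is that summing via a bare reduce fails on an empty collection, while a fold with a zero value does not:

```python
from functools import reduce

def rdd_sum_broken(partitions):
    # Naive sum: reduce over all elements with no initial value.
    # Raises TypeError when the "RDD" is empty.
    elements = [x for part in partitions for x in part]
    return reduce(lambda a, b: a + b, elements)

def rdd_sum_fixed(partitions):
    # Folding with a zero value handles the empty case gracefully,
    # mirroring the spirit of the SPARK-8373 fix.
    elements = [x for part in partitions for x in part]
    return reduce(lambda a, b: a + b, elements, 0)

empty_rdd = []                        # stand-in for sc.emptyRDD()
print(rdd_sum_fixed(empty_rdd))       # 0
print(rdd_sum_fixed([[1, 2], [3]]))   # 6
```

With `emptyRDD` available, writing a regression test for the empty case becomes a one-liner, which is exactly why the PR adds it.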
    • Carson Wang's avatar
      [SPARK-8372] History server shows incorrect information for application not started · 2837e067
      Carson Wang authored
      The history server may show an incorrect App ID for an incomplete application, such as `<App ID>.inprogress`. This app info never disappears, even after the application completes.
      ![incorrectappinfo](https://cloud.githubusercontent.com/assets/9278199/8156147/2a10fdbe-137d-11e5-9620-c5b61d93e3c1.png)

      The cause of the issue is that the log path name is used as the app ID when the app ID cannot be obtained during replay.
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #6827 from carsonwang/SPARK-8372 and squashes the following commits:
      
      cdbb089 [Carson Wang] Fix code style
      3e46b35 [Carson Wang] Update code style
      90f5dde [Carson Wang] Add a unit test
      d8c9cd0 [Carson Wang] Replaying events only return information when app is started
      2837e067
    • Mingfei's avatar
      [SPARK-8161] Set externalBlockStoreInitialized to be true, after ExternalBlockStore is initialized · 7ad8c5d8
      Mingfei authored
      externalBlockStoreInitialized is never set to true, which means the blocks stored in ExternalBlockStore cannot be removed.
      
      Author: Mingfei <mingfei.shi@intel.com>
      
      Closes #6702 from shimingfei/SetTrue and squashes the following commits:
      
      add61d8 [Mingfei] Set externalBlockStoreInitialized to be true, after ExternalBlockStore is initialized
      7ad8c5d8
    • OopsOutOfMemory's avatar
      [SPARK-8010] [SQL] Promote types to StringType as implicit conversion in... · 98ee3512
      OopsOutOfMemory authored
      [SPARK-8010] [SQL] Promote types to StringType as implicit conversion in non-binary expression of HiveTypeCoercion
      
      1. The query `select coalesce(null, 1, '1') from dual` causes an exception:
      `java.lang.RuntimeException: Could not determine return type of Coalesce for IntegerType,StringType`
      2. The query `select case when true then 1 else '1' end from dual` causes an exception:
      `java.lang.RuntimeException: Types in CASE WHEN must be the same or coercible to a common type: StringType != IntegerType`
      I checked the code; the main cause is that HiveTypeCoercion does not perform an implicit conversion when an IntegerType and a StringType appear together.

      Numeric types can be promoted to string type, and Hive always performs this implicit conversion.
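As a hedged sketch of the coercion rule described above (the type names are real Spark SQL types, but the helper and widening list below are illustrative, not HiveTypeCoercion's actual API): when two types have no tighter common numeric type and one side is a string, promote the other side to string.

```python
# Illustrative numeric widening order, narrowest to widest.
NUMERIC_WIDENING = ["IntegerType", "LongType", "DoubleType"]

def tightest_common_type(t1, t2):
    """Hypothetical sketch of finding a common type, with string fallback."""
    if t1 == t2:
        return t1
    if t1 in NUMERIC_WIDENING and t2 in NUMERIC_WIDENING:
        # Both numeric: pick the wider of the two.
        return max(t1, t2, key=NUMERIC_WIDENING.index)
    if "StringType" in (t1, t2):
        # No tighter common type: promote the non-string side to string,
        # matching Hive's implicit conversion behavior.
        return "StringType"
    return None

print(tightest_common_type("IntegerType", "StringType"))  # StringType
print(tightest_common_type("IntegerType", "DoubleType"))  # DoubleType
```

Under this rule, `coalesce(null, 1, '1')` resolves to a string result instead of throwing.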
      
      Author: OopsOutOfMemory <victorshengli@126.com>
      
      Closes #6551 from OopsOutOfMemory/pnts and squashes the following commits:
      
      7a209d7 [OopsOutOfMemory] rebase master
      6018613 [OopsOutOfMemory] convert function to method
      4cd5618 [OopsOutOfMemory] limit the data type to primitive type
      df365d2 [OopsOutOfMemory] refine
      95cbd58 [OopsOutOfMemory] fix style
      403809c [OopsOutOfMemory] promote non-string to string when can not found tighestCommonTypeOfTwo
      98ee3512
    • Imran Rashid's avatar
      [SPARK-6782] add sbt-revolver plugin · a4659443
      Imran Rashid authored
      to make it easier to start & stop http servers in sbt
      https://issues.apache.org/jira/browse/SPARK-6782
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #5426 from squito/SPARK-6782 and squashes the following commits:
      
      dc4fb19 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
      a4659443
    • Sean Owen's avatar
      [SPARK-8395] [DOCS] start-slave.sh docs incorrect · f005be02
      Sean Owen authored
      start-slave.sh no longer takes a worker # param in 1.4+
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6855 from srowen/SPARK-8395 and squashes the following commits:
      
      300278e [Sean Owen] start-slave.sh no longer takes a worker # param in 1.4+
      f005be02
    • Michael Davies's avatar
      [SPARK-8077] [SQL] Optimization for TreeNodes with large numbers of children · 0c1b2df0
      Michael Davies authored
      For example, large IN clauses.

      Large IN clauses are parsed very slowly. For example, the SQL below (10K items in the IN list) takes 45-50s:

      `s"""SELECT * FROM Person WHERE ForeName IN ('${(1 to 10000).map("n" + _).mkString("','")}')"""`

      This is principally due to TreeNode repeatedly calling `contains` on `children`, where `children` in this case is a List 10K entries long. In effect, parsing large IN clauses is O(N^2).
      A lazily initialised Set based on `children` for `contains` reduces parse time to around 2.5s.
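The lazily initialised Set idea can be sketched in a few lines of plain Python (the class below is illustrative, not Spark's actual TreeNode): membership checks against a long list are O(N), while a set built once on first use makes each subsequent check O(1).

```python
class TreeNode:
    """Illustrative sketch of lazy set-based child membership checks."""

    def __init__(self, children):
        self.children = children
        self._child_set = None   # built lazily on first contains_child call

    def contains_child(self, node):
        # O(N) set construction happens at most once; every check after
        # that is an O(1) hash lookup instead of an O(N) list scan.
        if self._child_set is None:
            self._child_set = set(self.children)
        return node in self._child_set

node = TreeNode(list(range(10_000)))
print(node.contains_child(9_999))  # True
print(node.contains_child(-1))     # False
```

Laziness matters here: most TreeNodes have few children, so eagerly building a set for every node would waste time and memory on the common case.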
      
      Author: Michael Davies <Michael.BellDavies@gmail.com>
      
      Closes #6673 from MickDavies/SPARK-8077 and squashes the following commits:
      
      38cd425 [Michael Davies] SPARK-8077: Optimization for  TreeNodes with large numbers of children
      d80103b [Michael Davies] SPARK-8077: Optimization for  TreeNodes with large numbers of children
      e6be8be [Michael Davies] SPARK-8077: Optimization for  TreeNodes with large numbers of children
      0c1b2df0
    • Brennon York's avatar
      [SPARK-7017] [BUILD] [PROJECT INFRA] Refactor dev/run-tests into Python · 50a0496a
      Brennon York authored
      All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved to a much more modular approach (more functions, moved the calls around, etc.). What is here is the initial culmination and should provide a great base to various downstream issues (e.g. SPARK-7016, modularize / parallelize testing, etc.). Would love comments / suggestions for this initial first step!
      
      /cc srowen pwendell nchammas
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5694 from brennonyork/SPARK-7017 and squashes the following commits:
      
      154ed73 [Brennon York] updated finding java binary if JAVA_HOME not set
      3922a85 [Brennon York] removed necessary passed in variable
      f9fbe54 [Brennon York] reverted doc test change
      8135518 [Brennon York] removed the test check for documentation changes until jenkins can get updated
      05d435b [Brennon York] added check for jekyll install
      22edb78 [Brennon York] add check if jekyll isn't installed on the path
      2dff136 [Brennon York] fixed pep8 whitespace errors
      767a668 [Brennon York] fixed path joining issues, ensured docs actually build on doc changes
      c42cf9a [Brennon York] unpack set operations with splat (*)
      fb85a41 [Brennon York] fixed minor set bug
      0379833 [Brennon York] minor doc addition to print the changed modules
      aa03d9e [Brennon York] added documentation builds as a top level test component, altered high level project changes to properly execute core tests only when necessary, changed variable names for simplicity
      ec1ae78 [Brennon York] minor name changes, bug fixes
      b7c72b9 [Brennon York] reverting streaming context
      03fdd7b [Brennon York] fixed the tuple () wraps around example lambda
      705d12e [Brennon York] changed example to comply with pep3113 supporting python3
      60b3d51 [Brennon York] prepend rather than append onto PATH
      7d2f5e2 [Brennon York] updated python tests to remove unused variable
      2898717 [Brennon York] added a change to streaming test to check if it only runs streaming tests
      eb684b6 [Brennon York] fixed sbt_test_goals reference error
      db7ae6f [Brennon York] reverted SPARK_HOME from start of command
      1ecca26 [Brennon York] fixed merge conflicts
      2fcdfc0 [Brennon York] testing targte branch dump on jenkins
      1f607b1 [Brennon York] finalizing revisions to modular tests
      8afbe93 [Brennon York] made error codes a global
      0629de8 [Brennon York] updated to refactor and remove various small bugs, removed pep8 complaints
      d90ab2d [Brennon York] fixed merge conflicts, ensured that for regular builds both core and sql tests always run
      b1248dc [Brennon York] exec python rather than running python and exiting with return code
      f9deba1 [Brennon York] python to python2 and removed newline
      6d0a052 [Brennon York] incorporated merge conflicts with SPARK-7249
      f950010 [Brennon York] removed building hive-0.12.0 per SPARK-6908
      703f095 [Brennon York] fixed merge conflicts
      b1ca593 [Brennon York] reverted the sparkR test
      afeb093 [Brennon York] updated to make sparkR test fail
      1dada6b [Brennon York] reverted pyspark test failure
      9a592ec [Brennon York] reverted mima exclude issue, added pyspark test failure
      d825aa4 [Brennon York] revert build break, add mima break
      f041d8a [Brennon York] added space from commented import to now test build breaking
      983f2a2 [Brennon York] comment out import to fail build test
      2386785 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-7017
      76335fb [Brennon York] reverted rat license issue for sparkconf
      e4a96cc [Brennon York] removed the import error and added license error, fixed the way run-tests and run-tests.py report their error codes
      56d3cb9 [Brennon York] changed test back and commented out import to break compile
      b37328c [Brennon York] fixed typo and added default return is no error block was found in the environment
      7613558 [Brennon York] updated to return the proper env variable for return codes
      a5bd445 [Brennon York] reverted license, changed test in shuffle to fail
      803143a [Brennon York] removed license file for SparkContext
      b0b2604 [Brennon York] comment out import to see if build fails and returns properly
      83e80ef [Brennon York] attempt at better python output when called from bash
      c095fa6 [Brennon York] removed another wait() call
      26e18e8 [Brennon York] removed unnecessary wait()
      07210a9 [Brennon York] minor doc string change for java version with namedtuple update
      ec03bf3 [Brennon York] added namedtuple for java version to add readability
      2cb413b [Brennon York] upcased global variables, changes various calling methods from check_output to check_call
      639f1e9 [Brennon York] updated with pep8 rules, fixed minor bugs, added run-tests file in bash to call the run-tests.py script
      3c53a1a [Brennon York] uncomment the scala tests :)
      6126c4f [Brennon York] refactored run-tests into python
      50a0496a
    • MechCoder's avatar
      [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to PySpark · 6765ef98
      MechCoder authored
      MatrixUDT was recently implemented in Scala. This PR ports it to PySpark.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6354 from MechCoder/spark-6390 and squashes the following commits:
      
      fc4dc1e [MechCoder] Better error message
      c940a44 [MechCoder] Added test
      aa9c391 [MechCoder] Add pyUDT to MatrixUDT
      62a2a7d [MechCoder] [SPARK-6390] Port MatrixUDT to PySpark
      6765ef98
    • Liang-Chi Hsieh's avatar
      [SPARK-7199] [SQL] Add date and timestamp support to UnsafeRow · 104f30c3
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7199
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5984 from viirya/add_date_timestamp and squashes the following commits:
      
      7f21ce9 [Liang-Chi Hsieh] For comment.
      0b89698 [Liang-Chi Hsieh] Add timestamp to settableFieldTypes.
      c30d490 [Liang-Chi Hsieh] Use default IntUnsafeColumnWriter and LongUnsafeColumnWriter.
      672ef17 [Liang-Chi Hsieh] Remove getter/setter for Date and Timestamp and use Int and Long for them.
      9f3e577 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_date_timestamp
      281e844 [Liang-Chi Hsieh] Fix scala style.
      fb532b5 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_date_timestamp
      80af342 [Liang-Chi Hsieh] Fix compiling error.
      f4f5de6 [Liang-Chi Hsieh] Fix scala style.
      a463e83 [Liang-Chi Hsieh] Use Long to store timestamp for rows.
      635388a [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_date_timestamp
      46946c6 [Liang-Chi Hsieh] Adapt for moved DateUtils.
      b16994e [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_date_timestamp
      752251f [Liang-Chi Hsieh] Support setDate. Fix failed test.
      fcf8db9 [Liang-Chi Hsieh] Add functions for Date and Timestamp to SpecificRow.
      e42a809 [Liang-Chi Hsieh] Fix style.
      4c07b57 [Liang-Chi Hsieh] Add date and timestamp support to UnsafeRow.
      104f30c3
    • Vyacheslav Baranov's avatar
      [SPARK-8309] [CORE] Support for more than 12M items in OpenHashMap · c13da20a
      Vyacheslav Baranov authored
      The problem occurs because the position mask `0xEFFFFFF` is incorrect: its 25th bit is zero, so when capacity grows beyond 2^24, `OpenHashMap` computes an incorrect index into the `_values` array.

      I've also added a size check in `rehash()`, so that it fails fast instead of reporting invalid item indices.
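A quick bit-level illustration of the bug (the broken mask is from the commit message; the corrected mask shown is illustrative of a mask with all low bits set):

```python
BROKEN_MASK = 0xEFFFFFF    # bit 24 is zero, as the commit message notes
FIXED_MASK  = 0x1FFFFFFF   # all low 29 bits set (illustrative corrected mask)

pos = 1 << 24              # a position only reachable once capacity > 2^24
print(hex(pos & BROKEN_MASK))  # 0x0 -- bit 24 is silently dropped
print(hex(pos & FIXED_MASK))   # 0x1000000 -- position preserved
```

Any position with bit 24 set gets folded onto a lower slot by the broken mask, so distinct keys collide onto the same `_values` index once the map grows past ~16M entries.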
      
      Author: Vyacheslav Baranov <slavik.baranov@gmail.com>
      
      Closes #6763 from SlavikBaranov/SPARK-8309 and squashes the following commits:
      
      8557445 [Vyacheslav Baranov] Resolved review comments
      4d5b954 [Vyacheslav Baranov] Resolved review comments
      eaf1e68 [Vyacheslav Baranov] Fixed failing test
      f9284fd [Vyacheslav Baranov] Resolved review comments
      3920656 [Vyacheslav Baranov] SPARK-8309: Support for more than 12M items in OpenHashMap
      c13da20a
    • Reynold Xin's avatar
      Closes #6850. · e3de14d3
      Reynold Xin authored
      e3de14d3
    • dragonli's avatar
      [SPARK-8220][SQL]Add positive identify function · bedff7d5
      dragonli authored
      chenghao-intel adrian-wang
      
      Author: dragonli <lisurprise@gmail.com>
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #6838 from zhichao-li/positive and squashes the following commits:
      
      e1032a0 [dragonli] remove useless import and refactor code
      624d438 [zhichao.li] add positive identify function
      bedff7d5
  2. Jun 16, 2015
    • baishuo's avatar
      [SPARK-8156] [SQL] create table to specific database by 'use dbname' · 0b8c8fdc
      baishuo authored
      When I tested the following code:
      hiveContext.sql("""use testdb""")
      val df = (1 to 3).map(i => (i, s"val_$i", i * 2)).toDF("a", "b", "c")
      df.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .saveAsTable("ttt3")
      hiveContext.sql("show TABLES in default")

      I found that the table ttt3 is created under the database "default" rather than "testdb".
      
      Author: baishuo <vc_java@hotmail.com>
      
      Closes #6695 from baishuo/SPARK-8516-use-database and squashes the following commits:
      
      9e155f9 [baishuo] remove no use comment
      cb9f027 [baishuo] modify testcase
      00a7a2d [baishuo] modify testcase
      4df48c7 [baishuo] modify testcase
      b742e69 [baishuo] modify testcase
      3d19ad9 [baishuo] create table to specific database
      0b8c8fdc
    • Yanbo Liang's avatar
      [SPARK-7916] [MLLIB] MLlib Python doc parity check for classification and regression · ca998757
      Yanbo Liang authored
      Check and update the MLlib Python classification and regression docs to be as complete as the Scala docs.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6460 from yanboliang/spark-7916 and squashes the following commits:
      
      f8deda4 [Yanbo Liang] trigger jenkins
      6dc4d99 [Yanbo Liang] address comments
      ce2a43e [Yanbo Liang] truncate too long line and remove extra sparse
      3eaf6ad [Yanbo Liang] MLlib Python doc parity check for classification and regression
      ca998757
    • Marcelo Vanzin's avatar
      [SPARK-8126] [BUILD] Make sure temp dir exists when running tests. · cebf2411
      Marcelo Vanzin authored
      If you ran "clean" at the top-level sbt project, the temp dir would
      go away, so running "test" without restarting sbt would fail. This
      fixes that by making sure the temp dir exists before running tests.
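The fix amounts to an idempotent "create if missing" before tests run; a minimal Python equivalent of that pattern (illustrative, since the actual change lives in the sbt build):

```python
import os
import tempfile

def ensure_tmp_dir(path):
    # Recreate the directory if a prior "clean" removed it; exist_ok makes
    # this safe to call before every test run, whether or not it exists.
    os.makedirs(path, exist_ok=True)
    return path

base = tempfile.mkdtemp()
target = os.path.join(base, "target", "tmp")
ensure_tmp_dir(target)
print(os.path.isdir(target))  # True
ensure_tmp_dir(target)        # idempotent: no error on the second call
```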
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6805 from vanzin/SPARK-8126-fix and squashes the following commits:
      
      12d7768 [Marcelo Vanzin] [SPARK-8126] [build] Make sure temp dir exists when running tests.
      cebf2411
    • Radek Ostrowski's avatar
      [SQL] [DOC] improved a comment · 4bd10fd5
      Radek Ostrowski authored
      [SQL][DOC] I found it a bit confusing when I came across it for the first time in the docs
      
      Author: Radek Ostrowski <dest.hawaii@gmail.com>
      Author: radek <radek@radeks-MacBook-Pro-2.local>
      
      Closes #6332 from radek1st/master and squashes the following commits:
      
      dae3347 [Radek Ostrowski] fixed typo
      c76bb3a [radek] improved a comment
      4bd10fd5
    • Moussa Taifi's avatar
      [SPARK-DOCS] [SPARK-SQL] Update sql-programming-guide.md · dc455b88
      Moussa Taifi authored
      Typo in thriftserver section
      
      Author: Moussa Taifi <moutai10@gmail.com>
      
      Closes #6847 from moutai/patch-1 and squashes the following commits:
      
      1bd29df [Moussa Taifi] Update sql-programming-guide.md
      dc455b88
    • hushan[胡珊]'s avatar
      [SPARK-8387] [WEBUI] Only show 4096 bytes content for executor log instead of show all · 29c5025a
      hushan[胡珊] authored
      Author: hushan[胡珊] <hushan@xiaomi.com>
      
      Closes #6834 from suyanNone/small-display and squashes the following commits:
      
      744212f [hushan[胡珊]] Only show 4096 bytes content for executor log instead all
      29c5025a
    • Kan Zhang's avatar
      [SPARK-8129] [CORE] [Sec] Pass auth secrets to executors via env variables · 658814c8
      Kan Zhang authored
      Env variables are not visible to non-Spark users, based on a suggestion from vanzin.
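As a hedged sketch of the approach (`spark.authenticate.secret` is Spark's real config key, but the helper and env-var name below are illustrative, not the PR's actual code): strip the secret from the configuration that executors receive on disk or in logs, and hand it over through the process environment instead.

```python
import os

SECRET_KEY = "spark.authenticate.secret"   # real Spark config key

def launch_env(conf):
    """Illustrative: filter the auth secret out of the serialized conf and
    pass it to the child process via its environment instead."""
    safe_conf = {k: v for k, v in conf.items() if k != SECRET_KEY}
    env = dict(os.environ)
    if SECRET_KEY in conf:
        env["_SPARK_AUTH_SECRET"] = conf[SECRET_KEY]  # hypothetical var name
    return safe_conf, env

conf = {"spark.app.name": "demo", SECRET_KEY: "s3cr3t"}
safe, env = launch_env(conf)
print(SECRET_KEY in safe)          # False
print(env["_SPARK_AUTH_SECRET"])   # s3cr3t
```

The point of the env-var route is that on YARN and similar deployments, other users can read command lines and config files but not another user's process environment.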
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #6774 from kanzhang/env and squashes the following commits:
      
      5dd84c6 [Kan Zhang] remove auth secret conf from initial set up for executors
      90cb7d2 [Kan Zhang] always filter out auth secret
      af4d89d [Kan Zhang] minor refactering
      e88993e [Kan Zhang] pass auth secret to executors via env variable
      658814c8
    • huangzhaowei's avatar
      [SPARK-8367] [STREAMING] Add a limit for 'spark.streaming.blockInterval` since a data loss bug. · ccf010f2
      huangzhaowei authored
      The bug was reported in JIRA: [SPARK-8367](https://issues.apache.org/jira/browse/SPARK-8367).
      The resolution is to limit the configuration `spark.streaming.blockInterval` to a positive number.
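The commit log mentions using `require` for this; a plain-Python equivalent of that guard (illustrative, not the Scala BlockGenerator code) rejects non-positive intervals up front instead of letting them silently cause data loss later:

```python
def check_block_interval(ms):
    # Mirrors a Scala `require(blockIntervalMs > 0, ...)` precondition.
    if ms <= 0:
        raise ValueError(
            "'spark.streaming.blockInterval' should be a positive value")
    return ms

print(check_block_interval(200))  # 200
try:
    check_block_interval(0)
except ValueError as e:
    print(e)
```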
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      Author: huangzhaowei <SaintBacchus@users.noreply.github.com>
      
      Closes #6818 from SaintBacchus/SPARK-8367 and squashes the following commits:
      
      c9d1927 [huangzhaowei] Update BlockGenerator.scala
      bd3f71a [huangzhaowei] Use requre instead of if
      3d17796 [huangzhaowei] [SPARK_8367][Streaming]Add a limit for 'spark.streaming.blockInterval' since a data loss bug.
      ccf010f2
    • Davies Liu's avatar
      [SPARK-7184] [SQL] enable codegen by default · bc76a0f7
      Davies Liu authored
      In order to have better performance out of the box, this PR turns on codegen by default; codegen can then be tested by sql/test and hive/test.

      This PR also fixes some corner cases for codegen.

      Before the 1.5 release, we should revisit this and turn it off if it's not stable or is causing regressions.
      
      cc rxin JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6726 from davies/enable_codegen and squashes the following commits:
      
      f3b25a5 [Davies Liu] fix warning
      73750ea [Davies Liu] fix long overflow when compare
      3017a47 [Davies Liu] Merge branch 'master' of github.com:apache/spark into enable_codegen
      a7d75da [Davies Liu] Merge branch 'master' of github.com:apache/spark into enable_codegen
      ff5b75a [Davies Liu] Merge branch 'master' of github.com:apache/spark into enable_codegen
      f4cf2c2 [Davies Liu] fix style
      99fc139 [Davies Liu] Merge branch 'enable_codegen' of github.com:davies/spark into enable_codegen
      91fc7a2 [Davies Liu] disable codegen for ScalaUDF
      207e339 [Davies Liu] Update CodeGenerator.scala
      44573a3 [Davies Liu] check thread safety of expression
      f3886fa [Davies Liu] don't inline primitiveTerm for null literal
      c8e7cd2 [Davies Liu] address comment
      a8618c9 [Davies Liu] enable codegen by default
      bc76a0f7
  3. Jun 15, 2015
    • tedyu's avatar
      SPARK-8336 Fix NullPointerException with functions.rand() · 1a62d616
      tedyu authored
      This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'
      
      Tested using spark-shell and verified that the following works:
      `sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()`
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #6793 from tedyu/master and squashes the following commits:
      
      62fd97b [tedyu] Create RandomSuite
      750f92c [tedyu] Add test for Rand() with seed
      a1d66c5 [tedyu] Fix NullPointerException with functions.rand()
      1a62d616
    • Yadong Qi's avatar
      [SPARK-6583] [SQL] Support aggregate functions in ORDER BY · 6ae21a94
      Yadong Qi authored
      Add aggregates in ORDER BY clauses to the `Aggregate` operator beneath.  Project these results away after the Sort.
      
      Based on work by watermen.  Also Closes #5290.
      
      Author: Yadong Qi <qiyadong2010@gmail.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6816 from marmbrus/pr/5290 and squashes the following commits:
      
      3226a97 [Michael Armbrust] consistent ordering
      eb8938d [Michael Armbrust] no vars
      c8b25c1 [Yadong Qi] move the test data.
      7f9b736 [Yadong Qi] delete Substring case
      a1e87c1 [Yadong Qi] fix conflict
      f119849 [Yadong Qi] order by aggregated function
      6ae21a94
    • andrewor14's avatar
      [SPARK-8350] [R] Log R unit test output to "unit-tests.log" · 56d4e8a2
      andrewor14 authored
      Right now it's logged to "R-unit-tests.log". Jenkins currently only archives files named "unit-tests.log", and this is what all other modules (e.g. SQL, network, REPL) use.
      1. We should be consistent
      2. I don't want to reconfigure Jenkins to accept a different file
      
      shivaram
      
      Author: andrewor14 <andrew@databricks.com>
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6807 from andrewor14/r-logs and squashes the following commits:
      
      96005d2 [andrewor14] Nest unit-tests.log further until R
      407c46c [andrewor14] Add target to log path
      d7b68ae [Andrew Or] Log R unit test output to "unit-tests.log"
      56d4e8a2
    • Nicholas Chammas's avatar
      [SPARK-8316] Upgrade to Maven 3.3.3 · 4c5889e8
      Nicholas Chammas authored
      Versions of Maven older than 3.3.0 apparently have [a bug in how they handle transitive dependencies](https://github.com/apache/spark/pull/6492#issuecomment-111001101).
      
      I confirmed that upgrading to Maven 3.3.3 resolves at least the particular manifestation of this bug that I ran into.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #6770 from nchammas/maven-333 and squashes the following commits:
      
      6bed2d9 [Nicholas Chammas] upgrade to Maven 3.3.3
      4c5889e8
  4. Jun 14, 2015
    • Marcelo Vanzin's avatar
      [SPARK-8065] [SQL] Add support for Hive 0.14 metastores · 4eb48ed1
      Marcelo Vanzin authored
      This change has two parts.
      
      The first one gets rid of "ReflectionMagic". That worked well for the differences between 0.12 and
      0.13, but breaks in 0.14, since some of the APIs that need to be used have primitive types. I could
      not figure out a way to make that class work with primitive types. So instead I wrote some shims
       (I can already hear the collective sigh) that find the appropriate methods via reflection. This should
      be faster since the method instances are cached, and the code is not much uglier than before,
      with the advantage that all the ugliness is local to one file (instead of multiple switch statements on
      the version being used scattered in ClientWrapper).
      
      The second part is simple: add code to handle Hive 0.14. A few new methods had to be added
      to the new shims.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6627 from vanzin/SPARK-8065 and squashes the following commits:
      
      3fa4270 [Marcelo Vanzin] Indentation style.
      4b8a3d4 [Marcelo Vanzin] Fix dep exclusion.
      be3d0cc [Marcelo Vanzin] Merge branch 'master' into SPARK-8065
      ca3fb1e [Marcelo Vanzin] Merge branch 'master' into SPARK-8065
      b43f13e [Marcelo Vanzin] Since exclusions seem to work, clean up some of the code.
      73bd161 [Marcelo Vanzin] Botched merge.
      d2ddf01 [Marcelo Vanzin] Comment about excluded dep.
      0c929d1 [Marcelo Vanzin] Merge branch 'master' into SPARK-8065
      2c3c02e [Marcelo Vanzin] Try to fix tests by adding support for exclusions.
      0a03470 [Marcelo Vanzin] Try to fix tests by upgrading calcite dependency.
      13b2dfa [Marcelo Vanzin] Fix NPE.
      6439d88 [Marcelo Vanzin] Minor style thing.
      69b017b [Marcelo Vanzin] Style.
      a21cad8 [Marcelo Vanzin] Part II: Add shims / version for Hive 0.14.
      ae98c87 [Marcelo Vanzin] PART I: Get rid of reflection magic.
      4eb48ed1
    • Peter Hoffmann's avatar
      fix read/write mixup · f3f2a439
      Peter Hoffmann authored
      Author: Peter Hoffmann <ph@peter-hoffmann.com>
      
      Closes #6815 from hoffmann/patch-1 and squashes the following commits:
      
      2abb6da [Peter Hoffmann] fix read/write mixup
      f3f2a439
    • Reynold Xin's avatar
      [SPARK-8362] [SQL] Add unit tests for +, -, *, /, % · 53c16b92
      Reynold Xin authored
      Added unit tests for all supported data types for:
      - Add
      - Subtract
      - Multiply
      - Divide
      - UnaryMinus
      - Remainder
      
      Fixed bugs caught by the unit tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6813 from rxin/SPARK-8362 and squashes the following commits:
      
      fb3fe62 [Reynold Xin] Added Remainder.
      3b266ba [Reynold Xin] [SPARK-8362] Add unit tests for +, -, *, /.
      53c16b92
    • Michael Armbrust's avatar
      [SPARK-8358] [SQL] Wait for child resolution when resolving generators · 9073a426
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6811 from marmbrus/aliasExplodeStar and squashes the following commits:
      
      fbd2065 [Michael Armbrust] more style
      806a373 [Michael Armbrust] fix style
      7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when resolving generatorsa
      9073a426
    • Josh Rosen's avatar
      [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space... · ea7fd2ff
      Josh Rosen authored
      [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap
      
      UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when allocating row conversion scratch space: we take a size requirement, measured in bytes, then allocate a long array of that size.  This means that we end up allocating 8x too much conversion space.
      
      This patch fixes this by allocating a `byte[]` array instead.  This doesn't impose any new limitations on the maximum sizes of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows.
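The units confusion is easy to reproduce in plain Python with the `array` module (illustrative only; the actual fix is in Java): sizing a long array by a byte count allocates 8x the intended memory, because each element is itself 8 bytes.

```python
from array import array

needed_bytes = 64

# Buggy sizing: allocate a long[] with `needed_bytes` ELEMENTS -> 8x memory,
# since each 'q' (signed 64-bit) element occupies 8 bytes.
long_buf = array("q", [0] * needed_bytes)
print(long_buf.itemsize * len(long_buf))  # 512 bytes, 8x too much

# Fixed sizing: allocate a byte[] of exactly `needed_bytes`.
byte_buf = bytearray(needed_bytes)
print(len(byte_buf))                      # 64 bytes
```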
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits:
      
      6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is constrained by max byte[] size
      ea7fd2ff
    • Liang-Chi Hsieh's avatar
      [SPARK-8342][SQL] Fix Decimal setOrNull · cb7ada11
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8342
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6797 from viirya/fix_decimal and squashes the following commits:
      
      8a447b1 [Liang-Chi Hsieh] Add unit test.
      d67a5ea [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal
      ab6d8af [Liang-Chi Hsieh] Fix setOrNull.
      cb7ada11
  5. Jun 13, 2015
    • Mike Dusenberry's avatar
      [Spark-8343] [Streaming] [Docs] Improve Spark Streaming Guides. · 35d1267c
      Mike Dusenberry authored
      This improves the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6801 from dusenberrymw/SPARK-8343_Improve_Spark_Streaming_Guides_MERGED and squashes the following commits:
      
      6688090 [Mike Dusenberry] Improvements to the Spark Streaming Custom Receiver Guide, including slight rewording of confusing sections, and fixing typos & missing words.
      436fbd8 [Mike Dusenberry] Bunch of improvements to the Spark Streaming Guide, including fixing broken links, slight rewording of confusing sections, fixing typos & missing words, etc.
      35d1267c
    • Reynold Xin's avatar
      [SPARK-8349] [SQL] Use expression constructors (rather than apply) in FunctionRegistry · 2d71ba4c
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6806 from rxin/gs and squashes the following commits:
      
      ed1aebb [Reynold Xin] Fixed style.
      c7fc3e6 [Reynold Xin] [SPARK-8349][SQL] Use expression constructors (rather than apply) in FunctionRegistry
      2d71ba4c
    • Reynold Xin's avatar
      [SPARK-8347][SQL] Add unit tests for abs. · a1389533
      Reynold Xin authored
      Also addressed code review feedback from #6754
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6803 from rxin/abs and squashes the following commits:
      
      d07beba [Reynold Xin] [SPARK-8347] Add unit tests for abs.
      a1389533
    • Liang-Chi Hsieh's avatar
      [SPARK-8052] [SQL] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble · ddec4527
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8052
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6645 from viirya/cast_string_integraltype and squashes the following commits:
      
      e19c6a3 [Liang-Chi Hsieh] For comment.
      c3e472a [Liang-Chi Hsieh] Add test.
      7ced9b0 [Liang-Chi Hsieh] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble.
      ddec4527
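A quick plain-Java demonstration (independent of Spark, value chosen for illustration) of why routing a String-to-Decimal cast through `toDouble` is lossy while `java.math.BigDecimal` is exact:

```java
import java.math.BigDecimal;

public class CastDemo {
    public static void main(String[] args) {
        String s = "92233720368547758069";  // 20 digits, beyond double's 2^53 exact range
        // Going through double rounds to the nearest representable value
        double viaDouble = Double.parseDouble(s);
        System.out.println(new BigDecimal(viaDouble).toPlainString()); // digits differ from s
        // BigDecimal's String constructor preserves every digit
        BigDecimal exact = new BigDecimal(s);
        System.out.println(exact.toPlainString()); // 92233720368547758069
    }
}
```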
    • Josh Rosen's avatar
      [SPARK-8319] [CORE] [SQL] Update logic related to key orderings in shuffle dependencies · af31335a
      Josh Rosen authored
      This patch updates two pieces of logic that are related to handling of keyOrderings in ShuffleDependencies:
      
      - The Tungsten ShuffleManager falls back to regular SortShuffleManager whenever the shuffle dependency specifies a key ordering, but technically we only need to fall back when an aggregator is also specified. This patch updates the fallback logic to reflect this so that the Tungsten optimizations can apply to more workloads.
      
      - The SQL Exchange operator performs defensive copying of shuffle inputs when a key ordering is specified, but this is unnecessary. The copying was added to guard against cases where ExternalSorter would buffer non-serialized records in memory.  When ExternalSorter is configured without an aggregator, it uses the following logic to determine whether to buffer records in a serialized or deserialized format:
      
         ```scala
           private val useSerializedPairBuffer =
              ordering.isEmpty &&
              conf.getBoolean("spark.shuffle.sort.serializeMapOutputs", true) &&
              ser.supportsRelocationOfSerializedObjects
         ```
      
   The `newOrdering.isDefined` branch in `Exchange.needToCopyObjectsBeforeShuffle`, removed by this patch, is not necessary:
      
         - It was checked even if we weren't using sort-based shuffle, but this was unnecessary because only SortShuffleManager performs map-side sorting.
   - Map-side sorting during shuffle writing is only performed for shuffles that perform map-side aggregation as part of the shuffle (to see this, look at how SortShuffleWriter constructs ExternalSorter). Since SQL never pushes aggregation into Spark's shuffle, we can guarantee that both the aggregator and ordering will be empty. Spark SQL also always uses serializers that support relocation, so sort-shuffle will use the serialized pair buffer unless the user has explicitly disabled it via the SparkConf feature-flag. Therefore, I think my optimization in Exchange should be safe.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6773 from JoshRosen/SPARK-8319 and squashes the following commits:
      
      7a14129 [Josh Rosen] Revise comments; add handler to guard against future ShuffleManager implementations
      07bb2c9 [Josh Rosen] Update comment to clarify circumstances under which shuffle operates on serialized records
      269089a [Josh Rosen] Avoid unnecessary copy in SQL Exchange
      34e526e [Josh Rosen] Enable Tungsten shuffle for non-agg shuffles w/ key orderings
      af31335a
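As a rough illustration of the revised fallback condition described above (the method and parameter names here are hypothetical, not Spark's actual API), the decision reduces to a predicate on the shuffle dependency in which a key ordering alone no longer matters:

```java
public class TungstenFallbackSketch {
    // Hypothetical predicate: with this patch, a key ordering alone no longer
    // forces the fallback to the regular SortShuffleManager; only map-side
    // aggregation (plus a serializer without relocation support) does.
    public static boolean canUseTungstenShuffle(boolean hasAggregator,
                                                boolean hasKeyOrdering,
                                                boolean serializerSupportsRelocation) {
        // hasKeyOrdering is intentionally ignored: sorting happens on the
        // reduce side, so the write path can keep the serialized fast path.
        return serializerSupportsRelocation && !hasAggregator;
    }

    public static void main(String[] args) {
        // Ordering-only shuffle now stays on the Tungsten path
        System.out.println(canUseTungstenShuffle(false, true, true));  // true
        // Map-side aggregation still falls back
        System.out.println(canUseTungstenShuffle(true, false, true));  // false
    }
}
```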
    • Davies Liu's avatar
      [SPARK-8346] [SQL] Use InternalRow instead of catalyst.InternalRow · ce1041c3
      Davies Liu authored
      cc rxin marmbrus
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6802 from davies/cleanup_internalrow and squashes the following commits:
      
      769d2aa [Davies Liu] remove not needed cast
      4acbbe4 [Davies Liu] catalyst.Internal -> InternalRow
      ce1041c3
    • Rene Treffer's avatar
      [SPARK-7897] Improve type for jdbc/"unsigned bigint" · d986fb9a
      Rene Treffer authored
      The original fix uses DecimalType.Unlimited, which is harder to
      handle afterwards. Unsigned bigint values have no scale and most
      fit into a long, so DecimalType(20,0) is a better choice.
      
      Author: Rene Treffer <treffer@measite.de>
      
      Closes #6789 from rtreffer/spark-7897-unsigned-bigint-as-decimal and squashes the following commits:
      
      2006613 [Rene Treffer] Fix type for "unsigned bigint" jdbc loading.
      d986fb9a
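A quick check (plain Java, independent of Spark) of why precision 20 with scale 0 covers the whole unsigned-bigint range:

```java
import java.math.BigInteger;

public class UnsignedBigintBound {
    public static void main(String[] args) {
        // Largest "unsigned bigint" value: 2^64 - 1
        BigInteger max = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
        System.out.println(max);                     // 18446744073709551615
        // 20 digits, no fractional part -> DecimalType(20, 0) is sufficient
        System.out.println(max.toString().length()); // 20
    }
}
```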
    • Michael Armbrust's avatar
      [SPARK-8329][SQL] Allow _ in DataSource options · 4aed66f2
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6786 from marmbrus/optionsParser and squashes the following commits:
      
      e7d18ef [Michael Armbrust] add dots
      99a3452 [Michael Armbrust] [SPARK-8329][SQL] Allow _ in DataSource options
      4aed66f2