  1. Jun 14, 2015
    • fix read/write mixup · f3f2a439
      Peter Hoffmann authored
      Author: Peter Hoffmann <ph@peter-hoffmann.com>
      
      Closes #6815 from hoffmann/patch-1 and squashes the following commits:
      
      2abb6da [Peter Hoffmann] fix read/write mixup
    • [SPARK-8362] [SQL] Add unit tests for +, -, *, /, % · 53c16b92
      Reynold Xin authored
      Added unit tests for all supported data types for:
      - Add
      - Subtract
      - Multiply
      - Divide
      - UnaryMinus
      - Remainder
      
      Fixed bugs caught by the unit tests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6813 from rxin/SPARK-8362 and squashes the following commits:
      
      fb3fe62 [Reynold Xin] Added Remainder.
      3b266ba [Reynold Xin] [SPARK-8362] Add unit tests for +, -, *, /.
    • [SPARK-8358] [SQL] Wait for child resolution when resolving generators · 9073a426
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6811 from marmbrus/aliasExplodeStar and squashes the following commits:
      
      fbd2065 [Michael Armbrust] more style
      806a373 [Michael Armbrust] fix style
      7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when resolving generators
    • [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap · ea7fd2ff
      Josh Rosen authored
      
      UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when allocating row conversion scratch space: we take a size requirement, measured in bytes, then allocate a long array with that many elements. Since each long occupies 8 bytes, we end up allocating 8x too much conversion space.
      
      This patch fixes this by allocating a `byte[]` array instead. This doesn't impose any new limitations on the maximum sizes of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows.
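      A minimal sketch of the bug and the fix, with names simplified for illustration (the real code lives in UnsafeFixedWidthAggregationMap):
      
      ```scala
      // Hypothetical illustration: the size requirement is measured in *bytes*.
      val sizeRequirementInBytes = 1024
      
      // Before: allocating a long array with that many *elements* yields
      // 1024 * 8 = 8192 bytes, i.e. 8x too much scratch space.
      val scratchBefore = new Array[Long](sizeRequirementInBytes)
      
      // After: a byte array of the same length is exactly 1024 bytes.
      val scratchAfter = new Array[Byte](sizeRequirementInBytes)
      ```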
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits:
      
      6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is constrained by max byte[] size
    • [SPARK-8342][SQL] Fix Decimal setOrNull · cb7ada11
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8342
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6797 from viirya/fix_decimal and squashes the following commits:
      
      8a447b1 [Liang-Chi Hsieh] Add unit test.
      d67a5ea [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_decimal
      ab6d8af [Liang-Chi Hsieh] Fix setOrNull.
  2. Jun 13, 2015
    • [Spark-8343] [Streaming] [Docs] Improve Spark Streaming Guides. · 35d1267c
      Mike Dusenberry authored
      This improves the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6801 from dusenberrymw/SPARK-8343_Improve_Spark_Streaming_Guides_MERGED and squashes the following commits:
      
      6688090 [Mike Dusenberry] Improvements to the Spark Streaming Custom Receiver Guide, including slight rewording of confusing sections, and fixing typos & missing words.
      436fbd8 [Mike Dusenberry] Bunch of improvements to the Spark Streaming Guide, including fixing broken links, slight rewording of confusing sections, fixing typos & missing words, etc.
    • [SPARK-8349] [SQL] Use expression constructors (rather than apply) in FunctionRegistry · 2d71ba4c
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6806 from rxin/gs and squashes the following commits:
      
      ed1aebb [Reynold Xin] Fixed style.
      c7fc3e6 [Reynold Xin] [SPARK-8349][SQL] Use expression constructors (rather than apply) in FunctionRegistry
    • [SPARK-8347][SQL] Add unit tests for abs. · a1389533
      Reynold Xin authored
      Also addressed code review feedback from #6754
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6803 from rxin/abs and squashes the following commits:
      
      d07beba [Reynold Xin] [SPARK-8347] Add unit tests for abs.
    • [SPARK-8052] [SQL] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble · ddec4527
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8052
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6645 from viirya/cast_string_integraltype and squashes the following commits:
      
      e19c6a3 [Liang-Chi Hsieh] For comment.
      c3e472a [Liang-Chi Hsieh] Add test.
      7ced9b0 [Liang-Chi Hsieh] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble.
    • [SPARK-8319] [CORE] [SQL] Update logic related to key orderings in shuffle dependencies · af31335a
      Josh Rosen authored
      This patch updates two pieces of logic that are related to handling of keyOrderings in ShuffleDependencies:
      
      - The Tungsten ShuffleManager falls back to the regular SortShuffleManager whenever the shuffle dependency specifies a key ordering, but technically we only need to fall back when an aggregator is also specified. This patch updates the fallback logic accordingly, so that the Tungsten optimizations can apply to more workloads (see the sketch after this list).
      
      - The SQL Exchange operator performs defensive copying of shuffle inputs when a key ordering is specified, but this is unnecessary. The copying was added to guard against cases where ExternalSorter would buffer non-serialized records in memory.  When ExternalSorter is configured without an aggregator, it uses the following logic to determine whether to buffer records in a serialized or deserialized format:
      
         ```scala
           private val useSerializedPairBuffer =
              ordering.isEmpty &&
              conf.getBoolean("spark.shuffle.sort.serializeMapOutputs", true) &&
              ser.supportsRelocationOfSerializedObjects
         ```
      
         The `newOrdering.isDefined` branch in `Exchange.needToCopyObjectsBeforeShuffle`, removed by this patch, is not necessary:
      
         - It was checked even when we weren't using sort-based shuffle, which was unnecessary because only SortShuffleManager performs map-side sorting.
         - Map-side sorting during shuffle writes is only performed for shuffles that do map-side aggregation as part of the shuffle (to see this, look at how SortShuffleWriter constructs ExternalSorter). Since SQL never pushes aggregation into Spark's shuffle, we can guarantee that both the aggregator and the ordering will be empty; and since Spark SQL always uses serializers that support relocation, sort-based shuffle will use the serialized pair buffer unless the user has explicitly disabled it via the SparkConf feature flag. Therefore, I think my optimization in Exchange is safe.
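      As referenced above, a hedged sketch of the relaxed fallback condition (the predicate name and shape are assumptions, not Spark's actual API):
      
      ```scala
      // After this patch, only map-side aggregation forces the fallback to
      // SortShuffleManager; a key ordering alone no longer does.
      def mustFallBackToSortShuffle(hasAggregator: Boolean, hasKeyOrdering: Boolean): Boolean =
        hasAggregator // hasKeyOrdering is intentionally ignored
      ```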
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6773 from JoshRosen/SPARK-8319 and squashes the following commits:
      
      7a14129 [Josh Rosen] Revise comments; add handler to guard against future ShuffleManager implementations
      07bb2c9 [Josh Rosen] Update comment to clarify circumstances under which shuffle operates on serialized records
      269089a [Josh Rosen] Avoid unnecessary copy in SQL Exchange
      34e526e [Josh Rosen] Enable Tungsten shuffle for non-agg shuffles w/ key orderings
    • [SPARK-8346] [SQL] Use InternalRow instead of catalyst.InternalRow · ce1041c3
      Davies Liu authored
      cc rxin marmbrus
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6802 from davies/cleanup_internalrow and squashes the following commits:
      
      769d2aa [Davies Liu] remove not needed cast
      4acbbe4 [Davies Liu] catalyst.Internal -> InternalRow
    • [SPARK-7897] Improve type for jdbc/"unsigned bigint" · d986fb9a
      Rene Treffer authored
      The original fix uses DecimalType.Unlimited, which is harder to
      handle afterwards. There is no scale, and most values fit into
      a long, so DecimalType(20,0) is a better choice.
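      For illustration, a sketch of the kind of type mapping involved (the hook below is hypothetical; the actual change lives in Spark's JDBC data source):
      
      ```scala
      import java.sql.Types
      import org.apache.spark.sql.types.{DataType, DecimalType}
      
      // An unsigned 64-bit integer needs at most 20 decimal digits
      // (max value 18446744073709551615), so DecimalType(20, 0) suffices.
      def catalystTypeFor(sqlType: Int, signed: Boolean): Option[DataType] =
        if (sqlType == Types.BIGINT && !signed) Some(DecimalType(20, 0)) else None
      ```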
      
      Author: Rene Treffer <treffer@measite.de>
      
      Closes #6789 from rtreffer/spark-7897-unsigned-bigint-as-decimal and squashes the following commits:
      
      2006613 [Rene Treffer] Fix type for "unsigned bigint" jdbc loading.
    • [SPARK-8329][SQL] Allow _ in DataSource options · 4aed66f2
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6786 from marmbrus/optionsParser and squashes the following commits:
      
      e7d18ef [Michael Armbrust] add dots
      99a3452 [Michael Armbrust] [SPARK-8329][SQL] Allow _ in DataSource options
    • [SPARK-7186] [SQL] Decouple internal Row from external Row · d46f8e5d
      Davies Liu authored
      Currently, we use o.a.s.sql.Row both internally and externally. The external interface is wider than what the internals need, because it is designed to facilitate end-user programming. This design has proven to be very error prone and cumbersome for internal Row implementations.
      
      As a first step, we create an InternalRow interface in the catalyst module, which is identical to the current Row interface, and we switch all internal operators/expressions to use InternalRow instead. When we need to expose a Row, we convert the InternalRow implementation into a Row for users.
      
      For all public APIs, we use Row (for example, the data source APIs), which is converted into/from InternalRow by CatalystTypeConverters.
      
      For all internal data sources (JSON, Parquet, JDBC, Hive), we use InternalRow for better performance, cast into Row in buildScan() (without changing the public API). When creating a PhysicalRDD, we cast them back to InternalRow.
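      A minimal sketch of the boundary conversion, assuming the 1.4-era CatalystTypeConverters API:
      
      ```scala
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
      import org.apache.spark.sql.types.StructType
      
      // Internal operators work on InternalRow; convert only at the boundary
      // where a Row must be handed to user code.
      def toExternalRow(internal: InternalRow, schema: StructType): Row =
        CatalystTypeConverters.convertToScala(internal, schema).asInstanceOf[Row]
      ```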
      
      cc rxin marmbrus JoshRosen
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6792 from davies/internal_row and squashes the following commits:
      
      f2abd13 [Davies Liu] fix scalastyle
      a7e025c [Davies Liu] move InternalRow into catalyst
      30db8ba [Davies Liu] Merge branch 'master' of github.com:apache/spark into internal_row
      7cbced8 [Davies Liu] separate Row and InternalRow
  3. Jun 12, 2015
    • [SPARK-8314][MLlib] improvement in performance of MLUtils.appendBias · 6e9c3ff1
      Roger Menezes authored
      The MLUtils.appendBias method is heavily used to create intercepts for linear models. It previously used Breeze's vector concatenation, which is very slow compared to plain System.arraycopy. This change switches the implementation to use System.arraycopy.
      
      I saw the following performance improvements after the change (benchmark with the MNIST dataset, 50 runs):
      MLUtils.appendBias (SparseVector, before): 47320 ms
      MLUtils.appendBias (SparseVector, after): 1935 ms
      MLUtils.appendBias (DenseVector, before): 5340 ms
      MLUtils.appendBias (DenseVector, after): 4080 ms
      This is almost a 24x performance boost for SparseVectors.
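      A sketch of the dense-vector case under this approach (simplified; the sparse case is analogous but also copies indices):
      
      ```scala
      import org.apache.spark.mllib.linalg.{DenseVector, Vector, Vectors}
      
      def appendBias(vector: Vector): Vector = vector match {
        case dv: DenseVector =>
          val values = dv.values
          val result = new Array[Double](values.length + 1)
          // Bulk copy instead of Breeze's vector concatenation.
          System.arraycopy(values, 0, result, 0, values.length)
          result(values.length) = 1.0 // the appended bias term
          Vectors.dense(result)
        case other =>
          sys.error(s"sparse case omitted in this sketch: $other")
      }
      ```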
      
      Author: Roger Menezes <rmenezes@netflix.com>
      
      Closes #6768 from rogermenezes/improve-append-bias and squashes the following commits:
      
      4e42f75 [Roger Menezes] address feedback
      e999d79 [Roger Menezes] first commit
    • [SPARK-7284] [STREAMING] Updated streaming documentation · e9471d34
      Tathagata Das authored
      - Kinesis API updated
      - Kafka version updated, and Python API for Direct Kafka added
      - Added SQLContext.getOrCreate()
      - Added information on how to get partitionId in foreachRDD
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6781 from tdas/SPARK-7284 and squashes the following commits:
      
      aac7be0 [Tathagata Das] Added information on how to get partition id
      a66ec22 [Tathagata Das] Complete the line incomplete line,
      a92ca39 [Tathagata Das] Updated streaming documentation
    • [SPARK-8330] DAG visualization: trim whitespace from input · 88604051
      Andrew Or authored
      Safeguard against DOM rewriting.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6787 from andrewor14/dag-viz-trim and squashes the following commits:
      
      0fb4afe [Andrew Or] Trim input metadata from DOM
    • [SPARK-7993] [SQL] Improved DataFrame.show() output · 19834fa9
      akhilthatipamula authored
      Closes #6633
      
      Author: akhilthatipamula <130050068@iitb.ac.in>
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6784 from zsxwing/pr6633 and squashes the following commits:
      
      5da1c51 [zsxwing] Address comments and add unit tests
      17eab7b [akhilthatipamula] refactored code
      19874b3 [akhilthatipamula] Update DataFrame.scala
      0a76a5e [akhilthatipamula] Optimised showString()
      e3dd03f [akhilthatipamula] Modified showString() method
      a21012b [akhilthatipamula] improved the show()
      4bb742f [akhilthatipamula] Modified dataframe.show() method
    • [SPARK-8322] [EC2] Added spark 1.4.0 into the VALID_SPARK_VERSIONS and SPARK_TACHYON_MAP · 71cc17bf
      Mark Smith authored
      
      This contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Mark Smith <mark.smith@bronto.com>
      
      Closes #6776 from markmsmith/SPARK-8322 and squashes the following commits:
      
      d744244 [Mark Smith] [SPARK-8322][EC2] Fixed tachyon map entry to point to 0.6.4
      e4f14d3 [Mark Smith] [SPARK-8322][EC2] Added spark 1.4.0 into the VALID_SPARK_VERSIONS and SPARK_TACHYON_MAP
    • [SQL] [MINOR] correct semanticEquals logic · c19c7857
      Wenchen Fan authored
      It's a follow-up of https://github.com/apache/spark/pull/6173: for expressions like `Coalesce` that hold a `Seq[Expression]`, a semantic-equality check must check all of the children. Also, we can use `Seq[(Expression, NamedExpression)]` instead of `Map[Expression, NamedExpression]`, since we only search it with `find`.
      
      chenghao-intel, I agree that we can probably never define `semanticEquals` in a fully general way, but I think we have done something similar in `TreeNode`, so we can use similar logic. Then we can handle expressions like `Coalesce(children: Seq[Expression])` correctly.
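      A hedged sketch of child-wise semantic equality, written as a standalone helper rather than Spark's actual implementation (real logic must also compare node-local fields such as literal values):
      
      ```scala
      import org.apache.spark.sql.catalyst.expressions.Expression
      
      // Two expression trees are considered semantically equal when the nodes
      // have the same type and all corresponding children are equal too.
      def semanticallyEqual(a: Expression, b: Expression): Boolean =
        a.getClass == b.getClass &&
          a.children.size == b.children.size &&
          a.children.zip(b.children).forall { case (l, r) => semanticallyEqual(l, r) }
      ```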
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6261 from cloud-fan/tmp and squashes the following commits:
      
      4daef88 [Wenchen Fan] address comments
      dd8fbd9 [Wenchen Fan] correct semanticEquals
    • [SPARK-6566] [SQL] Related changes for newer parquet version · e428b3a9
      Yash Datta authored
      This brings a major improvement: footers are no longer read on the driver. It also cleans up the code in ParquetTableOperations, where we had to override getSplits to eliminate multiple listStatus calls.
      
      cc liancheng
      
      Are there any other changes we need for this?
      
      Author: Yash Datta <Yash.Datta@guavus.com>
      
      Closes #5889 from saucam/parquet_1.6 and squashes the following commits:
      
      d1bf41e [Yash Datta] SPARK-7340: Fix scalastyle and incorporate review comments
      c9aa042 [Yash Datta] SPARK-7340: Use the new user defined filter predicate for pushing down inset into parquet
      56bc750 [Yash Datta] SPARK-7340: Change parquet version to latest release
    • [SPARK-7862] [SQL] Fix the deadlock in script transformation for stderr · 2dd7f930
      zhichao.li authored
      Related PR (SPARK-7044): https://github.com/apache/spark/pull/5671
      
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #6404 from zhichao-li/transform and squashes the following commits:
      
      8418c97 [zhichao.li] add comments and remove useless failAfter logic
      d9677e1 [zhichao.li] redirect the error destination to be the same as the current process
    • [SPARK-8317] [SQL] Do not push sort into shuffle in Exchange operator · b9d177c5
      Josh Rosen authored
      In some cases, Spark SQL pushes sorting operations into the shuffle layer by specifying a key ordering as part of the shuffle dependency. I think that we should not do this:
      
      - Since we do not delegate aggregation to Spark's shuffle, specifying the keyOrdering as part of the shuffle has no effect on the shuffle map side.
      - By performing the sort ourselves (inserting a sort operator after the shuffle instead), we can use the Exchange planner to choose specialized sorting implementations based on the types of rows being sorted.
      - We can remove some complexity from SqlSerializer2 by not requiring it to know about sort orderings, since SQL's own sort operators will already perform the necessary defensive copying.
      
      This patch removes Exchange's `canSortWithShuffle` path and the associated code in `SqlSerializer2`.  Shuffles that used to go through the `canSortWithShuffle` path would always wind up using Spark's `ExternalSorter` (inside of `HashShuffleReader`); to avoid a performance regression as a result of handling these shuffles ourselves, I've changed the SQLConf defaults so that external sorting is enabled by default.
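      A rough sketch of the resulting plan shape (operator names as in Spark SQL 1.4's execution package; treat the exact signatures as approximations):
      
      ```scala
      import org.apache.spark.sql.catalyst.expressions.{Expression, SortOrder}
      import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
      import org.apache.spark.sql.execution.{Exchange, Sort, SparkPlan}
      
      // Instead of passing a keyOrdering into the ShuffleDependency, plan an
      // explicit sort on top of the exchange.
      def shuffleThenSort(exprs: Seq[Expression], order: Seq[SortOrder],
                          numPartitions: Int, child: SparkPlan): SparkPlan =
        Sort(order, global = false, Exchange(HashPartitioning(exprs, numPartitions), child))
      ```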
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6772 from JoshRosen/SPARK-8317 and squashes the following commits:
      
      ebf9c0f [Josh Rosen] Do not push sort into shuffle in Exchange operator
      bf3b4c8 [Josh Rosen] Enable external sort by default
  4. Jun 11, 2015
    • [SPARK-7158] [SQL] Fix bug where cached data cannot be used in collect() after cache() · 767cc94c
      Cheng Hao authored
      By the time df.cache() is called, the `withCachedData` plan of the DataFrame's `QueryExecution` has already been created, which means it will not look up the cached tables when an action is called afterward.
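      An illustration of the symptom (hypothetical session; the table name is made up):
      
      ```scala
      // Before the fix, df's QueryExecution (including its withCachedData
      // plan) was built when df was created, so a later cache() call was
      // never picked up by subsequent actions.
      val df = sqlContext.table("src")
      df.cache()   // registers the data with the CacheManager...
      df.collect() // ...but the pre-built plan bypasses the cached data
      ```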
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #5714 from chenghao-intel/SPARK-7158 and squashes the following commits:
      
      58ea8aa [Cheng Hao] style issue
      2bf740f [Cheng Hao] create new QueryExecution instance for CacheManager
      a5647d9 [Cheng Hao] hide the queryExecution of DataFrame
      fbfd3c5 [Cheng Hao] make the DataFrame.queryExecution mutable for cache/persist/unpersist
    • [SQL] Miscellaneous SQL/DF expression changes. · 337c16d5
      Reynold Xin authored
      SPARK-8201 conditional function: if
      SPARK-8205 conditional function: nvl
      SPARK-8208 math function: ceiling
      SPARK-8210 math function: degrees
      SPARK-8211 math function: radians
      SPARK-8219 math function: negative
      SPARK-8216 math function: rename log -> ln
      SPARK-8222 math function: alias power / pow
      SPARK-8225 math function: alias sign / signum
      SPARK-8228 conditional function: isnull
      SPARK-8229 conditional function: isnotnull
      SPARK-8250 string function: alias lower/lcase
      SPARK-8251 string function: alias upper / ucase
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6754 from rxin/expressions-misc and squashes the following commits:
      
      35fce15 [Reynold Xin] Removed println.
      2647067 [Reynold Xin] Promote to string type.
      3c32bbc [Reynold Xin] Fixed if.
      de827ac [Reynold Xin] Fixed style
      b201cd4 [Reynold Xin] Removed if.
      6b21a9b [Reynold Xin] [SQL] Miscellaneous SQL/DF expression changes.
    • [SPARK-7824] [SQL] Collapse operator reordering and constant folding into a single batch. · 7914c720
      Zhongshuai Pei authored
      SQL
      ```
      select * from tableA join tableB on (a > 3 and b = d) or (a > 3 and b = e)
      ```
      Plan before the change
      ```
      == Optimized Logical Plan ==
      Project [a#293,b#294,c#295,d#296,e#297]
       Join Inner, Some(((a#293 > 3) && ((b#294 = d#296) || (b#294 = e#297))))
        MetastoreRelation default, tablea, None
        MetastoreRelation default, tableb, None
      ```
      Plan after the change
      ```
      == Optimized Logical Plan ==
      Project [a#293,b#294,c#295,d#296,e#297]
       Join Inner, Some(((b#294 = d#296) || (b#294 = e#297)))
        Filter (a#293 > 3)
         MetastoreRelation default, tablea, None
        MetastoreRelation default, tableb, None
      ```
      
      CombineLimits rewrites to Limit(If(LessThan(ne, le), ne, le), grandChild), and LessThan is folded by BooleanSimplification, so CombineLimits must run before BooleanSimplification, and BooleanSimplification must run before PushPredicateThroughJoin.
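      A sketch of the resulting rule ordering in Catalyst's Batch notation (the batch name and iteration strategy are illustrative, not the actual Optimizer contents):
      
      ```scala
      Batch("Operator Optimizations", FixedPoint(100),
        CombineLimits,            // produces the LessThan above
        BooleanSimplification,    // folds the LessThan away
        PushPredicateThroughJoin) // must run after BooleanSimplification
      ```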
      
      Author: Zhongshuai Pei <799203320@qq.com>
      Author: DoingDone9 <799203320@qq.com>
      
      Closes #6351 from DoingDone9/master and squashes the following commits:
      
      20de7be [Zhongshuai Pei] Update Optimizer.scala
      7bc7d28 [Zhongshuai Pei] Merge pull request #17 from apache/master
      0ba5f42 [Zhongshuai Pei] Update Optimizer.scala
      f8b9314 [Zhongshuai Pei] Update FilterPushdownSuite.scala
      c529d9f [Zhongshuai Pei] Update FilterPushdownSuite.scala
      ae3af6d [Zhongshuai Pei] Update FilterPushdownSuite.scala
      a04ffae [Zhongshuai Pei] Update Optimizer.scala
      11beb61 [Zhongshuai Pei] Update FilterPushdownSuite.scala
      f2ee5fe [Zhongshuai Pei] Update Optimizer.scala
      be6b1d5 [Zhongshuai Pei] Update Optimizer.scala
      b01e622 [Zhongshuai Pei] Merge pull request #15 from apache/master
      8df716a [Zhongshuai Pei] Update FilterPushdownSuite.scala
      d98bc35 [Zhongshuai Pei] Update FilterPushdownSuite.scala
      fa65718 [Zhongshuai Pei] Update Optimizer.scala
      ab8e9a6 [Zhongshuai Pei] Merge pull request #14 from apache/master
      14952e2 [Zhongshuai Pei] Merge pull request #13 from apache/master
      f03fe7f [Zhongshuai Pei] Merge pull request #12 from apache/master
      f12fa50 [Zhongshuai Pei] Merge pull request #10 from apache/master
      f61210c [Zhongshuai Pei] Merge pull request #9 from apache/master
      34b1a9a [Zhongshuai Pei] Merge pull request #8 from apache/master
      802261c [DoingDone9] Merge pull request #7 from apache/master
      d00303b [DoingDone9] Merge pull request #6 from apache/master
      98b134f [DoingDone9] Merge pull request #5 from apache/master
      161cae3 [DoingDone9] Merge pull request #4 from apache/master
      c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
      cb1852d [DoingDone9] Merge pull request #2 from apache/master
      c3f046f [DoingDone9] Merge pull request #1 from apache/master
    • [SPARK-8286] Rewrite UTF8String in Java and move it into unsafe package. · 7d669a56
      Reynold Xin authored
      Unit test is still in Scala.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6738 from rxin/utf8string-java and squashes the following commits:
      
      562dc6e [Reynold Xin] Flag...
      98e600b [Reynold Xin] Another try with encoding setting ..
      cfa6bdf [Reynold Xin] Merge branch 'master' into utf8string-java
      a3b124d [Reynold Xin] Try different UTF-8 encoded characters.
      1ff7c82 [Reynold Xin] Enable UTF-8 encoding.
      82d58cc [Reynold Xin] Reset run-tests.
      2cb3c69 [Reynold Xin] Use utf-8 encoding in set bytes.
      53f8ef4 [Reynold Xin] Hack Jenkins to run one test.
      9a48e8d [Reynold Xin] Fixed runtime compilation error.
      911c450 [Reynold Xin] Moved unit test also to Java.
      4eff7bd [Reynold Xin] Improved unit test coverage.
      8e89a3c [Reynold Xin] Fixed tests.
      77c64bd [Reynold Xin] Fixed string type codegen.
      ffedb62 [Reynold Xin] Code review feedback.
      0967ce6 [Reynold Xin] Fixed import ordering.
      45a123d [Reynold Xin] [SPARK-8286] Rewrite UTF8String in Java and move it into unsafe package.
    • [SPARK-6511] [docs] Fix example command in hadoop-provided docs. · 9cbdf31e
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6766 from vanzin/SPARK-6511 and squashes the following commits:
      
      49f0f67 [Marcelo Vanzin] [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
    • [SPARK-7444] [TESTS] Eliminate noisy css warn/error logs for UISeleniumSuite · 95690a17
      zsxwing authored
      Eliminate the following noisy logs for `UISeleniumSuite`:
      ```
      15/05/07 10:09:50.196 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS error: 'http://192.168.0.170:4040/static/bootstrap.min.css' [793:167] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
      15/05/07 10:09:50.196 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS warning: 'http://192.168.0.170:4040/static/bootstrap.min.css' [793:167] Ignoring the following declarations in this rule.
      15/05/07 10:09:50.197 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS error: 'http://192.168.0.170:4040/static/bootstrap.min.css' [799:325] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
      15/05/07 10:09:50.197 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS warning: 'http://192.168.0.170:4040/static/bootstrap.min.css' [799:325] Ignoring the following declarations in this rule.
      15/05/07 10:09:50.198 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS error: 'http://192.168.0.170:4040/static/bootstrap.min.css' [805:18] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
      15/05/07 10:09:50.198 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS warning: 'http://192.168.0.170:4040/static/bootstrap.min.css' [805:18] Ignoring the following declarations in this rule.
      ```
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5983 from zsxwing/SPARK-7444 and squashes the following commits:
      
      4202728 [zsxwing] Add SparkUICssErrorHandler for all tests
      d1398ad [zsxwing] Merge remote-tracking branch 'origin/master' into SPARK-7444
      7bb7f11 [zsxwing] Merge branch 'master' into SPARK-7444
      a59f40e [zsxwing] Eliminate noisy css warn/error logs for UISeleniumSuite
    • [SPARK-7915] [SQL] Support specifying the column list for target table in CTAS · 040f223c
      Cheng Hao authored
      ```
      create table t1 (a int, b string) as select key, value from src;
      
      desc t1;
      key	int	NULL
      value	string	NULL
      ```
      
      Hive doesn't support specifying a column list for the target table in CTAS, so we should either throw an exception explicitly or support the feature. We pick the latter, which seems useful and straightforward.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #6458 from chenghao-intel/ctas_column and squashes the following commits:
      
      d1fa9b6 [Cheng Hao] bug in unittest
      4e701aa [Cheng Hao] update as feedback
      f305ec1 [Cheng Hao] support specifying the column list for target table in CTAS
    • [SPARK-8310] [EC2] Updates the master branch EC2 versions · c8d551d5
      Shivaram Venkataraman authored
      Will send another PR for `branch-1.4`
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6764 from shivaram/SPARK-8310 and squashes the following commits:
      
      d8cd3b3 [Shivaram Venkataraman] This updates the master branch EC2 versions
    • [SPARK-8305] [SPARK-8190] [SQL] improve codegen · 1191c3ef
      Davies Liu authored
      This PR fixes a few small issues in codegen:
      
      1. cast decimal to boolean
      2. do not inline literals with null values
      3. improve SpecificRow.equals()
      4. test expressions with optimized expressions
      5. fix comparison with BinaryType
      
      cc rxin chenghao-intel
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6755 from davies/fix_codegen and squashes the following commits:
      
      ef27343 [Davies Liu] address comments
      6617ea6 [Davies Liu] fix scala style
      70b7dda [Davies Liu] improve codegen
    • [SPARK-6411] [SQL] [PySpark] support date/datetime with timezone in Python · 424b0075
      Davies Liu authored
      Spark SQL does not support timezones, and Pyrolite does not handle timezones well. This patch converts datetime objects into POSIX timestamps (avoiding timezone confusion), which is what SQL uses. If a datetime object has no timezone, it is treated as local time.
      
      The timezone in an RDD will be lost after one round trip; all datetimes coming from SQL will be in local time.
      
      Because of Pyrolite, datetimes from SQL only have millisecond precision.
      
      This PR also drops the timezone in dates, converting them to the number of days since the epoch (as used in SQL).
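      A JVM-side illustration of the intended semantics (the patch itself is on the Python side; the java.time helpers below are just for illustration):
      
      ```scala
      import java.time.{LocalDate, ZonedDateTime}
      
      // A timezone-aware datetime becomes an absolute POSIX timestamp.
      def toPosixMillis(dt: ZonedDateTime): Long = dt.toInstant.toEpochMilli
      
      // A date drops its timezone and becomes days since the epoch.
      def toEpochDays(d: LocalDate): Long = d.toEpochDay
      ```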
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6250 from davies/tzone and squashes the following commits:
      
      44d8497 [Davies Liu] add timezone support for DateType
      99d9d9c [Davies Liu] use int for timestamp
      10aa7ca [Davies Liu] Merge branch 'master' of github.com:apache/spark into tzone
      6a29aa4 [Davies Liu] support datetime with timezone
    • [SPARK-8289] Specify stack size for consistency with Java tests - resolves test failures · 6b68366d
      Adam Roberts authored
      This change is a simple one: it specifies a stack size of 4096k instead of the vendor default for Java tests (the default varies between Java vendors). This remedies test failures observed in JavaALSSuite with IBM and Oracle Java, owing to their lower default stack size compared with OpenJDK. 4096k is a suitable default with which the tests pass on every Java vendor tested. The alternative would be to reduce the number of iterations in the test (no failures were observed with 5 iterations instead of 15).
      
      -Xss works with Oracle's HotSpot VM, IBM's J9 VM and OpenJDK (IcedTea).
      
      I have ensured this does not have any negative implications for other tests.
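      For example, the equivalent pinning in an sbt build might look like this (a sketch; the exact placement in Spark's build is an assumption):
      
      ```scala
      // build.sbt: fork test JVMs and pin the thread stack size so results
      // do not depend on the Java vendor's default.
      fork in Test := true
      javaOptions in Test += "-Xss4096k"
      ```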
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      Author: a-roberts <aroberts@uk.ibm.com>
      
      Closes #6727 from a-roberts/IncJavaStackSize and squashes the following commits:
      
      ab40aea [Adam Roberts] Specify stack size for SBT builds
      5032d8d [a-roberts] Update pom.xml
    • [HOTFIX] Fixing errors in name mappings · e84545fa
      Patrick Wendell authored
  5. Jun 10, 2015
    • a777eb04
      Patrick Wendell authored
    • [SPARK-8217] [SQL] math function log2 · 2758ff0a
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Reynold Xin <rxin@databricks.com>
      
      Closes #6718 from adrian-wang/udflog2 and squashes the following commits:
      
      3909f48 [Daoyuan Wang] math function: log2
    • [SPARK-8248][SQL] string function: length · 9fe3adcc
      Cheng Hao authored
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #6724 from chenghao-intel/length and squashes the following commits:
      
      aaa3c31 [Cheng Hao] revert the additional change
      97148a9 [Cheng Hao] remove the codegen testing temporally
      ae08003 [Cheng Hao] update the comments
      1eb1fd1 [Cheng Hao] simplify the code as commented
      3e92d32 [Cheng Hao] use the selectExpr in unit test instead of SQLQuery
      3c729aa [Cheng Hao] fix bug for constant null value in codegen
      3641f06 [Cheng Hao] keep the length() method for registered function
      8e30171 [Cheng Hao] update the code as comment
      db604ae [Cheng Hao] Add code gen support
      548d2ef [Cheng Hao] register the length()
      09a0738 [Cheng Hao] add length support
    • [SPARK-8164] transformExpressions should support nested expression sequence · 4e42842e
      Wenchen Fan authored
      Currently we only support `Seq[Expression]`; we should handle cases like `Seq[Seq[Expression]]` so that we can remove the unnecessary `GroupExpression`.
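      A minimal sketch of the recursion this requires (a standalone helper, not Spark's actual implementation):
      
      ```scala
      import org.apache.spark.sql.catalyst.expressions.Expression
      
      // Rewrite expressions found in a constructor argument, recursing through
      // nested sequences such as Seq[Seq[Expression]].
      def transformArg(arg: Any, rule: PartialFunction[Expression, Expression]): Any =
        arg match {
          case e: Expression => e.transform(rule)
          case s: Seq[_]     => s.map(transformArg(_, rule))
          case other         => other
        }
      ```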
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6706 from cloud-fan/clean and squashes the following commits:
      
      60a1193 [Wenchen Fan] support nested expression sequence and remove GroupExpression
    • [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first · 6a47114b
      navis.ryu authored
      ```scala
      case cs @ CombineSum(expr) =>
        val calcType = expr.dataType
          expr.dataType match {
            case DecimalType.Fixed(_, _) =>
              DecimalType.Unlimited
            case _ =>
              expr.dataType
          }
      ```
      calcType is always expr.dataType: the result of the match expression is discarded. Credit belongs entirely to IntelliJ.
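      The intended form presumably assigns the result of the match (a sketch; surrounding context omitted):
      
      ```scala
      val calcType = expr.dataType match {
        case DecimalType.Fixed(_, _) => DecimalType.Unlimited // widen first
        case _                       => expr.dataType
      }
      ```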
      
      Author: navis.ryu <navis@apache.org>
      
      Closes #6736 from navis/SPARK-8285 and squashes the following commits:
      
      20382c1 [navis.ryu] [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first