  1. Jul 28, 2015
    • [SPARK-9335] [STREAMING] [TESTS] Make sure the test stream is deleted in KinesisBackedBlockRDDSuite · d93ab93d
      zsxwing authored
      KinesisBackedBlockRDDSuite should make sure the test stream is deleted.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7663 from zsxwing/fix-SPARK-9335 and squashes the following commits:
      
      f0e9154 [zsxwing] Revert "[HOTFIX] - Disable Kinesis tests due to rate limits"
      71a4552 [zsxwing] Make sure the test stream is deleted
    • [MINOR] [SQL] Support mutable expression unit test with codegen projection · 9c5612f4
      Cheng Hao authored
      This actually addresses 3 minor issues:
      1) Enable the codegen unit tests for mutable expressions (FormatNumber, Regexp_Replace/Regexp_Extract)
      2) Use `PlatformDependent.copyMemory` instead of `System.arraycopy`
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7566 from chenghao-intel/codegen_ut and squashes the following commits:
      
      24f43ea [Cheng Hao] enable codegen for mutable expression & UTF8String performance
    • [SPARK-9373][SQL] Support StructType in Tungsten projection · 60f08c7c
      Reynold Xin authored
      This pull request updates GenerateUnsafeProjection to support StructType. If an input struct is already backed by an UnsafeRow, GenerateUnsafeProjection copies the bytes directly into its buffer space without any conversion. If the input is not an UnsafeRow, GenerateUnsafeProjection recursively runs generated code to convert the input into an UnsafeRow and then copies it into the buffer space.

      It also creates a TungstenProject operator that projects data directly into UnsafeRow. Note that I'm not sure this is the way we want to structure Unsafe+codegen operators, but we can defer that decision to follow-up pull requests.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7689 from rxin/tungsten-struct-type and squashes the following commits:
      
      9162f42 [Reynold Xin] Support IntervalType in UnsafeRow's getter.
      be9f377 [Reynold Xin] Fixed tests.
      10c4b7c [Reynold Xin] Format generated code.
      77e8d0e [Reynold Xin] Fixed NondeterministicSuite.
      ac4951d [Reynold Xin] Yay.
      ac203bf [Reynold Xin] More comments.
      9f36216 [Reynold Xin] Updated comment.
      6b781fe [Reynold Xin] Reset the change in DataFrameSuite.
      525b95b [Reynold Xin] Merged with master, more documentation & test cases.
      321859a [Reynold Xin] [SPARK-9373][SQL] Support StructType in Tungsten projection [WIP]
    • [SPARK-8828] [SQL] Revert SPARK-5680 · 63a492b9
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8828
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7667 from yjshen/revert_combinesum_2 and squashes the following commits:
      
      c37ccb1 [Yijie Shen] add test case
      8377214 [Yijie Shen] revert spark.sql.useAggregate2 to its default value
      e2305ac [Yijie Shen] fix bug - avg on decimal column
      7cb0e95 [Yijie Shen] [wip] resolving bugs
      1fadb5a [Yijie Shen] remove occurance
      17c6248 [Yijie Shen] revert SPARK-5680
    • Fixed a test failure. · 3bc7055e
      Reynold Xin authored
  2. Jul 27, 2015
    • [SPARK-9395][SQL] Create a SpecializedGetters interface to track all the specialized getters. · 84da8792
      Reynold Xin authored
      As we are adding more and more specialized getters to more classes (ArrayData is coming soon), this interface helps ensure that none of them misses a method.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7713 from rxin/SpecializedGetters and squashes the following commits:
      
      3b39be1 [Reynold Xin] Added override modifier.
      567ba9c [Reynold Xin] [SPARK-9395][SQL] Create a SpecializedGetters interface to track all the specialized getters.
    • [SPARK-8195] [SPARK-8196] [SQL] udf next_day last_day · 2e7f99a0
      Daoyuan Wang authored
      next_day returns the first date later than the given date that falls on the specified day of the week.
      last_day returns the last day of the month to which the given date belongs.
      
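      A minimal sketch of the two semantics described above, written with `java.time` rather than Spark's own date utilities. `DateFunctionsSketch` and its methods are hypothetical, and the day name is assumed to be a full English name such as "Sunday"; Spark's actual `next_day` may also accept abbreviations.

      ```scala
      import java.time.{DayOfWeek, LocalDate}
      import java.time.temporal.TemporalAdjusters

      // Hypothetical helpers illustrating next_day / last_day; not Spark's implementation.
      object DateFunctionsSketch {
        // next_day: the first date strictly after `start` that falls on `dayName`.
        // Assumption: `dayName` is a full English day name, e.g. "Sunday" (case-insensitive).
        def nextDay(start: LocalDate, dayName: String): LocalDate =
          start.`with`(TemporalAdjusters.next(DayOfWeek.valueOf(dayName.trim.toUpperCase)))

        // last_day: the last day of the month that `date` belongs to.
        def lastDay(date: LocalDate): LocalDate =
          date.`with`(TemporalAdjusters.lastDayOfMonth())
      }
      ```

      For example, 2015-07-27 was a Monday, so `nextDay` with "Sunday" gives 2015-08-02 and with "Monday" gives 2015-08-03, because `TemporalAdjusters.next` always moves strictly forward.
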
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #6986 from adrian-wang/udfnlday and squashes the following commits:
      
      ef7e3da [Daoyuan Wang] fix
      02b3426 [Daoyuan Wang] address 2 comments
      dc69630 [Daoyuan Wang] address comments from rxin
      8846086 [Daoyuan Wang] address comments from rxin
      d09bcce [Daoyuan Wang] multi fix
      1a9de3d [Daoyuan Wang] function next_day and last_day
    • [SPARK-8882] [STREAMING] Add a new Receiver scheduling mechanism · daa1964b
      zsxwing authored
      The design doc: https://docs.google.com/document/d/1ZsoRvHjpISPrDmSjsGzuSu8UjwgbtmoCTzmhgTurHJw/edit?usp=sharing
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7276 from zsxwing/receiver-scheduling and squashes the following commits:
      
      137b257 [zsxwing] Add preferredNumExecutors to rescheduleReceiver
      61a6c3f [zsxwing] Set state to ReceiverState.INACTIVE in deregisterReceiver
      5e1fa48 [zsxwing] Fix the code style
      7451498 [zsxwing] Move DummyReceiver back to ReceiverTrackerSuite
      715ef9c [zsxwing] Rename: scheduledLocations -> scheduledExecutors; locations -> executors
      05daf9c [zsxwing] Use receiverTrackingInfo.toReceiverInfo
      1d6d7c8 [zsxwing] Merge branch 'master' into receiver-scheduling
      8f93c8d [zsxwing] Use hostPort as the receiver location rather than host; fix comments and unit tests
      59f8887 [zsxwing] Schedule all receivers at the same time when launching them
      075e0a3 [zsxwing] Add receiver RDD name; use '!isTrackerStarted' instead
      276a4ac [zsxwing] Remove "ReceiverLauncher" and move codes to "launchReceivers"
      fab9a01 [zsxwing] Move methods back to the outer class
      4e639c4 [zsxwing] Fix unintentional changes
      f60d021 [zsxwing] Reorganize ReceiverTracker to use an event loop for lock free
      105037e [zsxwing] Merge branch 'master' into receiver-scheduling
      5fee132 [zsxwing] Update tha scheduling algorithm to avoid to keep restarting Receiver
      9e242c8 [zsxwing] Remove the ScheduleReceiver message because we can refuse it when receiving RegisterReceiver
      a9acfbf [zsxwing] Merge branch 'squash-pr-6294' into receiver-scheduling
      881edb9 [zsxwing] ReceiverScheduler -> ReceiverSchedulingPolicy
      e530bcc [zsxwing] [SPARK-5681][Streaming] Use a lock to eliminate the race condition when stopping receivers and registering receivers happen at the same time #6294
      3b87e4a [zsxwing] Revert SparkContext.scala
      a86850c [zsxwing] Remove submitAsyncJob and revert JobWaiter
      f549595 [zsxwing] Add comments for the scheduling approach
      9ecc08e [zsxwing] Fix comments and code style
      28d1bee [zsxwing] Make 'host' protected; rescheduleReceiver -> getAllowedLocations
      2c86a9e [zsxwing] Use tryFailure to support calling jobFailed multiple times
      ca6fe35 [zsxwing] Add a test for Receiver.restart
      27acd45 [zsxwing] Add unit tests for LoadBalanceReceiverSchedulerImplSuite
      cc76142 [zsxwing] Add JobWaiter.toFuture to avoid blocking threads
      d9a3e72 [zsxwing] Add a new Receiver scheduling mechanism
    • [SPARK-9386] [SQL] Feature flag for metastore partition pruning · ce89ff47
      Michael Armbrust authored
      Since we have been seeing a lot of failures related to this new feature, let's put it behind a flag and turn it off by default.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #7703 from marmbrus/optionalMetastorePruning and squashes the following commits:
      
      6ad128c [Michael Armbrust] style
      8447835 [Michael Armbrust] [SPARK-9386][SQL] Feature flag for metastore partition pruning
      fd37b87 [Michael Armbrust] add config flag
    • [SPARK-9230] [ML] Support StringType features in RFormula · 8ddfa52c
      Eric Liang authored
      This adds StringType feature support via OneHotEncoder. As part of this task, it was necessary to change RFormula to an Estimator, so that factor levels could be determined from the training dataset.

      I'm not sure whether I am using uids correctly here; it would be good to get reviewer help on that.
      cc mengxr
      
      Umbrella design doc: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit#
      
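      To illustrate why RFormula had to become an Estimator: the set of factor levels is only known after scanning the training data, so a `fit` step must record it before any row can be encoded. The sketch below is hypothetical (`OneHotEstimator` and `OneHotModel` are not Spark ML classes), orders levels alphabetically as an assumption, and encodes every level; Spark's StringIndexer/OneHotEncoder pipeline orders levels by label frequency and may drop one level.

      ```scala
      // Hypothetical sketch of the Estimator/Model split; these are not Spark ML classes.
      final case class OneHotModel(levels: Vector[String]) {
        private val index: Map[String, Int] = levels.zipWithIndex.toMap

        // Encode one value as a 0/1 vector with one slot per level seen during fit.
        def transform(value: String): Array[Double] = {
          val out = Array.fill(levels.size)(0.0)
          index.get(value) match {
            case Some(i) => out(i) = 1.0
            case None    => throw new IllegalArgumentException(s"Unseen level: $value")
          }
          out
        }
      }

      object OneHotEstimator {
        // fit scans the training column to learn its levels (sorted here only for determinism).
        def fit(training: Seq[String]): OneHotModel =
          OneHotModel(training.distinct.sorted.toVector)
      }
      ```

      Because the levels are fixed at fit time, a value that did not occur in the training data cannot be encoded; this sketch simply throws in that case.
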
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #7574 from ericl/string-features and squashes the following commits:
      
      f99131a [Eric Liang] comments
      0bf3c26 [Eric Liang] update docs
      c302a2c [Eric Liang] fix tests
      9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features
      e713da3 [Eric Liang] comments
      4d79193 [Eric Liang] revert to seq + distinct
      169a085 [Eric Liang] tweak functional test
      a230a47 [Eric Liang] Merge branch 'master' into string-features
      72bd6f3 [Eric Liang] fix merge
      d841cec [Eric Liang] Merge branch 'master' into string-features
      5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015
      b01c7c5 [Eric Liang] add test
      8a637db [Eric Liang] encoder wip
      a1d03f4 [Eric Liang] refactor into estimator
    • [SPARK-9385] [PYSPARK] Enable PEP8 but disable installing pylint. · dafe8d85
      Yin Huai authored
      Instead of disabling all Python style checks, we should keep PEP8 enabled. So this PR only comments out the part that installs pylint.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7704 from yhuai/SPARK-9385 and squashes the following commits:
      
      0056359 [Yin Huai] Enable PEP8 but disable installing pylint.
    • [SPARK-4352] [YARN] [WIP] Incorporate locality preferences in dynamic allocation requests · ab625956
      jerryshao authored
      Currently there is no locality preference for container requests in YARN mode, which hurts performance when data has to be fetched remotely. This PR proposes adding locality preferences to container requests in YARN dynamic allocation mode.

      Ping sryza, please help review. Thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #6394 from jerryshao/SPARK-4352 and squashes the following commits:
      
      d45fecb [jerryshao] Add documents
      6c3fe5c [jerryshao] Fix bug
      8db6c0e [jerryshao] Further address the comments
      2e2b2cb [jerryshao] Fix rebase compiling problem
      ce5f096 [jerryshao] Fix style issue
      7f7df95 [jerryshao] Fix rebase issue
      9ca9e07 [jerryshao] Code refactor according to comments
      d3e4236 [jerryshao] Further address the comments
      5e7a593 [jerryshao] Fix bug introduced code rebase
      9ca7783 [jerryshao] Style changes
      08317f9 [jerryshao] code and comment refines
      65b2423 [jerryshao] Further address the comments
      a27c587 [jerryshao] address the comment
      27faabc [jerryshao] redundant code remove
      9ce06a1 [jerryshao] refactor the code
      f5ba27b [jerryshao] Style fix
      2c6cc8a [jerryshao] Fix bug and add unit tests
      0757335 [jerryshao] Consider the distribution of existed containers to recalculate the new container requests
      0ad66ff [jerryshao] Fix compile bugs
      1c20381 [jerryshao] Minor fix
      5ef2dc8 [jerryshao] Add docs and improve the code
      3359814 [jerryshao] Fix rebase and test bugs
      0398539 [jerryshao] reinitialize the new implementation
      67596d6 [jerryshao] Still fix the code
      654e1d2 [jerryshao] Fix some bugs
      45b1c89 [jerryshao] Further polish the algorithm
      dea0152 [jerryshao] Enable node locality information in YarnAllocator
      74bbcc6 [jerryshao] Support node locality for dynamic allocation initial commit
    • [SPARK-9385] [HOT-FIX] [PYSPARK] Comment out Python style check · 2104931d
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-9385

      Comment out the Python style check because of the error shown in https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3088/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7702 from yhuai/SPARK-9385 and squashes the following commits:
      
      146e6ef [Yin Huai] Comment out Python style check because of error shown in https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3088/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console
    • [SPARK-8988] [YARN] Make sure driver log links appear in secure cluster mode. · c1be9f30
      Hari Shreedharan authored
      The NodeReports API currently used does not work in secure mode, since we do not get RM tokens. Instead, this patch uses environment variables exported by YARN to create the log links.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #7624 from harishreedharan/driver-logs-env and squashes the following commits:
      
      7368c7e [Hari Shreedharan] [SPARK-8988][YARN] Make sure driver log links appear in secure cluster mode.
    • [SPARK-9355][SQL] Remove InternalRow.get generic getter call in columnar cache code · 3ab7525d
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7673 from cloud-fan/row-generic-getter-columnar and squashes the following commits:
      
      88b1170 [Wenchen Fan] fix style
      eeae712 [Wenchen Fan] Remove Internal.get generic getter call in columnar cache code
    • [SPARK-9378] [SQL] Fixes test case "CTAS with serde" · 8e7d2bee
      Cheng Lian authored
      This is a proper version of PR #7693, authored by viirya.
      
      The reason why "CTAS with serde" fails is that the `MetastoreRelation` gets converted to a Parquet data source relation by default.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7700 from liancheng/spark-9378-fix-ctas-test and squashes the following commits:
      
      4413af0 [Cheng Lian] Fixes test case "CTAS with serde"
    • [SPARK-9349] [SQL] UDAF cleanup · 55946e76
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-9349
      
      With this PR, we only expose `UserDefinedAggregateFunction` (an abstract class) and `MutableAggregationBuffer` (an interface). Other internal wrappers and helper classes are moved to `org.apache.spark.sql.execution.aggregate` and marked as `private[sql]`.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7687 from yhuai/UDAF-cleanup and squashes the following commits:
      
      db36542 [Yin Huai] Add comments to UDAF examples.
      ae17f66 [Yin Huai] Address comments.
      9c9fa5f [Yin Huai] UDAF cleanup.
    • fa84e4a7
      Reynold Xin authored
    • 85a50a63
      Reynold Xin authored
    • [SPARK-9369][SQL] Support IntervalType in UnsafeRow · 75438422
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7688 from cloud-fan/interval and squashes the following commits:
      
      5b36b17 [Wenchen Fan] fix codegen
      a99ed50 [Wenchen Fan] address comment
      9e6d319 [Wenchen Fan] Support IntervalType in UnsafeRow
    • [SPARK-9351] [SQL] remove literals from grouping expressions in Aggregate · dd9ae794
      Wenchen Fan authored
      Literals in grouping expressions have no effect on grouping and only make the grouping key bigger, so we should remove them in the Optimizer.

      This also makes the old and new aggregation code consistent about literals in grouping: the old aggregation path already removes them, but the new one does not, so I made this an explicit Optimizer rule.
      
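      A minimal sketch of such an Optimizer rule on a toy plan representation. `Expr`, `Literal`, `Attribute`, `Aggregate` and `RemoveLiteralsFromGrouping` are hypothetical stand-ins, not Catalyst classes. The guard for an all-literal grouping list is an assumption of this sketch: dropping every grouping expression would turn the query into a global aggregate, which returns one row even for empty input.

      ```scala
      // Toy expression tree standing in for Catalyst expressions (hypothetical).
      sealed trait Expr
      final case class Literal(value: Any) extends Expr
      final case class Attribute(name: String) extends Expr

      // Toy aggregate node: grouping expressions plus opaque aggregate descriptions.
      final case class Aggregate(grouping: Seq[Expr], aggregates: Seq[String])

      object RemoveLiteralsFromGrouping {
        def apply(agg: Aggregate): Aggregate = {
          // A literal has the same value in every row, so it never splits groups.
          val nonLiteral = agg.grouping.filterNot(_.isInstanceOf[Literal])
          // Keep the plan unchanged if nothing but literals remain (see lead-in).
          if (nonLiteral.isEmpty) agg else agg.copy(grouping = nonLiteral)
        }
      }
      ```
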
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7583 from cloud-fan/minor and squashes the following commits:
      
      471adff [Wenchen Fan] add test
      0839925 [Wenchen Fan] use transformDown when rewrite final result expressions
    • [SPARK-7423] [MLLIB] Modify ClassificationModel and Probabalistic model to use Vector.argmax · 1f7b3d9d
      George Dittmar authored
      Use Vector.argmax instead of converting to a dense vector before calculating predictions.
      
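      The point of calling `argmax` directly is that a sparse vector can answer the question without allocating a dense copy. A hypothetical sketch (`SparseVec` is not MLlib's `SparseVector`) has to account for the implicit zeros: if every stored value is negative, the answer is the first index that is not stored. Resolving ties to the smallest index is an assumption of this sketch.

      ```scala
      // Hypothetical minimal sparse vector (not MLlib's SparseVector).
      // Assumes `indices` is strictly increasing and every index is in [0, size).
      final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {

        // Index of the largest entry, computed without building a dense array.
        // Entries that are not stored count as 0.0; ties resolve to the smallest index.
        def argmax: Int = {
          if (size == 0) return -1
          if (indices.isEmpty) return 0 // every entry is an implicit zero

          // Largest explicitly stored value (first occurrence wins on ties).
          var best = 0
          var i = 1
          while (i < indices.length) {
            if (values(i) > values(best)) best = i
            i += 1
          }
          val maxValue = values(best)
          val bestStored = indices(best)

          if (indices.length == size || maxValue > 0.0) bestStored
          else {
            // At least one entry is an implicit zero: find the first index that is not stored.
            var k = 0
            while (k < indices.length && indices(k) == k) k += 1
            if (maxValue < 0.0) k
            else math.min(bestStored, k) // maxValue == 0.0 ties with the implicit zeros
          }
        }
      }
      ```
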
      Author: George Dittmar <georgedittmar@gmail.com>
      
      Closes #7670 from GeorgeDittmar/sprk-7423 and squashes the following commits:
      
      e796747 [George Dittmar] Changing ClassificationModel and ProbabilisticClassificationModel to use Vector.argmax instead of converting to DenseVector
    • [SPARK-9376] [SQL] use a seed in RandomDataGeneratorSuite · e2f38167
      Wenchen Fan authored
      Make this test deterministic, i.e. make sure it passes no matter how many times we run it.

      The original implementation uses a random seed, so there is a chance of breaking the null check assertion `assert(Iterator.fill(100)(generator()).contains(null))`.
      
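      A minimal sketch of the idea, assuming a hypothetical generator that returns `null` about 10% of the time: seeding `scala.util.Random` makes the sampled sequence identical on every run, so an assertion such as `contains(null)` either always passes or always fails instead of flaking.

      ```scala
      import scala.util.Random

      object SeededGeneratorSketch {
        // Hypothetical generator: null with probability 0.1, otherwise a random Int.
        def nullableIntGenerator(rand: Random): () => Any =
          () => if (rand.nextDouble() < 0.1) null else rand.nextInt()

        // With a fixed seed, the same n values are produced in the same order every run.
        def sample(seed: Long, n: Int): Seq[Any] = {
          val gen = nullableIntGenerator(new Random(seed))
          Seq.fill(n)(gen())
        }
      }
      ```
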
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7691 from cloud-fan/seed and squashes the following commits:
      
      eae7281 [Wenchen Fan] use a seed in RandomDataGeneratorSuite
    • [SPARK-9366] use task's stageAttemptId in TaskEnd event · c0b7df68
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #7681 from ryan-williams/task-stage-attempt and squashes the following commits:
      
      d6d5f0f [Ryan Williams] use task's stageAttemptId in TaskEnd event
    • [SPARK-9364] Fix array out of bounds and use-after-free bugs in UnsafeExternalSorter · ecad9d43
      Josh Rosen authored
      This patch fixes two bugs in UnsafeExternalSorter and UnsafeExternalRowSorter:
      
      - UnsafeExternalSorter does not properly update freeSpaceInCurrentPage, which can cause it to write past the end of memory pages and trigger segfaults.
      - UnsafeExternalRowSorter has a use-after-free bug when returning the last row from an iterator.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7680 from JoshRosen/SPARK-9364 and squashes the following commits:
      
      590f311 [Josh Rosen] null out row
      f4cf91d [Josh Rosen] Fix use-after-free bug in UnsafeExternalRowSorter.
      8abcf82 [Josh Rosen] Properly decrement freeSpaceInCurrentPage in UnsafeExternalSorter
    • Pregel example type fix · 90006f3c
      Alexander Ulanov authored
      The Pregel example expressing single-source shortest paths in https://spark.apache.org/docs/latest/graphx-programming-guide.html#pregel-api does not work due to an incorrect type: `GraphGenerators.logNormalGraph` returns a graph with `Long` vertices. This fix changes `val graph: Graph[Int, Double]` to `val graph: Graph[Long, Double]`.
      
      Author: Alexander Ulanov <nashb@yandex.ru>
      
      Closes #7695 from avulanov/SPARK-9380-pregel-doc and squashes the following commits:
      
      c269429 [Alexander Ulanov] Pregel example type fix
    • [SPARK-4176] [SQL] Supports decimal types with precision > 18 in Parquet · aa19c696
      Rene Treffer authored
      This PR is based on #6796 authored by rtreffer.
      
      To support large decimal precisions (> 18), we do the following things in this PR:
      
      1. Making `CatalystSchemaConverter` support large decimal precision
      
         Decimal types with large precision are always converted to a fixed-length byte array.

      2. Making `CatalystRowConverter` support reading decimal values with large precision

         When the precision is > 18, `Decimal` values are constructed with an unscaled `BigInteger` rather than an unscaled `Long`.

      3. Making `RowWriteSupport` support writing decimal values with large precision

         In this PR, we always write decimals as fixed-length byte arrays, because the Parquet write path hasn't been refactored to conform to the Parquet format spec (see SPARK-6774 & SPARK-8848).
      
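      A worked sketch of the two facts behind this change, written independently of Spark's converters (`DecimalBytesSketch` is hypothetical). First, `Long.MaxValue` is 9223372036854775807, which has 19 digits, so every unscaled value with at most 18 digits fits in a `Long` but some 19-digit values do not. Second, a fixed-length byte array of n bytes holds two's-complement values up to 2^(8n-1) - 1, so precision p needs the smallest n with 2^(8n-1) >= 10^p.

      ```scala
      import java.math.{BigDecimal => JBigDecimal, BigInteger}

      object DecimalBytesSketch {
        // Smallest n such that an n-byte two's-complement integer can hold every
        // unscaled value with `precision` digits, i.e. 2^(8n - 1) >= 10^precision.
        def minBytesForPrecision(precision: Int): Int = {
          val limit = BigInteger.TEN.pow(precision)
          var n = 1
          while (BigInteger.ONE.shiftLeft(8 * n - 1).compareTo(limit) < 0) n += 1
          n
        }

        // Encode the unscaled value as a big-endian, sign-extended array of `numBytes` bytes.
        def encode(d: JBigDecimal, numBytes: Int): Array[Byte] = {
          val raw = d.unscaledValue.toByteArray // minimal big-endian two's complement
          require(raw.length <= numBytes, s"value needs ${raw.length} bytes, only $numBytes available")
          val pad = (if (d.signum < 0) -1 else 0).toByte // sign extension
          Array.fill[Byte](numBytes - raw.length)(pad) ++ raw
        }

        // Decode bytes written by `encode`, given the column's scale.
        def decode(bytes: Array[Byte], scale: Int): JBigDecimal =
          new JBigDecimal(new BigInteger(bytes), scale)
      }
      ```

      With this formula, precision 18 needs 8 bytes, precision 19 needs 9, and precision 38 needs 16.
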
      Two follow-up tasks should be done in future PRs:
      
      - [ ] Writing decimals as `INT32`, `INT64` when possible while fixing SPARK-8848
      - [ ] Adding compatibility tests as part of SPARK-5463
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7455 from liancheng/spark-4176 and squashes the following commits:
      
      a543d10 [Cheng Lian] Fixes errors introduced while rebasing
      9e31cdf [Cheng Lian] Supports decimals with precision > 18 for Parquet
    • [SPARK-8405] [DOC] Add how to view logs on Web UI when yarn log aggregation is enabled · 62283816
      Carson Wang authored
      Some users may not be aware that the logs are available on the Web UI even when YARN log aggregation is enabled. Update the doc to make this clear and to explain what needs to be configured.
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #7463 from carsonwang/YarnLogDoc and squashes the following commits:
      
      274c054 [Carson Wang] Minor text fix
      74df3a1 [Carson Wang] address comments
      5a95046 [Carson Wang] Update the text in the doc
      e5775c1 [Carson Wang] Update doc about how to view the logs on Web UI when yarn log aggregation is enabled
    • [SPARK-7943] [SPARK-8105] [SPARK-8435] [SPARK-8714] [SPARK-8561] Fixes multi-database support · 72981bc8
      Cheng Lian authored
      This PR fixes a set of issues related to multi-database. A new data structure `TableIdentifier` is introduced to identify a table among multiple databases. We should stop using a single `String` (table name without database name), or `Seq[String]` (optional database name plus table name) to identify tables internally.
      
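      A simplified, hypothetical stand-in for such an identifier (named `TableId` here; the real `TableIdentifier` may have different fields and methods). Keeping the database as a separate optional field lets two tables named `src` in different databases compare as different, and lets a missing database be filled in explicitly from the current one.

      ```scala
      // Simplified, hypothetical stand-in for a table identifier; not Spark's class.
      final case class TableId(table: String, database: Option[String] = None) {
        // Fill in the current database only when none was given explicitly.
        def withDefaultDatabase(current: String): TableId =
          if (database.isDefined) this else copy(database = Some(current))

        // Human-readable, backquoted form such as `db`.`tbl`.
        def quotedString: String =
          database.map(db => s"`$db`.`$table`").getOrElse(s"`$table`")
      }
      ```
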
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7623 from liancheng/spark-8131-multi-db and squashes the following commits:
      
      f3bcd4b [Cheng Lian] Addresses PR comments
      e0eb76a [Cheng Lian] Fixes styling issues
      41e2207 [Cheng Lian] Fixes multi-database support
      d4d1ec2 [Cheng Lian] Adds multi-database test cases
    • [SPARK-9371][SQL] fix the support for special chars in column names for hive context · 4ffd3a1d
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7684 from cloud-fan/hive and squashes the following commits:
      
      da21ffe [Wenchen Fan] fix the support for special chars in column names for hive context
    • [SPARK-9368][SQL] Support get(ordinal, dataType) generic getter in UnsafeRow. · aa80c64f
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7682 from rxin/unsaferow-generic-getter and squashes the following commits:
      
      3063788 [Reynold Xin] Reset the change for real this time.
      0f57c55 [Reynold Xin] Reset the changes in ExpressionEvalHelper.
      fb6ca30 [Reynold Xin] Support BinaryType.
      24a3e46 [Reynold Xin] Added support for DateType/TimestampType.
      9989064 [Reynold Xin] JoinedRow.
      11f80a3 [Reynold Xin] [SPARK-9368][SQL] Support get(ordinal, dataType) generic getter in UnsafeRow.
    • [SPARK-9306] [SQL] Don't use SortMergeJoin when joining on unsortable columns · 945d8bcb
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9306
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7645 from viirya/smj_unsortable and squashes the following commits:
      
      a240707 [Liang-Chi Hsieh] Use forall instead of exists for readability.
      55221fa [Liang-Chi Hsieh] Shouldn't use SortMergeJoin when joining on unsortable columns.
  3. Jul 26, 2015
    • [SPARK-8867][SQL] Support list / describe function usage · 1efe97dc
      Cheng Hao authored
      As Hive does, we need to list all registered UDFs and their usage for users.

      We add an annotation to describe a UDF, so that its description can be retrieved when the UDF is registered.
      e.g.
      ```scala
      @ExpressionDescription(
        usage = "_FUNC_(expr) - Returns the absolute value of the numeric value",
        extended = """> SELECT _FUNC_('-1')
                      1""")
      case class Abs(child: Expression) extends UnaryArithmetic {
        // ...
      }
      ```
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7259 from chenghao-intel/desc_function and squashes the following commits:
      
      cf29bba [Cheng Hao] fixing the code style issue
      5193855 [Cheng Hao] Add more powerful parser for show functions
      c645a6b [Cheng Hao] fix bug in unit test
      78d40f1 [Cheng Hao] update the padding issue for usage
      48ee4b3 [Cheng Hao] update as feedback
      70eb4e9 [Cheng Hao] add show/describe function support
    • [SPARK-9095] [SQL] Removes the old Parquet support · c025c3d0
      Cheng Lian authored
      This PR removes the old Parquet support:
      
      - Removes the old `ParquetRelation` together with related SQL configuration, plan nodes, strategies, utility classes, and test suites.
      
      - Renames `ParquetRelation2` to `ParquetRelation`
      
      - Renames `RowReadSupport` and `RowRecordMaterializer` to `CatalystReadSupport` and `CatalystRecordMaterializer` respectively, and moves them to separate files.

        This follows the naming convention used by other Parquet data models implemented in parquet-mr, which should make the code easier to follow for developers who are familiar with Parquet.

      There's still other code that can be cleaned up, especially `RowWriteSupport`, but I'd like to leave that part to SPARK-8848.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7441 from liancheng/spark-9095 and squashes the following commits:
      
      c7b6e38 [Cheng Lian] Removes WriteToFile
      2d688d6 [Cheng Lian] Renames ParquetRelation2 to ParquetRelation
      ca9e1b7 [Cheng Lian] Removes old Parquet support
    • [SPARK-9326] Close lock file used for file downloads. · 6b2baec0
      Kay Ousterhout authored
      A lock file is used to ensure multiple executors running on the
      same machine don't download the same file concurrently. Spark never
      closes these lock files (releasing the lock does not close the
      underlying file); this commit fixes that.
      
      cc vanzin (looks like you've been involved in various other fixes surrounding these lock files)
      
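      A minimal sketch of the pattern, assuming the lock is taken through `java.nio` on a `RandomAccessFile` (the helper name `withFileLock` is hypothetical, not Spark's code). The important part is the outer `finally`: `FileLock.release()` frees the lock, but only `close()` frees the file descriptor.

      ```scala
      import java.io.{File, RandomAccessFile}

      object LockFileSketch {
        // Run `body` while holding an exclusive lock on `lockFile`, then close the file.
        def withFileLock[T](lockFile: File)(body: => T): T = {
          val raf = new RandomAccessFile(lockFile, "rw")
          try {
            val lock = raf.getChannel.lock() // blocks until the lock can be acquired
            try body
            finally lock.release() // releases the lock but leaves the file open
          } finally {
            raf.close() // closes the file (and its channel), avoiding the descriptor leak
          }
        }
      }
      ```
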
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #7650 from kayousterhout/SPARK-9326 and squashes the following commits:
      
      0401bd1 [Kay Ousterhout] Close lock file used for file downloads.
    • [SPARK-9352] [SPARK-9353] Add tests for standalone scheduling code · 1cf19760
      Andrew Or authored
      This also fixes a small issue in the standalone Master that was uncovered by the new tests. For more detail, read the description of SPARK-9353.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7668 from andrewor14/standalone-scheduling-tests and squashes the following commits:
      
      d852faf [Andrew Or] Add tests + fix scheduling with memory limits
    • [SPARK-9356][SQL]Remove the internal use of DecimalType.Unlimited · fb5d43fb
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9356
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7671 from yjshen/deprecated_unlimit and squashes the following commits:
      
      c707f56 [Yijie Shen] remove pattern matching in changePrecision
      4a1823c [Yijie Shen] remove internal occurrence of Decimal.Unlimited
    • [SPARK-9354][SQL] Remove InternalRow.get generic getter call in Hive integration code. · 6c400b4f
      Reynold Xin authored
      Replaced them with get(ordinal, datatype) so we can use UnsafeRow here.
      
      I passed the data types throughout.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7669 from rxin/row-generic-getter-hive and squashes the following commits:
      
      3467d8e [Reynold Xin] [SPARK-9354][SQL] Remove Internal.get generic getter call in Hive integration code.
    • [SPARK-9337] [MLLIB] Add an ut for Word2Vec to verify the empty vocabulary check · b79bf1df
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-9337

      Word2Vec should throw an exception when the vocabulary is empty.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #7660 from hhbyyh/ut4Word2vec and squashes the following commits:
      
      17a18cb [Yuhao Yang] add ut for word2vec
    • [SPARK-9350][SQL] Introduce an InternalRow generic getter that requires a DataType · 4a01bfc2
      Reynold Xin authored
      Currently UnsafeRow cannot support a generic getter. However, if the data type is known, we can support a generic getter.
      
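      A hypothetical sketch of why the data type is required (`WordRow` and this small `DataType` hierarchy are illustrations, not Spark's `InternalRow`/`UnsafeRow`). When fields are stored as raw 64-bit words, the same bits can be an `Int`, a `Long`, a `Double` or a `Boolean`, so a generic `get` can only decode a slot if the caller passes the type.

      ```scala
      // Hypothetical mini type system; not Spark's DataType classes.
      sealed trait DataType
      case object IntegerType extends DataType
      case object LongType extends DataType
      case object DoubleType extends DataType
      case object BooleanType extends DataType

      // Every field is stored as a raw 64-bit word, so the bits alone do not
      // say which type a slot holds.
      final class WordRow(words: Array[Long]) {
        // Generic getter: the caller-supplied type decides how the bits are decoded.
        def get(ordinal: Int, dataType: DataType): Any = dataType match {
          case IntegerType => words(ordinal).toInt
          case LongType    => words(ordinal)
          case DoubleType  => java.lang.Double.longBitsToDouble(words(ordinal))
          case BooleanType => words(ordinal) != 0L
        }
      }
      ```
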
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7666 from rxin/generic-getter-with-datatype and squashes the following commits:
      
      ee2874c [Reynold Xin] Add a default implementation for getStruct.
      1e109a0 [Reynold Xin] [SPARK-9350][SQL] Introduce an InternalRow generic getter that requires a DataType.
      033ee88 [Reynold Xin] Removed getAs in non test code.