  1. Aug 08, 2015
    • Joseph Batchik's avatar
      [SPARK-9486][SQL] Add data source aliasing for external packages · a3aec918
      Joseph Batchik authored
      Users currently have to provide the full class name for external data sources, like:
      
      `sqlContext.read.format("com.databricks.spark.avro").load(path)`
      
      This allows external data source packages to register themselves using a Service Loader so that they can add a custom alias like:
      
      `sqlContext.read.format("avro").load(path)`
      
      This makes it so that using external data source packages uses the same format as the internal data sources like parquet, json, etc.
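
The alias lookup described above can be sketched in a few lines. This is a plain-Python analogy, not Spark's Scala implementation: the `REGISTERED` table, the `resolve_format` helper, and the `example.internal.ParquetSource` class name are hypothetical stand-ins for whatever a Service Loader would discover at runtime. The error branch mirrors the squashed commit about catching multiple data sources.

```python
# Hypothetical table of (alias, provider class name) pairs, standing in for
# the providers a Service Loader would discover at runtime.
REGISTERED = [
    ("avro", "com.databricks.spark.avro"),
    ("parquet", "example.internal.ParquetSource"),  # hypothetical class name
]


def resolve_format(name, registered=REGISTERED):
    """Map a short alias to a provider, falling back to the full class name."""
    matches = [cls for alias, cls in registered if alias.lower() == name.lower()]
    if not matches:
        # No alias registered: treat the name as a fully qualified class name.
        return name
    if len(matches) > 1:
        raise ValueError("Multiple sources found for %s: %s" % (name, matches))
    return matches[0]
```

With this table, `resolve_format("avro")` and `resolve_format("com.databricks.spark.avro")` resolve to the same provider, which is the user-facing effect the commit describes.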
      
      Author: Joseph Batchik <joseph.batchik@cloudera.com>
      Author: Joseph Batchik <josephbatchik@gmail.com>
      
      Closes #7802 from JDrit/service_loader and squashes the following commits:
      
      49a01ec [Joseph Batchik] fixed a couple of format / error bugs
      e5e93b2 [Joseph Batchik] modified rat file to only excluded added services
      72b349a [Joseph Batchik] fixed error with orc data source actually
      9f93ea7 [Joseph Batchik] fixed error with orc data source
      87b7f1c [Joseph Batchik] fixed typo
      101cd22 [Joseph Batchik] removing unneeded changes
      8f3cf43 [Joseph Batchik] merged in changes
      b63d337 [Joseph Batchik] merged in master
      95ae030 [Joseph Batchik] changed the new trait to be used as a mixin for data source to register themselves
      74db85e [Joseph Batchik] reformatted class loader
      ac2270d [Joseph Batchik] removing some added test
      a6926db [Joseph Batchik] added test cases for data source loader
      208a2a8 [Joseph Batchik] changes to do error catching if there are multiple data sources
      946186e [Joseph Batchik] started working on service loader
      a3aec918
    • Yijie Shen's avatar
      [SPARK-9728][SQL]Support CalendarIntervalType in HiveQL · 23695f1d
      Yijie Shen authored
      This PR enables converting interval terms in HiveQL to CalendarInterval literals.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-9728
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #8034 from yjshen/interval_hiveql and squashes the following commits:
      
      7fe9a5e [Yijie Shen] declare throw exception and add unit test
      fce7795 [Yijie Shen] convert hiveql interval term into CalendarInterval literal
      23695f1d
    • Davies Liu's avatar
      [SPARK-6902] [SQL] [PYSPARK] Row should be read-only · ac507a03
      Davies Liu authored
      Raise a read-only exception when a user tries to mutate a Row.
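
A minimal sketch of the idea, assuming a tuple-backed row: `ReadOnlyRow` is a hypothetical class written for illustration, and the exception type and message are assumptions rather than what PySpark's `Row` actually raises.

```python
class ReadOnlyRow(tuple):
    """Hypothetical tuple-backed row whose fields cannot be reassigned."""

    def __new__(cls, **fields):
        names = sorted(fields)
        row = tuple.__new__(cls, [fields[n] for n in names])
        # Bypass our own __setattr__ once, to record the field names.
        object.__setattr__(row, "_names", names)
        return row

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        names = self.__dict__.get("_names", [])
        if name in names:
            return self[names.index(name)]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Assumed exception type and message, for illustration only.
        raise TypeError("Row is read-only")
```

Reading `row.age` works as before, while `row.age = 12` now fails loudly instead of silently attaching an attribute that the underlying tuple never sees.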
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8009 from davies/readonly_row and squashes the following commits:
      
      8722f3f [Davies Liu] add tests
      05a3d36 [Davies Liu] Row should be read-only
      ac507a03
    • Davies Liu's avatar
      [SPARK-4561] [PYSPARK] [SQL] turn Row into dict recursively · 74a6541a
      Davies Liu authored
      Add an option `recursive` to `Row.asDict()`; when True (the default is False), nested Rows are converted into dicts as well.
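
The behaviour can be illustrated with `namedtuple` standing in for `Row`. The `as_dict` helper below is a hypothetical sketch, not PySpark's code, and its handling of lists and dicts is an assumption about what "recursively" should cover.

```python
from collections import namedtuple


def as_dict(row, recursive=False):
    """Convert a namedtuple 'row' to a dict, optionally descending into nested rows."""

    def convert(value):
        if hasattr(value, "_fields"):  # a nested row
            return as_dict(value, recursive=True)
        if isinstance(value, list):
            return [convert(v) for v in value]
        if isinstance(value, dict):
            return {k: convert(v) for k, v in value.items()}
        return value

    if not recursive:
        # Shallow conversion: nested rows are left untouched.
        return dict(zip(row._fields, row))
    return {name: convert(value) for name, value in zip(row._fields, row)}


Inner = namedtuple("Inner", "x")
Outer = namedtuple("Outer", "key value")
nested = Outer(key=1, value=Inner(x=2))
```

Here `as_dict(nested)` keeps `value` as an `Inner` row, while `as_dict(nested, recursive=True)` turns it into a plain dict.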
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8006 from davies/as_dict and squashes the following commits:
      
      922cc5a [Davies Liu] turn Row into dict recursively
      74a6541a
    • Wenchen Fan's avatar
      [SPARK-9738] [SQL] remove FromUnsafe and add its codegen version to GenerateSafe · 106c0789
      Wenchen Fan authored
      In https://github.com/apache/spark/pull/7752 we added `FromUnsafe` to convert nested unsafe data like array/map/struct to safe versions. It's a quick solution and we already have `GenerateSafe` to do the conversion which is codegened. So we should remove `FromUnsafe` and implement its codegen version in `GenerateSafe`.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8029 from cloud-fan/from-unsafe and squashes the following commits:
      
      ed40d8f [Wenchen Fan] add the copy back
      a93fd4b [Wenchen Fan] cogengen FromUnsafe
      106c0789
    • Cheng Lian's avatar
      [SPARK-4176] [SQL] [MINOR] Should use unscaled Long to write decimals for... · 11caf1ce
      Cheng Lian authored
      [SPARK-4176] [SQL] [MINOR] Should use unscaled Long to write decimals for precision <= 18 rather than 8
      
      This PR fixes a minor bug introduced in #7455: when writing decimals, we should use the unscaled Long for better performance when the precision <= 18 rather than 8 (should be a typo). This bug doesn't affect correctness, but hurts Parquet decimal writing performance.
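
The threshold of 18 follows from simple arithmetic: the largest unscaled value with 18 decimal digits still fits in a signed 64-bit integer, while 19 digits can overflow. The sketch below checks this; `MAX_PRECISION_FOR_LONG` and the helper names are hypothetical and are not the constants introduced by the PR.

```python
from decimal import Decimal

LONG_MAX = 2**63 - 1          # largest signed 64-bit value
MAX_PRECISION_FOR_LONG = 18   # hypothetical constant name for this sketch


def fits_in_long(precision):
    """True if every unscaled value with `precision` digits fits in a signed long."""
    return 10**precision - 1 <= LONG_MAX


def unscaled_value(value, scale):
    """Return the integer obtained by shifting the decimal point `scale` places."""
    return int(value.scaleb(scale))
```

For example, `unscaled_value(Decimal("123.45"), 2)` is the integer `12345`, which can be written as a single long instead of a byte array.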
      
      This PR also replaced similar magic numbers with newly defined constants.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8031 from liancheng/spark-4176/minor-fix-for-writing-decimals and squashes the following commits:
      
      10d4ea3 [Cheng Lian] Should use unscaled Long to write decimals for precision <= 18 rather than 8
      11caf1ce
    • Carson Wang's avatar
      [SPARK-9731] Standalone scheduling incorrect cores if spark.executor.cores is not set · ef062c15
      Carson Wang authored
      The issue only happens if `spark.executor.cores` is not set and executor memory is set to a high value.
      For example, if we have a worker with 4G and 10 cores and we set `spark.executor.memory` to 3G, then only 1 core is assigned to the executor. The correct number should be 10 cores.
      I've added a unit test to illustrate the issue.
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #8017 from carsonwang/SPARK-9731 and squashes the following commits:
      
      d09ec48 [Carson Wang] Fix code style
      86b651f [Carson Wang] Simplify the code
      943cc4c [Carson Wang] fix scheduling correct cores to executors
      ef062c15
  2. Aug 07, 2015
    • Yin Huai's avatar
      [SPARK-9753] [SQL] TungstenAggregate should also accept InternalRow instead of just UnsafeRow · c564b274
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-9753
      
      This PR makes TungstenAggregate accept `InternalRow` instead of just `UnsafeRow`. It also adds a `getAggregationBufferFromUnsafeRow` method to `UnsafeFixedWidthAggregationMap`, which is useful when the grouping keys are already stored in `UnsafeRow`s. Finally, it wraps `InputStream` and `OutputStream` in `UnsafeRowSerializer` with `BufferedInputStream` and `BufferedOutputStream`, respectively.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8041 from yhuai/joinedRowForProjection and squashes the following commits:
      
      7753e34 [Yin Huai] Use BufferedInputStream and BufferedOutputStream.
      d68b74e [Yin Huai] Use joinedRow instead of UnsafeRowJoiner.
      e93c009 [Yin Huai] Add getAggregationBufferFromUnsafeRow for cases that the given groupingKeyRow is already an UnsafeRow.
      c564b274
    • Reynold Xin's avatar
      [SPARK-9754][SQL] Remove TypeCheck in debug package. · 998f4ff9
      Reynold Xin authored
      TypeCheck no longer applies in the new "Tungsten" world.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8043 from rxin/SPARK-9754 and squashes the following commits:
      
      4ec471e [Reynold Xin] [SPARK-9754][SQL] Remove TypeCheck in debug package.
      998f4ff9
    • Feynman Liang's avatar
      [SPARK-9719] [ML] Clean up Naive Bayes doc · 85be65b3
      Feynman Liang authored
      Small documentation cleanups, including:
       * Adds documentation for `pi` and `theta`
       * setParam to `setModelType`
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #8047 from feynmanliang/SPARK-9719 and squashes the following commits:
      
      b372438 [Feynman Liang] Clean up naive bayes doc
      85be65b3
    • Feynman Liang's avatar
      [SPARK-9756] [ML] Make constructors in ML decision trees private · cd540c1e
      Feynman Liang authored
      These should be made private until there is a public constructor for providing `rootNode: Node` to use these constructors.
      
      jkbradley
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #8046 from feynmanliang/SPARK-9756 and squashes the following commits:
      
      2cbdf08 [Feynman Liang] Make RFRegressionModel aux constructor private
      a06f596 [Feynman Liang] Make constructors in ML decision trees private
      cd540c1e
    • Michael Armbrust's avatar
      [SPARK-8890] [SQL] Fallback on sorting when writing many dynamic partitions · 49702bd7
      Michael Armbrust authored
      Previously, we would open a new file for each new dynamic partition written out using `HadoopFsRelation`.  For formats like parquet this is very costly due to the buffers required to get good compression.  In this PR I refactor the code, allowing us to fall back on an external sort when many partitions are seen.  As such, each task will open no more than `spark.sql.sources.maxFiles` files.  I also did the following cleanup:
      
       - Instead of keying the file HashMap on an expensive to compute string representation of the partition, we now use a fairly cheap UnsafeProjection that avoids heap allocations.
       - The control flow for instantiating and invoking a writer container has been simplified.  Now instead of switching in two places based on the use of partitioning, the specific writer container must implement a single method `writeRows` that is invoked using `runJob`.
       - `InternalOutputWriter` has been removed.  Instead we have a `private[sql]` method `writeInternal` that converts and calls the public method.  This method can be overridden by internal datasources to avoid the conversion.  This change removes a lot of code duplication and per-row `asInstanceOf` checks.
       - `commands.scala` has been split up.
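
The fallback strategy described in this commit can be sketched as follows. This is an illustrative Python analogy, not the Scala code in the PR: `write_partitioned`, `open_writer`, and `max_files` are hypothetical names, and the in-memory `sorted` call stands in for the external sort. One simplifying assumption is that a partition may be opened twice (once per phase), producing more than one file for it.

```python
def write_partitioned(rows, key, max_files, open_writer):
    """Write rows grouped by key(row), holding at most max_files writers open.

    open_writer(partition) must return an object with write(row) and close().
    """
    writers = {}
    rows = iter(rows)
    leftover = []
    try:
        for row in rows:
            part = key(row)
            if part not in writers:
                if len(writers) >= max_files:
                    # Too many partitions: stop opening files, fall back to sorting.
                    leftover = [row] + list(rows)
                    break
                writers[part] = open_writer(part)
            writers[part].write(row)
    finally:
        for w in writers.values():
            w.close()

    # After sorting, rows of one partition are contiguous, so only a single
    # writer needs to be open at any time.
    current, writer = None, None
    try:
        for row in sorted(leftover, key=key):
            part = key(row)
            if writer is None or part != current:
                if writer is not None:
                    writer.close()
                current, writer = part, open_writer(part)
            writer.write(row)
    finally:
        if writer is not None:
            writer.close()
```

The hash-map phase stays cheap for the common case of few partitions; the sort only pays off once the number of open files would otherwise exceed the limit.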
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #8010 from marmbrus/fsWriting and squashes the following commits:
      
      00804fe [Michael Armbrust] use shuffleMemoryManager.pageSizeBytes
      775cc49 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into fsWriting
      17b690e [Michael Armbrust] remove comment
      40f0372 [Michael Armbrust] address comments
      f5675bd [Michael Armbrust] char -> string
      7e2d0a4 [Michael Armbrust] make sure we close current writer
      8100100 [Michael Armbrust] delete empty commands.scala
      71cc717 [Michael Armbrust] update comment
      8ec75ac [Michael Armbrust] [SPARK-8890][SQL] Fallback on sorting when writing many dynamic partitions
      49702bd7
    • Bertrand Dechoux's avatar
      [SPARK-9748] [MLLIB] Centriod typo in KMeansModel · 902334fd
      Bertrand Dechoux authored
      A minor typo (centriod -> centroid). Readable variable names help every user.
      
      Author: Bertrand Dechoux <BertrandDechoux@users.noreply.github.com>
      
      Closes #8037 from BertrandDechoux/kmeans-typo and squashes the following commits:
      
      47632fe [Bertrand Dechoux] centriod typo
      902334fd
    • Dariusz Kobylarz's avatar
      [SPARK-8481] [MLLIB] GaussianMixtureModel predict accepting single vector · e2fbbe73
      Dariusz Kobylarz authored
      Resubmit of [https://github.com/apache/spark/pull/6906] for adding single-vec predict to GMMs
      
      CC: dkobylarz  mengxr
      
      To be merged with master and branch-1.5
      Primary author: dkobylarz
      
      Author: Dariusz Kobylarz <darek.kobylarz@gmail.com>
      
      Closes #8039 from jkbradley/gmm-predict-vec and squashes the following commits:
      
      bfbedc4 [Dariusz Kobylarz] [SPARK-8481] [MLlib] GaussianMixtureModel predict accepting single vector
      e2fbbe73
    • Andrew Or's avatar
      [SPARK-9674] Re-enable ignored test in SQLQuerySuite · 881548ab
      Andrew Or authored
      The original code that this test covers was removed in https://github.com/apache/spark/commit/9270bd06fd0b16892e3f37213b5bc7813ea11fdd. The test was ignored shortly before that, so we never caught it. This patch re-enables the test and adds the code necessary to make it pass.
      
      JoshRosen yhuai
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8015 from andrewor14/SPARK-9674 and squashes the following commits:
      
      225eac2 [Andrew Or] Merge branch 'master' of github.com:apache/spark into SPARK-9674
      8c24209 [Andrew Or] Fix NPE
      e541d64 [Andrew Or] Track aggregation memory for both sort and hash
      0be3a42 [Andrew Or] Fix test
      881548ab
    • Reynold Xin's avatar
      [SPARK-9733][SQL] Improve physical plan explain for data sources · 05d04e10
      Reynold Xin authored
      All data sources show up as "PhysicalRDD" in physical plan explain. It'd be better if we can show the name of the data source.
      
      Without this patch:
      ```
      == Physical Plan ==
      NewAggregate with UnsafeHybridAggregationIterator ArrayBuffer(date#0, cat#1) ArrayBuffer((sum(CAST((CAST(count#2, IntegerType) + 1), LongType))2,mode=Final,isDistinct=false))
       Exchange hashpartitioning(date#0,cat#1)
        NewAggregate with UnsafeHybridAggregationIterator ArrayBuffer(date#0, cat#1) ArrayBuffer((sum(CAST((CAST(count#2, IntegerType) + 1), LongType))2,mode=Partial,isDistinct=false))
         PhysicalRDD [date#0,cat#1,count#2], MapPartitionsRDD[3] at
      ```
      
      With this patch:
      ```
      == Physical Plan ==
      TungstenAggregate(key=[date#0,cat#1], value=[(sum(CAST((CAST(count#2, IntegerType) + 1), LongType)),mode=Final,isDistinct=false)]
       Exchange hashpartitioning(date#0,cat#1)
        TungstenAggregate(key=[date#0,cat#1], value=[(sum(CAST((CAST(count#2, IntegerType) + 1), LongType)),mode=Partial,isDistinct=false)]
         ConvertToUnsafe
          Scan ParquetRelation[file:/scratch/rxin/spark/sales4][date#0,cat#1,count#2]
      ```
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8024 from rxin/SPARK-9733 and squashes the following commits:
      
      811b90e [Reynold Xin] Fixed Python test case.
      52cab77 [Reynold Xin] Cast.
      eea9ccc [Reynold Xin] Fix test case.
      fcecb22 [Reynold Xin] [SPARK-9733][SQL] Improve explain message for data source scan node.
      05d04e10
    • Reynold Xin's avatar
      [SPARK-9667][SQL] followup: Use GenerateUnsafeProjection.canSupport to test... · aeddeafc
      Reynold Xin authored
      [SPARK-9667][SQL] followup: Use GenerateUnsafeProjection.canSupport to test Exchange supported data types.
      
      This way we recursively test the data types.
      
      cc chenghao-intel
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8036 from rxin/cansupport and squashes the following commits:
      
      f7302ff [Reynold Xin] Can GenerateUnsafeProjection.canSupport to test Exchange supported data types.
      aeddeafc
    • Reynold Xin's avatar
      [SPARK-9736] [SQL] JoinedRow.anyNull should delegate to the underlying rows. · 9897cc5e
      Reynold Xin authored
      JoinedRow.anyNull currently loops through every field to check for null, which is inefficient if the underlying rows are UnsafeRows. It should just delegate to the underlying implementation.
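
The delegation can be sketched in Python. All three classes below are hypothetical stand-ins: `BitsetRow` imitates a row that tracks nulls in a bitmask (so its check does not need to visit each field), and `JoinedRow.any_null` simply asks each side rather than scanning the combined fields.

```python
class SimpleRow:
    """Hypothetical row that answers any_null() by scanning its fields."""

    def __init__(self, *values):
        self.values = values

    def any_null(self):
        return any(v is None for v in self.values)


class BitsetRow:
    """Hypothetical row that tracks nulls in a bitmask, so any_null() is O(1)."""

    def __init__(self, *values):
        self.values = values
        self.null_bits = sum(1 << i for i, v in enumerate(values) if v is None)

    def any_null(self):
        return self.null_bits != 0


class JoinedRow:
    """Presents two rows as one without copying their fields."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def any_null(self):
        # Delegate instead of looping over every field of the joined row.
        return self.left.any_null() or self.right.any_null()
```

Each underlying row keeps whatever fast path it has, and the joined row inherits it for free.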
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8027 from rxin/SPARK-9736 and squashes the following commits:
      
      03a2e92 [Reynold Xin] Include all files.
      90f1add [Reynold Xin] [SPARK-9736][SQL] JoinedRow.anyNull should delegate to the underlying rows.
      9897cc5e
    • Wenchen Fan's avatar
      [SPARK-8382] [SQL] Improve Analysis Unit test framework · 2432c2e2
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8025 from cloud-fan/analysis and squashes the following commits:
      
      51461b1 [Wenchen Fan] move test file to test folder
      ec88ace [Wenchen Fan] Improve Analysis Unit test framework
      2432c2e2
    • Reynold Xin's avatar
      [SPARK-9674][SPARK-9667] Remove SparkSqlSerializer2 · 76eaa701
      Reynold Xin authored
      It is now subsumed by various Tungsten operators.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7981 from rxin/SPARK-9674 and squashes the following commits:
      
      144f96e [Reynold Xin] Re-enable test
      58b7332 [Reynold Xin] Disable failing list.
      fb797e3 [Reynold Xin] Match all UDTs.
      be9f243 [Reynold Xin] Updated if.
      71fc99c [Reynold Xin] [SPARK-9674][SPARK-9667] Remove GeneratedAggregate & SparkSqlSerializer2.
      76eaa701
    • zsxwing's avatar
      [SPARK-9467][SQL]Add SQLMetric to specialize accumulators to avoid boxing · ebfd91c5
      zsxwing authored
      This PR adds SQLMetric/SQLMetricParam/SQLMetricValue to specialize accumulators to avoid boxing. All SQL metrics should use these classes rather than `Accumulator`.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7996 from zsxwing/sql-accu and squashes the following commits:
      
      14a5f0a [zsxwing] Address comments
      367ca23 [zsxwing] Use localValue directly to avoid changing Accumulable
      42f50c3 [zsxwing] Add SQLMetric to specialize accumulators to avoid boxing
      ebfd91c5
    • Wenchen Fan's avatar
      [SPARK-9683] [SQL] copy UTF8String when convert unsafe array/map to safe · e57d6b56
      Wenchen Fan authored
      When we convert an unsafe row to a safe row, we copy the column if it is of struct or string type. However, strings inside an unsafe array/map are not copied, which may cause problems.
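
Why a missing copy is a problem can be shown with a reused buffer. This is a Python analogy under stated assumptions: the `collect` helper is hypothetical, fixed-width 5-byte words keep the example short, and a `memoryview` plays the role of a string that still points into memory the producer will overwrite.

```python
def collect(words, copy):
    """Read fixed-width words through one reused buffer and collect them."""
    buffer = bytearray(5)  # reused for every record
    collected = []
    for word in words:
        for i, byte in enumerate(word.encode("ascii")):
            buffer[i] = byte  # overwrite the previous record in place
        field = memoryview(buffer)  # zero-copy reference into the buffer
        collected.append(bytes(field) if copy else field)
    return [bytes(f).decode("ascii") for f in collected]
```

Without the copy, every collected value ends up showing the last record written to the buffer; with the copy, each value keeps its own contents.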
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7990 from cloud-fan/copy and squashes the following commits:
      
      c13d1e3 [Wenchen Fan] change test name
      fe36294 [Wenchen Fan] we should deep copy UTF8String when convert unsafe row to safe row
      e57d6b56
    • Davies Liu's avatar
      [SPARK-9453] [SQL] support records larger than page size in UnsafeShuffleExternalSorter · 15bd6f33
      Davies Liu authored
      This patch follows exactly #7891 (except testing)
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8005 from davies/larger_record and squashes the following commits:
      
      f9c4aff [Davies Liu] address comments
      9de5c72 [Davies Liu] support records larger than page size in UnsafeShuffleExternalSorter
      15bd6f33
    • Reynold Xin's avatar
      [SPARK-9700] Pick default page size more intelligently. · 4309262e
      Reynold Xin authored
      Previously, we used 64MB as the default page size, which was way too big for a lot of Spark applications (especially for single node).
      
      This patch changes it so that the default page size, if unset by the user, is determined by the number of cores available and the total execution memory available.
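
One plausible shape for such a heuristic is sketched below. The commit message only says the default depends on the number of cores and the total execution memory; the 1 MB lower bound, the safety factor of 16, and the rounding to a power of two are assumptions made for illustration, and all names are hypothetical.

```python
MIN_PAGE = 1 << 20    # 1 MB lower bound (assumed)
MAX_PAGE = 64 << 20   # 64 MB, the previous fixed default
SAFETY_FACTOR = 16    # assumed headroom so each core can hold several pages


def next_power_of_two(n):
    """Smallest power of two that is >= n (1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def default_page_size(execution_memory, cores):
    """Pick a page size from the memory available per core, clamped to sane bounds."""
    per_core = execution_memory // cores // SAFETY_FACTOR
    return min(MAX_PAGE, max(MIN_PAGE, next_power_of_two(per_core)))
```

Under these assumptions, 512 MB of execution memory on 4 cores gives 8 MB pages, while a very large heap is still capped at the old 64 MB default.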
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8012 from rxin/pagesize and squashes the following commits:
      
      16f4756 [Reynold Xin] Fixed failing test.
      5afd570 [Reynold Xin] private...
      0d5fb98 [Reynold Xin] Update default value.
      674a6cd [Reynold Xin] Address review feedback.
      dc00e05 [Reynold Xin] Merge with master.
      73ebdb6 [Reynold Xin] [SPARK-9700] Pick default page size more intelligently.
      4309262e
    • zsxwing's avatar
      [SPARK-8862][SQL]Support multiple SQLContexts in Web UI · 7aaed1b1
      zsxwing authored
      This is a follow-up PR to solve the UI issue when there are multiple SQLContexts. Each SQLContext has a separate tab and contains queries which are executed by this SQLContext.
      
      <img width="1366" alt="multiple sqlcontexts" src="https://cloud.githubusercontent.com/assets/1000778/9088391/54584434-3bc2-11e5-9caf-94c2b0da528e.png">
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7962 from zsxwing/multi-sqlcontext-ui and squashes the following commits:
      
      cf661e1 [zsxwing] sql -> SQL
      39b0c97 [zsxwing] Support multiple SQLContexts in Web UI
      7aaed1b1
    • Cheng Lian's avatar
      [SPARK-7550] [SQL] [MINOR] Fixes logs when persisting DataFrames · f0cda587
      Cheng Lian authored
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8021 from liancheng/spark-7550/fix-logs and squashes the following commits:
      
      b7bd0ed [Cheng Lian] Fixes logs
      f0cda587
  3. Aug 06, 2015
    • zsxwing's avatar
      [SPARK-8057][Core]Call TaskAttemptContext.getTaskAttemptID using Reflection · 672f4676
      zsxwing authored
      Someone may use the Spark core jar in the maven repo with hadoop 1. SPARK-2075 has already resolved the compatibility issue to support it. But `SparkHadoopMapRedUtil.commitTask` broke it recently.
      
      This PR uses Reflection to call `TaskAttemptContext.getTaskAttemptID` to fix the compatibility issue.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6599 from zsxwing/SPARK-8057 and squashes the following commits:
      
      f7a343c [zsxwing] Remove the redundant import
      6b7f1af [zsxwing] Call TaskAttemptContext.getTaskAttemptID using Reflection
      672f4676
    • Jeff Zhang's avatar
      Fix doc typo · fe12277b
      Jeff Zhang authored
      Straightforward fix on doc typo
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #8019 from zjffdu/master and squashes the following commits:
      
      aed6e64 [Jeff Zhang] Fix doc typo
      fe12277b
    • Davies Liu's avatar
      [SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe · 17284db3
      Davies Liu authored
      
      `spark.sql.tungsten.enabled` will be the default value for both codegen and unsafe; the individual settings are kept internally for debugging and testing.
      
      cc marmbrus rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7998 from davies/tungsten and squashes the following commits:
      
      c1c16da [Davies Liu] update doc
      1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe
      
      (cherry picked from commit 4e70e825)
      Signed-off-by: Reynold Xin <rxin@databricks.com>
      17284db3
    • Andrew Or's avatar
      [SPARK-9709] [SQL] Avoid starving unsafe operators that use sort · 014a9f9d
      Andrew Or authored
      The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an `IOException("Unable to acquire X bytes of memory")` thrown by `UnsafeExternalSorter`.
      
      The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8011 from andrewor14/unsafe-starve-memory and squashes the following commits:
      
      35b69a4 [Andrew Or] Simplify test
      0b07782 [Andrew Or] Minor: update comments
      5d5afdf [Andrew Or] Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory
      254032e [Andrew Or] Add tests
      234acbd [Andrew Or] Reserve a page in sorter when preparing each partition
      b889e08 [Andrew Or] MapPartitionsWithPreparationRDD
      014a9f9d
    • Reynold Xin's avatar
      [SPARK-9692] Remove SqlNewHadoopRDD's generated Tuple2 and InterruptibleIterator. · b8782531
      Reynold Xin authored
      A small performance optimization – we don't need to generate a Tuple2 and then immediately discard the key. We also don't need an extra wrapper from InterruptibleIterator.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8000 from rxin/SPARK-9692 and squashes the following commits:
      
      1d4d0b3 [Reynold Xin] [SPARK-9692] Remove SqlNewHadoopRDD's generated Tuple2 and InterruptibleIterator.
      b8782531
    • Michael Armbrust's avatar
      [SPARK-9650][SQL] Fix quoting behavior on interpolated column names · 0867b23c
      Michael Armbrust authored
      Make sure that `$"column"` is consistent with other methods with respect to backticks.  Adds a bunch of tests for various ways of constructing columns.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #7969 from marmbrus/namesWithDots and squashes the following commits:
      
      53ef3d7 [Michael Armbrust] [SPARK-9650][SQL] Fix quoting behavior on interpolated column names
      2bf7a92 [Michael Armbrust] WIP
      0867b23c
    • Davies Liu's avatar
      [SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe · 4e70e825
      Davies Liu authored
      `spark.sql.tungsten.enabled` will be the default value for both codegen and unsafe; the individual settings are kept internally for debugging and testing.
      
      cc marmbrus rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7998 from davies/tungsten and squashes the following commits:
      
      c1c16da [Davies Liu] update doc
      1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe
      4e70e825
    • Yin Huai's avatar
      [SPARK-9691] [SQL] PySpark SQL rand function treats seed 0 as no seed · baf4587a
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-9691
      
      jkbradley rxin
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7999 from yhuai/pythonRand and squashes the following commits:
      
      4187e0c [Yin Huai] Regression test.
      a985ef9 [Yin Huai] Use "if seed is not None" instead "if seed" because "if seed" returns false when seed is 0.
      baf4587a
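
The pitfall named in the squashed commit above (`if seed` versus `if seed is not None`) is easy to reproduce in plain Python. The two functions below are hypothetical illustrations built on the standard `random` module, not PySpark's `rand`.

```python
import random


def rand_buggy(seed=None):
    # `if seed` is False for seed == 0, so 0 is treated as "no seed".
    rng = random.Random(seed) if seed else random.Random()
    return rng.random()


def rand_fixed(seed=None):
    # Only None means "no seed"; 0 is a valid seed.
    rng = random.Random(seed) if seed is not None else random.Random()
    return rng.random()
```

With `rand_fixed`, a seed of 0 gives a reproducible value like any other seed; with `rand_buggy`, it silently falls back to an unseeded generator.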
    • Sean Owen's avatar
      [SPARK-9633] [BUILD] SBT download locations outdated; need an update · 681e3024
      Sean Owen authored
      Remove 2 defunct SBT download URLs and replace with the 1 known download URL. Also, use https.
      Follow up on https://github.com/apache/spark/pull/7792
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #7956 from srowen/SPARK-9633 and squashes the following commits:
      
      caa40bd [Sean Owen] Remove 2 defunct SBT download URLs and replace with the 1 known download URL. Also, use https.
      681e3024
    • Marcelo Vanzin's avatar
      [SPARK-9645] [YARN] [CORE] Allow shuffle service to read shuffle files. · e234ea1b
      Marcelo Vanzin authored
      Spark should not mess with the permissions of directories created
      by the cluster manager. Here, by setting the block manager dir
      permissions to 700, the shuffle service (running as the YARN user)
      wouldn't be able to serve shuffle files created by applications.
      
      Also, the code to protect the local app dir was missing in standalone's
      Worker; that has been now added. Since all processes run as the same
      user in standalone, `chmod 700` should not cause problems.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7966 from vanzin/SPARK-9645 and squashes the following commits:
      
      6e07b31 [Marcelo Vanzin] Protect the app dir in standalone mode.
      384ba6a [Marcelo Vanzin] [SPARK-9645] [yarn] [core] Allow shuffle service to read shuffle files.
      e234ea1b
    • Yin Huai's avatar
      [SPARK-9630] [SQL] Clean up new aggregate operators (SPARK-9240 follow up) · 3504bf3a
      Yin Huai authored
      This is the followup of https://github.com/apache/spark/pull/7813. It renames `UnsafeHybridAggregationIterator` to `TungstenAggregationIterator` and makes it only work with `UnsafeRow`. Also, I add a `TungstenAggregate` that uses `TungstenAggregationIterator` and make `SortBasedAggregate` (renamed from `SortBasedAggregate`) only work with `SafeRow`.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7954 from yhuai/agg-followUp and squashes the following commits:
      
      4d2f4fc [Yin Huai] Add comments and free map.
      0d7ddb9 [Yin Huai] Add TungstenAggregationQueryWithControlledFallbackSuite to test fall back process.
      91d69c2 [Yin Huai] Rename UnsafeHybridAggregationIterator to  TungstenAggregateIteraotr and make it only work with UnsafeRow.
      3504bf3a
    • zsxwing's avatar
      [SPARK-9639] [STREAMING] Fix a potential NPE in Streaming JobScheduler · 34620909
      zsxwing authored
      Because `JobScheduler.stop(false)` may set `eventLoop` to null while `JobHandler` is running, it is possible that `eventLoop` happens to be null when `post` is called.
      
      This PR fixes this bug and also sets the threads in `jobExecutor` to `daemon`.
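
A common way to guard against this kind of check-then-use race is to read the shared field once into a local variable. The Python sketch below illustrates that pattern only; whether the PR uses exactly this approach is an assumption, and the class is a hypothetical stand-in in which a list plays the role of the event loop.

```python
class JobScheduler:
    """Hypothetical stand-in; a list plays the role of the event loop."""

    def __init__(self):
        self.event_loop = []  # becomes None once the scheduler is stopped

    def stop(self):
        self.event_loop = None

    def post_unsafe(self, event):
        if self.event_loop is not None:
            # Another thread may call stop() here, between the check and the use.
            self.event_loop.append(event)

    def post_safe(self, event):
        loop = self.event_loop  # read the field exactly once
        if loop is not None:
            loop.append(event)
            return True
        return False
```

`post_safe` either posts to the loop it observed or reports that the scheduler has already stopped; it never dereferences a field that changed under it.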
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7960 from zsxwing/fix-npe and squashes the following commits:
      
      b0864c4 [zsxwing] Fix a potential NPE in Streaming JobScheduler
      34620909
    • cody koeninger's avatar
      [DOCS] [STREAMING] make the existing parameter docs for OffsetRange ac… · 1723e348
      cody koeninger authored
      …tually visible
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #7995 from koeninger/doc-fixes and squashes the following commits:
      
      87af9ea [cody koeninger] [Docs][Streaming] make the existing parameter docs for OffsetRange actually visible
      1723e348