  1. Aug 17, 2015
  2. Aug 16, 2015
    • [SPARK-10005] [SQL] Fixes schema merging for nested structs · ae2370e7
      When merging schemas, we only handled first-level fields when converting Parquet groups to `InternalRow`s; nested struct fields were not handled properly.
      
      For example, the schema of a Parquet file to be read can be:
      
      ```
      message individual {
        required group f1 {
          optional binary f11 (utf8);
        }
      }
      ```
      
      while the global schema is:
      
      ```
      message global {
        required group f1 {
          optional binary f11 (utf8);
          optional int32 f12;
        }
      }
      ```
      
      This PR fixes this issue by padding missing fields when creating actual converters.
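The padding idea can be illustrated with a minimal Python sketch. This is not the converter code from the PR: the dict-based schema representation and the `pad_row` helper are hypothetical, and the sketch pads individual records, whereas the PR pads fields when the converters are created.

```python
from typing import Any, Dict, Optional

# Hypothetical schema representation (an assumption for this sketch):
# a dict maps field names to a primitive type name, or to a nested dict
# when the field is a struct.
Schema = Dict[str, Any]


def pad_row(row: Optional[Dict[str, Any]], global_schema: Schema) -> Optional[Dict[str, Any]]:
    """Return a copy of `row` shaped like `global_schema`, padding missing fields with None."""
    if row is None:
        return None
    padded: Dict[str, Any] = {}
    for name, field_type in global_schema.items():
        value = row.get(name)  # None when the file's schema lacks this field
        if isinstance(field_type, dict):
            # Recurse so that nested structs are padded too, not only first-level fields.
            padded[name] = pad_row(value, field_type)
        else:
            padded[name] = value
    return padded


# The global schema has f1.f12, but the individual file only stores f1.f11.
global_schema = {"f1": {"f11": "binary", "f12": "int32"}}
file_row = {"f1": {"f11": "a"}}

padded = pad_row(file_row, global_schema)
print(padded)  # {'f1': {'f11': 'a', 'f12': None}}
```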
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8228 from liancheng/spark-10005/nested-schema-merging.
    • [SPARK-10008] Ensure shuffle locality doesn't take precedence over narrow deps · cf016075
      The shuffle locality patch made the DAGScheduler aware of shuffle data,
      but for RDDs that have both narrow and shuffle dependencies, it can
      cause the scheduler to place their tasks based on the shuffle dependency instead of the
      narrow one. This case is common in iterative join-based algorithms like
      PageRank and ALS, where one RDD is hash-partitioned and one isn't.
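The intended precedence can be sketched with a toy model in Python. The `Dep` class and `preferred_locations` function are hypothetical names used only for illustration; this is not the DAGScheduler code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dep:
    """Toy dependency: `kind` is "narrow" or "shuffle" (hypothetical model)."""
    kind: str
    preferred_hosts: List[str] = field(default_factory=list)


def preferred_locations(deps: List[Dep]) -> List[str]:
    """Pick task locations, letting narrow dependencies take precedence."""
    # First pass: any narrow dependency with a locality preference wins.
    for dep in deps:
        if dep.kind == "narrow" and dep.preferred_hosts:
            return dep.preferred_hosts
    # Second pass: fall back to shuffle locality only if no narrow
    # dependency expressed a preference.
    for dep in deps:
        if dep.kind == "shuffle" and dep.preferred_hosts:
            return dep.preferred_hosts
    return []


# An RDD with both kinds of dependency, as in an iterative join.
deps = [Dep("shuffle", ["hostB"]), Dep("narrow", ["hostA"])]
print(preferred_locations(deps))  # ['hostA']
```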
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #8220 from mateiz/shuffle-loc-fix.
    • [SPARK-8844] [SPARKR] head/collect is broken in SparkR. · 5f9ce738
      This is a WIP patch for SPARK-8844, posted to collect reviews.
      
      This bug is about reading an empty DataFrame. In `readCol()`,
            lapply(1:numRows, function(x) {
      does not take into consideration the case where numRows = 0.
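The pitfall is that R's `1:0` counts down and yields `1 0` rather than an empty sequence, so the loop body runs twice on an empty DataFrame. The Python sketch below only emulates that behavior; `r_colon` and `r_seq_len` are hypothetical helpers. `seq_len()` is a common R idiom for avoiding the problem, though the message above does not say how the patch itself handles it.

```python
def r_colon(start: int, end: int) -> list:
    """Emulate R's `start:end` for integers, which counts down when end < start."""
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))


def r_seq_len(n: int) -> list:
    """Emulate R's `seq_len(n)`, which is empty when n is 0."""
    return list(range(1, n + 1))


print(r_colon(1, 3))  # [1, 2, 3]
print(r_colon(1, 0))  # [1, 0]  (two iterations even though there are no rows)
print(r_seq_len(0))   # []      (no iterations)
```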
      
      Will add unit test case.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7419 from sun-rui/SPARK-8844.
    • [SPARK-9973] [SQL] Correct in-memory columnar buffer size · 182f9b7a
      The `initialSize` argument of `ColumnBuilder.initialize()` should be the
      number of rows rather than bytes.  However `InMemoryColumnarTableScan`
      passes in a byte size, which makes Spark SQL allocate more memory than
      necessary when building in-memory columnar buffers.
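A rough arithmetic sketch of the over-allocation follows. It assumes the builder sizes its buffer as roughly `initialSize` times the per-element width; that formula, the `buffer_bytes` helper, and the numbers are illustrative assumptions, not taken from the patch.

```python
def buffer_bytes(initial_size: int, element_size: int) -> int:
    """Assumed allocation rule: initial_size elements of element_size bytes each."""
    return initial_size * element_size


rows = 10_000        # hypothetical batch size in rows
element_size = 8     # hypothetical width of one value, e.g. a 64-bit long

# Intended call: initial_size is a row count.
intended = buffer_bytes(rows, element_size)

# Mistaken call: initial_size is already a byte size (rows * element_size),
# so the element width is applied twice.
mistaken = buffer_bytes(rows * element_size, element_size)

print(intended)              # 80000
print(mistaken)              # 640000
print(mistaken // intended)  # 8
```

Under this assumption, the buffer is over-allocated by a factor equal to the element width.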
      
      Author: Kun Xu <viper_kun@163.com>
      
      Closes #8189 from viper-kun/errorSize.
  3. Aug 15, 2015
    • [SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _eventually for ml streaming pyspark tests · 1db7179f
      Recently, PySpark ML streaming tests have been flaky, most likely because of the batches not being processed in time.  Proposal: Replace the use of _ssc_wait (which waits for a fixed amount of time) with a method which waits for a fixed amount of time but can terminate early based on a termination condition method.  With this, we can extend the waiting period (to make tests less flaky) but also stop early when possible (making tests faster on average, which I verified locally).
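The waiting strategy described above can be sketched as follows. `wait_until` and its parameters are hypothetical names for illustration; the actual `_eventually` helper in the PySpark tests may have a different signature and behavior.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 30.0,
               poll_interval: float = 0.01) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, so a generous timeout
    does not slow down runs where the condition is met early.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    # One last check at the deadline before giving up.
    return condition()


# Usage: a condition that becomes true on the third poll.
calls = {"n": 0}


def batches_processed() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


finished = wait_until(batches_processed, timeout=5.0)
print(finished)    # True
print(calls["n"])  # 3
```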
      
      CC: mengxr tdas freeman-lab
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8087 from jkbradley/streaming-ml-tests.
    • [SPARK-9955] [SQL] correct error message for aggregate · 57056725
      We should skip unresolved `LogicalPlan`s in `PullOutNondeterministic`, as calling `output` on an unresolved `LogicalPlan` produces a confusing error message.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8203 from cloud-fan/error-msg and squashes the following commits:
      
      1c67ca7 [Wenchen Fan] move test
      7593080 [Wenchen Fan] correct error message for aggregate
    • [SPARK-9980] [BUILD] Fix SBT publishLocal error due to invalid characters in doc · a85fb6c0
      Tiny modifications to a few comments to make `sbt publishLocal` work again.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #8209 from hvanhovell/SPARK-9980.
    • [SPARK-9725] [SQL] fix serialization of UTF8String across different JVM · 7c1e5682
      The `BYTE_ARRAY_OFFSET` can differ between JVMs with different configurations (for example, different heap sizes: 24 if heap > 32G, otherwise 16), so the offset stored in a `UTF8String` is not portable. We should handle that during serialization.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8210 from davies/serialize_utf8string.
  4. Aug 14, 2015