  1. Aug 17, 2015
    • [SPARK-9199] [CORE] Upgrade Tachyon version from 0.7.0 -> 0.7.1. · 3ff81ad2
      Calvin Jia authored
      Updates the tachyon-client version to the latest release.
      
      The main client-side difference between 0.7.0 and 0.7.1 is support for running Tachyon on the local file system by default.
      
      No new non-Tachyon dependencies are added, and no code changes are required since the client API has not changed.
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #8235 from calvinjia/spark-9199-master.
    • [SPARK-9871] [SPARKR] Add expression functions into SparkR which have a variable parameter · 26e76058
      Yu ISHIKAWA authored
      ### Summary
      
      - Add `lit` function
      - Add `concat`, `greatest`, `least` functions
      
      I think we need to improve the `collect` function in order to implement the `struct` function, since `collect` doesn't work with arguments that include a nested `list` variable. It seems that a list against `struct` still has `jobj` classes, so it would be better to solve this problem in a separate issue.
      
      ### JIRA
      [[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8194 from yu-iskw/SPARK-9856.
  2. Aug 16, 2015
    • [SPARK-10005] [SQL] Fixes schema merging for nested structs · ae2370e7
      Cheng Lian authored
      When schemas are merged, we only handled first-level fields when converting Parquet groups to `InternalRow`s. Nested struct fields were not handled properly.
      
      For example, the schema of a Parquet file to be read can be:
      
      ```
      message individual {
        required group f1 {
          optional binary f11 (utf8);
        }
      }
      ```
      
      while the global schema is:
      
      ```
      message global {
        required group f1 {
          optional binary f11 (utf8);
          optional int32 f12;
        }
      }
      ```
      
      This PR fixes this issue by padding missing fields when creating actual converters.
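
The padding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Parquet converter code from the PR: the dict-based schema encoding and the `pad_missing_fields` helper are hypothetical, and missing fields are simply filled with `None`.

```python
from typing import Any, Dict, Optional

# Hypothetical schema encoding: a dict maps each field name either to a
# primitive type name (str) or to a nested dict describing a struct.
Schema = Dict[str, Any]


def pad_missing_fields(record: Optional[dict], global_schema: Schema) -> Optional[dict]:
    """Return a copy of `record` that has every field of `global_schema`.

    Fields absent from the record are filled with None, and nested structs
    are padded recursively.
    """
    if record is None:
        return None
    padded = {}
    for name, field_type in global_schema.items():
        value = record.get(name)  # None when the file lacks this field
        if isinstance(field_type, dict):
            padded[name] = pad_missing_fields(value, field_type)
        else:
            padded[name] = value
    return padded


# Mirrors the schemas above: the file only has f1.f11, the global schema
# also expects f1.f12.
global_schema = {"f1": {"f11": "binary", "f12": "int32"}}
row_from_file = {"f1": {"f11": "hello"}}
padded_row = pad_missing_fields(row_from_file, global_schema)
print(padded_row)
```

Walking the global schema rather than the record is what ensures the nested `f12` slot exists even though the individual file never wrote it.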
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8228 from liancheng/spark-10005/nested-schema-merging.
    • [SPARK-10008] Ensure shuffle locality doesn't take precedence over narrow deps · cf016075
      Matei Zaharia authored
      The shuffle locality patch made the DAGScheduler aware of shuffle data,
      but for RDDs that have both narrow and shuffle dependencies, it can
      cause them to place tasks based on the shuffle dependency instead of the
      narrow one. This case is common in iterative join-based algorithms like
      PageRank and ALS, where one RDD is hash-partitioned and one isn't.
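
The intended precedence can be sketched as follows. This is a simplified model, not the DAGScheduler code: the `preferred_locations` function, the dict-based location tables, and the host names are all hypothetical.

```python
from typing import Dict, List


def preferred_locations(
    partition: int,
    narrow_locs: Dict[int, List[str]],
    shuffle_locs: Dict[int, List[str]],
) -> List[str]:
    """Pick hosts for a task, letting narrow dependencies win.

    Shuffle-based locality is only consulted when no narrow dependency
    reports a preferred location for the partition.
    """
    locs = narrow_locs.get(partition, [])
    if locs:
        return locs
    return shuffle_locs.get(partition, [])


# Partition 0 has both kinds of hints, partition 1 only a shuffle hint.
narrow = {0: ["host-a"]}
shuffle = {0: ["host-b"], 1: ["host-c"]}

print(preferred_locations(0, narrow, shuffle))
print(preferred_locations(1, narrow, shuffle))
```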
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #8220 from mateiz/shuffle-loc-fix.
    • [SPARK-8844] [SPARKR] head/collect is broken in SparkR. · 5f9ce738
      Sun Rui authored
      This is a WIP patch for SPARK-8844, posted to collect reviews.
      
      The bug occurs when reading an empty DataFrame. In readCol(),
            lapply(1:numRows, function(x) {
      does not take into account the case where numRows = 0: in R, `1:0` evaluates to `c(1, 0)`, so the function is still applied twice.
      
      A unit test case will be added.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7419 from sun-rui/SPARK-8844.
    • [SPARK-9973] [SQL] Correct in-memory columnar buffer size · 182f9b7a
      Kun Xu authored
      The `initialSize` argument of `ColumnBuilder.initialize()` should be the
      number of rows rather than bytes.  However `InMemoryColumnarTableScan`
      passes in a byte size, which makes Spark SQL allocate more memory than
      necessary when building in-memory columnar buffers.
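
A rough arithmetic sketch of the over-allocation, assuming the builder sizes its buffer as rows times bytes per value. The `buffer_bytes` helper, the 8-byte column width, and the batch size are illustrative assumptions, not values taken from the Spark source.

```python
def buffer_bytes(initial_rows: int, bytes_per_value: int) -> int:
    """Simplified model: the buffer holds `initial_rows` values of a fixed width."""
    return initial_rows * bytes_per_value


BYTES_PER_LONG = 8      # assumed column width for the example
batch_rows = 10_000     # assumed number of rows in one batch

# Intended call: pass the row count.
intended = buffer_bytes(batch_rows, BYTES_PER_LONG)

# Mistaken call: pass a byte size where a row count is expected.
batch_bytes = batch_rows * BYTES_PER_LONG
mistaken = buffer_bytes(batch_bytes, BYTES_PER_LONG)

over_allocation_factor = mistaken // intended
print(intended, mistaken, over_allocation_factor)
```

Under this model the buffer is inflated by a factor equal to the column width, so wider column types waste proportionally more memory.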
      
      Author: Kun Xu <viper_kun@163.com>
      
      Closes #8189 from viper-kun/errorSize.
  3. Aug 15, 2015
    • [SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _eventually for ml streaming pyspark tests · 1db7179f
      Joseph K. Bradley authored
      Recently, PySpark ML streaming tests have been flaky, most likely because of the batches not being processed in time.  Proposal: Replace the use of _ssc_wait (which waits for a fixed amount of time) with a method which waits for a fixed amount of time but can terminate early based on a termination condition method.  With this, we can extend the waiting period (to make tests less flaky) but also stop early when possible (making tests faster on average, which I verified locally).
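
A minimal sketch of such a wait-with-early-exit helper is shown here. It is not the `_eventually` method added by the PR: the `eventually` name, its parameters, and the toy condition are assumptions made for illustration.

```python
import time
from typing import Callable


def eventually(condition: Callable[[], bool], timeout: float = 30.0,
               interval: float = 0.01) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, False if the deadline
    passes first. The condition is always checked at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Toy condition standing in for "all batches were processed":
# it becomes true on the third poll.
polls = {"count": 0}


def batches_processed() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


start = time.monotonic()
succeeded = eventually(batches_processed, timeout=5.0, interval=0.01)
elapsed = time.monotonic() - start
print(succeeded, polls["count"])
```

The timeout can be generous because a passing test returns as soon as the condition holds; only a failing test pays the full wait.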
      
      CC: mengxr tdas freeman-lab
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8087 from jkbradley/streaming-ml-tests.
    • [SPARK-9955] [SQL] correct error message for aggregate · 57056725
      Wenchen Fan authored
      We should skip unresolved `LogicalPlan`s in `PullOutNondeterministic`, because calling `output` on an unresolved `LogicalPlan` produces a confusing error message.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8203 from cloud-fan/error-msg and squashes the following commits:
      
      1c67ca7 [Wenchen Fan] move test
      7593080 [Wenchen Fan] correct error message for aggregate
    • [SPARK-9980] [BUILD] Fix SBT publishLocal error due to invalid characters in doc · a85fb6c0
      Herman van Hovell authored
      Tiny modification to a few comments to make ```sbt publishLocal``` work again.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #8209 from hvanhovell/SPARK-9980.
    • [SPARK-9725] [SQL] fix serialization of UTF8String across different JVM · 7c1e5682
      Davies Liu authored
      BYTE_ARRAY_OFFSET can differ between JVMs with different configurations (for example, with heap size: 24 if the heap is larger than 32G, otherwise 16), so the offset of a UTF8String is not portable. We should handle that during serialization.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8210 from davies/serialize_utf8string.
  4. Aug 14, 2015
  5. Aug 13, 2015
    • [SPARK-9945] [SQL] pageSize should be calculated from executor.memory · bd35385d
      Davies Liu authored
      Currently, the pageSize of TungstenSort is calculated from driver.memory; it should use executor.memory instead.
      
      Also, in the worst case the safeFactor could be 4 (because of rounding), so this increases it to 16.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8175 from davies/page_size.
    • [SPARK-9580] [SQL] Replace singletons in SQL tests · 8187b3ae
      Andrew Or authored
      A fundamental limitation of the existing SQL tests is that *there is simply no way to create your own `SparkContext`*. This is a serious limitation because the user may wish to use a different master or config. As a case in point, `BroadcastJoinSuite` is entirely commented out because there is no way to make it pass with the existing infrastructure.
      
      This patch removes the singletons `TestSQLContext` and `TestData`, and instead introduces a `SharedSQLContext` that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch *all* the SQL test files.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8111 from andrewor14/sql-tests-refactor.
    • [SPARK-9943] [SQL] deserialized UnsafeHashedRelation should be serializable · c50f97da
      Davies Liu authored
      When free memory in the executor runs low, cached broadcast objects need to be serialized to disk, but currently a deserialized UnsafeHashedRelation can't be serialized and fails with an NPE. This PR fixes that.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8174 from davies/serialize_hashed.
    • [SPARK-8976] [PYSPARK] fix open mode in python3 · 693949ba
      Davies Liu authored
      This bug only happens on Python 3 and Windows.
      
      I tested this manually with Python 3 and the Python daemon disabled; there is no unit test yet.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8181 from davies/open_mode.
    • [SPARK-9922] [ML] rename StringIndexerReverse to IndexToString · 6c5858bc
      Xiangrui Meng authored
      What `StringIndexerInverse` does is not strictly associated with `StringIndexer`, and the name does not clearly describe the transformation. Renaming it to `IndexToString` might be better.
      
      ~~I also changed `invert` to `inverse` without arguments. `inputCol` and `outputCol` could be set after.~~
      I also removed `invert`.
      
      jkbradley holdenk
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8152 from mengxr/SPARK-9922.