  1. Aug 16, 2015
    • [SPARK-8844] [SPARKR] head/collect is broken in SparkR. · 5f9ce738
      Sun Rui authored
      This is a WIP patch for SPARK-8844 to collect reviews.
      
      This bug is about reading an empty DataFrame. In readCol(),
            lapply(1:numRows, function(x) {
      does not take into consideration the case where numRows = 0.
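
The failure mode comes from how R's colon operator behaves: `1:0` counts down and yields `c(1, 0)` rather than an empty sequence, so the `lapply` body runs twice for an empty DataFrame, whereas `seq_len(0)` is empty. The sketch below emulates both behaviors in Python purely for illustration; `r_colon` and `r_seq_len` are hypothetical helper names, and nothing here is taken from SparkR or from this patch.

```python
def r_colon(start, stop):
    """Emulate R's integer `start:stop`, which counts down when stop < start."""
    step = 1 if stop >= start else -1
    return list(range(start, stop + step, step))


def r_seq_len(n):
    """Emulate R's seq_len(n): 1..n, and empty when n == 0."""
    return list(range(1, n + 1))


num_rows = 0
print(r_colon(1, num_rows))   # two indices, although there are no rows
print(r_seq_len(num_rows))    # no indices, so a loop body never runs
```

For positive `num_rows` the two helpers agree, which is why the problem only shows up on empty input.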
      
      Will add unit test case.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7419 from sun-rui/SPARK-8844.
    • [SPARK-9973] [SQL] Correct in-memory columnar buffer size · 182f9b7a
      Kun Xu authored
      The `initialSize` argument of `ColumnBuilder.initialize()` should be the
      number of rows rather than bytes.  However `InMemoryColumnarTableScan`
      passes in a byte size, which makes Spark SQL allocate more memory than
      necessary when building in-memory columnar buffers.
      
      Author: Kun Xu <viper_kun@163.com>
      
      Closes #8189 from viper-kun/errorSize.
  2. Aug 15, 2015
    • [SPARK-9805] [MLLIB] [PYTHON] [STREAMING] Added _eventually for ml streaming pyspark tests · 1db7179f
      Joseph K. Bradley authored
      Recently, PySpark ML streaming tests have been flaky, most likely because of the batches not being processed in time.  Proposal: Replace the use of _ssc_wait (which waits for a fixed amount of time) with a method which waits for a fixed amount of time but can terminate early based on a termination condition method.  With this, we can extend the waiting period (to make tests less flaky) but also stop early when possible (making tests faster on average, which I verified locally).
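
The wait-with-early-exit idea described above can be sketched as a small polling helper. This is a minimal illustration, not the `_eventually` method from the patch: the function name `eventually`, its parameters, and the default values are all assumptions.

```python
import time


def eventually(condition, timeout=30.0, poll_interval=0.01):
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, and False if the deadline
    passes first. Hypothetical helper, not the PySpark test utility itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True          # stop early: no need to wait out the timeout
        if time.monotonic() >= deadline:
            return False         # gave the batches as long as we are willing to
        time.sleep(poll_interval)
```

A test would call something like `eventually(lambda: len(results) == expected, timeout=60.0)`, so a generous timeout only costs time when the condition is never met.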
      
      CC: mengxr tdas freeman-lab
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8087 from jkbradley/streaming-ml-tests.
    • [SPARK-9955] [SQL] correct error message for aggregate · 57056725
      Wenchen Fan authored
      We should skip unresolved `LogicalPlan`s in `PullOutNondeterministic`, as calling `output` on an unresolved `LogicalPlan` produces a confusing error message.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8203 from cloud-fan/error-msg and squashes the following commits:
      
      1c67ca7 [Wenchen Fan] move test
      7593080 [Wenchen Fan] correct error message for aggregate
    • [SPARK-9980] [BUILD] Fix SBT publishLocal error due to invalid characters in doc · a85fb6c0
      Herman van Hovell authored
      Tiny modifications to a few comments make `sbt publishLocal` work again.
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #8209 from hvanhovell/SPARK-9980.
    • [SPARK-9725] [SQL] fix serialization of UTF8String across different JVM · 7c1e5682
      Davies Liu authored
      The BYTE_ARRAY_OFFSET can differ between JVMs with different configurations (for example, with different heap sizes: 24 if heap > 32G, otherwise 16), so the offset of a UTF8String is not portable. We should handle that during serialization.
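
As a toy model of the portability problem (not the actual Spark code, which works on JVM memory), the sketch below treats a string as a view `(memory, offset, num_bytes)` where `offset` includes a platform-specific header size. Writing only the length and payload bytes, then rebuilding the offset from the reader's own header size, keeps the value valid on both sides. The function names and the header sizes passed in are assumptions for illustration.

```python
import struct


def write_portable(memory: bytes, offset: int, num_bytes: int) -> bytes:
    """Serialize only the length and the payload, never the local offset."""
    payload = memory[offset:offset + num_bytes]
    return struct.pack(">i", num_bytes) + payload


def read_portable(data: bytes, local_header: int):
    """Rebuild the view using the reader's own header size as the offset."""
    (num_bytes,) = struct.unpack(">i", data[:4])
    payload = data[4:4 + num_bytes]
    memory = bytes(local_header) + payload  # header padding, then the bytes
    return memory, local_header, num_bytes


# Sender uses a 16-byte header, receiver a 24-byte one.
sender_memory = bytes(16) + "spark".encode("utf-8")
wire = write_portable(sender_memory, 16, 5)
memory, offset, num_bytes = read_portable(wire, 24)
print(memory[offset:offset + num_bytes].decode("utf-8"))
```

Had the sender's offset of 16 been written to the wire and reused by the receiver, the view would have pointed into the receiver's header rather than at the string bytes.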
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8210 from davies/serialize_utf8string.
  3. Aug 14, 2015
  4. Aug 13, 2015
    • [SPARK-9945] [SQL] pageSize should be calculated from executor.memory · bd35385d
      Davies Liu authored
      Currently, the pageSize of TungstenSort is calculated from driver.memory; it should use executor.memory instead.
      
      Also, in the worst case the safeFactor could be 4 (because of rounding), so this patch increases it to 16.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8175 from davies/page_size.
    • [SPARK-9580] [SQL] Replace singletons in SQL tests · 8187b3ae
      Andrew Or authored
      A fundamental limitation of the existing SQL tests is that *there is simply no way to create your own `SparkContext`*. This is a serious limitation because the user may wish to use a different master or config. As a case in point, `BroadcastJoinSuite` is entirely commented out because there is no way to make it pass with the existing infrastructure.
      
      This patch removes the singletons `TestSQLContext` and `TestData`, and instead introduces a `SharedSQLContext` that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch *all* the SQL test files.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8111 from andrewor14/sql-tests-refactor.
    • [SPARK-9943] [SQL] deserialized UnsafeHashedRelation should be serializable · c50f97da
      Davies Liu authored
      When free memory in the executor runs low, the cached broadcast objects need to be serialized to disk, but currently a deserialized UnsafeHashedRelation can't be serialized and fails with an NPE. This PR fixes that.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8174 from davies/serialize_hashed.
    • [SPARK-8976] [PYSPARK] fix open mode in python3 · 693949ba
      Davies Liu authored
      This bug only happens on Python 3 and Windows.
      
      I tested this manually with Python 3 and the Python daemon disabled; there is no unit test yet.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8181 from davies/open_mode.
    • [SPARK-9922] [ML] rename StringIndexerReverse to IndexToString · 6c5858bc
      Xiangrui Meng authored
      What `StringIndexerInverse` does is not strictly associated with `StringIndexer`, and the name does not clearly describe the transformation. Renaming it to `IndexToString` might be better.
      
      ~~I also changed `invert` to `inverse` without arguments. `inputCol` and `outputCol` could be set after.~~
      I also removed `invert`.
      
      jkbradley holdenk
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8152 from mengxr/SPARK-9922.
    • [SPARK-9935] [SQL] EqualNotNull not processed in ORC · c2520f50
      hyukjinkwon authored
      https://issues.apache.org/jira/browse/SPARK-9935
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #8163 from HyukjinKwon/master.
    • [SPARK-9942] [PYSPARK] [SQL] ignore exceptions while try to import pandas · a8d2f4c5
      Davies Liu authored
      If pandas is broken (it can't be imported and raises an exception other than ImportError), pyspark can't be imported either. We should ignore all exceptions raised while importing pandas.
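
A minimal sketch of such a guard follows. It is an illustration rather than the code from the patch: `optional_import` and `has_pandas` are hypothetical names, and catching `Exception` (instead of only `ImportError`) is the point being shown.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if importing it fails for any reason."""
    try:
        return importlib.import_module(name)
    except Exception:
        # Deliberately broader than ImportError: a broken installation can
        # raise other exception types while its module body executes.
        return None


pandas = optional_import("pandas")
has_pandas = pandas is not None
```

Code that depends on pandas can then check `has_pandas` before using it, so a broken pandas installation only disables the optional features.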
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8173 from davies/fix_pandas.
    • [SPARK-9661] [MLLIB] [ML] Java compatibility · 864de8ea
      MechCoder authored
      I skimmed through the docs for various instances of Object and replaced them with Java-compatible versions of the same.
      
      1. Some methods in LDAModel.
      2. runMiniBatchSGD
      3. kolmogorovSmirnovTest
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #8126 from MechCoder/java_incop.
    • [SPARK-9649] Fix MasterSuite, third time's a charm · 8815ba2f
      Andrew Or authored
      This particular test did not load the default configurations so
      it continued to start the REST server, which causes port bind
      exceptions.