  1. Jan 13, 2016
    • [SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example · 20d8ef85
      Joseph K. Bradley authored
      Fixed the WSSSE computation in the Python MLlib KMeans user guide example by using the new computeCost method in the Python API.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #10707 from jkbradley/kmeans-doc-fix.
      20d8ef85
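For context on the metric named above: WSSSE (Within Set Sum of Squared Errors) is the sum, over all points, of the squared distance to the nearest cluster center. The plain-Python sketch below only illustrates that quantity. It is not the pyspark code from the user guide, and `wssse`, `points`, and `centers` are hypothetical names.

```python
from math import inf


def wssse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        best = inf
        for c in centers:
            # Squared Euclidean distance between point p and center c.
            d2 = sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            best = min(best, d2)
        total += best
    return total


# Hypothetical toy data: two tight clusters and their centers.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 9.0)]
centers = [(0.5, 0.0), (9.5, 9.0)]
cost = wssse(points, centers)
```

Each toy point lies 0.5 away from its center, so every point contributes 0.25 to the total.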
    • [SPARK-12026][MLLIB] ChiSqTest gets slower and slower over time when number of features is large · 021dafc6
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-12026
      
      The issue is valid as features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger.
      
      I tested the change locally; it improves performance, and the running time stays stable.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #10146 from hhbyyh/chiSq.
      021dafc6
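The slowdown described above can be pictured with a rough Python analogue, offered as an assumption rather than a port of the Scala code: slicing a lazy indexed iterator has to step over the first `start_col` elements on every batch, whereas indexing into the array directly costs only the batch width. The function names are hypothetical, and the second function shows one way to avoid the cost, not necessarily the approach taken in the commit.

```python
from itertools import islice


def batch_by_skipping(features, start_col, end_col):
    # Loosely analogous to slicing a lazy zipWithIndex view: the iterator
    # walks past the first start_col elements before yielding anything.
    return list(islice(enumerate(features), start_col, end_col))


def batch_by_index(features, start_col, end_col):
    # Index straight into the array; work is proportional to the batch width.
    stop = min(end_col, len(features))
    return [(col, features[col]) for col in range(start_col, stop)]


features = list(range(1000))
last_batch_slow = batch_by_skipping(features, 990, 1000)
last_batch_fast = batch_by_index(features, 990, 1000)
```

Both functions return the same `(index, value)` pairs; only the amount of skipped work differs as `start_col` grows.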
    • [SPARK-12400][SHUFFLE] Avoid generating temp shuffle files for empty partitions · cd81fc9e
      jerryshao authored
      The problem lies in `BypassMergeSortShuffleWriter`: an empty partition also generates a temp shuffle file of several bytes. This change creates the file only when the partition is not empty.
      
      The problem is specific to this writer; `HashShuffleWriter` has no such issue.
      
      Please help to review, thanks a lot.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #10376 from jerryshao/SPARK-12400.
      cd81fc9e
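The idea of creating a file only for non-empty partitions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shuffle writer itself: `write_partitions` and the `temp_shuffle_<id>` file names are hypothetical.

```python
import os
import tempfile


def write_partitions(partitions, directory):
    """Write each non-empty partition to its own file and skip empty ones."""
    paths = {}
    for pid, records in enumerate(partitions):
        if not records:
            continue  # no file is created for an empty partition
        path = os.path.join(directory, f"temp_shuffle_{pid}")
        with open(path, "wb") as f:
            for record in records:
                f.write(record)
        paths[pid] = path
    return paths


with tempfile.TemporaryDirectory() as d:
    # Partitions 1 and 3 are empty, so only two files should appear.
    written = write_partitions([[b"a", b"b"], [], [b"c"], []], d)
    files_on_disk = sorted(os.listdir(d))
```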
    • [SPARK-12690][CORE] Fix NPE in UnsafeInMemorySorter.free() · eabc7b8e
      Carson Wang authored
      I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE doesn't occur because the `inMemSorter` is set to null later and the `free()` method is not called. It happens when there is another exception like OOM thrown before setting `inMemSorter` to null. Anyway, we can add the null check to avoid it.
      
      ```
      ERROR spark.TaskContextImpl: Error in TaskCompletionListener
      java.lang.NullPointerException
              at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110)
              at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288)
              at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141)
              at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
              at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
              at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
              at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
              at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
              at org.apache.spark.scheduler.Task.run(Task.scala:91)
              at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
              at java.lang.Thread.run(Thread.java:722)
      ```
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #10637 from carsonwang/FixNPE.
      eabc7b8e
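The null check mentioned above follows a common defensive pattern, shown here in Python with `None` standing in for Java's `null`. The classes and the `release` method are hypothetical stand-ins and do not mirror the real `UnsafeInMemorySorter` API.

```python
class InMemorySorter:
    """Hypothetical sorter whose memory consumer may be None."""

    def __init__(self, consumer, array):
        self.consumer = consumer
        self.array = array

    def free(self):
        # Guard against a missing consumer instead of dereferencing it blindly.
        if self.consumer is not None:
            self.consumer.release(self.array)
        self.array = None


class RecordingConsumer:
    """Hypothetical consumer that records what it was asked to release."""

    def __init__(self):
        self.released = []

    def release(self, array):
        self.released.append(array)


sorter = InMemorySorter(consumer=None, array=[1, 2, 3])
sorter.free()  # does not raise even though no consumer was supplied
```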
    • [SPARK-12791][SQL] Simplify CaseWhen by breaking "branches" into "conditions" and "values" · cbbcd8e4
      Reynold Xin authored
      This pull request rewrites CaseWhen expression to break the single, monolithic "branches" field into a sequence of tuples (Seq[(condition, value)]) and an explicit optional elseValue field.
      
      Prior to this pull request, each even position in "branches" represented the condition for a branch, and each odd position represented the value for that branch. Using them was confusing and required many sliding-window or grouped(2) calls.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10734 from rxin/simplify-case.
      cbbcd8e4
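To make the restructuring concrete, here is a small Python sketch that converts a flat alternating list into explicit `(condition, value)` pairs plus an optional else value. It assumes that a trailing unpaired element in the old layout is the else value. `split_branches` is a hypothetical helper, not code from the pull request.

```python
def split_branches(branches):
    """Turn [c1, v1, c2, v2, ..., (else)] into ([(c1, v1), ...], else_value)."""
    # Pair up even positions (conditions) with the odd positions after them.
    pairs = [(branches[i], branches[i + 1])
             for i in range(0, len(branches) - 1, 2)]
    # Assumption: an odd-length list carries the else value as its last element.
    else_value = branches[-1] if len(branches) % 2 == 1 else None
    return pairs, else_value


pairs, else_value = split_branches(["a > 1", "x", "a > 0", "y", "z"])
```

With the pairs made explicit, callers can iterate over `(condition, value)` tuples directly instead of regrouping the flat list. Note that `None` doubles as "no else branch" in this sketch.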
    • [SPARK-12642][SQL] improve the hash expression to be decoupled from unsafe row · c2ea79f9
      Wenchen Fan authored
      https://issues.apache.org/jira/browse/SPARK-12642
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10694 from cloud-fan/hash-expr.
      c2ea79f9
    • [SPARK-12268][PYSPARK] Make pyspark shell pythonstartup work under python3 · e4e0b3f7
      Erik Selin authored
      This replaces the `execfile` used for running custom python shell scripts
      with explicit open, compile and exec (as recommended by 2to3). The reason
      for this change is to make the pythonstartup option compatible with python3.
      
      Author: Erik Selin <erik.selin@gmail.com>
      
      Closes #10255 from tyro89/pythonstartup-python3.
      e4e0b3f7
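The open, compile, and exec sequence described above can be sketched as follows. The helper name `run_startup_file` and the temporary startup script are hypothetical and only illustrate the pattern.

```python
import os
import tempfile


def run_startup_file(path, namespace):
    """Python 3 compatible stand-in for execfile(path, namespace)."""
    with open(path) as f:
        # Passing the path to compile() keeps tracebacks pointing at the file.
        code = compile(f.read(), path, "exec")
    exec(code, namespace)


# Create a throwaway startup script to run.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("greeting = 'hello from startup'\n")
    startup_path = tmp.name

shell_globals = {}
run_startup_file(startup_path, shell_globals)
os.remove(startup_path)
```

Names defined by the script end up in `shell_globals`, much as `execfile` would have placed them under Python 2.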
    • [SPARK-9383][PROJECT-INFRA] PR merge script should reset back to previous branch when possible · 97e0c7c5
      Josh Rosen authored
      This patch modifies our PR merge script to reset back to a named branch when restoring the original checkout upon exit. When the committer is originally checked out to a detached head, then they will be restored back to that same ref (the same as today's behavior).
      
      This is a slightly updated version of #7569, with an extra fix to handle the detached head corner-case.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10709 from JoshRosen/SPARK-9383.
      97e0c7c5
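One way to picture the branch-restoring logic is a small decision function. This is a sketch, not the merge script itself: it relies on the fact that `git rev-parse --abbrev-ref HEAD` prints the literal string `HEAD` when the checkout is detached, and `ref_to_restore` is a hypothetical name.

```python
def ref_to_restore(abbrev_ref, head_sha):
    """Pick what to check out again when the script exits.

    abbrev_ref: output of `git rev-parse --abbrev-ref HEAD`
    head_sha:   output of `git rev-parse HEAD`
    """
    # A detached HEAD has no branch name, so fall back to the commit hash;
    # otherwise restore the named branch.
    if abbrev_ref == "HEAD":
        return head_sha
    return abbrev_ref


on_branch = ref_to_restore("master", "97e0c7c5")
detached = ref_to_restore("HEAD", "97e0c7c5")
```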
    • [SPARK-12761][CORE] Remove duplicated code · 38148f73
      Jakob Odersky authored
      Removes some duplicated code that was reintroduced during a merge.
      
      Author: Jakob Odersky <jodersky@gmail.com>
      
      Closes #10711 from jodersky/repl-2.11-duplicate.
      38148f73
    • [SPARK-12805][MESOS] Fixes documentation on Mesos run modes · cc91e218
      Luc Bourlier authored
      The default run mode has changed, but the documentation didn't fully reflect the change.
      
      Author: Luc Bourlier <luc.bourlier@typesafe.com>
      
      Closes #10740 from skyluc/issue/mesos-modes-doc.
      cc91e218
    • [SPARK-9297] [SQL] Add covar_pop and covar_samp · 63eee86c
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9297
      
      Add two aggregation functions: covar_pop and covar_samp.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #10029 from viirya/covar-funcs.
      63eee86c
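For reference, the two statistics differ only in the divisor: population covariance divides by n, sample covariance by n - 1. The plain-Python sketch below uses the textbook two-pass definitions and makes no claim about how the Spark aggregates are implemented.

```python
def covar_pop(xs, ys):
    """Population covariance: mean of the products of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covar_samp(xs, ys):
    """Sample covariance: same sum of products, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
pop = covar_pop(xs, ys)
samp = covar_samp(xs, ys)
```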
    • [SPARK-12692][BUILD][HOT-FIX] Fix the scala style of KinesisBackedBlockRDDSuite.scala. · d6fd9b37
      Yin Huai authored
      https://github.com/apache/spark/pull/10736 was merged yesterday and caused master to start failing because of the style issue.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #10742 from yhuai/fixStyle.
      d6fd9b37
    • [SPARK-12692][BUILD] Enforce style checking about white space before comma · 3d81d63f
      Kousuke Saruta authored
      This is the final PR about SPARK-12692.
      We have removed all whitespace before commas from the code, so let's enforce the style check.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10736 from sarutak/SPARK-12692-followup-enforce-checking.
      3d81d63f
    • [SPARK-12692][BUILD][SQL] Scala style: Fix the style violation (Space before ",") · cb7b864a
      Kousuke Saruta authored
      Fix the style violation (space before , and :).
      This PR is a follow-up to #10643 and a rework of #10685.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10732 from sarutak/SPARK-12692-followup-sql.
      cb7b864a
  2. Jan 12, 2016
  3. Jan 11, 2016
    • [SPARK-12692][BUILD][YARN] Scala style: Fix the style violation (Space before "," or ":") · 112abf91
      Kousuke Saruta authored
      Fix the style violation (space before , and :).
      This PR is a followup for #10643.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10686 from sarutak/SPARK-12692-followup-yarn.
      112abf91
    • [SPARK-12692][BUILD][STREAMING] Scala style: Fix the style violation (Space before "," or ":") · 39ae04e6
      Kousuke Saruta authored
      Fix the style violation (space before , and :).
      This PR is a followup for #10643.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10685 from sarutak/SPARK-12692-followup-streaming.
      39ae04e6
    • [SPARK-11823] Ignores HiveThriftBinaryServerSuite's test jdbc cancel · aaa2c3b6
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-11823
      
      This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #10715 from yhuai/SPARK-11823-ignore.
      aaa2c3b6
    • [SPARK-12498][SQL][MINOR] BooleanSimplification simplification · 36d49350
      Cheng Lian authored
      Scala syntax allows binary case classes to be used as infix operators in pattern matching. This PR uses this syntactic sugar to make `BooleanSimplification` more readable.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10445 from liancheng/boolean-simplification-simplification.
      36d49350
    • [SPARK-12742][SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due... · 473907ad
      wangfei authored
      [SPARK-12742][SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists exception
      
      ```
      [info] Exception encountered when attempting to run a suite with class name:
      org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 milliseconds)
      [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
      [info]   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
      [info]   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
      [info]   at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
      [info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
      [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
      [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
      [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      [info]   at java.lang.Thread.run(Thread.java:745)
      ```
      
      /cc liancheng
      
      Author: wangfei <wangfei_hello@126.com>
      
      Closes #10682 from scwf/fix-test.
      473907ad
    • [SPARK-12576][SQL] Enable expression parsing in CatalystQl · fe9eb0b0
      Herman van Hovell authored
      This PR allows us to use the new SQL parser to parse SQL expressions such as `1 + sin(x*x)`.
      
      We enable this functionality in this PR, but we will not start using this actively yet. This will be done as soon as we have reached grammar parity with the existing parser stack.
      
      cc rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10649 from hvanhovell/SPARK-12576.
      fe9eb0b0
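As a loose analogy for parsing a standalone expression into a tree, the snippet below feeds the same example expression to Python's own `ast` module. This is Python's parser, not CatalystQl, so the node types are Python's and serve only to show the general shape of such a tree.

```python
import ast

# Parse a bare expression (not a full statement or query) into a syntax tree.
tree = ast.parse("1 + sin(x*x)", mode="eval")

root = tree.body            # the top-level addition
call = root.right           # the call to sin(...)
argument = call.args[0]     # the multiplication x*x
```

The top-level node is a binary addition whose right operand is a function call, which in turn wraps a multiplication.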
    • [SPARK-10809][MLLIB] Single-document topicDistributions method for LocalLDAModel · bbea8885
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-10809
      
      We could provide a single-document topicDistributions method for LocalLDAModel to allow for quick queries which avoid RDD operations. Currently, the user must use an RDD of documents.
      
      Also adds some missing asserts.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #9484 from hhbyyh/ldaTopicPre.
      bbea8885
    • [SPARK-12685][MLLIB] word2vec trainWordsCount gets overflow · 4f8eefa3
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-12685
      The `word2vec` log reports
      trainWordsCount = -785727483
      during computation over a large dataset.
      
      The priority was raised because the overflow affects the computation:
      `alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))`
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #10627 from hhbyyh/w2voverflow.
      4f8eefa3
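The effect of the overflow on `alpha` can be reproduced with a short simulation. Python integers do not overflow, so 32-bit wraparound is emulated by the hypothetical helper `wrap_int32`. The "true" count below is also hypothetical: it is simply one value that wraps to the logged number, and the learning rate, partition count, and word count are made-up inputs.

```python
def wrap_int32(value):
    """Simulate two's-complement wraparound of a 32-bit signed integer."""
    return (value + 2**31) % 2**32 - 2**31


def alpha(learning_rate, num_partitions, word_count, train_words_count):
    # Same formula as in the commit message, written in Python.
    return learning_rate * (
        1 - num_partitions * float(word_count) / (train_words_count + 1)
    )


true_count = 3_509_239_813           # hypothetical count that exceeds 2**31 - 1
overflowed = wrap_int32(true_count)  # what a 32-bit counter would hold instead

alpha_ok = alpha(0.025, 1, 1_000_000, true_count)
alpha_bad = alpha(0.025, 1, 1_000_000, overflowed)
```

With a negative `trainWordsCount`, the subtracted term changes sign, so `alpha` rises above the initial learning rate instead of decaying below it.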
    • [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support single... · ee4ee02b
      Yanbo Liang authored
      [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support single instance predict/predictSoft
      
      PySpark MLlib `GaussianMixtureModel` should support single-instance `predict/predictSoft`, just as the Scala API does.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10552 from yanboliang/spark-12603.
      ee4ee02b
    • [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting · a767ee8a
      Brandon Bradley authored
      Warns users about casting changes.
      
      Author: Brandon Bradley <bradleytastic@gmail.com>
      
      Closes #10708 from blbradley/spark-12758.
      a767ee8a
    • [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after install in dep tests · a4499145
      Josh Rosen authored
      This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.
      
      First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.
      
      I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.
      
      /cc zsxwing
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10704 from JoshRosen/fix-build-test-problems.
      a4499145
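The effect of making `root` depend on `build` can be sketched with a toy dependency graph. Everything here is an assumption for illustration: the module names other than `root` and `build`, the graph, and the `modules_to_test` helper are hypothetical and do not reproduce Spark's test-selection code.

```python
# Hypothetical graph: each module maps to the modules it depends on.
MODULE_DEPENDENCIES = {
    "build": [],
    "core": [],
    "sql": ["core"],
    "root": ["build"],
}


def modules_to_test(changed_modules):
    """Select the changed modules plus everything that depends on them."""
    selected = set(changed_modules)
    grew = True
    while grew:
        grew = False
        for module, deps in MODULE_DEPENDENCIES.items():
            if module not in selected and selected.intersection(deps):
                selected.add(module)
                grew = True
    # Selecting root means "run everything".
    if "root" in selected:
        return set(MODULE_DEPENDENCIES)
    return selected


after_build_change = modules_to_test({"build"})
after_core_change = modules_to_test({"core"})
```

A build-only change now pulls in `root` and therefore every module, while a change to `core` still selects only `core` and its dependents.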