  1. Jul 17, 2015
    • Yu ISHIKAWA's avatar
      [SPARK-9093] [SPARKR] Fix single-quotes strings in SparkR · 5a3c1ad0
      Yu ISHIKAWA authored
      [[SPARK-9093] Fix single-quotes strings in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9093)
      
      This is the result of lintr at revision 01155162:
      [[SPARK-9093] The result of lintr at 01155162](https://gist.github.com/yu-iskw/8c47acf3202796da4d01)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7439 from yu-iskw/SPARK-9093 and squashes the following commits:
      
      61c391e [Yu ISHIKAWA] [SPARK-9093][SparkR] Fix single-quotes strings in SparkR
      5a3c1ad0
    • Wenchen Fan's avatar
      [SPARK-9102] [SQL] Improve project collapse with nondeterministic expressions · 3f6d28a5
      Wenchen Fan authored
      Currently we stop project collapse when the lower projection has nondeterministic expressions. However, that is sometimes overly conservative: we should be able to optimize `df.select(Rand(10)).select('a)` to `df.select('a)`.
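
      As an illustration of the safe case (a sketch using the public DataFrame API, not code from this PR; `df` is an assumed DataFrame with a column "a"):

      ```
      import org.apache.spark.sql.functions.{col, rand}

      // rand(10) is nondeterministic; "a" does not depend on it.
      val step1 = df.select(col("a"), rand(10).as("r"))
      val step2 = step1.select("a")
      // The two projections can therefore be collapsed into the equivalent of:
      val collapsed = df.select("a")
      ```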
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7445 from cloud-fan/non-deterministic and squashes the following commits:
      
      0deaef6 [Wenchen Fan] Improve project collapse with nondeterministic expressions
      3f6d28a5
    • Xiangrui Meng's avatar
      [SPARK-9126] [MLLIB] do not assert on time taken by Thread.sleep() · 358e7bf6
      Xiangrui Meng authored
      Measure lower and upper bounds for task time and use them for validation. This PR also implements `Stopwatch.toString`. This suite should finish in less than 1 second.
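
      The pattern, as a minimal self-contained sketch (illustrative test code, not the suite itself): `Thread.sleep(50)` is guaranteed to take at least roughly 50 ms but may take much longer under load, so the test validates a [lower, upper] window instead of an exact duration:

      ```
      val start = System.nanoTime()
      Thread.sleep(50)
      val elapsedMs = (System.nanoTime() - start) / 1e6
      // The lower bound comes from the sleep itself; the upper bound is
      // generous so the test does not flake on a loaded CI machine.
      assert(elapsedMs >= 50 && elapsedMs < 1000)
      ```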
      
      jkbradley pwendell
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7457 from mengxr/SPARK-9126 and squashes the following commits:
      
      4b40faa [Xiangrui Meng] simplify tests
      739f5bd [Xiangrui Meng] do not assert on time taken by Thread.sleep()
      358e7bf6
    • Joseph K. Bradley's avatar
      [SPARK-7131] [ML] Copy Decision Tree, Random Forest impl to spark.ml · 322d286b
      Joseph K. Bradley authored
      This PR copies the RandomForest implementation from spark.mllib to spark.ml.  Note that this includes the DecisionTree implementation, but not the GradientBoostedTrees one (which will come later).
      
      I essentially copied a minimal amount of code to spark.ml, removed the use of bins (and only used splits), and modified code only as much as necessary to get it to compile.  The spark.ml implementation still uses some spark.mllib classes (privately), which can be moved in future PRs.
      
      This refactoring will be helpful in extending the node representation to include more information, such as class probabilities.
      
      Specifically:
      * Copied code from spark.mllib to spark.ml:
        * mllib.tree.DecisionTree, mllib.tree.RandomForest copied to ml.tree.impl.RandomForest (main implementation)
        * NodeIdCache (needed to use splits instead of bins)
        * TreePoint (use splits instead of bins)
      * Added ml.tree.LearningNode used in RandomForest training (needed vars)
      * Removed bins from implementation, and only used splits
      * Small fix in JavaDecisionTreeRegressorSuite
      
      CC: mengxr  manishamde  codedeft chouqin
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #7294 from jkbradley/dt-move-impl and squashes the following commits:
      
      48749be [Joseph K. Bradley] cleanups based on code review, mostly style
      bea9703 [Joseph K. Bradley] scala style fixes.  added some scala doc
      4e6d2a4 [Joseph K. Bradley] removed unnecessary use of copyValues, setParent for trees
      9a4d721 [Joseph K. Bradley] cleanups. removed InfoGainStats from ml, using old one for now.
      836e7d4 [Joseph K. Bradley] Fixed test suite failures
      bd5e063 [Joseph K. Bradley] fixed bucketizing issue
      0df3759 [Joseph K. Bradley] Need to remove use of Bucketizer
      d5224a9 [Joseph K. Bradley] modified tree and forest to use moved impl
      cc01823 [Joseph K. Bradley] still editing RF to get it to work
      19143fb [Joseph K. Bradley] More progress, but not done yet.  Rebased with master after 1.4 release.
      322d286b
  2. Jul 16, 2015
    • Wenchen Fan's avatar
      [SPARK-8899] [SQL] remove duplicated equals method for Row · f893955b
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7291 from cloud-fan/row and squashes the following commits:
      
      a11addf [Wenchen Fan] move hashCode back to internal row
      2de6180 [Wenchen Fan] making apply() call to get()
      fbe1b24 [Wenchen Fan] add null check
      ebdf148 [Wenchen Fan] address comments
      25ef087 [Wenchen Fan] remove duplicated equals method for Row
      f893955b
    • zsxwing's avatar
      [SPARK-8857][SPARK-8859][Core]Add an internal flag to Accumulable and send... · 812b63bb
      zsxwing authored
      [SPARK-8857][SPARK-8859][Core]Add an internal flag to Accumulable and send internal accumulator updates to the driver via heartbeats
      
      This PR includes the following changes:
      
      1. Remove the thread local `Accumulators.localAccums`. Instead, all Accumulators in the executors will register with their TaskContext.
      2. Add an internal flag to Accumulable. For internal Accumulators, their updates will be sent to the driver via heartbeats (see the sketch below).
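
      A hedged sketch of the flag's role (class and field names here are hypothetical, not Spark's actual `Accumulable` signature):

      ```
      // Hypothetical stand-in for an accumulator carrying an `internal` flag.
      class SketchAccumulator[T](initial: T, merge: (T, T) => T, val internal: Boolean) {
        private var current: T = initial
        def add(v: T): Unit = { current = merge(current, v) }
        def value: T = current
      }

      // Executor side: each task registers its accumulators with the TaskContext
      // instead of a thread-local registry; the heartbeat thread then collects
      // the partial values of `internal` accumulators and ships them to the driver.
      ```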
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7448 from zsxwing/accumulators and squashes the following commits:
      
      c24bc5b [zsxwing] Add comments
      bd7dcf1 [zsxwing] Add an internal flag to Accumulable and send internal accumulator updates to the driver via heartbeats
      812b63bb
    • Andrew Or's avatar
      [SPARK-8119] HeartbeatReceiver should replace executors, not kill · 96aa3340
      Andrew Or authored
      **Symptom.** If an executor in an application times out, `HeartbeatReceiver` attempts to kill it. After this happens, however, the application never gets an executor back even when there are cluster resources available.
      
      **Cause.** The issue is that `sc.killExecutor` automatically assumes that the application wishes to adjust its resource requirements permanently downwards. This is not the intention in `HeartbeatReceiver`, however, which simply wants a replacement for the expired executor.
      
      **Fix.** Differentiate between the intention to kill and the intention to replace an executor with a fresh one. More details can be found in the commit message.
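
      A sketch of the distinction (illustrative pseudocode; the names below are not the actual scheduler-backend API):

      ```
      var executorTarget = 10 // the app's requested number of executors

      def killExecutor(id: String, replace: Boolean): Unit = {
        if (!replace) {
          // A plain kill permanently lowers the app's resource requirements.
          executorTarget -= 1
        }
        // With replace = true the target is left unchanged, so the cluster
        // manager provisions a fresh executor for the one that expired.
        println(s"killing $id, executor target is now $executorTarget")
      }
      ```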
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7107 from andrewor14/heartbeat-no-kill and squashes the following commits:
      
      1cd2cd7 [Andrew Or] Add regression test for SPARK-8119
      25a347d [Andrew Or] Reuse more code in scheduler backend
      31ebd40 [Andrew Or] Differentiate between kill and replace
      96aa3340
    • Timothy Chen's avatar
      [SPARK-6284] [MESOS] Add mesos role, principal and secret · d86bbb4e
      Timothy Chen authored
      Mesos allows a role and authentication credentials to be set per framework. The role identifies the framework when resources are allocated (affecting its sharing weight), and the optional principal and secret authenticate the framework to the Mesos master.
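
      For example, assuming the configuration keys introduced by this change, the settings can be supplied through SparkConf:

      ```
      val conf = new org.apache.spark.SparkConf()
        .set("spark.mesos.role", "spark-prod")       // role used for weighted resource sharing
        .set("spark.mesos.principal", "spark-user")  // framework principal for authentication
        .set("spark.mesos.secret", "s3cret")         // secret paired with the principal
      ```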
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #4960 from tnachen/mesos_fw_auth and squashes the following commits:
      
      0f9f03e [Timothy Chen] Fix review comments.
      8f9488a [Timothy Chen] Fix rebase
      f7fc2a9 [Timothy Chen] Add mesos role, auth and secret.
      d86bbb4e
    • Lianhui Wang's avatar
      [SPARK-8646] PySpark does not run on YARN if master not provided in command line · 49351c7f
      Lianhui Wang authored
      andrewor14 davies vanzin can you take a look at this? thanks
      
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      Closes #7438 from lianhuiwang/SPARK-8646 and squashes the following commits:
      
      cb3f12d [Lianhui Wang] add whitespace
      6d874a6 [Lianhui Wang] support pyspark for yarn-client
      49351c7f
    • Aaron Davidson's avatar
      [SPARK-8644] Include call site in SparkException stack traces thrown by job failures · 57e9b13b
      Aaron Davidson authored
      Example exception (new part at bottom, clearly demarcated):
      
      ```
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.RuntimeException: uh-oh!
      	at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38$$anonfun$apply$mcJ$sp$2.apply(DAGSchedulerSuite.scala:880)
      	at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38$$anonfun$apply$mcJ$sp$2.apply(DAGSchedulerSuite.scala:880)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1640)
      	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1099)
      	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1099)
      	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1777)
      	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1777)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
      	at org.apache.spark.scheduler.Task.run(Task.scala:70)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      
      Driver stacktrace:
      	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1298)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1289)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1288)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1288)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:755)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:755)
      	at scala.Option.foreach(Option.scala:236)
      	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:755)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1509)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1470)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1459)
      	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:560)
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1744)
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1762)
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1777)
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1791)
      	at org.apache.spark.rdd.RDD.count(RDD.scala:1099)
      	at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38.apply$mcJ$sp(DAGSchedulerSuite.scala:880)
      	at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38.apply(DAGSchedulerSuite.scala:880)
      	at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38.apply(DAGSchedulerSuite.scala:880)
      	at org.scalatest.Assertions$class.intercept(Assertions.scala:997)
      	at org.scalatest.FunSuite.intercept(FunSuite.scala:1555)
      	at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37.apply$mcV$sp(DAGSchedulerSuite.scala:879)
      	at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37.apply(DAGSchedulerSuite.scala:878)
      	at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37.apply(DAGSchedulerSuite.scala:878)
      	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
      	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
      	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      	at org.scalatest.Transformer.apply(Transformer.scala:22)
      	at org.scalatest.Transformer.apply(Transformer.scala:20)
      	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
      	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
      	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
      	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
      	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
      	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
      	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
      	at org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfter$$super$runTest(DAGSchedulerSuite.scala:70)
      	at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
      	at org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:70)
      	at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
      	at org.apache.spark.scheduler.DAGSchedulerSuite.runTest(DAGSchedulerSuite.scala:70)
      	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
      	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
      	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
      	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
      	at scala.collection.immutable.List.foreach(List.scala:318)
      	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
      	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
      	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
      	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
      	at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
      	at org.scalatest.Suite$class.run(Suite.scala:1424)
      	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
      	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
      	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
      	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
      	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
      	at org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfter$$super$run(DAGSchedulerSuite.scala:70)
      	at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
      	at org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterAll$$super$run(DAGSchedulerSuite.scala:70)
      	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
      	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
      	at org.apache.spark.scheduler.DAGSchedulerSuite.run(DAGSchedulerSuite.scala:70)
      	at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
      	at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
      	at sbt.ForkMain$Run$2.call(ForkMain.java:294)
      	at sbt.ForkMain$Run$2.call(ForkMain.java:284)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      ```
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #7028 from aarondav/stack-trace and squashes the following commits:
      
      4714664 [Aaron Davidson] [SPARK-8644] Include call site in SparkException stack traces thrown by job failures
      57e9b13b
    • jerryshao's avatar
      [SPARK-6304] [STREAMING] Fix checkpointing doesn't retain driver port issue. · 031d7d41
      jerryshao authored
      Author: jerryshao <saisai.shao@intel.com>
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #5060 from jerryshao/SPARK-6304 and squashes the following commits:
      
      89b01f5 [jerryshao] Update the unit test to add more cases
      275d252 [jerryshao] Address the comments
      7cc146d [jerryshao] Address the comments
      2624723 [jerryshao] Fix rebase conflict
      45befaa [Saisai Shao] Update the unit test
      bbc1c9c [Saisai Shao] Fix checkpointing doesn't retain driver port issue
      031d7d41
    • Reynold Xin's avatar
      [SPARK-9085][SQL] Remove LeafNode, UnaryNode, BinaryNode from TreeNode. · fec10f0c
      Reynold Xin authored
      This builds on #7433 but also removes LeafNode/UnaryNode. These are slightly more complicated to remove. I had to change some abstract classes to traits in order for it to work.
      
      The problem with LeafNode/UnaryNode is that they are often mixed in at the end of an Expression, and then the toString function actually resolves to the one defined in TreeNode rather than the one in Expression.
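
      A minimal, self-contained illustration of the underlying Scala linearization behavior (not Catalyst code): the last trait mixed in wins for an overridden method:

      ```
      trait Expr { override def toString = "Expr.toString" }
      trait Leaf extends Expr { override def toString = "Leaf.toString" }

      // Leaf comes last in the linearization, so its toString shadows Expr's,
      // even when Expr's version is the one intended.
      class MyExpr extends Expr with Leaf
      println(new MyExpr) // prints "Leaf.toString"
      ```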
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7434 from rxin/remove-binary-unary-leaf-node and squashes the following commits:
      
      9e8a4de [Reynold Xin] Generator should not be foldable.
      3135a8b [Reynold Xin] SortOrder should not be foldable.
      9c589cf [Reynold Xin] Fixed one more test case...
      2225331 [Reynold Xin] Aggregate expressions should not be foldable.
      16b5c90 [Reynold Xin] [SPARK-9085][SQL] Remove LeafNode, UnaryNode, BinaryNode from TreeNode.
      fec10f0c
    • Yijie Shen's avatar
      [SPARK-6941] [SQL] Provide a better error message to when inserting into RDD based table · 43dac2c8
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-6941
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7342 from yijieshen/SPARK-6941 and squashes the following commits:
      
      f82cbe7 [Yijie Shen] reorder import
      dd67e40 [Yijie Shen] resolve comments
      09518af [Yijie Shen] fix import order in DataframeSuite
      0c635d4 [Yijie Shen] make match more specific
      9df388d [Yijie Shen] move check into PreWriteCheck
      847ab20 [Yijie Shen] Detect insertion error in DataSourceStrategy
      43dac2c8
    • Jan Prach's avatar
      [SPARK-9015] [BUILD] Clean project import in scala ide · b536d5dc
      Jan Prach authored
      Clean up the Maven build for a clean import in Scala IDE / Eclipse.
      
      * remove the Groovy plugin, which is not needed at all
      * add-source from build-helper-maven-plugin is not needed, as recent versions of the scala-maven-plugin add the sources automatically
      * add the lifecycle-mapping plugin to hide a few useless warnings in the IDE
      
      Author: Jan Prach <jendap@gmail.com>
      
      Closes #7375 from jendap/clean-project-import-in-scala-ide and squashes the following commits:
      
      c4b4c0f [Jan Prach] fix whitespaces
      5a83e07 [Jan Prach] Revert "remove java compiler warnings from java tests"
      312007e [Jan Prach] scala-maven-plugin itself add scala sources by default
      f47d856 [Jan Prach] remove spark-1.4-staging repository
      c8a54db [Jan Prach] remove java compiler warnings from java tests
      999a068 [Jan Prach] remove some maven warnings in scala ide
      80fbdc5 [Jan Prach] remove groovy and gmavenplus plugin
      b536d5dc
    • Tarek Auel's avatar
      [SPARK-8995] [SQL] cast date strings like '2015-01-01 12:15:31' to date · 4ea6480a
      Tarek Auel authored
      Jira https://issues.apache.org/jira/browse/SPARK-8995
      
      In PR #6981 we noticed that we could not cast date strings that contain a time, like '2015-03-18 12:39:40', to a date. It was also not possible to cast a string like '18:03:20' to a timestamp.
      
      If a time is passed without a date, today's date is inferred.
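
      A hedged sketch of the described semantics (simplified plain-JDK code, not the actual Catalyst parsing logic):

      ```
      import java.sql.{Date, Timestamp}

      // '2015-03-18 12:39:40' cast to a date keeps only the date part.
      val d = Date.valueOf("2015-03-18 12:39:40".takeWhile(_ != ' '))

      // A bare time like '18:03:20' becomes a timestamp on today's date.
      val today = new Date(System.currentTimeMillis()).toString
      val t = Timestamp.valueOf(s"$today 18:03:20")
      ```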
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      Author: Tarek Auel <tarek.auel@gmail.com>
      
      Closes #7353 from tarekauel/SPARK-8995 and squashes the following commits:
      
      14f333b [Tarek Auel] [SPARK-8995] added tests for daylight saving time
      ca1ae69 [Tarek Auel] [SPARK-8995] style fix
      d20b8b4 [Tarek Auel] [SPARK-8995] bug fix: distinguish between 0 and null
      ef05753 [Tarek Auel] [SPARK-8995] added check for year >= 1000
      01c9ff3 [Tarek Auel] [SPARK-8995] support for time strings
      34ec573 [Tarek Auel] fixed style
      71622c0 [Tarek Auel] improved timestamp and date parsing
      0e30c0a [Tarek Auel] Hive compatibility
      cfbaed7 [Tarek Auel] fixed wrong checks
      71f89c1 [Tarek Auel] [SPARK-8995] minor style fix
      f7452fa [Tarek Auel] [SPARK-8995] removed old timestamp parsing
      30e5aec [Tarek Auel] [SPARK-8995] date and timestamp cast
      c1083fb [Tarek Auel] [SPARK-8995] cast date strings like '2015-01-01 12:15:31' to date or timestamp
      4ea6480a
    • Daniel Darabos's avatar
      [SPARK-8893] Add runtime checks against non-positive number of partitions · 01155162
      Daniel Darabos authored
      https://issues.apache.org/jira/browse/SPARK-8893
      
      > What does `sc.parallelize(1 to 3).repartition(p).collect` return? I would expect `Array(1, 2, 3)` regardless of `p`. But if `p` < 1, it returns `Array()`. I think instead it should throw an `IllegalArgumentException`.
      
      > I think the case is pretty clear for `p` < 0. But the behavior for `p` = 0 is also error prone. In fact that's how I found this strange behavior. I used `rdd.repartition(a/b)` with positive `a` and `b`, but `a/b` was rounded down to zero and the results surprised me. I'd prefer an exception instead of unexpected (corrupt) results.
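
      A minimal sketch of the added guard, assuming the `require`-style check the commits below describe:

      ```
      // Illustrative only; the message text is made up.
      def repartition(numPartitions: Int): Unit = {
        require(numPartitions > 0,
          s"Number of partitions ($numPartitions) must be positive.")
        // ... proceed with the shuffle ...
      }
      ```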
      
      Author: Daniel Darabos <darabos.daniel@gmail.com>
      
      Closes #7285 from darabos/patch-1 and squashes the following commits:
      
      decba82 [Daniel Darabos] Allow repartitioning empty RDDs to zero partitions.
      97de852 [Daniel Darabos] Allow zero partition count in HashPartitioner
      f6ba5fb [Daniel Darabos] Use require() for simpler syntax.
      d5e3df8 [Daniel Darabos] Require positive number of partitions in HashPartitioner
      897c628 [Daniel Darabos] Require positive maxPartitions in CoalescedRDD
      01155162
    • Liang-Chi Hsieh's avatar
      [SPARK-8807] [SPARKR] Add between operator in SparkR · 0a795336
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8807
      
      Add between operator in SparkR.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7356 from viirya/add_r_between and squashes the following commits:
      
      7f51b44 [Liang-Chi Hsieh] Add test for non-numeric column.
      c6a25c5 [Liang-Chi Hsieh] Add between function.
      0a795336
    • Cheng Hao's avatar
      [SPARK-8972] [SQL] Incorrect result for rollup · e2721231
      Cheng Hao authored
      We don't support complex expression keys in ROLLUP/CUBE, and we don't even report an error when complex GROUP BY keys are present, which leads to very confusing and incorrect results.
      
      e.g. `SELECT key % 100 FROM src GROUP BY key % 100 WITH ROLLUP`
      
      This PR adds an extra projection during analysis for complex GROUP BY keys. That projection becomes the child of `Expand`, so from `Expand`'s point of view the GROUP BY keys are always simple keys (attribute references).
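
      A hedged sketch of the rewrite's shape via the DataFrame API (`src` is an assumed DataFrame with an integer column "key"; the alias name is made up):

      ```
      import org.apache.spark.sql.functions.col

      // Project the complex key to a named attribute first, so the rollup
      // only ever groups on a simple attribute reference.
      val projected = src.select((col("key") % 100).as("k"))
      projected.rollup("k").count()
      ```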
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7343 from chenghao-intel/expand and squashes the following commits:
      
      1ebbb59 [Cheng Hao] update the comment
      827873f [Cheng Hao] update as feedback
      34def69 [Cheng Hao] Add more unit test and comments
      c695760 [Cheng Hao] fix bug of incorrect result for rollup
      e2721231
    • Wenchen Fan's avatar
      [SPARK-9068][SQL] refactor the implicit type cast code · ba330968
      Wenchen Fan authored
      based on https://github.com/apache/spark/pull/7348
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7420 from cloud-fan/type-check and squashes the following commits:
      
      7633fa9 [Wenchen Fan] revert
      fe169b0 [Wenchen Fan] improve test
      03b70da [Wenchen Fan] enhance implicit type cast
      ba330968
  3. Jul 15, 2015
    • Cheng Hao's avatar
      [SPARK-8245][SQL] FormatNumber/Length Support for Expression · 42dea3ac
      Cheng Hao authored
      - `BinaryType` for `Length`
      - `FormatNumber`
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7034 from chenghao-intel/expression and squashes the following commits:
      
      e534b87 [Cheng Hao] python api style issue
      601bbf5 [Cheng Hao] add python API support
      3ebe288 [Cheng Hao] update as feedback
      52274f7 [Cheng Hao] add support for udf_format_number and length for binary
      42dea3ac
    • Yin Huai's avatar
      [SPARK-9060] [SQL] Revert SPARK-8359, SPARK-8800, and SPARK-8677 · 9c64a75b
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9060
      
      This PR reverts:
      * https://github.com/apache/spark/commit/31bd30687bc29c0e457c37308d489ae2b6e5b72a (SPARK-8359)
      * https://github.com/apache/spark/commit/24fda7381171738cbbbacb5965393b660763e562 (SPARK-8677)
      * https://github.com/apache/spark/commit/4b5cfc988f23988c2334882a255d494fc93d252e (SPARK-8800)
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7426 from yhuai/SPARK-9060 and squashes the following commits:
      
      651264d [Yin Huai] Revert "[SPARK-8359] [SQL] Fix incorrect decimal precision after multiplication"
      cfda7e4 [Yin Huai] Revert "[SPARK-8677] [SQL] Fix non-terminating decimal expansion for decimal divide operation"
      2de9afe [Yin Huai] Revert "[SPARK-8800] [SQL] Fix inaccurate precision/scale of Decimal division operation"
      9c64a75b
    • Xiangrui Meng's avatar
      [SPARK-9018] [MLLIB] add stopwatches · 73d92b00
      Xiangrui Meng authored
      Add stopwatches for easy instrumentation of MLlib algorithms. This is based on the `TimeTracker` used in decision trees. The distributed version uses a Spark accumulator. jkbradley
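
      A minimal local stopwatch in the spirit of the utility (an illustrative sketch; the real class also has a distributed, accumulator-backed variant):

      ```
      class LocalStopwatch(val name: String) {
        private var startTime = 0L
        private var total = 0L // accumulated nanoseconds across start/stop cycles

        def start(): Unit = { startTime = System.nanoTime() }
        def stop(): Long = {
          val elapsed = System.nanoTime() - startTime
          total += elapsed
          elapsed
        }
        override def toString: String = s"$name: ${total / 1e6} ms"
      }
      ```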
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7415 from mengxr/SPARK-9018 and squashes the following commits:
      
      40b4347 [Xiangrui Meng] == -> ===
      c477745 [Xiangrui Meng] address Joseph's comments
      f981a49 [Xiangrui Meng] add stopwatches
      73d92b00
    • Eric Liang's avatar
      [SPARK-8774] [ML] Add R model formula with basic support as a transformer · 6960a793
      Eric Liang authored
      This implements minimal R formula support as a feature transformer. Both numeric and string labels are supported, but features must be numeric for now.
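
      Usage might look like the following (a sketch against the spark.ml transformer added here; the formula and column names are made up, and `dataset` is an assumed DataFrame):

      ```
      import org.apache.spark.ml.feature.RFormula

      // Numeric columns pass through as features; string columns are encoded,
      // following the R formula convention.
      val formula = new RFormula()
        .setFormula("clicked ~ country + hour")
        .setFeaturesCol("features")
        .setLabelCol("label")
      val output = formula.fit(dataset).transform(dataset)
      ```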
      
      cc mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #7381 from ericl/spark-8774-1 and squashes the following commits:
      
      d1959d2 [Eric Liang] clarify comment
      2db68aa [Eric Liang] second round of comments
      dc3c943 [Eric Liang] address comments
      5765ec6 [Eric Liang] fix style checks
      1f361b0 [Eric Liang] doc
      fb0826b [Eric Liang] [SPARK-8774] Add R model formula with basic support as a transformer
      6960a793
    • Reynold Xin's avatar
      [SPARK-9086][SQL] Remove BinaryNode from TreeNode. · b0645195
      Reynold Xin authored
      These traits are not super useful, and yet cause problems with toString in expressions due to the order in which they are mixed in.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7433 from rxin/remove-binary-node and squashes the following commits:
      
      1881f78 [Reynold Xin] [SPARK-9086][SQL] Remove BinaryNode from TreeNode.
      b0645195
    • Reynold Xin's avatar
      [SPARK-9071][SQL] MonotonicallyIncreasingID and SparkPartitionID should be... · affbe329
      Reynold Xin authored
      [SPARK-9071][SQL] MonotonicallyIncreasingID and SparkPartitionID should be marked as nondeterministic.
      
      I also took the chance to more explicitly define the semantics of deterministic.
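
      The intuition, sketched against the public API (`df` is an assumed DataFrame): a deterministic expression must return the same result for the same input row no matter how often the optimizer evaluates it, which these expressions do not:

      ```
      import org.apache.spark.sql.functions.monotonicallyIncreasingId

      // If this were treated as deterministic, the optimizer could duplicate
      // or reorder its evaluation and silently change the IDs rows receive.
      val withId = df.withColumn("id", monotonicallyIncreasingId())
      ```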
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7428 from rxin/non-deterministic and squashes the following commits:
      
      a760827 [Reynold Xin] [SPARK-9071][SQL] MonotonicallyIncreasingID and SparkPartitionID should be marked as nondeterministic.
      affbe329
    • KaiXinXiaoLei's avatar
      [SPARK-8974] Catch exceptions in allocation schedule task. · 674eb2a4
      KaiXinXiaoLei authored
      I ran into a problem. When I submit some tasks, the spark-dynamic-executor-allocation thread should send the "requestTotalExecutors" message and a new executor should start. But this thread hits an error like:
      
      2015-07-14 19:02:17,461 | WARN  | [spark-dynamic-executor-allocation] | Error sending message [message = RequestExecutors(1)] in 1 attempts
      java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
              at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
              at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
              at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
              at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
              at scala.concurrent.Await$.result(package.scala:107)
              at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
              at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
              at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:57)
              at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:351)
              at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1382)
              at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:343)
              at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:295)
              at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:248)
      
      After some minutes a new ApplicationMaster starts and the submitted tasks begin to run and complete. But even after a long time (e.g., ten minutes), the number of executors does not drop back to zero, even with the default value of "spark.dynamicAllocation.minExecutors".
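
      The fix pattern, sketched with hypothetical names (the real change wraps the allocation logic scheduled in `ExecutorAllocationManager`):

      ```
      import java.util.concurrent.{Executors, TimeUnit}

      def schedule(): Unit = { /* request/release executors; may throw */ }

      val scheduler = Executors.newSingleThreadScheduledExecutor()
      val task = new Runnable {
        override def run(): Unit = {
          try {
            schedule() // e.g. a TimeoutException from an RPC ask can surface here
          } catch {
            // Without this catch, an uncaught exception kills the periodic task
            // and dynamic allocation silently stops working.
            case t: Throwable => println(s"Uncaught exception in allocation task: $t")
          }
        }
      }
      scheduler.scheduleWithFixedDelay(task, 0, 100, TimeUnit.MILLISECONDS)
      ```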
      
      Author: KaiXinXiaoLei <huleilei1@huawei.com>
      
      Closes #7352 from KaiXinXiaoLei/dym and squashes the following commits:
      
      3603631 [KaiXinXiaoLei] change logError to logWarning
      efc4f24 [KaiXinXiaoLei] change file
      674eb2a4
    • zsxwing's avatar
      [SPARK-6602][Core]Replace Akka Serialization with Spark Serializer · b9a922e2
      zsxwing authored
      Replace Akka Serialization with Spark Serializer and add unit tests.
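
      Round-tripping through Spark's own `Serializer` interface looks roughly like this (`JavaSerializer` shown for brevity; the configured serializer is normally obtained from `SparkEnv`):

      ```
      import org.apache.spark.SparkConf
      import org.apache.spark.serializer.JavaSerializer

      val ser = new JavaSerializer(new SparkConf()).newInstance()
      val bytes = ser.serialize("some payload") // java.nio.ByteBuffer
      val back = ser.deserialize[String](bytes)
      assert(back == "some payload")
      ```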
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7159 from zsxwing/remove-akka-serialization and squashes the following commits:
      
      fc0fca3 [zsxwing] Merge branch 'master' into remove-akka-serialization
      cf81a58 [zsxwing] Fix the code style
      73251c6 [zsxwing] Add test scope
      9ef4af9 [zsxwing] Add AkkaRpcEndpointRef.hashCode
      433115c [zsxwing] Remove final
      be3edb0 [zsxwing] Support deserializing RpcEndpointRef
      ecec410 [zsxwing] Replace Akka Serialization with Spark Serializer
      b9a922e2
    • Feynman Liang's avatar
      [SPARK-9005] [MLLIB] Fix RegressionMetrics computation of explainedVariance · 536533ca
      Feynman Liang authored
      Fixes implementation of `explainedVariance` and `r2` to be consistent with their definitions as described in [SPARK-9005](https://issues.apache.org/jira/browse/SPARK-9005).
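
      For reference, the conventional definitions these metrics are aligned with (per the JIRA) are:

      ```
      \text{explainedVariance} = 1 - \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)},
      \qquad
      R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
      ```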
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #7361 from feynmanliang/SPARK-9005-RegressionMetrics-bugs and squashes the following commits:
      
      f1112fc [Feynman Liang] Add explainedVariance formula
      1a3d098 [Feynman Liang] SROwen code review comments
      08a0e1b [Feynman Liang] Fix pyspark tests
      db8605a [Feynman Liang] Style fix
      bde9761 [Feynman Liang] Fix RegressionMetrics tests, relax assumption predictor is unbiased
      c235de0 [Feynman Liang] Fix RegressionMetrics tests
      4c4e56f [Feynman Liang] Fix RegressionMetrics computation of explainedVariance and r2
      536533ca
    • Steve Loughran's avatar
      SPARK-9070 JavaDataFrameSuite teardown NPEs if setup failed · ec9b6216
      Steve Loughran authored
      Fix teardown to skip the table delete if the Hive context is null.
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #7425 from steveloughran/stevel/patches/SPARK-9070-JavaDataFrameSuite-NPE and squashes the following commits:
      
      1982d38 [Steve Loughran] SPARK-9070 JavaDataFrameSuite teardown NPEs if setup failed
      ec9b6216
    • Shuo Xiang's avatar
      [SPARK-7555] [DOCS] Add doc for elastic net in ml-guide and mllib-guide · 303c1201
      Shuo Xiang authored
      jkbradley I put elastic net under the **Algorithm guide** section, and also added the elastic net formula in `mllib-linear-methods#regularizers`.
      
      dbtsai I left the code tab for you to add example code. Do you think it is the right place?
      
      Author: Shuo Xiang <shuoxiangpub@gmail.com>
      
      Closes #6504 from coderxiang/elasticnet and squashes the following commits:
      
      f6061ee [Shuo Xiang] typo
      90a7c88 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet
      0610a36 [Shuo Xiang] move out the elastic net to ml-linear-methods
      8747190 [Shuo Xiang] merge master
      706d3f7 [Shuo Xiang] add python code
      9bc2b4c [Shuo Xiang] typo
      db32a60 [Shuo Xiang] java code sample
      aab3b3a [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet
      a0dae07 [Shuo Xiang] simplify code
      d8616fd [Shuo Xiang] Update the definition of elastic net. Add scala code; Mention Lasso and Ridge
      df5bd14 [Shuo Xiang] use wikipeida page in ml-linear-methods.md
      78d9366 [Shuo Xiang] address comments
      8ce37c2 [Shuo Xiang] Merge branch 'elasticnet' of github.com:coderxiang/spark into elasticnet
      8f24848 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc
      998d766 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc
      89f10e4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc
      9262a72 [Shuo Xiang] update
      7e07d12 [Shuo Xiang] update
      b32f21a [Shuo Xiang] add doc for elastic net in sparkml
      937eef1 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc
      180b496 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
      aa0717d [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
      5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
      c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
      98804c9 [Shuo Xiang] fix bug in topBykey and update test
      303c1201
    • Liang-Chi Hsieh's avatar
      [Minor][SQL] Allow spaces in the beginning and ending of string for Interval · 9716a727
      Liang-Chi Hsieh authored
      This is a minor fix for #7355 to allow spaces at the beginning and end of strings parsed into `Interval`.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7390 from viirya/fix_interval_string and squashes the following commits:
      
      9eb6831 [Liang-Chi Hsieh] Use trim instead of modifying regex.
      57861f7 [Liang-Chi Hsieh] Fix scala style.
      815a9cb [Liang-Chi Hsieh] Slightly modify regex to allow spaces in the beginning and ending of string.
      9716a727
    • zhichao.li's avatar
      [SPARK-8221][SQL]Add pmod function · a9385271
      zhichao.li authored
      https://issues.apache.org/jira/browse/SPARK-8221
      
      One concern is that the result will be negative if the divisor is not positive (e.g., pmod(7, -3)), but this behavior matches Hive's.
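
      A self-contained sketch of pmod's semantics (and the negative-divisor caveat above):

      ```
      // pmod(a, n) = ((a % n) + n) % n -- non-negative whenever n is positive.
      def pmod(a: Int, n: Int): Int = ((a % n) + n) % n

      pmod(-7, 3) // 2  (plain -7 % 3 is -1)
      pmod(7, -3) // -2 (negative when the divisor is negative, as in Hive)
      ```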
      
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #6783 from zhichao-li/pmod2 and squashes the following commits:
      
      7083eb9 [zhichao.li] update to the latest type checking
      d26dba7 [zhichao.li] add pmod
      a9385271
    • Wenchen Fan's avatar
      [SPARK-9020][SQL] Support mutable state in code gen expressions · fa4ec360
      Wenchen Fan authored
      We can keep expressions' mutable states in the generated class (like `SpecificProjection`) as member variables, so that we can read and modify them inside codegened expressions.
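
      A hedged, hand-written stand-in for what the generated class does (names are illustrative, not actual codegen output):

      ```
      // Mutable state lives as a member variable of the generated projection
      // class and is read and updated per row, e.g. for an increasing ID.
      class SpecificProjectionSketch(partitionOffset: Long) {
        private var count: Long = 0L // state kept across rows

        def apply(row: Any): Long = {
          val id = partitionOffset + count
          count += 1
          id
        }
      }
      ```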
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7392 from cloud-fan/mutable-state and squashes the following commits:
      
      eb3a221 [Wenchen Fan] fix order
      73144d8 [Wenchen Fan] naming improvement
      318f41d [Wenchen Fan] address more comments
      d43b65d [Wenchen Fan] address comments
      fd45c7a [Wenchen Fan] Support mutable state in code gen expressions
      fa4ec360
    • Liang-Chi Hsieh's avatar
      [SPARK-8840] [SPARKR] Add float coercion on SparkR · 6f690259
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8840
      
      Currently the type coercion rules don't include float type. This PR simply adds it.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7280 from viirya/add_r_float_coercion and squashes the following commits:
      
      c86dc0e [Liang-Chi Hsieh] For comments.
      dbf0c1b [Liang-Chi Hsieh] Implicitly convert Double to Float based on provided schema.
      733015a [Liang-Chi Hsieh] Add test case for DataFrame with float type.
      30c2a40 [Liang-Chi Hsieh] Update test case.
      52b5294 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_r_float_coercion
      6f9159d [Liang-Chi Hsieh] Add another test case.
      8db3244 [Liang-Chi Hsieh] schema also needs to support float. add test case.
      0dcc992 [Liang-Chi Hsieh] Add float coercion on SparkR.
      6f690259
    • MechCoder's avatar
      [SPARK-8706] [PYSPARK] [PROJECT INFRA] Add pylint checks to PySpark · 20bb10f8
      MechCoder authored
      This adds Pylint checks to PySpark.
      
      For now this lazily installs Pylint via easy_install into dev/pylint (similar to the pep8 script).
      We still need to figure out which rules should be allowed.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7241 from MechCoder/pylint and squashes the following commits:
      
      2fc7291 [MechCoder] Remove pylint test fail
      6d883a2 [MechCoder] Silence warnings and make pylint tests fail to check if it works in jenkins
      f3a5e17 [MechCoder] undefined-variable
      ca8b749 [MechCoder] Minor changes
      71629f8 [MechCoder] remove trailing whitespace
      8498ff9 [MechCoder] Remove blacklisted arguments and pointless statements check
      1dbd094 [MechCoder] Disable all checks for now
      8b8aa8a [MechCoder] Add pylint configuration file
      7871bb1 [MechCoder] [SPARK-8706] [PySpark] [Project infra] Add pylint checks to PySpark
      20bb10f8
    • zsxwing's avatar
      [SPARK-9012] [WEBUI] Escape Accumulators in the task table · adb33d36
      zsxwing authored
      When running the following code, the task table breaks because accumulator names aren't escaped.
      ```
      val a = sc.accumulator(1, "<table>")
      sc.parallelize(1 to 10).foreach(i => a += i)
      ```
      
      Before this fix,
      
      <img width="1348" alt="screen shot 2015-07-13 at 8 02 44 pm" src="https://cloud.githubusercontent.com/assets/1000778/8649295/b17c491e-299b-11e5-97ee-4e6a64074c4f.png">
      
      After this fix,
      
      <img width="1355" alt="screen shot 2015-07-13 at 8 14 32 pm" src="https://cloud.githubusercontent.com/assets/1000778/8649337/f9e9c9ec-299b-11e5-927e-35c0a2f897f5.png">
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7369 from zsxwing/SPARK-9012 and squashes the following commits:
      
      a83c9b6 [zsxwing] Escape Accumulators in the task table
      adb33d36
    • Reynold Xin's avatar
      [HOTFIX][SQL] Unit test breaking. · 14935d84
      Reynold Xin authored
      14935d84
    • Feynman Liang's avatar
      [SPARK-8997] [MLLIB] Performance improvements in LocalPrefixSpan · 1bb8accb
      Feynman Liang authored
      Improves the performance of LocalPrefixSpan by implementing optimizations proposed in [SPARK-8997](https://issues.apache.org/jira/browse/SPARK-8997)
      
      Author: Feynman Liang <fliang@databricks.com>
      Author: Feynman Liang <feynman.liang@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7360 from feynmanliang/SPARK-8997-improve-prefixspan and squashes the following commits:
      
      59db2f5 [Feynman Liang] Merge pull request #1 from mengxr/SPARK-8997
      91e4357 [Xiangrui Meng] update LocalPrefixSpan impl
      9212256 [Feynman Liang] MengXR code review comments
      f055d82 [Feynman Liang] Fix failing scalatest
      2e00cba [Feynman Liang] Depth first projections
      70b93e3 [Feynman Liang] Performance improvements in LocalPrefixSpan, fix tests
      1bb8accb
    • Yijie Shen's avatar
      [SPARK-8279][SQL]Add math function round · f0e12974
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8279
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #6938 from yijieshen/udf_round_3 and squashes the following commits:
      
      07a124c [Yijie Shen] remove useless def children
      392b65b [Yijie Shen] add negative scale test in DecimalSuite
      61760ee [Yijie Shen] address reviews
      302a78a [Yijie Shen] Add dataframe function test
      31dfe7c [Yijie Shen] refactor round to make it readable
      8c7a949 [Yijie Shen] rebase & inputTypes update
      9555e35 [Yijie Shen] tiny style fix
      d10be4a [Yijie Shen] use TypeCollection to specify wanted input and implicit cast
      c3b9839 [Yijie Shen] rely on implict cast to handle string input
      b0bff79 [Yijie Shen] make round's inner method's name more meaningful
      9bd6930 [Yijie Shen] revert accidental change
      e6f44c4 [Yijie Shen] refactor eval and genCode
      1b87540 [Yijie Shen] modify checkInputDataTypes using foldable
      5486b2d [Yijie Shen] DataFrame API modification
      2077888 [Yijie Shen] codegen versioned eval
      6cd9a64 [Yijie Shen] refactor Round's constructor
      9be894e [Yijie Shen] add round functions in o.a.s.sql.functions
      7c83e13 [Yijie Shen] more tests on round
      56db4bb [Yijie Shen] Add decimal support to Round
      7e163ae [Yijie Shen] style fix
      653d047 [Yijie Shen] Add math function round
      f0e12974