  1. Jul 15, 2015
    • Michael Armbrust's avatar
      Revert SPARK-6910 and SPARK-9027 · c6b1a9e7
      Michael Armbrust authored
      Revert #7216 and #7386. These patches seem to be causing quite a few test failures:
      
      ```
      Caused by: java.lang.reflect.InvocationTargetException
      	at sun.reflect.GeneratedMethodAccessor322.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:351)
      	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getPartitionsByFilter$1.apply(ClientWrapper.scala:320)
      	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getPartitionsByFilter$1.apply(ClientWrapper.scala:318)
      	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:180)
      	at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:135)
      	at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:172)
      	at org.apache.spark.sql.hive.client.ClientWrapper.getPartitionsByFilter(ClientWrapper.scala:318)
      	at org.apache.spark.sql.hive.client.HiveTable.getPartitions(ClientInterface.scala:78)
      	at org.apache.spark.sql.hive.MetastoreRelation.getHiveQlPartitions(HiveMetastoreCatalog.scala:670)
      	at org.apache.spark.sql.hive.execution.HiveTableScan.doExecute(HiveTableScan.scala:137)
      	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:90)
      	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:90)
      	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
      	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:89)
      	at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:164)
      	at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:151)
      	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
      	... 85 more
      Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
      	at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:185)
      	at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:452)
      	at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:357)
      	at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilter(ExpressionTree.java:279)
      	at org.apache.hadoop.hive.metastore.parser.ExpressionTree$TreeNode.generateJDOFilter(ExpressionTree.java:243)
      	at org.apache.hadoop.hive.metastore.parser.ExpressionTree.generateJDOFilterFragment(ExpressionTree.java:590)
      	at org.apache.hadoop.hive.metastore.ObjectStore.makeQueryFilterString(ObjectStore.java:2417)
      	at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsViaOrmFilter(ObjectStore.java:2029)
      	at org.apache.hadoop.hive.metastore.ObjectStore.access$500(ObjectStore.java:146)
      	at org.apache.hadoop.hive.metastore.ObjectStore$4.getJdoResult(ObjectStore.java:2332)
      ```
      https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-with-YARN/2945/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.sql.hive.execution/SortMergeCompatibilitySuite/auto_sortmerge_join_16/
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #7409 from marmbrus/revertMetastorePushdown and squashes the following commits:
      
      92fabd3 [Michael Armbrust] Revert SPARK-6910 and SPARK-9027
      5d3bdf2 [Michael Armbrust] Revert "[SPARK-9027] [SQL] Generalize metastore predicate pushdown"
      c6b1a9e7
    • Reynold Xin's avatar
      [SPARK-8993][SQL] More comprehensive type checking in expressions. · f23a721c
      Reynold Xin authored
      This patch makes the following changes:
      
      1. ExpectsInputTypes only defines expected input types, but does not perform any implicit type casting.
      2. ImplicitCastInputTypes is a new trait that defines expected input types and also performs implicit type casting.
      3. BinaryOperator has a new abstract function "inputType", which defines the expected input type for both left/right. Concrete BinaryOperator expressions no longer perform any implicit type casting.
      4. For BinaryOperators, convert NullType (i.e. null literals) into some accepted type so BinaryOperators don't need to handle NullTypes.
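
The difference between items 1 and 2 can be sketched in Python. This is an illustration only, not Spark's Scala code: the two base class names mirror the traits named above, while `Literal`, `Cast`, `check_input_types`, `coerce`, and the string-based type system are hypothetical.

```python
# Hypothetical toy model: each expression reports its data type as a string.
class Literal:
    def __init__(self, value, data_type):
        self.value = value
        self.data_type = data_type


class Cast:
    """Wraps a child expression and reports the target type."""
    def __init__(self, child, data_type):
        self.child = child
        self.data_type = data_type


class ExpectsInputTypes:
    """Only declares expected input types; never rewrites its children."""
    input_types = []

    def __init__(self, *children):
        self.children = list(children)

    def check_input_types(self):
        # Returns a list of error messages; an empty list means the check passed.
        return [
            f"argument {i + 1} requires {want}, got {child.data_type}"
            for i, (child, want) in enumerate(zip(self.children, self.input_types))
            if child.data_type != want
        ]


class ImplicitCastInputTypes(ExpectsInputTypes):
    """Declares expected input types and also inserts casts where needed."""

    def coerce(self):
        self.children = [
            child if child.data_type == want else Cast(child, want)
            for child, want in zip(self.children, self.input_types)
        ]
        return self


class Strict(ExpectsInputTypes):
    input_types = ["double"]


class Lenient(ImplicitCastInputTypes):
    input_types = ["double"]


strict_errors = Strict(Literal(1, "int")).check_input_types()
lenient_errors = Lenient(Literal(1, "int")).coerce().check_input_types()
```

In this sketch the strict expression reports a type mismatch for an `int` argument, while the lenient one wraps the argument in a cast and then passes the same check.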
      
      TODOs needed: fix unit tests for error reporting.
      
      I'm intentionally not changing anything in aggregate expressions because yhuai is doing a big refactoring on that right now.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7348 from rxin/typecheck and squashes the following commits:
      
      8fcf814 [Reynold Xin] Fixed ordering of cases.
      3bb63e7 [Reynold Xin] Style fix.
      f45408f [Reynold Xin] Comment update.
      aa7790e [Reynold Xin] Moved RemoveNullTypes into ImplicitTypeCasts.
      438ea07 [Reynold Xin] space
      d55c9e5 [Reynold Xin] Removes NullTypes.
      360d124 [Reynold Xin] Fixed the rule.
      fb66657 [Reynold Xin] Convert NullType into some accepted type for BinaryOperators.
      2e22330 [Reynold Xin] Fixed unit tests.
      4932d57 [Reynold Xin] Style fix.
      d061691 [Reynold Xin] Rename existing ExpectsInputTypes -> ImplicitCastInputTypes.
      e4727cc [Reynold Xin] BinaryOperator should not be doing implicit cast.
      d017861 [Reynold Xin] Improve expression type checking.
      f23a721c
    • Sun Rui's avatar
      [SPARK-8808] [SPARKR] Fix assignments in SparkR. · f650a005
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7395 from sun-rui/SPARK-8808 and squashes the following commits:
      
      ce603bc [Sun Rui] Use '<-' instead of '='.
      88590b1 [Sun Rui] Use '<-' instead of '='.
      f650a005
  2. Jul 14, 2015
    • Patrick Wendell's avatar
      5572fd0c
    • jerryshao's avatar
      [SPARK-5523] [CORE] [STREAMING] Add a cache for hostname in TaskMetrics to... · bb870e72
      jerryshao authored
      [SPARK-5523] [CORE] [STREAMING] Add a cache for hostname in TaskMetrics to decrease the memory usage and GC overhead
      
      Hostnames in TaskMetrics are created through deserialization, and the number of distinct hostnames is usually only on the order of the number of cluster nodes. Adding a cache layer to deduplicate these objects can therefore reduce memory usage and alleviate GC overhead, especially for long-running applications that generate jobs quickly, such as Spark Streaming.
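
The deduplication idea can be sketched as a small interning pool. This is a minimal Python illustration, not the patch itself: `HostnameCache` and its `get` method are hypothetical names, and the hostnames are made up.

```python
class HostnameCache:
    """Returns one shared string object per distinct hostname."""

    def __init__(self):
        self._pool = {}

    def get(self, hostname):
        # setdefault stores the first object seen and returns it on later calls.
        return self._pool.setdefault(hostname, hostname)


cache = HostnameCache()

# Build equal strings at runtime so they are separate objects, similar to
# strings that each come out of a separate deserialization.
raw = ["".join(["node-", str(i % 3)]) for i in range(9)]
deduped = [cache.get(h) for h in raw]

distinct_before = len({id(h) for h in raw})
distinct_after = len({id(h) for h in deduped})
```

After the lookup, the nine entries share three string objects, one per distinct hostname, so the duplicates become eligible for garbage collection.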
      
      Author: jerryshao <saisai.shao@intel.com>
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #5064 from jerryshao/SPARK-5523 and squashes the following commits:
      
      3e2412a [jerryshao] Address the comments
      b092a81 [Saisai Shao] Add a pool to cache the hostname
      bb870e72
    • huangzhaowei's avatar
      [SPARK-8820] [STREAMING] Add a configuration to set checkpoint dir. · f957796c
      huangzhaowei authored
      Add a configuration to set the checkpoint directory, for the user's convenience.
      [Jira Address](https://issues.apache.org/jira/browse/SPARK-8820)
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #7218 from SaintBacchus/SPARK-8820 and squashes the following commits:
      
      d49fe4b [huangzhaowei] Rename the configuration name
      66ea47c [huangzhaowei] Add the unit test.
      dd0acc1 [huangzhaowei] [SPARK-8820][Streaming] Add a configuration to set checkpoint dir.
      f957796c
    • Josh Rosen's avatar
      [SPARK-9050] [SQL] Remove unused newOrdering argument from Exchange (cleanup after SPARK-8317) · cc57d705
      Josh Rosen authored
      SPARK-8317 changed the SQL Exchange operator so that it no longer pushed sorting into Spark's shuffle layer, a change which allowed more efficient SQL-specific sorters to be used.
      
      This patch performs some leftover cleanup based on those changes:
      
      - Exchange's constructor should no longer accept a `newOrdering` since it's no longer used and no longer works as expected.
      - `addOperatorsIfNecessary` looked at shuffle input's output ordering to decide whether to sort, but this is the wrong node to be examining: it needs to look at whether the post-shuffle node has the right ordering, since shuffling will not preserve row orderings.  Thanks to davies for spotting this.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7407 from JoshRosen/SPARK-9050 and squashes the following commits:
      
      e70be50 [Josh Rosen] No need to wrap line
      e866494 [Josh Rosen] Refactor addOperatorsIfNecessary to make code clearer
      2e467da [Josh Rosen] Remove `newOrdering` from Exchange.
      cc57d705
    • Josh Rosen's avatar
      [SPARK-9045] Fix Scala 2.11 build break in UnsafeExternalRowSorter · e965a798
      Josh Rosen authored
      This fixes a compilation break under Scala 2.11:
      
      ```
      [error] /home/jenkins/workspace/Spark-Master-Scala211-Compile/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java:135: error: <anonymous org.apache.spark.sql.execution.UnsafeExternalRowSorter$1> is not abstract and does not override abstract method <B>minBy(Function1<InternalRow,B>,Ordering<B>) in TraversableOnce
      [error]       return new AbstractScalaRowIterator() {
      [error]                                             ^
      [error]   where B,A are type-variables:
      [error]     B extends Object declared in method <B>minBy(Function1<A,B>,Ordering<B>)
      [error]     A extends Object declared in interface TraversableOnce
      [error] 1 error
      ```
      
      The workaround for this is to make `AbstractScalaRowIterator` into a concrete class.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7405 from JoshRosen/SPARK-9045 and squashes the following commits:
      
      cbcbb4c [Josh Rosen] Forgot that we can't use the ??? operator anymore
      577ba60 [Josh Rosen] [SPARK-9045] Fix Scala 2.11 build break in UnsafeExternalRowSorter.
      e965a798
    • Josh Rosen's avatar
      [SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix existing uses · 11e5c372
      Josh Rosen authored
      This pull request adds a Scalastyle regex rule which fails the style check if `Class.forName` is used directly.  `Class.forName` always loads classes from the default / system classloader, but in a majority of cases, we should be using Spark's own `Utils.classForName` instead, which tries to load classes from the current thread's context classloader and falls back to the classloader which loaded Spark when the context classloader is not defined.
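
A regex-based check of this kind can be sketched as follows. The pattern and the `find_violations` helper are hypothetical; the actual rule in Spark's Scalastyle configuration may be written differently.

```python
import re

# Hypothetical pattern: flags direct calls, but not Utils.classForName.
BANNED = re.compile(r"\bClass\.forName\b")


def find_violations(source):
    """Returns the 1-based line numbers that use Class.forName directly."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if BANNED.search(line)
    ]


sample = """\
val a = Class.forName("org.example.Foo")
val b = Utils.classForName("org.example.Foo")
val c = Thread.currentThread().getContextClassLoader
"""
violations = find_violations(sample)
```

Only the first line of the sample is flagged, since `Utils.classForName` does not contain the banned token.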
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7350 from JoshRosen/ban-Class.forName and squashes the following commits:
      
      e3e96f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      c0b7885 [Josh Rosen] Hopefully fix the last two cases
      d707ba7 [Josh Rosen] Fix uses of Class.forName that I missed in my first cleanup pass
      046470d [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName
      62882ee [Josh Rosen] Fix uses of Class.forName or add exclusion.
      d9abade [Josh Rosen] Add stylechecker rule to ban uses of Class.forName
      11e5c372
    • Sean Owen's avatar
      [SPARK-4362] [MLLIB] Make prediction probability available in NaiveBayesModel · 740b034f
      Sean Owen authored
      Add predictProbabilities to Naive Bayes, return class probabilities.
      
      Continues https://github.com/apache/spark/pull/6761
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #7376 from srowen/SPARK-4362 and squashes the following commits:
      
      23d5a76 [Sean Owen] Fix model.labels -> model.theta
      95d91fb [Sean Owen] Check that predicted probabilities sum to 1
      b32d1c8 [Sean Owen] Add predictProbabilities to Naive Bayes, return class probabilities
      740b034f
    • Liang-Chi Hsieh's avatar
      [SPARK-8800] [SQL] Fix inaccurate precision/scale of Decimal division operation · 4b5cfc98
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8800
      
      Previously, we turned to Java BigDecimal's divide with a specified ROUNDING_MODE to avoid the non-terminating decimal expansion problem. However, as JihongMA reported, the division operation gives inaccurate results for some specific values.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #7212 from viirya/fix_decimal4 and squashes the following commits:
      
      4205a0a [Liang-Chi Hsieh] Fix inaccuracy precision/scale of Decimal division operation.
      4b5cfc98
    • zsxwing's avatar
      [SPARK-4072] [CORE] Display Streaming blocks in Streaming UI · fb1d06fc
      zsxwing authored
      Replace #6634
      
      This PR adds `SparkListenerBlockUpdated` to SparkListener so that it can monitor all block update infos that are sent to `BlockManagerMasterEndpoint`, and also adds new tables in the Storage tab to display the stream block infos.
      
      ![screen shot 2015-07-01 at 5 19 46 pm](https://cloud.githubusercontent.com/assets/1000778/8451562/c291a6ec-2016-11e5-890d-0afc174e1f8c.png)
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6672 from zsxwing/SPARK-4072-2 and squashes the following commits:
      
      df2c1d8 [zsxwing] Use xml query to check the xml elements
      54d54af [zsxwing] Add unit tests for StoragePage
      e29fb53 [zsxwing] Update as per TD's comments
      ccbee07 [zsxwing] Fix the code style
      6dc42b4 [zsxwing] Fix the replication level of blocks
      450fad1 [zsxwing] Merge branch 'master' into SPARK-4072-2
      1e9ef52 [zsxwing] Don't categorize by Executor ID
      ca0ab69 [zsxwing] Fix the code style
      3de2762 [zsxwing] Make object BlockUpdatedInfo private
      e95b594 [zsxwing] Add 'Aggregated Stream Block Metrics by Executor' table
      ba5d0d1 [zsxwing] Refactor the unit test to improve the readability
      4bbe341 [zsxwing] Revert JsonProtocol and don't log SparkListenerBlockUpdated
      b464dd1 [zsxwing] Add onBlockUpdated to EventLoggingListener
      5ba014c [zsxwing] Fix the code style
      0b1e47b [zsxwing] Add a developer api BlockUpdatedInfo
      04838a9 [zsxwing] Fix the code style
      2baa161 [zsxwing] Add unit tests
      80f6c6d [zsxwing] Address comments
      797ee4b [zsxwing] Display Streaming blocks in Streaming UI
      fb1d06fc
    • Andrew Ray's avatar
      [SPARK-8718] [GRAPHX] Improve EdgePartition2D for non perfect square number of partitions · 0a4071ea
      Andrew Ray authored
      See https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb
      
      Author: Andrew Ray <ray.andrew@gmail.com>
      
      Closes #7104 from aray/edge-partition-2d-improvement and squashes the following commits:
      
      3729f84 [Andrew Ray] correct bounds and remove unneeded comments
      97f8464 [Andrew Ray] change less
      5141ab4 [Andrew Ray] Merge branch 'master' into edge-partition-2d-improvement
      925fd2c [Andrew Ray] use new interface for partitioning
      001bfd0 [Andrew Ray] Refactor PartitionStrategy so that we can return a partition function for a given number of parts. To keep compatibility we define default methods that translate between the two implementation options. Made EdgePartition2D use old strategy when we have a perfect square and implement new interface.
      5d42105 [Andrew Ray] % -> /
      3560084 [Andrew Ray] Merge branch 'master' into edge-partition-2d-improvement
      f006364 [Andrew Ray] remove unneeded comments
      cfa2c5e [Andrew Ray] Modifications to EdgePartition2D so that it works for non perfect squares.
      0a4071ea
    • Josh Rosen's avatar
      [SPARK-9031] Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class · d267c283
      Josh Rosen authored
      BlockObjectWriter has only one concrete non-test class, DiskBlockObjectWriter. In order to simplify the code in preparation for other refactorings, I think that we should remove this base class and have only DiskBlockObjectWriter.
      
      While at one time we may have planned to have multiple BlockObjectWriter implementations, that doesn't seem to have happened, so the extra abstraction seems unnecessary.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7391 from JoshRosen/shuffle-write-interface-refactoring and squashes the following commits:
      
      c418e33 [Josh Rosen] Fix compilation
      5047995 [Josh Rosen] Fix comments
      d5dc548 [Josh Rosen] Update references in comments
      89dc797 [Josh Rosen] Rename test suite.
      5755918 [Josh Rosen] Remove unnecessary val in case class
      1607c91 [Josh Rosen] Merge BlockObjectWriter and DiskBlockObjectWriter
      d267c283
    • Andrew Or's avatar
      [SPARK-8911] Fix local mode endless heartbeats · 8fb3a65c
      Andrew Or authored
      As of #7173 we expect executors to properly register with the driver before responding to their heartbeats. This behavior is not matched in local mode. This patch adds the missing event that needs to be posted.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7382 from andrewor14/fix-local-heartbeat and squashes the following commits:
      
      1258bdf [Andrew Or] Post ExecutorAdded event to local executor
      8fb3a65c
    • Brennon York's avatar
      [SPARK-8933] [BUILD] Provide a --force flag to build/mvn that always uses downloaded maven · c4e98ff0
      Brennon York authored
      Added a --force flag to manually download Maven, if necessary, and use a built-in version of Maven that is best for Spark.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #7374 from brennonyork/SPARK-8933 and squashes the following commits:
      
      d673127 [Brennon York] added --force flag to manually download, if necessary, and use a built-in version of maven best for spark
      c4e98ff0
    • Michael Armbrust's avatar
      [SPARK-9027] [SQL] Generalize metastore predicate pushdown · 37f2d963
      Michael Armbrust authored
      Add support for pushing down metastore filters that are in different orders and add some unit tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #7386 from marmbrus/metastoreFilters and squashes the following commits:
      
      05a4524 [Michael Armbrust] [SPARK-9027][SQL] Generalize metastore predicate pushdown
      37f2d963
    • Wenchen Fan's avatar
      [SPARK-9029] [SQL] shortcut CaseKeyWhen if key is null · 59d820aa
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7389 from cloud-fan/case-when and squashes the following commits:
      
      ea4b6ba [Wenchen Fan] shortcut for case key when
      59d820aa
    • Daoyuan Wang's avatar
      [SPARK-6851] [SQL] function least/greatest follow up · 257236c3
      Daoyuan Wang authored
      This is a follow up of remaining comments from #6851
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #7387 from adrian-wang/udflgfollow and squashes the following commits:
      
      6163e62 [Daoyuan Wang] add skipping null values
      e8c2e09 [Daoyuan Wang] use seq
      8362966 [Daoyuan Wang] pr6851 follow up
      257236c3
    • zhaishidan's avatar
      [SPARK-9010] [DOCUMENTATION] Improve the Spark Configuration document about... · c1feebd8
      zhaishidan authored
      [SPARK-9010] [DOCUMENTATION] Improve the Spark Configuration document about `spark.kryoserializer.buffer`
      
      The meaning of spark.kryoserializer.buffer should be "Initial size of Kryo's serialization buffer. Note that there will be one buffer per core on each worker. This buffer will grow up to spark.kryoserializer.buffer.max if needed.".
      
      The spark.kryoserializer.buffer.max.mb is out-of-date in spark 1.4.
      
      Author: zhaishidan <zhaishidan@haizhi.com>
      
      Closes #7393 from stanzhai/master and squashes the following commits:
      
      69729ef [zhaishidan] fix document error about spark.kryoserializer.buffer.max.mb
      c1feebd8
    • Joseph Gonzalez's avatar
      [SPARK-9001] Fixing errors in javadocs that lead to failed build/sbt doc · 20c1434a
      Joseph Gonzalez authored
      These are minor corrections in the documentation of several classes that are preventing:
      
      ```bash
      build/sbt publish-local
      ```
      
      I believe this might be an issue associated with running JDK8 as ankurdave does not appear to have this issue in JDK7.
      
      Author: Joseph Gonzalez <joseph.e.gonzalez@gmail.com>
      
      Closes #7354 from jegonzal/FixingJavadocErrors and squashes the following commits:
      
      6664b7e [Joseph Gonzalez] making requested changes
      2e16d89 [Joseph Gonzalez] Fixing errors in javadocs that prevents build/sbt publish-local from completing.
      20c1434a
  3. Jul 13, 2015
    • Cheolsoo Park's avatar
      [SPARK-6910] [SQL] Support for pushing predicates down to metastore for partition pruning · 408b384d
      Cheolsoo Park authored
      This PR supersedes my old one #6921. Since my patch has changed quite a bit, I am opening a new PR to make it easier to review.
      
      The changes include:
      * Implement a `toMetastoreFilter()` function in `HiveShim` that takes a `Seq[Expression]` and converts it into a filter string for the Hive metastore.
        * This function matches all the `AttributeReference` + `BinaryComparisonOp` + `Integral/StringType` patterns in `Seq[Expression]` and folds them into a string.
      * Change `hiveQlPartitions` field in `MetastoreRelation` to `getHiveQlPartitions()` function that takes a filter string parameter.
      * Call `getHiveQlPartitions()` in `HiveTableScan` with a filter string.
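
The folding step described in the list above can be sketched in Python. This is an illustration under assumptions, not the `HiveShim` code: predicates are modeled as hypothetical `(column, operator, literal)` tuples, and the quoting of string literals and the ` and ` separator are assumed rather than taken from the patch.

```python
# Hypothetical model of a predicate: (partition column, operator, literal).
SUPPORTED_OPS = {"=", "<", "<=", ">", ">="}


def to_metastore_filter(predicates):
    """Folds the predicates that can be pushed down into one filter string."""
    parts = []
    for column, op, value in predicates:
        if op not in SUPPORTED_OPS:
            continue  # unsupported comparison: leave it for the query engine
        if isinstance(value, bool):
            continue  # bool is an int subclass in Python, so exclude it explicitly
        if isinstance(value, int):
            parts.append(f"{column} {op} {value}")
        elif isinstance(value, str):
            parts.append(f'{column} {op} "{value}"')
        # Any other literal type is skipped rather than pushed down.
    return " and ".join(parts)


filter_string = to_metastore_filter([
    ("ds", "=", "2015-07-13"),
    ("hr", ">=", 10),
    ("score", "<", 0.5),
])
```

Predicates that do not match the supported patterns are simply left out of the string, so they would still have to be evaluated after the partitions are fetched.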
      
      But there are some cases in which predicate pushdown is disabled:
      
      Case | Predicate pushdown
      ------- | -----------------------------
      Hive integral and string types | Yes
      Hive varchar type | No
      Hive 0.13 and newer | Yes
      Hive 0.12 and older | No
      convertMetastoreParquet=false | Yes
      convertMetastoreParquet=true | No
      
      In case of `convertMetastoreParquet=true`, predicates are not pushed down because this conversion happens in an `Analyzer` rule (`HiveMetastoreCatalog.ParquetConversions`). At this point, `HiveTableScan` hasn't run, so predicates are not available. But reading the source code, I think it is intentional to convert the entire Hive table w/ all the partitions into `ParquetRelation` because then `ParquetRelation` can be cached and reused for any query against that table. Please correct me if I am wrong.
      
      cc marmbrus
      
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #7216 from piaozhexiu/SPARK-6910-2 and squashes the following commits:
      
      aa1490f [Cheolsoo Park] Fix ordering of imports
      c212c4d [Cheolsoo Park] Incorporate review comments
      5e93f9d [Cheolsoo Park] Predicate pushdown into Hive metastore
      408b384d
    • Neelesh Srinivas Salian's avatar
      [SPARK-8743] [STREAMING] Deregister Codahale metrics for streaming when StreamingContext is closed · b7bcbe25
      Neelesh Srinivas Salian authored
      The issue link: https://issues.apache.org/jira/browse/SPARK-8743
      Deregister Codahale metrics for streaming when StreamingContext is closed
      
      Design:
      Adding the method calls in the appropriate start() and stop() methods for the StreamingContext
      
      Actions in the PullRequest:
      1) Added the registerSource method call to the start method for the Streaming Context.
      2) Added the removeSource method to the stop method.
      3) Added comments for both 1 and 2 and comment to show initialization of the StreamingSource
      4) Added a test case to check for both registration and de-registration of metrics
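
The lifecycle described in the steps above can be sketched as follows. The classes are hypothetical Python stand-ins, not Spark's API: `ToyMetricsSystem` and `ToyStreamingContext` only mirror the `registerSource` and `removeSource` calls named in the commit message.

```python
class ToyMetricsSystem:
    """Hypothetical stand-in for a metrics registry."""

    def __init__(self):
        self.sources = []

    def register_source(self, source):
        self.sources.append(source)

    def remove_source(self, source):
        self.sources.remove(source)


class ToyStreamingContext:
    def __init__(self, metrics):
        self.metrics = metrics
        # The source is created up front but only registered on start().
        self.streaming_source = "StreamingSource"
        self.started = False

    def start(self):
        self.metrics.register_source(self.streaming_source)
        self.started = True

    def stop(self):
        # Deregister only if start() registered the source earlier.
        if self.started:
            self.metrics.remove_source(self.streaming_source)
            self.started = False


metrics = ToyMetricsSystem()
ctx = ToyStreamingContext(metrics)
ctx.start()
registered_while_running = list(metrics.sources)
ctx.stop()
registered_after_stop = list(metrics.sources)
```

Pairing the two calls this way means a stopped context leaves no stale entry behind in the registry.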
      
      Previous closed PR for reference: https://github.com/apache/spark/pull/7250
      
      Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
      
      Closes #7362 from nssalian/branch-SPARK-8743 and squashes the following commits:
      
      7d998a3 [Neelesh Srinivas Salian] Removed the Thread.sleep() call
      8b26397 [Neelesh Srinivas Salian] Moved the scalatest.{} import
      0e8007a [Neelesh Srinivas Salian] moved import org.apache.spark{} to correct place
      daedaa5 [Neelesh Srinivas Salian] Corrected Ordering of imports
      8873180 [Neelesh Srinivas Salian] Removed redundancy in imports
      59227a4 [Neelesh Srinivas Salian] Changed the ordering of the imports to classify  scala and spark imports
      d8cb577 [Neelesh Srinivas Salian] Added registerSource to start() and removeSource to stop(). Wrote a test to check the registration and de-registration
      b7bcbe25
    • Hari Shreedharan's avatar
      [SPARK-8533] [STREAMING] Upgrade Flume to 1.6.0 · 0aed38e4
      Hari Shreedharan authored
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6939 from harishreedharan/upgrade-flume-1.6.0 and squashes the following commits:
      
      94b80ae [Hari Shreedharan] [SPARK-8533][Streaming] Upgrade Flume to 1.6.0
      0aed38e4
    • Vinod K C's avatar
      [SPARK-8636] [SQL] Fix equalNullSafe comparison · 4c797f2b
      Vinod K C authored
      Author: Vinod K C <vinod.kc@huawei.com>
      
      Closes #7040 from vinodkc/fix_CaseKeyWhen_equalNullSafe and squashes the following commits:
      
      be5e641 [Vinod K C] Renamed equalNullSafe to threeValueEquals
      aac9f67 [Vinod K C] Updated test suite and genCode method
      f2d0b53 [Vinod K C]  Fix equalNullSafe comparison
      4c797f2b
    • Vinod K C's avatar
      [SPARK-8991] [ML] Update SharedParamsCodeGen's Generated Documentation · 714fc55f
      Vinod K C authored
      Removed private[ml] from Generated documentation
      
      Author: Vinod K C <vinod.kc@huawei.com>
      
      Closes #7367 from vinodkc/fix_sharedparmascodegen and squashes the following commits:
      
      4fa3c8f [Vinod K C] Adding auto generated code
      7e19025 [Vinod K C] Removed private[ml]
      714fc55f
    • yongtang's avatar
      [SPARK-8954] [BUILD] Remove unneeded deb repository from Dockerfile to fix build error in docker. · 5c41691f
      yongtang authored
      [SPARK-8954] [Build]
      1. Remove unneeded deb repository from Dockerfile to fix build error in docker.
      2. Remove unneeded /var/lib/apt/lists/* after install to reduce the docker image size (by ~30MB).
      
      Author: yongtang <yongtang@users.noreply.github.com>
      
      Closes #7346 from yongtang/SPARK-8954 and squashes the following commits:
      
      36024a1 [yongtang] [SPARK-8954] [Build] Remove unneeded /var/lib/apt/lists/* after install to reduce the docker image size (by ~30MB)
      7084941 [yongtang] [SPARK-8954] [Build] Remove unneeded deb repository from Dockerfile to fix build error in docker.
      5c41691f
    • Davies Liu's avatar
      79c35826
    • Carson Wang's avatar
      [SPARK-8950] [WEBUI] Correct the calculation of SchedulerDelay in StagePage · 5ca26fb6
      Carson Wang authored
      In StagePage, the SchedulerDelay is calculated as totalExecutionTime - executorRunTime - executorOverhead - gettingResultTime.
      But the totalExecutionTime is calculated in a way that does not include the gettingResultTime.
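
The mismatch can be seen with a small worked example. The timings below are made up, the clamp at zero is an assumption of this sketch, and measuring the total up to task finish is one way to make the terms consistent, not necessarily what the patch does.

```python
def scheduler_delay(total_execution_time, executor_run_time,
                    executor_overhead, getting_result_time):
    """Time not explained by running, overhead, or fetching the result."""
    # Clamping at zero is an assumption made for this sketch.
    return max(0, total_execution_time - executor_run_time
               - executor_overhead - getting_result_time)


# Made-up timings in milliseconds.
launch, result_fetch_start, finish = 0, 900, 1000
run_time, overhead = 700, 100
getting_result = finish - result_fetch_start

# Total measured only up to the start of result fetching: the fetch time is
# subtracted even though it was never part of the total.
inconsistent = scheduler_delay(result_fetch_start - launch, run_time,
                               overhead, getting_result)

# Total measured up to task finish: every subtracted term is part of the total.
consistent = scheduler_delay(finish - launch, run_time, overhead,
                             getting_result)
```

With these numbers the inconsistent variant reports no delay at all, while the consistent one attributes the remaining 100 ms to scheduling.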
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #7319 from carsonwang/SchedulerDelayTime and squashes the following commits:
      
      f66fb6e [Carson Wang] Update the code style
      7d971ae [Carson Wang] Correct the calculation of SchedulerDelay
      5ca26fb6
    • MechCoder's avatar
      [SPARK-8706] [PYSPARK] [PROJECT INFRA] Add pylint checks to PySpark · 9b62e937
      MechCoder authored
      This adds Pylint checks to PySpark.
      
      For now this lazily installs Pylint using easy_install to /dev/pylint (similar to the pep8 script).
      We still need to figure out which rules should be allowed.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7241 from MechCoder/pylint and squashes the following commits:
      
      8496834 [MechCoder] Silence warnings and make pylint tests fail to check if it works in jenkins
      57393a3 [MechCoder] undefined-variable
      a8e2547 [MechCoder] Minor changes
      7753810 [MechCoder] remove trailing whitespace
      75c5d2b [MechCoder] Remove blacklisted arguments and pointless statements check
      6bde250 [MechCoder] Disable all checks for now
      3464666 [MechCoder] Add pylint configuration file
      d28109f [MechCoder] [SPARK-8706] [PySpark] [Project infra] Add pylint checks to PySpark
      9b62e937
    • Sun Rui's avatar
      [SPARK-6797] [SPARKR] Add support for YARN cluster mode. · 7f487c8b
      Sun Rui authored
      This PR enables SparkR to dynamically ship the SparkR binary package to the AM node in YARN cluster mode, thus it is no longer required that the SparkR package be installed on each worker node.
      
      This PR uses the JDK jar tool to package the SparkR package, because jar is thought to be available on both Linux/Windows platforms where JDK has been installed.
      
      This PR does not address the R worker involved in RDD API. Will address it in a separate JIRA issue.
      
      This PR does not address SBT build. SparkR installation and packaging by SBT will be addressed in a separate JIRA issue.
      
      R/install-dev.bat is not tested. shivaram, could you help test it?
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #6743 from sun-rui/SPARK-6797 and squashes the following commits:
      
      ca63c86 [Sun Rui] Adjust MimaExcludes after rebase.
      7313374 [Sun Rui] Fix unit test errors.
      72695fb [Sun Rui] Fix unit test failures.
      193882f [Sun Rui] Fix Mima test error.
      fe25a33 [Sun Rui] Fix Mima test error.
      35ecfa3 [Sun Rui] Fix comments.
      c38a005 [Sun Rui] Unzipped SparkR binary package is still required for standalone and Mesos modes.
      b05340c [Sun Rui] Fix scala style.
      2ca5048 [Sun Rui] Fix comments.
      1acefd1 [Sun Rui] Fix scala style.
      0aa1e97 [Sun Rui] Fix scala style.
      41d4f17 [Sun Rui] Add support for locating SparkR package for R workers required by RDD APIs.
      49ff948 [Sun Rui] Invoke jar.exe with full path in install-dev.bat.
      7b916c5 [Sun Rui] Use 'rem' consistently.
      3bed438 [Sun Rui] Add a comment.
      681afb0 [Sun Rui] Fix a bug that RRunner does not handle client deployment modes.
      cedfbe2 [Sun Rui] [SPARK-6797][SPARKR] Add support for YARN cluster mode.
      7f487c8b
    • Vincent D. Warmerdam's avatar
      [SPARK-8596] Add module for rstudio link to spark · a5bc803b
      Vincent D. Warmerdam authored
      shivaram, added module for rstudio install
      
      Author: Vincent D. Warmerdam <vincentwarmerdam@gmail.com>
      
      Closes #7366 from koaning/rstudio-install and squashes the following commits:
      
      e47c2da [Vincent D. Warmerdam] added rstudio module
      a5bc803b
    • Wenchen Fan's avatar
      [SPARK-8944][SQL] Support casting between IntervalType and StringType · 6b899438
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7355 from cloud-fan/fromString and squashes the following commits:
      
      3bbb9d6 [Wenchen Fan] fix code gen
      7dab957 [Wenchen Fan] naming fix
      0fbbe19 [Wenchen Fan] address comments
      ac1f3d1 [Wenchen Fan] Support casting between IntervalType and StringType
      6b899438
    • Daoyuan Wang's avatar
      [SPARK-8203] [SPARK-8204] [SQL] conditional function: least/greatest · 92540d22
      Daoyuan Wang authored
      chenghao-intel zhichao-li qiansl127
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #6851 from adrian-wang/udflg and squashes the following commits:
      
      0f1bff2 [Daoyuan Wang] address comments from davis
      7a6bdbb [Daoyuan Wang] add '.' for hex()
      c1f6824 [Daoyuan Wang] add codegen, test for all types
      ec625b0 [Daoyuan Wang] conditional function: least/greatest
      92540d22
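A minimal sketch of what `least` and `greatest` are generally expected to compute, written in plain Python rather than as Catalyst expressions. The null-skipping rule and the two-argument minimum are assumptions taken from how Spark documents these functions, not from this commit message, and the helper names are hypothetical.

```python
def _non_null(values):
    """Validate the argument count and drop None (SQL NULL) entries."""
    if len(values) < 2:
        raise ValueError("expected at least two arguments")
    return [v for v in values if v is not None]


def least(*values):
    """Smallest non-null argument, or None if every argument is None (assumed semantics)."""
    kept = _non_null(values)
    return min(kept) if kept else None


def greatest(*values):
    """Largest non-null argument, or None if every argument is None (assumed semantics)."""
    kept = _non_null(values)
    return max(kept) if kept else None
```

Under these assumptions, `least(3, None, 1)` evaluates to `1` and `greatest(None, None)` evaluates to `None`.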
  4. Jul 12, 2015
    • Davies Liu's avatar
      [SPARK-9006] [PYSPARK] fix microsecond loss in Python 3 · 20b47433
      Davies Liu authored
      A microsecond may be lost when the timestamp is handled as a float; it should be an `int` instead.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7363 from davies/fix_microsecond and squashes the following commits:
      
      36f6007 [Davies Liu] fix microsecond loss in Python 3
      20b47433
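The idea behind the fix can be sketched as follows: keep the whole-second part and the microsecond part as integers, so that no float rounding touches the low digits. This is an illustrative sketch, not the PySpark code itself; the function names `to_micros`, `from_micros`, and `from_micros_float` are hypothetical.

```python
import calendar
from datetime import datetime, timezone


def to_micros(dt):
    """Convert an aware datetime to integer microseconds since the epoch."""
    seconds = calendar.timegm(dt.utctimetuple())  # whole seconds, as an int
    return seconds * 1000000 + dt.microsecond


def from_micros(us):
    """Inverse of to_micros, using integer arithmetic only."""
    seconds, micros = divmod(us, 1000000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


def from_micros_float(us):
    """Float-based variant: in Python 3, `/` is true division, so the
    microseconds pass through a float and may be rounded, depending on
    the value and the interpreter version."""
    return datetime.fromtimestamp(us / 1000000, tz=timezone.utc)
```

With the integer path, `from_micros(to_micros(dt))` reproduces `dt` exactly for an aware datetime at or after the epoch; the float variant carries no such guarantee.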
    • Kay Ousterhout's avatar
      [SPARK-8880] Fix confusing Stage.attemptId member variable · 30090884
      Kay Ousterhout authored
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #7275 from kayousterhout/SPARK-8880 and squashes the following commits:
      
      3e9ce7c [Kay Ousterhout] Added missing return type
      e150278 [Kay Ousterhout] [SPARK-8880] Fix confusing Stage.attemptId member variable
      30090884
  5. Jul 11, 2015
  6. Jul 10, 2015
    • Joseph K. Bradley's avatar
      [SPARK-8994] [ML] tiny cleanups to Params, Pipeline · 0c5207c6
      Joseph K. Bradley authored
      Made default impl of Params.validateParams empty
      CC mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #7349 from jkbradley/pipeline-small-cleanups and squashes the following commits:
      
      4e0f013 [Joseph K. Bradley] small cleanups after SPARK-5956
      0c5207c6
    • zhangjiajin's avatar
      [SPARK-6487] [MLLIB] Add sequential pattern mining algorithm PrefixSpan to Spark MLlib · 7f6be1f2
      zhangjiajin authored
      Add parallel PrefixSpan algorithm and test file.
      Support non-temporal sequences.
      
      Author: zhangjiajin <zhangjiajin@huawei.com>
      Author: zhang jiajin <zhangjiajin@huawei.com>
      
      Closes #7258 from zhangjiajin/master and squashes the following commits:
      
      ca9c4c8 [zhangjiajin] Modified the code according to the review comments.
      574e56c [zhangjiajin] Add new object LocalPrefixSpan, and do some optimization.
      ba5df34 [zhangjiajin] Fix a Scala style error.
      4c60fb3 [zhangjiajin] Fix some Scala style errors.
      1dd33ad [zhangjiajin] Modified the code according to the review comments.
      89bc368 [zhangjiajin] Fixed a Scala style error.
      a2eb14c [zhang jiajin] Delete PrefixspanSuite.scala
      951fd42 [zhang jiajin] Delete Prefixspan.scala
      575995f [zhangjiajin] Modified the code according to the review comments.
      91fd7e6 [zhangjiajin] Add new algorithm PrefixSpan and test file.
      7f6be1f2
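For readers unfamiliar with PrefixSpan, here is a single-machine sketch of the core recursion: count the items in the projected database, extend the prefix with each frequent item, and recurse on the suffixes. It assumes sequences of single items (no itemsets) and is not the parallel MLlib implementation added by this commit; the name `prefix_span` and its parameters are hypothetical.

```python
def prefix_span(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns."""
    results = []

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1

        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, support))
            # Project: keep the suffix after the first occurrence of item.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(pattern, suffixes)

    mine([], sequences)
    return results
```

For example, `prefix_span([[1, 2, 3], [1, 3], [2, 3]], 2)` yields the patterns `[1]`, `[1, 3]`, `[2]`, `[2, 3]`, and `[3]` with supports 2, 2, 2, 2, and 3.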
    • jose.cambronero's avatar
      [SPARK-8598] [MLLIB] Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs · 9c507577
      jose.cambronero authored
      This contribution is my original work and I license it to the project under its open source license.
      
      Author: jose.cambronero <jose.cambronero@cloudera.com>
      
      Closes #6994 from josepablocam/master and squashes the following commits:
      
      bbb30b1 [jose.cambronero] renamed KSTestResult to KolmogorovSmirnovTestResult, to stay consistent with method name
      0d0c201 [jose.cambronero] kstTest -> kolmogorovSmirnovTest in statistics.md
      1f56371 [jose.cambronero] changed ksTest in public API to kolmogorovSmirnovTest for clarity
      a48ae7b [jose.cambronero] refactor code to account for serializable RealDistribution. Reuse testOneSample( _, cdf)
      1bb44bd [jose.cambronero] style and doc changes. Factored out ks test into 2 separate tests
      2ec2aa6 [jose.cambronero] initialize to stdnormal when no params passed (and log). Change unit tests to approximate equivalence rather than strict
      a4bc0c7 [jose.cambronero] changed ksTest(data, distName) to ksTest(data, distName, params*) after api discussions. Changed tests and docs accordingly
      7e66f57 [jose.cambronero] copied implementation note to public api docs, and added @see for links to wiki info
      e760ebd [jose.cambronero] line length changes to fit style check
      3288e42 [jose.cambronero] addressed style changes, correctness change to simpler approach, and fixed edge case for foldLeft in searchOneSampleCandidates when a partition is empty
      9026895 [jose.cambronero] addressed style changes, correctness change to simpler approach, and fixed edge case for foldLeft in searchOneSampleCandidates when a partition is empty
      1226b30 [jose.cambronero] reindent multi-line lambdas, prior interpretation of style guide was wrong on my part
      9c0f1af [jose.cambronero] additional style changes incorporated and added documentation to mllib statistics docs
      3f81ad2 [jose.cambronero] renamed ks1 sample test for clarity
      992293b [jose.cambronero] Style changes as per comments and added implementation note explaining the distributed approach.
      6a4784f [jose.cambronero] specified what distributions are available for the convenience method ksTest(data, name) (solely standard normal)
      4b8ba61 [jose.cambronero] fixed off by 1/N in cases when post-constant adjustment ecdf is above cdf, but prior to adj it was below
      0b5e8ec [jose.cambronero] changed KS one sample test to perform just 1 distributed pass (in addition to the sorting pass), operates on each partition separately. Implementation of Sandy Ryza's algorithm
      16b5c4c [jose.cambronero] renamed dat to data and eliminated recalc of RDD size by sharing as argument between empirical and evalOneSampleP
      c18dc66 [jose.cambronero] removed ksTestOpt from API and changed comments in HypothesisTestSuite accordingly
      f6951b6 [jose.cambronero] changed style and some comments based on feedback from pull request
      b9cff3a [jose.cambronero] made small changes to pass style check
      ce8e9a1 [jose.cambronero] added kstest testing in HypothesisTestSuite
      4da189b [jose.cambronero] added user facing ks test functions
      c659ea1 [jose.cambronero] created KS test class
      13dfe4d [jose.cambronero] created test result class for ks test
      9c507577
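As a local illustration of the statistic this test computes, the sketch below evaluates the one-sample, two-sided Kolmogorov-Smirnov distance between a sample's empirical CDF and a theoretical CDF. It runs on an in-memory list rather than an RDD and does not reproduce the per-partition approach described in the commits; the names `ks_statistic` and `std_normal_cdf` are hypothetical.

```python
import math


def std_normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(data, cdf):
    """Largest absolute gap between the empirical CDF of data and cdf."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the gap
        # has to be checked on both sides of the jump.
        d = max(d, f - i / n, (i + 1) / n - f)
    return d
```

A single observation at 0 tested against the standard normal gives a distance of 0.5, since the theoretical CDF is 0.5 there while the empirical CDF jumps from 0 to 1.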