  1. Jan 11, 2016
    • Kousuke Saruta's avatar
      [SPARK-12692][BUILD][STREAMING] Scala style: Fix the style violation (Space before "," or ":") · 39ae04e6
      Kousuke Saruta authored
Fix the style violation (space before `,` and `:`).
      This PR is a followup for #10643.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10685 from sarutak/SPARK-12692-followup-streaming.
      39ae04e6
    • Yin Huai's avatar
      [SPARK-11823] Ignores HiveThriftBinaryServerSuite's test jdbc cancel · aaa2c3b6
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-11823
      
      This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #10715 from yhuai/SPARK-11823-ignore.
      aaa2c3b6
    • Cheng Lian's avatar
[SPARK-12498][SQL][MINOR] BooleanSimplification simplification · 36d49350
      Cheng Lian authored
      Scala syntax allows binary case classes to be used as infix operator in pattern matching. This PR makes use of this syntax sugar to make `BooleanSimplification` more readable.
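As a generic illustration of the syntax sugar in question (not the actual optimizer code), any case class with exactly two parameters can be matched in infix position:

```
// Generic sketch: infix patterns are equivalent to the usual extractor form.
case class And(left: String, right: String)
case class Or(left: String, right: String)

def describe(e: Any): String = e match {
  case a And b => s"($a AND $b)"   // same as: case And(a, b) => ...
  case a Or b  => s"($a OR $b)"    // same as: case Or(a, b) => ...
  case other   => other.toString
}

// describe(And("x", "y")) == "(x AND y)"
```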
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10445 from liancheng/boolean-simplification-simplification.
      36d49350
    • wangfei's avatar
      [SPARK-12742][SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due... · 473907ad
      wangfei authored
      [SPARK-12742][SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists exception
      
      ```
      [info] Exception encountered when attempting to run a suite with class name:
      org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 milliseconds)
      [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
      [info]   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
      [info]   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
      [info]   at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
      [info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
      [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
      [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
      [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      [info]   at java.lang.Thread.run(Thread.java:745)
      ```
      
      /cc liancheng
      
      Author: wangfei <wangfei_hello@126.com>
      
      Closes #10682 from scwf/fix-test.
      473907ad
    • Herman van Hovell's avatar
      [SPARK-12576][SQL] Enable expression parsing in CatalystQl · fe9eb0b0
      Herman van Hovell authored
      The PR allows us to use the new SQL parser to parse SQL expressions such as: ```1 + sin(x*x)```
      
      We enable this functionality in this PR, but we will not start using this actively yet. This will be done as soon as we have reached grammar parity with the existing parser stack.
      
      cc rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10649 from hvanhovell/SPARK-12576.
      fe9eb0b0
    • Yuhao Yang's avatar
      [SPARK-10809][MLLIB] Single-document topicDistributions method for LocalLDAModel · bbea8885
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-10809
      
      We could provide a single-document topicDistributions method for LocalLDAModel to allow for quick queries which avoid RDD operations. Currently, the user must use an RDD of documents.
      
Also adds some missing asserts.
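A hedged usage sketch of the kind of call this enables; the single-document method name, signature, and paths below are assumptions based on the description, not taken from the patch (`sc` is an existing SparkContext):

```
import org.apache.spark.mllib.clustering.LocalLDAModel
import org.apache.spark.mllib.linalg.Vectors

// Assumed API shape: query one document's topic mixture without building an RDD.
val model = LocalLDAModel.load(sc, "/path/to/lda-model")        // path is illustrative
val doc = Vectors.sparse(model.vocabSize, Seq((0, 1.0), (5, 2.0)))
val topics = model.topicDistribution(doc)                        // a Vector of topic weights
```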
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #9484 from hhbyyh/ldaTopicPre.
      bbea8885
    • Yuhao Yang's avatar
      [SPARK-12685][MLLIB] word2vec trainWordsCount gets overflow · 4f8eefa3
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-12685
The log of `word2vec` reports `trainWordsCount = -785727483` during computation over a large dataset.
      
      Update the priority as it will affect the computation process.
      `alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))`
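The negative count is ordinary Int wraparound; a minimal sketch of the failure mode and of accumulating with Long instead:

```
// Summing word counts of a large corpus into an Int wraps around to a negative value.
val counts = Seq(2000000000L, 800000000L)   // ~2.8 billion tokens in total
val asInt  = counts.map(_.toInt).sum        // overflows Int: negative result
val asLong = counts.sum                     // 2800000000: correct with Long
println(s"Int: $asInt, Long: $asLong")
```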
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #10627 from hhbyyh/w2voverflow.
      4f8eefa3
    • Yanbo Liang's avatar
      [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support single... · ee4ee02b
      Yanbo Liang authored
      [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support single instance predict/predictSoft
      
PySpark MLlib ```GaussianMixtureModel``` should support single instance ```predict/predictSoft``` just like the Scala API does.
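For reference, a minimal sketch of the Scala-side API being mirrored (the PySpark change itself is not shown here; `sc` is an existing SparkContext):

```
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.linalg.Vectors

// Scala already supports single-instance predict / predictSoft on the fitted model.
val data = sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(5.0, 6.0)))
val gmm = new GaussianMixture().setK(2).run(data)
val cluster = gmm.predict(Vectors.dense(1.1, 2.1))      // Int: most likely component
val soft    = gmm.predictSoft(Vectors.dense(1.1, 2.1))  // Array[Double]: membership weights
```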
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10552 from yanboliang/spark-12603.
      ee4ee02b
    • Brandon Bradley's avatar
      [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting · a767ee8a
      Brandon Bradley authored
      Warning users about casting changes.
      
      Author: Brandon Bradley <bradleytastic@gmail.com>
      
      Closes #10708 from blbradley/spark-12758.
      a767ee8a
    • Josh Rosen's avatar
      [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after install in dep tests · a4499145
      Josh Rosen authored
      This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.
      
      First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.
      
      I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.
      
      /cc zsxwing
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10704 from JoshRosen/fix-build-test-problems.
      a4499145
    • Jacek Laskowski's avatar
      [STREAMING][MINOR] Typo fixes · b313bada
      Jacek Laskowski authored
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #10698 from jaceklaskowski/streaming-kafka-typo-fixes.
      b313bada
    • Anatoliy Plastinin's avatar
      [SPARK-12744][SQL] Change parsing JSON integers to timestamps to treat... · 9559ac5f
      Anatoliy Plastinin authored
      [SPARK-12744][SQL] Change parsing JSON integers to timestamps to treat integers as number of seconds
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-12744
      
      This PR makes parsing JSON integers to timestamps consistent with casting behavior.
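A hedged sketch of the behavior described, assuming an existing `sc`/`sqlContext`: an integer JSON field read with a TimestampType schema is interpreted as seconds since the epoch, matching the cast semantics:

```
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

// JSON record with an integer timestamp field, read with an explicit schema.
val schema = StructType(Seq(StructField("ts", TimestampType)))
val df = sqlContext.read.schema(schema)
  .json(sc.parallelize(Seq("""{"ts": 1451606400}""")))   // seconds, not milliseconds
df.show()   // expected: 2016-01-01 00:00:00 UTC, consistent with CAST(1451606400 AS TIMESTAMP)
```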
      
      Author: Anatoliy Plastinin <anatoliy.plastinin@gmail.com>
      
      Closes #10687 from antlypls/fix-json-timestamp-parsing.
      9559ac5f
    • BrianLondon's avatar
      [SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk version · 8fe928b4
      BrianLondon authored
      The current Spark Streaming kinesis connector references a quite old version 1.9.40 of the AWS Java SDK (1.10.40 is current). Numerous AWS features including Kinesis Firehose are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively) such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.
      
      Author: BrianLondon <brian@seatgeek.com>
      
      Closes #10256 from BrianLondon/master.
      8fe928b4
    • Udo Klein's avatar
      removed lambda from sortByKey() · bd723bd5
      Udo Klein authored
      According to the documentation the sortByKey method does not take a lambda as an argument, thus the example is flawed. Removed the argument completely as this will default to ascending sort.
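For comparison, a minimal Scala sketch of the same call: `sortByKey` takes an optional ascending flag rather than a comparison function:

```
// sortByKey accepts an ascending flag (default true), not a lambda.
val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
pairs.sortByKey().collect()                    // Array((a,1), (b,2), (c,3)) -- ascending by default
pairs.sortByKey(ascending = false).collect()   // descending order
```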
      
      Author: Udo Klein <git@blinkenlight.net>
      
      Closes #10640 from udoklein/patch-1.
      bd723bd5
    • Wenchen Fan's avatar
      [SPARK-12539][FOLLOW-UP] always sort in partitioning writer · f253feff
      Wenchen Fan authored
      address comments in #10498 , especially https://github.com/apache/spark/pull/10498#discussion_r49021259
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Reynold Xin <rxin@databricks.com>
      
      Closes #10638 from cloud-fan/bucket-write.
      f253feff
    • Josh Rosen's avatar
      [SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusions · f13c7f8f
      Josh Rosen authored
      This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact.
      
      While the diff here looks somewhat large, note that this is only a revert of a some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10693 from JoshRosen/netty-hotfix.
      f13c7f8f
    • Kousuke Saruta's avatar
      [SPARK-4628][BUILD] Add a resolver to MiMaBuild.scala for mqttv3(1.0.1). · 008a5582
      Kousuke Saruta authored
#10659 removed the repository `https://repo.eclipse.org/content/repositories/paho-releases`, but MiMa still needs it because `spark-streaming-mqtt(1.6.0)` depends on `mqttv3(1.0.1)`, which is provided only by that removed repository; Maven Central currently provides only `mqttv3(1.0.2)`.
Without this resolver, dev/mima will fail whenever `mqttv3(1.0.1)` is absent from the local repository.
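The change lives in the sbt build (MiMaBuild.scala); a hedged sketch of what re-adding such a resolver looks like in sbt settings (the exact placement inside MiMaBuild.scala is not reproduced here):

```
// Re-adds the Eclipse Paho repository so MiMa can resolve mqttv3 1.0.1.
resolvers += "Eclipse Paho Releases" at
  "https://repo.eclipse.org/content/repositories/paho-releases"
```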
      
      JoshRosen Do you have any other better idea?
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10688 from sarutak/SPARK-4628-followup.
      008a5582
  2. Jan 10, 2016
    • Marcelo Vanzin's avatar
      [SPARK-3873][BUILD] Enable import ordering error checking. · 6439a825
      Marcelo Vanzin authored
      Turn import ordering violations into build errors, plus a few adjustments
      to account for how the checker behaves. I'm a little on the fence about
      whether the existing code is right, but it's easier to appease the checker
      than to discuss what's the more correct order here.
      
Plus a few fixes to imports that crept in since my recent cleanups.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10612 from vanzin/SPARK-3873-enable.
      6439a825
    • Josh Rosen's avatar
      [SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent future bugs · 3ab0138b
      Josh Rosen authored
      Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse).
      
      This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds.
      
      /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath.
      
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10672 from JoshRosen/enforce-netty-exclusions.
      3ab0138b
    • Kousuke Saruta's avatar
      [SPARK-12692][BUILD][GRAPHX] Scala style: Fix the style violation (Space before "," or ":") · 3119206b
      Kousuke Saruta authored
      Fix the style violation (space before `,` and `:`).
      This PR is a followup for #10643.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10683 from sarutak/SPARK-12692-followup-graphx.
      3119206b
    • Kousuke Saruta's avatar
      [SPARK-12692][BUILD][MLLIB] Scala style: Fix the style violation (Space before "," or ":") · e5904bb5
      Kousuke Saruta authored
Fix the style violation (space before `,` and `:`).
      This PR is a followup for #10643.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10684 from sarutak/SPARK-12692-followup-mllib.
      e5904bb5
    • Jacek Laskowski's avatar
      [SPARK-12736][CORE][DEPLOY] Standalone Master cannot be started due t… · b78e028e
      Jacek Laskowski authored
[SPARK-12736][CORE][DEPLOY] Standalone Master cannot be started due to NoClassDefFoundError: org/spark-project/guava/collect/Maps
      
      /cc srowen rxin
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #10674 from jaceklaskowski/SPARK-12736.
      b78e028e
  3. Jan 09, 2016
  4. Jan 08, 2016
    • Liang-Chi Hsieh's avatar
      [SPARK-12577] [SQL] Better support of parentheses in partition by and order by... · 95cd5d95
      Liang-Chi Hsieh authored
      [SPARK-12577] [SQL] Better support of parentheses in partition by and order by clause of window function's over clause
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-12577
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #10620 from viirya/fix-parentheses.
      95cd5d95
    • Josh Rosen's avatar
      [SPARK-4628][BUILD] Remove all non-Maven-Central repositories from build · 090d6913
      Josh Rosen authored
      This patch removes all non-Maven-central repositories from Spark's build, thereby avoiding any risk of future build-breaks due to us accidentally depending on an artifact which is not present in an immutable public Maven repository.
      
      I tested this by running
      
      ```
      build/mvn \
              -Phive \
              -Phive-thriftserver \
              -Pkinesis-asl \
              -Pspark-ganglia-lgpl \
              -Pyarn \
              dependency:go-offline
      ```
      
      inside of a fresh Ubuntu Docker container with no Ivy or Maven caches (I did a similar test for SBT).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10659 from JoshRosen/SPARK-4628.
      090d6913
    • Josh Rosen's avatar
      [SPARK-12730][TESTS] De-duplicate some test code in BlockManagerSuite · 1fdf9bbd
      Josh Rosen authored
      This patch deduplicates some test code in BlockManagerSuite. I'm splitting this change off from a larger PR in order to make things easier to review.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10667 from JoshRosen/block-mgr-tests-cleanup.
      1fdf9bbd
    • Cheng Lian's avatar
      [SPARK-12593][SQL] Converts resolved logical plan back to SQL · d9447cac
      Cheng Lian authored
      This PR tries to enable Spark SQL to convert resolved logical plans back to SQL query strings.  For now, the major use case is to canonicalize Spark SQL native view support.  The major entry point is `SQLBuilder.toSQL`, which returns an `Option[String]` if the logical plan is recognized.
      
      The current version is still in WIP status, and is quite limited.  Known limitations include:
      
      1.  The logical plan must be analyzed but not optimized
      
          The optimizer erases `Subquery` operators, which contain necessary scope information for SQL generation.  Future versions should be able to recover erased scope information by inserting subqueries when necessary.
      
      1.  The logical plan must be created using HiveQL query string
      
          Query plans generated by composing arbitrary DataFrame API combinations are not supported yet.  Operators within these query plans need to be rearranged into a canonical form that is more suitable for direct SQL generation.  For example, the following query plan
      
          ```
          Filter (a#1 < 10)
           +- MetastoreRelation default, src, None
          ```
      
          need to be canonicalized into the following form before SQL generation:
      
          ```
          Project [a#1, b#2, c#3]
           +- Filter (a#1 < 10)
               +- MetastoreRelation default, src, None
          ```
      
          Otherwise, the SQL generation process will have to handle a large number of special cases.
      
      1.  Only a fraction of expressions and basic logical plan operators are supported in this PR
      
          Currently, 95.7% (1720 out of 1798) query plans in `HiveCompatibilitySuite` can be successfully converted to SQL query strings.
      
          Known unsupported components are:
      
          - Expressions
            - Part of math expressions
            - Part of string expressions (buggy?)
            - Null expressions
            - Calendar interval literal
            - Part of date time expressions
            - Complex type creators
            - Special `NOT` expressions, e.g. `NOT LIKE` and `NOT IN`
          - Logical plan operators/patterns
            - Cube, rollup, and grouping set
            - Script transformation
            - Generator
            - Distinct aggregation patterns that fit `DistinctAggregationRewriter` analysis rule
            - Window functions
      
          Support for window functions, generators, and cubes etc. will be added in follow-up PRs.
      
      This PR leverages `HiveCompatibilitySuite` for testing SQL generation in a "round-trip" manner:
      
*   For all select queries, we try to convert them back to SQL
      *   If the query plan is convertible, we parse the generated SQL into a new logical plan
      *   Run the new logical plan instead of the original one
      
      If the query plan is inconvertible, the test case simply falls back to the original logic.
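A hedged sketch of that round-trip flow; `toSQL` returning `Option[String]` comes from the description above, while the `SQLBuilder` package and constructor shape are assumptions:

```
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.hive.SQLBuilder   // assumed package for the entry point named above

// Round-trip check: analyzed plan -> SQL string -> re-parsed plan, with a fallback.
def roundTrip(sqlContext: SQLContext, originalSql: String): DataFrame = {
  val analyzed = sqlContext.sql(originalSql).queryExecution.analyzed
  new SQLBuilder(analyzed, sqlContext).toSQL match {      // assumed constructor shape
    case Some(generated) => sqlContext.sql(generated)     // run the regenerated query
    case None            => sqlContext.sql(originalSql)   // inconvertible: original logic
  }
}
```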
      
      TODO
      
      - [x] Fix failed test cases
      - [x] Support for more basic expressions and logical plan operators (e.g. distinct aggregation etc.)
      - [x] Comments and documentation
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10541 from liancheng/sql-generation.
      d9447cac
    • Sean Owen's avatar
      [SPARK-4819] Remove Guava's "Optional" from public API · 659fd9d0
      Sean Owen authored
      Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)
      
      See also https://github.com/apache/spark/pull/10512
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10513 from srowen/SPARK-4819.
      659fd9d0
    • Thomas Graves's avatar
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fail… · 553fd7b9
      Thomas Graves authored
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
      
      https://issues.apache.org/jira/browse/SPARK-12654
      
So the bug here is that WholeTextFileRDD.getPartitions has:
val conf = getConf
In getConf, if cloneConf=true, it creates a new Hadoop Configuration and then uses that to create a new newJobContext. The newJobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So when it clones the Hadoop configuration it is changing it from a JobConf to a Configuration and dropping the credentials that were there. NewHadoopRDD just uses the conf passed in for getPartitions (not getConf), which is why it works.
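A hedged sketch of the pitfall being described; the Hadoop API details are assumed from the explanation above rather than verified against the patch:

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Credentials (delegation tokens) live on JobConf, not on a plain Configuration,
// so cloning through the Configuration copy constructor silently drops them.
val original: JobConf = new JobConf()
// ... tokens attached via original.getCredentials ...

val clonedAsConf    = new Configuration(original)  // properties copied, credentials lost
val clonedAsJobConf = new JobConf(original)        // expected to retain the credentials
```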
      
      Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
      
      Closes #10651 from tgravescs/SPARK-12654.
      553fd7b9
    • Udo Klein's avatar
      fixed numVertices in transitive closure example · 8c70cb4c
      Udo Klein authored
      Author: Udo Klein <git@blinkenlight.net>
      
      Closes #10642 from udoklein/patch-2.
      8c70cb4c
    • Jeff Zhang's avatar
      [DOCUMENTATION] doc fix of job scheduling · 00d92617
      Jeff Zhang authored
spark.shuffle.service.enabled is a Spark application-level configuration; it is not necessary to set it in yarn-site.xml
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10657 from zjffdu/doc-fix.
      00d92617
    • Bryan Cutler's avatar
      [SPARK-12701][CORE] FileAppender should use join to ensure writing thread completion · ea104b8f
      Bryan Cutler authored
Changed Logging FileAppender to use join in `awaitTermination` to ensure that the writing thread has properly finished before returning.
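A hedged sketch of the idea only (the real FileAppender internals are not reproduced); `awaitTermination` simply joins the writing thread:

```
// Illustrative only: wait for the writing thread itself rather than polling a flag.
class SimpleAppender {
  @volatile private var stopped = false
  private val writingThread = new Thread("file-appender") {
    override def run(): Unit = {
      while (!stopped) { /* read from the input stream and append to the file */ }
    }
  }
  writingThread.start()

  def stop(): Unit = { stopped = true }

  def awaitTermination(): Unit = {
    writingThread.join()   // returns only once the thread has actually finished
  }
}
```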
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701.
      ea104b8f
    • Liang-Chi Hsieh's avatar
      [SPARK-12687] [SQL] Support from clause surrounded by `()`. · cfe1ba56
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-12687
      
Some queries such as `(select 1 as a) union (select 2 as a)` do not work. This patch fixes it.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #10660 from viirya/fix-union.
      cfe1ba56
    • Sean Owen's avatar
      [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 edition · b9c83533
      Sean Owen authored
      Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10570 from srowen/SPARK-12618.
      b9c83533
    • Kousuke Saruta's avatar
      [SPARK-12692][BUILD] Scala style: check no white space before comma and colon · 794ea553
      Kousuke Saruta authored
We should not put a white space before `,` or `:`, so let's check for it.
Because there are lots of existing style violations, I'd like to first add the checker and enable it at the `warning` level.
Then I'd like to fix the style step by step.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10643 from sarutak/SPARK-12692.
      794ea553
  5. Jan 07, 2016
    • Reynold Xin's avatar
      Fix indentation for the previous patch. · 726bd3c4
      Reynold Xin authored
      726bd3c4
    • Kevin Yu's avatar
      [SPARK-12317][SQL] Support units (m,k,g) in SQLConf · 5028a001
      Kevin Yu authored
This PR continues the previously closed PR #10314.
      
In this PR, SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE accepts memory strings following the usual size-unit conventions as input.
      
For example, the user can now specify 10g for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE in SQLConf.
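A hedged usage sketch; the string key for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE is assumed to be `spark.sql.adaptive.shuffle.targetPostShuffleInputSize`, and `sqlContext` is an existing SQLContext:

```
// Setting the target post-shuffle input size with a memory string instead of raw bytes.
sqlContext.setConf("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "10g")
// Equivalently via SQL:
sqlContext.sql("SET spark.sql.adaptive.shuffle.targetPostShuffleInputSize=10g")
```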
      
marmbrus srowen: Can you help review these code changes? Thanks.
      
      Author: Kevin Yu <qyu@us.ibm.com>
      
      Closes #10629 from kevinyu98/spark-12317.
      5028a001