  1. Sep 21, 2016
      [SPARK-17418] Prevent kinesis-asl-assembly artifacts from being published · d7ee1221
      Josh Rosen authored
      This patch updates the `kinesis-asl-assembly` build to prevent that module from being published as part of Maven releases and snapshot builds.
      
      The `kinesis-asl-assembly` includes classes from the Kinesis Client Library (KCL) and Kinesis Producer Library (KPL), both of which are licensed under the Amazon Software License and are therefore prohibited from being distributed in Apache releases.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #15167 from JoshRosen/stop-publishing-kinesis-assembly.
      d7ee1221
  2. Sep 19, 2016
      [SPARK-17473][SQL] fixing docker integration tests error due to different versions of jars. · cdea1d13
      sureshthalamati authored
      ## What changes were proposed in this pull request?
      The Docker tests are using an older version of the Jersey jars (1.19), which was used in older releases of Spark. In the 2.0 releases Spark was upgraded to the 2.x version of Jersey, and after that upgrade the Docker tests fail with an AbstractMethodError. Now that Spark uses Jersey 2.x, the shaded docker-client jar may no longer be required. This removes the exclusions/overrides of Jersey-related classes from the pom file and changes docker-client to use the regular jar instead of the shaded one.
      
      ## How was this patch tested?
      
      Tested using the existing docker-integration-tests.
      
      Author: sureshthalamati <suresh.thalamati@gmail.com>
      
      Closes #15114 from sureshthalamati/docker_testfix-spark-17473.
      cdea1d13
  3. Sep 16, 2016
      [SPARK-17534][TESTS] Increase timeouts for DirectKafkaStreamSuite tests · fc1efb72
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      There are two tests in this suite that are particularly flaky on the following hardware:
      
      2x Intel(R) Xeon(R) CPU E5-2697 v2 2.70GHz and 16 GB of RAM, 1 TB HDD
      
      This simple PR increases the timeout times and batch duration so they can reliably pass.
      
      ## How was this patch tested?
      Existing unit tests on the two-core box where I was often seeing the failures
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #15094 from a-roberts/patch-6.
      fc1efb72
  4. Sep 07, 2016
      [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of... · 3ce3a282
      Liwei Lin authored
      [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of ArrayBuffer.append(A) in performance critical paths
      
      ## What changes were proposed in this pull request?
      
      We should generally use `ArrayBuffer.+=(A)` rather than `ArrayBuffer.append(A)`, because `append(A)` would involve extra boxing / unboxing.
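      
      For illustration, a minimal sketch of the preferred form (hypothetical buffer and values, not code from the patch):
      
      ```scala
      import scala.collection.mutable.ArrayBuffer
      
      val buffer = new ArrayBuffer[Int]()
      // Preferred in performance-critical paths:
      buffer += 1
      // Avoided by this patch: append takes varargs, which the description above
      // notes involves extra boxing / unboxing per call.
      buffer.append(2)
      ```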
      
      ## How was this patch tested?
      
      N/A
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #14914 from lw-lin/append_to_plus_eq_v2.
      3ce3a282
  5. Aug 25, 2016
      [SPARK-17229][SQL] PostgresDialect shouldn't widen float and short types during reads · a133057c
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      When reading float4 and smallint columns from PostgreSQL, Spark's `PostgresDialect` widens these types to Decimal and Integer rather than using the narrower Float and Short types. According to https://www.postgresql.org/docs/7.1/static/datatype.html#DATATYPE-TABLE, Postgres maps the `smallint` type to a signed two-byte integer and the `real` / `float4` types to single precision floating point numbers.
      
      This patch fixes this by adding more special cases to `getCatalystType`, similar to what was done for the Derby JDBC dialect. I also fixed a similar problem in the write path, which caused Spark to create integer columns in Postgres for what should have been ShortType columns.
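      
      As a rough illustration (not the exact patch), the read-path fix amounts to extra special cases in the dialect's `getCatalystType`; the sketch below assumes only Spark's public `JdbcDialect` API and simplifies the type checks:
      
      ```scala
      import java.sql.Types
      import org.apache.spark.sql.jdbc.JdbcDialect
      import org.apache.spark.sql.types._
      
      // Hypothetical dialect sketch: keep Postgres float4 / smallint narrow on read.
      object NarrowPostgresDialect extends JdbcDialect {
        override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")
      
        override def getCatalystType(
            sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
          if (sqlType == Types.REAL) Some(FloatType)          // real / float4 -> FloatType, not Decimal
          else if (sqlType == Types.SMALLINT) Some(ShortType) // smallint -> ShortType, not Integer
          else None                                           // defer to the default JDBC mapping
        }
      }
      ```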
      
      ## How was this patch tested?
      
      New test cases in `PostgresIntegrationSuite` (which I ran manually because Jenkins can't run it right now).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14796 from JoshRosen/postgres-jdbc-type-fixes.
      a133057c
  6. Aug 13, 2016
      [SPARK-17023][BUILD] Upgrade to Kafka 0.10.0.1 release · 67f025d9
      Luciano Resende authored
      ## What changes were proposed in this pull request?
      Update Kafka streaming connector to use Kafka 0.10.0.1 release
      
      ## How was this patch tested?
      Tested via Spark unit and integration tests
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #14606 from lresende/kafka-upgrade.
      67f025d9
  7. Aug 09, 2016
      [SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3 · 29081b58
      Mariusz Strzelecki authored
      ## What changes were proposed in this pull request?
      
      This adds the ability to use KafkaUtils.createDirectStream with starting offsets in Python 3 by using java.lang.Number instead of Long during parameter mapping in the Scala helper. This allows py4j to pass an Integer or Long to the map and resolves the ClassCastException problems.
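      
      A minimal sketch of the Scala-helper-side idea (hypothetical method name and a simplified key type, not the actual helper):
      
      ```scala
      import java.{lang => jl, util => ju}
      import scala.collection.JavaConverters._
      
      // Accept java.lang.Number so py4j may hand over either Integer or Long,
      // then normalize to Long offsets on the Scala side.
      def normalizeOffsets(fromOffsets: ju.Map[String, jl.Number]): Map[String, Long] =
        fromOffsets.asScala.map { case (topicAndPartition, offset) =>
          (topicAndPartition, offset.longValue())
        }.toMap
      ```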
      
      ## How was this patch tested?
      
      unit tests
      
      jerryshao  - could you please look at this PR?
      
      Author: Mariusz Strzelecki <mariusz.strzelecki@allegrogroup.com>
      
      Closes #14540 from szczeles/kafka_pyspark.
      29081b58
  8. Aug 08, 2016
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add... · 9216901d
      Holden Karau authored
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting
      
      ## What changes were proposed in this pull request?
      
      Avoid using postfix operators for command execution in SQLQuerySuite, where they weren't whitelisted, and audit the existing whitelistings, removing postfix operators from most places. Notable places where postfix operators remain are the XML parsing and time units (seconds, millis, etc.), where they arguably improve readability.
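      
      For example, with the duration DSL (an illustrative case; actual call sites in the patch vary):
      
      ```scala
      import scala.concurrent.duration._
      
      // Postfix form (needs `import scala.language.postfixOps` to avoid a feature warning);
      // the commit keeps this style where it helps readability, e.g. time units:
      //   val timeout = 10 seconds
      // Equivalent non-postfix form used where postfix adds little:
      val timeout = 10.seconds
      ```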
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
      9216901d
  10. Aug 01, 2016
      [SPARK-16776][STREAMING] Replace deprecated API in KafkaTestUtils for 0.10.0. · f93ad4fe
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR replaces the old Kafka API with the 0.10.0 one in `KafkaTestUtils`.
      
      The changes include:
      
       - `Producer` to `KafkaProducer` (a sketch of the new usage follows the warning output below)
       - Changing configurations to the equivalent ones (I referred to [the 0.10.0 producer configs](http://kafka.apache.org/documentation.html#producerconfigs) and [the old 0.8.2 producer configs](http://kafka.apache.org/082/documentation.html#producerconfigs)).
      
      This PR will remove the build warning as below:
      
      ```scala
      [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:71: class Producer in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.KafkaProducer instead.
      [WARNING]   private var producer: Producer[String, String] = _
      [WARNING]                         ^
      [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:181: class Producer in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.KafkaProducer instead.
      [WARNING]     producer = new Producer[String, String](new ProducerConfig(producerConfiguration))
      [WARNING]                    ^
      [WARNING] .../spark/streaming/kafka010/KafkaTestUtils.scala:181: class ProducerConfig in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.ProducerConfig instead.
      [WARNING]     producer = new Producer[String, String](new ProducerConfig(producerConfiguration))
      [WARNING]                                                 ^
      [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:182: class KeyedMessage in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.ProducerRecord instead.
      [WARNING]     producer.send(messages.map { new KeyedMessage[String, String](topic, _ ) }: _*)
      [WARNING]                                      ^
      [WARNING] four warnings found
      [WARNING] warning: [options] bootstrap class path not set in conjunction with -source 1.7
      [WARNING] 1 warning
      ```
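      
      A minimal sketch of the new-API equivalent of the deprecated calls above (broker address, topic, and messages are placeholders):
      
      ```scala
      import java.util.Properties
      import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
      
      val messages = Seq("message-1", "message-2")
      
      val props = new Properties()
      props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
      props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer")
      props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer")
      
      // KafkaProducer / ProducerRecord replace the deprecated Producer / KeyedMessage.
      val producer = new KafkaProducer[String, String](props)
      messages.foreach(m => producer.send(new ProducerRecord[String, String]("topic", m)))
      producer.close()
      ```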
      
      ## How was this patch tested?
      
      Existing tests that use `KafkaTestUtils` should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #14416 from HyukjinKwon/SPARK-16776.
      f93ad4fe
  11. Jul 26, 2016
      [TEST][STREAMING] Fix flaky Kafka rate controlling test · 03c27435
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      The current test is incorrect, because
      - The expected number of messages does not take into account that the topic has 2 partitions, and rate is set per partition.
      - Also in some cases, the test ran out of data in Kafka while waiting for the right amount of data per batch.
      
      The PR
      - Reduces the number of partitions to 1
      - Adds more data to Kafka
      - Runs with a 0.5 second batch duration so that batches are created slowly (see the arithmetic sketch after this list)
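      
      A back-of-the-envelope sketch of why the partition count matters (the rate value is hypothetical):
      
      ```scala
      // Expected records per batch when the rate limit is applied per partition.
      val maxRatePerPartition = 100     // records / sec / partition (hypothetical)
      val numPartitions = 1             // reduced from 2 by this patch
      val batchDurationSec = 0.5
      val expectedPerBatch = maxRatePerPartition * numPartitions * batchDurationSec  // 50.0
      ```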
      
      ## How was this patch tested?
      Ran many times locally, going to run it many times in Jenkins
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #14361 from tdas/kafka-rate-test-fix.
      03c27435
  13. Jul 11, 2016
      [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  15. Jul 06, 2016
      [SPARK-16212][STREAMING][KAFKA] apply test tweaks from 0-10 to 0-8 as well · b8ebf63c
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
      Bring the kafka-0-8 subproject up to date with some test modifications from development on 0-10.
      
      Main changes are
      - eliminating waits on concurrent queue in favor of an assert on received results,
      - atomics instead of volatile (although this probably doesn't matter)
      - increasing uniqueness of topic names
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #14073 from koeninger/kafka-0-8-test-direct-cleanup.
      b8ebf63c
  16. Jul 05, 2016
      [SPARK-16212][STREAMING][KAFKA] use random port for embedded kafka · 1fca9da9
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
      Testing for 0.10 uncovered an issue with a fixed port number being used in KafkaTestUtils. This makes a roughly equivalent fix for the 0.8 connector.
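      
      A minimal sketch of the random-port approach (not necessarily the exact helper used in KafkaTestUtils):
      
      ```scala
      import java.net.ServerSocket
      
      // Bind to port 0 so the OS assigns a free ephemeral port, then release it
      // so the embedded Kafka broker can bind to it right afterwards.
      def findFreePort(): Int = {
        val socket = new ServerSocket(0)
        try socket.getLocalPort finally socket.close()
      }
      ```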
      
      ## How was this patch tested?
      
      Unit tests, manual tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #14018 from koeninger/kafka-0-8-test-port.
      1fca9da9
  18. Jun 30, 2016
      [SPARK-16212][STREAMING][KAFKA] code cleanup from review feedback · c6226334
      cody koeninger authored
      ## What changes were proposed in this pull request?
      code cleanup in kafka-0-8 to match suggested changes for kafka-0-10 branch
      
      ## How was this patch tested?
      unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #13908 from koeninger/kafka-0-8-cleanup.
      c6226334
      [SPARK-12177][TEST] Removed test to avoid compilation issue in scala 2.10 · de8ab313
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      The commented lines failed the Scala 2.10 build. This is because of a change in the behavior of case classes between 2.10 and 2.11: in Scala 2.10, if the companion object of a case class explicitly defines apply(), then the synthetic apply method is not generated, while in Scala 2.11 it is. Hence, the lines compile fine in 2.11 but not in 2.10.
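      
      To illustrate the behavior described above (the class is hypothetical, not the one in the test):
      
      ```scala
      case class Record(value: String)
      
      object Record {
        // An explicitly defined apply in the companion. Per the description above,
        // Scala 2.10 then does not generate the synthetic apply(value: String),
        // while Scala 2.11 still does.
        def apply(value: Int): Record = new Record(value.toString)
      }
      
      // Compiles on 2.11, fails on 2.10 because apply(String) is not generated there:
      // val r = Record("a")
      ```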
      
      This simply comments out the tests to fix the broken build. A correct solution is pending.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #13992 from tdas/SPARK-12177.
      de8ab313
      [SPARK-12177][STREAMING][KAFKA] Update KafkaDStreams to new Kafka 0.10 Consumer API · dedbceec
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
      New Kafka consumer api for the released 0.10 version of Kafka
      
      ## How was this patch tested?
      
      Unit tests, manual tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #11863 from koeninger/kafka-0.9.
      dedbceec
  19. Jun 12, 2016
      [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API · f51dfe61
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Deprecate old Java accumulator API; should use Scala now (a sketch of the newer API follows this list)
      - Update Java tests and examples
      - Don't bother testing old accumulator API in Java 8 (too)
      - (fix a misspelling too)
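      
      A minimal sketch of the newer accumulator usage this points to, assuming Spark 2.0's `longAccumulator` (illustrative only, not code from the patch):
      
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      
      val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("acc-sketch"))
      
      // New-style accumulator instead of the deprecated Accumulator / Java API.
      val evenCount = sc.longAccumulator("evenCount")
      sc.parallelize(Seq(1, 2, 3, 4)).foreach { n =>
        if (n % 2 == 0) evenCount.add(1)
      }
      println(evenCount.value)  // 2
      sc.stop()
      ```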
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13606 from srowen/SPARK-15086.
      f51dfe61
  20. Jun 06, 2016
      [MINOR] Fix Typos 'an -> a' · fd8af397
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `an -> a`
      
      Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one.
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13515 from zhengruifeng/an_a.
      fd8af397
  21. May 31, 2016
      [SPARK-15451][BUILD] Use jdk7's rt.jar when available. · 57adb77e
      Marcelo Vanzin authored
      This helps with preventing jdk8-specific calls being checked in,
      because PR builders are running the compiler with the wrong settings.
      
      If the JAVA_7_HOME env variable is set, assume it points at
      a jdk7 and use its rt.jar when invoking javac. For zinc, just run
      it with jdk7, and disable it when building jdk8-specific code.
      
      A big note for sbt usage: adding the bootstrap options forces sbt
      to fork the compiler, and that disables incremental compilation.
      That means that it's really not convenient to use for normal
      development, but should be ok for automated builds.
      
      Tested with JAVA_HOME=jdk8 and JAVA_7_HOME=jdk7:
      - mvn + zinc
      - mvn sans zinc
      - sbt
      
      Verified that in all cases, jdk8-specific library calls fail to
      compile.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #13272 from vanzin/SPARK-15451.
      57adb77e
  22. May 27, 2016
      [SPARK-15633][MINOR] Make package name for Java tests consistent · 73178c75
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This is a simple patch that makes the package names for the Java 8 test suites consistent. I moved everything to test.org.apache.spark so we can test package-private APIs properly. Also added "java8" as the package name so we can easily run all the tests related to Java 8.
      
      ## How was this patch tested?
      This is a test only change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13364 from rxin/SPARK-15633.
      73178c75
  25. May 15, 2016
      [SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient · f5576a05
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      (Retry of https://github.com/apache/spark/pull/13049)
      
      - update to httpclient 4.5 / httpcore 4.4
      - remove some defunct exclusions
      - manage httpmime version to match
      - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used)
      
      ## How was this patch tested?
      
      Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...`
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13117 from srowen/SPARK-12972.2.
      f5576a05
  27. May 12, 2016
      [SPARK-14421] Upgrades protobuf dependency to 2.6.1 for the new version of KCL, and… · 81e3bfc1
      Brian O'Neill authored
      ## What changes were proposed in this pull request?
      
      When running with the Kinesis Client Library (KCL) against a stream that contains aggregated data, the KCL needs access to protobuf to de-aggregate the records. Without this patch, that results in the following error message:
      
      ```
         Caused by: java.lang.ClassNotFoundException: com.google.protobuf.ProtocolStringList
      ```
      
      This PR upgrades the protobuf dependency within the kinesis-asl-assembly and relocates that package (so as not to conflict with Spark's use of 2.5.0), which fixes the above ClassNotFoundException.
      
      ## How was this patch tested?
      
      Used kinesis word count example against a stream containing aggregated data.
      
      See: SPARK-14421
      
      Author: Brian O'Neill <bone@alumni.brown.edu>
      
      Closes #13054 from boneill42/protobuf-relocation-for-kcl.
      81e3bfc1
  28. May 11, 2016
      [SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact · 89e67d66
      cody koeninger authored
      ## What changes were proposed in this pull request?
      Renaming the streaming-kafka artifact to include the Kafka version, in anticipation of needing a different artifact for later Kafka versions
      
      ## How was this patch tested?
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #12946 from koeninger/SPARK-15085.
      89e67d66
  29. May 10, 2016
      [SPARK-14936][BUILD][TESTS] FlumePollingStreamSuite is slow · 86475520
      Xin Ren authored
      https://issues.apache.org/jira/browse/SPARK-14936
      
      ## What changes were proposed in this pull request?
      
      FlumePollingStreamSuite contains two tests which run for a minute each. This seems excessively slow and we should speed it up if possible.
      
      In this PR, instead of creating each `StreamingContext` directly from a `conf`, an underlying `SparkContext` is created up front and then used to create each `StreamingContext`.
      
      Running time is reduced by avoiding repeatedly creating and destroying a `SparkContext`.
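      
      A rough sketch of that pattern (names are illustrative, not the suite's actual code):
      
      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.streaming.{Seconds, StreamingContext}
      
      // One SparkContext shared across tests; each test builds its StreamingContext from it.
      val conf = new SparkConf().setMaster("local[2]").setAppName("flume-polling-sketch")
      val sc = new SparkContext(conf)
      
      def withStreamingContext(body: StreamingContext => Unit): Unit = {
        val ssc = new StreamingContext(sc, Seconds(1))
        try body(ssc) finally ssc.stop(stopSparkContext = false)  // keep the shared context alive
      }
      ```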
      
      ## How was this patch tested?
      
      Tested on my local machine running `testOnly *.FlumePollingStreamSuite`
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #12845 from keypointt/SPARK-14936.
      86475520
      [SPARK-6005][TESTS] Fix flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery · 9533f539
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Because this test extracts data from `DStream.generatedRDDs` before stopping, it may get data before checkpointing. Then after recovering from the checkpoint, `recoveredOffsetRanges` may contain something not in `offsetRangesBeforeStop`, which will fail the test. Adding `Thread.sleep(1000)` before `ssc.stop()` will reproduce this failure.
      
      This PR just moves the logic of `offsetRangesBeforeStop` (also renamed to `offsetRangesAfterStop`) after `ssc.stop()` to fix the flaky test.
      
      ## How was this patch tested?
      
      Jenkins unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #12903 from zsxwing/SPARK-6005.
      9533f539
      [SPARK-14642][SQL] import org.apache.spark.sql.expressions._ breaks udf under functions · 89f73f67
      Subhobrata Dey authored
      ## What changes were proposed in this pull request?
      
      This PR fixes the import issue that breaks the `udf` function.
      
      The following code snippet throws an error
      
      ```
      scala> import org.apache.spark.sql.functions._
      import org.apache.spark.sql.functions._
      
      scala> import org.apache.spark.sql.expressions._
      import org.apache.spark.sql.expressions._
      
      scala> udf((v: String) => v.stripSuffix("-abc"))
      <console>:30: error: No TypeTag available for String
             udf((v: String) => v.stripSuffix("-abc"))
      ```
      
      This PR resolves the issue.
      
      ## How was this patch tested?
      
      The patch is tested with unit tests.
      
      Author: Subhobrata Dey <sbcd90@gmail.com>
      
      Closes #12458 from sbcd90/udfFuncBreak.
      89f73f67
  30. May 06, 2016
      [SPARK-14738][BUILD] Separate docker integration tests from main build · a03c5e68
      Luciano Resende authored
      ## What changes were proposed in this pull request?
      
      - Create a Maven profile for executing the Docker integration tests using Maven
      - Remove the Docker integration tests from the main sbt build
      - Update the documentation on how to run the Docker integration tests from sbt
      
      ## How was this patch tested?
      
      Manual test of the docker integration tests, as in:
      mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 compile test
      
      ## Other comments
      
      Note that the DB2 Docker tests are still disabled, as there is a kernel version issue on the AMPLab Jenkins slaves and we would need to get them to the right level before enabling those tests. They do run OK locally with the updates from PR #12348
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #12508 from lresende/docker.
      a03c5e68
  31. May 05, 2016
      [SPARK-14589][SQL] Enhance DB2 JDBC Dialect docker tests · 10443022
      Luciano Resende authored
      ## What changes were proposed in this pull request?
      
      Enhance the DB2 JDBC dialect Docker tests, as they seemed to have had some issues in a previous merge that caused some tests to fail.
      
      ## How was this patch tested?
      
      By running the integration tests locally.
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #12348 from lresende/SPARK-14589.
      10443022
      [SPARK-12154] Upgrade to Jersey 2 · b7fdc23c
      mcheah authored
      ## What changes were proposed in this pull request?
      
      Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required for it to compile; they were relatively standard Jersey migration changes.
      
      ## How was this patch tested?
      
      I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications.
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #12715 from mccheah/feature/upgrade-jersey.
      b7fdc23c