  1. Nov 19, 2016
    • [SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that`/`'''Note:'''` across Scala/Java API documentation · 4b396a65
      hyukjinkwon authored
      
      It seems that, in Scala/Java, the following variants are used:
      
      - `Note:`
      - `NOTE:`
      - `Note that`
      - `'''Note:'''`
      - `note`
      
      This PR proposes to fix all of these to `note` (Scaladoc's `@note` tag) to be consistent.
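
      For illustration, a typical change looks like the following (a hypothetical snippet, not one of the PR's actual diffs):

      ```scala
      object Example {
        // Before: "Note:" renders as plain text in the generated API docs.
        /**
         * Returns the mean of the column.
         *
         * Note: this triggers a full scan of the data.
         */
        def meanBefore(): Double = ???

        // After: the Scaladoc `@note` tag renders it as a highlighted note.
        /**
         * Returns the mean of the column.
         *
         * @note this triggers a full scan of the data.
         */
        def meanAfter(): Double = ???
      }
      ```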
      
      **Before**
      
      - Scala
        ![2016-11-17 6 16 39](https://cloud.githubusercontent.com/assets/6477701/20383180/1a7aed8c-acf2-11e6-9611-5eaf6d52c2e0.png)
      
      - Java
        ![2016-11-17 6 14 41](https://cloud.githubusercontent.com/assets/6477701/20383096/c8ffc680-acf1-11e6-914a-33460bf1401d.png)
      
      **After**
      
      - Scala
        ![2016-11-17 6 16 44](https://cloud.githubusercontent.com/assets/6477701/20383167/09940490-acf2-11e6-937a-0d5e1dc2cadf.png)
      
      - Java
        ![2016-11-17 6 13 39](https://cloud.githubusercontent.com/assets/6477701/20383132/e7c2a57e-acf1-11e6-9c47-b849674d4d88.png)
      
      The notes were found via
      
      ```bash
      grep -r "NOTE: " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// NOTE: " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation
      -e 'org.apache.spark.api.java.function' \ # note that this is a regular expression. So actual matches were mostly `org/apache/spark/api/java/functions ...`
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note that " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note that " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note: " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note: " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "'''Note:'''" . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// '''Note:''' " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      The matches were then fixed one by one, comparing against the API documentation and access modifiers.
      
      After that, manually tested via `jekyll build`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #15889 from HyukjinKwon/SPARK-18437.
      
      (cherry picked from commit d5b1d5fc)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  2. Nov 09, 2016
    • [SPARK-17829][SQL] Stable format for offset log · b7d29256
      Tyson Condie authored
      ## What changes were proposed in this pull request?
      
      Currently we use Java serialization for the WAL that stores the offsets contained in each batch. This has two main issues:

      - It can break across Spark releases (though this is not the only thing preventing us from upgrading a running query).
      - It is unnecessarily opaque to the user.

      I'd propose we require offsets to provide a user-readable serialization and use that instead. JSON is probably a good option.
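
      A minimal sketch of the idea (a hypothetical class, not this PR's actual code): the offset renders itself as human-readable JSON instead of relying on Java serialization:

      ```scala
      // Hypothetical offset type mapping topic-partition -> offset; the JSON is
      // built by hand here only to keep the sketch dependency-free.
      case class ExampleKafkaOffset(partitionToOffset: Map[String, Long]) {
        def json: String =
          partitionToOffset
            .map { case (tp, off) => s""""$tp": $off""" }
            .mkString("{", ", ", "}")
      }

      // ExampleKafkaOffset(Map("topic1-0" -> 23L)).json == """{"topic1-0": 23}"""
      ```
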
      ## How was this patch tested?
      
      Tests were added for KafkaSourceOffset in [KafkaSourceOffsetSuite](external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceOffsetSuite.scala) and for LongOffset in [OffsetSuite](sql/core/src/test/scala/org/apache/spark/sql/streaming/OffsetSuite.scala)
      
      zsxwing marmbrus
      
      Author: Tyson Condie <tcondie@gmail.com>
      Author: Tyson Condie <tcondie@clash.local>
      
      Closes #15626 from tcondie/spark-8360.
      
      (cherry picked from commit 3f62e1b5)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
  3. Oct 27, 2016
    • [SPARK-17813][SQL][KAFKA] Maximum data per trigger · 10423258
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
      Adds a `maxOffsetsPerTrigger` option for rate limiting, apportioned proportionally to the volume of the different topic-partitions.
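
      A usage sketch (assuming a `spark` session, as in the Kafka source usage examples elsewhere in this log; the option name is the one this PR adds):

      ```scala
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1")
        // cap the total number of offsets consumed per trigger; the cap is
        // split across topic-partitions in proportion to their available data
        .option("maxOffsetsPerTrigger", "10000")
        .load()
      ```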
      
      ## How was this patch tested?
      
      Added unit test
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #15527 from koeninger/SPARK-17813.
  4. Oct 21, 2016
    • [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream · 268ccb9a
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
      - `startingOffsets` now accepts specific per-topic-partition offsets as a JSON argument, usable with any consumer strategy.

      - `assign`, with specific topic-partitions, is added as a consumer strategy.
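
      A usage sketch combining both (the JSON shapes follow this PR's description; treating `-2`/`-1` as earliest/latest is an assumption here):

      ```scala
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        // consumer strategy: explicitly assigned topic-partitions
        .option("assign", """{"topic1": [0, 1]}""")
        // per topic-partition starting offsets, as JSON
        .option("startingOffsets", """{"topic1": {"0": 23, "1": -2}}""")
        .load()
      ```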
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #15504 from koeninger/SPARK-17812.
  5. Oct 20, 2016
    • [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD · 947f4f25
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      The newly implemented Structured Streaming `KafkaSource` did calculate the preferred locations for each topic partition, but didn't expose this information through RDD's `getPreferredLocations` method. So this PR proposes to add this method to `KafkaSourceRDD`.
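
      A minimal, self-contained sketch of the mechanism (hypothetical types; Spark's `KafkaSourceRDD` differs in its details):

      ```scala
      import org.apache.spark.{Partition, SparkContext, TaskContext}
      import org.apache.spark.rdd.RDD

      // Each partition carries an optional preferred executor host.
      case class PartitionWithLocation(index: Int, preferredLoc: Option[String])
        extends Partition

      class LocationAwareRDD(sc: SparkContext, parts: Seq[PartitionWithLocation])
        extends RDD[Int](sc, Nil) {

        override def getPartitions: Array[Partition] = parts.toArray[Partition]

        override def compute(split: Partition, context: TaskContext): Iterator[Int] =
          Iterator.empty

        // The point of the PR: surface the already-computed location so the
        // scheduler can place tasks close to the data.
        override def getPreferredLocations(split: Partition): Seq[String] =
          split.asInstanceOf[PartitionWithLocation].preferredLoc.toSeq
      }
      ```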
      
      ## How was this patch tested?
      
      Manual verification.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #15545 from jerryshao/SPARK-17999.
  6. Oct 18, 2016
    • [SPARK-17841][STREAMING][KAFKA] drain commitQueue · cd106b05
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
      Actually drain the commit queue rather than just iterating over it: `iterator()` on a `ConcurrentLinkedQueue` won't remove items from the queue; `poll()` will.
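
      The pattern, sketched (simplified; not the actual Spark code):

      ```scala
      import java.util.concurrent.ConcurrentLinkedQueue

      val commitQueue = new ConcurrentLinkedQueue[(String, Long)]()
      commitQueue.add(("topic1-0", 42L))

      // iterator() would leave elements behind, so stale commits could be
      // reprocessed; poll() removes each element as it is consumed.
      var elem = commitQueue.poll()
      while (elem != null) {
        // ... merge `elem` into the offsets to be committed ...
        elem = commitQueue.poll()
      }
      ```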
      
      ## How was this patch tested?
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #15407 from koeninger/SPARK-17841.
  7. Oct 13, 2016
    • [SPARK-17731][SQL][STREAMING] Metrics for structured streaming · 7106866c
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      Metrics are needed for monitoring structured streaming apps. Here is the design doc for implementing the necessary metrics.
      https://docs.google.com/document/d/1NIdcGuR1B3WIe8t7VxLrt58TJB4DtipWEbj5I_mzJys/edit?usp=sharing
      
      Specifically, this PR adds the following public APIs changes.
      
      ### New APIs
      - `StreamingQuery.status` returns a `StreamingQueryStatus` object (renamed from `StreamingQueryInfo`, see later)
      
      - `StreamingQueryStatus` has the following important fields
        - inputRate - Current rate (rows/sec) at which data is being generated by all the sources
        - processingRate - Current rate (rows/sec) at which the query is processing data from all the sources
        - ~~outputRate~~ - *Does not work with wholestage codegen*
        - latency - Current average latency between the data being available in source and the sink writing the corresponding output
        - sourceStatuses: Array[SourceStatus] - Current statuses of the sources
        - sinkStatus: SinkStatus - Current status of the sink
        - triggerStatus - Low-level detailed status of the last completed/currently active trigger
          - latencies - getOffset, getBatch, full trigger, wal writes
          - timestamps - trigger start, finish, after getOffset, after getBatch
          - numRows - input, output, state total/updated rows for aggregations
      
      - `SourceStatus` has the following important fields
        - inputRate - Current rate (rows/sec) at which data is being generated by the source
        - processingRate - Current rate (rows/sec) at which the query is processing data from the source
        - triggerStatus - Low-level detailed status of the last completed/currently active trigger
      
      - Python API for `StreamingQuery.status()`
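
      A usage sketch of the new surface (field names as listed above; the query setup is illustrative and `streamingDF` is assumed to be a streaming DataFrame):

      ```scala
      val query = streamingDF.writeStream
        .format("console")
        .start()

      val status = query.status
      println(s"input rate:      ${status.inputRate} rows/sec")
      println(s"processing rate: ${status.processingRate} rows/sec")
      status.sourceStatuses.foreach(s => println(s"source input rate: ${s.inputRate}"))
      ```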
      
      ### Breaking changes to existing APIs
      **Existing direct public facing APIs**
      - Deprecated direct public-facing APIs `StreamingQuery.sourceStatuses` and `StreamingQuery.sinkStatus` in favour of `StreamingQuery.status.sourceStatuses/sinkStatus`.
        - Branch 2.0 should have it deprecated, master should have it removed.
      
      **Existing advanced listener APIs**
      - `StreamingQueryInfo` renamed to `StreamingQueryStatus` for consistency with `SourceStatus`, `SinkStatus`
         - Earlier StreamingQueryInfo was used only in the advanced listener API, but now it is used in direct public-facing API (StreamingQuery.status)
      
      - Field `queryInfo` in the listener events `QueryStarted`, `QueryProgress`, and `QueryTerminated` was renamed to `queryStatus`, and its return type changed to `StreamingQueryStatus`.
      
      - Field `offsetDesc` in `SourceStatus` was `Option[String]`; it was converted to `String`.
      
      - For `SourceStatus` and `SinkStatus`, the constructors were made `private` instead of `private[sql]` to make them more Java-safe. Instead, `private[sql]` `SourceStatus.apply()`/`SinkStatus.apply()` factory methods were added, which are harder to accidentally use from Java.
      
      ## How was this patch tested?
      
      Old and new unit tests.
      - Rate calculation and other internal logic of StreamMetrics tested by StreamMetricsSuite.
      - New info in statuses returned through StreamingQueryListener is tested in StreamingQueryListenerSuite.
      - New and old info returned through StreamingQuery.status is tested in StreamingQuerySuite.
      - Source-specific tests for making sure input rows are counted are in source-specific test suites.
      - Additional tests to test minor additions in LocalTableScanExec, StateStore, etc.
      
      Metrics were also manually tested using the Ganglia sink.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #15307 from tdas/SPARK-17731.
    • [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer · 08eac356
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      Because `KafkaConsumer.poll(0)` may update the partition offsets, this PR just calls `seekToBeginning` to manually set the earliest offsets for the KafkaSource initial offsets.
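
      A sketch of the approach (assuming an already-assigned consumer and a known set of partitions; the collection-based `seekToBeginning` signature is that of newer Kafka clients):

      ```scala
      import scala.collection.JavaConverters._
      import org.apache.kafka.clients.consumer.KafkaConsumer
      import org.apache.kafka.common.TopicPartition

      // Seek explicitly and read back the positions, instead of relying on
      // poll(0), which may itself move the partition offsets.
      def fetchEarliestOffsets(
          consumer: KafkaConsumer[Array[Byte], Array[Byte]],
          partitions: Seq[TopicPartition]): Map[TopicPartition, Long] = {
        consumer.seekToBeginning(partitions.asJava)
        // position() forces the lazy seek to actually resolve
        partitions.map(tp => tp -> consumer.position(tp)).toMap
      }
      ```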
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #15397 from zsxwing/SPARK-17834.
  8. Oct 05, 2016
    • [SPARK-17346][SQL][TEST-MAVEN] Generate the sql test jar to fix the maven build · b678e465
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Generate the sql test jar to fix the maven build
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #15368 from zsxwing/sql-test-jar.
    • [SPARK-17346][SQL] Add Kafka source for Structured Streaming · 9293734d
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR adds a new project, `external/kafka-0-10-sql`, for the Structured Streaming Kafka source.
      
      It's based on the design doc: https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing
      
      tdas did most of the work, and part of it was inspired by koeninger's work.
      
      ### Introduction
      
      The Kafka source is a Structured Streaming data source that polls data from Kafka. The schema of the data it reads is as follows:
      
      Column | Type
      ---- | ----
      key | binary
      value | binary
      topic | string
      partition | int
      offset | long
      timestamp | long
      timestampType | int
      
      The source can deal with topics being deleted. However, the user should make sure there is no Spark job processing the data when a topic is deleted.
      
      ### Configuration
      
      The user can use `DataStreamReader.option` to set the following configurations.
      
      Kafka Source's options | value | default | meaning
      ------ | ------- | ------ | -----
      startingOffset | ["earliest", "latest"] | "latest" | The start point when a query is started, either "earliest", which is from the earliest offset, or "latest", which is just from the latest offset. Note: this only applies when a new streaming query is started; resuming will always pick up from where the query left off.
      failOnDataLoss | [true, false] | true | Whether to fail the query when it's possible that data was lost (e.g., topics are deleted, or offsets are out of range). This may be a false alarm. You can disable it when it doesn't work as expected.
      subscribe | A comma-separated list of topics | (none) | The topic list to subscribe to. Only one of "subscribe" and "subscribePattern" can be specified for the Kafka source.
      subscribePattern | Java regex string | (none) | The pattern used to subscribe to topics. Only one of "subscribe" and "subscribePattern" can be specified for the Kafka source.
      kafka.consumer.poll.timeoutMs | long | 512 | The timeout in milliseconds for polling data from Kafka in executors.
      fetchOffset.numRetries | int | 3 | Number of times to retry before giving up fetching the latest Kafka offsets.
      fetchOffset.retryIntervalMs | long | 10 | Milliseconds to wait before retrying to fetch Kafka offsets.
      
      Kafka's own configurations can be set via `DataStreamReader.option` with the `kafka.` prefix, e.g., `stream.option("kafka.bootstrap.servers", "host:port")`.
      
      ### Usage
      
      * Subscribe to 1 topic
      ```scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1")
        .load()
      ```
      
      * Subscribe to multiple topics
      ```scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribe", "topic1,topic2")
        .load()
      ```
      
      * Subscribe to a pattern
      ```scala
      spark
        .readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "host:port")
        .option("subscribePattern", "topic.*")
        .load()
      ```
      
      ## How was this patch tested?
      
      The new unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Shixiong Zhu <zsxwing@gmail.com>
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #15102 from zsxwing/kafka-source.
  9. Sep 21, 2016
    • [SPARK-17418] Prevent kinesis-asl-assembly artifacts from being published · d7ee1221
      Josh Rosen authored
      This patch updates the `kinesis-asl-assembly` build to prevent that module from being published as part of Maven releases and snapshot builds.
      
      The `kinesis-asl-assembly` includes classes from the Kinesis Client Library (KCL) and Kinesis Producer Library (KPL), both of which are licensed under the Amazon Software License and are therefore prohibited from being distributed in Apache releases.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #15167 from JoshRosen/stop-publishing-kinesis-assembly.
  10. Sep 19, 2016
    • [SPARK-17473][SQL] fixing docker integration tests error due to different versions of jars. · cdea1d13
      sureshthalamati authored
      ## What changes were proposed in this pull request?
      Docker tests are using an older version of the Jersey jars (1.19), which was used in older releases of Spark. In the 2.0 releases, Spark was upgraded to the 2.x version of Jersey. After the upgrade to the new version, the Docker tests fail with `AbstractMethodError`. Now that Spark is upgraded to the 2.x Jersey version, using shaded Docker jars may not be required any more. Removed the exclusions/overrides of Jersey-related classes from the pom file, and changed docker-client to use the regular jar instead of the shaded one.
      
      ## How was this patch tested?
      
      Tested using the existing docker-integration-tests.
      
      Author: sureshthalamati <suresh.thalamati@gmail.com>
      
      Closes #15114 from sureshthalamati/docker_testfix-spark-17473.
  11. Sep 16, 2016
    • [SPARK-17534][TESTS] Increase timeouts for DirectKafkaStreamSuite tests · fc1efb72
      Adam Roberts authored
      ## What changes were proposed in this pull request?
      There are two tests in this suite that are particularly flaky on the following hardware:
      
      2x Intel(R) Xeon(R) CPU E5-2697 v2  2.70GHz and 16 GB of RAM, 1 TB HDD
      
      This simple PR increases the timeouts and the batch duration so the tests pass reliably.
      
      ## How was this patch tested?
      Existing unit tests, run on the box described above, where I was often seeing the failures.
      
      Author: Adam Roberts <aroberts@uk.ibm.com>
      
      Closes #15094 from a-roberts/patch-6.
  12. Sep 07, 2016
    • [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of ArrayBuffer.append(A) in performance critical paths · 3ce3a282
      Liwei Lin authored
      
      ## What changes were proposed in this pull request?
      
      We should generally use `ArrayBuffer.+=(A)` rather than `ArrayBuffer.append(A)`, because `append(A)` would involve extra boxing / unboxing.
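
      A small illustration of the two calls (the boxing rationale above is the PR's; the varargs detail is from the Scala 2.11 library, where `append` is declared as `append(elems: A*)`):

      ```scala
      import scala.collection.mutable.ArrayBuffer

      val buf = new ArrayBuffer[Int]()
      buf += 1        // single-element method: preferred on hot paths
      buf.append(2)   // varargs call: each invocation wraps its arguments
      ```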
      
      ## How was this patch tested?
      
      N/A
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #14914 from lw-lin/append_to_plus_eq_v2.
  13. Aug 25, 2016
    • [SPARK-17229][SQL] PostgresDialect shouldn't widen float and short types during reads · a133057c
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      When reading float4 and smallint columns from PostgreSQL, Spark's `PostgresDialect` widens these types to Decimal and Integer rather than using the narrower Float and Short types. According to https://www.postgresql.org/docs/7.1/static/datatype.html#DATATYPE-TABLE, Postgres maps the `smallint` type to a signed two-byte integer and the `real` / `float4` types to single precision floating point numbers.
      
      This patch fixes this by adding more special cases to `getCatalystType`, similar to what was done for the Derby JDBC dialect. I also fixed a similar problem in the write path, which caused Spark to create integer columns in Postgres for what should have been ShortType columns.
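
      A sketch of the kind of special-casing added (the method shape is abbreviated here, not Spark's exact `JdbcDialect.getCatalystType` signature):

      ```scala
      import java.sql.Types
      import org.apache.spark.sql.types._

      def postgresCatalystType(sqlType: Int, typeName: String): Option[DataType] =
        (sqlType, typeName) match {
          case (Types.REAL, _) | (_, "float4")   => Some(FloatType) // keep single precision
          case (Types.SMALLINT, _) | (_, "int2") => Some(ShortType) // keep two bytes
          case _                                 => None            // defer to the default mapping
        }
      ```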
      
      ## How was this patch tested?
      
      New test cases in `PostgresIntegrationSuite` (which I ran manually because Jenkins can't run it right now).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14796 from JoshRosen/postgres-jdbc-type-fixes.
  14. Aug 13, 2016
    • [SPARK-17023][BUILD] Upgrade to Kafka 0.10.0.1 release · 67f025d9
      Luciano Resende authored
      ## What changes were proposed in this pull request?
      Update Kafka streaming connector to use Kafka 0.10.0.1 release
      
      ## How was this patch tested?
      Tested via Spark unit and integration tests
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #14606 from lresende/kafka-upgrade.
  15. Aug 09, 2016
    • [SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3 · 29081b58
      Mariusz Strzelecki authored
      ## What changes were proposed in this pull request?
      
      Enables using `KafkaUtils.createDirectStream` with starting offsets in Python 3, by using `java.lang.Number` instead of `Long` during parameter mapping in the Scala helper. This allows Py4J to pass either `Integer` or `Long` to the map and resolves the `ClassCastException` problems.
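
      A sketch of the mapping idea (a hypothetical helper, not the PR's exact code):

      ```scala
      import java.{lang => jl}

      // Py4J may hand over Integer or Long depending on the Python int's size;
      // accepting java.lang.Number and normalizing avoids the ClassCastException.
      def normalizeOffsets(fromOffsets: Map[String, jl.Number]): Map[String, Long] =
        fromOffsets.map { case (tp, n) => tp -> n.longValue() }
      ```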
      
      ## How was this patch tested?
      
      unit tests
      
      jerryshao  - could you please look at this PR?
      
      Author: Mariusz Strzelecki <mariusz.strzelecki@allegrogroup.com>
      
      Closes #14540 from szczeles/kafka_pyspark.
  16. Aug 08, 2016
    • [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting · 9216901d
      Holden Karau authored
      
      ## What changes were proposed in this pull request?
      
      Avoid using postfix operators for command execution in SQLQuerySuite, where they weren't whitelisted, and audit the existing whitelistings, removing postfix operators from most places. Some notable places where postfix operators remain are XML parsing and time units (seconds, millis, etc.), where they arguably improve readability.
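
      For illustration, using command execution as mentioned above (the command is illustrative):

      ```scala
      import scala.sys.process._

      // Postfix style: needs `import scala.language.postfixOps` (hence the
      // whitelisting), e.g.  val output = "ls /tmp" !!

      // Dotted style: same behavior, no language import required.
      val output = "ls /tmp".!!
      ```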
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
  17. Aug 01, 2016
    • [SPARK-16776][STREAMING] Replace deprecated API in KafkaTestUtils for 0.10.0. · f93ad4fe
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR replaces the old Kafka APIs with the 0.10.0 ones in `KafkaTestUtils`.
      
      The changes include:

       - `Producer` to `KafkaProducer`
       - Change configurations to equivalent ones (I referred [here](http://kafka.apache.org/documentation.html#producerconfigs) for 0.10.0 and [here](http://kafka.apache.org/082/documentation.html#producerconfigs) for the old 0.8.2). A sketch of the replacement follows this list.
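
      A sketch of the replacement (configuration values are illustrative; the serializer settings are assumptions matching the String key/value types):

      ```scala
      import java.util.Properties
      import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

      val props = new Properties()
      props.put("bootstrap.servers", "host:port")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

      // KafkaProducer replaces the deprecated kafka.producer.Producer;
      // ProducerRecord replaces KeyedMessage.
      val producer = new KafkaProducer[String, String](props)
      producer.send(new ProducerRecord[String, String]("topic1", "message"))
      producer.close()
      ```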
      
      This PR will remove the build warning as below:
      
      ```
      [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:71: class Producer in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.KafkaProducer instead.
      [WARNING]   private var producer: Producer[String, String] = _
      [WARNING]                         ^
      [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:181: class Producer in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.KafkaProducer instead.
      [WARNING]     producer = new Producer[String, String](new ProducerConfig(producerConfiguration))
      [WARNING]                    ^
      [WARNING] .../spark/streaming/kafka010/KafkaTestUtils.scala:181: class ProducerConfig in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.ProducerConfig instead.
      [WARNING]     producer = new Producer[String, String](new ProducerConfig(producerConfiguration))
      [WARNING]                                                 ^
      [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:182: class KeyedMessage in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.ProducerRecord instead.
      [WARNING]     producer.send(messages.map { new KeyedMessage[String, String](topic, _ ) }: _*)
      [WARNING]                                      ^
      [WARNING] four warnings found
      [WARNING] warning: [options] bootstrap class path not set in conjunction with -source 1.7
      [WARNING] 1 warning
      ```
      
      ## How was this patch tested?
      
      Existing tests that use `KafkaTestUtils` should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #14416 from HyukjinKwon/SPARK-16776.
  18. Jul 26, 2016
    • [TEST][STREAMING] Fix flaky Kafka rate controlling test · 03c27435
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      The current test is incorrect, because:
      - The expected number of messages does not take into account that the topic has 2 partitions, and the rate is set per partition.
      - In some cases, the test ran out of data in Kafka while waiting for the right amount of data per batch.
      
      The PR
      - Reduces the number of partitions to 1
      - Adds more data to Kafka
      - Runs with a 0.5 second batch duration so that batches are created slowly
      
      ## How was this patch tested?
      Ran many times locally, going to run it many times in Jenkins
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #14361 from tdas/kafka-rate-test-fix.
  19. Jul 11, 2016
    • [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
  20. Jul 06, 2016
    • [SPARK-16212][STREAMING][KAFKA] apply test tweaks from 0-10 to 0-8 as well · b8ebf63c
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
      Bring the kafka-0-8 subproject up to date with some test modifications from development on 0-10.
      
      Main changes are
      - eliminating waits on concurrent queue in favor of an assert on received results,
      - atomics instead of volatile (although this probably doesn't matter)
      - increasing uniqueness of topic names
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #14073 from koeninger/kafka-0-8-test-direct-cleanup.
  21. Jul 05, 2016
    • [SPARK-16212][STREAMING][KAFKA] use random port for embedded kafka · 1fca9da9
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
      Testing for 0.10 uncovered an issue with a fixed port number being used in `KafkaTestUtils`. This makes a roughly equivalent fix for the 0.8 connector.
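
      A common way to obtain a free random port in tests (illustrative; not necessarily the exact approach taken in `KafkaTestUtils`):

      ```scala
      import java.net.ServerSocket

      def findFreePort(): Int = {
        val socket = new ServerSocket(0) // port 0: let the OS pick a free port
        try socket.getLocalPort
        finally socket.close()
      }
      ```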
      
      ## How was this patch tested?
      
      Unit tests, manual tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #14018 from koeninger/kafka-0-8-test-port.