  1. Jun 20, 2015
    • Yu ISHIKAWA's avatar
      [SPARK-8495] [SPARKR] Add a `.lintr` file to validate the SparkR files and the `lint-r` script · 004f5737
      Yu ISHIKAWA authored
      Thanks to Shivaram Venkataraman for the support. This is a prototype script to validate the R files.
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6922 from yu-iskw/SPARK-6813 and squashes the following commits:
      
      c1ffe6b [Yu ISHIKAWA] Modify to save result to a log file and add a rule to validate
      5520806 [Yu ISHIKAWA] Exclude the .lintr file not to check Apache lincence
      8f94680 [Yu ISHIKAWA] [SPARK-8495][SparkR] Add a `.lintr` file to validate the SparkR files and the `lint-r` script
      004f5737
    • Josh Rosen's avatar
      [SPARK-8422] [BUILD] [PROJECT INFRA] Add a module abstraction to dev/run-tests · 7a3c424e
      Josh Rosen authored
      This patch builds upon #5694 to add a 'module' abstraction to the `dev/run-tests` script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules.
      
      This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests.  It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed.
      
      This patch also adds doctests for the new graph traversal / change mapping code.
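The module abstraction can be sketched in plain Python; the names and shapes here (`Module`, `changed_modules`, `modules_to_test`) are illustrative, not the actual `dev/run-tests` code:

```python
# Illustrative sketch of a module graph for test selection. A changed file
# maps to the module that owns it; testing a module also tests everything
# that (transitively) depends on it.

class Module:
    def __init__(self, name, file_prefixes, dependencies=()):
        self.name = name
        self.file_prefixes = tuple(file_prefixes)   # paths owned by this module
        self.dependencies = tuple(dependencies)     # modules this one builds on

# Hypothetical two-module graph: mllib depends on sql.
sql = Module("sql", ["sql/"])
mllib = Module("mllib", ["mllib/"], dependencies=[sql])
ALL_MODULES = [sql, mllib]

def changed_modules(changed_files):
    """Map changed file paths to the modules that own them."""
    return {m for m in ALL_MODULES
            if any(f.startswith(p) for f in changed_files
                   for p in m.file_prefixes)}

def modules_to_test(changed):
    """A changed module triggers tests for itself and all transitive dependents."""
    result = set(changed)
    frontier = set(changed)
    while frontier:
        dependents = {m for m in ALL_MODULES
                      if any(d in frontier for d in m.dependencies)} - result
        result |= dependents
        frontier = dependents
    return result
```

A change under `sql/` therefore triggers both `sql` and `mllib` tests, while a change under `mllib/` triggers only `mllib` — which is how increasing module granularity lets more tests be skipped.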
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6866 from JoshRosen/more-dev-run-tests-refactoring and squashes the following commits:
      
      75de450 [Josh Rosen] Use module system to determine which build profiles to enable.
      4224da5 [Josh Rosen] Add documentation to Module.
      a86a953 [Josh Rosen] Clean up modules; add new modules for streaming external projects
      e46539f [Josh Rosen] Fix camel-cased endswith()
      35a3052 [Josh Rosen] Enable Hive tests when running all tests
      df10e23 [Josh Rosen] update to reflect fact that no module depends on root
      3670d50 [Josh Rosen] mllib should depend on streaming
      dc6f1c6 [Josh Rosen] Use changed files' extensions to decide whether to run style checks
      7092d3e [Josh Rosen] Skip SBT tests if no test goals are specified
      43a0ced [Josh Rosen] Minor fixes
      3371441 [Josh Rosen] Test everything if nothing has changed (needed for non-PRB builds)
      37f3fb3 [Josh Rosen] Remove doc profiles option, since it's not actually needed (see #6865)
      f53864b [Josh Rosen] Finish integrating module changes
      f0249bd [Josh Rosen] WIP
      7a3c424e
    • Liang-Chi Hsieh's avatar
      [SPARK-8468] [ML] Take the negative of some metrics in RegressionEvaluator to... · 0b899516
      Liang-Chi Hsieh authored
      [SPARK-8468] [ML] Take the negative of some metrics in RegressionEvaluator to get correct cross validation
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-8468
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6905 from viirya/cv_min and squashes the following commits:
      
      930d3db [Liang-Chi Hsieh] Fix python unit test and add document.
      d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min
      16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal.
      c3dd8d9 [Liang-Chi Hsieh] For comments.
      b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaulation value.
      0b899516
  2. Jun 19, 2015
    • cody koeninger's avatar
      [SPARK-8127] [STREAMING] [KAFKA] KafkaRDD optimize count() take() isEmpty() · 1b6fe9b1
      cody koeninger authored
      Take advantage of offset range info for size-related KafkaRDD methods.  Possible fix for [SPARK-7122], but probably a worthwhile optimization regardless.
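The idea behind the optimization: each partition's offset range already encodes how many messages it holds, so `count()`/`isEmpty()` need not consume any data, and `take(n)` can stop fetching early. A hedged Python sketch (the `OffsetRange` shape and the `fetch` helper are assumptions, not the actual KafkaRDD API):

```python
from collections import namedtuple

# Hypothetical simplified offset range; Kafka offsets are half-open [from, until).
OffsetRange = namedtuple("OffsetRange",
                         ["topic", "partition", "from_offset", "until_offset"])

def rdd_count(ranges):
    # The size is known from metadata alone; no messages are read.
    return sum(r.until_offset - r.from_offset for r in ranges)

def rdd_is_empty(ranges):
    return rdd_count(ranges) == 0

def rdd_take(ranges, n, fetch):
    """Take n messages in partition order, fetching only as many as needed.

    `fetch(r, limit)` is an assumed helper returning up to `limit` messages
    from range r.
    """
    out = []
    for r in ranges:
        if len(out) >= n:
            break
        out.extend(fetch(r, n - len(out)))
    return out[:n]
```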
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6632 from koeninger/kafka-rdd-count and squashes the following commits:
      
      321340d [cody koeninger] [SPARK-8127][Streaming][Kafka] additional test of ordering of take()
      5a05d0f [cody koeninger] [SPARK-8127][Streaming][Kafka] additional test of isEmpty
      f68bd32 [cody koeninger] [Streaming][Kafka][SPARK-8127] code cleanup
      9555b73 [cody koeninger] Merge branch 'master' into kafka-rdd-count
      253031d [cody koeninger] [Streaming][Kafka][SPARK-8127] mima exclusion for change to private method
      8974b9e [cody koeninger] [Streaming][Kafka][SPARK-8127] check offset ranges before constructing KafkaRDD
      c3768c5 [cody koeninger] [Streaming][Kafka] Take advantage of offset range info for size-related KafkaRDD methods.  Possible fix for [SPARK-7122], but probably a worthwhile optimization regardless.
      1b6fe9b1
    • Andrew Or's avatar
      [HOTFIX] [SPARK-8489] Correct JIRA number in previous commit · bec40e52
      Andrew Or authored
      It should be SPARK-8489, not SPARK-8498.
      bec40e52
    • Andrew Or's avatar
      [SPARK-8498] [SQL] Add regression test for SPARK-8470 · 093c3483
      Andrew Or authored
      **Summary of the problem in SPARK-8470.** When using `HiveContext` to create a data frame of a user case class, Spark throws `scala.reflect.internal.MissingRequirementError` when it tries to infer the schema using reflection. This is caused by `HiveContext` silently overwriting the context class loader containing the user classes.
      
      **What this issue is about.** This issue adds regression tests for SPARK-8470, which is already fixed in #6891. We closed SPARK-8470 as a duplicate because it is a different manifestation of the same problem in SPARK-8368. Due to the complexity of the reproduction, this requires us to pre-package a special test jar and include it in the Spark project itself.
      
      I tested this with and without the fix in #6891 and verified that it passes only if the fix is present.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6909 from andrewor14/SPARK-8498 and squashes the following commits:
      
      5e9d688 [Andrew Or] Add regression test for SPARK-8470
      093c3483
    • cody koeninger's avatar
      [SPARK-8390] [STREAMING] [KAFKA] fix docs related to HasOffsetRanges · b305e377
      cody koeninger authored
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6863 from koeninger/SPARK-8390 and squashes the following commits:
      
      26a06bd [cody koeninger] Merge branch 'master' into SPARK-8390
      3744492 [cody koeninger] [Streaming][Kafka][SPARK-8390] doc changes per TD, test to make sure approach shown in docs actually compiles + runs
      b108c9d [cody koeninger] [Streaming][Kafka][SPARK-8390] further doc fixes, clean up spacing
      bb4336b [cody koeninger] [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges, cleanup
      3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
      b305e377
    • Michael Armbrust's avatar
      [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings · a333a72e
      Michael Armbrust authored
      In earlier versions of Spark SQL we cast `TimestampType` and `DateType` to `StringType` when either was involved in a binary comparison with a `StringType`.  This allowed comparing a timestamp with a partial date as a user would expect.
       - `time > "2014-06-10"`
       - `time > "2014"`
      
      In 1.4.0 we tried to cast the string into a timestamp instead.  However, since partial dates are not valid complete timestamps, this results in `null`, which causes the tuple to be filtered.
      
      This PR restores the earlier behavior.  Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.
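The two behaviors can be modeled in plain Python (hypothetical helpers, not Spark's actual `Cast` implementation): casting the string side to a timestamp turns partial dates into null, whereas casting the timestamp side to a string lets lexicographic prefix comparison behave as users expect.

```python
from datetime import datetime

def cast_string_to_timestamp(s):
    # Only a complete timestamp parses; partial dates become null (None).
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

def compare_1_4_0(ts, s):
    # 1.4.0 behavior: cast the string side to a timestamp.
    other = cast_string_to_timestamp(s)
    return None if other is None else ts > other   # null => row is filtered

def compare_restored(ts, s):
    # Restored behavior: cast the timestamp side to a string and compare.
    return ts.strftime("%Y-%m-%d %H:%M:%S") > s
```

With `ts = 2014-06-15 12:00:00`, `compare_1_4_0(ts, "2014-06-10")` yields null (the row silently disappears), while `compare_restored` returns true for both `"2014-06-10"` and `"2014"`.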
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6888 from marmbrus/timeCompareString and squashes the following commits:
      
      bdef29c [Michael Armbrust] test partial date
      1f09adf [Michael Armbrust] special handling of equality
      1172c60 [Michael Armbrust] more test fixing
      4dfc412 [Michael Armbrust] fix tests
      aaa9508 [Michael Armbrust] newline
      04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparision of timestamps/dates with strings
      a333a72e
    • Nathan Howell's avatar
      [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents · 9814b971
      Nathan Howell authored
      Author: Nathan Howell <nhowell@godaddy.com>
      
      Closes #6799 from NathanHowell/spark-8093 and squashes the following commits:
      
      76ac3e8 [Nathan Howell] [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents
      9814b971
    • Hossein's avatar
      [SPARK-8452] [SPARKR] expose jobGroup API in SparkR · 1fa29c2d
      Hossein authored
      This pull request adds following methods to SparkR:
      
      ```R
      setJobGroup()
      cancelJobGroup()
      clearJobGroup()
      ```
      For each method, the spark context is passed as the first argument. There does not seem to be a good way to test these in R.
      
      cc shivaram and davies
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #6889 from falaki/SPARK-8452 and squashes the following commits:
      
      9ce9f1e [Hossein] Added basic tests to verify methods can be called and won't throw errors
      c706af9 [Hossein] Added examples
      a2c19af [Hossein] taking spark context as first argument
      343ca77 [Hossein] Added setJobGroup, cancelJobGroup and clearJobGroup to SparkR
      1fa29c2d
    • MechCoder's avatar
      [SPARK-4118] [MLLIB] [PYSPARK] Python bindings for StreamingKMeans · 54976e55
      MechCoder authored
      Python bindings for StreamingKMeans
      
      Will change status to MRG once docs, tests and examples are updated.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6499 from MechCoder/spark-4118 and squashes the following commits:
      
      7722d16 [MechCoder] minor style fixes
      51052d3 [MechCoder] Doc fixes
      2061a76 [MechCoder] Add tests for simultaneous training and prediction Minor style fixes
      81482fd [MechCoder] minor
      5d9fe61 [MechCoder] predictOn should take into account the latest model
      8ab9e89 [MechCoder] Fix Python3 error
      a9817df [MechCoder] Better tests and minor fixes
      c80e451 [MechCoder] Add ignore_unicode_prefix
      ee8ce16 [MechCoder] Update tests, doc and examples
      4b1481f [MechCoder] Some changes and tests
      d8b066a [MechCoder] [SPARK-4118] [MLlib] [PySpark] Python bindings for StreamingKMeans
      54976e55
    • Davies Liu's avatar
      [SPARK-8461] [SQL] fix codegen with REPL class loader · e41e2fd6
      Davies Liu authored
      The ExecutorClassLoader used by the REPL causes Janino to fail to find classes in java.lang, so switch to the default class loader for Janino, which also helps performance.
      
      cc liancheng yhuai
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6898 from davies/fix_class_loader and squashes the following commits:
      
      24276d4 [Davies Liu] add regression test
      4ff0457 [Davies Liu] address comment, refactor
      7f5ffbe [Davies Liu] fix REPL class loader with codegen
      e41e2fd6
    • Liang-Chi Hsieh's avatar
      [HOTFIX] Fix scala style in DFSReadWriteTest that causes tests failed · 4a462c28
      Liang-Chi Hsieh authored
      This Scala style problem causes tests to fail.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6907 from viirya/hotfix_style and squashes the following commits:
      
      c53f188 [Liang-Chi Hsieh] Fix scala style.
      4a462c28
    • Yin Huai's avatar
      [SPARK-8368] [SPARK-8058] [SQL] HiveContext may override the context class... · c5876e52
      Yin Huai authored
      [SPARK-8368] [SPARK-8058] [SQL] HiveContext may override the context class loader of the current thread
      
      https://issues.apache.org/jira/browse/SPARK-8368
      
      Also, I added tests according to https://issues.apache.org/jira/browse/SPARK-8058.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6891 from yhuai/SPARK-8368 and squashes the following commits:
      
      37bb3db [Yin Huai] Update test timeout and comment.
      8762eec [Yin Huai] Style.
      695cd2d [Yin Huai] Correctly set the class loader in the conf of the state in client wrapper.
      b3378fe [Yin Huai] Failed tests.
      c5876e52
    • Sean Owen's avatar
      [SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps... · 4be53d03
      Sean Owen authored
      [SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps to preserve shuffle files
      
      Clarify what may cause long-running Spark apps to preserve shuffle files
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6901 from srowen/SPARK-5836 and squashes the following commits:
      
      a9faef0 [Sean Owen] Clarify what may cause long-running Spark apps to preserve shuffle files
      4be53d03
    • Andrew Or's avatar
      [SPARK-8451] [SPARK-7287] SparkSubmitSuite should check exit code · 68a2dca2
      Andrew Or authored
      This patch also reenables the tests. Now that we have access to the log4j logs it should be easier to debug the flakiness.
      
      yhuai brkyvz
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6886 from andrewor14/spark-submit-suite-fix and squashes the following commits:
      
      3f99ff1 [Andrew Or] Move destroy to finally block
      9a62188 [Andrew Or] Re-enable ignored tests
      2382672 [Andrew Or] Check for exit code
      68a2dca2
    • Tathagata Das's avatar
      [SPARK-7180] [SPARK-8090] [SPARK-8091] Fix a number of SerializationDebugger bugs and limitations · 866816eb
      Tathagata Das authored
      This PR solves three SerializationDebugger issues.
      * SPARK-7180 - SerializationDebugger fails with ArrayOutOfBoundsException
      * SPARK-8090 - SerializationDebugger does not handle classes with writeReplace correctly
      * SPARK-8091 - SerializationDebugger does not handle classes with writeObject method
      
      The solutions for each are explained as follows
      * SPARK-7180 - The wrong slot desc was used for getting the value of the fields in the object being tested.
      * SPARK-8090 - Test the type of the replaced object.
      * SPARK-8091 - Use a dummy ObjectOutputStream to collect all the objects written by the writeObject() method, and then test those objects as usual.
      
      I also added more tests in the testsuite to increase code coverage. For example, added tests for cases where there are not serializability issues.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6625 from tdas/SPARK-7180 and squashes the following commits:
      
      c7cb046 [Tathagata Das] Addressed comments on docs
      ae212c8 [Tathagata Das] Improved docs
      304c97b [Tathagata Das] Fixed build error
      26b5179 [Tathagata Das] more tests.....92% line coverage
      7e2fdcf [Tathagata Das] Added more tests
      d1967fb [Tathagata Das] Added comments.
      da75d34 [Tathagata Das] Removed unnecessary lines.
      50a608d [Tathagata Das] Fixed bugs and added support for writeObject
      866816eb
    • RJ Nowling's avatar
      Add example that reads a local file, writes to a DFS path provided by th... · a9858036
      RJ Nowling authored
      The example reads a local file, writes to a DFS path provided by the user, reads the file back from the DFS, and compares word counts on the local and DFS versions. Useful for verifying DFS correctness.
      
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #3347 from rnowling/dfs_read_write_test and squashes the following commits:
      
      af8ccb7 [RJ Nowling] Don't use java.io.File since DFS may not be POSIX-compatible
      b0ef9ea [RJ Nowling] Fix string style
      07c6132 [RJ Nowling] Fix string style
      7d9a8df [RJ Nowling] Fix string style
      f74c160 [RJ Nowling] Fix else statement style
      b9edf12 [RJ Nowling] Fix spark wc style
      44415b9 [RJ Nowling] Fix local wc style
      94a4691 [RJ Nowling] Fix space
      df59b65 [RJ Nowling] Fix if statements
      1b314f0 [RJ Nowling] Add scaladoc
      a931d70 [RJ Nowling] Fix import order
      0c89558 [RJ Nowling] Add example that reads a local file, writes to a DFS path provided by the user, reads the file back from the DFS, and compares word counts on the local and DFS versions. Useful for verifying DFS correctness.
      a9858036
    • Shilei's avatar
      [SPARK-8234][SQL] misc function: md5 · 0c32fc12
      Shilei authored
      Author: Shilei <shilei.qian@intel.com>
      
      Closes #6779 from qiansl127/MD5 and squashes the following commits:
      
      11fcdb2 [Shilei] Fix the indent
      04bd27b [Shilei] Add codegen
      da60eb3 [Shilei] Remove checkInputDataTypes function
      9509ad0 [Shilei] Format code
      12c61f4 [Shilei] Accept only BinaryType for Md5
      1df0b5b [Shilei] format to scala type
      60ccde1 [Shilei] Add more test case
      b8c73b4 [Shilei] Rewrite the type check for Md5
      c166167 [Shilei] Add md5 function
      0c32fc12
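The `md5` function added above accepts only binary input (see "Accept only BinaryType for Md5") and returns the hex digest as a string. A minimal plain-Python analog of that contract (illustrative, not Spark's implementation):

```python
import hashlib

def md5_expr(value):
    """Hypothetical analog of SQL md5(): binary in, lowercase hex string out."""
    if not isinstance(value, (bytes, bytearray)):
        # Mirrors the "accept only BinaryType" type check.
        raise TypeError("md5 accepts only binary input")
    return hashlib.md5(bytes(value)).hexdigest()
```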
    • Takuya UESHIN's avatar
      [SPARK-8476] [CORE] Setters inc/decDiskBytesSpilled in TaskMetrics should also be private. · fe08561e
      Takuya UESHIN authored
      This is a follow-up of [SPARK-3288](https://issues.apache.org/jira/browse/SPARK-3288).
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #6896 from ueshin/issues/SPARK-8476 and squashes the following commits:
      
      89251d8 [Takuya UESHIN] Make inc/decDiskBytesSpilled in TaskMetrics private[spark].
      fe08561e
    • Lianhui Wang's avatar
      [SPARK-8430] ExternalShuffleBlockResolver of shuffle service should support UnsafeShuffleManager · 9baf0930
      Lianhui Wang authored
      andrewor14 can you take a look? Thanks
      
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      Closes #6873 from lianhuiwang/SPARK-8430 and squashes the following commits:
      
      51c47ca [Lianhui Wang] update andrewor's comments
      2b27b19 [Lianhui Wang] support UnsafeShuffleManager
      9baf0930
    • Liang-Chi Hsieh's avatar
      [SPARK-8207] [SQL] Add math function bin · 2c59d5c1
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8207
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6721 from viirya/expr_bin and squashes the following commits:
      
      07e1c8f [Liang-Chi Hsieh] Remove AbstractUnaryMathExpression and let BIN inherit UnaryExpression.
      0677f1a [Liang-Chi Hsieh] For comments.
      cf62b95 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      0cf20f2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      dea9c12 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      d4f4774 [Liang-Chi Hsieh] Add @ignore_unicode_prefix.
      7a0196f [Liang-Chi Hsieh] Fix python style.
      ac2bacd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      a0a2d0f [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      4cb764d [Liang-Chi Hsieh] For comments.
      0f78682 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      c0c3197 [Liang-Chi Hsieh] Add bin to FunctionRegistry.
      824f761 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      50e0c3b [Liang-Chi Hsieh] Add math function bin(a: long): string.
      2c59d5c1
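The new `bin(a: long): string` function returns the binary string form of a long, presumably following `java.lang.Long.toBinaryString` semantics (two's-complement view for negatives). A hedged Python analog:

```python
def sql_bin(n):
    """Illustrative analog of SQL bin(a: long): string."""
    if n is None:
        return None
    # Model the 64-bit two's-complement view explicitly for negative longs
    # (an assumption about the semantics, mirroring Long.toBinaryString).
    return format(n & 0xFFFFFFFFFFFFFFFF, "b")
```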
    • Xiangrui Meng's avatar
      [SPARK-8151] [MLLIB] pipeline components should correctly implement copy · 43c7ec63
      Xiangrui Meng authored
      Otherwise, extra params get ignored in `PipelineModel.transform`. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6622 from mengxr/SPARK-8087 and squashes the following commits:
      
      0e4c8c4 [Xiangrui Meng] fix merge issues
      26fc1f0 [Xiangrui Meng] address comments
      e607a04 [Xiangrui Meng] merge master
      b85b57e [Xiangrui Meng] fix examples/compile
      d6f7891 [Xiangrui Meng] rename defaultCopyWithParams to defaultCopy
      84ec278 [Xiangrui Meng] remove setter checks due to generics
      2cf2ed0 [Xiangrui Meng] snapshot
      291814f [Xiangrui Meng] OneVsRest.copy
      1dfe3bd [Xiangrui Meng] PipelineModel.copy should copy stages
      43c7ec63
    • cody koeninger's avatar
      [SPARK-8389] [STREAMING] [KAFKA] Example of getting offset ranges out o… · 47af7c1e
      cody koeninger authored
      Example of getting offset ranges out of the existing Java direct stream API.
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6846 from koeninger/SPARK-8389 and squashes the following commits:
      
      3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
      47af7c1e
    • Jihong MA's avatar
      [SPARK-7265] Improving documentation for Spark SQL Hive support · ebd363ae
      Jihong MA authored
      Please review this pull request.
      
      Author: Jihong MA <linlin200605@gmail.com>
      
      Closes #5933 from JihongMA/SPARK-7265 and squashes the following commits:
      
      dfaa971 [Jihong MA] SPARK-7265 minor fix of the content
      ace454d [Jihong MA] SPARK-7265 take out PySpark on YARN limitation
      9ea0832 [Jihong MA] Merge remote-tracking branch 'upstream/master'
      d5bf3f5 [Jihong MA] Merge remote-tracking branch 'upstream/master'
      7b842e6 [Jihong MA] Merge remote-tracking branch 'upstream/master'
      9c84695 [Jihong MA] SPARK-7265 address review comment
      a399aa6 [Jihong MA] SPARK-7265 Improving documentation for Spark SQL Hive support
      ebd363ae
    • zsxwing's avatar
      [SPARK-7913] [CORE] Make AppendOnlyMap use the same growth strategy of... · 93360dc3
      zsxwing authored
      [SPARK-7913] [CORE] Make AppendOnlyMap use the same growth strategy of OpenHashSet and consistent exception message
      
      This is a follow up PR for #6456 to make AppendOnlyMap consistent with OpenHashSet.
      
      /cc srowen andrewor14
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6879 from zsxwing/append-only-map and squashes the following commits:
      
      912c0ad [zsxwing] Fix the doc
      dd4385b [zsxwing] Make AppendOnlyMap use the same growth strategy of OpenHashSet and consistent exception message
      93360dc3
    • Carson Wang's avatar
      [SPARK-8387] [FOLLOWUP ] [WEBUI] Update driver log URL to show only 4096 bytes · 54557f35
      Carson Wang authored
      This is to follow up #6834 , update the driver log URL as well for consistency.
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #6878 from carsonwang/logUrl and squashes the following commits:
      
      13be948 [Carson Wang] update log URL in YarnClusterSuite
      a0004f4 [Carson Wang] Update driver log URL to show only 4096 bytes
      54557f35
    • Kevin Conor's avatar
      [SPARK-8339] [PYSPARK] integer division for python 3 · fdf63f12
      Kevin Conor authored
      `itertools.islice` requires an integer for the stop argument.  Switching to integer division here prevents a ValueError when `vs` is evaluated above.
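The failure mode is easy to reproduce in plain Python 3: true division (`/`) produces a float, which `islice` rejects, while floor division (`//`) keeps the stop argument an integer.

```python
from itertools import islice

items = iter(range(10))
batch = 4

# Python 3: batch / 2 is 2.0 (a float), and islice raises
# ValueError("Stop argument for islice() must be None or an integer ...").
try:
    list(islice(items, batch / 2))
except ValueError:
    pass

# Floor division keeps the stop argument an int, as the patch does.
items = iter(range(10))
half = list(islice(items, batch // 2))   # first batch // 2 items
```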
      
      davies
      
      This is my original work, and I license it to the project.
      
      Author: Kevin Conor <kevin@discoverybayconsulting.com>
      
      Closes #6794 from kconor/kconor-patch-1 and squashes the following commits:
      
      da5e700 [Kevin Conor] Integer division for batch size
      fdf63f12
    • Bryan Cutler's avatar
      [SPARK-8444] [STREAMING] Adding Python streaming example for queueStream · a2016b4b
      Bryan Cutler authored
      A Python example similar to the existing one for Scala.
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #6884 from BryanCutler/streaming-queueStream-example-8444 and squashes the following commits:
      
      435ba7e [Bryan Cutler] [SPARK-8444] Fixed style checks, increased sleep time to show empty queue
      257abb0 [Bryan Cutler] [SPARK-8444] Stop context gracefully, Removed unused import, Added description comment
      376ef6e [Bryan Cutler] [SPARK-8444] Fixed bug causing DStream.pprint to append empty parenthesis to output instead of blank line
      1ff5f8b [Bryan Cutler] [SPARK-8444] Adding Python streaming example for queue_stream
      a2016b4b
    • Yu ISHIKAWA's avatar
      [SPARK-8348][SQL] Add in operator to DataFrame Column · 754929b1
      Yu ISHIKAWA authored
      I have added it only for Scala.
      
      TODO: we should also support `in` operator in Python.
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6824 from yu-iskw/SPARK-8348 and squashes the following commits:
      
      e76d02f [Yu ISHIKAWA] Not use infix notation
      6f744ac [Yu ISHIKAWA] Fit the test cases because these used the old test data set.
      00077d3 [Yu ISHIKAWA] [SPARK-8348][SQL] Add in operator to DataFrame Column
      754929b1
    • Cheng Lian's avatar
      [SPARK-8458] [SQL] Don't strip scheme part of output path when writing ORC files · a71cbbde
      Cheng Lian authored
      `Path.toUri.getPath` strips scheme part of output path (from `file:///foo` to `/foo`), which causes ORC data source only writes to the file system configured in Hadoop configuration. Should use `Path.toString` instead.
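The difference between the two renderings can be illustrated with Python's `urllib.parse` as an analog (Hadoop's `Path` API is the real code path):

```python
from urllib.parse import urlparse

output_path = "file:///foo"

# Analog of Path.toUri.getPath: the scheme is stripped, so writes would go
# to whatever file system the Hadoop configuration defaults to.
stripped = urlparse(output_path).path

# Analog of Path.toString: the scheme survives, targeting the intended
# file:// file system.
full = output_path
```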
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6892 from liancheng/spark-8458 and squashes the following commits:
      
      87f8199 [Cheng Lian] Don't strip scheme of output path when writing ORC files
      a71cbbde
  3. Jun 18, 2015
    • Dibyendu Bhattacharya's avatar
      [SPARK-8080] [STREAMING] Receiver.store with Iterator does not give correct count at Spark UI · 3eaed876
      Dibyendu Bhattacharya authored
      tdas zsxwing this is the new PR for SPARK-8080
      
      I have merged https://github.com/apache/spark/pull/6659
      
      Also worth mentioning: for MEMORY_ONLY storage, when a block cannot be safely unrolled to memory because there is not enough space, BlockManager won't try to put the block, and ReceivedBlockHandler will throw a SparkException because it cannot find the block id in PutResult. Thus the number of records in a block won't be counted if the block failed to unroll in memory, which is fine.
      
      For MEMORY_AND_DISK storage, if BlockManager cannot unroll the block to memory, the block will still be serialized to disk. The same holds for the WAL-based store. So in those cases (storage level = memory + disk) the number of records will be counted even though the block could not be unrolled to memory.
      
      Thus I added isFullyConsumed to the CountingIterator, but have not used it, since it can never happen that a block is not fully consumed while ReceivedBlockHandler still gets the block id.
      
      I have added a few test cases to cover those block-unrolling scenarios as well.
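The CountingIterator mentioned above can be sketched in Python (illustrative shape; the real implementation is Scala inside the receiver block-handling code):

```python
class CountingIterator:
    """Wraps an iterator and counts the records drained from it, so the
    store path can report an accurate record count to the UI."""

    def __init__(self, it):
        self._it = iter(it)
        self.count = 0
        self.fully_consumed = False   # analog of isFullyConsumed

    def __iter__(self):
        return self

    def __next__(self):
        try:
            value = next(self._it)
        except StopIteration:
            self.fully_consumed = True
            raise
        self.count += 1
        return value
```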
      
      Author: Dibyendu Bhattacharya <dibyendu.bhattacharya1@pearson.com>
      Author: U-PEROOT\UBHATD1 <UBHATD1@PIN-L-PI046.PEROOT.com>
      
      Closes #6707 from dibbhatt/master and squashes the following commits:
      
      f6cb6b5 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
      f37cfd8 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
      5a8344a [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI Count ByteBufferBlock as 1 count
      fceac72 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
      0153e7e [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI Fixed comments given by @zsxwing
      4c5931d [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
      01e6dc8 [U-PEROOT\UBHATD1] A
      3eaed876
    • Lars Francke's avatar
      [SPARK-8462] [DOCS] Documentation fixes for Spark SQL · 4ce3bab8
      Lars Francke authored
      This fixes various minor documentation issues on the Spark SQL page
      
      Author: Lars Francke <lars.francke@gmail.com>
      
      Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits:
      
      dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
      34eff2c [Lars Francke] Minor documentation fixes
      4ce3bab8
    • Sandy Ryza's avatar
      [SPARK-8135] Don't load defaults when reconstituting Hadoop Configurations · 43f50dec
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #6679 from sryza/sandy-spark-8135 and squashes the following commits:
      
      c5554ff [Sandy Ryza] SPARK-8135. In SerializableWritable, don't load defaults when instantiating Configuration
      43f50dec
    • Reynold Xin's avatar
      [SPARK-8218][SQL] Binary log math function update. · dc413138
      Reynold Xin authored
      Some minor updates based on after merging #6725.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6871 from rxin/log and squashes the following commits:
      
      ab51542 [Reynold Xin] Use JVM log
      76fc8de [Reynold Xin] Fixed arg.
      a7c1522 [Reynold Xin] [SPARK-8218][SQL] Binary log math function update.
      dc413138
    • Josh Rosen's avatar
      [SPARK-8446] [SQL] Add helper functions for testing SparkPlan physical operators · 207a98ca
      Josh Rosen authored
      This patch introduces `SparkPlanTest`, a base class for unit tests of SparkPlan physical operators.  This is analogous to Spark SQL's existing `QueryTest`, which does something similar for end-to-end tests with actual queries.
      
      These helper methods provide nicer error output when tests fail and help developers to avoid writing lots of boilerplate in order to execute manually constructed physical plans.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6885 from JoshRosen/spark-plan-test and squashes the following commits:
      
      f8ce275 [Josh Rosen] Fix some IntelliJ inspections and delete some dead code
      84214be [Josh Rosen] Add an extra column which isn't part of the sort
      ae1896b [Josh Rosen] Provide implicits automatically
      a80f9b0 [Josh Rosen] Merge pull request #4 from marmbrus/pr/6885
      d9ab1e4 [Michael Armbrust] Add simple resolver
      c60a44d [Josh Rosen] Manually bind references
      996332a [Josh Rosen] Add types so that tests compile
      a46144a [Josh Rosen] WIP
      207a98ca
    • zsxwing's avatar
      [SPARK-8376] [DOCS] Add common lang3 to the Spark Flume Sink doc · 24e53793
      zsxwing authored
      Commons Lang 3 has been added as one of the dependencies of Spark Flume Sink since #5703. This PR updates the doc for it.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6829 from zsxwing/flume-sink-dep and squashes the following commits:
      
      f8617f0 [zsxwing] Add common lang3 to the Spark Flume Sink doc
      24e53793
    • Josh Rosen's avatar
      [SPARK-8353] [DOCS] Show anchor links when hovering over documentation headers · 44c931f0
      Josh Rosen authored
      This patch uses [AnchorJS](https://bryanbraun.github.io/anchorjs/) to show deep anchor links when hovering over headers in the Spark documentation. For example:
      
      ![image](https://cloud.githubusercontent.com/assets/50748/8240800/1502f85c-15ba-11e5-819a-97b231370a39.png)
      
      This makes it easier for users to link to specific sections of the documentation.
      
      I also removed some dead Javascript which isn't used in our current docs (it was introduced for the old AMPCamp training, but isn't used anymore).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6808 from JoshRosen/SPARK-8353 and squashes the following commits:
      
      e59d8a7 [Josh Rosen] Suppress underline on hover
      f518b6a [Josh Rosen] Turn on for all headers, since we use H1s in a bunch of places
      a9fec01 [Josh Rosen] Add anchor links when hovering over headers; remove some dead JS code
      44c931f0
    • Davies Liu's avatar
      [SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark · 9b200272
      Davies Liu authored
      The batch size during external sort grows up to a maximum of 10000, then shrinks down to zero, causing an infinite loop.
      Given the assumption that items usually have similar sizes, we don't need to adjust the batch size after the first spill.
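A sketch of the fixed behavior, using a hypothetical `spill_batches` helper: the batch size stays fixed after the first spill instead of being re-adjusted toward zero.

```python
from itertools import islice

def spill_batches(items, batch_size):
    """Yield sorted fixed-size batches, as an external sort does when spilling.

    The pre-fix code kept adjusting batch_size after every spill; once it
    shrank to zero, each pass produced an empty batch and the loop never
    advanced. Keeping the size fixed after the first spill (items are
    assumed to have similar sizes) avoids that.
    """
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield sorted(batch)
```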
      
      cc JoshRosen rxin angelini
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6714 from davies/batch_size and squashes the following commits:
      
      b170dfb [Davies Liu] update test
      b9be832 [Davies Liu] Merge branch 'batch_size' of github.com:davies/spark into batch_size
      6ade745 [Davies Liu] update test
      5c21777 [Davies Liu] Update shuffle.py
      e746aec [Davies Liu] fix batch size during sort
      9b200272
    • Liang-Chi Hsieh's avatar
      [SPARK-8363][SQL] Move sqrt to math and extend UnaryMathExpression · 31641128
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8363
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6823 from viirya/move_sqrt and squashes the following commits:
      
      8977e11 [Liang-Chi Hsieh] Remove unnecessary old tests.
      d23e79e [Liang-Chi Hsieh] Explicitly indicate sqrt value sequence.
      699f48b [Liang-Chi Hsieh] Use correct @since tag.
      8dff6d1 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into move_sqrt
      bc2ed77 [Liang-Chi Hsieh] Remove/move arithmetic expression test and expression type checking test. Remove unnecessary Sqrt type rule.
      d38492f [Liang-Chi Hsieh] Now sqrt accepts boolean because type casting is handled by HiveTypeCoercion.
      297cc90 [Liang-Chi Hsieh] Sqrt only accepts double input.
      ef4a21a [Liang-Chi Hsieh] Move sqrt to math.
      31641128