  1. Jun 21, 2015
    • Joseph K. Bradley's avatar
      [SPARK-7715] [MLLIB] [ML] [DOC] Updated MLlib programming guide for release 1.4 · a1894422
      Joseph K. Bradley authored
      Reorganized docs a bit.  Added migration guides.
      
      **Q**: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```?  It would be a lot.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits:
      
      4bf26d6 [Joseph K. Bradley] tiny fix
      8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR
      6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4
      a1894422
    • Cheng Lian's avatar
      [SPARK-8508] [SQL] Ignores a test case to cleanup unnecessary testing output until #6882 is merged · 83cdfd84
      Cheng Lian authored
      Currently [the test case for SPARK-7862] [1] writes 100,000 lines of integer triples to stderr, which makes the Jenkins build output unnecessarily large and makes it hard to debug other build errors. A proper fix is on the way in #6882. This PR ignores this test case temporarily until #6882 is merged.
      
      [1]: https://github.com/apache/spark/pull/6404/files#diff-1ea02a6fab84e938582f7f87cc4d9ea1R641
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6925 from liancheng/spark-8508 and squashes the following commits:
      
      41e5b47 [Cheng Lian] Ignores the test case until #6882 is merged
      83cdfd84
    • Yanbo Liang's avatar
      [SPARK-7604] [MLLIB] Python API for PCA and PCAModel · 32e3cdaa
      Yanbo Liang authored
      Python API for PCA and PCAModel
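      A short usage sketch of the new Python API, assuming a `pyspark.mllib.feature.PCA` that is constructed with the number of components, fit on an RDD of vectors, and whose model exposes the inherited `transform()`; the data below is illustrative:
      
      ```python
      # Hedged sketch of the new Python PCA API; the module path and behavior are
      # assumptions based on this PR's description, not verified against it.
      from pyspark import SparkContext
      from pyspark.mllib.feature import PCA
      from pyspark.mllib.linalg import Vectors
      
      sc = SparkContext("local[1]", "pca-sketch")
      data = sc.parallelize([
          Vectors.dense([1.0, 0.0, 7.0]),
          Vectors.dense([2.0, 0.0, 3.0]),
          Vectors.dense([4.0, 0.0, 1.0]),
      ])
      
      model = PCA(2).fit(data)            # keep the top 2 principal components
      projected = model.transform(data)   # default transform() inherited from the base model
      print(projected.collect())
      sc.stop()
      ```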
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6315 from yanboliang/spark-7604 and squashes the following commits:
      
      1d58734 [Yanbo Liang] remove transform() in PCAModel, use default behavior
      4d9d121 [Yanbo Liang] Python API for PCA and PCAModel
      32e3cdaa
    • jeanlyn's avatar
      [SPARK-8379] [SQL] avoid speculative tasks write to the same file · a1e3649c
      jeanlyn authored
      The issue link [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379)
      Currently, when we insert data into a dynamic partition with speculative tasks enabled, we get the following exception:
      ```
      org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
      Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo
      owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53
      but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57
      ```
      This PR writes the data to a temporary directory when using dynamic partitions, to avoid speculative tasks writing to the same file.
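      As a hedged illustration of the scenario (the table names and settings below are hypothetical, not taken from this PR), a dynamic-partition insert with speculation enabled is the kind of job where two attempts of the same task can race on one output file:
      
      ```python
      # Hypothetical reproduction sketch only; the `source` and `target` tables are made up.
      from pyspark import SparkConf, SparkContext
      from pyspark.sql import HiveContext
      
      conf = SparkConf().set("spark.speculation", "true")   # enable speculative task attempts
      sc = SparkContext(conf=conf)
      hc = HiveContext(sc)
      
      hc.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
      # With speculation, a slow task may be retried while the original attempt is still
      # writing; without a per-attempt temporary directory, both attempts target the same
      # partition file, which is what triggers the LeaseExpiredException above.
      hc.sql("""
        INSERT OVERWRITE TABLE target PARTITION (ds, type)
        SELECT value, ds, type FROM source
      """)
      ```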
      
      Author: jeanlyn <jeanlyn92@gmail.com>
      
      Closes #6833 from jeanlyn/speculation and squashes the following commits:
      
      64bbfab [jeanlyn] use FileOutputFormat.getTaskOutputPath to get the path
      8860af0 [jeanlyn] remove the never using code
      e19a3bd [jeanlyn] avoid speculative tasks write same file
      a1e3649c
  2. Jun 20, 2015
    • Tarek Auel's avatar
      [SPARK-8301] [SQL] Improve UTF8String substring/startsWith/endsWith/contains performance · 41ab2853
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8301
      
      Added the private method startsWith(prefix, offset) to implement startsWith, endsWith, and contains without copying the array.
      
      I hope that the SQL component tag is still correct; I copied it from the JIRA ticket.
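      As a language-neutral sketch of the technique (this is not the actual Java `UTF8String` code), a single offset-based comparison helper can back all three checks without ever copying the underlying byte array:
      
      ```python
      # Illustrative only: one "match at offset" primitive backs startsWith, endsWith
      # and contains, so no sub-array copy is made for any of the checks.
      def matches_at(data, pattern, offset):
          if offset < 0 or offset + len(pattern) > len(data):
              return False
          return all(data[offset + i] == pattern[i] for i in range(len(pattern)))
      
      def starts_with(data, prefix):
          return matches_at(data, prefix, 0)
      
      def ends_with(data, suffix):
          return matches_at(data, suffix, len(data) - len(suffix))
      
      def contains(data, pattern):
          return any(matches_at(data, pattern, i)
                     for i in range(len(data) - len(pattern) + 1))
      
      assert starts_with(b"spark sql", b"spark")
      assert ends_with(b"spark sql", b"sql")
      assert contains(b"spark sql", b"rk s")
      ```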
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      Author: Tarek Auel <tarek.auel@gmail.com>
      
      Closes #6804 from tarekauel/SPARK-8301 and squashes the following commits:
      
      f5d6b9a [Tarek Auel] fixed parentheses and annotation
      6d7b068 [Tarek Auel] [SPARK-8301] removed null checks
      9ca0473 [Tarek Auel] [SPARK-8301] removed null checks
      1c327eb [Tarek Auel] [SPARK-8301] removed new
      9f17cc8 [Tarek Auel] [SPARK-8301] fixed conversion byte to string in codegen
      3a0040f [Tarek Auel] [SPARK-8301] changed call of UTF8String.set to UTF8String.from
      e4530d2 [Tarek Auel] [SPARK-8301] changed call of UTF8String.set to UTF8String.from
      a5f853a [Tarek Auel] [SPARK-8301] changed visibility of set to protected. Changed annotation of bytes from Nullable to Nonnull
      d2fb05f [Tarek Auel] [SPARK-8301] added additional null checks
      79cb55b [Tarek Auel] [SPARK-8301] null check. Added test cases for null check.
      b17909e [Tarek Auel] [SPARK-8301] removed unnecessary copying of UTF8String. Added a private function startsWith(prefix, offset) to implement the check for startsWith, endsWith and contains.
      41ab2853
    • Yu ISHIKAWA's avatar
      [SPARK-8495] [SPARKR] Add a `.lintr` file to validate the SparkR files and the `lint-r` script · 004f5737
      Yu ISHIKAWA authored
      Thanks to Shivaram Venkataraman for the support. This is a prototype script to validate the R files.
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6922 from yu-iskw/SPARK-6813 and squashes the following commits:
      
      c1ffe6b [Yu ISHIKAWA] Modify to save result to a log file and add a rule to validate
      5520806 [Yu ISHIKAWA] Exclude the .lintr file not to check Apache lincence
      8f94680 [Yu ISHIKAWA] [SPARK-8495][SparkR] Add a `.lintr` file to validate the SparkR files and the `lint-r` script
      004f5737
    • Josh Rosen's avatar
      [SPARK-8422] [BUILD] [PROJECT INFRA] Add a module abstraction to dev/run-tests · 7a3c424e
      Josh Rosen authored
      This patch builds upon #5694 to add a 'module' abstraction to the `dev/run-tests` script which groups together the per-module test logic, including the mapping from file paths to modules, the mapping from modules to test goals and build profiles, and the dependencies / relationships between modules.
      
      This refactoring makes it much easier to increase the granularity of test modules, which will let us skip even more tests.  It's also a prerequisite for other changes that will reduce test time, such as running subsets of the Python tests based on which files / modules have changed.
      
      This patch also adds doctests for the new graph traversal / change mapping code.
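      A minimal sketch of what such a module abstraction can look like (the names and fields below are illustrative, not the actual `dev/run-tests` code):
      
      ```python
      # Hypothetical sketch: map changed files to modules, then add modules that depend
      # on them, so only the affected test goals need to run.
      class Module(object):
          def __init__(self, name, source_prefixes, sbt_test_goals, dependencies=()):
              self.name = name
              self.source_prefixes = source_prefixes
              self.sbt_test_goals = sbt_test_goals
              self.dependencies = list(dependencies)
      
          def contains_file(self, path):
              return any(path.startswith(prefix) for prefix in self.source_prefixes)
      
      core = Module("core", ["core/"], ["core/test"])
      mllib = Module("mllib", ["mllib/"], ["mllib/test"], dependencies=[core])
      modules = [core, mllib]
      
      def modules_to_test(changed_files):
          changed = {m for m in modules
                     if any(m.contains_file(f) for f in changed_files)}
          # One-level dependency expansion; the real script would do this transitively.
          dependents = {m for m in modules if set(m.dependencies) & changed}
          return changed | dependents
      
      print(sorted(m.name for m in modules_to_test(["core/src/main/scala/Foo.scala"])))
      # ['core', 'mllib']
      ```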
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6866 from JoshRosen/more-dev-run-tests-refactoring and squashes the following commits:
      
      75de450 [Josh Rosen] Use module system to determine which build profiles to enable.
      4224da5 [Josh Rosen] Add documentation to Module.
      a86a953 [Josh Rosen] Clean up modules; add new modules for streaming external projects
      e46539f [Josh Rosen] Fix camel-cased endswith()
      35a3052 [Josh Rosen] Enable Hive tests when running all tests
      df10e23 [Josh Rosen] update to reflect fact that no module depends on root
      3670d50 [Josh Rosen] mllib should depend on streaming
      dc6f1c6 [Josh Rosen] Use changed files' extensions to decide whether to run style checks
      7092d3e [Josh Rosen] Skip SBT tests if no test goals are specified
      43a0ced [Josh Rosen] Minor fixes
      3371441 [Josh Rosen] Test everything if nothing has changed (needed for non-PRB builds)
      37f3fb3 [Josh Rosen] Remove doc profiles option, since it's not actually needed (see #6865)
      f53864b [Josh Rosen] Finish integrating module changes
      f0249bd [Josh Rosen] WIP
      7a3c424e
    • Liang-Chi Hsieh's avatar
      [SPARK-8468] [ML] Take the negative of some metrics in RegressionEvaluator to... · 0b899516
      Liang-Chi Hsieh authored
      [SPARK-8468] [ML] Take the negative of some metrics in RegressionEvaluator to get correct cross validation
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-8468
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6905 from viirya/cv_min and squashes the following commits:
      
      930d3db [Liang-Chi Hsieh] Fix python unit test and add document.
      d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min
      16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal.
      c3dd8d9 [Liang-Chi Hsieh] For comments.
      b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaulation value.
      0b899516
  3. Jun 19, 2015
    • cody koeninger's avatar
      [SPARK-8127] [STREAMING] [KAFKA] KafkaRDD optimize count() take() isEmpty() · 1b6fe9b1
      cody koeninger authored
      Take advantage of offset range info for size-related KafkaRDD methods.  Possible fix for [SPARK-7122], but probably a worthwhile optimization regardless.
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6632 from koeninger/kafka-rdd-count and squashes the following commits:
      
      321340d [cody koeninger] [SPARK-8127][Streaming][Kafka] additional test of ordering of take()
      5a05d0f [cody koeninger] [SPARK-8127][Streaming][Kafka] additional test of isEmpty
      f68bd32 [cody koeninger] [Streaming][Kafka][SPARK-8127] code cleanup
      9555b73 [cody koeninger] Merge branch 'master' into kafka-rdd-count
      253031d [cody koeninger] [Streaming][Kafka][SPARK-8127] mima exclusion for change to private method
      8974b9e [cody koeninger] [Streaming][Kafka][SPARK-8127] check offset ranges before constructing KafkaRDD
      c3768c5 [cody koeninger] [Streaming][Kafka] Take advantage of offset range info for size-related KafkaRDD methods.  Possible fix for [SPARK-7122], but probably a worthwhile optimization regardless.
      1b6fe9b1
    • Andrew Or's avatar
      [HOTFIX] [SPARK-8489] Correct JIRA number in previous commit · bec40e52
      Andrew Or authored
      It should be SPARK-8489, not SPARK-8498.
      bec40e52
    • Andrew Or's avatar
      [SPARK-8498] [SQL] Add regression test for SPARK-8470 · 093c3483
      Andrew Or authored
      **Summary of the problem in SPARK-8470.** When using `HiveContext` to create a data frame of a user case class, Spark throws `scala.reflect.internal.MissingRequirementError` when it tries to infer the schema using reflection. This is caused by `HiveContext` silently overwriting the context class loader containing the user classes.
      
      **What this issue is about.** This issue adds regression tests for SPARK-8470, which is already fixed in #6891. We closed SPARK-8470 as a duplicate because it is a different manifestation of the same problem in SPARK-8368. Due to the complexity of the reproduction, this requires us to pre-package a special test jar and include it in the Spark project itself.
      
      I tested this with and without the fix in #6891 and verified that it passes only if the fix is present.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6909 from andrewor14/SPARK-8498 and squashes the following commits:
      
      5e9d688 [Andrew Or] Add regression test for SPARK-8470
      093c3483
    • cody koeninger's avatar
      [SPARK-8390] [STREAMING] [KAFKA] fix docs related to HasOffsetRanges · b305e377
      cody koeninger authored
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6863 from koeninger/SPARK-8390 and squashes the following commits:
      
      26a06bd [cody koeninger] Merge branch 'master' into SPARK-8390
      3744492 [cody koeninger] [Streaming][Kafka][SPARK-8390] doc changes per TD, test to make sure approach shown in docs actually compiles + runs
      b108c9d [cody koeninger] [Streaming][Kafka][SPARK-8390] further doc fixes, clean up spacing
      bb4336b [cody koeninger] [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges, cleanup
      3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
      b305e377
    • Michael Armbrust's avatar
      [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings · a333a72e
      Michael Armbrust authored
      In earlier versions of Spark SQL we cast `TimestampType` and `DateType` to `StringType` when either was involved in a binary comparison with a `StringType`.  This allowed comparing a timestamp with a partial date as a user would expect.
       - `time > "2014-06-10"`
       - `time > "2014"`
      
      In 1.4.0 we instead tried to cast the String into a Timestamp.  However, since partial dates are not valid complete timestamps, this results in `null`, which causes the tuple to be filtered out.
      
      This PR restores the earlier behavior.  Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.
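      A minimal PySpark sketch of the kind of query this affects, assuming the 1.4-era `SQLContext` API (the table and column names are illustrative):
      
      ```python
      # Illustrative only: with the restored behavior, comparing a TIMESTAMP column
      # against a partial date string filters rows the way users expect.
      import datetime
      from pyspark import SparkContext
      from pyspark.sql import SQLContext, Row
      
      sc = SparkContext("local[1]", "timestamp-compare-sketch")
      sqlContext = SQLContext(sc)
      
      df = sqlContext.createDataFrame([
          Row(time=datetime.datetime(2014, 6, 11, 12, 0, 0)),
          Row(time=datetime.datetime(2013, 1, 1, 0, 0, 0)),
      ])
      df.registerTempTable("events")
      
      # Only the 2014-06-11 row should come back; the 2013 row is filtered out.
      sqlContext.sql("SELECT * FROM events WHERE time > '2014-06-10'").show()
      sc.stop()
      ```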
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6888 from marmbrus/timeCompareString and squashes the following commits:
      
      bdef29c [Michael Armbrust] test partial date
      1f09adf [Michael Armbrust] special handling of equality
      1172c60 [Michael Armbrust] more test fixing
      4dfc412 [Michael Armbrust] fix tests
      aaa9508 [Michael Armbrust] newline
      04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparision of timestamps/dates with strings
      a333a72e
    • Nathan Howell's avatar
      [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents · 9814b971
      Nathan Howell authored
      Author: Nathan Howell <nhowell@godaddy.com>
      
      Closes #6799 from NathanHowell/spark-8093 and squashes the following commits:
      
      76ac3e8 [Nathan Howell] [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents
      9814b971
    • Hossein's avatar
      [SPARK-8452] [SPARKR] expose jobGroup API in SparkR · 1fa29c2d
      Hossein authored
      This pull request adds following methods to SparkR:
      
      ```R
      setJobGroup()
      cancelJobGroup()
      clearJobGroup()
      ```
      For each method, the spark context is passed as the first argument. There does not seem to be a good way to test these in R.
      
      cc shivaram and davies
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #6889 from falaki/SPARK-8452 and squashes the following commits:
      
      9ce9f1e [Hossein] Added basic tests to verify methods can be called and won't throw errors
      c706af9 [Hossein] Added examples
      a2c19af [Hossein] taking spark context as first argument
      343ca77 [Hossein] Added setJobGroup, cancelJobGroup and clearJobGroup to SparkR
      1fa29c2d
    • MechCoder's avatar
      [SPARK-4118] [MLLIB] [PYSPARK] Python bindings for StreamingKMeans · 54976e55
      MechCoder authored
      Python bindings for StreamingKMeans
      
      Will change status to MRG once docs, tests and examples are updated.
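      A minimal usage sketch, assuming the `pyspark.mllib.clustering.StreamingKMeans` interface described here (random initial centers, then `trainOn`/`predictOn` over DStreams of vectors); the data and parameters below are illustrative:
      
      ```python
      # Hedged sketch only: a queue-backed DStream stands in for a real input stream.
      import time
      from pyspark import SparkContext
      from pyspark.streaming import StreamingContext
      from pyspark.mllib.clustering import StreamingKMeans
      from pyspark.mllib.linalg import Vectors
      
      sc = SparkContext("local[2]", "streaming-kmeans-sketch")
      ssc = StreamingContext(sc, 1)
      
      batches = [sc.parallelize([Vectors.dense([0.0]), Vectors.dense([1.0])])]
      training = ssc.queueStream(batches)
      
      model = StreamingKMeans(k=2, decayFactor=1.0).setRandomCenters(1, 1.0, 0)
      model.trainOn(training)                 # update cluster centers as each batch arrives
      model.predictOn(training).pprint()      # cluster index for each incoming vector
      
      ssc.start()
      time.sleep(5)
      ssc.stop()
      ```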
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6499 from MechCoder/spark-4118 and squashes the following commits:
      
      7722d16 [MechCoder] minor style fixes
      51052d3 [MechCoder] Doc fixes
      2061a76 [MechCoder] Add tests for simultaneous training and prediction Minor style fixes
      81482fd [MechCoder] minor
      5d9fe61 [MechCoder] predictOn should take into account the latest model
      8ab9e89 [MechCoder] Fix Python3 error
      a9817df [MechCoder] Better tests and minor fixes
      c80e451 [MechCoder] Add ignore_unicode_prefix
      ee8ce16 [MechCoder] Update tests, doc and examples
      4b1481f [MechCoder] Some changes and tests
      d8b066a [MechCoder] [SPARK-4118] [MLlib] [PySpark] Python bindings for StreamingKMeans
      54976e55
    • Davies Liu's avatar
      [SPARK-8461] [SQL] fix codegen with REPL class loader · e41e2fd6
      Davies Liu authored
      The ExecutorClassLoader for the REPL causes Janino to fail to find classes in java.lang, so switch Janino to the default class loader, which will also help performance.
      
      cc liancheng yhuai
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6898 from davies/fix_class_loader and squashes the following commits:
      
      24276d4 [Davies Liu] add regression test
      4ff0457 [Davies Liu] address comment, refactor
      7f5ffbe [Davies Liu] fix REPL class loader with codegen
      e41e2fd6
    • Liang-Chi Hsieh's avatar
      [HOTFIX] Fix scala style in DFSReadWriteTest that causes tests failed · 4a462c28
      Liang-Chi Hsieh authored
      This Scala style problem caused tests to fail.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6907 from viirya/hotfix_style and squashes the following commits:
      
      c53f188 [Liang-Chi Hsieh] Fix scala style.
      4a462c28
    • Yin Huai's avatar
      [SPARK-8368] [SPARK-8058] [SQL] HiveContext may override the context class... · c5876e52
      Yin Huai authored
      [SPARK-8368] [SPARK-8058] [SQL] HiveContext may override the context class loader of the current thread
      
      https://issues.apache.org/jira/browse/SPARK-8368
      
      Also, I added tests according to https://issues.apache.org/jira/browse/SPARK-8058.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6891 from yhuai/SPARK-8368 and squashes the following commits:
      
      37bb3db [Yin Huai] Update test timeout and comment.
      8762eec [Yin Huai] Style.
      695cd2d [Yin Huai] Correctly set the class loader in the conf of the state in client wrapper.
      b3378fe [Yin Huai] Failed tests.
      c5876e52
    • Sean Owen's avatar
      [SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps... · 4be53d03
      Sean Owen authored
      [SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps to preserve shuffle files
      
      Clarify what may cause long-running Spark apps to preserve shuffle files
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6901 from srowen/SPARK-5836 and squashes the following commits:
      
      a9faef0 [Sean Owen] Clarify what may cause long-running Spark apps to preserve shuffle files
      4be53d03
    • Andrew Or's avatar
      [SPARK-8451] [SPARK-7287] SparkSubmitSuite should check exit code · 68a2dca2
      Andrew Or authored
      This patch also reenables the tests. Now that we have access to the log4j logs it should be easier to debug the flakiness.
      
      yhuai brkyvz
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6886 from andrewor14/spark-submit-suite-fix and squashes the following commits:
      
      3f99ff1 [Andrew Or] Move destroy to finally block
      9a62188 [Andrew Or] Re-enable ignored tests
      2382672 [Andrew Or] Check for exit code
      68a2dca2
    • Tathagata Das's avatar
      [SPARK-7180] [SPARK-8090] [SPARK-8091] Fix a number of SerializationDebugger bugs and limitations · 866816eb
      Tathagata Das authored
      This PR solves three SerializationDebugger issues.
      * SPARK-7180 - SerializationDebugger fails with ArrayOutOfBoundsException
      * SPARK-8090 - SerializationDebugger does not handle classes with writeReplace correctly
      * SPARK-8091 - SerializationDebugger does not handle classes with writeObject method
      
      The solutions for each are explained as follows
      * SPARK-7180 - The wrong slot desc was used for getting the value of the fields in the object being tested.
      * SPARK-8090 - Test the type of the replaced object.
      * SPARK-8091 - Use a dummy ObjectOutputStream to collect all the objects written by the writeObject() method, and then test those objects as usual.
      
      I also added more tests to the test suite to increase code coverage. For example, I added tests for cases where there are no serializability issues.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6625 from tdas/SPARK-7180 and squashes the following commits:
      
      c7cb046 [Tathagata Das] Addressed comments on docs
      ae212c8 [Tathagata Das] Improved docs
      304c97b [Tathagata Das] Fixed build error
      26b5179 [Tathagata Das] more tests.....92% line coverage
      7e2fdcf [Tathagata Das] Added more tests
      d1967fb [Tathagata Das] Added comments.
      da75d34 [Tathagata Das] Removed unnecessary lines.
      50a608d [Tathagata Das] Fixed bugs and added support for writeObject
      866816eb
    • RJ Nowling's avatar
      Add example that reads a local file, writes to a DFS path provided by th... · a9858036
      RJ Nowling authored
      Add example that reads a local file, writes to a DFS path provided by the user, reads the file back from the DFS, and compares word counts on the local and DFS versions. Useful for verifying DFS correctness.
      
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #3347 from rnowling/dfs_read_write_test and squashes the following commits:
      
      af8ccb7 [RJ Nowling] Don't use java.io.File since DFS may not be POSIX-compatible
      b0ef9ea [RJ Nowling] Fix string style
      07c6132 [RJ Nowling] Fix string style
      7d9a8df [RJ Nowling] Fix string style
      f74c160 [RJ Nowling] Fix else statement style
      b9edf12 [RJ Nowling] Fix spark wc style
      44415b9 [RJ Nowling] Fix local wc style
      94a4691 [RJ Nowling] Fix space
      df59b65 [RJ Nowling] Fix if statements
      1b314f0 [RJ Nowling] Add scaladoc
      a931d70 [RJ Nowling] Fix import order
      0c89558 [RJ Nowling] Add example that reads a local file, writes to a DFS path provided by the user, reads the file back from the DFS, and compares word counts on the local and DFS versions. Useful for verifying DFS correctness.
      a9858036
    • Shilei's avatar
      [SPARK-8234][SQL] misc function: md5 · 0c32fc12
      Shilei authored
      Author: Shilei <shilei.qian@intel.com>
      
      Closes #6779 from qiansl127/MD5 and squashes the following commits:
      
      11fcdb2 [Shilei] Fix the indent
      04bd27b [Shilei] Add codegen
      da60eb3 [Shilei] Remove checkInputDataTypes function
      9509ad0 [Shilei] Format code
      12c61f4 [Shilei] Accept only BinaryType for Md5
      1df0b5b [Shilei] format to scala type
      60ccde1 [Shilei] Add more test case
      b8c73b4 [Shilei] Rewrite the type check for Md5
      c166167 [Shilei] Add md5 function
      0c32fc12
    • Takuya UESHIN's avatar
      [SPARK-8476] [CORE] Setters inc/decDiskBytesSpilled in TaskMetrics should also be private. · fe08561e
      Takuya UESHIN authored
      This is a follow-up of [SPARK-3288](https://issues.apache.org/jira/browse/SPARK-3288).
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #6896 from ueshin/issues/SPARK-8476 and squashes the following commits:
      
      89251d8 [Takuya UESHIN] Make inc/decDiskBytesSpilled in TaskMetrics private[spark].
      fe08561e
    • Lianhui Wang's avatar
      [SPARK-8430] ExternalShuffleBlockResolver of shuffle service should support UnsafeShuffleManager · 9baf0930
      Lianhui Wang authored
      andrewor14 can you take a look? Thanks.
      
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      Closes #6873 from lianhuiwang/SPARK-8430 and squashes the following commits:
      
      51c47ca [Lianhui Wang] update andrewor's comments
      2b27b19 [Lianhui Wang] support UnsafeShuffleManager
      9baf0930
    • Liang-Chi Hsieh's avatar
      [SPARK-8207] [SQL] Add math function bin · 2c59d5c1
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8207
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6721 from viirya/expr_bin and squashes the following commits:
      
      07e1c8f [Liang-Chi Hsieh] Remove AbstractUnaryMathExpression and let BIN inherit UnaryExpression.
      0677f1a [Liang-Chi Hsieh] For comments.
      cf62b95 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      0cf20f2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      dea9c12 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      d4f4774 [Liang-Chi Hsieh] Add @ignore_unicode_prefix.
      7a0196f [Liang-Chi Hsieh] Fix python style.
      ac2bacd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      a0a2d0f [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      4cb764d [Liang-Chi Hsieh] For comments.
      0f78682 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      c0c3197 [Liang-Chi Hsieh] Add bin to FunctionRegistry.
      824f761 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      50e0c3b [Liang-Chi Hsieh] Add math function bin(a: long): string.
      2c59d5c1
    • Xiangrui Meng's avatar
      [SPARK-8151] [MLLIB] pipeline components should correctly implement copy · 43c7ec63
      Xiangrui Meng authored
      Otherwise, extra params get ignored in `PipelineModel.transform`. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6622 from mengxr/SPARK-8087 and squashes the following commits:
      
      0e4c8c4 [Xiangrui Meng] fix merge issues
      26fc1f0 [Xiangrui Meng] address comments
      e607a04 [Xiangrui Meng] merge master
      b85b57e [Xiangrui Meng] fix examples/compile
      d6f7891 [Xiangrui Meng] rename defaultCopyWithParams to defaultCopy
      84ec278 [Xiangrui Meng] remove setter checks due to generics
      2cf2ed0 [Xiangrui Meng] snapshot
      291814f [Xiangrui Meng] OneVsRest.copy
      1dfe3bd [Xiangrui Meng] PipelineModel.copy should copy stages
      43c7ec63
    • cody koeninger's avatar
      [SPARK-8389] [STREAMING] [KAFKA] Example of getting offset ranges out o… · 47af7c1e
      cody koeninger authored
      Example of getting offset ranges out of the existing java direct stream api
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #6846 from koeninger/SPARK-8389 and squashes the following commits:
      
      3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
      47af7c1e
    • Jihong MA's avatar
      [SPARK-7265] Improving documentation for Spark SQL Hive support · ebd363ae
      Jihong MA authored
      Please review this pull request.
      
      Author: Jihong MA <linlin200605@gmail.com>
      
      Closes #5933 from JihongMA/SPARK-7265 and squashes the following commits:
      
      dfaa971 [Jihong MA] SPARK-7265 minor fix of the content
      ace454d [Jihong MA] SPARK-7265 take out PySpark on YARN limitation
      9ea0832 [Jihong MA] Merge remote-tracking branch 'upstream/master'
      d5bf3f5 [Jihong MA] Merge remote-tracking branch 'upstream/master'
      7b842e6 [Jihong MA] Merge remote-tracking branch 'upstream/master'
      9c84695 [Jihong MA] SPARK-7265 address review comment
      a399aa6 [Jihong MA] SPARK-7265 Improving documentation for Spark SQL Hive support
      ebd363ae
    • zsxwing's avatar
      [SPARK-7913] [CORE] Make AppendOnlyMap use the same growth strategy of... · 93360dc3
      zsxwing authored
      [SPARK-7913] [CORE] Make AppendOnlyMap use the same growth strategy of OpenHashSet and consistent exception message
      
      This is a follow-up PR for #6456 to make AppendOnlyMap consistent with OpenHashSet.
      
      /cc srowen andrewor14
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6879 from zsxwing/append-only-map and squashes the following commits:
      
      912c0ad [zsxwing] Fix the doc
      dd4385b [zsxwing] Make AppendOnlyMap use the same growth strategy of OpenHashSet and consistent exception message
      93360dc3
    • Carson Wang's avatar
      [SPARK-8387] [FOLLOWUP ] [WEBUI] Update driver log URL to show only 4096 bytes · 54557f35
      Carson Wang authored
      This is a follow-up to #6834, updating the driver log URL as well for consistency.
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #6878 from carsonwang/logUrl and squashes the following commits:
      
      13be948 [Carson Wang] update log URL in YarnClusterSuite
      a0004f4 [Carson Wang] Update driver log URL to show only 4096 bytes
      54557f35
    • Kevin Conor's avatar
      [SPARK-8339] [PYSPARK] integer division for python 3 · fdf63f12
      Kevin Conor authored
      itertools.islice requires an integer for the stop argument.  Switching to integer division here prevents a ValueError when vs is evaluated above.
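      For illustration (this is not the actual serializer code), the Python 3 pitfall being fixed looks like this:
      
      ```python
      from itertools import islice
      
      items = list(range(10))
      n, k = len(items), 3
      
      # In Python 3, "/" is true division and yields a float, which islice rejects
      # with a ValueError. Integer division keeps the stop argument an int on both
      # Python 2 and Python 3.
      head = list(islice(items, n // k))
      print(head)  # [0, 1, 2]
      ```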
      
      davies
      
      This is my original work, and I license it to the project.
      
      Author: Kevin Conor <kevin@discoverybayconsulting.com>
      
      Closes #6794 from kconor/kconor-patch-1 and squashes the following commits:
      
      da5e700 [Kevin Conor] Integer division for batch size
      fdf63f12
    • Bryan Cutler's avatar
      [SPARK-8444] [STREAMING] Adding Python streaming example for queueStream · a2016b4b
      Bryan Cutler authored
      A Python example similar to the existing one for Scala.
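      A rough sketch of what such a queueStream example looks like in Python (illustrative values; not necessarily the code added by this PR):
      
      ```python
      # Hedged sketch: each batch interval consumes one pre-built RDD from the queue.
      import time
      from pyspark import SparkContext
      from pyspark.streaming import StreamingContext
      
      sc = SparkContext("local[2]", "queue-stream-sketch")
      ssc = StreamingContext(sc, 1)
      
      rdd_queue = [sc.parallelize(list(range(i * 10, (i + 1) * 10))) for i in range(3)]
      stream = ssc.queueStream(rdd_queue)
      stream.map(lambda x: x * 2).pprint()
      
      ssc.start()
      time.sleep(4)
      ssc.stop()
      ```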
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #6884 from BryanCutler/streaming-queueStream-example-8444 and squashes the following commits:
      
      435ba7e [Bryan Cutler] [SPARK-8444] Fixed style checks, increased sleep time to show empty queue
      257abb0 [Bryan Cutler] [SPARK-8444] Stop context gracefully, Removed unused import, Added description comment
      376ef6e [Bryan Cutler] [SPARK-8444] Fixed bug causing DStream.pprint to append empty parenthesis to output instead of blank line
      1ff5f8b [Bryan Cutler] [SPARK-8444] Adding Python streaming example for queue_stream
      a2016b4b
    • Yu ISHIKAWA's avatar
      [SPARK-8348][SQL] Add in operator to DataFrame Column · 754929b1
      Yu ISHIKAWA authored
      I have added it only for Scala.
      
      TODO: we should also support `in` operator in Python.
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6824 from yu-iskw/SPARK-8348 and squashes the following commits:
      
      e76d02f [Yu ISHIKAWA] Not use infix notation
      6f744ac [Yu ISHIKAWA] Fit the test cases because these used the old test data set.
      00077d3 [Yu ISHIKAWA] [SPARK-8348][SQL] Add in operator to DataFrame Column
      754929b1
    • Cheng Lian's avatar
      [SPARK-8458] [SQL] Don't strip scheme part of output path when writing ORC files · a71cbbde
      Cheng Lian authored
      `Path.toUri.getPath` strips scheme part of output path (from `file:///foo` to `/foo`), which causes ORC data source only writes to the file system configured in Hadoop configuration. Should use `Path.toString` instead.
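      For intuition, a small Python analogy (this is not the Hadoop `Path` API): taking only the path component of a URI drops the scheme, so the intended file system can no longer be distinguished from the default one.
      
      ```python
      # Illustrative analogy only, using the standard library rather than Hadoop's Path.
      from urllib.parse import urlparse
      
      output = "file:///tmp/orc-output"
      print(urlparse(output).path)  # '/tmp/orc-output'       -- scheme is lost
      print(output)                 # 'file:///tmp/orc-output' -- scheme preserved
      ```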
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6892 from liancheng/spark-8458 and squashes the following commits:
      
      87f8199 [Cheng Lian] Don't strip scheme of output path when writing ORC files
      a71cbbde
  4. Jun 18, 2015
    • Dibyendu Bhattacharya's avatar
      [SPARK-8080] [STREAMING] Receiver.store with Iterator does not give correct count at Spark UI · 3eaed876
      Dibyendu Bhattacharya authored
      tdas zsxwing this is the new PR for SPARK-8080
      
      I have merged https://github.com/apache/spark/pull/6659
      
      Also, to mention: for MEMORY_ONLY settings, when a block cannot be safely unrolled to memory because there is not enough space, the BlockManager won't try to put the block, and ReceivedBlockHandler will throw a SparkException because it cannot find the block id in PutResult. Thus the number of records in the block won't be counted if the block failed to unroll in memory, which is fine.
      
      For MEMORY_AND_DISK settings, if the BlockManager is not able to unroll the block to memory, the block will still get written to disk. The same applies to the WAL-based store. So for those cases (storage level = memory + disk), the number of records will be counted even though the block could not be unrolled to memory.
      
      Thus I added isFullyConsumed to the CountingIterator, but have not used it, since the case where a block is not fully consumed yet ReceivedBlockHandler still gets the block ID will never happen.
      
      I have also added a few test cases to cover those block unrolling scenarios.
      
      Author: Dibyendu Bhattacharya <dibyendu.bhattacharya1@pearson.com>
      Author: U-PEROOT\UBHATD1 <UBHATD1@PIN-L-PI046.PEROOT.com>
      
      Closes #6707 from dibbhatt/master and squashes the following commits:
      
      f6cb6b5 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
      f37cfd8 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
      5a8344a [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI Count ByteBufferBlock as 1 count
      fceac72 [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
      0153e7e [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI Fixed comments given by @zsxwing
      4c5931d [Dibyendu Bhattacharya] [SPARK-8080][STREAMING] Receiver.store with Iterator does not give correct count at Spark UI
      01e6dc8 [U-PEROOT\UBHATD1] A
      3eaed876
    • Lars Francke's avatar
      [SPARK-8462] [DOCS] Documentation fixes for Spark SQL · 4ce3bab8
      Lars Francke authored
      This fixes various minor documentation issues on the Spark SQL page
      
      Author: Lars Francke <lars.francke@gmail.com>
      
      Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits:
      
      dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
      34eff2c [Lars Francke] Minor documentation fixes
      4ce3bab8
    • Sandy Ryza's avatar
      [SPARK-8135] Don't load defaults when reconstituting Hadoop Configurations · 43f50dec
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #6679 from sryza/sandy-spark-8135 and squashes the following commits:
      
      c5554ff [Sandy Ryza] SPARK-8135. In SerializableWritable, don't load defaults when instantiating Configuration
      43f50dec
    • Reynold Xin's avatar
      [SPARK-8218][SQL] Binary log math function update. · dc413138
      Reynold Xin authored
      Some minor updates after merging #6725.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6871 from rxin/log and squashes the following commits:
      
      ab51542 [Reynold Xin] Use JVM log
      76fc8de [Reynold Xin] Fixed arg.
      a7c1522 [Reynold Xin] [SPARK-8218][SQL] Binary log math function update.
      dc413138