  1. Jun 30, 2015
  2. Jun 29, 2015
    • Patrick Wendell's avatar
      MAINTENANCE: Automated closing of pull requests. · ea775b06
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #1767 (close requested by 'andrewor14')
      Closes #6952 (close requested by 'andrewor14')
      Closes #7051 (close requested by 'andrewor14')
      Closes #5357 (close requested by 'marmbrus')
      Closes #5233 (close requested by 'andrewor14')
      Closes #6930 (close requested by 'JoshRosen')
      Closes #5502 (close requested by 'andrewor14')
      Closes #6778 (close requested by 'andrewor14')
      Closes #7006 (close requested by 'andrewor14')
      ea775b06
    • Josh Rosen's avatar
      [SPARK-5161] Parallelize Python test execution · 7bbbe380
      Josh Rosen authored
      This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times.  Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism).
      
      To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests.
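The parallel runner can be sketched in a few lines (a minimal stdlib sketch, not the actual `dev/run-tests` code; `run_module` is a hypothetical stand-in for launching one test module):

```python
import concurrent.futures

def run_module(name):
    # Hypothetical stand-in for launching one test module and
    # collecting its exit code, as python/run-tests does per module.
    return (name, 0)

def run_all(modules, parallelism=4):
    # Run modules in a pool; parallelism is configurable, mirroring
    # the -p/--parallelism flag described above (default 4).
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallelism) as pool:
        return dict(pool.map(run_module, modules))

results = run_all(["pyspark.rdd", "pyspark.sql", "pyspark.streaming"], parallelism=2)
```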
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7031 from JoshRosen/parallelize-python-tests and squashes the following commits:
      
      feb3763 [Josh Rosen] Re-enable other tests
      f87ea81 [Josh Rosen] Only log output from failed tests
      d4ded73 [Josh Rosen] Logging improvements
      a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests
      1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests
      110cd9d [Josh Rosen] Fix universal_newlines for Python 3
      cd13db8 [Josh Rosen] Also log python_implementation
      9e31127 [Josh Rosen] Log Python --version output for each executable.
      a2b9094 [Josh Rosen] Bump up parallelism.
      5552380 [Josh Rosen] Python 3 fix
      866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks
      87cb988 [Josh Rosen] Skip MLLib tests for PyPy
      8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure
      9129027 [Josh Rosen] Disable Spark UI in Python tests
      037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins.
      af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution
      7bbbe380
    • Yanbo Liang's avatar
      [SPARK-7667] [MLLIB] MLlib Python API consistency check · f9b6bf2f
      Yanbo Liang authored
      MLlib Python API consistency check
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6856 from yanboliang/spark-7667 and squashes the following commits:
      
      21bae35 [Yanbo Liang] remove duplicate code
      eb12f95 [Yanbo Liang] fix doc inherit problem
      9e7ec3c [Yanbo Liang] address comments
      e763d32 [Yanbo Liang] MLlib Python API consistency check
      f9b6bf2f
    • Steven She's avatar
      [SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7 · 4915e9e3
      Steven She authored
      Patch to fix crash with BINARY fields with ENUM original types.
      
      Author: Steven She <steven@canopylabs.com>
      
      Closes #7048 from stevencanopy/SPARK-8669 and squashes the following commits:
      
      2e72979 [Steven She] [SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7
      4915e9e3
    • Burak Yavuz's avatar
      [SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab · ecacb1e8
      Burak Yavuz authored
      cc yhuai
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #7100 from brkyvz/ct-flakiness-fix and squashes the following commits:
      
      abc299a [Burak Yavuz] change 'to' to until
      7e96d7c [Burak Yavuz] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab
      ecacb1e8
    • Feynman Liang's avatar
      [SPARK-8456] [ML] Ngram featurizer python · 620605a4
      Feynman Liang authored
      Python API for N-gram feature transformer
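What the transformer computes can be sketched in plain Python (an illustration of n-gram extraction, not the MLlib implementation):

```python
def ngrams(tokens, n=2):
    # Slide a window of size n over the token sequence and join each window.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams(["a", "b", "c", "d"], 2))  # ['a b', 'b c', 'c d']
```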
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #6960 from feynmanliang/ngram-featurizer-python and squashes the following commits:
      
      f9e37c9 [Feynman Liang] Remove debugging code
      4dd81f4 [Feynman Liang] Fix typo and doctest
      06c79ac [Feynman Liang] Style guide
      26c1175 [Feynman Liang] Add python NGram API
      620605a4
    • Andrew Or's avatar
      Revert "[SPARK-8437] [DOCS] Using directory path without wildcard for filename... · 4c1808be
      Andrew Or authored
      Revert "[SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles"
      
      This reverts commit 5d30eae5.
      4c1808be
    • Michael Sannella x268's avatar
      [SPARK-8019] [SPARKR] Support SparkR spawning worker R processes with a command other than Rscript · 4a9e03fa
      Michael Sannella x268 authored
      This is a simple change to add a new environment variable
      "spark.sparkr.r.command" that specifies the command that SparkR will
      use when creating an R engine process.  If this is not specified,
      "Rscript" will be used by default.
      
      I did not add any documentation, since I couldn't find any place where
      environment variables (such as "spark.sparkr.use.daemon") are
      documented.
      
      I also did not add a unit test.  The only test that would work
      generally would be one starting SparkR with
      sparkR.init(sparkEnvir=list(spark.sparkr.r.command="Rscript")), just
      using the default value.  I think that this is a low-risk change.
      
      Likely committers: shivaram
      
      Author: Michael Sannella x268 <msannell@tibco.com>
      
      Closes #6557 from msannell/altR and squashes the following commits:
      
      7eac142 [Michael Sannella x268] add spark.sparkr.r.command config parameter
      4a9e03fa
    • Burak Yavuz's avatar
      [SPARK-8410] [SPARK-8475] remove previous ivy resolution when using spark-submit · d7f796da
      Burak Yavuz authored
      This PR also re-orders the repositories used when resolving packages: user-provided repositories are prioritized.
      
      cc andrewor14
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #7089 from brkyvz/delete-prev-ivy-resolution and squashes the following commits:
      
      a21f95a [Burak Yavuz] remove previous ivy resolution when using spark-submit
      d7f796da
    • Sean Owen's avatar
      [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow... · 5d30eae5
      Sean Owen authored
      [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles
      
      Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/'
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #7036 from srowen/SPARK-8437 and squashes the following commits:
      
      0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'
      5d30eae5
    • Yin Huai's avatar
      [SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in SparkSubmitSuite · fbf75738
      Yin Huai authored
      Hopefully, this suite will not be flaky anymore.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7027 from yhuai/SPARK-8567 and squashes the following commits:
      
      c0167e2 [Yin Huai] Add sc.stop().
      fbf75738
    • zsxwing's avatar
      [SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuite "receiver info reporting" · cec98525
      zsxwing authored
      As per the unit test log in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35754/
      
      ```
      15/06/24 23:09:10.210 Thread-3495 INFO ReceiverTracker: Starting 1 receivers
      15/06/24 23:09:10.270 Thread-3495 INFO SparkContext: Starting job: apply at Transformer.scala:22
      ...
      15/06/24 23:09:14.259 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Started receiver and sleeping
      15/06/24 23:09:14.270 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Reporting error and sleeping
      ```
      
      it needs at least 4 seconds to receive all receiver events on this slow machine, but the `timeout` for `eventually` is only 2 seconds.
      This PR increases `timeout` to make this test stable.
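The retry-until-timeout pattern the fix relies on can be sketched as follows (a plain-Python analogue of ScalaTest's `eventually`, not the suite's actual code):

```python
import time

def eventually(assertion, timeout=4.0, interval=0.1):
    # Re-run an assertion until it passes or the timeout expires,
    # analogous to ScalaTest's `eventually` used by the suite.
    deadline = time.monotonic() + timeout
    while True:
        try:
            return assertion()
        except AssertionError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)

events = []
def check():
    # Simulate receiver events trickling in slowly.
    events.append("receiver_started")
    assert len(events) >= 3
    return len(events)

assert eventually(check, timeout=2.0, interval=0.01) == 3
```

The flakiness fix is simply giving such a loop a deadline longer than the slowest observed machine.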
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7017 from zsxwing/SPARK-8634 and squashes the following commits:
      
      719cae4 [zsxwing] Fix flaky test StreamingListenerSuite "receiver info reporting"
      cec98525
    • Wenchen Fan's avatar
      [SPARK-8589] [SQL] cleanup DateTimeUtils · 881662e9
      Wenchen Fan authored
      Move date/time-related operations into `DateTimeUtils` and rename some methods to make them clearer.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6980 from cloud-fan/datetime and squashes the following commits:
      
      9373a9d [Wenchen Fan] cleanup DateTimeUtil
      881662e9
    • Yin Huai's avatar
      [SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def. · 4b497a72
      Yin Huai authored
      jira: https://issues.apache.org/jira/browse/SPARK-8710
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #7094 from yhuai/SPARK-8710 and squashes the following commits:
      
      c854baa [Yin Huai] Change ScalaReflection.mirror from a val to a def.
      4b497a72
    • Rosstin's avatar
      [SPARK-8661][ML] for LinearRegressionSuite.scala, changed javadoc-style... · 4e880cf5
      Rosstin authored
      [SPARK-8661][ML] for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments, to make copy-pasting R code more simple
      
      for mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments, to make copy-pasting R code more simple
      
      Author: Rosstin <asterazul@gmail.com>
      
      Closes #7098 from Rosstin/SPARK-8661 and squashes the following commits:
      
      5a05dee [Rosstin] SPARK-8661 for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments to make it easier to copy-paste the R code.
      bb9a4b1 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8660
      242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala
      2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
      21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
      6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
      4e880cf5
    • Davies Liu's avatar
      [SPARK-8579] [SQL] support arbitrary object in UnsafeRow · ed359de5
      Davies Liu authored
      This PR brings arbitrary object support in UnsafeRow (both in grouping key and aggregation buffer).
      
      Two object pools will be created to hold those non-primitive objects, and their indices are stored in the UnsafeRow. In order to compare grouping keys as bytes, the objects in a key are stored in a unique object pool, to make sure the same objects have the same index (used as hashCode).
      
      For StringType and BinaryType, we still store them as var-length in the UnsafeRow when initializing, for better performance. But on update they become objects inside the object pools (leaving some garbage in the buffer).
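The key-side pool idea can be pictured with a small sketch (pure Python with hypothetical names, not the actual UnsafeRow code): equal objects map to the same small index, so the row stores an int and keys can still be compared as bytes.

```python
class ObjectPool:
    def __init__(self):
        self._index = {}    # object -> slot index
        self._objects = []  # slot index -> object

    def put(self, obj):
        # Equal objects share one slot, so the index can double as a hashCode.
        if obj not in self._index:
            self._index[obj] = len(self._objects)
            self._objects.append(obj)
        return self._index[obj]

    def get(self, idx):
        return self._objects[idx]

pool = ObjectPool()
i = pool.put(("a", 1))
j = pool.put(("a", 1))  # equal value -> same index
```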
      
      BTW: Will create a JIRA once issue.apache.org is available.
      
      cc JoshRosen rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6959 from davies/unsafe_obj and squashes the following commits:
      
      5ce39da [Davies Liu] fix comment
      5e797bf [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj
      5803d64 [Davies Liu] fix conflict
      461d304 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj
      2f41c90 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj
      b04d69c [Davies Liu] address comments
      4859b80 [Davies Liu] fix comments
      f38011c [Davies Liu] add a test for grouping by decimal
      d2cf7ab [Davies Liu] add more tests for null checking
      71983c5 [Davies Liu] add test for timestamp
      e8a1649 [Davies Liu] reuse buffer for string
      39f09ca [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj
      035501e [Davies Liu] fix style
      236d6de [Davies Liu] support arbitrary object in UnsafeRow
      ed359de5
    • BenFradet's avatar
      [SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf · 931da5c8
      BenFradet authored
      Follow-up of #6902 for being coherent between ```Udf``` and ```UDF```
      
      Author: BenFradet <benjamin.fradet@gmail.com>
      
      Closes #6920 from BenFradet/SPARK-8478 and squashes the following commits:
      
      c500f29 [BenFradet] renamed a few variables in functions to use UDF
      8ab0f2d [BenFradet] renamed idUdf to idUDF in SQLQuerySuite
      98696c2 [BenFradet] renamed originalUdfs in TestHive to originalUDFs
      7738f74 [BenFradet] modified HiveUDFSuite to use only UDF
      c52608d [BenFradet] renamed HiveUdfSuite to HiveUDFSuite
      e51b9ac [BenFradet] renamed ExtractPythonUdfs to ExtractPythonUDFs
      8c756f1 [BenFradet] renamed Hive UDF related code
      2a1ca76 [BenFradet] renamed pythonUdfs to pythonUDFs
      261e6fb [BenFradet] renamed ScalaUdf to ScalaUDF
      931da5c8
    • Rosstin's avatar
      [SPARK-8660][ML] Convert JavaDoc style comments... · c8ae887e
      Rosstin authored
      [SPARK-8660][ML] Convert JavaDoc style comments in LogisticRegressionSuite.scala to regular multiline comments, to make copy-pasting R commands easier
      
      Converted JavaDoc style comments in mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala to regular multiline comments, to make copy-pasting R commands easier.
      
      Author: Rosstin <asterazul@gmail.com>
      
      Closes #7096 from Rosstin/SPARK-8660 and squashes the following commits:
      
      242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala
      2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
      21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
      6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
      c8ae887e
    • Ai He's avatar
      [SPARK-7810] [PYSPARK] solve python rdd socket connection problem · ecd3aacf
      Ai He authored
      Method "_load_from_socket" in rdd.py cannot load data from the JVM socket when IPv6 is used. The current method only works with IPv4; the new modification handles both protocols.
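The essence of the fix, resolving with `getaddrinfo` and trying each returned address family instead of hard-coding IPv4, can be sketched with the stdlib (an illustration, not the actual `_load_from_socket` code):

```python
import socket
import threading

def connect_any(host, port):
    # Try each address family returned by getaddrinfo (IPv4 and/or IPv6)
    # instead of hard-coding AF_INET.
    err = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
            sock.connect(addr)
            return sock
        except OSError as exc:
            err = exc
    raise err

# Demo against a throwaway local listener.
server = socket.socket()
server.bind(("localhost", 0))
server.listen(1)
threading.Thread(target=server.accept, daemon=True).start()
client = connect_any("localhost", server.getsockname()[1])
client.close()
server.close()
```

The stdlib's `socket.create_connection` implements the same fallback loop.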
      
      Author: Ai He <ai.he@ussuning.com>
      Author: AiHe <ai.he@ussuning.com>
      
      Closes #6338 from AiHe/pyspark-networking-issue and squashes the following commits:
      
      d4fc9c4 [Ai He] handle code review 2
      e75c5c8 [Ai He] handle code review
      5644953 [AiHe] solve python rdd socket connection problem to jvm
      ecd3aacf
    • Ilya Ganelin's avatar
      [SPARK-8056][SQL] Design an easier way to construct schema for both Scala and Python · f6fc254e
      Ilya Ganelin authored
      I've added functionality to create a new StructType, similar to how we add parameters to a new SparkContext.
      
      I've also added tests for this type of creation.
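The chainable-`add` idea can be sketched with a toy class (hypothetical, not the real `StructType`): each call appends a field and returns the instance, so schemas build fluently.

```python
class StructType:
    def __init__(self, fields=None):
        self.fields = list(fields or [])

    def add(self, name, dtype, nullable=True):
        # Append a (name, type, nullable) field and return self for chaining.
        self.fields.append((name, dtype, nullable))
        return self

schema = StructType().add("id", "long", False).add("name", "string")
```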
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #6686 from ilganeli/SPARK-8056B and squashes the following commits:
      
      27c1de1 [Ilya Ganelin] Rename
      467d836 [Ilya Ganelin] Removed from_string in favor of _parse_Datatype_json_value
      5fef5a4 [Ilya Ganelin] Updates for type parsing
      4085489 [Ilya Ganelin] Style errors
      3670cf5 [Ilya Ganelin] added string to DataType conversion
      8109e00 [Ilya Ganelin] Fixed error in tests
      41ab686 [Ilya Ganelin] Fixed style errors
      e7ba7e0 [Ilya Ganelin] Moved some python tests to tests.py. Added cleaner handling of null data type and added test for correctness of input format
      15868fa [Ilya Ganelin] Fixed python errors
      b79b992 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-8056B
      a3369fc [Ilya Ganelin] Fixing space errors
      e240040 [Ilya Ganelin] Style
      bab7823 [Ilya Ganelin] Constructor error
      73d4677 [Ilya Ganelin] Style
      4ed00d9 [Ilya Ganelin] Fixed default arg
      67df57a [Ilya Ganelin] Removed Foo
      04cbf0c [Ilya Ganelin] Added comments for single object
      0484d7a [Ilya Ganelin] Restored second method
      6aeb740 [Ilya Ganelin] Style
      689e54d [Ilya Ganelin] Style
      f497e9e [Ilya Ganelin] Got rid of old code
      e3c7a88 [Ilya Ganelin] Fixed doctest failure
      a62ccde [Ilya Ganelin] Style
      966ac06 [Ilya Ganelin] style checks
      dabb7e6 [Ilya Ganelin] Added Python tests
      a3f4152 [Ilya Ganelin] added python bindings and better comments
      e6e536c [Ilya Ganelin] Added extra space
      7529a2e [Ilya Ganelin] Fixed formatting
      d388f86 [Ilya Ganelin] Fixed small bug
      c4e3bf5 [Ilya Ganelin] Reverted to using parse. Updated parse to support long
      d7634b6 [Ilya Ganelin] Reverted to fromString to properly support types
      22c39d5 [Ilya Ganelin] replaced FromString with DataTypeParser.parse. Replaced empty constructor initializing a null to have it instead create a new array to allow appends to it.
      faca398 [Ilya Ganelin] [SPARK-8056] Replaced default argument usage. Updated usage and code for DataType.fromString
      1acf76e [Ilya Ganelin] Scala style
      e31c674 [Ilya Ganelin] Fixed bug in test
      8dc0795 [Ilya Ganelin] Added tests for creation of StructType object with new methods
      fdf7e9f [Ilya Ganelin] [SPARK-8056] Created add methods to facilitate building new StructType objects.
      f6fc254e
    • Josh Rosen's avatar
      [SPARK-8709] Exclude hadoop-client's mockito-all dependency · 27ef8545
      Josh Rosen authored
      This patch excludes `hadoop-client`'s dependency on `mockito-all`.  As of #7061, Spark depends on `mockito-core` instead of `mockito-all`, so the dependency from Hadoop was leading to test compilation failures for some of the Hadoop 2 SBT builds.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7090 from JoshRosen/SPARK-8709 and squashes the following commits:
      
      e190122 [Josh Rosen] [SPARK-8709] Exclude hadoop-client's mockito-all dependency.
      27ef8545
    • Davies Liu's avatar
      [SPARK-8070] [SQL] [PYSPARK] avoid spark jobs in createDataFrame · afae9766
      Davies Liu authored
      Avoid unnecessary Spark jobs when inferring the schema from a list.
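The idea, inspecting the local list directly rather than launching a distributed job, can be sketched as follows (a hypothetical illustration, not the actual PySpark inference code):

```python
def infer_schema(rows, sample=100):
    # Infer column types by looking at the first rows of the local list
    # directly; no distributed work is needed for local data.
    schema = {}
    for row in rows[:sample]:
        for key, value in row.items():
            schema.setdefault(key, type(value).__name__)
    return schema

print(infer_schema([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]))
```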
      
      cc yhuai mengxr
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6606 from davies/improve_create and squashes the following commits:
      
      a5928bf [Davies Liu] Update MimaExcludes.scala
      62da911 [Davies Liu] fix mima
      bab4d7d [Davies Liu] Merge branch 'improve_create' of github.com:davies/spark into improve_create
      eee44a8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_create
      8d9292d [Davies Liu] Update context.py
      eb24531 [Davies Liu] Update context.py
      c969997 [Davies Liu] bug fix
      d5a8ab0 [Davies Liu] fix tests
      8c3f10d [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_create
      6ea5925 [Davies Liu] address comments
      6ceaeff [Davies Liu] avoid spark jobs in createDataFrame
      afae9766
    • Burak Yavuz's avatar
      [SPARK-8681] fixed wrong ordering of columns in crosstab · be7ef067
      Burak Yavuz authored
      I specifically randomized the test. What crosstab does is equivalent to a countByKey; therefore, if this test fails again for any reason, we will know that we hit a corner case.
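The countByKey equivalence can be sketched in plain Python (an illustration, not the DataFrame implementation): a contingency table is just a count over (row, column) pairs.

```python
from collections import Counter

def crosstab(pairs):
    # Count each (row_key, col_key) pair, then lay the counts out as a table.
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = crosstab([("a", "x"), ("a", "x"), ("b", "y")])
```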
      
      cc rxin marmbrus
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #7060 from brkyvz/crosstab-fixes and squashes the following commits:
      
      0a65234 [Burak Yavuz] addressed comments v1
      d96da7e [Burak Yavuz] fixed wrong ordering of columns in crosstab
      be7ef067
    • Cheng Hao's avatar
      [SPARK-7862] [SQL] Disable the error message redirect to stderr · c6ba2ea3
      Cheng Hao authored
      This is a follow-up of #6404: ScriptTransformation prints the error message directly to stderr, which can be a disaster for the application log.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #6882 from chenghao-intel/verbose and squashes the following commits:
      
      bfedd77 [Cheng Hao] revert the write
      76ff46b [Cheng Hao] update the CircularBuffer
      692b19e [Cheng Hao] check the process exitValue for ScriptTransform
      47e0970 [Cheng Hao] Use the RedirectThread instead
      1de771d [Cheng Hao] naming the threads in ScriptTransformation
      8536e81 [Cheng Hao] disable the error message redirection for stderr
      c6ba2ea3
    • zhichao.li's avatar
      [SPARK-8214] [SQL] Add function hex · 637b4eed
      zhichao.li authored
      cc chenghao-intel  adrian-wang
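Since the description is terse, what a SQL `hex` computes can be sketched in plain Python (an approximation; the actual Spark semantics, e.g. for negative numbers, may differ in detail):

```python
import binascii

def sql_hex(value):
    # Hex of a non-negative integer, or of the UTF-8 bytes of a string,
    # loosely mirroring what a SQL hex() function returns.
    if isinstance(value, int):
        return format(value, "X")
    return binascii.hexlify(value.encode()).decode().upper()

print(sql_hex(17))       # 11
print(sql_hex("Spark"))  # 537061726B
```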
      
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #6976 from zhichao-li/hex and squashes the following commits:
      
      e218d1b [zhichao.li] turn off scalastyle for non-ascii
      de3f5ea [zhichao.li] non-ascii char
      cf9c936 [zhichao.li] give separated buffer for each hex method
      967ec90 [zhichao.li] Make 'value' as a feild of Hex
      3b2fa13 [zhichao.li] tiny fix
      a647641 [zhichao.li] remove duplicate null check
      7cab020 [zhichao.li] tiny refactoring
      35ecfe5 [zhichao.li] add function hex
      637b4eed
    • Kousuke Saruta's avatar
      [SQL][DOCS] Remove wrong example from DataFrame.scala · 94e040d0
      Kousuke Saruta authored
      In DataFrame.scala, there are examples like the following.
      
      ```
       * // The following are equivalent:
       * peopleDf.filter($"age" > 15)
       * peopleDf.where($"age" > 15)
       * peopleDf($"age" > 15)
      ```
      
      However, I think the last example doesn't work.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #6977 from sarutak/fix-dataframe-example and squashes the following commits:
      
      46efbd7 [Kousuke Saruta] Removed wrong example
      94e040d0
    • Vladimir Vladimirov's avatar
      [SPARK-8528] Expose SparkContext.applicationId in PySpark · 492dca3a
      Vladimir Vladimirov authored
      Use case: we want to log the applicationId (YARN in our case) to request troubleshooting help from DevOps.
      
      Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com>
      
      Closes #6936 from smartkiwi/master and squashes the following commits:
      
      870338b [Vladimir Vladimirov] this would make doctest to run in python3
      0eae619 [Vladimir Vladimirov] Scala doesn't use u'...' for unicode literals
      14d77a8 [Vladimir Vladimirov] stop using ELLIPSIS
      b4ebfc5 [Vladimir Vladimirov] addressed PR feedback - updated docstring
      223a32f [Vladimir Vladimirov] fixed test - applicationId is property that returns the string
      3221f5a [Vladimir Vladimirov] [SPARK-8528] added documentation for Scala
      2cff090 [Vladimir Vladimirov] [SPARK-8528] add applicationId property for SparkContext object in pyspark
      492dca3a
    • Tarek Auel's avatar
      [SPARK-8235] [SQL] misc function sha / sha1 · a5c2961c
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8235
      
      I added the support for sha1. If I understood rxin correctly, sha and sha1 should execute the same algorithm, shouldn't they?
      
      Please take a close look at the Python part. This is adapted from #6934
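The sha/sha1 equivalence is easy to check with the standard library (an illustration; `sha1_hex` is a hypothetical helper, not Spark's API):

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    # SHA and SHA-1 name the same 160-bit digest here, so both spellings
    # can map to hashlib.sha1.
    return hashlib.sha1(data).hexdigest()

digest = sha1_hex(b"abc")
```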
      
      Author: Tarek Auel <tarek.auel@gmail.com>
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #6963 from tarekauel/SPARK-8235 and squashes the following commits:
      
      f064563 [Tarek Auel] change to shaHex
      7ce3cdc [Tarek Auel] rely on automatic cast
      a1251d6 [Tarek Auel] Merge remote-tracking branch 'upstream/master' into SPARK-8235
      68eb043 [Tarek Auel] added docstring
      be5aff1 [Tarek Auel] improved error message
      7336c96 [Tarek Auel] added type check
      cf23a80 [Tarek Auel] simplified example
      ebf75ef [Tarek Auel] [SPARK-8301] updated the python documentation. Removed sha in python and scala
      6d6ff0d [Tarek Auel] [SPARK-8233] added docstring
      ea191a9 [Tarek Auel] [SPARK-8233] fixed signatureof python function. Added expected type to misc
      e3fd7c3 [Tarek Auel] SPARK[8235] added sha to the list of __all__
      e5dad4e [Tarek Auel] SPARK[8235] sha / sha1
      a5c2961c
    • Marcelo Vanzin's avatar
      [SPARK-8066, SPARK-8067] [hive] Add support for Hive 1.0, 1.1 and 1.2. · 3664ee25
      Marcelo Vanzin authored
      Allow HiveContext to connect to metastores of those versions; some new shims
      had to be added to account for changing internal APIs.
      
      A new test was added to exercise the "reset()" path which now also requires
      a shim; and the test code was changed to use a directory under the build's
      target to store ivy dependencies. Without that, at least I consistently run
      into issues with Ivy messing up (or being confused) by my existing caches.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7026 from vanzin/SPARK-8067 and squashes the following commits:
      
      3e2e67b [Marcelo Vanzin] [SPARK-8066, SPARK-8067] [hive] Add support for Hive 1.0, 1.1 and 1.2.
      3664ee25
    • Wenchen Fan's avatar
      [SPARK-8692] [SQL] re-order the case statements that handling catalyst data types · ed413bcc
      Wenchen Fan authored
      Use the same order: boolean, byte, short, int, date, long, timestamp, float, double, string, binary, decimal.
      
      Then we can check at a glance whether any data types are missing, and make sure we handle date/timestamp just like int/long.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7073 from cloud-fan/fix-date and squashes the following commits:
      
      463044d [Wenchen Fan] fix style
      51cd347 [Wenchen Fan] refactor handling of date and timestmap
      ed413bcc
    • Yu ISHIKAWA's avatar
      [SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license` · 715f084c
      Yu ISHIKAWA authored
      [[SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8554)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6947 from yu-iskw/SPARK-8554 and squashes the following commits:
      
      5ca240c [Yu ISHIKAWA] [SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license`
      715f084c
    • Brennon York's avatar
      [SPARK-8693] [PROJECT INFRA] profiles and goals are not printed in a nice way · 5c796d57
      Brennon York authored
      Hotfix to correct formatting errors in print statements within the dev and Jenkins builds. The error looks like:
      
      ```
      -Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  streaming-kafka-assembly/assembly
      ```
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #7085 from brennonyork/SPARK-8693 and squashes the following commits:
      
      c5575f1 [Brennon York] added commas to end of print statements for proper printing
      5c796d57
    • zsxwing's avatar
      [SPARK-8702] [WEBUI] Avoid massive string concatenation in JavaScript · 630bd5fd
      zsxwing authored
      When there are massive numbers of tasks, such as `sc.parallelize(1 to 100000, 10000).count()`, the generated JS code performs a lot of string concatenation on the stage page, nearly 40 concatenations per task.
      
      We can generate the whole string for a task instead of executing string concatenations in the browser.
      
      Before this patch, the load time of the page is about 21 seconds.
      ![screen shot 2015-06-29 at 6 44 04 pm](https://cloud.githubusercontent.com/assets/1000778/8406644/eb55ed18-1e90-11e5-9ad5-50d27ad1dff1.png)
      
      After this patch, it reduces to about 17 seconds.
      
      ![screen shot 2015-06-29 at 6 47 34 pm](https://cloud.githubusercontent.com/assets/1000778/8406665/087003ca-1e91-11e5-80a8-3485aa9adafa.png)
      
      One disadvantage is that the generated JS code becomes harder to read.
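The server-side idea, emitting one pre-built string per task instead of concatenating fragments in the browser, can be sketched in Python (an analogue of the change, not the actual UI code):

```python
# Build each task's markup as one string and join once, rather than
# performing ~40 incremental concatenations per task client-side.
tasks = [{"id": i, "ms": i * 3} for i in range(5)]

rows = "".join(
    "<tr><td>{id}</td><td>{ms} ms</td></tr>".format(**t) for t in tasks
)
```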
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7082 from zsxwing/js-string and squashes the following commits:
      
      b29231d [zsxwing] Avoid massive concating strings in Javascript
      630bd5fd
    • Reynold Xin's avatar
      [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should... · 660c6cec
      Reynold Xin authored
      [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7079 from rxin/SPARK-8698 and squashes the following commits:
      
      8513e1c [Reynold Xin] [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.
      660c6cec