  1. Apr 24, 2015
    • Deborah Siegel's avatar
      [SPARK-7136][Docs] Spark SQL and DataFrame Guide fix example file and paths · 59b7cfc4
      Deborah Siegel authored
      Changes the example file for Generic Load/Save Functions to users.parquet rather than people.parquet, which doesn't exist unless a later example has already been executed. Also adds file paths.
      
      Author: Deborah Siegel <deborah.siegel@gmail.com>
      Author: DEBORAH SIEGEL <deborahsiegel@d-140-142-0-49.dhcp4.washington.edu>
      Author: DEBORAH SIEGEL <deborahsiegel@DEBORAHs-MacBook-Pro.local>
      Author: DEBORAH SIEGEL <deborahsiegel@d-69-91-154-197.dhcp4.washington.edu>
      
      Closes #5693 from d3borah/master and squashes the following commits:
      
      4d5e43b [Deborah Siegel] sparkSQL doc change
      b15a497 [Deborah Siegel] Revert "sparkSQL doc change"
      5a2863c [DEBORAH SIEGEL] Merge remote-tracking branch 'upstream/master'
      91972fc [DEBORAH SIEGEL] sparkSQL doc change
      f000e59 [DEBORAH SIEGEL] Merge remote-tracking branch 'upstream/master'
      db54173 [DEBORAH SIEGEL] fixed aggregateMessages example in graphX doc
      59b7cfc4
    • linweizhong's avatar
      [PySpark][Minor] Update sql example, so that can read file correctly · d874f8b5
      linweizhong authored
      When running Spark, files are read from HDFS by default if the path does not specify a URI scheme.
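
A minimal sketch of the idea, assuming the fix is to make the path's URI scheme (the part before `://`) explicit so that a local file is not resolved against the default filesystem. The helper name `with_local_scheme` is hypothetical and not part of Spark or of this commit.

```python
import os
from urllib.parse import urlparse


def with_local_scheme(path: str) -> str:
    """Prefix a scheme-less path with file:// (hypothetical helper)."""
    if urlparse(path).scheme:
        # Already qualified, e.g. hdfs:// or file://; leave it untouched.
        return path
    # Relative paths are resolved against the current directory first.
    return "file://" + os.path.abspath(path)
```

A path that already carries a scheme, such as `hdfs://...`, passes through unchanged, so the helper is safe to apply to any input path.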
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #5684 from Sephiroth-Lin/pyspark_example_minor and squashes the following commits:
      
      19fe145 [linweizhong] Update example sql.py, so that can read file correctly
      d874f8b5
    • Calvin Jia's avatar
      [SPARK-6122] [CORE] Upgrade tachyon-client version to 0.6.3 · 438859eb
      Calvin Jia authored
      This is a reopening of #4867.
      A short summary of the issues resolved from the previous PR:
      
      1. HTTPClient version mismatch: Selenium (used for UI tests) requires version 4.3.x, and Tachyon included 4.2.5 through a transitive dependency of its shaded thrift jar. To address this, Tachyon 0.6.3 will promote the transitive dependencies of the shaded jar so they can be excluded in spark.
      
      2. Jackson-Mapper-ASL version mismatch: In lower versions of hadoop-client (i.e. 1.0.4), version 1.0.1 is included. The parquet library used in Spark SQL requires version 1.8+. It's unclear to me why upgrading tachyon-client would cause this dependency to break. The solution was to exclude jackson-mapper-asl from hadoop-client.
      
      It seems that the dependency management in spark-parent does not apply to transitive dependencies. One way to make sure jackson-mapper-asl is included with the correct version is to add it as a top-level dependency. The best solution would be to exclude the dependency in the modules which require a higher version, but that did not fix the unit tests. Any suggestions on the best way to solve this would be appreciated!
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #5354 from calvinjia/upgrade_tachyon_0.6.3 and squashes the following commits:
      
      0eefe4d [Calvin Jia] Handle httpclient version in maven dependency management. Remove httpclient version setting from profiles.
      7c00dfa [Calvin Jia] Set httpclient version to 4.3.2 for selenium. Specify version of httpclient for sql/hive (previously 4.2.5 transitive dependency of libthrift).
      9263097 [Calvin Jia] Merge master to test latest changes
      dbfc1bd [Calvin Jia] Use Tachyon 0.6.4 for cleaner dependencies.
      e2ff80a [Calvin Jia] Exclude the jetty and curator promoted dependencies from tachyon-client.
      a3a29da [Calvin Jia] Update tachyon-client exclusions.
      0ae6c97 [Calvin Jia] Change tachyon version to 0.6.3
      a204df9 [Calvin Jia] Update make distribution tachyon version.
      a93c94f [Calvin Jia] Exclude jackson-mapper-asl from hadoop client since it has a lower version than spark's expected version.
      a8a923c [Calvin Jia] Exclude httpcomponents from Tachyon
      910fabd [Calvin Jia] Update to master
      eed9230 [Calvin Jia] Update tachyon version to 0.6.1.
      11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings.
      71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.
      438859eb
    • Sun Rui's avatar
      [SPARK-6852] [SPARKR] Accept numeric as numPartitions in SparkR. · caf0136e
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #5613 from sun-rui/SPARK-6852 and squashes the following commits:
      
      abaf02e [Sun Rui] Change the type of default numPartitions from integer to numeric in generics.R.
      29d67c1 [Sun Rui] [SPARK-6852][SPARKR] Accept numeric as numPartitions in SparkR.
      caf0136e
    • Sun Rui's avatar
      [SPARK-7033] [SPARKR] Clean usage of split. Use partition instead where applicable. · ebb77b2a
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #5628 from sun-rui/SPARK-7033 and squashes the following commits:
      
      046bc9e [Sun Rui] Clean split usage in tests.
      d531c86 [Sun Rui] [SPARK-7033][SPARKR] Clean usage of split. Use partition instead where applicable.
      ebb77b2a
    • Xusen Yin's avatar
      [SPARK-6528] [ML] Add IDF transformer · 6e57d57b
      Xusen Yin authored
      See [SPARK-6528](https://issues.apache.org/jira/browse/SPARK-6528). Add IDF transformer in ML package.
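
For readers unfamiliar with the term, a minimal sketch of inverse document frequency (IDF) weighting in plain Python. It assumes the smoothed form log((m + 1) / (df + 1)), where m is the number of documents and df is the number of documents containing the term. The function names are illustrative and this is not the code added by the commit.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Return {term: idf} using the smoothed form log((m + 1) / (df + 1))."""
    m = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}


def tf_idf(doc, weights):
    """Scale raw term counts of one document by the fitted IDF weights."""
    tf = Counter(doc)
    return {term: tf[term] * weights.get(term, 0.0) for term in tf}
```

The two functions mirror a fit-then-transform split: the weights are computed once over a corpus and then applied to individual documents.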
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #5266 from yinxusen/SPARK-6528 and squashes the following commits:
      
      741db31 [Xusen Yin] get param from new paramMap
      d169967 [Xusen Yin] add final to param and IDF class
      c9c3759 [Xusen Yin] simplify test suite
      5867c09 [Xusen Yin] refine IDF transformer with new interfaces
      7727cae [Xusen Yin] Merge branch 'master' into SPARK-6528
      4338a37 [Xusen Yin] Merge branch 'master' into SPARK-6528
      aef2cdf [Xusen Yin] add doc and group for param
      5760b49 [Xusen Yin] fix code style
      2add691 [Xusen Yin] fix code style and test
      03fbecb [Xusen Yin] remove duplicated code
      2aa4be0 [Xusen Yin] clean test suite
      4802c67 [Xusen Yin] add IDF transformer and test suite
      6e57d57b
    • Xiangrui Meng's avatar
      [SPARK-7115] [MLLIB] skip the very first 1 in poly expansion · 78b39c7e
      Xiangrui Meng authored
      yinxusen
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5681 from mengxr/SPARK-7115 and squashes the following commits:
      
      9ac27cd [Xiangrui Meng] skip the very first 1 in poly expansion
      78b39c7e
    • Xusen Yin's avatar
      [SPARK-5894] [ML] Add polynomial mapper · 8509519d
      Xusen Yin authored
      See [SPARK-5894](https://issues.apache.org/jira/browse/SPARK-5894).
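
As a rough illustration of what polynomial expansion does, the sketch below maps a feature vector to all monomials of its entries up to a given degree. It omits the constant term 1, in line with the SPARK-7115 entry in this log. The function name and the ordering of the output terms are assumptions made for illustration and may not match Spark's implementation.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_expand(features, degree):
    """All monomials of total degree 1..degree; the constant term 1 is skipped."""
    expanded = []
    for d in range(1, degree + 1):
        # Each multiset of d features corresponds to one monomial of degree d.
        for combo in combinations_with_replacement(features, d):
            expanded.append(prod(combo))
    return expanded
```

For a two-feature input (x, y) and degree 2 this yields x, y, x², xy, y² in that order.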
      
      Author: Xusen Yin <yinxusen@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5245 from yinxusen/SPARK-5894 and squashes the following commits:
      
      dc461a6 [Xusen Yin] merge polynomial expansion v2
      6d0c3cc [Xusen Yin] Merge branch 'SPARK-5894' of https://github.com/mengxr/spark into mengxr-SPARK-5894
      57bfdd5 [Xusen Yin] Merge branch 'master' into SPARK-5894
      3d02a7d [Xusen Yin] Merge branch 'master' into SPARK-5894
      a067da2 [Xiangrui Meng] a new approach for poly expansion
      0789d81 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5894
      4e9aed0 [Xusen Yin] fix test suite
      95d8fb9 [Xusen Yin] fix sparse vector indices
      8d39674 [Xusen Yin] fix sparse vector expansion error
      5998dd6 [Xusen Yin] fix dense vector fillin
      fa3ade3 [Xusen Yin] change the functional code into imperative one to speedup
      b70e7e1 [Xusen Yin] remove useless case class
      6fa236f [Xusen Yin] fix vector slice error
      daff601 [Xusen Yin] fix index error of sparse vector
      6bd0a10 [Xusen Yin] merge repeated features
      419f8a2 [Xusen Yin] need to merge same columns
      4ebf34e [Xusen Yin] add test suite of polynomial expansion
      372227c [Xusen Yin] add polynomial expansion
      8509519d
    • Reynold Xin's avatar
      Fixed a typo from the previous commit. · 4c722d77
      Reynold Xin authored
      4c722d77
  2. Apr 23, 2015
    • Reynold Xin's avatar
      [SQL] Fixed expression data type matching. · d3a302de
      Reynold Xin authored
      Also took the chance to improve documentation for various types.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5675 from rxin/data-type-matching-expr and squashes the following commits:
      
      0f31856 [Reynold Xin] One more function documentation.
      27c1973 [Reynold Xin] Added more documentation.
      336a36d [Reynold Xin] [SQL] Fixed expression data type matching.
      d3a302de
    • Ken Geis's avatar
      Update sql-programming-guide.md · 67bccbda
      Ken Geis authored
      fix typo
      
      Author: Ken Geis <geis.ken@gmail.com>
      
      Closes #5674 from kgeis/patch-1 and squashes the following commits:
      
      5ae67de [Ken Geis] Update sql-programming-guide.md
      67bccbda
    • Yin Huai's avatar
      [SPARK-7060][SQL] Add alias function to python dataframe · 2d010f7a
      Yin Huai authored
      This PR provides a way for Python users to work around https://issues.apache.org/jira/browse/SPARK-6231.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5634 from yhuai/pythonDFAlias and squashes the following commits:
      
      8465acd [Yin Huai] Add an alias to a Python DF.
      2d010f7a
    • Cheolsoo Park's avatar
      [SPARK-7037] [CORE] Inconsistent behavior for non-spark config properties in... · 336f7f53
      Cheolsoo Park authored
      [SPARK-7037] [CORE] Inconsistent behavior for non-spark config properties in spark-shell and spark-submit
      
      When non-spark properties (i.e. properties whose names don't start with `spark.`) are specified on the command line and in the config file, spark-submit and spark-shell behave differently, which confuses users.
      Here is a summary:
      * spark-submit
        * --conf k=v => silently ignored
        * spark-defaults.conf => applied
      * spark-shell
        * --conf k=v => show a warning message and ignored
        * spark-defaults.conf => show a warning message and ignored
      
      I assume that ignoring non-spark properties is intentional. If so, they should always be ignored with a warning message in all cases.
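
The proposed behavior (ignore every non-spark property and always warn) can be sketched in a few lines of Python. The function name and the warning text are illustrative assumptions and not the code added by this commit.

```python
import warnings


def ignore_non_spark_properties(props):
    """Keep only keys starting with 'spark.'; warn about every other key."""
    kept = {}
    for key, value in props.items():
        if key.startswith("spark."):
            kept[key] = value
        else:
            warnings.warn(f"Ignoring non-spark config property: {key}={value}")
    return kept
```

Applying the same filter to both `--conf` arguments and the contents of spark-defaults.conf would give the consistent behavior the description asks for.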
      
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #5617 from piaozhexiu/SPARK-7037 and squashes the following commits:
      
      8957950 [Cheolsoo Park] Add IgnoreNonSparkProperties method
      fedd01c [Cheolsoo Park] Ignore non-spark properties with a warning message in all cases
      336f7f53
    • Sun Rui's avatar
      [SPARK-6818] [SPARKR] Support column deletion in SparkR DataFrame API. · 73db132b
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #5655 from sun-rui/SPARK-6818 and squashes the following commits:
      
      7c66570 [Sun Rui] [SPARK-6818][SPARKR] Support column deletion in SparkR DataFrame API.
      73db132b
    • Reynold Xin's avatar
      [SQL] Break dataTypes.scala into multiple files. · 6220d933
      Reynold Xin authored
      It was over 1000 lines of code, making it harder to find all the types. Only moved code around, and didn't change any.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5670 from rxin/break-types and squashes the following commits:
      
      8c59023 [Reynold Xin] Check in missing files.
      dcd5193 [Reynold Xin] [SQL] Break dataTypes.scala into multiple files.
      6220d933
    • Xiangrui Meng's avatar
      [SPARK-7070] [MLLIB] LDA.setBeta should call setTopicConcentration. · 1ed46a60
      Xiangrui Meng authored
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5649 from mengxr/SPARK-7070 and squashes the following commits:
      
      c66023c [Xiangrui Meng] setBeta should call setTopicConcentration
      1ed46a60
    • Tijo Thomas's avatar
      [SPARK-7087] [BUILD] Fix path issue change version script · 6d0749ca
      Tijo Thomas authored
      Author: Tijo Thomas <tijoparacka@gmail.com>
      
      Closes #5656 from tijoparacka/FIX_PATHISSUE_CHANGE_VERSION_SCRIPT and squashes the following commits:
      
      ab4f4b1 [Tijo Thomas] removed whitespace
      24478c9 [Tijo Thomas] modified to provide the spark base dir while searching for pom and also while changing the version no
      7b8e10b [Tijo Thomas] Modified for providing the base directories while finding the list of pom files and also while changing the version no
      6d0749ca
    • WangTaoTheTonic's avatar
      [SPARK-6879] [HISTORYSERVER] check if app is completed before clean it up · baa83a9a
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-6879
      
      Use `applications` to replace `FileStatus`, and check whether the app is completed before cleaning it up.
      If an exception is thrown, add the app back to `applications` so that it is retried in the next loop.
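
A minimal sketch of the cleanup loop described above, under stated assumptions: each application is a plain dict with `path`, `last_updated` and `completed` keys, `delete` is a caller-supplied function, and `OSError` stands in for whatever exception the real filesystem call raises. None of these names come from the Spark source.

```python
def clean_expired(applications, now, max_age, delete):
    """Delete expired, completed apps; return the apps to re-check next loop."""
    remaining = []
    for app in applications:
        expired = now - app["last_updated"] > max_age
        if not (expired and app["completed"]):
            # Still running or still fresh: never delete, keep tracking it.
            remaining.append(app)
            continue
        try:
            delete(app["path"])
        except OSError:
            # Deletion failed: keep the app so the next loop retries it.
            remaining.append(app)
    return remaining
```

Returning the remaining list, rather than mutating the input in place, keeps each loop iteration easy to test in isolation.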
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #5491 from WangTaoTheTonic/SPARK-6879 and squashes the following commits:
      
      4a533eb [WangTaoTheTonic] treat ACE specially
      cb45105 [WangTaoTheTonic] rebase
      d4d5251 [WangTaoTheTonic] per Marcelo's comments
      d7455d8 [WangTaoTheTonic] slightly change when delete file
      b0abca5 [WangTaoTheTonic] use global var to store apps to clean
      94adfe1 [WangTaoTheTonic] leave expired apps alone to be deleted
      9872a9d [WangTaoTheTonic] use the right path
      fdef4d6 [WangTaoTheTonic] check if app is completed before clean it up
      baa83a9a
    • wizz's avatar
      [SPARK-7085][MLlib] Fix miniBatchFraction parameter in train method called with 4 arguments · 3e91cc27
      wizz authored
      Author: wizz <wizz@wizz-dev01.kawasaki.flab.fujitsu.com>
      
      Closes #5658 from kuromatsu-nobuyuki/SPARK-7085 and squashes the following commits:
      
      6ec2d21 [wizz] Fix miniBatchFraction parameter in train method called with 4 arguments
      3e91cc27
    • Josh Rosen's avatar
      [SPARK-7058] Include RDD deserialization time in "task deserialization time" metric · 6afde2c7
      Josh Rosen authored
      The web UI's "task deserialization time" metric is slightly misleading because it does not capture the time taken to deserialize the broadcasted RDD.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5635 from JoshRosen/SPARK-7058 and squashes the following commits:
      
      ed90f75 [Josh Rosen] Update UI tooltip
      a3743b4 [Josh Rosen] Update comments.
      4f52910 [Josh Rosen] Roll back whitespace change
      e9cf9f4 [Josh Rosen] Remove unused variable
      9f32e55 [Josh Rosen] Expose executorDeserializeTime on Task instead of pushing runtime calculation into Task.
      21f5b47 [Josh Rosen] Don't double-count the broadcast deserialization time in task runtime
      1752f0e [Josh Rosen] [SPARK-7058] Incorporate RDD deserialization time in task deserialization time metric
      6afde2c7
    • Vinod K C's avatar
      [SPARK-7055][SQL]Use correct ClassLoader for JDBC Driver in JDBCRDD.getConnector · c1213e6a
      Vinod K C authored
      Author: Vinod K C <vinod.kc@huawei.com>
      
      Closes #5633 from vinodkc/use_correct_classloader_driverload and squashes the following commits:
      
      73c5380 [Vinod K C] Use correct ClassLoader for JDBC Driver
      c1213e6a
    • Tathagata Das's avatar
      [SPARK-6752][Streaming] Allow StreamingContext to be recreated from checkpoint... · 534f2a43
      Tathagata Das authored
      [SPARK-6752][Streaming] Allow StreamingContext to be recreated from checkpoint and existing SparkContext
      
      Currently, if you want to create a StreamingContext from checkpoint information, the system will create a new SparkContext. This prevents a StreamingContext from being recreated from checkpoints in managed environments where the SparkContext is precreated.
      
      The solution in this PR: Introduce the following methods on StreamingContext
      1. `new StreamingContext(checkpointDirectory, sparkContext)`
         Recreate StreamingContext from checkpoint using the provided SparkContext
      2. `StreamingContext.getOrCreate(checkpointDirectory, sparkContext, createFunction: SparkContext => StreamingContext)`
         If checkpoint file exists, then recreate StreamingContext using the provided SparkContext (that is, call 1.), else create StreamingContext using the provided createFunction
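
The control flow of the second method can be sketched in plain Python. This is only an illustration of the get-or-create pattern, not the Spark API: `restore_fn` is a hypothetical stand-in for the first method, and "checkpoint exists" is simplified to "the directory is non-empty".

```python
import os


def get_or_create(checkpoint_dir, spark_context, create_fn, restore_fn):
    """Restore from a checkpoint if one exists, else build a fresh context."""
    has_checkpoint = os.path.isdir(checkpoint_dir) and bool(os.listdir(checkpoint_dir))
    if has_checkpoint:
        # Reuse the provided context instead of creating a new one.
        return restore_fn(checkpoint_dir, spark_context)
    return create_fn(spark_context)
```

In both branches the caller's existing context is passed through, which is the point of the change for managed environments.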
      
      TODO: the corresponding Java and Python APIs have to be added as well.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5428 from tdas/SPARK-6752 and squashes the following commits:
      
      94db63c [Tathagata Das] Fix long line.
      524f519 [Tathagata Das] Many changes based on PR comments.
      eabd092 [Tathagata Das] Added Function0, Java API and unit tests for StreamingContext.getOrCreate
      36a7823 [Tathagata Das] Minor changes.
      204814e [Tathagata Das] Added StreamingContext.getOrCreate with existing SparkContext
      534f2a43
    • Cheng Hao's avatar
      [SPARK-7044] [SQL] Fix the deadlock in script transformation · cc48e638
      Cheng Hao authored
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #5625 from chenghao-intel/transform and squashes the following commits:
      
      5ec1dd2 [Cheng Hao] fix the deadlock issue in ScriptTransform
      cc48e638
    • Prabeesh K's avatar
      [minor][streaming]fixed scala string interpolation error · 975f53e4
      Prabeesh K authored
      Author: Prabeesh K <prabeesh.k@namshi.com>
      
      Closes #5653 from prabeesh/fix and squashes the following commits:
      
      9d7a9f5 [Prabeesh K] fixed scala string interpolation error
      975f53e4
    • Prashant Sharma's avatar
      [HOTFIX] [SQL] Fix compilation for scala 2.11. · a7d65d38
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #5652 from ScrapCodes/hf/compilation-fix-scala-2.11 and squashes the following commits:
      
      819ff06 [Prashant Sharma] [HOTFIX] Fix compilation for scala 2.11.
      a7d65d38
    • Reynold Xin's avatar
      [SPARK-7069][SQL] Rename NativeType -> AtomicType. · f60bece1
      Reynold Xin authored
      Also renamed JvmType to InternalType.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5651 from rxin/native-to-atomic-type and squashes the following commits:
      
      cbd4028 [Reynold Xin] [SPARK-7069][SQL] Rename NativeType -> AtomicType.
      f60bece1
    • Reynold Xin's avatar
      [SPARK-7068][SQL] Remove PrimitiveType · 29163c52
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5646 from rxin/remove-primitive-type and squashes the following commits:
      
      01b673d [Reynold Xin] [SPARK-7068][SQL] Remove PrimitiveType
      29163c52
    • Reynold Xin's avatar
      [MLlib] Add support for BooleanType to VectorAssembler. · 2d33323c
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5648 from rxin/vectorAssembler-boolean and squashes the following commits:
      
      1bf3d40 [Reynold Xin] [MLlib] Add support for BooleanType to VectorAssembler.
      2d33323c
    • Liang-Chi Hsieh's avatar
      [HOTFIX][SQL] Fix broken cached test · d9e70f33
      Liang-Chi Hsieh authored
      Added in #5475. Reported as broken in #5639.
      /cc marmbrus
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5640 from viirya/fix_cached_test and squashes the following commits:
      
      c0cf69a [Liang-Chi Hsieh] Fix broken cached test.
      d9e70f33
  3. Apr 22, 2015