Skip to content
Snippets Groups Projects
  1. Apr 04, 2014
    • Haoyuan Li's avatar
      SPARK-1305: Support persisting RDD's directly to Tachyon · b50ddfde
      Haoyuan Li authored
      Move the PR#468 of apache-incubator-spark to the apache-spark
      "Adding an option to persist Spark RDD blocks into Tachyon."
      
      Author: Haoyuan Li <haoyuan@cs.berkeley.edu>
      Author: RongGu <gurongwalker@gmail.com>
      
      Closes #158 from RongGu/master and squashes the following commits:
      
      72b7768 [Haoyuan Li] merge master
      9f7fa1b [Haoyuan Li] fix code style
      ae7834b [Haoyuan Li] minor cleanup
      a8b3ec6 [Haoyuan Li] merge master branch
      e0f4891 [Haoyuan Li] better check offheap.
      55b5918 [RongGu] address matei's comment on the replication of offHeap storagelevel
      7cd4600 [RongGu] remove some logic code for tachyonstore's replication
      51149e7 [RongGu] address aaron's comment on returning value of the remove() function in tachyonstore
      8adfcfa [RongGu] address arron's comment on inTachyonSize
      120e48a [RongGu] changed the root-level dir name in Tachyon
      5cc041c [Haoyuan Li] address aaron's comments
      9b97935 [Haoyuan Li] address aaron's comments
      d9a6438 [Haoyuan Li] fix for pspark
      77d2703 [Haoyuan Li] change python api.git status
      3dcace4 [Haoyuan Li] address matei's comments
      91fa09d [Haoyuan Li] address patrick's comments
      589eafe [Haoyuan Li] use TRY_CACHE instead of MUST_CACHE
      64348b2 [Haoyuan Li] update conf docs.
      ed73e19 [Haoyuan Li] Merge branch 'master' of github.com:RongGu/spark-1
      619a9a8 [RongGu] set number of directories in TachyonStore back to 64; added a TODO tag for duplicated code from the DiskStore
      be79d77 [RongGu] find a way to clean up some unnecessay metods and classed to make the code simpler
      49cc724 [Haoyuan Li] update docs with off_headp option
      4572f9f [RongGu] reserving the old apply function API of StorageLevel
      04301d3 [RongGu] rename StorageLevel.TACHYON to Storage.OFF_HEAP
      c9aeabf [RongGu] rename the StorgeLevel.TACHYON as StorageLevel.OFF_HEAP
      76805aa [RongGu] unifies the config properties name prefix; add the configs into docs/configuration.md
      e700d9c [RongGu] add the SparkTachyonHdfsLR example and some comments
      fd84156 [RongGu] use randomUUID to generate sparkapp directory name on tachyon;minor code style fix
      939e467 [Haoyuan Li] 0.4.1-thrift from maven central
      86a2eab [Haoyuan Li] tachyon 0.4.1-thrift is in the staging repo. but jenkins failed to download it. temporarily revert it back to 0.4.1
      16c5798 [RongGu] make the dependency on tachyon as tachyon-0.4.1-thrift
      eacb2e8 [RongGu] Merge branch 'master' of https://github.com/RongGu/spark-1
      bbeb4de [RongGu] fix the JsonProtocolSuite test failure problem
      6adb58f [RongGu] Merge branch 'master' of https://github.com/RongGu/spark-1
      d827250 [RongGu] fix JsonProtocolSuie test failure
      716e93b [Haoyuan Li] revert the version
      ca14469 [Haoyuan Li] bump tachyon version to 0.4.1-thrift
      2825a13 [RongGu] up-merging to the current master branch of the apache spark
      6a22c1a [Haoyuan Li] fix scalastyle
      8968b67 [Haoyuan Li] exclude more libraries from tachyon dependency to be the same as referencing tachyon-client.
      77be7e8 [RongGu] address mateiz's comment about the temp folder name problem. The implementation followed mateiz's advice.
      1dcadf9 [Haoyuan Li] typo
      bf278fa [Haoyuan Li] fix python tests
      e82909c [Haoyuan Li] minor cleanup
      776a56c [Haoyuan Li] address patrick's and ali's comments from the previous PR
      8859371 [Haoyuan Li] various minor fixes and clean up
      e3ddbba [Haoyuan Li] add doc to use Tachyon cache mode.
      fcaeab2 [Haoyuan Li] address Aaron's comment
      e554b1e [Haoyuan Li] add python code
      47304b3 [Haoyuan Li] make tachyonStore in BlockMananger lazy val; add more comments StorageLevels.
      dc8ef24 [Haoyuan Li] add old storelevel constructor
      e01a271 [Haoyuan Li] update tachyon 0.4.1
      8011a96 [RongGu] fix a brought-in mistake in StorageLevel
      70ca182 [RongGu] a bit change in comment
      556978b [RongGu] fix the scalastyle errors
      791189b [RongGu] "Adding an option to persist Spark RDD blocks into Tachyon." move the PR#468 of apache-incubator-spark to the apache-spark
      b50ddfde
    • Matei Zaharia's avatar
      SPARK-1414. Python API for SparkContext.wholeTextFiles · 60e18ce7
      Matei Zaharia authored
      Also clarified comment on each file having to fit in memory
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #327 from mateiz/py-whole-files and squashes the following commits:
      
      9ad64a5 [Matei Zaharia] SPARK-1414. Python API for SparkContext.wholeTextFiles
      60e18ce7
  2. Mar 10, 2014
  3. Mar 06, 2014
    • Prabin Banka's avatar
      SPARK-1187, Added missing Python APIs · 3d3acef0
      Prabin Banka authored
      The following Python APIs are added,
      RDD.id()
      SparkContext.setJobGroup()
      SparkContext.setLocalProperty()
      SparkContext.getLocalProperty()
      SparkContext.sparkUser()
      
      was raised earlier as a part of  apache/incubator-spark#486
      
      Author: Prabin Banka <prabin.banka@imaginea.com>
      
      Closes #75 from prabinb/python-api-backup and squashes the following commits:
      
      cc3c6cd [Prabin Banka] Added missing Python APIs
      3d3acef0
  4. Feb 20, 2014
    • Ahir Reddy's avatar
      SPARK-1114: Allow PySpark to use existing JVM and Gateway · 59b13795
      Ahir Reddy authored
      Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      
      Closes #622 from ahirreddy/pyspark-existing-jvm and squashes the following commits:
      
      a86f457 [Ahir Reddy] Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.
      59b13795
  5. Jan 28, 2014
    • Josh Rosen's avatar
      Switch from MUTF8 to UTF8 in PySpark serializers. · 1381fc72
      Josh Rosen authored
      This fixes SPARK-1043, a bug introduced in 0.9.0
      where PySpark couldn't serialize strings > 64kB.
      
      This fix was written by @tyro89 and @bouk in #512.
      This commit squashes and rebases their pull request
      in order to fix some merge conflicts.
      1381fc72
  6. Jan 01, 2014
  7. Dec 30, 2013
  8. Dec 29, 2013
  9. Dec 28, 2013
  10. Dec 24, 2013
  11. Dec 18, 2013
  12. Nov 10, 2013
  13. Nov 03, 2013
  14. Oct 22, 2013
    • Ewen Cheslack-Postava's avatar
      Pass self to SparkContext._ensure_initialized. · 317a9eb1
      Ewen Cheslack-Postava authored
      The constructor for SparkContext should pass in self so that we track
      the current context and produce errors if another one is created. Add
      a doctest to make sure creating multiple contexts triggers the
      exception.
      317a9eb1
    • Ewen Cheslack-Postava's avatar
      Add classmethod to SparkContext to set system properties. · 56d230e6
      Ewen Cheslack-Postava authored
      Add a new classmethod to SparkContext to set system properties like is
      possible in Scala/Java. Unlike the Java/Scala implementations, there's
      no access to System until the JVM bridge is created. Since
      SparkContext handles that, move the initialization of the JVM
      connection to a separate classmethod that can safely be called
      repeatedly as long as the same instance (or no instance) is provided.
      56d230e6
  15. Sep 08, 2013
  16. Sep 07, 2013
  17. Sep 06, 2013
  18. Sep 01, 2013
  19. Aug 16, 2013
  20. Jul 29, 2013
    • Matei Zaharia's avatar
      SPARK-815. Python parallelize() should split lists before batching · feba7ee5
      Matei Zaharia authored
      One unfortunate consequence of this fix is that we materialize any
      collections that are given to us as generators, but this seems necessary
      to get reasonable behavior on small collections. We could add a
      batchSize parameter later to bypass auto-computation of batch size if
      this becomes a problem (e.g. if users really want to parallelize big
      generators nicely)
      feba7ee5
  21. Jul 16, 2013
  22. Feb 03, 2013
  23. Feb 01, 2013
  24. Jan 23, 2013
  25. Jan 22, 2013
  26. Jan 21, 2013
  27. Jan 20, 2013
Loading