  1. Dec 09, 2015
    • Mark Grover's avatar
[SPARK-11796] Fix httpclient and httpcore dependency issues related to docker-client · 2166c2a7
      Mark Grover authored
      This commit fixes dependency issues which prevented the Docker-based JDBC integration tests from running in the Maven build.
      
      Author: Mark Grover <mgrover@cloudera.com>
      
      Closes #9876 from markgrover/master_docker.
      2166c2a7
    • Yin Huai's avatar
      [SPARK-11678][SQL][DOCS] Document basePath in the programming guide. · ac8cdf1c
      Yin Huai authored
This PR adds documentation for `basePath`, which is a new parameter used by `HadoopFsRelation`.
      
      The compiled doc is shown below.
      ![image](https://cloud.githubusercontent.com/assets/2072857/11673132/1ba01192-9dcb-11e5-98d9-ac0b4e92e98c.png)
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-11678
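
For reference, a minimal sketch of the option in use (the paths are hypothetical, and a `sqlContext` is assumed):

```scala
// Hypothetical partitioned layout: /data/table/gender=male/part-0.parquet
// Reading a single partition directory normally drops the partition column;
// passing basePath tells partition discovery where the table root is, so
// `gender` is kept in the schema.
val df = sqlContext.read
  .option("basePath", "/data/table")
  .parquet("/data/table/gender=male")
```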
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #10211 from yhuai/basePathDoc.
      ac8cdf1c
    • Andrew Or's avatar
      [SPARK-12165][ADDENDUM] Fix outdated comments on unroll test · 8770bd12
      Andrew Or authored
cc JoshRosen
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10229 from andrewor14/unroll-test-comments.
      8770bd12
    • Andrew Ray's avatar
      [SPARK-12211][DOC][GRAPHX] Fix version number in graphx doc for migration from 1.1 · 7a8e587d
      Andrew Ray authored
The "Migration from 1.1" section added to the GraphX doc in 1.2.0 (see https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#migrating-from-spark-11) uses `{{site.SPARK_VERSION}}` as the version where the changes were introduced; it should simply be 1.2.
      
      Author: Andrew Ray <ray.andrew@gmail.com>
      
      Closes #10206 from aray/graphx-doc-1.1-migration.
      7a8e587d
    • Xusen Yin's avatar
      [SPARK-11551][DOC] Replace example code in ml-features.md using include_example · 051c6a06
      Xusen Yin authored
      PR on behalf of somideshmukh, thanks!
      
      Author: Xusen Yin <yinxusen@gmail.com>
      Author: somideshmukh <somilde@us.ibm.com>
      
      Closes #10219 from yinxusen/SPARK-11551.
      051c6a06
    • Sean Owen's avatar
      [SPARK-11824][WEBUI] WebUI does not render descriptions with 'bad' HTML, throws console error · 1eb7c22c
      Sean Owen authored
Don't warn when a description isn't valid HTML, since it may legitimately be something like "SELECT ... WHERE foo <= 1".

The tests for this code indicate that it's normal to handle strings like this, which don't contain HTML, as plain strings rather than markup. Logging every such instance as a warning is therefore too noisy, since it isn't a problem. This shows up for stages whose names contain SQL like the above.
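
A minimal sketch of the approach (not the exact patch): attempt to parse the description as XML and quietly fall back to plain text when parsing fails.

```scala
import scala.xml.{Node, XML}

// If the description parses as XML, render it as markup; otherwise
// (e.g. "SELECT ... WHERE foo <= 1") render it as plain text without
// logging a warning, since non-HTML descriptions are normal.
def makeDescription(desc: String): Node = {
  try {
    XML.loadString(s"""<span class="description-input">$desc</span>""")
  } catch {
    case _: Exception => <span class="description-input">{desc}</span>
  }
}
```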
      
      CC tdas as author of this bit of code
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10159 from srowen/SPARK-11824.
      1eb7c22c
    • Josh Rosen's avatar
      [SPARK-12165][SPARK-12189] Fix bugs in eviction of storage memory by execution · aec5ea00
      Josh Rosen authored
      This patch fixes a bug in the eviction of storage memory by execution.
      
      ## The bug:
      
      In general, execution should be able to evict storage memory when the total storage memory usage is greater than `maxMemory * spark.memory.storageFraction`. Due to a bug, however, Spark might wind up evicting no storage memory in certain cases where the storage memory usage was between `maxMemory * spark.memory.storageFraction` and `maxMemory`. For example, here is a regression test which illustrates the bug:
      
      ```scala
          val maxMemory = 1000L
          val taskAttemptId = 0L
          val (mm, ms) = makeThings(maxMemory)
          // Since we used the default storage fraction (0.5), we should be able to allocate 500 bytes
          // of storage memory which are immune to eviction by execution memory pressure.
      
          // Acquire enough storage memory to exceed the storage region size
          assert(mm.acquireStorageMemory(dummyBlock, 750L, evictedBlocks))
          assertEvictBlocksToFreeSpaceNotCalled(ms)
          assert(mm.executionMemoryUsed === 0L)
          assert(mm.storageMemoryUsed === 750L)
      
          // At this point, storage is using 250 more bytes of memory than it is guaranteed, so execution
          // should be able to reclaim up to 250 bytes of storage memory.
          // Therefore, execution should now be able to require up to 500 bytes of memory:
          assert(mm.acquireExecutionMemory(500L, taskAttemptId, MemoryMode.ON_HEAP) === 500L) // <--- fails by only returning 250L
          assert(mm.storageMemoryUsed === 500L)
          assert(mm.executionMemoryUsed === 500L)
          assertEvictBlocksToFreeSpaceCalled(ms, 250L)
      ```
      
      The problem relates to the control flow / interaction between `StorageMemoryPool.shrinkPoolToReclaimSpace()` and `MemoryStore.ensureFreeSpace()`. While trying to allocate the 500 bytes of execution memory, the `UnifiedMemoryManager` discovers that it will need to reclaim 250 bytes of memory from storage, so it calls `StorageMemoryPool.shrinkPoolToReclaimSpace(250L)`. This method, in turn, calls `MemoryStore.ensureFreeSpace(250L)`. However, `ensureFreeSpace()` first checks whether the requested space is less than `maxStorageMemory - storageMemoryUsed`, which will be true if there is any free execution memory because it turns out that `MemoryStore.maxStorageMemory = (maxMemory - onHeapExecutionMemoryPool.memoryUsed)` when the `UnifiedMemoryManager` is used.
      
      The control flow here is somewhat confusing (it grew to be messy / confusing over time / as a result of the merging / refactoring of several components). In the pre-Spark 1.6 code, `ensureFreeSpace` was called directly by the `MemoryStore` itself, whereas in 1.6 it's involved in a confusing control flow where `MemoryStore` calls `MemoryManager.acquireStorageMemory`, which then calls back into `MemoryStore.ensureFreeSpace`, which, in turn, calls `MemoryManager.freeStorageMemory`.
      
      ## The solution:
      
The solution implemented in this patch is to remove the confusing circular control flow between `MemoryManager` and `MemoryStore`, making the storage memory acquisition process much more linear / straightforward. The key changes (a simplified sketch of the resulting flow follows the list):
      
      - Remove a layer of inheritance which made the memory manager code harder to understand (53841174760a24a0df3eb1562af1f33dbe340eb9).
      - Move some bounds checks earlier in the call chain (13ba7ada77f87ef1ec362aec35c89a924e6987cb).
      - Refactor `ensureFreeSpace()` so that the part which evicts blocks can be called independently from the part which checks whether there is enough free space to avoid eviction (7c68ca09cb1b12f157400866983f753ac863380e).
      - Realize that this lets us remove a layer of overloads from `ensureFreeSpace` (eec4f6c87423d5e482b710e098486b3bbc4daf06).
- Realize that `ensureFreeSpace()` can simply be replaced with an `evictBlocksToFreeSpace()` method which is called [after we've already figured out](https://github.com/apache/spark/blob/2dc842aea82c8895125d46a00aa43dfb0d121de9/core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala#L88) how much memory needs to be reclaimed via eviction (2dc842aea82c8895125d46a00aa43dfb0d121de9).
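
A simplified, self-contained sketch of the linearized flow described above (not the actual patch; names are condensed for illustration):

```scala
// The pool decides how much must come from eviction, then asks the
// store to evict exactly that amount -- there is no circular callback
// from the store into the memory manager.
class StorageMemoryPoolSketch(var poolSize: Long, var memoryUsed: Long) {
  def memoryFree: Long = poolSize - memoryUsed

  // evict(n) stands in for MemoryStore.evictBlocksToFreeSpace(n); it
  // returns the bytes actually freed and decrements memoryUsed itself,
  // which is why this method must not decrement memoryUsed again
  // (the SPARK-12189 double-counting fix).
  def shrinkPoolToFreeSpace(spaceToFree: Long, evict: Long => Long): Long = {
    val freedFromUnused = math.min(spaceToFree, memoryFree)
    val remaining = spaceToFree - freedFromUnused
    val freedByEviction = if (remaining > 0) evict(remaining) else 0L
    poolSize -= (freedFromUnused + freedByEviction)
    freedFromUnused + freedByEviction
  }
}
```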
      
      Along the way, I fixed some problems with the mocks in `MemoryManagerSuite`: the old mocks would [unconditionally](https://github.com/apache/spark/blob/80a824d36eec9d9a9f092ee1741453851218ec73/core/src/test/scala/org/apache/spark/memory/MemoryManagerSuite.scala#L84) report that a block had been evicted even if there was enough space in the storage pool such that eviction would be avoided.
      
I also fixed a problem where `StorageMemoryPool._memoryUsed` might become negative due to freed memory being double-counted when execution evicts storage. The problem was that `StorageMemoryPool.shrinkPoolToFreeSpace` would [decrement `_memoryUsed`](https://github.com/apache/spark/commit/7c68ca09cb1b12f157400866983f753ac863380e#diff-935c68a9803be144ed7bafdd2f756a0fL133) even though `StorageMemoryPool.freeMemory` had already decremented it as each evicted block was freed. See SPARK-12189 for details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10170 from JoshRosen/SPARK-12165.
      aec5ea00
    • Steve Loughran's avatar
      [SPARK-12241][YARN] Improve failure reporting in Yarn client obtainTokenForHBase() · 442a7715
      Steve Loughran authored
This lines up the HBase token logic with that done for Hive in SPARK-11265: reflection with only `ClassNotFoundException` being swallowed.
      
There is a test, though it doesn't try to put HBase on the yarn/test classpath and really exercise the reflection (the way the Hive introspection test does). If people want that, it could be added with careful POM work.
      
Also: cut an incorrect comment from the Hive test case before copying it, and removed a couple of imports that may have been related to the Hive test in the past.
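
Illustratively, the pattern looks roughly like the following (a sketch, not the exact patch; the reflective target is the standard HBase `TokenUtil` entry point):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.token.Token

// Only ClassNotFoundException is interpreted as "HBase is not on the
// classpath"; any other reflective failure surfaces instead of being
// silently swallowed.
def obtainTokenForHBase(hbaseConf: Configuration): Option[Token[_]] =
  try {
    val obtainToken = Class
      .forName("org.apache.hadoop.hbase.security.token.TokenUtil")
      .getMethod("obtainToken", classOf[Configuration])
    Some(obtainToken.invoke(null, hbaseConf).asInstanceOf[Token[_]])
  } catch {
    case _: ClassNotFoundException => None // HBase absent: not an error
  }
```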
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #10227 from steveloughran/stevel/patches/SPARK-12241-obtainTokenForHBase.
      442a7715
    • jerryshao's avatar
      [SPARK-10582][YARN][CORE] Fix AM failure situation for dynamic allocation · 6900f017
      jerryshao authored
After an AM failure, the target executor number tracked by the driver and the AM can diverge, which leads to unexpected behavior in dynamic allocation. So when the AM re-registers with the driver, state in `ExecutorAllocationManager` and `CoarseGrainedSchedulerBackend` should be reset.
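
A hedged sketch of the kind of reset involved (names follow the description above; the fields and class are illustrative, not the exact patch):

```scala
// On AM re-registration, bring allocation state back in sync so the
// driver and the new AM agree on the expected executor count.
class AllocationStateSketch(val initialNumExecutors: Int) {
  var numExecutorsTarget: Int = initialNumExecutors
  var numExecutorsToAdd: Int = 1
  val executorsPendingToRemove = scala.collection.mutable.Set.empty[String]

  def reset(): Unit = synchronized {
    numExecutorsTarget = initialNumExecutors  // forget the stale target
    numExecutorsToAdd = 1                     // restart the add ramp-up
    executorsPendingToRemove.clear()          // drop stale removal requests
  }
}
```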
      
This issue was originally addressed in #8737 and is re-opened here. Thanks a lot to KaiXinXiaoLei for finding this issue.
      
andrewor14 and vanzin, would you please help to review this? Thanks a lot.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #9963 from jerryshao/SPARK-10582.
      6900f017
    • Holden Karau's avatar
      [SPARK-10299][ML] word2vec should allow users to specify the window size · 22b9a874
      Holden Karau authored
Currently word2vec has the window size hard-coded at 5; some users may want different sizes (for example, when using n-gram input or similar). The user request comes from http://stackoverflow.com/questions/32231975/spark-word2vec-window-size .
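
Illustrative usage of the new parameter (column names are made up):

```scala
import org.apache.spark.ml.feature.Word2Vec

// The window size was previously hard-coded at 5; with this change it
// can be set explicitly, e.g. for n-gram style input.
val word2Vec = new Word2Vec()
  .setInputCol("tokens")
  .setOutputCol("vectors")
  .setVectorSize(100)
  .setWindowSize(10)
```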
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8513 from holdenk/SPARK-10299-word2vec-should-allow-users-to-specify-the-window-size.
      22b9a874
    • Cheng Lian's avatar
      [SPARK-12012][SQL] Show more comprehensive PhysicalRDD metadata when visualizing SQL query plan · 6e1c55ea
      Cheng Lian authored
      This PR adds a `private[sql]` method `metadata` to `SparkPlan`, which can be used to describe detail information about a physical plan during visualization. Specifically, this PR uses this method to provide details of `PhysicalRDD`s translated from a data source relation. For example, a `ParquetRelation` converted from Hive metastore table `default.psrc` is now shown as the following screenshot:
      
      ![image](https://cloud.githubusercontent.com/assets/230655/11526657/e10cb7e6-9916-11e5-9afa-f108932ec890.png)
      
      And here is the screenshot for a regular `ParquetRelation` (not converted from Hive metastore table) loaded from a really long path:
      
      ![output](https://cloud.githubusercontent.com/assets/230655/11680582/37c66460-9e94-11e5-8f50-842db5309d5a.png)
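
In sketch form (simplified from the description above; class names and metadata keys are illustrative, not the full patch):

```scala
// SparkPlan gains a metadata hook that the SQL tab's visualization can
// render alongside each node...
abstract class PlanNodeSketch {
  def metadata: Map[String, String] = Map.empty
}

// ...and a PhysicalRDD-like node overrides it to describe the relation
// it was translated from.
class PhysicalRDDSketch(format: String, paths: Seq[String]) extends PlanNodeSketch {
  override def metadata: Map[String, String] =
    Map("Format" -> format, "InputPaths" -> paths.mkString(", "))
}
```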
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10004 from liancheng/spark-12012.physical-rdd-metadata.
      6e1c55ea
    • uncleGen's avatar
[SPARK-12031][CORE][BUG] Integer overflow when doing sampling · a1132168
      uncleGen authored
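
The description is terse; as background, this class of bug typically arises when sizes are multiplied in `Int` arithmetic. An illustrative, not Spark-specific, sketch:

```scala
val sampleSizePerPartition = 100000  // hypothetical numbers
val numPartitions = 30000

// Int multiplication overflows silently: 3,000,000,000 > Int.MaxValue
val wrong = sampleSizePerPartition * numPartitions        // negative value
// Promote to Long before multiplying to avoid the overflow
val right = sampleSizePerPartition.toLong * numPartitions // 3000000000L
```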
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #10023 from uncleGen/1.6-bugfix.
      a1132168
    • hyukjinkwon's avatar
      [SPARK-11676][SQL] Parquet filter tests all pass if filters are not really pushed down · f6883bb7
      hyukjinkwon authored
Currently, Parquet predicate tests all pass even if filters are not pushed down or pushdown is disabled.

In this PR, to check filter evaluation, it simply builds the expression from `expression.Filter` and then tries to create filters just like Spark does.

To check the results, it manually accesses the child RDD (of `expression.Filter`) and produces the results which should be filtered properly, then compares them to the expected values.

Now, if filters are not pushed down or pushdown is disabled, the tests throw exceptions.
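
A rough sketch of what such a check can look like (the identifiers and the exact plan-inspection path are assumptions, not the PR's code; a `sqlContext` and Parquet data at `path` with an integer column `a` are assumed):

```scala
import org.apache.spark.sql.execution.Filter

// Instead of letting the test pass vacuously, fail loudly when the
// expected Filter is missing from the executed plan.
val df = sqlContext.read.parquet(path).filter("a > 1")
val planned = df.queryExecution.executedPlan.collectFirst {
  case f: Filter => f
}
assert(planned.isDefined, "Predicate was not planned for evaluation")
```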
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #9659 from HyukjinKwon/SPARK-11676.
      f6883bb7