  1. Oct 12, 2014
    • SPARK-3716 [GraphX] Update Analytics.scala for partitionStrategy assignment · 5a21e3e7
      NamelessAnalyst authored
      Previously, the val partitionStrategy was assigned by calling a function in the Analytics object that was a copy of the PartitionStrategy.fromString() method. That duplicate function has been removed, and partitionStrategy is now assigned with PartitionStrategy.fromString directly, which better matches the declarations of the edge/vertex StorageLevel variables.
      
      Author: NamelessAnalyst <NamelessAnalyst@users.noreply.github.com>
      
      Closes #2569 from NamelessAnalyst/branch-1.1 and squashes the following commits:
      
      c24ff51 [NamelessAnalyst] Update Analytics.scala
      5a21e3e7
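The change above amounts to resolving a strategy from its name through a single shared helper instead of a duplicated one. A minimal Python sketch of that idea (the real code is Scala in GraphX; the dictionary and function below are illustrative, though the strategy names match GraphX's):

```python
# Illustrative sketch of resolving a partition strategy from its name,
# mirroring the idea of PartitionStrategy.fromString. The mapping and the
# error message are stand-ins, not Spark's actual code.
STRATEGIES = {
    "RandomVertexCut": "RandomVertexCut",
    "CanonicalRandomVertexCut": "CanonicalRandomVertexCut",
    "EdgePartition1D": "EdgePartition1D",
    "EdgePartition2D": "EdgePartition2D",
}

def from_string(name: str) -> str:
    # Fail fast on unknown names, as a fromString-style helper would
    try:
        return STRATEGIES[name]
    except KeyError:
        raise ValueError(f"Invalid PartitionStrategy: {name}") from None
```

Having one canonical lookup keeps the CLI argument parsing consistent with the storage-level options parsed elsewhere in Analytics.scala.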
  2. Oct 09, 2014
    • [SPARK-3711][SQL] Optimize where in clause filter queries · 18ef22ab
      Yash Datta authored
      The In case class is replaced by an InSet class when all the filters are literals. InSet uses a hash set instead of a Sequence, giving a significant performance improvement (previously the Seq used a worst-case linear match via the exists method, since expressions were assumed in the filter list). The biggest improvement should be visible when only a small percentage of a large dataset matches the filter list.
      
      Author: Yash Datta <Yash.Datta@guavus.com>
      
      Closes #2561 from saucam/branch-1.1 and squashes the following commits:
      
      4bf2d19 [Yash Datta] SPARK-3711: 1. Fix code style and import order 2. Fix optimization condition 3. Add tests for null in filter list 4. Add test case that optimization is not triggered in case of attributes in filter list
      afedbcd [Yash Datta] SPARK-3711: 1. Add test cases for InSet class in ExpressionEvaluationSuite 2. Add class OptimizedInSuite on the lines of ConstantFoldingSuite, for the optimized In clause
      0fc902f [Yash Datta] SPARK-3711: UnaryMinus will be handled by constantFolding
      bd84c67 [Yash Datta] SPARK-3711: Incorporate review comments. Move optimization of In clause to Optimizer.scala by adding a rule. Add appropriate comments
      430f5d1 [Yash Datta] SPARK-3711: Optimize the filter list in case of negative values as well
      bee98aa [Yash Datta] SPARK-3711: Optimize where in clause filter queries
      18ef22ab
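The optimization above boils down to replacing a linear scan with hash-set membership. A minimal Python sketch, with invented function names (the real change is in Spark SQL's Scala optimizer):

```python
def in_filter_seq(value, literals):
    # Worst-case linear scan, like the original Seq-based In (exists)
    return any(value == lit for lit in literals)

def in_filter_set(value, literal_set):
    # Average O(1) hash lookup, like the optimized InSet
    return value in literal_set

literals = list(range(10_000))
literal_set = set(literals)
# Same answer, very different cost when few rows match a large filter list
assert in_filter_seq(9_999, literals) == in_filter_set(9_999, literal_set)
```

The equivalence holds per lookup; the win comes from evaluating the predicate once per row over large data.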
    • [SPARK-3844][UI] Truncate appName in WebUI if it is too long · 09d6a81a
      Xiangrui Meng authored
      
      Truncate appName in WebUI if it is too long.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #2707 from mengxr/truncate-app-name and squashes the following commits:
      
      87834ce [Xiangrui Meng] move scala import below java
      c7111dc [Xiangrui Meng] truncate appName in WebUI if it is too long
      
      (cherry picked from commit 86b39294)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      09d6a81a
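Truncating a display name is typically a small helper like the following hypothetical sketch (the maximum length and the ellipsis are placeholders, not Spark's actual choices):

```python
def truncate_app_name(name: str, max_len: int = 36) -> str:
    # Keep short names as-is; shorten long ones with a trailing ellipsis
    # so the WebUI header layout is never broken by a very long appName.
    if len(name) <= max_len:
        return name
    return name[: max_len - 3] + "..."
```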
  3. Oct 08, 2014
    • [SPARK-3788] [yarn] Fix compareFs to do the right thing for HDFS namespaces (1.1 version). · a44af730
      Marcelo Vanzin authored
      HA and viewfs use namespaces instead of host names, so trying to
      resolve them will fail. Be smarter and avoid doing that unnecessary
      work.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2650 from vanzin/SPARK-3788-1.1 and squashes the following commits:
      
      174bf71 [Marcelo Vanzin] Update comment.
      0e36be7 [Marcelo Vanzin] Use Objects.equal() instead of ==.
      772aead [Marcelo Vanzin] [SPARK-3788] [yarn] Fix compareFs to do the right thing for HA, federation (1.1 version).
      a44af730
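The idea behind the fix can be sketched as comparing filesystem URIs textually instead of resolving host names, which fails for logical namespaces. A hedged Python sketch with invented names (the real change is in Spark's YARN Scala code):

```python
# For HA/viewfs URIs the "authority" is a logical namespace (for example
# "nameservice1"), not a resolvable host, so two filesystems should be
# compared by scheme and authority directly rather than via DNS lookup.
from urllib.parse import urlparse

def same_filesystem(uri_a: str, uri_b: str) -> bool:
    a, b = urlparse(uri_a), urlparse(uri_b)
    # Compare scheme and authority textually; never attempt resolution
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)
```

The actual fix also uses Objects.equal() for null-safe comparison, per the squashed commits above.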
  4. Oct 07, 2014
    • [SPARK-3829] Make Spark logo image on the header of HistoryPage as a link to HistoryPage's page #1 · a1f833f7
      Kousuke Saruta authored
      There is a Spark logo on the header of HistoryPage.
      We can accumulate too many HistoryPages if we run 20+ applications, so I think it is useful to make the logo a link to page 1 of the HistoryPage.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2690 from sarutak/SPARK-3829 and squashes the following commits:
      
      908c109 [Kousuke Saruta] Removed extra space.
      00bfbd7 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3829
      dd87480 [Kousuke Saruta] Made header Spark logo image as a link to History Server's top page.
      
      (cherry picked from commit b69c9fb6)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      a1f833f7
    • [SPARK-3777] Display "Executor ID" for Tasks in Stage page · e8afb733
      zsxwing authored
      Now the Stage page only displays "Executor" (host) for tasks, but there may be more than one executor running on the same host. Currently, when a task hangs, I only know the host of the faulty executor, so I have to check all executors on that host.
      
      Adding "Executor ID" to the Tasks table would help locate the faulty executor. Here is the new page:
      
      ![add_executor_id_for_tasks](https://cloud.githubusercontent.com/assets/1000778/4505774/acb9648c-4afa-11e4-8826-8768a0a60cc9.png)
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #2642 from zsxwing/SPARK-3777 and squashes the following commits:
      
      37945af [zsxwing] Put Executor ID and Host into one cell
      4bbe2c7 [zsxwing] [SPARK-3777] Display "Executor ID" for Tasks in Stage page
      
      (cherry picked from commit 446063ec)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      e8afb733
    • [SPARK-3731] [PySpark] fix memory leak in PythonRDD · 55318302
      Davies Liu authored
      
      The parent.getOrCompute() of PythonRDD is executed in a separate thread, which should release the memory reserved for shuffle and unrolling when it finishes.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2668 from davies/leak and squashes the following commits:
      
      ae98be2 [Davies Liu] fix memory leak in PythonRDD
      
      (cherry picked from commit bc87cc41)
      Signed-off-by: Josh Rosen <joshrosen@apache.org>
      
      Conflicts:
      	core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
      55318302
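The fix's pattern, releasing reserved memory in a `finally` block on the computing thread, can be sketched as follows (all names are illustrative; the real fix is in PythonRDD's Scala code):

```python
# Work done on a separate thread must free reserved resources in `finally`,
# so the memory reserved for shuffle/unrolling is released even when the
# computation fails.
reserved_bytes = 0

def compute_partition(fail: bool = False):
    global reserved_bytes
    reserved_bytes += 1024  # stand-in for memory reserved for unrolling
    try:
        if fail:
            raise RuntimeError("task failed")
        return [1, 2, 3]
    finally:
        reserved_bytes -= 1024  # always released, as the fix ensures
```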
    • [SPARK-3825] Log more detail when unrolling a block fails · 267c7be3
      Andrew Or authored
      
      Before:
      ```
      14/10/06 16:45:42 WARN CacheManager: Not enough space to cache partition rdd_0_2
      in memory! Free memory is 481861527 bytes.
      ```
      After:
      ```
      14/10/07 11:08:24 WARN MemoryStore: Not enough space to cache rdd_2_0 in memory!
      (computed 68.8 MB so far)
      14/10/07 11:08:24 INFO MemoryStore: Memory use = 1088.0 B (blocks) + 445.1 MB
      (scratch space shared across 8 thread(s)) = 445.1 MB. Storage limit = 459.5 MB.
      ```
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2688 from andrewor14/cache-log-message and squashes the following commits:
      
      28e33d6 [Andrew Or] Shy away from "unrolling"
      5638c49 [Andrew Or] Grammar
      39a0c28 [Andrew Or] Log more detail when unrolling a block fails
      
      (cherry picked from commit 553737c6)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      267c7be3
    • [SPARK-3808] PySpark fails to start in Windows · 3a7875d9
      Masayoshi TSUZUKI authored
      
      Fixed a syntax error in the *.cmd scripts.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2669 from tsudukim/feature/SPARK-3808 and squashes the following commits:
      
      7f804e6 [Masayoshi TSUZUKI] [SPARK-3808] PySpark fails to start in Windows
      
      (cherry picked from commit 12e2551e)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      3a7875d9
    • [SPARK-3827] Very long RDD names are not rendered properly in web UI · 82ab4a79
      Hossein authored
      
      With Spark SQL we generate very long RDD names. These names are not properly rendered in the web UI.
      
      This PR fixes the rendering issue.
      
      [SPARK-3827] #comment Linking PR with JIRA
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #2687 from falaki/sparkTableUI and squashes the following commits:
      
      fd06409 [Hossein] Limit width of cell when RDD name is too long
      
      (cherry picked from commit d65fd554)
      Signed-off-by: Josh Rosen <joshrosen@apache.org>
      82ab4a79
  5. Oct 05, 2014
    • [SPARK-3792][SQL] Enable JavaHiveQLSuite · 964e3aa4
      scwf authored
      
      Do not use TestSQLContext in JavaHiveQLSuite, since that may lead to two SparkContexts in one JVM, and re-enable JavaHiveQLSuite.
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2652 from scwf/fix-JavaHiveQLSuite and squashes the following commits:
      
      be35c91 [scwf] enable JavaHiveQLSuite
      
      (cherry picked from commit 58f5361c)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
      964e3aa4
    • SPARK-1656: Fix potential resource leaks · c068d908
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1656
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #577 from zsxwing/SPARK-1656 and squashes the following commits:
      
      c431095 [zsxwing] Add a comment and fix the code style
      2de96e5 [zsxwing] Make sure file will be deleted if exception happens
      28b90dc [zsxwing] Update to follow the code style
      4521d6e [zsxwing] Merge branch 'master' into SPARK-1656
      afc3383 [zsxwing] Update to follow the code style
      071fdd1 [zsxwing] SPARK-1656: Fix potential resource leaks
      
      (cherry picked from commit a7c73130)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      c068d908
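The "make sure file will be deleted if exception happens" commit follows a common cleanup pattern, sketched here with illustrative names (the actual fix is in Spark's Scala code):

```python
import os

def write_file(path: str, data: bytes, fail: bool = False) -> None:
    # Write data; if anything goes wrong, remove the partially written
    # file so no stale file leaks on disk.
    try:
        with open(path, "wb") as f:
            f.write(data)
            if fail:
                raise RuntimeError("simulated failure")
    except Exception:
        if os.path.exists(path):
            os.remove(path)
        raise
```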
    • [SPARK-3597][Mesos] Implement `killTask`. · d9cf4d08
      Brenden Matthews authored
      
      The MesosSchedulerBackend did not previously implement `killTask`,
      resulting in an exception.
      
      Author: Brenden Matthews <brenden@diddyinc.com>
      
      Closes #2453 from brndnmtthws/implement-killtask and squashes the following commits:
      
      23ddcdc [Brenden Matthews] [SPARK-3597][Mesos] Implement `killTask`.
      
      (cherry picked from commit 32fad423)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      d9cf4d08
  7. Oct 02, 2014
    • [DEPLOY] SPARK-3759: Return the exit code of the driver process · 699af62d
      Eric Eijkelenboom authored
      
      SparkSubmitDriverBootstrapper.scala now returns the exit code of the driver process, instead of always returning 0.
      
      Author: Eric Eijkelenboom <ee@userreport.com>
      
      Closes #2628 from ericeijkelenboom/master and squashes the following commits:
      
      cc4a571 [Eric Eijkelenboom] Return the exit code of the driver process
      
      (cherry picked from commit 42d5077f)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      699af62d
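Propagating the child's exit code can be sketched as follows (the child command is an arbitrary stand-in for the driver process; the real change is in the Scala bootstrapper):

```python
import subprocess
import sys

# Run a child process and capture its exit code instead of always
# returning 0, as the bootstrapper fix does.
proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
driver_exit_code = proc.returncode  # what the wrapper should now return
```

In the real SparkSubmitDriverBootstrapper this corresponds to exiting with the driver process's code rather than unconditionally with 0.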
    • [SPARK-3755][Core] avoid trying privileged port when request a non-privileged port · 16789f62
      scwf authored
      
      pwendell, `tryPort` is not compatible with the old code in the last PR; this is to fix it.
      After discussing with srowen, I renamed the title to "avoid trying privileged port when request a non-privileged port". Please refer to the discussion for details.
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2623 from scwf/1-1024 and squashes the following commits:
      
      10a4437 [scwf] add comment
      de3fd17 [scwf] do not try privileged port when request a non-privileged port
      42cb0fa [scwf] make tryPort compatible with old code
      cb8cc76 [scwf] do not use port 1 - 1024
      
      (cherry picked from commit 8081ce8b)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      
      Conflicts:
      	core/src/main/scala/org/apache/spark/util/Utils.scala
      16789f62
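One way to retry ports without falling back into the privileged range is to wrap within [1024, 65535]. The formula below is illustrative, not necessarily what Utils.scala does:

```python
def next_retry_port(port: int, offset: int) -> int:
    # When retrying a non-privileged port, wrap within [1024, 65535] so a
    # retry never lands in the privileged 1-1023 range.
    return (port + offset - 1024) % (65536 - 1024) + 1024
```

For example, retrying past 65535 wraps back to 1024 rather than to 1.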
    • [SQL][Docs] Update the output of printSchema and fix a typo in SQL programming guide. · 68693519
      Yin Huai authored
      
      We have changed the output format of `printSchema`. This PR will update our SQL programming guide to show the updated format. Also, it fixes a typo (the value type of `StructType` in Java API).
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2630 from yhuai/sqlDoc and squashes the following commits:
      
      267d63e [Yin Huai] Update the output of printSchema and fix a typo.
      
      (cherry picked from commit 82a6a083)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
      68693519
  13. Sep 26, 2014
    • SPARK-3639 | Removed settings master in examples · d6ed5abf
      aniketbhatnagar authored
      
      This patch removes the setting of master to local in the Kinesis examples so that users can set it at job submission.
      
      Author: aniketbhatnagar <aniket.bhatnagar@gmail.com>
      
      Closes #2536 from aniketbhatnagar/Kinesis-Examples-Master-Unset and squashes the following commits:
      
      c9723ac [aniketbhatnagar] Merge remote-tracking branch 'origin/Kinesis-Examples-Master-Unset' into Kinesis-Examples-Master-Unset
      fec8ead [aniketbhatnagar] SPARK-3639 | Removed settings master in examples
      31cdc59 [aniketbhatnagar] SPARK-3639 | Removed settings master in examples
      
      (cherry picked from commit d16e161d)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      d6ed5abf
  14. Sep 23, 2014
    • [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI · 505ed6ba
      Mubarak Seyed authored
      This is a refactored version of the original PR https://github.com/apache/spark/pull/1723 by mubarak.
      
      Please take a look andrewor14, mubarak
      
      Author: Mubarak Seyed <mubarak.seyed@gmail.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2464 from tdas/streaming-callsite and squashes the following commits:
      
      dc54c71 [Tathagata Das] Made changes based on PR comments.
      390b45d [Tathagata Das] Fixed minor bugs.
      904cd92 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-callsite
      7baa427 [Tathagata Das] Refactored getCallSite and setCallSite to make it simpler. Also added unit test for DStream creation site.
      b9ed945 [Mubarak Seyed] Adding streaming utils
      c461cf4 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      ceb43da [Mubarak Seyed] Changing default regex function name
      8c5d443 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      196121b [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      491a1eb [Mubarak Seyed] Removing streaming visibility from getRDDCreationCallSite in DStream
      33a7295 [Mubarak Seyed] Fixing review comments: Merging both setCallSite methods
      c26d933 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      f51fd9f [Mubarak Seyed] Fixing scalastyle, Regex for Utils.getCallSite, and changing method names in DStream
      5051c58 [Mubarak Seyed] Getting return value of compute() into variable and call setCallSite(prevCallSite) only once. Adding return for other code paths (for None)
      a207eb7 [Mubarak Seyed] Fixing code review comments
      ccde038 [Mubarak Seyed] Removing Utils import from MappedDStream
      2a09ad6 [Mubarak Seyed] Changes in Utils.scala for SPARK-1853
      1d90cc3 [Mubarak Seyed] Changes for SPARK-1853
      5f3105a [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      70f494f [Mubarak Seyed] Changes for SPARK-1853
      1500deb [Mubarak Seyed] Changes in Spark Streaming UI
      9d38d3c [Mubarak Seyed] [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
      d466d75 [Mubarak Seyed] Changes for spark streaming UI
      
      (cherry picked from commit 729952a5)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      505ed6ba
    • [SPARK-3653] Respect SPARK_*_MEMORY for cluster mode · 5bbc621f
      Andrew Or authored
      `SPARK_DRIVER_MEMORY` was only used to start the `SparkSubmit` JVM, which becomes the driver only in client mode but not cluster mode. In cluster mode, this property is simply not propagated to the worker nodes.
      
      `SPARK_EXECUTOR_MEMORY` is picked up from `SparkContext`, but in cluster mode the driver runs on one of the worker machines, where this environment variable may not be set.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2500 from andrewor14/memory-env-vars and squashes the following commits:
      
      6217b38 [Andrew Or] Respect SPARK_*_MEMORY for cluster mode
      
      Conflicts:
      	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
      5bbc621f
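The precedence this fix needs can be sketched as a simple fallback chain; it also shows why the environment variable must be visible wherever the driver actually runs, which in cluster mode is a worker node. The default value below is a placeholder, not Spark's actual default:

```python
def resolve_driver_memory(conf: dict, env: dict) -> str:
    # Precedence sketch: explicit config, then SPARK_DRIVER_MEMORY from the
    # environment of the machine running the driver, then a default.
    return (conf.get("spark.driver.memory")
            or env.get("SPARK_DRIVER_MEMORY")
            or "512m")
```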
    • SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach ... · ffd97be3
      Sandy Ryza authored
      
      ...the driver
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #2487 from sryza/sandy-spark-3612 and squashes the following commits:
      
      2b7353d [Sandy Ryza] SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach the driver
      (cherry picked from commit d79238d0)
      
      Signed-off-by: Patrick Wendell <pwendell@gmail.com>
      ffd97be3
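The fix's behavior, tolerating a failed heartbeat instead of quitting, can be sketched as follows (all names are illustrative; the real change is in the executor's Scala heartbeat code):

```python
def heartbeat_loop(send, attempts: int = 3) -> int:
    # A failed heartbeat is swallowed and retried on the next interval
    # rather than terminating the executor. Returns how many heartbeats
    # reached the driver.
    ok = 0
    for _ in range(attempts):
        try:
            send()
            ok += 1
        except Exception:
            # Previously a failure here could cause the executor to quit;
            # now it is tolerated and the loop continues.
            pass
    return ok
```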