  1. Sep 26, 2014
    • SPARK-3639 | Removed settings master in examples · d6ed5abf
      aniketbhatnagar authored
      
      This patch removes the hard-coded local master in the Kinesis examples so that users can set it via spark-submit.
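      
      For illustration, a minimal sketch of the resulting pattern (example class name assumed, not taken from the patch):
      
      ```scala
      import org.apache.spark.SparkConf
      
      // No .setMaster("local[*]") any more: the master now comes from the
      // submission command, e.g. `spark-submit --master local[4] ...`.
      val sparkConf = new SparkConf().setAppName("KinesisWordCountASL")
      ```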
      
      Author: aniketbhatnagar <aniket.bhatnagar@gmail.com>
      
      Closes #2536 from aniketbhatnagar/Kinesis-Examples-Master-Unset and squashes the following commits:
      
      c9723ac [aniketbhatnagar] Merge remote-tracking branch 'origin/Kinesis-Examples-Master-Unset' into Kinesis-Examples-Master-Unset
      fec8ead [aniketbhatnagar] SPARK-3639 | Removed settings master in examples
      31cdc59 [aniketbhatnagar] SPARK-3639 | Removed settings master in examples
      
      (cherry picked from commit d16e161d)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
  2. Sep 23, 2014
    • [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI · 505ed6ba
      Mubarak Seyed authored
      This is a refactored version of the original PR https://github.com/apache/spark/pull/1723
      
      Please take a look andrewor14, mubarak
      
      Author: Mubarak Seyed <mubarak.seyed@gmail.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2464 from tdas/streaming-callsite and squashes the following commits:
      
      dc54c71 [Tathagata Das] Made changes based on PR comments.
      390b45d [Tathagata Das] Fixed minor bugs.
      904cd92 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-callsite
      7baa427 [Tathagata Das] Refactored getCallSite and setCallSite to make it simpler. Also added unit test for DStream creation site.
      b9ed945 [Mubarak Seyed] Adding streaming utils
      c461cf4 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      ceb43da [Mubarak Seyed] Changing default regex function name
      8c5d443 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      196121b [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      491a1eb [Mubarak Seyed] Removing streaming visibility from getRDDCreationCallSite in DStream
      33a7295 [Mubarak Seyed] Fixing review comments: Merging both setCallSite methods
      c26d933 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      f51fd9f [Mubarak Seyed] Fixing scalastyle, Regex for Utils.getCallSite, and changing method names in DStream
      5051c58 [Mubarak Seyed] Getting return value of compute() into variable and call setCallSite(prevCallSite) only once. Adding return for other code paths (for None)
      a207eb7 [Mubarak Seyed] Fixing code review comments
      ccde038 [Mubarak Seyed] Removing Utils import from MappedDStream
      2a09ad6 [Mubarak Seyed] Changes in Utils.scala for SPARK-1853
      1d90cc3 [Mubarak Seyed] Changes for SPARK-1853
      5f3105a [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
      70f494f [Mubarak Seyed] Changes for SPARK-1853
      1500deb [Mubarak Seyed] Changes in Spark Streaming UI
      9d38d3c [Mubarak Seyed] [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
      d466d75 [Mubarak Seyed] Changes for spark streaming UI
      
      (cherry picked from commit 729952a5)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
    • [SPARK-3653] Respect SPARK_*_MEMORY for cluster mode · 5bbc621f
      Andrew Or authored
      `SPARK_DRIVER_MEMORY` was only used to start the `SparkSubmit` JVM, which becomes the driver only in client mode but not cluster mode. In cluster mode, this property is simply not propagated to the worker nodes.
      
      `SPARK_EXECUTOR_MEMORY` is picked up from `SparkContext`, but in cluster mode the driver runs on one of the worker machines, where this environment variable may not be set.
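      
      A hedged sketch of the resolution order this implies (names and default are assumptions, not the actual SparkSubmit code):
      
      ```scala
      // Prefer an explicit CLI value, then the environment variable, then a default.
      def resolveDriverMemory(cliValue: Option[String]): String =
        cliValue                                      // e.g. from --driver-memory, if given
          .orElse(sys.env.get("SPARK_DRIVER_MEMORY")) // must now reach cluster-mode drivers too
          .getOrElse("512m")                          // assumed default
      ```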
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2500 from andrewor14/memory-env-vars and squashes the following commits:
      
      6217b38 [Andrew Or] Respect SPARK_*_MEMORY for cluster mode
      
      Conflicts:
      	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
    • SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach ... · ffd97be3
      Sandy Ryza authored
      
      ...the driver
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #2487 from sryza/sandy-spark-3612 and squashes the following commits:
      
      2b7353d [Sandy Ryza] SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach the driver
      (cherry picked from commit d79238d0)
      
      Signed-off-by: Patrick Wendell <pwendell@gmail.com>
  3. Sep 19, 2014
    • [Docs] Fix outdated docs for standalone cluster · fd883532
      andrewor14 authored
      
      This is now supported!
      
      Author: andrewor14 <andrewor14@gmail.com>
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2461 from andrewor14/document-standalone-cluster and squashes the following commits:
      
      85c8b9e [andrewor14] Wording change per Patrick
      35e30ee [Andrew Or] Fix outdated docs for standalone cluster
      
      (cherry picked from commit 8af23706)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
    • [SPARK-2062][GraphX] VertexRDD.apply does not use the mergeFunc · 1687d6ba
      Larry Xiao authored
      
      VertexRDD.apply had a bug where it ignored the merge function for
      duplicate vertices and instead used whichever vertex attribute occurred
      first. This commit fixes the bug by passing the merge function through
      to ShippableVertexPartition.apply, which merges any duplicates using the
      merge function and then fills in missing vertices using the specified
      default vertex attribute. This commit also adds a unit test for
      VertexRDD.apply.
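      
      A plain-Scala model of the merge semantics the fix restores (GraphX API elided; ordinary collections stand in for RDDs):
      
      ```scala
      val attrs = Seq((1L, "a"), (1L, "b"), (2L, "c"))
      val mergeFunc = (a: String, b: String) => a + b
      
      // After the fix, duplicate vertex IDs are combined with mergeFunc:
      val merged = attrs.groupBy(_._1).mapValues(_.map(_._2).reduce(mergeFunc))
      // merged == Map(1L -> "ab", 2L -> "c"); before the fix, vertex 1 silently
      // kept whichever attribute happened to come first.
      ```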
      
      Author: Larry Xiao <xiaodi@sjtu.edu.cn>
      Author: Blie Arkansol <xiaodi@sjtu.edu.cn>
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #1903 from larryxiao/2062 and squashes the following commits:
      
      625aa9d [Blie Arkansol] Merge pull request #1 from ankurdave/SPARK-2062
      476770b [Ankur Dave] ShippableVertexPartition.initFrom: Don't run mergeFunc on default values
      614059f [Larry Xiao] doc update: note about the default null value vertices construction
      dfdb3c9 [Larry Xiao] minor fix
      1c70366 [Larry Xiao] scalastyle check: wrap line, parameter list indent 4 spaces
      e4ca697 [Larry Xiao] [TEST] VertexRDD.apply mergeFunc
      6a35ea8 [Larry Xiao] [TEST] VertexRDD.apply mergeFunc
      4fbc29c [Blie Arkansol] undo unnecessary change
      efae765 [Larry Xiao] fix mistakes: should be able to call with or without mergeFunc
      b2422f9 [Larry Xiao] Merge branch '2062' of github.com:larryxiao/spark into 2062
      52dc7f7 [Larry Xiao] pass mergeFunc to VertexPartitionBase, where merge is handled
      581e9ee [Larry Xiao] TODO: VertexRDDSuite
      20d80a3 [Larry Xiao] [SPARK-2062][GraphX] VertexRDD.apply does not use the mergeFunc
      
      (cherry picked from commit 3bbbdd81)
      Signed-off-by: Ankur Dave <ankurdave@gmail.com>
  4. Sep 16, 2014
    • [SPARK-3490] Disable SparkUI for tests (backport into 1.1) · 937de93e
      Andrew Or authored
      Original PR: #2363
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2415 from andrewor14/disable-ui-for-tests-1.1 and squashes the following commits:
      
      8d9df5a [Andrew Or] Oops, missed one.
      509507d [Andrew Or] Backport #2363 (SPARK-3490) into branch-1.1
    • [SPARK-3555] Fix UISuite race condition · 856156b4
      Andrew Or authored
      
      The test "jetty selects different port under contention" is flaky.
      
      If another process binds to 4040 before the test starts, the first server we start there will fail, and the servers we start after it may still manage to bind to 4040 if the port is released in between. Instead, we should just let Java find a random free port for us and hold onto it for the duration of the test.
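      
      The standard-library idiom for this, as a sketch of the approach (not the patch itself):
      
      ```scala
      import java.net.ServerSocket
      
      // Binding to port 0 makes the OS pick a free ephemeral port; holding the
      // socket for the test's duration avoids racing other processes for 4040.
      val socket = new ServerSocket(0)
      val reservedPort = socket.getLocalPort
      try {
        // ... exercise the port-contention logic against reservedPort ...
      } finally {
        socket.close()
      }
      ```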
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2418 from andrewor14/fix-port-contention and squashes the following commits:
      
      0cd4974 [Andrew Or] Stop them servers
      a7071fe [Andrew Or] Pick random port instead of 4040
      
      (cherry picked from commit 0a7091e6)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
    • [SQL][DOCS] Improve section on thrift-server · 75158a7e
      Michael Armbrust authored
      
      Taken from liancheng's updates. Merged conflicts with #2316.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2384 from marmbrus/sqlDocUpdate and squashes the following commits:
      
      2db6319 [Michael Armbrust] @liancheng's updates
      
      (cherry picked from commit 84073eb1)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
  5. Sep 14, 2014
    • SPARK-3039: Allow spark to be built using avro-mapred for hadoop2 · 78887f94
      Bertrand Bossy authored
      
      SPARK-3039: Adds the maven property "avro.mapred.classifier" to build spark-assembly with avro-mapred with support for the new Hadoop API. Sets this property to hadoop2 for Hadoop 2 profiles.
      
      I am not very familiar with maven, nor do I know whether this potentially breaks something in the hive part of spark. There might be a more elegant way of doing this.
      
      Author: Bertrand Bossy <bertrandbossy@gmail.com>
      
      Closes #1945 from bbossy/SPARK-3039 and squashes the following commits:
      
      c32ce59 [Bertrand Bossy] SPARK-3039: Allow spark to be built using avro-mapred for hadoop2
      
      (cherry picked from commit c243b21a)
      Signed-off-by: Patrick Wendell <pwendell@gmail.com>
  6. Sep 13, 2014
    • [SQL] [Docs] typo fixes · 70f93d5a
      Nicholas Chammas authored
      
      * Fixed random typo
      * Added in missing description for DecimalType
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2367 from nchammas/patch-1 and squashes the following commits:
      
      aa528be [Nicholas Chammas] doc fix for SQL DecimalType
      3247ac1 [Nicholas Chammas] [SQL] [Docs] typo fixes
      
      (cherry picked from commit a523ceaf)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
  7. Sep 12, 2014
    • [SPARK-3515][SQL] Moves test suite setup code to beforeAll rather than in constructor · 44e534eb
      Cheng Lian authored
      
      Please refer to the JIRA ticket for details.
      
      **NOTE** We should check all test suites that perform similar initialization-like side effects in their constructors. This PR only fixes `ParquetMetastoreSuite` because it breaks our Jenkins Maven build.
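      
      The pattern, as a minimal ScalaTest sketch (suite name invented):
      
      ```scala
      import org.scalatest.{BeforeAndAfterAll, FunSuite}
      
      class ExampleSetupSuite extends FunSuite with BeforeAndAfterAll {
        // Previously: setup ran right here in the constructor body, i.e. whenever
        // the suite class was instantiated -- including during test discovery.
        override def beforeAll(): Unit = {
          // initialization-like side effects belong here instead
        }
      }
      ```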
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2375 from liancheng/say-no-to-constructor and squashes the following commits:
      
      0ceb75b [Cheng Lian] Moves test suite setup code to beforeAll rather than in constructor
      
      (cherry picked from commit 6d887db7)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
    • [SPARK-3500] [SQL] use JavaSchemaRDD as SchemaRDD._jschema_rdd · 9c06c723
      Davies Liu authored
      
      Currently, `SchemaRDD._jschema_rdd` is a SchemaRDD, so the Scala API (`coalesce()`, `repartition()`) cannot easily be called from Python: there is no way to supply the implicit parameter `ord`. `_jrdd` is a JavaRDD, so `_jschema_rdd` should also be a JavaSchemaRDD.
      
      In this patch, `_jschema_rdd` is changed to a JavaSchemaRDD, and an assert is added for it. If a method is missing from JavaSchemaRDD, it is called via `_jschema_rdd.baseSchemaRDD().xxx()`.
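      
      A toy Scala sketch of the underlying friction (invented class names, simplified signatures):
      
      ```scala
      // Scala-style API: the implicit Ordering cannot be supplied through Py4J.
      class ScalaStyleRDD {
        def coalesce(numPartitions: Int)(implicit ord: Ordering[Any] = null): ScalaStyleRDD = this
      }
      
      // Java-style wrapper: a plain signature that Py4J can invoke directly.
      class JavaStyleRDD(val underlying: ScalaStyleRDD) {
        def coalesce(numPartitions: Int): JavaStyleRDD =
          new JavaStyleRDD(underlying.coalesce(numPartitions))
      }
      ```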
      
      BTW, Do we need JavaSQLContext?
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2369 from davies/fix_schemardd and squashes the following commits:
      
      abee159 [Davies Liu] use JavaSchemaRDD as SchemaRDD._jschema_rdd
      
      (cherry picked from commit 885d1621)
      Signed-off-by: Josh Rosen <joshrosen@apache.org>
      
      Conflicts:
      	python/pyspark/tests.py
    • [SPARK-3481] [SQL] Eliminate the error log in local Hive comparison test · 6cbf83c0
      Cheng Hao authored
      
      Logically, we should remove the Hive tables/databases first and then reset the Hive configuration, repointing to the new data warehouse directory, etc.
      Otherwise, exceptions like "Database does not exist: default" are raised during local testing.
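      
      In sketch form (helper names invented for illustration):
      
      ```scala
      def dropAllTablesAndDatabases(): Unit = () // stand-in for the real cleanup
      def resetHiveConfiguration(): Unit = ()    // stand-in for the real reset
      
      def resetTestHive(): Unit = {
        dropAllTablesAndDatabases() // 1. drop while the config still points at the old warehouse
        resetHiveConfiguration()    // 2. only then repoint the data warehouse directory etc.
      }
      ```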
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2352 from chenghao-intel/test_hive and squashes the following commits:
      
      74fd76b [Cheng Hao] eliminate the error log
      
      (cherry picked from commit 8194fc66)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
    • Revert "[Spark-3490] Disable SparkUI for tests" · f17b7957
      Andrew Or authored
      This reverts commit 2ffc7980.
  8. Sep 11, 2014
    • [SPARK-3465] fix task metrics aggregation in local mode · e69deb81
      Davies Liu authored
      Before overwriting `t.taskMetrics`, take a deep copy of it.
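      
      One generic way to take such a snapshot, as a stand-in sketch (the commit's actual copy mechanism may differ):
      
      ```scala
      import java.io._
      
      // Serialization round-trip: yields an object that shares no state with the
      // original, so later mutation of t.taskMetrics can't corrupt the snapshot.
      def deepCopy[T <: Serializable](value: T): T = {
        val buffer = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buffer)
        out.writeObject(value)
        out.close()
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
          .readObject().asInstanceOf[T]
      }
      ```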
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2338 from davies/fix_metric and squashes the following commits:
      
      a5cdb63 [Davies Liu] Merge branch 'master' into fix_metric
      7c879e0 [Davies Liu] add more comments
      754b5b8 [Davies Liu] copy taskMetrics only when isLocal is true
      5ca26dc [Davies Liu] fix task metrics aggregation in local mode
    • [SPARK-3429] Don't include the empty string "" as a defaultAclUser · 4245404e
      Andrew Ash authored
      
      Changes logging from
      
      ```
      14/09/05 02:01:08 INFO SecurityManager: Changing view acls to: aash,
      14/09/05 02:01:08 INFO SecurityManager: Changing modify acls to: aash,
      14/09/05 02:01:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash, ); users with modify permissions: Set(aash, )
      ```
      to
      ```
      14/09/05 02:28:28 INFO SecurityManager: Changing view acls to: aash
      14/09/05 02:28:28 INFO SecurityManager: Changing modify acls to: aash
      14/09/05 02:28:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash); users with modify permissions: Set(aash)
      ```
      
      Note that the first set of logs has a Set of size 2 containing "aash" and the empty string "".
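      
      The shape of the fix, sketched (variable names assumed):
      
      ```scala
      // Never let "" through into the ACL set.
      val currentUser = System.getProperty("user.name", "")
      val defaultAclUsers = Set(currentUser).filter(user => !user.isEmpty)
      // user "aash"  -> Set("aash")
      // empty string -> Set(), rather than Set("")
      ```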
      
      cc tgravescs
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #2286 from ash211/empty-default-acl and squashes the following commits:
      
      18cc612 [Andrew Ash] Use .isEmpty instead of ==""
      cf973a1 [Andrew Ash] Don't include the empty string "" as a defaultAclUser
      
      (cherry picked from commit ce59725b)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
    • [Spark-3490] Disable SparkUI for tests · 2ffc7980
      Andrew Or authored
      
      We currently open many ephemeral ports during the tests, and as a result we occasionally can't bind to new ones. This has caused the `DriverSuite` and the `SparkSubmitSuite` to fail intermittently.
      
      By disabling the `SparkUI` when it's not needed, we already cut down on the number of ports opened significantly, on the order of the number of `SparkContexts` ever created. We must keep it enabled for a few tests for the UI itself, however.
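      
      A sketch of how a test can opt out, using the `spark.ui.enabled` switch this change is built around:
      
      ```scala
      import org.apache.spark.SparkConf
      
      // With the UI disabled, creating a SparkContext binds no UI port at all.
      val conf = new SparkConf()
        .setAppName("unit-test")
        .set("spark.ui.enabled", "false")
      ```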
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2363 from andrewor14/disable-ui-for-tests and squashes the following commits:
      
      332a7d5 [Andrew Or] No need to set spark.ui.port to 0 anymore
      30c93a2 [Andrew Or] Simplify streaming UISuite
      a431b84 [Andrew Or] Fix streaming test failures
      8f5ae53 [Andrew Or] Fix no new line at the end
      29c9b5b [Andrew Or] Disable SparkUI for tests
      
      (cherry picked from commit 6324eb7b)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>
      
      Conflicts:
      	pom.xml
      	yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
      	yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
    • [SPARK-2140] Updating heap memory calculation for YARN stable and alpha. · 06fb2d05
      Chris Cope authored
      
      Updated pull request, reflecting YARN stable and alpha states. I am getting intermittent test failures on my own test infrastructure. Is that tracked anywhere yet?
      
      Author: Chris Cope <ccope@resilientscience.com>
      
      Closes #2253 from copester/master and squashes the following commits:
      
      5ad89da [Chris Cope] [SPARK-2140] Removing calculateAMMemory functions since they are no longer needed.
      52b4e45 [Chris Cope] [SPARK-2140] Updating heap memory calculation for YARN stable and alpha.
      
      (cherry picked from commit ed1980ff)
      Signed-off-by: Thomas Graves <tgraves@apache.org>
    • HOTFIX: Changing color on doc menu · e51ce9a5
      Patrick Wendell authored
  9. Sep 09, 2014
    • [SPARK-1919] Fix Windows spark-shell --jars · 359cd59d
      Andrew Or authored
      We were trying to add `file:/C:/path/to/my.jar` to the class path. We should add `C:/path/to/my.jar` instead. Tested on Windows 8.1.
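      
      A hedged sketch of the normalization (helper name invented):
      
      ```scala
      // "file:/C:/path/to/my.jar" -> "C:/path/to/my.jar"; other strings pass through.
      def toClasspathEntry(uri: String): String =
        uri.stripPrefix("file:/")
      ```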
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2211 from andrewor14/windows-shell-jars and squashes the following commits:
      
      262c6a2 [Andrew Or] Oops... Add the new code to the correct place
      0d5a0c1 [Andrew Or] Format jar path only for adding to shell classpath
      42bd626 [Andrew Or] Remove unnecessary code
      0049f1b [Andrew Or] Remove embarrassing log messages
      b1755a0 [Andrew Or] Format jar paths properly before adding them to the classpath
    • [SPARK-3061] Fix Maven build under Windows · 23fd3e8b
      Josh Rosen authored
      The Maven build was failing on Windows because it tried to call the unix `unzip` utility to extract the Py4J files into core's build directory.  I've fixed this issue by using the `maven-antrun-plugin` to perform the unzipping.
      
      I also fixed an issue that prevented tests from running under Windows:
      
      In the Maven ScalaTest plugin, the filename listed in <filereports> is placed under the <reportsDirectory>; the current code places it in a subdirectory of reportsDirectory, e.g.
      
      ```
      ${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt
      ```
      
      This caused problems under Windows because it would try to create a subdirectory named "c:\\".
      
      Note that the tests still fail under Windows (for other reasons); this PR just allows them to run and fail rather than crash when trying to create the test reports directory.
      
      Author: Josh Rosen <joshrosen@apache.org>
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2165 from JoshRosen/windows-support and squashes the following commits:
      
      651d210 [Josh Rosen] Unzip to python/build instead of core/build
      fbf3e61 [Josh Rosen] 4 spaces -> 2 spaces
      e347668 [Josh Rosen] Fix Maven scalatest filereports path:
      4994af1 [Josh Rosen] [SPARK-3061] Use maven-antrun-plugin to unzip Py4J.
    • [SPARK-3345] Do correct parameters for ShuffleFileGroup · e5f77ae9
      Liang-Chi Hsieh authored
      In the method `newFileGroup` of class `FileShuffleBlockManager`, the parameters used to create a new `ShuffleFileGroup` object are in the wrong order.
      
      Because the parameters `shuffleId` and `fileId` are not used in the current code, this doesn't cause a problem yet. However, it should be corrected for readability and to avoid future problems.
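      
      A toy illustration of why this class of bug compiles silently (simplified fields):
      
      ```scala
      case class FileGroup(shuffleId: Int, fileId: Int)
      
      val shuffleId = 7
      val fileId = 42
      val wrong = FileGroup(fileId, shuffleId) // compiles; both are Ints, fields silently swapped
      val right = FileGroup(shuffleId, fileId)
      // Harmless only until someone reads wrong.shuffleId expecting 7.
      ```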
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2235 from viirya/correct_shufflefilegroup_params and squashes the following commits:
      
      fe72567 [Liang-Chi Hsieh] Do correct parameters for ShuffleFileGroup.
    • [SPARK-3193] output error info when Process exit code is not zero in test suite · 24262684
      scwf authored
      https://issues.apache.org/jira/browse/SPARK-3193
      I noticed that PR tests sometimes failed because the process exit code was non-zero; see for example:
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18688/consoleFull
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19118/consoleFull
      
      ```
      [info] SparkSubmitSuite:
      [info] - prints usage on empty input
      [info] - prints usage with only --help
      [info] - prints error with unrecognized options
      [info] - handle binary specified but not class
      [info] - handles arguments with --key=val
      [info] - handles arguments to user program
      [info] - handles arguments to user program with name collision
      [info] - handles YARN cluster mode
      [info] - handles YARN client mode
      [info] - handles standalone cluster mode
      [info] - handles standalone client mode
      [info] - handles mesos client mode
      [info] - handles confs with flag equivalents
      [info] - launch simple application with spark-submit *** FAILED ***
      [info]   org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.SimpleApplicationTest, --name, testApp, --master, local, file:/tmp/1408854098404-0/testJar-1408854098404.jar) exited with code 1
      [info]   at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:872)
      [info]   at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
      [info]   at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply$mcV$sp(SparkSubmitSuite.scala:291)
      [info]   at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply(SparkSubmitSuite.scala:284)
      [info]   at org.apac
      Spark assembly has been built with Hive, including Datanucleus jars on classpath
      ```
      
      This PR outputs the process error info when the process fails, which can be helpful for diagnosis.
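      
      A sketch of the idea (not the patch itself): capture stderr so a non-zero exit can report why.
      
      ```scala
      import scala.sys.process._
      
      def runAndCheck(cmd: Seq[String]): Unit = {
        val err = new StringBuilder
        val exitCode = cmd.!(ProcessLogger(_ => (), line => { err.append(line).append('\n'); () }))
        if (exitCode != 0) sys.error(s"Process $cmd exited with code $exitCode:\n$err")
      }
      ```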
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2108 from scwf/output-test-error-info and squashes the following commits:
      
      0c48082 [scwf] minor fix according to comments
      563fde1 [scwf] output errer info when Process exitcode not zero
      
      (cherry picked from commit 26862337)
      Signed-off-by: Andrew Or <andrewor14@gmail.com>