  1. Jan 08, 2015
    • [SPARK-4048] Enhance and extend hadoop-provided profile. · 48cecf67
      Marcelo Vanzin authored
      This change does a few things to make the hadoop-provided profile more useful:
      
      - Create new profiles for other libraries / services that might be provided by the infrastructure
      - Simplify and fix the poms so that the profiles are only activated while building assemblies.
      - Fix tests so that they're able to run when the profiles are activated
      - Add a new env variable, to be used by distributions that rely on these profiles, to provide
        the runtime classpath for Spark jobs and daemons (see the sketch below).
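
      A minimal Scala sketch of how a launcher might fold that variable into a JVM classpath. The
      object and method names here are illustrative assumptions, not Spark's actual code; distributions
      typically set SPARK_DIST_CLASSPATH to something like the output of `hadoop classpath`.

          object LauncherSketch {
            // Append the distribution-provided classpath, if any, to Spark's own jars.
            def buildClassPath(sparkJars: Seq[String]): String = {
              val dist = sys.env.get("SPARK_DIST_CLASSPATH")
              (sparkJars ++ dist).mkString(java.io.File.pathSeparator)
            }
          }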
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
      
      82eb688 [Marcelo Vanzin] Add a comment.
      eb228c0 [Marcelo Vanzin] Fix borked merge.
      4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
      371ebee [Marcelo Vanzin] Review feedback.
      52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      322f882 [Marcelo Vanzin] Fix merge fail.
      f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9640503 [Marcelo Vanzin] Cleanup child process log message.
      115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
      e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
      7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
      1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
      d1399ed [Marcelo Vanzin] Restore jetty dependency.
      82a54b9 [Marcelo Vanzin] Remove unused profile.
      5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
      1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
      f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
      9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
      d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
      4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
      417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
      2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
      1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
      284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
    • [SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp dist sampling to PySpark MLlib · c9c8b219
      RJ Nowling authored
      
      This is a follow-up to PR #3680 (https://github.com/apache/spark/pull/3680).
      
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #3955 from rnowling/spark4891 and squashes the following commits:
      
      1236a01 [RJ Nowling] Fix Python style issues
      7a01a78 [RJ Nowling] Fix Python style issues
      174beab [RJ Nowling] [SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp dist sampling to PySpark MLlib
    • [SPARK-4973][CORE] Local directory in the driver of client mode remains even after the application finishes when the external shuffle service is enabled · a00af6be
      Kousuke Saruta authored
      
      When the external shuffle service is enabled, local directories created by the driver in client
      mode remain even after the application has finished. These driver-side local directories should
      be deleted.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3811 from sarutak/SPARK-4973 and squashes the following commits:
      
      ad944ab [Kousuke Saruta] Fixed DiskBlockManager to cleanup local directory if it's the driver
      43770da [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4973
      88feecd [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4973
      d99718e [Kousuke Saruta] Fixed SparkSubmit.scala and DiskBlockManager.scala in order to delete local directories of the driver of local-mode when external shuffle service is enabled
    • SPARK-5148 [MLlib] Make usersOut/productsOut storage level in ALS configurable · 72df5a30
      Fernando Otero (ZeoS) authored
      Author: Fernando Otero (ZeoS) <fotero@gmail.com>
      
      Closes #3953 from zeitos/storageLevel and squashes the following commits:
      
      0f070b9 [Fernando Otero (ZeoS)] fix imports
      6869e80 [Fernando Otero (ZeoS)] fix comment length
      90c9f7e [Fernando Otero (ZeoS)] fix comment length
      18a992e [Fernando Otero (ZeoS)] changing storage level
    • Document that groupByKey will OOM for large keys · 538f2216
      Eric Moyer authored
      This pull request is my own work and I license it under Spark's open-source license.
      
      This contribution is an improvement to the documentation. I documented that the maximum number of
      values per key that groupByKey can handle is limited by available RAM (see [Datablox][datablox link]
      and [the spark mailing list][list link]).

      Just saying that better performance is available is not sufficient. Sometimes you need to do a
      group-by: your operation needs all the items for a key available in order to complete. This
      warning explains the problem.
      
      [datablox link]: http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
      [list link]: http://apache-spark-user-list.1001560.n3.nabble.com/Understanding-RDD-GroupBy-OutOfMemory-Exceptions-tp11427p11466.html
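
      A short Scala sketch of the trade-off being documented (illustrative only): groupByKey must hold
      every value for a key in memory at once, while reduceByKey combines values map-side first.

          // Given a SparkContext sc (and the RDD implicits in scope):
          val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
          // May OOM when a single key has a huge number of values:
          val grouped = pairs.groupByKey().mapValues(_.sum)
          // Preferred when the operation is a reduction; combines before shuffling:
          val reduced = pairs.reduceByKey(_ + _)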
      
      Author: Eric Moyer <eric_moyer@yahoo.com>
      
      Closes #3936 from RadixSeven/better-group-by-docs and squashes the following commits:
      
      5b6f4e9 [Eric Moyer] groupByKey docs naming updates
      238e81b [Eric Moyer] Doc that groupByKey will OOM for large keys
    • [SPARK-5130][Deploy] Take yarn-cluster as cluster mode in spark-submit · 0760787d
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-5130
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #3929 from WangTaoTheTonic/SPARK-5130 and squashes the following commits:
      
      c490648 [WangTaoTheTonic] take yarn-cluster as cluster mode in spark-submit
    • [Minor] Fix the value represented by spark.executor.id for consistency. · 0a597276
      Kousuke Saruta authored
      The property `spark.executor.id` can represent a single driver as both `driver` and `<driver>`,
      which is inconsistent.

      This issue is minor, so I didn't file it in JIRA.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3812 from sarutak/fix-driver-identifier and squashes the following commits:
      
      d885498 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-driver-identifier
      4275663 [Kousuke Saruta] Fixed the value represented by spark.executor.id of local mode
    • [SPARK-4989][CORE] Avoid a wrong eventlog conf taking the cluster down in standalone mode · 06dc4b52
      Zhang, Liye authored
      When event logging is enabled in standalone mode, a wrong configuration takes the whole standalone
      cluster down (the Master restarts and loses its connections to the workers).
      How to reproduce: give an invalid value to "spark.eventLog.dir", for example:
      spark.eventLog.dir=hdfs://tmp/logdir1, hdfs://tmp/logdir2. This throws an IllegalArgumentException,
      which causes the Master to restart and leaves the whole cluster unavailable.
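
      A hedged sketch of the defensive pattern the fix implies: confine the failure to the one
      application's event logger instead of letting the exception propagate and kill the Master
      (names below are illustrative, not Spark's exact code).

          try {
            eventLogger.start()
          } catch {
            case e: IllegalArgumentException =>
              // Log and disable event logging for this app; do not crash the Master.
              logWarning(s"Invalid spark.eventLog.dir, event logging disabled: ${e.getMessage}")
          }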
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #3824 from liyezhang556520/wrongConf4Cluster and squashes the following commits:
      
      3c24d98 [Zhang, Liye] revert change with logwarning and excetption for FileNotFoundException
      3c1ac2e [Zhang, Liye] change var to val
      a49c52f [Zhang, Liye] revert wrong modification
      12eee85 [Zhang, Liye] add more message in log and on webUI
      5c1fa33 [Zhang, Liye] cache exceptions when eventlog with wrong conf
    • [SPARK-4917] Add a function to convert into a graph with canonical edges in GraphOps · f825e193
      Takeshi Yamamuro authored
      Converts bi-directional edges into uni-directional ones, as an alternative to the
      'canonicalOrientation' option of GraphLoader.edgeListFile.
      This function is useful when a graph is loaded as-is and then transformed into one with canonical
      edges. It rewrites the vertex ids of edges so that srcIds are bigger than dstIds, and merges the
      duplicated edges.
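
      A hedged usage sketch (the method name follows this PR; the exact signature is an assumption):

          import org.apache.spark.graphx.GraphLoader
          // Load the edge list as-is, keeping bi-directional edges.
          val graph = GraphLoader.edgeListFile(sc, "data/edges.txt")
          // Canonicalize edge direction and merge duplicates by summing their attributes.
          val canonical = graph.convertToCanonicalEdges(_ + _)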
      
      Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
      
      Closes #3760 from maropu/ConvertToCanonicalEdgesSpike and squashes the following commits:
      
      7f8b580 [Takeshi Yamamuro] Add a function to convert into a graph with canonical edges in GraphOps
    • SPARK-5087. [YARN] Merge yarn.Client and yarn.ClientBase · 8d45834d
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3896 from sryza/sandy-spark-5087 and squashes the following commits:
      
      65611d0 [Sandy Ryza] Review feedback
      3294176 [Sandy Ryza] SPARK-5087. [YARN] Merge yarn.Client and yarn.ClientBase
    • MAINTENANCE: Automated closing of pull requests. · c0823857
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #3880 (close requested by 'ash211')
      Closes #3649 (close requested by 'marmbrus')
      Closes #3791 (close requested by 'mengxr')
      Closes #3559 (close requested by 'andrewor14')
      Closes #3879 (close requested by 'ash211')
    • [SPARK-5116][MLlib] Add extractor for SparseVector and DenseVector · c66a9763
      Shuo Xiang authored
      Add extractors for SparseVector and DenseVector in MLlib to save some code when pattern matching
      on Vectors. For example, previously we might write:

          vec match {
            case dv: DenseVector =>
              val values = dv.values
              ...
            case sv: SparseVector =>
              val indices = sv.indices
              val values = sv.values
              val size = sv.size
              ...
          }

      With the extractors it becomes:

          vec match {
            case DenseVector(values) =>
              ...
            case SparseVector(size, indices, values) =>
              ...
          }
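
      For reference, a minimal sketch of how such extractors can be defined via `unapply`
      (Spark's actual definitions live in Vectors.scala and may differ in detail):

          object DenseVector {
            def unapply(dv: DenseVector): Option[Array[Double]] = Some(dv.values)
          }

          object SparseVector {
            def unapply(sv: SparseVector): Option[(Int, Array[Int], Array[Double])] =
              Some((sv.size, sv.indices, sv.values))
          }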
      
      Author: Shuo Xiang <shuoxiangpub@gmail.com>
      
      Closes #3919 from coderxiang/extractor and squashes the following commits:
      
      359e8d5 [Shuo Xiang] merge master
      ca5fc3e [Shuo Xiang] merge master
      0b1e190 [Shuo Xiang] use extractor for vectors in RowMatrix.scala
      e961805 [Shuo Xiang] use extractor for vectors in StandardScaler.scala
      c2bbdaf [Shuo Xiang] use extractor for vectors in IDFscala
      8433922 [Shuo Xiang] use extractor for vectors in NaiveBayes.scala and Normalizer.scala
      d83c7ca [Shuo Xiang] use extractor for vectors in Vectors.scala
      5523dad [Shuo Xiang] Add extractor for SparseVector and DenseVector
    • [SPARK-5126][Core] Verify Spark urls before creating Actors so that invalid urls can crash the process · 2b729d22
      zsxwing authored
      
      Because `actorSelection` returns `deadLetters` for an invalid path, the Worker keeps quiet when
      given an invalid master url. It's better to log an error so that people can find such problems
      quickly.

      This PR checks the url before passing it to `actorSelection`, and throws and logs a SparkException
      for an invalid url.
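
      A hedged sketch of the validation idea (the PR adds a utility like this to Utils; the exact name,
      signature, and exception type here are assumptions):

          import java.net.URI

          def extractHostPortFromSparkUrl(sparkUrl: String): (String, Int) = {
            val uri = new URI(sparkUrl)
            // A valid master url looks like spark://host:port, with nothing else in it.
            if (uri.getScheme != "spark" || uri.getHost == null || uri.getPort < 0 ||
                uri.getPath != "" || uri.getFragment != null || uri.getQuery != null ||
                uri.getUserInfo != null) {
              throw new IllegalArgumentException(s"Invalid master URL: $sparkUrl")
            }
            (uri.getHost, uri.getPort)
          }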
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3927 from zsxwing/SPARK-5126 and squashes the following commits:
      
      9d429ee [zsxwing] Create a utility method in Utils to parse Spark url; verify urls before creating Actors so that invalid urls can crash the process.
      8286e51 [zsxwing] Check the url before sending to Akka and log the error if the url is invalid
  2. Jan 07, 2015
    • [SPARK-5132][Core] Correct stage Attempt Id key in stageInfoFromJson · d345ebeb
      hushan[胡珊] authored
      SPARK-5132:
      stageInfoToJson writes the key "Stage Attempt Id", but stageInfoFromJson reads "Attempt Id".
      
      Author: hushan[胡珊] <hushan@xiaomi.com>
      
      Closes #3932 from suyanNone/json-stage and squashes the following commits:
      
      41419ab [hushan[胡珊]] Correct stage Attempt Id key in stageInfofromJson
    • [SPARK-5128][MLLib] Add commonly used log1pExp API in MLUtils · 60e2d9e2
      DB Tsai authored
      When `x` is positive and large, computing `math.log(1 + math.exp(x))` will lead to arithmetic
      overflow. This happens when `x > 709.78`, which is not a very large number.
      It can be addressed by rewriting the formula as `x + math.log1p(math.exp(-x))` when `x > 0`.
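
      A minimal sketch of the resulting function, matching the formula above (Spark's version lives in
      MLUtils):

          def log1pExp(x: Double): Double = {
            if (x > 0) {
              x + math.log1p(math.exp(-x))  // exp(-x) <= 1 here, so no overflow
            } else {
              math.log1p(math.exp(x))       // exp(x) <= 1 here, safe directly
            }
          }

          // log1pExp(800.0) == 800.0, whereas math.log(1 + math.exp(800.0)) overflows to Infinity.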
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #3915 from dbtsai/mathutil and squashes the following commits:
      
      bec6a84 [DB Tsai] remove empty line
      3239541 [DB Tsai] revert part of patch into another PR
      23144f3 [DB Tsai] doc
      49f3658 [DB Tsai] temp
      6c29ed3 [DB Tsai] formating
      f8447f9 [DB Tsai] address another overflow issue in gradientMultiplier in LOR gradient code
      64eefd0 [DB Tsai] first commit
    • [SPARK-2458] Make failed application log visible on History Server · 6e74edec
      Masayoshi TSUZUKI authored
      Enabled the HistoryServer to show incomplete applications.
      The logs for incomplete applications can be viewed by clicking the link at the bottom.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #3467 from tsudukim/feature/SPARK-2458-2 and squashes the following commits:
      
      76205d2 [Masayoshi TSUZUKI] Fixed and added test code.
      29a04a9 [Masayoshi TSUZUKI] Merge branch 'master' of github.com:tsudukim/spark into feature/SPARK-2458-2
      f9ef854 [Masayoshi TSUZUKI] Added space between "if" and "(". Fixed "Incomplete" as capitalized in the web UI. Modified double negative variable name.
      9b465b0 [Masayoshi TSUZUKI] Modified typo and better implementation.
      3ed8a41 [Masayoshi TSUZUKI] Modified too long lines.
      08ea14d [Masayoshi TSUZUKI] [SPARK-2458] Make failed application log visible on History Server
    • [SPARK-2165][YARN] Add support for setting maxAppAttempts in the ApplicationSubmissionContext · 8fdd4895
      WangTaoTheTonic authored
      
      https://issues.apache.org/jira/browse/SPARK-2165
      
      I still have 2 questions:
      * If this config is not set, should we use YARN's corresponding value, or a default value (like 2)
        on the Spark side?
      * Is this the best config name, or should it be "spark.yarn.am.maxAttempts"?
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #3878 from WangTaoTheTonic/SPARK-2165 and squashes the following commits:
      
      1416c83 [WangTaoTheTonic] use the name spark.yarn.maxAppAttempts
      202ac85 [WangTaoTheTonic] rephrase some
      afdfc99 [WangTaoTheTonic] more detailed description
      91562c6 [WangTaoTheTonic] add support for setting maxAppAttempts in the ApplicationSubmissionContext
    • [YARN][SPARK-4929] Bug fix: fix the yarn-client code to support HA · 5fde6616
      huangzhaowei authored
      Currently, yarn-client exits directly when an HA change happens, no matter how many times the AM
      should retry.
      The reason may be that the default final status only considered sys.exit, so yarn-client HA cannot
      benefit from it.
      We should therefore distinguish the default final status between client and cluster mode, because
      the SUCCEEDED status may make HA fail in client mode, while UNDEFINED may cause spurious error
      reports in cluster mode when using sys.exit.
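
      A hedged sketch of the distinction being described (the enum is from the YARN API; the method
      shape is an assumption, not the PR's exact code):

          import org.apache.hadoop.yarn.api.records.FinalApplicationStatus

          def defaultFinalStatus(isClusterMode: Boolean): FinalApplicationStatus = {
            if (isClusterMode) {
              // Cluster mode: the driver runs in the AM and reports failures via sys.exit.
              FinalApplicationStatus.SUCCEEDED
            } else {
              // Client mode: leave the status undefined so YARN can retry the AM under HA changes.
              FinalApplicationStatus.UNDEFINED
            }
          }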
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #3771 from SaintBacchus/YarnHA and squashes the following commits:
      
      c02bfcc [huangzhaowei] Improve the comment of the funciton 'getDefaultFinalStatus'
      0e69924 [huangzhaowei] Bug fix: fix the yarn-client code to support HA
  3. Jan 06, 2015
    • [SPARK-5099][Mllib] Simplify logistic loss function · e21acc19
      Liang-Chi Hsieh authored
      This is a minor PR where I think we can simply take the negation of `margin` instead of
      subtracting `margin`.

      Mathematically the two are equal, but the modified equation is the common form of the logistic
      loss function and so more readable. It also computes a more accurate value, as some quick tests
      show.
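
      The equivalence presumably being relied on is the identity
      `log(1 + exp(m)) - m = log(exp(-m) * (1 + exp(m))) = log(1 + exp(-m))`,
      so negating the margin yields the common form `log(1 + exp(-m))` of the logistic loss directly.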
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #3899 from viirya/logit_func and squashes the following commits:
      
      91a3860 [Liang-Chi Hsieh] Modified for comment.
      0aa51e4 [Liang-Chi Hsieh] Further simplified.
      72a295e [Liang-Chi Hsieh] Revert LogLoss back and add more considerations in Logistic Loss.
      a3f83ca [Liang-Chi Hsieh] Fix a bug.
      2bc5712 [Liang-Chi Hsieh] Simplify loss function.
    • [SPARK-5050][Mllib] Add unit test for sqdist · bb38ebb1
      Liang-Chi Hsieh authored
      Related to #3643. Following the earlier suggestion, this adds a unit test for `sqdist` in
      `VectorsSuite`.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #3869 from viirya/sqdist_test and squashes the following commits:
      
      fb743da [Liang-Chi Hsieh] Modified for comment and fix bug.
      90a08f3 [Liang-Chi Hsieh] Modified for comment.
      39a3ca6 [Liang-Chi Hsieh] Take care of special case.
      b789f42 [Liang-Chi Hsieh] More proper unit test with random sparsity pattern.
      c36be68 [Liang-Chi Hsieh] Add unit test for sqdist.
    • SPARK-5017 [MLlib] - Use SVD to compute determinant and inverse of covariance matrix · 4108e5f3
      Travis Galoppo authored
      MultivariateGaussian was calling both pinv() and det() on the covariance matrix, effectively
      performing two matrix decompositions. Both values are now computed from a single singular value
      decomposition, and the pseudo-inverse and pseudo-determinant are used to guard against singular
      matrices.
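
      A hedged sketch of the computation using Breeze (the covariance matrix is symmetric PSD, so the
      left singular vectors can serve as both U and V; names and tolerance choice are assumptions):

          import breeze.linalg.{DenseMatrix, DenseVector, diag, svd}

          val cov = DenseMatrix((2.0, 0.0), (0.0, 0.0))  // a singular covariance matrix
          val svd.SVD(u, s, _) = svd(cov)
          val sArr = s.toArray
          val tol = sArr.max * sArr.length * 1e-15       // threshold for "effectively zero"
          // Pseudo-inverse: invert only the non-negligible singular values.
          val invS = DenseVector(sArr.map(v => if (v > tol) 1.0 / v else 0.0))
          val pinvCov = u * diag(invS) * u.t
          // Log pseudo-determinant: product of non-negligible singular values, in log space.
          val logPseudoDet = sArr.filter(_ > tol).map(math.log).sum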
      
      Author: Travis Galoppo <tjg2107@columbia.edu>
      
      Closes #3871 from tgaloppo/spark-5017 and squashes the following commits:
      
      383b5b3 [Travis Galoppo] MultivariateGaussian - minor optimization in density calculation
      a5b8bc5 [Travis Galoppo] Added additional points to tests in test suite. Fixed comment in MultivariateGaussian
      629d9d0 [Travis Galoppo] Moved some test values from var to val.
      dc3d0f7 [Travis Galoppo] Catch potential exception calculating pseudo-determinant. Style improvements.
      d448137 [Travis Galoppo] Added test suite for MultivariateGaussian, including test for degenerate case.
      1989be0 [Travis Galoppo] SPARK-5017 - Fixed to use SVD to compute determinant and inverse of covariance matrix.  Previous code called both pinv() and det(), effectively performing two matrix decompositions. Additionally, the pinv() implementation in Breeze is known to fail for singular matrices.
      b4415ea [Travis Galoppo] Merge branch 'spark-5017' of https://github.com/tgaloppo/spark into spark-5017
      6f11b6d [Travis Galoppo] SPARK-5017 - Use SVD to compute determinant and inverse of covariance matrix. Code was calling both det() and pinv(), effectively performing two matrix decompositions. Futhermore, Breeze pinv() currently fails for singular matrices.
      fd9784c [Travis Galoppo] SPARK-5017 - Use SVD to compute determinant and inverse of covariance matrix
    • SPARK-4159 [CORE] Maven build doesn't run JUnit test suites · 4cba6eb4
      Sean Owen authored
      This PR:
      
      - Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
      - Tells `surefire` to test only Java tests
      - Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.
      
      For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
      
      2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
      12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
      e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
    • [Minor] Fix comments for GraphX 2D partitioning strategy · 5e3ec111
      kj-ki authored
      The number of vertices in the matrix (v0 to v11) is 12, and I think one block overlaps with
      another in this strategy.

      This is a minor PR, so I didn't file it in JIRA.
      
      Author: kj-ki <kikushima.kenji@lab.ntt.co.jp>
      
      Closes #3904 from kj-ki/fix-partitionstrategy-comments and squashes the following commits:
      
      79829d9 [kj-ki] Fix comments for 2D partitioning.
    • [SPARK-1600] Refactor FileInputStream tests to remove Thread.sleep() calls and SystemClock usage · a6394bc2
      Josh Rosen authored
      This patch refactors Spark Streaming's FileInputStream tests to remove uses of Thread.sleep() and SystemClock, which should hopefully resolve some longstanding flakiness in these tests (see SPARK-1600).
      
      Key changes:
      
      - Modify FileInputDStream to use the scheduler's Clock instead of System.currentTimeMillis(); this
        allows it to be tested using ManualClock (see the sketch after this list).
      - Fix a synchronization issue in ManualClock's `currentTime` method.
      - Add a StreamingTestWaiter class which allows callers to block until a certain number of batches have finished.
      - Change the FileInputStream tests so that files' modification times are manually set based off of ManualClock; this eliminates many Thread.sleep calls.
      - Update these tests to use the withStreamingContext fixture.
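
      An illustrative sketch of the ManualClock idea: tests advance time explicitly instead of
      sleeping (the class shape below is an assumption, not Spark's exact API).

          class ManualClock {
            private var now = 0L
            // Synchronized, per the fix mentioned above.
            def currentTime(): Long = synchronized { now }
            def advance(ms: Long): Unit = synchronized { now += ms }
          }

          // A test can then set a file's modification time from the clock and advance it
          // deterministically, rather than calling Thread.sleep and hoping the timing works out.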
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #3801 from JoshRosen/SPARK-1600 and squashes the following commits:
      
      e4494f4 [Josh Rosen] Address a potential race when setting file modification times
      8340bd0 [Josh Rosen] Use set comparisons for output.
      0b9c252 [Josh Rosen] Fix some ManualClock usage problems.
      1cc689f [Josh Rosen] ConcurrentHashMap -> SynchronizedMap
      db26c3a [Josh Rosen] Use standard timeout in ScalaTest `eventually` blocks.
      3939432 [Josh Rosen] Rename StreamingTestWaiter to BatchCounter
      0b9c3a1 [Josh Rosen] Wait for checkpoint to complete
      863d71a [Josh Rosen] Remove Thread.sleep that was used to make task run slowly
      b4442c3 [Josh Rosen] batchTimeToSelectedFiles should be thread-safe
      15b48ee [Josh Rosen] Replace several TestWaiter methods w/ ScalaTest eventually.
      fffc51c [Josh Rosen] Revert "Remove last remaining sleep() call"
      dbb8247 [Josh Rosen] Remove last remaining sleep() call
      566a63f [Josh Rosen] Fix log message and comment typos
      da32f3f [Josh Rosen] Fix log message and comment typos
      3689214 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-1600
      c8f06b1 [Josh Rosen] Remove Thread.sleep calls in FileInputStream CheckpointSuite test.
      d4f2d87 [Josh Rosen] Refactor file input stream tests to not rely on SystemClock.
      dda1403 [Josh Rosen] Add StreamingTestWaiter class.
      3c3efc3 [Josh Rosen] Synchronize `currentTime` in ManualClock
      a95ddc4 [Josh Rosen] Modify FileInputDStream to use Clock class.
    • SPARK-4843 [YARN] Squash ExecutorRunnableUtil and ExecutorRunnable · 451546aa
      Kostas Sakellis authored
      ExecutorRunnableUtil was a parent of ExecutorRunnable because of the yarn-alpha and yarn-stable
      split. Now that yarn-alpha is gone, this commit squashes the unnecessary hierarchy. The methods
      from ExecutorRunnableUtil are added as private.
      
      Author: Kostas Sakellis <kostas@cloudera.com>
      
      Closes #3696 from ksakellis/kostas-spark-4843 and squashes the following commits:
      
      486716f [Kostas Sakellis] Moved prepareEnvironment call to after yarnConf declaration
      470e22e [Kostas Sakellis] Fixed indentation and renamed sparkConf variable
      9b1b1c9 [Kostas Sakellis] SPARK-4843 [YARN] Squash ExecutorRunnableUtil and ExecutorRunnable
  4. Jan 05, 2015
    • [SPARK-5040][SQL] Support expressing unresolved attributes using $"attribute name" notation in SQL DSL · 04d55d8e
      Reynold Xin authored
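      A hedged sketch of the StringContext mechanism behind this notation (the helper name follows the
      commit log below; the body is illustrative, since Spark's version returns an unresolved attribute
      rather than a plain string):

          implicit class StringToAttributeConversionHelper(val sc: StringContext) extends AnyVal {
            // $"name" expands to a call of this method on StringContext("name").
            def $(args: Any*): String = sc.s(args: _*)
          }

          val attr = $"attribute name"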
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #3862 from rxin/stringcontext-attr and squashes the following commits:
      
      9b10f57 [Reynold Xin] Rename StrongToAttributeConversionHelper
      72121af [Reynold Xin] [SPARK-5040][SQL] Support expressing unresolved attributes using $"attribute name" notation in SQL DSL.
    • [SPARK-5093] Set spark.network.timeout to 120s consistently. · bbcba3a9
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #3903 from rxin/timeout-120 and squashes the following commits:
      
      7c2138e [Reynold Xin] [SPARK-5093] Set spark.network.timeout to 120s consistently.
    • [SPARK-5089][PYSPARK][MLLIB] Fix vector convert · 6c6f3257
      freeman authored
      This is a small change addressing a potentially significant bug in how PySpark + MLlib handles
      non-float64 numpy arrays. The automatic conversion to `DenseVector` that occurs when passing RDDs
      to MLlib algorithms in PySpark should upcast to float64, but currently this wasn't actually
      happening. As a result, non-float64 arrays were silently mis-parsed during SerDe, yielding
      erroneous results when running, for example, KMeans.
      
      The PR includes the fix, as well as a new test for the correct conversion behavior.
      
      davies
      
      Author: freeman <the.freeman.lab@gmail.com>
      
      Closes #3902 from freeman-lab/fix-vector-convert and squashes the following commits:
      
      764db47 [freeman] Add a test for proper conversion behavior
      704f97e [freeman] Return array after changing type
    • [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all · 1c0e7ce0
      Jongyoul Lee authored
      
      - Fixed the scope of runAsSparkUser, moving it from MesosExecutorDriver.run to
        MesosExecutorBackend.launchTask
      - See the JIRA issue for more details.
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3741 from jongyoul/SPARK-4465 and squashes the following commits:
      
      46ad71e [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed unused import
      3d6631f [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed comments and adjusted indentations
      2343f13 [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - fixed a scope of runAsSparkUser from MesosExecutorDriver.run to MesosExecutorBackend.launchTask
    • [SPARK-5057] Log message in failed askWithReply attempts · ce39b344
      WangTao authored
      https://issues.apache.org/jira/browse/SPARK-5057
      
      Author: WangTao <barneystinson@aliyun.com>
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #3875 from WangTaoTheTonic/SPARK-5057 and squashes the following commits:
      
      1503487 [WangTao] use string interpolation
      706c8a7 [WangTaoTheTonic] log more messages
    • [SPARK-4688] Have a single shared network timeout in Spark · d3f07fd2
      Varun Saxena authored
      
      Author: Varun Saxena <vsaxena.varun@gmail.com>
      Author: varunsaxena <vsaxena.varun@gmail.com>
      
      Closes #3562 from varunsaxena/SPARK-4688 and squashes the following commits:
      
      6e97f72 [Varun Saxena] [SPARK-4688] Single shared network timeout
      cd783a2 [Varun Saxena] SPARK-4688
      d6f8c29 [Varun Saxena] SCALA-4688
      9562b15 [Varun Saxena] SPARK-4688
      a75f014 [varunsaxena] SPARK-4688
      594226c [varunsaxena] SPARK-4688
  5. Jan 04, 2015
    • [SPARK-5074][Core] Fix a non-deterministic test failure · 5c506cec
      zsxwing authored
      Add `assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))` to make sure `sparkListener`
      has received the message before the test asserts on it.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3889 from zsxwing/SPARK-5074 and squashes the following commits:
      
      e61c198 [zsxwing] Fix a non-deterministic test failure
    • [SPARK-5083][Core] Fix a flaky test in TaskResultGetterSuite · 27e7f5a7
      zsxwing authored
      Because `sparkEnv.blockManager.master.removeBlock` is asynchronous, we need to make sure the block has already been removed before calling `super.enqueueSuccessfulTask`.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3894 from zsxwing/SPARK-5083 and squashes the following commits:
      
      d97c03d [zsxwing] Fix a flaky test in TaskResultGetterSuite
    • [SPARK-5069][Core] Fix the race condition of TaskSchedulerImpl.dagScheduler · 6c726a3f
      zsxwing authored
      It's not necessary to set `TaskSchedulerImpl.dagScheduler` in preStart. It's safe to set it after `initializeEventProcessActor()`.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3887 from zsxwing/SPARK-5069 and squashes the following commits:
      
      d95894f [zsxwing] Fix the race condition of TaskSchedulerImpl.dagScheduler
    • [SPARK-5067][Core] Use '===' to compare well-defined case class · 72396522
      zsxwing authored
      A simple fix would be adding `assert(e1.appId == e2.appId)` for `SparkListenerApplicationStart`,
      but we can actually use `===` on well-defined case classes directly. Therefore, instead of
      patching the single field, I use `===` to compare those well-defined case classes (ones in which
      all fields implement a correct `equals` method, such as primitive types).
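
      A small illustrative sketch of the idea with ScalaTest (the case class here is a stand-in, not
      Spark's actual event type):

          case class AppStart(appName: String, appId: Option[String], time: Long)

          val e1 = AppStart("app", Some("id-1"), 0L)
          val e2 = AppStart("app", Some("id-1"), 0L)
          assert(e1 === e2)  // compares every field at once via the case class equals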
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3886 from zsxwing/SPARK-5067 and squashes the following commits:
      
      0a51711 [zsxwing] Use '===' to compare well-defined case class
    • [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs · 939ba1f8
      Josh Rosen authored
      This patch disables output spec. validation for jobs launched through Spark Streaming, since this interferes with checkpoint recovery.
      
      Hadoop OutputFormats have a `checkOutputSpecs` method which performs certain checks prior to writing output, such as checking whether the output directory already exists.  SPARK-1100 added checks for FileOutputFormat, SPARK-1677 (#947) added a SparkConf configuration to disable these checks, and SPARK-2309 (#1088) extended these checks to run for all OutputFormats, not just FileOutputFormat.
      
      In Spark Streaming, we might have to re-process a batch during checkpoint recovery, so `save`
      actions may be called multiple times.  In addition to `DStream`'s own save actions, users might
      use `transform` or `foreachRDD` and call the `RDD` and `PairRDD` save actions.  When output spec.
      validation is enabled, subsequent calls to these actions will fail because the output already
      exists.
      
      This patch automatically disables output spec. validation for jobs submitted by the Spark Streaming scheduler.  This is done by using Scala's `DynamicVariable` to propagate the bypass setting without having to mutate SparkConf or introduce a global variable.
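
      A minimal sketch of the `DynamicVariable` approach described above (names are illustrative, not
      Spark's actual fields):

          import scala.util.DynamicVariable

          object OutputSpecValidation {
            val disabled = new DynamicVariable[Boolean](false)
          }

          // The streaming scheduler wraps job submission in the bypass:
          OutputSpecValidation.disabled.withValue(true) {
            // Any saveAsHadoopFile(...) running on this thread sees disabled.value == true.
          }

          // The save path consults both the conf flag and the dynamic variable:
          def validationEnabled(confFlag: Boolean): Boolean =
            confFlag && !OutputSpecValidation.disabled.value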
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #3832 from JoshRosen/SPARK-4835 and squashes the following commits:
      
      36eaf35 [Josh Rosen] Add comment explaining use of transform() in test.
      6485cf8 [Josh Rosen] Add test case in Streaming; fix bug for transform()
      7b3e06a [Josh Rosen] Remove Streaming-specific setting to undo this change; update conf. guide
      bf9094d [Josh Rosen] Revise disableOutputSpecValidation() comment to not refer to Spark Streaming.
      e581d17 [Josh Rosen] Deduplicate isOutputSpecValidationEnabled logic.
      762e473 [Josh Rosen] [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs.
    • [SPARK-4631] unit test for MQTT · e767d7dd
      bilna authored
      Please review the unit test for MQTT
      
      Author: bilna <bilnap@am.amrita.edu>
      Author: Bilna P <bilna.p@gmail.com>
      
      Closes #3844 from Bilna/master and squashes the following commits:
      
      acea3a3 [bilna] Adding dependency with scope test
      28681fa [bilna] Merge remote-tracking branch 'upstream/master'
      fac3904 [bilna] Correction in Indentation and coding style
      ed9db4c [bilna] Merge remote-tracking branch 'upstream/master'
      4b34ee7 [Bilna P] Update MQTTStreamSuite.scala
      04503cf [bilna] Added embedded broker service for mqtt test
      89d804e [bilna] Merge remote-tracking branch 'upstream/master'
      fc8eb28 [bilna] Merge remote-tracking branch 'upstream/master'
      4b58094 [Bilna P] Update MQTTStreamSuite.scala
      b1ac4ad [bilna] Added BeforeAndAfter
      5f6bfd2 [bilna] Added BeforeAndAfter
      e8b6623 [Bilna P] Update MQTTStreamSuite.scala
      5ca6691 [Bilna P] Update MQTTStreamSuite.scala
      8616495 [bilna] [SPARK-4631] unit test for MQTT
    • [SPARK-4787] Stop SparkContext if a DAGScheduler init error occurs · 3fddc946
      Dale authored
      Author: Dale <tigerquoll@outlook.com>
      
      Closes #3809 from tigerquoll/SPARK-4787 and squashes the following commits:
      
      5661e01 [Dale] [SPARK-4787] Ensure that call to stop() doesn't lose the exception by using a finally block.
      2172578 [Dale] [SPARK-4787] Stop context properly if an exception occurs during DAGScheduler initialization.
    • [SPARK-794][Core] Remove sleep() in ClusterScheduler.stop · b96008d5
      Brennon York authored
      Removed the `sleep()` call from the `stop()` method of the `TaskSchedulerImpl` class which,
      according to the JIRA ticket, is believed to be a legacy artifact of the original
      `ClusterScheduler` class that slows down testing.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #3851 from brennonyork/SPARK-794 and squashes the following commits:
      
      04c3e64 [Brennon York] Removed sleep() from the stop() method
  6. Jan 03, 2015
    • [SPARK-5058] Updated broken links · 342612b6
      sigmoidanalytics authored
      Updated the broken link pointing to the KafkaWordCount example to the correct one.
      
      Author: sigmoidanalytics <mayur@sigmoidanalytics.com>
      
      Closes #3877 from sigmoidanalytics/patch-1 and squashes the following commits:
      
      3e19b31 [sigmoidanalytics] Updated broken links