  1. Jan 10, 2015
    • Joseph K. Bradley's avatar
      [SPARK-5032] [graphx] Remove GraphX MIMA exclude for 1.3 · 33132609
      Joseph K. Bradley authored
      Since GraphX is no longer alpha as of 1.2, MimaExcludes should not exclude GraphX for 1.3
      
      Here are the individual excludes I had to add + the associated commits:
      
      ```
                  // SPARK-4444
                  ProblemFilters.exclude[IncompatibleResultTypeProblem](
                    "org.apache.spark.graphx.EdgeRDD.fromEdges"),
                  ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.graphx.EdgeRDD.filter"),
                  ProblemFilters.exclude[IncompatibleResultTypeProblem](
                    "org.apache.spark.graphx.impl.EdgeRDDImpl.filter"),
      ```
      [https://github.com/apache/spark/commit/9ac2bb18ede2e9f73c255fa33445af89aaf8a000]
      
      ```
                  // SPARK-3623
                  ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.graphx.Graph.checkpoint")
      ```
      [https://github.com/apache/spark/commit/e895e0cbecbbec1b412ff21321e57826d2d0a982]
      
      ```
                  // SPARK-4620
                  ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.graphx.Graph.unpersist"),
      ```
      [https://github.com/apache/spark/commit/8817fc7fe8785d7b11138ca744f22f7e70f1f0a0]
      
      CC: rxin
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #3856 from jkbradley/graphx-mima and squashes the following commits:
      
      1eea2f6 [Joseph K. Bradley] moved cleanup to run-tests
      527ccd9 [Joseph K. Bradley] fixed jenkins script to remove ivy2 cache
      802e252 [Joseph K. Bradley] Removed GraphX MIMA excludes and added line to clear spark from .m2 dir before Jenkins tests.  This may not work yet...
      30f8bb4 [Joseph K. Bradley] added individual mima excludes for graphx
      a3fea42 [Joseph K. Bradley] removed graphx mima exclude for 1.3
      33132609
    • scwf's avatar
      [SPARK-5029][SQL] Enable from follow multiple brackets · d22a31f5
      scwf authored
Enable a FROM clause to follow multiple bracketed subqueries:
      ```
      select key from ((select * from testData limit 1) union all (select * from testData limit 1)) x limit 1
      ```
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #3853 from scwf/from and squashes the following commits:
      
      14f110a [scwf] enable from follow multiple brackets
      d22a31f5
    • wangfei's avatar
      [SPARK-4871][SQL] Show sql statement in spark ui when run sql with spark-sql · 92d9a704
      wangfei authored
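For context, a minimal sketch of the mechanism these commits settle on (illustrative; assumes a shell-style `sc` and `sqlContext`, and the query string is made up): `SparkContext.setJobDescription` tags every job submitted afterwards, and that description is what the Spark UI displays.

```
// Sketch only: tag the jobs spawned by a SQL statement with the statement
// itself so it appears on the Spark UI's job and stage pages.
val stmt = "SELECT key, count(*) FROM testData GROUP BY key"
sc.setJobDescription(stmt)
sqlContext.sql(stmt).collect()
```
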
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #3718 from scwf/sparksqlui and squashes the following commits:
      
      e0d6b5d [wangfei] format fix
      383b505 [wangfei] fix conflicts
      4d2038a [wangfei] using setJobDescription
      df79837 [wangfei] fix compile error
      92ce834 [wangfei] show sql statement in spark ui when run sql use spark-sql
      92d9a704
    • GuoQiang Li's avatar
      [Minor]Resolve sbt warnings during build (MQTTStreamSuite.scala). · 8a29dc71
      GuoQiang Li authored
      cc andrewor14
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #3989 from witgo/MQTTStreamSuite and squashes the following commits:
      
      a6e967e [GuoQiang Li] Resolve sbt warnings during build (MQTTStreamSuite.scala).
      8a29dc71
    • CodingCat's avatar
      [SPARK-5181] do not print writing WAL log when WAL is disabled · f0d558b6
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-5181
      
Currently, even when the logManager is not created, we still see the log entry
s"Writing to log $record"

A simple fix to make the log more accurate.
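
A minimal, self-contained sketch of the fix's shape (names illustrative, not Spark's exact code): emit the log line only when a log manager actually exists.

```
// Sketch: both the WAL write and its log line are guarded on the optional
// log manager, so nothing is printed when the WAL is disabled.
def writeToLogIfEnabled(logManagerOption: Option[String => Unit], record: String): Unit =
  logManagerOption.foreach { writeToLog =>
    println(s"Writing to log $record") // previously printed unconditionally
    writeToLog(record)
  }
```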
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #3985 from CodingCat/SPARK-5181 and squashes the following commits:
      
      0e27dc5 [CodingCat] do not print writing WAL log when WAL is disabled
      f0d558b6
    • YanTangZhai's avatar
      [SPARK-4692] [SQL] Support ! boolean logic operator like NOT · 0ca51cc3
      YanTangZhai authored
Support the ! boolean logic operator, like NOT, in SQL as follows:
```
select * from for_test where !(col1 > col2)
```
      
      Author: YanTangZhai <hakeemzhai@tencent.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #3555 from YanTangZhai/SPARK-4692 and squashes the following commits:
      
      1a9f605 [YanTangZhai] Update HiveQuerySuite.scala
      7c03c68 [YanTangZhai] Merge pull request #23 from apache/master
      992046e [YanTangZhai] Update HiveQuerySuite.scala
      ea618f4 [YanTangZhai] Update HiveQuerySuite.scala
      192411d [YanTangZhai] Merge pull request #17 from YanTangZhai/master
      e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master
      1e1ebb4 [YanTangZhai] Update HiveQuerySuite.scala
      efc4210 [YanTangZhai] Update HiveQuerySuite.scala
      bd2c444 [YanTangZhai] Update HiveQuerySuite.scala
      1893956 [YanTangZhai] Merge pull request #14 from marmbrus/pr/3555
      59e4de9 [Michael Armbrust] make hive test
      718afeb [YanTangZhai] Merge pull request #12 from apache/master
      950b21e [YanTangZhai] Update HiveQuerySuite.scala
      74175b4 [YanTangZhai] Update HiveQuerySuite.scala
      92242c7 [YanTangZhai] Update HiveQl.scala
      6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
      e249846 [YanTangZhai] Merge pull request #10 from apache/master
      d26d982 [YanTangZhai] Merge pull request #9 from apache/master
      76d4027 [YanTangZhai] Merge pull request #8 from apache/master
      03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
      8a00106 [YanTangZhai] Merge pull request #6 from apache/master
      cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
      cdef539 [YanTangZhai] Merge pull request #1 from apache/master
      0ca51cc3
    • Michael Armbrust's avatar
      [SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause · 3684fd21
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #3987 from marmbrus/hiveUdfCaching and squashes the following commits:
      
      8bca2fa [Michael Armbrust] [SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause
      3684fd21
    • Yanbo Liang's avatar
      SPARK-4963 [SQL] Add copy to SQL's Sample operator · 77106df6
      Yanbo Liang authored
      https://issues.apache.org/jira/browse/SPARK-4963
SchemaRDD.sample() returns wrong results because GapSamplingIterator operates on mutable rows.
HiveTableScan builds an RDD of SpecificMutableRow, and SchemaRDD.sample() returns a GapSamplingIterator for iterating:
      
```
override def next(): T = {
  val r = data.next()
  advance
  r
}
```
      
GapSamplingIterator.next() returns the current underlying element after assigning it to r.
However, if the underlying iterator yields a mutable row, as HiveTableScan's does, the underlying iterator and r point to the same object.
The advance operation then drops some underlying elements and, unexpectedly, also mutates r, so we return a value different from the initial r.
      
The most direct fix is to make HiveTableScan return copies of its mutable rows, just like my initial commit does. This keeps HiveTableScan from taking full advantage of the reusable MutableRow, but it makes the sample operation return correct results.
Furthermore, we could investigate GapSamplingIterator.next() and make it perform the copy internally. To achieve that, every element an RDD can store would have to implement something like Cloneable, which would be a huge change.
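
Per the commit "SchemaRDD add copy operation before Sample operator" below, the chosen fix has roughly this shape (a simplified sketch of the Sample operator's execution, not the exact source):

```
// Sketch: materialize an immutable copy of each (possibly reused) mutable row
// before it reaches the sampling iterator, so advancing the underlying
// iterator can no longer mutate rows that were already returned.
child.execute().map(_.copy()).sample(withReplacement, fraction, seed)
```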
      
      Author: Yanbo Liang <yanbohappy@gmail.com>
      
      Closes #3827 from yanbohappy/spark-4963 and squashes the following commits:
      
      0912ca0 [Yanbo Liang] code format keep
      65c4e7c [Yanbo Liang] import file and clear annotation
      55c7c56 [Yanbo Liang] better output of test case
      cea7e2e [Yanbo Liang] SchemaRDD add copy operation before Sample operator
      e840829 [Yanbo Liang] HiveTableScan return mutable row with copy
      77106df6
    • scwf's avatar
      [SPARK-4861][SQL] Refactory command in spark sql · b3e86dc6
      scwf authored
      Follow up for #3712.
This PR finally removes ```CommandStrategy``` and makes all commands follow ```RunnableCommand```, so they can go through ```case r: RunnableCommand => ExecutedCommand(r) :: Nil```.

One exception is Hive's ```DescribeCommand```, which is a special case that needs to distinguish Hive tables from temporary tables, so ```HiveCommandStrategy``` is kept here.
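
A minimal sketch of the resulting pattern (simplified 1.2-era Catalyst signatures; illustrative, not the exact source):

```
// With every command implementing RunnableCommand, the planner needs only
// one catch-all case instead of a dedicated CommandStrategy.
object CommandPlanning extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case r: RunnableCommand => ExecutedCommand(r) :: Nil
    case _ => Nil
  }
}
```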
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #3948 from scwf/followup-SPARK-4861 and squashes the following commits:
      
      6b48e64 [scwf] minor style fix
      2c62e9d [scwf] fix for hive module
      5a7a819 [scwf] Refactory command in spark sql
      b3e86dc6
    • scwf's avatar
      [SPARK-4574][SQL] Adding support for defining schema in foreign DDL commands. · 693a323a
      scwf authored
Adding support for defining a schema in foreign DDL commands. Foreign DDL now supports commands like:
      ```
      CREATE TEMPORARY TABLE avroTable
      USING org.apache.spark.sql.avro
      OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
      ```
With this PR the user can define a schema instead of inferring it from the file, so DDL commands like the following are supported:
      ```
      CREATE TEMPORARY TABLE avroTable(a int, b string)
      USING org.apache.spark.sql.avro
      OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
      ```
      
      Author: scwf <wangfei1@huawei.com>
      Author: Yin Huai <yhuai@databricks.com>
      Author: Fei Wang <wangfei1@huawei.com>
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #3431 from scwf/ddl and squashes the following commits:
      
      7e79ce5 [Fei Wang] Merge pull request #22 from yhuai/pr3431yin
      38f634e [Yin Huai] Remove Option from createRelation.
      65e9c73 [Yin Huai] Revert all changes since applying a given schema has not been testd.
      a852b10 [scwf] remove cleanIdentifier
      f336a16 [Fei Wang] Merge pull request #21 from yhuai/pr3431yin
      baf79b5 [Yin Huai] Test special characters quoted by backticks.
      50a03b0 [Yin Huai] Use JsonRDD.nullTypeToStringType to convert NullType to StringType.
      1eeb769 [Fei Wang] Merge pull request #20 from yhuai/pr3431yin
      f5c22b0 [Yin Huai] Refactor code and update test cases.
      f1cffe4 [Yin Huai] Revert "minor refactory"
      b621c8f [scwf] minor refactory
      d02547f [scwf] fix HiveCompatibilitySuite test failure
      8dfbf7a [scwf] more tests for complex data type
      ddab984 [Fei Wang] Merge pull request #19 from yhuai/pr3431yin
      91ad91b [Yin Huai] Parse data types in DDLParser.
      cf982d2 [scwf] fixed test failure
      445b57b [scwf] address comments
      02a662c [scwf] style issue
      44eb70c [scwf] fix decimal parser issue
      83b6fc3 [scwf] minor fix
      9bf12f8 [wangfei] adding test case
      7787ec7 [wangfei] added SchemaRelationProvider
      0ba70df [wangfei] draft version
      693a323a
    • Alex Liu's avatar
      [SPARK-4943][SQL] Allow table name having dot for db/catalog · 4b39fd1e
      Alex Liu authored
This pull request only fixes the parsing error and changes the API to use tableIdentifier. Changes related to joining data sources from different catalogs are not part of this pull request.
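
For illustration, the kind of query this lets the parser accept (database and table names hypothetical):

```
// A table name qualified by a database/catalog prefix now parses.
sqlContext.sql("SELECT t.value FROM mydb.mytable t")
```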
      
      Author: Alex Liu <alex_liu68@yahoo.com>
      
      Closes #3941 from alexliu68/SPARK-SQL-4943-3 and squashes the following commits:
      
      343ae27 [Alex Liu] [SPARK-4943][SQL] refactoring according to review
      29e5e55 [Alex Liu] [SPARK-4943][SQL] fix failed Hive CTAS tests
      6ae77ce [Alex Liu] [SPARK-4943][SQL] fix TestHive matching error
      3652997 [Alex Liu] [SPARK-4943][SQL] Allow table name having dot to support db/catalog ...
      4b39fd1e
    • Alex Liu's avatar
      [SPARK-4925][SQL] Publish Spark SQL hive-thriftserver maven artifact · 1e56eba5
      Alex Liu authored
      Author: Alex Liu <alex_liu68@yahoo.com>
      
      Closes #3766 from alexliu68/SPARK-SQL-4925 and squashes the following commits:
      
      3137b51 [Alex Liu] [SPARK-4925][SQL] Remove sql/hive-thriftserver module from pom.xml
      15f2e38 [Alex Liu] [SPARK-4925][SQL] Publish Spark SQL hive-thriftserver maven artifact
      1e56eba5
  2. Jan 09, 2015
    • luogankun's avatar
      [SPARK-5141][SQL]CaseInsensitiveMap throws java.io.NotSerializableException · 545dfcb9
      luogankun authored
      CaseInsensitiveMap throws java.io.NotSerializableException.
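
A sketch of the usual shape of such a fix, assuming the root cause is that the wrapper map was never marked Serializable and so could not ship inside serialized tasks (simplified; illustrative, not the exact patch):

```
// Sketch: a case-insensitive Map wrapper that survives Java serialization.
class CaseInsensitiveMap(map: Map[String, String])
    extends Map[String, String] with Serializable {
  private val baseMap = map.map(kv => kv.copy(_1 = kv._1.toLowerCase))
  override def get(k: String): Option[String] = baseMap.get(k.toLowerCase)
  override def iterator: Iterator[(String, String)] = baseMap.iterator
  override def +[B1 >: String](kv: (String, B1)): Map[String, B1] = baseMap + kv
  override def -(key: String): Map[String, String] = baseMap - key.toLowerCase
}
```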
      
      Author: luogankun <luogankun@gmail.com>
      
      Closes #3944 from luogankun/SPARK-5141 and squashes the following commits:
      
      b6d63d5 [luogankun] [SPARK-5141]CaseInsensitiveMap throws java.io.NotSerializableException
      545dfcb9
    • MechCoder's avatar
      [SPARK-4406] [MLib] FIX: Validate k in SVD · 4554529d
      MechCoder authored
Raise an exception when k is non-positive in SVD.
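
The validation presumably has roughly this shape (message text illustrative), where k is the requested number of singular values and n the number of columns:

```
// Fail fast with a clear message instead of an opaque downstream error.
require(k > 0 && k <= n, s"Requested $k singular values but 0 < k <= $n is required.")
```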
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #3945 from MechCoder/spark-4406 and squashes the following commits:
      
      64e6d2d [MechCoder] TST: Add better test errors and messages
      12dae73 [MechCoder] [SPARK-4406] FIX: Validate k in SVD
      4554529d
    • WangTaoTheTonic's avatar
      [SPARK-4990][Deploy]to find default properties file, search SPARK_CONF_DIR first · 8782eb99
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-4990
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #3823 from WangTaoTheTonic/SPARK-4990 and squashes the following commits:
      
      133c43e [WangTao] Update spark-submit2.cmd
      b1ab402 [WangTao] Update spark-submit
      4cc7f34 [WangTaoTheTonic] rebase
      55300bc [WangTaoTheTonic] use export to make it global
      d8d3cb7 [WangTaoTheTonic] remove blank line
      07b9ebf [WangTaoTheTonic] check SPARK_CONF_DIR instead of checking properties file
      c5a85eb [WangTaoTheTonic] to find default properties file, search SPARK_CONF_DIR first
      8782eb99
    • bilna's avatar
      [Minor] Fix import order and other coding style · 4e1f12d9
      bilna authored
      fixed import order and other coding style
      
      Author: bilna <bilnap@am.amrita.edu>
      Author: Bilna P <bilna.p@gmail.com>
      
      Closes #3966 from Bilna/master and squashes the following commits:
      
      5e76f04 [bilna] fix import order and other coding style
      5718d66 [bilna] Merge remote-tracking branch 'upstream/master'
      ae56514 [bilna] Merge remote-tracking branch 'upstream/master'
      acea3a3 [bilna] Adding dependency with scope test
      28681fa [bilna] Merge remote-tracking branch 'upstream/master'
      fac3904 [bilna] Correction in Indentation and coding style
      ed9db4c [bilna] Merge remote-tracking branch 'upstream/master'
      4b34ee7 [Bilna P] Update MQTTStreamSuite.scala
      04503cf [bilna] Added embedded broker service for mqtt test
      89d804e [bilna] Merge remote-tracking branch 'upstream/master'
      fc8eb28 [bilna] Merge remote-tracking branch 'upstream/master'
      4b58094 [Bilna P] Update MQTTStreamSuite.scala
      b1ac4ad [bilna] Added BeforeAndAfter
      5f6bfd2 [bilna] Added BeforeAndAfter
      e8b6623 [Bilna P] Update MQTTStreamSuite.scala
      5ca6691 [Bilna P] Update MQTTStreamSuite.scala
      8616495 [bilna] [SPARK-4631] unit test for MQTT
      4e1f12d9
    • Kousuke Saruta's avatar
      [DOC] Fixed Mesos version in doc from 0.18.1 to 0.21.0 · ae628725
      Kousuke Saruta authored
#3934 upgraded the Mesos version, so we should also fix the docs, right?

This issue is really minor, so I didn't file it in JIRA.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3982 from sarutak/fix-mesos-version and squashes the following commits:
      
      9a86ee3 [Kousuke Saruta] Fixed mesos version from 0.18.1 to 0.21.0
      ae628725
    • mcheah's avatar
      [SPARK-4737] Task set manager properly handles serialization errors · e0f28e01
      mcheah authored
      Dealing with [SPARK-4737], the handling of serialization errors should not be the DAGScheduler's responsibility. The task set manager now catches the error and aborts the stage.
      
      If the TaskSetManager throws a TaskNotSerializableException, the TaskSchedulerImpl will return an empty list of task descriptions, because no tasks were started. The scheduler should abort the stage gracefully.
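
A simplified, self-contained sketch of that control flow (the exception class exists in Spark; everything else here is illustrative):

```
import scala.util.control.NonFatal

class TaskNotSerializableException(cause: Throwable) extends Exception(cause)

// TaskSetManager side: catch the serialization failure, abort the stage, and
// rethrow so TaskSchedulerImpl can return an empty list of task descriptions.
def serializeOrAbort[T](task: T, serialize: T => Array[Byte],
                        abortStage: String => Unit): Array[Byte] =
  try serialize(task) catch {
    case NonFatal(e) =>
      abortStage(s"Failed to serialize task: $e")
      throw new TaskNotSerializableException(e)
  }
```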
      
Note that I'm not too familiar with this part of the codebase or its place in the overall architecture of the Spark stack. If implementing it this way would have any adverse side effects, please voice that loudly.
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #3638 from mccheah/task-set-manager-properly-handle-ser-err and squashes the following commits:
      
      1545984 [mcheah] Some more style fixes from Andrew Or.
      5267929 [mcheah] Fixing style suggestions from Andrew Or.
      dfa145b [mcheah] Fixing style from Josh Rosen's feedback
      b2a430d [mcheah] Not returning empty seq when a task set cannot be serialized.
      94844d7 [mcheah] Fixing compilation error, one brace too many
      5f486f4 [mcheah] Adding license header for fake task class
      bf5e706 [mcheah] Fixing indentation.
      097e7a2 [mcheah] [SPARK-4737] Catching task serialization exception in TaskSetManager
      e0f28e01
    • WangTaoTheTonic's avatar
      [SPARK-1953][YARN]yarn client mode Application Master memory size is same as driver memory... · e9664520
      WangTaoTheTonic authored
      ... size
      
Ways to set the Application Master's memory in yarn-client mode:
1.  `spark.yarn.am.memory` in SparkConf or system properties
2.  otherwise, the default value of 512m

Note: this argument is only available in yarn-client mode.
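
Illustrative usage of the new setting (value arbitrary):

```
import org.apache.spark.SparkConf

// Heap size for the YARN Application Master in yarn-client mode;
// falls back to the 512m default when unset.
val conf = new SparkConf().set("spark.yarn.am.memory", "1g")
```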
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #3607 from WangTaoTheTonic/SPARK4181 and squashes the following commits:
      
      d5ceb1b [WangTaoTheTonic] spark.driver.memeory is used in both modes
      6c1b264 [WangTaoTheTonic] rebase
      b8410c0 [WangTaoTheTonic] minor optiminzation
      ddcd592 [WangTaoTheTonic] fix the bug produced in rebase and some improvements
      3bf70cc [WangTaoTheTonic] rebase and give proper hint
      987b99d [WangTaoTheTonic] disable --driver-memory in client mode
      2b27928 [WangTaoTheTonic] inaccurate description
      b7acbb2 [WangTaoTheTonic] incorrect method invoked
      2557c5e [WangTaoTheTonic] missing a single blank
      42075b0 [WangTaoTheTonic] arrange the args and warn logging
      69c7dba [WangTaoTheTonic] rebase
      1960d16 [WangTaoTheTonic] fix wrong comment
      7fa9e2e [WangTaoTheTonic] log a warning
      f6bee0e [WangTaoTheTonic] docs issue
      d619996 [WangTaoTheTonic] Merge branch 'master' into SPARK4181
      b09c309 [WangTaoTheTonic] use code format
      ab16bb5 [WangTaoTheTonic] fix bug and add comments
      44e48c2 [WangTaoTheTonic] minor fix
      6fd13e1 [WangTaoTheTonic] add overhead mem and remove some configs
      0566bb8 [WangTaoTheTonic] yarn client mode Application Master memory size is same as driver memory size
      e9664520
    • Joseph K. Bradley's avatar
      [SPARK-5015] [mllib] Random seed for GMM + make test suite deterministic · 7e8e62ae
      Joseph K. Bradley authored
      Issues:
      * From JIRA: GaussianMixtureEM uses randomness but does not take a random seed. It should take one as a parameter.
      * This also makes the test suite flaky since initialization can fail due to stochasticity.
      
      Fix:
      * Add random seed
      * Use it in test suite
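
Illustrative usage, assuming the `setSeed` setter this PR adds and the 1.3-era `GaussianMixtureEM` API (treat the exact names as a sketch; `data` is an RDD of Vectors):

```
// A fixed seed makes initialization, and therefore the test suite, deterministic.
val model = new GaussianMixtureEM().setK(2).setSeed(42L).run(data)
```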
      
      CC: mengxr  tgaloppo
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #3981 from jkbradley/gmm-seed and squashes the following commits:
      
      f0df4fd [Joseph K. Bradley] Added seed parameter to GMM.  Updated test suite to use seed to prevent flakiness
      7e8e62ae
    • Jongyoul Lee's avatar
      [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 · 454fe129
      Jongyoul Lee authored
- update the version from 0.18.1 to 0.21.0
- I'm running some tests to verify that Spark jobs work fine in a Mesos 0.21.0 environment.
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3934 from jongyoul/SPARK-3619 and squashes the following commits:
      
      ab994fa [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688 - update version from 0.18.1 to 0.21.0
      454fe129
    • Liang-Chi Hsieh's avatar
      [SPARK-5145][Mllib] Add BLAS.dsyr and use it in GaussianMixtureEM · e9ca16ec
      Liang-Chi Hsieh authored
This PR uses BLAS.dsyr to replace a few hand-written implementations in GaussianMixtureEM.
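
For reference, dsyr is the BLAS symmetric rank-1 update A := alpha * x * x^T + A. A hand-rolled Scala equivalent for illustration (the PR itself delegates to the native BLAS routine, which only needs to update one triangle of the symmetric matrix):

```
// Reference rank-1 update on a dense n x n matrix stored as nested arrays.
def syrReference(alpha: Double, x: Array[Double], a: Array[Array[Double]]): Unit = {
  val n = x.length
  for (i <- 0 until n; j <- 0 until n) {
    a(i)(j) += alpha * x(i) * x(j)
  }
}
```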
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #3949 from viirya/blas_dsyr and squashes the following commits:
      
      4e4d6cf [Liang-Chi Hsieh] Add unit test. Rename function name, modify doc and style.
      3f57fd2 [Liang-Chi Hsieh] Add BLAS.dsyr and use it in GaussianMixtureEM.
      e9ca16ec
    • Kay Ousterhout's avatar
      [SPARK-1143] Separate pool tests into their own suite. · b6aa5573
      Kay Ousterhout authored
      The current TaskSchedulerImplSuite includes some tests that are
      actually for the TaskSchedulerImpl, but the remainder of the tests avoid using
      the TaskSchedulerImpl entirely, and actually test the pool and scheduling
      algorithm mechanisms. This commit separates the pool/scheduling algorithm
      tests into their own suite, and also simplifies those tests.
      
      The pull request replaces #339.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #3967 from kayousterhout/SPARK-1143 and squashes the following commits:
      
      8a898c4 [Kay Ousterhout] [SPARK-1143] Separate pool tests into their own suite.
      b6aa5573
    • Patrick Wendell's avatar
      HOTFIX: Minor improvements to make-distribution.sh · 1790b386
      Patrick Wendell authored
      1. Renames $FWDIR to $SPARK_HOME (vast majority of diff).
      2. Use Spark-provided Maven.
      3. Logs build flags in the RELEASE file.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3973 from pwendell/master and squashes the following commits:
      
      340a2fa [Patrick Wendell] HOTFIX: Minor improvements to make-distribution.sh
      1790b386
    • Sean Owen's avatar
      SPARK-5136 [DOCS] Improve documentation around setting up Spark IntelliJ project · 547df977
      Sean Owen authored
      This PR simply points to the IntelliJ wiki page instead of also including IntelliJ notes in the docs. The intent however is to also update the wiki page with updated tips. This is the text I propose for the IntelliJ section on the wiki. I realize it omits some of the existing instructions on the wiki, about enabling Hive, but I think those are actually optional.
      
      ------
      
      IntelliJ supports both Maven- and SBT-based projects. It is recommended, however, to import Spark as a Maven project. Choose "Import Project..." from the File menu, and select the `pom.xml` file in the Spark root directory.
      
It is fine to leave all settings at their default values in the Maven import wizard, with two caveats. First, it is usually useful to enable "Import Maven projects automatically", since changes to the project structure will then automatically update the IntelliJ project.
      
Second, note the step that prompts you to choose active Maven build profiles. As documented above, some build configurations require specific profiles to be enabled. The same profiles that are enabled with `-P[profile name]` above may be enabled on this screen. For example, if developing for Hadoop 2.4 with YARN support, enable profiles `yarn` and `hadoop-2.4`.
      
      These selections can be changed later by accessing the "Maven Projects" tool window from the View menu, and expanding the Profiles section.
      
      "Rebuild Project" can fail the first time the project is compiled, because generate source files are not automatically generated. Try clicking the  "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
      
Compilation may fail with an error like "scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar". If so, go to Preferences > Build, Execution, Deployment > Scala Compiler and clear the "Additional compiler options" field. It will work then, although the option will come back when the project re-imports.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3952 from srowen/SPARK-5136 and squashes the following commits:
      
      f3baa66 [Sean Owen] Point to new IJ / Eclipse wiki link
      016b7df [Sean Owen] Point to IntelliJ wiki page instead of also including IntelliJ notes in the docs
      547df977
    • Aaron Davidson's avatar
      [Minor] Fix test RetryingBlockFetcherSuite after changed config name · b4034c3f
      Aaron Davidson authored
Flaky due to the default retry interval being the same as our test's wait timeout.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #3972 from aarondav/fix-test and squashes the following commits:
      
      db77cab [Aaron Davidson] [Minor] Fix test after changed config name
      b4034c3f
    • WangTaoTheTonic's avatar
      [SPARK-5169][YARN]fetch the correct max attempts · f3da4bd7
      WangTaoTheTonic authored
Sorry for fetching the wrong max attempts in this commit: https://github.com/apache/spark/commit/8fdd48959c93b9cf809f03549e2ae6c4687d1fcd.
We need to fix it now.
      
      tgravescs
      
If we set a spark.yarn.maxAppAttempts larger than `yarn.resourcemanager.am.max-attempts` on the YARN side, it will be overridden, as described here:
      >The maximum number of application attempts. It's a global setting for all application masters. Each application master can specify its individual maximum number of application attempts via the API, but the individual number cannot be more than the global upper bound. If it is, the resourcemanager will override it. The default number is set to 2, to allow at least one retry for AM.
      
      http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
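
Given the doc quoted above, the corrected logic should look roughly like this (an illustrative sketch, not the PR's exact code):

```
// The YARN-side global cap (yarn.resourcemanager.am.max-attempts, default 2)
// bounds whatever the user sets via spark.yarn.maxAppAttempts.
def effectiveMaxAttempts(sparkSetting: Option[Int], yarnMax: Int): Int =
  sparkSetting.map(math.min(_, yarnMax)).getOrElse(yarnMax)
```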
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #3942 from WangTaoTheTonic/HOTFIX and squashes the following commits:
      
      9ac16ce [WangTaoTheTonic] fetch the correct max attempts
      f3da4bd7
  3. Jan 08, 2015
    • Nicholas Chammas's avatar
      [SPARK-5122] Remove Shark from spark-ec2 · 167a5ab0
      Nicholas Chammas authored
      I moved the Spark-Shark version map [to the wiki](https://cwiki.apache.org/confluence/display/SPARK/Spark-Shark+version+mapping).
      
      This PR has a [matching PR in mesos/spark-ec2](https://github.com/mesos/spark-ec2/pull/89).
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #3939 from nchammas/remove-shark and squashes the following commits:
      
      66e0841 [Nicholas Chammas] fix style
      ceeab85 [Nicholas Chammas] show default Spark GitHub repo
      7270126 [Nicholas Chammas] validate Spark hashes
      db4935d [Nicholas Chammas] validate spark version upfront
      fc0d5b9 [Nicholas Chammas] remove Shark
      167a5ab0
    • Marcelo Vanzin's avatar
      [SPARK-4048] Enhance and extend hadoop-provided profile. · 48cecf67
      Marcelo Vanzin authored
      This change does a few things to make the hadoop-provided profile more useful:
      
      - Create new profiles for other libraries / services that might be provided by the infrastructure
      - Simplify and fix the poms so that the profiles are only activated while building assemblies.
      - Fix tests so that they're able to run when the profiles are activated
      - Add a new env variable to be used by distributions that use these profiles to provide the runtime
        classpath for Spark jobs and daemons.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
      
      82eb688 [Marcelo Vanzin] Add a comment.
      eb228c0 [Marcelo Vanzin] Fix borked merge.
      4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
      371ebee [Marcelo Vanzin] Review feedback.
      52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      322f882 [Marcelo Vanzin] Fix merge fail.
      f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9640503 [Marcelo Vanzin] Cleanup child process log message.
      115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
      e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
      7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
      1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
      d1399ed [Marcelo Vanzin] Restore jetty dependency.
      82a54b9 [Marcelo Vanzin] Remove unused profile.
      5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
      1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
      f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
      9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
      d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
      4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
      417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
      2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
      1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
      284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
      48cecf67
    • RJ Nowling's avatar
      [SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp dist sampling to P... · c9c8b219
      RJ Nowling authored
      ...ySpark MLlib
      
This is a follow-up to PR #3680: https://github.com/apache/spark/pull/3680.
      
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #3955 from rnowling/spark4891 and squashes the following commits:
      
      1236a01 [RJ Nowling] Fix Python style issues
      7a01a78 [RJ Nowling] Fix Python style issues
      174beab [RJ Nowling] [SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp dist sampling to PySpark MLlib
      c9c8b219
    • Kousuke Saruta's avatar
      [SPARK-4973][CORE] Local directory in the driver of client-mode continues... · a00af6be
      Kousuke Saruta authored
      [SPARK-4973][CORE] Local directory in the driver of client-mode continues remaining even if application finished when external shuffle is enabled
      
When we enable the external shuffle service, local directories in the driver in client mode remain even after the application has finished.
      I think local directories for drivers should be deleted.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3811 from sarutak/SPARK-4973 and squashes the following commits:
      
      ad944ab [Kousuke Saruta] Fixed DiskBlockManager to cleanup local directory if it's the driver
      43770da [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4973
      88feecd [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4973
      d99718e [Kousuke Saruta] Fixed SparkSubmit.scala and DiskBlockManager.scala in order to delete local directories of the driver of local-mode when external shuffle service is enabled
      a00af6be
    • Fernando Otero (ZeoS)'s avatar
      SPARK-5148 [MLlib] Make usersOut/productsOut storagelevel in ALS configurable · 72df5a30
      Fernando Otero (ZeoS) authored
      Author: Fernando Otero (ZeoS) <fotero@gmail.com>
      
      Closes #3953 from zeitos/storageLevel and squashes the following commits:
      
      0f070b9 [Fernando Otero (ZeoS)] fix imports
      6869e80 [Fernando Otero (ZeoS)] fix comment length
      90c9f7e [Fernando Otero (ZeoS)] fix comment length
      18a992e [Fernando Otero (ZeoS)] changing storage level
      72df5a30
    • Eric Moyer's avatar
      Document that groupByKey will OOM for large keys · 538f2216
      Eric Moyer authored
      This pull request is my own work and I license it under Spark's open-source license.
      
      This contribution is an improvement to the documentation. I documented that the maximum number of values per key for groupByKey is limited by available RAM (see [Datablox][datablox link] and [the spark mailing list][list link]).
      
      Just saying that better performance is available is not sufficient. Sometimes you need to do a group-by - your operation needs all the items available in order to complete. This warning explains the problem.
      
      [datablox link]: http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
      [list link]: http://apache-spark-user-list.1001560.n3.nabble.com/Understanding-RDD-GroupBy-OutOfMemory-Exceptions-tp11427p11466.html
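
To make the warning concrete, a small example (assuming `pairs: RDD[(String, Int)]`):

```
// groupByKey must hold every value for a key in memory at once, so one very
// frequent key can OOM an executor.
val grouped = pairs.groupByKey()
// When the downstream operation is a reduction, reduceByKey combines values
// map-side and keeps per-key state bounded.
val counts = pairs.reduceByKey(_ + _)
```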
      
      Author: Eric Moyer <eric_moyer@yahoo.com>
      
      Closes #3936 from RadixSeven/better-group-by-docs and squashes the following commits:
      
      5b6f4e9 [Eric Moyer] groupByKey docs naming updates
      238e81b [Eric Moyer] Doc that groupByKey will OOM for large keys
      538f2216
    • WangTaoTheTonic's avatar
      [SPARK-5130][Deploy]Take yarn-cluster as cluster mode in spark-submit · 0760787d
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-5130
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #3929 from WangTaoTheTonic/SPARK-5130 and squashes the following commits:
      
      c490648 [WangTaoTheTonic] take yarn-cluster as cluster mode in spark-submit
      0760787d
    • Kousuke Saruta's avatar
      [Minor] Fix the value represented by spark.executor.id for consistency. · 0a597276
      Kousuke Saruta authored
      The property  `spark.executor.id` can represent both `driver` and `<driver>`  for one driver.
      It's inconsistent.
      
      This issue is minor so I didn't file this in JIRA.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3812 from sarutak/fix-driver-identifier and squashes the following commits:
      
      d885498 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-driver-identifier
      4275663 [Kousuke Saruta] Fixed the value represented by spark.executor.id of local mode
      0a597276
    • Zhang, Liye's avatar
      [SPARK-4989][CORE] avoid wrong eventlog conf cause cluster down in standalone mode · 06dc4b52
      Zhang, Liye authored
When enabling the event log in standalone mode, a wrong configuration will bring the standalone cluster down (the Master restarts and loses its connections to the Workers).
How to reproduce: just give an invalid value to "spark.eventLog.dir", for example: spark.eventLog.dir=hdfs://tmp/logdir1, hdfs://tmp/logdir2. This throws an IllegalArgumentException, which causes the Master to restart, leaving the whole cluster unavailable.
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #3824 from liyezhang556520/wrongConf4Cluster and squashes the following commits:
      
      3c24d98 [Zhang, Liye] revert change with logwarning and excetption for FileNotFoundException
      3c1ac2e [Zhang, Liye] change var to val
      a49c52f [Zhang, Liye] revert wrong modification
      12eee85 [Zhang, Liye] add more message in log and on webUI
      5c1fa33 [Zhang, Liye] cache exceptions when eventlog with wrong conf
      06dc4b52
    • Takeshi Yamamuro's avatar
      [SPARK-4917] Add a function to convert into a graph with canonical edges in GraphOps · f825e193
      Takeshi Yamamuro authored
Convert bi-directional edges into uni-directional ones, instead of using 'canonicalOrientation' in GraphLoader.edgeListFile.
This function is useful when a graph is loaded as-is and then transformed into one with canonical edges.
      It rewrites the vertex ids of edges so that srcIds are bigger than dstIds, and merges the duplicated edges.
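
A hand-rolled sketch of that canonicalization (illustrative, not the PR's code; assumes `graph: Graph[VD, ED]` and a `mergeFunc: (ED, ED) => ED` for duplicate edge attributes):

```
import org.apache.spark.SparkContext._ // reduceByKey on pair RDDs (pre-1.3)
import org.apache.spark.graphx.Edge

// Orient each edge so srcId > dstId, then merge duplicate edges with mergeFunc.
val canonicalEdges = graph.edges
  .map { e =>
    if (e.srcId > e.dstId) ((e.srcId, e.dstId), e.attr)
    else ((e.dstId, e.srcId), e.attr)
  }
  .reduceByKey(mergeFunc)
  .map { case ((src, dst), attr) => Edge(src, dst, attr) }
```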
      
      Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
      
      Closes #3760 from maropu/ConvertToCanonicalEdgesSpike and squashes the following commits:
      
      7f8b580 [Takeshi Yamamuro] Add a function to convert into a graph with canonical edges in GraphOps
      f825e193
    • Sandy Ryza's avatar
      SPARK-5087. [YARN] Merge yarn.Client and yarn.ClientBase · 8d45834d
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #3896 from sryza/sandy-spark-5087 and squashes the following commits:
      
      65611d0 [Sandy Ryza] Review feedback
      3294176 [Sandy Ryza] SPARK-5087. [YARN] Merge yarn.Client and yarn.ClientBase
      8d45834d
    • Patrick Wendell's avatar
      MAINTENANCE: Automated closing of pull requests. · c0823857
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #3880 (close requested by 'ash211')
      Closes #3649 (close requested by 'marmbrus')
      Closes #3791 (close requested by 'mengxr')
      Closes #3559 (close requested by 'andrewor14')
      Closes #3879 (close requested by 'ash211')
      c0823857
    • Shuo Xiang's avatar
      [SPARK-5116][MLlib] Add extractor for SparseVector and DenseVector · c66a9763
      Shuo Xiang authored
Add extractors for SparseVector and DenseVector in MLlib to save some code when pattern matching on Vectors. For example, previously we might write:
      
```
vec match {
  case dv: DenseVector =>
    val values = dv.values
    ...
  case sv: SparseVector =>
    val indices = sv.indices
    val values = sv.values
    val size = sv.size
    ...
}
```
      
with the extractors it is:
      
```
vec match {
  case DenseVector(values) =>
    ...
  case SparseVector(size, indices, values) =>
    ...
}
```
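
For reference, such extractors are just `unapply` methods on companion objects; a sketch consistent with the match arms above:

```
// Companion-object extractors that enable the pattern matches shown above.
object DenseVector {
  def unapply(dv: DenseVector): Option[Array[Double]] = Some(dv.values)
}
object SparseVector {
  def unapply(sv: SparseVector): Option[(Int, Array[Int], Array[Double])] =
    Some((sv.size, sv.indices, sv.values))
}
```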
      
      Author: Shuo Xiang <shuoxiangpub@gmail.com>
      
      Closes #3919 from coderxiang/extractor and squashes the following commits:
      
      359e8d5 [Shuo Xiang] merge master
      ca5fc3e [Shuo Xiang] merge master
      0b1e190 [Shuo Xiang] use extractor for vectors in RowMatrix.scala
      e961805 [Shuo Xiang] use extractor for vectors in StandardScaler.scala
      c2bbdaf [Shuo Xiang] use extractor for vectors in IDFscala
      8433922 [Shuo Xiang] use extractor for vectors in NaiveBayes.scala and Normalizer.scala
      d83c7ca [Shuo Xiang] use extractor for vectors in Vectors.scala
      5523dad [Shuo Xiang] Add extractor for SparseVector and DenseVector
      c66a9763