  1. Sep 12, 2014
    • [SPARK-3481] [SQL] Eliminate the error log in local Hive comparison test · 8194fc66
      Cheng Hao authored
      Logically, we should remove the Hive tables and databases first, and only then reset the Hive configuration, repoint to the new data warehouse directory, and so on.
      Otherwise, local testing raises exceptions such as "Database doesn't not exists: default".
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2352 from chenghao-intel/test_hive and squashes the following commits:
      
      74fd76b [Cheng Hao] eliminate the error log
    • [PySpark] Add blank line so that Python RDD.top() docstring renders correctly · 53337762
      RJ Nowling authored
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #2370 from rnowling/python_rdd_docstrings and squashes the following commits:
      
      5230574 [RJ Nowling] Add blank line so that Python RDD.top() docstring renders correctly
    • [SPARK-2558][DOCS] Add --queue example to YARN doc · f116f76b
      Mark G. Whitney authored
      Put the original YARN queue spark-submit argument description in the
      running-on-yarn HTML table and the example command line.
      
      Author: Mark G. Whitney <mark@whitneyindustries.com>
      
      Closes #2218 from kramimus/2258-yarndoc and squashes the following commits:
      
      4b5d808 [Mark G. Whitney] remove yarn queue config
      f8cda0d [Mark G. Whitney] [SPARK-2558][DOCS] Add spark.yarn.queue description to YARN doc
    • [SPARK-3160] [SPARK-3494] [mllib] DecisionTree: eliminate pre-allocated... · b8634df1
      Joseph K. Bradley authored
      [SPARK-3160] [SPARK-3494] [mllib]  DecisionTree: eliminate pre-allocated nodes, parentImpurities arrays. Memory calc bug fix.
      
      This PR includes some code simplifications and re-organization which will be helpful for implementing random forests.  The main changes are that the nodes and parentImpurities arrays are no longer pre-allocated in the main train() method.
      
      Also added 2 bug fixes:
      * maxMemoryUsage calculation
      * over-allocation of space for bins in DTStatsAggregator for unordered features.
      
      Relation to RFs:
      * Since RFs will be deeper and will therefore be more likely sparse (not full trees), it could be a cost savings to avoid pre-allocating a full tree.
      * The associated re-organization also reduces bookkeeping, which will make RFs easier to implement.
      * The return code doneTraining may be generalized to include cases such as nodes ready for local training.
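
To make the cost argument above concrete, here is a rough sketch (not Spark code) of how many node slots a pre-allocated full binary tree needs. The function name is hypothetical, and the convention that the root sits at depth 0 is an assumption.

```python
def full_tree_node_count(max_depth):
    """Slots needed to pre-allocate a full binary tree (root assumed at depth 0)."""
    return 2 ** (max_depth + 1) - 1


# The slot count doubles with every extra level, whether or not nodes are used.
for depth in (5, 10, 20):
    print(depth, full_tree_node_count(depth))
```

A sparse tree built node by node only pays for the nodes it actually creates, which is why on-demand construction matters more as trees get deeper.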
      
      Details:
      
      No longer pre-allocate parentImpurities array in main train() method.
      * parentImpurities values are now stored in individual nodes (in Node.stats.impurity).
      * These were not really needed.  They were used in calculateGainForSplit(), but they can be calculated anyway using parentNodeAgg.
      
      No longer using Node.build since tree structure is constructed on-the-fly.
      * Did not eliminate since it is public (Developer) API.  Marked as deprecated.
      
      Eliminated pre-allocated nodes array in main train() method.
      * Nodes are constructed and added to the tree structure as needed during training.
      * Moved tree construction from main train() method into findBestSplitsPerGroup() since there is no need to keep the (split, gain) array for an entire level of nodes.  Only one element of that array is needed at a time, so we do not need the array.
      
      findBestSplits() now returns 2 items:
      * rootNode (newly created root node on first iteration, same root node on later iterations)
      * doneTraining (indicating if all nodes at that level were leaves)
      
      Updated DecisionTreeSuite.  Notes:
      * Improved test "Second level node building with vs. without groups"
      ** generateOrderedLabeledPoints() modified so that it really does require 2 levels of internal nodes.
      * Related update: Added Node.deepCopy (private[tree]), used for test suite
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      
      Closes #2341 from jkbradley/dt-spark-3160 and squashes the following commits:
      
      07dd1ee [Joseph K. Bradley] Fixed overflow bug with computing maxMemoryUsage in DecisionTree.  Also fixed bug with over-allocating space in DTStatsAggregator for unordered features.
      debe072 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-spark-3160
      5c4ac33 [Joseph K. Bradley] Added check in Strategy to make sure minInstancesPerNode >= 1
      0dd4d87 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-spark-3160
      306120f [Joseph K. Bradley] Fixed typo in DecisionTreeModel.scala doc
      eaa1dcf [Joseph K. Bradley] Added topNode doc in DecisionTree and scalastyle fix
      d4d7864 [Joseph K. Bradley] Marked Node.build as deprecated
      d4dbb99 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-spark-3160
      1a8f0ad [Joseph K. Bradley] Eliminated pre-allocated nodes array in main train() method. * Nodes are constructed and added to the tree structure as needed during training.
      2ab763b [Joseph K. Bradley] Simplifications to DecisionTree code:
  2. Sep 11, 2014
    • [SPARK-3465] fix task metrics aggregation in local mode · 42904b8d
      Davies Liu authored
      Before overwriting t.taskMetrics, take a deep copy of it.
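
The reason a copy matters can be sketched in a few lines. This is an illustration of the aliasing problem only, not Spark's code: `TaskMetrics`, `record`, and the field `bytes_read` are hypothetical stand-ins.

```python
import copy


class TaskMetrics:
    """Hypothetical stand-in for a mutable metrics object (not Spark's class)."""

    def __init__(self, bytes_read=0):
        self.bytes_read = bytes_read


def record(history, metrics, take_copy):
    """Store metrics in history, optionally as an independent deep copy."""
    history.append(copy.deepcopy(metrics) if take_copy else metrics)


shared = TaskMetrics(bytes_read=10)

aliased, copied = [], []
record(aliased, shared, take_copy=False)
record(copied, shared, take_copy=True)

# The producer later reuses and overwrites the same object.
shared.bytes_read = 999

print(aliased[0].bytes_read)  # the aliased entry sees the overwrite
print(copied[0].bytes_read)   # the copied entry keeps the value at record time
```

When producer and consumer live in the same process, as in local mode, both hold the same object, so only the copied entry stays stable.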
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2338 from davies/fix_metric and squashes the following commits:
      
      a5cdb63 [Davies Liu] Merge branch 'master' into fix_metric
      7c879e0 [Davies Liu] add more comments
      754b5b8 [Davies Liu] copy taskMetrics only when isLocal is true
      5ca26dc [Davies Liu] fix task metrics aggregation in local mode
    • SPARK-2482: Resolve sbt warnings during build · 33c7a738
      witgo authored
      Importing `scala.language.postfixOps` and `org.scalatest.time.SpanSugar._` at the same time causes `scala.language.postfixOps` to stop working.
      
      Author: witgo <witgo@qq.com>
      
      Closes #1330 from witgo/sbt_warnings3 and squashes the following commits:
      
      179ba61 [witgo] Resolve sbt warnings during build
    • SPARK-3462 push down filters and projections into Unions · f858f466
      Cody Koeninger authored
      Author: Cody Koeninger <cody.koeninger@mediacrossing.com>
      
      Closes #2345 from koeninger/SPARK-3462 and squashes the following commits:
      
      5c8d24d [Cody Koeninger] SPARK-3462 remove now-unused parameter
      0788691 [Cody Koeninger] SPARK-3462 add tests, handle compatible schema with different aliases, per marmbrus feedback
      ef47b3b [Cody Koeninger] SPARK-3462 push down filters and projections into Unions
    • [SPARK-3429] Don't include the empty string "" as a defaultAclUser · ce59725b
      Andrew Ash authored
      Changes logging from
      
      ```
      14/09/05 02:01:08 INFO SecurityManager: Changing view acls to: aash,
      14/09/05 02:01:08 INFO SecurityManager: Changing modify acls to: aash,
      14/09/05 02:01:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash, ); users with modify permissions: Set(aash, )
      ```
      to
      ```
      14/09/05 02:28:28 INFO SecurityManager: Changing view acls to: aash
      14/09/05 02:28:28 INFO SecurityManager: Changing modify acls to: aash
      14/09/05 02:28:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash); users with modify permissions: Set(aash)
      ```
      
      Note that the first set of logs has a Set of size 2 containing "aash" and the empty string "".
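
One way such an empty entry can arise is splitting a comma-separated list that ends in a trailing comma. The commit message does not show the code, so the following is only a sketch of that assumed cause and of an emptiness filter; `parse_acl_users` is a hypothetical helper.

```python
def parse_acl_users(raw):
    """Split a comma-separated user list, dropping empty entries."""
    return {name.strip() for name in raw.split(",") if name.strip()}


# A naive split keeps the empty string produced by the trailing comma.
naive = set("aash,".split(","))

print(sorted(naive))
print(sorted(parse_acl_users("aash,")))
```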
      
      cc tgravescs
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #2286 from ash211/empty-default-acl and squashes the following commits:
      
      18cc612 [Andrew Ash] Use .isEmpty instead of ==""
      cf973a1 [Andrew Ash] Don't include the empty string "" as a defaultAclUser
    • [Spark-3490] Disable SparkUI for tests · 6324eb7b
      Andrew Or authored
      We currently open many ephemeral ports during the tests, and as a result we occasionally can't bind to new ones. This has caused the `DriverSuite` and the `SparkSubmitSuite` to fail intermittently.
      
      By disabling the `SparkUI` when it's not needed, we already cut down on the number of ports opened significantly, on the order of the number of `SparkContexts` ever created. We must keep it enabled for a few tests for the UI itself, however.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2363 from andrewor14/disable-ui-for-tests and squashes the following commits:
      
      332a7d5 [Andrew Or] No need to set spark.ui.port to 0 anymore
      30c93a2 [Andrew Or] Simplify streaming UISuite
      a431b84 [Andrew Or] Fix streaming test failures
      8f5ae53 [Andrew Or] Fix no new line at the end
      29c9b5b [Andrew Or] Disable SparkUI for tests
    • [SPARK-3390][SQL] sqlContext.jsonRDD fails on a complex structure of JSON... · 4bc9e046
      Yin Huai authored
      [SPARK-3390][SQL] sqlContext.jsonRDD fails on a complex structure of JSON array and JSON object nesting
      
      This PR aims to correctly handle JSON arrays in the type of `ArrayType(...(ArrayType(StructType)))`.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-3390.
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #2364 from yhuai/SPARK-3390 and squashes the following commits:
      
      46db418 [Yin Huai] Handle JSON arrays in the type of ArrayType(...(ArrayType(StructType))).
    • [SPARK-2917] [SQL] Avoid table creation in logical plan analyzing for CTAS · ca83f1e2
      Cheng Hao authored
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #1846 from chenghao-intel/ctas and squashes the following commits:
      
      56a0578 [Cheng Hao] remove the unused imports
      9a57abc [Cheng Hao] Avoid table creation in logical plan analyzing
    • [SPARK-3047] [PySpark] add an option to use str in textFileRDD · 1ef656ea
      Davies Liu authored
      str is much more efficient than unicode (in both CPU and memory), so it's better to use str in textFileRDD. To keep compatibility, unicode remains the default (this may change in the future).
      
      use_unicode=True:
      
      daviesliudm:~/work/spark$ time python wc.py
      (u'./universe/spark/sql/core/target/java/org/apache/spark/sql/execution/ExplainCommand$.java', 7776)
      
      real	2m8.298s
      user	0m0.185s
      sys	0m0.064s
      
      use_unicode=False:
      
      daviesliudm:~/work/spark$ time python wc.py
      ('./universe/spark/sql/core/target/java/org/apache/spark/sql/execution/ExplainCommand$.java', 7776)
      
      real	1m26.402s
      user	0m0.182s
      sys	0m0.062s
      
      We can see roughly a 32% improvement in wall-clock time (2m8.298s vs. 1m26.402s)!
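
The quoted improvement can be checked from the two `real` timings above. This is plain arithmetic on the numbers in the commit message; the helper name is hypothetical.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.Zs reading to seconds."""
    return minutes * 60 + seconds


unicode_time = to_seconds(2, 8.298)   # use_unicode=True
str_time = to_seconds(1, 26.402)      # use_unicode=False

# Relative reduction in wall-clock time.
reduction = (unicode_time - str_time) / unicode_time
print(f"{reduction:.1%}")
```

The result is a reduction of a little under 33%, consistent with the figure quoted in the message.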
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #1951 from davies/unicode and squashes the following commits:
      
      8352d57 [Davies Liu] update version number
      a286f2f [Davies Liu] rollback loads()
      85246e5 [Davies Liu] add docs for use_unicode
      a0295e1 [Davies Liu] add an option to use str in textFile()
    • [SPARK-2140] Updating heap memory calculation for YARN stable and alpha. · ed1980ff
      Chris Cope authored
      Updated pull request, reflecting YARN stable and alpha states. I am getting intermittent test failures on my own test infrastructure. Is that tracked anywhere yet?
      
      Author: Chris Cope <ccope@resilientscience.com>
      
      Closes #2253 from copester/master and squashes the following commits:
      
      5ad89da [Chris Cope] [SPARK-2140] Removing calculateAMMemory functions since they are no longer needed.
      52b4e45 [Chris Cope] [SPARK-2140] Updating heap memory calculation for YARN stable and alpha.
  3. Sep 10, 2014
    • [SPARK-2781][SQL] Check resolution of LogicalPlans in Analyzer. · c27718f3
      Aaron Staple authored
      LogicalPlan contains a ‘resolved’ attribute indicating that all of its execution requirements have been resolved. This attribute is not checked before query execution. The analyzer contains a step to check that all Expressions are resolved, but this is not equivalent to checking all LogicalPlans. In particular, the Union plan’s implementation of ‘resolved’ verifies that the types of its children’s columns are compatible. Because the analyzer does not check that a Union plan is resolved, it is possible to execute a Union plan that outputs different types in the same column.  See SPARK-2781 for an example.
      
      This patch adds two checks to the analyzer’s CheckResolution rule. First, each logical plan is checked to see if it is not resolved despite its children being resolved. This allows the ‘problem’ unresolved plan to be included in the TreeNodeException for reporting. Then as a backstop the root plan is checked to see if it is resolved, which recursively checks that the entire plan tree is resolved. Note that the resolved attribute is implemented recursively, and this patch also explicitly checks the resolved attribute on each logical plan in the tree. I assume the query plan trees will not be large enough for this redundant checking to meaningfully impact performance.
      
      Because this patch starts validating that LogicalPlans are resolved before execution, I had to fix some cases where unresolved plans were passing through the analyzer as part of the implementation of the hive query system. In particular, HiveContext applies the CreateTables, PreInsertionCasts, and ExtractPythonUdfs rules manually after the analyzer runs. I moved these rules to the analyzer stage (for hive queries only), in the process completing a code TODO indicating the rules should be moved to the analyzer.
      
      It’s worth noting that moving the CreateTables rule means introducing an analyzer rule with a significant side effect - in this case the side effect is creating a hive table. The rule will only attempt to create a table once even if its batch is executed multiple times, because it converts the InsertIntoCreatedTable plan it matches against into an InsertIntoTable. Additionally, these hive rules must be added to the Resolution batch rather than as a separate batch because hive rules may be needed to resolve non-root nodes, leaving the root to be resolved on a subsequent batch iteration. For example, the hive compatibility test auto_smb_mapjoin_14, and others, make use of a query plan where the root is a Union and its children are each a hive InsertIntoTable.
      
      Mixing the custom hive rules with standard analyzer rules initially resulted in an additional failure because of policy differences between spark sql and hive when casting a boolean to a string. Hive casts booleans to strings as “true” / “false” while spark sql casts booleans to strings as “1” / “0” (causing the cast1.q test to fail). This behavior is a result of the BooleanCasts rule in HiveTypeCoercion.scala, and from looking at the implementation of BooleanCasts I think converting to “1”/“0” is potentially a programming mistake. (If the BooleanCasts rule is disabled, casting produces “true”/“false” instead.) I believe “true” / “false” should be the behavior for spark sql - I changed the behavior so bools are converted to “true”/“false” to be consistent with hive, and none of the existing spark tests failed.
      
      Finally, in some initial testing with hive it appears that an implicit type coercion of boolean to string results in a lowercase string, e.g. CONCAT( TRUE, “” ) -> “true” while an explicit cast produces an all caps string, e.g. CAST( TRUE AS STRING ) -> “TRUE”.  The change I’ve made just converts to lowercase strings in all cases.  I believe it is at least more correct than the existing spark sql implementation where all Cast expressions become “1” / “0”.
      
      Author: Aaron Staple <aaron.staple@gmail.com>
      
      Closes #1706 from staple/SPARK-2781 and squashes the following commits:
      
      32683c4 [Aaron Staple] Fix compilation failure due to merge.
      7c77fda [Aaron Staple] Move ExtractPythonUdfs to Analyzer's extendedRules in HiveContext.
      d49bfb3 [Aaron Staple] Address review comments.
      915b690 [Aaron Staple] Fix merge issue causing compilation failure.
      701dcd2 [Aaron Staple] [SPARK-2781][SQL] Check resolution of LogicalPlans in Analyzer.
    • [SPARK-3447][SQL] Remove explicit conversion with JListWrapper to avoid NPE · f92cde24
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2323 from marmbrus/kryoJListNPE and squashes the following commits:
      
      9634f11 [Michael Armbrust] Rollback JSON RDD changes
      4d4d93c [Michael Armbrust] Merge remote-tracking branch 'origin/master' into kryoJListNPE
      646976b [Michael Armbrust] Fix JSON RDD Conversion too
      59065bc [Michael Armbrust] Remove explicit conversion to avoid NPE
    • [SQL] Add test case with workaround for reading partitioned Avro files · 84e2c8bf
      Michael Armbrust authored
      In order to read from partitioned Avro files we need to also set the `SERDEPROPERTIES` since `TBLPROPERTIES` are not passed to the initialization.  This PR simply adds a test to make sure we don't break this workaround.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2340 from marmbrus/avroPartitioned and squashes the following commits:
      
      6b969d6 [Michael Armbrust] fix style
      fea2124 [Michael Armbrust] Add test case with workaround for reading partitioned avro files.
    • [SPARK-2207][SPARK-3272][MLLib]Add minimum information gain and minimum... · 79cdb9b6
      qiping.lqp authored
      [SPARK-2207][SPARK-3272][MLLib]Add minimum information gain and minimum instances per node as training parameters for decision tree.
      
      These two parameters can act as early-stop rules for pre-pruning. When a split causes the left or right child to have fewer than `minInstancesPerNode` instances, or has less information gain than `minInfoGain`, the current node will not be split by that split.
      
      When no possible split satisfies these requirements, there are no useful information gain stats, but we still need to calculate the predicted value for the current node. So I separated the calculation of predict from the calculation of information gain, which can also save computation when the number of possible splits is large. Please see [SPARK-3272](https://issues.apache.org/jira/browse/SPARK-3272) for more details.
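
The two early-stop rules described above can be sketched as a single predicate. This is an illustration under stated assumptions, not MLlib's implementation: the function name, argument names, and default values are hypothetical.

```python
def split_is_allowed(left_count, right_count, info_gain,
                     min_instances_per_node=1, min_info_gain=0.0):
    """Return True if a candidate split passes both pre-pruning rules."""
    # Rule 1: each child must receive at least min_instances_per_node instances.
    if left_count < min_instances_per_node or right_count < min_instances_per_node:
        return False
    # Rule 2: the split must gain at least min_info_gain.
    if info_gain < min_info_gain:
        return False
    return True


print(split_is_allowed(5, 1, 0.2, min_instances_per_node=2))
print(split_is_allowed(5, 5, 0.2, min_instances_per_node=2, min_info_gain=0.1))
```

If every candidate split of a node is rejected, the node becomes a leaf, which is why its prediction has to be computable without any gain statistics.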
      
      CC: mengxr manishamde jkbradley, please help me review this, thanks.
      
      Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
      Author: chouqin <liqiping1991@gmail.com>
      
      Closes #2332 from chouqin/dt-preprune and squashes the following commits:
      
      f1d11d1 [chouqin] fix typo
      c7ebaf1 [chouqin] fix typo
      39f9b60 [chouqin] change edge `minInstancesPerNode` to 2 and add one more test
      0278a11 [chouqin] remove `noSplit` and set `Predict` private to tree
      d593ec7 [chouqin] fix docs and change minInstancesPerNode to 1
      efcc736 [qiping.lqp] fix bug
      10b8012 [qiping.lqp] fix style
      6728fad [qiping.lqp] minor fix: remove empty lines
      bb465ca [qiping.lqp] Merge branch 'master' of https://github.com/apache/spark into dt-preprune
      cadd569 [qiping.lqp] add api docs
      46b891f [qiping.lqp] fix bug
      e72c7e4 [qiping.lqp] add comments
      845c6fa [qiping.lqp] fix style
      f195e83 [qiping.lqp] fix style
      987cbf4 [qiping.lqp] fix bug
      ff34845 [qiping.lqp] separate calculation of predict of node from calculation of info gain
      ac42378 [qiping.lqp] add min info gain and min instances per node parameters in decision tree
    • [SPARK-3411] Improve load-balancing of concurrently-submitted drivers across workers · 558962a8
      WangTaoTheTonic authored
      If the waiting driver array is large, the drivers in it will all be dispatched to the first worker we get (if it has enough resources), with or without the randomization.
      
      We should do randomization every time we dispatch a driver, in order to better balance drivers.
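
The idea of re-randomizing for every driver can be sketched as follows. This is not the Master's scheduling code, and the merged change may differ (one squashed commit mentions round robin); `dispatch`, the memory-only resource model, and all names are assumptions for illustration.

```python
import random


def dispatch(drivers, workers, rng):
    """Place each (driver, needed_memory) on a worker with enough free memory.

    workers maps worker name -> free memory. The candidate order is
    reshuffled for every driver so one worker is not always tried first.
    """
    free = dict(workers)
    placement = {}
    for driver, needed in drivers:
        candidates = list(free)
        rng.shuffle(candidates)  # re-randomize per driver, not once per batch
        for worker in candidates:
            if free[worker] >= needed:
                free[worker] -= needed
                placement[driver] = worker
                break
    return placement


rng = random.Random(0)
print(dispatch([("d1", 2), ("d2", 2), ("d3", 2)], {"w1": 4, "w2": 4}, rng))
```

Drivers that fit nowhere are simply left out of the returned placement in this sketch.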
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #1106 from WangTaoTheTonic/fixBalanceDrivers and squashes the following commits:
      
      d1a928b [WangTaoTheTonic] Minor adjustment
      b6560cf [WangTaoTheTonic] solve the shuffle problem for HashSet
      f674e59 [WangTaoTheTonic] add comment and minor fix
      2835929 [WangTao] solve the failed test and avoid filtering
      2ca3091 [WangTao] fix checkstyle
      bc91bb1 [WangTao] Avoid shuffle every time we schedule the driver using round robin
      bbc7087 [WangTaoTheTonic] Optimize the schedule in Master
    • [SPARK-2096][SQL] Correctly parse dot notations · e4f4886d
      Wenchen Fan authored
      First let me write down the current `projections` grammar of spark sql:
      
          expression                : orExpression
          orExpression              : andExpression {"or" andExpression}
          andExpression             : comparisonExpression {"and" comparisonExpression}
          comparisonExpression      : termExpression | termExpression "=" termExpression | termExpression ">" termExpression | ...
          termExpression            : productExpression {"+"|"-" productExpression}
          productExpression         : baseExpression {"*"|"/"|"%" baseExpression}
          baseExpression            : expression "[" expression "]" | ... | ident | ...
          ident                     : identChar {identChar | digit} | delimiters | ...
          identChar                 : letter | "_" | "."
          delimiters                : "," | ";" | "(" | ")" | "[" | "]" | ...
          projection                : expression [["AS"] ident]
          projections               : projection { "," projection}
      
      For something like `a.b.c[1]`, it will be parsed as:
      [parse tree image: http://img51.imgspice.com/i/03008/4iltjsnqgmtt_t.jpg]
      But for something like `a[1].b`, the current grammar can't parse it correctly.
      A simple solution is written in `ParquetQuerySuite#NestedSqlParser`, changed grammars are:
      
          delimiters                : "." | "," | ";" | "(" | ")" | "[" | "]" | ...
          identChar                 : letter | "_"
          baseExpression            : expression "[" expression "]" | expression "." ident | ... | ident | ...
      This works well, but can't cover some corner cases such as `select t.a.b from table as t`:
      [parse tree image: http://img51.imgspice.com/i/03008/v2iau3hoxoxg_t.jpg]
      `t.a.b` is parsed as `GetField(GetField(UnResolved("t"), "a"), "b")` instead of `GetField(UnResolved("t.a"), "b")` using this new grammar.
      However, we can't resolve `t`, as it's not a field but the whole table. (If we could, `select t from table as t` would be legal, which is unexpected.)
      My solution is:
      
          dotExpressionHeader       : ident "." ident
          baseExpression            : expression "[" expression "]" | expression "." ident | ... | dotExpressionHeader  | ident | ...
      I passed all test cases under sql locally and added a more complex case.
      "arrayOfStruct.field1 to access all values of field1" is not supported yet. Since this PR already changes a lot of code, I will open another PR for it.
      I'm not familiar with the later optimization phase; please correct me if I missed something.
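
The effect of the `dotExpressionHeader` rule can be sketched with a tiny hand-written parser. This is not Spark's parser-combinator code: it handles only identifiers, `.field`, and `[integer]`, and the node name `GetItem` is hypothetical (`GetField` and `UnResolved` come from the description above).

```python
import re

TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[.\[\]]")


def parse(text):
    """Parse `ident(.ident)?` followed by any mix of `.ident` and `[int]`."""
    tokens = TOKEN.findall(text)
    pos = 0
    name = tokens[pos]
    pos += 1
    # dotExpressionHeader: a leading `ident . ident` stays one unresolved name.
    if pos + 1 < len(tokens) and tokens[pos] == ".":
        name = name + "." + tokens[pos + 1]
        pos += 2
    node = ("UnResolved", name)
    while pos < len(tokens):
        if tokens[pos] == "." and pos + 1 < len(tokens):
            node = ("GetField", node, tokens[pos + 1])
            pos += 2
        elif tokens[pos] == "[" and pos + 2 < len(tokens) and tokens[pos + 2] == "]":
            node = ("GetItem", node, int(tokens[pos + 1]))
            pos += 3
        else:
            raise ValueError("unexpected token: " + tokens[pos])
    return node


print(parse("t.a.b"))
print(parse("a[1].b"))
```

With this shape, `t.a.b` keeps `t.a` together for later resolution, while `a[1].b` still parses as a field access on an indexed element.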
      
      Author: Wenchen Fan <cloud0fan@163.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2230 from cloud-fan/dot and squashes the following commits:
      
      e1a8898 [Wenchen Fan] remove support for arbitrary nested arrays
      ee8a724 [Wenchen Fan] rollback LogicalPlan, support dot operation on nested array type
      a58df40 [Michael Armbrust] add regression test for doubly nested data
      16bc4c6 [Wenchen Fan] some enhance
      95d733f [Wenchen Fan] split long line
      dc31698 [Wenchen Fan] SPARK-2096 Correctly parse dot notations
    • SPARK-1713. Use a thread pool for launching executors. · 1f4a648d
      Sandy Ryza authored
      This patch copies the approach used in the MapReduce application master for launching containers.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #663 from sryza/sandy-spark-1713 and squashes the following commits:
      
      036550d [Sandy Ryza] SPARK-1713. [YARN] Use a threadpool for launching executor containers
    • Josh Rosen's avatar
      26503fdf
    • [SPARK-3363][SQL] Type Coercion should promote null to all other types. · f0c87dc8
      Daoyuan Wang authored
      Type coercion should allow every type to hold a null value.
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2246 from adrian-wang/spark3363-0 and squashes the following commits:
      
      c6241de [Daoyuan Wang] minor code clean
      595b417 [Daoyuan Wang] Merge pull request #2 from marmbrus/pr/2246
      832e640 [Michael Armbrust] reduce code duplication
      ef6f986 [Daoyuan Wang] make double boolean miss in jsonRDD compatibleType
      c619f0a [Daoyuan Wang] Type Coercion should support every type to have null value
    • [SPARK-3362][SQL] Fix resolution for casewhen with nulls. · a0283300
      Daoyuan Wang authored
      The current implementation ignores the type of the else value.
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #2245 from adrian-wang/casewhenbug and squashes the following commits:
      
      3332f6e [Daoyuan Wang] remove wrong comment
      83b536c [Daoyuan Wang] a comment to trigger retest
      d7315b3 [Daoyuan Wang] code improve
      eed35fc [Daoyuan Wang] bug in casewhen resolve
    • [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme i... · 6f7a7683
      Benoy Antony authored
      ...s https
      
      Author: Benoy Antony <benoy@apache.org>
      
      Closes #2276 from benoyantony/SPARK-3286 and squashes the following commits:
      
      c3d51ee [Benoy Antony] Use address with scheme, but Allpha version removes the scheme
      e82f94e [Benoy Antony] Use address with scheme, but Allpha version removes the scheme
      92127c9 [Benoy Antony] rebasing from master
      450c536 [Benoy Antony] [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https
      f060c02 [Benoy Antony] [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https
    • [SPARK-3395] [SQL] DSL sometimes incorrectly reuses attribute ids, breaking queries · b734ed0c
      Eric Liang authored
      This resolves https://issues.apache.org/jira/browse/SPARK-3395
      
      Author: Eric Liang <ekl@google.com>
      
      Closes #2266 from ericl/spark-3395 and squashes the following commits:
      
      7f2b6f0 [Eric Liang] add regression test
      05bd1e4 [Eric Liang] in the dsl, create a new schema instance in each applySchema
  4. Sep 09, 2014
    • [SPARK-3458] enable python "with" statements for SparkContext · 25b5b867
      Matthew Farrellee authored
      allow for best practice code,
      
      ```
      try:
        sc = SparkContext()
        app(sc)
      finally:
        sc.stop()
      ```
      
      to be written using a "with" statement,
      
      ```
      with SparkContext() as sc:
        app(sc)
      ```
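
What makes a class usable in a "with" statement is Python's context manager protocol. The sketch below shows the general mechanism with a hypothetical stand-in class; it is not pyspark's `SparkContext` and does not claim to mirror how the patch implements it.

```python
class ManagedContext:
    """Hypothetical stand-in for a context-like object with a stop() method."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

    def __enter__(self):
        # The value returned here is bound by `with ... as name`.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and on exceptions, like a `finally` block.
        self.stop()
        return False  # do not swallow exceptions raised in the body


with ManagedContext() as ctx:
    print(ctx.stopped)

print(ctx.stopped)
```

Returning `False` from `__exit__` keeps the `try`/`finally` semantics: cleanup happens, and any error from the body still propagates.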
      
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #2335 from mattf/SPARK-3458 and squashes the following commits:
      
      5b4e37c [Matthew Farrellee] [SPARK-3458] enable python "with" statements for SparkContext
    • [SPARK-3448][SQL] Check for null in SpecificMutableRow.update · c110614b
      Cheng Lian authored
      `SpecificMutableRow.update` doesn't check for null, and breaks existing `MutableRow` contract.
      
      The tricky part here is that for performance considerations, the `update` method of all subclasses of `MutableValue` doesn't check for null and sets the null bit to false.
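
The interaction described above can be sketched as follows: the per-cell fast path never checks for null, so the row-level `update` has to. All class and method names here are hypothetical simplifications, not the Spark SQL classes.

```python
class MutableInt:
    """Hypothetical fast-path cell: update() assumes a non-null value."""

    def __init__(self):
        self.is_null = True
        self.value = 0

    def update(self, value):
        # No null check here, for speed; the null bit is always cleared.
        self.is_null = False
        self.value = value


class Row:
    """Hypothetical row that guards the fast path with a null check."""

    def __init__(self, size):
        self.cells = [MutableInt() for _ in range(size)]

    def set_null_at(self, index):
        self.cells[index].is_null = True

    def is_null_at(self, index):
        return self.cells[index].is_null

    def update(self, index, value):
        if value is None:
            self.set_null_at(index)
        else:
            self.cells[index].update(value)


row = Row(1)
row.update(0, 42)
row.update(0, None)
print(row.is_null_at(0))
```

Without the check in `Row.update`, passing `None` would clear the null bit and leave the cell claiming to hold a real value.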
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2325 from liancheng/check-for-null and squashes the following commits:
      
      9366c44 [Cheng Lian] Check for null in SpecificMutableRow.update
    • [SPARK-3176] Implement 'ABS and 'LAST' for sql · 07ee4a28
      xinyunh authored
      Add support for the mathematical function "ABS" and the analytic function "last" to return a subset of the rows satisfying a query within spark sql. Test cases included.
      
      Author: xinyunh <xinyun.huang@huawei.com>
      Author: bomeng <golf8lover>
      
      Closes #2099 from xinyunh/sqlTest and squashes the following commits:
      
      71d15e7 [xinyunh] remove POWER part
      8843643 [xinyunh] fix the code style issue
      39f0309 [bomeng] Modify the code of POWER and ABS. Move them to the file arithmetic
      ff8e51e [bomeng] add abs() function support
      7f6980a [xinyunh] fix the bug in 'Last' component
      b3df91b [xinyunh] add 'Last' component
    • Minor - Fix trivial compilation warnings. · 02b5ac71
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2331 from ScrapCodes/compilation-warn and squashes the following commits:
      
      44c1e76 [Prashant Sharma] Minor - Fix trivial compilation warnings.
    • [SPARK-3193]output errer info when Process exit code is not zero in test suite · 26862337
      scwf authored
      https://issues.apache.org/jira/browse/SPARK-3193
      I noticed that PR tests sometimes fail because the Process exit code != 0; see
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18688/consoleFull
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19118/consoleFull
      
      [info] SparkSubmitSuite:
      [info] - prints usage on empty input
      [info] - prints usage with only --help
      [info] - prints error with unrecognized options
      [info] - handle binary specified but not class
      [info] - handles arguments with --key=val
      [info] - handles arguments to user program
      [info] - handles arguments to user program with name collision
      [info] - handles YARN cluster mode
      [info] - handles YARN client mode
      [info] - handles standalone cluster mode
      [info] - handles standalone client mode
      [info] - handles mesos client mode
      [info] - handles confs with flag equivalents
      [info] - launch simple application with spark-submit *** FAILED ***
      [info]   org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.SimpleApplicationTest, --name, testApp, --master, local, file:/tmp/1408854098404-0/testJar-1408854098404.jar) exited with code 1
      [info]   at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:872)
      [info]   at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
      [info]   at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply$mcV$sp(SparkSubmitSuite.scala:291)
      [info]   at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply(SparkSubmitSuite.scala:284)
      [info]   at org.apacSpark assembly has been built with Hive, including Datanucleus jars on classpath
      
      This PR outputs the process's error info on failure, which can be helpful for diagnosis.
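
The idea of surfacing a child process's output when it exits with a non-zero code can be sketched as follows. This is a minimal illustration in Python, not the Scala code from this PR; the helper name `execute_and_get_output` and the message layout are assumptions made for the example.

```python
import subprocess
import sys


def execute_and_get_output(command):
    """Run command; return its stdout, or raise with the captured output on failure."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Include both streams so the failure can be diagnosed from the test log.
        raise RuntimeError(
            f"Process {command} exited with code {result.returncode}\n"
            f"--- stdout ---\n{result.stdout}\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical failing child process, used only to demonstrate the error path.
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    try:
        execute_and_get_output(failing)
    except RuntimeError as err:
        print(err)
```

Without the captured streams, the test log would show only the exit code, as in the trace above.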
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #2108 from scwf/output-test-error-info and squashes the following commits:
      
      0c48082 [scwf] minor fix according to comments
      563fde1 [scwf] output errer info when Process exitcode not zero
      26862337
    • Sean Owen's avatar
      SPARK-3404 [BUILD] SparkSubmitSuite fails with "spark-submit exits with code 1" · f0f1ba09
      Sean Owen authored
      This fixes the `SparkSubmitSuite` failure by setting `<spark.ui.port>0</spark.ui.port>` in the Maven build, to match the SBT build. This avoids a port conflict which causes failures.
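
As general background on why a port value of 0 avoids conflicts: binding to port 0 conventionally asks the operating system to pick any free port. The sketch below shows that mechanism with plain Python sockets; it is an illustration under that assumption, not Spark's UI code, and `bind_ephemeral` is a hypothetical helper.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind to port 0 and return (socket, port actually assigned by the OS)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    first, first_port = bind_ephemeral()
    second, second_port = bind_ephemeral()
    # Both sockets are bound at the same time, so the OS must hand out two different ports.
    print(first_port, second_port)
    first.close()
    second.close()
```

Two test processes that each request a fixed port would collide, whereas each request for port 0 receives its own port.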
      
      (This also updates the `scalatest` plugin off of a release candidate, to the identical final release.)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2328 from srowen/SPARK-3404 and squashes the following commits:
      
      512d782 [Sean Owen] Set spark.ui.port=0 in Maven scalatest config to match SBT build and avoid SparkSubmitSuite failure due to port conflict
      f0f1ba09
    • Sandy Ryza's avatar
      SPARK-3422. JavaAPISuite.getHadoopInputSplits isn't used anywhere. · 88547a09
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #2324 from sryza/sandy-spark-3422 and squashes the following commits:
      
      6446175 [Sandy Ryza] SPARK-3422. JavaAPISuite.getHadoopInputSplits isn't used anywhere.
      88547a09
    • Cheng Hao's avatar
      [SPARK-3455] [SQL] **HOT FIX** Fix the unit test failure · 1e03cf79
      Cheng Hao authored
      The unit test failed because the attribute references could not be resolved. Temporarily disable this test case as a quick fix; otherwise it will block the others.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2334 from chenghao-intel/unit_test_failure and squashes the following commits:
      
      661f784 [Cheng Hao] temporally disable the failed test case
      1e03cf79
    • Mario Pastorelli's avatar
      [Docs] actorStream storageLevel default is MEMORY_AND_DISK_SER_2 · c419e4f1
      Mario Pastorelli authored
      The comment on the storageLevel param of actorStream says that it defaults to memory-only, while the actual default is MEMORY_AND_DISK_SER_2.
      
      Author: Mario Pastorelli <pastorelli.mario@gmail.com>
      
      Closes #2319 from melrief/master and squashes the following commits:
      
      7b6ce68 [Mario Pastorelli] [Docs] actorStream storageLevel default is MEMORY_AND_DISK_SER_2
      c419e4f1
    • Cheng Lian's avatar
      [Build] Removed -Phive-thriftserver since this profile has been removed · ce5cb325
      Cheng Lian authored
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2269 from liancheng/clean-run-tests-profile and squashes the following commits:
      
      08617bd [Cheng Lian] Removed -Phive-thriftserver since this profile has been removed
      ce5cb325
  5. Sep 08, 2014
    • Mark Hamstra's avatar
      SPARK-2425 Don't kill a still-running Application because of some misbehaving Executors · 092e2f15
      Mark Hamstra authored
      Introduces a LOADING -> RUNNING ApplicationState transition and prevents Master from removing an Application with RUNNING Executors.
      
      Two basic changes: 1) Instead of allowing MAX_NUM_RETRY abnormal Executor exits over the entire lifetime of the Application, allow that many since an Executor last successfully began running the Application; 2) Don't remove the Application while Master still thinks that there are RUNNING Executors.
      
      This should be fine as long as the ApplicationInfo doesn't believe any Executors are forever RUNNING when they are not.  I think that any non-RUNNING Executors will eventually no longer be RUNNING in Master's accounting, but another set of eyes should confirm that.  This PR also doesn't try to detect which nodes have gone rogue or to kill off bad Workers, so repeatedly failing Executors will continue to fail and fill up log files with failure reports as long as the Application keeps running.
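
The two changes described above can be sketched roughly as follows. This is a simplified Python model, not the Master's Scala code: the class `AppInfo`, its method names, and the value of `MAX_NUM_RETRY` are assumptions chosen for illustration.

```python
MAX_NUM_RETRY = 10  # assumed value, for illustration only


class AppInfo:
    """Hypothetical, simplified stand-in for the Master's per-application bookkeeping."""

    def __init__(self):
        self.retry_count = 0
        self.executor_states = {}  # executor id -> "LOADING" | "RUNNING" | "FAILED"

    def executor_started(self, exec_id):
        self.executor_states[exec_id] = "LOADING"

    def executor_running(self, exec_id):
        # LOADING -> RUNNING: an executor really started, so forget earlier failures.
        self.executor_states[exec_id] = "RUNNING"
        self.retry_count = 0

    def executor_failed(self, exec_id):
        """Record an abnormal exit; return True if the application should be removed."""
        self.executor_states[exec_id] = "FAILED"
        self.retry_count += 1
        any_running = "RUNNING" in self.executor_states.values()
        # Remove only after too many consecutive failures AND with nothing still running.
        return self.retry_count >= MAX_NUM_RETRY and not any_running
```

In this model, a single healthy executor keeps the application alive no matter how many other executors fail, which matches the caveat above about log files filling up.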
      
      Author: Mark Hamstra <markhamstra@gmail.com>
      
      Closes #1360 from markhamstra/SPARK-2425 and squashes the following commits:
      
      f099c0b [Mark Hamstra] Reuse appInfo
      b2b7b25 [Mark Hamstra] Moved 'Application failed' logging
      bdd0928 [Mark Hamstra] switched to string interpolation
      1dd591b [Mark Hamstra] SPARK-2425 introduce LOADING -> RUNNING ApplicationState transition and prevent Master from removing Application with RUNNING Executors
      092e2f15
    • William Benton's avatar
      [SPARK-3329][SQL] Don't depend on Hive SET pair ordering in tests. · 2b7ab814
      William Benton authored
      This fixes some possible spurious test failures in `HiveQuerySuite` by comparing sets of key-value pairs as sets, rather than as lists.
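
The difference between list and set comparison can be shown in a few lines. This is a generic Python sketch, not the `HiveQuerySuite` code; the sample pairs and the helper `same_pairs` are made up for the example.

```python
def same_pairs(expected, actual):
    """Compare key=value pairs while ignoring the order in which they were returned."""
    return set(expected) == set(actual)


if __name__ == "__main__":
    expected = ["a=1", "b=2"]
    actual = ["b=2", "a=1"]  # same pairs, different order
    print(expected == actual)            # list comparison is order-sensitive
    print(same_pairs(expected, actual))  # set comparison is not
```

Note that a set comparison also ignores duplicates, which is acceptable when each key is expected to appear once.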
      
      Author: William Benton <willb@redhat.com>
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #2220 from willb/spark-3329 and squashes the following commits:
      
      3b3e205 [William Benton] Collapse collectResults case match in HiveQuerySuite
      6525d8e [William Benton] Handle cases where SET returns Rows of (single) strings
      cf11b0e [Aaron Davidson] Fix flakey HiveQuerySuite test
      2b7ab814
    • Cheng Lian's avatar
      [SPARK-3414][SQL] Stores analyzed logical plan when registering a temp table · dc1dbf20
      Cheng Lian authored
      Case insensitivity breaks when an unresolved relation contains attributes with uppercase letters in their names, because we store the unanalyzed logical plan when registering temp tables, while the `CaseInsensitivityAttributeReferences` batch runs before the `Resolution` batch. To fix this issue, we need to store the analyzed logical plan.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2293 from liancheng/spark-3414 and squashes the following commits:
      
      d9fa1d6 [Cheng Lian] Stores analyzed logical plan when registering a temp table
      dc1dbf20
    • William Benton's avatar
      SPARK-3423: [SQL] Implement BETWEEN for SQLParser · ca0348e6
      William Benton authored
      This patch improves the SQLParser by adding support for BETWEEN conditions.
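
In standard SQL, `x BETWEEN lo AND hi` is equivalent to `x >= lo AND x <= hi`, with both bounds inclusive. One common way for a parser to support it is to rewrite it into those two comparisons. The sketch below illustrates that rewrite with a toy expression tree in Python; the node classes and the `between` and `evaluate` helpers are hypothetical and are not taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class GreaterThanOrEqual:
    left: object
    right: object


@dataclass(frozen=True)
class LessThanOrEqual:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def between(expr, lower, upper):
    """Rewrite `expr BETWEEN lower AND upper` as two inclusive comparisons."""
    return And(GreaterThanOrEqual(expr, lower), LessThanOrEqual(expr, upper))


def evaluate(node):
    """Evaluate the toy expression tree."""
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, GreaterThanOrEqual):
        return evaluate(node.left) >= evaluate(node.right)
    if isinstance(node, LessThanOrEqual):
        return evaluate(node.left) <= evaluate(node.right)
    if isinstance(node, And):
        return evaluate(node.left) and evaluate(node.right)
    raise TypeError(f"unsupported node: {node!r}")
```

With this rewrite, no new expression type is needed downstream, since the existing comparison and conjunction nodes already cover the semantics.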
      
      Author: William Benton <willb@redhat.com>
      
      Closes #2295 from willb/sql-between and squashes the following commits:
      
      0016d30 [William Benton] Implement BETWEEN for SQLParser
      ca0348e6
    • Xiangrui Meng's avatar
      [SPARK-3443][MLLIB] update default values of tree: · 50a4fa77
      Xiangrui Meng authored
      Adjust the default values of decision tree, based on the memory requirement discussed in https://github.com/apache/spark/pull/2125 :
      
      1. maxMemoryInMB: 128 -> 256
      2. maxBins: 100 -> 32
      3. maxDepth: 4 -> 5 (in some example code)
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #2322 from mengxr/tree-defaults and squashes the following commits:
      
      cda453a [Xiangrui Meng] fix tests
      5900445 [Xiangrui Meng] update comments
      8c81831 [Xiangrui Meng] update default values of tree:
      50a4fa77