  1. Jun 08, 2015
    • [SPARK-8126] [BUILD] Use custom temp directory during build. · a1d9e5cc
      Marcelo Vanzin authored
      Even with all the efforts to cleanup the temp directories created by
      unit tests, Spark leaves a lot of garbage in /tmp after a test run.
      This change overrides java.io.tmpdir to place those files under the
      build directory instead.
      
      After an sbt full unit test run, I was left with > 400 MB of temp
      files. Since they're now under the build dir, it's much easier to
      clean them up.
      
      Also make a slight change to a unit test to make it not pollute the
      source directory with test data.
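
      A minimal sketch of how this can be wired up in an sbt build (illustrative only; the setting names and the `target/tmp` location are assumptions, not necessarily what Spark's build uses):

      ```scala
      // build.sbt sketch: route forked test JVMs' temp files under the build directory.
      Test / fork := true  // java.io.tmpdir only takes effect in a forked JVM
      Test / javaOptions += s"-Djava.io.tmpdir=${baseDirectory.value / "target" / "tmp"}"
      ```

      As the squashed commits below hint, the chosen directory also has to exist before the tests start, so the build must create it.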
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6674 from vanzin/SPARK-8126 and squashes the following commits:
      
      0f8ad41 [Marcelo Vanzin] Make sure tmp dir exists when tests run.
      643e916 [Marcelo Vanzin] [MINOR] [BUILD] Use custom temp directory during build.
      a1d9e5cc
    • [SPARK-7939] [SQL] Add conf to enable/disable partition column type inference · 03ef6be9
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7939
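
      A usage sketch; the configuration key shown is an assumption based on how this option is commonly referred to, not something stated in the commit message:

      ```scala
      // Hypothetical usage: disable partition column type inference so partition
      // values are kept as strings instead of being inferred as numeric/date types.
      sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
      ```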
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6503 from viirya/disable_partition_type_inference and squashes the following commits:
      
      3e90470 [Liang-Chi Hsieh] Default to enable type inference and update docs.
      455edb1 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into disable_partition_type_inference
      9a57933 [Liang-Chi Hsieh] Add conf to enable/disable partition column type inference.
      03ef6be9
    • [SPARK-7705] [YARN] Cleanup of .sparkStaging directory fails if application is killed · eacd4a92
      linweizhong authored
      As I have tested, if we cancel or kill the app, the final status may be UNDEFINED, KILLED, or SUCCEEDED, so the staging directory should be cleaned up when the ApplicationMaster exits with any final application status.
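
      A minimal sketch of the idea, with assumed helper names rather than the actual ApplicationMaster code:

      ```scala
      import org.apache.hadoop.yarn.api.records.FinalApplicationStatus

      // Illustrative only: clean up the .sparkStaging directory regardless of the
      // final status the ApplicationMaster exits with.
      def finish(status: FinalApplicationStatus,
                 unregister: FinalApplicationStatus => Unit,
                 cleanupStagingDir: () => Unit): Unit = {
        try {
          unregister(status)    // UNDEFINED, KILLED, FAILED, or SUCCEEDED
        } finally {
          cleanupStagingDir()   // previously this only happened on success
        }
      }
      ```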
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #6409 from Sephiroth-Lin/SPARK-7705 and squashes the following commits:
      
      3a5a0a5 [linweizhong] Update
      83dc274 [linweizhong] Update
      923d44d [linweizhong] Update
      0dd7c2d [linweizhong] Update
      b76a102 [linweizhong] Update code style
      7846b69 [linweizhong] Update
      bd6cf0d [linweizhong] Refactor
      aed9f18 [linweizhong] Clean up stagingDir when launch app on yarn
      95595c3 [linweizhong] Cleanup of .sparkStaging directory when AppMaster exit at any final application status
      eacd4a92
    • [SPARK-4761] [DOC] [SQL] kryo default setting in SQL Thrift server · 10fc2f6f
      Daoyuan Wang authored
      this is a follow up of #3621
      
      /cc liancheng pwendell
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #6639 from adrian-wang/kryodoc and squashes the following commits:
      
      3c4b1cf [Daoyuan Wang] [DOC] kryo default setting in SQL Thrift server
      10fc2f6f
    • [SPARK-8154][SQL] Remove Term/Code type aliases in code generation. · 72ba0fc4
      Reynold Xin authored
      From my perspective as a code reviewer, I find them more confusing than using String directly.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6694 from rxin/SPARK-8154 and squashes the following commits:
      
      4e5056c [Reynold Xin] [SPARK-8154][SQL] Remove Term/Code type aliases in code generation.
      72ba0fc4
  2. Jun 07, 2015
    • [SPARK-8149][SQL] Break ExpressionEvaluationSuite down to multiple files · f74be744
      Reynold Xin authored
      Also moved a few files in expressions package around to match test suites.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6693 from rxin/expr-refactoring and squashes the following commits:
      
      857599f [Reynold Xin] Fixed style violation.
      c0eb74b [Reynold Xin] Fixed compilation.
      b3a40f8 [Reynold Xin] Refactored expression test suites.
      f74be744
    • [SPARK-8117] [SQL] Push codegen implementation into each Expression · 5e7b6b67
      Davies Liu authored
      This PR moves the codegen implementation of expressions into the Expression class itself, making it easier to manage.
      
      It introduces two APIs in Expression:
      ```
      def gen(ctx: CodeGenContext): GeneratedExpressionCode
      def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code
      ```
      
      gen(ctx) will call genSource(ctx, ev) to generate Java source code for the current expression. An expression needs to override genSource().
      
      Here are the types:
      ```
      type Term = String
      type Code = String
      
      /**
       * Java source for evaluating an [[Expression]] given a [[Row]] of input.
       */
      case class GeneratedExpressionCode(var code: Code,
                                         nullTerm: Term,
                                         primitiveTerm: Term,
                                         objectTerm: Term)
      /**
       * A context for codegen, used to keep track of expressions that are not supported
       * by codegen so they can be evaluated directly. Each unsupported expression is
       * appended to `references`; its position is recorded in the generated code and
       * used to access and evaluate it.
       */
      class CodeGenContext {
        /**
         * Holds all the expressions that do not support codegen; they will be evaluated directly.
         */
        val references: Seq[Expression] = new mutable.ArrayBuffer[Expression]()
      }
      ```
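
      A hedged sketch of what overriding the codegen hook for a concrete expression might look like, using the types quoted above (the `Add` expression and its method body are illustrative assumptions, not the actual Spark source):

      ```scala
      // Illustrative only: a binary expression generating Java source via gen()/genSource().
      case class Add(left: Expression, right: Expression) extends Expression {
        override def genSource(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code = {
          val leftGen = left.gen(ctx)    // recursively generate code for the children
          val rightGen = right.gen(ctx)
          s"""
            ${leftGen.code}
            ${rightGen.code}
            boolean ${ev.nullTerm} = ${leftGen.nullTerm} || ${rightGen.nullTerm};
            int ${ev.primitiveTerm} = 0;
            if (!${ev.nullTerm}) {
              ${ev.primitiveTerm} = ${leftGen.primitiveTerm} + ${rightGen.primitiveTerm};
            }
          """
        }
      }
      ```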
      
      This is basically #6660, but fixed style violation and compilation failure.
      
      Author: Davies Liu <davies@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6690 from rxin/codegen and squashes the following commits:
      
      e1368c2 [Reynold Xin] Fixed tests.
      73db80e [Reynold Xin] Fixed compilation failure.
      19d6435 [Reynold Xin] Fixed style violation.
      9adaeaf [Davies Liu] address comments
      f42c732 [Davies Liu] improve coverage and tests
      bad6828 [Davies Liu] address comments
      e03edaa [Davies Liu] consts fold
      86fac2c [Davies Liu] fix style
      02262c9 [Davies Liu] address comments
      b5d3617 [Davies Liu] Merge pull request #5 from rxin/codegen
      48c454f [Reynold Xin] Some code gen update.
      2344bc0 [Davies Liu] fix test
      12ff88a [Davies Liu] fix build
      c5fb514 [Davies Liu] rename
      8c6d82d [Davies Liu] update docs
      b145047 [Davies Liu] fix style
      e57959d [Davies Liu] add type alias
      3ff25f8 [Davies Liu] refactor
      593d617 [Davies Liu] pushing codegen into Expression
      5e7b6b67
    • [SPARK-2808] [STREAMING] [KAFKA] cleanup tests from · b127ff8a
      cody koeninger authored
      see if requiring producer acks eliminates the need for waitUntilLeaderOffset calls in tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #5921 from koeninger/kafka-0.8.2-test-cleanup and squashes the following commits:
      
      1e89dc8 [cody koeninger] Merge branch 'master' into kafka-0.8.2-test-cleanup
      4662828 [cody koeninger] [Streaming][Kafka] filter mima issue for removal of method from private test class
      af1e083 [cody koeninger] Merge branch 'master' into kafka-0.8.2-test-cleanup
      4298ac2 [cody koeninger] [Streaming][Kafka] update comment to trigger jenkins attempt
      1274afb [cody koeninger] [Streaming][Kafka] see if requiring producer acks eliminates the need for waitUntilLeaderOffset calls in tests
      b127ff8a
    • [SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+ · e84815dc
      Sean Owen authored
      Update build to use Java 7, and remove some comments and special-case support for Java 6.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6265 from srowen/SPARK-7733 and squashes the following commits:
      
      59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6
      e84815dc
    • [SPARK-7952][SQL] use internal Decimal instead of java.math.BigDecimal · db81b9d8
      Wenchen Fan authored
      This PR fixes a bug introduced in https://github.com/apache/spark/pull/6505.
      A Decimal literal's value is not a `java.math.BigDecimal`, but the Spark SQL internal type `Decimal`.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6574 from cloud-fan/fix and squashes the following commits:
      
      b0e3549 [Wenchen Fan] rename to BooleanEquality
      1987b37 [Wenchen Fan] use Decimal instead of java.math.BigDecimal
      f93c420 [Wenchen Fan] compare literal
      db81b9d8
    • [SPARK-8004][SQL] Quote identifier in JDBC data source. · d6d601a0
      Reynold Xin authored
      This is a follow-up patch to #6577 that replaces columnEnclosing with quoteIdentifier.
      
      I also did some minor cleanup to the JdbcDialect file.
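
      A short sketch of the kind of dialect hook this renames to (the base implementation and the MySQL override are illustrative assumptions, not the exact Spark API):

      ```scala
      // Illustrative only: quote identifiers so reserved words and special
      // characters survive in generated SQL.
      abstract class JdbcDialect {
        // ANSI SQL default: wrap the identifier in double quotes.
        def quoteIdentifier(colName: String): String = "\"" + colName + "\""
      }

      object MySQLDialect extends JdbcDialect {
        // MySQL quotes identifiers with backticks instead.
        override def quoteIdentifier(colName: String): String = s"`$colName`"
      }
      ```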
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6689 from rxin/jdbc-quote and squashes the following commits:
      
      bad365f [Reynold Xin] Fixed test compilation...
      e39e14e [Reynold Xin] Fixed compilation.
      db9a8e0 [Reynold Xin] [SPARK-8004][SQL] Quote identifier in JDBC data source.
      d6d601a0
    • [DOC] [TYPO] Fix typo in standalone deploy scripts description · 835f1380
      Yijie Shen authored
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #6691 from yijieshen/patch-2 and squashes the following commits:
      
      b40a4b0 [Yijie Shen] [DOC][TYPO] Fix typo in standalone deploy scripts description
      835f1380
    • [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x · ca8dafcc
      Konstantin Shaposhnikov authored
      Both akka 2.3.x and hadoop-2.x use protobuf 2.5, so only the hadoop-1 build needs the
      custom 2.3.4-spark akka version that shades protobuf 2.5.
      
      This change also updates akka version (for hadoop-2.x profiles only) to the
      latest 2.3.11 as akka-zeromq_2.11 is not available for akka 2.3.4.
      
      This partially fixes SPARK-7042 (for hadoop-2.x builds)
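
      A minimal sbt-style sketch of the dependency split described above (Spark's actual build does this through Maven profiles, so the coordinates and the `hadoop.version` handling here are illustrative):

      ```scala
      // Illustrative only: hadoop-1 keeps the protobuf-shaded akka fork,
      // hadoop-2.x uses the standard com.typesafe.akka artifacts.
      val hadoopVersion = sys.props.getOrElse("hadoop.version", "2.4.0")

      libraryDependencies += {
        if (hadoopVersion.startsWith("1."))
          "org.spark-project.akka" %% "akka-remote" % "2.3.4-spark" // shades protobuf 2.5
        else
          "com.typesafe.akka" %% "akka-remote" % "2.3.11"           // standard artifacts
      }
      ```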
      
      Author: Konstantin Shaposhnikov <Konstantin.Shaposhnikov@sc.com>
      
      Closes #6492 from kostya-sh/SPARK-7042 and squashes the following commits:
      
      dc195b0 [Konstantin Shaposhnikov] [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x
      ca8dafcc
    • [SPARK-8118] [SQL] Mutes noisy Parquet log output reappeared after upgrading Parquet to 1.7.0 · 8c321d66
      Cheng Lian authored
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6670 from liancheng/spark-8118 and squashes the following commits:
      
      b6e85a6 [Cheng Lian] Suppresses unnecesary ParquetRecordReader log message (PARQUET-220)
      385603c [Cheng Lian] Mutes noisy Parquet log output reappeared after upgrading Parquet to 1.7.0
      8c321d66
    • [SPARK-8146] DataFrame Python API: Alias replace in df.na · 0ac47083
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6688 from rxin/df-alias-replace and squashes the following commits:
      
      774c19c [Reynold Xin] [SPARK-8146] DataFrame Python API: Alias replace in DataFrameNaFunctions.
      0ac47083
    • [SPARK-8141] [SQL] Precompute datatypes for partition columns and reuse it · 26d07f1e
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8141
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6687 from viirya/reuse_partition_column_types and squashes the following commits:
      
      dab0688 [Liang-Chi Hsieh] Reuse partitionColumnTypes.
      26d07f1e
    • [SPARK-8145] [WEBUI] Trigger a double click on the span to show full job description. · 081db947
      979969786 authored
      When using Spark SQL, the Jobs and Stages tabs display only part of the SQL statement. I changed it to display the full SQL on double-clicking the description span.
      
      before:
      ![before](https://cloud.githubusercontent.com/assets/5399861/8022257/9f8e0a22-0cf8-11e5-98c8-da4d7a615e7e.png)
      
      after double click on the description span:
      ![after](https://cloud.githubusercontent.com/assets/5399861/8022261/dac08d4a-0cf8-11e5-8fe7-74c96c6ce933.png)
      
      Author: 979969786 <q79969786@gmail.com>
      
      Closes #6646 from 979969786/master and squashes the following commits:
      
      b5ba20e [979969786] Trigger a double click on the span to show full job description.
      081db947
    • [SPARK-8004][SQL] Enclose column names by JDBC Dialect · 901a552c
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8004
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6577 from viirya/enclose_jdbc_columns and squashes the following commits:
      
      614606a [Liang-Chi Hsieh] For comment.
      bc50182 [Liang-Chi Hsieh] Enclose column names by JDBC Dialect.
      901a552c
  3. Jun 06, 2015
    • [SPARK-7955] [CORE] Ensure executors with cached RDD blocks are not re… · 3285a511
      Hari Shreedharan authored
      …moved if dynamic allocation is enabled.
      
      This is a work in progress. This patch ensures that an executor that has cached RDD blocks is not removed,
      but it makes no attempt to find another executor to remove instead. It is meant to get some feedback on the current
      approach; if it makes sense, I will look at choosing another executor to remove. No testing has been done yet either.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6508 from harishreedharan/dymanic-caching and squashes the following commits:
      
      dddf1eb [Hari Shreedharan] Minor configuration description update.
      10130e2 [Hari Shreedharan] Fix compile issue.
      5417b53 [Hari Shreedharan] Add documentation for new config. Remove block from cachedBlocks when it is dropped.
      875916a [Hari Shreedharan] Make some code more readable.
      39940ca [Hari Shreedharan] Handle the case where the executor has not yet registered.
      90ad711 [Hari Shreedharan] Remove unused imports and unused methods.
      063985c [Hari Shreedharan] Send correct message instead of recursively calling same method.
      ec2fd7e [Hari Shreedharan] Add file missed in last commit
      5d10fad [Hari Shreedharan] Update cached blocks status using local info, rather than doing an RPC.
      193af4c [Hari Shreedharan] WIP. Use local state rather than via RPC.
      ae932ff [Hari Shreedharan] Fix config param name.
      272969d [Hari Shreedharan] Fix seconds to millis bug.
      5a1993f [Hari Shreedharan] Add timeout for cache executors. Ignore broadcast blocks while checking if there are cached blocks.
      57fefc2 [Hari Shreedharan] [SPARK-7955][Core] Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.
      3285a511
    • [SPARK-8136] [YARN] Fix flakiness in YarnClusterSuite. · ed2cc3ee
      Hari Shreedharan authored
      Instead of actually downloading the logs, just verify that the logs link is a URL
      in the expected format.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6680 from harishreedharan/simplify-am-log-tests and squashes the following commits:
      
      3183aeb [Hari Shreedharan] Remove check for hostname which can fail on machines with several hostnames. Removed some unused imports.
      50d69a7 [Hari Shreedharan] [SPARK-8136][YARN] Fix flakiness in YarnClusterSuite.
      ed2cc3ee
    • [SPARK-7169] [CORE] Allow metrics system to be configured through SparkConf. · 18c4fceb
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #6560 from vanzin/SPARK-7169 and squashes the following commits:
      
      737266f [Marcelo Vanzin] Feedback.
      702d5a3 [Marcelo Vanzin] Scalastyle.
      ce66e7e [Marcelo Vanzin] Remove metrics config handling from SparkConf.
      439938a [Jacek Lewandowski] SPARK-7169: Metrics can be additionally configured from Spark configuration
      18c4fceb
    • [SPARK-7639] [PYSPARK] [MLLIB] Python API for KernelDensity · 5aa804f3
      MechCoder authored
      Python API for KernelDensity
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6387 from MechCoder/spark-7639 and squashes the following commits:
      
      17abc62 [MechCoder] add tests
      2de6540 [MechCoder] style tests
      bf4acc0 [MechCoder] Added doctests
      84359d5 [MechCoder] [SPARK-7639] Python API for KernelDensity
      5aa804f3
    • [SPARK-8079] [SQL] Makes InsertIntoHadoopFsRelation job/task abortion more robust · 16fc4961
      Cheng Lian authored
      As described in SPARK-8079, when writing a DataFrame to a `HadoopFsRelation`, if `HadoopFsRelation.prepareForWriteJob` throws an exception, an unexpected NPE will be thrown during job abortion. (This issue doesn't do much damage since the job is failing anyway.)
      
      This PR makes the job/task abortion logic in `InsertIntoHadoopFsRelation` more robust to avoid such confusing exceptions.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6612 from liancheng/spark-8079 and squashes the following commits:
      
      87cd81e [Cheng Lian] Addresses @rxin's comment
      1864c75 [Cheng Lian] Addresses review comments
      9e6dbb3 [Cheng Lian] Makes InsertIntoHadoopFsRelation job/task abortion more robust
      16fc4961
    • [SPARK-6973] remove skipped stage ID from completed set on the allJobsPage · a8077e5c
      Xu Tingjun authored
      Although totalStages = allStages - skippedStages is understandable, considering the problem in [SPARK-6973] I think totalStages = allStages is more reasonable. An entry like "2/1 (2 failed) (1 skipped)" still shows the skipped count, so it remains understandable.
      
      Author: Xu Tingjun <xutingjun@huawei.com>
      Author: Xutingjun <xutingjun@huawei.com>
      Author: meiyoula <1039320815@qq.com>
      
      Closes #5550 from XuTingjun/allJobsPage and squashes the following commits:
      
      a742541 [Xu Tingjun] delete the loop
      40ce94b [Xutingjun] remove stage id from completed set if it retries again
      6459238 [meiyoula] delete space
      9e23c71 [Xu Tingjun] recover numSkippedStages
      b987ea7 [Xutingjun] delete skkiped stages from completed set
      47525c6 [Xu Tingjun] modify total stages/tasks on the allJobsPage
      a8077e5c
    • [SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ round 3. · a71be0a3
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6677 from rxin/test-wildcard and squashes the following commits:
      
      8a17b33 [Reynold Xin] Fixed line length.
      6663813 [Reynold Xin] [SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ round 3.
      a71be0a3
  4. Jun 05, 2015
    • [SPARK-6964] [SQL] Support Cancellation in the Thrift Server · eb19d3f7
      Dong Wang authored
      Support runInBackground in SparkExecuteStatementOperation, and add cancellation
      
      Author: Dong Wang <dong@databricks.com>
      
      Closes #6207 from dongwang218/SPARK-6964-jdbc-cancel and squashes the following commits:
      
      687c113 [Dong Wang] fix 100 characters
      7bfa2a7 [Dong Wang] fix merge
      380480f [Dong Wang] fix for liancheng's comments
      eb3e385 [Dong Wang] small nit
      341885b [Dong Wang] small fix
      3d8ebf8 [Dong Wang] add spark.sql.hive.thriftServer.async flag
      04142c3 [Dong Wang] set SQLSession for async execution
      184ec35 [Dong Wang] keep hive conf
      819ae03 [Dong Wang] [SPARK-6964][SQL][WIP] Support Cancellation in the Thrift Server
      eb19d3f7
    • [SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ cont'd. · 6ebe419f
      Reynold Xin authored
      Fixed the following packages:
      sql.columnar
      sql.jdbc
      sql.json
      sql.parquet
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6667 from rxin/testsqlcontext_wildcard and squashes the following commits:
      
      134a776 [Reynold Xin] Fixed compilation break.
      6da7b69 [Reynold Xin] [SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ cont'd.
      6ebe419f
    • [SPARK-7991] [PySpark] Adding support for passing lists to describe. · 356a4a9b
      amey authored
      This is a minor change.
      
      Author: amey <amey@skytree.net>
      
      Closes #6655 from ameyc/JIRA-7991/support-passing-list-to-describe and squashes the following commits:
      
      e8a1dff [amey] Adding support for passing lists to describe.
      356a4a9b
    • [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort · 4060526c
      Luca Martinetti authored
      Add documentation for spark.sql.planner.externalSort
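
      The flag being documented can be toggled like any other SQL conf; a minimal usage sketch (the surrounding `sqlContext` is assumed to be in scope):

      ```scala
      // Enable sort-based external sorting so large sorts can spill to disk.
      sqlContext.setConf("spark.sql.planner.externalSort", "true")
      // or equivalently, from SQL:
      sqlContext.sql("SET spark.sql.planner.externalSort=true")
      ```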
      
      Author: Luca Martinetti <luca@luca.io>
      
      Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:
      
      985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort
      4060526c
    • [SPARK-8112] [STREAMING] Fix the negative event count issue · 4f16d3fe
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6659 from zsxwing/SPARK-8112 and squashes the following commits:
      
      a5d7da6 [zsxwing] Address comments
      d255b6e [zsxwing] Fix the negative event count issue
      4f16d3fe
    • [SPARK-7699] [CORE] Lazy start the scheduler for dynamic allocation · 3f80bc84
      jerryshao authored
      This patch proposes to lazily start the scheduler for dynamic allocation, to avoid a fast ramp-down of executor numbers when the load is low.
      
      This implementation will:
      1. Immediately start the scheduler if `numExecutorsTarget` is 0; this is the expected behavior.
      2. If `numExecutorsTarget` is not zero, delay starting the scheduler until that number of executors is reached. If the load is low, the initially started executors will last for at least 60 seconds, giving the user a window to submit a job without the executors being ramped down.
      3. If `numExecutorsTarget` is still not reached when the timeout expires, resources are insufficient; the scheduler starts at the timeout rather than waiting indefinitely (see the sketch after this list).
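
      A rough sketch of the start condition described above; names such as `numExecutorsTarget` and the timeout field are assumptions, not the actual ExecutorAllocationManager internals:

      ```scala
      // Illustrative only: decide when the allocation schedule should begin.
      class LazySchedulerStart(numExecutorsTarget: Int, startTimeoutMs: Long) {
        private val createdAt = System.currentTimeMillis()

        def shouldStart(currentExecutors: Int): Boolean = {
          if (numExecutorsTarget == 0) true                              // case 1: start immediately
          else if (currentExecutors >= numExecutorsTarget) true          // case 2: initial target met
          else System.currentTimeMillis() - createdAt >= startTimeoutMs  // case 3: don't wait forever
        }
      }
      ```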
      
      Please help to review, thanks a lot.
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #6430 from jerryshao/SPARK-7699 and squashes the following commits:
      
      02cac8e [jerryshao] Address the comments
      7242450 [jerryshao] Remove the useless import
      ecc0b00 [jerryshao] Address the comments
      6f75f00 [jerryshao] Style changes
      8b8decc [jerryshao] change the test name
      fb822ca [jerryshao] Change the solution according to comments
      1cc74e5 [jerryshao] Lazy start the scheduler for dynamic allocation
      3f80bc84
    • [SPARK-8099] set executor cores into system in yarn-cluster mode · 0992a0a7
      Xutingjun authored
      Author: Xutingjun <xutingjun@huawei.com>
      Author: xutingjun <xutingjun@huawei.com>
      
      Closes #6643 from XuTingjun/SPARK-8099 and squashes the following commits:
      
      80b18cd [Xutingjun] change to STANDALONE | YARN
      ce33148 [Xutingjun] set executor cores into system
      e51cc9e [Xutingjun] set executor cores into system
      0600861 [xutingjun] set executor cores into system
      0992a0a7
    • Revert "[MINOR] [BUILD] Use custom temp directory during build." · 4036d05c
      Andrew Or authored
      This reverts commit b16b5434.
      4036d05c
    • [SPARK-8085] [SPARKR] Support user-specified schema in read.df · 12f5eaee
      Shivaram Venkataraman authored
      cc davies sun-rui
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6620 from shivaram/sparkr-read-schema and squashes the following commits:
      
      16a6726 [Shivaram Venkataraman] Fix loadDF to pass schema Also add a unit test
      a229877 [Shivaram Venkataraman] Use wrapper function to DataFrameReader
      ee70ba8 [Shivaram Venkataraman] Support user-specified schema in read.df
      12f5eaee
    • [SQL] Simplifies binary node pattern matching · bc0d76a2
      Cheng Lian authored
      This PR is a simpler version of #2764, and adds `unapply` methods to the following binary nodes for simpler pattern matching:
      
      - `BinaryExpression`
      - `BinaryComparison`
      - `BinaryArithmetics`
      
      This enables nested pattern matching for binary nodes. For example, the following pattern matching
      
      ```scala
      case p: BinaryComparison if p.left.dataType == StringType &&
                                  p.right.dataType == DateType =>
        p.makeCopy(Array(p.left, Cast(p.right, StringType)))
      ```
      
      can be simplified to
      
      ```scala
      case p @ BinaryComparison(l @ StringType(), r @ DateType()) =>
        p.makeCopy(Array(l, Cast(r, StringType)))
      ```
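
      For reference, a hedged sketch of the kind of extractor that makes the nested match above possible (the actual Catalyst definitions may differ slightly):

      ```scala
      // Adding a companion extractor lets `case BinaryComparison(l, r)` destructure
      // any concrete comparison into its left and right children.
      object BinaryComparison {
        def unapply(e: BinaryComparison): Option[(Expression, Expression)] =
          Some((e.left, e.right))
      }
      ```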
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6537 from liancheng/binary-node-patmat and squashes the following commits:
      
      a3bf5fe [Cheng Lian] Fixes compilation error introduced while rebasing
      b738986 [Cheng Lian] Renames `l`/`r` to `left`/`right` or `lhs`/`rhs`
      14900ae [Cheng Lian] Simplifies binary node pattern matching
      bc0d76a2
    • [SPARK-6324] [CORE] Centralize handling of script usage messages. · 700312e1
      Marcelo Vanzin authored
      Reorganize code so that the launcher library handles most of the work
      of printing usage messages, instead of having an awkward protocol between
      the library and the scripts for that.
      
      This mostly applies to SparkSubmit, since the launcher lib does not do
      command line parsing for classes invoked in other ways, and thus cannot
      handle failures for those. Most scripts end up going through SparkSubmit,
      though, so it all works.
      
      The change adds a new, internal command line switch, "--usage-error",
      which prints the usage message and exits with a non-zero status. Scripts
      can override the command printed in the usage message by setting an
      environment variable - this avoids having to grep the output of
      SparkSubmit to remove references to the "spark-submit" script.
      
      The only sub-optimal part of the change is the special handling for the
      spark-sql usage, which is now done in SparkSubmitArguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5841 from vanzin/SPARK-6324 and squashes the following commits:
      
      2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324
      bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help.
      c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages.
      6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
      700312e1
    • [STREAMING] Update streaming-kafka-integration.md · 019dc9f5
      Akhil Das authored
      Fixed the broken links (Examples) in the documentation.
      
      Author: Akhil Das <akhld@darktech.ca>
      
      Closes #6666 from akhld/patch-2 and squashes the following commits:
      
      2228b83 [Akhil Das] Update streaming-kafka-integration.md
      019dc9f5
    • [MINOR] [BUILD] Use custom temp directory during build. · b16b5434
      Marcelo Vanzin authored
      Even with all the efforts to cleanup the temp directories created by
      unit tests, Spark leaves a lot of garbage in /tmp after a test run.
      This change overrides java.io.tmpdir to place those files under the
      build directory instead.
      
      After an sbt full unit test run, I was left with > 400 MB of temp
      files. Since they're now under the build dir, it's much easier to
      clean them up.
      
      Also make a slight change to a unit test to make it not pollute the
      source directory with test data.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:
      
      31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
      aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.
      b16b5434
    • [MINOR] [BUILD] Change link to jenkins builds on github. · da20c8ca
      Marcelo Vanzin authored
      Link to the tail of the console log, instead of the full log. That's
      bound to have the info the user is looking for, and at the same time
      loads way more quickly than the (huge) full log, which is just one click
      away if needed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6664 from vanzin/jenkins-link and squashes the following commits:
      
      ba07ed8 [Marcelo Vanzin] [minor] [build] Change link to jenkins builds on github.
      da20c8ca
    • [MINOR] remove unused interpolation var in log message · 3a5c4da4
      Sean Owen authored
      Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6650 from srowen/Interpolation and squashes the following commits:
      
      518687a [Sean Owen] Actually interpolate log string
      7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message
      3a5c4da4