  1. Jun 05, 2015
    • [SPARK-6324] [CORE] Centralize handling of script usage messages. · 700312e1
      Marcelo Vanzin authored
      Reorganize code so that the launcher library handles most of the work
      of printing usage messages, instead of having an awkward protocol between
      the library and the scripts for that.
      
      This mostly applies to SparkSubmit, since the launcher lib does not do
      command line parsing for classes invoked in other ways, and thus cannot
      handle failures for those. Most scripts end up going through SparkSubmit,
      though, so it all works.
      
      The change adds a new, internal command line switch, "--usage-error",
      which prints the usage message and exits with a non-zero status. Scripts
      can override the command printed in the usage message by setting an
      environment variable - this avoids having to grep the output of
      SparkSubmit to remove references to the "spark-submit" script.
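A minimal Python sketch of the protocol described above. It is not the launcher library's code, which is Java. The environment variable name `HYPOTHETICAL_CMD_USAGE` and the default usage string are assumptions for illustration, since the message does not name them.

```python
import os

# Hypothetical variable name: the commit message does not say what the real one is.
USAGE_ENV_VAR = "HYPOTHETICAL_CMD_USAGE"
# Illustrative default; the launcher's real usage text may differ.
DEFAULT_USAGE = "Usage: spark-submit [options] <app jar | python file> [app arguments]"


def usage_line(env=None):
    """Return the usage line, letting a wrapper script override the command shown."""
    env = os.environ if env is None else env
    return env.get(USAGE_ENV_VAR, DEFAULT_USAGE)


def handle_usage_error(args, env=None):
    """Return (exit_code, message); the internal --usage-error switch yields a non-zero code."""
    if "--usage-error" in args:
        return 1, usage_line(env)
    return 0, None


# A wrapper script would export the variable before invoking the launcher.
code, msg = handle_usage_error(
    ["--usage-error"], env={USAGE_ENV_VAR: "Usage: ./bin/pyspark [options]"}
)
print(code, msg)  # 1 Usage: ./bin/pyspark [options]
```

Because the override travels through the environment, the wrapper script never has to post-process the launcher's output.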
      
      The only sub-optimal part of the change is the special handling for the
      spark-sql usage, which is now done in SparkSubmitArguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5841 from vanzin/SPARK-6324 and squashes the following commits:
      
      2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324
      bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help.
      c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages.
      6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
    • [STREAMING] Update streaming-kafka-integration.md · 019dc9f5
      Akhil Das authored
      Fixed the broken links (Examples) in the documentation.
      
      Author: Akhil Das <akhld@darktech.ca>
      
      Closes #6666 from akhld/patch-2 and squashes the following commits:
      
      2228b83 [Akhil Das] Update streaming-kafka-integration.md
    • [MINOR] [BUILD] Use custom temp directory during build. · b16b5434
      Marcelo Vanzin authored
      Even with all the efforts to clean up the temp directories created by
      unit tests, Spark leaves a lot of garbage in /tmp after a test run.
      This change overrides java.io.tmpdir to place those files under the
      build directory instead.
      
      After a full sbt unit test run, I was left with more than 400 MB of temp
      files. Since they're now under the build dir, it's much easier to
      clean them up.
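On the JVM this presumably means passing something like `-Djava.io.tmpdir=<build dir>/tmp` to the test JVMs; the exact path is not given here. For comparison, here is a Python sketch of the same idea using the standard `tempfile` module, whose module-level `tempdir` plays the role of `java.io.tmpdir`. The helper name is hypothetical.

```python
import os
import tempfile


def redirect_temp_dir(build_dir):
    """Send Python's temp files to <build_dir>/tmp, the analogue of -Djava.io.tmpdir=..."""
    tmp = os.path.join(build_dir, "tmp")
    os.makedirs(tmp, exist_ok=True)
    # gettempdir(), mkstemp(), NamedTemporaryFile() etc. use this module global once set.
    tempfile.tempdir = tmp
    return tmp
```

Everything written afterwards lands under the build directory, so deleting that directory also removes the leftovers.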
      
      Also make a slight change to a unit test to make it not pollute the
      source directory with test data.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:
      
      31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
      aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.
    • [MINOR] [BUILD] Change link to jenkins builds on github. · da20c8ca
      Marcelo Vanzin authored
      Link to the tail of the console log, instead of the full log. That's
      bound to have the info the user is looking for, and at the same time
      loads way more quickly than the (huge) full log, which is just one click
      away if needed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6664 from vanzin/jenkins-link and squashes the following commits:
      
      ba07ed8 [Marcelo Vanzin] [minor] [build] Change link to jenkins builds on github.
    • [MINOR] remove unused interpolation var in log message · 3a5c4da4
      Sean Owen authored
      Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6650 from srowen/Interpolation and squashes the following commits:
      
      518687a [Sean Owen] Actually interpolate log string
      7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message
    • [DOC][Minor]Specify the common sources available for collecting · 2777ed39
      Yijie Shen authored
      I didn't know which other common sources were available until I searched the source code. It's better to make this clear.
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #6641 from yijieshen/patch-1 and squashes the following commits:
      
      b5b99b4 [Yijie Shen] Make it clear that JvmSource is the only available additional source currently
      f23140c [Yijie Shen] [DOC][Minor]Specify the common sources available for collecting
    • [SPARK-8116][PYSPARK] Allow sc.range() to take a single argument. · e5054605
      Ted Blackman authored
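This commit has no description beyond its title. It likely follows Python's built-in `range`, where a lone argument is the end and the start defaults to 0, so `sc.range(5)` would cover 0 through 4. Under that assumption, the argument handling can be sketched as below. `normalize_range_args` is a hypothetical helper, not PySpark's code.

```python
def normalize_range_args(start, end=None, step=1):
    """Treat a lone argument as the end, as the built-in range() does."""
    if end is None:
        start, end = 0, start
    if step == 0:
        raise ValueError("step must not be zero")
    return start, end, step


print(list(range(*normalize_range_args(5))))         # [0, 1, 2, 3, 4]
print(list(range(*normalize_range_args(2, 10, 3))))  # [2, 5, 8]
```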
      
      Author: Ted Blackman <ted.blackman@gmail.com>
      
      Closes #6656 from belisarius222/branch-1.4 and squashes the following commits:
      
      747cbc2 [Ted Blackman] [SPARK-8116][PYSPARK] Allow sc.range() to take a single argument.
      
      (cherry picked from commit f02af7c8)
      Signed-off-by: Reynold Xin <rxin@databricks.com>
    • [SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ · 8f16b94a
      Reynold Xin authored
      I kept some of the SQL imports in place to avoid changing too many lines.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6661 from rxin/remove-wildcard-import-sqlcontext and squashes the following commits:
      
      c265347 [Reynold Xin] Fixed ListTablesSuite failure.
      de9d491 [Reynold Xin] Fixed tests.
      73b5365 [Reynold Xin] Mima.
      8f6b642 [Reynold Xin] Fixed style violation.
      443f6e8 [Reynold Xin] [SPARK-8113][SQL] Remove some wildcard import on TestSQLContext._
  2. Jun 04, 2015
    • [SPARK-8106] [SQL] Set derby.system.durability=test to speed up Hive compatibility tests · 74dc2a90
      Josh Rosen authored
      Derby has a `derby.system.durability` configuration property that can be used to disable I/O synchronization calls for writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests.
      
      We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change.
      
      See https://db.apache.org/derby/docs/10.8/ref/rrefproperdurability.html for more documentation of this property.
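In build terms, this amounts to passing one more system property to the forked test JVMs. Below is a hypothetical Python helper that renders such properties as `-D` flags. It does not show how Spark's sbt and Maven builds actually wire the property in.

```python
def jvm_system_property_flags(props):
    """Render {name: value} as -Dname=value flags for a forked test JVM."""
    return [f"-D{name}={value}" for name, value in sorted(props.items())]


test_props = {
    # Disables Derby's I/O sync calls: faster but not durable, so only suitable for tests.
    "derby.system.durability": "test",
}
print(jvm_system_property_flags(test_props))  # ['-Dderby.system.durability=test']
```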
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6651 from JoshRosen/hive-compat-suite-speedup and squashes the following commits:
      
      b7a08a2 [Josh Rosen] Set derby.system.durability=test in our unit tests.
    • [SPARK-8098] [WEBUI] Show correct length of bytes on log page · 63bc0c44
      Carson Wang authored
      The log page should show only the requested number of bytes. Currently it shows the bytes from startIndex to the end of the file, and the "Next" button on the page is always disabled.
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #6640 from carsonwang/logpage and squashes the following commits:
      
      58cb3fd [Carson Wang] Show correct length of bytes on log page
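Here is a minimal Python sketch of the intended paging logic. It assumes the page is given a start offset and a page size in bytes. All names are hypothetical, not the Web UI's code.

```python
def log_page_range(start_index, byte_length, file_length):
    """Return (start, end, has_next) for one page of a log file of file_length bytes."""
    start = max(0, min(start_index, file_length))
    # Stop after byte_length bytes instead of running to the end of the file.
    end = min(start + byte_length, file_length)
    # "Next" only makes sense when bytes remain after this page.
    has_next = end < file_length
    return start, end, has_next
```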
    • [SPARK-7440][SQL] Remove physical Distinct operator in favor of Aggregate · 2bcdf8c2
      Reynold Xin authored
      This patch replaces Distinct with Aggregate in the optimizer, so Distinct will become
      more efficient over time as we optimize Aggregate (via Tungsten).
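Conceptually, the rewrite treats `SELECT DISTINCT a, b FROM t` like `SELECT a, b FROM t GROUP BY a, b`: every output column becomes a grouping key, and no aggregate functions are computed. Below is a plain-Python sketch of that equivalence. It is not Catalyst's optimizer rule.

```python
def distinct_via_aggregate(rows):
    """Emulate DISTINCT as an aggregation grouped on every column, with no aggregate functions."""
    groups = {}
    for row in rows:
        key = tuple(row)  # the grouping key is the whole row
        groups.setdefault(key, key)
    return list(groups.values())  # dicts keep insertion order (Python 3.7+)
```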
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6637 from rxin/replace-distinct and squashes the following commits:
      
      b3cc50e [Reynold Xin] Mima excludes.
      93d6117 [Reynold Xin] Code review feedback.
      87e4741 [Reynold Xin] [SPARK-7440][SQL] Remove physical Distinct operator in favor of Aggregate.
    • [SPARK-6909][SQL] Remove Hive Shim code · 0526fea4
      Cheolsoo Park authored
      This is a follow-up on #6393. I am removing the following files in this PR.
      ```
      ./sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala
      ./sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala
      ```
      Basically, I refactored the shim code as follows:
      * Rewrote code directly with Hive 0.13 methods, or
      * Converted code into private methods, or
      * Extracted code into separate classes
      
      For leftover code that didn't fit any of these cases, I created a HiveShim object. For example, helper functions that wrap Hive 0.13 methods to work around Hive bugs are placed there.
      
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #6604 from piaozhexiu/SPARK-6909 and squashes the following commits:
      
      5dccc20 [Cheolsoo Park] Remove hive shim code
    • [SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh · 3dc00528
      Shivaram Venkataraman authored
      This also helps us get rid of the sparkr-docs maven profile, as docs are now built just by using -Psparkr when the roxygen2 package is available.
      
      Related to discussion in #6567
      
      cc pwendell srowen -- Let me know if this looks better
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:
      
      b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
      8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
    • [SPARK-7743] [SQL] Parquet 1.7 · cd3176bd
      Thomas Omans authored
      Resolves [SPARK-7743](https://issues.apache.org/jira/browse/SPARK-7743).
      
      Trivial changes to versions and package names, as well as a fix for a small issue in `ParquetTableOperations.scala`:
      
      ```diff
      -    val readContext = getReadSupport(configuration).init(
      +    val readContext = ParquetInputFormat.getReadSupportInstance(configuration).init(
      ```
      
      This is needed because ParquetInputFormat.getReadSupport was made package-private in the latest release.
      
      Thanks
      -- Thomas Omans
      
      Author: Thomas Omans <tomans@cj.com>
      
      Closes #6597 from eggsby/SPARK-7743 and squashes the following commits:
      
      2df0d1b [Thomas Omans] [SPARK-7743] [SQL] Upgrading parquet version to 1.7.0
    • [SPARK-7969] [SQL] Added a DataFrame.drop function that accepts a Column reference. · df7da07a
      Mike Dusenberry authored
      Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests. It iterates through the `DataFrame` to find a column whose expression is equivalent to that of the `Column` argument supplied to the function.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits:
      
      514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0.
      2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over.
      6bf7c0e [Mike Dusenberry] Minor code formatting change.
      e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name.
      5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names.
      4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference.
      986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests.  Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function.
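Matching on the column's expression rather than its name matters after a join that yields two columns with the same name. Below is a plain-Python model. The `Attr` type and its ids are hypothetical, loosely modeled on the resolved attributes in `logicalPlan.output`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a resolved column: a name plus a unique id."""
    name: str
    expr_id: int


def drop_by_name(output, name):
    # Removes every column with this name, which is ambiguous after a join.
    return [a for a in output if a.name != name]


def drop_by_reference(output, column):
    # Removes only the column whose identity matches the reference.
    return [a for a in output if a != column]


left_id, value, right_id = Attr("id", 1), Attr("value", 2), Attr("id", 3)
joined = [left_id, value, right_id]
print(drop_by_name(joined, "id"))            # [Attr(name='value', expr_id=2)]
print(drop_by_reference(joined, right_id))   # [Attr(name='id', expr_id=1), Attr(name='value', expr_id=2)]
```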
    • [SPARK-7956] [SQL] Use Janino to compile SQL expressions into bytecode · c8709dcf
      Davies Liu authored
      To reduce the overhead of codegen, this PR switches to using Janino to compile SQL expressions into bytecode.
      
      After this, the time needed to compile a SQL expression drops from 100ms to 5ms, which is necessary before codegen can be turned on for general workloads, including tests.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6479 from davies/janino and squashes the following commits:
      
      cc689f5 [Davies Liu] remove globalLock
      262d848 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      eec3a33 [Davies Liu] address comments from Josh
      f37c8c3 [Davies Liu] fix DecimalType and cast to String
      202298b [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      a21e968 [Davies Liu] fix style
      0ed3dc6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      551a851 [Davies Liu] fix tests
      c3bdffa [Davies Liu] remove print
      6089ce5 [Davies Liu] change logging level
      7e46ac3 [Davies Liu] fix style
      d8f0f6c [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      da4926a [Davies Liu] fix tests
      03660f3 [Davies Liu] WIP: use Janino to compile Java source
      f2629cd [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino
      f7d66cf [Davies Liu] use template based string for codegen
    • Fix maxTaskFailures comment · 10ba1880
      Daniel Darabos authored
      If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code support this reading; I think it's just this comment that was off. It's easy to make this mistake, so can you please double-check if I'm correct? Thanks!
      
      Author: Daniel Darabos <darabos.daniel@gmail.com>
      
      Closes #6621 from darabos/patch-2 and squashes the following commits:
      
      dfebdec [Daniel Darabos] Fix comment.
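Below is a Python sketch of the reading described above. It assumes failures are counted per task, and the function name is hypothetical.

```python
def should_abort_task_set(failures_of_task, max_task_failures):
    """True once a single task has failed max_task_failures times."""
    return failures_of_task >= max_task_failures


# maxTaskFailures = 1: the first failure aborts the task set (no retries).
print(should_abort_task_set(1, 1))  # True
# maxTaskFailures = 4: three failures still allow one more attempt.
print(should_abort_task_set(3, 4))  # False
```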
    • MAINTENANCE: Automated closing of pull requests. · 9982d453
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #5976 (close requested by 'JoshRosen')
      Closes #4576 (close requested by 'pwendell')
      Closes #3430 (close requested by 'pwendell')
      Closes #2495 (close requested by 'pwendell')
  3. Jun 03, 2015
    • [BUILD] Fix Maven build for Kinesis · 984ad601
      Andrew Or authored
      A necessary dependency that is transitively referenced is not
      provided, causing compilation failures in builds that provide
      the kinesis-asl profile.
    • [BUILD] Use right branch when checking against Hive · 9cf740f3
      Andrew Or authored
      Right now we always run Hive tests in branch-1.4 PRs because we check whether the diff against master involves Hive changes. We should really be comparing against the target branch itself.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6629 from andrewor14/build-check-hive and squashes the following commits:
      
      450fbbd [Andrew Or] [BUILD] Use right branch when checking against Hive
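Below is a hedged Python sketch of comparing against the target branch rather than `master`. The helper names and the Hive path prefixes are assumptions, not the actual build-script logic.

```python
def changed_files_command(target_branch):
    """git command listing files changed on this branch since it forked from target_branch."""
    # The three-dot form diffs against the merge base, not the tip of target_branch.
    return ["git", "diff", "--name-only", f"{target_branch}...HEAD"]


def touches_hive(changed_files):
    # Hypothetical prefixes; the real check may use a different module list.
    return any(f.startswith(("sql/hive/", "sql/hive-thriftserver/")) for f in changed_files)
```

To use it, run the command with, for example, `subprocess.run(changed_files_command("branch-1.4"), capture_output=True, text=True)`. Split its stdout into lines and pass them to `touches_hive`.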
    • [BUILD] Increase Jenkins test timeout · e35cd36e
      Andrew Or authored
      Currently the Hive tests alone take 40 minutes. The right thing to do is
      to reduce the test time. However, that is a bigger project, and
      we currently have PRs blocked on tests timing out.
    • [SPARK-8084] [SPARKR] Make SparkR scripts fail on error · 0576c3c4
      Shivaram Venkataraman authored
      cc shaneknapp pwendell JoshRosen
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6623 from shivaram/SPARK-8084 and squashes the following commits:
      
      0ec5b26 [Shivaram Venkataraman] Make SparkR scripts fail on error
    • [SPARK-8088] don't attempt to lower number of executors by 0 · 51898b51
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #6624 from ryan-williams/execs and squashes the following commits:
      
      b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0
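This commit has no description beyond its title. As a hedged illustration of the guard it names, here is a Python sketch in which all names are hypothetical. Spark's actual allocation code is not reproduced.

```python
def lower_executor_target(current_target, desired_target, send_request):
    """Ask the cluster manager for fewer executors, skipping no-op requests."""
    to_remove = current_target - desired_target
    if to_remove <= 0:
        # Nothing to lower: avoid sending a "reduce by 0" request.
        return current_target
    send_request(desired_target)
    return desired_target
```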
    • [HOTFIX] History Server API docs error fix. · 566cb594
      Hari Shreedharan authored
      Minor error in the monitoring docs. Also made indentation changes in `ApiRootResource`
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6628 from harishreedharan/eventlog-formatting and squashes the following commits:
      
      a12553d [Hari Shreedharan] Javadoc updates.
      ca399b6 [Hari Shreedharan] [HOTFIX] History Server API docs error fix.
    • [HOTFIX] [TYPO] Fix typo in #6546 · bfbdab12
      Andrew Or authored
    • [SPARK-6164] [ML] CrossValidatorModel should keep stats from fitting · d8662cd9
      leahmcguire authored
      Added stats from cross validation as a val in the cross validation model to save them for user access.
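Below is a Python sketch of what keeping those statistics can look like. The model stores the averaged metric for every candidate parameter setting next to the index of the best one. The class and field names are hypothetical and are not the `CrossValidatorModel` API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CrossValidatorModelSketch:
    best_index: int
    avg_metrics: list  # one averaged metric per candidate parameter setting, kept for the user


def fit_cross_validator(fold_metrics, larger_is_better=True):
    """fold_metrics[i][k]: metric of parameter setting i on fold k."""
    avg = [mean(per_fold) for per_fold in fold_metrics]
    pick = max if larger_is_better else min
    best = pick(range(len(avg)), key=avg.__getitem__)
    return CrossValidatorModelSketch(best, avg)
```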
      
      Author: leahmcguire <lmcguire@salesforce.com>
      
      Closes #5915 from leahmcguire/saveCVmetrics and squashes the following commits:
      
      49b507b [leahmcguire] fixed style error
      67537b1 [leahmcguire] rebased
      85907f0 [leahmcguire] fixed name
      59987cc [leahmcguire] changed param name and test according to comments
      36e71e3 [leahmcguire] rebasing
      4b8223e [leahmcguire] fixed name
      4ddffc6 [leahmcguire] changed param name and test according to comments
      3a995da [leahmcguire] Added stats from cross validation as a val in the cross validation model to save them for user access
    • [SPARK-8051] [MLLIB] make StringIndexerModel silent if input column does not exist · 26c9d7a0
      Xiangrui Meng authored
      This is just a workaround for a bigger problem. Some pipeline stages may not be needed during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6595 from mengxr/SPARK-8051 and squashes the following commits:
      
      b6a36b9 [Xiangrui Meng] add doc
      f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051
      8ee7c7e [Xiangrui Meng] use SparkFunSuite
      e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist
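Below is a Python sketch of the "stay silent" behavior on a toy row format. It assumes a schema given as a list of column names. The function and its signature are hypothetical, not the `StringIndexerModel` API.

```python
def string_indexer_transform(schema, rows, input_col, output_col, labels):
    """Index input_col into output_col; pass data through unchanged if input_col is absent."""
    if input_col not in schema:
        # The stage is not needed here (e.g. no label column at prediction time): stay silent.
        return schema, rows
    index = {label: float(i) for i, label in enumerate(labels)}
    # Unseen labels raise KeyError here; handling them is out of scope for this sketch.
    indexed = [dict(row, **{output_col: index[row[input_col]]}) for row in rows]
    return schema + [output_col], indexed
```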
    • [SPARK-3674] [EC2] Clear SPARK_WORKER_INSTANCES when using YARN · d3e026f8
      Shivaram Venkataraman authored
      cc andrewor14
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6424 from shivaram/spark-worker-instances-yarn-ec2 and squashes the following commits:
      
      db244ae [Shivaram Venkataraman] Make Python Lint happy
      0593d1b [Shivaram Venkataraman] Clear SPARK_WORKER_INSTANCES when using YARN
    • [HOTFIX] Fix Hadoop-1 build caused by #5792. · a8f1f154
      Hari Shreedharan authored
      Replaced `fs.listFiles` with Hadoop-1 friendly `fs.listStatus` method.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6619 from harishreedharan/evetlog-hadoop-1-fix and squashes the following commits:
      
      6192078 [Hari Shreedharan] [HOTFIX] Fix Hadoop-1 build caused by #5792.
    • [SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite · f2713478
      zsxwing authored
      The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs.
      
      This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up.
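`JobProgressListener.waitUntilExecutorsUp` is the name given in the PR, but its signature is not shown here. Below is a generic Python sketch of the same wait-with-timeout pattern. It assumes a callable that reports how many executors are currently up.

```python
import time


def wait_until_executors_up(count_executors, expected, timeout_s, poll_s=0.01):
    """Poll until count_executors() >= expected, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        up = count_executors()
        if up >= expected:
            return up
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {up} of {expected} executors came up in {timeout_s}s")
        time.sleep(poll_s)
```

A test would call this before submitting its job, so the job no longer races executor startup.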
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits:
      
      5560e09 [zsxwing] Fix a typo
      3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite
    • [SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout · 1d8669f1
      zsxwing authored
      Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.
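Below is a Python sketch of this design choice. A timeout becomes an exception that callers cannot ignore, instead of a `False` they might forget to check. The toy queue is hypothetical and uses Python's built-in `TimeoutError` in place of Java's `TimeoutException`.

```python
import threading
import time
from collections import deque


class ListenerBusSketch:
    """Toy event queue whose wait_until_empty raises instead of returning False."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def post(self, event):
        with self._cond:
            self._events.append(event)

    def process_one(self):
        """Called by a consumer thread: drop one event and wake any waiters."""
        with self._cond:
            if self._events:
                self._events.popleft()
            self._cond.notify_all()

    def wait_until_empty(self, timeout_s):
        deadline = time.monotonic() + timeout_s
        with self._cond:
            while self._events:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # Unlike a boolean result, this cannot be silently ignored.
                    raise TimeoutError(f"{len(self._events)} events still queued after {timeout_s}s")
                self._cond.wait(remaining)
```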
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:
      
      607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
    • [SPARK-8059] [YARN] Wake up allocation thread when new requests arrive. · aa40c442
      Marcelo Vanzin authored
      This should help reduce latency for new executor allocations.
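Below is a minimal Python sketch of the wake-up pattern. It assumes an allocation loop that otherwise sleeps for a fixed heartbeat interval. `threading.Event` stands in for whatever signaling the YARN allocator actually uses, and the names are hypothetical.

```python
import threading


class AllocatorSketch:
    """Allocation loop that sleeps between heartbeats but wakes early on new requests."""

    def __init__(self, heartbeat_s):
        self.heartbeat_s = heartbeat_s
        self._wakeup = threading.Event()
        self._lock = threading.Lock()
        self.pending = 0

    def request_executors(self, n):
        with self._lock:
            self.pending += n
        self._wakeup.set()  # interrupt the current sleep instead of waiting out the interval

    def wait_for_next_round(self):
        """Return True if woken by a request, False if the heartbeat interval elapsed."""
        woken = self._wakeup.wait(self.heartbeat_s)
        self._wakeup.clear()
        return woken
```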
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #6600 from vanzin/SPARK-8059 and squashes the following commits:
      
      8387a3a [Marcelo Vanzin] [SPARK-8059] [yarn] Wake up allocation thread when new requests arrive.
    • [SPARK-8083] [MESOS] Use the correct base path in mesos driver page. · bfbf12b3
      Timothy Chen authored
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:
      
      4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.
    • [MINOR] [UI] Improve confusing message on log page · c6a6dd0d
      Andrew Or authored
      It's good practice to check if the input path is in the directory
      we expect to avoid potentially confusing error messages.
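Below is a minimal Python sketch of such a check. It assumes the page receives a path relative to an expected log directory, and the function name is hypothetical.

```python
import os


def is_inside_dir(base_dir, requested_path):
    """True if requested_path, resolved against base_dir, stays inside base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." and symlinks, so "../../etc/passwd" cannot slip through.
    target = os.path.realpath(os.path.join(base, requested_path))
    return os.path.commonpath([base, target]) == base
```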
    • [SPARK-8054] [MLLIB] Added several Java-friendly APIs + unit tests · 20a26b59
      Joseph K. Bradley authored
      Java-friendly APIs added:
      * GaussianMixture.run()
      * GaussianMixtureModel.predict()
      * DistributedLDAModel.javaTopicDistributions()
      * StreamingKMeans: trainOn, predictOn, predictOnValues
      * Statistics.corr
      * params
        * added doc to w() since Java docs do not inherit doc
        * removed non-Java-friendly w() from StringArrayParam and DoubleArrayParam
        * made DoubleArrayParam Java-friendly w() actually Java-friendly
      
      I generated the doc and verified all changes.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #6562 from jkbradley/java-api-1.4 and squashes the following commits:
      
      c16821b [Joseph K. Bradley] Small fixes based on code review.
      d955581 [Joseph K. Bradley] unit test fixes
      29b6b0d [Joseph K. Bradley] small fixes
      fe6dcfe [Joseph K. Bradley] Added several Java-friendly APIs + unit tests: NaiveBayes, GaussianMixture, LDA, StreamingKMeans, Statistics.corr, params
    • [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures. · 939e4f3d
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6608 from rxin/parquet-analysis and squashes the following commits:
      
      b5dc8e2 [Reynold Xin] Code review feedback.
      5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.
    • [SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable and --master command line option. · 708c63bb
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #6605 from sun-rui/SPARK-8063 and squashes the following commits:
      
      51ca48b [Sun Rui] [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.
    • [SPARK-7161] [HISTORY SERVER] Provide REST api to download event logs from History Server · d2a86eb8
      Hari Shreedharan authored
      This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.
      
      This also adds a method to ApplicationHistoryProvider that returns the raw files, zipped.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #5792 from harishreedharan/eventlog-download and squashes the following commits:
      
      221cc26 [Hari Shreedharan] Update docs with new API information.
      a131be6 [Hari Shreedharan] Fix style issues.
      5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
      6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
      d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
      ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
      1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by refactoring interfaces.
      5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
      0b66948 [Hari Shreedharan] Minor formatting/import fixes.
      4fc518c [Hari Shreedharan] Fix rat failures.
      a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
      0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
      350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
      fd6ab00 [Hari Shreedharan] Fix style issues
      32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
      7b362b2 [Hari Shreedharan] Almost working.
      3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
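On the server side, the core of such an endpoint is writing one or more event log files into a zip stream. Below is a minimal Python sketch using the standard `zipfile` module. The History Server itself is JVM code, and the function and entry names here are hypothetical.

```python
import io
import zipfile


def zip_event_logs(logs):
    """Pack {entry_name: bytes} into an in-memory zip archive and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in logs.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

Downloading a single attempt versus the whole application then only changes which entries go into `logs`.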