  1. Sep 07, 2017
    • Dongjoon Hyun's avatar
      [SPARK-21939][TEST] Use TimeLimits instead of Timeouts · c26976fe
      Dongjoon Hyun authored
      Since ScalaTest 3.0.0, `org.scalatest.concurrent.Timeouts` is deprecated.
      This PR replaces the deprecated one with `org.scalatest.concurrent.TimeLimits`.
      
      ```scala
      -import org.scalatest.concurrent.Timeouts._
      +import org.scalatest.concurrent.TimeLimits._
      ```
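For context, a minimal sketch (suite name hypothetical, not from this patch) of how the replacement trait is used in ScalaTest 3.0, where `failAfter` takes an implicit `Signaler`:

```scala
import org.scalatest.FunSuite
import org.scalatest.concurrent.{Signaler, ThreadSignaler, TimeLimits}
import org.scalatest.time.SpanSugar._

class ExampleTimeoutSuite extends FunSuite with TimeLimits {
  // ScalaTest 3.0 requires an implicit Signaler for failAfter/cancelAfter.
  implicit val signaler: Signaler = ThreadSignaler

  test("finishes within the time limit") {
    failAfter(10.seconds) {
      // body under test
    }
  }
}
```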
      
      Pass the existing test suites.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #19150 from dongjoon-hyun/SPARK-21939.
      
      Change-Id: I1a1b07f1b97e51e2263dfb34b7eaaa099b2ded5e
      c26976fe
    • Sanket Chintapalli's avatar
      [SPARK-21890] Credentials not being passed to add the tokens · b9ab791a
      Sanket Chintapalli authored
I observed this while running an Oozie job that connects to HBase via Spark.
It looks like the creds are not being passed in https://github.com/apache/spark/blob/branch-2.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/HadoopFSCredentialProvider.scala#L53 for the 2.2 release.
More info as to why it fails on a secure grid:
The Oozie client gets the necessary tokens the application needs before launching. It passes those tokens along to the Oozie launcher job (an MR job), which then actually calls the Spark client to launch the Spark app and passes the tokens along.
The Oozie launcher job cannot get any more tokens because all it has is tokens (you can't get tokens with tokens; you need a TGT or keytab).
The error here is because the launcher job runs the Spark client to submit the Spark job, but the Spark client doesn't see that it already has the HDFS tokens, so it tries to get more, which ends with the exception.
There was a change in SPARK-19021 to generalize the HDFS credentials provider; it changed the code so that we don't pass the existing credentials into the call to get tokens, so it doesn't realize it already has the necessary tokens.
      
      https://issues.apache.org/jira/browse/SPARK-21890
      Modified to pass creds to get delegation tokens
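As a rough illustration of the idea (function and parameter names here are illustrative, not the actual provider code), the existing `Credentials` object is passed into the delegation-token fetch so Hadoop can see tokens that are already present:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.Credentials

// Passing the existing Credentials lets the FileSystem see tokens the launcher
// already obtained instead of blindly trying to fetch new ones.
def obtainTokens(paths: Set[Path], hadoopConf: Configuration,
                 creds: Credentials, renewer: String): Unit = {
  paths.foreach { path =>
    val fs = path.getFileSystem(hadoopConf)
    fs.addDelegationTokens(renewer, creds)
  }
}
```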
      
      Author: Sanket Chintapalli <schintap@yahoo-inc.com>
      
      Closes #19140 from redsanket/SPARK-21890-master.
      b9ab791a
  2. Sep 04, 2017
    • Sean Owen's avatar
      [SPARK-21418][SQL] NoSuchElementException: None.get in DataSourceScanExec with... · ca59445a
      Sean Owen authored
      [SPARK-21418][SQL] NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true
      
      ## What changes were proposed in this pull request?
      
      If no SparkConf is available to Utils.redact, simply don't redact.
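A minimal sketch of the described behavior (not the actual `Utils.redact` code; the matching logic here is simplified): only redact when a SparkConf is available to supply the redaction pattern.

```scala
import org.apache.spark.SparkConf

def redactIfPossible(conf: Option[SparkConf],
                     kvs: Seq[(String, String)]): Seq[(String, String)] = {
  conf match {
    case Some(c) =>
      val pattern = c.get("spark.redaction.regex", "(?i)secret|password").r
      kvs.map { case (k, v) =>
        if (pattern.findFirstIn(k).isDefined) (k, "*********(redacted)") else (k, v)
      }
    case None => kvs // no SparkConf available: simply don't redact
  }
}
```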
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #19123 from srowen/SPARK-21418.
      ca59445a
  3. Sep 01, 2017
    • Sean Owen's avatar
      [SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala... · 12ab7f7e
      Sean Owen authored
      [SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation
      
      …build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure
      
      ## What changes were proposed in this pull request?
      
      This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.
      
      In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.
      
      It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release.
      
      - Scalatest 2.x -> 3.0.3
      - Chill 0.8.0 -> 0.8.4
      - Clapper 1.0.x -> 1.1.2
      - json4s 3.2.x -> 3.4.2
      - Jackson 2.6.x -> 2.7.9 (required by json4s)
      
      This change does _not_ fully enable a Scala 2.12 build:
      
      - It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here
      - It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too.
      
      What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.
      
      ## How was this patch tested?
      
      Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #18645 from srowen/SPARK-14280.
      12ab7f7e
    • Marcelo Vanzin's avatar
      [SPARK-21728][CORE] Follow up: fix user config, auth in SparkSubmit logging. · 0bdbefe9
      Marcelo Vanzin authored
      - SecurityManager complains when auth is enabled but no secret is defined;
        SparkSubmit doesn't use the auth functionality of the SecurityManager,
        so use a dummy secret to work around the exception.
      
      - Only reset the log4j configuration when Spark was the one initializing
        it, otherwise user-defined log configuration may be lost.
      
      Tested with the log config file posted to the bug, on a secured YARN cluster.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #19089 from vanzin/SPARK-21728.
      0bdbefe9
  4. Aug 30, 2017
    • Liang-Chi Hsieh's avatar
      [SPARK-21534][SQL][PYSPARK] PickleException when creating dataframe from... · ecf437a6
      Liang-Chi Hsieh authored
      [SPARK-21534][SQL][PYSPARK] PickleException when creating dataframe from python row with empty bytearray
      
      ## What changes were proposed in this pull request?
      
      `PickleException` is thrown when creating dataframe from python row with empty bytearray
      
          spark.createDataFrame(spark.sql("select unhex('') as xx").rdd.map(lambda x: {"abc": x.xx})).show()
      
          net.razorvine.pickle.PickleException: invalid pickle data for bytearray; expected 1 or 2 args, got 0
          	at net.razorvine.pickle.objects.ByteArrayConstructor.construct(ByteArrayConstructor.java
              ...
      
`ByteArrayConstructor` doesn't handle an empty byte array pickled by Python 3.
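A sketch of the kind of fix described (illustrative, not the exact patch): tolerate a zero-argument bytearray pickle by returning an empty byte array instead of throwing.

```scala
import net.razorvine.pickle.objects.ByteArrayConstructor

class EmptyBytearrayTolerantConstructor extends ByteArrayConstructor {
  override def construct(args: Array[AnyRef]): AnyRef = {
    // Python 3 pickles an empty bytearray with zero constructor args.
    if (args.length == 0) Array.emptyByteArray else super.construct(args)
  }
}
```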
      
      ## How was this patch tested?
      
      Added test.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #19085 from viirya/SPARK-21534.
      ecf437a6
    • Xiaofeng Lin's avatar
      [SPARK-11574][CORE] Add metrics StatsD sink · cd5d0f33
      Xiaofeng Lin authored
This patch adds a StatsD sink to the current metrics system in Spark core.
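A hypothetical `metrics.properties` snippet enabling the sink; the class and property names are assumed for illustration, so check the sink's documentation for the exact keys:

```
*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
*.sink.statsd.host=127.0.0.1
*.sink.statsd.port=8125
*.sink.statsd.period=10
*.sink.statsd.unit=seconds
```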
      
      Author: Xiaofeng Lin <xlin@twilio.com>
      
      Closes #9518 from xflin/statsd.
      
      Change-Id: Ib8720e86223d4a650df53f51ceb963cd95b49a44
      cd5d0f33
    • Andrew Ash's avatar
      [SPARK-21875][BUILD] Fix Java style bugs · 313c6ca4
      Andrew Ash authored
      ## What changes were proposed in this pull request?
      
      Fix Java code style so `./dev/lint-java` succeeds
      
      ## How was this patch tested?
      
      Run `./dev/lint-java`
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #19088 from ash211/spark-21875-lint-java.
      313c6ca4
    • Sital Kedia's avatar
      [SPARK-21834] Incorrect executor request in case of dynamic allocation · 6949a9c5
      Sital Kedia authored
      ## What changes were proposed in this pull request?
      
The killExecutor API currently does not allow killing an executor without updating the total number of executors needed. When dynamic allocation is turned on and the allocator tries to kill an executor, the scheduler reduces the total number of executors needed (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635), which is incorrect because the allocator already takes care of setting the required number of executors itself.
      
      ## How was this patch tested?
      
      Ran a job on the cluster and made sure the executor request is correct
      
      Author: Sital Kedia <skedia@fb.com>
      
      Closes #19081 from sitalkedia/skedia/oss_fix_executor_allocation.
      6949a9c5
    • hyukjinkwon's avatar
      [SPARK-21764][TESTS] Fix tests failures on Windows: resources not being closed and incorrect paths · b30a11a6
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      `org.apache.spark.deploy.RPackageUtilsSuite`
      
      ```
       - jars without manifest return false *** FAILED *** (109 milliseconds)
         java.io.IOException: Unable to delete file: C:\projects\spark\target\tmp\1500266936418-0\dep1-c.jar
      ```
      
      `org.apache.spark.deploy.SparkSubmitSuite`
      
      ```
       - download one file to local *** FAILED *** (16 milliseconds)
         java.net.URISyntaxException: Illegal character in authority at index 6: s3a://C:\projects\spark\target\tmp\test2630198944759847458.jar
      
       - download list of files to local *** FAILED *** (0 milliseconds)
         java.net.URISyntaxException: Illegal character in authority at index 6: s3a://C:\projects\spark\target\tmp\test2783551769392880031.jar
      ```
      
      `org.apache.spark.scheduler.ReplayListenerSuite`
      
      ```
       - Replay compressed inprogress log file succeeding on partial read (156 milliseconds)
         Exception encountered when attempting to run a suite with class name:
         org.apache.spark.scheduler.ReplayListenerSuite *** ABORTED *** (1 second, 391 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-8f3cacd6-faad-4121-b901-ba1bba8025a0
      
       - End-to-end replay *** FAILED *** (62 milliseconds)
         java.io.IOException: No FileSystem for scheme: C
      
       - End-to-end replay with compression *** FAILED *** (110 milliseconds)
         java.io.IOException: No FileSystem for scheme: C
      ```
      
      `org.apache.spark.sql.hive.StatisticsSuite`
      
      ```
       - SPARK-21079 - analyze table with location different than that of individual partitions *** FAILED *** (875 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - SPARK-21079 - analyze partitioned table with only a subset of partitions visible *** FAILED *** (47 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      ```
      
      **Note:** this PR does not fix:
      
      `org.apache.spark.deploy.SparkSubmitSuite`
      
      ```
       - launch simple application with spark-submit with redaction *** FAILED *** (172 milliseconds)
         java.util.NoSuchElementException: next on empty iterator
      ```
      
I can't reproduce this on my Windows machine, but it looks like it consistently fails on AppVeyor. This one is unclear to me yet and hard to debug, so I did not include it for now.

**Note:** it looks like there are more instances, but they are hard to identify, partly due to flakiness and partly due to swarming logs and errors. Will probably go over them one more time if this one is fine.
      
      ## How was this patch tested?
      
      Manually via AppVeyor:
      
      **Before**
      
      - `org.apache.spark.deploy.RPackageUtilsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/771-windows-fix/job/8t8ra3lrljuir7q4
      - `org.apache.spark.deploy.SparkSubmitSuite`: https://ci.appveyor.com/project/spark-test/spark/build/771-windows-fix/job/taquy84yudjjen64
      - `org.apache.spark.scheduler.ReplayListenerSuite`: https://ci.appveyor.com/project/spark-test/spark/build/771-windows-fix/job/24omrfn2k0xfa9xq
      - `org.apache.spark.sql.hive.StatisticsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/771-windows-fix/job/2079y1plgj76dc9l
      
      **After**
      
      - `org.apache.spark.deploy.RPackageUtilsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/775-windows-fix/job/3803dbfn89ne1164
      - `org.apache.spark.deploy.SparkSubmitSuite`: https://ci.appveyor.com/project/spark-test/spark/build/775-windows-fix/job/m5l350dp7u9a4xjr
      - `org.apache.spark.scheduler.ReplayListenerSuite`: https://ci.appveyor.com/project/spark-test/spark/build/775-windows-fix/job/565vf74pp6bfdk18
      - `org.apache.spark.sql.hive.StatisticsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/775-windows-fix/job/qm78tsk8c37jb6s4
      
      Jenkins tests are required and AppVeyor tests will be triggered.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18971 from HyukjinKwon/windows-fixes.
      b30a11a6
    • liuxian's avatar
[MINOR][TEST] Off-heap memory leaks for unit tests · d4895c9d
      liuxian authored
      ## What changes were proposed in this pull request?
Free off-heap memory.
      I have checked all the unit tests.
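A sketch of the pattern being enforced (illustrative; the allocation size is arbitrary): any off-heap page allocated in a test must be freed in the same test, or the native memory leaks.

```scala
import org.apache.spark.unsafe.memory.MemoryAllocator

val page = MemoryAllocator.UNSAFE.allocate(1024)
try {
  // ... exercise the code that uses the off-heap page ...
} finally {
  // Without this, the native allocation outlives the test.
  MemoryAllocator.UNSAFE.free(page)
}
```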
      
      ## How was this patch tested?
      N/A
      
      Author: liuxian <liu.xian3@zte.com.cn>
      
      Closes #19075 from 10110346/memleak.
      d4895c9d
  5. Aug 29, 2017
    • Steve Loughran's avatar
      [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to handle FileOutputCommitter.getWorkPath==null · e47f48c7
      Steve Loughran authored
      ## What changes were proposed in this pull request?
      
      Handles the situation where a `FileOutputCommitter.getWorkPath()` returns `null` by downgrading to the supplied `path` argument.
      
The existing code does an `Option(workPath.toString).getOrElse(path)`, which triggers an NPE in the `toString()` operation if workPath == null. The code was apparently meant to handle this (hence the `getOrElse()` clause), but since the NPE has already occurred at that point, the else-clause never gets invoked.
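A sketch of the described fix (not the exact patch): wrap the `Path` itself in `Option`, so a null work path falls back to the supplied `path` without an NPE.

```scala
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

def committerOutputPath(committer: FileOutputCommitter, path: String): String =
  // Option over the Path, not over its toString, so null never reaches toString.
  Option(committer.getWorkPath).map(_.toString).getOrElse(path)
```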
      
      ## How was this patch tested?
      
      Manually, with some later code review.
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #18111 from steveloughran/cloud/SPARK-20886-committer-NPE.
      e47f48c7
    • he.qiao's avatar
      [SPARK-21813][CORE] Modify TaskMemoryManager.MAXIMUM_PAGE_SIZE_BYTES comments · fba9cc84
      he.qiao authored
      ## What changes were proposed in this pull request?
The comment on the variable `TaskMemoryManager.MAXIMUM_PAGE_SIZE_BYTES` is wrong: the bound shouldn't be described as 2^32-1 but as 2^31-1, i.e. the maximum value of an int.
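For reference, a quick check of the corrected bound:

```scala
// 2^31 - 1 is the maximum value of a signed 32-bit int.
val maxIntBound: Long = (1L << 31) - 1 // 2147483647
assert(maxIntBound == Int.MaxValue)
```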
      
      ## How was this patch tested?
      Existing test cases
      
      Author: he.qiao <he.qiao17@zte.com.cn>
      
      Closes #19025 from Geek-He/08_23_comments.
      fba9cc84
    • Marcelo Vanzin's avatar
      [SPARK-21728][CORE] Allow SparkSubmit to use Logging. · d7b1fcf8
      Marcelo Vanzin authored
      This change initializes logging when SparkSubmit runs, using
      a configuration that should avoid printing log messages as
      much as possible with most configurations, and adds code to
      restore the Spark logging system to as close as possible to
      its initial state, so the Spark app being run can re-initialize
      logging with its own configuration.
      
      With that feature, some duplicate code in SparkSubmit can now
      be replaced with the existing methods in the Utils class, which
      could not be used before because they initialized logging. As part
      of that I also did some minor refactoring, moving methods that
      should really belong in DependencyUtils.
      
      The change also shuffles some code in SparkHadoopUtil so that
      SparkSubmit can create a Hadoop config like the rest of Spark
      code, respecting the user's Spark configuration.
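A sketch of that idea (if `SparkHadoopUtil` is not reachable from user code in your version, treat this as pseudocode): build the Hadoop `Configuration` from the user's `SparkConf` the same way the rest of Spark does, so `spark.hadoop.*` settings are respected.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.deploy.SparkHadoopUtil

val sparkConf = new SparkConf()
// Applies spark.hadoop.* keys from the SparkConf onto the Hadoop Configuration.
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
```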
      
      The behavior was verified running spark-shell, pyspark and
      normal applications, then verifying the logging behavior,
      with and without dependency downloads.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #19013 from vanzin/SPARK-21728.
      d7b1fcf8
  6. Aug 28, 2017
    • erenavsarogullari's avatar
      [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit Test coverage for different build cases · 73e64f7d
      erenavsarogullari authored
      ## What changes were proposed in this pull request?
      Fair Scheduler can be built via one of the following options:
      - By setting a `spark.scheduler.allocation.file` property,
      - By setting `fairscheduler.xml` into classpath.
      
These options are checked **in order**, and the fair scheduler is built from the first option found. If an invalid path is given, a `FileNotFoundException` is expected.
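A sketch of the two ways to supply the allocation file (the file path is illustrative; option 1 wins if both are present):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.scheduler.mode", "FAIR")
  // Option 1: explicit path, checked first.
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
// Option 2: omit the property and place a file named fairscheduler.xml on the classpath.
```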
      
This PR adds unit test coverage for these use cases, and a minor documentation change has been added for the second option (`fairscheduler.xml` on the classpath) to inform users.
      
Also, this PR is related to #16813 and has been created separately to keep the patch content isolated and to help the reviewers.
      
      ## How was this patch tested?
      Added new Unit Tests.
      
      Author: erenavsarogullari <erenavsarogullari@gmail.com>
      
      Closes #16992 from erenavsarogullari/SPARK-19662.
      73e64f7d
  7. Aug 25, 2017
    • jerryshao's avatar
      [SPARK-21714][CORE][YARN] Avoiding re-uploading remote resources in yarn client mode · 1813c4a8
      jerryshao authored
      ## What changes were proposed in this pull request?
      
With SPARK-10643, Spark supports downloading resources from remote storage in client deploy mode. But the implementation overrides the variables representing added resources (like `args.jars`, `args.pyFiles`) with local paths, and the YARN client uses these local paths to re-upload the resources to the distributed cache. This is unnecessary and breaks the semantics of putting resources in a shared FS, so this PR fixes it.
      
      ## How was this patch tested?
      
      This is manually verified with jars, pyFiles in local and remote storage, both in client and cluster mode.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #18962 from jerryshao/SPARK-21714.
      1813c4a8
    • Sean Owen's avatar
      [MINOR][BUILD] Fix build warnings and Java lint errors · de7af295
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Fix build warnings and Java lint errors. This just helps a bit in evaluating (new) warnings in another PR I have open.
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #19051 from srowen/JavaWarnings.
      de7af295
    • zhoukang's avatar
      [SPARK-21527][CORE] Use buffer limit in order to use JAVA NIO Util's buffercache · 574ef6c9
      zhoukang authored
      ## What changes were proposed in this pull request?
      
Right now, `ChunkedByteBuffer#writeFully` does not slice the bytes first. Observe the code in Java NIO's `Util#getTemporaryDirectBuffer` below:
      
              BufferCache cache = bufferCache.get();
              ByteBuffer buf = cache.get(size);
              if (buf != null) {
                  return buf;
              } else {
                  // No suitable buffer in the cache so we need to allocate a new
                  // one. To avoid the cache growing then we remove the first
                  // buffer from the cache and free it.
                  if (!cache.isEmpty()) {
                      buf = cache.removeFirst();
                      free(buf);
                  }
                  return ByteBuffer.allocateDirect(size);
              }
      
If we slice first with a fixed size, we can use the buffer cache and only need to allocate on the first write call.
Since we otherwise allocate a new buffer, we cannot control when that buffer is freed; this once caused a memory issue in our production cluster.
In this patch, I supply a new API which slices with a fixed size for buffer writing.
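A sketch of the slicing idea (names and the slice size are illustrative, not the actual `ChunkedByteBuffer` code): write a large `ByteBuffer` in fixed-size slices so NIO's temporary direct-buffer cache can be reused instead of allocating one huge direct buffer per write.

```scala
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

def writeInSlices(channel: WritableByteChannel, bytes: ByteBuffer,
                  sliceSize: Int = 64 * 1024): Unit = {
  while (bytes.hasRemaining) {
    val length = math.min(bytes.remaining(), sliceSize)
    // Duplicate shares the underlying data but has its own position/limit.
    val slice = bytes.duplicate()
    slice.limit(slice.position() + length)
    while (slice.hasRemaining) channel.write(slice)
    // Advance the original buffer past the slice we just wrote.
    bytes.position(bytes.position() + length)
  }
}
```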
      
      ## How was this patch tested?
      
      Unit test and test in production.
      
      Author: zhoukang <zhoukang199191@gmail.com>
      Author: zhoukang <zhoukang@xiaomi.com>
      
      Closes #18730 from caneGuy/zhoukang/improve-chunkwrite.
      574ef6c9
  8. Aug 23, 2017
    • Sanket Chintapalli's avatar
      [SPARK-21501] Change CacheLoader to limit entries based on memory footprint · 1662e931
      Sanket Chintapalli authored
Right now the Spark shuffle service has a cache for index files. It is based on the number of files cached (`spark.shuffle.service.index.cache.entries`). This can cause issues if people have a lot of reducers, because the size of each entry can fluctuate based on the number of reducers.
We saw an issue with a job that had 170000 reducers; it caused the NM with the Spark shuffle service to use 700-800MB of memory by itself.
We should change this cache to be memory-based and only allow a certain memory size to be used. When I say memory-based I mean the cache should have a limit of, say, 100MB.
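A sketch of a weight-bounded Guava cache, which is the approach described above (names are illustrative, not the actual shuffle-service code): entries are weighed by their size in bytes and evicted once the total weight exceeds the budget.

```scala
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache, Weigher}

case class ShuffleIndexInfo(sizeInBytes: Int)

val maxWeightBytes = 100L * 1024 * 1024 // ~100MB budget instead of an entry count

val indexCache: LoadingCache[String, ShuffleIndexInfo] =
  CacheBuilder.newBuilder()
    .maximumWeight(maxWeightBytes)
    .weigher(new Weigher[String, ShuffleIndexInfo] {
      override def weigh(file: String, info: ShuffleIndexInfo): Int = info.sizeInBytes
    })
    .build(new CacheLoader[String, ShuffleIndexInfo] {
      override def load(file: String): ShuffleIndexInfo =
        ShuffleIndexInfo(1024) // stand-in: load and measure the real index file here
    })
```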
      
      https://issues.apache.org/jira/browse/SPARK-21501
      
Manual testing with 170000 reducers has been performed with the cache loaded up to the 100MB default limit, with each shuffle index file of size 1.3MB. Eviction takes place as soon as the total cache size reaches the 100MB limit, and the evicted objects become eligible for garbage collection, thereby preventing the NM from crashing. No notable difference in runtime has been observed.
      
      Author: Sanket Chintapalli <schintap@yahoo-inc.com>
      
      Closes #18940 from redsanket/SPARK-21501.
      1662e931
  9. Aug 22, 2017
    • Jane Wang's avatar
      [SPARK-19326] Speculated task attempts do not get launched in few scenarios · d58a3507
      Jane Wang authored
      ## What changes were proposed in this pull request?
      
Add a new listener event when a speculative task is created and notify ExecutorAllocationManager about it so it can request more executors.
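A sketch of consuming the new event (event and callback names as introduced by this change; the handler body is illustrative): ExecutorAllocationManager listens for it and bumps its executor target.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerSpeculativeTaskSubmitted}

class SpeculationAwareListener extends SparkListener {
  override def onSpeculativeTaskSubmitted(
      event: SparkListenerSpeculativeTaskSubmitted): Unit = {
    // In Spark itself, this signal is used to account for one more pending task.
    println(s"Speculative task submitted for stage ${event.stageId}")
  }
}
```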
      
      ## How was this patch tested?
      
      - Added Unittests.
      - For the test snippet in the jira:
```scala
val n = 100
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index == 1) {
    Thread.sleep(Long.MaxValue) // fake long running task(s)
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
```
With this code change, Spark indicates 101 tasks (99 succeeded, 2 running, one of which is the speculative attempt).
      
      Author: Jane Wang <janewang@fb.com>
      
      Closes #18492 from janewangfb/speculated_task_not_launched.
      d58a3507
    • jerryshao's avatar
[SPARK-20641][CORE] Add missing kvstore module in Launcher and SparkSubmit code · 3ed1ae10
      jerryshao authored
There are two places, in Launcher and SparkSubmit, that explicitly list all the Spark submodules; the newly added kvstore module is missing from both, so this minor PR adds it.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #19014 from jerryshao/missing-kvstore.
      3ed1ae10
  10. Aug 21, 2017
    • Sergey Serebryakov's avatar
      [SPARK-21782][CORE] Repartition creates skews when numPartitions is a power of 2 · 77d046ec
      Sergey Serebryakov authored
      ## Problem
When an RDD (particularly one with a low item-per-partition ratio) is repartitioned to numPartitions = a power of 2, the resulting partitions are very uneven in size, due to using a fixed seed to initialize the PRNG and using the PRNG only once. See details in https://issues.apache.org/jira/browse/SPARK-21782
      
      ## What changes were proposed in this pull request?
      Instead of directly using `0, 1, 2,...` seeds to initialize `Random`, hash them with `scala.util.hashing.byteswap32()`.
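A sketch of the change (helper name illustrative): hash the partition index before seeding, so consecutive indices do not produce correlated PRNG streams.

```scala
import scala.util.Random
import scala.util.hashing.byteswap32

// byteswap32 decorrelates nearby integers before they are used as seeds.
def positionGenerator(partitionIndex: Int): Random =
  new Random(byteswap32(partitionIndex))
```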
      
      ## How was this patch tested?
      `build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.rdd.RDDSuite test`
      
      Author: Sergey Serebryakov <sserebryakov@tesla.com>
      
      Closes #18990 from megaserg/repartition-skew.
      77d046ec
  11. Aug 17, 2017
    • ArtRand's avatar
      [SPARK-16742] Mesos Kerberos Support · bfdc361e
      ArtRand authored
      ## What changes were proposed in this pull request?
      
      Add Kerberos Support to Mesos.   This includes kinit and --keytab support, but does not include delegation token renewal.
      
      ## How was this patch tested?
      
      Manually against a Secure DC/OS Apache HDFS cluster.
      
      Author: ArtRand <arand@soe.ucsc.edu>
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #18519 from mgummelt/SPARK-16742-kerberos.
      bfdc361e
    • Kent Yao's avatar
      [SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for... · b83b502c
      Kent Yao authored
      [SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for reusing CliSessionState
      
      ## What changes were proposed in this pull request?
      
Set isolated to false when using built-in Hive jars and `SessionState.get` returns a `CliSessionState` instance.
      
      ## How was this patch tested?
      
1. Unit tests
2. Manually verified: `hive.exec.scratchdir` was only created once because of the reused CliSessionState
      ```java
      ➜  spark git:(SPARK-21428) ✗ bin/spark-sql --conf spark.sql.hive.metastore.jars=builtin
      
      log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
      log4j:WARN Please initialize the log4j system properly.
      log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
      Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
      17/07/16 23:59:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      17/07/16 23:59:27 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
      17/07/16 23:59:27 INFO ObjectStore: ObjectStore, initialize called
      17/07/16 23:59:28 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
      17/07/16 23:59:28 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
      17/07/16 23:59:29 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
      17/07/16 23:59:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
      17/07/16 23:59:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
      17/07/16 23:59:31 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
      17/07/16 23:59:31 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
      17/07/16 23:59:31 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
      17/07/16 23:59:31 INFO ObjectStore: Initialized ObjectStore
      17/07/16 23:59:31 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
      17/07/16 23:59:31 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
      17/07/16 23:59:32 INFO HiveMetaStore: Added admin role in metastore
      17/07/16 23:59:32 INFO HiveMetaStore: Added public role in metastore
      17/07/16 23:59:32 INFO HiveMetaStore: No user is added in admin role, since config is empty
      17/07/16 23:59:32 INFO HiveMetaStore: 0: get_all_databases
      17/07/16 23:59:32 INFO audit: ugi=Kent	ip=unknown-ip-addr	cmd=get_all_databases
      17/07/16 23:59:32 INFO HiveMetaStore: 0: get_functions: db=default pat=*
      17/07/16 23:59:32 INFO audit: ugi=Kent	ip=unknown-ip-addr	cmd=get_functions: db=default pat=*
      17/07/16 23:59:32 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
      17/07/16 23:59:32 INFO SessionState: Created local directory: /var/folders/k2/04p4k4ws73l6711h_mz2_tq00000gn/T/beea7261-221a-4711-89e8-8b12a9d37370_resources
      17/07/16 23:59:32 INFO SessionState: Created HDFS directory: /tmp/hive/Kent/beea7261-221a-4711-89e8-8b12a9d37370
      17/07/16 23:59:32 INFO SessionState: Created local directory: /var/folders/k2/04p4k4ws73l6711h_mz2_tq00000gn/T/Kent/beea7261-221a-4711-89e8-8b12a9d37370
      17/07/16 23:59:32 INFO SessionState: Created HDFS directory: /tmp/hive/Kent/beea7261-221a-4711-89e8-8b12a9d37370/_tmp_space.db
      17/07/16 23:59:32 INFO SparkContext: Running Spark version 2.3.0-SNAPSHOT
      17/07/16 23:59:32 INFO SparkContext: Submitted application: SparkSQL::10.0.0.8
      17/07/16 23:59:32 INFO SecurityManager: Changing view acls to: Kent
      17/07/16 23:59:32 INFO SecurityManager: Changing modify acls to: Kent
      17/07/16 23:59:32 INFO SecurityManager: Changing view acls groups to:
      17/07/16 23:59:32 INFO SecurityManager: Changing modify acls groups to:
      17/07/16 23:59:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(Kent); groups with view permissions: Set(); users  with modify permissions: Set(Kent); groups with modify permissions: Set()
      17/07/16 23:59:33 INFO Utils: Successfully started service 'sparkDriver' on port 51889.
      17/07/16 23:59:33 INFO SparkEnv: Registering MapOutputTracker
      17/07/16 23:59:33 INFO SparkEnv: Registering BlockManagerMaster
      17/07/16 23:59:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
      17/07/16 23:59:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
      17/07/16 23:59:33 INFO DiskBlockManager: Created local directory at /private/var/folders/k2/04p4k4ws73l6711h_mz2_tq00000gn/T/blockmgr-9cfae28a-01e9-4c73-a1f1-f76fa52fc7a5
      17/07/16 23:59:33 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
      17/07/16 23:59:33 INFO SparkEnv: Registering OutputCommitCoordinator
      17/07/16 23:59:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
      17/07/16 23:59:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.0.8:4040
      17/07/16 23:59:33 INFO Executor: Starting executor ID driver on host localhost
      17/07/16 23:59:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51890.
      17/07/16 23:59:33 INFO NettyBlockTransferService: Server created on 10.0.0.8:51890
      17/07/16 23:59:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
      17/07/16 23:59:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.8, 51890, None)
      17/07/16 23:59:33 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.0.8:51890 with 366.3 MB RAM, BlockManagerId(driver, 10.0.0.8, 51890, None)
      17/07/16 23:59:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.8, 51890, None)
      17/07/16 23:59:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.8, 51890, None)
      17/07/16 23:59:34 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/Kent/Documents/spark/spark-warehouse').
      17/07/16 23:59:34 INFO SharedState: Warehouse path is 'file:/Users/Kent/Documents/spark/spark-warehouse'.
      17/07/16 23:59:34 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
      17/07/16 23:59:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
      17/07/16 23:59:34 INFO HiveMetaStore: 0: get_database: default
      17/07/16 23:59:34 INFO audit: ugi=Kent	ip=unknown-ip-addr	cmd=get_database: default
      17/07/16 23:59:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
      17/07/16 23:59:34 INFO HiveMetaStore: 0: get_database: global_temp
      17/07/16 23:59:34 INFO audit: ugi=Kent	ip=unknown-ip-addr	cmd=get_database: global_temp
      17/07/16 23:59:34 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
      17/07/16 23:59:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
      17/07/16 23:59:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
      spark-sql>
      
      ```
      cc cloud-fan gatorsmile
      
      Author: Kent Yao <yaooqinn@hotmail.com>
      Author: hzyaoqin <hzyaoqin@corp.netease.com>
      
      Closes #18648 from yaooqinn/SPARK-21428.
      b83b502c
    • Hideaki Tanaka's avatar
      [SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS instead of ip address · d695a528
      Hideaki Tanaka authored
      ## What changes were proposed in this pull request?
      
The patch lets the Spark web UI use the FQDN as its hostname instead of the IP address.
      
In the current implementation, the IP address of the driver host is set as DRIVER_HOST_ADDRESS. This becomes a problem when we enable SSL using the "spark.ssl.enabled", "spark.ssl.trustStore" and "spark.ssl.keyStore" properties. When we configure these properties, the Spark web UI is launched with SSL enabled and the HTTPS server is configured with the custom SSL certificate you configured in these properties.
In this case, the client gets a javax.net.ssl.SSLPeerUnverifiedException when it accesses the Spark web UI, because it fails to verify the SSL certificate (the Common Name of the SSL cert does not match DRIVER_HOST_ADDRESS).
      
      To avoid the exception, we should use FQDN of the driver host for DRIVER_HOST_ADDRESS.
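A sketch of the idea (not the exact patch): resolve the driver host to its canonical name, which is the FQDN when reverse DNS is configured, instead of a raw IP, so it can match the CN/SAN entries of the configured SSL certificate.

```scala
import java.net.InetAddress

// getCanonicalHostName returns the FQDN when DNS resolution is set up;
// otherwise it falls back to the textual IP address.
val driverHost: String = InetAddress.getLocalHost.getCanonicalHostName
```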
      
      Error message that client gets when the client accesses spark web ui:
      javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.102.138.239> doesn't match any of the subject alternative names: []
      
      ## How was this patch tested?
      manual tests
      
      Author: Hideaki Tanaka <tanakah@amazon.com>
      
      Closes #18846 from thideeeee/SPARK-21642.
      d695a528
  12. Aug 16, 2017
    • Eyal Farago's avatar
      [SPARK-3151][BLOCK MANAGER] DiskStore.getBytes fails for files larger than 2GB · b8ffb510
      Eyal Farago authored
      ## What changes were proposed in this pull request?
Introduced `DiskBlockData`, a new implementation of `BlockData` representing a whole file.
This is somewhat related to [SPARK-6236](https://issues.apache.org/jira/browse/SPARK-6236) as well.
      
      This class follows the implementation of `EncryptedBlockData` just without the encryption. hence:
      * `toInputStream` is implemented using a `FileInputStream` (todo: encrypted version actually uses `Channels.newInputStream`, not sure if it's the right choice for this)
      * `toNetty` is implemented in terms of `io.netty.channel.DefaultFileRegion`
      * `toByteBuffer` fails for files larger than 2GB (same behavior of the original code, just postponed a bit), it also respects the same configuration keys defined by the original code to choose between memory mapping and simple file read.
      
      ## How was this patch tested?
      added test to DiskStoreSuite and MemoryManagerSuite
      
      Author: Eyal Farago <eyal@nrgene.com>
      
      Closes #18855 from eyalfa/SPARK-3151.
      b8ffb510
    • John Lee's avatar
      [SPARK-21656][CORE] spark dynamic allocation should not idle timeout executors... · adf005da
      John Lee authored
      [SPARK-21656][CORE] spark dynamic allocation should not idle timeout executors when tasks still to run
      
      ## What changes were proposed in this pull request?
      
Right now Spark lets go of executors when they have been idle for 60s (or a configurable time). I have seen Spark let them go when they were idle but still really needed. I have seen this issue when the scheduler was waiting for node locality, which takes longer than the default idle timeout. In these jobs the number of executors drops really low (less than 10) while there are still around 80,000 tasks to run.
      We should consider not allowing executors to idle timeout if they are still needed according to the number of tasks to be run.
      
      ## How was this patch tested?
      
      Tested by manually adding executors to `executorsIdsToBeRemoved` list and seeing if those executors were removed when there are a lot of tasks and a high `numExecutorsTarget` value.
      
      Code used
      
      In  `ExecutorAllocationManager.start()`
      
      ```
          start_time = clock.getTimeMillis()
      ```
      
      In `ExecutorAllocationManager.schedule()`
      ```
          val executorIdsToBeRemoved = ArrayBuffer[String]()
          if ( now > start_time + 1000 * 60 * 2) {
            logInfo("--- REMOVING 1/2 of the EXECUTORS ---")
            start_time +=  1000 * 60 * 100
            var counter = 0
            for (x <- executorIds) {
              counter += 1
              if (counter == 2) {
                counter = 0
                executorIdsToBeRemoved += x
              }
            }
  }
```

      Author: John Lee <jlee2@yahoo-inc.com>
      
      Closes #18874 from yoonlee95/SPARK-21656.
      adf005da
  13. Aug 15, 2017
    • Marcelo Vanzin's avatar
      [SPARK-21731][BUILD] Upgrade scalastyle to 0.9. · 3f958a99
      Marcelo Vanzin authored
      This version fixes a few issues in the import order checker; it provides
      better error messages, and detects more improper ordering (thus the need
      to change a lot of files in this patch). The main fix is that it correctly
      complains about the order of packages vs. classes.
      
      As part of the above, I moved some "SparkSession" import in ML examples
      inside the "$example on$" blocks; that didn't seem consistent across
      different source files to start with, and avoids having to add more on/off blocks
      around specific imports.
      
      The new scalastyle also seems to have a better header detector, so a few
      license headers had to be updated to match the expected indentation.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18943 from vanzin/SPARK-21731.
      3f958a99
    • Marcelo Vanzin's avatar
      [SPARK-17742][CORE] Handle child process exit in SparkLauncher. · cba826d0
      Marcelo Vanzin authored
      Currently the launcher handle does not monitor the child spark-submit
      process it launches; this means that if the child exits with an error,
      the handle's state will never change, and an application will not know
      that the application has failed.
      
      This change adds code to monitor the child process, and changes the
      handle state appropriately when the child process exits.
      
      Tested with added unit tests.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18877 from vanzin/SPARK-17742.
      cba826d0
  14. Aug 14, 2017
    • Andrew Ash's avatar
      [SPARK-21563][CORE] Fix race condition when serializing TaskDescriptions and adding jars · 6847e93c
      Andrew Ash authored
      ## What changes were proposed in this pull request?
      
Fix the race condition when serializing TaskDescriptions and adding jars by keeping the set of jars and files for a TaskSet constant across the lifetime of the TaskSet. Otherwise TaskDescription serialization can produce an invalid result when new files/jars are added concurrently as the TaskDescription is serialized.
      
      ## How was this patch tested?
      
      Additional unit test ensures jars/files contained in the TaskDescription remain constant throughout the lifetime of the TaskSet.
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #18913 from ash211/SPARK-21563.
      6847e93c
    • Anderson Osagie's avatar
      [SPARK-21176][WEB UI] Format worker page links to work with proxy · 34d2134a
      Anderson Osagie authored
      ## What changes were proposed in this pull request?
      
      Several links on the worker page do not work correctly with the proxy because:
      1) They don't acknowledge the proxy
      2) They use relative paths (unlike the Application Page which uses full paths)
      
      This patch fixes that. It also fixes a mistake in the proxy's Location header parsing which caused it to incorrectly handle redirects.
      
      ## How was this patch tested?
      
      I checked the validity of every link with the proxy on and off.
      
      Author: Anderson Osagie <osagie@gmail.com>
      
      Closes #18915 from aosagie/fix/proxy-links.
      34d2134a
  15. Aug 11, 2017
    • Stavros Kontopoulos's avatar
      [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alone cluster mode · da8c59bd
      Stavros Kontopoulos authored
Fixes the --packages flag for the stand-alone case in cluster mode. Adds to the driver classpath the jars that are resolved via ivy, along with any other jars passed to `spark.jars`. Jars not resolved by ivy are downloaded explicitly to a tmp folder on the driver node. Similar code is available in SparkSubmit, so we refactored part of it to use it in the DriverWrapper class, which is responsible for launching the driver in standalone cluster mode.
      
      Note: In stand-alone mode `spark.jars` contains the user jar so it can be fetched later on at the executor side.
      
      Manually by submitting a driver in cluster mode within a standalone cluster and checking if dependencies were resolved at the driver side.
      
      Author: Stavros Kontopoulos <st.kontopoulos@gmail.com>
      
      Closes #18630 from skonto/fix_packages_stand_alone_cluster.
      da8c59bd
    • Kent Yao's avatar
      [SPARK-21675][WEBUI] Add a navigation bar at the bottom of the Details for Stage Page · 2387f1e3
      Kent Yao authored
      ## What changes were proposed in this pull request?
      
1. In the Spark Web UI, the Details for Stage page doesn't have a navigation bar at the bottom. When we scroll down to the bottom, it is better to have a navigation bar right there to go wherever we want.
2. Executor ID is not equivalent to Host; it may be better to separate them, so that we can group the tasks by host.
      
      ## How was this patch tested?
      manually test
      ![wx20170809-165606](https://user-images.githubusercontent.com/8326978/29114161-f82b4920-7d25-11e7-8d0c-0c036b008a78.png)
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: Kent Yao <yaooqinn@hotmail.com>
      
      Closes #18893 from yaooqinn/SPARK-21675.
      2387f1e3
  16. Aug 09, 2017
    • peay's avatar
      [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator · c06f3f5a
      peay authored
      ## What changes were proposed in this pull request?
      
      This modification increases the timeout for `serveIterator` (which is not dynamically configurable). This fixes timeout issues in pyspark when using `collect` and similar functions, in cases where Python may take more than a couple seconds to connect.
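A sketch of the kind of change described (the timeout value and socket setup here are illustrative): the ephemeral server socket that hands results back to Python gets a longer accept timeout so a slow-connecting Python process does not fail.

```scala
import java.net.{InetAddress, ServerSocket}

// Bind an ephemeral local port, as serveIterator-style helpers do.
val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
// Accept timeout in milliseconds; previously only a few seconds.
serverSocket.setSoTimeout(15000)
```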
      
      See https://issues.apache.org/jira/browse/SPARK-21551
      
      ## How was this patch tested?
      
      Ran the tests.
      
      cc rxin
      
      Author: peay <peay@protonmail.com>
      
      Closes #18752 from peay/spark-21551.
      c06f3f5a
    • Takeshi Yamamuro's avatar
      [SPARK-21276][CORE] Update lz4-java to the latest (v1.4.0) · b78cf13b
      Takeshi Yamamuro authored
      ## What changes were proposed in this pull request?
This PR updates `lz4-java` to the latest (v1.4.0) and removes the custom `LZ4BlockInputStream`. We currently use a custom `LZ4BlockInputStream` to read concatenated byte streams in shuffle, but this functionality has been implemented in the latest lz4-java (https://github.com/lz4/lz4-java/pull/105). So we can update to the latest release and remove the custom `LZ4BlockInputStream`.
      
      Major diffs between the latest release and v1.3.0 in the master are as follows (https://github.com/lz4/lz4-java/compare/62f7547abb0819d1ca1e669645ee1a9d26cd60b0...6d4693f56253fcddfad7b441bb8d917b182efa2d);
      - fixed NPE in XXHashFactory similarly
      - Don't place resources in default package to support shading
      - Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed
      - Try to load lz4-java from java.library.path, then fallback to bundled
      - Add ppc64le binary
      - Add s390x JNI binding
      - Add basic LZ4 Frame v1.5.0 support
      - enable aarch64 support for lz4-java
- Allow unsafeInstance() for ppc64le architecture
      - Add unsafeInstance support for AArch64
      - Support 64-bit JNI build on Solaris
      - Avoid over-allocating a buffer
      - Allow EndMark to be incompressible for LZ4FrameInputStream.
      - Concat byte stream
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Takeshi Yamamuro <yamamuro@apache.org>
      
      Closes #18883 from maropu/SPARK-21276.
      b78cf13b
    • vinodkc's avatar
      [SPARK-21665][CORE] Need to close resources after use · 83fe3b5e
      vinodkc authored
      ## What changes were proposed in this pull request?
Resources in core (SparkSubmitArguments.scala), spark-launcher (AbstractCommandBuilder.java), and resource-managers/YARN (Client.scala) are now released after use.
      
      ## How was this patch tested?
No new test cases added; existing unit tests pass.
      
      Author: vinodkc <vinod.kc.in@gmail.com>
      
      Closes #18880 from vinodkc/br_fixresouceleak.
      83fe3b5e
    • 10087686's avatar
      [SPARK-21663][TESTS] test("remote fetch below max RPC message size") should... · 6426adff
      10087686 authored
      [SPARK-21663][TESTS] test("remote fetch below max RPC message size") should call masterTracker.stop() in MapOutputTrackerSuite
      
Signed-off-by: 10087686 <wang.jiaochun@zte.com.cn>
      
      ## What changes were proposed in this pull request?
After the unit tests end, masterTracker.stop() should be called to free resources.
      
      ## How was this patch tested?
Ran unit tests.
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: 10087686 <wang.jiaochun@zte.com.cn>
      
      Closes #18867 from wangjiaochun/mapout.
      6426adff
    • Anderson Osagie's avatar
      [SPARK-21176][WEB UI] Use a single ProxyServlet to proxy all workers and applications · ae8a2b14
      Anderson Osagie authored
      ## What changes were proposed in this pull request?
      
      Currently, each application and each worker creates their own proxy servlet. Each proxy servlet is backed by its own HTTP client and a relatively large number of selector threads. This is excessive but was fixed (to an extent) by https://github.com/apache/spark/pull/18437.
      
      However, a single HTTP client (backed by a single selector thread) should be enough to handle all proxy requests. This PR creates a single proxy servlet no matter how many applications and workers there are.
      
      ## How was this patch tested?
The unit tests for rewriting proxied locations and headers were updated. I then spun up a 100-node cluster to ensure that proxying worked correctly.
      
      jiangxb1987 Please let me know if there's anything else I can do to help push this thru. Thanks!
      
      Author: Anderson Osagie <osagie@gmail.com>
      
      Closes #18499 from aosagie/fix/minimize-proxy-threads.
      ae8a2b14
    • pgandhi's avatar
      [SPARK-21503][UI] Spark UI shows incorrect task status for a killed Executor Process · f016f5c8
      pgandhi authored
The Executors tab on the Spark UI shows a task as completed when the executor process running that task is killed using the kill command.
Added the case ExecutorLostFailure, which was previously missing; without it the default case was executed and the task was marked as completed. This case covers all situations where the executor's connection to the Spark driver was lost, whether due to killing the executor process, a network failure, etc.
      
      ## How was this patch tested?
      Manually Tested the fix by observing the UI change before and after.
      Before:
      <img width="1398" alt="screen shot-before" src="https://user-images.githubusercontent.com/22228190/28482929-571c9cea-6e30-11e7-93dd-728de5cdea95.png">
      After:
      <img width="1385" alt="screen shot-after" src="https://user-images.githubusercontent.com/22228190/28482964-8649f5ee-6e30-11e7-91bd-2eb2089c61cc.png">
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: pgandhi <pgandhi@yahoo-inc.com>
      Author: pgandhi999 <parthkgandhi9@gmail.com>
      
      Closes #18707 from pgandhi999/master.
      f016f5c8
  17. Aug 07, 2017
    • Xianyang Liu's avatar
      [SPARK-21621][CORE] Reset numRecordsWritten after DiskBlockObjectWriter.commitAndGet called · 534a063f
      Xianyang Liu authored
      ## What changes were proposed in this pull request?
      
We should reset numRecordsWritten to zero after DiskBlockObjectWriter.commitAndGet is called.
When `revertPartialWritesAndClose` is called, we decrease the written-records count in `ShuffleWriteMetrics`. However, we currently decrease it all the way to zero, which is wrong; we should only subtract the records written after the last `commitAndGet` call.
      
      ## How was this patch tested?
      Modified existing test.
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: Xianyang Liu <xianyang.liu@intel.com>
      
      Closes #18830 from ConeyLiu/DiskBlockObjectWriter.
      534a063f