Commit 7fecf513 authored by Tom Graves, committed by Tom Graves

[SPARK-19812] YARN shuffle service fails to relocate recovery DB across NFS directories

## What changes were proposed in this pull request?

Change from using Java's Files.move to using Hadoop filesystem operations to move the directories. Files.move does not work when moving directories across NFS mounts, and its documentation also says that a directory with entries should be moved recursively. We already use the Hadoop filesystem API here, so just use its local filesystem, which handles this properly.
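For comparison, the recursive move that the Files.move documentation points to can be sketched with the JDK alone. This is not what the patch does (the patch delegates to Hadoop's local filesystem); it is a minimal illustration under the assumption that a copy-then-delete walk is acceptable. The class name `RecursiveMove`, the method `moveTree`, and the file name `example.log` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of the patch.
final class RecursiveMove {
  /**
   * Moves the directory tree rooted at src to dst by copying each file and
   * deleting the original, then removing each emptied source directory.
   */
  static void moveTree(Path src, Path dst) throws IOException {
    Files.walkFileTree(src, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
          throws IOException {
        // Recreate the directory at the same relative position under dst.
        Files.createDirectories(dst.resolve(src.relativize(dir)));
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        // Copy first, delete second, so a failed copy leaves the source intact.
        Files.copy(file, dst.resolve(src.relativize(file)),
            StandardCopyOption.REPLACE_EXISTING);
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        // All entries have been moved, so the source directory is now empty.
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
      }
    });
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempDirectory("recovery-src");
    Files.write(src.resolve("example.log"), new byte[] {1, 2, 3});
    Path dst = Files.createTempDirectory("recovery-parent").resolve("db");
    moveTree(src, dst);
    System.out.println("moved to " + dst);
  }
}
```

A walk like this is not atomic: a failure part-way through leaves files split between the two locations, which is one reason to prefer an existing filesystem abstraction where one is already in use.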

Note that the DB here is actually a directory of files and not just a single file, hence the change in the name of the local var.

## How was this patch tested?

Ran YarnShuffleServiceSuite unit tests. Unfortunately, I couldn't easily add a new test here since it involves NFS.
Ran manual tests to verify that the DB directories were properly moved across NFS-mounted directories. We have been running this internally for weeks.

Author: Tom Graves <tgraves@apache.org>

Closes #17748 from tgravescs/SPARK-19812.
parent 7a365257
@@ -21,7 +21,6 @@ import java.io.File;
 import java.io.IOException;
 import java.nio.charset.StandardCharsets;
 import java.nio.ByteBuffer;
-import java.nio.file.Files;
 import java.util.List;
 import java.util.Map;
@@ -340,9 +339,9 @@ public class YarnShuffleService extends AuxiliaryService {
    * when it previously was not. If YARN NM recovery is enabled it uses that path, otherwise
    * it will uses a YARN local dir.
    */
-  protected File initRecoveryDb(String dbFileName) {
+  protected File initRecoveryDb(String dbName) {
     if (_recoveryPath != null) {
-      File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbFileName);
+      File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbName);
       if (recoveryFile.exists()) {
         return recoveryFile;
       }
@@ -350,7 +349,7 @@ public class YarnShuffleService extends AuxiliaryService {
     // db doesn't exist in recovery path go check local dirs for it
     String[] localDirs = _conf.getTrimmedStrings("yarn.nodemanager.local-dirs");
     for (String dir : localDirs) {
-      File f = new File(new Path(dir).toUri().getPath(), dbFileName);
+      File f = new File(new Path(dir).toUri().getPath(), dbName);
       if (f.exists()) {
         if (_recoveryPath == null) {
           // If NM recovery is not enabled, we should specify the recovery path using NM local
@@ -363,17 +362,21 @@ public class YarnShuffleService extends AuxiliaryService {
           // make sure to move all DBs to the recovery path from the old NM local dirs.
           // If another DB was initialized first just make sure all the DBs are in the same
           // location.
-          File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName);
-          if (!newLoc.equals(f)) {
+          Path newLoc = new Path(_recoveryPath, dbName);
+          Path copyFrom = new Path(f.toURI());
+          if (!newLoc.equals(copyFrom)) {
+            logger.info("Moving " + copyFrom + " to: " + newLoc);
             try {
-              Files.move(f.toPath(), newLoc.toPath());
+              // The move here needs to handle moving non-empty directories across NFS mounts
+              FileSystem fs = FileSystem.getLocal(_conf);
+              fs.rename(copyFrom, newLoc);
             } catch (Exception e) {
               // Fail to move recovery file to new path, just continue on with new DB location
               logger.error("Failed to move recovery file {} to the path {}",
-                dbFileName, _recoveryPath.toString(), e);
+                dbName, _recoveryPath.toString(), e);
             }
           }
-          return newLoc;
+          return new File(newLoc.toUri().getPath());
         }
       }
     }
@@ -381,7 +384,7 @@ public class YarnShuffleService extends AuxiliaryService {
       _recoveryPath = new Path(localDirs[0]);
     }
-    return new File(_recoveryPath.toUri().getPath(), dbFileName);
+    return new File(_recoveryPath.toUri().getPath(), dbName);
   }
   /**
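
Because the hunks above show initRecoveryDb only in pieces, the lookup order it follows can be sketched as one JDK-only method. This is a simplified illustration, not the patched code: it omits the actual move, uses java.io.File instead of Hadoop's Path, and assumes that when NM recovery is disabled a DB found in a local dir stays where it is (that part of the hunk is elided above). The names `RecoveryDbLocator` and `locate` are hypothetical.

```java
import java.io.File;
import java.util.List;

// Hypothetical, simplified stand-in for the lookup order in initRecoveryDb.
final class RecoveryDbLocator {
  /**
   * Returns where the DB directory named dbName should live.
   * recoveryDir may be null (NM recovery disabled); localDirs must be non-empty.
   */
  static File locate(File recoveryDir, List<File> localDirs, String dbName) {
    // 1. Prefer a DB that already sits in the recovery directory.
    if (recoveryDir != null) {
      File inRecovery = new File(recoveryDir, dbName);
      if (inRecovery.exists()) {
        return inRecovery;
      }
    }
    // 2. Otherwise look for a DB left behind in one of the local dirs.
    for (File dir : localDirs) {
      File candidate = new File(dir, dbName);
      if (candidate.exists()) {
        // Assumption: with recovery disabled the DB stays where it was found.
        // With recovery enabled its target is the recovery directory; the real
        // code moves the directory there, which this sketch leaves out.
        File target = (recoveryDir == null) ? dir : recoveryDir;
        return new File(target, dbName);
      }
    }
    // 3. Nothing exists yet: use the recovery dir, else the first local dir.
    File base = (recoveryDir != null) ? recoveryDir : localDirs.get(0);
    return new File(base, dbName);
  }
}
```

For example, `RecoveryDbLocator.locate(null, dirs, "example.ldb")` with no existing DB resolves to `example.ldb` under the first entry of `dirs`, mirroring the fallback to localDirs[0] in the last hunk.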