Get absolute path on Linux/Unix

Hello,

I have a program that takes an input and output file as arguments. I want to make sure that the user does not specify both as the same file, because this would result in trouble. However, a simple strcmp() would only catch the most trivial cases. For example, "foo.txt", "./foo.txt" and "/full/path/foo.txt" would still be the same file, even though the strings are obviously different. There are many more examples, of course...

So, I want to convert the file names to absolute paths before comparing them. On Windows, I could use GetFullPathName() for that purpose. Is there an equivalent in the POSIX API on Linux/Unix? Unfortunately, I cannot use realpath(), because it only works with existing files. The output file does not usually exist, though...

Note: GetFullPathName() on Windows does not require the file to exist!

Please also note that the program is written in "plain" C, so no C++ <filesystem> is available. Also, I really want to avoid using 3rd-party libraries, otherwise something like cwalk could probably do the job...

Any suggestions?

(The best solution that I have come up with, so far, is to manually test whether the path starts with a slash. If it doesn't, then prepend the current working directory, as returned by the getcwd() function.)
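
The getcwd() idea above can be sketched roughly as follows. This is only a minimal sketch: make_absolute() is a hypothetical helper name, it does not collapse "." or ".." components, and it does not resolve symlinks, so it will still miss some cases.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: turns `path` into an absolute path without touching
 * the file system, so it also works for files that do not exist yet.
 * It does NOT collapse "." or ".." and does NOT resolve symlinks.
 * Returns a malloc()ed string (caller frees) or NULL on error. */
char *make_absolute(const char *path)
{
    assert(path != NULL);

    if (path[0] == '/')
        return strdup(path);            /* already absolute */

    char cwd[PATH_MAX];
    if (getcwd(cwd, sizeof cwd) == NULL)
        return NULL;

    /* cwd + '/' + path + terminating NUL */
    size_t len = strlen(cwd) + 1 + strlen(path) + 1;
    char *result = malloc(len);
    if (result == NULL)
        return NULL;

    snprintf(result, len, "%s/%s", cwd, path);
    return result;
}
```

Note that "foo.txt" and "./foo.txt" still produce different strings with this approach, so the "." and ".." components would have to be normalized separately before comparing.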

Thank you and regards.
Even the absolute path might not save you.

In Linux, the same file can have multiple names through the use of hard links.

https://www.man7.org/linux/man-pages/man2/stat.2.html
Do a 'stat' on both files and check that the .st_ino values are different.
You are right, of course.

I think even .st_ino is only unambiguous within a specific volume (file system). Also, stat() function only works for existing files. The output path may (and usually does) not exist, though.

Anyhow, even if there are some cases that we won't be able to catch, I think converting both paths to absolute paths and then comparing them should be "good enough" here.

However, is there really no function in the Linux/Unix API to resolve a given path, existing or non-existing, to an absolute path, in exactly the same way as open() would effectively do?

(Seems like a weird oversight to me)

Regards.
> Unfortunately, I can not use realpath(), because it only works with existing files. The output file does not usually exist, though...

What about doing realpath on the directory?

Then you only have to compare the filenames.
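
A rough sketch of that idea, assuming the parent directory exists even when the file itself does not. canonical_path() is a hypothetical helper name, and the last path component is taken as-is (so a final symlink or a trailing ".." is not resolved):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: canonicalizes `path` even if the file itself does
 * not exist, as long as its parent directory does. Writes the result into
 * `out` (PATH_MAX bytes) and returns 0 on success, -1 on failure. */
int canonical_path(const char *path, char out[PATH_MAX])
{
    char dir_buf[PATH_MAX], base_buf[PATH_MAX], resolved[PATH_MAX];

    assert(path != NULL && out != NULL);
    if (strlen(path) >= PATH_MAX)
        return -1;

    /* dirname() and basename() may modify their argument: use copies. */
    strcpy(dir_buf, path);
    strcpy(base_buf, path);
    const char *dir = dirname(dir_buf);
    const char *base = basename(base_buf);

    /* Only the directory part has to exist for realpath() to succeed. */
    if (realpath(dir, resolved) == NULL)
        return -1;

    /* Avoid a double slash when the directory is the root. */
    const char *sep = (strcmp(resolved, "/") == 0) ? "" : "/";
    int n = snprintf(out, PATH_MAX, "%s%s%s", resolved, sep, base);
    return (n >= 0 && n < PATH_MAX) ? 0 : -1;
}
```

Calling this on both the input and the output path and comparing the results with strcmp() would catch "foo.txt" versus "./foo.txt", but not hard links.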
After thinking about this a little more, I believe that I can get along with realpath() 💡

If the given input file does not exist, then we are going to fail anyway, because fopen() is going to fail to open the non-existing input file for reading. So, the only relevant case is when the input file does exist. And, in that case, realpath() will be able to resolve the absolute (canonical) path of the input file.

Given that the input file exists, the output file may or may not exist. If it does exist, realpath() can resolve its absolute (canonical) path, which we can then compare to the absolute path of the input file; if they are the same, we will detect it! If the output file does not exist, then it obviously cannot be the same as the existing input file.
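
In code, the reasoning above might look something like this. It is a sketch only: is_same_path() is a made-up name, and hard links to the same file are not detected by comparing paths.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper.
 * Returns  1 if `output` resolves to the same canonical path as `input`,
 *          0 if the paths differ or `output` does not exist,
 *         -1 on error (including a missing input file). */
int is_same_path(const char *input, const char *output)
{
    char in_real[PATH_MAX], out_real[PATH_MAX];

    assert(input != NULL && output != NULL);

    /* The input must exist; otherwise we are going to fail anyway. */
    if (realpath(input, in_real) == NULL)
        return -1;

    /* A non-existing output cannot be the same as the existing input. */
    if (realpath(output, out_real) == NULL)
        return (errno == ENOENT) ? 0 : -1;

    return strcmp(in_real, out_real) == 0 ? 1 : 0;
}
```

ENOENT is also reported when a directory in the output path is missing, but in that case opening the output file fails later anyway.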

I assume there could still be some race conditions, because the realpath() invocation and the following fopen() invocation are not "atomic". But that's probably more of a "theoretical" problem...
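
One possible way to narrow that window on Unix, not something proposed in this thread, would be to open both files first and compare the already opened descriptors with fstat(). A sketch under that assumption, with open_distinct() as a hypothetical name:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: opens `input` for reading and `output` for writing,
 * but refuses if both descriptors refer to the same file.
 * Returns 0 and stores the descriptors on success; returns -1 on error or
 * if both are the same file (no descriptors are left open in that case). */
int open_distinct(const char *input, const char *output,
                  int *in_fd, int *out_fd)
{
    struct stat in_st, out_st;

    assert(input != NULL && output != NULL);
    assert(in_fd != NULL && out_fd != NULL);

    int in = open(input, O_RDONLY);
    if (in < 0)
        return -1;

    /* No O_TRUNC here: the output must not be emptied before the check. */
    int out = open(output, O_WRONLY | O_CREAT, 0666);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (fstat(in, &in_st) != 0 || fstat(out, &out_st) != 0 ||
        (in_st.st_dev == out_st.st_dev && in_st.st_ino == out_st.st_ino) ||
        ftruncate(out, 0) != 0) {       /* truncate only once it is safe */
        close(in);
        close(out);
        return -1;
    }

    *in_fd = in;
    *out_fd = out;
    return 0;
}
```

The descriptors could then be wrapped with fdopen() if FILE* streams are needed. This is Unix-only, so a Windows build would need a different code path.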
To obtain the full path of a file, we can use the readlink command. readlink prints the target of a symbolic link, but with the -f option it also prints the absolute path for a relative path.

There is the readlink program, from GNU coreutils, that does this. I use it in shell scripts. However, I'm talking about my own program, written in C, here. And, unfortunately, the readlink() syscall, which I could call from my C code, only works with actual symlinks! It fails, with EINVAL, if the given path is not a symlink.

I actually looked at the source code of the readlink program to see what they are doing in order to get an absolute path, but it's quite a complex routine. Certainly not as simple as a single syscall...
> is there really no function in Linux/Unix API to resolve a given path, existing or non-existing, to an absolute path, in the exactly same way as open() would do effectively?

open() doesn't need to resolve the full path. It just follows the inodes.

> I think even .st_ino is only unambiguous within a specific volume

True, but the combination of st_ino and st_dev uniquely identifies the file.

So if you can stat both files then compare the st_ino and st_dev fields.
otherwise, if stat(file1) returned ENOENT then temporarily create it and compare again.
otherwise if stat(file2) returned ENOENT then temporarily create it and compare again.

Since file2 might be a symlink to file1, the "compare again" part means you need to stat both files again.
So, after all, stat() can be used to check whether two existing files are the same (combination of st_ino and st_dev). But the same can be achieved by using realpath() and then comparing the paths. For me, the advantage of realpath() is that Windows has an equivalent _fullpath() function (except that this one handles non-existing files), whereas the Windows version of stat() does not provide meaningful st_ino values.

BTW: My code needs to work on both platforms, Windows and Unix.

BTW²: Does Unix provide meaningful st_ino values for any type of file system?

The "problematic" case is when the file does not exist, because in that case neither stat() nor realpath() will work. But, as pointed out before, in my special situation the given input file always exists, or we will fail anyway! The given output file may exist or not. But, if the output file does not exist, then we are fine. The case I need to catch is when the output file already exists and happens to be the same as the input file.
Update:

Just figured out that Windows also provides a unique id for each file, just not via MSVC's implementation of the stat() or fstat() functions, but via the GetFileInformationByHandle() function:

typedef struct _BY_HANDLE_FILE_INFORMATION {
   /* ... */
   DWORD    dwVolumeSerialNumber;
   /* ... */
   DWORD    nFileIndexHigh;
   DWORD    nFileIndexLow;
} BY_HANDLE_FILE_INFORMATION;

The identifier (low and high parts) and the volume serial number uniquely identify a file on a single computer.
> uniquely identify a file on a single computer.

Are you sure? As I recall, it's per volume.
@kbw, if the MS Docs for the WinAPI are to be believed, the identification of a file is unique on a single computer.

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information
> uniquely identify a file on a single computer.
> Are you sure? As I recall, it's per volume.

The file identifier (low and high parts) is unique per volume. Together with the volume serial number it should be unique on a single computer. Says Microsoft in their documentation...