About files, a little bit related to C++


I am experimenting with something here. Some files are encrypted and others are not; two examples of the unencrypted kind are text (.txt) files and .cpp files.

Every time I add content to those files, the file size changes:

Content 1 (no new line)

The file size = 1 byte.
===========

Content 2 (no new line)

aaa

The file size = 3 bytes.
===========

Content 3 (one new line)

aaa

The file size = 5 bytes.
===========

Content 4 (two new lines)

aaa

The file size = 7 bytes.

So I am assuming a new line costs 2 bytes here, but in C++ the newline character is like any other character: its size is just 1 byte.

I'm looking for any ideas related to this topic.

Thanks


JLBorges (13770)

How a new line is represented in the file system depends on the environment; on Windows the representation is two characters long (carriage return followed by line feed, CR LF).
https://en.wikipedia.org/wiki/Newline#Representation

However,

To facilitate the creation of portable programs, programming languages provide some abstractions to deal with the different types of newline sequences used in different environments.

The C programming language provides the escape sequences '\n' (newline) and '\r' (carriage return). However, these are not required to be equivalent to the ASCII LF and CR control characters. The C standard only guarantees two things:
1. Each of these escape sequences maps to a unique implementation-defined number that can be stored in a single char value.
2. When writing to a file, device node, or socket/fifo in text mode, '\n' is transparently translated to the native newline sequence used by the system, which may be longer than one character. When reading in text mode, the native newline sequence is translated back to '\n'. In binary mode, no translation is performed, and the internal representation produced by '\n' is output directly.

On Unix platforms, where C originated, the native newline sequence is ASCII LF (0x0A), so '\n' was simply defined to be that value. With the internal and external representation being identical, the translation performed in text mode is a no-op, and Unix has no notion of text mode or binary mode. This has caused many programmers who developed their software on Unix systems simply to ignore the distinction completely, resulting in code that is not portable to different platforms.

The C library function fgets() is best avoided in binary mode because any file not written with the Unix newline convention will be misread. Also, in text mode, any file not written with the system's native newline sequence (such as a file created on a Unix system, then copied to a Windows system) will be misread as well.

Another common problem is the use of '\n' when communicating using an Internet protocol that mandates the use of ASCII CR+LF for ending lines. Writing '\n' to a text mode stream works correctly on Windows systems, but produces only LF on Unix, and something completely different on more exotic systems. Using "\r\n" in binary mode is slightly better.
https://en.wikipedia.org/wiki/Newline#In_programming_languages

ninja01 (157)

Thanks @JLBorges
