Reading files with fread() in chunks.

Good morning everyone!

I would like to ask for a little help here on the forum. It's like this.

I have a binary file of 9175867 bytes.
I need to read only 75867 bytes of this file, in chunks of 1024 bytes at a time. I don't want to throw all 75867 bytes into a buffer in memory.

How can I do this with the fread() function?

If anyone can help me, I would be very grateful.
I don't want to throw all 75867 bytes into a buffer in memory.


Why not?
It's just an example, in fact the file that I opened is 6 gigabytes.
It works just like you imagine. The only trick is that you must now handle the bookkeeping of when your memory buffer ends but the file does not — meaning you have an extra layer of function on top of reading the file.
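To make that bookkeeping concrete, here is a minimal sketch in C, assuming the 1024-byte chunk size from the question. The helper name copy_limited is made up for illustration; it is not part of the C library. It never asks fread() for more than the bytes still wanted, so nothing has to be trimmed afterwards.

```c
#include <stdio.h>
#include <stddef.h>

#define CHUNK_SIZE 1024  /* chunk size taken from the question */

/* Hypothetical helper: copy up to `wanted` bytes from `in` to `out`,
   one chunk at a time. Returns the number of bytes actually copied,
   which is less than `wanted` if the input ends early or an I/O
   error occurs. */
size_t copy_limited(FILE *in, FILE *out, size_t wanted)
{
    unsigned char chunk[CHUNK_SIZE];
    size_t done = 0;

    while (done < wanted) {
        size_t remaining = wanted - done;
        /* Request a full chunk, or only what is left on the last pass */
        size_t request = remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE;
        size_t got = fread(chunk, 1, request, in);

        if (got == 0)
            break;  /* end of file or read error */
        if (fwrite(chunk, 1, got, out) != got)
            break;  /* write error */
        done += got;
    }
    return done;
}
```

The caller compares the return value with the amount requested to find out whether the whole range was copied.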

If you only wish to moderate how much memory is used for the file buffering, the C library lets you do that already using setvbuf() (https://en.cppreference.com/w/c/io/setvbuf), so you do not need to do anything extra to use it.

(I am assuming you are using C here, not C++.)

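#include <stdio.h>
#include <stdlib.h>

char buffer[1024];

// Report an error and quit
static void fooey( const char * message )
{
    fprintf( stderr, "%s\n", message );
    exit( EXIT_FAILURE );
}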

int main(void)
{
    // Open the file
    FILE * f = fopen( "some/file.txt", "rb" );
    if (!f) fooey( "Could not open file" );

    // Endeavor to use our buffer
    if (setvbuf( f, buffer, _IOFBF, sizeof(buffer) ) != 0)
        fooey( "Could not change buffer size" );

    // Read the file in any normal way. Here we just copy it to stdout.
    int ch;
    while ((ch = fgetc(f)) != EOF)
        putchar( ch );

    fclose( f );
    return 0;
}

Hope this helps.
in fact the file that I opened is 6 gigabytes
...
I need to read only 75867 bytes of this file


What exactly are you trying to do? You can easily read 75,867 bytes from a 6 gigabyte file without any buffering or special processing.... Is this data at the beginning of the file?
@Duthomhas, you've had a bit of a brain slip writing sizeof(1024). You mean sizeof(buffer) (or just 1024 without the sizeof). :)
6 GB is about 10% of the memory on a good modern computer (many now have 64 GB). There are diminishing returns, of course; it's not usually useful to read that big a bite unless you are doing something very trivial (like copying or encrypting the file).

But... another idea:
get the file size,
figure out how many chunks you have (if it's not X * 1024, either error out or add logic to handle the remainder),
then just read chunks...
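vector<unsigned char> buff(1024);
for (int i = 0; i < numberofchunks; i++)
{
    // one item of 1024 bytes; fread returns the number of complete items read
    if (fread(buff.data(), 1024, 1, fileptr) != 1)
        break; // short read or error
    dosomething(buff);
}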
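The "get the file size, figure out how many chunks" steps can be sketched in plain C as below. The helper names file_size and split_into_chunks are made up for illustration. Note the assumption that the size fits in a long: that is not true for a 6 GB file where long is 32 bits (as on Windows), in which case something like _fseeki64/_ftelli64 (MSVC) or fseeko/ftello (POSIX) would be needed instead.

```c
#include <stdio.h>

/* Hypothetical helper: size in bytes of a file opened in binary mode,
   or -1 on failure. Leaves the stream positioned at the start.
   Assumes the size fits in a long. */
long file_size(FILE *f)
{
    long size;

    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    size = ftell(f);
    rewind(f);
    return size;
}

/* Hypothetical helper: split `total` bytes into whole chunks of `chunk`
   bytes plus a leftover of fewer than `chunk` bytes. */
void split_into_chunks(long total, long chunk, long *full, long *rest)
{
    *full = total / chunk;
    *rest = total % chunk;
}
```

For the 75867 bytes in the question and 1024-byte chunks this gives 74 full chunks and 91 leftover bytes, so the file is not a multiple of the chunk size and the remainder needs its own read.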


If the file is structured into 1024-byte records or something similar, change the vector of bytes to a packed structure (one with no padding between its members) and read directly into that, so the fields are populated directly.
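A rough C sketch of reading straight into a record structure follows. The layout of struct record is invented for illustration; the thread does not describe a real file format. The members are chosen so that no padding is needed, and the C11 _Static_assert checks that assumption at compile time. It also assumes the file was written with the same byte order as the machine reading it.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical 1024-byte record: two 32-bit fields plus a payload.
   4 + 4 + 1016 = 1024, so the compiler has no reason to add padding. */
struct record {
    uint32_t id;
    uint32_t length;
    unsigned char payload[1016];
};

_Static_assert(sizeof(struct record) == 1024,
               "record must be exactly 1024 bytes");

/* Read the next record from `f` into `rec`.
   Returns 1 on success, 0 on end of file, short read, or error. */
int read_record(FILE *f, struct record *rec)
{
    return fread(rec, sizeof *rec, 1, f) == 1;
}
```

If the real layout has members that the compiler would pad, a compiler-specific packing directive is needed, and the static assertion will catch a mismatch.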
Thanks for the help, I've already got what I wanted.

Maybe I expressed myself wrong in the post. This is how I wanted it.

/*I need to read "file1.dat" (26.3 MB, 27,648,054 bytes), which is stored inside the file "binfile.bin" (2.53 GB, 2,726,166,528 bytes). "file1.dat" starts at position 500 of "binfile.bin".

I don't want to throw all 27648054 bytes into a buffer; I want to read in chunks of 16384 bytes,
and write in chunks as well.

I managed to do it like this; see the example below.

This code is meant to compress files such as images, audio, graphics and other 2D game files that I'm creating into a single encrypted file.

I've already got everything working the way I wanted, reading the chunks into buffers, but there was a bit of a delay when extracting images in the game at runtime.

That's why I'm now going to try to read from the HD and send the data directly to video memory without going through a "buffer" in RAM.
*/
int main() {

	FILE* in = fopen("D:/bACKUP_HDD_USB/binfile.bin", "rb"); // 2,53 GB (2.726.166.528 bytes)
	FILE* out = fopen("file1.bmp", "wb");

int chunk = 16384;
char* buf = new char[chunk];
int ret = 0, ret2 = 0;
int size_file = 27648054; // size I want to read.
int ok = 0;
int have = 0;

fseek(in, 500, SEEK_SET);

while (!feof(in) && (ok != 1))
{
	ret = fread(buf, 1, chunk, in);
	ret2 += ret;
	int have = ret;
	if (ret2 > size_file) {			// if more was read than the desired size, which here is "size_file".
		ret2 = -(size_file - ret2); // subtract the total already read in "ret2" from "size_file";
									// the result is negative, so the leading minus sign makes it positive.
		have = (chunk - ret2);		// "have" receives the remaining bytes needed to complete the total "size_file".
		ok = 1;
	}		
	fwrite(buf, 1, have, out);
}
fclose(out);
fclose(in);
	return 0;
}


If anyone has a better idea of how to do it, I'd be very grateful.
You should post the complete program, which includes #includes.
We shouldn't have to translate your comments (from Portuguese?).
There's no reason to dynamically allocate your buffer, and doing so with a C++ statement in a program that is otherwise C is strange.

Anyway, you are not checking for EOF properly. In general, you don't use feof() to do so, but instead test the return value of the reading function, in this case fread. When it is 0, EOF has been reached (or an error has occurred).

#include <stdio.h>

#define INPUT_FILE  "D:/bACKUP_HDD_USB/binfile.bin"
#define OUTPUT_FILE "file1.bmp"

#define WANTED_SIZE 27648054
#define CHUNK_SIZE     16384
#define OFFSET           500

char buffer[CHUNK_SIZE];

int main(void) {
    FILE* in = fopen(INPUT_FILE, "rb");
    if (!in) {
        perror("fopen");
        return 1;
    }

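    FILE* out = fopen(OUTPUT_FILE, "wb");
    if (!out) {
        perror("fopen");
        fclose(in);
        return 1;
    }
    int total = 0, amount_read = 0;

    if (fseek(in, OFFSET, SEEK_SET) != 0) {
        perror("fseek");
        fclose(out);
        fclose(in);
        return 1;
    }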

    while (total < WANTED_SIZE && (amount_read = fread(buffer, 1, CHUNK_SIZE, in)) > 0) {
        total += amount_read;
        if (total > WANTED_SIZE)
            amount_read -= total - WANTED_SIZE;
        fwrite(buffer, 1, amount_read, out);
    }

    if (total < WANTED_SIZE)
        printf("Only %d bytes of the file were read.\n", total);

    fclose(out);
    fclose(in);
    return 0;
}
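On the point about testing fread's return value instead of feof(): a return of 0 only says that nothing more was read. A small sketch of telling end of file apart from a read error afterwards, using a made-up helper count_bytes that is not part of the C library:

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes from the current position to the
   end of `f`. Returns the count, or -1 if a read error occurred. */
long count_bytes(FILE *f)
{
    unsigned char chunk[4096];
    long total = 0;
    size_t got;

    /* The loop is driven by what fread actually delivered */
    while ((got = fread(chunk, 1, sizeof chunk, f)) > 0)
        total += (long)got;

    /* fread returned 0: check which of the two reasons applies */
    if (ferror(f))
        return -1;
    return total;  /* no error, so the end of the file was reached */
}
```

feof() only becomes true after a read has already hit the end of the file, which is why a loop of the form while (!feof(in)) goes around one time too many.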

@DizzyDon
Thanks!

@cgm2k7
The file goes through the CPU’s RAM either way.
https://stackoverflow.com/questions/17345735/is-there-a-way-to-read-directly-from-the-hard-drive-to-the-gpu
Yes, I've read about this before.
My idea is: instead of creating several buffers in RAM until the data reaches the GPU, "I think" by playing directly in the buffers with DirectX or OpenGL, you'll still have to create buffers in RAM. Reading directly from the HD using DirectX or OpenGL methods is more expensive, yes, but I think it costs less RAM.
I really don't understand much. It's just me thinking, I haven't proven it yet.
Thanks
If it's for a game, one thing to think about is whether your game will require an SSD or not. I don't honestly know what you might do differently in that case, but you can certainly get by with less reading ahead (where, knowing you will soon need a file, you kick off a thread to read it before you do) and are that much closer to real time than with an HDD.

Is the encryption necessary? If you leave the files open, players can modify your game, and that is a huge bonus for many players. It also removes that much more processing, which means less slowness.

The only other "idea" I have is extremely difficult in practice... but if your encryption / byte modification can be biased towards improved compression, it could drastically improve the annoying audio and video compression problem. That is, if your encryption somehow favored generating the same byte patterns to assist the compression algorithms... again, I don't know if this can even be done other than by hand. I've done some similar work by hand, but it's basically 'compression' that only works on one file, and it was long ago when that was worth doing :)
I am a self-taught programming hobbyist who has been spending time dinking around since before C++98 was standardized. My knowledge has some gaps because I program as a hobby for fun.

There are 2 basic bottlenecks in a graphics based game.

1. Reading of the graphical assets, most times done in a pre-fetch manner during the times the game is idle. Yes, even fast games are idle a lot of times, depending on the frame-rate. The higher the frame rate the less time the game is idle.

2. Updating a game frame to reflect what's changed.

User input is crucial to a game, whether turn-based or updated in real time. In a turn-based game the timing of the graphics isn't as crucial as in a real-time one.

Using DX or OpenGL to create games has some benefits and deficits. There are more sophisticated and newer engines available. Unreal, Unity and Cocos are three that have support in MSVC.

For that matter it is still possible to create graphical games, real time games, using the WinAPI GDI with old school Windows sound/music support. Games created this way ain't 3D or full screen, but they can be fun to play.

If a game ain't fun to play then using the latest game tools is a waste of time and effort.

Shameless plug, my GitHub repo for updating (2003/2004) graphical games using the GDI and WinAPI multi-media system to work with modern Windows (Win 10):

https://github.com/GeorgePimpleton/Win32-games

"I think" by playing directly in the buffers with DirectX or OpenGL

You are trying to defeat one of the main reasons the libraries were created originally. To hide all the messy details of graphical/sound management. Before DX/OpenGL the programmer had to spend a lot of coding effort dealing with how to "talk" to the video/sound cards installed with lots and lots of custom code. DX/OpenGL were created so there was a "universal" interface to the cards no matter what was installed. So you, the programmer, could be more involved in creating the guts of the game itself.

DX and OpenGL are designed to be as speedy as needed; manual tinkering will likely create a lot of bottlenecks that wouldn't be there otherwise. Use the asset management DX/OpenGL provide; it is optimized for good performance.

DX has seen some major changes over the years; OpenGL development has been more static, though there are differences between the versions.

I truly miss DirectDraw. Though it has kinda resurfaced with the introduction of Direct2D. The resources to learn DX or OpenGL are outdated (books) and online resources are scattered all over the place.

I haven't spent much time with DX, most of my hobby projects are console based. I have never really done even any tinkering with the basics of OpenGL.
Consider reading/writing chunks in parallel using threads and asynchronous file i/o:
https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o
thanks
Well, it looks like the Report button still works for every rando. Or maybe seeplus has himself his very own Report button stalker now?
Probably reported for mentioning microsoft, a grievous offence. :-)
I've joined the wall of notoriety :)
Just wait for when you have EVERY post reported, including ones that previously weren't. That happened to me a while back.