Hey, can someone provide a C++ code example of how to open a file in binary mode, look for any files stored inside the binary with a given file extension, and dump them to a directory path based on their buffer size?
I know how to open a file so far, but after opening it I need to look inside it for files with whatever extension I provide (which could be ".dll" or anything else), and if it finds any, dump them based on their buffer size.
FILE *fp;
fp = fopen("c:\\test.bin", "rb"); // "rb" = read in binary mode ("r" is text mode, which can alter or cut off binary data on Windows). All I got so far..
The problem with what you're asking is that there's no standard method of storing a file inside another file. In other words, suppose I find the string "foo.dll" inside a file. Does that mean foo.dll is contained in that file? Where? How big is it? What if the string is just part of a larger string that happens to contain those characters?
If you know the format of this binary file then it's a different story. For example, if it's a zip file or a tar then it's possible, but you'd be better off using a command line program to extract the files.
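To illustrate why a known format makes this possible: in a tar archive every member is preceded by a 512-byte header that stores its name and its size (as octal text), so you know exactly where each file starts and how many bytes to dump. In practice the tar tool does this for you; the sketch below only shows what the format gives you. It assumes a plain POSIX ustar archive already loaded into memory, ignores the ustar prefix field, GNU long-name and pax extension headers, and base-256 sizes, and needs C++17 for std::filesystem. TarEntry, find_in_tar and dump_entries are made-up names for this example.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// One regular file found in the archive (hypothetical type for this sketch).
struct TarEntry {
    std::string name;
    std::vector<unsigned char> data;
};

// Tar stores numbers as octal ASCII text, padded with NULs or spaces.
static std::size_t parse_octal(const unsigned char* p, std::size_t len) {
    std::size_t i = 0, value = 0;
    while (i < len && p[i] == ' ') ++i;  // skip leading space padding
    for (; i < len && p[i] >= '0' && p[i] <= '7'; ++i)
        value = value * 8 + static_cast<std::size_t>(p[i] - '0');
    return value;
}

static bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Walk the 512-byte headers of a ustar archive held in memory and collect the
// regular files whose name ends with ext (e.g. ".dll").
std::vector<TarEntry> find_in_tar(const std::vector<unsigned char>& tar, const std::string& ext) {
    const std::size_t kBlock = 512;
    std::vector<TarEntry> found;
    std::size_t pos = 0;
    while (pos + kBlock <= tar.size()) {
        const unsigned char* h = tar.data() + pos;
        if (h[0] == '\0') break;  // empty name: treat as the end-of-archive zero block
        std::string name(reinterpret_cast<const char*>(h), 100);  // name field: bytes 0..99
        name = name.substr(0, name.find('\0'));
        const std::size_t size = parse_octal(h + 124, 12);        // size field: bytes 124..135
        const unsigned char type = h[156];                         // '0' or NUL = regular file
        const unsigned char* data = h + kBlock;                    // contents follow the header
        if (pos + kBlock + size > tar.size()) break;               // truncated archive
        if ((type == '0' || type == '\0') && ends_with(name, ext))
            found.push_back({name, std::vector<unsigned char>(data, data + size)});
        // Contents are padded with zeros up to the next 512-byte boundary.
        pos += kBlock + (size + kBlock - 1) / kBlock * kBlock;
    }
    return found;
}

// Write each entry to dir/<last path component>, opening the output in binary mode.
void dump_entries(const std::vector<TarEntry>& entries, const std::filesystem::path& dir) {
    std::filesystem::create_directories(dir);
    for (const auto& e : entries) {
        const std::filesystem::path out_name = std::filesystem::path(e.name).filename();
        if (out_name.empty()) continue;
        std::ofstream out(dir / out_name, std::ios::binary);
        out.write(reinterpret_cast<const char*>(e.data.data()),
                  static_cast<std::streamsize>(e.data.size()));
    }
}
```

dump_entries keeps only the last path component of each stored name, so an entry such as ../../evil.dll cannot escape the output directory. The size field is what answers the "how big is it?" question; a random binary has nothing equivalent.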
Don't do it in C++. Just use tar. If the tar file is f.tar and you want to extract myfile.dll, then I think the command line is tar xf f.tar myfile.dll
Regardless, you will do something like this:
- open the file for reading in binary mode ("rb" with fopen(), or std::ios::binary with std::ifstream).
- seek to the end (fseek(fp, 0, SEEK_END)) and read the file position with ftell(). This is the file size in bytes. (Opening in append mode to land at the end is tempting, but it needs write access, creates the file if it is missing, and with Microsoft's C runtime ftell() reports the start of a file just opened for appending until the first I/O operation.)
- reset the file position to the start (fseek(fp, 0, SEEK_SET) or rewind(fp)) instead of closing and reopening.
- allocate a buffer (a std::vector<char> in C++, or a malloc'd array in C) big enough to hold the whole file or a large chunk of it. I don't worry about this anymore until files are more than 4-8 GB, depending somewhat on the machine's RAM. Fill it with fread() in C or istream::read() in C++; they work much the same way.
- run through the file as bytes, looking for whatever you want. For example, search for the period character, and when you find one, check whether the surrounding bytes look like a path or file name. If they fit the pattern, extract the substring (you may need to iterate backwards from the period to examine the preceding bytes).
- repeat until you have processed all the data. Drive the loop off the file size: binary files have zero bytes (string terminators) all over the place, so trying to detect the end from the content is unreliable.
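The steps above can be sketched in C++ roughly as follows, under some assumptions: the file fits in memory; it is opened read-only with std::ios::ate, which starts positioned at the end, so tellg() gives the size and seekg() rewinds without reopening; and the rule for what "looks like a file name" (is_name_char) is a made-up heuristic you will want to tune. NameHit, read_whole_file and find_names are hypothetical names. Note that this can only report where a name appears, not how big any file behind it is, which is exactly the problem raised earlier in the thread.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// A candidate file name found somewhere in the binary data.
struct NameHit {
    std::size_t offset;  // byte offset where the name starts
    std::string name;
};

// Read the whole file into memory. std::ios::ate opens positioned at the end,
// so tellg() is the size in bytes; then rewind and read everything.
std::vector<char> read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff size = in.tellg();
    if (size <= 0) return {};
    std::vector<char> buf(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    in.read(buf.data(), static_cast<std::streamsize>(size));
    buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was actually read
    return buf;
}

// Hypothetical rule for "could be part of a path or file name".
static bool is_name_char(unsigned char c) {
    return std::isalnum(c) || c == '_' || c == '-' || c == '.' ||
           c == '/' || c == '\\' || c == ':';
}

// Scan the raw bytes (bounded by the buffer size, not by NUL bytes) for ext,
// compared case-insensitively, and walk backwards from each hit to recover the name.
std::vector<NameHit> find_names(const std::vector<char>& buf, const std::string& ext) {
    std::vector<NameHit> hits;
    if (ext.empty()) return hits;
    for (std::size_t i = 0; i + ext.size() <= buf.size(); ++i) {
        bool match = true;
        for (std::size_t k = 0; k < ext.size() && match; ++k)
            match = std::tolower(static_cast<unsigned char>(buf[i + k])) ==
                    std::tolower(static_cast<unsigned char>(ext[k]));
        if (!match) continue;
        const std::size_t end = i + ext.size();
        // Reject "foo.dllx": the extension must not run on into more letters or digits.
        if (end < buf.size() && std::isalnum(static_cast<unsigned char>(buf[end]))) continue;
        std::size_t start = i;
        while (start > 0 && is_name_char(static_cast<unsigned char>(buf[start - 1]))) --start;
        if (start == i) continue;  // a bare ".dll" with no name in front of it
        hits.push_back({start, std::string(buf.data() + start, buf.data() + end)});
        i = end - 1;  // resume scanning right after this extension
    }
    return hits;
}
```

For example, find_names(read_whole_file("c:\\test.bin"), ".dll") returns offset/name pairs such as C:\lib\Foo.DLL; deciding whether real file data follows any of them is still up to you.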
Data mining of this sort requires you to be rather clever when you look at the data, to determine that it is text and that this text is what you wanted to see. It's not hard, but you can miss data or print too much unwanted data if you are sloppy with it. It is also really a last resort. If you have a tool (like the tar program!) that can give you the info you want without doing it this way (and it can), then use the tool that knows the file format and is designed to do this job. If you have a random binary file and no clue what is in it, then you can mine it to see what you can find. I have a block of code that dumps all the text from a binary to a text file and spaces out the rest; after running that, I can safely use grep on the result. Even then, I am assuming some things (ASCII, for one) that may or may not be correct.
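The text dump described above might look something like this. It is not the code mentioned there, just a minimal version under the same ASCII assumption: bytes 0x20 to 0x7E plus tab and newline count as text, and everything else becomes a space. The min_run parameter (default 4, similar in spirit to the strings utility) is an added assumption that also blanks very short printable runs, which cuts down on unwanted noise; pass 1 to keep every printable byte. is_text, text_only and dump_text are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Assumes ASCII: printable characters plus tab and newline count as text.
static bool is_text(char ch) {
    const unsigned char c = static_cast<unsigned char>(ch);
    return (c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\n';
}

// Copy runs of at least min_run text bytes and turn every other byte into a
// space, so the output has the same length and offsets as the input.
std::string text_only(const std::vector<char>& buf, std::size_t min_run = 4) {
    std::string out(buf.size(), ' ');
    std::size_t run_start = 0;
    for (std::size_t i = 0; i <= buf.size(); ++i) {
        if (i < buf.size() && is_text(buf[i])) continue;  // still inside a text run
        // bytes [run_start, i) form a text run; keep it only if it is long enough
        if (i - run_start >= min_run)
            for (std::size_t k = run_start; k < i; ++k) out[k] = buf[k];
        run_start = i + 1;
    }
    return out;
}

// Write the filtered text to out_path so it can be searched with grep.
bool dump_text(const std::vector<char>& buf, const std::string& out_path, std::size_t min_run = 4) {
    const std::string text = text_only(buf, min_run);
    std::ofstream out(out_path, std::ios::binary);
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(out);
}
```

Because every input byte maps to exactly one output character, an offset reported by grep -b -o on the dump is also the offset of that text in the original binary.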