Hello, I am trying to figure out how to read and write the hex/binary of a file in order to edit it to my choosing, but no matter what I search, it's all about how to read data from a file.
For example, if I have a text file with some sentences in it, I want to read and write the hex/binary of the file, not the sentences in the file.
This is part of a project for my computer science class, and if possible I need to edit it through C++ code, not a hex/binary editor.
#include <iostream>
#include <fstream>
#include <limits>
#include <bitset>
#include <string>
using byte = unsigned char ; // C++17: we can use std::byte if we please
// number of bits in a byte (in practice, a byte is an octet ie. BITS_PER_BYTE == 8)
// http://en.cppreference.com/w/cpp/types/numeric_limits/digits
constexpr std::size_t BITS_PER_BYTE = std::numeric_limits<byte>::digits ;
// return the bits read from the file represented as a string of chars '0' or '1'
std::string read_bits( const std::string& file_name )
{
// http://en.cppreference.com/w/cpp/utility/bitset
using bits = std::bitset<BITS_PER_BYTE> ; // bitset of bits in one byte
// if the file was successfully opened for input (note: binary mode)
if( std::ifstream file{ file_name, std::ios::binary } )
{
std::string result ;
file >> std::noskipws ; // read every byte including white space
byte b ;
while( file >> b ) // for each byte read from the file
result += bits(b).to_string() ; // append the bits in this byte to the result
// http://en.cppreference.com/w/cpp/utility/bitset/to_string
return result ;
}
else return {} ; // read failure: return an empty string
}
// TO DO: write the bits represented as a string of chars '0' or '1' into the file
bool write_bits( const std::string& file_name ) ; // return true on success
// print the bits (default: to stdout) in a human readable form
std::ostream& print_bits( const std::string& bit_str, std::ostream& stm = std::cout )
{
constexpr std::size_t BYTES_PER_LINE = 8 ;
std::size_t nbits = 0 ;
std::size_t nbytes = 0 ;
for( char bit : bit_str ) // for each bit in the string
{
stm << bit ; // print the bit
if( ++nbits%BITS_PER_BYTE == 0 ) // end of a byte
{
stm << ' ' ; // put a space after every byte
// put a new line after every BYTES_PER_LINE bytes
if( ++nbytes%BYTES_PER_LINE == 0 ) stm << '\n' ;
}
}
return stm ;
}
int main()
{
std::string str_bits = read_bits( __FILE__ ) ; // use this file for testing
std::cout << "read " << str_bits.size() / BITS_PER_BYTE << " bytes ("
<< str_bits.size() << " bits)\n------------------------------\n" ;
print_bits(str_bits);
// edit the string of bits if required
// call write_bits to write the modified bits into a file
}
// TEST: 0123ABCDabcd
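The write_bits step is left as a TO DO above. One possible way to fill it in is sketched below. Note the assumptions: this version takes the bit string as a second parameter (the declaration above only takes a file name, so this signature is hypothetical), it assumes 8-bit bytes, and it rejects any string that is not a whole number of bytes.

```cpp
#include <bitset>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical signature: takes the bit string as well as the file name.
// Returns true on success.
bool write_bits( const std::string& file_name, const std::string& bit_str )
{
    constexpr std::size_t BITS_PER_BYTE = 8 ; // assumption: a byte is an octet

    // reject input that is not a whole number of bytes, or that contains
    // anything other than the characters '0' and '1'
    if( bit_str.size() % BITS_PER_BYTE != 0 ) return false ;
    if( bit_str.find_first_not_of( "01" ) != std::string::npos ) return false ;

    std::ofstream file{ file_name, std::ios::binary } ; // note: binary mode
    if( !file ) return false ;

    for( std::size_t i = 0 ; i < bit_str.size() ; i += BITS_PER_BYTE )
    {
        // parse the next eight characters as the bits of one byte
        const std::bitset<BITS_PER_BYTE> bits( bit_str, i, BITS_PER_BYTE ) ;
        file.put( static_cast<char>( bits.to_ulong() ) ) ;
    }

    return static_cast<bool>( file.flush() ) ; // false if the write failed
}
```

For example, writing the string "0100000101000010" should produce a two-byte file containing the bytes 65 and 66, which a text editor shows as AB.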
There is no such thing as a "hex" file.
Likewise, "hex" is not the same thing as "binary".
(1)
The first problem comes from how things are represented.
The number five (for example) is really an imaginary thing. I can show you five fingers, or count five years that pass, or draw a funny-looking scribble that represents the number five, but a five itself is not real.
We people of the ten fingers have, conveniently, ten digits we use to represent numbers: 0,1,2,3,4,5,6,7,8, and 9. Numbers like five are, also conveniently, small enough to be represented by a single digit.
But computers do not have ten digits. They have two: 0 and 1. This is called binary, because there are two digits.
Another way to refer to the number of digits in a counting system is its radix, also called the "base". In fourth grade, we all learned that 123 is really the same as:
1×100 + 2×10 + 3
We people who have survived seventh grade math know how to write that better:
1×10² + 2×10¹ + 3×10⁰
That is, every digit place is a power of the base. For us ten-fingered humans, we like a radix (or base) of 10.
Computers like a radix of 2. Here is our five:
1×2² + 0×2¹ + 1×2⁰
Verify with a little simplification (in our base 10):
1×4 + 0×2 + 1×1
4 + 0 + 1
5
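The same digit-by-digit arithmetic can be written as a small function. This is only a sketch: positional_value is a made-up helper name, it handles radixes 2 through 10 only, and it does no error or overflow checking.

```cpp
#include <string>

// Value of a string of digits '0'-'9' interpreted in the given radix.
// Hypothetical helper for illustration; radix is assumed to be 2..10.
unsigned long positional_value( const std::string& digits, unsigned radix )
{
    unsigned long value = 0 ;
    for( char d : digits )
    {
        // move everything seen so far one place to the left, then add this digit
        value = value * radix + static_cast<unsigned long>( d - '0' ) ;
    }
    return value ;
}
```

With this, positional_value( "123", 10 ) is 123 and positional_value( "101", 2 ) is 5. The standard library's std::stoul( "101", nullptr, 2 ) does the same conversion, with error checking.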
tl;dr
The whole thing you should be getting out of this is that "hex" (hexadecimal) and "binary" and even "decimal" are all human friendly representations of a number. A file is nothing but a list of numbers. The way we view them depends entirely on how we want to view them.
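To make that concrete, here is a rough sketch that renders one number three ways. The helper names are made up for this example, and it assumes 8-bit bytes.

```cpp
#include <bitset>
#include <sstream>
#include <string>

// base 2: always eight digits, assuming an 8-bit byte
std::string as_binary( unsigned char n )
{
    return std::bitset<8>( n ).to_string() ;
}

// base 16
std::string as_hex( unsigned char n )
{
    std::ostringstream out ;
    // cast so the stream prints a number, not a character
    out << std::hex << static_cast<unsigned>( n ) ;
    return out.str() ;
}

// base 10
std::string as_decimal( unsigned char n )
{
    return std::to_string( static_cast<unsigned>( n ) ) ;
}
```

The number five comes out as "00000101", "5", and "5". The number 255 comes out as "11111111", "ff", and "255". It is the same number each time; only the view changes.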
(2)
"Text" and "binary" are another wrinkle in how we view things. The difference is how the computer displays the numbers in a file to us.
A "text" file means that we want the computer to transform the numbers into letters and stuff. For example, a 65 means the computer displays an uppercase 'A'.
A "binary" file means that we are not interested in having the computer transform the numbers for us. We'll decide what the numbers mean.
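As a small illustration of the two views, the sketch below takes the same list of numbers and shows it once as characters and once as plain numbers. The function names are invented for the example, and the character values assume an ASCII-compatible character set.

```cpp
#include <string>
#include <vector>

// "text" view: let each number stand for a character
std::string as_text( const std::vector<unsigned char>& bytes )
{
    return std::string( bytes.begin(), bytes.end() ) ;
}

// "binary" view: show the numbers themselves (here in decimal)
std::string as_numbers( const std::vector<unsigned char>& bytes )
{
    std::string result ;
    for( unsigned char b : bytes )
    {
        if( !result.empty() ) result += ' ' ; // separate the numbers with spaces
        result += std::to_string( static_cast<unsigned>( b ) ) ;
    }
    return result ;
}
```

Given the bytes 72 and 105, as_text gives "Hi" and as_numbers gives "72 105".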
(3)
To do your CS project, read the file in binary mode. For each number (unsigned char) you get from the file, display it as hex/decimal/binary/whatever you want.
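A minimal sketch of that approach might look like the following. read_bytes and to_hex are names I made up, and the output format (two lowercase hex digits per byte, separated by spaces) is just one choice among many.

```cpp
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// read every byte of the file, unmodified (note: binary mode);
// returns an empty vector if the file cannot be opened
std::vector<unsigned char> read_bytes( const std::string& file_name )
{
    std::vector<unsigned char> bytes ;
    std::ifstream file{ file_name, std::ios::binary } ;
    char c ;
    while( file.get( c ) ) bytes.push_back( static_cast<unsigned char>( c ) ) ;
    return bytes ;
}

// render the bytes as two-digit hex numbers separated by spaces
std::string to_hex( const std::vector<unsigned char>& bytes )
{
    std::ostringstream out ;
    out << std::hex << std::setfill( '0' ) ;
    for( std::size_t i = 0 ; i < bytes.size() ; ++i )
    {
        if( i != 0 ) out << ' ' ;
        // cast so the stream prints a number, not a character
        out << std::setw( 2 ) << static_cast<unsigned>( bytes[i] ) ;
    }
    return out.str() ;
}
```

For a file that contains just the letter A followed by a newline character, to_hex( read_bytes( name ) ) should give "41 0a". To edit the file, change values in the vector and write them back out through a stream that was also opened in binary mode.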
Hex, binary, and text are 3 words that are used oddly by a lot of programmers and even computer literate people.
At the purest level:
hex is simply the base 16 visualization of an integer**.
binary is simply the base 2 visualization of an integer.
text is a subset of all data that contains (almost exclusively) human readable content (usually ASCII, but Unicode is sort of text and sort of readable in a hex editor).
common uses:
hex used to mean binary, which in turn is used to mean "not text" (text is also viewable in hex!)
binary meaning 'not text' (this is mostly correct, except that text files are also binary files, so it's a valid distinction for humans to communicate, but technically incorrect).
binary meaning 'executable/compiled' files (this makes no sense to me).
There may be some other similar uses; you will get a feel for them.
At the pure level, all files are a group of bytes, which can be viewed in hex, decimal, binary, octal, or as text (some bytes in "binary" files will show up as unprintable or nonsense symbols, though). The data is the same, but how you view it supplies context and makes various tasks easier. For example, it is difficult to type up a read-me in a hex editor, and it is usually not possible to modify an executable in a text editor and have it still execute after saving.
**note that floating point values are made up of bytes that are integers, and can be viewed in hex or binary, though they are not easy to comprehend as their true floating point value in this format. Most good hex editors will show the float/double value of the bytes you are moused over.
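If you want to see this for yourself, something like the sketch below copies the bytes of a float into an integer so they can be printed in hex or binary. It assumes a 32-bit IEEE 754 float (checked at compile time), and float_bits is a made-up name.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert( std::numeric_limits<float>::is_iec559 &&
               sizeof(float) == sizeof(std::uint32_t),
               "this sketch assumes a 32-bit IEEE 754 float" ) ;

// the bytes of a float, reinterpreted as one unsigned 32-bit integer
std::uint32_t float_bits( float f )
{
    std::uint32_t bits ;
    std::memcpy( &bits, &f, sizeof bits ) ; // copy the raw bytes; no numeric conversion
    return bits ;
}
```

Under those assumptions, float_bits( 1.0f ) is 0x3F800000, which looks nothing like "1" when viewed in hex.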
Both C and C++ make a clear, well-defined distinction between text and binary streams.
Text streams (eg. files read or written in text mode):
A text stream is an ordered sequence of characters composed into lines (zero or more characters plus a terminating '\n'). Whether the last line requires a terminating '\n' is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to the conventions for representing text in the OS (in particular, C streams on Windows OS convert \n to \r\n on output, and convert \r\n to \n on input)
Data read in from a text stream is guaranteed to compare equal to the data that were earlier written out to that stream only if all of the following are true:
. the data consist only of printing characters and the control characters \t and \n (in particular, on Windows OS, the character '\x1A' terminates input)
. no \n is immediately preceded by a space character (space characters that are written out immediately before a \n may disappear when read)
. the last character is \n
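The three conditions above can be checked mechanically. The following is a rough sketch only: safe_for_text_stream is a hypothetical name, "printing character" is taken to mean whatever std::isprint accepts in the current C locale, "space character" is taken to mean ' ', and treating empty data as safe is my own assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// true if the data meets the three conditions for a guaranteed
// round trip through a text stream
bool safe_for_text_stream( const std::string& data )
{
    if( data.empty() ) return true ;          // assumption: nothing to alter
    if( data.back() != '\n' ) return false ;  // the last character must be \n

    for( std::size_t i = 0 ; i < data.size() ; ++i )
    {
        const unsigned char c = static_cast<unsigned char>( data[i] ) ;
        if( c == '\n' )
        {
            // no \n may be immediately preceded by a space
            if( i > 0 && data[i-1] == ' ' ) return false ;
        }
        else if( c != '\t' && !std::isprint( c ) )
        {
            return false ; // some other control or non-printing character
        }
    }
    return true ;
}
```

So "hello\n" passes, while "hello" (no final \n), "a \n" (space before \n), and anything containing '\x1A' do not.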
Binary streams (eg. files read or written in binary mode):
A binary stream is an ordered sequence of characters that can transparently record internal data. Data read in from a binary stream always equal the data that were earlier written out to that stream. Implementations are only allowed to append a number of null characters to the end of the stream. A wide binary stream doesn't need to end in the initial shift state.
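A quick way to see the binary-mode guarantee in action is to write some awkward bytes and read them back. This is a sketch, with binary_round_trip being an invented name. It relies on the implementation not appending null characters, which the text above permits but which common desktop implementations do not do.

```cpp
#include <fstream>
#include <string>

// write the data to a file in binary mode, then read the file back in binary mode
std::string binary_round_trip( const std::string& file_name, const std::string& data )
{
    {
        std::ofstream out{ file_name, std::ios::binary } ;
        out.write( data.data(), static_cast<std::streamsize>( data.size() ) ) ;
    } // the output file is closed here

    std::ifstream in{ file_name, std::ios::binary } ;
    std::string result ;
    char c ;
    while( in.get( c ) ) result += c ; // get() does not skip or translate anything
    return result ;
}
```

Data such as a space followed by \r\n, the character '\x1A', or an embedded null byte should come back unchanged, even though none of it would be safe to push through a text stream.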
This distinction between binary and text files is also important in other (non C/C++) contexts; for example FTP handles text files (ASCII and EBCDIC modes) differently from binary files (Image mode).
Yes, but a lot of programmers throw the terms around much more loosely. It can be confusing to new programmers if the terms were used carelessly.
> It can be confusing to new programmers if the terms were used carelessly.
It would be even more confusing to new programmers (and potentially more damaging; because they may be naive enough to believe you) to assert in stentorian tones that there is no difference between files opened in binary and text modes.
I did not mean to assert or say that at all!
You can open a text file in binary mode, but I never meant to say you should always do that (it's handy for a few specific use cases and good to know that you can), and opening binary as text simply does not work.
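One such use case, as an example of my own choosing: checking which line endings a text file really contains. In text mode on Windows the \r\n pairs are converted to \n before you ever see them, so you have to open the file in binary mode to count them. count_crlf is a made-up name for this sketch.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// count the \r\n pairs stored in a file (returns 0 if the file cannot be opened)
std::size_t count_crlf( const std::string& file_name )
{
    std::ifstream file{ file_name, std::ios::binary } ; // binary mode: no translation
    std::size_t count = 0 ;
    char previous = '\0' ;
    char c ;
    while( file.get( c ) )
    {
        if( previous == '\r' && c == '\n' ) ++count ;
        previous = c ;
    }
    return count ;
}
```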