How do you read the hex/binary OF a file


Hello, I am trying to figure out how to read and write the hex/binary of a file in order to edit it to my choosing, but no matter what I search for, all I find is how to read data from a file.
For example, if I have a text file with some sentences in it, I want to read and write the hex/binary of the file, not the sentences in the file.

This is part of my project for my computer science class, and if possible I need to edit it through C++ code, not a hex/binary editor.

Any and all help is appreciated. Thank you.

SamuelAdams (1535)

http://www.cplusplus.com/doc/tutorial/files/

JLBorges (13770)

This should help to get you started:

#include <cstddef>
#include <iostream>
#include <fstream>
#include <limits>
#include <bitset>
#include <string>

using byte = unsigned char ; // C++17: we can use std::byte if we please

// number of bits in a byte (in practice, a byte is an octet ie. BITS_PER_BYTE == 8)
// http://en.cppreference.com/w/cpp/types/numeric_limits/digits
constexpr std::size_t BITS_PER_BYTE = std::numeric_limits<byte>::digits ;

// return the bits read from the file represented as a string of chars '0' or '1'
std::string read_bits( const std::string& file_name )
{
    // http://en.cppreference.com/w/cpp/utility/bitset
    using bits = std::bitset<BITS_PER_BYTE> ; // bitset of bits in one byte

    // if the file was successfully opened for input (note: binary mode)
    if( std::ifstream file{ file_name, std::ios::binary } )
    {
        std::string result ;

        file >> std::noskipws ; // read every byte including white space

        byte b ;
        while( file >> b ) // for each byte read from the file
            result += bits(b).to_string() ; // append the bits in this byte to the result
                             // http://en.cppreference.com/w/cpp/utility/bitset/to_string
        return result ;
    }

    else return {} ; // the file could not be opened: return an empty string
}

// TO DO: write the bits represented as a string of chars '0' or '1' into the file
bool write_bits( const std::string& file_name, const std::string& bit_str ) ; // return true on success


// print the bits (default: to stdout) in a human readable form
std::ostream& print_bits( const std::string& bit_str, std::ostream& stm = std::cout )
{
    constexpr std::size_t BYTES_PER_LINE = 8 ;

    std::size_t nbits = 0 ;
    std::size_t nbytes = 0 ;
    for( char bit : bit_str ) // for each bit in the string
    {
        stm << bit ; // print the bit

        if( ++nbits%BITS_PER_BYTE == 0 ) // end of a byte
        {
            stm << ' ' ; // put a space after every byte

            // put a new line after every BYTES_PER_LINE bytes
            if( ++nbytes%BYTES_PER_LINE == 0 ) stm << '\n' ;
        }
    }

    return stm ;
}

int main()
{
    std::string str_bits = read_bits( __FILE__ ) ; // use this file for testing

    std::cout << "read " << str_bits.size() / BITS_PER_BYTE << " bytes ("
              << str_bits.size() << " bits)\n------------------------------\n" ;

    print_bits(str_bits);

    // edit the string of bits if required
    // call write_bits to write the modified bits into a file
}
// TEST: 0123ABCDabcd

Duthomhas (13309)

There is no such thing as a "hex" file.
Likewise, "hex" is not the same thing as "binary".

(1)
The first problem comes from how things are represented.

The number five (for example) is really an imaginary thing. I can show you five fingers, or count five years that pass, or draw a funny-looking scribble that represents the number five, but a five itself is not real.

We people of the ten fingers have, conveniently, ten digits we use to represent numbers: 0,1,2,3,4,5,6,7,8, and 9. Numbers like five are, also conveniently, small enough to be represented by a single digit.

But computers do not have ten digits. They have two: 0 and 1. This is called binary, because there are two digits.

Another way to refer to the number of digits in a counting system is its radix, also called the "base". In fourth grade, we all learned that 123 is really the same as:

    1×100 + 2×10 + 3

We people who have survived seventh grade math know how to write that better:

    1×10² + 2×10¹ + 3×10⁰

That is, every digit place is a power of the base. For us ten-fingered humans, we like a radix (or base) of 10.

Computers like a radix of 2. Here is our five:

    1×2² + 0×2¹ + 1×2⁰

Verify with a little simplification (in our base 10):

    1×4 + 0×2 + 1×1
    4 + 0 + 1
    5

tl;dr
The whole thing you should be getting out of this is that "hex" (hexadecimal) and "binary" and even "decimal" are all human friendly representations of a number. A file is nothing but a list of numbers. The way we view them depends entirely on how we want to view them.

(2)
"Text" and "binary" are another wrinkle in how we view things. The difference is how the computer displays the numbers in a file to us.

A "text" file means that we want the computer to transform the numbers into letters and stuff. For example, a 65 means the computer displays an uppercase 'A'.

A "binary" file means that we are not interested in having the computer transform the numbers for us. We'll decide what the numbers mean.

(3)
To do your CS project, read the file in binary mode. For each number (unsigned char) you get from the file, display it as hex/decimal/binary/whatever you want.

Good luck!

lastchance (6980)

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
using namespace std;


string toBase( unsigned i, unsigned base )
{ 
   const string DIGITS = "0123456789ABCDEF";
   return i ? toBase( i / base, base ) + DIGITS[i%base] : "";
}


int main( int argc, char *argv[] )
{
   string filename = argv[1];
   unsigned base = atoi( argv[2] );

   const int LINELENGTH = 10;

   int width = toBase( 255, base ).size();
   char c;
   int counter = 0;

   ifstream in( filename, ios::binary );
   while ( in.get( c ) )
   {
      cout << setw( width + 1 ) << toBase( (unsigned)c, base );
      counter++;
      if ( counter%LINELENGTH == 0 ) cout << '\n';
   }
}

a.exe ThisFile.cpp 16

 23 69 6E 63 6C 75 64 65 20 3C
 69 6F 73 74 72 65 61 6D 3E  D
  A 23 69 6E 63 6C 75 64 65 20
 3C 69 6F 6D 61 6E 69 70 3E  D
  A 23 69 6E 63 6C 75 64 65 20
 3C 66 73 74 72 65 61 6D 3E  D
  A 23 69 6E 63 6C 75 64 65 20
 3C 73 74 72 69 6E 67 3E  D  A
 75 73 69 6E 67 20 6E 61 6D 65
 73 70 61 63 65 20 73 74 64 3B
  D  A  D  A  D  A 73 74 72 69
 6E 67 20 74 6F 42 61 73 65 28
 20 75 6E 73 69 67 6E 65 64 20
 69 2C 20 75 6E 73 69 67 6E 65
 64 20 62 61 73 65 20 29  D  A
 7B 20  D  A 20 20 20 63 6F 6E
 73 74 20 73 74 72 69 6E 67 20
 44 49 47 49 54 53 20 3D 20 22
 30 31 32 33 34 35 36 37 38 39
 41 42 43 44 45 46 22 3B  D  A
 20 20 20 72 65 74 75 72 6E 20
 69 20 3F 20 74 6F 42 61 73 65
 28 20 69 20 2F 20 62 61 73 65
 2C 20 62 61 73 65 20 29 20 2B
 20 44 49 47 49 54 53 5B 69 25
 62 61 73 65 5D 20 3A 20 22 22
 3B  D  A 7D  D  A  D  A  D  A
 69 6E 74 20 6D 61 69 6E 28 20
 69 6E 74 20 61 72 67 63 2C 20
 63 68 61 72 20 2A 61 72 67 76
 5B 5D 20 29  D  A 7B  D  A 20
 20 20 73 74 72 69 6E 67 20 66
 69 6C 65 6E 61 6D 65 20 3D 20
 61 72 67 76 5B 31 5D 3B  D  A
 20 20 20 75 6E 73 69 67 6E 65
 64 20 62 61 73 65 20 3D 20 61
 74 6F 69 28 20 61 72 67 76 5B
 32 5D 20 29 3B  D  A  D  A 20
 20 20 63 6F 6E 73 74 20 69 6E
 74 20 4C 49 4E 45 4C 45 4E 47
 54 48 20 3D 20 31 30 3B  D  A
  D  A 20 20 20 69 6E 74 20 77
 69 64 74 68 20 3D 20 74 6F 42
 61 73 65 28 20 32 35 35 2C 20
 62 61 73 65 20 29 2E 73 69 7A
 65 28 29 3B  D  A 20 20 20 63
 68 61 72 20 63 3B  D  A 20 20
 20 69 6E 74 20 63 6F 75 6E 74
 65 72 20 3D 20 30 3B  D  A  D
  A 20 20 20 69 66 73 74 72 65
 61 6D 20 69 6E 28 20 66 69 6C
 65 6E 61 6D 65 2C 20 69 6F 73
 3A 3A 62 69 6E 61 72 79 20 29
 3B  D  A 20 20 20 77 68 69 6C
 65 20 28 20 69 6E 2E 67 65 74
 28 20 63 20 29 20 29  D  A 20
 20 20 7B  D  A 20 20 20 20 20
 20 63 6F 75 74 20 3C 3C 20 73
 65 74 77 28 20 77 69 64 74 68
 20 2B 20 31 20 29 20 3C 3C 20
 74 6F 42 61 73 65 28 20 28 75
 6E 73 69 67 6E 65 64 29 63 2C
 20 62 61 73 65 20 29 3B  D  A
 20 20 20 20 20 20 63 6F 75 6E
 74 65 72 2B 2B 3B  D  A 20 20
 20 20 20 20 69 66 20 28 20 63
 6F 75 6E 74 65 72 25 4C 49 4E
 45 4C 45 4E 47 54 48 20 3D 3D
 20 30 20 29 20 63 6F 75 74 20
 3C 3C 20 27 5C 6E 27 3B  D  A
 20 20 20 7D  D  A 7D  D  A

Duthomhas (13309)

LOL, me too!

#include <cctype>
#include <ciso646>
#include <cstdio>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>

#ifdef _WIN32
  #include <io.h>
  #include <fcntl.h>
  #define isatty _isatty
  #define fileno _fileno
#else // POSIX
  #include <unistd.h>
#endif

/*
-------------------------------------------------------------------------------
1234567890  01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16  1234567890123456
*/
void dump( std::istream& ins )
{
  std::cout << std::setfill( '0' ) << std::hex << std::uppercase;

  std::size_t offset = 0;
  while (ins)
  {
    char s[ 16 ];
    std::size_t n, i;

    std::cout << std::setw( 10 ) << (offset & 0xFFFFFFFFFF) << "  ";
    offset += sizeof(s);

    ins.read( s, sizeof(s) );
    n = ins.gcount();

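    // Convert through unsigned char so that bytes >= 0x80 print as two hex digits
    for (i = 0; i < n; i++)
      std::cout << std::setw( 2 ) << (int)(unsigned char)s[ i ] << " ";

    while (i++ < sizeof(s))
      std::cout << "   ";
    std::cout << "  ";

    // std::isprint requires a value representable as unsigned char
    for (i = 0; i < n; i++)
      if (std::isprint( (unsigned char)s[ i ] )) std::cout << s[i];
      else                                       std::cout << ".";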

    std::cout << "\n";
  }
}

//-----------------------------------------------------------------------------
int main( int argc, char** argv )
{
  if (argc < 2)
  {
    // Display usage if stdin is a human
    if (isatty( fileno(stdin) ))
    {
      std::string filename = argv[ 0 ];
      filename.erase( 0, filename.find_last_of( "/\\" ) + 1 );
      auto n = filename.rfind( '.' );
      if ((n > 0) and (n != filename.npos)) filename.erase( n );

      std::cerr << 
        "usage:\n"
        "  " << filename << " FILENAME ...\n"
        "  " << filename << " < FILENAME\n"
        "  SOMEPROG | " << filename << "\n\n"
        "Display the hexadecimal dump of each named file.\n\n";
      return 0;
    }

    // Dump stdin
    #ifdef _WIN32
    _setmode( fileno(stdin), _O_BINARY );
    #endif
    
    dump( std::cin );
  }
  else
  {
    // Dump all named files
    for (int n = 1; n < argc; n++)
    {
      std::ifstream f( argv[ n ], std::ios::binary );
      if (!f) 
      {
        std::cout << "Could not open " << argv[ n ] << "\n";
        continue;
      }
      std::cout << argv[ n ] << "\n";
      dump( f );
    }
  }
}

C:\prog> dump test.txt
test.txt
0000000000  42 65 61 6E 73 2C 20 62 65 61 6E 73 2C 20 67 6F   Beans, beans, go
0000000010  6F 64 20 66 6F 72 20 74 68 65 20 68 65 61 72 74   od for the heart
0000000020  21 0D 0A 54 68 65 20 6D 6F 72 65 20 79 6F 75 20   !..The more you
0000000030  65 61 74 2C 20 74 68 65 20 6D 6F 72 65 20 79 6F   eat, the more yo
0000000040  75 20 66 61 72 74 2E 0D 0A 54 68 65 20 6D 6F 72   u fart...The mor
0000000050  65 20 79 6F 75 20 66 61 72 74 2C 20 74 68 65 20   e you fart, the
0000000060  62 65 74 74 65 72 20 79 6F 75 20 66 65 65 6C 2E   better you feel.
0000000070  0D 0A 53 6F 20 65 61 74 20 6D 6F 72 65 20 62 65   ..So eat more be
0000000080  61 6E 73 20 77 69 74 68 20 65 76 65 72 79 20 6D   ans with every m
0000000090  65 61 6C 21 0D 0A                                 eal!..


jonnin (11497)

Adding to the above to give some context:

Hex, binary, and text are 3 words that are used oddly by a lot of programmers and even computer literate people.

At the purest level:
hex is simply the base 16 visualization of an integer**.
binary is simply the base 2 visualization of an integer.
text is a subset of all data that contains (almost exclusively) human-readable content (usually ASCII, but Unicode is sort of text and sort of readable in a hex editor).

Common uses:
hex meaning binary meaning "not text" (text is also viewable in hex!)
binary meaning 'not text' (this is mostly correct, except that text files are also binary files; it's a valid distinction for humans to communicate, but technically incorrect).
binary meaning 'executable/compiled' files (this makes no sense to me).
There may be some other uses similar to these; you will get a feel for it.

At the pure level, all files are a group of bytes, which can be viewed in hex, decimal, binary, octal, and as text (some bytes in "binary" files will give unprintable or nonsense symbols, though). The data is the same, but how you view it helps apply context and makes various tasks easier. To give a couple of examples: it is difficult to type up a read-me in a hex editor, and it is usually not possible to modify an executable in a text editor and have it still execute after saving.

**Note that floating point values are made up of bytes, which are integers, and can be viewed in hex or binary, though their true floating point value is not easy to read off in this format. Most good hex editors will show the float/double value of the bytes you are moused over.

ProtoType25 (2)

Thank you everyone for helping me, this is exactly what I was looking for!

JLBorges (13770)

Both C and C++ make a clear, well-defined distinction between text and binary streams.

Text streams (eg. files read or written in text mode):

A text stream is an ordered sequence of characters composed into lines (zero or more characters plus a terminating '\n'). Whether the last line requires a terminating '\n' is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to the conventions for representing text in the OS (in particular, C streams on Windows OS convert \n to \r\n on output, and convert \r\n to \n on input)

Data read in from a text stream is guaranteed to compare equal to the data that were earlier written out to that stream only if all of the following is true:

. the data consist only of printing characters and the control characters \t and \n (in particular, on Windows OS, the character '\x1A' terminates input)
. no \n is immediately preceded by a space character (space characters that are written out immediately before a \n may disappear when read)
. the last character is \n

Binary streams (eg. files read or written in binary mode):

A binary stream is an ordered sequence of characters that can transparently record internal data. Data read in from a binary stream always equals the data that were earlier written out to that stream. Implementations are only allowed to append a number of null characters to the end of the stream. A wide binary stream doesn't need to end in the initial shift state.

http://en.cppreference.com/w/cpp/io/c#Binary_and_text_modes

This distinction between binary and text files is also important in other (non C/C++) contexts; for example, FTP handles text files (ASCII and EBCDIC modes) differently from binary files (Image mode).

jonnin (11497)

Yes, but a lot of programmers throw the terms around much more loosely. It can be confusing to new programmers if the terms were used carelessly.

JLBorges (13770)

> It can be confusing to new programmers if the terms were used carelessly.

It would be even more confusing to new programmers (and potentially more damaging; because they may be naive enough to believe you) to assert in stentorian tones that there is no difference between files opened in binary and text modes.

jonnin (11497)

I did not mean to assert or say that at all!
You can open a text file in binary mode, but I never meant to say you should always do that (it's handy for a few specific use cases and good to know that you can), and opening a binary file as text simply does not work.

If that was confusing, I apologize.
