Convert binary to base 58

I want to convert a long binary number, 256 bits, into a base 58 char array using C code. I cannot find a simple explanation. The Bing tool insists on converting to decimal and then to base 58. I don’t want to use a library; I want to write the code myself. So, is this the right concept?

I use a union which includes an array of four uint64_t to give me 256 bits. The binary number is put in the union.

Presume I know how to do multi-word divides and mod operations. I have never done it but I think so. If not, that is a different question. This is my idea of the concept.

remainder = long_number mod 58; // This takes several steps to work through the four 64 bit ints.

long_number = long_number / 58; // again several steps.

The first remainder becomes the least significant character of the base 58 string.
(I am aware that binary value 0 becomes character ‘1’, value 1 becomes character ‘2’, etc.)

Repeat those two steps. Each following remainder becomes the next most significant base 58 character.

I will eventually reach one of two conditions, and I am uncertain of the last step:
1) Quotient is zero, remainder > 0
Remainder becomes the most significant base 58 character
2) Quotient is > 0, remainder = 0
Quotient becomes the most significant base 58 character

Am I on target?
Thank you for your time.

mbozzi (3925)

Start by taking the remainder to find the next digit. Then divide. When the quotient of the division is zero, you're done; you've already got the last digit. Don't stop until the division produces a zero.

For example let's convert the decimal 123456 to base 58.

123456 % 58 = 32    // first (least significant) digit
123456 / 58 = 2128  
2128   % 58 = 40    // second digit
2128   / 58 = 36
36     % 58 = 36    // third digit
36     / 58 = 0     // we're done

Sure enough 123456 can be written as 36×58^2 + 40×58^1 + 32×58^0.

Multi-word divide is a little more complicated!
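
A minimal sketch of one way to do the multi-word step in C, assuming the 256-bit value is held as four uint64_t words with the most significant word first. The function name divmod_small and that word order are assumptions made for illustration; nothing in the thread fixes them. Each 64-bit word is handled as two 32-bit halves, so the running remainder combined with a half always fits in a uint64_t and no 128-bit type is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Divide a 256-bit number by a small divisor, in place.
   words[0] is assumed to be the MOST significant 64-bit word.
   Returns the remainder (always less than divisor). */
static uint32_t divmod_small(uint64_t words[4], uint32_t divisor)
{
    uint64_t rem = 0;

    for (size_t i = 0; i < 4; ++i) {
        /* High half: the previous remainder becomes the upper 32 bits. */
        uint64_t hi = (rem << 32) | (words[i] >> 32);
        uint64_t q_hi = hi / divisor;
        rem = hi % divisor;

        /* Low half: same step again with the new remainder. */
        uint64_t lo = (rem << 32) | (words[i] & 0xFFFFFFFFu);
        uint64_t q_lo = lo / divisor;
        rem = lo % divisor;

        /* Both partial quotients fit in 32 bits because rem < divisor. */
        words[i] = (q_hi << 32) | q_lo;
    }
    return (uint32_t)rem;
}
```

Calling it repeatedly with a divisor of 58 until all four words are zero would yield the base 58 digit values, least significant first.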

kigar64551 (803)

Something like this?

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const char* const ASCII_CHARS = "!\"#$ % &'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

void reverse_str(char* const str_buffer, const size_t len)
{
    size_t i, j;
    char temp;
    if (len > 1U) {
        for (i = 0U, j = len - 1U; i < j; i++, j--) {
            temp = str_buffer[i];
            str_buffer[i] = str_buffer[j];
            str_buffer[j] = temp;
        }
    }
}

size_t base58_encode(char* const buff_out, const size_t buff_size, uint64_t value)
{
    size_t out_index = 0U;

    do {
        if (out_index >= buff_size) {
            return 0U;
        }
        buff_out[out_index++] = ASCII_CHARS[value % 58U];
    } while (value /= 58U);

    if (out_index >= buff_size) {
        return 0U;
    }

    buff_out[out_index] = '\0';
    reverse_str(buff_out, out_index);
    return out_index;
}

uint64_t base58_decode(const char* const str)
{
    uint64_t value = 0U;
    const char* p;
    size_t pos, digit;

    for (pos = 0U; str[pos]; ++pos) {
        if ((p = strchr(ASCII_CHARS, str[pos])) != NULL) {
            digit = (size_t)(p - ASCII_CHARS);
            if (digit >= 58U) {
                return 0U; /* decoding error */
            }
            /* Accept only if value * 58 + digit still fits in 64 bits. */
            if (value <= (UINT64_MAX - digit) / 58U) {
                value = value * 58U + digit;
            }
            else {
                return 0U; /* numeric overflow */
            }
        }
        else {
            return 0U; /* decoding error */
        }
    }

    return value;
}

int main()
{
    char str_buff[256U];

    if (base58_encode(str_buff, sizeof(str_buff), 0U)) {
        printf("%s\n", str_buff);
    }

    if (base58_encode(str_buff, sizeof(str_buff), 1U)) {
        printf("%s\n", str_buff);
    }

    if (base58_encode(str_buff, sizeof(str_buff), 123456U)) {
        printf("%s\n", str_buff);
    }

    if (base58_encode(str_buff, sizeof(str_buff), 9000000000000000001U)) {
        printf("%s\n", str_buff);
    }

    puts("");

    printf("%llu\n", base58_decode("!"));
    printf("%llu\n", base58_decode("\""));
    printf("%llu\n", base58_decode("CG?"));
    printf("%llu\n", base58_decode("3RFQ)FTR-'0"));

    return 0;
}

!
"
CG?
3RFQ)FTR-'0

0
1
123456
9000000000000000001

bkelly13 (24)

mbozzi wrote: Start by taking the remainder to find the next digit. Then divide. When the quotient of the division is zero, you're done; you've already got the last digit. Don't stop until the division produces a zero.

I don't understand "Start by taking the remainder to find the next digit." But from the example that follows, I think it confirms the concept I described, with the change for the ending divide as noted in your reply.

In essence, divide by 58 repeatedly until the dividend is less than 58. Each remainder becomes the next character of the base 58 number, beginning with the least significant character.

Now to look at the example code and get the multi-word divide correct. This will take a bit more thought.

Thank you for your time and patience.

bkelly13 (24)

Kigar64551 wrote in line 1:

static const char* const ASCII_CHARS = "!\"#$ % &'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

I don’t understand. Many of those characters are not used in the base 58 result. From what I see this will create problems. Here is what I came up with.
Caution: My working environment is Qt Creator. It did not like my definition but provided a hint. That hint resulted in the code a few lines below.
I had to use an array size of 59 rather than 58 because I was unable to get Qt Creator to create the string using the format ‘123’. The use of “123” forces a null character on the end.
To continue:

constexpr static const unsigned char cb58_array[ 59 ] =
"123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";

Using this format, the remainder indexes into the array to get the base 58 character that becomes the result.

a_char = cb58_array[ 0 ];   // result is ‘1’
a_char = cb58_array[ 1 ];   // result is ‘2’

These are exactly what is required.
I tried to create something like the below.

Const static unsigned char b58_char[ 58 ] = ‘1234’; // shortened here for brevity.

That did not work out. I think what I have will work but suggestions will be appreciated.

Side note: I am using MS Edge browser and Preview does not work. Is my browser the problem?

kigar64551 (803)

In my above example, ASCII_CHARS contains all "printable" ASCII characters. For Base-58, only the first 58 chars are used, so, yeah, you could shorten that array to a length of just 58 chars. You can also easily replace/swap the characters to be used as needed.

Note: String literals implicitly have an additional terminating NUL character. This additional NUL character is included in the size returned by the sizeof operator, but it is not counted when you use the strlen() function. Anyway, you don't need to care about this, because the suggested Base-58 encode/decode functions simply won't touch any characters in ASCII_CHARS beyond array index 57 😊

Here's a slightly updated version:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const char* const ASCII_CHARS = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";

void reverse_str(char* const str_buffer, const size_t len)
{
    size_t i, j;
    char temp;
    if (len > 1U) {
        for (i = 0U, j = len - 1U; i < j; i++, j--) {
            temp = str_buffer[i];
            str_buffer[i] = str_buffer[j];
            str_buffer[j] = temp;
        }
    }
}

size_t base58_encode(char* const buff_out, const size_t buff_size, uint64_t value)
{
    size_t out_index = 0U;

    do {
        if (out_index >= buff_size) {
            return 0U;
        }
        buff_out[out_index++] = ASCII_CHARS[value % 58U];
    } while (value /= 58U);

    if (out_index >= buff_size) {
        return 0U;
    }

    buff_out[out_index] = '\0';
    reverse_str(buff_out, out_index);
    return out_index;
}

uint64_t base58_decode(const char* const str)
{
    uint64_t value = 0U;
    const char* p;
    size_t pos, digit;

    for (pos = 0U; str[pos]; ++pos) {
        if ((p = strchr(ASCII_CHARS, str[pos])) != NULL) {
            digit = (size_t)(p - ASCII_CHARS);
            if (digit >= 58U) {
                return 0U; /* decoding error */
            }
            /* Accept only if value * 58 + digit still fits in 64 bits. */
            if (value <= (UINT64_MAX - digit) / 58U) {
                value = value * 58U + digit;
            }
            else {
                return 0U; /* numeric overflow */
            }
        }
        else {
            return 0U; /* decoding error */
        }
    }

    return value;
}

static void test_encode(const uint64_t value)
{
    char str[256U];
    if (base58_encode(str, sizeof(str), value)) {
        printf("Encode: %19llu -> %11s\n", value, str);
    }
}

static void test_decode(const char *const str)
{
    printf("Decode: %11s -> %19llu\n", str, base58_decode(str));
}

int main()
{
    test_encode(0U);
    test_encode(1U);
    test_encode(123456U);
    test_encode(9000000000000000001U);

    puts("");

    test_decode("1");
    test_decode("2");
    test_decode("CGy");
    test_decode("mTFSbFVTf9i");

    return 0;
}

Encode:                   0 ->           1
Encode:                   1 ->           2
Encode:              123456 ->         CGy
Encode: 9000000000000000001 -> mTFSbFVTf9i

Decode:           1 ->                   0
Decode:           2 ->                   1
Decode:         CGy ->              123456
Decode: mTFSbFVTf9i -> 9000000000000000001

mbozzi (3925)

bkelly13 wrote: Had to use array size of 59 rather than 58 because I was unable to get Qt Creator to create the string using the format ‘123’; The use of “123” forces a null character on the end.

"Strings" with single quotes (e.g., 'abc' or '123' or 'hello') are called "multi-character literals". They have type int and a value that depends on the implementation.

If there's only one character in the single quotes, (e.g., 'a'), it's a normal char.

To define an array of char without an ending null character, use an initialization syntax like this:

static const char cb58_array[58] = 
{  
  '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e',
  'f', 'g', 'h', 'i', 'j', 'k', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
  'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
  'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
  'Y', 'Z', 
};
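
A small sketch of the size difference between the two initialization styles, using a shortened three-character array so it stays readable. The names with_nul, without_nul and the two helper functions are made up for this illustration.

```c
#include <stddef.h>

/* String-literal initializer: the compiler appends a terminating NUL,
   so the array holds four chars: '1', '2', '3', '\0'. */
static const char with_nul[] = "123";

/* Brace initializer with an explicit size: exactly three chars, no NUL.
   Such an array must not be passed to strlen() or printed with %s. */
static const char without_nul[3] = { '1', '2', '3' };

/* Helpers that report how much storage each array occupies. */
static size_t size_with_nul(void)    { return sizeof(with_nul); }
static size_t size_without_nul(void) { return sizeof(without_nul); }
```

Either form can be indexed with a remainder in the same way; the only difference is the extra byte at the end of the first one.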

In ASCII, the digits 0-9 come before the capital letters A-Z, which in turn come before the lowercase letters a-z. If you have the choice, it might make sense to rearrange the contents of cb58_array from
123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ
to
123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
This way, strcmp will be able to put these numbers in order, at least when the strings being compared have the same length.
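
A minimal sketch of the ordering point, assuming the numbers are encoded to a fixed width and left-padded with the zero digit '1' (strcmp compares character by character, so strings of different lengths would not sort numerically). The names SORTED58 and encode_fixed are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alphabet arranged in ascending ASCII order. */
static const char SORTED58[] =
    "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/* Encode value as exactly `width` base-58 characters plus a NUL.
   `out` must have room for width + 1 chars.
   Returns 1 on success, 0 if the value needs more than `width` digits. */
static int encode_fixed(uint64_t value, char *out, size_t width)
{
    out[width] = '\0';
    for (size_t i = width; i-- > 0; ) {
        out[i] = SORTED58[value % 58u];  /* fill from the right */
        value /= 58u;
    }
    return value == 0;
}
```

With a width of 11 (enough for any 64-bit value), encoding 57 and 58 gives two strings for which strcmp reports the first as smaller, matching the numeric order.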

bkelly13 wrote: I don't understand "Start by taking the remainder to find the next digit." But from the example that follows, I think it confirms the concept I described, with the change for the ending divide as noted in your reply.

It sounds like you get it.

#include <string>
#include <iostream>
 
std::string to_base(int n, int b, char const* digits) { return (n/b? to_base(n/b, b, digits): "") + digits[n%b]; }
std::string to_base2(int n)  { return to_base(n, 2,  "01"); }
std::string to_base10(int n) { return to_base(n, 10, "0123456789"); }
std::string to_base16(int n) { return to_base(n, 16, "0123456789abcdef"); }
std::string to_base58(int n) { return to_base(n, 58, "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"); }

int main()
{
  std::cout << to_base2(123456)  << " (b2)\n";
  std::cout << to_base10(123456) << " (b10)\n";
  std::cout << to_base16(123456) << " (b16)\n";
  std::cout << to_base58(123456) << " (b58)\n";
}

dhayden (5798)

You can convert binary to BCD using double-dabble (https://en.wikipedia.org/wiki/Double_dabble). This requires only shifts and adds. No division is needed. Once you understand the algorithm, you can modify it to work with an output of base 58.
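
One possible way to carry the shift-and-add idea over to base 58 is sketched below. It is not the classic double-dabble routine, which works on packed BCD nibbles, but a related bit-serial scheme: for every input bit, each base 58 digit is doubled, the incoming bit or carry is added, and 58 is subtracted with a carry into the next digit whenever the result reaches 58. The function name bits_to_base58_digits, the big-endian byte input, and the least-significant-first digit output are all assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a big-endian byte string to base-58 digit VALUES (0..57),
   stored least significant first in digits[]. Uses only shifts, adds
   and subtracts. Returns the number of digits written (at least 1),
   or 0 if digits[] is too small. */
static size_t bits_to_base58_digits(const uint8_t *bytes, size_t nbytes,
                                    uint8_t *digits, size_t max_digits)
{
    size_t used = 1;

    if (max_digits == 0) {
        return 0;
    }
    digits[0] = 0;

    for (size_t i = 0; i < nbytes; ++i) {
        for (int bit = 7; bit >= 0; --bit) {
            /* The next input bit enters as the carry into digit 0. */
            unsigned carry = (bytes[i] >> bit) & 1u;

            for (size_t d = 0; d < used; ++d) {
                unsigned v = (unsigned)digits[d] * 2u + carry;
                carry = 0;
                if (v >= 58u) {      /* digit overflow: carry upward */
                    v -= 58u;
                    carry = 1;
                }
                digits[d] = (uint8_t)v;
            }
            if (carry) {             /* number grew by one digit */
                if (used == max_digits) {
                    return 0;
                }
                digits[used++] = 1;
            }
        }
    }
    return used;
}
```

The digit values still have to be mapped through the chosen 58-character alphabet and reversed to get the final string. The inner loop runs once per bit per digit, so this is a sketch of the idea rather than a fast implementation.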

bkelly13 (24)

Regarding the post from kigar64551 (799)
(Sorry about the delay, difficulties on the home front.)
I implemented the code in Qt Creator.
The hex input of all F for a 64 bit int produces JPwcyDCgEup
Visited two base 58 encode sites to get the following results for that value
https://www.dcode.fr/base-58-cipher produces jpXCZedGQ9V
https://appdevtools.com/base58-encoder-decoder produces jpXCZedGfVQ

Then I used the example value of 9000000000000000001 and the dcode site produces 36AncsPnj1q

So, from three tools, there is no agreement.

kigar64551 (803)

It seems they simply use a different mapping of the characters 😏

Just change code from above:
https://cplusplus.com/forum/beginner/285815/#msg1242856

...with this:
static const char* const ASCII_CHARS = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

With that change, I get:

Encode:                    0 (0x0000000000000000) -->           1
Encode:                    1 (0x0000000000000001) -->           2
Encode:               123456 (0x000000000001E240) -->         dhZ
Encode:  9000000000000000001 (0x7CE66C50E2840001) --> MtgsBgvtF9J
Encode: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) --> jpXCZedGfVQ

Decode:           1 -->                    0 (0x0000000000000000)
Decode:           2 -->                    1 (0x0000000000000001)
Decode:         dhZ -->               123456 (0x000000000001E240)
Decode: MtgsBgvtF9J -->  9000000000000000001 (0x7CE66C50E2840001)
Decode: jpXCZedGfVQ --> 18446744073709551615 (0xFFFFFFFFFFFFFFFF)
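
A small sketch that checks the "different mapping" explanation: it rewrites a string produced with the lowercase-first alphabet into the uppercase-first one by looking up each character's position. The names LOWER_FIRST, UPPER_FIRST and remap_alphabet are hypothetical.

```c
#include <stddef.h>
#include <string.h>

static const char LOWER_FIRST[] =
    "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";
static const char UPPER_FIRST[] =
    "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/* Replace every character of str by the character that has the same
   index in `to` as it has in `from`. Both alphabets must have the same
   length. Returns 1 on success, 0 if a character is not in `from`
   (str may then be partially rewritten). */
static int remap_alphabet(char *str, const char *from, const char *to)
{
    for (size_t i = 0; str[i] != '\0'; ++i) {
        const char *p = strchr(from, str[i]);
        if (p == NULL) {
            return 0;
        }
        str[i] = to[p - from];
    }
    return 1;
}
```

Applied to JPwcyDCgEup with LOWER_FIRST as the source and UPPER_FIRST as the target, this gives jpXCZedGfVQ, so the digit values were the same all along.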

DizzyDon (110)

I get the same result on those 2 sites (dcode only accepts decimal input) :

appdevtools:     FFFFFFFFFFFFFFFF --> jpXCZedGfVQ
dcode:       18446744073709551615 --> jpXCZedGfVQ

Your code incorrectly has the lowercase letters before the uppercase letters.
You seem to have left out the correct characters (0, I, O, l) but for some reason put the alphabets in the wrong order.

Here's simple code that can "encode" a hardcoded 256-bit number:

#include <iostream>
#include <cinttypes>

#define UINT256_SIZE 8

typedef uint32_t Uint256[UINT256_SIZE];

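// Divide n by d in place and return the remainder.
// n[0] is the most significant 32-bit limb.
uint32_t div256_32(Uint256 n, uint32_t d) {
    uint32_t m = 0;
    for (int i = 0; i < UINT256_SIZE; ++i) {
        // The previous remainder becomes the high half of a 64-bit dividend.
        uint64_t n64 = n[i] + (uint64_t(m) << 32);
        m = uint32_t(n64 % d);
        n[i] = uint32_t(n64 / d); // fits in 32 bits because m < d
    }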
    return m;
}

bool iszero256(Uint256 n) {
    for (int i = 0; i < UINT256_SIZE; ++i)
        if (n[i] != 0)
            return false;
    return true;
}

int main() {
    const char *s =
        "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    //Uint256 n = { 0, 0, 0, 0, 0, 0, 0xFFFFFFFF,  0xFFFFFFFF };
    Uint256 n = {
        0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
        0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF
    };

    char o[50] = {0};
    char *p = o + (sizeof o - 1); // point to '\0' at end

    while (!iszero256(n)) {
        uint32_t m = div256_32(n, 58);
        *--p = s[m];
    }

    if (*p == '\0') // if n was 0, p will not have moved
        *--p = s[0]; // zero is the first alphabet character, '1'

    std::cout << p << '\n';
}

I used 32-bit limbs so that I could use 64-bit values to do the calculation.
I don't know how to get 128-bit values with windows cl. (Is it possible?)
