asm to c++?

I have this code in asm that I'm debugging:

//no type checking here.
mov ebx, dword ptr ds:[40302C] //a string from user input
movsx edx, byte ptr ds:[eax+40341E] //one byte of a static string variable
sub ebx,edx
imul ebx,edx 
...


So basically this is conceptually equivalent to:

const std::string static_var{"ABCDEFGHIJKLMNOPQRSTUVWXYZ12345"};
std::string user_data{};

std::cout << "Enter your data\n";// you can input any alphanumeric sequence
std::cin >> user_data;

int var1 = /* first four bytes of user_data; I'm not sure how to copy them in C++ because of type checking */;
int var2 = /* one byte of static_var, sign-extended to fit a 32-bit variable */;

var1 -= var2; //sub ebx,edx
var1 *= var2; //imul ebx,edx


Is there any way I could replicate the above asm code in C++?

edit1:


An example:
//if I input the following string "AAAAA" then:
user_data  = "AAAAA"
static_var{"%MNOPQRSTUVWXYZ1234506789"};

mov ebx, dword ptr ds:[40302C] //ebx would contain the first 4 bytes of AAAAA that would be ebx == 0x41414141  because hex_of_char('A') == 0x41
movsx edx, byte ptr ds:[eax+40341E] //any byte of the static string var, let's say the first so edx == 0x25 because hex_of_char('%') == 0x25
sub ebx,edx // ebx == 0x41414141, edx == 0x25 so ebx = 0x41414141 - 0x25 == 0x4141411C
imul ebx,edx // ebx == 0x4141411C, edx == 0x25 so ebx = 0x4141411C * 0x25 == 0x6E6E690C (low 32 bits of the product)

Not sure I understand, but perhaps:

#include <iostream>
#include <iomanip>
#include <string>
using namespace std;

int main() {
    string s;
    cout << "Enter at least 4 characters: ";
    cin >> s;
    int v = *(int*)s.data();
    cout << hex << setfill('0') << setw(8) << v << '\n';
}
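
The cast in *(int*)s.data() tends to work on x86, but formally it breaks the aliasing and alignment rules of C++. A minimal sketch of the same read using std::memcpy, assuming a 32-bit value is wanted and the string holds at least four characters; first_dword is a name made up here, not something from the thread:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not from the thread): copies the first four bytes of s
// into a 32-bit integer, much like `mov ebx, dword ptr [...]` does.
// Assumption: a string shorter than four characters yields 0.
std::uint32_t first_dword(const std::string& s) {
    std::uint32_t v = 0;
    if (s.size() < sizeof v) return 0;
    std::memcpy(&v, s.data(), sizeof v);  // well-defined, unlike *(int*)s.data()
    return v;
}
```

For the input "AAAAA" this gives 0x41414141. For mixed characters the byte order of the result follows the machine, just as the mov instruction does.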

Thank you for the quick reply. Let me test your suggestion and I'll give my feedback.
int v = *(int*)s.data();

Feedback:

What you've suggested works for copying the raw bytes, but when I place the code in a function it spits out garbage and I can't tell why. See below.

#include <iostream>
#include <string>

int copy_bytes(int num_bytes,const std::string& ldata)
{
    int total_bytes = ldata.size();

    if(num_bytes > 0){
        if(num_bytes > total_bytes){
            num_bytes = total_bytes;
        }
        if(num_bytes == 1){
          return (int)ldata[0];//or (*(int*)&ldata[0]);
        }//garbage error fixed

        std::string str_bytes{ldata.substr(0,num_bytes)};
        
        return (*(int*)str_bytes.data());
    }

    std::cout<<"Error: number of bytes to copy must be > 0\n";
    return 0;
}

int main(){
    std::string datax{"AAAAA"};

    std::cout<<copy_bytes(4,datax)<<"\n";//prints a valid result for num_bytes > 1 and garbage when it's 1; any suggestions?
}


int v = *(int*)s.data();
Try:
int *v = (int*)&s[0];
cout << *v << endl;
Well, obviously it's not going to work to just get one byte.
It's designed to get an int's worth of bytes.
If you want the one byte, just access it with the subscript operator.
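
A likely cause of the garbage is that the cast always reads four bytes, even when the substring holds fewer. One way around that, sketched here under the assumption that at most four bytes are wanted and the rest should be zero; copy_bytes_safe is a hypothetical name, not the poster's function:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical variant of copy_bytes: copies at most num_bytes (capped at the
// string length and at 4) into a zero-initialised 32-bit value, so it never
// reads past the characters that are actually there.
std::uint32_t copy_bytes_safe(std::size_t num_bytes, const std::string& data) {
    std::uint32_t value = 0;
    const std::size_t n = std::min({num_bytes, data.size(), sizeof value});
    std::memcpy(&value, data.data(), n);  // bytes beyond n stay zero
    return value;
}
```

When fewer than four bytes are copied, where they land inside the integer depends on the machine's byte order; on little-endian x86 a single byte ends up in the low byte.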
Thank you very much, guys. The first problem, acquiring the raw bytes, is solved. However, the integer values I get are in decimal, while in my asm code everything was in hexadecimal. If I do the arithmetic on the decimal values (there's no way to do hexadecimal arithmetic in C++) and then convert the result to hexadecimal, am I guaranteed to get the same result as in my asm code?

Thanks in advance.
The computer stores integers only ONE way: in binary, effectively circuits in the hardware that either have power or not. You can print a value in binary, hex, decimal, octal, and other ways, but the bits in the hardware are all the same; the base is just formatting of the output.

Note that ASCII letters and hex digits are not directly convertible, just as the ASCII character '0' (code 48) is not the value 0. Do not think that the string "AAAA" is 0xAAAA in hex at all.
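
To make that concrete, here is a sketch of the four instructions from the question, assuming 32-bit registers and an input of at least four characters. replicate_asm and its parameters are illustrative names only. Unsigned 32-bit arithmetic is used so that overflow wraps the way a register does:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Illustrative only. Assumes user_data.size() >= 4 and index < static_var.size().
std::uint32_t replicate_asm(const std::string& user_data,
                            const std::string& static_var,
                            std::size_t index) {
    std::uint32_t ebx = 0;
    std::memcpy(&ebx, user_data.data(), sizeof ebx);  // mov ebx, dword ptr [...]

    // movsx edx, byte ptr [...]: sign-extend one byte to 32 bits
    const auto edx = static_cast<std::uint32_t>(
        static_cast<std::int32_t>(static_cast<signed char>(static_var[index])));

    ebx -= edx;  // sub ebx,edx  (wraps modulo 2^32, like the register)
    ebx *= edx;  // imul ebx,edx (the low 32 bits match the signed multiply)
    return ebx;
}
```

With user_data "AAAAA", the static string from the example and index 0, the result is 0x6E6E690C, which is the same number as 1852729612 in decimal. Printing it with std::hex or without only changes how that one value is displayed.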
It worked, yes, but sometimes only when I hardcode the user input. For example, if I use the following input:
user_input = "AGSD REID";//the program runs without errors when this is hardcoded into the algo.
//but when I acquire the same input from the console it doesn't work.
std::cin>>user_input; //it only seems to read the first half and discard the rest so user_input would contain "AGSD" only, very weird. 


@jonnin
> note that the ascii letters and hex digits are not directly convertible just as ascii decimal digit zero is not ascii 0. Do not think that "AAAA" string is 0xAAAA in hex at all.
It took me some time to wrap my head around this, but I finally got it. Thank you.

As for the input problem, any suggestions as to how I would solve it?
> it only seems to read the first half and discard the rest so user_input would contain "AGSD" only, very weird.

No. That is by design. String extraction from a stream with >> only extracts up to, but not including, the first white-space character (space, tab, newline).

To obtain input that may contain a space, use getline():

 
std::getline(std::cin, user_input);


See
http://www.cplusplus.com/reference/string/string/getline/
http://www.cplusplus.com/reference/string/string/operator%3E%3E/
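
The difference can be seen side by side in a small sketch. It reads from a std::istringstream instead of std::cin so that it is repeatable; word_vs_line is a made-up helper name:

```cpp
#include <sstream>
#include <string>
#include <utility>

// Illustrative only: feeds the same text to operator>> and to std::getline
// and returns {word, line} so the two results can be compared.
std::pair<std::string, std::string> word_vs_line(const std::string& text) {
    std::istringstream a(text), b(text);
    std::string word, line;
    a >> word;              // stops at the first white-space character
    std::getline(b, line);  // reads up to the newline, spaces included
    return {word, line};
}
```

For the text "AGSD REID" the first member is "AGSD" and the second is "AGSD REID". If a >> extraction runs before getline on the same stream, the newline it leaves behind is read as an empty line; std::getline(std::cin >> std::ws, user_input) is a common way to skip it.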
> No. That is by design. string extraction from a stream only extracts up to but not including a white-space (space, tab, newline).
Thank you, I didn't know that was by design. Thanks for the documentation, it explains it clearly.

Finally, to convert my computed decimal result into hex, I have this function:
auto int2hex = [](int val)
{
    std::stringstream ss{""}; // was suggested
    ss << std::hex << val;
    return ss.str();
};

Is there a better way to achieve the above result? (Note: there are no errors, everything works as expected.)
You can use ostringstream instead of stringstream, as the stream is only used for insertion. Also, you don't need the {""} in the declaration of ss (line 3 of your snippet).
Better is subjective. The built-in number-to-text and text-to-number tools are very, very slow**. There are a lot of pages, papers, code examples, and more on extremely ugly ways to 'fix' this when you need high performance. It's a rabbit hole though, and it's best left undug unless you are trying to process data that would take hours doing it the 'slow' way.

**Slow is relative. For printing on the screen, it won't matter. Printing on the screen is slow too! You will only really notice it when writing to a file, or to something else faster than the screen, and when processing multi-million string sets.
Yeah. But before exploring the rabbit hole, always make sure you need to.

Performance benchmark, performance benchmark, performance benchmark.....
> better is subjective.
I meant faster in terms of performance. The only reason I converted the asm to C++ is to reproduce and document bugs and to give my colleague a simple C++ test case that details the bug. (She requested C++, and I reproduced it perfectly with your help.)

Note: this is solely a tool for documentation.

Thank you.
Instead of using a stream for the conversion, another way is:

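#include <iostream>
#include <string>
#include <iterator>
#include <type_traits>

template <typename I>
std::string toHexStr(I value) {
	static const char* const digits {"0123456789ABCDEF"};
	// Work on an unsigned copy: right-shifting a negative value typically
	// never reaches 0, so the loop would run past the end of the buffer.
	auto w {static_cast<std::make_unsigned_t<I>>(value)};
	std::string rc(sizeof(I) << 1, '0');
	auto itr {rc.rbegin()};

	// Emit one hex digit per 4 bits, least significant first, filling rc from the back
	for (; w; w >>= 4)
		*itr++ = digits[w & 0x0f];

	return itr == rc.rbegin() ? "0" : std::string(itr.base(), rc.end());
}

int main()
{
	int i = 0xfffffff;

	std::cout << toHexStr(i) << '\n';
}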

> I meant faster in terms of performance.

C++17 has the high-performance std::to_chars.

1) Integer formatters: value is converted to a string of digits in the given base (with no redundant leading zeroes).
...
Unlike other formatting functions in C++ and C libraries, std::to_chars is locale-independent, non-allocating, and non-throwing. Only a small subset of formatting policies used by other libraries (such as std::sprintf) is provided. This is intended to allow the fastest possible implementation that is useful in common high-throughput contexts.
https://en.cppreference.com/w/cpp/utility/to_chars

#include <iostream>
#include <string>
#include <charconv>

template <typename I>
std::string toHexStr(I w) {
	std::string rc(sizeof(I) << 1, '0');
	return {rc.data(), std::to_chars(rc.data(), rc.data() + rc.size(), w, 16).ptr};
}

I did not mean to include to_chars in my complaint above. It's pretty good for a reusable generic answer. I forgot about it, to be honest...
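
One thing to watch with the to_chars version of toHexStr: two characters per byte is enough for non-negative values, but std::to_chars writes a leading '-' for negative ones, so the most negative int ("-80000000") needs nine characters and the call fails. A sketch that checks the error code and, as an assumption about what is wanted when mirroring register contents, formats the value's bit pattern instead; to_hex_checked is a hypothetical name:

```cpp
#include <charconv>
#include <string>
#include <system_error>
#include <type_traits>

// Sketch: hex text of the value's bit pattern, so -1 as a 32-bit int gives
// "ffffffff", similar to how a debugger shows a register.
// Returns an empty string if the conversion reports an error.
template <typename I>
std::string to_hex_checked(I value) {
    static_assert(std::is_integral_v<I>, "integral types only");
    using U = std::make_unsigned_t<I>;
    char buf[sizeof(I) * 2];  // two hex digits per byte is enough for unsigned
    const auto res = std::to_chars(buf, buf + sizeof buf, static_cast<U>(value), 16);
    if (res.ec != std::errc{}) return {};
    return std::string(buf, res.ptr);
}
```

std::to_chars produces lowercase digits without leading zeros, so 255 becomes "ff".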
#include <type_traits>
#include <string_view>
#include <string>
#include <charconv>

// designed with efficiency as the overriding goal:
// returns a view to a static internal buffer, which is overwritten on each call
template < typename T >
typename std::enable_if< std::is_integral<T>::value, std::string_view >::type hex_str_view( T value )
{
    static char buffer[128] ;
    const auto [ ptr, ec ] = std::to_chars( buffer, buffer+sizeof(buffer), value, 16 ) ;
    return { buffer, std::size_t(ptr-buffer) } ;
}

// returns a string (copy of the characters in the view returned by hex_str_view)
template < typename T >
typename std::enable_if< std::is_integral<T>::value, std::string >::type hex_str( T value )
{
    return std::string( hex_str_view(value) );
}
Some comparison timings using the various methods from above:

#include <string>
#include <sstream>
#include <charconv>
#include <type_traits>
#include <string_view>
#include <chrono>
#include <iostream>

template<typename T>
std::string toHex1(T val) {
	std::stringstream ss;
	ss << std::hex << val;
	return ss.str();
}

template <typename T>
std::string toHex2(T w) {
	static const char* const digits {"0123456789ABCDEF"};
	std::string rc(sizeof(T) << 1, '0');
	auto itr {rc.rbegin()};

	for (; w; w >>= 4)
		*itr++ = digits[w & 0x0f];

	return itr == rc.rbegin() ? "0" : std::string(itr.base(), rc.end());
}

template <typename I>
std::string toHex3(I w) {
	std::string rc(sizeof(I) << 1, '0');
	return {rc.data(), std::to_chars(rc.data(), rc.data() + rc.size(), w, 16).ptr};
}

template <typename T>
typename std::enable_if_t< std::is_integral_v<T>, std::string_view> toHex4(T value) {
	static char buffer[128];
	const auto [ptr, ec] = std::to_chars(buffer, buffer + sizeof(buffer), value, 16);

	return {buffer, std::size_t(ptr - buffer)};
}

template <typename T>
typename std::enable_if_t< std::is_integral_v<T>, std::string> toHex5(T value) {
	return std::string(toHex4(value));
}

class Timer
{
public:
	Timer(const char* name) : name_(name), start(std::chrono::high_resolution_clock::now()) {}

	~Timer() {
		const auto diff {std::chrono::high_resolution_clock::now() - start};

		std::cout << name_ << " took " << std::chrono::duration<double, std::milli>(diff).count() << " ms\n";
	}

private:
	std::string name_;
	decltype(std::chrono::high_resolution_clock::now()) start {};
};

int main()
{
	constexpr size_t iters {5'000'000};

	{
		Timer t("stream");

		for (size_t i = 0; i < iters; ++i)
			toHex1(i);
	}

	{
		Timer t("loop");

		for (size_t i = 0; i < iters; ++i)
			toHex2(i);
	}

	{
		Timer t("to chars");

		for (size_t i = 0; i < iters; ++i)
			toHex3(i);
	}

	{
		Timer t("static buffer");

		for (size_t i = 0; i < iters; ++i)
			toHex5(i);
	}
}



stream took 4687.9 ms
loop took 308.771 ms
to chars took 294.2 ms
static buffer took 69.0416 ms


This just shows how slow the stream approach is, and how much a static buffer can improve performance.
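
A caveat when reading numbers like these: the loops above discard every result, and an optimising compiler is allowed to drop work whose result is never used. A possible guard, sketched here with a made-up helper time_calls, is to consume each result, for example by summing the string lengths:

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Hypothetical helper: times `iters` calls of `fn` and sums the sizes of the
// returned strings so that every result is actually used.
// Returns {elapsed milliseconds, total length}.
template <typename Fn>
std::pair<double, std::size_t> time_calls(Fn fn, std::size_t iters) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t total_len = 0;
    for (std::size_t i = 0; i < iters; ++i)
        total_len += fn(i).size();
    const std::chrono::duration<double, std::milli> ms =
        std::chrono::steady_clock::now() - start;
    return {ms.count(), total_len};
}
```

It could be called as time_calls([](std::size_t i) { return toHex3(i); }, 5'000'000), printing the total length along with the time. Whether this changes the ranking above would have to be measured.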