My "loop through memory" doesn't recognise the value to find, and it's very slow

Hey, I have a loop to find a value in an address range, but it's slow and it doesn't recognise the value I want to search for.

At this address there's a double value: 0xD57AAE48 -> 74.7999999523163
But my program doesn't recognise it...

 
mem.readCustomType<double>(start) == d_Y + playerHeight


Isn't that just checking whether 74.7999999523163 equals 74.7999999523163?

#include <iostream>
#include <chrono>
#include <iomanip>
#include "memory.h"

Memory mem;

typedef uint64_t QWORD;


QWORD start = 0xD57AAE00; //0xD2000000
QWORD end = start + 0x48; //0x26A85000;

double playerHeight = 1.7999999523163;
double d_Y = 0;
double findValue = 0;

int main()
{
    std::cout << "Your height value? ";
    std::cin >> d_Y;

    auto t1 = std::chrono::high_resolution_clock::now();
    std::cout << std::setprecision(13) << std::fixed << d_Y + playerHeight << std::endl;

    std::cout << "Address value: " << mem.readCustomType<double>(0xD57AAE48) << " Value to find: " << d_Y + playerHeight << std::endl;

    for(start; start <= end; start++){
        std::cout << std::hex << "0x" << std::uppercase << start << std::endl;
        if(mem.readCustomType<double>(start) == d_Y + playerHeight){
            mem.write(start, d_Y);
            std::cout << "Value found at: " << std::hex << "0x" << std::uppercase << start << std::endl;
        }
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::seconds>( t2 - t1 ).count();
    std::cout << duration;

    std::getchar();
    return 0;
}


Your height value? 73
74.7999999523163
Address value: 74.7999999523163 Value to find: 74.7999999523163
0xD57AAE00
0xD57AAE01
0xD57AAE02
0xD57AAE03
0xD57AAE04
0xD57AAE05
0xD57AAE06
0xD57AAE07
0xD57AAE08
0xD57AAE09
0xD57AAE0A
0xD57AAE0B
0xD57AAE0C
0xD57AAE0D
0xD57AAE0E
0xD57AAE0F
0xD57AAE10
0xD57AAE11
0xD57AAE12
0xD57AAE13
0xD57AAE14
0xD57AAE15
0xD57AAE16
0xD57AAE17
0xD57AAE18
0xD57AAE19
0xD57AAE1A
0xD57AAE1B
0xD57AAE1C
0xD57AAE1D
0xD57AAE1E
0xD57AAE1F
0xD57AAE20
0xD57AAE21
0xD57AAE22
0xD57AAE23
0xD57AAE24
0xD57AAE25
0xD57AAE26
0xD57AAE27
0xD57AAE28
0xD57AAE29
0xD57AAE2A
0xD57AAE2B
0xD57AAE2C
0xD57AAE2D
0xD57AAE2E
0xD57AAE2F
0xD57AAE30
0xD57AAE31
0xD57AAE32
0xD57AAE33
0xD57AAE34
0xD57AAE35
0xD57AAE36
0xD57AAE37
0xD57AAE38
0xD57AAE39
0xD57AAE3A
0xD57AAE3B
0xD57AAE3C
0xD57AAE3D
0xD57AAE3E
0xD57AAE3F
0xD57AAE40
0xD57AAE41
0xD57AAE42
0xD57AAE43
0xD57AAE44
0xD57AAE45
0xD57AAE46
0xD57AAE47
0xD57AAE48
0Press <RETURN> to close this window...
You need a better understanding of floating point numbers.

There are multiple floating point values with a (rounded) decimal representation of
74.7999999523163
Examine the actual binary representation of the values involved; std::hexfloat is your friend.

Read the Goldberg paper.
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Somehow I fixed it, but it's very slow. It took ~20 minutes to loop from 0xD2000000 to 0xF8A85000 :|
What part of the code takes the most time?

Hint: use a performance profiler
3523215360 to 4171780096 is 648,564,736 loop iterations.
Printing to the screen is sluggish; print to a file instead:
c:\program.exe > file.txt

And be sure to use an optimized release build.

Hang on, I looked at the wrong thing.

I don't know what a "Memory mem;" is, and neither does my compiler.
But when you profile it and see that this object is behind the problem, consider something faster like memcmp, and a faster way to find the pattern.

Little things matter when you do half a billion of them.
d_Y + playerHeight <--- this looks like a constant? Why add it up over and over (even if the compiler knows to fix that for you, it's unclean).

Line 29 (the std::cout that prints every address inside the loop) is a problem. Remove it. This is at least 90% of the problem.
> if(mem.readCustomType<double>(start) == d_Y + playerHeight)
The first problem is going to be comparing doubles for equality.
If you're off by even a single bit in the LSB, then "damn close" just becomes "nope, not it".

Even if they are bit-equal in memory, you might still need -ffloat-store:

The following options control compiler behavior regarding floating point arithmetic. These options trade off between speed and correctness. All must be specifically enabled.
-ffloat-store
Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.

This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a "double" is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.


> for(start; start <= end; start++)
doubles are not scattered at random addresses in memory. Each type has a minimum alignment.
So perhaps
for(start; start <= end; start += alignof(double))
You should also have < end, not <= end.

A caveat to that would be various attempts to obfuscate the code to make simple memory searches less effective (such as structure packing). But this always comes with a performance cost in the software.

Speaking of which, you might want to benchmark
mem.readCustomType<double>(0xD57AAE00)
against
mem.readCustomType<double>(0xD57AAE01)
A naive pointer cast would just get you a bus error (https://en.wikipedia.org/wiki/Bus_error) on strict-alignment architectures. This should mean your unaligned access is doing a lot more work than the aligned case.

#include <iostream>
#include <iomanip>

struct foo {
  char c;
  double d;
};
#pragma pack(push,1)
struct bar {
  char c;
  double d; // trying to be sneaky by hiding double in an odd place
};
#pragma pack(pop)

void show(const double f) {
  std::cout << std::fixed << std::setprecision(15) << f
            << " = " << std::hexfloat << f
            << std::endl;
}

int main() {
  double a = 1.7999999523163;
  double b = 73 + a;
  double c = 74.7999999523163;
  show(a);
  show(b);
  show(c);
  std::cout << "Double alignment=" << alignof(double) << std::endl;
  std::cout << sizeof(foo) << std::endl;
  std::cout << sizeof(bar) << std::endl;
}

$ g++ -std=c++11 foo.cpp
$ ./a.out 
1.799999952316300 = 0x1.cccccc0000047p+0
74.799999952316298 = 0x1.2b33333000001p+6
74.799999952316298 = 0x1.2b33333000001p+6
Double alignment=8
16
9



> 3523215360 to 4171780096 is 648,564,736 loop iterations.
> printing to screen is sluggish, print to a file:
> c:\program.exe > file.txt


Printing to the screen in this project is only for notifying me when memory gets changed; now I only print when the value read from memory equals the value to find.


> I don't know what a "Memory mem;" is and neither does my compiler.
> but when you profile it and see that this object is behind the problem, consider something faster like memcmp, and some faster way to find the pattern.

Memory mem is a class with functions to read and write memory, etc.

    template <class vData> // To give our function the datatype with Object.Function<DataType>(Parameters);
    vData readCustomType(QWORD Address) {
        vData vReturn;
        ReadProcessMemory(hProcess, (LPVOID)Address, &vReturn, sizeof(vData), NULL);
        return vReturn;
    }



> Even if they are bit-equal in memory, you might still need -ffloat-store

I don't know how to use it; I'm using Qt with MinGW. I probably need it, because sometimes I get a match (findValue == valToFind), but sometimes I don't.


> for(start; start <= end; start++)
> doubles are not scattered at random addresses in memory. Each type has a minimum alignment.
> So perhaps
> for(start; start <= end; start += alignof(double))
> You should also have < end, not <= end.


With alignof(double), the entire cave scan took 146 seconds (down from around 20 min to 2.5 min, pretty good).


#include <iostream>
#include <chrono>
#include <iomanip>
#include "memory.h"

Memory mem;

typedef uint64_t QWORD;


QWORD start = 0xD2000000; //0xD2000000
QWORD end = start + 0x26A85000; //0x26A85000;

const double playerHeight = 1.7999999523163;
double d_Y = 0;
double findValue = 0;
double valToFind = 0;

int main()
{
    std::cout << "Your height value? ";
    std::cin >> d_Y;

    auto t1 = std::chrono::high_resolution_clock::now();

    valToFind = d_Y + playerHeight;

    for(start; start < end; start += alignof(double)){
        findValue = mem.readCustomType<double>(start);
        if(findValue == valToFind){
            mem.write(start, d_Y);
            std::cout << "Value found at: " << std::hex << "0x" << std::uppercase << start << std::endl;
        }
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::seconds>( t2 - t1 ).count();
    std::cout << std::endl << std::dec << duration;

    std::getchar();
    return 0;
}


TL;DR: down from 20 min to 2.5 min, but I need -ffloat-store and don't know how to apply it in Qt with MinGW.
Fair enough. If you want it faster from here, you will probably get more lift from cutting it into chunks and threading it. 4 threads could make it up to around 4x faster on a 4-core CPU, since you can split the memory window into 4 blocks and avoid race conditions entirely.

I am not sure what you want from Qt. I don't see any GUI code at all, apart from what looks like Win32 handle gibberish in ReadProcessMemory(hProcess, ...), which isn't really 'GUI' here so much as just 'Microsoft'.
Yeah, I know there is literally zero Qt here, but I'll probably build a GUI around this memory loop later.
> ReadProcessMemory(hProcess, (LPVOID)Address, &vReturn, sizeof(vData), NULL);
TBH, rather than reading doubles, just treat the memory as a block of unsigned char.
Grab yourself a megabyte at a time.
    valToFind = d_Y + playerHeight;
    for(start; start < end; start += alignof(double)){
        unsigned char *p = mem.getPointer(start);  // you'd have to look this up
        if ( memcmp(p, &valToFind, sizeof(valToFind)) == 0 ) {
            mem.write(start, d_Y);
            std::cout << "Value found at: " << std::hex << "0x" << std::uppercase << start << std::endl;
        }
    }

What should the getPointer() function look like?

For read I have:

QWORD Memory::read(QWORD address){
    QWORD value = 0;  // return QWORD: returning int would truncate the 8 bytes read
    ReadProcessMemory(hProcess, (LPVOID)address, &value, sizeof(QWORD), 0);
    return value;
}


    template <class vData>
    vData readCustomType(QWORD Address) {
        vData vReturn;
        ReadProcessMemory(hProcess, (LPVOID)Address, &vReturn, sizeof(vData), NULL);
        return vReturn;
    }
unsigned char *buff = new unsigned char[1024*1024];
ReadProcessMemory(hProcess, (LPVOID)address, buff,1024*1024,0);
Am I doing something wrong, or what's going on? The app just crashed when I ran it with this:

unsigned char* Memory::getPointer(QWORD address)
{
    unsigned char *buff = new unsigned char[1024*1024];
    ReadProcessMemory(hProcess, (LPVOID)address, buff,1024*1024,0);
    return buff;
}
The outer loop reads blocks of 1MB.
The inner loop processes blocks of 1MB.
    valToFind = d_Y + playerHeight;
    for(start; start < end; start += 1024*1024){
        unsigned char *ptr = mem.getPointer(start);  // you'd have to look this up
        for ( unsigned char *p = ptr ; p < ptr + 1024*1024 ; p += sizeof(double) ) {
            if ( memcmp(p, &valToFind, sizeof(valToFind)) == 0 ) {
                QWORD addr = start + (p - ptr);  // address of the match, not of the block
                mem.write(addr, d_Y);
                std::cout << "Value found at: " << std::hex << "0x" << std::uppercase << addr << std::endl;
            }
        }
        delete [] ptr;
    }


Exercise:
Adjust so the last block read is the right size for the remaining data.
I've never done this; can you explain a bit more what I have to do now?

realloc(p, sizeof(double));

Maybe this one? (in the loop, before the if)
Anyone? :|
Why do you need realloc? Nobody has mentioned it, and you don't need it.
> Adjust so the last block read is the right size for the remaining data.

You told me this, and I don't know what I should do. I only heard about realloc when I googled "adjust memory block size" or something like that.
That is not what he is asking.
Say you have 1005 things,
and you process them 100 at a time.
The last pass only needs to handle 5 of them, because 1005 is not an even multiple of 100; there are 5 extra.
How do you handle that?

That is the question.
realloc is a C tool that is best avoided in most C++. It's part of the malloc family of memory functions; C++ uses new and delete, but even those are avoided in favor of containers most of the time.
The workspace is allocated and freed on every iteration of the loop.

$ cat foo.cpp
#include <iostream>
#include <iomanip>

int main() {
  unsigned long start = 0xD2000000;
  unsigned long end = start + 0x1234567;
  unsigned long len = end - start;
  unsigned long oneMb = 1024 * 1024;
  for ( ; start < end ; start += oneMb ) {
    unsigned long bsize = len > oneMb ? oneMb : len;
    std::cout << std::hex << "Reading " << bsize << " bytes from " << start << std::endl;
    len -= bsize;
  }
}
$ ./a.out 
Reading 100000 bytes from d2000000
Reading 100000 bytes from d2100000
Reading 100000 bytes from d2200000
Reading 100000 bytes from d2300000
Reading 100000 bytes from d2400000
Reading 100000 bytes from d2500000
Reading 100000 bytes from d2600000
Reading 100000 bytes from d2700000
Reading 100000 bytes from d2800000
Reading 100000 bytes from d2900000
Reading 100000 bytes from d2a00000
Reading 100000 bytes from d2b00000
Reading 100000 bytes from d2c00000
Reading 100000 bytes from d2d00000
Reading 100000 bytes from d2e00000
Reading 100000 bytes from d2f00000
Reading 100000 bytes from d3000000
Reading 100000 bytes from d3100000
Reading 34567 bytes from d3200000


Mix with this, and you're done.
        unsigned char *ptr = mem.getPointer(start,bsize);
        for ( unsigned char *p = ptr ; p < ptr + bsize ; p += sizeof(double) ) {
        }
        delete [] ptr;


TBH, even that much is a waste of effort. Just allocate 1MB for the duration of the code and keep reusing the same buffer. That it's only partially filled on the last iteration is irrelevant, as long as the inner loop stops at bsize.