Allocating Specific Memory

Hi,
I've got a situation where I have a huge amount of memory whose specific pointer values I'm processing with a time-consuming algorithm, basically performing a custom sort of them. Ideally, I would like to save the results of the sort so that I could reuse them more quickly the next time I run my program. However, what I don't know how to do is make sure that I allocate the exact same pointer values the next time I run the program. Does that make any sense? Does anyone have any suggestions for me?
Thanks,
Sean
You cannot guarantee that the pointer values will be the same. Why would you want to save pointer values anyway?
A better approach would be to index these items somehow. Like give them some kind of ID number. Then store those ID numbers in the file, rather than store pointer values.

Maybe "placement new" can help. It needs a lot of care to use, but see if it helps you.
placement new only works on memory blocks that you already have allocated.

There's no guarantee you can allocate memory at the same address as the pointer you previously had.

And even if you could it would be a bad idea. Better to just use some other ID that you have full control over.

yes, you are correct, Disch. I just gave smcguffee a hint, so maybe he can get something out of it. :)
say you have n objects that you want sorted from small to large

1. allocate a data[n] array of objects and load these objects in
2. allocate unsigned idx[n], initialized from 0 to n-1
3. when "sorting", don't sort data[] directly; rather, sort idx[n] (so you have 1 level of indirection)
4. when done sorting, idx[0] should be the idx to the smallest entry in data[]
5. write data[] to disk in one big chunk
6. write idx[] to disk in one big chunk
7. next time, read data[] and idx[] from disk in big chunks
Storing pointers doesn't make any sense, because a pointer is valid only within the same process instance that allocated the block it points to. Pointers are not memory addresses. The OS does the translation from pointers to addresses, and after the process quits, that mapping is lost. When you run it again, the mapping can be different. Therefore you can even have two applications sharing exactly the same pointer value, but each of these pointers would point to a different physical memory location.
rapidcoder's comment is true for address pointers

that's why Disch and I advocate the use of indexes, which remain valid as long as the original array of objects remains in the same order - indexes are almost like pointers that you maintain on your own
Okay, I think I get it.
It seems I can't find a way to reproduce the pointers in memory, so I have to store the offsets from the starting pointer in memory, and then the next time I allocate, I can recreate the values with those stored offsets and the new starting pointer. That's only one level of indirection. I think I can handle that. Thanks!
Sean
fyi, there are more direct ways of reproducing pointers in memory for *nix systems

http://en.wikipedia.org/wiki/Memory-mapped_file

however, it's harder to do than using indexes, and may not yield many benefits these days for most applications since we have tons of memory - indexes are so easy that you can code them up in less time than it takes to learn how to use memory-mapped files correctly - gl
Ok, it turns out that it's not so simple to do an index. I have multiple points of origin, i.e. more than one starting pointer, yet all pointers are sorted together. Thus, I'd have to do quite a bit of conditional math to do offsets. I suppose that overhead wouldn't be horrible, but it might be a real pain to implement. Am I making any sense? Is there still a simple way to index for this type of situation? I suppose maybe I could somehow have a pair structure with the starting point as one of the terms, but at least at the moment I'm still a bit confused about how that might work out. Anyway, this memory-mapped-file idea might be something with potential. I'll definitely have to check that out.
no, you are not making any sense

an index is a pointer - it's one that you maintain yourself and instead of an absolute address, it's an offset from the head of your data block

if you need to sort separately by lastname and by zip-code, you maintain two different blocks of indexes

if you need to sort by both lastname and zip-code, your comparison operator would check on zip-code if and only if the lastnames are the same

understanding how to do a memory-mapped-file is much harder than building your own index, but you can try if you like

the only time you cannot use an index is if your records are variable length, but even then, there are ways to remap your records so they become fixed length
Let me be specific about what type of pointers I'm sorting.
I have a list of char* C strings, each of which has its own first position and many pointers adjacent to them along those C strings. I then make a
map<char*,MyInfoClass,MySortFunction>
with those pointers, the first positions and many of the other positions.
Thus, after the sort is done by inserting all the values in the map, I have a list of values that are offset from different first char* pointer positions. Does that make more sense?
sorry, but you are still being unclear - as a software engineer or as a project manager, I would find it difficult to write a design specification based on what you have just described

without worries about implementation, describe a specific example of what you are trying to do - give the input and describe what the output should look like to satisfy your requirements

in other words, if I describe an implementation that can turn your input into your output, then I have done my job

I seriously doubt there is a case where pointers would work, but indexes wouldn't...
Example:
Sort Function:
    class LessForCharPointerOfGivenLength // needs <cstring> for strncmp
    {
    public:
        LessForCharPointerOfGivenLength() : length(0)
        {
        }
        LessForCharPointerOfGivenLength(long _length) : length(_length)
        {
        }
        bool operator()(const char* i, const char* j) const
        {
            return strncmp(i, j, length) < 0;
        }
        long length;
    };

Some data:
string s1("LMNRQABCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO");
string s2("ABCDLMNRQAGHIJKLMNBCNRQLMNIPQLMINOPQLRPGHIJKLMNGJJGHIJKLMNNO");
string s3("MNBCNRQLMNIPQLMINOLMNRQABCNRQLMNIPQLMINOPQLRPGJ");
//many more strings with thousands of characters in some
long length = 20;
// the comparator is constructed from length (LessForCharPointerOfGivenLength converts implicitly from long)
multimap<const char*, long, LessForCharPointerOfGivenLength> charPointerMappedLocations(length);
// do this for each string, such as s1 (multimap has no push_back, so use insert):
for (long chLoc = 0; chLoc + length < (long)s1.size(); chLoc++)
{
    charPointerMappedLocations.insert(make_pair(s1.c_str() + chLoc, chLoc));
}

This algorithm takes a long time because I have a lot of data going into it, and it actually stores more complex data than a long type which also takes some time to crunch.
s1.c_str()+0 would be an example of a first pointer.
s2.c_str()+0 would be an example of another first pointer.
s2.c_str()+2 would be an example of a general pointer.
All of my pointers are somewhere in the map as key values, though.
What I want to do is store the sorted list for next execution.
The problem is that the next run won't necessarily give me the same pointers to start with.
Does this make sense now?

when I dump your output, I get this:

ABCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   5
BCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   6
CNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   7
INOPQLRPGJJJJKILLMMNO   19
IPQLMINOPQLRPGJJJJKILLMMNO   14
LMINOPQLRPGJJJJKILLMMNO   17
LMNIPQLMINOPQLRPGJJJJKILLMMNO   11
LMNRQABCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   0
MINOPQLRPGJJJJKILLMMNO   18
MNIPQLMINOPQLRPGJJJJKILLMMNO   12
MNRQABCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   1
NIPQLMINOPQLRPGJJJJKILLMMNO   13
NRQABCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   2
NRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   8
PQLMINOPQLRPGJJJJKILLMMNO   15
QABCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   4
QLMINOPQLRPGJJJJKILLMMNO   16
QLMNIPQLMINOPQLRPGJJJJKILLMMNO   10
RQABCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO   3
RQLMNIPQLMINOPQLRPGJJJJKILLMMNO   9


is that really the output you want without running the sort the second time, or do you only care about the order of the strings?

I'm not sure why you are storing the offset as the value for your multimap.


Well, the key is that the non-test-example list I'm going to be sorting for real is over 20GB.
The time consumed in the sort of that large of a data set is why I don't want to sort the second time.
And, ideally, I only care about the order of the strings.
However, I don't have enough room in memory to store all of the strings separately.
Note that the second string in the dump above is actually a substring already in memory within the first string in the dump. I'm only able to fit the 20 GB of data by storing the first string and pointers within it, so practically, I'm concerned with the order of pointers to the strings rather than the order of the strings themselves. The value stored with each key is not so important; in principle, it could be calculated from the pointer value and the first pointer value of the string it came from. However, at the moment, I don't know how to get the first pointer value of the string it came from unless I stored it. Anyway, does that make it clearer why I'm looking to allocate with the same pointer values?

but are you trying to sort all 20GB of strings at once, or can you take s1, s2, and s3 (and of course, their respective substrings) separately?

if you need it all at once, I would consider doing it in a SQL database for the first time (to let the server deal with the memory swapping)

if you want to do it in C++, but don't have enough memory (including virtual memory), you have a memory problem first and a speed problem second

edit: if your dataset is too big to fit into virtual memory, you need to consider

http://en.wikipedia.org/wiki/Hadoop
http://en.wikipedia.org/wiki/MapReduce