Hi,
I've got a situation where I have a huge amount of memory on whose specific pointer values I'm processing with a time consuming algorithm, basically performing a custom sorting of them. Ideally, I would like to be able to save the results of the sort so that I could reuse the results more quickly the next time I run my program. However, what I don't know how to do is make sure that I allocated the exact same pointer values the next time I run the program. Does that make any sense? Does anyone have any suggestions for me?
Thanks,
Sean
A better approach would be to index these items somehow. Like give them some kind of ID number. Then store those ID numbers in the file, rather than store pointer values.
say you have n objects that you want sorted from small to large
1. allocate an data[n] of objects and load these objects in
2. allocate unsigned idx[n], initialized from 0 to n-1
3. when "sorting", don't sort data[] directly; rather, sort idx[n] (so you have 1 level of indirection)
4. when done sorting, idx[0] should be the idx to the smallest entry in data[]
5. write data[] to disk in one big chunk
6. write idx[] to disk in one big chunk
7. next time, read data[] and idx[] from disk in big chunks
Storing pointers doesn't make any sense, because a pointer is valid only within the same process instance that allocated the block it points to. Pointers are not memory addresses. The OS does the translation from pointers to addresses, and after the process quits, that mapping is lost. When you run it once again it can be different. Therefore you can have even two applications sharing exactly the same pointer, but each of these pointers would point to a different physical memory location.
that's why Disch and I advocate the use of indexes which remain valid, as long as the original array of objects remain in the same order - indexes are almost like pointers that you maintain on your own
Okay, I think I get it.
It seems I can't find a way to reproduce the pointers in memory, so I have to store the offsets from the starting pointer in memory, and then the next time i allocate, I can recreate the values with those stored offsets and the new starting pointer. That's only one level of indirection. I think I can handle that. Thanks!
Sean
however, it's harder to do than using indexes, and may not yield that many benefits these days for most applications since we have tons of memory - indexes is so easy to do that you can code it up in less time than it takes to learn how to do memory mapped files correctly - gl
Ok, it turns out that it's not so simple to do an index. I have multiple points of origin, i.e. more than one starting pointer, yet, all pointers are sorted. Thus, I'd have to do quite a bit of conditional math to do offsets. I suppose that overhead wouldn't be horrible, but it might be a real pain to implement. Am I making any sense? Is there still a simple way to index for this type of situation? I suppose maybe I could somehow have a pair structure with which starting point as one of the terms, but at least at the moment I'm still a bit confused about how that might work out. Anyway, this memory-mapped-file idea might be something with potential. I'll definitely have to check that out.
an index is a pointer - it's one that you maintain yourself and instead of an absolute address, it's an offset from the head of your data block
if you need to sort separately by lastname and by zip-code, you maintain two different blocks of indexes
if you need to sort by both lastname and zip-code, your comparison operator would check on zip-code if and only if the lastnames are the same
understanding how to do a memory-mapped-file is much harder than building your own index, but you can try if you like
the only time you cannot use an index is, if your records are variable length, but even then, there are ways to remap your records so they become fixed length
Let me be specific about what type of pointers I'm sorting.
I have a list of char* C strings, each of which has it's own first position and many pointers adjacent to them along those C strings. I then make a map<char*,MyInfoClass,MySortFunction>
with those pointers, the first positions and many of the other positions.
Thus, after the sort is done by inserting all the values in the map, I have a list of values that are offset from different first char* pointer positions. Does that make more sense?
sorry, but you are still being unclear - as a software engineer or as a project manager, I would find it difficult to write a design specification based on what you have just described
without worries about implementation, describe a specific example of what you are trying to do - give the input and describe what the output should look like to satisfy your requirements
in other words, if I describe an implementation that can turn your input into your output, then I have done my job
I seriously doubt there is a case where pointers would work, but indexes wouldn't...
string s1("LMNRQABCNRQLMNIPQLMINOPQLRPGJJJJKILLMMNO");
string s2("ABCDLMNRQAGHIJKLMNBCNRQLMNIPQLMINOPQLRPGHIJKLMNGJJGHIJKLMNNO");
string s3("MNBCNRQLMNIPQLMINOLMNRQABCNRQLMNIPQLMINOPQLRPGJ");
//many more strings with thousands of characters in some
long length=20;
multimap<constchar*, long, LessForCharPointerOfGivenLength>charPointerMappedLocations(length);
//do this for each string such as s1;
stringVal=s1;
for(long chLoc = 0;chLoc + length < stringVal.size();chLoc++)
{
charPointerMappedLocations.push_back(make_pair(s1.c_str()+chLoc,chLoc));
}
This algorithm takes a long time because I have a lot of data going into it, and it actually stores more complex data than a long type which also takes some time to crunch.
s1.c_str()+0 would be an example of a first pointer.
s2.c_str()+0 would an example of another first pointer.
s2.c_str()+2 would an example of another general pointer.
All of my pointers all somewhere in the map as key values though.
What I want to do is store the sorted list for next execution.
The problem is that the next run won't necessarily give me the same pointers to start with.
Does this make sense now?
Well, the key is that the non-test-example list I'm going to be sorting for real is over 20GB.
The time consumed in the sort of that large of a data set is why I don't want to sort the second time.
And, ideally, I only care about the order of the strings.
However, I don't have enough room in memory to store all of the strings separately.
Note that the second string in the dump above is actually a substring already in memory within the first string in the dump above. I'm only able to fit the 20 GB of data while storing the first string and pointers within it, so practically, I'm not concerned with the order of strings as much as the order of pointers to the strings. With regard to the value being stored with it, that is not so important, as in this case, it could in principle be calculated from the pointer value and the first pointer value within the string it came from. However, at the moment, i don't know how to get the first pointer value within the string it came from unless stored it. Anyway, does that make it more clear as to why I'm looking to allocate with the same pointer values?