I want to write a function which, depending on the encodingType (consider only the ints 0, 1, 2, 3), casts the base_pointer to a different uint pointer type. Then I can do some pointer arithmetic (my aim).
Is that possible?
void cast(int encodingType, uint8_t *base_pointer) {
    switch (encodingType) {
        case 0:
            uint8_t* ptr = reinterpret_cast<uint8_t*>(base_pointer);
            break;
        case 1:
            uint16_t* ptr = reinterpret_cast<uint16_t*>(base_pointer);
            break;
        case 2:
            uint32_t* ptr = reinterpret_cast<uint32_t*>(base_pointer);
            break;
        case 3:
            uint64_t* ptr = reinterpret_cast<uint64_t*>(base_pointer);
            break;
    }
    /// then use ptr like ptr + 1 ...
    /// ...
    /// ...
} /// End of this function
It's not possible to do exactly that since ptr needs to have a single definite type, not four different types.
If I knew exactly what you want to do with it I could tell you the best thing to do, but all I can go on is a vague mention of "pointer arithmetic". So, guessing, one possible solution is to not cast at all and instead use a "size" variable like so:
const size_t size = 1u << encodingType; // 0 <= encodingType <= 3, so size is 1, 2, 4 or 8 bytes
// Keep the pointer as a plain uint8_t* and let `size` play the role of the element type.
uint8_t *ptr = base_pointer;
ptr += size; // moving by `size` bytes, i.e. by one element
// And of course I have no idea what you want to do with it.
Thank you dutch.
I think in my case I still have to keep using reinterpret_cast. The aim is to use the __m256i registers and the SIMD/AVX instructions. (I also thought about the fake type and the bit shifting, but that can cost a bit of performance, which I want to avoid.)
My current solution is simple and dumb: write 4 different functions. They work well with good performance, but they are very similar (except for the pointer types). I want to simplify things and fold these 4 functions into one compact one.
Do you think it is still impossible? If so, I can drop this attempt, because I just want to reduce the lines of code and KEEP the current performance. (I think my current 4 functions are very well optimized by the compiler and work well.)
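(For reference, the width-dependent AVX intrinsics can themselves be selected at compile time, so a single templated loop is at least possible. A minimal sketch, assuming C++17 and AVX2; broadcast and cmpeq are made-up helper names, not anything from the real code.)

#include <immintrin.h>
#include <cstdint>

// Broadcast one value of width sizeof(T) into all lanes of a __m256i.
template <typename T>
__m256i broadcast(T v) {
    if constexpr (sizeof(T) == 1)      return _mm256_set1_epi8(static_cast<char>(v));
    else if constexpr (sizeof(T) == 2) return _mm256_set1_epi16(static_cast<short>(v));
    else if constexpr (sizeof(T) == 4) return _mm256_set1_epi32(static_cast<int>(v));
    else                               return _mm256_set1_epi64x(static_cast<long long>(v));
}

// Lane-wise equality compare, again selected by element width.
template <typename T>
__m256i cmpeq(__m256i a, __m256i b) {
    if constexpr (sizeof(T) == 1)      return _mm256_cmpeq_epi8(a, b);
    else if constexpr (sizeof(T) == 2) return _mm256_cmpeq_epi16(a, b);
    else if constexpr (sizeof(T) == 4) return _mm256_cmpeq_epi32(a, b);
    else                               return _mm256_cmpeq_epi64(a, b);
}

// A templated loop body can then use broadcast<T>/cmpeq<T> together with
// _mm256_loadu_si256 on reinterpret_cast<const __m256i*>(ptr + i); each
// instantiation compiles down to the width-specific instructions.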
I still don't know the details of what you are doing. You should post the code that uses the pointer so we can see what can be done. My idea is to only use the uint8_t pointer and multiply by the size of the object wherever necessary. JLB's idea is a variant type, which is potentially a more general, and very C++, solution.
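To make the variant idea concrete, a minimal sketch (assuming C++17; ColumnPtr and make_ptr are illustrative names only):

#include <cstdint>
#include <variant>

// One variant that can hold any of the four pointer types.
using ColumnPtr = std::variant<uint8_t*, uint16_t*, uint32_t*, uint64_t*>;

ColumnPtr make_ptr(int encodingType, uint8_t *base) {
    switch (encodingType) {
        case 1:  return reinterpret_cast<uint16_t*>(base);
        case 2:  return reinterpret_cast<uint32_t*>(base);
        case 3:  return reinterpret_cast<uint64_t*>(base);
        default: return base;  // encodingType == 0
    }
}

// std::visit instantiates the lambda once per pointer type, so the body is
// written once but compiled (and optimized) separately for each width:
//     std::visit([&](auto *ptr) { /* ... use ptr + i ... */ },
//                make_ptr(encodingType, base_pointer));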
template <>
void count<EncodingType::byte1, CompareType::EQUAL>(const Predicate &p,
                                                    const uint32_t dataLength,
                                                    uint8_t *column_base_pointer,
                                                    std::vector<uint32_t> &col_count) {
    SMA* sma_ptr = reinterpret_cast<SMA*>(column_base_pointer);
    const auto [min, max] = sma_ptr->getSMA_min_max();
    const uint64_t value = p.val;
    /// 2 SMA + Padding = 32B
    column_base_pointer += 32;
    const uint8_t differ = value - min;
    if (col_count.size() == 0) {
        /// Case for first scan => full loop over all data needed
        /// Initial match with val not in range => then skip
        if (value < min || value > max) return;
#ifdef SCALAR
        for (size_t in = 0; in < dataLength; in++) {
            if (*(column_base_pointer + in) == differ) {
                col_count.push_back(in);
            }
        }
#endif
    } else {
        /// Case for not the first scan (col_count already holds some matches)
        if (value < min || value > max) {
            /// Initial match with val not in range => all invalid
            std::fill(col_count.begin(), col_count.end(), UINT32_MAX);
        } else {
            for (size_t in = 0; in < col_count.size(); ++in) {
                if (col_count[in] != UINT32_MAX /* not UINT32_MAX: there is (was) a match */
                    && *(column_base_pointer + (size_t)col_count[in]) != differ) {
                    /// Not equal, then marked as INVALID
                    col_count[in] = UINT32_MAX;
                }
            }
        }
    }
}
template <>
void count<EncodingType::byte2, CompareType::EQUAL>(const Predicate &p,
                                                    const uint32_t dataLength,
                                                    uint8_t *column_base_pointer,
                                                    std::vector<uint32_t> &col_count) {
    SMA* sma_ptr = reinterpret_cast<SMA*>(column_base_pointer);
    const auto [min, max] = sma_ptr->getSMA_min_max();
    column_base_pointer += 32;
    uint16_t* ptr = reinterpret_cast<uint16_t*>(column_base_pointer);
    const uint64_t value = p.val;
    uint16_t differ = value - min;
    if (col_count.size() == 0) {
        /// Case for first scan => full loop over all data needed
        /// Initial match with val not in range => then skip
        if (value < min || value > max) return;
#ifdef SCALAR
        /// in < (dataLength >> 1), i.e. in < dataLength / 2
        /// Reason: each tuple is a 2-byte uint16_t
        for (size_t in = 0; in < dataLength >> 1; in++) {
            if (*(ptr + in) == differ) {
                col_count.push_back(in);
            }
        }
#endif
    } else {
        /// Case for not the first scan (col_count already holds some matches)
        if (value < min || value > max) {
            /// Initial match with val not in range => all invalid
            std::fill(col_count.begin(), col_count.end(), UINT32_MAX);
        } else {
            for (size_t in = 0; in < col_count.size(); ++in) {
                if (col_count[in] != UINT32_MAX /* not UINT32_MAX: there is (was) a match */
                    && *(ptr + (size_t)col_count[in]) != differ) {
                    /// Not equal, then marked as INVALID
                    col_count[in] = UINT32_MAX;
                }
            }
        }
    }
}
I don't think my idea works very well after all. It's easy enough to point to the beginning of an n-byte object at some position in the array, but doing the comparisons requires byte-wise comparisons in a loop, which is not great for performance. And then, since differ uses the largest size (uint64_t), endianness comes into play in the byte-wise equality comparison. Little-endian (like our Intel/AMD CPUs) works well, since the "little end" comes first, but big-endian would need an offset from the start of differ. Doable, but fiddly.
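(For what it's worth, the byte-wise extraction can also be phrased as a memcpy of `size` bytes into a zero-initialized uint64_t, which on little-endian yields the value directly. A minimal illustrative sketch, not the code referred to below; load_element is a made-up name.)

#include <cstddef>
#include <cstdint>
#include <cstring>

// Load the i-th element of width `size` (1, 2, 4 or 8 bytes) as a uint64_t.
// On little-endian the low-order bytes come first, so the zero-padded copy
// compares equal to the widened `differ` value; big-endian would need the
// byte-order handling discussed above.
inline uint64_t load_element(const uint8_t *base, size_t i, size_t size) {
    uint64_t v = 0;
    std::memcpy(&v, base + i * size, size);
    return v;
}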
And I'm not sure what to do about the EncodingType::byte1/byte2 difference. What exactly is that for?
Anyway, this is incomplete. I left EncodingType::byte1 at the top. It won't work on big-endian. And obviously there could be other things wrong with it. I didn't even try compiling this.
If all you're trying to do is save a little code repetition, I don't think this is the way to do it.
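If saving the repetition is the main goal, the more usual route is a single function template over the element type, which the compiler still instantiates and optimizes separately for each width. A rough, untested sketch against the code posted above; scan_equal is a made-up name, and the SMA/Predicate details (and the includes the posted code already has) are assumed from that code:

template <typename T>
void scan_equal(const Predicate &p, const uint32_t dataLength,
                uint8_t *column_base_pointer,
                std::vector<uint32_t> &col_count) {
    SMA* sma_ptr = reinterpret_cast<SMA*>(column_base_pointer);
    const auto [min, max] = sma_ptr->getSMA_min_max();
    const uint64_t value = p.val;
    column_base_pointer += 32;                   /// 2 SMA + Padding = 32B
    T* ptr = reinterpret_cast<T*>(column_base_pointer);
    const T differ = static_cast<T>(value - min);
    const size_t n = dataLength / sizeof(T);     /// bytes -> tuples

    if (col_count.size() == 0) {
        if (value < min || value > max) return;
        for (size_t in = 0; in < n; in++) {
            if (ptr[in] == differ) col_count.push_back(in);
        }
    } else if (value < min || value > max) {
        std::fill(col_count.begin(), col_count.end(), UINT32_MAX);
    } else {
        for (size_t in = 0; in < col_count.size(); ++in) {
            if (col_count[in] != UINT32_MAX && ptr[col_count[in]] != differ) {
                col_count[in] = UINT32_MAX;
            }
        }
    }
}

/// Each explicit specialization then just forwards, e.g.:
/// template <>
/// void count<EncodingType::byte2, CompareType::EQUAL>(const Predicate &p,
///         const uint32_t dataLength, uint8_t *column_base_pointer,
///         std::vector<uint32_t> &col_count) {
///     scan_equal<uint16_t>(p, dataLength, column_base_pointer, col_count);
/// }

Since each instantiation's hot loop is the same code the hand-written versions contain, there is no obvious reason to expect a performance difference.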