Faster way to assign values to class member variables?

Working on an Arduino, I want to write the fastest code with the smallest memory footprint, so I decided it would be better to set the member values of an object by getting a pointer to its first member and using pointer arithmetic to iterate one byte at a time, overwriting the values stored at each address. The class object has a bunch of uint16_t members declared at the very top, and they appear to sit at contiguous addresses with no padding. The Arduino Wire library does a byte-by-byte read on the serial line. This is my code on the receive side:

uint8_t *p = reinterpret_cast<uint8_t*>(&SData1.packing_list_bool); // directly overwrite class object memory
for (int i = 0; i < howMany; i++) {
    *p = Wire.read();
    p++;
}
 


This is my code on the send side, which reads directly from memory given a pointer offset.

void transmit_package(uint8_t *offset, uint8_t num_ints) {
  Wire.beginTransmission(9);          // transmit to device #9
  for (int i = 0; i < (num_ints * 2); i++) {
      Wire.write(*offset);
      offset++;
  }
  Wire.endTransmission();             // stop transmitting
}


Code seems to work.

The alternative is to read each byte, shift the second byte of each pair 8 bits to the left, add the pair together to decode a uint16_t, and then assign that to the uint16_t class member variable.
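For comparison, the shift-and-add alternative could look like this. This is only a sketch: `assemble` is a hypothetical helper, and the low-byte-first order assumes the AVR's little-endian member layout that the pointer trick also relies on.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reassemble a uint16_t from two received bytes,
// low byte first (matching the AVR's little-endian layout).
uint16_t assemble(uint8_t lo, uint8_t hi) {
    return static_cast<uint16_t>(static_cast<uint16_t>(lo) |
                                 (static_cast<uint16_t>(hi) << 8));
}
```

Each pair of Wire.read() calls would then feed one `assemble` call, and the result is assigned to the matching member.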

Is there any downside to using the above method? I think the upside is speed, but I am not sure of that either.

Thanks,
Chris
"Seems to work" will only work while this is true.
https://en.cppreference.com/w/cpp/types/is_pod

What you should have are a couple of methods in the class to serialise and deserialise the class instance data to a given buffer.

If that can be achieved with a simple memcpy, fine.
If it can't, then it can do the "right thing" and your other code doesn't mysteriously fail in odd ways.

size_t serialise(unsigned char *buff, size_t bufsize) {
  // throw exception if bufsize is smaller than expected?
  size_t i = 0;
  memcpy(&buff[i], &member1, sizeof(member1)); i += sizeof(member1);
  // etc
  return i;
}
void deserialise(unsigned char *buff, size_t bufsize) {
  // ditto above, but copying from buff to members
}
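To make the pattern concrete, here is a minimal self-contained sketch of such a pair round-tripping through a buffer. The struct and its members (member1, member2) are hypothetical stand-ins for the real class:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical class with a matching serialise/deserialise pair.
struct SensorData {
    uint16_t member1;
    uint16_t member2;

    size_t serialise(unsigned char *buff, size_t bufsize) const {
        if (bufsize < sizeof(member1) + sizeof(member2)) return 0; // buffer too small
        size_t i = 0;
        std::memcpy(&buff[i], &member1, sizeof(member1)); i += sizeof(member1);
        std::memcpy(&buff[i], &member2, sizeof(member2)); i += sizeof(member2);
        return i; // bytes written
    }
    size_t deserialise(const unsigned char *buff, size_t bufsize) {
        if (bufsize < sizeof(member1) + sizeof(member2)) return 0; // buffer too small
        size_t i = 0;
        std::memcpy(&member1, &buff[i], sizeof(member1)); i += sizeof(member1);
        std::memcpy(&member2, &buff[i], sizeof(member2)); i += sizeof(member2);
        return i; // bytes consumed
    }
};
```

The point is that both directions live next to each other in the class, so if the layout ever changes, there is exactly one place to update.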


So transmit would be
unsigned char buff[32]; // from your other thread
size_t n = SData1.serialise(buff,sizeof(buff));
Wire.beginTransmission(9); // transmit to device #9
for (size_t i=0; i < n ; i++) {
    Wire.write(buff[i]);
}
Wire.endTransmission();    // stop transmitting 


I think I see what you mean about serializing the data on the transmit side. However, Wire.write() actually serializes the data into its own internal 32-byte buffer, and Wire.endTransmission() then initiates the transmission of that buffer byte by byte. So I was trying to avoid an intermediate copy by writing the contents of a memory address directly to the Wire buffer. Basically, in your suggestion, buff would be an intermediate copy and would itself occupy memory. I actually considered this solution but decided against it because I didn't see the point of an intermediate buffer or copy assignment when I could write directly to the memory address on the receive side and read directly from the memory address on the transmit side.

What I seem to be ending up with is a perfect copy of X bytes from a class object's memory range on the transmit side to a class object's memory range on the receive side, kind of like a memcpy but over the I2C bus. If the transmit and receive class objects have the same data types in the same memory region, I don't even have to worry about reassembling the received bytes into data types: the uint16_t's on the receive side work the same as the uint16_t's on the transmit side. Some implementations of I2C transfer I saw required bit shifting and addition on the receive side to assemble a uint16_t, for example. That isn't necessary with this technique. However, while it seems easy, I wonder if I am introducing any issues. What is the price of this convenience?
I would like to be able to write the fastest code with the smallest memory footprint
Then don't write the data 1 byte at a time. Use Wire.write(address, bytes) and Wire.readBytes(address, num).

I agree with salem c that you should write a pair of matching functions to do this, but if you want to maximize performance, you might serialize/deserialize directly to the stream:
size_t serialize(Stream &out) {
    return out.write((char*)this, sizeof(*this));
}

size_t deserialize(Stream &in) {
    return in.readBytes((char*)this, sizeof(*this));
}

When I need speed, I often simply bypass serialization entirely by using POD structs/classes.** For some compilers you have to repack to 1-byte alignment; don't forget, or you can get into a mess. Directly dumping the object works great with .read() and .write().
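A minimal sketch of that POD-dump idea, with hypothetical member names. Note that `__attribute__((packed))` is a GCC/avr-gcc extension, and the static_asserts document the assumptions that make the raw byte transfer safe:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Packed, trivially copyable struct: can be sent and received as one
// raw block of bytes. Member names are hypothetical.
struct __attribute__((packed)) Telemetry {
    uint16_t packing_list_bool;
    uint16_t packing_list_data;
    uint16_t value;
};
static_assert(std::is_trivially_copyable<Telemetry>::value,
              "must stay trivially copyable for raw byte transfer");
static_assert(sizeof(Telemetry) == 3 * sizeof(uint16_t),
              "no padding expected once packed");
```

If either assertion ever fails after a change to the class, the build breaks instead of the I2C link silently corrupting data.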

** This has some gotchas. Member functions are FINE; they do not affect the size of the object. But virtual members DO change it: they add a pointer to the size for the vtable. static members can do weird things (I honestly don't know how it works, but it seems to break up the solid block of memory of the POD struct, because the static variable isn't in the block). And you cannot have pointers either; really only arrays and simple types (char, int, double). As soon as you have an object (vector, string, your own, whatever) or a pointer, you can't do it safely anymore. However, even here you can go partway. The compiler lays this stuff out top-down. If you need a vector, you can put it as the last member, do a read() up to the vector, resize the vector, and do a second read directly into &vec[0]. Not quite as good as one read, but certainly very good. This happens a lot in messaging, where you get the size of an upcoming block and then the block, but the size is not fixed and the maximum size is too big to make an array for each message.
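The "vector as last member" trick could be sketched like this, with memcpy standing in for the two stream read() calls (the struct and its layout are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Read the fixed-size header first, then resize the vector and read the
// variable-length payload directly into its storage.
struct Message {
    uint16_t count;              // size of the variable-length block
    std::vector<uint16_t> data;  // kept conceptually "last"

    void decode(const uint8_t *wire) {
        std::memcpy(&count, wire, sizeof(count));      // first read: header only
        data.resize(count);
        if (count > 0)
            std::memcpy(data.data(), wire + sizeof(count),
                        count * sizeof(uint16_t));     // second read: payload
    }
};
```

Two reads instead of one, but the payload still lands in its final storage with no per-element reassembly.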

There isn't really a "pretty" way to do POD structs. Their efficiency is tied to the limitations you impose, and the limitations are a pain. So you have a choice: serialize more useful objects into simple ones to write (and the reverse to read), or work directly with the simplified objects and give up the more powerful designs (my choice, for speed; you can work around the limited objects with some smoke and mirrors).
> I'm wonder if I am introducing any issues. What is the price of this convenience?
TBH, it smells of premature optimisation.

Your first order of business is writing clean reliable code with the most chance of success with the least amount of effort.

You've already said I2C is slow, so one extra memcpy for the processor just isn't going to be detectable in the big scheme of things.

Having something working is your baseline for measuring performance.
Maybe it's already good enough at that point, in which case you saved yourself a whole lot of effort and frustration.

But if it isn't good enough, you then profile to find out where the real hot spots are (they're seldom where you expect). You don't want to be optimising the wrong things.

So, having found a hotspot, you come up with a plan to improve things and test again (you do this on a branch in your source control system).
If it doesn't work, you can just throw the branch away.
If it does work, you can just merge the branch back to your master branch.
Pass them in a container and use the container directly. But you won't notice the speed difference; it just saves a lot of senseless typing. What's with the idiotic links? (Not exactly spam, but screwy things to link with code.)
I am now leaning more towards using a temporary buffer to hold the sequence of bytes. I thought I was limited to one byte at a time when writing to the I2C buffer, so the iteration was necessary. If I can write a sequence of bytes in a single call, that would certainly be much faster.

The way the program is supposed to work is that the slave sends an order_list_bool (a list of booleans the slave cares about at the moment) and an order_list_data (numerical values), both lists in the form of a uint16_t. The master responds with a packing_list_bool, which will have bits set for the particular bools requested, and a packing_list_data, which will have bits set for each corresponding variable whose updated data is sent after the list. Basically, if at least one bit is set in packing_list_data, then there is one uint16_t that follows it.

I was thinking of checking the status of each bit in packing_list_data between Wire.beginTransmission() and Wire.endTransmission(), using pointer math to jump over memory blocks, reading only from specific offsets and writing to the I2C buffer directly. Now it makes more sense to build a buffer containing packing_list_bool, packing_list_data, and the rest of the numerical data, packaged as one uint16_t each. The sequence will be written with one Wire.write(address, bytes) call. Then the I2C bus is released and the slave sorts the data into its respective bins. The bus will only be held for as long as the byte transfer takes.
I have read a little about the serializer libraries available out there, but I don't see much benefit to using one from a performance standpoint. Plus, all the information needed to extract the data is already present in the byte sequence. I plan to write a serializer function that simply loads a series of uint16_t into a buffer array using the algorithm above, and then a deserializer function that unloads the uint16_t from the buffer into specific variables on the slave side. I'll try not to do anything that messes up the object memory layout. Now, for the buffer: vector vs. array...
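A minimal sketch of such a pair, under the stated plan. The function names are hypothetical, and it assumes both ends share the same little-endian layout so a straight byte copy is enough:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Pack a series of uint16_t values into a byte buffer, so the whole
// sequence can go out in one Wire.write(buf, n) call on the master.
size_t pack_u16(uint8_t *buf, const uint16_t *vals, size_t count) {
    std::memcpy(buf, vals, count * sizeof(uint16_t));
    return count * sizeof(uint16_t);   // byte count to hand to Wire.write()
}

// Unpack the received bytes back into uint16_t variables on the slave.
void unpack_u16(const uint8_t *buf, uint16_t *vals, size_t count) {
    std::memcpy(vals, buf, count * sizeof(uint16_t));
}
```

Because both AVR ends are little-endian, no per-value shifting is needed; the memcpy pair is the whole serializer.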
What functionality from <vector> would you use here?
The vector takes up a little more space, and it has to be coded just so to make it behave the way an array already does. Vectors are awesome, but if you are not going to do anything with it that an array cannot do, use an array. If you will use some of its power, use the vector. There are also std::array, std::valarray, and possibly other options. Any of the objectified ones will use more memory than a plain array (again, tens of bytes, not much in the grand scheme), but they give you tools in exchange.

Your hardware is kind of limited, so don't hesitate to use simple tools where they are sufficient.
You're right. An array would work better. In fact, I just realized that between data retrieval cycles from the PC to the master Arduino, I really don't need to preserve any local data except for the variables (strings, ints, and floats) that were directly queried from the PC. That way, on the next data retrieval cycle, I can check if a value has changed, and if not, there is no need to encode it and send it down the I2C bus again. This means I really don't need variables to hold packing_list_bool, packing_list_data, order_list_bool, and order_list_data. All I need is a 32-byte uint16_t array (which will ensure that the memory locations are contiguous), even if I modify the class to be non-POD.

The master requests the order_list_bool and order_list_data from the slave and writes them to buff[0] and buff[1]. Then it retrieves the data and sets the bits, thereby changing order_list_bool and order_list_data into packing_list_bool and packing_list_data. Actual numerical data is stored in the next index of the array, and so forth. Keep track with a counter of how many bytes have been written to the array, then present the buffer and the byte count to Wire.write(). So all I need in the class is a static 32-byte uint16_t array and a bunch of variables (unique to each slave) to hold data for comparison in the next retrieval cycle.

I was trying to figure out how to make sure the buffer array is contiguous with, and follows, the memory locations of packing_list_bool and packing_list_data, when it occurred to me that they can simply be the first two elements of the array.
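That layout could be sketched like this (all names are hypothetical): the packing lists occupy the first two slots of the transfer buffer itself, so there are no separate member variables and no contiguity worries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The 32-byte transfer buffer doubles as storage for the packing lists.
struct MasterPacket {
    static const size_t kSlots = 16;   // 16 x uint16_t = 32 bytes
    uint16_t buf[kSlots] = {0};        // buf[0] = packing_list_bool,
    size_t used = 2;                   // buf[1] = packing_list_data

    // Append one data value and flag its bit in packing_list_data.
    void add(uint8_t bit, uint16_t value) {
        buf[1] |= static_cast<uint16_t>(1u << bit);
        buf[used++] = value;
    }
    // Byte count to pass alongside the buffer to Wire.write().
    size_t bytes() const { return used * sizeof(uint16_t); }
};
```

On the wire, the slave reads buf[1], counts the set bits, and knows exactly how many uint16_t values follow.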
Topic archived. No new replies allowed.