Unexplained code behavior

Hi guys.

I'm working with a somewhat exotic hw design [MicroBlaze + DDR], on top of which I build
sw consisting of many .cpp and .h files.

At some point I ran into unexplained behavior. Let's say I have these functions:
int fun_a (int swdevice) {
....
result = fun_b(swdevice);
....
}

At some point at runtime fun_a calls fun_b while swdevice is 0, so the call is effectively:

result = fun_b(0);

When the function reaches its last line, swdevice suddenly changes to some value - 354. So fun_a now sees swdevice as 354.

I suspect it has something to do with stack memory (although I give it more than enough space). Another option is the MicroBlaze/DDR configuration.

In your opinion, what could be the reason for this oddity?


In your opinion, what could be the reason for this oddity?


Why don't you just show some code instead of asking for a forensic investigation without evidence?

Nothing in the (extremely limited) snippet of code that you cite would suggest any problem with stack memory (though it might suggest accidentally overwriting some unexpected bit of memory - a completely different problem). Turn the debugger on.


result = fun_b(swdevice);
....
result = fun_b(0):
when the function reach the last line, swdevice suddenly change to some value - 354. So fun_a now sees swdevice as 354.
Your explanation of the problem makes zero sense, since you appear to have swdevice as 0, not +-354.


OK, I see. I have the same code working on another hw platform, so I am not sure the problem is within the function code. Here it is anyway:

S16BIT _DECL sitalRt_MessageLegality_Enable (	S16BIT swDevice,
												U16BIT wOwnAddressOrBroadcast,
												U16BIT wMessageDirection,
												U16BIT wSubaddress,
												U32BIT dwWordCountOrModeCodeMask)
{
	/// @pseudocode

	/// If any state irrelevancy or illegal input parameter or operation failure is identified:
	///		Return error.
	if (((S16BIT)0 > swDevice) || ((S16BIT)sitalMaximum_DEVICES <= swDevice))
	{
		return sitalReturnCode_INVALID_DEVICE_NUMBER;
	}

	const DeviceStateStructure* dsspDeviceState // A pointer to the state structure of given device.
								= &(s_dssaDevices[swDevice]);
	if ((sitalMode_RT != dsspDeviceState->wMode) && (sitalMode_RT_AND_MT != dsspDeviceState->wMode))
	{
		return sitalReturnCode_INVALID_MODE;
	}

	if ((sitalDeviceState_READY != dsspDeviceState->wCurrentState) && (sitalDeviceState_RUN != dsspDeviceState->wCurrentState))
	{
		return sitalReturnCode_INVALID_STATE;
	}

	BOOLEAN bIsLegalParameter; // A flag that says whether an input parameter is legal.
	switch (wOwnAddressOrBroadcast)
	{
	case sitalRtAddressType_BROADCAST:
	case sitalRtAddressType_OWN:
	case sitalRtAddressType_BOTH:
		bIsLegalParameter = TRUE;
		break;
	default:
		bIsLegalParameter = FALSE;
		break;
	}
	if (FALSE == bIsLegalParameter)
	{
		return sitalReturnCode_INVALID_PARAMETER;
	}

	switch (wMessageDirection)
	{
	case sitalMessageDirection_RX:
	case sitalMessageDirection_TX:
	case sitalMessageDirection_BOTH:
		bIsLegalParameter = TRUE;
		break;
	default:
		bIsLegalParameter = FALSE;
		break;
	}
	if (FALSE == bIsLegalParameter)
	{
		return sitalReturnCode_INVALID_PARAMETER;
	}

	if ((0U > wSubaddress) || ((rtSubaddress_BORDER <= wSubaddress) && (sitalRtSubaddress_ALL != wSubaddress)))
	{
		return sitalReturnCode_INVALID_PARAMETER;
	}

	/// Set loop delimiters for the direction, address type, and subaddress loops.
	U16BIT wDirectionLoopBase; // Base value of direction loop.
	U16BIT wDirectionLoopBorder; // Border value of direction loop.
	if (sitalMessageDirection_BOTH == wMessageDirection)
	{
		wDirectionLoopBase = 0U;
		wDirectionLoopBorder = messageDirection_BORDER;
	}
	else
	{
		wDirectionLoopBase = wMessageDirection;
		wDirectionLoopBorder = (wMessageDirection + (U16BIT)1U);
	}

	U16BIT wAddressLoopBase; // Base value of address loop.
	U16BIT wAddressLoopBorder; // Border value of address loop.
	if (sitalRtAddressType_BOTH == wOwnAddressOrBroadcast)
	{
		wAddressLoopBase = 0U;
		wAddressLoopBorder = rtAddressType_BORDER;
	}
	else
	{
		wAddressLoopBase = wOwnAddressOrBroadcast;
		wAddressLoopBorder = (wOwnAddressOrBroadcast + (U16BIT)1U);
	}

	U16BIT wSubaddressLoopBase; // Base value of subaddress loop.
	U16BIT wSubaddressLoopBorder; // Border value of subaddress loop.
	if (sitalRtSubaddress_ALL == wSubaddress)
	{
		wSubaddressLoopBase = 0U;
		wSubaddressLoopBorder = rtSubaddress_BORDER;
	}
	else
	{
		wSubaddressLoopBase = wSubaddress;
		wSubaddressLoopBorder = (wSubaddress + (U16BIT)1U);
	}

	/// Loop over all given combinations of direction, address type, and subaddress:
	///		Update each combination's illegalization specifications.
	for (S32BIT iDirection=(S32BIT)wDirectionLoopBase; iDirection<(S32BIT)wDirectionLoopBorder; iDirection++)
	{
		for (S32BIT iAddress=(S32BIT)wAddressLoopBase; iAddress<(S32BIT)wAddressLoopBorder; iAddress++)
		{
			for (S32BIT iSubaddress=(S32BIT)wSubaddressLoopBase; iSubaddress<(S32BIT)wSubaddressLoopBorder; iSubaddress++)
			{
				// Find the address of the entry of the target table that corresponds current combination.
				U16BIT wTableEntryAddress; // The address of an entry in the target table.
				wTableEntryAddress = ((U16BIT)rtAddressMap_COMMAND_ILLEGALIZATION_TABLE | (U16BIT)(iDirection << rtCommandIllegalizationTable_OFFSET_OF_DIRECTION) | (U16BIT)(iAddress << rtCommandIllegalizationTable_OFFSET_OF_ADDRESS_TYPE) | (U16BIT)(iSubaddress << rtCommandIllegalizationTable_OFFSET_OF_SUBADDRESS));

				// Calculate the new mask.
				U16BIT wMaskLowWord; // The value of the least significant word of the new mask.
				U16BIT wMaskHighWord; // The value of the most significant word of the new mask.
				U32BIT dwNewMask; // The value of the new illegalization mask.
				S16BIT swResult; // Result of operation or function call.
				swResult = sitalDevice_AccessMemory	(swDevice, sitalDeviceAccessOperation_Read, wTableEntryAddress, sitalDeviceMemorySection_Ram, 1U, &wMaskLowWord);
				if (sitalReturnCode_SUCCESS != swResult)
				{
					return swResult;
				}
				swResult = sitalDevice_AccessMemory	(swDevice, sitalDeviceAccessOperation_Read, (wTableEntryAddress + 1U), sitalDeviceMemorySection_Ram, 1U, &wMaskHighWord);
				if (sitalReturnCode_SUCCESS != swResult)
				{
					return swResult;
				}
				dwNewMask = wMaskHighWord;
				dwNewMask = ((dwNewMask << 16U) | wMaskLowWord);
				dwNewMask &= (~dwWordCountOrModeCodeMask);

				// Write the new mask in the entry of the target table that corresponds current combination.
				wMaskLowWord = (U16BIT)(dwNewMask & 0xFFFFU);
				wMaskHighWord = (U16BIT)(dwNewMask >> 16U);
				swResult = sitalDevice_AccessMemory	(swDevice, sitalDeviceAccessOperation_Write, wTableEntryAddress, sitalDeviceMemorySection_Ram, 1U, &wMaskLowWord);
				if (sitalReturnCode_SUCCESS != swResult)
				{
					return swResult;
				}
				swResult = sitalDevice_AccessMemory	(swDevice, sitalDeviceAccessOperation_Write, (wTableEntryAddress + 1U), sitalDeviceMemorySection_Ram, 1U, &wMaskHighWord);
				if (sitalReturnCode_SUCCESS != swResult)
				{
					return swResult;
				}
			}
		}
	}

	return sitalReturnCode_SUCCESS;
}




Nothing in the (extremely limited) snippet of code that you cite would suggest any problem with stack memory (though it might suggest accidentally overwriting some unexpected bit of memory - a completely different problem).

To my acknowledge, the stack memory stores all of the functions local variables.

Turn the debugger on.

So after debugging I noticed that swDevice [of the calling function] pointer is changing right after exiting the above function. Hence the value of swDevice changes accordingly.

Your explanation of the problem makes zero sense, since you appear to have swdevice as 0, not +-354.


That's what makes me confused 🤔

Hope it is more clear now.





You are not using a reference parameter,
so ... let's see if this is your problem.

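#include <iostream>
using std::cout;

void foo(int x) // x is a copy of the caller's argument
{
  x = 42;
  cout << x << '\n';
}
int main()
{
  int y = 1234;
  cout << y << '\n';
  foo(y);
  cout << y << '\n'; // still 1234: foo changed only its own copy
}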
The output would be 1234, 42, and 1234 again, because main's y is not changed by foo; x is a local copy.
Add a & to the parameter, and then the output is
1234, 42, 42, because that change makes foo's x a reference to y, not a copy of it, so changes propagate back to the caller.

Is this your problem? A common confusion with pointers is that their DATA is by reference but the pointer itself is NOT.
That is:
void foo(int *ip)
{
  // Here ip is a COPY of the passed-in pointer,
  // but the data pointed to by ip is at its original location.
  // So...
  ip[42] = 11; // the caller's pointed-to data has changed (the caller's array needs more than 42 elements)
  // However:
  ip = new int(31415); // this is a local change; the caller's pointer is unchanged (and this allocation is leaked)
}


You can pass a pointer by reference:
void foo (int *& ip)
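
A minimal sketch of the difference, in case it helps. The function names here are made up for illustration and have nothing to do with the code posted in this thread.

```cpp
#include <cassert>

// The pointer is copied: writing through it changes the caller's data,
// but reseating it only changes the local copy.
void reseat_by_value(int* p, int* other)
{
    *p = 11;    // visible to the caller
    p = other;  // local change only
}

// The pointer itself is passed by reference: reseating it is visible to the caller.
void reseat_by_reference(int*& p, int* other)
{
    p = other;
}

// Returns true if both observations described in the comments above hold.
bool pointer_demo_holds()
{
    int a = 1;
    int b = 2;
    int* ptr = &a;

    reseat_by_value(ptr, &b);
    const bool value_case = (ptr == &a) && (a == 11);

    reseat_by_reference(ptr, &b);
    const bool reference_case = (ptr == &b);

    return value_case && reference_case;
}
```

Calling pointer_demo_holds() from main should return true. Note that neither case applies to a plain value parameter such as swDevice, which is neither a pointer nor a reference.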
To my acknowledge[sic], the stack memory stores all of the functions local variables.

I think you misunderstood the point of the comment. You aren't overflowing the stack.


So after debug

Why don't you STEP through this routine, checking the values of variables and seeing at which of many possible points it exits?


swDevice [of the calling function] pointer is changing right after exit the above function

Absolutely no idea what you mean by that.


wTableEntryAddress appears to be 16-bit (as far as I can judge from your code).
wTableEntryAddress + 1U would (probably) be 32-bit.
This might - or might not - make a difference to the second call to sitalDevice_AccessMemory(...) depending on the hardware and the order that it stores bytes, but you haven't given either the code or the interface for that function.
Several other function calls might suffer from the mix of 16-bit and 32-bit arguments.
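
To make the width remark concrete, here is a small sketch. It assumes U16BIT is a typedef for a 16-bit unsigned integer (its definition is not shown in this thread), and next_wide / next_narrow are hypothetical helpers, not part of the posted code.

```cpp
#include <cstdint>
#include <type_traits>

using U16BIT = std::uint16_t; // assumption: the real typedef is not shown in the thread

// Adding 1U to a 16-bit unsigned value yields an unsigned int, not a U16BIT.
static_assert(std::is_same<decltype(U16BIT{} + 1U), unsigned int>::value,
              "U16BIT + 1U has type unsigned int");

// The sum as the compiler computes it, before any narrowing.
constexpr unsigned int next_wide(U16BIT address)
{
    return address + 1U;
}

// The sum explicitly narrowed back to 16 bits (wraps modulo 65536).
constexpr U16BIT next_narrow(U16BIT address)
{
    return static_cast<U16BIT>(address + 1U);
}
```

With a 32-bit int, next_wide(0xFFFF) is 0x10000 while next_narrow(0xFFFF) is 0. Whether this matters for sitalDevice_AccessMemory depends on its parameter types: if the address parameter is declared as U16BIT, the argument is converted back implicitly and nothing changes.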



I'm not convinced that it is necessarily the function that you've posted that is at fault.

lastchance ---> Yes, that is a good direction. However, that is not where the problem is. As I mentioned, the code works properly on another hw platform.

swDevice [of the calling function] pointer is changing right after exit the above function

Absolutely no idea what you mean by that.


I mean that, if you take my initial example, when fun_b ends I noticed that the swdevice pointer in fun_a has changed, and so has its value. So fun_a no longer sees swdevice as 0.

I mean that- If you take my initial example - when fun_b ends, i noticed that swdevice pointer in fun_a has changed, and so its value. so fun_a now sees swdevice not as 0.


We can't see anything in what you have posted about an swdevice pointer, and we can't see your calling function, and we can't see the output of your debugger ... so unless you can provide some minimal compilable example that we can test ourselves, it would be very difficult to debug your code.
What's the definition of fun_b?

When the code works/doesn't, are you using the same compiler with different options, or different compilers? Have you looked at the generated code for the version that doesn't work to see if it is correct?

In fun_a(), are you using swdevice after the call to fun_b? If not, then since swdevice is passed by value and hence doesn't exist after fun_a returns, the compiler could be doing some clever memory optimisation by re-using the memory used for swdevice inside fun_a for something else. As this doesn't affect operation, it is an allowed optimisation.
Line 61: (0U > wSubaddress) is always false. An unsigned is never less than 0.

Line 123: Each time through the loop, swResult is technically destroyed and recreated. If it has no constructor or destructor then no code executes, but between lines 115 and 123, the program is free to use the memory occupied by swResult for other things. Could this be the cause of your troubles?
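
On the line 61 remark, a short sketch of why the first comparison is dead code. The constant values below are placeholders (the real values of rtSubaddress_BORDER and sitalRtSubaddress_ALL are not shown in this thread), and U16BIT is assumed to be a 16-bit unsigned type.

```cpp
#include <cstdint>

using U16BIT = std::uint16_t; // assumption: the real typedef is not shown

// Placeholder values for illustration only.
constexpr U16BIT kSubaddressBorder = 32U;
constexpr U16BIT kSubaddressAll = 0xFFFFU;

// Same shape as the posted check, including the comparison that can never be true.
constexpr bool is_invalid_original(U16BIT wSubaddress)
{
    return (0U > wSubaddress)
        || ((kSubaddressBorder <= wSubaddress) && (kSubaddressAll != wSubaddress));
}

// Equivalent check with the always-false term removed.
constexpr bool is_invalid_simplified(U16BIT wSubaddress)
{
    return (kSubaddressBorder <= wSubaddress) && (kSubaddressAll != wSubaddress);
}
```

Both functions return the same result for every input, so the (0U > wSubaddress) term is harmless but misleading. Some compilers warn about it when extra warnings are enabled.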
I still don't understand, where do you see the problem you're talking about?

Is it only in the debugger? Then that is not necessarily a problem, at least not if you have compiler optimizations enabled.
i noticed that swdevice pointer in fun_a has changed
Since 'swdevice' is not a pointer what is it that has actually changed?

You pass a copy of the value that is not supposed to change outside (within the caller context)
I mentioned that the code is working properly on another hw platform.

Let's assume that your program has undefined behaviour (UB) somewhere.
On one platform the UB produces all the trouble.
On the other platform the UB output can be indistinguishable from "working properly".

Does it "work", or are you just unlucky with UB?
Hi folks.

I am not using optimizations of any kind.
And the problem arises only at runtime.

i noticed that swdevice pointer in fun_a has changed
Since 'swdevice' is not a pointer what is it that has actually changed?


In the debugger I can see the actual memory address that each local variable is stored at. For example: 0x80156d38 for fun_a. This address is somehow changing after fun_b finished..

Line 123: Each time through the loop. swResult is technically destroyed and recreated. If it has no constructor or destructor then no code executes, but between lines 115 and 123, the program is free to use the memory occupied swResult for other things. Could this be the cause of your troubles?


Here we are talking about swdevice, not swResult.

You pass a copy of the value that is not supposed to change outside (within the caller context)


Exactly, it is not supposed to happen.


Jonathan100 wrote:
Here we are talking about swdevice, not swResult.

Yes, we know that you are talking about swDevice.

However, swDevice is NOT a pointer (it's an S16BIT, whatever that is), yet you keep saying
swDevice [of the calling function] pointer


You aren't showing us anything like enough code to understand what you are trying to say, let alone debug.
This address is somehow changing after fun_b finished..
After a function is finished the memory of the local variables is (kind of) freed and can be reused by other functions such as streaming to cout.
Hi folks, thanks for your comments.

One of my colleagues introduced me to the idea of exception handling.

I think the code may be working on Zynq because Zynq somehow manages to handle error exceptions in it, unlike MicroBlaze.

Do you think that the hw platform exposes some of the errors in my code?
What is there in my code that can create elusive issues like that?

Do you have an example of 'exception handling' code [I am asking because I don't understand how to sort out such problems]?
You do not provide enough code to reproduce the problem, so the problem might not be where you think it is. Maybe the stack is corrupted before this.

Though when pointers come into play it is always dangerous:
const DeviceStateStructure* dsspDeviceState // A pointer to the state structure of given device.
= &(s_dssaDevices[swDevice]);
Is the value of dsspDeviceState correct? It depends on the memory of s_dssaDevices and whether swDevice is out of bounds or not.

I think the code may be working on Zynq because Zynq somehow manage to handle error exceptions in it, unlike microblaze.
I don't think exceptions are the point here. You most likely have undefined behavior, which results in different behavior on different hardware.
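
As an illustration of how a bug in a callee could change a value that the caller holds, here is a toy model. It is only a sketch: a single array stands in for a region of stack memory so that every write is well-defined C++, whereas a real out-of-bounds write is undefined behavior and its effect depends on the platform's memory layout. All names are hypothetical, and nothing here claims that this is what happens in the posted code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy layout: slots 0..3 model a callee's local buffer,
// slot 4 models the caller's swDevice sitting right behind it.
struct ToyStack
{
    std::array<std::int16_t, 5> slots{};
};

constexpr std::size_t kCalleeBufferSize = 4;
constexpr std::size_t kCallerSwDeviceSlot = 4;

// The "callee" fills count slots. A count larger than kCalleeBufferSize
// models an off-by-one bug that runs past the callee's own buffer.
void toy_callee(ToyStack& stack, std::size_t count, std::int16_t value)
{
    for (std::size_t i = 0; i < count && i < stack.slots.size(); ++i)
    {
        stack.slots[i] = value;
    }
}

// Returns the caller's swDevice as seen after the callee has run.
std::int16_t toy_caller(std::size_t count)
{
    ToyStack stack;
    stack.slots[kCallerSwDeviceSlot] = 0; // the caller's swDevice starts as 0
    toy_callee(stack, count, 354);
    return stack.slots[kCallerSwDeviceSlot];
}
```

With the correct count, toy_caller(kCalleeBufferSize) returns 0. With one element too many, toy_caller(kCalleeBufferSize + 1) returns 354. On real hardware, whether the neighbouring slot holds something important is a matter of layout, which is one way the same bug can look harmless on one platform and break on another.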
I was introduced by one of my colleague to the idea of: exception handling.

Some library functions signal errors by throwing an exception.

A throw leaves the function immediately. The thrown object carries information about the error.

You can explicitly throw in your code.
You can explicitly catch in your code.

If a function calls another function that throws, and the calling function does not catch the exception (i.e. inspect and handle it), then the calling function quits too and passes the exception "up".

If your program does not catch at any level, then std::terminate is called and the program quits, typically by aborting. That is a different mechanism from a hardware fault such as a "segfault".
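
Since an example was requested, here is a minimal sketch of throw, propagation, and catch. The function names are invented for illustration. It also assumes your toolchain has C++ exceptions enabled, which is not a given on embedded targets. Note that these are C++ language exceptions, which are a separate thing from processor-level hardware exceptions.

```cpp
#include <stdexcept>
#include <string>

// Throws to signal an error; the throw leaves the function immediately.
int read_device(int device)
{
    if (device < 0)
    {
        throw std::invalid_argument("negative device number");
    }
    return device * 2;
}

// Does not catch, so an exception from read_device passes through to its caller.
int middle_layer(int device)
{
    return read_device(device) + 1;
}

// Catches at the top level and turns the exception into a message.
std::string top_level(int device)
{
    try
    {
        return "ok: " + std::to_string(middle_layer(device));
    }
    catch (const std::invalid_argument& error)
    {
        return std::string("error: ") + error.what();
    }
}
```

Here top_level(3) yields "ok: 7", while top_level(-1) yields "error: negative device number" because the exception travels up through middle_layer until top_level catches it. None of this would explain a by-value parameter changing behind your back; for that, undefined behavior remains the more likely suspect.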