
Simplify foc fpu math #372

Merged: 2 commits into vedderb:dev_fw_5_03 on Nov 7, 2021

Conversation

kubark42
Contributor

@kubark42 kubark42 commented Nov 4, 2021

This simplifies the math, eliminating FPU operations (in particular FPU divisions) on the critical ADC interrupt handler path.

There should be no effect aside from more compact, more streamlined assembly code.

Before

   text	   data	    bss	    dec	    hex	filename
 300896	   3444	 144964	 449304	  6db18	build/BLDC_4_ChibiOS.elf

After

   text	   data	    bss	    dec	    hex	filename
 300704	   3444	 144964	 449112	  6da58	build/BLDC_4_ChibiOS.elf

Example output

Note there is one fewer division and one fewer operation overall; there's no PC-relative constant load; and 1.5 can be loaded directly by a vmov.f32 instruction (which likely helps the ART accelerator's predictive caching at run-time). It's not much of a difference, but it's free for the taking.

This also isn't the complete story, since voltageNormalize is reused further down in the code, saving at least two additional divisions. Each FPU division takes 14 cycles, so this alone is a net savings of >50 cycles per ADC interrupt.

Before

	state_m->mod_d = state_m->vd / ((2.0 / 3.0) * state_m->v_bus);
 8029670:	ed94 6a1f 	vldr	s12, [r4, #124]	; 0x7c
 8029674:	eddf 6ae2 	vldr	s13, [pc, #904]	; 8029a00 <mcpwm_foc_adc_int_handler+0x12b0>
 8029678:	ed94 7a16 	vldr	s14, [r4, #88]	; 0x58
 802967c:	ee27 7a26 	vmul.f32	s14, s14, s13
 8029680:	eec6 7a07 	vdiv.f32	s15, s12, s14
 8029684:	edc4 7a19 	vstr	s15, [r4, #100]	; 0x64
	state_m->mod_q = state_m->vq / ((2.0 / 3.0) * state_m->v_bus);
 8029688:	ed94 6a20 	vldr	s12, [r4, #128]	; 0x80
 802968c:	edd4 7a16 	vldr	s15, [r4, #88]	; 0x58
 8029690:	ee67 7aa6 	vmul.f32	s15, s15, s13
 8029694:	ee86 7a27 	vdiv.f32	s14, s12, s15
 8029698:	ed84 7a1a 	vstr	s14, [r4, #104]	; 0x68

After

	const float voltageNormalize = 1.5 / state_m->v_bus;
 8029564:	edd4 7a1f 	vldr	s15, [r4, #124]	; 0x7c
 8029568:	eef7 6a08 	vmov.f32	s13, #120	; 0x3fc00000  1.5
 802956c:	eec6 aa87 	vdiv.f32	s21, s13, s14
	state_m->mod_d = state_m->vd * voltageNormalize;
 8029560:	ed94 7a16 	vldr	s14, [r4, #88]	; 0x58
 8029570:	ee6a 7aa7 	vmul.f32	s15, s21, s15
 8029574:	edc4 7a19 	vstr	s15, [r4, #100]	; 0x64
	state_m->mod_q = state_m->vq * voltageNormalize;
 8029578:	edd4 7a20 	vldr	s15, [r4, #128]	; 0x80
 802957c:	ee6a 7aa7 	vmul.f32	s15, s21, s15
 8029580:	edc4 7a1a 	vstr	s15, [r4, #104]	; 0x68

mcpwm_foc.c Outdated
@@ -4088,8 +4116,8 @@ static void svm(float alpha, float beta, uint32_t PWMHalfPeriod,
// sector 1-2
case 1: {
// Vector on-times
-uint32_t t1 = (alpha - ONE_BY_SQRT3 * beta) * PWMHalfPeriod;
-uint32_t t2 = (TWO_BY_SQRT3 * beta) * PWMHalfPeriod;
+uint32_t t1 = (alpha - beta13) * PWMHalfPeriod;
Contributor Author

@kubark42 kubark42 Nov 4, 2021


This is outside the scope of this PR, but I notice that all these values are truncated. I'm not sure if it's worth the performance penalty to add 0.5f to the end to get a rounding-ish behavior.
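Outside the codebase, the truncation-vs-rounding difference reads like this (a minimal standalone sketch with hypothetical helper names, not mcpwm_foc.c code):

```c
#include <assert.h>
#include <stdint.h>

/* The cast truncates toward zero: an on-time of 99.9 counts becomes 99. */
static uint32_t on_time_truncated(float duty, uint32_t pwm_half_period) {
    return (uint32_t)(duty * pwm_half_period);
}

/* Adding 0.5f before the cast rounds to nearest for positive values,
 * at the cost of one extra FPU addition per on-time. */
static uint32_t on_time_rounded(float duty, uint32_t pwm_half_period) {
    return (uint32_t)(duty * pwm_half_period + 0.5f);
}
```

For duty ≈ 0.999 and a half-period of 100, the truncated form yields 99 counts while the rounded form yields 100, which is the sub-count bias being discussed.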

@kubark42 kubark42 marked this pull request as ready for review November 4, 2021 13:28
@kubark42
Contributor Author

kubark42 commented Nov 4, 2021

Tested, motor runs fine. RFM.

@kalvdans
Contributor

kalvdans commented Nov 4, 2021

Before

 8029678:	ed94 7a16 	vldr	s14, [r4, #88]	; 0x58
...
 802968c:	edd4 7a16 	vldr	s15, [r4, #88]	; 0x58

Outside the scope of this PR, but before minimizing the number of divisions by making code harder to read I think we should tackle the issue with redundant loads and eliminate the volatile keywords.

@kubark42
Contributor Author

kubark42 commented Nov 4, 2021

making code harder to read

Do you feel that is the result? I put the original, pedantic math in comments so that the reader could quickly understand that's where the simplified forms came from.

@kubark42
Contributor Author

kubark42 commented Nov 4, 2021

we should tackle the issue with redundant loads and eliminate the volatile keywords.

I hadn't realized state_m was declared as volatile. Since it can't change inside of the ADC interrupt (unless there's a higher-priority interrupt which could overwrite it?), it seems it shouldn't be volatile at all, no?

@kalvdans
Contributor

kalvdans commented Nov 4, 2021

we should tackle the issue with redundant loads and eliminate the volatile keywords.

I hadn't realized state_m was declared as volatile. Since it can't change inside of the ADC interrupt (unless there's a higher-priority interrupt which could overwrite it?), it seems it shouldn't be volatile at all, no?

I'm afraid volatile masks other bugs: the same bugs that come to the surface when we use a modern compiler. Anyway, this discussion belongs in another thread.

@kalvdans
Contributor

kalvdans commented Nov 4, 2021

making code harder to read

Do you feel that is the result? I put the original, pedantic math in comments so that the reader could quickly understand that's where the simplified forms came from.

Yes, the fact that we need to write the original math in a comment instead of the code makes the code harder to follow and debug.

@kubark42
Contributor Author

kubark42 commented Nov 4, 2021

Yes, the fact that we need to write the original math in a comment instead of the code makes the code harder to follow and debug.

I guess I see it the other way around. I find it a lot easier to understand 2/3*Va + 1/3*Vb than (2.0 / 3.0) * V_a + (1.0 / 3.0) * V_b. There's a lot of code floating around to ensure there is no accidental integer division, and that makes it less clear for me.

This gets especially painful when working with long equations and with variable names which are somewhat obscure, such as a hypothetical V_a_noised_and_filtered.

Just my personal theory on things, but I see the comment as the intent, and the contract that the code must deliver. The code's responsibility is to deliver the promise the comment makes. If the two disagree, then the code as a whole is buggy.

This does mean that the comments need to be faithful and clear.

@ElwinBoots
Contributor

Before

 8029678:	ed94 7a16 	vldr	s14, [r4, #88]	; 0x58
...
 802968c:	edd4 7a16 	vldr	s15, [r4, #88]	; 0x58

Outside the scope of this PR, but before minimizing the number of divisions by making code harder to read I think we should tackle the issue with redundant loads and eliminate the volatile keywords.

Interesting. When I saw the proposal I was really happy because it made the math much clearer for me. Dividing by (2/3 * Vbus) is not clear to me at all, even after having seen it many times in the code.

@kubark42 kubark42 marked this pull request as draft November 4, 2021 14:37
@nitrousnrg
Contributor

Yes, the fact that we need to write the original math in a comment instead of the code makes the code harder to follow and debug.

This is exactly how I feel about this PR.

In general I don't like higher CPU performance if it comes at the cost of code readability. The way I see it, if the code is more readable it invites people in and opens the door to future algorithm improvements that achieve higher motor performance. If I can get 5% more torque it would be huge, but 5% better CPU usage? I don't care.

Tested, motor runs fine. RFM.

In general I would need a lot more data to feel confident about a merge. Maybe we should have a template so we can show the hardware that was used for testing, the motor characteristics, and the voltage/current/torque/rpm/power of the test, so that we have an idea of the coverage of such testing. I watched my dyno run fine for a while, until it blew when it entered FW, for example. It's very common that "testing" is just spinning a motor at no load, and that's not enough when you're changing core stuff.

Same about removing volatile declarations... gotta be real careful to make that call and get some good testing coverage.

@ElwinBoots
Contributor

ElwinBoots commented Nov 4, 2021

My suggestion would be as follows:

/* Voltage normalization such that 1 corresponds to the absolute maximum voltage the inverter can make including overmodulation. Without overmodulation the max value is sqrt(3)/2 = 0.866 */
const float voltageNormalize = 1.5 / motor_now->m_motor_state.v_bus;

This actually explains where the factor comes from, instead of a factor of 2/3 just appearing. It also tells you what it means for the normalized amplitudes.
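As a standalone sketch (not VESC code), the two forms are algebraically identical, and the sqrt(3)/2 ceiling falls out directly:

```c
#include <assert.h>
#include <math.h>

/* Original form: divide by (2/3 * v_bus) on every use. */
static float mod_original(float v, float v_bus) {
    return v / ((2.0f / 3.0f) * v_bus);
}

/* Refactored form: one division up front, multiplications afterwards.
 * A result of 1.0 is the absolute maximum including overmodulation;
 * without overmodulation the ceiling is sqrt(3)/2 ~= 0.866. */
static float mod_refactored(float v, float v_bus) {
    const float voltage_normalize = 1.5f / v_bus;
    return v * voltage_normalize;
}
```

The maximum sinusoidal phase-voltage amplitude without overmodulation is v_bus/sqrt(3), and plugging that in gives 1.5/sqrt(3) = sqrt(3)/2 ≈ 0.866, which is the number in the proposed comment.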

@kubark42
Contributor Author

kubark42 commented Nov 4, 2021

The way I see it if the code is more readable it invites people and opens the door in the future to improve algorithms to achieve higher motor performance.

I think that's what @ElwinBoots and I are saying. As we're going through the code implementing https://vesc-project.com/node/3257, we find that it is in general difficult to read and understand some of the basic mathematical underpinnings. We're not so much concerned with the nitty-gritty of the implementation as the why and what it all means.

Stuff like v_mag = (2.0 / 3.0) * max_duty * SQRT3_BY_2 * state_m->v_bus; is a lot easier for us to understand when it's written as float max_v_mag = ONE_BY_SQRT3 * max_duty * state_m->v_bus;. It gets even easier when the original algorithm is written as a code comment, so we can understand the intent as well as the implementation.

It was a similar process with https://vesc-project.com/node/3225. Catching those bugs made a night and day difference for my 20kW motor, but it was harder to do because the math is not well documented. We have to read individual lines of code to understand the gist of what's going on, and that's not as efficient as we'd like it to be.

Do you guys see a happy compromise? We both find this new form a lot more readable, but I certainly appreciate why you'd prefer the old one.

If I can get 5% more torque it would be huge, but 5% better cpu usage? I don't care.

I don't think available CPU can help with max current/torque, as unfortunately that's a limit of the FET switching losses. However, max eRPM is going to be driven in part by how quickly the code can run. Eliminating ~50 cycles per loop and helping the system avoid cache misses gives a little more stability, so we can push the motors to a slightly higher eRPM.

This isn't completely academic. In our FRF testing we are seeing the controller crash and reboot when trying to use the trace sample functionality when running at 30kHz. This is indicative of loops consuming too many resources, possibly running past their margins and triggering a watchdog.

While the impact is minor, I wouldn't say it's trivial. We have a saying in aviation: "take care of the grams and the kilograms take care of themselves". I feel like similar approaches in embedded dev pay off.

In general I would need a lot more data to feel confident about a merge, maybe we should have a template so we can show the hardware that has been used for testing, motor characteristics, voltage/current/torque/rpm/power of the test, such that we have an idea of the coverage of such testing. I watched my dyno run fine for a while, until it blew when it entered FW for example. It's very common that "testing" is just spinning a motor at no load, that's not enough when you're changing core stuff.

I agree! That's why I limited the PR to only reshuffling the math. We can validate that by eye and with light testing because we'll know instantly if, for instance, the svm() function is wrong. However, to get to the real meat of optimization we need to do a lot more testing.

Short of having some kind of hardware-in-the-loop testing as part of the PR, I don't really know how to accomplish this. There are so many variables and edge cases which pop up even after the extensive testing done before a release, such as https://vesc-project.com/comment/8408#comment-8408.

When I was doing autopilots, our goal was an automated build which pushed the code to hardware, and then the hardware ran a script which tested a variety of configurations. Maybe something similar to this could happen in VESC?

@nitrousnrg
Contributor

/* Voltage normalization such that 1 corresponds to the absolute maximum voltage the inverter can make including overmodulation. Without overmodulation the max value is sqrt(3)/2 = 0.866 */
const float voltageNormalize = 1.5 / motor_now->m_motor_state.v_bus;

Comments like that would be super super appreciated, even links to point to literature.

This isn't completely academic. In our FRF testing we are seeing the controller crash and reboot when trying to use the trace sample functionality when running at 30kHz. This is indicative of loops consuming too many resources, possibly running past their margins and triggering a watchdog.

Yes, I know it's unsafe to run the FOC loop at >60 kHz; my builds usually limit the max switching frequency to avoid RTOS crashes, as it might end up in a bricked board, but that kHz limit changes with each version as we add more tasks to the code. In my experience 60 kHz is more than enough, but I don't use ultra-low-inductance motors... It's not too hard to ask the RTOS how it is doing (% of idle time) to estimate at runtime what the max available loop frequency is.

When I was doing autopilots, our goal was an automated build which pushed the code to hardware, and then the hardware ran a script which tested a variety of configurations. Maybe something similar to this could happen in VESC?

I hear you there; I had the same goal for VESC. We built a motor model into the VESC firmware to be able to run a virtual motor:
#82

My end goal was to set up continuous integration that could test each commit by running it on real hardware (an STM32F4 Discovery) driving the built-in virtual motor. I couldn't push this to the end as I got swamped with day-job work... I'm not even sure if the virtual motor works in the latest build.

@vedderb
Owner

vedderb commented Nov 6, 2021

In general, readability should always come before CPU efficiency, unless it makes a significant difference. In this case the maximum interrupt rate is 30 kHz, so 50 cycles per interrupt makes a difference of 50 / (168000000 / 30000) * 100 = 0.9%. I wouldn't sacrifice any readability for that. That being said, there are some things I like about this PR that make it more readable, such as voltageNormalize (which I would write as voltage_normalize to be consistent with (most of) the rest of the code). The new version of the mod signs is also as readable as before, but faster, so that is nice.

A small comment: in general it is better to write out floating point constants with decimals, like 2.0 instead of 2. In most cases it does not matter and they will be used as floats, but I think it is good to make that extra clear.

The SVM function is a bit less clear now, and I don't know if it even makes a meaningful performance difference, as the values are used very few times through the nested if and switch statements. beta23 is also only used in 4 out of 6 cases, where it is used only once, so if anything the pre-calculation makes it slower. I also don't like the auto-generated comment at the beginning of the function, as it only takes up space unless the parameters are actually described there.
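The 0.9% figure above is plain arithmetic; as a quick standalone check (hypothetical helper name, not project code):

```c
#include <assert.h>

/* Cycles saved per interrupt as a percentage of the cycle budget between
 * interrupts: the budget is cpu_hz / irq_hz cycles, i.e. 168 MHz / 30 kHz
 * = 5600 cycles per interrupt period. */
static double interrupt_budget_percent(double cycles_saved,
                                       double cpu_hz, double irq_hz) {
    return cycles_saved / (cpu_hz / irq_hz) * 100.0;
}
```

With 50 cycles saved this works out to 50 / 5600 * 100 ≈ 0.89%, matching the figure quoted.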

@vedderb
Owner

vedderb commented Nov 6, 2021

About the motor state being volatile: The main reason for that is that it is read from threads in several places, and if you, for example, run a loop that cannot alter the state and wait for something to change it, the optimizer can mess things up by assuming that the state will be unchanged in that loop, as it is unaware of interrupts. The detect functions do this a lot, so I think it is better to keep it volatile to not run into surprises.

@vedderb
Owner

vedderb commented Nov 6, 2021

Another thing: the terminal command last_adc_duration can be used while the motor is running to measure how long the interrupt takes to finish. That gives some indication of the effectiveness of the optimizations. Duty cycle mode usually runs the slowest, and using encoders also slows things down, so testing under those conditions is useful.

@kubark42
Contributor Author

kubark42 commented Nov 6, 2021

0.9%.

I haven't done any A/B testing yet, so any performance gains are purely hypothetical so far. 1% for marginal changes in the code sounds almost too good to be true.

Thanks for the tip on last_adc_duration, I think we can give that a try and report back.

I wouldn't sacrifice any readability for that.

This sounds like a stylistic question. It seems opinions differ on whether the original or the resulting code is more readable, but since none of us are programming in assembly it comes down to preferences for how abstract we like the code. Obviously the project maintainer's preferences are the most important, so we'll cater to those.

Would you like to highlight the sections you agree with and the ones you think need improvement? We can merge in only the uncontroversial changes, and then I can test the speed changes for the "less readable" optimizations. If it's still something worthwhile to consider I'll open a new PR.

In general it is better to write out floating point constants with decimals, like 2.0 instead of 2.

Usually yes, but in this case it's intentionally an integer, for the case where some optimizations/compilers will use a faster bit shift of the exponent in IEEE 754 format. Using the exact integer 2 lets the compiler make any such optimizations for us. Otherwise it might be reluctant to assume that 2.0 == 2.

beta23 is also only used in 4 out of 6 cases where it is used only once, so if anything the pre-calculation makes it slower.

I suspect that the compiler will resolve this for us. It's usually smart enough not to compute the value if it doesn't need it, but not smart enough to realize that multiple math expressions evaluate to identical results.

As an aside, how important is it that the svm() pathway execution time be identical across paths? I suspect it isn't here, since we can guarantee that over a complete 6-step cycle we will wind up executing all pathways, but I'd love your opinion on it.

I also don't like the auto-generated comment in the beginning of the function as it only takes space unless the parameters actually are described there.

Yup, agreed. Part of the reason this is a draft is that I've been trying to figure out what the SVM is doing. I'd like to document it precisely, but it's not completely clear yet what the basic assumptions are. For instance, why do all the calls to it pass negatives? And why the complex branching execution path vs. the clearer 1/2*(max(va, vb, vc) + min(va, vb, vc))?

If you can fill in the blanks, perhaps by giving the paper where that SVM methodology is described, I will happily get that added.

@kubark42
Contributor Author

kubark42 commented Nov 6, 2021

The main reason for that is that it is read from threads in several places, and if you, for example, run a loop that cannot alter the state and wait for something to change it the optimization can mess things up by assuming that the state will be unchanged in that loop as it is unaware of interrupts.

What currently has a higher prio than the ADC interrupt? We could look at those individual sections and see if they affect anything which is calculated during the ADC interrupt service handler.

/* mod_alpha_sign = 2/3*sign(ia) - 1/3*sign(ib) - 1/3*sign(ic) */
/* mod_beta_sign = 1/sqrt(3)*sign(ib) - 1/sqrt(3)*sign(ic) */
const float mod_alpha_filter_sgn = (1.0 / 3.0) * (2 * SIGN(ia_filter) - SIGN(ib_filter) - SIGN(ic_filter));
const float mod_beta_filter_sgn = ONE_BY_SQRT3 * (SIGN(ib_filter) - SIGN(ic_filter));
Contributor Author

@kubark42 kubark42 Nov 6, 2021


Identical to the above, only here we have [-2/sqrt(3), 0, 2/sqrt(3)] as the three possible outputs.

@vedderb
Owner

vedderb commented Nov 6, 2021

What currently has a higher prio than the ADC interrupt? We could look at those individual sections and see if they affect anything which is calculated during the ADC interrupt service handler.

The ADC handler has the highest priority, but there are other parts that have lower priority that will be interrupted by it, namely the detection functions.

Usually yes, but in this case it's intentionally an integer, for the case where some optimizations/compilers will use a faster bit shift of the exponent in IEEE 754 format. Using the exact integer 2 lets the compiler make any such optimizations for us. Otherwise it might be reluctant to assume that 2.0 == 2.

Floating point multiplication only takes one cycle anyway, so it should not matter. It is more important that it is consistent.

As an aside, how important is it that SVM() pathway execution be identical in time? I suspect it doesn't here, since we can guarantee that in a complete 6-step cycle we will wind up executing all pathways, but I'd love your opinion on it.

Making it identical is not important; only the longest execution time matters. Having some of the execution paths shorter does not help the worst-case execution time, but it will give the regular threads a few more cycles on average.

If you can fill in the blanks, perhaps by giving the paper where that SVM methodology is described, I will happily get that added.

Describing it fully takes some effort, but understanding the parameters is simple. It calculates the compare values for a center-aligned up-and-down-counting timer that counts to PWMHalfPeriod and back. tAout, tBout and tCout are the compare values. The input is the alpha- and beta-voltage to be modulated. The original version of the svm function comes from the author of ODrive, who gave me the code for one of his FPGA-based controllers. I think his timers used the opposite sign, so I just made alpha and beta negative to match it and never came back to fix it (I didn't edit the naming to be consistent either).

Here is a brief attempt at describing it:

The power stage can produce 8 vectors, out of which 2 short all the phases. We call them V0 to V7, where V0 (0, 0, 0) and V7 (1, 1, 1) are the zero vectors that short all phases. The other six can produce six different voltage vectors that are 60 degrees apart. Any vector in the hexagon that these six vectors span can be produced by alternating between two of the six vectors and the zero vectors (the portion of time spent in the zero vectors controls the magnitude). The clever part of SVPWM is that we alternate between these vectors such that any jump between two vectors only takes one half-bridge changing state, which minimizes the amount of switching. The sequence is V0-Vx-Vy-V7-Vy-Vx-V0. So we start in V0, go to the first vector, then the second vector, then V7, and then go the same way back to V0.

The steps to do this are:

  1. Determine which of the 6 different sectors we are in
    [figure]
    For example, in sector 1 we alternate between V1 and V2, in sector 2 we alternate between V2 and V3, etc.

  2. For that sector, calculate how long to spend in each of the 2 vectors (e.g. t1 and t2 in sector 1) based on the alpha- and beta-voltages. That is, we map the 90-degrees-apart alpha and beta onto the 60-degrees-apart vectors that we alternate between in this sector. This is an example of how to do this in sector 1:
    [figure]
    Every sector can be solved like this by drawing some triangles and using trigonometric identities.

  3. Calculate the compare values for the center-aligned timer from these two time values. This is how that looks (kind of):
    [figure]
    The timer counts up to PWMHalfPeriod and down again, and the state of each half bridge is determined by whether the current count is above or below tX (tA, tB or tC) for that half bridge.
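Step 2 above can be sketched in isolation using the sector-1 formulas quoted in this PR's diff (a standalone, simplified sketch; the real svm() also derives the timer compare values and handles all six sectors):

```c
#include <assert.h>
#include <stdint.h>

#define ONE_BY_SQRT3 0.57735026919f
#define TWO_BY_SQRT3 1.15470053838f

/* Map the 90-degrees-apart (alpha, beta) voltages onto the on-times of
 * the 60-degrees-apart vectors V1 and V2, scaled by the half-period of
 * the center-aligned timer. */
static void svm_sector1_times(float alpha, float beta,
                              uint32_t pwm_half_period,
                              uint32_t *t1, uint32_t *t2) {
    *t1 = (uint32_t)((alpha - ONE_BY_SQRT3 * beta) * pwm_half_period);
    *t2 = (uint32_t)(TWO_BY_SQRT3 * beta * pwm_half_period);
}
```

For a vector lying exactly on V1 (beta = 0), all the on-time goes to t1 and none to t2, as the geometry in step 2 predicts; the remainder of the half-period is spent in the zero vectors.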

@ElwinBoots
Contributor

Nice explanation Benjamin!

Funny coincidence: just today I looked into why those inputs to the svm function are negative. My conclusion was that the function calculates rising-edge moments (so a high value would be a low voltage). From what I saw, the setup of the center-aligned PWM expects the duty cycle instead (high value is high voltage), so you inverted this by inverting the inputs. I guess this matches what you describe above. This does make the sector output incorrect, but I don't think that is being used. Maybe we can clean svm up in some future release.

@kubark42
Contributor Author

kubark42 commented Nov 7, 2021

The ADC handler has the highest priority, but there are other parts that have lower priority that will be interrupted by it, namely the detection functions.

If that's the case, then all the volatiles should be safe to remove, as they cannot be modified outside the ADC handler.

Floating point multiplication only takes one cycle anyway, so it should not matter.

Yes... and no. FPUs are magic, but they can have a cost which isn't always obvious.

  • The CPU can only do FPU operations on certain special registers. If the variable isn't already loaded into one of those, then a CPU instruction is required to move it in and another to move it out. Even more instructions might be required if there isn't already a free register, since a register shuffle has to take place
  • I forget the exact details, but basically on ARM there's a limitation on how quickly the next instruction can be loaded into the CPU. IIRC, an FPU instruction can be loaded before the non-FPU instruction completes, but not vice versa.
    • Recall that the CPU has to load instructions from memory, either RAM or flash. If it has to load them from flash, that's very slow and it can no longer execute one instruction per CPU cycle.

While it's worth knowing, none of this is a super compelling reason to use 2 instead of 2.0. I'll change it to 2.0.

Here is a brief attempt at describing it:

Thanks. I will update the PR.

@vedderb, I'm still a little unclear on how I should change this PR to move it forward. When you have a sec, could you flag which changes you'd like me to keep and which changes you'd like to defer/decline?

`/((2.0/3.0) * v_bus)` is the same as `* 1.5/V_bus`. Abstracting this out
saves at least one division per loop, and in some cases as many as three.

Each STM32F4 FPU division takes 14 cycles.

/* mod_alpha_sign = 2/3*sign(ia) - 1/3*sign(ib) - 1/3*sign(ic) */
/* mod_beta_sign = 1/sqrt(3)*sign(ib) - 1/sqrt(3)*sign(ic) */
const float mod_alpha_filter_sgn = (1.0 / 3.0) * (2.0 * SIGN(ia_filter) - SIGN(ib_filter) - SIGN(ic_filter));
Contributor Author

@kubark42 kubark42 Nov 7, 2021


This line is basically taking a truth table as input and outputs values in the set [-4/3, -2/3, 2/3, 4/3]. It could be moved into a function and turned into a simple if() hierarchy.
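As a standalone sketch of that truth table (assuming SIGN() returns +/-1, which matches how it's used on filtered currents; the helper name is hypothetical, not VESC code):

```c
#include <assert.h>
#include <math.h>

#define SIGN(x) ((x) < 0.0f ? -1.0f : 1.0f)

/* With SIGN() limited to +/-1, the expression collapses to a handful of
 * discrete outputs, so it is effectively a truth table on the three
 * current signs. */
static float mod_alpha_sign(float ia, float ib, float ic) {
    return (1.0f / 3.0f) * (2.0f * SIGN(ia) - SIGN(ib) - SIGN(ic));
}
```

For example, signs (+, -, -) give 4/3, (+, +, -) give 2/3, and (-, +, +) give -4/3; since the three phase currents sum to zero, the signs can never all agree.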

mcpwm_foc.c Outdated
@@ -4127,8 +4154,8 @@ static void svm(float alpha, float beta, uint32_t PWMHalfPeriod,
// sector 3-4
case 3: {
// Vector on-times
-uint32_t t3 = (TWO_BY_SQRT3 * beta) * PWMHalfPeriod;
-uint32_t t4 = (-alpha - ONE_BY_SQRT3 * beta) * PWMHalfPeriod;
+uint32_t t3 = 2.0 * beta13 * PWMHalfPeriod;
Contributor Author


This could also be left as TWO_BY_SQRT3. The reason to use 2.0 * beta13 is to let the compiler know we don't need beta anymore. This isn't a very strong reason and if it's found that there's a readability preference for the old format then let's go back to that.

@vedderb
Owner

vedderb commented Nov 7, 2021

If that's the case, then all volatiles should be able to be safely removed as they cannot be modified outside the ADC handler.

No. There are loops that assume the state will change as they go. For example, there are a few places that wait for the motor to stop. If the compiler thinks that nothing in the loop can affect the state, it might optimize away those checks and spin forever, as the loop does nothing that can affect e.g. the rpm in the state. In most cases that is not a problem, but I have seen it happen, especially when changing compiler version or compiler settings.

I think everything except the changes in the SVM function is fine. The reason is that, as it is now, it is relatively easy to look at the svm function, draw some figures and see exactly what the steps are and where the calculations come from. It was quite easy for me to draw the figures above because of that, and I think that has some value.

@kubark42 kubark42 marked this pull request as ready for review November 7, 2021 14:04
@kubark42
Contributor Author

kubark42 commented Nov 7, 2021

I think everything except the changes in the SVM-function are fine.

Okay, they've been taken out.

@kubark42
Contributor Author

kubark42 commented Nov 7, 2021

In general it is better to write out floating point constants with decimals, like 2.0 instead of 2.

Usually yes, but in this case it's intentionally an integer, for the case where some optimizations/compilers will use a faster bit shift of the exponent in IEEE 754 format. Using the exact integer 2 lets the compiler make any such optimizations for us. Otherwise it might be reluctant to assume that 2.0 == 2.

Of interest: https://gist.github.com/kubark42/333bfce67e309474fe1801eb2356f7be

Compared to using 2, the version using 2.0 has more than double the instruction count (63 vs. 31). I don't claim I completely understand what's going on; I would have thought it would be at most an extra instruction or two. It might even be that this isn't fully representative of how it compiles in situ, since for this test I made a separate function and forced it not to inline.

However, I think it is a good example of how compilers sometimes do unexpected and screwy things when floats are involved.

@vedderb vedderb merged commit 3b3c423 into vedderb:dev_fw_5_03 Nov 7, 2021
@kubark42 kubark42 deleted the simplify_foc_fpu_math branch November 7, 2021 23:17