z_cstart bug #32444

hongshui3000 · 2021-02-18T11:20:06Z

Describe the bug
file:init.c

FUNC_NORETURN void z_cstart(void)
{
	/* gcov hook needed to get the coverage report.*/
	gcov_static_init();

	LOG_CORE_INIT();

	/* perform any architecture-specific initialization */
	arch_kernel_init();

#if defined(CONFIG_MULTITHREADING)
	/* Note: The z_ready_thread() call in prepare_multithreading() requires
	 * a dummy thread even if CONFIG_ARCH_HAS_CUSTOM_SWAP_TO_MAIN=y
	 */
	struct k_thread dummy_thread;

	z_dummy_thread_init(&dummy_thread);
#endif

	/* perform basic hardware initialization */
	z_sys_init_run_level(_SYS_INIT_LEVEL_PRE_KERNEL_1);-------------------------------------->  where the bug originated
	z_sys_init_run_level(_SYS_INIT_LEVEL_PRE_KERNEL_2);--------------------------------------> where the bug originated

#ifdef CONFIG_STACK_CANARIES
	uintptr_t stack_guard;

	z_early_boot_rand_get((uint8_t *)&stack_guard, sizeof(stack_guard));
	__stack_chk_guard = stack_guard;
	__stack_chk_guard <<= 8;
#endif	/* CONFIG_STACK_CANARIES */

#ifdef CONFIG_MULTITHREADING
	switch_to_main_thread(prepare_multithreading());  --------------------------------------------> where the bug originated
#else
#ifdef ARCH_SWITCH_TO_MAIN_NO_MULTITHREADING
	/* Custom ARCH-specific routine to switch to main()
	 * in the case of no multi-threading.
	 */
	ARCH_SWITCH_TO_MAIN_NO_MULTITHREADING(bg_thread_main,
		NULL, NULL, NULL);
#else
	bg_thread_main(NULL, NULL, NULL);

	/* LCOV_EXCL_START
	 * We've already dumped coverage data at this point.
	 */
	irq_lock();
	while (true) {
	}
	/* LCOV_EXCL_STOP */
#endif
#endif /* CONFIG_MULTITHREADING */

	/*
	 * Compiler can't tell that the above routines won't return and issues
	 * a warning unless we explicitly tell it that control never gets this
	 * far.
	 */

	CODE_UNREACHABLE; /* LCOV_EXCL_LINE */
}

If the interrupt is enabled when the driver is initialized, and the interrupt is entered before the interrupt stack is initialized, the software will generate a bug.

If I understand it correctly, the initialization of the interrupt stack will always occur after the interrupt is enabled .

When the interrupt is enabled, an interrupt is generated immediately, and the interrupt stack is not ready yet, then the bug I said will appear.

interrupt en：

z_sys_init_run_level(_SYS_INIT_LEVEL_PRE_KERNEL_2)

interrupt stack：

_kernel.cpus[i].irq_stack =(Z_KERNEL_STACK_BUFFER(z_interrupt_stacks[i]) +K_KERNEL_STACK_SIZEOF(z_interrupt_stacks[i]))

I found this problem when I debugged my systemtick

The text was updated successfully, but these errors were encountered:

StefJar · 2021-02-28T22:32:27Z

any progress on that so far? Maybe my problem is related to that (#32261)

yerabolu · 2021-03-01T18:53:57Z

@hongshui3000 : Interrupt is disabled when it enters z_cstart and gets enabled only when it enters main thread through bg_thread_main.

Also, could you give some more information on how this is encountered and which platform its encountered on?

yerabolu · 2021-03-01T18:55:04Z

any progress on that so far? Maybe my problem is related to that (#32261)

Looking at the bug it doesn't look to be related. But will look into it further and get back in here.

StefJar · 2021-03-01T22:53:20Z

@yerabolu my issue is happens at the start up of zephyr when the main function is executed and a task switch happens. Its highly probable that it is triggered through an ISR.
Since a (custom) bootloader runs, before zephyr starts, it might happen that an interrupt etc. is activated there(maybe bug).

hongshui3000 · 2021-03-02T02:40:59Z

@hongshui3000 : Interrupt is disabled when it enters z_cstart and gets enabled only when it enters main thread through bg_thread_main.

Also, could you give some more information on how this is encountered and which platform its encountered on?
file : cavs_timer.c

int z_clock_driver_init(const struct device *device)
{
	uint64_t curr = count();

	IRQ_CONNECT(TIMER_IRQ, 0, compare_isr, 0, 0);
	set_compare(curr + CYC_PER_TICK);
	last_count = curr;
	irq_enable(TIMER_IRQ); ---------------------------------------------------------->this
	return 0;
}

and file: uart_mcux.c

static int uart_mcux_init(const struct device *dev)
{
#ifdef CONFIG_UART_INTERRUPT_DRIVEN
	const struct uart_mcux_config *config = dev->config;
#endif
	struct uart_mcux_data *data = dev->data;
	int err;

	err = uart_mcux_configure(dev, &data->uart_cfg);
	if (err != 0) {
		return err;
	}

#ifdef CONFIG_UART_INTERRUPT_DRIVEN
	config->irq_config_func(dev);
#endif

	return 0;
}

#ifdef CONFIG_UART_INTERRUPT_DRIVEN
#define UART_MCUX_CONFIG_FUNC(n)					\
	static void uart_mcux_config_func_##n(const struct device *dev)	\
	{								\
		IRQ_CONNECT(DT_INST_IRQ_BY_NAME(n, status, irq),	\
			    DT_INST_IRQ_BY_NAME(n, status, priority),	\
			    uart_mcux_isr, DEVICE_GET(uart_##n), 0);	\
									\
		irq_enable(DT_INST_IRQ_BY_NAME(n, status, irq));	\---------------------------->this
									\
		IRQ_CONNECT(DT_INST_IRQ_BY_NAME(n, error, irq),		\
			    DT_INST_IRQ_BY_NAME(n, error, priority),	\
			    uart_mcux_isr, DEVICE_GET(uart_##n), 0);	\
									\
		irq_enable(DT_INST_IRQ_BY_NAME(n, error, irq));		\
	}
#define UART_MCUX_IRQ_CFG_FUNC_INIT(n)					\
	.irq_config_func = uart_mcux_config_func_##n
#define UART_MCUX_INIT_CFG(n)						\
	UART_MCUX_DECLARE_CFG(n, UART_MCUX_IRQ_CFG_FUNC_INIT(n))
#else
#define UART_MCUX_CONFIG_FUNC(n)
#define UART_MCUX_IRQ_CFG_FUNC_INIT
#define UART_MCUX_INIT_CFG(n)						\
	UART_MCUX_DECLARE_CFG(n, UART_MCUX_IRQ_CFG_FUNC_INIT)
#endif

If you use them, then the interrupt may be enabled before the interrupt stack is initialized. At least in the xtensa architecture I don’t see interrupts being turned off.If log is turned on, I feel that the interrupt will also be enabled in advance.

In the xtensa architecture, if I deliberately reduce the value of CYC_PER_TICK (such as 0 or 1), then the interrupt will be generated before entering bg_thread_main

I feel that this should involve all the drivers started by _SYS_INIT_LEVEL_PRE_KERNEL_2 and _SYS_INIT_LEVEL_PRE_KERNEL_1 and whether the architecture will be enabled to interrupt. Our system does not make it clear that interrupts cannot be enabled in this place. So I feel that there may be potential problems at present or in the future.

yerabolu · 2021-03-03T18:56:57Z

Thank You for the info. Investigating further to see which drivers are started by _SYS_INIT_LEVEL_PRE_KERNEL_2 and _SYS_INIT_LEVEL_PRE_KERNEL_1. Will update on it by EOD.

yerabolu · 2021-03-16T17:41:05Z

@hongshui3000: Ran the below test to see if the irq_lock() and irq_unlock() are setting correctly and ran the tests. All the tests passed. Could you try it on your end as well and see if there is any issue.

	/* Custom ARCH-specific routine to switch to main()
	 * in the case of no multi-threading.
	 */
	ARCH_SWITCH_TO_MAIN_NO_MULTITHREADING(bg_thread_main,
		NULL, NULL, NULL);
#else
	unsigned int key = irq_lock();
	__ASSERT(!arch_irq_unlocked(key),"");
	irq_unlock(key);
	bg_thread_main(NULL, NULL, NULL);

hongshui3000 · 2021-03-17T02:53:39Z

@yerabolu
Thank you for the information, I will try to debug my program later. But what I want to know is when zephyr rtos is started, "irq_lock" and "irq_unlock" are executed in the above place. Do you suspect that the execution of "irq_lock" in my code failed?

yerabolu · 2021-03-17T21:03:39Z

@yerabolu
Thank you for the information, I will try to debug my program later. But what I want to know is when zephyr rtos is started, "irq_lock" and "irq_unlock" are executed in the above place. Do you suspect that the execution of "irq_lock" in my code failed?

@hongshui3000
Looking at the code mentioned above, Interrupt is disabled when the zephyr rtos is started and then gets enabled when it enters main thread. And the above test confirms that.

When you mention "Do you suspect that the execution of "irq_lock" in my code failed?" - Are there are any new additions/changes to the above code that you are working on ?

hongshui3000 · 2021-03-18T02:13:05Z

@yerabolu
I understand what you said. But when I was debugging the xtensa architecture, I did see an interruption. I will continue to check my own code.

When the "_SYS_INIT_LEVEL_PRE_KERNEL_2" driver is initialized, some drivers enable interrupts. These drivers are initialized before entering main thread.

Whether an interrupt occurs depends on the processor interrupt architecture and the implementation of irq_lock.

It's just that what I see in the interrupt architecture of xtensa is like this, not the interrupt architecture of the finished chip (the interrupt controller that increases the hardware may control the global interrupt)

I put this question here, the main purpose is to think that in a system without global interrupt control, the timing of the interrupt enable may cause problems.

yerabolu · 2021-03-18T18:05:37Z

@hongshui3000 : Unfortunately i am not able to see this issue on my end and without a way to reproducing it in some way i will not be able to figure out if it is actually causing problems with the timing of enabling interrupts. But on the Zephyr master i don't see any issue.

Adding @andyross to take a look on this.

andyross · 2021-03-18T22:05:56Z

I understand what you said. But when I was debugging the xtensa architecture, I did see an interruption. I will continue to check my own code.

Honestly this sounds like a platform bug. The way this design is supposed to work:

The platform layer enters z_cstart() with interrupts masked[1]
The OS initializes core elements like the _kernel struct, main thread, interrupt stacks, etc...
The OS does its device- and driver-level initialization
The OS then context switches to the main thread (via a z_swap() to init.c:bg_thread_main()), which is the first moment where interrupts become unmasked

(And on the SMP CPUs the process is similar, they start with interrupts masked and only enable them once switching into user/idle threads)

[1] In the sense of arch_irq_lock(). Note that arch_irq_enable(int) is a separate API designed to selectively mask specific interrupts for the purpose of device control. An interrupt that is "enabled" will still not be delivered if it is globally masked by irq_lock(). On Xtensa, "enabled" corresponds to a bit set in the INTENABLE special register, while irq_lock() works by setting the INTLEVEL field in the bottom bits of PS to XCHAL_EXCM_LEVEL.

hongshui3000 · 2021-03-19T02:26:05Z

@andyross @yerabolu
I have basically obtained the information I want, thanks for the reply. I think the issue can be closed now

hongshui3000 added the bug The issue is a bug, or the PR is fixing a bug label Feb 18, 2021

carlescufi changed the title ~~z_cstart Bug ？？~~ z_cstart bug Feb 22, 2021

carlescufi assigned andyross Feb 22, 2021

carlescufi added the area: Kernel label Feb 22, 2021

yerabolu assigned yerabolu and unassigned andyross Feb 22, 2021

nashif added the priority: medium Medium impact/importance bug label Feb 22, 2021

yerabolu assigned andyross Mar 18, 2021

hongshui3000 closed this as completed Mar 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

z_cstart bug #32444

z_cstart bug #32444

hongshui3000 commented Feb 18, 2021 •

edited

Loading

StefJar commented Feb 28, 2021

yerabolu commented Mar 1, 2021

yerabolu commented Mar 1, 2021

StefJar commented Mar 1, 2021

hongshui3000 commented Mar 2, 2021 •

edited

Loading

yerabolu commented Mar 3, 2021

yerabolu commented Mar 16, 2021 •

edited

Loading

hongshui3000 commented Mar 17, 2021

yerabolu commented Mar 17, 2021 •

edited

Loading

hongshui3000 commented Mar 18, 2021 •

edited

Loading

yerabolu commented Mar 18, 2021 •

edited

Loading

andyross commented Mar 18, 2021

hongshui3000 commented Mar 19, 2021 •

edited

Loading

z_cstart bug #32444

z_cstart bug #32444

Comments

hongshui3000 commented Feb 18, 2021 • edited Loading

StefJar commented Feb 28, 2021

yerabolu commented Mar 1, 2021

yerabolu commented Mar 1, 2021

StefJar commented Mar 1, 2021

hongshui3000 commented Mar 2, 2021 • edited Loading

yerabolu commented Mar 3, 2021

yerabolu commented Mar 16, 2021 • edited Loading

hongshui3000 commented Mar 17, 2021

yerabolu commented Mar 17, 2021 • edited Loading

hongshui3000 commented Mar 18, 2021 • edited Loading

yerabolu commented Mar 18, 2021 • edited Loading

andyross commented Mar 18, 2021

hongshui3000 commented Mar 19, 2021 • edited Loading

hongshui3000 commented Feb 18, 2021 •

edited

Loading

hongshui3000 commented Mar 2, 2021 •

edited

Loading

yerabolu commented Mar 16, 2021 •

edited

Loading

yerabolu commented Mar 17, 2021 •

edited

Loading

hongshui3000 commented Mar 18, 2021 •

edited

Loading

yerabolu commented Mar 18, 2021 •

edited

Loading

hongshui3000 commented Mar 19, 2021 •

edited

Loading