[BBB] Reboot/halt problem

This forum is for supported devices using an ARMv7 Texas Instruments (TI) SoC.

Re: [BBB] Reboot/halt problem

Postby bulletmark » Mon Aug 14, 2017 12:09 pm

@summers [Note I can't quote posts in this thread due to a forum bug which says the posts are too old to quote?!].

If you are power resetting then of course you won't see the bug. Read what has been said above and in the other thread I quoted. Just do a simple "sudo reboot", e.g. from an ssh session, and you will see that the BBB locks up as it is shutting down and never actually resets. I am sure you will see the same bug. The only way to recover then is a power reset, which is damn unfortunate for those of us who have their BBB located remotely.

WarheadsSE asked above for us to capture the console output, so I bought an FTDI breakout cable and did that, but he has never come back here to comment further. I would like to know how/where to report this bug.

Re: [BBB] Reboot/halt problem

Postby summers » Mon Aug 14, 2017 1:38 pm

Well, it's tricky for WSE, as it looks like a kernel update is breaking the BBB, and the kernel is controlled outside ArchArm.

Looking at http://ix.io/yki/, this suggests it's something to do with systemd-udevd hanging. Since this started across an update to the kernel, it's probably something in the kernel that is causing systemd-udevd to hang.

Now systemd-udevd takes messages from the kernel to set up device files, so much of this is tested on all architectures. The question is what specific to ARM Linux has changed to mess that up. It is unlikely to be the systemd-udevd <--> kernel interface, as that is used on all architectures; so it's probably something to do with a BBB interface - e.g. it could be something like a duff device tree.

If I get time, I'll think and dig some more - the hassle for me though is getting the time.

Re: [BBB] Reboot/halt problem

Postby summers » Mon Aug 14, 2017 3:10 pm

Have to wonder if
Code: Select all
[  197.343726] systemd-udevd[113]: seq 1430 '/devices/platform/ocp/44e0b000.i2c/i2c-0/0-0024/tps65217-charger' killed


is a smoking gun. It's defined here:

https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/am33xx.dtsi

But no recent changes, so don't know if the code that handles that has changed ...

Suggests it may be worth looking at the i2c interfaces while the machine is up ....

TPS65217 btw is part of the power supply on the BBB ... and is wired into I2C0 (see section 6.1.12 of the system reference manual)

Re: [BBB] Reboot/halt problem

Postby kilian » Tue Aug 15, 2017 2:21 pm

I might have missed it, but it seems nobody has mentioned yet that disabling the problematic kernel module is a viable workaround for now.

The line
Code: Select all
blacklist tps65217_charger
in /etc/modprobe.d/modprobe.conf does the trick for me.

It has been working flawlessly in my setup for several weeks now. No more shutdown and reboot issues with current kernels. Doesn't change the fact that someone with deeper knowledge should address the real source of the issue.
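
For anyone who wants to double-check that the workaround is actually in effect, here is a rough Python sketch. It is only an illustration under a few assumptions: the blacklist entry lives in a modprobe.d-style file such as the one above, loaded modules are listed in /proc/modules, and the helper names (normalize, blacklisted_modules, loaded_modules) are made up for this example.

```python
def normalize(name: str) -> str:
    # modprobe treats '-' and '_' in module names as interchangeable,
    # so compare names in a single canonical form.
    return name.replace("-", "_")


def blacklisted_modules(conf_text: str) -> set:
    """Return the modules named by 'blacklist <module>' lines.

    Comment lines and other directives (alias, options, ...) are skipped
    because their first field is not 'blacklist'.
    """
    found = set()
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "blacklist":
            found.add(normalize(fields[1]))
    return found


def loaded_modules(proc_modules_text: str) -> set:
    """Return module names from text in /proc/modules format.

    The module name is the first whitespace-separated field of each line.
    """
    return {
        normalize(line.split()[0])
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
```

On the board you would feed these the contents of /etc/modprobe.d/modprobe.conf and /proc/modules respectively, then test whether "tps65217_charger" is in each returned set. If the module is blacklisted but still shows up as loaded, something else is pulling it in explicitly.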

Re: [BBB] Reboot/halt problem

Postby bulletmark » Tue Aug 15, 2017 9:41 pm

@kilian, would you mind sharing where you found that solution? Also, what side-effect or compromise does it impose?

Re: [BBB] Reboot/halt problem

Postby summers » Wed Aug 16, 2017 9:15 am

If that solves it - the last change to that driver was made on 3rd April, which is probably the right time frame, see:

https://github.com/torvalds/linux/commi ... _charger.c

So I guess the first thing to try is reverting the 3rd April patch and seeing if it cures the problem. Looking at the changes, though, it looks innocent - so I can't see that being the problem.

Re: [BBB] Reboot/halt problem

Postby kilian » Wed Aug 16, 2017 4:21 pm

bulletmark wrote: @kilian, would you mind sharing where you found that solution? Also, what side-effect or compromise does it impose?

Well, from the logs it was obvious that the tps65217_charger kernel module was misbehaving. I tracked down the changes but had no time to investigate further, so I decided that I could do without the driver and simply disabled the module.

Concerning the side-effects or compromises, I guess this just rules out using the battery charging part of the PMIC. As I have not connected a battery to my BBB and do not have even a remote idea of how the kernel's charger control infrastructure works and interacts with userland tools, I simply don't care about this loss of functionality.

summers wrote: So I guess the first thing to try is reverting the 3rd April patch and seeing if it cures the problem. Looking at the changes, though, it looks innocent - so I can't see that being the problem.

I am pretty sure that is the wrong one. The problematic changes are the ones by Milo Kim from Jan 4. They entered mainline in 4.11.

[edit]

I have just seen that the latest linux-am33x package now has the driver built-in, so the modprobe workaround will not work anymore.

I might have found the real solution, though. It is now working for my setup, including the driver, as far as I can tell. I just cannot say why exactly and need to look into it a little bit more.
In the meantime, might those of you who still experience the issue post your bootloader and device-tree version and configuration? My guess is that this is the origin.

Re: [BBB] Reboot/halt problem

Postby bulletmark » Wed Aug 16, 2017 10:51 pm

The problem is fixed with yesterday's update of linux-am33x to 4.12.7-2. My BB now reboots fine. The fix seems to be to compile the tps65217_charger module into the kernel as per https://archlinuxarm.org/packages/armv7 ... 037182101b.

Re: [BBB] Reboot/halt problem

Postby kilian » Thu Aug 17, 2017 12:18 pm

I think I figured it out.
bulletmark wrote: The problem is fixed with yesterday's update of linux-am33x to 4.12.7-2. My BB now reboots fine. The fix seems to be to compile the tps65217_charger module into the kernel as per https://archlinuxarm.org/packages/armv7 ... 037182101b.

While that suppresses the problem for now, I would not really call it a fix. The issue is now shifted to the USB driver. Are you using the peripheral USB port (usb0, the one that can be used to power the board)? Is it still working?

The problem seems to be that the TPS65217's USB IRQ is being claimed by two different drivers, tps65217-charger and vbus (for the USB infrastructure).
Commit https://github.com/torvalds/linux/commit/d680414d0f421563a9746c29d82e6794a604cf0c introduced an attempt to use the interrupt that the TPS generates when it senses a change in USB power, so that the USB driver is notified when a USB cable is plugged in.

As the USB drivers were built-in and the TPS driver used to be a module, its IRQ registration request came after that of vbus and failed. While it should have handled this error gracefully, it obviously didn't and got stuck.
Now that the tps65217-charger driver is compiled in as well, it happens to be initialized before the USB driver, so its IRQ registration succeeds while the USB driver's fails.

You should be seeing something like this in your kernel debug messages:
Code: Select all
[    3.336879] genirq: Flags mismatch irq 188. 00002000 (vbus) vs. 00000000 (tps65217-charger)
[    3.345587] musb-dsps: probe of 47401400.usb failed with error -16

As a result, the driver for the usb0 port controller musb-hdrc.0 is unusable.
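
To make the message above a bit easier to read, here is a small Python sketch that pulls the IRQ number, driver names, and flag bits out of such a "Flags mismatch" line. Treat it as an illustration, not a diagnostic tool: the two flag values are the IRQF_SHARED and IRQF_ONESHOT bits from the kernel's include/linux/interrupt.h as far as I know them, the reading that the first name is the new requester and the second is the driver already holding the IRQ is my understanding of the message, and the names MISMATCH, decode, and parse_mismatch are made up for this example.

```python
import re

# Subset of IRQF_* bits, assumed to match include/linux/interrupt.h.
IRQF_FLAGS = {
    0x00000080: "IRQF_SHARED",
    0x00002000: "IRQF_ONESHOT",
}

# Matches e.g.
# "genirq: Flags mismatch irq 188. 00002000 (vbus) vs. 00000000 (tps65217-charger)"
MISMATCH = re.compile(
    r"genirq: Flags mismatch irq (?P<irq>\d+)\. "
    r"(?P<new_flags>[0-9a-fA-F]{8}) \((?P<new_name>[^)]+)\) vs\. "
    r"(?P<old_flags>[0-9a-fA-F]{8}) \((?P<old_name>[^)]+)\)"
)


def decode(flags: int) -> list:
    """Return the names of the known IRQF_* bits set in flags."""
    return [name for bit, name in IRQF_FLAGS.items() if flags & bit]


def parse_mismatch(line: str):
    """Parse one kernel log line; return None if it is not a mismatch line."""
    m = MISMATCH.search(line)
    if m is None:
        return None
    return {
        "irq": int(m.group("irq")),
        # Assumed: first pair is the request that was rejected.
        "requester": (m.group("new_name"), decode(int(m.group("new_flags"), 16))),
        # Assumed: second pair is the handler already registered.
        "holder": (m.group("old_name"), decode(int(m.group("old_flags"), 16))),
    }
```

Fed the first log line above, this reports irq 188 with vbus as the requester and tps65217-charger as the holder, and neither side has the IRQF_SHARED bit set, which fits the picture of two drivers each expecting exclusive use of the same interrupt. The -16 in the second log line is -EBUSY.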

My solution for now is to patch the device-tree in order to undo the aforementioned commit d680414.
This makes sure the usb0 stays usable and also should enable the use of older kernel packages (prior to 4.12.7-2).
I tested it on 4.12.5-1, which works fine.

So, if you can live without usb0 you shouldn't have to do anything, but if you are experiencing problems with usb0 or want to use older kernel packages, patching the device-tree could be considered as a solution.
Nonetheless, this is still just a workaround until kernel developers have figured out how to share the interrupt between both drivers.

Re: [BBB] Reboot/halt problem

Postby bulletmark » Thu Aug 17, 2017 12:44 pm

@kilian, I am not using the USB port and yes, my journal shows the following errors since boot:
Code: Select all
bb:~ journalctl -perr -b --no-pager
-- Logs begin at Fri 2017-08-04 07:42:08 AEST, end at Thu 2017-08-17 22:39:40 AEST. --
Aug 17 08:46:38 bb kernel: clocksource_probe: no matching clocksources found
Aug 17 08:46:38 bb kernel: wkup_m3_ipc 44e11324.wkup_m3_ipc: could not get rproc handle
Aug 17 08:46:38 bb kernel: musb-hdrc musb-hdrc.1: Failed to request rx1.
Aug 17 08:46:38 bb kernel: omap_voltage_late_init: Voltage driver support not added
Aug 17 08:46:38 bb kernel: genirq: Flags mismatch irq 188. 00002000 (vbus) vs. 00000000 (tps65217-charger)
