Power Issues and LPC wakeup

I bought these FYI.

1 Like

I’ve had this twice. Healthy batteries (it’s only two days old) and after power-off (Circle 0) the LPC OLED is unresponsive to long or short keypress.

I have tried the LPC reset (the button nearest the battery connector) but this did not work. The only solution is disconnecting and reconnecting the battery cables.

I wonder if it’s the keyboard? The reason i wonder is I had one boot where the keyboard did not work, (could not log in) but the circle button did.

It’s very hard to reproduce.

1 Like

Yep, had this too, but also now have dead batteries…

I’m also having problems with the lpc not booting a few times now after leaving it off overnight. LPC reset button has no effect. But I can pull out the batteries and put them back in and it will work then.

The reform has been hooked up to power the whole time.

I have been shutting down using sudo shutdown not the circle+0 command.

The status page shows:
MREF2LPC R3 20210925
normaL, 552, 430, 1795
MNT Reform Keyboard
R1 20211213

I updated the keyboard and LPC firmware as per these instructions:

Looks like the LPC is already at latest, but the keyboard updated to 20220221. We’ll see if that improves things over the next few days.

I’ve also been having this problem, and after spending a lot of time this past week diagnosing and testing, I was able to figure out the problem (and a fix). The TLDR is that it is indeed a subtle issue with the keyboard firmware.

I’d be very appreciative if anyone else who’s encountered this problem, could verify that (a) they can readily reproduce the issue the same way I describe below, and (b) applying the minimal patch below (not the debug patch) means that the issue no longer occurs with the same repro procecdure.

cc @Chartreuse (who I believe wrote the original powersave code)

Summary:

The problem is that calling wdt_disable() from inside the WDT interrupt routine (ISR(WDT_vect)) is apparently dodgy/racey, and sometimes causes the MCU to never fully wake from sleep. I suspect this is because the AVR takes multiple cycles to fully wake from sleep (according to the datasheet). The problem can be avoided by replacing the wdt_disable() call with Delay_MS(1), and adding wdt_disable() immediately after sleep_disable() inside keyboard_power_off().

I also found a few other small improvements to the powersave code. I’ll create a Merge Request with all the changes as soon as I can - @minute, can you please approve my account on source.mnt.re?

In the meantime, the minimal patch (against commit 1a9c21f) to avoid the problem is:

diff --git a/reform2-keyboard-fw/keyboard.c b/reform2-keyboard-fw/keyboard.c
index 840dad6..1efeea9 100644
--- a/reform2-keyboard-fw/keyboard.c
+++ b/reform2-keyboard-fw/keyboard.c
@@ -342,7 +342,7 @@ void setup_hardware(void)
 ISR(WDT_vect)
 {
   // WDT interrupt enable and flag cleared on entry
-  wdt_disable(); // Disable watchdog for now
+  Sleep_MS(1);
 }

 /** Event handler for the library USB Connection event. */
diff --git a/reform2-keyboard-fw/powersave.c b/reform2-keyboard-fw/powersave.c
index 4edcc8f..5927304 100644
--- a/reform2-keyboard-fw/powersave.c
+++ b/reform2-keyboard-fw/powersave.c
@@ -60,6 +60,7 @@ void keyboard_power_off(void)
     sleep_cpu();        // Actually go to sleep
     // Zzzzzz
     sleep_disable();    // We've woken up
+    wdt_disable();      // Disable watchdog for now
     sei();
     // Check if circle key has been pressed (active-low)
     // If not reset the watchdog and try again

More details:

The first thing I noticed is that the device doesn’t have to be sleeping for a long time, for the problem to happen. One time I happened to press Circle (actually hold for 2+ secs) again straight away after using Circle + 0, and the problem occurred - the menu didn’t come up. There was still 3.3v on SYSCTL, and after resetting the keyboard, Circle + s showed the LPC still reporting many cycles in “normal” mode (ie. the LPC and batteries were fine).

I then tested the keyboard some more by holding down the Circle key, and pressing 0 every time the keyboard FW woke up. This turned out to be a reliable repro - the problem would eventually occur, although sometimes it could take a while, eg. 5-10 mins sitting there pressing 0 every 2 seconds.

I sped up the testing process considerably by using this patch, which removes the “goodbye” animation, lowers the timeout from 1 sec to 0.25 secs, and removes the need to hold down the Circle key:

diff --git a/reform2-keyboard-fw/menu.c b/reform2-keyboard-fw/menu.c
index 796f87d..aa9532c 100644
--- a/reform2-keyboard-fw/menu.c
+++ b/reform2-keyboard-fw/menu.c
@@ -77,7 +77,6 @@ void jump_to_bootloader(void) {
 int execute_meta_function(int keycode) {
   if (keycode == KEY_0) {
     // TODO: are you sure?
-    anim_goodbye();
     remote_turn_off_som();
     keyboard_power_off();
     reset_keyboard_state();
diff --git a/reform2-keyboard-fw/powersave.c b/reform2-keyboard-fw/powersave.c
index 4edcc8f..4837f05 100644
--- a/reform2-keyboard-fw/powersave.c
+++ b/reform2-keyboard-fw/powersave.c
@@ -51,7 +51,7 @@ void keyboard_power_off(void)
   do {
     wdt_reset();
     WDTCSR = (1<<WDCE) | (1<<WDE); // Enable writes to watchdog
-    WDTCSR = (1<<WDIE) | (1<<WDE) | (0<<WDP3) | (1<<WDP2) | (1<<WDP1) | (0<<WDP0); // Interrupt mode, 1s timeout
+    WDTCSR = (1<<WDIE) | (0<<WDE) | (0<<WDP3) | (1<<WDP2) | (0<<WDP1) | (0<<WDP0); // Interrupt mode, 0.25s timeout

     // Enter Power-save mode
     set_sleep_mode(SLEEP_MODE_PWR_DOWN);
@@ -63,7 +63,7 @@ void keyboard_power_off(void)
     sei();
     // Check if circle key has been pressed (active-low)
     // If not reset the watchdog and try again
-  } while(PINC&(1<<6));
+  } while(false);

   // Resume and reinitialize hardware
   setup_hardware();

I also added a lot of debug code, including clearing the OLED (instead of turning it off) and poking various info to it during the sleep / wake process. This eventually allowed me to confirm that the MCU was getting stuck inside the sleep_cpu(), and never returning from it. Another thing I noticed was that when I had calls to gfx_poke(...); gfx_flush(); inside the interrupt routine, the problem was significantly less likely to occur (or maybe it never occurred, I can’t recall exactly).

After reading the Sleep and WDT (Enhanced Watchdog Timer) sections of the ATMega32U4-AU’s datasheet, I started investigating some more (eg. confirming that the WDTCSR flags were actually being set, checking when the WDT interrupt routine was running, etc). I noticed the datasheet mentions that the AVR MCU takes several cycles to fully wake from sleep (restarting clocks/timers, etc). This, plus the problem being less likely when the interrupt routine took longer to complete, led me to try an interrupt routine which only waited for 1ms, and calling wdt_disable() from the main execution thread after sleep mode had been disabled.

(I also made the WDT use interrupt-only mode, not interrupt-and-reset mode, to remove the possibility of accidental WDT-induced bootloops. This is an issue I discovered while searching for problems with ATMegas and the WDT, but it turns out this wasn’t the cause here.)

This change (disabling the WDT from outside its interrupt routine) fixed the problem - using the 4x speed-up testing, I went 5 mins without the problem reoccurring. This test was equivalent to (at least) 20 mins of “normal” testing, and my previous “normal” testing had always struck the problem in under 10 mins.

6 Likes

That’s very good work.

That sounds awesome devkev.

It was happening every night for me, I’ve not tried your patch yet, but I’ve changed my process for starting up and shutting down and it seems to have improved things. Still trialing it, not 100% sure yet, so this is a bit of guesswork.

Don’t tap the circle button to wake up the keyboard. I make a conscious effort to hold the button. I think maybe tapping it while asleep puts it in a weird state?

Shutdown via Circle+0. I was using shutdown via the terminal, maybe that makes the issue you found more likely?

Not getting a response from a short tap, when the keyboard is in deep sleep mode, is normal and expected. While in deep sleep, the keyboard doesn’t respond to any external events. It wakes up once per second to check if the Circle key is depressed. If not, it goes back to sleep, but if so, it stays awake and goes to the menu. So it’s normal to have to hold down Circle for about a second to wake up the keyboard. Short taps while it’s in deep sleep should have no effect (and definitely shouldn’t induce any sort of weird state).

There’s a separate bug where waking up the keyboard, while also holding down some other key, causes the menu to not be displayed. I think I know what’s causing that, but haven’t looked into it properly yet (one thing at a time).

I’ve also noticed that there’s an issue where shutting down via the OS (which I assume does an ACPI shutdown) sometimes gets the keyboard to go into sleep (the “goodbye” animation plays), but sometimes it doesn’t. When this happens, Circle + b still reports the SOM as being “On”, and the current draw is non-negligible (about 0.1-0.2A, IIRC?). So perhaps the bug is actually that the system itself hasn’t actually powered off like it should. I think I remember reading about that in another thread somewhere on the forums, but I’m not sure. eg. maybe this is a known issue with the v2 system image, and the v3 beta fixes it?

Anyway, yes, the advice is generally to cleanly shutdown the OS first, and then also do Circle + 0, “just in case”.

1 Like

Merge Request has been created now: Keyboard powersave fix (!26) · Merge requests · Reform / reform · GitLab

3 Likes

I’m confused why disabling the watchdog timer from within its interrupt handler would be a race condition, guess there’s something odd happening with the AVR as that shouldn’t make us exit interrupt handler. Even if it took multiple cycles to wake from sleep it shouldn’t be anywhere near the 1 second WDT interval.

I never personally encountered any issues with my reform since adding that, I guess whatever it is is extremely rare? Or perhaps this is some other hidden race condition and not the true culpret?

Or perhaps something else in the sleep code is the issue where you’re trying to wake it up before it’s fully asleep? I’ll have to give this a test myself on the original code, though I’m currently away from home for an extended period and don’t have my reform with me.

The reason it was done with the WDT and all that was because the keyboard pin that could be used to determine the circle key was not an interruptable pin, or at least couldn’t be used in deep sleep more. So polling had to be done. 1s was chosen to reduce the on time of the microcontroller since it would take a fair number of cycles to power up (as you said) then do a single row poll then shut back down,

I don’t think a boot loop is possible as after a reset I believe the WDT is disable until manually enabled again in the code.

I’m still surprised there’s an issue as I’ve never had any even since first implementing the code where the keyboard did not wake normally, even before the corresponding LPC sleep code was added. I’ve had it sit in that low power state for over a week even with no issue.

2 Likes

I got another won’t turn on situation today, even with shutting down only by the circle menu.

That has inspired me to build and flash your branch, so we’ll see how that goes :sweat_smile:

1 Like

Yeah, it’s not very clear why this solves the problem. My best guess is that it might be something along the lines of:

  • The WDT fires the interrupt
  • MCU starts waking up
  • MCU starts executing the WDT interrupt routine
  • WDT gets disabled
    • (If other interrupts can fire during an interrupt handler, then maybe this doesn’t take effect, if the 4-cycle WDCE rule is broken?)
  • WDT interrupt routine finishes
  • The WDT-interrupt dispatch code somehow gets confused, because it fired a WDT-interrupt, but now it looks like the WDT is disabled.
  • Because of this confusion, it puts the MCU back to sleep - except that now, since WDT has been disabled (and there are no other interrupt sources), the MCU can never wake up again.
  • The confusion could be racy because the main MCU clock is still in the process of starting up, which IIRC takes a little time (4 cycles?).

But I agree this is pretty vague and hand-wavy.

I similarly don’t understand why the Delay_MS(1) should be required in the interrupt routine - all I know is that without it, the problem somehow remained.

There’s also this quote from AVR132: Using the Enhanced Watchdog Timer, from Section 2.3 which discusses using WDT as a periodic sleep interrupt/wakeup, as well as a reset source (emphasis added by me):

It is also possible to set up the WDT to work as a wakeup timer when entering sleep mode, and switch to WDT System Reset operation when back in active mode. … the application … has to enable the WDT Interrupt Mode prior to entering sleep mode every time.

Re-enabling the WDT Interrupt Mode inside the interrupt handler is not recommended, as it could cause the WDT to get stuck in WDT Interrupt Mode, if some parts of the code fail.

It doesn’t go into any specifics about what “some parts of the code fail” actually means, and I also haven’t thought about it especially deeply, but it seems plausible that for the similar reasons, it might be true that “disabling WDT Interrupt Mode inside the interrupt handler is not recommended, as it could cause the WDT to get stuck in non-Interrupt Mode”. And since the WDT interrupt is the only thing waking the MCU from sleep, not having it could maybe explain why it gets stuck forever in Deep Sleep?

The flipside is that Section 3.2 (Using the WDT as a Wakeup Timer) of the same document describes the behaviour of a sample application (which I can’t actually find the code for):

The interrupt handler disables WDT Interrupt Mode, so that no unnecessary interrupts are generated if the main loop runs long before entering sleep mode once again.

So yeah, it’s all a bit confusing.

I considered this, since the wdt_disable() in the interrupt handler would mean that an accidental “early fire” of the WDT (ie. before sleeping) would cause the sleep to never wake. But the WDT is configured with a 1 sec timeout immediately before going to sleep, and it would be very strange (and contrary to observations) if the MCU took over 1 sec to go to sleep. My OLED debug output did show that the MCU executed instructions right up until the sleep_cpu() call, and then ran the interrupt handler (to completion), but never resumed executing instructions after sleep_cpu().

There was a separate bug in the firmware’s sleep code where interrupts were only disabled the first time the WDT was configured. This could (theoretically) cause the WDT to not be configured (if an interrupt happened during the 4-cycle WDCE window), which would cause the MCU to never wake up from sleep. However, I ruled this out because (a) the problem persisted after I disabled interrupts immediately prior to setting the WDT (ie. inside the loop), (b) in my testing I was observing the problem on the first pass through the loop (actually the loop had been adjusted so it never repeated), when interrupts were correctly disabled during WDCE.

AVRs have a weird thing where after a WDT reset, the WDT remains enabled, but the timeout is reset to 16ms. So if the bootloader/MCU init doesn’t disable the WDT within the first 16ms, the MCU will get reset again. You’re right though that this can’t be the issue here, because the Reform KB FW does disable the WDT pretty early on.

Yes, please do give this a try when you get the chance! I’m very curious if my repro steps will trigger the problem on your unit, since you say you’ve never seen this before, whereas others on this thread have run into it occasionally without even trying. So it could easily be a problem that somehow only affects some units, and if so then it would be very good to understand that.

I was having startup problems every day after leaving it off overnight. Shutting down via the circle menu instead of sudo shutdown made it a little better, but it was still happening. I’ve had no problems since applying the patch 9 days ago.

A big thanks to devkev, it’s a huge relief not to have to pull the batteries out every day, haha.

1 Like

I’ve found I only need to unplug the battery packs from the motherboard, not take out all the batteries each time, but still, I think I’ll try the patch.

It should be possible to recover from the problem by pressing the reset button RST (SW83) on the keyboard (not LPC). Unfortunately, this still does require unscrewing and removing the keyboard bezel.

Also my unit has a weird issue (that I’m still looking into) where it seems to always reset into flash mode regardless of the state of the PROG switch (SW84). In which case yes, unplugging and replugging the battery packs from the motherboard is the easiest workaround. But I would definitely welcome more testers of the patch!

The keyboard reset didn’t work for me in the past.

I have had another power on fail this evening so I tried SW83 again. It did not work the first time, but after switching the prog switch on, hitting SW83, switching back and hitting SW83 again it has reset the keyboard successfully.

That is better than having to open up the bottom at least. Thanks!

As another data point, I had shutdown via shutdown last night. Not the circle menu.

Can you confirm that you mean the keyboard wouldn’t wake up and present the menu after holding down only Circle for several seconds? This would be the first time it’s happened with the patch applied, to my knowledge. So perhaps the patch might only make the problem much less likely? I haven’t had any problems in the past 2 weeks, but I’ve only been checking on my Reform every so often.

This sounds awfully similar to the problem that I have with SW84. Have you tried (or can you try) just pressing SW83 on its own while the system is running? And then doing lsusb with an external keyboard plugged in?

I think what is supposed to happen is that the keyboard resets, displays the startup logo, and then is functional again. But what I get (basically every time) is that lsusb shows that the keyboard has gone into DFU mode.

Sometimes I can get it to properly reset if I switch SW84 on/off. But usually I end up getting out of DFU mode by just flashing an image (if the system is on), or by pulling and replugging the connector from J2 (UART) (if the system is off, which is where the keyboard actually gets power when the system (and hence USB) is off).

If your SW84 behaviour is similar to this, then I wonder if perhaps it could somehow be related to the deep sleep problems…?!

The other thing I’ve noticed with this is, if I place my finger on the left side of SW84 (touching, but not pressing hard), then it makes SW83 much more likely to reset properly without going into DFU. I think this means the PROG pin is floating, and SW84 isn’t actually connecting it to GND. I’ve tried reflowing the solder joints on SW84 but that hasn’t helped. I need to do some more rigorous analysis of the whole problem…

That’s correct.

Yep that’s exactly what I’m seeing happen. With SW84 in the 1 position, about 50% of the time when I hit SW83 it goes into DFU mode, the backlight does not come on and lsusb shows Atmel Corp. atmega32u4 DFU bootloader.

If I keep hitting the button, eventually the backlight and screen will come on and it will boot into the correct mode.

That seems to be the opposite for me. If i make my finer push against the bottom of SW84 it seems more likely to boot into DFU.

I should also note is sometimes when I open up the laptop the OLED is on with the reform logo. When it was off when i originally shutdown and closed it up.

Thanks for your help and info!

1 Like

It should be possible to recover from the problem by pressing the reset button RST (SW83) on the keyboard (not LPC). Unfortunately, this still does require unscrewing and removing the keyboard bezel.

Yeah, but 6 < 10, and I don’t need to turn the reform over. Thanks for the tip.

In any case, I flashed the new firmware and haven’t had any problems with the LPC since.

1 Like