Strange mouse problem or so I thought

The place to discuss Linux and Unix Operating Systems
Grogan
Your Host
Posts: 2049
Joined: Sat Aug 21, 2021 10:04 am
Location: Ontario, Canada

Post by Grogan »

For the past few nights in IceWM, usually while I'm working in the web browser, my mouse has been freezing up for a few seconds at a time. For months now, this mouse has routinely thrown latency errors on the root console (something like "MSI mouse: input lags 26ms behind. Your system is too slow"). I think the mouse does have some problems; it's a cheap, corded MSI gaming mouse, about 5 years old. I quite like it, but it's wearing out. It's one of those optical mice that doesn't use visible light. I can't remember if it's a laser mouse or not; it doesn't matter. Anyway, after these freezes I was seeing messages like "4000ms", corresponding to the delays.

Last night it was critical: when it happened, I was lucky to get the mouse to move at all, except for brief periods. It turns out it's actually not the mouse, or not only the mouse:

Code:

[16135.954439] amdgpu 0000:03:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
[16136.134955] amdgpu 0000:03:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
[16141.611637] amdgpu 0000:03:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
[16144.425953] amdgpu 0000:03:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
I'm not completely sure what "DMUB" is yet, but it's got to do with the amdgpu "display manager" driver component that communicates with the DC ("display core") component. Simplistically, the DM translates "drm" to "dc" requests.
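To see how often these errors fire and over what time span, the kernel log can be scanned mechanically. The sketch below is a minimal, hypothetical Python helper (the names `summarize_dmub_errors` and `DmubSummary` are made up for illustration); it assumes the exact dmesg line format shown above, with a bracketed seconds-since-boot timestamp followed by the amdgpu PCI address.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

# Matches dmesg lines like:
# [16135.954439] amdgpu 0000:03:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
DMUB_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+amdgpu\s+\S+:\s+\[drm\]\s+\*ERROR\*\s+"
    r"Error queueing DMUB command:\s+status=(?P<status>\d+)"
)


@dataclass
class DmubSummary:
    count: int = 0
    first_ts: Optional[float] = None  # seconds since boot of the first error
    last_ts: Optional[float] = None   # seconds since boot of the last error
    statuses: Dict[int, int] = field(default_factory=dict)  # status code -> hits


def summarize_dmub_errors(lines: Iterable[str]) -> DmubSummary:
    """Count DMUB queueing errors and record when the first and last ones fired."""
    summary = DmubSummary()
    for line in lines:
        m = DMUB_ERROR.search(line)
        if m is None:
            continue  # not a DMUB queueing error (e.g. the firmware-load lines)
        ts = float(m.group("ts"))
        status = int(m.group("status"))
        summary.count += 1
        if summary.first_ts is None:
            summary.first_ts = ts
        summary.last_ts = ts
        summary.statuses[status] = summary.statuses.get(status, 0) + 1
    return summary
```

Piping `dmesg` into a script that calls `summarize_dmub_errors(sys.stdin)` works the same way; depending on the `kernel.dmesg_restrict` sysctl, reading `dmesg` may need root.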

It IS something that gets its firmware loaded at initialization, though:

Code:

[    0.565036] [drm] Loading DMUB firmware via PSP: version=0x07002A00
[    0.980848] [drm] DMUB hardware initialized: version=0x07002A00
Two things have changed in Arch: libdrm and libinput (X11).

One thing that has changed on my end is an even faster display, running with no vsync in IceWM. This hasn't happened yet in XFCE, where I'm using its compositor. The mouse latency warnings on the root console have still always occurred in XFCE (e.g. 26 ms), but those are ignorable, and the messages are heavily rate-limited.

I yanked my USB mouse and plugged it into one of the front ports to make it easier to get at. The next time it happened, I tried unplugging and re-plugging the mouse, and that didn't fix it; the kernel ring buffer was still getting flooded with those drm errors. Quitting X and restarting it does fix it, though. Funny how it was one of those things where it would coincidentally start working again for a few seconds, so it seemed like banging on or blowing into the mouse fixed it, but it wasn't that at all.

Nothing else is frozen; it just seems to be the mouse. I can bring up my window list in IceWM, scroll through it with the keyboard, bring windows to the foreground, etc. It's likely the mouse losing synchronization with X11.

Not sure what, if anything, I'm going to do about it. I'm considering just going out and buying a cordless mouse/keyboard combo so there's only one receiver on the USB bus. I'm not sure that has anything to do with anything; it would just be the path of least resistance, since I really need a new mouse anyway. I can also grab the latest linux-firmware, newer than what even Arch provides. My graphics card, a NAVI32, is still getting firmware updates.

P.S. The reason I can't find much about "DMUB", outside of source code snippets, is that it's itself a shortened form of another acronym, "DMCUB", which stands for "Display Microcontroller Unit B", a microcontroller these graphics cards have.

Re: Strange mouse problem or so I thought

Post by Grogan »

Yep, I found newer firmware, 20240610 (today!), and the checksums of all the navi_ files have changed from what I have on disk. So I'll try that.

P.S. Heh... of course the checksums are different, even if the files haven't changed. Arch compresses them with zstd, the upstream distribution doesn't, and I was only comparing checksums. If the files haven't actually changed, I may have done nothing but replace my firmware files with uncompressed copies.
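The checksum mix-up comes from hashing the compressed files rather than their contents. A minimal Python sketch, assuming you only want to know whether two firmware files decompress to the same bytes, hashes the decompressed stream instead. The helper names are hypothetical; `.zst` support relies on `compression.zstd`, which is only in the standard library from Python 3.14, so on older interpreters the file has to be decompressed first.

```python
import gzip
import hashlib
import lzma
from pathlib import Path

# Map a file suffix to an opener that yields the *decompressed* bytes.
OPENERS = {".xz": lzma.open, ".gz": gzip.open}
try:
    # zstd joined the standard library in Python 3.14 (compression.zstd).
    from compression import zstd
    OPENERS[".zst"] = zstd.open
except ImportError:
    pass  # older Python: .zst files can't be decompressed by this sketch


def content_sha256(path) -> str:
    """SHA-256 of a file's decompressed contents; plain files are hashed as-is."""
    path = Path(path)
    if path.suffix == ".zst" and ".zst" not in OPENERS:
        raise RuntimeError("zstd support needs Python 3.14+ (or decompress first)")
    opener = OPENERS.get(path.suffix, open)
    digest = hashlib.sha256()
    with opener(path, "rb") as fh:
        # Read in 64 KiB chunks so large blobs don't need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_firmware(a, b) -> bool:
    """True if two firmware files have identical contents once decompressed."""
    return content_sha256(a) == content_sha256(b)
```

For example, `same_firmware("/usr/lib/firmware/amdgpu/<file>.bin.zst", "linux-firmware/amdgpu/<file>.bin")`, with the real file names filled in. From a shell, `zstd -dc <file>.bin.zst | sha256sum` gives the hash of the decompressed contents to compare against `sha256sum <file>.bin`.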

Re: Strange mouse problem or so I thought

Post by Grogan »

This hasn't happened since I dropped in a new /lib/firmware (well, /usr/lib/firmware on Arch lol), a month newer than Arch's. I used IceWM all day and night yesterday. I don't think it's a coincidence. Moreover, I didn't see any mouse latency messages on the root console when I quit. That doesn't necessarily mean they weren't there: while IceWM itself runs pretty cleanly and quietly, shit like Firefox's gl_thread warning spam (the warnings actually come from Mesa) may have been obscuring them. There can't have been many, though, if I couldn't see any. I'd say it's at least back to the normal behaviour I've had since the new rig.

The mouse went back into the rear motherboard port it was originally in, since I'm not replacing it just yet and the physical connection had nothing to do with it anyway.

I suspect this started happening after Arch upgraded libdrm, and it persisted for a few days. I haven't (yet) taken control of that package, so I got the upgrade through pacman -Syu. The libinput package could be related too.

It's also possible that Mesa needed a recompile, though these are supposed to be compatible new versions of the shared libraries (their sonames didn't change). That's not necessarily the whole story, though; the headers change too.

I'm crediting the firmware with stopping the issue immediately (though I'm not certain any relevant files even changed... I won't look a gift horse in the mouth).

Re: Strange mouse problem or so I thought

Post by Grogan »

So I had that mouse and DMUB command queue shit happen again tonight. It has to do with something that happens after the display comes out of sleep, some sync thing. So far this has only happened in IceWM (no compositor). Playing games, go for a smoke, come back, move the mouse to bring the display back, open Firefox, flip to Desktop 2, pop open a terminal, and the mouse grinds to a halt and then will only move a pixel or two at a time. While this is happening I may have a hard time getting something else into focus, but when I do get my window list/task list up, move to Firefox and hit Enter, the mouse works again for a short time. When it halts again, bringing something else into focus (like Ctrl+F to open the Find bar at the bottom) makes the mouse work again. At that point I quit Firefox and try to quit the window manager; if I can't, I wait a bit until I get mouse input again briefly, etc.

All the while it's spamming the log buffer with the DMUB command queue errors.

I triggered it again after a reboot. Go for a smoke, bring the monitor back up by moving the mouse, open Firefox, go to Desktop 2, pop open a terminal and... same shit.

I'm not sure what's causing it, but that linux-firmware update wasn't what fixed it. Well, I do know what's causing it: sync problems. It could still be the mouse starting to go after all, acting up intermittently.

This time I shut right down and powered off at the PSU etc., and we'll see if it happens again. I may indeed go get a keyboard and mouse set, something that isn't ancient and is meant for modern USB ports. The only problem is scroll wheels: I hate those smooth scroll wheels that move every time you middle-click them. The wheel is a mouse button too, and I hate mice that do that. I prefer a firm, notched scroll wheel where every notch is one line, not some Microsoft interpretation of what a scroll wheel is supposed to do.
Zema Bus
Your Co-Host
Posts: 1115
Joined: Sun Feb 04, 2024 1:25 am

Re: Strange mouse problem or so I thought

Post by Zema Bus »

I had a Logitech mouse die in a similar way: it would intermittently stop moving, then start working again, and it finally died completely. This was on my work laptop while I was working, so I had to scramble to find another mouse. About 5 years ago I got a VicTsing mouse to replace a flimsy Microsoft mouse that was giving me trouble. It worked out really well and I'm still using it; it has a nice solid feel and a notched scroll wheel. I have around 4 of them now, including one on my gaming computer. They look like this (except all of mine are wired versions):
[Image: photo of the VicTsing mouse]

Re: Strange mouse problem or so I thought

Post by Grogan »

I missed your last post about that mouse; it looks like a nice one.

I came back to say I've had that

[drm] *ERROR* Error queueing DMUB command: status=2

problem again, quite severely. I now think it's Mesa. It could also be libdrm, but it's no coincidence that I did a Mesa build when libdrm got upgraded, back when this whole problem started, so...

It started happening badly again (several times while using Firefox, in both IceWM and Trinity) after I built current Mesa yesterday. I hadn't seen the problem in a while, so it may have been reintroduced. (I also killed the picom compositor; I think it was causing other instability with display power management, leaving the graphics card in a bad state. But that's when the DMUB queue error started happening in Trinity.) It happened again today too, very soon after I woke my display. I'm in XFCE now; I've never had it happen there.

I just found a thread where someone reported this problem on a Framework laptop 28 days ago: similar symptoms, display PM involved (though some say not always), with me-too's as recent as 2 days ago on multiple distros. Someone in the thread said it seemed to be Mesa, and specifically, wait for it, building Mesa with LTO, though he said it has happened once since, unrelated to power management. (I build with LTO, lol, but this isn't the usual kind of problem you get when LTO breaks something. There's a good chance that's it, though, because LTO subtly breaks things. I think this is the last straw for that; it's not worth the risk of causing needle-in-a-haystack problems.)

Re: Strange mouse problem or so I thought

Post by Grogan »

By my reasoning, this is ultimately a firmware issue. While things like libdrm and Mesa may be triggering it, these messages from the DRM layer are failures to queue commands on the display microcontroller (hardware), and it takes more than a restart of X (I think even a reboot isn't enough, though that could be coincidence too) to make it go away for any length of time. In my book, any hardware/driver fault gets a complete power-off nowadays. Hardware is weirder than it used to be.

I see this commit to linux-firmware

amdgpu: update DMCUB to v0.0.225.0 for Various AMDGPU Asics (4 days ago)

I see some bug fixes in there. I don't actually know which of those DMCUB firmware versions my amdgpu driver loads ("via PSP"), and it wasn't tested against the current stable kernel's AMD display core version (likely Linux git instead), but I'm going to pull the linux-firmware tree from git anyway.
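Since the log only reports the loaded DMUB firmware as a hex word (the "DMUB hardware initialized: version=0x07002A00" line from the first post), a minimal Python sketch can pull that value out of `dmesg` text so it can be compared before and after swapping firmware. Splitting the word into bytes (0x07002A00 becomes 7, 0, 42, 0) is plain arithmetic; reading those bytes as a version number, and how they relate to release names like v0.0.225.0, is an assumption, not something the log states. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the dmesg line: "[drm] DMUB hardware initialized: version=0x07002A00"
DMUB_VERSION = re.compile(r"DMUB hardware initialized: version=(0x[0-9A-Fa-f]{1,8})")


def split_fw_version(version: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit version word into its four bytes, most significant first.

    ASSUMPTION: treating the bytes as major.minor.revision.extra is a guess
    for side-by-side comparison; it is not taken from AMD documentation.
    """
    if not 0 <= version <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    return tuple((version >> shift) & 0xFF for shift in (24, 16, 8, 0))


def loaded_dmub_version(log_text: str) -> Optional[Tuple[int, int, int, int]]:
    """Return the last DMUB version reported in the log text, if any."""
    matches = DMUB_VERSION.findall(log_text)
    if not matches:
        return None
    # int() with base 16 accepts the "0x" prefix.
    return split_fw_version(int(matches[-1], 16))
```

Running this on the log from one boot with the old firmware and one with the new shows directly whether a different DMUB blob actually got loaded.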

I guess I should wait and see if the problem happens again after doing a Mesa build without LTO (I rebooted afterwards too, and I'm using Trinity, so we'll see), but ultimately the fix will land eventually, if it isn't there already. You shouldn't be able to crash firmware (whether it's on the chip or a blob loaded by the driver).

P.S. It wasn't Mesa LTO; it happened again in Trinity soon after coming out of display PM. I was skeptical of that anyway; there are too many coincidences in reports when the behaviour is intermittent (and he did say it happened once).

So I replaced the /usr/lib/firmware/amdgpu directory with the one from linux-firmware.git that has the updated DMCUB blobs. We'll see.

Re: Strange mouse problem or so I thought

Post by Grogan »

No, the firmware update didn't help. This is so reproducible, too: go for a smoke, wait for display PM to kick in (if it hasn't already), and within a minute or two of waking the display it happens while using Firefox (but that could just be because that's what I'm doing, or have running).

So far, this has never happened in XFCE with its compositor enabled (it isn't actually doing any effects, just running and regulating). Since it's so reproducible right now, I'm going to see whether it does happen there.

I guess the next step is to go back to what I had before the problem started, libdrm 2.4.120 and stable Mesa (stable because I can't compile current Mesa against that libdrm), and see if I can reproduce it.

Re: Strange mouse problem or so I thought

Post by Grogan »

Yeah, no, I can't reproduce this in XFCE4, at least not with display power management like I can every time elsewhere. I tried 3 times.

Ultimately it's driver and firmware issues being hit, but the key is probably the compositor, for whatever reason. This also didn't happen in Trinity until I stopped using the picom compositor (which I dropped because of other instability not related to this). I did try enabling Trinity's own compositor, but that's just a software-based Compton used internally and may not actually do anything except when using effects (I mean, it doesn't take over your whole display canvas). There's a "use GL" checkbox, but that's always been bad juju in the past, so I wouldn't care to try it.

This likely never went away. For a while I was sort of avoiding it by only playing games in IceWM, and then by using picom with Trinity. Yesterday's Mesa build may not have had any effect on it (it was around the same time I stopped using picom).

Tonight I think I'll try turning the display PM off with xset while using IceWM and see if it ever happens while using Firefox in between gaming.
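Turning display PM off is `xset -dpms`, and `xset q` reports the current state in its "DPMS (Energy Star)" section. The Python sketch below wraps both; it assumes the usual `xset q` layout (timeouts on a Standby/Suspend/Off line, then "DPMS is Enabled" or "DPMS is Disabled") and needs `DISPLAY` pointing at the running X server. The function names are hypothetical.

```python
import re
import subprocess
from typing import Dict, Optional, Tuple


def parse_dpms(xset_q: str) -> Optional[Tuple[bool, Dict[str, int]]]:
    """Parse the DPMS section of `xset q` output.

    Returns (enabled, {"standby": s, "suspend": s, "off": s}), or None when
    the output has no DPMS section (e.g. the server lacks the extension).
    """
    _, sep, dpms = xset_q.partition("DPMS (Energy Star):")
    if not sep:
        return None
    state = re.search(r"DPMS is (Enabled|Disabled)", dpms)
    if state is None:
        return None
    timeouts = {
        name.lower(): int(secs)
        for name, secs in re.findall(r"(Standby|Suspend|Off):\s*(\d+)", dpms)
    }
    return state.group(1) == "Enabled", timeouts


def current_dpms() -> Optional[Tuple[bool, Dict[str, int]]]:
    """Ask the running X server for its DPMS state via `xset q`."""
    out = subprocess.run(["xset", "q"], check=True, capture_output=True, text=True)
    return parse_dpms(out.stdout)


def disable_display_pm() -> None:
    """Turn off DPMS for the current $DISPLAY (same as running `xset -dpms`)."""
    subprocess.run(["xset", "-dpms"], check=True)
```

Checking `current_dpms()` after login is a quick way to confirm the setting stuck for the session, since a desktop's own power manager can turn DPMS back on.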

Re: Strange mouse problem or so I thought

Post by Grogan »

So yeah: no display power management, no problem. And XFCE is the only place where display power management doesn't trigger it (and that's just Xorg's DPMS facility, too; I don't install XFCE's power manager). Having the composited canvas probably keeps things from going out of sync, or makes that IC on the card initialize properly after being powered down (BACO etc.). The fault may not happen instantly because there's a big buffered request queue or something.
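The buffered-queue guess can be illustrated with a toy model: if the consumer of a fixed-size command queue stalls, the producer keeps succeeding until the queue fills, so the first error shows up a while after the real failure. This Python sketch is purely hypothetical and is not how amdgpu or the DMUB actually manage their queue; the capacity, timing and names are invented for illustration.

```python
from collections import deque
from typing import Optional


class ToyCommandRing:
    """HYPOTHETICAL fixed-size command queue drained by a separate consumer.

    Not the amdgpu/DMUB implementation; it only illustrates how a stalled
    consumer plus a deep queue can delay the first "can't queue" error.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending = deque()
        self.consumer_alive = True

    def submit(self, cmd: int) -> bool:
        """Queue a command; False means the ring is full (the error case)."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append(cmd)
        return True

    def service(self, budget: int) -> int:
        """Consumer drains up to `budget` commands unless it has stalled."""
        if not self.consumer_alive:
            return 0
        drained = 0
        while self.pending and drained < budget:
            self.pending.popleft()
            drained += 1
        return drained


def first_failure_tick(capacity: int, stall_at: int, ticks: int) -> Optional[int]:
    """Submit one command per tick; the consumer stops at tick `stall_at`.

    Returns the tick of the first rejected submit, or None if none occurred.
    """
    ring = ToyCommandRing(capacity)
    for tick in range(ticks):
        if tick == stall_at:
            ring.consumer_alive = False
        if not ring.submit(tick):
            return tick
        ring.service(budget=1)
    return None
```

With a capacity of 8 and the consumer stalling at tick 10, the first rejected command comes at tick 18; a deeper queue pushes the first error further out.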

So at least it's avoidable until it's solved (and it's likely not a hardware failure... which I'm very fearful of nowadays. I've had ridiculous hardware failures and ridiculous problems.)

Re: Strange mouse problem or so I thought

Post by Grogan »

Well, it had to happen. Now xfwm4 is broken too; I have no solace. Its compositor suddenly can't enable vblank sync through GLX anymore. I noticed this last night on the root console when I quit XFCE in Arch. Since my most recent Mesa build I haven't been using Arch much while working on Gentoo, probably not long enough at a stretch for DPMS to kick in (or I just boot into IceWM to play a game, where I don't use display power management anymore... no compositor, no vsync).

GLX itself is obviously working (I'd have known otherwise within 10 seconds), and this wasn't happening before; I'd have noticed this error and the bad behaviour. It is not happening on my Gentoo install with a stable graphics stack (and locally compiled XFCE).

Code:

(xfwm4:692): xfwm4-WARNING **: 13:10:06.723: Cannot find a matching visual for the frame buffer config.

(xfwm4:692): xfwm4-WARNING **: 13:10:06.723: Cannot find a matching GLX config, vsync disabled.
I just knew that was going to cause this same problem as soon as I saw it, and it does.

I can use libXpresent for the vblank mode (I tested it and it seems to work, at least initially), but I don't know much about that extension and I'm leery. I think picom uses it, and I had problems with that too: not the same problem, but coming back to a dead session a few times.

The trouble with XFCE is that significant updates take years, so I wouldn't hold my breath waiting for them to update their methods. I don't know why it doesn't like the GLX extension now. Stable xfwm4 hasn't been touched since Dec. 2022.

XFCE is actually still from Arch's distro packages here, so I'm going to try compiling it now. (I'll probably have to downgrade Mesa, but I shouldn't be running distro packages for this anyway.)

P.S. Now that I think about it, what this will most likely turn out to be is the change in Mesa, not Mesa itself. Because I build Mesa without the LLVM back end, a recent change to Mesa (now in 24.2.0-rc2 as well as git main, though I haven't installed and tested that yet) means I no longer have swrast, so it's probably just barfing on the first error it gets while initializing GLX. In software mode it fails on swrast, then tries to initialize zink, which fails with "libGL only software" (as it should), and then falls back to softpipe. I'll bet something similar is happening with the way they're initializing GLX, except that xfwm4 then disables vsync because of it.
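The fallback sequence described here (swrast fails, zink fails, softpipe loads) can be modelled with a small, purely hypothetical Python sketch. It is not Mesa's loader code: the loaders and the swrast error message are placeholders, and the two vsync policies only contrast a caller that gives up on the first logged failure with one that checks whether a driver eventually loaded, which is the kind of behaviour suspected of xfwm4 here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Loader = Callable[[], None]


def failing(message: str) -> Loader:
    """Build a loader that always fails with `message` (for the simulation)."""
    def load() -> None:
        raise RuntimeError(message)
    return load


def load_first_working(order: List[str], loaders: Dict[str, Loader]) -> Tuple[Optional[str], List[str]]:
    """Try drivers in order; return the one that loaded plus the errors seen on the way."""
    errors: List[str] = []
    for name in order:
        try:
            loaders[name]()
        except RuntimeError as exc:
            errors.append(f"{name}: {exc}")
            continue
        return name, errors
    return None, errors


# HYPOTHETICAL model of the fallback described above; messages are placeholders.
LOADERS: Dict[str, Loader] = {
    "swrast": failing("not built (LLVM disabled)"),
    "zink": failing("libGL only software"),
    "softpipe": lambda: None,  # succeeds
}
ORDER = ["swrast", "zink", "softpipe"]


def naive_vsync(driver: Optional[str], errors: List[str]) -> bool:
    """Gives up on vsync at the first logged error, even though a driver loaded."""
    return not errors


def robust_vsync(driver: Optional[str], errors: List[str]) -> bool:
    """Only cares whether some driver ended up loaded."""
    return driver is not None
```

With these loaders, `load_first_working(ORDER, LOADERS)` ends on softpipe after two logged failures; the naive policy turns vsync off anyway, while the robust one keeps it.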

Re: Strange mouse problem or so I thought

Post by Grogan »

Actually, it was just my fucking Mesa build. I just built Mesa 24.2.0-rc2 my way (LLVM disabled, softpipe for software rendering etc.) and xfwm4's compositor is working correctly again. I'm still going to do a build of XFCE, though; I shouldn't be using this dogfood anyway.

Re: Strange mouse problem or so I thought

Post by Grogan »

Hah fuck... I think I pegged that behaviour about right after all. What they did was fix THAT in the Mesa -rc :-)

Code:

[grogan@nicetry ~]$ LIBGL_ALWAYS_SOFTWARE=true glxgears
4188 frames in 5.0 seconds = 837.527 FPS
4208 frames in 5.0 seconds = 841.561 FPS
No more failing first and then falling back to the correct driver (which is harmless, evidently, except for unforeseen problems like this with software that doesn't do things all that smartly).
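To compare glxgears runs (for example with and without `LIBGL_ALWAYS_SOFTWARE=true`), the per-interval report lines can be averaged. A minimal Python sketch, assuming glxgears' usual "N frames in T seconds = F FPS" line format; the helper names are hypothetical.

```python
import re
from statistics import mean
from typing import List

# glxgears prints lines like: "4188 frames in 5.0 seconds = 837.527 FPS"
GEARS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def gears_fps(output: str) -> List[float]:
    """Return the FPS value from every glxgears report line in `output`."""
    return [float(m.group(3)) for m in GEARS_LINE.finditer(output)]


def average_fps(output: str) -> float:
    """Mean FPS across all report lines; raises if none are present."""
    values = gears_fps(output)
    if not values:
        raise ValueError("no glxgears FPS lines found")
    return mean(values)
```

Fed the two report lines above, `average_fps` returns roughly 839.5 FPS. The reported FPS is slightly below frames/5.0 because each interval runs a little over five seconds.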

I can use Mesa 24.2.0, soon to be the stable branch. It's certainly got everything I want in it at this time (anything new in the tree for my drivers at this point would more likely be breakage than a feature I want). I guess I'll get off the train at this stop for a while.