New Kernel

Post by **Grogan** » Sat Aug 03, 2024 8:37 am

Linux 6.10.3
https://cdn.kernel.org/pub/linux/kernel ... Log-6.10.3

A few NVME fixes caught my eye

nvme-pci: add missing condition check for existence of mapped data

This one is listed as nvme-pci, but it's really about which boot parameters to use for disabling PCIe power management (because of problems with NVMe)

nvme-pci: Fix the instructions for disabling power management

[ Upstream commit 92fc2c469eb26060384e9b2cd4cb0cc228aba582 ]

pcie_aspm=off tells the kernel not to modify the ASPM configuration. This
setting does not guarantee that ASPM (Active State Power Management) is
disabled. Hence add pcie_port_pm=off. This disables power management for
all PCIe ports.

This patch has been tested on a workstation with a Samsung SSD 970 EVO Plus
NVMe SSD.
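For anyone who wants to confirm that both switches actually made it onto the kernel command line, here's a minimal sketch. The helper name `has_pm_params` is made up for illustration; it just takes a command line string (for instance the contents of /proc/cmdline) and looks for the two parameters named in the commit message.

```sh
#!/bin/sh
# Hypothetical helper: succeeds only if both parameters from the commit
# message (pcie_aspm=off and pcie_port_pm=off) appear in the given string.
has_pm_params() {
  # Pad with spaces so each parameter is matched as a whole word
  case " $1 " in
    *" pcie_aspm=off "*) ;;
    *) return 1 ;;
  esac
  case " $1 " in
    *" pcie_port_pm=off "*) ;;
    *) return 1 ;;
  esac
  return 0
}
```

Something like `has_pm_params "$(cat /proc/cmdline)" && echo "both set"` would be the typical use after a reboot.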

Post by **Grogan** » Sun Aug 11, 2024 6:20 pm

It's been a long time between point releases, but Linux 6.10.4 is out now
https://cdn.kernel.org/pub/linux/kernel ... Log-6.10.4

A lot of mptcp (multi-path) fixes as well as other networking fixes
drm fixes for various GPUs (including one for amdgpu "fix contiguous handling for IB parsing v2")
btrfs fixes
ext4 fixes

Should be had

Post by **Grogan** » Wed Aug 14, 2024 6:08 pm

Linux 6.10.5 already
https://cdn.kernel.org/pub/linux/kernel ... Log-6.10.5

Maybe because of those btrfs corruption fixes.

Also a few amd/display fixes.

Post by **Grogan** » Mon Aug 19, 2024 5:50 am

Linux 6.10.6
https://cdn.kernel.org/pub/linux/kernel ... Log-6.10.6

A bunch of amd/display fixes, not much else that jumps out.

Post by **Zema Bus** » Mon Aug 19, 2024 6:27 am

I was going to do 6.10.5 last night but I was tired after spending most of the day working on the house so decided to do it today. So 6.10.6 then.

Post by **Grogan** » Mon Aug 19, 2024 7:05 am

Sometimes procrastination pays off

Post by **Grogan** » Thu Aug 29, 2024 8:24 pm

Linux 6.10.7 today (of course there is, I just rebuilt my kernel last night lol)

https://cdn.kernel.org/pub/linux/kernel ... Log-6.10.7

Since nobody knows possible max slots, and I haven't got any of those devices ("multitouch" touchpads) I guess that's great

Input: MT - limit max slots

commit 99d3bf5f7377d42f8be60a6b9cb60fb0be34dceb upstream.

syzbot is reporting too large allocation at input_mt_init_slots(), for
num_slots is supplied from userspace using ioctl(UI_DEV_CREATE).

Since nobody knows possible max slots, this patch chose 1024.

I haven't been using ksmbd lately, but I'll want that fixed

ksmbd: fix race condition between destroy_previous_session() and smb2 operations()

If there is ->PreviousSessionId field in the session setup request,
The session of the previous connection should be destroyed.
During this, if the smb2 operation requests in the previous session are
being processed, a racy issue could happen with ksmbd_destroy_file_table().
This patch sets conn->status to KSMBD_SESS_NEED_RECONNECT to block
incoming operations and waits until on-going operations are complete
(i.e. idle) before desctorying the previous session.

drm/amdgpu/vcn: not pause dpg for unified queue

commit 7d75ef3736a025db441be652c8cc8e84044a215f upstream.

For unified queue, DPG pause for encoding is done inside VCN firmware,
so there is no need to pause dpg based on ring type in kernel.

For VCN3 and below, pausing DPG for encoding in kernel is still needed.

v2: add more comments
v3: update commit message

drm/amdgpu/vcn: identify unified queue in sw init

commit ecfa23c8df7ef3ea2a429dfe039341bf792e95b4 upstream.

Determine whether VCN using unified queue in sw_init, instead of calling
functions later on.

v2: fix coding style

Some mptcp fixes (multipath tcp... I don't encounter that)

Thermal fixes (and a bunch of related ones)

thermal: of: Fix OF node leak in of_thermal_zone_find() error paths

commit c0a1ef9c5be72ff28a5413deb1b3e1a066593c13 upstream.

Terminating for_each_available_child_of_node() loop requires dropping OF
node reference, so bailing out on errors misses this. Solve the OF node
reference leak with scoped for_each_available_child_of_node_scoped().

drm/amd/amdgpu: command submission parser for JPEG

commit 470516c2925493594a690bc4d05b1f4471d9f996 upstream.

Add JPEG IB command parser to ensure registers
in the command are within the JPEG IP block.

Suffice it to say there are a lot of fixes that should be had, for a lot of things. It's a fairly lengthy changelog.

Post by **Zema Bus** » Fri Aug 30, 2024 7:18 am

And I just did 6.10.6 a few days ago

Post by **Grogan** » Wed Sep 04, 2024 5:14 pm

Linux 6.10.8
https://cdn.kernel.org/pub/linux/kernel ... Log-6.10.8

Not a biggy... some nfsd fixes, a bunch of various USB devices fixes (specific shit, not anything we'd have). A few amdgpu/display fixes as usual.

Post by **Grogan** » Fri Sep 06, 2024 6:33 pm

I think the reason we're not seeing point releases as frequently anymore is that they are doing better quality control on them. It also has not escaped my notice that there haven't been as many reverts lately either. Often times you'd see a new point release a day or two later that had only reverts, back when they were shitting them out like a goose. I think they are actually doing a release candidate stage for them now. They aren't posted, but they'd be like git tags or something in whatever "stable review" branch that is.

P.S. I think it's like this (where linux-6.10.y would be the branch to check out, which would give you the latest rc queued up for that series)

git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y

I'm not saying a 6.10.9 rc exists there yet, specifically, this is just an example of how it seems to work. (from a mailing list announcement that came up in a search)

Post by **Grogan** » Sun Sep 08, 2024 6:58 pm

Linux 6.10.9 already
https://cdn.kernel.org/pub/linux/kernel ... Log-6.10.9

Not only fixes, but it looks like there are a lot of code cleanups too, stuff found by analysis tools. For one example:

drm/amd/display: Avoid overflow from uint32_t to uint8_t

[ Upstream commit d6b54900c564e35989cf6813e4071504fa0a90e0 ]

[WHAT & HOW]
dmub_rb_cmd's ramping_boundary has size of uint8_t and it is assigned
0xFFFF. Fix it by changing it to uint8_t with value of 0xFF.

This fixes 2 INTEGER_OVERFLOW issues reported by Coverity.

There are several amdgpu and amdgpu/display fixes today.

It looks like we may also be getting to the crux of the problem I (am/was) having when I hit that bug when using no compositor. I've been avoiding it so I only assume it's not fixed. Firmware fixed other issues that crept in, but I don't know about that one. I really hate it when it happens, so I don't let it. But I'll have to get up the nerve to test that.

drm/amd/display: Disable DMCUB timeout for DCN35

[ Upstream commit 7c70e60fbf4bff1123f0e8d5cb1ae71df6164d7f ]

[Why]
DMCUB can intermittently take longer than expected to process commands.

Old ASIC policy was to continue while logging a diagnostic error - which
works fine for ASIC without IPS, but with IPS this could lead to a race
condition where we attempt to access DCN state while it's inaccessible,
leading to a system hang when the NIU port is not disabled or register
accesses that timeout and the display configuration in an undefined
state.

[How]
We need to investigate why these accesses take longer than expected, but
for now we should disable the timeout on DCN35 to avoid this race
condition. Since the waits happen only at lower interrupt levels the
risk of taking too long at higher IRQ and causing a system watchdog
timeout are minimal.

DCN 3.5... that's me. There are other entries related to that change as well, so hopefully that was it.

Not to discount all the other fixes in this point release, but I'm highlighting what may interest us.

Post by **Grogan** » Sun Sep 08, 2024 8:17 pm

Nope, that didn't solve my DMCUB queuing problem. With the new kernel, using IceWM, I let X11's DPMS power management kick in and when it awoke, it didn't take long for the problem to happen. Scroll a bit in Firefox, pop a terminal, list something and input devices are stuck. You can break through briefly by doing something... for example switching desktops with a keyboard shortcut may work if the timing is right, or if Firefox is up on screen, ctrl-F to bring up the Find toolbar may briefly snap out of it and allow me to close things down and quit to console.

Code:

[  899.834655] amdgpu 0000:03:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
[  899.834656] amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[  900.014164] amdgpu 0000:03:00.0: [drm] *ERROR* Error queueing DMUB command: status=2
[  900.014165] amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[  900.193678] amdgpu 0000:03:00.0: [drm] *ERROR* Error queueing DMUB command: status=2

Lots of that when it happens.
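For a bug report it helps to say how often it fires, so here's a rough sketch for counting those lines. The function name `count_dmub_errors` is invented for this example; it reads kernel log text on stdin and counts the "Error queueing DMUB command" messages shown above.

```sh
#!/bin/sh
# Hypothetical helper: count "Error queueing DMUB command" lines on stdin.
# grep -c prints 0 and exits non-zero when nothing matches, so the
# "|| true" keeps the function's exit status clean in that case.
count_dmub_errors() {
  grep -c 'Error queueing DMUB command' || true
}
```

Usage would be along the lines of `dmesg | count_dmub_errors`, or feeding it a saved log with `count_dmub_errors < dmesg.txt`.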

I was blaming firmware for this, but it seems it could be the kernel drivers. I tend to take the low level amdgpu driver for granted, but there's all that display core shit too. I think I should report this now.

Post by **Grogan** » Thu Sep 12, 2024 10:44 pm

Linux 6.10.10 now
https://cdn.kernel.org/pub/linux/kernel ... og-6.10.10

A lot of fixes to a lot of systems. One thing that looks significant/welcome is some amdgpu work pertaining to GPU resets.

drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic

[ Upstream commit 6e4aa08fa9c6c0c027fc86f242517c925d159393 ]

The retry loop for SRIOV reset have refcount and memory leak issue.
Depending on which function call fails it can potentially call
amdgpu_amdkfd_pre/post_reset different number of times and causes
kfd_locked count to be wrong. This will block all future attempts at
opening /dev/kfd. The retry loop also leakes resources by calling
amdgpu_virt_init_data_exchange multiple times without calling the
corresponding fini function.

Align with the bare-metal reset path which doesn't have these issues.
This means taking the amdgpu_amdkfd_pre/post_reset functions out of the
reset loop and calling amdgpu_device_pre_asic_reset each retry which
properly free the resources from previous try by calling
amdgpu_virt_fini_data_exchange.

drm/amdgpu: Add reset_context flag for host FLR

[ Upstream commit 25c01191c2555351922e5515b6b6d31357975031 ]

There are other reset sources that pass NULL as the job pointer, such as
amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if
the FLR comes from the host does not work.

Add a flag in reset_context to explicitly mark host triggered reset, and
set this flag when we receive host reset notification.

drm/amdgpu: Fix two reset triggered in a row

[ Upstream commit f4322b9f8ad5f9f62add288c785d2e10bb6a5efe ]

Some times a hang GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if they
schedule after we call amdgpu_device_stop_pending_resets.

Move amdgpu_device_stop_pending_resets to after the reset is done. Since
at this point the GPU is supposedly in a good state, any reset scheduled
after this point would be a legitimate reset.

Remove unnecessary and incorrect checks for amdgpu_in_reset that was
kinda serving this purpose.

Post by **Zema Bus** » Sat Sep 14, 2024 8:07 am

I'll do this one tomorrow, then it looks like we'll probably have 6.11 on Sunday.

Post by **Zema Bus** » Sun Sep 15, 2024 11:29 pm

It's done

Attachment: PXL_20240915_232135577.jpg (187.08 KiB)

Post by **Grogan** » Mon Sep 16, 2024 12:24 am

Damnit... now I have to do it

(forgot that was coming today)

Post by **Grogan** » Mon Sep 30, 2024 5:39 pm

Linux 6.11.1 today, which calls it "stable"

This is very short, even just for 6.11 to 6.11.1 (as always, main releases are too long to have changelogs lol)
https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.11.1

This means it'll be time for new kernel api headers and gnu rebuilds etc. soon though. Arch usually does it soon after this.

Post by **Grogan** » Fri Oct 04, 2024 6:32 pm

It's kernel time again, Linux 6.11.2
https://cdn.kernel.org/pub/linux/kernel ... Log-6.11.2

It's considerably more lengthy this time. Memory management related fixes, filesystem fixes. Of course (as always) networking fixes, power manglement fixes etc.

Post by **Zema Bus** » Sat Oct 05, 2024 8:32 am

I'll do it tomorrow, I'm still on 6.11.0.

Post by **Grogan** » Thu Oct 10, 2024 4:40 pm

Linux 6.11.3
https://cdn.kernel.org/pub/linux/kernel ... Log-6.11.3

Lots of other stuff, but I could stop reading right there. I use DisplayPort and DPMS, though I haven't hit this.

drm/amd/display: Revert Avoid overflow assignment

commit e80f8f491df873ea2e07c941c747831234814612 upstream.

This reverts commit a15268787b79 ("drm/amd/display: Avoid overflow assignment in link_dp_cts")
Due to regression causing DPMS hang.

This is my NIC (though I'm not having any problems). There are more related entries for this in the changelog.

r8169: add tally counter fields added with RTL8125

[ Upstream commit ced8e8b8f40accfcce4a2bbd8b150aa76d5eff9a ]

RTL8125 added fields to the tally counter, what may result in the chip
dma'ing these new fields to unallocated memory. Therefore make sure
that the allocated memory area is big enough to hold all of the
tally counter values, even if we use only parts of it.

This would be significant for laptop users

ACPI: battery: Fix possible crash when unregistering a battery hook

[ Upstream commit 76959aff14a0012ad6b984ec7686d163deccdc16 ]

When a battery hook returns an error when adding a new battery, then
the battery hook is automatically unregistered.
However the battery hook provider cannot know that, so it will later
call battery_hook_unregister() on the already unregistered battery
hook, resulting in a crash.

Fix this by using the list head to mark already unregistered battery
hooks as already being unregistered so that they can be ignored by
battery_hook_unregister().

Lots of justification for doing this build.

P.S. Shit like this shouldn't be going in point releases, but if one just goes with the default, it makes no change. I don't think there would be much cost to restricting it though (and it wouldn't affect me, as I don't think software would normally use that interface, it's for debugging. Though 2 would be the safer option so crash handlers and the like could operate). All the same, I'm just going with the old default.

Allow /proc/pid/mem access override
> 1. Traditional /proc/pid/mem behavior (PROC_MEM_ALWAYS_FORCE) (NEW)
2. Require active ptrace() use for access override (PROC_MEM_FORCE_PTRACE) (NEW)
3. Never (PROC_MEM_NO_FORCE) (NEW)
choice[1-3?]: ?

Traditionally /proc/pid/mem allows users to override memory
permissions for users like ptrace, assuming they have ptrace
capability.

This allows people to limit that - either never override, or
require actual active ptrace attachment.

Defaults to the traditional behavior (for now)

Defined at security/Kconfig:22
Prompt: Allow /proc/pid/mem access override
Location:
-> Security options
-> Allow /proc/pid/mem access override (<choice> [=n])
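To double check which of the three choices ended up in a finished .config, something like this should do. It's only a sketch: `proc_mem_choice` is a made-up name, and the symbols are the ones from the prompt above with the usual CONFIG_ prefix that .config files carry.

```sh
#!/bin/sh
# Hypothetical helper: print whichever /proc/pid/mem override choice is set
# to =y in the given kernel .config. Unselected choices show up as
# "# CONFIG_... is not set" and are ignored by the anchored pattern.
proc_mem_choice() {
  grep -E '^CONFIG_PROC_MEM_(ALWAYS_FORCE|FORCE_PTRACE|NO_FORCE)=y$' "$1"
}
```

Run from the kernel tree as `proc_mem_choice .config`; with the traditional default it should print the ALWAYS_FORCE line.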

Post by **Grogan** » Thu Oct 17, 2024 5:57 pm

Linux 6.11.4
https://cdn.kernel.org/pub/linux/kernel ... Log-6.11.4

Typical fixes that you would see in a point release. Security, reliability, fixing things caught by instrumentation/analysis tools etc.

For example,

nouveau/dmem: Fix vulnerability in migrate_to_ram upon copy error

commit 835745a377a4519decd1a36d6b926e369b3033e2 upstream.

The `nouveau_dmem_copy_one` function ensures that the copy push command is
sent to the device firmware but does not track whether it was executed
successfully.

In the case of a copy error (e.g., firmware or hardware failure), the
copy push command will be sent via the firmware channel, and
`nouveau_dmem_copy_one` will likely report success, leading to the
`migrate_to_ram` function returning a dirty HIGH_USER page to the user.

This can result in a security vulnerability, as a HIGH_USER page that may
contain sensitive or corrupted data could be returned to the user.

To prevent this vulnerability, we allocate a zero page. Thus, in case of
an error, a non-dirty (zero) page will be returned to the user.

I'm glad I don't have anything that uses libata in my rig, but:

ata: libata: avoid superfluous disk spin down + spin up during hibernation

commit a38719e3157118428e34fbd45b0d0707a5877784 upstream.

A user reported that commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi
device manage_system_start_stop") introduced a spin down + immediate spin
up of the disk both when entering and when resuming from hibernation.
This behavior was not there before, and causes an increased latency both
when entering and when resuming from hibernation.

Hibernation is done by three consecutive PM events, in the following order:
1) PM_EVENT_FREEZE
2) PM_EVENT_THAW
3) PM_EVENT_HIBERNATE

Commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi device
manage_system_start_stop") modified ata_eh_handle_port_suspend() to call
ata_dev_power_set_standby() (which spins down the disk), for both event
PM_EVENT_FREEZE and event PM_EVENT_HIBERNATE.

Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
explicitly mentions that PM_EVENT_FREEZE does not have to be put the device
in a low-power state, and actually recommends not doing so. Thus, let's not
spin down the disk on PM_EVENT_FREEZE. (The disk will instead be spun down
during the subsequent PM_EVENT_HIBERNATE event.)

This way, PM_EVENT_FREEZE will behave as it did before commit aa3998dbeb3a
("ata: libata-scsi: Disable scsi device manage_system_start_stop"), while
PM_EVENT_HIBERNATE will continue to spin down the disk.

This will avoid the superfluous spin down + spin up when entering and
resuming from hibernation, while still making sure that the disk is spun
down before actually entering hibernation.

As usual, also fixes related to amdgpu

drm/amd/display: Clear update flags after update has been applied

commit 0a9906cc45d21e21ca8bb2b98b79fd7c05420fda upstream.

[Why]
Since the surface/stream update flags aren't cleared after applying
updates, those same updates may be applied again in a future call to
update surfaces/streams for surfaces/streams that aren't actually part
of that update (i.e. applying an update for one surface/stream can
trigger unintended programming on a different surface/stream).

For example, when an update results in a call to
program_front_end_for_ctx, that function may call program_pipe on all
pipes. If there are surface update flags that were never cleared on the
surface some pipe is attached to, then the same update will be
programmed again.

[How]
Clear the surface and stream update flags after applying the updates.

drm/amdgpu: partially revert powerplay `__counted_by` changes

commit d6b9f492e229be1d1bd360c3ac5bee4635bacf99 upstream.

Partially revert
commit 0ca9f757a0e2 ("drm/amd/pm: powerplay: Add `__counted_by` attribute for flexible arrays")

The count attribute for these arrays does not get set until
after the arrays are allocated and populated leading to false
UBSAN warnings.

Post by **Grogan** » Tue Oct 22, 2024 4:58 pm

Linux 6.11.5
https://cdn.kernel.org/pub/linux/kernel ... Log-6.11.5

Lots of fixes... e.g. mm, swap.

Post by **Grogan** » Fri Nov 01, 2024 5:20 pm

Linux 6.11.6 today:
https://cdn.kernel.org/pub/linux/kernel ... Log-6.11.6

Not a huge changelog, but some significant fixes for some. I gave it a brief scroll and nothing jumped out for us. A lot of things don't affect me because I don't use power management (other than DPMS in X11). For example, fixes for my r8169 NIC coming out of suspend don't really affect me (assuming it's even relevant for the code paths I use for my model). I don't use BPF for anything, I don't have the latest generation of AMD video cards, I don't use libata for anything anymore etc.

Post by **Zema Bus** » Sat Nov 02, 2024 5:28 pm

Something to do today

Post by **Grogan** » Fri Nov 08, 2024 6:53 pm

Linux 6.11.7
https://cdn.kernel.org/pub/linux/kernel ... Log-6.11.7

This is the kind of stuff they do in point releases now? Here's a significant change I just noticed for amdgpu:

drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs

commit ec1aab7816b06c32f42935e34ce3a3040c778afb upstream.

This uses more aggressive hueristics than the the bootup default
profile. On windows the OS has a special fullscreen 3D mode
where this is used. Since we don't have the equivalent on Linux
default to this profile for dGPUs.

"dGPUs" means discrete video cards (vs. iGPU for integrated)

That's a more aggressive power management profile for the cards. I'm not sure I want that now (Not the same as forcing it to full power, but that didn't actually help me with Starfield, my gpu power management is working fine) and now I'm going to have to figure out how to override it and have the change stick (probably using sysctl, I don't want to stick commands in a script). I'm going to hold off on this kernel until I have that sorted.
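As far as I know, the profile is exposed through amdgpu's sysfs file pp_power_profile_mode rather than through sysctl, with the active entry marked by an asterisk. Here's a small sketch for checking what a given kernel picked. The helper name `active_power_profile` is invented, and the default path assumes the card is card0, which may well not be true on a given box.

```sh
#!/bin/sh
# Hypothetical helper: show the line(s) marked active ('*') in a
# pp_power_profile_mode listing. The default path is an assumption
# (GPU enumerated as card0); pass another path to override it.
active_power_profile() {
  grep -F '*' "${1:-/sys/class/drm/card0/device/pp_power_profile_mode}"
}
```

Switching profiles is, to my understanding, a matter of setting power_dpm_force_performance_level to manual and writing the profile number to the same file. Since that's sysfs, making it stick across boots would likely take a udev rule or a boot-time command, not a sysctl.conf entry.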

Here's another, but I think it's for newer cards.

drm/amdgpu: fix random data corruption for sdma 7

[ Upstream commit 108bc59fe817686a59d2008f217bad38a5cf4427 ]

There is random data corruption caused by const fill, this is caused by
write compression mode not correctly configured.

So correct compression mode for const fill.

Mine is SDMA 6:

Code:

[grogan@nicetry ~]$ dmesg | grep -i sdma
[    0.587394] [drm] add ip block number 7 <sdma_v6_0>
[    1.232613] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    1.232614] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0

Of course, there are plenty of other little fixes in this point release that can be viewed in the changelog. These are the ones that jump out for us.

Post by **Grogan** » Fri Nov 08, 2024 8:35 pm

Actually I don't think that's hurt anything

idle:

Code:

[grogan@nicetry ~]$ sensors
amdgpu-pci-0300
Adapter: PCI adapter
vddgfx:       84.00 mV 
fan1:           0 RPM  (min =    0 RPM, max = 3200 RPM)
edge:         +28.0°C  (crit = +100.0°C, hyst = -273.1°C)
                       (emerg = +105.0°C)
junction:     +35.0°C  (crit = +110.0°C, hyst = -273.1°C)
                       (emerg = +115.0°C)
mem:          +52.0°C  (crit = +108.0°C, hyst = -273.1°C)
                       (emerg = +113.0°C)
PPT:           5.00 W  (cap = 220.00 W)

5.00W to 8.00W at idle and 0 rpm fan speed at idle. The 5W value is a bit low, probably just sensor anomaly. It's usually 7 or 8.

So this is likely just going to ramp up sooner, with the new profile.

Post by **Grogan** » Mon Nov 11, 2024 8:14 pm

I decided to stop mounting the EFI partition (/boot) in the OS. There's no need for that to be mounted except for kernel installation time. I'm tired of dosfscking that if it doesn't get unmounted properly (e.g. when stuff like that Cosmic Desktop crash/freeze or game lockup happens, though I haven't had that happen in a long time). Grub is what accesses it. For most people, that gets mounted by the initramfs hooks, but for me, with my custom monolithic kernel it doesn't, so that meant just taking the line I put in fstab out.

It also makes installing kernels more of a pain in the ass now, with 2 filesystems to mount and unmount so I had to script it. I made a script on the old computer, but I wasn't using it on here, since it's a common install of the same kernel for both Arch and Gentoo (separate modules tree though, with INSTALL_MOD_PATH)

My new kernel install script, "installkern" in /usr/local/bin. Actually not "new", I edited the old one to have the right commands in the right places for this usage. It takes care of both OSes.

Code:

#! /bin/sh

if [ -f "arch/x86/boot/bzImage" ]; then
  read -p "Enter OLDVER or leave blank to skip uninstall: " OLDVER
  read -p "Enter NEWVER: " NEWVER
else
  echo "You're in the wrong dir or your kernel isn't built"
  exit 1
fi

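echo "Mounting filesystems CTRL+C to abort 5s"
sleep 5
# Bail out if either mount fails, so nothing gets written into the bare mount points
mount -t vfat -o umask=077 /dev/nvme1n1p1 /boot || exit 1
mount -t ext4 -o noatime /dev/nvme1n1p4 /mnt/gentoo || { umount /boot; exit 1; }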

if [ -z "$OLDVER" ]; then
  echo "Not removing old kernel"
else
  echo "Removing old kernel"
  rm /boot/vmlinuz-"$OLDVER"
  rm -r /lib/modules/"$OLDVER"
  rm -r /mnt/gentoo/lib/modules/"$OLDVER"
fi

if [ -z "$NEWVER" ]; then
  echo "That was silly of you"
  umount /mnt/gentoo
  umount /boot
  exit 1
else
  echo "Installing Linux $NEWVER"
  cp arch/x86/boot/bzImage /boot/vmlinuz-"$NEWVER" || exit 1
  make modules_install 
  make INSTALL_MOD_PATH=/mnt/gentoo modules_install
  echo "Opening editor for grub.cfg"
  vi /boot/grub/grub.cfg
  sleep 1
  sync
  umount /boot || exit 1
  umount /mnt/gentoo || exit 1
  echo "/boot and /mnt/gentoo unmounted"
  echo "Linux $NEWVER is installed"
fi
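Since whatever gets typed at the prompts goes straight into `rm -r`, a sanity check on the version strings might be worth having. This is only a sketch and not part of the script above: `valid_kver` is a made-up name, and the pattern assumes versions look like 6.11.7 or 6.12.0-rc1.

```sh
#!/bin/sh
# Hypothetical guard: accept only strings shaped like a kernel release
# (digits and dots, optional -suffix), so an empty string, "..", or
# anything containing a slash can't end up in an rm -r argument.
valid_kver() {
  case "$1" in
    ""|*/*|*..*) return 1 ;;
  esac
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+(\.[0-9]+)?(-[A-Za-z0-9._+-]+)?$'
}
```

It could be called right after the prompts, e.g. `valid_kver "$NEWVER" || exit 1`, and for OLDVER only when it isn't blank.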

I used it to install the kernel I'm going to talk about in the next post in thread.

Post by **Grogan** » Mon Nov 11, 2024 8:23 pm

So I decided to try something different today. I've been hearing a lot of chatter about the CachyOS kernel, so I grabbed their main patch set. I found it by expanding variables in their PKGBUILD, from here:

https://github.com/CachyOS/linux-cachyos

This patch set (rolled into one big patch)
https://raw.githubusercontent.com/cachy ... -all.patch

I didn't install their scheduler patches, I'm going to stay with the kernel's default for those. What I'm primarily interested in is the THP Shrinker (Transparent Huge Page wasting less memory) backported from the newly branched 6.13.

I used my own .config and enabled things selectively during oldconfig (then checked it over with menuconfig after)

I did not enable NTSYNC, that's a bit too experimental yet, I'll stick with FSYNC for now. My wine-tkg and proton-tkg builds have support for it though, I think.

I did enable -O3 and -march=alderlake though (#34 in their list of cpu types lol). I know that's pointless, the kernel build disables all the instructions that turns on anyway and -O3 probably just bloats the size (my bzImage is about 850K larger, compressed size). For kernel code, the scheduling for core2 (the highest for Intel) is fine, but we'll see. It can't hurt, because the kernel build system does -mno-sse2, -mno-sse3, -mno-avx (pretty much all vector instructions) etc.

Since I didn't use their config, it doesn't change my kernel version string... still 6.11.7 (I like to stick with the numbers)

Anyway, my kernel boots, I'm using it now, we'll see tonight while gaming.

Post by **Grogan** » Mon Nov 11, 2024 9:46 pm

Actually, I changed my mind. In for a penny, in for a pound. I decided to patch in their CPU scheduler patches: sched-ext (actually from Facebook's programmers) and the BORE scheduler with CachyOS tweaks to it. Might as well try their kernel with their schedulers and tweaks if I'm going to do it. BORE becomes a kernel config option:

Code:

CONFIG_SCHED_BORE:
  │
  │ In Desktop and Mobile computing, one might prefer interactive
  │ tasks to keep responsive no matter what they run in the background.
  │
  │ Enabling this kernel feature modifies the scheduler to discriminate
  │ tasks by their burst time (runtime since it last went sleeping or
  │ yielding state) and prioritize those that run less bursty.
  │ Such tasks usually include window compositor, widgets backend,
  │ terminal emulator, video playback, games and so on.
  │ With a little impact to scheduling fairness, it may improve
  │ responsiveness especially under heavy background workload.

So these 3 patches in order (else patching fails with hunk rejects)

https://raw.githubusercontent.com/cachy ... -all.patch
https://raw.githubusercontent.com/cachy ... -ext.patch
https://raw.githubusercontent.com/cachy ... -ext.patch

Code:

patch -Np1 -i ../0001-cachyos-base-all.patch
patch -Np1 -i ../0001-sched-ext.patch
patch -Np1 -i ../0001-bore-cachy-ext.patch

I also enabled NTSYNC (Device Drivers/Misc devices/NT synchronization primitive emulation) in the kernel, I'll need a variable at this time to use it for games anyway, might as well have it. I don't have to use it if it's NFG.

P.S. That's a no-go on NTSYNC. Firstly, it's not available in Valve trees (like my proton-tkg) and for my system wine-tkg, I'd have to build it with _use_ntsync=true and then it can't be built with either esync or fsync enabled. So that's not practical at this time... I'll disable it in kernel next rebuild.

Post by **Grogan** » Tue Nov 12, 2024 4:22 am

Well, you know me... I can't just give up. I didn't want to do that to my system wine, but I managed to build a proton-tkg with upstream wine 9.21 (since this is incompatible with valve's wine trees), with the ntsync5 patchset enabled (_use_ntsync=true, _use_esync=false, and _use_fsync=false in the config file before starting the build). At first it wasn't finding my ntsync.h kernel header, I was expecting it to follow the symlinks for that, but on viewing the offending source file (dlls/ntdll/unix/sync.c) I saw it was doing #include <linux/ntsync.h> so I copied it to /usr/include/linux from my kernel tree, which is kind of wrong (that's kernel api headers) but oh well. I'll just have to remember that I put that there lol

I had to create a udev rule because the permissions on /dev/ntsync were root only. I only need read permissions on this, so it's a very simple rule, only changing the mode:

Code:

KERNEL=="ntsync", MODE="0644"
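To confirm a rule like that took effect, a quick mode check along these lines should work. It's a sketch: `check_mode` is a made-up helper, and it relies on GNU stat's `-c '%a'` format.

```sh
#!/bin/sh
# Hypothetical helper: print the octal mode of a node and succeed only if
# it matches the expected mode (644 for the rule above). Uses GNU stat.
check_mode() {
  mode=$(stat -c '%a' "$1") || return 1
  echo "$1 mode: $mode"
  [ "$mode" = "$2" ]
}
```

After reloading with `udevadm control --reload-rules` and `udevadm trigger`, running `check_mode /dev/ntsync 644` should report the new mode.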

I tested Starfield with PROTON_LOG=1 to verify that it was working, and yes: "wine: using fast synchronization"

I haven't played enough to speak about performance (and good or bad, that may be affected simply because this is upstream wine and not valve's proton wine tree so I won't be able to scientifically say "aha!" here) but it seems to work OK.

What I've gleaned is that this isn't necessarily better than fsync for performance, just that it's more compatible, meaning that games that barf on fsync or esync shouldn't barf on this. It's more how Windows works. All the same I have to try it, since I have a kernel with it patched in and enabled. Apparently it's going to be in the kernel eventually (was supposed to go in 6.10 but wasn't ready yet)

Mikeserv Support Forum
