Linux
About
Kernel Boot Parameters
Different distributions ship different boot parameters. Look them up via:
man 7 bootparam
Here are some important kernel command-line parameters that should not be forgotten.
GRUB_CMDLINE_LINUX_DEFAULT="quiet zswap.enabled=1 cgroup_enable=memory swapaccount=1 scsi_mod.use_blk_mq=1 nomodeset"
Networking
IPv6
Source of the hint: FreeIPA Deployment Recommendations
DO NOT use ipv6.disable=1 on the kernel command line: it disables the whole IPv6 stack and breaks Samba.
If necessary, adding ipv6.disable_ipv6=1 will keep the IPv6 stack functional but will not assign IPv6 addresses to any of your network devices. This is the recommended approach for cases where you don't use IPv6 networking.
You may also disable IPv6 via sysctl, either for "all" interfaces or only for specific ones.
/etc/sysctl.d/ipv6.conf
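The content of this file is not shown here. As a minimal sketch, assuming the standard net.ipv6.conf.<scope>.disable_ipv6 sysctl keys, the lines for such a file could be generated like this. The function name ipv6_disable_conf and the interface name eth0 are hypothetical.

```shell
#!/bin/sh
# Print sysctl lines that disable IPv6 for the given scopes.
# A scope is "all", "default", or an interface name (eth0 is only an example).
ipv6_disable_conf() {
    for scope in "$@"; do
        printf 'net.ipv6.conf.%s.disable_ipv6 = 1\n' "$scope"
    done
}

# Variant 1: every existing and every future interface.
ipv6_disable_conf all default

# Variant 2: one specific interface only.
ipv6_disable_conf eth0
```

Redirecting the output of one variant into /etc/sysctl.d/ipv6.conf and running sysctl --system should apply it.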
IPv4 Forwarding
https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html
ip_forward - BOOLEAN
    0 - disabled (default)
    not 0 - enabled
    Forward Packets between interfaces. This variable is special, its change resets all configuration parameters to their default state (RFC1122 for hosts, RFC1812 for routers)
path: /proc/sys/net/ipv4/ip_forward
sysctl key: net.ipv4.ip_forward
- default: 0
- configuration:
  - at boot time via sysctl
  - at runtime via procfs: /proc/sys/net/ipv4/ip_forward
/etc/sysctl.d/net.conf
net.ipv4.ip_forward = 1
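The sysctl key and the procfs path name the same setting: the path is the key with dots replaced by slashes, below /proc/sys. A small sketch of that mapping follows; the helper name sysctl_key_to_path is hypothetical.

```shell
#!/bin/sh
# Map a sysctl key to its procfs path by replacing dots with slashes.
# Caveat: this simple mapping is wrong for keys that embed a dot in an
# interface name (for example a VLAN interface such as eth0.100).
sysctl_key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_key_to_path net.ipv4.ip_forward

# Runtime change (as root), lost on reboot:
#   echo 1 > "$(sysctl_key_to_path net.ipv4.ip_forward)"
```

The same mapping applies to the vm.* keys in the sections below.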
Virtual Memory
- Prefer the newer sysfs interface; the procfs interface is kept for backwards compatibility.
Take a look at the information about virtual memory management that the kernel exports through its pseudo-filesystems.
Swappiness
path: /proc/sys/vm/swappiness
sysctl key: vm.swappiness
- default: 60
- configuration:
  - at boot time via sysctl
  - at runtime via procfs: /proc/sys/vm/swappiness
/etc/sysctl.d/vm.conf
vm.swappiness = 5
Apply configuration via sysctl.
# sysctl --system
* Applying /etc/sysctl.d/30-baloo-inotify-limit.conf ...
fs.inotify.max_user_watches = 524288
* Applying /etc/sysctl.d/30-postgresql-shm.conf ...
* Applying /etc/sysctl.d/30-tracker.conf ...
fs.inotify.max_user_watches = 65536
* Applying /usr/lib/sysctl.d/50-coredump.conf ...
kernel.core_pattern = |/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %e
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.d/vm.conf ...
vm.swappiness = 5
vm.dirty_background_ratio = 8
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 32
vm.dirty_writeback_centisecs = 500
* Applying /etc/sysctl.conf ...
Memory over-commitment
https://www.kernel.org/doc/html/latest/vm/overcommit-accounting.html
sysctl key: vm.overcommit_memory
path: /proc/sys/vm/overcommit_memory
- default: 0
- configuration:
  - at boot time via sysctl
  - at runtime via procfs
- usage:
- redis demands it with
WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
/etc/sysctl.d/vm.conf
vm.overcommit_memory = 1
Hugepages
Hugepages are an optimization of memory management that targets the Translation Lookaside Buffer (TLB), a small, fast cache in the CPU that maps virtual addresses to physical addresses. With larger pages, fewer TLB entries are needed to cover the same amount of memory, which means fewer TLB misses at runtime.
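To put a number on this, the following sketch counts how many pages, and therefore how many address translations, are needed to map a region with 4 KiB pages versus 2 MiB hugepages. The 1 GiB region size is an arbitrary example and the function name pages_needed is hypothetical.

```shell
#!/bin/sh
# Number of pages needed to map a region.
# Arguments: region size in KiB, page size in KiB.
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))   # round up to whole pages
}

region_kib=$((1024 * 1024))          # 1 GiB, an arbitrary example size
echo "4 KiB pages: $(pages_needed "$region_kib" 4)"
echo "2 MiB pages: $(pages_needed "$region_kib" 2048)"
```

For 1 GiB this gives 262144 translations with 4 KiB pages but only 512 with 2 MiB pages, which is why a TLB with a limited number of entries covers far more memory when hugepages are used.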
Transparent hugepages
https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html
khugepaged scans memory at intervals, defragments it, and collapses large areas into hugepages.
Currently THP only works for anonymous memory mappings and tmpfs/shmem, but it may be extended to other filesystems in the future. See also: tmpfs with systemd.
path: /sys/kernel/mm/transparent_hugepage
- default: set by the kernel configuration (commonly always or madvise)
- configuration:
  - at boot time via sysfsutils
  - at runtime via sysfs
- usage:
- redis demands it with
WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
/etc/sysfs.d/transparent_hugepage.conf
kernel/mm/transparent_hugepage/enabled = madvise
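To check which mode is active, read the enabled file: it lists all modes and marks the active one with square brackets, for example "always [madvise] never". A minimal sketch that extracts the bracketed value follows; the function name active_choice is hypothetical.

```shell
#!/bin/sh
# Print the active value of a sysfs "choice" file whose content looks like
# "always [madvise] never" (the bracketed word is the active setting).
active_choice() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

thp=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp" ]; then
    echo "THP mode: $(active_choice "$thp")"
fi
```

The helper takes the file as an argument, so it can be pointed at other files that use the same bracket notation.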
Explicit hugepages
https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
path: /sys/kernel/mm/hugepages/
- default: 0
- configuration:
  - at boot time via sysctl
  - at runtime via sysfs
  - at runtime via procfs: /proc/sys/vm/*huge*
This example allows up to 2048 hugepages of 2 MiB each to be allocated dynamically, on top of the fixed pool of hugepages (0).
/etc/sysctl.d/hugepages.conf
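The content of this file is not shown here. Judging from the description above and from nr_overcommit_hugepages:2048 in the output below, it plausibly sets vm.nr_overcommit_hugepages. The following sketch prints such a fragment and the resulting upper bound in MiB; the file content is an assumption, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumed content of the sysctl fragment (not shown in the original text):
# no fixed pool, up to 2048 hugepages allocated on demand.
hugepages_conf() {
    cat <<'EOF'
vm.nr_hugepages = 0
vm.nr_overcommit_hugepages = 2048
EOF
}

# Upper bound of memory the pool may take, in MiB.
# Arguments: number of pages, page size in KiB.
pool_mib() {
    echo $(( $1 * $2 / 1024 ))
}

hugepages_conf
echo "max pool size: $(pool_mib 2048 2048) MiB"
```

With 2048 pages of 2048 KiB each, the dynamically allocated pool can grow to 4096 MiB.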
grep -rH "" /sys/kernel/mm/hugepages
/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages:0
/sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages:0
/sys/kernel/mm/hugepages/hugepages-2048kB/surplus_hugepages:0
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages_mempolicy:0
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_overcommit_hugepages:2048
/sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages:0
/sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages:0
/sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages:0
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages_mempolicy:0
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages:0
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_overcommit_hugepages:0
zswap
Zswap is a lightweight compressed cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a dynamically allocated RAM-based memory pool. zswap basically trades CPU cycles for potentially reduced swap I/O. This trade-off can also result in a significant performance improvement if reads from the compressed cache are faster than reads from a swap device.
path: /sys/module/zswap/parameters
- default: N
- configuration:
  - at boot time via sysfsutils
  - at runtime via sysfs
grep -R . /sys/module/zswap/parameters
/etc/sysfs.d/zswap.conf
module/zswap/parameters/enabled = 1
CPU scaling governor
Available scaling governors
Default is ondemand (this depends on the kernel configuration and the cpufreq driver)
Set governor
Fine-tune the CPU frequency with
# grep -H "" /sys/devices/system/cpu/cpu*/cpufreq/scaling_*_freq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:600000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1500000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:600000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:600000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:1500000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:600000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:600000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:1500000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq:600000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:600000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:1500000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:600000
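The commands for listing and setting the governor are not shown in this section. A minimal sketch, assuming the usual cpufreq sysfs files scaling_available_governors and scaling_governor, could look like this. The function names are hypothetical, and the base directory is an optional argument so the functions can be tried against a scratch directory.

```shell
#!/bin/sh
# List the governors offered by the cpufreq driver (read from cpu0).
# Optional argument: base directory, default /sys/devices/system/cpu.
available_governors() {
    cat "${1:-/sys/devices/system/cpu}/cpu0/cpufreq/scaling_available_governors"
}

# Set the given governor on every CPU that exposes a writable
# scaling_governor file.
# Arguments: governor name, optional base directory.
set_governor() {
    for f in "${2:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_governor; do
        if [ -w "$f" ]; then
            echo "$1" > "$f"
        fi
    done
}

# Example (as root):
#   available_governors
#   set_governor performance
```

A change made this way is lost on reboot; the value could be made persistent with sysfsutils in the same way as the other sysfs settings on this page.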
IO-Scheduler
On a hypervisor the bfq scheduler seems to be a reasonable choice.
On a VM without disk or controller pass-through, none should be used. This avoids optimizing the queues twice, which is inefficient and counterproductive. The hypervisor will optimize the I/O requests anyway.
Make alternative schedulers available
blk-mq is now broadly available and enabled in distributions. Using multiple queues on multicore systems with fast storage promises some performance gains.
But when I looked at the available schedulers, only "mq-deadline" and "none" were offered.
This is because the other schedulers are shipped as kernel modules and need to be loaded into the kernel via modprobe first.
Modules may be loaded manually:
Modules may also be loaded automatically at boot-time via /etc/modules.
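The commands themselves are not shown here. As a sketch, assuming the scheduler modules are named bfq and kyber-iosched as in mainline kernels, loading and checking could look like this. The device name sda and the function name list_schedulers are hypothetical.

```shell
#!/bin/sh
# Show which I/O schedulers a block device offers; the active one is
# bracketed, e.g. "[mq-deadline] none".
# Arguments: device name, optional sysfs base directory (default /sys/block).
list_schedulers() {
    cat "${2:-/sys/block}/$1/queue/scheduler"
}

# Load the additional schedulers (as root), then look again:
#   modprobe bfq
#   modprobe kyber-iosched
#   list_schedulers sda
#
# To load them at boot via /etc/modules, add one module name per line:
#   bfq
#   kyber-iosched
```

After the modules are loaded, the new schedulers should appear in the list and can be selected by writing their name to the same file.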
Set IO-Scheduler permanently
kernel-cmdline
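This method no longer seems to work: recent kernels, which only use blk-mq, ignore the elevator= parameter.
- Service-affecting, because it requires a reboot.
/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=$SCHEDULER"
Refresh the GRUB configuration and reboot.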
udev-rule
Works at runtime and at boot time!
- More selective because disks may be filtered with a pattern (udev matches use shell-style globs).
/etc/udev/rules.d/60-persistent-storage-scheduler.rules
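The content of the rule file is not shown here. A minimal sketch of a rule that sets the scheduler through the queue/scheduler sysfs attribute follows. The device pattern sd[a-z], the choice of bfq, and the function name write_scheduler_rule are assumptions for illustration.

```shell
#!/bin/sh
# Write a udev rule that selects an I/O scheduler for matching disks.
# Arguments: scheduler name, kernel device glob, output file.
write_scheduler_rule() {
    printf 'ACTION=="add|change", KERNEL=="%s", ATTR{queue/scheduler}="%s"\n' \
        "$2" "$1" > "$3"
}

# Example (as root):
#   write_scheduler_rule bfq 'sd[a-z]' \
#       /etc/udev/rules.d/60-persistent-storage-scheduler.rules
```

The KERNEL match limits the rule to whole disks whose names fit the glob; further match keys can narrow it down, for example to non-rotational devices.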
Reload udev-rules
Reload will probably happen automatically but the "trigger" is necessary.
udevadm control --reload-rules && udevadm trigger
Drop FS Cache
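sync    # write dirty pages to disk first so that more cache can be freed
echo 3 | tee /proc/sys/vm/drop_caches    # 3 = page cache plus dentries and inodes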
Hardening
Disable TCP Timestamping
hping3 -S -p 22 --tcp-timestamp $DESTINATION

root@libertas /home/tobias/Downloads # hping3 -S -p 22 --tcp-timestamp www.rockstable.it
HPING www.rockstable.it (bridge 178.63.149.226): S set, 40 headers + 0 data bytes
len=56 ip=178.63.149.226 ttl=53 DF id=0 sport=22 flags=SA seq=0 win=65160 rtt=24.2 ms
TCP timestamp: tcpts=2031225761

len=56 ip=178.63.149.226 ttl=53 DF id=0 sport=22 flags=SA seq=1 win=65160 rtt=19.8 ms
TCP timestamp: tcpts=2031226761
HZ seems hz=1000
System uptime seems: 23 days, 12 hours, 13 minutes, 46 seconds
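The uptime in this output follows from the timestamp value divided by the tick rate. The sketch below reproduces that arithmetic with the numbers from the output above; the function name tcpts_to_uptime is hypothetical.

```shell
#!/bin/sh
# Convert a TCP timestamp value and a tick rate (Hz) into an uptime estimate.
# Arguments: tcpts value, hz.
tcpts_to_uptime() {
    secs=$(( $1 / $2 ))   # integer division, fractions of a second are dropped
    printf '%d days, %d hours, %d minutes, %d seconds\n' \
        $(( secs / 86400 )) $(( secs % 86400 / 3600 )) \
        $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

tcpts_to_uptime 2031225761 1000
```

The result matches the hping3 estimate apart from the last second, because the integer division here truncates instead of rounding. This is the information leak that disabling TCP timestamps is meant to close.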
Disable temporarily
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
Disable persistently
/etc/sysctl.d/tcp.conf
net.ipv4.tcp_timestamps = 0