bergernet research

Resolving Raspberry Pi 3B+ Black Screen Flickering Issue

Situation

The Problem and The Solution

In this blog post, I share some insight into a problem I faced with 10 Raspberry Pi 3B+ devices equipped with 7" LCD screens. The symptom was a black screen with a flickering cursor in the top left corner, and the cause was not what I suspected at first.

The customer reported that several of the installed LCD screens were not working correctly. It eventually became clear that after a power disruption, these devices would boot up with a black screen and a flickering cursor in the top left corner. Initially, I suspected that the devices were stuck at boot. However, further investigation revealed that they were still operational to some extent: I could ping them and, in some cases, even establish an SSH connection.

To diagnose the issue, I dove into the logs of the reachable devices and found a clue in the /var/log/lightdm/x-0.log file. It contained the error message "Fatal server error: (EE) no screens found(EE)". Additionally, some devices exhibited memory-related issues, with syslog showing out-of-memory (OOM) errors:

Aug 23 02:54:10 raspberrypi kernel: [704234.187774] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=lightdm,pid=5053,uid=0
Aug 23 02:54:10 raspberrypi kernel: [704234.187865] Out of memory: Killed process 5053 (lightdm) total-vm:983556kB, anon-rss:867896kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:954kB oom_score_adj:0
Aug 23 02:54:10 raspberrypi kernel: [704234.534421] oom_reaper: reaped process 5053 (lightdm), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The OOM errors were caused by the X server being restarted automatically several times per second. Over time this exhausted the memory and triggered the kernel's OOM killer.
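To check many devices for the two symptoms described above, the log inspection can be scripted. The following is a minimal Python sketch, not the method used during the investigation: the function names are hypothetical, and the regular expression assumes that the kernel's OOM lines look like the excerpt above.

```python
import re

# Matches the kernel's "Out of memory: Killed process ..." line and captures
# the PID, the process name and the anonymous RSS in kB.
OOM_KILL = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(log_text):
    """Return (pid, process name, anon-rss in kB) for each OOM kill found."""
    return [
        (int(m["pid"]), m["name"], int(m["anon_rss_kb"]))
        for m in OOM_KILL.finditer(log_text)
    ]


def x_server_found_no_screens(log_text):
    """Return True if the X server log contains the 'no screens found' error."""
    return "no screens found" in log_text
```

On a device, the first function could be fed the contents of the syslog file (assumed here to be /var/log/syslog) and the second the contents of /var/log/lightdm/x-0.log. A long list of kills for lightdm would match the restart loop described above.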

After some thinking, a fairly long investigation and some trial and error, I found out which of the following steps finally resolved the issue on all 10 devices.

The steps I tested (not necessarily in this order):

  1. Disable Splash in raspi-config: Disabling the splash screen was the first step, as I believed that this might change the mode in which the Raspberry Pi booted.
  2. Add a boot_delay to config.txt: I added a delay to the boot process to give the LCD display time to initialize correctly.
  3. Comment out dtoverlay in config.txt: This step proved to be the solution. I commented out the dtoverlay=vc4-kms-v3d setting in /boot/config.txt, which in the end fixed the issue permanently on all devices.
  4. Execute sudo apt install --reinstall libraspberrypi0 libraspberrypi-{bin,dev,doc} raspberrypi-bootloader: This is supposed to fix issues caused by sudo rpi-update. The person who installed the system was not sure whether he had run it, and it might have installed bleeding-edge firmware.
  5. Kernel and system update: Sometimes stuff is broken on specific kernel versions...
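Step 3 is a one-line edit, but with 10 devices it is worth making it repeatable. The following Python sketch shows one way to do that as a pure text transformation. The function name is hypothetical, and the matching rule (only an uncommented dtoverlay line for exactly this overlay) is my assumption about what should be touched.

```python
import re


def comment_out_overlay(config_text, overlay="vc4-kms-v3d"):
    """Return config_text with every active dtoverlay=<overlay> line commented out.

    Lines that are already commented out, and overlays that merely share a
    prefix with the given name, are left untouched.
    """
    pattern = re.compile(r"^\s*dtoverlay=" + re.escape(overlay) + r"(?=[,\s]|$)")
    result = []
    for line in config_text.splitlines(keepends=True):
        if pattern.match(line):
            line = "#" + line
        result.append(line)
    return "".join(result)
```

To apply it, read /boot/config.txt, pass the text through the function, keep a backup copy of the original and write the result back as root. The change only takes effect after a reboot. Running the function a second time changes nothing, so it is safe to repeat on devices that were already fixed.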

One question remains open: why was this line enabled in the first place? I do not know (yet).

Additional Challenges

One of the major challenges in resolving this issue was the remote location of the devices, which were in an isolated network accessible only through a Windows 10 machine via VPN/RDP. To reproduce the problem in the lab, I had to create an image of a running Pi remotely. This could be solved thanks to plink, the command-line SSH client that comes with PuTTY. The command I used looked like this:

C:\Users\user\Desktop>plink -l pi -ssh 192.168.99.14 "sudo dd if=/dev/mmcblk0 bs=8M | gzip -c -9" > image.dd.gz

The image could then be retrieved. After that, the console output that had ended up in the file had to be stripped out with a hex editor:

Hex editor
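The manual hex editor step can also be scripted. The sketch below is not what I did at the time, and it rests on an assumption: that the stray console output sits in front of the gzip stream. In that case, everything before the first gzip header (the bytes 1f 8b 08) can be dropped. The function name and the search limit are hypothetical.

```python
import shutil

# First three bytes of a gzip stream: two ID bytes plus the deflate method byte.
GZIP_MAGIC = b"\x1f\x8b\x08"


def strip_leading_output(src_path, dst_path, search_limit=1024 * 1024):
    """Copy src_path to dst_path, dropping everything before the gzip header.

    Only the first search_limit bytes are searched, so a multi-gigabyte
    image is never loaded into memory. Returns the number of bytes dropped.
    """
    with open(src_path, "rb") as src:
        start = src.read(search_limit).find(GZIP_MAGIC)
        if start < 0:
            raise ValueError("no gzip header found in the first bytes of the file")
        src.seek(start)
        with open(dst_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return start
```

A call such as strip_leading_output("image.dd.gz", "image.clean.dd.gz") would write the cleaned copy next to the original. The second file name is only an example. If the unwanted output is at the end of the file instead, this sketch does not remove it.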

The cleaned file could then be unpacked with gunzip, and the resulting dd image could be copied to a microSD card. This way I could test and debug on a device in my lab.

On to the next challenge!