Raspbian on ext4: Structure needs cleaning

Hi all,

I have several Raspberries running. One of them collects sensor data and, since it is located more than 100 miles away, it is currently only accessible via VPN.

The last reboot was at the end of November, and now stat, ls etc. on some directories of the ext4 partition return the error "Structure needs cleaning".

From reading some RPi fora, it looks like this is a common problem related to ext4.

So, some questions (since I don't have the chance to drive to that location now and replace the SD card / installation):

- Has someone already installed Raspbian (Buster) on ext3 instead of ext4, and if so, will this prevent the card from further problems like this?

- What can be done remotely now to "repair" the FS? -- I can hardly pull the card out and run an fsck on a second machine via VPN. B.t.w., badblocks came back with 0 errors, so the SD card still seems OK.

I assume that the next reboot will be the last one... So, any ideas highly appreciated. -- Thanks!

Best regards,

Markus

--
Please reply to group only. 
For private email please use http://www.dipl-ing-kessler.de/email.htm
Markus Robert Kessler

A few questions:

- How full is the card? "df -h" should show that.

- How long has this card been in use?

- Are you running 'fstrim' on the card and, if so, how frequently? (But see below.)

- Can you describe how you collect and store data on the Pi4 - by that I mean how big are the files, how many are held on the Pi, how long is each file held, or are you using a database?

- If you're using files, how is each written to? IOW, is it left open and data added until it hits a preset limit and a new one is opened, or is the file normally closed and every so often opened, appended to and closed again? How often are old files discarded to make room for more data, and how are the old files chosen for deletion?

- How easy is it to change the maximum number and size of files [or rows in the database table(s)] on the Pi?

** Guess ** Could it be that you have never run fstrim? If so, running it may help: something like "sudo fstrim -v /home" should do the trick. However, I've just tried it on a Pi 2B running Buster and, since I don't usually run fstrim on the Pi, I expected it to find something to trim. (I run it weekly on a Lenovo laptop with a Sanyo 120 GB SSD and Fedora Linux; several KB of blocks are trimmed each time fstrim is run.) On the Pi 2 (16GB SD card fitted), though, 'fstrim' just reports zero bytes trimmed, i.e. it didn't do anything.

I assume from this that by design fstrim does nothing when pointed at a partition on an SD card. Can anybody confirm this? The manpage is silent about using it on SD cards.

However, the other stuff I asked about should let us make sensible suggestions.

--
Martin    | martin at 
Gregorie  | gregorie dot org
Martin Gregorie

Hi,

installation was end of November on a fresh 16 GB card.

As recommended in this group, I did not perform reboots every day / night, since there are only some bash and python scripts running, which consume few resources and exit once they have run properly.

$ who -r
         run-level 3  Nov 30 09:21
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        15G  2.4G   12G  18% /
devtmpfs        184M     0  184M   0% /dev
tmpfs           216M     0  216M   0% /dev/shm
tmpfs           216M   22M  194M  11% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           216M     0  216M   0% /sys/fs/cgroup
/dev/mmcblk0p1  253M   54M  199M  22% /boot
tmpfs            44M     0   44M   0% /run/user/0
$ cd /etc/resolvconf/update-libc.d
$ ll
ls: cannot access 'avahi-daemon': Structure needs cleaning
total 8.0K
drwxr-xr-x 2 root root 4.0K Feb 13 2020 ./
drwxr-xr-x 3 root root 4.0K Feb 13 2020 ../
-????????? ? ? ? ? ? avahi-daemon
$ cat avahi-daemon
cat: avahi-daemon: Structure needs cleaning
$ fstrim -a -v
/boot: 197.4 MiB (206990848 bytes) trimmed on /dev/mmcblk0p1
/: 0 B (0 bytes) trimmed on /dev/mmcblk0p2
$ ll
ls: cannot access 'avahi-daemon': Structure needs cleaning
total 8.0K
drwxr-xr-x 2 root root 4.0K Feb 13 2020 ./
drwxr-xr-x 3 root root 4.0K Feb 13 2020 ../
-????????? ? ? ? ? ? avahi-daemon
$

Well, indeed, I have never used fstrim so far. In this case it does not seem to make any difference, though.

Data is received via I2C bus, processed and transmitted to a webserver outside. These are only some Kilobytes, and there is no sensor data stored on disk.

Yes, same here. See above.

So, the errors still persist, and I don't dare to do a reboot...

Best regards,

Markus

Markus Robert Kessler

On Sat, 20 Mar 2021 00:00:55 -0000 (UTC), Martin Gregorie declaimed the following:

It does, however, state:

"""
-a, --all
       Trim all mounted filesystems on devices that support the discard operation. The other supplied options, like --offset, --length and --minimum, are applied to all these devices. Errors from filesystems that do not support the discard operation, read-only devices and read-only filesystems are silently ignored.
"""

I would suspect SD cards do not have "discard" (after all, they are, for the most part, optimized for FAT file systems)
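
One way to check this rather than guess: the kernel publishes each block device's discard limit in sysfs, and a value of 0 there means fstrim has nothing to work with. Below is a minimal shell sketch of that check. The device name mmcblk0 and the helper names discard_max_bytes and supports_discard are illustrative assumptions, not part of any existing tool. For what it's worth, the fstrim -a -v output posted earlier in this thread reports 197.4 MiB trimmed on /dev/mmcblk0p1, so at least that card appears to accept discard requests.

```shell
# Print the largest discard (TRIM) request a block device accepts, in bytes.
# 0 means the kernel sees no discard support for that device.
#   $1: kernel device name, e.g. mmcblk0 (assumed name, verify with lsblk)
#   $2: sysfs root, defaults to /sys (only overridden for testing)
discard_max_bytes() {
    dev=$1
    sysroot=${2:-/sys}
    f="$sysroot/block/$dev/queue/discard_max_bytes"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "no such device in sysfs: $dev" >&2
        return 1
    fi
}

# Succeeds (exit status 0) only if the device reports a non-zero discard limit.
supports_discard() {
    bytes=$(discard_max_bytes "$1" "${2:-/sys}") || return 2
    [ "$bytes" -gt 0 ]
}

# Example, to be run on the Pi itself:
#   supports_discard mmcblk0 && echo "discard supported"
#   lsblk --discard /dev/mmcblk0   # same limits, shown as DISC-GRAN / DISC-MAX
```

If the limit is non-zero but fstrim still reports 0 bytes for a partition, the cause lies with that filesystem rather than with the card's discard support.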

--
	Wulfraed                 Dennis Lee Bieber         AF6VN 
	wlfraed@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
Dennis Lee Bieber

That makes sense. Thanks.

Martin Gregorie

OK, but is it a no-name card or from one of the better brands? I've been using SanDisk for everything (RPi, camera, glider navigation system and flight logger) for a few years now and have had no card-related problems.

OK - nothing wrong so far

I think you'll find that avahi-daemon is only needed if your RPi needs to talk to some sort of Apple computer. Here's the synopsis:

The Avahi mDNS/DNS-SD daemon implements Apple's Zeroconf architecture (also known as "Rendezvous" or "Bonjour"). The daemon registers local IP addresses and static services using mDNS/DNS-SD and provides two IPC APIs for local programs to make use of the mDNS record cache the avahi-daemon maintains. First there is the so called "simple protocol" which is used exclusively by avahi-dnsconfd (a daemon which configures unicast DNS servers using server info published via mDNS) and nss-mdns (a libc NSS plugin, providing name resolution via mDNS). Finally there is the D-Bus interface which provides a rich object oriented interface to D-Bus enabled applications.

So, it seems that if you are using Apple kit you need it, otherwise kill it. I only connect to my RPi from a Linux system, so I don't understand or use avahi-daemon.

OK

OK That looks like all the obvious stuff covered, then.

If nothing else occurs to you or is suggested, check the SD card with fsck: "fsck -A -s" would seem appropriate. Tell it not to fix anything if it finds a problem and asks whether it should repair it. IOW, treat this as just a problem scan, and only consider what to do once the complete fsck run has shown whether there are any errors.

If errors are found, try to back up anything useful (code, scripts etc that aren't already backed up) and then, if you're feeling keen back up the SD card onto new backup media, i.e. don't overwrite a good backup.

Then:

- try using fsck to fix errors. If that works, great.

- otherwise use gparted to clear the SD card, repartition and reformat it, copy the backed-up stuff back onto it and see if it's now OK.

- if still not fixed, repeat the last step with a new card, unless your backup was to a new, freshly partitioned and formatted SD card, in which case use that as the RPi's main card and junk the original.
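
To make a scan-only run easier to interpret, here is a small shell sketch that decodes the exit status of fsck. The status is a sum of flag values as documented in the fsck(8) and e2fsck(8) man pages. describe_fsck_status is a hypothetical helper name, and /dev/mmcblk0p2 in the usage comment is assumed to be the ext4 root partition.

```shell
# Translate an fsck/e2fsck exit status into words.
# The status is a bitwise OR of the flag values listed in fsck(8).
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" "2:reboot needed" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:cancelled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}     # numeric flag value before the colon
        text=${pair#*:}     # description after the colon
        if [ $((status & bit)) -ne 0 ]; then
            out="${out:+$out; }$text"
        fi
    done
    echo "$out"
}

# Example (read-only scan, -n answers "no" to every repair question):
#   sudo e2fsck -n /dev/mmcblk0p2; describe_fsck_status $?
```

Bear in mind that checking a mounted filesystem, even with -n, can report problems that are not really there, so treat the result as a hint rather than a verdict.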

Martin Gregorie

I only use brands like Sandisk, Samsung EVO and similar.

It makes me cry to see that the card is totally ok,

# badblocks -vvv /dev/mmcblk0
Checking blocks 0 to 15446015
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

and this seems to be one more ext4-issue.

In the meantime the filesystem became more and more corrupted after I tried to run fsck.ext4. In the end there were errors in /home, and even /var was empty (!)...

So, the last thing I tried was to switch to NFS or NAS boot, but I found that the total storage space at that location was nowhere near sufficient. Even worse, init didn't work either.

Since even /lib was getting more and more messed up, not even a shutdown / halt / poweroff etc. was possible. So, after sshd also crashed and I lost the connection, I blocked the box in the firewall to prevent it from doing unpredictable things.

So, end of the line here. Oh man...

B.t.w.,

I set up one more box here with ext3 rootfs to make some experiments. It works perfectly, and if the installation survives the next days then I will switch all of my machines to ext3 one after the other.

So, thank you all for the nice discussion!

Best regards,

Markus

Markus Robert Kessler

Good.

Good.

Not necessarily - see below.

Besides, IME ext3 and ext4 are very reliable filing systems - I've never had any problems with either, even when retrieving the /home directory structure from a hard drive that was failing due to old age (50,000 hours).

The reason that I suggested running "fsck -A -s" is because:

- the -A option tells fsck to check every partition in /etc/fstab, using the appropriate filing system checker, as specified in /etc/fstab, for each partition

- the -s option tells it to check one partition at a time. The default is to check them all at once, but this does mean that the error messages will be jumbled together if more than one partition has errors.

So, what exactly did you run? Just fsck.ext4? If so, with what options?

If you let fsck.ext4 loose on both partitions of course it would throw errors because the boot partition is VFAT, not EXT4, but if you *DID NOT* let fsck.ext4 make any changes, then the filing system should not have been damaged (any more than it was already).

Did you make a backup copy, as I also suggested, before running fsck.ext4? If not, and the errors are due to telling fsck.ext4 to scan a VFAT partition, then your larger EXT4 partition may well be salvageable, but we can't tell you how unless we know what other computers you have and what operating systems they run.

For example everything here, apart from my RPi, runs on X86 chips and has Fedora Linux installed, so I could transfer only stuff I've written on my RPi to another SD card by:

- using gparted to make a same-sized pair of VFAT and EXT4 partitions on a new SD card

- making two tar archives, one containing everything in /home and the other containing everything in /usr/local, and saving them on a Fedora box

- setting up a clean copy of Raspbian Buster on the new card

- unpacking the contents of the /home and /usr/local tar archives over the new Debian install

This would put me pretty much back in business, with all my own code and data reinstalled in an up-to-date Debian Buster system. And I've done this several times already, as the SD card has grown from 4GB -> 8GB -> 16GB and Raspbian has successively outgrown the previous cards.

Finally, back when you wrote your monitoring system, did you make a backup copy of the source code, binaries, shell scripts etc. on another machine before putting your RPi system into everyday use? And, I have to add: if not, why not?
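
For reference, the two tar archives described above could be made along these lines. This is only a sketch under assumptions: backup_tree is a hypothetical helper name, the dated file naming is my own choice, and /mnt/backup in the usage comment stands for wherever the other machine's storage happens to be mounted.

```shell
# Archive one directory tree into a dated .tgz and verify it can be read back.
#   $1: directory to save, e.g. /home or /usr/local
#   $2: existing destination directory for the archive
backup_tree() {
    src=$1
    dest=$2
    # Build a flat file name from the path: /usr/local -> usr_local
    name=$(printf '%s' "${src#/}" | tr '/' '_')
    archive="$dest/${name}-$(date +%Y%m%d).tgz"
    # -C keeps the paths inside the archive relative to the parent directory
    tar -czpf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing the archive catches a truncated or corrupt file early
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}

# Example, run as root so that every file is readable:
#   backup_tree /home /mnt/backup
#   backup_tree /usr/local /mnt/backup
# Restore over a fresh install (paths are relative, so unpack from the parent):
#   tar -xzpf /mnt/backup/usr_local-YYYYMMDD.tgz -C /usr
```

Because the archive stores paths relative to the parent directory, unpacking with -C pointed at that parent puts the files back where they came from.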
Martin Gregorie

many thanks for your very interesting thoughts!

Well, to make it short,

snip [...]

First partition was ok and then I only tested the second one with

fsck.ext4 -y /dev/mmcblk0p2

I knew that this was not a good idea, but the filesystem was already so corrupted that it was not even possible to switch to runlevel 1 and unmount the ext4 partition. Very soon it was clear that I could forget about the installation and that there was not much to lose.

I just tried to learn from what happened.

snip [...]

I have the same data as backups and besides this there are several machines running with almost identical configuration.

To make things easier, some weeks ago I started with setting up one sample machine very carefully, I installed everything needed, created the necessary users, made all updates, and then I created two tgz-balls, one for boot and one for rootfs.

Making a new instance out of it can now be easily done by taking a new SD card out of the box, overwrite the first hundred MBs with 'cat /dev/zero

a Win95 partition with 256MB and the whole remaining space with linux filesystem using fdisk.

Afterwards I only have to format both partitions with the appropriate filesystem type and restore the backups.

Of course, the machine name has to be changed in /etc/hosts and /etc/hostname, the most recent updates have to be applied and so on.

But creating a new installation based on this is a matter of half an hour.

So, there are only two things left annoying me somehow in this case:

- I still do not know why the filesystem turned to dust, since I neither overloaded the machine nor rebooted it all the time

- Now I have to travel to that location to replace the installation :-)

Well, the new box is already there, ready to replace the crashed one. This one now has ext3 as second partition. So, let's see how long this will work.

Thanks a lot!

Best regards,

Markus

Markus Robert Kessler

You can change your /boot/cmdline.txt to the following:-

dwc_otg.lpm_enable=0 console=tty1 root=/dev/sda1 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait rootdelay=5

This will then run fsck to repair the disk every time you reboot it. It might take a good few minutes to come back up again, which is worrying when doing it remotely. Keep the root= value from your existing cmdline.txt: root=/dev/sda1 is for a root filesystem on a USB drive, while an SD card install normally uses /dev/mmcblk0p2 or a PARTUUID.

ext3 won't stop you getting these problems; rather, with ext3 they will cause serious corruption, so stick with ext4.

It's just soft corruption, so the above should repair it.

There is always the possibility it won't come back from a remote reboot, so having someone at the remote location who can put in a backup SD card is always a good move.
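
Since a typo in cmdline.txt can stop the Pi from booting, it may be safer to script the change than to edit the file by hand over ssh. The following is a minimal sketch, not a tested recipe for every setup: add_cmdline_param is a hypothetical helper name, and it assumes cmdline.txt consists of a single line, which is how Raspbian ships it.

```shell
# Append a kernel parameter to a one-line cmdline.txt unless the key is
# already there. A copy of the old file is kept as <file>.bak.
#   $1: path to cmdline.txt
#   $2: parameter to add, e.g. fsck.repair=yes
add_cmdline_param() {
    file=$1
    param=$2
    key=${param%%=*}                      # part before the first "="
    line=$(head -n 1 "$file") || return 1
    case " $line " in
        *" $key="*|*" $key "*)
            echo "$key is already present in $file, leaving it alone" >&2
            return 0
            ;;
    esac
    cp "$file" "$file.bak" || return 1
    # Everything must stay on one line, separated by single spaces
    printf '%s %s\n' "$line" "$param" > "$file"
}

# Example (as root):
#   add_cmdline_param /boot/cmdline.txt fsck.repair=yes
```

This leaves the existing root= entry untouched. If the file turns out wrong, the .bak copy can be moved back before rebooting.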

---druck

druck

Yeah, going back from ext4 to ext3 to solve some unknown problem doesn't seem like a good idea.

Even apart from the inherent stability improvements, it seems likely that modern kernel and application features depend on certain ext4 extensions like the smaller-than-one-second file timestamps (which also highlights another ext3 disadvantage: its Y2038 problem).

A. Dumas

"And so on": The partition IDs have to be adapted also:

When the SD card is mounted on a Linux workstation, the two Raspberry partitions will show up as (e.g.) /dev/sdb1 and /dev/sdb2 or similar -- just verify. Then get their partition IDs with blkid /dev/sdb{1,2} and update boot/cmdline.txt and rootfs/etc/fstab on the SD card accordingly.

Otherwise your cigarette box won't boot.
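
The update of the two files can be scripted roughly as follows. This is a sketch under assumptions: it presumes the image refers to its partitions by PARTUUID (as stock Raspbian Buster does), the function names set_root_partuuid and set_fstab_source are made up for illustration, and the mount paths in the usage comment depend on where your workstation mounts the card.

```shell
# Point root= in cmdline.txt at a new PARTUUID. Keeps a .bak copy.
#   $1: path to cmdline.txt   $2: new PARTUUID of the root partition
set_root_partuuid() {
    file=$1
    partuuid=$2
    [ -n "$partuuid" ] || { echo "empty PARTUUID" >&2; return 1; }
    cp "$file" "$file.bak" || return 1
    sed "s|root=[^ ]*|root=PARTUUID=$partuuid|" "$file.bak" > "$file"
}

# Replace the device field of one fstab entry, selected by mount point.
# Comment lines and other entries are passed through unchanged.
#   $1: path to fstab   $2: mount point   $3: new source, e.g. PARTUUID=...
set_fstab_source() {
    cp "$1" "$1.bak" || return 1
    awk -v mp="$2" -v src="$3" '
        $1 !~ /^#/ && $2 == mp { $1 = src }
        { print }
    ' "$1.bak" > "$1"
}

# Example, with the card's partitions mounted under /mnt/boot and /mnt/rootfs
# and the card showing up as /dev/sdb (verify first!):
#   boot_id=$(blkid -s PARTUUID -o value /dev/sdb1)
#   root_id=$(blkid -s PARTUUID -o value /dev/sdb2)
#   set_root_partuuid /mnt/boot/cmdline.txt "$root_id"
#   set_fstab_source /mnt/rootfs/etc/fstab /boot "PARTUUID=$boot_id"
#   set_fstab_source /mnt/rootfs/etc/fstab /     "PARTUUID=$root_id"
```

If the root partition was formatted as ext3 instead of ext4, the filesystem type column in fstab and rootfstype= in cmdline.txt need the same treatment by hand.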

Best regards,

Markus

Markus Robert Kessler

On Sat, 20 Mar 2021 15:22:25 -0000 (UTC), Martin Gregorie declaimed the following:

If it makes anything better -- I tried to boot my experimental R-Pi 3b+ yesterday. Took three tries before it showed up on my router. Ran the apt update/upgrade sequence, and upgrade failed with the same error you have.

I copied off the top level of my home directory, removed the SD card, and formatted it with SD Card Formatter. Now preparing to install a new OS.

It appears that the R-Pi organization has dropped the NOOBS installer, and only provides downloads for each specific OS.

Dennis Lee Bieber

Finally!

A. Dumas
