Thursday, May 11, 2017

A mystery problem, somehow fixed

TL;DR: An XFS filesystem got corrupted; running xfs_repair on the unmounted volume from a live USB fixed it.

A couple of weeks back one of the drives in my RAID 5 array started going bad, with SMART errors appearing. It just so happened to be the drive that I had bought to replace another drive that went bad (with the same error) back at the start of 2017. Anyway, the warranty replacement drive went in, I re-added it to the array and all seemed fine.

Then one Friday night, I went to watch a movie that had been recorded on MythTV previously. Switched on the TV, changed to MythTV, and all of the recorded programs were showing as not found. Uh-oh. Rebooted the frontend, issue still there. Time to check the backend. Logged in to it, ran dmesg, and up came a heap of XFS errors. That's not good. Better restart it and see if that fixes things. Nope, now it won't boot. It gets to the recovery console and I can't do much. Checking the SMART status of the drives, they seem fine, but the RAID array is not shown. It also seems to be missing a drive, but not the one I replaced. I was left scratching my head, wondering what was wrong, with a niggling thought that I'd have to do a fresh re-install because Mythbuntu had hosed itself. I'd been thinking about a rebuild for a while, maybe setting it up in a virtual machine and changing over to ZFS, but I wanted to do that in my own time rather than having my hand forced. I went to bed to think about it.

The next day, I thought I'd try booting off a live USB of Mythbuntu. Fortunately, that worked, and I could see the array. It was missing one of the drives, so I added it back in. Many hours of rebuilding later, it was back up. While it was rebuilding, I looked into repairing XFS filesystems.
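For anyone wanting to spot a missing array member without eyeballing the output, here is a rough sketch. It assumes the array is Linux software RAID (md), where /proc/mdstat shows a member map such as [UUU_] with an underscore for each missing drive. The function name degraded_arrays is made up for this example.

```shell
#!/bin/sh
# Print the name of every md array whose member map (for example [UUU_])
# contains an underscore, which marks a missing or failed member.
# Reads /proc/mdstat by default; pass another file as $1 to inspect a saved copy.
degraded_arrays() {
    awk '
        /^md/                  { name = $1 }   # remember the array this block describes
        /\[[U_]*_[U_]*\]/      { print name }  # member map with at least one gap
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no arguments lists any degraded arrays on the running system. Putting a drive back would then be something along the lines of mdadm --manage /dev/md0 --re-add /dev/sdX1 (both device names are placeholders, not the ones from my system), with cat /proc/mdstat showing the rebuild progress.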

Once the array was up, I unmounted the volume that contained the recordings, and ran xfs_repair /dev/raid1/tv. It found quite a few problems, and rectified quite a few. With fingers crossed, I tried rebooting back into the original system...success! With a sense of relief and amazement I watched the desktop appear. Upon checking the list of recordings, quite a few were still missing, mostly from the last few days. I hadn't checked up on it for a little while, so whatever the issue was, it had been screwing up the recordings. Deleting them from within MythTV, to clean up the database, fixed that. I also noticed that a number of older recordings were showing as missing. Probably just a byproduct of what was happening. No big deal, they are just TV shows after all.
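In hindsight, a slightly more cautious version of the repair step would check that the volume really is unmounted and do a read-only pass first. The sketch below is not what I ran; is_mounted and check_xfs are names invented for this example, and it assumes readlink -f is available to resolve LVM aliases (so that /dev/raid1/tv and its /dev/mapper name compare equal).

```shell
#!/bin/sh
# Exit 0 if block device $1 appears as a mount source in the mount table $2
# (default /proc/mounts). Both sides are resolved with readlink -f so that
# different symlinks to the same device node compare equal.
is_mounted() {
    target=$(readlink -f "$1") || return 1
    while read -r source _rest; do
        case $source in
            /*) [ "$(readlink -f "$source")" = "$target" ] && return 0 ;;
        esac
    done < "${2:-/proc/mounts}"
    return 1
}

# Refuse to touch a mounted filesystem (exit 2); otherwise run xfs_repair
# in no-modify mode (-n), which only reports problems and changes nothing.
check_xfs() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "refusing: $1 is mounted; unmount it first" >&2
        return 2
    fi
    xfs_repair -n "$1"
}
```

Usage would be check_xfs /dev/raid1/tv as root, and if the report looks sane, the real repair is the same command I used above: xfs_repair /dev/raid1/tv.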

The system has been running OK for the last few days, so hopefully the issue is resolved. I'm still at a loss as to what actually went wrong though. Might have to dig through some log files.