maemo.org Bugzilla – Bug 5767
Corrupted /home/user/MyDocs VFAT filesystem out of the box
Last modified: 2010-09-27 11:56:26 UTC
SOFTWARE VERSION: (Control Panel > General > About product) 1.2009.41-10
STEPS TO REPRODUCE THE PROBLEM: Not sure yet. I don't know where to look.
EXPECTED OUTCOME: No problem.
ACTUAL OUTCOME: Problem: /media/mmc* filesystems are mounted read-only.
REPRODUCIBILITY: (always/sometimes/once) Sometimes. Too often, actually.
EXTRA SOFTWARE INSTALLED: A bunch.
This usually happens when the kernel thinks the filesystem on the card is corrupted (it automatically remounts it read-only to avoid causing more damage). Can you check the output of the command "dmesg" in an x terminal, and check/repair the card on another computer?
I actually meant /home/user/MyDocs; I got a bit confused with the N810. I can find this in dmesg:

[55636.575378] FAT: Filesystem error (dev mmcblk1p1)
[55636.575408] fat_get_cluster: invalid cluster chain (i_pos 0)
[55636.575439] File system has been set read-only
[55638.017578] FAT: Filesystem error (dev mmcblk1p1)
[55638.017639] fat_get_cluster: invalid cluster chain (i_pos 0)
[55638.124145] FAT: Filesystem error (dev mmcblk1p1)
[55638.124176] fat_get_cluster: invalid cluster chain (i_pos 0)

Can I just fsck it from the N900 itself? I guess it would be nice if the software dealt with this in a more convenient way.
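As a rough illustration of what to look for in that dmesg output, here is a minimal Python 3 sketch (not a Maemo tool; the name summarize_fat_errors is made up for this example) that counts "FAT: Filesystem error" lines per device and notes whether the kernel reported switching a filesystem to read-only. It only recognizes the message formats quoted above; other kernel versions may word them differently.

```python
import re
from collections import Counter

# Matches lines such as "[55636.575378] FAT: Filesystem error (dev mmcblk1p1)"
FAT_ERROR = re.compile(
    r"^\[\s*\d+\.\d+\]\s+FAT: Filesystem error \(dev (?P<dev>[^)\s]+)\)"
)
# The kernel prints this line without naming the device.
SET_READ_ONLY = "File system has been set read-only"

def summarize_fat_errors(dmesg_text):
    """Return ({device: error count}, whether any filesystem was set read-only)."""
    errors = Counter()
    went_read_only = False
    for line in dmesg_text.splitlines():
        match = FAT_ERROR.match(line.strip())
        if match:
            errors[match.group("dev")] += 1
        elif SET_READ_ONLY in line:
            went_read_only = True
    return dict(errors), went_read_only
```

On the device the input could come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout. For the log above it reports three errors on mmcblk1p1, which is the external card.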
Weird. So that in dmesg _was_ the external card. And everything is still mounted rw *BUT* it's not writable:

Nokia-N900-41-10:/home/user/MyDocs# mount|grep mmcblk
/dev/mmcblk0p2 on /home type ext3 (rw,noatime,errors=continue,commit=1,data=writeback)
/dev/mmcblk0p1 on /home/user/MyDocs type vfat (rw,noauto,nodev,noexec,nosuid,noatime,nodiratime,utf8,uid=29999,shortname=mixed,dmask=000,fmask=0133,rodir)
/dev/mmcblk1p1 on /media/mmc1 type vfat (rw,noauto,nodev,noexec,nosuid,noatime,nodiratime,utf8,uid=29999,shortname=mixed,dmask=000,fmask=0133,rodir)
Nokia-N900-41-10:/home/user/MyDocs# touch test
touch: test: Read-only file system

And then...

Nokia-N900-41-10:/home/user/MyDocs# mount -o remount,rw /home/user/MyDocs
Nokia-N900-41-10:/home/user/MyDocs# touch test
Nokia-N900-41-10:/home/user/MyDocs#

What can I try to fix this? Come to think of it... this might be the cause of bug 5766.
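The output above shows why the mount table alone is not a reliable signal: /home/user/MyDocs is listed with "rw" while writes fail. A minimal Python 3 sketch, assuming a scratch-file write probe is acceptable on the mount point (both helper names are hypothetical), parses the option list of one `mount` output line and separately tests whether a directory really accepts writes, treating EROFS as "read-only".

```python
import errno
import os
import tempfile

def mount_options(mount_line):
    """Return the option set from '<dev> on <dir> type <fs> (<opts>)'."""
    opts = mount_line.rsplit("(", 1)[1].rstrip(")")
    return set(opts.split(","))

def probe_writable(directory):
    """Create and remove a scratch file in `directory`.
    Returns True if that works, False on EROFS (read-only file system);
    any other OSError, e.g. a permission problem, is re-raised."""
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix=".rw-probe-")
    except OSError as exc:
        if exc.errno == errno.EROFS:
            return False
        raise
    os.close(fd)
    os.unlink(path)
    return True
```

Splitting the options into a set matters because of options such as rodir: a plain substring test for "ro" would match it. A mismatch like the one reported here shows up as "rw" in mount_options(...) while probe_writable("/home/user/MyDocs") returns False.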
(In reply to comment #2)
> Can I just fsck it from the N900 itself?

Yes (dosfsck is installed by default), or you can do it on a PC over a USB connection.

(In reply to comment #3)
> Weird. So that in dmesg _was_ the external card. And everything is still
> mounted rw *BUT* it's not writable:
>
> Nokia-N900-41-10:/home/user/MyDocs# touch test
> touch: test: Read-only file system

Sounds like both the internal memory and the card need repair. I had the same issue with the internal one on mine (but no problems after a dosfsck repair), and there's at least one more report of a corrupted filesystem (bug 5414 comment 2). It may be an artifact of the way these particular devices were flashed, but at this point I think it's worth bringing to Nokia's attention considering a similar thing has happened before (bug 2940).
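For the on-device route, the partition has to be unmounted before dosfsck works on it. Below is a minimal Python 3 sketch of that order of operations. It assumes it runs as root and that nothing (for example a running application) keeps files open on MyDocs. It uses the device and mount point from the mount output in comment #3, and the helper names are hypothetical. The dosfsck options used are -n (check only, change nothing), -a (repair automatically) and -v (verbose); running a check-only pass first shows what a repair would touch.

```python
import subprocess

def dosfsck_commands(device, mountpoint, repair=False):
    """Return the commands to unmount a FAT partition and then check it.
    repair=False only reports problems (-n); repair=True fixes them (-a)."""
    mode = "-a" if repair else "-n"
    return [["umount", mountpoint], ["dosfsck", mode, "-v", device]]

def run_commands(commands):
    """Run the commands in order. A failed umount aborts, since checking a
    mounted filesystem is not safe. The last exit status is returned rather
    than raised: a non-zero dosfsck status can simply mean errors were found."""
    returncode = 0
    for argv in commands:
        returncode = subprocess.run(argv).returncode
        if argv[0] == "umount" and returncode != 0:
            raise RuntimeError("could not unmount " + argv[1])
    return returncode
```

Typical use would be run_commands(dosfsck_commands("/dev/mmcblk0p1", "/home/user/MyDocs")), then the same with repair=True if problems were reported. MyDocs then has to be mounted again; rebooting the device is presumably the simplest way.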
Ah! I thought this was only me! About a week after using my 2009.41 N900 I noticed this behavior; I now have to remount MyDocs RW on every bootup.
(In reply to comment #4)
> at this point I think it's worth bringing to Nokia's
> attention considering a similar thing has happened before (bug 2940).

As we have a few reports now about the Maemo Summit devices having issues with damaged filesystems I'm also wondering. CC'ing Quim - any idea who to ping for this?
CCing Eero so he is informed. Summit devices were flashed in an unorthodox way (manual flashing with normal Linux laptops, instead of the official flashing process in factories). I have no idea whether this has anything to do with the reported problem, but it's worth mentioning.
(In reply to comment #5)
> Ah! I thought this was only me! About a week after using my 2009.41 N900 I
> noticed this behavior; I now have to remount MyDocs RW on every bootup.

In general, the most likely reasons for FAT corruption are "user error":
- removing the battery (or even just the back cover) while the FAT is being written
- removing the USB cable without doing "safely remove" on the PC side first

Could the issue be the result of either of these?

If the device's HW watchdog resets the device while the FAT is being written, that will corrupt it too. If the device reboots because of the HW watchdog, the /proc/bootreason file will contain the value "32wd_to". Have you had those?

(FAT is used for compatibility reasons: all USB mass storage devices support FAT by default. However, Microsoft's FAT file system is very fragile with regard to disconnects and power loss.)

(In reply to comment #6)
> (In reply to comment #4)
> > at this point I think it's worth bringing to Nokia's
> > attention considering a similar thing has happened before (bug 2940).
>
> As we have a few reports now about the Maemo Summit devices having issues with
> damaged filesystems I'm also wondering.

The bug 2940 issue is easy to check. What do "cat /proc/partitions" and "df" say?
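Checking for the watchdog case is a single file read. A minimal Python 3 sketch follows; the function names are hypothetical, and the only fact taken from this bug is that a hardware watchdog reset leaves "32wd_to" in /proc/bootreason.

```python
WATCHDOG_BOOTREASON = "32wd_to"  # value quoted in this bug for a HW watchdog reset

def read_bootreason(path="/proc/bootreason"):
    """Return the boot reason recorded by the kernel, without whitespace."""
    with open(path) as f:
        return f.read().strip()

def was_watchdog_reset(path="/proc/bootreason"):
    """True if the recorded boot reason is the hardware watchdog timeout."""
    return read_bootreason(path) == WATCHDOG_BOOTREASON
```

The file presumably describes only the current boot, so it is worth checking soon after an unexpected reboot rather than later.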
(In reply to comment #8)
> In general, the most likely reasons for FAT corruption are "user error":
> - removing the battery (or even just the back cover) while the FAT is being written
> - removing the USB cable without doing "safely remove" on the PC side first
> Could the issue be the result of either of these?

I am one of those currently with a (partially) corrupted disk. I know that I have at some point had to take the battery out because of a lock up and I may or may not have unsafely removed usb (I can't recall). These, however, are things any user is liable to, and will, do in the course of owning the device.

If this is really the case, in that a user can corrupt the filesystem and cause it to be mounted read only (hence preventing most of the device's functionality from working properly) by doing normal "dumb user things" then we will end up with a lot of unhappy and confused users. The device itself will need to do something to counteract this, such as periodic fscks or something.
(In reply to comment #9)
> I am one of those currently with a (partially) corrupted disk. I know that I
> have at some point had to take the battery out because of a lock up and I may
> or may not have unsafely removed usb (I can't recall). These, however, are
> things any user is liable to, and will, do in the course of owning the device.
>
> If this is really the case, in that a user can corrupt the filesystem and
> cause it to be mounted read only (hence preventing most of the device's
> functionality from working properly) by doing normal "dumb user things"
> then we will end up with a lot of unhappy and confused users.

It should give a notification that the user needs to do something. If it didn't, that's a (separate) bug.

> The device itself will need to do something to counteract this, such as
> periodic fscks or something.

Unfortunately, doing a full fsck of tens of GB of data before the filesystem is mounted on bootup or on USB disconnect just takes too much time. With worst-case file system content (full of small files and dirs), it will need hundreds of MBs of memory (fsck needs to keep the whole disk structure in RAM, and if it starts swapping, it will take tens of minutes). Besides not letting the user access MyDocs, it slows down the device while that happens. That would be a really bad user experience.

After the filesystem is mounted RW, you cannot modify it through direct block-level access, and even just checking it while it's mounted (i.e. while it may be modified through file system access) doesn't provide reliable information. There are no really good solutions for that. The proper solution would be getting rid of FAT, but we cannot do that for compatibility reasons.
Is it somehow possible to notice that filesystems are turned ro? If so, the user could be shown a warning sign and possibly a question whether to reboot and fsck the filesystem.
(In reply to comment #11)
> Is it somehow possible to notice that filesystems are turned ro? If so, the
> user could be shown a warning sign

That should already be shown. If not, AFAIK that's a bug.
I haven't seen any warnings about filesystems being remounted ro.
I'm not sure why FAT has to be used for the internal flash. Why? There are other ways to provide a USB mass storage that *looks* like FAT to a PC but is not internally.

If FAT must be used, perhaps a solution is to place a journalling file on it to record metadata writes, with the file being preallocated? It would be replayed in the event of an unclean unmount, to protect against unexpected reboots and power failures in the same way as journalling filesystems, while remaining completely FAT compatible.

In my experience with other phones, watchdog resets happen often enough that everything must be engineered on the assumption that they will happen. Old batteries, old or dirty battery contacts, too much vibration, or dropping the phone on a hard surface can also result in sudden power loss.
> If FAT must be used, perhaps a solution is to place a journalling file on it to record metadata writes

Btw, if this turns out to be a necessary solution, I might be able to advise/help with implementation.
(In reply to comment #13)
> I haven't seen any warnings about filesystems being remounted ro.

Could you file a separate bug about that? (e.g. "File system RO; no notice/warning to user on bootup or USB disconnect")

(In reply to comment #14)
> I'm not sure why FAT has to be used for the internal flash. Why? There are
> other ways to provide a USB mass storage that *looks* like FAT to a PC but is
> not internally.

If you know one that is:
- More reliable than having real FAT
- Well tested (to actually work fine with other operating systems)
- Copes with tens of GB sized partition without gobbling large amounts of RAM

Please let us know.

> If FAT must be used, perhaps a solution is to place a journalling file
> on it to record metadata writes, with the file being preallocated?

How do you get Windowses and OSXes to do that?
(In reply to comment #16)
> (In reply to comment #13)
> > I haven't seen any warnings about filesystems being remounted ro.
>
> Could you file a separate bug about that?
>
> (e.g. "File system RO; no notice/warning to user on bootup or USB disconnect")

Ok https://bugs.maemo.org/show_bug.cgi?id=5997
Note to myself: "/home/user/MyDocs has become readonly" (int-141809) might be related.
(In reply to comment #16)
> (In reply to comment #14)
> > I'm not sure why FAT has to be used for the internal flash. Why? There are
> > other ways to provide a USB mass storage that *looks* like FAT to a PC but is
> > not internally.
>
> If you know one that is:
> - More reliable than having real FAT
> - Well tested (to actually work fine with other operating systems)
> - Copes with tens of GB sized partition without gobbling large amounts of RAM
>
> Please let us know.

I've no good answer here. Let's drop that.

> > If FAT must be used, perhaps a solution is to place a journalling file
> > on it to record metadata writes, with the file being preallocated?
>
> How do you get Windowses and OSXes to do that?

That's not necessary.

For the internal memory: I'm assuming the 32GB internal memory is FAT too, as an earlier comment says the internal memory was corrupted too.

It would be written only when the internal memory is being updated by Maemo itself during normal use. That would protect against metadata corruption during power failures and watchdog resets, which this bug shows do occur.

It would not protect against power failure when writing the internal memory from an external PC as a block device, but that's ok. The big problem to fix is corruption during normal Maemo use with power failures and watchdog resets.

Linux's FAT driver knows which blocks are file data and which are not, and the journal file can be a normal preallocated file, so it's not unrealistic to put this in the kernel FAT driver. (But I realise it's a bit much to do before the release date ;-)

Journal replay can be an external tool, and is only needed after an unclean shutdown. But it might be easier to implement in the kernel. It's a simple process.

For plugged-in memory cards, journalling (by Maemo FAT writes) is not an option unless there is a reliable way to detect when the card has been modified by another OS, so that the journal will not be replayed after that, when the card is reinserted.
I don't know without further study if there is such a reliable way. But this only applies to the plugged-in memory cards.

I've considered journalling FAT in Linux before to cope with this kind of problem on other devices, but never implemented anything. In the end I just used ext3 with barriers :-) It isn't an idea made up just for this bug.
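To make the replay step described in this comment concrete, here is a toy Python 3 model. It is neither FAT driver code nor something Maemo implements. The journal is an in-memory list of redo records (new contents for a byte range of the volume), each transaction ends with a commit record, and replay applies only the transactions whose commit made it into the journal. The record types and names are hypothetical. A real implementation would also have to make sure each journal record reaches the flash before the metadata it describes (e.g. with write barriers), and this model does not attempt that.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Write:
    txid: int
    offset: int   # byte offset of the metadata being updated (e.g. a FAT sector)
    data: bytes   # new contents: a redo record

@dataclass
class Commit:
    txid: int

def replay(volume: bytearray, journal: List[Union[Write, Commit]]) -> int:
    """Apply committed transactions in journal order and return how many.
    Writes of a transaction with no Commit record (the crash came first)
    are dropped. Redo records hold new values, so replay is idempotent."""
    pending: Dict[int, List[Write]] = {}
    applied = 0
    for rec in journal:
        if isinstance(rec, Write):
            if rec.offset < 0 or rec.offset + len(rec.data) > len(volume):
                raise ValueError("journal record outside the volume")
            pending.setdefault(rec.txid, []).append(rec)
        else:
            for w in pending.pop(rec.txid, []):
                volume[w.offset:w.offset + len(w.data)] = w.data
            applied += 1
    return applied
```

With the journal [Write(1, 0, b"AB"), Write(1, 4, b"CD"), Commit(1), Write(2, 8, b"EF")], transaction 2 never committed, so its write is skipped and the volume keeps its old bytes at offset 8. Per the caveat about removable cards above, a real tool would also need to refuse replay if the volume had been modified elsewhere since the journal was written.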
(In reply to comment #19)
> > > If FAT must be used, perhaps a solution is to place a journalling file
> > > on it to record metadata writes, with the file being preallocated?
> [...]
> It would be written only when the internal memory is being updated by Maemo
> itself during normal use.

This may be something to consider for Harmattan. Note that journaling would probably slow down the file system noticeably (MyDocs with FAT is nearly twice as fast as the ext3 used for /home/), which may affect, for example, YouTube stream buffering and camera recording (which use MyDocs), and slow down swap (as the swap partition is on the same media as MyDocs). It will also wear the flash more.
(In reply to comment #8)
> - removing the USB cable without doing "safely remove" on the PC side first

May be related: I'm getting lots of I/O errors when connected in USB mass storage mode to one Linux box here (Ubuntu 9.04 earlier and 9.10 now), even when only reading, reproducible every time. This ends with the USB getting reset and rescanned, so the effect is similar to yanking the USB cable without unmounting cleanly. All other machines I tried, including the laptop I had with me at the summit (same OS, same cable), work fine, so a) I don't think it's a Maemo bug and b) that didn't cause the corruption, in my case at least.

> The bug 2940 issue is easy to check. What do "cat /proc/partitions" and "df" say?

Slight mismatch, but not in a bad way (fs < device):

major minor  #blocks  name
 179        1   28315648 mmcblk0p1

Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mmcblk0p1        28312128  14912384  13399744  53% /home/user/MyDocs

How is the FAT partition flashed, btw? Is it a copy of a filesystem image, or a new filesystem created and populated from a tarball or similar?
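The check asked for here comes down to comparing two numbers: the partition size from /proc/partitions and the filesystem size from df, both in 1 KiB blocks. A minimal Python 3 sketch (helper names hypothetical) that pulls both out of outputs like the ones quoted above. A filesystem slightly smaller than its partition, as here, is expected, because df on FAT counts only the data clusters and not the reserved sectors and FAT tables. The bad case is a filesystem that claims more space than the partition holding it.

```python
def parse_partitions(text):
    """Map device name -> size in 1 KiB blocks, from /proc/partitions."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # data rows are: major minor #blocks name (the header row is not numeric)
        if len(fields) == 4 and fields[0].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes

def parse_df(text):
    """Map filesystem device -> total size in 1 KiB blocks, from `df` output."""
    sizes = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[1].isdigit():
            sizes[fields[0]] = int(fields[1])
    return sizes

def fs_fits_partition(partition_kib, fs_kib):
    """False in the bad case: the filesystem is larger than its partition."""
    return fs_kib <= partition_kib
```

For the numbers above the filesystem is 3520 KiB smaller than mmcblk0p1, so fs_fits_partition returns True.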
(In reply to comment #21)
> How is the FAT partition flashed, btw? Is it a copy of a filesystem image,
> or a new filesystem created and populated from a tarball or similar?

Apparently nowadays it's actually created & populated from tarballs (home + MyDocs), so the N810 issue shouldn't be able to happen.
For end-user flashing it looks like a filesystem image:

$ ./flasher-3.5 -F RX-51_2009SE_1.2009.41-1.VANILLA_PR_EMMC_MR0_ARM.bin -u
flasher v2.5.2 (Oct 21 2009)
Image 'mmc', size 241163 kB
Version RX-51_2009SE_1.2009.41-1.VANILLA
Unpacking mmc image to file 'mmc'...
$ file mmc
mmc: x86 boot sector, mkdosfs boot message display, code offset 0x58, OEM-ID " Maemo", sectors/cluster 128, Media descriptor 0xf8, heads 64, sectors 8390306 (volumes > 32 MB) , FAT (32 bit), sectors/FAT 513, serial number 0x4acd9aa4, label: "Nokia N900 "

After flashing (51_2009SE_1.2009.42-11_PR_COMBINED_MR0_ARM.bin) and the above, and just installing openssh-server (in order to get root access), dosfsck reports no errors. Now we just need someone to get root and run dosfsck on a straight-out-of-the-box retail unit, and we can close this.
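The fields that `file` prints come from the boot sector's BIOS Parameter Block. As a cross-check, here is a minimal Python 3 sketch (the function name is hypothetical) that decodes the same fields from the first 512 bytes of the unpacked image, using the standard FAT32 field offsets. It assumes a FAT32 layout and validates nothing beyond the 0x55AA boot signature.

```python
import struct

def parse_fat32_boot_sector(sector: bytes) -> dict:
    """Decode the FAT32 BIOS Parameter Block fields that `file` reports."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a boot sector (missing 0x55AA signature)")
    return {
        "oem_id": sector[3:11].decode("ascii", "replace"),
        "bytes_per_sector": struct.unpack_from("<H", sector, 11)[0],
        "sectors_per_cluster": sector[13],
        "media_descriptor": sector[21],
        "heads": struct.unpack_from("<H", sector, 26)[0],
        "total_sectors": struct.unpack_from("<I", sector, 32)[0],   # 32-bit count
        "sectors_per_fat": struct.unpack_from("<I", sector, 36)[0],  # FAT32 field
        "serial": struct.unpack_from("<I", sector, 67)[0],
        "label": sector[71:82].decode("ascii", "replace"),
        "fs_type": sector[82:90].decode("ascii", "replace"),
    }
```

Usage would be along the lines of `with open("mmc", "rb") as f: info = parse_fat32_boot_sector(f.read(512))`. For the image above the values should agree with the `file` line (128 sectors per cluster, 8390306 sectors, 513 sectors per FAT, serial 0x4acd9aa4).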
So did this issue ever happen to somebody having a "real" device bought, not a device from the maemo.org Summit in Amsterdam? If so, please speak up!
(In reply to comment #24)
> So did this issue ever happen to somebody having a "real" device bought, not a
> device from the maemo.org Summit in Amsterdam?
>
> If so, please speak up!

Yeah, I have a production N900, and when I put in an SD card with pictures and tried to delete them, EVERYTHING on the card was read-only. I ran chkdsk and supposedly it fixed some errors, but I'm still not sure what to do.
(In reply to comment #25)
> Yeah, I have a production N900, and when I put in an SD card with pictures and
> tried to delete them, EVERYTHING on the card was read-only.

The issue was with the SD card, not the internal eMMC (MyDocs)? (Then it's not the potential issue discussed here.)

> I ran chkdsk and supposedly it fixed some errors, but I'm still not
> sure what to do.

I assume it fixed the issues. The FAT file system is easy to corrupt, especially in a multipurpose device like the N900, which can access the card data in many different ways. See comment 8.
I also have a corrupted filesystem. I also find the mediaplayer causing sw_reboots when playing files from the FAT fs, and I think that's also caused by the corrupted filesystem. In dmesg I find the following lines:

[37046.271850] FAT: Directory bread(block 9782837) failed
[37046.283264] mmcblk0: error -110 transferring data, sector 9782964, nr 1, card status 0x200900
[37046.283294] end_request: I/O error, dev mmcblk0, sector 9782964

When I run fsck it returns: input/output error

When I try to reflash the emmc image it returns:
SU_GET_UPDATE_STATUS_REQ terminated with error code 1.
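Those log lines point at a specific place on the eMMC, which helps tell a damaged filesystem apart from a medium that cannot be read at all. A minimal Python 3 sketch (names hypothetical) extracts the device and sector from the mmcblk error line and rereads that sector directly from the block device, which needs root. The kernel reports these sector numbers in 512-byte units, and on Linux errno 110 is ETIMEDOUT. If the direct read also fails with an I/O error, dosfsck cannot repair it, which would be consistent with the "input/output error" from fsck above.

```python
import re

SECTOR_SIZE = 512  # block-layer sector numbers in these messages are 512-byte units

MMC_ERROR = re.compile(
    r"(?P<dev>mmcblk\d+): error (?P<err>-\d+) transferring data, "
    r"sector (?P<sector>\d+), nr (?P<nr>\d+)"
)

def parse_mmc_error(line):
    """Extract device, errno, start sector and sector count, or None."""
    m = MMC_ERROR.search(line)
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        "errno": -int(m.group("err")),  # the kernel logs negative errno values
        "sector": int(m.group("sector")),
        "count": int(m.group("nr")),
    }

def read_sectors(device_path, sector, count=1):
    """Read `count` sectors starting at `sector` straight from the device.
    An OSError here (typically EIO) suggests the medium itself is failing."""
    with open(device_path, "rb") as dev:
        dev.seek(sector * SECTOR_SIZE)
        return dev.read(count * SECTOR_SIZE)
```

For the line above this gives sector 9782964 on mmcblk0, i.e. byte offset 5008877568, and the read test would be read_sectors("/dev/mmcblk0", 9782964).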
Bug 8235 comment 15 has one more reason why the file system could get corrupted.

(In reply to comment #27)
> I also have a corrupted filesystem. I also find the mediaplayer
> causing sw_reboots when playing files from the FAT fs and I think
> that's also caused by the corrupted filesystem.

Can you file a separate bug about that, put me on CC, and attach to the bug the contents of the /dev/mtd2 file and the files from the /var/lib/dsme/stats/ directory? (You need to be root to access these.)

> When I try to reflash the emmc image it returns:
> SU_GET_UPDATE_STATUS_REQ terminated with error code 1.

That's bug 7433.
Closing this bug report as the issue could not be reproduced. Please feel free to reopen this report if you can provide better steps to reproduce this / answer the questions in comment 16 and comment 26.