Bug 2615 - jffs2_gcd_mtd4/ pdflush occasionally hogging CPU & leaving the device unresponsive for tens of seconds
: jffs2_gcd_mtd4/ pdflush occasionally hogging CPU & leaving the device unrespo...
Status: RESOLVED WONTFIX
Product: Core
Kernel
: 4.0
: All Maemo
: Low major with 5 votes (vote)
: ---
Assigned To: unassigned
: linux-kernel-bugs
:
: performance
:
:
  Show dependency tree
 
Reported: 2007-12-21 11:00 UTC by Neil MacLeod
Modified: 2009-06-10 14:09 UTC (History)
9 users (show)

See Also:


Attachments
Attachment 1of2 for Comment #20 (47.58 KB, text/plain)
2008-02-27 04:22 UTC, Nicolas D.L. Subotich
Details
Attachment 2of2 for Comment #20 (15.74 KB, text/plain)
2008-02-27 04:23 UTC, Nicolas D.L. Subotich
Details
Attachment for Comment #34 (9.39 KB, application/zip)
2008-03-04 15:34 UTC, Nicolas D.L. Subotich
Details
Attachment for Comment #39 (9.04 KB, application/zip)
2008-03-05 06:01 UTC, Nicolas D.L. Subotich
Details
Attachment 1of2 for Comment #45 (36.36 KB, application/zip)
2008-03-08 18:06 UTC, Nicolas D.L. Subotich
Details
Attachment 2of2 for Comment #45 (39.44 KB, application/zip)
2008-03-08 18:06 UTC, Nicolas D.L. Subotich
Details
dmesg boot crc errors on flash (15.28 KB, text/plain)
2008-04-08 08:38 UTC, Peter
Details


Note

You need to log in before you can comment on or make changes to this bug.


Description Neil MacLeod (reporter) maemo.org 2007-12-21 11:00:20 UTC
Firmware: OS 2008 on N800, RX-34_2008SE_2.2007.50-2_PR_COMBINED_MR0_ARM.bin

The process pdflush is using significant amounts of CPU (ie. 100%) for extended
periods of time (up to 30 seconds) which leaves the device unresponsive and if
a GUI application is being used the OS will quite often suggest that the
application should be terminated because it is not responding, however the lack
of response is actually due to pdflush hogging the CPU and not yielding. If I
ignore the "kill application" dialog the application will eventually spark back
into life once pdflush has completed whatever it was doing.

This is an extract of "top" output (captured using an ssh session) when
"apt-get update" has been started in a second window:

Mem: 117124K used, 9704K free, 0K shrd, 296K buff, 44172K cached
Load average: 2.02 1.27 0.52
  PID USER     STATUS   VSZ  PPID %CPU %MEM COMMAND
  112 root     RW         0     6 98.0  0.0 pdflush
 1794 root     RW      1960  1783  0.9  1.5 top
   78 root     SW<        0     6  0.5  0.0 OMAP McSPI/0

In the above instance, pdflush ran continuously for 18 seconds at close to 100%
CPU usage during which time apt-get appeared to be hung.

STEPS TO REPRODUCE THE PROBLEM:
Run any application, GUI or otherwise, and pdflush will eventually kick in and
steal all CPU. I don't know what is causing it, so can't give an exact trigger
but it seems that anytime data needs to be written to (or read from?) jffs2
storage, then pdflush will hog the CPU.

EXPECTED OUTCOME:
I shouldn't even be aware that pdflush is running.

ACTUAL OUTCOME:
Unresponsive device for significant periods of time.

REPRODUCIBILITY:
Always, since installing the official OS 2008 for N800.

EXTRA SOFTWARE INSTALLED:
openssh-server

OTHER COMMENTS:
This has only started happening since I installed the official release of OS
2008 on N800 - I did not notice this problem during the two or three weeks when
using "OS2008 for N810" on N800 (I didn't test the OS 2008 Beta)

I have seen the "unresponsive application, do you want to kill it" dialog
several times while using MicroB and Application Manager, on each occasion it
was due to pdflush hogging all CPU.

I have rebooted the device, but this has not made any difference - pdflush is
still a problem.

I have two 4GB SDHC cards installed, although I am not accessing this storage
when the problem occurs (at least, not intentionally). I have also seen the
jffs2_gcd_mtd4 process using large amounts of CPU but less often than pdflush.
It's possible jffs2_gcd_mtd4 and pdflush are related - could my JFFS2
filesystem be corrupted causing pdflush to go bonkers? I don't see any messages
in dmesg to suggest there is a problem with the jffs2 filesystem.
Comment 1 Neil MacLeod (reporter) maemo.org 2007-12-24 17:13:31 UTC
Have re-flashed to try and cure this problem, so far so good but will need to
monitor it for longer to be sure the problem has gone away - I've no idea what
caused it, but suspect some type of corruption (bad blocks?) in jffs2 memory.
Comment 2 Neil MacLeod (reporter) maemo.org 2007-12-30 14:24:09 UTC
The pdflush problem has returned and been with me the last few days so the
reflash has not cured it. I also have miserable standby/idle times (see bug
2602) which may be due to pdflush consuming lots of CPU and eating my battery.

I'm considering going back to OS 2007 as the usage experience is horrible
thanks to pdflush rendering the device unusable for 30 seconds or more every
few minutes and the standby time is frankly ridiculous (I'm yet to get 48 hours
when the device is left completely idle - bug 2602).
Comment 3 tmartino 2008-01-05 18:27:56 UTC
I see the same behaviour.  Thought I had it licked when I removed the
resource-hogging RSS reader applet from the home screen, but the problem is
still there.  The RSS reader just made it worse.  OS2008 is great, but unless
this serious problem is resolved soon, I'll have to downgrade.

If you are experiencing this problem, please be sure to vote for it!
Comment 4 Jussi Kukkonen 2008-01-06 13:58:22 UTC
> I don't know what is causing it, so can't give an exact trigger
> but it seems that anytime data needs to be written to (or read from?) jffs2
> storage, then pdflush will hog the CPU.

pdflush can freak out when a lot of stuff gets written to disk, faster than
bandwidth allows (IIRC). 

I've never noticed this, and from the votes it doesn't seem like many others
have either, so it might be worth it to find out whatever is writing so much on
your systems... What applications are you running that might do this?
Comment 5 Neil MacLeod (reporter) maemo.org 2008-01-06 16:20:54 UTC
(In reply to comment #4)
> I've never noticed this, and from the votes it doesn't seem like many others
> have either, so it might be worth it to find out whatever is writing so much on
> your systems... What applications are you running that might do this?
> 

When I first noticed this, I only had openssh-server installed. In order to
trigger pdflush I only neeeded to be running the browser or Application
Manager, not exactly heavy duty applications... basically this problem
manifests itself during normal, light usage.
Comment 6 Oliver Wagner 2008-01-30 23:01:32 UTC
I was also seeing this particular bug and have now reflashed as a workaround.

Usage pattern: RSS Reader + Applet, Internet, Pidgin (on demand, with IRC
plugin)
Comment 7 Keith Taylor 2008-02-16 01:27:39 UTC
I also had this problem and had to reflash. Almost for sure a problem with
writing to disk. it would take several hours to save a rss feed file to disk.
Comment 8 Eero Tamminen nokia 2008-02-18 15:41:57 UTC
(In reply to comment #0)
> Firmware: OS 2008 on N800, RX-34_2008SE_2.2007.50-2_PR_COMBINED_MR0_ARM.bin
> 
> The process pdflush is using significant amounts of CPU (ie. 100%) for
> extended periods of time (up to 30 seconds) which leaves the device
> unresponsive and if a GUI application is being used the OS will quite often
> suggest that the application should be terminated because it is not 
> responding, however the lack of response is actually due to pdflush hogging
> the CPU and not yielding.
>
> If I ignore the "kill application" dialog the application will eventually
> spark back into life once pdflush has completed whatever it was doing.

The application is most like doing sync i.e. pdflush is doing what
the application application asked.  App then just waits until the synched
data is written to disk.


> ACTUAL OUTCOME:
> Unresponsive device for significant periods of time.

The whole device (i.e. also power menu or task navigator menu coming from
Power and Home keys) or only the applications doing disk writes?

(Only applications writing to disk should be affected)


> I have seen the "unresponsive application, do you want to kill it" dialog
> several times while using MicroB and Application Manager, on each occasion it
> was due to pdflush hogging all CPU.

When this happens, have you viewed any Flash videos (youtube, ads...)
recently?  All the streamed video data is cached to internal flash or
to memory card if one is available and videos can be very large.

Application manager can write quite a bit of data too if you install
applications.


> I have two 4GB SDHC cards installed, although I am not accessing this storage
> when the problem occurs (at least, not intentionally). I have also seen the
> jffs2_gcd_mtd4 process using large amounts of CPU but less often than pdflush.
> It's possible jffs2_gcd_mtd4 and pdflush are related - could my JFFS2
> filesystem be corrupted causing pdflush to go bonkers? I don't see any
> messages in dmesg to suggest there is a problem with the jffs2 filesystem.

jffs2_gcd_mtd4 is the JFFS2 garbage collector.  If you do a large write
and then pdflush writes it to disk from page cache, that can trigger
JFFS2 GCing to make room for that write.  The fuller your internal
flash is, the slower your device.
Comment 9 Neil MacLeod (reporter) maemo.org 2008-02-18 16:02:34 UTC
Thanks for commenting Eero.

(In reply to comment #8)
> 
> The whole device (i.e. also power menu or task navigator menu coming from
> Power and Home keys) or only the applications doing disk writes?
> 
> (Only applications writing to disk should be affected)
> 

pdflush consumes close to 100% CPU when it goes nuts (typically 99.5% CPU
according to a remote ssh session running top) so the entire device is
unresponsive, other applications may still function but will of course respond
very slowly.

> 
> 
> When this happens, have you viewed any Flash videos (youtube, ads...)
> recently?  All the streamed video data is cached to internal flash or
> to memory card if one is available and videos can be very large.
> 
> Application manager can write quite a bit of data too if you install
> applications.
> 
> 

No - on one occassion I hadn't used the device for several hours (unused
overnight, for example) and when I went to view an RSS article by clicking on
an item in the RSS applet I got hit by the pdflush problem. I can't say if I
had been viewing a Flash video in the previous 12+ hours, it's possible but
unlikely as it's not something I do often as frankly they're barely watchable.
Ads are more likely though. But still, it would have been many hours previous.

> jffs2_gcd_mtd4 is the JFFS2 garbage collector.  If you do a large write
> and then pdflush writes it to disk from page cache, that can trigger
> JFFS2 GCing to make room for that write.  The fuller your internal
> flash is, the slower your device.
> 

I've been hit by this problem without doing any large writes - just regular
browsing, or reading an RSS feed. And since my device (at the time) only had
ssh-server installed, my internal flash was almost entirely empty.

Also, since re-flashing several times appears to have cured this problem (I've
stopped using my N800 for the time being) does this suggest that internal flash
is somehow being corrupted which is causing pdflush/jffs2_gcd_mtd4 to go mad
and consume excessive amounts of CPU when even small amounts of data is being
written to internal flash? Is there a problem with the flashing process that
can somehow leave internal memory in an inconsistent state? I've never seen
this type of problem with a 770 or N800 when running OS 2005/6/7 - only with OS
2008. So far it doesn't appear to have affected my N810.
Comment 10 Eero Tamminen nokia 2008-02-18 16:33:02 UTC
(In reply to comment #9)
> > The whole device (i.e. also power menu or task navigator menu coming from
> > Power and Home keys) or only the applications doing disk writes?
> > 
> > (Only applications writing to disk should be affected)
> 
> pdflush consumes close to 100% CPU when it goes nuts (typically 99.5% CPU
> according to a remote ssh session running top)

Note that tasks which don't slow others due to being running at lower priority
or yielding often, can still show up as using ~100% CPU in top when the other
processes are idle.


> so the entire device is unresponsive, other applications may still
> function but will of course respond very slowly.

Best test-case is probably the power key menu, it's not related to any disk
activity.  Task Navigator, Statusbar and Home are all in a same process
so if any of their applets is doing any disk activity and this issue is
triggered, it affects all of them.


> > When this happens, have you viewed any Flash videos (youtube, ads...)
> > recently?  All the streamed video data is cached to internal flash or
> > to memory card if one is available and videos can be very large.
> > 
> > Application manager can write quite a bit of data too if you install
> > applications.
> 
> No - on one occassion I hadn't used the device for several hours (unused
> overnight, for example) and when I went to view an RSS article by clicking on
> an item in the RSS applet I got hit by the pdflush problem.

The device was slow, and you checked that it actually was pdflush,
not jffs2_gcd_mtd4?


> I can't say if I
> had been viewing a Flash video in the previous 12+ hours, it's possible but
> unlikely as it's not something I do often as frankly they're barely watchable.
> Ads are more likely though. But still, it would have been many hours previous.

Ok, in that case I think it should have been already flushed to disk.


> > jffs2_gcd_mtd4 is the JFFS2 garbage collector.  If you do a large write
> > and then pdflush writes it to disk from page cache, that can trigger
> > JFFS2 GCing to make room for that write.  The fuller your internal
> > flash is, the slower your device.
> 
> I've been hit by this problem without doing any large writes - just regular
> browsing, or reading an RSS feed. And since my device (at the time) only had
> ssh-server installed, my internal flash was almost entirely empty.

JFFS2 GC is a bit different from pdflush.  The write or delete can have
happened a long time (days) before GC is being triggered.  Also, although
"df" shows internal flash to be almost empty, it's possible that this "free"
space is actually something that JFFS2 needs to GC before it can be written.
Had the internal flash been fairly full (say 80%) and you'd removed most
of it some time before this happened?


> Also, since re-flashing several times appears to have cured this problem
> (I've stopped using my N800 for the time being) does this suggest that
> internal flash is somehow being corrupted which is causing 
> pdflush/jffs2_gcd_mtd4 to go mad and consume excessive amounts of CPU
> when even small amounts of data is being written to internal flash?

See above JFFS2 GCing comment.


> Is there a problem with the flashing process that
> can somehow leave internal memory in an inconsistent state? I've never seen
> this type of problem with a 770 or N800 when running OS 2005/6/7 - only with
> OS 2008. So far it doesn't appear to have affected my N810.

I think the pdflush and jffs2_gcd_mtd4 issues are separate.  Because you
mentioned that you had 4GB SDHC card(s) inserted, I think the pdflush
issues are actually related to memory card use.  I guess you've checked
that your memory card file systems haven't corrupted?

If you get the pdflush issue again, please check with "ps" whether any of
the processes are in "D" state (deep sleep).
Comment 11 Neil MacLeod (reporter) maemo.org 2008-02-18 19:48:09 UTC
(In reply to comment #10)
> > pdflush consumes close to 100% CPU when it goes nuts (typically 99.5% CPU
> > according to a remote ssh session running top)
> 
> Note that tasks which don't slow others due to being running at lower priority
> or yielding often, can still show up as using ~100% CPU in top when the other
> processes are idle.
> 

Yes, but in this case pdflush is not yielding and is causing the application
which most likely caused the write to disk to then become a candidate for
closure by the OS as it is no longer responding. Other processes may be
unaffected, but in my case it was quite usual for the browser, RSS reader or
App Manager to "hang" for 30+ seconds at which point the OS would suggest the
app is closed as it's no long responding, only for the application to
eventually recover once the pdflush completes it's processing and allows the
application to continue.

> 
> Best test-case is probably the power key menu, it's not related to any disk
> activity.  Task Navigator, Statusbar and Home are all in a same process
> so if any of their applets is doing any disk activity and this issue is
> triggered, it affects all of them.
> 
If I recall the power key menu was working OK as before I realised what was
happening with pdflush I was on the verge of rebooting the device and had
brought up the power menu. But before I could hit "shut down", the application
came back to life and I learned that the device hanging was only a temporary
(but frequent) problem.

> 
> The device was slow, and you checked that it actually was pdflush,
> not jffs2_gcd_mtd4?
> 

According to top, which I run in a remote ssh session, it is mostly pdflush
consuming 99%+ CPU but I have also at other times seen jffs2_gcd_mtd4 consuming
50% CPU for extended periods (30+ seconds). I don't recall ever seeing both
processes running at the same time, it's one or the other with pdflush being
more common than jffs2_gcd_mtd4 (perhaps twice as common).

> 
> JFFS2 GC is a bit different from pdflush.  The write or delete can have
> happened a long time (days) before GC is being triggered.  Also, although
> "df" shows internal flash to be almost empty, it's possible that this "free"
> space is actually something that JFFS2 needs to GC before it can be written.
> Had the internal flash been fairly full (say 80%) and you'd removed most
> of it some time before this happened?
> 

No, I've barely used the device mainly because I haven't installed many
applications due to the server disaster over the holidays. I hadn't removed or
modified the internal flash in any significant way (at least not knowingly) - I
had installed ssh-server, and configured about 9 RSS Feeds (BBC News/Sport,
Register, Inquirer, Maemo Announce/Planet/News, Engadget, ITT). RSS Feeds were
updating every 4 hours with image caching enabled. This is pretty much how I
had RSS setup under OS 2007 and that never had any problems of this nature.

> 
> > Also, since re-flashing several times appears to have cured this problem
> > (I've stopped using my N800 for the time being) does this suggest that
> > internal flash is somehow being corrupted which is causing 
> > pdflush/jffs2_gcd_mtd4 to go mad and consume excessive amounts of CPU
> > when even small amounts of data is being written to internal flash?
> 
> See above JFFS2 GCing comment.
> 
It was a very lightly used system to be honest. Which is why I'm struggling to
think of what I could have done which caused such excessive adverse behaviour,
which I've never seen before on previous IT OS versions despite heavier usage
patterns.

> 
> > Is there a problem with the flashing process that
> > can somehow leave internal memory in an inconsistent state? I've never seen
> > this type of problem with a 770 or N800 when running OS 2005/6/7 - only with
> > OS 2008. So far it doesn't appear to have affected my N810.
> 
> I think the pdflush and jffs2_gcd_mtd4 issues are separate.  Because you
> mentioned that you had 4GB SDHC card(s) inserted, I think the pdflush
> issues are actually related to memory card use.  I guess you've checked
> that your memory card file systems haven't corrupted?
> 
The 4GB SDHC filesystem is fine. The only data on the 4GB SDHC is a bunch of
mp3's and videos which I certainly wasn't accessing at the time I would see
pdflush activity go through the roof - indeed, pdflush hogging the CPU would
always coincide with activity when using apparently "internal flash" only
applications such as MicroB, RSS Feed Reader and Application Manager.

> If you get the pdflush issue again, please check with "ps" whether any of
> the processes are in "D" state (deep sleep).
> 

I will do that, or hopefully anyone else with the same problem can report the
same. :)
Comment 12 Eero Tamminen nokia 2008-02-19 13:13:23 UTC
Pdflush should be involved only with memory card activity.
Do you have swap enabled?

When the slowdown happens next time, please also save the /proc/meminfo
(we're mainly interested about Dirty and Writeback fields).
Comment 13 Neil MacLeod (reporter) maemo.org 2008-02-19 13:32:41 UTC
swap is enabled. I'll report the content of /proc/meminfo should it ever happen
again but I've now moved on to the N810 which doesn't appear to have this
problem - it's apparent this problem isn't affecting everyone or every device,
so perhaps someone who still has this problem will be able to supply the
information as it's possible I won't be able to do so.
Comment 14 Nicolas D.L. Subotich 2008-02-23 05:39:46 UTC
Hi,

I have been having the same issue for a while now. I will try to capture the ps
and proc/meminfo details in the next couple of days. 

Some history about my experience: About a month ago I reflashed the N800 and
kept a bare minimum set of applications for a couple of weeks. No problems
there. I was happy.
I then started to re-install a one or two applications at a time and let it run
for a couple of days after each install. No problems for a week or so.
Then I got confident that the problem had gone away and installed/configured
several applications at once (yep, bad idea to aid troubleshooting). The apps
were: Camera, FM Radio, RSS w/5 feeds, Media Streamer. The problem with pdflush
came back. I removed some of the apps (Camera, FMRadio, set RSS not to auto
refresh) and it got much better, but occasionally the problem appears again. 
Then I saw this comment chain and here I am... I re-installed the same
applications I had removed to see if I can reproduce the problem systematically
again.

Some additional details:

- Swap enabled. Resized swap up and down several times along the way.

- Same 2GB SD Card used before the reflash (problem existed), after the reflash
(problem went away for several weeks) and now (problem comes every now and
then). I even tried a different SD card before the reflash and there was no
change: problem still there.

- Boot times:
    - aprox.  3 minutes when the pdflush problem was at its worst
    - aprox.  5 seconds after the reflash
    - aprox. 30 seconds once it started getting bad again and I removed the
"suspected" set of apps.

- When the problem was at its worst, I disabled swap, removed the SD card
altogether and rebooted. Boot time was still at around 3 minutes. Right after
the reboot, pdflush would show up in top as using up all available CPU for
another 30 seconds of so before going away.

- Also, when the problem was at its worst, refreshing the Applications List
from the Applications Manager triggered the problem almost every time. It would
take a couple of minutes just to refresh the list.

- When the problem happens, most of the time (90%) pdflush is the process using
up all the CPU time. I've also seen jffs2_gcd_mtd4 doing that, but far less
frequently and other times it is both of them sharing all available CPU.

- Before I was using just the local terminal, but now I setup the openssh
client and server and I am monitoring top in one session and I have a "cat
/proc/meminfo >meminfo.txt" ready to go in another session.

Now that I am watching it, it better start failing again just as it did before
so we can nail it down...

Thanks for getting this far down the message. Hopefully someone will be able to
figure out what is going on.

Regards,

Nick
Comment 15 Nicolas D.L. Subotich 2008-02-23 08:06:29 UTC
OK, I've got the info from a relatively short duration hung-up (~30seconds).
Steps: click on an entry in the RSS Reader applet -> call up the RSS Reader
application -> click an entry and call up an article in the browser.

Here is the output from /proc/meminfo:

MemTotal:       126828 kB
MemFree:          4368 kB
Buffers:           116 kB
Cached:          55228 kB
SwapCached:         32 kB
Active:          74104 kB
Inactive:        19448 kB
SwapTotal:       49144 kB
SwapFree:        49040 kB
Dirty:               0 kB
Writeback:           0 kB
AnonPages:       38188 kB
Mapped:          28232 kB
Slab:            16940 kB
SReclaimable:     4004 kB
SUnreclaim:      12936 kB
PageTables:       2136 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:    112556 kB
Committed_AS:    93420 kB
VmallocTotal:   122880 kB
VmallocUsed:       936 kB
VmallocChunk:   121920 kB

Here is the output from ps (I do not know if it matters, but there are two
pdflush processes listed):

  PID  Uid        VSZ Stat Command
    1 root       1468 SW  init [2]     
    2 root            SWN [ksoftirqd/0]
    3 root            SW  [watchdog/0]
    4 root            SW< [events/0]
    5 root            SW< [khelper]
    6 root            SW< [kthread]
   14 root            SW< [dvfs/0]
   65 root            SW< [kblockd/0]
   66 root            SW< [kseriod]
   78 root            SW< [OMAP McSPI/0]
   85 root            SW< [ksuspend_usbd]
   88 root            SW< [khubd]
  112 root            SW  [pdflush]
  113 root            SW  [pdflush]
  114 root            SW< [kswapd0]
  115 root            SW< [aio/0]
  118 root            SW< [mipid_esd]
  237 root            SW  [mtdblockd]
  279 root            SW< [kondemand/0]
  280 root            SW< [kmmcd]
  291 root            SW< [krfcommd]
  311 root            SW< [mmcqd]
  335 root            SW< [mmcqd]
  341 root       1104 SW< dsme -d -l syslog -v 4 -p /usr/lib/dsme/libstartup.so
  349 root        776 SW  /usr/bin/bme_RX-34 
  351 root        564 SW  /usr/sbin/kicker 
  379 root            SW< [cx3110x]
  399 root            SWN [jffs2_gcd_mtd4]
  428 root       1576 SW< /sbin/udevd --daemon 
  606 messagebus   2308 SW< /usr/bin/dbus-daemon --system 
  629 haldaemon   3980 SW  /usr/sbin/hald 
  642 root       2800 SW  hald-runner 
  665 root       2952 SW  /usr/lib/hal/hald-addon-cpufreq 
  678 root       2436 SW  /usr/lib/hal/hald-addon-omap-gpio 
  679 root       2436 SW  /usr/lib/hal/hald-addon-omap-gpio 
  680 root       2436 SW  /usr/lib/hal/hald-addon-omap-gpio 
  687 root       2436 SW  /usr/lib/hal/hald-addon-omap-gpio 
  699 root       2940 SW  hald-addon-input: Listening on /dev/input/event2 /dev
  700 root       2436 SW  /usr/lib/hal/hald-addon-mmc 
  701 root       2436 SW  /usr/lib/hal/hald-addon-mmc 
  746 root       3168 SW  /usr/sbin/sshd 
  753 root       3636 SW< /sbin/mce --force-syslog 
  755 messagebus   3324 SW  /usr/lib/gconf2/gconfd-2 
  759 root      13048 SW< /usr/bin/Xomap -mouse tslib -nozap -dpi 96 -wr -nolis
  786 user       1444 SW< /usr/sbin/temp-reaper 
  789 user       2048 SW< /usr/bin/dbus-daemon --session 
  793 user       7084 SW< /usr/lib/sapwood/sapwood-server 
  797 user       6520 SW< /usr/bin/matchbox-window-manager -theme echo -use_tit
  805 root            SW< [dsp/0]
  808 root            SW< [dsp/0]
  811 root       3668 SW  /usr/sbin/dsp_dld -p --disable-restart -c /lib/dsp/ds
  816 root       2792 SW< /usr/bin/bme-dbus-proxy -N 
  876 root       4804 SW  /usr/sbin/multimediad 
  881 root       2176 SW< /usr/bin/esd 
  930 root            SW< [file-storage-ga]
  934 haldaemon   2508 SW  hald-addon-usb-cable: listening on /sys/devices/plat
  942 root       7852 SW< /usr/sbin/ke-recv 
  960 user      18960 SW< /usr/bin/maemo-launcher --daemon --send-app-died --bo
  989 user       1448 SW< /usr/bin/hildon-input-method 
  990 user      27492 SW< /usr/bin/hildon-input-method                         
  994 user       3916 SW< /usr/bin/clipboard-manager 
 1034 user       1448 SW< /usr/bin/hildon-desktop 
 1035 user      35988 SW< /usr/bin/hildon-desktop                              
 1041 nobody     1812 SW< /usr/sbin/dnsmasq -k -i lo -a 127.0.0.1 -z 
 1049 root       2520 SW< /usr/sbin/hcid -n -x -s 
 1052 root       2452 SW< /usr/lib/bluetooth/bluetoothd-service-audio 
 1055 root       2412 SW< /usr/lib/bluetooth/bluetoothd-service-input 
 1059 root       2924 SW< /usr/bin/btcond -l 
 1062 root       3580 SW< /usr/sbin/wlancond 
 1065 root       3204 RW< /usr/sbin/icd 
 1094 user       1448 SW< /usr/bin/osso-connectivity-ui-conndlgs 
 1095 user      20608 SW< /usr/bin/osso-connectivity-ui-conndlgs               
 1115 user      43544 SW< /usr/bin/osso-media-server 
 1145 systemui  16936 SW< /usr/bin/systemui 
 1150 user       3192 SW< /usr/bin/obexsrv -l 
 1170 user       5228 SW  /usr/bin/alarmd 
 1203 root       5084 SW< /usr/sbin/hulda 
 1204 root       3036 SW< /usr/sbin/hulda 
 1224 user      16204 SW  /usr/bin/mediaplayer-engine 
 1226 messagebus  14392 SW  /usr/bin/eapd 
 1234 user      16172 SWN /usr/bin/metalayer-crawler -F 
 1249 user       7944 SW  /usr/libexec/gnome-vfs-daemon 
 1252 root       1472 SW< /sbin/udhcpc -i wlan0 -s /etc/udhcpc/udhcpc.script -f
 1293 root       5776 SW  sshd: root@pts/0  
 1295 root       1964 SW  -sh 
 1312 root       5776 SW  sshd: root@pts/1  
 1314 root       1960 SW  -sh 
 1385 root       1960 SW  top 
 1508 user       1448 SW  /usr/bin/osso_rss_feed_reader 
 1509 user      28048 RW  /usr/bin/osso_rss_feed_reader                        
 1511 user     122280 RW  /usr/bin/browser 
 1521 root       1960 RW  ps
Comment 16 Eero Tamminen nokia 2008-02-25 09:55:13 UTC
(In reply to comment #15)
> OK, I've got the info from a relatively short duration hung-up (~30seconds).
> Steps: click on an entry in the RSS Reader applet -> call up the RSS Reader
> application -> click an entry and call up an article in the browser.

And when this happened, it was "pdflush" again taking CPU instead of Browser
just being busy with the new page?  Did the Browser page have any Flash content
(if you have Flashplayer enabled) that could be videos (which are buffered to
memory card)?


> Here is the output from /proc/meminfo:
> 
> MemTotal:       126828 kB
> MemFree:          4368 kB
> Buffers:           116 kB
> Cached:          55228 kB
> SwapCached:         32 kB
> Active:          74104 kB
> Inactive:        19448 kB
> SwapTotal:       49144 kB
> SwapFree:        49040 kB
> Dirty:               0 kB
> Writeback:           0 kB

Hm. There's no Dirty/Writeback, so there shouldn't be anything to do for
pdflush.  Did this output finish before device was again responsive?


> Here is the output from ps

There doesn't seem to be anything out of place here.


> (I do not know if it matters, but there are two
> pdflush processes listed):

I think that's OK/normal.

Could you attach also output of "ls -l /proc/*/fd" (list of open files)
while this happens and output of "dmesg" (kernel messages) after "pdflush"
isn't anymore busy?

What "df" reports?  (how full your device internal flash and memory card are)
Comment 17 Nicolas D.L. Subotich 2008-02-26 04:05:33 UTC
Please find my replies below.
By the way, last night I was checking InternetTabletTalk.com and I found about
the firmware that was recently released (OS2008 v2.2007.51-3) and I promptly
flashed away...(I know, I know, I am not helping the troubleshooting process).
I restored from a backup and now the N800 refuses to lock up (same SD card
too). I will keep hammering it and I will report accordingly. In any case, my
responses are below (this was before the flash update to 51-3)

(In reply to comment #16)
> (In reply to comment #15)
> > OK, I've got the info from a relatively short duration hung-up (~30seconds).
> > Steps: click on an entry in the RSS Reader applet -> call up the RSS Reader
> > application -> click an entry and call up an article in the browser.
> 
> And when this happened, it was "pdflush" again taking CPU instead of Browser
> just being busy with the new page?  Did the Browser page have any Flash content
> (if you have Flashplayer enabled) that could be videos (which are buffered to
> memory card)?
> 

Yes, pdflush was using up most of the CPU time, not Browser. I "think" (read: I
am not sure) that the site had flash contents, but it sure did not have flash
videos. 

After flashing to the new firmware, I repeated the test with the RSS Reader and
calling up the browser while monitoring CPU usage with "top -d 1". I noticed
that pdflush came up and went away in a single cycle while the free memory was
going down (~2000KB free). So, this behavior seems to make sense. Before the
latest flash update, pdflush would just lock-up at that point for over 30
seconds.


> 
> > Here is the output from /proc/meminfo:
> > 
> > MemTotal:       126828 kB
> > MemFree:          4368 kB
> > Buffers:           116 kB
> > Cached:          55228 kB
> > SwapCached:         32 kB
> > Active:          74104 kB
> > Inactive:        19448 kB
> > SwapTotal:       49144 kB
> > SwapFree:        49040 kB
> > Dirty:               0 kB
> > Writeback:           0 kB
> 
> Hm. There's no Dirty/Writeback, so there shouldn't be anything to do for
> pdflush.  Did this output finish before device was again responsive?
> 
It actually took a while (>15secs) for the "cat /proc/meminfo" to complete but,
yes, the output finished before the N800 was responsive again (it was like
pdflush "let go" enough for the command to complete). I was able to get the
output from "ps" before pdflush finished too.


> 
> > Here is the output from ps
> 
> There doesn't seem to be anything out of place here.
> 
> 
> > (I do not know if it matters, but there are two
> > pdflush processes listed):
> 
> I think that's OK/normal.
> 
> Could you attach also output of "ls -l /proc/*/fd" (list of open files)
> while this happens and output of "dmesg" (kernel messages) after "pdflush"
> isn't anymore busy?
> 
> What "df" reports?  (how full your device internal flash and memory card are)
> 

I will gladly capture all of this info if the problem happens again, which I
certainly hope it doesn't :)
Here is the "df" output:

>>>
Nokia-N800-51-3:~# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mtdblock4            2048      2048         0 100% /mnt/initfs
none                       512        92       420  18% /mnt/initfs/tmp
/dev/mtdblock4          257536    126052    131484  49% /
none                       512        92       420  18% /tmp
none                      1024        20      1004   2% /dev
tmpfs                     1024         0      1024   0% /dev/shm
/dev/mmcblk0p1         1992320    967712   1024608  49% /media/mmc2
/dev/mmcblk1p1          993376     38532    954844   4% /media/mmc1
<<<


Oh, by the way, after flashing the new firmware, boot up times have been back
to normal (5~10 seconds until the NOKIA splash screen shows up).

Thanks again for taking the time to look into this problem.

Best Regards,

Nicolas
Comment 18 Eero Tamminen nokia 2008-02-26 11:18:20 UTC
(In reply to comment #17)
> I will gladly capture all of this info if the problem happens again, which I
> certainly hope it doesn't :)
> Here is the "df" output:
>
> Nokia-N800-51-3:~# df

Um, I meant from the release that had the problem.


> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/mtdblock4            2048      2048         0 100% /mnt/initfs
> none                       512        92       420  18% /mnt/initfs/tmp
> /dev/mtdblock4          257536    126052    131484  49% /

This looks fine.


> Oh, by the way, after flashing the new firmware, boot up times have been back
> to normal (5~10 seconds until the NOKIA splash screen shows up).

JFFS2 gets slower when the file system is full (or has lots of
removed/overwritten data that needs to be garbage collected to
be re-usable).  This affects also the time it takes to mount
the file system which happens before the Nokia hands splash.

Adding services to early boot (such as syslog...) will also
delay the splash.
Comment 19 Nicolas D.L. Subotich 2008-02-26 15:22:52 UTC
Please see my answers below:

(In reply to comment #18)
> (In reply to comment #17)
> > I will gladly capture all of this info if the problem happens again, which I
> > certainly hope it doesn't :)
> > Here is the "df" output:
> >
> > Nokia-N800-51-3:~# df
> 
> Um, I meant from the release that had the problem.
> 
> 

It is the same 2 GB SD card. I didn't re-format it either. The bulk of the data
is made up of map files for "map" and "maemo mapper".
After re-flashing, I only deleted about five old folders created by the backup
application, which were small (<10MB).


> > Filesystem           1k-blocks      Used Available Use% Mounted on
> > /dev/mtdblock4            2048      2048         0 100% /mnt/initfs
> > none                       512        92       420  18% /mnt/initfs/tmp
> > /dev/mtdblock4          257536    126052    131484  49% /
> 
> This looks fine.
> 
> 
> > Oh, by the way, after flashing the new firmware, boot up times have been back
> > to normal (5~10 seconds until the NOKIA splash screen shows up).
> 
> JFFS2 gets slower when the file system is full (or has lots of
> removed/overwritten data that needs to be garbage collected to
> be re-usable).  This affects also the time it takes to mount
> the file system which happens before the Nokia hands splash.
> 
> Adding services to early boot (such as syslog...) will also
> delay the splash.
> 

Since the boot times were just as slow with the SD card removed, could the
problem be caused by the internal memory? It wasn't close to be full either, by
the way.

I'll keep an eye on my N800 and I'll let you know. Thank yoou very much, again.

Nicolas
Comment 20 Nicolas D.L. Subotich 2008-02-27 04:06:07 UTC
Ok, here we go... I had a couple of lockups even with the new firmware. Here
are the details:

- I called up an entry from the RSS Reader into the browser (an article from
the  Slashdot.com rss feed). 

- pdflush starting using up all the CPU so I ran a script (s1) to capture items
<2> through <4>. The script sat there for about 15~20 seconds.

- While pdflush was still hung up, I stopped top and captured its output (see
<1> below). At the exact time of the capture, pdflush was using about 50% CPU.

- After pdflush finished, I ran another script (s2) to capture the remaining
items.


#===============================================================================
<1> top
#===============================================================================

Mem: 113888K used, 12940K free, 0K shrd, 108K buff, 51228K cached
Load average: 1.21 0.43 0.35
  PID USER     STATUS   VSZ  PPID %CPU %MEM COMMAND
  113 root     RW         0     6 48.6  0.0 pdflush
 2566 user     RW     53136     1 17.7 41.8 browser
 2564 user     RW     28516   958 16.9 22.4 maemo-launcher
  757 root     SW<    12832   334  6.7 10.1 Xomap
 1036 user     SW<    40636   958  6.1 31.9 maemo-launcher
  795 user     SW<     6520   334  1.9  5.1 matchbox-window
 2536 root     RW      1960  2500  0.7  1.5 top
  379 root     SW<        0     6  0.3  0.0 cx3110x
  787 user     SW<     2048   334  0.1  1.6 dbus-daemon
  806 root     SW<        0     6  0.1  0.0 dsp/0
 1121 user     SW<    77044   334  0.0 60.6 osso-media-serv
  991 user     SW<    34400   958  0.0 27.0 maemo-launcher
 1146 systemui SW<    23328   334  0.0 18.3 systemui
 1117 user     SW<    20616   958  0.0 16.2 maemo-launcher
  958 user     SW<    18960     1  0.0 14.9 maemo-launcher
 1235 user     SWN    16172   334  0.0 12.7 metalayer-crawl
 1229 user     SW     16104   334  0.0 12.6 mediaplayer-eng
 1191 messagebus SW     14392     1  0.0 11.3 eapd
 1250 user     SW      7876     1  0.0  6.2 gnome-vfs-daemo
  940 root     SW<     7852   334  0.0  6.1 ke-recv
  791 user     SW<     7204   334  0.0  5.6 sapwood-server


#===============================================================================
<2> /proc/meminfo
#===============================================================================
MemTotal:       126828 kB
MemFree:         12748 kB
Buffers:           108 kB
Cached:          51224 kB
SwapCached:          4 kB
Active:          79464 kB
Inactive:         8812 kB
SwapTotal:       49144 kB
SwapFree:        49028 kB
Dirty:               0 kB
Writeback:           0 kB
AnonPages:       36952 kB
Mapped:          25980 kB
Slab:            13800 kB
SReclaimable:     4248 kB
SUnreclaim:       9552 kB
PageTables:       2160 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:    112556 kB
Committed_AS:    97380 kB
VmallocTotal:   122880 kB
VmallocUsed:       936 kB
VmallocChunk:   121920 kB


#===============================================================================
<3> ps
#===============================================================================
  PID  Uid        VSZ Stat Command
    1 root       1468 SW  init [2]     
    2 root            SWN [ksoftirqd/0]
    3 root            SW  [watchdog/0]
    4 root            SW< [events/0]
    5 root            SW< [khelper]
    6 root            SW< [kthread]
   14 root            SW< [dvfs/0]
   65 root            SW< [kblockd/0]
   66 root            SW< [kseriod]
   78 root            SW< [OMAP McSPI/0]
   85 root            SW< [ksuspend_usbd]
   88 root            SW< [khubd]
  112 root            SW  [pdflush]
  113 root            SW  [pdflush]
  114 root            SW< [kswapd0]
  115 root            SW< [aio/0]
  118 root            SW< [mipid_esd]
  237 root            SW  [mtdblockd]
  279 root            SW< [kondemand/0]
  280 root            SW< [kmmcd]
  291 root            SW< [krfcommd]
  312 root            SW< [mmcqd]
  334 root       1104 SW< dsme -d -l syslog -v 4 -p /usr/lib/dsme/libstartup.so
  342 root        776 SW  /usr/bin/bme_RX-34 
  344 root        564 SW  /usr/sbin/kicker 
  361 root            SW< [mmcqd]
  379 root            SW< [cx3110x]
  399 root            SWN [jffs2_gcd_mtd4]
  428 root       1576 SW< /sbin/udevd --daemon 
  614 messagebus   2312 SW< /usr/bin/dbus-daemon --system 
  638 haldaemon   4008 SW  /usr/sbin/hald 
  651 root       2800 SW  hald-runner 
  674 root       2436 SW  /usr/lib/hal/hald-addon-omap-gpio 
  675 root       2436 SW  /usr/lib/hal/hald-addon-omap-gpio 
  676 root       2436 SW  /usr/lib/hal/hald-addon-omap-gpio 
  677 root       2436 SW  /usr/lib/hal/hald-addon-omap-gpio 
  696 root       2940 SW  hald-addon-input: Listening on /dev/input/event2 /dev
  697 root       2436 SW  /usr/lib/hal/hald-addon-mmc 
  704 root       2436 SW  /usr/lib/hal/hald-addon-mmc 
  719 root       2952 SW  /usr/lib/hal/hald-addon-cpufreq 
  744 root       3168 SW  /usr/sbin/sshd 
  751 root       3636 SW< /sbin/mce --force-syslog 
  753 messagebus   3328 SW  /usr/lib/gconf2/gconfd-2 
  757 root      12832 SW< /usr/bin/Xomap -mouse tslib -nozap -dpi 96 -wr -nolis
  784 user       1444 SW< /usr/sbin/temp-reaper 
  787 user       2048 SW< /usr/bin/dbus-daemon --session 
  791 user       7204 SW< /usr/lib/sapwood/sapwood-server 
  795 user       6520 SW< /usr/bin/matchbox-window-manager -theme echo -use_tit
  803 root            SW< [dsp/0]
  806 root            SW< [dsp/0]
  809 root       3668 SW  /usr/sbin/dsp_dld -p --disable-restart -c /lib/dsp/ds
  822 root       2792 SW< /usr/bin/bme-dbus-proxy -N 
  874 root       4804 SW  /usr/sbin/multimediad 
  879 root       2176 SW< /usr/bin/esd 
  928 root            SW< [file-storage-ga]
  932 haldaemon   2508 SW  hald-addon-usb-cable: listening on /sys/devices/plat
  940 root       7852 SW< /usr/sbin/ke-recv 
  958 user      18960 SW< /usr/bin/maemo-launcher --daemon --send-app-died --bo
  990 user       1448 SW< /usr/bin/hildon-input-method 
  991 user      34400 SW< /usr/bin/hildon-input-method                         
  994 user       3916 SW< /usr/bin/clipboard-manager 
 1035 user       1448 SW< /usr/bin/hildon-desktop 
 1036 user      40636 SW< /usr/bin/hildon-desktop                              
 1042 nobody     1812 SW< /usr/sbin/dnsmasq -k -i lo -a 127.0.0.1 -z 
 1056 root       2520 SW< /usr/sbin/hcid -n -x -s 
 1059 root       2452 SW< /usr/lib/bluetooth/bluetoothd-service-audio 
 1064 root       2412 SW< /usr/lib/bluetooth/bluetoothd-service-input 
 1069 root       2924 SW< /usr/bin/btcond -l 
 1072 root       3580 SW< /usr/sbin/wlancond 
 1075 root       3204 SW< /usr/sbin/icd 
 1116 user       1448 SW< /usr/bin/osso-connectivity-ui-conndlgs 
 1117 user      20616 SW< /usr/bin/osso-connectivity-ui-conndlgs               
 1121 user      77044 SW< /usr/bin/osso-media-server 
 1146 systemui  23328 SW< /usr/bin/systemui 
 1151 user       3192 SW< /usr/bin/obexsrv -l 
 1171 user       5300 SW  /usr/bin/alarmd 
 1191 messagebus  14392 SW  /usr/bin/eapd 
 1208 root       5084 SW< /usr/sbin/hulda 
 1209 root       3036 SW< /usr/sbin/hulda 
 1229 user      16104 SW  /usr/bin/mediaplayer-engine 
 1235 user      16172 RWN /usr/bin/metalayer-crawler -F 
 1250 user       7876 SW  /usr/libexec/gnome-vfs-daemon 
 1253 root       1472 SW< /sbin/udhcpc -i wlan0 -s /etc/udhcpc/udhcpc.script -f
 2498 root       5776 SW  sshd: root@pts/0  
 2500 root       1960 SW  -sh 
 2514 root       5776 SW  sshd: root@pts/1  
 2516 root       1964 SW  -sh 
 2563 user       1448 SW  /usr/bin/osso_rss_feed_reader 
 2564 user      28632 RW  /usr/bin/osso_rss_feed_reader                        
 2566 user      84140 RW  /usr/bin/browser 
 2568 root       1884 SW  /bin/sh ./s1 
 2570 root       1960 RW  ps 


#===============================================================================
<4> ls -l /proc/*/fd
#===============================================================================
NOTE: The following error came up: 
ls: /proc/self/fd/3: No such file or directory

{see attachment for the actual output}


#===============================================================================
<5> df
#===============================================================================
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mtdblock4            2048      2048         0 100% /mnt/initfs
none                       512       104       408  20% /mnt/initfs/tmp
/dev/mtdblock4          257536    126644    130892  49% /
none                       512       104       408  20% /tmp
none                      1024        20      1004   2% /dev
tmpfs                     1024        24      1000   2% /dev/shm
/dev/mmcblk0p1         1992320    967712   1024608  49% /media/mmc2
/dev/mmcblk1p1          993376     38532    954844   4% /media/mmc1


#===============================================================================
<6> dmesg
#=============================================================================== 

{see attachment for the actual output}
Comment 21 Nicolas D.L. Subotich 2008-02-27 04:22:25 UTC
Created an attachment (id=728) [details]
Attachment 1 [details]of2 for Comment #20

nsubotich / 2008-02-26 / Output from ls -l /proc/*/fd
Comment 22 Nicolas D.L. Subotich 2008-02-27 04:23:16 UTC
Created an attachment (id=729) [details]
Attachment 2 [details]of2 for Comment #20

nsubotich / 2008-02-26 / Output from dmesg
Comment 23 Eero Tamminen nokia 2008-02-27 16:10:40 UTC
> Ok, here we go... I had a couple of lockups even with the new firmware.

Thanks!  Unfortunately these really don't have anything that would tell
why pdflush actually takes CPU.

I noticed the following strange things:

* Media server taking few tens of MBs more memory than normally:
 1121 user      77044 SW< /usr/bin/osso-media-server

* Something (GPE calendar?) being fond of keeping .gpe directory open
  multiple times and already deleted files still open:
/proc/1036/fd:\
[...]
lrwx------    1 user     users          64 Feb 25 23:33 13 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 14 ->
/var/tmp/sqlite_apmm5FdwaxvHrPw (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 15 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 16 ->
/var/tmp/sqlite_qhTn5E5hiDcRh0X (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 17 -> socket:[4234]\
lrwx------    1 user     users          64 Feb 25 23:33 18 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 19 ->
/var/tmp/sqlite_M7MBsj4gsPzGQxu (deleted)\
l-wx------    1 user     users          64 Feb 25 23:33 2 -> /dev/null\
lrwx------    1 user     users          64 Feb 25 23:33 20 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 21 ->
/var/tmp/sqlite_SnDpGr3Sy4pnKZG (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 22 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 23 ->
/var/tmp/sqlite_SDu17gOiWKPh0ej (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 24 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 25 ->
/var/tmp/sqlite_hMq4knVq5yPvpW7 (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 26 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 27 ->
/var/tmp/sqlite_OZ0kh39UKykbRwJ (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 28 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 29 ->
/var/tmp/sqlite_vgrP0TcjAqsUbpq (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 3 -> socket:[3876]\
lrwx------    1 user     users          64 Feb 25 23:33 30 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 31 ->
/var/tmp/sqlite_t6RytCyTuZ07f7W (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 32 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 33 ->
/var/tmp/sqlite_0kXglm1uhoS5EL2 (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 34 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 35 ->
/var/tmp/sqlite_VX2u5dEjnVq3HvS (deleted)\
lrwx------    1 user     users          64 Feb 25 23:33 36 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 25 23:33 37 ->
/var/tmp/sqlite_8xH9nzf1C6T00Sm (deleted)\
lrwx------    1 user     users          64 Feb 26 00:33 38 ->
/home/user/.gpe/calendar\
lrwx------    1 user     users          64 Feb 26 00:33 39 ->
/var/tmp/sqlite_U1AjonG21jYO4Oh (deleted)\
[...]


Anyway, my current suspicion is that your device is swapping and for
some reason it just doesn't show up correctly in "meminfo" Writeback
and Dirty fields.

Could you try disabling swap?  If it improves the situation considerably
or at least the slowness doesn't come from pdflush, I would say that the
pdflush case is only about swapping being slow (like it's also on Desktop
machines).  The quality of the memory card on which the swapping happens
can also have effect on this (I guess most manufacturers' memory cards are
not really designed/tested for that kind of use-case).

Media server taking more RAM than normally makes it more like for
there to be swapping (to get rid of its memory usage, you can kill
it as root, then it's just restarted).  What kind of media files
you're playing with it?  (mp3, mpg, avi, wma, real, ogg...?)

If your internal (256MB) JFFS2 filesystem is full, it will also consume
more RAM, which again makes swapping more likely.  In your case that
didn't seem an issue though as half of it was free.
Comment 24 Nicolas D.L. Subotich 2008-02-28 04:09:45 UTC
Thanks for checking out the info I posted.
I will try to get another SD card and see if that makes any difference.
In the meantime, some comments:

- I've got the whole suite of GPE installed, including the GPE Summary applet,
configured to show calendar events and to-do entries. This applet might be the
one keeping the .gpe files open.

- I also installed Video Center and Media Streamer. I don't think I played
anything other that some sample mp3, jpeg, mp4 and flash files. I use Media
Streamer to pull content off an HP MediaVault using TwonkyMedia, if that
matters.

- I do not know if it is possible, but it seems that when pdflush hogs the CPU
and I issue the meminfo command, it doesn't actually execute until pdflush lets
go for a while and then meminfo didn't really capture the actual status DURING
the hung up. I started checking around about using "renice" for lowering the
priority of pdflush and increasing the priority of the xterm so that I could
get meminfo to capture useful data. Would this be useful or just dumb?

Thanks again,

Nicolas
Comment 25 Eero Tamminen nokia 2008-02-28 16:33:18 UTC
(In reply to comment #24)
> Thanks for checking out the info I posted.
> I will try to get another SD card and see if that makes any difference.
> In the meantime, some comments:
> 
> - I've got the whole suite of GPE installed, including the GPE Summary applet,
> configured to show calendar events and to-do entries. This applet might be the
> one keeping the .gpe files open.

That shouldn't be causing this issue, but it might be leaking file descriptors
(they don't take much memory, but if process or system runs completely out of
them, it stops working properly).


> - I also installed Video Center and Media Streamer. I don't think I played
> anything other that some sample mp3, jpeg, mp4 and flash files. I use Media
> Streamer to pull content off an HP MediaVault using TwonkyMedia, if that
> matters.

If Media Streamer buffers the media locally, that could explain pdflush.
You could try stracing it a bit to see whether it indeed does buffering:
  strace -e trace=file <PID>

You can get strace from maemo tools repo:
  http://maemo.org/development/tools/


> - I do not know if it is possible, but it seems that when pdflush hogs
> the CPU and I issue the meminfo command, it doesn't actually execute until
> pdflush lets go for a while and then meminfo didn't really capture
> the actual status DURING the hung up.

Ok, if pdflush had already gotten _all_ pending data written to disk,
that would explain why Writeback and Dirty are zero.


> I started checking around about using "renice" for lowering the
> priority of pdflush and increasing the priority of the xterm so that I could
> get meminfo to capture useful data. Would this be useful or just dumb?

Pdflush is a kernel thread, I don't think anything you can do from 
user-space has an effect on it.

There are some settings in /proc/sys/vm for controlling pdflush though:
------------------------------------------------------
dirty_background_ratio
----------------------

Contains, as a percentage of total system memory, the number of pages at which
the pdflush background writeback daemon will start writing out dirty data.

dirty_writeback_centisecs
-------------------------

The pdflush writeback daemons will periodically wake up and write `old' data
out to disk.  This tunable expresses the interval between those wakeups, in
100'ths of a second.  Setting this to zero disables periodic writeback 
altogether.

dirty_expire_centisecs
----------------------

This tunable is used to define when dirty data is old enough to be eligible
for writeout by the pdflush daemons.  It is expressed in 100'ths of a second.
Data which has been dirty in-memory for longer than this interval will be
written out next time a pdflush daemon wakes up.
------------------------------------------------------

Current defaults are:
  for i in /proc/sys/vm/dirty_*_*; do echo "$i: $(cat $i)"; done
/proc/sys/vm/dirty_background_ratio: 6
/proc/sys/vm/dirty_expire_centisecs: 3000
/proc/sys/vm/dirty_writeback_centisecs: 500

You could try playing with them a bit if disabling swap doesn't help.

You can't make the dirty_background_ratio much smaller though, I think
minimum is 5(%). 0.05*128 = 6.4MB data to write when pdflush is invoked.
Normally that shouldn't be a problem though, SD cards have few MB
write speeds, but in some situations and with some cards that can drop
under 1MB i.e. data (music) could in some cases be streamed&buffered
faster over WLAN than the card can cope with.  

Trying a brand new & faster card could also help.
Comment 26 Eero Tamminen nokia 2008-02-28 17:23:44 UTC
Alternative you could use this to see disk accesses, it follows whole system
unlike strace, but it doesn't all disk accesses like strace can do:
   echo 1 >/proc/sys/vm/block_dump

you can then see the file accesses with dmesg.  I have no idea how this affects
device performance otherwise though, but echoing "0" there or rebooting gets
you
rid of this setting.
Comment 27 Neil MacLeod (reporter) maemo.org 2008-02-28 18:44:55 UTC
Just to say I experienced pdflush with practically nothing installed - just
ssh-server, so it's highly unlikely the cause is a third party app (unless it's
ssh) but I suppose some apps could trigger the problem more than others.

As for it being a problem with the SD cards - these cards worked perfectly with
OS 2007 and pdflush has only been a problem since OS 2008. I'm not saying it
isn't the cards but it would be a huge coincidence for us all to be
experiencing pdflush problems due to our SD cards going bad just after we had
all installed OS 2008. I don't recall ever seeing anyone report pdflush
problems under OS 2007 either despite the multitude of SD card/corruption
problems that OS had.
Comment 28 Nicolas D.L. Subotich 2008-02-29 05:16:15 UTC
Thanks for the advise. I will try to gather the information over the next
couple of days.

I also have to agree with the post above about the problem never happening on
OS2007. So far, the problem wasn't significant on OS2008 right after a reflash
either. The couple of times I reflashed it took about two weeks for the problem
to become quite noticeable again.

I'll keep checking, anyway. Thanks again.
Comment 29 Neil MacLeod (reporter) maemo.org 2008-02-29 05:24:27 UTC
(In reply to comment #28)
> So far, the problem wasn't significant on OS2008 right after a reflash
> either. The couple of times I reflashed it took about two weeks for the problem
> to become quite noticeable again.

That sounds about right - it's not an immediate problem but seems to occur
after a week or two and then it becomes a persistant issue until the next
reflash. I wish I could contribute more to this bug but my current
circumstances mean I have little time for Nokia tablets.
Comment 30 Eero Tamminen nokia 2008-02-29 09:54:21 UTC
Ok, thanks.

Is it still valid that nobody's encountered this on N810?
Comment 31 Neil MacLeod (reporter) maemo.org 2008-02-29 13:01:44 UTC
(In reply to comment #30)
> Ok, thanks.
> 
> Is it still valid that nobody's encountered this on N810?
> 

From the little time I have used my N810, which was last flashed well over 2
weeks ago (with 2.2007.50-2), I would say that it is still valid that this does
not affect the N810.
Comment 32 Jay Swackhamer 2008-02-29 18:03:07 UTC
I am experiencing this on a N810. With a new flash, no extra card, rss off, no
new apps, during browsing, with swap off or on.

(In reply to comment #31)
> (In reply to comment #30)
> > Ok, thanks.
> > 
> > Is it still valid that nobody's encountered this on N810?
> > 
> 
> From the little time I have used my N810, which was last flashed well over 2
> weeks ago (with 2.2007.50-2), I would say that it is still valid that this does
> not affect the N810.
>
Comment 33 Eero Tamminen nokia 2008-03-03 12:54:32 UTC
There's one more way to find more information about what is actually happening
(as ps, list of all open files in system and dmesg didn't help) if you have
time for it.

* First, install "sysklogd" package from maemo repos, according to
  the repository instructions at the bottom of this page:
   http://maemo.org/development/tools/

* When the pdflush issue happens the next time, please do following as root
  while it happens:
   # echo "t" > /proc/sysrq-trigger

  This outputs short backtrace for all kernel tasks (including user-space
  processes) to dmesg & syslog.  Syslog is needed because our dmesg buffer
  is too small for the whole task list.  (The "dmesg" output could be enough
  if the part that fit into dmesg buffer contains relevant tasks.)

* Then attach the syslog.  It's preferable to get several task lists from 
  different pdflush issues in case some attempts missed it. The log is saved
  to /var/log/syslog* files on the device.

  Note that you might want to edit the syslog a bit, it can contain sensitive
  information (URLs etc) you might not want in public Bugzilla.

You can remove syslog package and its logs once testing is done, they slow
down your device (cause more need for JFFS2 GCing, fill it etc).


/proc/slabinfo contents just before or during the pdflush issue are also
interesting, but less important than the task backtraces.
Comment 34 Nicolas D.L. Subotich 2008-03-04 15:34:00 UTC
Created an attachment (id=734) [details]
Attachment for Comment #34
Comment 35 Nicolas D.L. Subotich 2008-03-04 15:42:48 UTC
The whole issue started to be much more noticeable since yesterday. Here are
some details:

- I did not have the time to get the additional trace information.
- I attached another set of files to this bug (see Attachment section at the
top)
- This time around it happened while using the Application Manager tool. It
took well over 5 minutes just to update the list of applications and realize
that there was one update (OMWeather). Updating the application itself took
almost another 5 minutes.
- During that time, pdflush and jffs2_gcd_mtd4 where both quite active, but
mostly it was pdflush that used most of the CPU time.
- In this ocassion, the meminfo output shows a value of 4KB for Dirty (this
value stayed there for a while).
- I did not install any new applications or did anything significant on the SD
card.
- Boot up times are much longer now. It takes about 60 seconds compared to 10
right after a reflash.

I will try to capture more details tonight, time permitting.
Comment 36 Eero Tamminen nokia 2008-03-04 18:03:59 UTC
(In reply to comment #35)
> The whole issue started to be much more noticeable since yesterday.
> Here are some details:

Thanks!

> - I did not have the time to get the additional trace information.
> - I attached another set of files to this bug (see Attachment section at
>   the top)

There's nothing really suspicious in these files either.
Only strange thing is this in /proc/meminfo:
  Committed_AS: 4294887304 kB

But as that was reasonable in comment 20 and doesn't match what is reported
now by "ps" as memory usage of processes, I guess it's some unrelated issue.

Is this number still ridiculously large after pdflush has stopped?


> - This time around it happened while using the Application Manager tool. It
> took well over 5 minutes just to update the list of applications and realize
> that there was one update (OMWeather). Updating the application itself took
> almost another 5 minutes.

According to open files list, package management seems to be creating
and deleting some files while doing the update.  They shouldn't be that
large, so unless you do this very often (many times each day), it
shouldn't be any problem.


> - During that time, pdflush and jffs2_gcd_mtd4 where both quite active, but
> mostly it was pdflush that used most of the CPU time.

I've discussed about this here and pdflush can be triggered also by JFFS2,
it just not as regular/normal as with memory cards (JFFS2 is synchronous,
FAT is used asynchronously).  I.e. memory card usage (swap etc) isn't
necessarily needed for pdflush to do things, it just increases it.


> - In this ocassion, the meminfo output shows a value of 4KB for Dirty (this
> value stayed there for a while).

I guess we can ignore Dirty/Writeback as they don't seem to be significant.


> - I did not install any new applications or did anything significant on the SD
> card.
> - Boot up times are much longer now. It takes about 60 seconds compared to 10
> right after a reflash.

Boot time (from power button press until Desktop comes up) should
be a bit more than 30 secs after flashing.  If the filesystem later
gets full or JFFS2 needs to do GCing, it can take longer (even over
minute), but once the GCing is done and there's free space, it
shouldn't be anymore an issue.

According to "df" output your root file system is still relatively
empty, so only explanation for the slowness I can think of is that
something is frequently creating and removing files, or updating
them on your root file system and because of that, jffs2 needs to
do a lot of garbage collecting.


Do you have E-mail set up with periodic updates?  What about RSS?
Have you enabled automatic RSS feed updates and what is the interval
and your cache size? Have you large feeds?


> I will try to capture more details tonight, time permitting. 

When this happens next, please get /proc/slabinfo, then do
as root:
   echo "t" > /proc/sysrq-trigger

And attach here both slabinfo and dmesg contents.

If they don't list what's triggered pdflush and whether kernel has large
internal memory usage, then we need syslog to get the whole task list
as described in comment 33.


Btw. Just to make sure, I guess you're not running your device in R&D mode
(you shouldn't)?
Comment 37 Nicolas D.L. Subotich 2008-03-04 18:35:13 UTC
I will try to get the additional information tonight but, for the time being,
here are some of the answers:
> 
> Do you have E-mail set up with periodic updates?  What about RSS?
> Have you enabled automatic RSS feed updates and what is the interval
> and your cache size? Have you large feeds?
> 

I used to have e-mail setup with periodic updates (once an hour) until last
Saturday, when I switched to Claws e-mail. I removed the e-mail account from
the embedded e-mail application and configured it under Claws. I do not leave
Claws running in the background either.
As fas as RSS, I have these five feeds configured and they update every 15
minutes (cache size is the default):
- The one that comes with the N800 for Nokia updates
- usatoday.com (headline news)
- azcentral.com (local AZ headline news)
- BBC.com (world news)
- slashdot.com (general) <- This is the most "verbose" of the feeds

I also have the OMWeather applet configured for two cities and updating every
30 minutes.

> 
> Btw. Just to make sure, I guess you're not running your device in R&D mode
> (you shouldn't)?
> 

No, I am not running in R&D mode. I use "ssh root@localhost" when needed.

Thanks again!
Comment 38 Nicolas D.L. Subotich 2008-03-05 06:01:26 UTC
Created an attachment (id=735) [details]
Attachment for Comment #39
Comment 39 Nicolas D.L. Subotich 2008-03-05 06:52:24 UTC
I captured some more info from a new event. Here are the details:

- I have not installed sysklogd yet, but I captured all the other information.

- For this event, I ran a remote ssh session from another PC and I ran a
"renice -20 {pid}" for the sshd process associated with it. I had "top -d 1"
running on it to monitor pdflush.

- This time around it was much "harder" to get the problem to happen. The last
time it was enough to do a refresh from the Application Manager. Now, I even
had the following applications running and the problem still would not happen:
2 browser windows, Garnet VM, Claws-mail, Skype, RSS Feed Reader, Application
Manager. 

- At this point, osso-media-server's memory usage was 7416 as reported by top.
Also, the Committed_AS value from /proc/meminfo was around 50,000KB.

- I closed all the applications listed before and I then launched Video Center
and played a video from there (Surf's Up! trailer). pdflush started showing up
but for a few seconds at a time. osso-media-server's memory usage went way up
(90,000+), as expected (I guess).

- After closing Video Center pdflush run for a while (15~20 seconds, but not
minutes, like it did yesterday). At this point I captured the files (see
attachment 20080304.zip)

- After this new "incident", osso-media-server's memory usage stayed at 44704
and the Committed_AS from /proc/meminfo is now again at 4294963104 kB.

- After yesterday's event, I also noticed that osso-media-server had a higher
memory usage than the usual 7416, so I killed it. I used the N800 for a while
(didn't notice pdflush getting in the way) and eventually rebooted because I
just had installed an update for OMWeather (which requires a reboot).

The trigger for this event was different than the others, but the final symptom
was quite similar, although pdflush did not run for as long as it did
yesterday.

I hope this helps in some way and thanks again for your patience...
Comment 40 Nicolas D.L. Subotich 2008-03-05 07:13:33 UTC
I kept on playing a little more and here are some more findings about
osso-media-server's memory usage and the Committed_AS results:

- I did a "killall osso-media-server" and the process restarted shorly after
(memory usage: 7416). Commited_AS still showed as a huge number.
- I played the N800 video that came with the last flash image and closed it.
osso-media-server's memory usage was now at 52000+ and the Committed_AS went
back down to 33000+.
- I killed the process again and now Committed_AS shows a huge number again.
The number is the same range but slightly different.

Like you said, this might have nothing to do with the pdflush issue, though...
Comment 41 Eero Tamminen nokia 2008-03-07 10:09:04 UTC
Thanks, sorry for the late reply (I though I had commited this)!

Unfortunately dmesg didn't catch pdflush, I guess we need syslog
for that.  (you don't need to provide other data besides slabtop and
task list anymore as the other data didn't reveal anything interesting)

I think this bug is related to how much there are file changes happening in
the system, how large those files are and whether files are modified in place.
You had regular hourly email updates and quarter hourly RSS-updates, and
problem lessened when email updates were stopped.

Was Modest the E-mail application you used before Claws, or was it
the pre-installed Chinook E-mail application?

I guess Video center doesn't buffer files to file system, but do you use
often any other applications that could write often to the file system
(for example have you chat app logging the messages etc)?
Comment 42 Nicolas D.L. Subotich 2008-03-08 07:34:14 UTC
Thanks for the reply, but unfortunately I have not had the time to try your
recommendations yet. I hope to get the additional information over the weekend.
Comment 43 Nicolas D.L. Subotich 2008-03-08 18:06:23 UTC
Created an attachment (id=736) [details]
Attachment 1 [details]of2 for Comment #45
Comment 44 Nicolas D.L. Subotich 2008-03-08 18:06:49 UTC
Created an attachment (id=737) [details]
Attachment 2 [details]of2 for Comment #45
Comment 45 Nicolas D.L. Subotich 2008-03-08 18:17:43 UTC
OK, I've captured two more events, this time with sysklogd installed. Here is
the sequence of events:

- Install sysklogd
- Reboot
- ssh from another PC, then "renice -20 pid" and kept "top -d 1" running until
the pdflush problem appeared.
- Started videocenter, refreshed the list of videos,  played Talladega Nights, 
closed videocenter. pdflush showed up but briefly (few seconds)
- Opened a browser window, called up google news, left the window open.
- Open application manager, refresh application list -> BINGO! pdflush at full
blast.
- Stopped top and ran the first script to capture info. This took a couple of
minutes.
- Copied the last output from top into a file called "top".
- Waited for the application manager to finish refreshing the list (a few more
minutes).
- Ran the 2nd script to capture all the other information.
==> The files for this first event are in the attachment 20080308_1.zip

- Restarted "top -d 1"
- connected to USB and transferred the zip to the PC. Disconnected
- Noticed that metalayer-crawler started using up all the CPU for a while right
after unmounting the USB-connected volumes (mmc1/mmc2). After metalayer-crawler
was done, BINGO! pdflush at full blast again.
- Captured files like in the first instance.
==> The files for this seconds event are in the attachment 20080308_2.zip

I hope this helps and thank you again for your continued cooperation!
Comment 46 Nicolas D.L. Subotich 2008-04-02 05:31:12 UTC
Hi,

I just came back from vacations and I didn't see any updates on this bug. Did
anything ever come out of the last set of logs I submitted (#43 & 44)?

To make matters worse, the day I left for vacations, the N800 started rebooting
by itself non-stop, so I had to reflash it today... 

Nicolas
Comment 47 Eero Tamminen nokia 2008-04-03 16:53:14 UTC
(In reply to comment #46)
> I just came back from vacations and I didn't see any updates on this bug.
> Did anything ever come out of the last set of logs I submitted (#43 & 44)?

In the first one apt-worker was stalled here:
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] apt-worker    D C02512D0
    0  1416   1415 (NOTLB)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c0250de0>]
(schedule+0x0/0x5bc) from [<c0250c4c>] (__down+0x9c/0xe8)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c0250bb0>]
(__down+0x0/0xe8) from [<c0250b4c>] (__down_failed+0xc/0x20)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312]  r8 = 0000002C  r7 =
C11EE4E0  r6 = C7F39C2C  r5 = C11EE508
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312]  r4 = C7F39C00 
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c00e8fd4>]
(jffs2_reserve_space+0x0/0x240) from [<c00ec1fc>]
(jffs2_write_inode_range+0x5c/0x460)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c00ec1a0>]
(jffs2_write_inode_range+0x0/0x460) from [<c00e6be8>]
(jffs2_commit_write+0x1c4/0x31c)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c00e6a24>]
(jffs2_commit_write+0x0/0x31c) from [<c007eb10>]
(generic_file_buffered_write+0x41c/0x628)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c007e6f4>]
(generic_file_buffered_write+0x0/0x628) from [<c007f51c>]
(__generic_file_aio_write_nolock+0x568/0x598)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c007efb4>]
(__generic_file_aio_write_nolock+0x0/0x598) from [<c007f5c4>]
(generic_file_aio_write+0x78/0xf4)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c007f54c>]
(generic_file_aio_write+0x0/0xf4) from [<c009cf0c>] (do_sync_write+0xb0/0xf4)
Mar  8 08:30:03 Nokia-N800-51-3 kernel: [ 1052.570312] [<c009ce5c>]
(do_sync_write+0x0/0xf4) from [<c009d03c>] (vfs_write+0xec/0x178)

The other one had nothing suspicious looking.
Neither syslog had backtrace for pdflush for some reason. :-/


Both had this suspicious thing in slabinfo:
jffs2_refblock     28488  28500    256   15    1 : tunables  120   60    0 :
slabdata   1900   1900      0
  -> 7MB of RAM used for these kernel objects

Of the devices I could get my hand here, the one closest to this
amount had this at 22000 after ~90MB of data being written to the Flash.


I can only assume that your issue comes from lots of data being written,
updated or deleted on the Flash.  Pdflush should be working at low priority,
but processes can be waiting until it has finished doing something.

Some problematic use-cases for JFFS2:
- Files that are being frequently modified (the modifications can be queued
  and GCed only much later, in the meanwhile accesses to file get slower)
- Large files that are open (JFFS2 keeps in memory structure of the whole
  file system, open files take more memory)
- Files being written in very small pieces (JFFS2 needs to keep a large
  fragment tree for the file which is of course also slower to traverse,
  so Flash will GC such writes once it gets once compressed block full)


You can improve things by minimizing writes to the Flash (periodic updates
writing data to disk should either be disabled or interval increased, syslog
disabled).  Disabling swap could also help.



Thanks for your efforts, they are appreciated!

To debug this further, the use-case & difference to the device default state
would need to be minimized further to get a reliably reproducible test-case.
This doesn't appear in our internal testing, hopefully that's because its
fixed by other changes and not because it requires specific triggering
condition we don't know yet.


> To make matters worse, the day I left for vacations, the N800 started
> rebooting by itself non-stop, so I had to reflash it today... 

It shouldn't do that, even if Flash gets full.  What 3rd party applications
you had installed?
Comment 48 Nicolas D.L. Subotich 2008-04-04 07:15:06 UTC
Eero,

Thanks again for taking the time to look into this.

As far as what was installed at the time it started rebooting by itself... I
honestly didn't keep a record of everything that was installed. I don't think I
had installed anything more after getting syslogkd working, though.

This is the current status and my plan for the near future:
- Freshly reflashed to OS2008 v. 2.2007.51-3 on 4/1/2008.
- Restored backup (settings, documents, bookmarks, etc, but NOT the application
list).
- No new applications installed.
- RSS Reader configured for auto updates every 60 minutes with these feeds: BBC
News, AZCentral.com news, USA Today news and Slashdot main news.
- No change to the SD card.
- I will let it run for the next two or three weeks as is.
- If the problem comes back during that time, then I will repeat the log data
collection as before.
- If the problem does not happen again, I will install the applications that I
remember I had before and go from there...

Thanks again,

Nicolas
Comment 49 Peter 2008-04-04 08:02:27 UTC
Same Issue on my N800 that was just recently reflashed, metalayer-crawler was
using 98%, solved that issue and now it is jffs2 and pd_flush causing me the
issue.

Heard some comments about corrupted flash on some forums, I did load the flash
from a Windows Machine.

Do we have a Windows Bug?

Could a Disk Fix work on this?
Comment 50 Peter 2008-04-04 09:14:41 UTC
Did a powerdown, battery pull, Disk Fix, Block Scan, and a Defrag on two 4 GB
SDs. metalayer-crawler issue is back but PD_Flush and JFFS2 Issue isn't.

As soon as the Metalayer-crawler service is stopped/killed, PD_Flush and JFFS2
issue returns.

Memory on the two 4 gb SDs, internal one is 33% free and the removable is 47%
free.
Comment 51 Eero Tamminen nokia 2008-04-04 09:57:11 UTC
(In reply to comment #48)
> This is the current status and my plan for the near future:
> - Freshly reflashed to OS2008 v. 2.2007.51-3 on 4/1/2008.
> - Restored backup (settings, documents, bookmarks, etc, but NOT the
>   application list).
> - No new applications installed.
> - RSS Reader configured for auto updates every 60 minutes with these feeds:
>   BBC News, AZCentral.com news, USA Today news and Slashdot main news.
> - No change to the SD card.
> - I will let it run for the next two or three weeks as is.
> - If the problem comes back during that time, then I will repeat
>   the log data collection as before.

Sounds really good.  If you get it reproduced with this set, we could
then start narrowing it down to get it properly reproduced here also.


(In reply to comment #50)
> Did a powerdown, battery pull, Disk Fix, Block Scan, and a Defrag on two 4 GB
> SDs. metalayer-crawler issue is back but PD_Flush and JFFS2 Issue isn't.
> 
> As soon as the Metalayer-crawler service is stopped/killed, PD_Flush and JFFS2
> issue returns.

Interesting.  Could you strace the crawler to see what it does:
   strace -e trace=file -p $(pidof metalayer-crawler)
And add the information to bug 1842?

And when you see pdflush in top, could you do as root:
   # echo "t" > /proc/sysrq-trigger
And attach here what the dmsg command outputs after that?
Comment 52 Peter 2008-04-04 23:32:23 UTC
rebooted the device after disabling vitual memory, the issue has been reduced
by about 85%.

This has significantly improved performance.

metalayer-crawler is no longer consuming 98% of cpu anymore, stopping the
service is no longer needed.

I will run the tests as soon as this issue reoccurs.
Comment 53 Eero Tamminen nokia 2008-04-07 12:18:27 UTC
(In reply to comment #52)
> rebooted the device after disabling vitual memory, the issue has been reduced
> by about 85%.
> 
> This has significantly improved performance.
> 
> metalayer-crawler is no longer consuming 98% of cpu anymore, stopping the
> service is no longer needed.

Hm.  Crawler's monitoring memory card directory content changes, so it could
be woken also on all file changes (not just file additions & removal) to check
whether the file in question was a media file which requires metadata database
changes. As swap is a file on memory card FAT file system, that file contents
change on each swap update.

However, swap is always open and written with mmap() and at least when
using the the latest internal release on N810, swap writes do not trigger
crawler wakeup.  I.e. swap usage shouldn't affect crawler CPU usage, only
pdflush.

Which release and device you're using?
Comment 54 Peter 2008-04-08 08:38:12 UTC
Created an attachment (id=749) [details]
dmesg boot crc errors on flash

With vitual memory off, pd_flush doesn't cause an issue but now boot is very
slow.
Comment 55 Peter 2008-04-08 08:43:54 UTC
(In reply to comment #53)
> (In reply to comment #52)
> > rebooted the device after disabling vitual memory, the issue has been reduced
> > by about 85%.
> > 
> > This has significantly improved performance.
> > 
> > metalayer-crawler is no longer consuming 98% of cpu anymore, stopping the
> > service is no longer needed.
> 
> Hm.  Crawler's monitoring memory card directory content changes, so it could
> be woken also on all file changes (not just file additions & removal) to check
> whether the file in question was a media file which requires metadata database
> changes. As swap is a file on memory card FAT file system, that file contents
> change on each swap update.
> 
> However, swap is always open and written with mmap() and at least when
> using the the latest internal release on N810, swap writes do not trigger
> crawler wakeup.  I.e. swap usage shouldn't affect crawler CPU usage, only
> pdflush.
> 
> Which release and device you're using?
> 

N800 os2008 2.2007.51-3
Comment 56 Eero Tamminen nokia 2008-04-08 09:39:45 UTC
(In reply to comment #54)
> Created an attachment (id=749) [details] [details]
> dmesg boot crc errors on flash

The CRC notifications mean just that some block wasn't completely written
when device battery dropped off or HW watchdog rebooted the device.
They go away once JFFS2 gets around to garbage collecting ao. blocks
(can take weeks or months).  I.e. FS is still in consistent state,
you might have lost few last KB from last file write when the device
had went uncleanly down though.


> With vitual memory off, pd_flush doesn't cause an issue but now boot is very
> slow.

I assume this is due to either flash being very full (say less than 1/4
free) or garbage collection (after removing/updating files) or you have 
installed something that slows the boot.  If it's garbage collection,
make sure you have maybe 1/4 of the Flash free and it should eventually
go away unless your apps constantly write & delete stuff from flash.
Turning off RSS and Email refreshes helps in this.

If you need free space, move the example content from MyDocs to a memory card.
Comment 57 Eero Tamminen nokia 2008-04-09 17:10:49 UTC
We have now seen similar issue here.  It required running device for a couple
of weeks with syslog enabled, frequent RSS refreshes etc.  The issue seems to
be basically JFFS2 and its scalability and fragmentation (256MB is on the limit
of its usability but there isn't any better & well tested replacement Flash
file system).

If there's a lot of frequent file system write activity, JFFS2 can eventually
start spending a lot of time garbage collecting and if something is trying to
write to same file system at the same time while this happens, it will be 
blocked.  Often GCing can actually be triggered by a write.  If the file
system activity is reduced significantly, the issue will (eventually) go
away.

We've identified some issues in RSS feed caching which can cause more
disk activity than is necessary.  Hopefully fixing them helps with
this bug.

What are your RSS cache sizes and refresh rates?  Have you some large
feeds with large images (such as planet.gnome)?
Comment 58 Tobias Löfgren 2008-04-13 12:42:15 UTC
After reading through all the descriptions here, seeing RSS popping up here and
there and realizing that my pdflush-slowdown problem appeared after I started
using the built in RSS reader last week for the first time since I bought the
tablet a year ago I put two and two together. 

I had only one feed in the reader:

http://feeds.feedburner.com/tedtalks_video

from the page http://www.ted.com/talks

There's no images in that feed, but there are direct links to mp4-video files.
Perhaps the reader tries to cache them? 

When I set image cache to off, removed auto update, cleared the image cache and
removed the ted-talk feed the pdflash-problem disappeared after a reboot.
Comment 59 Nicolas D.L. Subotich 2008-04-14 06:33:18 UTC
Eero,

Thanks for the news! The fact the you've reproduced the problem is very
encouraging. 
As far as RSS settings, I had them setup to update once every 15 minutes, I
believe, and I had 5 feeds configured, none of them with big graphics.
Actually, I think I had the cache set to OFF the last time (at least that is
what I have now after reflashing and restoring the old settings). Some of the
feeds have ads in them, though.
I have now disabled automatic RSS updates, so I will see if anything changes in
the next week or so.
The only application I have that updates periodically is OMWeather (once an
hour now, it used to be once every 30 minutes).

Thanks again for the continued effort in trying to get this one resolved!

Big Thanks,

Nicolas
Comment 60 Benoît HERVIER 2008-06-13 12:07:03 UTC
I ve exactly the same problem ... on n810 on chinook
RSS unactivated ...

Boot take between 5 to 15 minutes. And can't use at all swap.
I ve also noticed that moving an applet in hildon-desktop make pdflush eating
cpu.

Internal sd was formatted. And this is the same thing with and without sd card.

dmesg isn't usefull (but in case of) :
stering interrupt 6 for device hookdet
[    0.570312] OMAP2 OneNAND driver initializing
[    0.570312] omap2-onenand omap2-onenand: initializing on CS0, phys base
0x04000000, virtual base c8880000
[    0.578125] OneNAND Manufacturer: Samsung (0xec)
[    0.578125] Muxed OneNAND 256MB 1.8V 16-bit (0x40)
[    0.578125] OneNAND version 0x0022
[    0.593750] Scanning device for bad blocks
[    0.726562] Creating 5 MTD partitions on "omap2-onenand":
[    0.726562] 0x00000000-0x00020000 : "bootloader"
[    0.726562] 0x00020000-0x00080000 : "config"
[    0.726562] 0x00080000-0x00280000 : "kernel"
[    0.726562] 0x00280000-0x00480000 : "initfs"
[    0.726562] 0x00480000-0x10000000 : "rootfs"
[    0.734375] TSC2301 driver initializing
[    0.734375] Initializing USB Mass Storage driver...
[    0.734375] usbcore: registered new interface driver usb-storage
[    0.734375] USB Mass Storage support registered.
[    0.734375] usbcore: registered new interface driver libusual
[    0.734375] usbcore: registered new interface driver usbhid
[    0.734375] drivers/usb/input/hid-core.c: v2.6:USB HID core driver
[    0.734375] drivers/usb/net/rtl8150.c: rtl8150 based usb-ethernet driver
v0.6.2 (2004/08/27)
[    0.734375] usbcore: registered new interface driver rtl8150
[    0.734375] usbcore: registered new interface driver cdc_ether
[    0.734375] usbcore: registered new interface driver net1080
[    0.734375] usbcore: registered new interface driver plusb
[    0.734375] usbcore: registered new interface driver rndis_host
[    0.734375] usbcore: registered new interface driver cdc_subset
[    0.734375] usbcore: registered new interface driver zaurus
[    0.742187] usbcore: registered new interface driver usbtest
[    0.742187] mice: PS/2 mouse device common for all mice
[    0.742187] TSC2005 driver initializing
[    0.742187] input: TSC2301 touchscreen as /class/input/input3
[    0.742187] tmp105: initializing
[    0.742187] menelaus 1-0072: Setting voltage 'VAUX' to 2800 mV (reg 0x0a,
val 0xb5)
[    0.757812] tmp105: found at address 0x48
[    0.796875] hci_h4p hci_h4p: Registering HCI H4P device
[    1.015625] menelaus 1-0072: Setting voltage 'VMMC' to 2800 mV (reg 0x0a,
val 0x75)
[    1.039062] menelaus 1-0072: Setting voltage 'VMMC' to 3100 mV (reg 0x0a,
val 0xf5)
[    1.062500] menelaus 1-0072: Setting voltage 'VMMC' to 3000 mV (reg 0x0a,
val 0xb5)
[    1.078125] Advanced Linux Sound Architecture Driver Version 1.0.14rc3 (Wed
Mar 14 07:25:50 2007 UTC).
[    1.078125] omap24xx-eac omap24xx-eac: EAC version: 1.0
[    1.109375] ALSA device list:
[    1.109375]   #0: OMAP24xx EAC with codec TLV320AIC33
[    1.109375] Netfilter messages via NETLINK v0.30.
[    1.109375] ip_tables: (C) 2000-2006 Netfilter Core Team
[    1.109375] TCP cubic registered
[    1.109375] NET: Registered protocol family 1
[    1.109375] NET: Registered protocol family 17
[    1.109375] Bluetooth: L2CAP ver 2.8
[    1.109375] Bluetooth: L2CAP socket layer initialized
[    1.109375] Bluetooth: SCO (Voice Link) ver 0.5
[    1.109375] Bluetooth: SCO socket layer initialized
[    1.109375] Bluetooth: RFCOMM socket layer initialized
[    1.109375] Bluetooth: RFCOMM TTY layer initialized
[    1.109375] Bluetooth: RFCOMM ver 1.8
[    1.109375] Bluetooth: BNEP (Ethernet Emulation) ver 1.2
[    1.109375] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[    1.109375] Power Management for OMAP2 initializing
[    1.109375] PRCM revision 1.0
[    1.109375] DVS for OMAP2 initializing
[    1.109375] VFP support v0.3: implementor 41 architecture 1 part 20 variant
b rev 2
[    1.109375] Disabling unused clock "uart3_fck"
[    1.109375] Disabling unused clock "uart3_ick"
[    1.109375] Disabling unused clock "uart2_fck"
[    1.109375] Disabling unused clock "uart2_ick"
[    1.109375] Disabling unused clock "gpt12_fck"
[    1.117187] Disabling unused clock "gpt12_ick"
[    1.117187] Disabling unused clock "gpt11_fck"
[    1.117187] Disabling unused clock "gpt11_ick"
[    1.117187] Disabling unused clock "gpt10_fck"
[    1.117187] Disabling unused clock "gpt10_ick"
[    1.117187] Disabling unused clock "gpt9_fck"
[    1.117187] Disabling unused clock "gpt9_ick"
[    1.117187] Disabling unused clock "gpt8_fck"
[    1.117187] Disabling unused clock "gpt8_ick"
[    1.117187] Disabling unused clock "gpt7_fck"
[    1.117187] Disabling unused clock "gpt7_ick"
[    1.117187] Disabling unused clock "gpt6_fck"
[    1.117187] Disabling unused clock "gpt6_ick"
[    1.117187] Disabling unused clock "gpt5_fck"
[    1.117187] Disabling unused clock "gpt5_ick"
[    1.117187] Disabling unused clock "gpt4_fck"
[    1.117187] Disabling unused clock "gpt4_ick"
[    1.117187] Disabling unused clock "gpt3_fck"
[    1.117187] Disabling unused clock "gpt3_ick"
[    1.117187] Disabling unused clock "gpt2_fck"
[    1.117187] Disabling unused clock "gpt2_ick"
[    1.117187] Disabling unused clock "dss2_fck"
[    1.117187] Bootup reason: pwr_key
[    1.117187] OMAP GPIO switch handler initializing
[    1.117187] gpio-switch: required switch not found (-1, cam_act)
[    1.117187] gpio-switch: required switch not found (-1, cam_turn)
[    1.117187] slide (GPIO 110) is now closed
[    1.117187] kb_lock (GPIO 102) is now open
[    1.117187] headphone (GPIO 107) is now disconnected
[    1.117187] menelaus 1-0072: Setting voltage 'VMEM' to 1500 mV (reg 0x0a,
val 0xb4)
[    1.140625] menelaus 1-0072: Setting voltage 'VIO' to 2500 mV (reg 0x0a, val
0xb8)
[    1.187500] omap24xxcam omap24xxcam: sensor is TCM825x
[    1.203125] mmcblk0: mmc0:d555 SU06G 5970432KiB 
[    1.203125]  mmcblk0: p1
[    1.203125] mmci-omap mmci-omap.1: command timeout (CMD8)
[    1.203125] mmci-omap mmci-omap.1: command timeout (CMD8)
[    1.218750] menelaus 1-0072: Shutting off 'VIO'
[    1.218750] mmci-omap mmci-omap.1: command timeout (CMD8)
[    1.218750] mmci-omap mmci-omap.1: command timeout (CMD8)
[    1.234375] menelaus 1-0072: Shutting off 'VMEM'
[    1.250000] omap24xxcam omap24xxcam: registered device video0 [v4l2]
[    1.312500] VFS: Mounted root (jffs2 filesystem) readonly.
[    1.312500] Freeing init memory: 124K
[    1.671875] mmcblk1: mmc1:0001 000000 1966080KiB 
[    1.671875]  mmcblk1: p1
[    3.320312] umac: module license 'Proprietary' taints kernel.
[    3.351562] umac: version 2.0.7 loaded.
[    3.500000] cx3110x: chip variant STLC4550.
[    3.585937] cx3110x: firmware version 2.13.0.0.a.22.8.
[    3.601562] cx3110x: driver version 2.0.15 loaded.
[    6.203125] menelaus 1-0072: Setting voltage 'VMMC' to 3000 mV (reg 0x0a,
val 0xb8)
[  190.039062] tahvo: Registering interrupt 7 for device 
[  190.039062] retu: Registering interrupt 8 for device 
[  190.039062] retu: Registering interrupt 1 for device 
[  190.039062] tahvo: Registering interrupt 1 for device 
[  190.578125] tahvo: Registering interrupt 2 for device 
[  270.500000] menelaus 1-0072: Setting voltage 'VMEM' to 1500 mV (reg 0x0a,
val 0xb8)
[  270.515625] menelaus 1-0072: Setting voltage 'VIO' to 2500 mV (reg 0x0a, val
0xb8)
[  270.562500] tcm825x_find_size(): size 3
[  270.609375] tcm825x_configure(): configuring Image Size 3
[  270.609375] tcm825x_configure(): configuring Pixel Format 1
[  270.656250] menelaus 1-0072: Shutting off 'VIO'
[  270.671875] menelaus 1-0072: Shutting off 'VMEM'
[  270.734375] menelaus 1-0072: Setting voltage 'VMMC' to 3000 mV (reg 0x0a,
val 0xb8)
[  272.562500] JFFS2 notice: (407) check_node_data: wrong data CRC in data node
at 0x0ad6fa4c: read 0x52b938fe, calculated 0x1180fbc.
[  274.875000] MMU: OMAP mmu:dsp MMU initialized (HW v1.0)
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fff000, p=0x86ac5000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fce000, p=0x48008000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fe2000, p=0x48094000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fdc000, p=0x4807c000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fdd000, p=0x4807e000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fde000, p=0x48080000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fdf000, p=0x48082000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fe0000, p=0x48090000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fd9000, p=0x48068000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fb0000, p=0x54000000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fb1000, p=0x54001000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fb2000, p=0x54002000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fb3000, p=0x54003000, sz=0x1000
[  274.875000] MMU: mapping in ARM MMU, v=0xe0fb4000, p=0x54004000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fce000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fe2000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fdc000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fdd000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fde000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fdf000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fe0000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fd9000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fb0000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fb1000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fb2000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fb3000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fb4000, sz=0x1000
[  274.921875] MMU: unmapping in ARM MMU, v=0xe0fff000, sz=0x1000
[  274.921875] MMU: OMAP mmu:dsp MMU initialized (HW v1.0)
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fff000, p=0x86ac5000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fce000, p=0x48008000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fe2000, p=0x48094000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fdc000, p=0x4807c000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fdd000, p=0x4807e000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fde000, p=0x48080000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fdf000, p=0x48082000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fe0000, p=0x48090000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fd9000, p=0x48068000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fb0000, p=0x54000000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fb1000, p=0x54001000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fb2000, p=0x54002000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fb3000, p=0x54003000, sz=0x1000
[  274.921875] MMU: mapping in ARM MMU, v=0xe0fb4000, p=0x54004000, sz=0x1000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0028000, p=0x863d5000, sz=0x1000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0100000, p=0x85c00000, sz=0x100000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0200000, p=0x85d00000, sz=0x100000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0400000, p=0x85e00000, sz=0x100000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0500000, p=0x863e0000, sz=0x10000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0510000, p=0x863f0000, sz=0x10000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0520000, p=0x85f00000, sz=0x10000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0530000, p=0x85f10000, sz=0x10000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0540000, p=0x85f20000, sz=0x10000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0550000, p=0x85f30000, sz=0x10000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0560000, p=0x85f40000, sz=0x10000
[  274.945312] MMU: mapping in ARM MMU, v=0xe0570000, p=0x85f50000, sz=0x10000
[  275.273437] omapdsp: IPBUF configuration
[  275.273437]            512 words * 16 lines at 0xe0200000.
[  275.281250] omapdsp: found 5 task(s)
[  275.304687] omapdsp: task 0: name pcm3
[  275.304687] omapdsp: taskdev pcm3 enabled.
[  275.312500] omapdsp: task 1: name pcm_rec1
[  275.312500] omapdsp: taskdev pcm_rec1 enabled.
[  275.320312] omapdsp: task 2: name pcm1
[  275.320312] omapdsp: taskdev pcm1 enabled.
[  275.328125] omapdsp: task 3: name audiopp
[  275.328125] omapdsp: taskdev audiopp enabled.
[  275.335937] omapdsp: task 4: name pcm0
[  275.335937] omapdsp: taskdev pcm0 enabled.
[  275.867187] menelaus 1-0072: Setting voltage 'VMMC' to 3000 mV (reg 0x0a,
val 0xb8)
[  275.984375] EAC mode: play enabled, rec enabled
[  276.742187] g_file_storage gadget: File-backed Storage Gadget, version: 28
November 2005
[  276.742187] g_file_storage gadget: Number of LUNs=2
[  276.742187] musb_hdrc musb_hdrc: MUSB HDRC host driver
[  276.750000] drivers/usb/core/inode.c: creating file 'devices'
[  276.750000] drivers/usb/core/inode.c: creating file '001'
[  276.750000] musb_hdrc musb_hdrc: new USB bus registered, assigned bus number
1
[  276.750000] musb_hdrc musb_hdrc: supports USB remote wakeup
[  276.750000] usb usb1: default language 0x0409
[  276.750000] usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1
[  276.750000] usb usb1: Product: MUSB HDRC host driver
[  276.750000] usb usb1: Manufacturer: Linux 2.6.21-omap1 musb-hcd
[  276.750000] usb usb1: SerialNumber: musb_hdrc
[  276.750000] usb usb1: uevent
[  276.750000] usb usb1: usb_probe_device
[  276.750000] usb usb1: configuration #1 chosen from 1 choice
[  276.750000] usb usb1: adding 1-0:1.0 (config #1, interface 0)
[  276.750000] usb 1-0:1.0: uevent
[  276.750000] hub 1-0:1.0: usb_probe_interface
[  276.750000] hub 1-0:1.0: usb_probe_interface - got id
[  276.750000] hub 1-0:1.0: USB hub found
[  276.750000] hub 1-0:1.0: 1 port detected
[  276.750000] hub 1-0:1.0: standalone hub
[  276.750000] hub 1-0:1.0: individual port power switching
[  276.750000] hub 1-0:1.0: no over-current protection
[  276.750000] hub 1-0:1.0: Single TT
[  276.750000] hub 1-0:1.0: TT requires at most 8 FS bit times (666 ns)
[  276.750000] hub 1-0:1.0: power on to power good time: 10ms
[  276.750000] hub 1-0:1.0: 200mA bus power budget for each child
[  276.750000] hub 1-0:1.0: local power source is good
[  276.750000] hub 1-0:1.0: enabling power on all ports
[  276.750000] drivers/usb/musb/tusb6010.c musb_platform_enable: dma not
reactivated
[  276.859375] drivers/usb/core/inode.c: creating file '001'
[  276.859375] hub 1-0:1.0: state 7 ports 1 chg 0000 evt 0000
[  278.562500] EAC mode: play disabled, rec disabled
[  278.859375] hub 1-0:1.0: hub_suspend
[  278.859375] usb usb1: usb auto-suspend
[  281.500000] EAC mode: play enabled, rec enabled
[  289.023437] EAC mode: play disabled, rec disabled
[  294.281250] menelaus 1-0072: Setting voltage 'VMMC' to 3000 mV (reg 0x0a,
val 0xb8)
[  294.585937] cx3110x: loading 3826.arm firmware.
[  294.945312] slide (GPIO 110) is now open
[  295.679687] (c)opyright 2004 Conexant
[  295.679687] 
[  295.679687] build info: PRISM SoftMAC
[  295.679687]   creator: [kvalo]
[  295.679687]   date: [07/10/05-11:45]
[  295.679687] 
[  295.679687] cx3110x: MAC address 00:1d:6e:9b:c1:1e.
[  295.679687] cx3110x: libumac version 2.12.0.0.a.9.15-5
[  295.679687] cx3110x: lmac version 2.13.0.0.a.22.8
[  295.687500] cx3110x: PSM disabled.
[  297.484375] cx3110x: scanned 13 channels.
[  297.578125] cx3110x: shut down softmac.
[  303.156250] menelaus 1-0072: Setting voltage 'VMMC' to 3000 mV (reg 0x0a,
val 0xb8)
[  304.234375] menelaus 1-0072: Setting voltage 'VMMC' to 3000 mV (reg 0x0a,
val 0xb8)
[  306.992187] EAC mode: play enabled, rec enabled
[  309.671875] EAC mode: play disabled, rec disabled
[  314.453125] menelaus 1-0072: Setting voltage 'VMMC' to 3000 mV (reg 0x0a,
val 0xb8)

other check :
#
Nokia-N810-51-3:~# sfdisk -l
#

#
Disk /dev/mmcblk0: 61440 cylinders, 4 heads, 16 sectors/track
#
Units = cylinders of 32768 bytes, blocks of 1024 bytes, counting from 0
#

#
   Device Boot Start     End   #cyls    #blocks   Id  System
#
/dev/mmcblk0p1          0+  61439   61440-   1966072    b  W95 FAT32
#
/dev/mmcblk0p2          0       -       0          0    0  Empty
#
/dev/mmcblk0p3          0       -       0          0    0  Empty
#
/dev/mmcblk0p4          0       -       0          0    0  Empty
Comment 61 Eero Tamminen nokia 2008-06-16 11:18:50 UTC
(In reply to comment #60)
> I ve exactly the same problem ... on n810 on chinook RSS unactivated ...
> 
> Boot take between 5 to 15 minutes.

Always or sometimes?  Could you time it with a clock to get an accurate
measurement?   How full your rootfs is (i.e. what "mount" outputs)?


> And can't use at all swap.

What means "can't use swap"?


> I ve also noticed that moving an applet in hildon-desktop make
> pdflush eating cpu.

Just to verify: if you don't move an applet there's no pdflush,
and when moving the applet, there is?  How reproducible this is?

Does it matter which applet you move?
Comment 62 Benoît HERVIER 2008-06-16 11:24:03 UTC
"What means "can't use swap"?"

I mean when i active swap, pdflush eat cpu and freeze totally the device.

I ve see two things that make pdflush eating cpu.

- When i move the applet omweather around the desktop.
- When a bluetooth connection drop, sometimes a process related to bluetooth
eat more and more memory and cpu, and make pdflush eating cpu too.

Since i ve removed omweather, it seem's that i ve no more problems and i can
active swap. At this time i still get pdflush freezing device for 20s
sometimes,  but i can't found a relation with something.
Comment 63 Eero Tamminen nokia 2008-06-16 15:01:02 UTC
(In reply to comment #62)
> "What means "can't use swap"?"
> 
> I mean when i active swap, pdflush eat cpu and freeze totally the device.
> I ve see two things that make pdflush eating cpu.
> 
> - When i move the applet omweather around the desktop.
> - When a bluetooth connection drop, sometimes a process related to bluetooth
> eat more and more memory and cpu, and make pdflush eating cpu too.

Even when you had swap disabled?  To trigger pdflush in that case,
they would need to be doing disk writing...


> Since i ve removed omweather, it seem's that i ve no more problems and
> i can active swap.

Do you remember which version of omweather it was?
(at least some versions of the (3rd party) omweather
applet have leaked huge amounts of RAM which can cause
significantly more swap activity.)


> At this time i still get pdflush freezing device for 20s
> sometimes,  but i can't found a relation with something.

From what we found out, the freezes come out from JFFS2 garbage collecting.
If the rootfs has a lot of garbage to collect (garbage shows as free space
in "mount" output), even a one byte write to the disk can trigger the GC
needed to actually make that deleted file data internally re-usable by FS.
The process doing the writing naturally waits until the write is finished.

If the writing is done e.g. by Home plugin, Task Navigator and Statusbar
freeze also in addition to Home.
Comment 64 Peter 2008-06-16 23:29:14 UTC
I have the exact same symptoms, I had to disable my swap as well.

I have removed any RSS Feeds and all the email accounts, this has made this
issue nearly go away.

The browser will trigger an event, especially on the Most Emailed and Most
Viewed Pictures on Yahoo News. The event length was reduced by disabling all
components on the browser. Events still occur, intermittently while using the
device.

>From all the reading I have done on this issue, it seems the consensuses is that it has to do with the large SD cards in my device and how full they are.

I have two 2GB SD Cards in the device, one which is 60% full (removable) and
the other that is 40% full (internal).

I have not tried to delete all the files from the SD Cards, because I don't
believe that this is a reasonable solution.
Comment 65 Eero Tamminen nokia 2008-06-17 10:41:03 UTC
(In reply to comment #64)
> I have the exact same symptoms, I had to disable my swap as well.
> 
> I have removed any RSS Feeds and all the email accounts, this has
> made this issue nearly go away.

In Diablo RSS app does less writes to the internal flash which
should reduce the issues with RSS.   I think putting the email
folder to memory card instead of internal flash could also
help.


> The browser will trigger an event, especially on the Most Emailed and Most
> Viewed Pictures on Yahoo News. The event length was reduced by disabling all
> components on the browser.

When Flashplayer plays the videos (adverts etc), it needs to buffer
the streamed data to disk memory card(s) don't have enough space, it
uses the internal flash.  This buffering can cause a lot of disk churn.

You could try whether Flashblock helps:
  http://browser-extras.garage.maemo.org/news/7/


> Events still occur, intermittently while using the device.

Do you have anything else that could be doing lot of disk activity?
(especially to the internal flash, strace is your friend...)


>> From all the reading I have done on this issue, it seems the consensuses
>> is that it has to do with the large SD cards in my device and how full
>> they are.

Incorrect.  It's mostly about how much much writes and deletes are done to
the internal flash (JFFS2 file system).  If that file system is very full
that can make the issue radically worse.

It's possible that memory card writes (and deletes?) could also be making
this issue worse as use of swap is one of the triggers.  The cause could
be increased memory usage when swap is used (user can open more applications
which increases the RAM memory pressure) though.


> I have two 2GB SD Cards in the device, one which is 60% full (removable) and
> the other that is 40% full (internal).
> 
> I have not tried to delete all the files from the SD Cards, because I don't
> believe that this is a reasonable solution.

The internal flash is more interesting, how full it is?
(use "mount" command in XTerm to get reliable statistics)
Comment 66 Eero Tamminen nokia 2008-06-17 10:42:21 UTC
> (use "mount" command in XTerm to get reliable statistics)

Sorry, I meant "df".
Comment 67 Peter 2008-06-18 09:45:38 UTC
(In reply to comment #66)
> > (use "mount" command in XTerm to get reliable statistics)
> 
> Sorry, I meant "df".
> 

Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mtdblock4            2048      2048         0 100% /mnt/initfs
none                       512       136       376  27% /mnt/initfs/tmp
/dev/mtdblock4          257536    232692     24844  90% /
none                       512       136       376  27% /tmp
none                      1024        20      1004   2% /dev
tmpfs                     1024         0      1024   0% /dev/shm
/dev/mmcblk0p1         2010752   1199936    810816  60% /media/mmc2
/dev/mmcblk1p1         2010752   1557504    453248  77% /media/mmc1

sorry, your last email didn't make it through
Comment 68 Eero Tamminen nokia 2008-06-18 11:21:49 UTC
(In reply to comment #67)
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/mtdblock4          257536    232692     24844  90% /

Your problem is too full root file system.  Because of this,
JFFS2 file system needs to do garbage collecting much more
frequently when data is written to or removed from it. Note
that GCing gets usually done when space is needed (even if
its just 1 byte write), not not when user removes the files.

For best user experience, I would recommend having rootfs normally less
than 80% full (JFFS2 keeps whole disk structure in RAM so this will
save some memory too).  Move things that don't need to be on the rootfs
to the memory card(s).
Comment 69 Eero Tamminen nokia 2008-08-08 17:20:42 UTC
UBIFS documentation has some comparisons against JFFS2 on issues
relating to this bug (770/N800/N810 devices use JFFS2, UBIFS is
an experimental file system):
  http://www.linux-mtd.infradead.org/doc/ubifs.html#L_scalability

In case you're interested about technical information on factors
affecting Flash based file systems behavior.
Comment 70 Andre Klapper maemo.org 2008-12-02 22:57:32 UTC
(In reply to comment #57)
> We have now seen similar issue here.

Eero, is there any internal reference for this? Has this been handled or even
fixed in a later version?
Comment 71 Eero Tamminen nokia 2008-12-03 12:14:47 UTC
(In reply to comment #70)
> > We have now seen similar issue here.
> 
> Eero, is there any internal reference for this?

No, as this isn't a "bug" as such.  It's a feature of the JFFS2 file system
design (mainly in the jffs2_gcd_mtd4 case, pdflush could be more of a general
Linux kernel memory/file system feature).  Reproducing it requiring frequent
file system content changes for couple of weeks to trigger the issue doesn't
help either...


> Has this been handled or even fixed in a later version?

Hopefully it won't anymore be an issue in Fremantle, but it's way too early to
say.
Comment 72 Andre Klapper maemo.org 2009-05-20 15:56:12 UTC
(In reply to comment #71)
> No, as this isn't a "bug" as such.  It's a feature of the JFFS2 file system
> design (mainly in the jffs2_gcd_mtd4 case, pdflush could be more of a general
> Linux kernel memory/file system feature).  Reproducing it requiring frequent
> file system content changes for couple of weeks to trigger the issue doesn't
> help either...

Sounds like WONTFIX to me then.
Comment 73 Eero Tamminen nokia 2009-05-20 16:39:25 UTC
> Sounds like WONTFIX to me then.

JFFS2 isn't anymore used in Fremantle, so hopefully this will be fixed there,
but yes, for Diablo it's WONTFIX.
Comment 74 Andre Klapper maemo.org 2009-06-10 14:09:45 UTC
As per last comments I *assume* that this will not happen anymore in Fremantle
as JFFS2 isn't anymore used in Fremantle.

This is a WONTFIX for Diablo as Diablo is in maintenance mode and Nokia will
only provide bugfixes for critical issues, if at all.
Also see comment 71.