Bug 9353 - (int-159000) Writing large files to /home/user/MyDocs can take a very very long time
(int-159000)
: Writing large files to /home/user/MyDocs can take a very very long time
Status: NEW
Product: System software
File system
: 5.0/(3.2010.02-8)
: All Maemo
: Unspecified normal with 6 votes (vote)
: ---
Assigned To: unassigned
: file-system-bugs
:
: performance
:
:
  Show dependency tree
 
Reported: 2010-03-02 08:32 UTC by Alan Bruce
Modified: 2011-05-23 19:26 UTC (History)
5 users (show)

See Also:


Attachments


Note

You need to log in before you can comment on or make changes to this bug.


Description Alan Bruce (reporter) maemo.org 2010-03-02 08:32:33 UTC
SOFTWARE VERSION:
3.2010.02-8

EXACT STEPS LEADING TO PROBLEM: 
1. Get a large, compressed file onto the "Nokia N900" vfat partition (MyDocs)
(the file http://qole.org/files/debian-m5-v3b.img.ext2.lzma is a good test
case)
2. Close all other applications
3. use "time lzma -d " to decompress the file

EXPECTED OUTCOME:
The decompression should take 8-10 minutes

ACTUAL OUTCOME:
The decompression can take up to 1h45m

REPRODUCIBILITY:
More often than not it takes >45m. Occasionally, it takes 8-10m. It is hard to
determine what is causing the wild variation in time. This has been tried by
quite a few people, some immediately after reboot, and it can take a very long
time no matter what else is happening on the device

EXTRA SOFTWARE INSTALLED:
N/A (run on a wide variety of installs)

OTHER COMMENTS:
I have had significant improvements in speed after running fsck.vfat on the
MyDocs partition, which is why I filed this under "File System".
Comment 1 Lucas Maneos 2010-03-02 11:36:45 UTC
It's probably unrelated since as you say this is reproducible by several users,
but bug 5414 (caused by corrupt FAT fs) may be worth mentioning.
Comment 2 Lucas Maneos 2010-03-03 14:45:30 UTC
Seems to be mostly I/O-bound:

Nokia-N900-51-1:/home/user/MyDocs# time -v lzma -d debian-m5-v3b.img.ext2.lzma
    Command being timed: "lzma -d debian-m5-v3b.img.ext2.lzma"
    User time (seconds): 211.14
    System time (seconds): 36.12
    Percent of CPU this job got: 9%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 41m 41.83s
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 229
    Minor (reclaiming a frame) page faults: 37799
    Voluntary context switches: 7790
    Involuntary context switches: 44070
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

(yes, busybox time's filesystem I/O measurements are broken) and the
responsiveness of the device was seriously slow while this was running.  Sample
top output:

Mem: 241344K used, 4196K free, 0K shrd, 64K buff, 89284K cached
CPU:  0.4% usr  1.8% sys  0.0% nice  0.0% idle 97.5% io  0.1% irq  0.0% softir
Load average: 8.91 6.35 3.26
  PID  PPID USER     STAT   RSS %MEM %CPU COMMAND
 3363  3362 root     R    32624 13.2  0.7 lzma -d debian-m5-v3b.img.ext2.lzma

I checked with dosfsck just in case and it reports no errors on /dev/mmcblk0p1.

I also get similar results when uncompressing a same size file full of NULs
(produced with "dd if=/dev/zero bs=1024 count=2097152 | lzma > zero.lzma",
which compressed down to under 300K so reading and writing to the same device
isn't the cause of the problem).

Moreover, it doesn't seem that decompression matters much.  I can get worse
results (probably because lzma writes in larger chunks) results just by copying
the decompressed file:

Nokia-N900-51-1:/home/user/MyDocs# time -v cp debian-m5-v3b.img.ext2.lzma
debian-m5-v3b.img.ext2.2.lzma
cp: write error: No space left on device
Command exited with non-zero status 1
    Command being timed: "cp debian-m5-v3b.img.ext2.lzma
debian-m5-v3b.img.ext2.2.lzma"
    User time (seconds): 0.35
    System time (seconds): 19.91
    Percent of CPU this job got: 0%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 58m 20.43s
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 6
    Minor (reclaiming a frame) page faults: 226
    Voluntary context switches: 6200
    Involuntary context switches: 3135
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 1

(note that this terminated early due to full filesystem,
debian-m5-v3b.img.ext2.2.lzma was 757727232 bytes long, ie 35% of the original
at the end).

Now the question is whether the performance problem is at the block device of
filesystem layer.  Do you feel like running some vfat vs extX benchmarks on a
memory card?
Comment 3 Alan Bruce (reporter) maemo.org 2010-03-03 19:54:24 UTC
I was just going to come back and report that I get similarly dismal results
copying a 2GB file made with "dd if=/dev/zero". But it looks like you beat me
to it.

What I don't understand is why, very occasionally, it can be quick (10 mins) on
vfat MyDocs. What is different on those occasions?

I have a report that a user with ext3-formatted MyDocs has consistently fast
decompression.

http://mg.pov.lt/maemo-irclog/%23maemo.2010-02-11.log.html#t2010-02-11T20:10:48
Comment 4 Alan Bruce (reporter) maemo.org 2010-03-07 07:28:08 UTC
When mounted via USB in Mass Storage Mode to an Ubuntu machine with an Atom
processor, the lzma decompression (performed in the MyDocs partition, mounted
on the Ubuntu machine) takes less than 3 minutes.
Comment 5 Alan Bruce (reporter) maemo.org 2010-03-07 07:33:10 UTC
Also (when not connected to a PC), the same decompression on a FAT formatted
microSD auto-mounted as /media/mmc1 consistently takes much less time (approx
10-12 mins to complete) on my device.
Comment 6 Thomas Tanner 2010-03-12 10:21:41 UTC
on my device decompression takes consistenly 8-10mins on both ext3 eMMC and
vfat SD card (class 6)
when I start it over ssh and don't use the device during the process.
IIRC a few times it took much longer and I simply canceled it and tried again
after reboot.
Comment 7 Eero Tamminen nokia 2010-03-25 18:09:28 UTC
If something doing very large (compared to the amount of RAM in the device)
data writes to file system without using fadvise() appropriately, it will fill
kernel file cache and force processes to swap kernel tries to keep most
recently accessed memory pages in RAM.  Swapping makes device slow.
Comment 8 Bartosz Taudul 2010-03-25 18:21:47 UTC
(In reply to comment #7)

That's obviously unacceptable behavior. Shouldn't tuning
/proc/sys/vm/swappiness (eg. to 0) completly remove the problem you describe
(not necessarily fixing the bug)?
Comment 9 Lucas Maneos 2010-03-28 15:48:06 UTC
It seems that tracker is involved somehow.  Using the same zero.lzma as in
comment 2 but under /home/user/MyDocs/core-dumps (excluded from indexing by
default) I get:

Nokia-N900-51-1:/home/user/MyDocs/core-dumps# time lzma -d zero.lzma 
real    4m 16.20s
user    1m 4.82s
sys    0m 29.85s

and even with Alan's real-world case:

Nokia-N900-51-1:/home/user/MyDocs/core-dumps# time lzma -d
debian-m5-v3b.img.ext2.lzma 
real    7m 21.10s
user    3m 28.00s
sys    0m 36.99s

"vmstat 1" running during the above shows a bit of swapping taking place, but
certainly not thrashing.
Comment 10 Eero Tamminen nokia 2010-04-12 18:20:10 UTC
(In reply to comment #9)
> It seems that tracker is involved somehow.  Using the same zero.lzma as in
> comment 2 but under /home/user/MyDocs/core-dumps (excluded from indexing by
> default) I get:
...
> "vmstat 1" running during the above shows a bit of swapping taking place, but
> certainly not thrashing.

Tracker is tracking file system changes and every time something is written to
a file (in a directory tracker has asked to be notified about), kernel will
then wake it up.  If Tracker gets updates for same files in quick succession,
it should be waiting until they subside until re-indexing the file.

I don't have lzma.  Does it install some handler for .lzma files that tracker
would run (I don't see why tracker would otherwise do re-indexing)?

What other processes besides lzma "top" shows using CPU, or "htop" shows
increasing their "TIME+" column when the file is decompressed?
Comment 11 Lucas Maneos 2010-04-13 13:29:15 UTC
(In reply to comment #10)
> I don't have lzma.  Does it install some handler for .lzma files that tracker
> would run (I don't see why tracker would otherwise do re-indexing)?

Not as far as I can tell.  FWIW the package I'm using is is 4.43-14maemo5 from
extras, but the issue does not look specific to lzma since even a cp causes it.

> What other processes besides lzma "top" shows using CPU, or "htop" shows
> increasing their "TIME+" column when the file is decompressed?

The trackerd process and a freshly-spawned tracker-extract child go up by a
couple of seconds or so but then go to sleep, and htop obviously.  Nothing else
interesting.

The core-dumps difference was just an accidental observation.  I don't have a
good explanation for why it affects performance, but it looks reproducible and
not a one-time fluke.  Timings of 10 iterations writing in
/home/user/MyDocs/core-dumps are very consistent:

Nokia-N900-51-1:/home/user/MyDocs# for i in `seq 10`; do  echo `time 2>&1 lzma
-d < debian-m5-v3b.img.ext2.lzma > core-dumps/d$i`; rm core-dumps/d$i;  done    
real 6m 9.72s user 3m 26.66s sys 0m 33.37s
real 5m 38.06s user 3m 24.85s sys 0m 33.92s
real 5m 19.60s user 3m 24.47s sys 0m 33.29s
real 5m 22.01s user 3m 25.34s sys 0m 32.69s
real 5m 58.40s user 3m 25.23s sys 0m 33.35s
real 5m 35.32s user 3m 24.03s sys 0m 33.75s
real 5m 19.69s user 3m 25.37s sys 0m 32.34s
real 4m 55.66s user 3m 24.57s sys 0m 32.08s
real 5m 22.39s user 3m 24.56s sys 0m 33.03s
real 6m 10.54s user 3m 26.67s sys 0m 33.05s

while another 10 in /home/user/MyDocs have quite significant variations:

Nokia-N900-51-1:/home/user/MyDocs# for i in `seq 10`; do  echo `time 2>&1 lzma
-d < debian-m5-v3b.img.ext2.lzma > d$i`; rm d$i;  done                          
real 47m 33.54s user 3m 37.63s sys 0m 41.68s
real 48m 32.09s user 3m 38.46s sys 0m 40.85s
real 5m 46.97s user 3m 28.01s sys 0m 34.27s
real 11m 32.47s user 3m 27.28s sys 0m 34.36s
real 7m 19.78s user 3m 30.32s sys 0m 33.85s
real 6m 3.48s user 3m 30.01s sys 0m 33.80s
real 13m 22.91s user 3m 30.37s sys 0m 36.39s
real 7m 28.01s user 3m 29.17s sys 0m 35.26s
real 6m 10.12s user 3m 29.48s sys 0m 34.76s
real 12m 20.51s user 3m 32.11s sys 0m 39.64s

These runs were done consecutively over ssh/WLAN while the device was idle
(screen locked, IM/VoIP accounts and modest automatic updates disabled,
osso-xterm was the only user app running, no monitoring of any kind), fully
charged and plugged in to USB.

Perhaps iotop could provide more hints, but the default kernel doesn't have I/O
accounting enabled.  Or maybe oprofile, is there a vmlinux matching the
on-device zImage I can grab from somewhere?
Comment 12 Eero Tamminen nokia 2010-04-14 17:23:15 UTC
(In reply to comment #11)
> Perhaps iotop could provide more hints, but the default kernel doesn't have I/O
> accounting enabled.  Or maybe oprofile, is there a vmlinux matching the
> on-device zImage I can grab from somewhere?

On Fremantle you need just to install the Oprofile kernel module and user space
tools packages, you don't need to reflash anything:
  http://wiki.maemo.org/Documentation/devtools/maemo5/oprofile
Comment 13 Lucas Maneos 2010-04-16 13:08:44 UTC
(In reply to comment #12)
> On Fremantle you need just to install the Oprofile kernel module and user space
> tools packages, you don't need to reflash anything:

Yeah, I meant a vmlinux ELF image for using with "opcontrol --vmlinux" in order
to get a fuller picture.  Anyway, I couldn't find a ready-made one so I rebuilt
the kernel-2.6.28 package and used the vmlinux produced from that.

It doesn't seem to provide any more useful info for this case though (maybe I'm
using it wrong).  The top 20 reported symbols from an extraction to MyDocs (36m
58.73s user 3m 36.02s sys 0m 39.34s) are:

samples  %        app name                 symbol name
3619882  84.7012  vmlinux                  omap3_enter_idle
410772    9.6116  lzma                     /home/opt/maemo/usr/bin/lzma
31969     0.7480  oprofiled                /usr/bin/oprofiled
28148     0.6586  vmlinux                  tick_nohz_restart_sched_tick
21698     0.5077  vmlinux                  tick_nohz_stop_sched_tick
18048     0.4223  vmlinux                  __copy_from_user
11517     0.2695  vmlinux                  __do_softirq
10714     0.2507  vmlinux                  v7_dma_clean_range
5768      0.1350  mmc_block.ko             mmc_blk_issue_rq
4973      0.1164  vmlinux                  __copy_to_user
4619      0.1081  vmlinux                  schedule
4145      0.0970  vmlinux                  kmem_cache_alloc
3719      0.0870  vmlinux                  __make_request
3179      0.0744  fat.ko                   fat_get_block
2950      0.0690  fat.ko                   fat_bmap
2802      0.0656  vmlinux                  get_page_from_freelist
2670      0.0625  vmlinux                  __mpage_writepage
2602      0.0609  oprofile.ko              sync_buffer
2466      0.0577  vmlinux                  drop_buffers
2303      0.0539  vmlinux                  __remove_mapping

Do you know of any way to measure IO rather than CPU usage with the stock
kernel?
Comment 14 Eero Tamminen nokia 2010-04-16 16:48:23 UTC
(In reply to comment #13)
> It doesn't seem to provide any more useful info for this case though
> (maybe I'm using it wrong).

There are enough samples so as long as you started and stopped oprofile while
lzma was still working, it should be fine.


> The top 20 reported symbols from an extraction to MyDocs (36m
> 58.73s user 3m 36.02s sys 0m 39.34s) are:
> 
> samples  %        app name                 symbol name
> 3619882  84.7012  vmlinux                  omap3_enter_idle
> 410772    9.6116  lzma                     /home/opt/maemo/usr/bin/lzma
> 31969     0.7480  oprofiled                /usr/bin/oprofiled

Ok, so CPU was almost completely idle as already seen in comment 2 (where IO
shows as issue).


> Do you know of any way to measure IO rather than CPU usage with the stock
> kernel?

No.  For checking IO one could use iotop:
  http://packages.debian.org/squeeze/iotop

But it needs one to enable CONFIG_TASKSTATS, CONFIG_TASK_DELAY_ACCT and
CONFIG_TASK_IO_ACCOUNTING kernel options.

We already know IO is the problem, but it might show whether there's swapping
(which would certainly be an issue because swap is also on eMMC).  Seeing stuff
being in D state in top would tell that too.


Best would probable be to install real "procps" (as better replacement for
busybox-symlinks-procps) and using "vmstat 1" to see what's different between
invocations that work fast and ones that don't.
Comment 15 Thomas Tanner 2010-05-15 17:59:19 UTC
I wonder why CONFIG_MMC_BLOCK_BOUNCE is disabled in the kernel.
Could this be the reason?

          SD/MMC is a high latency protocol where it is crucial to
          send large requests in order to get high performance. Many
          controllers, however, are restricted to continuous memory
          (i.e. they can't do scatter-gather), something the kernel
          rarely can provide.

          Say Y here to help these restricted hosts by bouncing
          requests back and forth from a large buffer. You will get
          a big performance gain at the cost of up to 64 KiB of
          physical memory.

          If unsure, say Y here.
Comment 16 Alan Bruce (reporter) maemo.org 2011-04-25 03:20:46 UTC
Firstly, since PR 1.3, this problem has gotten much worse for many users. I am
getting constant reports of the N900 *REBOOTING* while decompressing large
files on the MyDocs partition, something it never did before. It seems that the
device becomes completely unresponsive for a period of time, then reboots.

Secondly, a few of us seem to have just discovered an interesting workaround.

Simply issuing the following two commands (as root) before decompressing a
large file in the MyDocs partition seems to "fix" the problem.

swapoff /dev/mmcblk0p3
swapon /dev/mmcblk0p3

I'm not sure why this fixes the problem. But it does. I have tested several
times, and the operation takes over an hour without these commands, and 6-8
minutes with them.

The operation I'm attempting is the lzma decompression of a file from
http://qole.org/files

time lzma -dk ubuntu-m5-v1.img.ext2.lzma
Comment 17 Alan Bruce (reporter) maemo.org 2011-04-25 20:50:49 UTC
The problem with rebooting is bug 11720 but I can see nobody's looking at these
bugs any more. :(

Anyway, I found that I can't just do swapoff/swapon, I need to also issue two
more commands before swapoff/swapon:

echo 512 > /sys/block/mmcblk0/queue/nr_requests
echo 512 > /sys/block/mmcblk1/queue/nr_requests
Comment 18 Eero Tamminen nokia 2011-05-03 16:17:19 UTC
(In reply to comment #17)
> Anyway, I found that I can't just do swapoff/swapon, I need to also issue two
> more commands before swapoff/swapon:
> 
> echo 512 > /sys/block/mmcblk0/queue/nr_requests
> echo 512 > /sys/block/mmcblk1/queue/nr_requests

Ok, so your copy program is causing "swapping hell" by poisoning page cache and
pushing all processes to swap.

Do copies with the file manager or gnome-vfs (which has been fixed to tell
kernel that that all the data it's copying doesn't need to be cached) instead
of cp / dd.
Comment 19 Alan Bruce (reporter) maemo.org 2011-05-21 02:07:59 UTC
(In reply to comment #18)
> Do copies with the file manager or gnome-vfs (which has been fixed to tell
> kernel that that all the data it's copying doesn't need to be cached) instead
> of cp / dd.

Wait, what? You're saying the decompression program (not copy program) I'm
using is at fault because it hasn't been "fixed" with the workaround that you
have added to the file manager?

Is there any way to make a little program that "tells the kernel" to stop
caching data for any arbitrary command or shell session? You know, the way you
can run "time command" or "tsocks command"?
Comment 20 Eero Tamminen nokia 2011-05-23 19:26:41 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > Do copies with the file manager or gnome-vfs (which has been fixed to tell
> > kernel that that all the data it's copying doesn't need to be cached) instead
> > of cp / dd.
> 
> Wait, what? You're saying the decompression program (not copy program) I'm using

Can you trigger the same thing just by doing e.g. "cp" with the uncompressed
file?


>  is at fault because it hasn't been "fixed" with the workaround
> that you have added to the file manager?

It's best if you first test whether the use-case works better if you do the
copying with File manager (or gnome-vfs command line util) to verify my
assumption.


> Is there any way to make a little program that "tells the kernel" to stop
> caching data for any arbitrary command or shell session? You know, the way you
> can run "time command" or "tsocks command"?

I doubt it as it would need to manipulate the program's FD at suitable
intervals.