Bug 1842 - (int-86424) Constant metalayer-crawler crashes & CPU usage with some WMA file(s)
Status: RESOLVED FIXED
Product: Data
Component: metalayer-crawler
Version: 3.2
Platform: All Maemo
Importance: Medium critical with 3 votes
Target Milestone: 4.1.3
Assigned To: unassigned
QA Contact: metatracker-bugs
Keywords: crash, performance, use-time
Blocks: 2602

Reported: 2007-08-16 23:44 UTC by Paul Dundas
Modified: 2008-12-17 20:20 UTC
CC List: 9 users

Attachments
strace from metalayer-crawler (22.81 KB, text/plain)
2008-05-20 14:38 UTC, Paul Gear
Description Paul Dundas (reporter) 2007-08-16 23:44:44 UTC
Almost immediately after upgrade of O/S to 4.2007.26-8 the device slowed and
ground to a virtual halt, with device eventually not clicking when screen
tapped, battery charging icon not animating, and no response to keys or screen.
As the machine slows, one instance of /usr/bin/metalayer-crawler consumes up to
95% CPU (admittedly low priority - PRI 31 nice 19) and 65% MEM (according to
top, or 80180 or so in ps -ax), and a second instance with "zero" CPU and same
MEM as above.

The memory usage rises steadily (albeit with brief, small decreases) over a
period of minutes (5 or so - not timed it) until the oom killer destroys the
process. When the metalayer-crawler is killed by OOM (or manually from the
command line), responsiveness of the device returns for a minute or two. The
cycle then repeats -- for ever, it seems (or until a reboot is forced by the
general instability). 

In my case, device has internal 8G SDHC card and 1G SD card. No symbolic links
that I'm aware of to create loops (see bug 1760), and in any case the problem
did not manifest under the previous OS version, 3.2007.10-7. 
dmesg mentions no media faults that I can see, but does refer to repeated
memory allocations being refused for metalayer-crawler due to low memory.

Google reveals a number of people with a similar problem.

Component could be media player, I suppose, but it happens whether or not media
player is running.
Comment 1 Paul Dundas (reporter) 2007-08-23 04:49:42 UTC
Actually is severe memory leak, and prevents use of device - hence change to
severity.
Comment 2 Eero Tamminen nokia 2007-09-03 14:57:47 UTC
(In reply to comment #1)
> Actually is severe memory leak, and prevents use of device - hence change to
> severity.

The crawler doesn't handle corrupt MMC cards well enough even in the latest
release. Either of these MMC FAT corruptions creates the given symptoms:
- recursive directory hierarchy (infinite directory depth)
  -> NB#60780
- looping directory contents (infinite number of files in directory)
  -> NB#67368
This will be fixed in the next release. Additionally, its crawling uses
a recursive algorithm:
  -> NB#53578

In the meantime, please run fsck (on Linux) or ScanDisk (Windows) on your MMC.
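
The failure modes listed above (unbounded depth, looping directories, a recursive traversal) can be illustrated with a small sketch. This is not the crawler's code: it is a hypothetical Python walk that uses an explicit stack instead of recursion, skips symlinks, and gives up after a depth cap and an entry cap. The names `bounded_walk`, `max_depth` and `max_entries` and their default values are assumptions made for illustration.

```python
import os


def bounded_walk(root, max_depth=32, max_entries=100_000):
    """Yield regular file paths under root without using recursion.

    Directories deeper than max_depth are not entered, and the walk
    stops entirely once max_entries directory entries have been seen,
    so a corrupt or looping tree cannot run forever.
    """
    stack = [(root, 0)]          # explicit stack instead of recursive calls
    seen_dirs = set()            # (device, inode) pairs already visited
    entries = 0
    while stack:
        path, depth = stack.pop()
        try:
            st = os.stat(path)
        except OSError:
            continue             # unreadable directory: skip it
        key = (st.st_dev, st.st_ino)
        if key in seen_dirs:
            continue             # same directory reached twice: a loop
        seen_dirs.add(key)
        try:
            with os.scandir(path) as it:
                for entry in it:
                    entries += 1
                    if entries > max_entries:
                        return   # give up rather than run forever
                    if entry.is_dir(follow_symlinks=False):
                        if depth < max_depth:
                            stack.append((entry.path, depth + 1))
                    elif entry.is_file(follow_symlinks=False):
                        yield entry.path
        except OSError:
            continue
```

Inode identity may not be meaningful on a corrupt FAT volume, so in this sketch the two caps are the real safeguard and the visited set is only a cheap extra check.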
Comment 3 Paul Dundas (reporter) 2007-09-09 19:10:07 UTC
fsck seems fine (assuming these are sensible args/devices):

/home/user # /sbin/fsck.vfat -nv /dev/mmcblk0p1
dosfsck 2.11 (12 Mar 2005)
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "MSDOS5.0"
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
      4096 bytes per cluster
        32 reserved sectors
First FAT starts at byte 16384 (sector 32)
         2 FATs, 32 bit entries
   7883264 bytes per FAT (= 15397 sectors)
Root directory start at cluster 2 (arbitrary size)
Data area starts at byte 15782912 (sector 30826)
   1970802 data clusters (8072404992 bytes)
63 sectors/track, 255 heads
      8192 hidden sectors
  15797248 sectors total
Checking for unused clusters.
Checking free cluster summary.
/dev/mmcblk0p1: 19040 files, 1925336/1970802 clusters
/home/user #
/home/user # /sbin/fsck.vfat -nv /dev/mmcblk1p1
dosfsck 2.11 (12 Mar 2005)
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "        "
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
     16384 bytes per cluster
         1 reserved sector
First FAT starts at byte 512 (sector 1)
         2 FATs, 16 bit entries
    125952 bytes per FAT (= 246 sectors)
Root directory starts at byte 252416 (sector 493)
       512 root directory entries
Data area starts at byte 268800 (sector 525)
     62856 data clusters (1029832704 bytes)
63 sectors/track, 32 heads
       243 hidden sectors
   2011917 sectors total
Checking for unused clusters.
/dev/mmcblk1p1: 413 files, 18243/62856 clusters
/home/user #
Comment 4 Eero Tamminen nokia 2007-09-10 17:40:10 UTC
When the problem happens again, could you do the following on both
of the crawler PIDs:  "ls -l /proc/PID/fd/"?

You might also install strace from the Maemo repos, attach to
the crawler process with "strace -p PID" and attach a short
excerpt of what strace shows the crawler doing.

Thanks!
Comment 5 James Sparenberg 2007-11-29 06:49:48 UTC
I'm running the beta (not the leaked version) of OS2008 on an N800, hw build
1301.  I've been experiencing the same condition: it is impossible to kill the
process once it starts, to the point that it even comes back after a
shutdown/battery removal/restart.  I then ran

sudo update-rc.d metalayer-crawler0 remove

and rebooted the system.  At that point it was finally halted.

Additionally, I've found that even removing the two vfat-formatted 4GB disks and
then starting the system up didn't change it; metalayer-crawler kept on going.
On the suggestion of some, I've checked the rootfs and have not yet found any
file loops that could be grabbing it.

Please let me know if I can provide additional info.
Comment 6 James Sparenberg 2007-11-29 07:05:30 UTC
Per the request in the above comment I ran 

strace -p PID

0 output.  It seems as if the process had halted action and was chewing up CPU
cycles in a wait state. Once I'm able to get the battery recharged I'll see if
I can mod the init script in /etc/init.d/ enough to have strace called at the
very beginning and see what's what.
Comment 7 Eero Tamminen nokia 2007-11-29 10:06:28 UTC
(In reply to comment #5)
> Additionally, I've found that even removing the two vfat-formatted 4GB disks and
> then starting the system up didn't change it; metalayer-crawler kept on going.
> On the suggestion of some, I've checked the rootfs and have not yet found any
> file loops that could be grabbing it.

This is very interesting/strange as we've had crawler problems
only with symlinks (which normal user cannot create), corrupted MMC
FAT file systems and libid3 being very slow in getting info from
some mp3 files (which all should be fixed).

Have you installed any extra software (what)?


(In reply to comment #6)
> Per the request in the above comment I ran 
> 
> strace -p PID
> 
> 0 output.  It seems as if the process had halted action
> and was chewing up CPU cycles in a wait state.

Maybe you could attach (the newly released) "ltrace"
to crawler to see what that outputs?

Please paste here some output of both ltrace & strace
when crawler is busy.


> Once I'm able to get the battery recharged I'll see if
> I can mod the init script in /etc/init.d/ enough to have
> strace called at the very beginning and see what's what.  

I don't think that gives any additional information.


I want to know whether crawler is stuck on some particular file,
going through a lot of files or what.  Could you do the following on
the crawler PID:
  "ls -l /proc/PID/fd/"?

And then repeat it after a few minutes?


It "shouldn't" be behaving anymore as you described, but it's possible
we haven't found some condition that could still trigger this. :-/
Comment 8 Eero Tamminen nokia 2007-11-29 12:50:40 UTC
Seems that the symlink bug has crept back (NB#75984).
Create a symlink under a directory that crawler monitors and reboot.
-> crawler uses all CPU it gets.

Crawler is a niced process, so it cannot take CPU from other processes,
but large memory usage and constant media access can still slow down
other applications, and it drains the battery really fast.

However, you stated that you don't have such symlinks, so more info
is still needed as requested in comment 7.
Comment 9 Matt Emson 2008-01-10 15:42:41 UTC
Having only ever used the latest OS2007 firmware, the OS2008 beta and the
OS2008 final, and certainly not being a power user, I can confirm that the
Metadata Crawler goes insane on a semi-regular basis. It consumes a lot of
processor time and ends up running the battery down really fast. It reduced a
tablet with 5+ days of standby to a powerless brick in need of a full charge
overnight.

So I became a power user. I gained root, removed the executable from the
equation (renamed it) and killed the running process, and now my battery life is
as expected. I get 3 - 4 hours of browsing and 5 - 8 days of standby.

Conclusion - the metadata crawler causes devices to regularly consume their
entire battery charge when idle and with both WiFi and screen turned off.

My system is N800, 2 memory cards (stock Nokia supplied in internal slot, 256MB
MMC in external.) MMC was reformatted before use.
Comment 10 Paul Gear 2008-02-08 09:38:52 UTC
This bug has affected my N800 on OS2008, with 2 x 2 GB cards (which both pass
fsck -f) installed.  I had to use the workaround described in
https://bugs.maemo.org/show_bug.cgi?id=978 (which is theoretically fixed) to
fix it.  I don't think #978 is as resolved as the bug report says...
Comment 11 Eero Tamminen nokia 2008-02-08 09:58:26 UTC
(In reply to comment #10)
> This bug has affected my N800 on OS2008, with 2 x 2 GB cards (which both pass
> fsck -f) installed.  I had to use the workaround described in
> https://bugs.maemo.org/show_bug.cgi?id=978 (which is theoretically fixed)
> to fix it.  I don't think #978 is as resolved as the bug report says...

The causes of it were fixed (except for the symlink issue which crept back),
but there can be many different things causing crawler to behave badly, most
of them related to a corrupted memory card file system.


Please provide the information requested in comment 7.  If we don't have
the information about what (in your case) caused crawler to behave badly,
we cannot fix it.
Comment 12 Paul Dundas (reporter) 2008-02-29 13:27:11 UTC
In my case I have discovered that the immediate cause of the problem was a
rogue install of rapier, which had run amok and created subdirectories of
subdirectories (probably thought it was following a link, and failed due to
FAT32). The result was about 13000 levels of subdirectory. Or was it ten times
that. Basically it filled the rest of an 8G SD card.

But the program probably should not react by grabbing all the memory, failing
further memory allocation, and being killed by OOM killer.

Will try reactivating process now that the millions of subdirectories have been
deleted, and report back (assuming I survive next week's skiing).
Comment 13 Frantisek Dufka maemo.org 2008-03-21 12:07:37 UTC
Not sure if I should create new bug for this. If yes, it should be called
"metalayer-crawler grabs RAM & CPU, paralysing device when booting from MMC
card"

The issue is that even when the crawler behaves relatively sensibly (i.e. a card
scan with a lot of media takes ~10 minutes with medium CPU load), the same system
cloned to MMC causes metalayer-crawler to index the same cards for hours with
100% load; see this post
http://www.internettablettalk.com/forums/showthread.php?p=157646#post157646 for
one example. I've seen a similar slowdown ratio on my device when booted from MMC,
but I have less media and have a habit of disabling metalayer-crawler completely.
This behaviour is similar/same for any 200x OS so far that has
metalayer-crawler.

Is this some (sqlite?) write caching issue that can be tuned to behave better
when running from an ext2 filesystem? If this is because of frequent syncing or no
write caching when updating the crawler database, maybe this is worth fixing even
when running from jffs2 (less flash wear)?
Comment 14 Eero Tamminen nokia 2008-03-31 13:10:15 UTC
FYI: due to reliability reasons, the crawler sqlite database is journaled.
This means that even queries to the database cause a flash write.  Would
that explain your issues when booting device from memory card?

What do strace/ltrace say when you trace crawler?
The earlier slowdowns have come from:
- corrupted memory cards which crawler didn't handle properly
- bad metadata extraction code in open source libid3
Maybe the gstreamer metadata extractors have some issues too?
Comment 15 Frantisek Dufka maemo.org 2008-04-01 13:48:31 UTC
(In reply to comment #14)
> FYI: due to reliability reasons, the crawler sqlite database is journaled.

Thanks for the tip. I googled a bit and found
http://www.sqlite.org/atomiccommit.html, then did a quick strace and saw lots of
creating/writing/deleting of /home/user/.meta_storage-journal and many fsync
calls. That might really slow things down. I wonder why all this is needed in
our case. Most or all data in the metalayer-crawler database is recreated after
every boot or every card insertion. IMO all that is really needed for
reliability is to detect an unclean shutdown (i.e. random reboot, power loss) and
recreate the crawler database from scratch in such a case. Then we don't need such
wasteful journaling at all.

> This means that even queries to the database cause a flash write.  Would
> that explain your issues when booting device from memory card?

Queries (=reading) cause journal write? Some bug in sqlite 3.4.1 used on
device?

> 
> What strace/ltrace say when you trace crawler?


I will do more tests with more media on the card and will try to recompile
libsqlite3-0 with the journal disabled to see how it affects the time with ext2 and
jffs2. Also I wonder why jffs2 is fast: is fsync on jffs2 a no-op, or is there
some other filesystem caching involved?

Anyway, I don't much like what sqlite is doing with the journal; for a media
database it seems like overkill to me.
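
For readers on a newer SQLite than the 3.4.1 mentioned above: later releases expose the rollback journal as a run-time setting (`PRAGMA journal_mode`), so a database that can be rebuilt from scratch need not pay for journaling. A minimal sketch using Python's `sqlite3` module follows. The `media` table and the file name are invented for illustration; the crawler's real schema is not known here.

```python
import os
import sqlite3
import tempfile


def open_rebuildable_cache(path):
    """Open a SQLite database with the rollback journal disabled.

    Only sensible for data that can be regenerated: after a crash or
    power loss in the middle of a write the file may be corrupt and
    has to be rebuilt.
    """
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is now in effect.
    mode = conn.execute("PRAGMA journal_mode=OFF").fetchone()[0]
    # Also skip the fsync() calls SQLite would otherwise issue on commit.
    conn.execute("PRAGMA synchronous=OFF")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media (path TEXT PRIMARY KEY, title TEXT)"
    )
    return conn, mode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn, mode = open_rebuildable_cache(os.path.join(tmp, "meta_storage.db"))
        with conn:  # commits on success
            conn.execute("INSERT INTO media VALUES (?, ?)", ("/media/a.mp3", "A"))
        print(mode, conn.execute("SELECT COUNT(*) FROM media").fetchone()[0])
        conn.close()
```

The trade-off is the one argued in this thread: no journal means no atomic commit, which is acceptable only if something else detects an unclean shutdown and recreates the database.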
Comment 16 Frantisek Dufka maemo.org 2008-04-01 14:42:50 UTC
(In reply to comment #15)
>  did quick strace and saw lots of
> creating/writing/deleting of /home/user/.meta_storage-journal and many fsync
> calls.

Just to clarify this, I have almost no media on the card, just scummvm games
and default maps (N810). Media player sees 27 media files on my card and 31 in
total (30 songs, 1 video in total, 3 songs + 1 video in internal flash, 27
songs on card). In strace of crawler after just opening and closing card doors
scan I have
28 times open("/home/user/.meta_storage-journal",
O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0644), 28 times unlink (i.e. delete) of this
journal file, 135 writes to journal, 27 reads of journal and 55 of fsync calls
to journal.

For each media on card this means 1 journal creation and removal, 2 fsyncs, 5
writes. All this traffic can be quite costly and not really needed.

Then I have additional 136 reads from /home/user/.meta_storage and 60 writes
(i.e. 2 writes per media) which is the actual traffic from/to media database.
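
A tally like the one above can be automated. The following is a rough sketch only: the sample lines are made up in the usual strace format and are not taken from any trace in this report, and the parser handles just the simple single-line forms shown (no `-f` PID prefixes, no unfinished calls).

```python
import re
from collections import Counter

JOURNAL = "/home/user/.meta_storage-journal"

# name(args) = result    e.g.  open("/path", O_RDWR) = 5
CALL_RE = re.compile(r'^(\w+)\((.*)\)\s+=\s+(-?\d+)')


def tally_journal_calls(lines, journal=JOURNAL):
    """Count syscalls that touch the journal file in strace output."""
    counts = Counter()
    journal_fds = set()          # descriptors currently open on the journal
    for line in lines:
        m = CALL_RE.match(line.strip())
        if not m:
            continue
        name, args, result = m.group(1), m.group(2), int(m.group(3))
        if name in ("open", "unlink"):
            if args.startswith('"%s"' % journal):
                counts[name] += 1
                if name == "open" and result >= 0:
                    journal_fds.add(result)
        elif name in ("read", "write", "fsync", "close"):
            fd_text = args.split(",", 1)[0].strip()
            if not fd_text.isdigit():
                continue
            fd = int(fd_text)
            if fd in journal_fds:
                if name == "close":
                    journal_fds.discard(fd)   # fd number may be reused later
                else:
                    counts[name] += 1
    return counts


# Illustrative input only, not real crawler output.
SAMPLE = '''open("/home/user/.meta_storage-journal", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0644) = 5
write(5, "..."..., 24) = 24
fsync(5) = 0
write(4, "..."..., 1024) = 1024
fsync(5) = 0
close(5) = 0
unlink("/home/user/.meta_storage-journal") = 0
'''.splitlines()

if __name__ == "__main__":
    for name, n in sorted(tally_journal_calls(SAMPLE).items()):
        print(name, n)
```

Tracking descriptors matters because `write` and `fsync` name only a file descriptor, so the journal's traffic has to be attributed through the preceding `open`.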
Comment 17 Eero Tamminen nokia 2008-04-01 15:48:47 UTC
> IMO all that is really needed for reliability is to detect unclean shutdown
> (i.e. random reboot, power loss) and recreate crawler database from scratch
> in such case. Then we don't need such wasteful journaling at all.

For device to know that shutdown was unclean, you would need device to retain
power to keep that information.  This happens on HW watchdog reboots, but
not when battery is disconnected.

As to storing information about clean shutdown (and clearing that on bootup),
that cannot happen to rootfs as that can get full.  But maybe that's not
a problem, as the re-create will then also fail. ;-)  Hm. If the database were
recreated on every boot, you might even keep the database on a RAM disk
if it doesn't grow too large (for this you will most likely need to
increase the /tmp tmpfs size).
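
The clean-shutdown flag discussed above could look roughly like the following. Everything here is hypothetical: the flag file, the `rebuild` callback and the boot/shutdown hook names are placeholders, and the sketch ignores the full-rootfs case.

```python
import os


def on_boot(flag_path, db_path, rebuild):
    """Rebuild the database unless the last shutdown left a clean flag.

    Returns True if a rebuild was triggered.
    """
    was_clean = os.path.exists(flag_path)
    if was_clean:
        os.remove(flag_path)      # clear it: this session is now "dirty"
    if not was_clean or not os.path.exists(db_path):
        rebuild(db_path)          # crash, power loss or first boot
        return True
    return False


def on_clean_shutdown(flag_path):
    """Record that the database was closed properly."""
    with open(flag_path, "w") as f:
        f.write("clean\n")
        f.flush()
        os.fsync(f.fileno())      # one sync at shutdown, not one per update
```

Because the signal is the absence of the flag, nothing has to be written at the moment power is lost; a pulled battery simply leaves no flag behind.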


> Also I wonder why jffs2 is fast, is fsync on jffs2 no-op or is there
> some other filesystem caching involved?

JFFS2 is run synchronously except for 2KB write buffer (which contains
compressed data so if the data is zeroes, there can be megs of it).
It's possible that because of this mostly synchronous behaviour & speed
reasons fsync() could be no-op, I don't know.
Comment 18 Frantisek Dufka maemo.org 2008-04-01 17:16:06 UTC
(In reply to comment #17)
> As to storing information about clean shutdown (and clearing that on bootup),
> that cannot happen to rootfs as that can get full. 

Well, shutdown (and boot) with a full rootfs may not be very clean anyway ;-) And
in this case it will not hurt much: when the device is full, the user will have a
bigger problem than an empty crawler database (if it is deleted but not recreated
due to the full disk). And currently it is re-created on each boot too (at least
for media on cards, since you cannot be sure the user did not remove the card
while the device was off).
And removing the database might even be healthy for the device when the disk is
full :-)

BTW I added gross hack to main.c, line 720, function sqlite3BtreeFactory, in
libsqlite3-0 package. It omits journal just for crawler database:

#define CRAWLER_JOURNAL_HACK 1
#ifdef CRAWLER_JOURNAL_HACK
  // gross hack for metalayer crawler: if the end of the file name
  // is ".meta_storage" then omit the journal file
  if (zFilename!=NULL){
    int nlen=strlen(zFilename);
    if (nlen>=13 && strcmp(".meta_storage",zFilename+(nlen-13))==0)
        omitJournal=1;
  }
#endif


I verified that it works and the journal is not used for metalayer-crawler, but I
did not measure the speedup. If anyone wants to measure the difference with a lot
of media on a card, please do. The package is here:
http://fanoush.wz.cz/maemo/libsqlite3-0_3.4.1-1osso3_armel.deb

First back up /usr/lib/libsqlite3.so.0.8.6, and after installation check that the
/usr/lib/libsqlite3.so.0 symlink points to the correct (new) file and reboot.
Uninstall by copying the old file back, or change the symlink for a temporary
switch.
Comment 19 Frantisek Dufka maemo.org 2008-04-01 23:59:11 UTC
Added some mp3 files to miniSD card in my N810 but had only 800MB free so total
number of media is now 194 clips. The difference in speed between sqlite with
and without using journal is there but is not that significant in my case. In
journaled mode a card scan takes 9-10 seconds, and with journaling disabled 5-6
seconds, measured with a stopwatch in media player after the card door is opened
and closed. The time is between the beginning of the "refreshing library" infoprint
and the updated number of clips showing in library view.

Unfortunately I cannot add more media easily right now. Also I'm booting system
from internal N810 mmc drive and media files are on external miniSD card so
there is no concurrent read/write access to same card in this case. In addition
ext2 block size on root filesystem on internal drive is only 1KB and I have
high-speed 48MHz MMC mode enabled so extra journal writing may be faster and
slowdown not so significant. I'll try to do some further tests with N800 with
bigger SD card and more media preferably on same card as root filesystem. But
it won't be exactly tomorrow, my N800 currently has OS2007 not 2008 so it means
rebuilding different libsqlite with different SDK. Let's hope we'll have some
numbers from the person having 10 minutes vs 7 hours difference
http://www.internettablettalk.com/forums/showthread.php?p=163343#post163343
Comment 20 Roger Sperberg 2008-04-02 23:18:22 UTC
I installed 2.2007.51-3 on my N810. I have a 4GB microSD-HD card and I copied a
number of books, some as zip archives, and songs there afterwards. I already
had some 70 or so songs on the card. (During the filing of this report, I ran
ScanDisk on the card and was told there were no problems with this card;
however, the internal memory card was reported as in use and not accessible by
Windows.)

I installed easyroot and load-applet, which I have used.

I also installed a number of apps that I have not launched at all: fbreader,
games, maemo-recorder, gizmo-project, camera, bzip2, unzip and gpscamera.

I also installed and then uninstalled without ever using something I recall
vaguely as being called PAN Bluetooth or something like that. However, I can't
find anything by that name in the installable apps, so perhaps it was MaemoPAN.

I created a symlink. I also created a /home/user/.fonts/ directory and moved a
.ttf file there.

I also did one other thing: using a binary for fontforge supplied by Matan (as
discussed in this forum thread at Internet Tablet Talk:
http://www.internettablettalk.com/forums/showthread.php?t=18560&highlight=fontforge
), I made the binary executable, moved it to /usr/bin/ and ran it a few times
to see how it worked (launching from XTerm). I made some screen captures, but
these don't show the load-applet icons for CPU & RAM so I don't know whether
the problems occurred before or after these sessions.

Since then I have had extreme behavior not even hinted at in these earlier
reports: metalayer-crawler is consuming 55 MB and the load has never been below
200%. A moment ago it showed 793%. The chief observable by-products are the
rapid battery depletion and a complete inability to connect to our home network
via WiFi. I closed the metalayer-crawler once, only to discover it had
reappeared a few minutes later. I turned the N810 off and restarted it more
than once.

It's not clear whether the symlink is what got this started.

As requested in comment 4, I entered ls -l /proc/1509/fd/ in XTerm, with these
results:

~ $ ls -l /proc/1509/fd/
lrwx------ 1 user users 64 Apr 2 16:40 0 -> /mnt/initfs/dev/null
lrwx------ 1 user users 64 Apr 2 16:40 1 -> /mnt/initfs/dev/console
lr-x------ 1 user users 64 Apr 2 16:40 10 -> /
lr-x------ 1 user users 64 Apr 2 16:40 11 -> /dev
lr-x------ 1 user users 64 Apr 2 16:40 12 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 13 -> /home/user/MyDocs
lr-x------ 1 user users 64 Apr 2 16:40 14 -> /
lr-x------ 1 user users 64 Apr 2 16:40 15 -> /dev
lr-x------ 1 user users 64 Apr 2 16:40 16 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 17 -> /home/user/MyDocs
lr-x------ 1 user users 64 Apr 2 16:40 18 -> /
lr-x------ 1 user users 64 Apr 2 16:40 19 -> /dev
lrwx------ 1 user users 64 Apr 2 16:40 2 -> /mnt/initfs/dev/console
lr-x------ 1 user users 64 Apr 2 16:40 20 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 21 -> /home/user/MyDocs
lr-x------ 1 user users 64 Apr 2 16:40 22 -> /
lr-x------ 1 user users 64 Apr 2 16:40 23 -> /dev
lr-x------ 1 user users 64 Apr 2 16:40 24 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 25 -> /home/user/MyDocs
lr-x------ 1 user users 64 Apr 2 16:40 26 -> /
lr-x------ 1 user users 64 Apr 2 16:40 27 -> /dev
lr-x------ 1 user users 64 Apr 2 16:40 28 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 29 -> /home/user/MyDocs
lrwx------ 1 user users 64 Apr 2 16:40 3 -> /home/user/.meta_storage
lr-x------ 1 user users 64 Apr 2 16:40 30 -> /
lr-x------ 1 user users 64 Apr 2 16:40 31 -> /dev
lr-x------ 1 user users 64 Apr 2 16:40 32 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 33 -> /home/user/MyDocs
lr-x------ 1 user users 64 Apr 2 16:40 34 -> /
lr-x------ 1 user users 64 Apr 2 16:40 35 -> /dev
lr-x------ 1 user users 64 Apr 2 16:40 36 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 37 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 38 -> /proc/1509/fd
lr-x------ 1 user users 64 Apr 2 16:40 39 -> /
lr-x------ 1 user users 64 Apr 2 16:40 4 -> pipe:[153996]
lr-x------ 1 user users 64 Apr 2 16:40 40 -> /proc
lr-x------ 1 user users 64 Apr 2 16:40 41 -> /proc/973
l-wx------ 1 user users 64 Apr 2 16:40 5 -> pipe:[153996]
lrwx------ 1 user users 64 Apr 2 16:40 6 -> socket:[153997]
lrwx------ 1 user users 64 Apr 2 16:40 7 -> socket:[153999]
lr-x------ 1 user users 64 Apr 2 16:40 8 -> inotify
lr-x------ 1 user users 64 Apr 2 16:49 9 -> /home/user/MyDocs
~ $

Thanks,

Roger Sperberg
Comment 21 Eero Tamminen nokia 2008-04-03 15:30:12 UTC
To Frantisek:
I tested crawler again and the journal is written only when some media file
is added to or deleted from the watched directories.  Writes don't seem to happen
when the database is just read.  I.e. my earlier comment about that was wrong or
obsolete.

If your bug happens only when rootfs is on a memory card, please file that
as a separate bug (you can put me on CC).


(In reply to comment #20)
> Since then I have had extreme behavior not even hinted at in these earlier
> reports: metalayer-crawler is consuming 55 MB and the load has never been
> below 200%. A moment ago it showed 793%.

And crawler is the process taking all/most CPU?


> It's not clear whether the symlink is what got this started.

A symlink in a suitable place can make crawler go crazy.
I think that after removing the symlink, you need to reboot the device
to get it behaving again.


> As requested in comment 4, I entered ls -l /proc/1509/fd/ in XTerm, with these
> results:

Unfortunately this doesn't show crawler being stuck on any particular file.
Could you install ltrace according to instructions here:
  http://maemo.org/development/tools/

And run it with:
  ltrace -S -p $(pidof metalayer-crawler)

To see what crawler is doing.  (if this is not because of the symlink)
Comment 22 Eero Tamminen nokia 2008-04-21 18:24:29 UTC
(In reply to comment #20)
> ~ $ ls -l /proc/1509/fd/
> lrwx------ 1 user users 64 Apr 2 16:40 0 -> /mnt/initfs/dev/null
> lrwx------ 1 user users 64 Apr 2 16:40 1 -> /mnt/initfs/dev/console
> lr-x------ 1 user users 64 Apr 2 16:40 10 -> /
> lr-x------ 1 user users 64 Apr 2 16:40 11 -> /dev
> lr-x------ 1 user users 64 Apr 2 16:40 12 -> /proc/1509/fd
> lr-x------ 1 user users 64 Apr 2 16:40 13 -> /home/user/MyDocs

I didn't notice earlier that it had its own FD directory open.
Has anybody else having a problem with the crawler noticed this when doing:
  ls -l /proc/$(pidof metalayer-crawler)/fd/
?
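
For anyone repeating this check, repeated targets are easier to spot when the listing is grouped. A small sketch follows (Linux only, it reads `/proc/<pid>/fd`; the function names are ad hoc and not part of any Maemo tool).

```python
import os
from collections import Counter


def fd_targets(pid):
    """Map each open descriptor of pid to the path it points at."""
    fd_dir = "/proc/%d/fd" % pid
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            pass                  # descriptor closed while we were looking
    return targets


def repeated_targets(targets, min_count=2):
    """Return {target: count} for targets open at least min_count times."""
    counts = Counter(targets.values())
    return {t: n for t, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Demonstrates the calls on this process; substitute the crawler's PID.
    for target, n in sorted(repeated_targets(fd_targets(os.getpid())).items()):
        print(n, target)
```

Run against the listing in comment 20, this kind of grouping would show `/`, `/dev`, `/proc/1509/fd` and `/home/user/MyDocs` each held open many times, which is the pattern asked about here.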
Comment 23 Paul Gear 2008-05-20 14:22:50 UTC
This continues to affect me on the latest OS2008 image for N800.  I reflashed
the device, restored my applications, and immediately upon reboot i get 70-100%
CPU to metalayer-crawler.  I've double-checked my SD cards with 'fsck.vfat
-fnvV' from my Debian etch desktop.  I haven't created any symlinks that i know
of.  I'll attach an strace from the metalayer process shortly.
Comment 24 Paul Gear 2008-05-20 14:38:53 UTC
Created an attachment (id=769)
strace from metalayer-crawler

uname -a output:
Linux Nokia-N800-51-3 2.6.21-omap1 #2 Fri Dec 7 11:17:13 EET 2007 armv6l
unknown

Looks like it's segfaulting on WMA parsing.
Comment 25 Eero Tamminen nokia 2008-05-21 19:00:39 UTC
(In reply to comment #23)
> This continues to affect me on the latest OS2008 image for N800.  I reflashed
> the device, restored my applications, and immediately upon reboot i get 
> 70-100% CPU to metalayer-crawler.

And this CPU usage never stops, not even 10-30 minutes later?
What if you don't have any card inserted, does it happen then?
(I.e. is the issue with something specific on the memory card)


(In reply to comment #24)
> Created an attachment (id=769)
> strace from metalayer-crawler
[...]
> Looks like it's segfaulting on WMA parsing.

Does the same file work fine in the default media player?
I.e. is this an issue in Gstreamer WMA plugin or in
crawler (+ WMA plugin combination)...

Is it looping on crashing and being re-started?
Comment 26 Paul Gear 2008-05-22 00:41:53 UTC
> ------- Comment #25 from eero.tamminen@nokia.com  2008-05-21 19:00 GMT+3 -------
> (In reply to comment #23)
>> This continues to affect me on the latest OS2008 image for N800.  I reflashed
>> the device, restored my applications, and immediately upon reboot i get 
>> 70-100% CPU to metalayer-crawler.
> 
> And this CPU usage never stops, not even 10-30 minutes later?

No.  Metalayer-crawler keeps getting restarted.

> What if you don't have any card inserted, does it happen then?

No.

> (I.e. is the issue with something specific on the memory card)

Yes, you can see the exact file name at the end of the trace.

> ...
>> Looks like it's segfaulting on WMA parsing.
> 
> Does the same file work fine in the default media player?
> I.e. is this an issue in Gstreamer WMA plugin or in
> crawler (+ WMA plugin combination)...

I don't know - i deleted the file and all is well.  I'll see if i can copy it
back to my N800 and try it again in the default media player.
Comment 27 Andre Klapper maemo.org 2008-06-04 13:06:50 UTC
According to the internal ticket this issue will be fixed in metalayer-crawler
1.3.8-1 released with Diablo (4.1).
Hence closing as fixed - please reopen if you can still reproduce this problem
with version 1.3.8-1 or newer.
Comment 28 Eero Tamminen nokia 2008-06-06 17:21:38 UTC
(In reply to comment #27)
> According to the internal ticket this issue will be fixed in
> metalayer-crawler 1.3.8-1 released with Diablo (4.1).
> Hence closing as fixed - please reopen if you can still
> reproduce this problem with version 1.3.8-1 or newer.

The internal bug is about symlinks and actually belongs to bug 1760
(moving & re-opening this one).  This bug has never been reproduced by us.
Comment 26 is the first clue to the issue, but we would need the WMA
file to find out what's wrong with it so that the corresponding Gstreamer
plugin can be fixed.

Frantisek's rootfs-on-mmc crawler issue should be reported as
a separate bug.
Comment 29 Andre Klapper maemo.org 2008-06-10 00:28:37 UTC
Paul: If you can reproduce the issue, can you send the offending wma file by
email to me (and maybe also to Eero if he is interested)?
I guess attaching it here may be a violation of rights. *sigh*

Frantisek: Can you please file a separate report about your rootfs-on-mmc
crawler issue and add the link as a comment here?

Eero: Thanks for clarifying. This bug seems to cover several different issues.

Thanks everybody!
Comment 30 Matt Johnston 2008-06-10 11:27:52 UTC
Given metalayer-crawler seems to hit various unplanned cases chewing memory,
would it be worth having a setrlimit(RLIMIT_DATA, ...) at the start so that at
least the tablet will remain responsive? (I assume that something restarts it
after a 5 minute delay etc)
Comment 31 Eero Tamminen nokia 2008-06-10 11:35:22 UTC
(In reply to comment #30)
> Given metalayer-crawler seems to hit various unplanned cases chewing memory,
> would it be worth having a setrlimit(RLIMIT_DATA, ...) at the start so that at
> least the tablet will remain responsive? (I assume that something restarts it
> after a 5 minute delay etc)

That would help only memory usage; the CPU/battery usage would still be there
as the process would abort and the SW watchdog would just restart it. I'd
prefer the actual bugs to be fixed rather than trying to kludge around them.

First we need a reproducible test-case / data though.
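
For reference, the cap proposed in comment 30 is a single call. The sketch below uses Python's `resource` module, which wraps the same setrlimit() interface; the helper name is made up, and as noted above this bounds memory only and does nothing about a crash-and-restart loop.

```python
import resource


def cap_data_segment(limit_bytes):
    """Set the soft RLIMIT_DATA of the current process.

    Returns the (soft, hard) pair now in effect. The hard limit is left
    alone, so the soft limit can still be raised again later.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_DATA)
    soft = limit_bytes
    if hard != resource.RLIM_INFINITY:
        soft = min(soft, hard)    # the soft limit may not exceed the hard one
    resource.setrlimit(resource.RLIMIT_DATA, (soft, hard))
    return soft, hard
```

A daemon would call something like `cap_data_segment(64 * 1024 * 1024)` once at start-up, before crawling (the 64 MB figure is arbitrary). How completely this bounds heap growth depends on the kernel and the allocator, since large allocations may be served by mmap rather than from the data segment.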
Comment 32 Eero Tamminen nokia 2008-06-12 19:00:50 UTC
(In reply to comment #31)
> First we need a reproducible test-case / data though.

ok, got it.  From the backtrace it seems more like an issue
in Gstreamer metadata code itself than in a WMA plugin.

Does anybody have any other problematic media files?
Comment 33 Eero Tamminen nokia 2008-06-25 12:30:39 UTC
As there were no other comments/test-cases for this, focusing this on WMA file
issues. Please file other bugs for other crawler issues (and also make them
block bug 2602).
Comment 34 Eero Tamminen nokia 2008-07-09 15:26:31 UTC
(In reply to comment #5)
> I'm running the beta (not the leaked version) of OS2008 on an N800, hw build
> 1301.  I've been experiencing the same condition: it is impossible to kill
> the process once it starts.

James, was it in "D" state i.e. disk sleep?

If yes, did any memory card reads or writes to the rootfs succeed
from other programs?  -> if not, this would be kernel issue in Chinook Beta.
Comment 35 Andre Klapper maemo.org 2008-08-08 12:54:02 UTC
James, can you answer Eero's questions?
Comment 36 Andre Klapper maemo.org 2008-09-22 16:34:54 UTC
After last weekend's Maemo Summit, it's clear that Nokia is working on an Open
Source replacement for metalayer-crawler based on Tracker, so this bug is
obsolete/invalid for Fremantle.
Hence I also don't expect much Metacrawler bugfixing for Diablo anymore, to be
realistic. This might be frustrating for Diablo users, but resources are
unfortunately limited.

I'm going to close this report as WONTFIX for Diablo (and INVALID for
Fremantle) soon if nobody has strong objections.
Comment 37 timeless 2008-10-13 18:13:56 UTC
eero, you said the stacktrace implicated gstreamer code, is this upstream and
thus still applicable even with the indexer change?
Comment 38 Eero Tamminen nokia 2008-10-13 18:29:08 UTC
(In reply to comment #37)
> eero, you said the stacktrace implicated gstreamer code, is this upstream and
> thus still applicable even with the indexer change?

Actually it was in the metalayer code using gstreamer (and will hopefully get
fixed in next Diablo release(s)).

Other issues than the WMA one should be filed as separate bug(s) and provide a
repeatable test-case, as was the case with WMA.
Comment 39 Andre Klapper maemo.org 2008-10-29 01:06:49 UTC
This will be fixed in libmetalayer 1.3.11-1.

I'm going to close this as FIXED once I know which build version will include
this (or a newer) libmetalayer version.
Comment 40 Andre Klapper maemo.org 2008-10-30 12:50:21 UTC
Fixed in package
libmetalayer 1.3.11-1
which is part of the internal build version
diablo build x.2008.43 
(Note that 2008 is the year and the number after is the week.)

Any public update released with or after this build version will include the
fix.
Please verify that the new version fixes the bug by marking this bug report as
VERIFIED after the public update has been released and if you have some time.
Comment 41 Andre Klapper maemo.org 2008-10-30 12:53:24 UTC
Correcting - this public bug was split into two separate, internal ones. Both
are fixed in metalayer-crawler 1.3.19-2, not 1.3.11-1. Sorry for the noise.
Comment 42 Andre Klapper maemo.org 2008-12-17 20:20:59 UTC
Fix for this should be included in today's 5.2008.43-7 SSU update.
Please verify the fix by marking this bug report as VERIFIED if you have some
time.