Bug 2772 - Metalayer Crawler adds all oggs from Map application (with large CPU and memory usage)
Status: CLOSED WONTFIX
Product: Data
Component: metalayer-crawler
Version: 4.0
Platform: All Linux
Importance: Low enhancement
Target Milestone: ---
Assigned To: Felipe Contreras
QA Contact: metatracker-bugs
Keywords: performance, use-time
 
Reported: 2008-01-14 09:36 UTC by Tuomas Kulve
Modified: 2009-09-26 11:35 UTC

Attachments
Strace output of the crawler daemon. (280.04 KB, text/plain)
2008-01-14 11:11 UTC, Tuomas Kulve
A new strace output (475.26 KB, text/plain)
2008-01-14 12:40 UTC, Tuomas Kulve
smaps for the crawler. (55.08 KB, text/plain)
2008-01-14 13:57 UTC, Tuomas Kulve



Description Tuomas Kulve (reporter) 2008-01-14 09:36:38 UTC
STEPS TO REPRODUCE THE PROBLEM:
1. Install ogg-support from http://maemo.org/downloads/product/OS2008/ogg/
2. Open Media Player

EXPECTED OUTCOME:
User should be able to decide not to see media files from certain directories
in the Library.

ACTUAL OUTCOME:
User is able to see over 1000 short ogg files from the Map application in the
Library.

REPRODUCIBILITY:
Always.

EXTRA SOFTWARE INSTALLED:
Ogg-support.

OTHER COMMENTS:
The same issue applies to applications that include lots of mp3s or other media
files, even without the ogg-support package.

The crawler has now been running actively for over an hour and is taking 41% of
the memory (VSZ 52316) according to top. It has added ~1400 oggs, most of which
are only 1 sec long according to the Media Player.
Comment 1 Eero Tamminen nokia 2008-01-14 10:35:19 UTC
> The crawler has now been running actively for over an hour and is taking 41%
> of the memory (VSZ 52316) according to top. It has added ~1400 oggs,
> most of which are only 1 sec long according to the Media Player.

I've tried to track down this kind of issue in the crawler before, but the bug
reporters have never replied.

Could you give me the following information about what the crawler is doing
when it behaves like this:
    ls -l /proc/$(pidof metalayer-crawler)/fd/

And preferably also some output from:
    strace -p $(pidof metalayer-crawler)
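To make the descriptor listing easier to scan for leaked files, something like the following could be used. This is only a sketch: `fd_targets` is a hypothetical helper name, not part of any maemo tool, and it assumes a POSIX shell with `readlink` available on the device.

```shell
# Print "<fd number> -> <target>" for every descriptor in a
# /proc/<pid>/fd style directory (hypothetical helper, not a maemo tool).
fd_targets() {
    fd_dir=$1
    for fd in "$fd_dir"/*; do
        # Entries under /proc/<pid>/fd are symlinks; skip anything else.
        [ -L "$fd" ] || continue
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# Example use on the device (counts descriptors pointing at ogg files):
#   fd_targets "/proc/$(pidof metalayer-crawler)/fd" | grep -c '\.ogg$'
```

A count that keeps growing between runs would suggest that files are opened and never closed.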
Comment 2 Tuomas Kulve (reporter) 2008-01-14 10:40:26 UTC
I already rebooted the device to see what the crawler does after boot.

This is how it looked after it was finally finished:
2829 user     SWN    58576   345  0.0 46.1 metalayer-crawl

And after boot:
1232 user     SWN    16596   345  0.0 13.0 metalayer-crawl

So at least it behaves OK after booting.

I'll try removing the .meta_storage.
Comment 3 Tuomas Kulve (reporter) 2008-01-14 11:11:36 UTC
Created an attachment (id=697)
Strace output of the crawler daemon.

(In reply to comment #1)
>         ls -l /proc/$(pidof metalayer-crawler)/fd/

This shows just 10 fds of which one points to the ogg currently being checked.

>         strace -p $(pidof metalayer-crawler)

I'm attaching an output from the strace. It didn't seem to show anything odd
either.
Comment 4 Eero Tamminen nokia 2008-01-14 12:31:34 UTC
> I'm attaching an output from the strace. It didn't seem to show anything odd either.

And this is from a crawler that consumed a lot of CPU and memory?

It really doesn't seem to be doing a lot.  Statting directory entries,
reading the meta_storage file and writing to GConf (guessed based on FDs
and content), as if it had already indexed the files (or didn't
think they needed indexing).

One of the worst problems in the first ITOS2007 was libid3 scanning
the whole mp3 when it didn't have metadata. I was guessing ogg
could have a similar problem, or else that something would indicate
directory looping.
Comment 5 Tuomas Kulve (reporter) 2008-01-14 12:40:02 UTC
Created an attachment (id=698)
A new strace output

(In reply to comment #4)
> > I'm attaching an output from the strace. It didn't seem to show anything odd either.
> 
> And this is from a crawler that consumed a lot of CPU and memory?

Yes.

> One of the worst problems in the first ITOS2007 was libid3 scanning
> the whole mp3 when it didn't have metadata. I was guessing ogg
> could have a similar problem, or else that something would indicate
> directory looping.

It's still unclear to me when the metalayer crawler finds the meta info for
oggs, but it does find it sometimes. And these files most likely don't
include the meta info.

I'm attaching a new strace output. Now the crawler takes 48M. Lots of mime
related files in the output.
Comment 6 Tuomas Kulve (reporter) 2008-01-14 13:10:35 UTC
This goes a bit OT, but..

I was about to install sp-endurance but:

Recommended packages:
  sp-smaps-measure

Package sp-smaps-measure is not available, but is referred to by another
package.

And:

- SMAPS data
 -> skipped as sp_smaps_snapshot is missing

I think the smaps data might be interesting in this case?

And a side note:

Nokia-N810-50-2:~# save-incremental-endurance-stats --help
Saving to --help/101:
mkdir: unrecognized option `--help/101'

;)
Comment 7 Eero Tamminen nokia 2008-01-14 13:34:10 UTC
Please file a separate bug about the sp-endurance (under tools).

But anyway, I think crawler memory usage is from heap (and based
on your earlier comments it's not leaking FDs), so sp-endurance
or smaps probably won't really tell anything new.  You might attach
the /proc/$(pidof metalayer-crawler)/smaps file though so that this
can be verified. :-)

You can monitor the heap usage with something like:
  grep -A 6 heap /proc/$(pidof metalayer-crawler)/smaps
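If only the dirty heap figure is of interest, it can be pulled out of the smaps data with awk. The following is a sketch that assumes the smaps layout shown by the kernel on this device (a mapping header line followed by `Name: value kB` lines); `heap_dirty_kb` is a hypothetical helper name.

```shell
# Print the Private_Dirty value (in kB) of the [heap] mapping
# from an smaps file given as the first argument.
heap_dirty_kb() {
    awk '
        # A mapping header looks like "00017000-00cb8000 rwxp ... [heap]".
        /^[0-9a-f]+-[0-9a-f]+ / { in_heap = ($NF == "[heap]"); next }
        # Inside the heap mapping, report its private dirty size and stop.
        in_heap && $1 == "Private_Dirty:" { print $2; exit }
    ' "$1"
}

# Example use on the device, sampling every 10 seconds:
#   while sleep 10; do
#       heap_dirty_kb "/proc/$(pidof metalayer-crawler)/smaps"
#   done
```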


> I'm attaching a new strace output. Now the crawler takes 48M. Lots of mime
> related files in the output.

Yes, it seems to be re-reading them quite a lot. I guess this happens in
the crawler code itself, and it could cache the data as there shouldn't be
that many mime-types(?).

It seems that first it reads the same ogg (for example:
"/media/mmc2/map/navicore/sounds/sound_fra/Roxanne Jean (CAN)/sright.ogg")
several times, then it does a lot of mime stuff access/parsing, then opens
the ogg one more time, writes an SQLite journal about it and then commits
(writes) that to meta_storage.

And this happens to many of the ogg files, if not to every one. 1400 files
is quite a lot of files, but >1 hour means that it takes close to 3 secs to handle
each ogg. Mime parsing for each ogg and parsing each ogg multiple times,
with an SQLite commit for each ogg, could actually explain that.
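The per-file figure is plain arithmetic on the numbers from the report. The sketch below takes "over an hour" as exactly 3600 seconds, so the result is a lower bound; `secs_per_file` is a hypothetical helper name.

```shell
# Average wall-clock seconds spent per file, rounded to one decimal.
secs_per_file() {
    awk -v secs="$1" -v files="$2" 'BEGIN { printf "%.1f\n", secs / files }'
}

secs_per_file 3600 1400   # prints 2.6
```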
Comment 8 Tuomas Kulve (reporter) 2008-01-14 13:57:37 UTC
Created an attachment (id=699)
smaps for the crawler.

(In reply to comment #7)
> Please file a separate bug about the sp-endurance (under tools).

bug #2774.

> or smaps probably won't really tell anything new.  You might attach
> the /proc/$(pidof metalayer-crawler)/smaps file though so that this
> can be verified. :-)

Attached.

> You can monitor the heap usage with something like:
>   grep -A 6 heap /proc/$(pidof metalayer-crawler)/smaps

It's now finished and that looks like this:

00017000-00cb8000 rwxp 00017000 00:00 0          [heap]
Size:             12932 kB
Rss:              12844 kB
Shared_Clean:         0 kB
Shared_Dirty:         0 kB
Private_Clean:        0 kB
Private_Dirty:    12844 kB


> 
> And this happens to many of the ogg files, if not to every one. 1400 files
> is quite a lot of files, but >1 hour means that it takes close to 3 secs to handle
> each ogg. Mime parsing for each ogg and parsing each ogg multiple times,
> with an SQLite commit for each ogg, could actually explain that.

There are >1900 oggs included in Map, so all N810s have those.

It doesn't do that very intensively but sleeps every now and then (to avoid
hogging resources?) so the wall clock time doesn't say much.
Comment 9 Eero Tamminen nokia 2008-01-14 14:37:57 UTC
(In reply to comment #8)
> > You can monitor the heap usage with something like:
> >   grep -A 6 heap /proc/$(pidof metalayer-crawler)/smaps
> 
> It's now finished and that looks like this:
> 
> 00017000-00cb8000 rwxp 00017000 00:00 0          [heap]
> Size:             12932 kB
> Rss:              12844 kB
> Shared_Clean:         0 kB
> Shared_Dirty:         0 kB
> Private_Clean:        0 kB
> Private_Dirty:    12844 kB

In the original report you said "VSZ 52316". I wonder where
the rest goes (about the same amount goes to shared
libs).  Does it do some larger allocs too?

This would show the sum of all private dirty memory:
awk '/^Private_Dirty/{m+=$2}END{print m}' /proc/$(pidof metalayer-crawler)/smaps
Comment 10 Tuomas Kulve (reporter) 2008-01-14 14:49:08 UTC
(In reply to comment #9)

> In the original report you said "VSZ 52316". I wonder where
> the rest goes (about the same amount goes to shared

Now:

Top:
56596   345  0.0 44.5 metalayer-crawl

smaps:
Size:             12932 kB


> libs).  Does it do some larger allocs too?


Nokia-N810-50-2:~# awk '/^Private_Dirty/{m+=$2}END{print m}' /proc/$(pidof metalayer-crawler)/smaps
22692
Comment 11 Eero Tamminen nokia 2008-01-14 15:52:49 UTC
(In reply to comment #10)
> Top:
> 56596   345  0.0 44.5 metalayer-crawl
> 
> smaps:
> Size:             12932 kB
> 
> Nokia-N810-50-2:~# awk '/^Private_Dirty/{m+=$2}END{print m}' /proc/$(pidof metalayer-crawler)/smaps
> 22692

Could you attach the whole smaps file?  It seems to be dirtying 8MB
more memory from somewhere else than heap.
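One way to see which mappings account for the difference is to rank them by Private_Dirty. The following is a sketch assuming the usual smaps layout; `top_dirty_mappings` is a hypothetical helper name, and mappings without a name are labelled `[anon]` here purely for readability.

```shell
# List the mappings with the most private dirty memory, largest first.
# $1: smaps file, $2: number of mappings to show (default 5).
top_dirty_mappings() {
    awk '
        # Remember the address range and name of the current mapping.
        /^[0-9a-f]+-[0-9a-f]+ / {
            range = $1
            name = (NF >= 6) ? $6 : "[anon]"
            next
        }
        # Emit "<kB> <range> <name>" for mappings that dirtied memory.
        $1 == "Private_Dirty:" && $2 > 0 { print $2, range, name }
    ' "$1" | sort -rn | head -n "${2:-5}"
}

# Example use on the device:
#   top_dirty_mappings "/proc/$(pidof metalayer-crawler)/smaps"
```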
Comment 12 Tuomas Kulve (reporter) 2008-01-14 16:07:34 UTC
(In reply to comment #11)
> Could you attach the whole smaps file?  It seems to be dirtying 8MB
> more memory from somewhere else than heap.

Didn't I do that in comment #8? The attachment #699?
Comment 13 Eero Tamminen nokia 2008-01-14 16:23:02 UTC
> Didn't I do that in comment #8? The attachment #699?

So it seems, thanks. :o)

40a00000-40f00000 rw-p 40a00000 00:00 0 
Size:              5120 kB
Rss:               5120 kB
Shared_Clean:         0 kB
Shared_Dirty:         0 kB
Private_Clean:        0 kB
Private_Dirty:     5120 kB

42e00000-43229000 rw-p 42e00000 00:00 0 
Size:              4260 kB
Rss:               4260 kB
Shared_Clean:         0 kB
Shared_Dirty:         0 kB
Private_Clean:        0 kB
Private_Dirty:     4260 kB

Could these be something that the ogg/vorbis/gstreamer combination allocates
(and dirties)? Are they absent without oggs?
Comment 14 Tuomas Kulve (reporter) 2008-01-14 16:28:10 UTC
(In reply to comment #13)
> Could these be something that the ogg/vorbis/gstreamer combination allocates
> (and dirties)? Are they absent without oggs?

I don't have a clue how the crawler works. 

This is what ogg-support package adds to crawler:

Nokia-N810-50-2:~# grep ogg /usr/share/libmetalayer/metadata_lib.conf 
ogg libmtext_gst

I tried asking what that actually means, but I never got any replies.
Comment 15 Eero Tamminen nokia 2008-01-14 16:44:08 UTC
(In reply to comment #14)
> This is what ogg-support package adds to crawler:
> 
> Nokia-N810-50-2:~# grep ogg /usr/share/libmetalayer/metadata_lib.conf 
> ogg libmtext_gst
> 
> I tried asking what that actually means, but I never got any replies.

Based on the "Extractors" comment in that conf file and:
# dpkg -S libmtext_gst
libmetalayer0: /usr/lib/libmtext_gst.so.0

I would guess that it is a crawler library for extracting media file
information using GStreamer.  There seem to be other formats listed
as requiring libmtext_gst too, so if that leaks (or otherwise has
large memory usage), those other file formats are going to have
the same issue.
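To see which formats would be affected, the conf file can be filtered by extractor library. This is a sketch that assumes every non-comment line has the form `<extension> <library>`, as the single `ogg libmtext_gst` line quoted above suggests; the real file may contain other kinds of lines, and `formats_for_extractor` is a hypothetical helper name.

```shell
# Print the extensions mapped to a given extractor library.
# $1: library name, $2: path to the conf file.
formats_for_extractor() {
    awk -v lib="$1" '!/^#/ && $2 == lib { print $1 }' "$2"
}

# Example use on the device:
#   formats_for_extractor libmtext_gst /usr/share/libmetalayer/metadata_lib.conf
```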
Comment 16 Eero Tamminen nokia 2008-01-14 18:31:23 UTC
The crawler could support something like a ".noindex" file for excluding certain
directories from scanning.  Map could then use that, but it would need
to be a maemo standard so that 3rd party crawlers would also use it.

Is there any (open source or other) standard for excluding directories from
scanning?
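As an illustration of the idea only, a marker-file check could look like the sketch below. The `.noindex` name is the hypothetical convention proposed here, not an existing maemo standard, and `list_indexable_oggs` is a made-up helper; the actual crawler is not a shell script.

```shell
# Print the *.ogg files under a root directory, skipping every directory
# tree that contains a ".noindex" marker file (hypothetical convention).
list_indexable_oggs() {
    find "$1" -type d -exec sh -c 'test -e "$1/.noindex"' sh {} \; -prune \
        -o -type f -name '*.ogg' -print
}

# Example use on the device:
#   list_indexable_oggs /media/mmc2
```

With a `.noindex` file placed in the Map sound directory, none of the files below it would be listed, while oggs elsewhere on the card would still be found.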
Comment 17 Andre Klapper maemo.org 2008-09-22 16:34:57 UTC
After last weekend's Maemo Summit, it's clear that Nokia is working on an Open
Source replacement for metalayer-crawler based on Tracker, so this bug is
obsolete/invalid for Fremantle.
Hence I also don't expect much Metacrawler bugfixing for Diablo anymore, to be
realistic. This might be frustrating for Diablo users, but resources are
unfortunately limited.

I'm going to close this report as WONTFIX for Diablo (and INVALID for
Fremantle) soon if nobody has strong objections.
Comment 18 Andre Klapper maemo.org 2008-11-06 21:32:20 UTC
Tuomas, I assume this is why you've added the option
     Settings > Control Panel > Extras > Ignore Maps' OGGs
to ogg-support?

Anyway, WONTFIX as we switch to Tracker for Fremantle.
Comment 19 Tuomas Kulve (reporter) 2008-11-07 07:59:15 UTC
(In reply to comment #18)
> Tuomas, I assume this is why you've added the option
>      Settings > Control Panel > Extras > Ignore Maps' OGGs
> to ogg-support?

Yes.

> Anyway, WONTFIX as we switch to Tracker for Fremantle.

..
Comment 20 Tuomas Kulve (reporter) 2009-09-26 11:35:01 UTC
I'm closing this bug as it's already verified.