Bug 3357 - Modest lingers in memory using 100% CPU
: Modest lingers in memory using 100% CPU
Status: RESOLVED FIXED
Product: Email
General
: 4.1 (4.2008.23-14)
: ARM Maemo
: High major with 11 votes (vote)
: 4.1.1
Assigned To: unassigned
: modest-bugs
:
: moreinfo, use-time
: int-86286
:
  Show dependency tree
 
Reported: 2008-07-01 11:08 UTC by luarvique
Modified: 2008-12-06 15:24 UTC (History)
3 users (show)

See Also:


Attachments


Note

You need to log in before you can comment on or make changes to this bug.


Description luarvique (reporter) 2008-07-01 11:08:53 UTC
SOFTWARE VERSION:
Latest 4.1 firmware from Nokia (aka Diablo).

STEPS TO REPRODUCE THE PROBLEM:
Create two email accounts: a GMail IMAP account and an arbitrary POP3 account.
Look at the process list regularly, using htop, top, or CPU load applet.

EXPECTED OUTCOME:
Modest should not be running, except when it is fetching email.

ACTUAL OUTCOME:
Modest is running and usually (but not always) using 100% CPU.

REPRODUCIBILITY:
sometimes

EXTRA SOFTWARE INSTALLED:

OTHER COMMENTS:

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9)
Gecko/2008052906 Firefox/3.0
Comment 1 Andre Klapper maemo.org 2008-07-01 14:52:37 UTC
Thanks for reporting this!

I assume you waited for 10 minutes to make sure that this is wasn't any time
taking resource freeing?

Does this also happen if you are not using SSL?
Can you provide us a wireshark or pcap log?

Can you maybe strace on the hanging process and post the information here?

Can you kill the hanging process with signal=11 to provide us with a core dump?
If i remember correctly it requires installing the "sp-rich-core" package from
the tools repository[1] and creating a "core-dumps" directory to the memory
card with at least few megabytes of free space. You can get core dumps also
without the rich-core package, but they are much less useful.
[1] http://maemo.org/development/tools/ 

Any of these would be helpful to track down the problem. Thanks in advance!



(Note to myself: int-66200 looks similar but has been fixed before Diablo.)
Comment 2 luarvique (reporter) 2008-07-01 16:02:26 UTC
> I assume you waited for 10 minutes to make sure that this is wasn't any time
> taking resource freeing?
Waited for a night. Woke up, found that the tablet is warm to the
touch, looked at CPU Load Applet process list, and found Modest eating
318% CPU (yes, the load applet is kind of imprecise about it).
Encountered this bug several more times since then.

> Does this also happen if you are not using SSL?
I do not know: Google account settings default to SSL.

> Can you provide us a wireshark or pcap log?
I can try, but do not expect to find anything interesting there: it
does not look like Modest is trying to send anything while hanging.

> Can you maybe strace on the hanging process and post the information here?
Do not have strace installed, afaik.

> Can you kill the hanging process with signal=11 to provide us with a core dump?
> If i remember correctly it requires installing the "sp-rich-core" package from
> the tools repository[1] and creating a "core-dumps" directory to the memory
> card with at least few megabytes of free space. You can get core dumps also
> without the rich-core package, but they are much less useful.
> [1] http://maemo.org/development/tools/
Will try tonight but can't promise.
Comment 3 Andre Klapper maemo.org 2008-07-01 16:26:16 UTC
(In reply to comment #2)
> Do not have strace installed, afaik.
Can you please install strace? Also see http://maemo.org/development/tools/

> Will try tonight but can't promise.
Thanks in advance!
Comment 4 Kamen Bundev 2008-07-06 14:15:14 UTC
Can confirm that, happened to me at least 5-6 times. IMAP to GMail. Hangs until
killed. Will try to investigate more next time.

Thread about it:
http://www.internettablettalk.com/forums/showthread.php?t=21640
Comment 5 Danilo Câmara 2008-07-06 17:10:51 UTC
Happens to me with just one IMAP GMail account. Already installed strace and
sp-rich-core packages. Will try to get the strace output and core-dump next
time.

By the way, I created core-dumps in the root (/) directory owned by root. Do I
need to run strace and also kill the process as root?
Comment 6 Danilo Câmara 2008-07-06 17:11:49 UTC
*** This bug has been confirmed by popular vote. ***
Comment 7 Eero Tamminen nokia 2008-07-07 10:25:16 UTC
(In reply to comment #5)
> Happens to me with just one IMAP GMail account. Already installed strace and
> sp-rich-core packages. Will try to get the strace output and core-dump next
> time.
> 
> By the way, I created core-dumps in the root (/) directory owned by root.

The "core-dumps" directory for core dumps should be on the *memory card* root,
that's how the core pattern has been configured from /proc/ in the devices.
I.e. just having that directory "enables" core dumping.

If you attach core dumps to here, you should install "sp-rich-core" package,
as we need "rich cores" to:
- get system information to the core dumps (exact release version,
  installed packages, memory usage, open files etc)
- for core dumps to be compressed (core dump includes whole process
  memory area, not just memory used/written by it.  As Modest uses
  threads, uncompressed cores may in worst case be hundreds of MBs)


If you're worried that that core-dump gives out private information
(e.g. email account & password) for us, you can use the core dump to
get the backtrace yourself.  For that you don't need sp-rich-core,
but you need to install gdb and debug packages. Easiest that is done
by installing "maemo-debug-scripts" (I think it's in SDK repository?)
and then doing:
    debug-dep-install /usr/bin/maemo-launcher /usr/bin/modest.launch

(without debug symbols backtraces either don't work or give bogus information)

And then:
    gdb /usr/bin/maemo.launcher <Modest core dump>


Note: With normal cores the core dump is named after maemo-launcher
as that was the actual running binary (the core dump named "modest"
is the maemo-invoker symlinked to "/usr/bin/modest").  sp-rich-core
takes care of dumping only the relevant one (launcher) and with name
(in this case "modest") that makes sense to user...

But from "rich cores" the actual core dump needs to be extracted with
rich-core-extract utility that you get from sp-rich-core-postproc package,
see:
    http://maemo.org/development/tools/


As this can be a bit confusing and is a bit more work, I would suggest
trying strace first:
    strace -p <PID using most CPU>

If that doesn't give any output (which means that
Modest is busylooping by itself), then core dump is needed.


> Do I need to run strace and also kill the process as root?

As the same user under which the process itself runs or as root.
You can do the killing e.g. like this:
    kill -SIGABRT $(pidof modest|cut -d' ' -f1)
Comment 8 Danilo Câmara 2008-07-07 17:42:19 UTC
I ran 'strace' as user and 'kill' as root (in another terminal) but wasn't able
to get the core dump. Output was:

$ top
Mem: 124716K used, 2080K free, 0K shrd, 7548K buff, 40424K cached
Load average: 1.01 1.01 0.99
  PID USER     STATUS   VSZ  PPID %CPU %MEM COMMAND
 9807 user     SW     70416  7075 98.2 55.4 maemo-launcher

$ ps | grep 9807
 9807 user      70416 SW  /usr/bin/modest

$ strace -p 9807
Process 9807 attached - interrupt to quit
futex(0x4017f07c, FUTEX_WAIT, 2, NULL)  = -1 EINTR (Interrupted system call)
--- SIGABRT (Aborted) @ 0 (0) ---
Process 9807 detached

# kill -SIGABRT $(pidof modest|cut -d' ' -f1)
(no output)

I am wondering if I need a reboot after creating /core-dumps directory or if it
isn't in the right place. Where exactly is "*memory card* root" on the N810?
'mount' and 'df' output is:

# mount
rootfs on / type rootfs (rw)
/dev/root on /mnt/initfs type jffs2 (ro)
none on /mnt/initfs/proc type proc (rw)
none on /mnt/initfs/sys type sysfs (rw)
none on /mnt/initfs/tmp type tmpfs (rw)
/dev/mtdblock4 on / type jffs2 (rw,rpsize=1024,rpuid=0,rpuid=30000)
none on /tmp type tmpfs (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
none on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
/dev/mmcblk0p1 on /media/mmc2 type ext3 (rw,data=ordered)
/dev/mmcblk1p1 on /media/mmc1 type vfat
(rw,nosuid,nodev,noexec,uid=29999,fmask=0133,dmask=0000,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8)

# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/mtdblock4            4096      2620      1476  64% /mnt/initfs
none                       512       188       324  37% /mnt/initfs/tmp
/dev/mtdblock4          255488    141500    113988  55% /
none                       512       188       324  37% /tmp
none                      1024        60       964   6% /dev
tmpfs                     1024         0      1024   0% /dev/shm
/dev/mmcblk0p1         1975496    405404   1469744  22% /media/mmc2
/dev/mmcblk1p1         5958780   4986112    972668  84% /media/mmc1
Comment 9 Eero Tamminen nokia 2008-07-08 09:29:32 UTC
(In reply to comment #8)
> I ran 'strace' as user and 'kill' as root (in another terminal)

Thanks.


> but wasn't able to get the core dump. Output was:
> $ top
> Mem: 124716K used, 2080K free, 0K shrd, 7548K buff, 40424K cached
> Load average: 1.01 1.01 0.99
>   PID USER     STATUS   VSZ  PPID %CPU %MEM COMMAND
>  9807 user     SW     70416  7075 98.2 55.4 maemo-launcher
> 
> $ ps | grep 9807
>  9807 user      70416 SW  /usr/bin/modest
> 
> $ strace -p 9807
> Process 9807 attached - interrupt to quit
> futex(0x4017f07c, FUTEX_WAIT, 2, NULL)  = -1 EINTR (Interrupted system call)

This means that process sleeps infinitely until the given address
has value of 2.  I.e. the process cannot take CPU while it's in
this syscall.

Wasn't there any other output from strace besides that one line you pasted
here?

(You could also try "ltrace -S -p <pid>" instead of strace)


> --- SIGABRT (Aborted) @ 0 (0) ---
> Process 9807 detached

Was this ABRT from killing the process?
(you can interrupt strace/ltrace before doing the killing)


> # kill -SIGABRT $(pidof modest|cut -d' ' -f1)
> (no output)
> 
> I am wondering if I need a reboot after creating /core-dumps directory
> or if it isn't in the right place.

No need to reboot.  If you don't have sp-rich-core installed, you can see
the exact place where the directory should be from:
    cat /proc/sys/kernel/core_pattern

If you've installed sp-rich-core, then core_pattern just notes that
core data is piped to "rich-core-dumper" (script) which will check which
of the memory cards (/media/mmc[12]) has "core-dumps" directory and
at least several MBs of free space.

Also "ulimit -c" (core dump size) should report "unlimited".
(and cards be writable like yours are according to mount output)


Where exactly is "*memory card* root" on the N810?
> 'mount' and 'df' output is:
[...]
> /dev/mmcblk0p1         1975496    405404   1469744  22% /media/mmc2
> /dev/mmcblk1p1         5958780   4986112    972668  84% /media/mmc1

in the mmc[12] directories.  In File manager you could just tap to
memory card icon on the left and create the "core-dumps" dir there.
Comment 10 Danilo Câmara 2008-07-08 18:31:22 UTC
I left the device in stand-by (connected to the charger) and this morning again
modest where in 100% CPU.

I noticed that I couldn't connect from the desktop to the device in SSH. This
happen the other time too, but I thought it was because the device was out of
resources.

(In reply to comment #9)
> > $ strace -p 9807
> > Process 9807 attached - interrupt to quit
> > futex(0x4017f07c, FUTEX_WAIT, 2, NULL)  = -1 EINTR (Interrupted system call)
> 
> This means that process sleeps infinitely until the given address
> has value of 2.  I.e. the process cannot take CPU while it's in
> this syscall.

Well... it was taking CPU.

Today strace output was:

$ strace -p 11342
Process 11342 attached - interrupt to quit
futex(0x4017f07c, FUTEX_WAIT, 2, NULL <unfinished ...>
Process 11342 detached

This time I interrupted strace with ^C

> 
> Wasn't there any other output from strace besides that one line you pasted
> here?
> 
> (You could also try "ltrace -S -p <pid>" instead of strace)

ltrace wasn't installed and network was not available (as I said above about
SSH). I decided not to check the network and continue.

> > --- SIGABRT (Aborted) @ 0 (0) ---
> > Process 9807 detached
> 
> Was this ABRT from killing the process?

Yes

> Where exactly is "*memory card* root" on the N810?
> > 'mount' and 'df' output is:
> [...]
> > /dev/mmcblk0p1         1975496    405404   1469744  22% /media/mmc2
> > /dev/mmcblk1p1         5958780   4986112    972668  84% /media/mmc1
> 
> in the mmc[12] directories.  In File manager you could just tap to
> memory card icon on the left and create the "core-dumps" dir there.

Done that. This time I got the core-dump.

Although I changed my e-mail password for this test (so I could change back
after the core-dump), my previous password (used elsewhere) is also in the
core-dump. I could remove it, but wouldn't be sure what other information would
be left.

I will reflash the device again and create a new GMail account with IMAP
access. I think the problem will persist.
Comment 11 Eero Tamminen nokia 2008-07-09 11:48:07 UTC
(In reply to comment #10)
> > > futex(0x4017f07c, FUTEX_WAIT, 2, NULL)
> > 
> > This means that process sleeps infinitely until the given address
> > has value of 2.  I.e. the process cannot take CPU while it's in
> > this syscall.
> 
> Well... it was taking CPU.

Hm.  As Modest is threading, I'm wondering whether it's possible that
strace somehow isn't showing the information for all threads or attaches
only to one.


Could you install also "gdb" (from same place as "strace"):
    apt-get install gdb

and debug symbols required by it for getting the thread information:
    apt-get install libc6-dbg
(I think for libc6-dbg you need to enable SDK repository temporarily)

After this you can remove some unnecessary files to save several
MBs of disk space (rest of gdb and debug symbols take about 2MB
of disk as JFFS2 compresses it):
    rm /usr/bin/gdbtui
    rm /usr/lib/debug/*.*


Then please do the following and paste the "info threads" output here:
    gdb /usr/bin/maemo-launcher $(pidof modest|cut -d' ' -f1)
...
    (gdb) info threads
?

Thanks!


Note: Gdb cannot give (correct) backtraces without additional debug symbols
but with all the debug symbols for Modest Gdb would need more RAM than is
available in the device.  So let's start with "info threads"...
Comment 12 Danilo Câmara 2008-07-10 14:58:50 UTC
I created a test GMail account and set a server to send a hourly mail to it.
After more than 24h without a problem in modest, I switch back to my personal
account. The problem may be related to my GMail account. Maybe because it has
thousands of mails in INBOX, but I use the client without any problems to read
my mail... until the background process get 100% CPU.

Today's output:

# top
Mem: 108520K used, 18276K free, 0K shrd, 196K buff, 47168K cached
Load average: 1.08 1.02 0.99
  PID USER     STATUS   VSZ  PPID %CPU %MEM COMMAND
 2999 user     SW     70400   983 98.2 55.4 maemo-launcher

# ps | grep 2999
 2999 user      70400 SW  /usr/bin/modest

# strace -p 2999
Process 2999 attached - interrupt to quit
futex(0x4017f07c, FUTEX_WAIT, 2, NULL <unfinished ...>
Process 2999 detached

# ltrace -S -p 2999
SYS_futex(0x4017f07c, 0, 2, 0, 0x4017f07c)       = -4

# gdb /usr/bin/maemo-launcher 2999
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
Attaching to program: /usr/bin/maemo-launcher, process 2999
/home/bifh1/rx34-osso1.1-arm-prereleased.gcc34qemu.distcc/work/gdb-6.6.dfsg/gdb/linux-nat.c:1026:
internal-error: linux_nat_attach: Assertion `pid == GET_PID (inferior_ptid) &&
WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

/home/bifh1/rx34-osso1.1-arm-prereleased.gcc34qemu.distcc/work/gdb-6.6.dfsg/gdb/linux-nat.c:1026:
internal-error: linux_nat_attach: Assertion `pid == GET_PID (inferior_ptid) &&
WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) y

A program is being debugged already.  Kill it? (y or n) n
Program not killed.
(gdb) info threads
Cannot access memory at address 0x2750
warning: Couldn't restore frame in current thread, at frame 0
0x400d2b00 in ?? ()
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/bin/maemo-launcher, process 2999
Comment 13 Eero Tamminen nokia 2008-07-10 16:37:33 UTC
/home/bifh1/rx34-osso1.1-arm-prereleased.gcc34qemu.distcc/work/gdb-6.6.dfsg/gdb/linux-nat.c:1026:
> internal-error: linux_nat_attach: Assertion `pid == GET_PID (inferior_ptid) &&
> WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP' failed.

Could you try "info threads" also on core dump?  Instead of the process
PID, just give Gdb the core dump file name.

And also check how many threads Modest has and their statuses:
    egrep '^S[lt]' /proc/PID/task/*/status
?


> I created a test GMail account and set a server to send a hourly mail to it.
> After more than 24h without a problem in modest, I switch back to my personal
> account. The problem may be related to my GMail account. Maybe because it
> has thousands of mails in INBOX, but I use the client without any problems
> to read my mail... until the background process get 100% CPU.

Thanks for looking into this, I know it's pretty large effort, so it's
appreciated!

If Gdb doesn't work on core dump either, then we don't have any means to
find out where in code this bug happens you.  Any additional information
you might notice about pinpointing the exact thing triggering this issue
can help.
Comment 14 Kamen Bundev 2008-07-10 16:45:26 UTC
I keep only several mails in my Inbox, so probably its not related to the
number of mails, but maybe their content. 

It didn't happen for me in a while, so sorry that i can't help on that.
Comment 15 Danilo Câmara 2008-07-10 17:34:10 UTC
After running gdb (above) the process returned back to 0% CPU. Still PID 2999,
I didn't kill it.

# egrep '^S[lt]' /proc/2999/task/*/status
/proc/2999/task/2999/status:State:    T (stopped)
/proc/2999/task/2999/status:SleepAVG:    91%
/proc/2999/task/3006/status:State:    T (stopped)
/proc/2999/task/3006/status:SleepAVG:    91%
/proc/2999/task/3007/status:State:    T (stopped)
/proc/2999/task/3007/status:SleepAVG:    0%

... but maybe you want it when in 100% CPU

I have two core-dumps in the device (GMT-3 time):

# ls -l /media/mmc1/core-dumps/
-rw-r--r--    1 user     root       158571 Jul 10 08:40 gdb-6-3051.rcore.lzo
-rw-r--r--    1 user     root      1077867 Jul  9 19:45
modest-11-1941.rcore.lzo

I don't know about modest-11-1941.rcore.lzo. It's not the one I created with
'kill -SIGABRT' in comment #8.

How to run gdb with the core dump file? I tried:

# gdb /usr/bin/maemo-launcher /media/mmc1/core-dumps/gdb-6-3051.rcore.lzo 
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
"/media/mmc1/core-dumps/gdb-6-3051.rcore.lzo" is not a core dump: File format
not recognized
(gdb) info threads
No registers.
(gdb) quit

# gdb /usr/bin/maemo-launcher /media/mmc1/core-dumps/modest-11-1941.rcore.lzo 
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
"/media/mmc1/core-dumps/modest-11-1941.rcore.lzo" is not a core dump: File
format not recognized
(gdb) info threads
No registers.
(gdb) quit

Even with the lzo file decompressed I got the same message: "File format not
recognized"
Comment 16 Eero Tamminen nokia 2008-07-10 17:56:25 UTC
(In reply to comment #15)
> After running gdb (above) the process returned back to 0% CPU.

0% CPU even after you detached/quit Gdb?


> Still PID 2999, I didn't kill it.
> 
> # egrep '^S[lt]' /proc/2999/task/*/status
> /proc/2999/task/2999/status:State:      T (stopped)
> /proc/2999/task/2999/status:SleepAVG:   91%
> /proc/2999/task/3006/status:State:      T (stopped)
> /proc/2999/task/3006/status:SleepAVG:   91%
> /proc/2999/task/3007/status:State:      T (stopped)
> /proc/2999/task/3007/status:SleepAVG:   0%
>
> ... but maybe you want it when in 100% CPU

Yes, please.  T would indicate you're still tracing the process
with something (strace, gdb...) and that's why it's stopped.

The last thread has SleepAVG 0%, so that might have been the
busy one.   Could you next time check with the above grep
command which of the threads is not sleeping, and then strace
that, or even all of them (you can give multiple -p options
for strace)?


> How to run gdb with the core dump file? I tried:
> 
> # gdb /usr/bin/maemo-launcher /media/mmc1/core-dumps/gdb-6-3051.rcore.lzo 
[...]
> "/media/mmc1/core-dumps/modest-11-1941.rcore.lzo" is not a core dump: File
> format not recognized
> (gdb) info threads
> No registers.
> (gdb) quit
> 
> Even with the lzo file decompressed I got the same message: "File format not
> recognized"

Sorry, I forgot to mention about that.  You can extract the rcore.lzo
file contents with "rich-core-extract" tool from the sp-rich-core-postproc
package.

(If you look at the decompressed .rcore file, you can see that the file
is human readable and has very simple structure, core dump data is at
the end of it.)
Comment 17 Eric Warnke maemo.org 2008-07-10 19:03:34 UTC
People are correct that futexs should not behave like this.  It turns out that
futex + application that segfaults = fail unless you have a robust futex
implementation.

http://lxr.linux.no/linux+v2.6.21/Documentation/robust-futexes.txt

Unfortunately it looks like arm did not pick up the requisite functions until
2.6.22.

http://lkml.org/lkml/2007/7/27/198

I have confirmed that the requisite kernel interfaces for robust futexes have
not been backported to the diablo kernel. 

http://slexy.org/view/s216exSepG

This may also be responsible for other odd battery draining issues like...

http://www.internettablettalk.com/forums/showthread.php?t=21736
Comment 18 Faheem Pervez maemo.org 2008-07-10 20:37:26 UTC
I took a quick look at that patch and formatted it so you can apply it straight
onto kernel-source-diablo.

http://pastebin.com/f6126b483

Dunno if the actual patch works on 2.6.21 though.
Comment 19 Eric Warnke maemo.org 2008-07-10 23:24:26 UTC
Here is some additional links to back up the idea that this could be a kernel
bug.

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg177768.html

http://readlist.com/lists/vger.kernel.org/linux-kernel/93/468688.html

"Not all architectures implement futex_atomic_cmpxchg_inatomic(). The default
implementation returns -ENOSYS, which is currently not handled inside of the
futex guts.

Futex PI calls and robust list exits with a held futex result in an endless
loop in the futex code on architectures which have no support."

http://lists.openwall.net/linux-kernel/2007/07/30/484

"I've studied the relevant futex code (handle_futex_death()
and asm-${arch}/futex.h), and it seems that many archs
are seriously broken now, not just arm. Any arch that wires
up the set_robust_list() syscall while implementing its
asm-${arch}/futex.h by including asm-generic/futex.h allows
its kernels to be hung by unprivileged user-space code."
Comment 20 Danilo Câmara 2008-07-11 05:48:38 UTC
With the two coredumps I have the outputs are:

# gdb /usr/bin/maemo-launcher /media/mmc1/core-dumps/modest-6-11342/coredump 
...
Core was generated by `/usr/bin/modest                                         
             '.
Program terminated with signal 6, Aborted.
#0  0x400d2b00 in free () from /lib/libc.so.6
(gdb) info threads
  3 process 11349  0x00000000 in ?? ()
  2 process 11350  0x00000000 in ?? ()
* 1 process 11342  0x400d2b00 in free () from /lib/libc.so.6
(gdb) quit


# gdb /usr/bin/maemo-launcher /media/mmc1/core-dumps/modest-11-1941/coredump 
...
Core was generated by `/usr/bin/modest                                         
             '.
Program terminated with signal 11, Segmentation fault.
#0  0x400d1f48 in malloc_consolidate () from /lib/libc.so.6
(gdb) info threads
  3 process 1948  0x00000000 in ?? ()
  2 process 1949  0x00000000 in ?? ()
* 1 process 1941  0x400d1f48 in malloc_consolidate () from /lib/libc.so.6
(gdb) quit
Comment 21 Eero Tamminen nokia 2008-07-11 11:53:03 UTC
(In reply to comment #17)
> People are correct that futexs should not behave like this.  It turns out that
> futex + application that segfaults = fail unless you have a robust futex
> implementation.
[...]
> I have confirmed that the requisite kernel interfaces for robust futexes have
> not been backported to the diablo kernel. 
[...]
> "Not all architectures implement futex_atomic_cmpxchg_inatomic(). The default
> implementation returns -ENOSYS, which is currently not handled inside of the
> futex guts.
> 
> Futex PI calls and robust list exits with a held futex result in an endless
> loop in the futex code on architectures which have no support."
[...]
> "I've studied the relevant futex code (handle_futex_death()
> and asm-${arch}/futex.h), and it seems that many archs
> are seriously broken now, not just arm. Any arch that wires
> up the set_robust_list() syscall while implementing its
> asm-${arch}/futex.h by including asm-generic/futex.h allows
> its kernels to be hung by unprivileged user-space code."

While application crashes and kernel freezes are worrying,
the symptom in this particular seems different.  Modest
just freezes, it doesn't crash (the core dumps are forced
manually to get backtraces).

Could you file a separate bug about kernel futex issue (with
crash and use-time keywords) if there isn't one yet?  Thanks!

(If it turns out that it after all causes also this issue,
we can add a dependency)


> This may also be responsible for other odd battery draining issues like...

Battery drain has also many other potential reasons:
- bad 3rd party applet/application not idling when not used
- buggy WLAN access point (not implementing power management properly)
  - setting device to offline mode when not in use helps
- Keeping device in RD mode
- Corrupted memory card
- Metalayer crawler related bugs (only currently known is about gstreamer/WMA)
etc.
Comment 22 Eero Tamminen nokia 2008-07-11 12:00:29 UTC
(In reply to comment #20)
> With the two coredumps I have the outputs are:
> 
> # gdb /usr/bin/maemo-launcher /media/mmc1/core-dumps/modest-6-11342/coredump 
> ...
> Core was generated by `/usr/bin/modest                                         
>              '.
> Program terminated with signal 6, Aborted.
> #0  0x400d2b00 in free () from /lib/libc.so.6
> (gdb) info threads
>   3 process 11349  0x00000000 in ?? ()
>   2 process 11350  0x00000000 in ?? ()
> * 1 process 11342  0x400d2b00 in free () from /lib/libc.so.6
> (gdb) quit

Did the process have only three threads according to /proc/PID/task/*
when you forced it to abort?  (hm. why the 3rd & 2nd thread addresses
are zero...)


> # gdb /usr/bin/maemo-launcher /media/mmc1/core-dumps/modest-11-1941/coredump 
> ...
> Core was generated by `/usr/bin/modest                                         
>              '.
> Program terminated with signal 11, Segmentation fault.
> #0  0x400d1f48 in malloc_consolidate () from /lib/libc.so.6
> (gdb) info threads
>   3 process 1948  0x00000000 in ?? ()
>   2 process 1949  0x00000000 in ?? ()
> * 1 process 1941  0x400d1f48 in malloc_consolidate () from /lib/libc.so.6
> (gdb) quit

There's a separate bug 3311 about Modest crashes like this.
Comment 23 Danilo Câmara 2008-07-11 14:44:31 UTC
(In reply to comment #22)
> (In reply to comment #20)
> > With the two coredumps I have the outputs are:
> > 
> > # gdb /usr/bin/maemo-launcher /media/mmc1/core-dumps/modest-6-11342/coredump 
> > ...
> > Core was generated by `/usr/bin/modest                                         
> >              '.
> > Program terminated with signal 6, Aborted.
> > #0  0x400d2b00 in free () from /lib/libc.so.6
> > (gdb) info threads
> >   3 process 11349  0x00000000 in ?? ()
> >   2 process 11350  0x00000000 in ?? ()
> > * 1 process 11342  0x400d2b00 in free () from /lib/libc.so.6
> > (gdb) quit
> 
> Did the process have only three threads according to /proc/PID/task/*
> when you forced it to abort?  (hm. why the 3rd & 2nd thread addresses
> are zero...)

At the time I got this core-dump (Comment  #10) I wasn't aware to check
/proc/PID/task/*.


Since I installed modest in Chinook, it used to crash once a couple of days. I
didn't bored too much since I finally could have GMail with IMAP. I don't
remember why, but I guess the previous e-mail client had issues with GMail
IMAP.

This night modest crashed again. Message box: "(i) Internal error. Application
'E-mail' closed." I've got another 'signal 11, Segmentation fault' core-dump.
Finally with the /core-dumps directory, I'll get this logged.

The 100% CPU issue is worse because it drains the battery. Only once in Chinook
I woke up with the N810 dead with flat battery. I'm sure it was fully charged
at night but not sure what happened that time. The first time this happened
after I installed Diablo, is that I decided to investigate and keep the device
in the charger every time.
Comment 24 Danilo Câmara 2008-07-12 06:10:40 UTC
modest (PID 1498) at 100% CPU:

# egrep '^S[lt]' /proc/1498/task/*/status
/proc/1498/task/1498/status:State:    S (sleeping)
/proc/1498/task/1498/status:SleepAVG:    91%
/proc/1498/task/1505/status:State:    S (sleeping)
/proc/1498/task/1505/status:SleepAVG:    91%
/proc/1498/task/1506/status:State:    R (running)
/proc/1498/task/1506/status:SleepAVG:    0%

# strace -p 1506
Process 1506 attached - interrupt to quit
Process 1506 detached

# strace -p 1505
Process 1505 attached - interrupt to quit
futex(0x4017f07c, FUTEX_WAIT, 2, NULL <unfinished ...>
Process 1505 detached

# strace -p 1498
Process 1498 attached - interrupt to quit
futex(0x4017f07c, FUTEX_WAIT, 2, NULL <unfinished ...>
Process 1498 detached

# gdb /usr/bin/maemo-launcher 1498
...
Loaded symbols for /lib/libnss_dns.so.2
0x400d2b00 in free () from /lib/libc.so.6
(gdb) info threads
  3 Thread 1142101136 (LWP 1505)  0x400d2b00 in free () from /lib/libc.so.6
  2 Thread 1125098640 (LWP 1506)  0x400d1f3c in malloc_consolidate ()
   from /lib/libc.so.6
  1 Thread 1073857008 (LWP 1498)  0x400d2b00 in free () from /lib/libc.so.6
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/bin/maemo-launcher, process 1498

# egrep '^S[lt]' /proc/1498/task/*/status
/proc/1498/task/1498/status:State:    S (sleeping)
/proc/1498/task/1498/status:SleepAVG:    91%
/proc/1498/task/1505/status:State:    T (stopped)
/proc/1498/task/1505/status:SleepAVG:    72%
/proc/1498/task/1506/status:State:    R (running)q
/proc/1498/task/1506/status:SleepAVG:    0%

# kill -SIGABRT 1498
(no output or core dumped, process still running at 100% CPU)
Comment 25 Danilo Câmara 2008-07-12 20:58:33 UTC
This time I've got a core-dump. Output is basically the same:

# egrep '^S[lt]' /proc/1806/task/*/status
/proc/1806/task/1806/status:State:    S (sleeping)
/proc/1806/task/1806/status:SleepAVG:    83%
/proc/1806/task/1813/status:State:    S (sleeping)
/proc/1806/task/1813/status:SleepAVG:    91%
/proc/1806/task/1814/status:State:    R (running)
/proc/1806/task/1814/status:SleepAVG:    0%

# gdb /usr/bin/maemo-launcher 1806
...
0x400d2b00 in free () from /lib/libc.so.6
(gdb) info threads
  3 Thread 1142101136 (LWP 1813)  0x400d2b00 in free () from /lib/libc.so.6
  2 Thread 1125098640 (LWP 1814)  0x400d1f3c in malloc_consolidate ()
   from /lib/libc.so.6
  1 Thread 1073857008 (LWP 1806)  0x400d2b00 in free () from /lib/libc.so.6
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/bin/maemo-launcher, process 1806

# egrep '^S[lt]' /proc/1806/task/*/status
/proc/1806/task/1806/status:State:    S (sleeping)
/proc/1806/task/1806/status:SleepAVG:    20%
/proc/1806/task/1813/status:State:    S (sleeping)
/proc/1806/task/1813/status:SleepAVG:    72%
/proc/1806/task/1814/status:State:    R (running)
/proc/1806/task/1814/status:SleepAVG:    0%

# kill -SIGABRT 1806
Comment 26 Andre Klapper maemo.org 2008-07-14 12:53:55 UTC
(In reply to comment #21)
> (In reply to comment #17)
> Could you file a separate bug about kernel futex issue (with
> crash and use-time keywords) if there isn't one yet?  Thanks!

For the records: https://bugs.maemo.org/show_bug.cgi?id=3438
Comment 27 Jason Bartell 2008-07-18 20:56:31 UTC
This CPU saturation bug is likely not just with IMAP. I have a single POP
account (with SSL) on Modest and see the bug. top shows maemo-launcher with
~98% CPU, and checking the PID with ps shows /usr/bin/modest as the command.
Comment 28 Andre Klapper maemo.org 2008-08-04 16:45:23 UTC
(In reply to comment #23)
> This night modest crashed again. Message box: "(i) Internal error. Application
> 'E-mail' closed." I've got another 'signal 11, Segmentation fault' core-dump.
> Finally with the /core-dumps directory, I'll get this logged.

That sounds like bug 2968.


In general, I'm a bit lost here.
Eero, what more information do we need to track this down successfully?
Comment 29 Eero Tamminen nokia 2008-08-04 17:53:50 UTC
(In reply to comment #25)
> This time I've got a core-dump. Output is basically the same:
> 
> # egrep '^S[lt]' /proc/1806/task/*/status
> /proc/1806/task/1806/status:State:      S (sleeping)
> /proc/1806/task/1806/status:SleepAVG:   83%
> /proc/1806/task/1813/status:State:      S (sleeping)
> /proc/1806/task/1813/status:SleepAVG:   91%
> /proc/1806/task/1814/status:State:      R (running)
> /proc/1806/task/1814/status:SleepAVG:   0%
> 
> # gdb /usr/bin/maemo-launcher 1806
> ...
> 0x400d2b00 in free () from /lib/libc.so.6
> (gdb) info threads
>   3 Thread 1142101136 (LWP 1813)  0x400d2b00 in free () from /lib/libc.so.6
>   2 Thread 1125098640 (LWP 1814)  0x400d1f3c in malloc_consolidate ()
>    from /lib/libc.so.6
>   1 Thread 1073857008 (LWP 1806)  0x400d2b00 in free () from /lib/libc.so.6

Interesting.  I think malloc_consolidate() is triggered by a free()
i.e. there are three different threads trying to do free() at the same
time and I think this uses locking.


There's one more thing to test:  After Modest has stuck, do as root:
   echo "1" > /proc/cpu/alignment

And from "dmesg" command output whether there are messages about "Alignment
trap".

(If there are, this is ARM kernel issue about it not SIGBUSing processes
generating alignment exceptions on unligned multi-word accesses. Kernel has
asked CPU to handle unligned accesses, but ARM does that only for single
word accesses, kernel needs to catch/handle exceptions for multi-word ones.)


(In reply to comment #28)
>> This night modest crashed again. Message box: "(i) Internal error.
>> Application 'E-mail' closed." I've got another 'signal 11, Segmentation
>> fault' core-dump. Finally with the /core-dumps directory, I'll get this
>> logged.
> 
> That sounds like bug 2968.

That's more like bug 3311 that this bug.  This bug is about process
using 100% CPU, not crashes.


> In general, I'm a bit lost here.
> Eero, what more information do we need to track this down successfully?

In case this is related to robust futexes (instead of crash, due to
futex issue the process freezes with 100% CPU usage), fixing bug
3311 would make this to go away too.  I'm not completely convinced
about that though, I think the futex locking issues is supposed to
freeze the kernel, not application.
Comment 30 Eero Tamminen nokia 2008-08-04 19:48:11 UTC
Eric comments in bug 3438 that application freeze *is* a symptom for
the futex issues, so adding the Modest crasher and futex bugs as dependencies
(the issue is app not dying on crash if its futex is held by other 
thread/process, fixing either of the deps should in this case be enough
to make the issue to go away).
Comment 31 luarvique (reporter) 2008-08-04 19:53:52 UTC
(In reply to comment #30)
> adding the Modest crasher and futex bugs as dependencies
> (the issue is app not dying on crash if its futex is held by other 
> thread/process, fixing either of the deps should in this case be enough
> to make the issue to go away).
Are you absolutely sure about it? If yes, can Modest be modified to avoid
reliance on futex, tested, and pushed to end users via update system?
Comment 32 Andre Klapper maemo.org 2008-08-04 20:30:31 UTC
(In reply to comment #31)
> Are you absolutely sure about it? If yes, can Modest be modified to avoid
> reliance on futex, tested, and pushed to end users via update system? 

I'm quite sure that this is already work in progress.
Comment 33 Eero Tamminen nokia 2008-08-05 13:21:12 UTC
(In reply to comment #32)
> (In reply to comment #31)
> > Are you absolutely sure about it? If yes, can Modest be modified to avoid
> > reliance on futex, tested, and pushed to end users via update system? 
> 
> I'm quite sure that this is already work in progress.

Err... Threaded programs need locking (implemented with futexes).
What is needed is fixing the crash in Modest and kernel side
futex implementation.
Comment 34 Andre Klapper maemo.org 2008-08-05 13:40:18 UTC
(In reply to comment #33)
> (In reply to comment #32)
> > > Are you absolutely sure about it? If yes, can Modest be modified to avoid
> > > reliance on futex, tested, and pushed to end users via update system? 
> > 
> > I'm quite sure that this is already work in progress.
> 
> Err... Threaded programs need locking (implemented with futexes).
> What is needed is fixing the crash in Modest and kernel side
> futex implementation.

Oops, my comment was misleading. I only refered to pushing updates, not to
remove futex code. :-)
Comment 35 luarvique (reporter) 2008-08-14 17:39:10 UTC
Latest software update seems to have fixed this problem. Cannot say for sure
though, as the problem appeared sporadically.
Comment 36 kenneth 2008-08-15 02:01:05 UTC
(In reply to comment #35)
> Latest software update seems to have fixed this problem. Cannot say for sure
> though, as the problem appeared sporadically.
> 

nope, i still see it happen in my N800 after the SSU. and thats me having to
fully reflash diablo after a bad SSU update.
Comment 37 Eero Tamminen nokia 2008-08-15 10:04:39 UTC
(In reply to comment #36)
> nope, i still see it happen in my N800 after the SSU. and thats me having to
> fully reflash diablo after a bad SSU update.

If you've afterwards flashed the June Diablo release image, you don't have
the August Modest version from the SSU update.
Comment 38 kenneth 2008-08-15 10:21:27 UTC
(In reply to comment #37)
> (In reply to comment #36)
> > nope, i still see it happen in my N800 after the SSU. and thats me having to
> > fully reflash diablo after a bad SSU update.
> 
> If you've afterwards flashed the June Diablo release image, you don't have
> the August Modest version from the SSU update.
> 

ok, seems i need to clarify...

when i tried to install the recent SSU released update, my tablet ended up in a
state where wifi didnt work, and upon reboot went into a reboot loop.

so what i did was flash with the diablo firmware, apply the recent SSU update,
then went about reinstalling apps and stuff.
Comment 39 Andre Klapper maemo.org 2008-08-15 12:45:31 UTC
kemarken: Still a bit confused - If you start the X-Terminal, what is the
Version output of "dpkg-query -l modest" and "dpkg-query -l libtinymail-1.0-0"?
Comment 40 kenneth 2008-08-15 13:42:51 UTC
(In reply to comment #39)
> kemarken: Still a bit confused - If you start the X-Terminal, what is the
> Version output of "dpkg-query -l modest" and "dpkg-query -l libtinymail-1.0-0"?
> 

modest: 1.0-2008.30-1

libtinymail 1.0.0-svn3725
Comment 41 Andre Klapper maemo.org 2008-08-15 14:26:02 UTC
(In reply to comment #40)
> modest: 1.0-2008.30-1
> libtinymail 1.0.0-svn3725

OK, that's the latest SSU update, and you still run into this issue... Bad.

kemarken, can you please take a look at the former comments in this report and
provide more information (especially see comment 7)?
Comment 42 kenneth 2008-08-15 14:41:05 UTC
ok, ill set up the core-dumps dir and see if anything shows up.
Comment 43 Danilo Câmara 2008-08-15 15:31:07 UTC
Yesterday I did install "OS2008 Feature upgrade 4.2008.30-2", reboot and set
modest to automatically check for new mail (without this check the problem
doesn't happen).

This morning the blue led was flashing new mail. It was not at 100% CPU, but
for three times I couldn't open modest. I clicked on menu, the message box
appeared saying "Loading e-mail" but nothing happened then. Probably something
unrelated to this bug. Now it's working fine.

I'll wait a couple days more to see if the CPU issue is over.
Comment 44 Andre Klapper maemo.org 2008-08-15 15:53:49 UTC
(In reply to comment #43)
> for three times I couldn't open modest. I clicked on menu, the message box
> appeared saying "Loading e-mail" but nothing happened then. Probably something
> unrelated to this bug. Now it's working fine.

Yes, it's probably bug 3304 - fix hasn't been included in last update, will be
in the next.
Comment 45 kenneth 2008-08-16 06:58:23 UTC
here is a stacktrace:
Process 2431 attached - interrupt to quit

ioctl(3, FIONREAD, [0]) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8,
events=POLLIN},{fd=7, events=POLLIN}, {fd=6, events=POLLIN}], 5, -1) = 1
([{fd=3, revents=POLLIN}])

ioctl(3, FIONREAD, [64]) = 0

read(3,
"W\1Y\0\223f\234\311\0\n\377\0\10\377\0\24\10\370\10\370\10\370\10\370\10\370\10\370\377\377\27\0W"...,
64) = 64

ioctl(3, FIONREAD, [0]) = 0

ioctl(3, FIONREAD, [0]) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8,
events=POLLIN},{fd=7, events=POLLIN}, {fd=6, events=POLLIN}], 5, -1) = 2
([{fd=3, revents=POLLIN}, {fd=8, revents=POLLIN}])

ioctl(3, FIONREAD, [64]) = 0

read(3,
"\241\10Y\0\1\0\200\1F\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\305\16`\0\0\0\0\0\241"...,
64) = 64

read(8, "l\4\1\1\36\0\0\0\35\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/free"...,2048)
= 190

read(8, 0x114458, 2048) = -1 EAGAIN (Resource temporarily unavailable)

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=7,
events=POLLIN},{fd=6, events=POLLIN}, {fd=8, events=POLLIN}], 5, 0) = 0
(Timeout)write(3, "\21\0\2\0c\1\0\0"..., 8) = 8

read(3,
"\1\320Z\0\5\0\0\0\24\0\"\0\1\0\0\0\30\4\0\0\330\320\"\0\21\0\0\0\254\237\24\0"...,
32) = 32

readv(3, [{"_GTK_LOAD_ICONTHEMES"..., 20}, {""..., 0}], 2) = 20

ioctl(3, FIONREAD, [0]) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=7,
events=POLLIN},{fd=6, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1) = 1
([{fd=8, revents=POLLIN}])

read(8, "l\4\1\1\35\0\0\0\36\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/free"...,2048)
= 189

read(8, 0x114458, 2048) = -1 EAGAIN (Resource temporarily unavailable)

ioctl(3, FIONREAD, [0]) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=7,
events=POLLIN},{fd=6, events=POLLIN}, {fd=8, events=POLLIN}], 5, 0) = 0
(Timeout)ioctl(3, FIONREAD, [0]) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=7,
events=POLLIN},{fd=6, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1) = 1
([{fd=7, revents=POLLIN}])

read(7, "l\4\1\1M\0\0\0\243\0\0\0f\0\0\0\1\1o\0\16\0\0\0/com/noki"..., 2048) =
197

read(7, 0xb4648, 2048) = -1 EAGAIN (Resource temporarily unavailable)

ioctl(3, FIONREAD, [0]) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=6,
events=POLLIN},{fd=8, events=POLLIN}, {fd=7, events=POLLIN}], 5, 0) = 0
(Timeout)gettimeofday({1218858239, 141632}, NULL) = 0

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0writev(2, [{"modest[2431]: GLIB DEBUG ConIc - "..., 123}, {"\n"..., 1}], 2) =
124

socket(PF_FILE, SOCK_DGRAM, 0) = 13

fcntl64(13, F_SETFD, FD_CLOEXEC) = 0

connect(13, {sa_family=AF_FILE, path="/dev/log"...}, 110) = -1 ENOENT (No such
file or directory)

close(13) = 0

gettimeofday({1218858239, 220397}, NULL) = 0

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0writev(2, [{"modest[2431]: GLIB DEBUG default "..., 81}, {"\n"..., 1}], 2)= 82

socket(PF_FILE, SOCK_DGRAM, 0) = 13

fcntl64(13, F_SETFD, FD_CLOEXEC) = 0

connect(13, {sa_family=AF_FILE, path="/dev/log"...}, 110) = -1 ENOENT (No such
file or directory)

close(13) = 0

access("/var/run/resolv.conf", F_OK) = -1 ENOENT (No such file or directory)

access("/tmp/resolv.conf.wlan0", F_OK) = -1 ENOENT (No such file or directory)

access("/tmp/resolv.conf.ppp0", F_OK) = -1 ENOENT (No such file or directory)

gettimeofday({1218858239, 630371}, NULL) = 0

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0writev(2, [{"modest[2431]: GLIB DEBUG default "..., 97}, {"\n"..., 1}], 2)= 98

socket(PF_FILE, SOCK_DGRAM, 0) = 13

fcntl64(13, F_SETFD, FD_CLOEXEC) = 0

connect(13, {sa_family=AF_FILE, path="/dev/log"...}, 110) = -1 ENOENT (No such
file or directory)

close(13) = 0

gettimeofday({1218858239, 675262}, NULL) = 0

ioctl(3, FIONREAD, [0]) = 0

gettimeofday({1218858239, 694427}, NULL) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=6,
events=POLLIN},{fd=8, events=POLLIN}, {fd=7, events=POLLIN}], 5, 4981) = 0
(Timeout)gettimeofday({1218858244, 678131}, NULL) = 0

gettimeofday({1218858244, 678741}, NULL) = 0

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0writev(2, [{"modest[2431]: GLIB DEBUG default "..., 84}, {"\n"..., 1}], 2)= 85

socket(PF_FILE, SOCK_DGRAM, 0) = 13

fcntl64(13, F_SETFD, FD_CLOEXEC) = 0

connect(13, {sa_family=AF_FILE, path="/dev/log"...}, 110) = -1 ENOENT (No such
file or directory)

close(13) = 0

gettimeofday({1218858244, 704467}, NULL) = 0

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0writev(2, [{"modest[2431]: GLIB DEBUG default "..., 103}, {"\n"..., 1}], 2) =
104

socket(PF_FILE, SOCK_DGRAM, 0) = 13

fcntl64(13, F_SETFD, FD_CLOEXEC) = 0

connect(13, {sa_family=AF_FILE, path="/dev/log"...}, 110) = -1 ENOENT (No such
file or directory)

close(13) = 0

gettimeofday({1218858244, 729370}, NULL) = 0

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0writev(2, [{"modest[2431]: GLIB DEBUG default "..., 94}, {"\n"..., 1}], 2)= 95

socket(PF_FILE, SOCK_DGRAM, 0) = 13

fcntl64(13, F_SETFD, FD_CLOEXEC) = 0

connect(13, {sa_family=AF_FILE, path="/dev/log"...}, 110) = -1 ENOENT (No such
file or directory)

close(13) = 0

gettimeofday({1218858244, 755523}, NULL) = 0

stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=806, ...}) =
0writev(2, [{"modest[2431]: GLIB DEBUG default "..., 79}, {"\n"..., 1}], 2)= 80

socket(PF_FILE, SOCK_DGRAM, 0) = 13

fcntl64(13, F_SETFD, FD_CLOEXEC) = 0

connect(13, {sa_family=AF_FILE, path="/dev/log"...}, 110) = -1 ENOENT (No such
file or directory)

close(13) = 0

ioctl(3, FIONREAD, [0]) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=6,
events=POLLIN},{fd=8, events=POLLIN}, {fd=7, events=POLLIN}], 5, -1) = 1
([{fd=3, revents=POLLIN}])

ioctl(3, FIONREAD, [64]) = 0

read(3,
"W\1Z\0008\336\234\311\0\n\377\0\10\377\0\24\10\370\10\370\10\370\10\370\10\370\10\370\377\377\27\0W"...,
64) = 64

ioctl(3, FIONREAD, [0]) = 0

ioctl(3, FIONREAD, [0]) = 0

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=6,
events=POLLIN},{fd=8, events=POLLIN}, {fd=7, events=POLLIN}], 5, -1) = -1 EINTR
(Interrupt ed system call)

--- SIGABRT (Aborted) @ 0 (0) ---

Process 2431 detached

i also have a core dump but im not sure about attaching it.
Comment 46 Eero Tamminen nokia 2008-08-18 10:32:44 UTC
(In reply to comment #45)
> here is a stacktrace:
> Process 2431 attached - interrupt to quit
[...]
> --- SIGABRT (Aborted) @ 0 (0) ---
> 
> Process 2431 detached

This seems different thing than the original issue
(where strace showed Modest sitting in futex system call).
This was from Modest PID taking 100% CPU?

Could you check the status of all the Modest threads for that PID:
    egrep '^S[lt]' /proc/PID/task/*/status
?

What files this PID has open?  Could you check the output of:
    ls -l /proc/<PID>/fd/
?

(And if these don't given anything useful, it would be great
if you could you check with Gdb what the threads are doing in
more detail, see comment 9 and other comments around it.)
Comment 47 Danilo Câmara 2008-08-18 17:21:47 UTC
Modest is not getting in 100% CPU for me anymore, but now I'm experiencing bug
2579 (with the difference that I didn't set to download messages, just
headers).

Well, I'd say the 100% CPU issue was much worse...
Comment 48 kenneth 2008-08-18 22:49:59 UTC
(In reply to comment #46)
> (In reply to comment #45)
> > here is a stacktrace:
> > Process 2431 attached - interrupt to quit
> [...]
> > --- SIGABRT (Aborted) @ 0 (0) ---
> > 
> > Process 2431 detached
> 
> This seems different thing than the original issue
> (where strace showed Modest sitting in futex system call).
> This was from Modest PID taking 100% CPU?
> 
> Could you check the status of all the Modest threads for that PID:
>     egrep '^S[lt]' /proc/PID/task/*/status
> ?
> 
> What files this PID has open?  Could you check the output of:
>     ls -l /proc/<PID>/fd/
> ?
> 
> (And if these don't given anything useful, it would be great
> if you could you check with Gdb what the threads are doing in
> more detail, see comment 9 and other comments around it.)
> 

all the modest threads where sleeping.

the ls gave indication that the journal files for my accounts where open, 

lr-x------    1 user     users          64 Aug 18 21:43 0 -> /dev/null
l-wx------    1 user     users          64 Aug 18 21:43 1 -> /dev/null
lrwx------    1 user     users          64 Aug 18 21:43 10 ->
/home/user/.modest/cache/key3.db
lrwx------    1 user     users          64 Aug 18 21:43 11 ->
/home/user/.modest/cache/mail/pop/"account 1"/journal
lrwx------    1 user     users          64 Aug 18 21:43 12 ->
/home/user/.modest/cache/mail/pop/"account 2"/journal
lrwx------    1 user     users          64 Aug 18 21:43 13 -> socket:[10358]
l-wx------    1 user     users          64 Aug 18 21:43 2 -> /dev/null
lrwx------    1 user     users          64 Aug 18 21:43 3 -> socket:[9231]
lr-x------    1 user     users          64 Aug 18 21:43 4 -> pipe:[9233]
l-wx------    1 user     users          64 Aug 18 21:43 5 -> pipe:[9233]
lrwx------    1 user     users          64 Aug 18 21:43 6 -> socket:[9234]
lrwx------    1 user     users          64 Aug 18 21:43 7 -> socket:[9236]
lrwx------    1 user     users          64 Aug 18 21:43 8 -> socket:[9238]
lrwx------    1 user     users          64 Aug 18 21:43 9 ->
/home/user/.modest/cache/cert8.db

(i masked out the account dirs)

gdp gave me this:

  7 Thread 1118295184 (LWP 2416)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
   from /lib/libpthread.so.0
  6 Thread 1126769808 (LWP 2417)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
   from /lib/libpthread.so.0
  5 Thread 1136653456 (LWP 2418)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
   from /lib/libpthread.so.0
  4 Thread 1145042064 (LWP 2419)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
   from /lib/libpthread.so.0
  3 Thread 1153430672 (LWP 2420)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
   from /lib/libpthread.so.0
  2 Thread 1161819280 (LWP 2421)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
   from /lib/libpthread.so.0
  1 Thread 1073857008 (LWP 2413)  0x4011cd14 in poll () from /lib/libc.so.6
Comment 49 kenneth 2008-08-19 07:49:29 UTC
played around a bit with gdb, trying to backtrace the coredump i had.

this is what i got:

#0 0x4011cd14 in poll () from /lib/libc.so.6
#1 0x40b4bf54 in ?? () from /usr/lib/libglib-2.0.so.0

not really that helpful i guess. seems i failed at getting the debug symbols
installed.
Comment 50 Eero Tamminen nokia 2008-08-19 11:08:39 UTC
(In reply to comment #48)
> > This seems different thing than the original issue
> > (where strace showed Modest sitting in futex system call).
> > This was from Modest PID taking 100% CPU?
> > 
> > Could you check the status of all the Modest threads for that PID:
> >     egrep '^S[lt]' /proc/PID/task/*/status
> > ?

> all the modest threads where sleeping.

So it wasn't taking 100% CPU. What was the exact output of that command?


[...]
> gdp gave me this:
> 
>   7 Thread 1118295184 (LWP 2416)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
>    from /lib/libpthread.so.0
>   6 Thread 1126769808 (LWP 2417)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
>    from /lib/libpthread.so.0
>   5 Thread 1136653456 (LWP 2418)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
>    from /lib/libpthread.so.0
>   4 Thread 1145042064 (LWP 2419)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
>    from /lib/libpthread.so.0
>   3 Thread 1153430672 (LWP 2420)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
>    from /lib/libpthread.so.0
>   2 Thread 1161819280 (LWP 2421)  0x40bc6334 in pthread_cond_wait@@GLIBC_2.4 ()
>    from /lib/libpthread.so.0
>   1 Thread 1073857008 (LWP 2413)  0x4011cd14 in poll () from /lib/libc.so.6

This is clearly a different issue than this bug.
If somebody can still reproduce the 100% CPU usage
and strace showing it sitting in futex syscall,
please comment here, otherwise this is set as fixed.


Kenneth, please file a separate, focused bug for your
issue.

In the new bug, please describe exactly what you did
(what settings you had) and what happened and why you
think it's a bug.
Comment 51 kenneth 2008-08-19 12:10:42 UTC
exact output was this:

/proc/3105/task/3105/status:State:      S (sleeping)
/proc/3105/task/3105/status:SleepAVG:   91%
/proc/3105/task/3106/status:State:      S (sleeping)
/proc/3105/task/3106/status:SleepAVG:   76%
/proc/3105/task/3107/status:State:      S (sleeping)
/proc/3105/task/3107/status:SleepAVG:   41%
/proc/3105/task/3108/status:State:      S (sleeping)
/proc/3105/task/3108/status:SleepAVG:   35%
/proc/3105/task/3109/status:State:      S (sleeping)
/proc/3105/task/3109/status:SleepAVG:   9%
/proc/3105/task/3110/status:State:      S (sleeping)
/proc/3105/task/3110/status:SleepAVG:   10%
/proc/3105/task/3111/status:State:      S (sleeping)
/proc/3105/task/3111/status:SleepAVG:   6%
Comment 52 Eero Tamminen nokia 2008-08-19 12:47:51 UTC
(In reply to comment #51)
> exact output was this:
> 
> /proc/3105/task/3105/status:State:      S (sleeping)
> /proc/3105/task/3105/status:SleepAVG:   91%
> /proc/3105/task/3106/status:State:      S (sleeping)
> /proc/3105/task/3106/status:SleepAVG:   76%
> /proc/3105/task/3107/status:State:      S (sleeping)
> /proc/3105/task/3107/status:SleepAVG:   41%
> /proc/3105/task/3108/status:State:      S (sleeping)
> /proc/3105/task/3108/status:SleepAVG:   35%
> /proc/3105/task/3109/status:State:      S (sleeping)
> /proc/3105/task/3109/status:SleepAVG:   9%
> /proc/3105/task/3110/status:State:      S (sleeping)
> /proc/3105/task/3110/status:SleepAVG:   10%
> /proc/3105/task/3111/status:State:      S (sleeping)
> /proc/3105/task/3111/status:SleepAVG:   6%

Thanks.  It seems that only one thread is about idle, a couple are
doing a bit of something and three threads are 9/10th of their time
slice busy.

Could you attach this also to a new bug about this new issue and state
there how long it was since you had interacted with Modest (or had it
been the whole time on the background / not visible) before this
check was done.
Comment 53 Eero Tamminen nokia 2008-09-15 11:24:14 UTC
> Could you attach this also to a new bug about this new issue and state
> there how long it was since you had interacted with Modest (or had it
> been the whole time on the background / not visible) before this
> check was done.

Please do that, I'll mark this (*100%* CPU usage) bug as fixed for 4.1.1
as there was no contrary information.
Comment 54 luarvique (reporter) 2008-09-15 11:46:22 UTC
(In reply to comment #53)
> > Could you attach this also to a new bug about this new issue and state
> > there how long it was since you had interacted with Modest (or had it
> > been the whole time on the background / not visible) before this
> > check was done.
> Please do that, I'll mark this (*100%* CPU usage) bug as fixed for 4.1.1
> as there was no contrary information.
The new bug for this problem has been filed a while ago:

https://bugs.maemo.org/show_bug.cgi?id=3641