Bug 3455 - (int-87417) telepathy-gabble deadlooping
Status: RESOLVED FIXED
Product: Chat & Call & SMS
Component: XMPP
Version: 5.0-alpha
Platform: All Linux
Importance: Low normal with 1 vote
Target Milestone: 5.0 (1.2009.41-10)
Assigned To: rtcomm@maemo.org
QA Contact: xmpp-bugs
URL: https://bugs.freedesktop.org/show_bug...
Keywords: community-diablo, patch, upstream, us...

Reported: 2008-07-16 22:26 UTC by cedric cellier
Modified: 2009-12-16 02:30 UTC (History)

Attachments
rich core dump of telepathy-gabble during looping (168.78 KB, application/octet-stream)
2009-02-17 22:01 UTC, Lucas Maneos
rich core dump for the openssl case (246.06 KB, application/octet-stream)
2009-07-12 14:14 UTC, Lucas Maneos
Quick kludge to stop _lm_ssl_begin spinning (260 bytes, patch)
2009-07-12 15:03 UTC, Lucas Maneos

Description cedric cellier (reporter) 2008-07-16 22:26:44 UTC
SOFTWARE VERSION:
4.2008.23-14

STEPS TO REPRODUCE THE PROBLEM:
Google chat account enabled, wifi connection that seems to firewall something.
(It does this only at home when I use a neighboring wifi AP, so I suppose
firewalling may play a role here)

EXPECTED OUTCOME:
telepathy-gabble fails with dignity if something goes wrong on the connection

ACTUAL OUTCOME:
telepathy-gabble deadloops like crazy on a write. strace shows an infinite loop
of:
write(4, "\212\27\303I\317\236\346\0012\4f\220%3\232\305\260\334\205\36u\353\222\265\314\361\365d\323\375\352bO"..., 3318) = -1 EAGAIN (Resource temporarily unavailable)

I hope there is no password in there :-)
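
For illustration only: EAGAIN on a non-blocking socket means the kernel send buffer is full, so the usual remedy is to wait until the socket becomes writable (with a timeout) instead of calling write() again straight away. The Python sketch below shows that pattern. It is not the Loudmouth or telepathy-gabble code, and send_all and its timeout parameter are hypothetical names chosen for this example.

```python
import select
import socket


def send_all(sock, data, timeout=5.0):
    """Send all of `data` on a non-blocking socket without busy-looping.

    On EAGAIN (BlockingIOError in Python) the caller sleeps in select()
    until the socket is writable, and gives up after `timeout` seconds.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            # Send buffer is full: wait for room instead of retrying at once.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("socket stayed unwritable; giving up")
    return sent
```

With a peer that never drains its side, the function raises TimeoutError after the given timeout rather than spinning at 100% CPU.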

REPRODUCIBILITY:
Just did it twice in a row as soon as I activate the gtalk account from here.
Have to kill telepathy even after disabling the account.

EXTRA SOFTWARE INSTALLED:

OTHER COMMENTS:
Ask if you need a PCAP capture.

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9)
Gecko/2008062908 Iceweasel/3.0 (Debian-3.0~rc2-2)
Comment 1 Mikhail Zabaluev nokia 2008-07-17 11:06:26 UTC
Copied as freedesktop bug https://bugs.freedesktop.org/show_bug.cgi?id=16754,
might be generic enough.
Comment 2 Mikhail Zabaluev nokia 2008-08-08 13:15:38 UTC
Reproducible also in an x86 scratchbox target.
Comment 3 Lucas Maneos 2009-02-16 17:34:54 UTC
After some experiments on reproducibility:  it will happen if the connection is
blocked with an ICMP unreachable message (iptables -j REJECT) or TCP reset
(iptables -j REJECT --reject-with tcp-reset), but not if packets are dropped on
the floor with no response (iptables -j DROP).

We don't have REJECT on the tablet, but the TCP reset result means that the
easiest way to reproduce is to modify the account settings and specify a port
that does not have a service listening on the server, then try to connect. 
Reproducible every time here (5.2008.43-7).
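
A small Python sketch of how the two outcomes above differ from the client's point of view: a port with no listener answers with a reset, which surfaces as "connection refused", while silently dropped packets only show up as a timeout. The function name probe and its return strings are made up for this illustration.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as connected, refused or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The peer actively rejected the attempt (for example with a TCP RST).
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (for example packets dropped).
        return "timeout"
    except OSError:
        # Anything else, such as host or network unreachable.
        return "other-error"
```

Calling probe() against the misconfigured port from the steps above should report "refused" if the server really answers with a reset.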

For some reason I can't get any output out of
strace -p `pidof telepathy-gabble`
Comment 4 Lucas Maneos 2009-02-16 19:04:55 UTC
With gdb and libc6-dbg installed:

0x4008df3c in malloc_consolidate ()
   from /lib/libc.so.6
(gdb) bt
#0  0x4008df3c in malloc_consolidate () from /lib/libc.so.6
#1  0x4008eef0 in _int_malloc () from /lib/libc.so.6
#2  0x4009013c in calloc () from /lib/libc.so.6
#3  0x40086934 in open_memstream () from /lib/libc.so.6
#4  0x400dde2c in __vsyslog_chk () from /lib/libc.so.6
#5  0x400de3f8 in syslog () from /lib/libc.so.6
#6  0x411776ec in g_log_default_handler () from /usr/lib/libglib-2.0.so.0
#7  0x4117696c in g_logv () from /usr/lib/libglib-2.0.so.0
#8  0x41176be8 in g_log () from /usr/lib/libglib-2.0.so.0
#9  0x42235ef0 in ?? () from /usr/lib/libtelepathy-glib.so.0
Cannot access memory at address 0x0

Syslogd gets just the following entries:

Feb 16 16:41:26 Nokia-N800-36-5 telepathy-gabble[1619]: GLIB DEBUG default -
started version 0.6.2.1 (telepathy-glib version 0.7.0)
Feb 16 16:42:51 Nokia-N800-36-5 mission-control[1571]: _mcd_connection_connect:
tp_connmgr_request_connection failed: Did not receive a reply. Possible causes
include: the remote application did not send a reply, the message bus security
policy blocked the reply, the reply timeout expired, or the network connection
was broken. 

after which I see an "Unable to connect" notification and the presence applet
stops flashing,  but telepathy-gabble still spins until killed manually.
Comment 5 Eero Tamminen nokia 2009-02-17 12:16:12 UTC
(In reply to comment #4)
> With gdb and libc6-dbg installed:
> 
> 0x4008df3c in malloc_consolidate ()
>    from /lib/libc.so.6
> (gdb) bt
> #0  0x4008df3c in malloc_consolidate () from /lib/libc.so.6
> #1  0x4008eef0 in _int_malloc () from /lib/libc.so.6
> #2  0x4009013c in calloc () from /lib/libc.so.6
> #3  0x40086934 in open_memstream () from /lib/libc.so.6
> #4  0x400dde2c in __vsyslog_chk () from /lib/libc.so.6
> #5  0x400de3f8 in syslog () from /lib/libc.so.6
> #6  0x411776ec in g_log_default_handler () from /usr/lib/libglib-2.0.so.0
> #7  0x4117696c in g_logv () from /usr/lib/libglib-2.0.so.0
> #8  0x41176be8 in g_log () from /usr/lib/libglib-2.0.so.0
> #9  0x42235ef0 in ?? () from /usr/lib/libtelepathy-glib.so.0
> Cannot access memory at address 0x0

To get working backtraces, you need debug symbols for all the libraries in the
backtrace (i.e. at least glib and telepathy-glib).

Alternatively you could force the process to core-dump after installing
sp-rich-core from the tools repository & creating "core-dumps" directory to a
memory card with at least few MBs of free space. You can then attach the core
dump to this bug. (the compressed rich core dumps can be extracted with a tool
from sp-rich-core-postproc e.g. to check that they don't contain any sensitive
information you wouldn't want to attach to public bugzilla)
Comment 6 Lucas Maneos 2009-02-17 12:48:21 UTC
(In reply to comment #5)
> To get working backtraces, you need debug symbols for all the libraries in the
> backtrace (i.e. at least glib and telepathy-glib).

I couldn't find any such packages, do they exist somewhere?

Anyway, the low part of the stack should be accurate even if we don't have
symbols for the higher parts, no?  Most search results for malloc_consolidate
and infinite loop seem to point to double free bugs.  Now that we know how to
reproduce at will valgrind on x86 should help.

BTW:  if you time it right and start a telepathy-gabble process just before
setting the presence to an online state, that instance will be used instead of
starting a new one.
Comment 7 Eero Tamminen nokia 2009-02-17 16:19:00 UTC
(In reply to comment #6)
> (In reply to comment #5)
>> To get working backtraces, you need debug symbols for all the libraries
>> in the backtrace (i.e. at least glib and telepathy-glib).
> 
> I couldn't find any such packages, do they exist somewhere?

You don't get them just by:
  apt-get install libglib2.0-0-dbg libtelepathy-glib0-dbg
?

E.g. this has glib debug symbols, but I don't know whether they match your
version of Glib:
  http://repository.maemo.org/pool/maemo4.1.2/free/g/glib2.0/


(I don't currently have Diablo device, but if you can provide the rich-core
crash dump, I can check whether it's already internally reported.)
Comment 8 Lucas Maneos 2009-02-17 22:01:39 UTC
Created an attachment (id=1126) [details]
rich core dump of telepathy-gabble during looping

(In reply to comment #7)
> You don't get them just by:
>   apt-get install libglib2.0-0-dbg libtelepathy-glib0-dbg
> ?

Never mind, I naively assumed they'd be in the same repository as the glibc
ones.  After enabling the SDK repo they installed fine.  Still missing symbols
for some libs (libssl, libcrypto, libz, libdb1) but the stack trace is more
complete now:

(gdb) bt
#0  0x41089f3c in malloc_consolidate () from /lib/libc.so.6
#1  0x4108aef0 in _int_malloc () from /lib/libc.so.6
#2  0x4108c13c in calloc () from /lib/libc.so.6
#3  0x41082934 in open_memstream () from /lib/libc.so.6
#4  0x410d9e2c in __vsyslog_chk () from /lib/libc.so.6
#5  0x410da3f8 in syslog () from /lib/libc.so.6
#6  0x411776ec in IA__g_log_default_handler (log_domain=0x4119f7c4 "default", 
    log_level=G_LOG_LEVEL_DEBUG, 
    message=0x715b0 "no connections, and timed out", unused_data=0x411a5520)
    at gmessages.c:990
#7  0x4117696c in IA__g_logv (log_domain=0x0, log_level=G_LOG_LEVEL_DEBUG, 
    format=0x42240960 "no connections, and timed out", args1=0xbed3e414)
    at gmessages.c:479
#8  0x41176be8 in IA__g_log (
    log_domain=0x411370cc
"�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A�p\023A\004q\023A\004q\023A\fq\023A\fq\023A\024q\023A\024q\023A\034q\023A\034q\023A$q\023A$q\023A,q\023A,q\023A4q\023A4q\023A<q\023A<q\023ADq\023ADq\023ALq\023ALq\023ATq\023ATq\023A\\q\023A\\q\023Adq\023Adq\023Alq\023Alq\023Atq\023Atq\023A|q\023A|q\023A\204q\023A\204q\023A"...,
log_level=1092244828, 
    format=0x42240960 "no connections, and timed out") at gmessages.c:522
#9  0x42235ef0 in kill_connection_manager (data=0x411370cc) at run.c:67
#10 0x41171498 in g_timeout_dispatch (source=0x70498, 
    callback=0x42235eb5 <kill_connection_manager+1>, user_data=0x41136030)
    at gmain.c:3422
#11 0x4116e108 in IA__g_main_context_dispatch (context=0x64a80) at gmain.c:2045
#12 0x4116ffd4 in g_main_context_iterate (context=0x64a80, block=1, 
    dispatch=1, self=0x0) at gmain.c:2677
#13 0x4117034c in IA__g_main_loop_run (loop=0x634d8) at gmain.c:2881
#14 0x42236076 in tp_run_connection_manager (prog_name=0x0, 
    version=0x451f4 "0.6.2.1", construct_cm=0x4224ee70 <manager>, argc=0, 
    argv=0xbed3e6b4) at run.c:235
#15 0x00010034 in ?? ()
Cannot access memory at address 0x0

(In reply to comment #5)
> Alternatively you could force the process to core-dump after installing
> sp-rich-core from the tools repository & creating "core-dumps" directory to a
> memory card with at least few MBs of free space. You can then attach the core
> dump to this bug.

Attached.

> (the compressed rich core dumps can be extracted with a tool
> from sp-rich-core-postproc e.g. to check that they don't contain any sensitive
> information you wouldn't want to attach to public bugzilla)

The core dump contains the account password.  I tried to create one with a
dummy account but couldn't reproduce anymore.  After a few more tests it seems
that the infinite loop only triggers when the configured server has AAAA
records, and it goes away if I insmod ipv6.ko.  I think I may be chasing a
different bug to the one originally reported (gtalk.com doesn't advertise AAAA
RRs, at least currently).
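
A quick way to check which address families a server name resolves to, sketched in Python under the assumption that the system resolver is used. The helper name address_families is hypothetical, and 5222 is just the usual XMPP client port used as a default.

```python
import socket


def address_families(host, port=5222):
    """Return the sorted list of address families the resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    # Each getaddrinfo entry starts with the address family.
    return sorted({names.get(info[0], str(info[0])) for info in infos})
```

A host that publishes both A and AAAA records would normally show up with both "IPv4" and "IPv6" in the result, which is the configuration that triggers the loop described above.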
Comment 9 Eero Tamminen nokia 2009-02-18 19:26:52 UTC
> Never mind, I naively assumed they'd be in the same repository as the glibc
> ones.  After enabling the SDK repo they installed fine.

Note that it's better to disable SDK repo after getting the needed packages to
prevent accidental installation of something you wouldn't want on the device
(except for debug symbol stuff, SDK repo stuff isn't really tested on the
device, only in SDK).


> Still missing symbols for some libs (libssl, libcrypto, libz, libdb1)

Btw. Loading debug symbols takes a lot of RAM (and even latest upstream version
of libbfd leaks memory), so for things that use a lot of libraries (like e.g.
desktop) it might actually be better just to have the ones up to the
interesting part of the backtrace.

As debug symbols are used only by debuggers & profilers (Gdb, Oprofile etc), if
there are a lot of them, I sometimes symlink /var/lib/debug to memory card to
save some of that precious internal Flash too (JFFS2 file system also uses more
RAM the fuller it is).


>    format=0x42240960 "no connections, and timed out") at gmessages.c:522
> #9  0x42235ef0 in kill_connection_manager (data=0x411370cc) at run.c:67

Thanks!

The message comes from here:
http://mxr.maemo.org/diablo/source/telepathy-glib-0.7.0/telepathy-glib/run.c#63

And it seems that the process should exit right after it.

I didn't find anything in our internal bug tracker with above function, and no
bugs about freeze with that message (there was one crash in Fremantle where
telepathy-ring had output this message, but as crash was in different process,
I don't think it was related).

Fremantle telepathy-glib is slightly newer than Diablo:
  http://repository.maemo.org/pool/maemo5.0/free/t/telepathy-glib/

Diablo version isn't going to be updated, but if you're someday feeling like a
particularly superior Maemo hacker, maybe you could try rebuilding the
Fremantle package for Diablo and check whether it helps with the issue? :-)
Comment 10 Lucas Maneos 2009-02-21 10:34:33 UTC
(In reply to comment #9)
> maybe you could try rebuilding the Fremantle package for Diablo and check
> whether it helps with the issue?

It depends on glib >= 2.16 so seems like more trouble than it's worth.  I did
try the Fremantle libloudmouth just in case but it made no difference. 
Telepathy-gabble itself is missing from both Fremantle pre-alphas.

If you have access to a complete enough XMPP stack on Fremantle feel free to
use home.maneos.org:5224 for testing (resolves as both A & AAAA and returns
RST).

Some more findings:

- Fiddling with /etc/gai.conf to make the resolver prefer IPv4 addresses made
no difference.
- Running with MALLOC_CHECK_=2 doesn't do or report anything new.
- When running with GABBLE_DEBUG=connection (or =all) telepathy-gabble exits
normally after failing to connect.

The last one suggests a somewhat acceptable (well, better than the alternative)
kludge, ie changing
/usr/share/dbus-1/services/org.freedesktop.Telepathy.ConnectionManager.gabble.service
to:

[D-BUS Service]
Name=org.freedesktop.Telepathy.ConnectionManager.gabble
Exec=/bin/sh -c "GABBLE_DEBUG=connection /usr/bin/telepathy-gabble"

BTW, are you able to comment on kernel-side Fremantle IPv6 support?  It's
currently disabled in the pre-alphas, but after bug 356 I'd expect it at least
to be built as a module.  If it's loaded by default or built into the kernel
then at least this flavour of the bug goes away.

Cedric: can you still reproduce your strace results with 5.2008.43-7?
Comment 11 Lucas Maneos 2009-02-21 10:46:13 UTC
(In reply to comment #10)
> [D-BUS Service]
> Name=org.freedesktop.Telepathy.ConnectionManager.gabble
> Exec=/bin/sh -c "GABBLE_DEBUG=connection /usr/bin/telepathy-gabble"

Slightly cheaper version:

Exec=/bin/sh -c "GABBLE_DEBUG=connection exec /usr/bin/telepathy-gabble"
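
Why the exec form is cheaper, as a hedged Python illustration: with exec the shell replaces itself with the target program, so no extra sh process stays around, and the variable assignment is still passed on in the environment. The sketch runs a stand-in command (another /bin/sh that prints its PID) instead of telepathy-gabble, and the function name is made up.

```python
import subprocess


def run_with_exec():
    """Print the shell PID, then exec a child that prints its PID and the variable."""
    script = (
        "echo $$; "
        "GABBLE_DEBUG=connection exec /bin/sh -c 'echo $$ $GABBLE_DEBUG'"
    )
    result = subprocess.run(
        ["/bin/sh", "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    # Output tokens: PID before exec, PID after exec, value of GABBLE_DEBUG.
    return result.stdout.split()
```

The first two tokens are the same PID, showing that the wrapper shell was replaced rather than kept as a parent process.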
Comment 12 Eero Tamminen nokia 2009-02-23 12:09:21 UTC
> BTW, are you able to comment on kernel-side Fremantle IPv6 support?
> It's currently disabled in the pre-alphas, but after bug 356 I'd expect
> it at least to be built as a module.

According to discussion in the internal bug tracker (about enabling ping6 in
Busybox in Fremantle, for which there's a public bug), IPv6 was enabled in the
Diablo kernel because of WiMAX.  Whether it will be enabled in Fremantle is
open (a wishlist bug on this may help).
Comment 13 Andre Klapper maemo.org 2009-03-05 14:19:42 UTC
Forwarding comments from the internal ticket:

What is the version of Loudmouth?
The buffer it tries to send looks like something encrypted.
It reminds me of this bug:
http://bugs.freedesktop.org/show_bug.cgi?id=14341
http://loudmouth.lighthouseapp.com/projects/17276/tickets/5
Comment 14 Lucas Maneos 2009-03-05 14:52:57 UTC
(In reply to comment #13)
> What is the version of Loudmouth?

1.3.3-0osso5 (http://repository.maemo.org/pool/diablo/free/l/loudmouth/)

> The buffer it tries to send looks like something encrypted.
> It reminds me of this bug:
> http://bugs.freedesktop.org/show_bug.cgi?id=14341
> http://loudmouth.lighthouseapp.com/projects/17276/tickets/5

I don't think these are related.   The server does require TLS, but when this
bug (at least the comment 8 flavour) manifests there is no established
connection let alone a TLS handshake.  The loop happens when it tries to log
that it failed to connect.

Also, the maemo version is configured --with-ssl=openssl (not gnutls).
Comment 15 Lucas Maneos 2009-05-13 05:02:12 UTC
I saw this again earlier on a bad bt/hspa connection.  Unfortunately I didn't
have debug symbols installed, but for what it's worth this is the information I
managed to get out of it:

strace showed it looping on

    gettimeofday({1242165304, 759838}, NULL) = 0
    read(5, 0xa46a8, 5)                     = -1 EAGAIN (Resource temporarily unavailable)

at around 200Hz, with the tv argument to gettimeofday() changing every time.   
I interrupted it with gdb a number of times and it seemed to be bouncing
between libc/libssl/libcrypto.  There was no output at all from ltrace.  FD 5
was a socket.

Telepathy-gabble was running with GABBLE_DEBUG=connection, and it logged the
following:

22:48:14 GLIB DEBUG default - started version 0.6.2.1 (telepathy-glib version 0.7.0)
22:48:14 GLIB DEBUG default - tp_base_connection_class_init: Initializing (TpBaseConnectionClass *)0x693b8
22:48:14 GLIB DEBUG default - gabble_connection_class_init: Initializing (GabbleConnectionClass *)0x69658
22:48:14 GLIB DEBUG default - tp_base_connection_init: Initializing (TpBaseConnection *)0x6c808
22:48:14 GLIB DEBUG default - gabble_connection_init: Initializing (GabbleConnection *)0x6c808
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Post-construction: (TpBaseConnection *)0x6c808
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Handle repo for type #0 at (nil)
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Handle repo for type #1 at 0x6b320
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Handle repo for type #2 at 0x6b350
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Handle repo for type #3 at 0x5ebc0
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Handle repo for type #4 at 0x6b380
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Channel factory #0 at 0x6b400
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Channel factory #1 at 0x6b430
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Channel factory #2 at 0x6e410
22:48:14 GLIB DEBUG default - tp_base_connection_constructor: Channel factory #3 at 0x5ece0
22:48:14 GLIB DEBUG default - gabble_connection_constructor: Post-construction: (GabbleConnection *)0x6c808
22:48:14 GLIB DEBUG default - tp_base_connection_register: bus name org.freedesktop.Telepathy.Connection.gabble.jabber.lucas_40maneos_2eorg_2fN810
22:48:14 GLIB DEBUG default - tp_base_connection_register: object path /org/freedesktop/Telepathy/Connection/gabble/jabber/lucas_40maneos_2eorg_2fN810
22:48:16 GLIB DEBUG default - _gabble_connection_connect: disabling SRV because "server" or "port" parameter specified, will connect to home.maneos.org
22:48:16 GLIB DEBUG default - do_connect: calling lm_connection_open
22:48:16 GLIB DEBUG default - tp_base_connection_change_status: was 4294967295, now 1, for reason 1
22:48:16 GLIB DEBUG default - tp_base_connection_change_status: emitting status-changed to 1, for reason 1
22:48:43 GLIB DEBUG default - do_auth: authenticating with username: lucas, password: <hidden>, resource: N810

and the server logs show:

=INFO REPORT==== 2009-05-12 22:48:43 ===
I(<0.272.0>:ejabberd_listener:116) : (#Port<0.7291>) Accepted connection {{0,0,0,0,0,65535,55723,33090},48659} -> {{0,0,0,0,0,65535,20923,21156},5222}

=INFO REPORT==== 2009-05-12 22:48:43 ===
D(<0.12746.0>:ejabberd_receiver:306) : Received XML on stream = "<?xml version='1.0' encoding='UTF-8'?>"

=INFO REPORT==== 2009-05-12 22:48:43 ===
D(<0.12746.0>:ejabberd_receiver:306) : Received XML on stream = "<stream:stream version=\"1.0\" xmlns=\"jabber:client\" xmlns:stream=\"http://etherx.jabber.org/streams\" to=\"maneos.org\" id=\"98850496166\">"

=INFO REPORT==== 2009-05-12 22:48:43 ===
D(<0.12747.0>:ejabberd_c2s:1352) : Send XML on stream = "<?xml version='1.0'?><stream:stream xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams' id='2015248111' from='maneos.org' version='1.0' xml:lang='en'>"

=INFO REPORT==== 2009-05-12 22:48:43 ===
D(<0.12747.0>:ejabberd_c2s:1352) : Send XML on stream = "<stream:features><starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/><mechanisms xmlns='urn:ietf:params:xml:ns:xmpp-sasl'><mechanism>DIGEST-MD5</mechanism><mechanism>PLAIN</mechanism></mechanisms><register xmlns='http://jabber.org/features/iq-register'/></stream:features>"

=INFO REPORT==== 2009-05-12 22:48:43 ===
D(<0.12746.0>:ejabberd_receiver:306) : Received XML on stream = "<starttls xmlns=\"urn:ietf:params:xml:ns:xmpp-tls\" id=\"8461984942\"></starttls>\n"

=INFO REPORT==== 2009-05-12 22:48:43 ===
D(<0.12746.0>:shaper:61) : State: {maxrate,1000,0,1242164923130174}, Size=78
M=39.0, I=540.718


=INFO REPORT==== 2009-05-12 22:48:43 ===
D(<0.12746.0>:ejabberd_receiver:306) : Received XML on stream = []

=INFO REPORT==== 2009-05-12 22:48:43 ===
D(<0.12746.0>:shaper:61) : State: {maxrate,1000,72.11485122151463,
                                      1242164923670978}, Size=0
M=0.0, I=11.84


=INFO REPORT==== 2009-05-12 22:48:44 ===
D(<0.12746.0>:ejabberd_receiver:306) : Received XML on stream = []

=INFO REPORT==== 2009-05-12 22:48:44 ===
D(<0.12746.0>:shaper:61) : State: {maxrate,1000,36.05742561075731,
                                      1242164923682899}, Size=0
M=0.0, I=985.167


I couldn't reproduce later.

Other info: the libloudmouth used has the patch from bug 4119, and ipv6.ko was
loaded.
Comment 16 Andre Klapper maemo.org 2009-06-30 17:36:47 UTC
...so the question, like always, is: Does it still happen in Fremantle?
Probably too early to say... :-/
Comment 17 Lucas Maneos 2009-06-30 22:12:19 UTC
(In reply to comment #16)
> ...so the question like always is: Does it still happen in Fremantle?

Can't test with what's available in scratchbox, but if someone has a device
with enough marbles (I guess the accounts & presence applets plus the
telepathy/gabble/loudmouth libs stack) feel free to use the comment 10 details
to test the ipv6 flavour - I still haven't found a reliable way to reproduce
the other(s).

What would be even better is if someone with access to x86 binaries could run
telepathy-gabble under valgrind :-)
Comment 18 Andre Klapper maemo.org 2009-07-01 14:10:17 UTC
Internal comment: 

"I can't reproduce the bug by connecting to a port without a server behind it,
as the public bug suggests, but that appears to be a separate bug (pretty
bizarre — why would it deadloop on printing the "i'm going away, no-one needs
me!" message?)
So, step one is to figure out a way to reliably reproduce it..."
Comment 19 Lucas Maneos 2009-07-02 04:33:24 UTC
(In reply to comment #18)
> "I can't reproduce the bug by connecting to a port without a server behind it,
> [...]
> So, step one is to figure out a way to reliably reproduce it..."

Just in case it wasn't obvious: the configured server hostname needs to have
AAAA DNS records published in order to trigger that bug.

> but that appears to be a separate bug

I think so too, in fact there may be 3 different ones (original, ipv6 version
& the spinning-in-openssl one in comment 15).
Comment 20 Lucas Maneos 2009-07-11 20:23:38 UTC
I can tickle the comment 15 case (not always, but often enough to be useful -
maybe one time in three) by using iptables on the server side to simulate heavy
packet loss ("iptables -A OUTPUT -d $TABLETIP -m statistic --mode random
--probability 0.9 -j DROP").

With libc6-dbg, libcst0-dbg, libdbus-1-3-dbg, libdbus-glib-1-2-dbg,
libgconf2-6-dbg, libglib2.0-0-dbg, libloudmouth1-0-dbg & libtelepathy-glib0-dbg
installed the stack looks like this:

(gdb) bt
#0  0x4205b970 in EVP_MD_CTX_cleanup () from /usr/lib/libcrypto.so.0.9.8
#1  0x00000004 in ?? ()
Backtrace stopped: frame did not save the PC

There are no openssl -dbg packages as far as I can tell.
Comment 21 Lucas Maneos 2009-07-12 14:14:45 UTC
Created an attachment (id=1269) [details]
rich core dump for the openssl case

After rolling and installing my own libssl0.9.8-dbg package it becomes a bit
harder to trigger the bug, but eventually I did and got:

(gdb) bt
#0  0x410cfdac in read () from /lib/libc.so.6
#1  0x40104b54 in sock_read (b=0x80e48, out=0x3 <Address 0x3 out of bounds>, 
    outl=981) at bss_sock.c:139
#2  0x40102de0 in BIO_read (b=0x80e48, out=0xa5601, outl=981) at bio_lib.c:212
#3  0x40045f54 in ssl3_read_n (s=0x89100, n=3825, max=3825, extend=2844)
    at s3_pkt.c:198
#4  0x40046b58 in ssl3_read_bytes (s=0x89100, type=22, buf=0x9f580 "\002", 
    len=4, peek=0) at s3_pkt.c:315
#5  0x40047de4 in ssl3_get_message (s=0x89100, st1=4400, stn=4401, mt=-1, 
    max=16, ok=0xbe98831c) at s3_both.c:394
#6  0x40040bb0 in ssl3_get_server_certificate (s=0x89100) at s3_clnt.c:843
#7  0x40043818 in ssl3_connect (s=0x89100) at s3_clnt.c:298
#8  0x40053e08 in SSL_connect (s=0x89100) at ssl_lib.c:881
#9  0x4001680c in _lm_ssl_begin (ssl=0x6fc30, fd=-40208, 
    server=0x666e8 "maneos.org", error=0xbe9883dc) at lm-ssl-openssl.c:334
#10 0x400182cc in _lm_socket_ssl_init (socket=0x6cb80, delayed=1)
    at lm-socket.c:320
#11 0x40011cf4 in _lm_connection_starttls_cb (handler=0xfffffff5, 
    connection=0x70710, message=0x3d5, user_data=0x10) at lm-connection.c:780
#12 0x40013610 in _lm_message_handler_handle_message (handler=0xa5601, 
    connection=0xa5601, message=0x3d5) at lm-message-handler.c:47
#13 0x40012ac0 in connection_message_queue_cb (queue=0xfffffff5, 
    connection=0x70710) at lm-connection.c:307
#14 0x40014264 in message_queue_dispatch_func (source=0xfffffff5, 
    callback=0xa5601, user_data=0x3d5) at lm-message-queue.c:100
#15 0x4116e108 in IA__g_main_context_dispatch (context=0x61a80) at gmain.c:2045
#16 0x4116ffd4 in g_main_context_iterate (context=0x61a80, block=1, 
    dispatch=1, self=0x10) at gmain.c:2677
#17 0x4117034c in IA__g_main_loop_run (loop=0x604d8) at gmain.c:2881
#18 0x42236076 in tp_run_connection_manager (prog_name=0x0, 
    version=0x451f4 "0.6.2.1", construct_cm=0x4224ee70 <manager>, argc=16, 
    argv=0xbe988694) at run.c:235
#19 0x00010034 in ?? ()
Cannot access memory at address 0x0
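
For comparison, a generic sketch of how a non-blocking TLS handshake is usually driven so that it cannot spin: when the TLS layer reports that it needs more data, the caller waits for the socket to become readable (or writable) with an overall deadline, and only then retries. This is written in Python with its standard ssl module, not against Loudmouth or OpenSSL's C API, it is not a description of the attached patch, and handshake_with_wait is a hypothetical name.

```python
import select
import socket
import ssl
import time


def handshake_with_wait(tls_sock, timeout=5.0):
    """Complete a TLS handshake on a non-blocking socket without busy-looping."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            tls_sock.do_handshake()
            return
        except ssl.SSLWantReadError:
            # The TLS layer needs more bytes from the peer.
            wait_read, wait_write = [tls_sock], []
        except ssl.SSLWantWriteError:
            # The TLS layer needs room in the send buffer.
            wait_read, wait_write = [], [tls_sock]
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("TLS handshake timed out")
        # Sleep until the socket is ready or the deadline passes.
        readable, writable, _ = select.select(wait_read, wait_write, [], remaining)
        if not readable and not writable:
            raise TimeoutError("TLS handshake timed out")
```

If the peer goes silent mid-handshake, as with the heavy packet loss from comment 20, this gives up with TimeoutError once the deadline passes instead of polling read() in a tight loop.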
Comment 22 Lucas Maneos 2009-07-12 15:03:10 UTC
Created an attachment (id=1270) [details]
Quick kludge to stop _lm_ssl_begin spinning
Comment 23 Mikhail Zabaluev nokia 2009-08-21 16:42:45 UTC
Fixed for Fremantle. Thanks to everybody involved.
Comment 24 Lucas Maneos 2009-10-18 12:54:13 UTC
The good news is that I haven't been able to reproduce this so far in
1.2009.41-10, but _lm_ssl_begin still looks scary to me.

Any pointers (where /is/ loudmouth's upstream source repository this week?) to
the actual fix appreciated to help verify, and also to see if it could be
backported properly to Diablo (in the context of
<http://wiki.maemo.org/Diablo_Community_Project>).
Comment 25 Lucas Maneos 2009-10-22 07:57:24 UTC
Marking patches of interest to Diablo (Maemo4) community updates, please excuse
the noise.
Comment 26 Andre Klapper maemo.org 2009-12-16 02:30:35 UTC
*** Bug 6441 has been marked as a duplicate of this bug. ***