maemo.org Bugzilla – Bug 8581
Reduce XMPP data packets to save battery
Last modified: 2010-06-18 13:18:12 UTC
Please see http://talk.maemo.org/showpost.php?p=497657&postcount=516 for some background. It looks like the built-in IM is the cause of poor battery life. Can the N900 be smarter about handling this? The N900 is especially power hungry when using 3G rather than WiFi.

Some suggestions:
- When no active IM conversation is in place, i.e. the device is sitting idle waiting for an IM, reduce the number of packets sent by 50%. That just means waiting 4 seconds for a new IM vs. 2 seconds; no big deal.
- When an active conversation is in place, revert to the old settings; when the conversation is finished or becomes idle, reduce packets again.

A simple change for a huge win on battery life?

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7
Thanks for reporting this. For future reference, please always use the bug template and fill out all fields.
I can confirm this for the XMPP case (can't comment on Skype).

Steps:
1. Install tcpdump from the tools repository.
2. Configure an XMPP account.
3. Take the account online (but leave it otherwise idle).
4. In a root shell session run "tcpdump -n -i wlan0 tcp port 5222" and watch the output.

Outcome: One packet originating from the Maemo device every 33 seconds. This is quite excessive for TCP sessions. The Best Current Practice (RFC 5382) for NAT gateways is to not expire TCP connections until at least 2 hours 4 minutes of idle time.
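To turn the tcpdump output from the steps above into interval numbers, something like the following sketch could be used. It assumes tcpdump's default timestamp format (HH:MM:SS.ffffff at the start of each line); the function name, the optional source filter, and the sample lines in the usage note are illustrative and not taken from the actual capture.

```python
from datetime import datetime


def packet_intervals(lines, src=None):
    """Return the gaps (in seconds) between consecutive tcpdump lines.

    Assumes the default tcpdump timestamp format at the start of each line.
    If `src` is given, only lines whose source address is `src` are counted.
    """
    stamps = []
    for line in lines:
        if src is not None and f" IP {src}." not in line:
            continue
        first_field = line.split(" ", 1)[0]
        try:
            t = datetime.strptime(first_field, "%H:%M:%S.%f")
        except ValueError:
            continue  # not a packet line (e.g. tcpdump's banner)
        stamps.append(t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6)

    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        gap = later - earlier
        if gap < 0:  # capture ran past midnight
            gap += 86400
        gaps.append(round(gap, 6))
    return gaps
```

Saving the capture to a file and calling `packet_intervals(open("capture.txt"), src="192.0.2.10")` (file name and address are placeholders) would list the spacing of the packets sent by the device.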
The observation in comment 2 was with a SIP/UDP account also enabled. With just the XMPP account active, the keepalives are sent approximately every 1'35" which is slightly better but still overkill.
Is the source code for this module open or closed? If it's open, can you point me to the source and the tag for PR1.1 so I can play around with the intervals?
(In reply to comment #4)
> Is the source code for this module open or closed? If its open, can you point
> me to the source and the tag for PR1.1 so I can play around with the intervals?
>

I think rtcomm is open; the relevant library is called libpurple. I'm not sure about Skype, but Skype's APIs are open, which might help.
(In reply to comment #3)
> The observation in comment 2 was with a SIP/UDP account also enabled. With
> just the XMPP account active, the keepalives are sent approximately every 1'35"
> which is slightly better but still overkill.

Hmm. When both XMPP and SIP accounts are enabled, which protocol do the most frequent keepalives belong to?
(In reply to comment #0)
> - When no active IM conversation in place, i.e sitting idle waiting for an IM.
> Reduce the number of packets sent by 50%. Just means waiting for a new IM to 4
> seconds v.s 2 seconds. no big deal

With XMPP and SIP we are not polling for incoming IMs. It's just about keeping the NAT binding alive; otherwise we run into unrouted incoming calls and unreceived messages.

For 3G connections, the default keepalive interval has now been raised to 10 minutes, on the assumption that typical 3G setups will not have draconian binding lifetimes.
(In reply to comment #6)
> Hmm. When both XMPP and SIP accounts are enabled, to which protocol the most
> frequent keepalives belong?

Oh, I see it's filtered to XMPP. We need to check what our heartbeat stuff does there.
(In reply to comment #6)
> (In reply to comment #3)
> > The observation in comment 2 was with a SIP/UDP account also enabled. With
> > just the XMPP account active, the keepalives are sent approximately every 1'35"
> > which is slightly better but still overkill.
>
> Hmm. When both XMPP and SIP accounts are enabled, to which protocol the most
> frequent keepalives belong?
>

XMPP, and it depends on the server: gtalk takes most of the battery but ovi.com doesn't. Skype doesn't take too much compared with gtalk. I have a single buddy on gtalk and a couple on Skype, and disconnecting from gtalk extends my battery life from 10 hours to 24+. I also have an ovi.com account but no buddies there.
(In reply to comment #7)
> With XMPP and SIP we are not polling for incoming IMs. It's just about
> keeping the NAT binding alive, otherwise we run into unrouted incoming calls
> and unreceived messages.
> For 3G connections, the default keepalive interval is now rarefied to 10
> minutes, in assumption that typical 3G setups will not have draconian binding
> lifetimes.
>

Set it as an advanced parameter for each network connection separately. I have a WLAN with a 2-hour NAT timeout. But the typical parameters from nf_conntrack_proto_tcp.c are:

static unsigned int tcp_timeouts[TCP_CONNTRACK_MAX] __read_mostly = {
	[TCP_CONNTRACK_SYN_SENT]	= 2 MINS,
	[TCP_CONNTRACK_SYN_RECV]	= 60 SECS,
	[TCP_CONNTRACK_ESTABLISHED]	= 5 DAYS,
	[TCP_CONNTRACK_FIN_WAIT]	= 2 MINS,
	[TCP_CONNTRACK_CLOSE_WAIT]	= 60 SECS,
	[TCP_CONNTRACK_LAST_ACK]	= 30 SECS,
	[TCP_CONNTRACK_TIME_WAIT]	= 2 MINS,
	[TCP_CONNTRACK_CLOSE]		= 10 SECS,
	[TCP_CONNTRACK_SYN_SENT2]	= 2 MINS,
};
Additional info from Linux "man 7 tcp":

tcp_keepalive_time (integer; default: 7200)
    The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds apart) when keep-alive is enabled.

Additional info from "man 5 ssh_config":

ServerAliveInterval
    Sets a timeout interval in seconds after which if no data has been received from the server, ssh(1) will send a message through the encrypted channel to request a response from the server. The default is 0, indicating that these messages will not be sent to the server, or 300 if the BatchMode option is set. This option applies to protocol version 2 only. ProtocolKeepAlives is a Debian-specific compatibility alias for this option.

I think a default value of 300 seconds for WLAN would be a good starting point.
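For reference, the kernel-level keepalive described in the man page excerpt can be set per socket. The sketch below is only an illustration of those socket options, not of what the XMPP stack on the device does (its keepalives are sent at the application level). The helper names are made up, the 300-second idle time is the value suggested above, and the TCP_KEEP* constants are Linux-specific, so they are applied only where Python exposes them.

```python
import socket


def enable_keepalive(sock, idle=300, interval=75, probes=9):
    """Turn on TCP keepalive and, where the platform allows, tune its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tunables = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    )
    for name, value in tunables:
        option = getattr(socket, name, None)  # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def dead_peer_detection_time(idle, interval=75, probes=9):
    """Seconds until an idle, dead connection is dropped."""
    return idle + interval * probes
```

With the defaults quoted from the man page, `dead_peer_detection_time(7200)` gives 7875 seconds: the 2 hours of idle time plus 9 x 75 = 675 seconds (about 11 minutes) of probing.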
(In reply to comment #10)
> Set it as an advanced parameter for each network connection separately. I have
> WLAN with 2 hours NAT time.

You can do that to some extent: check the profiles in /usr/share/osso-rtcom; they have keepalive parameters specific to connection types. There is no configuration specific to individual connections, though.

> But a typical parameters from nf_conntrack_proto_tcp.c are:

We want to survive less typical (but somewhat common) cases, too. Loss of functionality is a more serious problem than decreased battery life.
(In reply to comment #12)
> (In reply to comment #10)
> > Set it as an advanced parameter for each network connection separately. I have
> > WLAN with 2 hours NAT time.
>
> You can do that to some extent - check the profiles in /usr/share/osso-rtcom,
> they have keepalive parameters specific to connection types. No configuration
> specific to individual connections, though.
>
> > But a typical parameters from nf_conntrack_proto_tcp.c are:
>
> We want to survive less typical (but somewhat common) cases, too. Loss of
> functionality is a more serious problem than decreased battery life.
>

If it might help, the Skype protocol is described here: http://www1.cs.columbia.edu/~library/TR-repository/reports/reports-2004/cucs-039-04.pdf

It seems to be a difficult protocol to control and block. I assume they might not allow tuning of any keepalive parameters? I know, for example, that they have a file, shared.xml, which keeps a list of around 200 IPs to which the Skype client talks, advertising its availability. If we could reduce this to 25 or 50 we could get an 8-fold or 4-fold decrease in network traffic, which might help with the battery.
(In reply to comment #9)
> XMPP and it depends from server - gtalk takes a most of battery but ovi.com
> doesn't.

Google Talk servers send every client a whitespace character every 30 seconds. At present we don't know if this can be controlled by the client.

This still does not explain comment #3. Lucas, what was the service where you observed the difference between timing patterns depending on a SIP account being enabled? Can you reproduce this over, say, 5 minutes of use?

I'm renaming this bug to narrow it down to XMPP protocol keepalives. If you have issues with other protocols, please file separate bugs. Appetites of the Skype client are known and followed internally, but we can't promise any fixes yet.
(In reply to comment #14)
> This still does not explain comment #3.

And I can't reproduce it now :-/ After watching about an hour's worth of tcpdump output, XMPP packets are sent every ~93 seconds regardless of whether any SIP accounts are enabled. When SIP is enabled, the SIP keepalive interval is the same [1].

> Lucas, what was the service where you
> observed the difference between timing patterns depending on a SIP account
> being enabled?

It's ejabberd 2.1.1 and asterisk 1.6.1 running on a local box, using a WLAN connection on the N900.

[1] based on syslog getting spammed with these every 90-odd seconds:

telepathy-sofiasip[4551]: outbound(0x48b90): FAILED to validate <URI>
telepathy-sofiasip[4551]: outbound(0x48b90): FAILED with 200 OK

(In reply to comment #12)
> You can do that to some extent - check the profiles in /usr/share/osso-rtcom,

/usr/share/osso-rtcom/google-talk.profile:Value-WLAN_INFRA = 120
/usr/share/osso-rtcom/google-talk.profile:Value-GPRS = 600
/usr/share/osso-rtcom/jabber.profile:Value-WLAN_INFRA = 120
/usr/share/osso-rtcom/jabber.profile:Value-GPRS = 600
/usr/share/osso-rtcom/nokiachat.profile:Value-WLAN_INFRA = 120
/usr/share/osso-rtcom/nokiachat.profile:Value-GPRS = 600
/usr/share/osso-rtcom/sip.profile:Value-WLAN_INFRA = 120
/usr/share/osso-rtcom/sip.profile:Value-GPRS = 600

These values don't seem to be used.
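To compare the configured values with what tcpdump shows, the grep-style lines listed in this comment can be collected into a small table. This is a rough sketch that parses only the "path:Value-TYPE = N" lines exactly as pasted here; it makes no claim about the full .profile file format, and the function name is made up.

```python
def parse_profile_values(grep_lines):
    """Map profile name -> {connection type: keepalive seconds}."""
    values = {}
    for line in grep_lines:
        path, sep, setting = line.strip().partition(":")
        if not sep or "=" not in setting:
            continue
        key, _, raw = setting.partition("=")
        key = key.strip()
        if not key.startswith("Value-"):
            continue
        profile = path.rsplit("/", 1)[-1]
        if profile.endswith(".profile"):
            profile = profile[: -len(".profile")]
        try:
            seconds = int(raw.strip())
        except ValueError:
            continue  # not a plain number; ignore
        values.setdefault(profile, {})[key[len("Value-"):]] = seconds
    return values
```

For the lines above this yields 120 seconds for WLAN_INFRA and 600 seconds for GPRS in every profile, which does not match the observed spacing of roughly 93 seconds.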
(In reply to comment #15)
> It's ejabberd 2.1.1 and asterisk 1.6.1 running on a local box, using a WLAN
> connection on the N900.
>
> [1] based on syslog getting spammed with these every 90-odd seconds:
>
> telepathy-sofiasip[4551]: outbound(0x48b90): FAILED to validate <URI>
> telepathy-sofiasip[4551]: outbound(0x48b90): FAILED with 200 OK

If it's a local box with no NAT between it and the N900, you can disable keepalives and "Discover public binding" in the account settings.

> (In reply to comment #12)
> > You can do that to some extent - check the profiles in /usr/share/osso-rtcom,
>
> /usr/share/osso-rtcom/google-talk.profile:Value-WLAN_INFRA = 120
> /usr/share/osso-rtcom/google-talk.profile:Value-GPRS = 600
> /usr/share/osso-rtcom/jabber.profile:Value-WLAN_INFRA = 120
> /usr/share/osso-rtcom/jabber.profile:Value-GPRS = 600
> /usr/share/osso-rtcom/nokiachat.profile:Value-WLAN_INFRA = 120
> /usr/share/osso-rtcom/nokiachat.profile:Value-GPRS = 600
> /usr/share/osso-rtcom/sip.profile:Value-WLAN_INFRA = 120
> /usr/share/osso-rtcom/sip.profile:Value-GPRS = 600
>
> These values don't seem to be used.

Check the keepalive period in the advanced account settings: is it set to something other than "auto"?
(In reply to comment #16)
> Check the keepalive period in the advanced account settings: is it set to
> something else than "auto"?

It's "auto" for the SIP account; there is no keepalive interval option for XMPP.
(In reply to comment #15)
> [1] based on syslog getting spammed with these every 90-odd seconds:
>
> telepathy-sofiasip[4551]: outbound(0x48b90): FAILED to validate <URI>
> telepathy-sofiasip[4551]: outbound(0x48b90): FAILED with 200 OK

To me this looks less like a keepalive than like a protocol misunderstanding. SIP usually uses OPTIONS, re-registration or even INVITE to keep in touch with the server.
(In reply to comment #12)
> > But a typical parameters from nf_conntrack_proto_tcp.c are:
>
> We want to survive less typical (but somewhat common) cases, too. Loss of
> functionality is a more serious problem than decreased battery life.
>

The nf_conntrack values are difficult to change, and those values are the actual NAT timeouts in Linux-based routers. However, it makes sense to use 300 seconds simply because it is a common timeout; look at the SSH timeout, for example. It is uncommon to configure lower values in a NAT, because the overall TCP setup/teardown timeouts are in the same range: any NAT timeout shorter than that may interfere with TCP setup/teardown.

(DISCLAIMER: none of this holds for fast-internet projects, but those are still in the research phase.)
(2nd DISCLAIMER: in an aggressive BitTorrent environment an established NAT path may be broken sooner than the timeout value, but it is practically impossible for anything besides BitTorrent to operate in such an environment.)
(In reply to comment #18)
> It seems for me as not keepalive but protocol misunderstanding.

It's a bit of both :-) Sample dialog:

> OPTIONS sip:XXX@example.org SIP/2.0
> v:SIP/2.0/UDP YYY.YYY.YYY.YYY:52044;rport;branch=z9hG4bK8Zcjy9215mSpB
> f:<sip:XXX@example.org>;tag=ZSKtpXHUjB82g
> t:<sip:XXX@example.org>
> i:xygJaE0EIzh02BKjvzi-AK
> CSeq:126533051 OPTIONS
> Accept:application/vnd.nokia-register-usage
> s:REGISTRATION PROBE
> l:0

< SIP/2.0 200 OK
< Via: SIP/2.0/UDP YYY.YYY.YYY.YYY:52044;branch=z9hG4bK8Zcjy9215mSpB;received=YYY.YYY.YYY.YYY;rport=52044
< From: <sip:XXX@example.org>;tag=ZSKtpXHUjB82g
< To: <sip:XXX@example.org>;tag=as36e4f2c7
< Call-ID: xygJaE0EIzh02BKjvzi-AK
< CSeq: 126533051 OPTIONS
< Server: Asterisk PBX 1.6.1.11
< Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO
< Supported: replaces, timer
< Contact: <sip:XXX.XXX.XXX.XXX>
< Accept: application/sdp
< Content-Length: 0

It seems harmless, though a bit redundant (it also does a REGISTER/401/REGISTER/200 round-trip before that, which is more than enough to keep the session alive, plus asterisk sends its own periodic OPTIONS).
(In reply to comment #20)
> (In reply to comment #18)
> > It seems for me as not keepalive but protocol misunderstanding.
>
> It's a bit of both :-) Sample dialog:

(skipped)

That is the right way (maybe "FAILED" is just an unwise choice of syslog message?)

> It seems harmless, though a bit redundant (it also does a
> REGISTER/401/REGISTER/200 round-trip before that which is more than enough to
> keep the session alive, plus asterisk sends its own periodic OPTIONS).
>

It seems to me that receiving ANY message from the server that has to be ANSWERED should cancel the next pending keepalive message from the client (N900). That halves the traffic when the server sends keepalives of its own.
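The suggestion in this comment amounts to a keepalive timer whose deadline is pushed back whenever answered traffic from the server is seen. A minimal sketch of that idea follows; the class and method names are hypothetical and this is not how Sofia-SIP is implemented.

```python
class KeepaliveTimer:
    """Keepalive deadline that is postponed by any answered server message."""

    def __init__(self, interval, now=0.0):
        self.interval = interval
        self.deadline = now + interval

    def on_answered_message(self, now):
        # The exchange itself refreshed the NAT binding, so start over.
        self.deadline = now + self.interval

    def due(self, now):
        """True once a client keepalive should be sent."""
        return now >= self.deadline
```

With a 90-second interval and a server message answered at t = 80, the client keepalive that would have gone out at t = 90 is deferred until t = 170.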
(In reply to comment #18)
> > [1] based on syslog getting spammed with these every 90-odd seconds:
> >
> > telepathy-sofiasip[4551]: outbound(0x48b90): FAILED to validate <URI>
> > telepathy-sofiasip[4551]: outbound(0x48b90): FAILED with 200 OK
>
> It seems for me as not keepalive but protocol misunderstanding. SIP usually
> uses OPTIONS, re-registration or even INVITE to keep in touch with server.

Sofia-SIP uses OPTIONS for keepalives. I'd like to see a packet dump, but it seems that the proxy also forces registration updates by setting a short expiration period for the REGISTER binding.
(In reply to comment #21)
> It seems for me that the fact of reception of ANY message from server which
> should be ANSWERED should cancel the nearest keepalive message from client
> (N900). It decreases traffic twice in case of keepalives from server.

For OPTIONS and REGISTER keepalives, we also want to detect NAT binding failures by examining the rport/received parameters of the topmost Via header in the server's response (RFC 3581). Messages from the same proxy transport flow could be used to bump the timer, but nobody has come up with a Sofia-SIP patch yet. Feel free to file a separate SIP bug.
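The RFC 3581 check mentioned here can be pictured as follows: read the received and rport parameters from the topmost Via of the response and compare them with what the previous response reported. This is a simplified sketch with made-up helper names and documentation-range addresses; it ignores quoting and other corner cases of SIP header syntax and is not Sofia-SIP code.

```python
def via_params(via_value):
    """Split a Via header value into (sent-by, {parameter: value})."""
    parts = [part.strip() for part in via_value.split(";")]
    sent_by = parts[0].split()[-1]  # e.g. "192.0.2.10:52044"
    params = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        params[key.strip().lower()] = value.strip()
    return sent_by, params


def binding_changed(via_value, last_seen):
    """Compare the public binding in this response with the previous one.

    Returns (changed, current), where current is the (received, rport) pair.
    """
    _, params = via_params(via_value)
    current = (params.get("received"), params.get("rport"))
    changed = last_seen is not None and current != last_seen
    return changed, current
```

A client would keep the returned pair between keepalives; a change in either value suggests the NAT binding was lost and re-created, which is the cue to re-register.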
Hi guys,

Has anyone looked at the FreeSWITCH server, which also uses the Sofia-SIP library? They may have improvements in the protocol handling that could resolve this issue.

Cheers,
Damien
(In reply to comment #24)
> Has anyone looked at the FreeSwitch server who use the Sofia-SIP library too ?
> They may have improvements in the protocol that may resolve this issue.

If you find any, please send patches upstream at http://sourceforge.net/projects/sofia-sip/
Hi,

The FreeSWITCH project has tried to push the patches upstream, but without much success. In any case, their FishEye (http://fisheye.freeswitch.org/changelog/FreeSWITCH/libs/sofia-sip) shows all the patches and bug fixes that they have applied.

Cheers,
Damien
A fix for this has been internally integrated in loudmouth 1.4.1-0osso10. Awaiting verification before changing status to FIXED here.
(In reply to comment #27)
> A fix for this has been internally integrated in loudmouth 1.4.1-0osso10.
> Awaiting verification before changing status to FIXED here.
>

Can you give us any indication of what was fixed, i.e. what was improved, and any tests showing how much battery life changed? Thanks!
Copying internal comments:

"It will timebox the first heartbeat with something like (0, keepalive_interval) to potentially get in the "wave" with other clients, and after that use something like (keepalive_interval - 30, keepalive_interval)."

"I checked the battery life when XMPP accounts are configured and connected via 3G/WLAN, and I see that the battery level is still more than 50% even after 14 hrs."
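One way to read the first internal comment: each heartbeat is given a (min, max) window rather than a fixed time, so that the system can wake several clients in the same "wave". The sketch below illustrates only that reading. The 30-second slack comes from the comment; the function names and the toy scheduler that picks a shared wake-up time are assumptions, not the loudmouth code or the device's heartbeat API.

```python
def heartbeat_window(keepalive_interval, first, slack=30):
    """(min, max) delay in seconds within which the next heartbeat may fire."""
    if first:
        # Wide window: join whatever wake-up comes next.
        return (0, keepalive_interval)
    return (max(0, keepalive_interval - slack), keepalive_interval)


def pick_wakeup(window, shared_wakeups):
    """Use the earliest already-planned wake-up inside the window, if any.

    `shared_wakeups` are delays (seconds from now) at which the device will
    be awake anyway; if none fits, fall back to the end of the window.
    """
    low, high = window
    for delay in sorted(shared_wakeups):
        if low <= delay <= high:
            return delay
    return high
```

With a 120-second interval, later heartbeats get the window (90, 120); if the device is due to wake at 100 seconds for some other client, the heartbeat rides along at 100 instead of causing a separate wake-up at 120.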
(In reply to comment #29)
> Copying internal comments:
>
> "It will timebox the first heartbeat with something like (0,
> keepalive_interval) to potentially get in the "wave" with other clients, and
> after that use something like (keepalive_interval - 30, keepalive_interval)."
>
> "I checked for the batter life when Xmpp accounts are configured and connected
> via 3G/Wlan and i see that the battery life is more than 50% even after 14
> hrs."
>

Great, when will this be integrated into a new build and released?
Patience. See comment 27, please. Will keep you informed.
This has been fixed in package loudmouth 1.4.1-0osso10+0m5, which is part of the internal build version 10.2010.06-8. (Note: 2009/2010 is the year, and the number after is the week.)

A future public update released with a year/week later than this internal build version will include the fix. (This is not always the very next public update.)

Please verify that this new version fixes the bug by marking this bug report as VERIFIED after the public update has been released, if you have some time.

To answer popular followup questions:
* Nokia does not announce release dates of public updates in advance.
* There is currently no access to these internal, non-public build versions. A Brainstorm proposal to change this exists at http://maemo.org/community/brainstorm/view/undelayed_bugfix_releases_for_nokia_open_source_packages-002/
Setting explicit PR1.2 milestone (so it's clearer in which public release the fix will be available to users). Sorry for the bugmail noise (you can filter on this message).
So, part of this bug is fixed and according to internal testing provides longer battery life. However, Google Talk still sends whitespace pings every 30 seconds which is another issue that is NOT solved yet.
(In reply to comment #34)
> However, Google Talk still sends whitespace pings every 30 seconds which is
> another issue that is NOT solved yet.

There's a Gabble bug tracking my proposed fix for this issue at <https://bugs.freedesktop.org/show_bug.cgi?id=27790>.
Not sure if it's related or whether another bug should be opened, but it seems that even with an offline N900 (offline as in no data connection, not completely offline), if the availability is set to “online” the battery drains quite fast (as if it keeps trying to connect even when there's no data connection, or something).

Manually setting the availability to offline works around the problem but isn't really convenient.
(In reply to comment #36)
> Not sure if it's related or if another bug should be open, but it seems that,
> even with an offline n900 (offline as no data connection, not completely
> offline), if the availability is set to “online” it'll drop the battery
> quite fast (as if it tries to connect even when there's no data connection, or
> something.
>
> Manually setting the availability to offline workarounds the problem but isn't
> really convenient.

Mission Control (the component which orchestrates connections) multiplies the time between reconnection attempts by two each time it fails, up to a maximum of half an hour. This should be infrequent enough not to cause this problem...
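The backoff described here (double the wait after each failure, capped at half an hour) can be sketched as below. The initial delay is not stated in this thread, so the 30 seconds used in the usage note is purely an assumption, as are the function names; this is not Mission Control's code.

```python
def reconnect_delays(initial, attempts, cap=1800):
    """Waits (seconds) before each of the first `attempts` reconnection tries."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay = min(delay * 2, cap)
    return delays


def attempts_within(duration, initial, cap=1800):
    """How many reconnection tries fit into `duration` seconds offline."""
    elapsed = 0
    delay = min(initial, cap)
    count = 0
    while elapsed + delay <= duration:
        elapsed += delay
        count += 1
        delay = min(delay * 2, cap)
    return count
```

Assuming a 30-second initial delay, the waits would be 30, 60, 120, 240, 480 and 960 seconds and then 1800 seconds each, so a device left offline for 8 hours would make about 20 attempts. A much larger count in practice would suggest the backoff is not working.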
(In reply to comment #37)
> Mission Control (the component which orchestrates connections) multiplies the
> time between reconnection attempts by two each time it fails, up to a maximum
> of half an hour. This should be infrequent enough not to cause this problem...

I'll try to monitor stuff tonight, using the following scenario:

- charge device fully
- disconnect all data connections
- disconnect device from charger
- move availability to online
- let it sleep for some time
- move availability to offline

and see the difference in battery-eye.
(In reply to comment #38)
> (In reply to comment #37)
> > Mission Control (the component which orchestrates connections) multiplies the
> > time between reconnection attempts by two each time it fails, up to a maximum
> > of half an hour. This should be infrequent enough not to cause this problem...
>
> I'll try to monitor stuff tonight, using following scenario:
>
> - charge device fully
> - disconnect all data connection
> - disconnect device from charger
> - move availability to online
> - let it sleep for some time
> - move availability to offline
>
> and see the difference in battery-eye.

If you could also record the output of `dbus-monitor member=RequestConnection` to a file, we can also see how many times MC has tried to bring your account online. Given that and the time you've left it disconnected with the availability set to online, we can see if the reconnection backoff is working.
...and the related issue that Google Talk sends whitespace pings every 30 seconds has been fixed in telepathy-gabble 0.8.13-0maemo2+0m5 which is included in the internal version 2010.24-4 and will be available in one of the next public updates.