Bug 3945 - WiFi PSM keeps tablet from responding to external network requests when radio is idle
Status: RESOLVED WORKSFORME
Product: Connectivity
Component: WiFi
Version: 5.0/(1.2009.41-10)
Platform: All Maemo
Importance: Low normal with 1 vote
Target Milestone: 5.0/(2.2009.51-1)
Assigned To: unassigned
QA Contact: wifi-bugs
Reported: 2008-12-20 20:23 UTC by Vincent Lefevre
Modified: 2010-05-30 17:34 UTC
CC List: 9 users


Description Vincent Lefevre (reporter) 2008-12-20 20:23:41 UTC
SOFTWARE VERSION:
5.2008.43-7

STEPS TO REPRODUCE THE PROBLEM:
1. Start a wifi connection.
2. Wait.
3. From a remote machine, run "ping n810" (or try another kind of connection,
e.g. ssh).

EXPECTED OUTCOME:
The N810 should respond to ping.

ACTUAL OUTCOME:
Although the WiFi connection is still up, I get the following error messages:
PING n810.vinc17.org (192.168.0.7): 56 data bytes
ping: sendto: No route to host
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: No route to host
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
[...]

After starting the web browser with a connection to some site:
[...]
ping: sendto: Host is down
ping: sendto: No route to host
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
64 bytes from 192.168.0.7: icmp_seq=87 ttl=64 time=41.024 ms
64 bytes from 192.168.0.7: icmp_seq=88 ttl=64 time=77.591 ms
64 bytes from 192.168.0.7: icmp_seq=89 ttl=64 time=303.950 ms
[...]

REPRODUCIBILITY:
sometimes

EXTRA SOFTWARE INSTALLED:

OTHER COMMENTS:
I think this problem started to appear with Diablo. IIRC, I didn't notice it
with Chinook.
Comment 1 Ryan Abel maemo.org 2008-12-21 03:47:27 UTC
I'm quite certain this is WONTFIX; there's really no reasonable workaround.

If it really bothers you, turn off WiFi PSM (but expect idle battery life on
WiFi to be measured in hours instead of days).[1] Your options are good
battery life and poor response rates (as the tablet won't respond while the
radio is off), or poor battery life and good response rates. See the wiki page
for details on workarounds.

[1]http://wiki.maemo.org/Wifi_Power_Saving_Mode_(PSM)
Comment 2 Ryan Abel maemo.org 2008-12-21 03:48:28 UTC
Revising summary and setting severity.
Comment 3 Vincent Lefevre (reporter) 2008-12-22 01:53:08 UTC
(In reply to comment #1)
> If it really bothers you, turn off WiFi PSM

I don't understand. PSM is active *all the time* on my N810, while the problem
I describe occurs *sometimes*.

Has anything changed between Chinook and Diablo?
Comment 4 Ryan Abel maemo.org 2008-12-22 03:10:33 UTC
(In reply to comment #3)
> (In reply to comment #1)
> > If it really bothers you, turn off WiFi PSM
> 
> I don't understand. PSM is active *all the time* on my N810, while the problem
> I describe occurs *sometimes*.
> 

WiFi PSM turns the radio off when it's not in use (100 ms timeout), so if you
don't have any network activity on the tablet, it won't be listening.

If you don't believe me, let's be scientific about it. :) Turn off PSM as
described in the wiki page then tell me how often ping works.
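Ryan's suggestion to measure how often ping works can be made concrete by tallying a captured ping log. The sketch below is a hypothetical helper (`ping_success_rate` is not part of any Maemo tool); it assumes the BSD-style output shown in the description, where each failed send prints a `ping: sendto: ...` line and each reply prints `64 bytes from ...`. On systems whose ping prints nothing for a lost reply, compare the reply count against the transmitted count from ping's summary instead.

```python
import re

# Replies look like: "64 bytes from 192.168.0.7: icmp_seq=87 ttl=64 time=41.024 ms"
REPLY = re.compile(r"^\d+ bytes from ")
# Send failures look like: "ping: sendto: Host is down" or "... No route to host"
FAILURE = re.compile(r"^ping: sendto: ")


def ping_success_rate(lines):
    """Return (replies, failures, success_ratio) for a captured ping log.

    Lines matching neither pattern (the PING header, "[...]" markers) are ignored.
    """
    replies = failures = 0
    for line in lines:
        line = line.strip()
        if REPLY.match(line):
            replies += 1
        elif FAILURE.match(line):
            failures += 1
    total = replies + failures
    ratio = replies / total if total else 0.0
    return replies, failures, ratio


if __name__ == "__main__":
    sample = [
        "PING n810.vinc17.org (192.168.0.7): 56 data bytes",
        "ping: sendto: Host is down",
        "ping: sendto: No route to host",
        "64 bytes from 192.168.0.7: icmp_seq=87 ttl=64 time=41.024 ms",
        "64 bytes from 192.168.0.7: icmp_seq=88 ttl=64 time=77.591 ms",
    ]
    print(ping_success_rate(sample))
```

Running it on logs captured with PSM on and with PSM off gives the before/after comparison Ryan asks for.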
Comment 5 Vincent Lefevre (reporter) 2008-12-22 03:34:49 UTC
(In reply to comment #4)
> WiFi PSM turns the radio off when it's not in use (100 ms timeout), so if you
> don't have any network activity on the tablet, it won't be listening.

This is not what I observe: with PSM enabled (and the charger cable unplugged),
my N810 still responds to pings, even though there was no network activity
before the pings (according to netstat). So this does not explain why
sometimes all pings work and sometimes none of them do.

BTW, from what I've heard, the PS-Poll protocol affects latency, but one
should still be able to connect to the device.
Comment 6 tz 2008-12-23 00:18:56 UTC
I think the inactivity uses one of the other timers: it doesn't just turn off
completely after 100 ms; it seems to take a minute or two of complete network
inactivity to lose pings.

Also note that ping on the tablet requires root privileges (I've added an entry
to my /etc/sudoers.d to allow ping from a user account).

I work around this by having the process monitor status bar applet include a
command that generates network activity, so if the radio shuts down, I can just
ping something to turn it back on.

But an interesting question is: if the network is idle anyway (no sends or
receives), how much power does it take just to leave the radio on all the time?

A possible alternative would be to turn it on for one or two seconds every
15-60 seconds so that a ping or other packet would be received.
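The trade-off in that proposed alternative can be put into rough numbers. A back-of-the-envelope sketch, assuming the radio is fully on during each listening window and fully off otherwise, and ignoring wake-up and reassociation time; `ListenWindow` is a hypothetical name, not an existing driver setting:

```python
from dataclasses import dataclass


@dataclass
class ListenWindow:
    on_s: float      # seconds the radio stays awake in each cycle
    period_s: float  # seconds between the starts of consecutive wake-ups

    def duty_cycle(self) -> float:
        """Fraction of time the radio is awake (and so can receive a ping)."""
        if not 0 < self.on_s <= self.period_s:
            raise ValueError("need 0 < on_s <= period_s")
        return self.on_s / self.period_s

    def worst_case_wait_s(self) -> float:
        """Longest wait for a packet that arrives just after a window closes."""
        return self.period_s - self.on_s


if __name__ == "__main__":
    for on, period in [(1, 15), (2, 15), (1, 60), (2, 60)]:
        w = ListenWindow(on, period)
        print(f"on {on} s every {period} s: awake {w.duty_cycle():.1%}, "
              f"worst-case wait {w.worst_case_wait_s():.0f} s")
```

Across the suggested range the radio would be awake roughly 1.7% to 13.3% of the time, at the cost of worst-case waits between 13 and 59 seconds for an incoming packet.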
Comment 7 Andre Klapper maemo.org 2008-12-23 01:26:31 UTC
So what is a good way to reproduce this reliably, step by step?
I've never seen it myself.
Comment 8 Lucas Maneos 2008-12-23 07:51:00 UTC
PSM, when working correctly on both ends, may introduce latency but shouldn't
cause frames to be lost.  Kalle Valo has posted
(http://lists.maemo.org/pipermail/maemo-users/2008-July/034755.html) some more
details on how it works.

I've seen this myself (very occasionally, and only in Diablo IIRC) but can't
say for certain whether it's the tablet's fault (it could be the AP or external
RF interference, for example). When this happens here, ARP fails (the ping
output above looks like that too), which, if there actually is a bug, may point
to the broadcast TIM case. FWIW my AP is a Buffalo WBR-G54 running OpenWrt 0.9
(whiterussian).

I can't reproduce on demand, but if there's any data I can get from either end
next time it happens I'll post it here.
Comment 9 tz 2008-12-23 16:40:16 UTC
Leave the browser closed and turn off RSS feeds (or their updates), etc., so
that nothing will wake the WiFi in power save mode.

Leave xterm up.

SSH into the tablet over WiFi (if you can't, try pinging something from
xterm). Run ls, cd, or some other command.

Wait 5-10 minutes.

Hit return on SSH.  Nothing happens.  Hit return a few more times.

Do a ping from xterm and the multiple returns will be handled.

Wait/repeat.
Comment 10 Vincent Lefevre (reporter) 2008-12-24 05:18:34 UTC
(In reply to comment #9)
> Hit return on SSH.  Nothing happens.  Hit return a few more times.

Here, after SSH'ing to the tablet and waiting for 5-10 minutes or more, I can
still hit return and it is taken into account immediately. But this is probably
because I use:

ServerAliveInterval 300

in my .ssh/config file. This means that when an SSH session to the tablet is
active, the tablet will never turn the radio off permanently. And though the
radio hasn't been turned off, the battery icon still shows a full battery after
5 hours. This means that standard PSM (the PS-Poll protocol) is sufficient for
good battery life, and turning the radio off permanently probably doesn't
improve it much.
Comment 11 Vincent Lefevre (reporter) 2008-12-26 17:02:07 UTC
Also, I observed the same problem with the power cable plugged in. So, if
there's any drawback due to PSM, there should also be an option to disable it
automatically in such a case (but from my comment #10, this may not be
necessary).
Comment 12 Kalle Valo nokia 2009-01-02 15:47:48 UTC
Few comments:

Lucas is correct: power save mode increases latency, but there should not be
any packet loss. Well, with multicast transmission, packet loss is possible in
very special situations (lots of collisions, etc.), but the same is true for
wired networks, so it doesn't make a big difference.

Also, power save problems are _very_ AP specific; even different firmware and
hardware revisions can behave differently. So there is not much use in testing
with totally different APs. I usually recommend filing a separate bug for each
AP having power save problems.

And power save problems are almost always caused by bugs in the AP, not in the
tablet. I also recommend testing with other client devices that make heavy use
of IEEE 802.11 power save features.
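Kalle's point that power save costs latency rather than packets can be put into rough numbers. In 802.11 power save, the AP buffers frames for a dozing station and announces them in the TIM of its beacons; buffered broadcast and multicast frames (such as the ARP requests mentioned in comment 8) are released only after DTIM beacons. A minimal sketch, assuming the station wakes for every `listen_interval`-th beacon and for every DTIM beacon, and ignoring the PS-Poll exchange and contention; the defaults come from the "bcn 100 msec, DTIM 3" association message reported by the driver in comment 14, read as 100 time units (TU) of 1024 microseconds each. The function name is hypothetical.

```python
TU_SECONDS = 1024e-6  # one 802.11 time unit (TU) is 1024 microseconds


def worst_case_buffering_s(beacon_interval_tu: int = 100,
                           dtim_period: int = 3,
                           listen_interval: int = 1) -> dict:
    """Rough upper bounds on how long the AP holds frames for a dozing station.

    Assumes the station wakes for every `listen_interval`-th beacon and for
    every DTIM beacon; PS-Poll and contention delays are ignored.
    """
    beacon_s = beacon_interval_tu * TU_SECONDS
    return {
        # Unicast: announced in the TIM, fetched the next time the station wakes.
        "unicast": listen_interval * beacon_s,
        # Broadcast/multicast (e.g. ARP requests): released only after a DTIM beacon.
        "broadcast": dtim_period * beacon_s,
    }


if __name__ == "__main__":
    for kind, delay in worst_case_buffering_s().items():
        print(f"{kind}: up to {delay * 1000:.1f} ms")
```

With those values the bounds come out to about 102 ms for unicast and 307 ms for broadcast, which fits the "latency, not loss" behaviour Kalle describes for a correctly working AP and tablet.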
Comment 13 Vincent Lefevre (reporter) 2009-01-02 20:43:48 UTC
I could reproduce the bug (several times) with:
  * Netopia Cayman 3347WEU, firmware 7.5.0 (i.e., the latest for this model).
  * D-Link DSL-G624T (original firmware).
Comment 14 Lucas Maneos 2009-02-05 16:09:51 UTC
I saw it again just now (not sure exactly when the problem occurred, as I had
left the N810 alone overnight). An N800 connected to the same AP was fine.

I don't know if they're helpful but these are the cx3110x kernel messages on
the N810 since the last association:

[42412.445312] cx3110x: associated to 00:07:40:xx:xx:xx (bcn 100 msec, DTIM 3).
[42412.453125] cx3110x: PSM disabled.
[42413.539062] cx3110x: PSM dynamic with 100 ms CAM timeout.
[53519.539062] cx3110x: WARNING chip requested DMA rx transfer of a zero length frame
[58344.046875] cx3110x: WARNING prism_softmac_frame_tx_done() returned an empty frame.
[61708.921875] cx3110x: PSM dynamic with 200 ms CAM timeout.

(I wasn't running syslogd at the time, sorry)

At that point disconnecting and reconnecting fixed things.
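When collecting data for a report like this, the cx3110x association and PSM messages can be extracted from a dmesg capture mechanically. The sketch below assumes the message formats exactly as they appear in the log above; `parse_cx3110x` is a hypothetical helper, and any other driver message (such as the WARNING lines) is simply skipped.

```python
import re

# Formats copied from the cx3110x messages quoted in this comment.
ASSOC = re.compile(
    r"^\[\s*(?P<t>\d+\.\d+)\] cx3110x: associated to (?P<bssid>\S+) "
    r"\(bcn (?P<bcn>\d+) msec, DTIM (?P<dtim>\d+)\)\.$")
PSM_DISABLED = re.compile(r"^\[\s*(?P<t>\d+\.\d+)\] cx3110x: PSM disabled\.$")
PSM_DYNAMIC = re.compile(
    r"^\[\s*(?P<t>\d+\.\d+)\] cx3110x: PSM dynamic with (?P<ms>\d+) ms CAM timeout\.$")


def parse_cx3110x(lines):
    """Yield (timestamp_s, event, details) tuples for association and PSM messages."""
    for raw in lines:
        line = raw.strip()
        if m := ASSOC.match(line):
            yield (float(m["t"]), "associated",
                   {"bssid": m["bssid"], "beacon_ms": int(m["bcn"]),
                    "dtim": int(m["dtim"])})
        elif m := PSM_DISABLED.match(line):
            yield float(m["t"]), "psm_disabled", {}
        elif m := PSM_DYNAMIC.match(line):
            yield float(m["t"]), "psm_dynamic", {"cam_timeout_ms": int(m["ms"])}


if __name__ == "__main__":
    log = [
        "[42412.445312] cx3110x: associated to 00:07:40:xx:xx:xx (bcn 100 msec, DTIM 3).",
        "[42412.453125] cx3110x: PSM disabled.",
        "[42413.539062] cx3110x: PSM dynamic with 100 ms CAM timeout.",
        "[61708.921875] cx3110x: PSM dynamic with 200 ms CAM timeout.",
    ]
    for event in parse_cx3110x(log):
        print(event)
```

On the capture above this reports the CAM timeout changing from 100 ms to 200 ms, the figure Klaus gives in comment 18. Requires Python 3.8 or later for the `:=` operator.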
Comment 15 Martin Runge 2009-03-04 22:50:48 UTC
Hi .*,

I observed the same problems with an N810 running Diablo with default settings.

Rebooting the device fixes the problem. It looks like accessing the device from
outside works as long as it is in range of the access point (of course), but
shows the described problems, though not every time, after it has been out of
range and come back into range with the connection re-established
automatically.

I just rebooted it with a nearly empty battery (< 25%) and the problem was gone.
I have observed the problem with a full battery in the past, so I guess it's
not a power-saving issue.
Comment 16 Andre Klapper maemo.org 2009-03-24 14:52:51 UTC
As far as I have been told by Nokia, only major and critical network bugs might
be fixed for Diablo, so this might become a WONTFIX.
Looking forward to the Fremantle beta and its code changes and updates with
regard to this issue.

(Confirming as per several comments here.)
Comment 17 Vincent Lefevre (reporter) 2009-03-26 02:50:41 UTC
The fact that the radio suddenly goes idle *during* an rsync (as has just
happened) looks like a major bug to me.
Comment 18 Klaus Anderson nokia 2009-03-26 17:11:23 UTC
(In reply to comment #17)
> The fact that the radio suddenly goes idle *during* an rsync (as has just
> happened) looks like a major bug to me.

The timeout for going idle (or, more precisely, into power save mode) is
200 ms, i.e. a 200 ms gap in outgoing traffic is enough.

BTW, I also added a bit more explanation and reference to Kalle's mail to the
wiki page mentioned in comment #1.
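Klaus's explanation implies a simple check: given the send times of outgoing frames, every gap longer than the CAM timeout is a point where the chip may drop into power save, even in the middle of a transfer. A minimal sketch, assuming timestamps in seconds (for example from a packet capture taken on the tablet) and the 200 ms value above; `psm_entry_gaps` is a hypothetical helper that models only the timer, not the AP's buffering.

```python
from typing import Iterable, List, Tuple


def psm_entry_gaps(tx_times: Iterable[float],
                   cam_timeout_s: float = 0.200) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of outgoing-traffic gaps longer than the CAM timeout.

    Under the dynamic PSM described in comment 18, the radio would leave
    constantly-awake mode at roughly start + cam_timeout_s in each gap.
    The idle period after the last send is not reported.
    """
    times = sorted(tx_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > cam_timeout_s]


if __name__ == "__main__":
    # Hypothetical send times: a burst, a 400 ms pause, another burst.
    sends = [0.00, 0.05, 0.10, 0.50, 0.55]
    print(psm_entry_gaps(sends))
```

Here the single 400 ms pause is reported as the only gap in which power save would kick in.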
Comment 19 Vincent Lefevre (reporter) 2009-03-26 19:14:12 UTC
(In reply to comment #18)
> the timeout for going into idle (or actually power save mode) is 200ms, i.e.
> it's enough if there is a 200ms gap in outgoing traffic.

That's strange because this would mean that the wifi problem would occur very
often. However most of the time I can connect to my N810 (the problem occurred
much more frequently in the past, and I don't know what has changed).
Comment 20 Klaus Anderson nokia 2009-03-27 07:33:47 UTC
(In reply to comment #19)
> That's strange because this would mean that the wifi problem would occur very
> often. However most of the time I can connect to my N810 (the problem occurred
> much more frequently in the past, and I don't know what has changed).

As stated on the wiki page, PSM problems are typically random in nature, i.e.
if you have a buggy AP (or the tablet end is buggy), it might work perfectly
fine for some (even long) time before failing. And even when it fails, some
outgoing data often resets the situation and it starts working again.

Have you tried adjusting the power save values as suggested on the wiki page?
Does the problem go away if you disable it completely (this will drain your
battery quite fast though) or does it become less frequent if you use the
intermediate setting?
Comment 21 Vincent Lefevre (reporter) 2009-04-16 14:16:59 UTC
(In reply to comment #20)
> Have you tried adjusting the power save values as suggested on the wiki page?
> Does the problem go away if you disable it completely (this will drain your
> battery quite fast though) or does it become less frequent if you use the
> intermediate setting?

I still got the problem after increasing the value. I don't know whether it
became less frequent. In both cases the problem is quite rare, and without
logs it's difficult to say. I don't want to disable it completely because it
can take a long time to reproduce the problem and I don't want to drain my
battery. BTW, there should be an option to disable it automatically when the
power cable is plugged in.
Comment 22 Andre Klapper maemo.org 2009-09-28 19:14:27 UTC
This is unfortunately a WONTFIX for Maemo4, as Maemo4 is in maintenance mode and
Nokia will only provide bugfixes for critical issues, if at all. (For your
interest, the Mer project aims to provide a community backport of Maemo5 for
N8x0 devices. See http://wiki.maemo.org/Mer for more information.)

What interests me is if somebody can still reproduce this in Maemo5 with an
N900, once it is out.
Please feel encouraged to update this bug report by adding a comment and
removing the "moreinfo" keyword if it is still an issue in Maemo5.
Thanks!
Comment 23 Vincent Lefevre (reporter) 2009-09-28 20:19:20 UTC
FYI, this bug almost never occurred in the last few months, even though the
conditions seem to be identical (same router, same place...).

I pre-ordered an N900, so I can tell you if I notice the same problem.
Comment 24 tz 2009-09-28 20:20:49 UTC
When I get an N900 (maybe you can push PUSH to get me one, as I have several
ideas submitted), I need to test this, plus the BT pings killing WiFi, and some
GPS stuff. There isn't a category, but it would be useful to somehow tag
things that need to be "Verified if they still exist or not on the n900", then
have people go verify them. I plan on revisiting the list, but that won't
happen until I have a device.
Comment 25 Andre Klapper maemo.org 2009-09-28 20:26:23 UTC
(In reply to comment #24)
> When I get an n900 (maybe you can push PUSH to get me one as I have several
> ideas submitted)

Unfortunately I'm not in a position to do this; maybe Quim.
Entirely up to Nokia... :-/

> There isn't a category, but it would be useful to somehow tag
> things that need to be "Verified if they still exist or not on the n900".  

From time to time I crawl through the open tickets (like today) and ping
people. In general, everything that does not have the Version field set to 5.0*
needs retesting (hehe), so no new tag is needed, but you're free to use the
Status Whiteboard entry with some namespace, e.g. "tz[n900retest]", if you
really want to. I'd say the query.cgi options are sufficient, though (e.g.
search for tickets with the reporter set to a certain email address).
Comment 26 Donn Morrison 2009-10-23 11:58:37 UTC
(In reply to comment #22)
> What interests me is if somebody can still reproduce this in Maemo5 with an
> N900, once it is out.
> Please feel encouraged to update this bug report by adding a comment and
> removing the "moreinfo" keyword if it is still an issue in Maemo5.

I couldn't remove the moreinfo keyword. I've noticed this with the N900. ssh
eventually stops working until some network activity is initiated from the
device. For example, while doing some development on the N900 over ssh, I had
to open a separate ssh session to my machine from the N900 and keep top running
in order to have a reliable ssh development session.

As far as I can tell, no packets are actually lost. Keypresses just have really
long latency (30 seconds or more), depending on when network activity is
initiated from the N900.

I'm using a Linksys WRT54GL router running dd-wrt.
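Donn's workaround (keeping a session open from the N900 so that the device itself generates traffic) and tz's "ping something" trick in comment 6 both rely on the device producing outgoing frames. A minimal stand-in, assuming that any small outgoing datagram resets the CAM timer described in comment 18; `send_keepalives` is a hypothetical helper, and on a real device the destination would be some host on your own LAN (nothing needs to answer). The `__main__` part only demonstrates it over loopback.

```python
import socket
import time


def send_keepalives(host: str, port: int, interval_s: float, count: int) -> int:
    """Send `count` one-byte UDP datagrams to (host, port), `interval_s` apart.

    The datagrams carry no meaning and no reply is expected; the point is only
    to make the device transmit periodically. Returns the number sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(count):
            sock.sendto(b"\0", (host, port))
            sent += 1
            if i + 1 < count:
                time.sleep(interval_s)
    return sent


if __name__ == "__main__":
    # Loopback demo: a local receiver stands in for a host on the LAN.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx:
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(1.0)
        port = rx.getsockname()[1]
        n = send_keepalives("127.0.0.1", port, interval_s=0.01, count=3)
        received = [rx.recvfrom(16)[0] for _ in range(n)]
        print(n, received)
```

The interval is the design choice: sends spaced closer than the CAM timeout would keep the radio awake continuously and cost battery life, as comment 1 warns, while sends a few seconds apart roughly mimic the periodic output of top in the setup above.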
Comment 27 Andre Klapper maemo.org 2010-01-14 12:32:18 UTC
Today Nokia released the Maemo5 update version 2.2009.51-1 for public (also
called "PR1.1" sometimes).
If you have some time we kindly ask you to test again if the problem reported
here still happens in this new version - just leave a comment (and feel free to
update the "Version" field to the new version if it's still a problem).
Comment 28 Vincent Lefevre (reporter) 2010-01-15 01:22:36 UTC
I've done some tests for several days with my N900, and haven't noticed this
problem. But it may be too soon to have a definitive answer...
Comment 29 Andre Klapper maemo.org 2010-01-19 19:56:07 UTC
Vincent, thanks for retesting (and great that it seems to work now for you). 

Maybe the other testers here can also retest this with 2.2009.51-1?
-> moreinfo
Comment 30 Andre Klapper maemo.org 2010-04-23 17:41:31 UTC
Closing as WORKSFORME for the time being as Vincent cannot reproduce it
anymore.
If anybody else still sees this with 3.2010.02-8 or later, please reopen.
Comment 31 giorgio 2010-05-30 17:22:35 UTC
I can't download with the application manager ("gestore applications"), but the
connection is good... I can browse the web and download images, sounds, and
videos, but not programs. Can you help me with my N900?
Comment 32 Andre Klapper maemo.org 2010-05-30 17:34:14 UTC
(In reply to comment #31)
> i can't download by gestore applications

Sounds unrelated to this bug report.
Please ask for support in http://talk.maemo.org instead.