Bug 935 (int-128242)

Summary: RSS reader misbehavior on HTTP redirects (RFC 2616 violation)
Product: [Maemo Official Applications] RSS feed reader
Reporter: Rhys Ulerich <rhys.ulerich>
Component: General
Assignee: unassigned <nobody>
Status: NEW
QA Contact: rss-feed-reader-bugs
Severity: normal
Priority: Low
CC: andre_klapper, eero.tamminen, maemo, quim.gil, rhys.ulerich
Version: 5.0-beta2
Keywords: enhancement-it2006, ITOS2007HE-garage
Target Milestone: ---
Hardware: All
OS: Maemo
URL: http://tinyurl.com/57so4h

Description Rhys Ulerich (reporter) 2007-01-06 18:59:29 UTC
Taken from a thread on the maemo-users mailing list (referenced in the URL above).

--8<--
An interesting thing I have noticed with the RSS reader is that it
both follows HTTP 3xx redirect responses (rock!) AND saves the newly
received HTTP URL for the feed (not so rock...).

The latter behavior will silently destroy a whole folder of RSS feeds if--
a) you're using free wifi in a cafe, and
b) said wifi requires you to login through an HTTP proxy, and
c) the dumb user (that's me!) doesn't login prior to refreshing the feeds.

I couldn't figure out why none of my feeds would reload, investigated,
and found they all point to http://wifi-texas.com/login/78705SH/ after
I got some coffee.

As a suggestion, I think this update-URL-on-3xx behavior could be
improved by having the RSS reader save the new URL if and only if the
new URL contains valid feed content. 
--8<--
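The "save the new URL only if it contains valid feed content" suggestion could look roughly like the following minimal Python sketch. The function names `looks_like_feed` and `url_to_store` are hypothetical, not part of osso-rss-feed-reader, and "valid feed" is approximated here by checking that the body is well-formed XML whose root element is `rss`, `RDF` (RSS 1.0) or `feed` (Atom); a real reader would more likely reuse its existing feed parser.

```python
import xml.etree.ElementTree as ET

# Root element local names treated as "a feed" in this sketch (an assumption):
# RSS 0.9x/2.0, RSS 1.0 (RDF) and Atom.
FEED_ROOTS = {"rss", "RDF", "feed"}


def looks_like_feed(body: bytes) -> bool:
    """Return True if body parses as XML and its root element looks like a feed."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not well-formed XML, e.g. a typical HTML login page
    # Drop any "{namespace}" prefix, e.g. "{http://www.w3.org/2005/Atom}feed".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS


def url_to_store(subscribed_url: str, final_url: str, body: bytes) -> str:
    """Keep the subscribed URL unless the redirect target actually served a feed."""
    return final_url if looks_like_feed(body) else subscribed_url
```

With this check, a captive-portal page such as http://wifi-texas.com/login/78705SH/ fails validation (it is HTML, not a feed), so the subscribed URL is kept.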


To which Andrew Flegg replied:
--8<--
On 1/6/07, Rhys Ulerich <rhys.ulerich[at]gmail.com> wrote:
>
> An interesting thing I have noticed with the RSS reader is that it
> both follows HTTP 3xx redirect responses (rock!) AND saves the newly
> received HTTP URL for the feed (not so rock...).
>
[snip]
> As a suggestion, I think this update-URL-on-3xx behavior could be
> improved by having the RSS reader save the new URL if and only if the
> new URL contains valid feed content.

It's a nasty misfeature that. I'd raise it on http://bugs.maemo.org/
as a serious bug: 302 means "Moved Temporarily", RFC 2616[1]
*specifically* says user agents (such as the RSS reader) should not
store the resulting URL.

301 means "moved permanently" and so the new URL to the feed could be
saved, although it would still be worth the sanity check you suggest.

If the wifi provider was using 301 rather than 302 to redirect to the
login page, *they're* the ones mis-reading the specs, so the
enhancement you suggest would be useful there.

Cheers,

Andrew

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.3 
--8<--
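Andrew's reading of RFC 2616 can be expressed as a small rule over the chain of redirects the reader followed: only an unbroken run of 301 responses starting at the subscribed URL counts as permanent, and the first temporary redirect (302, 303 or 307) stops the walk. This is a hedged Python sketch; `permanent_target` and the `(status, Location)` hop representation are assumptions for illustration. `urljoin` is used because real servers sometimes send relative Location values even though RFC 2616 asks for an absolute URI.

```python
from typing import List, Tuple
from urllib.parse import urljoin

# One followed redirect: (HTTP status code, Location header value), in order.
Hop = Tuple[int, str]


def permanent_target(subscribed_url: str, hops: List[Hop]) -> str:
    """Return the URL a client may remember after following the given redirects.

    Each leading 301 (Moved Permanently) moves the remembered URL forward.
    Any other status (302, 303, 307, ...) stops the walk, because for
    temporary redirects the client should keep using the Request-URI.
    """
    current = subscribed_url
    for status, location in hops:
        if status != 301:
            break
        current = urljoin(current, location)  # also resolves relative Locations
    return current
```

For the cafe case, a single 302 to the login page leaves the subscribed URL unchanged; for a TinyURL-style 301 the new location becomes eligible for saving, ideally still after the content sanity check suggested in the original report.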

To which Nicola Larosa replied:

--8<--
Andrew Flegg wrote:
> If the wifi provider was using 301 rather than 302 to redirect to the
> login page, *they're* the ones mis-reading the specs, so the
> enhancement you suggest would be useful there.

The practical situation with 3xx HTTP response codes is a mess, for historical
reasons:

Redirect in response to POST transaction
http://ppewww.physics.gla.ac.uk/~flavell/www/post-redirect.html
--8<--
Comment 2 Andre Klapper maemo.org 2008-10-20 16:03:07 UTC
Rhys, can you please provide an example URL that will redirect?
Comment 3 Rhys Ulerich (reporter) 2008-10-20 16:53:24 UTC
(In reply to comment #2)
> Rhys, can you please provide an example URL that will redirect?
> 

I ran into the problem in a coffee shop that redirected all HTTP traffic to an
intro/login page.  A similar proxy setup should give you the redirect you need.

(Easier) You could set up two tinyurl.com redirects (one to a valid RSS feed,
and one to a regular web page) and use that as a test harness.  I don't know
whether tinyurl.com uses 301 or 302 redirects, but it should not matter for the
"sanity check" feature mentioned in the original bug.

Hope that helps,
Rhys
Comment 4 Andre Klapper maemo.org 2008-10-21 13:39:31 UTC
(In reply to comment #3)
> (Easier) You could set up two tinyurl.com redirects (one to a valid RSS feed,
> and one to a regular web page) and use that as a test harness.  I don't know
> whether tinyurl.com uses 301 or 302 redirects, but it should not matter for the
> "sanity check" feature mentioned in the original bug.

I created a tinyurl for my blog and added the tinyurl to the RSS news reader.
The icon shown for my blog in the left pane is the tinyurl icon.
The address stored and displayed in the User Interface is the tinyurl.

more /home/user/.osso_rss_feed_reader/feedlist.opml :
    <outline text="andre klapper's blog." title="andre klapper's blog."
description="andre klapper's blog." type="rss"
htmlUrl="http://blogs.gnome.org/aklapper" xmlUrl="http://tinyurl.com/57so4h"
updateInterval="-1" id="xjdjqeq" lastPollTime="1224585228" sortColumn="time"/>

So what are your htmlUrl and xmlUrl values?
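To compare htmlUrl and xmlUrl values without reading the raw OPML by eye, a short script could list them, as in this minimal Python sketch. `feed_urls` is a hypothetical helper; the file path is the one quoted in this report, the script assumes the file is well-formed OPML, and it can equally be run on a desktop copy of the file.

```python
import os
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union


def feed_urls(opml: Union[str, bytes]) -> List[Tuple[str, str, str]]:
    """Return (title, htmlUrl, xmlUrl) for every <outline> that has an xmlUrl."""
    root = ET.fromstring(opml)
    rows = []
    for outline in root.iter("outline"):
        xml_url = outline.get("xmlUrl")
        if xml_url is None:
            continue  # e.g. folder outlines, which carry no feed URL
        rows.append((outline.get("title", ""), outline.get("htmlUrl", ""), xml_url))
    return rows


if __name__ == "__main__":
    # Location quoted in this report; adjust the path for a copied file.
    path = "/home/user/.osso_rss_feed_reader/feedlist.opml"
    if os.path.exists(path):
        with open(path, "rb") as f:
            for title, html_url, xml_url in feed_urls(f.read()):
                print("%s\n  htmlUrl=%s\n  xmlUrl=%s" % (title, html_url, xml_url))
```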
Comment 5 Rhys Ulerich (reporter) 2008-10-21 18:22:03 UTC
I don't have my 770 available to check my htmlUrl and xmlUrl values at the
moment-- here's the sleuthing I can do with the Firefox Live HTTP Headers
add-on when using the URL http://tinyurl.com/57so4h:

GET /57so4h HTTP/1.1
Host: tinyurl.com
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.3) Gecko/2008092510 Ubuntu/8.04 (hardy) Firefox/3.0.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 301 Moved Permanently
Location: http://blogs.gnome.org/aklapper/feed/
Content-Type: text/html
Content-Length: 0
Date: Tue, 21 Oct 2008 14:56:57 GMT
Server: TinyURL/1.6

Had TinyURL sent a 302 Moved Temporarily response, the xmlUrl value and favicon
you indicate would be correct.

However, TinyURL is sending a 301 Moved Permanently response.  From an RFC
purist perspective, your xmlUrl value should be
"http://blogs.gnome.org/aklapper/feed/".  Having a tinyurl.com xmlUrl after a
301 response seems, to me, a bug.  The htmlUrl appears to be from the <link/>
tag in the XML response from the final GET to
http://blogs.gnome.org/aklapper/feed/; the value
"http://blogs.gnome.org/aklapper" you report seems correct to me.  I am unsure
how the newsreader is constructing the URL to obtain the favicon, but
displaying a TinyURL favicon after a 301 also looks like a bug.

Others in this thread have pointed out that 301s and 302s tend to get mixed up
in practice.  The sanity check I originally suggested would ensure that, after
a 301 Moved Permanently response, the xmlUrl is only updated if a GET to the
301 Location returns valid RSS content.

After fixing the above two issues (xmlUrl and favicon incorrect after a 301), I
would add an additional piece of logic to ensure the xmlUrl and favicon are
only updated if the 301 Location URL gives you a valid RSS XML response.  You
could test this by pointing the newsreader to a TinyURL that redirects to some
static, non-RSS content.

I'm having a flashback to my RFC 3261 SIP specification days.  :)

Hope that helps,
Rhys
Comment 6 Andre Klapper maemo.org 2009-01-14 19:14:28 UTC
I don't expect any changes in Diablo to fix this (not a high priority).
No idea for Fremantle ...
Comment 7 Quim Gil nokia 2009-01-16 23:16:32 UTC
(In reply to comment #6)
> No idea for Fremantle ...

If you would be so kind as to provide a list of steps to reproduce, I can give
it a try.  I'm a bit lost with the current description.
Comment 8 timeless 2009-03-13 11:12:29 UTC
quim:
1. add a feed for http://tinyurl.com/57so4h
2. check the feed icon
3. open xterm
4. cat /home/user/.osso_rss_feed_reader/feedlist.opml

ideally for this case the icon should be from andre's blog

but more importantly, you should *not* see this:

htmlUrl="http://blogs.gnome.org/aklapper" xmlUrl="http://tinyurl.com/57so4h"

it should either have tinyurl in both places or blogs.gnome in both places. And
whichever it has should determine the favicon....
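One way to make "whichever it has should determine the favicon" concrete is to derive the icon location from the stored xmlUrl rather than from the last hop of the redirect chain. The sketch below assumes the conventional /favicon.ico path at the feed's host; how osso-rss-feed-reader actually locates favicons is not stated in this report, and `favicon_url` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def favicon_url(stored_xml_url: str) -> str:
    """Derive a favicon URL from the stored feed URL (assumes /favicon.ico at its host)."""
    parts = urlsplit(stored_xml_url)
    # Keep scheme and host (including any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/favicon.ico", "", ""))
```

If xmlUrl stays http://tinyurl.com/57so4h, this yields the TinyURL icon; if it is rewritten to the blogs.gnome.org feed, the icon follows it.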

That's actually only half of the bug [301]

The other half is if the server sends [302], in which case it should retain the
url you provided instead of the redirect.

for 302, use http://timeless.justdave.net/maemo/andre-rss.pl

In current testing w/ diablo, both give the same results.

The right behavior for the 302 case is to retain the .pl url in both places. As
for which favicon to show, dunno. For kicks, I'm trying to supply a favicon for
the .pl file, however I have no opinion as to whether it should be shown :).
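For testing without depending on tinyurl.com or the andre-rss.pl script, a local stand-in that serves both redirect types could look like this Python sketch, using only the standard library. The paths /301, /302 and /feed and the helper names are made up for illustration. The server binds to 127.0.0.1, so pointing a tablet at it would require binding to an address reachable from the device.

```python
import http.client
import http.server
import threading

# A tiny RSS 2.0 document served at /feed.
FEED = (b'<?xml version="1.0"?><rss version="2.0"><channel>'
        b'<title>redirect test</title><link>http://example.invalid/</link>'
        b'</channel></rss>')


class RedirectHandler(http.server.BaseHTTPRequestHandler):
    """/301 and /302 redirect to /feed with that status; /feed serves FEED."""

    def do_GET(self):
        if self.path in ("/301", "/302"):
            host, port = self.server.server_address[:2]
            self.send_response(int(self.path[1:]))
            # RFC 2616 specifies an absolute URI in the Location header.
            self.send_header("Location", "http://%s:%d/feed" % (host, port))
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/feed":
            self.send_response(200)
            self.send_header("Content-Type", "application/rss+xml")
            self.send_header("Content-Length", str(len(FEED)))
            self.end_headers()
            self.wfile.write(FEED)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging to stderr


def start_server(port=0):
    """Serve in a background thread on 127.0.0.1; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path):
    """GET path once, without following redirects; return (status, Location, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()
```

Following the expectations above, a reader subscribed to the /302 URL should keep that URL as its xmlUrl, while one subscribed to the /301 URL may store the /feed URL instead.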
Comment 9 Quim Gil nokia 2009-03-13 15:12:28 UTC
No special icon to be seen, just the normal orange RSS icon.

About the rest, see for yourself:

BusyBox v1.10.2 (Debian 3:1.10.2.legal-1osso16) built-in shell (ash)
Enter 'help' for a list of built-in commands.

~ $ cat /home/user/.osso_rss_feed_reader/feedlist.opml
<?xml version="1.0"?>
<opml version="1.0">
  <head>
    <title>Liferea Feed List Export</title>
  </head>
  <body>
    <outline text="BBC News | News Front Page | World Edition" title="BBC News
| News Front Page | World Edition" description="BBC News | News Front Page |
World Edition" type="rss"
htmlUrl="http://news.bbc.co.uk/go/rss/-/2/hi/default.stm"
xmlUrl="http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml"
updateInterval="15" id="stgtyat" sortColumn="time"/>
    <outline text="BBC Sport | Sport Homepage | World Edition" title="BBC Sport
| Sport Homepage | World Edition" description="BBC Sport | Sport Homepage |
World Edition" type="rss"
htmlUrl="http://news.bbc.co.uk/go/rss/-/sport2/hi/default.stm"
xmlUrl="http://newsrss.bbc.co.uk/rss/sportonline_world_edition/front_page/rss.xml"
updateInterval="15" id="puhblxn" sortColumn="time"/>
    <outline text="Internet Tablet News" title="Internet Tablet News"
description="Internet Tablet News" type="rss" htmlUrl="http://nokia.com/n800"
xmlUrl="http://tableteer.nokia.com/rss/internettabletnews.xml"
updateInterval="-1" id="rgxfucg" sortColumn="time"/>
    <outline text="andre klapper's blog." title="andre klapper's blog."
description="andre klapper's blog." type="rss"
htmlUrl="http://blogs.gnome.org/aklapper" xmlUrl="http://tinyurl.com/57so4h"
updateInterval="-1" id="jfxsgye" lastPollTime="1236949408" sortColumn="time"/>
  </body>
</opml>
~ $
Comment 10 Quim Gil nokia 2009-05-10 03:00:48 UTC
Was this output useful?
Comment 11 Quim Gil nokia 2009-05-18 23:06:33 UTC
Hey, was that useful? Keep in mind that the RSS Feed Reader team is investing
time in bugfixing right now. Please help me try to reproduce this bug. I'm your
robot. Just let me know what I need to do. Thanks!
Comment 12 Andre Klapper maemo.org 2009-06-04 17:11:37 UTC
moreinfo as per last comment
Comment 13 Lucas Maneos 2009-06-05 04:45:00 UTC
(In reply to comment #8)
> but more importantly, you should *not* see this:
> 
> htmlUrl="http://blogs.gnome.org/aklapper" xmlUrl="http://tinyurl.com/57so4h"
> 
> it should either have tinyurl in both places or blogs.gnome in both places. 

Actually I think those two are independent.  The htmlUrl value comes from the
link element found in the RSS document (not from the HTTP response Location:
header), and in any case the attribute is optional and could be omitted
altogether[1].  So it looks correct, and AFAICT osso-rss-feed-reader doesn't
use it for anything (at least in Diablo) anyway.

In the case of a 301 response the xmlUrl could be rewritten, but this is "only"
a SHOULD[2], so it is not strictly speaking a bug, and leaving it unmodified
avoids the original issue.

I guess rewriting the xmlUrl for 301 responses after successful validation of
the payload would be the most technically correct thing to do, but as it is
this could be considered FIXED sometime between Gregale and Diablo (the window
between the bug being opened and comment 4).

[1] <http://www.opml.org/spec2>:
> Optional attributes: description, htmlUrl, language, title, version. These 
> attributes are useful when presenting a list of subscriptions to a user, 
> except for version, they are all derived from information in the feed itself.
> 
> description is the top-level description element from the feed. htmlUrl is 
> the top-level link element.

[2] <http://www.ietf.org/rfc/rfc2616.txt>:
> 10.3.2 301 Moved Permanently
> 
>    The requested resource has been assigned a new permanent URI and any
>    future references to this resource SHOULD use one of the returned
>    URIs.  Clients with link editing capabilities ought to automatically
>    re-link references to the Request-URI to one or more of the new
>    references returned by the server, where possible. This response is
>    cacheable unless indicated otherwise.
Comment 14 Andre Klapper maemo.org 2009-07-14 17:48:02 UTC
Yay. Bugzilla changes tinyurl URLs automatically.

Hence:
So in both cases this should be "http   tinyurl com mbug935";
please add the missing "://", "." and "/" yourself.


[From comment 8]
> 1. add a feed for blogs.gnome.org/aklapp...
> 2. check the feed icon
> 3. open xterm
> 4. cat /home/user/.osso_rss_feed_reader/feedlist.opml
> 
> but more importantly, you should *not* see this:
> htmlUrl="http://blogs.gnome.org/aklapper" xmlUrl="blogs.gnome.org/aklapp..."

This does not happen when using the direct blogs.gnome.org address (NOT using
tinyurl) in Fremantle. Both values start with "http://".

Trying again in Fremantle by using http://tinyurl.com/mbug935 as feed URL:
* tinyurl icon is used in RSS reader
* htmlUrl="http://blogs.gnome.org/aklapper"
* xmlUrl="http://tinyurl.com/mbug935"

[From comment 5]
> Had TinyURL sent a 302 Moved Temporarily response, the xmlUrl value and 
> favicon you indicate would be correct.
> TinyURL is sending a 301 Moved Permanently response.
> From an RFC purist perspective, your xmlUrl value should be
> "http://blogs.gnome.org/aklapper/feed/".
> Having a tinyurl.com xmlUrl after a 301 response seems a bug.


> for 302, use http://timeless.justdave.net/maemo/andre-rss.pl
> In current testing w/ diablo, both give the same results.
> The right behavior for the 302 case is to retain the .pl url in both places. 

Here, cat /home/user/.osso_rss_feed_reader/feedlist.opml shows:
* htmlUrl="http://blogs.gnome.org/aklapper"
* xmlUrl="http://timeless.justdave.net/maemo/andre-rss.pl".

So this is still valid in Fremantle.
Comment 15 Lucas Maneos 2009-07-14 18:08:11 UTC
I don't think it's even valid in Diablo (see comment 13) - since xmlUrl is not
rewritten the user's feeds will not be destroyed by a misbehaving hotspot.