Bug 5524 (int-139791)

Summary: PulseAudio clients are muted and hang on exit when media player is playing music
Product: [Maemo Official Platform] Multimedia
Reporter: Mikko Vartiainen <mvartiainen>
Component: Pulseaudio
Assignee: unassigned <nobody>
Status: VERIFIED FIXED
QA Contact: pulseaudio-bugs
Severity: critical
Priority: Low
CC: andrew, andre_klapper, archebyte, davidfalkayn, eero.tamminen, jessi3k3, luarvique, maemo, milang, vdv100
Version: 5.0/(2.2009.51-1)
Keywords: use-time
Target Milestone: 5.0/(10.2010.19-1)
Hardware: N900
OS: Maemo
Attachments:
- Testcase for the bug
- Compiled test case for the bug
- The strace of VGB hung up inside the application menu, when it tried calling pa_simple_free().

Description Mikko Vartiainen (reporter) 2009-10-16 21:52:12 UTC
SOFTWARE VERSION:
1.2009.41-10

STEPS TO REPRODUCE THE PROBLEM:
1. Start playing music with media player.
2. Start some SDL app which uses audio (I tested with solarwolf,
tennix, vor, madbomber and ioquake3)
3. Try to exit the app

EXPECTED OUTCOME:
Program exits cleanly

ACTUAL OUTCOME:
Result is that SDL app hangs and it will not close until media player is
stopped. Opening and closing lens cover usually works

REPRODUCIBILITY:
always

OTHER COMMENTS:
Attached sdltest.c (sdltest is compiled binary) can be used as a test case too.
1. Start playing music with media player.
2. Start sdltest
3. sdltest should exit within about 1 second, but it hangs until music
is stopped (or killed with kill -9 )

It hangs somewhere in SDL deinitialization functions. Bug can be
avoided if all deinitialization functions are skipped, but it's not
supposed to be used that way.

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; fi; rv:1.9.1.3)
Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Comment 1 Mikko Vartiainen (reporter) 2009-10-16 21:54:25 UTC
Created an attachment (id=1443)
Testcase for the bug
Comment 2 Mikko Vartiainen (reporter) 2009-10-16 21:55:29 UTC
Created an attachment (id=1444)
Compiled test case for the bug
Comment 3 Andre Klapper maemo.org 2009-10-19 15:41:15 UTC
Internal comment:
"SDL is waiting for the mixer thread to finish and the mixer thread is waiting
for pulseaudio. This is not sdl specific, if you pacat file.wav while fmp is
playing it'll block too. This is most probably the policy"
Comment 4 Mikko Vartiainen (reporter) 2009-10-19 16:50:13 UTC
>This is most probably the policy

I'm not sure I understood this. Does it mean that it's supposed to work that
way? With any application affected by this, it easily looks as if the whole
phone is stuck, because it's not trivial for everybody to get out of the
frozen application.
Comment 5 Kaj-Michael Lang 2009-10-19 17:42:36 UTC
Sounds like the strange blocking I found with espeak when having media player
playing, and espeak is using pulseaudio driver. Isn't the whole idea with
pulseaudio that you can play multiple streams at the same time without others
getting blocked?
Comment 6 Mikko Vartiainen (reporter) 2009-10-19 17:51:57 UTC
removed libsdl from subject because it's not libsdl specific
Comment 7 Javier S. Pedro 2009-10-29 12:24:24 UTC
Does this also happen when a phone call is ongoing? I'm trying to guess if
https://garage.maemo.org/tracker/?func=detail&atid=3790&aid=4745&group_id=1014
is related to this or not.
Comment 8 Mikko Vartiainen (reporter) 2009-10-29 13:06:47 UTC
If drnoksnes is making some kind of sound deinitialization on "return to
launcher" game, I would say that it's the same problem. Test with playing music
while exiting the game.
Comment 9 Javier S. Pedro 2009-10-29 13:14:21 UTC
Yes, it's calling SDL_CloseAudio (fwiw, I had a lot of problems with that
function hanging even in Diablo, but those were due to weird pthread problems,
and mostly fixed by using static libgcc).

I am sure that it would hang while listening to music, but (rephrasing my
question) does CloseAudio() also hang while a phone call is active?
Comment 10 Mikko Vartiainen (reporter) 2009-10-30 21:19:17 UTC
Yes, this is the same issue; SDL_CloseAudio also hangs when a phone call is active.
Comment 11 luarvique 2009-10-31 15:03:12 UTC
I am also getting this problem in my applications, although I am not using SDL.
Only using PulseAudio "simple" API.
Comment 12 luarvique 2009-11-03 22:48:46 UTC
Just got bitten by this bug really hard:

To save CPU cycles, I do pa_simple_free() every time my main window loses the
focus. When the Media Player is playing music, pa_simple_free() will hang until
you quit Media Player.

And now the worst part:
When you open the application menu, the main window loses focus. It calls
pa_simple_free() and hangs the application with the app menu open. In such a
state, it is not possible to kill the application or switch to a different
application. The "application not responding" dialog appears *under* the
application window and cannot be clicked. The only way out of this state is by
rebooting the device.

I still have not found a workaround for this. Looks like a problem with
PulseAudio. Also worth noting that when the Media Player is playing, I cannot
get audio from my own application. It does pa_simple_new() but there is no sound
at all (I guess it should mix with Media Player sound). As has been said
before, the eventual pa_simple_free() simply hangs.
Comment 13 Mikko Vartiainen (reporter) 2009-11-03 23:22:38 UTC
That's the exact behaviour I've experienced. Opening and closing the lens cover
a couple of times usually helps (the camera app pauses the media player), but
not always.

I was also planning to close audio when focus is lost, but because it's so
unsafe I have disabled audio completely for some apps. This bug is going to
bite a hundred or more applications, given that there are over 100 games for
Diablo.
Comment 14 Eero Tamminen nokia 2009-11-04 14:57:10 UTC
(In reply to comment #12)
> And now the worst part:
> When you open the application menu, the main window loses focus. It calls
> pa_simple_free() and hangs the application with the app menu open. In such a
> state, it is not possible to kill the application or switch to a different
> application. The "application not responding" dialog appears *under* the
> application window and cannot be clicked. The only way out of this state is by
> rebooting the device.

Doesn't the Power menu "End task" work with this either?
Comment 15 luarvique 2009-11-04 15:04:58 UTC
(In reply to comment #14)
> Doesn't the Power menu "End task" work with this either?
No, it does not work.
Comment 16 luarvique 2009-11-04 15:14:32 UTC
All right, I got the PulseAudio 0.9.2 source code with Google CodeSearch and
did some research. Let us start with pa_simple_free() that hangs:

void pa_simple_free(pa_simple *s) {
    assert(s);
**  if (s->mainloop)
**      pa_threaded_mainloop_stop(s->mainloop);
    if (s->stream)
        pa_stream_unref(s->stream);
    if (s->context)
        pa_context_unref(s->context);
    if (s->mainloop)
        pa_threaded_mainloop_free(s->mainloop);
    pa_xfree(s);
}

I marked the suspicious code with the asterisks ("**"). Here is
pa_threaded_mainloop_stop():

void pa_threaded_mainloop_stop(pa_threaded_mainloop *m) {
    assert(m);
    if (!m->thread_running)
        return;
    assert(!in_worker(m));
    pthread_mutex_lock(&m->mutex);
**  pa_mainloop_quit(m->real_mainloop, 0);
    pthread_mutex_unlock(&m->mutex);
**  pthread_join(m->thread_id, NULL);
    m->thread_running = 0;
    return;
}

Again, my suspicion is that the thread never quits, so pthread_join() hangs.
So, let us check how pa_mainloop_quit() works:

void pa_mainloop_quit(pa_mainloop *m, int retval) {
    assert(m);
**  m->quit = 1;
    m->retval = retval;
**  pa_mainloop_wakeup(m);
}

void pa_mainloop_wakeup(pa_mainloop *m) {
    char c = 'W';
    assert(m);
    if (m->wakeup_pipe[1] >= 0)
        pa_write(m->wakeup_pipe[1], &c, sizeof(c));
}

Notice that the audio thread quits only when it gets to check m->quit. My guess
is that the audio thread *never* gets to checking m->quit. Instead, it hangs
waiting for some resource not available when the Media Player is playing. This
also explains why I am not hearing any of my own audio when the Media Player is
playing: the audio thread is stuck.

The next step would be to place some printf()s inside the above functions and
the audio thread, to see what exactly is going on. I do not have the
environment set up for this kind of job, so anyone else is welcome to try.
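
To make the mechanism above concrete, here is a minimal, self-contained sketch of the same pattern: a quit flag, a wakeup pipe and a pthread_join(). It does not use PulseAudio at all; demo_loop, demo_worker, demo_loop_start and demo_loop_stop are hypothetical names invented for illustration, and the real main loop is certainly more involved.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a threaded main loop; not a PulseAudio type. */
typedef struct {
    pthread_t thread_id;
    pthread_mutex_t mutex;
    int wakeup_pipe[2];  /* [0] polled by the worker, [1] written on quit */
    int quit;            /* guarded by mutex */
    int thread_running;
    int iterations;      /* how many times the worker woke up */
} demo_loop;

static void *demo_worker(void *arg) {
    demo_loop *m = arg;
    int quit = 0;
    while (!quit) {
        struct pollfd pfd;
        char c;
        pfd.fd = m->wakeup_pipe[0];
        pfd.events = POLLIN;
        pfd.revents = 0;
        /* Sleep until the wakeup pipe becomes readable. */
        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
            if (read(m->wakeup_pipe[0], &c, sizeof(c)) < 0) {
                /* Read errors are ignored in this sketch. */
            }
        }
        /* The quit flag is only examined here, after the wait returns. */
        pthread_mutex_lock(&m->mutex);
        m->iterations++;
        quit = m->quit;
        pthread_mutex_unlock(&m->mutex);
    }
    return NULL;
}

static int demo_loop_start(demo_loop *m) {
    assert(m);
    m->quit = 0;
    m->iterations = 0;
    m->thread_running = 0;
    if (pipe(m->wakeup_pipe) < 0)
        return -1;
    pthread_mutex_init(&m->mutex, NULL);
    if (pthread_create(&m->thread_id, NULL, demo_worker, m) != 0) {
        close(m->wakeup_pipe[0]);
        close(m->wakeup_pipe[1]);
        pthread_mutex_destroy(&m->mutex);
        return -1;
    }
    m->thread_running = 1;
    return 0;
}

static void demo_loop_stop(demo_loop *m) {
    char c = 'W';
    assert(m);
    if (!m->thread_running)
        return;
    /* Same order as in the excerpts: set quit, wake the worker, then join. */
    pthread_mutex_lock(&m->mutex);
    m->quit = 1;
    pthread_mutex_unlock(&m->mutex);
    if (write(m->wakeup_pipe[1], &c, sizeof(c)) < 0) {
        /* Write errors are ignored in this sketch. */
    }
    pthread_join(m->thread_id, NULL);
    m->thread_running = 0;
    close(m->wakeup_pipe[0]);
    close(m->wakeup_pipe[1]);
    pthread_mutex_destroy(&m->mutex);
}
```

In this sketch demo_loop_stop() returns promptly because the worker only ever sleeps in poll(), which the pipe write interrupts. If the worker were blocked somewhere else (for example waiting on a resource that never becomes available), the write would not wake it and pthread_join() would block indefinitely, which matches the suspicion above.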
Comment 17 Eero Tamminen nokia 2009-11-04 15:19:18 UTC
(In reply to comment #15)
>> Doesn't the Power menu "End task" work with this either?
>
> No, it does not work. 

There's currently an issue in the case where the application responds to the
system ("I'm alive"), but still doesn't close itself (cleanly) as requested. A
Fremantle update will most likely change this so that the app is
unconditionally killed even when it responds.


However, in the case here, the application was supposed to be frozen, so how
can it respond to the system?

What does strace show when tracing (strace -f -p <PID>) a process that is
supposedly frozen due to this bug?

Does top show the app using significant amounts of CPU?
(-> also use-time & performance issue)


(In reply to comment #16)
> All right, I got the PulseAudio 0.9.2 source code with Google CodeSearch and
> did some research. Let us start with pa_simple_free() that hangs:

Thanks for the analysis!  -> moving to pulseaudio.
Comment 18 luarvique 2009-11-04 15:34:08 UTC
(In reply to comment #17)
> What strace shows when tracing (strace -f -p <PID>) a process that is
> supposedly frozen due to this bug?
I am attaching the strace output to this bugtracker.

> Does top show the app using significant amounts of CPU?
> (-> also use-time & performance issue)
No, according to HTOP, CPU usage stays around 50% and is mostly due to the
Media Player PA process (23%) and MAFW-DBUS (11%). The second PA process stays
at 3% CPU usage.
Comment 19 luarvique 2009-11-04 15:36:30 UTC
Created an attachment (id=1534)
The strace of VGB hung up inside the application menu, when it tried calling
pa_simple_free().
Comment 20 Eero Tamminen nokia 2009-11-04 15:58:18 UTC
(In reply to comment #19)
> Created an attachment (id=1534)
> The strace of VGB hung up inside the application menu, when it tried calling
> pa_simple_free().

Do you mean that it didn't hang, but was constantly doing this:
--------
rt_sigaction(SIGALRM, {0x12a68, [ALRM], SA_RESTART|0x4000000}, {0x12a68,
[ALRM], SA_RESTART|0x4000000}, 8) = 0
sigreturn()                             = ? (mask now [ABRT BUS USR1 SEGV PIPE
ALRM STKFLT CONT STOP])
futex(0x40a8f4d8, FUTEX_WAIT, 2762, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
-------
?

Or that it stopped doing even that?
Comment 21 luarvique 2009-11-04 16:03:16 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > Created an attachment (id=1534)
> > The strace of VGB hung up inside the application menu, when it tried calling
> > pa_simple_free().
> Do you mean that it didn't hang, but was constantly doing this:
> Or that it stopped doing even that?
No, it continues doing that when hung. Please keep in mind that I am using a
timer in my app, so the timer event may come from that timer.
Comment 22 Valério Valério maemo.org 2009-12-11 12:13:36 UTC
*** Bug 6840 has been marked as a duplicate of this bug. ***
Comment 23 Flandry 2009-12-26 10:28:27 UTC
*** Bug 6951 has been marked as a duplicate of this bug. ***
Comment 24 Mikko Vartiainen (reporter) 2010-01-14 19:57:30 UTC
Behaviour has changed for 51-1 firmware, but the bug remains

When application has hung on exit, ctrl-backspace now switches to task switcher
view. Taskswitcher has extra window labeled "unknown" (for sdl applications)
which can be closed. Closing the "unknown" window closes the main window too.
The application process is not killed but it is left running in background
using 0-1 % CPU (probably sdl main loop) killing battery time.

Tested with solarwolf and supertux.
Comment 25 Eero Tamminen nokia 2010-01-15 09:51:03 UTC
(In reply to comment #24)
> Behaviour has changed for 51-1 firmware, but the bug remains

The issues mentioned in comment 12 and comment 15 are fixed.


> When application has hung on exit, ctrl-backspace now switches to task
> switcher view. Taskswitcher has extra window labeled "unknown" (for sdl
> applications) which can be closed. Closing the "unknown" window closes
> the main window too.
>
> The application process is not killed but it is left running in background
> using 0-1 % CPU (probably sdl main loop) killing battery time.

-> use-time keyword.

If the app is started by D-BUS (single invocation), and it neither exits nor
has a visible window anymore, the only way to restart it is to reboot the
device.  I.e. this is still also a reliability issue.
Comment 26 luarvique 2010-01-15 10:38:32 UTC
(In reply to comment #25)
> If app is started by D-BUS (single invocation), and it doesn't exit
> nor have anymore visible window, only way to restart it is rebooting the
> device.  I.e. this is still also a reliability issue.
This is really not an "application hangs" issue, but the "other applications
unable to play audio when Media Player playing" issue. The application hangups,
as serious as they are, are secondary to the audio issue.
Comment 27 Flandry 2010-01-16 20:34:59 UTC
This may be replacing the "no arrow key" localized layout blunder as my
favorite Maemo bizarre defect. What's needed to move forward with this?
Comment 28 Javier S. Pedro 2010-01-24 21:26:34 UTC
I don't know what causes the hang, but I know what causes games to be muted
(which indirectly causes the hang): by default, the policy.group gets assigned
to "othermedia" by the PA Nokia policy enforcer, and streams within that group
are muted when a "higher priority" one (mediaplayer) is ongoing.

Now, as revealed by using the PulseAudio CLI, Marbles (SDL game), Chess, and
Mahjong (Gtk+ games) are assigned to the "game" policy.group. This is done by
hardcoding the binary names in the /etc/pulse/xpolicy.conf file. This
policy.group is special in that it isn't muted when the MediaPlayer is playing,
and thus, the games don't hang.

Why this entire policy.group thingie wasn't documented is beyond me, btw. But I
guess that if the builtin games are using it we _should_ be able to use it as
well, without being in violation of any sound policy.


So a possible solution to this bug could be to just manually add each and every
game to the xpolicy.conf file, or to file an enhancement to Nokia for a
xpolicy.conf.d directory.
This works; adding: 

[stream]
exe = "vgba"
group = game

to xpolicy.conf fixes VGBA.


Another solution, and my personal favorite, would be to fix bug #7159 and then
create a fixed xpolicy.conf rule not unlike

[stream]
property = media.role@equals:"game"
group    = game 


Both solutions require fixes to client applications. In the first case, we
require Nokia to provide xpolicy.conf.d directory, and _every_ game author to
ship an extra file with the adequate 
In my second proposed solution, we need a new xpolicy.conf file, fix for #7159,
patch SDL to set PA's media.role to "game" (with this all SDL games should be
working already), and games not using SDL to select their "media.role"
appropriately. We also gain a "feature" (if the deadlock is ever fixed) where a
game could create two streams, one with media.role == "music" (thus muted when
the media player is going on), and the other one with media.role == "game"
(thus NOT muted).
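
As a rough illustration of the client-side half of the second proposal, the sketch below asks for media.role "game" through the PULSE_PROP environment variable, which libpulse reads when a client initializes its property list. This is only a sketch under assumptions: request_game_role is a hypothetical helper name, it has not been checked that the PulseAudio build shipped on the device honours PULSE_PROP, and the policy module still has to act on media.role (the bug #7159 fix and the new xpolicy.conf rule above). A real SDL patch would more likely set the property directly on the stream.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: request media.role=game for PulseAudio streams that
 * this process creates later. It must run before the audio backend is
 * initialized, because the environment is read at connection time.
 */
static int request_game_role(void) {
    /* overwrite = 0 keeps a value the user or a launcher already exported. */
    return setenv("PULSE_PROP", "media.role=game", 0);
}
```

A game would call request_game_role() at the very start of main(), before SDL_Init() or pa_simple_new(). Passing overwrite = 0 is a deliberate choice so that a user can still force a different role from the shell.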


As for the deadlock issue, it seems to also happen when trying to _free a
pa_simple connected to a suspended sink, or trying to write too much data to a
suspended sink. 

(bugsquad: may I ask for the summary to be changed to "PulseAudio clients are
muted and hang on exit when media player is playing music").
Comment 29 Andre Klapper maemo.org 2010-02-12 13:20:47 UTC
This has been fixed in package
policy-settings 0.4.10.1+0m5
which is part of the internal build version
10.2010.05-13
(Note: 2009/2010 is the year, and the number after is the week.)

A future public update released with the year/week later than this internal
build version will include the fix. (This is not always already the next public
update.)
Please verify that this new version fixes the bug by marking this bug report as
VERIFIED after the public update has been released and if you have some time.


To answer popular followup questions:
 * Nokia does not announce release dates of public updates in advance.
 * There is currently no access to these internal, non-public build versions.
   A Brainstorm proposal to change this exists at
http://maemo.org/community/brainstorm/view/undelayed_bugfix_releases_for_nokia_open_source_packages-002/
Comment 30 Mikko Vartiainen (reporter) 2010-02-12 13:34:08 UTC
Could you clarify how it was fixed?

Is it a half fix where all applications still require changes like Javier's
first proposal, or what I would consider a proper fix where applications do not
need any changes?
Comment 31 Andre Klapper maemo.org 2010-02-16 20:48:10 UTC
First comment:
"The scenario is fixed. But when we are playing some of mentioned game and we
get a call, answer it, press task switcher and return to the game (maybe play
some if call is boring) and end the game. Result is that game hangs on exit, to
recover we need to press power button and pick PHONE, then tap task switcher
and close game by tap "x"."

Second comment:
"by requirement, phone calls have the highest priority, so all the other
applications using audio are "frozen" when they attempt to access it. I don't
think there's any reasonable way to go through this in Fremantle."
Comment 32 luarvique 2010-02-16 21:04:58 UTC
(In reply to comment #31)
> "by requirement, phone calls have the highest priority, so all the other
> applications using audio are "frozen" when they attempt to access it. I don't
> think there's any reasonable way to go through this in Fremantle."
Of course there is a reasonable way. pa_mainloop has a data member
pa_mainloop->quit. When quitting, PulseAudio sets pa_mainloop->quit=1 and then
waits for the thread to end. The thread should be checking this data member
*even if it cannot play sound* and quit as soon as the data member is set to 1.
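
A minimal sketch of what that could look like, assuming the audio thread waits on a condition variable while the sink is unavailable. None of this is PulseAudio code: demo_stream, sink_available and the functions below are hypothetical names used only to show a wait that re-checks the quit flag on every wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stream state; not a PulseAudio type. */
typedef struct {
    pthread_t thread_id;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int sink_available;  /* stays 0 while a higher-priority stream owns the sink */
    int quit;
    int frames_played;
} demo_stream;

static void *demo_stream_thread(void *arg) {
    demo_stream *s = arg;
    pthread_mutex_lock(&s->mutex);
    while (!s->quit) {
        if (!s->sink_available) {
            /* Muted by policy: sleep, but re-check quit after every wakeup. */
            pthread_cond_wait(&s->cond, &s->mutex);
            continue;
        }
        s->frames_played++;  /* stand-in for writing audio data */
        pthread_mutex_unlock(&s->mutex);
        sched_yield();
        pthread_mutex_lock(&s->mutex);
    }
    pthread_mutex_unlock(&s->mutex);
    return NULL;
}

static int demo_stream_start(demo_stream *s, int sink_available) {
    assert(s);
    s->sink_available = sink_available;
    s->quit = 0;
    s->frames_played = 0;
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
    if (pthread_create(&s->thread_id, NULL, demo_stream_thread, s) != 0) {
        pthread_cond_destroy(&s->cond);
        pthread_mutex_destroy(&s->mutex);
        return -1;
    }
    return 0;
}

static void demo_stream_stop(demo_stream *s) {
    assert(s);
    /* Set quit and wake the thread even though the sink never became free. */
    pthread_mutex_lock(&s->mutex);
    s->quit = 1;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->mutex);
    pthread_join(s->thread_id, NULL);
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->mutex);
}
```

Here demo_stream_stop() returns even if sink_available never becomes 1, because the same condition variable is signalled for both "sink is free" and "please quit". Whether the real client thread blocks in a place where such a wakeup is possible is exactly what still needs to be investigated.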
Comment 33 Javier S. Pedro 2010-02-16 21:37:40 UTC
(In reply to comment #31)
> "by requirement, phone calls have the highest priority, so all the other
> applications using audio are "frozen" when they attempt to access it. I don't
> think there's any reasonable way to go through this in Fremantle."
Er... but we're talking about the media player here, aren't we?

To me comment #31 reads like "nothing has been done at all". If that's not the
case, may I request for an extra bit of clarification?

Maybe I shouldn't have unified this under a single bug and should have filed
four bugs instead:
- Games are muted in phone call (WONTFIX according to policy)
- Games are muted while mediaplayer is playing (??? according to policy)
- Games muted by policy hang on exit (still not fixed!)
- Games hung by being muted by policy are unkillable by systemui (FIXED as per
previous comments).
Comment 34 Javier S. Pedro 2010-02-16 22:10:06 UTC
(In reply to comment #33)
> To me comment #31 reads like "nothing has been done at all". If that's not the
> case, may I request for an extra bit of clarification?

From conversation on IRC: "comment 31 says that media player playback should
not stall sdl, but still, phone call will". 
(OK; that's fine with me. Thanks!)

This leaves: 
- Pulseaudio clients muted by policy hang on exit (still not fixed!). 
(which iirc is all OSS stuff)
Comment 35 Eero Tamminen nokia 2010-02-17 09:52:45 UTC
(In reply to comment #34)
> (In reply to comment #33)
> > To me comment #31 reads like "nothing has been done at all". If that's not the
> > case, may I request for an extra bit of clarification?
> 
> From conversation on IRC: "comment 31 says that media player playback should
> not stall sdl, but still, phone call will". 
> (OK; that's fine with me. Thanks!)

I.e. this particular bug should be fixed.


> This leaves: 
> - Pulseaudio clients muted by policy hang on exit (still not fixed!). 
> (which iirc is all OSS stuff)

And for this there should be another bug (maybe with stuff from comment 32).
Comment 36 Andre Klapper maemo.org 2010-03-15 20:55:30 UTC
Setting explicit PR1.2 milestone (so it's clearer in which public release the
fix will be available to users).

Sorry for the bugmail noise (you can filter on this message).