    3c6f1199
    sched: Don't allow ast_sched_del to deadlock ast_sched_runq from same thread
    Walter Doekes authored
    When fixing ASTERISK-24212, a change was made so that a scheduled callback
    cannot be removed while it is running: the caller of ast_sched_del has to
    wait until the callback finishes.
    
    However, when ast_sched_del is called from within the callback itself
    (wrong as that may be), this new check causes a deadlock: the thread waits
    forever for itself.
    
    This changeset introduces an additional check: if ast_sched_del is called
    by the callback itself, the call is rejected immediately (with an ERROR log
    and a backtrace). Additionally, the AST_SCHED_DEL_UNREF macro is adjusted
    so that its refcall (the unref run after ast_sched_del) is only executed if
    ast_sched_del returned success.
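
    A minimal sketch of the rule described above, in C. Everything in it is
    hypothetical: struct toy_sched, toy_sched_del, toy_sched_runq,
    TOY_SCHED_DEL_UNREF and struct toy_peer are illustrative stand-ins, not
    Asterisk's scheduler API, and the toy scheduler holds a single entry. A
    delete from another thread waits on a condition variable until the running
    callback finishes; a delete issued from the executing thread itself
    (detected with pthread_equal against the recorded thread) is refused, and
    the macro only runs its refcall when the delete succeeded.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical, heavily simplified scheduler holding at most one entry.
 * NOT Asterisk's ast_sched_context: it only models "wait while the entry
 * is running, but never wait for yourself". */
struct toy_sched {
    pthread_mutex_t lock;
    pthread_cond_t done;      /* broadcast when a callback finishes */
    int pending_id;           /* queued entry id, or -1 */
    int running_id;           /* executing entry id, or -1 */
    pthread_t running_thread; /* only meaningful while running_id != -1 */
    void (*cb)(struct toy_sched *s, void *data);
    void *data;
};

static void toy_sched_init(struct toy_sched *s, int id,
                           void (*cb)(struct toy_sched *, void *), void *data)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->done, NULL);
    s->pending_id = id;
    s->running_id = -1;
    s->cb = cb;
    s->data = data;
}

/* Returns 0 if the entry was removed (or finished while we waited), else -1. */
static int toy_sched_del(struct toy_sched *s, int id)
{
    int res = -1;

    pthread_mutex_lock(&s->lock);
    if (id > -1 && s->pending_id == id) {
        s->pending_id = -1;               /* not started yet: just dequeue */
        res = 0;
    } else if (id > -1 && s->running_id == id) {
        if (pthread_equal(s->running_thread, pthread_self())) {
            /* Called from inside the callback: waiting would deadlock. */
            fprintf(stderr, "ERROR: sched id %d deleted from its own callback\n", id);
        } else {
            /* Another thread: wait until the callback has finished. */
            while (s->running_id == id)
                pthread_cond_wait(&s->done, &s->lock);
            res = 0;
        }
    }
    pthread_mutex_unlock(&s->lock);
    return res;
}

/* Runs the queued entry, if any, in the calling thread. */
static void toy_sched_runq(struct toy_sched *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->pending_id < 0) {
        pthread_mutex_unlock(&s->lock);
        return;
    }
    assert(s->running_id == -1);          /* one callback at a time */
    s->running_id = s->pending_id;
    s->running_thread = pthread_self();
    s->pending_id = -1;
    pthread_mutex_unlock(&s->lock);

    s->cb(s, s->data);                    /* callback runs without the lock */

    pthread_mutex_lock(&s->lock);
    s->running_id = -1;
    pthread_cond_broadcast(&s->done);
    pthread_mutex_unlock(&s->lock);
}

/* Hypothetical analogue of the adjusted AST_SCHED_DEL_UNREF: refcall (which
 * drops the reference held by the scheduled item) only runs on success. */
#define TOY_SCHED_DEL_UNREF(s, id, refcall)                  \
    do {                                                     \
        if ((id) > -1 && toy_sched_del((s), (id)) == 0) {    \
            (id) = -1;                                       \
            refcall;                                         \
        }                                                    \
    } while (0)

/* Hypothetical peer whose callback (wrongly) deletes its own sched id. */
struct toy_peer { int refcount; int pokeexpire; };

static void self_deleting_cb(struct toy_sched *s, void *data)
{
    struct toy_peer *peer = data;
    TOY_SCHED_DEL_UNREF(s, peer->pokeexpire, peer->refcount--);
}
```

    When toy_sched_runq runs self_deleting_cb, the self-delete logs the ERROR
    and leaves peer->refcount untouched instead of blocking forever; a delete
    from any other thread still waits for the running callback to finish. The
    real ast_sched_del additionally manages a heap of entries and logs a
    backtrace, none of which is modelled here.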
    
    This should prevent the deadlock triggered by the following intermittent
    race condition found in chan_sip:
    - thread 1: schedule sip_poke_peer_now (using AST_SCHED_REPLACE)
    - thread 2: run sip_poke_peer_now
    - thread 2: blank out sched-ID (too soon!)
    - thread 1: set sched-ID (too late!)
    - thread 2: try to delete the currently running sched-ID
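
    The interleaving above can be replayed sequentially to show why thread 2
    ends up deleting its own running entry. This is a hypothetical,
    single-threaded sketch: struct race_trace, replay_poke_race and
    deletes_itself are illustrative names, the ints stand in for
    peer->pokeexpire and the scheduler's currently running entry, and new_id
    stands in for whatever id the add inside AST_SCHED_REPLACE returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trace record; not an Asterisk type. */
struct race_trace {
    int running_id;  /* id the scheduler thread is executing */
    int deleted_id;  /* id the callback later tries to delete */
};

/* Replays the five steps in the listed order, in one thread. */
static struct race_trace replay_poke_race(int new_id)
{
    int pokeexpire = -1;  /* stands in for peer->pokeexpire */
    int running_id = -1;  /* stands in for the scheduler's running entry */

    assert(new_id > -1);  /* -1 means "no id" in this model */

    /* 1. thread 1: AST_SCHED_REPLACE adds the entry and gets new_id back,
     *    but has not stored it in pokeexpire yet. */
    /* 2. thread 2: the scheduler starts running that entry. */
    running_id = new_id;
    /* 3. thread 2: the callback blanks out the sched-ID (too soon!). */
    pokeexpire = -1;
    /* 4. thread 1: AST_SCHED_REPLACE stores the sched-ID (too late!). */
    pokeexpire = new_id;
    /* 5. thread 2: the callback deletes whatever pokeexpire now holds. */
    struct race_trace t = { running_id, pokeexpire };
    return t;
}

/* True when the delete targets the entry that is currently executing. */
static bool deletes_itself(struct race_trace t)
{
    return t.deleted_id == t.running_id;
}
```

    replay_poke_race(42) yields deleted_id == running_id == 42: the callback
    asks to delete itself, which is the case that used to hang and that the
    new check rejects.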
    
    After this fix, an ERROR is logged, but neither deadlocks (in do_monitor)
    nor excess calls to sip_unref_peer(peer) (which cause double frees of
    rtp_instances and other madness) should occur.
    
    (Thanks Richard Mudgett for reviewing/improving this "scary" change.)
    
    Note that this change does not fix the observed race condition itself:
    unlocked access to peer->pokeexpire (and potentially to other scheduled
    items in chan_sip) lets AST_SCHED_DEL_UNREF see a changing id. It does,
    however, make the deadlock go away. And in the observed case it will not
    have adverse effects (such as memory leaks), because the scheduled item is
    removed through a different path.
    
    ASTERISK-28282
    
    Change-Id: Ic26777fa0732725e6ca7010df17af77a012aa856