    res_corosync: Fix crash in huge distributed environment. · 0c1c3866
    1) Fix memory leaks
       Added code to release the ast_event objects extracted from corosync and
       stasis messages.
    
    2) Clean stasis cache when a member of the corosync cluster leaves the group
       Added code that removes, from the stasis cache of the members remaining in
       the group, all messages carrying the EID of the member that left.
       If the device states of the departed member stay in the stasis cache of
       the other members, they are never updated again, and high-priority cached
       values, like BUSY, take precedence over current device states.
    
    3) Stop corosync event propagation when node is not joined to the group
       Updated the dispatch_thread_handler code to detect when Asterisk is not
       joined to the corosync group, and added conditions to the
       publish_event_to_corosync code so that corosync messages are sent only
       when joined.
       When a node is not joined, its corosync daemon can't send messages:
       the cpg_mcast_joined function appends new messages to the FIFO buffer
       until it is full and then blocks indefinitely.
       In this scenario, if the stasis_message_cb callback, registered by
       res_corosync to handle stasis messages, tries to send a corosync message,
       the thread of the stasis thread pool is blocked until the node joins
       the corosync cluster.
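
    The guard described in item 3 can be illustrated with a minimal sketch. This
    is not the res_corosync code: struct node_state, node_set_joined and
    node_publish are hypothetical names, and the blocking multicast call is
    replaced by a counter. A real module would also need a lock or atomics
    around the flag, since the dispatch thread and the stasis callbacks run on
    different threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the node's view of its group membership. */
struct node_state {
    bool joined;     /* true while the node is a member of the group */
    size_t sent;     /* messages handed to the transport */
    size_t dropped;  /* messages skipped because the node was not joined */
};

/* Would be called by the dispatch loop whenever membership changes. */
void node_set_joined(struct node_state *node, bool joined)
{
    assert(node != NULL);
    node->joined = joined;
}

/*
 * Publish one message. Returns true if it was handed to the transport,
 * false if it was skipped because the node is not joined. Skipping avoids
 * queueing into a buffer that can never drain.
 */
bool node_publish(struct node_state *node, const char *msg)
{
    assert(node != NULL && msg != NULL);

    if (!node->joined) {
        node->dropped++;
        return false;
    }

    /* A real module would issue the (potentially blocking) multicast here. */
    node->sent++;
    return true;
}
```

    Dropping the message instead of queueing it keeps the calling thread free;
    the trade-off is that state published while not joined has to be resent
    after the node rejoins.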
    
    ASTERISK-28888
    Reported by: Università di Bologna - CESIA VoIP
    
    Change-Id: Ie8e99bc23f141a73c13ae6fb1948d148d4de17f2
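
    The cache cleanup described in item 2 can be sketched in the same spirit.
    The flat array, struct cache_entry and cache_purge_eid below are
    hypothetical and only illustrate the idea of dropping every cached entry
    that originated from a departed member; the actual stasis cache is not a
    flat array. The sketch assumes an EID is a 6-byte identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EID_LEN 6  /* assumed size of an entity ID in bytes */

/* Hypothetical cached device state, tagged with the originating node. */
struct cache_entry {
    unsigned char eid[EID_LEN];  /* node that published this state */
    const char *device;          /* device name */
    int state;                   /* opaque device state value */
};

/*
 * Remove every entry published by the node identified by eid.
 * Surviving entries are compacted to the front of the array in their
 * original order. Returns the number of entries that remain.
 */
size_t cache_purge_eid(struct cache_entry *entries, size_t count,
                       const unsigned char *eid)
{
    size_t kept = 0;

    assert(eid != NULL && (entries != NULL || count == 0));

    for (size_t i = 0; i < count; i++) {
        if (memcmp(entries[i].eid, eid, EID_LEN) != 0) {
            entries[kept++] = entries[i];
        }
    }
    return kept;
}
```

    Once the departed member's entries are gone, a stale high-priority value
    such as BUSY can no longer outrank the states reported by the members that
    are still in the group.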