顶顶通-FreeSWITCH二次开发接口

顶顶通-FreeSWITCH二次开发接口

最近编译了fs 1.10.10-dev也就是 master版本(2023年7月6日) 给几个客户升级了一下,发现非常不稳定(每天都有几个通道卡在early状态),最近才有空来分析原因。

之前跑的是1.10.8 release 版本,从来没出现过这个问题,我把 1.10.8的代码和1.10.10-dev的代码整体对比了一下,整体改变不大,其中 switch_core_media_set_codec 函数的修改引擎我的警惕,因为这个修改时直接修改了互斥锁
switch_core_session_lock_codec_write(session);
switch_core_session_lock_codec_read(session);
的锁定范围。我查看了一下这个提交说明

2023年6月12日 这个提交给合并到到了master

SHA-1: 645b610e7968594910dd2368c511fbc93c1373b7

  • Merge pull request #2109 from signalwire/switch_core_media_set_codec

[Core] Fix possible deadlock in switch_core_media_set_codec()

晚上 我联系客户 看看有没有卡在early状态的通道,刚好有一个,把所有线程的调用堆栈 输出来, 还真是卡在 switch_core_media_set_codec 。

#0  0x00007f9dc597a54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f9dc5975eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007f9dc5975daf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f9dc8687759 in fspr_thread_mutex_lock (mutex=<optimized out>) at locks/unix/thread_mutex.c:92
#4  0x00007f9dc850e895 in switch_mutex_lock (lock=<optimized out>) at src/switch_apr.c:301
#5  0x00007f9db9c9029c in sofia_receive_message (session=0x7f9cf5b4a7c8, msg=0x7f9d345ac4d0) at mod_sofia.c:1526
#6  0x00007f9dc853f80b in switch_core_session_perform_receive_message (session=session@entry=0x7f9cf5b4a7c8, message=message@entry=0x7f9d345ac4d0, file=file@entry=0x7f9dc86a961d "src/switch_core_io.c", func=func@entry=0x7f9dc86a9970 <__func__.19414> "switch_core_session_read_frame", line=line@entry=416) at src/switch_core_session.c:930
#7  0x00007f9dc854aa7e in switch_core_session_read_frame (session=session@entry=0x7f9cf5b4a7c8, frame=frame@entry=0x7f9d345ae568, flags=flags@entry=0, stream_id=stream_id@entry=0) at src/switch_core_io.c:416
#8  0x00007f9dc8608251 in switch_ivr_sleep (session=0x7f9cf5b4a7c8, ms=100, sync=<optimized out>, args=0x0) at src/switch_ivr.c:294
#9  0x00007f9daeb4dcc5 in ?? () from /ddt/fs/mod/mod_cti.so
#10 0x00007f9dc8543b5b in switch_core_session_exec (session=0x7f9cf5b4a7c8, application_interface=application_interface@entry=0x19077a0, arg=arg@entry=0x7f9d345aeb40 "38981 2000") at src/switch_core_session.c:2964
#11 0x00007f9dc85441ef in switch_core_session_execute_application_get_flags (session=<optimized out>, app=0x7f9db140b89b "cti_wait_for_answer", arg=0x7f9d345aeb40 "38981 2000", flags=<optimized out>) at src/switch_core_session.c:2824
#12 0x00007f9daeb8eb7e in ?? () from /ddt/fs/mod/mod_cti.so
#13 0x00007f9daeb92619 in ?? () from /ddt/fs/mod/mod_cti.so
#14 0x00007f9dc853c9c5 in switch_core_session_thread_pool_worker (thread=0x7f9d45af47a0, obj=<optimized out>) at src/switch_core_session.c:1790
#15 0x00007f9dc868d760 in dummy_worker (opaque=0x7f9d45af47a0) at threadproc/unix/thread.c:151
#16 0x00007f9dc5973ea5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f9dc4fc7b0d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f9d2f76b700 (LWP 30595)):




#0  0x00007f9dc597a54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f9dc5975eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007f9dc5975daf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f9dc8687759 in fspr_thread_mutex_lock (mutex=<optimized out>) at locks/unix/thread_mutex.c:92
#4  0x00007f9dc850e895 in switch_mutex_lock (lock=<optimized out>) at src/switch_apr.c:301
#5  0x00007f9dc852c6ec in switch_core_session_lock_codec_read (session=<optimized out>) at src/switch_core_codec.c:74
#6  0x00007f9dc8561e7c in switch_core_media_set_codec (session=session@entry=0x7f9cf5b4a7c8, force=force@entry=0, codec_flags=0) at src/switch_core_media.c:3608
#7  0x00007f9dc856763e in switch_core_media_activate_rtp (session=0x7f9cf5b4a7c8) at src/switch_core_media.c:8565
#8  0x00007f9db9cf73de in sofia_media_activate_rtp (tech_pvt=tech_pvt@entry=0x7f9cf5b53ff8) at sofia_media.c:58
#9  0x00007f9db9cf745e in sofia_media_tech_media (tech_pvt=tech_pvt@entry=0x7f9cf5b53ff8, r_sdp=<optimized out>, type=type@entry=SDP_TYPE_REQUEST) at sofia_media.c:189
#10 0x00007f9db9ccab5a in sofia_handle_sip_i_state (de=0x7f9d983bcea0, tags=0x7f9d882ae950, sip=0x0, sofia_private=<optimized out>, nh=0x7f9da1db69b0, profile=0x1ac35070, nua=0x7f9d98040be0, phrase=0x7f9d882aece6 "Ringing", status=183, session=0x7f9cf5b4a7c8) at sofia.c:7683
#11 our_sofia_event_callback (event=nua_i_state, status=<optimized out>, phrase=0x7f9d882aece6 "Ringing", nua=0x7f9d98040be0, profile=0x1ac35070, nh=0x7f9da1db69b0, sofia_private=0x7f9d1927a640, sip=0x0, de=de@entry=0x7f9d983bcea0, tags=0x7f9d882ae950) at sofia.c:1813
#12 0x00007f9db9ccea5b in sofia_process_dispatch_event (dep=0x7f9d2f76a2c0) at sofia.c:2253
#13 0x00007f9db9c8ffc9 in sofia_receive_message (session=0x7f9cf5b4a7c8, msg=0x7f9d2f76aa20) at mod_sofia.c:1347
#14 0x00007f9dc853f6d5 in switch_core_session_perform_receive_message (session=session@entry=0x7f9cf5b4a7c8, message=message@entry=0x7f9d2f76aa20, file=file@entry=0x7f9dc86c5e35 "src/switch_ivr.c", func=func@entry=0x7f9dc86c6ed0 <__func__.19070> "switch_ivr_parse_signal_data", line=line@entry=893) at src/switch_core_session.c:853
#15 0x00007f9dc8604f9c in switch_ivr_parse_signal_data (session=session@entry=0x7f9cf5b4a7c8, all=all@entry=SWITCH_TRUE, only_session_thread=only_session_thread@entry=SWITCH_FALSE) at src/switch_ivr.c:893
#16 0x00007f9dc8604fec in switch_ivr_parse_all_signal_data (session=session@entry=0x7f9cf5b4a7c8) at src/switch_ivr.c:906
#17 0x00007f9dc8605007 in switch_ivr_parse_all_messages (session=session@entry=0x7f9cf5b4a7c8) at src/switch_ivr.c:852
#18 0x00007f9dc8607b4a in switch_ivr_parse_all_events (session=session@entry=0x7f9cf5b4a7c8) at src/switch_ivr.c:925
#19 0x00007f9dc8548632 in switch_core_session_run (session=0x7f9cf5b4a7c8) at src/switch_core_state_machine.c:710
#20 0x00007f9dc85412ae in switch_core_session_thread (thread=<optimized out>, obj=0x7f9cf5b4a7c8) at src/switch_core_session.c:1726
#21 0x00007f9dc853c9c5 in switch_core_session_thread_pool_worker (thread=0x7f9d38410920, obj=<optimized out>) at src/switch_core_session.c:1790
#22 0x00007f9dc868d760 in dummy_worker (opaque=0x7f9d38410920) at threadproc/unix/thread.c:151
#23 0x00007f9dc5973ea5 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f9dc4fc7b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f9dc8db78c0 (LWP 11507)):

接下来就是分析代码找出死锁原因了

Thread 2 (Thread 0x7f9d2f76b700 (LWP 30595)): 和线程 Thread 1 (Thread 0x7f9dc8db78c0 (LWP 11507)): 调用堆栈都有 sofia_receive_message 这个函数

具体代码

		switch_mutex_lock(tech_pvt->sofia_mutex);
		if (switch_core_session_in_thread(session)) {
			de->session = session;
		}

		sofia_process_dispatch_event(&de);
		switch_mutex_unlock(tech_pvt->sofia_mutex);

Thread 2 (Thread 0x7f9d2f76b700 (LWP 30595)): 卡在 switch_mutex_lock(tech_pvt->sofia_mutex);了

Thread 1 (Thread 0x7f9d2f76b700 (LWP 30595)) 成功获取了 tech_pvt->sofia_mutex) 锁定进入了sofia_process_dispatch_event。最后卡在switch_core_media_set_codec函数里面的switch_core_session_lock_codec_write(session)步骤。

通过看
Thread 2 (Thread 0x7f9d2f76b700 (LWP 30595)): 的调用堆栈 函数 switch_core_session_read_frame 里面调用了switch_core_session_perform_receive_message,
对应代码

	switch_mutex_lock(session->codec_read_mutex);

	...省略很多代码

	if (!switch_test_flag(session, SSF_WARN_TRANSCODE)) {
			switch_core_session_message_t msg = { 0 };

			msg.message_id = SWITCH_MESSAGE_INDICATE_TRANSCODING_NECESSARY;
			switch_core_session_receive_message(session, &msg);
			switch_set_flag(session, SSF_WARN_TRANSCODE);
		}

switch_mutex_lock(session->codec_read_mutex); 这个锁定和 switch_core_session_lock_codec_write(session) 对应的是一个锁定,至此 这个死锁的bug,我们就分析出来了。

Merge pull request #2109 from signalwire/switch_core_media_set_codec 这个提交为了解决一个bug,导致了一个更深的bug。

触发的原因是 cti模块有处理早期媒体,如果外呼用的是 ignore_early_media=true 应该是不会触发这个的bug的。

解决这个bug的方法是 调用 switch_core_session_receive_message 之前先 解锁 switch_mutex_unlock(session->codec_read_mutex)。代码如下。

			msg.message_id = SWITCH_MESSAGE_INDICATE_TRANSCODING_NECESSARY;
			switch_mutex_unlock(session->codec_read_mutex);
			switch_core_session_receive_message(session, &msg);
			switch_mutex_lock(session->codec_read_mutex);
			switch_set_flag(session, SSF_WARN_TRANSCODE);
08-04 11:44