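【Title】:
【Test type: fault injection】【Test version: 3.0.3】 openGauss fails to start after a server reboot triggered by fault injection
【OS and hardware information】(query commands: cat /etc/system-release, uname -a):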
CentOS Linux release 7.9.2009 (Core)
Linux test123 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
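【Test environment】
Multi-instance environment (1 primary, 1 standby)
Single-instance environment
【Feature under test】:
Recovery behavior after fault injection
【Test type】:
Fault injection
【Database version】(query command: gaussdb -V):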
gaussdb (openGauss 3.0.3 build 94f6a79e) compiled at 2023-09-13 08:32:24 commit 0 last mr
【Preconditions】:
【Steps】(please give detailed steps):
【Expected output】:
【Actual output】:
Failed to read gaussdb.state: 0
Failed to set gaussdb.state with UNKNOWN_STATE.
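【Root cause analysis】:
Root cause of this issue
When the fault occurs, the size of the data/gaussdb.state file becomes 0.
During gs_ctl start, a validation step writes the state into gaussdb.state. When it checks the file:
If the file does not exist, startup continues.
If the file exists but its size differs from the size of the state struct, the process exits abnormally.
This causes openGauss to fail to start.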
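The check described above can be sketched as follows. This is only an illustration in Python, not the openGauss source: the function name `check_state_file` and the return values are hypothetical, and the expected size of 72 bytes is an assumption taken from the `ll gaussdb.state` listing of a healthy node later in this issue.

```python
import os
import tempfile

# Assumption: 72 bytes is the size shown for a healthy gaussdb.state in this
# issue; in the real server the expected value is the size of the state struct.
EXPECTED_STATE_SIZE = 72


def check_state_file(path, expected_size=EXPECTED_STATE_SIZE):
    """Hypothetical model of the pre-fix check: return "continue" or "abort"."""
    if not os.path.exists(path):
        # File missing: startup continues and the file is created later.
        return "continue"
    if os.path.getsize(path) != expected_size:
        # File present but the wrong size (for example 0 bytes): startup aborts.
        return "abort"
    return "continue"


def demo():
    """Run the check against a missing, a healthy, and a truncated file."""
    with tempfile.TemporaryDirectory() as data_dir:
        path = os.path.join(data_dir, "gaussdb.state")
        results = {"missing": check_state_file(path)}

        with open(path, "wb") as f:
            f.write(b"\0" * EXPECTED_STATE_SIZE)
        results["healthy"] = check_state_file(path)

        os.truncate(path, 0)  # same effect as `truncate -s 0 gaussdb.state`
        results["truncated"] = check_state_file(path)
        return results


if __name__ == "__main__":
    print(demo())
```

The sketch shows why a zero-length file is worse than a missing one under this logic: only the "exists but wrong size" branch stops the start.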
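How the problem was diagnosed
Other causes that could produce similar symptoms
Whether a temporary workaround exists
Solution
Estimated fix date
【Log information】(please attach log files, screenshots, coredump information):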
2023-12-20 17:09:44.443 6582af58.1 [unknown] 140611239308416 [unknown] 0 dn_6001 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
2023-12-20 17:09:44.443 6582af58.1 [unknown] 140611239308416 [unknown] 0 dn_6001 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
Failed to read gaussdb.state: 0Failed to set gaussdb.state with UNKNOWN_STATE.[2023-12-20 17:09:45.471][118648][][gs_ctl]: waitpid 118654 failed, exitstatus is 256, ret is 2
[2023-12-20 17:09:45.471][118648][][gs_ctl]: stopped waiting
[2023-12-20 17:09:45.471][118648][][gs_ctl]: could not start server
【Test code】:
Fault injection using the commands above
Self-test based on build B011:
Acceptance date: 2024-1-16
Acceptance version: gsql (openGauss 6.0.0 build d2533e77) compiled at 2024-01-10 08:37:56 commit 0 last mr
Acceptance result: passed
Acceptance date: 2024-4-19
Acceptance version: gsql (openGauss 5.0.2 build 0db5202f) compiled at 2024-04-17 15:28:50 commit 0 last mr
Acceptance result: passed
[peilq_502@kwepwebenv02644 dn1]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
-------------------------------------------------------------------------------------------------------------------------------
1 kwepwebenv02644 10.29.180.204 50200 6001 /openGauss/peilq_all/peilq_app/peilq_502/cluster/dn1 P Primary Normal
2 kwepwebenv07952 10.243.194.134 50200 6002 /openGauss/peilq_all/peilq_app/peilq_502/cluster/dn1 S Standby Normal
3 kwemhisprc10431 7.212.123.28 50200 6003 /openGauss/peilq_all/peilq_app/peilq_502/cluster/dn1 S Standby Normal
[peilq_502@kwepwebenv02644 dn1]$ ll gaussdb.state
-rw------- 1 peilq_502 peilq_502 72 Apr 18 14:50 gaussdb.state
[peilq_502@kwepwebenv02644 dn1]$ truncate -s 0 gaussdb.state
[peilq_502@kwepwebenv02644 dn1]$ ll gaussdb.state
-rw------- 1 peilq_502 peilq_502 0 Apr 19 15:47 gaussdb.state
[peilq_502@kwepwebenv02644 dn1]$ gs_ctl -D /openGauss/peilq_all/peilq_app/peilq_502/cluster/dn1 restart
[2024-04-19 15:47:31.581][47700][][gs_ctl]: gs_ctl restarted ,datadir is /openGauss/peilq_all/peilq_app/peilq_502/cluster/dn1
waiting for server to shut down... done
server stopped
[2024-04-19 15:47:35.595][47700][][gs_ctl]: waiting for server to start...
.0 LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
0 LOG: [Alarm Module]Host Name: kwepwebenv02644
0 LOG: [Alarm Module]Host IP: kwepwebenv02644. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
0 LOG: [Alarm Module]Cluster Name: peilq_502
0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING: failed to parse feature control file: gaussdb.version.
0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
0 LOG: bbox_dump_path is set to /openGauss1/core/
2024-04-19 15:47:35.752 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 8, max = 4, actual = 4
2024-04-19 15:47:35.752 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
Failed to read gaussdb.state: 0, len: 02024-04-19 15:47:35.760 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
2024-04-19 15:47:35.760 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: [Alarm Module]Host Name: kwepwebenv02644
2024-04-19 15:47:35.760 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: [Alarm Module]Host IP: kwepwebenv02644. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
2024-04-19 15:47:35.760 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: [Alarm Module]Cluster Name: peilq_502
2024-04-19 15:47:35.760 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
2024-04-19 15:47:35.768 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: loaded library "security_plugin"
2024-04-19 15:47:35.771 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2024-04-19 15:47:35.780 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2024-04-19 15:47:35.780 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4477 Mbytes) is larger.
2024-04-19 15:47:35.879 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [CACHE] LOG: set data cache size(805306368)
2024-04-19 15:47:36.370 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [SEGMENT_PAGE] LOG: Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
2024-04-19 15:47:36.441 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: gaussdb: fsync file "/openGauss/peilq_all/peilq_app/peilq_502/cluster/dn1/gaussdb.state.temp" success
2024-04-19 15:47:36.441 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Primary), connection index(1)
2024-04-19 15:47:36.464 66222197.1 [unknown] 139828598239552 [unknown] 0 dn_6001_6002_6003 00000 0 [BACKEND] LOG: max_safe_fds = 973, usable_fds = 1000, already_open = 17
bbox_dump_path is set to /openGauss1/core/
.
[2024-04-19 15:47:37.621][47700][][gs_ctl]: done
[2024-04-19 15:47:37.621][47700][][gs_ctl]: server started (/openGauss/peilq_all/peilq_app/peilq_502/cluster/dn1)
[peilq_502@kwepwebenv02644 dn1]$
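In the acceptance run above, the server logs "Failed to read gaussdb.state: 0, len: 0", then fsyncs gaussdb.state.temp and reports that the state file was created, and the start succeeds. A minimal Python sketch of that kind of recovery is shown below. It is not the openGauss implementation: the helper `recreate_state_file`, the 72-byte payload, and the final rename step are all assumptions made for illustration; the log only confirms the fsync of the temporary file and the successful creation.

```python
import os
import tempfile

# Assumption: 72 bytes matches the healthy gaussdb.state size in the listing
# above; the payload content here is a placeholder, not the real struct.
STATE_SIZE = 72


def recreate_state_file(data_dir, payload):
    """Hypothetical helper: write the state to a temp file, fsync it, swap it in."""
    final_path = os.path.join(data_dir, "gaussdb.state")
    temp_path = final_path + ".temp"
    with open(temp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make the temp file durable before the swap
    # Assumed step: replace the damaged file with the fully written temp file.
    os.replace(temp_path, final_path)
    return final_path


def demo_recovery():
    """Simulate the fault (zero-length file) and the recovery on a temp dir."""
    with tempfile.TemporaryDirectory() as data_dir:
        path = os.path.join(data_dir, "gaussdb.state")
        open(path, "wb").close()  # fault injection: zero-length state file
        size_before = os.path.getsize(path)
        recreate_state_file(data_dir, b"\0" * STATE_SIZE)
        return size_before, os.path.getsize(path)


if __name__ == "__main__":
    print(demo_recovery())
```

Writing to a temporary file first means a reboot in the middle of the write leaves the old file or the temp file incomplete, rather than a half-written gaussdb.state.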