DB2 - Problem description
Problem IT37136 | Status: Closed |
SQL1659 is not returned even when an RDMA device cannot be opened during db2start, and db2diag.log is spammed with error messages | |
product: | |
DB2 FOR LUW / DB2FORLUW / B50 - DB2 | |
Problem description: | |
When a member fails to open a specific RDMA device that is used for communication with the CF during db2start, the db2start command does not return an SQL1659N warning to the user. The open device failure looks similar to this:

2021-04-30-10.38.19.631990+540 I14510040A757 LEVEL: Warning
PID : 11076474 TID : 772 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
HOSTNAME: MEMBER01
EDUID : 772 EDUNAME: db2castructevent 0
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCfOpenAndConnect, probe:1316
MESSAGE : CA RC= 2148204567
DATA #1 : PsOpen FAILURE: hostname:CF-host(member#: 128, cfIndex: 1) ; device:hca1 ; caport:56001 ; transport: UDAPL
Connection pool target size = 16 ; Tolerate this PsOpen failure, connections will be restricted to use the successful opened device(s).
conn (seq #: 251 node #: 1 connectTimeoutForLink: 10 maxTimeoutForLink: 20)

The db2diag.log is also spammed repeatedly with the following errors:

2021-04-30-10.38.19.636533+540 I14513438A1711 LEVEL: Warning
PID : 11076474 TID : 772 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
HOSTNAME: MEMBER01
EDUID : 772 EDUNAME: db2castructevent 0
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_SINGLE_CA_HANDLE::sqleCaCpGetTokenIndexByCfServerNetNameAndDeviceName, probe:1737
MESSAGE : Could not find a match for the specified netname among CA tokens.
DATA #1 : SAL CF Server Name, PD_TYPE_SAL_CF_SERVER_NAME, 8 bytes
CF-host
DATA #2 : SAL Member Device Name, PD_TYPE_SAL_MEMBER_DEVICE_NAME, 4 bytes
hca1
DATA #3 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
1
DATA #4 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
128
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x090000000762EE58 sqleCaCpGetTokenIndexByCfServerNetNameAndDeviceName__21SQLE_SINGLE_CA_HANDLEFRC27SQLE_CF_MEMBER_ADAPTER_LINKCP17SAL_ADAPTER_IND + 0x318
  [1] 0x0900000007627014 sqleSingleCaRefreshAdapterStatus__21SQLE_SINGLE_CA_HANDLEFCb + 0x944
  [2] 0x09000000076257FC sqleSingleCfOpenAndConnect__21SQLE_SINGLE_CA_HANDLEFCUi + 0x6DC
  [3] 0x090000000762D060 sqleSingleCaInitialize__21SQLE_SINGLE_CA_HANDLEFCUlCUi + 0x4F0
  [4] 0x09000000076144B0 sqleCaCpAddCa__17SQLE_CA_CONN_POOLFsCUiCPUl + 0x7A0
  [5] 0x0900000007679CE4 ROCM_StateCaInitMonitor__16sqleRocmNotifEduFv + 0xCA4
  [6] 0x0900000007677BC0 RunEDU__16sqleRocmNotifEduFv + 0xCD0
  [7] 0x0900000006EC1890 EDUDriver__9sqzEDUObjFv + 0x2F0
  [8] 0x0900000006DD84D4 sqloEDUEntry + 0x364
  [9] 0x09000000009E7FE8 _pthread_body + 0xE8
  [10] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF

2021-04-30-10.38.19.636936+540 I14515150A1630 LEVEL: Error
PID : 11076474 TID : 772 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
HOSTNAME: MEMBER01
EDUID : 772 EDUNAME: db2castructevent 0
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCaRefreshAdapterStatus, probe:7358
MESSAGE : ZRC=0x802700FC=-2144927492=SQLE_SAL_INV_PARM "Invalid input parameter"
DATA #1 : String, 69 bytes
Could not match this member's HCA with a device. Skip to the next HCA
DATA #2 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
1
DATA #3 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
128
DATA #4 : SAL CF Server Name, PD_TYPE_SAL_CF_SERVER_NAME, 8 bytes
CF-host
DATA #5 : SAL Member Device Name, PD_TYPE_SAL_MEMBER_DEVICE_NAME, 4 bytes
hca1
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x0900000007627104 sqleSingleCaRefreshAdapterStatus__21SQLE_SINGLE_CA_HANDLEFCb + 0xA34
  [1] 0x09000000076257FC sqleSingleCfOpenAndConnect__21SQLE_SINGLE_CA_HANDLEFCUi + 0x6DC
  [2] 0x090000000762D060 sqleSingleCaInitialize__21SQLE_SINGLE_CA_HANDLEFCUlCUi + 0x4F0
  [3] 0x09000000076144B0 sqleCaCpAddCa__17SQLE_CA_CONN_POOLFsCUiCPUl + 0x7A0
  [4] 0x0900000007679CE4 ROCM_StateCaInitMonitor__16sqleRocmNotifEduFv + 0xCA4
  [5] 0x0900000007677BC0 RunEDU__16sqleRocmNotifEduFv + 0xCD0
  [6] 0x0900000006EC1890 EDUDriver__9sqzEDUObjFv + 0x2F0
  [7] 0x0900000006DD84D4 sqloEDUEntry + 0x364
  [8] 0x09000000009E7FE8 _pthread_body + 0xE8
  [9] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF

After applying the fix for this problem, db2start will return SQL1659N when a member fails to open a specific RDMA device during db2start, and the error messages above will be reduced so that they no longer spam the db2diag.log. | |
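On a level without the fix, db2start gives no SQL1659N indication, so the db2diag.log is the only place the open failure shows up. The following is a minimal sketch, assuming the log text matches the excerpts quoted in this description. The helper name `summarize_rdma_failures` and the regular expression are illustrative assumptions, not part of any Db2 tooling.

```python
import re
from collections import Counter

# Matches the "PsOpen FAILURE" text quoted in this APAR and captures the
# CF host name and the RDMA device name. The pattern is an assumption based
# on the single excerpt shown here; other Db2 levels may word it differently.
PSOPEN_RE = re.compile(
    r"PsOpen FAILURE:\s*hostname:(?P<host>[^\s(]+)\(.*?\)\s*;\s*device:(?P<device>\S+)"
)

# Message text of the repeated warning quoted in this APAR.
NETNAME_MSG = "Could not find a match for the specified netname among CA tokens"


def summarize_rdma_failures(log_text):
    """Return (Counter keyed by (cf_host, device), count of netname-mismatch messages)."""
    failures = Counter(
        (m.group("host"), m.group("device")) for m in PSOPEN_RE.finditer(log_text)
    )
    return failures, log_text.count(NETNAME_MSG)


if __name__ == "__main__":
    # Sample text assembled from the excerpts above; in practice, read the
    # contents of the instance's db2diag.log instead.
    sample = (
        "DATA #1 : PsOpen FAILURE: hostname:CF-host(member#: 128, cfIndex: 1) ; "
        "device:hca1 ; caport:56001 ; transport: UDAPL\n"
        "MESSAGE : Could not find a match for the specified netname among CA tokens.\n"
        "MESSAGE : Could not find a match for the specified netname among CA tokens.\n"
    )
    failures, repeats = summarize_rdma_failures(sample)
    for (host, device), count in failures.items():
        print(f"{device} -> {host}: {count} open failure(s)")
    print(f"netname mismatch messages: {repeats}")
```

A nonzero failure count for a device, together with a growing number of netname mismatch messages, would be consistent with the symptom described here.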
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* pureScale users                                              *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Db2 11.5m7fp0 or higher                           *
**************************************************************** | |
Local Fix: | |
Solution | |
Workaround | |
Comment | |
First fixed in Db2 11.5m7fp0 | |
Timestamps | |
Date - problem reported : | 06.06.2021 |
Date - problem closed : | 01.12.2021 |
Date - last modified : | 01.12.2021 |
Problem solved at the following versions (IBM BugInfos) | |
Problem solved according to the fixlist(s) of the following version(s) |