DB2 - Problem description

Problem IT05398 Status: Closed

RSCT WILL NOT NOTIFY DB2 THAT THE PORT IS DOWN WHEN WE MOVE THAT PORT TO ANOTHER DIFFERENT VLAN

Product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
Scenario:
All ports of the servers belong to the same VLAN (e.g. VLAN10).
If the RoCE0 port of one member (e.g. member0) is moved to a
different VLAN (e.g. VLAN11), then after about 5 minutes database
connections hang on the remaining members (e.g. member1 and
member2), while member0 continues to work normally.
 
EDUs on members 1 and 2 are waiting for lock
000E0000000000000000000076 (SQLP_VALLOCK), which is held by
member0. This causes the hang on members 1 and 2.
 
The db2diag.log file for member0 shows the db2CFConnPoolMgr 0 EDU
repeatedly logging sqleCaCeConnect, probe:720 and
sqleSingleCaCreateNewConnec, probe:2135 while connecting to the
primary CF from device hba0. The entries report "PsConnect
failed" and "Port state detected by RSCT to be online, but
encountered error":
 
2014-10-17-10.20.53.827735+480 I422919A2148         LEVEL: Severe
PID     : 16580738             TID : 24461          PROC : db2sysc 0
INSTANCE: instance             NODE : 000
HOSTNAME: host
EDUID   : 24461                EDUNAME: db2CFConnPoolMgr 0
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_CA_CONN_ENTRY_DATA::sqleCaCeConnect, probe:720
MESSAGE : CA RC= 2148073473
DATA #1 : String, 17 bytes
PsConnect failed.
DATA #2 : PsToken_t, PD_TYPE_SD_PSTOKEN, 152 bytes
Eye Catcher               = CATOKEN
CF Server Info :
 - Unique Sequence Number = 187 (0xbb)
 - Port Number            = 56001
 - Node Identifier        = 1
 - Instance Identifier    = 0
 - Netname                = netname-ib0
Local Member Info :
 - uDAPL Device           = ib0
Transport Type            = UDAPL (0x1)
Cmd Connection Use Types  = NORMAL (0x0)
DATA #3 : SAL CF Server Name, PD_TYPE_SAL_CF_SERVER_NAME, 13 bytes
host
DATA #4 : SAL Member Device Name, PD_TYPE_SAL_MEMBER_DEVICE_NAME, 4 bytes
ib0
DATA #5 : CF Retry Position, PD_TYPE_SAL_RETRY_COUNTER, 8 bytes
10
DATA #6 : unsigned integer, 8 bytes
1
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x09000000063B9D84 sqleSingleCaCreateNewConnectionsForPool__21SQLE_SINGLE_CA_HANDLEFCUlR12sqzDataChainXT18SQLE_CA_CONN_ENTRYT16sqzChainNodeBaseXT1 + 0x42C
  [1] 0x09000000063B9E04 sqleSingleCaCreateNewConnectionsForPool__21SQLE_SINGLE_CA_HANDLEFCUlR12sqzDataChainXT18SQLE_CA_CONN_ENTRYT16sqzChainNodeBaseXT1 + 0x4AC
  [2] 0x0900000006339C7C sqleSingleCaCreateNewConnectionsForPool__21SQLE_SINGLE_CA_HANDLEFCUlR12sqzDataChainXT18SQLE_CA_CONN_ENTRYT16sqzChainNodeBaseXT1 + 0xB70
  [3] 0x090000000502CB64 sqleSingleCaGrowPool__21SQLE_SINGLE_CA_HANDLEFCUlT1C17SAL_ADAPTER_INDEX + 0x6CC
  [4] 0x0900000007AD9654 sqleCFConnPoolMgrEntry__FPUcUi + 0x5C8
  [5] 0x0900000007ACEC90 sqleCFConnPoolMgrEntry__FPUcUi + 0x1B4
  [6] 0x0900000007ACE678 sqleCFConnPoolMgrEntry__FPUcUi + 0x110
  [7] 0x090000000644F9F0 sqloEDUEntry + 0x4B8
  [8] 0x0900000000782E10 _pthread_body + 0xF0
  [9] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
 
2014-10-17-10.20.53.830449+480 I425068A1808         LEVEL: Warning
PID     : 16580738             TID : 24461          PROC : db2sysc 0
INSTANCE: instance             NODE : 000
HOSTNAME: host
EDUID   : 24461                EDUNAME: db2CFConnPoolMgr 0
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCaCreateNewConnec, probe:2135
MESSAGE : Port state detected by RSCT to be online, but encountered error
          establishing a uDAPL connection.  Netname, m_whichCa,
          numOfflineAdapters, numConsecutiveFailures, CF node num,
          numConnections, bInitialConnections
DATA #1 : SAL CF Server Name, PD_TYPE_SAL_CF_SERVER_NAME, 13 bytes
host
DATA #2 : SAL Member Device Name, PD_TYPE_SAL_MEMBER_DEVICE_NAME, 4 bytes
ib0
DATA #3 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
2
DATA #4 : unsigned integer, 8 bytes
1
DATA #5 : unsigned integer, 8 bytes
0
DATA #6 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
129
DATA #7 : unsigned integer, 8 bytes
1
DATA #8 : Boolean, 8 bytes
false
DATA #9 : Codepath, 8 bytes
6:14:16
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x090000000633AAA8 sqleSingleCaCreateNewConnectionsForPool__21SQLE_SINGLE_CA_HANDLEFCUlR12sqzDataChainXT18SQLE_CA_CONN_ENTRYT16sqzChainNodeBaseXT1 + 0x199C
  [1] 0x090000000502CB64 sqleSingleCaGrowPool__21SQLE_SINGLE_CA_HANDLEFCUlT1C17SAL_ADAPTER_INDEX + 0x6CC
  [2] 0x0900000007AD9654 sqleCFConnPoolMgrEntry__FPUcUi + 0x5C8
  [3] 0x0900000007ACEC90 sqleCFConnPoolMgrEntry__FPUcUi + 0x1B4
  [4] 0x0900000007ACE678 sqleCFConnPoolMgrEntry__FPUcUi + 0x110
  [5] 0x090000000644F9F0 sqloEDUEntry + 0x4B8
  [6] 0x0900000000782E10 _pthread_body + 0xF0
  [7] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
 
The ibstat output confirms that the port state is reported as "UP":
---------------------------------------------------------------- 
ETHERNET PORT 1 INFORMATION (roce0) 
---------------------------------------------------------------- 
 Link State: UP 
 Link Speed: 10G XFI 
 Link MTU: 9600 
 Hardware Address: f4:52:14:cf:4a:da 
 GIDS (up to 3 GIDs): 
 GID0 :00:00:00:00:00:00:00:00:00:00:f4:52:14:cf:4a:da 
 GID1 :00:00:00:00:00:00:00:00:00:00:ff:ff:0a:de:01:65 
 GID2 :00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 
 
All EDUs kept trying to reconnect to the CF through hba0 and
never attempted to use hba1.
 
DB2 relies on RSCT to detect network adapter status. RSCT treats
a port whose link state is UP as available and notifies DB2 that
the port is "UP". In this case, however, the VLAN change isolates
the port from the CF, so the port should be reported as INACTIVE.
The expected behavior is that all EDUs reconnect to the CF
through hba1.
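The gap between the observed and the expected behavior can be illustrated with a toy model of adapter selection. The following Python sketch is purely illustrative and does not reflect actual DB2 or RSCT internals. The Adapter class, its link_up and cf_reachable fields, and the pick_adapter function are hypothetical. A health check based only on link state keeps selecting hba0. A check that also requires the CF to be reachable falls back to hba1.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    name: str
    link_up: bool       # physical link state, as ibstat reports it
    cf_reachable: bool  # hypothetical: can the CF actually be reached via this port?


def link_state_healthy(a: Adapter) -> bool:
    # Observed behavior: a port whose link is UP is treated as usable.
    return a.link_up


def reachability_healthy(a: Adapter) -> bool:
    # Expected behavior: the port must also be able to reach the CF.
    return a.link_up and a.cf_reachable


def pick_adapter(adapters, healthy):
    """Return the name of the first adapter that passes the check, or None."""
    for a in adapters:
        if healthy(a):
            return a.name
    return None


# member0 after its port is moved to another VLAN:
# hba0 still has link UP but can no longer reach the CF.
adapters = [
    Adapter("hba0", link_up=True, cf_reachable=False),
    Adapter("hba1", link_up=True, cf_reachable=True),
]

print(pick_adapter(adapters, link_state_healthy))    # hba0
print(pick_adapter(adapters, reachability_healthy))  # hba1
```

The point of the model is only that link state alone cannot detect VLAN isolation. Some form of end-to-end reachability information is needed to trigger failover to the second adapter.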
 
This scenario was not covered in lab testing and was not
considered in the original design, which led to the current
problem.
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* Members hang                                                 * 
**************************************************************** 
* PROBLEM DESCRIPTION:                                         * 
* See Error Description                                        * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to V10.5fp7                                          * 
****************************************************************
Local Fix:
Solution
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 06.11.2014
Date  - problem closed      : 07.01.2016
Date  - last modified       : 07.01.2016
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
10.5.0.7 FixList