Monday, 6 March 2017

WebSphere MQ Cluster Issues


1: A cluster sender channel is in retry state.
This is one of the most common issues in WebSphere MQ cluster environments.
For example:
1 : display chs(*)
AMQ8417: Display Channel Status details.
CHANNEL(TO.QM2) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
CONNAME(computer.ibm.com(1414))
CURRENT CHLTYPE(CLUSSDR)
STATUS(RETRYING)
Cause of the issue:
Either the remote queue manager is not available or there is an incorrect parameter in either the local manually defined cluster sender channel definition or the remote cluster receiver definition.
Checks to be done:
a) First check whether the problem is the availability of the remote queue manager. Are there any error messages?
b) Is the queue manager active?
c) Is the queue manager's listener running?
d) Is the queue manager's channel able to start?
If the remote queue manager is available there could be a problem with a channel definition. Check the definition type of the cluster queue manager for the channel in retry state. For example:
1 : dis clusqmgr(*) deftype where(channel eq TO.QM2)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM2) CHANNEL(TO.QM2) CLUSTER(DEMO)
DEFTYPE(CLUSSDRA)
If the definition type is CLUSSDR the channel is using the local manual cluster sender definition. Alter any incorrect parameters in the local manual cluster sender definition and restart the channel.
If the definition type is either CLUSSDRA or CLUSSDRB the channel is using an auto-defined cluster sender channel based on the definition of a remote cluster receiver channel. Alter any incorrect parameters in the remote cluster receiver definition. For example, the conname may be incorrect.
1 : alter chl(to.qm2) chltype(clusrcvr) conname('newhost(1414)')
AMQ8016: WebSphere MQ channel changed.
Changes to the remote cluster receiver definition will be propagated out to any cluster queue managers that are interested and the corresponding auto-defined channels will be updated accordingly. You can check that the updates have been propagated correctly by checking the changed parameter. For example:
1 : dis clusqmgr(qm2) conname
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM2) CHANNEL(TO.QM2) CLUSTER(CLUSDEMO) CONNAME(newhost(1414))
If the auto-defined definition is now correct, restart the channel.
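The checks in this section can be partly scripted. The following is a minimal Python sketch, not an IBM tool: it assumes you have captured the text output of the two runmqsc commands shown above, and the helper names (parse_attrs, parse_mqsc, advice) are made up for illustration. It finds cluster sender channels in RETRYING state and reports which definition the text above says to correct for each one.

```python
import re

KEY_OPEN = re.compile(r"([A-Z][A-Z0-9]*)\(")

# Which definition to correct, per the DEFTYPE rules described above.
WHERE_TO_FIX = {
    "CLUSSDR": "local manual CLUSSDR definition",
    "CLUSSDRA": "remote CLUSRCVR definition",
    "CLUSSDRB": "remote CLUSRCVR definition",
}

def parse_attrs(line):
    """Extract KEY(VALUE) pairs; VALUE may hold nested parentheses,
    as in CONNAME(computer.ibm.com(1414))."""
    attrs, pos = {}, 0
    while True:
        match = KEY_OPEN.search(line, pos)
        if match is None:
            return attrs
        depth, end = 1, match.end()
        while end < len(line) and depth:
            depth += {"(": 1, ")": -1}.get(line[end], 0)
            end += 1
        # Drop the closing parenthesis only if the value was balanced.
        value_end = end - 1 if depth == 0 else end
        attrs[match.group(1)] = line[match.end():value_end]
        pos = end

def parse_mqsc(output):
    """Return one dict per AMQnnnn response block in captured runmqsc output."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if re.match(r"AMQ\d{4}", line):
            records.append({})
        elif records and not re.match(r"\d+ : ", line):  # skip command echoes
            records[-1].update(parse_attrs(line))
    return records

def advice(chstatus_output, clusqmgr_output):
    """Map each retrying cluster sender channel to the definition to correct."""
    deftypes = {r["CHANNEL"]: r.get("DEFTYPE")
                for r in parse_mqsc(clusqmgr_output) if "CHANNEL" in r}
    result = {}
    for rec in parse_mqsc(chstatus_output):
        if rec.get("CHLTYPE") == "CLUSSDR" and rec.get("STATUS") == "RETRYING":
            channel = rec.get("CHANNEL")
            result[channel] = WHERE_TO_FIX.get(deftypes.get(channel),
                                               "definition type unknown")
    return result

# Sample text copied from the transcripts in this section.
CHSTATUS = """\
1 : display chs(*)
AMQ8417: Display Channel Status details.
CHANNEL(TO.QM2) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
CONNAME(computer.ibm.com(1414))
CURRENT CHLTYPE(CLUSSDR)
STATUS(RETRYING)
"""

CLUSQMGR = """\
1 : dis clusqmgr(*) deftype where(channel eq TO.QM2)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM2) CHANNEL(TO.QM2) CLUSTER(DEMO)
DEFTYPE(CLUSSDRA)
"""

print(advice(CHSTATUS, CLUSQMGR))
```

The parser counts parentheses instead of using a simple pattern because values such as CONNAME contain a nested port number in parentheses.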




2: DISPLAY CLUSQMGR shows CLUSQMGR names starting with SYSTEM.TEMP.

For example:
1 : display clusqmgr(*)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(SYSTEM.TEMPUUID.computer.hursley.ibm.com(1414))
CLUSTER(DEMO) CHANNEL(TO.QM2)
Cause:
This queue manager has not received any information from the full repository queue manager that the manually defined CLUSSDR channel points to.
Checks and Issue Rectification:
Check that the manually defined CLUSSDR channel is in running state. Check that the CLUSRCVR definition is also correct, especially its CONNAME and CLUSTER parameters, and alter the channel definition if necessary. It might take some time for the remote queue managers to complete a retry cycle and restart their channels with the corrected definition.
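To spot these temporary entries in a large cluster, the DISPLAY CLUSQMGR output can be scanned with a short script. This is only a sketch in Python: it assumes the output has been captured as text, and the function name temporary_entries is hypothetical. It lists each SYSTEM.TEMP name with the channel it belongs to, which is the channel to check.

```python
import re

def temporary_entries(output):
    """Return (temporary name, channel) pairs for SYSTEM.TEMP* entries."""
    found = []
    # Each response block starts with an AMQ8441 header line.
    for chunk in re.split(r"AMQ8441:[^\n]*\n", output)[1:]:
        name = re.search(r"CLUSQMGR\((SYSTEM\.TEMP\S*)\)", chunk)
        channel = re.search(r"CHANNEL\(([^()\s]+)\)", chunk)
        if name:
            found.append((name.group(1),
                          channel.group(1) if channel else None))
    return found

# Sample text copied from the transcript in this section.
SAMPLE = """\
1 : display clusqmgr(*)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(SYSTEM.TEMPUUID.computer.hursley.ibm.com(1414))
CLUSTER(DEMO) CHANNEL(TO.QM2)
"""

for name, channel in temporary_entries(SAMPLE):
    print(f"{channel}: no repository information received yet ({name})")
```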



3: Applications get rc=2035 MQRC_NOT_AUTHORIZED when trying to open a queue in the cluster.
Cause:
If your application gets MQRC_NOT_AUTHORIZED when trying to open a queue in a cluster, but the authorization for that queue is correct, it is likely that the application is not authorized to put to the cluster transmission queue.
Checks and Rectifications:
MQS_REPORT_NOAUTH:
Setting this environment variable helps to diagnose return code 2035 (MQRC_NOT_AUTHORIZED). It causes errors to be written to the queue manager error log, but does not generate a Failure Data Capture (FDC) file.
MQSAUTHERRORS:
Setting this environment variable generates FDC files related to return code 2035 (MQRC_NOT_AUTHORIZED). It generates an FDC, but does not write errors to the queue manager error log.
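Once the missing authority is confirmed, the usual remedy is to grant the application's group put authority on the cluster transmission queue with setmqaut. The sketch below only builds the command line; it does not run it. The queue manager name QM1 and the group name appgrp are placeholders, and granting +put alone is an assumption about what the application needs, so adjust it to your own security policy.

```python
import shlex

CLUSTER_XMITQ = "SYSTEM.CLUSTER.TRANSMIT.QUEUE"

def grant_put_command(qmgr, group):
    """Build a setmqaut command line that grants +put on the cluster
    transmission queue to one group."""
    argv = ["setmqaut", "-m", qmgr, "-n", CLUSTER_XMITQ,
            "-t", "queue", "-g", group, "+put"]
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in argv)

# QM1 and appgrp are illustrative names, not values from this article.
print(grant_put_command("QM1", "appgrp"))
```

Note that -m must be followed by the queue manager name; leaving it out makes setmqaut treat the next flag as the name.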



4: Applications get rc=2085 MQRC_UNKNOWN_OBJECT_NAME when trying to open a queue in the cluster.
Cause:
The queue manager where the object exists, or this queue manager, may not have successfully entered the cluster. Make sure that they can each display all of the full repositories in the cluster. Also make sure that the CLUSSDR channels to the full repositories are not in retry state.
For example:
1 : display clusqmgr(*) qmtype status
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1) QMTYPE(NORMAL)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM2) CLUSTER(DEMO)
CHANNEL(TO.QM2) QMTYPE(REPOS)
STATUS(RUNNING)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM3) CLUSTER(DEMO)
CHANNEL(TO.QM3) QMTYPE(REPOS)
STATUS(RUNNING)

If the queue is correctly in the cluster, check that you have used appropriate open options. You cannot GET from a remote cluster queue, so make sure that the open options are for output only.
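The full repository check described above can be automated along these lines. This is a rough Python sketch over captured DISPLAY CLUSQMGR output, with a hypothetical function name (repository_problems). It assumes the command was run on a partial repository, because a queue manager's entry for itself shows no running channel status and would be reported as a problem. The simple pattern used here also skips values that contain nested parentheses, such as SYSTEM.TEMP names.

```python
import re

def cluster_qmgrs(output):
    """Return one attribute dict per AMQ8441 response block."""
    return [dict(re.findall(r"([A-Z]+)\(([^()]*)\)", chunk))
            for chunk in re.split(r"AMQ8441:[^\n]*\n", output)[1:]]

def repository_problems(output):
    """List full repositories whose channel is not reported as RUNNING."""
    repos = [q for q in cluster_qmgrs(output) if q.get("QMTYPE") == "REPOS"]
    if not repos:
        return ["no full repositories are visible from this queue manager"]
    return [f"{q.get('CLUSQMGR')}: channel {q.get('CHANNEL')} is "
            f"{q.get('STATUS', 'not reported')}"
            for q in repos if q.get("STATUS") != "RUNNING"]

# Sample text copied from the transcript in this section.
SAMPLE = """\
1 : display clusqmgr(*) qmtype status
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1) QMTYPE(NORMAL)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM2) CLUSTER(DEMO)
CHANNEL(TO.QM2) QMTYPE(REPOS)
STATUS(RUNNING)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM3) CLUSTER(DEMO)
CHANNEL(TO.QM3) QMTYPE(REPOS)
STATUS(RUNNING)
"""

print(repository_problems(SAMPLE) or "all full repositories reachable")
```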




5: Applications get rc=2189 MQRC_CLUSTER_RESOLUTION_ERROR when trying to open a queue in the cluster.

Cause:
The queue is being opened for the first time and the queue manager cannot make contact with any full repositories. Make sure that the CLUSSDR channels to the full repositories are not in retry state.
1 : display clusqmgr(*) qmtype status
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1) QMTYPE(NORMAL)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM2) CLUSTER(DEMO)
CHANNEL(TO.QM2) QMTYPE(REPOS)
STATUS(RUNNING)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM3) CLUSTER(DEMO)
CHANNEL(TO.QM3) QMTYPE(REPOS)
STATUS(RUNNING)

If the queue is correctly in the cluster, check that you have used appropriate open options. You cannot GET from a remote cluster queue, so make sure that the open options are for output only.






6: Messages are not appearing on the destination queues.

Cause:
The messages may be stuck at their origin queue manager. Make sure that the SYSTEM.CLUSTER.TRANSMIT.QUEUE is empty and also that the channel to the destination queue manager is running.
For example:
1 : display ql(SYSTEM.CLUSTER.TRANSMIT.QUEUE) curdepth
AMQ8409: Display Queue details.
QUEUE(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH(0)

2 : display chs(TO.QM2)
AMQ8417: Display Channel Status details.
CHANNEL(TO.QM2) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
CONNAME(comfrey.hursley.ibm.com(1415))
CURRENT CHLTYPE(CLUSSDR)
STATUS(RUNNING)
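The two checks above can be combined into one quick diagnosis. The Python sketch below is illustrative only: it works on captured output text, and the function names and the wording of the results are invented for this example. Keep in mind that SYSTEM.CLUSTER.TRANSMIT.QUEUE is shared by all cluster sender channels, so a non-zero depth does not by itself prove that the messages are waiting for this particular channel.

```python
import re

def attribute(output, name):
    """Return the value of NAME(value) from captured runmqsc output."""
    match = re.search(rf"\b{name}\(([^()]*)\)", output)
    return match.group(1) if match else None

def diagnose(queue_output, chstatus_output):
    """Combine transmit queue depth and channel status into one finding."""
    depth = attribute(queue_output, "CURDEPTH")
    status = attribute(chstatus_output, "STATUS") or "not reported"
    if depth is None:
        return "queue depth not found in output"
    if int(depth) == 0:
        return f"transmit queue empty, channel {status}"
    if status != "RUNNING":
        return f"{depth} messages waiting, channel {status}: start the channel"
    return f"{depth} messages waiting, channel RUNNING: may be for other channels"

# Sample text copied from the transcripts in this section.
QUEUE = """\
1 : display ql(SYSTEM.CLUSTER.TRANSMIT.QUEUE) curdepth
AMQ8409: Display Queue details.
QUEUE(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH(0)
"""

CHANNEL = """\
2 : display chs(TO.QM2)
AMQ8417: Display Channel Status details.
CHANNEL(TO.QM2) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
CONNAME(comfrey.hursley.ibm.com(1415))
CURRENT CHLTYPE(CLUSSDR)
STATUS(RUNNING)
"""

print(diagnose(QUEUE, CHANNEL))
```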


7: Return code=2082 MQRC_UNKNOWN_ALIAS_BASE_Q opening a queue in the cluster.

Cause and Rectification:
An MQOPEN or MQPUT1 call was issued specifying an alias queue as the target, but the BaseQName in the alias queue attributes is not recognized as a queue name.
This reason code can also occur when BaseQName is the name of a cluster queue that cannot be resolved successfully.
MQRC_UNKNOWN_ALIAS_BASE_Q might also indicate that the application is specifying, in ObjectQmgrName, the name of the queue manager that it is connected to and that hosts the alias queue. The queue manager then looks for the alias target queue on the specified queue manager and fails, because the alias target queue is not defined on the local queue manager. Leave the ObjectQmgrName parameter blank so that clustering decides which queue manager to route to.

8: Messages put to a cluster alias queue go to SYSTEM.DEAD.LETTER.QUEUE.

Symptom:
Messages put to an alias queue go to SYSTEM.DEAD.LETTER.QUEUE with reason MQRC_UNKNOWN_ALIAS_BASE_Q.
Cause:
A message is routed to a queue manager where a clustered alias queue is defined. A local target queue is not defined on that queue manager. Because the message was put with the MQOO_BIND_ON_OPEN open option, the queue manager cannot requeue the message.

When MQOO_BIND_ON_OPEN is used, the cluster queue alias is firmly bound. The resolved name is the name of the target queue and any queue manager on which the cluster queue alias is defined. The queue manager name is placed in the transmission queue header. If the target queue does not exist on the queue manager to which the message is sent, the message is put on the dead letter queue. The destination is not recomputed, because the transmission header contains the name of the target queue manager resolved by MQOO_BIND_ON_OPEN. If the alias queue had been opened with MQOO_BIND_NOT_FIXED, the transmission queue header would contain a blank queue manager name and the destination would be recomputed. In that case, if the local queue is defined elsewhere in the cluster, the message would be sent there.

Solution:
a) Change all alias queue definitions to specify DEFBIND(NOTFIXED).
b) Use MQOO_BIND_NOT_FIXED as an open option when the queue is opened.
c) If you specify MQOO_BIND_ON_OPEN, ensure that the cluster alias resolves to a local queue defined on the same queue manager as the alias.
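The open option advice in this article (output only, and not firmly bound) can be checked before the MQOPEN call is made. The Python sketch below uses no MQ client library: it defines the MQOO_* values locally, as found in the MQ cmqc.h header, and the function review_open_options is a hypothetical helper. Its notes are hints rather than errors, because reading is valid when the queue has a local instance.

```python
# Open option values as defined in the MQ cmqc.h header.
MQOO_INPUT_AS_Q_DEF = 0x00000001
MQOO_INPUT_SHARED = 0x00000002
MQOO_INPUT_EXCLUSIVE = 0x00000004
MQOO_BROWSE = 0x00000008
MQOO_OUTPUT = 0x00000010
MQOO_FAIL_IF_QUIESCING = 0x00002000
MQOO_BIND_ON_OPEN = 0x00004000
MQOO_BIND_NOT_FIXED = 0x00008000

READ_OPTIONS = (MQOO_INPUT_AS_Q_DEF | MQOO_INPUT_SHARED |
                MQOO_INPUT_EXCLUSIVE | MQOO_BROWSE)

def review_open_options(options):
    """Return notes about open options that are risky for a cluster queue."""
    notes = []
    if options & READ_OPTIONS:
        notes.append("input or browse requested: "
                     "a remote cluster queue cannot be read")
    if not options & MQOO_OUTPUT:
        notes.append("MQOO_OUTPUT is not set")
    on_open = bool(options & MQOO_BIND_ON_OPEN)
    not_fixed = bool(options & MQOO_BIND_NOT_FIXED)
    if on_open and not_fixed:
        notes.append("both bind options set: choose one")
    elif on_open:
        notes.append("MQOO_BIND_ON_OPEN: destination is fixed at open time")
    elif not not_fixed:
        notes.append("no bind option: the queue's DEFBIND attribute decides")
    return notes

# Solution b) above: output only, destination recomputed per message.
recommended = MQOO_OUTPUT | MQOO_BIND_NOT_FIXED | MQOO_FAIL_IF_QUIESCING
print(review_open_options(recommended))
print(review_open_options(MQOO_OUTPUT | MQOO_BIND_ON_OPEN))
```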

9: A queue manager does not appear to have up to date information about queues and channels in the cluster.
Cause:
Check that the queue manager where the object exists and this queue manager are still connected to the cluster. Make sure that they can each display all of the full repositories in the cluster. Also make sure that the CLUSSDR channels to the full repositories are not in retry state, as above.
Next make sure that the full repositories have enough CLUSSDR channels defined to correctly connect them together. The updates to the cluster only flow between the full repositories over manually defined CLUSSDR channels. After the cluster has formed these will show as DEFTYPE(CLUSSDRB) channels because they are both manual and automatic channels. There must be enough of these to form a complete network among all of the full repositories.
For example:
1 : dis clusqmgr(*) deftype qmtype status
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1) DEFTYPE(CLUSSDRA) QMTYPE(NORMAL) STATUS(RUNNING)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM2) CLUSTER(DEMO)
CHANNEL(TO.QM2) DEFTYPE(CLUSRCVR)
QMTYPE(REPOS)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM3) CLUSTER(DEMO)
CHANNEL(TO.QM3) DEFTYPE(CLUSSDRB)
QMTYPE(REPOS) STATUS(RUNNING)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM4) CLUSTER(DEMO)
CHANNEL(TO.QM4) DEFTYPE(CLUSSDRA)
QMTYPE(NORMAL) STATUS(RUNNING)

10: No changes in the cluster are being reflected in the local queue manager.

Cause:
The repository manager process is not processing repository commands. Check that the SYSTEM.CLUSTER.COMMAND.QUEUE is empty.
1 : display ql(SYSTEM.CLUSTER.COMMAND.QUEUE) curdepth
AMQ8409: Display Queue details.
QUEUE(SYSTEM.CLUSTER.COMMAND.QUEUE) CURDEPTH(0)

Next, on z/OS®, check that the channel initiator is running. Then check that there are no error messages in the error logs indicating that the queue manager has a temporary resource shortage.

11: DISPLAY CLUSQMGR shows a queue manager twice.
For example:
1 : display clusqmgr(QM1) qmid
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1) QMID(QM1_2002-03-04_11.07.01)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1) QMID(QM1_2002-03-04_11.04.19)

Cause:
1) The queue manager might have been deleted, and then recreated and redefined, without following the correct procedure.
2) The queue manager might have been cold started on z/OS® without following the correct procedure.
Solution:
The cluster functions correctly, with the older version of the queue manager being ignored, until it ages out of the cluster completely after about 90 days. To remove all trace of the queue manager immediately, use the RESET CLUSTER command from a full repository queue manager to remove the older, unwanted queue manager and its queues from the cluster.
reset cluster(DEMO) qmid('QM1_2002-03-04_11.04.19') action(FORCEREMOVE) queues(yes)
AMQ8559: RESET CLUSTER accepted.
Using the RESET CLUSTER command stops auto-defined cluster sender channels for the affected queue manager. You must manually restart any cluster sender channels that are stopped, after completing the RESET CLUSTER command.
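When several queue managers are duplicated, picking the right QMID by eye is error-prone. The following Python sketch, using hypothetical helper names, groups captured DISPLAY CLUSQMGR(*) QMID output by queue manager name and builds a RESET CLUSTER command for every QMID except the newest. It assumes that the QMID ends in a creation timestamp in the form shown above and that the newest QMID is the live queue manager. Verify both before issuing anything it prints.

```python
import re
from collections import defaultdict
from datetime import datetime

def qmids_by_name(output):
    """Group the QMIDs in captured output by cluster queue manager name."""
    names = defaultdict(list)
    for chunk in re.split(r"AMQ8441:[^\n]*\n", output)[1:]:
        name = re.search(r"CLUSQMGR\(([^()]+)\)", chunk)
        qmid = re.search(r"QMID\(([^()]+)\)", chunk)
        if name and qmid:
            names[name.group(1)].append(qmid.group(1))
    return names

def qmid_timestamp(qmid):
    """Parse the trailing _YYYY-MM-DD_HH.MM.SS part of a QMID."""
    date_part, time_part = qmid.rsplit("_", 2)[1:]
    return datetime.strptime(f"{date_part}_{time_part}", "%Y-%m-%d_%H.%M.%S")

def stale_qmids(output):
    """For names seen more than once, return every QMID except the newest."""
    return {name: sorted(qmids, key=qmid_timestamp)[:-1]
            for name, qmids in qmids_by_name(output).items()
            if len(qmids) > 1}

def reset_commands(output, cluster):
    """Build one RESET CLUSTER command per stale QMID (for review only)."""
    return [f"RESET CLUSTER({cluster}) QMID('{qmid}') "
            "ACTION(FORCEREMOVE) QUEUES(YES)"
            for qmids in stale_qmids(output).values() for qmid in qmids]

# Sample text copied from the transcript in this section.
SAMPLE = """\
1 : display clusqmgr(QM1) qmid
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1) QMID(QM1_2002-03-04_11.07.01)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(QM1) CLUSTER(DEMO)
CHANNEL(TO.QM1) QMID(QM1_2002-03-04_11.04.19)
"""

for command in reset_commands(SAMPLE, "DEMO"):
    print(command)
```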


12: RESET CLUSTER and REFRESH CLUSTER commands were issued, but the queue manager would not rejoin the cluster.

Cause:
A side effect of the RESET and REFRESH commands may be that a channel is stopped, so that the correct version of the channel runs when the command is completed.
Solution:
Check that the channels between the problem queue manager and the full repositories are running, and use the START CHANNEL command if necessary.


3 comments:

  1. I have had the same issue. checked all the channels, listeners and everything. Recreated them only to discover later that there is one more step which was not mentioned anywhere, that was to give rights to SYSTEM.CLUSTER.TRANSMIT.QUEUE

    setmqaut -m -n SYSTEM.CLUSTER.TRANSMIT.QUEUE -t q -g mqm +all

  2. Hi
    We could see cluster receiver channel is in stopped state and to reduce the impact we have started. It is up and running.
    When we verified the MQ logs nothing is recorded and my Team did not stop manually .

    What is likely cause of the channel in stopped state . Please suggest us.

  3. Hi Siva Garu,
    Can I have your number pls...
