Any major incident can put severe pressure on the people involved, including the incident manager. Naturally, people forget steps or fumble during a crisis. Incident management processes and policies cover a lot of detail, but they tend to be lengthy documents.
What Is a Major Incident Management Guide, and Why Is It Important?
Having an incident management guide helps the team start managing the incident quickly, without reading lengthy documents or depending on someone else. The guide helps an incident manager (or anyone working the incident) handle it effectively without reading an entire process document.
The incident management guide is based on the theory that the first hour of the incident is the most crucial time. If the incident manager can follow the process, establish the required teams, and keep the stakeholders informed, it is a big win.
Therefore, the guide focuses on tasks to be done in the first 15 minutes, 30 minutes, and 60 minutes. After 60 minutes, the guide suggests providing regular updates till the incident is resolved. The guide focuses on providing practical steps that the incident manager should follow.
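The time-boxed structure described above can be sketched in a few lines of Python. This is only an illustration, not part of any particular tool: the function name `current_phase`, the phase labels, and the choice to treat each boundary as inclusive are assumptions made for the example.

```python
def current_phase(minutes_elapsed: float) -> str:
    """Return the guide phase for the minutes elapsed since the incident began.

    Boundaries (15, 30, 60) follow the guide; treating each boundary as
    inclusive is an assumption made for this sketch.
    """
    if minutes_elapsed < 0:
        raise ValueError("minutes_elapsed must be non-negative")
    if minutes_elapsed <= 15:
        return "first 15 minutes"
    if minutes_elapsed <= 30:
        return "first 30 minutes"
    if minutes_elapsed <= 60:
        return "first 60 minutes"
    # Past the first hour, the guide switches to regular updates.
    return "regular updates until resolved"
```

For example, `current_phase(42)` returns `"first 60 minutes"`, which tells the incident manager which part of the guide to consult.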
What Should a Major Incident Management Guide Template Include?
First 15 minutes
- The first 15 minutes focus on the details of the issue, impacted stakeholders, creating tickets, and forming the incident response team – these are the basics of incident management.
- The structure of the incident response team will depend on the resourcing capacity of the organization.
- Some organizations have a dedicated incident manager; in others, the role is shared among various people.
- One of the critical steps in the first 15 minutes is engaging the stakeholders.
- Your stakeholders or customers must know that the incident is being managed. Communicating early in the incident management cycle shows that the incident is being handled with priority and urgency.
First 30 minutes
- After the initial setup is complete, the focus should be on understanding the root cause and identifying any fixes or workarounds.
- After 15 – 20 minutes, the incident response team should have more details about the issue.
- If not much progress has happened in the first 15 – 20 minutes, it is time to either escalate to managers or get more resources involved.
- After the first 30 minutes, there should be additional information that can be shared with the stakeholders.
- Again, the critical step here is to engage with the stakeholders by providing additional information and making the entire incident management process transparent.
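The escalation rule in this phase can be expressed as a simple check. The sketch below is hypothetical: the `IncidentStatus` fields, the 20-minute threshold (the upper end of the 15 to 20 minute window), and the definition of "progress" as having identified either a cause or a workaround are all assumptions, and a real team would substitute its own criteria.

```python
from dataclasses import dataclass

# Assumption: use the upper end of the guide's 15-20 minute window.
ESCALATION_THRESHOLD_MINUTES = 20


@dataclass
class IncidentStatus:
    """Hypothetical snapshot of an incident at a point in time."""
    minutes_elapsed: int
    cause_identified: bool
    workaround_identified: bool


def should_escalate(status: IncidentStatus) -> bool:
    """Return True when the threshold has passed with no progress.

    "Progress" here means a cause or a workaround has been identified;
    this definition is an assumption made for the sketch.
    """
    no_progress = not (status.cause_identified or status.workaround_identified)
    return status.minutes_elapsed >= ESCALATION_THRESHOLD_MINUTES and no_progress
```

Encoding the rule this way keeps the decision explicit: once the threshold is reached without progress, the incident manager escalates to managers or brings in more resources instead of waiting.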
First 60 minutes
- If the incident is not resolved within the first hour, then additional steps are needed.
- After the first hour, the response team should have a better understanding of the issue and should be able to suggest workarounds.
- The workarounds should be aimed at reducing business impact. They can consist of manual or additional steps that allow critical business processes to continue.
- Again, communication with stakeholders is key, as the incident is still in progress.
- Another critical decision that needs to be made is about business continuity. If the incident is stopping a vital business process, then implementing business continuity measures should be considered. The business continuity policy will have verified methods of performing business-critical tasks using alternate techniques or technologies.
- After the first hour of the incident, updates should go out at regular intervals, typically every 30 or 60 minutes, depending on stakeholder expectations.
- If business continuity activities are in progress, then the incident response team may need to support them.
- If the first hour has passed and no potential workarounds or fixes have been identified, the incident manager should consider engaging executive management.
- If there is an impact on customers, then relevant parties need to be contacted to arrange customer updates.
Incident resolution
- Incident resolution is the last phase of incident management.
- Once the fix is confirmed, the incident manager should immediately inform the stakeholders that the issue is fixed. Before the communication goes out, the incident response team needs to test and verify the fix to ensure that the issue is really resolved.
- The incident manager needs to update the incident ticket with all the details, including timelines, root cause, and business impact. This information will be helpful when drafting the Post Incident Report (PIR).
- After the incident is officially closed, the next step is to conduct a post-incident review and document the feedback from all the teams involved in the incident. It is essential to get input from all the parties involved.
- The last step in the incident management process is to write the Post Incident Report (PIR) with all the details of the incident, root cause, lessons learned, actions, and any problems identified. The incident manager should log a problem ticket for any residual issues.
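The update cadence described earlier (every 30 or 60 minutes after the first hour, until the incident is resolved) can be planned with a small helper. This is a minimal sketch under stated assumptions: the function name `update_schedule` is hypothetical, and placing the first scheduled update exactly at the one-hour mark is a choice made for the example.

```python
from datetime import datetime, timedelta
from typing import List


def update_schedule(start: datetime, until: datetime,
                    interval_minutes: int = 30) -> List[datetime]:
    """List stakeholder update times from the one-hour mark up to `until`.

    Assumption: the first scheduled update falls exactly one hour after
    the incident starts, then repeats every `interval_minutes`.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    times = []
    next_update = start + timedelta(hours=1)
    while next_update <= until:
        times.append(next_update)
        next_update += timedelta(minutes=interval_minutes)
    return times
```

With a 09:00 start and an 11:00 horizon, a 30-minute interval yields updates at 10:00, 10:30, and 11:00. Passing `interval_minutes=60` gives the hourly cadence instead, depending on what the stakeholders expect.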