Problem management is the standardized process for managing problems and known errors by identifying the root cause of an issue, finding a workaround, and implementing a permanent fix.
Problem Management Process
Problem management is performed in two modes:
- proactive problem management
- reactive problem management
Proactive problem management identifies, analyzes, and develops a resolution plan for recurring incidents or for incidents with no known solution. It anticipates future issues by analyzing the events, incidents, availability data, and capacity designs supplied by event management, incident management, availability management, and capacity management, and it identifies vulnerabilities that could turn into problems.
Reactive problem management performs the post-mortem investigation and diagnosis, and develops a resolution plan, after a service breakdown.
Objectives of Problem Management
- Identify workarounds for incidents.
- Perform root cause analysis (RCA) and identify permanent fixes for recurring incidents.
- Establish and maintain the Known Error Database (KEDB).
- Act as a next-level escalation point for unresolved issues in an incident management function.
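To make the KEDB objective more concrete, the following is a minimal Python sketch of a known error store with a workaround lookup. It is illustrative only: the `KnownError` fields, the `Kedb` class, and the simple keyword matching are assumptions made for this example and do not describe any particular ITSM tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class KnownError:
    """One KEDB entry (hypothetical fields, chosen for illustration)."""
    error_id: str
    symptoms: Set[str]                   # lowercase keywords describing the symptom
    root_cause: str
    workaround: str
    permanent_fix: Optional[str] = None  # stays None until a permanent fix is identified


@dataclass
class Kedb:
    """In-memory Known Error Database (a sketch, not a production design)."""
    entries: Dict[str, KnownError] = field(default_factory=dict)

    def add(self, entry: KnownError) -> None:
        # Adding an entry with an existing ID overwrites the old record.
        self.entries[entry.error_id] = entry

    def find_workarounds(self, incident_text: str) -> List[KnownError]:
        """Return entries that share at least one symptom keyword with the incident."""
        words = set(incident_text.lower().split())
        return [e for e in self.entries.values() if e.symptoms & words]


if __name__ == "__main__":
    kedb = Kedb()
    kedb.add(KnownError(
        error_id="KE-001",
        symptoms={"vpn", "timeout"},
        root_cause="Expired gateway certificate",
        workaround="Reconnect through the backup gateway",
    ))
    for match in kedb.find_workarounds("User reports VPN timeout after login"):
        print(match.error_id, "->", match.workaround)
```

A real KEDB would normally live in the service management tool and match on structured categories rather than free-text keywords. The sketch only shows the relationship between a known error, its workaround, and an incoming incident.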
|Detect the need for a problem analysis||The Problem Management process can be triggered from several sources, including:
Major Incident Management:
The Problem Management process may be triggered when a major incident is resolved with a workaround rather than a permanent solution, because such a Major Incident can occur again.
In line with the objective of Incident Management, the Incident Owner resolves an incident with a workaround to restore the service as soon as possible. This leaves a need for a problem analysis of the incident.
Recurring incidents, trend analysis, Pareto analysis, etc.:
With the help of the Problem Manager, each resolver group may identify recurring incidents and potential problem areas through periodic trend analysis, Pareto analysis, and other statistical analyses. This leads to a need for Problem Management.
|Record a problem ticket||If a unique problem is identified, then a problem record will be created with all the relevant details. The problem details should include information required for tracking and monitoring to ensure the problem is investigated in the shortest time possible based on the below criteria:
|Categorize and prioritize||The service desk categorizes the tickets based on the CTI. Then the problem tickets are prioritized based on the urgency and impact of the underlying incident.
The Problem ticket will be assigned to the correct Resolver Group, and the status of the ticket will be updated accordingly.
|Investigate the underlying issue||The Problem Owner and the Problem Manager will investigate further and gather more details around the Problem.|
|Conduct in-depth RCA||The Problem Owner, along with the Problem Manager, conducts an in-depth root cause analysis.
If the problem analysis was triggered by a Major Incident, the MIR is used as an input to the RCA.
The following parameters are used in conducting an RCA:
|Identify workaround||The Resolver Group works to find a workaround for the problem based on the in-depth RCA conducted in the previous step.
The incident that was kept pending for lack of a resolution is then reopened, the workaround is provided to the user, and the incident is closed.
The KEDB is updated with the workaround provided for the ticket.
|Identify permanent fix||The Problem Owner and Problem Manager will investigate whether the issue requires Supplier engagement.
If no Resolution is found, or the Resolution found is not cost-effective for the <Customer> organization, the Problem Manager convenes a discussion with the Customer SME and then updates the ticket with the decision taken on the future course of action for the problem.
If no change is needed to implement the resolution, the resolution is applied directly.
After the resolution is provided, the Problem record is updated with the details of the Problem, the discussion with <Customer>, and the resolution procedure.
|Inform user||If a user directly requested the Problem analysis, the Problem Owner informs that user of the resolution.|
|Close problem ticket||Once the Problem Manager ensures that all the necessary steps are taken towards Problem Closure, Problem Status is set to “Resolved.”
The problem ticket then gets the status “Closed.”
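The "Categorize and prioritize" step in the table above derives a ticket's priority from the impact and urgency of the underlying incident. One common way to do this is a lookup matrix. The sketch below assumes a three-level scale and a P1 to P4 labelling; the `Level` enum, the matrix values, and the `prioritize` function are hypothetical, and each organization defines its own mapping.

```python
from enum import IntEnum


class Level(IntEnum):
    """Three-level scale used for both impact and urgency (assumed)."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical impact/urgency matrix, keyed by (impact, urgency).
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): "P1",
    (Level.HIGH, Level.MEDIUM): "P2",
    (Level.MEDIUM, Level.HIGH): "P2",
    (Level.HIGH, Level.LOW): "P3",
    (Level.MEDIUM, Level.MEDIUM): "P3",
    (Level.LOW, Level.HIGH): "P3",
    (Level.MEDIUM, Level.LOW): "P4",
    (Level.LOW, Level.MEDIUM): "P4",
    (Level.LOW, Level.LOW): "P4",
}


def prioritize(impact: Level, urgency: Level) -> str:
    """Return the priority label for a problem ticket."""
    return PRIORITY_MATRIX[(impact, urgency)]


if __name__ == "__main__":
    # A problem behind a high-impact, medium-urgency incident.
    print(prioritize(Level.HIGH, Level.MEDIUM))
```

Keeping the mapping in a table rather than in nested conditionals makes it easy to review with the customer and to change without touching the logic.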
Triggers
- Results of Proactive Problem Management
- The occurrence of a Major Incident (Reactive Problem Management)
- Any unknown issue
Inputs
- Change implemented
- Known errors arising from the IT development teams and test environments
- Configuration Management
- Service Level Management
- Incident Management
- Availability Management
Outputs
- Request for Change (RFC)
- Resolution for the problem
- Knowledge Articles
- Trigger to Change Management
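The Pareto analysis that resolver groups use to spot recurring incidents, described earlier as one source of problem triggers, can be sketched in a few lines of Python. The function name `pareto_categories`, the default 80% threshold, and the sample categories are assumptions made for illustration; in practice the data would be exported from the incident management tool.

```python
from collections import Counter


def pareto_categories(incident_categories, threshold=0.8):
    """Return the most frequent categories that together reach `threshold`
    of all incidents, as (category, count, cumulative_share) tuples."""
    counts = Counter(incident_categories)
    total = sum(counts.values())
    result = []
    cumulative = 0
    # most_common() yields categories from most to least frequent.
    for category, count in counts.most_common():
        cumulative += count
        result.append((category, count, cumulative / total))
        if cumulative / total >= threshold:
            break
    return result


if __name__ == "__main__":
    # Hypothetical incident categories for one review period.
    sample = ["network"] * 5 + ["email"] * 3 + ["printer"] + ["vpn"]
    for category, count, share in pareto_categories(sample):
        print(f"{category}: {count} incidents, cumulative {share:.0%}")
```

The categories returned are the natural candidates for a problem record, since fixing their root causes removes the largest share of recurring incidents.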