Command Center IT Engineer
Job Description
About the company
Albertsons Companies is at the forefront of the revolution in retail. With a fixation on innovation and building culture, our team is rallying our company around a unique vision: forging a retail winner that is admired for national strength, deep roots in the communities we serve, and a team that has passion for food and delivering great service.
Albertsons is one of the largest retail employers, providing approximately 300,000 jobs across 2,200 stores, 22 distribution centers, 20 food and beverage plants and various support offices. We operate in 34 states and the District of Columbia under the Albertsons banner, as well as Safeway, Tom Thumb, Jewel-Osco, Shaw’s and many more recognizable names.
Albertsons Companies recently rolled out our Presence with a Purpose work model. Placing a premium on adaptability, safety and family well-being, Presence with a Purpose will help us build a hybrid work environment between remote work and office time. A one-size-fits-all approach does not apply to everyone, and teams are allowed to make decisions that are best for them.
What you will be doing
The Incident Manager (IM) is responsible for the effective restoration of services during a critical outage or service interruption. The IM implements major incident guidelines for restoring services to normal and carries out the corresponding reporting on key metrics. The IM represents the first stage of escalation for major incidents.
The position will be based in Phoenix, Arizona or Plano, Texas.
Main responsibilities
- Restore normal service operation as quickly as possible and minimize the impact on business operations, ensuring that the best possible levels of service quality and availability are maintained.
- Facilitate outage calls and ensure that all required resources are engaged to work a mission-critical incident.
- Leverage ServiceNow Performance Analytics and ServiceNow reports to identify repeat incidents.
- Manage queues within ServiceNow and monitor all open and aging cases daily.
- Execute proactive and reactive problem management analysis to minimize future problems.
- Monitor trouble tickets to identify impacted system and application outages that require follow-up analysis.
- Facilitate Post Mortem Reviews on major outages and track problem tasks to completion.
- Ensure effective communication is maintained with Executives and Business Leadership during a mission-critical incident.
- Drives service restoration for all major service interruptions.
- Works toward continuous operational and process improvement while maintaining 100% compliance with quality and legal standards.
- Must have a high degree of technical knowledge to understand the environment and provide Management updates when needed.
- Understands the IT impact on the business and raises alternative workarounds.
- Responsible for technical and service monitoring, detection and incident handling for all technology-related incidents.
- Collaborates with other teams and customers to improve service and increase value of Command Center Operations.
- Reports to Management on service interruptions and impact to Customer base.
- Support Mission Critical Incident Management reporting (KPIs and customer SLAs).
- Assists the Mission Critical Incident Management Process Owners in driving Service Management best-practice and ITIL process standardization.
- Review Problem Management policy and knowledge documentation on a regular basis to ensure relevance and accuracy.
- Facilitate task forces aimed at addressing a problematic issue with an unknown root cause.
- Develop and review compliance metrics and KPIs to identify areas to mature the problem process, policies, and training material.
- Drives implementation of standard execution of the Mission Critical Incident Management process.
- Assist IT teams in training, utilization and adherence to the Problem Management policies.
A copy of the full job description can be made available to you.
What we are searching for
- Bachelor’s degree in a related field.
- Requires 3+ years’ work experience as an incident analyst/manager in a large, enterprise environment.
- Must have a thorough understanding of ITIL disciplines and processes: Incident, Problem, Change, Configuration Management principles; ITIL v3 Foundations certification is required.
- Nice to have: Cisco, network, Oracle, Linux, Windows, PagerDuty app, Agile, Waterfall.
- Must have experience with ServiceNow.
- In-depth knowledge and proven experience in troubleshooting, problem determination, root cause analysis and rapid problem resolution.
- Must be knowledgeable on complex networks, open systems and mainframe computing services.
- Previous retail experience desired.
- Must possess strong leadership abilities and be able to lead service restoration efforts across the organization.
- Good understanding of production IT Environment and IT Operations.
- Strong written and oral communication skills required.
- Demonstrates a high level of energy, is results-driven and is able to work under pressure.
- Experienced in leading special projects and tasks that support Data Center Operations.
- Must have working knowledge of Data Center automation and monitoring, including service level-based monitoring.
- Ability to effectively communicate to Sr. IT and Business on problem status and approach to corrective action.
- Ability to support change while motivating and mentoring others.
What is it like at Albertsons?
Albertsons Culture Principles
Compassion: We always treat each other with kindness and respect
Team: We always support and recognize each other
Inclusive: We always value everyone’s perspective
Learning: We always strive to grow and develop ourselves and others
Competitive: We always act with integrity to win over the customer
Ownership: We always take actions to drive our success