Site Reliability Engineer, Monitoring and Control Engineering

NBCU is looking for creative engineers who are willing to learn from the current process but are not afraid to think outside the box. This role is responsible for the engineering, operations, support, deployment, and maintenance of core Distribution Engineering Monitoring and Control systems, both on-premises and in the cloud.

· Utilize scripting and automation to develop, customize, and enhance monitoring/alerting tools for “on-air” environments
· Interact with automated monitoring infrastructure to ensure healthy environments
· Create system dashboards that improve system availability and reliability
· Query data stores to quantify the scope of reported issues
· Create new metrics and identify monitoring deliverables to improve site reliability
· Act as a Level 2 resource: drive and own investigations related to Broadcast issues, and report findings to leadership and operations in a timely manner
· Provide 24/7 on-call support on a rotating shift schedule
· Follow up with team members and third-party vendors when issues cannot be resolved, and drive vendors toward a root cause and solution where possible
· Create comprehensive documentation of encountered issues, explaining the root cause and the steps for effective resolution
· Administer monitoring and control systems within the “on-air” environments
· Develop proof-of-concept deployments for evaluation of products and architectures
· Utilize modern frameworks and scripting languages to develop products and services for NBCU's IP video distribution environment

Job ID
744000096674568
DetailURL
https://jobs.smartrecruiters.com/NBCUniversal3/744000096674568
Search Meta
51606766 Operations & Technology Engineering Engineering United States All Remote
Job Reference number
51606766
Multi Location
No
Is Remote Job?
Yes