BE YOU. BE HERE. BE PART OF THE STORY.

Senior Systems Reliability Operations Engineer

Job ID: 10058991 | Location: Mumbai, India | Business: The Walt Disney Company (Corporate) | Date posted: Nov. 30, 2023

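Enjoy the Magic

Employees and cast members are at the heart of everything we do, so Disney offers a competitive total rewards package that includes pay, health and savings benefits, time-off programs, educational opportunities, and more.

*Benefits and eligibility may vary by business and location

  • Health insurance and wellness
  • Childcare options
  • Paid time off
  • Retirement plans
  • Tuition assistance
  • Weekly pay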
"Until I came here, I never thought that something I did for fun, telling sports stories through statistics, would become my job."

Ana, Senior Researcher, ESPN

Job Summary:

The Disney Technology Operations Command Center (DTOC) is a 24x7x365 mission-critical services operation center responsible for service availability, with a primary focus on rapidly responding to, correlating, and reducing the impact of outages. We are accountable for identifying and facilitating the resolution of service-impacting events, and for collaborating with other technology teams to prevent future impact through proactive event management and incident and problem analysis. The DTOC drives the execution of the major incident process, including communication to executives and key stakeholders. The DTOC owns and executes the IT Emergency Operations Center Crisis Management plan and process, with responsibility for maturing the plan and its integration into the overall Corporate Crisis Management and TWDC programs. The DTOC also provides ongoing first- and second-level technical support of requests, performs validation procedures for routine system/service checks, and fulfills proactive monitoring with communication for HyperCare of significant business events.
 
The SRO Engineer will provide operational oversight and technical leadership and is responsible for monitoring, identifying, and coordinating with other technologists across segments to fine-tune system operations and rally to resolve service interruptions. This role is responsible for the end-to-end reliability and operations of IT services and for providing consultation and training to other clients and segments within TWDC.

The SRO Engineer will examine IT systems for defects and communicate maintenance schedules and critical events across the company.

Working with Engineers and Analysts at all levels, the SRO Engineer will interact with computer and software engineers, quality control specialists, infrastructure service leads, segment technologists, and others to ensure service availability, increase efficiency, and establish best practices for the execution and continuous improvement of the Event, Incident, Major Incident, Crisis Management, HyperCare execution, and Problem Management processes within the DTOC.

Additionally, this position will drive service improvement initiatives through proactive monitoring and enhancement actions arising from gaps identified through analytics and problem management. The SRO Engineer is an active member of the DTOC service team focused on Operations, while also ensuring operational sustainability by contributing to the development, testing, and evaluation of the services supported.

Leverages partnerships with the Business, Customer base, and Suppliers to deliver services that meet agreed-upon expectations. Provides a 24x7x365 first point of contact for centralized incident response and recovery that consistently and reliably triages reported or automated incidents, applies recovery procedures, and engages domain experts to restore steady-state operations; provides all core services on a priority basis and with dedicated support to ensure the success of critical events.

Technology Focus

  • Carries and maintains a relevant and up-to-date skill set in the areas of x86 hardware technology, Windows, Linux, RISC operating systems, P-Series hardware, SAN, NAS, and data protection technologies.
  • Must have a working knowledge of relevant WAN/LAN technologies, wireless infrastructure, DNS/DHCP, Load-Balancers, WAN Accelerators, and other network technologies.
  • Implement and maintain technology observability and alerting solutions to provide real-time insights into system health, performance, and compliance.
  • Establish and maintain service level objectives (SLOs) and service level indicators (SLIs) for critical enterprise services.
  • Monitor and manage the performance and availability of enterprise applications, systems, and infrastructure, ensuring they meet or exceed established service level objectives (SLOs).
  • Proactively identify, diagnose, troubleshoot, and resolve infrastructure, application, and IT operations issues in collaboration with other IT support teams.
  • Develop, implement, and maintain automation tools and scripts to improve the efficiency and reliability of IT operations and infrastructure.
  • Seasoned technologist who will identify technology and operational challenges in solutions and products offered by Architecture and Engineering teams as well as outside vendors and OEMs.
  • In partnership and cooperation with the architecture and engineering teams, ensures that products currently in ideation and development are being engineered with long-term operational sustainment goals in mind.
  • Must have a solid understanding of Internet technologies and availability strategies for digital platforms.
  • Must be familiar with complex network topics and availability approaches in an effort to drive performance from all network operations center functions.

Responsibilities

  • Drive the efficiency and effectiveness of the Event, Incident, Major Incident, Request Fulfillment and Problem Management processes
  • Experience in enterprise IT operations, including system administration, application platforms, infrastructure, networking fundamentals, and IT service management.
  • Strong understanding of Windows, Linux/Unix operating systems, networking platforms & concepts.
  • Proficiency in one or more scripting languages (e.g., Python, Bash, Ruby) and automation tools (e.g., Python, PowerShell).
  • Solid understanding of observability, monitoring and alerting tools (e.g., Splunk, New Relic, Grafana, ELK Stack, Datadog).
  • Familiarity with modern operations support methodologies and practices, such as Site Reliability Engineering (SRE).
  • Strong technology problem-solving and analytical skills, with the ability to quickly diagnose and resolve complex technical issues.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
  • Identify service improvement opportunities through trend analysis, proactive techniques, and after-action reviews
  • Analyze and publish operational utilization and service performance metrics regularly
  • Identify and drive service availability improvement opportunities by executing leading practices
  • Ensure that all DTOC services are designed to deliver the levels of availability required by the business, and validate that the final design meets the minimum levels of availability agreed by the business for IT services
  • Elevate any service gaps proactively with leadership.
  • Participate in creating, maintaining, and regularly reviewing department procedures, operational readiness plans and posture, aimed at improving the overall availability of IT services and infrastructure components, to ensure that existing and future business availability requirements can be met. This includes compiling daily operational reports and facilitation of operational readiness calls.
  • Ensure the DTOC is effectively monitoring available tools and systems for high availability and swift response to potential and actual outage situations.
  • Perform as the incident commander on service outage calls, orchestrating recovery activities of DTOC and other technology teams to drive fast restoration of service without added risk to the organization, providing command and control of the call
  • Effectively apply Incident Analysis and Problem Analysis techniques during and after an incident, and ensure staff apply the same
  • During outage situations, consistently provide Situation Reports in a timely fashion, ensure work streams toward resolution are clearly articulated following department procedures, and ensure business impacts are obtained and communicated
  • Manage and provide the technical direction of the team to ensure 100% on-site coverage required to effectively support incidents, service requests, proactive health checks and HyperCare services
  • Perform DR/BCP activities for critical events and emergency onsite response.

Strategy

  • Responsible for influencing and socializing DTOC solutions, practices, roles, responsibilities, and processes
  • Responsible for influencing and socializing Operational service gaps to Engineering for capability enhancements.
  • Participate in creating, maintaining, and regularly conducting reviews targeting the overall readiness of services for existing and future business needs, including Operational Readiness Reviews (ORR)
  • Contribute to the development and sustainment of an enterprise level incident, event, and availability management strategy
  • Participate in the development and governance of service level agreements.

Qualifications:

BA/BS in Computer Science, Engineering, or related field. Equivalent work experience within large IT Operations organizations would be considered in lieu of degree.

Master’s in IT Systems or Business Administration (MBA) or MS in technical discipline.

Work Experience:

  • 5+ years of experience supporting converged infrastructure stacks, including application, compute, storage, and networking
  • 5+ years leading incident recovery with multi-disciplined, geographically dispersed teams in a Fortune 500 organization
  • 3+ years of experience in either a large IT shared services organization or outsourced environment
  • Experience leading technical recovery of major incidents for a Fortune 500 organization
  • Experience with hands-on support of cloud operations with one or more: AWS, Google Cloud or Azure
  • Experience supporting diverse portfolios, multiple business applications and IT services
  • Experience working in a 24x7 IT operations environment.
  • Demonstrated experience with Service and Event Management tools.
  • Demonstrated experience in systems integration, application infrastructure support and middleware operations.
  • Demonstrates management skills, both from a resource management perspective and from the overall control of a process
  • Proven experience and understanding of root cause analysis techniques
  • Proven ability to be detail, deadline, and results-oriented
  • Strong leadership skills with the ability to motivate and encourage others
  • Ability to manage competing priorities and workflow
  • Solid interpersonal skills for written, oral, and face-to-face communications
  • Practical experience with influence and negotiation methods and techniques
  • Ability to serve as mentor and coach
  • Strong customer service orientation, seeking opportunities to serve clients.

Skills / Specialized Knowledge / Competencies

  • IT Automation and scripting in languages such as Python and/or PowerShell
  • Experience with ITIL frameworks and processes
  • Experience working within large, complex production teams
  • Experience working within an outsourced environment
  • Vendor relationship management experience
  • Comfortable working within a highly matrixed organization
  • Strong technology-driven process experience
  • Ability to work under pressure, meet internal and external work schedules and/or deadlines, and show effective time and crisis management skills
  • Expertise in supporting large-scale environments in a diverse culture
  • Demonstrated attentiveness to detail
  • Demonstrated strong partnering skills
  • Demonstrated proactive problem-solving and decision making skills
  • Demonstrated ability to deliver work on time
  • Proven team player with the ability to mentor, guide, and influence cross-functional teams
  • ITIL v3 Certification Preferred


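About The Walt Disney Company (Corporate):

At The Walt Disney Company (Corporate), you can see how the businesses behind the Company's powerful brands come together to create the most innovative, far-reaching, and admired entertainment company in the world. As a member of a corporate team, you will work with world-class leaders driving the strategies that keep The Walt Disney Company at the leading edge of entertainment. See and be seen by other innovative thinkers as you enable the greatest storytellers in the world to create memories for millions of families around the globe.

About The Walt Disney Company:

The Walt Disney Company, together with its subsidiaries and affiliates, is a leading diversified international family entertainment and media enterprise whose business spans three main segments: Disney Entertainment, ESPN, and Disney Experiences. From humble beginnings as a cartoon studio in the 1920s to its preeminent name in the entertainment industry today, Disney proudly continues its legacy of creating world-class stories and experiences for every member of the family. Disney's stories, characters, and experiences reach every corner of the globe. With operations in more than 40 countries and regions, our employees and cast members work together to create entertainment experiences that are cherished both globally and locally.

This position is with UTV Software Communications Private Ltd, which is part of the business segment The Walt Disney Company (Corporate).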

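Hiring Process

  • Where does your story begin?

    Explore Disney job openings and The Life at Disney blog to learn about all the exciting opportunities waiting to be discovered at The Walt Disney Company.

  • Be part of the Disney story

    There are many different brands and businesses to explore. Once you have found the opportunity that is right for you, fill out your application and take the next step.

  • The next chapter

    After applying, you will receive an email granting access to your candidate dashboard. Create your login and be sure to check your dashboard regularly for updates on your application status.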


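The Walt Disney Company uses the extraordinary power of great storytelling to entertain, inform, and inspire people around the world, building the iconic brands, creative ideas, and innovative technologies that make it the world's premier entertainment company.

Our Culture

  • Executive Leadership

    Our senior executives bring extensive experience, vision, and a shared commitment to excellence, creativity, and innovation to the day-to-day operations of the company.

  • Diversity, Equity, and Inclusion

    At Disney, we are committed to creating a better world: a world of belonging, where everyone feels valued, heard, and understood, and a world full of hope and promise.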
