Cloud Engineer – Overland Park, KS (Onsite)

Job Description

Cloud Platform Engineering

The CDN/WAF Platform Engineering application resource collaborates and partners with IT, Cyber, Digital, Product, application owners, infrastructure owners, architects, and developers to design, develop, operate, and integrate CDN and WAF security tools, such as web application firewalls and bot mitigation, to protect client websites and mobile apps.
This role also collaborates with cyber, corporate security fraud teams, privacy teams, and incident response teams in investigations related to websites, applications and digital and front-line services.
This individual is responsible for the analysis, design, architecture, site reliability, implementation, and maintenance of automation tasks for our platform, operations, and systems engineering functions.

This individual is also responsible for collaborating with application owners and security vendors to update/tune Web Application Firewall (WAF) and bot mitigation security policies – directly and indirectly impacts.
Working with Akamai’s suite of products and services, you are involved in every facet of Akamai’s customized services for T-Mobile Digital.
Monitoring, troubleshooting, and resolving server, platform, application, and network problems
Working with various internal/external stakeholders to narrow down problems and achieve resolution
Ensuring SLAs are achieved and Web Platform Operations work quality expectations are met; defining, enforcing, and operating the SRE practices of service level indicators (SLIs), service level objectives (SLOs), and error budgets
Ensuring the accuracy of Platform Operations procedures, documents, and runbooks
Diagnosing problems identified through platform, infrastructure, application, and network monitoring, and working to resolve them
Acting as an escalation point
This is an opportunity to work in large-scale Web Platform engineering and operations. Working directly with Akamai’s SMEs, product, and professional services teams, you are involved in every facet of Akamai’s services engineering and operations. You will contribute to T-Mobile’s Web Platform Operations success while being part of a very important team.
Hands-on knowledge of architecture, client/server, and distributed computing concepts
Hands-on knowledge of web and app (iOS, Android) routing and network/content delivery protocols and methods
Hands-on knowledge of full-stack web and web applications, including languages and frameworks such as JavaScript, TypeScript, Angular, and React
Strong Unix operating system and administration skills
Strong understanding of networking protocols: DNS, HTTP, SSL, SMTP, TCP
Excellent oral and written communication skills
Proven customer/client service skills
Candidates must have good oral and written communication skills in English
Flexibility to work weekend, extended, and on-call shifts
Monitor mission-critical systems and infrastructure to ensure proper function with a focus on continuous improvement and optimization
Identify and investigate bottlenecks and latency issues and implement strategies to proactively address potential challenges
Measure and analyze system health at granular and holistic levels and leverage data to drive decision making, continually innovating and scaling systems to meet business objectives
Provide operational support as needed; identify, investigate, and resolve complex technical issues as they arise
Serve as a technical and strategic advisor to leadership on technologies and methodologies to scale systems and optimize their availability and performance

Qualifications

5 to 10 years of progressive experience in cloud engineering, site reliability engineering, DevOps, system infrastructure, full-stack web platform development, or a related technical role
Deep knowledge of and experience with cloud infrastructure services (GCP, AWS, Kubernetes)
Strong scripting and coding ability across multiple languages and technologies
Proven experience optimizing reliability and performance and scaling technical infrastructure in a fast-paced and collaborative environment
Strong analytical skills and ability to provide data-driven recommendations for critical decision making, mainly around system reconfiguration and improvement
Willingness and ability to be “on call” to address unforeseen issues outside of normal working hours as needed
Experience with CDN platforms such as Akamai, Imperva, Fastly, and CloudFront
Knowledge of performance and web acceleration techniques
Observability engineering skills essential for ensuring system health, optimizing performance, and maintaining infrastructure reliability
Experience with monitoring tools such as Datadog, Kibana, Logstash, and Splunk
Ability to understand metrics and their impact on system performance
Building dashboards and alerts to ensure system reliability
Proven ability to investigate and resolve complex technical issues
Experience with load testing technologies

Preferred Qualifications

Knowledge of C, Python, JavaScript, or other coding languages
Experience with Infrastructure as Code (e.g., Terraform)
Experience working with CDN technologies (Akamai, Imperva, CloudFront, Cloudflare, Fastly, etc.)
Knowledge of Kubernetes infrastructure in a production environment
Experience with monitoring tools such as Datadog, Quicksite, Kibana/Logstash, and Splunk; ability to understand metrics and their impact on system performance; and ability to build dashboards and alerts to ensure system reliability
Experience managing database systems such as PostgreSQL and MySQL
Experience with load testing technologies
