Site Reliability Engineer

YellowIpe • Portugal, PT
Posted 8 days ago
Job description

About YellowIpe

Our mission is to inspire the connection between technology and people. We bring out the best in our professionals through our expertise in finding and attracting the best talent for the best projects. Focus on People, Collaboration, and Commitment are the pillars that guide us on this path.

Join the yellow team as our new Site Reliability Engineer!

About the position

This role primarily involves managing and optimizing our Azure AKS clusters, ensuring a stable and highly available environment. You'll provide guidance and expertise to ensure smooth migrations, troubleshoot issues, and implement best practices to enhance efficiency and reliability. You'll play a crucial role in helping developers improve their applications with a focus on being “cloud-native.”

Beyond Kubernetes, you'll take on broader SRE responsibilities, including developing and documenting infrastructure as code to ensure repeatable, efficient, and reliable deployments. Automation will be a primary focus, helping us reduce manual interventions and accelerate our development cycles.

Working in a specialized team, you'll engage daily with various IT professionals, including architects, database engineers, network engineers, security engineers, and developers. We're looking for someone with strong technical knowledge, proficiency in English, and the soft skills to foster a collaborative and positive team environment.

Responsibilities:

  • Collaborate closely with internal and external stakeholders, developers, and managers to ensure timely deliverables.
  • Maximize the capabilities of our Kubernetes platform for business applications.
  • Continuously evolve and enhance our platforms.
  • Prioritize the observability of our Kubernetes platforms.
  • Maintain cost efficiency and FinOps control of the platform.
  • Implement SRE and DevOps practices with a focus on Infrastructure as Code, automation, and scalability.
  • Design, develop, and implement new features to foster a CI/CD mindset, optimizing the software development lifecycle from development to production.
  • Analyze, troubleshoot, and resolve complex incidents, ensuring they do not recur.
  • Implement code review and testing mechanisms to continuously improve quality.

Requirements:

  • Proven experience with Kubernetes and its ecosystem in a production environment.
  • Familiarity with containerization technologies.
  • Solid understanding of observability best practices, particularly in Kubernetes.
  • Proficiency in using code management tools, repositories, and CI/CD pipelines.
  • Experience with public cloud platforms.
  • Proven ability to work effectively in cross-functional teams.
  • Experience with GitOps tools.
  • Expertise in Infrastructure as Code, particularly with Terraform.
  • Strong autonomy and critical thinking skills, with a focus on collaboration.
  • Excellent communication skills (written, listening, and speaking), a collaborative mindset, and a proactive attitude.
  • Fluency in English (mandatory).

Additional / preferred skills:

  • Experience with Azure cloud.
  • Proficiency in using GitLab and GitLab CI/CD.
  • Familiarity with Argo CD.
  • Strong knowledge of observability stacks, including monitoring, logging, and tracing.
  • Development experience and a solid understanding of application lifecycle management.
  • Experience working with Agile methodologies.
  • Certified Kubernetes Administrator (CKA) certification.

Important information:

  • Remote, with one office visit per month (Leiria).
  • Candidates must reside in Portugal.
  • Apply for this opportunity in our ! =)