Talent.com
Lead Site Reliability Engineer (SRE)

OutSystems - Portugal, Remote
Posted 11 days ago
Job description

There are NO limits to your career: come shape the future and be part of a truly unique global culture at OutSystems!

About This Role

A Monitoring & Observability Engineer is responsible for ensuring that complex systems are observable, resilient, and performant by designing, implementing, and maintaining monitoring, logging, and alerting solutions. This work enables teams to detect, diagnose, and resolve issues quickly, improving system reliability and availability.

At OutSystems, the Monitoring & Observability Engineers work closely with other Engineers, Product Managers, and Stakeholders to identify opportunities for improvement, implement best practices, and drive continuous optimization across our platforms.

Key Responsibilities

As a Monitoring & Observability Engineer, these are your key responsibilities and duties:

Observability Implementation: Develop and maintain telemetry systems, including logs, metrics, traces, and dashboards.

Define and implement best practices for making systems and services measurable, and collaborate with stakeholders and teams to apply these practices

Collaborate with engineering teams to implement modern instrumentation and telemetry signal collection for their services, ensuring meaningful insights are derived

Create and maintain documentation related to the observability platform and SRE practices

Work closely with development and operations teams to ensure optimal performance, availability, and security of services

Contribute to our evolving "data-driven" and "cloud-first" culture through continuous learning

Monitoring & Observability Performance Indicators

The main KPIs that aid in understanding the impact and success of the Monitoring & Observability function at OutSystems are:

Coverage & Adoption

Telemetry Coverage (%) – Percentage of critical services instrumented with logs, metrics, and traces.

Log Completeness (%) – Percentage of logs successfully collected vs. expected logs.

Tracing Coverage (%) – Percentage of distributed traces captured across service interactions.

Metric Coverage (%) – Percentage of monitored services with key metrics (latency, errors, traffic, saturation).

Dashboard & Alerting Adoption (%) – Percentage of teams actively using observability tools.
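
As a rough illustration of how the coverage indicators above could be computed, here is a minimal Python sketch. The `Service` record, its field names, and the sample inventory are hypothetical and not part of any OutSystems tooling; the sketch assumes a critical service counts as covered only when it emits all three signals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical inventory record for one service."""
    name: str
    critical: bool
    has_logs: bool
    has_metrics: bool
    has_traces: bool


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def telemetry_coverage(services: list[Service]) -> float:
    """Share of critical services that emit logs, metrics, and traces."""
    critical = [s for s in services if s.critical]
    covered = [
        s for s in critical
        if s.has_logs and s.has_metrics and s.has_traces
    ]
    return percentage(len(covered), len(critical))


def log_completeness(collected: int, expected: int) -> float:
    """Share of expected log records that were actually collected."""
    return percentage(collected, expected)


# Made-up sample data: four critical services, three fully instrumented.
SAMPLE_SERVICES = [
    Service("checkout", True, True, True, True),
    Service("identity", True, True, True, True),
    Service("billing", True, True, True, True),
    Service("search", True, True, True, False),   # traces missing
    Service("sandbox", False, False, False, False),  # not critical, ignored
]

if __name__ == "__main__":
    print(telemetry_coverage(SAMPLE_SERVICES))  # 75.0
    print(log_completeness(9_500, 10_000))      # 95.0
```

The same `percentage` helper would serve for tracing, metric, and adoption coverage; only the numerator and denominator change.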

Impact on Reliability

Reduction in MTTR (%) – Improvement in Mean Time to Resolve due to observability insights.

Root Cause Identified (%) – Percentage of incidents where observability tools helped determine a clear root cause.

SLO Compliance (%) – Percentage of time services meet reliability objectives based on observability insights.
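
The reliability indicators above are simple ratios, and a short Python sketch may make them concrete. The function names and sample figures below are illustrative assumptions rather than a definition used by OutSystems; in particular, the sketch measures SLO compliance in minutes, which is only one possible unit of time.

```python
def mttr_reduction(baseline_minutes: float, current_minutes: float) -> float:
    """Percentage improvement in Mean Time to Resolve against a baseline."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return 100.0 * (baseline_minutes - current_minutes) / baseline_minutes


def root_cause_identified(with_root_cause: int, total_incidents: int) -> float:
    """Percentage of incidents with a clear root cause found."""
    if total_incidents <= 0:
        raise ValueError("total_incidents must be positive")
    return 100.0 * with_root_cause / total_incidents


def slo_compliance(compliant_minutes: int, total_minutes: int) -> float:
    """Percentage of the observed time in which the service met its SLO."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * compliant_minutes / total_minutes


if __name__ == "__main__":
    # Made-up figures for illustration only.
    print(mttr_reduction(120, 90))          # 25.0 (MTTR fell from 120 to 90 min)
    print(root_cause_identified(18, 24))    # 75.0
    print(slo_compliance(9_990, 10_000))    # 99.9
```

A negative result from `mttr_reduction` would indicate that resolution time got worse relative to the baseline.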

The qualifications and skills below illustrate the desired profile for a Monitoring & Observability Engineer. Nevertheless, the selection of candidates will always vary depending on specific knowledge of the field and prior experience.

Qualifications

STEM degree (BSc or MSc in Software Engineering, Computer Science, or a related field);

Strong experience in software development and/or operations;

Proficiency in at least one high-level programming language (C++, Python, Java, C#, etc.).

Strong troubleshooting and debugging skills.

Fluency in English and excellent communication skills.

Soft Skills

Communication - able to communicate effectively (in English) both orally and in writing, showing empathy for the other person;

Humility - accepts mistakes with a humble attitude, apologizing for them and mitigating them as soon as possible to limit their impact.

Accountability - takes ownership of problems and makes sure to see them through. Even without all the necessary knowledge to move on alone, can involve the right people to reach closure.

Negotiation Skills - has tough and politically complex conversations with colleagues and customers, defusing disagreements and leading towards a mutual agreement and understanding of all parties involved.

Process Oriented - is organized and able to properly follow defined processes, whilst being able to properly challenge inefficient processes and suggest improvements.

Problem-solving - has a top-down approach to problems, starting with a wide scope, breaking the problem into smaller pieces, and narrowing down as the analysis progresses. Thinks critically, analyzing information objectively and making reasoned judgments.

Technical Skills

Strong experience with Observability / SRE tools, platforms, and standards, including but not limited to ELK Stack, Grafana, Prometheus, Loki, Nobl9

Familiarity with modern logging frameworks and best practices: OpenTelemetry, Logstash, Loguru, etc.

Containerization technologies and orchestration platforms, mainly Kubernetes and EKS (CKA, CKAD, CKS certifications are valued);

Experience with Python, Go, Bash/Shell scripting, or other automation tools/languages;

Experience with automation and Infrastructure as Code (IaC) tools, such as AWS CloudFormation, Terraform, Puppet, Chef, Spacelift, etc.;

Strong understanding of designing resilient and fault-tolerant systems;

Expertise in debugging complex distributed systems;

Proficiency in monitoring and troubleshooting complex distributed systems.

The Longer Story:

OutSystems is a global leader transforming how companies innovate through software, empowering IT leaders with a better way to build the software that matters most. We are looking for talented and motivated people to join us in helping companies solve some of their most strategic business challenges, from modernizing their workplace processes to transforming their employee and customer experiences. As a member of the OutSystems global team, you will help build, deliver, manage, and evolve the software that is a low-code market leader and preferred by professional developers around the world.

OutSystems is a truly global company, with more than 800,000 developer community members, 1,700 employees, more than 500 partners, and thousands of active customers in over 75 countries and across 21 industries. Founded in 2001, OutSystems has offices in the United States, United Kingdom, the Netherlands, Portugal, Germany, the UAE, Japan, Hong Kong, Malaysia, Australia, India, and Singapore, and of course has a thriving, worldwide community of remote employees.

Working at OutSystems

Our goal is to ensure that OutSystems is a place for bright, happy, and motivated people who share a common purpose and take pride in excellent work towards our vision. Our culture is focused on building agility at scale, which allows us to operate with a high drive in a competitive market. At OutSystems, we operate like a startup at scale, where teams act as coordinated "startups" - a true Federation of Teams Culture. Our attributes define the core behaviors that fuel our innovation and foster agility at scale. We encourage our team members to collaborate, focus on results, act quickly, understand our business and reinvent themselves.

What do we have to offer you?

A company that continues to grow, change and innovate, and gives our teams the space to be proactive and creative.

Real career opportunities. We care about growth and development. Vertical career progression is an obvious possibility, but we also offer the possibility for lateral moves, joining different teams, and mastering specific skills.

Work colleagues that are as smart, hardworking and driven as you – and a team that is global.

Disrupting the status quo is in our DNA. In fact, it’s why our company exists.

We “Ask Why” a lot. It helps us connect our individual work to the bigger picture and sometimes even uncover a better way.

Are you ready for the next step in your career? Then we’d love to hear from you!

OutSystems nurtures an inclusive culture of diversity, where everyone feels empowered to be their authentic self and perform at their best. A company that embraces the creativity and innovation that comes through diverse perspectives. We are committed to creating a team that reflects society through inclusive programs and initiatives and are proud to be an equal opportunity employer. All qualified applicants receive equal consideration regardless of race, place of origin, color, age, marital status, religion, sex, sexual orientation, gender expression or identity, protected veteran status, disability status or any other status protected by law.

Join us in disrupting the status quo of the low-code market: we give you the power to "Ask Why", and you give our customers the power to innovate through software!
