Ramadan Khalifa

Software Engineer

How to Pass the AWS Certified Solutions Architect Associate Exam in Two Months: A Practical Guide

Amazon Web Services (AWS) is a cloud computing platform that provides a wide range of services, including compute, storage, and databases. As demand for cloud computing grows, so does the need for certified professionals who can manage these services efficiently. The AWS Certified Solutions Architect Associate exam is an entry-level […]

Book Summary: SRE, Part 4, Best Practices for Building Monitoring and Alerting

Monitoring is a crucial aspect of Site Reliability Engineering (SRE) because it allows teams to detect, diagnose, and resolve issues in distributed systems. In this article, we’ll explore the principles of monitoring and best practices for monitoring distributed systems. First principle: measure what matters. Teams should identify key performance indicators (KPIs) that directly impact user […]

Root Cause Analysis (RCA) Using Distributed Tracing

Distributed tracing is a method of tracking the propagation of a single request as it’s handled by the various services that make up an application. Tracing in that sense is “distributed” because, to fulfill its function, a single request must often traverse process, machine, and network boundaries. Once we instrumented our application and exported our […]

Book Summary: SRE, Part 3, Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs)

In this article, we are going to learn the core terminology of Site Reliability Engineering (SRE). It’s important to understand these terms because they are widely used in the software industry today. Learning terminology might sound boring or complex, but I will try to make it as simple and practical as possible.

Book Summary: Site Reliability Engineering, Part 2, Error Budgets and Service Level Objectives (SLOs)

It would be nice to build 100% reliable services, ones that never fail, right? Absolutely not. Doing so would be a bad idea because it is very expensive and limits how fast new features can be developed and delivered to users. Also, users typically won’t notice the difference between […]
