Enhancing distributed systems through effective monitoring: The role of observability

Date

2024-05

Publisher

Yeshiva University, Yeshiva College

Abstract

Monitoring is a key component of developing a distributed system. A distributed system, as the name suggests, is distributed across many locations, computers, and processes. Being distributed allows the system to grow to meet higher demand, can reduce costs over time, and makes the system more resilient to crashes. However, it also introduces considerable complexity, which makes it harder to track down problems and implement solutions when things go wrong. For example, a bug in one of the many components of a distributed system may cause other components to behave incorrectly and return false or outdated information to clients. Tracking down such a bug can be difficult without a robust framework for monitoring the distributed system. The goal of a monitoring framework is to gather data so that developers, operators, and product support teams can ask and answer questions about the system without digging around inside its guts. We call this “observability,” a term borrowed from engineering, because our goal is to gain insight into the inner workings of our system based on its outputs (Majors, 2019).
This year I worked with eight other engineers to build a distributed trade processing service. Our goal was to build an application that allows users to register, buy and sell stocks, and view the real-time value of their portfolios. The most important goal, however, was to distribute this functionality over a network of computers so that the application would be highly scalable, highly available, and fault tolerant. This project allowed us to put our knowledge of distributed systems to the test by building a non-trivial distributed system. We split into teams of three and built different parts of the application in parallel. My team was responsible for developing the observability framework for the application.
Observability is such a well-known problem that there are many free and paid observability solutions, but we chose to implement our observability system from scratch. Having a custom-built observability framework allowed us to develop the most robust monitoring capabilities for our application. (from Introduction)

Description

Undergraduate honors thesis / Opt-Out

Keywords

Enhancing Distributed Systems, monitoring, TECHNOLOGY::Information technology::Computer science::Software engineering, TECHNOLOGY::Information technology::Computer science

Citation

Levy, E. (2024, May). Enhancing distributed systems through effective monitoring: The role of observability [Unpublished undergraduate honors thesis, Yeshiva University].