# Systems Performance

![rw-book-cover](https://m.media-amazon.com/images/I/61aoW2kC0kL._SY160.jpg)

## Metadata

- Author: [[Brendan Gregg]]
- Full Title: Systems Performance
- Category: #programming-best-practices #system-architecture #devops

## Highlights

- performance issues can originate from anywhere, including areas of the system that you know nothing about and you are therefore not checking (the unknown unknowns). ([Location 476](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=476))
- Systems performance studies the performance of an entire computer system, including all major software and hardware components. ([Location 696](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=696))
- If you don’t have a diagram of your environment showing the data path, find one or draw it yourself; this will help you understand the relationships between components and ensure that you don’t overlook entire areas. ([Location 699](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=699))
- The typical goals of systems performance are to improve the end-user experience by reducing latency and to reduce computing cost. ([Location 700](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=700))
- speaking of systems performance, however, we use full stack to mean the entire software stack from the application down to metal (the hardware), including system libraries, the kernel, and the hardware itself. Systems performance studies the full stack. ([Location 704](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=704))
- Methodologies and tools to help perform these activities are covered in this book:
    - Setting performance objectives and performance modeling for a future product.
    - Performance characterization of prototype software and hardware.
    - Performance analysis of in-development products in a test environment.
    - Non-regression testing for new product versions.
    - Benchmarking product releases.
    - Proof-of-concept testing in the target production environment.
    - Performance tuning in production.
    - Monitoring of running production software.
    - Performance analysis of production issues.
    - Incident reviews for production issues.
    - Performance tool development to enhance production analysis. ([Location 725](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=725))
- Performance engineering should ideally begin before any hardware is chosen or software is written: the first step should be to set objectives and create a performance model. However, products are often developed without this step, deferring performance engineering work to a later time, after a problem arises. With each step of the development process it can become progressively harder to fix performance issues that arise due to architectural decisions made earlier. ([Location 739](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=739))
- One such technique is testing new software on a single instance with a fraction of the production workload: this is known as canary testing. ([Location 744](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=744))
- Two perspectives for performance analysis are labeled in Figure 1.2: workload analysis and resource analysis, which approach the software stack from different directions. ([Location 759](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=759))
- The resource analysis perspective is commonly employed by system administrators, who are responsible for the system resources. Application developers, who are responsible for the delivered performance of the workload, commonly focus on the workload analysis perspective. ([Location 762](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=762))
- Performance, on the other hand, is often subjective. With performance issues, it can be unclear whether there is an issue to begin with, and if so, when it has been fixed. What may be considered “bad” performance for one user, and therefore an issue, may be considered “good” performance for another. ([Location 772](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=772))
- Subjective performance can be made objective by defining clear goals, such as having a target average response time, or requiring a percentage of requests to fall within a certain latency range. ([Location 779](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=779))
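Such a goal can be checked directly against measured request latencies. A minimal sketch in Python; the 200 ms target and the sample latencies are hypothetical, and the nearest-rank percentile is only one of several common definitions:

```python
# Check an objective latency goal, e.g., "99% of requests complete
# within 200 ms" (target and sample values here are hypothetical).
latencies_ms = [12, 15, 18, 22, 30, 45, 90, 150, 210, 480]

def percentile(values, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

p99 = percentile(latencies_ms, 99)
print(f"p99 latency: {p99} ms, goal met: {p99 <= 200}")
```

Monitoring systems may interpolate between ranks instead of using nearest rank, so reported percentiles can differ slightly between tools.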
- Solving complex performance issues often requires a holistic approach. The whole system—both its internals and its external interactions—may need to be investigated. This requires a wide range of skills, and can make performance engineering a varied and intellectually challenging line of work. ([Location 792](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=792))
- the real task isn’t finding an issue; it’s identifying which issue or issues matter the most. ([Location 806](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=806))
- Latency is a measure of time spent waiting, and is an essential performance metric. ([Location 812](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=812))
- In systems performance, the term profiling usually refers to the use of tools that perform sampling: taking a subset (a sample) of measurements to paint a coarse picture of the target. ([Location 879](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=879))
- An effective visualization of CPU profiles is flame graphs. CPU flame graphs can help you find more performance wins than any other tool, after metrics. They reveal not only CPU issues, but other types of issues as well, found by the CPU footprints they leave behind. ([Location 881](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=881))
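To make sampling concrete, here is a language-level sketch (not one of the system profilers the book covers): a Python program that uses an interval timer to sample its own call stack and prints counts in the folded-stack format that Brendan Gregg's flamegraph.pl consumes. It assumes a Unix-like OS, and busy() is a hypothetical workload:

```python
import collections
import signal
import traceback

# A sketch of a sampling profiler: an interval timer interrupts the
# program periodically and records the current call stack. The folded
# output ("func;func;func count") is the input format for flamegraph.pl.
samples = collections.Counter()

def sample_stack(signum, frame):
    stack = traceback.extract_stack(frame)       # root-to-leaf frames
    samples[";".join(f.name for f in stack)] += 1

signal.signal(signal.SIGPROF, sample_stack)
signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)  # ~100 Hz of CPU time

def busy():  # hypothetical workload to profile
    return sum(i * i for i in range(2_000_000))

busy()
signal.setitimer(signal.ITIMER_PROF, 0)           # stop sampling
for stack, count in samples.most_common():
    print(f"{stack} {count}")
```

Piping such folded output through flamegraph.pl would render a flame graph in which each frame's width reflects its sample count.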
- Tracing is event-based recording, where event data is captured and saved for later analysis or consumed on-the-fly for custom summaries and other actions. ([Location 895](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=895))
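To contrast with the sampling sketch above, here is a sketch of event-based recording at the application level: every event produces a timestamped record, rather than a periodic snapshot. The handle_request() function is hypothetical:

```python
import functools
import time

# A sketch of event-based tracing: unlike sampling, every event is
# recorded with a timestamp, so nothing between samples is missed.
events = []

def traced(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        result = func(*args, **kwargs)
        events.append((start, func.__name__, time.perf_counter_ns() - start))
        return result
    return wrapper

@traced
def handle_request(n):  # hypothetical instrumented operation
    return sum(range(n))

for size in (10, 1_000, 100_000):
    handle_request(size)

for ts, name, dur_ns in events:
    print(f"{ts} {name} {dur_ns / 1e3:.1f} us")
```

The trade-off: tracing captures exact per-event timing at a per-event overhead that grows with the event rate, whereas sampling has a roughly fixed overhead set by the sample rate.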
- The Linux technology for kernel static instrumentation is called tracepoints. There is also a static instrumentation technology for user-space software called user statically defined tracing (USDT). USDT is used by libraries (e.g., libc) for instrumenting library calls and by many applications for instrumenting service requests. ([Location 901](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=901))
- Dynamic instrumentation creates instrumentation points after the software is running, by modifying in-memory instructions to insert instrumentation routines. ([Location 924](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=924))
- BPF, which originally stood for Berkeley Packet Filter, is powering the latest dynamic tracing tools for Linux. ([Location 937](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=937))
- Apart from observability tools there are also experimentation tools, most of which are benchmarking tools. These perform an experiment by applying a synthetic workload to the system and measuring its performance. ([Location 952](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=952))
- As an analogy: a car’s lap time at Laguna Seca Raceway could be considered a macro-benchmark, whereas its top speed and 0 to 60 mph time could be considered micro-benchmarks. ([Location 956](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=956))
- you have two hands, observability and experimentation. Only using one type of tool is like trying to solve a problem one-handed. ([Location 982](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=982))
- New difficulties caused by cloud computing and virtualization include the management of performance effects from other tenants (sometimes called performance isolation) and physical system observability from each tenant. ([Location 992](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=992))
- IOPS: Input/output operations per second is a measure of the rate of data transfer operations. For disk I/O, IOPS refers to reads and writes per second. ([Location 1178](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1178))
- Throughput: The rate of work performed. Especially in communications, the term is used to refer to the data rate (bytes per second or bits per second). In some contexts (e.g., databases) throughput can refer to the operation rate (operations per second or transactions per second). ([Location 1180](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1180))
- Response time: The time for an operation to complete. This includes any time spent waiting and time spent being serviced (service time), including the time to transfer the result. ([Location 1183](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1183))
- Latency: A measure of time an operation spends waiting to be serviced. In some contexts it can refer to the entire time for an operation, equivalent to response time. See Section 2.3, Concepts, for examples. ([Location 1185](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1185))
- Utilization: For resources that service requests, utilization is a measure of how busy a resource is, based on how much time in a given interval it was actively performing work. For resources that provide storage, utilization may refer to the capacity that is consumed (e.g., memory utilization). ([Location 1187](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1187))
- Saturation: The degree to which a resource has queued work it cannot service. ([Location 1189](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1189))
- Bottleneck: In systems performance, a bottleneck is a resource that limits the performance of the system. Identifying and removing systemic bottlenecks is a key activity of systems performance. ([Location 1191](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1191))
- Workload: The input to the system or the load applied is the workload. For a database, the workload consists of the database queries and commands sent by the clients. ([Location 1193](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1193))
- Cache: A fast storage area that can duplicate or buffer a limited amount of data, to avoid communicating directly with a slower tier of storage, thereby improving performance. For economic reasons, a cache is often smaller than the slower tier. ([Location 1195](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1195))
- Some components and resources can be modeled as a queueing system so that their performance under different situations can be predicted based on the model. Disks are commonly modeled as a queueing system, which can predict how response time degrades under load. ([Location 1210](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1210))
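As a sketch of why such models are useful: for an M/M/1 queue, a common first approximation for a single-disk model, the mean response time is R = S / (1 - U), where S is the service time and U the utilization. The 1 ms service time below is a hypothetical value:

```python
# A sketch of the M/M/1 queueing model: mean response time
# R = S / (1 - U), with service time S and utilization U.
# The 1 ms service time is a hypothetical disk.
service_time_ms = 1.0

for utilization in (0.2, 0.5, 0.8, 0.9, 0.95, 0.99):
    response_ms = service_time_ms / (1 - utilization)
    print(f"U={utilization:>4.0%}  R={response_ms:6.1f} ms")
```

Under this model, response time doubles between 50% and 75% utilization and grows without bound as utilization approaches 100%, which is why queueing models predict such sharp degradation under load.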
- The latency is the time spent waiting before an operation is performed. ([Location 1225](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1225))
- File system record size (or block size): Small record sizes, close to the application I/O size, will perform better for random I/O workloads and make more efficient use of the file system cache while the application is running. Large record sizes will improve streaming workloads, including file system backups. ([Location 1296](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1296))
- Network buffer size: Small buffer sizes will reduce the memory overhead per connection, helping the system scale. Large sizes will improve network throughput. ([Location 1298](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1298))
- Performance tuning is most effective when done closest to where the work is performed. For workloads driven by applications, this means within the application itself. ([Location 1302](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1302))
- systems performance is not just about cost: it is also about the end-user experience. ([Location 1336](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1336))
- What may have been the best advice from a performance expert one week may become invalid a week later after a software or hardware upgrade, or after adding more users. ([Location 1365](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1365))
- The performance of the system under increasing load is its scalability. ([Location 1389](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1389))
- Common types of system performance metrics include:
    - Throughput: Either operations or data volume per second
    - IOPS: I/O operations per second
    - Utilization: How busy a resource is, as a percentage
    - Latency: Operation time, as an average or percentile ([Location 1418](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1418))
- 100% busy does not mean 100% capacity. ([Location 1460](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1460))
- utilization usually refers to the time-based version, which you could also call non-idle time. The capacity version is used for some volume-based metrics, such as memory usage. ([Location 1465](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1465))
- The degree to which more work is requested of a resource than it can process is saturation. Saturation begins to occur at 100% utilization (capacity-based), as extra work cannot be processed and begins to queue. ([Location 1468](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1468))
- Profiling builds a picture of a target that can be studied and understood. In the field of computing performance, profiling is typically performed by sampling the state of the system at timed intervals and then studying the set of samples. ([Location 1476](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1476))
- Caching is frequently used to improve performance. A cache stores results from a slower storage tier in a faster storage tier, for reference. An example is caching disk blocks in main memory (RAM). ([Location 1483](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1483))
- CPUs commonly employ multiple hardware caches for main memory (Levels 1, 2, and 3), beginning with a very fast but small cache (Level 1) and increasing in both storage size and access latency. ([Location 1485](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1485))
- One metric for understanding cache performance is each cache’s hit ratio—the number of times the needed data was found in the cache (hits) versus the total accesses (hits + misses): hit ratio = hits / (hits + misses) ([Location 1490](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1490))
- Most recently used (MRU) refers to a cache retention policy, which decides what to favor keeping in the cache: the objects that have been used most recently. ([Location 1508](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1508))
- Least recently used (LRU) can refer to an equivalent cache eviction policy, deciding what objects to remove from the cache when more space is needed. ([Location 1509](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1509))
- A cold cache is empty, or populated with unwanted data. ([Location 1514](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1514))
- A warm cache is one that is populated with useful data but doesn’t have a high enough hit ratio to be considered hot. ([Location 1516](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1516))
- A hot cache is populated with commonly requested data and has a high hit ratio, for example, over 99%. ([Location 1517](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1517))
- Cache warmth describes how hot or cold a cache is. ([Location 1519](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1519))
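To tie the hit ratio, LRU eviction, and warmth concepts together, a minimal LRU cache sketch in Python; the capacity, key space, and uniform access pattern are hypothetical:

```python
import collections
import random

# A minimal LRU cache sketch: a hit moves the key to the
# most-recently-used end; a miss evicts the least recently used key
# when the cache is full, then loads from the (stand-in) slow tier.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = collections.OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)     # mark as most recently used
            return self.data[key]
        self.misses += 1
        if len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # evict least recently used
        self.data[key] = f"block-{key}"    # stand-in for a slow-tier read
        return self.data[key]

cache = LRUCache(capacity=100)
random.seed(1)
for _ in range(10_000):
    cache.get(random.randint(0, 199))      # uniform accesses over 200 keys
print(f"hit ratio: {cache.hits / (cache.hits + cache.misses):.2%}")
```

The cache starts cold (empty); as it fills with the 100 most recently used of the 200 keys, it warms and the hit ratio settles near 100/200, roughly 50%.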
- Performance is a field where “the more you know, the more you don’t know.” The more you learn about systems, the more unknown-unknowns you become aware of, which are then known-unknowns that you can check on. ([Location 1535](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1535))
- Resource analysis begins with analysis of the system resources: CPUs, memory, disks, network interfaces, buses, and interconnects. It is most likely performed by system administrators—those responsible for the physical resources. ([Location 1544](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1544))
- Metrics best suited for resource analysis include:
    - IOPS
    - Throughput
    - Utilization
    - Saturation ([Location 1552](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1552))
- Workload analysis (see Figure 2.11) examines the performance of applications: the workload applied and how the application is responding. It is most commonly used by application developers and support staff—those responsible for the application software and configuration. ([Location 1560](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1560))
- The targets for workload analysis are:
    - Requests: The workload applied
    - Latency: The response time of the application
    - Completion: Looking for errors ([Location 1564](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1564))
- Latency (response time) is the most important metric for expressing application performance. ([Location 1572](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1572))
- Metrics best suited for workload analysis include:
    - Throughput (transactions per second)
    - Latency ([Location 1581](https://readwise.io/to_kindle?action=open&asin=B08J5QZPNC&location=1581))
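As a small illustration of those workload-analysis targets, the sketch below derives throughput (requests), latency, and completion (error) metrics from a request log; the record layout and values are invented:

```python
# A sketch of workload analysis metrics from a hypothetical request
# log: each record is (timestamp_s, latency_ms, ok).
requests = [
    (0.1, 12.0, True), (0.4, 180.0, True), (0.9, 15.0, False),
    (1.2, 22.0, True), (1.8, 95.0, True),  (2.5, 30.0, True),
]

duration_s = requests[-1][0] - requests[0][0]
latencies = sorted(lat for _, lat, _ in requests)
errors = sum(1 for _, _, ok in requests if not ok)

print(f"throughput: {len(requests) / duration_s:.1f} requests/s")
print(f"avg latency: {sum(latencies) / len(latencies):.1f} ms")
print(f"max latency: {latencies[-1]:.1f} ms")
print(f"errors: {errors}")
```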