Benefits of observability and monitoring

Using observability and monitoring

There is increasing demand from organizations, businesses, and developers for flexible observability and monitoring solutions that can collect all the data about their environment in one place.

First off, you can only monitor an observable system. Observability is the ability to understand a system’s internal state purely from its outputs. That ability is a powerful tool: instead of detective work that distracts your team from product development, an observability tool lets you ask arbitrary questions about how your code is behaving. Observability provides insight into a system’s overall state and is usually tied to its infrastructure.

Ops, engineering, and SRE teams use observability to actively debug their systems and, in doing so, to explore failure modes that were never defined or anticipated up front. This is pertinent because code can behave differently in production than it did in staging; since users are affected by those differences, it is important to observe what is happening in production.

Benefits of observability:

  • Insight into the internal workings of production, so improvements can be made and end-users enjoy seamless usage
  • Monitoring of application performance
  • Easier identification of root causes and faster troubleshooting
  • Intuitive dashboards showing what is happening in real time
  • Integration with self-healing infrastructure
  • Readily accessible information

Monitoring is a practice that empowers SRE and Ops teams to look at and understand the different states of their system. This is done through pre-established metrics and dashboard reports that are updated continuously in real time.

What do observability and monitoring have in common?

Observability and monitoring are symbiotic, yet they serve different purposes. Observability means the data is accessible, whereas monitoring means collecting and displaying that data and then relying on it for further analysis and alerting.

The three pillars of observability

Observability is commonly divided into three core pillars: metrics, logs, and traces.

Metrics

These are numbers that describe a particular process or activity, measured over intervals of time. A metric is a collection of data about the performance of a service; most often it is a single number tracked over time. Traditionally, system metrics such as CPU, memory, and disk performance were relied upon for tracking. Typical data includes the number of queries in a given time frame, the latency of service requests or queries, and CPU profiles.

The major drawback of system metrics is that, while they give enough information about the system, they reveal little about the user experience or about ways to improve your code’s performance. Modern monitoring services now offer APM as a solution to this gap: APM tools track application-level metrics such as request rates (for example, requests per minute) and error rates.

Each metric tracks a single variable and is comparatively cheap to send and store. DevOps, Ops, and SRE teams usually play a big role in determining the best set of metrics to watch, and that set varies with the service itself and its overall maturity. Often, teams watch their metrics dashboards for changes whenever a new fix or release ships. Common metric sources include system metrics (CPU, memory, disk), infrastructure metrics, web tracking scripts (Google Analytics, digital experience management), application agents (APM, error tracking), and business metrics (revenue, customer sign-ups, bounce rate, cart abandonment).
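By way of illustration, here is a minimal sketch of recording application-level metrics with the Micrometer library in Java; the registry setup, metric names, and tags are assumptions for the example, not FusionReactor-specific APIs.

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.time.Duration;

public class RequestMetrics {
    public static void main(String[] args) {
        // In a real service this registry would be backed by your monitoring system.
        MeterRegistry registry = new SimpleMeterRegistry();

        // Count requests, tagged by HTTP status, so rates and error ratios can be derived.
        registry.counter("http.server.requests", "status", "200").increment();
        registry.counter("http.server.requests", "status", "500").increment();

        // Record request latency as a timer, which tracks count, total time, and max.
        Timer latency = registry.timer("http.server.latency");
        latency.record(Duration.ofMillis(42));

        System.out.println(registry.getMeters().size() + " meters registered");
    }
}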

Logs

Logs are time-stamped, immutable records of events that have occurred. They can be relied upon to identify specific patterns in a system, and they generally represent the output of your code. Every process within a system emits logs; these logs contain records of individual user queries plus debugging information associated with the service. Programming languages and frameworks rely on libraries to generate logs from running code at a number of distinct levels of specificity. Information worth including in logs comprises a timestamp, MAC address, session ID, source ID, source IP, status code, response time, and HTTP headers. The process of analyzing the contents of logs through queries is called log monitoring.
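For example, here is a minimal SLF4J sketch that attaches a few of those fields to a log event via the Mapped Diagnostic Context (MDC); the class, field names, and values are illustrative assumptions rather than a required schema.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class RequestLogger {
    private static final Logger log = LoggerFactory.getLogger(RequestLogger.class);

    public void logRequest(String sessionId, String sourceIp, int statusCode, long responseTimeMs) {
        // MDC values are picked up by the logging backend and written alongside the message.
        MDC.put("sessionId", sessionId);
        MDC.put("sourceIp", sourceIp);
        try {
            log.info("request completed status={} responseTimeMs={}", statusCode, responseTimeMs);
        } finally {
            MDC.clear(); // avoid leaking context onto unrelated log events
        }
    }
}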

Traces

A trace displays the flow of time-stamped operations from a parent event to its children. The individual events forming a trace are called spans. Each span stores a start time, a duration, and a parent ID; spans without a parent ID are rendered as root spans.
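To make that structure concrete, here is a minimal, assumed data-structure sketch of a span in Java; real tracing libraries (including FusionReactor’s own tracing) carry considerably more context, so treat this purely as an illustration.

import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// A span records one unit of work within a trace.
record Span(String traceId, String spanId, Optional<String> parentId,
            String operation, Instant startTime, Duration duration) {

    // A span with no parent ID is the root of its trace.
    boolean isRoot() {
        return parentId.isEmpty();
    }
}

class SpanExample {
    public static void main(String[] args) {
        Span root = new Span("trace-1", "span-1", Optional.empty(),
                "GET /checkout", Instant.now(), Duration.ofMillis(120));
        Span child = new Span("trace-1", "span-2", Optional.of(root.spanId()),
                "SELECT orders", Instant.now(), Duration.ofMillis(35));
        System.out.println(root.isRoot() + " " + child.isRoot()); // true false
    }
}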

With traces, individual execution flows can be followed through the system, which goes a long way toward helping teams figure out which component or piece of code is responsible for a potential system error. Teams can adopt dedicated tracing tools to look into specific requests. Analyzing trace spans, including waterfall views that show multiple spans across your system, helps you run queries to examine latency and errors. FusionReactor provides tracing capabilities as one of its core offerings.

APM (Application Performance Monitoring)

There is increasing demand from organizations and businesses for a flexible observability solution that can collect all the data about their environment in one place. With an APM solution, you gain insight into real-time customer behavior, application errors, or a drop in conversion rates. With a reliable observability tool, you get a better understanding of your system no matter how complex the infrastructure you are dealing with. The benefits of relying on a good observability tool include:

  • Faster applications
  • Customer-specific service level objectives (SLOs) integrated with detectors that alert as soon as issues arise
  • Reliable downsizing analysis and capacity planning that can save significant money
  • A reduction in the number of pending CI jobs

Also, if you are having issues migrating to a microservices-based environment, you are not alone: others have had similar problems. Until now, many relied on multiple monitoring tools to gain visibility into their complex application systems.

Advantages of using an APM:

  • Resolve uptime and latency issues at a 60% faster rate
  • Get real-time alerts when issues arise, as opposed to hours with multiple monitoring tools
  • Send and analyze over 300 metrics with 4x more granularity than before
  • Real-time observability across the environment
  • Reduction of MTTR in production from several minutes to seconds
  • Access to complex data analytics and better metrics correlation

The ultimate goal of observability and monitoring

The ultimate goal of observability and monitoring is to improve the system. Research groups such as DevOps Research and Assessment (DORA) study monitoring and observability practices across the industry. Observability can help you integrate your tools properly, run analyses for faster and more timely incident resolution, and support ongoing team learning. Lastly, for overall convenience, you can share the derived information with relevant stakeholders.


Effective ways to Avoid Application Performance Overhead with Log data

It becomes critical for every forward-thinking organization to adopt effective ways to avoid performance overhead when managing their logs.

Log data remains an untapped or poorly managed resource for many organizations. In fact, logs are enormously valuable: they unlock the potential to understand customer behavior, analyze usage and security data, and solve performance issues. However, these core benefits are undermined when every log is treated as if it must have value, because the cost of ingesting, storing, and querying log data can far exceed the value it creates.

While several log management solutions deliver innovative features such as machine learning (ML) and AIOps for easier monitoring, they miss the pivotal point that all of this increases application performance overhead. Not all log information is actionable, and logging everything increases the total cost of ownership, especially when licensing models are priced by volume. Log management should instead be treated as an affordable commodity service that minimizes the cost and complexity of using log data.

Here are some strategies organizations can adopt to avoid performance overhead with log data:

  • Sort logs according to source groups
  • Define log severity levels
  • Define log processing pipeline

Sort logs according to source groups

Log data is generated by events occurring on the devices, applications, and services installed throughout the network. Depending on the kind of network, services, or applications being managed, some system administrators will find certain log data more interesting than others.

Clearly defined source groups help the system administrator navigate to the problem quickly when investigating events. For instance, if there is security-related downtime in the network infrastructure, the admin can go straight to the security log group and fix the issue with certainty.

As often seen in computer networks, log groups are primarily divided into three categories:

  • Computer security logs: contain information about possible attacks, intrusions, viruses, and authentication actions against a device.
  • Application logs: contain information about applications, database servers, and infrastructure data.
  • Operating system logs: contain information about operating-system-related events.

These log source groups can be used to filter and divide log data during searches and analysis to remove irrelevant information. This helps network administrators save time, since the number of log messages that have to be searched and analyzed is reduced.
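As a rough sketch of that idea (the group names and record shape are assumptions for illustration, not a prescribed model), filtering a collection of log records down to a single source group might look like this:

import java.util.List;

enum LogGroup { SECURITY, APPLICATION, OPERATING_SYSTEM }

record LogEntry(LogGroup group, String message) {}

class LogGroupFilter {
    // Keep only the entries belonging to the group under investigation.
    static List<LogEntry> byGroup(List<LogEntry> entries, LogGroup group) {
        return entries.stream().filter(e -> e.group() == group).toList();
    }

    public static void main(String[] args) {
        List<LogEntry> entries = List.of(
                new LogEntry(LogGroup.SECURITY, "failed login for admin"),
                new LogEntry(LogGroup.APPLICATION, "order service started"));
        System.out.println(byGroup(entries, LogGroup.SECURITY)); // only the security entry
    }
}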

Define log severity levels

It is imperative to define log severity levels because they describe the seriousness of the events within the logs. Severity levels make it easy to filter out verbose log information, and the severity of log data gives valuable insight into a device’s operations and current state of affairs. System admins can therefore respond to each level appropriately: they can be notified promptly about severe log events, while notifications for less severe events can arrive periodically. Application logging frameworks such as slf4j or log4j, for instance, can segregate logs according to their severity levels.
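As a hedged illustration of how code typically emits events at different severity levels, here is a minimal SLF4J sketch; the class and messages are hypothetical.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CheckoutService {
    private static final Logger log = LoggerFactory.getLogger(CheckoutService.class);

    public void checkout(String cartId) {
        log.debug("Loading cart {}", cartId);                     // developer-level detail
        log.info("Checkout started for cart {}", cartId);         // operational event
        if (cartId == null || cartId.isBlank()) {
            log.warn("Checkout requested with an empty cart id"); // potentially harmful condition
            return;
        }
        try {
            // ... place the order ...
        } catch (RuntimeException e) {
            log.error("Checkout failed for cart {}", cartId, e);  // application or system error
        }
    }
}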

The standards for measuring log severity depend strongly on the operating system, application, and devices that generated the log data. Some severity scales are represented with numbers ranging from 0 to 7, while others are described with keywords such as FATAL, ERROR, WARN, INFO, and DEBUG.

Number standard

Numerical code   Severity
0                Emergency: system is unusable
1                Alert: action must be taken immediately
2                Critical: critical conditions
3                Error: error conditions
4                Warning: warning conditions
5                Notice: normal but significant condition
6                Informational: informational messages
7                Debug: debug-level messages

Acronyms

Level   Description                                          Example
DEBUG   Information for programmers and system developers    log.debug
INFO    Operational events                                   log.info
WARN    A potentially harmful condition                      log.warn
ERROR   An application or system error                       log.error

Define log processing pipeline

Another important strategy is to define your company’s log processing pipeline. The pipeline describes the processing that logs go through during their lifetime, and a complete understanding of it gives system admins the right perspective on storing and analyzing log data for maximum efficiency.

There are four significant steps a log passes through:

Processing: the logs pass through several steps such as parsing, filtering, and event aggregation. During this phase, the system can also normalize the appearance of log data into a more uniform template.

Storage: after the logs have been processed, they are stored so that they remain available for later activities such as log rotation, log compression, log archiving, and integrity checking.

Analysis: logs are reviewed with correlation and relationship-finding tools to gain valuable insights into the links between past and present events or the peculiarity of current events.

Disposal: logs are typically stored for a specified period to satisfy PCI DSS and other relevant compliance requirements. Once that period elapses, the technical admins, aligned with the company’s framework for centralized log management, can decide whether to keep or dispose of the logs.
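The sketch below illustrates those four stages end to end, under the assumption of a toy, in-memory pipeline over plain string log lines; a real pipeline would involve persistent storage and far richer processing.

import java.util.ArrayList;
import java.util.List;

class LogPipeline {
    // Processing: parse, filter, and normalize raw lines into a uniform shape.
    static List<String> process(List<String> rawLines) {
        return rawLines.stream()
                .filter(line -> !line.isBlank())
                .map(String::trim)
                .toList();
    }

    // Storage: persist processed logs so they are available later (rotation, archiving, ...).
    static List<String> store(List<String> processed) {
        return new ArrayList<>(processed); // stand-in for writing to disk or object storage
    }

    // Analysis: query the stored logs, e.g. count error events.
    static long analyze(List<String> stored) {
        return stored.stream().filter(line -> line.contains("ERROR")).count();
    }

    // Disposal: drop logs once the retention period has elapsed.
    static void dispose(List<String> stored) {
        stored.clear();
    }

    public static void main(String[] args) {
        List<String> stored = store(process(List.of(" ERROR db timeout ", "INFO started", "  ")));
        System.out.println("errors: " + analyze(stored));
        dispose(stored);
    }
}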

How to avoid application performance overhead with log data

Every innovation comes with advantages and disadvantages. Organizations that wish to make efficient use of their logs must implement effective ways to avoid application performance overhead when managing them. Given the benefits of doing so, namely a reduced total cost of ownership (TCO), simplicity, and scalability, we will see more companies master the art of log monitoring while avoiding application performance overhead in the process.


Best practices to optimize and enhance log data

Best practices to optimize and enhance log data, such as data compression and log parsing, should be considered effective ways to minimize the mounting costs of monitoring and querying logs.

The massive adoption of cloud, AI, machine learning, social media, and mobile data technologies has led to an increasing volume and variety of log data produced by many organizations today. Since organizations depend heavily on their ability to get valuable insights from logs, they must also find innovative ways to optimize and enhance log data to avoid challenges such as mounting storage costs, manual querying during incident investigation, integration difficulties, and the need for customization. Adopting best practices to optimize and enhance log data is the next step once you have developed your organization’s logging framework for log management.

The problem of logging cost

Log management is a standard practice most organizations follow to maintain operational observability. Unfortunately, conventional log management solutions were not designed to handle the enormous volume and variety of logs produced daily. For instance, a sizeable production operation can easily generate 100 gigabytes of logs or more per day, and that is before you account for the cost of monitoring those logs.

According to a survey by IDC, CIOs have recognized log data cost as their worst nightmare to overcome. Despite the massive volume of logs produced, most log data is never queried or analyzed, yet it accounts for the lion’s share of logging costs. Some enterprises have no choice but to limit even relevant logging because of the overwhelming cost of monitoring everything, and because volumes are volatile, it is hard to predict costs from the get-go.

Best practices to optimize and enhance log data

We firmly believe that the future of log optimization and enhancement lies in innovative analytics and automated querying and correlation capabilities, while leveraging cloud and other log-related technologies in a cost-efficient manner.

Data Compression

Internet bandwidth is the volume of information that can be transferred over a connection within a given period. Bandwidth is crucial because it determines how much log data can be transferred, the transmission time, and the projected cost. Since CIOs see log data cost as their worst challenge in production, enterprises need ways to reduce log volume without compromising log quality. Many companies have adopted data compression as a way to optimize and enhance log data, especially when log data cost is directly proportional to the volume of network bandwidth consumed.

Transmitting raw logs from the agent to the log server in plain text is an inefficient use of network bandwidth and limits performance. Data compression reduces network bandwidth usage: the logs transmitted remain the same, while transmission and storage sizes shrink. The compression algorithm converts the raw data into a compressed format, which is re-inflated when received by the log server.
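As a minimal sketch of the idea, here is how a batch of log lines could be GZIP-compressed before shipping, using Java’s built-in java.util.zip support; the payload here is a made-up example.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class LogCompression {
    // Compress a log payload with GZIP before it is sent to the log server.
    static byte[] compress(String payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String logs = "INFO request ok\n".repeat(1000); // repetitive text compresses well
        byte[] compressed = compress(logs);
        System.out.printf("raw=%d bytes, compressed=%d bytes%n", logs.length(), compressed.length);
    }
}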

Data Parsing

Parsing and indexing go hand in hand. When log volumes are huge, parsing them is the next step toward a deeper understanding of where they came from, what occurred, and how they should be stored. Parsing converts logs into data fields that are easier to index, query, and store. Helpfully, most log monitoring solutions have default settings that parse logs and collect key-value pairs based on standard delimiters such as the colon or equals character.
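As an illustrative sketch (the log line format is an assumption for the example), extracting key-value pairs split on an equals delimiter could be as simple as:

import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueParser {
    // Parse "key=value" tokens from a space-separated log line into named fields.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "level=info method=GET path=/healthz status=200 duration=12ms";
        System.out.println(parse(line)); // {level=info, method=GET, path=/healthz, status=200, duration=12ms}
    }
}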

There are a few critical parsing rules that are pivotal when handling multiline log statements. They include:

  • Replace data: set rules to replace content at index time when certain conditions are met.
  • Discard data: set rules to discard logs when the log messages are no longer needed.

Metadata parsing

Parsing can also extract valuable pieces of information from log metadata. For instance, in cloud infrastructure, the cloud provider’s default metatags can be extracted to provide more context and enrich the data.

Log transfer reliability

Loss of server connectivity can lead to missing logs, which impacts their integrity. To protect the reliability and integrity of logs, log monitoring solutions should use methods that preserve logs in transit. Some APM solutions, for example, use an advanced log transfer protocol (ALTP) to ensure data persistence.

Deduplication

Duplicate logs are one of the most complicated problems in log management. Since logs are transmitted over various network paths, it is possible to collect duplicates from the different servers in the network. Parsing rules can deduplicate logs, which invariably optimizes and enhances their quality.
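A minimal, assumed sketch of one deduplication approach is to fingerprint each record and drop repeats; real systems would usually key on something more robust than the raw line.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LogDeduplicator {
    // Keep only the first occurrence of each identical log line.
    static List<String> deduplicate(List<String> lines) {
        Set<String> seen = new HashSet<>();
        return lines.stream().filter(seen::add).toList();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "ERROR db timeout host=a",
                "ERROR db timeout host=a",   // duplicate delivered by a second collector
                "INFO request ok host=b");
        System.out.println(deduplicate(lines)); // duplicate removed
    }
}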

Conclusion – Best practices to optimize and enhance log data

The volume of logs generated will keep increasing as the adoption of cloud, AI, machine learning, social media, and mobile data technologies continues. Moreover, conventional log management solutions are inefficient in terms of logging costs. To optimize and enhance log data, organizations must find new and innovative methods, such as log compression and log parsing, and enforce a log management policy.


Unmasking unstructured logs with FusionReactor Cloud Logging LogQL pattern parser

Log data represents an untapped or mismanaged resource for many organizations. Even when they can harness valuable insights from structured logs through their log management system, unmasking unstructured logs poses the biggest challenge.

However, with FusionReactor Cloud’s logging feature, writing LogQL queries to access and parse unstructured log formats just got easier, and parsing unstructured log data can be done much faster than with a conventional parser. Let’s unmask more!

Log-parsing challenges

When log volumes are large, parsing helps convert them into simple data fields that can be queried with LogQL. Writing regex-based parsing queries can be challenging and time-consuming, unlike queries against JSON and logfmt, which are easy to use and fast.

With LogQL, performing a full text-based search to analyze unstructured logs becomes simple. FusionReactor logging comes with LogQL parsers that handle JSON, regex, and logfmt.

For instance, when extracting labels and values from NGINX logs, finding the rate of requests by status and method can be challenging. Consider the regex query highlighted within this example:

sum by (method, status) (rate({stream="stdout", container="nginx"}
  | regexp `^(\S+) (?P<user_identifier>\S+) (?P<user>\S+) \[(.*)\] "(?P<method>\S+) (?P<path>\S+) HTTP/(?P<http_version>\d+\.\d+)" (?P<status>\d+) (?P<bytes_sent>\d+|-)`
  [1m] ) )

Using FusionReactor Pattern Parser

The latest FusionReactor logging feature comes with a pattern parser that is simple to use and produces decisive results when extracting insights from unstructured logs. It is important to note that the pattern parser expresses its output very differently from regular expressions, and it parses logs faster than the other parsers.

To illustrate this capability, here is the same query written using the FusionReactor logging pattern parser:

sum by (method,status) (rate({stream="stdout" ,container="nginx"}
| pattern `<_> - - <_> "<method> <_> <_>" <status> <_> <_> "<_>" <_>`
[$__interval] ) )

Pattern parser syntax and semantics

To invoke the pattern parser, we specify the following expression within the LogQL query:

| pattern "<pattern-expression>"

or

| pattern `<pattern-expression>`

<pattern-expression> defines the structure the log line is expected to have. The expression is composed of captures and literals.

Notice that the < and > characters define a named capture: referring to the example above, <status> captures the field named status, while <_> is an unnamed capture that skips and ignores the matched content in the log line.
With such expressions, it is easy to tell when a capture does not match, because captures are matched from the start of the line through the rest of the log query. An advantage of the FusionReactor logging pattern parser is that if a capture does not match, the parser immediately stops extracting data from that log line.

Understanding the Pattern parser with examples

The following example applies the pattern parser to an NGINX log. We will use a table to dissect the log line expression.

NGINX log line field     NGINX sample                      <pattern expression>
$remote_addr             203.0.113.0                       <_>
$remote_user
[$time_local]            [08/Nov/2021:19:12:04 +0000]      <_>
"$request"               "GET /healthz HTTP/1.1"           "<method> <_> <_>"
$status                  200                               <status>
$bytes_sent              15                                <_>
"$http_referer"          "-"                               <_>
"$http_user_agent"       "GoogleHC/1.0"                    "<_>"
"-" "-"                  "-" "-"                           <_>

Kubernetes example

Let’s look at another instance, this time in a Kubernetes environment. Here the pattern parser is used to query an Envoy proxy, returning the 99th-percentile latency per path and method. The metric is measured in seconds.

quantile_over_time(0.99, {container="envoy"}
  | pattern `[<_>] "<method> <path> <_>" <_> <_> <_> <_> <latency>`
  | unwrap latency [$__interval] ) by (method,path) / 1e3

Moving Forward

FusionReactor’s log parsing capabilities make it easier and faster to parse logs across multiple environments.

How to create fast LogQL queries to filter terabytes of logs per second with FusionReactor

Performance matters when retrieving and analyzing data. That is why the ability to create fast queries that filter terabytes of logs is critical: fast retrieval is only as good as the efficiency of the queries behind it.

FusionReactor with LogQL makes it easier to create fast queries that filter terabytes of logs per second. This article breaks down the key concepts and gives you simple tips for creating fast queries in seconds.
