Performance analysis with flame graph under Linux

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please credit the source when reprinting. Thank you for your cooperation.

My technical level and knowledge are limited, so if anything here is lacking or needs correcting, corrections are welcome. Suggestions of other good debugging tools to include are welcome too. Thank you in advance.

Software performance analysis often requires looking at where CPU time is spent in order to understand where the bottleneck is.

The flame graph is a powerful tool for this kind of analysis.

1 Introduction to flame graphs

When people catch a cold and run a fever, many of them imitate Shennong tasting the hundred herbs: they try antiviral drugs first, then antibiotics, and take whatever happens to be in the house, Chinese or Western medicine alike, hoping that a blind cat will eventually stumble on a dead mouse. This is naturally not advisable. The right way is to go to the hospital for a blood test and take the right medicine once there is a diagnosis.

Now recall how we usually debug programs: we tend to rely on subjective assumptions with no data behind them, instead of working out what actually caused the problem.

There is no doubt that tuning a program's performance also calls for prescribing the right medicine. The good news is that Brendan D. Gregg invented the flame graph.

1.1 Flame graph

Common types of flame graphs are On-CPU and Off-CPU, and there are many more, such as the red/blue differential flame graph covered in section 4.

For a detailed introduction to the flame graph, please refer to [...]. In short, the whole graphic looks like a flickering flame, which is where the name comes from. What burns at the tip of each flame is the operation the CPU is currently executing. Note that the colors are random and carry no meaning of their own. The vertical axis shows the depth of the call stack, and the horizontal axis shows the time consumed, measured as a share of samples. Call stacks are sorted alphabetically along the horizontal axis and identical stacks are merged, so the wider a box is, the more likely it is a bottleneck. To sum up, look mainly at the wider flames, and pay special attention to those with a flat top, like a plateau.

To generate a flame graph you need a handy tracing tool. If the operating system is Linux, the choice is usually perf or SystemTap. perf is the more commonly used of the two, because it is a performance tuning tool built into the Linux kernel and most Linux distributions include it. Interested readers can later refer to the introduction in Linux Profiling at Netflix, especially its description of how to deal with the broken stacks problem, which is worth reading several times. SystemTap is the more powerful of the two, but the drawback is that you first have to learn its own scripting language.

In the early days, flame graphs were most active in the [...] community. If you develop or optimize [...], I strongly recommend Chun's [...]. At first glance the name might make you think the toolkit is dedicated to [...], but in fact many of its tools work for programs written in any [...] language:

  • [...]: sampling data used to generate the On-CPU flame graph
  • [...]: sampling data used to generate the Off-CPU flame graph

1.2 On-CPU and Off-CPU flame graphs

So when should you use the On-CPU flame graph, and when the Off-CPU flame graph?

It depends on what the current bottleneck is. If the bottleneck is the CPU, use the On-CPU flame graph; if it is I/O or locks, use the Off-CPU flame graph. If you are not sure, a load-testing tool can settle it: check whether the tool can drive CPU utilization toward saturation. If it can, use the On-CPU flame graph. If CPU utilization never rises no matter how hard you push, the program is most likely stuck on I/O or locks, and the Off-CPU flame graph is the one to use.

If you still cannot tell, generate both the On-CPU and the Off-CPU flame graph. Under normal circumstances the two differ considerably. If they look similar, it usually means the CPU is being preempted by other processes.

When sampling, it is best to keep the program under load with a load-testing tool so that enough samples are collected. As for the choice of tool: if you choose [...], remember to turn on its [...] option to avoid exhausting the system's available ports. I also recommend trying a more modern load-testing tool such as [...].

1.3 Flame graph visualization generator

Brendan D. Gregg's Flame Graph project implements a set of scripts for generating flame graphs.

The Flame Graph project is hosted on GitHub. Clone it with git:

git clone https://github.com/brendangregg/FlameGraph.git

Generating a flame graph takes the following steps:

  1. Capture stacks: use perf or another tool to capture the running stacks of the program.
  2. Fold stacks: the tracing tool captures the stacks of the system and the program at every sampling moment. These must be parsed and combined, with repeated stacks added up, so that the result reflects the load and the critical path.
  3. Generate the flame graph: analyze the stack information output by stackcollapse and render the flame graph.
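The folding step can be illustrated with a minimal Python sketch. This is not the stackcollapse implementation; the function name `fold_stacks` and the sample data are made up for illustration. Each sample is a list of frames from root to leaf, and identical stacks are merged into one semicolon-joined line with a count.

```python
from collections import Counter

def fold_stacks(samples):
    """Merge identical call stacks into 'root;...;leaf count' lines."""
    counts = Counter(";".join(frames) for frames in samples)
    # Sort alphabetically so stacks sharing a prefix end up next to each other.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical samples: each list is one captured stack, root frame first.
samples = [
    ["main", "parse", "read"],
    ["main", "parse", "read"],
    ["main", "render"],
]
folded = fold_stacks(samples)
for line in folded:
    print(line)
```

The two identical stacks collapse into a single line with a count of 2, which is the form a flame graph renderer consumes.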

Different tracing tools capture different information, so Flame Graph provides a series of stackcollapse scripts, one per tool:

  • stackcollapse.pl: for DTrace stacks
  • stackcollapse-perf.pl: for Linux perf_events "perf script" output
  • stackcollapse-pmc.pl: for FreeBSD pmcstat -G stacks
  • stackcollapse-stap.pl: for SystemTap stacks
  • stackcollapse-instruments.pl: for XCode Instruments
  • stackcollapse-vtune.pl: for Intel VTune profiles
  • stackcollapse-ljp.awk: for Lightweight Java Profiler
  • stackcollapse-jstack.pl: for Java jstack(1) output
  • stackcollapse-gdb.pl: for gdb(1) stacks
  • stackcollapse-go.pl: for Golang pprof stacks
  • stackcollapse-vsprof.pl: for Microsoft Visual Studio profiles

2 Generating a flame graph with perf

2.1 Collecting data with perf

Let's start with the perf command (short for performance). It is the performance analysis tool that Linux provides natively, and it returns the name of the function the CPU is executing along with the call stack.

sudo perf record -F 99 -p 3887 -g -- sleep 30


In this command:

  • perf record collects system events. Since -e is not used to specify an event, the default event, cycles (that is, CPU clock cycles), is sampled.
  • -F 99 means 99 samples per second.
  • -p 3887 is the process ID, that is, which process to analyze.
  • -g means call stacks are recorded.
  • sleep 30 means sampling lasts 30 seconds.

-F sets the sampling frequency to 99 Hz (99 times per second). If all 99 samples return the same function name, the CPU spent that whole second executing the same function, and there may be a performance problem.

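When the run finishes, perf writes the samples to a perf.data file, which becomes huge once dumped as text. If a server has N CPUs and each is sampled 99 times per second for 30 seconds, you get 99 × 30 × N = 2,970 × N call stacks, which can run to hundreds of thousands or even millions of lines.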
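As a quick check of that arithmetic, here is a small Python sketch. The helper name `expected_stacks` and the figure of 16 CPUs are assumptions chosen only for illustration; the result is an upper bound that holds when every CPU is sampled on every tick.

```python
def expected_stacks(cpus, freq_hz, seconds):
    """Upper bound on call stacks collected when every CPU is sampled."""
    return cpus * freq_hz * seconds

# 16 CPUs is an assumed figure for illustration only.
total = expected_stacks(cpus=16, freq_hz=99, seconds=30)
print(total)
```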

For ease of reading, the perf report command counts the percentage of samples in which each call stack appears and sorts them from high to low.

sudo perf report -n --stdio

2.2 Generating the flame graph

First, use the perf script tool to parse perf.data:

# Parse the recorded samples into call stacks
perf script -i perf.data > perf.unfold

The parsed output is saved for generating the flame graph.

Next, use stackcollapse-perf.pl to fold the call stacks that perf script wrote to perf.unfold:

# Fold the call stacks
./stackcollapse-perf.pl perf.unfold > perf.folded

Finally, generate the SVG flame graph:

# Generate the flame graph
./flamegraph.pl perf.folded > perf.svg

These steps can be combined into a single command with pipes:

perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > process.svg

3 Analyze the flame graph

Finally, you can use the browser to open the flame graph for analysis.

3.1 The meaning of the flame graph

The flame graph is an SVG image generated from perf output, used to show the CPU's call stacks.

The y axis represents the call stack; each layer is a function. The deeper the call stack, the higher the flame. The top is the function being executed, and below it are its parent functions.

The x axis represents the number of samples. The wider a function is along the x axis, the more often it was sampled, that is, the longer it ran. Note that the x axis does not represent the passage of time: all call stacks are merged and then arranged in alphabetical order.

Reading a flame graph means looking for the top-layer functions that take up the most width. A "flat top" (plateau) means that function may have a performance problem.

The colors have no special meaning. Because the flame graph shows how busy the CPU is, warm colors are generally chosen.
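The search for "flat tops" can also be done numerically on folded stacks. The Python sketch below is only an illustration with made-up data, and the helper name `self_samples` is hypothetical. It counts how often each function was the topmost frame, which corresponds to the total width of its plateaus in the graph. Unlike the graph, it sums plateaus of the same function that sit on different parents.

```python
from collections import Counter

def self_samples(folded_lines):
    """Count samples in which each function was the top (leaf) frame."""
    top = Counter()
    for line in folded_lines:
        stack, count = line.rsplit(" ", 1)
        top[stack.split(";")[-1]] += int(count)
    return top

# Hypothetical folded stacks in 'root;...;leaf count' form.
folded = [
    "main;parse;read 70",
    "main;parse 10",
    "main;render;read 15",
    "main;render 5",
]
top = self_samples(folded)
total = sum(top.values())
for func, n in top.most_common():
    print(f"{func}: {n} samples ({100 * n / total:.0f}%)")
```

In this made-up profile, read is on top in 85 of 100 samples, so it is the first candidate to inspect.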

3.2 Interactivity

The flame graph is an SVG image, so it can interact with the user.

  • Mouse hover

Each layer of the flame is labeled with a function name. Hovering the mouse over it shows the complete function name, the number of times it was sampled, and its percentage of the total samples.

  • Click to zoom

Clicking a layer zooms the flame graph horizontally: that layer takes up the full width and its details are shown.

A "Reset Zoom" link also appears in the upper left corner; clicking it restores the image to its original state.

  • Search

Pressing Ctrl + F shows a search box. The user can enter a keyword or regular expression, and all matching function names are highlighted.

3.3 Limitations

In two cases the flame graph cannot be drawn properly, and the system's behavior has to be corrected.

  • Incomplete call stack

When the call stack is too deep, some systems return only the first part of it (for example, the first 10 layers).

  • Missing function names

Some functions have no name, and the compiler represents them only by memory address (anonymous functions, for example).

3.4 Flame graphs in the browser

The browser can generate a flame graph of the page's scripts for CPU analysis.

Open the developer tools and switch to the Performance panel. Click the "Record" button to start recording, perform various operations on the page, and then stop the recording.

The developer tools then display a timeline, and below it is the flame graph.

The browser's flame graph differs from the standard flame graph in two ways: it is inverted (the function at the top of the call stack is at the bottom), and the x axis is a time axis rather than a sample count.

4 Red/blue differential flame graphs

Refer to


With flame graphs, problems of CPU utilization can usually be located quite well. A performance regression is harder: you have to keep switching between flame graphs from before and after a change, or from different periods and scenarios, to find the difference. It feels like searching for Pluto in the solar system. The method works, but I thought there had to be a better way.

Hence the red/blue differential flame graphs introduced below.

4.1 An example red/blue differential flame graph

The figure above is an interactive SVG image. It uses two colors to indicate state: red for growth and blue for reduction.

The shape and size of each flame in it are the same as in the CPU flame graph of the second captured profile file. (The y axis represents stack depth, the x axis represents the total number of samples, and the width of a stack frame represents the proportion of the profile file in which that function appears. The top layer is the function that is running, and below it is the stack that called it.)

The example shows a workload whose CPU utilization rose after a system upgrade. Below is the corresponding CPU flame graph.

In a standard flame graph, the colors of stack frames and stack towers are usually picked at random. In the red/blue differential flame graph, colors represent the difference between the two profile files.

In the second profile, one function and the calls beneath it ran more often than in the first, so its stack frame is marked red in the figure above. The cause turns out to be that ZFS compression was enabled, whereas it was turned off before the system upgrade.

This example is so simple that I could have analyzed it without the differential flame graph. But imagine analyzing a small performance regression, say less than 5%, in more complex code. That would not be so easy to handle.

4.2 Introduction to red/blue differential flame graphs

I had been discussing this for several years, and finally wrote an implementation that I personally find valuable. It works like this:

  1. Capture the stacks before the change, as the profile1 file.

  2. Capture the stacks after the change, as the profile2 file.

  3. Use profile2 to generate the flame graph. (So the width of each stack frame is based on the profile2 file.)

  4. Recolor the flame graph using the "2 - 1" difference. The rule is: if a stack frame appears more often in profile2, it is marked red; otherwise it is marked blue. The color is filled in according to the size of the difference before and after the change.

The point is to compare the profile files from before and after the change, which is very useful for functional verification tests or for evaluating the performance impact of a code change. The new flame graph is generated from the post-change profile file (so the width of the stack frames still shows current CPU consumption). The color contrast then shows why system performance differs.

Only functions that directly affect performance are colored (for example, functions that are running); the sub-functions they call are not marked again.
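The coloring rule can be sketched in Python as follows. This is an illustrative assumption, not the color mapping the real scripts use: the function name `delta_color` and the linear scaling from white to a saturated color are made up to show the idea that red means more samples, blue means fewer, and a larger difference gives a stronger color.

```python
def delta_color(before, after, max_delta):
    """Map a sample-count change to an RGB color: red = more, blue = fewer."""
    delta = after - before
    if delta == 0 or max_delta == 0:
        return (255, 255, 255)  # unchanged frames stay white
    # Larger changes fade the other channels further, giving a stronger color.
    fade = int(255 * (1 - min(abs(delta) / max_delta, 1.0)))
    return (255, fade, fade) if delta > 0 else (fade, fade, 255)

print(delta_color(31, 33, 2))   # frame that grew by the largest delta
print(delta_color(10, 5, 10))   # frame that shrank by half the largest delta
print(delta_color(4, 4, 10))    # frame that did not change
```

Here `max_delta` is the largest absolute difference across all frames, so every color is relative to the biggest change in the comparison.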

4.3 Generate red/blue differential flame graph


The FlameGraph project includes a script, difffolded.pl, for generating red/blue differential flame graphs. To show how the tool works, the steps below use Linux perf_events. You can also use other profilers.

  • Capture the profile 1 file before the change:

# Collect data
perf record -F 99 -a -g -- sleep 30
# Parse the data into stack information
perf script > out.stacks1
# Fold the stacks
./stackcollapse-perf.pl out.stacks1 > out.folded1

  • After a period of time (or after the program code is modified), capture the profile 2 file:

# Collect data
perf record -F 99 -a -g -- sleep 30
# Parse the data into stack information
perf script > out.stacks2
# Fold the stacks
./stackcollapse-perf.pl out.stacks2 > out.folded2

  • Generate the red/blue differential flame graph:

./difffolded.pl out.folded1 out.folded2 | ./flamegraph.pl > diff2.svg

difffolded.pl operates only on "folded" stack profile files; the folding is done by the stackcollapse series of scripts described earlier. The script outputs three columns of data: one column is the folded call stack, and the other two are the sample counts from the profile files before and after the change.

func_a;func_b;func_c 31 33
[...]

In the example above, "func_a()->func_b()->func_c()" is the call stack. It appeared 31 times in the profile1 file and 33 times in the profile2 file. The flamegraph.pl script then processes these three columns of data and automatically generates a red/blue differential flame graph.
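The three-column format can be reproduced with a short Python sketch. It is not the actual script; the helper names `read_folded` and `diff_folded` and the extra stacks func_a;func_d and func_a;func_e are hypothetical. It joins two folded profiles on the stack string and writes 0 for a stack that is missing from one side.

```python
def read_folded(lines):
    """Parse 'stack count' lines into a dict of stack -> count."""
    counts = {}
    for line in lines:
        stack, count = line.rsplit(" ", 1)
        counts[stack] = counts.get(stack, 0) + int(count)
    return counts

def diff_folded(lines1, lines2):
    """Return 'stack count1 count2' lines for every stack in either profile."""
    c1, c2 = read_folded(lines1), read_folded(lines2)
    stacks = sorted(c1.keys() | c2.keys())
    return [f"{s} {c1.get(s, 0)} {c2.get(s, 0)}" for s in stacks]

# Hypothetical folded profiles from before and after a change.
profile1 = ["func_a;func_b;func_c 31", "func_a;func_d 4"]
profile2 = ["func_a;func_b;func_c 33", "func_a;func_e 2"]
merged = diff_folded(profile1, profile2)
for line in merged:
    print(line)
```

A stack that exists only in the first profile ends up with 0 in the last column, which is the case where a code path has disappeared.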

Here are some other useful options:

  • -n (difffolded.pl): normalizes the data in the two profile files so that they match each other. Without it, the sample counts of the captured stacks are bound to differ, because the capture time and CPU load differ, and the graph then looks either all red (load increased) or all blue (load decreased). The -n option balances the first profile file, so you get a complete red/blue map.
  • -x (difffolded.pl): strips hexadecimal addresses. The profiler often fails to convert an address to a symbol, which leaves hexadecimal addresses in the stack. If such an address differs between the two profile files, the two stacks are treated as different even though they are really the same. If you run into this problem, use the -x option to fix it.
  • --negate (flamegraph.pl): inverts the red/blue color scheme. It is used in the next section.
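What the normalization and address-stripping options do can be sketched in Python. This is an assumption about the idea, not a copy of the script: the helper names `normalize` and `strip_hex` are hypothetical, and the real tool may differ in details such as rounding or the placeholder it writes.

```python
import re

def normalize(counts1, counts2):
    """Scale profile 1 so its total sample count matches profile 2."""
    total1, total2 = sum(counts1.values()), sum(counts2.values())
    if total1 == 0 or total1 == total2:
        return dict(counts1)
    return {s: int(n * total2 / total1) for s, n in counts1.items()}

def strip_hex(stack):
    """Replace raw hexadecimal addresses so identical stacks compare equal."""
    return re.sub(r"0x[0-9a-fA-F]+", "0x...", stack)

# Hypothetical profiles: the second one was captured under twice the load.
before = {"a;b": 50, "a;c": 50}
after = {"a;b": 120, "a;c": 80}
scaled = normalize(before, after)
print(scaled)
print(strip_hex("main;0x7f3a10;read"))
```

After scaling, a;b goes from 100 to 120 and a;c from 100 to 80, so the comparison shows one red frame and one blue frame instead of everything turning red.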

4.4 Shortcomings

Although the red/blue differential flame graph is useful, it has a problem: if a code path disappears entirely, there is nowhere in the flame graph to color blue. You see only the current CPU usage, without knowing why it came to be that way.

One solution is to reverse the order of comparison and draw the opposite differential flame graph. For example:

The flame graph above takes the pre-change profile file as its baseline, and the colors express what is going to happen. The part highlighted in blue on the right shows that CPU idle time will decrease after the change. (In practice, CPU idle is usually filtered out, using grep -v cpuidle on the command line.)

The figure also brings out the code that disappeared (or rather, leaves it unhighlighted): compression was not enabled before the change, so it does not appear in the pre-change profile file, and there is no part marked red.

The corresponding command line is:

./difffolded.pl out.folded2 out.folded1 | ./flamegraph.pl --negate > diff1.svg

Used together with the diff2.svg generated earlier, this gives:

  • diff1.svg: the width is based on the pre-change profile file, and the color indicates what will happen.
  • diff2.svg: the width is based on the post-change profile file, and the color indicates what has happened.

If I were doing a functional verification test, I would generate both images.

4.5 CPI flame graph

These scripts were first used in the analysis of CPI flame graphs. Instead of comparing profile files from before and after a change, a CPI flame graph analyzes the difference between the CPU's busy cycles and stall cycles, which brings out the working state of the CPU.

4.6 Other differential flame graphs

Other people have done similar work. Robert Mustacchi made some attempts not long ago. His approach resembles the coloring used in code review: only the differences are shown, with red for new (increased) code paths and blue for removed (decreased) code paths. A key difference is that the width of a stack frame reflects only the number of samples that differ. An example is on the right. It is a very good idea, but it feels a little strange in practice: without the context of the complete profile file as a background, the picture is somewhat hard to understand.

Cor-Paul Bezemer also created a differential display, flamegraphdiff. He puts three flame graphs in one figure: the standard flame graphs from before and after the change, with a differential flame graph added below them, in which the stack frame width is again the number of differing samples. The figure above is an example. Hovering the mouse over a stack frame in the differential graph highlights the same frame in all three graphs. Because this method includes the two standard flame graphs, the context problem is solved.

Each of our three differential flame graphs has its own strengths, and they can be combined: the two upper graphs in Cor-Paul's method could be my diff1.svg and diff2.svg, and the lower graph could use Robert's approach. For consistency, the lower graph could use my coloring: blue->white->red.

Flame graphs are spreading widely, and many companies now use them. I would not be surprised if there are other implementations of differential flame graphs that I do not know of. (Please tell me in the comments.)

4.7 Summary

If you have a performance regression, the red/blue differential flame graph is the fastest way to find the root cause. The approach captures two ordinary flame graph profiles, compares them, and color-codes the differences: red for an increase, blue for a decrease. The differential flame graph is based on the current ("post-change") profile file, so its shape and size stay the same. You can therefore spot the differences directly from the colors and see why they exist.

Differential flame graphs can be applied to a project's daily builds, so that performance regressions are discovered and fixed promptly.

