Integrate memory PSI collection and visualization tools #4141

Member

@OhmSpectator OhmSpectator commented Aug 9, 2024

This PR introduces a toolset for collecting and visualizing Pressure Stall Information (PSI) data, fully integrated into the EVE framework. The primary purpose of this PR is to enable the collection of PSI data, which is crucial for analyzing memory pressure patterns in the system. Understanding these patterns will allow us to develop more effective monitoring tools that can predict and mitigate potential Out-of-Memory (OOM) situations.

To support this functionality, the EVE Linux kernel configuration has been adapted to enable the CONFIG_PSI option. This change ensures that PSI data can be collected on systems running EVE, providing the necessary insights into memory pressure.
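
For context, a kernel built with CONFIG_PSI exposes memory pressure in `/proc/pressure/memory` as two lines, `some` and `full`, each carrying `avg10`, `avg60`, `avg300`, and `total` fields. The following is a minimal Python sketch of parsing that format. The function name `parse_psi` and the sample values are illustrative and not taken from this PR, whose collector is written in Go.

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of /proc/pressure/memory into nested dicts."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        result[kind] = {}
        for field in fields:
            key, value = field.split("=", 1)
            result[kind][key] = float(value)
    return result


# Illustrative sample in the kernel's format (the values are made up).
SAMPLE = (
    "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=7890\n"
)

if __name__ == "__main__":
    print(parse_psi(SAMPLE))
```

On a real system the input would come from reading `/proc/pressure/memory` once per sampling interval.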

The PR includes several key enhancements. First, it integrates a PSI collector into the Pillar's AgentLog component, making it accessible via the internal debug API. This allows for the dynamic start and stop of the PSI collector through HTTP requests, as well as through the eve command-line tool for ease of use. Additionally, a standalone version of the PSI collector is provided, enabling its deployment on older versions of EVE where it is not yet integrated into the root filesystem.

The PR also introduces a PSI visualizer tool, designed to process and display the collected PSI data in an interactive format. This visualization is essential for understanding memory pressure over time, providing insights that can inform the development of new triggers and monitoring mechanisms to preempt OOM conditions.

TODO:

  • Daemonize the standalone version
  • Fix Start -> Stop -> Start for the EVE-integrated version.
  • Fix ManageStatFileSize function, so it correctly handles different corner cases of args
  • Add tests
  • Fix race conditions in test

@OhmSpectator
Member Author

Yeah, I see the Yetus warnings... Will fix them.
We have to introduce a Makefile target to run Yetus locally on diff =(

@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch from 487eb68 to bd41930 Compare August 10, 2024 09:50
Comment on lines 41 to 46
if _, err := os.Stat(PIDFile); err == nil {
log.Noticef("Memory PSI Collector is already running")
return
}
// Create a PID file
err := createPIDFile()
Contributor

This check is not atomic - it is kind of the same discussion as #4106 (comment)
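
One common way to close that window is to create the PID file with `O_CREAT | O_EXCL`, so that the existence check and the creation happen in a single operation. Below is a minimal sketch, written in Python for brevity (the code under review is Go, where `os.OpenFile` accepts the same flags). `create_pid_file` is a hypothetical helper, not the PR's `createPIDFile`.

```python
import os


def create_pid_file(path: str) -> bool:
    """Atomically create the PID file; return False if it already exists."""
    try:
        # O_EXCL makes the call fail if the file exists, so the check and
        # the creation cannot be separated by another process.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")
    return True
```

A caller would treat a `False` return as "already running" and exit, which mirrors the intent of the `os.Stat` check above without the gap between the check and the create.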

Member Author

It can hardly be a problem, as the standalone version will be started manually.
But now I remembered that I wanted to detach it from the terminal at the start.

@OhmSpectator OhmSpectator marked this pull request as draft August 12, 2024 09:48
@OhmSpectator
Member Author

I dislike using Golang. I've struggled to merge a simple feature for over a week. The initial C PoC was ready in one day. It seems needlessly complex.

@OhmSpectator OhmSpectator self-assigned this Aug 12, 2024
@OhmSpectator OhmSpectator changed the title Integrate memory PSI collection and visualization tools [WIP] Integrate memory PSI collection and visualization tools Aug 12, 2024
@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch from bd41930 to 5148dab Compare August 12, 2024 18:36
@OhmSpectator OhmSpectator changed the title [WIP] Integrate memory PSI collection and visualization tools Integrate memory PSI collection and visualization tools Aug 12, 2024
@OhmSpectator OhmSpectator marked this pull request as ready for review August 12, 2024 18:37
@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch from 5148dab to 52f492c Compare August 12, 2024 19:01
@OhmSpectator
Member Author

Use dup3 instead of dup2, to support ARM

@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch from 52f492c to 147c6b7 Compare August 13, 2024 14:55
@@ -27,6 +27,8 @@ Welcome to EVE!
dump-stacks
dump-memory
memory-monitor-update-config
psi-collector-start
psi-collector-stop
Contributor

For other commands we use on/off as a parameter, e.g. https://github.com/lf-edge/eve/pull/4141/files#diff-a3b42769797ef2e752af884f98fd0877b6034d918bae243c3f165d7d710987a1L25

(also this comment comes from the guy who did not use on/off himself but start/stop)

Member Author

In the case of remote access, on/off looks fine: you either turn the access on or off, so it behaves like a property. This tool, however, has to be started and stopped. It's a process, not a property. But I like the idea of adding a subcommand: "start|stop". Will do.

Member Author

Replaced with "psi-collector start|stop"

@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch 2 times, most recently from fb78e2a to 2981b6f Compare August 13, 2024 15:36
@OhmSpectator
Member Author

I see a strange failure in the Go tests for my PR.

time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Rule/mangle/FORWARD-device/Allow-DHCP-forwarding, content: iptables rule: -t mangle -I FORWARD-device --match connmark --mark 10 -j ACCEPT (Allow forwarding of all DHCP traffic)" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Chain/mangle/OUTPUT-device, content: iptables chain OUTPUT-device for table mangle " pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Rule/mangle/OUTPUT/Traverse-device-wide-ACLs, content: iptables rule: -t mangle -I OUTPUT -j OUTPUT-device" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Rule/mangle/OUTPUT-device/Save-egress-mark, content: iptables rule: -t mangle -I OUTPUT-device -j CONNMARK --save-mark" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Rule/mangle/OUTPUT-device/Default-egress-mark, content: iptables rule: -t mangle -I OUTPUT-device -j MARK --set-mark 5" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Rule/mangle/OUTPUT-device/Accept-marked-egress, content: iptables rule: -t mangle -I OUTPUT-device -m mark ! --mark 0 -j ACCEPT" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Rule/mangle/OUTPUT-device/Restore-egress-mark, content: iptables rule: -t mangle -I OUTPUT-device -j CONNMARK --restore-mark" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Chain/raw/PREROUTING-apps, content: iptables chain PREROUTING-apps for table raw " pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Iptables-Rule/raw/PREROUTING/Traverse-application-ACLs, content: iptables rule: -t raw -I PREROUTING -j PREROUTING-apps" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="Running state reconciliation for subgraph L3, reasons: address change" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for IPv4-Route/501/eth0/default, content: Network route for adapter 'mock-eth0' with priority 0: {Ifindex: 1 Dst: <nil> Src: <nil> Gw: 192.168.10.1 Flags: [] Table: 501 Realm: 0}" pid=1234 source=test
time="2024-08-13T15:41:24Z" level=info msg="DPC Reconciler executed create for Src-IP-Rule/eth0/192.168.10.5, content: Source-based IP rule: {adapter: mock-eth0, ifName: eth0, ip: 192.168.10.5, prio: 15000}" pid=1234 source=test
    linux_test.go:296: 
        Expected
            <[]net.IP | len:0, cap:0>: nil
        to have length 1

DONE 293 tests, 3 skipped, 1 failure in 356.089s

Isn't it expected, @milan-zededa?

@milan-zededa
Contributor

I see a strange failure in the Go tests for my PR.

Isn't it expected, @milan-zededa?

Looks like a race in the test, I will have a look...

@@ -241,13 +251,47 @@ func listenDebug(log *base.LogObject, stacksDumpFileName, memDumpFileName string
}

info := `
This server exposes the net/http/pprof API.</br>
For examples on how to use it, see: <a href="https://pkg.go.dev/net/http/pprof">https://pkg.go.dev/net/http/pprof</a></br>
<a href="debug/pprof/">pprof methods</a></br></br>
Contributor

Now you removed this link? I used that in the browser ...

Member Author

Returned the link

@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch from 3bc2b2d to 0969b2b Compare August 23, 2024 13:51
@OhmSpectator
Member Author

Virtualization test suite failed because of the docker rate limit, again:

Start eden failed: cannot start redis cannot start redis: StartRedis: error in create redis container: imagePull: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

I'm also trying to understand what happened to the Smoke Tests. I see a failure:

=== RUN TestReboot
reboot_test.go:162: time: 2024-08-23T14:48:16.163688927Z out: Wait for state of b66d2a92-37c6-4f69-a0c0-03b242b9e549
testContext.go:380: done waiting for State
reboot_test.go:166: time: 2024-08-23T14:48:16.163709025Z out: timewait: 20m0s
reboot_test.go:167: time: 2024-08-23T14:48:16.163712572Z out: reboot: true
reboot_test.go:168: time: 2024-08-23T14:48:16.163716559Z out: count: 1
reboot_test.go:172: time: 2024-08-23T14:48:16.163722581Z out: LastRebootTime: 2024-08-23 14:40:28.683675137 +0000 UTC
reboot_test.go:174: time: 2024-08-23T14:48:16.163725907Z out: LastRebootReason: NORMAL: controller reboot at EVE version 0.0.0-pr4141-0969b2b9-kvm-amd64 at 2024-08-23T14:40:28.664489788Z
config changed, to see config run 'eden controller edge-node get-config'
rebooted with reason Reboot reason - system reset, reboot or kernel panic due to watchdog or kernel bug (no kdump) - at 2024-08-23T14:56:10.736743351Z at 2024-08-23 14:56:10.736743351 +0000 UTC/n reboot_test.go:64: abnormal reboot: Reboot reason - system reset, reboot or kernel panic due to watchdog or kernel bug (no kdump) - at 2024-08-23T14:56:10.736743351Z
testContext.go:302: WaitForProc terminated by timeout 20m0s
reboot_test.go:186: time: 2024-08-23T14:57:07.199932328Z out: Number of reboots: 0
--- FAIL: TestReboot (531.04s)
FAIL
FAIL: ../eclient/testdata/shutdown_test.txt:74: command failure
[background] eden.reboot.test -test.v -timewait=20m -reboot=1 -count=1: exit status 1
[stdout]
Reboot Test
=== RUN TestReboot
reboot_test.go:162: time: 2024-08-23T14:48:16.163688927Z out: Wait for state of b66d2a92-37c6-4f69-a0c0-03b242b9e549
testContext.go:380: done waiting for State
reboot_test.go:166: time: 2024-08-23T14:48:16.163709025Z out: timewait: 20m0s
reboot_test.go:167: time: 2024-08-23T14:48:16.163712572Z out: reboot: true
reboot_test.go:168: time: 2024-08-23T14:48:16.163716559Z out: count: 1
reboot_test.go:172: time: 2024-08-23T14:48:16.163722581Z out: LastRebootTime: 2024-08-23 14:40:28.683675137 +0000 UTC
reboot_test.go:174: time: 2024-08-23T14:48:16.163725907Z out: LastRebootReason: NORMAL: controller reboot at EVE version 0.0.0-pr4141-0969b2b9-kvm-amd64 at 2024-08-23T14:40:28.664489788Z
config changed, to see config run 'eden controller edge-node get-config'
rebooted with reason Reboot reason - system reset, reboot or kernel panic due to watchdog or kernel bug (no kdump) - at 2024-08-23T14:56:10.736743351Z at 2024-08-23 14:56:10.736743351 +0000 UTC/n reboot_test.go:64: abnormal reboot: Reboot reason - system reset, reboot or kernel panic due to watchdog or kernel bug (no kdump) - at 2024-08-23T14:56:10.736743351Z
testContext.go:302: WaitForProc terminated by timeout 20m0s
reboot_test.go:186: time: 2024-08-23T14:57:07.199932328Z out: Number of reboots: 0
--- FAIL: TestReboot (531.04s)
FAIL
[background] eden.app.test -test.v -timewait 10m -check-new RUNNING eclient:
[stdout]

here: https://github.com/lf-edge/eve/actions/runs/10527286917/job/29170125936?pr=4141
But I'm lost in these messages. Could someone help me interpret them?
I would also like to know how to run these tests locally. @milan-zededa, do you have any nice command I can use to run a specific test locally?

@OhmSpectator
Member Author

Meanwhile, I tried to rerun the failed tests...

@@ -292,6 +337,46 @@ func listenDebug(log *base.LogObject, stacksDumpFileName, memDumpFileName string
return
}
}))
mux.Handle("/memory-monitor/psi-collector/start", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
Contributor

It's out of scope for this PR, but how about we move from http.ServeMux to the chi router, which supports middleware and lets you separate POST and GET methods as well? We use it in the metadata server.

Member Author

Mmm... I don't mind. Feel free to implement it or file a ticket so we don't forget it =)

'fullAvg10 fullAvg60 fullAvg300 fullTotal'


def visualize_memory_pressure(log_file):
Contributor

@uncleDecart uncleDecart Aug 26, 2024

holywar
we can actually utilise Python 3 type hints, so you can write something like

def visualize_memory_pressure(log_file: pathlib.Path):

/holywar

Member Author

Yeah, we can... Will try it next time =D

eve-kernel-amd64-v6.1.38-generic
    8859f43ee8c9: eve_defconfig: Enable PSI (Pressure Stall Information).

eve-kernel-amd64-v6.1.38-rt
    37bc37278441: eve_defconfig: Enable PSI (Pressure Stall Information).

eve-kernel-arm64-v5.10.104-nvidia
    7b9a55e0b659: defconfig: Enable PSI in default configuration.

eve-kernel-arm64-v6.1.38-generic
    acbfadebeb76: eve_defconfig: Enable PSI in default configuration.
    bb8ed98284d6: eve_defconfig: Remove ST33HTPM support.

The second commit in the arm64 v6.1.38 generic kernel update is not logically
a part of this patch set, but a leftover from a previous one.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces a Pressure Stall Information (PSI) collector to
the AgentLog component in Pillar. The collector periodically gathers
memory pressure data from the system, logging it in a managed file. The
code also includes functionality for handling the file size to ensure
that only recent data is kept. This collector is designed to be shared
between the AgentLog component and a future standalone tool for enhanced
memory pressure monitoring.
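
The size handling described above can be pictured as follows. This is a minimal Python sketch under assumed semantics: once the file exceeds a threshold, keep only the newest complete lines that fit into a smaller target size. The helper name `trim_log` and both size parameters are hypothetical and do not reflect the actual `ManageStatFileSize` signature in the Go code.

```python
import os


def trim_log(path: str, threshold: int, target: int) -> bool:
    """If the file is larger than threshold bytes, keep only its newest
    complete lines, at most target bytes in total. Return True if trimmed."""
    if target > threshold:
        raise ValueError("target must not exceed threshold")
    if os.path.getsize(path) <= threshold:
        return False
    with open(path, "rb") as f:
        data = f.read()
    tail = data[-target:] if target > 0 else b""
    # Drop a leading partial line so the file starts on a line boundary,
    # unless the tail already begins exactly at one.
    start = len(data) - len(tail)
    if tail and start > 0 and data[start - 1:start] != b"\n":
        newline = tail.find(b"\n")
        tail = tail[newline + 1:] if newline != -1 else b""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(tail)
    os.replace(tmp, path)  # atomic rename on POSIX
    return True
```

Rejecting `target > threshold` up front is one way to handle the "improper threshold sizes" corner case; whether the PR does the same is not shown here.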

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces new constants for the memory-monitor directories
within the Pillar codebase. Specifically, it adds paths for the
memory-monitor output directory and the PSI log file. This change
ensures that Pillar components are aware of and can correctly reference
the locations used by the memory-monitor.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit adds functionality to the Pillar internal debug API,
allowing the start and stop of the Memory PSI collector via HTTP POST
requests. New endpoints `/memory-monitor/psi-collector/start` and
`/memory-monitor/psi-collector/stop` have been introduced, enabling
control over the PSI collector process.
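
To illustrate the shape of such an API, here is a minimal Python sketch that routes the two paths named above, accepts only POST, and toggles a flag. It is a stand-in for the Go handler in the PR: the `Collector` class, the `handle` function, the status codes, and the response bodies are all assumptions.

```python
import threading
from http.server import BaseHTTPRequestHandler
from typing import Tuple

START = "/memory-monitor/psi-collector/start"
STOP = "/memory-monitor/psi-collector/stop"


class Collector:
    """Hypothetical stand-in for the PSI collector: only tracks a flag."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.running = False

    def set_running(self, value: bool) -> bool:
        """Set the state; return False if it was already in that state."""
        with self._lock:
            if self.running == value:
                return False
            self.running = value
            return True


def handle(collector: Collector, method: str, path: str) -> Tuple[int, str]:
    """Pure routing logic: map a request to (status code, body)."""
    if path not in (START, STOP):
        return 404, "unknown endpoint\n"
    if method != "POST":
        return 405, "use POST\n"  # state-changing endpoints reject GET
    changed = collector.set_running(path == START)
    # 409 Conflict signals "already started" or "already stopped".
    return (200, "ok\n") if changed else (409, "no change\n")


def make_handler(collector: Collector):
    """Wrap handle() in an http.server request handler class."""

    class Handler(BaseHTTPRequestHandler):
        def _serve(self, method: str) -> None:
            code, body = handle(collector, method, self.path)
            payload = body.encode()
            self.send_response(code)
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def do_POST(self) -> None:
            self._serve("POST")

        def do_GET(self) -> None:
            self._serve("GET")

    return Handler
```

Keeping the routing in a plain function makes the start, stop, and repeated-start paths easy to exercise without opening a socket; the handler class can be passed to `http.server.ThreadingHTTPServer` when a real listener is wanted.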

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
…ript.

This commit extends the `eve` script with new commands `psi-collector
start|stop`. These commands interact with the internal debug API to
start and stop the Memory PSI collector, respectively.

This addition makes it convenient to control the PSI collector directly
from the terminal, with the results stored in
`/persist/memory-monitor/output/psi.txt`.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit adds a standalone version of the PSI Collector, allowing it
to be built and used independently on older versions of EVE where the
collector is not integrated into the root filesystem. The new component
includes a Makefile for building the binary, a README with instructions,
and a basic main.go implementation that handles the PSI collection
process.

This ensures compatibility and provides memory pressure monitoring
capabilities for legacy systems.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit adds a new tool, `psi-visualizer`, designed to visualize PSI
data, which is essential for understanding memory pressure patterns over
time. The tool uses Python with Plotly to create interactive plots from
PSI logs generated by the PSI Collector.

The visualizer includes a Makefile for environment setup, a
`requirements.txt` for dependencies, and a `visualize.py` script that
reads PSI data and generates plots.
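
As a rough picture of the reading step, the sketch below loads a whitespace-separated log with a header row into per-column lists. It assumes that layout; only the trailing column names `fullAvg10 fullAvg60 fullAvg300 fullTotal` are visible in this PR's diff excerpt, so the sample is limited to those, and `load_columns` is a hypothetical name rather than a function from `visualize.py`. Plotting is left out to keep the example free of third-party dependencies.

```python
from typing import Dict, List


def load_columns(text: str) -> Dict[str, List[str]]:
    """Split a header-plus-rows log into {column name: list of raw values}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    columns: Dict[str, List[str]] = {name: [] for name in header}
    for ln in lines[1:]:
        values = ln.split()
        if len(values) != len(header):
            continue  # skip malformed or partially written rows
        for name, value in zip(header, values):
            columns[name].append(value)
    return columns


# Made-up sample restricted to the column names visible in the diff.
SAMPLE_LOG = (
    "fullAvg10 fullAvg60 fullAvg300 fullTotal\n"
    "0.00 0.00 0.00 0\n"
    "1.50 0.40 0.10 2048\n"
    "0.75\n"  # truncated row, ignored by the loader
)

if __name__ == "__main__":
    print(load_columns(SAMPLE_LOG))
```

The resulting lists can be converted with `float()` and handed to a plotting library such as Plotly, which the commit message names as the visualizer's backend.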

A README is also included to guide users through setup and usage, making
it easier to analyze memory pressure visually.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
…ertions.

Added the `testify/require` package to enhance test assertions by allowing
tests to stop immediately when a required condition is not met. This
addition helps ensure that tests do not continue when critical
assertions fail, providing better control over test flows. Vendor files
for the require module have been included to support this functionality.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit changes the listenDebug function in the agentlog package to
ListenDebug, making it public. The purpose of this change is to allow
the function to be utilized in tests, enabling test scenarios that
require the pillar debug API.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
…agement.

This commit introduces a comprehensive suite of tests for the PSI
collector integration and the file size management functionality within
the `agentlog` package. These tests cover various scenarios including
starting and stopping the PSI collector, emulating memory pressure
stats, verifying file content after operations, and ensuring the correct
handling of edge cases like non-triggered truncation and improper
threshold sizes.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
…synchronization.

This commit introduces two mutex locks to ensure thread-safe handling of
PSI data and synchronization during testing.

The first lock (PsiMutex) is added to protect access to the PSI files,
preventing race conditions between the PSI data producer in the tests
and the consumer in memprofile.

The second lock (psiProducerMutex) ensures that only one PSI data
producer runs during tests, avoiding conflicts and ensuring consistent
test results.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch from 0969b2b to 122406e Compare August 26, 2024 22:42
@OhmSpectator
Member Author

Fixed the comments from @milan-zededa and @uncleDecart

@milan-zededa
Contributor

I see a strange failure in the Go tests for my PR.

Isn't it expected, @milan-zededa?

No, the test expects a list with one IP, but in this case we got an empty []net.IP. Something was probably slower than usual and we hit a race. I plan to investigate this later.

@OhmSpectator
Member Author

The tests go reeeed 😭😭😭

@OhmSpectator
Member Author

The tests are green. Several approvals are in. Merging.

@OhmSpectator OhmSpectator merged commit c54e898 into lf-edge:master Aug 28, 2024
39 checks passed