Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memprofile: add http debug test #7

Conversation

christoph-zededa
Copy link
Collaborator

@christoph-zededa christoph-zededa commented Aug 15, 2024

Summary by CodeRabbit

  • New Features

    • Introduced improved error handling for server operations.
    • Added a new test for memory pressure monitoring functionality.
  • Bug Fixes

    • Enhanced error distinction during server shutdown to improve logging accuracy.
  • Refactor

    • Renamed listenDebug to ListenDebug to increase its accessibility.
    • Changed PressureMemoryFile from a constant to a variable for runtime modifications.

OhmSpectator and others added 8 commits August 9, 2024 18:56
eve-kernel-amd64-v6.1.38-generic
    8859f43ee8c9: eve_defconfig: Enable PSI (Pressure Stall Information).

eve-kernel-amd64-v6.1.38-rt
    37bc37278441: eve_defconfig: Enable PSI (Pressure Stall Information).

eve-kernel-arm64-v5.10.104-nvidia
    7b9a55e0b659: defconfig: Enable PSI in default configuration.

eve-kernel-arm64-v6.1.38-generic
    acbfadebeb76: eve_defconfig: Enable PSI in default configuration.
    bb8ed98284d6: eve_defconfig: Remove ST33HTPM support.

The second commit in the 5.10 kernel update is not logically a part of this
patch set, but a leftover from a previous one.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces a Pressure Stall Information (PSI) collector to
the AgentLog component in Pillar. The collector periodically gathers
memory pressure data from the system, logging it in a managed file. The
code also includes functionality for handling the file size to ensure
that only recent data is kept. This collector is designed to be shared
between the AgentLog component and a future standalone tool for enhanced
memory pressure monitoring.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces new constants for the memory-monitor directories
within the Pillar codebase. Specifically, it adds paths for the
memory-monitor output directory and the PSI log file. This change
ensures that Pillar components are aware of and can correctly reference
the locations used by the memory-monitor.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit adds functionality to the Pillar internal debug API,
allowing the start and stop of the Memory PSI collector via HTTP POST
requests. New endpoints `/memory-monitor/psi-collector/start` and
`/memory-monitor/psi-collector/stop` have been introduced, enabling
control over the PSI collector process.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
…ript.

This commit extends the `eve` script with new commands `psi-collector
start|stop`. These commands interact with the internal debug API to
start and stop the Memory PSI collector, respectively.

This addition makes it convenient to control the PSI collector directly
from the terminal, with the results stored in
`/persist/memory-monitor/output/psi.txt`.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit adds a standalone version of the PSI Collector, allowing it
to be built and used independently on older versions of EVE where the
collector is not integrated into the root filesystem. The new component
includes a Makefile for building the binary, a README with instructions,
and a basic main.go implementation that handles the PSI collection
process.

This ensures compatibility and provides memory pressure monitoring
capabilities for legacy systems.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit adds a new tool, `psi-visualizer`, designed to visualize PSI
data, which is essential for understanding memory pressure patterns over
time. The tool uses Python with Plotly to create interactive plots from
PSI logs generated by the PSI Collector.

The visualizer includes a Makefile for environment setup, a
`requirements.txt` for dependencies, and a `visualize.py` script that
reads PSI data and generates plots.

A README is also included to guide users through setup and usage, making
it easier to analyze memory pressure visually.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
Copy link

coderabbitai bot commented Aug 15, 2024

Walkthrough

The changes enhance the agentlog module by improving error handling and accessibility. A key function was renamed to be exported, allowing external use, while its error management was refined to differentiate between normal server shutdowns and other errors. Additionally, a new test was introduced for memory pressure monitoring, expanding the module's testing capabilities. Lastly, a constant was converted to a variable, permitting dynamic assignment of the memory pressure file path.

Changes

Files Change Summary
pkg/pillar/agentlog/agentlog.go Renamed listenDebug to ListenDebug for export; improved error handling for server shutdown.
pkg/pillar/agentlog/agentlog_test.go Added TestMemoryPressure to validate memory pressure functionality; checks file creation and updates.
pkg/pillar/agentlog/memprofile.go Changed PressureMemoryFile from a constant to a variable to allow runtime modifications.

Poem

🐇 In the code, we twist and twine,
Functions gleam and brightly shine.
Memory's pressure, we now can test,
Errors handled, we give our best!
With hops of joy, let’s celebrate,
Code so fine, it feels just great! ✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@OhmSpectator
Copy link
Owner

@coderabbitai review

Copy link

coderabbitai bot commented Aug 15, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Outside diff range, codebase verification and nitpick comments (2)
pkg/pillar/agentlog/agentlog_test.go (1)

248-249: Consider making the HTTP server address configurable.

Hardcoding the server address can lead to conflicts in shared environments.

- go agentlog.ListenDebug(log, "stackdump", "memdump")
+ serverAddress := "localhost:6543"
+ go agentlog.ListenDebug(log, serverAddress, "stackdump", "memdump")
pkg/pillar/agentlog/memprofile.go (1)

44-44: Document the intended use cases for modifying PressureMemoryFile.

Changing this from a constant to a variable allows runtime modifications, which should be controlled and documented to prevent unintended side effects.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between aba25f0 and 9e9c3df.

Files selected for processing (3)
  • pkg/pillar/agentlog/agentlog.go (4 hunks)
  • pkg/pillar/agentlog/agentlog_test.go (2 hunks)
  • pkg/pillar/agentlog/memprofile.go (2 hunks)
Additional comments not posted (2)
pkg/pillar/agentlog/agentlog.go (2)

8-8: Good use of the errors package for improved error handling.

The addition of the errors package allows for more precise error checking using errors.Is.


Line range hint 237-342:
Exporting ListenDebug and refining error handling enhances functionality.

The changes improve accessibility and robustness. Ensure that the function is used appropriately in the codebase.

Verification successful

Exporting ListenDebug is currently contained within the package.

The function ListenDebug is used internally within the package and is tested in agentlog_test.go. There are no external usages impacting other parts of the codebase.

  • agentlog_test.go: Tests ListenDebug by starting and stopping the HTTP service.
  • agentlog.go: Internal usage of ListenDebug.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify new usages of `ListenDebug` in the codebase.

# Test: Search for usages of `ListenDebug`. Expect: Occurrences of the new function name.
rg --type go -A 5 $'ListenDebug'

Length of output: 1169

Comment on lines +233 to +235
mpfh, err := os.CreateTemp("", "memory-pressure")
if err != nil {
panic(err)
Copy link

@coderabbitai coderabbitai bot Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use t.Fatalf instead of panic for error handling in tests.

Using t.Fatalf provides better test output and integration with testing tools.

- if err != nil {
-   panic(err)
- }
+ if err != nil {
+   t.Fatalf("unexpected error: %v", err)
+ }

Also applies to: 239-241, 255-259, 261-264, 268-271, 273-276, 293-296

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need these "better test output and integration with testing tools" in the scope of this test?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OhmSpectator, using t.Fatalf instead of panic in tests is generally recommended because it provides better integration with the Go testing framework. It allows for more informative test output and ensures that the test is properly marked as failed, which can be particularly beneficial in automated testing environments or CI/CD pipelines. However, if these benefits are not crucial for your current testing scope, you might choose to keep the existing implementation. Ultimately, the decision depends on your project's specific needs and constraints.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to use both panic and t.Fatalf as this way, we can distinguish between the errors caused by generic system failure (testing env is broken - use panic) and real test failure (use t.Fatalf for that). Or are there better approaches?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OhmSpectator, your approach to using both panic and t.Fatalf to differentiate between system failures and test failures is understandable. However, using panic in tests can lead to abrupt exits, which might skip other tests and make it harder to gather comprehensive results.

A more structured approach could involve using custom error handling or logging to categorize errors. You could log system-related errors with a specific tag or message, making it clear they are not test failures. Additionally, using t.Skip or t.Skipf for known environmental issues can help manage test execution without resorting to panic. This way, you maintain the clarity of test results while avoiding the potential downsides of using panic.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give me an example of such handling...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OhmSpectator, here's an example of how you can handle errors in tests using structured logging and custom error handling instead of panic. This approach helps distinguish between system-related issues and actual test failures:

package mypackage

import (
	"log"
	"os"
	"testing"
)

func TestExample(t *testing.T) {
	// Simulate an environmental check
	if !isEnvironmentReady() {
		log.Println("Environment not ready, skipping test")
		t.Skip("Skipping test due to environmental issues")
	}

	// Simulate a test operation
	file, err := os.Open("testfile.txt")
	if err != nil {
		log.Printf("System error: %v", err)
		t.Fatalf("Failed to open test file: %v", err)
	}
	defer file.Close()

	// Further test logic...
}

// isEnvironmentReady simulates an environmental check
func isEnvironmentReady() bool {
	// Add logic to verify if the environment is ready
	// Return false if not ready
	return true
}

In this example, log.Println and log.Printf are used to log messages with specific information about the error type. t.Skip is used to skip the test if the environment is not ready, while t.Fatalf handles actual test failures. This way, you maintain clear and structured test results without using panic.

@@ -42,6 +40,9 @@ const (
StatLineSize = 75
)

// PressureMemoryFile is the memory pressure file
var PressureMemoryFile = "/proc/pressure/memory"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christoph-zededa, I'm not too fond of making such a value a variable. It's a constant. Is there a way to mock a constant in Go? Or maybe we can introduce conditional builds with different values for test env and prod env?

Copy link
Collaborator Author

@christoph-zededa christoph-zededa Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could f.e. make the path a parameter to listenDebug and when we invoke it from the tests, we can give it a different path than in the real system.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... It looks like a proper solution...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, as I consider adding parameters to all the functions, I start thinking about the possibility of collecting PSI stats for not just memory but also CPU and IO. In that case, we could make the PSI collector more generic, move it to another file, or even a different component. However, I then realize that we don't actually need all of that right now, and I hesitate to redo all the code now.

So, can we mock a constant at this point?

panic(err)
}

err = os.MkdirAll(types.MemoryMonitorOutputDir, 0755)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will fail to create a directory at /persist on a host system...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it supposed to run only in a container? How do you start the tests? Not with go test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.MkdirAll is like mkdir -p it won't fail if the directory already exists.

I start the test either from an ubuntu container lf-edge/eve@master...christoph-zededa:eve:rr or from the eve-build container or the pillar container itself.

make test runs all tests in a container - so it takes a while ...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha... In the pillar dir, I can run make build-docker-env enter-docker-env, and the tests work there. It's not very convenient, but it works.

@OhmSpectator
Copy link
Owner

I find the approach useful, but we still have to integrate it better...

@OhmSpectator
Copy link
Owner

@christoph-zededa, I wantd to push an extra commit on top of the PR, but got this

ERROR: Permission to christoph-zededa/eve.git denied to OhmSpectator.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@christoph-zededa
Copy link
Collaborator Author

@christoph-zededa, I wantd to push an extra commit on top of the PR, but got this

ERROR: Permission to christoph-zededa/eve.git denied to OhmSpectator.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

I guess you either have to create a PR into my branch which is a PR into your branch.
Or you merge/cherry-pick my branch into your branch and put a commit on top.

@OhmSpectator
Copy link
Owner

I guess you either have to create a PR into my branch which is a PR into your branch.
Or you merge/cherry-pick my branch into your branch and put a commit on top.

I wanted to draft the approach in this PR.
To push changes into the PR, I should have collaborator permissions for your EVE fork, as I did for you in my fork (so you can push changes into my PRs).

@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch 7 times, most recently from 71c2091 to 19f2306 Compare August 16, 2024 23:58
@OhmSpectator OhmSpectator force-pushed the feature/add-mem-pressure-statistics-collector branch 3 times, most recently from 0969b2b to 122406e Compare August 26, 2024 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants