Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls: move analyzers to an extensible sidecar #59869

Open
adonovan opened this issue Apr 27, 2023 · 20 comments
Open

x/tools/gopls: move analyzers to an extensible sidecar #59869

adonovan opened this issue Apr 27, 2023 · 20 comments
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) FeatureRequest gopls/analysis Issues related to running analysis in gopls gopls Issues related to the Go language server, gopls. Thinking Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@adonovan
Copy link
Member

adonovan commented Apr 27, 2023

We've talked a lot about how users can extend the set of analyzers that gopls runs so it can include org-specific checkers. Historically the challenge was that users would have to re-build gopls with the extended set of analyzers (which is not difficult but is inconvenient), or gopls would have to communicate with a separate process that contains the analyzers, which means it would have to do type checking yet again. (gopls already does--and must do--type checking of each package twice: once for its main index, and again for analysis.) But today @findleyr pointed out that, as of v0.12.0, there's no real efficiency reason not to move all of gopls' analysis into the sidecar process, so that the net number of invocations of the type checker is unchanged.

In essence, gopls would fork+exec a long-lived child process and communicate with it over a pipe. (The child process wouldn't need any other capabilities.) A request would send the metadata, source files, facts, and export data required to analyze a package, and the child would do the work and return serialized facts and diagnostics over the pipe. By default the child program would be a mode of gopls itself (e.g. 'gopls analyze'), but users could specify an alternative program that implements the same interface, analogous to the way 'go vet -vettool=...' runs an alternative unitchecker-based tool.

This approach still requires the user to build an executable containing both gopls code (the driver and core analyzers) and user code (their analyzers), but this could be done automatically by gopls: it would generate a main file from the user's configuration and then execute it, a bit like 'go test'. But perhaps this approach (and the potential for version skew of both the go toolchain and the gopls source) is more complex than the simple "re-build" approach. Something to think about.

@gopherbot gopherbot added Tools This label describes issues relating to any tools in the x/tools repository. gopls Issues related to the Go language server, gopls. labels Apr 27, 2023
@gopherbot gopherbot added this to the Unreleased milestone Apr 27, 2023
@adonovan adonovan added the Analysis Issues related to static analysis (vet, x/tools/go/analysis) label Apr 27, 2023
@findleyr
Copy link
Contributor

Per in-person discussion, we need to decide whether this has significant advantages over simply making it easier to recompile gopls itself.

One primary advantage is isolation: analyzers can't mutate other global state that may affect gopls. But on the other hand, does this limit gopls extensibility to analyers? If so, is that sufficient?

@findleyr findleyr modified the milestones: Unreleased, gopls/later Apr 28, 2023
@findleyr findleyr added gopls/analysis Issues related to running analysis in gopls Thinking labels Apr 28, 2023
@adonovan
Copy link
Member Author

Yes, @hyangah raised a similar question: the user still has to build something containing their analyzers; why should that not just be gopls itself? There are fewer moving parts that way.

The only real advantage I can think of is isolation: a sidecar could make gopls robust to panics and crashes in the analyzers, and with some bisection logic it could automatically identify and retire the culprits. Also, we could run completely untrusted analyzer code safely if it ran in a sandbox (e.g. something like seccomp) with no capabilites. But that does seem like a lot of work, and presumably users trust their own org's analyzers.

The security angle is more interesting in the context of an idea that @griesemer brought up yesterday, of allowing each module to define its own analyzers that are automatically run on packages that import it. This would allow a module with a complex API to provide a set of checkers to catch misuses of it. Checkers could run as part of 'go test' (which is already running untrusted code), or they could be run by a hypothetical 'go build -analyze' command if the checkers can be sandboxed so that they run in an unprivileged process that can't access the file system or network.

Also, @ianthehat mentioned that in an earlier experiment with extensible sets of analyzers, there were significant user interface challenges with too many error messages, and cross-talk between analyzers. I don't have details but perhaps he can elaborate.

@findleyr
Copy link
Contributor

findleyr commented Apr 28, 2023

Another big advantage of an analyzer sidecar is that it could be dynamic.

An idea that has been thrown around is having codebase-specific analyzers (e.g. I have a local analyzer that I use to catch when I forget to add a Go copyright header). If there were a configurable "workspaceAnalyzers" setting that pointed to a workspace-relative directory with a pre-determined layout similar to x/tools/go/analysis/passes, we could recompile an analyzer binary for each workspace, and allow users to run their own analyzers for their codebase, for each workspace.

@ianthehat
Copy link

Another potential advantage is the ability to run multiple sidecars and cleanly merge the results

@findleyr
Copy link
Contributor

I look forward to a future where gopls hotreloads analyzers in x/tools while I edit them.

@timothy-king
Copy link
Contributor

A goplschecker.Main analog to *checker.Main seems like it would be easy for others to adapt a list of analysis.Analyzers to talk the gopls analyze protocol. Then a minimum but flexible configuration for external checkers is just a list of external binaries and flags. These could be compiled ahead of time. This only really requires an external_analyzer setting.

A part of the protocol could be a handshake to check for version skew (+config skew). gopls can warn when a a tool is disabled due to version skew.

More flexible users could just give a list of packages and Analyzer variable and the goplschecker + go.mod could be generated relatively easily. gopls could compile and recompile this on demand. It seems like this would be done fairly infrequently. This could just be another external_analyzer. My gut instinct is that a provided external_analyzer binary would be more attractive to security sensitive organizations.

@leighmcculloch
Copy link
Contributor

Historically the challenge was that users would have to re-build gopls with the extended set of analyzers (which is not difficult but is inconvenient)...

The difficulty, or inconvenience, for me is not the action of recompiling gopls, but that it is an unintuitive workflow. Setting an analyzer occurs like a config, and it's pretty uncommon when configuring an IDE tool to need to recompile the tool to collect the config.

Is it possible for gopls to call out to a binary that is a singlechecker or multichecker instead of a custom gopls API? Then users can build analysis tools as they do today, and build their multichecker, and use it with gopls.

The checkers have a -json option. If it isn't sufficient, can we improve the API of the checkers so they're usable with gopls?

@adonovan
Copy link
Member Author

adonovan commented Jul 7, 2023

The difficulty, or inconvenience, for me is not the action of recompiling gopls, but that it is an unintuitive workflow. Setting an analyzer occurs like a config, and it's pretty uncommon when configuring an IDE tool to need to recompile the tool to collect the config.

I agree, it's confusing and inconvenient, but we should keep in mind that there may be solutions to this other than the sidecar-based approach, for example a gopls that can rebuild itself based on configuration changes. (Something like this is already being discussed in the context of #61185, not that this approach is by any means trivial.)

Is it possible for gopls to call out to a binary that is a singlechecker or multichecker instead of a custom gopls API?

No, it's not possible today, and I don't think it's even the right interface for a gopls analysis sidecar because {single,multi}checker uses go/packages to load a complete program from source, whereas gopls should be in charge of dependency analysis and invalidation (and indeed providing source files, which may contain unsaved edits). The 'unitchecker' model of separate analysis is much closer to what we want, though again I would not use that interface exactly.

But the idea of building a multichecker-like tool that consists of a special main function plus a bunch of analyzers---and nothing else---is exactly what this proposal is about. It's just that the main function would be designed around the channel to gopls.

@leighmcculloch
Copy link
Contributor

leighmcculloch commented Jul 7, 2023

But could we build the gopls interface into the {single,multi}checker applications as a flag, -gopls, that enables this different mode, so that checkers by default are compatible with the language server? It seems more useful if the community of checkers are naturally and efficiently compatible with other ecosystem tools.

(Or the flag could be called -pipe or -stream or something else that isn't gopls specific if that notion of coupling to gopls specifically isn't welcome.)

@adonovan
Copy link
Member Author

adonovan commented Jul 7, 2023

But could we build the gopls interface into the {single,multi}checker applications as a flag, -gopls, that enables this different mode, so that checkers by default are compatible with the language server? It seems more useful if the community of checkers are naturally and efficiently compatible with other ecosystem tools.

(Or the flag could be called -pipe or -stream or something else that isn't gopls specific if that notion of coupling to gopls specifically isn't welcome.)

That's a really interesting idea. Yes, I think it should be possible to define a system-level interface to a process that runs a set of analyzers and communicates entirely over a pipe or socket (and has no other capabilities), without reference to any code in the gopls module.

@leighmcculloch
Copy link
Contributor

leighmcculloch commented Jul 7, 2023

Some usability advantages of bundling the logic into a single build for checkers:

  • Developers of checkers don't need to be educated about gopls' (or other tools like gopls) unique requirements.
  • Developers of checkers only need to build and distribute one binary for folks to use in a variety of places.
  • Users of checkers don't need to choose between different builds of multicheckers for use in different places.
  • Users of checkers who use both the regular and gopls multichecker modes in different places do not have to deal with problems such as having different versions of the two binaries leading to inconsistent errors in CI vs IDE vs console. (For an example of where this type of inconsistency pain with IDE vs terminal usage can be so painful, see rust-analyzer and rustc.)
  • Developers of other tools similar to gopls can also support the pipe mechanism with any community checkers.

(The different "places" I'm thinking of being:

  • CI, which would likely use the regular multichecker mode.
  • IDE, which would likely use the gopls / pipe variety of multichecker mode.
  • Console/terminal, which would likely use the regular multichecker mode.)

@firelizzard18
Copy link
Contributor

I think golangci-lint's new plugin system is worth taking a look at. TL;DR: The main package does almost nothing except call another package, so it's very easy to create a custom golangci-lint build that imports plugins; plus there's a builder tool that will automatically generate a custom main package and add plugin imports based on a config file. Regardless of whether analyzers are moved into a sidecar process, this seems like a promising model for including build-time plugins into a binary. Putting analyzers in a sidecar process seems somewhat orthogonal to custom analyzers to me. Unless each analyzer its its own separate executable (and process), some component would need to be dynamically rebuilt (when configuration updates) regardless of whether that component is a sidecar or gopls.

This is also applicable to the feature I'm working on, #59445. Specifically, it would be helpful to have an extensible mechanism for detecting test suites using 3rd party frameworks:

@hyangah made the point here golang/vscode-go#228 (comment) that supporting 3rd party frameworks could grow to be a considerable maintenance burden and that extensibility might be a better option. However, it's a lot harder to extend gopls than it is to extend vscode-go (Go vs JavaScript). If an extensibility mechanism is added for analyzers, I'd like to add something similar for this case.

Per in-person discussion, we need to decide whether this has significant advantages over simply making it easier to recompile gopls itself.

gopls seems pretty easy to compile. I just pop over to the tools repo and go install ./gopls whenever I want to test some change. I'm not familiar with other development environments, but at least for vscode-go rebuilding gopls shouldn't be hard.

@adonovan
Copy link
Member Author

adonovan commented Jul 23, 2024

Regardless of whether analyzers are moved into a sidecar process, this seems like a promising model for including build-time plugins into a binary.

I agree that this is a promising model, and it is certainly the simplest way to support extensible sets of analyzers. It has the advantage that there is only a single executable for gopls plus its analyzers, which simplifies the architecture considerably; however it does mean that users will no longer run the same gopls executables as each other, or as the one we test in our release process. Also, telemetry is currently enabled only in executables with a release tag. A sidecar approach could reduce the proportion of code that varies across users.

Unless each analyzer its its own separate executable (and process), some component would need to be dynamically rebuilt (when configuration updates) regardless of whether that component is a sidecar or gopls.

Yes, there is no getting away from that. (BTW, I don't think one executable per analyzer is feasible because each would have to repeat type checking, which is expensive. Ideally we would build one executable containing all analyzers in the workspace, and then tell that executable's process which subset of analyzers to run for each package.)

Putting analyzers in a sidecar process seems somewhat orthogonal to custom analyzers to me.

Part of the promise of dynamically recreating and loading sidecar processes is that gopls (and perhaps other analysis drivers) could automatically run sets of analyzers appropriate to the code being analyzed, with a minimum of configuration. For example, if your application depends on a third-party database and template package, your application's module would be automatically analyzed for mistakes in its database queries and template literals, without any extra action. This is a double-edged sword: it means the std packages are no longer special citizens, and any package can define static checkers that automatically apply to client code; but it also means your module's analysis experience (latency, CPU cost, verbosity) are now partly at the mercy of third-party analyzer maintainers. We would have to define clear norms.

@firelizzard18
Copy link
Contributor

This is a double-edged sword: it means the std packages are no longer special citizens, and any package can define static checkers that automatically apply to client code; but it also means your module's analysis experience (latency, CPU cost, verbosity) are now partly at the mercy of third-party analyzer maintainers.

I assume gopls and/or the sidecar would be able to gather telemetry on the latency and stability (crash rate) of analyzers? gopls could expose that telemetry via a custom command or by opening up a report in the browser, and (I think?) gopls could send a notification to the editor that an analyzer's latency is out of bounds. Giving the user a way to look under the hood and see which analyzers are misbehaving should mitigate the impact of a misbehaving analyzer.

@adonovan
Copy link
Member Author

I assume gopls and/or the sidecar would be able to gather telemetry on the latency and stability (crash rate) of analyzers?

Yeah, I was thinking along similar lines, but it's more stuff--measurement, configuration, feedback loops--that we would have to build and maintain.

@timothy-king
Copy link
Contributor

#48429 (or a minor extension to support go vet -vetool=... in addition to go run) could be a user friendly way to get a sidecars tools for specific modules. It would also be nice to play together with that proposal. There is a lot of overlap with the goals of the two proposals.

But AFAICT that kinda requires a separate process for a gopls analyzers (whether in a sidecar or the main gopls process) and that implies type checking twice to enable this. Thats a definite cost. Maybe we have a protocal for the tool to say which analyzers it can run, and if there is 100% overlap with the Analyzer list that gopls would run, we do not run the gopls analyzers that use types.

@firelizzard18
Copy link
Contributor

It seems like Alan is leaning towards a separate process, because otherwise you'd need to recompile gopls for every different combination of analyzers required by a workspace, which probably would mean recompiling it for every workspace that specifies any custom analyzers. With a sidecar process, only the sidecar would need to be recompiled for each workspace.

@adonovan
Copy link
Member Author

[Tim] But AFAICT that kinda requires a separate process for a gopls analyzers (whether in a sidecar or the main gopls process) and that implies type checking twice to enable this.

As a matter of fact, gopls already (necessarily) has two completely separate paths for type checking--one for most features and one for analysis--so it already type-checks some packages twice. So, the net effect on type-checking cost of moving all its analysis logic into a (single) sidecar process would be zero.

@findleyr
Copy link
Contributor

Perhaps relevant that in a separate discussion, @aclements reports poor gopls performance when having both the runtime package and a high level package open, because the analysis driver has to reprocess every package in between, making even the VS Code process itself painfully slow to respond.

Having analysis in a separate process might make it easier to decrease the priority of that process.

@adonovan
Copy link
Member Author

I strongly suspect the performance problem is that we didn't implement type-check batching in the analysis driver; we should do that, to reduce the amount of export-data parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) FeatureRequest gopls/analysis Issues related to running analysis in gopls gopls Issues related to the Go language server, gopls. Thinking Tools This label describes issues relating to any tools in the x/tools repository.
Projects
None yet
Development

No branches or pull requests

7 participants