Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dyno: initial steps for producing typed AST from uAST + types #26049

Open
wants to merge 49 commits into
base: main
Choose a base branch
from

Conversation

mppf
Copy link
Member

@mppf mppf commented Oct 7, 2024

This PR adds a new prototype for converting from uAST + types to production compiler AST. The dyno effort to improve the compiler has been recently focusing on creating an incremental resolver. We would like to use this incremental resolver in production chpl compiles, but we will need to convert its results into a form that the production compiler understands to do so.

This PR also improves the user-facing error message for mixing return; and return x; in one function.

Before this PR, convert-uast.cpp had some initial support for converting types and resolved calls. However, it was hard to see how to continue to extend this in-place because both the input and output of this compiler pass need to change:

  • the input for convert-uast.cpp is an untyped uAST traversal but it needs to be a typed uAST traversal using ResolvedVisitor
  • the output from convert-uast.cpp is predominantly untyped AST but it needs to be typed AST with other changes.

This PR changes direction by adding convert-typed-uast.cpp. This is formed in part by copying some of the code from convert-uast.cpp and modifying it. Besides having a tighter implementation, there are two key ideas here:

  1. We will need to be able to make incremental progress with resolving. To enable that, this PR computes a call graph of functions called from user code. Only the functions in this call graph will go through the new strategy.
  2. The AST produced by convert-typed-uast.cpp will match the compiler's AST after callDestructors. That means that convert-typed-uast.cpp will work with the dyno resolver to normalize and to handle copy-init and deinit calls.

Implementation Details:

  • Some code can be reasonably shared between convert-uast.cpp and convert-typed-uast.cpp. I moved that code to convert-help.cpp.
  • Since there are now two uAST -> AST converters that need to potentially interact or to replace each other, these are now subclasses of an abstract base class.
  • Since convert-uast.cpp no longer needs to handle types, I removed the type handling code from there (and much of it is added back in a slightly different form in convert-typed-uast.cpp).
  • To allow incremental development, this PR adds a call-graph analysis and uses the call graph of function called from user module code in order to decide which functions to convert in this new manner when compiling with --dyno.
  • In order to support deterministic ordering in the AST produced for different compiles with the same input code, this PR adds operator < to QualifiedType and

Future Work:

  • adjust call-graph.cpp and convert-typed-uast.cpp to handle nested functions
  • fill in convert-typed-uast.cpp for many code patterns (this PR is a starting point, and calls and module init functions seem to be working OK, but most uAST patterns are unimplemented).
  • fill in the implementation for operator < for QualifiedType to check the contents of Type*s.
  • handle proc main (both converting it if present & also creating the compiler-generated one)
  • full comm=none testing

Copy link
Contributor

@DanilaFe DanilaFe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed everything but the "meat" (convert-typed-uast.cpp). I expect that to take me some time, so here are my comments in the meantime.


struct Converter;

// base class for Converter and TConverter (typed Converter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call TConverter TypedConverter? The T seems a bit awkward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We repeat it for a lot of the methods e.g. TConverter::enter(...) and so wanted to use a short name to keep all of the formals from being too far to the right.

compiler/include/convert-uast.h Outdated Show resolved Hide resolved
compiler/include/passes.h Outdated Show resolved Hide resolved
compiler/passes/parseAndConvert.cpp Outdated Show resolved Hide resolved
frontend/lib/resolution/call-graph.cpp Show resolved Hide resolved
Comment on lines 49 to 74
bool
UntypedFnSignature::FormalDetail::operator<(const FormalDetail& other) const {
if (name != other.name) {
return name < other.name;
}

if (defaultKind != other.defaultKind) {
return defaultKind < other.defaultKind;
}

bool hasDecl = (decl != nullptr);
bool otherHasDecl = (other.decl != nullptr);
if (hasDecl != otherHasDecl) {
return hasDecl < otherHasDecl;
}
if (decl && other.decl && decl != other.decl) {
return decl->id() < other.decl->id();
}

if (isVarArgs != other.isVarArgs) {
return isVarArgs < other.isVarArgs;
}

return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why all these <s are needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for TConverter::convertFunctionsToConvert. That function converts all functions in a CalledFnsSet (CalledFnsSet is a set of functions = std::unordered_map<const ResolvedFunction*, int>). Because production AST does things like encode the order of declaration in the integer id; and because follow-on passes can depend on the ordering of AST declarations; we need this function to convert functions in the same order on every compilation, even though the unordered_map iteration order would be nondeterministic.

frontend/lib/uast/post-parse-checks.cpp Outdated Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@DanilaFe DanilaFe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having gotten through through convert-type-uast.cpp my chief concern is the amount of duplication between the typed and untyped conversions. I know that code was copied (I see David's comments that I remember from convert-uast.cpp, but it also looks significantly changed. I find this concerning because, given the complexity of typed and untyped conversions, it's relatively hard for me to observe any differences between the two.

This is true in general. I can read through these enter functions (I have given them only brief looks while reviewing), but I certainly can't tell you to what degree they are equivalent to their untyped versions. And if I were to fix a bug or make a change in the sibling pass (e.g., typed to untyped), I think I would have some difficulty ensuring that change occurs in the current pass.

I don't know what can be done to address these changes -- can we split up some complicated transformations into structs/classes with state, and implement the calls to enter in terms of method calls on such structs, which incrementally change state? We might then be able to interweave typed-specific logic with the general conversion shape. I am okay with the PR going in as-is in the name of progress, but I do think it's going to be a pretty significant maintenance challenge.

compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
Comment on lines +387 to +417
//Expr* singleExprFromStmts(AstListIteratorPair<AstNode> stmts);
// CallExpr* convertWithClause(const WithClause* node, const AstNode* parent);
//CallExpr* convertNewManagement(const New* node);
//Expr* convertScanReduceOp(const AstNode* node);
//Expr* convertLoopIndexDecl(const Decl* node);
//bool isBracketLoopMaybeArrayType(const BracketLoop* node);
//Expr* convertBracketLoopExpr(const BracketLoop* node);
//Expr* tryExtractFilterCond(const IndexableLoop* node, Expr*& cond);
//Expr* convertForallLoopExpr(const Forall* node);
//Expr* convertCalledKeyword(const AstNode* node, const Call* inCall);
//Expr* convertSparseKeyword(const FnCall* node);
//CallExpr* convertModuleDotCall(const FnCall* node);
//Expr* convertDmappedOp(const OpCall* node);
//Expr* convertTupleExpand(const OpCall* node);
//Expr* convertReduceAssign(const OpCall* node);
//Expr* convertToNilableChecked(const OpCall* node);
//Expr* convertLogicalAndAssign(const OpCall* node);
//Expr* convertLogicalOrAssign(const OpCall* node);
//Expr* convertTupleAssign(const OpCall* node);
//Expr* convertRegularBinaryOrUnaryOp(const OpCall* node,
// const char* name=nullptr);
//BlockStmt* convertTupleDeclComponents(const TupleDecl* node);
//FnSymbol* convertFunction(const Function* node);
//FnSymbol* convertFunctionSignature(const FunctionSignature* node);
//CallExpr* convertArrayType(const BracketLoop* node);
//DefExpr* convertEnumElement(const EnumElement* node);
//template <typename Iterable>
//void convertInheritsExprs(const Iterable& iterable,
// std::vector<Expr*>& inherits,
// bool& inheritMarkedGeneric);
//Expr* convertAggregateDecl(const AggregateDecl* node);
//Expr* convertTypeExpression(const AstNode* node);
//Expr* convertTypeExpressionOrNull(const AstNode* node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider declaring these delete (something like the following)

Expr* convertTypeExpressionOrNull(const AstNode* node) = delete;

This way they will show up in "smart" autocompletion and be typechecked, without the burden of having to implement them. If you use a helper marked delete, then see the comment above, that would be a sign to bring it over.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to leave these as-is for now.

compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Show resolved Hide resolved
compiler/passes/convert-typed-uast.cpp Outdated Show resolved Hide resolved
@mppf
Copy link
Member Author

mppf commented Oct 14, 2024

Having gotten through through convert-type-uast.cpp my chief concern is the amount of duplication between the typed and untyped conversions. I know that code was copied (I see David's comments that I remember from convert-uast.cpp, but it also looks significantly changed. I find this concerning because, given the complexity of typed and untyped conversions, it's relatively hard for me to observe any differences between the two.

This is true in general. I can read through these enter functions (I have given them only brief looks while reviewing), but I certainly can't tell you to what degree they are equivalent to their untyped versions. And if I were to fix a bug or make a change in the sibling pass (e.g., typed to untyped), I think I would have some difficulty ensuring that change occurs in the current pass.

I don't know what can be done to address these changes -- can we split up some complicated transformations into structs/classes with state, and implement the calls to enter in terms of method calls on such structs, which incrementally change state? We might then be able to interweave typed-specific logic with the general conversion shape. I am okay with the PR going in as-is in the name of progress, but I do think it's going to be a pretty significant maintenance challenge.

I struggled with adjusting convert-uast.cpp for a long time. I wanted to make incremental progress on it but kept getting caught up in problems with the structure of that pass. To me, splitting the typed conversion into a different file is what allows any progress in this area at all. When it comes down to it, they are doing very different things and they have different issues.

I can read through these enter functions (I have given them only brief looks while reviewing), but I certainly can't tell you to what degree they are equivalent to their untyped versions.

Of course they are different. The new ones are doing typed conversion!

And if I were to fix a bug or make a change in the sibling pass (e.g., typed to untyped), I think I would have some difficulty ensuring that change occurs in the current pass.

IMO the form of the bug fix would be very different between the two, because having type information available everywhere is a very different situation. Moreover, the AST that is being generated is totally different. convert-typed-uast.cpp generates normalized and resolved AST while convert-uast.cpp matches the AST from ye olde parser while adding in some scope resolution. As a specific example, convert-typed-uast.cpp will handle return 1 by storing 1 into the return value variable, rather than emitting a PRIM_RETURN and expecting the normalizer to change it later.

I would even go so far as to argue that a given bug is unlikely to affect both of them.

Another element here is that I don't expect that we'll keep convert-uast.cpp at all in the long term. Once the new resolver is complete and we have completed convert-typed-uast.cpp, we can remove convert-uast.cpp and put any bits of it that are still used by convert-typed-uast.cpp into that remaining file. It has always been our plan to replace the early passes in the compiler & IMO the strategy here with convert-typed-uast.cpp will help us to do that.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Because I want to add a bunch of other shared stuff to it.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Goal is to have < on ResolvedFunction able to make
a deterministic order (that does not depend on pointer values).

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
and show it by converting return type

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
because with the non-const RV callers might want to add it, so probably
better to use hasAst in that case.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
We have a different way (based on call graph ordering) to make it
deterministic. The operator< way has the challenge of needing to order
types which is not quite straightforward.

If these are needed in the future, the drafts removed in this commit
could be added back in.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants