Here’s a concise update on recent work involving Abstract Syntax Trees (ASTs).
What ASTs are: Abstract Syntax Trees are tree representations of source code that typically omit purely syntactic details (such as grouping parentheses, punctuation, and whitespace) to expose the essential structure for analysis, transformation, or compilation. This foundational idea underpins many tooling tasks, including static analysis, code formatting, and automated refactoring.[7]
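As a minimal illustration, the sketch below uses Python's built-in `ast` module. Python is chosen only as an example, and the helper name `show_structure` is hypothetical. It parses an expression and dumps its tree. The grouping parentheses in the source do not appear as nodes. The grouping survives only as nesting, so redundant parentheses yield an identical tree.

```python
import ast


def show_structure(source: str) -> str:
    """Hypothetical helper: parse one expression and dump its AST as text."""
    tree = ast.parse(source, mode="eval")  # mode="eval" parses a single expression
    return ast.dump(tree.body)             # drop the Expression wrapper node


# Parentheses are not AST nodes: (a + b) simply becomes the left child
# of the multiplication node.
print(show_structure("(a + b) * c"))

# Redundant parentheses therefore produce exactly the same tree.
print(show_structure("((a + b)) * (c)") == show_structure("(a + b) * c"))  # True
```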
Current developments and trends:
- Recent research compares the ASTs produced by different parsers, such as JDT, Tree-sitter, ANTLR, and srcML. The comparison covers tree size and tree depth, as well as abstraction level, which is measured by unique node types and unique tokens. JDT yields the smallest trees with the highest abstraction level, and ANTLR yields the largest trees with the lowest. Tree-sitter and srcML fall in between. These trade-offs affect downstream tasks such as code summarization and code search.[2][4]
- There is growing interest in the role of ASTs in large language model (LLM) workflows, especially for code understanding, retrieval-augmented generation, and patching. Researchers are examining how AST representations affect what models learn and whether they improve performance on code-related tasks.[1][4]
- AST tooling continues to expand across languages. This includes multi-language parsers such as Tree-sitter and language-specific ones such as JDT for Java, reflecting ASTs’ central place in modern code-intelligence pipelines.[9]
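The size and abstraction measures used in that parser comparison can be approximated for any AST. The sketch below is a stand-in built on stated assumptions, not the study's tooling. It uses Python's `ast` module rather than the Java parsers that were compared. The hypothetical `ast_metrics` function counts all nodes (tree size), the nodes on the longest root-to-leaf path (tree depth), and the distinct node types. Distinct node types are one of the two abstraction measures; unique tokens are left out here.

```python
import ast


def ast_metrics(source: str) -> dict:
    """Hypothetical helper: size, depth, and unique node types of a Python AST."""
    tree = ast.parse(source)

    def depth(node: ast.AST) -> int:
        # Depth counts nodes on the longest path from this node down to a leaf.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    # ast.walk visits every node, including context nodes such as Load/Store
    # and operator nodes such as Add/Mult.
    nodes = list(ast.walk(tree))
    return {
        "tree_size": len(nodes),
        "tree_depth": depth(tree),
        "unique_types": len({type(n).__name__ for n in nodes}),
    }


print(ast_metrics("x = 1"))
print(ast_metrics("x = (1 + 2) * 3"))
```

Whether to count context and operator nodes is a design choice that changes the numbers. Choices like this are one reason raw tree sizes are not directly comparable across parsers.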
Practical implications:
- Tooling such as code formatters, linters, and refactoring engines depends on reliable ASTs. The choice of parser can affect performance and accuracy. It also affects how much structural detail models or developers must process when using those tools.[2][7]
- As AI-assisted development grows, ASTs may play an increasingly important role in representing code for models, enabling more precise transformations and safer automated edits.[4][1]
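To show why AST-level edits can be more precise than text edits, here is a minimal sketch using Python's `ast.NodeTransformer` and `ast.unparse` (Python 3.9+). The `RenameVariable` class and the `rename` function are hypothetical illustrations, not part of any tool mentioned above. Renaming on the tree changes the variable but leaves the same word inside a string literal untouched. A plain text replacement rewrites both.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Hypothetical transformer: rename every Name node whose id equals `old`."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


def rename(source: str, old: str, new: str) -> str:
    tree = RenameVariable(old, new).visit(ast.parse(source))
    # ast.unparse (Python 3.9+) regenerates source text from the modified tree.
    return ast.unparse(tree)


src = 'total = 0\nprint("total:", total)\n'

print(rename(src, "total", "running_sum"))
# running_sum = 0
# print('total:', running_sum)

print(src.replace("total", "running_sum"))  # text edit also rewrites the string literal
```

The sketch has several limitations. It ignores scoping, so it would also rename an unrelated `total` in another function. It does not rename function parameters (`ast.arg`) or attributes. In addition, `ast.unparse` discards comments and the original formatting. Refactoring tools generally need scope analysis and a representation that preserves formatting.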
If you want, I can tailor this to a specific language or toolchain (e.g., Java, JavaScript/TypeScript, Python) and pull the most relevant, up-to-date sources for that ecosystem. I can also summarize practical guidance for choosing an AST parser based on your use case (analysis, transformation, or generation). Citations can be provided after each relevant point.
Sources:
- arxiv.org: Based on the extensive experimental results, we conclude the following findings: • The ASTs generated by different AST parsing methods differ in size and abstraction level. The size (in terms of tree size and tree depth) and abstraction level (in terms of unique types and unique tokens) of the ASTs generated by JDT are the smallest and highest, respectively. On … pets require more high-level abstract summaries in code summarization, and code snippets semantically match but contain fewer query...
- www.science.gov: We apply the approach to gradually migrate the schemas of the AUTOBAYES program synthesis system to concrete syntax. First experiences show that this can result in a considerable reduction of the code size and an improved readability of the code. In particular, abstracting out fresh-variable generation and second-order term construction allows the formulation of larger continuous fragments and improves the locality in the schemas. … We used the recent grammar of the Arden Syntax v.2.10, and both...
- news.ycombinator.com (ievans, June 7, 2021): It supports many more languages (~17 at various stages of development) and being able to do AST patching as in the original is one of the capabilities we're experimenting with: https://semgrep.dev/docs/experiments/overview/#autofix Would love your feedback!
- arxiv.org: • The ASTs generated by different AST parsing methods differ in size and abstraction level. The size (in terms of tree size and tree depth) and abstraction level (in terms of unique types and unique tokens) of the ASTs generated by JDT are the smallest and highest, respectively. On the contrary, ASTs generated by ANTLR exhibit the largest size and the lowest abstraction level. Tree-sitter and srcML are both intermediate in structure size and abstraction level between JDT and ANTLR. … • Among...
- alan.petitepomme.net: …interpreter, pyre-ast will be able to parse/reject it as well. Furthermore, abstract syntax trees obtained from pyre-ast is guaranteed to 100% match the results obtained by Python's own ast.parse API, down to every AST node and every line/column number.