0x00E - Code Query Tools 👁️‍🗨️
If you want to get more insights about development trends and haven’t subscribed yet, go for it!
It took me some time to get this new issue afloat, many travels… I’m back in Tel Aviv now, so get ready for more!
Plus, if you can register on ProductHunt, that would be amazing ♥️ (next issue will include my first launch of Unzip - so having an account will be tremendously helpful).
Code Query Tools
Similar/related terms: Static code analysis, Code analysis engine, Semantic code analysis, SAST
TL;DR:
- Problem: Uncovering complex code patterns at scale is hard.
- Solution: Query codebases semantically (by meaning, not just syntax) at scale.
- In Sum: Find bugs, code insights, and enforce standards with semantic code analyzers instead of manual techniques (like ASTs).
How does it work? 💡
- The semantic code analyzer pre-processes your codebase so you can query it quickly later.
- You write queries such as:
from Function f where count(f.getAnArg()) > 5 select f
→ (CodeQL) which finds functions that have more than 5 arguments, or
from Function f where not exists(FunctionCall fc | fc.getTarget() = f) select f
→ which finds functions that are never called.
- You can integrate those queries into your build/CI process by adding rules that fail the build if they find something problematic.
- Most products in this space have IDE plugins you can also use.
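To make the two queries above concrete without installing anything, here is a rough Python equivalent built on the standard-library `ast` module. It is only a sketch of the idea, not how CodeQL works internally: the sample source and the helper names (`functions_with_many_params`, `functions_never_called`) are made up for illustration, and calls are resolved by plain name within a single file, which a real code query tool does far more thoroughly.

```python
import ast

# Hypothetical sample code to query; any Python source string would do.
SOURCE = '''
def add(a, b):
    return a + b

def configure(host, port, user, password, timeout, retries):
    return add(port, timeout)

def unused_helper():
    return 42
'''

FUNCTION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def functions_with_many_params(tree, limit=5):
    """Return names of functions that declare more than `limit` parameters."""
    names = []
    for node in ast.walk(tree):
        if isinstance(node, FUNCTION_NODES):
            args = node.args
            count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if count > limit:
                names.append(node.name)
    return sorted(names)


def functions_never_called(tree):
    """Return names of functions that are defined but never called by plain name."""
    defined = {n.name for n in ast.walk(tree) if isinstance(n, FUNCTION_NODES)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return sorted(defined - called)


tree = ast.parse(SOURCE)
many_params = functions_with_many_params(tree)
never_called = functions_never_called(tree)
print("More than 5 parameters:", many_params)
print("Never called:", never_called)
```

A CI step could wrap this in a script that exits with a non-zero status when either list is non-empty, which is the "fail the build" rule described above in its simplest form.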
For the unfamiliar and curious, I strongly recommend you check out some other program analysis concepts that underlie a lot of the tools we talk about, such as:
- Abstract Syntax Trees - Represent code as a tree structure, which makes it easy to analyze and manipulate.
- Control Flow Graphs - Show the paths the code could take when executed.
- Program Dependence Graphs - Represent dependencies between parts of the code.
- Code Property Graph - Uses all 3 concepts above together, check the (interesting!) whitepaper.
- Other related terms: Lexing, Parsing, and Taint tracking (figuring out how values are propagated in a program).
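Taint tracking is easier to picture with a toy example. The sketch below follows values through straight-line Python assignments and reports when an untrusted value reaches a sensitive call. Everything specific here is an assumption for illustration: `input` is treated as the only taint source, `run` is a hypothetical sink, and there is no handling of branches, loops, or function boundaries, which is exactly where control flow and dependence graphs come in.

```python
import ast

# Hypothetical program to analyze: user input flows into a query string.
SOURCE = '''
user = input()
greeting = "hello " + user
safe = "constant"
query = "SELECT * FROM t WHERE name = " + greeting
run(query)
run(safe)
'''

TAINT_SOURCES = {"input"}  # assumption: calls whose result is untrusted
SINKS = {"run"}            # assumption: calls that must not receive tainted data


def is_tainted(expr, tainted):
    """True if the expression reads a tainted variable or calls a taint source."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in TAINT_SOURCES
        ):
            return True
    return False


def find_tainted_sink_calls(source):
    """Return (line numbers of sink calls fed tainted data, set of tainted variables)."""
    tainted = set()
    findings = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            hit = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    # Propagate taint on assignment, or clear it on a clean overwrite.
                    if hit:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            if isinstance(call.func, ast.Name) and call.func.id in SINKS:
                if any(is_tainted(arg, tainted) for arg in call.args):
                    findings.append(stmt.lineno)
    return findings, tainted


findings, tainted_vars = find_tainted_sink_calls(SOURCE)
print("Tainted sink calls on lines:", findings)
print("Tainted variables:", sorted(tainted_vars))
```

Here the taint travels from `user` to `greeting` to `query`, so only the `run(query)` call is flagged while `run(safe)` passes.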
Who is this for? ✅
- Security-minded developers and DevSecOps professionals.
- Code-aware products (developer tools, code analysis, code data-mining).
Why? 🤔
- Security: Automating security tests.
- Large codebases: Querying a large codebase is a lot easier and faster than inspecting it manually.
- Language agnostic: These tools usually support more than one language with minimal changes to your query.
- Abstraction: Using a semantic query language is more delightful than using something like ASTs. Not to mention the extra metadata (dependencies and control flow) you get from such tools.
Why not? 🙅
- License: Most of these tools position themselves as security products, so their licenses are quite restrictive (e.g. CodeQL is only free for research and open-source projects).
- Overkill: Sometimes a more basic approach using classical static analysis (AST, CFG or PDG) might be enough.
Tools & players 🛠️
- CodeQL: GitHub-owned tool for querying code in an SQL-like fashion (supports 10 languages currently) that includes a VSCode extension.
- Semgrep: YAML pattern rules to do code analysis with the ability to auto-fix problems.
- Joern: Open-source platform for analyzing source code (and bytecode/executables).
- Weggli: Semantic search tool for C and C++ in large codebases.
- (Update) ast-grep: Structural code search, linting, and rewriting.
Forecast 🧞
- Use case variety: Most products in this space target security-minded folks. But I think they are missing a big chunk of use cases that way. This trend could be very interesting for developer tools, code efficiency analysis, code quality checks, and more.
- New tool: Because most of these tools focus on security and have a restrictive license, I could see a CodeQL-like open-source tool that puts less emphasis on security checks and is less restrictive.
- Usage: As more developers become familiar with code query tools, I could see a new wave of developer experience (DX) tools and experiences we haven’t seen before.
Extra ✨
- Semgrep’s Playground
- CRAQL Whitepaper: A Composable Language for Querying Source Code.
Thanks 🙏
I want to thank @TomGranot (the best DevRel I know) and @AndyKatz (tons of insights about technical graph-based code analysis) for their valuable input on this issue.
EOF
(Where I tend to share unrelated things).
Some thoughts I have about ChatGPT going forward:
Any questions, feedback, or suggestions are welcome 🙏
Simply reply to this e-mail or tweet at me @agammore - I promise to respond!