# Better Debugging With LLDB (in VS Code + CodeLLDB)
## The Problem
When debugging Unreal with VS Code + CodeLLDB, the **Variables** pane and
the **Watch** pane use two completely different evaluation paths:
- **Variables pane** walks a tree built by lldb's *synthetic children
providers* (Unreal's Python formatters for TArray, TMap, FName, FString,
and now our TObjectPtr/TSharedPtr/TWeakPtr). Values are looked up by
offset/type — no compilation. Base classes appear as named children
("SCompoundWidget", "UWidget"); smart-pointer inners get unwrapped;
container elements get indexed.
- **Watch pane** (without intervention) runs text through the `/se` (simple)
or `/nat` (Clang) evaluator. `/nat` fails for most Unreal paths;
`/se` transpiles to Python and walks the SBValue tree, but — as
originally shipped — fails on pointer auto-deref and can't find base
classes by type name.

When you right-click in Variables and **Add to Watch**, CodeLLDB sends
a path like `Top.Widget`. For synthetic children it builds that path
itself, joining every segment with `.` and ignoring whether an
intermediate value is a pointer. Without the fix below, that path fails in
Watch even though it identifies a real value Variables can show.
## What We Did
All changes live in `tools/UEDataFormatter.py`, loaded via `initCommands`
in every launch config in `Integration.code-workspace.tpl.json`.
### 1. Patched `codelldb.value.Value.__getattr__`
At module load, we monkey-patch CodeLLDB's `Value` class — the Python
wrapper it uses inside `/se` and `/py` evaluation — to:
- **Auto-deref pointers** before descending into a named child.
- **Fall back to iterating children** by `GetName()` when
`GetChildMemberWithName` returns invalid. This catches base classes
(which appear as named children when iterated but can't be looked up
by name).

**Result:** plain "Add to Watch" from the Variables pane produces a path
like `Top.Widget.SomeField` and Watch evaluates it correctly, with the same
expandable tree you'd see in Variables. No `/py fv(...)` wrapper, no
VS Code extension, no manual prefix typing.

Blast radius: every `/se` and `/py` expression that goes through
`Value.__getattr__`. The change is strictly more permissive (paths that
used to fail now succeed), but it is a global behavior change to
CodeLLDB internals. Breaks if CodeLLDB renames `Value` or the
`__sbvalue` slot; re-applied automatically on extension reload since
our patch runs on `command script import`.
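The lookup order described above can be sketched in isolation. This is a minimal illustration, not the code in `tools/UEDataFormatter.py`: `FakeSBValue` and `find_child` are hypothetical names, and the stub only mimics the handful of `SBValue` methods the lookup needs so that the example runs without a debugger. How the result is wired into `Value.__getattr__` is left out because it depends on CodeLLDB internals.

```python
class FakeSBValue:
    """Tiny stand-in for lldb.SBValue so the sketch runs without a debugger."""

    def __init__(self, name=None, children=(), pointee=None, by_name=True, valid=True):
        self.name = name
        self.children = list(children)
        self.pointee = pointee    # non-None marks this value as a pointer
        self.by_name = by_name    # False mimics a base class: listed, not findable
        self.valid = valid

    def IsValid(self):
        return self.valid

    def GetName(self):
        return self.name

    def TypeIsPointerType(self):
        return self.pointee is not None

    def Dereference(self):
        return self.pointee

    def GetNumChildren(self):
        return len(self.children)

    def GetChildAtIndex(self, index):
        return self.children[index]

    def GetChildMemberWithName(self, name):
        for child in self.children:
            if child.by_name and child.name == name:
                return child
        return FakeSBValue(valid=False)


def find_child(sbvalue, name):
    """Resolve one `.name` step: deref, look up by name, then scan children."""
    # Step 1: auto-deref, so `ptr.Field` behaves like `ptr->Field`.
    if sbvalue.TypeIsPointerType():
        sbvalue = sbvalue.Dereference()
    # Step 2: the normal by-name lookup.
    child = sbvalue.GetChildMemberWithName(name)
    if child.IsValid():
        return child
    # Step 3: fall back to iterating children and comparing GetName().
    for index in range(sbvalue.GetNumChildren()):
        candidate = sbvalue.GetChildAtIndex(index)
        if candidate.GetName() == name:
            return candidate
    return child  # still invalid: the caller decides how to report it


# A pointer to a widget whose base class is only reachable by iteration.
base = FakeSBValue("SCompoundWidget", children=[FakeSBValue("ChildSlot")], by_name=False)
widget = FakeSBValue("Widget", children=[base, FakeSBValue("Visibility")])
widget_ptr = FakeSBValue("WidgetPtr", pointee=widget)

print(find_child(widget_ptr, "Visibility").GetName())
print(find_child(widget_ptr, "SCompoundWidget").GetName())
print(find_child(widget_ptr, "Missing").IsValid())
```

The fallback only runs after the by-name lookup fails, so paths that already resolved keep resolving the same way.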
### 2. Synth + Summary Providers for Smart Pointers
UE's engine formatter covers containers and TWeakObjectPtr, but not the
smart pointer family. We added providers for:
- **`TObjectPtr<T>`** — shows `nullptr` / `unresolved` / wrapped object's
summary. Expanding flattens straight to the target's members (no
intermediate `*DebugPtr` click).
- **`TSharedPtr<T>`** and **`TSharedRef<T>`** — shows `nullptr` / target
summary; expands straight to target's members.
- **`TWeakPtr<T>`** — checks
`WeakReferenceCount.ReferenceController.SharedReferenceCount` before
dereferencing. Expired weak refs show `expired` rather than garbage
from a dangling pointer.
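The expiry check for `TWeakPtr<T>` might look roughly like the sketch below. The member path comes from the bullet above; everything else is an assumption: `weak_ptr_state`, `FakeValue`, and `make_weak_ptr` are hypothetical names, the stub stands in for `lldb.SBValue`, and the code assumes `SharedReferenceCount` reads as a plain integer (an atomic counter would need an extra unwrap step).

```python
class FakeValue:
    """Stand-in for lldb.SBValue with just the calls this sketch uses."""

    def __init__(self, unsigned=0, members=None, valid=True):
        self.unsigned = unsigned
        self.members = members or {}
        self.valid = valid

    def IsValid(self):
        return self.valid

    def GetValueAsUnsigned(self, fail_value=0):
        return self.unsigned if self.valid else fail_value

    def GetChildMemberWithName(self, name):
        return self.members.get(name, FakeValue(valid=False))


def weak_ptr_state(valobj):
    """Classify a TWeakPtr-like value before anything dereferences it."""
    controller = (
        valobj.GetChildMemberWithName("WeakReferenceCount")
        .GetChildMemberWithName("ReferenceController")
    )
    # No controller at all: the weak pointer was never bound to an object.
    if not controller.IsValid() or controller.GetValueAsUnsigned(0) == 0:
        return "nullptr"
    # Controller exists but no shared owners remain: the target is gone.
    shared = controller.GetChildMemberWithName("SharedReferenceCount")
    if shared.GetValueAsUnsigned(0) == 0:
        return "expired"
    return "live"


def make_weak_ptr(controller_address, shared_count):
    """Build a fake TWeakPtr layout for the demo (hypothetical helper)."""
    controller = FakeValue(
        controller_address, {"SharedReferenceCount": FakeValue(shared_count)}
    )
    weak_count = FakeValue(members={"ReferenceController": controller})
    return FakeValue(members={"WeakReferenceCount": weak_count})


print(weak_ptr_state(make_weak_ptr(0, 0)))
print(weak_ptr_state(make_weak_ptr(0x1000, 0)))
print(weak_ptr_state(make_weak_ptr(0x1000, 2)))
```

Only the `live` case is safe to expand into the target's members; the other two should stop at the summary string.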

All registration regexes are anchored with `^`. An unanchored pattern can
match anywhere in the type name, so its `.+` also picks up nested
occurrences: a `TArray<TObjectPtr<X>, ...>` would
get dispatched to the TObjectPtr provider instead of TArray.
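The effect of the anchor is easy to reproduce with Python's `re` module, used here only as a stand-in for lldb's own regex matching. The two patterns and the type names are illustrative assumptions, not the exact strings registered in `tools/UEDataFormatter.py`.

```python
import re

# Illustrative patterns: one as it might be written naively, one anchored.
UNANCHORED = re.compile(r"TObjectPtr<.+>")
ANCHORED = re.compile(r"^TObjectPtr<.+>")

TYPE_NAMES = [
    "TObjectPtr<UWidget>",
    "TArray<TObjectPtr<UWidget>, FDefaultAllocator>",
]


def matches(pattern, type_name):
    """Search semantics: the pattern may match anywhere unless anchored."""
    return pattern.search(type_name) is not None


for type_name in TYPE_NAMES:
    print(
        f"{type_name!r}: unanchored={matches(UNANCHORED, type_name)} "
        f"anchored={matches(ANCHORED, type_name)}"
    )
```

Both patterns accept the bare `TObjectPtr<UWidget>`, but only the unanchored one also claims the `TArray<...>` type, which is the misdispatch the anchor prevents.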
### 3. Dynamic Type Resolution
Every launch config now sets:

    settings set target.prefer-dynamic-value no-run-target

lldb reads the vtable at each polymorphic value and shows the runtime
type's members. A `UObject*` that actually points to a `UUserWidget`
expands to the full `UUserWidget` subtree, not just `UObject`.
`no-run-target` avoids running code in the debuggee, which is important
during synthesis.
### 4. Universal SIGTRAP Handling
`process handle SIGTRAP --notify false --pass false --stop false` is now
in every launch config. Unreal raises SIGTRAP internally in a number of
places (soft asserts, ensure-style checks); without this, the debugger
stops constantly.
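Taken together, the debugger commands from sections 1 to 4 amount to a short list per launch config. The sketch below assembles that list in Python and prints it as a JSON fragment. The text only states that the formatter is loaded via `initCommands`; putting all three commands under that one key, and the relative formatter path, are assumptions made for brevity. Depending on the lldb version, `process handle` may need to run once a target or process exists.

```python
import json

# Path as named in this document; how it resolves relative to the
# workspace root is an assumption.
FORMATTER_PATH = "tools/UEDataFormatter.py"

debugger_commands = [
    # Loads the formatter, which applies the Value patch and the providers.
    f"command script import {FORMATTER_PATH}",
    # Resolve runtime types from the vtable without running debuggee code.
    "settings set target.prefer-dynamic-value no-run-target",
    # Swallow SIGTRAP: do not stop, do not notify, do not pass it on.
    "process handle SIGTRAP --notify false --pass false --stop false",
]

# Assumed layout: everything under `initCommands`.
launch_fragment = {"initCommands": debugger_commands}

print(json.dumps(launch_fragment, indent=2))
```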
## Reloading Without a Session Restart
If you edit `tools/UEDataFormatter.py`, reload in the Debug Console:

    script import importlib; importlib.reload(UEDataFormatter); UEDataFormatter.__lldb_init_module(lldb.debugger, {})

The reload re-executes module code (updating the patch and the provider
classes). The explicit `__lldb_init_module` call re-runs the provider
registrations — lldb only fires `__lldb_init_module` on initial import,
not on reload.
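The split between "module code re-runs" and "the init hook does not" can be reproduced in plain Python, without lldb. In this sketch `fake_formatter` is a hypothetical stand-in module written to a temporary directory, and the first explicit hook call imitates what lldb does once on `command script import`.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for the formatter module.
FAKE_FORMATTER = textwrap.dedent(
    """
    # Module-level code: runs on import and again on every importlib.reload().
    # reload() reuses the module namespace, so these counters survive it.
    EXEC_COUNT = globals().get("EXEC_COUNT", 0) + 1
    REGISTRATIONS = globals().get("REGISTRATIONS", 0)

    def __lldb_init_module(debugger, internal_dict):
        # A real formatter would register its providers here.
        global REGISTRATIONS
        REGISTRATIONS += 1
    """
)

module_dir = tempfile.mkdtemp()
Path(module_dir, "fake_formatter.py").write_text(FAKE_FORMATTER)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

fake_formatter = importlib.import_module("fake_formatter")
fake_formatter.__lldb_init_module(None, {})  # imitates lldb's one-time call
after_import = (fake_formatter.EXEC_COUNT, fake_formatter.REGISTRATIONS)

importlib.reload(fake_formatter)  # module code re-runs, the hook does not
after_reload = (fake_formatter.EXEC_COUNT, fake_formatter.REGISTRATIONS)

fake_formatter.__lldb_init_module(None, {})  # the explicit call in the one-liner
after_explicit = (fake_formatter.EXEC_COUNT, fake_formatter.REGISTRATIONS)

print(after_import, after_reload, after_explicit)
```

The execution counter moves on reload while the registration counter only moves when the hook is called by hand, which is why the reload one-liner ends with an explicit `__lldb_init_module` call.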
## Summary of Workflow
| Action | Where | How |
|---|---|---|
| Explore a value | Variables pane | Click disclosure triangles |
| Track a value across steps | Watch pane | Right-click variable → Add to Watch (just works) |
| One-shot inspection | Debug Console | `v Widget.Object->SCompoundWidget.SWidget` (`v` = `frame variable`; use `->` for pointers explicitly) |
| Reload formatter edits | Debug Console | `script import importlib; importlib.reload(UEDataFormatter); UEDataFormatter.__lldb_init_module(lldb.debugger, {})` |
## Notes on the Design
### Why this works
CodeLLDB's `/se` evaluator transpiles user expressions into Python that
operates on `Value` objects. `Value.__getattr__` drives every `.field`
access. By making that method auto-deref and iterate for base classes,
every downstream mechanism (Watch, Debug Console, hover, conditional
breakpoints) inherits the fix.
### Why `fv` is no longer needed
Earlier we had an `fv(path)` helper plus a plan for a companion VS Code
extension to wrap "Add to Watch" results in `/py fv(...)`. The
`Value.__getattr__` patch makes the default path work, so that whole
layer is obsolete.
### Why not patch `SBValue` instead
Tempting, but much larger blast radius — affects every tool, every
adapter, every Python script using lldb. Patching `Value` confines the
change to CodeLLDB's expression pipeline.
## Ideas for Further Improvement
### Propose `/fv` mode to CodeLLDB upstream
Prefix dispatch is in CodeLLDB's Rust binary, so we can't add a new
prefix from Python. A clean feature request would be a native `/fv`
evaluator backed by `SBFrame::GetValueForVariablePath(code)`. That
would give direct access to lldb's own frame-variable walker without
the Python/Value indirection, though it has its own limitations
(no arithmetic, no casts).
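For reference, the same walker is reachable from Python through `SBFrame.GetValueForVariablePath`. The sketch below shows roughly what a `/fv`-style lookup would do; `resolve_path` and the stub classes are hypothetical and exist only so the example runs without a debugger. With a real session the `frame` argument would be an `lldb.SBFrame`.

```python
def resolve_path(frame, path):
    """Resolve a frame-variable path such as `Top->Widget`, or raise."""
    value = frame.GetValueForVariablePath(path)
    # A failed lookup can surface as an invalid value or as an error on it.
    if not value.IsValid() or value.GetError().Fail():
        raise LookupError(f"cannot resolve {path!r}")
    return value


# --- Stubs below stand in for lldb objects; they are not part of lldb. ---
class FakeError:
    def __init__(self, failed):
        self.failed = failed

    def Fail(self):
        return self.failed


class FakeValue:
    def __init__(self, name, failed=False):
        self.name = name
        self.failed = failed

    def IsValid(self):
        return True

    def GetError(self):
        return FakeError(self.failed)

    def GetName(self):
        return self.name


class FakeFrame:
    """Knows a fixed set of paths; everything else resolves to an error value."""

    def __init__(self, known_paths):
        self.known_paths = set(known_paths)

    def GetValueForVariablePath(self, path):
        return FakeValue(path, failed=path not in self.known_paths)


frame = FakeFrame({"Top->Widget"})
print(resolve_path(frame, "Top->Widget").GetName())
try:
    resolve_path(frame, "Top.Widget")
except LookupError as error:
    print(error)
```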
### More synth providers
- **`TOptional<T>`** — hide the storage bytes and `bIsSet`; expose the
contained value (or `unset`) directly.
- **`TVariant<...>`** — expose the currently-held alternative as the
single child.
- **`FText`** — show the resolved localized string as the summary.
- **`FSoftObjectPtr` / `FSoftClassPtr`** — show the asset path.
### Enrich TObjectPtr summary
When resolved, show both class and name (`UUserWidget 'W_HUD_0'`)
instead of just the name. The class is reachable via
`ClassPrivate->NamePrivate`.
### A `fdump` helper
A Debug Console helper that prints an entire subtree as indented text —
useful for grabbing a snapshot of complex state into a log or comment.
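One possible shape for such a helper is a depth-limited recursive walk over children. This is only a sketch of the idea: `fdump` does not exist yet, `FakeValue` is a hypothetical stub for `lldb.SBValue`, and the output format is an arbitrary choice.

```python
def fdump(value, max_depth=3, _depth=0):
    """Return the subtree under `value` as a list of indented lines."""
    # Prefer the formatter summary, then the raw value, else leave it blank.
    text = value.GetSummary() or value.GetValue() or ""
    lines = [f"{'  ' * _depth}{value.GetName()}: {text}".rstrip()]
    # The depth limit also keeps cyclic pointer graphs from recursing forever.
    if _depth < max_depth:
        for index in range(value.GetNumChildren()):
            child = value.GetChildAtIndex(index)
            lines.extend(fdump(child, max_depth, _depth + 1))
    return lines


class FakeValue:
    """Stand-in for lldb.SBValue with just the calls fdump uses."""

    def __init__(self, name, text=None, children=()):
        self.name = name
        self.text = text
        self.children = list(children)

    def GetName(self):
        return self.name

    def GetSummary(self):
        return None  # no summary provider in this stub

    def GetValue(self):
        return self.text

    def GetNumChildren(self):
        return len(self.children)

    def GetChildAtIndex(self, index):
        return self.children[index]


tree = FakeValue(
    "Widget",
    children=[
        FakeValue("Visibility", "Visible"),
        FakeValue("Slot", children=[FakeValue("Padding", "4")]),
    ],
)

print("\n".join(fdump(tree)))
```

Returning a list of lines instead of printing directly keeps the helper easy to redirect into a log or paste into a comment.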
### Get `Copy as Expression` to emit `->` for pointers
The path CodeLLDB builds for synthetic children uses `.` uniformly,
regardless of whether intermediate values are pointers. That's why
`v Top.Widget` fails but `v Top->Widget` works. A feature request to
have CodeLLDB emit `->` when traversing a pointer would make paths
`frame variable`-compatible out of the box.