Morph: AST-Level Refactoring Where the LLM Describes Intent, Not Code

When an LLM generates source code for a refactor, the output is a diff that a reviewer must either read line by line or trust blindly. Short of reading every line, there is no way to know whether the model missed a reference, broke an import, or introduced a subtle logic change.

Morph takes a different approach. Instead of asking the LLM to generate code, it asks the LLM to describe what to change as a structured plan of typed operations - RenameSymbol, MoveFunction, ExtractModule, and more. A reviewer reads ten structured operations in seconds and knows exactly what will change, why, and in what order. The transformation engine then validates the plan against the codebase's actual dependency graph, applies each operation atomically using tree-sitter AST manipulation, runs the test suite to confirm correctness, and stages clean changes for review. Failed transformations roll back automatically.

The LLM's job is intent declaration, not code writing. Morph's engine handles everything else.

Why Typed Plans Beat Source Code Generation

When a refactoring is expressed as a typed plan, every operation is verifiable before it runs. The plan validator checks file existence, symbol existence, dependency conflicts, and operation conflicts against a real dependency graph. The transformer applies operations in dependency order. The verifier runs pytest after every apply - any failure triggers automatic rollback.

Source code generation has none of these guarantees. A typed plan does.
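To make "verifiable before it runs" concrete, here is a minimal sketch of three of the validator's checks (file existence, symbol existence, and operation conflicts), assuming Python source files. The `Operation` shape, `defined_names`, and `validate_plan` are hypothetical names for illustration, not Morph's actual API, and the sketch uses the standard-library `ast` module in place of tree-sitter. Dependency conflicts need the graph and are left out here.

```python
import ast
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Operation:
    kind: str    # e.g. "RenameSymbol" (hypothetical shape, for illustration only)
    file: str    # path relative to the project root
    symbol: str  # symbol the operation targets


def defined_names(path: Path) -> set:
    """Top-level function, class, and simple assignment names in a Python file."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    names = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return names


def validate_plan(root: Path, ops: list) -> list:
    """Return human-readable errors; an empty list means the plan passed."""
    errors = []
    first_claim = {}  # (file, symbol) -> index of the first operation targeting it
    for i, op in enumerate(ops):
        path = root / op.file
        if not path.is_file():
            errors.append(f"op {i}: file not found: {op.file}")
            continue
        if op.symbol not in defined_names(path):
            errors.append(f"op {i}: {op.symbol!r} is not defined in {op.file}")
        key = (op.file, op.symbol)
        if key in first_claim:
            errors.append(f"op {i}: conflicts with op {first_claim[key]} on {op.symbol!r}")
        else:
            first_claim[key] = i
    return errors
```

Because every check runs before any file is opened for writing, a plan that names a missing file or symbol is rejected with a precise message instead of producing a half-applied change.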

The Pipeline

A natural language goal enters the LLM Planner, which outputs a validated TransformationPlan. The Plan Validator checks file existence, symbol existence, dependency conflicts, and operation conflicts against a NetworkX dependency graph. The Transformer applies operations in dependency order using tree-sitter AST manipulation, creating a file backup first. The Verifier runs pytest - any failure triggers automatic rollback. Clean changes are handed off to the Staging Manager via GitPython and summarised in a Report.
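The Transformer step edits syntax rather than raw text. As a rough illustration of why that matters, the sketch below renames a symbol using Python's standard `tokenize` module as a lightweight stand-in for tree-sitter: only identifier tokens that exactly match the old name change, so string literals, comments, and longer names such as `calculate_total_v2` are left alone. `rename_symbol` is a hypothetical helper, and unlike a full AST-based rename it is not scope-aware (an attribute such as `obj.calculate_total` would also be renamed).

```python
import io
import tokenize


def rename_symbol(source: str, old: str, new: str) -> str:
    """Rename every NAME token equal to `old`; strings and comments are untouched."""
    # The tokenizer classifies text, so the name inside a string literal is part of a
    # STRING token and inside a comment is part of a COMMENT token; neither matches.
    hits = [
        tok.start
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]
    # Split lines the same way the tokenizer read them so (row, col) positions line up.
    lines = io.StringIO(source).readlines()
    # Splice right-to-left so earlier column offsets on the same line stay valid.
    for row, col in reversed(hits):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)
```

A naive `str.replace` would have rewritten the comment and the string literal too; the token-level version touches only the two real references.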

Supported Operations

Each operation is a typed Pydantic model, such as RenameSymbol, MoveFunction, or ExtractModule. The LLM populates the fields; Morph validates and executes them.
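A minimal sketch of how typed operations turn the planner's JSON reply into something checkable. It uses standard-library dataclasses as a stand-in for Pydantic, and the field names (`old_name`, `target_file`, and so on), the `"type"` discriminator, and `parse_plan` are assumptions for illustration, not Morph's real schema. Unlike Pydantic, dataclasses do not check field types at runtime, so this sketch only enforces operation types and field names.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class RenameSymbol:
    file: str
    old_name: str
    new_name: str


@dataclass
class MoveFunction:
    source_file: str
    function_name: str
    target_file: str


@dataclass
class ExtractModule:
    source_file: str
    symbols: List[str]
    new_module: str


# Hypothetical discriminator value in the planner's JSON -> operation class.
OPERATION_TYPES = {cls.__name__: cls for cls in (RenameSymbol, MoveFunction, ExtractModule)}


def parse_plan(raw: str) -> list:
    """Build typed operations from the planner's JSON reply.

    Rejects unknown operation types and missing or extra fields.
    """
    ops = []
    for i, item in enumerate(json.loads(raw)["operations"]):
        cls = OPERATION_TYPES.get(item.get("type"))
        if cls is None:
            raise ValueError(f"operation {i}: unknown type {item.get('type')!r}")
        args = {k: v for k, v in item.items() if k != "type"}
        expected = {f.name for f in fields(cls)}
        if set(args) != expected:
            raise ValueError(f"operation {i}: expected fields {sorted(expected)}, got {sorted(args)}")
        ops.append(cls(**args))
    return ops
```

The point of the design is that a malformed or invented operation fails at parse time, before the validator or transformer ever sees it.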

How the Dependency Graph Works

Before validating any plan, Morph parses the entire codebase with tree-sitter and builds a NetworkX dependency graph. This graph is used to:

Detect files that import the symbol being moved or renamed
Sort operations so dependencies are updated before dependents
Warn when a move will cascade across downstream files
Prevent module extraction from introducing circular dependencies
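The four uses above can be sketched with the standard library alone. This is not Morph's implementation: the standard-library `ast` module stands in for tree-sitter, a plain dict of sets stands in for the NetworkX graph, and `graphlib.TopologicalSorter` provides the dependency ordering and cycle detection. All function names are hypothetical, and modules are passed in as an in-memory `{name: source}` dict to keep the example self-contained.

```python
import ast
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def import_graph(modules: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each project module to the project modules it imports (external imports dropped)."""
    graph = {}
    for name, source in modules.items():
        deps = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[name] = {d for d in deps if d in modules}
    return graph


def importers_of(graph: Dict[str, Set[str]], module: str) -> Set[str]:
    """Modules that must be updated if `module`'s symbols are moved or renamed."""
    return {m for m, deps in graph.items() if module in deps}


def update_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Dependencies first, dependents after; raises CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def creates_cycle(graph: Dict[str, Set[str]], importer: str, imported: str) -> bool:
    """Would adding `importer -> imported` (e.g. after an extraction) introduce a cycle?"""
    trial = {m: set(deps) for m, deps in graph.items()}
    trial.setdefault(importer, set()).add(imported)
    try:
        update_order(trial)
    except CycleError:
        return True
    return False
```

For a chain where `billing` imports `utils` and `api` imports `billing`, `importers_of(graph, "utils")` flags `billing` as a file to update, `update_order` yields `utils`, `billing`, `api`, and making `utils` import `api` would be refused as a cycle.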

This is what makes Morph safe to run on real codebases - the plan is validated against the actual dependency structure before a single file is touched.

Rollback Guarantee

Every non-dry-run apply call snapshots all affected files before touching them. If pytest reports failures after transformation, Morph restores from the snapshot automatically. The workspace is always left in a clean, known-good state.
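A minimal sketch of the snapshot-then-restore pattern described above, assuming the transform and the test run can be passed in as callables. `apply_with_rollback` is a hypothetical name, not Morph's API. As a simplification, files newly created by the transform are not tracked; a fuller version would also delete them on rollback.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def _restore(backups: Dict[Path, Path]) -> None:
    for original, copy in backups.items():
        shutil.copy2(copy, original)


def apply_with_rollback(files: Iterable[Path], transform: Callable[[], None],
                        verify: Callable[[], bool]) -> bool:
    """Snapshot `files`, run `transform`, then `verify`; restore the snapshot on failure.

    Returns True if the change was kept.
    """
    with tempfile.TemporaryDirectory() as backup_dir:
        backups = {}
        for i, path in enumerate(files):
            copy = Path(backup_dir) / f"{i}.bak"  # index-based name avoids basename clashes
            shutil.copy2(path, copy)
            backups[Path(path)] = copy
        try:
            transform()
            ok = verify()
        except BaseException:
            _restore(backups)  # an exception mid-transform also rolls back
            raise
        if not ok:
            _restore(backups)
        return ok
```

In Morph's pipeline the `verify` callable would correspond to the pytest run; here it is any function returning a boolean.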

Live Results

In a real dry-run against anthropic/claude-haiku-4-5 via OpenRouter, the LLM parsed a natural language rename goal and produced a validated RenameSymbol plan in under 5 seconds. Full output and reproduction steps are in RESULTS.md.

Installation

From the root of a local clone of the repository:

pip install -e .

For local inference, install Ollama and pull a model:

ollama pull gemma4:e4b

For cloud backends, set the relevant environment variable:
OPENROUTER_API_KEY - OpenRouter (recommended)
OPENAI_API_KEY - OpenAI
ANTHROPIC_API_KEY - Anthropic

Usage

Describe what you want in plain English. Morph figures out the operations:

morph refactor --goal "rename calculate_total to compute_total" ./src

Preview the plan without touching any files:

morph refactor --goal "extract validation logic into validate_input()" ./src --dry-run

Generate and save the plan for inspection before applying:

morph plan --goal "add type annotations to all functions in utils.py" ./src --output plan.json

Apply a saved plan:

morph refactor --plan plan.json ./src

Verify the codebase passes its own test suite:

morph verify ./src
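Conceptually, a pass/fail verification step reduces to running the test runner and checking its exit status. The sketch below is an assumption about how such a check can be written, not Morph's code; `run_tests` and its `command` override are hypothetical. It relies on pytest's documented exit codes: 0 when all collected tests pass, 1 when some fail, and 5 when no tests are collected.

```python
import subprocess
import sys
from typing import List, Optional


def run_tests(path: str, command: Optional[List[str]] = None) -> bool:
    """Run the test suite under `path`; True only if the runner exits with code 0."""
    # Any non-zero code, including pytest's 5 ("no tests collected"),
    # is treated as "not verified" in this sketch.
    cmd = command or [sys.executable, "-m", "pytest", "-q", path]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0
```

Treating "no tests collected" as a failure is a deliberate conservative choice: a refactor on a codebase without tests cannot be verified this way.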

Generate a Markdown report of the last run:

morph report ./src --format markdown --output REFACTOR_REPORT.md
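As a rough idea of what a run report contains, here is a sketch that renders a hypothetical run record (goal, operations, test outcome, rollback status) as Markdown. `RunRecord`, its fields, and the report layout are assumptions for illustration; Morph's actual report format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunRecord:
    goal: str
    operations: List[str]  # human-readable summaries of each applied operation
    tests_passed: bool
    rolled_back: bool


def render_markdown(run: RunRecord) -> str:
    """Render a run record as a short Markdown report."""
    lines = ["# Refactor Report", "", f"**Goal:** {run.goal}", "", "## Operations", ""]
    lines += [f"{i}. {op}" for i, op in enumerate(run.operations, start=1)]
    lines += [
        "",
        "## Verification",
        "",
        f"- Tests: {'passed' if run.tests_passed else 'failed'}",
        f"- Rolled back: {'yes' if run.rolled_back else 'no'}",
    ]
    return "\n".join(lines) + "\n"
```

Keeping the ordered operation list in the report mirrors the plan itself, so the reviewer sees the same sequence that was validated and applied.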

Supported Models

Morph supports OpenRouter, OpenAI, and Anthropic as cloud backends, plus local models through Ollama. OpenRouter is the recommended starting point: one API key routes to many models without separate accounts.

The planner uses temperature=0.1 - low randomness produces more consistent structured output. Unknown model strings are automatically routed through OpenRouter with no --backend flag required.
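The routing rule above can be sketched as a small lookup: an explicit backend wins, models with a known dedicated backend go there, and anything unrecognised falls through to OpenRouter. `resolve_backend` and the `KNOWN_MODELS` table are hypothetical; only the environment variable names come from the Installation section.

```python
import os
from typing import Mapping, Optional

# Hypothetical table of model strings that have a dedicated backend.
KNOWN_MODELS = {"gemma4:e4b": "ollama"}

# Environment variable each cloud backend reads (from the Installation section).
API_KEY_VARS = {
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_backend(model: str, backend: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Pick a backend: an explicit choice wins, known models map to their
    backend, and anything unrecognised falls through to OpenRouter."""
    chosen = backend or KNOWN_MODELS.get(model, "openrouter")
    key_var = API_KEY_VARS.get(chosen)  # local backends such as Ollama need no key
    if key_var and not env.get(key_var):
        raise RuntimeError(f"backend {chosen!r} requires {key_var} to be set")
    return chosen
```

Under this rule an OpenRouter-style string such as anthropic/claude-haiku-4-5 needs no backend flag: it is not in the table, so it resolves to OpenRouter.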

CLI Reference

morph refactor --goal "..." PATH - Generate plan from goal and apply it
morph refactor --plan FILE PATH - Apply a previously saved plan
morph refactor ... --dry-run - Show plan without modifying files
morph plan --goal "..." PATH - Generate and display plan only
morph verify PATH - Run the test suite and report pass/fail
morph report PATH - Generate Markdown/JSON report of last run

Key flags: --model, --backend, --dry-run, --no-rollback, --output

Development

Clone and install in editable mode with dev dependencies:

git clone https://github.com/dakshjain-1616/morph
cd morph
pip install -e ".[dev]"

Run the full test suite:

pytest tests/ -v

Lint and type-check:

ruff check morph/ && mypy morph/

Final Notes

Morph shifts refactoring from code generation to intent declaration. The LLM describes what to change in a structured, validated plan. The engine does the mechanical work. Tests confirm correctness. The result is refactoring that is auditable before it runs, verifiable after it runs, and automatically reversible if it breaks anything.

This project was built using NEO. NEO is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks including AI model evals, prompt optimization and end-to-end AI pipeline development.

The code is at https://github.com/dakshjain-1616/Morph
You can also build with NEO in your IDE using the VS Code extension or Cursor.
You can use NEO MCP with Claude Code: https://heyneo.com/claude-code
