DAG Patch Analyzer - Prototype Application

Application Overview

A web application that ingests Git commits and auto-regresses them into DAG-based patch representations, with an interactive interface for exploring the factorization process and resulting operation graph.

Core Functionality

Input

Processing Pipeline

  1. Clone/Fetch - Retrieve repository and commit data
  2. State Extraction - Generate before/after tar archives
  3. Auto-Regression - Factor changes into operation DAG
  4. Analysis - Compute metrics, detect patterns
  5. Storage - Persist results for exploration

Output

Architecture

Backend Services

Git Ingestion Service

Input: repo_url, commit_id
Output: {before_state, after_state, metadata}

- Clone repository (shallow clone with depth 2, so the parent commit is available)
- Check out commit and parent
- Generate tar archives of both states
- Extract commit metadata (author, message, timestamp, files_changed)
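
The steps above can be sketched with the git CLI driven from Python's subprocess module. This is an illustration under stated assumptions, not the service's actual implementation: the function names `ingestion_commands` and `ingest` and the archive file names are hypothetical, `commit_id` is assumed to be a full hash, and the remote is assumed to allow fetching a commit by hash.

```python
import subprocess
from pathlib import Path


def ingestion_commands(repo_url, commit_id, workdir):
    """Return the git invocations for one ingestion run, in order."""
    root = Path(workdir).resolve()
    git = ["git", "-C", str(root)]
    return [
        ["git", "init", "--quiet", str(root)],
        git + ["remote", "add", "origin", repo_url],
        # Depth 2 brings in the commit and its parent; depth 1 would omit the parent.
        git + ["fetch", "--quiet", "--depth", "2", "origin", commit_id],
        # Tar archives of the tree after and before the commit (first parent).
        git + ["archive", "--format=tar", "-o", str(root / "after.tar"), commit_id],
        git + ["archive", "--format=tar", "-o", str(root / "before.tar"), f"{commit_id}^"],
    ]


def ingest(repo_url, commit_id, workdir):
    """Run the commands, then collect commit metadata (hypothetical output shape)."""
    for cmd in ingestion_commands(repo_url, commit_id, workdir):
        subprocess.run(cmd, check=True)
    root = Path(workdir).resolve()
    git = ["git", "-C", str(root)]

    def out(*args):
        done = subprocess.run(git + list(args), check=True,
                              capture_output=True, text=True)
        return done.stdout

    # %an = author name, %s = subject line, %aI = strict ISO 8601 author date.
    raw = out("log", "-1", "--format=%an%x00%s%x00%aI", commit_id)
    author, message, timestamp = raw.rstrip("\n").split("\x00")
    files = out("diff-tree", "--no-commit-id", "--name-only", "-r", commit_id)
    return {
        "before_state": str(root / "before.tar"),
        "after_state": str(root / "after.tar"),
        "metadata": {"author": author, "message": message,
                     "timestamp": timestamp, "files_changed": files.splitlines()},
    }
```

A root commit has no parent, so the `before.tar` step would fail there; a real service would need to treat that case as an empty before state.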

Factorization Engine

Input: {before_state, after_state}
Output: operation_dag

Algorithm phases:
1. Whitespace isolation
2. Rename/move detection (file and symbol level)
3. Copy detection (code duplication)
4. Regex pattern matching (common replacements)
5. Reorder detection (block movements)
6. Binary delta generation (residual)
7. Dependency graph construction
8. Description length optimization
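
To make the first two phases concrete, here is a minimal sketch that works on in-memory snapshots (a dict from path to file bytes) rather than tar archives. The operation record shape and the names `normalize_ws` and `factor` are hypothetical. Rename detection here uses exact content hashes only, which is a simplification; a real engine would presumably also match similar, not just identical, content.

```python
import hashlib


def normalize_ws(data):
    """Collapse every whitespace run so only whitespace-separated tokens are compared."""
    return b" ".join(data.split())


def factor(before, after):
    """Factor a snapshot pair into a flat list of operations (phases 1 and 2 only)."""
    ops = []
    removed = {p for p in before if p not in after}
    added = {p for p in after if p not in before}

    # Phase 1: whitespace isolation for files present on both sides.
    for path in sorted(before.keys() & after.keys()):
        if before[path] == after[path]:
            continue
        if normalize_ws(before[path]) == normalize_ws(after[path]):
            ops.append({"op": "whitespace", "path": path})
        else:
            # Left for the later phases (copy, regex, reorder, binary delta).
            ops.append({"op": "residual", "path": path})

    # Phase 2: file-level rename detection by exact content hash.
    by_hash = {}
    for path in sorted(removed):
        digest = hashlib.sha256(before[path]).hexdigest()
        by_hash.setdefault(digest, []).append(path)
    for path in sorted(added):
        sources = by_hash.get(hashlib.sha256(after[path]).hexdigest())
        if sources:
            src = sources.pop(0)
            removed.discard(src)
            ops.append({"op": "rename", "from": src, "to": path})
        else:
            ops.append({"op": "add", "path": path})
    ops.extend({"op": "delete", "path": p} for p in sorted(removed))
    return ops
```

Because the normalization is token-based, a change such as `x=1` to `x = 1` is not classified as whitespace-only here; handling that would need a language-aware tokenizer.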

Analysis Service

Input: operation_dag
Output: metrics, patterns, insights

Computes:
- Compression ratio (size of the encoded operations vs raw diff size)
- Operation type distribution
- Dependency depth and parallelism potential
- Pattern matching (known refactoring types)
- Semantic coherence scores
- Change impact radius
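
Three of these metrics can be computed directly from the DAG, as the sketch below shows. The input shape (op id mapped to a type, an encoded size, and a list of dependency ids) is an assumption, as are the exact definitions: compression ratio is taken as raw diff bytes divided by total encoded operation bytes, and parallelism potential as operations per level of dependency depth.

```python
from collections import Counter
from graphlib import TopologicalSorter


def dag_metrics(ops, raw_diff_bytes):
    """ops maps op id -> {"type": str, "size": int, "deps": [op ids]}."""
    graph = {op_id: set(op["deps"]) for op_id, op in ops.items()}
    depth = {}
    # static_order() yields every node after all of its dependencies,
    # so each dependency's depth is known by the time it is needed.
    for op_id in TopologicalSorter(graph).static_order():
        depth[op_id] = 1 + max((depth[d] for d in graph[op_id]), default=0)

    encoded = sum(op["size"] for op in ops.values())
    max_depth = max(depth.values(), default=0)
    return {
        "compression_ratio": raw_diff_bytes / encoded if encoded else None,
        "type_distribution": dict(Counter(op["type"] for op in ops.values())),
        "dependency_depth": max_depth,
        "parallelism": len(ops) / max_depth if max_depth else 0.0,
    }
```

`TopologicalSorter` raises `graphlib.CycleError` if the operation graph contains a cycle, which doubles as a cheap validity check on the factorization output.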

Storage Layer

- PostgreSQL for metadata and analysis results
- Object storage (S3/local) for tar archives
- Graph database (Neo4j optional) for DAG queries
- Content-addressed blob store for operations
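
A content-addressed blob store can be quite small. The sketch below is a local-filesystem version, assuming SHA-256 as the address and a two-character fan-out directory; the class name `BlobStore` and its layout are hypothetical and not taken from the design above.

```python
import hashlib
from pathlib import Path


class BlobStore:
    """Stores each blob under the hex SHA-256 of its content."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, digest):
        # Fan out by the first two hex characters to keep directories small.
        return self.root / digest[:2] / digest[2:]

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():  # identical content is stored only once
            path.parent.mkdir(parents=True, exist_ok=True)
            tmp = path.with_suffix(".tmp")
            tmp.write_bytes(data)
            tmp.replace(path)  # rename into place so readers never see a partial blob
        return digest

    def get(self, digest):
        data = self._path(digest).read_bytes()
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"blob {digest} is corrupt")
        return data
```

Since the key is derived from the content, identical operations found in different commits deduplicate automatically, and every read can be verified against its own address.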

Frontend Interface

Main Views

1. Commit Input View

2. Processing View

3. Graph Explorer View

Primary interface for exploring results:

Left Panel: DAG Visualization

Right Panel: Operation Details

Bottom Panel: Global Metrics

4. Comparison View

5. Export View

Technical Stack

Backend

Frontend

Infrastructure

Key Features

Factorization Visualization

Step-by-step Replay

Alternative Factorizations

Interactive Analysis

Operation Filtering

Dependency Exploration

Code Context

Pattern Recognition

Auto-Detection

Pattern Library

Metrics Dashboard

Compression Analysis

Complexity Metrics

Semantic Analysis

User Workflows

Workflow 1: Analyze Single Commit

  1. Enter repo URL + commit hash
  2. Wait for processing (show progress)
  3. Explore DAG in graph view
  4. Click operations to see details
  5. Export results or save analysis

Workflow 2: Compare Commits

  1. Load multiple commits
  2. View side-by-side DAGs
  3. Identify common operation patterns
  4. See metric comparisons
  5. Export pattern library

Workflow 3: Debug Factorization

  1. Load commit with unexpected results
  2. Step through factorization phases
  3. View alternative decompositions
  4. Adjust algorithm parameters (advanced mode)
  5. Re-run with new settings

Workflow 4: Pattern Mining

  1. Analyze repository history (bulk mode)
  2. Build pattern library automatically
  3. Browse discovered patterns
  4. Search for pattern occurrences
  5. Export reusable transform templates

API Endpoints

REST API

POST /api/analyze
  body: {repo_url, commit_id, options}
  returns: {job_id}

GET /api/jobs/{job_id}
  returns: {status, progress, result_id}

GET /api/results/{result_id}
  returns: {dag, metrics, metadata}

GET /api/results/{result_id}/graph
  returns: graph data in various formats

POST /api/compare
  body: {result_ids: [...]}
  returns: comparison analysis

GET /api/patterns
  returns: saved pattern library

POST /api/patterns
  body: {pattern_data}
  saves pattern for reuse
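
The analyze, jobs and results endpoints imply an asynchronous job lifecycle: submit, poll, then fetch the result. The framework-free sketch below models that flow in memory. The class and method names, the `complete` hook called by a worker, and the status values `queued` and `done` are all assumptions; the request and response fields follow the listing above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    repo_url: str
    commit_id: str
    options: dict = field(default_factory=dict)
    status: str = "queued"
    progress: float = 0.0
    result_id: Optional[str] = None


class AnalysisApi:
    """In-memory stand-in for the analyze, jobs and results endpoints."""

    def __init__(self):
        self.jobs = {}
        self.results = {}

    def post_analyze(self, body):
        # POST /api/analyze
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = Job(body["repo_url"], body["commit_id"],
                                body.get("options", {}))
        return {"job_id": job_id}

    def get_job(self, job_id):
        # GET /api/jobs/{job_id}
        job = self.jobs[job_id]
        return {"status": job.status, "progress": job.progress,
                "result_id": job.result_id}

    def complete(self, job_id, dag, metrics, metadata):
        # Hypothetical worker hook, called once the pipeline has finished.
        result_id = uuid.uuid4().hex
        self.results[result_id] = {"dag": dag, "metrics": metrics,
                                   "metadata": metadata}
        job = self.jobs[job_id]
        job.status, job.progress, job.result_id = "done", 1.0, result_id

    def get_result(self, result_id):
        # GET /api/results/{result_id}
        return self.results[result_id]
```

Keeping `result_id` separate from `job_id` lets a result outlive the job record, which suits the comparison endpoint since it takes result ids only.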

WebSocket

WS /api/analyze/stream
  - Real-time factorization progress
  - Live metrics updates
  - Step-by-step operation discovery
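
One possible shape for the streamed messages is a JSON text frame per completed factorization phase, carrying progress, the operations found in that phase, and a running metric. The message fields and the `phase_complete` event name below are assumptions; the phase identifiers are derived from the algorithm phases listed under Factorization Engine.

```python
import json

PHASES = [
    "whitespace_isolation",
    "rename_move_detection",
    "copy_detection",
    "regex_pattern_matching",
    "reorder_detection",
    "binary_delta_generation",
    "dependency_graph_construction",
    "description_length_optimization",
]


def progress_events(job_id, discovered):
    """Yield one JSON frame per phase; `discovered` maps phase -> list of op ids."""
    total_ops = 0
    for index, phase in enumerate(PHASES, start=1):
        ops = discovered.get(phase, [])
        total_ops += len(ops)
        yield json.dumps({
            "type": "phase_complete",
            "job_id": job_id,
            "phase": phase,
            "progress": index / len(PHASES),           # real-time progress
            "operations": ops,                         # step-by-step discovery
            "metrics": {"operation_count": total_ops}, # live metrics update
        })
```

Each frame is self-describing, so a client that connects midway can still render the current progress from the latest message it receives.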

MVP Scope (Phase 1)

Core Functionality

Deferred Features (Phase 2+)

Implementation Phases

Phase 1: Proof of Concept (2-3 weeks)

Phase 2: Enhanced Factorization (2-3 weeks)

Phase 3: Polish & Features (2-3 weeks)

Phase 4: Production Ready (ongoing)

Success Metrics

Technical

User Experience

Example Use Cases

Large Refactoring Analysis

Pattern Learning

Merge Conflict Prevention

Open Questions