DVC - Data Version Control
Basic Information
- Company/Brand: DVC (Data Version Control), originally part of Iterative.ai, acquired by lakeFS in 2025
- Founders: Dmitry Petrov, Ivan Shcheklein
- Country/Region: USA
- Official Website: https://dvc.org/
- GitHub: https://github.com/treeverse/dvc
- Type: Open-source data version control tool
- Founded: 2017 (DVC project); the Iterative.ai company was established in 2018
- Funding Status: Acquired by lakeFS in November 2025
Product Description
DVC is an open-source version control system designed for data science and machine learning projects, offering a Git-like experience to organize data, models, and experiments. DVC is a command-line tool and VS Code extension that helps develop reproducible machine learning projects. In November 2025, DVC was acquired by lakeFS and will continue as an independent open-source tool, focusing on data scientists handling smaller datasets.
Core Features/Characteristics
- Data Version Control: Git-like version management for data and models, storing version information in Git
- Cloud Storage Backend: Supports storing data in various cloud storage services (S3, GCS, Azure, etc.)
- Lightweight Pipelines: Define ML pipelines, running only steps affected by changes
- Experiment Management: Track and compare ML experiments
- Model Registry: Manage model lifecycle in an auditable manner
- Metrics and Plots: Capture pipeline metrics and visualizations
- Access Anywhere: Access versioned data from any environment
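The first feature above, keeping version information in Git while the data itself lives elsewhere, can be illustrated with a small content-addressed cache. This is a minimal sketch of the general idea only: the SHA-256 hash, the `.pointer.json` file format, and the function names `track`, `checkout`, and `demo` are assumptions made for illustration, not DVC's actual internals or file formats.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file, cache_dir):
    """Copy data_file into a content-addressed cache and write a pointer file.

    The small pointer file is what would be committed to Git; the cache
    (or a cloud bucket mirroring it) holds the actual bytes.
    """
    checksum = file_digest(data_file)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_file, cached)
    # Hypothetical pointer format, chosen for this sketch only.
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"hash": checksum, "path": data_file.name}))
    return pointer


def checkout(pointer, cache_dir):
    """Restore the data file that a pointer file describes from the cache."""
    meta = json.loads(pointer.read_text())
    checksum = meta["hash"]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / checksum[:2] / checksum[2:], target)
    return target


def demo():
    """Track two versions of a file, then restore the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"

        data.write_text("x,y\n1,2\n")
        pointer = track(data, cache)
        pointer_v1 = pointer.read_text()  # stands in for an old Git commit

        data.write_text("x,y\n1,2\n3,4\n")
        track(data, cache)

        # "Check out" the old pointer, then restore the matching data.
        pointer.write_text(pointer_v1)
        restored = checkout(pointer, cache)
        return restored.read_text()


if __name__ == "__main__":
    print(demo())
```

Because only the pointer file changes between versions, switching data versions reduces to switching which pointer Git has checked out and copying the matching bytes out of the cache.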
Business Model
- DVC (Open Source): Completely free
- DVC Studio (formerly Iterative Studio):
  - Previously offered free and paid tiers of a web interface
  - Integration direction after the lakeFS acquisition is pending
- lakeFS Integration: Enterprise-level data version control provided by lakeFS
Deployment Methods
- pip installation (`pip install dvc`)
- VS Code extension
- Integrated with Git repositories
- Supports all major operating systems (Linux, macOS, Windows)
Target Users
- Data scientists
- ML engineers
- Research teams requiring data version control
- ML teams using Git workflows
- Developers handling small to medium-sized datasets
Competitive Advantages
- Git-like user experience with a gentle learning curve for teams already using Git
- Fully open-source with an active community
- Seamless integration with existing Git workflows
- Supports multiple cloud storage backends
- Lightweight pipelines reduce unnecessary computation
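The last point, skipping steps whose inputs have not changed, can be sketched as a fingerprint check in front of each stage. This is an illustrative sketch under stated assumptions: the `pipeline.lock.json` file, the `run_stage` helper, and the `prepare` stage are hypothetical names invented here. DVC itself declares stages in `dvc.yaml`, records their state in `dvc.lock`, and reruns them with `dvc repro`.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(paths):
    """Hash the names and contents of all dependency files into one digest."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def run_stage(name, deps, action, lock_file):
    """Run action only if the stage's dependencies changed since the last run.

    Returns True if the stage ran, False if it was skipped.
    """
    lock = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    current = fingerprint(deps)
    if lock.get(name) == current:
        return False  # inputs unchanged: reuse the previous outputs
    action()
    lock[name] = current
    lock_file.write_text(json.dumps(lock, indent=2))
    return True


def demo():
    """Run a stage three times: fresh, unchanged input, changed input."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        raw = root / "raw.txt"
        clean = root / "clean.txt"
        lock = root / "pipeline.lock.json"  # hypothetical lock file
        raw.write_text("a\n\nb\n")

        def prepare():
            # Toy stage: drop empty lines from the raw file.
            lines = [line for line in raw.read_text().splitlines() if line]
            clean.write_text("\n".join(lines))

        results = [run_stage("prepare", [raw], prepare, lock)]
        results.append(run_stage("prepare", [raw], prepare, lock))
        raw.write_text("a\nb\nc\n")
        results.append(run_stage("prepare", [raw], prepare, lock))
        return results


if __name__ == "__main__":
    print(demo())
```

The sketch compares only input files. A fuller version would also fingerprint the stage's command and parameters, so that a code or configuration change triggers a rerun as well.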
Comparison with Competitors
| Dimension | DVC | lakeFS | Git LFS |
|---|---|---|---|
| Data Scale | Small to medium | Enterprise-level large scale | Small scale |
| Git Integration | Deep integration | Independent use | Native Git |
| ML Pipelines | Built-in | None | None |
| Experiment Tracking | Built-in | None | None |
| Cloud Storage | Multiple backends | Multiple backends | Git server |
Relationship with the OpenClaw Ecosystem
DVC provides data and model version control capabilities to the OpenClaw ecosystem. When developing and iterating AI agents, OpenClaw requires version management for training data, fine-tuning datasets, and model weights. DVC's Git integration allows unified management of data version control alongside code version control, enhancing the reproducibility of ML workflows. Although DVC was acquired by lakeFS, it remains an excellent choice for small to medium-sized projects as an open-source tool.