Paperless-ngx - Self-Hosted Document Management
Basic Information
- Name: Paperless-ngx
- GitHub: https://github.com/paperless-ngx/paperless-ngx
- Official Website: https://docs.paperless-ngx.com/
- Type: Open-source self-hosted document management system
- License: GPL v3
- Predecessors: Paperless → Paperless-ng → Paperless-ngx
Product Description
Paperless-ngx is a community-maintained document management system that turns scanned paper documents into a searchable online archive. It combines OCR, machine learning-based automatic classification, and a tagging system so that users can run a genuinely "paperless" office. The project is the official successor to the original Paperless and to Paperless-ng.
Core Features/Characteristics
- OCR Document Recognition: Uses the open-source Tesseract engine, supporting 100+ languages
- Machine Learning Auto-Classification: Automatically adds tags, correspondents, and document types
- Drag-and-Drop Upload: Full application-wide drag-and-drop document upload
- Email Processing: Automatically imports documents from email accounts, supports multiple accounts and rule configuration
- Full-Text Search: Full-text search across all document content
- Multi-Core Parallel Processing: Utilizes multi-core CPUs for parallel document processing
- Custom Views: Saveable custom views displayed on the dashboard and sidebar
- LLM Integration: Supports local LLMs for assisted document management and classification
- Mobile-Friendly: Responsive web interface
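Besides drag-and-drop and email import, documents can also be pushed in over the REST API. The following is a minimal sketch, not the project's own client: it assumes token authentication and the `post_document` endpoint with a `document` form field as described in the Paperless-ngx API documentation. The helper name `build_upload_request`, the host, and the token are hypothetical. The function only builds the request, so it can be inspected without a running server.

```python
import uuid
import urllib.request


def build_upload_request(base_url, token, filename, content, title=None):
    """Build (but do not send) a multipart upload request.

    Assumption: the server accepts POST /api/documents/post_document/
    with a "document" file field and an optional "title" field.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if title is not None:
        # Optional plain-text form field.
        parts.append(
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="title"\r\n\r\n'
            f"{title}\r\n".encode("utf-8")
        )
    # The file itself, sent as raw bytes.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/documents/post_document/",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the file for consumption. OCR and classification then run on the server side.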
Technical Architecture
- Backend: Python/Django
- Database: PostgreSQL, MariaDB, or SQLite
- OCR Engine: Tesseract
- Search Engine: Whoosh (embedded full-text index)
- Deployment: Docker/Docker Compose
- Message Queue: Redis (broker for background tasks)
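The components above are wired together through environment variables in a typical Docker Compose deployment. The sketch below shows how the database choice could be derived from them. It assumes the documented variables `PAPERLESS_DBHOST`, `PAPERLESS_DBENGINE`, `PAPERLESS_REDIS`, and `PAPERLESS_OCR_LANGUAGE`; the fallback values are assumptions to check against the official configuration reference. The function `describe_stack` is hypothetical and is not part of Paperless-ngx.

```python
import os


def describe_stack(env=None):
    """Summarize which backing services a given environment would select.

    Assumptions: SQLite is used when PAPERLESS_DBHOST is unset, and
    PAPERLESS_DBENGINE falls back to "postgresql" when a host is given.
    """
    env = os.environ if env is None else env
    if not env.get("PAPERLESS_DBHOST"):
        database = "sqlite"
    else:
        database = env.get("PAPERLESS_DBENGINE", "postgresql")
    return {
        "database": database,
        # Redis connection URL used for background task processing.
        "redis": env.get("PAPERLESS_REDIS", "redis://localhost:6379"),
        # Tesseract language code(s) used for OCR.
        "ocr_language": env.get("PAPERLESS_OCR_LANGUAGE", "eng"),
    }
```

Calling `describe_stack({"PAPERLESS_DBHOST": "db"})` reports PostgreSQL as the database, while an empty environment falls back to SQLite.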
Business Model
Completely free and open-source (GPL v3). The project is maintained by community volunteers and accepts donations via GitHub Sponsors.
Target Users
- Individuals and families needing to digitize paper documents
- Small businesses and freelancers for document management
- Self-hosting enthusiasts
- Organizations requiring compliant document archiving
Competitive Advantages
- One of the most widely adopted self-hosted solutions focused specifically on document management
- OCR + ML auto-classification significantly reduces manual sorting work
- Active community maintenance and frequent updates
- Supports integration with local LLMs for AI-assisted document understanding
- Rich ecosystem of third-party integrations and tools
Community Ecosystem
- GitHub Discussions: Community discussions and feature requests
- Matrix Chat Room: Real-time communication
- A broad set of related projects (mobile apps, CLI tools, integrations, etc.)
- Multiple teams responsible for different aspects (frontend, CI/CD, etc.)
Relationship with OpenClaw
OpenClaw can integrate with Paperless-ngx so that AI agents can process documents automatically: scanning, classifying, searching, and extracting information. For example, a user can instruct OpenClaw via a messaging platform to search Paperless-ngx for specific invoices or contracts.
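As an illustration of the kind of call such an agent might make, here is a minimal sketch of a full-text search against the Paperless-ngx REST API. It is not OpenClaw's actual integration. It assumes token authentication and the `query` and `page_size` parameters on the `/api/documents/` endpoint; the function names, host, and token are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, token, query, page_size=5):
    """Build a GET request for a full-text document search."""
    params = urllib.parse.urlencode({"query": query, "page_size": page_size})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/documents/?{params}",
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def summarize_results(payload):
    """Turn a paginated API response into short lines for a chat reply.

    Assumption: the response carries matching documents under "results",
    each with at least "id" and "title".
    """
    return [f"#{doc['id']}: {doc['title']}" for doc in payload.get("results", [])]


def search_documents(base_url, token, query):
    """Send the search and return the summarized matches (needs a live server)."""
    with urllib.request.urlopen(build_search_request(base_url, token, query)) as resp:
        return summarize_results(json.load(resp))
```

An agent would call `search_documents` with the user's phrase, for example "invoice 2023", and relay the returned lines back through the messaging platform.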