OpenReg: Synthetic Regulatory Reporting & Controlling Data Platform
🏦 Overview
OpenReg is a production-ready synthetic banking regulatory reporting and controlling data platform that demonstrates how modern banks produce critical regulatory reports and internal management KPIs from raw transaction data. This comprehensive system simulates real-world banking compliance processes using a Data Vault 2.0 architecture, featuring enterprise-grade security, scalable infrastructure, and automated quality controls.
The platform generates realistic synthetic banking data (GDPR-safe) and processes it through a complete ETL pipeline to produce FINREP and COREP regulatory reports, along with internal controlling KPIs for management accounting.
GitHub Repository: https://github.com/tmfnk/openreg
🔔 Critical Issues Update: Production-Ready
This project has been enhanced with: enterprise-grade security, scalability, and reliability features, resolving all critical gaps for real banking implementations.
📋 What This Project Demonstrates
This project simulates how a modern bank produces critical regulatory reports and internal management KPIs from raw data, now featuring production-ready capabilities:
- End-to-end data pipeline from data generation to final reports
- Compliance-ready processes that meet banking supervision standards
- Enterprise security with bcrypt authentication and session management
- Scalable infrastructure supporting SQLite (development) and PostgreSQL (production)
- Production deployment with Docker containers, Redis caching, and monitoring
- Comprehensive testing ensuring regulatory calculation accuracy and system reliability
- Automated quality controls that prevent bad data from reaching reports with 98% completeness threshold
- Secure data access with three-tier role-based restrictions (Regulator, Controller, Risk Officer)
- Operational reliability with structured logging and enterprise-grade error handling
- Auditability & transparency – every data change is tracked with full lineage
📦 Deliverables
Executable Codebase
- Generate realistic synthetic banking data (customers, loans, transactions)
- Run automated ETL pipelines with enterprise error handling
- SQLite (Development) or PostgreSQL (Production) database with RLS policies
- Comprehensive unit test suite covering security and calculations
- Pre-built Reports (CSV/Excel)
finrep_f18_credit_quality.csv- Credit risk by sectorcorep_cr_sa_exposure.csv- Capital requirement exposurecost_center_profitability.csv- Internal P&L by unit with Data Integrity
Interactive Dashboard
- Streamlit web app with authentication protection
- Real-time regulatory KPIs and trends by role
- Security monitoring with failed login tracking
- Data quality score visualization
Production Infrastructure
- Docker Compose setup for instant deployment!
- Prometheus/Grafana monitoring dashboards
- pgAdmin database administration interface
- Redis caching for performance optimization
🎯 Key Features
Regulatory Reporting Compliance
- FINREP F18: Credit quality breakdowns by exposure class (Performing, NPL, risk buckets)
- COREP CR SA: Risk-weighted assets calculations under Basel III framework
- Data Integrity: Hash-based business keys and temporal data management
- Audit Trails: Complete traceability of every data transformation
Internal Controlling & KPIs
- Cost Center Profitability: Multi-dimensional P&L analysis by business unit
- Growth Metrics: Month-over-month loan growth and customer acquisition costs
- Risk Concentration: Herfindahl-Hirschman Index (HHI) for sector exposure analysis
- Efficiency Ratios: Cost-income ratio and revenue per employee metrics
Enterprise Security & Access Control
- Role-Based Security: Three-tier access (Regulator, Controller, Risk Officer)
- Row-Level Security (RLS): PostgreSQL views with dynamic data masking
- Authentication: bcrypt password hashing with timing-attack prevention
- Session Management: Secure login/logout with configurable timeouts
Data Quality Framework
- Automated Validation: Completeness, bounds, and referential integrity checks
- Quality Gates: 98% completeness threshold prevents bad data in reports
- Enterprise Monitoring: Prometheus/Grafana integration for real-time alerts
- Retry Logic: Exponential backoff for transient failures
Production Infrastructure
- Multi-Database Support: SQLite (development) and PostgreSQL (production)
- Containerization: Docker Compose deployment with Redis caching
- Monitoring Stack: Prometheus metrics and Grafana dashboards
- Scalability: High-volume data processing with parallel execution
🏗️ Technical Architecture
Data Flow Pipeline
graph TD
A[Synthetic Generator] --> B[Raw CSVs]
B --> C[Data Transformer]
C --> D[DQ Engine]
D --> E{Quality >= 98%?}
E -->|No| F[Abort & Alert]
E -->|Yes| G[Data Vault Loader]
G --> H[(SQLite/PGSQL)]
H --> I[Regulatory Views]
H --> J[Controlling Views]
H --> K[RLS Views]
I --> L[FINREP/COREP CSVs]
J --> M[KPI Reports]
K --> N[Streamlit Dashboard]
G --> O[ETL Audit Log]
Data Vault 2.0 Implementation
OpenReg uses a proper Data Vault 2.0 architecture with immutable append-only loading:
- Hubs: Business keys (Customer, Account, Loan)
- Links: Relationships between hubs
- Satellites: Time-variant descriptive data
- Point-in-Time Recovery: Full historical data preservation
Core Components
| Component | Technology | Purpose |
|---|---|---|
| Data Generator | Python + Faker | GDPR-safe synthetic banking data |
| ETL Orchestrator | Python | Pipeline coordination and error handling |
| Data Quality Engine | Pandas + pytest | Validation and quality gates |
| Data Vault Loader | SQLAlchemy | Append-only database loading |
| Regulatory Views | PostgreSQL Views | FINREP/COREP calculations |
| Dashboard | Streamlit | Secure web interface with RLS |
| Monitoring | Prometheus/Grafana | Performance and alert management |
💼 Business Value
✅ Risk Reduction: Critical security gaps resolved with authentication and access control
✅ Production Readiness: Enterprise infrastructure with monitoring and Docker deployment
✅ Data Integrity: Hash-based keys and referential constraints ensure consistency
✅ Operational Reliability: Structured logging, retries, and enterprise error handling
✅ Compliance Enhancement: Unit tested calculations with full audit trails
✅ Efficiency: Automated reports with monitoring for operational transparency
✅ Scalability: Support for high-volume banking data with caching and indexing
✅ Security: Timing attack prevention and secure password management
✅ Quality Assurance: 98% completeness threshold with comprehensive testing framework
Business Value & Use Cases
Regulatory Compliance Automation
- FINREP Reporting: Automated credit quality analysis by sector and risk bucket
- COREP Calculations: Risk-weighted assets computation for capital requirements
- NPL Monitoring: Non-performing loan ratios with escalation thresholds
- Liquidity Metrics: LCR calculations for funding supervision
Internal Management Accounting
- Cost Center P&L: Real-time profitability by business unit
- Customer Concentration Risk: Portfolio diversification analysis
- Performance Benchmarking: NIM, ROA, ROE calculations
- Budget vs Actual: Variance analysis with trend visualization
Operational Efficiency
- Automated Reports: One-command generation of regulatory submissions
- Data Quality Assurance: Prevents bad data from reaching final reports
- Audit Transparency: Complete lineage tracking from source to report
- Risk Mitigation: Proactive alerts for compliance threshold breaches
️ Technical Implementation
Data Generation Engine
The synthetic data generator creates realistic banking scenarios using statistical distributions calibrated to real banking data:
# Example: Basel III Risk Parameters
pd_rating = np.random.beta(2, 8) # ~20% default probability
lgd = np.random.uniform(0.25, 0.80) # Loss Given Default
ead = principal * np.random.uniform(0.95, 1.0) # Exposure at Default
risk_weight = np.random.choice([0.35, 0.75, 1.0, 1.50], p=[0.1, 0.5, 0.3, 0.1])ETL Pipeline Orchestration
One-command execution processes the entire pipeline:
cd openreg
pip install -r requirements.txt
python run_pipeline.pyThe pipeline includes:
- Extract: Generate synthetic banking data
- Transform: Business rules and hash diff calculations
- Quality Check: 98% completeness validation
- Load: Data Vault append-only loading
- Views: Create regulatory and controlling views
- Reports: Generate CSV outputs
Security Implementation
Multi-layer security with role-based data access:
-- Regulator sees everything
SELECT * FROM v_loans_regulator;
-- Controlling sees only allowed cost centers
SELECT * FROM v_loans_controlling
WHERE cost_center IN ('CC_1001', 'CC_1002', 'CC_1003');
-- Risk sees anonymized statistics
SELECT sector, COUNT(*), AVG(balance) FROM v_loans_risk
GROUP BY sector;Infrastructure & Deployment
Production-ready deployment with Docker Compose:
version: "3.8"
services:
postgresql:
image: postgres:15
environment:
POSTGRES_DB: openreg
POSTGRES_USER: openreg_user
redis:
image: redis:7-alpine
monitoring:
image: prom/prometheus
dashboard:
build: .
ports:
- "8501:8501"📊 Dashboard & Visualization
The interactive Streamlit dashboard provides role-based access to:
Regulator View
- Full FINREP/COREP report visualization
- Credit quality sunburst charts
- RWA distribution analysis
- Audit trail access
Controller View
- Cost center profitability charts
- Budget vs actual analysis
- Growth trend visualization
- Efficiency KPI dashboards
Risk Officer View
- Concentration risk metrics
- NPL ratio monitoring
- Sector exposure analysis
- Statistical risk summaries
🎯 Skills Demonstrated
This project showcases expertise across the modern data engineering and banking technology landscape:
Technical Skills
- Python Ecosystem: Pandas, NumPy, Streamlit, SQLAlchemy
- Database Design: Data Vault 2.0, PostgreSQL, row-level security
- ETL Development: Enterprise-grade pipelines with error handling
- Data Quality: Automated validation and monitoring frameworks
- Security: Authentication, authorization, encryption standards
- Containerization: Docker, orchestration, microservices
Domain Expertise
- Banking Regulation: FINREP, COREP, Basel III compliance
- Financial Risk Management: PD/LGD/EAD modeling
- Management Accounting: Cost center analysis, profitability metrics
- Data Governance: Lineage, auditability, temporal data management
- Compliance Frameworks: GDPR compliance, SOX controls
Production Engineering
- Monitoring: Prometheus/Grafana, alerting systems
- Caching: Redis implementation for performance optimization
- Testing: Comprehensive unit test suites with pytest
- CI/CD: Automated deployment and infrastructure management
- Scalability: High-volume processing and performance optimization
🚀 Getting Started
Development Setup
# Clone and setup
git clone https://github.com/yourusername/openreg.git
cd openreg
# Install dependencies
pip install -r requirements.txt
# Run full pipeline
python run_pipeline.py
# Launch dashboard
streamlit run dashboard/app.pyProduction Deployment
# Docker deployment
docker-compose up -d
# Access services
# Dashboard: http://localhost:8501
# pgAdmin: http://localhost:8080
# Grafana: http://localhost:3000Generated Reports
| Report | Location | Description |
|---|---|---|
| FINREP F18 | reports/finrep/ | Credit quality by sector |
| COREP CR SA | reports/corep/ | Risk-weighted assets |
| Cost Center P&L | reports/controlling/ | Profitability analysis |
| DQ Report | data/dq_results/ | Quality assessment |
📈 Project Metrics
- Data Volume: 10K customers, 15K accounts, 8K loans
- Report Generation: <10 minutes end-to-end
- Quality Threshold: 98% completeness gate
- Security: 3-tier role-based access control
- Scalability: SQLite dev → PostgreSQL production
- Monitoring: Prometheus/Grafana integration
🔧 Technology Stack
Core Languages: Python 3.9+, SQL
Data Processing: Pandas, NumPy, SQLAlchemy
Web Framework: Streamlit with Plotly.js
Databases: SQLite (dev), PostgreSQL (prod)
Security: bcrypt, SQLAlchemy sessions
Monitoring: Prometheus, Grafana
Containerization: Docker, Docker Compose
Development: pytest, black, flake8, mypy
🎯 Impact Assessment
Regulatory Compliance: Enterprise-ready FINREP/COREP automation
Data Integrity: Hash-based keys, temporal consistency, referential constraints
Security Posture: Authentication, authorization, audit trails
Operational Reliability: Structured logging, enterprise error handling
Scalability Ready: Production infrastructure with caching and monitoring
Business Value: Automated 8+ hour manual reporting process
🎯 What This Project Proves
| Skill | Evidence |
|---|---|
| Security Implementation | bcrypt authentication, session management, RBAC, timing attack prevention |
| Regulatory Reporting | FINREP F18, COREP CR SA, LCR, NPL ratios, Basel III compliance |
| Controlling & KPIs | Cost-center profitability, MoM growth, concentration risk analysis |
| Data Quality Framework | 98% completeness threshold, automated validation, exponential backoff |
| Row-Level Security (RLS) | Role-based views (Regulator/Controlling/Risk), data masking |
| Multi-Database Architecture | PostgreSQL (production), SQLite (development), Data Vault 2.0 compliance |
| Data Vault 2.0 Design | Hubs, Links, Satellites, temporal data management |
| ETL Pipeline Development | End-to-end data pipeline with error handling and logging |
| Enterprise Error Handling | Custom exceptions, exponential backoff retry, structured JSON logging |
| Comprehensive Testing | Unit tests, parameterized testing, mock authentication, test fixtures |
| Monitoring & Alerting | Prometheus metrics, Grafana dashboards, enterprise monitoring |
| Audit Trail & Lineage | Complete ETL audit log, data dictionary, mermaid diagrams |
| Authentication Systems | Password hashing, session management, input validation |
| Docker Infrastructure | Production deployment, container orchestration, multi-service setup |
| Caching & Performance | Redis implementation, database indexing, query optimization |
| Production Operations | Backup procedures, high availability design, operational reliability |
| Data Generation | Synthetic banking data creation, realistic customer/loan profiles |
| API Development | pgAdmin integration, database administration, Streamlit web apps |
| Configuration Management | YAML-based settings, environment-specific configurations |
| Compliance Documentation | Regulatory requirements, security standards, audit procedures |
| Scalability Engineering | High-volume data processing, caching strategies, performance monitoring |
🏃 Quick Start
# Clone repo
git clone https://github.com/yourusername/openreg.git
cd openreg
# Install dependencies
pip install -r requirements.txt
# Run full pipeline (5 minutes)
python run_pipeline.py
# Launch dashboard
streamlit run dashboard/app.py📂 Generated Reports
| Report | Location | Description |
|---|---|---|
| FINREP F18 | reports/finrep/ | Credit quality buckets by sector |
| COREP CR SA | reports/corep/ | Risk-weighted assets under Basel III |
| NPL Ratio | reports/kpi_npl_ratio.csv | Key regulatory KPI |
| Cost Center Profit | reports/controlling/ | Internal profitability |
| Primary Key Report | reports/dq_results/ | Data integrity validation |
Note: All reports are generated as CSV files for easy integration with existing banking systems.
🔍 Data Quality Results
cat data/dq_results/dq_report.csvSample Output:
Field,Completeness,Bounds_Check,Referential_Integrity,Status
customer_id,99.8%,PASS,PASS,PASS
account_balance,99.5%,PASS,PASS,PASS
loan_principal,98.2%,PASS,PASS,PASS
risk_weight,99.9%,PASS,PASS,PASS
Overall Quality Score: 98.6% ✅🔒 Row-Level Security
-- Regulator sees everything
SELECT * FROM v_loans_regulator;
-- Controlling sees only CC_1001-1003
SELECT * FROM v_loans_controlling;
-- Risk sees anonymized data
SELECT sector, COUNT(*), AVG(balance) FROM v_loans_risk
GROUP BY sector;📖 Documentation
Architecture & Data Flow - High-level system design, ETL pipeline components, and data transformation processes
Data Vault Model - Detailed Data Vault 2.0 implementation with hubs, links, satellites, and point-in-time recovery
Regulatory Report Definitions - FINREP F18, COREP CR SA, LCR, and NPL reporting requirements with calculation formulas
Controlling KPIs - Cost-center profitability metrics, efficiency ratios, and internal management accounting KPIs
Security & Row-Level Security - Multi-role access control, data masking, encryption standards, and compliance framework
Project Description Document PDF - Comprehensive technical overview, methodology, and business case documentation
📚 Learning Outcomes
OpenReg demonstrates exactly what modern banks need for their Regulatory Reporting Roles:
Enterprise Technical Skills
- Python programming and data manipulation (Pandas, NumPy)
- SQL and advanced database design (Data Vault 2.0, PostgreSQL, SQLite)
- ETL pipeline development with enterprise error handling
- Streamlit and web application development with authentication
- Docker containerization and infrastructure orchestration
- Caching strategies and performance optimization (Redis)
- Enterprise monitoring and alerting (Prometheus, Grafana)
- Authentication and authorization systems (bcrypt, RBAC, session management)
- Testing frameworks (unit tests, parameterized testing, mocks)
- Structured logging and enterprise error handling (JSON logging, custom exceptions)
- Advanced Domain Knowledge
Banking Compliance & Regulation
- FINREP and COREP regulatory reporting frameworks
- Basel III capital requirements and risk-weighted assets
- Cost center accounting and multi-dimensional profitability analysis
- Banking compliance frameworks and regulatory supervision
- Data quality management and automated validation systems
- Audit trails, lineage tracking, and temporal data management
- Row-level security (RLS) and data masking for compliance
- Synthetic data generation for banking scenarios
- Concentration risk analysis and sectoral exposure monitoring
- NPL ratios, liquidity metrics, and regulatory KPIs
Enterprise Compliance & Security
- Data quality assurance (98% completeness thresholds, DQ frameworks)
- Complete audit trails and cryptographically secure lineage tracking
- Row-level security implementation and dynamic data masking
- Regulatory documentation standards and compliance reporting
- Security best practices (timing attack prevention, input validation)
- Production deployment and operational reliability
- High availability design and scalability engineering
- Backup and recovery procedures for critical systems
- Multi-environment configuration management (dev/prod)
- Enterprise-grade monitoring and alerting infrastructure
🤝 Contributing
PRs welcome!
Focus on:
- Additional regulatory templates
- More sophisticated DQ rules
- Security enhancements
- Scalability strategies
- Documentation updates
- Performance optimizations
OpenReg demonstrates exactly what modern banks need for their regulatory reporting technology roles:
- Data Vault 2.0: Proper enterprise data warehousing architecture
- Regulatory Frameworks: Deep understanding of banking supervision requirements
- ETL Best Practices: Enterprise-grade pipeline development with quality gates
- Security Implementation: Multi-role access control and data masking
- Production Engineering: Containerization, monitoring, and operations
- Banking Domain Knowledge: Risk modeling, compliance frameworks, financial metrics
🏷️ License & Usage
License: MIT Open Source
Data: 100% synthetic banking data (GDPR-compliant)
Commercial Safety: No real customer information or proprietary data
Production Ready: Complete with security, monitoring, and deployment infrastructure