Advent Automation 2025

Day 10: Family Office Asset Management Data Warehouse

One-line pitch: Kimball star schema for tracking multi-jurisdictional family office portfolios with SCD Type 2 for compliance and audit trails.

Part of: Advent Automation 2025 - 25 Days of Data Engineering


⚠️ IMPORTANT DISCLAIMER

This is an educational portfolio project using 100% synthetic data.


Quick Access (By Role)

For Start Here Read Time
Recruiters Executive SummaryKey Takeaways 2 min
Business Stakeholders Executive SummaryRecommendations 5 min
Technical Reviewers Executive SummaryTechnical Deep Dive 10 min
Implementation Quick StartAdaptation Guide 15 min

Executive Summary

Business Problem: Family offices managing multi-jurisdictional UHNW (Ultra-High-Net-Worth) portfolios need consolidated views across diverse asset types (financial instruments, operating companies, real estate, IP holdings) while maintaining historical compliance records for cross-border tax and estate planning.

Solution Delivered: Kimball star schema data warehouse with SCD Type 2 implementation, tracking 5 fictional families with 100+ assets across multiple jurisdictions, enabling portfolio analysis and historical audit trails.

Business Impact: Enables “as-of” date analysis for regulatory compliance (e.g., “What was the portfolio composition on 2023-06-30?”) and consolidated cross-asset reporting for wealth planning decisions.

For: Rafael (Cross-Border Wealth Planning Specialist) Industry: Wealth Management/Finance Time: 3 hours Status: ✅ Complete

🔗 Connection to Day 16: One family in this portfolio owns a European manufacturing company. The 30 operational assets (equipment, IP, certifications) from this company will be consumed by Luna’s compliance dashboard in Day 16.


Key Takeaways

Business Value

Technical Achievement

Critical Learning

Dimensional modeling enables departmental analytics from enterprise data: By including operational assets (equipment, IP, certifications) in the family office data warehouse, we maintain a consolidated wealth view while preparing conformed dimensions for downstream departmental use cases (e.g., compliance dashboards).


Business Context

The Challenge

Through conversations with Rafael about cross-border wealth planning, a common pattern emerged: family offices managing complex portfolios need to answer “What did the portfolio look like on [specific date]?” for tax filings, estate planning, and regulatory compliance across multiple jurisdictions. Traditional point-in-time snapshots are insufficient when asset classifications change (e.g., active equipment → under maintenance → disposed).

Why This Matters:

Success Criteria

From Stakeholder Perspective:

  1. Can answer “Total portfolio value for MFG Owner Family?” in <5 seconds
  2. Can reconstruct historical asset classifications for compliance (SCD Type 2 working)
  3. Can filter ONLY operational assets (Equipment, IP, Certification) for departmental analysis

Technical Validation:


Solution Overview

What It Does

Capability Business Outcome
Portfolio Value Reporting Calculates total portfolio value per family across all asset classes
Asset Allocation Analysis Shows distribution across Equity, Equipment, IP, Certification asset classes over time
Historical Compliance Tracking Reconstructs asset classifications as of any historical date (SCD Type 2)
Departmental Filtering Isolates MFG operational assets for downstream compliance dashboards (Day 16)

Architecture at a Glance

[INPUT] → [TRANSFORMATION] → [OUTPUT]

Synthetic Family Office Data → Kimball Star Schema (SQLite) → Analytical Views
         ↓                            ↓                              ↓
   5 families              SCD Type 2 on dim_assets         4 analytical queries
   100 assets               Surrogate keys                  Ready for BI tools
   24 months                Conformed dimensions

Key Results & Insights

Business Metrics (Synthetic Data)

Metric Finding Implication
Total Assets Tracked 100 assets across 5 families Demonstrates scalability for multi-family office firms
MFG Operational Assets 30 clearly identified (Equipment, IP, Cert) Enables departmental compliance analysis (Day 16)
Historical Tracking 2+ SCD Type 2 examples (asset reclassifications) Audit trail for regulatory “as-of” date queries

Analytical Capabilities Demonstrated


Risks & Limitations

Current Limitations

Limitation Impact Mitigation Path
Synthetic data only Cannot validate real portfolio patterns Pilot with 3-6 months real data before production rollout
No multi-currency handling All values in single currency Add FX dimension table and conversion logic for production
No real-time updates Snapshot-based (daily refresh) Implement incremental refresh for near-real-time needs
No tax jurisdiction rules Asset tracking only, not tax logic Partner with tax/legal for jurisdiction-specific business rules

Assumptions Made

  1. Asset classifications are semi-stable - Changes occur infrequently enough for SCD Type 2 (not streaming updates)
  2. Consolidated view is valuable - Including operational assets in family office DW provides strategic wealth view
  3. Conformed dimensions are sufficient - dim_date and dim_assets can serve both enterprise and departmental analytics

Recommendations

For Rafael

Immediate Next Steps (Week 1):

  1. Review MFG asset examples - Validate that asset types (Equipment, IP, Certification) match real-world classifications
  2. Test SCD Type 2 logic - Confirm historical tracking meets compliance reporting needs

Short-Term (Month 1):

Production Readiness:

For Portfolio/Technical Evolution

Reusability:

Scale Considerations:


How to Use This Project

Quick Start (5 minutes)

# 1. Navigate
cd advent-automation-2025/day10

# 2. Install
pip install -r day10_requirements.txt

# 3. Generate synthetic data
python day10_DATA_synthetic_generator.py

# 4. Run models
sqlite3 data/day10_family_office_dw.db < models/day10_MODEL_star_schema.sql
sqlite3 data/day10_family_office_dw.db < models/day10_MODEL_dim_assets_scd2.sql
sqlite3 data/day10_family_office_dw.db < models/day10_MODEL_fact_holdings.sql

# 5. Run analytical queries
sqlite3 data/day10_family_office_dw.db < queries/day10_QUERY_portfolio_value.sql
sqlite3 data/day10_family_office_dw.db < queries/day10_QUERY_mfg_assets_filter.sql

Expected Runtime: ~2 minutes Expected Output: day10_family_office_dw.db with 5 tables, 4 analytical query results

Adapting for Real Data

Priority Changes (Do These First):

  1. Replace synthetic generator - Connect to real custody/ERP systems in day10_DATA_extract_real.py
  2. Adjust asset classifications - Map your asset types to asset_class values (Equity, Equipment, IP, Certification)
  3. Configure SCD logic - Define what triggers valid_from/valid_to changes (disposition, regulatory updates, etc.)

Schema Mapping: | Your Data | This Project | Transform Needed | |———–|————–|——————| | asset_master.instrument_id | dim_assets.asset_id | Direct mapping | | asset_master.instrument_type | dim_assets.asset_class | Map to controlled vocab: Equity, Equipment, IP, Certification | | positions.market_val | fct_holdings.market_value | Cast to DECIMAL(18,2) | | positions.as_of_date | fct_holdings.date_key | Join to dim_date on full_date |

Business Logic Adjustments:

-- Example: Adjust asset classification logic
-- Current: Hardcoded asset classes ('Equity', 'Equipment', 'IP', 'Certification')
-- Change in: models/day10_MODEL_dim_assets_scd2.sql, line 15

-- Map your asset types to standardized classes:
CASE
    WHEN your_asset_type IN ('Stock', 'Bond', 'Fund') THEN 'Equity'
    WHEN your_asset_type = 'Machinery' THEN 'Equipment'
    WHEN your_asset_type IN ('Patent', 'Trademark') THEN 'IP'
    WHEN your_asset_type = 'Regulatory' THEN 'Certification'
END as asset_class

Full adaptation guide: [See “Detailed Adaptation” section below]


Technical Deep Dive

📋 Full Technical Documentation (Click to Expand) ### Technical Stack **Core:** - **Language:** Python 3.11+ - **Database:** SQLite 3.x (production: PostgreSQL or BigQuery) - **Modeling Tool:** Plain SQL (production: dbt Core 1.7+) **Dependencies:** ``` pandas==2.1.4 # Synthetic data generation faker==20.1.0 # Realistic fake data sqlite3 (built-in) # Database ``` ### Data Model **Star Schema:** ``` fct_holdings (Fact Table) ├── holding_key - PK (surrogate key) ├── client_key - FK to dim_clients ├── asset_key - FK to dim_assets (SCD Type 2) ├── account_key - FK to dim_accounts ├── date_key - FK to dim_date ├── quantity - Decimal (shares/units) ├── market_value - Decimal (current valuation) └── cost_basis - Decimal (original cost) dim_clients (Dimension) ├── client_key - PK (surrogate) ├── client_id - Natural key ├── client_name - "Smith Family", "MFG Owner Family", etc. └── client_type - "Family Office" dim_assets (SCD Type 2 Dimension) ├── asset_key - PK (surrogate, changes with each version) ├── asset_id - Natural key (EQ_MFG_001, IP_MFG_002, etc.) ├── asset_name - "CNC Machine Haas VF-2", "Patent EP12345", etc. ├── asset_class - "Equipment", "IP", "Certification", "Equity" ├── asset_type - Subtype within class ├── valid_from - Start date for this version ├── valid_to - End date (NULL if current) └── is_current - Boolean (TRUE for latest version) dim_accounts (Dimension) ├── account_key - PK (surrogate) ├── account_id - Natural key ├── account_name - "MFG Company Operating Account", etc. ├── account_type - "Operating", "Investment", etc. └── parent_client_key - FK to dim_clients dim_date (Conformed Dimension) ├── date_key - PK (YYYYMMDD integer) ├── full_date - DATE ├── year - Integer ├── quarter - Integer (1-4) ├── month - Integer (1-12) ├── fiscal_quarter - Integer (configurable fiscal year) └── fiscal_year - Integer ``` **Relationships:** ``` dim_clients ─(1:N)→ dim_accounts ─(1:N)→ fct_holdings ↓ (N:1) dim_assets (SCD Type 2) ↓ (N:1) dim_date ``` ### Architectural Decisions #### Decision 1: Kimball Star Schema vs Data Vault **Context:** Family offices need both analytical reporting (portfolio summaries) and detailed audit trails (historical changes). Data Vault excels at audit but requires complex joins for analytics. **Options Evaluated:** | Option | Pros | Cons | Decision | |--------|------|------|----------| | **Data Vault** | Best audit trail, highly normalized, flexible | Complex queries, slower analytics, overkill for 100 assets | ❌ Rejected | | **Kimball Star Schema** | Fast analytics, simple queries, business-friendly | Less flexible for source changes | ✅ **Chosen** | | **Normalized OLTP** | Familiar structure, transaction support | Poor analytical performance, complex joins | ❌ Rejected | **Rationale:** For a 3-hour portfolio project demonstrating family office reporting, Kimball provides the best balance of analytical performance (simple JOINs) and historical tracking (SCD Type 2 on critical dimension). Data Vault would be better for enterprise-scale multi-source integration but adds unnecessary complexity here. **Tradeoffs Accepted:** - ✅ **Gained:** Query simplicity (4-way JOIN max), fast aggregations, business-user-friendly - ⚠️ **Sacrificed:** Flexibility for adding new sources (would require dimension/fact redesign) **Generalization:** Use Kimball when analytical performance and query simplicity are priorities. Use Data Vault for enterprise-scale, multi-source integration with evolving schemas. --- #### Decision 2: SCD Type 2 on dim_assets (Not Snapshot Fact) **Context:** Need to track historical asset classifications for compliance (e.g., "Was this equipment active or under maintenance on June 30?"). Could use SCD Type 2 dimension or snapshot fact table. **Options Evaluated:** | Option | Pros | Cons | Decision | |--------|------|------|----------| | **Snapshot Fact** | Captures full state each day, easy to query | Huge storage (100 assets × 730 days), redundant data | ❌ Rejected | | **SCD Type 2 Dimension** | Storage-efficient, tracks changes only, standard pattern | Slightly more complex queries (valid_from/valid_to) | ✅ **Chosen** | | **SCD Type 1 (overwrite)** | Simplest, smallest storage | Loses history - fails compliance requirement | ❌ Rejected | **Rationale:** Asset classifications change infrequently (equipment lifecycle, regulatory updates) but must be reconstructable for audit. SCD Type 2 tracks only changes (not daily snapshots), reducing storage while preserving full history. **Tradeoffs Accepted:** - ✅ **Gained:** Storage efficiency (only change records), full audit trail - ⚠️ **Sacrificed:** Query complexity (need valid_from/valid_to filters for point-in-time) **Generalization:** Use SCD Type 2 for slowly changing attributes where history matters. Use snapshot facts for rapidly changing measures (e.g., daily inventory levels). --- #### Decision 3: Include Operational Assets in Family Office DW **Context:** One family owns a manufacturing company with operational assets (equipment, IP, certifications). Should these live in family office DW or separate operational system? **Options Evaluated:** | Option | Pros | Cons | Decision | |--------|------|------|----------| | **Separate Operational DW** | Clean separation, optimized for each domain | Loses consolidated wealth view, dual maintenance | ❌ Rejected | | **Include in Family Office DW** | Consolidated wealth reporting, prepares conformed dimensions for departmental analytics (Day 16) | Mixes enterprise and departmental concerns | ✅ **Chosen** | | **Federated query** | Keeps systems separate, joins at query time | Performance issues, complex orchestration | ❌ Rejected | **Rationale:** For comprehensive wealth planning, Rafael needs to see ALL family assets (financial + operational + IP) in one view. By including operational assets in the enterprise DW, we maintain consolidated reporting while preparing conformed dimensions (dim_assets, dim_date) for downstream departmental use (Luna's Day 16 compliance dashboard). **Tradeoffs Accepted:** - ✅ **Gained:** Consolidated wealth view, conformed dimension reuse (Day 16) - ⚠️ **Sacrificed:** Cleaner domain separation (operational assets in wealth DW) **Generalization:** Include cross-domain entities in enterprise DW when consolidated reporting is critical AND you can design conformed dimensions for departmental reuse (Kimball's "enterprise bus architecture"). --- ### Implementation Details **Key Techniques:** **1. SCD Type 2 Implementation:** ```sql -- Close current record when asset classification changes UPDATE dim_assets SET valid_to = '2024-06-30', is_current = FALSE WHERE asset_id = 'EQ_MFG_001' AND is_current = TRUE; -- Insert new record with updated classification INSERT INTO dim_assets (asset_key, asset_id, asset_name, asset_class, valid_from, valid_to, is_current) VALUES ( 2, -- New surrogate key 'EQ_MFG_001', -- Same natural key 'CNC Machine Haas VF-2', 'Maintenance', -- Updated classification '2024-07-01', -- Effective date NULL, -- Open-ended (current version) TRUE ); ``` **2. Point-in-Time Query (Leveraging SCD Type 2):** ```sql -- "What was the portfolio composition on 2024-06-15?" SELECT c.client_name, a.asset_class, SUM(h.market_value) as total_value FROM fct_holdings h JOIN dim_assets a ON h.asset_key = a.asset_key AND '2024-06-15' BETWEEN a.valid_from AND COALESCE(a.valid_to, '9999-12-31') JOIN dim_clients c ON h.client_key = c.client_key JOIN dim_date d ON h.date_key = d.date_key WHERE d.full_date = '2024-06-15' GROUP BY c.client_name, a.asset_class; ``` **3. MFG Asset Filter (For Day 16 Reuse):** ```sql -- Isolate ONLY MFG operational assets SELECT a.asset_id, a.asset_name, a.asset_class, h.market_value, acc.account_name FROM fct_holdings h JOIN dim_assets a ON h.asset_key = a.asset_key JOIN dim_accounts acc ON h.account_key = acc.account_key JOIN dim_clients c ON acc.parent_client_key = c.client_key WHERE c.client_name = 'MFG Owner Family' AND a.asset_class IN ('Equipment', 'IP', 'Certification') AND a.is_current = TRUE AND h.date_key = (SELECT MAX(date_key) FROM dim_date); ``` **Performance Characteristics:** - **Current dataset:** 100 assets × 24 months × 5 families = ~12K fact rows in <1 second - **Tested up to:** Not performance-tested beyond synthetic data (SQLite sufficient for prototype) - **Bottleneck:** SCD Type 2 point-in-time queries (valid_from/valid_to range scans) - **Optimization:** Add indexes on dim_assets(asset_id, is_current) and dim_assets(valid_from, valid_to) for production ### Testing Approach **Validation Queries:** ```sql -- 1. Row count validation (expect ~12K holdings) SELECT COUNT(*) as total_holdings FROM fct_holdings; -- 2. SCD Type 2 validation (expect multiple versions for SCD examples) SELECT asset_id, COUNT(*) as version_count, SUM(CASE WHEN is_current THEN 1 ELSE 0 END) as current_versions FROM dim_assets GROUP BY asset_id HAVING version_count > 1; -- 3. MFG asset count validation (expect exactly 30) SELECT COUNT(*) as mfg_asset_count FROM dim_assets WHERE asset_id LIKE 'EQ_MFG_%' OR asset_id LIKE 'IP_MFG_%' OR asset_id LIKE 'CERT_MFG_%'; -- 4. Data quality check (no nulls in critical fields) SELECT SUM(CASE WHEN asset_key IS NULL THEN 1 ELSE 0 END) as null_asset_keys, SUM(CASE WHEN market_value IS NULL THEN 1 ELSE 0 END) as null_market_values FROM fct_holdings; ``` **Test Results:** - ✅ All validations passed - ✅ 30 MFG assets clearly identified (EQ_MFG_001-010, IP_MFG_001-010, CERT_MFG_001-010) - ✅ SCD Type 2 examples present (2+ historical changes demonstrated)

Detailed Adaptation Guide

🔄 Step-by-Step Production Adaptation (Click to Expand) ### Step 1: Assess Your Data **Checklist:** - [ ] Do you have access to source systems? (Custody platform for financial assets, ERP for operational assets) - [ ] Does data structure match expected schema? (asset IDs, classifications, positions) - [ ] Are there data quality issues? (missing valuations, unclassified assets) - [ ] What's the data volume? (How many families? How many assets per family?) - [ ] What's the update frequency? (Daily positions? Monthly compliance snapshots?) ### Step 2: Map Your Schema | Your Column | Project Column | Transformation | |-------------|----------------|----------------| | custody.instrument_id | dim_assets.asset_id | Direct mapping | | custody.instrument_name | dim_assets.asset_name | Direct mapping | | custody.asset_type | dim_assets.asset_class | Map to controlled vocab: Equity, Equipment, IP, Certification | | positions.as_of_date | fct_holdings.date_key | Join to dim_date on full_date → date_key | | positions.market_value_usd | fct_holdings.market_value | CAST to DECIMAL(18,2) | | positions.quantity | fct_holdings.quantity | CAST to DECIMAL(18,4) | ### Step 3: Modify Data Source **Replace:** `day10_DATA_synthetic_generator.py` **With:** `day10_DATA_extract_real.py` ```python import pandas as pd from sqlalchemy import create_engine def day10_extract_from_custody_system(): """Extract financial assets from custody platform""" # Example: Connect to custody platform API or DB engine = create_engine('postgresql://custody_db') query = """ SELECT instrument_id as asset_id, instrument_name as asset_name, asset_type, as_of_date, market_value_usd, quantity, account_id, client_id FROM custody.positions WHERE as_of_date >= '2023-01-01' """ return pd.read_sql(query, engine) def day10_extract_from_operating_erp(): """Extract operational assets from ERP""" # Example: Connect to operating company ERP # Extract equipment, IP, certifications pass # Combine financial + operational assets into dim_assets + fct_holdings ``` ### Step 4: Adjust Business Logic **Files to Review:** 1. `models/day10_MODEL_dim_assets_scd2.sql` - SCD Type 2 logic (what triggers valid_from/valid_to?) 2. `day10_CONFIG_settings.py` - Asset classification mapping, fiscal year start **Common Adjustments:** ```python # In day10_CONFIG_settings.py # Map YOUR asset types to standardized classes DAY10_ASSET_CLASS_MAPPING = { 'Stock': 'Equity', 'Bond': 'Equity', 'Fund': 'Equity', 'Machinery': 'Equipment', 'Vehicle': 'Equipment', 'Patent': 'IP', 'Trademark': 'IP', 'Copyright': 'IP', 'ISO_Cert': 'Certification', 'Regulatory_Approval': 'Certification' } # Fiscal year start (e.g., July 1 = month 7) DAY10_FISCAL_YEAR_START = 7 # Adjust to your organization ``` ### Step 5: Validate with Sample **Test with subset:** ```bash # Extract 1 month of data for 1 family first python day10_DATA_extract_real.py \ --client-id="FAMILY_001" \ --start-date="2024-11-01" \ --end-date="2024-11-30" ``` **Compare to known values:** - [ ] Portfolio value matches existing report for this family/month: ✅/❌ - [ ] Asset count matches expected number: ✅/❌ - [ ] SCD Type 2 logic handles real classification changes: ✅/❌ ### Step 6: Scale to Full Data **Incremental approach:** 1. Week 1: 1 family, 1 month 2. Week 2: 1 family, 6 months 3. Week 3: All families, 6 months 4. Week 4: All families, full historical (24+ months) **Monitor:** - Execution time (SQLite → PostgreSQL if > 1M rows) - SCD Type 2 performance (add indexes on valid_from/valid_to) - Data quality issues (log unclassified assets, missing valuations)

Project Files

day10/
├── README.md                                # This file
├── data/
│   └── day10_family_office_dw.db            # SQLite database
├── models/
│   ├── day10_MODEL_star_schema.sql          # CREATE TABLE statements
│   ├── day10_MODEL_dim_assets_scd2.sql      # SCD Type 2 logic
│   └── day10_MODEL_fact_holdings.sql        # Fact table population
├── queries/
│   ├── day10_QUERY_portfolio_value.sql      # Total value per client
│   ├── day10_QUERY_asset_allocation.sql     # Allocation by asset class
│   ├── day10_QUERY_mfg_assets_filter.sql    # MFG operational assets only (FOR DAY 16)
│   └── day10_QUERY_historical_tracking.sql  # SCD Type 2 demo
├── docs/
│   └── day10_ERD_star_schema.md             # Text-based ERD
├── day10_DATA_synthetic_generator.py        # Generates 5 families, 100 assets, 24 months
└── day10_requirements.txt                   # Python dependencies

Appendix

Time Breakdown

Phase Time %
Planning & Setup 30 min 17%
Development (Schema + SCD Logic) 90 min 50%
Testing & Validation 30 min 17%
Documentation 30 min 17%
Total 180 min 100%

Learning Outcomes

Technical Skills Acquired:

Business Domain Understanding:

Process Improvements for Next Project:

Naming Conventions Reference

All project files use day10_ prefix for isolation.

MFG Asset Naming (CRITICAL FOR DAY 16):

See PROMPT_project_setup.md for complete naming standards.



For Rafael

This project demonstrates technical patterns commonly discussed in cross-border wealth planning:

Key Architectural Patterns:

What This Does NOT Do:

Potential Real-World Applications:


Built in 3 hours Educational Portfolio Project View All 25 Days →