**One-line pitch:** Dockerized data pipeline that extracts on-chain transparency metrics from the Cardano blockchain via the Blockfrost API and loads them into BigQuery for verifiable analysis.

**Part of:** Advent Automation 2025 - 25 Days of Data Engineering

**Business Problem:** Crypto/blockchain teams need verifiable on-chain data to prove network decentralization, fee transparency, and real adoption metrics beyond marketing claims.

**Solution Delivered:** Containerized Python pipeline extracting Cardano network metrics (3,000+ stake pools, ~$0.17 average fees, transaction volumes) from the Blockfrost API to BigQuery with a full transparency trail.

**Business Impact:** Enables data-driven verification of blockchain values: decentralization proof (3,000+ pools vs. Bitcoin’s 4-5 dominant mining pools), accessibility proof (~$0.17 average fees vs. Ethereum’s $5-50), and real adoption metrics.
| For | Industry | Time | Status |
|---|---|---|---|
| Blockchain/Crypto Analyst | Crypto/Blockchain | 3 hours | ✅ Complete |
Pedro teaches that Cardano’s core value is transparency. But raw blockchain data is hard to query; this pipeline makes transparency accessible.
| Metric | Traditional Bank | Cardano Blockchain |
|---|---|---|
| Transaction history | Private (only yours) | Public (everyone’s) |
| Network fees | Hidden in fine print | On-chain, auditable |
| Centralization | Controlled by a few entities | 3000+ stake pools |
| Verification | “Trust us” | “Verify yourself” |
1. “Cardano is actually decentralized”

```sql
SELECT AVG(stake_pools_active) as avg_active_pools
FROM `project.cardano_data.cardano_network_activity`;
-- Result: ~3000 pools (vs. Bitcoin's 4-5 mining pools controlling 51%)
```
2. “Fees are actually low”

```sql
SELECT AVG(avg_transaction_fee) as avg_fee_usd
FROM `project.cardano_data.cardano_network_activity`;
-- Result: ~$0.17 per transaction (vs. Ethereum $5-50)
```
3. “The network is actually used”

```sql
SELECT SUM(total_transactions) as total_txs
FROM `project.cardano_data.cardano_network_activity`
WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY);
-- Result: Real adoption numbers, not marketing claims
```
```
┌─────────────────────┐
│   Blockfrost API    │ ← Official Cardano infrastructure
│   (On-chain data)   │   Free: 50K requests/day
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Docker Container   │ ← Containerized extraction
│  - Extract script   │
│  - Load script      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│      BigQuery       │ ← Queryable transparency metrics
│    cardano_data     │   Enables SQL analysis
└─────────────────────┘
```
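The extraction hop can be sketched in Python. This is a minimal sketch, not the project’s actual `day04_DATA_extract_blockfrost.py`: the base URL and `project_id` header follow Blockfrost’s public API, while `parse_epoch` and its column mapping are illustrative assumptions.

```python
import requests

# Cardano mainnet base URL per the public Blockfrost docs.
BASE_URL = "https://cardano-mainnet.blockfrost.io/api/v0"


def fetch_json(endpoint: str, api_key: str) -> dict:
    """GET a Blockfrost endpoint; the API key is sent as the project_id header."""
    resp = requests.get(
        f"{BASE_URL}{endpoint}", headers={"project_id": api_key}, timeout=30
    )
    resp.raise_for_status()
    return resp.json()


def parse_epoch(payload: dict) -> dict:
    """Map a /epochs/latest payload onto the pipeline's column names.

    Blockfrost reports epoch fees in lovelace (1 ADA = 1,000,000 lovelace).
    """
    return {
        "epoch": payload["epoch"],
        "total_transactions": payload["tx_count"],
        "block_count": payload["block_count"],
        "total_fees_ada": int(payload["fees"]) / 1_000_000,
    }
```

A call like `parse_epoch(fetch_json("/epochs/latest", api_key))` yields one row’s worth of epoch metrics.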
```
day04/
├── data/
│   ├── raw/                              # Raw API responses (if needed)
│   └── processed/
│       └── cardano_transparency.csv      # Local backup of metrics
│
├── day04_CONFIG_settings.py              # Configuration management
├── day04_DATA_extract_blockfrost.py      # Extract on-chain metrics
├── day04_DATA_load_bigquery.py           # Load to BigQuery
│
├── Dockerfile                            # Container definition
├── docker-compose.yml                    # Orchestration config
├── day04_requirements.txt                # Python dependencies
├── .env.example                          # Environment template
└── README.md                             # This file
```
```bash
# Visit: https://blockfrost.io
# Sign up (email only, no credit card)
# Copy your API key (starts with "mainnet")

# Navigate to day04
cd day04

# Copy environment template
cp .env.example .env

# Edit .env with your values
nano .env
```
Required variables in the root `../config/.env`:

```
DAY04_BLOCKFROST_API_KEY=mainnetXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DAY04_BLOCKFROST_NETWORK=mainnet
DAY04_GCP_PROJECT_ID=your-gcp-project-id
DAY04_BQ_DATASET=cardano_data
DAY04_BQ_TABLE=cardano_network_activity
```
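The settings module can read these variables with `python-dotenv`. A hypothetical sketch of what `day04_CONFIG_settings.py` might look like; the helper name, defaults, and fail-fast check are assumptions, not the project’s actual code.

```python
import os

try:
    from dotenv import load_dotenv  # python-dotenv
except ImportError:  # allow running without python-dotenv installed
    def load_dotenv(*args, **kwargs):
        return False


def load_settings(env=None):
    """Collect the DAY04_* variables, failing fast if a required key is missing."""
    load_dotenv("../config/.env")  # silently skipped if the file is absent
    env = os.environ if env is None else env
    required = ["DAY04_BLOCKFROST_API_KEY", "DAY04_GCP_PROJECT_ID"]
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {missing}")
    return {
        "api_key": env["DAY04_BLOCKFROST_API_KEY"],
        "network": env.get("DAY04_BLOCKFROST_NETWORK", "mainnet"),
        "project_id": env["DAY04_GCP_PROJECT_ID"],
        "dataset": env.get("DAY04_BQ_DATASET", "cardano_data"),
        "table": env.get("DAY04_BQ_TABLE", "cardano_network_activity"),
    }
```

Defaulting the optional keys means only the API key and GCP project must appear in `.env`.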
```bash
# Install gcloud CLI if needed
# https://cloud.google.com/sdk/docs/install

# Authenticate
gcloud auth application-default login

# Set project
gcloud config set project YOUR_PROJECT_ID
```
```bash
docker-compose up --build
```

This will build the image, run the extraction, and save a local backup to `data/processed/cardano_transparency.csv`.

To run the steps individually:

```bash
# Extract
docker-compose up

# Load to BigQuery
docker-compose run cardano-transparency python day04_DATA_load_bigquery.py
```
```bash
# Install dependencies
pip install -r day04_requirements.txt

# Extract metrics
python day04_DATA_extract_blockfrost.py

# Load to BigQuery
python day04_DATA_load_bigquery.py
```
The pipeline extracts these transparency metrics every run:
| Metric | Description | Why It Matters |
|---|---|---|
| `timestamp` | When data was collected | Time-series analysis |
| `total_transactions` | Total transactions in epoch | Network usage (transparency) |
| `active_addresses` | Estimated active addresses | Real adoption |
| `block_count` | Blocks produced in epoch | Network activity |
| `epoch` | Cardano epoch number (~5 days) | Time reference |
| `stake_pools_active` | Number of active stake pools | Decentralization proof |
| `total_ada_staked` | Total ADA staked | Trust in network |
| `avg_transaction_fee` | Average fee in USD | Accessibility proof |
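Note that `avg_transaction_fee` is derived rather than read directly: Blockfrost reports epoch fees in lovelace, so a USD figure needs a lovelace-to-ADA conversion plus an ADA/USD price. A sketch of the arithmetic with illustrative numbers (the $0.60 ADA price and the helper name are assumptions):

```python
LOVELACE_PER_ADA = 1_000_000  # 1 ADA = 1,000,000 lovelace


def avg_fee_usd(total_fees_lovelace: int, tx_count: int, ada_price_usd: float) -> float:
    """Average fee per transaction, converted lovelace -> ADA -> USD."""
    fees_ada = total_fees_lovelace / LOVELACE_PER_ADA
    return (fees_ada / tx_count) * ada_price_usd


# Illustrative: 800 billion lovelace of epoch fees over 2,847,391 transactions
# at an assumed ADA price of $0.60 lands near the $0.17 figure quoted above.
```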
```
============================================================
🔗 EXTRACTING CARDANO BLOCKCHAIN TRANSPARENCY METRICS
============================================================

📊 Fetching latest epoch data...
   ✓ Epoch 450
   ✓ Transactions: 2,847,391
   ✓ Blocks: 21,567

🌐 Fetching network information...
   ✓ Circulating supply: 34,567,890,123 ADA
   ✓ Active stake: 23,456,789,012 ADA

🏊 Counting active stake pools...
   ✓ Active stake pools: ~3,000
   ℹ️ This demonstrates decentralization vs. Bitcoin's ~5 mining pools

------------------------------------------------------------
📈 TRANSPARENCY METRICS SUMMARY
------------------------------------------------------------
Timestamp:            2024-11-28 14:30:00
Epoch:                450
Total Transactions:   2,847,391
Active Addresses:     ~3,456,789
Blocks Produced:      21,567
Active Stake Pools:   3,000 (decentralization)
Total ADA Staked:     23,456,789,012 ADA
Avg Transaction Fee:  $0.17 USD
------------------------------------------------------------
✅ SUCCESS! Cardano transparency metrics extracted.
```
```sql
CREATE TABLE `project.cardano_data.cardano_network_activity` (
  timestamp TIMESTAMP NOT NULL,
  total_transactions INT64 NOT NULL,
  active_addresses INT64,
  block_count INT64 NOT NULL,
  epoch INT64 NOT NULL,
  stake_pools_active INT64 NOT NULL,
  total_ada_staked FLOAT64 NOT NULL,
  avg_transaction_fee FLOAT64 NOT NULL
);
```
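The load script itself is not shown here; a hedged sketch of the shape it might take. Building the row as a pandas DataFrame and validating the NOT NULL columns is testable locally; the commented-out client call uses `google-cloud-bigquery`’s real `load_table_from_dataframe` method, while `to_frame` and the validation logic are illustrative assumptions.

```python
import pandas as pd

SCHEMA_COLUMNS = [
    "timestamp", "total_transactions", "active_addresses", "block_count",
    "epoch", "stake_pools_active", "total_ada_staked", "avg_transaction_fee",
]
# Every column except active_addresses is NOT NULL in the table definition.
REQUIRED = [col for col in SCHEMA_COLUMNS if col != "active_addresses"]


def to_frame(row: dict) -> pd.DataFrame:
    """Build a one-row DataFrame matching the BigQuery schema, enforcing NOT NULLs."""
    missing = [col for col in REQUIRED if row.get(col) is None]
    if missing:
        raise ValueError(f"NOT NULL columns missing: {missing}")
    return pd.DataFrame([row], columns=SCHEMA_COLUMNS)


# The actual load would then be roughly:
#   from google.cloud import bigquery
#   client = bigquery.Client(project="your-gcp-project-id")
#   client.load_table_from_dataframe(
#       to_frame(row), "your-gcp-project-id.cardano_data.cardano_network_activity"
#   ).result()
```

Failing fast on missing NOT NULL fields surfaces extraction bugs before a BigQuery load error does.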
Once data is in BigQuery, you can run powerful analyses:
```sql
-- Decentralization over time: average active stake pools per day
SELECT
  DATE(timestamp) as date,
  AVG(stake_pools_active) as avg_pools
FROM `project.cardano_data.cardano_network_activity`
GROUP BY date
ORDER BY date DESC
LIMIT 30;
```
```sql
-- Network usage and stake per epoch
SELECT
  epoch,
  SUM(total_transactions) as total_txs,
  AVG(total_ada_staked) / 1e9 as billions_ada_staked
FROM `project.cardano_data.cardano_network_activity`
GROUP BY epoch
ORDER BY epoch DESC;
```
```sql
-- Fee comparison against an approximate Ethereum figure
SELECT
  'Cardano' as network,
  AVG(avg_transaction_fee) as avg_fee_usd
FROM `project.cardano_data.cardano_network_activity`
UNION ALL
SELECT
  'Ethereum' as network,
  25.00 as avg_fee_usd  -- Approximate
ORDER BY avg_fee_usd;
```
```bash
# Check your .env file
cat ../config/.env | grep DAY04_BLOCKFROST

# Make sure you copied .env.example to .env and filled in your API key
```
```bash
# Re-authenticate with gcloud
gcloud auth application-default login

# Verify project is set
gcloud config get-value project
```
```bash
# Blockfrost free tier: 50K requests/day, 10 req/sec
# Wait a few seconds and retry
# Or upgrade to a paid plan at blockfrost.io
```
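Hitting the 10 req/sec ceiling typically surfaces as HTTP 429. A generic backoff wrapper, illustrative rather than taken from the project’s code: it retries any callable that returns a response-like object, and the injectable `sleep` keeps the logic testable without real waiting.

```python
import time


def get_with_backoff(fetch, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Retry fetch() with exponential backoff while it returns HTTP 429.

    fetch should return an object with a status_code attribute, e.g. the
    result of requests.get(...). The delay doubles after every 429.
    """
    delay = base_delay
    for _ in range(max_retries):
        resp = fetch()
        if resp.status_code != 429:  # 429 = Too Many Requests
            return resp
        sleep(delay)
        delay *= 2
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```

Usage would look like `get_with_backoff(lambda: requests.get(url, headers={"project_id": api_key}))`.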
```bash
# Rebuild container
docker-compose down
docker-compose build --no-cache
docker-compose up
```
- `requests` - HTTP requests to Blockfrost API
- `pandas` - Data processing
- `google-cloud-bigquery` - BigQuery integration
- `python-dotenv` - Environment management

This project demonstrates:
✅ Blockchain philosophy - Transparency isn’t just marketing; it’s verifiable
✅ API integration - Working with real blockchain data providers
✅ Docker containerization - Reproducible data pipelines
✅ Cloud data warehousing - Making raw data queryable
✅ Day-scoped isolation - All code prefixed with day04_ to avoid conflicts
“I built a data pipeline that proves Cardano’s values (transparency, decentralization, accessibility) through on-chain metrics: not just marketing claims, but real blockchain data.”
Why this matters:
These are ideas for future enhancement, NOT part of the core deliverable:
This is a portfolio project for educational purposes.
Built for Advent Automation 2025 - Day 04. Demonstrating blockchain transparency through data engineering 🔗