# Project 1A - Ingestion Pillar

- **Stakeholder:** Daud, a marketing consultant who needs automated GA4 + Google Ads reports for agency clients
- **Industry:** Marketing/Advertising
This project extracts marketing data from Google Analytics 4 (GA4) and Google Ads, then loads it into BigQuery for analysis. The pipeline supports both real API data (GA4 Demo Account) and synthetic data generation.
This is a pure ingestion pipeline - data in, data out.
```bash
# 1. Python 3.11+
python --version

# 2. Install dependencies
cd /path/to/advent-automation-2025
pip install -r requirements.txt

# 3. Configure environment variables (see Configuration section)
```
```bash
# Navigate to day01 folder
cd day01

# Step 1: Extract GA4 data (synthetic by default)
python day01_DATA_extract_ga4.py

# Step 2: Generate Google Ads synthetic data
python day01_DATA_extract_ads.py

# Step 3: Load both datasets to BigQuery
python day01_DATA_load_bigquery.py
```
That’s it! Your data should now be in BigQuery.
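Under the hood, the synthetic GA4 extraction can be sketched roughly as below. The function name, metric ranges, and output filename are illustrative assumptions, not the script's actual implementation:

```python
import csv
import random
from datetime import date, timedelta

# Hypothetical sketch of what day01_DATA_extract_ga4.py does in synthetic
# mode: one row per (date, source) pair with plausible metric ranges.
SOURCES = ["google", "facebook", "direct", "email", "linkedin"]

def generate_ga4_rows(days=30, seed=42):
    """Generate `days` days of synthetic session metrics per traffic source."""
    rng = random.Random(seed)  # seeded for reproducible runs
    start = date.today() - timedelta(days=days)
    rows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        for source in SOURCES:
            sessions = rng.randint(100, 2000)
            rows.append({
                "date": day.isoformat(),
                "sessions": sessions,
                "conversions": rng.randint(0, sessions // 20),
                "bounce_rate": round(rng.uniform(0.2, 0.7), 2),
                "source": source,
            })
    return rows

rows = generate_ga4_rows()
with open("ga4_synthetic.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```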
### ga4_sessions

Stores GA4 session metrics by date and traffic source.
| Column | Type | Description |
|---|---|---|
| date | DATE | Session date (YYYY-MM-DD) |
| sessions | INTEGER | Number of sessions |
| conversions | INTEGER | Number of conversions |
| bounce_rate | FLOAT | Bounce rate (0.0 to 1.0) |
| source | STRING | Traffic source (google, facebook, direct, email, linkedin) |
Sample Data:

```
date       | sessions | conversions | bounce_rate | source
-----------|----------|-------------|-------------|----------
2024-11-01 | 1520     | 45          | 0.42        | google
2024-11-01 | 890      | 22          | 0.48        | facebook
2024-11-02 | 1620     | 58          | 0.39        | google
```
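Before loading, rows of this shape can be sanity-checked against the schema. A minimal sketch; the real loader may validate differently, and `validate_row` / `VALID_SOURCES` are hypothetical names:

```python
from datetime import datetime

# Expected Python types for each ga4_sessions column.
SCHEMA = {
    "date": str,          # "YYYY-MM-DD"
    "sessions": int,
    "conversions": int,
    "bounce_rate": float,
    "source": str,
}
VALID_SOURCES = {"google", "facebook", "direct", "email", "linkedin"}

def validate_row(row):
    """Return True if a row matches the ga4_sessions schema and value ranges."""
    for col, typ in SCHEMA.items():
        if not isinstance(row[col], typ):
            return False
    datetime.strptime(row["date"], "%Y-%m-%d")  # raises on malformed dates
    return (0.0 <= row["bounce_rate"] <= 1.0
            and row["conversions"] <= row["sessions"]
            and row["source"] in VALID_SOURCES)

sample = {"date": "2024-11-01", "sessions": 1520, "conversions": 45,
          "bounce_rate": 0.42, "source": "google"}
print(validate_row(sample))  # True
```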
### google_ads_campaigns

Stores Google Ads campaign performance metrics.
| Column | Type | Description |
|---|---|---|
| date | DATE | Campaign date (YYYY-MM-DD) |
| campaign_name | STRING | Campaign name |
| spend | FLOAT | Daily spend in USD |
| clicks | INTEGER | Number of clicks |
| impressions | INTEGER | Number of impressions |
| conversions | INTEGER | Number of conversions |
Sample Data:

```
date       | campaign_name     | spend  | clicks | impressions | conversions
-----------|-------------------|--------|--------|-------------|-------------
2024-11-01 | Brand Campaign    | 450.00 | 320    | 12500       | 18
2024-11-01 | Product Launch    | 680.00 | 510    | 18200       | 25
2024-11-02 | Retargeting       | 320.00 | 245    | 9800        | 15
```
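Spend, clicks, impressions, and conversions combine into the usual derived metrics (CTR, CPC, cost per conversion). An illustrative helper, not part of the project scripts, guarding against division by zero the same way the SQL `NULLIF(..., 0)` pattern does:

```python
def ad_metrics(row):
    """Compute CTR, CPC, and cost per conversion; None where the divisor is 0."""
    clicks, impressions = row["clicks"], row["impressions"]
    spend, conversions = row["spend"], row["conversions"]
    return {
        "ctr": round(clicks / impressions, 4) if impressions else None,
        "cpc": round(spend / clicks, 2) if clicks else None,
        "cost_per_conversion": round(spend / conversions, 2) if conversions else None,
    }

# First sample row from the table above
row = {"date": "2024-11-01", "campaign_name": "Brand Campaign",
       "spend": 450.00, "clicks": 320, "impressions": 12500, "conversions": 18}
print(ad_metrics(row))  # {'ctr': 0.0256, 'cpc': 1.41, 'cost_per_conversion': 25.0}
```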
All configuration is stored in `../config/.env`. Add these variables:
```bash
# BigQuery Configuration (REQUIRED)
DAY01_GCP_PROJECT_ID="your-gcp-project-id"
DAY01_BQ_DATASET="marketing_data"
DAY01_BQ_LOCATION="US"

# Feature Flags
DAY01_USE_SYNTHETIC_DATA="true"   # Use synthetic data (recommended)
DAY01_SYNTHETIC_DAYS="30"         # Days of historical data
DAY01_NUM_CAMPAIGNS="4"           # Number of ad campaigns

# GA4 Configuration (OPTIONAL - only for real data)
DAY01_GA4_PROPERTY_ID="213025502" # Google Merchandise Store Demo
DAY01_GA4_CREDENTIALS_PATH="./credentials/ga4_service_account.json"
```
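A sketch of how `day01_CONFIG_settings.py` might read these variables (the actual module may differ): the required project ID fails fast, and feature flags fall back to the documented defaults:

```python
import os

def load_settings(env=os.environ):
    """Read DAY01_* settings from an environment mapping."""
    project_id = env.get("DAY01_GCP_PROJECT_ID")
    if not project_id:
        raise RuntimeError("DAY01_GCP_PROJECT_ID is required; set it in config/.env")
    return {
        "project_id": project_id,
        "dataset": env.get("DAY01_BQ_DATASET", "marketing_data"),
        "location": env.get("DAY01_BQ_LOCATION", "US"),
        "use_synthetic": env.get("DAY01_USE_SYNTHETIC_DATA", "true").lower() == "true",
        "synthetic_days": int(env.get("DAY01_SYNTHETIC_DAYS", "30")),
        "num_campaigns": int(env.get("DAY01_NUM_CAMPAIGNS", "4")),
    }

# Minimal config: only the required variable set, defaults for the rest
settings = load_settings({"DAY01_GCP_PROJECT_ID": "demo-project"})
```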
To upload data to BigQuery, authenticate with Google Cloud:

**Option 1: Service Account (recommended for production)**

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```

**Option 2: User Account (quick for testing)**

```bash
gcloud auth application-default login
```
Get Service Account Credentials: in the Google Cloud Console, create a service account with BigQuery access, download a JSON key, and set `GOOGLE_APPLICATION_CREDENTIALS` to the key path.

Project structure:

```
day01/
├── data/
│   ├── raw/                      # Raw extracted data
│   │   ├── ga4_synthetic.csv     # Generated GA4 data
│   │   └── ads_synthetic.csv     # Generated Ads data
│   └── processed/                # Processed data ready for BigQuery
│       ├── ga4_sessions.csv
│       └── ads_campaigns.csv
│
├── day01_CONFIG_settings.py      # Configuration constants
├── day01_DATA_extract_ga4.py     # GA4 data extractor
├── day01_DATA_extract_ads.py     # Google Ads synthetic generator
├── day01_DATA_load_bigquery.py   # BigQuery loader
│
├── day01_requirements.txt        # Project-specific dependencies
├── day01_.env.example            # Environment variable template
└── README.md                     # This file
```
Data modes:

- **GA4 - default: synthetic.** Optionally use real GA4 data by setting `DAY01_USE_SYNTHETIC_DATA="false"` in `config/.env`.
- **Google Ads - always synthetic** (no free sandbox available).
After running extraction scripts, check the data/processed/ folder:
```bash
ls -lh data/processed/
# Should show:
# ga4_sessions.csv
# ads_campaigns.csv

# Preview data
head data/processed/ga4_sessions.csv
head data/processed/ads_campaigns.csv
```
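Beyond eyeballing with `head`, a short script can verify the header row and row count. Illustrative only (`check_csv` is a hypothetical helper); swap in the ads columns for `ads_campaigns.csv`:

```python
import csv

EXPECTED_COLUMNS = ["date", "sessions", "conversions", "bounce_rate", "source"]

def check_csv(path, expected_columns):
    """Raise if the header is wrong or the file is empty; return the row count."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != expected_columns:
            raise ValueError(f"unexpected columns: {reader.fieldnames}")
        count = sum(1 for _ in reader)
    if count == 0:
        raise ValueError(f"{path} has a header but no data rows")
    return count
```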
Run these queries in BigQuery Console:
Check row counts:
```sql
-- GA4 Sessions
SELECT COUNT(*) AS row_count
FROM `your-project-id.marketing_data.ga4_sessions`;

-- Google Ads Campaigns
SELECT COUNT(*) AS row_count
FROM `your-project-id.marketing_data.google_ads_campaigns`;
```
Sample join query (GA4 + Google Ads):
```sql
SELECT
  ga4.date,
  ga4.sessions,
  ga4.conversions AS ga4_conversions,
  ads.campaign_name,
  ads.spend,
  ads.conversions AS ads_conversions,
  ROUND(ads.spend / NULLIF(ads.conversions, 0), 2) AS cost_per_conversion
FROM `your-project-id.marketing_data.ga4_sessions` ga4
JOIN `your-project-id.marketing_data.google_ads_campaigns` ads
  ON ga4.date = ads.date
WHERE ga4.source = 'google'
ORDER BY ga4.date DESC
LIMIT 10;
```
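The join's division guard (`NULLIF(conversions, 0)` becomes `None` in Python) can be mirrored in plain Python to spot-check BigQuery results locally, using the sample rows from this README:

```python
# Sample ga4_sessions rows (google source only) and google_ads_campaigns rows
ga4 = [
    {"date": "2024-11-01", "sessions": 1520, "conversions": 45, "source": "google"},
    {"date": "2024-11-02", "sessions": 1620, "conversions": 58, "source": "google"},
]
ads = [
    {"date": "2024-11-01", "campaign_name": "Brand Campaign", "spend": 450.00, "conversions": 18},
    {"date": "2024-11-01", "campaign_name": "Product Launch", "spend": 680.00, "conversions": 25},
]

# Inner join on date, filtered to source = 'google', like the SQL above
joined = [
    {
        "date": g["date"],
        "campaign_name": a["campaign_name"],
        "sessions": g["sessions"],
        # NULLIF(conversions, 0) -> None instead of a ZeroDivisionError
        "cost_per_conversion": (round(a["spend"] / a["conversions"], 2)
                                if a["conversions"] else None),
    }
    for g in ga4 if g["source"] == "google"
    for a in ads if a["date"] == g["date"]
]
```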
**Solution:** Set `DAY01_GCP_PROJECT_ID` in `../config/.env`.
**Cause:** Missing Google Cloud authentication.

**Solution:**

```bash
# Authenticate using gcloud
gcloud auth application-default login

# Or set service account credentials
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"
```
**Solution:**

```bash
pip install google-cloud-bigquery google-analytics-data
```
Expected! If you are using synthetic data (the default), this is normal: the pipeline falls back to synthetic generation automatically, writing CSVs to `data/processed/` and loading them into the `marketing_data` dataset.
To use real GA4 data (only follow this if you want to extract real GA4 data instead of synthetic):

1. Place your service account key at `day01/credentials/ga4_service_account.json`.
2. In `config/.env`, set:

```bash
DAY01_USE_SYNTHETIC_DATA="false"
DAY01_GA4_PROPERTY_ID="your-property-id"   # Find in GA4 Admin
DAY01_GA4_CREDENTIALS_PATH="./credentials/ga4_service_account.json"
```
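A hypothetical preflight check before switching modes, confirming the flag, property ID, and credentials file line up (the actual scripts may check differently):

```python
import os

def preflight(env, base_dir="."):
    """Return a list of configuration problems for real-GA4 mode (empty = OK)."""
    problems = []
    if env.get("DAY01_USE_SYNTHETIC_DATA", "true").lower() != "false":
        problems.append("DAY01_USE_SYNTHETIC_DATA is not 'false'")
    if not env.get("DAY01_GA4_PROPERTY_ID", "").isdigit():
        problems.append("DAY01_GA4_PROPERTY_ID should be a numeric property ID")
    cred = env.get("DAY01_GA4_CREDENTIALS_PATH", "")
    if not cred or not os.path.exists(os.path.join(base_dir, cred)):
        problems.append(f"credentials file not found: {cred!r}")
    return problems
```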
Google provides a demo GA4 property (Google Merchandise Store) with property ID `213025502`.

Before considering this project complete, verify:

- [ ] `python day01_DATA_extract_ga4.py` runs without errors
- [ ] `python day01_DATA_extract_ads.py` runs without errors
- [ ] CSV files appear in `data/processed/`
- [ ] `python day01_DATA_load_bigquery.py` runs without errors
- [ ] Both BigQuery tables contain rows (`SELECT COUNT(*) > 0`)

This project is Day 01 - Ingestion only. Future projects in the advent calendar will add:
For now, your data is ready to query in BigQuery!
Built as part of the Christmas Data Advent Calendar 2025 🎄