analytics · April 18, 2026 · 7 min read

What Google Analytics actually tells you (and what it doesn't)

GA4 tells you someone visited your pricing page. Your warehouse tells you they visited three times, came from a Google Ad, and their company just raised a Series B.

GA4 is not your analytics platform. It's one data source. A good one — but one.

Most companies treat GA4 as the single source of truth for "how is our website performing?" They open the real-time dashboard, check sessions, maybe glance at a funnel, and call it a day. Then they wonder why the numbers don't match what sales is seeing, why marketing can't attribute pipeline, and why nobody trusts the dashboard.

The problem isn't GA4. The problem is asking GA4 questions it was never designed to answer.

What GA4 is actually good at

GA4 is a session-and-event machine. It captures:

  • Page views — what pages were loaded, in what order
  • Events — clicks, form submissions, video plays, scroll depth, custom events you define
  • Sessions — groups of events within a time window, attributed to a traffic source
  • User properties — device, geo, language, audience membership

For a content site, an e-commerce store, or a SaaS landing page, this is genuinely useful. You can see which pages get traffic, where users drop off, which channels drive visits, and whether your latest blog post is getting read.

GA4 answers the question: what happened on your site?

What GA4 doesn't tell you

Here's where it falls apart.

Who the visitor actually is

GA4 gives you pseudonymous, cookie-based identifiers (user_pseudo_id in the BigQuery export). It doesn't give you names, companies, or email addresses. If someone visits your pricing page three times from three devices, GA4 sees three different users unless you've set up User-ID to stitch them together.

Your CRM (HubSpot, Salesforce) knows who they are. GA4 doesn't. Without joining the two, you can't answer: "Which of our open deals visited the pricing page this week?"
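
As a rough sketch of what that join looks like once both sides are in the warehouse, the query below uses inline sample rows instead of real tables. Everything in it is an assumption for illustration: the rows are invented, the `page_path` and `deal_stage` columns are hypothetical, and `ga_user_id` stands for a GA4 identifier you would have to capture into the CRM yourself. The week is hard-coded so the result is reproducible.

```sql
-- Sketch only: inline rows stand in for flattened GA4 events and CRM tables.
-- All values, plus the page_path and deal_stage columns, are hypothetical.
WITH pageviews AS (
    SELECT * FROM UNNEST([
        STRUCT('u1' AS user_pseudo_id, '/pricing' AS page_path, DATE '2026-04-14' AS event_date),
        STRUCT('u2', '/blog/ga4', DATE '2026-04-15'),
        STRUCT('u3', '/pricing',  DATE '2026-04-16')   -- never identified in the CRM
    ])
),
contacts AS (
    SELECT * FROM UNNEST([
        STRUCT('c1' AS contact_id, 'u1' AS ga_user_id),
        STRUCT('c2', 'u2')
    ])
),
deals AS (
    SELECT * FROM UNNEST([
        STRUCT('d1' AS deal_id, 'c1' AS contact_id, 'open' AS deal_stage),
        STRUCT('d2', 'c2', 'open')
    ])
)
-- Open deals whose contact viewed the pricing page in the chosen week
SELECT DISTINCT d.deal_id, c.contact_id
FROM pageviews p
JOIN contacts c ON p.user_pseudo_id = c.ga_user_id
JOIN deals d    ON c.contact_id = d.contact_id
WHERE p.page_path = '/pricing'
  AND d.deal_stage = 'open'
  AND p.event_date BETWEEN DATE '2026-04-13' AND DATE '2026-04-19'
```

With this sample data only deal d1 comes back. Visitor u3 also viewed the pricing page but has no CRM match, so they stay anonymous.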

What happened after they left

GA4's jurisdiction ends at your site boundary. It doesn't know whether the visitor booked a demo, became a customer, churned six months later, or expanded their contract.

Your CRM and billing system know that. GA4 never will.

How much revenue a channel actually drove

GA4 has conversion tracking and its own attribution reports, but they only cover what GA4 can observe: sessions on your site, inside a limited lookback window, credited by a data-driven or last-click model. It tells you "this session included a form submission" and credits a traffic source. It doesn't tell you the deal was worth $48K, took four months to close, and the first touch was a LinkedIn ad three quarters ago.

Multi-touch attribution against revenue requires data from GA4, your ad platforms, your CRM, and your billing system, all joined together in a warehouse. GA4 alone gives you credit for on-site conversions, which is better than nothing and worse than reality.
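
One simple form of multi-touch attribution is a linear model, where every touch before a deal closes gets an equal share of the deal amount. The sketch below illustrates the idea on inline sample rows. The `touches` and `deals` tables, their columns, and all values are hypothetical; a real model would build `touches` from GA4 sessions and ad-platform data.

```sql
-- Linear multi-touch sketch: each qualifying touch gets an equal share.
-- Inline rows are invented sample data, not real results.
WITH touches AS (
    SELECT * FROM UNNEST([
        STRUCT('c1' AS contact_id, 'linkedin' AS channel, DATE '2025-07-10' AS touch_date),
        STRUCT('c1', 'google_ads', DATE '2025-11-02'),
        STRUCT('c1', 'organic',    DATE '2026-01-20'),
        STRUCT('c2', 'google_ads', DATE '2026-02-01')
    ])
),
deals AS (
    SELECT * FROM UNNEST([
        STRUCT('d1' AS deal_id, 'c1' AS contact_id, 48000 AS deal_amount, DATE '2026-02-15' AS closed_date),
        STRUCT('d2', 'c2', 12000, DATE '2026-03-01')
    ])
),
credited AS (
    SELECT
        t.channel,
        -- Divide the deal by the number of touches that preceded its close
        d.deal_amount / COUNT(*) OVER (PARTITION BY d.deal_id) AS credit
    FROM deals d
    JOIN touches t
      ON t.contact_id = d.contact_id
     AND t.touch_date <= d.closed_date
)
SELECT channel, ROUND(SUM(credit), 2) AS attributed_revenue
FROM credited
GROUP BY channel
ORDER BY attributed_revenue DESC
```

Here the $48K deal is split three ways ($16K each to linkedin, google_ads, and organic), and google_ads also receives the full $12K from the second deal. A last-click model would have given organic the entire $48K.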

Whether your content actually converts

GA4 can tell you a blog post got 2,000 sessions. It can't tell you that 3 of those sessions became SQLs and one closed at $120K ARR — unless you've piped CRM data back into GA4 via audiences or measurement protocol, which most teams haven't done.

The GA4 UI vs. the BigQuery export

This distinction matters more than most teams realize.

GA4's UI shows you aggregated, pre-processed, and sometimes sampled data. It's designed for quick answers: "how many sessions this week?" It applies thresholding (rows are withheld when user counts are low), sampling (in explorations over large event volumes), and consent-mode modeling. The numbers in the UI are approximations.

GA4's BigQuery export gives you raw, event-level data. Every event, every parameter, every user property, every timestamp — one row per event. No sampling. No thresholding. This is the real data.

A typical query against the export looks like this:

-- One row per event in the GA4 BigQuery export
SELECT
    event_date,
    event_timestamp,
    event_name,
    user_pseudo_id,
    -- Traffic source (the user's first acquisition source, not per-session)
    traffic_source.source,
    traffic_source.medium,
    traffic_source.name AS campaign,
    -- Event parameters are nested
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_location') AS page_url,
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_title') AS page_title,
    -- Device info
    device.category AS device_type,
    geo.country
FROM `your-project.analytics_XXXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260401' AND '20260430'

The UNNEST(event_params) pattern is the first thing that surprises people. GA4's BigQuery export stores event parameters as nested key-value arrays, not flat columns. Every query that reads a parameter needs that subquery.
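
To see the pattern in isolation, here is a small self-contained sketch that runs without any GA4 data. The single inline row is invented; only the field layout (`key`, `value.string_value`, `value.int_value`) mirrors the export's `event_params` column.

```sql
-- Sketch only: one made-up event shaped like a row of the GA4 export.
WITH events AS (
    SELECT
        'page_view' AS event_name,
        [
            STRUCT('page_location' AS key,
                   STRUCT('https://example.com/pricing' AS string_value,
                          CAST(NULL AS INT64) AS int_value) AS value),
            STRUCT('ga_session_id' AS key,
                   STRUCT(CAST(NULL AS STRING) AS string_value,
                          1712000000 AS int_value) AS value)
        ] AS event_params
)
SELECT
    event_name,
    -- Scalar subquery: scan this row's array and keep the matching key
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_location') AS page_url,
    -- Numeric parameters live in int_value rather than string_value
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
FROM events
```

Each parameter is stored under exactly one of the typed value fields, so reading `string_value` for a numeric parameter returns NULL rather than an error. That is a common source of silently empty columns.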

Once you get past the schema quirks, the BigQuery export is vastly more powerful than the UI. You can:

  • Run exact counts (no sampling)
  • Build custom attribution models
  • Join to other datasets
  • Query across months without hitting UI limits
  • Pipe results into dbt models and downstream dashboards

What changes when you warehouse it

The architecture we build for most clients:

GA4 ──→ BigQuery (native export, free)
         │
Google Ads ──→ BigQuery (via Fivetran / native transfer)
         │
Meta Ads ──→ BigQuery (via Fivetran)
         │
CRM (HubSpot / Salesforce) ──→ BigQuery (via Fivetran)
         │
         ▼
      dbt models
      (staging → intermediate → mart)
         │
         ▼
   Sigma / Looker / Streamlit

Staging: clean the raw data

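-- models/staging/stg_ga4__events.sql
-- Flattens the nested export into one row per event with plain columns.
SELECT
    PARSE_DATE('%Y%m%d', event_date)               AS event_date,
    TIMESTAMP_MICROS(event_timestamp)               AS event_at,
    event_name,
    user_pseudo_id,
    -- Needed downstream to group events into sessions
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id')                   AS ga_session_id,
    -- Note: traffic_source is the user's first acquisition source,
    -- so it is identical on every event for a given user
    traffic_source.source                           AS utm_source,
    traffic_source.medium                           AS utm_medium,
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_location')                   AS page_url,
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'engagement_time_msec')            AS engagement_ms
FROM {{ source('ga4', 'events') }}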

One model. Flat columns. No more UNNEST in every downstream query.

Intermediate: sessionize and attribute

-- models/intermediate/int_ga4__sessions.sql
-- One row per session. ga_session_id is only unique within a user,
-- so the session key is (user_pseudo_id, ga_session_id).
SELECT
    user_pseudo_id,
    ga_session_id,
    MIN(event_at)       AS session_start,
    MAX(event_at)       AS session_end,
    COUNT(*)            AS event_count,
    SUM(engagement_ms)  AS total_engagement_ms,
    -- Source and medium from the earliest event in the session that has them.
    -- These are aggregates rather than window functions, so they are valid
    -- alongside GROUP BY.
    ARRAY_AGG(utm_source IGNORE NULLS ORDER BY event_at LIMIT 1)[SAFE_OFFSET(0)]
        AS session_source,
    ARRAY_AGG(utm_medium IGNORE NULLS ORDER BY event_at LIMIT 1)[SAFE_OFFSET(0)]
        AS session_medium
FROM {{ ref('stg_ga4__events') }}
GROUP BY 1, 2

Mart: blend with CRM and ads

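-- models/marts/mart_marketing_attribution.sql
-- Credits each contact's deals to their first session only, so deal
-- amounts are not multiplied by the number of sessions per visitor.
-- Assumes each contact maps to a single ga_user_id.
WITH sessions AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY user_pseudo_id
                           ORDER BY session_start) = 1 AS is_first
    FROM {{ ref('int_ga4__sessions') }}
)
SELECT
    s.session_source,
    s.session_medium,
    -- ga_session_id repeats across users, so count the composite key
    COUNT(DISTINCT CONCAT(s.user_pseudo_id, '-',
                          CAST(s.ga_session_id AS STRING))) AS sessions,
    COUNT(DISTINCT IF(s.is_first, c.contact_id, NULL))      AS leads,
    SUM(IF(s.is_first, d.deal_amount, 0))                   AS pipeline_value
FROM sessions s
LEFT JOIN {{ ref('stg_hubspot__contacts') }} c
    ON s.user_pseudo_id = c.ga_user_id
LEFT JOIN {{ ref('stg_hubspot__deals') }} d
    ON c.contact_id = d.contact_id
GROUP BY 1, 2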

Now you can answer: "LinkedIn organic drove 340 sessions last month, which produced 12 leads and $180K in pipeline." GA4 alone could only tell you about the 340 sessions.

The questions the warehouse answers

| Question | GA4 alone | GA4 + warehouse |
| --- | --- | --- |
| How many sessions last month? | Yes | Yes |
| Which blog post gets the most traffic? | Yes | Yes |
| Which channel drives the most pipeline? | No (no CRM data) | Yes |
| What's the CAC by channel? | No (no spend or deal data) | Yes |
| Did that campaign produce ROI? | Partial (conversions only) | Yes (full-funnel) |
| Which content converts, not just attracts? | No | Yes (join sessions to deals) |
| Is a specific account engaging with our site? | No (anonymous only) | Yes (match via CRM) |
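
As one worked example from the table, CAC by channel is ad spend divided by new customers won through that channel. The sketch below uses invented inline figures; in practice `spend` would come from the ad-platform tables and `customers` from closed-won deals in the CRM, and both table names here are hypothetical.

```sql
-- CAC sketch: spend per channel divided by customers acquired.
-- All numbers are made up for illustration.
WITH spend AS (
    SELECT * FROM UNNEST([
        STRUCT('google_ads' AS channel, 6000.0 AS spend),
        STRUCT('meta_ads', 3000.0),
        STRUCT('linkedin_ads', 2000.0)
    ])
),
customers AS (
    SELECT * FROM UNNEST([
        STRUCT('google_ads' AS channel, 4 AS new_customers),
        STRUCT('meta_ads', 1),
        STRUCT('linkedin_ads', 0)
    ])
)
SELECT
    s.channel,
    s.spend,
    c.new_customers,
    -- SAFE_DIVIDE returns NULL instead of failing when a channel has no customers
    SAFE_DIVIDE(s.spend, c.new_customers) AS cac
FROM spend s
LEFT JOIN customers c
    ON s.channel = c.channel
ORDER BY s.channel
```

With these figures google_ads comes out at 1,500 per customer, meta_ads at 3,000, and linkedin_ads at NULL because it spent money without winning a customer. That last row is worth surfacing rather than hiding.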

The cost

GA4's BigQuery export costs nothing to enable. On standard (free) properties the daily export is capped at 1M events/day per property. Most SMBs and mid-market companies won't hit that limit.

BigQuery storage and compute are pay-per-use. A typical marketing analytics warehouse — GA4 + 2-3 ad platforms + CRM — costs $50–200/month in BigQuery compute. Fivetran connectors for the ad platforms and CRM add another $50–200/month depending on volume.
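
To check where your own project lands in that range, one option is to total the bytes billed from BigQuery's job metadata. This is a rough sketch under stated assumptions: it assumes your datasets live in the `us` multi-region, that you are on on-demand pricing, and that the per-TiB price in the query (6.25 USD) is still current. Verify it against the BigQuery pricing page before relying on the number.

```sql
-- Rough estimate of on-demand query spend over the last 30 days.
-- Assumptions: `us` multi-region, on-demand billing, 6.25 USD per TiB.
SELECT
    DATE_TRUNC(DATE(creation_time), MONTH)                  AS month,
    ROUND(SUM(total_bytes_billed) / POW(1024, 4), 3)        AS tib_billed,
    ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS est_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  -- Skip parent script jobs so their child statements are not double counted
  AND statement_type != 'SCRIPT'
GROUP BY month
ORDER BY month
```

This covers query compute only. Storage and connector fees are billed separately and will not show up here.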

The expensive part isn't the infrastructure. It's getting the models right — the attribution logic, the join keys between GA4 and your CRM, the definitions of "lead" and "pipeline" that everyone agrees on.

The honest take

GA4 is good at what it does. It's a reliable, free, event-level analytics tool with a generous BigQuery export. The mistake is treating it as the end of the analytics stack instead of the beginning.

The moment you need to answer "which of these sessions turned into money?" — and every B2B company eventually needs to — GA4 alone can't help you. It needs a warehouse, a transformation layer, and a join key to your CRM.

GA4 tells you someone visited your pricing page. Your warehouse tells you they visited three times, came from a Google Ad, and their company just raised a Series B.

One of those is a pageview. The other is a signal.


We've built GA4 → BigQuery → dbt pipelines for marketing agencies, SaaS companies, and e-commerce brands. If your GA4 data is sitting in a silo, book a discovery call and we'll show you what it looks like when it joins the rest of your data.
