
Query is the strongest signal for CTR prediction

In a deep CTR analysis of 30,000 query/URL pairs, we looked for relationships between the query and basic snippet text features, including title, description and URL. Every attempt at combining features resulted in noisy signals and weak correlations. In response, we trained four separate classifiers, each specialising in a single aspect of the analysed SERP snippet. Each classifier processed the entire dataset and the results were combined.

This led to clearer correlations, better feature interpretability and the discovery of the search query as the dominant factor in CTR delta prediction. This finding highlights the importance of query intent comprehension, with significant implications for traffic modelling and projection tasks in search.

The rest of this article outlines our process and concludes with practical application in the context of SEO analysis and strategy design.

Read on or skip to the end for the TL;DR.

 

Data Pre-Processing

In our study we slice the data down to a single dimension group, for example:

  • Query: flux capacitor
  • Page: https://dejanmarketing.com/flux/
  • Impressions: 100
  • Clicks: 10
  • Position: 5
  • Device: Desktop
  • Country: USA

Data Source & Acquisition

Google Search Console data was pulled in via the API for a 90-day range with the following dimensions and metrics:

  • Query
  • Page
  • Impressions
  • Clicks
  • Position
  • Device
  • Country
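As a sketch, the pull described above can be expressed against the Search Console API (`searchanalytics.query` endpoint); the site URL, credential setup and row limit are assumptions. Note that in the API only query, page, device and country are dimensions, while clicks, impressions, CTR and position come back as metrics on each row:

```python
from datetime import date, timedelta

# Build the request body for a 90-day Search Console pull. The row limit
# of 25,000 is an illustrative assumption.
def build_gsc_request(days: int = 90, row_limit: int = 25000) -> dict:
    end = date.today()
    start = end - timedelta(days=days)
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": ["query", "page", "device", "country"],
        "rowLimit": row_limit,
    }

# With an authorised client (google-api-python-client), the call would be:
# service = build("searchconsole", "v1", credentials=creds)
# rows = service.searchanalytics().query(
#     siteUrl="https://dejanmarketing.com/", body=build_gsc_request()
# ).execute().get("rows", [])
```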

Query-URL Pair Canonicalisation

Since each query can have multiple target URLs, we determine the canonical URL for each query by selecting its top URL by clicks.

This step is key in reducing bias and noise in CTR average calculations.
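A minimal sketch of that canonicalisation step, assuming the GSC rows are in a pandas DataFrame; the column names are assumptions:

```python
import pandas as pd

# For each query, keep the URL that earned the most clicks.
def canonicalise(df: pd.DataFrame) -> pd.DataFrame:
    top = df.loc[df.groupby("query")["clicks"].idxmax()]
    return top.reset_index(drop=True)

rows = pd.DataFrame({
    "query":  ["flux capacitor", "flux capacitor", "seo by dejan"],
    "page":   ["/flux/", "/flux-v2/", "/"],
    "clicks": [10, 3, 120],
})
print(canonicalise(rows))  # one row per query, highest-click URL retained
```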

Query Intent Classification

User behaviour differs vastly between query types, so we employ our custom-made query intent classifier to label the query dataset into:

LABEL_0: ‘Commercial’
LABEL_1: ‘Non-Commercial’
LABEL_2: ‘Branded’
LABEL_3: ‘Non-Branded’
LABEL_4: ‘Informational’
LABEL_5: ‘Navigational’
LABEL_6: ‘Transactional’
LABEL_7: ‘Commercial Investigation’
LABEL_8: ‘Local’
LABEL_9: ‘Entertainment’

This step is key because it allows for intent-specific CTR average calculations. The query “seo by dejan” at #1 will have a 30% CTR, while the query “seo testimonials” at #1 will have a 1.8% CTR. Treat them equally and you’ll end up with noisy CTR averages. Splitting calculations into branded and non-branded queries addresses the biases associated with brand-related query intent.
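The intent-specific baseline can be sketched as a per-(intent, position) aggregation rather than a site-wide average; the figures and column names below are illustrative assumptions chosen to mirror the branded/non-branded gap described above:

```python
import pandas as pd

# Expected CTR per (intent, position) bucket, so branded queries no longer
# inflate the baseline applied to non-branded ones.
data = pd.DataFrame({
    "intent":      ["Branded", "Branded", "Non-Branded", "Non-Branded"],
    "position":    [1, 1, 1, 1],
    "impressions": [1000, 500, 2000, 1500],
    "clicks":      [300, 160, 36, 27],
})
baselines = data.groupby(["intent", "position"])[["clicks", "impressions"]].sum()
baselines["expected_ctr"] = baselines["clicks"] / baselines["impressions"]
print(baselines["expected_ctr"])
```

With these toy numbers, Branded at #1 lands near 30% and Non-Branded at #1 at 1.8%; a single blended average would sit uselessly in between.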

With clean CTR statistics it’s possible to do three levels of traffic projections:

  1. CTR Improvement scenarios
  2. Rank Improvement scenarios
  3. Hybrid scenarios combining gains in both CTR and position

With the above it’s then possible to add conversion metrics, such as conversion rate and conversion value, and produce rudimentary ROI estimates signalling which area of SEO improvement will lead to which financial outcomes.

Important: When projecting clicks for rank-improvement scenarios, it’s advisable to scale the projection in proportion to the query’s current performance. In other words, if a query underperforms its baseline by 10% at rank 2, it will likely continue to underperform by 10% at rank 1. Similarly, if a query performs better than average, projecting its move up the rankings from the site average alone would understate its likely click gains.
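That proportional scaling can be sketched as follows; the baseline CTR figures are illustrative assumptions:

```python
# Carry the query's current performance ratio against its baseline forward
# to the projected rank, rather than applying the site average naively.
def project_clicks(impressions, current_ctr, baseline_current, baseline_target):
    ratio = current_ctr / baseline_current  # e.g. 0.9 if underperforming by 10%
    return impressions * baseline_target * ratio

# A query underperforming by 10% at rank 2 keeps that handicap at rank 1:
clicks = project_clicks(
    impressions=1000,
    current_ctr=0.09,        # observed CTR at rank 2
    baseline_current=0.10,   # expected CTR at rank 2 for this intent
    baseline_target=0.20,    # expected CTR at rank 1 for this intent
)
print(round(clicks, 1))  # 180.0 rather than a naive 200.0
```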

The next step is to scrape and extract metadata for each page and store it in a database. We use a combination of trafilatura and Beautiful Soup for the task: one for full-page content, the other for more precise extraction of titles and descriptions. Later down the track, this metadata fits into the following structure:

Query [SEP] Title [SEP] Description [SEP] URL [SEP] CTR Delta
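A minimal sketch of assembling that structure into a training sample, assuming the scraped title and description are already in hand; the 5% neutral band used to bucket deltas into three classes is an assumption:

```python
# Join the snippet features with [SEP] and bucket the CTR delta into the
# [-1, 0, 1] label scheme used during training.
def make_sample(query, title, description, url, ctr_delta, band=0.05):
    text = " [SEP] ".join([query, title, description, url])
    if ctr_delta > band:
        label = 1
    elif ctr_delta < -band:
        label = -1
    else:
        label = 0
    return text, label

text, label = make_sample(
    "flux capacitor",
    "Flux Capacitor Guide",
    "Everything about flux capacitors.",
    "https://dejanmarketing.com/flux/",
    ctr_delta=-0.20,
)
print(label)  # -1: CTR is 20% below expectation
```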

CTR Delta Calculation

Equipped with clean and unbiased CTR statistics, we’re able to calculate CTR deltas for each query/URL pair. In the example provided, we see a non-branded query page receiving a 20% lower CTR than expected for that site, resulting in a loss of 500 clicks. This holds true after we filter for that exact country, device and query type.

With a reliable CTR Delta metric we can calculate the click loss and flag that query for investigation.
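The delta and click-loss arithmetic can be sketched as follows, with the expected CTR coming from the intent/device/country-matched baseline; the impression count is an illustrative assumption chosen to reproduce the worked example above:

```python
# Relative deviation of observed CTR from the matched baseline.
def ctr_delta(observed_ctr, expected_ctr):
    return (observed_ctr - expected_ctr) / expected_ctr

# Clicks lost (positive) or gained (negative) versus expectation.
def click_loss(impressions, observed_ctr, expected_ctr):
    return impressions * (expected_ctr - observed_ctr)

# 20% below expectation, 500 clicks lost:
print(round(ctr_delta(0.08, 0.10), 2))       # -0.2
print(round(click_loss(25000, 0.08, 0.10)))  # 500
```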

Is the cause of the poor CTR in the snippet itself, or is there something else at play?

We won’t be going into special SERP features and their impact on CTR as it is not the focus of this particular workflow.

Believe it or not, all that was only the preliminaries.

As with anything related to machine learning, most of the time is spent getting your data clean and ready. And now we’re ready: we have the CTR stats, the deltas, queries, titles, descriptions and URLs.

Training Classifiers

We fine-tune four small transformers on separate tasks and combine the predictions of each specialist model:

  1. query transformer
  2. title transformer
  3. description transformer
  4. url transformer

Each model classifies whether the query/snippet combination will result in a better or worse CTR than average. It also provides a handy softmax confidence which can be used to scale predictions when calculating the final score.

Here’s the full setup:

├── 2024-08-17T14-24_export.csv
├── albert-ctr-description-model
│ ├── config.json
│ ├── model.safetensors
│ ├── special_tokens_map.json
│ ├── spiece.model
│ └── tokenizer_config.json
├── albert-ctr-query-model
│ ├── checkpoints
│ │ ├── checkpoint-1553
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ ├── checkpoint-1582
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ ├── checkpoint-3106
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ ├── checkpoint-3164
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ ├── checkpoint-4659
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ └── checkpoint-4746
│ │ ├── config.json
│ │ ├── model.safetensors
│ │ ├── optimizer.pt
│ │ ├── rng_state.pth
│ │ ├── scheduler.pt
│ │ ├── trainer_state.json
│ │ └── training_args.bin
│ ├── config.json
│ ├── model.safetensors
│ ├── special_tokens_map.json
│ ├── spiece.model
│ └── tokenizer_config.json
├── albert-ctr-title-model
│ ├── checkpoints
│ │ ├── checkpoint-1582
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ ├── checkpoint-3164
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ └── checkpoint-4746
│ │ ├── config.json
│ │ ├── model.safetensors
│ │ ├── optimizer.pt
│ │ ├── rng_state.pth
│ │ ├── scheduler.pt
│ │ ├── trainer_state.json
│ │ └── training_args.bin
│ ├── config.json
│ ├── model.safetensors
│ ├── special_tokens_map.json
│ ├── spiece.model
│ └── tokenizer_config.json
├── albert-ctr-url-model
│ ├── checkpoints
│ │ ├── checkpoint-1582
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ ├── checkpoint-3164
│ │ │ ├── config.json
│ │ │ ├── model.safetensors
│ │ │ ├── optimizer.pt
│ │ │ ├── rng_state.pth
│ │ │ ├── scheduler.pt
│ │ │ ├── trainer_state.json
│ │ │ └── training_args.bin
│ │ └── checkpoint-4746
│ │ ├── config.json
│ │ ├── model.safetensors
│ │ ├── optimizer.pt
│ │ ├── rng_state.pth
│ │ ├── scheduler.pt
│ │ ├── trainer_state.json
│ │ └── training_args.bin
│ ├── config.json
│ ├── model.safetensors
│ ├── special_tokens_map.json
│ ├── spiece.model
│ └── tokenizer_config.json
├── raw_training_data.csv
├── test.py
├── train_description.py
├── train_query.py
├── train_title.py
└── train_url.py

Purposeful Overfitting

All four models were driven to the point of overfitting and the checkpoint before the overfit was selected as the best model.

Overfitting is when a model is trained to the point where its weights are adjusted to best fit the features of the training data. This makes it perform exceptionally well on training data but less likely to generalise to new, never-before-seen data.

Our approach is to test the limits of the model, see at which point it overfits, and select the checkpoint saved just before that happens.

Training Configuration

  • Dataset: 31,640 samples [Query,Title,Description,URL,CTR_delta]
  • Base Model: albert-base-v2
  • Labels: 3 [-1,0,1 for negative, neutral and positive delta]
  • Input: Query
  • Target: CTR_delta
  • Training Validation Split: 80-20 
  • Max Length: 128
  • Batch Size: 16
  • Warmup Steps: 500
  • Weight Decay: 0.01
  • Checkpointing Strategy: Epoch
  • Evaluation Strategy: Epoch
  • Number of Epochs: 3
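The configuration above maps fairly directly onto Hugging Face Trainer settings. A configuration sketch for the query model follows; the dataset objects and output path are assumptions, and the argument was named evaluation_strategy before newer transformers releases renamed it eval_strategy:

```python
from transformers import (
    AlbertForSequenceClassification,
    AlbertTokenizerFast,
    Trainer,
    TrainingArguments,
)

# 3 classes: the [-1, 0, 1] deltas remapped to [0, 1, 2] for the head.
tokenizer = AlbertTokenizerFast.from_pretrained("albert-base-v2")
model = AlbertForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=3
)
args = TrainingArguments(
    output_dir="albert-ctr-query-model/checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    eval_strategy="epoch",   # "evaluation_strategy" on older transformers
    save_strategy="epoch",   # one checkpoint per epoch
)
# train_ds / eval_ds: tokenised 80/20 split of the 31,640 samples,
# encoded with max_length=128 and truncation (assumed, not shown here).
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```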

Post-Training Testing

At inference we chain all four models in a single pipeline, each one making predictions for a single SERP snippet feature:

  1. Query
  2. Title
  3. Description
  4. URL

Output includes:

  1. row_number
  2. query_model
  3. title_model
  4. description_model
  5. url_model
  6. query_model_confidence
  7. title_model_confidence
  8. description_model_confidence
  9. url_model_confidence
  10. balanced_classification
  11. true_label

Key Findings

  • Query is a strong bias signal and will impact CTR more than the snippet itself.
  • Title and description are about equally important in CTR prediction.
  • URL is a less reliable predictor of CTR performance.

We’ve developed a CTR delta prediction framework based on a pipeline of four small transformers, each dedicated to a single task.

The framework predicts whether a page/query pair will have a better or worse CTR than expected. We input a query, the models return scores, we combine the scores and return a prediction. We do this in a batch-processing pipeline.

Output of all models is combined and balanced in proportion to their softmax values. This allows more confident predictions to have a stronger vote.
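A minimal sketch of that softmax-weighted vote; the class/confidence values below are illustrative assumptions:

```python
# Each specialist casts its predicted class (-1, 0 or 1) scaled by its
# softmax confidence; the sign of the weighted sum is the final call.
def balanced_classification(predictions):
    """predictions: list of (predicted_class, softmax_confidence) pairs."""
    score = sum(cls * conf for cls, conf in predictions)
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0

votes = [
    (1, 0.91),   # query model, very confident
    (-1, 0.55),  # title model, barely over the fence
    (1, 0.62),   # description model
    (0, 0.48),   # url model, neutral
]
print(balanced_classification(votes))  # 1: confident models outvote the doubter
```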

It predicts CTR delta correctly 8 out of 10 times. This is most likely as good as it gets without adding new biases such as special SERP features.

What’s next?

Regression. CTR is not a binary value, and while our method predicts “better or worse” outcomes, it would be more helpful to understand “by how much”. A regression model will solve that problem as the fifth node in the model chain.

Its input will be the classifier quartet output combined with the snippet features and its output will be a percentage of how much the CTR will deviate from the norm.

Stay tuned.

Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211