Key Findings
- The query is a strong bias signal and affects CTR more than the snippet itself.
- Title and description are about equally important in CTR prediction.
- URL is a less reliable predictor of CTR performance.
We’ve developed a CTR delta prediction framework based on a pipeline of four small transformers, each dedicated to a single task.
The framework predicts whether the CTR for a page-query pair will be better or worse than expected. We input a query, the models return scores, and we combine the scores into a single prediction. All of this runs as a batch-processing pipeline.
The outputs of all models are combined, with each one weighted in proportion to its softmax value. This gives more confident predictions a stronger vote.
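To make the voting idea concrete, here is a minimal sketch of one way to read it: each model's two logits are turned into softmax probabilities, and the "better" probabilities are averaged (often called soft voting). The function names, the two-logit layout, and the 0.5 threshold are our assumptions for illustration, not the exact production code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_votes(model_logits: Sequence[Tuple[float, float]]) -> Tuple[str, float]:
    """Combine per-model (worse_logit, better_logit) pairs into one prediction.

    Assumption: every model emits two logits in the order (worse, better).
    Averaging the softmax probability of "better" means a confident model
    moves the average more than a hesitant one.
    """
    if not model_logits:
        raise ValueError("need at least one model output")
    p_better = [softmax(pair)[1] for pair in model_logits]
    score = sum(p_better) / len(p_better)
    label = "better" if score >= 0.5 else "worse"
    return label, score


# One confident "better" vote against two hesitant "worse" votes.
label, score = combine_votes([(0.0, 3.0), (0.2, 0.0), (0.2, 0.0)])
print(label, round(score, 3))
```

In this toy call the single confident model outweighs the two hesitant ones, which is the behavior described above. A plain majority vote would have gone the other way.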
The framework predicts the CTR delta correctly 8 out of 10 times. This is most likely as good as it gets without adding new bias signals such as special SERP features.
What’s next?
Regression. CTR is not a binary value, and while our method predicts “better or worse” outcomes, it would be more helpful to understand “by how much”. A regression model, added as the fifth node in the model chain, will solve that problem for us.
Its input will be the output of the classifier quartet combined with the snippet features, and its output will be the percentage by which the CTR is expected to deviate from the norm.
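As a rough sketch of what the fifth node's input and output could look like, the snippet below concatenates the four classifier scores with a list of snippet features and passes them through a linear stand-in for the regressor. Everything here is hypothetical: the `RegressorInput` class, the choice of snippet features, and the linear form are placeholders, since the regression model has not been built yet.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RegressorInput:
    # Assumption: one P("better") score from each of the four classifiers.
    classifier_probs: List[float]
    # Assumption: numeric snippet features; which ones to use is still open.
    snippet_features: List[float]

    def to_vector(self) -> List[float]:
        """Concatenate classifier outputs and snippet features."""
        return list(self.classifier_probs) + list(self.snippet_features)


def predict_ctr_deviation(
    x: RegressorInput, weights: Sequence[float], bias: float = 0.0
) -> float:
    """Linear placeholder for the regressor.

    Returns the predicted deviation from the expected CTR, in percent.
    The weights would come from training; none are learned here.
    """
    vec = x.to_vector()
    if len(vec) != len(weights):
        raise ValueError("weights must match the feature vector length")
    return bias + sum(w * v for w, v in zip(weights, vec))
```

The point of the sketch is the shape of the data: the classifier quartet no longer produces the final answer, it becomes part of the feature vector for the regressor.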
Stay tuned.