Internal Link Optimisation

In consultation with your team, we design a customised strategy for your internal link optimisation project. We take numerous factors into consideration, including your goals and objectives as well as the technical aspects of your setup.

These factors differ from site to site, so careful consideration of your specific setup and objectives is essential in preparation for crawling, content extraction and link recommendations.

Crawling

We typically use your sitemap files as the starting point for URL discovery and crawl list generation. Additional URLs can be added from an arbitrary number of sources, including programmatic generation. A user-agent based crawler then proceeds as either a single-threaded or multi-threaded process, as gently or as aggressively as we design it to be.
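As an illustration of the URL discovery step, here is a minimal sketch that reads the URLs out of a sitemap and merges them with URLs from other sources into a de-duplicated crawl list. It assumes a standard sitemaps.org XML file; the function names, the example sitemap and the example.com URLs are hypothetical, and fetching the sitemap over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value in a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def build_crawl_list(sitemap_xml: str, extra_urls=()) -> list[str]:
    """Merge sitemap URLs with extra sources, dropping duplicates, keeping order."""
    seen = set()
    crawl_list = []
    for url in [*urls_from_sitemap(sitemap_xml), *extra_urls]:
        if url not in seen:
            seen.add(url)
            crawl_list.append(url)
    return crawl_list


# Hypothetical example sitemap (illustrative URLs only).
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""

crawl_list = build_crawl_list(
    EXAMPLE_SITEMAP,
    extra_urls=["https://example.com/blog/", "https://example.com/new/"],
)
```

The same `.//sm:loc` lookup also picks up the child sitemap locations in a sitemap index file, so a real crawler would need to distinguish the two cases and fetch the child sitemaps in turn.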

Data Extraction

Raw HTML is then processed to extract clean, meaningful data. This typically involves extracting text from article or body elements, or from custom ids and classes, taking care to exclude boilerplate elements such as nav, sidebar and footer.
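A minimal sketch of boilerplate-free text extraction, using only Python's standard library, might look like the following. The set of skipped tags is an assumption (the sidebar is taken to be an aside element here), and the class and function names are hypothetical; a production setup would more likely target site-specific ids and classes.

```python
from html.parser import HTMLParser

# Assumed boilerplate containers; real sites often need custom ids/classes.
SKIP_TAGS = {"nav", "aside", "footer", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text nodes that are not nested inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside any skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def extract_clean_text(html: str) -> str:
    """Return the page text with boilerplate elements removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><body><nav>Home</nav>"
    "<article><h1>Title</h1><p>Body text.</p></article>"
    "<footer>Copyright</footer></body></html>"
)
clean = extract_clean_text(page)
```

Tracking a depth counter rather than a boolean flag keeps the extractor correct when skipped elements are nested, for example a nav inside a footer.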

The system also finds and maps all your internal links, generates a link graph and calculates internal PageRank. This provides a more nuanced insight into internal page connectivity, can further inform the link recommendation strategy and helps fine-tune the final output. Internal page authority is also useful for creating before-and-after optimisation projections and linking them with your business goals.
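To make the internal PageRank step concrete, here is a minimal sketch of the textbook power-iteration method over a link graph. The damping factor of 0.85, the iteration count, the function name and the toy graph are all assumptions for illustration, not a description of the production system.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links maps each URL to the list of internal URLs it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank across the whole site.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: the home page links to two articles,
# and each article links back to the home page.
toy_graph = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/"]}
ranks = internal_pagerank(toy_graph)
```

In this toy graph the home page ends up with the highest score because both articles link to it, while the two articles receive equal scores.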

The final outcome of data extraction is:

Clean text
Page metadata
Internal link graph, anchor text and PageRank

BERT Vector Embeddings

Pre-processed and tokenised text is then converted to multi-dimensional vectors in the form of language-agnostic BERT sentence embeddings. This enables similarity searches among pages in any major human language, making the approach suitable for multilingual websites with complex alternate hreflang setups.

Similarity Search

In this stage we use cosine similarity to generate a similarity matrix, from which we can produce an arbitrary number of link recommendations in a many-to-many scenario. This maps all similar pages across the entire dataset.
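The similarity search can be sketched in plain Python as follows. The two-dimensional vectors, URLs and function names are hypothetical stand-ins: real sentence embeddings have hundreds of dimensions, and at scale a vectorised or approximate nearest-neighbour approach would likely replace this pairwise loop.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_recommendations(embeddings, k=2):
    """For each URL, return the k most similar other URLs, best first.

    embeddings maps each URL to its vector.
    """
    recommendations = {}
    for url, vector in embeddings.items():
        scored = [
            (other, cosine_similarity(vector, other_vector))
            for other, other_vector in embeddings.items()
            if other != url
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        recommendations[url] = scored[:k]
    return recommendations


# Toy 2-dimensional "embeddings" for three hypothetical pages.
toy_embeddings = {
    "/shoes": [1.0, 0.0],
    "/boots": [0.9, 0.1],
    "/recipes": [0.0, 1.0],
}
recs = top_recommendations(toy_embeddings, k=1)
```

Because cosine similarity depends only on the angle between vectors, pages of very different lengths can still be matched as long as their embeddings point in a similar direction.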

Prioritisation

Your similarity matrix can hold many thousands, even millions, of link recommendations, so it is important to prioritise the rollout carefully, using heuristics to find high-impact link recommendations and filter out the rest.

Example rules:

Only link from higher-PageRank URLs to lower-PageRank URLs
Exclude pages which already link to each other
Prevent linking between pages that are too far apart in authority
Suggest links only between certain topical clusters
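The example rules above can be sketched as a simple filter over candidate links. Several details are assumptions made for illustration: "already link to each other" is treated as a link in either direction, "too far apart in authority" as a PageRank ratio above a hypothetical max_ratio threshold, and the cluster rule as requiring both pages to share one cluster. All names and figures are hypothetical.

```python
def prioritise(candidates, pagerank, existing_links, clusters, max_ratio=10.0):
    """Filter and sort candidate links.

    candidates: iterable of (source, target, similarity) tuples.
    pagerank: URL -> internal PageRank score.
    existing_links: set of (source, target) pairs already on the site.
    clusters: URL -> topical cluster label.
    """
    kept = []
    for source, target, score in candidates:
        pr_source, pr_target = pagerank[source], pagerank[target]
        # Rule 1: only link from higher-PageRank to lower-PageRank URLs.
        if pr_source <= pr_target:
            continue
        # Rule 2 (assumed: either direction counts as already linked).
        if (source, target) in existing_links or (target, source) in existing_links:
            continue
        # Rule 3 (assumed threshold): skip pairs too far apart in authority.
        if pr_source > max_ratio * pr_target:
            continue
        # Rule 4 (assumed: both pages must be in the same cluster).
        if clusters[source] != clusters[target]:
            continue
        kept.append((source, target, score))
    # Most similar pairs first.
    kept.sort(key=lambda candidate: candidate[2], reverse=True)
    return kept


pagerank = {"/a": 0.4, "/b": 0.2, "/c": 0.01, "/d": 0.1, "/e": 0.15}
clusters = {"/a": "shoes", "/b": "shoes", "/c": "shoes", "/d": "recipes", "/e": "shoes"}
existing_links = {("/e", "/a")}
candidates = [
    ("/b", "/e", 0.60),  # passes every rule
    ("/a", "/b", 0.90),  # passes every rule
    ("/b", "/a", 0.90),  # lower to higher PageRank
    ("/a", "/c", 0.80),  # too far apart in authority
    ("/a", "/d", 0.70),  # different clusters
    ("/a", "/e", 0.95),  # pages already linked
]
shortlist = prioritise(candidates, pagerank, existing_links, clusters)
```

Keeping each rule as a separate check makes it straightforward to switch individual rules on or off, or to tune thresholds such as max_ratio, for a particular site.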

Note: We do not recommend automatic linking; however, LLM-based automated evaluation is an option.

Final Recommendations

You receive a spreadsheet of all link recommendations, prioritised and sorted for implementation. Our work isn’t finished at this stage, however. We stay with you, offering advice, guidance and assistance during implementation, and help you measure and report on the impact of your internal link optimisation project.

Workflow