
Screaming Frog’s Semantic Revolution: The Tool That Thinks Like Google

Discover how you can transform your SEO efforts with the revolutionary semantic analysis features of Screaming Frog v22. Installation guide included.

Screaming Frog SEO Spider version 22.0, codenamed “Knee-Deep,” represents a watershed moment for technical SEO. This isn’t just another update: it is arguably the first mainstream SEO crawler to integrate Large Language Model embeddings directly into website analysis.

Think about this: Google has been building semantic understanding into search since at least 2013, the year its researchers published Word2Vec and the Hummingbird update rolled out. For over a decade, we’ve been analysing websites with keyword-counting tools while Google has been thinking in concepts, relationships, and meaning.

That gap just closed.

Understanding Vector Embeddings: The Map of Meaning

Vector embeddings sound technical, but the concept is elegantly simple. Imagine you have a massive map of meaning where every word, sentence, and document gets assigned coordinates based not on what letters it contains, but on what it means.

Pages about “automobiles” and “cars” would sit close together on this semantic map, even though the two words look nothing alike. Content about “dog training” and “puppy behaviour” would cluster together because they address related concepts. Meanwhile, a random page about “chocolate recipes” would sit far away from both.

This is exactly how vector embeddings work. They convert text into numerical coordinates in a high-dimensional space where semantic similarity translates to mathematical proximity. Google’s been using this technology for years to understand search queries and match them with relevant content.
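To make “mathematical proximity” concrete, here is a minimal sketch using cosine similarity, a common way to compare embeddings. The three-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from the AI provider, not from hand-typed numbers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, made up purely for this example.
embeddings = {
    "cars": [0.9, 0.1, 0.0],
    "automobiles": [0.8, 0.2, 0.1],
    "chocolate recipes": [0.0, 0.1, 0.9],
}

cars_vs_autos = cosine_similarity(embeddings["cars"], embeddings["automobiles"])
cars_vs_choc = cosine_similarity(embeddings["cars"], embeddings["chocolate recipes"])

print(f"cars vs automobiles:       {cars_vs_autos:.3f}")
print(f"cars vs chocolate recipes: {cars_vs_choc:.3f}")
```

The first score lands close to 1 (near neighbours on the map), the second close to 0 (far apart).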

Now, Screaming Frog puts a comparable capability in our hands.

Four Game-Changing Features

Duplicate and Semantically Similar Content Detection

This goes far beyond traditional duplicate content analysis. The tool can now identify pages that cover the same topic using completely different vocabulary. This is crucial for spotting content cannibalisation—where multiple pages compete for the same semantic space without you realising it.

When Screaming Frog shows similarity scores above 0.95, you’re looking at pages that are dangerously close in meaning. These are consolidation opportunities waiting to strengthen your site’s topical authority.
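If you export embeddings and want to reproduce this kind of check yourself, a rough sketch might look like the following. The URLs and vectors are hypothetical, and the pairwise comparison is an assumption about how such a filter could work, not a description of Screaming Frog’s internals.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similar_pairs(page_embeddings, threshold=0.95):
    """Return (url_a, url_b, score) for every pair scoring above the threshold."""
    pairs = []
    for (url_a, vec_a), (url_b, vec_b) in combinations(page_embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score > threshold:
            pairs.append((url_a, url_b, round(score, 3)))
    # Highest-scoring pairs first, since they are the strongest merge candidates.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

# Hypothetical pages with made-up toy embeddings.
pages = {
    "/dog-training-tips": [0.9, 0.3, 0.1],
    "/puppy-behaviour-guide": [0.88, 0.33, 0.12],
    "/chocolate-recipes": [0.1, 0.2, 0.95],
}

candidates = similar_pairs(pages)
for url_a, url_b, score in candidates:
    print(f"{url_a} <-> {url_b}: {score}")
```

Comparing every pair is fine for a few thousand pages; on very large sites you would want an approximate nearest-neighbour index instead.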

Low Relevance (Off-Topic) Content Detection

The system calculates the semantic centre of your entire website by averaging all page embeddings, then measures how similar each page is to this centre. Pages scoring below 0.4 relevance are semantic outliers that sit well away from your site’s main themes.
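The centre-and-distance idea can be sketched in a few lines. This assumes the centre is a plain element-wise mean and relevance is the cosine similarity to that mean; the page URLs and vectors are hypothetical, and the tool’s exact calculation may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def low_relevance_pages(page_embeddings, threshold=0.4):
    """URLs whose similarity to the site centroid falls below the threshold."""
    centre = centroid(list(page_embeddings.values()))
    return sorted(
        url
        for url, vector in page_embeddings.items()
        if cosine_similarity(vector, centre) < threshold
    )

# Hypothetical site: four on-topic pages and one stray recipe page.
site = {
    "/dog-training-tips": [0.9, 0.1, 0.0],
    "/puppy-behaviour-guide": [0.8, 0.2, 0.0],
    "/crate-training": [0.85, 0.15, 0.05],
    "/lead-walking": [0.9, 0.2, 0.0],
    "/chocolate-recipes": [0.0, 0.1, 0.9],
}

off_topic = low_relevance_pages(site)
print(off_topic)
```

Note that the stray page still pulls the centre slightly towards itself, which is one reason a handful of off-topic pages is easier to detect than a site that is half off-topic.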

This feature is pure gold for large websites that have accumulated off-topic content over time. You can now systematically identify and address pages that dilute your site’s focus.

Content Cluster Visualisation

The Content Cluster Diagram provides a bird’s-eye view of your site’s semantic landscape. You’ll see how related content naturally groups together, identify strong topical clusters, and spot gaps in your coverage.

Off-topic content appears as isolated dots on the diagram’s edges. This visual analysis makes strategic decisions about content architecture much clearer.

Semantic Search

Perhaps the most exciting feature: you can enter any query and find pages that are semantically similar to that query, regardless of exact keyword matches. This transcends simple keyword searching to find content based on actual meaning and intent.
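Conceptually, semantic search embeds the query with the same model as the pages and ranks pages by similarity to it. Here is a minimal sketch with a hand-typed query vector standing in for the provider’s output; all names and numbers are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vector, page_embeddings, top_k=3):
    """Rank pages by similarity to the query embedding, best match first."""
    scored = [
        (url, cosine_similarity(query_vector, vector))
        for url, vector in page_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical page embeddings (toy values).
pages = {
    "/dog-training-tips": [0.9, 0.3, 0.1],
    "/puppy-behaviour-guide": [0.88, 0.33, 0.12],
    "/chocolate-recipes": [0.1, 0.2, 0.95],
}

# Stand-in for the embedding of a query such as "how to train a puppy".
query = [0.85, 0.35, 0.1]

results = semantic_search(query, pages)
for url, score in results:
    print(f"{score:.3f}  {url}")
```

Neither dog page needs to contain the query’s exact words to rank above the recipe page.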

Setting Up Semantic Analysis: A Step-by-Step Guide

Getting these features running requires Screaming Frog SEO Spider v22 with a paid license. Here’s your setup roadmap:

Choose an AI Provider

Pick one of the supported AI providers: OpenAI, Gemini, or Ollama. Have your API key ready for the hosted services; Ollama runs models locally, so it typically does not need one.

Add Embeddings Prompt from Library

Navigate to ‘Prompt Configuration’ and use ‘Add from Library’ to select the pre-configured embeddings setting. With Gemini, this preset uses the ‘SEMANTIC_SIMILARITY’ task type, which is tuned for comparing how alike two texts are.

Connect to API

Ensure your API connection is active under ‘Account Information.’ This connection enables automatic embedding generation during crawls.

Enable HTML Storage

Go to ‘Config > Spider > Extraction’ and activate both ‘Store HTML’ and ‘Store Rendered HTML.’ This ensures the page text gets stored and becomes available for vector embedding generation.

Activate Embeddings Functionality

Navigate to ‘Config > Content > Embeddings’ and turn on ‘Enable Embedding functionality.’ Also, check ‘Semantic Similarity’ and ‘Low Relevance’ options to display relevant columns and filters in the ‘Content’ tab.

Crawl Your Website

Enter your target website URL and hit ‘Start.’ Wait for both crawl and API progress bars to reach 100%.

Run Crawl Analysis

For ‘Semantically Similar’ and ‘Low Relevance Content’ filters to work, initiate analysis after crawling completes. Do this manually via ‘Crawl Analysis > Start’ or enable ‘Auto-Analyse at End of Crawl’ under ‘Crawl Analysis > Configure.’

Review Results

Examine ‘Semantically Similar’ and ‘Low Relevance Content’ filters in the ‘Content’ tab. Also, explore the ‘Content Cluster Diagram’ under ‘Visualisations.’

Real-World Applications That Transform SEO

Content Audits and Strategy

Identify semantically similar pages competing for the same search intent. Consolidate these pages to strengthen topical authority. Find low-relevance content that dilutes your site’s focus and either update, merge, or remove these pages. Use the Content Cluster Diagram to spot content gaps and plan strategic expansion.

Internal Linking Optimisation

The “Duplicate Details” tab and semantic similarity filters reveal logical internal linking opportunities between related content. This improves user experience, site navigation, and page authority distribution. Semantic search helps you quickly find all content related to specific topics for strategic internal link building.

Keyword Mapping and Relevance Calculations

Measure how semantically relevant your pages are to specific keywords, moving beyond simple keyword density metrics. Vectorise your keyword lists and compare them with page embeddings to determine which pages best match which queries. Calculate relevance with cosine similarity between page and keyword embeddings, then rescale the result to a 0-100 score so it is easier to read and compare.
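A small sketch of that keyword-to-page mapping follows. The 0-100 rescaling used here (clamp the cosine to the 0 to 1 range, then multiply by 100) is one simple convention chosen for the example, and the keywords, URLs, and vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def relevance_score(keyword_vector, page_vector):
    """Cosine similarity clamped to [0, 1] and rescaled to 0-100."""
    similarity = cosine_similarity(keyword_vector, page_vector)
    return round(min(1.0, max(0.0, similarity)) * 100, 1)

def map_keywords_to_pages(keyword_embeddings, page_embeddings):
    """For each keyword, pick the page with the highest relevance score."""
    mapping = {}
    for keyword, keyword_vector in keyword_embeddings.items():
        scores = {
            url: relevance_score(keyword_vector, page_vector)
            for url, page_vector in page_embeddings.items()
        }
        best_url = max(scores, key=scores.get)
        mapping[keyword] = (best_url, scores[best_url])
    return mapping

# Hypothetical keyword and page embeddings (toy values, same model assumed).
keywords = {
    "puppy training": [0.9, 0.3, 0.1],
    "dessert ideas": [0.1, 0.1, 0.9],
}
pages = {
    "/dog-training-tips": [0.85, 0.35, 0.1],
    "/chocolate-recipes": [0.1, 0.2, 0.95],
}

keyword_map = map_keywords_to_pages(keywords, pages)
for keyword, (url, score) in keyword_map.items():
    print(f"{keyword!r} -> {url} ({score}/100)")
```

The keyword vectors must come from the same embedding model as the page vectors, otherwise the scores are not comparable.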

Redirect Mapping During Site Migrations

Use semantic analysis to map old URLs to their most semantically similar new URLs during site migrations. This approach minimises soft 404s that Google might detect when traditional exact-match redirects fall short.
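As a rough sketch, redirect mapping boils down to finding the nearest new page for each old page and flagging weak matches for a human to check. The `min_score` cut-off of 0.8 is an arbitrary assumption for this example, as are the URLs and vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def map_redirects(old_pages, new_pages, min_score=0.8):
    """Match each old URL to its most similar new URL; flag weak matches."""
    mapping = {}
    for old_url, old_vector in old_pages.items():
        best_url, best_score = max(
            (
                (new_url, cosine_similarity(old_vector, new_vector))
                for new_url, new_vector in new_pages.items()
            ),
            key=lambda item: item[1],
        )
        mapping[old_url] = {
            "target": best_url,
            "score": round(best_score, 3),
            # Below the cut-off, the "best" match may still be a poor one.
            "needs_review": best_score < min_score,
        }
    return mapping

# Hypothetical embeddings from a crawl of the old site and the new site.
old_site = {
    "/old/dog-training": [0.9, 0.3, 0.1],
    "/old/press-2014": [0.2, 0.9, 0.1],
}
new_site = {
    "/guides/training-your-dog": [0.88, 0.33, 0.12],
    "/recipes/chocolate": [0.1, 0.2, 0.95],
}

redirects = map_redirects(old_site, new_site)
for old_url, match in redirects.items():
    print(old_url, "->", match)
```

The review flag matters: redirecting a page to a weakly related target is exactly the kind of mismatch that can be treated as a soft 404.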

Competitor Analysis

Crawl competitor websites to analyse their content clusters and semantic relevance patterns. See how comprehensively competitors answer specific queries compared to your content. Use these insights to fine-tune your own content strategy.

Link Building Target Identification

Analyse potential link source pages to determine their semantic relevance to your target pages objectively. This analysis helps you avoid irrelevant links and build more valuable, contextually appropriate backlinks.

Maximising Results: Tips and Limitations

Optimise Content Area

Embedding quality depends directly on the text you feed in. Exclude repetitive template text like menus, footers, and cookie notices for more accurate embeddings. Use the HTML tag, class, or ID inclusion/exclusion options under ‘Config > Content > Area.’

Boost Performance

Reducing vector embedding dimensions can significantly improve processing speed on lower-performance machines. This optimisation saves time when analysing large websites.

Handle Large Pages

Very long pages might exceed AI providers’ token limits and disrupt analysis. Enable ‘Limit Page Content’ to restrict content to specific character limits when encountering these situations.
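If you generate embeddings outside the tool, you can apply the same idea yourself. This is a simple sketch of a character limit that avoids cutting a word in half; it is not how ‘Limit Page Content’ is implemented, and the function name is hypothetical.

```python
def limit_page_content(text, max_chars):
    """Trim text to at most max_chars characters without splitting a word."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # If the cut landed mid-word, back up to the previous space.
    if not text[max_chars].isspace() and " " in clipped:
        clipped = clipped[: clipped.rfind(" ")]
    return clipped.rstrip()

sample = "semantic analysis of long pages"
trimmed = limit_page_content(sample, 12)
print(repr(trimmed))
```

Character limits are only a proxy for token limits, so leave some headroom rather than aiming for the provider’s exact maximum.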

Adopt Hybrid Approaches

Semantic similarity excels at finding pages that cover the same topic in different words, while traditional text-matching algorithms are better suited to exact and near-duplicate content. Using both methods in parallel provides more comprehensive and reliable results.
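To see what the text-matching side of a hybrid approach catches, here is a sketch using Jaccard similarity over three-word shingles, a classic near-duplicate measure. It is shown as a generic technique, not as the algorithm Screaming Frog uses, and the sample sentences are invented.

```python
def shingles(text, size=3):
    """Set of overlapping word n-grams (shingles) from the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_similarity(text_a, text_b, size=3):
    """Shared shingles divided by all distinct shingles across both texts."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

original = "how to train your new puppy at home"
near_duplicate = "how to train your new puppy at school"
paraphrase = "teaching a young dog good behaviour indoors"

near_score = jaccard_similarity(original, near_duplicate)
paraphrase_score = jaccard_similarity(original, paraphrase)

print(f"near duplicate: {near_score:.2f}")
print(f"paraphrase:     {paraphrase_score:.2f}")
```

The near duplicate scores high, while the paraphrase scores zero because it shares no word sequences with the original. An embedding comparison would be expected to rate the paraphrase as closely related, which is why the two methods complement each other.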

Maintain Human Analysis

Despite this tool’s power, results won’t always be perfect or reliable in all scenarios. Always apply critical thinking and contextual analysis rather than blindly trusting data.

Ensure Model Consistency

Avoid comparing embeddings from different sources. They might have a different number of dimensions or come from different language models, in which case similarity scores between them are meaningless.
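One way to enforce this in your own scripts is to store the model name alongside each vector and refuse to compare mismatched pairs. A minimal sketch, with hypothetical model labels and a hypothetical `Embedding` record:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Embedding:
    model: str      # label of the model that produced the vector (hypothetical)
    vector: tuple   # the embedding values

def check_comparable(a, b):
    """Raise ValueError if two embeddings should not be compared."""
    if a.model != b.model:
        raise ValueError(f"different models: {a.model!r} vs {b.model!r}")
    if len(a.vector) != len(b.vector):
        raise ValueError(
            f"different dimensions: {len(a.vector)} vs {len(b.vector)}"
        )

page = Embedding("model-a", (0.9, 0.1, 0.0))
keyword = Embedding("model-a", (0.8, 0.2, 0.1))
legacy = Embedding("model-b", (0.7, 0.3))

check_comparable(page, keyword)  # same model, same size: passes silently

try:
    check_comparable(page, legacy)
except ValueError as error:
    print(f"refusing to compare: {error}")
```

Matching dimensions alone is not enough: two different models can output vectors of the same size that live in entirely unrelated spaces, which is why the model label is checked first.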
