Seven SEO reports from a log file.
Most SEO teams rely on crawlers and Google Search Console (GSC) to understand what search bots actually do on a site. Both have gaps that server access logs fill. I built seo-log-auditor because I kept doing the same log analysis manually on client projects, so I packaged it into a Streamlit app anyone can clone and launch with a single streamlit command.
Crawlers don’t tell the full story.
Screaming Frog and Botify are excellent for understanding what a site looks like from a crawler’s perspective. But they can only see what they’re allowed to see. Server access logs are the ground truth: every single HTTP request that hit the server, from every bot, regardless of robots.txt.
On every technical SEO audit I’ve run, the biggest leaks were invisible to crawlers. Budget wasted on redirect chains that resolved outside the crawl window. Bots hammering URLs with session parameters that no crawler would follow. Pages marked indexable but never touched by Googlebot in months.
The fix was always the same: open the logs, write some pandas, repeat. So I built the tool.
Drop a log, get a dashboard.
The app accepts a raw log file via the Streamlit file uploader, parses it with regex, and pipes the resulting dataframe into seven tabbed analysis views. No database, no cloud dependency. Runs on a laptop.
# run locally in under 30 seconds
git clone https://github.com/hitensangani/seo-log-auditor
cd seo-log-auditor
pip install -r requirements.txt
streamlit run app.py
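To make the parsing step concrete, here is a minimal sketch of regex-based log parsing. It assumes the Apache/Nginx "combined" log format; the pattern and the names LOG_PATTERN, parse_line and parse_log are illustrative and not taken from the repository. It uses only the standard library, so it returns a list of dicts rather than a dataframe.

```python
import re
from datetime import datetime

# Assumption: Apache/Nginx "combined" log format. Other formats need another pattern.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # Timestamps look like 10/Oct/2023:13:55:36 +0000
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    return row


def parse_log(lines):
    """Parse an iterable of raw lines, silently skipping malformed ones."""
    return [row for row in map(parse_line, lines) if row is not None]
```

Passing the result to pandas.DataFrame(rows) would give one row per request, which is the shape the views below work from.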
Seven views, one log file.
Bot vs human traffic
Splits requests by user-agent, breaks down which bots are consuming budget, and highlights bot-to-human ratios that signal crawl inefficiency.
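A rough sketch of the user-agent split, assuming simple substring matching. The BOT_TOKENS list is a short hypothetical sample, and "human" here only means "did not match a known bot token", so treat the numbers as an upper bound on human traffic.

```python
from collections import Counter

# Hypothetical, deliberately short token list; a real one would be much longer.
BOT_TOKENS = {
    "googlebot": "Googlebot",
    "bingbot": "Bingbot",
    "yandexbot": "YandexBot",
    "duckduckbot": "DuckDuckBot",
}


def classify(user_agent):
    """Return a bot name if a known token appears in the user-agent, else 'human'."""
    ua = user_agent.lower()
    for token, name in BOT_TOKENS.items():
        if token in ua:
            return name
    return "human"


def traffic_split(user_agents):
    """Count requests per bot name, with everything else lumped under 'human'."""
    return Counter(classify(ua) for ua in user_agents)


def bot_share(counts):
    """Fraction of all requests that came from recognised bots."""
    total = sum(counts.values())
    return (total - counts["human"]) / total if total else 0.0
```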
Pages bots find, humans don’t
Cross-references bot-accessed URLs against known sitemap and internal link signals to surface URLs that receive crawl attention but no organic equity.
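The sitemap and internal-link cross-reference needs data beyond the log, so this sketch swaps in a simpler comparison that works from the log alone: URLs requested by bots but never by a non-bot client. The function name and the (url, is_bot) input shape are assumptions made for illustration.

```python
def bot_only_urls(requests):
    """Return URLs that bots requested but no non-bot client did.

    requests: iterable of (url, is_bot) pairs, one per log line.
    """
    bot_urls, human_urls = set(), set()
    for url, is_bot in requests:
        (bot_urls if is_bot else human_urls).add(url)
    # Set difference: crawled, but no human traffic in this log window.
    return sorted(bot_urls - human_urls)
```

The result only covers the period the log spans, so a short log will overstate how many URLs look bot-only.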
4xx and 5xx crawl cost
Ranks the most-crawled 4xx and 5xx URLs by bot hits, so you can see exactly how much budget is being spent on broken or server-error responses.
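One way to sketch this ranking, assuming the input has already been filtered to bot requests. The function names and the (url, status) input shape are illustrative.

```python
from collections import Counter


def error_crawl_cost(bot_hits, top_n=10):
    """Rank (url, status) pairs with status >= 400 by number of bot hits.

    bot_hits: iterable of (url, status) pairs for bot requests only.
    """
    errors = Counter((url, status) for url, status in bot_hits if status >= 400)
    return errors.most_common(top_n)


def error_share(bot_hits):
    """Fraction of bot requests that ended in a 4xx or 5xx response."""
    hits = list(bot_hits)
    if not hits:
        return 0.0
    return sum(1 for _, status in hits if status >= 400) / len(hits)
```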
Crawled but forgotten
Flags URLs that haven’t been visited by any major crawler in 30+ days. Logs can’t confirm indexation on their own, but this is a useful signal for pages that may still be indexed yet have dropped out of Googlebot’s active crawl cycle.
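A minimal sketch of the staleness check, assuming bot requests are available as (url, timestamp) pairs and that the log covers more than 30 days. URLs that never appear in the log at all cannot be detected this way. The function name and parameters are illustrative.

```python
from datetime import datetime, timedelta, timezone


def stale_urls(bot_hits, now, max_age_days=30):
    """Return URLs whose most recent bot hit is older than max_age_days.

    bot_hits: iterable of (url, timestamp) pairs for bot requests only.
    now: reference time, passed in so results are reproducible.
    """
    last_seen = {}
    for url, ts in bot_hits:
        if url not in last_seen or ts > last_seen[url]:
            last_seen[url] = ts
    cutoff = now - timedelta(days=max_age_days)
    return sorted(url for url, ts in last_seen.items() if ts < cutoff)
```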
Slow URL identification
Aggregates server response times per URL and surfaces the slowest pages. Server time is only one input to Core Web Vitals (CWV), so treat this as a rough early proxy before pulling up Lighthouse or CrUX data.
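A sketch of the per-URL aggregation. It assumes the log format has been extended to record a response time, since the default combined format does not include one (nginx exposes $request_time in seconds, Apache exposes %D in microseconds). The function name and the (url, seconds) input shape are illustrative.

```python
from collections import defaultdict
from statistics import mean


def slowest_urls(timings, top_n=10, min_hits=1):
    """Return (url, mean_seconds, max_seconds, hits) sorted by mean, slowest first.

    timings: iterable of (url, seconds) pairs.
    min_hits: ignore URLs with fewer samples, to avoid ranking on one outlier.
    """
    by_url = defaultdict(list)
    for url, seconds in timings:
        by_url[url].append(seconds)
    stats = [
        (url, mean(values), max(values), len(values))
        for url, values in by_url.items()
        if len(values) >= min_hits
    ]
    return sorted(stats, key=lambda row: row[1], reverse=True)[:top_n]
```

Raising min_hits is a cheap guard against a single slow request pushing a rarely visited URL to the top of the list.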
Fake vs legitimate bots
Checks each declared user-agent against the reverse-DNS hostname of the requesting IP to identify scrapers and shadow crawlers masquerading as Googlebot or Bingbot.
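A common way to do this check is a reverse-DNS lookup followed by a forward confirmation. The sketch below assumes that approach and a small hand-written TRUSTED_SUFFIXES table; it is not the repository's code. The resolver functions are parameters so the logic can be exercised without network access.

```python
import socket

# Assumption: hostname suffixes used by genuine crawlers. Extend as needed.
TRUSTED_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def verify_bot(ip, user_agent,
               reverse=socket.gethostbyaddr,
               forward=socket.gethostbyname_ex):
    """Return True if verified, False if spoofed, None if no known bot is claimed."""
    ua = user_agent.lower()
    suffixes = next(
        (s for token, s in TRUSTED_SUFFIXES.items() if token in ua), None
    )
    if suffixes is None:
        return None
    try:
        host = reverse(ip)[0]            # reverse DNS: IP -> hostname
        if not host.endswith(suffixes):
            return False
        return ip in forward(host)[2]    # forward DNS must map back to the same IP
    except OSError:                      # covers socket.herror and socket.gaierror
        return False
```

DNS lookups are slow, so in practice it makes sense to verify each distinct IP once and cache the answer. Note that gethostbyname_ex resolves IPv4 only.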
Infinite URL space detection
Surfaces URL patterns with high parameter variance — session IDs, filters, sort orders — that fragment crawl budget across hundreds of near-duplicate URLs.
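One possible way to measure parameter variance is to count distinct query strings per path and distinct values per parameter. The function name, the min_variants threshold and the report shape below are assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def parameter_variance(urls, min_variants=3):
    """Return (path, distinct_query_strings, {param: distinct_values}) per path.

    Only paths with at least min_variants distinct query strings are reported,
    sorted so the most fragmented path comes first.
    """
    queries = defaultdict(set)   # path -> distinct raw query strings
    values = defaultdict(set)    # (path, param) -> distinct values seen
    for url in urls:
        parts = urlsplit(url)
        if not parts.query:
            continue
        queries[parts.path].add(parts.query)
        for key, value in parse_qsl(parts.query, keep_blank_values=True):
            values[(parts.path, key)].add(value)

    report = []
    for path, seen in queries.items():
        if len(seen) >= min_variants:
            params = {k: len(v) for (p, k), v in values.items() if p == path}
            report.append((path, len(seen), params))
    return sorted(report, key=lambda row: row[1], reverse=True)
```

A parameter whose distinct-value count is close to the number of requests, such as a session ID, is the usual culprit. Low-cardinality parameters like sort order multiply the problem rather than cause it.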
Why I built it in public.
Log analysis is one of the highest-leverage SEO activities and one of the least democratised. Enterprise teams use Botify or Oncrawl. Everyone else exports a CSV from their hosting panel and opens Excel. That gap felt fixable.
Building in public also forced me to write cleaner code — the kind that runs on someone else’s machine with a pip install and no tribal knowledge. That discipline translates directly to the production PRs I write on engineering teams.
The project is MIT-licensed. Contributions welcome.