Managing Hugo Staging Sites
13 May 2026
Restricting search engine and user access to a staging Hugo site with Cloudflare
Image: dead____artist on Unsplash
Now that this site is taking shape, I wanted to cover basic SEO by adding it to Google Search Console and Bing Webmaster Tools. Both need a sitemap, which is just a list of the pages you want search crawlers to include.
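For reference, a sitemap in the sitemaps.org format is an XML urlset containing one url element per page. Each url has a required loc and an optional lastmod. The minimal Python sketch below builds one with the standard library. It is only an illustration of the format: the build_sitemap function and the example.com pages are hypothetical, and Hugo generates the real file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap from (loc, lastmod) pairs; lastmod may be None."""
    # Declaring xmlns as a plain attribute makes it the default namespace,
    # so the child elements can use bare tag names.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:  # lastmod is optional in the protocol
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages on a placeholder domain.
    print(build_sitemap([
        ("https://example.com/", None),
        ("https://example.com/blogs/", "2026-04-29T00:00:00+00:00"),
    ]))
```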
Hugo builds a sitemap by default, and you can tune it with the sitemap settings in the config TOML/YAML file. That should make this easy, except that the Cloudflare Pages worker rewrites all URLs in the site from its default subdomain to the base domain.
Hence we need a custom sitemap. While we’re doing that, we also need an accompanying robots.txt to block search engines from crawling certain pages.
A quick first attempt didn’t work: even though we build the site with Hugo’s -b option to specify the real base URL, that didn’t seem to propagate to Hugo’s default sitemap.
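A quick way to confirm which host the generated sitemap actually points at is to list every loc entry whose host is not the production domain. The minimal Python sketch below does that, and it is not part of the Hugo build. The function name find_non_canonical and the preview hostname in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hugo's sitemaps use the sitemaps.org namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_non_canonical(sitemap_xml, canonical_host):
    """Return every <loc> URL in sitemap_xml (bytes) whose host is not canonical_host."""
    root = ET.fromstring(sitemap_xml)
    offenders = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        # urlsplit().hostname is lowercased, so compare against a lowercase host
        if urlsplit(url).hostname != canonical_host:
            offenders.append(url)
    return offenders


if __name__ == "__main__":
    # Hypothetical sitemap fragment mixing the real domain and a preview host.
    sample = b"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://meantimecyber.com/blogs/</loc></url>
  <url><loc>https://preview.example.pages.dev/blogs/</loc></url>
</urlset>"""
    print(find_non_canonical(sample, "meantimecyber.com"))
```

If you point it at the output of a real build (public/sitemap.xml, since public/ is Hugo’s default output directory), it should return an empty list once the sitemap uses the canonical base URL.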
After some fiddling, the following worked for me.
First, configure the sitemap and enable robots.txt in the config file, and add a new parameter holding the base URL to use in the sitemap:
File: hugo.yml
# Sitemap settings. Change frequency is a hint to search engines about how often your content changes.
# Options include: always, hourly, daily, weekly, monthly, yearly, never.
sitemap:
  changeFreq: monthly
params:
  # Canonical base URL for the sitemap
  canonicalBaseURL: https://meantimecyber.com
# Robots.txt settings. To disable it, set enableRobotsTXT to false or delete the key (it defaults to false).
# Hugo's built-in robots.txt allows all bots to access all content; customize it with a layouts/robots.txt template.
enableRobotsTXT: true
Then we need a sitemap template. GitHub Copilot drafted the following, with a few iterative prompts:
File: layouts/sitemap.xml
{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  {{- /* Use canonicalBaseURL from params to avoid Cloudflare Pages preview URLs appearing in the sitemap */ -}}
  {{- $sitemapBaseURL := strings.TrimSuffix "/" (or site.Params.canonicalBaseURL site.BaseURL) -}}
  {{- range .Site.Pages }}
    {{- /* Exclude tag taxonomy pages. Also exclude future-dated regular pages (scheduled posts),
      but always keep section pages and the home page, whose dates are derived from their children. */ -}}
    {{- if and (not (hasPrefix .RelPermalink "/tags/")) (not (hasPrefix .RelPermalink "/blogs/tags/")) (or .IsSection .IsHome (not (.Date.After now))) }}
      {{- /* Default lastmod to the page's own value */ -}}
      {{- $effectiveLastmod := .Lastmod -}}
      {{- /* For section pages (e.g. /blogs/), use the lastmod of the most recently modified *published*
        child page. Filter explicitly to pages whose date is not in the future, so this works correctly
        even in dev builds that use -F (include future posts). */ -}}
      {{- if .IsSection -}}
        {{- with where .Pages "Date" "<=" now -}}
          {{- $effectiveLastmod = (index .ByLastmod.Reverse 0).Lastmod -}}
        {{- end -}}
      {{- end }}
  <url>
    <loc>{{ printf "%s%s" $sitemapBaseURL .RelPermalink }}</loc>
    {{- if not $effectiveLastmod.IsZero }}
    <lastmod>{{ $effectiveLastmod.Format "2006-01-02T15:04:05-07:00" | safeHTML }}</lastmod>
    {{- end }}
    {{- with .Sitemap.ChangeFreq }}
    <changefreq>{{ . }}</changefreq>
    {{- end }}
    {{- if ge .Sitemap.Priority 0 }}
    <priority>{{ .Sitemap.Priority }}</priority>
    {{- end }}
  </url>
    {{- end }}
  {{- end }}
</urlset>
This template:
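- Uses the canonicalBaseURL parameter as the base URL, or falls back to the site’s baseURL.
- Excludes pages with tags in their path, as well as future-dated (scheduled) posts.
- Handles section pages such as /blogs/, where we want the date of the most recently updated child.
For each page, we get a url entry in the generated XML: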
<url>
  <loc>https://meantimecyber.com/blogs/</loc>
  <lastmod>2026-04-29T00:00:00+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
The sitemap tells crawlers what you want indexed. The accompanying robots.txt points to the sitemap and tells crawlers what to ignore:
File: layouts/robots.txt
User-agent: *
Disallow: /tags/
Sitemap: https://meantimecyber.com/sitemap.xml
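Python’s standard-library urllib.robotparser can evaluate these rules offline, which is a quick way to sanity-check them before deploying. The sketch below is minimal. The rules are inlined from the file above, and /tags/hugo/ and /blogs/tags/hugo/ are hypothetical tag pages used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from layouts/robots.txt, inlined so no network fetch is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tags/
Sitemap: https://meantimecyber.com/sitemap.xml
"""

BASE = "https://meantimecyber.com"


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() accepts an iterable of lines
    return parser


rules = load_rules(ROBOTS_TXT)

# Hypothetical tag paths, checked against the wildcard user agent.
for path in ("/blogs/", "/tags/hugo/", "/blogs/tags/hugo/"):
    verdict = "allowed" if rules.can_fetch("*", BASE + path) else "blocked"
    print(f"{path}: {verdict}")

# site_maps() (Python 3.8+) returns the Sitemap URLs found in the file.
print("sitemaps:", rules.site_maps())
```

The check reports /blogs/tags/hugo/ as allowed because this robots.txt only disallows paths that start with /tags/. The sitemap template, however, excludes both /tags/ and /blogs/tags/.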