Path routing in Cloudflare Pages

Cloudflare Pages is my go-to for serving static sites. It’s a great experience to push changes to GitHub and have them appear, fully built, just a couple of minutes later. It’s a developer-friendly platform, and this site is built with it. Even so, despite its accurate documentation on route handling, I have come across several sharp edges that have led me to adopt specific practices in new projects.

I’ll explain these practices in the Suggestions section, but just so we’re all on the same page1, here’s a brief summary of Pages routing:

  1. Cloudflare Pages assumes the site is a Single-Page Application (SPA) by default and serves the root page instead of a 404 page when a path is not found. This is handy for React-based apps, where JavaScript running in the browser handles all of the routing and page display, but it is confusing for static sites.
  2. URLs to static HTML files are redirected to remove the .html extension (file /about.html is served at URL /about), which makes files appear folder-like. Actual folders, by contrast, are redirected to include a trailing slash (/contact/index.html is served at URL /contact/). More details below.
  3. Cloudflare Pages Functions (server-side code bundled with the site but run in a Cloudflare Worker) use different path routing than static paths.

Impact

If you’re not expecting this routing behavior, it will affect your site in a few ways:

  1. Unnecessary additional files: If you’re building an SPA that manipulates the browser URL, you don’t need to manually redirect back to home or create an HTML page at every URL that loads the SPA.
  2. Unnecessary redirects: For static sites, the files on disk don’t correspond one-to-one with the URLs. Unless you account for the route handling behavior, you will construct internal site links that trigger unnecessary redirects that worsen site performance2.
  3. Confusing behavior: I had a frustrating bug in one project that consisted of static pages plus a single API endpoint handled by a Pages Function. The API existed at route /api/s3 and was accessible at /api/s3/, but accessing /api/s3 triggered a 404 response (not even a redirect!) from the server. This conflicted with my expectation that paths should work without trailing slashes. The problem was that routing was handed off to a server-side framework router that forced trailing slashes.

Suggestions

Fortunately, there are a few things you can do to avoid unnecessary redirects and confusing behavior.

ℹ️ NOTE These tips are not intended for SPAs, only static sites or hybrid sites that mix static and server-side (via Pages Functions) content.

  1. Specify a custom 404 page by creating a 404.html file in the root of your built site. In Astro, for instance, this would be a /src/pages/404.astro page. This disables Cloudflare’s assumption that the site is an SPA, so Pages returns an HTTP 404 status along with your custom 404 document for URLs that don’t exist. This helps you identify broken internal links on your site3.
  2. Consistently place your static pages in a folder, like name/index.html, instead of name.html4. This way, paths consistently end in a slash. If you mix the two, some pages will end in slashes and others will not, as shown in the static routing details section. This isn’t necessary for non-HTML files like images, /robots.txt, the sitemap, and so on.
  3. Generate internal site links that include a trailing slash by default. For small sites, you can just keep a constant object that you always use to refer to internal URLs:

    // canonical internal URLs, each ending in a trailing slash
    export const ROUTES = {
      home: '/',
      about: '/about/',
      contact: '/contact/',
    } as const;

    Or create a utility to append slashes to internal URLs. The URL class lets us add the slash without disturbing any search parameters or fragments:

    function addTrailingSlash(href: string): string {
      // Astro.site is only defined when `site` is set in astro.config
      const site = Astro.site;
      if (!site) return href;

      const url = new URL(href, site);

      // only append a slash if the URL is for this site and lacks one
      if (url.origin === site.origin && !url.pathname.endsWith('/')) {
        url.pathname += '/';
      }
      return url.href;
    }
  4. Some frameworks let you force trailing slashes in the local dev server, like Astro’s trailingSlash: 'always' configuration option. As you work on your site locally, this helps you notice when an internal link would have triggered a redirect5.

    But be careful! Don’t enable this setting if your project also includes Pages Functions and the setting also affects routing for the deployed server-side code. This was the cause of the confusing issue described previously: the Pages Function routing code made the endpoint at /api/s3 return a 404. Since I did not control the client, I needed to support calling the endpoint without a trailing slash, and disabling this setting made it work.
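Because Astro.site only exists inside Astro components, the helper in suggestion 3 is tied to Astro. A framework-agnostic sketch could take the site URL as a parameter instead; the `site` parameter here is an assumption standing in for Astro.site. It also skips paths whose last segment contains a dot, on the heuristic assumption that those are files like /robots.txt or /rss.xml, which don’t need a slash:

```typescript
// Hypothetical framework-agnostic variant: `site` stands in for Astro.site.
function addTrailingSlash(href: string, site: URL): string {
  // Resolve relative hrefs against the site so search params and fragments are parsed
  const url = new URL(href, site);

  // Only rewrite links that point at this site
  if (url.origin !== site.origin) return url.href;

  // Heuristic (assumption): a dot in the last path segment means a file such as
  // /robots.txt or /rss.xml, which should keep its path as-is
  const lastSegment = url.pathname.split('/').pop() ?? '';
  if (!url.pathname.endsWith('/') && !lastSegment.includes('.')) {
    url.pathname += '/';
  }
  return url.href;
}

const site = new URL('https://example.com');
console.log(addTrailingSlash('/about?ref=nav#team', site));
// prints "https://example.com/about/?ref=nav#team"
console.log(addTrailingSlash('/robots.txt', site));
// prints "https://example.com/robots.txt"
```

Returning an absolute URL mirrors the Astro helper; if you prefer root-relative links, returning url.pathname + url.search + url.hash would work instead.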

Routing details

These suggestions follow from the way Cloudflare Pages handles routing. If a Function is included with the project, Pages first checks whether the request should be routed to the Function, based on the contents of the _routes.json file. If no Pages Function is executed (or none exists at the route), Pages falls back to static handling. Let’s look at Functions routing first.

Routing to server-side code via Pages Functions

Pages uses the contents of the _routes.json file to determine which URLs should trigger Function execution and which should fall back to static routing. Here’s an example _routes.json file from a project, built using Astro and the Astro Cloudflare adapter, that includes an API endpoint:

{
  "version": 1,
  "include": [
    "/api/*"
  ],
  "exclude": [
    "/",
    "/robots.txt",
    "/rss.xml",
    "/404",
    "/blog/*"
  ]
}

The exclude array dictates which paths should be served statically (the Function will not be invoked), and the include array lists the paths that will be routed to the Function.

Exclude rules always take priority over include rules. So any request for a path inside /blog/ will always be served statically and will follow the routing for static pages discussed in more detail below. Likewise, the root page (/index.html, served at /) is served statically. But any URL in /api/, for instance, gets routed to the Function.

If the request executes your Function, routing is then up to your built code.
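The include/exclude precedence can be modeled in a few lines. This is a simplified sketch, not Cloudflare’s implementation: it assumes each `*` matches any run of characters (slashes included) and that rules without a wildcard match the path exactly. The names `RoutesConfig`, `ruleToRegExp`, and `invokesFunction` are hypothetical.

```typescript
// Shape of a _routes.json file (version 1)
interface RoutesConfig {
  version: number;
  include: string[];
  exclude: string[];
}

// Assumption: `*` matches any sequence of characters, including `/`;
// every other character matches literally
function ruleToRegExp(rule: string): RegExp {
  const pattern = rule
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${pattern}$`);
}

// Returns true if the request would be handed to the Pages Function
function invokesFunction(pathname: string, routes: RoutesConfig): boolean {
  const matchesAny = (rules: string[]) =>
    rules.some((rule) => ruleToRegExp(rule).test(pathname));

  // Exclude rules always take priority over include rules
  if (matchesAny(routes.exclude)) return false;
  return matchesAny(routes.include);
}

// The example _routes.json from above
const routes: RoutesConfig = {
  version: 1,
  include: ['/api/*'],
  exclude: ['/', '/robots.txt', '/rss.xml', '/404', '/blog/*'],
};

console.log(invokesFunction('/api/s3', routes)); // true
console.log(invokesFunction('/blog/hello/', routes)); // false
```

Anything that returns false here falls through to the static routing described next.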

Routing to static pages and files

If your site is fully static, or if a route has been excluded from handling by a Pages Function, Cloudflare Pages6 resolves URLs in the following way.

Given the following file structure:

📂 /
┣ 📄 about.html
┣ 📂 contact
┃  ┗ 📄 index.html
┣ 📄 blog.html
┗ 📂 blog
   ┗ 📄 index.html

In short: HTML documents are served without any extension or trailing slash (/about serves about.html) and directories are served with a trailing slash (/contact/ serves /contact/index.html).

URL                   Response
/about.html           Redirect to /about
/about                Serve file /about.html
/about/               Redirect to /about
/about/index.html     No redirect (404, or serves / in SPA mode)
/contact.html         No redirect (404, or serves / in SPA mode)
/contact              Redirect to /contact/
/contact/             Serve file /contact/index.html
/contact/index.html   Redirect to /contact/
/blog.html            Redirect to /blog
/blog                 Serve file /blog.html
/blog/                Serve file /blog/index.html
/blog/index.html      Redirect to /blog/

No redirection happens if the corresponding HTML file does not exist. By default, Pages will serve the home page in this case (SPA mode). But if a /404.html page exists, SPA mode is disabled and Pages returns HTTP code 404 along with the contents of the custom 404 page.

Pages triggers redirects by responding to the request with HTTP code 308 (permanent redirect).
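To make the table easier to reason about, here is a sketch that reproduces it, including the 308 redirects and the 404/SPA fallback. It is reconstructed only from the behavior described above, not from Cloudflare’s source, so cases the table doesn’t cover may behave differently in practice. The `resolveStatic` function and `PagesResponse` type are hypothetical, and paths are assumed to be decoded and free of query strings.

```typescript
type PagesResponse =
  | { status: 200; file: string }
  | { status: 308; location: string }
  | { status: 404; file: string };

// Hypothetical model of Pages static routing, derived from the table above
function resolveStatic(path: string, files: ReadonlySet<string>): PagesResponse {
  // /name/index.html redirects to /name/ when the file exists
  if (path.endsWith('/index.html') && files.has(path)) {
    return { status: 308, location: path.slice(0, -'index.html'.length) };
  }
  // /name.html redirects to /name when the file exists
  if (path.endsWith('.html') && !path.endsWith('/index.html') && files.has(path)) {
    return { status: 308, location: path.slice(0, -'.html'.length) };
  }
  if (path.endsWith('/')) {
    // /name/ serves /name/index.html if present...
    if (files.has(path + 'index.html')) return { status: 200, file: path + 'index.html' };
    // ...otherwise redirects to /name when /name.html exists
    const bare = path.slice(0, -1);
    if (bare !== '' && files.has(bare + '.html')) return { status: 308, location: bare };
  } else {
    // /name serves /name.html if present (it wins over /name/index.html)...
    if (files.has(path + '.html')) return { status: 200, file: path + '.html' };
    // ...otherwise redirects to /name/ when /name/index.html exists
    if (files.has(path + '/index.html')) return { status: 308, location: path + '/' };
  }
  // Not found: a custom /404.html disables SPA mode; otherwise the root page is served
  return files.has('/404.html')
    ? { status: 404, file: '/404.html' }
    : { status: 200, file: '/index.html' };
}

// The example file structure from above
const files = new Set(['/about.html', '/contact/index.html', '/blog.html', '/blog/index.html']);
console.log(resolveStatic('/contact', files)); // 308 redirect to /contact/
console.log(resolveStatic('/blog', files)); // serves /blog.html
```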

Footnotes

  1. 😑

  2. Redirects worsen the performance of a site because the client browser needs to make an additional round trip to the server before fetching page content. The degree to which this matters varies greatly, but it is especially relevant for high-latency connections like cellular data networks.

    From a search engine optimization perspective, a redirect works just fine; in fact, a redirect does not directly impact PageRank. But it can be confusing to operate a site where the search engine believes the canonical URL for a page differs from the one you expect. Since redirects are a strong signal that search engines use to discover canonical URLs, it’s better to treat the redirect target as the canonical URL everywhere.

  3. Manually clicking around a simple site to find broken links is fine, but once the site grows more sophisticated, you might add automated link checking with a tool like lychee.

  4. Some frameworks like Astro automate the creation of folders with a nested index.html page for you. Very handy.

  5. If you have a /contact/index.html page, for instance, Pages will redirect /contact to /contact/. With forced trailing slashes, visiting /contact in your local dev server will trigger a 404 but /contact/ will work as expected. This is an easy way to make you notice that a link doesn’t point directly to the canonical URL.

  6. This table only applies to Cloudflare Pages. Other static site hosting providers behave differently; check out the slorber/trailing-slash-guide repo on GitHub.