Preparing SEO for NextJS 13 site

The Context

Working on my personal website, I wanted individual articles to be indexed better on Google. I didn’t have any SEO set up on the site before this.

The Solution

There are 3 main steps:
  1. Add a /robots.txt endpoint to your site, which tells search engine crawlers which URLs they can access on your site.
  2. Add a /sitemap.xml endpoint to your site, which describes the exact pages available for a search engine to crawl. This can be done dynamically by:
    1. Adding a new API endpoint that fetches some data and returns XML.
    2. Rewriting requests for /sitemap.xml to the /api/sitemap endpoint by configuring your next.config.js.
  3. Going to Google Search Console and submitting your sitemap to be indexed by Google.

1. robots.txt

With Next.js v13.3.0 I can add an /app/robots.ts file, which Next parses and serves as a robots.txt file:
import { MetadataRoute } from "next";

import { DOMAIN } from "./constants";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      disallow: "/private/", // any paths you don't want to be indexed
    },
    sitemap: `${DOMAIN}/sitemap.xml`,
  };
}
Output
User-Agent: *
Allow: /
Disallow: /private/

Sitemap: https://jameshw.dev/sitemap.xml
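The DOMAIN constant imported above comes from a constants file that isn’t shown in this post; a minimal sketch, assuming it just holds the production origin, would be:

// app/constants.ts (sketch - the real file isn't included in this post)
export const DOMAIN = "https://jameshw.dev";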

2. sitemap.xml

I can then add the sitemap by creating an API endpoint at /pages/api/sitemap.ts:
/* eslint-disable @typescript-eslint/restrict-template-expressions */
import type { NextApiRequest, NextApiResponse } from "next";

import { serverSideCmsClient } from "api/services/cms/cms.client";
import { isArticle, isJournalEntry } from "types/guards";
import { DOMAIN, PATHS } from "app/constants";

const getSitemapRoute = (path: string) => {
  return `
  <url>
    <loc>${DOMAIN}${path}</loc>
    <lastmod>${new Date().toISOString().split("T")[0]}</lastmod>
  </url>`;
};

export default async function handler(_: NextApiRequest, res: NextApiResponse) {
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/xml");
  // Instructing the Vercel edge to cache the file
  res.setHeader("Cache-control", "stale-while-revalidate, s-maxage=3600");

  const articles = await serverSideCmsClient.getDatabaseEntries(
    process.env.BLOG_DB_ID,
    isArticle
  );
  const journals = await serverSideCmsClient.getDatabaseEntries(
    process.env.JOURNAL_DB_ID,
    isJournalEntry
  );

  // .join("") prevents commas from being inserted between <url> entries
  // when the arrays are interpolated into the template string
  res.send(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${Object.values(PATHS)
    .map((path) => getSitemapRoute(path))
    .join("")}
  ${articles
    .map(({ slug, published }) =>
      getSitemapRoute(`${PATHS.BLOG}/${published}/${slug}`)
    )
    .join("")}
  ${journals
    .map(({ slug, date }) =>
      getSitemapRoute(`${PATHS.JOURNAL}/${date}/${slug}`)
    )
    .join("")}
</urlset>`);
}
I have one set of static URLs stored in the PATHS constant and two sets of dynamic routes - one for my journal and one for my blog. I want the sitemap to update automatically whenever I publish a new blog post or journal entry.
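The PATHS constant and the type guards aren’t shown in this post either. A rough sketch of the shapes the handler relies on (the route values and CMS field names below are assumptions inferred from the handler code) might look like:

// app/constants.ts (sketch, continued - actual route values are assumptions)
export const PATHS = {
  HOME: "/",
  BLOG: "/blog",
  JOURNAL: "/journal",
} as const;

// types/guards.ts (sketch - entry shapes inferred from the fields destructured above)
export interface Article {
  slug: string;
  published: string; // e.g. "2023-05-29"
}

export interface JournalEntry {
  slug: string;
  date: string;
}

export const isArticle = (entry: unknown): entry is Article =>
  typeof entry === "object" && entry !== null && "slug" in entry && "published" in entry;

export const isJournalEntry = (entry: unknown): entry is JournalEntry =>
  typeof entry === "object" && entry !== null && "slug" in entry && "date" in entry;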
Next, I need to point /sitemap.xml at the new endpoint I’ve created. I can do this with a rewrite in next.config.js:
/** @type {import('next').NextConfig} */
const nextConfig = {
  ...
  async rewrites() {
    return [
      {
        source: "/sitemap.xml",
        destination: "/api/sitemap",
      },
    ];
  },
};

module.exports = nextConfig;
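To sanity-check the rewrite, I can hit /sitemap.xml on a running dev server and confirm it serves the API route’s XML. A minimal sketch, assuming the default port 3000 and Node 18+ for the built-in fetch:

// check-sitemap.ts (hypothetical helper - run with ts-node while `next dev` is running)
async function checkSitemap() {
  const res = await fetch("http://localhost:3000/sitemap.xml");
  console.log(res.status); // expect 200 - the rewrite serves the API response in place
  console.log(res.headers.get("content-type")); // expect text/xml
  const body = await res.text();
  console.log(body.startsWith(`<?xml version="1.0" encoding="UTF-8"?>`)); // expect true
}

void checkSitemap();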

3. Google Search Console

Now I need to tell Google to index my site.
Go to Google Search Console. I signed in with the same Google account that the domain is registered with (I bought it through Google).
Type in the link to the location of the sitemap (https://jameshw.dev/sitemap.xml) and submit.
Click through to check that the sitemap was parsed successfully - initially my date formats were invalid.
After a few days, Google will have indexed the site!

The Result

Now when a bot crawls my website, it requests /robots.txt and finds the rules shown above, which tell it where to find the sitemap.
And navigating to /sitemap.xml returns the generated sitemap, with an entry for every static page, blog post, and journal entry.
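The exact entries depend on what’s in the CMS, but the generated XML has this shape (the slugs and dates below are placeholders, not my real content):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://jameshw.dev/blog</loc>
    <lastmod>2023-05-29</lastmod>
  </url>
  <url>
    <loc>https://jameshw.dev/blog/2023-05-29/example-article</loc>
    <lastmod>2023-05-29</lastmod>
  </url>
  <url>
    <loc>https://jameshw.dev/journal/2023-05-29/example-entry</loc>
    <lastmod>2023-05-29</lastmod>
  </url>
</urlset>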