Preparing SEO for NextJS 13 site

The Context

Working on my personal website, I wanted individual articles to be indexed better on Google. I didn’t have any SEO set up on the site before this.

The Solution

There are 3 main steps:
  1. Add a /robots.txt endpoint to your site, which tells search engine crawlers which URLs they can access on your site.
  2. Add a /sitemap.xml endpoint to your site, which describes the exact pages available for a search engine to crawl. This can be done dynamically by:
    1. Adding a new API endpoint that fetches some data and returns XML.
    2. Rewriting requests for /sitemap.xml to the /api/sitemap endpoint by configuring your next.config.js.
  3. Going to Google Search Console and submitting your sitemap to be indexed by Google.

1. robots.txt

With Next.js v13.3.0 I can add an /app/robots.ts file, which Next parses and serves as a robots.txt file:
import { MetadataRoute } from "next";

import { DOMAIN } from "./constants";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      disallow: "/private/", // any paths you don't want to be indexed
    },
    sitemap: `${DOMAIN}/sitemap.xml`,
  };
}
Output
User-Agent: *
Allow: /
Disallow: /private/

Sitemap: https://jameshw.dev/sitemap.xml
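The DOMAIN constant imported above comes from a constants file that isn’t shown in this post; a minimal sketch, assuming it just holds the production origin, would be:

// app/constants.ts (sketch - the real file isn't included in this post)
export const DOMAIN = "https://jameshw.dev";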

2. sitemap.xml

I can then add the sitemap by creating an API endpoint at /pages/api/sitemap.ts:
/* eslint-disable @typescript-eslint/restrict-template-expressions */
import type { NextApiRequest, NextApiResponse } from "next";

import { serverSideCmsClient } from "api/services/cms/cms.client";
import { isArticle, isJournalEntry } from "types/guards";
import { DOMAIN, PATHS } from "app/constants";

const getSitemapRoute = (path: string) => {
  return `
  <url>
    <loc>${DOMAIN}${path}</loc>
    <lastmod>${new Date().toISOString().split("T")[0]}</lastmod>
  </url>`;
};

export default async function handler(_: NextApiRequest, res: NextApiResponse) {
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/xml");
  // Instructing the Vercel edge to cache the file
  res.setHeader("Cache-control", "stale-while-revalidate, s-maxage=3600");

  const articles = await serverSideCmsClient.getDatabaseEntries(
    process.env.BLOG_DB_ID,
    isArticle
  );
  const journals = await serverSideCmsClient.getDatabaseEntries(
    process.env.JOURNAL_DB_ID,
    isJournalEntry
  );

  // .join("") prevents commas from being inserted between <url> entries
  // when the arrays are interpolated into the template string
  res.send(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${Object.values(PATHS)
    .map((path) => getSitemapRoute(path))
    .join("")}
  ${articles
    .map(({ slug, published }) =>
      getSitemapRoute(`${PATHS.BLOG}/${published}/${slug}`)
    )
    .join("")}
  ${journals
    .map(({ slug, date }) =>
      getSitemapRoute(`${PATHS.JOURNAL}/${date}/${slug}`)
    )
    .join("")}
</urlset>`);
}
I have one set of static URLs stored in the PATHS constant and two sets of dynamic routes - one for my journal and one for my blog. I want the sitemap to update automatically whenever I publish a new blog post or journal entry.
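The PATHS constant and the type guards aren’t shown in this post either. A rough sketch of the shapes the handler relies on (the route values and CMS field names below are assumptions inferred from the handler code) might look like:

// app/constants.ts (sketch, continued - actual route values are assumptions)
export const PATHS = {
  HOME: "/",
  BLOG: "/blog",
  JOURNAL: "/journal",
} as const;

// types/guards.ts (sketch - entry shapes inferred from the fields destructured above)
export interface Article {
  slug: string;
  published: string; // e.g. "2023-05-29"
}

export interface JournalEntry {
  slug: string;
  date: string;
}

export const isArticle = (entry: unknown): entry is Article =>
  typeof entry === "object" && entry !== null && "slug" in entry && "published" in entry;

export const isJournalEntry = (entry: unknown): entry is JournalEntry =>
  typeof entry === "object" && entry !== null && "slug" in entry && "date" in entry;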
Next, I need to point /sitemap.xml at the new endpoint I’ve created. I can do this with a rewrite in next.config.js:
/** @type {import('next').NextConfig} */
const nextConfig = {
  ...
  async rewrites() {
    return [
      {
        source: "/sitemap.xml",
        destination: "/api/sitemap",
      },
    ];
  },
};

module.exports = nextConfig;
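To sanity-check the rewrite, I can hit /sitemap.xml on a running dev server and confirm it serves the API route’s XML. A minimal sketch, assuming the default port 3000 and Node 18+ for the built-in fetch:

// check-sitemap.ts (hypothetical helper - run with ts-node while `next dev` is running)
async function checkSitemap() {
  const res = await fetch("http://localhost:3000/sitemap.xml");
  console.log(res.status); // expect 200 - the rewrite serves the API response in place
  console.log(res.headers.get("content-type")); // expect text/xml
  const body = await res.text();
  console.log(body.startsWith(`<?xml version="1.0" encoding="UTF-8"?>`)); // expect true
}

void checkSitemap();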

3. Google Search Console

Now I need to tell Google to index my site.
Go to Google Search Console. I signed in with the same Google account that the domain is registered with (I bought it through Google).
Type in the link to the location of the sitemap (https://jameshw.dev/sitemap.xml) and submit.
Click through to check that the sitemap was parsed successfully - initially my date formats were invalid.
After a few days, Google will have indexed the site!

The Result

Now when a bot crawls my website, it requests /robots.txt and finds the rules shown above, which tell it where to find the sitemap.
And navigating to /sitemap.xml returns the generated sitemap, with an entry for every static page, blog post, and journal entry.
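The exact entries depend on what’s in the CMS, but the generated XML has this shape (the slugs and dates below are placeholders, not my real content):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://jameshw.dev/blog</loc>
    <lastmod>2023-05-29</lastmod>
  </url>
  <url>
    <loc>https://jameshw.dev/blog/2023-05-29/example-article</loc>
    <lastmod>2023-05-29</lastmod>
  </url>
  <url>
    <loc>https://jameshw.dev/journal/2023-05-29/example-entry</loc>
    <lastmod>2023-05-29</lastmod>
  </url>
</urlset>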