Issue! Duplicate without user-selected canonical

Oopsie... So, Google's spider (more of an octopus) has been crawling your site, and it found the above issue. That means at least the following things:

  • You didn't specify a canonical link tag or header
  • The same resource is reachable from more than one URI

Google elaborates on this topic here, here and in a few hundred other pages, so if you want a scientific definition you can find it there.

So what's the problem?

Aaand?

Well, if the crawler finds out that you are messy and don't organize your resources by their Uniform Resource Locators, it will get totally mad, pick a canonical on its own, and choose not to index the duplicate page.

But maybe you have resources that legitimately need to be reachable from more than one URL. For example, consider a search on some e-commerce site where more than one filter returns the same set or subset of products:

  • query one: /laptop/msi/the-id-here
  • query two: /laptop/msi?memory=32
  • query three: /laptop/msi?memory=32&color=black

In this sample, the second and third queries might return the same product as the first one. So, which one is the true source of it?

That's where the canonical URL comes into play. You set which one is the definitive source of some resource, the crawler saves it, and everyone is happy.
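
To make this concrete, a canonical can be declared either with a link element in the page or with an HTTP Link response header. The helpers below are only a sketch: the function names and the sample URL are hypothetical, and they assume the URL is already properly escaped.

```javascript
// Hypothetical helpers, for illustration only.

// Builds the <link> element that goes inside the page's <head>.
function canonicalLinkTag(url) {
    return '<link rel="canonical" href="' + url + '">';
}

// Builds the value of the HTTP Link response header (handy for non-HTML resources).
function canonicalLinkHeader(url) {
    return '<' + url + '>; rel="canonical"';
}

// Assumed sample URL, not a real endpoint.
var canonical = 'https://super-duper-domain.com/laptop/msi/the-id-here';
console.log(canonicalLinkTag(canonical));
console.log('Link: ' + canonicalLinkHeader(canonical));
```

Whichever form you pick, every duplicate URL should point at the same canonical one.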

In my case the issue was very simple: the same page would render with or without a trailing slash, and there's an entire to-slash-or-not-to-slash saga in Google's docs.
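
The rule I'm after is tiny, and every option below is just a different place to apply it. Here it is as a minimal sketch, assuming the slash-less form is the canonical one; the helper name is hypothetical.

```javascript
// Hypothetical helper: returns the slash-less form of a path,
// leaving the root "/" untouched.
function stripTrailingSlash(path) {
    if (path.length > 1 && path.endsWith('/')) {
        return path.slice(0, -1);
    }
    return path;
}

console.log(stripTrailingSlash('/laptop/msi/')); // "/laptop/msi"
console.log(stripTrailingSlash('/laptop/msi'));  // "/laptop/msi"
console.log(stripTrailingSlash('/'));            // "/"
```

A redirect is only needed when the result differs from the input; otherwise the request goes through as usual.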

The options

ASP.NET Core documentation

There's this cool document here on Microsoft docs on how to do this in IIS and Apache. It makes use of the RewriteRules NuGet package, which is open source here: https://github.com/kyleherzog/RewriteRules.

A typical public app HTTP request goes through these relevant nodes:

[client] => [api-gtw | cache] => [lb] => [reverse-proxy] => [application]

The principle is that the sooner we let the client know of the new location, the better. I'm not using IIS or Apache, so I'm crossing this option off.

Configure Nginx to remove the trailing slash

If you're using nginx, and your setup allows you to play around with the configuration, then you'll need to add a rewrite directive in the root location. This needs to be the first one to catch all requests. You do something like this:


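location / {
    # Match only paths that end in a slash (never "/" itself); matching every path would redirect in a loop
    rewrite ^/(.*)/$ https://super-duper-domain.com/$1 permanent;
}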

  • pros: You'll have the redirect instruction without ever reaching your application endpoint
  • cons: Nginx (infrastructure) will have to know about SEO
  • gain: Saves the time spent going from Nginx to application and back
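
Before deploying a rewrite rule, it's worth checking the pattern against a few sample paths, because a pattern that also matches slash-less paths would redirect them to themselves forever. The sketch below uses JavaScript's regex engine as a stand-in for the PCRE engine Nginx uses (the two should agree for a pattern this simple). The pattern `^/(.*)/$`, the helper name and the sample paths are assumptions picked for illustration.

```javascript
// Assumed pattern: matches only paths that end in "/" and have something before it.
var trailingSlash = /^\/(.*)\/$/;

// Hypothetical helper: returns the path to redirect to, or null for "no redirect".
function redirectTarget(path) {
    var match = trailingSlash.exec(path);
    return match ? '/' + match[1] : null;
}

console.log(redirectTarget('/laptop/msi/')); // "/laptop/msi"
console.log(redirectTarget('/laptop/msi'));  // null
console.log(redirectTarget('/'));            // null
```

If a slash-less path or the root ever returns something other than null here, the rule would loop in production.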

In-code change

Performance-wise this is the worst place to resolve this issue, but depending on your needs it might be the only place where it makes sense to do it. One case where it might be worth doing here is when there's logic attached to the decision of whether to add or remove the trailing slash. In such cases a filter is a good option: you can use it to decorate the specific actions or controllers that need a trailing slash applied or removed. See more about filters in the documentation.

My case doesn't need any special logic attached to it, so the first place eligible to issue a permanent redirect toward the canonical URL is middleware.

One option is to make use of the RewriteRules NuGet package. In that case, updating Program.cs as follows makes it work:

using Microsoft.AspNetCore.Rewrite;
using RewriteRules;

// ... other configurations

var app = builder.Build();

// ... other middlewares to go before this (e.g., forwarded headers)

var options = new RewriteOptions().AddRedirectToCanonicalUrl(new CanonicalUrlOptions
{
    TrailingSlash = TrailingSlashAction.Remove
});

app.UseRewriter(options);

If you'd rather not add a package for it, you can also write a tiny middleware function such as:


// ... other configurations

var app = builder.Build();

// ... other middlewares to go before this (e.g., forwarded headers)


app.Use(async (context, next) =>
{
    var path = context.Request.Path;
    if (path.HasValue && path.Value.Length > 1 && path.Value.EndsWith("/"))
    {
        var nonSlashPath = path.Value[..^1];
        if (context.Request.QueryString.HasValue)
            nonSlashPath += context.Request.QueryString.Value;

        context.Response.Redirect(nonSlashPath, true, true);
        return;
    }

    await next();
});

My choice

The only public route to reach my app is through the CloudFront edge node, which makes CloudFront Functions a very good candidate for this little change.

Cons:

  • It moves an app routing concern to a far-away infrastructure level

Pros:

  • Runs at the edge, the nearest network node capable of issuing a redirect, so it couldn't be any faster than this
  • Serverless solution that scales to 10,000,000 requests per second or more
  • Attractive cost at only $0.10 per million invocations, after the first 2 million invocations covered by the forever free tier

To do this, head over to the CloudFront console and add the trailing slash function:

function handler(event) {
    var originalUrl = event.request.uri;
    if(originalUrl.length > 1 && originalUrl.endsWith('/')){
        var newUrl = originalUrl.substring(0, originalUrl.length-1);
        
        var rawQuery = "";
        var cfQuery = event.request.querystring;
        for(var prop in cfQuery){
            rawQuery += `${prop}=${cfQuery[prop].value}&`;
        }
        
        if(rawQuery.endsWith('&')){
            rawQuery = rawQuery.substring(0, rawQuery.length - 1);
        }
        
        if(rawQuery){
            newUrl += `?${rawQuery}`;
        }
        
        var response = {
            statusCode: 301,
            statusDescription: 'Moved Permanently',
            headers:{ 
                    "location": { "value": newUrl }
                }
        };

        return response;
    }
    
    return event.request;
}
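
One thing to watch in the function above: the loop only reads `.value`, so a parameter that appears more than once in the URL loses its extra values. The CloudFront Functions event exposes those under `multiValue`. Below is a sketch of a helper that keeps them, written in the same `var`-only style; the helper name is hypothetical, and it assumes the values can be copied as-is without re-encoding.

```javascript
// Hypothetical helper: rebuilds the raw query string from the
// event.request.querystring object, keeping repeated parameters.
function buildQueryString(querystring) {
    var parts = [];
    for (var name in querystring) {
        var entry = querystring[name];
        // multiValue, when present, lists every value including the first one.
        var values = entry.multiValue ? entry.multiValue : [entry];
        for (var i = 0; i < values.length; i++) {
            parts.push(name + '=' + values[i].value);
        }
    }
    return parts.join('&');
}

// Sample shaped like a CloudFront Functions event (values are made up).
var sample = {
    memory: { value: '32' },
    color: { value: 'black', multiValue: [{ value: 'black' }, { value: 'red' }] }
};
console.log(buildQueryString(sample)); // "memory=32&color=black&color=red"
```

It could replace the `for` loop and the trailing `&` cleanup in the handler, something like `var rawQuery = buildQueryString(event.request.querystring);`.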

Use the test functionality to verify that it works for your required cases, then associate the function with your desired distribution.