GatewayNode

How I built this blog

The Generator

So this blog is created using a static site generator called Pelican (note: I've never used Pelican before, so I'm learning as I go here). It's a Python app, so if you have Python installed it's pretty straightforward to get it working. Just use pip to install it and run the quickstart command (Docs). One caveat you might run into is that themes and plugins need to be downloaded from GitHub and then linked into your website project in the pelicanconf.py file. The idea is that you write your content in a simple markup language like Markdown or reStructuredText, and when you run the generator it creates all the necessary HTML pages, CSS and JavaScript that you can host anywhere.
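For reference, the basic setup looks roughly like this. It's a sketch rather than my exact commands, and the theme/plugin repos shown are the official community collections rather than anything specific to this site:

# Install Pelican with Markdown support, then scaffold a new site
pip install pelican markdown
pelican-quickstart

# Themes and plugins live in separate GitHub repos and get cloned locally
git clone https://github.com/getpelican/pelican-themes
git clone https://github.com/getpelican/pelican-plugins

# Generate the site from ./content into ./output and preview it locally
pelican content
pelican --listen

The linking part happens in pelicanconf.py, where THEME points at whichever theme directory you cloned and PLUGIN_PATHS/PLUGINS point at the plugins you want enabled.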

The Hosting

No server side logic means we can do something really interesting and desirable called "serverless" hosting. Now I hate the term "serverless" (and the term "cloud" by the way), as they are just marketing abstractions: there are servers under there, and their security is still something you should be concerned about. But the end result is that we can create a blog that is very, very hard to hack and easy to maintain, without spending a lot of money.

So I'm using AWS as it happens to be the platform I really should be focused on for my day job, but there are lots of other perfectly acceptable alternatives like Microsoft Azure, Google Cloud and a bunch of others. So the site is generated on my laptop using Pelican, and then I just upload the files to Amazon's Simple Storage Service (S3).

Note

A lot of folks out there will then dive into how to turn an S3 bucket into a website using the bucket's built-in functionality, and this is plain wrong. You should never, ever, ever, ever turn your S3 buckets into websites using the bucket's "Static Website Hosting" option. You should always host websites on a CDN that sits in front of your content; in this case Amazon provides the CloudFront CDN. There are others as usual, like Cloudflare, Limelight and Akamai, but we are focusing on how to do this in an Amazon-native style. The reasons are numerous. Top among them is cost: you pay for S3 bandwidth at a higher rate than you do for CloudFront bandwidth, and you don't pay for the bandwidth between S3 and CloudFront. Security is also a big concern: S3 doesn't let you host a certificate for HTTPS on a custom domain, CloudFront does; you can't add any security headers in S3, CloudFront allows this; and your bucket name gets put in the URL, which is less than ideal. Functionally we also get Lambda@Edge for when we need something more like dynamic server side logic.

Operational Security (OpSec)

So starting with our operational security: all online accounts should be protected by two factor authentication if at all possible. If it's not possible, due to the hosting provider or tool you are using, change the hosting provider or the tool; this isn't something you should ever compromise on. Don't use your root AWS user to automate anything (for personal accounts on AWS, your login user is the root user for the account). You should create a delegate user (using AWS IAM) for console access that can only do what is absolutely necessary, and remove the rest of its permissions. Create an even more restricted user for the automated S3 bucket access, with only a little bit of access to that one bucket. There are a lot of ways to screw up AWS roles and permissions, so I'm going to try to avoid that by very carefully setting things up.

So let's start with how I set up the pipe from my local machine to S3. I created an S3 bucket with no permissions. I created an IAM user with no permissions. Then I created a bucket policy on the website S3 bucket that grants that IAM user just the permissions needed to list, load, update and delete content. Nothing else: just 4 actions enabled out of the 80 or so that S3 supports. The policy looks something like this:

{
  "Version": "2018-06-23",
  "Id": "Policy000000000000",
  "Statement": [
      {
          "Effect": "Allow",
          "Principal": {
              "AWS": "arn:aws:iam::000000000:user/someusername"
          },
          "Action": "s3:ListBucket",
          "Resource": "arn:aws:s3:::somebucketname"
      },
      {
          "Effect": "Allow",
          "Principal": {
              "AWS": "arn:aws:iam::000000000:user/someusername"
          },
          "Action": [
              "s3:DeleteObject",
              "s3:GetObject",
              "s3:PutObject"
          ],
          "Resource": "arn:aws:s3:::somebucketname/*"
      }
  ]
}

Note

There is a quirk in AWS S3 policies you might have noticed here. There are two statements inside the Statement list because one applies to the bucket itself (for listing) and the other applies to the objects inside the bucket (for get, put and delete). The policy generator will fail you here, so be aware.
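For the record, the same setup can be sketched out with the AWS CLI along these lines. The bucket and user names are the same placeholders as in the policy, and I'm assuming the policy above has been saved locally as bucket-policy.json:

# Create the private bucket and the no-permissions IAM user
aws s3api create-bucket --bucket somebucketname --region us-east-1
aws iam create-user --user-name someusername

# Attach the bucket policy shown above (saved as bucket-policy.json)
aws s3api put-bucket-policy --bucket somebucketname --policy file://bucket-policy.json

# Generate the access keys used for the automated sync step later on
aws iam create-access-key --user-name someusername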

I use the AWS CLI for syncing, but by default Pelican seems to want to use something else. I'd recommend always using the AWS-provided native tools, as there is more reliability and support there (install with pip). Anyhow, I created a set of access keys for my user, placed them in ~/.aws on my laptop, and just run an aws s3 sync [source] [target] against the ./output directory inside the Pelican project dir. So there we go: no credentials in code, and access is limited to a least-privilege policy and user.
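Concretely, the publish step boils down to something like the following. The profile name is just what I happen to call it, and the --delete flag is a choice to remove files from the bucket that no longer exist locally:

# Install the AWS CLI and store the restricted user's keys under ~/.aws
pip install awscli
aws configure --profile blog-publish

# Push the generated site up from the Pelican output directory
aws s3 sync ./output s3://somebucketname --profile blog-publish --delete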

Making the Website go Live in Production

Three pieces of the puzzle are left to make the site in the S3 bucket public: set up a CloudFront distribution to read the S3 bucket (it will ask if it should set up the read permissions and user for you; usually just say yes to this and follow along); request an Amazon TLS certificate for CloudFront (this requires validation via DNS or email); and create an alias record in Route 53 to point domain name requests to the CloudFront distribution (you have to wait for CloudFront to deploy first). This might seem like a lot of work when there are plenty of blogging platforms that just require a sign-up, a few clicks and some forms filled out, and it is, but once it's working I'll rarely have to touch it. There may be a little fiddling with the caching settings in CloudFront, I'll probably set up two-factor CLI access for publishing to the S3 bucket, and auto-renewal on the TLS certificate will have to be turned on when it is available. But after those few post-launch tweaks I should have nothing to do but focus on writing articles. The best part: if I get tired of blogging and take a long break (like I do), I can be pretty certain that when I come back to it nothing needs to be done to continue.
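A couple of those steps have CLI equivalents worth jotting down. These are hedged examples rather than exact commands from my setup; the domain and distribution ID are placeholders, and note that certificates used by CloudFront have to be requested in the us-east-1 region:

# Request the TLS certificate CloudFront will use (must be in us-east-1)
aws acm request-certificate --domain-name example.com --validation-method DNS --region us-east-1

# After a sync, flush the CloudFront cache so new content shows up right away
aws cloudfront create-invalidation --distribution-id EXXXXXXXXXXXXX --paths "/*"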

Note

While serverless is great like this, the setup is usually done once, at length, and then forgotten... and it's the forgetting that gets me. I do some things so rarely in AWS that I forget how I did them by the next time I have to do them. Not the worst problem to have, honestly.

Ouroboros, you know the snake that eats itself...