Shipping Logs with Amazon SQS and Logstash

What is Amazon SQS?

Amazon SQS (Simple Queue Service) is a managed message queue supporting both FIFO (First-In First-Out) and standard (unordered) queues for any type of message up to 256 KB each. It can be used both inter- and intra-region, and it holds messages for you until they're processed. This lets us send log messages wherever we want and consume them at a frequency of our choosing. It also allows log processing to be stopped temporarily without losing messages in the meantime, as would happen with direct log outputs. You can set how long messages are retained, monitor queue statistics, and set up custom CloudWatch alerting around those queues to let you know if something gets out of whack. Costs are very low per message, but as with everything AWS, run your use case through some quick calculations to make sure you're not going to see a large bill later. At the time of this writing, the cost was about $0.04 per million records, so even a billion records a month works out to roughly $40.

What is Logstash?

[Logstash](https://www.elastic.co/guide/en/logstash/current/introduction.html) is "an open source data collection engine with real-time pipelining capabilities." It accepts a myriad of inputs and outputs, letting us quickly spin up both a producer (to push to) and a consumer (to pull from) of our SQS queue. The only current negative with using Logstash is that it does not support FIFO queues, which may or may not be an issue for you. We'll talk about the differences shortly.

FIFO vs. Standard Queues

There are two options for Amazon SQS queues: FIFO and Standard. FIFO stands for First-In First-Out: a message goes into the queue on one side and comes out the other side exactly once and in order. Unfortunately, at the time of this writing, Logstash doesn't support this queue type, so we're stuck with Standard. To deal with that, your end application must be okay with out-of-order insertions and possible duplicate documents; one way to handle the duplicates is shown in the consumer example later on.

Creating an SQS Queue

As usual, Amazon makes spending money easy. Simply go to https://aws.amazon.com/sqs and hit the shiny "Get started" button, or sign into the AWS console and choose SQS from the massive "Services" drop-down. You'll then get the opportunity to set up a queue by choosing a name you like. It has to match the requirements Amazon lists out (currently alphanumerics, underscores, and dashes). Once you've got a great name picked out, you can either click "Create Queue", or hit the "Configure Queue" button to check the default settings and tweak them where necessary. For this example, we'll just roll with the defaults, as they're mostly sane.

Setting up IAM Permissions

In our case, we're going to be running our Logstash processes in Amazon EC2. This gives us a few choices for authenticating to the queues we set up. We can either pass the Access Key/Secret in via the Logstash config, or we can use IAM roles on the nodes that are running the Logstash processes. IAM roles are generally the preferred method, as they're specific, managed completely within the cloud, and don't require rotation/storage of an Access Key/Secret, but if you're not doing this all within EC2, I'll go into where to inject those pieces when we talk about the Logstash configuration.

For our IAM configuration here, we'll want to head on over to the IAM section of the AWS Console and find our EC2 instance's role under Roles. If you don't have IAM roles set up for your EC2 instances, I recommend it. It's a good way to make sure that all of your matching instances have a consistent permission setup.
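
If you're creating a role from scratch, it needs a trust policy that lets EC2 assume it. Choosing the EC2 role type in the console sets this up for you, but for reference, the standard EC2 trust relationship looks like this (nothing here is specific to our example):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}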

You'll need to add the following block to a new IAM role, or add the { ... } block inside the Statement section of an existing one (with a comma, of course, to keep the JSON valid):

{
    "Version": "2012-10-17",
    "Statement": [ 
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "sqs:GetQueueUrl",
                "sqs:SendMessageBatch",
                "sqs:SendMessage"
            ],
            "Resource": "arn:aws:sqs:*:*:*"
        }
    ]
}
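
The statement above only covers the producer (send) side. For the consumer we set up below, the role also needs the receive-side permissions that the Logstash sqs input plugin's documentation lists. A matching statement looks something like this (the Sid is arbitrary):

{
    "Sid": "VisualEditor1",
    "Effect": "Allow",
    "Action": [
        "sqs:ChangeMessageVisibility",
        "sqs:ChangeMessageVisibilityBatch",
        "sqs:DeleteMessage",
        "sqs:DeleteMessageBatch",
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:ListQueues",
        "sqs:ReceiveMessage"
    ],
    "Resource": "arn:aws:sqs:*:*:*"
}

As always, consider scoping the Resource down to your specific queue's ARN rather than the wildcard shown here.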

Logstash configuration for pushing data into SQS

The following block in our Logstash config (generally found in /etc/logstash/conf.d/logstash.conf) will accept input from Beats (part of the ELK stack) and shove it into the SQS queue. You can use any valid Logstash input in that section!

# Read it in from the local beats endpoint...
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}

output {
  # Push messages to our SQS queue
  sqs {
    id => "Our-Example-Logging-Queue-Id"
    codec => "json"
    queue => "Example-Logging-Queue"
    region => "us-east-1"
  }
}

If you need to use the Access Key/Secret method of authentication for the queue, you'll add the fields access_key_id => "<access_key_id>" and secret_access_key => "<secret_access_key>" within the sqs block. Similarly, you'll need the same pair below to get the data back OUT of SQS.
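
For example, here's the same output block with the credential fields added (a sketch with placeholder values; fill in your own key pair):

output {
  # Push messages to our SQS queue, authenticating with an
  # Access Key/Secret instead of an IAM role
  sqs {
    id => "Our-Example-Logging-Queue-Id"
    codec => "json"
    queue => "Example-Logging-Queue"
    region => "us-east-1"
    access_key_id => "<access_key_id>"
    secret_access_key => "<secret_access_key>"
  }
}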

Logstash setup for pulling data out of SQS

input {
  # Pull messages off of our SQS queue
  sqs {
    queue => "Example-Logging-Queue"
    id_field => "sqs_id"       # store the SQS message ID on each event
    polling_frequency => 5
    region => "us-east-1"
    threads => 4
  }
}

output {
  ...
}
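
Because a Standard queue can deliver the same message more than once, it's worth making that output idempotent where you can. As one sketch (assuming you're ultimately indexing into Elasticsearch; the hosts and index name here are placeholders), you can key each document on the SQS message ID we captured in id_field above, so a duplicate delivery overwrites the same document instead of creating a second one:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    # Re-use the SQS message ID as the document ID so duplicate
    # deliveries update the same document rather than adding a new one
    document_id => "%{sqs_id}"
  }
}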

Summary

SQS is a great way to get data from point A to point B when you don't have any presence in between. It can queue up millions of records for you to relieve strain caused by inconsistent insertion load, enable timed batch processing, or simply move data between AWS regions or on-prem datacenters.