Real Time Framework on AWS using Kinesis,Lambda and DynamoDB

As we are in bigdata era, Organizations continuously produce data, We do batch processing ETL’s and place it in decision making system (ex: warehouse) as nice structured format.
If we want to obtain key insights and capitalize the opportunities as they occur, We can’t wait for batch processing to complete. Real-time processing enables us to process the event/transaction as it occurs.
One of the classic example of real-time processing is ‘Credit Card Fraudulent Detection’/’Credit Loan decision making’ .

Kinesis

AWS/GCP/Azure are offering real-time message broker systems to publish and subscribe events on real-time. We can consider Kinesis as AWS managed alternative to Kafka. It allows you to ingest high volumes of data(IoT,System Logs, Clickstream) per second and have the flexibility to scale automatically based on the demand.

Managing Kafka cluster itself is a complex task, The alternative is ‘Kinesis’ AWS manages it for us and we simply need to develop client applications to ingest and read data from Kinesis

DynamoDB

Fully Managed, Highly available No SQL distributed database. Scales to Massive workloads. It Can handle Millions of Requests per second and trillions of rows and petabytes of storage. Fast and consistency very low latency retrieval. Widely used in big data world for

Mobile apps
gaming
IoT sensors
Log Ingestion
E-Commerce Shopping carts

Would like to demonstrate real-time processing framework using AWS big data tool kit apps Kinesis,Lamda and DynamoDB.

Use-Case

AWS EC2 server produces Transaction logs with Customer-order transactions and ingests into Kinesis Data stream, On the other side whenever data occurs in Kinesis data stream , Lambda function picks up and pre-process/transforms into DynamoDB Table.

Steps

  1. Create Kinesis Stream

Create Kinesis Data Stream

Let’s logon to EC2 server and create data steam by using AWS CLI

Create Kinesis Data Stream

Let’s verify the Stream created or not in AWS console

The stream is ready. Now whenever an event occurs (let’s say customer made a transaction) EC2 server captures and writes to Kinesis data stream.
Let’s assume Customer-Orders transaction format

CustomerID|OrderID|Product|SaleValue|Department

Create DynamoDB table

A DynamoDB table must have primary Key, A primary Key can be defined in two ways

Primary Key = Partition Key
Or
Primary Key = Partition Key + Sort Key

Partition Key : Unique Value . Always define unique / high cardinal values column as Partition Key (Ex: UserId,DeviceId,CustomerID)
Sort Key : If we want to arrange values in an order inside a partition Key, then define Sort Key (Ex: Customer can make many orders-Transactions. So OrderID can de defined as Sort key. So all transactions are grouped and arranged by OrderID)

Create DynamoDB Table

Define Lambda Function

Every time a new event created in Kinesis Data stream would trigger AWS Lambda function. We can write any custom code that suites our requirement . In this case, It’s basic ETL to load Kinesis data stream into DynamoDB table.

The below code snippet is written Python. You can choose your favourite from NodeJs,Python,Ruby… Anytime my favourite Python :)

AWS Lambda ETL (Kinesis Data Stream To DynamoDB)

Ingest Data into Kinesis

There are many ways to ingest data into Kinesis data stream . Kinesis Producer Library(KPL), Kinesis Agent, Third party applications like Apache Spark

I am ingesting using Kinesis agent (AWS CLI)

Verify DynamoDB table

Kinesis Data Stream Records are successfully pre-processed and ingested into DynamoDB by Lambda function

Hope it gives you some idea on Real time frame work on AWS!!

Data Engineer, Big Data and Machine Learning Developer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store