
Kinesis 101

What is streaming data

Streaming data is data that is generated continuously by thousands of data sources, which typically send in data records simultaneously and in small sizes (on the order of KBs). Examples include:

  • Purchases from online stores (think amazon.com)
  • Stock prices
  • Game data (as the gamer plays)
  • Social network data
  • Geospatial data (think uber.com)
  • IoT sensor data

What is Kinesis

Kinesis is an AWS platform that you send your streaming data to. It makes it easy to load and analyze streaming data, and it also lets you build your own custom applications for your business needs.

Three Core Kinesis Services:

  • Kinesis Streams
    • Producers (e.g. EC2 instances, mobile devices, laptops) produce data, which is captured by Kinesis Streams. By default the data is retained for 24 hours, but you can increase this to up to 7 days (AWS has since raised the maximum retention to 365 days). The data is stored in units called "shards". Once the data is in a shard, a fleet of EC2 instances (called consumers) reads it from the shards and processes it. When processing is complete, the consumers can forward the results to services such as DynamoDB, S3, EMR, and Redshift for storage.
    • Streams consist of shards. Each shard supports up to 5 read transactions per second, up to a maximum total read rate of 2 MB per second, and up to 1,000 records per second for writes, up to a maximum total write rate of 1 MB per second (including partition keys).
    • The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards.
  • Kinesis Firehose
    • As with Streams, producers send data, and Firehose captures it. You don't manage shards or streams; it's completely automated, and you don't need to run consumers to pull the data out. An analysis layer (using Lambda) is optional. There is also no retention period to worry about: once the data is received and optionally analyzed, it is delivered straight to S3 (Firehose can also deliver to other destinations such as Redshift and Elasticsearch).
  • Kinesis Analytics
    • Lets you run SQL queries against data as it flows through Firehose or Streams, and store the query results in S3, Redshift, or an Elasticsearch cluster.
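Because a Kinesis stream's capacity is simply the sum of its shards' limits, sizing a stream comes down to a little arithmetic. Below is a minimal Python sketch of that calculation, assuming the per-shard limits from the Kinesis Streams notes, 1 MB = 1,000 KB, and that every consumer reads the whole stream through the shared per-shard read throughput (no enhanced fan-out). `Workload`, `stream_capacity`, and `required_shards` are hypothetical names for illustration, not part of any AWS SDK.

```python
import math
from dataclasses import dataclass

# Per-shard limits from the Kinesis Streams notes (shared-throughput reads assumed).
SHARD_READ_TX_PER_SEC = 5
SHARD_READ_MB_PER_SEC = 2
SHARD_WRITE_RECORDS_PER_SEC = 1_000
SHARD_WRITE_MB_PER_SEC = 1


@dataclass
class Workload:
    """Hypothetical description of the traffic a stream must handle."""
    write_records_per_sec: float  # records/s summed over all producers
    avg_record_kb: float          # average record size, partition key included
    consumers: int                # applications that each read the whole stream


def stream_capacity(shards: int) -> dict:
    """A stream's capacity is the sum of its shards' capacities."""
    return {
        "read_tx_per_sec": shards * SHARD_READ_TX_PER_SEC,
        "read_mb_per_sec": shards * SHARD_READ_MB_PER_SEC,
        "write_records_per_sec": shards * SHARD_WRITE_RECORDS_PER_SEC,
        "write_mb_per_sec": shards * SHARD_WRITE_MB_PER_SEC,
    }


def required_shards(w: Workload) -> int:
    """Smallest shard count whose summed limits cover the workload.

    Read transactions are not modelled: they depend on how often each
    consumer polls, so only record counts and data volumes are checked.
    """
    write_mb = w.write_records_per_sec * w.avg_record_kb / 1_000  # assumes 1 MB = 1,000 KB
    read_mb = write_mb * w.consumers  # every consumer reads every record
    return max(
        1,
        math.ceil(w.write_records_per_sec / SHARD_WRITE_RECORDS_PER_SEC),
        math.ceil(write_mb / SHARD_WRITE_MB_PER_SEC),
        math.ceil(read_mb / SHARD_READ_MB_PER_SEC),
    )


if __name__ == "__main__":
    demand = Workload(write_records_per_sec=3_000, avg_record_kb=2, consumers=2)
    n = required_shards(demand)
    print(n, stream_capacity(n))
```

For example, 3,000 records per second of 2 KB each, read by two consumers, needs 6 shards under these assumptions: the 1 MB/s write limit (6 MB/s of writes) and the 2 MB/s read limit (12 MB/s of reads) both bind before the 1,000 records/s limit does, which alone would only require 3.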

Exam Tips

  • Know the difference between Kinesis Streams and Kinesis Firehose. You will be given scenario-based questions and must choose the most relevant service.
  • Understand what Kinesis Analytics is.