Kafka Internals
Prerequisites:
Basic Java experience (the course can be adapted for a C# or other language environment)
Familiarity with command line interface
A Mac or Linux operating system is preferred
Course Objectives:
Gain a deep understanding of Kafka's internal architecture and components.
Master the use of producers, consumers, and brokers for data streaming.
Build real-time data pipelines using Kafka Connect.
Effectively leverage Kafka developer APIs to interact with the platform.
Course Length: 3 days
Course Content:
Module 1: Introduction to Kafka Internals
Kafka Fundamentals:
Overview of Kafka as a distributed messaging system for real-time data processing.
Key terminology used in Kafka (topics, partitions, replicas, brokers, consumers, producers).
Kafka Architecture (Deep Dive):
Logical Architecture: Understanding the abstract components and their interactions.
Physical Architecture: Exploring the real-world implementation of Kafka components.
Hands-on Labs: Configuring and deploying a local Kafka cluster.
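The local-cluster lab can start from a single-node KRaft broker. A minimal sketch using the scripts shipped with a Kafka 3.x download (paths assume you are inside the extracted Kafka directory; the topic name is illustrative):

```shell
# Generate a cluster ID and format the storage directory (KRaft mode).
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Start a single combined broker/controller node.
bin/kafka-server-start.sh config/kraft/server.properties

# In a second terminal: create a test topic and inspect its partitions.
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic lab-topic --partitions 3 --replication-factor 1
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic lab-topic
```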
Kafka Internals:
Brokers: Responsibilities and functionalities of brokers within the cluster.
Producers and Consumers: Mechanisms for sending and receiving data streams.
Partitions and Topics: Data organization and distribution within Kafka.
Replication Mechanism: Ensuring data availability and fault tolerance.
Message Delivery Semantics: Understanding different message delivery guarantees.
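How keyed messages spread across a topic's partitions can be illustrated with a simplified stand-in for the default partitioner. Note the hedge in the comments: Kafka's real default partitioner hashes the serialized key with murmur2; plain `hashCode()` is used here only for illustration.

```java
// Simplified sketch of keyed partition assignment: the same key always maps to
// the same partition, which is how Kafka preserves per-key ordering.
// NOTE: Kafka's actual default partitioner uses a murmur2 hash of the serialized
// key; hashCode() here is an illustrative simplification.
public class PartitionSketch {
    public static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is a valid index 0..numPartitions-1.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3;
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Because assignment depends only on the key and the partition count, increasing the number of partitions later changes which partition a key maps to, which is one reason partition counts are usually chosen up front.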
ZooKeeper/KRaft Integration:
Role of ZooKeeper (legacy) and KRaft (its replacement, production-ready from Kafka 3.3 onward) in cluster coordination and metadata management.
Exploring basic ZooKeeper/KRaft operations and how they interact with Kafka.
Module 2: Kafka Administration and Advanced Topics
Kafka Administration:
Key considerations for effective Kafka cluster management and monitoring.
Tools and techniques for monitoring Kafka cluster health and performance.
Kafka Security:
Implementing security measures like authentication and authorization for secure access.
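Client-side authentication and encryption are typically enabled through client properties. A sketch for SASL/SCRAM over TLS; the property keys are real Kafka client settings, while the broker credentials and file paths are placeholders:

```properties
# Encrypt traffic and authenticate with SASL (placeholder values throughout).
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="app-user" \
  password="app-secret";
# Truststore holding the CA certificate that signed the brokers' certificates.
ssl.truststore.location=/etc/kafka/client.truststore.jks
ssl.truststore.password=truststore-secret
```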
Performance Tuning:
Strategies for optimizing Kafka performance based on hardware and data volume.
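Tuning usually starts from a handful of client settings. A hedged example of throughput-oriented producer properties; the keys are real producer configs, but the values are illustrative starting points, not recommendations:

```properties
# Batch more records per request at the cost of a little latency.
batch.size=65536
linger.ms=10
# Compress batches to trade CPU for network and disk throughput.
compression.type=lz4
# Per-producer memory buffer for records not yet sent.
buffer.memory=67108864
```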
Kafka Integrations:
Building real-time data pipelines using Kafka Connect for data ingestion from various sources.
Hands-on Labs: Constructing an end-to-end streaming ETL pipeline utilizing Kafka Connect.
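A Kafka Connect pipeline is driven by connector configuration rather than code. A minimal sketch for the FileStreamSource connector that ships with Kafka; the connector name, file path, and topic name are placeholders:

```json
{
  "name": "file-source-demo",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "file-lines"
  }
}
```

Posted to a Connect worker's REST API (POST /connectors), this streams each new line of the file into the file-lines topic.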
Module 3: Kafka Developer APIs and Advanced Concepts
Kafka Core APIs:
Producer API: Deep dive into producer configurations, message delivery mechanisms (sync/async), and message acknowledgment strategies.
Consumer API: Understanding various message delivery semantics and configuration options for fine-grained control of consumer behavior.
Hands-on Labs: Working with producer and consumer APIs to send and receive messages in Kafka.
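Both client APIs are configured through java.util.Properties. A minimal sketch of typical settings: the config keys are real Kafka client options, while the broker address, group id, and values are illustrative, and the commented-out lines show where the real client objects would be created once the kafka-clients dependency is on the classpath.

```java
import java.util.Properties;

// Sketch of typical producer/consumer configuration (values are illustrative).
public class ClientConfigSketch {
    public static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        // acks=all waits for all in-sync replicas: strongest durability.
        p.setProperty("acks", "all");
        // Idempotence prevents duplicates caused by producer retries.
        p.setProperty("enable.idempotence", "true");
        p.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return p;
    }

    public static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        p.setProperty("group.id", "demo-group");
        // Manual commits give at-least-once control over offset progress.
        p.setProperty("enable.auto.commit", "false");
        p.setProperty("auto.offset.reset", "earliest");
        p.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return p;
    }

    public static void main(String[] args) {
        // With kafka-clients on the classpath these properties would be passed in as:
        // KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps());
        // KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps());
        System.out.println("producer acks=" + producerProps().getProperty("acks"));
    }
}
```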
Kafka Streams and KTables (Optional):
Introduction to Kafka Streams and KTables for building real-time stream processing applications.
KSQL (Optional):
Overview of ksqlDB for stream processing using SQL-like syntax.
Exploring features such as tables, global tables, and the REST API.
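To give a flavor of the SQL-like syntax, a hedged ksqlDB sketch; the topic, stream, and column names are made up for illustration:

```sql
-- Declare a stream over an existing topic (names are illustrative).
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- A continuously maintained count of views per user, backed by a table.
CREATE TABLE views_per_user AS
  SELECT user_id, COUNT(*) AS views
  FROM pageviews
  GROUP BY user_id
  EMIT CHANGES;
```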
Application Integrations (Optional):
Integrating Kafka with Spring Boot or Node.js applications for data ingestion and processing.
Note:
This is a sample outline, and the specific content covered may vary depending on the instructor and course provider.
If desired, the ZooKeeper content can be replaced entirely with KRaft.
Optional modules (KSQL, Application Integrations) can be included based on available time and student interest.
Additional Considerations:
While this course does not cover KSQL in depth, a basic introduction is included as an optional module.
Hands-on exercises are integrated throughout the course to provide practical experience with Kafka concepts and APIs.