Understanding Spark

Spark is a cluster computing framework for parallel and distributed processing of massive amounts of data. In a way, it replaces MapReduce, yet it is not as simple.

A better Hadoop

Hadoop MapReduce is effective at processing huge amounts of data. It divides the data into many small parts, processes them locally and in parallel on a cluster, and produces an output. … Continue reading Understanding Spark
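The divide, process in parallel, and combine pattern described above can be sketched in plain Python. This is only an illustration of the idea, not Hadoop's or Spark's API: the function names (`split_into_chunks`, `map_chunk`, `word_count`) are hypothetical, and a thread pool stands in for the nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(items, chunk_size):
    """Divide the input into small, independent parts."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def map_chunk(lines):
    """Map step: count the words of one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, chunk_size=2, workers=4):
    """Count words by mapping over chunks in parallel, then merging."""
    chunks = split_into_chunks(lines, chunk_size)
    # Assumption for illustration: threads play the role of cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the partial results into a single output.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


lines = ["spark and hadoop", "hadoop mapreduce", "spark streaming"]
result = word_count(lines)
print(result.most_common())
```

Each chunk is processed without knowledge of the others, which is what makes the map step easy to distribute; only the final merge needs to see all partial results.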

Kafka core concepts

Kafka is a messaging framework for building real-time streaming applications. It lets you build distributed publish-subscribe systems. In this article we present the core concepts of the framework.

APIs

One way to start with Kafka is to understand its APIs. There are four of them: the Producer API lets applications publish records to a … Continue reading Kafka core concepts
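To make the publish-subscribe idea concrete, here is a minimal in-memory sketch. It is not Kafka's API and ignores everything that makes Kafka distributed (partitions, replication, consumer offsets); the `InMemoryBroker` class and the topic names are hypothetical, chosen only to show publishers and subscribers decoupled by a named topic.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy broker: keeps a per-topic log and notifies subscribers."""

    def __init__(self):
        self._log = defaultdict(list)          # topic -> records, in order
        self._subscribers = defaultdict(list)  # topic -> callbacks

    def subscribe(self, topic, callback):
        """Register a callback invoked for every new record on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, record):
        """Append the record to the topic log and fan it out to subscribers."""
        self._log[topic].append(record)
        for callback in self._subscribers[topic]:
            callback(record)


broker = InMemoryBroker()
received_a, received_b = [], []

# Two independent subscribers on the same topic both get every record.
broker.subscribe("clicks", received_a.append)
broker.subscribe("clicks", received_b.append)

broker.publish("clicks", {"user": 1})
broker.publish("views", {"user": 2})  # no subscriber on this topic

print(received_a, received_b)
```

The publisher never refers to a subscriber directly; it only names a topic, which is the decoupling a publish-subscribe system provides.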