Articles in this series
You've heard of DevOps, but have you heard of DataOps before? Let's dig in and unravel the vast world of data management. · Background The volume of data...
Apache Spark: A big data processing API · Introduction Big data workloads are processed using Apache Spark, an open-source distributed processing engine....
A technical introduction to the Data Build Tool (DBT) data management platform, its features, and its integration capabilities. · Background Three key...
Apache Kafka: A scalable, adaptable real-time message broker. · Introduction Technically speaking, event streaming is the practice of capturing data in...
Apache Airflow: A powerful workflow orchestration platform · Introduction Apache Airflow is an open-source platform for authoring, scheduling, and...
A practical approach toward learning data science with the help of PySpark. Part 1: RDDs and DataFrames. · Overview In a previous article, we covered the...