On the Blog
Using Apache Airflow and the Snowflake Data Warehouse to ingest Flume S3 data
Do you use Apache Flume to stage event-based log files in Amazon S3 before ingesting them in your database? Have you noticed .tmp files scattered throughout S3? Have you wondered what they are and how to deal with them? This article describes a simple solution to this common problem, using the Apache Airflow workflow manager and the Snowflake Data Warehouse.
Data Integrity Goal
Your goal is to ingest each event exactly once into your analytic database during ETL (extract-transform-load). You do not want to leave any events behind, nor do you want to ingest any event more than once. Otherwise, your event counts will be wrong. If we assume that any particular event is in exactly one log file, the goal becomes ingesting each log file exactly once. At Sharethrough, we have seen that this data integrity goal cannot be met without dealing with those darn
So You Want To Build a Keyboard
My journey begins with me already owning two keyboards, one for work and one for home. I was by all means already pushing the boundaries of minimalism. But then I saw The WhiteFox and I knew I simply had to have it in my life. And the only way I could justify getting a third keyboard (and get the...
Increasing Your Happiness in Meetings
Engineers moving into leadership sometimes have a hard time navigating the increased demands for meetings. And when meetings waste your time, or go off the rails, it hurts your team’s productivity and even affects their ability to work together. How can you make meetings better? In her talk, “Three Ways to Grow Your Happiness in Meetings,” Marcy Swenson offers great strategies for making meetings vital and worthwhile.
Humans are inconsistent and unpredictable by nature. This can come in the form of conflicting requests from upper management or from engineers who want to change teams only to want to quickly change again. Our job as leaders is to accept that fact and deal with the consequences.