Mastering Data Engineering: Insights and Best Practices
Introduction
I have been working with data for a bit over 17 years now, and I have seen it evolve from its nascent stages to a cornerstone of the tech industry. The journey has been nothing short of revolutionary, impacting businesses and society at large. The role of a data engineer has expanded along the way, requiring not just technical skills but a deep understanding of business, security, and the human element within technology.
I wanted to share what I’ve learned; ideally, it’s a guidebook for those embarking on or navigating through their data engineering journey. Whether you’re a junior engineer looking to carve your path, a senior engineer seeking to deepen your expertise, or a mentor guiding the next generation, the lessons shared here are meant to inspire, challenge, and perhaps even change the way you view your role within the data engineering landscape.
1. Everyone is a Leader
When we think about leaders, we tend to think about CEOs, managers, maybe politicians and coaches. However, we don’t always think about ourselves.
Leadership, in my view, is:
- Volunteering at the local school.
- Speaking encouraging words to a friend.
- Holding the hand of a dying parent.
- Signing up for your school council/committee.
- Driving the single parent’s kid home after practice.
- Taking care of yourself, and empowering others to do the same.
I don’t think leadership is a position in a company; it’s an inherent power we should all step up and claim.
I learned this a bit late in my career, when I was encouraged by a few senior leaders to lean into solving problems. Initially, it was something simple, like helping the team build trust with each other and unlocking each team member’s skills and the value they brought to the team. It meant making people feel more comfortable speaking up, and making them feel included and heard.
Then it evolved into doing the same at a larger scale. I saw various leaders create complex plans, guide and mentor other staff, and create and collaborate on policies and procedures. All of this happened because a leader made it happen. These leaders weren’t managers; they were just regular staff. However, their passion and ownership led to it happening.
When this is unlocked correctly, it can have remarkable power within the organization.
2. Show your work
You don’t have to be a genius. Ideas are normally formed through collaborative effort, but for that collaboration to surface you need to be brave enough to share your ideas. Once you share them, collaboration starts to form. Others come and add to your idea, and the output is often better for it.
I was working on an idea for a problem with a Slowly Changing Dimension table: have it work off an insert-only mechanism, avoiding the update component. The idea made sense in my head, so I shared it with some of the engineers in my squad. They took the idea, improved on it, and created a working version that we were able to implement. I was so proud of the team and what they achieved from that initial idea. In moments like this you don’t need to know all the answers; having the initial seed idea and talking about it gives others the opportunity to improve it.
I think there are a lot of other dimensions to showing your work, even if you share just a little bit each day. It improves your relationships with others (via collaboration), and it helps you get used to feedback and criticism. It also improves your ability to tell stories and persuade others, which can pay dividends in the future.
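To make the insert-only idea concrete, here is a minimal sketch. It is not the version the squad built; the CustomerVersion record, the city attribute, and the function names are all hypothetical. The assumption is that each change is appended as a new row and the "current" row is derived at read time, so no existing row ever needs an update.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    """One immutable version of a dimension row (hypothetical schema)."""
    customer_id: int
    city: str
    effective_from: date


def version_as_of(history: List[CustomerVersion], customer_id: int,
                  as_of: date) -> Optional[CustomerVersion]:
    """Return the latest version that was effective on or before as_of."""
    candidates = [v for v in history
                  if v.customer_id == customer_id and v.effective_from <= as_of]
    return max(candidates, key=lambda v: v.effective_from, default=None)


def current_version(history: List[CustomerVersion],
                    customer_id: int) -> Optional[CustomerVersion]:
    """The current row is simply the latest version; nothing is flagged or closed."""
    return version_as_of(history, customer_id, date.max)


def record_change(history: List[CustomerVersion], customer_id: int,
                  city: str, effective_from: date) -> List[CustomerVersion]:
    """Append a new version only when the tracked attribute actually changed."""
    current = current_version(history, customer_id)
    if current is not None and current.city == city:
        return history  # no change, so nothing is inserted
    return history + [CustomerVersion(customer_id, city, effective_from)]
```

The trade-off in this sketch is that end dates and "is current" flags are computed when reading rather than stored, which is what removes the need for updates.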
I feel adopting this early in my career really helped shape me as a person.
3. Make failure your fuel
I think that I, and people I have worked with, can sometimes be consumed by the thought that failing will be terrible. We need to stop worrying about “What if I fail?” and instead promise, “If I fail, I will stick around and fix it.”
I remember one day at work when I was leading a team. The team was working on an incident where a table was receiving duplicate records. It was a rather simple fix: remove the duplicates and fix the root problem (in this case a recently released join that was causing the issue). The table was rather large, with around 250,000,000 new rows per month and 7+ years of data. However, the impact was limited to just 1-2 days of data (the most recent records).
The team and I walked through what we needed to fix, and the steps and actions required. I logged off for the day, as they were comfortable with implementing the change. A few hours later I got a message: “All the data is gone”. I logged on and had a quick call with the team, thinking in my head that they were likely overreacting. We walked through the implementation, and it all looked correct from a high level, but all the data in the table was missing. Instead of deleting the duplicate records, the script had been written to delete all the non-duplicate records.
I felt a strong, overwhelming sickness as the reality of what had happened hit me. I double-checked the data with a query to make sure I was seeing things correctly. In the moment I felt that I was going to get fired for this. However, I stuck with it and connected with all the relevant people. In the previous 7 years we had never had an incident like this, as we typically have all the right checks and procedures in place to prevent it from happening.
The other managers and tech partners were amazing. After getting on a crisis call, we managed to restore the table from a backup with about 2 hours of downtime, and all the data was restored. I was impressed with everyone around me. At no stage did anyone blame anyone; it just fuelled us to get it right and get it fixed.
After this incident we did a full Post Impact Review to figure out how we needed to tighten up controls to prevent these things from happening. But more broadly, it opened my eyes to the fact that it’s OK to fail, as long as you stick around and fix it. In the years since, nothing as bad as this has happened, but I have had team members, management peers, and senior leaders who have had failures along the road. Being there and supporting them in the moment is so important, and letting the failure fuel you to prevent it from happening again matters far more than blaming.
I feel that having failures makes us stronger; it allows us to grow. However, the key is that it needs to fuel that growth. If you have the same failures over and over, then something is amiss.
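As an illustration of the kind of control that could catch an inverted delete like the one in this story, here is a minimal sketch. It is not the control we adopted after the review; plan_dedup_delete, key_fields, and max_delete_fraction are hypothetical names. The idea is to work out which rows would be deleted first, and refuse to continue if that is a larger share of the table than expected.

```python
def plan_dedup_delete(rows, key_fields, max_delete_fraction=0.01):
    """Return the indexes of duplicate rows to delete, keeping the first of each key.

    Raises ValueError if the plan would remove more than max_delete_fraction
    of the rows, which is the signal that the filter may be inverted.
    """
    seen = set()
    to_delete = []
    for index, row in enumerate(rows):
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            to_delete.append(index)  # a repeat of a key we already kept
        else:
            seen.add(key)            # first occurrence is kept

    if rows and len(to_delete) / len(rows) > max_delete_fraction:
        raise ValueError(
            f"refusing to delete {len(to_delete)} of {len(rows)} rows; "
            f"limit is {max_delete_fraction:.0%}"
        )
    return to_delete
```

Because the function only returns a plan, it can be run as a dry run and reviewed before anything is actually removed.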
4. Ownership and Accountability
This follows on from the previous point. When things go wrong, it is always easy to find someone else to blame, or to put it down to bad luck or circumstances beyond your control. However, when you blame others, who solves the problems? No one. So the problems are still there, and they get worse.
However, if you examine what more you could have done and implement a solution, then problems get solved. This is what ownership is all about.
Great teams are constantly looking to improve performance, add capability, and push the standards higher. They aren’t satisfied with their performance.
- But my team isn’t qualified to do the job. OK, what can I help with to improve their skills?
- My boss doesn’t like me, I will never get promoted. OK, what can I do to improve that relationship?
- My coach doesn’t put me in the game. OK, what can I do to prove myself?
- I have knuckleheads in my team. OK, why are they there? Have you trained them?
I have a rather great mentor who I’m still constantly learning from, and this was one of the skills that landed with me. You need to be humble, admit your mistakes and failures, and create solutions, but also take ownership of those solutions. You need to give ownership to others, and encourage and empower them to lead and plan. Listen to everyone. Make sure everyone is clear as to “why” they are doing what they are doing. Train your team, and give them the skills to accomplish the project.
This isn’t something that I have mastered; it’s more “in development”. I think you also need the right people around you to help you take a zoomed-out view and get perspective on things. It’s like chess: sometimes we are so close to the board as the pieces are being moved around, when really we need to take a zoomed-out view of the board to understand what is happening. Remove the emotions, get to the root problem, and ask how you can lean in to resolve it.
5. Data Resilience and Quality
As our data assets grow, and in particular as our teams grow, the number of people collaborating on them scales as well. Naturally, more people working on the codebase means that someone can implement something that will impact the code or pipeline you built 3-5 years ago.
The most important elements for the consumers of the data are that the quality is correct, and that the pipeline is fault tolerant: it can either resolve issues automatically or alert you when something isn’t correct. You don’t want to be in a position where you’re overwhelmed with alerts or messages, but you want the ability to be proactive about issues and understand them.
One of the most common scenarios is when a source team that you’re pulling data from changes their schema: they add a column, stop populating a column, or a column that was a string is now a decimal. You can have dependency agreements or contracts in place, but it can still occur and catch you off guard.
So within reason, we want to be able to confirm:
- row counts or trend distributions in our output match the source.
- data types are correct.
- data population is within the expected bounds, categories or thresholds, including whether there are nulls or blank values.
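The checks in this list can be sketched as a single function that returns a list of failures. This is a minimal sketch under assumptions: rows arrive as dictionaries, and check_batch, expected_types, allowed_values, and max_null_fraction are hypothetical names rather than part of any particular data quality tool.

```python
def check_batch(rows, source_row_count, expected_types,
                allowed_values=None, max_null_fraction=0.0):
    """Return a list of human-readable quality failures (empty means clean)."""
    allowed_values = allowed_values or {}
    failures = []

    # 1. Row count in the output should match the source.
    if len(rows) != source_row_count:
        failures.append(f"row count {len(rows)} != source {source_row_count}")

    for column, expected_type in expected_types.items():
        values = [row.get(column) for row in rows]
        populated = [v for v in values if v is not None and v != ""]

        # 2. Nulls and blanks should stay under the agreed threshold.
        missing = len(values) - len(populated)
        if rows and missing / len(rows) > max_null_fraction:
            failures.append(f"{column}: {missing} null or blank values")

        # 3. Populated values should have the expected data type.
        wrong_type = sum(1 for v in populated if not isinstance(v, expected_type))
        if wrong_type:
            failures.append(
                f"{column}: {wrong_type} values are not {expected_type.__name__}")

        # 4. Categorical columns should only contain known categories.
        allowed = allowed_values.get(column)
        if allowed is not None:
            unexpected = sorted({str(v) for v in populated if v not in allowed})
            if unexpected:
                failures.append(f"{column}: unexpected values {unexpected}")

    return failures
```

Returning every failure at once, rather than stopping at the first one, gives one summarised alert per batch instead of a stream of individual messages.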
We can expand on the list, but we need to make sure we understand the quality of what we have. This allows us to keep trust with our consumers, by being proactive - but it also allows us to resolve the issues quickly.
Additionally, we want the ability to make our data resilient to change. A classic example is the division by zero error. We want controls in place to prevent an issue like that from occurring, or to handle it when it does. We can also look at more advanced mechanisms for handling late-arriving data or data that is out of sequence, and for preventing duplicates when processes are re-run. When there is an incident, we want things to be foolproof. We don’t want people thinking about 100 different scenarios when there is pressure to resolve data contamination or errors.
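Two of these controls can be sketched in a few lines. This is a minimal sketch under assumptions, with safe_divide and load_batch as hypothetical names and a dictionary standing in for the target table: a division that returns a default instead of failing, and a load keyed on a business key so that re-running the same batch does not create duplicates.

```python
def safe_divide(numerator, denominator, default=None):
    """Divide, returning default instead of raising when the denominator is zero."""
    if denominator == 0:
        return default
    return numerator / denominator


def load_batch(target, batch, key_field):
    """Load rows into target keyed by key_field.

    Writing by key makes the load idempotent: running the same batch
    twice leaves the target in the same state as running it once.
    """
    for row in batch:
        target[row[key_field]] = row
    return target
```

With a keyed load like this, the instruction during an incident can simply be "re-run the job", without first working out what was already loaded.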
6. Building Relationships
I’m not a social person; I love grabbing a book, curling up by a fireplace and having a good read. However, building relationships with peers, subordinates or people further up the chain is critical. At a minimum, you should try to actively catch up and talk with these people. However, you get a better return on investment by working closely with them on something.
If it is someone within your team, then work with them through shadowing, mentoring or brainstorming on items together, or offer to help with something for the team that no one likes doing but that needs to get done. These micro-transactions will allow you to build trust and relationships quickly (far more quickly than a coffee catch-up each fortnight).
If it is a peer, then it might be looking at opportunities to collaborate and deliver something together. If their team is working on a high-value item for the organization, then maybe it’s a matter of leaning in to assist.
If it is a manager or leader above you, maybe it means taking some accountability off them and offering to assist. Maybe there is a meeting coming up next week that you could chair, or maybe you could take notes. Maybe there is an action item you can follow up on.
The idea here is that making someone else’s day easier allows you to build trust with that person and improves the relationship. You will both be seen as having a shared objective, and people will organically come to know that you have their back and that they can count on you when things get hard. They will also learn how you react to things and what skills or value you bring to the table, and will know how you can help when needed.
7. Plan, but don’t over plan
When we work on any data project, it’s important to plan out what needs to be done. I have seen it all:
- Teams that don’t plan at all, and none of the team members know what to do.
- Teams that run into planning paralysis, and keep refactoring the plan and delaying the start until they have a solid foundation that they can deliver from.
I think the sweet spot is in between. You need something that lets everyone know what direction they are heading in and what needs to be done. However, you don’t need a detailed plan of every join, business rule or piece of logic. If there is uncertainty, then just time-box the effort spent on those items.
8. Create your second brain
I don’t know about you, but my brain has limited storage. One thing I find useful is having a git repository of:
- Common Queries
- Common Questions Asked
- Snippets of code, like a test harness for testing scenarios on a test data set, or implementations of certain code blocks, Spark jobs etc.
- Performance and incident queries used to quickly understand or interrogate logs
To add a layer to this, especially around the common questions, you could give the list to an LLM like ChatGPT and ask it to formulate an email response for you, enabling you to answer questions quickly. If you had a bit more time, you could also incorporate that into a chatbot on the Slack channel.
The idea here is about having knowledge that is quickly accessible, so it doesn’t take a lot of your time. Sometimes you might know the answer, but if it takes 30 to 60 minutes to write up a query or an explanation, then getting that time back in your day delivers value.
You can also layer this onto the “Show your work” point mentioned above, to make sure you’re building a shared knowledge base with your peers.
These are some of the guiding lights for me, the important elements that have stuck with me. Hopefully, you find some of them useful as well.