Hacker News

Ask HN: Best resources, books, & courses on Data Engineering?

Hi everyone, I'm searching for resources for learning data engineering, but I can't find anything on Google because SEO spam is ruining the results.

My background is a Computer Engineering undergrad (lots of Java & Clojure, and an intro course to Hadoop), so I'm not starting from scratch. I want books/online resources that are up to date, and extremely practical from the perspective of someone doing big data in tech. E.g. Algorithms for Big Data, Hadoop, Spark, Kafka, Cassandra, Scala, etc.

I really appreciate all suggestions, and comments on the quality of others' suggestions.

16 points | elamje posted 11 days ago
3 Comments:
nw__dataeng said 11 days ago:

I'd highly recommend reading [Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications...). The book gives you a great overview of designing data systems - foundational knowledge you'll need in any DE role.

The reason you can't find data engineering materials online is that real data engineering only happens at a handful of companies, and those companies keep this knowledge internal and do not share it.

I noticed that you listed tools / frameworks to learn, as well as languages. Another piece of advice would be to not focus on those because they come and go (for example, Hadoop is pretty much deprecated in any DE-heavy company). What lasts is an understanding of distributed systems, distributed query engines, storage technologies, and algorithms & data structures. If you have a firm grasp on those, you won't have to start from scratch every time a new framework is introduced. You'll immediately recognize what problems the tech is solving and how it solves them, and based on that knowledge you can connect the dots and decide whether that solution is what you need.

Another thing to do is watch CS186 from Berkeley in its entirety. This course is about relational databases, but will give you the foundation you need to speak the DE language.

Source: I work as a data engineer at what some would call a big company :)

elamje said 10 days ago:

Great advice! I actually got that book last night as I researched more. I’ll be looking into the Berkeley class as well!

mindcrash said 10 days ago:

List of resources here:

https://github.com/adilkhash/Data-Engineering-HowTo

And here is a (free) book you might like:

https://github.com/andkret/Cookbook

"I get asked super often how to become a Data Engineer. That's why I decided to start this cookbook with all the topics you need to look into.

It's not only useful for beginners, professionals will definitely like the case study section."

Also +1 for Kleppmann's book (Designing Data-Intensive Applications) mentioned in this thread. That thing is awesome.