To everyone out there who wants to become a data engineer: keep following this blog, as I am on the same path as you. I am interested in solving data challenges of any size, big or small. Exposure to many tools and technologies is nice to have, but what is a must is understanding the underlying concepts, the technical architecture, and the internals of a tool. We become better data engineers only by trying things out, learning something new, and gaining hands-on experience with new technology. Only when we know what each tool does, and the pros and cons of using it, can we select the right tool for the right problem. So I want to catalog everything I learn, in the hope that it helps someone out there who is on the same path as me. Just sharing :)
Primary skills needed to become a data engineer:
1. Programming skills (Java/Python/Scala)
2. Querying skills (SQL/HiveQL/Spark SQL)
3. ETL architectures (Batch/Streaming)
4. Data warehousing concepts / Database Design
5. Cloud computing (AWS/GCP/Azure)
6. Big Data (Hadoop/Spark)
7. Familiarity with scripting/automation - Python/Shell
Nice-to-have skills:
1. Version control tools (Git)
2. Automating deployments (Jenkins)
3. Writing efficient stored procedures and functions (SQL). Yes, I mean those hundreds of lines of SQL code
4. Tools (Databricks, Pentaho, Sqoop, Online Editors)
5. Building data lakes and DWHs (it really helps to build one using a traditional approach first and then try migrating it to the cloud).