Google Cloud Professional Data Engineer Certification — My personal road map and thoughts in 2020
[WITHOUT] Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.
Recently, I took and passed the Google Cloud Professional Data Engineer certification exam. It's a pretty tough exam, but also a valuable one. Here I will share my experience, exam preparation tips, and overall thoughts about the exam itself. I won't give you the questions or the exam answers.
This article will point out a few things you may want to know and the steps I took to acquire the Google Cloud Professional Data Engineer Certification.
1. My personal Road Map.
I started my journey in cloud and big data with IBM technologies such as IBM BigInsights and IBM Cloud. I followed all the free learning paths on Cognitive Class. There are a lot of amazing resources there, and I really encourage you to do the labs and pay attention to them, because that is where you dive into the details of the technologies. (Hint: resources take a long time to spin up in IBM Cloud, so let them be created while you study the lessons, then complete the lab.)
You can see in this link all the badges and paths that I earned last year in order to learn a bit about the main Apache Foundation projects.
Then I enrolled in a Google Cloud Partner Program with no particular ambition. I thought I would have to pay to get access to the courses and programs, but it turned out that the Google Partner program gave me full access to Coursera courses and Qwiklabs credits. So I started to study GCP with nothing in mind, just to learn something from one of the best data companies in the world.
One day I watched a Google webinar about a free exam voucher, so I did the challenge to get the US$200 voucher for the Data Engineer exam and, poof, I got a Data Engineer voucher. From then on I kept a much stricter study routine and got the most out of the Coursera courses. I did three specializations on the Coursera platform, two in data engineering and one in machine learning.
Those were the resources that I used specifically for GCP:
- Data Engineering, Big Data, and Machine Learning on GCP
- Data Engineering with GCP
- Machine Learning with TensorFlow on Google Cloud Platform
- Newer courses (maybe the next update of the specialization):
- Smart Analytics, Machine Learning, and AI on GCP
- Building Batch Data Pipelines on GCP
- Building Resilient Streaming Analytics Systems on GCP
- Modernizing Data Lakes and Data Warehouses with GCP
- Qwiklabs, to practice with GCP resources and get familiar with them (I took about 100 labs on Qwiklabs)
- Udemy — GCP Complete Google Data Engineer and Cloud Architect Guide
I took this Udemy course as a complement because I was on a schedule. (I don't recommend it strongly, because some of the information is very old.)
- Lots of lectures from GCP events and releases:
I used this playlist, but the really big takeaway here is to just search keywords on the channel to find very nice resources.
- Google's documentation is very painful and sometimes obscure, but it's essential.
You need to at least get a feeling for how the docs are arranged and where you can find useful information for your needs.
- 2 x the free Practical Simulation offered by Google.
- Learning from other people's Medium articles.
2. Impression, thoughts and tips.
I'm not a native speaker, but I keep practicing every day, so don't give up, especially if you weren't born with the English language. I didn't have any trouble with the English in the exam. There is plenty of time to complete it (120 min for 50 questions), and you can mark questions for review and go back and forth.
The content of the exam is wide and complicated, and the questions require a detailed level of knowledge of each technology. That's crazy! You will face multiple-choice questions where you can quickly discard 1–2 alternatives, but the last 2–3 alternatives differ only by a technical detail.
Pay a lot of attention to keywords and key phrases like: cost-effective, less invasive, cost is an issue, ANSI SQL, milliseconds, seconds, IoT devices, streaming, analytics, and so on. Those key points really matter for choosing the most correct alternative.
Most of my exam focused on ETL problems, so all the optimization techniques and best practices for Dataproc, Pub/Sub, Dataprep, Dataflow, and BigQuery are essential. Another focus was differentiating between Cloud Composer, Cloud Scheduler, and cron jobs. Last but not least, there were heavily technical questions on BigQuery optimization with partitioning, slot management, and clustering techniques.
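To make the partitioning and clustering point a bit more concrete, here is a minimal sketch in Python that only builds a BigQuery standard SQL DDL string. It does not call any Google client library. The table name `analytics.events`, the column names, and the helper `build_partitioned_table_ddl` are all hypothetical, made up for illustration.

```python
def build_partitioned_table_ddl(table, columns, partition_column, cluster_columns):
    """Build a CREATE TABLE statement with daily partitioning and clustering.

    Assumes partition_column is a TIMESTAMP column, so it is wrapped in DATE().
    """
    column_defs = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {column_defs}\n)\n"
        f"PARTITION BY DATE({partition_column})\n"
        f"CLUSTER BY {', '.join(cluster_columns)}"
    )


# Hypothetical table and columns, for illustration only.
ddl = build_partitioned_table_ddl(
    table="analytics.events",
    columns=[
        ("event_ts", "TIMESTAMP"),
        ("customer_id", "STRING"),
        ("amount", "NUMERIC"),
    ],
    partition_column="event_ts",
    cluster_columns=["customer_id"],
)
print(ddl)

# A query that filters on the partitioning column and the clustering column,
# which is the kind of filter that lets BigQuery scan less data.
query = (
    "SELECT customer_id, SUM(amount) AS total\n"
    "FROM analytics.events\n"
    "WHERE event_ts >= TIMESTAMP('2020-01-01')\n"
    "  AND event_ts < TIMESTAMP('2020-01-02')\n"
    "  AND customer_id = 'c-42'\n"
    "GROUP BY customer_id"
)
print(query)
```

The idea to remember for the exam is the pairing: partition on the date or timestamp you filter by most, then cluster on the columns you filter or group by next.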
2.1 What did I miss?
That doesn't mean these topics weren't in the exam, but they were side questions requiring a high level of technical knowledge.
Questions related to TensorFlow and machine learning techniques in general were very scarce in the exam, with just 2–3 questions about AI APIs and AutoML. I did expect some questions related to MLOps with Kubeflow and orchestrating ML pipelines (maybe those are in the DevOps certification).
Zero questions about Data Studio or Data Fusion. There was also a lack of questions about designing windowing in Pub/Sub + Dataflow (I had carefully studied and implemented windowing techniques).
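For readers who haven't met windowing yet, here is a small conceptual sketch in plain Python. This is not the Dataflow or Apache Beam API. It only shows how an event timestamp maps to fixed (tumbling) and sliding windows. The function names and the sample events are made up for illustration, and timestamps are plain integer seconds.

```python
from collections import defaultdict


def fixed_window(timestamp, size):
    """Return the [start, end) fixed window that contains the timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def sliding_windows(timestamp, size, period):
    """Return every [start, end) sliding window that contains the timestamp.

    Windows are `size` seconds long and a new one starts every `period` seconds.
    """
    last_start = timestamp - (timestamp % period)
    return [(s, s + size) for s in range(last_start, timestamp - size, -period)]


def sum_per_fixed_window(events, size):
    """Sum the values of (timestamp, value) events per fixed window."""
    totals = defaultdict(int)
    for timestamp, value in events:
        totals[fixed_window(timestamp, size)] += value
    return dict(totals)


# Hypothetical (timestamp_in_seconds, value) events.
events = [(5, 1), (42, 2), (61, 3), (119, 4), (120, 5)]
totals = sum_per_fixed_window(events, size=60)
print(totals)
print(sliding_windows(65, size=60, period=30))
```

With fixed windows each event lands in exactly one window. With sliding windows an event can belong to several overlapping windows, which is why the second function returns a list.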
2.2 Tips
The questions are like the free Practical Simulation offered by Google, but 3x harder and 4x more complex. In my exam I faced 2–3 questions that were really similar or identical to the simulation. Make sure to take the simulations and read the documentation about the questions.
Overall, be prepared with all the resources you can. Data engineering knowledge is very wide and complex. Don't push too hard; knowledge takes time and lots of practical experience.
The resources I used cover pretty much all the content in the exam guide, but there are other well-known resources, so feel free to prepare on your own terms. It's all just a matter of studying hard and being consistent.
3. What would I change if I did it again?
With the resources available to me, I don't think it would be much different. Maybe I would be more confident on some questions and get 1–2 more questions right if I had more time.
There are a lot of technologies and techniques that I studied that weren't covered in the exam, but as I always say, “Knowledge is never enough”.
But I would definitely study more about the different use cases for Cloud Composer vs. Cloud Scheduler vs. cron jobs for different kinds of jobs. Of course, I know that in a big data/ETL scenario I would prefer Cloud Composer. I would also study more IAM best practices to be confident in my answers, as well as TPU vs. GPU implementation in code (I have some experience launching TPU environments, but the specific question in the exam got me).
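One way I would frame the Cloud Composer vs. Cloud Scheduler vs. cron question: a cron-style scheduler fires a single job at a given time, while an orchestrator like Cloud Composer (managed Apache Airflow) runs a graph of tasks in dependency order. The sketch below is plain Python, not Airflow code. It uses the standard library `graphlib` module (Python 3.9+) to show what "dependency order" means. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Hypothetical ETL pipeline, for illustration only.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_bigquery": {"transform_join"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

If the pipeline is a single step on a timer, a scheduler or cron entry is usually enough. Once steps depend on each other, as in the ETL example above, an orchestrator is the more natural fit.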
Since my exam didn't demand deep knowledge of windowing techniques or Kubeflow, I wouldn't spend too many hours studying them.
4. Before/After the exam
I had to fly 2 hours to get to the test center (currently my city, Brasilia, doesn't have a test center for Google's exams). I barely slept that day, because I woke up at 4 a.m. to go to the airport and take the 6 a.m. flight (I couldn't afford to miss the flight or lose the voucher).
Then I started the exam at 12 p.m. and finished at 1:30 p.m. When you complete the exam you only receive a pass or fail result. That's scary! So I don't really know the minimum passing score.
Once you've passed, you'll be emailed the next day with a redemption code alongside your official Google Cloud Professional Data Engineer certificate.
5. What's next?
I have started my AWS path, and within 2–3 months I want to write another article about the AWS Certified Solutions Architect - Associate, and later this same year about the AWS Certified Solutions Architect - Professional. Those are my goals for 2020 in the cloud world.
6. Fun facts
This was my very first certification. I'm a young professional with less than 2 years of experience in the IT world. It took me almost 7 months from my first contact with GCP to taking this exam. So if I can, you can too!
“All roads lead to the Cloud!”