
I assume you are familiar with how Spark runs a job, the basics of distributed systems, your cluster's current utilisation, your job SLAs, resource details, etc.

There are mainly two ways to optimise our jobs:

  1. Application Code
  2. Cluster/Resources Configurations

There are a lot of things we can do in Spark while writing a job, some of which I may not even be aware of, but based on my experience I am trying to cover some important standards that everyone should follow for better resource utilisation and execution:

A. Caching: Suppose we are reading data from MySQL through the Spark JDBC connector as…

I am a technology enthusiast trying to solve data puzzles.
