ETL, big data, cloud (AWS/GCP/Azure) technologies, and possibly some random stuff shared along the way!
Wednesday, August 9, 2023
AWS Connect: Reporting and Visualizations
Thursday, May 5, 2022
SoleTechie: Setting up GitLab
- Created a GitLab group:
- Group name: SoleTechie
- Group URL:
- Visibility level: private
- Group ID: 52826632
- Project name: cicd-demo
- Project URL:
- Project slug: cicd-demo
- Project description: Setting up git lab runners and trying to implement the CI-CD flow
- Project deployment target (optional): Infrastructure provider (Terraform)
- Visibility level: Private
Friday, December 17, 2021
Getting started with AWS
We will learn how to set up an AWS account, how to access AWS resources using the AWS CLI, and how to use VS Code to view AWS resources.
AWS documentation links for getting started guides:
Setting up an AWS account:
1. Create an AWS Free Tier account:
- Provide your details (email, username, billing information) and make sure you select the Basic Support (free) option.
- Upon successful signup, we will see a confirmation page.
2. Sign in as the root user: provide your login information (email, password) and we will be able to see the AWS dashboard.
3. Access AWS Management Console:
4. Follow this great documentation provided by AWS:
Using the above documentation link, we can find the best practices for setting up our AWS cloud account.
Let us start making progress by following this guide:
Setting up AWS environment:
- I used the "Google Authenticator" app as my MFA device. Scan the QR code using the app and enter two consecutive MFA codes. Once the device is added successfully, it appears under MFA.
- Once the MFA device is added, we can see the IAM dashboard.
Create IAM Group ->
Now we can proceed with creating user groups. Using the root user for everything is not advised; we should follow the principle of least privilege to keep our accounts secure.
- Enter user group name: admins
- Attach permission policies: search for "administrator access" and select the AdministratorAccess policy
Now we can see the admins group being created.
Select "Add users":
- Username: soletechie
- Enable both programmatic access (to use AWS resources from the CLI) and password access (to use the Management Console).
- We can create an account alias if we don't want to use our account ID to log in to the AWS console.
- To create an alias, go to the IAM dashboard. On the right, next to the AWS account ID information, there is an option to create an alias.
- Aliases must be unique. Once you provide a unique alias name, you can sign in to the AWS Management Console using this alias.
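The console steps above (group, admin policy, user) also have AWS CLI equivalents. Below is a minimal Python sketch (standard library only) that just assembles those commands as strings so they can be reviewed before running them by hand. The function name is made up for illustration, the defaults reuse the group and user names from this post, and nothing in the sketch calls AWS.

```python
import shlex


def iam_setup_commands(group="admins", user="soletechie"):
    """Return AWS CLI commands mirroring the console steps (illustrative helper)."""
    # ARN of the AWS managed AdministratorAccess policy.
    admin_policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
    commands = [
        ["aws", "iam", "create-group", "--group-name", group],
        ["aws", "iam", "attach-group-policy", "--group-name", group,
         "--policy-arn", admin_policy_arn],
        ["aws", "iam", "create-user", "--user-name", user],
        ["aws", "iam", "add-user-to-group", "--group-name", group,
         "--user-name", user],
    ]
    # shlex.join quotes each argument so the lines are safe to paste into a shell.
    return [shlex.join(command) for command in commands]


for line in iam_setup_commands():
    print(line)
```

These commands need credentials that are allowed to manage IAM, so they only work after the CLI has been configured.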
Setting up AWS-CLI:
- Use this link to set up the AWS CLI (latest version, v2) based on your operating system:
- To download the macOS package file:
- Once we run the installer, we will see that the software was installed successfully.
- To verify that the AWS CLI was installed successfully:
- Run "aws configure" and provide your access key ID, secret access key, default region name, and default output format.
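For reference, "aws configure" stores these four values in two INI-style files, ~/.aws/credentials and ~/.aws/config. The sketch below reproduces that layout in a temporary directory using only Python's standard library and reads it back. The function names are made up for illustration, and the key pair is the placeholder example used in AWS documentation, not a real credential.

```python
import configparser
import tempfile
from pathlib import Path


def write_aws_profile(aws_dir, access_key_id, secret_access_key, region, output):
    """Write 'credentials' and 'config' files for the default profile."""
    aws_dir = Path(aws_dir)
    aws_dir.mkdir(parents=True, exist_ok=True)

    # interpolation=None keeps any '%' characters in a secret key literal.
    credentials = configparser.ConfigParser(interpolation=None)
    credentials["default"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    with open(aws_dir / "credentials", "w") as handle:
        credentials.write(handle)

    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {"region": region, "output": output}
    with open(aws_dir / "config", "w") as handle:
        config.write(handle)


def read_aws_profile(aws_dir):
    """Read both files back and merge the default profile into one dict."""
    aws_dir = Path(aws_dir)
    merged = {}
    for name in ("credentials", "config"):
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(aws_dir / name)
        merged.update(dict(parser["default"]))
    return merged


# Demo in a throwaway directory so the real ~/.aws is never touched.
with tempfile.TemporaryDirectory() as tmp:
    write_aws_profile(
        tmp,
        "AKIAIOSFODNN7EXAMPLE",                      # placeholder key ID
        "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # placeholder secret
        "us-west-2",
        "json",
    )
    print(read_aws_profile(tmp)["region"])
```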
Setting up Cloud9: (only needed if we want a browser-based development tool)
- Use this link to set up Cloud9.
- Cloud9 is a cloud-based IDE that we can run in our browser. There is no extra charge for Cloud9 itself, but we pay for the underlying compute and storage (for example, the EC2 instance). It supports programming languages including Python and JavaScript, so we can work on our project from the browser rather than dealing with environment setups specific to our home/office laptops.
- AWS CLI commands to spin up, access, and destroy a Cloud9 environment. We use the environment ID to access and delete the environment.
aws cloud9 create-environment-ec2 --name getting-started --description "Getting started with AWS Cloud9." --instance-type t3.micro --automatic-stop-time-minutes 60
{ "environmentId": "8a34f51ce1e04a08882f1e811bd706EX" }
aws cloud9 delete-environment --environment-id <environmentID>
- To access the cloud 9 environment:<environment ID>?region=us-west-2
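The create command prints a small JSON document, and the environment ID inside it is what the delete command needs. A minimal Python sketch of that hand-off (standard library only): it parses the sample output shown above and builds the matching delete command as a string. The helper name is made up for illustration, and nothing here calls AWS.

```python
import json
import shlex


def delete_command_from_create_output(create_output):
    """Build the 'aws cloud9 delete-environment' command for a created environment."""
    environment_id = json.loads(create_output)["environmentId"]
    return shlex.join(
        ["aws", "cloud9", "delete-environment", "--environment-id", environment_id]
    )


# Sample output copied from the create-environment-ec2 call above.
sample_output = '{ "environmentId": "8a34f51ce1e04a08882f1e811bd706EX" }'
print(delete_command_from_create_output(sample_output))
```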
To dive deeper:
Must-use VS Code extensions for anyone working on cloud
Here is a list of VS Code extensions that anyone working on cloud technologies can use to speed up their development.
To download any extension, open the Extensions tab in your VS Code window:
As we will manage all our cloud resources using Terraform, we will start with Terraform Autocomplete Extension.
1. Terraform Extensions
Terraform: to manage terraform resources directly from VS Code.
Terraform Autocomplete: useful when we are creating terraform resources.
2. Docker: To build, manage and deploy docker containers from VS Code.
3. Python: extension that provides Python language support (IntelliSense, linting, debugging) and lets us select the Python interpreter to use
4. Prettier-Code formatter:
5. Markdown Preview:
6. Git:
Git History:
Git Graph:
Now we can select each of these extensions and click Install.
AWS VSCode Extensions:
1. AWS Toolkit: To interact with AWS resources directly from VS Code. Helpful for taking a look at AWS resources without having to log in to the console, and it provides a very cool UI for a quick overview of our resources.
Upon successful installation, an AWS icon appears on the left toolbar.
2. AWS CLI Configure: Lets us use AWS profiles directly, which is very handy when we work with multiple AWS accounts and want to manage them separately. A real-world use case is accessing AWS resources in different environments, such as PROD and DEV.
3. AWS boto3: boto3 is the AWS SDK for Python, a library that helps us communicate with AWS resources.
4. Sort AWS IAM Policy: very helpful when preparing IAM policy documents, especially when a single document covers many AWS resources. Unless the entries are sorted, an IAM policy can quickly become a mess.
5. AWS Step Functions Constructor: Helps us visualize AWS Step Functions directly in VS Code, without having to check the definition document in the console.
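To illustrate what "sorted" can mean for an IAM policy, here is a rough Python sketch (standard library only) that orders the statements by Sid and the actions and resources inside each statement alphabetically. This is not the extension's implementation, just one plausible ordering; the function name and the sample policy are made up for the example.

```python
import json


def sort_iam_policy(policy):
    """Return a copy of the policy with statements and their lists sorted."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    sorted_statements = []
    for statement in statements:
        statement = dict(statement)  # shallow copy so the input is not modified
        for key in ("Action", "Resource"):
            value = statement.get(key)
            if isinstance(value, list):
                statement[key] = sorted(value)
        sorted_statements.append(statement)

    # Statements without a Sid sort first (empty string).
    sorted_statements.sort(key=lambda s: s.get("Sid", ""))
    return {**policy, "Statement": sorted_statements}


# Hypothetical policy, for illustration only.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "S3Read", "Effect": "Allow",
         "Action": ["s3:ListBucket", "s3:GetObject"], "Resource": "*"},
        {"Sid": "Ec2Describe", "Effect": "Allow",
         "Action": ["ec2:DescribeInstances"], "Resource": "*"},
    ],
}
print(json.dumps(sort_iam_policy(sample_policy), indent=2))
```

Reordering statements or actions does not change what the policy allows; it only makes the document easier to read and diff.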
Azure VSCode Extensions:
1. Azure Account:
2. Azure Tools:
The above extension is an extension pack; installing it also installs the following Azure extensions:
- Azure Functions
- Azure Resources
- Azure CLI Tools
- Azure App Service
- Azure Resource Manager (ARM) tools
- Azure Databases
- Azure Storage
- Azure Pipelines
- Azure Virtual Machines
- ARM Template Viewer
Google Cloud (GCP) Extensions:
1. Cloud Code:
2. Google Cloud Spanner Driver:
Sunday, May 16, 2021
Terraform lifecycle
If we are using Terraform, the Terraform state file is the heart of all the infrastructure that we spin up using Terraform templates.
There are several ways to deploy the infrastructure using Terraform:
1. Using the CLI (set up Terraform and then run Terraform commands)
2. Automated build (Terraform scripts integrated as part of your Jenkins pipeline)
Whichever way we choose, we must make sure that we are using the same Terraform state file, so that everything stays in sync and we have an accurate record of the resources we created.
I would like to share the Terraform commands that we run on a daily basis:
terraform init = the basic/starting command, which initializes Terraform (make sure the proper provider is configured. In my case, I use AWS).
terraform workspace select <workspace name> = switches to an existing workspace (use "terraform workspace new <workspace name>" to create one); useful in scenarios where we have different Terraform modules - database, servers, logs, storage
terraform state list = shows the list of Terraform resources that have been created (uses the state file)
terraform plan = creates an execution plan and gives us a list of changes without actually deploying them (specify -out=tfplan.plan if we need to save the output of the plan command)
terraform apply = actually deploys the changes, based on the Terraform plan
terraform destroy = destroys the resources that we created
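The daily sequence above can be sketched as a small Python wrapper (standard library only). It assumes the terraform binary is on PATH and that the workspace already exists; the function name, the workspace name "database", and the plan file name are placeholders. By default it only returns the commands so they can be reviewed; pass execute=True to actually run them.

```python
import shlex
import subprocess


def terraform_workflow(workspace, plan_file="tfplan.plan", execute=False):
    """Return (and optionally run) the init -> select -> plan -> apply sequence."""
    commands = [
        ["terraform", "init"],
        ["terraform", "workspace", "select", workspace],
        ["terraform", "plan", f"-out={plan_file}"],
        # Applying the saved plan file deploys exactly what was planned.
        ["terraform", "apply", plan_file],
    ]
    if execute:
        for command in commands:
            # check=True stops the sequence at the first failing command.
            subprocess.run(command, check=True)
    return [shlex.join(command) for command in commands]


# Dry run: print the commands for a hypothetical "database" workspace.
for line in terraform_workflow("database"):
    print(line)
```

Saving the plan and applying that file keeps the reviewed plan and the applied changes identical, which matters most in an automated pipeline.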
Wednesday, January 6, 2021
Enterprise Patterns in Terraform
Wednesday, November 18, 2020
Databricks: Job aborted due to stage failure. Total size of serialized results is bigger than spark driver memory.
While running a Databricks job, especially one with large datasets and long-running queries that create a lot of temp space, we might face the issue above if the cluster has only a minimal configuration.
The simple way to fix this is to change the Spark driver config in the Databricks cluster tab:
spark.driver.maxResultSize = 100G (adjust the size based on your cluster size)
Amazon connect offers: - built in reports i.e., historical and real-time reports. We can customize these reports, schedule them and can int...