Wednesday, August 9, 2023

AWS Connect: Reporting and Visualizations

Amazon Connect offers:
- built-in reports, i.e., historical and real-time reports.

We can customize these reports, schedule them, and integrate them with any BI tool of our choice to query and view the Connect data.


Sample solution provided by AWS:
1. Make sure Connect is exporting the contact trace record (CTR) data using a Kinesis Data Stream (KDS).
2. Use Kinesis Data Firehose to deliver the CTRs from KDS to S3. (CTRs can be delivered as a batch of records, so one S3 object might contain multiple CTRs.) An AWS Lambda function adds a newline character to each record, which makes the objects easier to parse.
3. S3 Event Notifications are used to trigger the step that modifies the CTR record and saves it back to S3.
4. Athena queries the modified CTRs using SQL. Use partitions to restrict the amount of data scanned by each query, improving performance and reducing cost; a Lambda function maintains the partitions (see the sample query below).
5. QuickSight is used to visualize the modified CTRs.
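For step 4, here is a minimal sketch of a partition-restricted query run from the AWS CLI; the database, table, column, and bucket names are placeholders for illustration, not part of the AWS sample solution:

aws athena start-query-execution \
  --query-string "SELECT contact_id, initiation_timestamp, queue_name FROM ctr_records WHERE year='2023' AND month='08' AND day='09'" \
  --query-execution-context Database=connect_ctr \
  --result-configuration OutputLocation=s3://my-athena-results/ctr/

aws athena get-query-results --query-execution-id <query-execution-id>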




Solution variations:

Convert records to Apache Parquet format:
- we save each modified CTR as a single S3 object in JSON format
- Athena charges by the amount of data scanned per query
- you can save cost and improve performance by converting the data to a columnar format like Apache Parquet (a sample conversion is sketched below)
- Analyzing data in S3 using Amazon Athena: https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
- workshop: https://catalog.us-east-1.prod.workshops.aws/workshops/607718a8-cddd-416a-97b4-4fc9dc93ff7a/en-US/
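One way to do the conversion is an Athena CTAS (CREATE TABLE AS SELECT) query. A minimal sketch, assuming hypothetical table and bucket names:

aws athena start-query-execution \
  --query-string "CREATE TABLE ctr_records_parquet WITH (format = 'PARQUET', external_location = 's3://my-ctr-bucket/parquet/') AS SELECT * FROM ctr_records" \
  --query-execution-context Database=connect_ctr \
  --result-configuration OutputLocation=s3://my-athena-results/ctr/

Alternatively, Kinesis Data Firehose can convert incoming records to Parquet on the fly using its record format conversion feature.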












Thursday, May 5, 2022

SoleTechie: Setting up GitLab

 - Created a GitLab group:
  • Group name: SoleTechie
  • Group URL: http://gitlab.com/soletechie1
  • Visibility level: Private
  • Group ID: 52826632

 - Created a GitLab project:
  • Project name: cicd-demo
  • Project URL: https://gitlab.com/soletechie/
  • Project slug: cicd-demo
  • Project description: Setting up GitLab runners and trying to implement the CI/CD flow
  • Project deployment target (optional): Infrastructure provider (Terraform)
  • Visibility level: Private








Friday, December 17, 2021

Getting started with AWS

 We will learn how to set up an AWS account, how to access AWS resources using the AWS CLI, and how to leverage VS Code to view AWS resources.


AWS documentation links for getting started guides:

https://aws.amazon.com/getting-started/?e=gs2020&p=console/#Get_to_Know_the_AWS_Cloud

https://aws.amazon.com/getting-started/guides/setup-cdk/

https://aws.amazon.com/getting-started/?e=gs2020&p=console/#Launch_Your_First_Application



Setting up AWS account:

1. Create an AWS Free Tier account: https://portal.aws.amazon.com/billing/signup?refid=ps_a131l0000085ejvqam&trkcampaign=acq_paid_search_brand&redirect_url=https%3A%2F%2Faws.amazon.com%2Fregistration-confirmation#/start

- Provide your details (email, username, billing information) and make sure you select the Basic support - Free option.

- Upon successful signup, we will see a confirmation page.



2. Sign in as the root user: provide your login information (email, password) and we will be able to see our AWS dashboard.


3. Access the AWS Management Console.




4. Follow this great documentation provided by AWS: https://aws.amazon.com/getting-started/?e=gs2020&p=console/#Get_to_Know_the_AWS_Cloud

Using the above documentation link, we can find best practices on how to set up our AWS cloud account.

Let us start making progress by following this guide:

Setting up the AWS environment: https://aws.amazon.com/getting-started/guides/setup-environment/


Adding MFA:

- select "IAM" service, and add MFA (Multi Factor Authentication).  



- Once we select "Add MFA", it will take us to a page where we need to select "Activate MFA".


- select "Virtual MFA device" and hit Continue: 

- I used "Google Authenticator" app as my MFA device.  Scan the QR Code using the app, and enter 2 MFA codes. Once we successfully add the device, we can see our device under MFA. 




- Once we add MFA, the IAM dashboard reflects it. (For IAM users, MFA can also be enabled from the CLI, as sketched below.)
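A minimal CLI sketch for enabling a virtual MFA device for an IAM user (not the root account); the device name, user name, account ID, and codes are placeholders:

aws iam create-virtual-mfa-device --virtual-mfa-device-name soletechie-mfa --outfile /tmp/qrcode.png --bootstrap-method QRCodePNG
aws iam enable-mfa-device --user-name soletechie --serial-number arn:aws:iam::111122223333:mfa/soletechie-mfa --authentication-code1 123456 --authentication-code2 789012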


Create IAM Group -> 

Now we can proceed with creating user groups, as it is not advised to use the root user for everything. We should follow the principle of least privilege to keep our accounts more secure.



Enter user group name: admins

Attach permission policies: search for "AdministratorAccess" and select it.

Now we can see the admins group created. (A CLI alternative is sketched below.)
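If we prefer the CLI, the same group can be created and given the managed policy like this (a minimal sketch, reusing the group name from above):

aws iam create-group --group-name admins
aws iam attach-group-policy --group-name admins --policy-arn arn:aws:iam::aws:policy/AdministratorAccess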


Create IAM User ->



select "Add users"

username: soletechie

Enable both programmatic access (to use AWS resources via the CLI) and a console password (to access the Management Console). A rough CLI equivalent of this step is sketched below.
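A minimal CLI sketch of the same step, assuming the admins group created earlier; the password is just a placeholder:

aws iam create-user --user-name soletechie
aws iam add-user-to-group --user-name soletechie --group-name admins
aws iam create-login-profile --user-name soletechie --password '<temporary-password>' --password-reset-required
aws iam create-access-key --user-name soletechie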
















Note

- We can create an account alias if we don't want to use our account ID to log in to the AWS console.

- To create an alias, go to the IAM dashboard; on the right you can find your AWS account ID information, along with the option to create an alias.

- Aliases must be unique; once you choose a unique alias, you can sign in to the AWS Management Console using it. (It can also be created from the CLI, as shown below.)
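A minimal CLI sketch, with a placeholder alias:

aws iam create-account-alias --account-alias my-unique-alias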


***********************************************************************************

Setting up AWS-CLI:

***********************************************************************************

- Use this link to set up the AWS CLI (latest version, v2) based on your operating system: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

- To download macOS package file: https://awscli.amazonaws.com/AWSCLIV2.pkg

- Once you run the installer, we will see that the software was installed successfully.



- To verify that the AWS CLI installed successfully, run "aws --version".


Time to CONFIGURE:

- type command - "aws configure" and provide your access key id, aws secret access key, default region name and default output format. 
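An example session (the values shown are placeholders, not real credentials):

aws configure
AWS Access Key ID [None]: AKIAxxxxxxxxxxxxxxxx
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: us-east-1
Default output format [None]: json

aws sts get-caller-identity    # quick check that the configured credentials work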




More detailed information on how to configure the AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html


***********************************************************************************

Setting up Cloud9: (use this only if we want a browser-based development tool)

***********************************************************************************

- use this link to setup Cloud9. https://aws.amazon.com/getting-started/guides/setup-environment/module-four/?refid=ps_a131l0000085ejvqam&trkcampaign=acq_paid_search_brand

- Cloud9 is a cloud-based IDE that runs in the browser (we only pay for the underlying EC2 instance and storage). It supports programming languages including Python and JavaScript, so we can work on our project from the browser rather than dealing with environment setups specific to our home/office laptops.

- AWS CLI commands to spin up, access, and destroy a Cloud9 environment; we use the environment ID to access and delete it.

aws cloud9 create-environment-ec2 --name getting-started --description "Getting started with AWS Cloud9." --instance-type t3.micro --automatic-stop-time-minutes 60

{ "environmentId": "8a34f51ce1e04a08882f1e811bd706EX" }

aws cloud9 delete-environment --environment-id <environmentID>

- To access the Cloud9 environment: https://console.aws.amazon.com/cloud9/ide/<environment ID>?region=us-west-2
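To list the environment IDs of existing environments:

aws cloud9 list-environments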


Note:

To dive deeper: https://aws.amazon.com/getting-started/?e=gs2020&p=console/#Dive_Deeper

Must-use VS Code extensions for anyone working on the cloud

Here is a list of VS Code extensions that anyone working on cloud technologies can use to speed up their development.

To install an extension, open the Extensions tab in your VS Code window.



As we will manage all our cloud resources using Terraform, we will start with the Terraform extensions.

1. Terraform Extensions


Terraform: manage Terraform resources directly from VS Code.




Terraform Autocomplete: useful when creating Terraform resources.



2. Docker: to build, manage, and deploy Docker containers from VS Code.



3. Python: extension that provides Python language support (IntelliSense, debugging, and interpreter selection).



4. Prettier-Code formatter:



5. Markdown Preview



6. Git:  

Git History:



Git Graph:





Now we can select the extensions below and click Install.


AWS VSCode Extensions:

1. AWS Toolkit: to interact with AWS resources directly from VS Code. Helpful for taking a look at AWS resources without having to log in to the console; it provides a clean UI to get a quick overview of our resources.



Upon successful installation, we can find AWS in the left toolbar.




2. AWS CLI Configure: to use AWS profiles directly; very handy when we want to use multiple AWS accounts and manage them separately. A real-world use case is accessing AWS resources in different environments, such as PROD and DEV.



3. AWS boto3: boto3 is the Python SDK that helps us communicate with AWS resources.



4. Sort AWS IAM Policy: helps a lot when preparing IAM policy documents, especially when dealing with many AWS resources in the same document. Unless they are kept sorted, IAM policies can quickly become a mess.



5. AWS Step Functions Constructor: helps us visualize AWS Step Functions state machines directly in VS Code, without having to check the definition in the console.




Azure VSCode Extensions:


1. Azure Account:



2. Azure Tools:



The above extension is an extension pack; installing it will also install the following Azure extensions:

- Azure Functions

- Azure Resources

- Azure CLI Tools

- Azure App Service

- Azure Resource Manager (ARM) tools

- Azure Databases

- Azure Storage

- Azure Pipelines

- Azure Virtual Machines

- ARM Template Viewer



Google Cloud (GCP) Extensions:


1. Cloud Code



2. Google Cloud Spanner Driver:



Sunday, May 16, 2021

Terraform lifecycle

 If we are using Terraform, the state file is the heart of all the infrastructure that we spin up using Terraform templates.

There are several ways to deploy the infrastructure using terraform:

1. Using the CLI (set up Terraform and then run Terraform commands)

2. Automated builds (Terraform scripts integrated as part of your Jenkins pipeline)

No matter which way we choose, we must make sure we are using the same Terraform state file (for example, via a shared remote backend), so that we keep an accurate, in-sync record of the resources we manage.


I would like to share the Terraform commands that we use on a daily basis:

terraform init = the starting command, which initializes the working directory (make sure the proper provider is configured; in my case, I use AWS).

terraform workspace new <workspace name> = creates a new workspace (use terraform workspace select <workspace name> to switch to an existing one); useful in scenarios where we have separate configurations - database, servers, logs, storage.

terraform state list = shows the list of resources Terraform is tracking (uses the state file).

terraform plan = creates an execution plan and gives us the list of changes without actually deploying them (specify -out=tfplan.plan if we need to store the output of this plan command).

terraform apply = actually deploys the changes based on the Terraform plan.

terraform destroy = destroys the resources we created.

A typical end-to-end run is shown below.
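A minimal sketch of a typical sequence (the workspace name and plan file name are just examples):

terraform init
terraform workspace new database        # or: terraform workspace select database
terraform plan -out=tfplan.plan
terraform apply tfplan.plan
terraform state list
terraform destroy                       # only when tearing everything down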




Wednesday, January 6, 2021

Enterprise Patterns in Terraform

What are Modules?
- self-contained pieces of IaC (Infrastructure as Code) that abstract infrastructure deployments
- encourage clear organization and DRY (Don't Repeat Yourself)
- help in writing composable, shareable, and reusable infrastructure

Scope the requirements into appropriate modules. When building a module, consider 3 areas:
1. Encapsulation - group infrastructure that is always deployed together
2. Privileges - restrict modules to privilege boundaries
3. Volatility - separate long-lived infrastructure from short-lived (e.g., database = static vs. application servers = dynamic)

Create the module MVP:
* Always aim to deliver a module that works for 80% of use cases
* Never code for edge cases. A module should be a reusable block of code.
* Avoid conditional expressions in the MVP
* A module should only expose the most commonly modified arguments as variables.

Scoping Example - A team wants to provision their infrastructure, web tier application, and app tier using Terraform:
- the web application requires an auto scaling group
- the app tier also requires an auto scaling group, an S3 bucket, and a database.

So the modules for the above requirement could be:

Module 1: Network [VPC, NACL, NAT Gateway]
- responsible for infrastructure networking
- contains network ACLs and the NAT gateway
- also includes the VPC, subnets, peering, and Direct Connect

Module 2: Web [Load Balancer, Auto Scaling Group]
- creates and manages the infrastructure needed to run the web application
- contains the load balancer and auto scaling group
- could also include EC2 instances, S3 buckets, security groups inside the application, and logging

Module 3: App [Load Balancer, Auto Scaling Group, S3 bucket]
- creates and manages the infrastructure needed to run the app tier application
- contains the load balancer, auto scaling group, and S3 buckets
- can also include EC2 instances, S3 buckets, security groups inside the application, and logging

Module 4: Database [Database]
- creates and manages the infrastructure needed to run the database
- contains the RDS instance used by the application
- can also include all associated storage, all backup data, and logging

Module 5: Routing [Hosted Zone, Route 53, Route Table]
- creates and manages the infrastructure needed for any network routing
- contains hosted zones, Route 53, and route tables

Module 6: Security [IAM - Identity and Access Management]
- creates and manages the infrastructure needed for security
- contains IAM resources; can also include security groups and MFA

After we are done writing modules:
- we publish them to the private module registry
- we advertise their availability to the respective team members for consumption


Define and use a consistent module structure:
- Define the list of .tf files that must be in the module and what they should contain
- Define a .gitignore for modules
- Create a standard way of providing examples (terraform.tfvars.example)
- Use a consistent directory structure with a defined set of directories, even if some may be empty
- All module directories should have a README detailing the purpose and use of the files within it (an example layout is sketched below)
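One common layout, loosely following the conventional Terraform module structure (the file and directory names below are the usual conventions, not requirements):

my-module/
  README.md
  CHANGELOG.md
  main.tf
  variables.tf
  outputs.tf
  terraform.tfvars.example
  examples/
    basic/
      main.tf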


Use source control to track modules:
- Place modules in source control to manage versions, collaboration, and an audit trail of changes
- Tag and document all releases to master (use a CHANGELOG and README as a minimum)
- Code review all changes to master
- Encourage your module users to reference modules by tag
- Assign each module an owner
- Use only one module per repository
 




Wednesday, November 18, 2020

Databricks: Job aborted due to stage failure - total size of serialized results is bigger than spark.driver.maxResultSize.

 

While running a Databricks job, especially one with large datasets and long-running queries that return a large volume of results to the driver, we might face the issue below if the cluster has a minimal configuration.








The simple way to fix this is to change the Spark driver config in the cluster's Spark config (Databricks cluster page -> Advanced Options -> Spark):

spark.driver.maxResultSize 100g

(Adjust the value based on your driver's memory; this is a driver property, so it must be set before the cluster starts.)