Wednesday, November 18, 2020

Databricks: Job aborted due to stage failure. Total size of serialized results is bigger than spark.driver.maxResultSize.


While running a Databricks job, especially a job over large datasets with long-running queries that pull a lot of data back to the driver, we might face the issue below if the cluster is left on a minimal configuration.

[Error from the job run] org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of the tasks is bigger than spark.driver.maxResultSize
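For illustration only (not from the original post), the error typically appears when an action serializes a large result set back to the driver, for example collecting a big DataFrame in a notebook. The table name "sales" below is a made-up placeholder:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical example: "sales" stands in for any large table.
df = spark.table("sales")

# Both of these pull the full result into the driver process,
# which is what trips the spark.driver.maxResultSize limit
# on a minimally configured cluster.
rows = df.collect()
pdf = df.toPandas()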
The simple way to fix this is to change the Spark driver configuration in the Spark config section of the Databricks cluster configuration tab:

spark.driver.maxResultSize 100g (adjust the value to your cluster size; keep it below the driver's total memory, since a very high limit can push the driver into out-of-memory errors)
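As an alternative sketch (not from the original post), the same property can be set when the Spark session is created, for example in a standalone PySpark script. Note that on an already-running Databricks cluster this driver property only takes effect through the cluster Spark config, not at runtime. The app name and the "8g" value below are illustrative:

from pyspark.sql import SparkSession

# Sketch: set the limit at session creation time.
# "8g" is an assumed value; pick something below the driver's memory.
spark = (
    SparkSession.builder
    .appName("large-result-job")
    .config("spark.driver.maxResultSize", "8g")
    .getOrCreate()
)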

