Multi-Cloud Analytics: Azure and GCP

Sivaprasad Mandapati · Published in Dev Genius · 5 min read · Jun 25, 2023

Data silos are separate or segmented storage locations inside an organization where information is stored and managed independently. They are frequently tied to particular departments, applications, or cloud platforms, and they can hamper effective data analysis and decision-making. Multi-cloud solutions offer several ways to address the challenges that data silos pose.

Multi-cloud analytics is the practice of harnessing data analytics and business intelligence across multiple cloud computing environments or platforms. It involves bringing together and analyzing data from various cloud providers, such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and others, with the goal of collecting comprehensive insights and making informed decisions.

Organizations benefit from increased flexibility, resilience, performance optimization, cost optimization, and compliance adherence with multi-cloud analytics. It enables businesses to use the strengths of numerous cloud platforms, maximizing the value of their data and enabling more robust and informed decision-making processes.

While multi-cloud analytics offers numerous advantages, it also brings challenges that organizations must deal with. Here are some of the most significant:

Difficulty of data integration
Governance and security of data
Integrity and reliability of data
Complexity of monitoring and management
Compatibility and interoperability

Addressing these issues necessitates thorough planning, comprehensive methods, and strong implementation. To effectively harness the benefits of multi-cloud analytics while mitigating associated issues, organizations should examine variables such as data integration capabilities, data governance frameworks, security measures, and resource management practices.

Multi Cloud Analytics

As part of this demonstration, I’d like to illustrate multi-cloud analytics using GCP and Azure. We will read and analyze sales data stored in Azure Data Lake Gen2 from GCP BigQuery, without moving the data from Azure to GCP.

Set up an Azure Data Lake Gen2 storage account
Upload data files to the Azure Data Lake Gen2 container
Create an external resource connection in BigQuery
Create a service principal in Azure IAM to allow the BigQuery external connection to read data from the Azure Data Lake Gen2 container
Create a BigLake table in BigQuery
Query the BigLake table

Azure Data Lake Gen2

Azure Data Lake Storage Gen2 is a cloud-based data storage solution offered by Microsoft Azure. It is intended to allow organizations to store and analyze massive amounts of structured and unstructured data in a scalable, secure, and cost-effective manner.

It is quite simple to create an Azure Data Lake Gen2 storage account. I’m assuming you’re all familiar with Azure Cloud and how to set one up. Create a container to house the data files as well.
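For readers who prefer the command line, the storage account and container can be created with the Azure CLI roughly as follows. This is a minimal sketch; the resource group, account, and container names are placeholders, not names from the article.

```shell
# Create a resource group to hold the demo resources (names are hypothetical)
az group create --name rg-multicloud --location eastus2

# --hns true enables the hierarchical namespace, which is what makes this
# a Data Lake Gen2 account rather than plain Blob Storage
az storage account create \
  --name mcsalesadls \
  --resource-group rg-multicloud \
  --location eastus2 \
  --sku Standard_LRS \
  --hns true

# Container (filesystem) that will hold the sales data files
az storage container create \
  --name sales-data \
  --account-name mcsalesadls \
  --auth-mode login
```

Data files can then be uploaded with `az storage fs file upload` or through the Azure portal.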

Google BigQuery

BigQuery is Google Cloud’s fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time. With BigQuery, there’s no infrastructure to set up or manage, letting you focus on finding meaningful insights using SQL.

BigLake Table

BigLake tables in BigQuery enable you to query structured data in external data stores (GCS, Azure Data Lake Gen2, AWS S3) through access delegation. An external connection associated with a service account is used to connect to the data store. Since the service account handles the retrieval of data, you only need to grant users access to the BigLake table itself. This makes possible more sophisticated security controls, data masking, and multi-cloud analytics.

Create an external data connection to Azure Data Lake Gen2. This uses BigLake on Azure through BigQuery Omni (a built-in multi-cloud connector to Azure and AWS). Fill in the fields below, including your Azure tenant ID, and hit Create connection.
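The same connection can be created with the `bq` CLI instead of the console. A rough sketch, with placeholder project, connection, and tenant values (the property key shown is my understanding of the Omni Azure connection format; verify against current BigQuery documentation):

```shell
# Create a BigQuery Omni connection of type AZURE in an Omni region
bq mk --connection \
  --connection_type=AZURE \
  --location=azure-eastus2 \
  --properties='{"customer_tenant_id": "<your-azure-tenant-id>"}' \
  azure-adls-conn

# Inspect the connection; the output includes the Azure AD application
# (client) ID that Azure must be told to trust in the next step
bq show --connection <project-id>.azure-eastus2.azure-adls-conn
```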

Azure Data Lake Gen2 External Resource Connection

Azure IAM Service Principal

In Azure, an Azure Active Directory (Azure AD) service principal, also known as a service account or app registration, is a security identity used by applications, services, or automation tools to access Azure resources. It provides a way to authenticate and authorize applications to interact with Azure services without the need for individual user accounts.

The next step is to grant the required role to the BigQuery external connection so that it can read data from the Azure Data Lake Gen2 container.

Assigning the Storage Blob Data Reader role to the BigQuery external resource connection that we created.
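This role assignment can also be done with the Azure CLI. A sketch, assuming `<app-id>` is the application ID reported by `bq show --connection` and the scope names match the storage account created earlier (all placeholders):

```shell
# Grant the connection's Azure AD application read access to blob data
# in the storage account, using the built-in Storage Blob Data Reader role
az role assignment create \
  --assignee "<app-id>" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<sub-id>/resourceGroups/rg-multicloud/providers/Microsoft.Storage/storageAccounts/mcsalesadls"
```

The scope can also be narrowed to a single container if you prefer least-privilege access.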

Query Azure Files in BigQuery

Create a BigQuery dataset in the azure-eastus2 region (the BigQuery Omni region corresponding to Azure’s East US 2).
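With the `bq` CLI this is a one-liner; the dataset must live in the same Omni region as the connection. Project and dataset names are placeholders:

```shell
# Dataset in the BigQuery Omni region for Azure East US 2
bq mk --dataset --location=azure-eastus2 <project-id>:azure_sales
```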

And finally, create the BigLake table using the external resource connection, specifying the location of the Azure Data Lake Gen2 blob files.
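The table definition looks roughly like the following DDL, run here through `bq query`. Project, dataset, connection, storage account, and file-path names are all placeholders, and the format is assumed to be CSV:

```shell
bq query --nouse_legacy_sql '
-- BigLake table over files in the Azure container; BigQuery reads them
-- through the Omni connection, so no data is copied into GCP
CREATE EXTERNAL TABLE `<project-id>.azure_sales.sales`
WITH CONNECTION `<project-id>.azure-eastus2.azure-adls-conn`
OPTIONS (
  format = "CSV",
  uris = ["azure://mcsalesadls.blob.core.windows.net/sales-data/*.csv"]
)'
```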

Now you can view and query the Azure data file in Google BigQuery. We can treat this table like a native BigQuery table for any data analysis.
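A query against the BigLake table is then ordinary BigQuery SQL; the data itself never leaves Azure. The column names below are hypothetical, since the article does not show the sales schema:

```shell
bq query --nouse_legacy_sql '
-- Aggregate the Azure-resident sales data directly from BigQuery
SELECT region, SUM(amount) AS total_sales
FROM `<project-id>.azure_sales.sales`
GROUP BY region
ORDER BY total_sales DESC'
```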

