Unifying Governance with LF: Lake Formation Data Lake Location-Centric Operations

This content was translated from Korean to English using AI.

Conclusion

Background

One of the biggest challenges while operating an AWS data lake environment was how to share data.

As the organization grew, accounts were separated (multi-account), and the teams producing and consuming data diverged. Managing “how much access to grant” and “who accesses what and how” became increasingly difficult.

I have experienced various data sharing approaches over time,
and after considerable trial and error, the pros and cons of each method became clear.

Before systematizing a data sharing approach that fits our company’s situation,
I wanted to first organize and review the concepts and selection criteria.

Goal

Understand the permission check flow during data queries and identify where permission issues occur.
Understand the various data sharing methods and the flow of each.
Be able to evaluate and select the appropriate data sharing method.

Explanation

The “data” referred to here means tables registered in the Glue Catalog (schema or table data).

The choice depends on “what to share” and “how much to control.”

For the best results, manage access primarily through Lake Formation (LF) and handle exceptions separately.

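Types and Criteria for Data Sharing Methods

In an AWS data lake environment, data sharing can be broadly divided into three approaches:

1. Quick data file sharing only: S3/IAM
2. Sharing schema and tables: Glue Data Catalog
3. (Recommended) Share and manage everything at once: Lake Formation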

Permission Check Flow During Data Queries

The high-level flow is as follows:

User Data Queries

- Accessing tables via Athena queries
- Glue ETL reading Catalog tables as input
- EMR/Redshift Spectrum referencing Catalog tables
- Accessing Catalog tables via boto3 through Athena, Glue, EMR, Redshift

What it means to manage with Lake Formation...

Registering S3 data locations (bucket/prefix) as Data Lake Locations in Lake Formation

The Lake Formation permission model applies to Glue Data Catalog objects (databases/tables) that point to the registered S3 locations

Lake Formation assumes the IAM role specified during registration and issues temporary credentials (credential vending) to integrated services (Athena/EMR/Glue, etc.)

Subdirectories under the registered path are included in the management scope

With hybrid access mode, you can transition gradually by applying Lake Formation permissions to only some databases/tables in the Data Catalog
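To make the registration points above concrete, here is a minimal Python sketch. It only builds the keyword arguments that I would expect to pass to boto3's `lakeformation.register_resource` call and models the "subdirectories are included" rule as a simple prefix check. The class name `RegisteredLocation`, the bucket `example-lake`, and the account ID and role name are hypothetical, and the prefix check is a simplification of how Lake Formation actually resolves locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredLocation:
    """Hypothetical helper describing one Data Lake Location."""
    resource_arn: str   # e.g. arn:aws:s3:::bucket/prefix
    role_arn: str       # IAM role that Lake Formation assumes for credential vending
    hybrid: bool = False

    def to_request(self) -> dict:
        # Shaped for lakeformation.register_resource(**request); nothing is sent here.
        return {
            "ResourceArn": self.resource_arn,
            "RoleArn": self.role_arn,
            "HybridAccessEnabled": self.hybrid,
        }

    def covers(self, s3_uri: str) -> bool:
        # Simplified scope rule: the registered path and everything below it.
        if not s3_uri.startswith("s3://"):
            return False
        path = s3_uri[len("s3://"):].rstrip("/")
        root = self.resource_arn[len("arn:aws:s3:::"):].rstrip("/")
        return path == root or path.startswith(root + "/")


location = RegisteredLocation(
    resource_arn="arn:aws:s3:::example-lake/sales",
    role_arn="arn:aws:iam::111111111111:role/LfRegisterRole",
    hybrid=True,
)
print(location.to_request())
print(location.covers("s3://example-lake/sales/2024/01/"))
print(location.covers("s3://example-lake/sales-archive/"))
```

The boundary check on `root + "/"` matters: a sibling prefix such as `sales-archive` shares the same leading characters but is not under the registered path.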

Understanding Through Examples

Example Scenario

Producer / Data Lake Account A
- Stores data in S3
- Manages metadata in Glue Data Catalog
Consumer / Analytics Team Account B
- Uses shared data via Athena/Glue/ETL, etc.

Quick Data Sharing Only: S3 / IAM

The most basic approach, sharing only the S3 objects themselves.

Characteristics

Grants S3 access via bucket policies + consumer account IAM permissions
The consumer account must create tables manually for the metadata of the shared data
Glue Catalog metadata is not shared

Things to Be Aware Of

Best suited for quickly granting data access.
However, since only the data is shared, keeping schemas in sync requires a Glue Crawler or other supplementary measures.
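Because the consumer has to define the table on its own side, a small helper that generates the Athena DDL can reduce copy-paste mistakes. The sketch below is an illustration only: the function name, the database/table/column names, and the assumption that the files are Parquet are all mine, not part of the original setup.

```python
def create_external_table_ddl(database, table, columns, location, partitions=()):
    """Build an Athena CREATE EXTERNAL TABLE statement (assumes Parquet files)."""
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n  {cols}\n)"
    if partitions:
        parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{location}'"
    return ddl


ddl = create_external_table_ddl(
    database="shared_sales",                    # hypothetical consumer-side database
    table="orders",
    columns=[("order_id", "string"), ("amount", "double")],
    location="s3://example-lake/sales/orders/",  # producer bucket shared via bucket policy
    partitions=[("dt", "string")],
)
print(ddl)
```

The schema written here is a copy: if the producer adds or renames a column, this table does not change until the consumer updates it or reruns a Crawler.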

Detailed Implementation Flow

1. Producer (Account A)
- Identify the S3 bucket/prefix to share
- Add Consumer (Account B) access permissions to the bucket policy
- (If encryption is used) Add Consumer permissions to the KMS Key policy
 
2. Consumer (Account B)
- Grant S3 Read permissions to the IAM Role/User
- Create tables manually in Glue Data Catalog or generate schema via Crawler
- Start querying via Athena/Glue/EMR using the created tables
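The producer and consumer steps above can be sketched as policy documents. This is a minimal example under assumptions: the bucket `example-lake`, the prefix `sales`, and the account ID are hypothetical, and the action list is the smallest read-only set I would start from (Athena, for example, usually also needs `s3:GetBucketLocation`).

```python
import json


def _read_statements(bucket, prefix):
    # Shared shape: list within the prefix, and read the objects under it.
    return [
        {
            "Sid": "ListSharedPrefix",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{bucket}",
            "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
        },
        {
            "Sid": "ReadSharedObjects",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
        },
    ]


def producer_bucket_policy(bucket, prefix, consumer_account_id):
    """Resource policy attached to the bucket in Account A."""
    principal = {"AWS": f"arn:aws:iam::{consumer_account_id}:root"}
    statements = [dict(s, Principal=principal) for s in _read_statements(bucket, prefix)]
    return {"Version": "2012-10-17", "Statement": statements}


def consumer_identity_policy(bucket, prefix):
    """Identity policy attached to the IAM Role/User in Account B."""
    return {"Version": "2012-10-17", "Statement": _read_statements(bucket, prefix)}


def kms_key_policy_statement(consumer_account_id):
    """Extra key-policy statement, only needed when the objects use a KMS key."""
    return {
        "Sid": "AllowConsumerDecrypt",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{consumer_account_id}:root"},
        "Action": ["kms:Decrypt", "kms:DescribeKey"],
        "Resource": "*",
    }


bucket_policy = producer_bucket_policy("example-lake", "sales", "222222222222")
identity_policy = consumer_identity_policy("example-lake", "sales")
print(json.dumps(bucket_policy, indent=2))
```

Both sides are required for cross-account access: the bucket policy alone does not give a role in Account B permission, and the identity policy alone cannot reach a bucket it does not own.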

Sharing Schema and Tables: Glue Data Catalog

A method that shares not only data but also table definitions (schema).

Characteristics

Set resource policies on the data-owning account’s Glue Data Catalog
The consumer account registers it as an external DataCatalog in Athena
Tables can be queried in the format ownerCatalog.db.table
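As a rough illustration of the consumer side, the sketch below builds the arguments for Athena's `create_data_catalog` call (a catalog of type `GLUE` whose `catalog-id` parameter is the owner's account ID) and a fully qualified table reference. The catalog name `producer_catalog`, the account ID, and the table names are hypothetical, and nothing is sent to AWS here.

```python
def athena_catalog_request(name, producer_account_id):
    """Arguments shaped for athena.create_data_catalog(**request)."""
    return {
        "Name": name,
        "Type": "GLUE",
        "Description": f"Glue Data Catalog of account {producer_account_id}",
        "Parameters": {"catalog-id": producer_account_id},
    }


def qualified_table(catalog, database, table):
    """Athena query reference in catalog.db.table form, with quoted identifiers."""
    return f'"{catalog}"."{database}"."{table}"'


request = athena_catalog_request("producer_catalog", "111111111111")
query = f"SELECT * FROM {qualified_table('producer_catalog', 'sales', 'orders')} LIMIT 10"
print(request)
print(query)
```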

Things to Be Aware Of

Glue Catalog permissions are verified first.
- S3 access is then checked separately via S3/IAM policies, so both must be granted
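The two-stage check can be pictured with a toy model. This is not how AWS evaluates policies internally; it is only a sketch, with hypothetical names (`CatalogTable`, `check_access`, the role `analyst`), showing why a query can pass the catalog check and still fail on storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogTable:
    name: str       # "db.table" in the producer catalog
    location: str   # S3 URI the table definition points to


def check_access(role, table, glue_grants, s3_grants):
    """Toy two-stage check: catalog metadata first, then the underlying S3 path."""
    if (role, table.name) not in glue_grants:
        return False, "denied by Glue Data Catalog permissions"
    prefixes = [prefix for granted_role, prefix in s3_grants if granted_role == role]
    if not any(table.location.startswith(prefix) for prefix in prefixes):
        return False, "denied by S3/IAM permissions"
    return True, "allowed"


orders = CatalogTable("sales.orders", "s3://example-lake/sales/orders/")
glue_grants = {("analyst", "sales.orders")}

print(check_access("analyst", orders, glue_grants, set()))
print(check_access("analyst", orders, glue_grants, {("analyst", "s3://example-lake/sales/")}))
```

In practice this is the situation where the table is visible and its schema can be read, yet the query itself fails with an S3 access error.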

Detailed Implementation Flow

1. Producer (Account A)
- Determine the target DB/tables to share
- Add Consumer (Account B) permissions to the Data Catalog Resource Policy
- Identify the S3 bucket/prefix to share
- Add Consumer (Account B) access permissions to the bucket policy
 
2. Consumer (Account B)
- Grant Glue permissions received from Account A to the Consumer IAM Role/User
- Register the external Data Catalog in Athena
- Query in the format producerCatalog.db.table
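For the producer step of adding Account B to the Data Catalog resource policy, a policy document could look like the sketch below. The region, account IDs, and database name are hypothetical, and the action list is a read-only assumption on my part; the resulting JSON would be passed to Glue's `put_resource_policy` as `PolicyInJson`.

```python
import json


def glue_catalog_policy(region, producer_account_id, consumer_account_id, database):
    """Read-only Glue Data Catalog resource policy for one shared database."""
    base = f"arn:aws:glue:{region}:{producer_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{consumer_account_id}:root"},
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                ],
                # The catalog, the database, and its tables must all be listed.
                "Resource": [
                    f"{base}:catalog",
                    f"{base}:database/{database}",
                    f"{base}:table/{database}/*",
                ],
            }
        ],
    }


catalog_policy = glue_catalog_policy("ap-northeast-2", "111111111111", "222222222222", "sales")
policy_in_json = json.dumps(catalog_policy)
print(policy_in_json)
```

The consumer's IAM Role/User then needs a matching identity policy on the same Glue resource ARNs, in addition to the S3 read permissions from the previous method.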

Share and Manage Everything at Once: Lake Formation (Recommended)

An approach that leverages Data Lake Locations in Lake Formation, a dedicated data lake governance service.

Characteristics

Permission management at the DB/table level
Row-level and column-level access control available
Requires AWS RAM invitation acceptance + Resource Link creation
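Column-level control is easiest to see in the shape of a grant. The sketch below builds the arguments for Lake Formation's `grant_permissions` call using a `TableWithColumns` resource; the function name, account IDs, and table/column names are hypothetical. Row-level control uses data cell filters instead and is not shown here.

```python
def column_grant_request(producer_account_id, principal, database, table, columns):
    """Arguments shaped for lakeformation.grant_permissions(**request)."""
    return {
        # For a cross-account grant the principal can be the consumer's account ID.
        "Principal": {"DataLakePrincipalIdentifier": principal},
        "Resource": {
            "TableWithColumns": {
                "CatalogId": producer_account_id,
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


grant = column_grant_request(
    producer_account_id="111111111111",
    principal="222222222222",
    database="sales",
    table="orders",
    columns=["order_id", "amount"],   # columns outside this list stay hidden
)
print(grant)
```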

Advantages

Enables policy-centric data access management
Granular control at the account/role/user level

Detailed Implementation Flow

1. Producer (Account A)
- Review existing Glue tables
  Check DB/tables for the S3 paths to register
- Register Lake Formation Data Lake Location
  Select S3 path
  Specify IAM Role for LF to assume
  Check Hybrid access mode if needed
  (Hybrid access mode: existing IAM-based access keeps working, while only the opted-in targets are governed by LF)
- Grant DB/table permissions in Lake Formation
  Grant to Consumer account/ORG/OU
- Create AWS RAM sharing invitation
  Send resource sharing invitation to the Consumer account
- (If using Hybrid) Configure opt-in for LF-governed targets
  "Make LF Permissions effective immediately" option available
  
2. Consumer (Account B)
- Accept the invitation in the AWS RAM console
- Create a Resource Link in Lake Formation
- Grant Resource Link permissions
  Describe (Resource Link)
  Grant on target (source resource)
- Delegate permissions to internal Consumer IAM Roles/Users
  (If using Hybrid) Configure opt-in settings
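The consumer steps after accepting the RAM invitation can be sketched as three requests: create the Resource Link, grant Describe on the link, and grant on the target. The code only builds the arguments for Glue's `create_database` and Lake Formation's `grant_permissions`; the link name `sales_link`, the role ARN, and the account IDs are hypothetical.

```python
def resource_link_request(link_name, producer_account_id, shared_database):
    """Arguments for glue.create_database(**request): a database Resource Link."""
    return {
        "DatabaseInput": {
            "Name": link_name,
            "TargetDatabase": {
                "CatalogId": producer_account_id,
                "DatabaseName": shared_database,
            },
        }
    }


def describe_link_grant(consumer_account_id, principal_arn, link_name):
    """Grant DESCRIBE on the link itself, which lives in the consumer catalog."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Database": {"CatalogId": consumer_account_id, "Name": link_name}},
        "Permissions": ["DESCRIBE"],
    }


def target_select_grant(producer_account_id, principal_arn, shared_database):
    """Grant on target: SELECT on all tables of the shared source database."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                "CatalogId": producer_account_id,
                "DatabaseName": shared_database,
                "TableWildcard": {},
            }
        },
        "Permissions": ["SELECT"],
    }


analyst = "arn:aws:iam::222222222222:role/AnalystRole"
link = resource_link_request("sales_link", "111111111111", "sales")
on_link = describe_link_grant("222222222222", analyst, "sales_link")
on_target = target_select_grant("111111111111", analyst, "sales")
print(link, on_link, on_target, sep="\n")
```

Note the different catalog IDs: the link is a consumer-side object, while the data permissions always point back to the producer's catalog.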

Leon the start point
