You want to build AI-powered features for your European customers. You need to use LLMs, vector databases, and ML pipelines. But your data can’t leave the EU, and your compliance team needs documentation proving it.

Here’s how to architect GDPR-compliant AI pipelines on AWS, with everything running in the EU – specifically AWS Frankfurt (eu-central-1).

Why eu-central-1 (Frankfurt)

AWS Frankfurt is the most mature EU region for AI workloads:

  • Located in Germany – one of the strictest GDPR enforcement jurisdictions
  • Full service availability – SageMaker, Bedrock, Lambda, ECS, RDS, OpenSearch all available
  • AWS Bedrock – access to Claude, Llama, and other models with data processing agreements that cover EU data
  • Dedicated infrastructure – data physically resides in Frankfurt data centres

Alternative EU regions: eu-west-1 (Ireland), eu-south-1 (Milan), eu-north-1 (Stockholm). eu-west-2 (London) is often listed alongside these, but the UK is no longer in the EU – transfers there rely on the UK's adequacy decision, so confirm it meets your residency requirements before using it. Frankfurt is a common default because German data protection authorities are among the most active enforcers, and architectures that meet their standards tend to hold up well with other EU DPAs.

Architecture Overview

A typical GDPR-compliant AI pipeline on AWS eu-central-1:

User Request
  → API Gateway (eu-central-1)
    → Lambda / ECS (PII detection & anonymisation)
      → Bedrock / SageMaker (LLM inference)
        → Response de-anonymisation
          → DynamoDB / RDS (audit logging)
            → Response to user

Every component runs in eu-central-1. No data crosses regional boundaries.

Step 1: Lock Down the Region

Before writing any application code, enforce region restrictions at the AWS account level.

Service Control Policies (SCPs)

Create an SCP that prevents any service from launching outside EU regions. Global services such as IAM, CloudFront, Route 53 and STS must be exempted via NotAction, because their control planes resolve to us-east-1 and a blanket deny would break them:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyNonEURegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*",
        "organizations:*",
        "route53:*",
        "cloudfront:*",
        "sts:*",
        "support:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "eu-central-1",
            "eu-west-1",
            "eu-west-2",
            "eu-north-1",
            "eu-south-1"
          ]
        }
      }
    }
  ]
}

This ensures that even if someone accidentally configures a regional service in us-east-1, the request is denied. This is your first line of defence for data residency.

VPC Configuration

  • Create your VPC in eu-central-1
  • Use VPC endpoints for AWS services (S3, Bedrock, DynamoDB) to keep traffic on the AWS private network
  • Enable VPC Flow Logs for audit purposes

Step 2: PII Detection and Anonymisation

Before any data reaches an LLM, strip personally identifiable information.

Using Amazon Comprehend

Amazon Comprehend’s PII detection is available in eu-central-1 and can identify:

  • Names, addresses, phone numbers
  • Email addresses, credit card numbers
  • Dates of birth, SSNs, passport numbers

Custom Anonymisation Layer

For domain-specific PII (patient IDs, internal employee codes, custom identifiers), build a custom anonymisation service:

  1. Detect – Use Comprehend + custom regex patterns to identify PII
  2. Tokenise – Replace each PII entity with a unique token (e.g., [PERSON_001])
  3. Store the mapping – Keep the token-to-PII mapping in an encrypted DynamoDB table (also in eu-central-1)
  4. Send – Pass only the anonymised text to the LLM
  5. De-tokenise – Replace tokens with original PII in the response

The mapping table should have a TTL (time-to-live) aligned with your data retention policy – typically 30-90 days for processing purposes.
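The detect/tokenise/de-tokenise steps above can be sketched as follows. This is a minimal in-memory version: in production the detect step would merge Comprehend's detect_pii_entities results with these custom patterns, and the mapping would live in the encrypted DynamoDB table, not a dict. The PAT-NNNNNN patient-ID format is a hypothetical example:

```python
import re

# Custom patterns for domain-specific PII (illustrative formats)
CUSTOM_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PATIENT_ID": re.compile(r"\bPAT-\d{6}\b"),  # hypothetical internal format
}

def tokenise(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII with stable tokens; return text + mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in CUSTOM_PATTERNS.items():
        for i, match in enumerate(pattern.findall(text), start=1):
            token = f"[{label}_{i:03d}]"
            mapping[token] = match
            text = text.replace(match, token)
    return text, mapping

def detokenise(text: str, mapping: dict[str, str]) -> str:
    """Restore original PII in the model's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Only the anonymised text and the tokens ever reach the model; the mapping stays inside your boundary.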

Step 3: LLM Inference in the EU

Option A: Managed Models on AWS Bedrock

AWS Bedrock is available in eu-central-1 and provides access to (exact model availability varies by region – check the Bedrock console):

  • Anthropic Claude – Sonnet, Haiku
  • Meta Llama – 3.1, 3.2
  • Amazon Titan – Text, embeddings

Key compliance features:

  • Data stays in-region – inference happens in Frankfurt
  • No model training on your data – Bedrock does not use customer data to train models
  • AWS BAA available – Business Associate Agreement for healthcare workloads
  • DPA included – AWS’s Data Processing Addendum covers GDPR requirements

Option B: Self-Hosted Models on SageMaker

For maximum control, deploy open-source models on SageMaker endpoints in eu-central-1:

  • Llama 3.1 70B – Strong general-purpose model
  • Mistral / Mixtral – European-built, strong multilingual capabilities
  • Domain fine-tuned models – Your own models trained on your data

Benefits: complete data isolation, no third-party processing, full control over model versions and updates.

Trade-off: higher cost (GPU instances), operational overhead, potentially lower capabilities than frontier APIs.

Option C: Zero-Retention API Configuration

If you must use external LLM APIs (OpenAI, Anthropic direct), configure zero-retention:

  • Anthropic: Enterprise plan with zero-retention DPA
  • OpenAI: API data usage policy (opt-out of training) + DPA

Ensure your API calls route through your EU VPC – use a proxy Lambda to log requests and enforce anonymisation before data leaves your infrastructure.

Step 4: Vector Databases for RAG

If you’re building retrieval-augmented generation pipelines, your vector database must also reside in the EU.

Amazon OpenSearch Serverless (eu-central-1)

  • Native vector search support
  • Serverless – no infrastructure management
  • Encryption at rest and in transit by default
  • Fine-grained access control with IAM

Amazon RDS for PostgreSQL + pgvector

  • Run PostgreSQL with the pgvector extension in eu-central-1
  • Full SQL capabilities alongside vector search
  • Familiar tooling for most engineering teams
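A RAG lookup against pgvector is plain SQL. The query below is illustrative – the documents table, embedding and content columns are assumed names, and <=> is pgvector's cosine-distance operator; run it with any PostgreSQL driver (e.g. psycopg) against the RDS instance:

```python
def knn_sql(k: int = 5) -> str:
    """Build a top-k nearest-neighbour query for a pgvector column.
    Bind %(query_vec)s to the query embedding at execution time."""
    return (
        "SELECT id, content FROM documents "
        "ORDER BY embedding <=> %(query_vec)s::vector "
        f"LIMIT {int(k)}"
    )
```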

Self-Hosted Options

  • Weaviate on EKS in eu-central-1
  • Qdrant on ECS in eu-central-1
  • Pinecone – check their EU region availability (currently limited)

Step 5: Encryption Everywhere

GDPR doesn’t explicitly require encryption, but it’s listed as an appropriate technical measure under Article 32. For AI pipelines handling personal data, implement:

At Rest

  • S3: SSE-S3 or SSE-KMS with customer-managed keys
  • DynamoDB: Encryption enabled by default, use KMS for customer-managed keys
  • RDS: Encrypted storage volumes + encrypted snapshots
  • SageMaker: Encrypted model artefacts and training data

In Transit

  • TLS 1.2+ on all API endpoints
  • VPC endpoints to keep traffic off the public internet
  • Certificate pinning for service-to-service communication

Key Management

  • Use AWS KMS in eu-central-1 for all encryption keys
  • Keys never leave the region
  • Enable key rotation
  • Audit all key usage via CloudTrail

Step 6: Audit Logging

GDPR’s accountability principle requires you to demonstrate compliance. Build comprehensive logging:

What to Log

  • Every AI inference request (timestamp, user ID, anonymised input, model used)
  • PII detection results (what was found and anonymised)
  • Access to personal data (who accessed what, when)
  • Data deletion events (right-to-erasure fulfilment)
  • Model version changes

Where to Log

  • CloudTrail – API-level audit trail for all AWS actions
  • CloudWatch Logs – Application-level logging in eu-central-1
  • DynamoDB – Structured audit records with TTL for retention management
  • S3 – Long-term audit archive with lifecycle policies

Retention

Align log retention with your GDPR data retention policy. Typically:

  • Operational logs: 30-90 days
  • Audit logs: 12-24 months
  • Compliance documentation: duration of processing + 3 years

Step 7: Right to Erasure (Article 17)

Your AI pipeline must support data deletion requests. This means:

  1. Index all personal data – Know exactly where each person’s data lives across your pipeline
  2. Purge from vector databases – Delete embeddings derived from the individual’s data
  3. Clear anonymisation mappings – Delete token-to-PII mappings
  4. Remove from logs – Redact or delete personal data from audit logs (keep anonymised records)
  5. Confirm deletion – Document what was deleted and when

Cost Considerations

Running AI pipelines in eu-central-1 typically costs 5-10% more than us-east-1 due to regional pricing. For a typical enterprise workload:

Component                                         Monthly Estimate
Bedrock (Claude Sonnet, 1M tokens/day)            €2,000-4,000
SageMaker endpoint (Llama 70B, ml.g5.12xlarge)    €5,000-8,000
OpenSearch Serverless (vector store)              €500-1,500
Lambda + API Gateway                              €200-500
DynamoDB (audit logs)                             €100-300
Total                                             €3,000-14,000/mo

The spread in the total reflects your inference choice: the low end assumes Bedrock-only inference, the high end assumes running both Bedrock and a SageMaker endpoint.

The cost premium for EU hosting is negligible compared to the cost of GDPR non-compliance (up to 4% of global annual turnover).

Common Pitfalls

1. Using a global CDN that caches personal data outside the EU

CloudFront price classes cannot restrict edge locations to the EU alone (even the cheapest class includes North America), so either avoid caching responses that contain personal data, or bypass the CDN for those routes and serve them from a regional endpoint.

2. Forgetting about CloudWatch cross-region replication

Disable any cross-region log replication that might copy personal data outside EU regions.

3. Using third-party AI APIs without DPAs

Every external service that touches personal data needs a Data Processing Agreement. This includes vector database SaaS providers, embedding APIs, and evaluation tools.

4. Not accounting for model updates

When AWS updates Bedrock models, your system’s behaviour changes. Log model versions with every inference for audit traceability.

Next Steps

Building GDPR-compliant AI pipelines requires careful architecture from day one – retrofitting compliance into an existing pipeline is significantly more expensive and error-prone.

For a deeper dive into GDPR and AI, read our comprehensive guide to GenAI and GDPR compliance. To understand the broader regulatory landscape including the EU AI Act, see our EU AI Act compliance checklist.



At HASORIX, we build compliant AI systems for European enterprises – from architecture to deployment to documentation. Talk to us about your AI pipeline.