What are the privacy implications of using LLMs with user data?

Asked about 2 months ago, viewed 158 times

Our company wants to use GPT-4 to analyze customer support tickets and suggest responses. Legal and compliance teams are concerned about:

  1. Data retention: Does OpenAI store our API requests?
  2. Training data: Will our data be used to train future models?
  3. GDPR compliance: How do we handle EU customer data?
  4. Sensitive information: What if tickets contain PII or confidential info?

Options we're considering:

  • Use OpenAI's zero-retention API
  • Self-host an open-source LLM (Llama 3, Mistral)
  • Implement PII redaction before sending to LLM
  • Use Azure OpenAI for enterprise compliance

What's the current best practice for using LLMs in privacy-sensitive contexts? Has anyone successfully navigated GDPR compliance with LLM-powered features?


Comments


Don't forget about data residency requirements in other regions like Canada, Australia, etc. Azure OpenAI has regional deployments for this.

Lisa Kim (1890), about 2 months ago


1 Answer


We went through this exact process. Here's what we learned:

OpenAI API Privacy (as of 2024):

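  • ✅ API data is NOT used for training by default (per their policy)
  • ✅ Zero data retention is available to eligible (typically Enterprise) customers on request
  • ✅ By default, API data is retained for up to 30 days (for abuse monitoring) and then deleted; with zero retention it is not stored
  • ⚠️ Data is still sent to OpenAI's servers (a compliance issue for some organizations)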

Our Solution: We use a hybrid approach:

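  1. PII redaction (before the LLM):
```python
def redact_pii(text):
    # redact_emails, redact_phone_numbers and redact_names are helpers defined elsewhere
    text = redact_emails(text)
    text = redact_phone_numbers(text)
    text = redact_names(text)  # Using NER model
    return text
```
  2. Azure OpenAI for EU customers:
     • Data stays in the EU (when you use an EU regional deployment)
     • Helps with GDPR compliance (Microsoft provides a DPA)
     • Enterprise SLA
  3. Self-hosted Llama 3 for the most sensitive data:
     • Full control over the data
     • Higher infrastructure cost
     • Slightly lower output quality than GPT-4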
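The redaction helpers in step 1 are not defined in the answer. Below is a minimal sketch of how they, and the routing between the three backends, might look. It assumes regex matching for emails and phone numbers and spaCy `PERSON` entities for the NER step (the answer does not say which NER model it uses). The backend labels and the default for non-EU, non-sensitive traffic are hypothetical placeholders.

```python
import re
from typing import Callable, Optional

# Deliberately simple patterns (assumption): they will miss some formats and may
# also catch other long digit runs such as dates or order numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\(?\d[\d\s().-]{6,}\d")


def redact_emails(text: str) -> str:
    return EMAIL_RE.sub("[EMAIL]", text)


def redact_phone_numbers(text: str) -> str:
    return PHONE_RE.sub("[PHONE]", text)


def make_spacy_name_redactor(model: str = "en_core_web_sm") -> Callable[[str], str]:
    """Build an NER-based name redactor; needs spaCy and the model installed."""
    import spacy  # imported lazily so the regex helpers work without spaCy

    nlp = spacy.load(model)

    def redact_names(text: str) -> str:
        doc = nlp(text)
        # Replace from the end so earlier character offsets stay valid.
        for ent in reversed(doc.ents):
            if ent.label_ == "PERSON":
                text = text[:ent.start_char] + "[NAME]" + text[ent.end_char:]
        return text

    return redact_names


def redact_pii(text: str, redact_names: Optional[Callable[[str], str]] = None) -> str:
    text = redact_emails(text)
    text = redact_phone_numbers(text)
    if redact_names is not None:  # skip the NER step if no redactor is supplied
        text = redact_names(text)
    return text


def choose_backend(customer_region: str, high_sensitivity: bool,
                   default: str = "openai-api") -> str:
    """Pick a backend label (hypothetical names) following the hybrid approach."""
    if high_sensitivity:
        return "self-hosted-llama3"
    if customer_region == "EU":
        return "azure-openai-eu"
    return default  # assumption: the answer does not name a default backend
```

With spaCy and its model installed, `redact_pii(ticket, make_spacy_name_redactor())` runs all three steps; without them, only the regex steps run. Either way redaction is best-effort, so it reduces rather than eliminates the PII that reaches the provider.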

GDPR Compliance Checklist:

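  • Data Processing Agreement (DPA) with each provider
  • Document data flows (including the LLM provider) in your privacy policy
  • Define and enforce data retention policies
  • Support user data deletion (right to erasure) requests
  • Conduct a Data Protection Impact Assessment (DPIA) and review it regularly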
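The retention and deletion items apply to whatever your own system stores (prompts, redacted tickets, model outputs); what the provider keeps is governed by its own policy. A minimal in-memory sketch, assuming a hypothetical `LoggedInteraction` record and a retention window you choose yourself:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class LoggedInteraction:
    """One stored prompt/response pair (hypothetical schema)."""
    customer_id: str
    created_at: datetime
    redacted_prompt: str
    response: str


def purge_expired(records: List[LoggedInteraction], now: datetime,
                  retention_days: int) -> List[LoggedInteraction]:
    """Keep only records newer than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r.created_at >= cutoff]


def erase_customer(records: List[LoggedInteraction],
                   customer_id: str) -> List[LoggedInteraction]:
    """Handle a deletion request by dropping every record for that customer."""
    return [r for r in records if r.customer_id != customer_id]
```

In a real system the same two operations would run as database deletes, with the purge scheduled (e.g. daily) and erasure triggered by the user's request.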

Cost comparison:

  • OpenAI API (GPT-4): $0.03 per 1K input tokens
  • Azure OpenAI: $0.04 per 1K tokens (plus enterprise compliance features)
  • Self-hosted Llama 3: ~$2,000/month in GPU costs, plus engineering time
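One way to compare the per-token options with the fixed self-hosting cost is a break-even volume: the monthly token count at which API spend equals the GPU bill. This sketch uses the answer's figures, treats each price as a single blended rate per 1K tokens (real pricing differs for input and output tokens), and ignores engineering time:

```python
def monthly_api_cost(tokens_per_month: float, price_per_1k: float) -> float:
    """API spend for one month at a flat per-1K-token price."""
    return tokens_per_month / 1000 * price_per_1k


def break_even_tokens(fixed_monthly_cost: float, price_per_1k: float) -> float:
    """Monthly token volume at which API spend equals a fixed monthly cost."""
    return fixed_monthly_cost / price_per_1k * 1000


if __name__ == "__main__":
    for name, price in [("OpenAI API", 0.03), ("Azure OpenAI", 0.04)]:
        tokens = break_even_tokens(2000, price)
        print(f"{name}: self-hosting breaks even at ~{tokens / 1e6:.1f}M tokens/month")
```

At these prices that is roughly 66.7M tokens/month for the OpenAI API and 50M for Azure OpenAI. Below those volumes the per-token APIs are cheaper even before engineering time is counted.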

For most companies, Azure OpenAI + PII redaction is the sweet spot.

answered about 2 months ago
