How to train ChatGPT on your own data (and avoid data leaks)

Hey all :eyes:
Was reading through some research on training ChatGPT on your own data, and the security section genuinely caught me off guard. A few things worth knowing:

  • Over 3 million custom GPTs have been created since OpenAI launched the feature
  • 75% of workers say using AI in their projects has improved their speed or output quality
  • Yet the majority of custom GPTs are vulnerable to prompt injection –

…meaning uploaded files and system instructions could be extracted with the right prompt :face_with_open_eyes_and_hand_over_mouth:

Have you tried training ChatGPT on your own data — and if so, what are you using it for?

Let me know in the comments :speech_balloon:

7 Likes

The prompt injection thing is wild and not talked about enough. I built a custom GPT for internal use, mostly onboarding docs, and a colleague extracted the entire system prompt in like 30 seconds just to prove a point. Ended up moving everything behind an API instead. Using custom GPTs for anything client-facing? Feels risky to me tbh
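For anyone wondering what "moving it behind an API" looks like in practice, here's a minimal sketch assuming a Python backend. The prompt text and function names are made up — the point is just that the system prompt lives on the server and only the assistant's reply ever reaches the client:

```python
# Sketch: keep the system prompt server-side instead of inside a custom GPT,
# where users can often extract it. Names and prompt text are hypothetical.

SYSTEM_PROMPT = "You are an onboarding assistant. Answer only from the docs."  # never sent to the browser

def build_chat_payload(user_message: str, model: str = "gpt-4o-mini") -> dict:
    """Assemble the request body on the server; the client never sees it."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    }

def reply_for_client(api_response: dict) -> str:
    """Return only the assistant text to the client -- no prompt, no metadata."""
    return api_response["choices"][0]["message"]["content"]
```

You'd send `build_chat_payload(...)` to the chat completions endpoint from your server and forward only `reply_for_client(...)` back. The model can still be prompt-injected, but the attacker's output channel no longer includes your raw instructions or config.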

3 Likes

Thanks for sharing that, Tim! This is exactly why the security side deserves more attention. People train ChatGPT on their data expecting privacy by default, but clearly that’s not always the case. Disabling Code Interpreter helps a lot, though it’s not a full fix. Sounds like the API route was the right call for your setup :blush:
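One more layer that pairs well with disabling Code Interpreter: screen the model's output before it reaches the user. A rough sketch below — the prompt text is made up, and this only catches verbatim echoes (a paraphrased leak slips right through), so treat it as defense in depth rather than a fix:

```python
# Rough sketch: refuse to return a response that appears to echo the system
# prompt verbatim. Naive by design -- paraphrased or translated leaks will
# still get through. Prompt text here is hypothetical.

SYSTEM_PROMPT = "You are an onboarding assistant. Answer only from the docs."

def looks_like_leak(response_text: str, secret: str = SYSTEM_PROMPT,
                    min_overlap: int = 20) -> bool:
    """True if any min_overlap-char slice of the secret appears in the response."""
    text = response_text.lower()
    sec = secret.lower()
    return any(
        sec[i:i + min_overlap] in text
        for i in range(max(1, len(sec) - min_overlap + 1))
    )

def safe_reply(response_text: str) -> str:
    """Replace a leaking response with a refusal before sending it on."""
    return "Sorry, I can't share that." if looks_like_leak(response_text) else response_text
```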

3 Likes