In our experience, scraping our web pages for chatbot training wasn’t very effective.
We started by scraping a single, comprehensive FAQ page from our site. While the content is SEO-optimized and works well for human visitors, it repeatedly confused the AI.
We had to heavily edit and shorten the training document created from the scraped page. Only after removing every entry that overlapped with other training materials, including subtle or partial overlaps, did the chatbot give satisfactory answers again.
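We did this overlap hunt manually, but a first pass can be automated. The sketch below is a hypothetical helper, not our actual process: it assumes training documents are plain text with one entry per blank-line-separated paragraph, and the 0.6 similarity threshold is an arbitrary starting point to tune.

```python
# Hedged sketch: flag candidate overlaps between two training files.
# Assumes blank-line-separated entries; threshold is a guess to tune.
from difflib import SequenceMatcher


def split_entries(text: str) -> list[str]:
    """Split a training document into entries on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def find_overlaps(doc_a: str, doc_b: str, threshold: float = 0.6):
    """Return (entry_a, entry_b, similarity) triples whose text
    similarity meets the threshold, catching partial overlaps a
    human skim might miss."""
    overlaps = []
    for a in split_entries(doc_a):
        for b in split_entries(doc_b):
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                overlaps.append((a, b, round(ratio, 2)))
    return overlaps
```

Anything flagged is only a candidate for review; a human (or the chatbot itself, as described below) still decides which copy to keep.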
Our takeaway: content that works for people doesn’t always work for the AI.
What works better for us is monitoring chatbot history emails. When we spot an unsatisfactory answer, we upload that chat to ChatGPT along with our training documents, explain the correct response, and ask ChatGPT to suggest specific edits to the training files.
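We paste these pieces together by hand, but the assembly step is mechanical enough to script. The helper below is a hypothetical sketch of that step only: `build_review_prompt` and the file names are our own inventions, and the actual call to a model is deliberately left out.

```python
# Hedged sketch: assemble a review prompt from a flagged chat
# transcript and the current training documents. The function
# name and file layout are assumptions, not an established API.
from pathlib import Path


def build_review_prompt(transcript: str, doc_paths: list[str]) -> str:
    """Combine a problem transcript with the training files and
    ask for specific edits that would fix the answer."""
    parts = [
        "Here is a chatbot conversation with an unsatisfactory answer:",
        transcript,
        "Below are our current training documents:",
    ]
    for p in doc_paths:
        parts.append(f"--- {p} ---\n{Path(p).read_text()}")
    parts.append(
        "Explain what the correct response should be, then suggest "
        "specific edits to these documents so the chatbot answers "
        "this question correctly."
    )
    return "\n\n".join(parts)
```

The resulting string can be pasted into ChatGPT, or sent through an API client if the loop is ever automated.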
Since the chatbot is newly installed, we’re currently making a few edits per day.
At this stage, we’re impressed by the quality of nearly all responses.