Automated Regulatory Website Monitoring & Information Structuring System

Client: T-Regs
Industry: Telecom
Duration: 1 month

Key Results

  • 100% Manual Monitoring Eliminated: previously required manual visits across regulatory websites; now fully automated with zero manual checks.
  • 90% Time Saved per Day: daily monitoring effort reduced from 1–2 hours to under 10 minutes through automated detection, translation, and summarization.

Overview

An AI-powered system that monitors 100+ regulatory sources daily, extracts relevant updates, and generates executive summaries automatically.

The Challenge

T-Regs monitors regulatory and authority websites across multiple jurisdictions to identify newly published materials relevant to its advisory work. These sources vary significantly in structure, language, and publication format, and updates are not consistently announced via standard feeds.

The client required a reliable and repeatable mechanism to:

  • Track selected public websites on a continuous basis
  • Identify newly published content per source
  • Translate articles published in different languages
  • Store findings in a structured and queryable format
  • Distribute summaries internally in a consistent way

The solution needed to operate autonomously, remain transparent in its behavior, and be configurable without reliance on third-party platforms.

The Solution

A Python-based monitoring application was designed and implemented to periodically scan a defined list of regulatory and authority websites and detect newly published links.

Monitoring & Discovery

Websites are defined in a CSV configuration file, allowing explicit control over monitored sources. Each run compares discovered links against previously recorded entries to identify new items. First-time runs populate the database without AI processing to minimize unnecessary external calls.
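A minimal sketch of that discovery pass is shown below, assuming a sources.csv with a url column and a SQLite links table keyed by URL; it fetches pages with requests for brevity, whereas the production system also drives Selenium for JavaScript-heavy sources. All file, table, and column names here are illustrative, not the actual implementation.

```python
# Illustrative sketch of the discovery pass; file names, the schema,
# and the use of plain requests (vs. Selenium) are assumptions.
import csv
import sqlite3
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def load_sources(path="sources.csv"):
    """Read the monitored websites from the CSV configuration file."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["url"] for row in csv.DictReader(f)]

def discover_new_links(db_path="monitor.db", first_run=False):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, source TEXT)"
    )
    new_links = []
    for source in load_sources():
        soup = BeautifulSoup(requests.get(source, timeout=30).text, "html.parser")
        for a in soup.find_all("a", href=True):
            url = urljoin(source, a["href"])
            # INSERT OR IGNORE doubles as the "previously recorded" check:
            # rowcount is 1 only for URLs not yet in the database.
            cur = conn.execute(
                "INSERT OR IGNORE INTO links (url, source) VALUES (?, ?)",
                (url, source),
            )
            if cur.rowcount and not first_run:
                new_links.append(url)
    conn.commit()
    # On a first run everything is inserted but nothing is returned,
    # so downstream AI processing is skipped entirely.
    return new_links
```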

Content Processing

For newly detected pages, the system extracts text content from publicly accessible HTML pages. When enabled, AI is used to generate structured summaries and metadata with support for 23 languages. Non-HTML documents are identified and recorded without content transformation.
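The per-page step could look roughly like the sketch below, which extracts visible text with BeautifulSoup and asks the OpenAI chat API for an English summary; the model name, prompt, and truncation limit are placeholder assumptions rather than the deployed configuration.

```python
# Hypothetical per-page processing step; the model name, prompt, and
# 8,000-character cap are illustrative assumptions.
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def process_page(url):
    resp = requests.get(url, timeout=30)
    content_type = resp.headers.get("Content-Type", "")
    if "text/html" not in content_type:
        # Non-HTML documents (PDFs etc.) are recorded without transformation.
        return {"url": url, "type": content_type, "summary": None}

    text = BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Summarise this regulatory update in English, "
                        "regardless of the source language, in three sentences."},
            {"role": "user", "content": text[:8000]},  # cap prompt size
        ],
    )
    return {"url": url, "type": content_type,
            "summary": completion.choices[0].message.content}
```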

Data Storage & Notification

All records are stored in a local SQLite database with automatic schema adaptation. Email notifications, which include per-source counts and AI-generated summaries, are sent on configurable schedules restricted to weekdays and defined time windows.
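As a rough illustration of the scheduling gate and dispatch, the sketch below restricts sending to weekday hours and assembles a plain-text digest over SMTP; the time window, addresses, and host are placeholders, and the real system's twice-daily schedule is more elaborate than this single gate.

```python
# Simplified sketch of the notification gate and e-mail dispatch;
# the time window, addresses, and SMTP host are placeholders.
import smtplib
from datetime import datetime, time
from email.message import EmailMessage

def within_send_window(now=None, start=time(8, 0), end=time(18, 0)):
    """Allow sending only on weekdays inside the configured window."""
    now = now or datetime.now()
    return now.weekday() < 5 and start <= now.time() <= end

def send_digest(counts, summaries):
    if not within_send_window():
        return
    # Per-source counts first, then the AI-generated summaries.
    body = "\n".join(f"{source}: {n} new item(s)" for source, n in counts.items())
    body += "\n\n" + "\n\n".join(summaries)

    msg = EmailMessage()
    msg["Subject"] = "Regulatory monitoring digest"
    msg["From"] = "monitor@example.com"
    msg["To"] = "team@example.com"
    msg.set_content(body)
    with smtplib.SMTP("smtp.example.com") as smtp:
        smtp.send_message(msg)
```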

Technologies Used

Python, Selenium, BeautifulSoup, OpenAI, SQLite, Docker, SMTP

"We commissioned Kuda to build a specialised web scraping tool, augmented with AI translation and summarisation for 23 languages, with storage in a structured database and twice-daily automated e-mailing of results. Kuda delivered a working prototype, and a series of iterative improvements we requested, in a timely manner and to our satisfaction. Overall, we consider this to have been a highly successful project. We use the tool daily. It saves us time, speeds up our information flow, and enables us to discover content of high importance to our business."
Alexa Veller, Owner

Want results like these?

Book a discovery call to discuss your document-heavy workflows and see how we can help automate them.