finance

Automating UK Corporate Liquidity Analysis via AI-Powered OCR

Confidential Fintech Startup
2 weeks

Key Results

75,000
Companies Processed
Number of companies whose financial filings were filtered and passed through OCR.

Overview

Built a pipeline that uses OCR and GPT-5.2 to extract "cash at bank" figures from 75,000+ unstructured Companies House filings and surface high-liquidity prospects.

The Challenge

The client needed a high-intent database of UK companies with significant cash reserves. Companies House provides a directory of 5 million entities, but detailed financial health is buried within annual accounts. Roughly 80% of these filings are PDFs with inconsistent formatting, which makes traditional scraping impractical. Manually opening, reading, and extracting "cash at bank" figures from tens of thousands of documents would have been prohibitively slow and expensive.

The Solution

I developed a scalable pipeline that filtered the national registry down to 75,000 relevant entities. Using the Companies House API, I programmatically retrieved each entity's financial filings and applied OCR to convert the PDFs into machine-readable text. Finally, I used GPT-5.2 to parse the balance sheets and identify the specific "cash at bank" line items, accounting for different reporting currencies (such as JPY) and varying accounting terminology.
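
To make the parsing step concrete, here is a minimal rule-based sketch in Python of the kind of extraction the LLM performs on OCR output. It is not the production code: the project used GPT-5.2 for this step, and a deterministic pass like this would at most serve as a cross-check. The label list, the currency mapping, and the names `extract_cash_line`, `detect_currency`, and `parse_amount` are all hypothetical, chosen for illustration.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional

# Hypothetical label variants; real filings use many more phrasings.
# Longer labels come first so the most specific one wins.
CASH_LABELS = (
    "cash at bank and in hand",
    "cash at bank",
    "cash in bank",
    "cash and cash equivalents",
)

# Hypothetical mappings. "¥" is ambiguous (JPY or CNY); this sketch assumes JPY.
ISO_CODES = ("GBP", "USD", "EUR", "JPY")
CURRENCY_SYMBOLS = {"£": "GBP", "$": "USD", "€": "EUR", "¥": "JPY"}

# A number with optional thousands separators, decimals, and accounting parentheses.
_AMOUNT = re.compile(r"\(?\d[\d,]*(?:\.\d+)?\)?")


class CashFigure(NamedTuple):
    label: str
    amount: Decimal
    currency: Optional[str]


def detect_currency(text: str) -> Optional[str]:
    """Prefer an explicit ISO code anywhere in the text, then fall back to a symbol."""
    for code in ISO_CODES:
        if re.search(rf"\b{code}\b", text):
            return code
    for symbol, code in CURRENCY_SYMBOLS.items():
        if symbol in text:
            return code
    return None


def parse_amount(token: str) -> Decimal:
    """Convert '1,234' or '(1,234)' to a Decimal; parentheses mean a negative value."""
    negative = token.startswith("(") and token.endswith(")")
    value = Decimal(token.strip("()").replace(",", ""))
    return -value if negative else value


def extract_cash_line(ocr_text: str) -> Optional[CashFigure]:
    """Return the first number that follows a known cash label, if any."""
    currency = detect_currency(ocr_text)
    for line in ocr_text.splitlines():
        lowered = line.lower()
        for label in CASH_LABELS:
            pos = lowered.find(label)
            if pos == -1:
                continue
            match = _AMOUNT.search(line[pos + len(label):])
            if match:
                return CashFigure(label, parse_amount(match.group()), currency)
    return None


sample = """Balance sheet as at 31 March 2024 (expressed in JPY)
Fixed assets 4,500,000 4,200,000
Cash at bank and in hand 12,345,678 9,876,543
"""
result = extract_cash_line(sample)
```

This sketch takes the first number after the label, which is the current-year column only in a simple two-column layout. A note-reference column, a label split across lines, or OCR noise would defeat it. Layout variation of that kind is the reason the pipeline relies on an LLM rather than fixed rules.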

Technologies Used

OCR, LLM, OpenAI, AI
