Challenges AI faces in understanding invoice information

Find out interesting insights with Niyati Chhaya, Co-Founder & VP-AI, Hyperbots

Moderated by Emily, Digital Transformation Consultant at Hyperbots

Don’t want to watch a video? Read the interview transcript below.

Emily: Hi, everyone. This is Emily, and I’m a digital transformation consultant at Hyperbots. I’m very pleased to have Niyati, the co-founder and VP of AI at Hyperbots, join us today. We’ll be talking about invoices, their peculiarities, and their nuances. Specifically, why is it not so trivial for AI to understand them? So Niyati, let’s start with the basics. What makes invoices peculiar?

Niyati: Thanks, it’s always great to be talking to you. What is an invoice? An invoice is the main document, or mode of communication, for any accounts payable (AP) process. Invoices state what payment needs to be made between a vendor and a company. They include addresses, identities, the goods or services being invoiced, and, of course, the amounts and other related information.

Emily: That’s quite a lot of information, right? All of this is usually put in a single format. So it’s all very structured, and the invoices look the same, correct?

Niyati: No, not really. In fact, an invoice by nature is unstructured. Every vendor may have their own way of representing an invoice. It’s not only about how it’s laid out on the page or across multiple pages; it’s also about the format in which it is communicated. Sometimes it’s via email, a scan, a PDF, or even an Excel sheet. An invoice doesn’t have to follow a specific structure as long as it communicates the necessary payment information and the details of the goods or services being transacted.

Emily: So this would also vary across industries, right? Can you give me an example?

Niyati: Sure! Let’s take the example of a software company. Maybe they’re renting office space. The invoice between the office provider and the software company will likely include the number of seats, infrastructure costs, and similar details. However, they probably won’t have invoices for transportation costs. On the other hand, if we talk about a manufacturing firm building machinery, their invoices might include transportation costs for different parts of the machine—things like weight, transportation charges, and other associated costs. So, depending on the nature of the business and the industry, invoices can vary significantly.

Emily: Got it. So, industry verticals have a huge impact. Now, can you explain the part of the invoice that contains the actual amounts and item details? What’s challenging there?

Niyati: That’s a great question. Line items, the part of the invoice that lists what is being invoiced, are core to the document. They show things like “I gave you 100 apples at this rate, and this is the cost.” Line items can also carry product codes. For example, an electric bulb manufacturer might offer different types of bulbs, and the line item would specify which bulbs were delivered. They can also include things like labor charges, such as a service invoice saying, “Service provided by person X at rate Y for 30 hours a week.” The AI needs to understand that each type of line item has its own unit price, net amount, taxes, or discounts. Together, the rows of this table determine the final amount of the transaction between the companies.
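
To make the arithmetic behind line items concrete, here is a minimal Python sketch of how extracted rows might be rolled up into an invoice total. It is an illustration only, not how Hyperbots processes invoices: the `LineItem` class, its field names, the `invoice_total` helper, and all rates and prices are hypothetical, and it assumes the discount is applied before tax.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


@dataclass
class LineItem:
    """One row of an invoice table (hypothetical schema for illustration)."""
    description: str
    quantity: Decimal
    unit_price: Decimal
    discount_rate: Decimal = Decimal("0")  # e.g. 0.10 means 10% off
    tax_rate: Decimal = Decimal("0")       # e.g. 0.05 means 5% tax

    def net_amount(self) -> Decimal:
        # Assumption: the discount is applied to quantity * unit price.
        gross = self.quantity * self.unit_price
        return (gross * (1 - self.discount_rate)).quantize(CENT, rounding=ROUND_HALF_UP)

    def tax_amount(self) -> Decimal:
        # Assumption: tax is charged on the discounted net amount.
        return (self.net_amount() * self.tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)


def invoice_total(items: list[LineItem]) -> Decimal:
    """Sum net amount plus tax over all line items."""
    return sum((item.net_amount() + item.tax_amount() for item in items), Decimal("0"))


# A goods line and a labor line, each with different pricing rules.
items = [
    LineItem("Apples", Decimal("100"), Decimal("0.50"), tax_rate=Decimal("0.05")),
    LineItem("Service by person X (hours)", Decimal("30"), Decimal("40.00"),
             discount_rate=Decimal("0.10")),
]

print(invoice_total(items))  # 1132.50
```

The two rows use the same structure but different rules (one taxed, one discounted), which is the kind of per-row variation the model has to recognize before any total can be trusted.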

Emily: So an invoice will just have one table, right? Can’t we build models to understand that single table?

Niyati: Not necessarily! An invoice can have as many tables as needed. For example, I might decide to show basic goods in one table, additional charges in another table, and vendor identity information in a third table. Someone else might combine all this into one table, or display vendor information as text, the items in a table, and fuel charges as a subscript. So, there’s no rule about just having one table in an invoice—there can be many.

Emily: If I understand correctly, then the industry, document format, and layout of the line items all make invoices quite complex for AI to interpret, right?

Niyati: Exactly. Invoices are semi-structured business documents with a mix of well-written language, codes, and numbers all together.

Emily: What about the small subscript text often found in invoices? Is that important, and can AI models understand it?

Niyati: Yes, it is important. Just like when we sign things online without reading the fine print and later realize we’ve agreed to monthly charges, subscripts in invoices are critical. They can cover things like payment terms, penalties, discounts, or conditions that have already been agreed upon. However, understanding them is hard for AI. AI needs specialized models because subscripts can be tiny and difficult to read, and standard OCR (Optical Character Recognition) struggles with that. You need enhanced extraction capabilities for the AI pipeline to first read the subscript text. After that, the AI needs to interpret the details. For example, if a payment term says that a penalty applies when the invoice is not paid by the due date, and that interest is charged monthly, the AI has to calculate all of that.
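
As a rough sketch of the calculation such a fine-print clause implies, the Python below computes the amount owed on a late payment. The function name `amount_due`, the flat penalty, the interest rate, and the rule that every started 30-day period counts as a full month of simple interest are all assumptions made for illustration; real invoice terms vary and would have to be read from the document itself.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def amount_due(principal: Decimal, due_date: date, paid_on: date,
               flat_penalty: Decimal, monthly_interest_rate: Decimal) -> Decimal:
    """Amount owed under a hypothetical late-payment clause.

    Assumptions (not taken from any real invoice): a one-time flat penalty,
    simple (non-compounding) interest, and 30-day months rounded up.
    """
    if paid_on <= due_date:
        return principal  # paid on time: no penalty, no interest

    days_late = (paid_on - due_date).days
    months_late = -(-days_late // 30)  # ceiling division
    interest = principal * monthly_interest_rate * months_late
    total = principal + flat_penalty + interest
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Paid 44 days late, which counts as 2 started months under these assumptions.
late_total = amount_due(
    principal=Decimal("1000.00"),
    due_date=date(2024, 1, 31),
    paid_on=date(2024, 3, 15),
    flat_penalty=Decimal("25.00"),
    monthly_interest_rate=Decimal("0.015"),
)
print(late_total)  # 1055.00
```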

Emily: I get the part about discounts and penalties, but isn’t text extraction from invoices something OCR already solves?

Niyati: OCR does help, but only to an extent. It’s an established technology, but it’s not always adapted to every scenario. For instance, OCR might struggle with crumpled paper or tables that aren’t read left to right. In some invoices, information like the invoice ID is written vertically rather than horizontally. Just like humans need to tilt their heads, AI needs to do the same. Standard OCR won’t handle that well, so there’s still some work to make it usable for finance and AP processes.

Emily: I didn’t realize there were so many practical challenges involved. It seems like even just reading an invoice requires expertise.

Niyati: Yes, exactly! That’s why it’s so interesting. Out of curiosity, let’s say I have a really advanced AI that solves all these problems. Would invoice understanding be easy then? Not quite. You’d still need a finance-minded perspective or expertise to fully understand invoices. Let me give you an example: If I show an invoice to a layperson, which is similar to showing it to an AI that doesn’t understand these nuances, certain things won’t make sense. For example, if there’s a term like “1% 10, net 30” on the invoice, it may look like a code. But to an accountant, it means if you pay within 10 days of the invoice date, you get a 1% discount. If not, the full payment is due in 30 days. So even if the AI is good at reading the invoice, it still needs to be trained in these kinds of accounting-specific interpretations.
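
The interpretation of a term like “1% 10, net 30” can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production parser: the `PaymentTerms` class, the `parse_terms` and `amount_payable` helpers, and the regular expression are hypothetical, and the pattern only covers this one family of notations (including the common “2/10 net 30” spelling).

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP

# Matches forms such as "1% 10, net 30" or "2/10 net 30" (illustrative, not exhaustive).
TERMS_PATTERN = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)\s*%?\s*[/ ]\s*(?P<disc_days>\d+)\s*,?\s*net\s*(?P<net_days>\d+)",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class PaymentTerms:
    discount_rate: Decimal  # 0.01 means a 1% early-payment discount
    discount_days: int      # days after the invoice date the discount is valid
    net_days: int           # days after the invoice date the full amount is due


def parse_terms(text: str) -> PaymentTerms:
    match = TERMS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"Unrecognized payment terms: {text!r}")
    return PaymentTerms(
        discount_rate=Decimal(match["pct"]) / 100,
        discount_days=int(match["disc_days"]),
        net_days=int(match["net_days"]),
    )


def amount_payable(terms: PaymentTerms, total: Decimal,
                   invoice_date: date, pay_date: date) -> Decimal:
    """Apply the discount only if payment falls inside the discount window."""
    if pay_date <= invoice_date + timedelta(days=terms.discount_days):
        discounted = total * (1 - terms.discount_rate)
        return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return total


terms = parse_terms("1% 10, net 30")
invoice_date = date(2024, 3, 1)
due_date = invoice_date + timedelta(days=terms.net_days)

print(amount_payable(terms, Decimal("1000.00"), invoice_date, date(2024, 3, 8)))   # 990.00
print(amount_payable(terms, Decimal("1000.00"), invoice_date, date(2024, 3, 25)))  # 1000.00
print(due_date)  # 2024-03-31
```

Reading the characters is the OCR problem; mapping them to a discount window and a due date is the accounting knowledge Niyati is pointing to.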

Emily: Wow, after hearing you describe all these challenges, it sounds quite overwhelming. At the same time, it’s really exciting to see AI solving these problems for finance teams.

Niyati: Absolutely! The technology is finally at a point where we can build specialized models to tackle these challenges. It requires out-of-the-box thinking, experimentation, and intelligent product design, but we’re getting closer to solving this in a way that saves time and money for businesses.

Emily: Thank you so much, Niyati. It was wonderful chatting with you and hearing your enthusiasm!