Priming Data For RPA

By Express Computer On May 27, 2019

By Abhiram Modak, Chief Principal Consultant – BFSI Vertical, Persistent Systems

Robotic Process Automation (RPA) is becoming an indispensable part of back-office operations at banks and insurance companies. It is taking over the task of populating data into mainframes and other legacy systems that are difficult to open up through web services or APIs.

With RPA, a bot can function as a human would, using the same user interface and keys and mimicking the same behaviors. Bots acquire this ability through observation. The new generation of intelligent bots is designed to learn how humans use applications: a bot follows the user's actions and mouse clicks, learns over a period of time, and eventually becomes capable of replicating the entire process by itself. RPA is already functional in the current IT landscape, boosting efficiency wherever it is deployed. There is, however, a chink in its Artificial Intelligence (AI) armor that requires attention.

Embracing RPA inherently involves assuming that the data populated by the bot is correct, but this may not always be true. Unlike a human, a bot cannot easily spot a discrepancy in a hand-filled form while entering data from it into the system. For example, if the address listed on the form is City: Mumbai; State: Gujarat; PIN: 400 001, a human would pick up on the error. Without the need for any instructions or clarifications, the human would know to correct the state entry from ‘Gujarat’ to ‘Maharashtra’. A bot, on the other hand, would typically not pick up on this and would enter the erroneous information.

The above example is a simple one, but it conveys the message: data must be properly sourced, cleansed, verified, and validated before a bot consumes it. There are, of course, ways around this. The application could be built, or validations written, to raise a red flag when such errors occur. However, that becomes an exercise in itself. Instead, there is the option of going the automated data priming route.
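To make the hand-written validation option concrete, here is a minimal sketch of a consistency check for the Mumbai/Gujarat example. The lookup tables and the function name `check_address` are hypothetical and cover only a couple of entries; a real deployment would rely on a complete postal directory.

```python
# Hypothetical reference data for illustration only; a real system would
# load a full city/state/PIN directory instead of these two-entry tables.
CITY_TO_STATE = {"Mumbai": "Maharashtra", "Ahmedabad": "Gujarat"}
PIN_PREFIX_TO_STATE = {"40": "Maharashtra", "38": "Gujarat"}


def check_address(city, state, pin):
    """Return a list of discrepancies found in the address (empty if none)."""
    issues = []
    state = state.strip().title()
    pin_digits = pin.replace(" ", "")

    # What state do the city and the PIN prefix each imply, if known?
    expected_by_city = CITY_TO_STATE.get(city.strip().title())
    expected_by_pin = PIN_PREFIX_TO_STATE.get(pin_digits[:2])

    if expected_by_city and expected_by_city != state:
        issues.append(f"city {city} implies state {expected_by_city}, not {state}")
    if expected_by_pin and expected_by_pin != state:
        issues.append(f"PIN {pin_digits} implies state {expected_by_pin}, not {state}")
    return issues


# Flag the record for review instead of letting the bot enter it as-is.
for issue in check_address("Mumbai", "Gujarat", "400 001"):
    print("RED FLAG:", issue)
```

Every new field needs its own table and rule in this style, which is why such validation quickly becomes an exercise in itself.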

There are various ways to do this, but in the spirit of automation, the most obvious one entails Optical Character Recognition (OCR) + Machine Learning (ML) on incoming data. Where the data is already in soft form, ML solutions alone will suffice. The idea here is to train the ML solution in particular processes and domains.

The ML solution will learn about the ‘3Cs’ of data, namely:
1. Correctness (spellings, grammar)
2. Context (i.e., apple as a fruit vs. Apple as a company, Mumbai as a city in the state of Maharashtra and not Gujarat)
3. Category (entities, amounts, dates, etc.)
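As a rough illustration of the third ‘C’, the sketch below assigns a category to a single extracted value. It uses hand-written regular expressions as a stand-in for a trained model, so the patterns, the category names, and the function `categorize` are assumptions made for this example rather than part of any particular product.

```python
import re

# Illustrative patterns only; a trained ML model would replace these rules.
DATE_RE = re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{4}$")
AMOUNT_RE = re.compile(r"^(?:Rs\.?|INR|\$)?\s?\d+(?:,\d{2,3})*(?:\.\d+)?$")


def categorize(token):
    """Return 'date', 'amount', 'entity', or 'other' for one extracted value."""
    token = token.strip()
    if DATE_RE.match(token):
        return "date"
    if AMOUNT_RE.match(token):
        return "amount"
    # Crude assumption: a capitalized value is a named entity.
    if token[:1].isupper():
        return "entity"
    return "other"


for value in ["27/05/2019", "Rs. 1,50,000", "Mumbai", "apple"]:
    print(value, "->", categorize(value))
```

Rules like these handle Category but say nothing about Context (apple versus Apple), which is where a model trained on a particular domain earns its keep.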

Once the data is primed, it can be picked up by the RPA tool and populated into the back end with more accuracy. RPA tools work extremely well once they have the right data to populate, primarily because RPA is rule-driven. So, in the case of names, one can always expect names to be formatted a certain way, e.g., Last_name, First_name. Essentially, the downstream systems using this populated data can be sure about consistency. ML models are particularly useful in unstructured data situations, such as when data comes from printed documents. They make it easy to pick up entire sentences, do a ‘3C’ analysis, and have the data ready for the RPA tool to consume. Otherwise, these sentences would need to be typed manually into various systems.
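The Last_name, First_name convention can be sketched as a small normalization step that runs before the bot picks up the record. The function name `to_last_first` is hypothetical, and the sketch assumes the final word of an unpunctuated name is the surname, which will not hold for every naming convention.

```python
def to_last_first(name):
    """Normalize 'First Last' or 'Last, First' to 'Last, First'."""
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name:
        # Already in 'Last, First' order; just tidy the parts.
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        # Assumption: the last word is the surname.
        first, _, last = name.rpartition(" ")
    # Note: str.title() mishandles some names (e.g. 'McDonald').
    if not first:
        return last.title()
    return f"{last.title()}, {first.title()}"


print(to_last_first("jon   smith"))
print(to_last_first("Smith,Jon"))
```

Both calls yield the same formatted value, which is the consistency that downstream systems depend on.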

Banks are seeing over 40% savings on effort and more than 90% accuracy over end-to-end automated processes when they entail OCR+ML+RPA. Data priming makes an important contribution to these numbers. However, the ML strategy has its pitfalls, an important one being data bias. This is true for most domains in which ML is implemented. To learn enough, ML models need to see plenty: the more data they see, the more they learn. If the data they see is consistently wrong, then the ML model will learn the wrong data. As an example, if the correct Last_name, First_name of a person is Smith, Jon but is consistently miswritten as Smith, John, then the incorrect name is what the model learns. To rectify this, mechanisms to unlearn would need to be designed. This is also an essential part of data priming.
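The Smith, Jon example can be reproduced with a toy learner that simply predicts whichever value it has seen most often. The class `SpellingLearner`, the customer key `C001`, and the override table are all hypothetical; the override is a deliberately simple stand-in for an unlearning mechanism, not a description of how production ML models are corrected.

```python
from collections import Counter


class SpellingLearner:
    """Toy frequency-based learner: predicts the most frequently seen value."""

    def __init__(self):
        self.counts = {}     # key -> Counter of observed values
        self.overrides = {}  # verified corrections that take precedence

    def observe(self, key, value):
        self.counts.setdefault(key, Counter())[value] += 1

    def correct(self, key, verified_value):
        """Record a verified value so that it outranks what was learned."""
        self.overrides[key] = verified_value

    def predict(self, key):
        if key in self.overrides:
            return self.overrides[key]
        seen = self.counts.get(key)
        return seen.most_common(1)[0][0] if seen else None


learner = SpellingLearner()
for _ in range(9):
    learner.observe("C001", "Smith, John")  # consistently miswritten
learner.observe("C001", "Smith, Jon")       # the correct spelling, seen once

print(learner.predict("C001"))              # the biased majority wins
learner.correct("C001", "Smith, Jon")
print(learner.predict("C001"))              # the verified value wins
```

The first prediction returns the miswritten name because it dominates the observations; only the explicit correction restores the right one.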

Data priming can be of great benefit to RPA implementation. These implementations at various banks and insurance companies are not without their obstacles. There are multiple factors involved in RPA failure, which can be debated at length. Despite certain weaknesses and vulnerabilities, however, RPA is here to stay as its benefits outweigh its limitations. Data priming can certainly help iron out a few creases in the RPA implementation process and make it smoother.