Developing a smart real-time data recognition application for metering devices
Artificial intelligence is nowadays used in countless applications across various fields. Thanks to the increasing processing power of affordable chips and the large amounts of available data, machine learning has opened new opportunities for computer systems and software. Computer vision, in particular, has proven effective at identifying and counting objects as well as reading content such as text and numbers. These applications are useful not only in scientific fields but also in everyday life. Apps can recognise objects, quickly read data from them and send it to a database without any human input, eliminating costs and saving time for both the data handler and the user.
In this blog we would like to present one such application: our mobile app for automatic data capture from electricity metering devices, and the AI models used to identify and read the displayed data.
Keywords: mobile app, electricity measuring devices, object detection, digit recognition, labels.
I. Web Application
Our app performs real-time detection and analysis of a video stream taken from a device equipped with a camera, such as a smartphone or tablet.
The app works by pointing the camera at a metering device; it detects the displayed data and subsequently sends it to the backend for bookkeeping and further analysis.
The app UI consists of the camera view, a help section, a feedback option and a straightforward form. We have built a single-purpose application with an easy and intuitive flow. The whole capture is performed automatically with minimal user intervention, even for devices with multiple switching tariffs: all the user needs to do is enable the camera and point it at the power meter. After each capture, the user gets an instant feedback message based on the result (successful/unsuccessful). If a device has multiple switching tariffs to capture, the user has to hold the camera steady a little longer until all tariffs are captured. The app also guides the user by showing which parts are not yet well captured, based on the detected device type.
By moving the camera closer to or further from the device, the user can see how the indicators change, thereby increasing the probability of a successful capture. The whole capture process is fast and takes no more than a few seconds. Once all necessary segments are captured, the camera closes automatically and the UI displays a mini-form with the captured values.
At this point, the user can edit the values or send them to the server for bookkeeping and later retrieval. The only requirement on the mobile phone is a working internet connection, so that the captured segments can be sent to the backend for quick evaluation and text/digit extraction. Although the text/digit extraction could be performed on the phone itself, we currently run all such tasks exclusively on the backend to avoid the performance, precision and quirky device/OS-specific issues we observed during the internal testing phase.
An image is sent to the backend only when all image quality requirements are fulfilled (e.g. the image is not blurry, the segments are clearly visible and the lighting conditions are good enough), which lowers data transfer costs: sending the whole camera stream, or even one photo per second, would be unnecessarily expensive and inefficient.
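One common way to implement such a blur check is the variance of the Laplacian: sharp images produce a high-variance edge response, blurry ones a low one. The sketch below is an illustration of that idea in plain NumPy, not our production code; the function names and the threshold value are our own assumptions and would have to be tuned per device.

```python
import numpy as np

# 3x3 Laplacian kernel; responds strongly to edges and fine detail.
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

def laplacian_variance(gray):
    """Variance of the Laplacian response over a grayscale image.

    Sharp images yield high variance; blurry images yield low variance.
    """
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())

def is_sharp_enough(gray, threshold=100.0):
    """Gate an upload on sharpness; the threshold is illustrative only."""
    return laplacian_variance(gray) >= threshold
```

A flat (featureless or fully blurred) frame scores near zero, while a frame with visible segment edges scores far above any reasonable threshold.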
1. Application architecture
The app is divided into two parts:
- The frontend is a web application running in the user's browser. It handles the camera stream, checks image quality and guides the user through the capture.
- The backend is used for data validation, optical character recognition (OCR) parsing, and digit extraction and recognition using a separate ML model.
Because the frontend is a web application running in the user's web browser, there are some limitations regarding its performance and the features available on mobile devices.
We tried to avoid some of these inefficiencies by running the AI model for image analysis directly on the device GPU. Unfortunately, on iOS, the GPU backend of the library used to run the model sometimes produced incorrect results due to the low numerical precision of some of its internal data structures. This was another major reason why we decided to move part of the code to the backend.
An alternative approach that would mitigate some of these issues is to write and maintain a separate native application for each platform (iOS/Android) and use a platform-specific, fine-tuned image processing library. In our case, however, this would present serious disadvantages: maintaining a separate native app per platform is much more costly in terms of development, testing and deployment complexity. Distribution is also simpler for a web app, since it only requires opening the browser and navigating to the application's URL.
II. Object Detection
Object detection is a subfield of computer vision used to automatically identify items of interest in images. In the process, the coordinates of the bounding boxes surrounding those items are determined, and the identified areas can then be processed further.
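Processing an identified area typically starts by cropping it out of the frame. As a minimal sketch (the function name and the normalized `(left, top, width, height)` box convention are assumptions, though many detection APIs use exactly this convention):

```python
import numpy as np

def crop_box(image, box):
    """Cut a detected region out of an image.

    `box` holds normalized (left, top, width, height) values in [0, 1],
    a convention used by many object-detection services.
    """
    h, w = image.shape[:2]
    left, top, bw, bh = box
    x0, y0 = int(left * w), int(top * h)
    x1, y1 = int((left + bw) * w), int((top + bh) * h)
    return image[y0:y1, x0:x1]
```

The cropped array can then be passed to the OCR or digit-recognition stage.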
1. Data augmentation
Our goal was to correctly identify the elements of familiar metering devices, such as displays, serial numbers and device labels. To do so, we needed a dataset of images of sufficient quality; however, one of the challenges was the lack of high-quality images. To tackle it, a couple of techniques were applied to improve the quality of the available images and to augment the dataset. Using Keras, a deep learning software library, we performed data augmentation techniques such as random rotations, shifts and dimension reordering. These techniques gave us around 25% more data, with diverse samples of sufficient quality.
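Keras exposes these transforms through its image-augmentation utilities; to show the underlying idea without any deep learning dependency, here is a sketch of one of them, a random shift, in plain NumPy. The function names and the shift range are our own illustrative choices, not the values used in our pipeline.

```python
import numpy as np

def random_shift(img, max_shift=4, rng=None):
    """Return a copy of `img` shifted by a random (dy, dx) offset,
    zero-filling the pixels exposed at the border."""
    rng = rng or np.random.default_rng()
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    out = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    # np.roll wraps around, so blank the wrapped-in rows/columns.
    if dy > 0:
        out[:dy] = 0
    elif dy < 0:
        out[dy:] = 0
    if dx > 0:
        out[:, :dx] = 0
    elif dx < 0:
        out[:, dx:] = 0
    return out

def augment(images, copies_per_image=1, rng=None):
    """Grow a dataset by adding randomly shifted copies of each image."""
    rng = rng or np.random.default_rng()
    extra = [random_shift(img, rng=rng) for img in images
             for _ in range(copies_per_image)]
    return list(images) + extra
```

One shifted copy per image would double the dataset; in practice only a fraction of the images need augmenting to reach the roughly 25% growth mentioned above.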
2. Cognitive Services
To identify the boxes surrounding the items of interest, we used Cognitive Services from Microsoft Azure, a set of ML algorithms and tools developed by Microsoft for solving problems in the field of AI.
The model was trained on 700 images covering four types of metering devices. Each device type has a different layout of elements, and the ML model has to identify and label the elements correctly without asking the user to specify the device.
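On the consuming side, a detection result from such a service is a list of tagged boxes with probabilities, which the caller filters by a confidence threshold. The helper below is an illustrative sketch, not our production code; the field names follow the shape of an Azure Custom Vision detection response (`predictions`, `tagName`, `probability`, `boundingBox`), which should be treated as an assumption and checked against the service's own API reference.

```python
def confident_detections(response, min_probability=0.5):
    """Keep only the predictions above a confidence threshold.

    `response` is assumed to resemble an Azure Custom Vision detection
    result: {"predictions": [{"tagName", "probability", "boundingBox"}, ...]}.
    """
    return [
        (p["tagName"], p["probability"], p["boundingBox"])
        for p in response.get("predictions", [])
        if p["probability"] >= min_probability
    ]
```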
3. Model performance and prediction
The model achieved a precision of 96%, a recall of 77% and a mean average precision of 91%. The image below shows a processed image with the probability for each tag.
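For readers unfamiliar with the metrics: precision is the fraction of detections that are correct, recall the fraction of ground-truth objects that were found. A minimal sketch (the counts below are illustrative, not our evaluation data):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A high precision with a lower recall, as in our case, means the model rarely labels an element wrongly but sometimes misses one, prompting the user to keep the camera pointed a little longer.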
III. Digits Recognition
To identify and read the digits on the device, common Python libraries dedicated to this purpose (e.g. Tesseract) were considered. However, we were only interested in reading digits from a certain part of the display area (as shown in the picture below). Moreover, the digits on device displays consist of segments, which presents an additional challenge for such libraries.
We therefore decided to train our own ML model. To do so, we needed a dataset of labeled digits in similar fonts. No such dataset was available to us, so it had to be created.
1. Digits extraction
The first step was to extract the digits from each available image and save them separately into folders labeled 0 to 9. This new dataset was the foundation for our model.
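Because segment displays are fixed-pitch, one simple way to extract individual digits from a display crop is to slice it into equal-width cells. This is a sketch under that assumption (the function name is ours, and a real pipeline would also trim margins and handle variable digit counts):

```python
import numpy as np

def split_digits(display, n_digits):
    """Split a display crop into `n_digits` equal-width cells,
    one per digit (assumes a fixed-pitch segment display)."""
    h, w = display.shape[:2]
    cell = w // n_digits
    return [display[:, i * cell:(i + 1) * cell] for i in range(n_digits)]
```

Each cell can then be written into the folder matching its label, e.g. with `pathlib.Path("dataset") / str(label)`.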
2. Feature Engineering
To identify the unique characteristics of each digit and enable the model to cluster similar digits, the histogram of oriented gradients (HOG) algorithm was used for feature engineering. The image is divided into cells and blocks so that the algorithm can calculate the gradient magnitude and orientation for each cell.
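The steps above can be sketched as follows. This is a stripped-down illustration of the HOG idea (per-cell orientation histograms weighted by gradient magnitude), not the implementation we used; it omits the block normalisation step of the full algorithm, and libraries such as scikit-image provide a complete version.

```python
import numpy as np

def hog_features(gray, cell=8, bins=9):
    """Minimal HOG sketch: per-cell histograms of gradient orientation,
    weighted by gradient magnitude (no block normalisation)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in classic HOG.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h, w = gray.shape
    features = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            m = mag[y:y + cell, x:x + cell].ravel()
            a = ang[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0.0, 180.0), weights=m)
            features.append(hist)
    return np.concatenate(features)
```

The concatenated per-cell histograms form the feature vector that the classifier consumes.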
After the feature extraction, training and testing datasets were created. Subsequently, the k-nearest neighbours (KNN) algorithm was used for the recognition, due to its effectiveness in this kind of problem. The resulting model for identifying individual digits achieved an accuracy of 86%.
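KNN classifies a query vector by a majority vote among its nearest training vectors. A self-contained sketch of the voting step (our actual model was trained on the HOG vectors described above; the toy data and function name here are illustrative):

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```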
Once the model for identifying individual digits was trained, it was stored in pickle (binary) format so it could be reused for recognition. New images were prepared and cleaned using adaptive thresholding and other techniques, and dilation and erosion were applied. At this point, we were able to successfully recognise the digits on the displays of metering devices.
AI has opened up possibilities for a broad range of applications across many industries, and we are thrilled to be one of the drivers of this change. Our app for real-time automatic data capture from metering devices is only one of the many applications of machine learning to which we dedicate our time and effort.
Stay tuned, more blogs will be published!