Our work at KIBU covers, among other areas, industrial innovation, and this is how we became acquainted with the problems of companies whose inventories change constantly. One such problem is stock-taking, which, especially in the case of large inventories, has to be carried out every one or two weeks, ties up a lot of human resources, and is prone to human error. Our team addressed this problem by creating a working prototype in line with today's trends and Industry 4.0 innovations. The aim of the project was to develop an indoor drone that carries out logistical tasks autonomously. We had the chance to present our solution at the T-Systems Symposium in November 2019. After the first presentation we received positive feedback from market players, and several companies showed interest in our solution. In the future, we are going to continue the development in order to turn the prototype into a complete, marketable product.

We divided the project into successive phases. After doing the research and conceptualizing the idea, we set the goal of building a prototype. The aim was a system in which the drone can take off, scan a shelving system with autonomous movements, fly back to its starting point, and simultaneously send the collected data to a database. For projects like this, it is important to know how drones move, what the difficulties of indoor flight are, and how machine vision works, and a solid understanding of machine learning is needed as well. It is also useful to know the communication protocols of different devices, be it an RTMP server or a plain TCP connection, since the devices used in the project communicate with each other in different ways. Experience in Android development was also a must, as a significant part of the communication channel was provided by an Android device. We created our prototype by combining the above knowledge and using the following devices:

●      DJI Mavic 2 Zoom

●      Ubuntu Server

●      Android SDK

●      Google Pixel 1

As our project was planned for indoor flight, it was important to have a small, easily programmable drone equipped with a good camera. A camera-based barcode reader only works properly within a certain distance. However, it is not a good idea to fly a drone closer than 30 centimeters to the shelves, as turbulence can make it fall or crash into the shelves; in fact, it is safer to stay at least half a meter away. Furthermore, there is not much natural light in warehouses, so the resolution of the camera image is not the best from this distance. It was therefore important for our device to have a camera that can zoom. We chose the DJI Mavic 2 Zoom for this purpose: its size allows it to fly indoors and its camera has an optical zoom. It can also be used with the Windows SDK, which would have let us control it from a single laptop. The truth is, however, that at the time we were developing the system the Mavic 2 Zoom could not carry out movement commands through the Windows SDK; it could only take off and land, which unfortunately only became clear after the purchase. The official documentation did not match reality, so we could not use the Windows SDK. With the Android SDK, however, the Mavic 2 Zoom could be navigated easily. That SDK has moderately good documentation, which did not always make our task easier, but at least on paper it supports many things. It is important to mention that although the DJI products and their control interfaces might seem well documented, the information in the documentation is in many cases far from reality.

In order to establish a connection between a mobile device and the DJI drone, we first needed to connect the drone's remote controller to the phone with a USB cable. After the device was recognized, we could issue any of the controller's commands from our Android application. We retrieved the live video from the drone's camera with the help of the Android SDK, then used the LiveStreamManager provided by the SDK to stream the image to a Linux server, which processes the video and sends movement commands back to the drone based on the processed data. We also used our application to make the drone take off and land and to forward the movement data to the drone. Further tasks included moving the camera (gimbal) as well as setting and controlling the zoom. The Android SDK provides ready-made methods for most of these functions, so they were relatively easy to implement.
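
To illustrate the server side of this pipeline, here is a minimal Python sketch. It assumes the phone publishes the drone's video to an RTMP endpoint on the Linux server and that commands travel back to the Android app over a plain TCP socket; the addresses, the command format, and the process_frame helper are illustrative placeholders, not our exact implementation.

```python
import cv2
import socket

# Hypothetical endpoints: the phone streams to this RTMP URL via the DJI
# LiveStreamManager, and the Android app listens on this TCP port for commands.
RTMP_URL = "rtmp://192.168.1.10/live/drone"
COMMAND_ADDR = ("192.168.1.20", 9000)

def process_frame(frame):
    """Placeholder for marker detection / barcode reading on one video frame."""
    # ...would return a textual command such as "MOVE 0.2 0.0 0.0", or None
    return None

def main():
    capture = cv2.VideoCapture(RTMP_URL)              # OpenCV reads the RTMP stream via FFmpeg
    command_socket = socket.create_connection(COMMAND_ADDR)

    while True:
        ok, frame = capture.read()
        if not ok:
            break                                     # stream interrupted
        command = process_frame(frame)
        if command:
            # Send the movement command back to the Android app controlling the drone
            command_socket.sendall((command + "\n").encode("utf-8"))

    capture.release()
    command_socket.close()

if __name__ == "__main__":
    main()
```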

Some default safety features, such as the visual (obstacle-avoidance) sensors, had to be turned off on the drone so that we could fly it indoors in a small area, close to the shelves. With these sensors enabled, the drone will not approach an object closer than about 1.5 meters. On one occasion we turned off all the visual sensors, which meant the drone could only navigate by GPS. In that mode the drone needs signals from at least six satellites, which was not an option indoors, so it switched into ATTI mode. This is rather dangerous: the drone does not know where it is and starts drifting in one direction at a constant speed. This was the first time our drone crashed. We were lucky, as it flew towards the ground and did not hurt anyone. After this accident, we decided to carry out all flight tests in a cage equipped with a safety net. We also concluded that it is crucial for the drone to have a downward-facing camera, otherwise it cannot hold its position, which can result in accidents like the one above. This camera monitors the floor and localizes the drone based on the patterns it sees, so it is particularly important that the surface is neither too homogeneous nor reflective.

We created different flight modes for the drone in order to make it navigable as effectively as possible under different environmental conditions. The modes that emerged during development are important for understanding the flying mechanism; examples are TRIPOD, POSITIONING, SPORT, OPTI, ATTI, and JOYSTICK. The first three (T, P, S) can be set with a physical switch on the drone's controller, while the others can be changed through software. We are going to present the flight modes in a separate article. For indoor flights, we aimed to use the OPTI mode.

Machine vision: the collection and analysis of images in order to control a system.

As we did not have a reliable GPS signal indoors, we could only achieve autonomous navigation if the drone knew where it was located. Since we were already using the camera for barcode scanning, i.e. we had a real-time video stream, we built a camera-based localization system. We solved this by relying on Aruco markers, which were developed by Rafael Muñoz and Sergio Garrido and have since become a module of OpenCV. These are square fiducial markers, similar to QR codes, whose detection and 3D position are easy to measure. We can simply calculate the distance because we get the markers' x, y, z coordinates, i.e. their positions measured from the center of the drone's camera. If we place several different markers, we can use them to create a 3D coordinate system in which the drone's position can also be defined. That is, if we designate a virtual origin somewhere (say, the corner reference point on the right side of the cage), we can express every marker's location as (x, y, z). We can then easily transform data from the markers' coordinate system to the drone's coordinate system and vice versa. This way we can reach centimeter-level accuracy. During development, we also had to take camera distortion into account and correct it through camera calibration and the resulting distortion coefficients.

Aruco marker detection

Distortion coefficients: parameters describing how the camera lens distorts the image.
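
As a rough sketch of how this marker-based localization can look in code, the snippet below uses OpenCV's aruco module (the pre-4.7 contrib API) to detect markers and estimate their 3D position relative to the camera. The calibration values, marker size, and marker map are illustrative assumptions, and the position estimate is deliberately simplified: it ignores the camera's rotation, which a full solution would take from the rotation vectors.

```python
import cv2
import numpy as np

# Illustrative calibration results; in practice these come from cv2.calibrateCamera()
camera_matrix = np.array([[920.0,   0.0, 640.0],
                          [  0.0, 920.0, 360.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.array([0.1, -0.25, 0.0, 0.0, 0.0])   # distortion coefficients

MARKER_SIZE_M = 0.15                                   # assumed physical marker size in meters
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters_create()

def locate_markers(frame):
    """Return (marker id, translation vector) pairs in the camera's coordinate system."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=params)
    if ids is None:
        return []
    # Each tvec is the marker position (x, y, z) measured from the camera center
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_SIZE_M, camera_matrix, dist_coeffs)
    return list(zip(ids.flatten(), tvecs.reshape(-1, 3)))

# Known marker positions in the cage's coordinate system (origin: right-hand corner); made up here.
MARKER_MAP = {0: np.array([0.0, 0.0, 1.2]),
              1: np.array([1.5, 0.0, 1.2])}

def drone_position(frame):
    """Rough drone position: marker world position minus its offset seen from the camera.

    Simplified on purpose: ignores camera rotation, averages over all visible markers.
    """
    estimates = [MARKER_MAP[mid] - tvec
                 for mid, tvec in locate_markers(frame) if mid in MARKER_MAP]
    return np.mean(estimates, axis=0) if estimates else None
```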

By applying the solutions above, we were able to determine the drone's actual location. The next challenge was barcode scanning. For this we used an open-source library called ZBar, which can read the value of barcodes appearing in images and videos.
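
A minimal example of what this reading step can look like, assuming the pyzbar Python bindings for ZBar and an OpenCV image as input:

```python
import cv2
from pyzbar.pyzbar import decode   # Python bindings for the ZBar library

def read_barcodes(frame):
    """Return the decoded value and bounding rectangle of every barcode in a frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    results = []
    for symbol in decode(gray):
        value = symbol.data.decode("utf-8")
        results.append((value, symbol.rect))   # rect: left, top, width, height
    return results
```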

Machine learning: Systems which are able to learn, i.e. to generate knowledge from experience. It is a branch of artificial intelligence.

As mentioned previously, the barcode reader needs a close-up of the object in order to work properly. Our idea was that whenever the drone's camera is not within scanning distance, an object detection step should locate the barcode first. If there is a hit, the camera zooms in on the center of the detected barcode, providing a large, high-resolution image of it. What method did we use to recognize distant barcodes? We turned to machine learning and applied Faster R-CNN. A thorough explanation of this mechanism would require a separate article (for an in-depth study of this object detection method, see the related scientific paper in the References section). For our purposes, it is a method that can be applied to barcode detection quickly and easily. We annotated more than 2000 images for training (framing the spots where we saw barcodes), then fine-tuned a pre-trained model on them. Pre-trained models have already learned basic visual patterns, so they have the advantage of not requiring tens of thousands of images for training.

Faster R-CNN in practice

Pre-trained models: models that already contain trained weights, so we only have to fine-tune them for our own task. For example, the model may already know the shapes of cars while we only want to detect certain brands or body types, or it may know what straight lines look like and learn that many parallel black lines form a barcode.
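
The detect-then-zoom logic described above can be sketched roughly as follows. The detector object and its predict method stand in for the trained Faster R-CNN model and are hypothetical, as is the size threshold:

```python
def zoom_target(frame, detector, min_box_height=120):
    """Decide whether a barcode is close enough to read, or where to zoom.

    `detector` is a hypothetical wrapper around the trained Faster R-CNN model
    that returns bounding boxes as (x_min, y_min, x_max, y_max, score) tuples.
    """
    boxes = detector.predict(frame)
    if not boxes:
        return None                           # nothing detected, keep searching
    # Take the most confident detection
    x_min, y_min, x_max, y_max, _ = max(boxes, key=lambda b: b[-1])
    if (y_max - y_min) >= min_box_height:
        return "READ"                         # big enough for ZBar to decode directly
    # Otherwise return the barcode center so the gimbal and zoom can be aimed at it
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)
```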

For machine learning we used Keras, a high-level neural-network API written in Python that works together with other machine-learning ecosystems. It was designed to enable fast experimentation. In the test phase it became clear that our model could predict with high accuracy. During use, the detected barcodes are framed with rectangular bounding boxes, and the drone zooms in on these areas to scan the barcode. After a barcode is scanned, the data is sent to a server and shown on a visual interface. During the demos, this interface and the live image together demonstrate the system's operation and accuracy well. If we rearrange the boxes on the shelves, the visual interface changes accordingly; for example, if we swap the box at A1 with the one at C4, this appears on the interface too. In the future, this solution might be applicable in business operation systems, e.g. during stock-taking, to compare the real inventory with the one recorded in the system.
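
The stock-taking comparison behind such an interface can be illustrated with a small sketch; the shelf positions and barcode values below are made up for the example:

```python
# Inventory as recorded in the warehouse system vs. positions actually scanned by the drone.
expected = {"A1": "590123412345", "C4": "590198765432"}   # illustrative codes
scanned  = {"A1": "590198765432", "C4": "590123412345"}   # drone's scan results

def compare_inventory(expected, scanned):
    """List the shelf positions where the scanned barcode differs from the system."""
    mismatches = []
    for position, code in expected.items():
        found = scanned.get(position)
        if found != code:
            mismatches.append((position, code, found))
    return mismatches

print(compare_inventory(expected, scanned))
# [('A1', '590123412345', '590198765432'), ('C4', '590198765432', '590123412345')]
```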

During development, we faced many difficulties. The most frustrating and unexpected obstacle, however, emerged in the last couple of weeks: while hovering in its default position, the drone started to drift forward and to the right. Our online research suggested that many people experience the same problem, often called "shifting". We read about several possible remedies, such as recalibrating the sensors, the remote controller, and the compass, but these did not work in our case. We also read a suggestion to recalibrate the compass outdoors, 2-3 kilometers from the place of use, but we did not have the opportunity to try this. During the demo we worked around the shifting by moving the drone in the opposite direction: with the help of the markers the drone detected that it was drifting forward-right, so it had to move back-left to get into position to scan the barcode.
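
Conceptually, this compensation is a simple proportional correction computed from the marker-based position error. The sketch below illustrates the idea with made-up gain and speed limits; it is not our exact flight code:

```python
def drift_correction(target, actual, gain=0.5, max_speed=0.3):
    """Compute a counter-movement command from the marker-based position error.

    `target` and `actual` are (x, y) positions in meters in the cage's coordinate
    system; the returned velocities (m/s) push the drone back toward the target.
    Gain and speed limit are illustrative values.
    """
    error_x = target[0] - actual[0]
    error_y = target[1] - actual[1]
    vx = max(-max_speed, min(max_speed, gain * error_x))
    vy = max(-max_speed, min(max_speed, gain * error_y))
    return vx, vy

# Example: the drone has drifted forward-right, so the command points back-left.
print(drift_correction(target=(1.0, 2.0), actual=(1.2, 2.3)))
# (-0.1, -0.15)
```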

The downward-facing camera also caused problems: if it could not see the surface or its patterns clearly, it switched into a "renegade" ATTI mode. If this behavior had not existed, or could have been turned off, the initial indoor development would have been much easier. As we did not have that option, we covered the floor of the area with as many colored, non-reflective stickers as possible. We also missed being able to use the sensors selectively; it would have helped if they had been individually configurable on the drone. Easy access to the data of the non-visual sensors is also something we could have made use of. We have mixed feelings about DJI support, too. Our Android SDK-related questions were answered quickly, but for the Windows SDK we missed detailed descriptions covering all drone models. We could not find any mention of the Windows SDK's limitations regarding the Mavic 2 Zoom, which cost us a lot of development time. Support did not provide comprehensive information about the SDK's shortcomings, and we did not receive satisfactory replies to our questions either. A usual reply on the official DJI forums was that we can always rely on the help of the developer community (which is true), but at a certain point of development we found ourselves translating Chinese websites with Google Translate, which, much to our surprise, actually helped solve the problem.

The DJI Mavic Mini came out halfway through the development, and we would absolutely consider using it for the demo part of the project. Due to its small size it would cause less turbulence, so flying close to the objects would be safer and the zoom function might no longer be needed. After a successful pilot, it would be worth verifying these assumptions by testing our current solution on this new model.

During the project, we learned a lot about current drone technologies, the DJI products, and the difficulties of flying drones indoors. At the end of the project we concluded that for further development we would build our own drone, so that we can decide which components it should have. This way we could eliminate the highly disruptive ATTI mode and choose a framework that suits the project's aims better than the Android SDK. We would also be free to change the camera, the size of the drone, and the sensor data available for the project. The techniques and tools we used (Aruco markers, Faster R-CNN) look promising. In the future, we are going to continue the development to create an even better, more universal product.

No creatures were harmed during the project; only 10 propellers were broken.