Stereo Vision Systems for Drones
In our last article on stereo depth, we discussed how stereo vision systems can be used to map an environment in 3D, as well as some pros and cons of stereo vision compared with other mapping approaches. In this article, we will explore some use cases for stereo vision systems in drone and UAS development, and highlight some of the general challenges that come with integrating stereoscopic vision into resource-constrained robotics products. Finally, we list our favorite systems and some of their unique features so you can choose the right product for your AI or robotics application.
Why do drones need to map their environment in 3D?
Let’s first refresh on why drones and other unmanned aircraft systems (UAS), which are in many ways just robots, need to map their environment in 3D.
Navigation & Autonomy
3D mapping is critical for drones and other systems to navigate safely and effectively. In particular, tasks such as stabilization, self-localization, collision avoidance, landing, and autonomous behaviors often rely on an accurate, real-time 3D representation of the surrounding objects and environment. In GPS-denied environments, or environments with many dynamic objects, 3D mapping is even more critical.
Surveying and Mapping
3D mapping is often required to create high-quality datasets for a variety of surveying and mapping applications such as site assessment, stockpile inventory management, cut/fill measurements, topographic information, mining, and change detection (e.g. land use change). While photogrammetry, synthetic aperture radar, and LiDAR are the dominant techniques, stereo vision can also be used, especially in time-critical applications such as disaster relief and search and rescue.
Inspection and Maintenance
3D mapping can be used to inspect and maintain structures and infrastructure, such as buildings, bridges, power lines, or tunnels. In these applications, the goal is usually to build a “digital twin” of an asset, mainly using visual data, to effectively conduct preventive maintenance and manage risk. Because inspection and maintenance applications typically present hazardous conditions for humans, the use of drones and 3D mapping technology greatly improves workplace safety.
Real-time 3D mapping improves the perception capabilities of a drone dramatically over other non-3D methods. This can be important not only for the use cases above, but also for better decision making, better obstacle detection, improved object tracking, and better tolerances in general during operation. If perception performance is not sufficient (e.g. Fig. 1), it will be impossible for drone autonomy applications to truly take off.
When Stereo Makes Sense
3D mapping can be achieved with a variety of sensors and approaches, so when does it make sense for your system to use stereoscopic vision instead of something else? To answer that, we need to know the advantages stereo techniques provide that other systems do not.
Weight, Size & Power
These factors are always important for developing a robotic system or product, but they are even more important when that product has to fly in the air. Stereo systems tend to be much lighter, much smaller, and draw much less power than, for example, their LiDAR counterparts. This makes stereo a compelling choice for applications where mission duration or product size are important.
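To make the link between payload and mission duration concrete, here is a rough back-of-the-envelope sketch. It assumes a simple momentum-theory hover model with a fixed overall efficiency. Every number in it (airframe mass, rotor disk area, battery capacity, and the two sensor payloads) is a hypothetical placeholder rather than a measurement of any product discussed in this article.

```python
import math

G = 9.81             # gravitational acceleration, m/s^2
AIR_DENSITY = 1.225  # sea-level air density, kg/m^3


def hover_power_w(mass_kg, rotor_disk_area_m2, efficiency=0.6):
    """Ideal momentum-theory hover power divided by an assumed overall efficiency."""
    thrust_n = mass_kg * G
    ideal_w = thrust_n ** 1.5 / math.sqrt(2.0 * AIR_DENSITY * rotor_disk_area_m2)
    return ideal_w / efficiency


def endurance_min(battery_wh, airframe_kg, sensor_kg, sensor_w, rotor_disk_area_m2):
    """Estimated hover endurance in minutes while carrying a given sensor."""
    total_w = hover_power_w(airframe_kg + sensor_kg, rotor_disk_area_m2) + sensor_w
    return 60.0 * battery_wh / total_w


# Hypothetical quadcopter: 1.5 kg airframe, 0.2 m^2 total disk area, 70 Wh battery.
light = endurance_min(70.0, 1.5, sensor_kg=0.1, sensor_w=3.0, rotor_disk_area_m2=0.2)
heavy = endurance_min(70.0, 1.5, sensor_kg=0.9, sensor_w=15.0, rotor_disk_area_m2=0.2)
print(f"light sensor: {light:.1f} min, heavy sensor: {heavy:.1f} min")
```

Because hover power grows faster than linearly with mass in this model, sensor mass usually costs more flight time than the sensor's own electrical draw does.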
It’s no secret that camera technology costs have dropped significantly. Today, cameras have one of the best cost-to-byte ratios of any sensor available. As you will see in the product reviews below, these systems tend to be much less expensive not only because of the hardware itself, but also because their small size and low power requirements reduce integration costs.
Stereo opens up the so-called operational envelope of your drone or robot dramatically because it can operate in real-time. Tasks such as obstacle avoidance or autonomous landing are not possible to run using post-analysis, making these systems a great choice for any time-critical application.
High resolution, colored images are possible with stereo vision systems, enabling downstream computer vision tasks that may not be possible or accurate enough using LiDAR or other alternatives. More accurate localization (either for the robot or objects in the environment) and better decision making are also enabled with better data.
Where Stereo is Less Useful
Despite the many advantages, stereo vision systems may not (and should not) be used in all applications. Below are some cases where stereo may not be required or especially useful.
Relatively Simple Operating Environments
Higher altitudes, larger offsets from structures, or higher flight speeds all result in simpler operating environments that may not require the real-time accuracy of a stereo vision system. If landing, navigation, and even detect and avoid procedures are relatively simple or accomplished with other sensors (such as linear sensors or cameras), stereo systems will likely add unnecessary cost and complexity to your project.
Conditions with Poor Visibility
Stereo systems rely on visual data, so if that data isn’t available or reliable, stereo systems will not be sufficient for operations. Examples of visibility problems can include fog, smoke, dust, or canopy cover. These situations do not mean a stereo system is useless – for example, it might need to autonomously land after passing through the fog – but it does limit the ability of these sensors to accomplish missions in certain types of conditions. We note that structured light and IR sensors can of course change the visibility conditions, but because they are active, they can affect power consumption drastically.
Our Favorite Stereo Vision Systems for Robotics and Drone Development
Stereo vision systems have some clear advantages and disadvantages when it comes to robotics and drone development. However, with so many options on the market, how do you choose what is best for your application? Below we have compiled a list of some of our favorite sensors we have used to develop products at Adinkra and some useful information on their cost, performance, and community support.
The OAK-D lineup from Luxonis is a favorite among robotics developers. The Pro variant boasts an IR dot projector for low-to-no-light environments, while its on-board computer enables significant edge processing, including object detection and localization. The team behind this sensor has dozens of tutorials, making prototyping for your application a breeze. Other variants also allow users to change up lens type and properties for their application.
- Color Resolution: 4K/30FPS, 1080P/60FPS
- Stereo Resolution: 1280×800@120FPS
- Mass: 91g
- Connectors: USB C (supporting USB 2 and USB 3)
- Processing power: 4 TOPS (1.4 TOPS for AI)
- Frames per second: 120 FPS stereo
- Dimensions: 97 mm x 29.5 mm x 22.9 mm
- Price (USD): $299.00
The TaraXL is a high-performing, extremely lightweight option. It may not have onboard processing, but it stands out as one of the few options whose SDK supports CUDA natively, so you can take advantage of the hardware of your NVIDIA Jetson device right away.
- Resolution: 0.3MP
- Mass: 80.5g (with enclosure)
- Connectors: USB C (supporting USB 2 and USB 3)
- Speed: VGA @ 60 FPS, WVGA @ 60 FPS
- Dimensions: 100 x 30 x 35 mm (with enclosure)
- Sensor: MT9V024 from Onsemi
- Price (USD): $349
Arducam 12MP MINI IMX477 Synchronized Stereo Camera Bundle
The Arducam 12MP MINI IMX477 Synchronized Stereo Camera Bundle is a far more DIY solution, for better or worse. You have full control over the baseline, lenses, software, and even cameras, but you also have to configure all of those yourself before it works at all. While it’s listed as Raspberry Pi compatible only, any device that uses a Raspberry Pi-style 2-lane CSI-2 connector should also work. There’s also a version for the NVIDIA Jetson Nano and Xavier NX devkit-style 2-lane CSI-2 connector.
- Resolution: 4056(H) x 3040(V) 12.3MP
- Connectors: Raspberry Pi-style 2-lane CSI-2
- Speed: up to 30 FPS
- Price (USD): $204.99
Stereolabs Zed 2
The Stereolabs Zed 2 is another favorite for robotics developers. Unlike other options, the Zed 2 features numerous sensors beyond the typical camera array and IMU, including a barometer, magnetometer, and temperature sensor. Backed by a powerful, well-documented and easy-to-access SDK and high performance cameras, this is one of the most fully featured devices on the market for prototypers and developers.
- Resolution: up to 4416×1242
- Mass: 124g
- Connectors: USB A (supporting up to USB 3 5Gbit)
- Speed: up to 100 FPS
- Dimensions: 175mm(W)x30mm(H)x32mm(D)
- Price (USD): $449
Carnegie Robotics MultiSense S7
The Carnegie Robotics MultiSense S7 is the first industrial-ready option on our list. With a narrow baseline and on-board lighting, this is meant more for close-range, precise guidance rather than navigation. Featuring IP67 ingress protection for dust and water, ruggedized housing and outdoor-rated connectors, Corning Gorilla Glass lens shields, and a wide temperature range, it’s truly meant to work in any environment. On-device depth processing and a multitude of integration options with direct support available from Carnegie Robotics makes this an easy device to add to any system. However, like the other industrial-ready options, you’ll have to work with Carnegie Robotics directly to get your hands on one.
- Resolution: up to 2048×2048
- Mass: 1200g
- Connectors: Gigabit Ethernet
- Speed: 30 FPS
- Dimensions: 65mmx130mmx130mm
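Why does a narrow baseline favor close-range work? For a rectified stereo pair, depth is Z = f·B/d (focal length in pixels times baseline, divided by disparity), and a fixed disparity matching error turns into a depth error that grows roughly with Z²/(f·B). The sketch below illustrates that relationship. The focal length, the two baselines, and the quarter-pixel matching error are hypothetical values chosen for illustration, not specifications of the MultiSense S7 or any other camera on this list.

```python
def depth_m(focal_px, baseline_m, disparity_px):
    """Depth of a point from its disparity: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error_m(focal_px, baseline_m, depth, disparity_error_px=0.25):
    """Approximate depth uncertainty: dZ = Z^2 * dd / (f * B)."""
    return depth ** 2 * disparity_error_px / (focal_px * baseline_m)


FOCAL_PX = 1000.0  # hypothetical focal length in pixels
for baseline in (0.07, 0.25):  # hypothetical narrow and wide baselines, in meters
    for z in (1.0, 5.0, 20.0):
        err = depth_error_m(FOCAL_PX, baseline, z)
        print(f"baseline {baseline:.2f} m, depth {z:4.1f} m -> +/- {err:.3f} m")
```

Under these assumptions, the narrow baseline is precise at a meter or two, but its error grows quadratically with range. That is why wider baselines are generally preferred for navigation at a distance.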
Scarlet 3D Depth Camera
The Scarlet 3D Depth Camera line is another industrial-ready option. Like the S7, it features IP67 ingress protection for dust and water, ruggedized housing and outdoor-rated connectors, and a wide operating temperature range. There’s on-board image processing, great integration options with direct support available from Nerian, and a significant number of customization options including baseline and lenses to tailor to your specific application. The Scarlet 3D also features 10G ethernet for full output of image and disparity at full resolution and framerate. However, like the other industrial-ready options, you’ll have to work with Nerian Vision Technologies directly to get your hands on one.
- Resolution: 5.0 MP
- Mass: 1900g for 25cm baseline
- Connectors: Gigabit Ethernet, 10G Ethernet, GPIO (for trigger synchronization), 3-pin Binder 718/768 for power
- Speed: up to 125 FPS
- Dimensions: 320x68x148mm for 25cm baseline
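A quick data-rate estimate shows why a faster link matters when streaming both an image and a disparity map at full resolution. In the sketch below, only the 5.0 MP resolution comes from the listing above. The 25 FPS frame rate, the 8-bit image, and the 16-bit disparity map are assumptions made for illustration, and protocol overhead is ignored.

```python
def stream_gbit_per_s(pixels, fps, bits_per_pixel):
    """Uncompressed data rate in Gbit/s, ignoring protocol overhead."""
    return pixels * fps * bits_per_pixel / 1e9


PIXELS = 5.0e6       # 5.0 MP, from the listing above
ASSUMED_FPS = 25     # assumption: full-resolution frame rate
IMAGE_BITS = 8       # assumption: one 8-bit monochrome image per frame
DISPARITY_BITS = 16  # assumption: one 16-bit disparity value per pixel

rate = stream_gbit_per_s(PIXELS, ASSUMED_FPS, IMAGE_BITS + DISPARITY_BITS)
print(f"{rate:.1f} Gbit/s")  # prints "3.0 Gbit/s"
```

Under these assumptions the stream is about three times what Gigabit Ethernet can carry, but it fits comfortably within a 10G link.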
FRAMOS D435e Industrial Depth Camera
The FRAMOS D435e Industrial Depth Camera continues our line of industrial options. Based on an Intel Realsense D430 module and D4 Vision processor, this features onboard processing and is the first industrial-ready option that is more openly available. It’s a great option for those who prototyped something out with another Intel Realsense camera and need to move up to a more ruggedized enclosure. This features IP66 ingress protection for dust and water and outdoor-rated connectors too, so it’s ready for harsher environments than your prototype. It’s also a good camera to start a project off with, as it’s fully compatible with Intel’s Realsense SDK, giving you plenty of options for integration into your project and a wide community of users with other Realsense devices.
- Resolution: 1280×720
- Connectors: Ethernet M12 (GigEVision)
- Speed: 30 FPS
- Dimensions: 100x38x47mm
- Price (USD): $909
Another industrial-ready option, the rc_visard line is an indoor-oriented option with the lesser IP54 rating for ingress protection against dust and water while still featuring ruggedized industrial connectors and a wider temperature range. It also features an onboard Nvidia Tegra K1 for on-device processing and a wide variety of customization options including baseline, lenses, and even cameras. There’s also an option for a dot projector for improved low-light performance and a first-party compute node to speed up their proprietary modules. However, like most of the other industrial-ready options, you’ll have to work with Roboception directly to get your hands on one.
- Resolution: 1280×960
- Mass: 680g (65mm base), 850g (160mm base)
- Connectors: 8-pin A-coded M12 socket for GigE, 8-pin A-coded M12 plug for GPIO + power
- Processing power: up to 365 GFLOPS
- Speed: up to 25FPS
- Dimensions: 135x75x96mm (65mm base), 230x75x84mm (160mm base)
The Duo MC is the lightest and most compact option on our list. It’s very simple, with no onboard compute or extra sensors, and has fairly low resolution cameras. The benefit is that this is also one of the lowest power consumption devices and the only one to feature full frame rate and resolution RAW support over a USB 2.0 connection, making it great for bandwidth-constrained applications. There’s also a full-feature SDK to make full use of the hardware that is there, and despite the lack of extreme ruggedization like other industrial options, there is direct support from Code Laboratories available for industrial integration. This also means, however, that you’ll need to work with them directly to get your hands on one.
- Resolution: 752×480
- Mass: 30g
- Connectors: MicroUSB
- Speed: 320 FPS+
- Dimensions: 57x30x15mm
The Arcure Omega is the best option for extreme environments. Featuring IP69k ingress protection for dust and water and a ruggedized enclosure and connectors, this is the only option on our list that you can clean off with a pressure washer. This rating comes as no surprise, as it’s intended for use on heavy machinery like harvesters, tractors, loaders, cranes, and more. This translates into excellent support from Arcure, thanks to their experience with heavy machinery projects in particular, and they offer significant customization options of the entire device and software stack to tailor it to your exact needs. This comes with the caveat of having to work directly with them for everything.
- Resolution: 828×544
- Mass: 1200g
- Connectors: Gigabit Ethernet
- Speed: up to 11FPS
- Dimensions: 186.8×75.95×64.6mm
Intel – D415 stereoscopic
The classic choice of robotics developers, the Intel Realsense line has been a staple since the first releases in 2015. With the updated D400 line, Realsense has come to define what we expect to see in a stereo camera, including on-device processing, a fantastic and constantly evolving SDK with significant integration options, an ever-growing community and detailed examples, infrared dot projectors for better low-light performance, and an optional IMU. The Realsense D415 is specifically tailored towards near-field, high-accuracy measurement. However, there are many other options within the Realsense lineup that may make more sense for your application, especially if you need an IMU. The bare modules also offer an excellent path towards better productization and have enabled the growth of an ecosystem of aftermarket camera options that are compatible with the same software stack while giving you flexibility in housing and connector options, like the aforementioned FRAMOS D435e.
- Resolution: 1920×1080
- Mass: 75.6g
- Connectors: USB-C 5Gbit
- Speed: 90 FPS
- Dimensions: 99x20x23mm
- Price (USD): $272
No matter where you are in your development, from prototyping to production, there are a lot of great stereo vision systems out there that can pack a punch for your application. Integrating them effectively, however, requires not only careful consideration of the application requirements, but also management of power, compute, software integration (e.g. using something like the Robot Operating System, ROS), calibration, and the relevant transforms. We hope this post gives you a great starting point for adding stereo vision to your drone or other robotic system.
Adinkra is an R&D engineering firm helping customers create state of the art robotics and AI products while minimizing costs and time to market. We combine a world-class engineering team with a flexible project management framework to offer a one-stop development solution and unlock your product’s full potential for your customers.