Database Heterogeneity Helps Address IoT Analytics Challenges
Developers turning IoT endpoint data into useful analytics have encountered proliferating built-for-purpose database types.
September 22, 2020
Key takeaways from this article include the following:
As the variety and volume of IoT endpoints continue to grow, new database types are gaining greater consideration.
Use of dedicated time-series databases can be expected to expand as people deal with ever-increasing volumes of time-oriented data.
Scalability is a chief objective of the specialized databases deployed for many IoT projects.
Data analytics was once limited to a few general-purpose relational databases. But as large-scale web and Internet of Things applications have proliferated, that situation has changed.
Today, the database options for IoT analytics are almost as diverse as the many IoT data types the new systems are meant to handle.
As a result, “built for purpose” databases have emerged: that is, databases less overarching in their intent and more focused on specific problems. These problems may involve ingesting data streams created by users or, in the case of Industrial IoT, by a wide array of sensors measuring lidar correlates, telemetry positions, and temperature and pressure readings. Open source versions of built-for-purpose databases are often available.
High scalability – the ability to handle ever greater volumes of data to perform some IoT analytics in real time, as the data is generated – is a major objective of these specialized databases. The abilities to build out quickly and change on the fly are also paramount. Still, these are early days, and much developer craft is required to turn the vast raw data into useful analytical information.
In fact, relational databases based on SQL (Structured Query Language) still often play roles in IoT analytics. However, they have now been joined by a new lineup of NoSQL software that includes streaming data engines, document databases, key-value stores, search engines, graph databases and time-series databases.
New database types gain greater consideration as the variety and volume of IoT endpoints continue to grow. More such growth is anticipated as sensors multiply.
In the manufacturing and natural resources industries alone, analyst group Gartner has predicted that installed IoT endpoints will reach 1.9 billion units in 2028, more than five times the 331.5 million units installed in 2018.
Flexibly Drilling for IoT Analytics
A long-standing issue with relational databases has been their requirement to decide on a data schema before beginning development. That inhibited flexibility in application development generally.
Developers and line-of-business heads have found this limiting, as changes to schema can become showstoppers for new IoT apps or additional IoT app features.
This played into the planning of a development leader in the oil and gas data field. Jim Wang, software engineering manager at Houston, Texas-based start-up Corva, employed the increasingly popular MongoDB document-oriented database as part of the company’s efforts to provide real-time, interactive data analytics services for oil-related industries.
As part of a larger system that uses Kafka messaging and AWS Lambda serverless computing, Corva employs MongoDB delivered as an Atlas cloud database service managed by the database’s originator, MongoDB Inc.
The database enables Corva to handle heterogeneity among oil rigs. Corva has expanded its service in 18 months to cover 270 rigs with 15 terabytes of data processed daily, said Wang.
“It’s really the flexible schema,” Wang said. “There are different kinds of oil and gas wells and different kinds of IoT sensor data, and you need to have a variety of different schemas.”
With a relational database, such schema variety is not realistic, he said.
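A minimal sketch of what that flexibility can look like in practice, using plain Python dictionaries as stand-ins for documents in a single collection. The field names and values are hypothetical, not Corva's actual data model, and no MongoDB driver is involved.

```python
# Hypothetical sensor documents from two different rig types.
# Each document carries only the fields its rig actually reports.
readings = [
    {"rig_id": "rig-001", "well_type": "horizontal",
     "ts": "2020-09-01T00:00:00Z", "pressure_psi": 3150.0, "rpm": 120},
    {"rig_id": "rig-002", "well_type": "vertical",
     "ts": "2020-09-01T00:00:00Z", "temperature_c": 88.5,
     "mud": {"flow_gpm": 410.0, "density_ppg": 9.8}},
]


def with_field(docs, field):
    """Return the documents that contain the given top-level field."""
    return [d for d in docs if field in d]


# No shared schema is required: a query simply skips documents
# that lack the field of interest.
pressure_docs = with_field(readings, "pressure_psi")
mud_docs = with_field(readings, "mud")
print([d["rig_id"] for d in pressure_docs])
print([d["rig_id"] for d in mud_docs])
```

In a relational table, the nested mud fields in the second document would typically call for a schema change or an additional table before the data could be stored.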
Just as important for handling growing amounts of oil field data is the ability to scale.
In terms of cost, scaling means that “you don’t pay exponentially for exponential growth [in data],” Wang explained. In terms of processing, it means processing power does not diminish as database partitions are added. With MongoDB and other databases, adding capacity by partitioning data in this way is known as “sharding.”
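The idea behind sharding can be illustrated with a toy router that assigns each record to a partition by hashing a shard key. This is a sketch under simplifying assumptions (a fixed shard count and a hypothetical rig_id key), not a description of how MongoDB itself distributes data.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed fixed for this illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shards = defaultdict(list)
for i in range(1000):
    rig_id = f"rig-{i:04d}"  # hypothetical shard key
    shards[shard_for(rig_id)].append(rig_id)

# Keys are spread across the partitions, so storage and query
# load are shared rather than concentrated on one server.
for idx in sorted(shards):
    print(idx, len(shards[idx]))
```

Real systems also have to rebalance data when shards are added or removed, which this sketch ignores.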
Object Storage
Another overriding benefit is ease of programming, Wang said. In the past, developers typically had to work with relational SQL to handle data storage. This could be a complex undertaking. With MongoDB, complex software objects created by developers can be stored more easily, he said.
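As a rough illustration of the point, the sketch below turns a nested application object into a JSON document in one step and reads it back. The WellReading and MudSample classes are hypothetical, and Python's standard json module stands in for a document database.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MudSample:
    flow_gpm: float
    density_ppg: float


@dataclass
class WellReading:
    rig_id: str
    depth_ft: float
    samples: List[MudSample] = field(default_factory=list)


reading = WellReading(
    rig_id="rig-001",
    depth_ft=9250.0,
    samples=[MudSample(410.0, 9.8), MudSample(405.5, 9.9)],
)

# asdict() recursively converts nested dataclasses to dicts,
# which map directly onto a JSON document.
document = json.dumps(asdict(reading))

# Reading the document back yields the same nested structure,
# with no object-relational mapping layer in between.
restored = json.loads(document)
print(restored["samples"][1]["density_ppg"])
```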
The value of simple object methods – especially in the form of JSON (JavaScript Object Notation) – has not been lost on mainline relational database vendors, such as IBM, Microsoft and Oracle, which have come to support forms of JSON within their flagship offerings. JSON extensions specific to IoT have also arisen.
Sahir Azam, chief product officer at MongoDB, said apps like Corva’s help show the value of the document database in IoT settings that call for “a heterogenous blend of data models.”
Among other MongoDB IoT use cases Azam cited were Bosch’s use of MongoDB as part of its IoT Suite, and Thermo Fisher’s use of MongoDB to provide spectrometry instrument status updates via a new cloud service.
It’s About Time-Series Data
Time-series data has lineage in industrial settings.
Manufacturing plants have long recorded and placed time-oriented activity in operational historians — proprietary stores usually defined by the original manufacturing equipment makers.
Relational databases too, some with special functions, have regularly been used to handle time-series data.
But the pace of new time-series database releases has picked up. Today, the roll call of time-series-optimized data tooling includes InfluxDB from InfluxData, IRONdb from Circonus, kdb+ from Kx Systems, TimescaleDB from Timescale, and others.
Cloud players with IoT efforts are active in IoT data analytics systems as well. For Microsoft, that takes the form of Azure Time Series Insights. For Amazon Web Services, that takes various forms, including AWS IoT SiteWise, a managed service that collects data from plant floors, AWS IoT Analytics for time-series analysis and Timescale Cloud, a managed service for instances of TimescaleDB.
As it has grown, AWS has been an advocate for a purpose-built approach to databases. The company worked to create relational (including RDS and Redshift), document (DocumentDB), key-value (DynamoDB), graph (Neptune), time-series (Timestream) and other databases for a variety of uses.
“We have a plethora of databases,” admits Dirk Didascalou, vice president of IoT, AWS. The reason is “you can’t just use any kind of data for any kind of problem.”
While the developer role is important, he notes that IoT projects are not always driven by developers, and that business decision-makers are often at the helm. “We [encounter] almost every role you can imagine,” he said.
Didascalou noted that industrial companies using AWS IoT SiteWise to tap into operational historical data include Volkswagen Group, Bayer Crop Science and Pentair filtration systems.
Much About the Cloud
Dedicated IoT platforms such as PTC’s ThingWorx have a natural need to handle data in the time domain, according to Chris Baldwin, vice president of product for ThingWorx at PTC.
“Almost all of PTC applications running on top of ThingWorx have some level of time-series trends and analysis,” he said, discussing PTC’s collaboration with time-series database maker InfluxData.
As is often the case with the cloud, several layers of services are involved. In short, PTC has agreed to sell and support InfluxDB Cloud for deployments of ThingWorx hosted in PTC’s cloud environment, which counts Microsoft Azure as a “preferred” cloud platform.
Baldwin said that ThingWorx time-series data was originally stored in a SQL database, but over time it was found to be “slow and hard to manage.” That set PTC looking for something best in class, and resulted in the deal with InfluxData.
“As an industrial IoT player, we are trying to differentiate in things that no one else is as good at,” Baldwin said. That means forgoing the proprietary protocols, visual frameworks and data storage types that may have been used historically.
Use of dedicated time-series databases can be expected to grow as people find they are dealing with ever-increasing volumes of time-oriented data, said Tim Hall, vice president, products, InfluxData.
“What has changed,” he said, “is that people want greater data granularity and access,” with “data granularity” here meaning increasingly shorter intervals between data samples.
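To make the notion of granularity concrete, the sketch below averages the same hypothetical one-second sensor readings into coarser time buckets. The sample values and bucket widths are assumptions chosen for illustration; finer granularity simply means more points covering the same span of time.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings: (seconds since start, temperature in C),
# sampled once per second for one minute.
raw = [(t, 20.0 + (t % 10) * 0.1) for t in range(60)]


def downsample(points, bucket_seconds):
    """Average readings into fixed-width time buckets."""
    buckets = defaultdict(list)
    for t, value in points:
        buckets[t - (t % bucket_seconds)].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


per_10s = downsample(raw, 10)  # 6 points
per_30s = downsample(raw, 30)  # 2 points
print(len(raw), len(per_10s), len(per_30s))
```

The coarser series are cheaper to store and query, but any variation inside a bucket is averaged away, which is why demand for finer granularity translates into larger data volumes.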
Finding What Works for IoT Endpoint Analytics
MongoDB and InfluxDB, like several of the newly styled databases, are offered in both open source and commercial versions. While IT has been increasingly ready to adopt open-source software, manufacturing operations have been slower, according to Frederik Van Leeckwyck, the business development manager at Factry.IO.
Van Leeckwyck said Maarkedal, Belgium-based Factry.IO grew out of engineers’ interest in bringing a more open approach to handling industrial manufacturing data. As part of that, it used InfluxDB and open-source Grafana time-series visualization software. The result is the Factry Historian data collection platform.
Among varied time-series data applications Van Leeckwyck has worked on are innovative energy undertakings. Included are systems to manage an industrial plant that generates energy using non-recyclable wood and a wind farm optimized to operate in a major European city.
“To get new insights into a process you need to know what time and where an event occurred,” he said. So, time-series analytics becomes an exercise where conditions are overlaid on such events, to provide context. Useful context varies widely by industry, and could, for example, be represented by temperature, plant location, shift, which personnel were working a production line, and so on.
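One way to picture that overlay is as a lookup that attaches the conditions in effect at the time to each event. In the sketch below, the shift schedule, the event list and the field names are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical context: shift changes as (hour of day, shift name),
# sorted by start time.
shifts = [(0, "night"), (6, "morning"), (14, "afternoon"), (22, "night")]
shift_starts = [start for start, _ in shifts]

# Hypothetical events: (hour of day as a float, description).
events = [(5.5, "pressure spike"), (9.25, "line stop"), (23.0, "temp alarm")]


def shift_at(hour):
    """Return the shift in effect at the given hour."""
    idx = bisect_right(shift_starts, hour) - 1
    return shifts[idx][1]


# Overlay the context onto each event.
annotated = [
    {"hour": hour, "event": name, "shift": shift_at(hour)}
    for hour, name in events
]
for row in annotated:
    print(row)
```

The same pattern extends to other context series, such as temperature or plant location, as long as each is recorded with its own timestamps.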
Van Leeckwyck advises that comprehensive approaches that engage both operations-side engineers and IT-side software developers are the best way to go about Industrial IoT projects. That can help free data so that it has uses beyond just the factory floor.
Challenges are found throughout the life cycle of IoT data, and range from high-level architecture to implementation details. For example, Van Leeckwyck said, it is important for front-line naming protocols and basic configurations to mesh with back-end data handling.
Overall, a team must understand what its technology of choice is capable of and what the business is trying to achieve, he said, “and the database developer should be responsible for making sure that the data can actually be collected and stored with the resolution and frequency needed.”
Industrial IoT is maturing quickly in bridging what is widely perceived as a gap between operations and IT, according to Van Leeckwyck.
At the same time, he indicated, users will look for ways to build systems that include some level of architectural abstraction that will allow them to keep their future options open.
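One common way to get that kind of abstraction is to have application code depend on a narrow interface rather than on a specific database. The sketch below is an assumption about what such a layer might look like; the TimeSeriesStore interface and InMemoryStore class are hypothetical and are not part of any product mentioned here.

```python
from typing import Dict, List, Protocol, Tuple

Point = Tuple[float, float]  # (timestamp, value)


class TimeSeriesStore(Protocol):
    """The minimal interface the application depends on."""

    def write(self, series: str, ts: float, value: float) -> None: ...

    def query(self, series: str, start: float, end: float) -> List[Point]: ...


class InMemoryStore:
    """Stand-in backend; a real one would wrap a database client."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Point]] = {}

    def write(self, series: str, ts: float, value: float) -> None:
        self._data.setdefault(series, []).append((ts, value))

    def query(self, series: str, start: float, end: float) -> List[Point]:
        points = self._data.get(series, [])
        return sorted(p for p in points if start <= p[0] < end)


def average(store: TimeSeriesStore, series: str, start: float, end: float) -> float:
    """Application logic written against the interface only."""
    values = [v for _, v in store.query(series, start, end)]
    return sum(values) / len(values) if values else float("nan")


store = InMemoryStore()
for ts, value in [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0)]:
    store.write("boiler.temp", ts, value)
print(average(store, "boiler.temp", 0.0, 2.0))
```

Because average() only knows about the interface, swapping the in-memory stand-in for a different backend would not require changes to the analytics code.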