IoT Developers May Need Time Series Data Analysis Skills
For many IoT developers, time series data analysis is a new frontier. IoT developers may need to broaden their skill sets to build apps that can exploit newfound sensor data.
January 3, 2020
Today, IoT devices gather data at an ever-faster clip. But understanding the data generated and finding meaningful patterns amid the torrent remain obstacles to successful Internet of Things implementations.
The fast-arriving data streams can take the form of time series data of varied types. Such data must be handled in sequence, and it is updated in small but continual increments.
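The in-sequence, continually updated nature of such streams can be sketched with a bounded rolling buffer, a pattern streaming pipelines commonly use; the temperature readings below are illustrative, not drawn from any vendor's API:

```python
from collections import deque

# A rolling window that ingests readings in arrival order and
# retains only the most recent few, discarding the oldest.
window = deque(maxlen=5)
for t, value in enumerate([21.0, 21.1, 21.3, 21.2, 21.4, 21.6, 21.5]):
    window.append((t, value))  # small, continual, in-sequence updates

print(list(window))  # only the five newest (timestamp, value) pairs remain
```

Because `maxlen` is fixed, memory stays bounded no matter how long the stream runs, which is why this shape recurs in edge deployments.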
Time series data types are familiar in geological research, processing, manufacturing and other industrial settings, but developers, now charged with digital transformation, often lack time series know-how.
IoT device data could soon engulf organizations and the software developers expected to create useful reports, real-time visualizations and automated feedback loops from IoT sensor data. They must often do so with limited tooling or experience.
In the future, developers may need to learn elements of data science, and even AI deep learning – while still maintaining expertise in advanced programming – in order to succeed in time series data analytics, according to industry observers. Yet, some say, AI is not always required and can be overkill in some instances.
Driving this activity is a data flood that grows as devices multiply. Market researcher Statista estimates IoT connected devices worldwide will reach 75.44 billion by 2025, five times the number of such devices in 2015.
Understanding IoT Sensor Traits
Industry participants must acquire new skills to meet the data surge, according to Dan Lluch, principal marketing engineer at mathematical computing software maker MathWorks.
“In recent years, it has been much easier to add sensors to operational systems and have data streamed to remote locations,” Lluch said. “The cost of adding sensors is low, and the required infrastructure is now ubiquitous. The major challenge is the availability of expertise to realize value from the data streams.”
Lluch said developers need to gain an understanding of basic characteristics that different sensors display, to build useful analytics on top of applications.
“Data sampling rates are generally tied to the action needed on the system, which in turn impacts the analytics,” Lluch said. So, he continued, while it may make sense to take multiple samples per second for something such as heart rate measurements, it does not make sense to take multiple samples per second for ocean tidal measurements.
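Lluch's point about matching sampling rates to system dynamics can be sketched with the Nyquist criterion, which requires at least two samples per cycle of the fastest signal of interest; the function name and tidal period below are illustrative assumptions, not MathWorks code:

```python
def min_sampling_interval_s(signal_period_s: float) -> float:
    """Longest sampling interval that still captures a periodic
    signal, per the Nyquist criterion (two samples per cycle)."""
    return signal_period_s / 2.0

# A resting heartbeat has a period of roughly one second,
# so it needs multiple samples per second:
heart_interval = min_sampling_interval_s(1.0)          # 0.5 s
# The dominant semi-diurnal tide has a period of about 12.42 hours,
# so samples every few hours suffice:
tide_interval = min_sampling_interval_s(12.42 * 3600)  # ~6.2 hours
```

In practice engineers oversample well beyond this floor, but the contrast shows why a one-size-fits-all rate wastes either fidelity or storage.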
The sensor data type influences the implementation of analytics, according to Lluch, depending on whether, for example, the data of interest is a single data point, such as a temperature value, or is vector data, as may emanate from a chemical analyzer. Also calling for different approaches to analytics is RGB pixel data, which is often the key identifier in computer vision applications.
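The scalar-versus-vector distinction Lluch draws can be made concrete with two minimal record types; the class and field names are hypothetical, chosen only to illustrate the shapes analytics code must handle:

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class ScalarReading:
    """One value per timestamp, e.g. a temperature sensor."""
    timestamp: float
    value: float

@dataclass
class VectorReading:
    """Many values per timestamp, e.g. a chemical analyzer's channels."""
    timestamp: float
    values: Sequence[float]

temp = ScalarReading(timestamp=0.0, value=22.5)
spectrum = VectorReading(timestamp=0.0, values=[0.12, 0.53, 0.31])
```

An analytic written against `ScalarReading` (say, a threshold alarm) cannot simply be pointed at `VectorReading` data, which is the implementation difference Lluch describes.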
Manufacturing plants, supply chains and tiered service and maintenance offerings have become active users of time series technology that creates a “fusion of data” from multiple sensor types, Lluch said. So, developers should expect to mix and match different sensor data in time series dashboards.
Where analytics execute, and the storage capacity and processing power available there, are two factors developers must address, he noted. These days that can mean deploying analytics on edge computing systems, on premises or in the cloud.
MathWorks MATLAB software, Lluch said, enables developers to create time series data analysis independently of the platform on which it may execute. The software also allows developers to work with new artificial intelligence-style machine learning and predictive analytics models, without learning separate data science tools.
Care and Feeding of Neural Networks
Today’s IoT developers are familiar with AI and deep neural networks and believe they must employ such approaches. But their results will vary depending on what type of data the sensors pick up.
That’s according to Chris Rogers, chief executive officer at SensiML, maker of a platform for edge computing and IoT data analysis. He said simpler machine learning tools, some of which automate portions of the development task, may prove more useful in some instances.
As an example, he pointed to repeatable factory line visual inspection tasks, which, he said, can use simpler machine learning pattern recognition methods that have easier model training requirements.
“For time series data streams — from accelerometers, microphones, strain gauges, pressure sensors or load cells — classic machine learning algorithms can often prove a better fit, requiring much less training and test data, fitting in a smaller footprint, and requiring much less computing power to implement,” Rogers said.
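One reason classic algorithms need less data and compute, as Rogers describes, is that they consume a handful of engineered features per window rather than raw waveforms. The feature set below is a generic illustration of that approach, not SensiML's pipeline:

```python
import numpy as np

def window_features(samples: np.ndarray) -> dict:
    """Compact summary features of one time series window, the kind
    a small classic classifier consumes instead of raw samples."""
    return {
        "mean": float(samples.mean()),
        "rms": float(np.sqrt((samples ** 2).mean())),       # signal energy
        "peak": float(np.abs(samples).max()),
        "zero_crossings": int(np.sum(np.diff(np.sign(samples)) != 0)),
    }

# A toy accelerometer window that alternates sign every sample:
feats = window_features(np.array([1.0, -1.0, 1.0, -1.0]))
```

Four numbers per window, rather than hundreds of raw samples, is what lets such models fit in the small memory footprints Rogers mentions.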
Time Series Methods Merging
Time series data analysis is now seeing a useful merging of methods, according to Rosaria Silipo, a principal data scientist at KNIME, maker of an open source data analytics platform.
“One branch comes from statistics, with [Autoregressive Integrated Moving Average, or ARIMA] and its statistical requirements, and the other comes from machine learning, with looser requirements and more powerful algorithms,” she said.
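The statistical branch Silipo cites rests on autoregressive models; the AR(1) core of ARIMA can be sketched as an ordinary least-squares fit. This is a minimal illustration of the idea, not a substitute for a full ARIMA library, and the synthetic series is an assumption for demonstration:

```python
import numpy as np

def fit_ar1(series: np.ndarray):
    """Least-squares fit of x_t = phi * x_{t-1} + c, the AR(1)
    building block of ARIMA-family models."""
    x_prev, x_next = series[:-1], series[1:]
    A = np.column_stack([x_prev, np.ones_like(x_prev)])
    (phi, c), *_ = np.linalg.lstsq(A, x_next, rcond=None)
    return phi, c

# Synthetic noiseless series obeying x_t = 0.8 * x_{t-1} + 1.0:
xs = [0.0]
for _ in range(19):
    xs.append(0.8 * xs[-1] + 1.0)
phi, c = fit_ar1(np.array(xs))  # recovers phi ≈ 0.8, c ≈ 1.0
```

The statistical requirements Silipo mentions (stationarity, for instance) govern when such a fit is trustworthy, whereas machine learning methods relax those assumptions at the cost of more data and compute.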
The two branches are still in the process of merging, she advised, and this is generating “quite a disorienting feeling” for developers who are analyzing time series data.
Of course, she emphasized, the big driver here is the fact that “IoT sensors generate data at a speed rarely seen before.”
“Calculation power and speed — possibly real time — have become more and more relevant,” she said, “allowing for a little drop in prediction performance if it brings a noticeable improvement in speed.” IoT developers now wrestle with this trade-off between speed and performance.
Silipo said the streaming of IoT data is fast, heterogeneous and usually high-dimensional. Successfully creating visualizations of such data — with fast implementation — may require a reduction in dimensionality to represent different data types in the same space.
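One common way to perform the dimensionality reduction Silipo describes is principal component analysis, which projects high-dimensional sensor vectors onto the two directions of greatest variance so they can be plotted. The sketch below uses a plain SVD and random stand-in data; it is illustrative, not KNIME's implementation:

```python
import numpy as np

def project_2d(readings: np.ndarray) -> np.ndarray:
    """PCA via SVD: map (n_samples, n_dims) sensor vectors to
    (n_samples, 2) coordinates suitable for a 2-D scatter plot."""
    centered = readings - readings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # first two principal components

# Stand-in for 100 readings from 8 heterogeneous sensor channels:
rng = np.random.default_rng(0)
coords = project_2d(rng.normal(size=(100, 8)))
```

Because SVD orders components by variance, the first plotted axis always carries at least as much signal as the second, which is what makes the 2-D view informative.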
“All those things have been possible so far, if taken separately,” she said. “The combination of all of them is the current challenge.”
The Moving Analytics Bottleneck
Overall, the changes in sensor and signaling systems over time are quite remarkable, SensiML’s Rogers said. Since the 1990s, when he began his career as an automotive test engineer, key changes have taken place.
“At the time, sensor costs and complexity were orders of magnitude higher; the volume of data was limited by sensor costs,” he said. “For example, an NVH [noise, vibration and harshness] application then required an expensive chain consisting of piezoelectric accelerometers, charge amplifiers, signal conditioning modules and data acquisition boards that easily cost $5,000 per channel or more.
“Today we enjoy access to a wide array of low-cost [devices] capable of comparable signal quality in highly integrated, multi-axis sensor ICs for less than $1 per channel in most instances,” he continued.
However, that progress adds stress to the programmer side of IoT development.
“With so much more data to digest, the bottleneck in the process really becomes the engineers and computer and data scientists that used to devise, test and implement algorithms largely by hand, using high-level languages.”
Rogers, like others, is working on machine learning code-generation tools that address this perceived IoT developer bottleneck, to better analyze data generated by the Internet of Things.