In its 11th Global Internet Traffic forecast, Cisco estimated that cloud traffic is likely to almost quadruple between 2015 and 2020, from 3.9 zettabytes (ZB, i.e. one billion terabytes) per year to 14.1 ZB per year.
Most notably, the data produced by the Internet of Things is a big part of this growth: by 2020, databases, analytics and IoT workloads will account for 22% of total business workloads, while the volume of data generated by the IoT will reach 600 ZB per year.
With such a volume of data available, the industry's main goal is to exploit this information to create added value. The greatest challenge, however, is how to process such data. By processing, we mean reading, manipulating and extracting relevant information from the data in our possession.
Thanks to recent advances in electronics and, in particular, in systems-on-a-chip, the design of a system can now take into consideration a whole range of different solutions as to which tier should be responsible for handling the processing logic.
In this article we’ll be detailing three tiers in which an IoT system can process data:
- at a central location, i.e. on your back-end platform
- on the edge, i.e. in the immediate proximity of the connected object itself
- in a gateway, i.e., in a middle layer between the “thing” and your back-end
At a central location
When dealing with an array of connected devices, a possible configuration consists in transferring the data read by the “things” themselves to a centralised storage location, such as your back-end.
Once the information has reached the back-end, data can be processed either in real-time or in deferred mode, in computational batches.
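As a minimal sketch of the deferred (batch) flavour, assuming an in-memory store and hypothetical device identifiers (nothing here is tied to a specific platform), the central flow might look like this:

```python
import statistics
from collections import defaultdict

# Central storage: all readings from all devices land in one place.
store = defaultdict(list)  # device_id -> list of readings

def ingest(device_id, value):
    """Called whenever a device transfers a reading to the back-end."""
    store[device_id].append(value)

def run_batch():
    """Deferred processing pass over everything received so far."""
    return {dev: {"count": len(vals), "mean": statistics.mean(vals)}
            for dev, vals in store.items()}

# Simulated device transfers
for v in (20.5, 21.0, 22.1):
    ingest("sensor-a", v)
ingest("sensor-b", 18.2)

report = run_batch()
```

In a real-time variant, `ingest` would trigger processing immediately instead of waiting for the batch pass.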
This approach is traditionally one of the most widespread, since it implies the production and deployment of relatively inexpensive devices, which can perform the transfer via a radio connectivity module.
There are a number of valid reasons for processing data at a central location, in particular:
An overall view of your system
By storing data on your back-end, it is simpler to observe your system's behaviour in its entirety. All the information retrieved by every sensor resides in one place: it is therefore more straightforward to observe specific patterns and correlations in the data you own.
Device cost and energy efficiency
When performing all data handling and processing on your back-end, all the intelligence and computational capability will live on your servers. This reduces the complexity of your distributed devices: since their sole responsibility is to perform reads and transfer data, the required computing power is limited, thus decreasing the complexity of the required hardware. Also, relatively “dumb” devices are generally more energy efficient and help reduce the costs of your system.
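To illustrate how little logic such a “dumb” device needs, here is a deliberately minimal sketch in which the transport is simulated by a list; `read_sensor` and the value it returns stand in for a real sensor driver:

```python
def read_sensor():
    # Stand-in for a real ADC/driver read; the value is illustrative.
    return 21.7

sent = []  # stand-in for the radio module / network transport

def transfer(value):
    sent.append(value)

# The whole firmware loop fits in two steps: read, then transfer.
# All intelligence lives on the back-end, not here.
for _ in range(3):
    transfer(read_sensor())
```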
Data sharing
Once the information is stored in a centralised location, your system can easily expose services for sharing it in a variety of ways. For example, it becomes easier to create Web Services exposing APIs that can be further integrated into other systems, either local to your Information System or distributed in an Open Data flavour.
Storing and processing data in a unique location can present some major drawbacks as well, such as:
Security
If the data read by your devices is transferred over the network, your system can present a number of vulnerabilities, related to the security policies that may or may not be implemented in your network stack. While detailing all possible security threats is beyond the scope of this article, it must be recalled that security is one of the most prominent aspects in the design of a platform.
Latency
When dealing with large volumes of data, and depending on your business logic, processing can end up being slow. As a consequence, triggering an action on the device could take a considerable amount of time.
On the edge
Data processing can, notably, be performed on the distributed devices themselves. This approach is becoming more and more popular thanks to advances in electronics and the overall reduction in the size of hardware components: distributed devices can now have enough processing power to perform complex calculations that were traditionally left to the back-end platform.
Also, and most notably, miniaturised and energy-efficient Unix-based systems-on-a-chip can now run complex calculations and even deep learning models to thoroughly interpret the input provided by the sensors and provide feedback in response.
There are a number of reasons why this approach is transforming the industry, the most relevant being:
Immediate feedback
As stated above, one of the most important advantages of this approach is an immediate feedback loop. The software logic installed on the devices can process information on-site and immediately transfer an action to the actuators of your system. Neural networks can even identify complex patterns in rich sources of input, such as sounds or images. The advantages are easy to grasp and can foster the adoption of revolutionary use cases.
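A minimal sketch of such an on-device feedback loop, with an illustrative threshold and a simulated actuator (both are assumptions for the example):

```python
ALARM_THRESHOLD = 50.0  # illustrative limit, e.g. degrees Celsius

actions = []

def actuate(command):
    # Stand-in for driving a real actuator (relay, valve, motor...).
    actions.append(command)

def on_reading(value):
    # The decision is taken locally: latency is bounded by on-device
    # compute only, with no round trip to the back-end.
    if value > ALARM_THRESHOLD:
        actuate("shutdown")
    else:
        actuate("ok")

for v in (42.0, 55.3, 48.9):
    on_reading(v)
```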
Data Transfer and Security
By processing (or pre-processing) data on the edge, it is possible to drastically reduce the quantity of bytes you need to transfer to your main servers. For instance, useless or noisy information can be filtered and you will be able to distribute to your back-end an optimised quantity of data.
Depending on your needs, your system can even avoid transfers altogether, thus drastically reducing the risk of data interception. Also, devices provided with higher computational capabilities can enforce safer methods of data encryption.
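The filtering idea can be sketched as a simple dead-band filter: the device only forwards a reading when it differs from the last transmitted value by more than a tolerance. The tolerance and sample values are illustrative:

```python
TOLERANCE = 0.5  # illustrative dead-band width

def deadband_filter(readings, tolerance=TOLERANCE):
    """Return only the readings worth transmitting to the back-end."""
    transmitted = []
    last_sent = None
    for value in readings:
        # Forward the first reading, then only significant changes.
        if last_sent is None or abs(value - last_sent) > tolerance:
            transmitted.append(value)
            last_sent = value
    return transmitted

raw = [20.0, 20.1, 20.2, 21.0, 21.1, 25.0]
sent = deadband_filter(raw)
# Only a fraction of the raw readings needs to cross the network.
```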
Needless to say, no approach is exempt from weaknesses, and data processing on the edge is no exception.
Hardware cost
The first negative impact is clearly the cost of the hardware: smarter devices with higher computing power can cost up to an order of magnitude more than simple ones. Needless to say, the expenses are proportional to the size of your installation. In other words, budget is generally a hurdle only for large systems.
Energy efficiency
Another aspect directly associated with the higher performance demanded by edge computing is lower energy efficiency. On-the-edge data processing can drastically increase the power consumption of the devices, and in most cases your installation will not be able to run on batteries for an extended amount of time.
Lack of overall view
In the context of a heterogeneous and geographically spread system, in which sensors are far apart, edge devices may lack crucial data for providing accurate feedback whenever that information comes from a remote source.
In a gateway
A third tier able to implement your data processing logic is the gateway, i.e., a middle layer located between your connected devices and your servers. As an example, a gateway can be a mobile application polling data from your devices via Bluetooth and then transmitting it to your Web Services.
There are a number of benefits this specific approach brings, notably:
Feedback and interaction
If interaction and feedback are required from a user, for instance in B2C appliances such as the smart home, a gateway such as a modern smartphone can provide an easy-to-use and efficient way to close the feedback loop with the connected thing. In particular, most handheld devices provide enough computational power to perform complex calculations and present the user with rich information in the form of visuals or sounds.
Privacy
Gateways such as mobile devices can provide an efficient way to implement privacy policies related to the treatment of personal data. In other words, a smartphone can retrieve information from a variety of sensitive sources, process and store it in a secure way, while transferring only a limited and anonymised portion to your servers.
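A minimal sketch of this idea: the gateway pseudonymises a personal identifier with a salted hash and uploads only coarse statistics, never the raw samples. The salt handling here is simplified for illustration; a real deployment would manage and rotate it securely:

```python
import hashlib

SALT = b"example-deployment-salt"  # illustrative; manage securely in practice

def pseudonymise(user_id: str) -> str:
    """Replace a personal identifier with a salted, truncated hash."""
    return hashlib.sha256(SALT + user_id.encode()).hexdigest()[:16]

def summarise_for_upload(user_id, samples):
    # Only a pseudonym and aggregate statistics leave the gateway.
    return {
        "subject": pseudonymise(user_id),
        "count": len(samples),
        "mean": sum(samples) / len(samples),
    }

payload = summarise_for_upload("alice@example.com", [72, 75, 71, 74])
```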
Gateways are everywhere
Gateways are easily available: from mobile devices to routers or even antennas, system designers have a wide choice of devices capable of implementing the data processing logic.
Aggregation of data
Another major benefit of gateways is that they can handle processing of the data across multiple devices before it is sent to your back-end. For instance, information can be condensed and stored in a local database in order to maximise the amount that can be sent to the cloud over a single connection.
Another use case could consist in providing a reference timestamp for devices that can’t correctly manage their internal clocks.
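Both ideas can be sketched together: the gateway condenses readings from several devices into one upload payload and stamps each entry with its own reference clock. Field names, device identifiers and the clock value are illustrative:

```python
def make_batch(readings_by_device, gateway_time):
    """Condense per-device readings into a single upload payload."""
    batch = []
    for device_id, values in sorted(readings_by_device.items()):
        batch.append({
            "device": device_id,
            "timestamp": gateway_time,  # reference time from the gateway,
                                        # for devices with unreliable clocks
            "values": values,           # condensed: one entry per device
        })
    return {"uploaded_at": gateway_time, "entries": batch}

payload = make_batch(
    {"thermo-1": [20.1, 20.3], "thermo-2": [19.8]},
    gateway_time=1700000000,
)
# One connection to the cloud now carries data from every device.
```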
Implementing the data processing logic on a gateway can also present a number of drawbacks.
A more complex system
Adding a new tier can imply greater complexity in your system. Development and, especially, testing should then take into account all three levels of your infrastructure, i.e. the connected device, the gateway and the back-end. The complexity can be even greater when your gateway is a network device you don't directly control or necessarily own, such as an LTE macro base station.
Software version management
An issue directly tied to the complexity of the system is managing which version of your software runs at each level. In other words, when deploying a new version of the logic, major focus should be put on making sure that each level runs a software version compatible with the versions deployed on the other tiers. This can be a major source of complexity in the case of a progressive rollout, in which each tier should be able to support different software versions deployed on the other tiers.
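One possible way to make such compatibility explicit, sketched with illustrative tier names and version numbers: each tier declares the range of peer versions it supports, and a rollout is considered safe only if every adjacent pair is compatible:

```python
def parse(version):
    """Turn '1.4.0' into a comparable tuple (1, 4, 0)."""
    return tuple(int(p) for p in version.split("."))

def compatible(version, minimum, maximum):
    return parse(minimum) <= parse(version) <= parse(maximum)

# Illustrative deployment state: each tier declares the range of
# peer versions it can talk to.
deployment = {
    "device":  {"version": "1.4.0", "peer_min": "2.0.0", "peer_max": "2.9.9"},
    "gateway": {"version": "2.3.1", "peer_min": "3.1.0", "peer_max": "3.9.9"},
    "backend": {"version": "3.2.0"},
}

def rollout_is_safe(dep):
    # Device must accept the gateway's version, and the gateway must
    # accept the back-end's version.
    return (compatible(dep["gateway"]["version"],
                       dep["device"]["peer_min"], dep["device"]["peer_max"])
            and compatible(dep["backend"]["version"],
                           dep["gateway"]["peer_min"], dep["gateway"]["peer_max"]))

safe = rollout_is_safe(deployment)
```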
Security
The addition of a new tier also translates into a larger exposure of your platform to potential threats. However, it must be noted that such risks are greatly mitigated by the security measures implemented on most gateway platforms: for instance, modern mobile operating systems enforce sandboxing and App Transport Security by default, in order to automatically prevent the most common threats.
Conclusions
Needless to say, the adoption of one strategy or another requires a careful evaluation of the constraints, needs and environment of the system. In many cases, especially in complex projects, a hybrid approach can be chosen, as different calculations and processing can be performed in different tiers of the system.
The choice can depend on a number of KPIs, such as:
- Volume of data to process
- Latency required by the feedback loop
- Device cost and energy constraints
- Security and privacy requirements
The rapid evolution of edge computation in the Internet of Things is changing the way we process data. Designing such a system requires a solid understanding of the domain in order to make the best choice out of the plethora of technologies available today.