I was recently interviewed as part of a group of technology executives for an article in InformationWeek about IoT devices and enterprises. The final article can be found here, but below I'm providing my full responses to the interview questions because I believe they are relevant to our customers and readers.
If you're interested in hearing more about how we can help your organization deal with the massive amounts of data generated by IoT devices, feel free to reach out.
1. IoT devices are making their way into enterprises whether they want them or not. What are some of the biggest mistakes you see companies making, generally speaking?
As with most tech trends, companies are falling into the usual trap of acting before they plan. Specifically, organizations are not planning for the infrastructure they will need to store and process all of the data generated by IoT devices, and they are not properly securing that data, either as it is transmitted from these devices or as it sits at rest. Failing to plan for infrastructure needs can result in all types of system failures, while lacking a strong security plan for the data can expose an organization or its customers to malicious actors.
Companies also continue to keep their IT organizations separate from the business organization. Good technology and algorithms cannot be created in a vacuum. IT organizations must work hand in hand with the business organization to derive the most value from data. When it comes to data, context, not content, is king.
2. IoT devices can produce a lot of data. What’s the best way to decide what to keep and what not to keep? What’s the best way to determine the actual value of IoT data?
Determining the actual value of any data always boils down to the business case. As a general rule of thumb, enterprises should keep all of the data they collect until they understand what actionable intelligence can be derived from it. If you are generating data and you think no value can come from it, you are wrong. All data has business value; it is all in how you look at it.
Depending on the business, some data may appear valuable only for a short time (for instance, a temperature increase that temporarily slows down manufacturing devices), but that data can often be used for longer-term projects (e.g., predictive analytics: estimating the time to next failure). You don't have to keep all of the raw data around; sometimes aggregate information over a time period is good enough.
A software engineer, a data scientist, and a business analyst walk into a bar with the same piece of data… When they walk out, the software engineer has figured out how to cut the time it takes the business analyst to extract value from that data by 40%, the data scientist has figured out how to make relevant and accurate predictions about how that data will change over time, and the business analyst has figured out how to increase revenue by 30% over the next 6 months using that data. Put people from multiple disciplines in a room together: that's how you'll find the actual value of IoT data.
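To make "aggregate information over a time period" concrete, here is a minimal sketch that collapses raw readings into per-device, per-hour summaries. The `Reading` type, the device name, and the one-hour window are illustrative assumptions, not part of any particular platform. Keeping the max alongside the mean means a short temperature spike still shows up in the summary, which is the kind of signal a later failure-prediction project might want.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    device_id: str
    timestamp: int       # seconds since the epoch
    temperature: float   # hypothetical sensor field


def aggregate(readings, window_seconds=3600):
    """Collapse raw readings into per-device, per-window min/max/mean/count."""
    buckets = defaultdict(list)
    for r in readings:
        # Align each reading to the start of its time window.
        window_start = r.timestamp - (r.timestamp % window_seconds)
        buckets[(r.device_id, window_start)].append(r.temperature)
    return {
        key: {"min": min(vals), "max": max(vals), "mean": mean(vals), "count": len(vals)}
        for key, vals in sorted(buckets.items())
    }


readings = [
    Reading("press-1", 0, 70.0),
    Reading("press-1", 1800, 74.0),
    Reading("press-1", 3700, 90.0),   # a spike in the next hour
]
summary = aggregate(readings)
```

Three raw rows become two summary rows here; at real sensor rates the reduction is much larger, at the cost of losing the individual readings.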
3. How should one decide what to do with all the IoT data they’re collecting? What infrastructure considerations are there?
There are a number of infrastructure considerations given the velocity and volume at which IoT devices generate data. While storage is often a major concern, I would argue that it should be the least of an organization's worries; these days storage is cheap. More important are the processing capabilities of the infrastructure you select. Are you leveraging a cloud provider like AWS or managing it all in-house? Can the data be processed record by record as it is generated, or is it only valuable in aggregate? The answers to questions like these will shape your strategy.
If the data can be processed as it comes in, how computationally intensive are the algorithms it runs through? You may need a significant amount of CPU. If it is processed in aggregate, you may need more RAM to hold all of the data in memory. And can the workload be distributed across multiple machines?
Of course, at the end of the day the biggest factors will be cost and the return on investment for the types of processing you do with that data.
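As an illustration of the CPU-versus-RAM trade-off, the sketch below uses Welford's online algorithm, one well-known way to keep a running mean and variance in constant memory as readings stream in, plus the standard parallel merge step that lets separate workers combine partial results. The sample values are made up. An exact median is shown for contrast because it needs every value in memory at once.

```python
import math
import statistics


class RunningStats:
    """Welford's online algorithm: mean and sample variance in O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold in one reading as it arrives; nothing is buffered."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance is undefined for fewer than two readings.
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine partial results computed on separate workers (Chan et al.)."""
        combined = RunningStats()
        combined.n = self.n + other.n
        if combined.n == 0:
            return combined
        delta = other.mean - self.mean
        combined.mean = self.mean + delta * other.n / combined.n
        combined._m2 = self._m2 + other._m2 + delta * delta * self.n * other.n / combined.n
        return combined


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Streaming: one pass, constant memory.
stream = RunningStats()
for v in values:
    stream.update(v)

# Distributed: two workers each see half of the stream, then merge.
left, right = RunningStats(), RunningStats()
for v in values[:4]:
    left.update(v)
for v in values[4:]:
    right.update(v)
merged = left.merge(right)

# By contrast, an exact median needs every value held in memory at once.
exact_median = statistics.median(values)
```

Statistics that can be merged like this distribute cleanly; ones like an exact median push you toward more RAM or approximate algorithms.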
4. IoT data analysis is often bimodal. What’s the best way to decide what should be analyzed at the edge and what should be analyzed back at the enterprise?
My general recommendation is that organizations push as much of the workload to the device as they can, so long as it does not impact the device's performance. This is a great way to save costs. That said, some analytics cannot be done on the device at all, because they require other reference datasets, or even data generated by other devices, to be valuable.
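A minimal sketch of that split, under assumed names: the device reduces a batch of raw samples to a small summary payload and counts local threshold alerts, while the enterprise side compares each device against the whole fleet, something no single device can do because it never sees the other devices' data. The 85.0 threshold and the 10.0 tolerance are hypothetical values chosen for illustration.

```python
import json
from statistics import mean

TEMP_LIMIT = 85.0  # hypothetical per-device alert threshold


def summarize_on_device(device_id, samples):
    """Runs on the device: reduce a batch of raw samples to a small payload."""
    return {
        "device_id": device_id,
        "count": len(samples),
        "mean": round(mean(samples), 2),
        "max": max(samples),
        "alerts": sum(1 for s in samples if s > TEMP_LIMIT),
    }


def flag_outliers_centrally(summaries, tolerance=10.0):
    """Runs in the enterprise: 'normal' is defined by the whole fleet."""
    fleet_mean = mean(s["mean"] for s in summaries)
    return sorted(
        s["device_id"] for s in summaries if abs(s["mean"] - fleet_mean) > tolerance
    )


raw = {
    "press-1": [70.0, 71.0, 72.0],
    "press-2": [69.0, 70.0, 71.0],
    "press-3": [88.0, 90.0, 92.0],
}
payloads = [summarize_on_device(d, s) for d, s in raw.items()]
print(json.dumps(payloads[0]))  # what would actually travel over the network
outliers = flag_outliers_centrally(payloads)
```

Only the small JSON summaries leave the devices, which is where the cost savings come from; the fleet comparison still has to happen centrally.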
5. How should IoT data be integrated into an enterprise’s overall data fabric? How should one think about the problem? How should one approach it?
IoT data should be treated just like all other data in the enterprise. Whether data is generated by devices, comes from APIs, or is user-generated content, you will always derive more value when it is analyzed alongside other data.
Furthermore, organizations should embrace the democratization of data. Put data into more hands and you will be amazed at the ingenuity of your employees.
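As a small example of analyzing device data alongside other enterprise data, the sketch below left-joins raw readings onto a reference dataset describing each asset. The registry, its fields, and the 85.0 threshold are hypothetical stand-ins for whatever asset-management or ERP data an organization actually has; in practice this join would usually happen in a database or processing engine rather than in plain Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    site: str
    model: str


# Hypothetical reference data, e.g. exported from an asset-management system.
ASSET_REGISTRY = {
    "press-1": Asset(site="Plant A", model="HX-200"),
    "press-2": Asset(site="Plant B", model="HX-200"),
}

readings = [
    {"device_id": "press-1", "temperature": 71.0},
    {"device_id": "press-2", "temperature": 93.0},
    {"device_id": "press-9", "temperature": 65.0},  # not in the registry
]


def enrich(readings, registry):
    """Left-join device readings onto asset metadata; unknown devices get None."""
    enriched = []
    for r in readings:
        asset = registry.get(r["device_id"])
        enriched.append({
            **r,
            "site": asset.site if asset else None,
            "model": asset.model if asset else None,
        })
    return enriched


rows = enrich(readings, ASSET_REGISTRY)
# A question the raw readings alone cannot answer: which sites are running hot?
hot_sites = sorted({row["site"] for row in rows if row["site"] and row["temperature"] > 85.0})
```

The unmatched reading is kept rather than dropped; a device missing from the registry can itself be a useful finding.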
6. What effect might IoT data have on a company’s data management practices, if any?
In most cases, IoT data will have minimal impact on a company's data management practices. It is certainly possible, however, that an organization, after deriving value from some analytic or from joining a specific set of data, will decide that other data should be retained longer or exposed through other tools to enable further analytics.
7. What security risks should be considered?
When collecting data from IoT devices, there are a number of security risks to consider. While you should democratize data as much as possible, in some cases that democratization could expose your organization. If you're dealing with data generated by consumers, a leak of that data could lead to a loss of trust in your organization. If you're dealing with proprietary information, what exposure would a leak of IoT-collected data create for your organization? Are you exposing the locations of the transportation vehicles your employees are driving? Are you exposing information about the machinery used in your manufacturing processes?
All data poses some risk; the question to ask is what a malicious individual could do with the data you're collecting. Answering that requires knowing yourself: what could you do with that data?
As an aside: I once had a case where the raw data itself was not necessarily a security risk, but the derived dataset produced by the analytics we ran on that data was extremely sensitive.
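One way to balance democratization against exposure is to publish a reduced view of sensitive records for broad internal use. The sketch below, with hypothetical field names and rules, drops a driver identifier and rounds GPS coordinates to one decimal place (roughly 11 km of latitude). It is only a partial mitigation: as the aside above suggests, a long trail of even coarse locations, or a dataset derived from it, can still be sensitive and deserves its own review.

```python
SENSITIVE_FIELDS = {"driver_id"}   # hypothetical: never shared broadly
COARSEN = {"lat": 1, "lon": 1}     # hypothetical: decimal places kept per field


def shareable_view(record):
    """Return a copy for broad internal sharing: drop or coarsen sensitive fields."""
    view = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            continue  # omit entirely
        if key in COARSEN:
            view[key] = round(value, COARSEN[key])  # reduce location precision
        else:
            view[key] = value
    return view


truck = {
    "vehicle_id": "truck-17",
    "driver_id": "E1042",
    "lat": 38.8977,
    "lon": -77.0365,
    "speed_kph": 62,
}
shared = shareable_view(truck)
```

The original record is left untouched, so analysts with the appropriate access can still work with full-precision data.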
8. How does the IoT affect security policies and what mistakes are people making there?
If you have good data security and management policies in place, IoT data should not have a major effect on them. The larger problem is when you have no policies in place. Given the risks mentioned above, it is important to have security policies covering data in transit, data at rest, and access to data.
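For data in transit, one common building block is message authentication: the device signs each payload with a shared secret so the ingest side can reject tampered or forged messages. The sketch below uses HMAC-SHA256 from Python's standard library; the key and payload fields are hypothetical. Note that this provides integrity and authenticity only, not confidentiality, so it complements rather than replaces an encrypted channel such as TLS, and key provisioning and storage are out of scope here.

```python
import hashlib
import hmac
import json
from typing import Optional

DEVICE_KEY = b"hypothetical-per-device-secret"  # in practice provisioned and stored securely


def sign(payload: dict, key: bytes) -> dict:
    """Device side: attach an HMAC-SHA256 tag over a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode("utf-8"), "sig": tag}


def verify(message: dict, key: bytes) -> Optional[dict]:
    """Ingest side: recompute the tag, compare in constant time, reject on mismatch."""
    expected = hmac.new(key, message["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["sig"]):
        return None
    return json.loads(message["body"])


msg = sign({"device_id": "press-1", "temperature": 71.0}, DEVICE_KEY)
accepted = verify(msg, DEVICE_KEY)

# Altering the body in transit invalidates the signature.
tampered = {**msg, "body": msg["body"].replace("71.0", "21.0")}
rejected = verify(tampered, DEVICE_KEY)
```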
9. I may think the value of my IoT data is X. Tomorrow, I may discover it is actually Y or X and Y. How should future potential value (perhaps unknown now) affect the approach to collecting, managing, and leveraging IoT data?
Future potential value has a major effect on how data should be collected. A number of organizations throw data on the floor that they deem useless at the time. This is typically the wrong approach. As long as you can afford to keep it, store all data until you can definitively prove it has no value (in almost all cases you will not be able to prove this). At some point you may find a use for that data, and when you do, you'll be glad you kept it around.
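Keeping raw data affordable is largely a matter of cheap, compressed storage that can be replayed later. The sketch below writes readings as gzip-compressed JSON Lines and reads them back unchanged. It works in memory for simplicity; writing the same bytes to a file or object store is an assumption left to the reader's infrastructure. Repetitive sensor data typically compresses well, though the actual ratio depends entirely on the data.

```python
import gzip
import io
import json


def archive(records) -> bytes:
    """Serialize raw readings as gzip-compressed JSON Lines (one record per line)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for rec in records:
            gz.write((json.dumps(rec, sort_keys=True) + "\n").encode("utf-8"))
    return buf.getvalue()


def replay(blob: bytes):
    """Read an archive back, e.g. when a new use for old data appears."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
        for line in gz:
            yield json.loads(line)


# Hypothetical raw readings: 1,000 small, repetitive records.
records = [{"device_id": "press-1", "ts": t, "temperature": 70.0 + (t % 5)} for t in range(1000)]
blob = archive(records)
raw_size = sum(len(json.dumps(r, sort_keys=True)) + 1 for r in records)
restored = list(replay(blob))
```

Because nothing is summarized away, the archive can feed analytics nobody has thought of yet.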
10. What didn’t I ask you that you think I should have asked you?
Beyond infrastructure, a major question that comes up is how you will process and analyze all of this data: which software tools to use, and how much to budget for software. In most cases, you can leverage a variety of open source technologies (Hadoop, Elasticsearch, Spark, Apache Flink, etc.) to build a custom solution that makes sense for your organization.
Want to hear more? Just reach out.
Written by Tim Tutt
Chief Technology Officer, Data Ninja, Technology Enthusiast