Integrating Structured and Unstructured Data: Are We There Already?
“By 2022, 50% of organizations will include unstructured, semistructured and structured data within the same governance program, up from less than 10% today.” Gartner Market Guide for File Analytics
How many companies have separate solutions to manage structured data (databases, transactional data) and unstructured data (documents, text, videos, images, email, social media, etc.)? After all, they are very different types of information, so they require different technology and governance approaches. Barb touched on this a bit when she wrote about innovations in information governance for 2019; I want to dig a little deeper.
What if this requirement to separate unstructured and structured data is no longer necessary? What if we merged the strategies and technologies related to structured data governance and unstructured information governance? Can we look at both types of data in a single governance program?
The fact is, we already do that today. Consider a Salesforce object with an invoice attached. Or records in an SAP system connected to some files. Or a NoSQL database with some text fields. Much of the data we manage today is semi-structured, so why have separate solutions to manage each type?
Making Unstructured Data, Structured
“80% of data is unstructured.” I’m sure you’ve heard this. You’ve implemented or are looking at file and content analysis solutions to help you manage it. In your efforts to manage your unstructured data, did you know you are actually making unstructured data structured?
File and content analysis solutions provide capabilities to analyze your information and either automatically or manually enrich and classify it by assigning taxonomy and metadata. You can scan your information for PII, PHI, PCI, custom regular expressions, named entities and so on, to automate metadata creation. This information could be anything – it could be text in a document, a string in a database, or just a tweet. By assigning taxonomy and metadata you are essentially extracting structure from your unstructured content.
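To make that concrete, here is a minimal sketch of how scanning text with regular expressions can produce metadata. The two patterns, the `extract_metadata` function and the sample sources are hypothetical illustrations, not how any particular file analysis product works, and real PII detection needs far more robust rules than this.

```python
import re

# Hypothetical patterns for illustration only; production PII detection
# needs validation and many more rules than these two expressions.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def extract_metadata(text):
    """Map each pattern name that matched to the sorted unique values found."""
    metadata = {}
    for name, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            metadata[name] = matches
    return metadata


# The same scan applies to a document, a database string, or a tweet.
sources = {
    "document": "Invoice for Jane Doe, contact jane.doe@example.com, SSN 123-45-6789.",
    "database_field": "Customer prefers phone contact.",
    "tweet": "Ping me at support@example.org for details",
}

# The resulting catalog is structured metadata extracted from free text.
catalog = {source: extract_metadata(text) for source, text in sources.items()}
```

The point of the sketch is that the output is a plain dictionary of fields and values, which is exactly the kind of structure you can store, query and relate to other structured data.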
Once you have extracted structure, you can relate it to other structured data and examine the two side by side. It only makes sense, then, that you would want a file analytics solution that can analyze both structured and unstructured data, doesn’t it?
Of course, due to compliance and security requirements you can’t simply merge all your data and provide it to everyone in the company in a big data lake; you need data governance.
Federation is the New Repository
Not so long ago we talked about moving everything into a single repository, whether that was Documentum, FileNet or some other system.
But the notion of moving everything to a single repository never became a reality. Now, it’s all about federation and managing “in-place”. You have data in your ERP and CRM systems, content in your file shares, SharePoint, Office 365, and numerous other applications and repositories, and you want to keep it where it is. At the same time, you need to ensure it is managed according to business and regulatory lifecycle requirements and the appropriate information policies.
You don’t want to deal with separate solutions to manage data and content in-place. You need a solution that can help you look at your data as a whole and manage it appropriately.
Another thing to keep in mind: GDPR, CCPA (California Consumer Privacy Act) and other upcoming privacy regulations do not differentiate between structured data and unstructured content. It’s all personal information regardless of its form, and you need to be able to connect the dots easily between it all to support things like requests for information and the right to be forgotten.
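As a rough illustration of connecting the dots, the sketch below searches structured rows and unstructured documents for the same personal identifier and returns one combined list of hits. The function name, the table names, the file paths and the data are all hypothetical, and a real request for information would rely on indexed search and identity matching rather than a simple substring test.

```python
def find_personal_data(identifier, records, documents):
    """Collect structured rows and unstructured documents mentioning identifier."""
    needle = identifier.lower()
    hits = []
    # Structured side: check every field value of every row in every table.
    for table, rows in records.items():
        for row in rows:
            if any(needle in str(value).lower() for value in row.values()):
                hits.append(("structured", table, row["id"]))
    # Unstructured side: check the raw text of every document.
    for path, text in documents.items():
        if needle in text.lower():
            hits.append(("unstructured", path, None))
    return hits


# Hypothetical sample data standing in for CRM/ERP tables and a file share.
records = {
    "crm_contacts": [
        {"id": 1, "email": "jane.doe@example.com"},
        {"id": 2, "email": "sam@example.com"},
    ],
    "erp_invoices": [{"id": 77, "billing_email": "Jane.Doe@example.com"}],
}
documents = {
    "shares/contracts/doe.txt": "Signed by jane.doe@example.com",
    "shares/memo.txt": "Quarterly update",
}

hits = find_personal_data("jane.doe@example.com", records, documents)
```

Here a single call returns the CRM row, the ERP invoice and the contract file together, which is the combined view a privacy request needs no matter where the data lives.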
Blurring the Line Between Data Governance and Information Governance
We talk about data governance and we talk about information governance. But the lines are blurring between the two. Often, which term you use is more a matter of who you are talking to. If you are talking to IT, you refer to it as data governance, and if you are speaking to line-of-business people, you call it information governance.
In the end, we are always talking about the same thing: providing the capabilities necessary to connect to your data and content repositories regardless of where or what they are, analyzing the data they contain, figuring out how to organize, enrich and classify it (and get rid of the redundant, obsolete and trivial content, or ROT), and managing the good data according to your business and compliance policies.
Data catalogs exist today to manage structured data and file analysis solutions exist to manage unstructured data. Is there a demand for a single information/data governance catalog?
From the records management and archiving world, we get classification, taxonomy, metadata and data retention or data minimization rules by information asset class. These solutions have been offering these capabilities for the past 20 years. By merging them with data catalogs for structured data and bringing in not only records but all information (work in progress, convenience copies, other renditions, etc.), we get metadata and taxonomy alignment and we can manage all our data more effectively.
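A small sketch of what retention rules by information asset class could look like when applied uniformly to a database row, a file and a convenience copy. The asset classes, the retention periods and the `disposal_date` helper are assumptions made up for illustration; actual schedules come from your own business and regulatory requirements.

```python
from datetime import date

# Hypothetical retention periods, in years, per information asset class.
RETENTION_YEARS = {"invoice": 7, "hr_record": 10, "convenience_copy": 1}


def disposal_date(asset_class, created):
    """Return the date on which an item of this class becomes eligible for disposal."""
    target_year = created.year + RETENTION_YEARS[asset_class]
    try:
        return created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return created.replace(year=target_year, day=28)


# Structured and unstructured items share the same classification scheme.
items = [
    {"source": "erp.invoices row 77", "asset_class": "invoice", "created": date(2019, 3, 1)},
    {"source": "shares/invoice-77.pdf", "asset_class": "invoice", "created": date(2019, 3, 1)},
    {"source": "shares/copy-of-invoice-77.pdf", "asset_class": "convenience_copy", "created": date(2019, 3, 5)},
]

schedule = {
    item["source"]: disposal_date(item["asset_class"], item["created"])
    for item in items
}
```

Because the rule is keyed on the asset class rather than on where the item is stored, the database row and the PDF of the same invoice end up with the same disposal date, while the convenience copy can be removed much sooner.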
On an Everteam note, this is something we think about as we develop our information governance products (everteam.discover, everteam.policy, and everteam.archive). We already have a structured database connector in everteam.discover, mainly used for application decommissioning and to archive some of the data. We can analyze structured and unstructured data side by side. There is still work to do to make this convergence happen, and we are excited to keep moving forward to create the governance solutions enterprises need. If you’d like to learn more about our products and roadmap, do not hesitate to drop us a note.