
Tech Tuesday: Getting Started with Classification

One of the most important elements of an information governance program is the proper classification of your data. A central, formal classification scheme is critical, especially when much of the information – structured and unstructured – is used in multiple departments or teams across the organization. If data is the lifeblood of your organization, then a good classification scheme ensures that everyone can find and leverage that data in their daily work. It also means you have a proven strategy to manage that data appropriately.

 

The Benefits of Classification

Imagine trying to find a document among thousands of documents spread across multiple file shares, or file-sharing applications. Maybe you know what the document is called, or maybe you only know what its contents hold. Maybe there are multiple versions of the document or multiple copies stored by other departments. Frustrated yet? Who wouldn’t be?

Not only do you have to move from one repository to another to try to find your document, but the repositories that do offer search return so many results that it will take forever to sort through them all.

Two things could help you here, and one of them is having a company-wide classification scheme. (For this article, I’ll focus on the classification of your documents and other unstructured content.)

Now, before I go any further, I don’t want you to think that you have to drop everything you are doing and kick off a year-long project to document your entire company’s taxonomy. Not only is that unreasonable to expect, but it also has the potential to slow down your information governance efforts.

Instead, we want you to think about your classification/taxonomy planning the same way we recommend you think about information governance – in phases or projects. Build your classification scheme as you build your governance program – one step at a time. By creating your taxonomy this way, you can add new content types or build on the content types already in the taxonomy, slowly and carefully developing a classification scheme that will work for everyone.

Let’s get back to it.

Effective classification of your content provides many benefits, the most important being greater visibility into your information:

  • Identify sensitive information like PII, PCI data, and other personal information
  • Separate the good information from the ROT (redundant, obsolete, or trivial content)
  • Respond more quickly to requests for information
  • Assign cost-effective storage tiers
  • Apply appropriate security controls to prevent accidental disclosure or cyberattacks

There are many examples of the benefits of classification, but I’ll provide two:

  1. The first is responding to data subject requests under privacy regulations such as the upcoming CCPA or the GDPR. Both regulations require you to give a person all the information you store about them within a set period (GDPR generally gives you one month, CCPA 45 days). If you store customer information across many different repositories, and each repository organizes that information according to its own classification scheme, finding everything within that window will be very challenging. The only alternative is to have many employees work on it together, and then you’re spending huge amounts of resources on every data subject request.
  2. The second is the risk of a cyberattack, which everyone says is not a matter of “if,” but “when.” According to a Harris Poll conducted for Symantec in January 2018, 60 million Americans have been affected by identity theft. Much of the data needed to commit identity theft is stolen from businesses that store customer information inappropriately or secure it inadequately. The same article also notes, “Cybercriminals will steal an estimated 33 billion records in 2023. That’s according to a 2018 study from Juniper Research. This compares with 12 billion records Juniper expects to be swiped in 2018.” If you’re not classifying your information and applying appropriate security policies to it, you may find yourself among those affected businesses.
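To make the first example concrete: when every repository tags documents with the same centrally defined metadata field, answering a data subject request becomes one query across all repositories. The Python below is a minimal sketch under stated assumptions: the `Document` record, the repository names, and the shared `subject_id` field are all hypothetical, not how any particular product stores metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical document record; real repositories expose richer metadata.
    repository: str
    path: str
    metadata: dict = field(default_factory=dict)


def find_subject_documents(repositories, subject_id):
    """Collect every document, across all repositories, tagged with subject_id.

    Assumes every repository uses the same central metadata key ("subject_id"),
    which is what a company-wide classification scheme provides.
    """
    matches = []
    for docs in repositories.values():
        for doc in docs:
            if doc.metadata.get("subject_id") == subject_id:
                matches.append(doc)
    return matches


# Hypothetical repositories and customer IDs, for illustration only.
repositories = {
    "crm_share": [
        Document("crm_share", "/contracts/acme.pdf", {"subject_id": "C-1001"}),
        Document("crm_share", "/contracts/globex.pdf", {"subject_id": "C-2002"}),
    ],
    "support_app": [
        Document("support_app", "tickets/4411.txt", {"subject_id": "C-1001"}),
    ],
}

for doc in find_subject_documents(repositories, "C-1001"):
    print(doc.repository, doc.path)
```

If each repository instead used its own field name for the same information, this loop would need a separate mapping per repository, which is exactly the overhead described in the first example.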

 

Getting Started with Classification

Some people might think the first step to classification is to get a tool, but it isn’t. The first step is to bring together key stakeholders who create, curate, and work with your organization’s information, so you get a complete picture of how that information is used not only in one department or division but also in others. Keep in mind that you can still do this iteratively as you work on governance projects.

When you take the time to talk with everyone, you’re able to create a classification scheme that meets the needs of everyone. That’s very important because you don’t want one department to classify content differently from another – you’ll never be able to support regulations like CCPA. Maybe you won’t make everyone happy, but that’s not exactly the point of a central classification strategy.

After you’ve received input from key stakeholders, you can start to define content categories (or content types) and their associated metadata. Share the classification scheme with everyone so that they can follow it.

I said you don’t need tools to start, but investing in the right tools early does have certain benefits. For starters, as you define your taxonomy, you will need a place to record that taxonomy, indicating where and how it’s applied. A solution like everteam.policy can help you do that.

Our product, everteam.discover, connects to all your unstructured repositories, indexes your content, and automatically applies your classification scheme. It integrates seamlessly with everteam.policy, from which it pulls the classification scheme to apply.

In everteam.discover, you can classify content in three ways: manually, using rules (query matches), or using machine learning (scanning the contents of a content asset). Auto-classification using rules or ML is necessary when you have enormous amounts of content to classify; it will help you meet regulatory requirements much more quickly (and more accurately) than manual classification. But there are also situations where manual classification is necessary.

Machine learning makes it possible to analyze unstructured data semantically and suggest classifications based on the text it finds. You can then add these recommended classifications to everteam.policy.

Classifying Content with everteam.discover

You know how you want to classify your information, but there’s too much to do manually (one document at a time), so you bring in everteam.discover. It connects to all your repositories and indexes the content. You can then review the content through different facets or views, or search for content by a range of parameters. To classify a group of documents manually, you select them all and apply a classification category/content type from the taxonomy you previously added to the tool.
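The manual workflow can be sketched in a few lines: filter an index by facets, select the results, and apply one category, refusing any category that is not in the taxonomy. This is a minimal Python sketch assuming a hypothetical in-memory index; `TAXONOMY`, `select`, and `apply_category` are illustrative names, not everteam.discover’s API.

```python
# Hypothetical taxonomy: the categories previously added to the tool.
TAXONOMY = {"Contract", "Invoice", "HR Record"}

# Hypothetical index entries: each document carries facets such as source and file type.
index = [
    {"id": 1, "source": "finance_share", "file_type": "pdf", "category": None},
    {"id": 2, "source": "finance_share", "file_type": "xlsx", "category": None},
    {"id": 3, "source": "hr_share", "file_type": "pdf", "category": None},
]


def select(index, **facets):
    """Return the documents whose facet values match every given facet."""
    return [doc for doc in index if all(doc.get(k) == v for k, v in facets.items())]


def apply_category(documents, category):
    """Manually classify a selected group of documents with one category."""
    # Only categories from the central taxonomy are allowed.
    if category not in TAXONOMY:
        raise ValueError(f"{category!r} is not in the taxonomy")
    for doc in documents:
        doc["category"] = category
    return len(documents)


selected = select(index, source="finance_share", file_type="pdf")
print(apply_category(selected, "Invoice"))  # 1
```

Rejecting categories outside the taxonomy is what keeps manual classification consistent with the central scheme rather than letting each user invent labels.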

Once you have identified the rules for classifying documents, you can start automating. Add the rules to an everteam rules-based classifier. The classifier runs automatically whenever a new document is added and applies a category to any document that matches the rules, so newly added documents are classified without a manual step.
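A rules-based classifier can be sketched as a list of (category, condition) pairs evaluated whenever a document is added. This minimal Python sketch uses regular expressions as the “query matches”; the rule format and the `on_document_added` hook are hypothetical stand-ins, not everteam.discover’s actual rule syntax.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule: a category plus a regex that must match the document text.
    category: str
    pattern: re.Pattern


RULES = [
    Rule("Invoice", re.compile(r"\binvoice\s+(no\.|number)", re.IGNORECASE)),
    Rule("Contract", re.compile(r"\bthis agreement\b", re.IGNORECASE)),
]


def classify(text, rules=RULES):
    """Return the category of every rule that matches; empty if none match."""
    return [rule.category for rule in rules if rule.pattern.search(text)]


def on_document_added(doc):
    """Hypothetical hook: run the classifier as soon as a document is indexed."""
    doc["categories"] = classify(doc["text"])
    return doc


doc = on_document_added({"text": "Invoice Number 2231, due in 30 days"})
print(doc["categories"])  # ['Invoice']
```

Returning every matching category, rather than only the first, leaves it to policy to decide how to handle a document that matches several rules; documents that match nothing stay unclassified and can go to manual review.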

Machine learning is the third way to classify content in everteam.discover. It analyzes your content and suggests classifications. For machine learning to work, you have to provide a training set of documents for each classification for everteam.discover to learn from. As more content is indexed and classified, it gets better at assigning the correct classification.
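To illustrate the idea of learning from a training set per classification, here is a minimal multinomial Naive Bayes classifier in plain Python that returns ranked suggestions. This is a swapped-in, textbook technique chosen for illustration; the post doesn’t say which model everteam.discover uses, and the tokenizer and training data are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # training documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()

    def train(self, training_sets):
        """training_sets maps each category to a list of example documents."""
        for category, documents in training_sets.items():
            for text in documents:
                words = tokenize(text)
                self.doc_counts[category] += 1
                self.word_counts[category].update(words)
                self.vocabulary.update(words)

    def suggest(self, text):
        """Return (category, log-score) pairs, most likely category first."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = []
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # prior probability of the category
            for word in words:
                # Laplace smoothing keeps unseen words from zeroing the probability.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores.append((category, score))
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


classifier = NaiveBayesClassifier()
classifier.train({
    "Invoice": ["invoice total amount due payment", "payment due invoice number"],
    "HR Record": ["employee performance review salary", "employee leave request"],
})
print(classifier.suggest("amount due on this invoice")[0][0])  # Invoice
```

Retraining on documents that reviewers have confirmed or corrected is what lets a classifier like this improve as more content is classified.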

Here’s a look at everteam.discover’s classifier feature:

[Screenshot: everteam.discover classifier]

It’s not always possible to let the machine apply your classifications; you may need to give certain employees a way to apply classifications manually. A good example is identifying and dealing with ROT. You may be able to start with auto-classification, but you should have some human review to ensure you dispose only of information that is no longer required.
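That split between automation and human judgment can be sketched as follows: code flags likely ROT (exact duplicates, or files untouched past a cutoff), but nothing is disposed of until a reviewer approves it. This is a minimal Python sketch assuming hypothetical file records with `path`, `data`, and `modified` keys; the seven-year cutoff is an arbitrary example, not a recommendation.

```python
import hashlib
from datetime import date, timedelta


def content_hash(data: bytes) -> str:
    """Fingerprint file contents so exact duplicates can be detected."""
    return hashlib.sha256(data).hexdigest()


def flag_rot_candidates(files, today, stale_after=timedelta(days=7 * 365)):
    """Return (path, reason) pairs for files that look redundant or obsolete.

    Each file is a dict with hypothetical keys "path", "data" and "modified".
    The result is only a list of suggestions; nothing is deleted here.
    """
    first_seen = {}  # content hash -> path of the first file with that content
    candidates = []
    for f in files:
        digest = content_hash(f["data"])
        if digest in first_seen:
            candidates.append((f["path"], f"duplicate of {first_seen[digest]}"))
        else:
            first_seen[digest] = f["path"]
            if today - f["modified"] > stale_after:
                candidates.append((f["path"], f"not modified in over {stale_after.days} days"))
    return candidates


def approved_for_disposal(candidates, approved_paths):
    """Keep only the candidates a human reviewer has explicitly approved."""
    return [path for path, _reason in candidates if path in approved_paths]


files = [
    {"path": "a/report.docx", "data": b"Q1 report", "modified": date(2023, 3, 1)},
    {"path": "b/report copy.docx", "data": b"Q1 report", "modified": date(2023, 3, 2)},
    {"path": "old/memo.txt", "data": b"1998 memo", "modified": date(2010, 1, 1)},
]
candidates = flag_rot_candidates(files, today=date(2024, 1, 1))
for path, reason in candidates:
    print(path, "->", reason)
print(approved_for_disposal(candidates, approved_paths={"b/report copy.docx"}))
```

Here the reviewer approves only the duplicate copy; the old memo stays flagged but untouched until someone confirms it is no longer required.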

I’ve only provided a quick overview of how you can use everteam.discover to apply your taxonomy to your content. There’s a lot more to understand about using classifiers and training a machine learning classifier, topics we’ll cover in upcoming blogs, so make sure you sign up for our newsletter to hear about new posts when we publish them.

Classification is not a one-time job

Whether you do it all at once (not recommended if you want to get things done) or in phases by initiative, classification is not a one-time job. You can’t define it once and assume it will work that way forever. Managing classifications (taxonomy) is an ongoing process: you add new content types, existing content changes, and the rules for how you manage information change (new regulations, changes to existing regulations). How you want to use your information to support decisions will also affect how you classify it.

To help you manage your taxonomy on an ongoing basis, you can use everteam.policy. It enables you not only to define and manage your current taxonomy, but also to define retention and life cycle management rules, identify access permissions, and share all this information with the people and systems across the organization that need to know and follow these classification rules.

I’ll leave you with one final note about classifying your information. A classification content type (or category, depending on the term you use) should provide several things:

  • Description of the content type and all associated metadata/attributes
  • The rules for handling that information
  • How and where to store it
  • How to dispose of it when it’s no longer needed
  • The security/permissions to apply to it to ensure only the right people have access
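One way to capture all five of these elements is a single record per content type. The sketch below is a minimal Python illustration; the field names, measuring retention from the creation date (real schedules often start from other trigger events, such as a contract ending), and the role-based access check are assumptions for the example, not everteam.policy’s data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ContentType:
    name: str
    description: str
    metadata_fields: tuple     # attributes every document of this type carries
    handling_rules: str        # rules for handling the information
    storage_location: str      # how and where to store it
    retention: timedelta       # how long to keep it (assumed: from creation date)
    disposal_method: str       # how to dispose of it when no longer needed
    allowed_roles: frozenset   # who may access it

    def disposal_due(self, created: date) -> date:
        """Date after which a document of this type is eligible for disposal."""
        return created + self.retention

    def can_access(self, role: str) -> bool:
        """True if the given role is permitted to access this content type."""
        return role in self.allowed_roles


# Hypothetical example values, for illustration only.
invoice = ContentType(
    name="Invoice",
    description="Invoices issued to customers",
    metadata_fields=("invoice_number", "customer_id", "issue_date"),
    handling_rules="Contains payment data; never send unencrypted",
    storage_location="finance archive, standard storage tier",
    retention=timedelta(days=7 * 365),
    disposal_method="secure deletion",
    allowed_roles=frozenset({"finance", "audit"}),
)

print(invoice.can_access("marketing"))  # False
```

Keeping the handling, storage, disposal, and permission rules on the content type itself means every system that classifies a document as an “Invoice” can look up the same rules from one place.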

If you’re interested in learning more about how everteam.discover can support the classification of your information (including the 80% dark data hidden in your repositories), reach out and request a demo.