CloudLock – Unstructured Data Governance

By Subraya Mallya - June 2010 | Topics - Cloud Computing, Data Security

Note: Aprigo has since changed its name to CloudLock. So any reference to Aprigo is just for history.

If data overload, file-share sprawl, data breaches, and regulatory compliance are part of your day-to-day jargon at work, then this should interest you.

In this age of exponential information growth and a dynamic global workforce, companies have had to constantly adopt new technologies to rein in the content being created on a daily basis. The reasons are many: intellectual property protection, regulatory compliance, privacy, and more. Technologies like document management and content management, besides the traditional ERP/CRM/SCM applications, have provided companies with a slew of tools to organize information, but gaining control over the information deluge has remained elusive. Depending on whose estimates you trust, analysts put a company's information at roughly 70% unstructured and 30% structured. With cloud computing becoming the preferred model for future IT infrastructure, the challenge for IT organizations to rein in the sprawl is becoming that much more complex. The proliferation of unstructured information now extends past on-premise file shares and email onto cloud-based storage and applications like Amazon S3 or Google Docs.

As part of our technology review series, I recently had a chance to talk to Gil Zimmerman, founder and CEO of CloudLock. Here are some excerpts of the discussion.

SM: Can you give us a little bit of background on CloudLock – its genesis, how long it has been in existence, and your product offering?

GZ: CloudLock was started as a pre-seed company along with my two co-founders, Tsahy Shapsa and Ron Zalkind. We raised some pre-seed money, just enough to give us runway to go out and explore the many different ideas we thought were viable before raising a Series A on one final idea. We talked to many industry experts, customers, vendors, and fellow entrepreneurs to figure out what the real pain point was and which solution we wanted to focus on. That is how CloudLock came about. We focused on three core areas – storage, networking, and security – and arrived at this offering after talking to customers and understanding the big pain they face around the ever-growing challenge of managing data, specifically unstructured data, i.e., the documents spread across the company. We found that there were very few tools available for business and IT to effectively manage that data in terms of security, recoverability, and storage costs. We also found that the best way to deliver a solution to our target market – small and medium-sized businesses – was through a SaaS offering. That way we could deliver all the benefits they were looking for without adding any additional IT overhead.

SM: How do you define Small and Medium businesses in your world and why that focus?

GZ: For us, small and medium businesses are companies ranging anywhere from $50-500M in revenue. The problem they face is that they are priced out of the large offerings in the content management space and, at the same time, short on resources to realistically manage additional IT deployments and infrastructure. All they are looking for is a simple solution that is easy to deploy and support and does the job for them.

SM: Tell us a little bit about the SaaS offering, the technology involved.

GZ: After we identified the market we wanted to go after, we wanted to build something very light that would be agentless, very easy to deploy, and consequently simple to sell and support – then grow the product with the customers. We built a prototype: a collection service that goes off and collects metadata from a customer's on-premise resources, correlates that data across systems, and presents it in a rich visual interface. All customers have to do is download a utility onto a VM; it will traverse the network and collect the metadata. It uses standard network protocols like CIFS, NFS, and LDAP, and no changes to the network configuration are required. We will, however, run multiple collectors to ensure geographical coverage of remote locations.
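The collection approach Zimmerman describes – a utility that traverses mounted shares and records only file metadata, never contents – can be sketched roughly as follows. This is a simplified illustration, not CloudLock's actual code; the record field names are assumptions:

```python
import os
import time

def collect_metadata(root):
    """Walk a mounted file share and record per-file metadata
    without ever opening or transferring file contents."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)  # external properties only; file stays closed
            except OSError:
                continue  # skip files that vanish or deny access mid-scan
            records.append({
                "path": path,
                "size_bytes": st.st_size,
                "owner_uid": st.st_uid,  # correlated with LDAP/AD later
                "last_accessed": time.ctime(st.st_atime),
                "last_modified": time.ctime(st.st_mtime),
            })
    return records
```

A real collector would read ACLs over CIFS/NFS and pull group data over LDAP, but the principle is the same: only metadata leaves the machine.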

The key emphasis was on visualization of the data. Given the nature and volume of files in a given terabyte of storage, it was clear to us that presenting that in tabular form would be a challenge. Instead of handing data to customers and letting them make sense of it, we took the extra step of providing a visual representation and highlighting the key information they should be paying close attention to. We gave them tools to proactively look for issues like storage capacity and data breaches. We present all the information in a dashboard that allows companies to see potential risks.

SM: Are there any restrictions on the document types you support, or, given that you look at the standard metadata attributes, does it not matter which document type it is?

GZ: We look at 28 different non-permission-related metadata attributes, plus the permissions and LDAP repositories. Essentially, we look at every external property of the file without opening or looking into the document. No actual file is opened and no actual file is transferred offsite. That should reduce the hesitation around critical documents going outside the network. We are not a content indexing company, so we don't look inside the document.

SM: Let us talk about the specific area you chose to focus on – unstructured data.

GZ: Very early in our exploration, we noticed that of all the IT resources in a company, files are the only resource that IT is accountable for but does not control or provision. Every employee in the company can create, copy, modify, delete, move, and share them. The same is not true of any other IT resource, whether applications, hardware, or software. So IT really gets far outnumbered here. It was not an infrastructure problem; it was truly a gap between infrastructure and business. A CFO wants to ensure financial information is secure; marketing wants to ensure a key product announcement is not leaked. In some cases companies are required by regulatory mandates to secure certain sets of documents. It becomes very challenging for IT to conform to those guidelines while balancing infrastructure needs and business needs. That is where we come in. We take the metadata from file systems and Active Directory and cross-correlate them to help IT answer questions around who has access to what information. For example, through the visualizations they can easily find out who has access to a particular folder, and how and when they got that access. The same can be done the other way around: say an employee leaves the company – you can easily find all the files and documents they had access to. Effectively, we tell companies what data is exposed in their environment so they can go and address those loopholes.
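The cross-correlation Zimmerman describes boils down to joining two metadata sources: folder ACLs (from the file systems) and group membership (from Active Directory/LDAP). A toy in-memory sketch – folder paths, group names, and users here are all illustrative, not real data:

```python
# Hypothetical snapshots of the two sources being cross-correlated.
folder_acls = {
    "/finance/quarterly": {"finance-team", "executives"},
    "/marketing/launch":  {"marketing-team"},
}
group_members = {
    "finance-team":   {"alice", "bob"},
    "executives":     {"carol"},
    "marketing-team": {"bob", "dave"},
}

def who_can_access(folder):
    """Answer 'who has access to this folder?' by expanding group ACLs."""
    return {user
            for group in folder_acls.get(folder, set())
            for user in group_members.get(group, set())}

def accessible_by(user):
    """The reverse question: everything a (departing) employee can reach."""
    groups = {g for g, members in group_members.items() if user in members}
    return {f for f, acl in folder_acls.items() if acl & groups}
```

With this model, `who_can_access("/finance/quarterly")` expands to alice, bob, and carol, while `accessible_by("bob")` flags both folders – exactly the departing-employee question raised above.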

SM: Talk a little bit about governance and compliance mandates. Your focus is the mid-market, and at the same time most governance mandates are targeted at larger companies.

GZ: When I say small and medium businesses, we are talking about companies in the $500M range. We have customers that are medium-sized but public. Also, in terms of governance, companies are still subject to compliance and regulatory mandates, either directly or flowed down through their relationships with larger partners – they provide services or products that become part of the larger company's offering. We are increasingly seeing mandates like the new Massachusetts privacy laws and California state regulatory mandates being rolled out broadly across other states. As such, governance is one of our value propositions, alongside data management and capacity and cost management.

SM: What do you do in the storage area?

GZ: At the same time as we are collecting metadata about files scattered across the company's file shares, we also collect an inventory of what is on each disk: who the data belongs to, when it was last accessed, the nature of the data, growth patterns, etc. To help IT managers bridge the gap with the business, we introduced a simple cost calculator that allows them to assign a dollar value to the cost of managing data, and we apply that value to every metric being managed. Based on those calculations, companies can start making decisions about the best way to manage data – for example, moving the least-accessed data to low-cost storage. Cleaning up orphaned data will in turn postpone future storage purchases, reduce backup capacity, and shrink backup windows.
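The kind of calculation such a cost calculator performs can be sketched as below. The per-GB rates and the staleness threshold are invented for illustration, not CloudLock's defaults:

```python
from datetime import datetime, timedelta

def tiering_savings(files, primary_cost_per_gb, archive_cost_per_gb,
                    stale_after_days=365, now=None):
    """Estimate monthly savings from moving least-accessed ('stale')
    data to low-cost storage. `files` is a list of dicts with
    'size_gb' and 'last_accessed' (datetime) keys."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=stale_after_days)
    # Total GB not touched within the staleness window.
    stale_gb = sum(f["size_gb"] for f in files if f["last_accessed"] < cutoff)
    # Savings = stale volume times the per-GB price gap between tiers.
    return stale_gb * (primary_cost_per_gb - archive_cost_per_gb)
```

For example, 100 GB untouched for two years, with primary storage at a hypothetical $0.50/GB/month and an archive tier at $0.10, yields $40/month in savings from a single move.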

SM: So does Cloud based storage come into the discussion there?

GZ: That is the direction a lot of our customers are pushing us. In a way, companies are using CloudLock as a cloud enabler, since we provide a unified view of on-premise and cloud-based data. Right now we provide a view into Google Docs as our first cloud-based storage option. With the hybrid storage options companies are exploring, we can give them ways to compare and contrast the cost differences and move information to a cost-appropriate option.

SM: Tell us about visualization of documents stored in cloud-based options. What do you do there?

GZ: It is not all that different from on-premise document storage; the same challenges exist in the cloud. We have a 95% hit rate on finding highly sensitive information that is exposed either to everyone in the company or to a large subset of employees, or permissions that were granted and never revoked. This is a big problem on-premise, but at least there a user has to physically attach himself or herself to the network. When documents move to the cloud, the problem becomes an order of magnitude larger. With the cloud, the IT manager has zero visibility into the files that belong to their company; the only way to get to them is through APIs. We use those APIs when available, or in other cases deploy our collectors on cloud servers to collect that information and provide the visualization.

That said, I want to make sure this does not sound like we are against the cloud. We are big believers in the cloud, and in fact everything we do is based on the cloud.

SM: Currently you support Google Docs. Are there any plans to support Amazon S3, Rackspace, Microsoft Azure, etc.?

GZ: Google is the most widely used offering today, so we focused on Google first. We will take it one at a time, and based on customer demand we will support all the leading cloud-based storage offerings in the future.

SM: What is your pricing model? Do you charge companies by gigabytes of scanned storage or per user?

GZ: Access comes to $1 per employee per month, which includes 1TB of scans. Additional scanning is $25 per month per TB. So if you have 500 employees, it comes to $500 per month, plus $50 for 2 additional TBs of scans, for a total of $550 per month. The minimum subscription is $100 per month.
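The quoted pricing reduces to a simple formula. Whether the $100 floor applies before or after the per-TB charge is not stated, so applying it to the combined total is an assumption here:

```python
def monthly_price(employees, scanned_tb):
    """CloudLock's quoted 2010 pricing: $1 per employee per month,
    first 1 TB of scanning included, $25/month for each extra TB,
    subject to a $100/month minimum (floor placement assumed)."""
    per_employee = employees * 1
    extra_tb = max(0, scanned_tb - 1)  # first TB is included
    return max(100, per_employee + extra_tb * 25)
```

This reproduces the interview's example: 500 employees scanning 3 TB is $500 + 2 × $25 = $550/month.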

SM: One of the challenges with most governance and security solutions is quantifying ROI. If nothing goes wrong, the ROI is not tangible. Do you run into that challenge when walking into a customer or trying to renew a contract?

GZ: Yes. It is the same mentality as with any security solution. Obviously, for people who have experienced a data breach or are undergoing a data audit, the conversation is a lot easier and the ROI resonates very quickly. Either dollars are spent on a prolonged audit or dollars are spent on a system that provides the necessary answers. We try to focus on companies that already have a good understanding of compliance and governance; as a small company, it is tough for us to evangelize governance and compliance and then try to sell our software. Our solution is so affordable that even small companies can easily budget for it. If you are looking for soft ROI, there are data sheets from analysts that quantify the cost of a single data breach – somewhere in the range of $80K to $300K per incident. Compared to those costs, adopting our solution becomes an easy decision.

On the other hand, we have our capacity management component, which clearly shows ROI. Our built-in calculator will show you true hard savings in dollars right after a single scan, based on files that are least or never accessed. Data doubles year over year, so the problem – and the savings – add up.

SM: Thanks for taking time and sharing information on CloudLock with us.

Here is a quick view of the CloudLock solution.
