How much data does your organization generate? The answer may surprise you. The benefits that come from leveraging this data will surprise you even more. QTR Systems has the resources and experience to help your organization unleash its data and begin drawing insights about how you do business.
What is “Big Data”?
“Big Data” is a term that refers to a package of services that allow for the mining of “unstructured” data. Unstructured data usually consists of data or files that do not adhere to a specific format.
“Structured” data, on the other hand, usually refers to data that is stored in a consistent, standard format. Files that contain “structured” data can typically be imported directly into a relational database because they adhere to a standard format. Examples of structured data could include anything from spreadsheets to relational databases and backup tapes. The important thing to remember is that “structured” data must have a consistent format common across all instances of the type of file in question. When dealing with “unstructured” data, the expectation is that there is little to no consistency in the format of the data.
How could one use “unstructured” data?
Here are a couple of use cases:
Let’s start with a simple use case. Let’s say that an employee has left your organization. This employee kept meticulous notes in flat text files that she authored in Notepad. You want to find every instance of her speaking with a specific client and pull out any dates that could refer to a meeting. With a “big data” platform, these files could be evaluated with a process called “MapReduce”, which can quickly scan them against a set of queries and expressions to find the client’s name, the word “meet”, and any string of words that refers to a date. While this would hardly be a real-world use case for this technology, it should help you start thinking about ways that you can leverage this kind of data.
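The note-scanning idea above can be sketched in a few lines of Python. This is a minimal illustration rather than a Hadoop job: the file names, the note text, the client name “Acme Corp”, and the two date formats are all hypothetical, and the map and reduce steps run on one machine instead of a cluster.

```python
import re
from collections import defaultdict

# Hypothetical sample notes; in practice these would be read from text files.
NOTES = {
    "notes_jan.txt": "Call with Acme Corp. Agreed to meet on 2015-01-14.\n"
                     "Lunch order: sandwiches.",
    "notes_feb.txt": "Acme Corp meeting moved to 02/03/2015.\n"
                     "Reminder: renew parking pass.",
}

# Assumed search expressions: a client name, the word "meet" (and variants),
# and two common numeric date formats.
CLIENT = re.compile(r"Acme Corp", re.IGNORECASE)
MEET = re.compile(r"\bmeet\w*", re.IGNORECASE)
DATE = re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4})\b")


def map_line(filename, line):
    """Map step: emit (filename, finding) pairs for one line of text."""
    if not CLIENT.search(line):
        return
    if MEET.search(line):
        yield filename, ("meeting_mention", line.strip())
    for date in DATE.findall(line):
        yield filename, ("date", date)


def reduce_findings(pairs):
    """Reduce step: group all findings by the file they came from."""
    grouped = defaultdict(list)
    for filename, finding in pairs:
        grouped[filename].append(finding)
    return dict(grouped)


def scan_notes(notes):
    """Run the map step over every line, then reduce the results."""
    pairs = (
        pair
        for filename, text in notes.items()
        for line in text.splitlines()
        for pair in map_line(filename, line)
    )
    return reduce_findings(pairs)
```

Calling `scan_notes(NOTES)` returns, for each file, the lines that mention both the client and a meeting, along with any dates found on lines that mention the client. Dates written out in words would need additional expressions.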
Let’s now consider a more “real world” use case. Let’s say, for instance, that you are in charge of a manufacturing plant. Your quality department wants a better understanding of how machine performance impacts the frequency of manufacturing defects. Meanwhile, your supply chain department wants to compare part vendors on the quality and price of like parts for use in future price negotiations. On top of this, your customers want complete part traceability: product genealogy from the time the vendor packages the raw materials for your company’s product to the time your company ships the finished product.
You have data from your company’s ERP system, your manufacturing system, your shipping system, and telemetry from your manufacturing equipment. “Big Data” platforms allow you to bring all of this data together and query it in one place, which lets you correlate records on common keys. In this use case, you find that you can key your shipping system to your manufacturing system by serial number. Then, you match the data in your manufacturing system to equipment telemetry based upon the date and time that your product went through each manufacturing step. Next, you find that you can match the model number behind each serial number to a bill of materials, which is in turn keyed to the GRNs (goods received notes) for each part in SAP. Finally, you match the GRNs to PO numbers, which carry vendor and point-of-origin information for the parts that make up your final product.
At this point, you can generate product genealogy reports for your customers, reports for your supply chain department that tally defects by part vendor, and a report for your quality team that shows defects by hour, by machine, and by serial number.
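The chain of keys in this use case (serial number to model number, model number to bill of materials, part to GRN, GRN to PO) can be illustrated with a small Python sketch. Every table, identifier, and vendor name below is made up for illustration, and the telemetry match by date and time is left out to keep the example short.

```python
# Hypothetical extracts from each source system, keyed the way the text describes.
SHIPMENTS = {"SN-1001": {"customer": "Customer A", "shipped": "2015-03-02"}}
MANUFACTURING = {"SN-1001": {"model": "M-10"}}
BILL_OF_MATERIALS = {"M-10": ["P-1", "P-2"]}
GRN_BY_PART = {"P-1": "GRN-501", "P-2": "GRN-502"}
PO_BY_GRN = {
    "GRN-501": {"po": "PO-9001", "vendor": "Vendor X", "origin": "Ohio"},
    "GRN-502": {"po": "PO-9002", "vendor": "Vendor Y", "origin": "Texas"},
}


def genealogy(serial):
    """Follow the keys from a shipped serial number back to vendor POs."""
    shipment = SHIPMENTS[serial]               # shipping system, by serial number
    model = MANUFACTURING[serial]["model"]     # manufacturing system, by serial number
    parts = []
    for part in BILL_OF_MATERIALS[model]:      # bill of materials, by model number
        grn = GRN_BY_PART[part]                # goods received record for the part
        purchase_order = PO_BY_GRN[grn]        # PO with vendor and origin details
        parts.append({"part": part, "grn": grn, **purchase_order})
    return {
        "serial": serial,
        "model": model,
        "shipped": shipment["shipped"],
        "parts": parts,
    }
```

Calling `genealogy("SN-1001")` returns one record that links the shipped unit to each part, its GRN, and the vendor that supplied it. On a real platform the same joins would run as queries over far larger data sets.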
What’s “under the hood”?
The main components of a “Big Data” platform can be thought of as a data processing layer and a storage layer. The data processing layer will almost always consist of a variant of “Hadoop”. Hadoop is a free, open source platform that is integrated into many of today’s most cutting-edge business analytics engines.
At the core of Hadoop is MapReduce, which is both a programming model and a method of data processing. In essence, MapReduce is meant for evaluating large data sets across a set of distributed computers. That is, you can harness the power of a large number of computers to execute a query against a large data set, with each computer working on its own portion of the data in parallel. For very large data sets, queries that would take hours to complete in a relational database can often be completed in minutes with MapReduce.
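To make the programming model more concrete, here is a minimal single-machine sketch of the three phases usually described for MapReduce: map, shuffle, and reduce. The record format (`machine,status`) and the machine names are hypothetical. In a Hadoop cluster each chunk would be processed on a different node, whereas here the chunks are simply processed one after another.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one raw record into zero or more (key, value) pairs."""
    machine, status = record.split(",")
    if status == "defect":
        yield machine, 1


def shuffle(pairs):
    """Shuffle: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    """Run map over every record in every chunk, then shuffle and reduce."""
    mapped = [
        pair
        for chunk in chunks          # each chunk stands in for one node's share
        for record in chunk
        for pair in map_phase(record)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Given two chunks such as `[["press-1,defect", "press-2,ok"], ["press-1,defect", "lathe-3,defect"]]`, `map_reduce` returns a defect count per machine. Because the map step looks at one record at a time, the work divides naturally across as many machines as are available.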
The storage layer of Hadoop is simply a place to store unstructured data. This is referred to as the Hadoop Distributed File System, or “HDFS”. It is a distributed filesystem, meaning that each file is split into pieces that are stored across multiple nodes. Each piece is also copied to several other nodes (three copies by default), so the Hadoop cluster can lose one or more nodes without a loss of data or service. Aside from the obvious benefit of resiliency, this makes maintenance and deployment a seamless proposition for those who use the Hadoop cluster. In short, your data is available whenever you need it.
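The resiliency described here can be illustrated with a toy model in Python. This is not how HDFS actually chooses where to put data (its real placement policy also considers racks and free space); it is a simplified round-robin sketch that assumes each block is copied to a fixed number of nodes, with three replicas matching the HDFS default. The node names are hypothetical.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes) so that the replicas of a block
    land on different nodes.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def all_blocks_available(placement, failed_nodes):
    """Return True if every block still has a replica on a healthy node."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )
```

With five nodes and three replicas per block, `all_blocks_available` stays true after any two nodes fail, because at least one copy of every block remains. It becomes false once all three nodes holding the same block are lost.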
The data and query processing platform can range from distributions built on open source Hadoop (such as Hortonworks or Cloudera) to fully commercial solutions. The performance and visualization of a query can be improved through the use of in-memory data engines, such as SAP HANA, which is a commercial system. For example, one could implement Hortonworks with SAP HANA to provide query acceleration and easier data visualization.
For the storage layer, many organizations are assembling “Data Lakes” as a common collection point for unstructured data. It is the “Data Lake” that makes use cases such as our “real world” example much easier. At the infrastructure level, “data lakes” typically sit on top of a cloud platform, commodity on-premises hardware, or a combination of both.
The selection of platforms for the data processing and storage layers depends greatly upon your organization’s internal technical expertise and on what economies of scale make the most sense for your business and its governing factors.
If you would like to understand how your data can help to accelerate, improve, and transform the way that you do business, contact QTR Systems to arrange a one-on-one discussion with one of our professionals.