Solutions Optimize Long-Term Data Management
By Manuel Terranova
SAN JOSE, CA.–Datasets in oil and gas can be a mess, more like the disorganized, sprawling, labyrinthine streets of Boston or London than the ordered Manhattan gridiron. Over decades, oil and gas companies have amassed petabytes of data from exploration and product design activities. As this already heavy volume of largely unstructured data keeps growing, it is becoming increasingly difficult to manage.
Washing over engineering and research teams globally is the realization that decades of back-room-led “storage strategy” (emphasis on the word “storage”) have underserved the teams that most need to find, access and re-harvest the crown jewel datasets. The traditional approach has centered on putting information into a system. Now, business, engineering and research leaders are talking about getting the data out, namely finding and accessing it.
During a discussion about simulation and geometry datasets, a senior-level engineer at a Fortune 20 company said it best: “If you can’t find it, then you haven’t saved it.” Engineers, researchers and scientists routinely run into fundamental problems with getting ready access to data, finding data and aggregating related datasets into one logical place.
For example, one industrial manufacturer describes having trouble aggregating simulation and analysis datasets, created early in the product design cycle, with data that come in later, namely test and field data from remote monitoring and diagnostics. Let’s face it: that company is not alone.
The “magic moment” comes when engineering and business leadership understand that these crown jewel datasets should be managed as assets, not as storage. This realization expresses itself in profound ways. For starters, the engineering team becomes a much more prominent actor in determining what works and, in many cases, demands direct access to data management tools. Bringing these datasets under a single flexible construct (i.e., a single distributed, unified namespace) requires the ability to scale to hundreds of petabytes, along with on-the-ground engagement with research and engineering staff as the primary customers.
Largely proprietary software/hardware storage constructs make it extremely difficult to logically group data that belong together, especially once one considers a dataset time horizon that extends beyond one refresh cycle (approximately three to five years). Today, end users are left without a way to manage the data, and over time the result erodes a company’s intrinsic value because it cannot harvest these mission-critical datasets.
On a technical level, these data will always be distributed far and wide, but that does not mean they cannot be properly managed and governed as an asset and made to appear unified and accessible to end users (i.e., the engineers, technicians and scientists at oil and gas companies). Unifying data through latest-generation “data abstraction” (i.e., virtualization) applications can deliver game-changing control over and visibility into the data.
Data Abstraction Benefits
Data abstraction enables companies to reconcile (finally!) the long-term nature of these datasets with the three- to five-year tech refresh cycle of the hardware on which they are housed. That cycle has proven enormously disruptive for end users, who have no “map” for navigating dozens to hundreds of mount points, none of which is organized consistently. Every time a tech refresh is executed, files are migrated, links are broken, pathnames change and tribal knowledge suffers exponential decay.
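To make the idea concrete, here is a minimal Python sketch of the kind of mapping layer a data abstraction product might maintain, assuming a simple in-memory table. The class `UnifiedNamespace`, its methods and the example paths are hypothetical, not any vendor’s API; a real system would add persistence, permissions and replication. What it shows is the core mechanism: engineers address a file by a stable logical path, and a tech refresh rewrites only the physical location behind that path.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedNamespace:
    """Hypothetical sketch: maps stable logical paths to physical locations.

    Users address data only by logical path; the physical location can be
    changed during a hardware refresh without breaking those paths.
    """
    # logical path -> current physical location (e.g., "nas-old:/vol3/...")
    _locations: dict[str, str] = field(default_factory=dict)
    # logical path -> every physical location the data has lived at, in order
    _history: dict[str, list[str]] = field(default_factory=dict)

    def register(self, logical_path: str, physical_location: str) -> None:
        """Add a dataset under a logical path that should never change."""
        if logical_path in self._locations:
            raise ValueError(f"already registered: {logical_path}")
        self._locations[logical_path] = physical_location
        self._history[logical_path] = [physical_location]

    def resolve(self, logical_path: str) -> str:
        """Return where the data physically lives today."""
        return self._locations[logical_path]

    def migrate(self, old_prefix: str, new_prefix: str) -> int:
        """Repoint every entry stored under old_prefix to new_prefix.

        Models a tech refresh: physical paths change, logical paths do not.
        Returns the number of entries moved.
        """
        moved = 0
        for logical_path, location in self._locations.items():
            if location.startswith(old_prefix):
                new_location = new_prefix + location[len(old_prefix):]
                # Updating an existing key while iterating is safe in Python.
                self._locations[logical_path] = new_location
                self._history[logical_path].append(new_location)
                moved += 1
        return moved

    def history(self, logical_path: str) -> list[str]:
        """Return a copy of the dataset's physical-location history."""
        return list(self._history[logical_path])


if __name__ == "__main__":
    ns = UnifiedNamespace()
    ns.register("/turbines/T-100/sim/ndt_2009.h5", "nas-old:/vol3/sim/ndt_2009.h5")
    ns.migrate("nas-old:/vol3/", "object-store:/eng/")
    # The logical path is unchanged; only the physical target moved.
    print(ns.resolve("/turbines/T-100/sim/ndt_2009.h5"))
```

Keeping a location history alongside the current mapping is a design choice in this sketch: it preserves the “map” that the refresh cycle otherwise erases, so a dataset can be traced across several hardware generations.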
As a result, it is easy for engineers, and the organization in general, to lose track of these large datasets over time. This is an especially vexing problem for oil and gas companies because the data often remain valuable for anywhere from 10 to 50 years. The tech refresh cycle, which is always underway in some part of an organization’s “tech stack,” is a primary reason crown jewel datasets get lost within data-rich companies.
This is not to say that it is impossible for oil and gas test engineers and technicians to keep tabs on nondestructive testing simulations of, say, a turbine that was originally developed 15 years ago and is still in the field today. Engineers are creative folks, and they very often find innovative ways to keep track of mission-critical files that the information technology department moves around every few years.
However, this ingenuity exacts a heavy toll on the asset team’s productivity. In our observations, for instance, product designers spend on average between 45 and 120 minutes each work day finding, moving, repositioning or otherwise getting a handle on mission-critical geometry, simulation and telemetry datasets. The situation is likely much the same for engineers and geoscientists. Imagine what these professionals could accomplish if they could access these files as quickly and easily as an ordinary Microsoft Word® document on a personal hard drive, not just today, but a decade or two from now as well.
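To put the cited range in perspective, the short Python calculation below converts minutes per day into hours per year and into a share of the working day. The 230 working days per year and the 8-hour day are illustrative assumptions, not part of the observations cited above, and the function names are hypothetical; adjust the constants to a given company’s calendar.

```python
# Illustrative assumptions, not from the observations cited in the article:
# 230 working days per year and an 8-hour (480-minute) working day.
WORK_DAYS_PER_YEAR = 230
MINUTES_PER_WORK_DAY = 480


def hours_lost_per_year(minutes_per_day: float,
                        work_days: int = WORK_DAYS_PER_YEAR) -> float:
    """Convert daily time spent wrangling data into hours per year."""
    return minutes_per_day * work_days / 60


def share_of_workday(minutes_per_day: float,
                     workday_minutes: int = MINUTES_PER_WORK_DAY) -> float:
    """Fraction of a working day spent finding and moving data."""
    return minutes_per_day / workday_minutes


if __name__ == "__main__":
    # 45 and 120 minutes a day are the bounds cited in the article.
    for minutes in (45, 120):
        print(f"{minutes} min/day -> {hours_lost_per_year(minutes):.1f} h/year, "
              f"{share_of_workday(minutes):.0%} of the workday")
```

Under those assumptions, the cited range works out to roughly 170 to 460 hours per designer per year, or about 9% to 25% of each working day.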
Traditional monolithic architectures and departmental IT approaches to data storage over the past three decades or so have worked against the way engineering teams need to work with these mission-critical unstructured datasets. If oil and gas companies want to stay competitive in the coming years, their technical teams will need instant access to geometry, simulation and telemetry datasets over the lifespan of each piece of serialized equipment on every asset. For example, 20 years down the road, engineers will need instant access to both the telemetry data collected by sensors over time and the original test bench data, along with the ability to compare the two immediately, in order to conduct predictive maintenance or execute a product upgrade.
Beyond the long-term challenges posed by swapping out technology hardware every few years, oil and gas companies face more immediate limitations on their ability to store and manage large datasets. Product lifecycle management (PLM) systems, generally used by large product designers, are supposed to make data easy to find, but current market leaders in the space have scaling challenges: they can store geometry (engineering drawings), but they are not equipped to grow with other rapidly expanding, critical unstructured datasets, such as telemetry data or simulations that now generate anywhere from 90 gigabytes to 2 terabytes per test.
Companies handling engineering and seismic data generally use applications that silo datasets and rely heavily on cumbersome database schemes to track where the data reside. These suboptimal monolithic designs also contribute to data being misplaced over time and create “data islands.”
Managing The ‘Crown Jewels’
Prevailing storage and archive architectures are simply not set up to scale gracefully at the rate data expand in contemporary oil and gas applications. As the situation grows more severe, engineers and data scientists will keep absorbing the ever-increasing structural inconveniences of downtime and inflexible scale, leaving them less time and energy to do their best work.
Consider that five or so years ago, a “big” file was on the order of 250 megabytes. Today, simulation files for an industrial piece of rotating equipment often exceed 1 gigabyte, with a 95th-percentile (P95) file size above 70 gigabytes. In most cases, the siloed and fragmented underlying storage construct was not designed to perform with gigabyte-scale files and therefore cannot deliver adequate performance to end users.
Moreover, data inaccessibility can be compounded as storage space shortages lead to practices such as archiving older data on hard-to-access tape. At the departmental level, engineers may not know where to find critical data because their files often get “bumped” from shared drives.
Increasingly, engineering leaders are telling us that, after millions of dollars of investment and years of effort, PLM tools have not delivered on the promise of making engineering data easier to find and access. The sheer number of files, their size and their heterogeneity make these unstructured engineering datasets unwieldy for PLM systems. The reality, as most organizations will admit, is that only a small fraction of the engineering crown jewel datasets actually passes through the PLM solution.
Companies that cannot effectively scale and manage storage over decades of data growth will be left with an increasingly ineffective system. Solving this problem will take more than new hardware or the flavor-of-the-month software solution. It will require a complete rethinking of data architecture, and that will only happen if organizations stop viewing these datasets as simply bits and bytes in the enterprise and instead see them as the invaluable company assets they truly are.
As with any valuable company asset, a strategy for managing crown jewel datasets should have executive sponsorship from both the engineering and corporate sides of the business. The information technology function plays a critical supporting role. But rather than turning to traditional big-IT software/hardware solutions that are constrained by their own legacy infrastructures, oil and gas companies now need to turn to innovative solutions that do a much better job of leveraging the latest technology developments. The alternative is to keep doing what many are doing now: remain dependent on unworkable, calcified architectures that greatly limit the value that can be derived from the data over a decades-long product lifecycle.
In contrast, companies that adopt the “data-as-an-asset” mentality will move to next-generation architectures that let them unlock the potential of the Internet of things, advanced analytics, predictive maintenance, large-volume 3-D printing and other “leaps” that will provide competitive advantages.