
By: Joseph A. Marr, Ph.D., Lead Data Scientist at Newport News Shipbuilding
“Industrial data science” means different things to different people. In fact, the practice of data science outside academia varies widely and depends on many factors, such as industry type, corporate objectives and specific business requirements. I’ll define “industrial data science” to mean data science practiced outside academia and/or a research setting, with a tight focus on solving specific business problems with data, rather than development of new data analysis methods or models. I’ll address industrial data science from my perspective in the shipbuilding industry.
I’ll review some typical industrial data science problem types encountered at Newport News Shipbuilding, and by way of contrast, I’ll compare problems arising in a heavy manufacturing setting to a couple of problems arising in e-commerce. Doing so will illustrate the diversity of applied problem types covered by industrial data science. That comparison will be followed by a discussion of alignment between industrial data science practice and mathematical skills.
I’ll conclude by offering some general observations. In so doing, I hope to give the mathematics student and aspiring data scientist a sense of how your mathematics background confers certain advantages, but also how your background may skew your expectations about industrial data science practice.
Let me begin with a short description of my employer, Newport News Shipbuilding, and their product portfolio. This description is essential for understanding some of the business problems that arise at Newport News Shipbuilding and the context in which they arise—thus the problem types that data scientists who work there are routinely called upon to engage.
Newport News Shipbuilding (NNS)
“We shall build good ships here. At a profit if we can. At a loss if we must. But always good ships.”
Collis P. Huntington, Founder, Newport News Shipbuilding
NNS is an operating division of HII and is the nation’s premier provider of nuclear-powered combat vessels to the United States Navy. We are the nation’s sole supplier of nuclear aircraft carriers and one of two suppliers of nuclear submarines. Both vessel types are built simultaneously at NNS. It takes about seven years to construct a nuclear aircraft carrier and about three years to construct a nuclear submarine (Virginia class).
NNS also has a robust aircraft carrier and submarine maintenance line-of-business called Refueling and Complex Overhaul (RCOH). RCOH happens at mid-life for aircraft carriers (roughly 25 years from initial launch) and for submarines (roughly 15 years from initial launch). According to a 2002 RAND National Defense Research Institute Study, “A carrier RCOH may be the most challenging engineering and industrial task undertaken anywhere by any organization.” NNS is the only provider of aircraft carrier RCOH to the United States Navy.
Finally, NNS provides vessel decommissioning at an aircraft carrier’s end-of-life, approximately 50 years following initial launch. All of the above activities locate NNS in the United States’ heavy manufacturing sector. HII’s NNS shipbuilding division is the largest industrial employer in the State of Virginia and one of the largest shipbuilding companies in the United States.
Manufacturing At NNS
Given the time it takes to build a nuclear aircraft carrier or submarine, NNS is classified as a low volume, non-repetitive manufacturer. The low volume nature of the business is obvious from the product portfolio and delivery cycle times. The non-repetitive nature of the business may be less obvious. However, when considering product cycle times and the fact that aircraft carriers and submarines are composed of a tremendous number of unique parts, the non-repetitive aspect makes more sense.
The need for unique parts on a relatively slow time scale means that NNS generally does not consume or internally produce large quantities of these parts. This leads to a so-called “make-buy” decision, i.e., should a given part be constructed in-house, or should its construction be outsourced to an external vendor? Whether made in-house or outsourced, the relatively small volume of unique parts is for internal consumption only by the NNS manufacturing process.
The size of some unique parts that enter the NNS build cycle (or “value stream”) can be truly breathtaking. Steel plates can be tens of feet long and wide and several inches thick, and can weigh tens of tons. Other unique parts fit in the palm of your hand. As separate parts enter the value stream to become partial assemblies, full assemblies and then modules ready for final installation into an aircraft carrier’s hull, these assemblies and modules often require temporary storage. This poses a problem: where should assemblies and modules be stored temporarily, prior to final installation? Space at NNS is a critical resource that must be very carefully managed.
Space management is a delicate balancing act that requires a unique skill set; the associated career path is called “production planning and scheduling”, or PPS. Among their many duties, PPS personnel manage the flow of material and assemblies through various construction phases, from the shop to final assembly and shipboard installation. In so doing they allocate and de-allocate temporary storage space across the entire shipyard. Managing temporary storage space ensures a smooth flow of assemblies from shop to ship. Accordingly, temporary storage space occupancies are usually scheduled well in advance of actual material arrival.
One can appreciate the nightmarish problems that may arise when material or assemblies are misrouted to storage space that may already be occupied, necessitating an immediate re-routing decision. Or perhaps an assembly is placed into a temporarily vacant storage space that has been scheduled to receive other material currently in transit. Space blockages and assemblies interfering with one another can have several unintended consequences, including lost material and construction schedule delays.
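The double-booking scenario described above can be illustrated with a small interval-overlap check. This is only a sketch under stated assumptions, not a description of any NNS system: the booking records, area names such as "laydown-1", and the day-number time scale are all hypothetical, and each booking is assumed to occupy its storage area for a half-open interval [start_day, end_day).

```python
from collections import defaultdict


def find_conflicts(bookings):
    """Return pairs of booking ids whose reservations overlap in one area.

    Each booking is a tuple (booking_id, area, start_day, end_day).
    Intervals are half-open: a booking ending on day 12 does not clash
    with one starting on day 12. All names here are hypothetical.
    """
    by_area = defaultdict(list)
    for booking_id, area, start, end in bookings:
        by_area[area].append((start, end, booking_id))

    conflicts = []
    for slots in by_area.values():
        slots.sort()  # process reservations in order of start day
        active = []   # (end_day, booking_id) for reservations still in place
        for start, end, booking_id in slots:
            # Drop reservations that have already been vacated.
            active = [(e, b) for e, b in active if e > start]
            # Anything still active overlaps the new reservation.
            conflicts.extend((b, booking_id) for _, b in active)
            active.append((end, booking_id))
    return conflicts


# Hypothetical schedule: module-B arrives before module-A has left.
bookings = [
    ("module-A", "laydown-1", 0, 10),
    ("module-B", "laydown-1", 8, 12),
    ("module-C", "laydown-1", 12, 20),
    ("module-D", "laydown-2", 5, 9),
]
conflicts = find_conflicts(bookings)
print(conflicts)
```

A check like this only detects conflicts in a proposed schedule; deciding where the displaced assembly should go instead is the much harder optimization problem.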
Production of assemblies and construction of the vessel involves a large variety of specialized trades. Welding and pipe fitting are two such trades. Craftsmen practicing these trades must execute their tasks in defined sequences to avoid work sequence conflicts. These conflicts can lead to “rework”: the need to unwind previously completed tasks, put them into a correct sequence, and then redo the work.
An example may be welding an assembly onto a bulkhead after having painted the reverse side of the bulkhead. The welding operation will cause the painted side of the bulkhead to blister, thus causing a repaint of the bulkhead. In this case the correct sequencing of tasks was to weld the assembly to the bulkhead first, then paint the reverse side of the bulkhead later.
Although simple, this example conveys the essence of rework—what it is and how it arises; there are actually many causes of rework, both direct and indirect. The net effect of rework is to slow down the construction cycle. It is a critical driver of cost over-runs and schedule slippages, and therefore must be avoided if at all possible.
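The weld-before-paint example can be expressed as a precedence problem: each task lists the tasks that must finish first, and any valid work order is a topological ordering of that graph. The sketch below uses Python's standard-library graphlib module (Python 3.9 or later); the task names and the prerequisite table are hypothetical and chosen only to mirror the bulkhead example.

```python
from graphlib import TopologicalSorter

# Hypothetical precedence table: task -> tasks that must be finished first.
prerequisites = {
    "fit_assembly": set(),
    "weld_assembly_to_bulkhead": {"fit_assembly"},
    "inspect_weld": {"weld_assembly_to_bulkhead"},
    "paint_reverse_side": {"weld_assembly_to_bulkhead"},
}


def safe_sequence(prereqs):
    """Return one task order that respects every precedence constraint."""
    return list(TopologicalSorter(prereqs).static_order())


def violations(sequence, prereqs):
    """List (required_first, done_too_early) pairs in a proposed sequence."""
    position = {task: i for i, task in enumerate(sequence)}
    return [
        (before, task)
        for task, needed in prereqs.items()
        for before in needed
        if position[before] > position[task]
    ]


order = safe_sequence(prerequisites)

# A proposed plan that paints before welding, i.e. a source of rework.
bad_plan = [
    "fit_assembly",
    "paint_reverse_side",
    "weld_assembly_to_bulkhead",
    "inspect_weld",
]
print(violations(bad_plan, prerequisites))
```

Real work sequencing also involves durations, crews and shared resources, which is what turns this from a simple ordering check into a job-shop scheduling problem.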
Typical Business Problems At NNS
The short discussion above immediately suggests several manufacturing problem types that arise at NNS and that data scientists and their NNS colleagues confront regularly: job-shop scheduling, space utilization/optimization, supply chain optimization, individual process improvement and rework, just to name a few. A unifying thread among all these problem types is that they are classic manufacturing optimization problems, many of which belong to the field of operations research. Many of them are also exceedingly difficult to solve.
It is also the case that the data sets NNS data scientists work with are not large by “big data” standards: perhaps thousands up to a couple hundred thousand records at NNS vs. billions or trillions of records in internet applications. The relatively slow build cycles of our product portfolio and the non-repetitive nature of our manufacturing operations generally limit the size of data sets generated by NNS business operations.
The difficult nature of optimization problems and the (relatively) small size of data sets involved are not causes for despair, however. NNS data scientists routinely and successfully engage optimization problems using a variety of approximation methods, statistical techniques, data visualizations and computational models. We also utilize machine learning algorithms to induce models over data sets that are large enough to support them. The end result is that our models and analyses are deployed into business operations to capture value and effect change. Industrial data science at NNS is impactful.
By way of contrast, let’s look at two common optimization problems arising in the e-commerce industry. Optimizing a customer’s online interactions relies on insights from click-stream analytics, the practice of examining customer website interactions for exploitable behavioral patterns. The business requirements supported by a click-stream analytics study might be to enhance customers’ website experiences (e.g. “personalization”) or to increase sales (e.g., “customer conversion” rates). These goals can result in immediate impact for e-commerce vendors that can be measured in dollars and cents.
Click-stream optimization problems differ from our manufacturing optimization problems because click-stream data sets often contain several orders of magnitude more records. These large data sets, in turn, necessitate distributed data storage solutions and data processing techniques. Statistical sampling methods become important. In contrast, our smaller sized data sets do not require these approaches.
Another difference is time-to-value. In e-commerce, the impact of solving a click-stream optimization problem can often be measured in days or weeks using methods such as A/B testing. In NNS heavy manufacturing, the bottom-line impact of solving a process optimization problem may take months to unfold.
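For readers unfamiliar with A/B testing, one common way to analyze such an experiment is a two-proportion z-test on conversion rates. The sketch below is a minimal standard-library version; the visitor and conversion counts are invented purely for illustration, and a real study would also need to settle sample size and stopping rules in advance.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a is the conversion count and visitor count for variant A,
    and likewise for variant B. Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: 10,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With traffic measured in millions of impressions per day, enough observations for a test like this can accumulate quickly, which is one reason time-to-value is so short in e-commerce.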
Finally, the velocity and variety of e-commerce click-stream data sets can be remarkable. Some of the larger vendors serve up tens to hundreds of millions of website impressions per day to customers who visit their websites. These data streams can consist of text, photos, video and sound, all of which are available for customer interactions. Interactions generate additional data for collection by the e-commerce vendor. This means that an e-commerce vendor is simultaneously serving up data to potential customers and collecting data about those customers. The situation is different, of course, at NNS. Data velocity and variety are restricted; our data sets change on a much slower time scale and consist mostly of numerical data and text. Industrial data science at NNS is inward-facing, not outward-facing.
Skills Required for Industrial Data Science At NNS
Industrial data science problem solving at NNS is very applied. We leverage programming (using Python), data acquisition (using SQL), data visualization (using Power BI), and detailed problem analyses to solve the problems derived from business requirements. One of the most important skills is the ability to translate data science findings into a coherent, meaningful presentation (a “story”) so that customers can actually understand the outcome and its business impact.
A background in applied or computational mathematics is quite helpful, as is the ability to create computational models to solve problems. Pure mathematics generally doesn’t deliver applied or computational skills to the student. What pure mathematics does do, however, is develop problem analysis skills. These skills are essential to extracting viable, high-value problems from business requirements. Solving these problems becomes the focus of a data scientist’s effort.
Within applied or computational mathematics a course in applied linear algebra or numerical linear algebra is helpful. Gaining a solid understanding of applied linear algebra pays substantial dividends: it is at the foundation of countless machine learning algorithms and analytic methods. Understanding that foundation will give you deep insight into these algorithms and methods—not only which algorithm you should try first, and why, but also the limitations of your choice.
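As a small illustration of linear algebra sitting underneath a familiar method, the sketch below fits a straight line by solving the 2-by-2 normal equations (XᵀX)β = Xᵀy directly. It is a teaching sketch with made-up data, not a recommended production approach; numerical libraries typically use more stable factorizations. It also shows one limitation that the algebra makes obvious: if every x value is identical, XᵀX is singular and no unique line exists.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x via the normal equations.

    With design matrix X = [1, x], the normal equations are
        [[n,     sum x  ],   [intercept]   [sum y ]
         [sum x, sum x^2]] * [slope    ] = [sum xy]
    and are solved here with the closed-form 2x2 inverse.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("X^T X is singular: the x values carry no variation")

    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Made-up data lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

Calling fit_line([2, 2, 2], [1, 2, 3]) raises the ValueError, which is the kind of failure that is easy to diagnose once you know what the algorithm is doing with the data matrix.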
That level of understanding is crucial for problem diagnosis when things go wrong, as they inevitably will from time to time. It takes no skill to copy Python code from the internet, paste it into an IDE, connect the code to a data set, and then “click the green button” to obtain an “answer”. But that copy-and-paste approach only gets you so far. It will not help you at all when difficulties arise, when it’s time to modify your analysis, or when you must explain your analytic approach to another data scientist. If you don’t have a good, basic understanding of what you’re doing, then you won’t be doing it for long.
Acquiring a background in probability and statistics is useful. Typically this is a two-semester introductory sequence that includes statistical inference and Bayesian techniques. Not all data science problems are solved using machine learning or deep learning, nor should they be! Sometimes careful statistical data analysis and hypothesis testing are what’s required to support data-driven decision-making.
Finally, an introductory course in algorithms is helpful to understand a few fundamental data structures used in computer science (lists, trees, graphs, etc.) and basic operations on them (sorting, searching, shortest path finding, etc.). I consider the above skill set inventory to be your entry-level toolkit. You will come to rely on these tools as you practice data science. This toolkit also supports further study, should you wish to pursue a data science graduate degree.
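To make one of those basic operations concrete, here is a minimal sketch of shortest-path finding on a weighted graph using Dijkstra's algorithm with a binary heap. The graph, its node names and its edge weights are hypothetical, loosely imagined as transit costs between locations; edge weights are assumed to be non-negative, which Dijkstra's algorithm requires.

```python
import heapq


def shortest_path_costs(graph, source):
    """Return the cheapest known cost from source to every reachable node.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Hypothetical directed graph with made-up transit costs.
graph = {
    "shop": {"storage": 4, "pier": 10},
    "storage": {"pier": 3, "dry_dock": 8},
    "pier": {"dry_dock": 2},
    "dry_dock": {},
}
costs = shortest_path_costs(graph, "shop")
print(costs)
```

The dictionary-of-dictionaries graph and the heap are exactly the kinds of data structures an introductory algorithms course covers.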
Practicing Industrial Data Science at NNS
Industrial data science at NNS is dynamic. We engage a constantly changing, wide variety of manufacturing and non-manufacturing problems: modeling human resources, optimizing work sequences, deploying tools to support productivity enhancements, modeling and evaluating material delivery risk from our supply chain, and modeling internet-of-things (IoT) data streams with machine learning, just to name a few.
As a result of this diversity we rely heavily on our business partners and customers to provide us with subject matter experts (SMEs). SMEs help us understand data in various business domains, and they may also provide us with access to data that is pertinent to the problem being solved. We also work closely with in-house data engineers to perform extract-transform-load operations on raw data, to clean that data, and to shape it into analytic-ready rectangular data sets. Additionally, regular customer briefs and feedback provide essential “in-flight guidance” as a data science project unfolds.
NNS data scientists require significant “soft skills” in addition to hard technical capability. The ability to work well in a team or independently is essential: sometimes data science colleagues are available for consultations and sometimes they are not. Whether they are or aren’t is independent of the project delivery schedule. We therefore place a high premium on a data scientist’s independent decision-making ability, solution ingenuity and product/analysis delivery speed.
As mentioned, we regularly brief customers to gain their feedback. But we also brief senior corporate management at defined, regular intervals. As a result, our data scientists are required to be comfortable and effective in both settings. Effectiveness equates to “getting your point across” in a way that enhances customer and senior management understanding. The business value of what you’re doing and of your findings must be readily apparent.
Concluding Remarks and Observations
In the preceding sections I’ve tried to describe the reality of what industrial data scientists do at NNS, the heavy manufacturing environment in which they operate, and how their projects and activities map to traditional, university-level mathematics preparation. Below I summarize some of my own observations and opinions, based on over 20 years in the field (I was practicing data science before its formal recognition as a field of study and inquiry).
With respect to background preparation, applied/computational mathematics provides you with a useful, practical toolkit for industrial data science practice. That’s because the viewpoint of applied mathematics is different, with a focus on methods to address tough problems and to obtain industrially relevant, useful solutions—often a numerical result presented graphically, or a model of a phenomenon.
Whereas applied mathematics will develop your problem solving toolkit, pure mathematics will develop your ability to translate business requirements into problem statements. The ability to think abstractly enhances the ability to map one into the other, to discover exploitable structure, and to make connections between domains. That said, I’ve never proven a theorem on the job. I may use selected theorems in my work for the guarantees that they provide, but I’m not paid to prove theorems. Instead, it’s the mindset developed through mathematical training and how that mindset facilitates problem solving that is most valuable and applicable to industrial data science practice.
A good, robust solution will always be preferred to a clever method. That’s because a clever method should result in a good, robust solution! Unless you’re employed by an R&D unit such as Microsoft Research or Google DeepMind, clever methods are not ends unto themselves. They must lead directly to measurable business value. NNS data scientists are expected to solve business problems and to produce business value. They aren’t paid to advance the state-of-the-art in data science or mathematics. If that happens as a result of problem solving activities, wonderful! But it is never the primary goal. NNS is no different in this regard from the vast majority of businesses where you may find employment.
Poor problem definition is the pathway to perdition. I’ve seen more data science projects go sideways due to poor problem definition than for any other reason. It’s simple: If the problem you’re solving is not clearly articulated and crisply defined then you’ll wander in confusion. The remedy, of course, is to return to the drawing board and refine/re-define the business problem with customer input to gain needed clarity—if you have the time to do so!
At NNS we refer to the problem definition phase of a project as “sharpening the ask.” We invest significant up-front time sharpening the customer’s ask. We discovered that this up-front investment saves huge amounts of later effort that might be wasted trying inappropriate methods, using the wrong data, or accommodating shifting customer requirements (which by the way end up shifting, in part, due to an ill-defined problem statement!). And once you have that sharpened ask, your mathematical training will help you translate the resulting “word problem” into actionable symbolism. You can then bring to bear the full power of the mathematical machinery available to you.
When required to stand on your own, you are expected to do so, and to make measurable progress. As mentioned above, sometimes your colleagues are not available to provide support. In those circumstances, it’s up to you to lead the way. Your customer will expect this. You are the recognized expert at solving problems with data.
That said, data scientists—at NNS and elsewhere—are not “lone wolves”. Nothing could be further from the truth. Regular team meetings and informal discussions are essential: ideas, perspectives, and problem-solving advice are routinely swapped and discussed. But when this camaraderie is absent for whatever reason—the press of project deadlines, colleagues on leave, or on temporary loan to another department—you’ll need to carry on independently.
About the Author
Joseph Marr is currently the lead data scientist at Newport News Shipbuilding, a division of HII. He is also the delivery owner for an industrial data science team operating within the agile methodology. He received his BS in chemical engineering and applied mathematics from the Illinois Institute of Technology, MS in mathematics from George Mason University, MS in computer science from Johns Hopkins University, and his Ph.D. in chemical engineering from the Massachusetts Institute of Technology. You can reach him at: joseph(dot)marr(at)gmail(dot)com.