The evolution of data science
Data science as we understand it today is a fairly recent concept. The term was first seen in a 1974 textbook by Peter Naur. The field expanded gradually during the 2000s. Then, when big data emerged as a prominent topic at the tail end of that decade, data science attracted even greater attention as an invaluable analytical practice.
In day-to-day practice, data science is carried out by data scientists, data engineers, and data analysts. These individuals are well acquainted with the data sources of the organizations that employ them, including traditional databases as well as data warehouses and data lakes. Coding, mathematical, scientific and statistical analysis techniques are essential data science skills, employed alongside artificial intelligence (AI), machine learning (ML) and analytics tools. A data scientist not only produces detailed reports on key aspects of enterprise operations and mines the results for ways to improve the business, but also constantly looks for new ways to distribute and structure the organization's data stores so that applications perform well and costs stay under control.
5 pivotal trends in modern data science
Groundbreaking developments take place on a near-daily basis in data science. But the frequency of these changes doesn't make each one any less extraordinary. Several particularly compelling trends will dominate conversation in the data science sector during the next few years:
The dominance of AI and ML
At this point it's hard to imagine data science functioning without AI and machine learning. They have both grown notably more sophisticated in recent years, enabling organizations to realize more immediate business outcomes. While AI and ML have already gained purchase in data science, the next few years will see them fully enter the mainstream, especially at the enterprise level.
According to Gartner, AI and ML open up a number of possibilities for data science and analytics. For one, they allow organizations to run certain operations with less data using "small data" techniques, a necessity that emerged when the COVID-19 pandemic made much historical data less relevant. ML also has the potential to help facilitate the "XOps" framework. The workplace trends journal Reworked characterized this idea as a technology stack that unites data, ML, modeling and platform functions for greater operationalization, reducing redundancies and inefficiencies to allow for more automation. Furthermore, it optimizes decision intelligence, a factor that will become increasingly important as enterprises' business units look to turn their large-scale decision-making into optimized and repeatable processes.
"Small data" and "wide data"
Big data has been the name of the game for most of the time that data science has existed as a modern discipline. This is understandable, given the sheer volume of data that organizations generate—especially those under the enterprise umbrella. Now, that vast amount of raw data has led businesses to realize that analyzing it at scale is not always the best approach.
Hence we see the emergence of small and wide data. They can be characterized as follows:
- Small data: Small data applications require less data by volume, with the idea that it can be processed as quickly as possible. It sometimes runs in conjunction with a specific strand of machine learning called "TinyML." According to Forbes, small data that uses TinyML algorithms can power applications running on low-powered hardware without relying on a cloud server, as with self-driving vehicles.
- Wide data: Data falling under this category can be either big or small—either way, it originates from a wide range of different sources across the enterprise. Analyzing wide data can be immensely valuable to cross-functional teams and multi-department initiatives, both of which are all but guaranteed to need data from many disparate sources. The famous anecdote about Target's marketing department using analytics to target families expecting newborn children is an example of wide data in action.
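To make the wide data idea concrete, here is a minimal sketch of combining records from several sources into one wide profile per customer. The three source extracts, the field names and the `widen` helper are all hypothetical, invented purely for illustration; a real enterprise would likely do this in its analytics platform rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical extracts from three separate systems, keyed by customer ID.
crm = [{"customer_id": 1, "segment": "retail"},
       {"customer_id": 2, "segment": "wholesale"}]
web = [{"customer_id": 1, "visits_30d": 14},
       {"customer_id": 3, "visits_30d": 2}]
support = [{"customer_id": 1, "open_tickets": 0},
           {"customer_id": 2, "open_tickets": 3}]


def widen(sources, key="customer_id"):
    """Merge rows from several sources into one wide record per key."""
    wide = defaultdict(dict)
    for name, rows in sources.items():
        for row in rows:
            for field, value in row.items():
                if field != key:
                    # Prefix with the source name so each column's origin stays visible.
                    wide[row[key]][f"{name}.{field}"] = value
    return dict(wide)


profiles = widen({"crm": crm, "web": web, "support": support})
print(profiles[1])
```

Note that the data set here is tiny: what makes it "wide" is the number of sources contributing columns, not the number of rows.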
Operations at the edge
Data generated at the edge, where devices and physical assets reside, is no less valuable than data within the cloud or in the context of data centers or other on-premises infrastructure. As such, data scientists must account for it in their data architecture and storage considerations—and especially factor it into analytics operations.
Data Science Central pointed out that in the next several years, data analytics as a whole may largely shift to the edge to more efficiently process the data that resides on edge devices or otherwise close to IT infrastructure. This will allow data leaders and their teams to scale up more readily and bring the value of their services to more units of their enterprises, while also significantly cutting latency for real-time analysis.
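One way to picture analytics at the edge is a device that summarizes its own readings and makes simple decisions locally, sending only a compact summary onward. The sketch below assumes a hypothetical temperature sensor; the readings, the alert limit and the function names are illustrative and not taken from any particular edge platform.

```python
from statistics import mean


def summarize(readings):
    """Reduce a window of raw sensor readings to a compact summary."""
    return {"count": len(readings), "mean": mean(readings), "max": max(readings)}


def should_alert(summary, limit):
    """A local decision that needs no round trip to a central service."""
    return summary["max"] > limit


window = [70, 71, 69, 72, 95, 70]  # hypothetical temperature readings
summary = summarize(window)
print(summary, should_alert(summary, limit=90))
```

Because the alert is decided on the device, it does not wait on a network hop, and only three numbers per window need to travel upstream instead of every raw reading.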
Further realization of cloud's possibilities
Like the tenets of modern data science and the emergence of big data awareness, cloud computing began to permeate the enterprise mainstream in the late 2000s and early 2010s. It makes perfect sense, then, that these developments have become closely intertwined.
Cloud, in particular, has opened up many opportunities to optimize the value of enterprise data. These range from quick upscaling of public cloud resources to accommodate sudden workloads and their associated traffic to processing and streamlining the massive data sets that drive AI and ML operations. We will delve more deeply into cloud trends and their role in data analytics a little bit later in the article, but it bears noting here that the cloud's importance to data science will only increase in the near future.
While the developments described above are hardly the only prominent trends to follow in this field, they're certainly a good place to start for those less steeped in the ins and outs of contemporary data science.
Notable challenges facing data science
In a field as complex as data science, enterprises should expect to face certain difficulties as they look to make this discipline a key element of their processes.
Lack of consensus
Back in 2013, Forbes contributor Gil Press discussed the lack of a standardized definition for data science and how that caused conflict among enterprise stakeholders looking to leverage the discipline.
Many enterprises now have a much better understanding of data science's value, but lack of consensus can still cause problems, just in a different way. According to Towards Data Science, disagreements may arise when data professionals and product managers or other department heads have opposing views on how data should be used to define and solve a business problem. For a data science project of any kind to succeed, there must be a unified strategy.
Overfitting
Overfitting occurs when a machine learning model matches its training data so closely, noise included, that it fails to generalize. The problem, which has challenged analytics since the advent of the concept, limits the ML tool's ability to accurately analyze new data. Backtesting against data the model has not seen, along with techniques such as cross-validation and regularization, can help mitigate this potential problem, but it must always be monitored.
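A common way to spot overfitting is to compare a model's error on its training data with its error on data held out from training. The sketch below is a toy illustration under stated assumptions: the data is synthetic (noisy samples of a straight line), and the "memorizer" is a 1-nearest-neighbor lookup chosen because it reproduces its training data exactly. None of this comes from a specific library or platform.

```python
import random


def make_data(n, rng):
    """Noisy samples of y = 2x + 1 (synthetic data, for illustration only)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 2.0)))
    return data


def fit_line(train):
    """Ordinary least squares for a single feature."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(train):
    """1-nearest-neighbor: returns the label of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train, holdout = make_data(40, rng), make_data(40, rng)
for name, fit in (("line", fit_line), ("memorizer", fit_memorizer)):
    model = fit(train)
    print(f"{name}: train={mse(model, train):.2f} holdout={mse(model, holdout):.2f}")
```

The memorizer's training error is zero by construction, so training error alone says nothing about how it will handle new data. The gap between its training and holdout error is the signal to monitor.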
Redundancy from multiple data sources
Enterprises typically have many different data sources, not all of which are easily accessible to data scientists exactly when needed. Data teams should use an analytics platform that integrates with those sources and brings analysis to the data, rather than forcing data scientists and analysts to copy disparate source data and create a mess of redundancy.
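The idea of bringing analysis to the source can be sketched as running an aggregate query inside each source system and combining only the small results, instead of copying every row to a central location first. In the sketch below, two in-memory SQLite databases stand in for independent source systems; the `orders` table, its columns and the helper names are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_source(rows):
    """Stand-in for an independent source system (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def revenue_by_region(sources):
    """Run the aggregate inside each source; only small summaries come back."""
    totals = {}
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    for conn in sources:
        for region, subtotal in conn.execute(query):
            totals[region] = totals.get(region, 0.0) + subtotal
    return totals


store = make_source([("east", 120.0), ("west", 80.0), ("east", 30.0)])
online = make_source([("east", 50.0), ("north", 20.0)])
totals = revenue_by_region([store, online])
print(totals)
```

No raw order rows leave either source, so there is no second copy of the data to keep in sync.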
The dangers of team siloing
Data science involves numerous subcategories, each managed by specialists: for example, programming-focused data scientists versus analysts who specialize in visualization tools. If these professionals operate in silos and do not readily communicate, the organization can suffer serious problems. Data teams therefore need analytics frameworks and tools that make collaboration easy.
The data expert shortage
While it's unclear how long this will remain an issue, it's certainly true that the demand for data science professionals exceeds the number of these individuals on the job market right now. It will take time for this to change. In the interim, enterprises' senior data scientists and chief data officer (CDO) can institute training for employees who want more hands-on involvement with the data that powers their operations.
Applying data science to analytics in the cloud
When leveraged to its full potential, data science can be a resource that translates organizations' data into actionable insight to deliver important outcomes. These may range from improved fraud detection to reduced customer churn. With comprehensive analytics, employees, department heads, and C-level executives alike can gain a greater understanding of what makes the enterprise tick.
A multi-cloud architecture is indispensable for an enterprise looking to develop its data science capabilities. Teradata Vantage is the ideal platform for achieving total control of and visibility into enterprise analytics. To learn more about how Vantage supports data science initiatives, review our library of case studies and find out how businesses are leveraging the platform.