Article Series Part 5- Citizen Data Scientist: Digital Transformation Debts post-Covid-19


Author(s)

Principal and Chief Scientist, Khosh Consulting
Dr. Setrag Khoshafian has been a senior executive in the digital industry, where he has innovated, architected, and led the development of several digital transformation products, services, and solutions. Currently, he is the Principal and Chief Scientist of Khosh Consulting. Dr. Khoshafian is a pioneer and recognized expert in Intelligent Databases and Intelligent Business Process Management. His expertise spans Process Automation, IoT/IIoT, Blockchain, Low Code/No Code, AI, Design Thinking, and Competency Centers. He is a frequent speaker in international conferences. His TEDx talk covers the importance of Culture over Technology. Dr. Khoshafian is the author of more than 10 books - including How To Alleviate Digital Transformation Debt post-COVID-19 and the seminal Service Oriented Enterprises. He has also authored hundreds of business and academic articles in recognized journals.

Editor’s Note: The DBizInstitute is excited to share this article, written by Dr. Setrag Khoshafian, with our community and in advance of his new book release. Keep an eye on our website as we share additional articles in the coming months written by Setrag, as well as a pending Meet the Author webcast to discuss his new book ‘How to Alleviate Digital Transformation Debt’ expected to air Fall 2021. This article was originally published on CognitiveWorld.com on October 19, 2020.

This article is the fifth installment in a ten-part series on Digital Transformation Debt, post-Covid-19. Part 1 focused on Culture, while Part 2 delved deeper into Operational Excellence and inter-enterprise Value Streams As A Service (VSaaS). Part 3 explained the spectrum of Automation and the shifts post-Covid-19, and Part 4 demonstrated how a new harvest of Low Code/No Code platforms is empowering Citizen Developers.

Due to the Covid-19 pandemic, organizations need to work to become agile and responsive. They need to understand trends and predict actions leveraging enterprise, sensor, customer, and partner data. They also need to be in motion and autonomic. This article, part 5, will focus on another critical dimension for alleviating Digital Transformation Debt: the emergence of the Citizen Data Scientist. Mining patterns from increasingly exploding data lakes and then acting upon those in real-time is critical for survival post-Covid-19.    

Data-Centric Organizations

By any estimate, the digital era is facing an unprecedented explosion of information. Digital technologies, solutions, and content generate 2.5 quintillion bytes of data each day! However, much like the IT application development bottleneck, an even more severe challenge is the shortage of Data Scientists. Organizations are hoarding data – but mining and benefiting from heterogeneous data lakes is often a challenge. A new harvest of productivity, self-service, and drag-and-drop data tools is emerging, allowing citizens to discover and deploy analytical models – predictive, machine learning, or even deep learning. This is nothing short of Artificial Intelligence platforms for the masses. We are witnessing the emergence of easy-to-use Citizen AI tools for customer engagement, with proven results.

In the Covid-19 era, data is becoming even more critical. The application of the models mined from the Covid-19 infection databases is obvious. Equally important are the supply chain, societal interaction, and overall economic trends amid shifts and transformation. The Covid-19 era is also accelerating the “Process + Data” narrative, where organizations need to balance data-centricity with the digitization and Automation of value streams. Bottom line – pre- or post-Covid-19 – it is not just about the data. The insights need to be mined, discovered, or harvested from the vast, often messy lakes of data. Raw data to insights should be the mantra. Once insights are discovered, they need to be acted upon.

Database Management Systems (DBMS)

DBMSs that separated the management of the data from the application started to appear in the 1970s with navigational, hierarchical, and network models. In the 1980s, we saw a significant evolution to relational databases, which became quite popular, especially with SQL’s emergence as the de facto query language for databases! The evolution of databases beyond relational included Object-Oriented Databases, which combined Object-Oriented and Database capabilities for persistent storage of objects, and Object-Relational Databases, which combine the characteristics of both relational and object-oriented databases.

More recently – especially for handling large unstructured multi-media data in new digital applications – we saw the emergence of NoSQL to handle the demands of Big Data, with its large volume, variety, velocity, and veracity. This new generation of databases focuses on dealing with the explosion of heterogeneous data and on the storage and management of this data for innovative internet applications (especially IoT). Still, by and large, most transactional data for mission-critical systems of record (which require transactional integrity) remains relational. All these trends are culminating in intelligent DBMSs.

Data Lakes

Recently we have also seen the emergence of ‘Data Lakes.’ Here is how AWS explains Data Lakes: 

“Faced with massive volumes and heterogeneous types of data, organizations are finding that in order to deliver insights in a timely manner, they need a data storage and analytics solution that offers more agility and flexibility than traditional data management systems… Data Lake allows an organization to store all their data, structured and unstructured, in one, centralized repository.”

The following illustrates the key components and capabilities of a Data Lake:

The emergence of many heterogeneous data sources is at the core of the Data Lake. According to Aberdeen, there is a clear distinction in business execution between Data Lake leaders and followers (a.k.a. laggards). Strategic Data Lake investments and maturity characterize the leaders.

The Data Scientist

The sections above illustrate the complexity of Data in enterprises – too many databases, repositories, sources, and strategies. The Data Scientist role is a relatively new one. Many assumptions that we had taken for granted in the management of databases, including integrity or logic pertaining to the independence of the data from the application, are now being challenged. The past couple of decades have created powerful gatekeepers of enterprise data, the Database Administrators (DBAs), who sometimes block the agility and speed of change needed to sustain business requirements. The world – or I should say the digital world – is changing. The introduction of NoSQL databases, especially for Big Data, has added complexity to managing and maintaining consistency across heterogeneous DBMSs. This transformational change emanates from the need to engage customers directly. It also results from the explosion of information on the Internet, especially with the Internet of Things. But more importantly, the mining of business value through analysis and machine learning techniques has given rise to this new role in the enterprise, one that has sometimes evolved from the DBA: the “Data Scientist.”

Data Science is complicated and multi-disciplinary. Here is a definition of the role of a Data Scientist from a business perspective: 

“A data scientist identifies important questions, collects relevant data from various sources, stores and organizes data, decipher useful information, and finally translates it into business solutions and communicate the findings to affect the business positively.”

Data Science involves many disciplines. Data Scientists need to have many skills, from mathematics, statistics, and machine learning to programming and more. Perhaps more importantly, Data Scientists need to communicate and present their findings in clear terms that the business understands. They also need to be subject matter experts and creative. That is a single role covering this entire spectrum. No wonder Data Scientists are in great demand!

Here is a great illustration of Data Science:

The Data Scientist’s continuous activities span three fundamental areas: Data Analysis, Programming, and Business Analysis for concrete business results. Unfortunately, poor data quality complicates the Data Scientist’s tasks and objectives. About 70% of their effort goes to ingesting, preparing, and cleansing the data.

In my interactions with Data Scientists, they sometimes object to this estimate: they say it is more. In other words, only 10% – 30% of their time goes to the discovery of meaningful insights and business value from the often unruly and heterogeneous data sets!
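To make the preparation burden concrete, here is a minimal Python sketch of the kind of cleansing step that estimate refers to. Everything in it is an illustrative assumption rather than something taken from a specific tool: the field names (customer, signup, spend), the sample rows, and the two date formats are all hypothetical.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from two different sources.
RAW_ROWS = [
    {"customer": " Alice ", "signup": "2020-03-01", "spend": "120.50"},
    {"customer": "alice", "signup": "03/01/2020", "spend": "120.5"},
    {"customer": "Bob", "signup": "2020-04-15", "spend": ""},
    {"customer": "", "signup": "2020-05-02", "spend": "75"},
]

# Assumed date formats; a real data lake would usually have many more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(rows):
    """Normalize names, dates, and amounts; drop unusable rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        signup = parse_date(row["signup"])
        if not name or signup is None:
            continue  # the record cannot be identified, so skip it
        spend = float(row["spend"]) if row["spend"].strip() else None
        key = (name, signup)
        if key in seen:
            continue  # same customer and date already kept
        seen.add(key)
        cleaned.append({"customer": name, "signup": signup, "spend": spend})
    return cleaned


CLEANED = clean(RAW_ROWS)
for record in CLEANED:
    print(record)
```

Even in this toy case, four raw rows need trimming, case normalization, date reconciliation, missing-value handling, and de-duplication before any analysis can start, which hints at why preparation dominates the effort.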

The Citizen Data Scientist

As indicated above, Data Science involves many disciplines. Gartner defines a citizen data scientist as “a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.” Gartner also predicted that 40% of Data Science tasks would be automated by 2020! Well, we are in 2020 and not even close to that level of Automation. Data Scientists still spend 70% or more of their time cleaning and preparing data for analysis and discovery.

Despite many technological advances, methodologies, and techniques, most organizations still suffer from Business-Technical Developers-Operations silos. The trend towards empowered Citizens who can achieve Data Science objectives is not hype. It is also not a panacea. It does have challenges.

The good news is that some emerging tools and platforms are addressing the requirements of Data Scientists. Intelligence and Automation across all the milestones and phases of the Data Science workflow make it real for Citizen Data Scientists.

Here are some productivity, intelligence, and automation technologies that are targeting the inevitable trends towards a Citizen Data Scientist platform:

  • Automation of Data Preparation: This is the most crucial category, as cleaning and preparing the data constitutes more than 70% of the Data Scientists’ effort. We are starting to see some tools addressing these needs. Tableau Prep, for example, “… changes the way traditional data prep is performed in an organization. By providing a visual and direct way to combine, shape, and clean data, Tableau Prep makes it easier for analysts and business users to start their analysis, faster.”

  • Low Code/No Code Data Integration: Several emerging and robust tools automate data integration and aggregation from different sources. Most structured and unstructured databases have Application Programming Interfaces (APIs). These productivity and automation tools provide easy to use drag and drop capabilities for Data integration. Parabola is an example of a Low Code/No Code platform for automating integration.

  • Automating Machine Learning (AutoML): Automation in data integration and preparation is a pre-requisite for analysis and machine learning. Machine Learning leverages Artificial Intelligence (AI) algorithms to discover patterns in the data. It is critical in the overall Data Science process. Now, when we shift to Citizen Data Scientists, it becomes critical to automate Machine Learning. Here is one definition of AutoML – which is a bit extreme but drives home the objective of AutoML: “Automated machine learning, or AutoML, aims to reduce or eliminate the need for skilled data scientists to build machine learning and deep learning models. Instead, an AutoML system allows you to provide the labeled training data as input and receive an optimized model as output.” Several vendors are positioning their advanced AI automation tools as AutoML – this includes Google’s Cloud AutoML and IBM Watson’s AutoAI.
  • End-To-End Citizen Data Science Tools: As described earlier, the multi-discipline Data Science has many phases. The overall workflow involves data sourcing, preparation, analysis, modeling, prioritizing the models, and then deployment. One example of such a platform is DataRobot. Here is how they describe their support for Citizen Data Scientists: “Citizen data scientists can upload a dataset to DataRobot and pick a target variable based on the practical business problem they wish to solve. The platform automatically applies best practices for data preparation and preprocessing, feature engineering, and model training and validation.” The following illustrates the end-to-end workflow for Citizen Data Scientists.

With these platforms, the dream of a Citizen Data Scientist, spanning Automation and self-service with intuitive drag-and-drop productivity tools, is slowly becoming a reality. We still have a long way to go.
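The AutoML idea quoted above, labeled training data in and an optimized model out, can be illustrated with a toy Python sketch. This is not how Cloud AutoML, AutoAI, or DataRobot work internally; it only assumes a single model family (k-nearest neighbors), a hypothetical two-feature data set, and leave-one-out validation as the selection criterion, all chosen here for illustration.

```python
from collections import Counter

# Hypothetical labeled training data: (feature vector, label).
TRAINING = [
    ((1.0, 1.1), "low"), ((1.2, 0.9), "low"),
    ((0.8, 1.0), "low"), ((1.1, 1.3), "low"),
    ((3.0, 3.2), "high"), ((3.1, 2.9), "high"),
    ((2.9, 3.0), "high"), ((3.2, 3.1), "high"),
]


def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest neighbors."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each example in turn and score the prediction made for it."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def auto_select(data, candidate_ks=(1, 3, 5, 7)):
    """Score every candidate setting and return the best one with all scores."""
    scores = {k: leave_one_out_accuracy(data, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


BEST_K, SCORES = auto_select(TRAINING)
PREDICTION = knn_predict(TRAINING, (3.0, 3.0), BEST_K)
print(BEST_K, SCORES, PREDICTION)
```

The user supplies only the labeled examples; the search over candidate settings and the validation happen automatically. Commercial platforms extend the same loop to many model families, feature engineering, and deployment.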

Recommendation: Citizen Data Scientists for the Data-Centric Enterprise

Data Science is complicated, and the solution market is fragmented and confusing. Yet these solutions provide tremendous advantages when developing and deploying innovative applications. The speed of development could be existential – especially in the post-Covid-19 world.

The Covid-19 pandemic delivers a robust opportunity to rethink roles and tools for innovation and become a startup or an enterprise in motion. Here are the top recommendations:

  • Citizen Data Scientist Culture: This is extremely important. Some business stakeholders in enterprises or founders in startups might be reluctant to get involved in “Data Science.” Given the complexity of Data Science, this will most likely be a partnership between conventional data science technical roles and business savvy Citizen Data Scientists for specific data science workflow milestones.
  • Data Cleansing and Preparation Automation: The first place to start with automation and self-service Data Science is the data cleansing and preparation phase, which typically consumes 70%+ of the Data Scientists’ efforts. Given the heterogeneous data sources, this is quite complex, but it is critical for success. This typically needs a partnership between technical Data Scientists and Citizen Data Scientists – with most of the technical tasks assigned to the former and the data schemata assigned to the latter.
  • Reskill and Upskill for Data Visualization and AutoML: Organizations need to leverage their employees, especially for Data Visualization and the increasingly important area of AutoML or AutoAI. The visualization market is quite mature, with tools such as Tableau. AutoML is more challenging but also more promising in terms of business value. Many software vendors are starting to provide robust solutions for AutoML. Therefore, following these developments and re-skilling Citizen Data Scientists from Visualization to AutoML is critical.
  • Digital Design Sprints: Be lean and effective. There is a perfect fit, either during or immediately after the 4-5 day methodology, to leverage Low Code/No Code for a Minimum Viable Product (MVP). The end-user testing can – and most likely will – end up with enhancements that could be easily and speedily achieved with a Low Code/No Code platform.

This is a ten-part article series centered around Digital Transformation Debts in the wake of the Covid-19 pandemic. Be sure to check out all ten articles!
