Greg Page is a Master Lecturer at Boston University Metropolitan College, where he teaches applied business analytics and data mining. He co-authored Lobsterland: A Python-Based Approach to Data Exploration, Statistical Analysis, and Machine Learning. Greg is also an instructor in UNSSC’s Data Quality for Impact, with Python programme. 

Dr. Peter Larkin is a Data Scientist at the United Nations Joint Staff Pension Fund, Office of Investment Management (UNJSPF OIM). He is a distinguished quantitative researcher and data scientist with over 15 years of experience in applying advanced data analytics and AI across global financial and risk institutions. 

Data quality is fundamental to reliable analysis, as issues in the data can silently distort results and lead to misguided decisions. In this interview with UNSSC’s Yiyi Hu, Greg explores common challenges in working with data and reflects on the mindset required for good data analysis. Dr. Larkin draws on his experience at the UN Pension Fund to show what data quality failure looks like in practice — and what it takes to build systems that catch problems before they reach decisions. 

Yiyi: When working with data, what are some of the most common issues you encounter, and how can they quietly affect results? 

Greg: Some of the most common issues around data, and datasets, are the seemingly little details that can quietly become big problems, depending on how they’re handled. For instance, how should a data analyst handle missing values? There is never a “one-size-fits-all” answer to this. Often, everything depends on the context. 

If a huge percentage of the values in a column are missing and the analyst just follows a simple playbook rule that says “replace the missing values with the median of the known values,” then, on the one hand, the problem will seem to have been “solved” rather quickly — but now, what new problems have been created? Have summary stats become distorted? Did new and misleading relationships among variables just get introduced to the dataset? 
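Greg's point can be made concrete with a small sketch. The data below is made up for illustration: two correlated columns, with 60% of one column then blanked out and filled with the median, showing how the "quick fix" shrinks the spread and weakens the apparent relationship between variables.

```python
import numpy as np
import pandas as pd

# Made-up data: income and spending are genuinely correlated
rng = np.random.default_rng(42)
income = rng.normal(50_000, 12_000, 1_000)
spending = income * 0.4 + rng.normal(0, 2_000, 1_000)
df = pd.DataFrame({"income": income, "spending": spending})

# Suppose 60% of income values go missing
df.loc[df.sample(frac=0.6, random_state=0).index, "income"] = np.nan

before_std = df["income"].std()
before_corr = df["income"].corr(df["spending"])

# The "playbook rule": replace missing values with the median
df["income"] = df["income"].fillna(df["income"].median())

after_std = df["income"].std()
after_corr = df["income"].corr(df["spending"])

print(f"std before: {before_std:.0f}, after: {after_std:.0f}")
print(f"corr before: {before_corr:.2f}, after: {after_corr:.2f}")
```

The standard deviation and the correlation both drop sharply after imputation: 60% of the column is now a single constant that carries no relationship to spending, which is exactly the kind of silent distortion the interview warns about.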

Yiyi: Data quality problems can quietly erode trust in organizations. How does this play out specifically in the UN context? 

Dr. Larkin: In an investment context like the United Nations Joint Staff Pension Fund, trust really depends on accurate, timely data. If valuations are wrong or trades don’t reconcile, that rolls into inaccurate performance reporting and flawed risk models, which ultimately erodes confidence with stakeholders and beneficiaries — and for a pension fund, that’s not acceptable. 

How do we ensure trust? Firstly, governance is essential. We need clear data ownership so every critical dataset — whether security master or transaction records — has accountability. Using a data catalogue and lineage tracking gives full transparency into where the data came from and how it has been transformed, ensuring it’s fit for purpose. 

Secondly, we operationalize quality. Dashboards with key metrics — accuracy, completeness, timeliness — paired with threshold-based alerts change the game, signaling issues before they affect decisions. 
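A minimal sketch of what threshold-based alerting on such metrics might look like in Python. The metric definitions, column names, and thresholds here are illustrative assumptions, not the Fund's actual implementation.

```python
import pandas as pd

def quality_metrics(df: pd.DataFrame, timestamp_col: str, max_age_hours: float) -> dict:
    """Compute simple completeness and timeliness scores (illustrative definitions)."""
    now = pd.Timestamp.now(tz="UTC")
    # Completeness: share of cells that are populated
    completeness = 1 - df.isna().mean().mean()
    # Timeliness: share of records no older than max_age_hours
    age_hours = (now - pd.to_datetime(df[timestamp_col], utc=True)).dt.total_seconds() / 3600
    timeliness = (age_hours <= max_age_hours).mean()
    return {"completeness": completeness, "timeliness": timeliness}

# Assumed thresholds; a real dashboard would tune these per dataset
THRESHOLDS = {"completeness": 0.99, "timeliness": 0.95}

# Made-up data: one missing price, one stale record
df = pd.DataFrame({
    "price": [101.2, None, 99.8],
    "as_of": [pd.Timestamp.now(tz="UTC") - pd.Timedelta(hours=h) for h in (1, 2, 30)],
})
metrics = quality_metrics(df, "as_of", max_age_hours=24)
alerts = {name: value for name, value in metrics.items() if value < THRESHOLDS[name]}
print(alerts)  # metrics that breached their threshold
```

Wired to a scheduler and a notification channel, checks like this signal issues before they reach a report, rather than after.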

Yiyi: From your experience, how can tools help strengthen data quality for more reliable decision-making across UN programmes and policies? 

Dr. Larkin: Tools make reliable data quality management real. Enterprise data quality platforms are complemented by lightweight automation — for example, Python is great for building validation scripts and anomaly detection models to catch outliers, missing values, or duplicate trades. Coupled with visualization tools, these checks provide continuous monitoring across pipelines instead of silent failures.
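A minimal sketch of the kind of lightweight validation script described here, covering the three checks mentioned: missing values, duplicate trades, and outliers. The column names and the 3-sigma rule are illustrative assumptions.

```python
import pandas as pd

def validate_trades(trades: pd.DataFrame) -> list[str]:
    """Run basic quality checks and return a list of issue descriptions."""
    issues = []

    # Completeness: flag any column with missing values
    for col, n_missing in trades.isna().sum().items():
        if n_missing > 0:
            issues.append(f"{col}: {n_missing} missing values")

    # Uniqueness: duplicate trade identifiers
    dupes = trades["trade_id"].duplicated().sum()
    if dupes:
        issues.append(f"{dupes} duplicate trade IDs")

    # Plausibility: notional amounts more than 3 standard deviations from the mean
    z = (trades["notional"] - trades["notional"].mean()) / trades["notional"].std()
    outliers = (z.abs() > 3).sum()
    if outliers:
        issues.append(f"{outliers} notional outliers (>3 sigma)")

    return issues

# Made-up data with one duplicate ID and one missing value
trades = pd.DataFrame({
    "trade_id": [1, 2, 2, 3],
    "notional": [100.0, 105.0, None, 98.0],
})
print(validate_trades(trades))
```

In practice a script like this would run on every data load, with its findings feeding the dashboards and alerts mentioned above.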

The next frontier is Augmented BI. Think conversational interfaces over the data, so we can ask "what am I seeing?" or "what assumption drove this?" That transparency gives confidence in making decisions based on trusted, high-quality data.

Yiyi: UNSSC has designed a new programme on data quality using Python. As an instructor in the course, what practical habits will participants learn to help them identify and address the issues we have been discussing? 

Greg: Participants in the course will learn an interactive process in which they're always querying and questioning the data. Together, we'll practice and hone a workflow that involves checking for missing and impossible values, spotting outliers and other anomalies, and using simple visualizations to explore datasets.
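The workflow Greg describes can be sketched in a few lines of pandas. The column name and values here are made up for illustration; the habits — check what's missing, check what's impossible, then look at the distribution — are the point.

```python
import pandas as pd

# Made-up dataset: ages with a missing entry and two impossible values
df = pd.DataFrame({"age": [34, 29, None, -5, 41, 210]})

# Check for missing values
n_missing = df["age"].isna().sum()
print(n_missing)

# Check for impossible values (a human age outside 0-120)
impossible = df[(df["age"] < 0) | (df["age"] > 120)]
print(impossible)

# Summary statistics often surface anomalies at a glance
print(df["age"].describe())

# A simple histogram makes the outliers visible immediately
df["age"].plot(kind="hist", bins=20)
```

None of these steps is sophisticated on its own; the habit of running them every time, before any modeling, is what the course aims to build.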

In just a few sessions, participants will gain a strong familiarity with the pandas, numpy and matplotlib libraries in Python. Participants will use Google Colab, a tool that allows them to run Python in a web browser. From the very beginning, we will explore how to leverage the power of Gemini, a large language model embedded directly in Colab, to help make Python syntax far less daunting than it would otherwise be. 

Yiyi: In a nutshell, after completing this course, what do you think will be the most meaningful change in how learners approach data quality in their daily work? 

Greg: Participants in the course will gain a greater appreciation for the importance of data quality throughout the entire process of handling, exploring, visualizing, and even modeling with data. Learners will come away from the course with a mindset shift — rather than focusing only on whether the model works, or whether the analysis looks right, they'll be more in tune with the reliability of the underlying data.

Interview by