Hard Problems in Data Science: An Emerging Science as a Multi-Disciplinary Research Field

How do we define data science, and is it really that different from single-domain fields such as computer science, statistics or econometrics? These were the questions put to Dr. Linnet Taylor and Dr. Ksenia Podoynitsyna during the last of four informal Hard Problems in Data Science sessions held at JADS.

‘It’s often very hard for people who have interesting work to explain what they actually do. This has to do with the fact that their work is not limited to one thing, but actually covers a whole range of domains,’ according to Linnet Taylor, Assistant Professor at Tilburg Law School, as she kicked off the afternoon’s discussion. ‘It’s also the very foundation of creativity: being able to combine perspectives from different disciplines.’ That combination, she believes, is also the foundation on which data science is built.

‘The conundrum that I bring to you, though, is that within academia, domain knowledge means knowledge within one specific domain. Yet increasingly we need people who are more of a Renaissance type. So not simply, say, a geographer, but a geographer with an understanding of computer science. This combination has often been judged to be fundamentally incompatible. But in my job of studying data science rather than doing it, I see how important it is that people understand the society they are working in and act upon what they see. If we don’t step out of our own domains and engage with the world around us, we will be stuck with our own biases.’

Taylor calls to mind the currently popular hackathons: ‘They really are an interesting phenomenon: a complete lack of knowledge of a subject or a contextual setting seems to be the best qualification to participate. Of course, it could very well be that new insights and solutions are created. The question is whether we are creating the solutions that fit the context. Access to new data sources enables people to do new analyses, but it’s contextual knowledge that enables people to address problems that have a social component.’

The second speaker, Dr. Ksenia Podoynitsyna, Assistant Professor of Entrepreneurship at Eindhoven University of Technology, speaks of ‘The Good, the Bad and the Ugly’ of multi-disciplinary data science projects. ‘I like doing multi-disciplinary projects, even though it sometimes feels like you’re in a Western movie. Hence the title of the presentation. The ‘good’ thing about multi-disciplinary data science is that we can use the same techniques and datasets to extract insights in different domains. For instance, NASA managed to solve a problem regarding black holes using the same statistical methods used in geographical research on glaciers. To further develop a data-mining technique, researchers have to use a particular dataset, which is often embedded in a business problem. By adding a researcher from the social sciences, we can add domain knowledge and see whether that dataset can also lead to an advancement in the social sciences: a clear win-win, and better research output for the Graduate School. And then things quickly get ‘bad’! Why? All of us researchers have our own domain ‘language’. How can you compare things between disciplines, and how do you actually do it? ‘Research design’ or ‘scientific contribution’ have very different meanings across disciplines, and it takes time to synchronize on this. Moreover, most university structures do not encourage inter-disciplinary research, and some incentive structures even work against it: in some universities and faculties, publishing with many co-authors earns you fewer academic brownie points.’

‘Yet I see that the researchers who are pushing the boundaries and working outside the ‘ivory tower’ are those with a multi-disciplinary perspective. Two types of research are emerging within multi-disciplinary data science at JADS. One is heavily quantitative and big-data-driven, set in a specific business context such as entrepreneurship, ethics, law or human-technology interaction. The second is design for data science and data-driven businesses: essentially qualitative, although some parts of design validation could be quantitative. Think of creating a new venture, engineering information systems infrastructure, or drafting new laws and regulations or ethical guidelines. Both types are highly relevant.

‘So what we are left with are the ‘ugly’ practicalities of how to stimulate and support multi-disciplinary research. JADS could move towards multi-disciplinary research by bundling PhD students from different disciplines to work together in one large project, by assigning supervisors from different disciplines to a single PhD student, or by facilitating the application of a specialized set of quantitative skills across multiple projects.’

Taylor remarks that data scientists are increasingly leaving academia for industry, where they have easier access to the data they want. Within a business setting, however, research is often limited to specific business-related questions or projects, and it can’t be openly published under the conditions that allow academic research to contribute new scientific knowledge, namely peer review and replicability, which limits its scientific contribution. Companies are also reluctant to publish research findings because these may represent commercial secrets, which limits collaboration with external researchers or partners. ‘The downside of doing so,’ according to Taylor, ‘is that they never get beyond their own insights and datasets. This is already a problem for predictive healthcare, which often requires multiple sources of data to provide ground truth for a new dataset. Similarly, insurance companies are struggling with limited datasets. Unlike researchers in a business setting, academics are free to ask their own questions. That is a major advantage over business. Society needs data scientists who are brave enough to ask the right questions.’

Concluding the discussion, Podoynitsyna points out that JADS is in a unique position to bring about the required shift towards multi-disciplinary science: ‘JADS’s particular strength is its combination of strong researchers from five distinct disciplines with its ecosystem. The former enables us to do rigorous research, while the latter enables us to do relevant research that impacts the world and makes it better.’

About the speakers

Dr. Linnet Taylor is Assistant Professor at Tilburg Law School. She researches the interface between big data, rights and democratic representation worldwide, looking at how people are represented through digital data, and particularly at social and economic inclusion and exclusion through data.

Previously, she was a Marie Curie fellow in the International Development Studies department at the University of Amsterdam.

Dr. Ksenia Podoynitsyna is Adjunct Program Director of the master’s program ‘Data Science and Entrepreneurship’ at JADS and Assistant Professor of Entrepreneurship at Eindhoven University of Technology. Her research focuses on business models and innovation ecosystems in data-driven and sustainable settings. She has supervised multiple PhD projects, both company-funded and grant-funded.
