
Digital Science and two of its portfolio companies, Altmetric and Overleaf, are sponsoring this year’s STEM Fellowship Big Data Challenge. Now in its third year, the STEM Fellowship Big Data Challenge is a competition that helps high school students get excited about data science and its potential to support inquiry-based learning and problem-solving.

The theme this year is “Using impact data to understand and predict the future directions of science”. Competing teams are challenged to extract information from scientific publication attention data provided by Altmetric, collating and presenting insights into the future direction of science. To help manage and write their projects, students will have access to the cloud-based collaborative writing tool Overleaf. Our Vice President of Publisher Business Development, Adrian Stanley, will be on the judging panel.

The full project reports of previous Big Data Challenge winners can be viewed in the STEM Fellowship Journal archives.

To mark the 2016-2017 Big Data Challenge, we caught up with the founder and Executive Director of the STEM Fellowship, Dr. Sacha Noukhovitch, and the Director for Big Data Education, Ahmed Hasan, to talk about the origins of the Challenge and more.

Ahmed Hasan  


Dr. Sacha Noukhovitch

What is the STEM Fellowship Big Data Challenge? Why was the challenge created?

SN: The STEM Fellowship Big Data Challenge is a unique endeavor and learning experiment focused on developing the natural data analysis talents of a new generation of students. Educators have noticed that traditional subject-specific, instruction-based learning does not meet the expectations of industry and, more importantly, does not fit the learning styles of this new generation of students. The Challenge is, therefore, a pedagogical pilot that tests new forms of cross-disciplinary, big-data-based learning.

Why choose high school students rather than university students to compete?

SN: The new learning paradigm emerged when the qualitative growth of information technology “spilled” into the public domain. This happened five to seven years ago, when high-speed Internet became a household norm, and that technical change had far-reaching effects on children’s development.

AH: I believe we’re still in a transitional phase, so to speak, where the ‘data native’ generation is only beginning to take root. As household technology continues to progress, each new generation of students is becoming more technologically literate. This growth in young people’s technical ability has effectively created a new way of learning and processing information, but, as far as we’ve seen, one that is still some time away from fully permeating the university level because it began so recently. Furthermore, many university students in fields ranging from finance to biology have yet to realize how far data-native thinking extends into their own disciplines. This is not to say we’re ruling out a Big Data Challenge for university students; we’re just focused on the high school demographic for now.

The Challenge is three years old. Could you comment on some of the best work to come out of it?

AH: The STEM Fellowship Journal is open access, so all previous winning project reports can be read in full by anyone who visits our website. In addition, every participating team had its abstract published, whether it won or not. While all the winning papers are absolutely top-notch work, I was particularly impressed by a winning entry from Uszkay et al. (2015) that dealt with large amounts of seemingly impenetrable transaction data. Undeterred, the researchers used k-means clustering to define five distinct groups of customers and devised a plan for more effective marketing based on their findings. Another paper I’m a particular fan of is the analysis by Xiang et al. (2016) of the relationship between the number of emergency services requests in different Toronto neighborhoods and those neighborhoods’ distance from first responders. The researchers showed clear instances of neighborhoods that request emergency services at a level disproportionate to service availability, and went a step further by making recommendations for addressing that real-world problem using open data.
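For readers unfamiliar with k-means clustering, here is a minimal sketch of the generic technique (Lloyd's algorithm) in plain Python. It is not the code used by Uszkay et al. (2015). The `kmeans` function, the toy customer data and the deterministic farthest-first initialization are hypothetical choices made for illustration and reproducibility; random restarts or k-means++ seeding are common alternatives.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def farthest_first_init(points: List[Point], k: int) -> List[Point]:
    """Hypothetical deterministic seeding: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points: List[Point], k: int, iterations: int = 100):
    """Lloyd's algorithm: alternately assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    centroids = farthest_first_init(points, k)
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical customers described by (annual spend in $1000s, visits per month).
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (10.0, 10.0), (10.5, 9.5), (9.8, 10.2)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each resulting cluster is a candidate customer segment. Choosing k (five, in the paper described above) is a modeling decision that the algorithm itself does not make.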

What are your thoughts about this year’s theme: ‘Using impact data to understand and predict the future directions of science’? How did you arrive at this?

SN: We want to harness students’ analytical skills, honed through years of using technology, the Internet and even gaming. By refocusing those skills on science and discovery, we hope that students will bring a unique perspective to the problem at hand.

AH: Impact data is more relevant than ever in an age of mass information and global communication, and we believe it is very beneficial for students to become familiar with such metrics as early as possible. High school students are at a critical juncture in their lives, deciding what to focus on in their upcoming university studies. Consequently, one of our goals was to let students look at potential interests from a ‘big picture’ perspective by examining the current state of research in a field they might later pursue. Impact data admittedly has a somewhat steeper learning curve than other ‘classical’ open datasets, such as transportation data, but the students have been coping well so far.

Why is the collaboration process between universities, schools and industry important?

AH: As the world shifts toward more data-driven ways of functioning, industry increasingly requires big data analysis skills, and a skills gap is slowly opening up; left unchecked, that gap would certainly continue to grow. This collaboration is important because it not only takes steps to bridge the gap at a relatively early stage of education, but also brings to light new pedagogical approaches that better support the new data-driven generations.

What does the future have in store for the Big Data Challenge?

SN: When it comes to data analysis, high school students are a unique group, one that is totally unbiased in its approach to data. The Challenge is much more than a competition among students; it acts as a window into how students think, demonstrating how they work through problems using data. We also believe that the Challenge leads to grassroots adoption of a new, data-based form of inquiry, introducing trans-disciplinary activities into school curricula that start pedagogical change one teacher at a time.

AH: A university-level Big Data Challenge is the next obvious milestone, but, the generational issue mentioned earlier aside, the steps involved in bringing the Challenge to universities are still some way from being worked out. For now, high school students have been very receptive to the Challenge’s format, and we’re just looking forward to the fantastic output we’ll see this year.

Finally, why include Digital Science’s tools for students to use?

SN: Digital Science is an absolute leader in data analysis among scholarly publishers. Like most of the Challenge participants, Digital Science approaches problems in unique ways. Its collaborative tools and portfolio of companies offer a new generation fresh ways to learn how the scientific process works.

AH: In a competition centered around using data to predict the future of science, it only makes sense for participants to work with tools that are very much involved in driving science forward!