Making big data accessible to students
Code is everywhere, and it is increasingly important that undergraduate students acquire computational skills alongside quantitative ones if they are to find their way in the world of big data. However, very few undergraduate programmes currently offer more than a one-off Geographic Information Systems (GIS) class.
Geography's Spatial Data Science pathway is unique: it offers students no fewer than three modules, including Foundations, Principles and Applications of Spatial Data Science, to develop both technical competence and critical rigour in dealing with (spatial) data.
This is important because students coming into a Geography undergraduate degree may not have much experience with quantitative data, or with the current crop of tools used to analyse them. They may also find it challenging to visualise quantities or make maps. Although more students are coming to King’s with some kind of introductory Computer Science class under their belts, these classes have rarely been connected to the practical problems associated with collecting, analysing and interpreting data.
Consequently, the Spatial Data Science pathway has been designed to take students with little or no experience of programming and to empower them to understand and make use of code in real-world analytical challenges. To support students with no prior experience, the pathway also includes an optional introductory ‘Code Camp’ in which students learn the basics of the Python programming language online using geographical examples.
Although there are many ways to learn Python, our staff think that students learn best when they have access to familiar concepts, examples, applications and even names. To enable everyone to join in equally, Code Camp instruction is done using ‘Jupyter notebooks’, which allow students to run Python code in a web browser without having to install anything on their computer, tablet or phone.
The design of our BA and BSc degrees, and of the Spatial Data Science pathway, gives students a chance to ‘try before they buy’. Many students are inspired by the process of learning to code and decide to continue on to further study in Data Science and Geoinformatics master's programmes. Meanwhile, others go on to become data scientists or client partners for data-intensive companies. Yet, if students decide that it’s not for them, then they can stop at any time.
Why we use Jupyter notebooks to teach big data
Jupyter notebooks remove significant barriers to teaching coding skills and big data analytics because they foreground interaction with data rather than with the code itself. We can combine explanation, commentary, code and output (eg graphs, statistics and maps) in a single document. Plus, it all works through a web browser.
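To give a flavour of what a single notebook cell might contain, here is a minimal sketch using a geographical example. It is not taken from the Code Camp materials: the function name, the city coordinates and the Earth-radius value are all illustrative assumptions, and the haversine formula used here treats the Earth as a perfect sphere.

```python
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius in kilometres (an approximation; the Earth is not a perfect sphere)
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # Convert all four coordinates from degrees to radians
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city-centre coordinates (illustrative values only)
london = (51.5074, -0.1278)
paris = (48.8566, 2.3522)

distance = haversine_km(*london, *paris)
print(f"London to Paris: {distance:.0f} km")
```

In a notebook, the printed result appears directly beneath the cell, so a student can change the coordinates, re-run the cell and see the new distance straight away.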
Making Python more accessible via Jupyter notebooks allows students to learn through ‘hacking’ (in the MIT sense) and thereby come to better understand spatial analytical methods. Students also don’t need to know where their code is running. Combined with the virtualisation tools (eg Docker and Vagrant) heavily used by companies like Amazon, Google and Apple, notebooks make it possible to hide some of the complexity of managing local programming language installations.
The ability to provide rich media and contextual information along with the code is one of the reasons that practising data scientists have adopted Jupyter, and it is also why Jupyter is a good tool for teaching.
Trying out Code Camp
We want to get students connected early to the tools they will be using throughout their degree and beyond. They can learn the basics of coding from home via our introductory Code Camp. There is no software to install and we think the camp offers huge benefits for students if they are to hit the ground running with coding when they start their degree.