Code and data are central to research in all fields of study, from bioinformatics to digital humanities. That’s why F1000Research offers a dedicated article type for describing novel research software, supporting reproducible research and providing credit for all research outputs. But how can researchers take their Software Tool Article to the next level?
Code Ocean compute capsules are the perfect match for Software Tool Articles on F1000Research. Embedded within the article itself, these widgets use cloud-based Docker technology to allow readers to run (and re-run) your code, right there in the body of the article. We spoke to Simon Adar, CEO of Code Ocean, about all things software, code, and reproducibility.
Let’s go back to the beginning – how did Code Ocean start?
I’m a researcher myself, with an electrical engineering background. While I completed my PhD at Tel Aviv University, I was working with the German Space Agency on a big, collaborative project studying the environmental impact of coal mining areas. For the first year of this project, I was screening hundreds of articles looking for methods but finding it increasingly difficult to reproduce the research I was reading.
For one thing, most articles did not have the necessary code and data available to even attempt to reproduce the methods. If code or data was available, it was hard to get it up and running. Researchers all use different operating systems, different versions of software, different programming languages, and so on. Finally, if I did manage to get it running, most of the time I didn’t get the same results.
Beyond reproducibility, another hurdle I faced was simply having access to the right computing equipment. As the datasets I was working with grew, so did my computing needs. I could no longer work from my own personal laptop; I needed access to the university cluster and the server in the lab. But these resources were in high demand among my colleagues, and often suffered from downtime and security restrictions, which held up my work. I ended up having to purchase my own more powerful computing machines on Amazon Web Services (AWS) in order to finish my project work, because access to shared equipment was proving so difficult.
When I finished my PhD, I knew that this was a big problem for the world of science. A problem worth solving. I had wasted too much time on computational reproducibility, and I knew this was an issue across all fields of science. I applied for a programme at Cornell Tech which would enable me to start a company to solve these problems, and so Code Ocean was born.
Today, Code Ocean provides tools for researchers around the world to create, collaborate on, and execute computational code and data. Reproducibility is fundamentally built into the platform, and with direct access to the cloud, users don’t need any extra hardware or software. We also work with the biggest publishers, including F1000Research, to bring code to life in published research, so that readers can run (and re-run) computational code right there in the body of the article using Code Ocean compute capsules.
Let’s talk a bit more about reproducibility. Why is it such an issue at the moment for the research community?
So reproducibility has two aspects to it. Firstly, it’s about the validity of scientific results – the assurance of the findings. This is essential for research which plays a role in policymaking, as it has the power to influence the daily lives of people everywhere. Think about the current climate, with the coronavirus pandemic: if COVID-19 related research is not reproducible, this could have a negative impact on the decisions our governments are taking on public health measures. Think about COVID-19 testing: if the tests are not reproducible, how can we trust them? I often say, it’s not really science if it’s not reproducible. If it’s not solid science, wrong decisions will be made.
So the first part of reproducibility is this validity. The second part is around reuse of science and building on research which others have done. This is the part which really speaks to the Code Ocean story; when I was doing my PhD, I needed to reuse the methods and data from other research to support my own projects, but I just couldn’t do it. The whole concept of science is building upon each other’s work – Isaac Newton said, “If I have seen further, it is by standing on the shoulders of giants”. But if we’re not able to reproduce another’s work, how can we build on it?
So where exactly does Code Ocean fit into this ‘reproducibility crisis’?
Code Ocean tackles computational reproducibility – anything to do with digital data and computational analysis. This could be statistical methods, algorithms, pipelines – anything that takes data as input, runs code written in any programming language, and then generates an output like a file, a figure, a table, and so on. The Code Ocean compute capsule incorporates all the required elements to generate the same results today, tomorrow, and every day in the future. The platform provides assurance on how the results are generated, with reproducibility at the heart of the model.
When you’re reading an article with an embedded compute capsule, you can reproduce the analysis at the click of a button, right there and then.
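To make the idea of "the same results today, tomorrow, and every day" concrete, here is a minimal Python sketch of one way a reader might check it: run an analysis twice and compare a fingerprint of the output files. This is not how Code Ocean itself works; the `analysis` function, the file names, and the choice of SHA-256 are all illustrative assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def analysis(values: list[float], out_file: Path) -> None:
    """Hypothetical stand-in for an analysis: data in, result file out."""
    mean = sum(values) / len(values)
    out_file.write_text(f"mean={mean:.6f}\n")


def is_reproducible(values: list[float]) -> bool:
    """Run the analysis twice and compare the output fingerprints."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "run1.txt"
        second = Path(tmp) / "run2.txt"
        analysis(values, first)
        analysis(values, second)
        return fingerprint(first) == fingerprint(second)


if __name__ == "__main__":
    print(is_reproducible([1.0, 2.0, 3.0]))
```

A byte-for-byte comparison like this is strict: an output that embeds a timestamp or depends on an unseeded random number would fail the check even if the science is sound, which is one reason a fixed environment and fixed inputs matter.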
What are the benefits of opening up code and data through tools like Code Ocean, for anyone to see and use?
So everything on Code Ocean is open access. The benefits go back to the issues I faced when I was a researcher doing my PhD – if you’re not sharing the code and data, how can you rely on the assurance of the findings? It gives a stamp of quality for your research when everything is out in the open. Transparency means your research community can trust the science, and even re-use it in their own work.
What are the key challenges facing researchers working with code, software and data?
We’ve already talked about reproducibility and access to the right equipment. But another issue is the lack of an established standard, which makes it hard to reuse the work of others. Best practices for reproducibility are not clear to most researchers. As a result, most research projects are not organized so that data and code are clearly separated and environment dependencies are provided. Anyone who wants to reuse such a project often needs both a high level of setup skill and a significant investment of time. In Code Ocean, best practices are embedded in the product to guide you through each step, and reuse by anyone is as easy as sharing a link. No setup or troubleshooting is needed to run or augment the project.
Coming back to reproducibility for a moment, this isn’t just an issue which researchers need to think about when it comes to the end of a project. It should be something they’re thinking about in their daily work. With the Code Ocean platform, every line of code is reproducible from day one. The more people are using reproducible computational workflows like Code Ocean for their daily work, the better the quality of that work will be.
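The two practices mentioned above, keeping code and data apart and stating environment dependencies, can be sketched in a few lines of Python. This is only an illustration under assumptions: the folder names `code` and `data` and the `environment.json` file name are hypothetical choices for this example, not a Code Ocean format or an established standard.

```python
import json
import platform
from importlib import metadata
from pathlib import Path


def environment_manifest() -> dict:
    """Collect interpreter, OS, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with unreadable metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def missing_folders(project: Path, required=("code", "data")) -> list[str]:
    """List the required top-level folders that the project lacks."""
    return [name for name in required if not (project / name).is_dir()]


def write_manifest(project: Path) -> Path:
    """Save the environment manifest next to the code and data folders."""
    target = project / "environment.json"
    target.write_text(json.dumps(environment_manifest(), indent=2))
    return target


if __name__ == "__main__":
    project = Path(".")
    print("Missing folders:", missing_folders(project))
    print("Wrote", write_manifest(project))
```

A manifest like this records what was installed, but it does not rebuild the environment on its own; container-based tools go further by packaging the environment itself.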
What fields of research are using Code Ocean at the moment?
We’re seeing uptake of Code Ocean across the board, but there is something special happening with bioinformatics and computational biology, across academic and private sectors. We also have use cases from computer science, engineering, and even the social sciences.
I think we have time for one last question. What does the future look like for code in research?
It’s very clear that computational reproducibility is going to be a must-have for the scientific community. Right now, there are certain researchers leading the charge, but in future this will be mainstream scientific culture. If it’s not reproducible, it won’t be regarded as science. People will ask, “How come we used to publish research, but only the article?” It will be unthinkable!
Want to find out more? Head to the F1000Research page on Software Tools for more details of how this article type supports reproducibility in research.