ben adams

We are very excited to bring you a new interview for our #FoundersFriday blog series! If you’ve missed our previous posts, Founders Friday is a forum in which we interview the founders of different scholarly communication businesses, asking them to share their advice for others and their perspective on the industry as a whole.

For this edition, we have interviewed Ben Adams (@darwinzero), Co-Founder of Simiary, a software company that wants to improve how we process massive amounts of new content by offering intelligent exploratory search and analysis software. Simiary, along with two other companies (Writefull and Etalia), was recently awarded a Digital Science Catalyst Grant!

Steve Scott, Director of Portfolio Development at Digital Science adds:

“Simiary is approaching search in a new and exciting way, giving researchers the ability to construct and search across domain-specific relationships within their subject areas. With their intelligent parsing of content, we believe their software will help garner unique insights into new and existing material, especially in an increasingly cross-disciplinary world.”

What is Simiary?

Simiary is a software company that uses AI and visualization technologies to augment human learning, exploration, and discovery of research knowledge.  Our software tools can ingest massive amounts of unstructured scientific text and data, such as articles, data abstracts, historical documents and governmental reports. By using machine learning and other entity recognition techniques, our tools automatically refactor and re-index the content, allowing researchers to interactively explore the data via multiple contextual search engines.

What were you doing before Simiary? Talk about your “aha” moment – if there ever was one?

I started building a search engine called Frankenplace as a computer science Ph.D. student at the University of California, Santa Barbara. I was looking at what we can learn about places from the heterogeneous, big data that is generated from all kinds of different sources: social media, news articles, historical documents, etc.  The research was to use cutting-edge data science methods (text mining, machine learning, and other AI) to bring all of this place-based information together in an organized way.  Frankenplace was a clever name made up by my office mate, Grant McKenzie, who is now an Assistant Professor at the University of Maryland.  It is meant to capture the notion that our ideas of places are really patchworks made of the experiences of many people.  The name stuck because it was catchy, but at some point, I flipped the question around and started asking: what if we fix the topic and then look at how that topic manifests across many different places around the world?  In other words, can we make geography the contextual fabric over which any kind of topic could be explored? That led, over a period of time, and through many iterations, to Frankenplace.

What struck me is that by refactoring the content of these documents into a different frame of reference, namely a geographical one, it enabled the discovery of all kinds of new information.  The first “aha” moment I had was when I started to use Frankenplace and began learning and discovering a wide range of new information about topics that I would never have found using a traditional search engine; at that point, I knew it was something that had real potential to add value to how people do research.  What’s more, this was for something that was based on Wikipedia data – what if we were to make this possible for all kinds of documents and data?

It turns out that geography is a domain that touches on many different areas of research, including the agricultural sciences, archaeology, anthropology, digital humanities, economics, earth sciences, ecology, history, medical sciences, paleontology, political science, sociology, and urban planning. Yet, when we publish in these disciplines, even when the work is open, it is siloed, at least in the sense that it shows up in a specific journal for a specific field, or a data repository meant for researchers in a specific discipline. For example, our current way of organizing knowledge doesn’t do a great job of helping a historian interested in the history of mineral extraction to find relevant geological articles, or to find relevant primary historical sources by geographic location.

Another example comes from my own personal experience working as a postdoc at the National Center for Ecological Analysis and Synthesis (NCEAS). Scientists at NCEAS were attempting to generate new results based on the re-use and integration of data sets that were created at different times for different purposes.  In many cases, the synthesis was based primarily on the notion of shared geographic or regional scope; the types of sources also varied a great deal. Researchers relied on their individual social networks and a collection of learned sources to find information, but what insights were missed that could have been found if there had been an easy way to explore using geography? And, of course, geography is just one of many lenses through which we can structure and explore our scientific knowledge.  Time is another dimension, as are structured scientific concepts like biological taxonomy.

The other piece of the puzzle came from working as a research fellow at the University of Auckland’s Centre for eResearch, where I also met Simiary’s other co-founder, Richard Hosking. It became clear to us that asking researchers to exhaustively describe all their research products with metadata is simply not a sustainable model.  Even when a data producer is generating detailed metadata, they cannot anticipate how someone else in a different context will want to use the data, so sometimes the metadata misses the mark.  That hammered home the point that we really need to build better automated methods to refactor, organize and present all this scientific information in different frames that are broadly useful.

Why is Simiary needed? 

Much has been written about new types of data-driven research and the great potential it holds, but at the same time, it has led to a series of growing pains for institutions and research communities.

Open science and making data available is a wonderful first step, but making data open is not the end game! There’s a massive amount of data being made available, and we need better ways to interactively navigate through it.  We also need to build tools that can increase the likelihood of the serendipitous discovery of new connections.  We generally perceive scientific research as a process following a set of principles: hypothesis making, experimental testing, and theory building. But the real work of science is much more complex and circuitous, and exploratory research without a clearly defined end goal plays a big role.

At Simiary, we are taking on this growing gap between the exploratory aspects of scientific investigation and the increasing flood of information we are generating.  We want to augment the way that researchers search for information in a way that helps them make fresh connections. This same set of tools can also be applied to institutions and content producers.

As the founder of a business, what are you most proud of?

Taking the research I did as a Ph.D. student and turning it into something practical that could make a real impact on how people learn and discover new knowledge. Founding a business is exciting, but it also requires taking risks that you don’t have to face in other situations; it’s incredibly satisfying taking those risks!

How will the Catalyst Grant help further your vision?

The Catalyst Grant provides “smart money”! We get cash, which gives us resources to build more features, but more importantly, we have access to expertise and a network of companies that are under the umbrella of Digital Science. To have the opportunity to learn from and work with a group of companies that strive to make science function more efficiently is brilliant!

As a young start-up, what advice would you give those who have an idea? If you could go back in time and give your pre-startup self one piece of advice, what would it be and why?

We are still a very young start-up so if you ask this question in a few months the answer might be different!  A key piece of advice is to not be too invested in “The Idea.”  Instead, focus on actually building a real, working demo application – people will always respond more positively seeing something that actually works.  The most important thing to do is to talk to other people about your idea as much as possible!  It is easy to get too focused on a particular way of implementing your idea only to find out that it means a lot more to you than it does to other people – especially prospective customers. Having said that, keep your own sense of control over the big picture of what you want to create, because even if you need to modify the details, you want to make sure you continue working on building something that you are passionate about!

What does the future have in store for Simiary? Where do you see Simiary in five years?

We’ve got lots of big plans! We hope that in five years our software will be the go-to place for researchers and students when they have a new idea and are starting to explore the space surrounding their research problem. There’s huge potential to build out our interactive search software to more fully support the collaborative sensemaking tasks that research groups engage in.

We also are set on developing partnerships with large content producers of data and publications. Imagine if you could explore the entire holdings of the British Library, the Library of Congress, and all the outputs of a publishing house – from multiple perspectives? This is our vision for how Simiary’s software will be helping content producers make their collections more discoverable.