We made Elicit more affordable by applying new research methods

We launched an update to Elicit's Abstract summary!

The Abstract summary is one of the most visible features in Elicit. Many of you tell us in feedback surveys that this concise, question-relevant summary of the abstract is incredibly helpful.

Previously, we used a fine-tuned GPT-3 davinci model to generate these summaries. This davinci model is one of the biggest language models available. At first, the off-the-shelf model wrote summaries that were too inaccurate and hard to understand. So a little over a year ago, we created our own dataset of abstract summaries with highly vetted contractors and fine-tuned the model on that dataset.

Unfortunately, using this model became very expensive as Elicit grew. We tried multiple times to train a smaller model instead. But smaller models compromised accuracy in ways we didn't feel comfortable with, given the importance of this feature. To maintain accuracy with a smaller model, we would have to spend a lot of money collecting more examples to add to our dataset.

Then, 1 month ago, the AI lab Anthropic shared a research paper where they used language models to give feedback on tasks, instead of human contractors. Within 4 days, our ML intern Charlie implemented this technique in a prototype!

A few things about this work are really exciting:

It's cool to be able to turn a research idea into a product improvement within days, and to make research better for almost 150,000 people as a result. We love virtuous research <> product cycles.
We're now using an open source model instead of GPT-3, which adds to the diversity of models Elicit uses.
Because we can use open source models, this new model costs 10% of what the previous model cost! Without compromising the quality of the summaries too much, we were able to bring down costs significantly. This helps us keep Elicit much more accessible for all of you.*
This work applies language models in a whole new context: to evaluate the work of other language models. As capabilities of language models scale, our ability to evaluate their work will need to scale too.

Charlie's summary of the work is on Twitter here. Check it out to understand the details & give him kudos!

Best,

Jungwon

*We do estimate that this new model will be noticeably worse in ~10% of cases. If you start to see issues, please share feedback in the app, by email, or on Slack.
