How-to · 7 min read

How researchers manage web references

Reference managers cover papers; web sources fall through the cracks. A workflow for researchers who collect across blogs, preprints, datasets, and code.

Researchers have Zotero, Mendeley, and EndNote for citations. They have nothing equally polished for the web sources that don't look like papers: blog posts, preprints in motion, dataset pages, code repositories, conference notes. This guide describes a workflow that closes the gap.

The two kinds of references

  • Citable. Things with a DOI, an author list, a publication. Reference managers handle these well.
  • Trackable. Things you want to find again but won't cite directly. Reference managers handle these badly, because the metadata is sparse and the citation discipline is overhead.

A bookmark archive is the right home for the second category. The trick is to make it talk to the first category cleanly.

The workflow

1. Capture in the right tool

Citable papers go straight into Zotero (or similar) with the browser connector. Everything else (talks, blog posts, dataset READMEs, threads, GitHub repos, code snippets) goes into the bookmark archive via the browser extension.

2. Tag with research topics, not folders

A folder structure that mirrors your projects gets stale within a year. Topics that mirror your research interests compose across projects and survive them. A bookmark on contrastive learning tagged contrastive-learning and self-supervised stays useful even after the original project ends.
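To make the difference concrete, here is a minimal sketch of tag-based retrieval. The Bookmark class, its fields, and the example URLs are hypothetical and do not reflect any particular tool's data model; the point is only that a tag query cuts across projects, where a folder lookup would not.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Bookmark:
    # Hypothetical record shape, for illustration only.
    url: str
    title: str
    tags: Set[str] = field(default_factory=set)
    project: Optional[str] = None  # the project that prompted the capture


def with_tags(bookmarks: List[Bookmark], *tags: str) -> List[Bookmark]:
    """Return the bookmarks that carry every requested tag, whatever their project."""
    wanted = set(tags)
    return [b for b in bookmarks if wanted <= b.tags]


archive = [
    Bookmark("https://example.org/simclr-notes", "Notes on contrastive objectives",
             {"contrastive-learning", "self-supervised"}, project="thesis-ch2"),
    Bookmark("https://example.org/augmentation-thread", "Thread on augmentation choices",
             {"self-supervised", "data-augmentation"}, project="postdoc-vision"),
    Bookmark("https://example.org/gpu-billing", "Cluster billing FAQ",
             {"infrastructure"}, project="thesis-ch2"),
]

# One topic query surfaces material from two different projects.
for bookmark in with_tags(archive, "self-supervised"):
    print(bookmark.project, "|", bookmark.title)
```

The project field is kept as plain metadata here, so a finished project never hides a bookmark from a later topic search.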

3. Use collections for context

Keep a separate collection per major project, so that each project's bookmark file is a self-contained artefact. When the paper comes out, the collection can be archived as supplementary material: every link you consulted, in one file.

4. Cross-link to your reference manager

For the small fraction of bookmarks that become citations, include the Zotero entry key in the bookmark description. When you eventually write the paper, a quick search across bookmarks for the entry key surfaces every web source that fed the paper.
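The search step can be sketched in a few lines, assuming bookmarks have been exported as a list of records with url and description fields, and assuming a personal convention of writing each key as zotero:KEY in the description. Both the record shape and the zotero: prefix are illustrative assumptions, not something Zotero or any bookmark tool prescribes.

```python
import re
from typing import Dict, List

# Assumed convention: each linked entry appears in the description as "zotero:<KEY>".
KEY_PATTERN = re.compile(r"\bzotero:([A-Za-z0-9_-]+)")


def index_by_entry_key(bookmarks: List[dict]) -> Dict[str, List[str]]:
    """Map each entry key to the URLs of the bookmarks whose description mentions it."""
    index: Dict[str, List[str]] = {}
    for bookmark in bookmarks:
        for key in KEY_PATTERN.findall(bookmark.get("description", "")):
            index.setdefault(key, []).append(bookmark["url"])
    return index


# Hypothetical export; keys and URLs are made up for the example.
bookmarks = [
    {"url": "https://example.org/dataset-readme",
     "description": "Splits used in the baseline. zotero:AB12CD34"},
    {"url": "https://example.org/talk-slides",
     "description": "Talk that motivated section 3. zotero:AB12CD34 zotero:ZX98YW76"},
    {"url": "https://example.org/unrelated-post",
     "description": "No citation yet."},
]

index = index_by_entry_key(bookmarks)
print(index["AB12CD34"])
```

A fixed prefix keeps the lookup unambiguous: a bare key could collide with ordinary words in a description, while a prefixed one is easy to match exactly.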

5. Periodic broken-link sweep

Web sources rot. A quarterly pass through the archive's broken links, replacing dead URLs with web.archive.org snapshots, keeps the trail intact. (mnera.io Pro ships a broken-link checker that does this in one pass.)
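For readers who want to run the sweep by hand, the following is a rough sketch using only the Python standard library and the Internet Archive's public availability endpoint. It is not how mnera.io's checker works (the source does not say); the function names are made up for the example, and the dead-link test is deliberately crude.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Optional

WAYBACK_API = "https://archive.org/wayback/available"


def is_dead(url: str, timeout: float = 10.0) -> bool:
    """Treat HTTP errors (4xx/5xx), timeouts, and connection failures as a dead link.

    Crude on purpose: servers that reject HEAD requests will be flagged too,
    so treat the result as a list of candidates to review by hand.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return True


def snapshot_from_response(payload: dict) -> Optional[str]:
    """Pull the closest snapshot URL out of an availability API response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the Wayback Machine for the closest archived copy of url."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{WAYBACK_API}?{query}", timeout=timeout) as response:
        return snapshot_from_response(json.load(response))


def sweep(urls: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {dead_url: snapshot_url_or_None} for every link that looks dead."""
    return {url: find_snapshot(url) for url in urls if is_dead(url)}
```

Calling sweep() on a collection's URLs yields a mapping from each suspect link to a replacement candidate, or None where no snapshot exists. Reviewing that mapping before rewriting anything avoids replacing a link that was only temporarily unreachable.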

Why linked data matters here

Research archives are durable assets. The PhD bookmark archive should still be readable in twenty years; the early-postdoc one in forty. Vendor lock-in is unacceptable on that horizon.

Among consumer bookmark formats, W3C linked data stored in a Solid pod is the only one with a credible 20-year guarantee. Turtle files are plain text; the vocabularies are W3C standards; the storage protocol is open. The tool you use today is replaceable; the archive isn't.
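To show how little machinery a plain-text archive needs, here is a sketch that writes one bookmark as Turtle using nothing but string formatting. The choice of vocabularies (the bookmark namespace at w3.org plus Dublin Core terms) and the use of plain literals for tags are assumptions made for the example; the schema mnera.io actually writes is not described here and may differ.

```python
from typing import Iterable

# Assumed vocabularies for this sketch; not necessarily what any given tool emits.
PREFIXES = (
    "@prefix bookm: <http://www.w3.org/2002/01/bookmark#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def turtle_string(value: str) -> str:
    """Quote a Python string as a Turtle literal, escaping backslashes, quotes, and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def bookmark_to_turtle(local_id: str, url: str, title: str,
                       created: str, tags: Iterable[str]) -> str:
    """Serialise one bookmark; url is assumed to already be a valid IRI."""
    lines = [
        f"<#{local_id}> a bookm:Bookmark ;",
        f"    dct:title {turtle_string(title)} ;",
        f'    dct:created "{created}"^^xsd:dateTime ;',
    ]
    for tag in tags:
        # Simplification: tags as plain literals rather than topic resources.
        lines.append(f"    dct:subject {turtle_string(tag)} ;")
    lines.append(f"    bookm:recalls <{url}> .")
    return "\n".join(lines)


document = PREFIXES + "\n" + bookmark_to_turtle(
    "simclr-notes",
    "https://example.org/simclr-notes",
    'Notes on "contrastive" objectives',
    "2024-03-01T09:30:00Z",
    ["contrastive-learning", "self-supervised"],
)
print(document)
```

The output is readable in any text editor and parseable by any Turtle-aware library, which is the property the durability argument rests on.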

Tip: Tag every research bookmark on the way in, even if it's a single rough topic. Retroactive tagging never happens.

A research-grade bookmark archive in your own pod

mnera.io stores every link as linked data; supports per-project collections, full-text search, and broken-link checking; and exports to HTML, Markdown, JSON, or Turtle at any time.