
Ingest Methods and Considerations

The output should be a structured data 'container' of assets, perhaps zipped, that also references the original source and/or archive.org links.
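Here is a minimal sketch of such a container. It assumes a zip file with a hypothetical `manifest.json` that records the original URL and a Wayback Machine link. The manifest field names and the `build_container` / `wayback_url` helpers are illustrative only, not an agreed format.

```python
import io
import json
import zipfile
from datetime import datetime, timezone


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine URL for the capture nearest to `when`."""
    # web.archive.org/web/<14-digit timestamp>/<url> resolves to the closest capture.
    return f"https://web.archive.org/web/{when.strftime('%Y%m%d%H%M%S')}/{original_url}"


def build_container(source_url: str, assets: dict[str, bytes], captured: datetime) -> bytes:
    """Zip assets together with a manifest that points back to their origin."""
    manifest = {  # hypothetical field names
        "source_url": source_url,
        "archive_url": wayback_url(source_url, captured),
        "captured_at": captured.isoformat(),
        "assets": sorted(assets),
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in assets.items():
            zf.writestr(f"assets/{name}", data)
    return buf.getvalue()


if __name__ == "__main__":
    blob = build_container(
        "https://example.org/post",
        {"page.html": b"<html></html>"},
        datetime(2023, 3, 10, tzinfo=timezone.utc),
    )
    print(len(blob), "bytes")
```

Keeping the archive link next to the source URL means the container stays useful even if the original page later disappears.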

Archive.org has already made an effort to build a decentralised version of the archive, available at dweb.archive.org, which uses IPFS, WebTorrent and Gun.eco.

Re: BrowserFunctionality

Not decided yet.

Process for figuring it out:

  1. Define a schema.
  2. Define whether and how to do semantic enrichment of records.
  3. Define how to make entity statements.
  4. Define how to store the files temporarily, then how to package them.
  5. Define where and how to store them long-term.
  6. Define how to represent them in a space-time navigator of sorts.
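Steps 1, 3 and 6 could start from something like the sketch below. The `Record` and `Statement` classes, their field names, and the (lat, lon) place representation are assumptions for illustration; a real schema might instead follow schema.org types.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Statement:
    """A subject-predicate-object entity statement (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Record:
    """One ingested asset plus what is needed to place it in space and time."""
    record_id: str
    source_url: str
    media_type: str
    published: Optional[date] = None             # when the content was published
    place: Optional[tuple[float, float]] = None  # (lat, lon), if known
    statements: list[Statement] = field(default_factory=list)


def in_window(records: list[Record], start: date, end: date) -> list[Record]:
    """Filter records to a time window, as a space-time navigator might."""
    return [r for r in records if r.published is not None and start <= r.published <= end]
```

Records with no known date are left out of any time window instead of being guessed, so undated assets would need a separate view.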

https://github.com/opensemanticsearch/open-semantic-entity-search-api

https://github.com/opensemanticsearch

https://mklab.iti.gr/prophet/install.html

https://github.com/MKLab-ITI/prophet

https://github.com/SkBlaz/tax2vec

Browser Plugin?

Go to a page and press the browser plugin button; a side panel becomes visible. The plugin processes the page text for named entities, with the ability to identify names and associate them with a graph. It can also grab an image and associate it with a graph / entity / person, and grab a PDF or media file and associate it with its origin and related 'metadata'.
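The entity-to-graph association can be prototyped as in the sketch below. The capitalised-word regex is a crude stand-in for a real NER model, such as the projects under the GitHub named-entity-recognition topic. `AssociationGraph` is a hypothetical in-memory structure, not part of any library.

```python
import re
from collections import defaultdict
from typing import Optional

# Crude stand-in for a real NER model: runs of two or more capitalised words.
# It misses single-word names and catches sentence-initial phrases.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def candidate_entities(text: str) -> set[str]:
    """Return candidate entity names found in page text."""
    return set(CANDIDATE.findall(text))


class AssociationGraph:
    """Hypothetical undirected graph linking pages, entities and grabbed assets."""

    def __init__(self) -> None:
        self.edges = defaultdict(set)  # node -> set of neighbouring nodes

    def link(self, a: str, b: str) -> None:
        self.edges[a].add(b)
        self.edges[b].add(a)

    def add_page(self, page_url: str, text: str) -> set[str]:
        """Spot entities in the page text and link each one to the page."""
        entities = candidate_entities(text)
        for entity in entities:
            self.link(page_url, f"entity:{entity}")
        return entities

    def add_asset(self, page_url: str, asset_url: str, entity: Optional[str] = None) -> None:
        """Link a grabbed image/PDF/media file to its origin page and, optionally, an entity."""
        self.link(page_url, asset_url)
        if entity is not None:
            self.link(asset_url, f"entity:{entity}")
```

Every asset is always linked to its origin page, so provenance is preserved even when no entity is attached.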

  1. Use a web scraper, e.g. an existing Chrome extension scraper from GitHub: https://www.google.com/search?q=github+chrome+extension+scraper

  2. https://github.com/topics/named-entity-recognition

  3. https://github.com/topics/named-entity-recognition?l=javascript

  4. https://freme-project.github.io/api-doc/full.html

  5. https://github.com/freme-project
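For the scraper step, Python's standard-library `html.parser` is enough to sketch how image, PDF and media links on a page could be collected. Each link is resolved against the page URL, so every grabbed file keeps a pointer to its origin. `AssetCollector`, `collect_assets` and the extension list are assumptions. A browser extension would do the equivalent in JavaScript against the live DOM.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: which link targets count as "PDF or media"; extend as needed.
MEDIA_EXTENSIONS = (".pdf", ".mp3", ".mp4", ".webm", ".ogg")


class AssetCollector(HTMLParser):
    """Collect image and document/media URLs from a page as absolute URLs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.files = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and a.get("src"):
            self.images.append(urljoin(self.base_url, a["src"]))
        elif tag == "a" and a.get("href"):
            url = urljoin(self.base_url, a["href"])
            # Ignore the query string when checking the file extension.
            if url.lower().split("?")[0].endswith(MEDIA_EXTENSIONS):
                self.files.append(url)


def collect_assets(html: str, base_url: str) -> tuple[list[str], list[str]]:
    """Return (image URLs, PDF/media URLs) found in `html`."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images, parser.files
```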

https://github.com/opensemanticsearch/open-semantic-etl

Get header information, particularly date information.

This should look for Open Graph protocol (OGP) and schema.org tags.
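Here is a sketch of the header and date extraction. It assumes the common conventions: Open Graph `<meta property="og:...">` tags (plus `article:published_time` for the article type), schema.org JSON-LD in `<script type="application/ld+json">`, and schema.org microdata `itemprop="datePublished"`. `HeaderMetadata`, `published_date` and the OGP, then JSON-LD, then microdata precedence are illustrative choices, not a standard.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class HeaderMetadata(HTMLParser):
    """Collect OGP meta tags, schema.org microdata values and JSON-LD blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.og = {}         # e.g. {"og:title": ..., "article:published_time": ...}
        self.itemprops = {}  # e.g. {"datePublished": ...}
        self.jsonld = []     # parsed JSON-LD documents
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property") or ""
        if tag == "meta" and prop.startswith(("og:", "article:")) and a.get("content"):
            self.og[prop] = a["content"]
        value = a.get("content") or a.get("datetime")
        if a.get("itemprop") and value:
            self.itemprops[a["itemprop"]] = value
        if tag == "script" and a.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.jsonld.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is common in the wild; skip it


def _jsonld_items(block):
    """Flatten a JSON-LD document (object, list, or @graph) into items."""
    if isinstance(block, list):
        return block
    if isinstance(block, dict):
        return block.get("@graph", [block])
    return []


def published_date(html: str) -> Optional[str]:
    """Return the first publication date found, or None."""
    p = HeaderMetadata()
    p.feed(html)
    p.close()
    if "article:published_time" in p.og:
        return p.og["article:published_time"]
    for block in p.jsonld:
        for item in _jsonld_items(block):
            if isinstance(item, dict) and item.get("datePublished"):
                return item["datePublished"]
    return p.itemprops.get("datePublished")
```

Dates are returned as the raw strings found in the page; normalising them (e.g. to ISO 8601) would be a separate step.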

Last updated on 3/10/2023