Ingest Methods

And Considerations

The output should be a structured data 'container' of assets, possibly zipped, that also references the original source and/or archive.org links.
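One possible shape for such a container, as a minimal sketch only: a zip file holding the assets plus a manifest that records the original source and an archive.org link. The function name `build_container`, the `manifest.json` layout and the `assets/` folder are all assumptions made for illustration; no schema has been defined yet.

```python
import hashlib
import io
import json
import zipfile


def build_container(source_url, archive_url, assets):
    """Pack assets into an in-memory zip with a manifest.json describing them.

    `assets` maps a file name to its raw bytes. The manifest layout below is
    hypothetical, not an agreed schema.
    """
    manifest = {
        "source_url": source_url,    # where the assets were originally found
        "archive_url": archive_url,  # archive.org snapshot link, if any
        "assets": [
            {
                "name": name,
                "sha256": hashlib.sha256(data).hexdigest(),
                "size": len(data),
            }
            for name, data in sorted(assets.items())
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in sorted(assets.items()):
            archive.writestr("assets/" + name, data)
    return buffer.getvalue()
```

Storing a checksum per asset is one way to let a later step verify that a packaged file still matches what was ingested.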

Archive.org has already made an effort to build a decentralised version of the archive, available at dweb.archive.org, which uses IPFS, WebTorrent and Gun.eco.

Re: Browser Functionality

Not yet known.

Process to figure it out:

Define the schema.
Define whether and how to do semantic enrichment of records.
Define how to do entity statements.
Define how to store the files temporarily, then how to package them.
Define how to store them somewhere.
Define how to represent them in a space-time navigator of sorts.

https://github.com/opensemanticsearch/open-semantic-entity-search-api

https://github.com/opensemanticsearch

https://mklab.iti.gr/prophet/install.html

https://github.com/MKLab-ITI/prophet

https://github.com/SkBlaz/tax2vec

Browser Plugin?

Go to a page and press the browser plugin button. A side panel becomes visible and processes the page text for named entities. The plugin should provide:

The ability to identify names and associate them with a graph.
The ability to grab an image and associate it with a graph, entity or person.
The ability to grab a PDF or media file and associate it with its origin and related 'metadata'.
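The text-processing step could be sketched roughly as follows. This is not real named-entity recognition: the regular expression is a deliberately naive stand-in that only picks up runs of two or more capitalised words, and `EntityGraph`, `ingest_page` and the relation names are hypothetical. A real plugin would more likely hand the text to an NER service, such as the open-semantic-search projects linked on this page.

```python
import re
from collections import defaultdict

# Naive stand-in for NER: two or more capitalised words in a row.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def find_candidate_names(text):
    """Return candidate names in order of first appearance, without repeats."""
    seen = []
    for match in NAME_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


class EntityGraph:
    """Maps each entity name to a set of (relation, target) edges."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, entity, relation, target):
        self.edges[entity].add((relation, target))


def ingest_page(graph, page_url, text):
    """Record every candidate name found in `text` as mentioned on `page_url`."""
    names = find_candidate_names(text)
    for name in names:
        graph.link(name, "mentioned_on", page_url)
    return names


graph = EntityGraph()
ingest_page(graph, "https://example.org/story",
            "Ada Lovelace met Charles Babbage in London.")
# A user grabbing an image in the side panel could then attach it by hand:
graph.link("Ada Lovelace", "depicted_in", "https://example.org/ada.jpg")
```

Keeping the origin URL on every edge means a grabbed image, PDF or media file stays tied to the page it came from.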

https://github.com/opensemanticsearch/open-semantic-etl

Get header information, particularly date information.

The ingest step should look for Open Graph protocol (OGP) and schema.org tags.
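As a rough sketch of what that lookup might involve, the example below uses only the Python standard library to read date-bearing HTTP headers, Open Graph `<meta property="og:...">` tags and schema.org data embedded as JSON-LD. The names `MetadataParser` and `extract_metadata` are made up for illustration. It assumes the headers arrive as a plain dict with canonical capitalisation, and it ignores schema.org microdata and RDFa.

```python
import json
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects Open Graph <meta> tags and schema.org JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.og[attrs["property"]] = attrs.get("content") or ""
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD rather than fail the whole page


def extract_metadata(html, headers):
    """Return header dates (ISO 8601), Open Graph tags and JSON-LD blocks."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    dates = {}
    for name in ("Date", "Last-Modified"):
        if name in headers:
            try:
                dates[name] = parsedate_to_datetime(headers[name]).isoformat()
            except (TypeError, ValueError):
                pass  # unparseable date header; leave it out
    return {"dates": dates, "og": parser.og, "json_ld": parser.json_ld}
```

Page-level dates such as `article:published_time` (Open Graph) or `datePublished` (schema.org) can then be compared with the header dates, since the two often disagree.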


Last updated on 3/10/2023
