Ingest Methods and Considerations
The output should be a structured data 'container' of assets, perhaps zipped, that also references the original source and/or archive.org links.
Archive.org has already made an effort to build a decentralised version of the archive, available at dweb.archive.org, which uses IPFS, WebTorrent and Gun.eco.
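One way the output container described above might look is sketched here, assuming a plain zip file with a `manifest.json` next to the assets. The function name `build_container`, the manifest keys and the `assets/` folder are all hypothetical; nothing about the real layout has been decided.

```python
import hashlib
import io
import json
import zipfile


def build_container(assets, source_url, archive_url=None):
    """Pack assets into an in-memory zip with a manifest.json.

    assets: dict mapping a file name to its bytes.
    source_url / archive_url: where the material came from
    (hypothetical manifest fields, not a settled schema).
    """
    manifest = {
        "source_url": source_url,
        "archive_url": archive_url,
        "files": [
            {
                "name": name,
                "sha256": hashlib.sha256(data).hexdigest(),
                "size": len(data),
            }
            for name, data in sorted(assets.items())
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        # The manifest keeps the link back to the original source.
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in sorted(assets.items()):
            zf.writestr("assets/" + name, data)
    return buffer.getvalue()
```

The checksum per file is only there so a copy fetched later (for example from a decentralised mirror) can be compared with what was ingested.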
I don't know the approach yet.
Process to figure it out:
- define the schema
- define whether / how to do semantic enrichment of records
- define how to do entity statements
- define how to store the files temporarily, then how to package them
- define how to store them somewhere
- define how to represent them in a space-time navigator of sorts
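As a starting point for the schema step in the list above, a record could be sketched as a small dataclass. Every field name below (`published`, `place`, `entities`, `files`) is a hypothetical placeholder chosen so that a record carries something for the time axis, the space axis and the entity graph; it is not a settled schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class IngestRecord:
    # All field names are hypothetical placeholders.
    source_url: str
    title: str
    published: Optional[str] = None  # ISO 8601 date, for the time axis
    place: Optional[str] = None      # free-text place name, for the space axis
    entities: List[str] = field(default_factory=list)  # names found in the text
    files: List[str] = field(default_factory=list)     # paths of packaged assets


def to_json(record: IngestRecord) -> str:
    """Serialise a record so it can be stored next to its files."""
    return json.dumps(asdict(record), sort_keys=True)
```

For example, `to_json(IngestRecord(source_url="https://example.org/a", title="A"))` gives a JSON object with empty `entities` and `files` lists, ready to be enriched later.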
https://github.com/opensemanticsearch/open-semantic-entity-search-api
https://github.com/opensemanticsearch
https://mklab.iti.gr/prophet/install.html
https://github.com/MKLab-ITI/prophet
https://github.com/SkBlaz/tax2vec
Browser Plugin?
Go to a page and press the browser plugin button. A side panel becomes visible and processes the page text for named entities. Abilities needed:
- identify names and associate them with a graph
- grab an image and associate it with a graph / entity / person
- grab a PDF or media file and associate it with its origin and related 'metadata'
Use a web scraper: https://www.google.com/search?q=github+chrome+extension+scraper
https://github.com/topics/named-entity-recognition?l=javascript
https://github.com/opensemanticsearch/open-semantic-etl
Get header information, particularly date information.
The ingest should look for OGP (Open Graph protocol) and schema.org tags.
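A minimal sketch of that lookup, using only Python's standard `html.parser`, is given here. It assumes the dates of interest are the Open Graph `article:published_time` meta tag and the schema.org `datePublished` property inside a JSON-LD block; the names `MetaTagParser` and `extract_dates` are made up for this example, and schema.org microdata or RDFa markup is not handled.

```python
import json
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect og:/article: meta tags and schema.org JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.meta = {}      # e.g. {"og:title": "..."}
        self.json_ld = []   # parsed application/ld+json payloads
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and key.startswith(("og:", "article:")) and content is not None:
                self.meta[key] = content
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD rather than fail the whole page


def extract_dates(html):
    """Return (all og/article meta tags, the date fields found)."""
    parser = MetaTagParser()
    parser.feed(html)
    dates = {}
    if "article:published_time" in parser.meta:
        dates["article:published_time"] = parser.meta["article:published_time"]
    for block in parser.json_ld:
        if isinstance(block, dict) and "datePublished" in block:
            dates["datePublished"] = block["datePublished"]
    return parser.meta, dates
```

Keeping both date sources side by side, rather than picking one, leaves the choice of which to trust for a later step.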