Click on https://ecacorpus.eu/misc/ecac.zip
Unzip and click on index.html, the only file in the root directory. No installation is required.
The zip file contains
only data, no programs;
it is a
Portable Web Package (PWP),
essentially a directory structure viewable with a web browser.
The ECA Corpus can also be explored online at
https://ecacorpus.eu;
the downloaded and online versions are identical as long as they have the same
timestamp.
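The claim that index.html is the only file in the root directory can be checked without unzipping. The following is a minimal sketch using Python's standard zipfile module; the function name root_level_files is hypothetical, and it assumes the archive stores index.html directly at the top level rather than inside a wrapper folder.

```python
import zipfile


def root_level_files(archive):
    """Return the names of files stored directly in the archive root.

    `archive` may be a path or a file-like object. Entries inside
    sub-directories (names containing "/") are skipped.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.is_dir() and "/" not in info.filename
        ]
```

Calling root_level_files("ecac.zip") on the downloaded file should then return a list with the single entry index.html, if the assumption about the archive layout holds.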
A single zip file can contain the full structured data of an organisation such as ECA since its creation in 1977; ECA is a good example for this exercise. The zip file contains: tabular data, full text of the files (eventually), and generated reports with graphics. Anyone can simply unzip the file and:
Deeper analysis using both current and future technologies can be envisaged. For example:
A dossier represents a publication or another type of document. Each dossier is identified by a unique dossier number. There are two main components:
The Register is the list of dossiers. It is a table containing the dossier number, title, and several additional fields. Once the Register has been cleaned and completed, its data will be integrated into the Store. Currently, the https://ecacorpus.eu website is generated from the Register.
The Store contains the full text of each dossier in all available languages and formats.
It is organized as a directory structure with a root folder
named store, which contains a set of numbered folders;
each folder corresponds to one dossier.
At present, the Store consists of an empty skeleton directory structure,
ready to receive full-text files in all languages and formats.
All texts should be converted to plain text to facilitate further
processing and ensure long-term preservation.
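The Store layout described above (a root folder named store with one numbered folder per dossier) could be generated along these lines. This is only a sketch under stated assumptions: the function name build_store_skeleton is hypothetical, and the dossier numbers are passed in as plain strings rather than read from the actual Register, whose file format is not specified here.

```python
from pathlib import Path
import tempfile


def build_store_skeleton(root, dossier_numbers):
    """Create root/store/<dossier number>/ for every dossier number.

    Returns the list of dossier folders, in the order given.
    Existing folders are left untouched.
    """
    store = Path(root) / "store"
    folders = []
    for number in dossier_numbers:
        folder = store / str(number)
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    # Demonstration in a temporary directory with made-up dossier numbers.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in build_store_skeleton(tmp, ["1", "2", "3"]):
            print(folder.relative_to(tmp))
```

Because mkdir is called with exist_ok=True, the function can be re-run after new dossiers are added to the Register without disturbing folders that already hold text files.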
If a dossier corresponds to a section of a larger document, only the relevant sections should be included. For example, in the case of the Official Journal C139/1979, only the material from page 15 concerning ECA should be included. This issue of the Official Journal contains six opinions, so each opinion should be copied to the appropriate dossier.
Corrigenda must not be counted as separate reports; they are only corrections of the original reports. There are seven corrigenda in the ECA Corpus database, so the distortion is small.
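When counting reports from the Register, corrigenda can be filtered out before the count is taken. The sketch below assumes each Register row is available as a dictionary with a "type" field whose value is "corrigendum" for corrigenda; both the field name and the value are hypothetical, since the actual Register columns are not listed here.

```python
def count_reports(register_rows):
    """Count Register rows, skipping those marked as corrigenda.

    Assumes a hypothetical "type" field; rows without it are counted.
    """
    return sum(
        1
        for row in register_rows
        if str(row.get("type", "")).strip().lower() != "corrigendum"
    )


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"dossier": "1", "type": "report"},
        {"dossier": "2", "type": "Corrigendum"},
        {"dossier": "3", "type": "report"},
    ]
    print(count_reports(rows))
```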
Might not be up to date.