kep:data_container [2017/02/21 15:06] (current)
andreeab created
The Data Container (DC) is a web platform, accessible through a regular browser, that lets users search massive collections of documents, either crawled or scraped from public websites or uploaded manually by the users themselves.
====== Infrastructure ======
DC is built on top of [[https://pudo/aleph|Aleph]], a visual tool for exploring large datasets. Under the hood are a few key components:
  * an **Elasticsearch** cluster that maintains fast indexes and handles all search and filtering operations
  * a set of crawlers and scrapers, some built into Aleph and some triggered externally
  * **celeryd**, the Celery worker daemon, a queue runner that makes sure all requests are executed properly
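
To illustrate how the search and filtering operations might map onto Elasticsearch, here is a minimal sketch that builds a query body combining a full-text clause with exact-value filters. The function name ''build_search_body'' and the field names used in the example (''source'', ''language'') are assumptions made for illustration; they are not taken from Aleph or from the DC code base.

```python
from typing import Any, Dict, Optional


def build_search_body(
    text: str,
    filters: Optional[Dict[str, str]] = None,
    size: int = 20,
) -> Dict[str, Any]:
    """Build an Elasticsearch query body: full-text search plus exact filters.

    The keys of `filters` are hypothetical index fields, not DC's real schema.
    """
    # Each filter becomes a non-scoring `term` clause; sorting the items
    # keeps the generated body deterministic.
    filter_clauses = [
        {"term": {field: value}}
        for field, value in sorted((filters or {}).items())
    ]
    return {
        "query": {
            "bool": {
                # Scored full-text part of the search.
                "must": [{"simple_query_string": {"query": text}}],
                # Exact-match restrictions that do not affect scoring.
                "filter": filter_clauses,
            }
        },
        "size": size,
    }


if __name__ == "__main__":
    print(build_search_body("offshore company", {"source": "uploads"}))
```

The resulting dictionary could be passed as the request body of a search call. Keeping the filters in the ''filter'' clause rather than in ''must'' means they narrow the result set without influencing relevance scores.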
====== Key features ======
  * fast search through large sets of documents
  * intuitive search and filtering interface
  * ability to add your own document sources **and set a privacy level** for each, making it either public or shared only with selected platform members
  * integration with DocumentCloud (Dropbox support is also on the roadmap)
  * OAuth authentication (e.g. Twitter, Google)
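
The privacy level described above (public, or shared only with selected members) can be sketched as a small data structure with a visibility check. This is an illustrative model only: the ''DocumentSource'' class, its fields and the ''can_view'' function are hypothetical names, and the rule that an owner can always see their own source is an assumption.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DocumentSource:
    """Hypothetical model of a user-added document source."""
    name: str
    owner: str
    public: bool = False
    # Platform members the source is shared with when it is not public.
    shared_with: Set[str] = field(default_factory=set)


def can_view(source: DocumentSource, user: str) -> bool:
    """Return True if `user` may see documents from `source`."""
    if source.public:
        return True
    # Assumption: the owner always keeps access to their own source.
    return user == source.owner or user in source.shared_with


if __name__ == "__main__":
    leaks = DocumentSource("leaks", owner="ana", shared_with={"bogdan"})
    print(can_view(leaks, "bogdan"), can_view(leaks, "carol"))
```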
====== Extra features ======
In addition to Aleph's own features, we are developing a few more:
  * **ability to upload your own datasets** (PDF only at first; more formats are planned)
  * **synchronized split search windows**: no more cluttering the UI with filters, keywords, search results and previews! We are designing an interface that places the search query and the results in separate windows (optional, can be turned off), which makes searching across two screens easier.
  * **batch search** by uploading a file containing multiple queries (one per line). The results are rendered all at once in vertical UI tabs (for now), eliminating the need to repeat the same searches (e.g. when investigating or monitoring a set of keywords, or when feeding the system with search results from another source).
  * integrations with [[owncloud|ownCloud]] and FTP resources
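
The batch search idea (one query per line, all results returned together) can be sketched roughly as follows. This is not the DC implementation: ''parse_queries'' and ''batch_search'' are hypothetical helpers, the ''search'' callable stands in for whatever actually queries the index, and skipping blank lines and duplicate queries is an assumption about the desired behaviour.

```python
from typing import Callable, Dict, List


def parse_queries(text: str) -> List[str]:
    """Split an uploaded file's text into queries, one per line.

    Blank lines and repeated queries are skipped (an assumed behaviour);
    the original order of first occurrences is preserved.
    """
    seen = set()
    queries = []
    for line in text.splitlines():
        query = line.strip()
        if query and query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


def batch_search(
    text: str, search: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Run every query once and collect the results per query.

    Each key of the returned dict could back one result tab in the UI.
    """
    return {query: search(query) for query in parse_queries(text)}


if __name__ == "__main__":
    # Stand-in search function used only for demonstration.
    def fake_search(query: str) -> List[str]:
        return [f"document mentioning {query}"]

    uploaded = "offshore\n\nshell company\noffshore\n"
    for query, hits in batch_search(uploaded, fake_search).items():
        print(query, hits)
```

Because the result is keyed by query and keeps the file's order, a UI could render one tab per key without running any search twice.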
kep/data_container.txt · Last modified: 2017/02/21 15:06 by andreeab