Crawling process

SCIARY uses dedicated software to crawl web data. Through crawling, SCIARY discovers new scientific papers and adds them to its scientific database. SCIARY uses its own crawling algorithms to detect websites that may contain scientific data and include them in the crawl.

The crawling process begins with a search algorithm that finds websites with scientific content. The search covers websites running content management systems designed for the free distribution of scientific papers (for example, OJS, DSpace, and others). SCIARY visits the main page of each website, detects links that may lead to scientific content, and adds them to the crawl list. Information identified as scientific with a certain level of probability is added to the SCIARY index.
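The link detection step described above can be illustrated with a minimal sketch. This is not SCIARY's actual algorithm, which is not published here. The URL fragments in SCIENTIFIC_HINTS, the LinkCollector class, the candidate_links function, and the example page are all assumptions made for illustration, loosely based on the URL patterns commonly seen on OJS and DSpace sites.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical URL fragments that often appear in OJS and DSpace links.
# These are illustrative assumptions, not SCIARY's real detection rules.
SCIENTIFIC_HINTS = ("/article/view/", "/issue/view/", "/handle/")


class LinkCollector(HTMLParser):
    """Collects absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, value))


def candidate_links(html, base_url):
    """Return links that look scientific, in page order, without duplicates."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen = set()
    crawl_list = []
    for link in collector.links:
        if link not in seen and any(hint in link for hint in SCIENTIFIC_HINTS):
            seen.add(link)
            crawl_list.append(link)
    return crawl_list


# Made-up main page used only to demonstrate the function.
EXAMPLE_PAGE = """
<a href="/index.php/journal/article/view/42">Paper</a>
<a href="/about/contact">Contact</a>
<a href="https://repo.example.org/handle/123456789/7">Thesis</a>
<a href="/index.php/journal/article/view/42">Paper again</a>
"""

crawl_list = candidate_links(EXAMPLE_PAGE, "https://journal.example.org/")
```

In this sketch the contact link is skipped and the repeated article link is added to the crawl list only once. A real crawler would likely combine such URL hints with an analysis of the page content before deciding what to index.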

How does SCIARY access websites?

In most cases, SCIARY uses a sitemap file, which contains the list of web pages to crawl. Webmasters can modify their sitemap files to manage the crawl process remotely. The crawl also follows the website's robots.txt file, which webmasters can use to block specific links that do not lead to open information.
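As a rough illustration of how a crawler can combine a sitemap with robots.txt, the sketch below reads page addresses from a sitemap and keeps only those that robots.txt allows. It uses the Python standard library only. The function names, the "ExampleCrawler" user agent, and the example files are assumptions made for illustration and do not describe SCIARY's internal code or its real user agent string.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the page addresses listed in a sitemap <urlset> document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def allowed_urls(urls, robots_txt, user_agent="ExampleCrawler"):
    """Keep only the addresses that robots.txt permits for this user agent.

    "ExampleCrawler" is a placeholder name, not SCIARY's real user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Made-up files used only to demonstrate the functions.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://journal.example.org/article/view/1</loc></url>
  <url><loc>https://journal.example.org/admin/settings</loc></url>
</urlset>"""

EXAMPLE_ROBOTS = """User-agent: *
Disallow: /admin/
"""

listed = sitemap_urls(EXAMPLE_SITEMAP)
crawlable = allowed_urls(listed, EXAMPLE_ROBOTS)
```

Here the sitemap lists two pages, but the Disallow rule removes the address under /admin/, so only the article page remains in the crawl. A webmaster can therefore control the crawl from both sides: the sitemap states what to visit, and robots.txt states what to leave alone.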

Which data is saved in the SCIARY database?

During the crawling process, SCIARY detects and extracts information that will be useful to scientists around the world in their future research. SCIARY saves this information in a dedicated database. The saved data includes descriptions of scientific and educational papers and documents.

NOTE. SCIARY does not save any files in its database. All downloaded files, except some pictures, are deleted after the indexing process. SCIARY saves only a link to the scientific content.

One of SCIARY's main tasks is to connect users with scientific resources across the Internet. In this respect, SCIARY does not host scientific data: it redirects users to the website that hosts the information. In this way, SCIARY increases user interaction with that web resource.

