STATOPERATOR service features and parameters
We provide Internet-wide scan data on a daily basis. Our system analyzes the information accessible on the main pages of all registered domains worldwide.
- On statoperator.com we publish insights daily, approximately between 16:00 and 17:00 UTC
- We share some dashboards publicly, politicians for example, but we do not share everything we have
1) Regular monitoring and alerts
Every day we take a snapshot of the main pages of all existing and active domains: more than 450,000,000 domains across all existing TLDs (top-level domains).
Every week we update our domain registry with hundreds of thousands of newly registered domains, covering all accessible domains in the world. We provide access to our snapshots and the ability to analyse them with any regular expression. Page content, HTML code, and HTTP headers are all open to dig into.
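To give a feel for regular-expression analysis of snapshots, here is a minimal Python sketch. The `Snapshot` record, its field names, and the `search_snapshots` helper are assumptions made for illustration only; they are not the actual STATOPERATOR data format or API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One hypothetical snapshot record; field names are illustrative only."""
    host: str
    headers: dict = field(default_factory=dict)
    html: str = ""


def search_snapshots(snapshots, pattern, where="html"):
    """Return the hosts whose HTML (or HTTP headers) match the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE | re.MULTILINE)
    hits = []
    for snap in snapshots:
        if where == "headers":
            # Flatten headers into "Name: value" lines so one regex can scan them.
            haystack = "\n".join(f"{k}: {v}" for k, v in snap.headers.items())
        else:
            haystack = snap.html
        if rx.search(haystack):
            hits.append(snap.host)
    return hits


snapshots = [
    Snapshot("shop.example", {"Server": "nginx/1.24.0"}, "<title>Best shoes</title>"),
    Snapshot("blog.example", {"Server": "Apache/2.4.57"}, "<title>My blog</title>"),
]

print(search_snapshots(snapshots, r"<title>[^<]*shoes"))            # ['shop.example']
print(search_snapshots(snapshots, r"^Server:\s*nginx", "headers"))  # ['shop.example']
```

The same pattern-based approach applies whether the target is page text, markup, or response headers; only the string being scanned changes.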
2) Anti-phishing and brand protection
Be aware of threats directed against you: scam sites, stolen identities, brand copies and clones, and other black-hat practices of competitors.
Every day we calculate a “fingerprint” extracted from the HTML templates of main pages and from n-grams of the page content. This procedure is repeated daily and covers all active domains that respond correctly on port 80 or 443.
The anti-phishing solution generates reports with clusters of cloned or mass-produced sites that are similar to your sites or information systems.
- We can adjust update frequencies and apply different policies to different types of domains (new domain, blacklist, whitelist, domain karma, etc.)
- We let you tune the “similarity level”, which affects the precision/recall values, so you can fit it to your needs.
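One common way to build such a fingerprint is to shingle the HTML tag sequence and the page words into n-grams and compare the resulting sets with Jaccard similarity. The Python sketch below follows that approach; it is an assumption for illustration, not the production algorithm, and all function names and sample pages are hypothetical.

```python
import re


def shingles(tokens, n, kind):
    """Set of n-grams over a token list, labelled so tag and word n-grams never collide."""
    return {(kind,) + tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def fingerprint(html, n=3):
    """Combine template n-grams (opening tag names) with content n-grams (visible words)."""
    tags = [t.lower() for t in re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)]
    text = re.sub(r"<[^>]*>", " ", html)          # crude tag stripping
    words = re.findall(r"\w+", text.lower())
    return shingles(tags, n, "tag") | shingles(words, n, "word")


def jaccard(a, b):
    """Jaccard similarity of two sets, in the range 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_clones(reference_html, candidates, similarity_level=0.8):
    """Return hosts whose fingerprint is at least `similarity_level` close to the reference."""
    ref = fingerprint(reference_html)
    return [host for host, html in candidates.items()
            if jaccard(ref, fingerprint(html)) >= similarity_level]


brand = ("<html><body><div><h1>Acme Bank</h1><form><input><input>"
         "<button>Log in</button></form></div></body></html>")
candidates = {
    "acme-login.example": brand.replace("Log in", "Sign in"),
    "blog.example": "<html><body><p>Hello world from my blog</p></body></html>",
}

print(find_clones(brand, candidates, similarity_level=0.5))  # ['acme-login.example']
print(find_clones(brand, candidates, similarity_level=0.8))  # []
```

Lowering the threshold catches more clones (higher recall) at the cost of more false alarms (lower precision); raising it does the opposite, which is the trade-off the “similarity level” setting exposes.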
3) Preventive protection against threats in the clearweb, the .onion zone, or Telegram
Because this data is highly sensitive, filling in this form is mandatory. Please also note that a non-disclosure agreement is required before any kind of conversation.
4) N-gram based analysis – investigate your content
Our crawler was designed from the outset to work with n-grams, so it can easily collect and analyse them on the fly.
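As a rough illustration of on-the-fly collection, the Python sketch below yields word n-grams from a sliding window while the words stream past, then counts them. The helper names and the simple `\w+` tokenisation are assumptions for the example, not the crawler's real implementation.

```python
import re
from collections import Counter, deque


def iter_ngrams(words, n):
    """Yield n-grams from a word stream using a sliding window of size n."""
    window = deque(maxlen=n)
    for word in words:
        window.append(word)
        if len(window) == n:
            yield " ".join(window)


def count_ngrams(text, sizes=(1, 2, 3)):
    """Count all n-grams of the requested sizes in a piece of text."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in sizes:
        counts.update(iter_ngrams(words, n))
    return counts


counts = count_ngrams("cheap flights to Rome and cheap flights to Paris")
print(counts["cheap flights"])     # 2
print(counts["cheap flights to"])  # 2
print(counts["rome"])              # 1
```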
You can analyse your own content and that of your competitors in order to improve your content marketing strategy and keep up to date with the competition.
Have you ever wondered:
- How much duplicate content on a site may lead to a drop in organic traffic?
- Why does some content gain traffic easily, while other content does not?
- How much organic search traffic can your site gain at the absolute maximum? And how would it be distributed across the site's pages?
We can help answer these questions in most cases. Basically, our solution uses native crawler features to analyse n-grams. Here we provide:
- Collecting and analysing n-grams on your sites and the sites of your competitors (unlimited number of pages).
- Calculating your unique and rare n-grams, comparing them with those of competitors, and looking for correlations with page metrics.
- Calculating page-level and site-level “KEYWORD RANK”, which represents the estimated maximum traffic the content can get from organic sources. In short, we take several billion keywords from users' organic sessions, split them into n-grams, assign weights, and compare them with the n-grams on the pages/sites we rank.
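The general idea behind the steps above can be sketched in Python as follows. The weighting scheme (each n-gram receives the summed session count of the keywords that contain it), the function names, and the sample numbers are assumptions for illustration; the actual “KEYWORD RANK” formula is not described here.

```python
import re
from collections import Counter


def ngrams(text, sizes=(1, 2)):
    """List all word n-grams of the given sizes in a text."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[i:i + n]) for n in sizes for i in range(len(words) - n + 1)]


def build_weights(keyword_sessions):
    """Assumed weighting: an n-gram's weight is the total sessions of keywords containing it."""
    weights = Counter()
    for keyword, sessions in keyword_sessions.items():
        for gram in set(ngrams(keyword)):
            weights[gram] += sessions
    return weights


def keyword_rank(page_text, weights):
    """Sum the weights of the distinct page n-grams that also occur in search keywords."""
    return sum(weights[g] for g in set(ngrams(page_text)) if g in weights)


def unique_ngrams(own_text, competitor_text):
    """N-grams present in your content but absent from the competitor's."""
    return set(ngrams(own_text)) - set(ngrams(competitor_text))


# Hypothetical keyword -> organic session counts.
keyword_sessions = {"cheap flights": 100, "cheap hotels": 40, "flights to rome": 10}
weights = build_weights(keyword_sessions)

print(keyword_rank("Cheap flights and hotels", weights))                   # 390
print(sorted(unique_ngrams("cheap flights to rome", "cheap flights to paris")))
# ['rome', 'to rome']
```

A site-level score could be obtained the same way by scoring the union of n-grams over all of the site's pages.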
You can find a simple n-gram analysis on data.statoperator.com, where we have aggregated information about 10 million main pages from the Alexa top. Just enter a domain name in the search form there to get global statistics for every n-gram on the domain.
5) Text mining solutions – create your own text corpora and thematic collections
- Get daily thematic updates. First, you set up simple rules and “seed” them with theme-relevant words, keywords, or regular expressions; the output data is then aggregated into solid thematic collections.
- Flexible output format. By default, a thematic dataset contains the timestamp, host, page path, and all n-grams related to the “seed” keywords/regexps.
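A thematic record of that shape could be produced along the lines of the Python sketch below. The dictionary keys mirror the default fields listed above, but the function names, the seed-matching rule (keep an n-gram if any of its words fully matches the seed regexp), and the sample page are assumptions for illustration.

```python
import re
from datetime import datetime, timezone


def related_ngrams(text, seed_rx, n=3):
    """Return the n-grams in which at least one word matches the seed rule."""
    words = re.findall(r"\w+", text.lower())
    grams = (words[i:i + n] for i in range(len(words) - n + 1))
    return [" ".join(g) for g in grams if any(seed_rx.fullmatch(w) for w in g)]


def thematic_record(host, path, text, seed_pattern, timestamp=None):
    """Build one dataset row: timestamp, host, page path, and seed-related n-grams."""
    seed_rx = re.compile(seed_pattern, re.IGNORECASE)
    return {
        "timestamp": (timestamp or datetime.now(timezone.utc)).isoformat(),
        "host": host,
        "path": path,
        "ngrams": related_ngrams(text, seed_rx),
    }


record = thematic_record(
    "news.example", "/",
    "New solar panels cut energy bills for homes",
    r"solar|energy",
    timestamp=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
print(record["ngrams"])
# ['new solar panels', 'solar panels cut', 'panels cut energy',
#  'cut energy bills', 'energy bills for']
```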
Use all the benefits of n-grams:
- N-grams help you understand more about the “seed” keywords and the resulting datasets. In fact, 3-, 4-, and 5-grams are a short summary of your “seed” rules, so it is easy to create new attributes for existing categories and to classify sites and pages by keyword occurrences.
- Increase precision – get rid of noise, homonymy, and unintended or “stupid” occurrences.
- Increase sensitivity (recall) by expanding seed keywords with neighbouring n-gram words.
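Seed expansion of this kind can be approximated by counting which words most often sit next to a seed inside the collected n-grams. The Python sketch below shows that idea under stated assumptions: the function name, the stopword handling, and the sample n-grams are hypothetical.

```python
from collections import Counter


def suggest_expansions(ngrams, seeds, stopwords=frozenset(), top=3):
    """Suggest new seed candidates: words that co-occur with seeds inside n-grams."""
    seeds = {s.lower() for s in seeds}
    neighbours = Counter()
    for gram in ngrams:
        words = gram.lower().split()
        if seeds.intersection(words):
            # Count every non-seed, non-stopword word sharing an n-gram with a seed.
            neighbours.update(w for w in words if w not in seeds and w not in stopwords)
    return [word for word, _ in neighbours.most_common(top)]


collected = [
    "new solar panels",
    "solar panels cut",
    "solar panel installers",
    "cut energy bills",
]
print(suggest_expansions(collected, {"solar"}, top=1))  # ['panels']
```

Reviewing the suggestions before adding them to the seed rules keeps the gain in recall from reintroducing noise.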