STATOPERATOR Demo service features and parameters
We provide daily Internet-wide scan data on a commercial basis. Our system analyzes the information accessible on the main pages of all registered domains worldwide.
- statoperator.com is our production solution running in demo mode
- in the demo version (on statoperator.com) we crawl 1 million domains every day, approximately between 16:00 and 16:30 UTC
- we check page content and HTML code against more than 300 regular expressions daily and visualise the results on our dashboard
- we share some dashboards publicly (on politicians, for example), but we do not share everything we have
By the way, you can suggest any regular expression for us to check; if it seems interesting, we will add it to our demo version. Please send requests to email@example.com.
1) Regular monitoring and alerts
Every day we take a snapshot of the main pages of all existing and active domains: 450 million domains across more than 1,500 TLDs (top-level domains).
Weekly, we update our domain registry with hundreds of thousands of newly registered domains. Estimated coverage is 90-95% of all accessible domains in the world. We provide access to our snapshots and the ability to analyse them with any regular expression. Page content, HTML code, and HTTP headers are all open to dig into.
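As a rough illustration of how such a regular-expression pass over page snapshots might look, here is a minimal sketch; the pattern names and expressions are invented placeholders, not the actual set of 300+ production rules:

```python
import re

# Hypothetical patterns standing in for the daily rule set.
PATTERNS = {
    "wordpress": re.compile(r"wp-content|wp-includes", re.I),
    "google_analytics": re.compile(r"UA-\d{4,10}-\d{1,4}"),
    "bitcoin_address": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
}

def tag_snapshot(html: str) -> list:
    """Return the names of all patterns that match a page snapshot."""
    return [name for name, rx in PATTERNS.items() if rx.search(html)]

snapshot = '<script src="/wp-content/themes/x.js"></script> UA-12345-1'
print(tag_snapshot(snapshot))  # ['wordpress', 'google_analytics']
```

The same idea extends to HTTP headers: each snapshot field is just text that every rule is tested against, and the matching rule names become the tags shown on a dashboard.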
2) Anti-phishing and brand protection
Be aware of offensive threats against you: scam sites, stolen identities, brand copies and clones, and other black-hat competitor practices.
Every day we calculate a "fingerprint" extracted from the HTML templates of main pages and from n-grams of page content. This procedure repeats daily and covers 90-95% of active domains that respond correctly on port 80 or 443.
The anti-phishing solution generates reports with clusters of cloned or mass-produced sites similar to your sites or information systems.
- We can adjust update frequencies and apply different policies to different types of domains (new domains, blacklist, whitelist, domain karma, etc.)
- We let you tune the "similarity level", which trades off precision against recall, so you can fit it to your needs.
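The precision/recall trade-off behind a tunable "similarity level" can be illustrated with a toy word-shingle comparison; the Jaccard measure and the threshold value here are illustrative assumptions, not the production fingerprint algorithm:

```python
def shingles(text: str, n: int = 3) -> set:
    """Break text into overlapping word n-grams (shingles)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity between the shingle sets of two pages."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical "similarity level": lowering it flags more clones
# (higher recall) at the cost of more false positives (lower precision).
THRESHOLD = 0.5

original = "sign in to your bank account to continue to online banking"
clone = "sign in to your bank account to proceed to online banking"
print(similarity(original, clone) >= THRESHOLD)  # True
```

A one-word substitution only disturbs the shingles that overlap it, so near-identical clones keep a high score while unrelated pages score near zero.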
3) Preventive protection against illegal threats on the clearweb, in the .onion zone, or on Telegram
Due to the high sensitivity of this data, completing this form is mandatory. Please also note that a non-disclosure agreement is required for any type of conversation.
Send requests to firstname.lastname@example.org with the following required information:
- Company name
- Country of jurisdiction
- Country of physical presence
- Company details and type of ownership
- E-mail on corporate domain
- Contact Person
- Contact Phone
4) N-gram based analysis – investigate your content
Our crawler was designed from the start to work with n-grams, so it can easily collect and analyse them on the fly.
You can analyse your content and your competitors' content to enhance your content marketing strategy and keep up to date with competitors.
Have you ever been interested in:
- How much duplicate content on a site may lead to an organic traffic drop?
- Why does some content gain traffic easily while other content does not?
- What is the absolute maximum organic search traffic your site could gain, and how would it be distributed across the site's pages?
In most cases we can help answer these questions. Basically, our solution uses native crawler features to analyse n-grams. Here is what we provide:
- Collecting and analysing n-grams on your sites and your competitors' sites (unlimited number of pages).
- Calculating your unique and rare n-grams, comparing them to competitors, and looking for correlations with page metrics.
- Calculating page-level and site-level "KEYWORD RANK", an estimate of the maximum traffic a page's content could attract from organic sources. In short, we take several billion keywords from users' organic sessions, split them into n-grams, assign weights, and compare them with the n-grams on the pages/sites we rank.
You can find a simple n-gram analysis at data.statoperator.com, where we have aggregated information about 10 million main pages from the Alexa top. Just enter a domain name in the search form there to get global statistics for every n-gram on that domain.
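To make the weighted n-gram comparison behind a metric like "KEYWORD RANK" concrete, here is a minimal sketch; the keyword table and weights are invented placeholders, not real session data:

```python
from collections import Counter

def ngrams(text: str, n: int) -> Counter:
    """Count the word n-grams occurring in a text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

# Hypothetical weights, standing in for n-grams derived from billions
# of organic search keywords; real weights would come from session data.
KEYWORD_WEIGHTS = {
    ("cheap", "flights"): 5.0,
    ("flights", "to"): 1.5,
    ("hotel", "deals"): 4.0,
}

def keyword_rank(page_text: str, n: int = 2) -> float:
    """Sum the weights of keyword n-grams that occur on the page."""
    page = ngrams(page_text, n)
    return sum(w for g, w in KEYWORD_WEIGHTS.items() if g in page)

print(keyword_rank("find cheap flights to rome and hotel deals"))  # 10.5
```

Summing over pages would give a site-level score, and comparing a page's score with those of ranking competitors hints at the traffic ceiling for its content.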
5) Text mining solutions – create your own textual corpora and thematic collections
- Get daily thematic updates. First, set up simple rules and "seed" some theme-relevant words/keywords/regexps, then aggregate the output data into solid thematic collections
- Flexible output format. By default, a thematic dataset contains the timestamp, host, page path, and all n-grams related to the "seed" keywords/regexps
Take advantage of everything n-grams offer:
- N-grams help you understand more about your "seed" keywords and the resulting datasets. In fact, 3/4/5-grams are a short summary of your "seed" rules, so it is easy to create new attributes for existing categories and to classify sites and pages by keyword occurrences.
- Increase precision – get rid of noise, homonymy, and unintended or "stupid" matches
- Increase sensitivity (recall) by expanding seed keywords with neighbouring n-gram words
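The default record format described above could be approximated like this; the seed regexp, field names, and timestamp are hypothetical, chosen only to show the shape of a thematic record:

```python
import re

SEED = re.compile(r"\bvaccin\w*", re.I)  # hypothetical seed regexp

def thematic_record(host: str, path: str, text: str, n: int = 3,
                    ts: str = "2020-01-01T16:00:00Z") -> dict:
    """Emit a record keeping only n-grams that touch a seed match."""
    words = text.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    hits = [g for g in grams if any(SEED.search(w) for w in g)]
    return {"timestamp": ts, "host": host, "path": path, "ngrams": hits}

rec = thematic_record("example.com", "/news",
                      "new vaccine trial shows promising results")
print(rec["ngrams"])
```

The retained n-grams carry the words surrounding each seed match, which is exactly the context needed to refine the seed rules (the precision and recall points above) in the next iteration.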
In addition, we provide datasets to clients who wish to use them for their own purposes. Formats: Avro/CSV/JSON.