Service features and parameters
We provide Internet-wide scan data on a daily basis. Our system analyses the information accessible on the main pages of all registered domains worldwide.
1) Regular monitoring and alerts
Every day we take a snapshot of the main pages of all existing and active domains: every accessible domain name in the world (450,000,000+, across all existing TLDs). We provide access to our snapshots and the ability to analyse them with any regular expressions. Page content, HTML code, and HTTP headers are all open to dig into. We update our domain registry with hundreds of thousands of newly registered domains every day.
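To illustrate the kind of regex analysis described above, here is a minimal sketch in Python. The record layout (domain / html / headers fields) is an assumption for illustration only, not the service's actual export format:

```python
import re

# Toy snapshot records; the real snapshots cover 450M+ domains.
# Field names here are hypothetical.
snapshot = [
    {"domain": "example.com",
     "html": "<title>Example Domain</title>",
     "headers": {"Server": "nginx/1.18.0"}},
    {"domain": "shop.example",
     "html": "<title>Shop</title><meta name='generator' content='WordPress 6.2'>",
     "headers": {"Server": "Apache"}},
]

def grep_snapshot(records, pattern, field="html"):
    """Return domains whose chosen field matches the regular expression."""
    rx = re.compile(pattern)
    hits = []
    for rec in records:
        haystack = rec[field] if field != "headers" else str(rec["headers"])
        if rx.search(haystack):
            hits.append(rec["domain"])
    return hits

# Find domains that expose a WordPress version on their main page.
print(grep_snapshot(snapshot, r"WordPress \d+\.\d+"))
# Find domains served by nginx, judging by HTTP headers.
print(grep_snapshot(snapshot, r"nginx", field="headers"))
```

The same pattern scales to content fingerprinting, CMS detection, or tracking-code discovery across the full snapshot.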
2) Text search across all existing domains in all major TLDs, with daily updates (210,000,000+ domain names, daily database updates)
3) Preventive protection against threats on the clearweb, in the .onion zone, or on Telegram
Due to the high sensitivity of this data, filling out this form is mandatory. Please also be aware that a non-disclosure agreement is required for any type of conversation.
4) N-gram collection and analysis
Our crawler has supported n-grams from the start, allowing us to easily collect and analyse them on the fly.
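As a toy illustration of on-the-fly n-gram collection, the sketch below assumes simple whitespace tokenisation; the production crawler's tokeniser is not documented here:

```python
from collections import Counter

def ngrams(text, n):
    """Split text into whitespace tokens and return all n-grams as strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Count bigrams on a sample page text as the crawler might do per page.
page_text = "internet wide scan data on a daily basis"
counts = Counter(ngrams(page_text, 2))
print(counts.most_common(3))
```

In a real pipeline the counts would be aggregated per host and per crawl date rather than printed.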
You can analyse your own content and that of your competitors to enhance your content marketing strategy and keep up to date with the competition.
Have you ever been interested in:
- How much duplicate content on a site can lead to a drop in organic traffic?
- Why does some content gain traffic easily while other content does not?
- How much organic search traffic can your site gain at the absolute maximum, and how would it be distributed across the site's pages?
We can help answer these questions in most cases. Our solution uses the crawler's native features to analyse n-grams. Here is what we provide:
- Collecting and analysing n-grams on your sites and your competitors' sites (unlimited number of pages).
- Calculating your unique and rare n-grams, comparing them to competitors', and looking for correlations with page metrics.
- Calculating a page-level and site-level “KEYWORD RANK”, representing the estimated maximum traffic the content could receive from organic sources. In short, we take several billion keywords from users' organic sessions, split them into n-grams, assign weights, and compare them with the n-grams on the pages/sites we rank.
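The “KEYWORD RANK” idea above can be sketched as a weighted n-gram overlap. The weights and scoring formula below are invented for illustration; the production model is not public:

```python
def ngrams(text, n):
    """Return the set of n-grams in a whitespace-tokenised text."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

# Hypothetical weights derived from organic-search keyword sessions;
# in the described system these come from billions of real queries.
keyword_weights = {"scan data": 120.0, "daily basis": 45.0, "domain names": 80.0}

def keyword_rank(page_text):
    """Score a page by the total weight of keyword n-grams it covers."""
    page_grams = ngrams(page_text, 2)
    return sum(w for g, w in keyword_weights.items() if g in page_grams)

print(keyword_rank("we provide scan data for all domain names"))  # 200.0
```

A site-level score would then aggregate these page scores, which is one way the metric could be distributed across a site's pages.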
You can find a simple n-gram analysis on data.statoperator.com, where we have aggregated information about 10 million main pages from the Alexa top. Just enter a domain name in the search form there to get global statistics for every n-gram on the domain.
5) Text mining solutions
Create your own text corpora and thematic collections!
- Get daily thematic updates. First, set up simple rules, “seed” some theme-relevant words/keywords/regexps, and aggregate the output data into solid thematic collections.
- Flexible output format. By default, a thematic dataset contains the timestamp, host, page path, and all n-grams related to the “seed” keywords/regexps.
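A minimal sketch of such a “seed” rule is shown below: pages matching a seed regex are emitted as dataset rows with timestamp, host, path, and the matching n-grams, mirroring the default output format described above. The field names and seed pattern are assumptions for illustration:

```python
import re

# Hypothetical seed rule for a "crypto" thematic collection.
SEED = re.compile(r"\b(crypto|blockchain)\b", re.IGNORECASE)

def ngrams(text, n=3):
    toks = text.lower().split()
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]

def thematic_rows(pages):
    """Emit one dataset row per page whose text matches the seed rule."""
    rows = []
    for page in pages:
        if SEED.search(page["text"]):
            related = [g for g in ngrams(page["text"]) if SEED.search(g)]
            rows.append({"timestamp": page["ts"], "host": page["host"],
                         "path": page["path"], "ngrams": related})
    return rows

pages = [{"ts": 1700000000, "host": "example.com", "path": "/news",
          "text": "new blockchain startup raises funding"}]
print(thematic_rows(pages))
```

Running this daily over fresh snapshots would yield the kind of rolling thematic collection described above.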
Use all the benefits of n-grams:
- N-grams help you understand more about your “seed” keywords and the resulting datasets. In fact, 3/4/5-grams are a short summary of your “seed” rules, so it is easy to create new attributes for existing categories and classify sites and pages by keyword occurrences.
- Increase precision: get rid of noise, homonymy, and unintended or “stupid” occurrences.
- Increase sensitivity (recall) by expanding seed keywords with neighbouring n-gram words.
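One simple way to do the recall expansion in the last point is to count the words that co-occur next to a seed keyword and promote the most frequent neighbours into new seeds. The corpus and seed below are toy examples:

```python
from collections import Counter

def neighbours(corpus, seed, window=1):
    """Count words appearing within `window` positions of the seed token."""
    counts = Counter()
    for text in corpus:
        toks = text.lower().split()
        for i, tok in enumerate(toks):
            if tok == seed:
                for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                    if j != i:
                        counts[toks[j]] += 1
    return counts

corpus = ["buy cheap hosting now", "cheap hosting plans", "reliable hosting plans"]
print(neighbours(corpus, "hosting").most_common(2))
```

Frequent neighbours such as these would be candidates for widening the seed rule, trading a little precision for higher recall.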