Newsletter #3

Time has passed since newsletter #2, and we are proud to announce major new features and new information categories.

In the past, we were scanning 10 ports on the Internet, and we have now raised the bar to 30 ports. We have added new information categories: ctl (Certificate Transparency Logs), sniffer (listening to Internet background noise) and onionscan (Dark Web scanning). We have also enriched the datascan information category with X509 certificate data when an SSL/TLS connection is negotiated, the threatlist information category now has some ONYPHE entries for botnets (mirai-like, for instance), and the resolver information category is enriched with geolocation information. A number of new extractions are also performed on data, such as the HTML description, copyright or keywords.

But the most important feature is the capability to classify a remote device and identify its vendor, product or even productversion. For instance, we are able to tell whether a source is, say, a Mikrotik device, which product it is and, in some cases, its exact productversion. The same is true for a fair number of other device or product vendors. As a consequence of this version identification, we were able to add a cpe filter that lets users search for CVE vulnerabilities using just the information returned in most new datascan entries.


More ports scanned

We used to scan 10 ports and we have now reached the 30 ports milestone. They are scanned once a month on the full IPv4 Internet address space. The list is the following:

80/tcp (http), 443/tcp (https), 7547/tcp (tr069), 8080/tcp (http), 22/tcp (ssh), 21/tcp (ftp), 25/tcp (smtp), 53/tcp (dns), 110/tcp (pop3), 8000/tcp (http), 3306/tcp (mysql), 23/tcp (telnet), 3389/tcp (rdp), 554/tcp (rtsp), 111/tcp (rpc), 8888/tcp (http), 5000/tcp (upnp), 1521/tcp (oracle), 3128/tcp (http), 135/tcp (msrpc), 5555/tcp (adb), 5900/tcp (vnc), 9200/tcp (elasticsearch), 1433/tcp (mssql), 139/tcp (netbios), 2323/tcp (telnet), 445/tcp (smb), 502/tcp (modbus), 102/tcp (s7comm), 11211/tcp (memcached).

That does not mean we only scan these ports; they are simply the ones for which we guarantee a once-a-month frequency. We also scan other ports with no specific algorithm, just for the love of research.
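If you want to work with the port list programmatically, a minimal Python sketch such as the one below may help. It simply parses the list above into a mapping; the names SCANNED_PORTS and parse_ports are ours for illustration and are not part of any ONYPHE tool.

```python
import re

# The 30 ports listed above, copied as plain text.
SCANNED_PORTS = (
    "80/tcp (http), 443/tcp (https), 7547/tcp (tr069), 8080/tcp (http), "
    "22/tcp (ssh), 21/tcp (ftp), 25/tcp (smtp), 53/tcp (dns), 110/tcp (pop3), "
    "8000/tcp (http), 3306/tcp (mysql), 23/tcp (telnet), 3389/tcp (rdp), "
    "554/tcp (rtsp), 111/tcp (rpc), 8888/tcp (http), 5000/tcp (upnp), "
    "1521/tcp (oracle), 3128/tcp (http), 135/tcp (msrpc), 5555/tcp (adb), "
    "5900/tcp (vnc), 9200/tcp (elasticsearch), 1433/tcp (mssql), "
    "139/tcp (netbios), 2323/tcp (telnet), 445/tcp (smb), 502/tcp (modbus), "
    "102/tcp (s7comm), 11211/tcp (memcached)"
)


def parse_ports(listing):
    """Return {port_number: service_name} from 'port/tcp (service)' items."""
    pattern = re.compile(r"(\d+)/tcp \((\w+)\)")
    return {int(port): service for port, service in pattern.findall(listing)}


ports = parse_ports(SCANNED_PORTS)
# Ports labelled "http" in the list (https is a separate label).
http_like = sorted(p for p, s in ports.items() if s == "http")
print(len(ports), http_like)  # prints: 30 [80, 3128, 8000, 8080, 8888]
```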


New information category: ctl

We have started integrating Certificate Transparency Logs. As this is a huge quantity of data, we are monitoring all the Cloudflare Nimbus logs (all years) at the moment. We will monitor more CTLs in the future.

By monitoring CTLs, we also perform massive DNS requests to enrich the resolver information category. That is, our passive DNS technology is now enriched with information gathered from CTLs.

As a matter of fact, to give a sense of the proportion of standard DNS requests versus those derived from CTLs, 22% of the DNS information in the resolver information category now comes from CTLs. As we also perform reverse DNS requests for the full IPv4 address space, that source accounts for around 75% of DNS data. The rest (roughly 3%) is shared between DNS requests performed after extracting hostnames and IP addresses from the pastries or sniffer information categories.

Example of a CTL entry from ctl information category:

category:ctl tld:fr


New information category: sniffer

Since May this year, we have started to listen to Internet background noise. We are listening for both TCP and UDP traffic and are performing passive OS fingerprinting on TCP with our own technology.

We detect patterns for some botnets (such as mirai-like ones), and when we detect them, we launch a synscan along with a datascan against the remote device.

For instance, when a mirai-like signature is detected, we perform active datascan requests against the potentially infected host. Thus, application data is written to the datascan information category with a mirai tag, and the same is true for the synscan information category. As we enrich other information categories, we add a tag to keep track of that information when creating synscan or datascan entries.

From that activity, we have created our threatlist called "ONYPHE - botnet/mirai". Thus, you can use the Web search or the API to search for potentially infected hosts. For instance, by running the following search (as long as you have proper credentials) you can find infected hosts in France:

category:threatlist country:FR threatlist:"ONYPHE - botnet/mirai"

Or you can analyze the data returned from datascan information category by using the mirai tag:

category:datascan tag:mirai
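Search strings like the two above all follow the same filter:value shape, so they are easy to compose from a script. Here is a minimal Python sketch; build_query is a hypothetical helper of ours, not part of the ONYPHE API, and the quoting rule (quote any value that is not purely alphanumeric) is an assumption inferred from the examples in this newsletter.

```python
def build_query(category, filters=None):
    """Compose a search string such as 'category:datascan tag:mirai'.

    `filters` maps filter names to values, in the order they should appear.
    """
    parts = [f"category:{category}"]
    for name, value in (filters or {}).items():
        text = str(value)
        # Assumption: values with spaces or punctuation are double-quoted,
        # as in threatlist:"ONYPHE - botnet/mirai".
        if not text.isalnum():
            text = f'"{text}"'
        parts.append(f"{name}:{text}")
    return " ".join(parts)


print(build_query("threatlist",
                  {"country": "FR", "threatlist": "ONYPHE - botnet/mirai"}))
print(build_query("datascan", {"tag": "mirai"}))
```

The resulting strings can be pasted into the Web search; how they are sent to the API is not covered here.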


New information category: onionscan

In fact, we were already scanning the Dark Web but it was integrated within the datascan information category. As we didn’t want to participate in potentially illegal activities, we chose to create a brand new information category to control which entities may be able to access that content. The new information category is called onionscan and is not accessible with free credentials.

The fields you may find in this category are the same ones you could find in the datascan information category, but in the onionscan information category you will only get HTTP-related information. You can use the data filter to perform searches on content gathered from scanned .onion Web sites.

We are also working on automatically classifying Dark Web content in order to better control which ones may be accessible. For instance, we don’t want to participate in pedo-pornography, thus we want to be able to state that a .onion Web site is classified as such and filter out this data.

To search for a string contained in .onion Web sites, just use the data filter:

category:onionscan data:market

Well, this one even has some cryptomining JavaScript from CoinHive, and it is tagged as such in the tag filter.


New fields for resolver information category

As we have seen, we gather DNS information from our different information categories. We needed a way to know which source gathered that information, so we decided to add a source filter. This filter states which information category triggered the DNS lookup. For instance, when we listen to the Internet background noise, we perform reverse (PTR) lookups. Thus, data is written to the resolver information category with a sniffer value for the source field.

We also added geolocation information to resolver information category entries. You can now search by asn, country or city (to name a few) in the resolver information category.

category:resolver country:FR source:sniffer


New fields for pastries information category

Since July, the pastries information category has been enriched with new fields: subdomains, host and tld. When a hostname is extracted from pastries content, we also split this hostname into these new fields to make it easy to search for specific values within the pastries information category.

For example, searching all pastries within the .fr TLD is as simple as running the search:

category:pastries tld:fr
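To give an idea of what splitting a hostname into such fields can look like, here is a minimal Python sketch. It is not our production code: split_hostname is a hypothetical helper, and the exact meaning given to host (the leftmost label) and subdomains (the intermediate parent names) is an assumption made for illustration. It also treats the last label as the TLD, so multi-label suffixes such as co.uk are not handled.

```python
def split_hostname(hostname):
    """Split a hostname into illustrative host, subdomains and tld fields."""
    labels = hostname.strip().rstrip(".").lower().split(".")
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"not a usable hostname: {hostname!r}")
    return {
        # Assumption: host is the leftmost label.
        "host": labels[0],
        # Assumption: subdomains are the parent names between host and TLD.
        "subdomains": [".".join(labels[i:]) for i in range(1, len(labels) - 1)],
        # Naive rule: the TLD is the last label.
        "tld": labels[-1],
    }


print(split_hostname("www.mail.example.fr"))
```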


New data enrichments

From the datascan information category, we now also extract HTML copyright, description or keywords information. These fields are searchable via app.http.copyright, app.http.description or app.http.keywords along with already existing filters app.http.title or app.http.realm.

We added a new enrichment field: device. By using pattern matching, we are able to identify different information from a remote device, like its class, vendor, product or even productversion. This information is searchable by using filters device.class, device.product, device.productvendor or device.productversion.

For example, if you want to gather information on D-Link devices, just perform the following search (as long as you have the right credentials to access this kind of data):

category:datascan device.productvendor:"D-Link"

Well, it looks like this D-Link DCS-960L is infected by a mirai-like variant as you can see in the tag field. You can also note that the device is classified as a Router.

Regarding HTTPS datascan, we now extract X509 certificate information.

For instance, looking for all certificates issued by a known CA is as easy as:

category:datascan tls:true issuer.organization:"Super Micro Computer Inc."

Well, another one infected by a mirai-like variant.

Which brings us to the next addition: the cpe filter. As we identify the product, productvendor and hopefully the productversion, we can set the cpe field and perform lookups using that normalized naming scheme to map to existing CVEs.
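The mapping between identified device fields and a CPE name can be sketched in a few lines of Python. This is a simplified illustration of the CPE 2.2 URI form (cpe:/part:vendor:product:version), not our actual implementation; build_cpe and parse_cpe are hypothetical helpers, and the normalization shown (lowercasing, spaces replaced by underscores) is a simplification of the full CPE rules.

```python
CPE_FIELDS = ("part", "vendor", "product", "version")


def build_cpe(part, vendor, product, version=None):
    """Build a CPE 2.2 URI; part is 'a' (application), 'o' (OS) or 'h' (hardware)."""
    fields = [part, vendor, product]
    if version:
        fields.append(version)
    # Simplified normalization: lowercase, spaces become underscores.
    return "cpe:/" + ":".join(f.lower().replace(" ", "_") for f in fields)


def parse_cpe(uri):
    """Return the leading components of a CPE 2.2 URI as a dict."""
    prefix = "cpe:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a CPE 2.2 URI: {uri!r}")
    return dict(zip(CPE_FIELDS, uri[len(prefix):].split(":")))


print(build_cpe("a", "libssh", "libssh", "0.6.0"))
print(parse_cpe("cpe:/a:libssh:libssh:0.6.0"))
```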

For instance, if we search for the latest libssh vulnerability:

category:datascan cpe:"cpe:/a:libssh:libssh:0.6.0"

And then search on the NIST portal:



You may have wondered why it took so long for us to write this newsletter. We trust that, having read about these additions, you understand that we were simply working hard on them.

In our last newsletter, we claimed that we would unveil the commercial offer. Unfortunately, that is not the case yet. Only a couple of weeks remain before we can announce it.

Of course, you will be informed provided you have created your free user account at the below link. At the minimum, a free registration will allow you to use the current API:
