Architecture

The main design goal behind VulnIQ is to be a solution that works for everyone, regardless of budget. For example, you could build an "AI" that runs some form of natural language processing to discover patterns by itself, but both developing and operating such a system would require a significantly larger budget, which defeats that goal.

Our mission is to provide maximum value for your budget.

At the same time, our solution is designed to scale when necessary. All components can be scaled horizontally or vertically. For example, to scale on AWS, a customer can use AWS services for components such as the RDBMS, Elasticsearch, file storage, and the caching service. Data collection and processing are also designed to scale: if you run 100 backend processes, each process polls jobs from the database and works independently. Alternatively, you can deploy the backend process on a single large server and increase the number of processing threads.

Design Principles

  1. Works for all budgets:
    • Should run on USD 80/month infrastructure (probably good enough for most cases).
    • Should also run on USD 8K/month infrastructure for customers that want to process large data sets.
  2. Fully automated: Data should be processed automatically, without human intervention. Data may come from a manually curated source, but VulnIQ itself should not require any manual processing.
  3. API first: All data and functionality, including administration and similar functions, should be available via REST APIs.
  4. Scalable: It should be able to scale up and down.
  5. No license fees: VulnIQ uses only open source third-party components, and no additional license is required to run VulnIQ.

Components

The components that make up the VulnIQ solution are:

  • The main database: This is a MariaDB instance. It is the most critical component, as all other services depend on it.
  • Backend: Collects and processes data. You can run as many instances as you like. Stopping backend processes does not affect the end-user experience; the only consequence is that data is not updated while the backend processes are down.
  • Frontend: The web application and REST APIs. You can run as many instances as you like behind a reverse proxy/load balancer.
  • File store: This is a configurable option. The default is to store files on the local file system, which does not require any additional services. You can also store files on S3 or similar object stores. See documentation for more details.
  • Elasticsearch: VulnIQ uses an Elasticsearch backend to index and search processed data. VulnIQ uses only the Apache 2 licensed, open source version of Elasticsearch.
    You can also use a cloud service such as the AWS Elasticsearch service. Your users and API calls never interact with Elasticsearch endpoints directly; all traffic is proxied by the VulnIQ APIs.
  • Web page and URL processor: A custom application developed by VulnIQ.
  • Caching service: A memcached instance by default.

Backend

The VulnIQ backend process is responsible for data collection and processing. When it is down, all new data processing stops, but all frontend services such as the APIs and the web UI continue to work independently of the backend process. This allows the backend process to be updated, stopped, and restarted without affecting the end-user experience.

Frontend

API

VulnIQ provides simple and unified APIs for all data. Regardless of the data type, all data have certain common properties that can be accessed using the same endpoints. For some data types, type-specific endpoints are also provided.
For example, git commits, CVEs, and OVAL definitions all have the same basic attributes such as id, name, and create date. But git commits can also have diffs associated with them, whereas OVAL definitions have XML documents attached to them.
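
One way to picture this is a shared base record with type-specific extensions. The sketch below is an assumption made for illustration: the class names, field names, and the common method are hypothetical and do not describe the actual VulnIQ API schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """Attributes shared by every data type (hypothetical field names)."""
    id: str
    name: str
    create_date: date

    def common(self):
        # The view a unified endpoint could return for any data type.
        return {
            "id": self.id,
            "name": self.name,
            "create_date": self.create_date.isoformat(),
        }


@dataclass
class GitCommit(Item):
    # Type-specific attribute: the diff associated with the commit.
    diff: str = ""


@dataclass
class OvalDefinition(Item):
    # Type-specific attribute: the XML attached to the definition.
    xml: str = ""
```

Code that only needs the common attributes can call common() on any item without knowing its type, while a type-specific endpoint would additionally expose fields such as diff or xml.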

Web UI

The web UI is a web application that uses the VulnIQ APIs. It is built with Vue.js. It is not strictly a single-page application, but it relies heavily on AJAX.
An API-first approach has been followed from the beginning.

Data Storage

RDBMS

MariaDB (MySQL with a better license) is the default database for VulnIQ. It is the core data store and the only single point of failure in the VulnIQ architecture. Without the database, neither the frontend nor the backend will work.

File Store

By default, all files are stored on the local file system. These files include web page screenshots, original copies of data, source code of git repositories, and more.
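
A configurable file store of this kind can be pictured as a small interface with interchangeable backends. The sketch below is an assumption for illustration only: the class names, the make_store function, and the configuration keys are hypothetical, and an in-memory class stands in for a remote object store so that the example runs without any external service.

```python
import tempfile
from pathlib import Path


class LocalFileStore:
    """Stores each key as a file under a root directory."""

    def __init__(self, root):
        self.root = Path(root)

    def put(self, key, data):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key):
        return (self.root / key).read_bytes()


class MemoryFileStore:
    """Stand-in for a remote object store; keeps blobs in a dict."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


def make_store(config):
    """Pick a backend from a (hypothetical) configuration dict."""
    kind = config.get("type", "local")
    if kind == "local":
        return LocalFileStore(config["root"])
    if kind == "memory":
        return MemoryFileStore()
    raise ValueError(f"unknown file store type: {kind}")


def roundtrip(store, key, data):
    """Write a blob and read it back through the common interface."""
    store.put(key, data)
    return store.get(key)


def demo():
    # Use a temporary directory as the local root for the demonstration.
    with tempfile.TemporaryDirectory() as root:
        local = make_store({"type": "local", "root": root})
        return roundtrip(local, "screenshots/page.png", b"fake image bytes")
```

Because callers only use put and get, switching from the local file system to another backend is a configuration change rather than a code change.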

Search

Data Tagging

VulnIQ adds tags (labels) to all data it processes. Tags can be used in addition to full-text searches. For example, all data tagged with cve-2019-11555 can be queried easily without running a full-text search.
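
The benefit of tags over full-text search can be illustrated with a simple inverted index that maps each tag to the ids of the items carrying it. This is a minimal sketch under assumed names (TagIndex, add, find) and with made-up item ids; it does not describe how VulnIQ actually stores tags.

```python
from collections import defaultdict


class TagIndex:
    """Maps each tag to the set of item ids that carry it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, item_id, tags):
        for tag in tags:
            # Normalize case so that CVE-2019-11555 and cve-2019-11555 match.
            self._by_tag[tag.lower()].add(item_id)

    def find(self, tag):
        # A single dictionary lookup; no item text is scanned.
        return sorted(self._by_tag.get(tag.lower(), set()))
```

For example, after index.add("commit-1", ["cve-2019-11555"]) and index.add("advisory-7", ["CVE-2019-11555", "linux"]), calling index.find("cve-2019-11555") returns both ids without reading the content of either item.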

Elasticsearch

VulnIQ uses Elasticsearch for full-text indexing and searching. All processed data can be pushed into an Elasticsearch instance for indexing.
The VulnIQ backend sanitizes data before pushing it into Elasticsearch, both to reduce the amount of data Elasticsearch has to process and to reduce noise. For example, instead of pushing a 100 KB HTML source, the backend converts the HTML into plain text and removes irrelevant parts such as headers and footers to minimize the amount of data added to Elasticsearch.
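
The kind of sanitization described above can be sketched with Python's standard html.parser module. This is an illustration under stated assumptions, not VulnIQ's actual sanitizer: the set of skipped tags and the html_to_text helper are hypothetical choices made for the example.

```python
from html.parser import HTMLParser

# Assumed list of elements whose content is treated as noise.
SKIPPED = {"header", "footer", "nav", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects text content, ignoring anything inside SKIPPED elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Only the resulting plain text would then be sent for indexing, so markup, scripts, and repeated page chrome never reach the search index.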