Hash List API v1

The Hash List API provides users with a daily-updated file containing all content hashes in the TCAP Archive database. Users can make up to three requests per day to retrieve the list as a single file for use in their content moderation processes.

Key features and use cases

The TCAP Archive is a repository of known terrorist or violent extremist content (TVEC) media files, including images, videos and documents. The TCAP Archive Hash List API allows platforms to ingest the hashes produced from these media files in bulk, so they can use them in their content moderation processes.

The TCAP Archive’s hashes are distinct from existing TVEC hash lists and complement Tech Against Terrorism’s proactive monitoring of terrorist use of the internet, carried out by its team of open-source intelligence specialists. Drawing on this expertise and a suite of automated monitoring capabilities, the TCAP Archive hash list reflects content created and uploaded over a number of years by a range of violent Islamist and violent far-right terrorist entities.

Authentication

To use any of the Hash List endpoints, you must be an onboarded Hash List TCAP user. These endpoints sit inside the main TCAP backend, so you can access all of them with a standard TCAP JWT.

To obtain a token, send a POST request to the TCAP authentication endpoint with your username and password.

POST https://beta.terrorismanalytics.org/token-auth/tcap/
{
  "username": "YOUR_TCAP_USERNAME",
  "password": "YOUR_TCAP_PASSWORD"
}

Response

The Authentication endpoint returns the following data on each request:

  • token : String The token to send as a Bearer token on each request you make to the API

  • user : User Your system user information
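The token exchange above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: it assumes the endpoint accepts a JSON request body (the expected Content-Type is not documented), and the helper names build_auth_request, bearer_headers and get_token are hypothetical.

```python
import json
import urllib.request

# Authentication endpoint as documented above.
AUTH_URL = "https://beta.terrorismanalytics.org/token-auth/tcap/"


def build_auth_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST request that exchanges TCAP credentials for a JWT.

    Assumption: the endpoint accepts a JSON body; the docs do not state
    the expected Content-Type.
    """
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(token: str) -> dict:
    """Headers for later API calls, sending the JWT as a Bearer token."""
    return {"Authorization": f"Bearer {token}"}


def get_token(username: str, password: str, timeout: float = 30.0) -> str:
    """Send the auth request and return the `token` field of the JSON response."""
    req = build_auth_request(username, password)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["token"]
```

For example, headers = bearer_headers(get_token("YOUR_TCAP_USERNAME", "YOUR_TCAP_PASSWORD")) yields the Authorization header to attach to Hash List requests.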

Endpoints

The Hash List API includes three endpoints, including a development endpoint.

Production endpoint, where :ideology is one of 'islamist', 'far-right' or 'all' (the default):

GET /api/hash-list/:ideology

Endpoint for testing your hash list integration

GET /api/hash-list/dev
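A minimal Python sketch for calling these endpoints follows. It assumes the Hash List endpoints are served from the same host as the authentication endpoint (BASE_URL below encodes that assumption) and that the paths are used exactly as listed; hash_list_url and fetch_hash_list_metadata are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: the Hash List endpoints share the authentication endpoint's host.
# Adjust if your onboarding details say otherwise.
BASE_URL = "https://beta.terrorismanalytics.org"

IDEOLOGIES = ("islamist", "far-right", "all")


def hash_list_url(ideology: str = "all", dev: bool = False) -> str:
    """Return the production URL for `ideology`, or the /dev URL if dev=True."""
    if dev:
        # The development endpoint takes no ideology and always reports 'all'.
        return f"{BASE_URL}/api/hash-list/dev"
    if ideology not in IDEOLOGIES:
        raise ValueError(f"ideology must be one of {IDEOLOGIES}, got {ideology!r}")
    return f"{BASE_URL}/api/hash-list/{ideology}"


def fetch_hash_list_metadata(token: str, ideology: str = "all", dev: bool = False,
                             timeout: float = 30.0) -> dict:
    """GET the endpoint with the JWT as a Bearer token and return the JSON body."""
    req = urllib.request.Request(
        hash_list_url(ideology, dev),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

While building an integration, passing dev=True keeps test calls on the development endpoint, which has a larger daily quota than the production one.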

Response

The Hash List endpoint returns the following data on each request:

  • file_url : String A pre-signed URL to the Hash List file (JSON); the URL expires after 5 minutes

  • file_name : String Name of the file

  • created_on : Date The date the file was created

  • total_hashes : Int The number of hashes in the given file

  • ideology : 'islamist' | 'far-right' | 'all' The ideology specified in the request; always 'all' on the /dev endpoint
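Because the pre-signed file_url expires after 5 minutes, a client should download the file promptly after receiving this response. The sketch below is illustrative only: HashListInfo, parse_hash_list_info and download_hash_list are hypothetical names, created_on is kept as a string because its exact format is not documented, the expiry check approximates the lifetime from the local receipt time, and it assumes the pre-signed URL needs no Authorization header.

```python
import json
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# The docs state the pre-signed file_url expires after 5 minutes.
URL_LIFETIME = timedelta(minutes=5)


@dataclass
class HashListInfo:
    file_url: str
    file_name: str
    created_on: str  # kept as a string: the date format is not documented
    total_hashes: int
    ideology: str
    received_at: datetime  # local time the response arrived (approximates URL age)

    def url_expired(self, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return now - self.received_at >= URL_LIFETIME


def parse_hash_list_info(payload: dict,
                         received_at: Optional[datetime] = None) -> HashListInfo:
    """Turn the endpoint's JSON response into a HashListInfo."""
    return HashListInfo(
        file_url=payload["file_url"],
        file_name=payload["file_name"],
        created_on=payload["created_on"],
        total_hashes=int(payload["total_hashes"]),
        ideology=payload["ideology"],
        received_at=received_at or datetime.now(timezone.utc),
    )


def download_hash_list(info: HashListInfo, timeout: float = 60.0) -> list:
    """Download the JSON array of hash objects before the URL expires."""
    if info.url_expired():
        raise RuntimeError("pre-signed URL has expired; request a new one")
    # Assumption: the pre-signed URL carries its own authorization.
    with urllib.request.urlopen(info.file_url, timeout=timeout) as resp:
        hashes = json.load(resp)
    # Sanity check against the count reported by the endpoint.
    if len(hashes) != info.total_hashes:
        raise ValueError(f"expected {info.total_hashes} hashes, got {len(hashes)}")
    return hashes
```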

Hash List JSON File

The hash list file will contain an array of hash objects with the following properties.

  • id : Int Hash ID

  • hash_digest : String The hash of the content

  • algorithm : "MD5" | "SHA256" | "SHA512" | "PDQ" The algorithm used to generate the hash

  • ideology : 'islamist' | 'far-right' | 'all' The ideology associated with the original content

  • file_type : String The file type of the original content
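As an illustration of consuming the file, the sketch below indexes the list by algorithm and checks a file's MD5, SHA256 and SHA512 digests against it using Python's hashlib. It assumes hash_digest values for those algorithms are hex-encoded strings. PDQ entries are indexed but not matched here, since PDQ is a perceptual hash that needs a dedicated implementation such as Meta's ThreatExchange tooling. index_hashes and match_bytes are hypothetical helpers.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Algorithms from the list that hashlib can reproduce locally.
# PDQ is perceptual and is not handled by this sketch.
HASHLIB_NAMES = {"MD5": "md5", "SHA256": "sha256", "SHA512": "sha512"}


def index_hashes(entries: List[dict]) -> Dict[str, Set[str]]:
    """Group hash_digest values by algorithm, lower-cased for comparison.

    Assumption: cryptographic digests are hex strings.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for entry in entries:
        index[entry["algorithm"]].add(entry["hash_digest"].lower())
    return index


def match_bytes(data: bytes, index: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first algorithm whose digest of `data` is listed, else None."""
    for algorithm, name in HASHLIB_NAMES.items():
        digests = index.get(algorithm)  # .get does not create empty entries
        if digests and hashlib.new(name, data).hexdigest() in digests:
            return algorithm
    return None
```

For example, match_bytes(Path("clip.mp4").read_bytes(), index_hashes(entries)) returns "MD5" when the clip's MD5 digest appears in the list.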

Usage With Meta's ThreatExchange

To use the Hash List from within Meta's ThreatExchange tooling, create a collaboration configuration that points to our API. You can then fetch the list and compare PDQ image hashes and MD5 video hashes.

Step 1 - Install the threatexchange package

$ pip install threatexchange

Step 2 - Configure the default credentials

$ threatexchange config api tat --credentials '<TCAP_USERNAME>' '<TCAP_PASSWORD>'

Step 3 - Set up the collaboration config

$ threatexchange config collab edit tat --create 'TAT'

Step 4 - Fetch hashes with verbose logging

$ threatexchange -v fetch

Step 5 - View dataset

$ threatexchange dataset

Step 6 - Match a piece of content

$ threatexchange match ~/path/to/image.jpg

For further details on these commands, see the ThreatExchange documentation.

Usage Limits

The development endpoint allows up to 50 requests per day.

The production endpoint allows up to 3 requests per day.

Usage limits apply to both direct API usage and ThreatExchange usage.
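Because the daily limits are small, it can help to enforce them on the client side as well. The sketch below is a hypothetical in-memory counter, not part of the API, and it assumes the quota resets at midnight UTC, which the documentation does not specify.

```python
from datetime import date, datetime, timezone
from typing import Optional

# Documented daily limits. Assumption: quotas reset at midnight UTC;
# the docs do not say which clock or time zone the server uses.
DAILY_LIMITS = {"dev": 50, "production": 3}


class DailyQuota:
    """Client-side counter that refuses calls once the daily limit is reached."""

    def __init__(self, limit: int):
        self.limit = limit
        self.day: Optional[date] = None
        self.used = 0

    def try_acquire(self, now: Optional[datetime] = None) -> bool:
        """Record one request and return True, or return False if over the limit."""
        now = now or datetime.now(timezone.utc)
        today = now.astimezone(timezone.utc).date()
        if today != self.day:  # a new (UTC) day: reset the counter
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Since the limits also cover ThreatExchange usage, a counter like this would need to be shared by every process that calls the API, for example by persisting it to disk; this in-memory version resets whenever the process restarts.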