Content Hash List API (v2.0.0)

The Content Hash List API provides access to a database of content fingerprints for use in content moderation systems. Hashes are published in four algorithms (MD5, SHA256, SHA512, and PDQ), so platforms can match content with whichever algorithm their pipeline already supports.

Key features and use cases

The TCAP Archive is a repository of known terrorist or violent extremist content (TVEC) media files, including images, videos and documents. The TCAP Archive Hash List API allows platforms to ingest the hashes produced from these media files in bulk, so they can use them in their content moderation processes.

The TCAP Archive’s hashes are distinct from any existing TVEC hash lists, as they complement Tech Against Terrorism’s proactive monitoring of terrorist internet usage by its team of open-source intelligence specialists. By leveraging this expertise, alongside a suite of automated monitoring capabilities, the TCAP Archive hash list reflects content created and uploaded over a number of years by a range of violent Islamist and violent far-right terrorist entities.

Authentication

To use any of the Hash List endpoints you need to be an on-boarded Hash List TCAP user. These endpoints sit inside the main TCAP backend, so you can access all of them with a standard TCAP JWT.

To obtain a token, make a request to the TCAP authentication endpoint with your username and password.

POST https://beta.terrorismanalytics.org/token-auth/tcap/
{
  "username": "YOUR_TCAP_USERNAME",
  "password": "YOUR_TCAP_PASSWORD"
}

Response

The Authentication endpoint returns the following data on each request:

token : String Token to be sent as a Bearer token on each request you make to the API
user : User Your system user information
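
The authentication step can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: it assumes the credentials are sent as a JSON body with a Content-Type of application/json, and the function names (build_auth_request, extract_token, obtain_token) are hypothetical.

```python
import json
import urllib.request

AUTH_URL = "https://beta.terrorismanalytics.org/token-auth/tcap/"


def build_auth_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the authentication endpoint."""
    # Assumption: the endpoint accepts the credentials as a JSON body.
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: str) -> str:
    """Return the `token` field from the authentication response body."""
    return json.loads(response_body)["token"]


def obtain_token(username: str, password: str) -> str:
    """Send the request and return the token (performs a network call)."""
    with urllib.request.urlopen(build_auth_request(username, password)) as resp:
        return extract_token(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the pieces easy to inspect and test without contacting the server.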

Hash List by Ideology

GET /hash-list/v2/all

This Hash List API endpoint retrieves a hash list file filtered by a specified ideology.

Parameters

ideology (Type: String, Required: Yes): Specifies the ideology to filter the results by.

GET /api/hash-list/:ideology  <'islamist' | 'far-right' | 'all'>
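
A request for one ideology might be built as in the sketch below. The URL layout is an assumption: it follows the "GET /hash-list/v2/all" form shown above, with the ideology as the last path segment and the host taken from the authentication endpoint. Confirm the exact path for your account before relying on it. The function name is hypothetical.

```python
import urllib.request

# Assumption: same host as the authentication endpoint.
BASE_URL = "https://beta.terrorismanalytics.org"
IDEOLOGIES = ("islamist", "far-right", "all")


def build_hash_list_request(ideology: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the hash list of one ideology."""
    if ideology not in IDEOLOGIES:
        raise ValueError(f"ideology must be one of {IDEOLOGIES}, got {ideology!r}")
    # Assumption: the ideology is the final path segment, as in /hash-list/v2/all.
    url = f"{BASE_URL}/hash-list/v2/{ideology}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Validating the ideology locally avoids a round trip for a value the API does not list.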

Response

The Hash List endpoint returns the following data on each request:

count : Integer Total number of hash records available
next : String URL of the next page results
previous : String URL of the previous page results (null if first page)
checkpoint : String Timestamp-based checkpoint identifier for synchronization
results : Array Array of Hash objects

Hash Object Fields

hash_digest : String The computed hash value
algorithm : "MD5" | "SHA256" | "SHA512" | "PDQ" The algorithm used to generate the hash
ideology : 'islamist' | 'far-right' | 'all' Content classification category
file_type : String Source file format
deleted : Boolean Whether the file has been removed from the system
updated_on : Float Unix timestamp of the last update
id : Integer Record identifier (present in the example response)
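
On the client side, each entry in results can be mapped onto a small typed record. The sketch below is one possible shape, assuming the field names listed above; HashRecord and parse_hash_record are hypothetical names, and any extra keys in the payload are ignored.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class HashRecord:
    """One entry of the `results` array."""
    hash_digest: str
    algorithm: str
    ideology: str
    file_type: str
    deleted: bool
    updated_on: float


def parse_hash_record(raw: Mapping[str, Any]) -> HashRecord:
    """Convert one decoded JSON object into a HashRecord, ignoring unknown keys."""
    return HashRecord(
        hash_digest=raw["hash_digest"],
        algorithm=raw["algorithm"],
        ideology=raw["ideology"],
        file_type=raw["file_type"],
        deleted=bool(raw["deleted"]),
        updated_on=float(raw["updated_on"]),
    )
```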

Pagination

The API implements cursor-based pagination using timestamp and ID pairs. Results can be traversed using the next and previous URLs provided in the response.

Query Parameters

limit: Number of results per page (default: 1000)
offset: Starting position for pagination
order: Sort order for results (asc/desc)
after: Cursor value for pagination (format: timestamp,id)
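
Traversing the full list by following the next URL could look like the sketch below. It assumes that next is null on the last page (the source only states this for previous on the first page), and the helper names with_query, fetch_page and iter_hashes are hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

Page = Dict[str, Any]


def with_query(url: str, **params: Any) -> str:
    """Append query parameters such as limit or order to a URL without a query string."""
    return f"{url}?{urlencode(params)}"


def fetch_page(url: str, token: str) -> Page:
    """Fetch and decode one page (performs a network call)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_hashes(first_url: str, get_page: Callable[[str], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every hash object, following `next` until it is missing or null."""
    url: Optional[str] = first_url
    while url:
        page = get_page(url)
        yield from page["results"]
        # Assumption: the last page reports "next": null.
        url = page.get("next")
```

With a real token, the generator would be driven as iter_hashes(with_query(url, limit=1000, order="asc"), lambda u: fetch_page(u, token)). Passing the page fetcher in as a callable keeps the traversal logic independent of the HTTP layer.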

Implementation Notes

Multiple hashing algorithms provide redundancy and enhanced detection capabilities
Checkpoint field enables efficient delta updates for client-side caching
Each hash entry includes metadata for content categorization and tracking
Real-time updates reflected through updated_on timestamps
Deleted flag allows for soft deletion while maintaining hash history
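
To compare local files against the MD5, SHA256 and SHA512 entries, a platform needs to compute the same digests itself. The sketch below does this with Python's hashlib, assuming that hash_digest is a lowercase hexadecimal string as in the example response. PDQ is a perceptual hash that is not part of the standard library, so it is not covered here. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Maps the API's algorithm labels to hashlib algorithm names.
STDLIB_ALGORITHMS = {"MD5": "md5", "SHA256": "sha256", "SHA512": "sha512"}


def file_digest(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.new(STDLIB_ALGORITHMS[algorithm])
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def all_digests(path: Path) -> Dict[str, str]:
    """Compute every supported cryptographic digest for one file."""
    return {algo: file_digest(path, algo) for algo in STDLIB_ALGORITHMS}
```

Because these are cryptographic hashes, they only match byte-identical files; any re-encoding of a video or image produces a different digest.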

Example Response

{
    "count": 19676,
    "next": "http://beta.terrorismanalytics.org/hash-list/v2/all?<params>",
    "previous": null,
    "checkpoint": "1730213563.621023,29152",
    "results": [
        {
            "hash_digest": "baf781254eb82811cdf3fe4751240eb8",
            "algorithm": "MD5",
            "ideology": "Far-right",
            "file_type": "mp4",
            "deleted": false,
            "updated_on": 1730204429.302388,
            "id": 28023
        }
        ...
    ]
}

Best Practices

Implement local caching using the checkpoint mechanism
Process updates incrementally using the pagination system
Consider implementing parallel processing for multiple hash algorithms
Store hash values in their original format to maintain precision
Monitor the deleted flag for deprecated hash values
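
The caching and deleted-flag practices above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not a prescribed design: pages are applied in the order they are received, records are keyed by algorithm and digest, and the checkpoint is only stored. How a checkpoint is sent back to the API is not specified here; since it shares the timestamp,id format of the after parameter, passing it as after on the next sync is a plausible but unconfirmed approach. LocalHashCache is a hypothetical name.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str]  # (algorithm, hash_digest)


class LocalHashCache:
    """In-memory cache of active hashes plus the last checkpoint seen."""

    def __init__(self) -> None:
        self.active: Dict[Key, Dict[str, Any]] = {}
        self.checkpoint: Optional[str] = None

    def apply_page(self, page: Dict[str, Any]) -> None:
        """Apply one response page: upsert live records, drop deleted ones."""
        for record in page["results"]:
            key = (record["algorithm"], record["hash_digest"])
            if record["deleted"]:
                # Soft-deleted upstream: stop matching against this hash.
                self.active.pop(key, None)
            else:
                self.active[key] = record
        # Keep the previous checkpoint if the page does not carry one.
        self.checkpoint = page.get("checkpoint", self.checkpoint)

    def contains(self, algorithm: str, digest: str) -> bool:
        """Return True if the digest is currently an active entry."""
        return (algorithm, digest) in self.active
```

A production system would persist both the records and the checkpoint so that a restart does not force a full re-download.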

Usage With Meta's ThreatExchange

If you want to use the Hash List through Meta's ThreatExchange, you can create a collaboration configuration for our API, then fetch and compare PDQ image hashes and MD5 video hashes.

Step 1 - Install threatexchange

$ pip install threatexchange

Step 2 - Configure the default credentials

$ threatexchange config api tat --credentials '<TCAP_USERNAME>' '<TCAP_PASSWORD>'

Step 3 - Set up config

$ threatexchange config collab edit tat --create 'TAT'

Step 4 - Fetch hashes with verbose logging

$ threatexchange -v fetch

Step 5 - View dataset

$ threatexchange dataset

Step 6 - Match a piece of content

$ threatexchange match ~/path/to/image.jpg

ThreatExchange docs