Content Hash List API (v2.0.0)
The Content Hash Fingerprinting API provides robust access to a comprehensive database of content fingerprints designed for advanced content moderation systems. The API leverages multiple hashing algorithms including MD5, SHA256, SHA512, and PDQ to ensure maximum coverage and accuracy in content identification.
Key features and use cases
The TCAP Archive is a repository of known terrorist or violent extremist content (TVEC) media files, including images, videos and documents. The TCAP Archive Hash List API allows platforms to ingest the hashes produced from these media files in bulk, so they can use them in their content moderation processes.
The TCAP Archive’s hashes are distinct from any existing TVEC hash lists, as they complement Tech Against Terrorism’s proactive monitoring of terrorist internet usage by its team of open-source intelligence specialists. By leveraging this expertise, alongside a suite of automated monitoring capabilities, the TCAP Archive hash list reflects content created and uploaded over a number of years by a range of violent Islamist and violent far-right terrorist entities.
Authentication
To use any of the Hash List endpoints, you need to be an onboarded TCAP Hash List user. These endpoints sit inside the main TCAP backend, so you can access all of them with a standard TCAP JWT.
To obtain a token, make a request to the TCAP authentication endpoint with your username and password.
POST https://beta.terrorismanalytics.org/token-auth/tcap/
{
"username": "YOUR_TCAP_USERNAME",
"password": "YOUR_TCAP_PASSWORD"
}
Response
The Authentication endpoint returns the following data on each request:
token (String): Token to send as a Bearer token on each request you make to the API
user (User): Your system user information
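As a minimal sketch of the flow described above, the following Python builds the token request and the header for subsequent calls. It assumes the credentials are sent as a JSON body and that the header takes the form `Authorization: Bearer <token>`; the documentation only states that the token is used as a Bearer token. The helper names `build_token_request`, `fetch_token` and `bearer_headers` are illustrative and not part of the API.

```python
import json
import urllib.request

# Authentication URL as given in this documentation.
TOKEN_URL = "https://beta.terrorismanalytics.org/token-auth/tcap/"


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a TCAP token.

    Assumption: the endpoint accepts a JSON body.
    """
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(username: str, password: str) -> str:
    """Send the request and return the `token` field of the response."""
    with urllib.request.urlopen(build_token_request(username, password)) as resp:
        return json.load(resp)["token"]


def bearer_headers(token: str) -> dict:
    """Headers to attach to every Hash List request (assumed header format)."""
    return {"Authorization": f"Bearer {token}"}
```

A caller would typically run `headers = bearer_headers(fetch_token(user, password))` once and reuse `headers` on each request until the token expires.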
Hash List by Ideology
GET /hash-list/v2/all
This Hash List API endpoint retrieves a hash list file filtered by a specified ideology.
Parameters
ideology: (Type: String, Required: Yes): Specifies the ideology to filter the results.
GET /api/hash-list/:ideology <'islamist' | 'far-right' | 'all'>
Response
The Hash List endpoint returns the following data on each request:
count (Integer): Total number of hash records available
next (String): URL of the next page results
previous (String): URL of the previous page results (null if first page)
checkpoint (String): Timestamp-based checkpoint identifier for synchronization
results (Array): Array of Hash objects
Hash Object Fields
hash_digest (String): The computed hash value
algorithm ("MD5" | "SHA256" | "SHA512" | "PDQ"): The algorithm used to generate the hash
ideology ('islamist' | 'far-right' | 'all'): Content classification category
file_type (String): Source file format
deleted (Boolean): Whether the file has been removed from the system
updated_on (Float): Unix timestamp of the last update
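The fields above map naturally onto a small typed record. The sketch below is one possible client-side representation; the class name `HashRecord` and its `from_dict` helper are illustrative, not something the API provides. Keys that are not documented in the field list are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashRecord:
    """Client-side view of one Hash object (hypothetical helper type)."""

    hash_digest: str
    algorithm: str      # "MD5" | "SHA256" | "SHA512" | "PDQ"
    ideology: str       # content classification category
    file_type: str      # source file format, e.g. "mp4"
    deleted: bool       # True once the file has been removed
    updated_on: float   # Unix timestamp of the last update

    @classmethod
    def from_dict(cls, raw: dict) -> "HashRecord":
        # Only the documented fields are read; extra keys are dropped.
        return cls(
            hash_digest=raw["hash_digest"],
            algorithm=raw["algorithm"],
            ideology=raw["ideology"],
            file_type=raw["file_type"],
            deleted=bool(raw["deleted"]),
            updated_on=float(raw["updated_on"]),
        )
```

Making the dataclass frozen keeps records hashable, which is convenient when they are stored in sets or used as dictionary keys.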
Pagination
The API implements cursor-based pagination using timestamp and ID pairs. Results can be traversed using the next and previous URLs provided in the response.
Query Parameters
limit: Number of results per page (default: 1000)
offset: Starting position for pagination
order: Sort order for results (asc/desc)
after: Cursor value for pagination (format: timestamp,id)
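One way to traverse the list is to request a first page and then follow `next` until it is null. The sketch below assumes the endpoint lives at `https://beta.terrorismanalytics.org/hash-list/v2/all` and accepts an `Authorization: Bearer` header; `first_page_url`, `http_get_json` and `iter_hashes` are hypothetical helper names. The page fetcher is passed in as a function so the loop can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Assumed base URL, combining the host and path shown in this documentation.
BASE_URL = "https://beta.terrorismanalytics.org/hash-list/v2/all"


def first_page_url(limit: int = 1000, order: str = "asc",
                   after: Optional[str] = None) -> str:
    """Build the URL of the first page from the documented query parameters."""
    params = {"limit": limit, "order": order}
    if after is not None:
        params["after"] = after  # cursor in "timestamp,id" form
    return f"{BASE_URL}?{urlencode(params)}"


def http_get_json(url: str, token: str) -> dict:
    """Fetch one page and decode it as JSON (assumed Bearer header format)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_hashes(first_url: str, get_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every Hash object, following `next` until it is null."""
    url = first_url
    while url:
        page = get_page(url)
        yield from page["results"]
        url = page.get("next")
```

With a token in hand, usage could look like `for record in iter_hashes(first_page_url(), lambda u: http_get_json(u, token)): ...`.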
Implementation Notes
Multiple hashing algorithms provide redundancy and enhanced detection capabilities
Checkpoint field enables efficient delta updates for client-side caching
Each hash entry includes metadata for content categorization and tracking
Real-time updates reflected through updated_on timestamps
Deleted flag allows for soft deletion while maintaining hash history
Example Response
{
"count": 19676,
"next": "http://beta.terrorismanalytics.org/hash-list/v2/all?<params>",
"previous": null,
"checkpoint": "1730213563.621023,29152",
"results": [
{
"hash_digest": "baf781254eb82811cdf3fe4751240eb8",
"algorithm": "MD5",
"ideology": "Far-right",
"file_type": "mp4",
"deleted": false,
"updated_on": 1730204429.302388,
"id": 28023
}
...
]
}
Best Practices
Implement local caching using the checkpoint mechanism
Process updates incrementally using the pagination system
Consider implementing parallel processing for multiple hash algorithms
Store hash values in their original format to maintain precision
Monitor the deleted flag for deprecated hash values
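The practices above can be combined into a small local index that is updated incrementally. The following sketch keeps one set of digests per algorithm, stores digests exactly as received, and drops entries whose `deleted` flag is set. `apply_updates` and `is_known` are hypothetical helpers, and an in-memory dictionary stands in for whatever store a platform actually uses.

```python
from typing import Dict, Iterable, Set

# algorithm name -> set of digests, kept in their original format
HashIndex = Dict[str, Set[str]]


def apply_updates(index: HashIndex, records: Iterable[dict]) -> None:
    """Apply a batch of Hash objects to the local index in place."""
    for record in records:
        digests = index.setdefault(record["algorithm"], set())
        if record["deleted"]:
            # Soft-deleted upstream: stop matching against this digest.
            digests.discard(record["hash_digest"])
        else:
            digests.add(record["hash_digest"])


def is_known(index: HashIndex, algorithm: str, digest: str) -> bool:
    """Exact-match lookup for one algorithm."""
    return digest in index.get(algorithm, set())
```

Exact set membership suits cryptographic hashes such as MD5 or SHA256. PDQ is a perceptual hash that is normally compared by distance, so a real deployment would likely need a different lookup for that algorithm.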
Usage With Meta's Threat Exchange
If you want to use the Hash List through Meta's Threat Exchange, you can create a collaboration configuration for our API, then fetch and compare PDQ image hashes and MD5 video hashes.
Step 1 - Install threat exchange
$ pip install threatexchange
Step 2 - Configure the default credentials
$ threatexchange config api tat --credentials '<TCAP_USERNAME>' '<TCAP_PASSWORD>'
Step 3 - Set up config
$ threatexchange config collab edit tat --create 'TAT'
Step 4 - Fetch hashes with verbose logging
$ threatexchange -v fetch
Step 5 - View dataset
$ threatexchange dataset
Step 6 - Match a piece of content
$ threatexchange match ~/path/to/image.jpg