Distributed-flagging-system

Distributed Flagging System

A secure social media application that uses distributed computing to check whether an image a user wants to post is similar to the images in a reference database. It does this with image hashing methods (perceptual, average, difference, and median hashing); the hash values of the reference images are precomputed and stored in a database.

The uploaded image is then hashed with each server's algorithm and compared against the precomputed hashes using Hamming distance. Each server applies a threshold to decide whether or not to vote that the image matches.
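
The comparison step can be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, hashes are assumed to be stored as integers, and a distance less than or equal to the threshold is assumed to count as a match.

```python
from typing import Iterable


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def votes_similar(query_hash: int, reference_hashes: Iterable[int], threshold: int) -> bool:
    """Vote 'similar' if any reference hash lies within the threshold (assumed rule)."""
    return any(hamming_distance(query_hash, ref) <= threshold for ref in reference_hashes)


print(hamming_distance(0b1010, 0b0110))  # 2
```

A small threshold makes a server stricter (fewer false matches), while a larger one tolerates more distortion in the uploaded image.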

If a quorum is reached, the image isn't uploaded to the social media application and the user is flagged; the user is banned from the application after 2 warnings. (Since this is a proof of concept, we used cat images as the reference images, so you can't post cat images on this server.)


Image Hashing Algorithms

An image hash is a fingerprint of a multimedia file derived from features of its content. Unlike cryptographic hash functions, which rely on the avalanche effect (small changes in the input lead to drastic changes in the output), image hashes are designed so that similar images produce similar hashes.

Image	Image Hash
Query Image (Cat)	0xa4ad99b3629076ae

The above is an example of image hashing: we query using an image, and its hash is computed on each server according to that server's algorithm.

Image Hashing Algorithms Implemented

Perceptual Hash
Average Hash
Median Hash
Difference Hash

Perceptual Hash

Uses the Discrete Cosine Transform (DCT) to capture the frequencies in the image, then builds the hash from the 8x8 upper-left block of the DCT output (the lowest frequencies), setting a bit to 1 if the coefficient is greater than the mean.
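
A rough pure-Python sketch of this idea is shown below. It assumes the image has already been converted to grayscale and scaled to a small square (32x32 is a common choice, but that size is an assumption here), uses a naive DCT instead of a library routine, and includes the DC term in the mean; real implementations often differ on these points. The function names are hypothetical.

```python
import math


def dct_2d(pixels):
    """Naive, unnormalized 2D DCT-II of a square matrix given as a list of rows."""
    n = len(pixels)
    # basis[k][i] = cos(pi * (2i + 1) * k / (2n))
    basis = [[math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
             for k in range(n)]
    # Transform along each row first, then along each column.
    rows = [[sum(row[i] * basis[k][i] for i in range(n)) for k in range(n)]
            for row in pixels]
    return [[sum(rows[i][v] * basis[u][i] for i in range(n)) for v in range(n)]
            for u in range(n)]


def perceptual_hash(pixels, hash_size=8):
    """Hash the upper-left hash_size x hash_size DCT block against its mean."""
    coeffs = dct_2d(pixels)
    low = [coeffs[u][v] for u in range(hash_size) for v in range(hash_size)]
    mean = sum(low) / len(low)
    bits = 0
    for value in low:
        bits = (bits << 1) | int(value > mean)
    return bits


flat_image = [[100] * 32 for _ in range(32)]
print(hex(perceptual_hash(flat_image)))  # 0x8000000000000000 (only the DC term exceeds the mean)
```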

Median Hash

Scale the grayscale image down to 8x8, compute the median of the pixel values, and set a bit to 1 if the pixel is greater than the median.
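
As a minimal sketch, assuming the image is already an 8x8 grid of grayscale values (the resizing step is omitted and the function name is hypothetical):

```python
from statistics import median


def median_hash(pixels):
    """Build a 64-bit hash from an 8x8 grayscale grid: bit = 1 if pixel > median."""
    flat = [value for row in pixels for value in row]
    med = median(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | int(value > med)
    return bits


# Bright top half, dark bottom half: the median is 125.
image = [[200] * 8 for _ in range(4)] + [[50] * 8 for _ in range(4)]
print(hex(median_hash(image)))  # 0xffffffff00000000
```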

Average Hash

Compute the average of the pixel values of the scaled-down (8x8) image, without a DCT, and set a bit to 1 if the pixel is greater than the average.
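
A minimal sketch, again assuming an 8x8 grayscale grid as input and a hypothetical function name:

```python
def average_hash(pixels):
    """Build a 64-bit hash from an 8x8 grayscale grid: bit = 1 if pixel > mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | int(value > mean)
    return bits


# Pixel values 0..63 in row-major order: the mean is 31.5.
ramp = [[r * 8 + c for c in range(8)] for r in range(8)]
print(hex(average_hash(ramp)))  # 0xffffffff
```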

Difference Hash

After scaling the image down and converting it to grayscale, compute the hash from the differences between adjacent pixels.
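
One common way to do this is sketched below. It assumes the image has been scaled to 9 columns by 8 rows, so that each row yields 8 left-to-right comparisons (64 bits in total), and that a bit is 1 when the right pixel is brighter. Both the grid size and the comparison direction are assumptions, and the function name is hypothetical.

```python
def difference_hash(pixels):
    """Build a hash from horizontal neighbors: bit = 1 if right pixel > left pixel."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(right > left)
    return bits


# Every row gets brighter from left to right, so all 64 bits are set.
rising = [list(range(9)) for _ in range(8)]
print(hex(difference_hash(rising)))  # 0xffffffffffffffff
```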

What’s the need for Multiple Algorithms?

Certain algorithms are not resistant to image augmentations such as skewing, cropping, and contrast changes.

Each algorithm has its own threshold that determines when it identifies an image as illegal. We therefore provide a distributed environment using the Raft consensus protocol, in which each server computes its hash and votes based on its threshold, ensuring:

Fault Tolerance (regular Raft leader election at select time intervals)
Reliability (votes from multiple servers ensure reliability)
Safety 

System Voting Architecture


Multiple P-Hash implementation servers are initialized using PySyncObj, and the class is replicated at an IP address and a port.
They communicate with each other using the Raft protocol and use the reference values in the DB to check whether a post is similar to the reference hashes.
Upon quorum (floor(Alive_Servers / 2) + 1 votes), that is, if a majority of the alive servers voted the image to be similar, the user is flagged and the image isn't posted; otherwise it is posted.
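
The quorum rule itself can be sketched in a few lines. This is a standalone illustration that does not use PySyncObj; the function names are hypothetical, and each alive server is assumed to contribute exactly one boolean vote.

```python
from typing import Sequence


def quorum_size(alive_servers: int) -> int:
    """Smallest majority of the alive servers: floor(n / 2) + 1."""
    return alive_servers // 2 + 1


def is_flagged(votes: Sequence[bool]) -> bool:
    """Flag the image if at least a quorum of alive servers voted 'similar'."""
    return sum(votes) >= quorum_size(len(votes))


print(quorum_size(4))                          # 3
print(is_flagged([True, True, True, False]))   # True
print(is_flagged([True, True, False, False]))  # False
```

With 4 alive servers, a 2-2 split does not reach quorum, so the image is posted.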

Software Tech Stack

ReactJS, TailwindCSS & JavaScript: Used for the frontend of the social media application (Triple A).
JSON Web Tokens (JWT): Used for securely representing data between client and server.
Toaster: Used for notifications on a successful post, a warning for an inappropriate post, and a ban.
Python & Flask: Programming language and micro web framework used for the backend.
PySyncObj: A framework for replicating classes on different IP addresses and ports, ensuring consensus using the Raft protocol.
SQLite: Stores the URLs of the image posts, user information, and the reference hashes for each P-hash implementation.

You can check out the implementation here: Demo video

Replicated Classes and Types:

The following is the type initialization for the HashServer class in the backend:

(Note: Make sure these numbers match the ones used at server creation; otherwise the comparison and hash retrieval will be wrong.)
Server type 1 is for Average Hash
Server type 2 is for Difference Hash
Server type 3 is for Perceptual Hash
Server type 4 is for Median Hash
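
One way to keep this mapping in a single place is an enum, sketched below. The `ServerType` name and its members are hypothetical and are not taken from the backend; only the numbers 1 to 4 and their algorithms come from the list above.

```python
from enum import IntEnum


class ServerType(IntEnum):
    """Hypothetical mapping of server type numbers to hashing algorithms."""
    AVERAGE = 1
    DIFFERENCE = 2
    PERCEPTUAL = 3
    MEDIAN = 4


print(ServerType(3).name)  # PERCEPTUAL
```

Looking up an unknown number, such as `ServerType(5)`, raises a `ValueError`, which surfaces a wrong mapping early instead of comparing against the wrong reference hashes.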

Make it Work for your Use Case:

The DB is initialized with the reference hashes of the reference images, which are set up here: Reference DB Images
The reference hashes are calculated when the server is activated, so to block a specific type of content, replace the reference images with your own.
You can initialize any number of servers (in this implementation we initialized 4 servers) with varying hashes and thresholds.
A distributed system is more trustworthy but also slower, so leave enough time between server initializations and for voting (use time.sleep() in the backend for this).

Setup Instructions

Backend Setup

Navigate to the backend directory.
Create a virtual environment:
```
 > python3 -m venv venv
```

Activate the virtual environment:

On Windows:
```
  > venv\Scripts\activate
```
On macOS and Linux:
```
  > source venv/bin/activate
```

Install dependencies:
```
 > pip install -r requirements.txt
```
Set up the database:
- Modify the config.py file to specify your database configuration.
Start the Flask server:
```
 > flask run
```

Frontend Setup

Navigate to the d_flag directory.
Install dependencies:
```
 > npm install
```
Start the React development server:
```
 > npm start
```

Usage

Access the frontend at http://localhost:3000.
The backend APIs are available at http://localhost:5000.
Hit the endpoint http://127.0.0.1:5000/initdb to initialise the DB with the reference cat images. This is required for the algorithms to compare hashes and vote.

Folder Structure

backend: Contains the Flask backend code.
d_flag: Contains the React frontend code.