Information Exchange Portal
Authors: Dheshan M, Aryan Jain
VIT University
Abstract: This project discusses in detail what is required to host an Information Exchange Portal, which stores information provided by its users and supplies it to other users. The portal also uses the AES encryption standard to encrypt user data and user-generated content, protecting both the users’ private information and the portal’s intellectual property (IP).
Keywords: Huffman Encoding, AES Encryption, Compression, Abuse Prevention Systems.

Introduction
In modern times, where security breaches have become commonplace, there is a growing need to protect data in more sophisticated ways. In this project, we tackle the problems faced in storing encrypted data and the steps that need to be taken to ensure that the key remains secret and hidden from prying eyes.

In any service-based application, it is necessary to protect the system from intentional abuse and exploitation by users. Since our project relies on user input to function, we also look at how to filter and vet content, distinguishing genuine posts from content that was typed at random to ‘game’ the system into awarding more points (Tokens).
Problem Description
In our proposed model, we will create a database that holds the following information:
User Information
userID, email address, name and other relevant details.

The number of virtual tokens the user holds.

Information Portal
Content the user wants to share.

The date on which this information was posted.

userID of the user who posted it.

Tags to help classify the information the user has provided.
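
The two record types listed above could be modelled as follows. This is only a minimal sketch in Python: the class names and field names (`User`, `Post`, `tokens`, `posted_on` and so on) are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class User:
    """One row of the user information table (field names are assumed)."""
    user_id: int
    email: str          # sensitive: stored encrypted in the proposed model
    name: str
    tokens: int = 0     # number of virtual tokens the user holds


@dataclass
class Post:
    """One row of the information portal table (field names are assumed)."""
    content: str        # content the user wants to share
    posted_on: date     # date the information was posted
    user_id: int        # userID of the user who posted it
    tags: list[str] = field(default_factory=list)  # classification tags


author = User(user_id=1, email="author@example.com", name="Example Author")
post = Post(content="How to set up a home server.",
            posted_on=date(2020, 1, 1),
            user_id=author.user_id,
            tags=["networking", "how-to"])
```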

In our model, any person can sign up for the service by providing their details, for which they receive a few initial credits (virtual Tokens). From there, they can either spend Tokens to buy information that other people have posted, at a price fixed by the system based on the character count of the information, or post their own information to the database and receive more Tokens for it.
The number of Tokens deducted for a purchase or earned for a post is determined by an algorithm that checks the quality of the post, to prevent users from abusing the system by typing random characters to inflate the character count and gain more credits.
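
A minimal sketch of this token flow is given below. The starting balance, the pricing rate and all function names are hypothetical placeholders, since the portal's actual values are not specified here. The reward for a post is passed in as a plain number because it is computed separately by the quality-checking algorithm.

```python
STARTING_TOKENS = 10    # hypothetical sign-up credit
CHARS_PER_TOKEN = 100   # hypothetical pricing rate: 1 Token per 100 characters

balances: dict[str, int] = {}   # userID -> current Token balance


def sign_up(user_id: str) -> None:
    """Register a new user and grant the initial starting credits."""
    if user_id in balances:
        raise ValueError(f"user {user_id!r} already exists")
    balances[user_id] = STARTING_TOKENS


def price_of(content: str) -> int:
    """Price of a post in Tokens, fixed by its character count (minimum 1)."""
    return max(1, len(content) // CHARS_PER_TOKEN)


def buy(buyer_id: str, content: str) -> int:
    """Deduct the price of a post from the buyer and return that price."""
    price = price_of(content)
    if balances[buyer_id] < price:
        raise ValueError("not enough Tokens to buy this post")
    balances[buyer_id] -= price
    return price


def reward_post(author_id: str, reward: int) -> None:
    """Credit the author with the reward computed by the quality check."""
    balances[author_id] += reward
```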

Sensitive user information (such as email addresses and mobile numbers) and the IP of the portal (the user-submitted information) need to be secured to ensure the system’s integrity and the privacy of its users. To accomplish this, all sensitive content will be encrypted using the AES algorithm.

The number of Tokens awarded for a successful post submission is evaluated by feeding the contents of the post to a Huffman encoding algorithm, which converts the contents to a binary representation. The Tokens to be awarded are then calculated from the Huffman-encoded length and the number of words in the post.

Literature Review
Need for Encryption
Encryption is needed in our project to accomplish the goals stated in [1]. In its most basic form, encryption helps establish the following:
Confidentiality:
To ensure that only the intended recipient gets to read the information transmitted [1].

Authenticity/Authentication:
To prove that the person who claims to have written the message is the one who actually wrote and transmitted it. It helps establish and verify the identity of the sender [1].

Data Integrity:
To check whether the data has been modified, either intentionally or accidentally, by any third party without the explicit authorization of the original sender [1]. This can be ensured by using hashing on both ends of the communication to confirm the integrity of the data.

Non-Repudiation:
To prove that the message was in fact written and transmitted by the sender and was received by its intended recipient, so that the sender cannot later deny having sent it [1].

Access Control:
To enforce a set of policies that limit the powers and privileges of a user or a group of users, so that they have access only to the files they need. This helps prevent unauthorized use of the system [1].
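
The hashing check mentioned under Data Integrity can be sketched as follows. The choice of SHA-256 and the function names are assumptions made for illustration; the source does not name a particular hash function. Note that a plain hash only detects modification if the digest itself reaches the recipient unaltered.

```python
import hashlib
import hmac


def digest(message: bytes) -> str:
    """Return the SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(message).hexdigest()


def is_unmodified(received: bytes, expected_digest: str) -> bool:
    """Recompute the digest on the receiving end and compare it.

    hmac.compare_digest performs the comparison in constant time.
    """
    return hmac.compare_digest(digest(received), expected_digest)


sent = b"Tokens awarded: 3"
sent_digest = digest(sent)                 # computed by the sender

print(is_unmodified(sent, sent_digest))                    # same bytes
print(is_unmodified(b"Tokens awarded: 300", sent_digest))  # altered bytes
```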

Types of Encryption
Based on the number of keys involved in encrypting and decrypting the data, encryption algorithms can be classified into the following two categories:
Symmetric key Cryptography
Asymmetric key Cryptography
The symmetric key system is the more straightforward of the two. It uses a single key to both encrypt and decrypt the data [1]. This simplicity leads to a specific problem: the key must also be known to the recipient, so it has to be transported securely, and if the key is compromised, all data encrypted with it is compromised as well [1].

Asymmetric key systems address this problem by using different keys for encryption and decryption. Each party holds its own key pair, so no secret key needs to be transported: only the public key is shared, while the private key never leaves its owner [1].

Why AES over other Encryption Techniques?
The Advanced Encryption Standard (AES), originally named Rijndael, is an algorithm developed by the Belgian cryptographers Vincent Rijmen and Joan Daemen. It was chosen as the AES by the United States Secretary of Commerce even though the cipher's design is publicly available. It was also approved by the NSA for protecting classified information [1][2][3].
Its selection as the AES owed much to its performance and to the time required to break it by brute force. Breaking a symmetric 256-bit key by brute force requires 2^128 times more computational power than breaking a 128-bit key [1]. Fifty supercomputers that could together check a billion billion (10^18) AES keys per second (if such devices could ever be made) would, in theory, require about 3×10^51 years to exhaust the 256-bit key space. This makes brute-forcing the algorithm practically impossible, which is why we consider it the most secure option among symmetric-key encryption algorithms [1].
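
The brute-force figures above can be checked with a few lines of arithmetic. The sketch below assumes that the quoted rate of 10^18 keys per second is the combined rate of all machines and that the whole key space has to be searched; the constant and function names are our own.

```python
KEYS_PER_SECOND = 10**18                 # assumed combined rate ("a billion billion")
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(key_bits: int, keys_per_second: int = KEYS_PER_SECOND) -> float:
    """Years needed to try every key of the given size at the given rate."""
    return 2**key_bits / keys_per_second / SECONDS_PER_YEAR


# A 256-bit key space is 2^128 times larger than a 128-bit one.
size_ratio = 2**256 // 2**128

print(size_ratio == 2**128)
print(f"{years_to_exhaust(256):.1e} years")
```

Under these assumptions the result is roughly 3.7×10^51 years, in line with the figure of about 3×10^51 years quoted above.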

How does AES work?
AES is based on a design principle known as a substitution–permutation network and is fast in both software and hardware. To encrypt data with AES, the data is split into blocks of 128 bits each; the key is 128, 192 or 256 bits long [3]. Each block is represented as a two-dimensional (4×4) matrix of bytes, called the state, and various operations are performed on it for a fixed number of rounds [2].

The number of rounds depends on the size of the key chosen:
10 rounds for 128-bit keys.

12 rounds for 192-bit keys.

14 rounds for 256-bit keys.
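
The round counts above follow a simple rule: the number of 32-bit words in the key plus six. The sketch below encodes that rule; the function names are our own. Since each round consumes one 128-bit round key and one more is needed for the initial key addition, the number of round keys is the round count plus one.

```python
def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds for a key size of 128, 192 or 256 bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    return key_bits // 32 + 6    # key length in 32-bit words, plus 6


def round_key_count(key_bits: int) -> int:
    """Number of 128-bit round keys: one per round plus the initial one."""
    return aes_rounds(key_bits) + 1


for bits in (128, 192, 256):
    print(bits, aes_rounds(bits), round_key_count(bits))
```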

Operations Performed
KeyExpansion: round keys are derived from the cipher key using Rijndael’s key schedule. AES requires a separate 128-bit round key block for each round, plus one more [2][3].

Initial round key addition:
AddRoundKey: each byte of the state is combined with a byte of the round key using bitwise XOR [3].

Repeated for 9, 11 or 13 rounds:
SubBytes: a non-linear substitution step where each byte is replaced with another according to a lookup table [3].

ShiftRows: a transposition step where the last three rows of the state are shifted cyclically by a certain number of steps [3].

MixColumns: a linear mixing operation that operates on the columns of the state, combining the four bytes in each column [3].

AddRoundKey
Final round (making 10, 12 or 14 rounds in total):
SubBytes
ShiftRows
AddRoundKey
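
To make the sequence of operations concrete, the following is a minimal, unoptimized sketch of single-block AES encryption with a 128-bit key (10 rounds) in pure Python. It is meant only to illustrate the steps listed above. It is not the implementation used in the portal, it omits modes of operation and padding, and a vetted cryptographic library should be used for real data. All function names are our own.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def build_sbox():
    """Compute the SubBytes lookup table: field inverse, then an affine map."""
    sbox = []
    for value in range(256):
        inverse = 0
        if value:
            inverse = 1
            for _ in range(254):        # value^254 is the inverse of value
                inverse = gmul(inverse, value)
        out = 0x63
        for shift in range(5):          # XOR of the byte rotated left by 0..4 bits
            out ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(out)
    return sbox


SBOX = build_sbox()

# The state is a flat list of 16 bytes in column-major order:
# the byte at row r, column c is state[r + 4 * c].


def sub_bytes(state):
    return [SBOX[b] for b in state]


def shift_rows(state):
    # Row r is rotated left by r positions; row 0 stays in place.
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def add_round_key(state, round_key):
    return [s ^ k for s, k in zip(state, round_key)]


def expand_key(key):
    """Derive the 11 round keys of AES-128 from a 16-byte cipher key."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]   # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([w ^ t for w, t in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def encrypt_block(block, key):
    """Encrypt one 16-byte block with a 16-byte key (AES-128)."""
    round_keys = expand_key(key)
    state = add_round_key(list(block), round_keys[0])   # initial round key addition
    for rnd in range(1, 10):                            # 9 full rounds
        state = mix_columns(shift_rows(sub_bytes(state)))
        state = add_round_key(state, round_keys[rnd])
    state = shift_rows(sub_bytes(state))                # final round: no MixColumns
    return bytes(add_round_key(state, round_keys[10]))


key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
print(encrypt_block(plaintext, key).hex())
```

The key and plaintext in the usage lines are the AES-128 example from FIPS-197 (Appendix C.1), which lists the ciphertext 69c4e0d86a7b0430d8cdb78070b4c55a, so the sketch can be checked against a published reference value.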
Proposed Method
Services that deliver content and are driven by user-generated content face two main issues:
Data Security
User Abuse
To counter these issues, the following countermeasures can be employed.

Sophisticated Encryption Techniques
In common practice, small and medium-sized enterprises use centralized encryption servers, which means that all data has to be hauled to and from the server to be encrypted. The data therefore travels longer distances, all the while being vulnerable to attacks or sniffing. This can be dangerous when dealing with sensitive data, as even internal networks can be invaded, and all data transfer lines need to be secured to ensure confidentiality.

Figure 1
The current issue (as showcased in Figure 1) is that there are open, unencrypted data channels within the network. These unprotected channels can be tracked or sniffed by any packet-sniffing tool, which will in turn reveal any data sent through the unsafe channel.

This problem can be mitigated by breaking up the centralized encryption system and opting instead for a decentralized, local encryption server: one that resides on the client system and encrypts all outgoing data before it is sent to the main server. The main server and the clients’ encryption servers can be configured to use the same keys, so that both client and server can read the data freely without the need to transfer keys. This completely eliminates the need for a dedicated encryption server, at the cost of a small overhead.

The next issue when it comes to encryption is how the keys are stored. Storing the keys in the same database as the data is dangerous: when the system gets compromised, the keys get compromised too. Ideally, this is prevented by using separate machines to store the data and the keys. Since we could not implement this in our project due to a lack of technical knowledge, we decided instead to add another layer of security by encrypting the encryption keys with a different key that the user does not have access to. This is done in the back end and adds no overhead for the user, although the server incurs some additional overhead.

Preventing User Abuse
Our project relies on awarding tokens to users for their contributions in the form of ideas or other information. The number of tokens awarded is calculated programmatically from the number of characters submitted. This method can be heavily abused by exploitative users, who can simply submit 1000 characters of random letters and receive the same number of Tokens as a genuine user who has put significant effort into producing a quality post of 1000 characters.

Ideally, this problem would be solved by using compression techniques to reduce the size of the input, with the dictionary used for comparison and compression as large as possible. Since compression is both I/O- and processor-intensive, our project uses Huffman encoding as a fair compromise. That is, the value of a string is based not on its length, but on its encoded length.
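
The idea of valuing a string by its encoded length can be sketched as follows. This is a minimal character-level Huffman coder written for illustration; the function names are our own, and encoding a single-symbol input with one bit per character is an assumption made to handle that edge case.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table (character -> bit string) for the text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                       # edge case: only one distinct symbol
        return {next(iter(freq)): "0"}
    tie = count()                            # tie-breaker so dicts are never compared
    heap = [(weight, next(tie), {symbol: ""}) for symbol, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def encoded_bit_length(text: str) -> int:
    """Length in bits of the text after Huffman encoding."""
    codes = huffman_codes(text)
    return sum(len(codes[ch]) for ch in text)


repetitive = "abab abab abab abab"    # 19 characters, 3 distinct symbols
varied = "the quick brown fox"        # 19 characters, 16 distinct symbols
print(encoded_bit_length(repetitive), encoded_bit_length(varied))
```

Both strings have the same raw length, but the repetitive one draws on far fewer distinct characters and therefore encodes to fewer bits, which is the property the proposed method relies on.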

Results and Analysis
Let us put the proposed method for preventing user abuse to the test. We consider two strings of identical size as our test input. One is the control: a proper post written by a user, with words and spaces resembling a typical English passage. The other consists of random keystrokes and represents users who aim to game the system for Tokens by artificially inflating the character and word count in expectation of a greater reward. We then compare the number of Tokens each algorithm awards to the user.

Test Case 1
abbsfasabafnadfn dndnanf ndffnddnsd fndfsfbdfdrbsrnaer abbsfas abafnadfn dndnanf ndffnddnsdfndf sfbdfdrbsrnaer abbsf asabafnadfn dndnanf ndffnddnsdfnd fsfbdfdrbsrnaer abbsfasabafnadfn dndnanf ndffndd nsdfndfsfbdfdrbsrnaer abbsfasabafnadfn dndnanf ndffnddnsd fndfsfbdfdrbsrnaer abbsfasa bafnadfn dndnanf ndffnddnsdfndfsfbdfdrbsrnaer abbsfasab afnadfn dndnanf ndffnddnsdfndfs fbdfdrbsrnaer abbsfasabafnadfn dndnanf ndffnddn sdfndfsfbdfdrbsrnaer abbsfasabafnadfn dndnanf ndffnddnsdfndfs fbdfdrbsrnaer abbsfas abafnadfn dndnanf ndffnd dnsdf ndfsfbdfdrbsrnaer abbsfasa bafnadfn dndnanf ndffnddnsdfndfs fbdfdrbsrnaer 
Test Case 2 (Control)
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry’s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. This is a control.
Based on the above posts, it is clear that the control post should receive more Tokens than the test post, which is evidently a troll post made with the sole intent of milking the system for Tokens without providing any valid information to other users. The goal, therefore, is to detect such ‘troll’ posts and award them fewer Tokens than genuine posts such as the control.

Without Huffman encoding, the Tokens to be awarded are calculated using the formula:
# of Tokens = length of post / # of words (calculated based on # of blank spaces)
When using Huffman encoding, the formula becomes:
# of Tokens = length of (huffmanEncoding(post)) / # of words (calculated based on # of blank spaces)
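
The two formulas can be written out as the sketch below. It makes two assumptions that the formulas leave open: the number of words is taken as the number of blank spaces plus one, and the Huffman-encoded length is measured in bits with no scaling constant applied. Because no scaling or rounding is given, the sketch is not expected to reproduce the exact values in Table 1.

```python
import heapq
from collections import Counter


def huffman_bit_length(post: str) -> int:
    """Length in bits of the Huffman encoding of the post.

    Uses the fact that the total encoded length equals the sum of the
    weights of all merged nodes created while building the Huffman tree.
    """
    weights = list(Counter(post).values())
    if len(weights) <= 1:
        return sum(weights)          # assumed: 1 bit per character if one symbol
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total += merged
        heapq.heappush(weights, merged)
    return total


def word_count(post: str) -> int:
    """Number of words, derived from the number of blank spaces (assumed +1)."""
    return post.count(" ") + 1


def tokens_without_huffman(post: str) -> float:
    return len(post) / word_count(post)


def tokens_with_huffman(post: str) -> float:
    return huffman_bit_length(post) / word_count(post)


sample = "ab ab"
print(tokens_without_huffman(sample), tokens_with_huffman(sample))
```

Since one value counts characters and the other counts bits, the two results are only comparable after multiplying by a suitable constant.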
Input      Word Value Algorithm   Word Count Algorithm   Using Huffman Encoding   Without Huffman Encoding
Test       11.22                  95                     2                        3
Control    5.063                  50                     3                        3
Table 1. Consolidated results of the proposed method against other algorithms.

Here, the Word Value algorithm determines the worth of each word from its length alone, so the whole algorithm simplifies to # of characters / # of words. The Word Count algorithm, as the name suggests, uses only the # of words to determine its result.

Figure 2. Algorithms v. Tokens Awarded
As Table 1 shows and Figure 2 visualizes, the proposed method using Huffman encoding performs best: it is the only one of the four that awards fewer Tokens to the test post than to the control post (2 versus 3). It is also flexible, since the result can be adjusted (for example, by multiplying by constants to round off numbers), whereas the other algorithms are rigid and leave little room for tuning.

References
[1] Kumari, S. (2017). A Research Paper on Cryptography Encryption and Compression Techniques. International Journal of Engineering and Computer Science. https://doi.org/10.18535/ijecs/v6i4.20

[2] Daemen, J., & Rijmen, V. (2003, March 9). AES Proposal: Rijndael. National Institute of Standards and Technology.

[3] Wikipedia. Advanced Encryption Standard. https://en.wikipedia.org/wiki/Advanced_Encryption_Standard