Research

Introduction

Every day, companies and governments collect, store, and mine huge amounts of digital information about us (our work, activities, hopes, etc.). Big Data is the term used today for this mass of information, and it is getting bigger every day. IBM[1] has reported that about 2.5 quintillion bytes of data (1 quintillion = 1 billion billions) are created every day, and that 90% of the data in the world today was created in the last two years alone. This data comes from many sources, such as sensors, social media posts, online business transactions, smartphones, and GPS signals.

Big Data is most commonly characterized by the “3Vs”: Volume, Variety, and Velocity. These characteristics make Big Data a fertile field for analysis that can reveal new insights and improve decision-making. This phenomenon is changing how businesses operate by increasing overall efficiency and improving user experiences. At the same time, it raises the stakes for both risks and opportunities, because network boundaries are dissolving and adversaries are becoming more sophisticated.



[1] IBM, “What is big data?” http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html

Research philosophy and approach

My research philosophy is to identify experimentally verifiable problems and then develop techniques to address them. My approach uses experimental systems techniques and draws on methodologies rooted in information security, wired and wireless networks, storage, and mobile computing.

Current and future research

Big Data security and privacy are tightly coupled with the 3Vs. The volume of data being collected and stored today is exploding from terabytes to exabytes (1 exabyte = 1 quintillion bytes). The variety of data sources goes beyond differently structured databases to include raw and unstructured data generated by a huge number of sensors, smartphones, social media technologies, log files, domain-specific forums, emails, and more. Similarly, the streaming nature of the data captured or collected makes its velocity incomparable with the rate of change of conventional data.

Therefore, common data security and privacy protection techniques are unsuitable for Big Data, because these traditional mechanisms were designed for small-scale data generated from non-streaming, homogeneous sources.

In my current research, cryptography, data-driven encryption, and real-time analytics will be used together to identify scalable solutions to Big Data security and privacy challenges. The outcome of this research is expected to help industry and government establish best practices when they aim to address the security and privacy of their Big Data.

Research Scope

Because Big Data security and privacy covers a large number of topics, we will concentrate on investigating, and finding suitable solutions to, the most promising problems in the field. We will attack these problems from several dimensions simultaneously, each focusing on a separate but interconnected research area, such as

  • cryptography

  • data-driven security

  • scalable analytics

Collectively, these areas define the scope of this research, which comprises the following four components and their challenges:

  1. Secure Infrastructure

     Challenges:

     • Secure communication protocols

     • Key management and agreement

     • Service availability

  2. Data Encryption

     Challenges:

     • Data-centric encryption that protects data regardless of its storage location or access method, for example, attribute-based encryption, operation-preserving encryption, or policy-based encryption

     • Searching and recognition on encrypted data

  3. Data Privacy

     Challenges:

     • Personal information discovery – classification and protection

     • Data monitoring – masking and control

  4. Data Governance

     Challenges:

     • Real-time data analytics

     • Security of analytics results, for example, providing a new dimension for authentication mechanisms
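
To make the "searching on encrypted data" challenge concrete, the following is a minimal sketch of one well-known idea: keyword search through keyed tokens. It is an illustration only and not a result of this research. The function names, the demo key, and the sample documents are all hypothetical, and encryption of the document bodies themselves is omitted. The server-side index stores only HMAC tokens, so it can answer a query without learning the keyword.

```python
import hmac
import hashlib
from collections import defaultdict

def keyword_token(key: bytes, word: str) -> str:
    """Derive a deterministic search token; the index never stores the keyword itself."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(key: bytes, documents: dict) -> dict:
    """Map each keyword token to the set of document ids that contain the keyword."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index[keyword_token(key, word)].add(doc_id)
    return dict(index)

def search(index: dict, token: str) -> list:
    """Server-side lookup: it sees only the token, never the plaintext keyword."""
    return sorted(index.get(token, set()))

# Hypothetical demo data (assumption, not from the source).
key = b"demo-key-not-for-production"
docs = {
    "d1": "patient record alice",
    "d2": "invoice alice bob",
    "d3": "sensor log",
}
index = build_index(key, docs)
hits = search(index, keyword_token(key, "Alice"))
print(hits)  # ['d1', 'd2']
```

A scheme this simple still reveals which documents share a keyword and which queries repeat, which is part of why search over encrypted data remains an open challenge at Big Data scale.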

 
Previous research

My past research focused on designing and developing digital watermarking techniques for digital image security and hidden communications. New approaches that combine cryptography and steganography were proposed. These approaches show great promise for building strong image security applications such as image authentication, image copyright protection, and hidden communications.

Image Authentication

We have shown that most block-based image authentication techniques struggle with the tradeoff between security and localization accuracy. We proposed a novel approach for embedding fragile watermarks into an image in which this tradeoff no longer exists. In one approach, the cryptosystem is used to protect the watermark embedding process rather than to encrypt the watermark itself. In a second approach, short, image-dependent representations of an image are generated from the correlation coefficient statistics of a pixel block. We have shown that this second approach helps increase both localization accuracy and watermark capacity. Finally, we introduced a novel image authentication technique that utilizes these approaches. We have shown that, compared with existing counterparts, this technique provides more security, better localization accuracy, clearer perceptual verification, faster embedding, and forward compatibility.

Image Copyright Protection

One of the main challenges in image copyright applications is that the embedded watermark (the copyright notice) should be robust against image manipulations such as lossy compression. An elegant information hiding approach that survives JPEG2000 image compression was introduced, in which the information is embedded during the rate/distortion optimization phase. We have shown that the quality of an image containing the hidden information is very close to the quality produced by the JPEG2000 standard alone. Finally, a complete image copyright protection technique that utilizes this approach was proposed.

Hidden Communications

The hidden communication model has been enhanced by adding a key-generation unit to the general model. A new key-agreement protocol, stego-KA, has been proposed to covertly exchange the stego-key(s) between the communicating parties. The proposed stego-KA protocol is based on the Diffie-Hellman key establishment protocol and has significant advantages that support hidden communications.
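
As background, the sketch below shows only the plain Diffie-Hellman exchange that stego-KA is said to build on. It does not reproduce stego-KA itself, and the covert embedding of the exchanged values in cover media is omitted. The prime, the base, and the function names are toy assumptions chosen for readability. Real deployments use standardized groups of 2048 bits or more.

```python
import secrets

# Toy public parameters (assumption): the Mersenne prime 2^127 - 1 and base 3.
P = 2**127 - 1
G = 3

def keypair():
    """Pick a random private exponent in [2, P - 2] and derive the public value."""
    private = secrets.randbelow(P - 3) + 2
    public = pow(G, private, P)
    return private, public

def shared_secret(own_private, peer_public):
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, own_private, P)

# Each party publishes only its public value. In a hidden-communication
# setting these values would travel inside cover media (not shown here).
alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()

alice_key = shared_secret(alice_priv, bob_pub)
bob_key = shared_secret(bob_priv, alice_pub)
print(alice_key == bob_key)  # True
```

Both sides arrive at the same value because (G^a)^b and (G^b)^a are equal modulo P. That shared value could then seed the stego-key(s).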