The success of online social networks (OSNs) has attracted a constant interest in attacking and exploiting them. Attackers usually control malicious accounts, including both fake and compromised real user accounts, to launch attack campaigns such as social spam, malware distribution, and online rating distortion. The goal of this project is to design an effective, generic, and scalable defense mechanism to uncover large groups of malicious accounts.
We observe that malicious accounts tend to act together in a variety of social network context, because attackers usually use them in groups. In a large attack campaign, malicious accounts are controlled to perform certain types of actions in order to accomplish the attacker's missions, such as liking a specific set of Facebook pages, following a certain set of users on Instagram, and other campaign goals driven by black-market demands. Figure 1 shows a real-world example of malicious accounts that share the same mission. The left plot shows the user-following actions of 1000 randomly sampled users from a 7000-malicious-user group on Instagram. This group of users shows a salient on-off behavior pattern. During the active period, these malicious accounts follow the same set of users at around the same time. We call these actions loosely synchronized actions. In contrast, the user-following actions of 1000 sampled normal users shown on the right plot are much more diverse in terms of the followee IDs and the timestamps.
In general, malicious accounts often perform loosely synchronized actions, i.e., acting similarly at around the same time, in a variety of social network context. We believe that this is partly because attackers are driven by profit. They face constraints in order to keep their attack campaigns profitable. Specifically, the missions of attack campaigns constitute attackers' mission constraints and the limited Infrastructure to launch attack campaigns constitute resource constraints. These constraints originate from the economic motive of large attack campaigns. The mission constraints and the resources constraints affect an attacker's revenue and cost, respectively.
|Figure 2: An illustration of the mission and resource constraints an attacker faces in a large attack campaign.|
There are a few of challenges to reliably detect malicious account groups for large-scale social networks such as Facebook.
Based on the understanding of the mission and resource constraints an attacker faces, we propose a behavioral clustering approach, called SynchroTrap, to detect malicious accounts that send out loosely synchronized actions over time. At a high level, we group accounts with similar action sequences into clusters, and then designate large account clusters as suspicious. We made the following design choices for SynchroTrap's clustering system.
|Figure 3: Main idea of SynchroTrap. A vertical arrow on the time axis represents a user's action. Different types of arrows show different types of actions on an online social network. SynchroTrap detects large clusters of malicious accounts that send out loosely synchronized actions.|
Action formulation: To pinpoint synchronized actions, we use a tuple abstraction 〈U, T, C〉 to represent a user action. U represents user ID, T and C are the time-stamp and constraint object embedded in an action. We formulate the explicit constraint field so that we are able to directly attack the resource and mission constraints of an attacker. This tuple abstraction is expressive. The constraint field can express many resource and mission constraints, depending on the applications. For example, in Facebook page like and Instagram user following, we designate the page ID and followee ID as the mission constraint objects. In Facebook photo upload, we designate the source IP address as the resource constraint object.
Action matching: Based on the tuple abstraction, we define action matching, denoted by "≈". Two actions of different users match if they pertain to the same constraint object and their timestamps fall into the same time window of a pre-defined length Tsim (e.g., 1 hour), i.e., 〈Ui, Ti, Ci〉 ≈ 〈Uj, Tj, Cj〉 if Ci=Cj and |Ti - Tj| ≤ Tsim.
Similarity metric: We measure pairwise similarity of users by comparing their sets of actions performed during a detection window. Our similarity metric calculates the ratio of the matched actions to characterize the consistency of the synchronization between users. We use the Jaccard similarity, which measures the similarity of two sets by the ratio of the intersection set size to the union set size. Higher ratio of set overlapping indicates higher similarity. In SynchroTrap, we apply Jaccard similarity to action sets of users. When we compute the intersections and unions, we use our action matching operator ≈.
Parallelizable clustering: We use a parallelizable clustering algorithm that is transformed from the single-linkage hierarchical clustering algorithm. The single-linkage hierarchical clustering algorithm is a bottom-up approach. On a user similarity graph, a user is a node and the similarity of a user pair is the weight of the edge between them. The single-linkage hierarchical clustering algorithm starts with each user as a different cluster, and iteratively merges clusters with high similarity to generate larger clusters. The algorithm terminates when all users are merged into one cluster. This procedure generates a cluster-merging dendrogram. By breaking the dendrogram at a desired level, one can obtain a set of clusters, where the similarity of a user pair within a cluster exceeds a certain threshold. This algorithm is effective. However, the construction of the dendrogram is sequential, making it hard to be parallelized. Our idea is to set the similarity threshold first. If we set the similarity threshold first and filter out user pairs with similarity below that, the desired user clusters are exactly the connected components in the pruned user similarity graph. Therefore, we can avoid the sequential cluster-merging procedure and transform the algorithm to the connected components algorithm, which is known parallelizable and scalable.
SynchroTrap is designed for large-scale social networks such as Facebook and Instagram. To address the big-data challenge, we designed a scalable workflow for SynchroTrap using incremental processing. Our idea is to split computation on a bulk data set into a series of smaller computations and pipeline data chunks between them. As a result, we implement SynchroTrap as a pipeline of jobs. Specifically, we slice the user similarity computation into daily jobs, each of which generates user pairs with matched actions. We aggregate daily similarity measures and perform user clustering on a weekly basis. By breaking down large computations, this design reduces failure-recovery cost and simplifies the system management. The daily results are shared among weekly aggregation jobs. If we rerun particular aggregation and clustering jobs or run them against a data set of a particular period of time, we do not have to rerun the daily jobs. We have also designed other optimization techniques, such as partitioning data with overlapping sliding time windows, to further improve the system performance.
|Figure 4: SynchroTrap's workflow. New aggregation and clustering jobs (dashed) do not incur re-execution of daily jobs. Arrows indicate the data flow.|
We deployed SynchroTrap at Facebook and Instagram. We deployed it in Facebook login and photo upload to catch resource-constrained attacks; we deployed in Facebook application install and page like, and Instagram user following to catch mission-constrained attacks. We aim to answer three key questions regarding the effectiveness and scalability:
Effectiveness: Within the month of August 2013, SynchroTrap detected more than two million malicious accounts and thousands of large attack campaigns, with a precision higher than 99%. These detected attack campaigns involved hundreds of millions of malicious actions.
Detecting new attacks: Within the month of August 2013, at least 70% of the accounts identified in each application are not discovered by existing counter-measures inside Facebook. SynchroTrap complements existing counter-measures by focusing on the loosely synchronized actions across malicious accounts.
Scalability: A daily user comparison job can finish within 10 hours; a weekly aggregation job does so within 15 hours; and a weekly clustering job takes less than 2 hours.
qiangcao AT cs DOT duke DOT edu