Sai Teja Peddinti
Sai Teja Peddinti is a Research Scientist in the Infrastructure Security and Privacy group at Google. He received his PhD in Computer Science from New York University in 2015. The focus of his PhD work was on large-scale, data-driven analysis to understand user privacy preferences and concerns, and to evaluate the effectiveness of privacy solutions. His research interests are in privacy, machine learning, network and cloud security, and cryptography.
Authored Publications
    Towards Fine-Grained Localization of Privacy Behaviors
    Vijayanta Jain
    Sepideh Ghanavati
    Collin McMillan
    2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), pp. 258-277
    Abstract: Privacy labels help developers communicate their application's privacy behaviors (i.e., how and why an application uses personal information) to users. However, studies show that developers face several challenges in creating them, and the resulting labels are often inconsistent with their application's privacy behaviors. In this paper, we create a novel methodology called fine-grained localization of privacy behaviors to locate individual statements in source code which encode privacy behaviors and to predict their privacy labels. We design and develop an attention-based multi-head encoder model which creates individual representations of multiple methods and uses attention to identify relevant statements that implement privacy behaviors. These statements are then used to predict privacy labels for the application's source code and can help developers write privacy statements that can be used as notices. Our quantitative analysis shows that our approach can achieve high accuracy in identifying privacy labels, with the lowest accuracy of 91.41% and the highest of 98.45%. We also evaluate the efficacy of our approach with six software professionals from our university. The results demonstrate that our approach reduces the time and mental effort required by developers to create high-quality privacy statements and can finely localize statements in methods that implement privacy behaviors.
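The abstract centers on attention pooling over statement representations: per-statement embeddings are scored, softmax-weighted into a method representation, and the weights themselves point at the influential statements. The PyTorch sketch below shows that general idea only; dimensions, label count, and names are invented for illustration, and this is not the paper's actual architecture.

```python
# Minimal sketch (hypothetical, not the paper's model): attention pooling over
# per-statement embeddings, where high-weight statements double as the
# localization evidence for predicted privacy labels.
import torch
import torch.nn as nn

class StatementAttentionEncoder(nn.Module):
    def __init__(self, embed_dim: int, num_labels: int):
        super().__init__()
        self.attn = nn.Linear(embed_dim, 1)           # scores each statement
        self.classifier = nn.Linear(embed_dim, num_labels)

    def forward(self, stmt_embeds: torch.Tensor):
        # stmt_embeds: (num_statements, embed_dim), one row per code statement
        scores = self.attn(stmt_embeds).squeeze(-1)   # (num_statements,)
        weights = torch.softmax(scores, dim=0)        # attention weights
        method_repr = (weights.unsqueeze(-1) * stmt_embeds).sum(dim=0)
        logits = self.classifier(method_repr)         # privacy-label logits
        return logits, weights                        # weights localize statements

model = StatementAttentionEncoder(embed_dim=128, num_labels=4)
logits, weights = model(torch.randn(20, 128))  # 20 statements, toy embeddings
print(weights.topk(3).indices)  # indices of the most relevant statements
```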
    Abstract: Integrating user feedback is one of the pillars for building successful products. However, this feedback is generally collected in an unstructured free-text form, which is challenging to understand at scale. This is particularly demanding in the privacy domain due to the nuances associated with the concept and the limited existing solutions. In this work, we present Hark, a system for discovering and summarizing privacy-related feedback at scale. Hark automates the entire process of summarizing privacy feedback, starting from unstructured text and resulting in a hierarchy of high-level privacy themes and fine-grained issues within each theme, along with representative reviews for each issue. At the core of Hark is a set of new deep learning models trained on different tasks, such as privacy feedback classification, privacy issues generation, and high-level theme creation. We illustrate Hark’s efficacy on a corpus of 626M Google Play reviews. Out of this corpus, our privacy feedback classifier extracts 6M privacy-related reviews (with an AUC-ROC of 0.92). With three annotation studies, we show that Hark’s generated issues are of high accuracy and coverage and that the theme titles are of high quality. We illustrate Hark’s capabilities by presenting high-level insights from 1.3M Android apps.
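Hark's three model stages (privacy-feedback classification, issue generation, theme creation) compose into a single summarization pipeline. The skeleton below mirrors only that control flow, with trivial keyword heuristics standing in for the trained models; every function here is a hypothetical placeholder, not Hark's implementation.

```python
# Hypothetical skeleton of the three-stage pipeline the abstract describes
# (classify -> generate fine-grained issues -> group into high-level themes).
# In Hark, each stage is a trained deep learning model, not a heuristic.
from collections import defaultdict

def is_privacy_feedback(review: str) -> bool:
    """Stage 1 placeholder: binary privacy-feedback classifier."""
    return any(k in review.lower() for k in ("privacy", "permission", "track"))

def generate_issue(review: str) -> str:
    """Stage 2 placeholder: map a review to a fine-grained issue."""
    return "tracking" if "track" in review.lower() else "data collection"

def theme_for(issue: str) -> str:
    """Stage 3 placeholder: roll fine-grained issues up into a theme title."""
    return "Unwanted data use" if issue == "tracking" else "Data collection"

def summarize(reviews: list[str]) -> dict[str, dict[str, list[str]]]:
    themes: dict = defaultdict(lambda: defaultdict(list))
    for review in filter(is_privacy_feedback, reviews):
        issue = generate_issue(review)
        themes[theme_for(issue)][issue].append(review)  # keep representatives
    return themes

print(summarize(["Great app!", "It tracks my location constantly."]))
```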
    Abstract: In this paper we present a methodology to analyze users’ concerns and perspectives about privacy at scale. We leverage NLP techniques to process millions of mobile app reviews and extract privacy concerns. Our methodology is composed of a binary classifier that distinguishes between privacy and non-privacy related reviews. We use clustering to gather reviews that discuss similar privacy concerns, and employ summarization metrics to extract representative reviews to summarize each cluster. We apply our methods to 287M reviews for about 2M apps across the 29 categories in Google Play to identify top privacy pain points in mobile apps. We identified approximately 440K privacy-related reviews. We find that privacy-related reviews occur in all 29 categories, with some issues arising across numerous app categories and other issues only surfacing in a small set of app categories. We show empirical evidence that confirms dominant privacy themes: concerns about apps requesting unnecessary permissions, collection of personal information, frustration with privacy controls, tracking, and the selling of personal data. As far as we know, this is the first large-scale analysis to confirm these findings based on hundreds of thousands of user inputs. We also observe some unexpected findings, such as users warning each other not to install an app due to privacy issues, users uninstalling apps for privacy reasons, as well as positive reviews that reward developers for privacy-friendly apps. Finally, we discuss the implications of our method and findings for developers and app stores.
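The described methodology is classify, cluster, then pick representatives. Below is a toy sketch of the last two steps using TF-IDF features, k-means, and distance-to-centroid as a stand-in for the paper's summarization metrics; all data and parameter choices are invented.

```python
# Toy sketch (hypothetical data and parameters): cluster privacy-related
# reviews and pick the review closest to each centroid as its representative.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reviews = [  # assumed to have already passed the privacy/non-privacy classifier
    "This app asks for contacts permission for no reason",
    "Why does a flashlight need my location?",
    "It keeps tracking me even when closed",
    "Sells my personal data to advertisers",
]
X = TfidfVectorizer().fit_transform(reviews)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for c in range(km.n_clusters):
    idx = np.where(km.labels_ == c)[0]
    # representative = review closest to the cluster centroid
    dists = np.linalg.norm(X[idx].toarray() - km.cluster_centers_[c], axis=1)
    print(f"cluster {c}:", reviews[idx[np.argmin(dists)]])
```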
    PAcT: Detecting and Classifying Privacy Behavior of Android Applications
    Vijayanta Jain
    Sanonda Datta Gupta
    Sepideh Ghanavati
    Collin McMillan
    Proceedings of the 15th ACM Conference on Security and Privacy in Wireless and Mobile Networks, Association for Computing Machinery, New York, NY, USA (2022), 104–118
    Abstract: Interpreting and describing mobile applications' privacy behaviors so as to create consistent and accurate privacy notices is a challenging task for developers. Traditional approaches to creating privacy notices are based on predefined templates or questionnaires and do not rely on any traceable behaviors in code, which may result in inconsistent and inaccurate notices. In this paper, we present an automated approach to detect privacy behaviors in the code of Android applications. We develop the Privacy Action Taxonomy (PAcT), which includes labels for Practice (i.e., how applications use personal information) and Purpose (i.e., why). We annotate ~5,200 code segments based on the labels and create a multi-label, multi-class dataset with ~14,000 labels. We develop and train deep learning models to classify code segments, achieving F-1 scores of 79.62% for Practice and 79.02% for Purpose, the highest across all label types.
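PAcT frames the task as multi-label, multi-class classification over code segments, so a natural modeling choice is independent sigmoid heads for the Practice and Purpose label sets. The sketch below illustrates that framing with invented label names and dimensions; it is not the paper's model.

```python
# Hypothetical sketch of a multi-label classifier over code-segment embeddings,
# in the spirit of PAcT's Practice/Purpose labels (label names invented here).
import torch
import torch.nn as nn

PRACTICE = ["collect", "share", "store"]          # illustrative label sets
PURPOSE = ["advertising", "functionality", "analytics"]

class SegmentClassifier(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.practice_head = nn.Linear(dim, len(PRACTICE))
        self.purpose_head = nn.Linear(dim, len(PURPOSE))

    def forward(self, seg_embed: torch.Tensor):
        # Independent sigmoids allow multiple labels per code segment.
        return (torch.sigmoid(self.practice_head(seg_embed)),
                torch.sigmoid(self.purpose_head(seg_embed)))

model = SegmentClassifier(dim=256)
practice_p, purpose_p = model(torch.randn(256))   # toy segment embedding
print([l for l, p in zip(PRACTICE, practice_p.tolist()) if p > 0.5])
```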
    A Large Scale Study of Users' Behaviors, Expectations and Engagement with Android Permissions
    Weicheng Cao
    Chunqiu Xia
    David Lie
    Lisa Austin
    USENIX Security Symposium, USENIX Association, https://www.usenix.org/conference/usenixsecurity21 (2021)
    Abstract: We conduct a global study on the behaviors, expectations, and engagement of 1,719 participants across 10 countries and regions towards Android application permissions. Participants were recruited using mobile advertising and used an application we designed for 30 days. Our app samples user behaviors (decisions made), rationales (via in-situ surveys), expectations, and attitudes, as well as some app-provided explanations. We study the grant and deny decisions our users make and build mixed-effects logistic regression models to illustrate the many factors that influence this decision making. Among several interesting findings, we observed that users facing an unexpected permission request are more than twice as likely to deny it compared to users who expect it, and that permission requests accompanied by an explanation have a deny rate that is roughly half the deny rate of requests without explanations. These findings remain true even when controlling for other factors. To the best of our knowledge, this may be the first study of actual privacy behavior (not stated behavior) for Android apps, with users using their own devices, across multiple continents.
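The abstract mentions mixed-effects logistic regression over repeated decisions per participant. As a rough illustration, one way to fit such a model in Python is statsmodels' Bayesian mixed GLM, with fixed effects for expectation and explanation and a per-user random intercept; the column names and toy data below are invented, and this is not the authors' analysis code.

```python
# Hedged sketch: a mixed-effects logistic regression of grant/deny decisions
# with a random intercept per participant (toy data, invented column names).
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.DataFrame({
    "granted":   [1, 0, 1, 1, 0, 0, 1, 0],   # 1 = permission granted
    "expected":  [1, 0, 1, 1, 0, 0, 1, 1],   # user expected the request
    "explained": [1, 0, 0, 1, 1, 0, 1, 0],   # request came with an explanation
    "user_id":   [1, 1, 2, 2, 3, 3, 4, 4],
})
model = BinomialBayesMixedGLM.from_formula(
    "granted ~ expected + explained",         # fixed effects
    {"user": "0 + C(user_id)"},               # random intercept per participant
    df,
)
result = model.fit_vb()                       # variational Bayes fit
print(result.summary())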
    PriGen: Towards Automated Translation of Android Applications' Code to Privacy Captions
    Vijayanta Jain
    Sanonda Datta Gupta
    Sepideh Ghanavati
    Research Challenges in Information Science, Springer International Publishing (2021), pp. 142-151
    Abstract: Mobile applications are required to give privacy notices to users when they collect or share personal information. Creating consistent and concise privacy notices can be a challenging task for developers. Previous work has attempted to help developers create privacy notices through questionnaires or predefined templates. In this paper, we propose a novel approach and a framework, called PriGen, that extends this prior work. PriGen uses static analysis to identify Android applications’ code segments which process personal information (i.e., permission-requiring code segments) and then leverages a Neural Machine Translation model to translate them into privacy captions. We present the initial analysis of our translation task for ~300,000 code segments.
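PriGen's first step is identifying permission-requiring code segments, which are then fed to the translation model. Below is a crude, regex-based stand-in for that static analysis step; the API-to-permission map is a tiny invented sample, and real static analysis is considerably more involved.

```python
# Illustrative sketch (not PriGen's analysis): flag "permission-requiring"
# code segments by scanning for Android API calls known to need permissions.
import re

API_PERMISSIONS = {  # hypothetical subset of an API-to-permission map
    r"getLastKnownLocation\s*\(": "ACCESS_FINE_LOCATION",
    r"query\s*\(\s*ContactsContract": "READ_CONTACTS",
}

def permission_requiring_segments(java_source: str):
    for method in java_source.split("}\n"):        # crude method split
        for pattern, permission in API_PERMISSIONS.items():
            if re.search(pattern, method):
                yield permission, method.strip()   # segment for captioning

src = "void locate() {\n  loc = lm.getLastKnownLocation(provider);\n}\n"
for perm, segment in permission_requiring_segments(src):
    print(perm, "->", segment[:40])
```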
    Reducing Permission Requests in Mobile Apps
    Martin Pelikan
    Ulfar Erlingsson
    Giles Hogben
    Proceedings of ACM Internet Measurement Conference (IMC) (2019)
    Abstract: Users of mobile apps sometimes express discomfort or concerns with what they see as unnecessary or intrusive permission requests by certain apps. However, encouraging mobile app developers to request fewer permissions is challenging because there are many reasons why permissions are requested; furthermore, prior work has shown it is hard to disambiguate the purpose of a particular permission with high certainty. In this work we describe a novel, algorithmic mechanism intended to discourage mobile-app developers from asking for unnecessary permissions. Developers are incentivized by an automated alert, or "nudge", shown in the Google Play Console when their apps ask for permissions that are requested by very few functionally-similar apps, in other words, by their competition. Empirically, this incentive is effective, with significant developer response since its deployment: permissions have been redacted by 59% of apps that were warned, and this attenuation has occurred broadly across both app categories and app popularity levels. Importantly, billions of users' app installs from Google Play have benefited from these redactions.
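The nudge fires when an app requests a permission that few functionally-similar peers request. A schematic version of that comparison, with the threshold and data shapes invented for illustration:

```python
# Schematic peer comparison behind the nudge (hypothetical threshold): flag a
# permission when fewer than RARE_FRACTION of functionally-similar apps use it.
RARE_FRACTION = 0.05  # invented cutoff

def permissions_to_flag(app_perms: set[str], peer_perms: list[set[str]]) -> set[str]:
    flagged = set()
    for perm in app_perms:
        peers_requesting = sum(perm in p for p in peer_perms)
        if peers_requesting / len(peer_perms) < RARE_FRACTION:
            flagged.add(perm)  # candidate for a Play Console nudge
    return flagged

peers = [{"CAMERA"}, {"CAMERA"}, {"CAMERA", "INTERNET"}] * 10
print(permissions_to_flag({"CAMERA", "READ_CONTACTS"}, peers))
# -> {'READ_CONTACTS'}: none of the 30 peer apps request it
```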
    Abstract: A great deal of research on the management of user data on smartphones via permission systems has revealed significant levels of user discomfort, lack of understanding, and lack of attention. The majority of these studies were conducted on Android devices before runtime permission dialogs were widely deployed. In this paper we explore how users make decisions with runtime dialogs on smartphones with Android 6.0 or higher. We employ an experience sampling methodology in order to ask users the reasons influencing their decisions immediately after they decide. We conducted a longitudinal survey with 157 participants over a 6-week period. We explore the grant and denial rates of permissions, overall and on a per-permission-type basis. Overall, our participants accepted 84% of the permission requests. We observe differences in the denial rates across permission types; these vary from 23% (for microphone) to 10% (for calendar). We find that one of the main reasons for granting or denying a permission request is users’ expectation of whether or not an app should need that permission. A common reason for denying permissions is that users know they can change them later. Among the permissions granted, our participants said they were comfortable with 90% of those decisions, indicating that for 10% of grant decisions users may be consenting reluctantly. Interestingly, we found that women deny permissions twice as often as men.
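Per-permission denial rates like those reported above are straightforward to compute from experience-sampling logs; a toy pandas version with invented data:

```python
# Toy computation of per-permission denial rates from sampled grant/deny
# events (data invented; matches the kind of breakdown the study reports).
import pandas as pd

events = pd.DataFrame({
    "permission": ["microphone", "microphone", "calendar", "calendar", "calendar"],
    "granted":    [0, 1, 1, 1, 0],
})
denial_rates = 1 - events.groupby("permission")["granted"].mean()
print(denial_rates)  # denial rate per permission type
```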
    Perceived Frequency of Advertising Practices
    Allen Collins
    Aaron Sedley
    Allison Woodruff
    Symposium on Usable Privacy and Security (SOUPS), Privacy Personas and Segmentation Workshop, Usenix (2015)
    Abstract: In this paper, we introduce a new construct for measuring individuals’ privacy-related beliefs and understandings, namely their perception of the frequency with which information about individuals is gathered and used by others for advertising purposes. We introduce a preliminary instrument for measuring this perception, called the Ad Practice Frequency Perception Scale. We report data from a survey using this instrument, as well as the results of an initial clustering of participants based on this data. Our results, while preliminary, suggest that this construct may have future potential to characterize and segment individuals, and is worthy of further exploration.
    Abstract: The range of topics that users of online services consider sensitive is often broader than what service providers or regulators deem sensitive. A data-driven approach can help providers improve products with features that let users exercise privacy preferences more effectively.
    Cloak and Swagger: Understanding Data Sensitivity through the Lens of User Anonymity
    Aleksandra Korolova
    2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18-21, 2014, IEEE Computer Society, pp. 493-508