Exploring Online Sub-communities for Autism and Social Connections (DSI-SRP)
This DSI-SRP fellowship funded Chet Weissberg to work in the Network and Data Science (NDS) laboratory led by Assistant Professor Tyler Derr in the Department of Computer Science during the summer of 2021. Chet is a junior with a major in Mathematics and minors in Computer Science and Cinema and Media Arts.
The project funded by this fellowship aimed to better understand the nature of communication and relationships in online communities related to ASD (autism spectrum disorder) and mental illnesses including depression. This project used a collected dataset of user posts and comments from specific subreddit threads on Reddit.
Chet worked to better understand this dataset using Python data science libraries, including Pandas and Vaex, along with language processing and sentiment analysis tools such as the Natural Language Toolkit (NLTK). He analyzed different Reddit sub-communities to compute basic statistics such as the average number of comments per post and how frequently each user was active. He processed the text of posts to estimate their sentiment and toxicity, and used that information to learn about the larger Reddit communities. One important application of language processing in this project was the comparison of Identity-First and Person-First language. In the ASD community, Identity-First language such as “ASD individual” is generally preferred over Person-First language such as “individual with ASD,” although this remains a controversial issue for some. Using part-of-speech (POS) analysis, a common technique in natural language processing (NLP), Chet found that, in subreddit communities related to ASD, Identity-First language is currently used about five times more often than Person-First language.
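As a rough illustration of how Identity-First and Person-First phrases might be counted in post text, the sketch below uses simple regular expressions from the Python standard library. This is a simplified stand-in, not the POS-based analysis used in the project: the patterns, the function name `count_language_styles`, and the sample posts are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical noun list for illustration; a POS-based analysis would not
# need a fixed vocabulary like this.
NOUNS = r"(?:person|people|individual|individuals|adult|adults|child|children)"

# Identity-First: descriptor before the noun, e.g. "autistic adult".
IDENTITY_FIRST = re.compile(rf"\b(?:autistic|asd)\s+{NOUNS}\b", re.IGNORECASE)

# Person-First: noun before the condition, e.g. "person with autism".
PERSON_FIRST = re.compile(rf"\b{NOUNS}\s+with\s+(?:autism|asd)\b", re.IGNORECASE)


def count_language_styles(texts):
    """Count Identity-First and Person-First phrase matches across texts."""
    counts = Counter(identity_first=0, person_first=0)
    for text in texts:
        counts["identity_first"] += len(IDENTITY_FIRST.findall(text))
        counts["person_first"] += len(PERSON_FIRST.findall(text))
    return counts


# Made-up sample posts, not taken from the project's dataset.
sample_posts = [
    "As an autistic adult, I find small talk exhausting.",
    "My brother is a person with autism and loves trains.",
    "Autistic people and ASD individuals are welcome here.",
]

style_counts = count_language_styles(sample_posts)
print(style_counts)
```

A pattern-based count like this misses phrasings outside its fixed vocabulary, which is one reason a POS-based approach may generalize better to real posts.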
This language and sentiment analysis was an important input to the social network analysis. Modeling user interactions as a social network in various ways was key to better understanding these communities. The models included post-centric graphs (tree graphs of a post and its associated comments) and user-centric graphs (graphs with users as nodes connected by their interactions). These graphs were analyzed on the basis of sentiment and toxicity between nodes, through node degree analysis, and through other techniques from graph and social network theory. This work has yielded, and continues to yield, important results about the potentially unique qualities of these online communities.
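A user-centric graph of this kind can be sketched with plain Python dictionaries. The example below is a minimal illustration under stated assumptions, not the project's implementation: the record layout (comment author, parent author, sentiment score), the function names, and the sample interactions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (comment_author, parent_author, sentiment score).
# The scores are made up; the project's sentiment scale is not specified.
interactions = [
    ("alice", "bob", 0.6),
    ("carol", "bob", -0.2),
    ("bob", "alice", 0.4),
    ("dave", "alice", 0.1),
]


def build_user_graph(records):
    """Build an undirected graph: user -> {neighbor: [sentiment scores]}."""
    graph = defaultdict(lambda: defaultdict(list))
    for author, parent, sentiment in records:
        if author == parent:
            continue  # ignore users replying to themselves
        graph[author][parent].append(sentiment)
        graph[parent][author].append(sentiment)
    return graph


def node_degrees(graph):
    """Degree of a user = number of distinct users they interacted with."""
    return {user: len(neighbors) for user, neighbors in graph.items()}


def mean_edge_sentiment(graph, u, v):
    """Average sentiment over all interactions between u and v, or None."""
    scores = graph.get(u, {}).get(v, [])
    return sum(scores) / len(scores) if scores else None


user_graph = build_user_graph(interactions)
degrees = node_degrees(user_graph)
print(degrees)
print(mean_edge_sentiment(user_graph, "alice", "bob"))
```

Storing the list of sentiment scores on each edge keeps both the interaction count and the average sentiment recoverable, so the same structure supports degree analysis and sentiment analysis between nodes.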
Additionally, Chet and his partner Ben Van Sleen worked on creating a graph neural network (GNN) predictive model using these graphs. As a proof of concept, one GNN classified posts by the subreddit in which they were posted. Chet and Ben are considering exploring this part of the project further. Apart from continuing to analyze the data and its graphs, they are interested in continuing to work on GNN models because these models make use of the predictive value of the dataset. Improved models based on this data may be able to use an individual’s post and interaction history to predict whether they will post in a certain mental health-related community (e.g., the r/SuicideWatch subreddit).
In addition to receiving support through a DSI-SRP fellowship, this project was supported and facilitated by the DSI Data Science Team through their regular summer workshops and demo sessions.