Contact me at firstname.lastname@example.org.
SafeCity: Understanding Diverse Forms of Sexual Harassment Personal Stories
Sweta Karlekar, Mohit Bansal
Oral Presentation at EMNLP 2018
Paper (EMNLP 2018)
Oral Presentation (EMNLP 2018)
Dataset and Data Splits
With the recent rise of #MeToo, an increasing number of personal stories about sexual harassment and sexual abuse have been shared online. In order to push forward the fight against such harassment and abuse, we present the task of automatically categorizing and analyzing various forms of sexual harassment, based on stories shared on the online forum SafeCity. For the labels of groping, ogling, and commenting, our single-label CNN-RNN model achieves an accuracy of 86.5%, and our multi-label model achieves a Hamming score of 82.5%. Furthermore, we present analysis using LIME, first-derivative saliency heatmaps, activation clustering, and embedding visualization to interpret neural model predictions and demonstrate how this helps extract features that can help automatically fill out incident reports, identify unsafe areas, avoid unsafe practices, and ‘pin the creeps’.
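As a rough illustration of the multi-label metric mentioned above, here is a minimal sketch of one common definition of a Hamming score: for each story, the size of the intersection of true and predicted label sets divided by the size of their union, averaged over all stories. Whether this matches the exact definition used in the paper is an assumption, and the function name `hamming_score` and the example labels are hypothetical.

```python
from typing import Sequence, Set


def hamming_score(true_labels: Sequence[Set[str]],
                  pred_labels: Sequence[Set[str]]) -> float:
    """Average per-example |true AND pred| / |true OR pred|.

    Assumption: this intersection-over-union form is one common
    definition of the multi-label Hamming score; the paper's exact
    definition may differ.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have equal length")
    if not true_labels:
        raise ValueError("at least one example is required")

    total = 0.0
    for truth, pred in zip(true_labels, pred_labels):
        union = truth | pred
        # Convention chosen here: an example with no true and no
        # predicted labels counts as a perfect match.
        total += 1.0 if not union else len(truth & pred) / len(union)
    return total / len(true_labels)


# Hypothetical example using the three label names from the abstract.
truth = [{"groping"}, {"ogling", "commenting"}, set()]
preds = [{"groping"}, {"ogling"}, set()]
score = hamming_score(truth, preds)
```

Unlike exact-match accuracy, this score gives partial credit when a prediction recovers only some of a story's labels, as in the second example.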
Detecting Linguistic Characteristics of Alzheimer’s Dementia by Interpreting Neural Models
Sweta Karlekar, Tong Niu, Mohit Bansal
Poster Presented at NAACL 2018
Paper (NAACL 2018)
Poster (NAACL 2018)
Presentation (UNC Undergraduate Research Symposium)
Alzheimer's disease (AD) is an irreversible and progressive brain disease that can be stopped or slowed down with medical treatment. Language changes serve as a sign that a patient's cognitive functions have been impacted, potentially leading to early diagnosis. In this work, we use NLP techniques to classify and analyze the linguistic characteristics of AD patients using the DementiaBank dataset. We apply three neural models based on CNNs, LSTM-RNNs, and their combination, to distinguish between language samples from AD and control patients. We achieve a new independent benchmark accuracy for the AD classification task. More importantly, we next interpret what these neural models have learned about the linguistic characteristics of AD patients, via analysis based on activation clustering and first-derivative saliency techniques. We then perform novel automatic pattern discovery inside activation clusters, and consolidate AD patients' distinctive grammar patterns. Additionally, we show that first-derivative saliency can not only rediscover previous language patterns of AD patients, but also shed light on the limitations of neural models. Lastly, we also include analysis of gender-separated AD data.
#MeToo: Neural Detection and Explanation of Language in Personal Abuse Stories
Sweta Karlekar, Mohit Bansal
Poster Presented at NAACL WiNLP 2018
Paper (NAACL WiNLP 2018)
Poster (NAACL WiNLP 2018)
The detection and classification of domestic abuse stories shared online has ever-increasing importance in today's social activism sphere. With massive numbers of stories shared, automatic detection can aggregate stories from around the internet and help push forward the fight against domestic abuse from a social campaign to social change. We develop CNN, LSTM-RNN, and CNN-LSTM neural models to detect domestic abuse stories in the Reddit Domestic Abuse dataset. We achieve 95.8% accuracy in classifying posts as containing abuse stories versus not containing abuse stories, outperforming the current state of the art. More importantly, we next examine the feasibility of sentiment-only classification and present interpretable and explainable analysis of the neural model's predictions, using activation clustering techniques to automatically discover linguistic features.
Developing a Method to Mask Trees in Commercial Multispectral Imagery
Becker, S. J.; Daughtry, C. S. T.; Jain, D.; Karlekar, S. S.
American Geophysical Union, Fall Meeting 2015
The US Army has an increasing focus on using automated remote sensing techniques with commercial multispectral imagery (MSI) to map urban and peri-urban agricultural and vegetative features; however, similar spectral profiles between trees (i.e., forest canopy) and other vegetation result in confusion between these cover classes. Established vegetation indices, like the Normalized Difference Vegetation Index (NDVI), are typically not effective in reliably differentiating between trees and other vegetation. Previous research in tree mapping has included integration of hyperspectral imagery (HSI) and LiDAR for tree detection and species identification, as well as the use of MSI to distinguish tree crowns from non-vegetated features. This project developed a straightforward method to model and mask out trees from eight-band WorldView-2 (1.85 meter x 1.85 meter resolution at nadir) satellite imagery at the Beltsville Agricultural Research Center in Beltsville, MD, spanning 2012 to 2015. The study site included tree cover, a range of agricultural and vegetative cover types, and urban features. The modeling method exploits the product of the red and red edge bands and defines accurate thresholds between trees and other land covers. Results show this method outperforms established vegetation indices including the NDVI, Soil Adjusted Vegetation Index, Normalized Difference Water Index, Simple Ratio, and Normalized Difference Red Edge Index in correctly masking trees while preserving the other information in the imagery. This method is useful when HSI and LiDAR collection are not possible or when using archived MSI.
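The band-product idea described above can be sketched as follows. This is only an illustration under stated assumptions: the threshold bounds `low` and `high`, the direction of the comparison, the reflectance values, and all function names are hypothetical, since the abstract does not give the actual thresholds. NDVI is included for comparison using its standard definition, (NIR - red) / (NIR + red). Pure Python is used for self-containment; real imagery would normally be processed with an array library.

```python
from typing import List, Optional, Tuple

# One pixel as (red, red_edge, nir) reflectance values.
Pixel = Tuple[float, float, float]


def ndvi(red: float, nir: float) -> float:
    """Standard NDVI: (NIR - red) / (NIR + red); 0.0 if both bands are zero."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def is_tree(red: float, red_edge: float, low: float, high: float) -> bool:
    """Flag a pixel as tree when red * red_edge lies in [low, high].

    Assumption: the bounds and the use of a closed interval are
    placeholders; the abstract only says thresholds on this product
    separate trees from other land covers.
    """
    return low <= red * red_edge <= high


def mask_trees(pixels: List[Pixel], low: float, high: float) -> List[Optional[Pixel]]:
    """Replace tree pixels with None and keep all other pixels unchanged."""
    return [None if is_tree(r, re, low, high) else (r, re, n)
            for (r, re, n) in pixels]


# Hypothetical reflectances: a dark canopy pixel and a brighter crop pixel.
pixels = [(0.03, 0.10, 0.45), (0.08, 0.25, 0.50)]
masked = mask_trees(pixels, low=0.0, high=0.005)
```

With these made-up values, the first pixel falls inside the product interval and is masked, while the second is preserved along with all of its band values.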