Big Data Tech 2016 has ended
Tuesday, June 7 • 10:45am - 11:30am
Learning from History's Largest Source of Knowledge: The Promise and Pitfalls of Mining Wikipedia

Wikipedia offers 279 different language editions ranging from English and Estonian to Swahili and Scots. This talk showcases the algorithmic opportunities of this amazing source of knowledge and touches on its potentially dangerous limitations. We will discuss two new algorithms that extend the groundbreaking word2vec neural network. The first produces remarkably accurate vector representations of over 50 million words in hundreds of different languages. The second leverages simple geographic features to accurately infer the relationships between geographic places. Finally, we'll discuss big-data studies that surface geographic and gender inequalities within Wikipedia, and explore the effects of this bias on algorithms that learn from Wikipedia.

Shilad Sen

Associate Professor, Macalester College
Shilad Sen is an Associate Professor of Computer Science at Macalester College in St. Paul, MN and a data science research fellow for Target Corporation. He studies the relationship between algorithms, software, and people and focuses on biases and inequalities along dimensions such...

Tuesday June 7, 2016 10:45am - 11:30am CDT
P0806 (1st floor) 9700 France Ave S, Bloomington, MN 55431