Visually Representing Data: Dr. Lisa Singh
Dr. Singh believes data mining can be beneficial when used appropriately. (Photo: Roland Dimaya)
Dr. Singh worked with student Mitch Beard to develop a software tool called Invenio that they use to visualize social networks. (Photo: Roland Dimaya)
Students say Dr. Singh is a vibrant and exciting teacher. (Photo: Roland Dimaya)
By Theodora Danylevich
Dr. Lisa Singh sees her research work as a puzzle waiting for her attention.
“All my life, I have enjoyed solving puzzles,” says Dr. Singh, a professor in Georgetown’s Department of Computer Science. “I view the problems I work on as complex puzzles with stringent constraints. I then think about different ways to solve them. I hope that my solutions are meaningful and can be used to better understand how things work and predict outcomes with some reasonable degree of accuracy.”
Computer science becomes a powerful, state-of-the-art tool in the hands of Dr. Singh. She and her students focus much of their research on data mining, the practice of using algorithms to sift through vast amounts of data in order to detect patterns, relationships, and irregularities. Data mining is gaining currency and importance because massive amounts of data have been stockpiled in computer databases and on the Internet over the last few decades.
“The ’90s was a big time for collecting data, and now people have such huge repositories and want to analyze it effectively,” says Dr. Singh, describing data mining as “a search for some type of hidden knowledge in large data sets.”
One important application of data mining is credit card fraud detection and theft prevention. While data mining can be a great tool to help protect cardholders, problems sometimes arise. It can be quite frustrating near the end of a big day of holiday shopping to find that a block has been put on your card due to “unusual activity.” This happened to Dr. Singh a few years ago when she was shopping for the Indian holiday of Diwali. She was purchasing gifts for each of the many children on her list when the cashier suddenly informed her that her card had been declined. The reason was not a lack of funds but purchasing activity that looked unusual and suspicious, the kind of pattern that data mining systems are designed to answer by placing a block on a card or account.
“I immediately got on the phone with the company, and made sure that it never happened again. I told them that their data mining algorithm was awful and explained some ‘checks’ that should be included. I think the manager was shocked at the detailed level of changes I thought were necessary,” she says. To Dr. Singh, this incident was more than a mere hassle: “Data mining is an imprecise science. Results need to be interpreted with care. Otherwise, good data mining technologies are overshadowed by bad ones.”
Dr. Singh also has a passionate interest in privacy preservation, a relatively new area of research that is of increasing importance in today’s age of massive circulation of personal data, from online social networking sites to online banking and credit card data mining activities.
“I have always had concerns about the amount of information we readily share in our emails, on our webpages, during online purchases, and now in applications like Facebook and Flickr,” says Dr. Singh. “I have even larger concerns about the sharing of information across companies for data mining applications. While data mining itself is not a privacy threat, companies and agencies that have large amounts of data may use the data in conjunction with other publicly available data to ‘discover’ details that we prefer remain private.”
Nonetheless, data mining is an important and beneficial tool when managed appropriately.
“Because I see the potential gains of data mining for a large range of applications, I think it is important to investigate ways to continue to use data mining algorithms and share data without revealing the identity of those whom the data involves,” Dr. Singh says.
In her work on privacy preservation in data mining, she seeks to strike a critical balance between gaining important knowledge from data and maintaining consumer privacy. To do this, she thinks of ways to transform and aggregate data, removing data items that could reveal the identity of an individual. Data miners can then still identify peaks and bursts in the data, which are useful indicators of typical and unusual patterns of activity, without ever being able to recover the private data of individuals.
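To make the idea concrete, here is a minimal sketch of aggregating data before mining it. It is an illustration only, not Dr. Singh's method: the record fields (`customer`, `day`, `amount`), the function names, and the two-standard-deviation threshold are all assumptions chosen for the example. The identifying field is dropped during aggregation, and bursts are then detected from the daily totals alone.

```python
from collections import Counter
from statistics import mean, pstdev


def aggregate_daily_counts(records):
    """Keep only the number of events per day; identifying fields are dropped."""
    return Counter(record["day"] for record in records)


def find_bursts(daily_counts, threshold=2.0):
    """Return the days whose count lies more than `threshold` standard
    deviations above the mean daily count (a simple z-score rule)."""
    counts = list(daily_counts.values())
    if not counts:
        return []
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # every day looks the same, so nothing stands out
    return sorted(
        day for day, count in daily_counts.items()
        if (count - mu) / sigma > threshold
    )


# Hypothetical purchase records: two per day for six days, then twenty in one day.
records = [
    {"customer": f"customer-{i}", "day": f"2024-11-0{d}", "amount": 25.0}
    for d in range(1, 7)
    for i in range(2)
]
records += [
    {"customer": f"customer-{i}", "day": "2024-11-07", "amount": 25.0}
    for i in range(20)
]

daily_counts = aggregate_daily_counts(records)  # no customer names survive here
bursts = find_bursts(daily_counts)
print(bursts)  # ['2024-11-07']
```

Simple aggregation like this does not by itself guarantee privacy, since a day with very few events can still point to one person. That tension is part of why balancing knowledge against privacy is a research problem in its own right.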
Dr. Singh and her students also use data mining to learn about social networks, among other applications. Information about social networks can be particularly convoluted and difficult to analyze, but it is increasingly sought after in our global market economy. Dr. Singh and her students have therefore taken on the challenge of developing visualization techniques that make the networks easier to interpret. Dr. Singh’s two-year project with student Mitch Beard has yielded Invenio, a cutting-edge dynamic social network visualization program and visual data mining tool that allows social network information to be accessed and understood in new ways (see related video).
“As I continue to teach computer science, I have noticed how useful pictures can be for understanding both the problem and the solution,” says Dr. Singh. “Given the complexities of the data mining problems, algorithms, and solutions, I believe users can benefit from visual tools that can be used to interactively explore the data, view the data at multiple abstraction levels, and attempt to interpret the results of various data mining algorithms.”
Dr. Singh and her students have also teamed up with Dr. Janet Mann (featured in a previous issue) to incorporate innovative data warehousing and data mining techniques into Dr. Mann’s study of dolphins in the Monkey Mia Dolphin Research Project. For this project, Georgetown senior Clare Schramm, a teaching and research assistant for Dr. Singh, helped develop interactive observation forms that can be used during data collection. Alumnus Gregory Nelson developed the data model for a data warehouse that stores the observations made by various members of the Monkey Mia Dolphin Research Project. With data stored in a scalable data repository, data mining and graph mining techniques can be developed to better understand dolphin social structure. In helping Dr. Mann, Dr. Singh and her students have gained the benefit of working first-hand with real-world observational data.
“It was challenging and rewarding to work with Dr. Mann and her researchers to create a model that allowed them to put all the details they observed into the database, where they could search and analyze it an order of magnitude faster compared to their old data store,” says Nelson.
Respected and appreciated by her students, Dr. Singh is considered a fun and dynamic instructor.
“Teaching is something I know and love,” she says. “I enjoy trying to find simple ways to teach complex ideas. Sometimes I use strange examples to help students understand a concept. In many ways my philosophy of teaching is intertwined with my research objectives: finding ways to take something complex and simplify it. My greatest reward is when I teach my students programming, database management, or data mining and they come back years later and say, ‘I am using ideas you taught me. My managers think I am brilliant.’”
For more information on data mining, see previous Research News profiles of Dr. Hans Engler and Dr. Mark Maloof.