
# Oral Candidacy - Sudip Vhaduri

Start: 5/9/2017 at 11:00AM
End: 5/9/2017 at 3:00PM
Location: 258 Fitzpatrick Hall

Faculty and students are welcome to attend the presentation portion of the defense.

Oral Candidacy
May 9, 2017, 11:00 am, 258 Fitzpatrick
Committee:
Dr. David Hachen, Dr. Aaron Striegel, Dr. Dong Wang

Title:

Reliable Discovery of Significant Places and Behavioral Patterns from Crowdsensed Datasets

Abstract:

Studies have shown that people spend around 92% of their day at “places of personal importance”, e.g., their homes or workplaces. Knowing not only the actual geographic location of a place but also its type, or its significance to an individual, is essential for location-based services. Examples of such services include:

• Location-based delivery of to-do lists (e.g., reminding users of their shopping list when they are close to a grocery store, or showing a list of books recommended by friends when a user is at a bookstore and has time to browse).
• Activity recommendations (e.g., a user visiting a place for sightseeing can be recommended outdoor exercise or food, depending on which facilities exist at that place or nearby).
• Transportation routine prediction.
• Location-based activity recognition.
• Location-based social networks.
• Selection of device operating modes (e.g., switching a phone into vibrate mode when entering a hospital, a movie theater, a lecture hall, a place for personal reflection, or a place where one meets socially with others).
• Mining an individual's life pattern, i.e., lifestyle and regularity in activity and mobility patterns.
• Predicting future mobility patterns.
• Finding similarities among users (e.g., people with similar location histories are likely to share similar interests and preferences).

Places of personal importance are detected using spatio-temporal clustering, which is essentially the process of grouping objects based on their spatial and temporal similarity. Spatio-temporal clustering is an emerging data mining research area dedicated to the development and application of novel computational techniques for data that record the position, time, and thematic (non-spatial) properties of objects, such as moving cars, forest fires, and earthquakes. Spatio-temporal datasets essentially capture changing values of spatial and thematic attributes over a period of time. An event in a spatio-temporal dataset describes a spatial and temporal phenomenon that may happen at a certain time “t” and location “x”, e.g., an earthquake, a hurricane, a traffic jam, or a road accident. While the two geographic dimensions alone are relatively manageable, combining them with time raises a number of challenges. The unique characteristics of spatio-temporal datasets require significant modification of data mining techniques so that they can exploit the rich spatial and temporal relationships and patterns embedded in the data. Classical data mining techniques often perform poorly when modeling and representing spatio-temporal phenomena due to two main types of complexity:

• Changes in the spatial and non-spatial properties of a spatio-temporal object are a mixture of continuous and discrete values.
• Each spatio-temporal object is influenced by its neighboring spatio-temporal objects. For example, the spread of a fire is influenced by rain and by changing wind speed and direction.
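To make the idea of grouping location samples by spatial and temporal similarity more concrete, here is a minimal Python sketch of one common approach, stay-point detection: consecutive samples that remain within a distance radius of an anchor sample for at least a minimum duration form one cluster. This is an illustration under stated assumptions, not the method proposed in this work; the `Point` type, the `stay_points` function, the 200 m / 20 min thresholds, and the example coordinates are all hypothetical, and the trace is assumed to be sorted by timestamp.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    t: float    # Unix timestamp in seconds (hypothetical field layout)
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def stay_points(trace, max_dist_m=200.0, min_stay_s=20 * 60):
    """Return (start_t, end_t, mean_lat, mean_lon) for every run of consecutive
    samples that stays within max_dist_m of its first sample for at least
    min_stay_s seconds. Thresholds are illustrative assumptions."""
    clusters = []
    i, n = 0, len(trace)
    while i < n:
        j = i + 1
        # Spatial criterion: extend the window while samples stay near the anchor trace[i].
        while j < n and haversine_m(trace[i], trace[j]) <= max_dist_m:
            j += 1
        window = trace[i:j]
        # Temporal criterion: keep the window only if the user stayed long enough.
        if window[-1].t - window[0].t >= min_stay_s:
            lat = sum(p.lat for p in window) / len(window)
            lon = sum(p.lon for p in window) / len(window)
            clusters.append((window[0].t, window[-1].t, lat, lon))
            i = j  # resume after the detected cluster
        else:
            i += 1  # slide the anchor forward by one sample
    return clusters


if __name__ == "__main__":
    # Hypothetical trace: 30 min near one spot, then a 10 min stop about 5 km away.
    home = [Point(t, 41.7000, -86.2400 + 0.0001 * (k % 2))
            for k, t in enumerate(range(0, 1801, 300))]
    away = [Point(t, 41.7450, -86.2400) for t in range(2100, 2701, 300)]
    # Only the 30-minute stay qualifies; the 10-minute stop is shorter than min_stay_s.
    for start, end, lat, lon in stay_points(home + away):
        print(f"place from t={start:.0f}s to t={end:.0f}s near ({lat:.4f}, {lon:.4f})")
```

Both thresholds directly shape the result: a larger radius or a shorter minimum stay admits more, looser clusters, which is the same parameter-sensitivity problem discussed for real-world traces.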

Researchers have been using spatio-temporal clustering techniques to group continuous streams of location data into clusters, where each cluster may represent such a place of interest. However, due to some users' frequent position changes within the same locale (e.g., a person moving between different offices at work), loss of location data (e.g., in GPS-denied areas), and other situations (e.g., a user turning off the smartphone), there are often significant gaps in location traces. In fact, studies have shown that many collected traces miss data for about 40-70% of the time on average. As a result, segmenting such incomplete traces often yields a large number of small clusters, many of which actually represent the same place. Determining appropriate clustering parameters is challenging: if the threshold for the time gap between consecutive data points used to separate clusters is chosen too small, too many clusters may result. On the other hand, if this threshold is chosen too large, the resulting clusters are inaccurate, i.e., many data points outside the place of interest will be considered part of that place. One way to address this is to start with a relatively low threshold and then merge co-located small clusters into fewer, larger clusters, e.g., using fingerprint matching. Unfortunately, these techniques typically suffer from significant accuracy problems. Therefore, we need to find techniques that fill the gaps in location traces in order to reliably determine users' significant places.
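The trade-off between a small and a large time-gap threshold, and the idea of merging co-located small clusters afterwards, can be illustrated with a minimal Python sketch. It assumes a time-sorted trace of hypothetical `(timestamp_s, lat, lon)` tuples and splits it purely on time gaps (real systems usually add spatial criteria as well). The merge step is a simple greedy centroid-distance rule chosen only for illustration; it is not fingerprint matching and not the technique proposed in this work. All function names, thresholds, and coordinates are hypothetical.

```python
import math


def dist_m(lat1, lon1, lat2, lon2):
    """Equirectangular distance approximation in metres, adequate for a few km."""
    r = 6_371_000.0
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return r * math.hypot(x, y)


def segment_by_gap(trace, max_gap_s):
    """Split a time-sorted trace into segments wherever two consecutive
    samples are more than max_gap_s seconds apart."""
    segments, current = [], []
    for p in trace:
        if current and p[0] - current[-1][0] > max_gap_s:
            segments.append(current)
            current = []
        current.append(p)
    if current:
        segments.append(current)
    return segments


def centroid(points):
    """Mean latitude and longitude of a list of (t, lat, lon) samples."""
    return (sum(p[1] for p in points) / len(points),
            sum(p[2] for p in points) / len(points))


def merge_colocated(segments, radius_m):
    """Greedily merge each segment into the first existing place whose centroid
    lies within radius_m; otherwise start a new place. Order-dependent by design."""
    places = []
    for seg in segments:
        c = centroid(seg)
        for place in places:
            pc = centroid(place)
            if dist_m(c[0], c[1], pc[0], pc[1]) <= radius_m:
                place.extend(seg)
                break
        else:
            places.append(list(seg))  # copy so the input segment is not mutated
    return places


if __name__ == "__main__":
    # Hypothetical trace: one office visited with data gaps, then a second place ~2 km away.
    trace = [(0, 41.7000, -86.24), (60, 41.7000, -86.24), (120, 41.7000, -86.24),
             (720, 41.7002, -86.24), (780, 41.7002, -86.24),
             (1680, 41.7001, -86.24), (1740, 41.7001, -86.24),
             (3600, 41.7180, -86.24), (3660, 41.7180, -86.24)]
    small = segment_by_gap(trace, max_gap_s=300)    # the office is split into 3 segments
    print(len(small), "segments with a 300 s gap threshold")
    print(len(merge_colocated(small, radius_m=100)), "places after merging")
    large = segment_by_gap(trace, max_gap_s=3600)   # both places collapse into 1 segment
    print(len(large), "segment with a 3600 s gap threshold")
```

With the small threshold the single office appears as three separate segments until the merge step collapses them; with the large threshold the office and the second place end up in one segment, mirroring the two failure modes described above. Because the greedy merge depends on segment order and on the chosen radius, it also hints at why such simple merging can be inaccurate.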