I am kind of changing the way I do things here on the blog, by introducing new categories and maybe letting go of some of the old ones. It's been over 5 years, and I feel that I need to make some changes now.

And so, the first of the many new fun, rookie things that we’re going to do is find out where a person has been by learning what WiFi networks he has connected to (or has been around) in the past. It does sound fun, no?

tl;dr => Installed an app on my mobile that recorded the details of all the WiFi networks that were discovered by my device over a period of some months. Then, used Google Maps Geolocation API to find out the location of these WiFi networks. Plotted those co-ordinates on Google Maps, and it accurately showed me where I had been in the past.

To carry out this project, I installed an app called Funf Open Sensing Framework on my Android device. It is a sensing and data processing framework for mobile devices. That means it lets you collect a wide range of data signals accessible via your mobile phone; you can record and play with your Call Detail Records, application data, and so on. Here we are going to record and play with our WiFi data. I ran Funf for 3-4 months, I think, and what follows is an analysis of what it collected.

After running the app for I don’t know how long, I exported 480 MB worth of WiFi connection metadata as a CSV file.

There were over 2 million rows, which LibreOffice Calc was not able to open. But luckily (or maybe obviously?), pandas was. After deduplicating the records, there were 1030 rows left, i.e. 1030 different WiFi networks that my mobile phone had found or joined in the past. These networks would be instrumental in finding out the places I have been to.
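For anyone curious what the deduplication step might look like, here is a minimal sketch. It uses the standard library's csv module rather than pandas so that it runs anywhere, and it assumes that "duplicate" means "same BSSID" and that the first sighting is the one to keep. The column headers and sample rows are made up for illustration; the real export has 22 columns.

```python
import csv
import io


def dedupe_by_bssid(rows):
    """Keep the first row seen for each BSSID, preserving the original order."""
    first_seen = {}
    for row in rows:
        # setdefault only stores the row if this BSSID has not appeared yet.
        first_seen.setdefault(row["BSSID"], row)
    return list(first_seen.values())


# Made-up sample standing in for the exported CSV (assumed column names).
sample_csv = io.StringIO(
    "BSSID,SSID,level,Seen\n"
    "aa:aa:aa:aa:aa:01,HomeNet,-52,1456800000\n"
    "aa:aa:aa:aa:aa:02,CafeNet,-71,1456800300\n"
    "aa:aa:aa:aa:aa:01,HomeNet,-60,1456803600\n"
)

unique_networks = dedupe_by_bssid(csv.DictReader(sample_csv))
print(len(unique_networks), "unique networks")
```

Because csv.DictReader yields one row at a time, the same function could be pointed at a large file without loading every row into memory at once.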

I’ll just mention quickly, since curiosity did get the better of me, that out of those 1030 WiFi networks, 15 were still using WEP. Bummer.

Here’s what the dataset looks like (there are 22 columns; the image below shows only some of them):

Wifi Networks

There are several columns, but we are primarily concerned with 3 of them: the BSSID (the MAC address of the access point we’re connecting to), the SSID (the name of the WiFi network) and the signal strength (the ‘level’ column in the CSV).

What’s the use of this data? Well, the BSSID uniquely identifies the network device and thus the network, and the signal strength tells us how near or far we are from that particular network. We’re going to send this information in a request to the Google Maps Geolocation API, which then returns the location.

How does Google Maps link WiFi access points to locations? My best guess is that they’ve somehow collected location data for different WiFi access points, and when given two or more of these along with their signal strengths, they simply triangulate the position from them. How they collected the location data of WiFi access points is beyond me. Maybe it was something similar to the way Google Street View went around and picked up the photos? Not sure.

While it was all fine and dandy, there was still a problem. The Google Maps Geolocation API needs information about at least two nearby WiFi access points in order to return a location, but we have no way of knowing which access points in our dataset are near each other. Or do we?

It took me a while to realise that I could find nearby WiFi access points by simply taking into account another column called ‘Seen’. This column holds the timestamp of when the WiFi network was discovered by the device. I made a simple assumption: whatever networks I see within a 30-minute window are networks of the same area.

Nearby WiFi Groups

So, starting with the first access point, if the time difference between it and the next access point was less than 30 minutes, I added them both to a single network group. Then I took the next access point, compared its timestamp with the first one, and did the same. However, if the difference was greater than 30 minutes, I created a new network group for that access point and repeated the comparison with the access points that followed it.

In this way, at the end, I had grouped nearby WiFi networks together (depending on whether they were seen within 30 minutes of each other). The 1030 WiFi networks were grouped into 128 groups.
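The grouping described above can be sketched roughly like this. It is only an illustration of the idea, not the exact code from the repo: the function name, the (BSSID, timestamp) tuple layout and the sample data are all made up, and a gap of exactly 30 minutes is treated as starting a new group, which is a choice the description leaves open.

```python
from datetime import datetime, timedelta


def group_by_time(networks, window=timedelta(minutes=30)):
    """Group (bssid, seen) pairs by comparing each one with the first member of the current group."""
    groups = []
    group_start = None
    for bssid, seen in sorted(networks, key=lambda n: n[1]):
        # Start a new group when this sighting is 30 minutes or more
        # after the first sighting of the current group.
        if group_start is None or seen - group_start >= window:
            groups.append([])
            group_start = seen
        groups[-1].append(bssid)
    return groups


# Made-up sightings: three within half an hour, one much later.
networks = [
    ("aa:aa:aa:aa:aa:01", datetime(2016, 3, 1, 9, 0)),
    ("aa:aa:aa:aa:aa:02", datetime(2016, 3, 1, 9, 10)),
    ("aa:aa:aa:aa:aa:03", datetime(2016, 3, 1, 9, 25)),
    ("aa:aa:aa:aa:aa:04", datetime(2016, 3, 1, 11, 0)),
]

groups = group_by_time(networks)
print(groups)
```

Comparing against the first member of the group (rather than the previous network) keeps every group inside a single 30-minute window, so a long walk with a new network every few minutes does not end up as one giant group.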

Geolocation API Response

After that, I just queried the Google Maps Geolocation API with all the information for a particular network group and stored the returned latitudes and longitudes. While doing so, I discarded the locations whose accuracy was not good enough. The accuracy field gives the accuracy of the location in meters; the higher the value, the less precisely we can pinpoint the location. I discarded all the locations that had an accuracy of 3000 meters or above, which left me with only 105 groups.
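Here is a rough sketch of what one such query and the accuracy check could look like, using only the standard library. The helper names, the sample group and the two sample responses are invented for illustration, and the request is only built, not sent. Setting considerIp to false is my own assumption here, so that the API does not fall back to an IP-based location when it doesn't recognise the access points.

```python
import json
import urllib.request

GEOLOCATE_URL = "https://www.googleapis.com/geolocation/v1/geolocate"
MAX_ACCURACY_M = 3000  # discard anything this imprecise or worse


def build_request(group, api_key):
    """Build (but do not send) a Geolocation API request for one network group."""
    body = {
        "considerIp": False,
        "wifiAccessPoints": [
            {"macAddress": bssid, "signalStrength": level}
            for bssid, level in group
        ],
    }
    return urllib.request.Request(
        f"{GEOLOCATE_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def usable_location(response):
    """Return (lat, lng), or None when the accuracy radius is 3000 m or more."""
    if response["accuracy"] >= MAX_ACCURACY_M:
        return None
    location = response["location"]
    return location["lat"], location["lng"]


# Hypothetical group of (BSSID, signal level) pairs and a placeholder key.
group = [("aa:aa:aa:aa:aa:01", -52), ("aa:aa:aa:aa:aa:02", -71)]
request = build_request(group, api_key="YOUR_API_KEY")
payload = json.loads(request.data)

# Made-up responses in the shape the API returns.
sample_ok = {"location": {"lat": 27.7, "lng": 85.3}, "accuracy": 40}
sample_bad = {"location": {"lat": 27.7, "lng": 85.3}, "accuracy": 5000}
print(usable_location(sample_ok), usable_location(sample_bad))
```

To actually send it, pass the request to urllib.request.urlopen with a real API key and parse the JSON body of the reply before handing it to usable_location.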

After reverse geocoding the latitudes and longitudes (i.e. obtaining the name of a location from its co-ordinates), here’s the visualization of the data: the streets and cities I’ve been to the most.

Streets I have been to the most
Cities I have been to the most

Finally, let’s plot these positions on Google Maps: Where have I been?

How accurate are these? I’ll try to see if I remember being in these places.

  • There are two places that have a lot of markers: around the Dhapasi area, and around the Sifal area. Dhapasi is where I live, and Sifal is where my college is. You can actually trace a path using the markers from Dhapasi to Sifal, which is my everyday travel route. Zoom in on the map and follow the markers on the path starting from Dhapasi. It goes like this: Dhapasi – Basundhara – Ringroad – Gopi Krishna Hall – Chabahil – Mitrapark – Deerwalk Institute of Technology. Pretty accurate. These are the places where my WiFi was turned on, so it picked up the networks around that area, and we used these networks’ MAC addresses and signal strengths to get their locations. If you click on a marker, you can see which nearby WiFi networks triangulated my position to that particular location.
  • If you scroll down a bit, you’ll see some markers around Sanothimi. That is where I had gone to take my semester exams.
My positions plotted on Google Maps
  • Another group of markers is near the Kathmandu College of Management, where I had gone to take part in the Open Data Day Hackathon 2016.
  • Sundhara and Jamal: yeah, I remember being at those places. If you zoom in on Sundhara, you can actually see it pointing to the General Post Office, which is exactly where I was.
  • I had been to Chitwan and you can see that there are two markers outside of Kathmandu. While returning, I had checked in from the Bharatpur Airport, and if you zoom in on one of the markers, you can actually see it. The other marker is near the place where I had stayed while I was there.

See how accurate the whole thing was? And yes, I don’t travel much.

How to reproduce the same on your end?

I have made the de-duplicated dataset public along with the code that was used. Both are available on this GitHub repo.

You can also analyse your locations based on your laptop’s WiFi data. In Windows, the network history details are kept in the registry, and Python’s winreg module can read them. In Linux, however, information about all the seen networks is not kept; only some are. You can access these records by opening the network manager’s Network Connections list. I could not find the file where these are stored along with their BSSIDs, so you might have to do some automation to grab the BSSIDs of these networks from the Network Connections dialogue box. Or you could start today and begin collecting WiFi networks using airmon-ng and airodump-ng from the aircrack-ng suite. If you just want to use your mobile phone, then install Funf and let it run.

Thank you for reading. I hope it was not boring.