My Travel Application Doesn’t Know About Stonehenge, and That’s Okay
Posted on 2026-01-26
Tagged as: Personal Projects

In late 2023, my partner and I did the unthinkable and ventured six hours into the Sahara Desert to camp out on the sand for a night… with a tour group, hence I’m alive to write the very words that you’re reading right now. To say that we were off the beaten path is a bit of an understatement. In fact, about five hours into our drive, the driver had taken a hard right turn off of the last thing we’d see that even resembled a path until the next day.
Going to Egypt’s White Desert National Park was perhaps the greatest travel experience I had embarked on in my life up until that point. We spent the day sandboarding down dunes the size of buildings and exploring alien-like rock structures, only to fall asleep lying on the cool sand as we watched shooting stars crisscross the arms of the Milky Way above us.
You see, as I was rapidly approaching the end of my 20s, I would soon face the impossible choice of which hobby to turn into my entire personality. As I lay under the most beautiful night sky I’ve ever seen, it was hard to imagine that I would’ve chosen anything other than traveling.
In pursuit of that goal, I was determined to find more of these types of experiences around the world.
My Plan to Become the Coolest Person Alive
Normally, when I’m planning a trip, I gravitate towards blog posts about other travelers’ itineraries. I’ll usually end up reading a handful of these itineraries and picking out some of the interesting tourist attractions that I find between the 95,000 SafetyWing affiliate links.
I figured that finding truly unique travel destinations was pretty much just a matter of reading every single travel itinerary ever written and finding the places and activities that were mentioned the least. While I didn’t have time to read that many blog posts, ChatGPT definitely did, so I hatched a plan to download the text from as many blog posts as I could find and chuck them into ChatGPT.
Within about a month, I managed to process about 2 million blog posts that were sourced from about 500 individual travel blogs to build a list of over 18 million places to go. I was ecstatic. Surely, with a list of tourist attractions this long, I would be able to visit enough unique travel destinations to become the coolest person that anyone has ever met.
Coincidentally, the initial list was generated right around the time I happened to be getting ready for a trip to Peru. It felt like a great opportunity to use what I had just built. I opened up localhost, excitedly typed in “Peru,” and prepared to go somewhere so off the beaten path that no one had ever been there before.
Finding An Oasis in the Desert
Needless to say, this is not what happened when I first opened up the development version of my hot new travel website. In fact, most of the entries that were listed were just basic activities that could be done by travelers, such as “hiking in Peru” and “eating in Peru.” There were other massive issues too. For instance, in other countries, I would also discover entries for destinations that had long since been destroyed, such as the Colossus of Rhodes. Admittedly, if I had figured out a way to visit that, I would definitely have been the coolest person that anyone has ever met.
After I had scrolled past the entries for eating, breathing, and defecating in Peru, I came across two entries that did pique my interest: the Museum of Antonio Pasamán and the Oasis de Congollay. Seeing these two entries among the excruciatingly long list of bodily functions that I was able to perform in Peru felt like seeing a mirage in the desert (or how I imagine that would feel; I did not see one in Egypt).
There was just one tiny problem with both of these destinations: they didn’t exist. Which did at least accomplish my goal of going somewhere that no one had ever been to before.
Going Nowhere Fast (Literally)
The particularly adept product managers among us may be able to spot a slight problem with a travel application that includes both places that no longer exist and places that never existed. It simply isn’t what anyone in the biz would call a delightful user experience; in fact, you might call it a DEFCON-1 product situation. Signing up for a new, unknown website to learn about unique travel experiences only to find the Colossus of Rhodes and a place that doesn’t exist erodes trust in the product.
Beyond that, though, the first iteration of Nomadic Atlas wasn’t even a useful product. I tend to view search-type websites the same way that I view convenience stores: I just want to get in, find what I’m looking for, and leave as quickly as possible. Scrolling through a list of 18 million travel destinations, most of which are just some variation of “{general activity} in {country},” is just not that, even if you do happen to find a travel destination that ends up being the one thing you talk about for the rest of your life.
At this point, I had a product with a bad experience and no identity. Thankfully, all was not lost! I had a pretty solid idea of what went wrong and how to improve the situation, but to explain it, we need to do a bit of background learning. On to the world of machine learning!
Recalling My Machine Learning Course
When you’re building a machine learning model, there are a few different metrics with which you can evaluate it, but two of them are particularly relevant to what you’re about to read: precision and recall.
To understand the difference, we need to use our imagination a little bit. Imagine that we’re building a classification model that determines if we want to display or hide a given tourist attraction to our users. A positive result means that an entry is a legitimate tourist attraction that someone could go to today and, therefore, should be displayed, while a negative result means, for whatever reason, that we should hide it.
Stretch your imagination a little bit further to imagine that we have a list of 100 tourist attractions, of which 40 are legitimate tourist attractions, like the Coliseum and not the Colossus. In a perfect world, our model would accurately classify these 40 as positive entries while the remaining 60 would be negative. It doesn’t take a lot of looking around to see that we don’t live in a perfect world, so imagine instead that our model labels only 30 of the 40 legitimate attractions as positive, while also incorrectly labeling 5 of the negative entries as positive.
A model’s recall rate is the percentage of positive results that were labeled as positive. In this case, we have 40 results that should be labeled as positive, but only 30 of them actually are, so the recall rate is 75%.
It’s really important to note that mislabeling negative results doesn’t affect the recall rate. In our example, this means that if, in addition to the 30 positive results it correctly identified, our model had incorrectly labeled all 60 negative results as positive, we would still have a recall rate of 75%.
In isolation, then, this rate doesn’t mean much. That’s why we also have the precision rate, which measures what percentage of the entries our model labeled as positive actually are positive. In our example, the model had a precision rate of roughly 86%, since of the 35 results it said were positive, 30 actually were. In this calculation, incorrectly labeled results do matter quite a bit.
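If it helps to see the arithmetic spelled out, here is a tiny sketch of those two calculations using the numbers from the example above (the variable names are mine and nothing here comes from the actual codebase):

```python
# Precision and recall, computed from the worked example above.
true_positives = 30   # legitimate attractions the model correctly labeled positive
false_negatives = 10  # legitimate attractions the model missed (40 real - 30 found)
false_positives = 5   # non-legitimate entries the model let through anyway

recall = true_positives / (true_positives + false_negatives)     # 30 / 40 = 0.75
precision = true_positives / (true_positives + false_positives)  # 30 / 35 ≈ 0.857

print(f"recall:    {recall:.0%}")     # 75%
print(f"precision: {precision:.1%}")  # 85.7%
```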
Precision or Recall?
This may leave you wondering the following things: is precision better than recall? And why would I bring this up when Nomadic Atlas doesn’t use machine learning?
To answer the first question: whether you want a higher recall rate or a higher precision rate depends on the kind of application that you’re building. Classification models output a percent chance that something is positive, which means that a developer needs to pick a tolerance above which everything is considered positive. The recall and precision rates are therefore a function both of the model’s ability to calculate a reasonable likelihood, which is controlled by things like input data and model selection, and of the tolerance that the development team uses.
A higher recall rate is better in situations where you want to catch everything and don’t mind noise. On the other hand, a really conservative model that used a tolerance of 100%, which would require complete certainty that a data point was positive, would have less noise but would miss positive results as well.
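To make that trade-off concrete, here is a small sketch that sweeps the tolerance across some made-up scores and labels; the numbers are invented purely to show the shape of the curve:

```python
# Sketch: how the choice of tolerance (threshold) trades precision against recall.
# The scores and labels below are made up purely for illustration.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]  # model confidence that each entry is real
labels = [1,    1,    1,    0,    1,    0,    0,    0]      # 1 = legitimate attraction, 0 = not

for tolerance in (0.25, 0.50, 0.75, 1.00):
    predicted = [score >= tolerance for score in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))        # true positives
    fp = sum(p and not l for p, l in zip(predicted, labels))    # false positives
    fn = sum((not p) and l for p, l in zip(predicted, labels))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # nothing predicted -> vacuously precise
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    print(f"tolerance {tolerance:.2f}: precision {precision:.0%}, recall {recall:.0%}")
```

Running it shows exactly what the paragraph above describes: a low tolerance catches every legitimate entry but drags noise along with it, while a tolerance of 100% lets nothing incorrect through and misses every positive result.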
But why mention this when I’m not talking about a project that used machine learning? In effect, when making trade-offs between higher recall and higher precision, you’re making a decision about how conservative you want to be. And here’s the secret: anyone designing a product can think in these terms, even without a machine learning model in the mix.
The Precise Problem of Nomadic Atlas
In its original form, the Nomadic Atlas database was high recall and low precision. I say that it was high recall because it contained a ton of legitimately interesting tourist attractions, but it was low precision because it also let non-existent, destroyed, or vague tourist attractions into the database.
The reason I got here was that Nomadic Atlas had no way to verify the output of the LLM; it simply let everything the LLM produced reach the users. LLMs, by nature, tend towards high recall and relatively low precision due to their propensity for hallucination and the generously broad latitude they take in interpreting instructions.
Turning Things Around
If you look at the website for Nomadic Atlas now, you’ll notice that while the precision is not exactly 100%, it is high enough to be relatively useful. I was able to increase the precision through months of elbow grease and another couple of runs of my data pipeline. Here’s how I did it.
It probably should have occurred to me earlier that if a tourist attraction can actually be visited, it will exist on a map. Once I finally used my brain to think of this truly novel idea, it naturally made a lot of sense to use mapping APIs to validate whether or not something existed. I ended up using an unholy combination of about three separate mapping APIs (to stay within free plans and rate limits), along with Wikipedia’s API, to validate the existence of different places.
In order to increase the precision of the data, I used a few different methods to verify the authenticity of each entry.
Firstly, I was able to filter out any entry that the Wikipedia API mapped to a page that didn’t contain coordinates. This meant that something like “hiking in Peru,” which mapped to the page for hiking, would be filtered out.
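A check like that boils down to a single call to the public MediaWiki API. Here is a simplified sketch of the idea; the helper name is mine, and it skips the batching and error handling that real use would need:

```python
import requests

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"

def has_coordinates(title: str) -> bool:
    """Return True if the Wikipedia page for `title` has coordinates attached.

    Pages for generic concepts ("Hiking") have no coordinates, while pages for
    real places ("Machu Picchu") usually do, so this doubles as a crude
    "can you physically go there?" filter.
    """
    params = {
        "action": "query",
        "prop": "coordinates",
        "titles": title,
        "redirects": 1,
        "format": "json",
    }
    response = requests.get(WIKIPEDIA_API, params=params, timeout=10)
    response.raise_for_status()
    pages = response.json()["query"]["pages"]
    # The API keys pages by page ID; missing pages simply have no "coordinates" field.
    return any("coordinates" in page for page in pages.values())

print(has_coordinates("Hiking"))        # expected: False, so "hiking in Peru" gets dropped
print(has_coordinates("Machu Picchu"))  # expected: True, so it stays
```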
For entries that couldn’t map to Wikipedia pages, I used my cocktail of different mapping APIs to validate the places. I imposed a strict requirement that these entries have an exact name match, to prevent something like “hiking in Peru” from mapping to Peru instead.
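As an illustration of that exact-match requirement, here is a sketch against OpenStreetMap’s free Nominatim geocoder. Treat it as one possible mapping API rather than a faithful reproduction of the actual cocktail I ended up with:

```python
import requests

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

def exact_name_match(candidate: str) -> bool:
    """Return True only if a geocoder result's own name matches the candidate exactly.

    Without the exact-match requirement, a query like "hiking in Peru" can happily
    geocode to Peru itself, which is exactly the kind of false positive we want
    to keep out of the database.
    """
    params = {"q": candidate, "format": "json", "limit": 5}
    headers = {"User-Agent": "nomadic-atlas-validation-sketch"}  # Nominatim's usage policy asks for a UA
    response = requests.get(NOMINATIM_URL, params=params, headers=headers, timeout=10)
    response.raise_for_status()
    for result in response.json():
        # display_name starts with the matched feature's own name.
        matched_name = result["display_name"].split(",")[0].strip()
        if matched_name.casefold() == candidate.casefold():
            return True
    return False

print(exact_name_match("hiking in Peru"))  # expected: False (no feature is literally named that)
print(exact_name_match("Huacachina"))      # expected: True (a real desert oasis in Peru)
```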
Wikipedia entries also have structured data associated with them via the Wikidata API, including the category of thing to which the page refers and whether or not the page in question refers to something that has been closed or destroyed. I allowed only selected categories and filtered out anything that had been destroyed or closed, like the Colossus of Rhodes.
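Here is a rough sketch of that Wikidata check. The page-to-item lookup (via `pageprops`) and the “dissolved, abolished or demolished date” property (P576) are real mechanisms; the one-item allow-list is just a placeholder, and a production version would need a much longer, verified list:

```python
import requests

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"
WIKIDATA_ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{}.json"

# Placeholder allow-list of Wikidata "instance of" (P31) categories.
# Q570116 should be Wikidata's "tourist attraction" item; verify before relying on it.
ALLOWED_CATEGORIES = {"Q570116"}

def wikidata_item_for(title: str) -> str | None:
    """Look up the Wikidata item ID (e.g. 'Q39671') behind a Wikipedia page title."""
    params = {"action": "query", "prop": "pageprops", "titles": title,
              "redirects": 1, "format": "json"}
    pages = requests.get(WIKIPEDIA_API, params=params, timeout=10).json()["query"]["pages"]
    for page in pages.values():
        return page.get("pageprops", {}).get("wikibase_item")
    return None

def is_acceptable(title: str) -> bool:
    """Keep an entry only if it has an allowed category and no P576 (destroyed/closed) date."""
    qid = wikidata_item_for(title)
    if qid is None:
        return False
    data = requests.get(WIKIDATA_ENTITY_URL.format(qid), timeout=10).json()
    claims = next(iter(data["entities"].values()))["claims"]  # handles redirected item IDs
    if "P576" in claims:  # dissolved, abolished or demolished: the Colossus of Rhodes problem
        return False
    categories = {
        claim["mainsnak"]["datavalue"]["value"]["id"]
        for claim in claims.get("P31", [])  # P31 = "instance of"
        if claim["mainsnak"].get("datavalue")
    }
    return bool(categories & ALLOWED_CATEGORIES)

print(is_acceptable("Colossus of Rhodes"))  # expected: False, one way or another
```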
Lastly, I ended up ignoring items that were too close to other entries, as well as items that weren’t located within 15 miles of a city center. This was actually how Stonehenge got filtered out: Stonehenge was too close to something else, and my system decided to play it safe by ignoring it!
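The distance checks behind that last filter are just great-circle math. Here is a minimal haversine sketch; the coordinates are approximate and the comparison only illustrates how a blunt proximity rule can end up swallowing a famous site:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Stonehenge sits only a couple of miles from the town of Amesbury, so a rule that
# drops anything "too close to something else" can quietly discard it.
print(haversine_miles(51.1789, -1.8262, 51.1719, -1.7797))  # roughly 2 miles
```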
Closing Thoughts (You Don’t Have to Go Home…)
When planning a trip to the United Kingdom, you would probably be surprised if the search engine that you were using didn’t mention Stonehenge. I’ll admit that its absence is a bit weird, but you would probably be more frustrated to have to filter through entries for destroyed buildings like the Crystal Palace or Euston Arch. Furthermore, I’d be willing to bet that you’d be pretty angry to be let down by an exciting tourist attraction that didn’t actually exist. It goes to show that when building a product, especially one that primarily functions as a search engine or directory, an incomplete but trustworthy product is better than one that is complete but dubious.
LLMs have unlocked a ton of excitement around the types of products that we’re able to build. So far, though, most of these product features tend to involve simply strapping a chatbot onto an existing product, giving it some cute name like Cleo, and calling it a day. Personally, I’ve never been a fan of this approach. I do think that if there is value in using LLMs as a core part of a product, it likely lies in using them in the background, as Nomadic Atlas does. But it has to be done well, with careful consideration of whether recall or precision matters more to the product. Poorly implemented AI products that have high recall when high precision is needed or expected make the news in bad ways. Remember when Google’s integrated AI told its users that it was okay to put glue on their pizza? Conversely, there have not, to my knowledge, been any stories about someone not receiving a Google AI overview for a straightforward question.
Above all else, this experience taught me that AI is a tool with upsides and downsides like any other tool. It’s not a panacea. The truth is that the introduction of ChatGPT convinced me that this project was now possible when it would have been extremely difficult, if not impossible, before. That is why I decided to work on it in the first place.
The reality, however, is that the product ended up being a (hopefully) clever combination of a few different APIs, including Wikipedia and a few different mapping APIs, that all existed well before the introduction of LLMs. I definitely should have started there, as opposed to starting with AI.
While this isn’t intended to be a ChatGPT hit piece, the end result of Nomadic Atlas really taught me to start with high-precision products instead of believing that AI can solve all my problems. I may use AI again in products, but only where its high-recall, low-precision tendencies are acceptable and there are ways to keep its output in bounds. For now, though, I’ll be starting most new projects without it.