Wayback Machine has archived websites for over 20 years

Have you ever wished you could hop in an Internet time machine to see how the web or specific websites have developed over the years?

Kelly Stec, an MSU doctoral student studying education policy, researches how political think tanks’ stances on legislation have changed over time.

Stec, who is also a legislative assistant at the Michigan State House of Representatives, knew that in order to study this evolution, she needed to dig around the internet to find out how lawmakers felt about bills when they were initially introduced. There was a problem, though: What if lawmakers deleted posts after changing their minds?

Lucky for Stec, her advisor recommended she check out a website called the Wayback Machine to help with her research.

The Wayback Machine, a service of the Internet Archive, a nonprofit founded in 1996, works to preserve most of what’s publicly available online, said Mark Graham, director of the Wayback Machine.

It does this by taking “snapshots” of millions of pages every week and conducting “crawls” of these pages as well, he said.

This means someone using the Wayback Machine could look at the evolution of a webpage over time, including the evolution of advertisements on a given page.
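For readers curious how such a lookup works, archived snapshots are addressed by a timestamp followed by the original URL. The following is a rough sketch in Python, assuming the web.archive.org link pattern visible in the site's own snapshot addresses; the function name and the example site are illustrative only and do not come from the Wayback Machine itself.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"

def snapshot_url(page_url: str, when: datetime) -> str:
    """Build a link to an archived copy of page_url near the given moment.

    Assumption: the archive serves the capture closest to the requested
    timestamp, so the timestamp need not match a real capture exactly.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14 digits: YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"

# One link per chosen year gives a simple before-and-after comparison.
yearly_links = [
    snapshot_url("example.com", datetime(year, 1, 1))
    for year in (2005, 2010, 2015)
]
```

Opening the resulting links side by side is one way to compare how a page's layout, content or advertisements changed between those dates.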

Whether it’s a platform like YouTube or a government website, the Wayback Machine can be used to monitor changes in the look of a website as well as the content. Stec uses the Wayback Machine consistently to monitor changes in opinion regarding policy in Michigan.

Stec said, “A couple of the conservative think tanks have kind of changed their tune about [the bills] over the years, so we’re looking back to see what they were saying when these cases were first breaking.”

Since the Wayback Machine catalogs what’s publicly available online, it can catalog some social media profiles as well.

Someone whose social media profile is set to “private” mode wouldn’t find their photos, posts, etc. on the Wayback Machine though, said Ken Birman, a computer science professor at Cornell.

Stec said she hadn’t used Wayback Machine for any purposes outside of research for her project, and said she would be surprised to find one of her social media profiles on the Wayback Machine.

“I wouldn’t have thought about it,” Stec said.

Stec looked herself up on the Wayback Machine and was pleased that she didn’t find her social profiles. She did find that her sister’s Twitter account had a few snapshots taken, but when she clicked on them it was just code.

USING THE WAYBACK MACHINE

People can use the Wayback Machine to take snapshots of a website and monitor it over time, whether that’s someone’s Twitter account or any other site they’d like to archive.

Graham said people can create an account to save URLs of their choice.

He said it’s important to note that saving a page is different from starting a crawl.

Dozens of people every second are submitting URLs to Save Page Now, the Wayback Machine’s tool for archiving a single page, Graham said.

Saving a page allows people to take a snapshot of a given page on the day the URL is saved. Graham said starting a crawl allows people to crawl through a page and its links, depending on the settings of the crawl, like a spider crawling around its web.

Graham said he’s currently working on crawls of 250,000,000+ URLs, and the Wayback Machine as a whole “archives more than a billion URLs a week.”

Some crawls, in theory, could last forever.

“You can go as deep as you can … They could go on forever,” Graham said. “Some crawls run every hour, every day, week, or every few months.”

This is because each crawl follows rules established in its settings, such as which URL to save, how many links to follow and whether to archive the pages those links lead to.

A crawl can consist of one URL and all of its elements, which archives everything on the page, including its links, without following them, Graham said.

Crawls can also follow the links to other pages, archive the elements of those pages and continue the process for as long as the person running the crawl chooses.
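The difference between archiving a single page and following its links can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the miniature "web" below is made up, the function and setting names are hypothetical, and the Wayback Machine's actual crawler is not described in this article.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["story-1", "story-2"],
    "story-1": ["news"],
    "story-2": [],
}

def crawl(start: str, max_depth: int, links=TOY_WEB) -> list[str]:
    """Return pages in the order this toy crawl would archive them.

    max_depth=0 archives only the starting page without following its
    links; each additional level follows links one hop further.
    """
    archived = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        archived.append(page)
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:  # skip pages already queued, avoiding loops
                seen.add(target)
                queue.append((target, depth + 1))
    return archived

print(crawl("home", 0))  # ['home']
print(crawl("home", 1))  # ['home', 'about', 'news']
print(crawl("home", 2))  # ['home', 'about', 'news', 'story-1', 'story-2']
```

Without a depth limit or a record of pages already visited, a crawl like this could keep following links indefinitely, which is why settings of this kind matter.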

Stec said the Wayback Machine has been an extremely helpful tool throughout her research process.

“I think it’s useful while it’s in the right hands,” Stec said. “As a researcher, it’s sort of invaluable. I like knowing what policy think tanks have done and said and how they’ve progressed. Both for people who agree with me and people who don’t.”

She said more information is available now than ever before, thanks to resources like the Wayback Machine.

“I think it’s helpful to be able to go back and see how people’s thoughts have changed and developed over time is fascinating, if questionably ethical,” she said.

Regarding whether or not it’s ethical to use the Wayback Machine to see public-facing social media profiles, Stec said she thinks if it’s used to help companies like Facebook place ads better, it could yield better experiences for users seeing those ads.

“I think psychographic marketing is brilliant,” Stec said. “And I think I’m certainly going to get ads that are targeted to things that I actually care about than if they were just saying, ‘24-year-old white female.’”

Psychographic segmentation categorizes customers by their personality, interests and other factors, according to the website of Directive Group, a marketing company based in Tampa, Florida.

Stec had mixed feelings when it came to social media profiles popping up on the Wayback Machine, though.

“With my Twitter, I certainly don’t mind because it’s public. I left everything public because I want that snapshot, I don’t care if other people see that snapshot,” Stec said. “Facebook on the other hand, I’ve been so diligent about my privacy settings since I first had it, because I mean I was really coming of age when social media was.”

She said she’s glad her privacy settings have held up well enough to keep her profile from turning up on the Wayback Machine.