## 535.Encode and Decode TinyURL

Note: This is a companion problem to the System Design problem:Design TinyURL.

TinyURL is a URL shortening service where you enter a URL such as`https://leetcode.com/problems/design-tinyurl`and it returns a short URL such as`http://tinyurl.com/4e9iAk`.

Design the`encode`and`decode`methods for the TinyURL service. There is no restriction on how your encode/decode algorithm should work. You just need to ensure that a URL can be encoded to a tiny URL and the tiny URL can be decoded to the original URL.

Thoughts: 1. link to algorithm discussed in 534. Design TinyURL :

Hash Function: long_url -> md5/sha1: a. md5 convert a string into 128 binary bits, generally represented as 16 bytes hex:
http://site.douban.com/chuan-> c93a360dc7f3eb093ab6e304db516653; b. sha1 convert a string into 160 binary bits, generally represented as 20 bytes hex: http://site.douban.com/chuan-> dff85871a72c73c3eae09e39ffe97aea63047094

1. These two algorithms could make sure hash values are randomly distributed, implementation-wise simple.

2. Cons: the conflicts are inevitable. Any hash algorithm could have inevitable conflicts.

3. Solutions: 1. use (long_url + timestamp) as the hash function key. 2. When conflicts, regenerates the hash value(it's different because timestamp changes).

Overall, when urls are over 1 billion, there would be a lot of conflicts and the efficiency could be very low.

1. Base62: Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
1. This implementation simply adopts the base-62 (26*2 letters + 10 digits) mapping to the URL string. Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
2. Each short_url represent a decimal digit. It could be the auto_increment_id in SQL database.
``````// You can type code here and execute it.
#include<iostream>
#include<algorithm>
#include<string>
using namespace std;

string idToShortURL(long int test){
char map[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
string shorturl;
while(test){
shorturl.push_back(map[test%62]);
test = test/62;
}
reverse(shorturl.begin(), shorturl.end());
return shorturl;
}

long int shortURLtoID (string shortURL){
long int id = 0;
for (int i = 0; i < shortURL.length(); i++){
if ('a' <= shortURL[i] && shortURL[i]<='z')
id = id * 62 + shortURL[i] - 'a';
if ('A' <= shortURL[i] && shortURL[i] <= 'Z')
id = id * 62 + shortURL[i] - 'A' + 26;
if('0' <= shortURL[i] && shortURL[i]<='9')
id = id * 62 + shortURL[i] - '0' + 52;
}
return id;
}

// function
int main(int argc, char** argv) {
unordered_map<int, string> record;

for (int i = 1; i< argc; i++){
cout<<"original URL: "<<(argv[i])<<endl;
record[i] = (argv[i]);
}
for (int i = 1; i< argc; i++){

cout<<"test URL: "<<record[i]<<endl;
string shortURL = idToShortURL(i);
cout<<"generated URL: "<<shortURL<<endl;
cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;
}

// int test = 66666;
// string shortURL = idToShortURL(test);
// cout<<"generated URL: "<<shortURL<<endl;
// cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;
}
``````
1. Base 62 with Random Generator: by StefanPochmann's post

The following solution doesn't have these problems. It produces short URLs like`http://tinyurl.com/KtLa2U`, using a random code of six digits or letters. If a long URL is already known, the existing short URL is used and no new entry is generated.

``````class Codec:
alphabet = string.ascii_letters + '0123456789'
def __init__ (self):
self.long2short = {}
self.short2long = {}

def encode(self, longUrl):
"""Encodes a URL to a shortened URL.

:type longUrl: str
:rtype: str
"""
while longUrl not in self.long2short:
# randromly generate a code
shortUrl = ''.join(random.choice(Codec.alphabet) for _ in range(6))
# check the existence
if shortUrl not in self.short2long:
self.long2short[longUrl] = shortUrl
self.short2long[shortUrl] = longUrl
return 'http://tinyurl.com/' + shortUrl

def decode(self, shortUrl):
"""Decodes a shortened URL to its original URL.

:type shortUrl: str
:rtype: str
"""
return self.short2long[shortUrl[-6:]]

# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(url))
``````

It's possible that a randomly generated code has already been generated before. In that case, another random code is generated instead. Repeat until we have a code that's not already in use. How long can this take? Well, even if we get up to using half of the code space, which is a whopping 626/2 = 28,400,117,792 entries, then each code has a 50% chance of not having appeared yet. So the expected/average number of attempts is 2, and for example only one in a billion URLs takes more than 30 attempts. And if we ever get to an even larger number of entries and this does become a problem, then we can just use length 7. We'd need to anyway, as we'd be running out of available codes.