535.Encode and Decode TinyURL

Note: This is a companion problem to the System Design problem:Design TinyURL.

TinyURL is a URL shortening service where you enter a URL such ashttps://leetcode.com/problems/design-tinyurland it returns a short URL such ashttp://tinyurl.com/4e9iAk.

Design theencodeanddecodemethods for the TinyURL service. There is no restriction on how your encode/decode algorithm should work. You just need to ensure that a URL can be encoded to a tiny URL and the tiny URL can be decoded to the original URL.

Thoughts: 1. link to algorithm discussed in 534. Design TinyURL :

Hash Function: long_url -> md5/sha1: a. md5 convert a string into 128 binary bits, generally represented as 16 bytes hex:
http://site.douban.com/chuan-> c93a360dc7f3eb093ab6e304db516653; b. sha1 convert a string into 160 binary bits, generally represented as 20 bytes hex: http://site.douban.com/chuan-> dff85871a72c73c3eae09e39ffe97aea63047094

1. These two algorithms could make sure hash values are randomly distributed, implementation-wise simple.
2. Cons: the conflicts are inevitable. Any hash algorithm could have inevitable conflicts.
3. Solutions: 1. use (long_url + timestamp) as the hash function key. 2. When conflicts, regenerates the hash value(it's different because timestamp changes).
  
  Overall, when urls are over 1 billion, there would be a lot of conflicts and the efficiency could be very low.
Base62: Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
1. This implementation simply adopts the base-62 (26*2 letters + 10 digits) mapping to the URL string. Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
2. Each short_url represent a decimal digit. It could be the auto_increment_id in SQL database.

// You can type code here and execute it.
#include<iostream>
#include<algorithm>
#include<string>
using namespace std;

string idToShortURL(long int test){
    char map[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    string shorturl;
    while(test){
        shorturl.push_back(map[test%62]);
        test = test/62;
    }
    reverse(shorturl.begin(), shorturl.end());
    return shorturl;
}

long int shortURLtoID (string shortURL){
    long int id = 0;
    for (int i = 0; i < shortURL.length(); i++){
        if ('a' <= shortURL[i] && shortURL[i]<='z')
            id = id * 62 + shortURL[i] - 'a';
        if ('A' <= shortURL[i] && shortURL[i] <= 'Z')
            id = id * 62 + shortURL[i] - 'A' + 26;
        if('0' <= shortURL[i] && shortURL[i]<='9')
            id = id * 62 + shortURL[i] - '0' + 52;
    }
    return id;
  }

// function 
int main(int argc, char** argv) {
    unordered_map<int, string> record;

    for (int i = 1; i< argc; i++){
         cout<<"original URL: "<<(argv[i])<<endl;
        record[i] = (argv[i]);
    }
    for (int i = 1; i< argc; i++){

        cout<<"test URL: "<<record[i]<<endl;
        string shortURL = idToShortURL(i);
        cout<<"generated URL: "<<shortURL<<endl;
        cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;
    }

    // int test = 66666;
    // string shortURL = idToShortURL(test);
    // cout<<"generated URL: "<<shortURL<<endl;
    // cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;
}

Base 62 with Random Generator: by StefanPochmann's post

The following solution doesn't have these problems. It produces short URLs likehttp://tinyurl.com/KtLa2U, using a random code of six digits or letters. If a long URL is already known, the existing short URL is used and no new entry is generated.

class Codec:
    alphabet = string.ascii_letters + '0123456789'
    def __init__ (self):
        self.long2short = {}
        self.short2long = {}

    def encode(self, longUrl):
        """Encodes a URL to a shortened URL.

        :type longUrl: str
        :rtype: str
        """
        while longUrl not in self.long2short:
            # randromly generate a code 
            shortUrl = ''.join(random.choice(Codec.alphabet) for _ in range(6))
            # check the existence
            if shortUrl not in self.short2long:
                self.long2short[longUrl] = shortUrl
                self.short2long[shortUrl] = longUrl
        return 'http://tinyurl.com/' + shortUrl

    def decode(self, shortUrl):
        """Decodes a shortened URL to its original URL.

        :type shortUrl: str
        :rtype: str
        """
        return self.short2long[shortUrl[-6:]]

# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(url))

It's possible that a randomly generated code has already been generated before. In that case, another random code is generated instead. Repeat until we have a code that's not already in use. How long can this take? Well, even if we get up to using half of the code space, which is a whopping 626/2 = 28,400,117,792 entries, then each code has a 50% chance of not having appeared yet. So the expected/average number of attempts is 2, and for example only one in a billion URLs takes more than 30 attempts. And if we ever get to an even larger number of entries and this does become a problem, then we can just use length 7. We'd need to anyway, as we'd be running out of available codes.

535. Encode and Decode TinyURL

535.Encode and Decode TinyURL

results matching ""

No results matching ""