535.Encode and Decode TinyURL

Note: This is a companion problem to the System Design problem:Design TinyURL.

TinyURL is a URL shortening service where you enter a URL such ashttps://leetcode.com/problems/design-tinyurland it returns a short URL such ashttp://tinyurl.com/4e9iAk.

Design theencodeanddecodemethods for the TinyURL service. There is no restriction on how your encode/decode algorithm should work. You just need to ensure that a URL can be encoded to a tiny URL and the tiny URL can be decoded to the original URL.

Thoughts: 1. link to algorithm discussed in 534. Design TinyURL :

Hash Function: long_url -> md5/sha1: a. md5 convert a string into 128 binary bits, generally represented as 16 bytes hex:
http://site.douban.com/chuan-> c93a360dc7f3eb093ab6e304db516653; b. sha1 convert a string into 160 binary bits, generally represented as 20 bytes hex: http://site.douban.com/chuan-> dff85871a72c73c3eae09e39ffe97aea63047094

    1. These two algorithms could make sure hash values are randomly distributed, implementation-wise simple.

    2. Cons: the conflicts are inevitable. Any hash algorithm could have inevitable conflicts.

    3. Solutions: 1. use (long_url + timestamp) as the hash function key. 2. When conflicts, regenerates the hash value(it's different because timestamp changes).

      Overall, when urls are over 1 billion, there would be a lot of conflicts and the efficiency could be very low.

  1. Base62: Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
    1. This implementation simply adopts the base-62 (26*2 letters + 10 digits) mapping to the URL string. Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
    2. Each short_url represent a decimal digit. It could be the auto_increment_id in SQL database.
// You can type code here and execute it.
#include<iostream>
#include<algorithm>
#include<string>
using namespace std;

string idToShortURL(long int test){
    char map[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    string shorturl;
    while(test){
        shorturl.push_back(map[test%62]);
        test = test/62;
    }
    reverse(shorturl.begin(), shorturl.end());
    return shorturl;
}

long int shortURLtoID (string shortURL){
    long int id = 0;
    for (int i = 0; i < shortURL.length(); i++){
        if ('a' <= shortURL[i] && shortURL[i]<='z')
            id = id * 62 + shortURL[i] - 'a';
        if ('A' <= shortURL[i] && shortURL[i] <= 'Z')
            id = id * 62 + shortURL[i] - 'A' + 26;
        if('0' <= shortURL[i] && shortURL[i]<='9')
            id = id * 62 + shortURL[i] - '0' + 52;
    }
    return id;
  }

// function 
int main(int argc, char** argv) {
    unordered_map<int, string> record;

    for (int i = 1; i< argc; i++){
         cout<<"original URL: "<<(argv[i])<<endl;
        record[i] = (argv[i]);
    }
    for (int i = 1; i< argc; i++){

        cout<<"test URL: "<<record[i]<<endl;
        string shortURL = idToShortURL(i);
        cout<<"generated URL: "<<shortURL<<endl;
        cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;
    }

    // int test = 66666;
    // string shortURL = idToShortURL(test);
    // cout<<"generated URL: "<<shortURL<<endl;
    // cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;
}
  1. Base 62 with Random Generator: by StefanPochmann's post

    The following solution doesn't have these problems. It produces short URLs likehttp://tinyurl.com/KtLa2U, using a random code of six digits or letters. If a long URL is already known, the existing short URL is used and no new entry is generated.

class Codec:
    alphabet = string.ascii_letters + '0123456789'
    def __init__ (self):
        self.long2short = {}
        self.short2long = {}

    def encode(self, longUrl):
        """Encodes a URL to a shortened URL.

        :type longUrl: str
        :rtype: str
        """
        while longUrl not in self.long2short:
            # randromly generate a code 
            shortUrl = ''.join(random.choice(Codec.alphabet) for _ in range(6))
            # check the existence
            if shortUrl not in self.short2long:
                self.long2short[longUrl] = shortUrl
                self.short2long[shortUrl] = longUrl
        return 'http://tinyurl.com/' + shortUrl

    def decode(self, shortUrl):
        """Decodes a shortened URL to its original URL.

        :type shortUrl: str
        :rtype: str
        """
        return self.short2long[shortUrl[-6:]]

# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(url))

It's possible that a randomly generated code has already been generated before. In that case, another random code is generated instead. Repeat until we have a code that's not already in use. How long can this take? Well, even if we get up to using half of the code space, which is a whopping 626/2 = 28,400,117,792 entries, then each code has a 50% chance of not having appeared yet. So the expected/average number of attempts is 2, and for example only one in a billion URLs takes more than 30 attempts. And if we ever get to an even larger number of entries and this does become a problem, then we can just use length 7. We'd need to anyway, as we'd be running out of available codes.

results matching ""

    No results matching ""