Universal String Enumeration

Initializing live version
Download to Desktop

Requires a Wolfram Notebook System

Interact on desktop, mobile and cloud with the free Wolfram Player or other Wolfram Language products.

This Demonstration shows a universal enumeration of all strings, of any length and any number of symbols. The sample button shows a series of the strings, together with their indices and weights. The other buttons show details of the algorithm for the same sample, or allow selection of an index or string and then generate the corresponding string or index, respectively.

Contributed by: Kenneth E. Caviness (March 2011)
Open content licensed under CC BY-NC-SA


Snapshots


Details

It is not hard to create a list of strings of some particular length and number of characters used. A more complicated enumeration specifies not only the number of characters but allows all possible lengths [1]. Here a method is shown for listing all strings of all possible lengths and all possible characters (or symbols).

Snapshot 1: the "sample with details" button gives a series of indices with associated strings (beginning with the index selected by the slider), shown with details—the thumbnail shows the same view without the details

Snapshot 2: in "index -> string" mode, you select an index by using the slider, and the string generated by the function UnrankString is shown: index 43 gives the string "ABC"

Snapshot 3: in "string -> index" mode, you can select a letter of the alphabet and then a string from the dropdown list, and the RankString function returns the index of that string: here is the string "MATHEMATICA" and its index

The enumeration algorithm is based on integer compositions: there are ways of writing as the sum of positive integers (where order counts). Since the "weight" of a string is here simply the sum of the character weights (with A having weight 1, B weight 2, etc.), there are then strings of weight . The index is used both to select the weight of the string and to indicate which integer composition is intended. First we find , the largest power of 2 that is less than the index. This is the offset to be subtracted from the index to give the reduced index, which is then treated as a binary number whose bits signal the presence of a comma or a plus sign between the 1s in the "meaning"). Finally the string (which by construction will have weight ) is generated from the integer composition: 1 represents the first character, 2 the second, and so on. This method first produces all strings of weight 1, then all strings of weight 2, and so on. Short strings with characters near the beginning of the alphabet appear early in this enumeration.

Although the functions here are limited to the uppercase characters A–Z, assuming access to some universal character list (not necessarily finite, merely countable) all lists of all strings of all characters will appear in the string enumeration.

References:

[1] T. Rowland, "Enumerating Strings," The NKS Forum.

[2] "Composition (number theory)" on Wikipedia (last modified on 28 January 2010).



Feedback (field required)
Email (field required) Name
Occupation Organization
Note: Your message & contact information may be shared with the author of any specific Demonstration for which you give feedback.
Send