Title: Information per Area of Numerical Forms
Subtitle: Quantifying the Compactness of Symbolisation for Bases
Recently (①⑥th December ①②⓪⑦) on the DozensOnline website forum, a link was posted to a video arguing that binary is a better base than base six or decimal. In chapter zero of that video, the following claim was narrated:
kepe wrote:"Maybe the problem isn't with binary itself but with a specific notation of binary. [...] The most common choice are [sic] the Hindu-Arabic numerals. [...] But for binary it makes absolutely no sense to do this. The two binary digits only need to be distinct from each other, not from eight other unused symbols. This is the equivalent of using Chinese characters to write English, but by just picking twenty-six characters to substitute for Latin letters. [...] Binary digits are only worth one bit of information. Their shapes can be designed much simpler and much thinner, say two vertical bars: low for zero, high for one. Now the comparisons between big numbers in decimal and in binary seem a whole lot more reasonable."
At the time 9:52 in the video there is a picture of the numerals with this claim:
kepe wrote:"it's these digits that we can compare to measure number length."
The binary digits have been grouped there to form octal digits, so that the numbers do not appear to contain more octal digits than decimal digits. This works because the ratio \( \frac{\ln{10}}{\ln{8}} \approx 1.11 \) of the logarithms of ten and eight is close enough to one that the extra octal digits are not noticeable for numbers as small as those used in the demonstration. However, the octal digits formed as linked binary digits only appear to be about the same width as the decimal digits in the image because the decimal digits were drawn so large. If the decimal digits had been shown at a size with similar thickness of their lines, internal spaces, and spaces between digits, then the octal digits would have been about one and a half times wider. So it was not a fair comparison.
Quantification of Information per Area
Pixel of Dot Matrix Grids
The smallest area containing information is a single square pixel. A pixel can be filled in one of two ways, so the number of available characters per square unit of area is two. To convert this number two to a score of one, its logarithm to the base two is taken. This is the kind of representation of information used in QR codes. However, while this method of representing information is dense, it is not very readable by a human, because a human relies more on relative than on absolute features for distinguishing and identifying shapes. A run of pixels of the same condition, state, or colour without spaces between them appears only as an absolute length, which to a human can appear distorted by the distance or inclination of the display.
Plotted Graphs
The proposed numerals for base two in the video used the relative lengths of lines to distinguish zeros from ones, with a shorter vertical line for zero and a vertical line twice as long for one. A minimum of two units of height is required to distinguish these vertical lines from each other, while one unit of width is required as empty space to separate the lines of adjacent digits from each other. The minimum number of unit squares needed to contain one such binary digit is therefore two times one, which equals two. The number of possible different characters or digits to select from per two units of area is two. I could call this form for numerals the simple linear binary form.
To convert this into a score comparable to that for pixel grids, it must be taken into account that a plotted form in a single unit square, with just the four corners as vertices between plotted lines, can hold four bits of information. Of the \(^{4}C_{2} = 6\) possible lines between its corners, two edges are excluded because they belong to the next square to one side and to the square on the next line below, leaving \(^{4}C_{2} - 2 = 4\). To get the amount of information per square unit of area, the number of possible characters is not divided by the number of units of area. Instead, it is raised to the power of the reciprocal of the number of square units of area, because the information in adjacent squares combines multiplicatively rather than additively. An additional fourth root normalises by the four bits of the primitive square. So, for a single square with four possible bits, the fourth root of the fourth power of two is taken before the logarithm to the base two is computed. This produces a score of one for this square, which has no empty spaces between it and other squares. It is the maximum amount of information that can be plotted per area in an isolated primitive square. However, a human does not readily read characters that are not separated by empty space.
The formula for working out the score of information per unit of area is
\[ \log_{2}{B^{1/(4m)}} \]
where \(B\) is the number of possible characters or digits, and \(m\) is the number of units of area per character or digit, including the area of empty space necessary to separate characters.
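The formula can be written as a short function. This is only a sketch of the scoring rule as stated above, not a definitive implementation; the name `area_score` and its parameter names are my own labels for illustration, and it relies on the identity \(\log_{2}{B^{1/(4m)}} = \frac{\log_{2}{B}}{4m}\).

```python
import math


def area_score(num_characters: int, area_units: float) -> float:
    """Score of information per unit of area: log2(B ** (1 / (4 * m))).

    num_characters is B, the number of possible characters or digits.
    area_units is m, the area per character including separating space.
    """
    if num_characters < 1 or area_units <= 0:
        raise ValueError("need B >= 1 and m > 0")
    # log2(B ** (1 / (4 * m))) equals log2(B) / (4 * m).
    return math.log2(num_characters) / (4 * area_units)


# Primitive square: four bits (sixteen characters) in one unit of area.
print(area_score(2 ** 4, 1))  # 1.0
```

The primitive square with sixteen possible states in one unit of area gives the reference score of one, as described above.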
Applying this method of calculation to the linear designs for binary digits of the video, two possible digits (one bit) per two squares of area lead to a score of \(\log_{2}{2^{1/(4*2)}} = 1/8\) = ⓪⁏①⑥. This score is relatively low, which means that the method of representing binary power digits in the video uses much more space than necessary to represent information. This is because the proposal in the video did not use the diagonal lines of which plotted graphs are capable.
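The scores in this post are also given as dozenal fractions. As a check on those conversions, here is a small sketch. It assumes that the circled numerals are dozenal digits and that ⁏ is the dozenal point; the helper name `to_dozenal_fraction` is hypothetical, and the glyph ⑪ for the digit eleven is my assumption, since that digit does not occur in the values quoted here.

```python
from fractions import Fraction

# Circled numerals used as dozenal digits; the glyph for eleven is an assumption.
DIGITS = "⓪①②③④⑤⑥⑦⑧⑨⑩⑪"


def to_dozenal_fraction(value: Fraction, places: int = 2) -> str:
    """Render 0 <= value < 1 with `places` dozenal digits (truncated)."""
    if not 0 <= value < 1:
        raise ValueError("value must satisfy 0 <= value < 1")
    digits = []
    for _ in range(places):
        value *= 12
        digit = int(value)  # the integer part is the next dozenal digit
        digits.append(DIGITS[digit])
        value -= digit
    return "⓪⁏" + "".join(digits)


print(to_dozenal_fraction(Fraction(1, 8)))  # ⓪⁏①⑥
```

The same helper reproduces the other dozenal values quoted in this post: 3/8 gives ⓪⁏④⑥, 13/24 gives ⓪⁏⑥⑥, and 7/9 gives ⓪⁏⑨④.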
A form with a higher score of information per square unit of area would use diagonal edges as well as vertical and horizontal edges for bits, while still including empty space between it and other graphs of the same form. Such a form is a square with up to \(^{4}C_{2}=6\) bits and up to \(2^{6}\) possible characters or numerals, in an area of four unit squares comprising the graph itself and the three empty squares beside and under it. This form has a score of \( \log_{2}{2^{6/(4*4)}} = 3/8\) = ⓪⁏④⑥. This is a better score than for the proposal in the video, but the maximum score for this form implies a base of eight squared. The score would be lower if not all of its possible glyphs were used, that is, if the base were less than eight squared. For the score of this form to be not less than that of the simple linear binary form, the number of glyphs used, and hence the base, must be at least four. This square form appears to be the best to use if the base is between four and the square of eight. I may call this the simple square form for plotting.
If the number of possible characters or the numerical base desired is larger than the sixth power of two, then a larger form than the simple square is required. A rectangle of two squares, one above the other, with empty squares beside and below it before the next character or line, occupies a total of six square units of area per numeral. It has a maximum score of \( \log_{2}{2^{(^{6}C_{2} - 2)/(4*6)}} \) = ①①/②⓪ = ⓪⁏⑥⑥ if all of its possible glyphs are used. I can call this the rectangular form. It needs to use at least nine of its bits, giving the ninth power of two glyphs, in order to have a score at least as high as the maximum score of the simple square form; ten or more bits are needed to exceed it. Nine bits can be obtained by using only the horizontal and vertical bits and the two long diagonal bits of this rectangular form. This is equivalent to the familiar modular display form of seven segments, supplemented with two long diagonal segments passing through the centre of the glyph form between the corners of the rectangle. The maximum number of characters from the full rectangular form is ④⑧⑩⑧⁏. The score of the rectangular form will be as good as or better than that of the simple binary linear form if the base is at least as large as octal.
If more characters are required, the larger glyph form of a square of nine vertices can be used. This glyph form is the size of four simple square glyph forms, and including the empty squares that separate it from adjacent glyphs, it occupies an area of nine square units. Its maximum score is \( \log_{2}{2^{(^{9}C_{2} - 8 ) /(4*9)} } \) = 7/9 = ⓪⁏⑨④. This compound square form needs to use at least twenty of its bits, producing more than a million possible glyphs, in order to achieve a score better than that of the rectangular form. From this I conclude that the larger compound square form is not effective for compact representation of information unless a very large number of characters is required. Nevertheless, its score will be better than that of the simple binary linear form if the base is at least a dozen plus eleven, since the break-even point is the square root of two to the ninth power, about 22.6. The first practical base of at least that size is the double dozen.
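The break-even points between forms follow from solving bits / (4m) = target score for the number of bits. The sketch below does that arithmetic under the scoring formula of this post; the function name `break_even_bits` and the constant names are labels I have made up for illustration.

```python
def break_even_bits(area_units: float, target_score: float) -> float:
    """Bits per glyph at which a form of the given area reaches target_score.

    Solves bits / (4 * area_units) = target_score for bits.
    """
    return 4 * area_units * target_score


LINEAR_BINARY_SCORE = 1 / 8   # simple linear binary form
SIMPLE_SQUARE_MAX = 6 / 16    # simple square form with all six bits used
RECTANGULAR_MAX = 13 / 24     # rectangular form with all thirteen bits used

# Areas per glyph, including separating space, as given in this post.
FORM_AREAS = {"simple square": 4, "rectangular": 6, "compound square": 9}

for name, area in FORM_AREAS.items():
    bits = break_even_bits(area, LINEAR_BINARY_SCORE)
    print(f"{name}: {bits} bits, base about {2 ** bits:.2f}")

print(break_even_bits(6, SIMPLE_SQUARE_MAX))  # rectangular vs simple square
print(break_even_bits(9, RECTANGULAR_MAX))    # compound square vs rectangular
```

Under these assumptions the break-even bases against the simple linear binary form come out at 4 for the simple square, 8 for the rectangular form, and about 22.63 for the compound square. The rectangular form matches the simple square maximum at 9 bits, and the compound square passes the rectangular maximum only above 19.5 bits.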
Forms other than the rectilinear ones may also be considered. A hexagonal glyph form has six perimeter vertices and one central vertex. In a unit cell with a ratio of three to two for external space to internal space, it may be analysed approximately as having the internal area of two ligatured simple glyphs, because of the number of its segments, which gives five units of area in total. Its maximum score would be \( \log_{2}{2^{(^{7}C_{2} - 3)/(4*5)} } = 0.9\).
In these calculations of the scores for glyph forms and bases, the base \(B\) is two raised to the power of the number of bits per character. The number of available bits per character from a glyph form is the number of ways of choosing two vertices of the graph as the ends of an edge, minus the number of pairs of vertices whose connecting edge is collinear with shorter edges through an intermediate vertex.
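The counting rule can be checked by enumerating vertex pairs. The sketch below is one way to do it, assuming each glyph form is given as a list of integer vertex coordinates; the function names are hypothetical. The hexagon is entered in skewed (axial) coordinates, which is safe because collinearity does not depend on the skew. The count of four for the primitive square is not covered here, since it comes from excluding shared edges rather than collinear pairs.

```python
from itertools import combinations


def strictly_between(p, a, b):
    """True if point p lies on the segment a-b, strictly between its ends."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross != 0:
        return False  # p is not on the line through a and b
    dot = (p[0] - a[0]) * (b[0] - a[0]) + (p[1] - a[1]) * (b[1] - a[1])
    length_sq = (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2
    return 0 < dot < length_sq


def available_bits(vertices):
    """Count vertex pairs whose connecting edge passes through no other vertex."""
    return sum(
        1
        for a, b in combinations(vertices, 2)
        if not any(strictly_between(p, a, b) for p in vertices)
    )


rectangular = [(x, y) for x in range(2) for y in range(3)]
compound_square = [(x, y) for x in range(3) for y in range(3)]
# Centre plus six perimeter vertices in axial (skewed) hexagon coordinates.
hexagonal = [(0, 0), (1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

print(available_bits(rectangular))      # 13
print(available_bits(compound_square))  # 28
print(available_bits(hexagonal))        # 18
```

These counts agree with the exponents \(^{6}C_{2} - 2\), \(^{9}C_{2} - 8\), and \(^{7}C_{2} - 3\) used in the scores above.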
Discussion
The larger compound square form is comparable to the Chinese characters in complexity. While it is true that numerals of this complexity should not be used for binary digits, the claim that using Indo-Arabic digits for binary is comparable to using Chinese characters for an alphabet is an exaggeration. The simple binary linear form may be better than other forms for true binary, but by my scoring method this does not make it anywhere near as compact as the Indo-Arabic digits used with a sufficiently larger base. Thus, the linear binary representation is not as informationally compact as human-scale bases using alphanumeric numerals.
By the scoring method deployed here, suitable glyph forms for the numerals of bases include the rectangular form, of which the seven-segment modular display form for the Indo-Arabic digits is a subset. The hexagonal form would also be a compact template for designing characters. Features of subsets of both the rectangular and hexagonal forms have appeared before in proposals for numerals, where the angles between segments at the centre of the numerals were sixths of a turn, and each numeral could be decomposed or analysed into two graphs placed and conjoined one above the other as a double-storey figure. The hexagrams for a binary power base have similarities to the simple binary linear form. It appears that the rectangular forms on which the Western alphanumeric characters are based are about ideal for compactly representing information.
References:
- https://www.tapatalk.com/groups/dozensonline/new-video-advocating-binary-t2424.html?sid=581ddfdcfbd8f16dba6d3280bba79e6d
- https://www.youtube.com/watch?v=rDDaEVcwIJM
Video title: "the best way to count"
Publication date: "Premiered Dec 16, 2023"
- https://www.tapatalk.com/groups/dozensonline/senary-numerals-t1644.html
①③th February, ①②⓪①⁏.