Java Study Notes (1): Converting an int to a String in Any Radix

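I've started studying Java. In the source I came across the code that finds the position of the highest set bit of an int (by counting its leading zeros). It struck me as remarkably clever, so I'm recording it here to savor again later.
The premise is that an int is always 32 bits. The code has the flavor of a binary search, halving the range under consideration at each step. The hard-coded constants feel like something that could be removed. Mostly I instinctively want to avoid hard-coding whenever I see it, though I'm not sure whether that is a good habit...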

    public static int numberOfLeadingZeros(int i) {
        // HD, Figure 5-6
        if (i == 0)
            return 32;
        int n = 1;
        if (i >>> 16 == 0) { n += 16; i <<= 16; }
        if (i >>> 24 == 0) { n +=  8; i <<=  8; }
        if (i >>> 28 == 0) { n +=  4; i <<=  4; }
        if (i >>> 30 == 0) { n +=  2; i <<=  2; }
        n -= i >>> 31;
        return n;
    }
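On the hard-coding question, one possible way to avoid the literal 16/8/4/2 steps is to derive them in a loop from `Integer.SIZE`. This is only a sketch under that assumption (the class name `LeadingZerosLoopSketch` is made up for illustration), not how the JDK does it; the unrolled constants in the original presumably exist to avoid loop overhead.

```java
final class LeadingZerosLoopSketch {
    // Same halving idea as the JDK code, but the step sizes 16, 8, 4, 2, 1
    // are derived from Integer.SIZE instead of being written out.
    static int numberOfLeadingZeros(int i) {
        if (i == 0) {
            return Integer.SIZE;
        }
        int n = 0;
        for (int step = Integer.SIZE / 2; step > 0; step >>= 1) {
            // If the top `step` bits are all zero, count them and shift them out.
            if (i >>> (Integer.SIZE - step) == 0) {
                n += step;
                i <<= step;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(numberOfLeadingZeros(1));   // 31
        System.out.println(numberOfLeadingZeros(-1));  // 0
    }
}
```

The loop runs five times for a 32-bit int, and the same shape would work for a long by swapping in `Long.SIZE`.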

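Next I looked at the toUnsignedString function, which converts an integer (a long in the code shown here, which comes from java.lang.Long) into its string representation in a given radix. Code first: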

    public static String toUnsignedString(long i, int radix) {
        if (i >= 0)
            return toString(i, radix);
        else {
            switch (radix) {
            case 2:
                return toBinaryString(i);

            case 4:
                return toUnsignedString0(i, 2);

            case 8:
                return toOctalString(i);

            case 10:
                /*
                 * We can get the effect of an unsigned division by 10
                 * on a long value by first shifting right, yielding a
                 * positive value, and then dividing by 5.  This
                 * allows the last digit and preceding digits to be
                 * isolated more quickly than by an initial conversion
                 * to BigInteger.
                 */
                long quot = (i >>> 1) / 5;
                long rem = i - quot * 10;
                return toString(quot) + rem;

            case 16:
                return toHexString(i);

            case 32:
                return toUnsignedString0(i, 5);

            default:
                return toUnsignedBigInteger(i).toString(radix);
            }
        }
    }
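To make the purpose of the method concrete, here is a small sketch that compares the signed and unsigned views of the same value using the public `Long.toString(long, int)` and `Long.toUnsignedString(long, int)` methods. The class and helper names are made up for illustration.

```java
final class SignedVsUnsignedDemo {
    // Signed view: a negative value is printed as '-' followed by its magnitude.
    static String signedHex(long i) {
        return Long.toString(i, 16);
    }

    // Unsigned view: the same 64 bits interpreted as a value in [0, 2^64).
    static String unsignedHex(long i) {
        return Long.toUnsignedString(i, 16);
    }

    public static void main(String[] args) {
        System.out.println(signedHex(-1L));     // -1
        System.out.println(unsignedHex(-1L));   // ffffffffffffffff
        System.out.println(unsignedHex(255L));  // ff
    }
}
```

For non-negative input the two views agree, which is why the method delegates straight to toString when i >= 0.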

The radix must be between 2 and 36, because Character.MIN_RADIX is defined as 2 and Character.MAX_RADIX as 36.
When i >= 0, the call goes to toString. Let's see what that function does:

    /**
     * Returns a string representation of the first argument in the
     * radix specified by the second argument.
     *
     * <p>If the radix is smaller than {@code Character.MIN_RADIX}
     * or larger than {@code Character.MAX_RADIX}, then the radix
     * {@code 10} is used instead.
     *
     * <p>If the first argument is negative, the first element of the
     * result is the ASCII minus sign {@code '-'}
     * ({@code '\u005Cu002d'}). If the first argument is not
     * negative, no sign character appears in the result.
     *
     * <p>The remaining characters of the result represent the magnitude
     * of the first argument. If the magnitude is zero, it is
     * represented by a single zero character {@code '0'}
     * ({@code '\u005Cu0030'}); otherwise, the first character of
     * the representation of the magnitude will not be the zero
     * character.  The following ASCII characters are used as digits:
     *
     * <blockquote>
     *   {@code 0123456789abcdefghijklmnopqrstuvwxyz}
     * </blockquote>
     *
     * These are {@code '\u005Cu0030'} through
     * {@code '\u005Cu0039'} and {@code '\u005Cu0061'} through
     * {@code '\u005Cu007a'}. If {@code radix} is
     * <var>N</var>, then the first <var>N</var> of these characters
     * are used as radix-<var>N</var> digits in the order shown. Thus,
     * the digits for hexadecimal (radix 16) are
     * {@code 0123456789abcdef}. If uppercase letters are
     * desired, the {@link java.lang.String#toUpperCase()} method may
     * be called on the result:
     *
     * <blockquote>
     *  {@code Long.toString(n, 16).toUpperCase()}
     * </blockquote>
     *
     * @param   i       a {@code long} to be converted to a string.
     * @param   radix   the radix to use in the string representation.
     * @return  a string representation of the argument in the specified radix.
     * @see     java.lang.Character#MAX_RADIX
     * @see     java.lang.Character#MIN_RADIX
     */
    public static String toString(long i, int radix) {
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
            radix = 10;
        if (radix == 10)
            return toString(i);
        char[] buf = new char[65];
        int charPos = 64;
        boolean negative = (i < 0);

        if (!negative) {
            i = -i;
        }

        while (i <= -radix) {
            buf[charPos--] = Integer.digits[(int)(-(i % radix))];
            i = i / radix;
        }
        buf[charPos] = Integer.digits[(int)(-i)];

        if (negative) {
            buf[--charPos] = '-';
        }

        return new String(buf, charPos, (65 - charPos));
    }

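If the radix is invalid, the value is simply converted to a decimal string. Otherwise a temporary buffer of 65 chars is allocated (a Java char is a 16-bit UTF-16 code unit, so that is 65 * 2 bytes; 65 leaves room for 64 binary digits plus a sign). The sign of the value is recorded and the value is normalized (a bit like the itoa source I read earlier, except that here a non-negative value is turned into a negative one). The digits are then produced from the least significant end and stored into the char array from back to front.
This looks a lot like the Solaris itoa source.
But why do the conversion on a negative value? It is to avoid the overflow that negating a negative value can cause. A two's-complement long ranges from -2^63 to 2^63 - 1 (an int from -2^31 to 2^31 - 1), so the minimum value has no positive counterpart, whereas every non-negative value can be negated safely.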

    while (i <= -radix) {
        buf[charPos--] = Integer.digits[(int)(-(i % radix))];
        i = i / radix;
    }

    // digits is the lookup table that maps a digit value to its character:
    /**
     * All possible chars for representing a number as a String
     */
    final static char[] digits = {
        '0' , '1' , '2' , '3' , '4' , '5' ,
        '6' , '7' , '8' , '9' , 'a' , 'b' ,
        'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
        'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
        'o' , 'p' , 'q' , 'r' , 's' , 't' ,
        'u' , 'v' , 'w' , 'x' , 'y' , 'z'
    };
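The overflow concern can be checked directly. The sketch below re-implements the negative-accumulator loop with public API only (`Character.forDigit` stands in for the package-private `Integer.digits` table; the class and method names are made up for illustration) and shows that negating `Long.MIN_VALUE` yields `Long.MIN_VALUE` again, while the negative-side loop still handles it.

```java
final class NegativeAccumulatorDemo {
    // Hypothetical helper mirroring the loop in Long.toString(long, int).
    // Assumes radix is already within Character.MIN_RADIX..Character.MAX_RADIX.
    static String toRadix(long i, int radix) {
        char[] buf = new char[65];          // 64 binary digits plus a sign
        int charPos = 64;
        boolean negative = i < 0;
        if (!negative) {
            i = -i;                         // safe: every non-negative long can be negated
        }
        while (i <= -radix) {
            buf[charPos--] = Character.forDigit((int) -(i % radix), radix);
            i = i / radix;
        }
        buf[charPos] = Character.forDigit((int) -i, radix);
        if (negative) {
            buf[--charPos] = '-';
        }
        return new String(buf, charPos, 65 - charPos);
    }

    public static void main(String[] args) {
        System.out.println(-Long.MIN_VALUE == Long.MIN_VALUE);  // true: negation overflows
        System.out.println(toRadix(Long.MIN_VALUE, 16));        // -8000000000000000
        System.out.println(toRadix(255L, 2));                   // 11111111
    }
}
```

Had the loop worked on the positive magnitude instead, `Long.MIN_VALUE` would have needed a special case.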

Back to toUnsignedString: when i is negative and the radix is 2, it calls toBinaryString:

    /**
     * Returns a string representation of the {@code long}
     * argument as an unsigned integer in base&nbsp;2.
     *
     * <p>The unsigned {@code long} value is the argument plus
     * 2<sup>64</sup> if the argument is negative; otherwise, it is
     * equal to the argument.  This value is converted to a string of
     * ASCII digits in binary (base&nbsp;2) with no extra leading
     * {@code 0}s.
     *
     * <p>The value of the argument can be recovered from the returned
     * string {@code s} by calling {@link
     * Long#parseUnsignedLong(String, int) Long.parseUnsignedLong(s,
     * 2)}.
     *
     * <p>If the unsigned magnitude is zero, it is represented by a
     * single zero character {@code '0'} ({@code '\u005Cu0030'});
     * otherwise, the first character of the representation of the
     * unsigned magnitude will not be the zero character. The
     * characters {@code '0'} ({@code '\u005Cu0030'}) and {@code
     * '1'} ({@code '\u005Cu0031'}) are used as binary digits.
     *
     * @param   i   a {@code long} to be converted to a string.
     * @return  the string representation of the unsigned {@code long}
     *          value represented by the argument in binary (base&nbsp;2).
     * @see #parseUnsignedLong(String, int)
     * @see #toUnsignedString(long, int)
     * @since   JDK 1.0.2
     */
    public static String toBinaryString(long i) {
        return toUnsignedString0(i, 1);
    }

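So it simply calls toUnsignedString0(i, 1).
When radix is 4, it calls toUnsignedString0(i, 2). That function again; we'll see in a moment what it really is~
When radix is 8, it calls toOctalString(i), which, as the name says, converts to octal: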

    public static String toOctalString(long i) {
        return toUnsignedString0(i, 3);
    }

toUnsignedString0 yet again...
When radix is 10:
I didn't really understand this part, so I'm setting it aside for now and flagging it to come back to.

            case 10:
                /*
                 * We can get the effect of an unsigned division by 10
                 * on a long value by first shifting right, yielding a
                 * positive value, and then dividing by 5.  This
                 * allows the last digit and preceding digits to be
                 * isolated more quickly than by an initial conversion
                 * to BigInteger.
                 */
                long quot = (i >>> 1) / 5;
                long rem = i - quot * 10;
                return toString(quot) + rem;
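One way to read the radix-10 case: write u for the unsigned value of i. Then `i >>> 1` is floor(u / 2), which always fits in a non-negative long, and dividing that by 5 gives floor(u / 10). The remainder `i - quot * 10` is computed with wrapping 64-bit arithmetic, but the true result u - 10 * quot lies in 0..9, so the wrapped result is exact. The sketch below (class and method names are made up for illustration) isolates that branch and compares it with `Long.toUnsignedString(long)`.

```java
final class UnsignedDecimalSketch {
    // Decimal string of i interpreted as an unsigned 64-bit value,
    // following the same steps as the radix-10 branch above.
    static String toUnsignedDecimal(long i) {
        if (i >= 0) {
            return Long.toString(i);
        }
        long quot = (i >>> 1) / 5;   // floor(u / 2) / 5 == floor(u / 10), always non-negative
        long rem = i - quot * 10;    // wraps, but the true value is in 0..9
        return Long.toString(quot) + rem;
    }

    public static void main(String[] args) {
        System.out.println(toUnsignedDecimal(-1L));             // 18446744073709551615
        System.out.println(toUnsignedDecimal(Long.MIN_VALUE));  // 9223372036854775808
    }
}
```

For i = -1, quot is 1844674407370955161 and rem is 5, which concatenate to 2^64 - 1.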

Conversion to hexadecimal is toHexString(i):

    public static String toHexString(long i) {
        return toUnsignedString0(i, 4);
    }

Conversion to base 32 is toUnsignedString0(i, 5).
Everything else goes through toUnsignedBigInteger(i).toString(radix).

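Now look at the toUnsignedString0(long val, int shift) function:
Radixes 2, 4, 8, 16 and 32 are passed as shift values 1, 2, 3, 4 and 5, i.e. the base-2 logarithm of the radix, which is used later. Put another way, a binary digit takes 1 bit, a base-4 digit 2 bits, an octal digit 3 bits, and so on.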

    /**
     * Format a long (treated as unsigned) into a String.
     * @param val the value to format
     * @param shift the log2 of the base to format in (4 for hex, 3 for octal, 1 for binary)
     */
    static String toUnsignedString0(long val, int shift) {
        // assert shift > 0 && shift <=5 : "Illegal shift value";
        int mag = Long.SIZE - Long.numberOfLeadingZeros(val);
        int chars = Math.max(((mag + (shift - 1)) / shift), 1);
        char[] buf = new char[chars];

        formatUnsignedLong(val, shift, buf, 0, chars);
        return new String(buf, true);
    }

First it computes the number of significant bits, i.e. the 1-based position of the highest set bit:

    int mag = Long.SIZE - Long.numberOfLeadingZeros(val);

    // The int version from earlier, shown for reference;
    // Long.numberOfLeadingZeros applies the same idea to 64 bits.
    public static int numberOfLeadingZeros(int i) {
        // HD, Figure 5-6
        if (i == 0)
            return 32;
        int n = 1;
        if (i >>> 16 == 0) { n += 16; i <<= 16; }
        if (i >>> 24 == 0) { n +=  8; i <<=  8; }
        if (i >>> 28 == 0) { n +=  4; i <<=  4; }
        if (i >>> 30 == 0) { n +=  2; i <<=  2; }
        n -= i >>> 31;
        return n;
    }

Then it computes how many characters the value needs in the target radix:

    int chars = Math.max(((mag + (shift - 1)) / shift), 1);

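The (shift - 1) term rounds the division up, so a partial group of leading bits still gets its own digit when the bit count is not a multiple of shift~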
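A few concrete values make the rounding easier to see. The helper below is a sketch that repeats the two lines from toUnsignedString0 (the class and method names are made up for illustration).

```java
final class DigitCountSketch {
    // Number of characters needed to print val (as unsigned) in radix 2^shift.
    static int digitCount(long val, int shift) {
        int mag = Long.SIZE - Long.numberOfLeadingZeros(val);  // significant bits, 0 when val == 0
        return Math.max((mag + (shift - 1)) / shift, 1);       // ceil(mag / shift), but at least 1
    }

    public static void main(String[] args) {
        System.out.println(digitCount(255L, 4));  // 2:  8 bits -> "ff"
        System.out.println(digitCount(256L, 4));  // 3:  9 bits -> "100"
        System.out.println(digitCount(0L, 1));    // 1:  zero still prints as "0"
        System.out.println(digitCount(-1L, 3));   // 22: 64 bits in groups of 3
    }
}
```

The Math.max is needed only for zero, where mag is 0 and the division alone would give a zero-length buffer.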
Then comes the function that does the work, formatUnsignedLong(val, shift, buf, 0, chars):

    /**
     * Format a long (treated as unsigned) into a character buffer.
     * @param val the unsigned long to format
     * @param shift the log2 of the base to format in (4 for hex, 3 for octal, 1 for binary)
     * @param buf the character buffer to write to
     * @param offset the offset in the destination buffer to start at
     * @param len the number of characters to write
     * @return the lowest character location used
     */
    static int formatUnsignedLong(long val, int shift, char[] buf, int offset, int len) {
        int charPos = len;
        int radix = 1 << shift;
        int mask = radix - 1;
        do {
            buf[offset + --charPos] = Integer.digits[((int) val) & mask];
            val >>>= shift;
        } while (val != 0 && charPos > 0);

        return charPos;
    }

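The conversion is much like itoa: the array is filled from back to front. The shift is first turned into the actual radix (1 << shift), and the mask (radix - 1) extracts the lowest shift bits on each iteration, i.e. one digit in that radix, which is then mapped to its character: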

        do {
            buf[offset + --charPos] = Integer.digits[((int) val) & mask];
            val >>>= shift;
        } while (val != 0 && charPos > 0);

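Finally it returns charPos, the index of the lowest position written in the char array, which is where the converted digits start.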
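To see the mask-and-shift loop and its return value at work outside the JDK, here is a sketch that fills a fixed 64-char buffer from the back and uses the final position as the start of the result. The class name, method name and local digit table are assumptions for illustration; the JDK sizes the buffer exactly instead.

```java
final class ShiftMaskSketch {
    private static final char[] DIGITS =
            "0123456789abcdefghijklmnopqrstuvwxyz".toCharArray();

    // Formats val (treated as unsigned) in radix 2^shift, for shift in 1..5.
    static String toPowerOfTwoRadix(long val, int shift) {
        char[] buf = new char[Long.SIZE];   // worst case: 64 binary digits
        int charPos = buf.length;
        int mask = (1 << shift) - 1;        // lowest `shift` bits, i.e. one digit
        do {
            buf[--charPos] = DIGITS[((int) val) & mask];
            val >>>= shift;                 // unsigned shift: zeros come in from the left
        } while (val != 0);
        // charPos is now the lowest index written, so the digits start there.
        return new String(buf, charPos, buf.length - charPos);
    }

    public static void main(String[] args) {
        System.out.println(toPowerOfTwoRadix(255L, 3));  // 377
        System.out.println(toPowerOfTwoRadix(-1L, 4));   // ffffffffffffffff
        System.out.println(toPowerOfTwoRadix(0L, 1));    // 0
    }
}
```

Because `>>>` shifts in zeros, a negative val eventually reaches 0 and the loop terminates; with the signed `>>` it would never end for negative input.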

Other radixes are converted with toUnsignedBigInteger(i).toString(radix). I'll look at that tomorrow; it's getting late, time to rest~

6.15: all right, let's look at the toUnsignedBigInteger function:

    /**
     * Return a BigInteger equal to the unsigned value of the
     * argument.
     */
    private static BigInteger toUnsignedBigInteger(long i) {
        if (i >= 0L)
            return BigInteger.valueOf(i);
        else {
            int upper = (int) (i >>> 32);
            int lower = (int) i;

            // return (upper << 32) + lower
            return (BigInteger.valueOf(Integer.toUnsignedLong(upper))).shiftLeft(32).
                add(BigInteger.valueOf(Integer.toUnsignedLong(lower)));
        }
    }
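The split into two 32-bit halves can be cross-checked against the definition quoted in the toBinaryString Javadoc, where the unsigned value of a negative long is the argument plus 2^64. The sketch below is an illustration with made-up class and method names, not JDK code; it computes the value both ways so they can be compared.

```java
import java.math.BigInteger;

final class UnsignedBigIntegerSketch {
    // Same approach as the JDK method: rebuild the value from two unsigned 32-bit halves.
    static BigInteger fromHalves(long i) {
        if (i >= 0L) {
            return BigInteger.valueOf(i);
        }
        int upper = (int) (i >>> 32);
        int lower = (int) i;
        return BigInteger.valueOf(Integer.toUnsignedLong(upper)).shiftLeft(32)
                .add(BigInteger.valueOf(Integer.toUnsignedLong(lower)));
    }

    // Alternative taken from the Javadoc wording: add 2^64 to a negative argument.
    static BigInteger byAddingTwoTo64(long i) {
        BigInteger signed = BigInteger.valueOf(i);
        return i >= 0L ? signed : signed.add(BigInteger.ONE.shiftLeft(64));
    }

    public static void main(String[] args) {
        System.out.println(fromHalves(-1L));                               // 18446744073709551615
        System.out.println(fromHalves(-1L).equals(byAddingTwoTo64(-1L)));  // true
        System.out.println(fromHalves(-1L).toString(7)
                .equals(Long.toUnsignedString(-1L, 7)));                   // true
    }
}
```

Radix 7 is used in the last line because it is not a power of two and not 10, so `Long.toUnsignedString` takes the BigInteger path for negative input.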