How to use Java to check whether text contains Chinese characters
This article shows how to use Java to check whether text content contains Chinese characters.
Testing text with a regular expression is a common technique, and detecting Chinese characters works the same way: you only need to know the range of the character set. I recently worked on a translation feature where one of the most basic requirements was to determine whether the input text contains Chinese, and to call Google Translate or another translation API only if it does.
When checking for Chinese, there are generally two kinds to consider: Simplified Chinese and Traditional Chinese.
The Unicode code points of commonly used Chinese characters lie roughly between U+4E00 and U+9FA5, so most Chinese characters can be found within this range. Note, however, that the range does not cover Chinese punctuation marks such as 。 (U+3002) and ， (U+FF0C), and some rarer characters are encoded outside it.
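The range check described above can be sketched directly on code points. This is a minimal illustration, not a definitive implementation: the class name CjkRangeCheck and its method names are made up for this example, and the characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
final class CjkRangeCheck {
    // Bounds of the range quoted in the text (assumed sufficient for common characters)
    static final int RANGE_START = 0x4E00;
    static final int RANGE_END = 0x9FA5;

    // True if a single code point falls inside U+4E00..U+9FA5
    static boolean isInCommonRange(int codePoint) {
        return codePoint >= RANGE_START && codePoint <= RANGE_END;
    }

    // True if any code point of the text falls inside the range
    static boolean containsCommonChinese(String text) {
        return text.codePoints().anyMatch(CjkRangeCheck::isInCommonRange);
    }

    public static void main(String[] args) {
        System.out.println(isInCommonRange('\u4E2D')); // U+4E2D, a common ideograph: true
        System.out.println(isInCommonRange('\u3002')); // U+3002, ideographic full stop: false
        System.out.println(isInCommonRange('\uFF0C')); // U+FF0C, fullwidth comma: false
        System.out.println(containsCommonChinese("hello \u4E16\u754C")); // true
    }
}
```

A text made up only of Chinese punctuation would therefore be reported as containing no Chinese, which is usually acceptable when the goal is to decide whether translation is needed.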
A program can therefore detect whether a given text contains Chinese characters. In Java, for example, a regular expression that matches characters in the range U+4E00 to U+9FA5 can be used to determine whether the text contains Chinese characters.
Simplified Chinese and Traditional Chinese share many characters, and the characters specific to each are encoded in the same block, so this range contains most Simplified Chinese and Traditional Chinese characters.
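To illustrate the overlap, here is a small sketch that runs the same range check on a Simplified and a Traditional spelling of one word. The class name ScriptOverlapDemo and the sample strings are chosen for this example only. Both spellings match, which also means a match on this range cannot tell you which of the two scripts the text uses.

```java
import java.util.regex.Pattern;

final class ScriptOverlapDemo {
    // Same range as in the text: U+4E00..U+9FA5
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]");

    static boolean containsChinese(String text) {
        return CHINESE.matcher(text).find();
    }

    public static void main(String[] args) {
        String simplified = "\u6C49\u8BED";  // "Chinese language", Simplified spelling
        String traditional = "\u6F22\u8A9E"; // the same word, Traditional spelling

        System.out.println(containsChinese(simplified));     // true
        System.out.println(containsChinese(traditional));    // true
        System.out.println(containsChinese("Hello, world")); // false
    }
}
```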
Here is the Java code that implements the check:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static boolean containsTraditionalChinese(String text) {
    // Despite the name, this matches Simplified and Traditional characters alike
    Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
    Matcher matcher = pattern.matcher(text);
    // find() is true if at least one character in the range occurs in the text
    return matcher.find();
}
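As a usage sketch, the check can gate the translation call mentioned at the start of the article. Everything here beyond the regular expression is an assumption for illustration: TranslationGate, containsChinese and translateIfChinese are hypothetical names, and translate is a stand-in stub, not a real Google Translate client. The pattern is compiled once and reused, and a null input is treated as containing no Chinese.

```java
import java.util.regex.Pattern;

final class TranslationGate {
    // Compiled once; Pattern instances are immutable and safe to share
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]");

    static boolean containsChinese(String text) {
        return text != null && CHINESE.matcher(text).find();
    }

    // Hypothetical placeholder: a real implementation would call a translation API here
    static String translate(String text) {
        return "[translated] " + text;
    }

    // Only text that contains Chinese is sent for translation; other text is returned unchanged
    static String translateIfChinese(String text) {
        return containsChinese(text) ? translate(text) : text;
    }

    public static void main(String[] args) {
        System.out.println(translateIfChinese("hello"));
        System.out.println(translateIfChinese("\u4F60\u597D")); // a two-character Chinese greeting
    }
}
```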
From:Is Everything OK