How to Build a Java MRZ Detector with Dynamsoft Label Recognizer for Windows and Linux

Xiao Ling - Dec 16 '22 - Dev Community

This article helps Java developers build desktop and server-side applications that detect machine-readable zones (MRZ) in passports, travel documents, and ID cards. You will see how to wrap the Dynamsoft C++ OCR SDK in a Java JAR package and how to create a command-line MRZ detector with a few lines of Java code.

Java Classes and Methods for MRZ Detection

Create NativeLabelRecognizer.java, NativeLoader.java, MrzResult.java and MrzParser.java.

  • NativeLabelRecognizer.java is a wrapper class for the native library. It loads the native library and calls the native methods. The primary native methods defined in NativeLabelRecognizer are as follows:

    public NativeLabelRecognizer() {
        nativePtr = nativeCreateInstance();
    }
    
    public void destroyInstance() {
        if (nativePtr != 0) {
            nativeDestroyInstance(nativePtr);
            nativePtr = 0; // Prevent a second call from freeing the handle twice
        }
    }
    
    public static int setLicense(String license) {
        return nativeInitLicense(license);
    }
    
    public ArrayList<MrzResult> detectFile(String fileName) {
        return nativeDetectFile(nativePtr, fileName);
    }
    
    public String getVersion() {
        return nativeGetVersion();
    }
    
    public int loadModel() throws IOException {
        ...
        return nativeLoadModel(nativePtr, targetPath);
    }
    
    private native static int nativeInitLicense(String license);
    
    private native long nativeCreateInstance();
    
    private native void nativeDestroyInstance(long nativePtr);
    
    private native ArrayList<MrzResult> nativeDetectFile(long nativePtr, String fileName);
    
    private native int nativeLoadModel(long nativePtr, String modelPath);
    

    The loadModel() method is special. It needs to dynamically update the model path specified in the JSON-formatted template file according to the extraction path of the Jar package. Gson can be used to load and update the JSON object.

    public int loadModel() throws IOException {
        String modelFile = "MRZ.json";
        String tempFolder = new File(System.getProperty("java.io.tmpdir")).getAbsolutePath();
        String targetPath = new File(tempFolder, modelFile).getAbsolutePath();
    
        // Read the extracted template file into a string
        StringBuilder sb = new StringBuilder();
        try (FileReader reader = new FileReader(targetPath)) {
            char[] chars = new char[1024];
            int len;
            while ((len = reader.read(chars)) != -1) {
                sb.append(chars, 0, len);
            }
        }
        String template = sb.toString();
    
        // Point DirectoryPath at the folder the model files were extracted to
        Gson gson = new Gson();
        JsonObject jsonObject = gson.fromJson(template, JsonObject.class);
        JsonArray array = jsonObject.get("CharacterModelArray").getAsJsonArray();
        JsonObject object = array.get(0).getAsJsonObject();
        String modelPath = object.get("DirectoryPath").getAsString();
    
        if (modelPath != null && modelPath.contains("model")) {
            object.addProperty("DirectoryPath", tempFolder);
        }
    
        // Write the updated template back before handing it to the native library
        try (FileWriter writer = new FileWriter(targetPath)) {
            writer.write(jsonObject.toString());
        }
    
        return nativeLoadModel(nativePtr, targetPath);
    }
    
  • NativeLoader.java is a utility class that extracts the MRZ OCR model files and the C++ shared libraries from the Jar package and loads the native libraries. All assets are extracted to the operating system's temporary directory. An MD5 checksum is used to detect whether a file has changed.

    private static boolean extractResourceFiles(String dlrNativeLibraryPath, String dlrNativeLibraryName,
            String tempFolder) throws IOException {
        String[] filenames = null;
        if (Utils.isWindows()) {
            filenames = new String[] {"DynamsoftLicenseClientx64.dll",
            "vcomp140.dll",
            "DynamicPdfx64.dll", "DynamsoftLabelRecognizerx64.dll", "dlr.dll"};
        }
        else if (Utils.isLinux()) {
            filenames = new String[] {"libDynamicPdf.so", "libDynamsoftLicenseClient.so", "libDynamsoftLabelRecognizer.so", "libdlr.so"};
        }
    
        boolean ret = true;
    
        for (String file : filenames) {
            ret &= extractAndLoadLibraryFile(dlrNativeLibraryPath, file, tempFolder);
        }
    
        // Extract model files
        String modelPath = "/model";
        filenames = new String[] {"MRZ.json", "MRZ.caffemodel", "MRZ.txt", "MRZ.prototxt"};
        for (String file : filenames) {
            ret &= extractAndLoadLibraryFile(modelPath, file, tempFolder);
        }
        return ret;
    }
    
    static String md5sum(InputStream input) throws IOException {
        BufferedInputStream in = new BufferedInputStream(input);
    
        try {
            MessageDigest digest = java.security.MessageDigest.getInstance("MD5");
            DigestInputStream digestInputStream = new DigestInputStream(in, digest);
            // Read the stream to the end so that the digest sees every byte
            while (digestInputStream.read() >= 0) {
            }
            ByteArrayOutputStream md5out = new ByteArrayOutputStream();
            md5out.write(digest.digest());
            return md5out.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 algorithm is not available: " + e);
        } finally {
            in.close();
        }
    }
    
  • MrzResult.java is a Java class to store the MRZ detection results, including detection confidence, text, and coordinates.

    public class MrzResult {
        public int confidence;
        public String text;
        public int x1, y1, x2, y2, x3, y3, x4, y4;
    
        public MrzResult(int confidence, String text, int x1, int y1, int x2, int y2, int x3, int y3, int x4, int y4) {
            this.confidence = confidence;
            this.text = text;
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
            this.x3 = x3;
            this.y3 = y3;
            this.x4 = x4;
            this.y4 = y4;
        }
    }
    
  • MrzParser.java is a Java class to parse the MRZ detection results and decode the MRZ information. The MRZ information includes the document type, issuing country, document number, date of birth, and expiration date, which are stored in a com.google.gson.JsonObject object.

    JsonObject mrzInfo = new JsonObject();
    ...
    // Get issuing State information (the three characters that follow the two-character document type)
    String  nation = line1.substring(2, 5);
    pattern = Pattern.compile("[0-9]");
    matcher = pattern.matcher(nation);
    if (matcher.find()) return null; // find() rejects any digit; matches() would only match a one-character string
    if (nation.charAt(nation.length() - 1) == '<') {
        nation = nation.substring(0, 2);
    }
    mrzInfo.addProperty("nationality", nation);
    // Get surname information
    line1 = line1.substring(5);
    int pos = line1.indexOf("<<");
    String  surName = line1.substring(0, pos);
    pattern = Pattern.compile("[0-9]");
    matcher = pattern.matcher(surName);
    if (matcher.find()) return null; // A surname must not contain digits
    surName = surName.replace("<", " ");
    mrzInfo.addProperty("surname", surName);
    // Get givenname information
    String  givenName = line1.substring(surName.length() + 2);
    pattern = Pattern.compile("[0-9]");
    matcher = pattern.matcher(givenName);
    if (matcher.find()) return null; // A given name must not contain digits
    givenName = givenName.replace("<", " ");
    givenName = givenName.trim();
    mrzInfo.addProperty("givenname", givenName);
    // Get passport number information
    String  passportNumber = "";
    passportNumber = line2.substring(0, 9);
    passportNumber = passportNumber.replace("<", " ");
    mrzInfo.addProperty("passportnumber", passportNumber);
    ...
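
The field positions used by the parser follow the two-line, 44-character passport (TD3) layout. As a rough, self-contained illustration of that layout, here is a sketch that uses a plain Map instead of Gson. The class Td3Sketch, its methods, and the passportnumber_valid key are hypothetical names for this example and are not part of MrzParser; the check digit routine follows the ICAO 9303 weighting (7, 3, 1), and the sample lines are the ICAO specimen passport.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the MrzParser class from the sample project.
final class Td3Sketch {
    // ICAO 9303 check digit: weights 7, 3, 1 repeating; 0-9 keep their value,
    // A-Z count as 10-35, and the filler '<' counts as 0.
    static int checkDigit(String field) {
        int[] weights = {7, 3, 1};
        int sum = 0;
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            int value = 0;
            if (c >= '0' && c <= '9') {
                value = c - '0';
            } else if (c >= 'A' && c <= 'Z') {
                value = c - 'A' + 10;
            }
            sum += value * weights[i % 3];
        }
        return sum % 10;
    }

    // Pads a line with '<' up to the given width.
    static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append('<');
        }
        return sb.toString();
    }

    // Parses the two 44-character lines of a TD3 MRZ; returns null if they are malformed.
    static Map<String, String> parse(String line1, String line2) {
        if (line1 == null || line2 == null || line1.length() != 44 || line2.length() != 44) {
            return null;
        }
        Map<String, String> info = new LinkedHashMap<>();
        info.put("type", line1.substring(0, 2).replace("<", ""));
        // The article's parser stores the issuing State under the "nationality" key.
        info.put("nationality", line1.substring(2, 5).replace("<", ""));
        String names = line1.substring(5);
        int pos = names.indexOf("<<");
        if (pos < 0) {
            return null;
        }
        info.put("surname", names.substring(0, pos).replace('<', ' ').trim());
        info.put("givenname", names.substring(pos + 2).replace('<', ' ').trim());
        String number = line2.substring(0, 9);
        info.put("passportnumber", number.replace("<", ""));
        char check = line2.charAt(9);
        boolean valid = check >= '0' && check <= '9' && checkDigit(number) == check - '0';
        info.put("passportnumber_valid", String.valueOf(valid)); // assumption: extra key for illustration
        return info;
    }

    public static void main(String[] args) {
        String line1 = pad("P<UTOERIKSSON<<ANNA<MARIA", 44);
        String line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10";
        System.out.println(parse(line1, line2));
    }
}
```

Validating the check digit is optional, but it is a cheap way to catch a misread character before the parsed values are used.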
    

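When the Java classes are finished, we can generate the JNI header file automatically. The command below uses javah, which was removed in JDK 10; on later JDKs, javac -h generates the header instead: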

cd src/main/java
javah -o ../../../jni/NativeLabelRecognizer.h com.dynamsoft.dlr.NativeLabelRecognizer

Write JNI Wrapper for Dynamsoft C++ OCR SDK

We create a CMake project to build a JNI wrapper with Dynamsoft Label Recognizer SDK.

Here is the CMakeLists.txt file:

cmake_minimum_required (VERSION 2.6)
project (dlr)
MESSAGE( STATUS "PROJECT_NAME: " ${PROJECT_NAME} )

find_package(JNI REQUIRED)
include_directories(${JNI_INCLUDE_DIRS})

MESSAGE( STATUS "JNI_INCLUDE_DIRS: " ${JNI_INCLUDE_DIRS})

# Check lib
if (CMAKE_HOST_WIN32)
    set(WINDOWS 1)
elseif(CMAKE_HOST_APPLE)
    set(MACOS 1)
elseif(CMAKE_HOST_UNIX)
    set(LINUX 1)
endif()

# Set RPATH
if(CMAKE_HOST_UNIX)
    SET(CMAKE_CXX_FLAGS "-std=c++11 -O3 -Wl,-rpath=$ORIGIN")
    SET(CMAKE_INSTALL_RPATH "$ORIGIN")
    SET(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
endif()

# Add search path for include and lib files
if(WINDOWS)
    link_directories("${PROJECT_SOURCE_DIR}/lib/win/" ${JNI_LIBRARIES}) 
elseif(LINUX)
    link_directories("${PROJECT_SOURCE_DIR}/lib/linux/" ${JNI_LIBRARIES})
endif()
include_directories("${PROJECT_BINARY_DIR}" "${PROJECT_SOURCE_DIR}/include/")


# Add the library
add_library(dlr SHARED NativeLabelRecognizer.cxx)
if(WINDOWS)
    target_link_libraries (${PROJECT_NAME} "DynamsoftLabelRecognizerx64")
else()
    target_link_libraries (${PROJECT_NAME} "DynamsoftLabelRecognizer" pthread)
endif()

# Set installation directory
set(CMAKE_INSTALL_PREFIX "${PROJECT_SOURCE_DIR}/../src/main/")
set(LIBRARY_PATH "java/com/dynamsoft/dlr/native")
if(WINDOWS)
    install (DIRECTORY "${PROJECT_SOURCE_DIR}/lib/win/" DESTINATION "${CMAKE_INSTALL_PREFIX}${LIBRARY_PATH}/win")
    install (TARGETS dlr DESTINATION "${CMAKE_INSTALL_PREFIX}${LIBRARY_PATH}/win")
elseif(LINUX)
    install (DIRECTORY "${PROJECT_SOURCE_DIR}/lib/linux/" DESTINATION "${CMAKE_INSTALL_PREFIX}${LIBRARY_PATH}/linux")
    install (TARGETS dlr DESTINATION "${CMAKE_INSTALL_PREFIX}${LIBRARY_PATH}/linux")
endif()

This is a shared library project. The dlr library is built from the NativeLabelRecognizer.cxx file. All shared libraries will be installed to the src/main/java/com/dynamsoft/dlr/native directory after building:

mkdir build
cd build
cmake .. 
cmake --build . --config Release --target install
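
Because the install step places the binaries under a win or linux subfolder, the Java side has to pick the matching folder at run time. The following is a minimal sketch of that selection, assuming the resource path mirrors LIBRARY_PATH from the CMakeLists.txt once src/main/java is packaged as a resource directory. The class NativeFolderSketch and the method folderFor are hypothetical names, not the Utils class used by the sample project.

```java
import java.util.Locale;

// Hypothetical sketch of mapping the os.name system property to the install subfolder.
final class NativeFolderSketch {
    // Assumption: classpath location that corresponds to LIBRARY_PATH in CMakeLists.txt.
    static final String BASE = "/com/dynamsoft/dlr/native";

    // Returns the resource folder for the given os.name value.
    static String folderFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return BASE + "/win";
        }
        if (os.contains("linux")) {
            return BASE + "/linux";
        }
        // The CMake script only installs Windows and Linux binaries.
        throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }

    public static void main(String[] args) {
        System.out.println(folderFor("Windows 11"));
        System.out.println(folderFor("Linux"));
    }
}
```

In a real loader the argument would be System.getProperty("os.name").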

The JNI methods are implemented in the NativeLabelRecognizer.cxx file:

  • Initialize the license:

    JNIEXPORT jint JNICALL Java_com_dynamsoft_dlr_NativeLabelRecognizer_nativeInitLicense(JNIEnv *env, jclass, jstring license)
    {
        const char *pszLicense = env->GetStringUTFChars(license, NULL);
        char errorMsgBuffer[512];
        // Click https://www.dynamsoft.com/customer/license/trialLicense/?product=dlr to get a trial license.
        int ret = DLR_InitLicense(pszLicense, errorMsgBuffer, 512);
        printf("DLR_InitLicense: %s\n", errorMsgBuffer);
        env->ReleaseStringUTFChars(license, pszLicense);
        return ret;
    }
    
  • Create the instance of Dynamsoft Label Recognizer:

    JNIEXPORT jlong JNICALL Java_com_dynamsoft_dlr_NativeLabelRecognizer_nativeCreateInstance(JNIEnv *, jobject)
    {
        return (jlong)DLR_CreateInstance();
    }
    
  • Destroy the instance of Dynamsoft Label Recognizer:

    JNIEXPORT void JNICALL Java_com_dynamsoft_dlr_NativeLabelRecognizer_nativeDestroyInstance(JNIEnv *, jobject, jlong handler)
    {
        if (handler)
        {
            DLR_DestroyInstance((void *)handler);
        }
    }
    
  • Load the model file:

    JNIEXPORT jint JNICALL Java_com_dynamsoft_dlr_NativeLabelRecognizer_nativeLoadModel(JNIEnv *env, jobject, jlong handler, jstring filename) 
    {
        const char *pFileName = env->GetStringUTFChars(filename, NULL);
        char errorMsgBuffer[512];
        int ret = DLR_AppendSettingsFromFile((void*)handler, pFileName, errorMsgBuffer, 512);
        printf("Load MRZ model: %s\n", errorMsgBuffer);
        env->ReleaseStringUTFChars(filename, pFileName);
        return ret;
    }
    
  • Detect MRZ from an image file and return a list of MRZ results:

    JNIEXPORT jobject JNICALL Java_com_dynamsoft_dlr_NativeLabelRecognizer_nativeDetectFile(JNIEnv *env, jobject, jlong handler, jstring filename)
    {
        jobject arrayList = NULL;
    
        jclass mrzResultClass = env->FindClass("com/dynamsoft/dlr/MrzResult");
        if (NULL == mrzResultClass)
            printf("FindClass failed\n");
    
        jmethodID mrzResultConstructor = env->GetMethodID(mrzResultClass, "<init>", "(ILjava/lang/String;IIIIIIII)V");
        if (NULL == mrzResultConstructor)
            printf("GetMethodID failed\n");
    
        jclass arrayListClass = env->FindClass("java/util/ArrayList");
        if (NULL == arrayListClass)
            printf("FindClass failed\n");
    
        jmethodID arrayListConstructor = env->GetMethodID(arrayListClass, "<init>", "()V");
        if (NULL == arrayListConstructor)
            printf("GetMethodID failed\n");
    
        jmethodID arrayListAdd = env->GetMethodID(arrayListClass, "add", "(Ljava/lang/Object;)Z");
        if (NULL == arrayListAdd)
            printf("GetMethodID failed\n");
    
        const char *pFileName = env->GetStringUTFChars(filename, NULL);
        int ret = DLR_RecognizeByFile((void *)handler, pFileName, "locr");
        if (ret)
        {
            printf("Detection error: %s\n", DLR_GetErrorString(ret));
        }
    
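        DLR_ResultArray *pResults = NULL;
        DLR_GetAllResults((void *)handler, &pResults);
        if (!pResults)
        {
            // Release the file name before the early return to avoid leaking it
            env->ReleaseStringUTFChars(filename, pFileName);
            return NULL;
        }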
    
        int count = pResults->resultsCount;
        arrayList = env->NewObject(arrayListClass, arrayListConstructor);
    
        for (int i = 0; i < count; i++)
        {
            DLR_Result *mrzResult = pResults->results[i];
            int lCount = mrzResult->lineResultsCount;
            for (int j = 0; j < lCount; j++)
            {
                DM_Point *points = mrzResult->lineResults[j]->location.points;
                int x1 = points[0].x;
                int y1 = points[0].y;
                int x2 = points[1].x;
                int y2 = points[1].y;
                int x3 = points[2].x;
                int y3 = points[2].y;
                int x4 = points[3].x;
                int y4 = points[3].y;
    
                jobject object = env->NewObject(mrzResultClass, mrzResultConstructor, mrzResult->lineResults[j]->confidence, env->NewStringUTF(mrzResult->lineResults[j]->text), x1, y1, x2, y2, x3, y3, x4, y4);
                env->CallBooleanMethod(arrayList, arrayListAdd, object);
            }
        }
    
        // Release memory
        DLR_FreeResults(&pResults);
    
        env->ReleaseStringUTFChars(filename, pFileName);
        return arrayList;
    }
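
The signature strings passed to GetMethodID, such as "(ILjava/lang/String;IIIIIIII)V", are JVM method descriptors and must match the Java declarations exactly. One way to double-check them, shown here only as a sketch, is to let the JDK build the descriptor from the parameter types with java.lang.invoke.MethodType. The class DescriptorSketch and its method names are hypothetical; running javap -s on the compiled class is another way to see the same descriptors.

```java
import java.lang.invoke.MethodType;

// Hypothetical sketch: derive the JNI descriptors used in nativeDetectFile from Java types.
final class DescriptorSketch {
    // Descriptor of MrzResult(int confidence, String text, int x1, ..., int y4).
    // Constructors are looked up as "<init>" and always return void.
    static String mrzResultConstructor() {
        return MethodType.methodType(void.class,
                int.class, String.class,
                int.class, int.class, int.class, int.class,
                int.class, int.class, int.class, int.class)
                .toMethodDescriptorString();
    }

    // Descriptor of ArrayList.add(Object), which returns boolean.
    static String arrayListAdd() {
        return MethodType.methodType(boolean.class, Object.class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(mrzResultConstructor());
        System.out.println(arrayListAdd());
    }
}
```

If the MrzResult constructor ever gains or loses a parameter, the descriptor in the C++ code has to change with it, otherwise GetMethodID returns NULL.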
    

Build Java Jar Package with Resources and Dependencies

The target package should include Java classes, C++ library files, model files, and dependencies. By default, Maven will only include Java classes. To include C++ library files, model files, and dependencies, we need to add the following configuration to the pom.xml file:

<build>
    <resources>
        <resource>
            <directory>src/main/java</directory>
            <excludes>
                <exclude>**/*.md</exclude>
                <exclude>**/*.h</exclude>
                <exclude>**/*.lib</exclude>
                <exclude>**/*.java</exclude>
            </excludes>
        </resource>
        <resource>
            <directory>res</directory>
        </resource>
    </resources>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>
  • src/main/java is the directory that contains the native library files, which are installed after building the JNI wrapper.
  • res is the directory that contains the model files. Its structure is as follows:

    res
    │
    └───model
        ├───MRZ.caffemodel
        ├───MRZ.json   
        ├───MRZ.prototxt
        └───MRZ.txt
    
  • maven-assembly-plugin is used to build dependencies into the target package for easy deployment.

Finally, run the mvn install assembly:assembly command to generate a dlr-1.0.0-jar-with-dependencies.jar file.
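
Once the native libraries and model files are packaged in the JAR, NativeLoader has to copy them to the temporary directory at run time, and the MD5 check lets it skip files that are already up to date. The sketch below illustrates that idea with standard library calls only. The class ExtractSketch and the methods md5Hex and extractIfChanged are hypothetical names, not part of the SDK or the sample project, and the bytes are passed in directly; in a real loader they would presumably be read from the stream returned by getResourceAsStream.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of "extract only when the content changed".
final class ExtractSketch {
    // Returns the MD5 digest of the data as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 algorithm is not available", e);
        }
    }

    // Writes data to target unless a file with the same checksum already exists.
    // Returns true if the file was written, false if it was left untouched.
    static boolean extractIfChanged(byte[] data, Path target) throws IOException {
        if (Files.exists(target)
                && md5Hex(Files.readAllBytes(target)).equals(md5Hex(data))) {
            return false;
        }
        Files.write(target, data);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("mrz-sketch", ".bin");
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(extractIfChanged(data, target)); // the empty temp file differs, so it is written
        System.out.println(extractIfChanged(data, target)); // identical content, so it is skipped
        Files.delete(target);
    }
}
```

Comparing hex strings rather than raw digest bytes keeps the checksum safe to log and independent of the platform's default character set.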

Steps to Build an MRZ Detector in Java

Now, let's create a Java MRZ detector with a few lines of code.

  1. Get a 30-day FREE trial license of Dynamsoft Label Recognizer, and activate the license in the Java code.

    NativeLabelRecognizer.setLicense("DLS2eyJoYW5kc2hha2VDb2RlIjoiMjAwMDAxLTE2NDk4Mjk3OTI2MzUiLCJvcmdhbml6YXRpb25JRCI6IjIwMDAwMSIsInNlc3Npb25QYXNzd29yZCI6IndTcGR6Vm05WDJrcEQ5YUoifQ==");
    
  2. Create a NativeLabelRecognizer instance.

    NativeLabelRecognizer labelRecognizer = new NativeLabelRecognizer();
    
  3. Load the MRZ detection model:

    labelRecognizer.loadModel();
    
  4. Detect MRZ from an image file:

    ArrayList<MrzResult> results = labelRecognizer.detectFile(fileName);
    
  5. Get the MRZ information by decoding the MRZ lines:

    String[] lines = new String[results.size()];
    for (int i = 0; i < results.size(); i++) {
        lines[i] = results.get(i).text;
    }
    JsonObject info = MrzParser.parse(lines);
    

Try the Sample Code

java -cp target/dlr-1.0.0-jar-with-dependencies.jar  com.dynamsoft.dlr.Test images/1.png

MRZ detector with Java OCR SDK

Source Code

https://github.com/yushulx/java-mrz-ocr-sdk
