Introduction
In this blog post, I create several Generative AI examples using NestJS and the Gemini API. The examples generate text from 1) a text prompt, 2) a prompt and an image, and 3) a prompt and two images to analyze. The Google team provided the examples in Node.js, and I ported several of them to NestJS, my favorite framework, which builds on top of the popular Express framework.
Generate Gemini API Key
Go to https://aistudio.google.com/app/apikey to generate an API key on a new or an existing Google Cloud project.
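To sanity-check the key before writing any code, you can call the Gemini REST API directly. This assumes the key is exported as GEMINI_API_KEY in your shell:
curl -H 'Content-Type: application/json' \
  -d '{ "contents": [{ "parts": [{ "text": "Hello" }] }] }' \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=${GEMINI_API_KEY}"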
Create a new NestJS Project
nest new nestjs-gemini-api-demo
Install dependencies
npm i --save-exact @google/generative-ai @nestjs/swagger class-transformer class-validator dotenv compression
npm i --save-exact --save-dev @types/multer
Generate a Gemini Module
nest g mo gemini
nest g co gemini/presenters/http/gemini --flat
nest g s gemini/application/gemini --flat
These commands create a Gemini module, a controller, and a service for the API.
Define Gemini environment variables
The .env.example file has environment variables for the Gemini API key, the Gemini Pro model, the Gemini Pro Vision model, and the port number.
// .env.example
GEMINI_API_KEY=<google_gemini_api_key>
GEMINI_PRO_MODEL=gemini-pro
GEMINI_PRO_VISION_MODEL=gemini-pro-vision
PORT=3000
Copy .env.example to .env and replace the placeholder of GEMINI_API_KEY with the real API key.
Add .env to the .gitignore file to ensure we don't accidentally commit the Gemini API key to the GitHub repo.
// .gitignore
.env
Add configuration files
The project has three configuration files. validate.config.ts validates that the request payload is valid before the request is routed to the controller. The whitelist option strips unknown properties from the payload (for example, { "prompt": "hi", "foo": 1 } reaches the controller as { "prompt": "hi" }), and stopAtFirstError reports only the first failed constraint for each property.
// validate.config.ts
import { ValidationPipe } from '@nestjs/common';
export const validateConfig = new ValidationPipe({
whitelist: true,
stopAtFirstError: true,
});
env.config.ts extracts the environment variables from process.env and stores the values in the env object.
// env.config.ts
import dotenv from 'dotenv';
dotenv.config();
export const env = {
PORT: parseInt(process.env.PORT || '3000'),
GEMINI: {
KEY: process.env.GEMINI_API_KEY || '',
PRO_MODEL: process.env.GEMINI_PRO_MODEL || 'gemini-pro',
PRO_VISION_MODEL: process.env.GEMINI_PRO_VISION_MODEL || 'gemini-pro-vision',
},
};
gemini.config.ts defines the generation and safety options for the Gemini models.
// gemini.config.ts
import { GenerationConfig, HarmBlockThreshold, HarmCategory, SafetySetting } from '@google/generative-ai';
// maxOutputTokens caps the response length; temperature, topK, and topP control sampling randomness
export const GENERATION_CONFIG: GenerationConfig = { maxOutputTokens: 1024, temperature: 1, topK: 32, topP: 1 };
// Block any content rated medium or above in each harm category
export const SAFETY_SETTINGS: SafetySetting[] = [
{
category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
];
Bootstrap the application
// main.ts
import compression from 'compression';
import express from 'express';
import { NestFactory } from '@nestjs/core';
import { NestExpressApplication } from '@nestjs/platform-express';
import { DocumentBuilder, SwaggerModule } from '@nestjs/swagger';
import { AppModule } from './app.module';
import { env } from '~configs/env.config';
import { validateConfig } from '~configs/validate.config';
function setupSwagger(app: NestExpressApplication) {
const config = new DocumentBuilder()
.setTitle('Gemini example')
.setDescription('The Gemini API description')
.setVersion('1.0')
.addTag('google gemini')
.build();
const document = SwaggerModule.createDocument(app, config);
SwaggerModule.setup('api', app, document);
}
async function bootstrap() {
const app = await NestFactory.create<NestExpressApplication>(AppModule);
app.enableCors();
app.useGlobalPipes(validateConfig);
app.use(express.json({ limit: '1000kb' }));
app.use(express.urlencoded({ extended: false }));
app.use(compression());
setupSwagger(app);
await app.listen(env.PORT);
}
bootstrap();
The bootstrap function registers middleware with the application, sets up the Swagger documentation, and applies a global pipe to validate payloads.
I have laid the groundwork, and the next step is to add routes that receive Generative AI inputs and generate text.
Example 1: Generate text from a prompt
// generate-text.dto.ts
import { ApiProperty } from '@nestjs/swagger';
import { IsNotEmpty, IsString } from 'class-validator';
export class GenerateTextDto {
@ApiProperty({
name: 'prompt',
description: 'prompt of the question',
type: 'string',
required: true,
})
@IsNotEmpty()
@IsString()
prompt: string;
}
The DTO carries the text prompt and validates that it is a non-empty string.
// gemini.constant.ts
export const GEMINI_PRO_MODEL = 'GEMINI_PRO_MODEL';
export const GEMINI_PRO_VISION_MODEL = 'GEMINI_PRO_VISION_MODEL';
// gemini.provider.ts
import { GenerativeModel, GoogleGenerativeAI } from '@google/generative-ai';
import { Provider } from '@nestjs/common';
import { env } from '~configs/env.config';
import { GENERATION_CONFIG, SAFETY_SETTINGS } from '~configs/gemini.config';
import { GEMINI_PRO_MODEL, GEMINI_PRO_VISION_MODEL } from './gemini.constant';
export const GeminiProModelProvider: Provider<GenerativeModel> = {
provide: GEMINI_PRO_MODEL,
useFactory: () => {
const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
return genAI.getGenerativeModel({
model: env.GEMINI.PRO_MODEL,
generationConfig: GENERATION_CONFIG,
safetySettings: SAFETY_SETTINGS,
});
},
};
export const GeminiProVisionModelProvider: Provider<GenerativeModel> = {
provide: GEMINI_PRO_VISION_MODEL,
useFactory: () => {
const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
return genAI.getGenerativeModel({
model: env.GEMINI.PRO_VISION_MODEL,
generationConfig: GENERATION_CONFIG,
safetySettings: SAFETY_SETTINGS,
});
},
};
I define two providers that supply the Gemini Pro model and the Gemini Pro Vision model respectively. Then I can inject these models into the Gemini service.
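The two providers also need to be registered in the Gemini module before the service can inject them. Here is a minimal sketch of the module; the import paths are assumptions based on the folders passed to the generate commands:
// gemini.module.ts - a minimal sketch; adjust the import paths to your layout
import { Module } from '@nestjs/common';
import { GeminiController } from './presenters/http/gemini.controller';
import { GeminiService } from './application/gemini.service';
import { GeminiProModelProvider, GeminiProVisionModelProvider } from './application/gemini.provider';

@Module({
  controllers: [GeminiController],
  providers: [GeminiService, GeminiProModelProvider, GeminiProVisionModelProvider],
})
export class GeminiModule {}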
// content.helper.ts
import { Content, Part } from '@google/generative-ai';
export function createContent(text: string, ...images: Express.Multer.File[]): Content[] {
const imageParts: Part[] = images.map((image) => {
return {
inlineData: {
mimeType: image.mimetype,
data: image.buffer.toString('base64'),
},
};
});
return [
{
role: 'user',
parts: [
...imageParts,
{
text,
},
],
},
];
}
createContent is a helper function that builds the Content array for the model: each uploaded image becomes a base64-encoded inline part, followed by the text prompt.
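For illustration, a text-only call produces a single user turn whose parts contain just the prompt:
// Illustration: the Content array for a text-only prompt
const contents = createContent('Tell me a joke');
// => [{ role: 'user', parts: [{ text: 'Tell me a joke' }] }]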
// gemini.service.ts
// ... omit the import statements to save space
@Injectable()
export class GeminiService {
constructor(
@Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
@Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
) {}
async generateText(prompt: string): Promise<GenAiResponse> {
const contents = createContent(prompt);
const { totalTokens } = await this.proModel.countTokens({ contents });
const result = await this.proModel.generateContent({ contents });
const response = await result.response;
const text = response.text();
return { totalTokens, text };
}
  // ... other methods ...
}
The generateText method accepts a prompt and calls the Gemini API to generate text. It returns the total number of tokens and the generated text to the controller.
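The GenAiResponse type is not shown in this post; here is a minimal sketch inferred from the return value of generateText (the file name is hypothetical):
// gen-ai-response.interface.ts - an assumed shape, inferred from the service's return value
export interface GenAiResponse {
  totalTokens: number;
  text: string;
}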
// gemini.controller.ts
// omit the import statements to save space
@ApiTags('Gemini')
@Controller('gemini')
export class GeminiController {
constructor(private service: GeminiService) {}
@ApiBody({
description: 'Prompt',
required: true,
type: GenerateTextDto,
})
@Post('text')
generateText(@Body() dto: GenerateTextDto): Promise<GenAiResponse> {
return this.service.generateText(dto.prompt);
}
  // ... other routes ...
}
Example 2: Generate text from a prompt and an image
This example needs both a prompt and an image file.
// gemini.service.ts
// ... omit the import statements to save space
@Injectable()
export class GeminiService {
constructor(
@Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
@Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
) {}
  // ... other methods ...
async generateTextFromMultiModal(prompt: string, file: Express.Multer.File): Promise<GenAiResponse> {
try {
const contents = createContent(prompt, file);
const { totalTokens } = await this.proVisionModel.countTokens({ contents });
const result = await this.proVisionModel.generateContent({ contents });
const response = await result.response;
const text = response.text();
return { totalTokens, text };
} catch (err) {
if (err instanceof Error) {
throw new InternalServerErrorException(err.message, err.stack);
}
throw err;
}
}
}
// file-validator.pipe.ts
import { FileTypeValidator, MaxFileSizeValidator, ParseFilePipe } from '@nestjs/common';
export const fileValidatorPipe = new ParseFilePipe({
  validators: [
    // Reject files larger than 1MB
    new MaxFileSizeValidator({ maxSize: 1 * 1024 * 1024 }),
    // Accept JPEG and PNG images only
    new FileTypeValidator({ fileType: /image\/(jpeg|png)/ }),
  ],
});
Define fileValidatorPipe to validate that the uploaded file is either a JPEG or a PNG image, and that the file does not exceed 1MB.
// gemini.controller.ts
@ApiConsumes('multipart/form-data')
@ApiBody({
schema: {
type: 'object',
properties: {
prompt: {
type: 'string',
description: 'Prompt',
},
file: {
type: 'string',
format: 'binary',
description: 'Binary file',
},
},
},
})
@Post('text-and-image')
@UseInterceptors(FileInterceptor('file'))
async generateTextFromMultiModal(
@Body() dto: GenerateTextDto,
@UploadedFile(fileValidatorPipe)
file: Express.Multer.File,
): Promise<GenAiResponse> {
return this.service.generateTextFromMultiModal(dto.prompt, file);
}
file is the key that provides the binary file in the form data.
Example 3: Analyze two images
This example is similar to example 2, except that it needs a prompt and two images to compare and contrast.
// gemini.service.ts
async analyzeImages({ prompt, firstImage, secondImage }: AnalyzeImage): Promise<GenAiResponse> {
try {
const contents = createContent(prompt, firstImage, secondImage);
const { totalTokens } = await this.proVisionModel.countTokens({ contents });
const result = await this.proVisionModel.generateContent({ contents });
const response = await result.response;
const text = response.text();
return { totalTokens, text };
} catch (err) {
if (err instanceof Error) {
throw new InternalServerErrorException(err.message, err.stack);
}
throw err;
}
}
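The AnalyzeImage type is not shown in this post either; here is a minimal sketch inferred from the destructuring in analyzeImages and the controller call below (the file name is hypothetical):
// analyze-image.interface.ts - an assumed shape, inferred from its usage
export interface AnalyzeImage {
  prompt: string;
  firstImage: Express.Multer.File;
  secondImage: Express.Multer.File;
}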
// gemini.controller.ts
@ApiConsumes('multipart/form-data')
@ApiBody({
schema: {
type: 'object',
properties: {
prompt: {
type: 'string',
description: 'Prompt',
},
first: {
type: 'string',
format: 'binary',
description: 'Binary file',
},
second: {
type: 'string',
format: 'binary',
description: 'Binary file',
},
},
},
})
@Post('analyse-the-images')
@UseInterceptors(
FileFieldsInterceptor([
{ name: 'first', maxCount: 1 },
{ name: 'second', maxCount: 1 },
]),
)
async analyseImages(
@Body() dto: GenerateTextDto,
@UploadedFiles()
files: {
first?: Express.Multer.File[];
second?: Express.Multer.File[];
},
): Promise<GenAiResponse> {
if (!files.first?.length) {
throw new BadRequestException('The first image is missing');
}
if (!files.second?.length) {
throw new BadRequestException('The second image is missing');
}
return this.service.analyzeImages({ prompt: dto.prompt, firstImage: files.first[0], secondImage: files.second[0] });
}
first is the key that provides the first binary file in the form data, and second is the key that provides the second binary file.
Test the endpoints
I can test the endpoints with Postman or the Swagger documentation after starting the application:
npm run start:dev
The Swagger documentation is available at http://localhost:3000/api.
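The endpoints can also be exercised from the command line. The routes and field names come from the controller above; the prompts and file paths below are placeholders:
curl -X POST http://localhost:3000/gemini/text \
  -H 'Content-Type: application/json' \
  -d '{ "prompt": "Tell me a joke" }'

curl -X POST http://localhost:3000/gemini/text-and-image \
  -F 'prompt=Describe this image' \
  -F 'file=@./some-image.png'

curl -X POST http://localhost:3000/gemini/analyse-the-images \
  -F 'prompt=Compare and contrast these two images' \
  -F 'first=@./first-image.png' \
  -F 'second=@./second-image.png'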
(Bonus) Deploy to Google Cloud Run
Install the gcloud CLI on the machine according to the official documentation. On my machine, the installation path is ~/google-cloud-sdk.
Then, I open a new terminal and change to the root of the project. On the command line, I deploy the application and set the environment variables in the same command:
$ ~/google-cloud-sdk/bin/gcloud run deploy \
--update-env-vars GEMINI_API_KEY=<replace with your own key>,GEMINI_PRO_MODEL=gemini-pro,GEMINI_PRO_VISION_MODEL=gemini-pro-vision
If the deployment is successful, the NestJS application runs on Google Cloud Run. Cloud Run supplies the PORT environment variable at runtime, and env.config.ts already reads it, so no extra port configuration is needed.
This is the end of the blog post on building Generative AI examples with NestJS and the Gemini API. I hope you like the content and continue to follow my learning experience in Angular, NestJS, and other technologies.
Resources:
1. GitHub Repo: https://github.com/railsstudent/nestjs-gemini-api-demo
2. Node.js Gemini tutorial: https://ai.google.dev/tutorials/node_quickstart
3. Cloud Run deployment documentation: https://cloud.google.com/run/docs/deploying-source-code