- Search for image/text and short video materials.
- Efficiently analyze image/text and short video content, integrating scattered information.
- Provide content sources and decompose image/text and short video information, presenting information through content cards.
- Generate customized search results based on user interests and needs from image/text and short video content.
- Local deployment, enabling offline content search and Q&A for private data.
QMedia is an open-source multimedia AI content search engine that provides rich information extraction methods for image/text and short video content. It integrates unstructured image/text and short video information to build a multimodal RAG content Q&A system. The aim is to share and exchange ideas on AI content creation in an open-source manner.
Share QMedia with your friends.
Spark new ideas for content creation.
Join our Discord community!
Join our WeChat group!
- Display image/text and video content in the form of cards
- Web Service inspired by the XHS (Xiaohongshu) web version, implemented with TypeScript, Next.js, TailwindCSS, and Shadcn/UI.
- RAG Search/Q&A Service and Image/Text/Video Model Service implemented with a Python framework and LlamaIndex.
- The Web Service, RAG Search/Q&A Service, and Image/Text/Video Model Service can be deployed separately, allowing flexible deployment based on available resources, and can be embedded into other systems for image/text and video content extraction.
- Search for image/text and short video materials.
- Extract useful information from image/text and short video content based on user queries to generate high-quality answers.
- Present content sources and the breakdown of image/text and short video information through content cards.
- Retrieval and Q&A rely on the breakdown of image/text and short video content, including image style, text layout, short video transcription, video summaries, etc.
- Support Google content search.
- Local deployment of various types of models.
- Separation from the RAG application layer, making it easy to swap in different models.
- Local model lifecycle management, configurable for manual or automatic release to reduce server load.
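The manual/automatic release described above can be pictured with a small registry that loads models on first use and drops ones that have been idle too long. This is a minimal sketch under stated assumptions, not QMedia's implementation: `ModelRegistry`, `loader`, and `idle_seconds` are hypothetical names, and `loader` stands in for whatever call actually loads a model.

```python
import time
from typing import Any, Callable, Dict, List


class ModelRegistry:
    """Hypothetical lazy-loading registry with manual and idle-timeout release."""

    def __init__(self, loader: Callable[[str], Any], idle_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # stand-in for the real model-loading call
        self._idle_seconds = idle_seconds
        self._clock = clock            # injectable so the timeout can be tested
        self._models: Dict[str, Any] = {}
        self._last_used: Dict[str, float] = {}

    def get(self, name: str) -> Any:
        # Load on first use, then record the access time.
        if name not in self._models:
            self._models[name] = self._loader(name)
        self._last_used[name] = self._clock()
        return self._models[name]

    def release(self, name: str) -> None:
        # Manual release: drop the reference so memory can be reclaimed.
        self._models.pop(name, None)
        self._last_used.pop(name, None)

    def release_idle(self) -> List[str]:
        # Automatic release: drop every model unused for longer than idle_seconds.
        now = self._clock()
        idle = [n for n, t in self._last_used.items() if now - t > self._idle_seconds]
        for name in idle:
            self.release(name)
        return idle

    def loaded(self) -> List[str]:
        return sorted(self._models)
```

In a server, `release_idle()` would typically be called from a periodic background task; injecting the clock keeps the timeout logic testable without sleeping.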
Language Models:
- Support local Ollama model switching.
- llama3:8b-instruct: lightweight LLM for local deployment.
- llama3:70b-instruct: ranked eighth among open-source LLMs.
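Switching between local Ollama models comes down to changing the `model` field of a request to Ollama's REST API. The sketch below builds a non-streaming `/api/generate` request against Ollama's default local address; the `MODEL_PROFILES` mapping and its `light`/`large` keys are hypothetical, and the tags must match models you have actually pulled with `ollama pull`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical profile names mapped to the model tags listed above.
MODEL_PROFILES = {
    "light": "llama3:8b-instruct",
    "large": "llama3:70b-instruct",
}


def build_generate_request(prompt: str, profile: str = "light",
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the chosen model."""
    payload = {"model": MODEL_PROFILES[profile], "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, profile: str = "light") -> str:
    """Send the request and return the generated text (needs a running Ollama)."""
    with urllib.request.urlopen(build_generate_request(prompt, profile)) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from sending makes the model choice easy to inspect or swap without a running server.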
Feature Embedding Models:
- Image Embedding: CLIP Encoder, which converts images into feature vectors in an embedding space shared with text.
- Text Embedding: BGE Encoder, a multilingual embedding model that converts text into feature vectors; the local model is aligned with the GPT encoder.
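Because CLIP places images and text in one vector space, a text query can be compared directly with image vectors, usually by cosine similarity. The sketch below ranks images that way; the two-dimensional vectors are toy stand-ins (real CLIP vectors have hundreds of dimensions), and the query must be encoded with CLIP's text encoder, since vectors from different models such as BGE are not comparable with CLIP image vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


def rank_images(query_vec: Sequence[float],
                image_vecs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank images by similarity to a text query embedded in the same space."""
    scores = [(name, cosine(query_vec, vec)) for name, vec in image_vecs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy example: a query vector close to "cat.jpg" ranks it first.
ranked = rank_images([1.0, 0.0], {"cat.jpg": [0.9, 0.1], "car.jpg": [0.1, 0.9]})
```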
Image Models:
- Image Text OCR: the OCR model from QAnything, a local knowledge-base Q&A system.
Visual Understanding Models:
- llava-llama3: a GPT-4V-level visual understanding model, deployed locally via Ollama.
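Ollama's `/api/generate` endpoint accepts an optional `images` list of base64-encoded images for multimodal models such as llava-llama3. The sketch below only builds such a request; the function name and the sample question are illustrative assumptions, not QMedia's own code.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_vision_request(image_bytes: bytes, question: str,
                         model: str = "llava-llama3",
                         url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build an /api/generate request that attaches one image for a vision model."""
    payload = {
        "model": model,
        "prompt": question,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a running Ollama server, the request can be sent with `urllib.request.urlopen`, and the answer is in the `response` field of the returned JSON.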
Video Models:
- Video Transcription:
- Faster Whisper: fast video transcription that can run on a local CPU.
- LLM-based Short Video Content Summarization
- Identification of highlights in short videos
- Recognition of short video style types
- Analysis and breakdown of short video content
- Support local Ollama model switching.
- Image/Text and Short Video Content Analysis and Viral Content Breakdown
- Search for Similar Image/Text/Video
- Card Image/Text Content Generation
- Short Video Content Editing
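The LLM-based short video summarization and highlight identification listed under Video Models start from a transcript. faster-whisper's `WhisperModel.transcribe` yields segments carrying `start`, `end`, and `text`; the sketch below assumes they have been converted to plain `(start, end, text)` tuples and groups them into timestamped prompts that fit a character budget. The prompt wording and `max_chars` are hypothetical choices, not QMedia's.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def fmt_ts(seconds: float) -> str:
    """Format seconds as mm:ss."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"


def build_summary_prompts(segments: Sequence[Segment], max_chars: int = 2000) -> List[str]:
    """Group timestamped segments into chunks and wrap each in a summary prompt."""
    chunks: List[List[str]] = [[]]
    size = 0
    for start, end, text in segments:
        line = f"[{fmt_ts(start)}-{fmt_ts(end)}] {text.strip()}"
        # Start a new chunk when adding this line would exceed the budget.
        if chunks[-1] and size + len(line) > max_chars:
            chunks.append([])
            size = 0
        chunks[-1].append(line)
        size += len(line) + 1  # +1 for the joining newline
    header = ("Summarize this short-video transcript and list its highlights "
              "with timestamps:\n")
    return [header + "\n".join(lines) for lines in chunks if lines]
```

Keeping timestamps in the prompt lets the model point to where each highlight occurs; each prompt would then be sent to whichever local Ollama model is selected.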
QMedia services: depending on available resources, they can all be deployed locally, or the model services can be deployed in the cloud.
- Multimodal Model Service (mm_server):
  - Multimodal model deployment and API calls
  - Ollama LLM models
  - Image models
  - Video models
  - Feature embedding models
- Content Search and Q&A Service (mmrag_server):
  - Content card display and query
  - Image/Text/Short Video content extraction, embedding, and storage service
  - Multimodal data RAG retrieval service
  - Content Q&A service
- Web Service (qmedia_web): Language: TypeScript; Framework: Next.js; Styling: Tailwind CSS; Components: shadcn/ui
Running mm_server + qmedia_web + mmrag_server together provides web page content display, content RAG search and Q&A, and the model service.
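The Content Q&A step of mmrag_server answers from retrieved nodes while keeping track of which content card each came from. QMedia uses LlamaIndex, which assembles prompts its own way; the library-free sketch below only illustrates the idea of grounding an answer in numbered sources. The node shape (`id`, `text`) and the prompt wording are hypothetical.

```python
from typing import Dict, List, Sequence


def build_rag_prompt(question: str, nodes: Sequence[Dict[str, str]], top_k: int = 3) -> str:
    """Assemble retrieved nodes into a grounded Q&A prompt with numbered sources.

    Each node is a hypothetical dict with "id" and "text" keys, already sorted
    by retrieval score (best first).
    """
    context_lines: List[str] = []
    for i, node in enumerate(nodes[:top_k], start=1):
        # The card id lets the web page map each citation back to a content card.
        context_lines.append(f"[{i}] (card {node['id']}) {node['text']}")
    return (
        "Answer the question using only the sources below and cite them as [n].\n\n"
        "Sources:\n" + "\n".join(context_lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the sources makes it straightforward to display the cited content cards next to the generated answer.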
- Service Startup Process (each service keeps running, so start each one in its own terminal):

```bash
# Start mm_server service
cd mm_server
source activate qllm
python main.py

# Start mmrag_server service
cd mmrag_server
source activate qmedia
python main.py

# Start qmedia_web service
cd qmedia_web
pnpm dev
```
- Using the features via the web page
During startup, mmrag_server reads pseudo data from assets/medias and assets/mm_pseudo_data.json and calls mm_server to extract and structure the information from image/text and short video content into node information, which is then stored in the db. Retrieval and Q&A are based on the data in the db.
```text
# assets file structure
assets
├── mm_pseudo_data.json # Content card data
└── medias # Image/Video files
```
To use your own data, replace the contents of assets and delete the previously stored db file:
- assets/medias contains image/video files, which can be replaced with your own image/video files.
- assets/mm_pseudo_data.json contains content card data, which can be replaced with your own content card data.

When the service starts, the models automatically extract the information and store it in the db.
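The startup flow described above (content cards, extraction via mm_server, nodes, db) can be sketched as a small ingestion loop. Everything specific here is an assumption: the actual db type and the schema of mm_pseudo_data.json are not documented in this section, so the sketch uses an in-memory SQLite table and cards with a hypothetical `id` field, and `extract` stands in for the call to mm_server.

```python
import sqlite3
from typing import Callable, Dict, List


def ingest_cards(cards: List[Dict], extract: Callable[[Dict], List[str]],
                 conn: sqlite3.Connection) -> int:
    """Turn each content card into text nodes and store them; return the node count.

    `extract` is a hypothetical stand-in for mm_server (OCR, captioning,
    transcription, ...) that returns plain-text snippets for one card.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes (card_id TEXT, idx INTEGER, text TEXT)"
    )
    count = 0
    for card in cards:
        for idx, text in enumerate(extract(card)):
            conn.execute("INSERT INTO nodes VALUES (?, ?, ?)", (card["id"], idx, text))
            count += 1
    conn.commit()
    return count
```

This also shows why the stored db has to be deleted after replacing assets: nodes already in the table would otherwise remain alongside the new ones.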
The mm_server local image/text/video information extraction service can also be used independently: as a standalone image encoding, text encoding, video transcription, and image OCR service, accessible via API in any scenario.
```bash
# Start mm_server service independently
cd mm_server
python main.py
# uvicorn main:app --reload --host localhost --port 50110
```
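Before calling mm_server from another program, it helps to confirm the service is up. The sketch below only checks that something accepts TCP connections on the port from the uvicorn command above (50110); it does not call any particular API endpoint, since the endpoints are not listed in this section.

```python
import socket


def is_listening(host: str = "localhost", port: int = 50110, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection tries every address the host name resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```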
API Content:
mm_server and mmrag_server can be used together to perform content extraction and RAG retrieval via APIs in a pure Python environment.
```bash
# Start mmrag_server service independently
cd mmrag_server
python main.py
# uvicorn main:app --reload --host localhost --port 50110
```
API Content:
QMedia is licensed under the MIT License.
Thanks to QAnything for strong OCR models.
Thanks to llava-llama3 for strong LLM vision models.