Intel ARC GPU - llama_model_load: can not find preferred GPU platform #3437

Open
Xav-v opened this issue Aug 30, 2024 · 0 comments
Labels: bug (Something isn't working), unconfirmed

Comments

Xav-v commented Aug 30, 2024

LocalAI version:
master / 2.1.20

Environment, CPU architecture, OS, and Version:
Docker environment on Debian 12, with Arc A380 GPU passthrough

Describe the bug
Running a model with the llama-cpp backend (in my case Hermes-2-Pro-Mistral-7B.Q4_0.gguf) fails with the error llama_model_load: can not find preferred GPU platform.

To Reproduce
docker-compose file:

services:
  local-ai:
    container_name: local-ai
    image: quay.io/go-skynet/local-ai:master-sycl-f16-ffmpeg
    environment:
      - MODELS_PATH=/models
      - ZES_ENABLE_SYSMAN=1
      - DEBUG=true
      - GGML_SYCL_DEVICE=0
      - XPU=1
    volumes:
      - ./models:/models
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - "105"

Use the above-mentioned model (other models were tested with the same result) and make a chat request.
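For reference, a minimal chat request against LocalAI's OpenAI-compatible endpoint that triggers the failure; this assumes the API is reachable on port 8080 (the compose file above does not map a port, so adjust the host/port to your setup) and that the model name matches the GGUF file name:

    # hypothetical reproduction request; host/port depend on your deployment
    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "Hermes-2-Pro-Mistral-7B.Q4_0.gguf",
            "messages": [{"role": "user", "content": "Hello"}]
          }'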

Logs

local-ai    | 6:24PM INF [llama-cpp] Attempting to load
local-ai    | 6:24PM INF Loading model 'Hermes-2-Pro-Mistral-7B.Q4_0.gguf' with backend llama-cpp
local-ai    | 6:24PM DBG Loading model in memory from file: /models/Hermes-2-Pro-Mistral-7B.Q4_0.gguf
local-ai    | 6:24PM DBG Loading Model Hermes-2-Pro-Mistral-7B.Q4_0.gguf with gRPC (file: /models/Hermes-2-Pro-Mistral-7B.Q4_0.gguf) (backend: llama-cpp): {backendString:llama-cpp model:Hermes-2-Pro-Mistral-7B.Q4_0.gguf threads:20 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0007fe008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
local-ai    | 6:24PM INF [llama-cpp] attempting to load with AVX2 variant
local-ai    | 6:24PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-avx2
local-ai    | 6:24PM DBG GRPC Service for Hermes-2-Pro-Mistral-7B.Q4_0.gguf will be running at: '127.0.0.1:33811'
local-ai    | 6:24PM DBG GRPC Service state dir: /tmp/go-processmanager1147068894
local-ai    | 6:24PM DBG GRPC Service Started
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr I0000 00:00:1725042289.385811   17159 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr I0000 00:00:1725042289.385983   17159 ev_epoll1_linux.cc:125] grpc epoll fd: 3
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr I0000 00:00:1725042289.386121   17159 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr I0000 00:00:1725042289.387113   17159 ev_epoll1_linux.cc:359] grpc epoll fd: 5
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr I0000 00:00:1725042289.387396   17159 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stdout Server listening on 127.0.0.1:33811
local-ai    | 6:24PM DBG GRPC Service Ready
local-ai    | 6:24PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:Hermes-2-Pro-Mistral-7B.Q4_0.gguf ContextSize:8192 Seed:69892966 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:20 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/Hermes-2-Pro-Mistral-7B.Q4_0.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /models/Hermes-2-Pro-Mistral-7B.Q4_0.gguf (version GGUF V3 (latest))
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   0:                       general.architecture str              = llama
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   1:                               general.name str              = jeffq
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   4:                          llama.block_count u32              = 32
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  11:                          general.file_type u32              = 2
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32032]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32032]   = [0.000000, 0.000000, 0.000000, 0.0000...
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32032]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 32000
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  18:               tokenizer.ggml.add_bos_token bool             = true
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  19:               tokenizer.ggml.add_eos_token bool             = false
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - kv  21:               general.quantization_version u32              = 2
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - type  f32:   65 tensors
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - type q4_0:  225 tensors
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_loader: - type q6_K:    1 tensors
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_vocab: special tokens cache size = 35
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_vocab: token to piece cache size = 0.1641 MB
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: format           = GGUF V3 (latest)
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: arch             = llama
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: vocab type       = SPM
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_vocab          = 32032
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_merges         = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: vocab_only       = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_ctx_train      = 32768
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_embd           = 4096
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_layer          = 32
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_head           = 32
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_head_kv        = 8
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_rot            = 128
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_swa            = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_embd_head_k    = 128
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_embd_head_v    = 128
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_gqa            = 4
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_embd_k_gqa     = 1024
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_embd_v_gqa     = 1024
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: f_norm_eps       = 0.0e+00
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: f_clamp_kqv      = 0.0e+00
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: f_max_alibi_bias = 0.0e+00
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: f_logit_scale    = 0.0e+00
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_ff             = 14336
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_expert         = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_expert_used    = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: causal attn      = 1
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: pooling type     = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: rope type        = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: rope scaling     = linear
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: freq_base_train  = 10000.0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: freq_scale_train = 1
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: n_ctx_orig_yarn  = 32768
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: rope_finetuned   = unknown
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: ssm_d_conv       = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: ssm_d_inner      = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: ssm_d_state      = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: ssm_dt_rank      = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: ssm_dt_b_c_rms   = 0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: model type       = 7B
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: model ftype      = Q4_0
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: model params     = 7.24 B
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: model size       = 3.83 GiB (4.54 BPW) 
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: general.name     = jeffq
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: BOS token        = 1 '<s>'
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: EOS token        = 32000 '<|im_end|>'
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: UNK token        = 0 '<unk>'
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: LF token         = 13 '<0x0A>'
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: EOT token        = 32000 '<|im_end|>'
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llm_load_print_meta: max token length = 48
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_model_load: error loading model: can not find preferred GPU platform
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_load_model_from_file: failed to load model
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stderr llama_init_from_gpt_params: error: failed to load model '/models/Hermes-2-Pro-Mistral-7B.Q4_0.gguf'
local-ai    | 6:24PM DBG GRPC(Hermes-2-Pro-Mistral-7B.Q4_0.gguf-127.0.0.1:33811): stdout {"timestamp":1725042291,"level":"ERROR","function":"load_model","line":466,"message":"unable to load model","model":"/models/Hermes-2-Pro-Mistral-7B.Q4_0.gguf"}
local-ai    | 6:24PM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc = 

Additional context

Xav-v added the bug (Something isn't working) and unconfirmed labels on Aug 30, 2024