Image token process malfunction

It seems that in model_utils.py, pllava_answer passes the prompt to the processor with only one <image> token, however many entries img_list has.
https://github.com/magic-research/PLLaVA/blob/6f49fd281a855c4fa3103595ba2800a4dce7ec88/tasks/eval/model_utils.py#L139-L146

However, in eval_utils.py, the answer method adds <image> tokens until there is one per entry in img_list, so multiple <image> tokens are passed to the model for video inference. Is the missing padding a bug in model_utils.py?
https://github.com/magic-research/PLLaVA/blob/6f49fd281a855c4fa3103595ba2800a4dce7ec88/tasks/eval/eval_utils.py#L402-L410

	# Excerpt from tasks/eval/model_utils.py (the function continues past this point)
	def pllava_answer(conv: Conversation, model, processor, img_list, do_sample=True, max_new_tokens=200, num_beams=1, min_length=1, top_p=0.9,
	                  repetition_penalty=1.0, length_penalty=1, temperature=1.0, stop_criteria_keywords=None, print_res=False):
	    # torch.cuda.empty_cache()
	    prompt = conv.get_prompt()
	    inputs = processor(text=prompt, images=img_list, return_tensors="pt")
	    if inputs['pixel_values'] is None:
	        inputs.pop('pixel_values')
	    inputs = inputs.to(model.device)

	# Excerpt from tasks/eval/eval_utils.py (the method continues past this point)
	def answer(self, conv: Conversation, img_list, max_new_tokens=200, num_beams=1, min_length=1, top_p=0.9,
	           repetition_penalty=1.0, length_penalty=1, temperature=1.0):
	    torch.cuda.empty_cache()
	    prompt = conv.get_prompt()
	    if prompt.count(conv.mm_token) < len(img_list):
	        diff_mm_num = len(img_list) - prompt.count(conv.mm_token)
	        for i in range(diff_mm_num):
	            conv.user_query("", is_mm=True)
	        prompt = conv.get_prompt()
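
To make the difference concrete, here is a minimal sketch of the padding step from eval_utils.py pulled out into a helper that pllava_answer could call before building its inputs. ToyConversation and pad_mm_tokens are hypothetical names made up for this illustration; ToyConversation only mimics the three Conversation members used in the excerpts above (mm_token, user_query, get_prompt) and is not the repo's class. Whether pllava_answer actually needs this depends on what img_list holds there, which the excerpts do not show.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyConversation:
    """Hypothetical stand-in for the repo's Conversation (assumption, not the real class)."""
    mm_token: str = "<image>"
    turns: List[str] = field(default_factory=list)

    def user_query(self, text: str = "", is_mm: bool = False) -> None:
        # Prefix the turn with the multimodal token when it carries visual input.
        self.turns.append(f"{self.mm_token}{text}" if is_mm else text)

    def get_prompt(self) -> str:
        return "\n".join(self.turns)


def pad_mm_tokens(conv, img_list) -> str:
    """Add empty multimodal turns until the prompt has one mm_token per entry in img_list."""
    prompt = conv.get_prompt()
    missing = len(img_list) - prompt.count(conv.mm_token)
    for _ in range(max(missing, 0)):
        conv.user_query("", is_mm=True)
    return conv.get_prompt()


conv = ToyConversation()
conv.user_query("Describe the video.", is_mm=True)  # prompt starts with one <image>
frames = ["frame0", "frame1", "frame2"]              # placeholders for visual inputs
prompt = pad_mm_tokens(conv, frames)
print(prompt.count(conv.mm_token), len(frames))
```

In this toy setup the prompt ends up with as many <image> tokens as there are entries in frames, which mirrors what the answer method does and what pllava_answer currently skips.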

Image token process malfunction #78

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Image token process malfunction #78

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions