
Print and store how many tokens were used in memory/logs #322

Closed
AntonOsika opened this issue Jun 22, 2023 · 8 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@AntonOsika
Collaborator

That way, we can also store the token counts alongside benchmark results.

A huge increase in tokens will not be worth a minor improvement in benchmark results.

AntonOsika added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Jun 22, 2023
@yitron

yitron commented Jun 22, 2023

This was exactly what I was trying to decipher today. Unfortunately, the information available online on retrieving OpenAI usage is quite limited.

I tried a few approaches; the one below worked, but it relies on LangChain (i.e. it needs some memory):

import re
import decimal

import streamlit as st
from langchain.chat_models import ChatOpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks import get_openai_callback

def run_chain(k, max_tokens, model_name, docs, user_question):
    llm = ChatOpenAI(model_name=model_name, temperature=0, max_tokens=max_tokens)
    chain = load_qa_chain(llm, chain_type="stuff")
    with get_openai_callback() as cb:
        response = chain.run(input_documents=docs[:k], question=user_question)
        print(cb)  # prints token counts and total cost
        rounded_cost = extract_and_round_cost(cb)
        st.write("Using " + model_name + ", " + f"${rounded_cost}")
        st.write("---")
    return response

def extract_and_round_cost(cb):
    cb_str = str(cb)
    # Parentheses and the dollar sign must be escaped, or the regex never matches
    cost_line = re.search(r"Total Cost \(USD\): \$(.+)", cb_str)

    if cost_line:
        cost_str = cost_line.group(1)
        cost = decimal.Decimal(cost_str)
        decimal.getcontext().rounding = decimal.ROUND_HALF_UP
        rounded_cost = round(cost, 3)
        return float(rounded_cost)

This basically prints the token usage and calculates the cost.

output:

Tokens Used: 560
Prompt Tokens: 377
Completion Tokens: 183
Successful Requests: 1
Total Cost (USD): $0.0009315

Not sure if this is what you had in mind.
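As a side note, the string-parsing could be avoided entirely: the handler returned by LangChain's get_openai_callback exposes numeric attributes directly (cb.total_tokens, cb.prompt_tokens, cb.completion_tokens, cb.total_cost), so the rounding can operate on the raw float. A minimal sketch (round_cost is a hypothetical helper name, not a LangChain API):

```python
import decimal

def round_cost(total_cost: float, places: int = 3) -> float:
    """Round a raw USD cost to `places` decimal places, half-up."""
    cost = decimal.Decimal(str(total_cost))
    quantum = decimal.Decimal(1).scaleb(-places)  # e.g. Decimal('0.001')
    return float(cost.quantize(quantum, rounding=decimal.ROUND_HALF_UP))

# With LangChain this would be: rounded = round_cost(cb.total_cost)
print(round_cost(0.0009315))  # 0.001
```

Going through decimal.Decimal(str(...)) avoids the binary-float representation issues you would hit with round() on the float itself.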

@shubham-attri
Contributor

@yitron and @AntonOsika, I looked into your suggestion. tiktoken, a module by OpenAI, is a fast BPE tokeniser for use with OpenAI's models. The code below prints the token count and stores the value in a file token_count_log.txt, which we can later use to plot graphs for benchmarking the results.

import re
import decimal

import tiktoken
import streamlit as st
from langchain.chat_models import ChatOpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks import get_openai_callback

def run_chain(k, max_tokens, model_name, docs, user_question):
    llm = ChatOpenAI(model_name=model_name, temperature=0, max_tokens=max_tokens)
    chain = load_qa_chain(llm, chain_type="stuff")
    with get_openai_callback() as cb:
        response = chain.run(input_documents=docs[:k], question=user_question)
        print(cb)
        rounded_cost = extract_and_round_cost(cb)
        st.write("Using " + model_name + ", " + f"${rounded_cost}")
        st.write("---")

        # Count tokens in the response (tiktoken has no `count` helper;
        # encode with the model's encoding and take the length)
        encoding = tiktoken.encoding_for_model(model_name)
        token_count = len(encoding.encode(response))
        print("Token count:", token_count)

        # Store token count in logs
        log_token_count(token_count)

    return response

def extract_and_round_cost(cb):
    cb_str = str(cb)
    cost_line = re.search(r"Total Cost \(USD\): \$(.+)", cb_str)

    if cost_line:
        cost_str = cost_line.group(1)
        cost = decimal.Decimal(cost_str)
        decimal.getcontext().rounding = decimal.ROUND_HALF_UP
        rounded_cost = round(cost, 3)
        return float(rounded_cost)

def log_token_count(token_count):
    with open("token_count_log.txt", "a") as f:
        f.write(str(token_count) + "\n")

An example of the output in the log file and console:

Callback Object: <Callback object at 0x7f8a12b65470>
Using GPT-3.5, $0.035
Token count: 87

@AntonOsika I would love to work on this issue; please assign it to me.

@UmerHA
Collaborator

UmerHA commented Jun 27, 2023

@shubham-attri Just fork the repo, make the change, and submit a PR. That's the common way for open-source projects. :)

If you're not working on it, I'd quickly do the PR. Let me know!

@yitron

yitron commented Jun 28, 2023

Oops, just saw this! Is this assigned? If not, I can work on it.

@UmerHA
Collaborator

UmerHA commented Jun 28, 2023

Hey, I just went ahead and implemented it.

@UmerHA
Collaborator

UmerHA commented Jun 28, 2023

@yitron in open-source projects if an issue is not assigned, you can just do it. Don't wait for permission. ;)

@shubham-attri @yitron Feel free to ping me if you have any questions on contributing to open source!

@yitron

yitron commented Jun 29, 2023

Hey, got it! I had just assumed (without verifying) that it was already done, haha. Sorry!

AntonOsika added a commit that referenced this issue Jul 3, 2023
* Implemented logging token usage

Token usage is now tracked and logged into memory/logs/token_usage

* Step names are now inferred from function name

* Incorporated Anton's feedback

- Made LogUsage a dataclass
- For token logging, step name is now inferred via inspect module

* Formatted (black/ruff)

* Update gpt_engineer/ai.py

Co-authored-by: Anton Osika <anton.osika@gmail.com>

* formatting

---------

Co-authored-by: Anton Osika <anton.osika@gmail.com>
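The dataclass-plus-inspect approach described in the commit message could look roughly like this. All names below (TokenUsage, current_step_name, run_clarify_step) are illustrative, not the actual gpt-engineer implementation:

```python
import inspect
from dataclasses import dataclass

@dataclass
class TokenUsage:
    # Hypothetical fields; the real LogUsage dataclass may differ.
    step_name: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

def current_step_name() -> str:
    """Infer the calling function's name via the inspect module,
    so steps don't have to pass their own name explicitly."""
    return inspect.stack()[1].function

def run_clarify_step() -> TokenUsage:
    # In the real code the counts would come from the API response
    return TokenUsage(current_step_name(), prompt_tokens=377, completion_tokens=183)

usage = run_clarify_step()
print(usage.step_name, usage.total_tokens)  # run_clarify_step 560
```

Inferring the step name from the call stack keeps the logging call sites terse, at the cost of breaking if a step is renamed or wrapped in a decorator.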
@AntonOsika
Collaborator Author

Great job 🚀

70ziko pushed a commit to 70ziko/gpt-engineer that referenced this issue Oct 25, 2023
…gineer-org#438)
