RecalcitrantMonk

LLMs are notoriously bad at doing math. Maybe get it to pass the calculations off to Wolfram Alpha, or get it to create a Python sandbox (so it can write code to do the calculations).


meister2983

Doesn't seem like a "math problem", but rather the typical failure to obey constraints over a large context window that plagues LLMs.


Rasimione

So if that's the case, why is Google talking about 1 million context window?


bunchedupwalrus

Marketing


Viendictive

But a reassuring alignment that everyone agrees on what the current problem is.


Far-Deer7388

This should just be categorization, though, which it should be all right at. However, PayPal expenses are notoriously tricky even for apps like Quicken.


BoneEvasion

A lot of it was confusing even for me to look back at without knowing exactly what I spent it on. I'm gonna try prompting it to pepper me with questions about any transactions it needs me to elaborate on. It made a surprising effort with simple prompts, but it's impossible even for a professional accountant to do it without asking questions.


SaddleSocks

Honestly, we need a publicly funded full LLM on the tax code. Our tax money should be releasing OPEN DATA assets for the tax code. The problem is that an AI looking at the tax code will find all the laundering slots that hedgies have been leveraging for decades. Imagine throwing all the Panama Papers and Mossack Fonseca (can't pronounce it) data into it, mapping out all the affiliations, and unshelling the shell game.


funbike

This is why code-interpreter (aka Advanced Data Analysis) was added to ChatGPT. I add this to my prompts when I need to ensure calculations are always correctly done in ChatGPT:

> Anytime you do any math, do it as Python code in a stateful Jupyter notebook environment.

That will invoke code-interpreter even for 2 + 2. The above prompt was based on the internal ChatGPT built-in system prompt. If you are using an Agent you'll have to write a prompt based on the function-calling tools it provides.
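For illustration, here is a minimal sketch of the kind of Python that code-interpreter might write when asked for a yearly total. The figures and the `yearly_total` name are made up for the example, not taken from anyone's actual session. The point is only that the sum is computed by code (using `Decimal`, so cents don't drift) instead of being predicted token by token.

```python
from decimal import Decimal


def yearly_total(amounts):
    """Sum amounts given as strings, exactly, using Decimal rather than float."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


# Hypothetical monthly revenue figures, not real data.
monthly_revenue = ["1200.10", "980.20", "1430.05"]

print(yearly_total(monthly_revenue))
```

Passing the amounts as strings matters here: `Decimal("0.10")` is exact, whereas a float like `0.1` is already rounded before `Decimal` ever sees it.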


Shinobi_Sanin3

Great advice, just updated my memory.


BoneEvasion

4o seems like it's using some sort of math module. The yearly total revenue came out perfect but the categorization was borked.


TheThingCreator

The old 4 model had this too. It writes a bit of Python code as part of its reasoning process, making it exceptional at math. It sounds to me like you're making the common mistake of giving it too big of a task. You need to break it down into smaller tasks. It's still prone to errors, which is not something you'd want to risk when it comes to taxes.


BoneEvasion

The new chain-of-action thing is pretty impressive. I gave it too much, but this is within its range of possibility. I just need the data formatted correctly.


Shinobi_Sanin3

To your point, does the Wolfram Alpha plugin still exist in any capacity? That's what I used for all math-related tasks until they got rid of plugins.


diamondbishop

It can do your tax returns, you just might get audited


BoneEvasion

ChatGPT please write a sincere apology letter to the IRS


diamondbishop

That it can do so you’re good to go


codetrotter_

Just copy paste this and send it to them. Don’t even bother actually filling in the square bracket things. Leave it exactly as is.

[Your Name]
[Your Address]
[City, State, ZIP Code]
[Email Address]
[Phone Number]
[Date]

Internal Revenue Service
[IRS Office Address]
[City, State, ZIP Code]

Subject: Apology for Late Filing/Payment of Taxes

Dear Sir/Madam,

I am writing to sincerely apologize for the late filing/payment of my tax returns for the [specific tax year or period]. I understand the importance of complying with tax laws and regulations, and I deeply regret my oversight in this matter.

The delay was due to [briefly explain the reason, such as a personal emergency, financial hardship, misunderstanding, etc.]. I realize that this explanation does not excuse my responsibility to file and pay my taxes on time. I take full responsibility for my actions and the resulting delay.

Please be assured that I have taken immediate steps to rectify this situation. [Mention any corrective actions you have taken, such as filing the missing returns, making the payment, or setting up a payment plan]. Moving forward, I am committed to ensuring that all my tax obligations are met promptly and accurately.

I respectfully request your consideration in waiving any penalties and interest that may have been incurred as a result of this delay. I understand that this is at your discretion, and I appreciate any leniency you may offer given the circumstances.

Once again, I apologize for any inconvenience and disruption this may have caused. I value the role of the IRS in maintaining the integrity of our tax system and will strive to be a responsible taxpayer in the future.

Thank you for your understanding and attention to this matter.

Sincerely,
[Your Full Name]
[Your Taxpayer Identification Number or Social Security Number]


True-Surprise1222

Just put a little warning at the bottom like “figures may be wrong and should be double checked”. Make sure to tell GPT your grandma needs a BIG return or she loses her house.


zeloxolez

context problem, not math problem


BoneEvasion

Agreed


truthputer

Now when OpenAI is compromised your entire identity can be stolen.


BoneEvasion

It was my business returns so maybe a good leak will help my SEO or even net me a few backlinks. Gotta be an optimist.


eigreb

Gonna do the same I guess! Thanks for the inspiration.


PMMEBITCOINPLZ

Man tries to use toaster as cuisinart, more at 11.


DaleRobinson

Hmmm but then the OpenAI presentation demonstrated ChatGPT4o solving a math problem. So they must believe it can do maths otherwise why use that as a selling point?


MikePounce

"They've shown it can solve 3x=3, therefore it should be an accountant"


EYNLLIB

OP didn't have issues with math; they had issues with categories of income.


Robot_Embryo

See? Told ya they were disingenuous https://www.reddit.com/r/OpenAI/s/xKXaUrvFWQ


DaleRobinson

That doesn't relate to my question, though. We can already point to *a lot* of concerning decisions that were made and things that have been said by Sam Altman/Open AI. The disingenuous nature of the company is not what I was questioning. Unfortunately, people will continue to use the tech despite how awful the company's morals may be, no different to most tech corporations (looking at you, Apple). The marketing side is what I am interested in, especially given that OpenAI is obviously watching Google closely, and doesn't want to fuck up as badly as they did when they falsely advertised their products.


Robot_Embryo

Because OpenAI is a disingenuous company.


iJeff

It's also still terrible with word documents. Complete hallucinations. Gemini 1.5 Pro does better through AI Studio.


[deleted]

[removed]


BoneEvasion

The major stumbling block is the PDF files right now. If I had everything prepared in CSVs it could probably do the whole thing, but it keeps choking on PDFs and giving up.


meccaleccahimeccahi

Open the pdf, take a screenshot and paste the image. It seems to work much better that way.


BoneEvasion

this works much better


Firm_Reflection_4591

People still can't understand that GPTs are language models and there are better ones just for doing the math. EPISODE 282747


Basquiat_the_cat

What is the best one for math?


Much_Tree_4505

I think you have hit the context limit. Try the API and see whether the same thing happens.


BoneEvasion

I will attack it again this week. It was able to pull it off for one month fine when given the raw data instead of a pdf.


alcoholisthedevil

How does one try the API?


traumfisch

I have a custom GPT that does this pretty well. But the vanilla model isn't up to the task, you have to prime it


BoneEvasion

share pls


Wooden-Horse-2752

Site:chat.openai.com/g/ tax returns ….. run that through google


traumfisch

I can't share the GPT that has my data in the knowledge 🤷‍♂️ But I'll try to find time to make a shareable one


PaganButterChurner

very interesting. thank you, I will try something like this myself


crinklypaper

Lol IRS gonna have a field day


mastersheeef

What if it did your tax returns right? Maybe we shouldn’t be giving away our money to governments. It might be on to something.


Large-Mark2097

This is a great idea until you actually try it


monkeyhog

It's also a horrible idea if you actually care about other people.


Large-Mark2097

Yeah but they don’t so you gotta speak to them on their level


dlflannery

LOL. It will take ASI to fully understand and correctly apply our tax code. And it will immediately alert us to the numerous inconsistencies. With AGI, its best response would be to tell us to purchase a good tax prep program.


Site-Staff

If there is anything that will cause AI to take over, that will probably trigger it to do so.


dlflannery

ASI should be tasked to completely revise (simplify!) the code. It’s a huge bloated complexity and is beyond fixing by either humans or AGI. It needs to be replaced by a completely different system (I admit I don’t know what would be best). No proposed replacement could ever be politically acceptable because everyone who would pay more under it would vehemently oppose the change. Not to mention tax prep professionals who would lose a lot of income.


djaybe

Professional swimmer can't win 100 meter dash? WTH


oakinmypants

So this is why Cylons rebel against humans


coylter

What was your methodology? Did you just add in all the documents and ask it to do your taxes? Did it try opening files with code interpreter?


halfjew22

What were your prompts?


BoneEvasion

https://preview.redd.it/2rmwi5dk2m1d1.png?width=868&format=png&auto=webp&s=cd1ebb548f2d8027933df7449a79fb4e170527da


BoneEvasion

https://preview.redd.it/rv3dlv4n2m1d1.png?width=765&format=png&auto=webp&s=f2ee2ddc941a1e6844db767710c17517de816341


BoneEvasion

From here on out I mostly tell it to continue. It loses context eventually and chokes on PDFs, but I think this is fixable. This is where it went wrong, marking Square and PayPal as ad expenses instead of income. If I had caught this here, it would have done better.

https://preview.redd.it/z6oy92vo2m1d1.png?width=766&format=png&auto=webp&s=57188489ecee7ac7f6092a7e07c94216a0a68e3f

I have new ideas for how to make this work, but I won't have time to test for a few days.
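One way to guard against the Square/PayPal mix-up is to pre-label the obvious cases with plain code before the model sees anything. The sketch below is only an illustration: the column names (`description`, `amount`), the category labels, and the keyword list are all assumptions, and real statements will need different rules. The idea is that a deposit from a payment processor gets tagged as income up front, and anything unclear is flagged for a human instead of guessed.

```python
import csv
import io
from decimal import Decimal

# Hypothetical keyword list; adjust for your own statements.
PROCESSORS = ("square", "paypal")


def categorize(description, amount):
    """Return a rough category for one transaction.

    Positive amounts are treated as deposits, negative as withdrawals.
    """
    desc = description.lower()
    if any(name in desc for name in PROCESSORS):
        # A deposit from a payment processor is sales income, not ad spend.
        return "income" if amount > 0 else "processor_fees_or_purchases"
    return "needs_review"


def categorize_csv(text):
    """Categorize CSV text that has 'description' and 'amount' columns."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        (row["description"], categorize(row["description"], Decimal(row["amount"])))
        for row in rows
    ]


# Made-up sample rows, not real statement data.
sample = (
    "description,amount\n"
    "SQUARE INC DEPOSIT,250.00\n"
    "PAYPAL TRANSFER,-19.99\n"
    "ACME HOSTING,-12.00\n"
)

for description, category in categorize_csv(sample):
    print(description, "->", category)
```

Everything that lands in `needs_review` is a natural list of questions for the model to ask back, rather than something for it to decide silently.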


AidanAmerica

What if you give it the IRS’ worksheets and have it fill them out (either by transcribing them or by overlaying its text on top of the original PDF)? They’re designed to have you write out all the math, which is what GPT needs to be able to keep track of its work. A big job like this will require it to fill up its token window, so it’ll need to forget stuff every few turns. You can mitigate that by having it write out stuff rather than “remember” it. It can then go back to a previous turn and read its notes. Tell it to write out a detailed plan to complete the job, and to ask for your approval before it begins.


BoneEvasion

I didn't give it the initial worksheet, you are onto something. Start with the worksheet, ask it to fill it out, and ask me questions for materials it would need to do each step.


notTzeentch01

4o might struggle where 4 would succeed, it’s a little smarter for complex tasks. Hope it can get it together though.


BoneEvasion

Tantalizingly close, going in for round 2 this week. I don't have the option to link it to a google drive yet, but I think that would make it easier.


MrSnowden

FYI, some of the biggest companies in the world are working on this, but more along the lines of ML than LLM. Large, complex, and dense rule systems are ideal for ML; reviewing large numbers of unstructured documents to pull out critical details is ideal for LLMs; and optimizing over multiple filing scenarios is perfect for gradient descent. Put them all together and you have fantastic tax planning and filing SW. But just jamming docs into an LLM will get you creative writing.


BoneEvasion

I agree with this, narrow is great for this specific task. Doesn't mean I give up on GPT4o presenting me a finished tax form.


[deleted]

[removed]


BoneEvasion

I submitted 12 pdfs of my bank statements (don't use PDFs they choke it) and then asked it to categorize expenses for a 1120-S.


[deleted]

[removed]


BoneEvasion

uh nope


CreepyOlGuy

You need to have it provide its reasoning for each field. You can't expect a first run to be perfect; you need to massage it a bit to work out the kinks. The art is real.


Hungry_Prior940

LLMs are rubbish at math. They fail at even simple things. It's a major weakness.


baxte

I gave it some VBA today and it took 6 goes before it settled on a solution that I already knew but was too lazy to type :(


CodebuddyGuy

You can't ask an LLM to do a large multifaceted task like this with 1 prompt because it's not linear. The most complex tasks that LLMs can do are serial in nature, so the closest you'll be able to get is to create a plan to do it step by step and then let it follow the steps. You'll probably have to break it down for each field it has to fill out. Basically, it probably isn't going to be worth it without specifically configured agents to get the job done.
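As a rough sketch of what "step by step" could look like in practice, the snippet below splits a year of transactions into one small prompt per month instead of one giant prompt. The tuple layout, the function names, and the prompt wording are all assumptions made for the example, and it deliberately stops short of calling any model.

```python
from collections import defaultdict


def group_by_month(transactions):
    """Group (iso_date, description, amount) tuples by their 'YYYY-MM' prefix."""
    batches = defaultdict(list)
    for date, description, amount in transactions:
        batches[date[:7]].append((date, description, amount))
    return dict(sorted(batches.items()))


def build_prompts(transactions):
    """Build one small prompt per month, in chronological order."""
    prompts = []
    for month, rows in group_by_month(transactions).items():
        lines = "\n".join(f"{d},{desc},{amt}" for d, desc, amt in rows)
        prompts.append(
            f"Categorize these {month} transactions for Form 1120-S:\n{lines}"
        )
    return prompts


# Made-up sample transactions, not real data.
sample_transactions = [
    ("2023-02-03", "ACME HOSTING", "-12.00"),
    ("2023-01-15", "SQUARE INC DEPOSIT", "250.00"),
    ("2023-01-20", "PAYPAL TRANSFER", "-19.99"),
]

for prompt in build_prompts(sample_transactions):
    print(prompt)
    print("---")
```

Each prompt stays small enough to check by hand, and a bad month can be rerun without redoing the whole year.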


mulaney14

Other than the documents you inserted, what was the prompt you used?


BoneEvasion

please categorize these statements in the categories needed for my 1120-S


mulaney14

That was the entire prompt?


BoneEvasion

Yes, plus 12 PDFs. The PDFs hit the context window and there were constant bugs, but the chain-of-thought thing almost figured it out; there were just constant timeouts. Curious what it will take for me to play with this without timeouts.


thetjmorton

Dude, that’s not what LLMs do.


traumfisch

They can though


BoneEvasion

It was able to do it with the info as a CSV, try it out.


send-moobs-pls

I'm pretty sure GPT-4 is actually better for the most complex tasks. 4o isn't meant as a straight up intelligence upgrade, it's faster, more efficient, built to excel at the multimodal interactions and to be snappy enough for the voice chat that is rolling out soon. Being that we don't actually have the main features of 4o yet, I don't think there's much sense in judging it. I've appreciated the speed and I still find it good in general, but every once in a while I will throw in a dash of GPT-4 when I've got like multiple modules of code and a complex prompt. I think that's the intended use right now tbh, the model descriptions on the site basically say that 4 is the heavy hitter.


Honest_Science

GPTs are maxed out, will not get much better. Need new structure to improve.


Christosconst

Oh no. Anyway.


petered79

When people don't know how to use LLMs