Major Frequency MS corresponds to #1233

Closed
Pushkaran-P opened this issue Mar 27, 2023 · 6 comments
@Pushkaran-P

Pushkaran-P commented Mar 27, 2023

Hi all,

  • Environment: Python 3.7, NeuralProphet 0.5.3 installed from PyPI with pip install neuralprophet==0.5.3
  • My data's major frequency isn't close to 100%, and I would like to know why. This is the log I get when running NeuralProphet(): "Major frequency MS corresponds to 89.091% of the data."
  • The data I'm working with:
    sample.csv
  • Code for replication
test_horizon = 3
ID = datasample[datasample.columns[0]].reset_index(drop=True)[0]
row = datasample.values[0][1:]
firstnonzeropos = row.nonzero()[0][0]
valdf = pd.DataFrame(row[firstnonzeropos:last_col_index].astype(float))
timedf = times[firstnonzeropos:].reset_index(drop=True)
df = pd.concat([timedf,valdf],axis=1)
df.columns=['ds','y']
df_train = df[:-test_horizon]
m = NeuralProphet(weekly_seasonality=False,daily_seasonality=False, yearly_seasonality="auto",
                 n_changepoints=30, n_lags=6, n_forecasts = 12+test_horizon,
                growth = "linear" )
metrics = m.fit(df_train, freq="MS",metrics=["MAE", "RMSE"])
print(metrics.tail(1))

Screenshot 2023-03-27 124553

future = m.make_future_dataframe(df_train, periods=m.n_forecasts)
forecast = m.predict(future)

Screenshot 2023-03-27 124850

Thanks in advance

@noxan
Collaborator

noxan commented Mar 28, 2023

@Pushkaran-P thanks for providing feedback. I've tried the code snippet and dataset you posted, but it fails with ValueError: could not convert string to float: v1. Also, when I try to dissect the code, it seems to me that the loaded data might have the wrong format. Make sure to have one column with the time series values called y and one time-related column called ds (no additional columns are allowed).
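
The two-column layout described above can be checked before fitting. The following is a minimal sketch using only pandas; `check_two_column_frame` is a hypothetical helper written for illustration and is not part of NeuralProphet.

```python
import pandas as pd


def check_two_column_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not a NeuralProphet API): enforce the ds/y layout."""
    if set(df.columns) != {"ds", "y"}:
        raise ValueError(
            f"expected exactly the columns 'ds' and 'y', got {list(df.columns)}"
        )
    out = df.copy()
    out["ds"] = pd.to_datetime(out["ds"])  # parse timestamps
    out["y"] = out["y"].astype(float)      # raises ValueError on strings such as "v1"
    return out[["ds", "y"]]


# A frame with only ds and y passes and gets proper dtypes.
good = check_two_column_frame(
    pd.DataFrame({"ds": ["2020-01-01", "2020-02-01"], "y": ["1", "2.5"]})
)

# A frame with an extra identifier column is rejected.
try:
    check_two_column_frame(
        pd.DataFrame({"id": ["v1"], "ds": ["2020-01-01"], "y": [1.0]})
    )
    rejected = False
except ValueError:
    rejected = True
```

Running a check like this first surfaces a stray identifier column (such as the one holding v1) as a clear message rather than a conversion error deeper in the pipeline.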

@noxan noxan self-assigned this Mar 28, 2023
@Pushkaran-P
Author

Pushkaran-P commented Mar 29, 2023

@noxan This is my sample code, and it runs without any errors. Could you check again? Also, could you kindly suggest ways to improve my forecast? Thanks in advance!
sample.csv

import pandas as pd
import numpy as np
import time

from neuralprophet import NeuralProphet, set_log_level
datasample = pd.read_csv('sample.csv')
datasample.drop(['Unnamed: 0'],axis=1,inplace=True)
datasample.fillna(0,inplace=True)
test_horizon = 3
ID = datasample[datasample.columns[0]].reset_index(drop=True)[0]
row = datasample.values[0][1:]
firstnonzeropos = row.nonzero()[0][0]
last_col_index = 58
valdf = pd.DataFrame(row[firstnonzeropos:last_col_index].astype(float))
times = pd.to_datetime(datasample.columns[1:last_col_index+1], format='%Y%m').to_frame(index=False)
timedf = times[firstnonzeropos:].reset_index(drop=True)
df = pd.concat([timedf,valdf],axis=1)
df.columns=['ds','y']
df_train = df[:-test_horizon]
m = NeuralProphet(weekly_seasonality=False,daily_seasonality=False, yearly_seasonality="auto",
                 n_changepoints=30, n_lags=6, n_forecasts = 12+test_horizon,
                growth = "linear" )
metrics = m.fit(df_train, freq="MS",metrics=["MAE", "RMSE"])
print(metrics.tail(1))

Screenshot 2023-03-29 125244

future = m.make_future_dataframe(df_train, periods=m.n_forecasts+test_horizon)
forecast = m.predict(future)
forecast_plot = m.plot(forecast)
forecast_plot

Screenshot 2023-03-29 125439

Screenshot 2023-03-29 125538

@noxan
Collaborator

noxan commented Mar 30, 2023

@Pushkaran-P thanks for providing your example code, it now works for me.

Regarding your question of the data frequency being detected for 89.091% of the dataset:

TL;DR: We only match months of 30 or 31 days, ignoring February, so your dataset is fine and matches the MS frequency; our calculation is just not accurate.

Your dataset has data for 58 months (about 4.8 years). Our frequency detection only matches intervals of 30 or 31 days; it does not account for months with 28 or 29 days. In your dataset there are 52 months with 30 or 31 days, 5 months with fewer days (February: four times 28 days and once 29 days), and one month that serves as the baseline (it is ignored because it has no difference to compare against). So 52 / 58 months match the MS (first day of month) frequency, which equals 89.655% of your dataset.

I'll have a second look at why we do not account for shorter months (February) and why it's 89.655% vs 89.091%.
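
The counting above can be reproduced with plain pandas. This is only a sketch of the arithmetic, not NeuralProphet's actual detection code, and the start date 2018-01-01 is an assumption chosen for illustration (the real dates are in sample.csv).

```python
import pandas as pd

# Assumption: 58 month-start timestamps beginning 2018-01-01 (illustrative only).
ds = pd.Series(pd.date_range("2018-01-01", periods=58, freq="MS"))

# Gap in days between consecutive timestamps; the first row has no predecessor.
gaps_in_days = ds.diff().dropna().dt.days

matching = int(gaps_in_days.isin([30, 31]).sum())  # gaps treated as monthly
short = int(gaps_in_days.isin([28, 29]).sum())     # February gaps, not matched
share = 100 * matching / len(ds)                   # percentage of all rows

print(matching, short, round(share, 3))  # 52 5 89.655
```

With these assumed dates, the five February gaps are the only intervals that fall outside the 30/31-day match, which is what pulls the reported share below 100%.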

@noxan
Collaborator

noxan commented Mar 30, 2023

if frequencies[np.argmax(distribution)] == 2.6784e15 or frequencies[np.argmax(distribution)] == 2.592e15:
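
For readers wondering about the two constants in that condition: assuming the frequencies are timedeltas expressed in nanoseconds, they correspond to 31 and 30 days. A quick check in plain Python (a sketch of the arithmetic, not library code):

```python
# Assumption: the constants are durations in nanoseconds.
NS_PER_DAY = 24 * 60 * 60 * 10**9

days_for_first = 2.6784e15 / NS_PER_DAY   # 31.0
days_for_second = 2.592e15 / NS_PER_DAY   # 30.0

# A 28-day February gap equals neither constant, so it is not counted as monthly.
february_gap_ns = 28 * NS_PER_DAY
february_matches = february_gap_ns in (2.6784e15, 2.592e15)

print(days_for_first, days_for_second, february_matches)  # 31.0 30.0 False
```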

@Pushkaran-P
Author

@noxan Would this cause any problems while fitting the model, for example if the algorithm does not factor in the February data or does not account for the shorter months?

@noxan
Collaborator

noxan commented Apr 7, 2023

@Pushkaran-P It does not make any difference for fitting the model, you're all good - only the logging message is misleading.
