You are never going to have less data than you do today. Industry experts believe that upwards of 80% of all data that is created is no longer used after its creation. Now in the CAD world that number is generally lower, but is it? Inventor creates OldVersion folders, users pack-n-go archives, shared components get spread all over directories, Content Center files get used locally instead on on a network and then get copied around. File management can get out of hand pretty easily if not managed properly.
Products like Vault and Upchain exist to help keep track of your data more and reduce redundancies and duplication. Transitioning to a Product Data Management software tends to bring with it many questions about your current data structure though. Questions like…
- How many Inventor files do I have?
- Do I have any Name Duplicates?
- How many of the files are OldVersion files?
- How do I herd cats?
Okay, the last one is not serious but it is often compared to the problems in sorting out years of network storage and varying employee modeling and archiving practices to decide what is the actual source of truth for the engineering documents.
A couple years ago I did an Autodesk University class on mass loading and analyzing data to be loaded to Autodesk Vault and this blog post will touch on the first step of that process to better understand your current data.
Even if you are not planning a direct move to a PDM system like Vault, this report I am about to show you can still help you start to wrangle your current data into better practices by first exposing what you currently have and then working a process to resolve it. The overall goal of this though is to understand how much work might be ahead of you in a Vault or Upchain initiative to either load data all at one, some important data first, or just through garden variety attrition.
Step 1: Extract a Directory Listing
There are many ways to get a directory listing from your local drives, network, or other shares, and one of my favorite is using a software from Karenware called Directory Printer. I have used it for over a decade with getting these reports and it has yet to let me down. What we really need from the extraction is Filename (Path + Name), Filename, and Extension. If you have a very large directory you are scanning then just Filename (Path + Name) will suffice. (I’ll explain later).
Here are the common settings I use with this Directory Printer...
As you can see here, my extraction was local so I didn’t need to check the Show Network button. Also make sure you turn off some of the other Formatting and Save options to keep your Notepad file a bit smaller. If you used another method for directory extraction, that’s fine too, but here is my result…
Step 2: Stop the Bloat!
When you extract this listing to Notepad it is usually a very large text file. Excel manages data and compression a lot better than a simple Notepad file and will gives us all of our power for the next steps. Select all your content of the Notepad file and paste it into Excel in cell A1. This will drastically reduce your file size.
Step 3: Formulas and Filtering
I know that step makes me sound like a chemist, but our goal now is all about formulas and filtering of our data. Inside Excel I typically add a new Row 1 and add headers like Full Filename, Filename, Ext, Duplicate Check, and Find. If you already extracted your listing with the first three of those then you are well set. If you need to parse out the other columns, here are some formulas to help you.
- Formula to Obtain Filename: =MID(A2,FIND("*",SUBSTITUTE(A2,"\","*",LEN(A2)-LEN(SUBSTITUTE(A2,"\",""))))+1,LEN(A2))
- Formula to Obtain Extension: =RIGHT(B2,4)
- Double Click the little box on a cell to propagate the formula down
Once you have these first three columns populated its time for the harvesting of our data. With this extraction you received every single file in that directory regardless of its value. The next step I perform makes every step after this easier. Select all the contents of your Excel file columns B and C and select copy and then right click and paste them back in as values. This will greatly reduce the strain on Excel.
Now use the Filter command to narrow down your Extensions you want to keep. Keep in mind you might care about non-Inventor file types like PDFs but this is your report so use it as you wish.
Anything leftover should be deleted from Excel. This may take a bit of time, but it removes things we don't need to investigate and makes the Excel file smaller yet.
Step 4: Find OldVersions
In your Find column add a formula to search Column A for OldVersions. It should look like this:
Use the Filter and remove anything that is listed as #VALUE as that cell did not have any OldVersions in it. Delete the rest of the entries as they are OldVersion records.
You can also do subsequent Finds if you want to remove any other files from this list that you may already know about, such as Pack-N-Go archives or Design Data folders.
Step 5: Duplicate Check
This next check is arguably one of the most telling. This will expose any files that have been duplicated throughout your folder structure. I once had a client with 60,000 files to load and 5,000 of those files were all named Rubber.ipt and all were different geometries.
Start off by clearing all your filters, you should only have files you care about at this point. Sort your Excel columns using Filename as the key.
In the Duplicate Check column add this formula and propagate it down.
This will return a True/False for you to search on to determine how many named duplicates exist and where they are located. It is important at this step to take a long hard look at your data. Just because a file is named the same does not mean it truly is the same. Furthermore, Inventor does actually know the difference in the files due to an Internal Name property that Inventor gives all files. This is why you should not just blanket delete duplicates as it may damage your assemblies. There are ways to face these issues and ways to address duplicates.
Here is a link to shed some light on the topic. These are all things that can be addressed in one way or another during a transition to a PDM.
One potential solution a lot of our clients use is the IMAGINiT Server Utilities Metadata Utility to perform Batch Renames from Excel throughout their entire Vault. It will actually keep the files intact to their assemblies and is essentially a supersized Vault Rename function once all your data is loaded.
Hopefully this this process allows you to start examining your data and makes the herding of felines a more surmountable task.
About the AuthorFollow on Linkedin More Content by Mark Flayler