Create, in your preferred language (C# or VB.NET), a program which is able to read ANY CSV file (or at least 99% of them), assuming no prior knowledge about its structure (do not even assume that a first line with variable names is necessarily present in the CSV: when it is not present, clearly, do some useful automatic naming).
Also, some preprocessing should be carried out on the data (or a suitable subset) in order to empirically establish the most suitable data type for each field and, thus, offer the program user a preliminary tentative choice of data types for the variable fields (which the user can then change in the GUI at will before attempting to read the file).
OPT 8_A. In the previous program 7_A, as a verification, plug the code you have already developed for computing the mean and the (univariate) statistical distribution, and allow the user to select any variable and compute the arithmetic mean (only when it makes sense) and the distribution. [Make this general enough, in anticipation of next homework program, where we will also add bivariate distributions and, in general, multivariate distributions, with various charts.]
VB.NET code
Source (Application7_n folder)
https://drive.google.com/file/d/1vF2iwXClFHLlPakOl7rq1nIPLShcjanC/view?usp=sharing
Solution in txt format
https://drive.google.com/file/d/1tw9dYnnNOFm-DER0wALCu-jWjXn5IJYu/view?usp=sharing
A brief explanation
Choosing the file with a dialog
Imports Microsoft.VisualBasic.FileIO
...
Private Sub BtnDialog_Click(sender As Object, e As EventArgs) Handles BtnDialog.Click
    Dim o As New OpenFileDialog
    ' Copy the path only if the user confirmed the dialog:
    ' on cancel, FileName is an empty string, not Nothing
    If o.ShowDialog() = DialogResult.OK Then
        Me.TxtFile.Text = o.FileName
    End If
End Sub
...
Reading the data
First, I read only the first line:
...
Public Sub Read_frist_line()
    ...
    Using reader As New TextFieldParser(path)
        ' Comma-delimited file; lines starting with "#" are skipped as comments
        reader.CommentTokens() = New String() {"#"}
        reader.Delimiters = New String() {","}
        reader.HasFieldsEnclosedInQuotes = True
        Dim values() As String = reader.ReadFields
        Dim i As Integer = 0
        ' One tree node per field, tagged with its column index
        For Each v In values
            TreeView1.Nodes(0).Nodes.Add(v).Tag = i
            ' v is a String here, so v.GetType is always System.String
            dicOfType.Add(i, Choose_type(v.GetType.ToString))
            i += 1
        Next
        ...
    End Using
End Sub
...
I store this information in a treeview, so the check for 'metadata' (a first line with variable names) works on the tree nodes:
...
Public Function Exist_metadata() As Boolean
    If Me.CheckBox1.Checked Then
        metadataFlag = False
        Return False
    Else
        For tagNode = 0 To TreeView1.Nodes(0).Nodes.Count - 1
            If Different_fromString(tagNode) Then
                metadataFlag = False
                Return False
            End If
        Next
    End If
    Return True
End Function
...
If the metadata are not present, I change the tree accordingly.
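The idea can be illustrated with a minimal, self-contained sketch. It is not the code of the linked solution: the module HeaderSketch and the functions LooksLikeData, LooksLikeHeader and GenerateNames are hypothetical names, and the rule "a header row contains no field that parses as a number or a date" is an assumption made here for illustration, as is the "Var1, Var2, ..." naming scheme.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module HeaderSketch
    ' True when a field parses as a number or a date, i.e. it looks like data.
    ' (Assumption: invariant culture, "." as decimal separator.)
    Public Function LooksLikeData(field As String) As Boolean
        Dim d As Double
        Dim t As DateTime
        Return Double.TryParse(field, NumberStyles.Float, CultureInfo.InvariantCulture, d) OrElse
               DateTime.TryParse(field, CultureInfo.InvariantCulture, DateTimeStyles.None, t)
    End Function

    ' A row is accepted as a header only if none of its fields looks like data.
    Public Function LooksLikeHeader(fields As String()) As Boolean
        For Each f As String In fields
            If LooksLikeData(f) Then Return False
        Next
        Return True
    End Function

    ' Placeholder names "Var1", "Var2", ... for a file without a header row.
    Public Function GenerateNames(fieldCount As Integer) As List(Of String)
        Dim names As New List(Of String)
        For i As Integer = 1 To fieldCount
            names.Add("Var" & i.ToString())
        Next
        Return names
    End Function
End Module
```

With this sketch, a first row such as name, age, city is accepted as a header, while Alice, 30, Rome is rejected because 30 parses as a number; in that case GenerateNames(3) supplies Var1, Var2, Var3. A file whose columns are all text cannot be decided this way, which is one reason to let the user override the choice.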
To estimate the right type before reading all the data, I call Empirically_type(reader) and Suitable_type() while reading the first line.
...
Public Sub Suitable_type()
    Dim nParent As Integer = Me.TreeView1.Nodes(0).Nodes.Count
    For n = 0 To nParent - 1
        Research_typeString(Me.TreeView1.Nodes(0).Nodes(n).Text, n)
    Next
End Sub
...
With Research_typeString I try to infer the type from typical variable names:
Imports System.Text.RegularExpressions
...
Public Sub Research_typeString(txt As String, n As Integer)
    ' Patterns for typical variable names, matched case-insensitively
    Dim dateMatch As New Regex("(\w*time\w*)|(\w*date\w*)", RegexOptions.IgnoreCase)
    Dim soldMatch As New Regex("(\w*\$\w*)", RegexOptions.IgnoreCase)
    Dim numberMatch As New Regex("(\w*number\w*)", RegexOptions.IgnoreCase)
    Dim unitMatch As New Regex("(\w*kg\w*)|(\w*cm\w*)|(\w*metre\w*)|(\w*height\w*)|(\w*weight\w*)|(\w*length\w*)|(\w*width\w*)", RegexOptions.IgnoreCase)
    ' A name containing "time" or "date" suggests a DateTime field
    If dateMatch.IsMatch(txt) Then
        Dic_containKey(dicOfSuitType, n)
        dicOfSuitType(n).Add(GetType(DateTime))
    End If
    ...
End Sub
With Empirically_type, instead, I try to infer the type by reading the second line of the file and parsing each element:
Public Sub Empirically_type(reader As TextFieldParser)
    Dim values() As String = reader.ReadFields()
    For v = 0 To values.Length - 1
        dicOfEmpType.Add(v, ParseString(values(v)).GetType)
    Next
End Sub
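ParseString is not shown here, so the following is only a sketch of how a single value could be mapped to a type, not the actual implementation. The module TypeInferenceSketch and the function InferType are hypothetical names, and the candidate types, their order and the use of the invariant culture are assumptions.

```vbnet
Imports System
Imports System.Globalization

Module TypeInferenceSketch
    ' Returns the first of Integer, Double, Boolean, DateTime whose TryParse
    ' accepts the text; falls back to String when none does.
    ' Numeric checks come first so that "3.14" is not tried as a date.
    Public Function InferType(text As String) As Type
        Dim i As Integer
        Dim d As Double
        Dim b As Boolean
        Dim t As DateTime
        Dim inv As CultureInfo = CultureInfo.InvariantCulture

        If Integer.TryParse(text, NumberStyles.Integer, inv, i) Then Return GetType(Integer)
        If Double.TryParse(text, NumberStyles.Float, inv, d) Then Return GetType(Double)
        If Boolean.TryParse(text, b) Then Return GetType(Boolean)
        If DateTime.TryParse(text, inv, DateTimeStyles.None, t) Then Return GetType(DateTime)
        Return GetType(String)
    End Function
End Module
```

A single line is a very small sample: a column whose second-line value is 42 may contain 3.5 further down. Applying the same function to several lines and keeping the widest type found would be more robust, at the cost of reading more of the file before showing the suggestion.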
dicOfEmpType and dicOfSuitType store the suggested types, which are then shown to the user when they click on the data.
To sum up: I read the first line, analyze it, and only then read the remaining lines of the file and complete the treeview.
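The final step, reading every remaining line, can be sketched as follows. This is an illustration under assumptions, not the code of the linked solution: the module ReadAllSketch and the function ReadRows are hypothetical names, and the text is parsed from a string through a StringReader only to keep the example independent of any file on disk.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module ReadAllSketch
    ' Parses comma-delimited text and returns every row as an array of fields.
    Public Function ReadRows(csvText As String) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(New StringReader(csvText))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' Quoted fields may contain the delimiter itself
            parser.HasFieldsEnclosedInQuotes = True
            While Not parser.EndOfData
                rows.Add(parser.ReadFields())
            End While
        End Using
        Return rows
    End Function
End Module
```

Because quoted fields are enabled, a value such as "Rossi, Mario" stays a single field even though it contains a comma.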
I also added extra checks to avoid bugs and unexpected behavior.
When a variable is chosen in the treeview, the program computes its average and frequency distribution.
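For reference, the two computations can be sketched as below. This is not the code from the earlier homework: the module StatsSketch and the functions Mean and Frequencies are hypothetical names, and the running (online) update for the mean is an assumption about how it might be computed.

```vbnet
Imports System
Imports System.Collections.Generic

Module StatsSketch
    ' Running arithmetic mean: the estimate is updated one value at a time,
    ' so no large partial sum is accumulated. Returns 0 for an empty sequence.
    Public Function Mean(values As IEnumerable(Of Double)) As Double
        Dim avg As Double = 0
        Dim n As Integer = 0
        For Each x As Double In values
            n += 1
            avg += (x - avg) / n
        Next
        Return avg
    End Function

    ' Frequency distribution: how many times each distinct value occurs,
    ' with the keys kept in sorted order. Values are assumed to be non-null.
    Public Function Frequencies(values As IEnumerable(Of String)) As SortedDictionary(Of String, Integer)
        Dim freq As New SortedDictionary(Of String, Integer)
        For Each v As String In values
            If freq.ContainsKey(v) Then
                freq(v) += 1
            Else
                freq(v) = 1
            End If
        Next
        Return freq
    End Function
End Module
```

The mean only makes sense for fields whose chosen type is numeric, whereas the frequency count works on the raw text of any field. For a continuous numeric variable the values would first have to be grouped into class intervals, which this sketch does not do.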